The Null Device

Disguising machine code as English

Security researchers are now working on ways of generating machine code that looks like English-language text (PDF).
"In this paper we revisit the assumption that shellcode need be fundamentally different in structure than non-executable data. Specifically, we elucidate how one can use natural language generation techniques to produce shellcode that is superficially similar to English prose. We argue that this new development poses significant challenges for inline payload-based inspection (and emulation) as a defensive measure, and also highlights the need for designing more efficient techniques for preventing shellcode injection attacks altogether."
The code is generated by a language engine which selects fragments of text, Markov-chain fashion, from a large source (such as Wikipedia or Project Gutenberg). The result looks like the random gibberish spammers pad their emails out with, but if executed, it functions as x86 machine code. (Rather inefficient machine code, with a lot of jumps and circumlocutions to fit the constraint of looking like English, but good enough to sneak exploits through.) Below is an example of some code thus disguised:
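
To give a feel for the generation step only (this is neither the paper's disguised output nor its actual method), here is a minimal Python sketch of a word-level Markov chain that picks each next word from those seen after the current one in a corpus. The tiny corpus, the function names and the `is_acceptable` filter are all made up for illustration; in particular, `is_acceptable` is just a placeholder for the hard part, which would be checking that the chosen text also decodes as the x86 instructions the payload needs.

```python
import random
from collections import defaultdict


def build_chain(words):
    """Map each word to the list of words seen immediately after it."""
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def is_acceptable(word):
    """Hypothetical stand-in for the real constraint.

    A real engine would have to check that the candidate text's bytes
    decode to useful machine instructions. Here we only require plain
    ASCII letters, which is nowhere near sufficient.
    """
    return word.isascii() and word.isalpha()


def generate(chain, start, length, rng):
    """Walk the chain from `start`, keeping only acceptable successors."""
    out = [start]
    while len(out) < length:
        candidates = [w for w in chain.get(out[-1], []) if is_acceptable(w)]
        if not candidates:
            break  # dead end: no acceptable successor in the corpus
        out.append(rng.choice(candidates))
    return " ".join(out)


# Toy corpus; a real source would be something like Wikipedia text.
corpus = "the cat sat on the mat and the dog sat on the cat".split()
chain = build_chain(corpus)
text = generate(chain, "the", 8, random.Random(0))
print(text)
```

Every adjacent pair of words in the output occurs somewhere in the corpus, which is why this sort of text reads as locally plausible but globally meaningless, much like spam padding.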
