In this paper we revisit the assumption that shellcode need be fundamentally different in structure from non-executable data. Specifically, we elucidate how one can use natural language generation techniques to produce shellcode that is superficially similar to English prose. We argue that this new development poses significant challenges for inline payload-based inspection (and emulation) as a defensive measure, and also highlights the need for designing more efficient techniques for preventing shellcode injection attacks altogether.

The code is generated by a language engine that selects fragments of text, Markov-chain fashion, from a large corpus (such as Wikipedia or Project Gutenberg). It looks like the random gibberish spammers pad their emails out with, but if executed, it functions as x86 machine code. (Rather inefficient machine code, with a lot of jumps and circumlocutions to satisfy the constraint of looking like English, but good enough to sneak exploits through.) Below is an example of some code thus disguised:
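As an illustration (not the disguised sample referred to above) of why English-looking text tends to decode into jump-heavy x86 code, here is a hypothetical Python sketch. The names `opcode_family` and `summarize` are our own. It labels each byte of an ASCII string with the 32-bit x86 opcode family that byte would start if execution began there, using the standard IA-32 one-byte opcode assignments: for example 0x50 to 0x57 ('P' to 'W') are PUSH reg, and 0x70 to 0x7F (lowercase 'p' to 'z' plus a few symbols) are short conditional jumps. It only looks at single bytes and does not decode operands, prefixes or ModR/M bytes, so it is not a disassembler, and it does not generate shellcode.

```python
"""Hypothetical sketch: label each byte of an ASCII string with the 32-bit x86
opcode family that byte would begin if the CPU started decoding there."""

from collections import Counter


def opcode_family(byte: int) -> str:
    """Coarse IA-32 (32-bit mode) opcode family for a single byte.

    Only the first byte of an instruction is considered; operand bytes,
    ModR/M bytes and prefixes are not decoded, so this is not a disassembler.
    """
    if 0x40 <= byte <= 0x47:
        return "inc reg"   # '@' and 'A'-'G'
    if 0x48 <= byte <= 0x4F:
        return "dec reg"   # 'H'-'O'
    if 0x50 <= byte <= 0x57:
        return "push reg"  # 'P'-'W'
    if 0x58 <= byte <= 0x5F:
        return "pop reg"   # 'X'-'_'
    if 0x70 <= byte <= 0x7F:
        return "jcc rel8"  # 'p'-'z', '{', '|', '}', '~', DEL: short conditional jumps
    if 0x20 <= byte <= 0x25:
        return "and"       # ' ' through '%': AND variants
    return "other"


def summarize(text: str) -> Counter:
    """Count opcode families across all bytes of an ASCII string."""
    return Counter(opcode_family(b) for b in text.encode("ascii"))


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog"
    for family, count in summarize(sample).most_common():
        print(f"{family:9} {count}")
```

On the pangram in the `__main__` block, common lowercase letters such as 't', 'r', 's' and 'u' all fall in the `jcc rel8` family and every space begins an AND, which hints at why code constrained to look like English ends up full of jumps.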