Programs with potentially malicious content are becoming increasingly common. Such programs are usually highly obfuscated, using a variety of techniques that make it difficult to analyze the code, figure out its internal logic, and develop countermeasures. Existing tools for reverse engineering such programs are primitive and require a great deal of tedious and time-consuming manual intervention, which hampers the timely development of defenses against newly discovered malware. This project aims to devise automatic techniques to simplify away these obfuscations and thereby make it significantly faster and easier to understand the internal logic of obfuscated code with potentially malicious content.
The project uses dynamic program analysis techniques to identify instructions that affect the program's observable behavior; these instructions are then extracted and, where appropriate, simplified using equational techniques. Key research questions investigated include simplification in the face of arbitrary obfuscations and combinations of obfuscations, including (possibly multiple layers of) self-modification and emulation. The main impact of this project will be to make it easier and quicker for security researchers to understand the internal logic of malware. This, in turn, will make it possible to respond more quickly to new malware and to develop countermeasures against it faster and with less manual intervention. The effect will be to reduce the damage done by malware before it can be neutralized.
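As a rough illustration of the approach described above (identify the instructions that affect observable behavior, extract them, then simplify them equationally), the following is a minimal sketch over a toy execution trace. The instruction format, the `Instr` type, the functions `backward_slice`, `simplify`, and `evaluate`, and the sample trace are all hypothetical and chosen for illustration; they are not the project's actual tooling, and real obfuscated binaries (with self-modification or emulation) would need far more machinery.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Value = Union[int, str, tuple]  # constant, symbolic input, or unsimplified expression


@dataclass(frozen=True)
class Instr:
    op: str                 # one of: "in", "const", "add", "xor", "out"
    dst: str                # destination register ("" for "out")
    srcs: Tuple = ()        # source registers, or one int literal for "const"


def backward_slice(trace: List[Instr]) -> List[Instr]:
    """Keep only the trace entries whose results flow into an output."""
    live, kept = set(), []
    for ins in reversed(trace):
        if ins.op == "out":
            live.update(ins.srcs)
            kept.append(ins)
        elif ins.dst in live:
            live.discard(ins.dst)  # this definition satisfies the use
            live.update(s for s in ins.srcs if isinstance(s, str))
            kept.append(ins)
    kept.reverse()
    return kept


def simplify(op: str, a: Value, b: Value) -> Value:
    """Apply a few equational rules: constant folding, x^x=0, x+0=x, x^0=x."""
    if isinstance(a, int) and isinstance(b, int):
        return a + b if op == "add" else a ^ b
    if op == "xor" and a == b:
        return 0
    if isinstance(a, int) and a == 0:
        return b
    if isinstance(b, int) and b == 0:
        return a
    return (op, a, b)


def evaluate(trace: List[Instr]) -> List[Value]:
    """Symbolically execute the trace and return the simplified outputs."""
    env, outputs = {}, []
    for ins in trace:
        if ins.op == "in":
            env[ins.dst] = f"input_{ins.dst}"
        elif ins.op == "const":
            env[ins.dst] = ins.srcs[0]
        elif ins.op == "out":
            outputs.append(env[ins.srcs[0]])
        else:
            env[ins.dst] = simplify(ins.op, env[ins.srcs[0]], env[ins.srcs[1]])
    return outputs


# Hypothetical obfuscated trace: the output is just the input, hidden
# behind an opaque zero (r0 ^ r0) and some dead computation.
TRACE = [
    Instr("in", "r0"),
    Instr("const", "r1", (7,)),
    Instr("xor", "r2", ("r0", "r1")),   # dead: never reaches the output
    Instr("xor", "r3", ("r0", "r0")),   # opaque zero
    Instr("add", "r4", ("r0", "r3")),   # r0 + 0
    Instr("out", "", ("r4",)),
]

sliced = backward_slice(TRACE)
outputs = evaluate(sliced)
print(len(TRACE), "->", len(sliced), "instructions; outputs:", outputs)
```

In this sketch the slice drops the two instructions that never influence the output, and the equational rules reduce the remaining computation to the symbolic input itself. Slicing on a concrete execution trace, rather than on static code, is what lets the same idea apply in principle when the code is generated or decoded at run time.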