Software obfuscation is a program transformation that makes a program difficult to understand while preserving all of its original behavior. Mixed-Boolean-Arithmetic (MBA) obfuscation is a powerful and efficient obfuscation method. It transforms simple calculations into highly complex expressions that mix Boolean (bitwise) and arithmetic operators. Because many malware developers have adopted obfuscation techniques to hide malware from detection, analyzing obfuscated programs plays a crucial role in modern software security. This project seeks to effectively reverse MBA obfuscation, that is, to recover the original program logic from an obfuscated program produced by MBA transformation.
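As a small illustration of the kind of rewriting described above, the sketch below hides the addition `x + y` behind two standard MBA identities, `x + y == (x ^ y) + 2*(x & y)` and `x ^ y == (x | y) - (x & y)`, both of which hold modulo 2^32. The function names and the 32-bit word size are assumptions for illustration; practical obfuscators typically chain many such rewrites.

```python
import random

MASK = (1 << 32) - 1  # assumed: 32-bit unsigned machine integers


def original(x: int, y: int) -> int:
    """The simple computation as originally written."""
    return (x + y) & MASK


def mba_obfuscated(x: int, y: int) -> int:
    """The same function after a hypothetical MBA rewrite.

    Uses x + y == (x ^ y) + 2*(x & y) and x ^ y == (x | y) - (x & y),
    both valid modulo 2**32.
    """
    xor = ((x | y) - (x & y)) & MASK
    return (xor + 2 * (x & y)) & MASK


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(10_000):
        x, y = rng.getrandbits(32), rng.getrandbits(32)
        assert original(x, y) == mba_obfuscated(x, y)
    print("identical on 10,000 random inputs")
```

The two functions are semantically identical, yet the second no longer contains a plain addition, which is what makes MBA-obfuscated code hard to read and to analyze.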
The objective of this project is to uncover the theoretical and practical properties of MBA obfuscation. This research will reveal a previously undiscovered fundamental weakness of MBA obfuscation and, consequently, challenge the existing design of MBA obfuscation. The research tasks include: 1) developing an arithmetic-based simplification method to reverse normal MBA obfuscation; 2) simplifying multi-granularity MBA obfuscation; and 3) reducing generic non-linear MBA expressions. This project will advance knowledge about MBA de-obfuscation and produce practical tools for the reverse analysis of MBA-obfuscated code.
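The sketch below illustrates one well-known approach to simplifying *linear* MBA expressions (constant-coefficient sums of bitwise sub-expressions), assumed here to correspond to the "normal" MBA case in task 1; it is not necessarily the project's method. Because a linear MBA expression acts on every bit position in the same way, its value is fixed by a four-row truth table over one bit of `x` and `y`, which can then be rewritten in the basis {all-ones word, `x`, `y`, `x & y`}. The function names, the two-variable restriction, and the 32-bit word size are assumptions for illustration.

```python
from typing import Callable, Tuple

BITS = 32                      # assumed machine word size
MASK = (1 << BITS) - 1         # all-ones word, i.e. -1 modulo 2**BITS
Coeffs = Tuple[int, int, int, int]


def simplify_linear_mba(expr: Callable[[int, int], int]) -> Coeffs:
    """Return (c0, c1, c2, c3) such that, modulo 2**BITS,
        expr(x, y) == c0*MASK + c1*x + c2*y + c3*(x & y).

    Assumes expr is a linear MBA expression, so every bit position k
    contributes 2**k * f(x_k, y_k) for one per-bit function f.
    """
    def f(b1: int, b2: int) -> int:
        # All-zero / all-one inputs put the same pattern (b1, b2) in every
        # bit position, so expr(...) == f(b1, b2) * MASK == -f(b1, b2).
        x = MASK if b1 else 0
        y = MASK if b2 else 0
        return (-expr(x, y)) & MASK

    f00, f01, f10, f11 = f(0, 0), f(0, 1), f(1, 0), f(1, 1)
    # Solve f(b1, b2) == c0 + c1*b1 + c2*b2 + c3*b1*b2 on the 4 rows.
    c0 = f00
    c1 = (f10 - f00) & MASK
    c2 = (f01 - f00) & MASK
    c3 = (f11 - f10 - f01 + f00) & MASK
    return c0, c1, c2, c3


def evaluate(c: Coeffs, x: int, y: int) -> int:
    """Evaluate the simplified form c0*MASK + c1*x + c2*y + c3*(x & y)."""
    c0, c1, c2, c3 = c
    return (c0 * MASK + c1 * x + c2 * y + c3 * (x & y)) & MASK


def to_signed(v: int) -> int:
    """Interpret a BITS-wide word as a two's-complement integer."""
    return v - (1 << BITS) if v >> (BITS - 1) else v


def render(c: Coeffs) -> str:
    """Pretty-print the simplified form, e.g. 'x - y'."""
    c0, c1, c2, c3 = c
    terms = [(to_signed(k), name) for k, name in ((c1, "x"), (c2, "y"), (c3, "(x & y)"))]
    terms.append((to_signed((-c0) & MASK), ""))  # c0*MASK == -c0 is the constant
    out = ""
    for k, name in terms:
        if k == 0:
            continue
        mag = abs(k)
        body = str(mag) if not name else (name if mag == 1 else f"{mag}*{name}")
        if out:
            out += (" - " if k < 0 else " + ") + body
        else:
            out = ("-" if k < 0 else "") + body
    return out or "0"


def obfuscated_add(x: int, y: int) -> int:
    """x + y hidden behind two MBA identities."""
    return ((x | y) - (x & y)) + 2 * (x & y)


if __name__ == "__main__":
    print(render(simplify_linear_mba(obfuscated_add)))  # x + y
```

Applied to `(x ^ y) + (x & y)`, this sketch returns `x + y - (x & y)`, which is equivalent to `x | y` but not its most compact form; choosing a richer output basis is one way to obtain shorter results.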
The project will enable broader adoption of formal methods in security analysis applications and inspire more interdisciplinary research across programming languages and software security. The developed methods and datasets will be made publicly available. In addition, this project will facilitate the development of novel educational tools to enhance several existing courses at the University of New Hampshire (UNH). Minority students and members of underserved populations will be engaged in both research and extracurricular activities (such as Capture-the-Flag competitions), giving them the opportunity to participate in cutting-edge cybersecurity research.
Source code, documentation, experimental results, and scholarly publications will be managed using the distributed version control system Git. New curriculum materials will be organized in UNH's course management system. A local copy of the repository will be stored on the backup servers of the UNH SoftSec Group. Data will be retained for at least three years beyond the award period. Scholarly publications, presentations, and open-source code will be made available on the homepage (www.cs.unh.edu/~dxu).
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.