This I-Corps team plans to further develop a software tool that takes x86 binary programs as input (including stripped binaries) and produces equivalent source-code programs in C. The binary may have been compiled from any language. The output C code is not the same as the original source code, but it is functionally equivalent and fully functional: it can be modified, recompiled, and run as needed. Alternatively, the software can output a rewritten binary or the intermediate representation (IR) of the open-source LLVM compiler, allowing further analysis and transformation of binary code with existing or new LLVM passes. The software developed by the team performs deep binary analysis, so the output code is high-level: it contains symbols, functions, arguments, return values, types (including aggregate types), high-level control-flow constructs, and an abstract stack. The team has developed alias analysis and type recovery schemes that work synergistically to analyze aliasing in binary code effectively and to recover types, including aggregate types such as structures and arrays. The team has also developed technologies to rewrite stripped binaries (i.e., those without relocation and symbolic information).
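
To make "functionally equivalent, but not the same as the original source" concrete, the sketch below contrasts a small hand-written function with the kind of C a binary-to-source tool could plausibly recover for it. It is purely illustrative and is not actual output from the team's software: the names order_total, struct_0, field_0, field_4, func_401000, and check_equivalence are all hypothetical, and the synthetic naming scheme is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "original" source, normally unavailable for a legacy binary. */
struct order {
    int32_t quantity;
    int32_t unit_price;
};

int32_t order_total(const struct order *orders, size_t count)
{
    int32_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += orders[i].quantity * orders[i].unit_price;
    return total;
}

/* Hypothetical "recovered" source. All names are synthetic (an assumption),
 * but the aggregate type with two 4-byte fields, the argument list, the
 * return value, and the loop structure have been reconstructed. */
struct struct_0 {
    int32_t field_0; /* first 4-byte field */
    int32_t field_4; /* second 4-byte field */
};

int32_t func_401000(const struct struct_0 *arg_0, size_t arg_1)
{
    int32_t var_0 = 0;
    size_t var_1 = 0;
    while (var_1 < arg_1) {
        var_0 += arg_0[var_1].field_0 * arg_0[var_1].field_4;
        var_1++;
    }
    return var_0;
}

/* Spot-check that both versions agree on the same data.
 * Returns 1 if the results match, 0 otherwise. */
int check_equivalence(void)
{
    struct order orders[3] = { {2, 5}, {1, 7}, {4, 3} };
    struct struct_0 recovered[3];

    /* The recovered aggregate should have the same size as the original. */
    assert(sizeof(struct order) == sizeof(struct struct_0));

    for (size_t i = 0; i < 3; i++) {
        recovered[i].field_0 = orders[i].quantity;
        recovered[i].field_4 = orders[i].unit_price;
    }
    return order_total(orders, 3) == func_401000(recovered, 3);
}
```

The recovered version differs in naming and loop form, yet it can be read, edited, and recompiled like any other C code, which is the property the abstract describes.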

With further development, this software may be a valuable tool for recovering source code from legacy binaries. In both government and industry, legacy code is run every day, but its source code is often lost or hard to track down, because the original vendor may have gone through corporate mergers, reorganizations, or liquidations. Redeveloping code from scratch can be costly, and the original behavior is difficult to replicate because the full scope of the original functionality is often unknown. In these cases, this software tool may be able to provide source code that can be understood, maintained, updated, and recompiled with newer compilers and for newer versions of the x86 instruction set. Additionally, this tool may have applications in forensics, to examine and understand the behavior of vulnerable or untrusted code before or after a security breach. This goes beyond existing dynamic binary rewriters and binary analysis tools, which provide useful automated security checks but cannot help a human understand untrusted or vulnerable code.

Project Report

The goal of this project was to study the commercial feasibility of research at the University of Maryland on the analysis and transformation of executable computer software. Our software for this analysis and transformation is called SecondWrite. Based on almost one hundred interviews conducted during the NSF-funded I-Corps customer discovery process (Michigan, Oct - Nov 2012), we believe that SecondWrite will be a game changer that dramatically improves the speed, efficiency, and efficacy of countering cyber threats. President Obama recently cited cyber threats as one of our most serious economic and national security challenges. Cyber crime costs the US economy billions of dollars and poses a direct threat to our national infrastructure and financial institutions. Theft of intellectual property alone costs American companies around $250 billion per year. SecondWrite has the potential to enable orders-of-magnitude productivity improvements across the cyber security spectrum, including analyzing malware, exposing undesirable behavior in untrusted code, and detecting vulnerabilities in proprietary software. SecondWrite's innovative techniques yield precise discovery of program features and robust defensive measures against these threats. SecondWrite also enables modification and maintenance of legacy software whose source code has been lost. Consequently, SecondWrite enables substantially faster, automated, and more detailed analysis of cyber threats, resulting in a more robust defense capability. SecondWrite directly contributes to minimizing losses to the US economy. Better protection of our IP and trade secrets also contributes to minimizing American job losses.

Agency: National Science Foundation (NSF)
Institute: Division of Industrial Innovation and Partnerships (IIP)
Type: Standard Grant (Standard)
Application #: 1265331
Program Officer: Rathindra DasGupta
Project Start:
Project End:
Budget Start: 2012-10-01
Budget End: 2013-06-30
Support Year:
Fiscal Year: 2012
Total Cost: $50,000
Indirect Cost:
Name: University of Maryland College Park
Department:
Type:
DUNS #:
City: College Park
State: MD
Country: United States
Zip Code: 20742