A deep, pervasive problem when attempting to secure modern computer networks arises from the bewildering range of applications that these networks carry. Unless a specific application is understood, its presence cannot be soundly monitored and controlled. Yet we have seen in the past decade the rise and use of many hundreds of applications, a growth far outpacing the ability of security practitioners to apprehend their individual operation and implications.
This research effort aims to facilitate pervasive understanding and control of the wealth of application protocols running on today's networks. Developing in-depth visibility into these protocols will perforce lead to new capabilities for exposing the inner workings of modern applications that are not yet well understood within the community. Such understanding will provide pragmatic, high-impact functionality, since operators will be able to incorporate this information directly into their monitoring efforts.
A key goal of the undertaking is to enable the broader network research community to jointly construct application analysis resources that are shared across the field. The project envisions a "lingua franca" for expressing application protocol structure and semantics, one that moves beyond the status quo by providing a common platform and language for a wide range of semantics and analyses. While the focus within our project is on application analysis for purposes of monitoring and securing networks, the tools we develop will often lend themselves to repurposing in support of other networking concerns such as network management, troubleshooting, and performance optimization.
We have made significant progress in advancing the research areas of malware analysis, intrusion detection systems, mobile application analysis, and web application analysis. First, we have designed and developed novel techniques that enable us to automatically extract botnet command-and-control protocols. A botnet (also known as a zombie army) is a large collection of compromised computers on the Internet, controlled by a malicious bot master to perform tasks such as spamming and distributed denial-of-service attacks. Our protocol inference technique can automatically learn the message format and content (i.e., the protocol) exchanged between the bot master and the infected machines. The learned protocol can be used to analyze botnet behaviors and weaknesses. Our work has led to the discovery of a weakness in MegaD, a botnet that at its peak was responsible for sending a major portion of spam worldwide. This weakness allows the extraction of spam detection rules that accurately identify MegaD spam.
Second, we have developed multiple novel techniques for signature generation and anomaly detection, which have greatly improved the state of the art in intrusion detection systems. A signature is a set of specifications and rules for matching and detecting certain behaviors. Our findings include a novel protocol-level, constraint-guided exploration approach to signature generation. In our experiments, our tool generates compact, high-coverage signatures for real-world vulnerabilities. We also developed a new model of system call behavior, called an execution graph. The execution graph is the first such model that both requires no static analysis of the program source or binary and conforms to the control flow graph of the program. In addition, we developed new notions of process-level similarity as a means to detect an attack on one process that causes its behavior to deviate from that of another.
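To make the notion of a signature as "a set of specifications and rules" concrete, the following is a minimal Python sketch. It is an illustration only, not the project's signature generation tool: the names `Rule` and `Signature`, the message fields `method` and `path`, and the 256-character threshold are all hypothetical choices made for this example. A signature here is a conjunction of per-field constraints over an already-parsed protocol message.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# A parsed protocol message, modeled as a mapping from field name to value.
# (Hypothetical representation, chosen only for this illustration.)
Message = Dict[str, object]


@dataclass(frozen=True)
class Rule:
    """One constraint on a single message field."""
    field: str
    predicate: Callable[[object], bool]
    description: str


@dataclass(frozen=True)
class Signature:
    """A named set of rules; a message matches only if every rule holds."""
    name: str
    rules: Tuple[Rule, ...]

    def matches(self, message: Message) -> bool:
        # A rule whose field is absent from the message does not hold.
        return all(
            rule.field in message and rule.predicate(message[rule.field])
            for rule in self.rules
        )


# Hypothetical signature: a GET request whose path is suspiciously long.
OVERLONG_PATH = Signature(
    name="overlong-request-path",
    rules=(
        Rule("method", lambda v: v == "GET", "request method is GET"),
        Rule(
            "path",
            lambda v: isinstance(v, str) and len(v) > 256,
            "path is longer than 256 characters",
        ),
    ),
)

benign = {"method": "GET", "path": "/index.html"}
suspicious = {"method": "GET", "path": "/" + "A" * 300}

if __name__ == "__main__":
    for msg in (benign, suspicious):
        print(msg["path"][:20], OVERLONG_PATH.matches(msg))
```

Expressing rules over protocol fields rather than raw byte patterns mirrors the protocol-level framing described above; how such constraints would be derived automatically is outside the scope of this sketch.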
On the mobile application analysis side, we have performed research both on profiling the network fingerprints of mobile applications and on examining mobile applications through their event-level behaviors. Our fingerprint profiling project allows us to identify, at the network level, which applications are being used. We use our technique to generate network profiles for thousands of apps. Using these profiles, we are able to detect the presence of the apps in real-world network traffic logs from a cellular provider. Our event-based behavioral analysis allows us to automatically detect sensitive operations being performed without the user’s consent, such as recording audio after the stop button is pressed or accessing an address book in the background. Our evaluation on real-world Android applications shows that we can detect, or prove the absence of, malicious behavior beyond the reach of existing techniques.
Finally, we have also performed research on analyzing and hardening web-based applications. We conduct a security analysis of five popular web-based password managers. We identify four key security concerns for web-based password managers and, for each, identify representative vulnerabilities through our case studies. Our attacks are severe: in four out of the five password managers we studied, an attacker can learn a user’s credentials for arbitrary websites. Our study suggests that it remains a challenge for password managers to be secure. To guide future development, we provide guidance for password managers. Given the diversity of vulnerabilities we identified, we advocate a defense-in-depth approach to ensure the security of password managers.
In another project on web security and privacy, we develop new techniques to use encrypted data in web applications. We first perform a systematization of the design space of web applications and highlight the advantages and limitations of current proposals.
Next, we design and implement ShadowCrypt, a previously unexplored design point that enables encrypted input/output without trusting any part of the web applications. ShadowCrypt allows users to transparently switch to encrypted input/output for text-based web applications.