A deep, pervasive problem in securing modern computer networks arises from the bewildering range of applications that these networks carry. Unless a specific application is understood, its presence cannot be soundly monitored and controlled. Yet the past decade has seen the rise and use of many hundreds of applications, a growth far outpacing the ability of security practitioners to understand their individual operation and implications.

This research effort aims to facilitate broad understanding and control of the wealth of application protocols running on today's networks. Developing in-depth visibility into these protocols will perforce lead to new capabilities for exposing the inner workings of modern applications that are not yet well understood within the community. Such understanding will provide pragmatic, high-impact functionality, since operators will be able to incorporate this information directly into their monitoring efforts.

A key goal of the undertaking is to provide the means by which the broader network research community can jointly construct application analysis resources that are shared across the field. The project envisions a "lingua franca" for expressing application protocol structure and semantics that moves beyond the status quo by providing a common platform and language for expressing a wide range of semantics and analyses. While our project focuses on application analysis for monitoring and securing networks, the tools we develop will often lend themselves to repurposing in support of other networking concerns such as network management, troubleshooting, and performance optimization.

Project Report

This project aimed to enable broad understanding and control of the wealth of application protocols running on today's Internet. To do so, we developed technologies to capture the structure of how applications communicate: concise ways to describe their forms, and powerful tools for distilling the underlying information from monitored traffic. We then employed these technologies to analyze the nature of modern network traffic.

A central component of our vision for comprehensive application analysis and control was the development of a "lingua franca" for expressing application protocol structure and semantics. With our framework, an analyst simply describes the elements and sequencing of a protocol, without needing to consider how to identify those elements when they appear in network traffic; the framework itself automatically generates recognizers from the analyst's high-level description.

In another thrust, we developed extensive techniques for "reverse engineering" the functioning of unknown application protocols. These approaches draw primarily on execution analysis of the clients and servers that use a given protocol: by dynamically monitoring their execution, we automate the extraction of the format of the application's messages. An important application of such protocol reverse engineering is the study of the command-and-control (C&C) protocols used by botnets, huge ensembles of compromised Internet systems all under the control of a single attacker. Reverse-engineering these protocols enabled us to "infiltrate" botnets to observe their operations, targeting, and potential weaknesses, in some cases including the ability to alter their functioning unbeknownst to the "botmaster" controlling the botnet.

We also extensively assessed the degree to which network control elements manipulate the connectivity available to a user's Internet applications.
These manipulations are often hidden: the user has no visibility into the transformations applied to their traffic or the reasons connectivity is denied. Our ongoing "Netalyzr" project (netalyzr.icsi.berkeley.edu) represents one of the largest studies to date of the constraints imposed on Internet users. Netalyzr runs from a user's Web browser, working in concert with back-end servers we operate to conduct dozens of tests of network behavior. To date we have logged more than 1.3 million Netalyzr runs, illuminating a wide spectrum of application-layer proxies, rewritten Domain Name System (DNS) replies, network address translation, security issues, performance parameters, and firewall policies.

In a different dimension, the DNS is key infrastructure for virtually all Internet applications, and we studied a number of aspects of its use and operation. We developed methodologies for efficiently discovering complex client-side DNS infrastructure, including measurement techniques for isolating the behavior of the different parties within it. We also examined the domain name registration process, seeking in particular to discern differences between the registration of names ultimately used for malicious purposes, such as spam campaigns, and those registered for benign use. This assessment enabled us to develop a classifier that predicts with high accuracy whether a newly registered domain will see future malicious use.

Finally, as web browsers increasingly become the de facto operating system of network-mediated applications, malicious manipulation of web browsing becomes a significant threat. One emerging area in this regard is the rise of "web extension" ecosystems; in a number of ways, such extensions are a new form of "app", albeit one that exists wholly within the browser.
We developed a system for rapidly determining whether a Chrome browser extension includes latent malicious functionality, and worked in partnership with Google to investigate the related problem of "web injection" (the surreptitious alteration of the requests a user's browser sends and the replies it receives), which we found affects tens of millions of Internet users.

We incorporated a number of components of our application analysis technology into our "Bro" network analysis framework (www.bro.org). Bro provides particularly strong capabilities for real-time application-layer analysis, so incorporating our technology into it allows the fruits of our efforts to reach a broad audience. As one indication of that reach, the Summer 2014 Bro Conference had 148 attendees from a wide range of institutions and companies.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Application #: 0831535
Program Officer: Angelos Keromytis
Budget Start: 2008-10-01
Budget End: 2014-08-31
Fiscal Year: 2008
Total Cost: $1,650,000
Name: International Computer Science Institute
City: Berkeley
State: CA
Country: United States
Zip Code: 94704