Layer-8 attacks (e.g., spam and phishing) are launched from a malicious service platform such as a botnet, which consists of a large number of infected machines ("bots"). Such an attack platform relies on lower-layer network services to achieve efficiency, robustness, and stealth in its communication and attack activities. These services include look-up (e.g., DNS), hosting (e.g., Web servers), and transport (e.g., BGP).
The main research goals and approaches of the CLEANSE project are:
1. Control-plane monitoring. Much of the infrastructure for mounting layer-8 attacks involves abuse of the control plane in core network services (e.g., DNS and BGP). The CLEANSE project develops control-plane anomaly detection sensors that are distributed, online, and real-time.
2. Data-plane monitoring. The project develops new and general network anomaly detection algorithms based on traffic sampling and clustering for monitoring high-speed traffic.
3. Improved security auditing capabilities. The CLEANSE project develops packet "tagging/tainting" techniques to enable tracking and clustering of network traffic flows (e.g., that are generated by the same bot program). The project also develops improved traffic sampling capabilities that are attack-aware and distributed network-wide.
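A minimal sketch of the sampling-and-clustering idea in goal 2, assuming illustrative flow fields, sampling rate, and cluster key (these are not the project's actual design):

```python
import random
from collections import defaultdict

def sample_flows(flows, rate=0.1, seed=42):
    """Uniformly sample a fraction of flow records (illustrative rate)."""
    rng = random.Random(seed)
    return [f for f in flows if rng.random() < rate]

def cluster_by_behavior(flows):
    """Group sampled flows by a coarse behavioral key:
    (destination port, order of magnitude of bytes sent)."""
    clusters = defaultdict(list)
    for f in flows:
        key = (f["dport"], len(str(f["bytes"])))
        clusters[key].append(f)
    return clusters

def flag_anomalies(clusters, min_sources=3):
    """Flag clusters in which many distinct sources show identical
    coarse behavior -- a crude stand-in for bot-like coordination."""
    return [k for k, fl in clusters.items()
            if len({f["src"] for f in fl}) >= min_sources]
```

The intuition is that independent human users rarely produce many identical coarse traffic signatures, while bots running the same program do.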
By focusing on monitoring core network services, the CLEANSE framework can detect future layer-8 attacks and new forms of large-scale malware infections. The project also creates educational content, including new textbooks and on-line course materials, which directly benefit from the research activities. The CLEANSE project team also works with industry partners (including ISPs) to organize focused workshops that bring together researchers from academia and practitioners from industry, ISPs, government, and law enforcement agencies to foster the exchange of ideas, data, and technologies.
Application attacks on the Internet, such as email and blog spam, phishing, and click fraud, manipulate network applications to victimize users. Such attacks rely on network services to coordinate attack activities efficiently and stealthily. Infected computers often use look-up services to locate command-and-control servers and receive instructions. Hosting services allow the storage and exchange of attack-related data (e.g., malware or stolen information), analogous to the use of "drop sites" in the physical world. Finally, malware takes advantage of transport protocols to connect to intended victims. Such attacks therefore often result in observable service violations and anomalies, because the activities of infected computers differ from normal user activity.

This project developed a better understanding of how current and future attacks might be launched and detected. In particular, we sought to identify the basic network services that are necessary for large-scale attacks, and we developed new analysis and detection algorithms and infrastructures that monitor these service activities to detect and predict attacks.

Much of the research at UNC focused on the Domain Name System (DNS), a globally distributed collection of "name servers" and associated protocols for mapping user-friendly domain names (e.g., "nytimes.com") to the Internet Protocol (IP) addresses used to route packets to and from the corresponding websites.

- A growing trend in optimizing the speed of web browsers is to prefetch DNS resolutions for the domains in hyperlinks, in case the user clicks on one. We showed that, if left unchecked, DNS prefetching could lead to new security and privacy abuses.

- The importance of domain names for resale, serving ad content, or launching malware has contributed to the rise of questionable practices in acquiring them. Our accomplishments included one of the first comprehensive studies of abusive domain-registration practices.
- We explored ways to automatically generate high-quality domain names related to current events, in order to measure domain "front running" by registrars and "speculation" by others.

- We studied techniques for detecting algorithmically generated domains (AGDs), which are created automatically so as to minimize collisions with existing names. Although such domains are used by malware to thwart defenses, they are also used for benign purposes, and the rise of these benign applications makes it harder to accurately classify malicious AGDs. We studied current uses of, and existing detection mechanisms for, AGDs, and then developed better techniques for identifying infected computers that use AGDs.

Outside the context of DNS specifically, we also developed technologies and explored threats relevant to Internet defense:

- A requirement in some types of Internet monitoring is to repeatedly probe a set of targets over time. While some probers exercise restraint to limit the collateral damage of their probing, the literature is rife with examples of what many might consider egregious practices. We developed efficient probing algorithms and tools to manage probing more responsibly.

- Owing to existing router support for collecting network traffic summaries called flow records, increasing attention is being devoted to performing network anomaly detection using flow records. We explored the limits of flow-record analysis for security purposes by investigating the extent to which an attacker who compromises machines in an enterprise, for example, can carry out his activities in a way that is undetectable in these summary records.

- We developed a new approach to flow monitoring and evaluated it against several application-specific approaches to detecting network abuses. Our approach provides comparable or better accuracy than these application-specific approaches. Our design has significant implications for router vendors, network operators, and measurement researchers, since it reduces router complexity without compromising a vendor's ability to satisfy its customers' demands.

- Graph models from network theory have been widely applied to study properties of real-world networks, including social, biological, and computer networks, and several prior efforts have applied such models to study communication among infected computers. We developed more realistic models for this communication and used them to improve malware takedown strategies.

- Network defenses are often assembled using dedicated appliances that are "patched into" the network on a piecemeal basis, leading to inefficiencies in resource usage and management. We developed a top-down design for more efficient network defense that includes a more open, programmable appliance that can be leveraged for multiple purposes in a network, at the behest of a centralized controller.

- Searching collected network data to identify activity of interest is complicated by the fact that such data is typically very sensitive and so is stored in encrypted form. We developed technologies to permit the storage of large datasets in encrypted form so that authorized parties can still perform rich searches on those records.

The above research served as the locus of training for numerous Ph.D. students, who went on to take positions at technology leaders such as Google, Apple, Intel, and EMC. Our advances are also leveraged in courses, where appropriate, at UNC and elsewhere.
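As a concrete illustration of one signal commonly used in the AGD-detection work described above, the toy heuristic below flags domains whose labels have random-looking character distributions. The entropy feature and threshold are illustrative assumptions, far simpler than the project's actual classifiers:

```python
import math
from collections import Counter

def char_entropy(label):
    """Shannon entropy (bits/char) of a domain label's characters.
    Random-looking AGD labels tend to score higher than dictionary words."""
    counts = Counter(label)
    n = len(label)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def looks_algorithmic(domain, threshold=3.5):
    """Crude AGD heuristic (illustrative threshold): flag the first
    label if its per-character entropy is high."""
    label = domain.split(".")[0]
    return char_entropy(label) > threshold
```

Benign AGDs (e.g., those used by legitimate software for load balancing or connectivity checks) score high on the same feature, which is precisely why single-feature heuristics like this one misclassify and richer techniques are needed.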
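The takedown-strategy work can be illustrated with a small experiment on a synthetic preferential-attachment graph, a simplified (assumed) stand-in for bot-to-bot communication structure; the node counts and removal strategies below are illustrative only:

```python
import random
from collections import Counter

def preferential_graph(n, m=2, seed=1):
    """Grow a preferential-attachment graph: each new node links to m
    existing nodes chosen proportionally to their current degree."""
    rng = random.Random(seed)
    edges = {(0, 1)}
    repeated = [0, 1]          # each node appears once per incident edge
    for v in range(2, n):
        chosen = set()
        while len(chosen) < min(m, v):
            chosen.add(rng.choice(repeated))
        for u in chosen:
            edges.add((u, v))
            repeated += [u, v]
    return edges

def largest_component(nodes, edges):
    """Size of the largest connected component, via union-find."""
    parent = {v: v for v in nodes}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for u, v in edges:
        parent[find(u)] = find(v)
    return max(Counter(find(v) for v in parent).values()) if parent else 0

def takedown(edges, n, remove, by_degree):
    """Remove `remove` nodes (highest-degree hubs, or random picks) and
    return the size of the largest surviving component."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    if by_degree:
        removed = {v for v, _ in deg.most_common(remove)}
    else:
        removed = set(random.Random(0).sample(range(n), remove))
    nodes = set(range(n)) - removed
    kept = {(u, v) for u, v in edges if u in nodes and v in nodes}
    return largest_component(nodes, kept)
```

On scale-free-style graphs, hub-targeted takedown typically fragments the network far more than random removal; comparing strategies this way is the kind of question more realistic communication models let defenders answer.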
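The encrypted-search work can be sketched with a toy symmetric searchable index: the server stores only keyed keyword tokens, so holders of the key can search while the server never sees plaintext keywords. The scheme and names below are assumed simplifications (real schemes also address leakage, updates, and stronger threat models):

```python
import hmac
import hashlib

def index_token(key, keyword):
    """Deterministic search token: HMAC of the keyword under a secret
    key, so the server can match tokens without learning keywords."""
    return hmac.new(key, keyword.encode(), hashlib.sha256).hexdigest()

def build_index(key, records):
    """Map each token to the set of record IDs containing that keyword.
    (The records themselves would be stored encrypted separately.)"""
    idx = {}
    for rid, words in records.items():
        for w in words:
            idx.setdefault(index_token(key, w), set()).add(rid)
    return idx

def search(key, idx, keyword):
    """An authorized holder of `key` derives the token and queries."""
    return idx.get(index_token(key, keyword), set())
```

For example, an analyst holding the key could query an outsourced index of flow metadata for a suspicious domain without the storage provider learning which keyword was searched until the token is issued.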