SaTC: CORE: Medium: Large-Scale Data Driven Anomaly Detection and Diagnosis from System Logs

Ricci, Robert; Li, Feifei; Srikumar, Vivek

Abstract

Detecting unusual and anomalous behavior in computer systems is a critical part of ensuring they are secure and trustworthy. System logs, which record actions taken by programs, are a promising source of data for such anomaly detection. However, existing practices and tools for doing log analysis require deep expertise, as well as heavy human involvement in both defining and interpreting possible anomalies, which limits their scalability and effectiveness. This project's goal is to improve the state of the art around log-based anomaly detection by developing a framework called DeepLog through (a) advancing natural language processing techniques to extract structured information from a wide variety of log files to support analysis across different data sources and across time, (b) developing new methods to model legitimate workflows and log event sequences over time, (c) adapting machine learning methods to identify deviations from those workflows that represent potential anomalies, and (d) creating tools for system administrators to help them diagnose possible security issues more effectively and efficiently. The work will be integrated into a freely available software package to benefit both other researchers and practicing system administrators and used to support both classroom and research-based educational activities at the investigators' institutions.

For log parsing, the team will adapt named entity recognition methods to convert logs into structured key-value pairs of log event types and parameters. This covers both unstructured logs and structured logs whose structure is not pre-defined by, e.g., regular expressions. The resulting data can be seen as a multi-dimensional feature space whose contents are constrained by the execution of the underlying programs, and which therefore reflects a hidden structure that defines the set of valid, non-anomalous execution sequences. To help articulate this hidden structure, the team will develop long short-term memory (LSTM)-based neural network models that use both the key and value elements to extract semantically meaningful subsequences of program behavior from system runs known to be normal. Once these models are trained on known-good data, they can be applied to anomaly detection by flagging for consideration new log entries that are unexpected given the current state of the system, logs, and model; they can also be used to infer the underlying workflows and hidden structures described earlier. These models will be improved through online learning methods, administrators' feedback about the seriousness of reported anomalies, and generative adversarial training models that create execution sequences which, though anomalous, hew closely to the hidden structures embedded in the logs and the LSTM-based models.
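
The pipeline described above (parse each log entry into an event key plus parameters, learn which keys normally follow which, then flag unexpected entries) can be illustrated with a minimal sketch. It is not the project's implementation and rests on two loudly stated simplifications: a regular expression stands in for the proposed named entity recognition parser, and a count-based n-gram model stands in for the LSTM. The names `PARAM`, `parse_line`, and `NextKeyModel`, the `<*>` placeholder, and the sample log lines are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumption: variable fields are IPv4 addresses, hex ids, or plain integers.
# The project proposes NER-based parsing; this regex is only a stand-in.
PARAM = re.compile(r"\b(?:\d{1,3}(?:\.\d{1,3}){3}|0x[0-9a-fA-F]+|\d+)\b")


def parse_line(line):
    """Split a raw log line into (event key, list of parameter values)."""
    params = PARAM.findall(line)
    key = PARAM.sub("<*>", line).strip()
    return key, params


class NextKeyModel:
    """Count-based stand-in for a sequence model (not an LSTM).

    Learns how often each event key follows the previous `window` keys
    in known-normal sequences.
    """

    def __init__(self, window=2, top_k=1):
        self.window = window
        self.top_k = top_k
        self.counts = defaultdict(Counter)

    def fit(self, sequences):
        for seq in sequences:
            for i in range(self.window, len(seq)):
                context = tuple(seq[i - self.window:i])
                self.counts[context][seq[i]] += 1

    def is_anomalous(self, context, actual):
        seen = self.counts.get(tuple(context))
        if not seen:
            return True  # context never observed during normal runs
        likely = [key for key, _ in seen.most_common(self.top_k)]
        return actual not in likely

    def scan(self, seq):
        """Return indices of entries that are not among the top-k predictions."""
        return [
            i
            for i in range(self.window, len(seq))
            if self.is_anomalous(seq[i - self.window:i], seq[i])
        ]


# Hypothetical known-normal run: open, read, close, repeated.
normal_logs = [
    "open file 17", "read 512 bytes", "close file 17",
    "open file 18", "read 1024 bytes", "close file 18",
]
train_keys = [parse_line(line)[0] for line in normal_logs]

model = NextKeyModel(window=2, top_k=1)
model.fit([train_keys])

# Hypothetical new run in which a delete replaces the expected close.
new_logs = ["open file 19", "read 64 bytes", "delete file 19"]
new_keys = [parse_line(line)[0] for line in new_logs]
flagged = model.scan(new_keys)  # indices of unexpected entries
```

In this toy run the third entry is flagged because "delete file <*>" never follows the open/read context in the training data. A learned sequence model would take the place of the count table, and the parameter values returned by `parse_line` (ignored here) would also feed the model, as the abstract describes.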

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Type: Standard Grant (Standard)
Application #: 1801446
Program Officer: Wei-Shinn Ku

Budget Start: 2018-08-01
Budget End: 2022-07-31
Fiscal Year: 2018
Total Cost: $1,100,000

Institution: University of Utah, Salt Lake City, UT, United States
