Recent advances in sensors and high-throughput data acquisition technologies have made it possible to collect massive amounts of data, especially time series data, in a number of domains (e.g., climate science and the biological sciences). While a wide range of techniques have been developed for clustering and mining such data, there has been limited progress on scalable algorithms for extracting causal relationships from time series data. This project aims to develop novel machine learning models based on Granger causality to uncover complex dependence structures in high-dimensional time series. The resulting algorithms will be evaluated in the context of two real-world applications (climate change, computational biology).

The project aims to address three fundamental challenges in the analysis of time series data: (1) developing the theoretical foundations of causality analysis for time series data to quantify the gap between Granger causality and true causality, (2) developing a unified framework to incorporate different types of domain knowledge into data analysis, and (3) examining effective solutions to important but usually overlooked practical issues, including the irregular nature of time series and scalability. The resulting algorithms will be evaluated on two real applications, namely gene regulatory network discovery in immune systems and climate change attribution, in collaboration with researchers in biology and climate science.

The proposed research could impact multiple application domains where the discovery of causal relationships from high-dimensional time series data is of interest. The project is expected to advance the theoretical foundations of data analytic techniques for time series data and provide a unified framework that can easily integrate domain knowledge. The results of this project can be expected to significantly advance the current state of the art in eliciting insights regarding causal relationships from time series data. In addition to the core research advances, this project contributes easy-to-use, workflow-based software for teaching machine learning to students, researchers, and practitioners with a broad range of backgrounds. Educational and outreach activities include new interdisciplinary courses, workshops, tutorials, and high-school visits. Software and data resulting from this work will be freely disseminated to the broader research and educational community. Additional information about the project can be found at: http://www-bcf.usc.edu/~liu32/uscTimeSeries.htm.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1254206
Program Officer: Sylvia Spengler
Project Start:
Project End:
Budget Start: 2013-03-15
Budget End: 2021-02-28
Support Year:
Fiscal Year: 2012
Total Cost: $510,385
Indirect Cost:
Name: University of Southern California
Department:
Type:
DUNS #:
City: Los Angeles
State: CA
Country: United States
Zip Code: 90089