With the increasing availability of large-scale datasets, such as those from intensive care units (ICUs), researchers face a flood of data that does not lead immediately to knowledge. Given its volume and frequency of collection (ICU patients are monitored every 5 seconds), many important events will be rare occurrences. Unlike the traditional approach of prospectively measuring a small set of variables hypothesized to be important, these observational datasets contain a large, unselected, and incomplete set of features. They can allow insight into cases where experiments are infeasible, but using them for decision-making requires new methods for finding the impact of rare events and hidden variables in complex time series, along with realistic simulated data for evaluation.

This proposal addresses two main challenges of large-scale observational data: 1) evaluating the causal impact of rare events, and 2) identifying latent causes. First, we leverage the volume of data and the connection between type (general) and token (singular) causality to infer a model of how a system normally functions, and then determine whether rare events explain a deviation from usual behavior. This basic approach of comparing a model against observed instances also forms the basis for finding latent variables, where we aim to determine how much of a variable's value (or how many of its occurrences) is due to influences outside the dataset, and to find shared causes for sets of variables. This work is motivated by applications to neurological ICU (NICU) data streams, where the volume of continuous recordings of patients' brain activity and physiological signs surpasses clinicians' ability to find complex patterns in real time and use them for treatment. Further, clinicians need to know not just that a patient is having a seizure (a low-probability event with a potentially significant impact on outcomes), but whether it is causing harm, before they can determine how to treat it.
To enable rigorous validation of the algorithms, we develop a new computational platform for generating simulated NICU time series data. The methods will improve understanding of seizures in stroke patients and will be broadly applicable to large-scale, high-resolution time series data, enabling discoveries in areas such as computational social science.

Public Health Relevance

The methods developed will improve the translation of data to knowledge to policy by identifying actionable information on causes, enabling better and more rapid decision-making by clinicians. Creating and disseminating realistic simulated data will allow for comparison and validation of methods, facilitating computational advances by researchers in computer science and medicine.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project (R01)
Project #
5R01LM011826-02
Application #
8667497
Study Section
Special Emphasis Panel (ZRG1-BST-N (52))
Program Officer
Sim, Hua-Chuan
Project Start
2013-06-01
Project End
2016-05-31
Budget Start
2014-06-01
Budget End
2015-05-31
Support Year
2
Fiscal Year
2014
Total Cost
$212,104
Indirect Cost
$51,822
Name
Stevens Institute of Technology
Department
Type
DUNS #
064271570
City
Hoboken
State
NJ
Country
United States
Zip Code
07030