With the increasing availability of large-scale datasets, such as those from intensive care units (ICUs), researchers face a flood of data that does not immediately lead to knowledge. Given the volume of these data and the frequency of collection (ICU patients are monitored every 5 seconds), many important events will be rare occurrences. Unlike the traditional approach of prospectively measuring a small set of variables hypothesized to be important, these observational datasets contain a large, unselected, and incomplete set of features. They can allow insight into cases where experiments are infeasible, but using them for decision-making requires new methods for finding the impact of rare events and hidden variables in complex time series, along with realistic simulated data for evaluation. This proposal addresses two main challenges of large-scale observational data: 1) evaluating the causal impact of rare events, and 2) identifying latent causes. First, we leverage the volume of data and the connection between type (general) and token (singular) causality to infer a model of how a system normally functions, and then determine whether rare events explain a deviation from usual behavior. The basic approach of comparing a model and observed instances forms the basis for finding latent variables, where we aim to find how much of a variable's value (or how many of its occurrences) is due to influences outside the dataset and to find shared causes for sets of variables. This is motivated by applications to neurological ICU (NICU) data streams, where the volume of continuous recordings of patients' brain activity and physiological signs surpasses clinicians' ability to find complex patterns in real time and use them for treatment. Further, clinicians need to know not just that a patient is having a seizure (a low-probability event with a potentially significant impact on outcomes), but whether it is causing harm, before they can determine how to treat it.
To enable rigorous validation of the algorithms, we develop a new computational platform for generating simulated NICU time series data. The methods will improve understanding of seizures in stroke patients and will be broadly applicable to large-scale, high-resolution time series data, enabling discoveries in areas such as computational social science.

Public Health Relevance

The methods developed will improve the translation of data to knowledge to policy by identifying actionable information on causes, enabling better and more rapid decision-making by clinicians. Creating and disseminating realistic simulated data will allow for comparison and validation of methods, facilitating computational advances by researchers in computer science and medicine.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Sim, Hua-Chuan
Stevens Institute of Technology
United States
Morris, Nicholas A; Robinson, David; Schmidt, J Michael et al. (2018) Hunt-Hess 5 subarachnoid haemorrhage presenting with cardiac arrest is associated with larger volume bleeds. Resuscitation 123:71-76
Zheng, Min; Claassen, Jan; Kleinberg, Samantha (2018) Automated Identification of Causal Moderators in Time-Series Data. Proc Mach Learn Res 92:4-22
Reznik, Michael E; Mahta, Ali; Schmidt, J Michael et al. (2018) Duration of Agitation, Fluctuations of Consciousness, and Associations with Outcome in Patients with Subarachnoid Hemorrhage. Neurocrit Care 29:33-39
Witsch, Jens; Frey, Hans-Peter; Schmidt, J Michael et al. (2017) Electroencephalographic Periodic Discharges and Frequency-Dependent Brain Tissue Hypoxia in Acute Brain Injury. JAMA Neurol 74:301-309
Hum, R Stanley; Kleinberg, Samantha (2017) Replicability, Reproducibility, and Agent-based Simulation of Interventions. AMIA Annu Symp Proc 2017:959-968
Reznik, Michael E; Schmidt, J Michael; Mahta, Ali et al. (2017) Agitation After Subarachnoid Hemorrhage: A Frequent Omen of Hospital Complications Associated with Worse Outcomes. Neurocrit Care 26:428-435
Mikell, Charles B; Dyster, Timothy G; Claassen, Jan (2016) Invasive seizure monitoring in the critically-Ill brain injury patient: Current practices and a review of the literature. Seizure 41:201-5
Claassen, Jan; Rahman, Shah Atiqur; Huang, Yuxiao et al. (2016) Causal Structure of Brain Physiology after Brain Injury from Subarachnoid Hemorrhage. PLoS One 11:e0149878
Claassen, Jan; Velazquez, Angela; Meyers, Emma et al. (2016) Bedside quantitative electroencephalography improves assessment of consciousness in comatose subarachnoid hemorrhage patients. Ann Neurol 80:541-53
Heintzman, Nathaniel; Kleinberg, Samantha (2016) Using uncertain data from body-worn sensors to gain insight into type 1 diabetes. J Biomed Inform 63:259-268
