Massive longitudinal healthcare data, such as administrative claims and electronic health records, provide an opportunity to greatly enhance the accuracy and clinical impact of patient-level predictions across a wide range of outcomes. This research targets the national priority domain of healthcare IT and showcases the advances that Big Data afford in helping patients make informed healthcare decisions that lead to improved outcomes. Other stakeholders include healthcare providers, insurers and governmental agencies, and the databases this proposed grant employs encompass diverse and vulnerable patient populations, including the young, the poor and the elderly. Within this context, this grant seeks to predict patient-level health events based upon personal characteristics and conditions. Accurate and well-calibrated predictions could significantly improve the wellbeing of patients and populations. This grant proposes to derive predictive models from massive observational data and then, for example, predict that a particular patient has an 18% chance of experiencing a stroke in the next 12 months. With this prediction in hand, caregivers and patients can optimize medical interventions and implement behavioral changes that may prevent the predicted event. Further, this grant integrates two graduate student researchers, whose mentored experiences begin to rectify the shortage of data scientists trained at the intersection of statistics and medicine, and provides general statistical software tools for building large-scale predictive models from massive data across scientific domains.

From a technical perspective, the proposed grant aims first to evaluate the performance and applicability of an existing predictive model across five administrative claims and electronic health record databases covering over 80 million lives, using CHADS2 stroke risk as a motivating example. Then the grant will develop an innovative data-driven process for building patient-level predictive models from longitudinal observational data, and initially apply the process to predicting stroke in patients with atrial fibrillation for comparison of performance against CHADS2. Finally, the grant aims to explore characteristics of the process and resulting models, such as: evaluation of out-of-sample predictive performance in different databases; consideration of how models change over time; and assessment of which clinical variables most substantially contribute to patient-level predictions. Together, this research will focus on identifying heuristics to extract clinically relevant predictors from longitudinal electronic healthcare data, developing algorithms to use this information in multivariate modeling through massive parallelization using graphics processing units, optimized for data sparsity, and evaluating performance based on accuracy in predicting outcomes at the patient level. As a proof-of-concept, the grant will develop an approach to predict stroke risk and apply this approach across five disparate data sources (80+ million patients, including drugs, lab values, procedures, emergency room visits, primary care visits, inpatient encounters, etc.) that reflect diverse patient populations across the US, including the privately insured, Medicare-eligible, and Medicaid beneficiaries.
The underlying goal of the grant is to apply innovative statistical and machine learning techniques using advancing computer technology to large-scale observational data to develop accurate and well-calibrated patient-level predictive models enabling the prediction of future medical events for individual patients.
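
To make the CHADS2 baseline and the notion of patient-level discrimination concrete, the following is a minimal sketch in Python. It assumes the standard published CHADS2 point assignments (one point each for congestive heart failure, hypertension, age 75 or older, and diabetes; two points for prior stroke or transient ischemic attack). The names `Patient`, `chads2_score`, and `auc` are hypothetical and chosen for illustration only; this is not the project's software, and the pairwise AUC shown here is just one simple way to measure how well a score ranks patients who experience an event above those who do not.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Patient:
    """Hypothetical record holding only the five CHADS2 inputs."""
    congestive_heart_failure: bool
    hypertension: bool
    age: int
    diabetes: bool
    prior_stroke_or_tia: bool


def chads2_score(p: Patient) -> int:
    """Return the CHADS2 score (0 to 6) under the standard point assignments."""
    return (
        int(p.congestive_heart_failure)   # C: 1 point
        + int(p.hypertension)             # H: 1 point
        + int(p.age >= 75)                # A: 1 point
        + int(p.diabetes)                 # D: 1 point
        + 2 * int(p.prior_stroke_or_tia)  # S2: 2 points
    )


def auc(scores: Sequence[float], outcomes: Sequence[bool]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen patient with the
    outcome has a higher score than a randomly chosen patient without
    it, counting ties as one half.
    """
    pos = [s for s, y in zip(scores, outcomes) if y]
    neg = [s for s, y in zip(scores, outcomes) if not y]
    if not pos or not neg:
        raise ValueError("need at least one patient with and one without the outcome")
    wins = sum(
        1.0 if sp > sn else 0.5 if sp == sn else 0.0
        for sp in pos
        for sn in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up toy cohort, for illustration only (not real patient data).
    cohort = [
        Patient(True, True, 80, True, True),
        Patient(False, True, 68, False, False),
        Patient(False, False, 55, False, False),
        Patient(False, True, 77, True, False),
    ]
    had_stroke = [True, False, False, True]
    scores = [chads2_score(p) for p in cohort]
    print("CHADS2 scores:", scores)
    print("AUC:", auc(scores, had_stroke))
```

The pairwise loop is quadratic in cohort size, which is acceptable for a toy example; at the scale of tens of millions of patients a rank-based computation would be the more practical choice.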

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1251151
Program Officer: Sylvia J. Spengler
Project Start:
Project End:
Budget Start: 2013-07-01
Budget End: 2017-06-30
Support Year:
Fiscal Year: 2012
Total Cost: $688,969
Indirect Cost:
Name: University of California Los Angeles
Department:
Type:
DUNS #:
City: Los Angeles
State: CA
Country: United States
Zip Code: 90095