Patient care has been transformed by the availability of high-dimensional data sources such as electronic health records (EHR) and genomic data, allowing health care decisions to be tailored to individual patients. Statistical methods have been developed to use such high-dimensional data efficiently, but critical gaps remain. Several common models for survival analysis have recently been extended to accommodate high-dimensional variable selection and machine learning prediction methods, but similar tools have not yet been developed for the setting of semi-competing risks. In the semi-competing risks setting, interest focuses on jointly modeling a terminal time-to-event outcome and a non-terminal time-to-event outcome that can occur only for subjects who have not yet experienced the terminal event. Examples arise in severe pregnancy-related diseases such as pre-eclampsia (PE; further described below). PE and subsequent delivery are natural semi-competing risks, as PE can develop before delivery but not after. Current methods do not provide analysts with data-driven tools for uncovering important covariates from high-dimensional data, and clinicians lack meaningful, personalized predictions of patients' joint probability of experiencing one or both outcomes prospectively through time. This proposal addresses these methodological gaps with tools for high-dimensional inference and prediction.
In Aim 1, I will address the challenge of variable selection by developing a suite of regularized estimators for selecting important covariates from large datasets into a semi-competing risks model, and evaluating performance by simulation.
In Aim 2, I will create a deep feedforward neural network modeling framework for predicting individual patients' joint probabilities of experiencing one or both outcomes of interest across future time points. Together, these aims will improve personalization of health care decisions. Software will be developed that gives researchers practical and user-friendly tools for applying these methods.
In Aim 3, I will apply these approaches for semi-competing risks to evaluate risk of PE, which is globally a leading cause of maternal and fetal/neonatal mortality and morbidity. Using EHR pregnancy data from 50,000 births between 2011 and 2020, I will use the proposed variable selection methods to develop a model identifying risk factors for PE along with factors affecting time-to-delivery among PE patients. Through this work, I will also build a deep learning model to jointly predict maternal PE and NICU admission of the infant, yielding personalized prediction plots to facilitate care decisions that balance maternal and fetal health risks. For ease of use by clinicians and patients, I will disseminate this prediction model using an interactive online tool.
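As an illustration of the semi-competing risks structure described above, the following minimal sketch shows one conventional way to record a subject's observed data as two times and two event indicators. It is not taken from the proposal: the class name, function name, field names, and the example times (in weeks of gestation) are all hypothetical, and right-censoring at a single follow-up time is assumed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SemiCompetingObs:
    """Observed data for one subject (hypothetical field names)."""
    y1: float    # observed time for the non-terminal event (e.g. PE onset)
    delta1: int  # 1 if the non-terminal event was observed, else 0
    y2: float    # observed time for the terminal event (e.g. delivery)
    delta2: int  # 1 if the terminal event was observed, else 0


def observe(t1: Optional[float], t2: float, c: float) -> SemiCompetingObs:
    """Map latent event times to observed data under right-censoring.

    t1: latent non-terminal event time, or None if it would never occur
    t2: latent terminal event time
    c:  censoring time (end of follow-up), an assumption of this sketch
    """
    y2 = min(t2, c)
    delta2 = int(t2 <= c)
    # The non-terminal event is observed only if it precedes both the
    # terminal event and the censoring time.
    if t1 is not None and t1 < t2 and t1 <= c:
        return SemiCompetingObs(y1=t1, delta1=1, y2=y2, delta2=delta2)
    # Otherwise the non-terminal time is censored at the terminal event
    # or at the end of follow-up, whichever came first.
    return SemiCompetingObs(y1=y2, delta1=0, y2=y2, delta2=delta2)


# Hypothetical subjects: PE at week 30 then delivery at week 37,
# and delivery at week 39 with no PE.
pe_then_delivery = observe(t1=30.0, t2=37.0, c=42.0)
delivery_only = observe(t1=None, t2=39.0, c=42.0)
```

The asymmetry is the point of the setting: the terminal event censors the non-terminal one, but not the reverse, so a subject with PE still contributes a delivery time.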

Public Health Relevance

Using personalized risk prediction to help clinicians and patients make health care decisions is a vital and rapidly growing way to improve outcomes and quality of care. However, in the common survival analysis setting known as semi-competing risks, where both a non-terminal event and a terminal event are of interest, adequate methods are lacking for modeling patients' joint risks using high-dimensional data sources such as electronic health records. The focus of this proposal is to develop statistical and machine-learning methods for this setting that predict individual patients' prospective joint risk over time of experiencing one or both outcomes of interest, and to apply them to risk stratification and individualized prediction of outcomes for pre-eclampsia in pregnant women.

Agency
National Institutes of Health (NIH)
Institute
Eunice Kennedy Shriver National Institute of Child Health & Human Development (NICHD)
Type
Predoctoral Individual National Research Service Award (F31)
Project #
1F31HD102159-01
Application #
9992419
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Koso-Thomas, Marion
Project Start
2020-09-01
Project End
2022-08-31
Budget Start
2020-09-01
Budget End
2021-08-31
Support Year
1
Fiscal Year
2020
Total Cost
Indirect Cost
Name
Harvard University
Department
Biostatistics & Other Math Sci
Type
Schools of Public Health
DUNS #
149617367
City
Boston
State
MA
Country
United States
Zip Code
02115