Patient care has been transformed by the availability of high-dimensional data sources such as electronic health records (EHR) and genomic data, which allow health care decisions to be tailored to individual patients. Statistical methods have been developed to use such high-dimensional data efficiently, but critical gaps remain. Several common survival analysis models have recently been extended to accommodate high-dimensional variable selection and machine learning prediction methods, but comparable tools have not yet been developed for the semi-competing risks setting. In this setting, the goal is to jointly model a terminal time-to-event outcome and a non-terminal time-to-event outcome that can only occur among subjects who have not yet experienced the terminal event. One example arises in severe pregnancy-related diseases such as preeclampsia (PE, described further below). PE and subsequent delivery are natural semi-competing risks, because PE can develop before delivery but not after it. Current methods do not give analysts data-driven tools for identifying important covariates in high-dimensional data, and clinicians lack meaningful, personalized predictions of a patient's joint probability of experiencing one or both outcomes prospectively over time. This proposal addresses these methodological gaps with tools for high-dimensional inference and prediction.
In Aim 1, I will address the challenge of variable selection by developing a suite of regularized estimators that select important covariates from large datasets for inclusion in a semi-competing risks model, and I will evaluate their performance through simulation.
In Aim 2, I will create a deep feedforward neural network modeling framework for predicting individual patients' joint probabilities of experiencing one or both outcomes of interest across future time points. Together, these aims will improve the personalization of health care decisions. I will also develop software that gives researchers practical, user-friendly tools for applying these methods.
In Aim 3, I will apply these semi-competing risks approaches to evaluate the risk of PE, which is a leading global cause of maternal and fetal/neonatal mortality and morbidity. Using EHR pregnancy data from 50,000 births between 2011 and 2020, I will apply the proposed variable selection methods to develop a model that identifies risk factors for PE, as well as factors affecting time to delivery among patients with PE. I will also build a deep learning model to jointly predict maternal PE and the infant's admission to the neonatal intensive care unit (NICU), yielding personalized prediction plots that facilitate care decisions balancing maternal and fetal health risks. To make this prediction model easy for clinicians and patients to use, I will disseminate it through an interactive online tool.
Using personalized risk prediction to help clinicians and patients make health care decisions is a vital and rapidly growing approach to improving outcomes and quality of care. However, in the common survival analysis setting known as semi-competing risks, where both a non-terminal event and a terminal event are of interest, adequate methods are lacking for modeling patients' joint risks using high-dimensional data sources such as electronic health records. This proposal focuses on developing statistical and machine learning methods for this setting that predict individual patients' prospective joint risk over time of experiencing one or both outcomes of interest, and on applying these methods to risk stratification and individualized prediction of preeclampsia outcomes in pregnant women.