With growing interest in personalized medicine and the rise of machine learning, constructing good risk prediction and prognostic models has been drawing renewed attention. Much of this effort is concentrated on identifying good predictors of patient outcomes, while the same level of rigor is often not applied to the outcome side of prediction. Most popular supervised techniques that can be readily applied in risk model development (e.g., regularized logistic regression and its variations) assume that the prediction target is a clear, single outcome measured at a single time point. In clinical reality, patient outcomes are often complex, multivariate, and measured with error. Even when the target is a relatively clear univariate outcome (e.g., death, cancer, or diabetes), the process that leads to this ultimate outcome often involves complex intermediate outcomes, and predicting and understanding this intermediate process can be crucial for providing effective care and preventing negative ultimate outcomes. The situation calls for a flexible learning framework that can easily incorporate an important but neglected aspect of model development: better characterizing and constructing prediction targets before building prediction models. Focusing on risk labels as prediction targets, we propose a pragmatic 3-stage learning approach in which we sequentially 1) generate latent labels, 2) validate them using explicit validators, and 3) proceed with supervised learning on the labeled data. Latent variable (LV) strategies used in Stage 1 have great potential for handling complex outcome information. The unsupervised nature of LV strategies makes highly flexible data synthesis and organization possible. The same nature, however, can also be seen as esoteric and subjective, which is undesirable in settings where transparency and reproducibility are of great concern, such as risk prediction.
As a practical solution to this problem, we propose the use of explicit clinical validators, which not only keep LV-based labels closely aligned with contemporary science and clinical practice but also make it possible to automatically validate and narrow a large pool of candidate labels. With the goal of developing a practical and transparent system of learning and inference for clinical research and practice, we formed a highly interdisciplinary team of researchers with expertise in latent variable modeling, machine learning, psychometrics, and causal inference, along with clinical/substantive expertise. Our streamlined learning framework focuses on direct and transparent validation of latent variable solutions to ensure clear communication among risk model developers, clinical researchers, and practitioners. The project ultimately aims to improve personalized treatment and care by improving risk prediction.
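The three stages (generate candidate latent labels, validate and narrow them with an explicit validator, then fit a supervised model) can be illustrated with a minimal, self-contained Python sketch. This is not the project's method; it rests on several assumptions. Plain k-means clustering stands in for a latent variable model (such as a latent class or mixture model) in Stage 1. The validator is assumed to be a single binary clinical indicator, and the Stage 2 "validator gap" criterion, the `min_gap` threshold, and every function name (`kmeans`, `class_rates`, `validator_gap`, `fit_logistic`, `three_stage`) are hypothetical. Stage 3 uses ordinary logistic regression fitted by gradient descent.

```python
import math
import random

# Illustrative sketch only: kmeans() stands in for a latent variable model,
# and the validator criterion and threshold are hypothetical choices.


def kmeans(points, k, seed, iters=50):
    """Stage 1 stand-in: partition multivariate outcome vectors into k latent classes."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each outcome vector to its nearest center (squared Euclidean distance).
        labels = [min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
                  for p in points]
        # Move each center to the mean of its members; an empty class keeps its old center.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def class_rates(labels, validator):
    """Observed rate of a binary clinical validator within each latent class."""
    rates = {}
    for j in set(labels):
        vals = [v for lab, v in zip(labels, validator) if lab == j]
        rates[j] = sum(vals) / len(vals)
    return rates


def validator_gap(labels, validator):
    """Stage 2 criterion (hypothetical): spread of validator rates across classes."""
    rates = class_rates(labels, validator).values()
    return max(rates) - min(rates)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    # w holds one weight per predictor followed by the intercept.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Stage 3: logistic regression fitted by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            err = predict_proba(w, x) - t
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad[-1] += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w


def three_stage(outcomes, validator, predictors, ks=(2, 3, 4), seeds=(0, 1, 2), min_gap=0.3):
    # Stage 1: generate a pool of candidate latent labelings.
    candidates = [kmeans(outcomes, k, s) for k in ks for s in seeds]
    # Stage 2: discard candidates that fail the validator check, then keep the best.
    scored = [(validator_gap(c, validator), c) for c in candidates]
    passed = [sc for sc in scored if sc[0] >= min_gap]
    if not passed:
        raise ValueError("no candidate label passed validation")
    gap, best = max(passed, key=lambda sc: sc[0])
    # Collapse the chosen labeling into a binary risk label: the class with the
    # highest validator rate is treated as "high risk".
    rates = class_rates(best, validator)
    top = max(rates, key=rates.get)
    y = [1 if lab == top else 0 for lab in best]
    # Stage 3: supervised learning from baseline predictors to the validated label.
    return gap, y, fit_logistic(predictors, y)


if __name__ == "__main__":
    rng = random.Random(42)
    outcomes, validator, predictors = [], [], []
    for i in range(100):
        g = i % 2  # simulated group membership, for illustration only
        outcomes.append([5.0 * g + rng.gauss(0, 0.5), 5.0 * g + rng.gauss(0, 0.5)])
        validator.append(g)
        predictors.append([2.0 * g - 1.0 + rng.gauss(0, 0.3)])
    gap, y, w = three_stage(outcomes, validator, predictors)
    print(f"validator gap = {gap:.2f}; P(high risk | x=1) = {predict_proba(w, [1.0]):.2f}")
```

Generating several candidate labelings (different numbers of classes and random starts) and filtering them against an explicit validator mirrors the idea of automatically narrowing a large pool of candidate labels. In a real application, the validators and the acceptance rule would presumably be set with clinical input rather than by a single numeric threshold.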

Public Health Relevance

This project intends to develop a pragmatic learning and risk prediction framework that will facilitate the use of multivariate data collected from research and health care services, data that are otherwise underutilized in developing methods to improve the personalized care of future patients. The project ultimately aims to improve personalized treatment and care by improving risk prediction, and will therefore have a positive impact on public health.

Agency
National Institutes of Health (NIH)
Institute
National Institute of Mental Health (NIMH)
Type
Research Project (R01)
Project #
1R01MH123443-01
Application #
10033908
Study Section
Mental Health Services Research Committee (SERV)
Program Officer
Freed, Michael
Project Start
2020-07-08
Project End
2023-05-31
Budget Start
2020-07-08
Budget End
2021-05-31
Support Year
1
Fiscal Year
2020
Total Cost
Indirect Cost
Name
Stanford University
Department
Psychiatry
Type
Schools of Medicine
DUNS #
009214214
City
Stanford
State
CA
Country
United States
Zip Code
94305