A fundamental challenge in precision medicine is to understand the patterns of differentiation between individuals. To address this challenge, we propose to go beyond the traditional 'one disease--one model' view of bioinformatics and pursue a new view built upon personalized patient models, one that facilitates precision medicine by leveraging both commonalities within a patient cohort and signatures unique to each patient. With the emergence of large-scale databases such as The Cancer Genome Atlas (TCGA), the International Cancer Genome Consortium (ICGC), and the Gene Expression Omnibus (GEO), which collect multi-omic data on many different diseases, a new "pan-omics" and "pan-disease" paradigm has emerged to jointly analyze all patients in a disease cohort while accounting for patient-specific effects. An example of this is the recently released Pan-Cancer Atlas. At the same time, next-generation statistical tools to accurately and rigorously draw the necessary inferences are lacking. In this project we propose a series of mathematically rigorous, statistically sound, and computationally feasible approaches to infer sample-specific models, providing a more complete view of heterogeneous datasets. By bringing together ideas from the machine learning, statistics, and mathematical optimization communities, we provide a rigorous framework for precision medicine via sample-specific statistical models. Crucially, we propose to analyze this framework and prove strong theoretical guarantees under weak assumptions, which sharply distinguishes our framework from much of the existing literature. Towards these goals, we propose the following aims:
Aim 1: Discovery of new molecular profiles with sample-specific statistical models. We propose a general framework for inferring sample-specific models with low-rank structure based on the novel concept of distance-matching. This allows us to infer statistical models at the level of a single patient without overfitting, and is general enough to be applied to prediction, classification, and network inference as well as to a variety of diseases and phenotypes.
Aim 2: Multimodal approaches to personalized diagnosis: contextually interpretable models for actionable clinical decision support. In order to translate these models into practice, we propose a novel interpretable predictive model that supports complex, multimodal data types such as images and text combined with high-level interpretable features such as SNP data, gender, age, etc. This framework simultaneously boosts the accuracy of clinical predictions by exploiting sample heterogeneity while providing human-digestible explanations for the predictions being made.
Aim 3: Next-generation precision medicine: algorithms and software for personalized estimation. To put our models into practical use, we will develop new algorithms for interpretable prediction of personalized clinical outcomes and visualization of personalized statistical models. All of our tools will be combined into a user-friendly software package called PrecisionX that will be freely available to researchers and clinicians everywhere.

Public Health Relevance

Personalization with data is a critical challenge whenever decisions must be made at scale, and has applications that go beyond precision medicine; businesses, educational institutions, and financial institutions are among the many players that have acknowledged a stake in this complex problem. We expect the proposed work to provide a rigorous foundation for personalization with large and high-dimensional datasets, finding use throughout the broader scientific community as well as in industry and educational institutions. Alongside our collaboration with Pitt/UPMC, we will work with physicians and data scientists to gather practical feedback and will provide training in the methods developed.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
1R01GM140467-01
Application #
10133782
Study Section
Special Emphasis Panel (ZGM1)
Program Officer
Ravichandran, Veerasamy
Project Start
2020-09-01
Project End
2024-08-31
Budget Start
2020-09-01
Budget End
2021-08-31
Support Year
1
Fiscal Year
2020
Total Cost
Indirect Cost
Name
Carnegie-Mellon University
Department
Biostatistics & Other Math Sci
Type
Schools of Arts and Sciences
DUNS #
052184116
City
Pittsburgh
State
PA
Country
United States
Zip Code
15213