Success in personalized and precision cancer genomics depends on two things: (1) discovering and understanding the molecular-level mechanisms by which genetic alterations influence cellular processes relevant to cancer, and (2) using molecular signatures to tailor treatment strategies to individual patients. To achieve these goals, various high-throughput experimental methods have been developed in recent years to measure a patient's cancer genome sequence, mRNA expression, protein expression, epigenetic readout, and other detailed characteristics of the tumor. However, algorithms that fully harness such massive amounts of high-dimensional data to yield biomedical insights are often lacking. This project will advance data-driven complex modeling of cancer genomic data for personalized cancer treatment by developing novel algorithms based on emerging techniques in high-dimensional machine learning. The results of this research have the potential to impact both the machine learning and computational genomics fields. The educational components integrated with the research program will develop new curriculum materials, involve undergraduate students and underrepresented groups in research, and train a new generation of interdisciplinary graduate researchers.

This project consists of two synergistic research thrusts to develop novel high-dimensional machine learning algorithms for analyzing high-throughput cancer genomic data. First, the project will develop high-dimensional graphical models for multi-view data modeling to integrate data from heterogeneous genome-wide data sources. Second, it will devise novel high-dimensional collaborative learning methods for personalized drug recommendation. The high-dimensional graphical models will be used to estimate networks for different cancer subtypes. These networks will then be integrated into the recommendation algorithms, which in turn will help improve the multi-view graphical model estimation. This project will enhance the ability to interpret large-scale cancer genomics data by pinpointing the roles of complex molecular interactions in cancer onset and progression, enabling more effective discovery of personalized molecular signatures and more targeted candidate treatments for cancer. Such technical innovation and conceptual advancement have the potential to reshape how researchers approach graphical model estimation and its role in biological contexts. The project may open up new possibilities for both theoreticians and practitioners in machine learning, computational biology, and other disciplines.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1717205
Program Officer: Sylvia Spengler
Project Start:
Project End:
Budget Start: 2017-08-01
Budget End: 2021-07-31
Support Year:
Fiscal Year: 2017
Total Cost: $200,000
Indirect Cost:
Name: Carnegie-Mellon University
Department:
Type:
DUNS #:
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213