The goal of this study is to develop machine learning methods, especially deep learning models (DLMs), that learn a better representation of the activation states of cellular signaling pathways in an individual tumor and use that information to predict the tumor's sensitivity to anti-cancer drugs. Cancer is mainly caused by somatic genome alterations (SGAs) that perturb cellular signaling pathways, and aberrations in these pathways eventually lead to cancer development. Precision oncology aims to accurately detect and target tumor-specific aberrations, but challenges remain. There is currently no well-established method for detecting the activation states of signaling pathways, and the common practice of using the mutation status of a targeted gene as the indication for prescribing a molecularly targeted drug has limitations. To overcome these limitations, we hypothesize that, by closely simulating the hierarchical organization of cellular signaling systems, DLMs can be used to systematically identify major cancer signaling pathways, to detect tumor-specific aberrations in those pathways, and to predict cancer cell sensitivity to anti-cancer drugs. We will develop models that more precisely represent the state of signaling systems in cancer cells and use this information to enhance precision oncology. We will design and apply innovative DLMs to cancer big data, including large-scale pharmacogenomic data and cancer omics data, to learn a unified representation of the aberrations in signaling systems caused by driver SGAs in cancer cells, regardless of their growth conditions (cell culture, patient-derived xenografts (PDXs), or real tumors). This will enable us to transfer models trained on cell lines and PDXs to the clinical setting (real tumors) in the future. Because drugs may share common target proteins, we will develop a model, DLM-MLT (a combination of a DLM and multi-task learning), to predict the sensitivity of tumor samples to multiple drugs at once.
Furthermore, we will develop a model, BioSI-DLM, that uses various perturbations (e.g., SGA and LINCS perturbation data) as side information to learn a better representation, one that potentially maps latent variables in a DLM to biological entities. We hypothesize that the representations learned by our models will significantly improve prediction accuracy compared with the conventional indication for drug treatment (e.g., the mutation status of the drug's target protein). In summary, our study uses deep-learning-based methods to learn a better, more concise representation from cancer omics data that reflects personalized genomic changes and could be used to guide personalized treatment. Our study could contribute significantly to the development of cancer ontology and promote the development of precision medicine.
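
The multi-task idea described above (one shared representation feeding a separate prediction for each drug) can be illustrated with a minimal sketch. This is not the proposed DLM-MLT model: the layer sizes, the random weights, the binary SGA input encoding, and the names `init_params` and `predict_sensitivity` are all assumptions made for illustration, and only the forward pass is shown.

```python
import numpy as np

def init_params(n_genes, n_hidden, n_drugs, seed=0):
    """Randomly initialize a toy network (hypothetical sizes, untrained)."""
    rng = np.random.default_rng(seed)
    return {
        # Shared layer: maps SGA status to a latent "pathway state" vector.
        "W_shared": rng.normal(0.0, 0.1, size=(n_genes, n_hidden)),
        "b_shared": np.zeros(n_hidden),
        # One output column per drug: the task-specific heads.
        "W_heads": rng.normal(0.0, 0.1, size=(n_hidden, n_drugs)),
        "b_heads": np.zeros(n_drugs),
    }

def predict_sensitivity(params, sga):
    """Forward pass: shared representation, then one sigmoid output per drug."""
    # ReLU hidden layer shared by all drug-prediction tasks.
    hidden = np.maximum(0.0, sga @ params["W_shared"] + params["b_shared"])
    logits = hidden @ params["W_heads"] + params["b_heads"]
    # Sigmoid turns each logit into a sensitivity probability.
    return 1.0 / (1.0 + np.exp(-logits))

# Toy input: 4 tumors x 20 genes, 1 = gene carries an SGA (assumed encoding).
rng = np.random.default_rng(1)
sga = rng.integers(0, 2, size=(4, 20)).astype(float)

params = init_params(n_genes=20, n_hidden=8, n_drugs=3)
probs = predict_sensitivity(params, sga)
print(probs.shape)  # one row per tumor, one column per drug
```

In a trained model, the per-drug losses would be summed so that every drug's data updates the shared layer. That sharing is what lets drugs with common targets inform one another.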

Public Health Relevance

Cancer is among the leading causes of death worldwide. The proposed project aims to develop machine learning methods, especially deep learning, to study cellular signaling systems and disease mechanisms. A better representation learned from cancer omics data will improve the prediction of drug sensitivity and patient survival. Our study promotes the development of precision medicine to guide personalized treatment based on each patient's unique genetic changes.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Career Transition Award (K99)
Study Section
Special Emphasis Panel (ZLM1)
Program Officer
Vanbiervliet, Alan
University of Pittsburgh
Schools of Medicine
United States