Many forms of biomolecular (e.g., gene expression, genetics, proteomics) and clinical (e.g., clinical biomarkers, drug targets and indications) data pertaining to many different diseases are now readily available from publicly available data repositories and knowledge bases. There is now an opportunity to integrate these data into a unified, globally coherent representation of human disease, or nosology. Such a nosology would express how diseases are related to one another across multiple molecular and clinical axes. In this competitive renewal, we are planning a major expansion of this project. We plan to capture data from newer public repositories with more types of molecular measurements. Inclusion of genetic and protein measurements will enable richer modeling of diseases and disease similarity, beyond mRNA measurements. To help link the molecular changes seen in disease to genetic differences, we plan to incorporate Expression Quantitative Trait Loci (eQTLs), built from simultaneous genetic and expression measurements, into our disease models. To expand the utility of our nosology in personalized medicine, we plan to incorporate more quantitative epidemiological measurements on disease, and to model transitions between disease states using probabilistic relational modeling. We will compare our nosology with the well-known ICD-10 as well as with ICD-11, which is under development. We will develop novel visualization methods for the complex of edges and nodes seen in nosologies. We also plan to test our nosology in two Driving Biological Projects, in small cell lung cancer and in immunology and disease, specifically yielding novel diagnostics and therapeutics ready for clinical trials.

Public Health Relevance

In this competitive renewal, building from 36 publications in the first funding period, we plan to create a new disease classification based on clinical, molecular, and epidemiological data and knowledge, and to use this classification to identify novel diagnostics and drugs for small cell lung cancer and immunological disease.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Long, Rochelle M
Stanford University
Schools of Medicine
United States
Chen, B; Butte, A J (2016) Leveraging big data to transform target selection and drug discovery. Clin Pharmacol Ther 99:285-97
Kosti, Idit; Jain, Nishant; Aran, Dvir et al. (2016) Cross-tissue Analysis of Gene and Protein Expression in Normal and Cancer Tissues. Sci Rep 6:24799
Kodama, Keiichi; Zhao, Zhiyuan; Toda, Kyoko et al. (2016) Expression-Based Genome-Wide Association Study Links Vitamin D-Binding Protein With Autoantigenicity in Type 1 Diabetes. Diabetes 65:1341-9
Paik, H; Chen, B; Sirota, M et al. (2016) Integrating Clinical Phenotype and Gene Expression Data to Prioritize Novel Drug Uses. CPT Pharmacometrics Syst Pharmacol 5:599-607
Hughey, Jacob J; Hastie, Trevor; Butte, Atul J (2016) ZeitZeiger: supervised learning for high-dimensional data from an oscillatory system. Nucleic Acids Res 44:e80
Wu, Menghua; Sirota, Marina; Butte, Atul J et al. (2015) Characteristics of drug combination therapy in oncology by analyzing clinical trial data on Pac Symp Biocomput :68-79
Paik, Hyojung; Chung, Ah-Young; Park, Hae-Chul et al. (2015) Repurpose terbutaline sulfate for amyotrophic lateral sclerosis using electronic medical records. Sci Rep 5:8580
Fan-Minogue, Hua; Chen, Bin; Sikora-Wohlfeld, Weronika et al. (2015) A systematic assessment of linking gene expression with genetic variants for prioritizing candidate targets. Pac Symp Biocomput :383-94
Hughey, Jacob J; Butte, Atul J (2015) Robust meta-analysis of gene expression using the elastic net. Nucleic Acids Res 43:e79
Chen, B; Greenside, P; Paik, H et al. (2015) Relating Chemical Structure to Cellular Response: An Integrative Analysis of Gene Expression, Bioactivity, and Structural Data Across 11,000 Compounds. CPT Pharmacometrics Syst Pharmacol 4:576-84

Showing the most recent 10 out of 68 publications