The onset of most human disease involves multiple, molecular-level changes to the complex system of interacting genes and pathways that function differently in specific cell-lineage, pathway and treatment contexts. While this system has been probed by the thousands of functional genomics and quantitative genetic studies, careful extraction of signals relevant to these specific contexts is a challenging problem. General integration of these heterogeneous data was an important first step in detecting signals that be used to build networks to generate experimentally-testable hypotheses. However, only by dealing with the fact that disease happens at the intersection of multiple contexts and by integrating functional genomics with quantitative genetics will we be able to move toward a molecular-level understanding of human pathophysiology, which will pave the way to new therapy and drug development. The long-term goal of this project is to enable such discoveries through the development of innovative bioinformatics frameworks for integrative analysis of diverse functional genomic data. In the previous funding periods, we developed accurate data integration and visualization methodologies for most common model organisms and human, created methods for tissue-specific data analysis, and applied these methods to make novel insights about important biological processes. We further enabled experimental biological discovery by implementing these methods in publicly accessible interactive systems that are popular with experimental biologists. Leveraging our prior work, we now will directly address the challenge of enabling data-driven study of molecular mechanisms underlying human disease by developing novel semi-supervised and multi-task machine learning approaches and implementing them in a real-time integration system capable of predicting genome-scale functional and mechanism-specific networks focused on any biological context of interest. This will allow any biomedical researcher to quickly make data-driven hypotheses about function, interactions, and regulation of genes involved in hypertension in the kidney glomerulus or to predict new regulatory interactions relevant to Parkinson's disease that affect the ubiquitination pathway in Substantia nigra. Furthermore, we will develop methods for disease gene discovery that leverage these highly specific networks for functional analysis of quantitative genetics data. Our deliverable will be a general, robust, user-friendly, and automatically updated system for user-driven functional genomic data integration and functional analysis of quantitative genetics data. Throughout this work, we (with our close experimental and clinical collaborators) will also apply our methods to chronic kidney disease, cardiovascular disease/hypertension, and autism spectrum disorders both as case studies for the iterative improvement of our methods and to make direct contribution to better understanding of these diseases.

Public Health Relevance

We will create a web-accessible system that will enable biologists and clinicians to generate hypotheses regarding disease-linked genes, molecular mechanisms underlying genetic disorders, and drug discovery for more targeted treatment. Underlying this user-friendly web interface will be novel algorithms for on-the-fly integration of vast amount of functional genomics and quantitative genetics data based on the context(s) defined by the biologist or clinician. As applied to the three case study areas, chronic kidney disease, cardiovascular disease/hypertension, and autism spectrum disorders, our system has the potential to identify novel disease genes and pathways and to enable development of better diagnostic biomarkers, drug targets, and, in the longer term, treatments.

Agency
National Institute of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM071966-13
Application #
9415436
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Ravichandran, Veerasamy
Project Start
2004-07-01
Project End
2019-01-31
Budget Start
2018-02-01
Budget End
2019-01-31
Support Year
13
Fiscal Year
2018
Total Cost
Indirect Cost
Name
Princeton University
Department
Biostatistics & Other Math Sci
Type
Biomed Engr/Col Engr/Engr Sta
DUNS #
002484665
City
Princeton
State
NJ
Country
United States
Zip Code
08543
Watson, Emma; Olin-Sandoval, Viridiana; Hoy, Michael J et al. (2016) Metabolic network rewiring of propionate flux compensates vitamin B12 deficiency in C. elegans. Elife 5:
Zhou, Jian; Troyanskaya, Olga G (2016) Probabilistic modelling of chromatin code landscape reveals functional diversity of enhancer-like chromatin states. Nat Commun 7:10528
Krishnan, Arjun; Zhang, Ran; Yao, Victoria et al. (2016) Genome-wide prediction and functional characterization of the genetic basis of autism spectrum disorder. Nat Neurosci 19:1454-1462
Gorenshteyn, Dmitriy; Zaslavsky, Elena; Fribourg, Miguel et al. (2015) Interactive Big Data Resource to Elucidate Human Immune Pathways and Diseases. Immunity 43:605-14
Goya, Jonathan; Wong, Aaron K; Yao, Victoria et al. (2015) FNTM: a server for predicting functional networks of tissues in mouse. Nucleic Acids Res 43:W182-7
Wong, Aaron K; Krishnan, Arjun; Yao, Victoria et al. (2015) IMP 2.0: a multi-species functional genomics portal for integration, visualization and prediction of protein functions and networks. Nucleic Acids Res 43:W128-33
Zhou, Jian; Troyanskaya, Olga G (2015) Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods 12:931-4
Park, Christopher Y; Krishnan, Arjun; Zhu, Qian et al. (2015) Tissue-aware data integration approach for the inference of pathway interactions in metazoan organisms. Bioinformatics 31:1093-101
Dolinski, Kara; Troyanskaya, Olga G (2015) Implications of Big Data for cell biology. Mol Biol Cell 26:2575-8
Greene, Casey S; Krishnan, Arjun; Wong, Aaron K et al. (2015) Understanding multicellular function and disease with human tissue-specific networks. Nat Genet 47:569-76

Showing the most recent 10 out of 63 publications