The onset of most human disease involves multiple, molecular-level changes to the complex system of interacting genes and pathways that function differently in specific cell-lineage, pathway and treatment contexts. While this system has been probed by the thousands of functional genomics and quantitative genetic studies, careful extraction of signals relevant to these specific contexts is a challenging problem. General integration of these heterogeneous data was an important first step in detecting signals that be used to build networks to generate experimentally-testable hypotheses. However, only by dealing with the fact that disease happens at the intersection of multiple contexts and by integrating functional genomics with quantitative genetics will we be able to move toward a molecular-level understanding of human pathophysiology, which will pave the way to new therapy and drug development. The long-term goal of this project is to enable such discoveries through the development of innovative bioinformatics frameworks for integrative analysis of diverse functional genomic data. In the previous funding periods, we developed accurate data integration and visualization methodologies for most common model organisms and human, created methods for tissue-specific data analysis, and applied these methods to make novel insights about important biological processes. We further enabled experimental biological discovery by implementing these methods in publicly accessible interactive systems that are popular with experimental biologists. Leveraging our prior work, we now will directly address the challenge of enabling data-driven study of molecular mechanisms underlying human disease by developing novel semi-supervised and multi-task machine learning approaches and implementing them in a real-time integration system capable of predicting genome-scale functional and mechanism-specific networks focused on any biological context of interest. This will allow any biomedical researcher to quickly make data-driven hypotheses about function, interactions, and regulation of genes involved in hypertension in the kidney glomerulus or to predict new regulatory interactions relevant to Parkinson's disease that affect the ubiquitination pathway in Substantia nigra. Furthermore, we will develop methods for disease gene discovery that leverage these highly specific networks for functional analysis of quantitative genetics data. Our deliverable will be a general, robust, user-friendly, and automatically updated system for user-driven functional genomic data integration and functional analysis of quantitative genetics data. Throughout this work, we (with our close experimental and clinical collaborators) will also apply our methods to chronic kidney disease, cardiovascular disease/hypertension, and autism spectrum disorders both as case studies for the iterative improvement of our methods and to make direct contribution to better understanding of these diseases.

Public Health Relevance

We will create a web-accessible system that will enable biologists and clinicians to generate hypotheses regarding disease-linked genes, molecular mechanisms underlying genetic disorders, and drug discovery for more targeted treatment. Underlying this user-friendly web interface will be novel algorithms for on-the-fly integration of vast amount of functional genomics and quantitative genetics data based on the context(s) defined by the biologist or clinician. As applied to the three case study areas, chronic kidney disease, cardiovascular disease/hypertension, and autism spectrum disorders, our system has the potential to identify novel disease genes and pathways and to enable development of better diagnostic biomarkers, drug targets, and, in the longer term, treatments.

National Institute of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Project #
Application #
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Ravichandran, Veerasamy
Project Start
Project End
Budget Start
Budget End
Support Year
Fiscal Year
Total Cost
Indirect Cost
Princeton University
Biostatistics & Other Math Sci
Biomed Engr/Col Engr/Engr Sta
United States
Zip Code
Zhou, Jian; Theesfeld, Chandra L; Yao, Kevin et al. (2018) Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat Genet 50:1171-1179
Kaletsky, Rachel; Yao, Victoria; Williams, April et al. (2018) Transcriptome analysis of adult Caenorhabditis elegans cells reveals tissue-specific gene and isoform expression. PLoS Genet 14:e1007559
Dannenfelser, Ruth; Nome, Marianne; Tahiri, Andliena et al. (2017) Data-driven analysis of immune infiltrate in a large cohort of breast cancer and its association with disease progression, ER activity, and genomic complexity. Oncotarget 8:57121-57133
Nirschl, Christopher J; Suárez-Fariñas, Mayte; Izar, Benjamin et al. (2017) IFN?-Dependent Tissue-Immune Homeostasis Is Co-opted in the Tumor Microenvironment. Cell 170:127-141.e15
Watson, Emma; Olin-Sandoval, Viridiana; Hoy, Michael J et al. (2016) Metabolic network rewiring of propionate flux compensates vitamin B12 deficiency in C. elegans. Elife 5:
Zhou, Jian; Troyanskaya, Olga G (2016) Probabilistic modelling of chromatin code landscape reveals functional diversity of enhancer-like chromatin states. Nat Commun 7:10528
Krishnan, Arjun; Zhang, Ran; Yao, Victoria et al. (2016) Genome-wide prediction and functional characterization of the genetic basis of autism spectrum disorder. Nat Neurosci 19:1454-1462
Zhou, Jian; Troyanskaya, Olga G (2015) Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods 12:931-4
Gorenshteyn, Dmitriy; Zaslavsky, Elena; Fribourg, Miguel et al. (2015) Interactive Big Data Resource to Elucidate Human Immune Pathways and Diseases. Immunity 43:605-14
Wong, Aaron K; Krishnan, Arjun; Yao, Victoria et al. (2015) IMP 2.0: a multi-species functional genomics portal for integration, visualization and prediction of protein functions and networks. Nucleic Acids Res 43:W128-33

Showing the most recent 10 out of 67 publications