The onset of most human disease involves numerous molecular-level changes to the complex system of interacting genes and pathways that function differently in specific cell-lineage, pathway, and treatment contexts. This system is probed by thousands of functional genomics and quantitative genetic studies, and integrative analysis of these data can generate testable hypotheses that identify causal genetic variants and link them, through network-level changes in cells, to disease phenotypes. This can enable deeper molecular-level understanding of pathophysiology, paving the way to genome-based precision medicine. The long-term goal of this project is to enable such discoveries through integrative analysis of high-throughput biological data in a disease context. In the previous funding periods, we developed accurate data integration methods, created algorithms for the prediction of disease genes through context-specific and mechanistic network models and analysis of quantitative genetics data, and gained novel insights into important biological processes and diseases. We further enabled experimental biological discovery by building public interactive systems capable of real-time user-driven integration that are popular among experimental biologists. We now propose to connect these gene-level functional network approaches with the underlying genomic variation by deciphering how genomic variants lead to specific transcriptional and post-transcriptional effects. We propose to develop ab initio sequence-level models capable of predicting the biochemical effects of any genomic variant (including rare or never-observed variants) on chromatin state and RNA regulation, then link these effects with gene-level regulatory consequences (including tissue-specific transcription and RNA splicing), and finally put genomic sequence directly into the network context via a statistical approach for detecting genes and network neighborhoods with a significantly elevated mutational burden in disease.
Our key deliverable will be a user-friendly, interactive web-based framework enabling systems-level variant impact analysis in a network context and an open-source library for computational scientists. In addition to systematic analysis across contexts and diseases, we will collaborate with experimentalists to apply our methods to Alzheimer's disease, autism spectrum disorders, chronic kidney disease, immune diseases, and congenital heart defects as case studies for the iterative improvement of our methods and to directly contribute to better understanding of these diseases.

Public Health Relevance

To pave the way for mechanistic interpretation of disease in the genomic context and, eventually, precision medicine, we will develop algorithms for de novo prediction of functional biochemical effects of noncoding variants at the DNA regulation and RNA processing levels and then build frameworks for sequence-based prediction of tissue-specific transcription and post-transcriptional RNA processes (starting with splicing). To facilitate discovery of disease mechanisms, we will develop approaches for analyzing these variant effects in a network context, using both the mechanistic and functional networks developed in the previous grant period and novel network models that integrate exon usage information or enhancer-gene interactions. In addition to verifying top predictions experimentally in our group or by our collaborators in the case study areas of neurodegenerative disease, chronic kidney disease, autism spectrum disorders (ASD), and congenital heart disease, we will make our methods available to the broader biomedical community through public, interactive user interfaces and open-source libraries.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM071966-15
Application #
9902503
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Ravichandran, Veerasamy
Project Start
2005-04-01
Project End
2023-03-31
Budget Start
2020-04-01
Budget End
2021-03-31
Support Year
15
Fiscal Year
2020
Total Cost
Indirect Cost
Name
Princeton University
Department
Biostatistics & Other Math Sci
Type
Biomed Engr/Col Engr/Engr Sta
DUNS #
002484665
City
Princeton
State
NJ
Country
United States
Zip Code
08543
Kaletsky, Rachel; Yao, Victoria; Williams, April et al. (2018) Transcriptome analysis of adult Caenorhabditis elegans cells reveals tissue-specific gene and isoform expression. PLoS Genet 14:e1007559
Zhou, Jian; Theesfeld, Chandra L; Yao, Kevin et al. (2018) Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat Genet 50:1171-1179
Dannenfelser, Ruth; Nome, Marianne; Tahiri, Andliena et al. (2017) Data-driven analysis of immune infiltrate in a large cohort of breast cancer and its association with disease progression, ER activity, and genomic complexity. Oncotarget 8:57121-57133
Nirschl, Christopher J; Suárez-Fariñas, Mayte; Izar, Benjamin et al. (2017) IFNγ-Dependent Tissue-Immune Homeostasis Is Co-opted in the Tumor Microenvironment. Cell 170:127-141.e15
Watson, Emma; Olin-Sandoval, Viridiana; Hoy, Michael J et al. (2016) Metabolic network rewiring of propionate flux compensates vitamin B12 deficiency in C. elegans. Elife 5:
Zhou, Jian; Troyanskaya, Olga G (2016) Probabilistic modelling of chromatin code landscape reveals functional diversity of enhancer-like chromatin states. Nat Commun 7:10528
Krishnan, Arjun; Zhang, Ran; Yao, Victoria et al. (2016) Genome-wide prediction and functional characterization of the genetic basis of autism spectrum disorder. Nat Neurosci 19:1454-1462
Wong, Aaron K; Krishnan, Arjun; Yao, Victoria et al. (2015) IMP 2.0: a multi-species functional genomics portal for integration, visualization and prediction of protein functions and networks. Nucleic Acids Res 43:W128-33
Goya, Jonathan; Wong, Aaron K; Yao, Victoria et al. (2015) FNTM: a server for predicting functional networks of tissues in mouse. Nucleic Acids Res 43:W182-7
Park, Christopher Y; Krishnan, Arjun; Zhu, Qian et al. (2015) Tissue-aware data integration approach for the inference of pathway interactions in metazoan organisms. Bioinformatics 31:1093-101

Showing the most recent 10 out of 67 publications