Modern genome-scale experimental techniques enable for the first time in biological research the comprehensive monitoring of the entire molecular regulatory events leading to disease. Their integrative analyses hold the promise of generating specific, experimentally testable hypotheses, paving the way for a systems-level molecular view of complex disease. However, systems-level modeling of metazoan biology must address the challenges of: 1. biological complexity, including individual cell lineages and tissue types, 2. the increasingly large scale of data in higher organisms, and 3. the diversity of biomolecules and interaction mechanisms in the cell. The long-term goal of this research is to address these challenges through the development of bioinformatics frameworks for the study of gene function and regulation in complex biological systems thereby contributing to a greater understanding of human disease. In the initial funding period, we have developed accurate methods for integrating and visualizing diverse functional genomics data in S. cerevisiae and implemented them in interactive web-based systems for the biology community. Our methods have led to experimental discoveries of novel biology, are widely used by the yeast community, and are integrated with the SGD model organism database. We now propose to leverage our previous work to develop novel data integration and analysis methods and implement them in a public system for human data. In the proposed research period, we will create algorithms appropriate for integrating metazoan data in a tissue- and cell-lineage specific manner in health and disease. We will also develop novel hierarchical methods for predicting specific molecular interaction mechanisms and will extend our methods for integrating additional biomolecules. These methods will direct experiments focused on the glomerular kidney filter, a critical and complex component of the human vascular system whose dysfunction directly contributes to microvascular disease. Prediction of these cell-lineage specific functional networks will advance the understanding of the glomerulus function and its role in microvascular disease, leading to better clinical predictors, diagnoses, and treatments. From a technical perspective, application to glomerular biology will enable iterative improvement of the proposed methods based on experimental feedback. The end product of this research will be a general, robust, interactive, and automatically updated system for human data integration and analysis that will be freely available to the biomedical community. We will leverage parallel processing technologies (inspired by Google- type cloud computing solutions) to ensure interactive-analysis speed on the system. This system will allow biomedical researchers to synthesize, analyze, and visualize diverse data in human biology, enabling accurate predictions of biological networks and understanding their cell-lineage specificity and role in disease. Such integrative analyses will provide experimentally testable hypotheses, leading to a deeper understanding of complex disorders and paving the way to molecular-defined tissue targeted therapies and drug development.

Public Health Relevance

Our general system will enable integrative analysis of human functional genomics data in a cell-lineage and disease-focused manner, allowing biomedical researchers to identify potential clinical biomarkers and to formulate specific hypotheses elucidating the cause and development of a variety of complex disorders. Our application of this system to generate cell-lineage specific functional networks will lead to a better understanding of the glomerulus function and will directly benefit human health through the development of improved predictors, diagnoses, and treatments for microvascular disease.

National Institute of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Project #
Application #
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Lyster, Peter
Project Start
Project End
Budget Start
Budget End
Support Year
Fiscal Year
Total Cost
Indirect Cost
Princeton University
Biostatistics & Other Math Sci
Schools of Engineering
United States
Zip Code
Kaletsky, Rachel; Yao, Victoria; Williams, April et al. (2018) Transcriptome analysis of adult Caenorhabditis elegans cells reveals tissue-specific gene and isoform expression. PLoS Genet 14:e1007559
Zhou, Jian; Theesfeld, Chandra L; Yao, Kevin et al. (2018) Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat Genet 50:1171-1179
Dannenfelser, Ruth; Nome, Marianne; Tahiri, Andliena et al. (2017) Data-driven analysis of immune infiltrate in a large cohort of breast cancer and its association with disease progression, ER activity, and genomic complexity. Oncotarget 8:57121-57133
Nirschl, Christopher J; Suárez-Fariñas, Mayte; Izar, Benjamin et al. (2017) IFN?-Dependent Tissue-Immune Homeostasis Is Co-opted in the Tumor Microenvironment. Cell 170:127-141.e15
Watson, Emma; Olin-Sandoval, Viridiana; Hoy, Michael J et al. (2016) Metabolic network rewiring of propionate flux compensates vitamin B12 deficiency in C. elegans. Elife 5:
Zhou, Jian; Troyanskaya, Olga G (2016) Probabilistic modelling of chromatin code landscape reveals functional diversity of enhancer-like chromatin states. Nat Commun 7:10528
Krishnan, Arjun; Zhang, Ran; Yao, Victoria et al. (2016) Genome-wide prediction and functional characterization of the genetic basis of autism spectrum disorder. Nat Neurosci 19:1454-1462
Zhou, Jian; Troyanskaya, Olga G (2015) Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods 12:931-4
Gorenshteyn, Dmitriy; Zaslavsky, Elena; Fribourg, Miguel et al. (2015) Interactive Big Data Resource to Elucidate Human Immune Pathways and Diseases. Immunity 43:605-14
Wong, Aaron K; Krishnan, Arjun; Yao, Victoria et al. (2015) IMP 2.0: a multi-species functional genomics portal for integration, visualization and prediction of protein functions and networks. Nucleic Acids Res 43:W128-33

Showing the most recent 10 out of 67 publications