Modern genome-scale experimental techniques enable, for the first time in biological research, comprehensive monitoring of the molecular regulatory events leading to disease. Integrative analysis of the resulting data holds the promise of generating specific, experimentally testable hypotheses, paving the way for a systems-level molecular view of complex disease. However, systems-level modeling of metazoan biology must address three challenges: (1) biological complexity, including individual cell lineages and tissue types; (2) the increasingly large scale of data in higher organisms; and (3) the diversity of biomolecules and interaction mechanisms in the cell. The long-term goal of this research is to address these challenges by developing bioinformatics frameworks for the study of gene function and regulation in complex biological systems, thereby contributing to a greater understanding of human disease. In the initial funding period, we developed accurate methods for integrating and visualizing diverse functional genomics data in S. cerevisiae and implemented them in interactive web-based systems for the biology community. Our methods have led to experimental discoveries of novel biology, are widely used by the yeast community, and are integrated with the SGD model organism database. We now propose to leverage our previous work to develop novel data integration and analysis methods and implement them in a public system for human data. In the proposed research period, we will create algorithms for integrating metazoan data in a tissue- and cell-lineage-specific manner in health and disease. We will also develop novel hierarchical methods for predicting specific molecular interaction mechanisms and will extend our methods to integrate additional biomolecules.
These methods will direct experiments focused on the glomerular kidney filter, a critical and complex component of the human vascular system whose dysfunction directly contributes to microvascular disease. Prediction of these cell-lineage-specific functional networks will advance the understanding of glomerular function and its role in microvascular disease, leading to better clinical predictors, diagnoses, and treatments. From a technical perspective, application to glomerular biology will enable iterative improvement of the proposed methods based on experimental feedback. The end product of this research will be a general, robust, interactive, and automatically updated system for human data integration and analysis that will be freely available to the biomedical community. We will leverage parallel processing technologies (inspired by Google-type cloud computing solutions) to ensure that the system supports analysis at interactive speeds. This system will allow biomedical researchers to synthesize, analyze, and visualize diverse data in human biology, enabling accurate prediction of biological networks and an understanding of their cell-lineage specificity and role in disease. Such integrative analyses will provide experimentally testable hypotheses, leading to a deeper understanding of complex disorders and paving the way to molecularly defined, tissue-targeted therapies and drug development.

Public Health Relevance

Our general system will enable integrative analysis of human functional genomics data in a cell-lineage- and disease-focused manner, allowing biomedical researchers to identify potential clinical biomarkers and to formulate specific hypotheses elucidating the cause and development of a variety of complex disorders. Our application of this system to generate cell-lineage-specific functional networks will lead to a better understanding of glomerular function and will directly benefit human health through the development of improved predictors, diagnoses, and treatments for microvascular disease.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Lyster, Peter
Princeton University
Biostatistics & Other Math Sci
Schools of Engineering
United States
Zhou, Jian; Troyanskaya, Olga G (2014) Global quantitative modeling of chromatin factor interactions. PLoS Comput Biol 10:e1003525
Park, Christopher Y; Wong, Aaron K; Greene, Casey S et al. (2013) Functional knowledge transfer for high-accuracy prediction of under-studied biological processes. PLoS Comput Biol 9:e1002957
Lee, Young-suk; Krishnan, Arjun; Zhu, Qian et al. (2013) Ontology-aware classification of tissue and cell-type signals in gene expression profiles across platforms and technologies. Bioinformatics 29:3036-44
Ju, Wenjun; Greene, Casey S; Eichinger, Felix et al. (2013) Defining cell-type specificity at the transcriptional level in human disease. Genome Res 23:1862-73
Caudy, Amy A; Guan, Yuanfang; Jia, Yue et al. (2013) A new system for comparative functional genomics of Saccharomyces yeasts. Genetics 195:275-87
Guan, Yuanfang; Dunham, Maitreya J; Troyanskaya, Olga G et al. (2013) Comparative gene expression between two yeast species. BMC Genomics 14:33
Chikina, Maria D; Troyanskaya, Olga G (2012) An effective statistical evaluation of ChIPseq dataset similarity. Bioinformatics 28:607-13
Guan, Yuanfang; Yao, Victoria; Tsui, Kyle et al. (2011) Nucleosome-coupled expression differences in closely-related species. BMC Genomics 12:466
Chikina, Maria D; Troyanskaya, Olga G (2011) Accurate quantification of functional analogy among close homologs. PLoS Comput Biol 7:e1001074
Greene, Casey S; Troyanskaya, Olga G (2011) PILGRM: an interactive data-driven discovery platform for expert biologists. Nucleic Acids Res 39:W368-74

Showing the most recent 10 out of 44 publications