Context-Sensitive Search of Human Expression Compendia

Troyanskaya, Olga

Abstract

Gene expression experiments are an abundant and robust source of functional genomics data, with thousands of microarray and a growing number of high-throughput RNA sequencing studies publicly available, most interrogating clinical and biological systems relevant to disease. They hold the promise of data-driven characterization of gene function and regulation, including in specific tissues, cell lines, and disease states, and can advance the understanding and modeling of regulatory changes that form the basis of human disease. However, these data remain largely underutilized, as biology researchers do not have effective tools to explore and analyze the entire data collection to generate novel hypotheses and direct experiments. The situation is similar to that of the Internet before search engines: a biology researcher has to know a priori which datasets pertain to the biological question she is asking, reflect the tissue- or cell-lineage-specific signals of interest to her, and accurately measure the expression of genes related to her pathways of interest. There is a clear need for methods that will enable biology researchers to use their domain-specific knowledge to direct their exploration of public human expression data, enabling them to generate hypotheses and direct experiments addressing challenging biomedical questions. Such a system should provide users with the ability to effectively explore automatically identified datasets relevant to their biological question of interest, leverage metazoan complexity including cell-lineage and disease-specific signals, and allow the researcher to securely include their unpublished data in the analysis. To address these challenges, this proposal describes a "Google-style" public search engine for large collections of gene expression data built using novel search algorithms and leveraging cloud-computing technologies.
This system implements a novel query-based context-sensitive algorithm for search of large expression compendia that exploits the complexity of metazoan organisms, including cell-lineage complexity and disease aspects inherent to human expression studies. Furthermore, the challenge of heterogeneity in human samples will be addressed by developing novel hierarchical learning methods to predict cell-lineage or tissue-specific gene expression based on the compendium and to identify these signals in each dataset. This will enable users to explore tissue-specific expression and will also be integrated with the search algorithm to improve search accuracy. The proposed algorithms, search engine, and user interface will be extensively evaluated in close collaboration with biology researchers, and top predictions will be tested experimentally. These methods will be implemented in a user-friendly public search system that will leverage cloud computing to provide robust interactive query response and will enable biology researchers to explore both published data collections and their own pre-publication datasets in a context-specific, integrated, and secure manner.

Public Health Relevance

We will develop a Google-style search engine for massive collections of human gene expression data. Our system will enable researchers to use their domain knowledge to explore the entirety of public human expression data to generate hypotheses and direct experiments addressing a diverse range of challenging biomedical questions. Public availability of our system will advance genome-level understanding of human biology and facilitate development of novel drugs, therapies, and personalized medical treatments.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG005998-02
Application #: 8290295
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Bonazzi, Vivien

Project Start: 2011-06-28
Project End: 2014-04-30
Budget Start: 2012-05-01
Budget End: 2013-04-30
Support Year: 2
Fiscal Year: 2012
Total Cost: $383,572
Indirect Cost: $141,055

Institution

Name: Princeton University
Department: Biostatistics & Other Math Sci
Type: Schools of Engineering
DUNS #: 002484665

City: Princeton
State: NJ
Country: United States
Zip Code: 08544

Related projects


NIH 2013 R01 HG	Context-Sensitive Search of Human Expression Compendia	Troyanskaya, Olga G. / Princeton University	$365,164
NIH 2012 R01 HG	Context-Sensitive Search of Human Expression Compendia	Troyanskaya, Olga G. / Princeton University	$383,572
NIH 2011 R01 HG	Context-Sensitive Search of Human Expression Compendia	Troyanskaya, Olga G. / Princeton University	$391,080

Publications

Zhou, Jian; Theesfeld, Chandra L; Yao, Kevin et al. (2018) Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat Genet 50:1171-1179

Watson, Emma; Olin-Sandoval, Viridiana; Hoy, Michael J et al. (2016) Metabolic network rewiring of propionate flux compensates vitamin B12 deficiency in C. elegans. Elife 5:

Zhou, Jian; Troyanskaya, Olga G (2016) Probabilistic modelling of chromatin code landscape reveals functional diversity of enhancer-like chromatin states. Nat Commun 7:10528

Krishnan, Arjun; Zhang, Ran; Yao, Victoria et al. (2016) Genome-wide prediction and functional characterization of the genetic basis of autism spectrum disorder. Nat Neurosci 19:1454-1462

Zhou, Jian; Troyanskaya, Olga G (2015) Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods 12:931-4

Wong, Aaron K; Krishnan, Arjun; Yao, Victoria et al. (2015) IMP 2.0: a multi-species functional genomics portal for integration, visualization and prediction of protein functions and networks. Nucleic Acids Res 43:W128-33

Goya, Jonathan; Wong, Aaron K; Yao, Victoria et al. (2015) FNTM: a server for predicting functional networks of tissues in mouse. Nucleic Acids Res 43:W182-7

Park, Christopher Y; Krishnan, Arjun; Zhu, Qian et al. (2015) Tissue-aware data integration approach for the inference of pathway interactions in metazoan organisms. Bioinformatics 31:1093-101

Greene, Casey S; Krishnan, Arjun; Wong, Aaron K et al. (2015) Understanding multicellular function and disease with human tissue-specific networks. Nat Genet 47:569-76

Zhu, Qian; Wong, Aaron K; Krishnan, Arjun et al. (2015) Targeted exploration and analysis of large cross-platform human transcriptomic compendia. Nat Methods 12:211-4, 3 p following 214

Showing the most recent 10 out of 23 publications
