Computational Tools for Mining Large Amounts of ChIP and Gene Expression Data

Ji, Hongkai

Abstract

ChIP-seq and ChIP-chip, hereinafter referred to as ChIPx, are powerful technologies for mapping genome-wide protein-DNA interactions (PDIs). Microarrays, exon arrays and RNA-seq, on the other hand, are widely used to measure gene expression. Integrating ChIPx and gene expression data provides a powerful approach to studying gene regulation both during development and in disease. Traditionally, the ChIPx and gene expression experiments conducted by a single laboratory are used mainly to study one specific biological system. The collective efforts of many labs, however, have produced a large volume of data representing diverse biological systems. Jointly, these data contain enormous amounts of information that no individual lab has fully utilized. This proposal aims to develop a coordinated set of computational, statistical and software tools that allow scientists to synthesize information from 3000+ publicly available ChIPx samples and 60,000+ gene expression profiles in human and mouse to make new discoveries. The project will turn these heterogeneous data into a tool for high-throughput discovery of biological contexts (i.e., cell types, tissues and diseases) associated with gene regulatory pathway activities. First, a statistical method named Gene Set Context Analysis (GSCA) will be developed. GSCA uses large amounts of public gene expression data to infer the biological contexts and diseases in which one or more gene sets (i.e., groups of genes) are coordinately activated or inactivated. Second, building on GSCA, a method called Transcription Factor Context Analysis (TFCA) will be developed. TFCA discovers novel functional contexts of transcription factors (TFs) and gene regulatory pathways. This method first classifies the target genes of a TF into different functional categories by integrating one's own ChIPx and gene expression data with public ChIPx and Gene Ontology data. It then uses GSCA to systematically discover biological contexts (including diseases) associated with the function of each category. Collectively, GSCA and TFCA will establish a new paradigm for analyzing ChIPx and gene expression data. The conventional approach analyzes data tied to a particular system. In the new approach, one also leverages the rich information in public ChIPx and gene expression data to extend findings in one system to other biological systems. By allowing one to make novel discoveries beyond the scope of the original experiments and to connect gene regulatory pathways to diseases, the new approach will significantly increase the value of both new and existing data. Applying GSCA and TFCA, 3000+ ChIPx samples and 60,000+ gene expression samples in human and mouse will be analyzed together to systematically map TF functions and ChIPx-defined regulatory pathway activities to diseases. Some new predictions will be validated experimentally. In addition to creating new knowledge about a variety of diseases, this research will provide urgently needed data integration and data mining tools that help scientists translate the rich information in publicly available ChIPx and gene expression data into new discoveries and identify promising new areas of biomedical research.
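
The abstract does not spell out the statistics behind GSCA, so the following is only a minimal sketch of the general idea it describes: score a gene set's activity in each public expression sample, then ask which annotated contexts are over-represented among the samples where the set is active. The data layout, the z-score activity measure, the 0.5 cutoff, the hypergeometric test, and all function names are assumptions made for illustration, not the proposal's method.

```python
from math import comb
from statistics import mean, pstdev


def gene_set_activity(expr, gene_set):
    """Score each sample by the mean z-score of the gene set's genes.

    expr maps sample id -> {gene: expression value} (hypothetical layout).
    """
    samples = list(expr)
    scores = {s: [] for s in samples}
    for gene in gene_set:
        values = [expr[s][gene] for s in samples]
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            continue  # a gene that never varies carries no context signal
        for s, v in zip(samples, values):
            scores[s].append((v - mu) / sd)
    return {s: mean(z) if z else 0.0 for s, z in scores.items()}


def hypergeom_upper_tail(k, total, in_context, active):
    """P(X >= k) when `active` samples are drawn from `total` samples,
    of which `in_context` carry the context label."""
    top = min(in_context, active)
    hits = sum(
        comb(in_context, i) * comb(total - in_context, active - i)
        for i in range(k, top + 1)
    )
    return hits / comb(total, active)


def enriched_contexts(expr, labels, gene_set, cutoff=0.5):
    """Return context -> enrichment p-value among samples where the set is active.

    labels maps sample id -> context name and is assumed to cover every
    sample in expr; cutoff is an arbitrary illustrative threshold.
    """
    activity = gene_set_activity(expr, gene_set)
    active = {s for s, a in activity.items() if a > cutoff}
    result = {}
    for context in set(labels.values()):
        members = {s for s, c in labels.items() if c == context}
        k = len(members & active)
        result[context] = hypergeom_upper_tail(
            k, len(expr), len(members), len(active)
        )
    return result
```

Calling `enriched_contexts(expr, labels, ["g1", "g2"])` on a compendium of labeled samples returns one p-value per context; a small value suggests that the gene set tends to be active in that context. A real analysis at the scale described here would also need cross-platform normalization and multiple-testing correction, which this sketch omits.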

Public Health Relevance

Publicly available genomic data on gene expression and protein-DNA interactions contain enormous amounts of information that have not been fully utilized. This proposal develops computational, statistical and software tools to extract that information and applies these tools to systematically discover novel connections linking genes and biological pathways to diseases. The findings will increase our understanding of a variety of diseases and point to promising new areas of biomedical research.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG006282-02
Application #: 8516554
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Pazin, Michael J

Project Start: 2012-07-25
Project End: 2017-04-30
Budget Start: 2013-05-01
Budget End: 2014-04-30
Support Year: 2
Fiscal Year: 2013
Total Cost: $385,770
Indirect Cost: $116,872

Institution

Name: Johns Hopkins University
Department: Biostatistics & Other Math Sci
Type: Schools of Public Health
DUNS #: 001910777

City: Baltimore
State: MD
Country: United States
Zip Code: 21218

Related projects


NIH 2016 R01 HG	Computational Tools for Mining Large Amounts of ChIP and Gene Expression Data Ji, Hongkai / Johns Hopkins University
NIH 2015 R01 HG	Computational Tools for Mining Large Amounts of ChIP and Gene Expression Data Ji, Hongkai / Johns Hopkins University	$393,819
NIH 2014 R01 HG	Computational Tools for Mining Large Amounts of ChIP and Gene Expression Data Ji, Hongkai / Johns Hopkins University	$395,854
NIH 2013 R01 HG	Computational Tools for Mining Large Amounts of ChIP and Gene Expression Data Ji, Hongkai / Johns Hopkins University	$385,770
NIH 2012 R01 HG	Computational Tools for Mining Large Amounts of ChIP and Gene Expression Data Ji, Hongkai / Johns Hopkins University	$419,462

Publications

Zhang, Boyang; Hong, Xiumei; Ji, Hongkai et al. (2018) Maternal smoking during pregnancy and cord blood DNA methylation: new insight on sex differences and effect modification by maternal folate levels. Epigenetics 13:505-518

Kuang, Zheng; Ji, Zhicheng; Boeke, Jef D et al. (2018) Dynamic motif occupancy (DynaMO) analysis identifies transcription factors and their binding sites driving dynamic biological processes. Nucleic Acids Res 46:e2

Kuang, Zheng; Ji, Hongkai; Boeke, Jef D (2018) Stress response factors drive regrowth of quiescent cells. Curr Genet 64:807-810

Zhou, Weiqiang; Sherwood, Ben; Ji, Zhicheng et al. (2017) Genome-wide prediction of DNase I hypersensitivity using gene expression. Nat Commun 8:1038

Kuang, Zheng; Pinglay, Sudarshan; Ji, Hongkai et al. (2017) Msn2/4 regulate expression of glycolytic enzymes and control transition from quiescence to growth. Elife 6:

Ji, Zhicheng; Zhou, Weiqiang; Ji, Hongkai (2017) Single-cell regulome data analysis by SCRAT. Bioinformatics 33:2930-2932

Zhou, Weiqiang; Sherwood, Ben; Ji, Hongkai (2016) Computational Prediction of the Global Functional Genomic Landscape: Applications, Methods, and Challenges. Hum Hered 81:88-105

Li, Qiang; Lex, Rachel K; Chung, HaeWon et al. (2016) The Pluripotency Factor NANOG Binds to GLI Proteins and Represses Hedgehog-mediated Transcription. J Biol Chem 291:7171-82

Wang, Jie; Xia, Shuli; Arand, Brian et al. (2016) Single-Cell Co-expression Analysis Reveals Distinct Functional Modules, Co-regulation Mechanisms and Clinical Outcomes. PLoS Comput Biol 12:e1004892

Ji, Zhicheng; Ji, Hongkai (2016) TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis. Nucleic Acids Res 44:e117

Showing the most recent 10 out of 18 publications
