Recent large-scale data generation projects (Epigenomic Roadmap, ENCODE, IHEC, TCGA) have set out to produce genomic and epigenomic maps of multiple reference and cancer tissues. These orthogonal '-omic' profiles can be used to understand tissue-specific regulation in normal tissues and to identify regions of dysregulation that lead to disease. However, the data produced by different profiling approaches are typically heterogeneous, and each platform suffers from its own technical biases. Furthermore, current normalization and artifact-correction methods are platform- and data-type-specific, and many must be applied to the training and test sets together. This can lead to over-fitting, and it forces renormalization or changes to the analysis whenever additional samples are introduced later. In this research, we will develop and demonstrate a comprehensive and coordinated framework for data standardization across tissue types and profiling platforms. We will develop algorithmic approaches for the uniform preprocessing of individual profiling samples that allow for seamless integration and interpretation across platforms, experiments, and techniques. We will use our standardized data to identify regions of coordinate and discordant epigenetic co-regulation across multiple tissue/cell types and across epigenomic profiling platforms, with a particular focus on epithelial cell types. Finally, we will develop multi-'omic' drug efficacy biomarkers for epigenetic drugs across multiple tissue or cell types, followed by demonstration on patient samples from multiple public data resources.

Public Health Relevance

We plan to develop and demonstrate a comprehensive and coordinated framework for data standardization across tissue types and profiling platforms. We will use our standardized data to identify regions of coordinate and discordant epigenetic co-regulation across multiple tissue/cell types and across epigenomic profiling platforms. We will also develop multi-'omic' drug efficacy biomarkers for epigenetic drugs across multiple tissue or cell types, followed by demonstration on patient samples from multiple public data resources.

Agency: National Institutes of Health (NIH)
Institute: National Institute of Environmental Health Sciences (NIEHS)
Type: Research Project (R01)
Project #: 5R01ES025002-02
Application #: 8928619
Study Section: Special Emphasis Panel (ZRG1-IMST-R (51))
Program Officer: Chadwick, Lisa
Project Start: 2014-09-18
Project End: 2016-08-31
Budget Start: 2015-09-01
Budget End: 2016-08-31
Support Year: 2
Fiscal Year: 2015
Total Cost: $321,520
Indirect Cost: $101,920

Name: Boston University
Department: Internal Medicine/Medicine
Type: Schools of Medicine
DUNS #: 604483045
City: Boston
State: MA
Country: United States
Zip Code: 02118
Zhang, Yuqing; Jenkins, David F; Manimaran, Solaiappan et al. (2018) Alternative empirical Bayes models for adjusting for batch effects in genomic studies. BMC Bioinformatics 19:262
Rahman, Mumtahena; MacNeil, Shelley M; Jenkins, David F et al. (2017) Activity of distinct growth factor receptor network components in breast tumors uncovers two biologically relevant subtypes. Genome Med 9:40
Manimaran, Solaiappan; Selby, Heather Marie; Okrah, Kwame et al. (2016) BatchQC: interactive software for evaluating sample and batch effects in genomic data. Bioinformatics 32:3836-3838
Rahman, Mumtahena; Jackson, Laurie K; Johnson, W Evan et al. (2015) Alternative preprocessing of RNA-Sequencing data in The Cancer Genome Atlas leads to improved analysis results. Bioinformatics 31:3666-72