Recent large-scale data generation projects (Epigenomic Roadmap, ENCODE, IHEC, TCGA) have set out to produce genomic and epigenomic maps of multiple reference and cancer tissues. These orthogonal '-omic' profiles can be used to understand tissue-specific regulation in normal tissues and to identify regions of dysregulation that lead to disease. However, the data produced by different profiling approaches are typically heterogeneous, and each platform suffers from its own technical biases. Furthermore, current normalization and artifact correction methods are specific to a platform and data type, and many must be applied to the training and test sets together. This can lead to over-fitting, and it forces renormalization or changes to the analysis whenever additional samples are introduced later. In this research, we will develop and demonstrate a comprehensive and coordinated framework for data standardization across tissue types and profiling platforms. We will develop algorithmic approaches for the uniform preprocessing of individual profiling samples that allow for seamless integration and interpretation across platforms, experiments, and techniques. We will use our standardized data to identify regions of coordinated and discordant epigenetic co-regulation across multiple tissue/cell types and across epigenomic profiling platforms, with a particular focus on epithelial cell types. Finally, we will develop multi-'omic' drug efficacy biomarkers for epigenetic drugs across multiple tissue or cell types and demonstrate them on patient samples from multiple public data resources.
We plan to develop and demonstrate a comprehensive and coordinated framework for data standardization across tissue types and profiling platforms. We will use our standardized data to identify regions of coordinated and discordant epigenetic co-regulation across multiple tissue/cell types and across epigenomic profiling platforms. We will also develop multi-'omic' drug efficacy biomarkers for epigenetic drugs across multiple tissue or cell types and demonstrate them on patient samples from multiple public data resources.
Zhang, Yuqing; Jenkins, David F; Manimaran, Solaiappan et al. (2018) Alternative empirical Bayes models for adjusting for batch effects in genomic studies. BMC Bioinformatics 19:262
Rahman, Mumtahena; MacNeil, Shelley M; Jenkins, David F et al. (2017) Activity of distinct growth factor receptor network components in breast tumors uncovers two biologically relevant subtypes. Genome Med 9:40
Manimaran, Solaiappan; Selby, Heather Marie; Okrah, Kwame et al. (2016) BatchQC: interactive software for evaluating sample and batch effects in genomic data. Bioinformatics 32:3836-3838
Rahman, Mumtahena; Jackson, Laurie K; Johnson, W Evan et al. (2015) Alternative preprocessing of RNA-Sequencing data in The Cancer Genome Atlas leads to improved analysis results. Bioinformatics 31:3666-72