Next-generation sequencing technologies can produce tens of millions of sequence reads in each instrument run, and are quickly being applied in diverse types of experiments (e.g., RNA-seq, miRNA-seq, ChIP-seq, BS-seq, CNV-seq) to address biomedical questions by cost-effectively generating genome-wide datasets. While sequencing has been promoted as overcoming longstanding limitations of microarray-based studies, its data files are much larger than those from microarrays, and its diverse data types raise familiar as well as novel statistical and computational challenges. There is a pressing need for statistical and computational tools to address what leaders in the field have stated are the largest problems: data analysis and data integration. We propose to develop a comprehensive and coordinated set of statistical methods for high-throughput sequencing (HTS) that directly address many important data analysis problems in epigenomics. Specifically, we plan to address the following computational and statistical challenges facing researchers conducting HTS experiments: 1) develop sensitive statistical methods for the analysis of ChIP-seq data from both single- and paired-end-tag runs, focusing particularly on applications in genome-wide profiling of nucleosome positions; 2) develop statistical methods for the analysis of BS-seq data, producing base-level DNA methylation profiles; 3) develop new statistical tools and methods for data integration in order to gain new biological insights about global transcription and regulation. We also plan to apply these approaches to a variety of high-throughput sequencing datasets to demonstrate the relevance and utility of our methods.
We plan to work with stimulated STAT1 and STAT3 data, and with data from the ETS transcription factor family and its cofactors, for which we have already gathered substantial data through our collaborations, including transcription factors, histone marks, DNase I hypersensitivity and gene expression.
We propose to develop a comprehensive and coordinated set of statistical methods for high-throughput sequencing (HTS) that directly address many important data analysis problems in epigenomics. In particular, we plan to integrate data from multiple sources, including expression, transcription factor binding, nucleosome positioning, histone marks and DNA methylation, to better understand the mechanisms that regulate the behavior of a cell. Much of our proposal involves not just the development of new statistical and computational methods, but also the design, implementation and delivery of software tools that support these ideas. The many useful applications of next-generation sequencing will ensure that our well-developed methods have a broad impact in molecular biology, specifically in transcription regulation, chromatin dynamics, development, and cancer.