Statistical Tools and Methods for Next-Generation Sequencing in Epigenetics

Johnson, William

Abstract

Next-generation sequencing technologies are capable of producing tens of millions of sequence reads during each instrument run, and are quickly being applied in diverse types of experiments (e.g. RNA-Seq, miRNA-Seq, ChIP-Seq, BS-seq, CNV-Seq) to address biomedical questions by cost-effectively generating genome-wide datasets. While sequencing has been promoted as overcoming longstanding limitations of microarray-based studies, its data files are much larger than those from microarrays, and its diverse data types raise similar as well as novel statistical and computational challenges. There is a pressing need for statistical and computational tools to address what leaders in the field have stated are the largest problems: data analysis and data integration. We propose to develop a comprehensive and coordinated set of statistical methods for high-throughput sequencing (HTS) that directly address many important data analysis problems in epigenomics. Specifically, we plan to address the following computational and statistical challenges facing researchers conducting HTS experiments: 1) develop sensitive statistical methods for the analysis of ChIP-seq data for both single- and paired-end-tag runs, particularly focusing on applications in genome-wide profiling of nucleosome positions; 2) develop statistical methods for the analysis of BS-seq data, producing base-level DNA methylation profiles; 3) develop new statistical tools and methods for data integration in order to gain new biological insights about global transcription and regulation. We also plan to apply these approaches to a variety of high-throughput sequencing data sets to demonstrate the relevance and utility of our methods.
We plan to work with stimulated STAT1 and STAT3 data, and data from the ETS transcription factor family and its cofactors, for which we have already gathered significant data through our collaborations, including transcription factors, histone marks, DNase I hypersensitivity and gene expression.

Public Health Relevance

We propose to develop a comprehensive and coordinated set of statistical methods for high-throughput sequencing (HTS) that directly address many important data analysis problems in epigenomics. In particular, we plan to integrate data from multiple sources including expression, transcription factor binding, nucleosome positioning, histone marks and DNA methylation to better understand the mechanisms that regulate the behavior of a cell. Much of our proposal involves not just the development of new statistical and computational methods, but also the design, implementation and delivery of software tools that support these ideas. The many useful applications of next-generation sequencing will ensure that our well-developed methods will have a broad impact in molecular biology, specifically in transcription regulation, chromatin dynamics, development, and cancer.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG005692-05
Application #: 8628854
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Pazin, Michael J

Project Start: 2010-06-01
Project End: 2015-02-28
Budget Start: 2014-03-01
Budget End: 2015-02-28
Support Year: 5
Fiscal Year: 2014
Total Cost: $329,464
Indirect Cost: $77,408

Institution

Name: Boston University
Department: Internal Medicine/Medicine
Type: Schools of Medicine
DUNS #: 604483045

City: Boston
State: MA
Country: United States
Zip Code: 02118

Related projects


NIH 2014 R01 HG	Statistical Tools and Methods for Next-Generation Sequencing in Epigenetics	Johnson, William Evan / Boston University	$329,464
NIH 2013 R01 HG	Statistical Tools and Methods for Next-Generation Sequencing in Epigenetics	Johnson, William Evan / Boston University	$321,060
NIH 2012 R01 HG	Statistical Tools and Methods for Next-Generation Sequencing in Epigenetics	Johnson, William Evan / Boston University	$336,105
NIH 2011 R01 HG	Methods for the analysis and integrations of next-generation sequencing with appl	Johnson, William Evan / Brigham Young University	$333,828
NIH 2010 R01 HG	Methods for the analysis and integrations of next-generation sequencing with appl	Johnson, William Evan / Brigham Young University	$349,700

Publications

Bodily, Paul M; Fujimoto, M Stanley; Snell, Quinn et al. (2016) ScaffoldScaffolder: solving contig orientation via bidirected to directed graph reduction. Bioinformatics 32:17-24

Piccolo, Stephen R; Hoffman, Laura M; Conner, Thomas et al. (2016) Integrative analyses reveal signaling pathways underlying familial breast cancer susceptibility. Mol Syst Biol 12:860

Mortenson, Jeffrey B; Heppler, Lisa N; Banks, Courtney J et al. (2015) Histone deacetylase 6 (HDAC6) promotes the pro-survival activity of 14-3-3ζ via deacetylation of lysines within the 14-3-3ζ binding pocket. J Biol Chem 290:12487-96

Yazdani, Neema; Parker, Clarissa C; Shen, Ying et al. (2015) Hnrnph1 Is A Quantitative Trait Gene for Methamphetamine Sensitivity. PLoS Genet 11:e1005713

Piccolo, Stephen R; Andrulis, Irene L; Cohen, Adam L et al. (2015) Gene-expression patterns in peripheral blood classify familial breast cancer susceptibility. BMC Med Genomics 8:72

Whipple, Joseph M; Youssef, Osama A; Aruscavage, P Joseph et al. (2015) Genome-wide profiling of the C. elegans dsRNAome. RNA 21:786-800

Hong, Changjin; Manimaran, Solaiappan; Johnson, William Evan (2014) PathoQC: Computationally Efficient Read Preprocessing and Quality Control for High-Throughput Sequencing Data Sets. Cancer Inform 13:167-76

Fujimoto, M; Bodily, Paul M; Okuda, Nozomu et al. (2014) Effects of error-correction of heterozygous next-generation sequencing data. BMC Bioinformatics 15 Suppl 7:S3

Byrd, Allyson L; Perez-Rogers, Joseph F; Manimaran, Solaiappan et al. (2014) Clinical PathoScope: rapid alignment and filtration for accurate pathogen identification in clinical samples using unassembled sequencing data. BMC Bioinformatics 15:262

Francis, Owen E; Bendall, Matthew; Manimaran, Solaiappan et al. (2013) Pathoscope: species identification and strain attribution with unassembled sequencing data. Genome Res 23:1721-9

Showing the most recent 10 out of 27 publications
