Beyond heuristics: a tool for the rigorous statistical analysis of *-seq assays.

Bickel, Peter

Abstract

Assays based upon next-generation sequencing technologies (*-seq assays) are widely used in the genomics community. As these assays mature and attempt to probe more subtle biological phenomena, new tools based upon powerful statistical techniques will be needed to provide confidence in the resulting biological conclusions. To date, *-seq assay analysis tools can be split into two distinct classes: mapping and quantification. Mapping tools attempt to match each read with a genomic location, whereas quantification tools infer biological features from the "mapped" reads. The results of the mapping are often very dependent on tuning parameters and rarely, if ever, provide any notion of confidence. The analysis tools typically take the provided mappings as gospel. This project will take a different approach. The investigators propose to use known physical and biochemical properties of the assay to model the assay. Such an approach will yield better mappings, while providing a notion of confidence that can be made an integral part of downstream analysis. Extensive validation of the software and underlying models is planned in three organisms using data from five different validatory experiments. The work proposed in this project will result in significant improvements in the analyses of *-seq data. If successful, this project will replace a host of mapping algorithms, peak callers, and transcript quantifiers, forming the foundation of a software suite for the integrative analysis of *-seq assays.

Public Health Relevance

The results of mapping reads in assays based on next-generation sequencing (e.g., ChIP-seq, RNA-seq, DNase-seq) are often very dependent on tuning parameters without ever providing a notion of confidence, and downstream analysis tools typically take the provided mappings as gospel. Our approach is different: we make the known physical and biochemical properties of the assay, and the biological properties of the feature assayed, an integral part of the mapping process. On the basis of our assay model, we then set confidence limits on our mappings that can be made an integral part of downstream analysis, analytical or biological. Our working prototype, called Statmap, is intended to replace a host of mappers, peak callers, and transcript quantifiers as the principal tool for analysis and quantification in the computational genomicist's arsenal by the end of 2012.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Exploratory/Developmental Grants (R21)
Project #: 5R21HG006187-02
Application #: 8290222
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Bonazzi, Vivien

Project Start: 2011-06-27
Project End: 2013-05-31
Budget Start: 2012-06-01
Budget End: 2013-05-31
Support Year: 2
Fiscal Year: 2012
Total Cost: $222,835
Indirect Cost: $72,835

Institution

Name: University of California Berkeley
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 124726725

City: Berkeley
State: CA
Country: United States
Zip Code: 94704

Related projects


NIH 2012 R21 HG	Beyond heuristics: a tool for the rigorous statistical analysis of *-seq assays. Bickel, Peter J. / University of California Berkeley	$222,835
NIH 2011 R21 HG	Beyond heuristics: a tool for the rigorous statistical analysis of *-seq assays. Bickel, Peter J. / University of California Berkeley	$185,134

Publications

Boley, Nathan; Stoiber, Marcus H; Booth, Benjamin W et al. (2014) Genome-guided transcript assembly by integrative analysis of RNA sequence data. Nat Biotechnol 32:341-6
