Preprocessing and Analysis Tools for High-Throughput Technologies

Irizarry, Rafael

Abstract

High-throughput technologies are poised to become instrumental in the era of precision medicine. Applications of these technologies go beyond sequencing genomic DNA and include the measurement of quantitative and dynamic outcomes underlying genomic function. In fact, several gene expression-based tests have already been translated into clinical practice. Although some applications of high-throughput technologies are relatively mature, manufacturers continue to develop new products at a rapid pace, and with new technologies and new applications come new, unexpected statistical challenges. Quantitative outcomes are particularly subject to severe systematic bias and unforeseen variability. We have witnessed how these biases can greatly affect downstream analyses: several results published in top biological journals were brought into question after careful examination of the data. For high-throughput technologies to be useful in clinical applications, rigorous statistical methods that account for these issues need to be developed. Our group has previously demonstrated that statistical methodology can provide great improvements over the ad hoc data analysis algorithms offered as defaults by technology developers, and we have successfully applied our tools in multiple clinical and translational settings that demonstrate the value of this work. Our highly cited statistical methodology and widely used software implementations, developed during the first two funding periods, demonstrate the success of our work. We have been dedicated to understanding and overcoming bias and systematic errors, and in doing so have helped improve the clarity of results and contributed to data-driven discovery. We are enthusiastic about continuing this work and helping to make genomics technology a primary tool for translational research and clinical applications.
We have identified three specific statistical challenges that urgently require reliable statistical solutions and that can greatly benefit from our expertise. Namely, we propose to (1) provide a precise and accurate single-sample processing method to facilitate clinical application of RNA-Seq, (2) develop statistical methodology that accounts for the problem of detection bias in high-throughput single-cell data, and (3) develop a framework for statistical inference in region detection. A common thread in the current proposal is that we leverage public data repositories to develop rigorous statistical solutions. We will disseminate our work by developing open-source statistical software and by providing compelling examples of how our methods facilitate biological discovery, especially in the context of clinical applications.

Public Health Relevance

High-throughput technologies are poised to become instrumental in the era of precision medicine. Although some applications are relatively mature, with new technologies and new applications come new unexpected statistical challenges. We will develop the necessary statistical methods to help make genomics technology a primary tool for translational research and clinical applications.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 2R01GM083084-11
Application #: 9177343
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Ravichandran, Veerasamy

Project Start: 2007-09-24
Project End: 2020-06-30
Budget Start: 2016-09-01
Budget End: 2017-06-30
Support Year: 11
Fiscal Year: 2016

Institution

Name: Dana-Farber Cancer Institute
DUNS #: 076580745

City: Boston
State: MA
Country: United States

Publications

Korthauer, Keegan; Chakraborty, Sutirtha; Benjamini, Yuval et al. (2018) Detection and accurate false discovery rate control of differentially methylated regions from whole genome bisulfite sequencing. Biostatistics

Hicks, Stephanie C; Okrah, Kwame; Paulson, Joseph N et al. (2018) Smooth quantile normalization. Biostatistics 19:185-198

Kumar, M Senthil; Slud, Eric V; Okrah, Kwame et al. (2018) Analysis and correction of compositional bias in sparse sequencing count data. BMC Genomics 19:799

Shukla, Chinmay J; McCorkindale, Alexandra L; Gerhardinger, Chiara et al. (2018) High-throughput identification of RNA nuclear enrichment sequences. EMBO J 37:

Fan, Jianqing; Liu, Han; Sun, Qiang et al. (2018) I-LAMM for sparse learning: simultaneous control of algorithmic complexity and statistical error. Ann Stat 46:814-841

Hicks, Stephanie C; Townes, F William; Teng, Mingxiang et al. (2018) Missing data and technical variability in single-cell RNA-sequencing experiments. Biostatistics 19:562-578

McCall, Matthew N; Kim, Min-Sik; Adil, Mohammed et al. (2017) Toward the human cellular microRNAome. Genome Res 27:1769-1781

Nakayama, Robert T; Pulice, John L; Valencia, Alfredo M et al. (2017) SMARCB1 is required for widespread BAF complex-mediated activation of enhancers and bivalent promoters. Nat Genet 49:1613-1623

Teng, Mingxiang; Irizarry, Rafael A (2017) Accounting for GC-content bias reduces systematic errors and batch effects in ChIP-seq data. Genome Res 27:1930-1938

Zhao, Tuo; Liu, Han (2016) Accelerated Path-following Iterative Shrinkage Thresholding Algorithm with Application to Semiparametric Graph Estimation. J Comput Graph Stat 25:1272-1296

Showing the most recent 10 out of 108 publications
