Preprocessing and Analysis Tools for Contemporary Microarray Applications

Irizarry, Rafael

Abstract

After more than a decade of improvements to experimental and data analysis techniques, microarray technology is poised to become instrumental in the era of personalized genomics. In fact, Affymetrix, a leading manufacturer, recently achieved the first FDA clearance of high-throughput gene profiling reagents. Microarrays were also crucial to the successful development of an FDA-approved breast cancer recurrence assay, which makes it possible to identify patients at risk of distant recurrence following surgery. Moreover, approximately half of all PubMed publications citing microarrays were published during the last two years. We therefore expect laboratories in academia and industry to continue relying on these technologies for several years while newer genomic technologies mature, and manufacturers to continue developing new products at a rapid pace.

All microarray data analyses begin by converting raw measurements into the data and summary statistics relied upon by biologists and clinicians. This first step, referred to as preprocessing, has an enormous influence on the quality of the final measurements and on the results of the studies that rely on them. Our group has previously demonstrated that statistical methodology can provide substantial improvements over the ad hoc algorithms that array manufacturers offer as defaults. The high citation rate of our statistical methodology and the wide use of our software implementations attest to the success of this work. While gene expression has been the most popular microarray application, the technology has recently been used to measure diverse genomic endpoints, including genotype, copy number variants, transcription factor binding sites, and several epigenetic marks such as DNA methylation. During the first funding period, our group was dedicated to understanding the biases and systematic errors that can obscure results, thwart discovery, and contribute to irreproducible findings.
We have amassed expertise and developed successful data analysis tools that effectively preprocess raw data, making the technology ready for translational research and clinical applications. However, this transition from basic to clinical research will generate new statistical challenges, and our methodology, which has partly facilitated the success of microarrays, will play an important role in the promising next period of microarray-driven research. Our goal is to develop the next generation of preprocessing and analysis tools with an emphasis on translational applications. Toward this goal, the current proposal has the following specific aims: (1) developing single-array preprocessing methodology with an emphasis on batch effect removal, (2) developing microarray analysis tools for three urgent needs, and (3) developing generalized bump hunting methodology for detecting differentially methylated regions.

Public Health Relevance

Microarrays are poised to become instrumental in the era of personalized genomics. The first step in microarray data analysis converts raw data into the summaries relied upon by biologists. Our group will develop statistical methodology for the next generation of microarray applications.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM083084-09
Application #: 8731247
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Sheeley, Douglas

Project Start: 2007-09-24
Project End: 2016-08-31
Budget Start: 2014-09-01
Budget End: 2015-08-31
Support Year: 9
Fiscal Year: 2014
Total Cost: $254,492
Indirect Cost: $52,277

Institution

Name: Dana-Farber Cancer Institute
DUNS #: 076580745

City: Boston
State: MA
Country: United States
Zip Code: 02215

Publications

Korthauer, Keegan; Chakraborty, Sutirtha; Benjamini, Yuval et al. (2018) Detection and accurate false discovery rate control of differentially methylated regions from whole genome bisulfite sequencing. Biostatistics

Hicks, Stephanie C; Okrah, Kwame; Paulson, Joseph N et al. (2018) Smooth quantile normalization. Biostatistics 19:185-198

Kumar, M Senthil; Slud, Eric V; Okrah, Kwame et al. (2018) Analysis and correction of compositional bias in sparse sequencing count data. BMC Genomics 19:799

Shukla, Chinmay J; McCorkindale, Alexandra L; Gerhardinger, Chiara et al. (2018) High-throughput identification of RNA nuclear enrichment sequences. EMBO J 37

Fan, Jianqing; Liu, Han; Sun, Qiang et al. (2018) I-LAMM for sparse learning: Simultaneous control of algorithmic complexity and statistical error. Ann Stat 46:814-841

Hicks, Stephanie C; Townes, F William; Teng, Mingxiang et al. (2018) Missing data and technical variability in single-cell RNA-sequencing experiments. Biostatistics 19:562-578

McCall, Matthew N; Kim, Min-Sik; Adil, Mohammed et al. (2017) Toward the human cellular microRNAome. Genome Res 27:1769-1781

Nakayama, Robert T; Pulice, John L; Valencia, Alfredo M et al. (2017) SMARCB1 is required for widespread BAF complex-mediated activation of enhancers and bivalent promoters. Nat Genet 49:1613-1623

Teng, Mingxiang; Irizarry, Rafael A (2017) Accounting for GC-content bias reduces systematic errors and batch effects in ChIP-seq data. Genome Res 27:1930-1938

Zhao, Tuo; Liu, Han (2016) Accelerated Path-following Iterative Shrinkage Thresholding Algorithm with Application to Semiparametric Graph Estimation. J Comput Graph Stat 25:1272-1296

The 10 most recent of 108 publications are shown.