Preprocessing and Analysis Tools for Contemporary Microarray Applications

Irizarry, Rafael

Abstract

Microarrays are an example of powerful high-throughput genomics tools that are revolutionizing the measurement of biological systems. In this and other technologies, a number of critical steps are required to convert the raw measures into the results relied upon by biologists and clinicians. These data manipulations have enormous influence on the quality of the ultimate measurements and the studies that rely upon them. Our group has previously demonstrated that the use of modern statistical methodology can substantially improve the accuracy and precision of gene expression measurements, relative to ad hoc procedures introduced by designers and manufacturers of the technology. Various companies have now incorporated our methods into their data analysis software (e.g., GeneSpring, GeneTraffic). Microarrays are now being used to measure diverse genomic endpoints, including genotype, chromosomal abnormalities such as deletions and insertions, protein binding sites, methylation, and alternative splicing. In each case, the genomic units of measurement are short oligonucleotides referred to as probes. Without an appropriate understanding of the bias and variance of these measurements, biological inferences based upon probe analysis will be compromised. In these new technologies, we expect our proposed research to produce statistical methods that facilitate improvements similar to those attained with expression arrays. The need for more research of this kind has grown dramatically in recent years, with the rapid expansion of novel uses of the microarray technology. Our long-term goal is to improve the quality of results obtained from microarray experiments through improved statistical methodology.
Toward this goal, the current proposal has the following specific aims: to develop basic analysis tools for the most popular emerging applications, to develop preprocessing methodology that serves the most urgent needs of the user community, and to develop general statistical methodology for population-wide hot-spot detection.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM083084-04
Application #: 7924862
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Remington, Karin A

Project Start: 2007-09-24
Project End: 2012-08-31
Budget Start: 2010-09-01
Budget End: 2011-08-31
Support Year: 4
Fiscal Year: 2010
Total Cost: $433,199

Institution

Name: Johns Hopkins University
Department: Biostatistics & Other Math Sci
Type: Schools of Public Health
DUNS #: 001910777

City: Baltimore
State: MD
Country: United States
Zip Code: 21218

Publications

Korthauer, Keegan; Chakraborty, Sutirtha; Benjamini, Yuval et al. (2018) Detection and accurate false discovery rate control of differentially methylated regions from whole genome bisulfite sequencing. Biostatistics :

Hicks, Stephanie C; Okrah, Kwame; Paulson, Joseph N et al. (2018) Smooth quantile normalization. Biostatistics 19:185-198

Kumar, M Senthil; Slud, Eric V; Okrah, Kwame et al. (2018) Analysis and correction of compositional bias in sparse sequencing count data. BMC Genomics 19:799

Shukla, Chinmay J; McCorkindale, Alexandra L; Gerhardinger, Chiara et al. (2018) High-throughput identification of RNA nuclear enrichment sequences. EMBO J 37:

Fan, Jianqing; Liu, Han; Sun, Qiang et al. (2018) I-LAMM for sparse learning: simultaneous control of algorithmic complexity and statistical error. Ann Stat 46:814-841

Hicks, Stephanie C; Townes, F William; Teng, Mingxiang et al. (2018) Missing data and technical variability in single-cell RNA-sequencing experiments. Biostatistics 19:562-578

McCall, Matthew N; Kim, Min-Sik; Adil, Mohammed et al. (2017) Toward the human cellular microRNAome. Genome Res 27:1769-1781

Nakayama, Robert T; Pulice, John L; Valencia, Alfredo M et al. (2017) SMARCB1 is required for widespread BAF complex-mediated activation of enhancers and bivalent promoters. Nat Genet 49:1613-1623

Teng, Mingxiang; Irizarry, Rafael A (2017) Accounting for GC-content bias reduces systematic errors and batch effects in ChIP-seq data. Genome Res 27:1930-1938

Zhao, Tuo; Liu, Han (2016) Accelerated Path-following Iterative Shrinkage Thresholding Algorithm with Application to Semiparametric Graph Estimation. J Comput Graph Stat 25:1272-1296

Showing the most recent 10 out of 108 publications
