Microarray technology has become a standard tool in medical science and basic biology research. A major achievement of the technology is the successful development of an FDA-approved breast cancer recurrence assay, which makes it possible to identify patients at risk of distant recurrence following surgery. Microarrays have also become the standard tool of genome-wide association studies (GWAS), which, according to Francis Collins, have led to "an astounding number of common DNA variations that play a part in the risk of developing common diseases such as heart disease, diabetes, cancer or autoimmunity". Approximately one half of all PubMed publications citing microarrays were published during the last 2 years (15,275 published during 2009-2010; 15,926 published prior to 2009). We therefore expect that laboratories in academia and industry will continue to rely on these technologies for several years and that manufacturers will continue to develop new products at a rapid pace. With microarray technologies, a number of critical steps are required to convert raw measures into the data relied upon by biologists and clinicians. These data manipulations, referred to as preprocessing, have enormous influence on the quality of the ultimate measurements and on the studies that rely upon them. However, the typical analysis software does not provide access to raw probe-level data. Our group has previously demonstrated that the use of alternative methodology can substantially improve accuracy and precision relative to the ad hoc procedures implemented in the default tools provided by the manufacturers. Through our suite of Bioconductor packages, we offer a flexible environment for statistical computing that continues to be the most widely used tool for the analysis of microarray probe-level data. During the last decade, much of our research has been dedicated to understanding the bias and systematic errors that can arise in high-throughput technologies.
Systematic errors obscure results, thwart discovery, and contribute to findings that are not reproducible. The challenge of removing systematic errors is not isolated to array-based technologies. For example, problems similar to those encountered in microarrays have been reported for raw data from second-generation sequencing. For microarrays, we have amassed a substantial knowledge base and data analysis tools to effectively preprocess raw data, making the technology primed for translational research and clinical applications. Our software tools have partly facilitated this achievement and will play an important role in the promising next period of research driven by microarray technology. We are therefore responding to the request for applications (RFA) for the continued development and maintenance of software by proposing to continue to provide our successful and widely used resources.

Public Health Relevance

The research community has amassed substantial knowledge and developed reliable data analysis tools that effectively deal with bias and systematic error in microarray technology. The technology is primed for translational research and clinical applications. Our software tools have partly facilitated this achievement and will play an important role in the promising next period of research driven by microarray technology.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Sheeley, Douglas
Johns Hopkins University
Biostatistics & Other Math Sci
Schools of Public Health
United States
Aryee, Martin J; Jaffe, Andrew E; Corrada-Bravo, Hector et al. (2014) Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics 30:1363-9
McCall, Matthew N; Jaffee, Harris A; Zelisko, Susan J et al. (2014) The Gene Expression Barcode 3.0: improved data processing and mining tools. Nucleic Acids Res 42:D938-43
Jaffe, Andrew E; Feinberg, Andrew P; Irizarry, Rafael A et al. (2012) Significance analysis and statistical dissection of variably methylated regions. Biostatistics 13:166-78
Leek, Jeffrey T; Johnson, W Evan; Parker, Hilary S et al. (2012) The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28:882-3
Jaffe, Andrew E; Murakami, Peter; Lee, Hwajin et al. (2012) Bump hunting to identify differentially methylated regions in epigenetic epidemiology studies. Int J Epidemiol 41:200-9