Microarray technology has become a standard tool in medical science and basic biology research. A major achievement of the technology is the successful development of an FDA-approved breast cancer recurrence assay, making it possible to identify patients at risk of distant recurrence following surgery. Microarrays have also become the standard tool of genome-wide association studies (GWAS), which, according to Francis Collins, have led to "an astounding number of common DNA variations that play a part in the risk of developing common diseases such as heart disease, diabetes, cancer or autoimmunity". Approximately one half of all PubMed publications citing microarrays were published during the last 2 years (15,275 published during 2009-2010; 15,926 published prior to 2009). We therefore expect that laboratories in academia and industry will continue to rely on these technologies for several years and that manufacturers will continue to develop new products at a rapid pace. With microarray technologies, a number of critical steps are required to convert raw measures into the data relied upon by biologists and clinicians. These data manipulations, referred to as preprocessing, have enormous influence on the quality of the ultimate measurements and on the studies that rely upon them. However, the typical analysis software does not provide access to raw probe-level data. Our group has previously demonstrated that the use of alternative methodology can substantially improve accuracy and precision relative to the ad hoc procedures implemented in the default tools provided by the manufacturers. Through our suite of Bioconductor packages, we offer a flexible environment for statistical computing that continues to be the most widely used tool for the analysis of microarray probe-level data. During the last decade, much of our research has been dedicated to understanding the bias and systematic errors that can arise in high-throughput technologies.
Systematic errors obscure results, thwart discovery, and contribute to findings that are not reproducible. The challenges of removing systematic errors are not isolated to array-based technologies. For example, problems similar to those encountered in microarrays have been reported for second-generation sequencing raw data. For microarrays, we have amassed a substantial knowledge base and data analysis tools to effectively preprocess raw data, making the technology primed for translational research and clinical applications. Our software tools have partly facilitated this achievement and will play an important role in the promising next period of research driven by microarray technology. We are therefore responding to the request for applications (RFA) for the continued development and maintenance of software by proposing to continue to provide our successful and widely used resources.

Public Health Relevance

The research community has amassed substantial knowledge and developed reliable data analysis tools that effectively deal with bias and systematic error in microarray technology. The technology is primed for translational research and clinical applications. Our software tools have partly facilitated this achievement and will play an important role in the promising next period of research driven by microarray technology.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 8R01GM103552-05
Application #: 8333372
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Sheeley, Douglas
Project Start: 2005-07-01
Project End: 2014-07-31
Budget Start: 2012-08-01
Budget End: 2013-07-31
Support Year: 5
Fiscal Year: 2012
Total Cost: $395,276
Indirect Cost: $138,793

Name: Johns Hopkins University
Department: Biostatistics & Other Math Sci
Type: Schools of Public Health
DUNS #: 001910777
City: Baltimore
State: MD
Country: United States
Zip Code: 21218

Kumar, M Senthil; Slud, Eric V; Okrah, Kwame et al. (2018) Analysis and correction of compositional bias in sparse sequencing count data. BMC Genomics 19:799
Phipson, Belinda; Lee, Stanley; Majewski, Ian J et al. (2016) Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Ann Appl Stat 10:946-963
Hicks, Stephanie C; Irizarry, Rafael A (2015) quantro: a data-driven approach to guide the choice of an appropriate normalization method. Genome Biol 16:117
Timp, Winston; Bravo, Hector Corrada; McDonald, Oliver G et al. (2014) Large hypomethylated blocks as a universal defining epigenetic alteration in human solid tumors. Genome Med 6:61
Aryee, Martin J; Jaffe, Andrew E; Corrada-Bravo, Hector et al. (2014) Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics 30:1363-9
McCall, Matthew N; Jaffee, Harris A; Zelisko, Susan J et al. (2014) The Gene Expression Barcode 3.0: improved data processing and mining tools. Nucleic Acids Res 42:D938-43
Wu, George; Yustein, Jason T; McCall, Matthew N et al. (2013) ChIP-PED enhances the analysis of ChIP-seq and ChIP-chip data. Bioinformatics 29:1182-9
Leek, Jeffrey T; Johnson, W Evan; Parker, Hilary S et al. (2012) The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28:882-3
Parker, Hilary S; Leek, Jeffrey T (2012) The practical effect of batch on genomic prediction. Stat Appl Genet Mol Biol 11:Article 10
McCall, Matthew N; Jaffee, Harris A; Irizarry, Rafael A (2012) fRMA ST: frozen robust multiarray analysis for Affymetrix Exon and Gene ST arrays. Bioinformatics 28:3153-4

Showing the most recent 10 out of 19 publications