DNA microarrays are a new and promising biotechnology that allows the expression levels of thousands of genes to be monitored simultaneously. Microarrays are being applied increasingly in biological and medical research to address a wide range of problems, such as the classification of tumors and the study of host genomic responses to bacterial infections. The broad, long-term objective of this project is to develop novel statistical methods for the design and analysis of DNA microarray experiments.
The specific aims of the proposal fall into four areas, all of which are concerned with improving the efficiency and reliability of microarray experiments, from the early design and pre-processing stages to higher level analyses.
I. Experimental design. Proper experimental design is essential to ensure that biological questions are answered accurately and precisely given experimental constraints. Flexible designs and methods of analysis will be developed for time-series and multifactorial experiments, which monitor the gene expression response over time for factors such as treatment and cell type.
II. Pre-processing. Image analysis and normalization are components of all microarray experiments and can have a substantial impact on higher level analyses. Spot and slide quality statistics will be derived as well as procedures for incorporating these statistics in subsequent analyses. Normalization methods based on robust local regression are proposed to accommodate different types of dye biases and to exploit control sequences spotted on the array.
III. Pattern discovery and recognition. New methods for clustering, discrimination, and multiple testing are proposed in order to elucidate associations between gene expression levels and other covariates or responses. This includes assessing the effects of treatment interventions, the discovery of temporal or spatial gene expression patterns, and the identification of genes associated with clinical outcomes such as cancer incidence and survival.
IV. Software development. Statistical methods developed as part of this project will be implemented in packages built on the R language for statistical computing. To facilitate use and integration with biological information resources a web-browser interface will be provided.
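As an illustration of the normalization idea in aim II, the following is a minimal sketch of intensity-dependent dye-bias correction for a two-color array using robust local regression. It is not the project's implementation (which is to be built on R); the function names, the `frac` and `iterations` defaults, and the simulated data are all assumptions made for this example. The log-ratio M = log2(R/G) is regressed on the average log-intensity A = 0.5 * log2(R * G) with a locally weighted linear fit, and the fitted curve is subtracted from M.

```python
import math
import random


def tricube(u):
    """Tricube kernel: weight falls to zero at |u| >= 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def weighted_line_at(xs, ys, ws, x0):
    """Value at x0 of the weighted least-squares line through (xs, ys)."""
    sw = sum(ws)
    if sw == 0.0:
        return sum(ys) / len(ys)  # fallback: all weights vanished
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    if sxx < 1e-12:
        return my  # degenerate neighbourhood: use the weighted mean
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    return my + (sxy / sxx) * (x0 - mx)


def robust_local_regression(x, y, frac=0.4, iterations=3):
    """Local linear fit with bisquare reweighting of large residuals.

    frac and iterations are illustrative defaults, not values from the source.
    """
    n = len(x)
    k = max(2, int(math.ceil(frac * n)))  # neighbourhood size
    robust = [1.0] * n
    fitted = [0.0] * n
    for _ in range(iterations + 1):
        for i in range(n):
            dists = sorted(abs(xj - x[i]) for xj in x)
            h = dists[k - 1] or 1e-12  # bandwidth: distance to k-th neighbour
            ws = [tricube((xj - x[i]) / h) * rj for xj, rj in zip(x, robust)]
            fitted[i] = weighted_line_at(x, y, ws, x[i])
        resid = [abs(yi - fi) for yi, fi in zip(y, fitted)]
        s = sorted(resid)[n // 2]  # (upper) median absolute residual
        if s == 0.0:
            break
        # Bisquare weights: spots with residuals beyond 6*s are ignored.
        robust = [(1 - (r / (6 * s)) ** 2) ** 2 if r < 6 * s else 0.0
                  for r in resid]
    return fitted


def normalize_log_ratios(red, green, frac=0.4):
    """Return (A, M_raw, M_normalized) for paired red/green intensities."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    trend = robust_local_regression(a, m, frac=frac)
    return a, m, [mi - ti for mi, ti in zip(m, trend)]


def mean_abs(values):
    return sum(abs(v) for v in values) / len(values)


# Simulated example: no true differential expression, but an
# intensity-dependent dye bias that shrinks as intensity grows.
random.seed(0)
true_a = [random.uniform(6.0, 14.0) for _ in range(150)]
biased_m = [0.2 + 0.8 * math.exp(-(a - 6.0) / 3.0) + random.gauss(0.0, 0.2)
            for a in true_a]
red = [2.0 ** (a + m / 2.0) for a, m in zip(true_a, biased_m)]
green = [2.0 ** (a - m / 2.0) for a, m in zip(true_a, biased_m)]

a_vals, raw_m, norm_m = normalize_log_ratios(red, green)
print("mean |M| before:", round(mean_abs(raw_m), 3))
print("mean |M| after: ", round(mean_abs(norm_m), 3))
```

The robustness iterations downweight spots with large residuals, so that a minority of genuinely differentially expressed genes does not pull the fitted bias curve toward them. In practice the fit might be restricted to control spots or carried out separately per print-tip group; those choices are outside this sketch.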

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project (R01)
Project #
5R01LM007609-03
Application #
6781847
Study Section
Genome Study Section (GNM)
Program Officer
Florance, Valerie
Project Start
2002-08-01
Project End
2006-07-31
Budget Start
2004-08-01
Budget End
2006-07-31
Support Year
3
Fiscal Year
2004
Total Cost
$416,602
Indirect Cost
Name
University of California Berkeley
Department
Biostatistics & Other Math Sci
Type
Schools of Public Health
DUNS #
124726725
City
Berkeley
State
CA
Country
United States
Zip Code
94704
Tai, Yu Chuan; Speed, Terence P (2009) On gene ranking using replicated microarray time course data. Biometrics 65:40-51
Qin, Xiaoli; Ahn, Soyeon; Speed, Terence P et al. (2007) Global analyses of mRNA translational control during early Drosophila embryogenesis. Genome Biol 8:R63
Hothorn, Torsten; Bühlmann, Peter; Dudoit, Sandrine et al. (2006) Survival ensembles. Biostatistics 7:355-73
Dugas, Jason C; Tai, Yu Chuan; Speed, Terence P et al. (2006) Functional genomic analysis of oligodendrocyte differentiation. J Neurosci 26:10967-83
Rabbee, Nusrat; Speed, Terence P (2006) A genotype calling algorithm for Affymetrix SNP arrays. Bioinformatics 22:7-12
Dudoit, Sandrine; Gentleman, Robert C; Quackenbush, John (2003) Open source software for the analysis of microarray data. Biotechniques Suppl:45-51