Microarrays are an exciting new technology allowing the simultaneous collection of gene expression data for thousands of genes. Recent developments have made this technology accessible to an ever-widening group of researchers, and in the near future microarray technology may become reliable enough and inexpensive enough for routine medical use. Data arising from microarrays can be used to differentiate cell types (e.g., cancerous versus noncancerous), and for many other purposes. Researchers are only beginning to explore the potential uses of this new technology. The main difficulty in the analysis of microarray data is its relative abundance: gene expression information for thousands of genes is gathered for a relatively small number of experiments. Simple hypothesis tests must account for multiple testing, and standard statistical methods like multidimensional scaling and cluster analysis must be used on very large matrices, which may be too big to fit into a computer's memory. Here we propose to develop innovative software for the analysis of data arising from microarrays. Some of this software will be based upon existing adaptive model building procedures using B-splines and their tensor products, but employed in a new setting. Algorithms for other analyses will be innovative in the sense that they will be capable of working on very large "distance" matrices. The resulting software module should find a wide range of uses in the analysis of data arising from microarray experiments.
Use of microarray methodologies will grow exponentially over the next few years as signal extraction and other aspects of the technology mature. Statistical analysis methods that extend a readily available statistical package such as S-PLUS, and that can also handle the large number of variables produced by microarrays, will be very popular among biologists and other researchers.