Background: Gene expression measurement technology has evolved rapidly over the last three years. Fluorescent-tagged probe DNA hybridized to microarrays printed on glass is one promising technology. Another approach uses 33P-labeled DNA to probe arrays on nylon membranes; it has the advantages of readily available reagents and instrumentation, greater sensitivity, and the ability to use smaller samples. Images of arrays of either sort must be quantified to produce a list of numerical intensities proportional to the expression levels of the gene fragments placed at each spot in the array. Spots on these arrays must then be associated, using bioinformatics tools, with sequence information for the corresponding clones, Unigene clusters, genes and protein products. Links to structural and functional information are also required. Numerous statistical, image processing and bioinformatics problems confront users of these technologies. Because arrays can contain thousands of spots, manual analysis of the resulting images is not feasible. Further, as investigators seek to couple this technology with laser capture microdissection (LCM) in the analysis of pathological tissue, the technology itself must be refined and improved. Accordingly, this project seeks to address problems in this area at the statistical, numerical, computational, and informatics levels.

Progress in FY99: Working with laboratories in NCI, NICHD, NHGRI, NIA, NIDDK, NINDS and NIDCR, we have analyzed over 400 array images to estimate intensity levels representing over 1,000,000 DNA hybridization measurements. The program P-SCAN was developed to facilitate the image-processing steps of the analysis and produces optimal estimates of spot intensities. The program is written in MATLAB; its code is being made publicly available, and a Web distribution site has been established.
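To make the quantification step concrete, the following is a minimal sketch of one common way to turn an array image into per-spot intensities. It is not the P-SCAN algorithm (P-SCAN is written in MATLAB and its estimator is not described here). It assumes the image has already been rotated and aligned so that each spot sits at the centre of a regular grid cell, treats pixels within a fixed radius of the cell centre as the spot, and uses the median of the remaining pixels in the cell as the local background. The function name `quantify_spots` and its parameters are hypothetical.

```python
import numpy as np


def quantify_spots(image, n_rows, n_cols, spot_radius):
    """Return an (n_rows, n_cols) array of background-corrected spot intensities.

    Hypothetical sketch: assumes an already-aligned image in which every spot
    lies at the centre of an equal-sized grid cell.
    """
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    cell_h, cell_w = h / n_rows, w / n_cols
    intensities = np.zeros((n_rows, n_cols))
    for r in range(n_rows):
        for c in range(n_cols):
            # Pixel bounds of this grid cell.
            y0, y1 = int(round(r * cell_h)), int(round((r + 1) * cell_h))
            x0, x1 = int(round(c * cell_w)), int(round((c + 1) * cell_w))
            cell = image[y0:y1, x0:x1]
            # Circular spot mask centred in the cell.
            cy, cx = (cell.shape[0] - 1) / 2, (cell.shape[1] - 1) / 2
            yy, xx = np.mgrid[0:cell.shape[0], 0:cell.shape[1]]
            in_spot = (yy - cy) ** 2 + (xx - cx) ** 2 <= spot_radius ** 2
            # Local background: median of the cell pixels outside the spot.
            background = np.median(cell[~in_spot])
            # Integrated spot intensity above background.
            intensities[r, c] = np.sum(cell[in_spot] - background)
    return intensities
```

For example, a 20 x 20 image with a constant background of 10 and a 2 x 2 grid gives zero for every spot; raising the pixels of one spot by 5 gives that spot a value equal to 5 times the number of pixels inside its mask. A median background is used here because it is less sensitive than a mean to neighbouring spot bleed and small image defects.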
Numerous improvements to the image processing steps have been achieved, including improved spot detection, location and quantification algorithms; improved image rotation algorithms that preserve image density; reduced disk storage requirements; and faster processing. Our analysis method relies on a number of data visualization tools and allows users to identify significantly over- or under-expressed genes in a comparative study. Importantly, these techniques also allow users to identify experimental artifacts, outliers and other data anomalies that are present in a large percentage of hybridization studies, such as non-constant background hybridization, image defects, dropouts, printing artifacts and spot bleeds. Several such techniques, including scatterplots of normalized intensities and suppression of background spots, are now incorporated into the P-SCAN distribution. A list of over- and under-expressed spots is also provided, and linkage of clone identifiers to web-based databases will shortly become available. We have also developed array layouts for new commercial arrays and for several arrays developed at NIH, including those in which radiolabeled probe may be hybridized to glass-based arrays.
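The comparative step (normalizing two arrays, suppressing background spots, and listing over- and under-expressed spots) can be sketched as follows. This is an illustrative approach under stated assumptions, not the P-SCAN procedure: it assumes median normalization over expressed spots, a single user-chosen background threshold (which must be positive), and a fixed fold-change cutoff. The function name `flag_differential_spots` and its parameters are hypothetical.

```python
import numpy as np


def flag_differential_spots(control, treated, background_threshold, fold_change=2.0):
    """Return (log2 ratios, status) for paired spot intensities.

    Hypothetical sketch: status is +1 (over-expressed in `treated`),
    -1 (under-expressed) or 0 (unchanged or suppressed as background).
    `background_threshold` must be > 0.
    """
    control = np.asarray(control, dtype=float)
    treated = np.asarray(treated, dtype=float)
    # Background suppression: a spot counts only if it exceeds the
    # threshold on at least one of the two arrays.
    expressed = (control > background_threshold) | (treated > background_threshold)
    # Floor low values at the threshold so ratios and logs stay finite.
    ctrl = np.maximum(control, background_threshold)
    trt = np.maximum(treated, background_threshold)
    # Median normalization over expressed spots compensates for overall
    # differences in labeling or exposure between the two arrays.
    ctrl_norm = ctrl / np.median(ctrl[expressed])
    trt_norm = trt / np.median(trt[expressed])
    log_ratio = np.log2(trt_norm / ctrl_norm)
    limit = np.log2(fold_change)
    status = np.zeros(control.shape, dtype=int)
    status[expressed & (log_ratio >= limit)] = 1
    status[expressed & (log_ratio <= -limit)] = -1
    return log_ratio, status
```

With `control = [100, 200, 300, 400, 5]`, `treated = [200, 400, 600, 3200, 5]` and a threshold of 10, the uniform twofold labeling difference is removed by normalization, the fourth spot is flagged as over-expressed, and the last spot is suppressed as background even though its floored ratio is below 1. Plotting `np.log2(ctrl_norm)` against `np.log2(trt_norm)` from the same computation gives the kind of normalized-intensity scatterplot in which artifacts such as non-constant background tend to show up as systematic departures from the diagonal.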