It is widely recognized that cancer and many other diseases are fundamentally genetic. Analysis of expressed genes has consequently become a major focus of human disease research, and DNA microarrays have become a preferred platform for it. Many laboratories have chosen commercially produced microarrays (Affymetrix GeneChip) to reduce performance variation; however, this promising technology has been hampered by significant performance problems. Quantitative values reported from other platforms often bear little similarity to values reported by microarrays. Further, gene expression values reported from DNA microarrays often fail to correspond even for similar tumors, or for the same tumor or other sample, particularly for genes expressed at low levels, which are frequently lost in background noise if they are detected at all. The problem is exacerbated by variables such as lab-to-lab variation, sample degradation, chip variability and manufacturing defects, scanner variability, and other non-biologic "noise".
In this study we aim to analyze the most common of these variables and to develop a set of metrics that will then be implemented in software as analysis routines to broadly control and normalize for sources of non-biologic variation. We will focus on 1) mRNA degradation, 2) the effects of mRNA amplification and preparation protocols, e.g. staining protocols, 3) the effect of scanner calibration and linear range of detection on individual gene probe set and aggregate gene performance, especially at extreme high and low expression, 4) the effect of chemical saturation on individual gene probe sets, 5) the contribution of cross-hybridization between probes from different genes to overall gene expression values, and 6) the effect of different signal detection and processing algorithms on reported gene expression values, with decision support incorporated in the software to automatically use different algorithms when appropriate. Successful completion of this project will yield greatly improved and more reliable gene expression values from microarrays, and will make the process more efficient by eliminating many errors in data analysis.
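As an illustration only, two of the metrics named above (mRNA degradation and saturation at the top of the scanner range) might be sketched as follows. This is a minimal sketch under stated assumptions, not the project's actual routines: the function names, the 16-bit intensity ceiling of 65535, and the thresholds `ratio_limit` and `sat_limit` are all hypothetical, and probe intensities are assumed to be ordered from the 5' end to the 3' end of the transcript.

```python
from statistics import median


def three_to_five_ratio(probe_signals):
    """Ratio of median 3'-end signal to median 5'-end signal.

    Assumes probe_signals is ordered 5' -> 3'. A ratio well above 1
    suggests loss of 5' signal, which is one possible sign of
    degraded or incompletely amplified mRNA.
    """
    n = len(probe_signals)
    if n < 2:
        raise ValueError("need at least two probes")
    half = n // 2
    five_prime = median(probe_signals[:half])
    three_prime = median(probe_signals[n - half:])
    if five_prime <= 0:
        raise ValueError("5' median signal must be positive")
    return three_prime / five_prime


def saturated_fraction(probe_signals, ceiling):
    """Fraction of probes at or above the assumed detector ceiling."""
    if not probe_signals:
        raise ValueError("empty probe set")
    return sum(1 for s in probe_signals if s >= ceiling) / len(probe_signals)


def qc_flags(probe_signals, ceiling=65535, ratio_limit=3.0, sat_limit=0.1):
    """Return illustrative QC flags for one probe set.

    All default thresholds are hypothetical placeholders.
    """
    return {
        "degraded": three_to_five_ratio(probe_signals) > ratio_limit,
        "saturated": saturated_fraction(probe_signals, ceiling) > sat_limit,
    }
```

A probe set flagged in this way could then be routed to a different processing algorithm or excluded from normalization, in the spirit of the decision support described in aim 6; how such flags would actually be used is left open here.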