The objective of this proposal is to develop a mathematical framework and corresponding computational tools for identifying statistically significant patterns in functional genomics data. Cluster analysis has been a productive approach for identifying groups of co-expressed genes in microarray data and other complex biological data sets. Results of such analyses have served to dissect the regulatory mechanisms driving co-expression, identify pathways involved in biological processes, and functionally annotate genes. The quality of these results and conclusions is directly related to the quality of the clustering procedure used in the analysis. Currently used computational tools for cluster analysis are inadequate in three respects: assessing the statistical significance of clustering results, clustering data across different studies and biological systems, and integrating additional data types into the analysis. We propose a multidisciplinary study to extend the mathematical framework of Bayesian infinite mixtures and to develop related computational tools. Bayesian infinite mixtures are unique among currently available clustering approaches in their ability to optimally use the information in the data and to account for all sources of uncertainty that are incorporated in the results of a cluster analysis. In our experiments thus far, clustering procedures based on infinite mixture models have outperformed all alternative approaches on both simulated data and real-world microarray datasets. The proposed extensions will facilitate "meta-clustering" across different studies and biological systems and will integrate background knowledge and additional data types into the analysis. The corresponding computational procedures will be validated through simulation studies, analysis of publicly available data, and a study of mammary tumor initiation.
All methods will be optimized and delivered to the scientific community as a Bioconductor package, as a stand-alone command-line program, and as source code. Biomedical researchers using these computational procedures will significantly improve their ability to correctly interpret the results of their microarray experiments.