The objective of this proposal is to develop a mathematical framework and corresponding computational tools for identifying statistically significant patterns in functional genomics data. Cluster analysis has been a productive approach for identifying groups of co-expressed genes in microarray data and other complex biological data sets. Results of such analyses have been used to dissect regulatory mechanisms driving co-expression, identify pathways involved in biological processes, and functionally annotate genes. The quality of these results and conclusions is directly related to the quality of the clustering procedure used in the analysis. Currently used computational tools for cluster analysis are inadequate with respect to assessing the statistical significance of clustering results, clustering data across different studies and biological systems, and integrating additional data types in the analysis. We propose a multidisciplinary study to extend the mathematical framework of Bayesian infinite mixtures and to develop related computational tools. Bayesian infinite mixtures are unique among currently available clustering approaches in their ability to optimally use the information in the data and to account for all sources of uncertainty that are incorporated in the results of a cluster analysis. In our experiments thus far, clustering procedures based on infinite mixture models outperformed all alternative approaches with both simulated data and real-world microarray datasets. The proposed extensions will facilitate "meta-clustering" across different studies and biological systems and will integrate background knowledge and additional data types in the analysis. The corresponding computational procedures will be validated through simulation studies, analysis of publicly available data, and a study of mammary tumor initiation.
All methods will be optimized and delivered to the scientific community as a Bioconductor package, as a stand-alone command-line program, and as source code. Biomedical researchers using these computational procedures will significantly improve their ability to correctly interpret the results of their microarray experiments.
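
To make the idea of an infinite mixture concrete, the following is a minimal sketch and not the project's implementation. It assumes one-dimensional data, Gaussian clusters with a known variance `sigma2`, a conjugate normal prior on cluster means, and a Dirichlet process prior with concentration `alpha`. All function and parameter names are hypothetical. A collapsed Gibbs sampler reassigns each point in turn, the number of clusters is never fixed in advance, and the output is the posterior probability that each pair of points shares a cluster, which is one way to express uncertainty about a clustering result.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def predictive(x, n, total, sigma2, mu0, tau2):
    """Posterior predictive density of x for a cluster holding n points that sum to total.

    With n == 0 this reduces to the prior predictive N(mu0, tau2 + sigma2).
    """
    precision = 1.0 / tau2 + n / sigma2
    post_mean = (mu0 / tau2 + total / sigma2) / precision
    return normal_pdf(x, post_mean, 1.0 / precision + sigma2)


def co_clustering(data, alpha=1.0, sigma2=1.0, mu0=0.0, tau2=100.0,
                  sweeps=300, burn_in=100, seed=0):
    """Estimate pairwise co-clustering probabilities by collapsed Gibbs sampling."""
    rng = random.Random(seed)
    n = len(data)
    labels = [0] * n                 # start with every point in one cluster
    counts = {0: n}                  # cluster label -> number of members
    sums = {0: sum(data)}            # cluster label -> sum of member values
    next_label = 1                   # label reserved for a newly opened cluster
    together = [[0] * n for _ in range(n)]
    kept = 0
    for sweep in range(sweeps):
        for i, x in enumerate(data):
            # Remove point i from its current cluster.
            k = labels[i]
            counts[k] -= 1
            sums[k] -= x
            if counts[k] == 0:
                del counts[k], sums[k]
            # Weight of each existing cluster: size times predictive density.
            options = list(counts)
            weights = [counts[c] * predictive(x, counts[c], sums[c], sigma2, mu0, tau2)
                       for c in options]
            # Weight of opening a new cluster: alpha times prior predictive.
            options.append(next_label)
            weights.append(alpha * predictive(x, 0, 0.0, sigma2, mu0, tau2))
            k = rng.choices(options, weights=weights)[0]
            if k == next_label:
                counts[k] = 0
                sums[k] = 0.0
                next_label += 1
            counts[k] += 1
            sums[k] += x
            labels[i] = k
        if sweep >= burn_in:
            # After burn-in, record which pairs currently share a cluster.
            kept += 1
            for a in range(n):
                for b in range(n):
                    if labels[a] == labels[b]:
                        together[a][b] += 1
    return [[c / kept for c in row] for row in together]
```

For two well-separated groups such as `[-5.2, -4.9, -5.0, 5.1, 4.8, 5.0]`, the returned matrix should show high probabilities within each group and probabilities near zero across groups. Real expression data would need a multivariate likelihood and unknown variances, which this sketch deliberately omits.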

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG003749-04
Application #: 7649466
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Pazin, Michael J
Project Start: 2006-07-01
Project End: 2011-12-31
Budget Start: 2009-07-01
Budget End: 2011-12-31
Support Year: 4
Fiscal Year: 2009
Total Cost: $352,322
Indirect Cost:

Name: University of Cincinnati
Department: Public Health & Prev Medicine
Type: Schools of Medicine
DUNS #: 041064767
City: Cincinnati
State: OH
Country: United States
Zip Code: 45221

Chen, Jing; Hu, Zhen; Phatak, Mukta et al. (2013) Genome-wide signatures of transcription factor activity: connecting transcription factors, disease, and small molecules. PLoS Comput Biol 9:e1003198
Leikauf, George D; Pope-Varsalona, Hannah; Concel, Vincent J et al. (2012) Integrative assessment of chlorine-induced acute lung injury in mice. Am J Respir Cell Mol Biol 47:234-44
Joshi, Vineet K; Freudenberg, Johannes M; Hu, Zhen et al. (2011) WebGimm: An integrated web-based platform for cluster analysis, functional analysis, and interactive visualization of results. Source Code Biol Med 6:3
Leikauf, George D; Concel, Vincent J; Liu, Pengyuan et al. (2011) Haplotype association mapping of acute lung injury in mice implicates activin a receptor, type 1. Am J Respir Crit Care Med 183:1499-509
Fabisiak, James P; Medvedovic, Mario; Alexander, Danny C et al. (2011) Integrative metabolome and transcriptome profiling reveals discordant energetic stress between mouse strains with differential sensitivity to acrolein-induced acute lung injury. Mol Nutr Food Res 55:1423-34
SenthamaraiKannan, Paranthaman; Sartor, Maureen A; O'Connor, Kyle T et al. (2011) Identification of maternally regulated fetal gene networks in the placenta with a novel embryo transfer system in mice. Physiol Genomics 43:317-24
Freudenberg, Johannes M; Sivaganesan, Siva; Phatak, Mukta et al. (2011) Generalized random set framework for functional enrichment analysis using primary genomics datasets. Bioinformatics 27:70-7
Shinde, Kaustubh; Phatak, Mukta; Freudenberg, Johannes M et al. (2010) Genomics Portals: integrative web-platform for mining genomics data. BMC Genomics 11:27
Freudenberg, Johannes M; Sivaganesan, Siva; Wagner, Michael et al. (2010) A semi-parametric Bayesian model for unsupervised differential co-expression analysis. BMC Bioinformatics 11:234
Stark, James M; Barmada, M Michael; Winterberg, Abby V et al. (2010) Genomewide association analysis of respiratory syncytial virus infection in mice. J Virol 84:2257-69

Showing the most recent 10 out of 17 publications