Transcriptional regulation is one of the crucial mechanisms used by living systems to regulate protein levels. Dysregulation of gene expression underlies the toxic effects of many chemicals, and changes in gene expression are often reliable markers of disease. A better understanding of the mechanisms that regulate gene expression is likely to improve our ability to treat human disease effectively and to predict the effects of environmental toxicants. Identifying groups of co-expressed genes by cluster analysis of microarray data has been a commonly used approach for characterizing patterns of gene expression. Currently used computational tools for cluster analysis are inadequate for quantifying the reproducibility of observed patterns. We propose to develop computational tools for efficient and reproducible extraction of biologically significant patterns from functional genomics data. The proposed computational procedures will be based on the Bayesian infinite mixture model. This approach allows for efficient use of the information in the data and for assessing the reproducibility of observed patterns. Precise modeling of uncertainty in cluster analysis will be especially beneficial when clusters of co-expressed genes are used as a starting point for characterizing the co-regulation of such genes. Joint modeling of functional genomics data and genomic regulatory sequences will facilitate optimal information exchange between these two data types. During the R21 portion of the grant, computational tools based on the Bayesian infinite mixture model will be validated. During the R33 portion, joint expression-sequence data models will be validated and all computational procedures will be incorporated into a user-friendly, public-domain software package. Key features of the software will be an intuitive graphical user interface and the ability to directly access, manipulate, and analyze diverse types of data.
In addition to the newly developed computational methods, the software will incorporate other relevant statistical techniques for correlating gene expression data with cis-regulatory element data. By using this software, biomedical researchers will be able to draw reliable and reproducible conclusions about gene expression patterns and the regulatory elements associated with these patterns.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Exploratory/Developmental Grants (R21)
Project #: 5R21HG002849-02
Application #: 6805766
Study Section: Genome Study Section (GNM)
Program Officer: Good, Peter J
Project Start: 2003-09-30
Project End: 2006-06-30
Budget Start: 2004-07-01
Budget End: 2006-06-30
Support Year: 2
Fiscal Year: 2004
Total Cost: $144,129
Indirect Cost:

Name: University of Cincinnati
Department: Public Health & Prev Medicine
Type: Schools of Medicine
DUNS #: 041064767
City: Cincinnati
State: OH
Country: United States
Zip Code: 45221
Liu, X; Sivaganesan, S; Yeung, K Y et al. (2006) Context-specific infinite mixtures for clustering gene expression profiles across diverse microarray dataset. Bioinformatics 22:1737-44
Yeung, K Y; Medvedovic, M; Bumgarner, R E (2004) From co-expression to co-regulation: how many microarray experiments do we need? Genome Biol 5:R48
Medvedovic, M; Yeung, K Y; Bumgarner, R E (2004) Bayesian mixture model based clustering of replicated microarray data. Bioinformatics 20:1222-32