Transcriptional modules (TMs) are groups of co-regulated genes together with the transcription factors that regulate their expression. Identifying TMs from experimental data and genomic regulatory sequences is an important and difficult problem in biomedicine. The data that can be used to reconstruct TMs come from genome-wide gene expression profiling experiments, whole-genome transcription factor binding experiments, sequences of experimentally established DNA regulatory motifs, and sequences of gene regulatory regions. Numerous studies have demonstrated the benefits of using all available types of data when identifying and characterizing TMs. While precise probabilistic models generally exist for analyzing each data type separately, unifying models for all available data types are scarce. Computational methods currently available to biomedical researchers are inadequate, owing either to a lack of appropriate computational tools or to inadequacies of the underlying mathematical framework. Furthermore, protocols for establishing the relative benefits of different strategies for jointly modeling different data types are non-existent. This leaves biomedical researchers without the means to make an informed decision when choosing a data analysis approach.

We propose to develop the Infinite Transcriptional Modules (ITM) framework, consisting of a novel probabilistic model and related computational tools for identifying transcriptional modules by jointly modeling gene expression and regulatory data. The unifying probabilistic model will use the Infinite Mixture Model mechanism to average over models with different numbers of modules, and thus circumvent the problem of estimating the "correct" number of modules. Each data type will be modeled separately within a different context of a Context Specific Infinite Mixture Model. Such a modular approach will facilitate the use of the most appropriate probabilistic model for representing each type of data. Our intention is not to develop new models and analytical approaches for individual data types. Instead, we will focus on developing a principled probabilistic framework for integrating currently available state-of-the-art models for individual data types. We hypothesize that our unifying modeling approach will identify transcriptional modules with significantly higher precision than would be achieved either by analyzing different data types separately or by applying currently available algorithms for joint analysis. We also expect that the posterior distribution of co-membership in a TM, based on our model, will offer a credible assessment of the statistical significance of identified TMs. Using real-world data, we will construct datasets and protocols for objectively comparing key performance aspects of different methods for TM reconstruction.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Exploratory/Developmental Grants (R21)
Project #
5R21LM009662-02
Application #
7691698
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Ye, Jane
Project Start
2008-09-30
Project End
2011-09-29
Budget Start
2009-09-30
Budget End
2011-09-29
Support Year
2
Fiscal Year
2009
Total Cost
$175,500
Indirect Cost
Name
University of Cincinnati
Department
Public Health & Prev Medicine
Type
Schools of Medicine
DUNS #
041064767
City
Cincinnati
State
OH
Country
United States
Zip Code
45221
Leikauf, George D; Pope-Varsalona, Hannah; Concel, Vincent J et al. (2012) Integrative assessment of chlorine-induced acute lung injury in mice. Am J Respir Cell Mol Biol 47:234-44
Joshi, Vineet K; Freudenberg, Johannes M; Hu, Zhen et al. (2011) WebGimm: An integrated web-based platform for cluster analysis, functional analysis, and interactive visualization of results. Source Code Biol Med 6:3
Leikauf, George D; Concel, Vincent J; Liu, Pengyuan et al. (2011) Haplotype association mapping of acute lung injury in mice implicates activin a receptor, type 1. Am J Respir Crit Care Med 183:1499-509
Fabisiak, James P; Medvedovic, Mario; Alexander, Danny C et al. (2011) Integrative metabolome and transcriptome profiling reveals discordant energetic stress between mouse strains with differential sensitivity to acrolein-induced acute lung injury. Mol Nutr Food Res 55:1423-34
Freudenberg, Johannes M; Sivaganesan, Siva; Phatak, Mukta et al. (2011) Generalized random set framework for functional enrichment analysis using primary genomics datasets. Bioinformatics 27:70-7
Shinde, Kaustubh; Phatak, Mukta; Johannes, Freudenberg M et al. (2010) Genomics Portals: integrative web-platform for mining genomics data. BMC Genomics 11:27
Freudenberg, Johannes M; Sivaganesan, Siva; Wagner, Michael et al. (2010) A semi-parametric Bayesian model for unsupervised differential co-expression analysis. BMC Bioinformatics 11:234
Stark, James M; Barmada, M Michael; Winterberg, Abby V et al. (2010) Genomewide association analysis of respiratory syncytial virus infection in mice. J Virol 84:2257-69
Freudenberg, Johannes M; Joshi, Vineet K; Hu, Zhen et al. (2009) CLEAN: CLustering Enrichment ANalysis. BMC Bioinformatics 10:234
Sartor, Maureen A; Schnekenburger, Michael; Marlowe, Jennifer L et al. (2009) Genomewide analysis of aryl hydrocarbon receptor binding targets reveals an extensive array of gene clusters that control morphogenetic and developmental programs. Environ Health Perspect 117:1139-46