Transcriptional modules (TMs) are groups of co-regulated genes together with the transcription factors regulating their expression. Identifying TMs from experimental data and genomic regulatory sequences is an important and difficult problem in biomedicine. The data that can be used to reconstruct TMs come from genome-wide gene expression profiling experiments, whole-genome transcription factor binding experiments, sequences of experimentally established DNA regulatory motifs, and sequences of gene regulatory regions. The benefits of using all available types of data in identifying and characterizing TMs have been demonstrated in numerous studies. While precise probabilistic models generally exist for analyzing each data type separately, unifying models for all available data types are scarce. The computational methods currently available to biomedical researchers are inadequate, either because appropriate computational tools are lacking or because the underlying mathematical framework is deficient. Furthermore, protocols for establishing the relative benefits of different strategies for jointly modeling different data types are non-existent. This leaves biomedical researchers without the means to make an informed decision when choosing the optimal data analysis approach. We propose to develop the Infinite Transcriptional Modules (ITM) framework, consisting of a novel probabilistic model and related computational tools for identifying transcriptional modules by jointly modeling gene expression and regulatory data. The unifying probabilistic model will use the Infinite Mixture Model mechanism to average over models with different numbers of modules, thus circumventing the problem of estimating the "correct" number of modules. Each data type will be modeled separately within a different context of a Context-Specific Infinite Mixture Model.
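Infinite mixture models of the kind invoked here are commonly built on a Dirichlet process prior, whose partition structure is described by the Chinese Restaurant Process (CRP). The minimal Python sketch below (illustrative only, not part of the proposed ITM software; the function name and parameters are our own) shows the key property exploited above: the number of clusters (modules) is not fixed in advance but emerges from the draw, so the model averages over partitions with different numbers of modules.

```python
import random

def sample_crp_partition(n_genes, alpha, rng):
    """Draw one partition of n_genes items from a Chinese Restaurant
    Process with concentration alpha. The number of clusters is not
    specified beforehand; it is a random outcome of the draw."""
    assignments = []      # cluster label for each gene, in order of arrival
    cluster_sizes = []    # current size of each existing cluster
    for _ in range(n_genes):
        # Join existing cluster k with probability proportional to its size;
        # open a new cluster with probability proportional to alpha.
        weights = cluster_sizes + [alpha]
        r = rng.random() * sum(weights)
        cum = 0.0
        for k, w in enumerate(weights):
            cum += w
            if r < cum:
                break
        if k == len(cluster_sizes):
            cluster_sizes.append(1)   # a new module is opened
        else:
            cluster_sizes[k] += 1
        assignments.append(k)
    return assignments

rng = random.Random(0)
part = sample_crp_partition(100, alpha=1.0, rng=rng)
print(len(set(part)))  # number of modules varies from draw to draw
```

Repeating the draw with different seeds yields partitions with different numbers of modules, which is what allows posterior inference to average over that number rather than fix it.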
Such a modular approach will facilitate the use of the most appropriate probabilistic models for representing different types of data. Our intention is not to develop new models and analytical approaches for individual data types. Instead, we will focus on developing a principled probabilistic framework for integrating currently available state-of-the-art models for individual data types. We hypothesize that our unifying modeling approach will identify transcriptional modules with significantly higher precision than would be achieved either by analyzing the different data types separately or by applying currently available algorithms for joint analysis. We also expect that the posterior distribution of co-membership in a TM, derived from our model, will offer a credible assessment of the statistical significance of identified TMs. Using real-world data, we will construct datasets and protocols for objectively comparing key performance aspects of different methods for TM reconstruction.
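The posterior co-membership assessment mentioned above is typically computed from MCMC output by counting, across sampled partitions, how often each pair of genes lands in the same module. A minimal sketch (illustrative; the function name and the toy samples are our own, not the ITM implementation):

```python
def comembership_matrix(partitions, n):
    """Estimate posterior pairwise co-membership probabilities from a
    list of sampled partitions, each a length-n list of cluster labels.
    Entry [i][j] is the fraction of samples in which genes i and j
    were assigned to the same module."""
    counts = [[0.0] * n for _ in range(n)]
    for part in partitions:
        for i in range(n):
            for j in range(n):
                if part[i] == part[j]:
                    counts[i][j] += 1.0
    m = len(partitions)
    return [[c / m for c in row] for row in counts]

# Toy posterior samples over 4 genes (hypothetical MCMC draws)
samples = [[0, 0, 1, 1],
           [0, 0, 0, 1],
           [0, 0, 1, 1]]
P = comembership_matrix(samples, 4)
print(P[0][1])  # genes 0 and 1 share a module in every sample -> 1.0
```

A co-membership probability near 1 across samples supports treating a gene pair as belonging to the same TM, while intermediate values flag assignments that are uncertain under the posterior.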