Significant resources have been developed that include large amounts of microarray data, representing studies on both model organisms and humans. Many early microarray studies focused on identifying genes expressed at different levels in two conditions, ignoring potentially confounding transcription arising from multiple regulation. This is a logical focus if the goal of the analysis is the identification of biomarkers. However, in order to detect biological activity, it is necessary to obtain transcriptional signatures linked to processes rather than to conditions. Because most genes are subject to multiple regulation and information about that regulation is limited, transcriptional coregulation cannot be identified without significant mathematical modeling. The work outlined here will lead to an open-source, statistically powerful, and flexible algorithm for identification of transcriptional signatures that leverages existing biological knowledge available through pathway databases, gene ontology, and databases of gene regulation. The proposal consists of two specific aims. First, we will create a novel Markov chain Monte Carlo algorithm that can directly infer the activity of biological processes through the use of enrichment analysis. The algorithm will include swappable error models whose parameters are estimated during sampling. To the best of our knowledge, we are the first group to propose direct inference on biological processes within a mathematical framework allowing for multiple regulation. Second, we will implement the algorithm in a user-friendly open-source tool, available both within the R language and as a GenePattern module. This work will provide an algorithm specifically designed to identify transcriptional signatures and changes in biological processes from noisy data using prior biological knowledge.
While such data is now typical in microarray studies, it will soon exist in genotyping and proteomic studies as well. Our inclusion of a flexible, parameterized error model will make this algorithm useful in these emerging fields too. In the future, we intend to focus our work on models of signaling networks in mammalian systems, relying on the results of this work to provide transcriptional signatures to guide inference on these networks. This work has significant implications for the development of systems capable of utilizing the growing functional genomics data to infer the activity of specific biological processes, such as signaling networks and metabolic pathways. Such information is vital to understanding human disease and the response to therapy, especially with new molecularly targeted therapeutics.
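
The first aim refers to enrichment analysis without spelling it out. As a rough illustration only, the sketch below shows one common form of enrichment test: an upper-tail hypergeometric probability for the overlap between a gene signature and an annotated gene set. This is not the proposed Markov chain Monte Carlo algorithm; the function name `enrichment_p_value`, the gene identifiers, and the set sizes are hypothetical and chosen purely for demonstration.

```python
from math import comb


def enrichment_p_value(n_universe, n_in_set, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: number of genes measured
    n_in_set:   number of measured genes annotated to the process
    n_selected: number of genes in the signature being tested
    n_overlap:  number of signature genes annotated to the process
    """
    total = comb(n_universe, n_selected)
    upper = min(n_in_set, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy data: 20 measured genes, a 5-gene process annotation,
# and a 4-gene signature that shares 3 genes with the annotation.
universe = {f"g{i}" for i in range(20)}
pathway = {"g0", "g1", "g2", "g3", "g4"}
signature = {"g0", "g1", "g2", "g10"}

overlap = len(pathway & signature)
p = enrichment_p_value(len(universe), len(pathway), len(signature), overlap)
print(f"overlap = {overlap}, enrichment p-value = {p:.4f}")
```

A small p-value suggests that the signature contains more genes from the annotated process than chance alone would predict. A test of this kind treats each gene set in isolation, so it does not by itself account for the multiple regulation that the proposal targets.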

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Exploratory/Developmental Grants (R21)
Project #
3R21LM009382-01A2S1
Application #
7922313
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Ye, Jane
Project Start
2009-09-30
Project End
2011-09-29
Budget Start
2009-09-30
Budget End
2011-09-29
Support Year
1
Fiscal Year
2009
Total Cost
$38,042
Indirect Cost
Name
Johns Hopkins University
Department
Internal Medicine/Medicine
Type
Schools of Medicine
DUNS #
001910777
City
Baltimore
State
MD
Country
United States
Zip Code
21218
Little, J L; Serzhanova, V; Izumchenko, E et al. (2014) A requirement for Nedd9 in luminal progenitor cells prior to mammary tumorigenesis in MMTV-HER2/ErbB2 mice. Oncogene 33:411-20
Fertig, Elana J; Favorov, Alexander V; Ochs, Michael F (2013) Identifying context-specific transcription factor targets from prior knowledge and gene expression data. IEEE Trans Nanobioscience 12:142-9
Fertig, Elana J; Ren, Qing; Cheng, Haixia et al. (2012) Gene expression signatures modulated by epidermal growth factor receptor activation and their relationship to cetuximab resistance in head and neck squamous cell carcinoma. BMC Genomics 13:160
Yörük, Erdem; Ochs, Michael F; Geman, Donald et al. (2011) A comprehensive statistical model for cell signaling. IEEE/ACM Trans Comput Biol Bioinform 8:592-606
Kossenkov, Andrew V; Ochs, Michael F (2010) Matrix factorisation methods applied in microarray data analysis. Int J Data Min Bioinform 4:72-90
Fertig, Elana J; Ding, Jie; Favorov, Alexander V et al. (2010) CoGAPS: an R/C++ package to identify patterns and biological process activity in transcriptomic data. Bioinformatics 26:2792-3
Rink, Lori; Skorobogatko, Yuliya; Kossenkov, Andrew V et al. (2009) Gene expression signatures and response to imatinib mesylate in gastrointestinal stromal tumor. Mol Cancer Ther 8:2172-82
Kossenkov, Andrew V; Ochs, Michael F (2009) Matrix factorization for recovery of biological processes from microarray data. Methods Enzymol 467:59-77
Ochs, Michael F; Rink, Lori; Tarn, Chi et al. (2009) Detection of treatment-induced changes in signaling pathways in gastrointestinal stromal tumors using transcriptomic data. Cancer Res 69:9125-32