Correctly inferring the perturbed pathway interactions that cause a disease from a list of differentially expressed (DE) genes or proteins may be the key to transforming the now abundant high-throughput expression data into biological knowledge. However, the current methods that aim to bridge this gap by using the DE genes to identify significantly impacted pathways are rather unsophisticated. Many, if not all, such methods treat pathways as simple sets of genes and either ignore or under-utilize the very essence of a pathway: the graph that describes the complex ways in which its genes interact with each other. Our preliminary results show that existing pathway analysis methods often provide incorrect results. In addition, the p-values they provide are inappropriately influenced by genes shared among pathways, through a pathway coupling phenomenon. The goal of this proposal is to address these problems by developing methods that implement a systems biology approach to the analysis of gene signaling pathways. Given a disease characterized with a high-throughput gene expression approach, we propose an impact analysis technique able to: i) identify the significantly impacted pathways, and ii) propose specific gene signaling cascades that could be targeted by drugs. This technique takes into consideration biologically important factors currently neglected by existing pathway analysis tools, including: i) the gene interactions described by the pathway graph, ii) the type and position of each gene in the given pathway, and iii) the efficiency with which perturbations propagate from one gene to another across the pathway. Furthermore, we propose to study pathway coupling and to develop appropriate correction methods for the hypergeometric, GSEA, and pathway impact analysis methods. This analysis will be applied to diabetes and obesity research.
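The following minimal Python sketch makes the contrast concrete. It shows the classical over-representation (hypergeometric) test, which treats a pathway as a plain gene set. It is not the proposed impact analysis. The function name `ora_pvalue` and its parameter names are illustrative choices.

Only four counts enter this test, so every interaction edge in the pathway graph is ignored. Two pathways that share many genes will also receive correlated p-values, which is the coupling effect described above.

```python
from math import comb


def ora_pvalue(total_genes: int, pathway_genes: int, de_genes: int, de_in_pathway: int) -> float:
    """Upper-tail hypergeometric p-value P(X >= de_in_pathway).

    X is the number of DE genes that fall in the pathway when `de_genes` genes
    are drawn at random, without replacement, from `total_genes` measured genes,
    `pathway_genes` of which belong to the pathway.
    """
    upper = min(pathway_genes, de_genes)
    # math.comb returns 0 when k > n, so impossible splits contribute nothing.
    tail = sum(
        comb(pathway_genes, i) * comb(total_genes - pathway_genes, de_genes - i)
        for i in range(de_in_pathway, upper + 1)
    )
    return tail / comb(total_genes, de_genes)


# The pathway's edges, gene types and gene positions play no role here.
p = ora_pvalue(total_genes=20000, pathway_genes=100, de_genes=500, de_in_pathway=10)
print(f"{p:.2e}")
```

With 20,000 measured genes, 500 DE genes and a 100-gene pathway, about 2.5 DE genes would be expected in the pathway by chance. The call above returns the probability of observing 10 or more.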
The novel approach developed here will be applied to microarray data from white fat of mice treated with low-dose CL 316,243 (CL), which has been shown to have the potential to transform white fat into brown fat (which burns energy rather than storing it). We will also apply this approach to data collected during the differentiation of 3T3-L1 pre-adipocytes after induction of adipogenesis. The goal here is threefold: i) to validate the novel approach; ii) to assess the efficiency with which gene perturbations propagate on each KEGG pathway during adipogenesis and fat tissue remodeling, and to construct a custom set of pathways relevant to obesity and diabetes; and iii) to identify pathways and signaling cascades that are important in adipogenesis and fat tissue remodeling. The methods developed will be made available as a Bioconductor package, as well as a free Java web application. Our team has excellent qualifications and a strong track record in developing novel algorithms for the analysis of high-throughput data and for multiple hypothesis testing, as well as in obesity and diabetes research.
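Assessing how efficiently perturbations propagate requires a model of how a change in one gene travels along pathway edges. The sketch below is minimal and modeled on the perturbation-factor idea used in impact analysis. Each gene's perturbation is its own measured change plus the signed perturbations of its upstream regulators, each divided by the number of that regulator's downstream targets.

Several parts are illustrative assumptions rather than the final formulation:

* the name `perturbation_factors`;
* the edge-weight encoding (+1 for activation, -1 for inhibition);
* the option of scaling a weight by a per-edge efficiency in [0, 1].

```python
from collections import defaultdict


def perturbation_factors(delta_e, edges, max_iter=100, tol=1e-9):
    """Iteratively solve PF(g) = dE(g) + sum over edges u->g of w(u,g) * PF(u) / N_ds(u).

    delta_e: dict mapping gene -> measured change (e.g. log fold change);
             genes not listed are treated as unchanged (0).
    edges:   list of (upstream, downstream, weight) tuples. Hypothetical
             encoding: +1 activation, -1 inhibition, optionally multiplied
             by an assumed per-edge propagation efficiency in [0, 1].
    """
    genes = set(delta_e)
    n_down = defaultdict(int)      # N_ds(u): number of outgoing edges of u
    incoming = defaultdict(list)   # g -> [(u, w), ...] for every edge u->g
    for u, g, w in edges:
        genes.update((u, g))
        n_down[u] += 1
        incoming[g].append((u, w))

    pf = {g: delta_e.get(g, 0.0) for g in genes}
    for _ in range(max_iter):
        new_pf = {
            g: delta_e.get(g, 0.0)
            + sum(w * pf[u] / n_down[u] for u, w in incoming[g])
            for g in genes
        }
        # On an acyclic pathway graph the values stop changing once the
        # perturbation has reached the most downstream genes.
        if max((abs(new_pf[g] - pf[g]) for g in genes), default=0.0) < tol:
            return new_pf
        pf = new_pf
    return pf  # may not have converged if the graph contains feedback loops


# A activates B and D; B inhibits C. Only A is measured as changed.
edges = [("A", "B", 1.0), ("B", "C", -1.0), ("A", "D", 1.0)]
pf = perturbation_factors({"A": 2.0}, edges)
print(pf["B"], pf["C"], pf["D"])  # 1.0 -1.0 1.0
```

Dividing by N_ds(u) splits an upstream gene's perturbation across its targets. The sketch therefore reflects gene position (upstream genes affect more of the graph) as well as interaction type, both of which a set-based test discards.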

Public Health Relevance

In molecular biology and genetics, our data-gathering capabilities have greatly surpassed the available data analysis techniques. Even though high-throughput data are relatively easy to obtain, understanding the underlying phenomena is as challenging as ever, if not more so. There is a large gap between our ability to collect data and our ability to interpret it. We propose an effective way to analyze the vast amount of data that has been, and will continue to be, collected. The proposed approach will reliably identify the gene signaling pathways most impacted in a given condition. This can greatly facilitate pinpointing the causes of the observed phenomena. It therefore has the potential for a great impact in many public health areas, by facilitating the identification of putative molecular causes of disease, of potential therapeutic interventions, and of their potential side effects. The main focus of this proposal is on obesity and diabetes. Achieving the goals described here can lead to new therapeutic interventions to help millions of people suffering from these conditions. Beyond this focus, because the proposed methods are general, the benefits of this research are expected to extend to many other research areas. These include cancer, development, and aging, as well as any other life science area in which high-throughput methods (e.g., DNA microarrays, protein microarrays, metabolomics) are used.

National Institutes of Health (NIH)
National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK)
Research Project (R01)
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Sechi, Salvatore
Wayne State University
Biostatistics & Other Math Sci
Schools of Arts and Sciences
United States