Gene Set Enrichment Analysis (GSEA) aims to identify essential pathways or, more generally, sets of biologically related genes that are involved in complex human diseases. Because of the many advantages it offers, GSEA has proved crucial in systems biology studies, which can lead to an integrated understanding of the fundamental biological processes underlying disease pathogenesis and of the elements defining therapeutic targets and responses to treatment selections. However, despite its potential importance in promoting human health, it is striking that conclusions of GSEA drawn from isolated studies are often sparse, and that different studies may lead to inconsistent and sometimes contradictory results. This problem is largely related to the following limitations. Firstly, studies have shown that isoform-specific expression variations play important roles in complex human diseases, yet the microarray technology traditionally used for mRNA profiling often lacks the resolution needed to measure isoform-specific expression. Secondly, sample sizes of individual genome-wide transcriptomic studies are typically insufficient relative to the overwhelming number of genes. Next generation sequencing (NGS) technologies have made it possible to measure genome-wide isoform-specific expression levels, calling for next generation innovations that can utilize this unprecedented resolution. Further, enormous amounts of data have been created from various microarray and RNA-Seq experiments, and the volume continues to grow fast. All of this gives rise to a tremendous demand for methods of integrative GSEA (iGSEA) that allow explicit utilization of isoform-specific expression and that combine multiple relevant studies, in order to avoid indecisive or potentially conflicting conclusions from individual data and so to enhance the power, reproducibility and interpretability of the analysis.
The goal of this project is to develop novel statistical methods and bioinformatical tools for iGSEA to efficiently synthesize diverse mRNA expression data from studies involving newly emerging RNA-Seq experiments as well as conventional microarray experiments, with an emphasis on integrating isoform-specific expression.
In Aim 1, we will develop an innovative meta-analysis method for iGSEA using isoform-specific expression. Specifically, we will incorporate ideas from fixed-effect and random-effects models, newly proposed and tested for meta-analysis of genome-wide association studies, into iGSEA, in order to achieve the maximum possible statistical efficiency while allowing for inclusion of heterogeneous studies.
Aim 2 will propose robust meta-analysis methods to integrate both isoform- and gene-level expression data from a variety of sources.
Aim 3 will develop a fully integrated Bayesian method to incorporate existing biological information more effectively. A powerful Bayesian hierarchical approach will be proposed to jointly model different sources of information. This will not only drastically improve the power of iGSEA, but also simultaneously reveal interesting genes and gene sets, as well as the 'responsible' isoforms of each identified gene.
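As a rough illustration of the two model families named in Aim 1, the sketch below pools per-study effect estimates for a single gene set with the standard inverse-variance (fixed-effect) estimator and the DerSimonian-Laird random-effects estimator. These are textbook estimators, not the method proposed in this project; the function names and the inputs (one effect size and one sampling variance per study) are assumptions made for the example.

```python
def fixed_effect(effects, variances):
    """Inverse-variance weighted pooled estimate and its variance."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    return pooled, 1.0 / total


def dersimonian_laird_tau2(effects, variances):
    """Method-of-moments estimate of the between-study variance tau^2."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled, _ = fixed_effect(effects, variances)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    c = total - sum(w * w for w in weights) / total
    if c <= 0:  # happens with a single study; heterogeneity is not estimable
        return 0.0
    return max(0.0, (q - (len(effects) - 1)) / c)


def random_effects(effects, variances):
    """Pooled estimate, its variance, and tau^2 under a random-effects model."""
    tau2 = dersimonian_laird_tau2(effects, variances)
    # Each study's variance is inflated by tau^2 before inverse-variance pooling.
    pooled, var = fixed_effect(effects, [v + tau2 for v in variances])
    return pooled, var, tau2


if __name__ == "__main__":
    # Hypothetical enrichment effect sizes for one gene set in three studies.
    effects = [0.8, 0.1, 0.5]
    variances = [0.04, 0.05, 0.10]
    print("fixed:", fixed_effect(effects, variances))
    print("random:", random_effects(effects, variances))
```

When the studies agree, the estimated between-study variance is zero and the two estimates coincide; when they disagree, the random-effects model weights studies more evenly and reports a larger variance for the pooled estimate.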

Public Health Relevance

To understand the molecular mechanisms underlying complex human diseases, one important task is to identify groups of related genes that are jointly involved in such biological processes, mainly through Gene Set Enrichment Analysis (GSEA). Many statistical methods have been developed for GSEA, and many studies have shown that GSEA is a very useful bioinformatics tool that plays a critical role in the innovation of disease prevention and intervention strategies. However, at the dawn of a new big data era, there is an increasingly urgent need to perform integrative GSEA (iGSEA), i.e., to integrate multiple relevant GSEA studies, in order to turn individual data into collective knowledge. The goal of this project is to develop a comprehensive set of statistical methods and computational tools critically needed for iGSEA involving multiple RNA-Seq and/or microarray datasets. These methods and tools will allow utilization of isoform-specific expression, integration of mixed mRNA data from different technologies, and incorporation of elaborate biological information, to promote the power, reproducibility and interpretability of the analysis.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Academic Research Enhancement Awards (AREA) (R15)
Project #
1R15GM113157-01A1
Application #
8957112
Study Section
Special Emphasis Panel (ZRG1-BST-F (80))
Program Officer
Bender, Michael T
Project Start
2015-09-16
Project End
2018-08-31
Budget Start
2015-09-16
Budget End
2018-08-31
Support Year
1
Fiscal Year
2015
Total Cost
$356,094
Indirect Cost
$107,095
Name
Southern Methodist University
Department
Biostatistics & Other Math Sci
Type
Schools of Arts and Sciences
DUNS #
001981133
City
Dallas
State
TX
Country
United States
Zip Code
75275
Li, Lie; Bai, Ou; Wang, Xinlei (2018) An integrative shrinkage estimator for random-effects meta-analysis of rare binary events. Contemp Clin Trials Commun 10:141-147
Wang, Tao; Lu, Rong; Kapur, Payal et al. (2018) An Empirical Approach Leveraging Tumorgrafts to Dissect the Tumor Microenvironment in Renal Cell Carcinoma Identifies Missing Link to Prognostic Inflammatory Factors. Cancer Discov 8:1142-1155
Lu, Wentao; Wang, Xinlei; Zhan, Xiaowei et al. (2018) Meta-analysis approaches to combine multiple gene set enrichment studies. Stat Med 37:659-672
Li, Qiwei; Wang, Xinlei; Liang, Faming et al. (2018) A Bayesian hidden Potts mixture model for analyzing lung cancer pathology images. Biostatistics :
Li, Xue; Choudhary, Pankaj Kumar; Biswas, Swati et al. (2018) A Bayesian latent variable approach to aggregation of partial and top-ranked lists in genomic studies. Stat Med 37:4266-4278
Li, Lie; Wang, Xinlei; Xiao, Guanghua et al. (2017) Integrative gene set enrichment analysis utilizing isoform-specific expression. Genet Epidemiol 41:498-510
Jia, Gaoxiang; Wang, Xinlei; Xiao, Guanghua (2017) A permutation-based non-parametric analysis of CRISPR screen data. BMC Genomics 18:545
Li, Lie; Wang, Xinlei (2017) Meta-analysis of rare binary events in treatment groups with unequal variability. Stat Methods Med Res :962280217721246
Yu, Donghyeon; Lim, Johan; Wang, Xinlei et al. (2017) Enhanced construction of gene regulatory networks using hub gene information. BMC Bioinformatics 18:186
Park, Sunho; Kim, Seung-Jun; Yu, Donghyeon et al. (2016) An integrative somatic mutation analysis to identify pathways linked with survival outcomes across 19 cancer types. Bioinformatics 32:1643-51

Showing the most recent 10 out of 12 publications