Our long-term goals are to develop efficient statistical and computational methods for different types of integrative analysis of the wealth of large-scale genomics, proteomics, and transcriptomics data now available, to further understand the etiology of complex diseases, and to improve their prevention and treatment. This proposed project focuses on the concordant integrative analysis of multiple large-scale gene expression data sets. Genes and gene sets showing consistent behavior across multiple related studies are of great biological interest. However, efficient statistical methods for this type of analysis are lacking. The following specific aims are proposed: (1) to develop statistical methods for detecting concordant differential expression from multiple large-scale gene expression data sets; (2) to develop statistical methods for detecting concordant gene set enrichment from multiple large-scale gene expression data sets; (3) to develop user-friendly software based on R packages. All statistical methods developed in this project will be implemented in a free R package. Our methods and software will also be useful for concordant integrative analysis of other similar large-scale data sets. Simulated and experimental data will be used to rigorously evaluate the performance of our methods. In particular, our methods will be applied to gene expression data sets collected for aging and cancer studies.

Public Health Relevance

Efficient statistical and computational methods will be developed for different types of concordant integrative analysis of large-scale gene expression data. The methods will be applied to the gene expression data sets for aging and cancer studies. R-package based computer software will be developed, documented and freely distributed to the scientific community.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
1R01GM092963-01A1
Application #
7993295
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Gaillard, Shawn R
Project Start
2010-09-01
Project End
2013-08-31
Budget Start
2010-09-01
Budget End
2011-08-31
Support Year
1
Fiscal Year
2010
Total Cost
$150,657
Indirect Cost
Name
George Washington University
Department
Biostatistics & Other Math Sci
Type
Schools of Arts and Sciences
DUNS #
043990498
City
Washington
State
DC
Country
United States
Zip Code
20052
Publications
Lai, Yinglei; Zhang, Fanni; Nayak, Tapan K et al. (2017) Detecting discordance enrichment among a series of two-sample genome-wide expression data sets. BMC Genomics 18:1050
Lai, Yinglei; Zhang, Fanni; Nayak, Tapan K et al. (2017) An efficient concordant integrative analysis of multiple large-scale two-sample expression data sets. Bioinformatics 33:3852-3860
Lai, Yinglei (2017) A statistical method for the conservative adjustment of false discovery rate (q-value). BMC Bioinformatics 18:69
Lai, Yinglei; Albert, Paul S (2014) Identifying multiple change points in a linear mixed effects model. Stat Med 33:1015-28
Lai, Yinglei; Zhang, Fanni; Nayak, Tapan K et al. (2014) Concordant integrative gene set enrichment analysis of multiple large-scale two-sample expression data sets. BMC Genomics 15 Suppl 1:S6
Lai, Yinglei (2012) Change-point analysis of paired allele-specific copy number variation data. J Comput Biol 19:679-93
Lai, Yinglei (2011) On the adaptive partition approach to the detection of multiple change-points. PLoS One 6:e19754