Our long-term goals are to develop efficient statistical and computational methods for different types of integrative analysis of the wealth of large-scale genomics, proteomics and transcriptomics data now available, to further understand the etiology of complex diseases, and to improve their prevention and treatment. The focus of this proposed project is the concordant integrative analysis of multiple large-scale gene expression data sets. Genes and gene sets that behave consistently across multiple related studies are of great biological interest, yet efficient statistical methods for this type of analysis are lacking. The following specific aims are proposed: (1) to develop statistical methods for the detection of concordant differential expression from multiple large-scale gene expression data sets; (2) to develop statistical methods for the detection of concordant gene set enrichment from multiple large-scale gene expression data sets; (3) to develop user-friendly software based on an R package. All statistical methods developed in this project will be implemented in a free R package. Our methods and software will also be useful for concordant integrative analysis of other, similar large-scale data sets. Both simulated and experimental data will be used to rigorously evaluate the performance of our methods. In particular, our methods will be applied to gene expression data sets collected for aging and cancer studies.
Efficient statistical and computational methods will be developed for different types of concordant integrative analysis of large-scale gene expression data. The methods will be applied to gene expression data sets from aging and cancer studies. Software based on an R package will be developed, documented and freely distributed to the scientific community.