We propose developing, evaluating, and comparing biologically motivated statistical methods for analyzing and interpreting microarray data, including a heart failure dataset. Some of the proposed methods will incorporate, or be applied to, multiple types of genomic or proteomic data. The overarching theme is that, to increase statistical power for new discovery and to maximize the use of existing knowledge and data, we propose integrating gene networks and multiple types of high-throughput data, such as gene expression data, DNA-protein binding data, DNA sequences, and SNP data, with novel analysis methods and applications. Specifically, we propose 1) further developing and evaluating a network-based statistical analysis method for genomic discovery, with applications to several real datasets; 2) developing analysis strategies to integrate gene networks and gene functional annotations for genomic discovery, such as detecting differentially expressed genes based on expression data and identifying binding target genes of a single transcription factor based on DNA-protein binding (i.e., ChIP-chip) data; 3) developing analysis strategies to integrate gene networks and multiple types of genomic and proteomic data, such as gene expression data, DNA-protein binding data, and DNA sequences; 4) integrating gene networks into regression analysis for variable selection and parameter smoothing, with applications to inferring expression quantitative trait loci (eQTL) by regressing expression data on genotype data; and 5) developing software for free public use.

Public Health Relevance

The proposed research is expected not only to advance statistical methodology and theory for complex data with complicated dependency structures, but also to contribute valuable analysis tools for elucidating the molecular mechanisms underlying diseases.

National Institutes of Health (NIH)
National Heart, Lung, and Blood Institute (NHLBI)
Research Project (R01)
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Wolz, Michael
University of Minnesota Twin Cities
Biostatistics & Other Math Sci
Schools of Public Health
United States
Kim, Junghi; Wozniak, Jeffrey R; Mueller, Bryon A et al. (2014) Comparison of statistical tests for group differences in brain functional networks. Neuroimage 101:681-94
Zhang, Yiwei; Xu, Zhiyuan; Shen, Xiaotong et al. (2014) Testing for association with multiple traits in generalized estimation equations, with application to neuroimaging data. Neuroimage 96:309-25
Pan, Wei; Kim, Junghi; Zhang, Yiwei et al. (2014) A powerful and adaptive association test for rare variants. Genetics 197:1081-95
Xu, Zhiyuan; Shen, Xiaotong; Pan, Wei et al. (2014) Longitudinal analysis is more powerful than cross-sectional analysis in detecting genetic association with neuroimaging phenotypes. PLoS One 9:e102312
Zhang, Yiwei; Guan, Weihua; Pan, Wei (2013) Adjustment for population stratification via principal components in association analysis of rare variants. Genet Epidemiol 37:99-109
Zhang, Yiwei; Shen, Xiaotong; Pan, Wei (2013) Adjusting for population stratification in a fine scale with principal components and sequencing data. Genet Epidemiol 37:787-801
Pan, Wei; Shen, Xiaotong; Liu, Binghui (2013) Cluster Analysis: Unsupervised Learning via Supervised Learning with a Non-convex Penalty. J Mach Learn Res 14:1865
Zhu, Yunzhang; Shen, Xiaotong; Pan, Wei (2013) Simultaneous grouping pursuit and feature selection over an undirected graph. J Am Stat Assoc 108:713-725
Kim, Sunkyung; Pan, Wei; Shen, Xiaotong (2013) Network-based penalized regression with application to genomic data. Biometrics 69:582-93
Austin, Erin; Pan, Wei; Shen, Xiaotong (2013) Penalized Regression and Risk Prediction in Genome-Wide Association Studies. Stat Anal Data Min 6:

Showing the most recent 10 out of 56 publications