We propose to develop, evaluate, and compare biologically motivated statistical methods for analyzing and interpreting microarray data, including a heart failure dataset. Some of the proposed methods will incorporate, or be applied to, multiple types of genomic or proteomic data. The overarching theme is that, to increase statistical power for new discovery and to make full use of existing knowledge and data, we will integrate gene networks and multiple types of high-throughput data, such as gene expression data, DNA-protein binding data, DNA sequences, and SNP data, using novel analysis methods and applications. Specifically, we propose: 1) further developing and evaluating a network-based statistical analysis method for genomic discovery, with applications to several real datasets; 2) developing analysis strategies that integrate gene networks and gene functional annotations for genomic discovery, such as detecting differentially expressed genes from expression data and identifying the binding target genes of a single transcription factor from DNA-protein binding (i.e., ChIP-chip) data; 3) developing analysis strategies that integrate gene networks and multiple types of genomic and proteomic data, such as gene expression data, DNA-protein binding data, and DNA sequences; 4) integrating gene networks into regression analysis for variable selection and parameter smoothing, with applications to inferring expression quantitative trait loci (eQTL) by regressing expression data on genotype data; and 5) developing software for free public use.
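
As a rough illustration of what "integrating gene networks into regression analysis" for parameter smoothing could look like, the sketch below adds a graph-Laplacian penalty to least squares so that coefficients of genes linked in a network are pulled toward each other. This is an assumption chosen for illustration, not the method proposed in the abstract: the penalty form, the function names `graph_laplacian` and `network_ridge`, the toy network, and the simulated data are all hypothetical. It covers smoothing only, not variable selection.

```python
import numpy as np

def graph_laplacian(n_nodes, edges):
    """Unnormalized Laplacian L = D - A of an undirected, unweighted graph."""
    A = np.zeros((n_nodes, n_nodes))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def network_ridge(X, y, L, lam=1.0, eps=1e-8):
    """Closed-form minimizer of ||y - X b||^2 + lam * b' L b + eps * ||b||^2.

    The term b' L b equals the sum over network edges (i, j) of (b_i - b_j)^2,
    so a larger lam smooths coefficients across connected genes.
    The small eps term only keeps the linear system well conditioned.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * L + eps * np.eye(p), X.T @ y)

# Toy data (hypothetical): 4 predictors, network edges 0-1 and 2-3.
rng = np.random.default_rng(0)
n, p = 50, 4
L = graph_laplacian(p, [(0, 1), (2, 3)])
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, 1.0, 0.0, 0.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

beta_ols = network_ridge(X, y, L, lam=0.0)   # essentially ordinary least squares
beta_net = network_ridge(X, y, L, lam=50.0)  # network-smoothed estimate
```

Comparing `beta_ols` with `beta_net` shows the effect of the penalty: the smoothed estimate has smaller differences between coefficients of connected predictors. A sparsity-inducing penalty would be needed in addition to obtain variable selection.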

Public Health Relevance

The proposed research is expected not only to advance statistical methodology and theory for complex data with complicated dependency structures, but also to contribute valuable analysis tools for elucidating the molecular mechanisms underlying diseases.

National Institutes of Health (NIH)
National Heart, Lung, and Blood Institute (NHLBI)
Research Project (R01)
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Wolz, Michael
University of Minnesota Twin Cities
Biostatistics & Other Math Sci
Schools of Public Health
United States
Xu, Zhiyuan; Shen, Xiaotong; Pan, Wei et al. (2014) Longitudinal analysis is more powerful than cross-sectional analysis in detecting genetic association with neuroimaging phenotypes. PLoS One 9:e102312
Zhang, Yiwei; Xu, Zhiyuan; Shen, Xiaotong et al. (2014) Testing for association with multiple traits in generalized estimation equations, with application to neuroimaging data. Neuroimage 96:309-25
Ho, Yen-Yi; Baechler, Emily C; Ortmann, Ward et al. (2014) Using gene expression to improve the power of genome-wide association analysis. Hum Hered 78:94-103
Pan, Wei; Kim, Junghi; Zhang, Yiwei et al. (2014) A powerful and adaptive association test for rare variants. Genetics 197:1081-95
Zhang, Yiwei; Pan, Wei (2014) Adjusting for population stratification and relatedness with sequencing data. BMC Proc 8:S42
Austin, Erin; Pan, Wei; Shen, Xiaotong (2014) Does the inclusion of rare variants improve risk prediction? BMC Proc 8:S94
Kim, Junghi; Wozniak, Jeffrey R; Mueller, Bryon A et al. (2014) Comparison of statistical tests for group differences in brain functional networks. Neuroimage 101:681-94
Zhu, Yunzhang; Shen, Xiaotong; Pan, Wei (2014) Structural pursuit over multiple undirected graphs. J Am Stat Assoc 109:1683-1696
Zhang, Yiwei; Guan, Weihua; Pan, Wei (2013) Adjustment for population stratification via principal components in association analysis of rare variants. Genet Epidemiol 37:99-109
Shen, Xiaotong; Pan, Wei; Zhu, Yunzhang et al. (2013) On constrained and regularized high-dimensional regression. Ann Inst Stat Math 65:807-832

Showing the most recent 10 out of 69 publications