With recent biotechnology advances, biomedical investigations have become computationally more complex and more challenging, involving high-dimensional structured data collected at a genomic scale. To respond to the pressing need to analyze such high-dimensional data, the research team proposes to develop powerful statistical and computational tools to model and infer condition-specific gene networks through sparse and structured learning of multiple precision matrices, as in time-varying gene network analyses of microarray data. The approach will be generalized to regression analysis with covariates and to mixture models with phenotype heterogeneity, e.g., unknown disease subtypes. Statistically, the team will investigate novel penalization or regularization approaches to improve the accuracy and efficiency of estimating multiple large precision matrices describing pairwise partial correlations in Gaussian graphical models and Gaussian mixture models. Computationally, innovative strategies will be explored based on state-of-the-art optimization techniques, particularly difference convex programming, the augmented Lagrangian method, and the method of coordinate descent.
Specific aims include: a) developing computational tools for inferring multiple precision matrices, especially when the size of a matrix greatly exceeds the number of samples; b) developing regression approaches for sparse as well as structured learning to associate partial correlations with covariates of interest; c) developing mixture models to infer gene dysregulation in the presence of unknown disease subtypes, and to discover novel disease subtypes; d) applying the developed methods to analyze two microarray datasets for i) inference of condition-specific gene networks for E. coli, and ii) new class discovery and prediction for human endothelial cells; e) developing public-domain software.
This proposed research is expected not only to contribute valuable analysis tools for the elucidation of condition-specific gene networks, but also to advance statistical methodology and theory in Gaussian graphical models and Gaussian mixture models for high-dimensional data.
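As a brief illustration of why precision matrices are the object of interest here: in a Gaussian graphical model, the partial correlation between variables i and j is the negated off-diagonal precision entry scaled by the two diagonal entries, so a zero entry means no edge between the two genes. The sketch below shows only this standard identity, not the proposed estimation methods. The function names and the 3-gene matrix `omega` are hypothetical and chosen purely for illustration.

```python
import math


def partial_correlations(precision):
    """Convert a precision matrix (list of lists) into partial correlations.

    Uses the standard identity rho_ij = -omega_ij / sqrt(omega_ii * omega_jj).
    """
    p = len(precision)
    rho = [[1.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i != j:
                scale = math.sqrt(precision[i][i] * precision[j][j])
                rho[i][j] = -precision[i][j] / scale
    return rho


def network_edges(precision, tol=1e-12):
    """List the pairs (i, j), i < j, whose precision entry is nonzero."""
    p = len(precision)
    return [
        (i, j)
        for i in range(p)
        for j in range(i + 1, p)
        if abs(precision[i][j]) > tol
    ]


# Hypothetical precision matrix for a 3-gene chain: gene 0 - gene 1 - gene 2.
omega = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]

rho = partial_correlations(omega)
edges = network_edges(omega)
```

In this toy example, genes 0 and 2 have a zero precision entry, so they are conditionally independent given gene 1 and no edge joins them. Sparse estimation of the precision matrix therefore amounts to selecting the edges of the gene network.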
Zhang, Yiwei; Pan, Wei (2015) Principal component regression and linear mixed model in association analysis of structured samples: competitors or complements? Genet Epidemiol 39:149-55
Kim, Junghi; Wozniak, Jeffrey R; Mueller, Bryon A et al. (2014) Comparison of statistical tests for group differences in brain functional networks. Neuroimage 101:681-94
Zhang, Yiwei; Xu, Zhiyuan; Shen, Xiaotong et al. (2014) Testing for association with multiple traits in generalized estimation equations, with application to neuroimaging data. Neuroimage 96:309-25
Pan, Wei; Kim, Junghi; Zhang, Yiwei et al. (2014) A powerful and adaptive association test for rare variants. Genetics 197:1081-95
Xu, Zhiyuan; Shen, Xiaotong; Pan, Wei et al. (2014) Longitudinal analysis is more powerful than cross-sectional analysis in detecting genetic association with neuroimaging phenotypes. PLoS One 9:e102312
Zhang, Yiwei; Guan, Weihua; Pan, Wei (2013) Adjustment for population stratification via principal components in association analysis of rare variants. Genet Epidemiol 37:99-109
Zhang, Yiwei; Shen, Xiaotong; Pan, Wei (2013) Adjusting for population stratification in a fine scale with principal components and sequencing data. Genet Epidemiol 37:787-801
Pan, Wei; Shen, Xiaotong; Liu, Binghui (2013) Cluster Analysis: Unsupervised Learning via Supervised Learning with a Non-convex Penalty. J Mach Learn Res 14:1865
Zhu, Yunzhang; Shen, Xiaotong; Pan, Wei (2013) Simultaneous grouping pursuit and feature selection over an undirected graph. J Am Stat Assoc 108:713-725
Kim, Sunkyung; Pan, Wei; Shen, Xiaotong (2013) Network-based penalized regression with application to genomic data. Biometrics 69:582-93