With recent advances in biotechnology, biomedical investigations have become more computationally complex and challenging, involving high-dimensional structured data collected at a genomic scale. To respond to the pressing need to analyze such high-dimensional data, the research team proposes to develop powerful statistical and computational tools to model and infer condition-specific gene networks through sparse and structured learning of multiple precision matrices, as in time-varying gene network analyses with microarray data. The approach will be generalized to regression analysis with covariates and to mixture models with phenotype heterogeneity, e.g., unknown disease subtypes. Statistically, the team will investigate novel penalization or regularization approaches to improve the accuracy and efficiency of estimating multiple large precision matrices describing pairwise partial correlations in Gaussian graphical models and Gaussian mixture models. Computationally, innovative strategies will be explored based on state-of-the-art optimization techniques, particularly difference convex programming, the augmented Lagrangian method, and the method of coordinate descent.
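To make concrete how a precision matrix describes pairwise partial correlations, the following minimal sketch converts a small precision matrix into partial correlations using the standard identity rho_ij = -omega_ij / sqrt(omega_ii * omega_jj). The three-gene chain network, the matrix values, and the function name `partial_correlations` are hypothetical and chosen only for illustration; this is not the estimation method proposed by the team, which concerns learning such matrices from data.

```python
import math


def partial_correlations(precision):
    """Return the partial correlation matrix implied by a precision matrix.

    Uses rho_ij = -omega_ij / sqrt(omega_ii * omega_jj) for i != j,
    with ones on the diagonal by convention.
    """
    p = len(precision)
    pcor = [[1.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i != j:
                scale = math.sqrt(precision[i][i] * precision[j][j])
                pcor[i][j] = -precision[i][j] / scale
    return pcor


# Hypothetical precision matrix for a chain: gene 0 - gene 1 - gene 2.
# The zero in position (0, 2) encodes the absence of an edge.
omega = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]

# Covariance matrix corresponding to omega (its matrix inverse).
sigma = [
    [0.75, 0.5, 0.25],
    [0.5, 1.0, 0.5],
    [0.25, 0.5, 0.75],
]

pcor = partial_correlations(omega)
for row in pcor:
    print(row)
```

In this toy example, genes 0 and 2 have zero partial correlation even though their marginal covariance (`sigma[0][2]`) is nonzero: they are related only through gene 1. This is why sparsity in the precision matrix, rather than in the covariance matrix, corresponds to missing edges in a Gaussian graphical model.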
Specific aims include: a) developing computational tools for inferring multiple precision matrices, especially when the size of a matrix greatly exceeds that of samples; b) developing regression approaches for sparse as well as structured learning to associate partial correlations with covariates of interest; c) developing mixture models to infer gene dysregulation in the presence of unknown disease subtypes, and to discover novel disease subtypes; d) applying the developed methods to analyze two microarray datasets for i) inference of condition-specific gene networks for E. coli, and ii) new class discovery and prediction for human endothelial cells; e) developing public-domain software.
This proposed research is expected not only to contribute valuable analysis tools for the elucidation of condition-specific gene networks, but also to advance statistical methodology and theory in Gaussian graphical models and Gaussian mixture models for high-dimensional data.
Liu, Binghui; Shen, Xiaotong; Pan, Wei (2016) Nonlinear Joint Latent Variable Models and Integrative Tumor Subtype Discovery. Stat Anal Data Min 9:106-116
Liu, Binghui; Shen, Xiaotong; Pan, Wei (2016) Integrative and regularized principal component analysis of multiple sources of data. Stat Med 35:2235-50
Gao, Chen; Zhu, Yunzhang; Shen, Xiaotong et al. (2016) Estimation of multiple networks in Gaussian mixture models. Electron J Stat 10:1133-1154
Wei, Peng; Cao, Ying; Zhang, Yiwei et al. (2016) On Robust Association Testing for Quantitative Traits and Rare Variants. G3 (Bethesda) 6:3941-3950
Kim, Junghi; Pan, Wei; Alzheimer's Disease Neuroimaging Initiative (2015) A cautionary note on using secondary phenotypes in neuroimaging genetic studies. Neuroimage 121:136-45
Zhang, Yongli; Shen, Xiaotong (2015) Adaptive Modeling Procedure Selection by Data Perturbation. J Bus Econ Stat 33:541-551
Austin, Erin; Shen, Xiaotong; Pan, Wei (2015) A Novel Statistic for Global Association Testing Based on Penalized Regression. Genet Epidemiol 39:415-26
Kim, Junghi; Pan, Wei; Alzheimer's Disease Neuroimaging Initiative (2015) Highly adaptive tests for group differences in brain functional connectivity. Neuroimage Clin 9:625-39
Kim, Junghi; Wozniak, Jeffrey R; Mueller, Bryon A et al. (2015) Testing group differences in brain functional connectivity: using correlations or partial correlations? Brain Connect 5:214-31
Pan, Wei; Kwak, Il-Youp; Wei, Peng (2015) A Powerful Pathway-Based Adaptive Test for Genetic Association with Common or Rare Variants. Am J Hum Genet 97:86-98