With recent biotechnology advances, biomedical investigations have become computationally more complex and more challenging, involving high-dimensional structured data collected at a genomic scale. To respond to the pressing need to analyze such high-dimensional data, the research team proposes to develop powerful statistical and computational tools to model and infer condition-specific gene networks through sparse and structured learning of multiple precision matrices, for example in time-varying gene network analyses with microarray data. The approach will be generalized to regression analysis with covariates and to mixture models with phenotype heterogeneity, e.g., unknown disease subtypes. Statistically, the team will investigate novel penalization or regularization approaches to improve the accuracy and efficiency of estimating multiple large precision matrices describing pairwise partial correlations in Gaussian graphical models and Gaussian mixture models. Computationally, innovative strategies will be explored based on state-of-the-art optimization techniques, particularly difference convex programming, the augmented Lagrangian method, and the method of coordinate descent.
Specific aims include: a) developing computational tools for inferring multiple precision matrices, especially when the dimension of a matrix greatly exceeds the sample size; b) developing regression approaches for sparse as well as structured learning to associate partial correlations with covariates of interest; c) developing mixture models to infer gene dysregulation in the presence of unknown disease subtypes, and to discover novel disease subtypes; d) applying the developed methods to analyze two microarray datasets for i) inference of condition-specific gene networks for E. coli, and ii) new class discovery and prediction for human endothelial cells; e) developing public-domain software.
This proposed research is expected not only to contribute valuable analysis tools for the elucidation of condition-specific gene networks, but also to advance statistical methodology and theory in Gaussian graphical models and Gaussian mixture models for high-dimensional data.
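As a small illustration of how a precision matrix encodes a gene network in a Gaussian graphical model, the sketch below converts a precision matrix into pairwise partial correlations using the standard identity rho_ij = -omega_ij / sqrt(omega_ii * omega_jj). It is not the proposed estimation method: the function name partial_correlations and the 3-gene matrix omega are hypothetical, and the matrix is taken as given rather than estimated from data.

```python
import math


def partial_correlations(precision):
    """Convert a precision (inverse covariance) matrix into partial correlations.

    Uses the identity rho_ij = -omega_ij / sqrt(omega_ii * omega_jj) for i != j.
    The diagonal is set to 1.0 by convention.
    """
    p = len(precision)
    rho = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i == j:
                rho[i][j] = 1.0
            else:
                scale = math.sqrt(precision[i][i] * precision[j][j])
                rho[i][j] = -precision[i][j] / scale
    return rho


# Hypothetical 3-gene precision matrix (illustrative values only).
# The zero in position (0, 2) means genes 0 and 2 are conditionally
# independent given gene 1, so no edge joins them in the network.
omega = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]

rho = partial_correlations(omega)

# An edge is present wherever the partial correlation is nonzero.
edges = [(i, j) for i in range(3) for j in range(i + 1, 3) if rho[i][j] != 0.0]
```

Sparsity in the precision matrix corresponds directly to missing edges in the network, which is why sparse estimation of precision matrices is a natural route to network inference when genes far outnumber samples.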