New Machine Learning Tools for Biomedical Data

Shen, Xiaotong; Pan, Wei

Abstract

With recent biotechnology advances, biomedical investigations have become more computationally complex and challenging, involving high-dimensional structured data collected at a genomic scale. To respond to the pressing need to analyze such high-dimensional data, the research team proposes to develop powerful statistical and computational tools to model and infer condition-specific gene networks through sparse and structured learning of multiple precision matrices, for example in time-varying gene network analyses of microarray data. The approach will be generalized to regression analysis with covariates and to mixture models with phenotype heterogeneity, e.g., unknown disease subtypes. Statistically, the team will investigate novel penalization or regularization approaches to improve the accuracy and efficiency of estimating multiple large precision matrices describing pairwise partial correlations in Gaussian graphical models and Gaussian mixture models. Computationally, innovative strategies will be explored based on state-of-the-art optimization techniques, particularly difference convex programming, the augmented Lagrangian method, and the method of coordinate descent.
Specific aims include: a) developing computational tools for inferring multiple precision matrices, especially when the size of a matrix greatly exceeds the number of samples; b) developing regression approaches for sparse as well as structured learning to associate partial correlations with covariates of interest; c) developing mixture models to infer gene dysregulation in the presence of unknown disease subtypes, and to discover novel disease subtypes; d) applying the developed methods to analyze two microarray datasets for i) inference of condition-specific gene networks for E. coli, and ii) new class discovery and prediction for human endothelial cells; e) developing public-domain software.
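
The abstract relies on the link between a precision matrix (the inverse covariance matrix) and pairwise partial correlations in a Gaussian graphical model. The following is a minimal illustrative sketch of that link only, not the project's method or software: it inverts a small, known covariance matrix directly and applies the standard identity rho_ij = -omega_ij / sqrt(omega_ii * omega_jj). The function names and the three-variable chain example are hypothetical, chosen for illustration.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Pick the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(precision):
    """Convert a precision matrix into a matrix of pairwise partial correlations."""
    n = len(precision)
    out = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                scale = math.sqrt(precision[i][i] * precision[j][j])
                out[i][j] = -precision[i][j] / scale
    return out


# Hypothetical chain X1 - X2 - X3: X1 and X3 are correlated marginally (r^2)
# but conditionally independent given X2.
r = 0.5
covariance = [[1.0, r, r * r], [r, 1.0, r], [r * r, r, 1.0]]
precision = invert(covariance)
pcor = partial_correlations(precision)

for row in pcor:
    print(["%.4f" % v for v in row])
```

In this example the (1,3) entry of the precision matrix is zero, so the corresponding edge is absent from the network even though the marginal correlation is nonzero. Direct inversion as above is only possible when the covariance matrix is nonsingular; when the number of genes exceeds the number of samples, the sample covariance matrix is singular, which is the setting in which the penalized estimation described in the abstract is needed.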

Public Health Relevance

This proposed research is expected not only to contribute valuable analysis tools for the elucidation of condition-specific gene networks, but also to advance statistical methodology and theory in Gaussian graphical models and Gaussian mixture models for high-dimensional data.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM081535-06
Application #: 8281439
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Brazhnik, Paul

Project Start: 2007-07-15
Project End: 2015-06-30
Budget Start: 2012-07-01
Budget End: 2013-06-30
Support Year: 6
Fiscal Year: 2012
Total Cost: $294,260
Indirect Cost: $89,260

Institution

Name: University of Minnesota Twin Cities
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 555917996

City: Minneapolis
State: MN
Country: United States
Zip Code: 55455

Related projects


NIH 2014 R01 GM	New Machine Learning Tools for Biomedical Data Shen, Xiaotong; Pan, Wei / University of Minnesota Twin Cities
NIH 2013 R01 GM	New Machine Learning Tools for Biomedical Data Shen, Xiaotong; Pan, Wei / University of Minnesota Twin Cities	$283,518
NIH 2012 R01 GM	New Machine Learning Tools for Biomedical Data Shen, Xiaotong; Pan, Wei / University of Minnesota Twin Cities	$294,260
NIH 2011 R01 GM	New Machine Learning Tools for Biomedical Data Shen, Xiaotong; Pan, Wei / University of Minnesota Twin Cities	$290,523
NIH 2010 R01 GM	New Machine Learning Methods for Biomedical Data Shen, Xiaotong / University of Minnesota Twin Cities	$264,640
NIH 2009 R01 GM	New Machine Learning Methods for Biomedical Data Shen, Xiaotong / University of Minnesota Twin Cities	$267,801
NIH 2008 R01 GM	New Machine Learning Methods for Biomedical Data Shen, Xiaotong / University of Minnesota Twin Cities	$268,274
NIH 2007 R01 GM	New Machine Learning Methods for Biomedical Data Shen, Xiaotong / University of Minnesota Twin Cities	$266,852

Publications

Gao, Chen; Zhu, Yunzhang; Shen, Xiaotong et al. (2016) Estimation of multiple networks in Gaussian mixture models. Electron J Stat 10:1133-1154

Wei, Peng; Cao, Ying; Zhang, Yiwei et al. (2016) On Robust Association Testing for Quantitative Traits and Rare Variants. G3 (Bethesda) 6:3941-3950

Liu, Binghui; Shen, Xiaotong; Pan, Wei (2016) Nonlinear Joint Latent Variable Models and Integrative Tumor Subtype Discovery. Stat Anal Data Min 9:106-116

Liu, Binghui; Shen, Xiaotong; Pan, Wei (2016) Integrative and regularized principal component analysis of multiple sources of data. Stat Med 35:2235-50

Austin, Erin; Shen, Xiaotong; Pan, Wei (2015) A Novel Statistic for Global Association Testing Based on Penalized Regression. Genet Epidemiol 39:415-26

Kim, Junghi; Pan, Wei; Alzheimer's Disease Neuroimaging Initiative (2015) Highly adaptive tests for group differences in brain functional connectivity. Neuroimage Clin 9:625-39

Kim, Junghi; Wozniak, Jeffrey R; Mueller, Bryon A et al. (2015) Testing group differences in brain functional connectivity: using correlations or partial correlations? Brain Connect 5:214-31

Pan, Wei; Kwak, Il-Youp; Wei, Peng (2015) A Powerful Pathway-Based Adaptive Test for Genetic Association with Common or Rare Variants. Am J Hum Genet 97:86-98

Pan, Wei; Chen, Yue-Ming; Wei, Peng (2015) Testing for polygenic effects in genome-wide association studies. Genet Epidemiol 39:306-16

Zhang, Yiwei; Pan, Wei (2015) Principal component regression and linear mixed model in association analysis of structured samples: competitors or complements? Genet Epidemiol 39:149-55

Showing the most recent 10 out of 60 publications