In the past few years, we have witnessed a dramatic increase in the amount of data available to biomedical research. One example is the recent advance of high-throughput biotechnologies, which makes it possible to access genome-wide gene expression. To address biomedical issues at the molecular level, it is essential to extract the relevant information from massive data with complex structure. This calls for advanced mechanisms for statistical prediction and inference, especially in genomic discovery and prediction, where the statistical uncertainty involved in the discovery process is high. The proposed research focuses on the development of mixture model-based and large margin approaches in semisupervised and unsupervised learning, motivated by biomedical studies in gene discovery and prediction. In particular, we propose to investigate how to improve the accuracy and efficiency of mixture model-based and large margin learning systems in generalization. In addition, we will develop innovative methods that take the structure of sparseness and the grouping effect into account to battle the curse of dimensionality, and blend them with the new learning tools. A number of technical issues will be investigated, including: a) developing model selection criteria and performing automatic feature selection, especially when the number of features greatly exceeds the number of samples; b) developing large margin approaches for multi-class learning, with most effort directed towards sparse as well as structured learning; c) implementing efficient computation for real-time applications; d) analyzing two biological datasets for i) gene function discovery and prediction for E. coli, and ii) new class discovery and prediction for BOEC samples; and e) developing public-domain software. Furthermore, computational strategies will be explored based on global optimization techniques, particularly convex programming and difference convex programming.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM081535-04
Application #
7881671
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Gaillard, Shawn R
Project Start
2007-07-15
Project End
2011-06-30
Budget Start
2010-07-01
Budget End
2011-06-30
Support Year
4
Fiscal Year
2010
Total Cost
$264,640
Indirect Cost
Name
University of Minnesota Twin Cities
Department
Biostatistics & Other Math Sci
Type
Schools of Arts and Sciences
DUNS #
555917996
City
Minneapolis
State
MN
Country
United States
Zip Code
55455
Liu, Binghui; Shen, Xiaotong; Pan, Wei (2016) Nonlinear Joint Latent Variable Models and Integrative Tumor Subtype Discovery. Stat Anal Data Min 9:106-116
Liu, Binghui; Shen, Xiaotong; Pan, Wei (2016) Integrative and regularized principal component analysis of multiple sources of data. Stat Med 35:2235-50
Gao, Chen; Zhu, Yunzhang; Shen, Xiaotong et al. (2016) Estimation of multiple networks in Gaussian mixture models. Electron J Stat 10:1133-1154
Wei, Peng; Cao, Ying; Zhang, Yiwei et al. (2016) On Robust Association Testing for Quantitative Traits and Rare Variants. G3 (Bethesda) 6:3941-3950
Austin, Erin; Shen, Xiaotong; Pan, Wei (2015) A Novel Statistic for Global Association Testing Based on Penalized Regression. Genet Epidemiol 39:415-26
Kim, Junghi; Pan, Wei; Alzheimer's Disease Neuroimaging Initiative (2015) Highly adaptive tests for group differences in brain functional connectivity. Neuroimage Clin 9:625-39
Kim, Junghi; Wozniak, Jeffrey R; Mueller, Bryon A et al. (2015) Testing group differences in brain functional connectivity: using correlations or partial correlations? Brain Connect 5:214-31
Pan, Wei; Kwak, Il-Youp; Wei, Peng (2015) A Powerful Pathway-Based Adaptive Test for Genetic Association with Common or Rare Variants. Am J Hum Genet 97:86-98
Pan, Wei; Chen, Yue-Ming; Wei, Peng (2015) Testing for polygenic effects in genome-wide association studies. Genet Epidemiol 39:306-16
Zhang, Yiwei; Pan, Wei (2015) Principal component regression and linear mixed model in association analysis of structured samples: competitors or complements? Genet Epidemiol 39:149-55

Showing the most recent 10 out of 60 publications