Flexible statistical machine learning techniques for cancer-related data

Liu, Yufeng; Fuentes, Montserrat

Abstract

Gene expression provides a snapshot of the cellular changes that promote tumor malignancy. Quantitative gene expression analysis, especially as implemented by DNA microarrays, has identified many important new cancer-related genes and led to the development of new genomic-based clinical tests. On the quantitative side, many statistical methods have been used to study human tumors and to classify them into groups that can be used to predict clinical behavior. Despite this progress, rapid advances in technology are generating massive and complex data in cancer research, and analyzing such data is increasingly challenging. These challenges call for novel statistical learning methods, especially for high-dimensional and noisy data. The goal of this project is to develop a host of new statistical learning techniques for solving complicated learning problems. In particular, this project develops (1) novel techniques to assess the statistical significance of clustering for high-dimensional data; (2) several novel predictive models, including classification and regression models, which are expected to yield highly competitive accuracy and interpretability; (3) new methods for high-dimensional biomarker/variable selection; and (4) new approaches to estimating high-dimensional covariance/precision matrices for biological network construction. These new developments are expected to allow scientists to analyze complex cancer genomic data with high prediction accuracy and increased interpretability. The research team will apply the proposed techniques to cancer research data analysis. The success of this project will be important in bridging statistical machine learning and cancer research.

Public Health Relevance

This project aims to develop a host of new statistical learning techniques for solving complicated learning problems, especially problems involving high-dimensional and noisy data such as gene expression data. These new techniques are expected to allow scientists to analyze complex cancer genomic data with high prediction accuracy and increased interpretability.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Research Project (R01)
Project #: 5R01CA149569-03
Application #: 8204935
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Li, Jerry

Project Start: 2010-02-01
Project End: 2014-12-31
Budget Start: 2012-01-01
Budget End: 2012-12-31
Support Year: 3
Fiscal Year: 2012
Total Cost: $292,488
Indirect Cost: $65,673

Institution

Name: University of North Carolina Chapel Hill
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 608195277

City: Chapel Hill
State: NC
Country: United States
Zip Code: 27599

Related projects


NIH 2014 R01 CA	Flexible statistical machine learning techniques for cancer-related data Liu, Yufeng; Wu, Yichao / University of North Carolina Chapel Hill	$263,142
NIH 2013 R01 CA	Flexible statistical machine learning techniques for cancer-related data Liu, Yufeng; Wu, Yichao / University of North Carolina Chapel Hill	$274,890
NIH 2012 R01 CA	Flexible statistical machine learning techniques for cancer-related data Liu, Yufeng; Fuentes, Montserrat / University of North Carolina Chapel Hill	$292,488
NIH 2011 R01 CA	Flexible statistical machine learning techniques for cancer-related data Liu, Yufeng; Wu, Yichao / University of North Carolina Chapel Hill	$292,538
NIH 2010 R01 CA	Flexible statistical machine learning techniques for cancer-related data Liu, Yufeng; Wu, Yichao / University of North Carolina Chapel Hill	$313,635

Publications

Kimes, Patrick K; Liu, Yufeng; Hayes, D Neil et al. (2017) Statistical significance for hierarchical clustering. Biometrics 73:811-821

White, Kyle R; Stefanski, Leonard A; Wu, Yichao (2017) Variable Selection in Kernel Regression Using Measurement Error Selection Likelihoods. J Am Stat Assoc 112:1587-1597

Hu, Hao; Yao, Weixin; Wu, Yichao (2017) The Robust EM-type Algorithms for Log-concave Mixtures of Regression Models. Comput Stat Data Anal 111:14-26

Zhang, Chong; Liu, Yufeng; Wang, Junhui et al. (2016) Reinforced Angle-based Multicategory Support Vector Machines. J Comput Graph Stat 25:806-825

Zhang, Xiang; Wu, Yichao; Wang, Lan et al. (2016) A Consistent Information Criterion for Support Vector Machines in Diverging Model Spaces. J Mach Learn Res 17:1-26

Zhang, Chong; Liu, Yufeng (2016) Comments on: Probability Enhanced Effective Dimension Reduction for Classifying Sparse Functional Data. Test (Madr) 25:44-46

Chen, Guanhua; Liu, Yufeng; Shen, Dinggang et al. (2016) Composite large margin classifiers with latent subclasses for heterogeneous biomedical data. Stat Anal Data Min 9:75-88

Zhang, Chong; Liu, Yufeng; Wu, Yichao (2016) On Quantile Regression in Reproducing Kernel Hilbert Spaces with Data Sparsity Constraint. J Mach Learn Res 17:1-45

Hu, Hao; Wu, Yichao; Yao, Weixin (2016) Maximum likelihood estimation of the mixture of log-concave densities. Comput Stat Data Anal 101:137-147

Shin, Sunyoung; Fine, Jason; Liu, Yufeng (2016) Adaptive Estimation with Partially Overlapping Models. Stat Sin 26:235-253

Showing the most recent 10 out of 67 publications
