Gene expression provides a snapshot of the cellular changes that promote tumor malignancy. Quantitative gene expression analysis, especially as implemented with DNA microarrays, has identified many important new cancer-related genes and led to the development of new genomics-based clinical tests. Many statistical methods have been used to analyze gene expression in human tumors and to classify tumors into groups that can be used to predict clinical behavior. Despite this progress, rapid advances in technology are generating massive and complex data in cancer research, and analyzing such data is becoming increasingly challenging. These challenges call for novel statistical learning methods, especially for high-dimensional and noisy data. The goal of this project is to develop a host of new statistical learning techniques for solving complicated learning problems. In particular, this project develops (1) novel techniques to assess the statistical significance of clustering for high-dimensional data; (2) several novel predictive models, including classification and regression models, which are expected to yield highly competitive accuracy and interpretability; (3) new methods for high-dimensional biomarker/variable selection; and (4) new approaches to estimating high-dimensional covariance/precision matrices for biological network construction. These developments are expected to allow scientists to analyze complex cancer genomic data with high prediction accuracy and improved interpretability. The research team will apply the proposed techniques to the analysis of cancer research data. The success of this project will be important in bridging statistical machine learning and cancer research.
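Aim (4) relies on a standard property of Gaussian graphical models: under a multivariate normal model, a zero entry in the precision matrix Ω = Σ⁻¹ means two genes are conditionally independent given all the others, so nonzero partial correlations ρ_ij = −Ω_ij / √(Ω_ii Ω_jj) can be read as network edges. The sketch below is a generic illustration of that idea, not the project's estimator. It adds a ridge term λI to the sample covariance so the inverse exists even when genes outnumber samples, then keeps only partial correlations whose magnitude exceeds a threshold. The function name `partial_correlation_network`, the default λ = 0.01, and the threshold 0.3 are hypothetical choices made for illustration.

```python
import numpy as np


def partial_correlation_network(X, ridge=0.01, threshold=0.3):
    """Illustrative sketch: ridge-regularized precision matrix -> partial
    correlations -> thresholded edge list.

    X: (n_samples, p_genes) expression matrix.
    ridge, threshold: hypothetical tuning values, not taken from the project.
    """
    Xc = X - X.mean(axis=0)                      # center each gene
    S = Xc.T @ Xc / (X.shape[0] - 1)             # p x p sample covariance
    # Adding ridge * I makes the matrix invertible even when p > n.
    omega = np.linalg.inv(S + ridge * np.eye(S.shape[0]))
    d = np.sqrt(np.diag(omega))
    # Partial correlation: rho_ij = -omega_ij / sqrt(omega_ii * omega_jj)
    pcor = -omega / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    p = pcor.shape[0]
    # Keep gene pairs whose partial correlation is large in magnitude.
    edges = [(i, j) for i in range(p) for j in range(i + 1, p)
             if abs(pcor[i, j]) > threshold]
    return pcor, edges
```

As a quick check, simulate a chain in which gene 1 depends on gene 0, gene 2 depends on gene 1, and gene 3 is independent noise. Genes 0 and 2 are correlated but conditionally independent given gene 1, so the thresholded partial correlations should recover only the chain edges (0, 1) and (1, 2). A larger ridge term shrinks the estimates more and can leak spurious partial correlation between genes 0 and 2, which is why λ and the threshold need to be tuned in practice.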

Public Health Relevance

This project aims to develop a host of new statistical learning techniques for solving complicated learning problems, especially problems with high-dimensional and noisy data such as gene expression data. These new techniques are expected to allow scientists to analyze complex cancer genomic data with high prediction accuracy and improved interpretability.

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Research Project (R01)
Project #
5R01CA149569-03
Application #
8204935
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Li, Jerry
Project Start
2010-02-01
Project End
2014-12-31
Budget Start
2012-01-01
Budget End
2012-12-31
Support Year
3
Fiscal Year
2012
Total Cost
$292,488
Indirect Cost
$65,673
Name
University of North Carolina Chapel Hill
Department
Biostatistics & Other Math Sci
Type
Schools of Arts and Sciences
DUNS #
608195277
City
Chapel Hill
State
NC
Country
United States
Zip Code
27599
Zhou, Hua; Wu, Yichao (2014) A Generic Path Algorithm for Regularized Statistical Estimation. J Am Stat Assoc 109:686-699
Stefanski, L A; Wu, Yichao; White, Kyle (2014) Variable Selection in Nonparametric Classification via Measurement Error Model Selection Likelihoods. J Am Stat Assoc 109:574-589
Kruppa, Jochen; Liu, Yufeng; Biau, Gérard et al. (2014) Probability estimation with machine learning methods for dichotomous and multicategory outcome: theory. Biom J 56:534-63
Kruppa, Jochen; Liu, Yufeng; Diener, Hans-Christian et al. (2014) Probability estimation with machine learning methods for dichotomous and multicategory outcome: applications. Biom J 56:564-83
Shin, Seung Jun; Wu, Yichao; Zhang, Hao Helen (2014) Two-Dimensional Solution Surface for Weighted Support Vector Machines. J Comput Graph Stat 23:383-402
Shin, Seung Jun; Wu, Yichao (2014) Variable selection in large margin classifier-based probability estimation with high-dimensional predictors. Biom J 56:594-6
Kimes, Patrick K; Cabanski, Christopher R; Wilkerson, Matthew D et al. (2014) SigFuge: single gene clustering of RNA-seq reveals differential isoform usage among cancer samples. Nucleic Acids Res 42:e113
Wu, Shuang; Xue, Hongqi; Wu, Yichao et al. (2014) Variable Selection for Sparse High-Dimensional Nonlinear Regression Models by Combining Nonnegative Garrote and Sure Independence Screening. Stat Sin 24:1365-1387
Ha, Min Jin; Sun, Wei (2014) Partial correlation matrix estimation using ridge penalty followed by thresholding and re-estimation. Biometrics 70:765-73
Lee, Myung Hee; Liu, Yufeng (2013) Kernel Continuum Regression. Comput Stat Data Anal 68:190-201

Showing the most recent 10 out of 31 publications