Recent technological advances have generated data of unprecedented size and complexity across scientific fields. To analyze such data, the principal investigator (PI) aims to develop new statistical methodologies and proposes to study four interrelated research topics. First, the PI focuses on large-margin classification and proposes new large-margin classifiers that deliver competitive classification and conditional class probability estimation. The PI also proposes to address whether a soft or a hard classifier is preferable for a particular classification task, and how to incorporate estimated conditional class probabilities to improve dimension reduction for data with a categorical response. Second, the PI proposes an extension of least angle regression to generalized linear models and, more generally, to strictly convex optimization problems. The new solution path is given piecewise by systems of ordinary differential equations and can be slightly modified to obtain the corresponding LASSO-regularized solution path. Third, data with a sparse and irregular functional predictor are considered. New response-based dimension reduction methods using cumulative slicing are proposed for such data, along with a viable scheme for extending large-margin classifiers to analyze them. Fourth, the PI focuses on semi-parametric multi-index regression. Noting that the Hessian operator automatically filters out the effect of the linear component, the PI provides a direct scheme for estimating the space spanned by the multiple indices. Unlike existing approaches, the new scheme does not require estimating the nonparametric link while estimating that space.
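
The Hessian argument in the fourth topic can be illustrated with a short derivation. The abstract does not state the model, so the partially linear multi-index form below is an assumption, as are the symbols \beta, B, g, m, H, and M. It is a sketch of why second derivatives remove a linear term, not the PI's estimator.

```latex
% Assumed model (not stated in the abstract): a linear component plus an
% unknown smooth link g acting on d indices B^T X, with B a p x d matrix.
\[
  Y = \beta^{\top} X + g(B^{\top} X) + \varepsilon,
  \qquad \mathbb{E}[\varepsilon \mid X] = 0 .
\]
% Regression function and its first two derivatives in x:
\[
  m(x) = \mathbb{E}[Y \mid X = x] = \beta^{\top} x + g(B^{\top} x),
  \qquad
  \nabla m(x) = \beta + B\, \nabla g(B^{\top} x),
\]
\[
  H(x) = \nabla^{2} m(x) = B\, \nabla^{2} g(B^{\top} x)\, B^{\top} .
\]
% The linear term \beta^T x has zero second derivative, so it drops out of H(x),
% and the column space of H(x) lies in span(B) for every x. Averaging:
\[
  M = \mathbb{E}\!\left[ H(X)^{2} \right]
    = B\, \mathbb{E}\!\left[ G\, B^{\top} B\, G \right] B^{\top},
  \qquad G = \nabla^{2} g(B^{\top} X),
\]
% so range(M) is contained in span(B), with equality when B has full column
% rank and the d x d middle matrix is nonsingular.
```

Under these assumptions, the leading eigenvectors of an estimate of M would span the index space, and only the Hessian of the regression function is needed rather than the link g itself.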

The proposed statistical methodology innovations are applicable in a variety of fields. For example, the proposed large-margin classifiers can be applied to genomic data with a categorical response such as cancer type; the new ordinary differential equation based solution path algorithms can be used to analyze survival or binary genomic data and identify important predictors; and the newly proposed statistical methods for sparse and irregular functional data will be useful in analyzing longitudinal data on aging. To facilitate the use of the proposed methods, the PI will implement them in R or Matlab and make the software publicly available along with the corresponding research reports. The success of the proposed research will help to improve public health.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1055210
Program Officer: Gabor Szekely
Project Start:
Project End:
Budget Start: 2011-09-01
Budget End: 2017-12-31
Support Year:
Fiscal Year: 2010
Total Cost: $400,000
Indirect Cost:
Name: North Carolina State University Raleigh
Department:
Type:
DUNS #:
City: Raleigh
State: NC
Country: United States
Zip Code: 27695