The investigator develops new methodologies for sufficient dimension reduction, covering both linear and nonlinear dimension reduction problems. A method developed recently by the investigator and his collaborators uses classic two-class Support Vector Machine algorithms to build a new class of algorithms for linear and nonlinear sufficient dimension reduction under a unified framework. In this work the investigator extends this methodology in several directions. First, different extensions of the classic Support Vector Machine algorithms are used to improve the performance and the asymptotic properties of the original method. Second, the method is extended to algorithms that allow for multi-class classification. Third, the investigator develops new method-specific and method-free variable selection methodologies for sufficient dimension reduction techniques based on ideas in the machine learning literature. Finally, new algorithms for determining the order of the dimension reduction space, based on the new methodology, are developed.
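To make the idea concrete, the following is a minimal numpy-only sketch in the spirit of the SVM-based approach described above: the response is dichotomized at several quantiles, a linear SVM is fitted for each split on standardized predictors, and the leading eigenvectors of the aggregated outer products of the SVM normal vectors estimate the reduction directions. The quantile dichotomization, the Pegasos-style subgradient solver, the regularization constant, and the toy model are all illustrative assumptions, not the investigator's actual algorithm.

```python
import numpy as np

def fit_linear_svm(Z, labels, lam=1e-2, n_iter=500):
    """Soft-margin linear SVM via full-batch Pegasos-style subgradient
    descent; labels must be in {-1, +1}.  Returns (normal vector, bias)."""
    n, p = Z.shape
    w, b = np.zeros(p), 0.0
    for t in range(1, n_iter + 1):
        eta = 1.0 / (lam * t)
        viol = labels * (Z @ w + b) < 1.0          # margin violators
        w = (1.0 - eta * lam) * w + (eta / n) * (labels[viol] @ Z[viol])
        b += (eta / n) * labels[viol].sum()
    return w, b

def svm_sdr_directions(X, y, n_slices=5, n_dirs=1):
    """Dichotomize y at several quantiles, fit one linear SVM per split
    on standardized predictors, and take the leading eigenvectors of the
    aggregated outer products of the SVM normal vectors."""
    n, p = X.shape
    mu = X.mean(axis=0)
    evals, V = np.linalg.eigh(np.cov(X, rowvar=False))
    Sinv_half = V @ np.diag(evals ** -0.5) @ V.T   # Sigma^{-1/2}
    Z = (X - mu) @ Sinv_half                       # standardized predictors

    M = np.zeros((p, p))
    for q in np.quantile(y, np.linspace(0, 1, n_slices + 1)[1:-1]):
        labels = np.where(y > q, 1.0, -1.0)
        if labels.min() == labels.max():
            continue                               # skip a degenerate split
        w, _ = fit_linear_svm(Z, labels)
        M += np.outer(w, w)

    # Leading eigenvectors of M, mapped back to the original X scale.
    _, evecs = np.linalg.eigh(M)
    B = Sinv_half @ evecs[:, ::-1][:, :n_dirs]
    return B / np.linalg.norm(B, axis=0)

# Toy single-index model: y depends on X only through (1, 1, 0, 0)/sqrt(2).
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 4))
y = X[:, 0] + X[:, 1] + 0.1 * rng.standard_normal(500)
B = svm_sdr_directions(X, y)
```

On the toy model the single estimated column of `B` aligns (up to sign) with the true direction, illustrating how classification hyperplanes for a continuous response can recover a regression's dimension reduction subspace.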

Recent advances in computing have increased processing power and, consequently, the capacity to store large datasets efficiently. To analyze such datasets effectively, many sciences, such as Biology, Meteorology, Genetics, and Economics, need new techniques that reduce the dimensionality of the data. This work creates new algorithms that reduce the dimensionality of large datasets effectively, for both linear and nonlinear relationships between variables. These techniques transform a high-dimensional regression or classification problem into a lower-dimensional one, which helps identify hidden relationships among variables. The methodology being developed will be an efficient tool for scientists working with large datasets, and it will open new research frontiers for statisticians developing new ideas in the area of dimension reduction.

Project Report

In an era where high-dimensional datasets appear in all aspects of our lives and in almost every field of research, statisticians need efficient tools to summarize the information in these massive datasets. This project set out to attack the problem from a theoretical perspective and to give practitioners efficient tools for dimension reduction, mainly in a regression setting. Our effort was to combine two different areas of Statistics, dimension reduction and machine learning, to improve the performance of existing algorithms in the literature.

Although this project ended prematurely, in the single year that it ran we were able to produce important results. First, we developed an algorithm for sufficient dimension reduction that uses a reweighted SVM to take advantage of the imbalanced nature of the distribution of the data. Second, we established a connection between supervised and unsupervised dimension reduction by proving that the best-known algorithm for unsupervised dimension reduction, Principal Component Analysis, performs well in most cases under models widely used in supervised dimension reduction settings. We will continue to investigate and develop appropriate algorithms to satisfy the objectives of the project in the coming months and years, as we find this area of great interest to researchers in many fields (Earth Sciences, Engineering, Biosciences, Medical Sciences, etc.). Some developments still at an early stage suggest that existing methodologies have specific theoretical weaknesses that can be overcome by appropriately redefining the problem, and this is where future research will focus.
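The reweighting idea mentioned above can be sketched as follows: when a dichotomized response puts few observations on one side of the cut, weighting each observation's hinge loss inversely to its class frequency stops the majority class from dominating the fitted hyperplane. This is a minimal numpy-only illustration under assumed weights, cut point, and solver; the project's actual reweighting scheme is not specified here.

```python
import numpy as np

def fit_weighted_svm(Z, labels, lam=1e-2, n_iter=500):
    """Soft-margin linear SVM via full-batch subgradient descent, with each
    observation's hinge loss weighted inversely to its class frequency so
    both classes contribute equal total weight.  labels must be in {-1, +1}."""
    n, p = Z.shape
    w_pos = 0.5 * n / max((labels > 0).sum(), 1)
    w_neg = 0.5 * n / max((labels < 0).sum(), 1)
    obs_w = np.where(labels > 0, w_pos, w_neg)     # inverse-frequency weights
    w, b = np.zeros(p), 0.0
    for t in range(1, n_iter + 1):
        eta = 1.0 / (lam * t)
        viol = labels * (Z @ w + b) < 1.0          # margin violators
        w = (1.0 - eta * lam) * w + (eta / n) * ((obs_w[viol] * labels[viol]) @ Z[viol])
        b += (eta / n) * (obs_w[viol] * labels[viol]).sum()
    return w, b

# Imbalanced split: cut y at its 10% quantile (roughly 50 vs 450 points).
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 4))
y = X[:, 0] + X[:, 1] + 0.1 * rng.standard_normal(500)
labels = np.where(y > np.quantile(y, 0.10), 1.0, -1.0)
w, _ = fit_weighted_svm(X - X.mean(axis=0), labels)
direction = w / np.linalg.norm(w)
```

Even on this badly imbalanced split, the fitted normal vector still points (up to sign) along the true regression direction (1, 1, 0, 0)/sqrt(2).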

Agency
National Science Foundation (NSF)
Institute
Division of Mathematical Sciences (DMS)
Application #
1207651
Program Officer
Gabor Szekely
Project Start
Project End
Budget Start
2012-09-15
Budget End
2013-08-31
Support Year
Fiscal Year
2012
Total Cost
$19,385
Indirect Cost
Name
Michigan Technological University
Department
Type
DUNS #
City
Houghton
State
MI
Country
United States
Zip Code
49931