Although high-dimensional data analysis has become one of the most active research areas in statistics, many challenging problems remain unsolved and call for the development of new methods and theory. This project aims to develop new statistical tools and software for statistical modeling and inference on high-dimensional data. The proposed research is expected to significantly enhance the availability of statistical tools and software for the analysis of high-dimensional data, which are frequently collected in many research areas, including genomics, biomedical imaging, functional magnetic resonance imaging, tomography, tumor classification and finance. Hence, the proposed work is expected to benefit a broad range of scientists and researchers in various fields.

Considerable attention has been devoted to high-dimensional estimation and sparsity recovery over the last 10 years, but much less is known about hypothesis testing. In this project, the PIs first plan to develop new projection Hotelling's tests and chi-square tests for high-dimensional one-sample and two-sample mean problems. These tests are distinguished from existing ones in that they are based on projection directions derived to achieve optimal power. The PIs further propose an effective data-driven method that estimates the optimal projection direction through a sample-splitting strategy. The proposed procedure can be easily carried out. They also plan to investigate the estimation of a sparse optimal projection direction via regularization methods. Linear discriminant analysis has been hugely successful in classification, but most existing procedures cannot handle a diverging number of classes. In this project, the PIs also plan to study ultrahigh-dimensional linear discriminant analysis with a diverging number of classes and to develop new procedures that enable researchers to apply low-dimensional linear discriminant analysis techniques to ultrahigh-dimensional problems, making ultrahigh-dimensional linear discriminant analysis with a diverging number of classes computationally feasible in practice. This model and the associated new methodology have high potential for big data analysis. The PIs plan to continue collaborating with engineers, meteorologists, public health science researchers and prevention researchers, and to introduce the proposed methodology to scientists beyond statistics and biostatistics. The PIs plan to disseminate the research results through publications, conference presentations and software distribution.
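The sample-splitting idea described above can be illustrated with a minimal sketch for the one-sample mean problem. This is not the PIs' procedure: the ridge-type direction, the function name `projection_mean_test`, and the parameters `ridge`, `split` and `seed` are all assumptions made for illustration. The direction is estimated on one part of the data, and the held-out part is projected onto it and tested with an ordinary one-sample t-test.

```python
import numpy as np
from scipy import stats


def projection_mean_test(X, ridge=1.0, split=0.5, seed=0):
    """Test H0: mean = 0 for an (n, p) data matrix X, with p > 1.

    Hypothetical sketch: the ridge-type direction below is an assumed
    choice, not the optimal direction proposed in the project.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    perm = rng.permutation(n)
    n1 = int(n * split)
    X1, X2 = X[perm[:n1]], X[perm[n1:]]

    # Estimate a projection direction from the first part only.
    mean1 = X1.mean(axis=0)
    cov1 = np.cov(X1, rowvar=False)
    direction = np.linalg.solve(cov1 + ridge * np.eye(p), mean1)

    # Project the held-out part to one dimension and apply a t-test.
    # The direction does not depend on X2, so the projected values can be
    # treated as an ordinary univariate sample.
    y = X2 @ direction
    t_stat, p_value = stats.ttest_1samp(y, popmean=0.0)
    return t_stat, p_value
```

Because the direction is computed from data that are not reused in the test, the p-dimensional problem reduces to a univariate one on the held-out part, at the cost of using only a fraction of the sample for testing.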

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 1512422
Program Officer: Gabor Szekely
Project Start:
Project End:
Budget Start: 2015-07-01
Budget End: 2019-05-31
Support Year:
Fiscal Year: 2015
Total Cost: $123,288
Indirect Cost:
Name: Pennsylvania State University
Department:
Type:
DUNS #:
City: University Park
State: PA
Country: United States
Zip Code: 16802