In the era of big data, data sets can be both massive in sample size and high-dimensional. Examples include social media data, high-resolution image data, and genomic data. High dimensionality and massive sample size pose substantial computational and statistical challenges. This project aims to investigate novel optimization techniques and statistical tools for analyzing big data collected in important applications. The project is expected to enhance the capabilities of optimization and statistical methods for analyzing big data. Results of the project are anticipated to benefit a broad range of areas, including public health, medical studies, and financial portfolio management.

This project consists of three sub-projects that advance knowledge of modern optimization techniques and statistical procedures for big data. (1) The project studies a unified framework for high-dimensional constrained regularization, and further investigates the statistical performance of a general framework for high-dimensional data under folded concave penalties and fixed constraints. (2) The project explores efficient distributed algorithms for constrained statistical learning, investigating communication-efficient sample-splitting algorithms that handle the large sample sizes and high communication costs of high-dimensional constrained regularization problems with folded concave penalties. The project will study both the optimization convergence of the new algorithms for non-convex learning and their statistical convergence. The work demonstrates the need to consider statistical analysis and optimization analysis simultaneously. (3) The investigators plan to apply the methodology to big data sets to address clinical and scientific questions related to Parkinson's disease.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1820702
Program Officer: Christopher Stark
Project Start:
Project End:
Budget Start: 2018-07-01
Budget End: 2022-06-30
Support Year:
Fiscal Year: 2018
Total Cost: $350,000
Indirect Cost:
Name: Pennsylvania State University
Department:
Type:
DUNS #:
City: University Park
State: PA
Country: United States
Zip Code: 16802