The data sets involved in many modern applications are extremely large, and they are often collected at distributed locations and continuously over time. Common examples are data sets associated with Internet searches, social networks, information technology, healthcare, biology, finance, and engineering. Analyzing and learning from these massive data sets poses great challenges for computation, data storage, and data transfer. At the same time, high-performance computers are now readily available. This project aims to develop novel computational methods that enable high-performance computing for questions involving extremely large data sets. The approaches address several computational challenges that emerge from applications across the data sciences and engineering. Undergraduate and graduate students are involved in the project.

This project focuses on designing novel computational algorithms, and analyzing their theoretical behavior, for solving structured optimization problems that involve huge data sets and are parameterized by large numbers of variables. Both the objective functions and the optimal solutions exhibit particular structure: convexity, smoothness, and multi-linearity for the former, and sparsity, low rank, and orthogonality for the latter. The research aims to take advantage of this structure in designing efficient computational methods. The project includes several research directions, ranging from variable splitting for handling complicated regularizers to adaptive asynchronous parallel computing and analysis of convergence rates. Stochastic approximations will be used for problems involving streaming data, and novel numerical approaches will be used to solve nonlinearly constrained problems via primal-dual updates. Problems with multi-array structure will also be investigated. The research aims to significantly speed up existing algorithms both theoretically and practically, to establish new theoretical results for existing algorithms that currently lack convergence analysis, and to give rise to novel algorithms for computing solutions to complicated problems that are currently not efficiently solvable.
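As a rough illustration of what exploiting structure can mean in this setting, the sketch below applies a standard proximal gradient iteration to a small sparse least-squares problem, where the objective has a smooth convex part and the solution is expected to be sparse. It is a textbook technique chosen for illustration only, not a method taken from the project; the function names, the problem data, and the step size are all assumptions made for the example.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * |.|: shrink v toward zero by tau."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def proximal_gradient_lasso(A, b, lam, step, iters=500):
    """Approximately minimize 0.5 * ||A x - b||^2 + lam * ||x||_1.

    Hypothetical helper for illustration. A is a list of rows, and step
    is assumed to be at most 1 / L, where L is the largest eigenvalue
    of A^T A.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # Residual of the smooth part: r = A x - b
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        # Gradient of the smooth part: g = A^T r
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Gradient step, then the proximal step that promotes sparsity
        x = [soft_threshold(x[j] - step * g[j], step * lam) for j in range(n)]
    return x


# Toy data (assumed): with A equal to the identity, the minimizer is the
# soft-thresholded copy of b, so the small middle entry becomes exactly zero.
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b = [3.0, 0.5, -2.0]
print(proximal_gradient_lasso(A, b, lam=1.0, step=1.0, iters=50))
```

The smooth term is handled through its gradient, while the nonsmooth regularizer is handled through a cheap closed-form proximal step. That division of labor is one simple instance of designing a method around the structure of both the objective and the solution.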

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1719549
Program Officer: Leland Jameson
Project Start:
Project End:
Budget Start: 2017-08-15
Budget End: 2020-07-31
Support Year:
Fiscal Year: 2017
Total Cost: $96,000
Indirect Cost:
Name: Rensselaer Polytechnic Institute
Department:
Type:
DUNS #:
City: Troy
State: NY
Country: United States
Zip Code: 12180