Big data has become ubiquitous in modern industrial and scientific applications, where the size and dimensionality of data have grown so large that new statistical tools are required for efficient analysis. This collaborative project, involving researchers at Rutgers University and Microsoft Research, focuses on the theoretical and algorithmic development of advanced computational methods for big data analytics. While the problems to be investigated are motivated by various Internet applications, the resulting solutions are expected to be broadly applicable to other domains.

The project considers three interrelated themes in big data analytics: (a) effective sampling of big datasets to filter out unreliable data sources and improve statistical analysis; (b) dimensionality reduction that preserves as much information as possible, via hashing and sparse random projections; and (c) large-scale optimization techniques for machine learning that can directly handle large data sizes. Anticipated outcomes of this work include new theoretical results, new data analytics algorithms, and open source software implementations of those algorithms.
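As an illustration of theme (b), the following is a minimal sketch of a sparse random projection, not the project's own algorithm. It assumes the standard textbook construction in which each matrix entry is nonzero with probability 1/s and carries a random sign; the function names, the sparsity parameter s, and the seed are hypothetical choices made for this example. With this scaling, the squared norm of a projected vector matches that of the original vector in expectation.

```python
import math
import random


def sparse_projection_matrix(d, k, s=3, seed=0):
    """Build a k x d sparse random matrix (illustrative construction).

    Each entry is +sqrt(s/k) or -sqrt(s/k) with probability 1/(2s) each,
    and 0 with probability 1 - 1/s, so about 1/s of the entries are nonzero.
    """
    rng = random.Random(seed)
    scale = math.sqrt(s / k)
    matrix = []
    for _ in range(k):
        row = []
        for _ in range(d):
            u = rng.random()
            if u < 1.0 / (2 * s):
                row.append(scale)
            elif u < 1.0 / s:
                row.append(-scale)
            else:
                row.append(0.0)
        matrix.append(row)
    return matrix


def project(matrix, x):
    """Map a d-dimensional vector x to k dimensions, skipping zero entries."""
    return [
        sum(r_j * x_j for r_j, x_j in zip(row, x) if r_j != 0.0)
        for row in matrix
    ]


def squared_norm(v):
    """Return the squared Euclidean norm of v."""
    return sum(a * a for a in v)
```

In use, a vector of dimension d would be passed through project with a matrix built once by sparse_projection_matrix, and distances or norms would then be estimated in the smaller k-dimensional space. Because most entries are zero, the projection touches only about a 1/s fraction of the input coordinates per output coordinate.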

Broader impacts of the research include widely disseminated open source implementations of scalable data analytics algorithms, research-based training and education of graduate and undergraduate students, and academic-industrial collaborations that foster an interplay between fundamental research in machine learning and industrial applications.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1250985
Program Officer: Sylvia J. Spengler
Project Start:
Project End:
Budget Start: 2013-07-01
Budget End: 2017-06-30
Support Year:
Fiscal Year: 2012
Total Cost: $738,971
Indirect Cost:
Name: Rutgers University
Department:
Type:
DUNS #:
City: Piscataway
State: NJ
Country: United States
Zip Code: 08854