Big Data has become ubiquitous in modern industrial and scientific applications, where the size and dimensionality of data are now so large that new statistical tools are required for efficient data analysis. This collaborative project involving researchers at Rutgers University and Microsoft Research focuses on the theoretical and algorithmic development of advanced computational methods for big data analytics. While the problems to be investigated are motivated by various Internet applications, the resulting solutions are expected to be broadly applicable to other domains.
The project considers three interrelated themes in big data analytics: (a) effective sampling of big datasets to filter out unreliable data sources and improve statistical analysis; (b) dimensionality reduction via hashing and sparse random projection techniques that best preserve information; and (c) large-scale optimization techniques for machine learning that can directly handle very large datasets. Anticipated results of this work include new theoretical results, new data analytics algorithms, and open source software implementations of those algorithms.
Broader impacts of the research include widely disseminated open source implementations of scalable data analytics algorithms, research-based training and education of graduate and undergraduate students, and academic-industrial collaborations that foster an interplay between fundamental research in machine learning and industrial applications.