Machine learning and data mining are among the most influential contributions of computer science in the last decade. Given sufficiently large datasets and computational power, one can discover patterns and make reasonably accurate predictions. While there has been tremendous progress in designing efficient algorithms for analyzing massive datasets, there has been less progress in providing rigorous measures of statistical significance or robustness of the analysis. As we analyze large and noisy datasets to model complex relationships in data, it is critical to develop formally proven methods with clear performance guarantees. This project advocates a responsible approach to data analysis, based on well-founded mathematical and statistical concepts. Such an approach enhances the effectiveness and reliability of evidence-based decision making in medicine, policy, and other social applications of big data analysis. Capacity-building activities of this project include: (1) creation and dissemination of algorithms and software that implement rigorous, interpretable, and usable computational and statistical approaches to big data analysis; and (2) educational initiatives at the graduate and undergraduate levels to build a larger and more diverse workforce of data scientists with the foundational skills both to apply analytical tools to existing datasets and to develop new approaches for future datasets.

The goal of this project is to develop practical data analysis algorithms and applications based on the theoretical machine learning concept of Rademacher complexity. The project is motivated by preliminary results showing that the analytical properties of the Rademacher complexity, combined with its efficient sampling properties, provide a unique opportunity to develop general tools that begin bridging the gap between theory and practice in large-scale data analysis. In particular, the project is focused on the following aims: improve the efficiency of rigorous data analysis algorithms through better sample complexity bounds; improve multiple-comparison and overfitting control through Rademacher generalization bounds; develop theory and practical applications of Cartesian and Chaos Rademacher complexities; develop efficient algorithms for estimating the empirical Rademacher complexity; and explore new rigorous data analysis algorithms through the application of Rademacher theory.
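For background only (this is not the project's own algorithm), the empirical Rademacher complexity of a function class on a fixed sample measures how well members of the class can correlate with random ±1 noise, and a standard way to estimate it is to average the supremum correlation over sampled sign vectors. The sketch below illustrates that Monte Carlo estimate for a finite hypothesis class; the function name, parameters, and toy threshold classifiers are illustrative assumptions, not part of the award.

```python
import numpy as np

def empirical_rademacher(predictions, n_trials=1000, rng=None):
    """Monte Carlo estimate of the empirical Rademacher complexity.

    predictions: array of shape (num_hypotheses, n) holding each
        hypothesis's real-valued outputs on a fixed sample of size n.
    Returns the average, over random sign vectors sigma in {-1,+1}^n,
    of max_h (1/n) * sum_i sigma_i * h(x_i).
    """
    rng = np.random.default_rng() if rng is None else rng
    preds = np.asarray(predictions, dtype=float)
    _, n = preds.shape
    total = 0.0
    for _ in range(n_trials):
        sigma = rng.choice([-1.0, 1.0], size=n)   # Rademacher signs
        total += np.max(preds @ sigma) / n        # sup over the finite class
    return total / n_trials

# Toy usage: a small finite class of threshold classifiers on 1-D data.
if __name__ == "__main__":
    x = np.linspace(-1, 1, 50)
    thresholds = np.linspace(-1, 1, 20)
    # h_t(x) = +1 if x >= t, else -1; one row per threshold t.
    preds = np.where(x[None, :] >= thresholds[:, None], 1.0, -1.0)
    print(empirical_rademacher(preds, n_trials=2000))
```

The accuracy of such an estimate improves with the number of sampled sign vectors; the efficient-estimation aim above concerns doing this rigorously and at scale for richer function classes.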

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1813444
Program Officer: Rebecca Hwa
Budget Start: 2018-09-01
Budget End: 2021-08-31
Fiscal Year: 2018
Total Cost: $466,000
Name: Brown University
City: Providence
State: RI
Country: United States
Zip Code: 02912