One of the key tenets taught in courses on Statistics and Machine Learning is that data interpolation (or data memorization) inevitably leads to overfitting and poor prediction performance. Yet most modern large-scale models, including over-parametrized neural networks, are routinely optimized to achieve zero error on training data. The research objective of this project is to challenge this common wisdom and develop theoretical and algorithmic foundations for methods that interpolate the training data.
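For illustration, the following minimal sketch exhibits the phenomenon in its simplest form: an over-parametrized random-feature model, fit by minimum-norm least squares, reaches exactly zero training error yet can still predict reasonably well out of sample. The feature map, sample sizes, and noise level below are hypothetical choices made only to display the effect; they are not part of the project.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression task: noisy observations of a smooth target.
def target(x):
    return np.sin(2 * np.pi * x)

n_train = 20
x_train = rng.uniform(0, 1, n_train)
y_train = target(x_train) + 0.1 * rng.standard_normal(n_train)

# Over-parametrized model: far more random Fourier features than samples.
n_features = 500
W = 6.0 * rng.standard_normal(n_features)   # illustrative frequency scale
b = rng.uniform(0, 2 * np.pi, n_features)

def features(x):
    return np.cos(np.outer(x, W) + b)

Phi = features(x_train)                     # shape (20, 500): under-determined

# Of the infinitely many weight vectors that fit the data exactly, the
# pseudoinverse selects the minimum-norm interpolant.
theta = np.linalg.pinv(Phi) @ y_train

print("train MSE:", np.mean((Phi @ theta - y_train) ** 2))  # ~0: interpolation

x_test = rng.uniform(0, 1, 200)
print("test MSE:", np.mean((features(x_test) @ theta - target(x_test)) ** 2))
```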
The project will focus on both the statistical and the computational aspects of interpolation methods. On the statistical side, consistency and finite-sample bounds will be derived for regression and classification methods in the interpolation regime, and information-theoretic limits of interpolating rules will be established. On the computational side, the PI aims to shed light on the relative advantages and disadvantages of over-parametrized models that have the capacity to fit the training data perfectly.
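A classical instance of an interpolating classification rule is the 1-nearest-neighbor classifier, which by construction reproduces every training label, noisy or not, yet attains nontrivial test accuracy. The sketch below uses a hypothetical toy dataset chosen only to make the zero-training-error property concrete.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 2-D classification task with a linear boundary and 10% label noise.
n = 200
X = rng.uniform(-1, 1, (n, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
flip = rng.random(n) < 0.1
y[flip] = 1 - y[flip]

def one_nn_predict(X_train, y_train, X_query):
    """1-nearest-neighbor rule: label each query by its closest training point."""
    d = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    return y_train[d.argmin(axis=1)]

# Each training point is its own nearest neighbor, so the rule fits every
# training label exactly (including the flipped ones): zero training error.
print("train error:", np.mean(one_nn_predict(X, y, X) != y))

X_test = rng.uniform(-1, 1, (1000, 2))
y_test = (X_test[:, 0] + X_test[:, 1] > 0).astype(int)
print("test error:", np.mean(one_nn_predict(X, y, X_test) != y_test))
```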
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.