Deep learning has attracted significant interest in recent years due to its widespread applicability in computer vision, artificial intelligence, and natural language processing, as well as recent strides in autonomous driving. The theoretical underpinnings behind this success, however, remain largely elusive, hindering its adoption in other applications. This project aims to advance the theoretical foundations of training neural networks in terms of optimization landscape and algorithmic efficacy, which in turn should have a measurable impact on the practice of deep learning by providing guiding principles for network design, algorithm selection, hyperparameter tuning, and adversarial training. The project adopts an interdisciplinary approach, fusing ideas from machine learning, optimization, statistical signal processing, high-dimensional statistics, nonparametric statistics, and information theory. The project will also develop courses and tutorials on the theoretical foundations of large-scale machine learning and provide extensive training opportunities for students at all levels.
This project aims to develop a comprehensive theory characterizing the optimization landscape, the geometry of loss functions, and the algorithmic regularization of major neural network training problems, and to explore how the network architecture (including depth, width, and activation functions) affects these properties, thus providing guidelines for designing algorithms that train these networks more efficiently with theoretical performance guarantees. The project will investigate these geometric properties and their impact on optimization performance in training multi-layer neural networks, auto-encoders, and generative adversarial networks, as well as in adversarial training involving non-convex and saddle-point problems.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.