Recent advances in modern machine learning, deep learning in particular, are ushering in the era of artificial intelligence, which has the potential to revolutionize every aspect of our daily lives. However, much as in the early days of the steam engine, a satisfactory understanding of deep learning has so far been elusive. We currently lack a formal theory of deep learning: one that could explain why we can train overly complex models with seemingly too little training data and still find solutions that generalize to previously unseen data, why models trained for one task also perform well on another related task, or why trained models are so vulnerable to slight, nearly imperceptible corruptions of the data. This project aims to address this need by developing an explanatory and prescriptive theory of deep learning that is tightly integrated with, and motivated by, practice. Rather than viewing learning simply as a black-box optimization problem, the approach investigates the inner workings of training by shedding light on algorithmic heuristics that potentially play an equally important role in endowing trained models with excellent generalization properties. Given the broad applicability of deep learning and the complementary nature of the theoretical analyses and empirical studies in the proposed research, the project is particularly suited to integrating research into education and outreach. The proposed educational activities include curriculum development, summer internships, hackathons, and instructor outreach through local Baltimore programs.

The project investigates the role of explicit algorithmic regularization, in the form of early stopping, batch normalization, and dropout, as well as the choice of optimization algorithm and network architecture, in providing an inductive bias that aids generalization. A second overarching goal of the project is to understand, more broadly, the generalization phenomenon in deep learning. It seeks to understand why systems that memorize the training data can still generalize well, how the neural network architecture enables transfer learning, and how to design robust algorithms that guarantee that deep learning solutions generalize despite adversarial corruption of the data.
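To make the notion of algorithmic regularization concrete, the sketch below illustrates one of the heuristics named above, early stopping: training is halted once held-out validation loss stops improving, which limits how far a model can fit noise in the training data. This is a minimal illustrative example, not the project's method; the function name and the synthetic loss values are hypothetical.

```python
# Illustrative sketch: early stopping as an explicit algorithmic
# regularizer. Training halts once validation loss fails to improve
# for `patience` consecutive checks.

def train_with_early_stopping(val_losses, patience=3):
    """Return the epoch index of the checkpoint early stopping selects.

    `val_losses` stands in for the per-epoch validation losses that a
    real training loop would produce (names here are hypothetical).
    """
    best_loss = float("inf")
    best_epoch = 0
    bad_checks = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            bad_checks = 0
        else:
            bad_checks += 1
            if bad_checks >= patience:
                break  # stop before overfitting sets in
    return best_epoch

# Synthetic run: validation loss improves, then degrades after epoch 3.
losses = [1.0, 0.6, 0.4, 0.35, 0.4, 0.45, 0.5, 0.6]
print(train_with_early_stopping(losses))  # selects epoch 3
```

The regularizing effect is implicit in the stopping rule itself: the returned checkpoint is the one with the best held-out performance, rather than the final, potentially overfit, iterate.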

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1943251
Program Officer: Rebecca Hwa
Budget Start: 2020-02-15
Budget End: 2025-01-31
Fiscal Year: 2019
Total Cost: $192,804
Name: Johns Hopkins University
City: Baltimore
State: MD
Country: United States
Zip Code: 21218