Innovations driven by recent progress in artificial intelligence (AI) have demonstrated human-competitive performance. However, as research expands to safety-critical applications, such as autonomous vehicles and healthcare treatment, the question of safety becomes a bottleneck for the transition from theory to practice. Safety-critical autonomy must go through rigorous evaluation before large-scale deployment. Such systems are unique in that failures may cause serious consequences, so an extremely low failure rate is required. As a result, test results under naturalistic conditions are extremely imbalanced, with failure cases being rare. This rarity, together with the complexity of AI structures, makes designing effective evaluation methods a major challenge, one that conventional methods cannot adequately address.

This proposal aims to understand the fundamental challenges in assessing the risk of safety-critical AI autonomy and puts forward new theories and practical tools for developing certifiable, implementable, and efficient evaluation procedures. The specific aims of this research are to develop evaluation methods for three types of AI autonomy that cover a broad array of real-world applications (deep learning systems, reinforcement learning systems, and sophisticated systems comprising sub-modules) and to validate them with the sensing and decision-making systems of real-world autonomous systems. This research lays the foundation for the PI’s long-term career goal of safely deploying AI in the physical world, opens up a new cross-cutting area for developing rigorous and efficient evaluation methods, addresses the urgent societal concern about the upcoming massive deployment of AI autonomy, and trains a diverse, globally competitive workforce through education at all levels.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

National Science Foundation (NSF)
Division of Computer and Network Systems (CNS)
Program Officer: David Corman
Carnegie-Mellon University
United States