This EArly-Concept Grant for Exploratory Research (EAGER) project takes a clean-slate, first-principles approach to the design of safety-critical autonomous systems by integrating formal methods with reinforcement learning from data. Several recent high-profile traffic incidents involving semi-autonomous vehicles have raised questions about whether current artificial intelligence (AI)-centered methods can ever lead to Level 4 or 5 autonomy, i.e., to fully autonomous vehicles whose performance is equivalent to that of a human driver in all driving scenarios. On the other hand, approaches rooted in formal methods for verification and synthesis can provide safety guarantees but have difficulty reasoning efficiently about uncertainty and about the correctness of data-driven models. This project will combine these two seemingly incompatible paradigms for designing autonomous systems. It will use model-free reinforcement learning algorithms to learn from semi-autonomous vehicle driving data, and it will adopt model-based methods for system design, verification, and synthesis to offer provably safe operation in highly uncertain scenarios. An AutoDrive testbed will be set up in which human driving data from scaled vehicular models will be used to infer safe control policies using imitation and inverse reinforcement learning algorithms. The research is relevant to the science of intelligent autonomous transportation systems and has significant societal implications. The experimental testbed will be used to provide hands-on research experience to undergraduate students and to support K-12 outreach efforts.
In particular, the project will develop a framework for optimal control synthesis for safety and performance specifications expressed in signal temporal logic. It will then incorporate vehicular and pedestrian kinematics into non-deterministic/probabilistic transition models specified via probabilistic computation tree logic. Finally, it will develop formal reinforcement learning methods for partially observed dynamic models subject to safety specifications and complex temporal goals by learning from traces of safe human drivers. One key technical contribution of the project will be the development of new formal reinforcement learning methods that may be useful in a broad array of applications in which optimal controllers that satisfy given safety specifications must be synthesized by learning from data.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.