In state-of-the-art Cyber-Physical Systems (CPS), supervised or unsupervised learning is typically used to analyze data. In many such systems, however, rules cannot be determined in advance, and these data mining techniques are not directly applicable: the data are dynamic, their volume prohibits labeling in practice, and they arrive incrementally rather than being available all at once. Control of CPS, on the other hand, is usually done in a model-based manner, where a desired control policy is computed from a high-fidelity system model derived at design time and potentially updated at runtime. However, this approach is not suitable for highly dynamic CPS, which may represent systems of systems whose spatial and temporal configurations change rapidly. With such a large number of possible configurations, it is almost impossible to derive suitable control policies using standard model-driven techniques. Consequently, it is critical to facilitate the design of data-based controllers, with strong performance guarantees, in a way that allows for natural runtime control adaptation. Reinforcement Learning (RL) provides such a framework. In RL, agents interact with the environment in a feedback loop, learning an optimal policy by taking appropriate sequences of actions that optimize long-term payoff. As such, RL can be much more efficient than supervised and unsupervised learning at analyzing streaming data and, especially, at controlling a system. The goal of this project is to develop a distributed off-policy RL framework for the control of CPS. Distributed RL methods avoid the fragility, communication overhead, and privacy concerns of collecting all information at a central processing unit, while off-policy learning methods significantly improve sampling efficiency and ensure safer operation. The distributed RL framework developed under this project will have a profound impact on the control of CPS in areas as diverse as transportation, manufacturing, healthcare, smart cities, and urban planning, which rely on multiple sensors for data collection and control. The project also involves an educational agenda spanning K-12, undergraduate, and graduate education; its outreach component focuses on raising pre-college students' awareness of the potential and attractiveness of a research and engineering career.
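
For reference, the long-term payoff an RL agent optimizes is conventionally formalized (this is the standard textbook definition, not a construct specific to this project) as the expected discounted return

    J(\pi) = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t+1} \right], \qquad 0 \le \gamma < 1,

where r_{t+1} is the reward received after the action taken at time t under policy \pi, and the discount factor \gamma trades off immediate against future payoff.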

The technical aims of this project are divided into four thrusts. The first thrust develops distributed off-policy RL methods using linear function approximation of the action-value function; existing distributed RL algorithms with linear function approximation address policy evaluation only, so this thrust develops new algorithms that can also improve the policy until an optimal policy is found, which is necessary for control. Since defining appropriate feature vectors for RL problems is generally difficult, and since linear mappings might not be able to capture possibly nonlinear interactions between these features, the second thrust develops distributed off-policy RL methods using nonlinear function approximation, specifically neural networks. The third thrust develops distributed off-policy Actor-Critic methods; when the action space is large or continuous, Actor-Critic methods are much more effective, since they parameterize the target policy using either linear or nonlinear function approximation and learn the optimal parameters so that the resulting policy maps every state to the optimal action. Finally, the fourth thrust develops distributed RL methods for the asynchronous, heterogeneous, and non-stationary data common in modern CPS, where sensors neither observe identically distributed data nor sample data at the same time, and where the distributions from which data are sampled can change over time. The project focuses on the development of algorithms and supporting theoretical results. The developed algorithms are evaluated in simulation on resource allocation problems in CPS, specifically the control of distributed shared-vehicle dispatch systems.
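
To make the first thrust's starting point concrete, below is a minimal single-agent sketch of off-policy control (Q-learning) with a linear approximation of the action-value function, Q(s, a) = w^T phi(s, a). The toy chain environment, the one-hot feature map phi, and the step-size choices are illustrative assumptions for demonstration only; they are not the project's distributed algorithms.

import numpy as np

# Off-policy (Q-learning) control with a linear action-value model
# Q(s, a) = w . phi(s, a), on a toy 5-state chain (assumption for demo).
n_states, n_actions = 5, 2
gamma, alpha, epsilon = 0.95, 0.1, 0.2
rng = np.random.default_rng(0)

def phi(s, a):
    """One-hot feature vector for the (state, action) pair."""
    f = np.zeros(n_states * n_actions)
    f[s * n_actions + a] = 1.0
    return f

def step(s, a):
    """Toy chain: action 1 moves right, action 0 moves left; reward at the right end."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, r

w = np.zeros(n_states * n_actions)
s = 0
for t in range(5000):
    # Behavior policy: epsilon-greedy (off-policy w.r.t. the greedy target policy).
    if rng.random() < epsilon:
        a = int(rng.integers(n_actions))
    else:
        a = int(np.argmax([w @ phi(s, b) for b in range(n_actions)]))
    s_next, r = step(s, a)
    done = (s_next == n_states - 1)  # goal state ends the episode
    # Semi-gradient Q-learning update: bootstrap with the max over next actions.
    q_next = 0.0 if done else max(w @ phi(s_next, b) for b in range(n_actions))
    td_error = r + gamma * q_next - w @ phi(s, a)
    w += alpha * td_error * phi(s, a)
    s = 0 if done else s_next

greedy = [int(np.argmax([w @ phi(s, b) for b in range(n_actions)])) for s in range(n_states)]
print("greedy action per state:", greedy)  # expect action 1 (move right) away from the goal

In the distributed setting this project targets, each agent would maintain such a parameter vector and combine local updates with information exchanged over a network, rather than aggregating all data at a central processing unit.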

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Type: Standard Grant (Standard)
Application #: 1932011
Program Officer: Sylvia Spengler
Project Start:
Project End:
Budget Start: 2019-10-01
Budget End: 2022-09-30
Support Year:
Fiscal Year: 2019
Total Cost: $407,522
Indirect Cost:
Name: Duke University
Department:
Type:
DUNS #:
City: Durham
State: NC
Country: United States
Zip Code: 27705