Basing decisions on data is preferable to relying on heuristics or rules of thumb. Using data effectively, however, can be challenging. In domains like agriculture or medicine, datasets are usually small, biased, and noisy. For instance, the full effects of reduced pesticide applications depend on the weather and the impacts on yield may not be known until the harvest. Reducing pesticide applications reduces costs and provides ecological and consumer benefits, but using too little of it can easily cause a crop failure and significant financial losses. These dual problems of limited data availability and a high cost of failure are also common in manufacturing, maintenance, and even robotics. Because most existing reinforcement learning methods assume large datasets, stakeholders often dismiss data-driven methods and rely on heuristics to make decisions that are apparently safe but quite sub-optimal. This research develops new robust methods for data-driven decision making that can recommend good actions that are also safe even when data is limited. The new reinforcement learning methods use prior domain knowledge to estimate the confidence in possible outcomes to prevent catastrophic failure when predictions are incorrect. The practical viability of these methods is tested on the problem of using historical data to recommending improved pesticide schedules for fruit orchards and is disseminated to practitioners.

This research targets reinforcement learning problems with 1) limited or expensive data and 2) a high cost of failure. When bad decisions cause large losses, injury, or death, then having confidence in a policy's quality is more important than its optimality gap. Computing high-confidence policies in reinforcement learning is difficult. Even small errors can quickly accumulate through positive feedback loops and covariate shift. Therefore, more robust methods are needed to convince practitioners to benefit from data instead of relying on heuristics. The project combines robust optimization with model-based reinforcement learning to compute good policies that are resistant to data errors. Robust optimization has achieved successes in many areas but can be difficult to use with reinforcement learning. It requires a model of plausible uncertainty levels, so-called ambiguity sets, to properly balance solution?s quality and confidence. Constructing good ambiguity sets manually in sequential decision problems is very difficult even for robust optimization experts. This research investigates a new data-driven Bayesian approach to robust reinforcement learning. It combines hierarchical Bayesian models with robust optimization to leverage powerful hierarchical modeling techniques while avoiding the computational complexity often associated with Bayesian reinforcement learning.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Start
Project End
Budget Start
2018-08-15
Budget End
2021-07-31
Support Year
Fiscal Year
2018
Total Cost
$437,753
Indirect Cost
Name
University of New Hampshire
Department
Type
DUNS #
City
Durham
State
NH
Country
United States
Zip Code
03824