This project focuses on the subfield of machine learning referred to as Reinforcement Learning (RL), in which algorithms or robots learn by trial and error. As with many areas of machine learning, there has been a surge of interest in "deep learning" approaches to reinforcement learning, i.e., "Deep RL." Deep learning uses computational models motivated by structures found in the brains of animals. Deep RL has enjoyed some stunning successes, including a recent advance by which a program learned to play the Asian game of Go better than the best human player. Notably, this level of performance was achieved without any human guidance: given only the rules of the game, the program learned by playing against itself. Although games are intriguing and attention-grabbing, this feat was merely a technology demonstration. Firms are seeking to deploy Deep RL methods to increase the efficiency of their operations across a range of applications such as data center management and robotics. To realize the full potential of Deep RL, further research is required to make the training process more predictable, reliable, and efficient. Current techniques require massive amounts of training data and computation, and subtle changes in the configuration of the system can cause huge differences in the quality of the results obtained. Thus, even though RL systems can learn autonomously by trial and error, a large amount of human intuition, experience, and experimentation may be required to lay the groundwork for these systems to succeed. This proposal seeks to develop new techniques and theory to make high-quality deep RL results more widely and easily obtainable. In addition, this proposal will provide opportunities for undergraduates to be involved in research through Duke's Data+ initiative.
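For concreteness, the trial-and-error learning loop described above can be illustrated with a minimal tabular Q-learning sketch. The corridor environment, reward scheme, and hyperparameter values below are illustrative assumptions for exposition only, not part of the proposed research:

```python
import random

# Illustrative sketch of trial-and-error RL: tabular Q-learning on a
# 1-D corridor. The agent starts at cell 0 and is rewarded only for
# reaching the rightmost cell. Environment and constants are assumed.

N_STATES, ACTIONS = 6, (-1, +1)        # corridor cells; move left or right
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1 # step size, discount, exploration rate

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(500):
    s = 0
    while s != N_STATES - 1:                        # episode ends at the goal
        # Trial and error: usually exploit, occasionally explore at random.
        a = random.choice(ACTIONS) if random.random() < EPSILON \
            else max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)   # stay inside the corridor
        r = 1.0 if s_next == N_STATES - 1 else 0.0  # reward only at the goal
        # Move the value estimate toward reward plus discounted lookahead.
        target = r + GAMMA * max(Q[(s_next, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s_next
```

No model of the environment is given; the agent improves its value estimates purely from the rewards its own actions produce, which is the sense in which such systems "learn autonomously by trial and error."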

The proposed research is partly inspired by past work on feature selection and discovery for reinforcement learning. Much of that work focused primarily on linear value function approximation. Its relevance to deep reinforcement learning is that methods such as Deep Q-learning have a linear final layer. The preceding, nonlinear layers can therefore be interpreted as performing feature discovery for what is ultimately a linear value function approximation process. Sufficient conditions on the features for successful linear value function approximation, specified in earlier work, can now be re-interpreted as an intermediate objective function for the penultimate layer of a deep network. The proposed research aims to achieve the following objectives: 1) develop a theory of feature construction that explains and informs deep reinforcement learning methods, 2) develop improved approaches to value function approximation that are applicable to deep reinforcement learning, 3) develop improved approaches to policy search that are applicable to deep reinforcement learning, 4) develop new algorithms for exploration in reinforcement learning that take advantage of learned feature representations, and 5) perform computational experiments demonstrating the efficacy of the new algorithms developed on benchmark problems.
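The view of the penultimate layer as a learned feature map can be made concrete with a small sketch. The network below is a generic, hypothetical Deep Q-network written in PyTorch; the layer sizes and names are assumptions for illustration, not the proposal's architecture:

```python
import torch
import torch.nn as nn

class DeepQNetwork(nn.Module):
    """Q-network whose final layer is linear, as in Deep Q-learning.

    Everything before the last layer can be read as a learned feature
    map phi(s); the final layer then performs linear value function
    approximation, Q(s, a) = w_a . phi(s) + b_a.
    """

    def __init__(self, state_dim: int, n_actions: int, feat_dim: int = 64):
        super().__init__()
        # Nonlinear layers: interpretable as feature discovery.
        self.features = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, feat_dim), nn.ReLU(),
        )
        # Final linear layer: one weight vector w_a per action.
        self.linear_head = nn.Linear(feat_dim, n_actions)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        phi = self.features(state)    # phi(s): penultimate-layer features
        return self.linear_head(phi)  # Q(s, .) = W phi(s) + b

net = DeepQNetwork(state_dim=8, n_actions=4)
q_values = net(torch.randn(1, 8))     # shape: (1, 4), one value per action
```

Under this reading, conditions on features known to suffice for linear value function approximation could be checked or encouraged on `net.features(s)`, for example via an auxiliary loss, which is the sense in which they become an intermediate objective for the penultimate layer.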

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1815300
Program Officer: Rebecca Hwa
Budget Start: 2018-08-01
Budget End: 2021-07-31
Fiscal Year: 2018
Total Cost: $499,968
Name: Duke University
City: Durham
State: NC
Country: United States
Zip Code: 27705