The recent success of a machine-learning technique called reinforcement learning in benchmark tasks suggests a potential revolutionary advance in practical applications, and has dramatically boosted the interest in this technique. However, common algorithms that use this approach are highly data-inefficient, leading to impressive results only on simulated systems, where an infinite amount of data can be simulated. For example, for online tasks that most humans pick up within a few minutes, reinforcement learning algorithms take much longer to reach human-level performance. A good reinforcement learning algorithm called "Rainbow deep Q-network" needs about 18 million frames of simulation data to beat human in performance for the simplest of online tasks. This amount of data corresponds to about 80 person-hours of online experience. This level of data requirements limits the application of reinforcement learning algorithms in many practical applications that only have a limited amount of data. Theoretical understanding of how much data is needed for effective reinforcement learning is still very limited. This project aims to reduce the data requirements to train reinforcement learning algorithms by developing a comprehensive methodology for reinforcement learning algorithm design and analyzing convergence rates, which will in turn motivate design of fast and stable reinforcement learning algorithms. This project will have a direct impact on various engineering and science applications, e.g., the financial market, business strategy planning, industrial automation and online advertising.

This project will take a fresh perspective of using tools and concepts from both optimization and reinforcement learning. The following thrusts will be investigated in an increasing order of difficulty. 1) Linear function approximation: tools and insights will be developed to tackle challenges of non-smoothness and non-convexity in control problems. 2) General function approximation: new challenge of non-linearity will be addressed. 3) Neural function approximation: convergence to globally and/or universally optimal solutions will be investigated. In each of the three thrusts, new algorithms will be designed, and their convergence rates will be characterized. These results will be further used as guideline for parameter tuning, and to motivate design of fast and convergent algorithms.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Start
Project End
Budget Start
Budget End
Support Year
Fiscal Year
Total Cost
Indirect Cost
Suny at Buffalo
United States
Zip Code