A fundamental challenge in machine learning is the design of computational agents that, rather than being explicitly programmed, autonomously learn complex tasks in stochastic real-world environments. Past approaches, such as reinforcement learning algorithms for solving Markov decision processes, scale poorly to large state spaces. The proposed research addresses this curse of dimensionality by investigating a novel framework combining reinforcement learning and online convex optimization, in particular mirror descent and related algorithms. Mirror descent scales significantly better than classical first-order gradient descent in high-dimensional state spaces, by using a distance-generating function specific to a particular state space geometry.

The proposed framework enables several significant algorithmic advances in the design of autonomous machine learning agents: a new class of first-order mirror-descent based methods for learning sparse solutions to Markov decision processes will be developed that scale significantly significantly better than previous second-order methods; novel hierarchical methods for solving semi-Markov decision processes will be investigated; and finally, applications to a variety of high-dimensional Markov decision processes will be explored.

The anticipated outcomes of the proposed work include foundational advances in designing autonomous agents that learn to solve sequential decision-making problems, which will impact a large number of target applications from manufacturing to robotics and scheduling. The educational goal includes the development of a graduate-level course in online convex optimization for sequential decision-making, as well as interdisciplinary tutorials to enhance the cross-fertilization of ideas from applied mathematics and optimization to machine learning and artificial intelligence.

Project Start
Project End
Budget Start
Budget End
Support Year
Fiscal Year
Total Cost
Indirect Cost
University of Massachusetts Amherst
United States
Zip Code