A fundamental challenge in machine learning is the design of computational agents that, rather than being explicitly programmed, autonomously learn complex tasks in stochastic real-world environments. Past approaches, such as reinforcement learning algorithms for solving Markov decision processes, scale poorly to large state spaces. The proposed research addresses this curse of dimensionality by investigating a novel framework combining reinforcement learning and online convex optimization, in particular mirror descent and related algorithms. Mirror descent scales significantly better than classical first-order gradient descent in high-dimensional state spaces, by using a distance-generating function specific to a particular state space geometry.
The proposed framework enables several significant algorithmic advances in the design of autonomous machine learning agents: a new class of first-order mirror-descent based methods for learning sparse solutions to Markov decision processes will be developed that scale significantly significantly better than previous second-order methods; novel hierarchical methods for solving semi-Markov decision processes will be investigated; and finally, applications to a variety of high-dimensional Markov decision processes will be explored.
The anticipated outcomes of the proposed work include foundational advances in designing autonomous agents that learn to solve sequential decision-making problems, which will impact a large number of target applications from manufacturing to robotics and scheduling. The educational goal includes the development of a graduate-level course in online convex optimization for sequential decision-making, as well as interdisciplinary tutorials to enhance the cross-fertilization of ideas from applied mathematics and optimization to machine learning and artificial intelligence.