The objective of this project is to develop theory and algorithms for Markov decision processes (MDPs) with reversible dynamics. Reversible Markov chains arise in numerous areas, including queueing systems, models of caching schemes, biological models, models of diffusion processes, and Markov chain Monte Carlo techniques. The methods developed in this project will yield algorithms for computing optimal control policies for stochastic control problems involving reversible Markov chains. By exploiting linear programming formulations of MDPs together with equilibrium conditions for reversible systems, optimality conditions for reversible MDPs will be established. Several classes of algorithms that seek solutions to these optimality conditions will be developed, including a form of the policy iteration algorithm and an algorithm based on simulation. The methods developed in this project may also be extended to settings where effective suboptimal policies can be obtained by approximating general MDPs by reversible MDPs.
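To make the two ingredients above concrete, here is a minimal sketch in standard notation (not the project's exact formulation). Reversibility of a Markov chain with transition kernel P and stationary distribution \(\pi\) is the detailed-balance equilibrium condition

\[
\pi(x)\,P(x,y) \;=\; \pi(y)\,P(y,x) \qquad \text{for all states } x, y,
\]

while the classical linear programming formulation of a discounted-cost MDP, with cost function c, discount factor \(\gamma \in (0,1)\), and any positive weight vector \(\alpha\), recovers the optimal value function v as

\[
\begin{aligned}
\max_{v} \;\; & \sum_{x} \alpha(x)\, v(x) \\
\text{subject to} \;\; & v(x) \,\le\, c(x,a) + \gamma \sum_{y} P(y \mid x, a)\, v(y) \qquad \text{for all states } x \text{ and actions } a.
\end{aligned}
\]

Coupling linear programs of this second kind with equilibrium conditions of the first kind on the controlled chain is the sort of combination the project proposes to exploit.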
If successful, the results of this research will provide computationally efficient algorithms for a broad class of stochastic control problems, and might also provide insight into the structural properties of the solutions of some well-known queueing problems. The algorithms to be developed can be implemented using significantly less computation and storage than conventional algorithms for Markov decision processes. One class of algorithms to be developed in this project can be implemented whenever a reversible system can be simulated, requiring very little model information beyond the output of the simulation. Additionally, this research program will be coupled with two educational activities. As part of this project, a new seminar course on the intersection of stochastic control, optimization, and game theory will be offered to graduate students at the University of Virginia. Also, a publicly available course reader will be developed for a graduate course on stochastic systems, emphasizing engineering applications and algorithms for analyzing stochastic models.
Markov processes are a widely used class of probability models. For example, they can be used to model stock prices, chemical reactions, and communication networks. A large subset of these models are said to be "reversible", meaning that their behavior when observed forward in time is statistically indistinguishable from their behavior when observed backward in time. The research aim of this project was to develop new theory and algorithms for controlling and analyzing reversible Markov processes. The main outcomes of this project are the following:

- We considered the problem of making sequential decisions in an environment that can be modelled by a reversible Markov process. We developed a theoretical characterization of the optimal control decisions, as well as algorithms for computing these decisions.

- We applied our general results to improve a commonly used technique for sampling from complex probability distributions (a generic example of such a sampler appears after this list).

- When simulating a model of a random process, a key challenge lies in running the simulation long enough that the long-term behavior of the model can be observed. We showed how our general results can be used to accelerate simulations of reversible Markov processes.

- We developed and implemented a new graduate course covering topics related to data science. Specifically, the course covered the intersection of optimal decision making and machine learning, and the role played by reversible Markov processes in both.
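As a concrete, deliberately generic illustration of the sampling outcome above: Metropolis-Hastings chains, a standard tool for sampling from complex distributions, are a canonical example of reversible Markov processes, since the accept/reject rule enforces detailed balance by construction. The Python sketch below implements a plain random-walk Metropolis sampler; it is a textbook baseline rather than the improved technique developed in this project, and the function name and parameters are illustrative.

import numpy as np

def metropolis_sample(log_target, x0, n_steps, step_size=0.5, seed=0):
    """Random-walk Metropolis sampler for an unnormalized density.

    log_target: function returning the log of the (unnormalized) target density.
    The resulting chain is reversible with respect to the target distribution.
    """
    rng = np.random.default_rng(seed)
    x = x0
    samples = np.empty(n_steps)
    for t in range(n_steps):
        proposal = x + step_size * rng.standard_normal()
        # The proposal is symmetric, so the acceptance probability reduces to
        # min(1, pi(proposal) / pi(x)); accepting with this probability makes
        # each transition satisfy detailed balance with respect to pi.
        if np.log(rng.random()) < log_target(proposal) - log_target(x):
            x = proposal
        samples[t] = x
    return samples

# Example: sample from a standard normal distribution.
draws = metropolis_sample(lambda x: -0.5 * x ** 2, x0=0.0, n_steps=10_000)
print(draws.mean(), draws.std())  # should be roughly 0 and 1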