Learning schemes in general are known for their slow convergence rates. The proposal suggests two modifications for improving the speed of learning. The first is the use of multiple models, which has proved very effective in other areas of systems theory. The second is the use of decentralized or partially decentralized multi-agent state decomposition approaches. Both modifications are intended to make learning schemes in general, and reinforcement learning schemes in particular, converge quickly and apply to a wide class of real-world problems. The proposed research will develop the foundations of fast reinforcement learning using these two broad approaches, and will apply these methods to the optimal control of a fleet of Plug-in Hybrid Electric Vehicles (PHEVs). The societal impact of this research lies in controlling such a fleet optimally for energy efficiency. The results are also expected to extend to other areas of systems theory, such as neurobiology. Graduate and undergraduate students will be trained in learning schemes for multidisciplinary applications. Cross-fertilization of ideas will be facilitated through a bi-annual workshop on adaptive and learning systems and by leveraging a multidisciplinary Institute for Mathematical Modeling and Computational Science (iMMCS).
The proposal focuses on improving the convergence speed of learning schemes in multi-agent settings, both to overcome the limitations imposed by dimensionality and to handle situations where communication between agents is infrequent. The innovation lies in the use of multiple models and state decomposition to speed up learning and convergence. The technical approach consists of using multiple identification models; applying reinforcement learning with decentralized or partially decentralized multi-agent state decomposition; and evaluating fast reinforcement learning in an application test-bed for controlling a fleet of Plug-in Hybrid Electric Vehicles (PHEVs). The research team further plans to quantify the trade-off between learning speed and the quality of the learned solutions.