The objective of this research is to develop new neural network structures to solve optimal control problems with dynamic decision making. These problems are quite complex because the system dynamics can switch modes at unknown times based on event-based decision making. The approach is to derive the decision-making paradigms from cognitive science principles and to represent them mathematically using Decision Field Theory. The resulting solutions, encoded in neural networks, will interact with a second set of networks that embed solutions to the associated optimal control problem, formulated in an approximate dynamic programming framework.
Intellectual Merit
This research seeks unified controller solutions to problems that contain both continuous and discrete elements. The mathematical cognitive science ideas developed here are expected to lead to new representations and problem-solving structures in computational neuroscience and control. The proposed work seeks to accomplish these objectives by offering a transformative approach that integrates concepts from systems science and cognitive science.
Broader Impact
Abstractions and solution structures developed through this research can be used in consequence- or emergency-management systems, such as managing the aftermath of an earthquake, recovering an impaired aircraft to stable, sustainable flight and landing, and allocating multiple assets when responding to threats. Decision-making structures resulting from this research can also have a substantial impact on human-machine interaction. For example, driver-aid systems could be developed to augment human perception and enhance cognition when people drive under impaired conditions.
One of the key factors behind the ability of human experts to make complex dynamic decisions is their facility for moving seamlessly between a discrete plan of large-scale goals and continuous control of behavior to achieve those goals. My colleague, S. N. Balakrishnan, and I proposed to build on this key property and develop autonomous systems with the same facility for moving back and forth between discrete plans and smooth control. To accomplish this goal, we examined human decision-making and planning behavior in a dynamic decision task called a predator-prey, or goal-seeking, task. In this task, one agent (e.g., a predator) must seek a moving goal (e.g., a prey). To make the task amenable to mathematical modeling, we used a discrete-time, discrete-state virtual world called a grid world, in which agents and goals move from one cell to another in a large table of cells at each time step. The grid world contains obstructions and penalties that the agent must learn to avoid in order to capture the goal.

This predator-prey problem is an example of a large class of dynamic decision problems called Markov decision problems. Markov decision problems can sometimes be solved optimally using a mathematical method called dynamic programming. However, the problem is often too large to solve with this method, and in any case dynamic programming is not something a human could easily perform in a natural way. Instead, in many complex applications, it is necessary to use learning models that learn the optimal path from experience over a large number of learning trials in the environment. Models that learn to solve Markov decision problems from experience are called reinforcement learning models.

One of the important issues for reinforcement learning models is the choice between exploring new paths to find a better one and exploiting previously learned paths to maximize the probability of catching the goal. This is called the exploration-exploitation problem in reinforcement learning. A traditional method for handling this trade-off is a rule called the soft max rule, which chooses steps along a path probabilistically. The problem with this traditional method is that its performance is not robust: it depends strongly on a temperature parameter whose proper value is usually difficult to determine across different environments.

We developed a new solution to the exploration-exploitation problem using a new theory of decision making called quantum probability theory, which applies the mathematical principles of quantum theory to human decision making. Our quantum reinforcement learning model uses a quantum rule for selecting steps along a path, and this quantum algorithm proved to be more robust than the traditional soft max method. In fact, when two predators (one using the traditional soft max rule and the other using the quantum rule) are placed in the same grid world to catch a single prey, the quantum agent most often catches the prey before the soft max agent. We also examined human performance on these tasks to see which learning model best matches human behavior. We found that humans are capable of much faster replanning after being blocked than is possible for reinforcement learning models. This finding has motivated us to develop a different kind of learning model capable of rapid replanning.
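To make the grid-world setup concrete, the following is a minimal sketch, not the project's actual code, of an agent that learns the task by Q-learning and chooses its moves with the soft max (Boltzmann) rule. The grid size, goal and obstacle locations, reward values, learning rate, and temperature are illustrative assumptions made only for this example.

```python
import numpy as np

# Minimal grid-world sketch: a 5x5 grid with one goal cell and one obstacle.
# States are (row, col) pairs; actions are the four compass moves.
ROWS, COLS = 5, 5
GOAL, OBSTACLE = (4, 4), (2, 2)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    """Apply an action; moves into walls or the obstacle leave the agent in place."""
    r = min(max(state[0] + action[0], 0), ROWS - 1)
    c = min(max(state[1] + action[1], 0), COLS - 1)
    nxt = state if (r, c) == OBSTACLE else (r, c)
    reward = 10.0 if nxt == GOAL else -1.0   # small penalty per move, bonus at the goal
    return nxt, reward, nxt == GOAL

def softmax_choice(q_values, temperature):
    """Soft max (Boltzmann) rule: higher temperature means more exploration."""
    prefs = np.array(q_values) / temperature
    prefs -= prefs.max()                      # subtract the max for numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return np.random.choice(len(q_values), p=probs)

def train(episodes=500, alpha=0.1, gamma=0.95, temperature=1.0):
    q = np.zeros((ROWS, COLS, len(ACTIONS)))
    for _ in range(episodes):
        state, done = (0, 0), False
        while not done:
            a = softmax_choice(q[state], temperature)
            nxt, reward, done = step(state, ACTIONS[a])
            # Standard Q-learning update toward the one-step lookahead target.
            target = reward if done else reward + gamma * q[nxt].max()
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q

q_table = train()
```

Lowering the temperature makes this agent greedier about paths it has already learned, while raising it makes the agent explore more; that sensitivity to the temperature setting is exactly the robustness problem discussed above.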
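The project's quantum selection rule itself is not reproduced here. Purely to illustrate the general flavor of amplitude-based selection, the sketch below is a hypothetical construction (the class name, update rule, and strength parameter are invented for this example): each state keeps a vector of amplitudes, actions are selected with probability equal to the squared amplitude, and the amplitude of a rewarded action is boosted and renormalized.

```python
import numpy as np

class AmplitudeSelector:
    """Hypothetical amplitude-based action selection (illustration only; the
    project's quantum rule may differ). Selection probabilities are squared
    amplitudes, and rewarded actions have their amplitude rotated upward."""

    def __init__(self, n_states, n_actions):
        # Start from a uniform superposition over actions in every state.
        self.amps = np.full((n_states, n_actions), 1.0 / np.sqrt(n_actions))

    def choose(self, state):
        probs = self.amps[state] ** 2
        return np.random.choice(len(probs), p=probs / probs.sum())

    def reinforce(self, state, action, strength=0.1):
        """Shift amplitude toward the rewarded action, then renormalize."""
        a = self.amps[state]
        a[action] += strength * (1.0 - a[action])
        self.amps[state] = a / np.linalg.norm(a)

# Example usage with 25 grid cells and 4 moves (state indices are assumed
# to be flattened (row, col) positions in this illustration).
selector = AmplitudeSelector(n_states=25, n_actions=4)
action = selector.choose(state=0)
selector.reinforce(state=0, action=action)
```

In such a scheme the squared amplitudes take the place of the temperature-scaled exponential in the soft max rule; whether this removes the tuning sensitivity in practice is the empirical question the project addressed, not something this sketch demonstrates.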