A fundamental challenge for Artificial Intelligence is sequential decision making under uncertainty, a task where automated algorithms lag far behind human-level intelligence. The primary reason for the disparity is curse of dimensionality - the number of states is exponential in the problem features. Recent advances that restrict decision-theoretic computation to a reachable subset of state space have scaled to moderately-sized problems, but proven ineffective in scaling to real problems. On the other hand, probabilistic planners based on deterministic planning might scale up, but with a massive loss in solution quality.
This project is investigating several methods to scale probabilistic planning to real-sized problems. We combine decision-theoretic analysis, basis function approximation and the classical AI planning techniques, to develop a series of highly scalable planners. A common theme in our techniques is the use of deterministic plans to automatically obtain domain abstractions in the form of 'good' or 'bad' properties, or intermediate subgoals. The project introduces and exploits a principled collaboration between decision theory and classical planning techniques, thus retaining the benefits of both - high quality as well as high performance. Experiments show that our new planner solves difficult planning competition problems using orders of magnitude less memory outputting high quality policies.
Our research also proposes effective solutions to long-standing problems of generating a set of basis functions and computing a hierarchical problem decomposition. Both basis function approximation and hierarchical decomposition are popular in existing literature for speeding up planning, but they are not fully automated - a human is required to specify the basis functions and the hierarchy. We provide novel, domain-independent solutions that remove this additional human effort.
Our research addresses several long standing challenges in AI, like scaling stochastic planning, and automatically generating basis functions and subgoal hierarchies. We expect to produce state-of-the-art planners that will be effective in large and complex real world scenarios, e.g., planetary exploration, military operations planning, and robotic decision making.
Automated planning – devising a course of action to best accomplish an agent’s goals and objectives – is a major area of study for Artificial Intelligence researchers. Planning is especially difficult in domains where uncertainty is rife and actions do not always have their intended effect. In this case a mathematical framework, called Markov Decision Processes (MDPs), is often adopted because it can describe a wide variety of planning scenarios ranging from military operations planning to controlling a Mars rover. However, today’s MDP solution techniques scale poorly, limiting their practical applicability. Furthermore, theorists have shown that efforts to devise algorithms that solve MDPs exactly (eg find the guaranteed best possible policy) are doomed to be impractically slow for large problems. Therefore, under this grant, we investigated approximate strategies to find methods with that run quickly enough to be practical yet produced excellent (if not optimal) plans. Specifically, we devised algorithms that automatically discover and exploit the hidden structure of factored MDPs. Doing so helps solve MDPs faster and with less memory than state-of-the-art techniques. Our algorithms discover two complementary state abstractions – basis functions and nogoods. A basis function is a conjunction of literals; if the conjunction holds true in a state, this guarantees the existence of at least one trajectory to the goal. Conversely, a nogood is a conjunction whose presence implies the non-existence of any such trajectory, meaning the state is a dead end. We compute basis functions by regressing goal descriptions through a determinized version of the MDP (i.e., one in which the effects of actions are predictable). Nogoods are constructed with a novel machine-learning algorithm that uses basis functions as training data. Our state abstractions can be leveraged in several ways. We describe three diverse approaches — GOTH, a heuristic function for use in heuristic search algorithms such as RTDP; RETRASE, an MDP solver that performs modified Bellman backups on basis functions instead of states; and SIXTHSENSE, a method to quickly detect dead- end states. In essence, our work integrates ideas from deterministic planning and basis function-based approximation, leading to methods that outperform previous approaches by a wide margin.