The applicability of many machine-learning techniques is limited to very specific, fixed application environments. By contrast, model-based Bayesian reinforcement learning is a more sophisticated framework that enables human-like decision-making with evolving objectives and without complete knowledge of the environment. However, its computational complexity is very high, and consequently its progress has been confined to small-scale demonstrations in a few applications. The goal of this project is to overcome this hurdle by developing innovative algorithm-hardware co-design techniques. The research outcome is expected to greatly accelerate the computation of model-based Bayesian reinforcement learning for practical, large-scale real-life applications, especially those with real-time constraints. The research results will benefit many fields that directly impact society, e.g., driverless cars, unmanned aerial vehicles, smart agricultural irrigation, robotics for disaster relief, and robotic assistants for people with disabilities. The research will also train students, including women and members of other underrepresented groups, for the much-needed U.S. workforce in related areas of technology.
The computational kernel of model-based Bayesian reinforcement learning is random sampling over a decision tree. In this project, computational acceleration will be achieved by exploiting the intrinsic parallelism of the sampling process. Both the memory and the arithmetic bottlenecks of traditional approaches will be addressed. First, a logic-circuit-based technique will be developed to represent probabilities, thereby greatly reducing the memory footprint of the algorithm. Second, new arithmetic techniques will be explored to achieve area-efficient computation with the newly proposed number representation. Third, a new sampling method that is amenable to circuit implementation will be investigated. Fourth, an approximation technique will be studied to reduce the complexity of tracing sampling histories. Finally, path-aware parallel sampling will be exploited to avoid the redundant computations found in software implementations. These techniques and their overall effectiveness will be validated experimentally.
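To make "random sampling over a decision tree" concrete, the following is a minimal, hypothetical Python sketch; the abstract does not specify the project's algorithm, so this is not the proposed method. It assumes a toy tree of chance nodes, where each node carries a reward and a list of (probability, child) pairs. The expected return is estimated by averaging many randomly sampled root-to-leaf paths and is compared against the exact recursive value. The names `Node`, `sample_path_return`, `exact_value`, and `monte_carlo_value` are illustrative only. Each sample is independent of the others, which is the kind of intrinsic parallelism the abstract refers to. Here the samples simply run one after another.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """A chance node in a toy decision tree (hypothetical structure).

    reward: reward collected on reaching this node.
    children: list of (probability, child Node) pairs; empty for a leaf.
    """
    reward: float = 0.0
    children: list = field(default_factory=list)


def sample_path_return(root, rng):
    """Walk one random root-to-leaf path and sum the rewards along it."""
    total = 0.0
    node = root
    while True:
        total += node.reward
        if not node.children:
            return total
        probs = [p for p, _ in node.children]
        kids = [c for _, c in node.children]
        # Pick the next child according to its branch probability.
        node = rng.choices(kids, weights=probs, k=1)[0]


def exact_value(node):
    """Exact expected return, computed recursively for comparison."""
    return node.reward + sum(p * exact_value(c) for p, c in node.children)


def monte_carlo_value(root, num_samples, seed=0):
    """Estimate the expected return by averaging independent path samples.

    The samples do not depend on one another, so they could be drawn in
    parallel (e.g., by replicated hardware samplers); here they run serially.
    """
    rng = random.Random(seed)
    total = sum(sample_path_return(root, rng) for _ in range(num_samples))
    return total / num_samples


if __name__ == "__main__":
    # Toy tree: with probability 0.3 collect reward 1; otherwise move to a
    # second chance node that yields +2 or -1 with equal probability.
    tree = Node(0.0, [
        (0.3, Node(1.0)),
        (0.7, Node(0.0, [(0.5, Node(2.0)), (0.5, Node(-1.0))])),
    ])
    print("exact:", exact_value(tree))
    print("estimate:", monte_carlo_value(tree, 50_000, seed=1))
```

The exact value of this toy tree is 0.3·1 + 0.7·(0.5·2 + 0.5·(−1)) = 0.65, and the sampled estimate converges toward it as the number of samples grows.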
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.