This project will create novel algorithms and learning architectures suitable for understanding how to plan and execute actions in an environment for purposeful object manipulation. Such understanding is indispensable for autonomous agents operating in unstructured environments, and it is also valuable in providing automated assistance to humans during the execution of various physical tasks. The project will computationally "imagine" changes that actors with human-like manipulation capabilities can make on that environment and generate plans that can accomplish the desired manipulations. Such tools facilitate the creation of smart environments, where for example a perception system watching an elderly person can infer the task the person is trying to accomplish and offer advice/assistance. They also allow the creation of automated instructional videos customized to a particular environment that can be used for efficient training of unskilled workers. The project will provide mentoring and research opportunities for a diverse set of students, including members of groups typically under-represented in computer science.
This research will study environments formed by objects, some of which can be manipulated, while others define obstacles to be avoided or support surfaces to be used. Manipulating an object typically means interacting with small parts of the object, referred to as its active sites: handles, buttons, levers, graspable or pushable regions, etc. A deep challenge is to develop tools for identifying and classifying these active sites on objects at large scale, and to codify the types of interactions they partake of based on dynamic 2D/3D imagery, building a vocabulary of elementary actions. This requires novel machine learning methods and deep architectures for processing large-scale dynamic visual and geometric data. It also requires characterizing manipulations at a more abstract level so that they can be used by a variety of effectors, robotic or human, on different object geometries and physical characteristics. A further challenge is the accumulation and update of actionable information as more visual data is received in online object model repositories, such as ShapeNet. A final but key step of the approach will be the development of tools for transporting such action knowledge to new settings that are similar but not identical to the capture settings, using a variety of mathematical tools including functional maps.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.