This research program is developing theory and algorithms that will enable a robot to learn, through training and experimentation, how to predict object and environmental affordances from sensor data. These affordances determine which actions a robot may perform when interacting with a given object, and thus define the capabilities of the robot at any given time. For example, a doorway affords the possibility of leaving one room and entering another, and a handle attached to an object affords the ability to grasp it. The approach being developed leverages a graphical model framework to learn visual categories (for example, that the world contains entities such as doors and handles) that provide a powerful intermediate representation for affordance prediction and learning. This is in contrast to the classical direct perception approach, in which the agent learns a direct mapping from image features to affordances. The models and theory are being validated on two robot platforms and tasks: an outdoor mobile robot performing navigation and pursuit/evasion tasks, and an indoor robot manipulator performing assembly/disassembly tasks.
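To illustrate the contrast, the toy sketch below (hypothetical function and category names, not the project's code) places the two-stage, category-mediated pipeline next to a direct feature-to-affordance mapping. In the actual system the hand-coded thresholds would be replaced by learned graphical-model inference and learned affordance predictors.

```python
# Toy sketch (hypothetical names and threshold "models", not the project's code)
# contrasting direct perception with the category-mediated pipeline described above.

from typing import Dict, List


def detect_categories(image_features: List[float]) -> Dict[str, float]:
    """Stand-in for graphical-model inference over visual categories.
    A toy threshold plays the role of the learned model here."""
    return {
        "door": 1.0 if image_features[0] > 0.5 else 0.0,
        "handle": 1.0 if image_features[1] > 0.5 else 0.0,
    }


def predict_affordances(categories: Dict[str, float]) -> Dict[str, float]:
    """Map inferred categories to affordance scores."""
    return {
        "traversable": categories["door"],    # a doorway affords leaving the room
        "graspable": categories["handle"],    # a handle affords grasping
    }


def category_mediated(image_features: List[float]) -> Dict[str, float]:
    """Two-stage pipeline: image features -> categories -> affordances."""
    return predict_affordances(detect_categories(image_features))


def direct_perception(image_features: List[float]) -> Dict[str, float]:
    """Classical alternative: a single learned map from features straight to affordances."""
    return {
        "traversable": image_features[0],
        "graspable": image_features[1],
    }


if __name__ == "__main__":
    features = [0.8, 0.2]                 # toy feature vector
    print(category_mediated(features))    # {'traversable': 1.0, 'graspable': 0.0}
    print(direct_perception(features))    # {'traversable': 0.8, 'graspable': 0.2}
```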

The importance and broader impact of this research lie in empowering robots to actively and effectively learn about their environment given little human training. Because pre-programmed sensing capabilities are typically brittle (they do not account for the variability of the world in which the robot actually operates) and because extensive human training and supervision is too labor intensive, such learning paradigms are essential for the development of robots that operate effectively in the human world.

Project Report

The outcomes of this work fall broadly into two areas. The first is developing the ability of a computer vision system to understand egocentric imagery, that is, imagery taken from the point of view of a human agent. The second is allowing a robot to understand and leverage the affordances of an object, where the affordances are the types of actions that can be applied to an object and the outcomes that result from applying those actions.

In the egocentric work, we first developed a novel segmentation method based upon temporally consistent boundary partitioning. This work was essential for generating primitive regions that can be treated as objects relevant to the human activity. The method can be used in both an interactive mode and a purely automatic mode; in the latter case we use the system to produce candidate regions for egocentric activity analysis. There we learn a hierarchical model of an activity by exploiting the consistent appearance of objects, hands, and actions that results from the egocentric context.

The other main focus of the work was the learning of object affordances by a robot. One key development is an architecture that decouples the behavior primitive used to attempt to achieve a particular manipulation goal, the real-time controller used to execute that primitive, and the perceptual primitive used to encode the state of the object during behavior execution. Through systematic experimentation the robot can learn how best to manipulate a set of objects. The second significant result of this thrust is knowledge transfer between objects. The robot uses a shape description that captures both the global structure of an object and the local context of where it is applying the action. By manipulating a variety of objects, the robot learns a mapping from shape description to effective manipulation strategy, which allows it to apply information learned from manipulating a given set of objects to a novel object. This type of learned dynamics model can also be applied in a model predictive control strategy, though our results there are only preliminary. Future work will leverage learned affordance models in a planning framework that permits the robot to plan how to achieve an overall task.
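To make the decoupling described above concrete, the following sketch (hypothetical interfaces and names, not the project's implementation) keeps the perceptual primitive, the real-time controller, and the behavior primitive behind independent interfaces, so that different combinations can be swapped during systematic experimentation.

```python
# Minimal sketch of the decoupled architecture; all interfaces and names are
# illustrative assumptions, not the project's actual code.

from dataclasses import dataclass
from typing import List, Protocol


class PerceptualPrimitive(Protocol):
    def encode_state(self) -> List[float]:
        """Return a feature vector describing the object's current state."""


class Controller(Protocol):
    def step(self, state: List[float]) -> List[float]:
        """Compute one real-time control command from the current state."""


@dataclass
class BehaviorPrimitive:
    """A manipulation behavior (e.g., push, pull, flip) paired with a controller
    and a perceptual primitive; the pairing is a free choice, not hard-wired."""
    name: str
    controller: Controller
    perception: PerceptualPrimitive

    def execute(self, steps: int = 10) -> List[List[float]]:
        """Run the behavior, logging the perceived object state at each step."""
        trajectory = []
        for _ in range(steps):
            state = self.perception.encode_state()
            command = self.controller.step(state)
            # sending `command` to the robot hardware is omitted in this sketch
            trajectory.append(state)
        return trajectory


if __name__ == "__main__":
    class FixedPerception:
        def encode_state(self) -> List[float]:
            return [0.0, 0.0, 0.1]          # toy object pose

    class ZeroController:
        def step(self, state: List[float]) -> List[float]:
            return [0.0] * len(state)       # toy command

    push = BehaviorPrimitive("push", ZeroController(), FixedPerception())
    print(len(push.execute(steps=3)))       # 3 logged states
```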

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0916687
Program Officer: Richard Voyles
Budget Start: 2009-07-15
Budget End: 2014-02-28
Fiscal Year: 2009
Total Cost: $449,063
Name: Georgia Tech Research Corporation
City: Atlanta
State: GA
Country: United States
Zip Code: 30332