The PIs will undertake an experimental study and computational modeling of the internal representations and associated processes that underlie action perception and understanding by observers, and action planning and execution by actors. To facilitate both careful experimentation and formal theory, the PIs will approach the behavior representation problem primarily through the visual system, asking: how do we use vision to understand the actions of others? That is, how do we map image sequences depicting simple actions onto the internal representations that support action recognition, imitation, and related abilities? The PIs will further explore the higher-level cognitive representations and mechanisms used to categorize, reason about, and judge the movements and actions of others. The approach is based on a novel formal theory of the mental representations and processes subserving action understanding and planning. The PIs believe this theory provides a compact but powerful and extensible computational framework for the analysis and synthesis of complex actions (and action sequences), built from a very small set of atomic postural elements ("key frames" or "anchors") and probabilistic grammatical rules for combining them. This probabilistic "pose grammar" approach to action representation resembles state-of-the-art speech recognition techniques (e.g., hidden Markov models), with key postural silhouettes taking the place of phonemes; such augmented transition grammars also mirror sophisticated new control-theoretic techniques in robotics for robust anthropomorphic movement.
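To make the speech-recognition analogy concrete, the following is a minimal sketch, assuming a plain hidden Markov model rather than the PIs' own pose grammar: a few hypothetical key poses play the role of phonemes, a transition table plays the role of the probabilistic grammar, and Viterbi decoding recovers the most likely key-pose sequence from a stream of coarse silhouette observations. The pose names, silhouette classes, and all probabilities are illustrative assumptions, not values from the project.

```python
import math

# Hypothetical key poses ("anchors"); names are illustrative only.
STATES = ["stand", "crouch", "leap", "land"]

# Hypothetical start probabilities P(first pose).
START = {"stand": 0.7, "crouch": 0.3, "leap": 0.0, "land": 0.0}

# Hypothetical transition probabilities P(next pose | current pose),
# standing in for the probabilistic "grammar" over key poses.
TRANS = {
    "stand":  {"stand": 0.6, "crouch": 0.4, "leap": 0.0, "land": 0.0},
    "crouch": {"stand": 0.1, "crouch": 0.5, "leap": 0.4, "land": 0.0},
    "leap":   {"stand": 0.0, "crouch": 0.0, "leap": 0.5, "land": 0.5},
    "land":   {"stand": 0.5, "crouch": 0.2, "leap": 0.0, "land": 0.3},
}

# Hypothetical emission probabilities P(observed silhouette class | pose).
EMIT = {
    "stand":  {"upright": 0.8, "low": 0.1, "airborne": 0.1},
    "crouch": {"upright": 0.2, "low": 0.7, "airborne": 0.1},
    "leap":   {"upright": 0.1, "low": 0.1, "airborne": 0.8},
    "land":   {"upright": 0.3, "low": 0.6, "airborne": 0.1},
}


def log(p):
    """Log-probability, mapping zero probability to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations):
    """Return the most probable key-pose sequence for the observed frames."""
    # best[s]: log-probability of the best path that ends in pose s.
    best = {s: log(START[s]) + log(EMIT[s][observations[0]]) for s in STATES}
    back = []  # one dict of back-pointers per later frame
    for obs in observations[1:]:
        ptr, new_best = {}, {}
        for s in STATES:
            # Best predecessor pose for reaching s at this frame.
            prev = max(STATES, key=lambda p: best[p] + log(TRANS[p][s]))
            ptr[s] = prev
            new_best[s] = best[prev] + log(TRANS[prev][s]) + log(EMIT[s][obs])
        back.append(ptr)
        best = new_best
    # Trace back from the most probable final pose.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return list(reversed(path))


frames = ["upright", "low", "airborne", "airborne", "low", "upright"]
print(viterbi(frames))  # ['stand', 'crouch', 'leap', 'leap', 'land', 'stand']
```

Working in log-probabilities keeps long sequences from underflowing, and zero-probability transitions (e.g., "stand" directly to "leap") act as grammar constraints that rule out implausible pose orderings.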
The action representational system is not monolithic; rather, it spans a range of informational structures at hierarchical levels corresponding to different behavior "spaces": mechatronic space, used in movement planning and production; cognitive space, containing representations for action recognition, analysis, and evaluation; visual motion space, which encodes and organizes the visual motion produced by human action; and linguistic motion space, composed of conceptual/symbolic action encodings. Setting aside linguistic motion space, the PIs' theoretical, computational, and experimental efforts seek to clarify and formally describe both the nature of the representations in the remaining spaces and, crucially, the mappings of representations between them. In particular, they explore a candidate action representation, termed a visuo-motor representation, which, by supporting the understanding of observed actions, may recapitulate and resonate with the motor representations actually used to generate movement. They also propose a promising method for deriving this representation from discrete action elements, or anchors.

Broader Impacts: This project will lead to significant advancements in both research and applications in psychology (e.g., robust social judgments given degraded biological motion), kinesiology (e.g., analysis/modeling/training of movement profiles, as in athletics or pathology/rehabilitation), robotics (e.g., control of anthropomorphic robots), human and computer vision (e.g., automated action recognition in digital video), and other fields concerned with the interpretation and production of human/humanoid action.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0433226
Program Officer: Ephraim P. Glinert
Budget Start: 2004-09-01
Budget End: 2008-08-31
Fiscal Year: 2004
Total Cost: $375,000
Organization: Harvard University
City: Cambridge
State: MA
Country: United States
Zip Code: 02138