This project investigates the problem of human action recognition in video. A human action does not occur in isolation, and it is not the only thing recorded in a video sequence. A video clip of a human action also contains many other components, including the background scene, interacting objects, camera motion, and the activity of other people. Some of these components are contextual elements that frequently co-occur with the action category under consideration. The project develops technologies that separate human actions from these co-occurring factors for large-scale recognition and fine-grained visual interpretation of human actions. The developed technologies have practical applications in fields ranging from human-computer interaction and robotics to security and healthcare.

This research develops an approach to human action recognition that explicitly factorizes human actions from context. The key idea is to exploit the complementary information provided by conjugate samples of human actions. A conjugate sample is defined as a video clip that is contextually similar to an action sample but does not contain the action. For instance, a conjugate sample of a handshake clip can be the video sequence showing two people approaching each other prior to the handshake. The handshake clip and the sequence preceding it share many similar or even identical contextual elements, including the people, the background scene, the camera angle, and the lighting conditions. The only thing that sets the two clips apart is the human action itself. A conjugate sample therefore provides complementary information to the action sample; it can be used to suppress contextual irrelevance and magnify the action signal. The specific research objectives of this project are: (1) collecting human action samples for many action classes; (2) developing algorithms to mine and extract conjugate human action samples; and (3) developing a framework that leverages conjugate samples to separate actions from context and learn classifiers for large-scale recognition and fine-grained understanding of human actions.
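To make the idea concrete, the following is a minimal, hypothetical sketch (not the project's actual method) of one way a conjugate sample could be used: a linear action classifier is trained so that each action clip scores higher than its paired conjugate clip by a margin, which pushes the classifier weights toward action-specific evidence rather than the context shared by both clips. Clip-level feature vectors are assumed to be precomputed; all names and parameters below are illustrative.

import numpy as np

def train_with_conjugates(X_action, X_conjugate, lr=0.01, margin=1.0, epochs=100):
    """X_action, X_conjugate: (n_pairs, d) arrays of paired clip features."""
    n, d = X_action.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for x_act, x_conj in zip(X_action, X_conjugate):
            # Hinge-style ranking constraint: the action clip should outscore
            # its conjugate (context-only) clip by at least `margin`.
            if w @ x_act - w @ x_conj < margin:
                w += lr * (x_act - x_conj)  # update on the violated pair
    return w

# Usage with random stand-in features (50 action/conjugate pairs, d = 128).
rng = np.random.default_rng(0)
X_act = rng.normal(size=(50, 128))
X_conj = rng.normal(size=(50, 128))
w = train_with_conjugates(X_act, X_conj)
scores = X_act @ w  # higher scores indicate stronger action-specific evidence

Because the paired clips share their context, the difference x_act - x_conj in the update largely cancels the contextual components of the features, leaving the action signal to drive the learned weights.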

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1566248
Program Officer: Jie Yang
Project Start:
Project End:
Budget Start: 2016-08-01
Budget End: 2019-07-31
Support Year:
Fiscal Year: 2015
Total Cost: $174,855
Indirect Cost:
Name: State University New York Stony Brook
Department:
Type:
DUNS #:
City: Stony Brook
State: NY
Country: United States
Zip Code: 11794