This project develops advanced machine learning and computer vision technologies for affect-based video retrieval, i.e., retrieving videos according to their emotional content. Introducing such a personal touch into video retrieval can improve users' interaction experience with videos by allowing them to retrieve and organize videos based on their specific emotional needs. In addition, the project has impact on a wide range of fields, including advertising and education, allowing video creators such as advertisers and educators to effectively customize videos to best serve users' emotional needs. The project also contributes to education and student training. It is integrated with education by (a) introducing a course on computer vision for affective computing; (b) involving undergraduate and graduate students in the project, especially those from under-represented groups; and (c) organizing workshops and tutorials at major computer vision and affective computing conferences on topics related to this research for further dissemination of its ideas and results.

This research addresses problems in video affective content analysis. Affect-based video retrieval faces two major challenges. First, there exists a significant semantic gap between low-level video features and the high-level affective content of a video. Second, because the perception of emotion is subjective, emotional responses to the same video vary significantly from person to person. For the first challenge, the PI develops a novel generative deep model that automatically learns an affect-sensitive multi-modal middle-level video representation from raw video data. To further improve the characterization of a video's affective content, the PI augments this representation with semantic video attributes derived from well-established video production knowledge, producing a hybrid multi-modal middle-level video representation that can effectively bridge the gap between the raw video and its affective content. For the second challenge, the PI employs a multi-task deep learning method to tailor the middle-level representation to each user's specific affective preferences, maximizing each user's experience with videos.
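To make the overall architecture concrete, the following is a minimal PyTorch sketch of how a hybrid multi-modal middle-level representation with per-user prediction heads might be wired. All module names, dimensions, and the feed-forward encoder are hypothetical illustrations: the abstract describes a generative deep model, for which the plain encoder here is only a simplified stand-in, and the actual method is not specified at this level of detail.

```python
import torch
import torch.nn as nn

class HybridAffectModel(nn.Module):
    """Sketch: learned multi-modal middle-level representation, augmented
    with semantic video attributes, feeding per-user heads (multi-task
    personalization). All dimensions and names are hypothetical."""

    def __init__(self, visual_dim, audio_dim, attr_dim, mid_dim, num_users):
        super().__init__()
        # Learn a middle-level representation from raw multi-modal features
        # (stand-in for the generative deep model described in the abstract).
        self.encoder = nn.Sequential(
            nn.Linear(visual_dim + audio_dim, mid_dim),
            nn.ReLU(),
            nn.Linear(mid_dim, mid_dim),
            nn.ReLU(),
        )
        # One head per user: each task tailors the shared hybrid
        # representation to that user's affective preferences.
        self.user_heads = nn.ModuleList(
            [nn.Linear(mid_dim + attr_dim, 1) for _ in range(num_users)]
        )

    def forward(self, visual, audio, attributes, user_id):
        learned = self.encoder(torch.cat([visual, audio], dim=-1))
        # Hybrid representation: learned features augmented with semantic
        # attributes derived from video production knowledge.
        hybrid = torch.cat([learned, attributes], dim=-1)
        return self.user_heads[user_id](hybrid)

# Toy usage: predict one user's affective score for a batch of 4 videos.
model = HybridAffectModel(visual_dim=512, audio_dim=128, attr_dim=16,
                          mid_dim=64, num_users=10)
score = model(torch.randn(4, 512), torch.randn(4, 128),
              torch.randn(4, 16), user_id=3)
print(score.shape)  # torch.Size([4, 1])
```

The design choice illustrated here is the usual rationale for multi-task personalization: the encoder is shared across all users, so videos rated by any user improve the common representation, while the lightweight per-user heads capture individual differences in emotional response from relatively few per-user labels.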

Project Start:
Project End:
Budget Start: 2015-09-01
Budget End: 2020-12-31
Support Year:
Fiscal Year: 2015
Total Cost: $482,000
Indirect Cost:
Name: Rensselaer Polytechnic Institute
Department:
Type:
DUNS #:
City: Troy
State: NY
Country: United States
Zip Code: 12180