The process by which people shift their attention from one thing to another touches upon everything that we think and do, and as such has widespread importance in fields ranging from basic research and education to applications in industry and national defense. This research develops a computational model for predicting these human shifts in visual attention. Prediction is understanding, and with this model we will achieve a greater understanding of this core human cognitive process. More tangibly, prediction enables applications to anticipate where attention will shift in response to seeing specific imagery. This in turn would usher in 1) a new generation of human-computer interactive systems, ones capable of interacting with users at the level of their attention movements, and 2) novel ways to annotate and index visual content based on attentional importance or interest. This project combines computational work with cognitive science and digital media, providing an entry to computer science through valuable learning experiences for women and underrepresented minorities who might otherwise be intimidated by traditional computational work. The project broadens the exposure of underrepresented minorities to STEM through partnerships with several ongoing efforts at Stony Brook University, including the Women in Science and Engineering (WISE) and Louis Stokes Alliance for Minority Participation (LSAMP) initiatives.
This project investigates a synergistic computational and behavioral approach for modeling the movements of human attention. This approach is based on an assumption that attentional engagement on an image (or video frame) depends on both the pixels that are being viewed and the viewer's previous state. Based on this assumption, visual attention is posed as a Markov decision process, and inverse reinforcement learning is used to learn a reward function to associate specific spatio-temporal regions in an image, corresponding to the pixels at a viewer's momentary locus of attention, with a reward. Under this novel approach, the attention mechanism is treated as an agent whose action is to select a location in an image or image frame that will maximize its total reward. This model is being evaluated against a behavioral ground truth consisting of the eye movements that people make as they view images and video in the context of free viewing and visual search tasks.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.