Understanding how humans differentiate between visual objects is a central problem for psychologists who want to understand the visual system and for computer scientists who want to emulate its abilities. Previous work has developed techniques for determining the image locations and spatial frequencies most responsible for human performance in classification and identification tasks. These techniques, however, cannot tell us whether the important image features are detected in a single step, for example, by matching the visual object to a single internal representation such as a template, or by combining the results of multiple detectors. With support from the National Science Foundation, Dr. Andrew Cohen from the University of Massachusetts, Amherst, will develop a new model, the Multiple Independent Template Induction Model (MITIM), designed to answer such questions.
Human behavioral experiments on tasks such as word recognition in noisy images have suggested that people do not always recognize objects holistically. Rather, they appear to detect pieces independently and then combine those results. If this is true in domains such as face or object classification and recognition, understanding these abilities requires knowledge of the independently detected parts and their relative importance. The proposed MITIM algorithm will use statistical techniques from artificial intelligence research to process human image classification data and discover the independent templates used in object recognition. The insights gained will help illuminate human visual performance and help scientists and engineers better understand how to build computer systems that can replicate that performance, functionality that is increasingly important in web search, robotics, and security applications.
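The contrast between holistic and part-based recognition can be sketched computationally. The following is a minimal illustration, not the MITIM algorithm itself: the templates, the noise level, and the conjunctive combination rule are all invented here for demonstration. It compares a single-step detector that correlates a noisy image with one whole-object template against a detector that matches two part templates independently and then combines the outcomes.

```python
import numpy as np

# Hypothetical example: the part templates and the combination rule
# below are assumptions made for illustration, not MITIM's actual output.

rng = np.random.default_rng(0)

# Two 4x4 "part" templates that together form one 4x8 object.
part_a = np.array([[1, 1, 0, 0],
                   [1, 1, 0, 0],
                   [0, 0, 1, 1],
                   [0, 0, 1, 1]], dtype=float)
part_b = part_a[:, ::-1]  # mirrored right half
object_template = np.hstack([part_a, part_b])

def match(patch, template):
    """Normalized correlation between an image patch and a template."""
    p = patch - patch.mean()
    t = template - template.mean()
    return float((p * t).sum() /
                 (np.linalg.norm(p) * np.linalg.norm(t) + 1e-9))

def detect_holistic(image, threshold=0.5):
    # Single-step detection: match the whole object template at once.
    return match(image, object_template) > threshold

def detect_parts(image, threshold=0.5):
    # Independent detectors: match each part separately, then combine.
    # A conjunction is used here; the real combination rule is exactly
    # what a model like MITIM would need to infer from human data.
    left, right = image[:, :4], image[:, 4:]
    return (match(left, part_a) > threshold and
            match(right, part_b) > threshold)

# A noisy rendering of the object, analogous to the noisy stimuli
# used in the behavioral experiments described above.
noisy = object_template + 0.3 * rng.standard_normal(object_template.shape)
```

Both detectors accept this clean-plus-noise stimulus, but they diverge on degraded stimuli: occluding one part defeats the conjunctive part-based detector while the holistic matcher may still succeed, which is the kind of behavioral signature that distinguishes the two accounts.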
This award was supported as part of the fiscal year 2006 Mathematical Sciences priority area special competition on Mathematical Social and Behavioral Sciences (MSBS).