This project explores (semi-)automatic ways to create "semantically discriminative" mid-level cues for visual object categorization, by introducing external knowledge of object properties into the statistical learning procedures that learn to distinguish them. In particular, the PIs investigate four key ideas: (1) exploiting taxonomies over object categories to inform feature selection algorithms such that they home in on the most abstract description for a given granularity of label predictions; (2) leveraging inter-object relationships conveyed by the same taxonomies to guide context learning, so that it captures more than simple data-driven co-occurrences; (3) exploring the utility of visual attributes drawn from natural language, both as auxiliary learning problems to bias models for object categorization, as well as ordinal properties that must be teased out using non-traditional human supervision strategies; (4) mining attributes that are both distinctive and human-nameable, moving beyond manually constructed semantics.
The project entails original contributions in both computer vision and machine learning, and is an integral step towards semantically-grounded object categorization. Whereas mainstream approaches reduce human knowledge to mere category labels on exemplars, this work leverages semantically rich knowledge more deeply and earlier in the learning pipeline. The approach results in vision systems that are less prone to overfit incidental visual patterns, and representations that are readily extendible to novel visual learning tasks. Beyond the research community, the work has broader impact through inter-disciplinary training of graduate and undergraduate students, and outreach to pre-college educators and students through workshops and summer camps encouraging young students to pursue science and engineering.