This project investigates how to accurately and robustly detect attributes from images (as well as videos and 3D data), with the goal of developing and publicly releasing effective attribute detection tools. Visual attributes are human-nameable, machine-detectable inherent characteristics of visual content such as objects, scenes, and activities (e.g., four-legged, outdoor, and crowded). They are versatile and have broad application potential: among other uses, they offer a natural human-computer interaction channel for involving humans in the loop of machine vision algorithms, serve as basic building blocks for composing categories and describing instances, and bring rich prior knowledge and regularization to statistical learning models. The project advances the long-standing pursuit of utilizing attributes for a wide variety of visual recognition and search tasks. The project also actively engages graduate and undergraduate students and reaches out to local high-school students. The research results from this project can impact several related communities, such as NLP, speech, and robotics.
This research explicitly tackles the need for attribute detectors to generalize well across different categories, including previously unseen ones. The research team approaches the problem as multi-source domain generalization, treating each category as a domain. In particular, this project develops new feature extraction tools tailored to mid-level attributes, as opposed to traditional features, which are primarily designed and tested for high-level visual recognition. The project consists of three major thrusts, all motivated by the analogy between attribute detection and domain generalization. It begins by learning a fine-grained "shallow" feature mapping (Thrust I) to distill attribute-discriminative signals that are category-invariant, and then looks "deeper" into two feature extraction frameworks, Fisher vectors (Thrust II) and convolutional neural networks (Thrust III), revising them for the purpose of attribute detection.
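
As a rough illustration of what treating each category as a domain can mean in practice, the sketch below builds leave-one-category-out splits: an attribute detector would be trained on all but one category and tested on the held-out, unseen category. This is a common evaluation pattern in domain generalization, not a protocol stated by the project. The function name `leave_one_category_out` and the toy samples are hypothetical.

```python
from collections import defaultdict


def leave_one_category_out(samples):
    """Yield (held_out_category, train, test) splits.

    Each sample is a (features, attribute_label, category) tuple, and
    every category plays the role of one source domain. (Hypothetical
    helper for illustration only.)
    """
    by_category = defaultdict(list)
    for sample in samples:
        by_category[sample[2]].append(sample)

    for held_out in sorted(by_category):
        # Train on every category except the held-out one.
        train = [
            s
            for category, group in by_category.items()
            if category != held_out
            for s in group
        ]
        # Test on the held-out category, which the detector never saw.
        test = list(by_category[held_out])
        yield held_out, train, test


# Made-up toy data: (feature vector, has attribute "four-legged", category).
toy_samples = [
    ([0.9, 0.1], 1, "dog"),
    ([0.8, 0.2], 1, "horse"),
    ([0.1, 0.9], 0, "bird"),
    ([0.7, 0.3], 1, "dog"),
]

splits = list(leave_one_category_out(toy_samples))
for held_out, train, test in splits:
    print(held_out, len(train), len(test))
```

Any attribute classifier could be fitted on `train` and scored on `test` for each split. Averaging the scores over held-out categories gives one estimate of how well a detector transfers to categories it has not seen.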