The automatic identification in images of people, places, objects, and especially object categories is a central and ongoing challenge within computer vision. This project addresses this problem using low-level image features to learn intermediate representations, ones in which objects in images are labeled with an extensive list of highly descriptive visual attributes. This work demonstrates this approach in three domains: faces, plant species, and architecture. In each domain, it develops techniques for deriving visual attribute vocabularies, training attribute detectors, and building compositional models to automatically label attributes in images.

The project is making four fundamental contributions to the use of visual attributes. 1) It is developing new methods by which automatic systems and humans can interact to select domain-appropriate attribute vocabularies and label large image collections. 2) It is developing compositional models that capture dependencies between attributes. This provides more accurate attribute detection and enables inference of global properties of objects. 3) Using compositional models, the project is developing new, localizable attributes that capture the geometric relations between object parts and landmarks. 4) The project is designing algorithms that combine attributes to identify objects, search through vast image collections, and automatically annotate image databases.

Not only is this research generating large datasets of labeled images that should help catalyze new research, it is also demonstrating the feasibility of new systems for analyzing images in specialized domains such as faces, plants, and architecture. For example, the project develops new software applications for analyzing and searching images of faces as well as free mobile apps for plant species identification.

Project Report

Our work over the three years of this award, on part- and attribute-based descriptions of objects, ultimately focused on automatic species recognition. We built a recognition system on an electronic database of 500 North American bird species, and we developed new methods for acquiring domain-specific vocabularies of describable visual attributes. These techniques achieve state-of-the-art recognition performance on a new, large dataset that we make publicly available through two apps developed for the iPhone and a comprehensive website.

Specific contributions to higher education and science for the general public include:

1. Two PhDs trained at Columbia University. In addition, dozens of undergraduate and master's students volunteered to work on the websites and app development in elective computer science courses for extra credit.
2. Dogsnap iPhone app
3. Birdsnap website
4. Birdsnap iPhone app
5. General-audience press coverage:
   - What Kind of Bird is That? Snap a Picture and Find Out, in Columbia News (a.k.a. The Record), 13 May 2014
   - Is It a Crow or a Raven? New Birdsnap App Will Tell You!, from Columbia Engineering, 28 May 2014
   - Crow or raven? Twitcher app analyses your photos to tell you what bird you've snapped, in the Mail Online from the Daily Mail, 29 May 2014
   - New App Makes Identifying Bird Species Easy, in Mental Floss, 3 June 2014
   - The Birdsnap App Review, in This Machine Watches Birds, 4 June 2014
   - Innovative Technology Gives Birdwatching a Boost, in Audubon Magazine, 13 June 2014
   - Birdsnap: Bird Nerding on a Smartphone, in OZY, 18 June 2014
   - 161 Bird-Watcher Apps for the iPhone—and They’re All for the Birds, in Scientific American, 18 June 2014
   - To ID Birds, Try Facial Recognition, in Science News, 28 June 2014 (issue date 12 July 2014)
   - App Wrap on Birdsnap (and GolfMatch), on NY1, 7 July 2014
   - Bird Watching in the 21st Century, in UMD Right Now, 17 July 2014
   - Birdsnap app lets enthusiasts ID birds with mobile photos, on WTOP (Washington D.C. 103.5 FM), 23 July 2014
   - Birdsnap Review, on Because Birds, 2 August 2014
6. New methods that advance computer vision and machine learning for automatically detecting parts of objects in images.
7. New methods that advance computer vision and machine learning for recognizing fine-grained visual categories in images.
8. Acquisition and labeling of the largest image dataset of North American bird species for machine learning.

Columbia University
New York
United States