This research investigates shared representations, flexible learning techniques, and efficient multi-category inference methods suitable for large-scale visual recognition. The goal is to produce visual systems that can accurately describe a wide range of objects with varying precision, rather than only identifying objects from a few pre-defined categories. The main approach is to design object representations that allow new objects to be understood in terms of existing ones, which permits learning from fewer examples and makes recognition faster and more robust.
The research has three main components: (1) Designing appearance and spatial models for objects that are shared across basic categories; (2) Investigating algorithms to learn from a mixture of detailed and loose annotations and from human feedback; and (3) Designing efficient search algorithms that take advantage of shared representations.
The research provides more detailed, flexible, and accurate recognition algorithms that are suitable for high-impact applications, such as vehicle safety, security, assistance to the blind, household robotics, and multimedia search and organization. For example, if a vehicle encountered a cow in the road, the vision system would localize the cow, its head, and its legs and report "four-legged animal, walking left", even if it had not seen cows during training. The research also provides a unique opportunity to involve undergraduates in research, promote interdisciplinary learning and collaboration, and engage in outreach. Research ideas and results are disseminated through scientific publications, released code and datasets, public talks, and demonstrations for high school students.