Detecting and recognizing objects in real-world images is a challenging problem with many practical applications. The past few years have shown growing success on tasks such as detecting faces and text, and recognizing objects which have limited spatial variability.
Broadly speaking, the difficulty of detection and recognition increases with the variability of the objects, with rigid objects being the easiest and deformable articulated objects being the hardest. There is, for example, no computer vision system which can detect a highly deformable and articulated object such as a cat in realistic conditions or read text in natural images. This project develops and evaluates computer vision technology for detecting and recognizing deformable articulated objects.
The strategy is to represent objects by recursive compositional models (RCMs), which describe objects as compositions of subparts. Preliminary work has shown that these RCMs can be learnt from natural images with only limited supervision. In addition, inference algorithms have been developed which can rapidly detect and describe a limited class of objects. This project starts with single objects with fixed pose and viewpoint and proceeds to multiple objects, poses, and viewpoints. Theoretical analysis of these models gives insight into the performance and computational complexity of RCMs.
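To make the idea of recursive composition concrete, the following is a minimal sketch, not the project's actual model. It assumes a hypothetical Part node that carries its own evidence score and a list of subparts, and it scores an object as its own evidence plus the scores of its subparts, computed recursively. The names Part, local_score, total_score, and depth, as well as the toy numbers, are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Part:
    """Hypothetical node in a compositional hierarchy (illustration only)."""
    name: str
    local_score: float = 0.0  # assumed evidence for this part on its own
    children: List["Part"] = field(default_factory=list)


def total_score(part: Part) -> float:
    """Score a part as its own evidence plus the scores of its subparts."""
    return part.local_score + sum(total_score(c) for c in part.children)


def depth(part: Part) -> int:
    """Number of levels in the hierarchy rooted at this part."""
    return 1 + max((depth(c) for c in part.children), default=0)


# Toy hierarchy: a cat composed of a head and a body, each with subparts.
cat = Part("cat", 0.0, [
    Part("head", 0.5, [Part("ear", 1.0), Part("ear", 1.0)]),
    Part("body", 2.0, [Part("leg", 0.25), Part("tail", 0.25)]),
])

print(total_score(cat), depth(cat))
```

In this sketch the same recursive definition applies at every level, so subparts can in principle be shared or reused across objects. A real RCM would also model the spatial relations between subparts, which this toy example omits.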
The expected result is a new technology for detecting and recognizing objects for the applications mentioned above. The results are disseminated through peer-reviewed publications, webpage downloads, and university courses.