Finding statistical models that capture the regularities and variabilities of the bewildering variety of visual patterns in natural scenes is at the heart of understanding the mystery of vision. Continuing the pattern-theoretic approach pioneered by Grenander and advocated by Mumford, and building on the active basis model that the PIs have recently developed, the PIs propose research projects to further develop statistical models, along with the associated learning and inference algorithms, for vision. The active basis model is a mathematical representation of deformable templates of object patterns. Each template is a sparse composition of selected Gabor wavelet elements that are allowed to perturb their locations and orientations. The template can be learned from training images by a shared sketch algorithm, and the learned template can then be used to recognize objects in testing images using a cortex-like architecture of sum-max maps.

The proposed research develops hierarchical compositional models with active basis models as building blocks or part-templates. It studies unsupervised learning of dictionaries of active basis templates from natural images or from images of objects of multiple categories and viewpoints. It also studies a shape script model in which the part-templates are designed elementary geometric shapes represented by active basis models. Moreover, the proposed research compares generative and discriminative approaches to learning, using the active basis model as an example of a generative model. Finally, it extends the active basis model by coupling wavelet sparse coding for shape patterns with Markov random fields for texture patterns.
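To make the sum-max recognition architecture concrete, the following is a minimal sketch in Python of how a learned active basis template could be scored against an image. It assumes that the SUM1 maps (Gabor filter response maps over positions and orientations) have already been computed; the array shapes, perturbation ranges, and function names are illustrative assumptions for exposition, not the actual implementation.

```python
import numpy as np

def max1_maps(sum1, dloc=3, dorient=1):
    """MAX1 maps: local maximum of the SUM1 (Gabor response) maps over small
    perturbations in location (+/- dloc pixels) and orientation (+/- dorient bins).
    sum1 is assumed to have shape (n_orient, H, W)."""
    n_orient, H, W = sum1.shape
    max1 = np.empty_like(sum1)
    for o in range(n_orient):
        # pool over nearby orientation bins (cyclic over orientations)
        pooled = np.max([sum1[(o + d) % n_orient]
                         for d in range(-dorient, dorient + 1)], axis=0)
        # pool over nearby locations via shifted copies
        # (np.roll wraps around at the borders; ignored in this sketch)
        best = np.full((H, W), -np.inf)
        for dr in range(-dloc, dloc + 1):
            for dc in range(-dloc, dloc + 1):
                best = np.maximum(best, np.roll(np.roll(pooled, dr, 0), dc, 1))
        max1[o] = best
    return max1

def sum2_map(max1, template):
    """SUM2 map: template matching score at every image location.
    template is a list of (row_offset, col_offset, orientation_index, weight)
    tuples, one per selected Gabor wavelet element of the active basis template."""
    n_orient, H, W = max1.shape
    sum2 = np.zeros((H, W))
    for (r, c, o, w) in template:
        # weighted, locally maximized response of the element at its offset
        sum2 += w * np.roll(np.roll(max1[o], -r, 0), -c, 1)
    return sum2
```

The MAX1 pooling is what allows each selected wavelet element to perturb its location and orientation, and the SUM2 score sums the pooled responses of the selected elements weighted by the learned coefficients; detection then amounts to finding high values in the SUM2 map.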
The biological visual cortex can learn and recognize an enormous number of visual patterns in its environment, seemingly without effort. One may regard the visual cortex as an extremely sophisticated statistical model equipped with extremely efficient and robust learning and inference algorithms. What this model looks like and how it learns from its visual environment remains a deep mystery. The proposed research has the potential to advance our understanding of this question. It also leads to concrete models and algorithms that can be used for learning and recognizing a wide variety of object patterns.
This project is about developing mathematical and statistical representations of images of natural scenes. In particular, we seek to develop a method that can automatically learn a dictionary of "visual words" from training images, so that each image can be represented by a small number of "words" selected from the dictionary. Such a representational scheme provides symbolic interpretations of natural images and enables the computer to perform vision tasks such as object recognition. In this project, we have developed a compositional sparse coding model for representing natural images based on this idea, together with a computational algorithm for learning the model from training images. We have applied the model and algorithm to some vision tasks and achieved state-of-the-art performance. Based on the results obtained in this project, the PI and co-PI have given invited talks at departmental seminars at various universities, as well as at conferences, workshops, and summer schools. Some of the papers resulting from this project have appeared in or been accepted by peer-reviewed journals and conference proceedings. The project has helped us train graduate students in conducting original research, writing computer code, performing experiments to evaluate the proposed methods, and communicating results through papers and presentations.
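As an illustration of the dictionary-based encoding described above, here is a minimal sketch in Python of a greedy selection step, assuming the match score maps of the dictionary templates on an image have already been computed. The function name, the rectangular inhibition, and the stopping rule are simplifying assumptions for exposition, not the compositional sparse coding algorithm developed in this project.

```python
import numpy as np

def encode_image(score_maps, n_select=5, inhibit_radius=20):
    """Greedy encoding sketch: given match score maps of the dictionary templates
    over an image (shape: n_templates x H x W), repeatedly pick the best-matching
    template instance and suppress overlapping responses, so that the image is
    summarized by a small number of "visual words" from the dictionary."""
    maps = np.array(score_maps, dtype=float)  # work on a copy
    n_templates, H, W = maps.shape
    code = []
    for _ in range(n_select):
        t, r, c = np.unravel_index(np.argmax(maps), maps.shape)
        if not np.isfinite(maps[t, r, c]) or maps[t, r, c] <= 0:
            break  # nothing left worth encoding
        code.append((t, r, c, maps[t, r, c]))
        # explaining-away step: suppress all templates near the selected instance
        r0, r1 = max(0, r - inhibit_radius), min(H, r + inhibit_radius + 1)
        c0, c1 = max(0, c - inhibit_radius), min(W, c + inhibit_radius + 1)
        maps[:, r0:r1, c0:c1] = -np.inf
    return code
```

Each selected triple (template index, row, column) plays the role of one selected "word", and the suppression step encourages a small set of roughly non-overlapping words to account for the image.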