The long-term goal of this research is to develop a computational model of visual perception that achieves the same degree of robust intelligence as biological vision systems. The proposed research will advance the state of the art in the analysis of time-varying images by building models that capture the robust intelligence of the mammalian visual system. These models will represent the invariant structure of a scene (form, shape) independently of its variations (position, size, rotation). They will be composed of multiple layers that capture progressively more complex forms of scene structure as well as its transformations. Mathematically, these multi-layer models have a bilinear form: the generated image is linear in the variables describing form when the variables describing transformation are held fixed, and vice versa. Their detailed structure is learned from natural time-varying images using the principles of sparse and efficient coding.
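As an illustration of what a bilinear form means here, the sketch below writes an image x as x = Σ_ij W_ij a_i b_j, where the vector a stands for form and b for transformation. It is a minimal sketch under stated assumptions. The dimensions, the random basis `W`, and the helper names `render`, `infer_form`, and `infer_transformation` are hypothetical. Sparsity priors and the multi-layer structure are omitted, and the project's actual models are learned rather than random. Because the model is linear in a when b is fixed (and vice versa), each factor can be recovered by ordinary least squares given the other.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
N_PIXELS = 64  # length of a flattened image patch
N_FORM = 8     # number of form (invariant-structure) coefficients a_i
N_TRANS = 4    # number of transformation coefficients b_j

rng = np.random.default_rng(0)
# W[:, i, j] is the image-space basis vector coupling form unit i with transformation unit j.
# In the project these would be learned from natural movies; here they are random.
W = rng.standard_normal((N_PIXELS, N_FORM, N_TRANS))


def render(W, a, b):
    """Bilinear generative model: x = sum_ij W[:, i, j] * a[i] * b[j]."""
    return np.einsum("pij,i,j->p", W, a, b)


def infer_form(W, x, b):
    """With b fixed the model is linear in a, so a is a least-squares solution."""
    A = np.einsum("pij,j->pi", W, b)  # effective basis for this particular transformation
    a, *_ = np.linalg.lstsq(A, x, rcond=None)
    return a


def infer_transformation(W, x, a):
    """With a fixed the model is linear in b, so b is a least-squares solution."""
    B = np.einsum("pij,i->pj", W, a)  # effective basis for this particular form
    b, *_ = np.linalg.lstsq(B, x, rcond=None)
    return b


# One "shape" seen under two different "poses".
a = rng.standard_normal(N_FORM)
b1 = rng.standard_normal(N_TRANS)
b2 = rng.standard_normal(N_TRANS)
x1, x2 = render(W, a, b1), render(W, a, b2)

# The two images differ, but the form coefficients recovered from each agree.
print(np.allclose(x1, x2))
print(np.allclose(infer_form(W, x1, b1), a), np.allclose(infer_form(W, x2, b2), a))
```

Rendering the same a under two different b produces two different images, yet inferring a from either image (given its b) returns the same form coefficients. This is the separation of invariant structure from its variations that the models aim for. Estimating both factors at once is only determined up to scale, since (c·a, b/c) produces the same image for any nonzero c. This is one reason a learned model would add constraints such as sparsity.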

Early measurements and models of natural image structure have had a profound impact on a wide variety of disciplines, including visual neuroscience (e.g., predictions of the receptive field properties of retinal ganglion cells and of simple cells in the visual cortex) and image processing (e.g., wavelets, multi-scale representations, and image denoising). The approach taken by this project extends this interdisciplinary work by learning higher-order scene structure from sequences of natural time-varying images. Given the evolutionary pressures on the visual cortex to process time-varying images efficiently, it is plausible that the computations performed by the cortex can be understood in part from the constraints imposed by efficient representation. Modeling this higher-order structure will also advance the development of practical image processing algorithms by providing representations of the scene suited to the task at hand. Completion of the specific goals of this project will provide new generative models of time-varying image formation and tools with which to analyze the statistics of natural scenes.
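One widely studied model of natural image structure, associated with the simple-cell receptive field predictions, is sparse coding. In sparse coding an image patch x is approximated as Φs, and the coefficient vector s is encouraged to be sparse by minimizing ½‖x − Φs‖² + λ‖s‖₁. The sketch below infers s for a fixed dictionary Φ using the iterative shrinkage-thresholding algorithm (ISTA). ISTA is an assumption made for illustration: the abstract does not say which inference method the project uses. The dictionary here is random rather than learned from natural images. The names `sparse_code`, `soft_threshold`, and `energy` and the values of `lam` and `n_iter` are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t (the proximal operator of t*|.|)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sparse_code(Phi, x, lam=0.1, n_iter=500):
    """Minimize 0.5*||x - Phi @ s||^2 + lam*||s||_1 over s with ISTA."""
    # Step size 1/L, where L = (largest singular value of Phi)^2 is the
    # Lipschitz constant of the gradient of the quadratic term.
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2
    s = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ s - x)                     # gradient of the reconstruction error
        s = soft_threshold(s - step * grad, step * lam)  # gradient step, then shrinkage
    return s


def energy(Phi, x, s, lam):
    """The sparse-coding objective that sparse_code minimizes."""
    r = x - Phi @ s
    return 0.5 * r @ r + lam * np.abs(s).sum()


rng = np.random.default_rng(1)
# Hypothetical overcomplete dictionary: 128 unit-norm atoms for 64-pixel patches.
Phi = rng.standard_normal((64, 128))
Phi /= np.linalg.norm(Phi, axis=0)

# A synthetic patch built from 3 atoms (a stand-in for a natural image patch).
s_true = np.zeros(128)
s_true[[5, 40, 99]] = [1.5, -2.0, 1.0]
x = Phi @ s_true

s = sparse_code(Phi, x, lam=0.1)
print("nonzero coefficients:", np.count_nonzero(s))
print("relative error:", np.linalg.norm(x - Phi @ s) / np.linalg.norm(x))
```

Learning the dictionary itself would alternate this inference step with gradient updates to Φ across many patches. That learning loop is omitted here.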

Because most image processing problems are greatly simplified by a good representation of the data, this research has practical applications in representing, indexing, and accessing digital content such as 2D images and video. The models developed as part of this project are also broadly applicable to image processing tasks such as movie denoising, movie compression, and scene analysis and classification. In addition, their mathematical form makes them applicable to research areas beyond vision, such as the analysis of auditory signals, dynamic routing of network signals, and general mining of complex data sets.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0625717
Program Officer: Daniel F. DeMenthon
Budget Start: 2006-06-01
Budget End: 2007-05-31
Fiscal Year: 2006
Total Cost: $57,007
Organization Name: University of California Berkeley
City: Berkeley
State: CA
Country: United States
Zip Code: 94704