The ability to accurately model such complex phenomena as the natural scene statistics inherent in stacks of photographs or movies, or the collective behavior of hundreds of simultaneously recorded neurons in the cerebral cortex, would be transformative for our understanding of the natural world and of human thought. The insights gained would not only enhance our understanding of the brain and the sensory stimuli it can process, but would also confer practical advantages, leading, for example, to improvements in automated speech recognition and meaningful analysis of real-time video. The data needed for these studies are becoming available at a rapid pace, but these large and complex data sets defy traditional modeling and analysis techniques. The complexity and size of many recently acquired corpora in biology, physics, and engineering mean that powerful mathematical models cannot be fit to them unless the models are constrained by strong and unjustified assumptions about the data. This, coupled with the inherent difficulty of developing general-purpose machine learning algorithms, has driven most contemporary scientists and engineers to focus on algorithms tailored to narrow problem spaces rather than tackling the more general machine learning problem. Fortunately, some researchers have continued to push for general learning algorithms with capabilities closer to those of human intelligence, but they have typically had to rely on ad hoc assumptions or uncontrolled approximations in order to make progress on this daunting problem. This proposal aims to further develop a recently introduced machine learning technique, called Minimum Probability Flow learning, so that it is capable of fitting exceedingly general parametric models to much larger data sets than has previously been possible.
In addition, this proposal aims to develop novel, complementary methods for sampling efficiently from a model distribution once the parameters have been fit to data, so that the models can be understood and meaningfully compared with one another. These techniques will be used to study the statistical structure of natural scenes by fitting a new and powerful mathematical model to a database consisting of a large number of photographs. The program proposed here is highly interdisciplinary, drawing ideas and approaches from physics, engineering, computer science, and systems neuroscience.

Project Start:
Project End:
Budget Start: 2012-09-15
Budget End: 2016-08-31
Support Year:
Fiscal Year: 2012
Total Cost: $449,999
Indirect Cost:

Name: University of California Berkeley
Department:
Type:
DUNS #:
City: Berkeley
State: CA
Country: United States
Zip Code: 94710