Zygmunt Pizlo, Purdue University; Longin Jan Latecki, Temple University
The human eye, like a camera, produces 2-dimensional images of a 3-dimensional world. How does the human brain succeed in interpreting these impoverished 2D images, allowing us to see the world as it actually is "out there"? This fundamental question, whose significance has been appreciated for 300 years, has not been answered despite the efforts of many scientists, engineers and mathematicians. Conventional approaches, which have not been successful, tried to recover the 3D shapes of objects and scenes from their 2D images by analyzing the depths of surfaces in multiple images (such as might be obtained from two eyes or from moving images) and by emphasizing the role of learning and familiarity. The approach taken by Zygmunt Pizlo at Purdue University and Longin Jan Latecki at Temple University is very different. It uses only a single 2D image to recover the third dimension by applying a priori constraints (assumptions about the world built into the human visual system) that reflect properties generally present in the physical world, such as the symmetry and compactness of 3D objects.
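To make the idea concrete, here is a minimal sketch (not the authors' published model) of how a single 2D image can constrain a 3D interpretation when symmetry and compactness are assumed a priori. It assumes orthographic projection, known correspondences between image points that depict mirror-symmetric 3D point pairs, an image x-axis aligned with the projected symmetry direction, and 3D compactness measured as V^2/S^3 of the convex hull of the recovered points; the function names are hypothetical.

import numpy as np
from scipy.spatial import ConvexHull

def recover_symmetric_3d(pairs_2d, slant):
    """pairs_2d: (N, 2, 2) array; pairs_2d[i] holds the image points (x1, y) and
    (x2, y) of one mirror-symmetric 3D point pair, with the projected symmetry
    direction aligned to the image x-axis.  slant: angle (radians) of the
    symmetry-plane normal out of the image plane.  Returns a (2N, 3) array of
    3D points; depth is defined only up to a constant, as usual under
    orthographic projection."""
    points = []
    for (x1, y1), (x2, y2) in pairs_2d:
        xm = 0.5 * (x1 + x2)              # midpoint must lie on the symmetry plane
        zm = -xm / np.tan(slant)          # plane through the origin: cos(s)*x + sin(s)*z = 0
        dz = (x1 - x2) * np.tan(slant)    # depth difference implied by the symmetry
        points.append([x1, y1, zm + 0.5 * dz])
        points.append([x2, y2, zm - 0.5 * dz])
    return np.asarray(points)

def compactness(points_3d):
    # 3D compactness proxy V^2 / S^3, computed on the convex hull of the points.
    hull = ConvexHull(points_3d)
    return hull.volume ** 2 / hull.area ** 3

def best_interpretation(pairs_2d, slants=np.linspace(0.1, 1.5, 200)):
    # The 2D image leaves a one-parameter family of symmetric 3D interpretations
    # (one per slant value); the compactness prior selects a unique member.
    return max((recover_symmetric_3d(pairs_2d, s) for s in slants), key=compactness)

Each slant value yields a different, but fully specified, 3D shape consistent with the same image; the a priori constraint is what breaks the tie among these interpretations.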
Pizlo and Latecki's research has the potential to encourage theoretical changes in the study of human perception because it takes an entirely new approach to a classical unsolved problem in vision. It could support breakthroughs in machine vision because human beings are known to be much better than any machine at recovering the 3D world from 2D information. Machine vision has important applications in many domains, including law enforcement and national security. Pizlo and Latecki's research attempts to solve the visual 3D shape problem by combining the results of experiments on human observers with state-of-the-art computational modeling. The project will provide an excellent opportunity for the interdisciplinary education of graduate and undergraduate students in psychology, computer science and engineering.
Shape refers to all spatially global geometrical characteristics of a 3D object. Because of its inherent complexity, shape is almost always sufficient to establish the identity and functionality of an object. This is true of chairs, tables, sofas, bookshelves and trash cans, of animals, trees and flowers, of tools and parts, of cars, planes, bicycles and motorcycles, i.e., of everything important in our environment. Clearly, the ability to see the shapes of objects correctly is critical in all our everyday activities. Subjectively and commonsensically, visual perception of 3D shapes is accurate, easy and effortless, but the underlying computational mechanisms are far from obvious.

From the mathematical point of view, the main challenge is that the visual input is a set of 2D retinal images, while the world around us is 3D. It follows that there are always many different possible 3D interpretations of the 2D retinal images. Is the visual system able to choose a unique interpretation, and if so, is the interpretation the right one? The long history of research on visual shape perception has been quite inconclusive: some studies report accurate shape perception, while others report large shape illusions.

We showed that perception of the shapes of 3D objects is extremely good, and we provided a mathematical theory and a computational model of the underlying mechanisms. The computational model explains which aspects of the 2D retinal image are used by the visual system and how they are combined with a priori knowledge of the 3D physical world "out there" to produce an accurate 3D interpretation. We showed that the symmetry of natural objects is the main a priori constraint used by the human visual system. This constraint is universal in the sense that all natural and man-made objects are either symmetrical or nearly symmetrical. When this constraint cannot be applied to a given object, 3D shape perception is completely unreliable.

Our computational model serves as the basis for building a machine that sees the way we do. In this way, our basic research on human vision has broader implications for computer vision, artificial intelligence and robotics. At the same time, the model explains how the brain solves the shape problem and resolves the apparent contradictions that have cluttered research on human shape perception for decades.
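A hypothetical usage example of the sketch given earlier: a mirror-symmetric box is projected orthographically, and a 3D interpretation is recovered from that single image. With compactness as the only selection criterion the chosen interpretation is only roughly veridical, so the example illustrates the one-parameter ambiguity and its resolution by an a priori constraint, not the full model described here.

import numpy as np

# Vertices of a 1 x 2 x 3 box, mirror-symmetric about the plane x = 0.
box = np.array([[sx * 0.5, sy * 1.0, sz * 1.5]
                for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])

slant_true = 0.6   # rotate the symmetry-plane normal out of the image plane
R = np.array([[np.cos(slant_true), 0.0, -np.sin(slant_true)],
              [0.0, 1.0, 0.0],
              [np.sin(slant_true), 0.0, np.cos(slant_true)]])
image = (box @ R.T)[:, :2]                # the single orthographic 2D image

# Vertex i and vertex i + 4 are the images of a mirror-symmetric 3D pair.
pairs_2d = np.stack([np.stack([image[i], image[i + 4]]) for i in range(4)])

recovered = best_interpretation(pairs_2d)
print("compactness of selected interpretation:", compactness(recovered))
print("compactness of the true box:           ", compactness(box))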