Understanding printed documents is an intelligent activity. This research concerns automating one aspect of document image analysis: deriving a high-level representation of a document's visual content. Documents often contain photographs with accompanying text, and this effort seeks an integrated interpretation of the communicative unit consisting of a photograph and its caption. When the text describes salient aspects of a photograph, it can be used to direct a vision system in understanding that photograph. The research has two components: the first deals with language issues, the second with the development of a vision subsystem. Methods of extracting visual information from text, specifically the cues required to identify salient objects, are to be studied; such information may be present in a variety of forms, based on both syntax and semantics. The role of textually extracted visual cues in performing visual object recognition is also to be studied. As a test of the theory, it is proposed to develop a system in which the result of parsing a newspaper photograph's caption is used to identify human faces in the photograph. The face-location subsystem will incorporate scale-invariant techniques and filters that characterize faces by the presence of distinguishing visual features.
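The caption-parsing idea described above can be illustrated with a minimal sketch. This is a hypothetical toy, not the proposed system: it uses a simple regular expression over capitalized word sequences and honorific titles (the actual proposal calls for a full syntactic and semantic parse) to extract person-name cues and spatial keywords from a caption, yielding an expected face count that a face-location subsystem could use to guide its search.

```python
import re

# Hypothetical honorifics list and name pattern; the real system would
# rely on syntactic/semantic parsing rather than surface patterns.
TITLES = r"(?:Mr\.|Mrs\.|Ms\.|Dr\.|Sen\.|Gov\.|President)"
NAME = rf"{TITLES}?\s*[A-Z][a-z]+(?:\s[A-Z][a-z]+)+"

def extract_name_cues(caption: str) -> dict:
    """Extract candidate person names and spatial-ordering keywords
    from a newspaper photo caption."""
    names = re.findall(NAME, caption)
    # Spatial keywords such as "left" and "right" hint at where each
    # named person's face should appear in the photograph.
    ordering = [w for w in ("left", "center", "right") if w in caption.lower()]
    return {"names": names, "expected_faces": len(names), "ordering": ordering}

caption = "Gov. Mario Cuomo, left, greets Dr. Jane Smith at the Buffalo rally."
cues = extract_name_cues(caption)
```

Here the cue extractor predicts two faces and notes the "left" keyword, so a face locator could prioritize the left region of the image for the first named individual.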

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 9014110
Program Officer: Howard Moraff
Project Start:
Project End:
Budget Start: 1991-06-01
Budget End: 1993-05-31
Support Year:
Fiscal Year: 1990
Total Cost: $188,888
Indirect Cost:
Name: Suny at Buffalo
Department:
Type:
DUNS #:
City: Buffalo
State: NY
Country: United States
Zip Code: 14260