This project is a collaborative effort between the University of Delaware and Millersville University. Information graphics (non-pictorial graphics such as bar charts and line graphs) occur frequently in popular media such as newspapers and magazines. The knowledge conveyed by these graphics is very often absent from the article's text, and (in contrast with scientific documents) the article's text most often does not even refer to the graphics explicitly. Information retrieval research has focused on the text of documents and has largely ignored their information graphics. Yet the graphic designer considered the graphic's message important enough to warrant designing a graphic to convey it. This project's goal is a novel methodology for retrieving relevant information graphics from a digital library in response to user queries.

Information graphics in popular media generally have a communicative goal or message that they are intended to convey. This message encapsulates the high-level knowledge contained in the graphic. The approach of the project is a language model that treats the relevance of a graphic to a query as a mixture of three components: a graphic's intended message, other textual components of the graphic such as its caption and additional textual description augmenting the caption, and the text of the document containing the graphic. Challenges that are being addressed include identifying the portion of the article that is relevant to the graphic, associating query terms with the intended messages of graphics in the document library, expanding the abbreviated captions and additional textual descriptions of graphics to more fully capture their content, and appropriately weighting the contribution of individual components of the mixture model. In addition, some kinds of graphics, such as grouped bar charts, have both a primary intended message and a secondary message. The impact of the secondary message on retrieval when an ideal graphic is unavailable is also being addressed. Evaluation of the graph retrieval methodology consists of experiments in which human subjects rate the relevance of retrieved graphics to user queries.
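The three-component mixture described above can be illustrated with a minimal sketch. It is not the project's implementation: the whitespace tokenization, the additive smoothing, the fixed vocabulary size, the mixture weights, and the names `unigram_model` and `mixture_score` are all assumptions made for illustration. The sketch scores a graphic by the log-likelihood of the query under a weighted mixture of three unigram language models, one built from the intended message, one from the caption and other graphic text, and one from the article text.

```python
import math
from collections import Counter


def unigram_model(text, vocab_size, alpha=0.1):
    """Build an additive-smoothed unigram model from whitespace tokens.

    alpha and vocab_size are illustrative assumptions, not project values.
    """
    counts = Counter(text.lower().split())
    total = sum(counts.values())

    def prob(term):
        # Smoothing keeps unseen terms from getting zero probability.
        return (counts[term] + alpha) / (total + alpha * vocab_size)

    return prob


def mixture_score(query, message, graphic_text, article_text,
                  weights=(0.5, 0.3, 0.2), vocab_size=10_000):
    """Log-likelihood of the query under a three-component mixture.

    weights are hypothetical; the project description names weighting
    the components as an open challenge.
    """
    models = [unigram_model(t, vocab_size)
              for t in (message, graphic_text, article_text)]
    score = 0.0
    for term in query.lower().split():
        # Mix the per-component probabilities, then accumulate in log space.
        p = sum(w * m(term) for w, m in zip(weights, models))
        score += math.log(p)
    return score


# Usage: the graphic whose intended message matches the query ranks higher.
article = "the economy slowed last year"
relevant = mixture_score("rising unemployment",
                         "unemployment rate rising trend",
                         "Jobless claims", article)
unrelated = mixture_score("rising unemployment",
                          "sales of smartphones fell",
                          "Phone shipments", article)
```

Scores are log-probabilities, so they are negative and only their relative order matters. In this sketch the weights are fixed by hand; choosing them well is one of the challenges the project sets out to address.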

The goal of this project is to produce a system for retrieving relevant information graphics, thereby expanding the utility of digital libraries. Together with the SIGHT system, which conveys the content of information graphics via speech, the project will extend the information resources available to individuals with sight impairments. The project will also produce a corpus of information graphics and their XML representations that can be used by other researchers. Corpora and research results will be disseminated on the project web site (www.cis.udel.edu/~carberry/Graph-Retrieval). In addition to significantly increasing the resources accessible from a digital library, the research will lay the foundation for expanding research on question-answering to take into account information graphics. The project will contribute to the development of future scientists by educating graduate students, providing research opportunities for undergraduates at a predominantly undergraduate institution, and enhancing the mentoring skills of graduate students as they work on a team that includes undergraduates.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1016916
Program Officer: Maria Zemankova
Budget Start: 2010-09-01
Budget End: 2015-08-31
Fiscal Year: 2010
Total Cost: $419,056

Name: University of Delaware
City: Newark
State: DE
Country: United States
Zip Code: 19716