This CAREER proposal develops novel automatic summarization systems that incorporate both linguistic quality and content quality considerations in their operation. The main motivation for the work is that even the best current systems do not take the characteristics of the input into account, cannot estimate how successfully they perform content selection, and completely ignore the linguistic quality of their output.

Improving the linguistic quality of summaries requires combining a wide range of text quality factors and assessing their relative importance: discourse relations; topic, entity, and word coherence; the form of referring expressions; and vocabulary. Tools for automatically extracting models of these factors from the input text, including automatic analysis of explicit and implicit discourse relations, are developed as part of the project. The resulting models of linguistic quality will have broader impact on a whole range of text-producing applications, including question answering, machine translation, automatic essay grading, and computer-assisted writing tutoring.

Improving content quality requires taking the characteristics of the input into account. In particular, we develop measures of input difficulty that enable systems to automatically predict whether they can produce a good-quality summary for a given input and to change summarization strategy when necessary. We also develop specialized summarization strategies for input types on which current system performance is known to be suboptimal.

Text quality and summarization are research topics with cross-disciplinary appeal. The PI will offer project-based courses at the undergraduate and graduate levels that have the potential to attract young people to the field of computer science.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0953445
Program Officer: Tatiana D. Korelsky
Project Start:
Project End:
Budget Start: 2010-02-01
Budget End: 2015-01-31
Support Year:
Fiscal Year: 2009
Total Cost: $431,926
Indirect Cost:
Name: University of Pennsylvania
Department:
Type:
DUNS #:
City: Philadelphia
State: PA
Country: United States
Zip Code: 19104