News stories or Web pages can contain a great deal of reused information. Different authors may each present different versions of a story or event based on the same sources, and the facts of an event may get recapitulated or restated each time it is presented. Sometimes such presentations have little in common with each other; at other times one may be a copy of the other with minor edits. Given a topic of interest, then, a sufficiently extensive archive could be used to identify when particular ideas or statements originated and to check their validity. The goal of this project is to develop techniques to identify alternative versions of the same information in order to reconstruct how information "flows" between documents.
The project involves investigating a range of approaches to detecting reuse at the level of sentences, passages, and documents. The research is evaluated on several types of corpora, such as news, Web crawls, and blogs, in order to explore the dimensions of reuse and information flow in different situations.
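As a concrete illustration of sentence-level reuse detection (a minimal sketch, not the project's actual method), one common baseline compares sentences by the overlap of their word n-gram "shingles" using Jaccard similarity; the function names and the 0.4 threshold below are illustrative assumptions.

```python
# Minimal sketch of sentence-level reuse detection via word 3-gram
# shingling and Jaccard similarity. Threshold and names are illustrative,
# not the project's actual technique.

def shingles(text, n=3):
    """Return the set of word n-grams (shingles) in a lowercased sentence."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two shingle sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def reused(sent1, sent2, threshold=0.4):
    """Flag a sentence pair as likely reuse when shingle overlap
    meets the (illustrative) threshold."""
    return jaccard(shingles(sent1), shingles(sent2)) >= threshold

original  = "The mayor announced the new budget plan on Tuesday morning"
edited    = "The mayor announced the new budget plan early on Tuesday"
unrelated = "Local teams prepare for the annual spring tournament"

print(reused(original, edited))     # True: heavy shingle overlap
print(reused(original, unrelated))  # False: no shared shingles
```

Shingling captures near-duplicate wording (a copy with minor edits) but not paraphrase, which is why approaches at the passage and document level are also needed.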
The research and its outcomes will have a significant impact on the design of tools that can be used to validate and assess information that comes from sources of differing reliability. Such a tool would be valuable in many applications in education, scientific research, and national security. The results of the research will be published in papers and made accessible via the project Web site (http://ciir.cs.umass.edu/research/textreuse.html), and source code will be distributed through the popular Lemur toolkit (www.lemurproject.org/).