III-COR: From a Pile of Documents to a Collection of Information: A Framework for Multi-Dimensional Text Analysis

Callan, Jamie

Abstract

Many information workers are swamped with unfamiliar collections of text. One challenge is to obtain an accurate overview of a large text collection, such as the public comments collected in "notice and comment" rulemaking. No single tool currently provides a sufficiently diversified picture of such a corpus, and no adequate theory exists to help people explore and form a deep and nuanced understanding of such a text collection. This research seeks to develop a computational framework that allows further exploration of this problem from multiple, integrated perspectives. All the assembled perspectives will be brought together into a single overall supra-document structure that is dynamically constructed under user guidance. In this structure, hierarchical topic clusters will be cross-linked by opinion and argumentation links, using two classes of text analysis engines: one for topics and subtopics, and the other for argument structures. The research team will design, develop, build, and systematically test an overall text exploration framework and an application to support federal regulation writers, called the Rule-Writers Workbench. There is a strong collaboration with Federal government officials who will provide data and participate in user testing. The three PIs have successfully collaborated on a related project under previous NSF funding.

Intellectual Merit: This is a sustainable collaboration between computer science and political/social science research, rooted in a challenging and important real-world application and informed by years of end-user research. Dynamic, user-driven subtopic definition and clustering algorithms coupled with language modeling are an innovative yet reachable set of goals. The framework to be developed will be grounded in the humanities disciplines' expertise in rhetoric, discourse structure, and subjectivity.

Broader Impacts: The Rule-Writers Workbench will allow federal government regulation writers to employ a suite of technical tools that perform independent analyses of public responses to proposed regulations, including near-duplicate detection and clustering, user-based topic selection from dynamically extracted keywords, opinion identification, and subtopic clustering. These capabilities will open new avenues for federal comment analysis.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0704210
Program Officer: Maria Zemankova

Budget Start: 2007-08-15
Budget End: 2011-07-31
Fiscal Year: 2007
Total Cost: $346,765

Carnegie-Mellon University, Pittsburgh, PA, United States