Many information workers are swamped with unfamiliar collections of text. One challenge is to obtain an accurate overview of a large text collection, such as the public comments collected in "notice and comment" rulemaking. No single tool currently provides a sufficiently diversified picture of such a corpus, and no adequate theory exists to help people explore and form a deep and nuanced understanding of such a text collection. This research seeks to develop a computational framework that allows further exploration of this problem from multiple, integrated perspectives. The assembled perspectives will be brought together into a single supra-document structure that is dynamically constructed under user guidance. In this structure, hierarchical topic clusters will be cross-linked by opinion and argumentation links, using two classes of text analysis engines: one for topics and subtopics, and the other for argument structures. The research team will design, build, and systematically test an overall text exploration framework, as well as an application to support federal regulation writers called the Rule-Writers Workbench. The project involves a strong collaboration with federal government officials, who will provide data and participate in user testing. The three PIs have successfully collaborated on a related project under previous NSF funding.

Intellectual Merit: This is a sustainable collaboration between computer science and political/social science research, rooted in a challenging and important real-world application and informed by years of end-user research. Dynamic, user-driven subtopic definition and clustering algorithms coupled with language modeling are an innovative yet reachable set of goals. The framework to be developed will be grounded in the humanities disciplines' expertise in rhetoric, discourse structure, and subjectivity.

Broader Impacts: The Rule-Writers Workbench will allow federal government regulation writers to employ a suite of technical tools that perform independent analyses of public responses to proposed regulations, including near-duplicate detection and clustering, user-based topic selection from dynamically extracted keywords, opinion identification, and subtopic clustering. These capabilities will open new avenues for federal comment analysis.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0704210
Program Officer: Maria Zemankova
Project Start:
Project End:
Budget Start: 2007-08-15
Budget End: 2011-07-31
Support Year:
Fiscal Year: 2007
Total Cost: $346,765
Indirect Cost:
Name: Carnegie-Mellon University
Department:
Type:
DUNS #:
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213