Each year Federal regulatory agencies issue more than 4,000 new rules. Many of these must be created through a process known as notice and comment (N&C) rulemaking: the agency drafts a proposed rule and then exposes the proposal, any underlying data, and its legal and policy rationale to public comment. N&C rulemaking is one of the most important methods of contemporary public policy making; it is also one of the slowest and most expensive. An agency may receive hundreds of thousands of comments on a proposed rule, and it is legally obligated to review and respond to all significant ones. As requirements to consult, study, and/or certify have proliferated, rule writers have found it increasingly difficult to keep track of these requirements and to recognize which, if any, are relevant in a particular rulemaking. Electronic rulemaking (eRulemaking) has the potential to radically transform the N&C process. It could make the process more transparent and accessible to the public, and more substantively reliable and cost-effective for the agency. So far, though, e-docket systems and eRulemaking workbenches make only rudimentary use of available technology. This project will use well-developed and emerging methods of natural language processing (NLP) to develop tools to aid agency rule writers in: (1) organizing, analyzing, and managing the comments, studies, and other supporting documents associated with a proposed rule; and (2) analyzing proposed rules to flag possibly relevant legal mandates from among the large number of statutes and Executive Orders that potentially require analyses, consultations, or certifications during rulemaking. The research team will collaborate with the Federal Departments of Transportation and Commerce. The team will focus, in particular, on the use of information extraction, text categorization, and opinion-oriented text analysis techniques in both supervised and weakly supervised machine learning frameworks.
Evaluation will involve: the use of accepted technical measures of NLP performance (e.g., recall and precision); a combination of qualitative and quantitative social science methods to assess integration of the tools into the rule-writing process as perceived by staff at various levels of the agency hierarchy; and observation by legally trained researchers with expert understanding of the rulemaking process.
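
As an illustration of the technical measures named above, precision and recall for a tool that flags significant comments could be computed roughly as follows. This is a minimal sketch and not the project's evaluation code: the function name `precision_recall`, the comment identifiers, and the sample sets are all hypothetical.

```python
def precision_recall(predicted, relevant):
    """Return (precision, recall) for a set of predicted items.

    predicted: items the tool flagged (hypothetical tool output)
    relevant:  items a human reviewer judged significant (hypothetical gold labels)
    """
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    # Precision: share of flagged items that were actually relevant.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of relevant items that the tool managed to flag.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: the tool flags four comments, three are truly significant.
flagged = {"c1", "c2", "c3", "c4"}
significant = {"c2", "c3", "c5"}
p, r = precision_recall(flagged, significant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the four flagged comments are significant (precision 0.50), and two of the three significant comments were found (recall about 0.67).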

Intellectual Merit. The research will help realize the positive potential of eRulemaking, advance the state of the art in NLP, and improve our understanding of the effects of technology on rulemaking. Because of its interdisciplinary composition (combining expertise in NLP, expert knowledge of regulatory law and legal information systems, and social science experience in studying the effects of technology on organizations), the Cornell team is well situated to generate both qualitative and quantitative data about the crucial, but still largely under-studied, rulemaking process.

Broader Impacts. The project provides an important opportunity for interdisciplinary education and research for PhD, master's, and undergraduate students in Cornell's Information Science Program. All data sets and tools will be made available to other researchers. The NLP methods to be developed are general-purpose techniques, trainable for any domain or genre, and useful in any context that requires managing, organizing, and analyzing large volumes of text. Finally, many of the same techniques that help agency rule writers can be used to design agency websites that help the public search, sort, and otherwise selectively access materials in the rulemaking process.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0535099
Program Officer: Maria Zemankova
Budget Start: 2005-11-15
Budget End: 2010-04-30
Fiscal Year: 2005
Total Cost: $825,000
Name: Cornell University
City: Ithaca
State: NY
Country: United States
Zip Code: 14850