This software capitalization project funds the reformatting of online text data that exists as part of the Association for Computational Linguistics Data Encoding Initiative into a common SGML-based format. It also funds making that data available to the research community at low cost and with minimal restrictions. This is the first of several collections being reformatted. The project enables natural language research to scale up so that more realistic problems can be studied. This is particularly relevant to applications in the recognition and analysis of text and speech. Existing generally available text databases are too small. Obtaining enough text and making it usable for research is expensive and time-consuming, and having individual researchers duplicate this effort would be wasteful. A common database will also allow published results to be replicated or extended. The project is jointly funded with other NSF offices and with DARPA.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 9113530
Program Officer: Larry H. Reeker
Budget Start: 1991-07-01
Budget End: 1993-12-31
Fiscal Year: 1991
Total Cost: $139,904

Organization Name: University of Pennsylvania
City: Philadelphia
State: PA
Country: United States
Zip Code: 19104