This software capitalization project funds the reformatting of the online text data that exists as part of the Association for Computational Linguistics Data Encoding Initiative into a common SGML-based format, and its distribution to the research community at low cost and with minimal restrictions. This is the first of several collections being reformatted. The project enables natural language research to scale up so that more realistic problems can be studied, which is particularly relevant for applications in the recognition and analysis of text and speech. Existing generally available text databases are too small for such work, and obtaining sufficient text and making it usable for research is expensive and time-consuming. It is wasteful for individual researchers to duplicate this effort. A common database will also permit published results to be replicated or extended. The project is jointly funded with other NSF offices and with DARPA.