Ide - Abstract

The goal of this project is to develop a sound basis and methodology for the representation of corpora and linguistic information in corpora, as well as for the design of text-handling tools, for use in corpus-based natural language processing (NLP) research. The work is undertaken in collaboration with the Laboratoire Parole et Langage in Aix-en-Provence, France. The project involves (1) analysis of the needs of corpus-based NLP research, in terms of both the kinds and degree of annotation required and the requirements for efficient processing, accessibility, etc.; (2) analysis of the general properties and configuration of corpora, analysis of the relevant structural and logical features of component text types, and the design of encoding mechanisms that can represent all required elements and features while accommodating the requirements determined in (1); and (3) specifications for text software design, coordinated with (2), intended to avoid redundancy and maximize the modifiability, extensibility, and reusability of corpus-handling software. The methods and materials developed in this project will provide a comprehensive framework for the machine representation and manipulation of large corpora for corpus-based NLP research, enabling both software and data to be easily shared, used, and re-used in the future.