This is a collaborative effort among three universities (Columbia, Rochester, and Pittsburgh) to construct, evaluate, and disseminate a package of Corpus Analysis Resources for Discourse (CARD). The goal is to enable large-scale, robust analysis of language use, both within and across distinct types of discourse corpora. CARD has three components: a Discourse Annotation Language (DAL) for encoding information about language use directly within discourse corpora; reliability measures that quantify the degree of variability in DAL annotations; and a library of DAL-annotated corpora that vary in modality, number of participants, domain, and communicative task. DAL follows the Text Encoding Initiative (TEI) guidelines and is implemented in Standard Generalized Markup Language (SGML), so that common authoring and editing utilities can be used with it. DAL is modular, with five layers of linguistic representation: morpho-syntactic, prosodic, anaphoric, lexical, and segmental.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 9528998
Program Officer: Ephraim P. Glinert
Budget Start: 1996-05-15
Budget End: 2000-04-30
Fiscal Year: 1995
Total Cost: $760,821
Name: Columbia University
City: New York
State: NY
Country: United States
Zip Code: 10027