This award supports the development of a system for computing syntactic structure in spoken-language conversations, with a specific emphasis on parent-child conversations. The new system, called GRASP (Grammatical Relations Analysis for Spontaneous Protocols), replaces the traditional computation of full parse trees with the computation of grammatical relations linked through a dependency structure. GRASP is being applied to all the English-language corpora in the CHILDES (Child Language Data Exchange System) database. It replaces tedious hand calculation of commonly used child language profiles such as IPSyn, DSS, and LARSP with a more reliable automatic method. Because it can concentrate on the syntactic structures that are most relevant to these profiles, GRASP can achieve significantly greater parsing accuracy. To validate accuracy, results from GRASP are compared with hand-coded results contributed by eight laboratories. Deviations between GRASP and the hand-coded results are analyzed in detail to diagnose ways to increase system accuracy. The current system relies primarily on a simple statistical parser for grammatical relations, trained on maternal utterances in the Eve and Sachs corpora in CHILDES. Analyses have shown that training on the grammatical relations in the maternal input is superior to training on inputs that include children's utterances, even when the target utterances are child utterances. Additional modules under construction emphasize rule-based construction of grammatical relations and robustness rules.
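
The comparison between automatic and hand-coded analyses described above can be illustrated with a minimal sketch. It is not the GRASP implementation: the `Relation` type, the function names, the example utterance, and the relation labels (`SUBJ`, `OBJ`, `QUANT`, `JCT`, `ROOT`) are all assumptions chosen for illustration. Each word is represented as one dependency triple (dependent, head, label), and agreement is the fraction of hand-coded triples that the automatic analysis reproduces exactly.

```python
from typing import List, NamedTuple, Optional, Tuple


class Relation(NamedTuple):
    """One grammatical relation in a dependency structure (illustrative type)."""
    dependent: int  # 1-based position of the dependent word
    head: int       # 1-based position of the head word; 0 marks the root
    label: str      # relation label, e.g. "SUBJ" (labels here are assumptions)


def labeled_agreement(predicted: List[Relation], gold: List[Relation]) -> float:
    """Fraction of hand-coded relations matched on both head and label."""
    gold_by_dep = {r.dependent: r for r in gold}
    if not gold_by_dep:
        return 0.0
    matches = sum(1 for r in predicted if gold_by_dep.get(r.dependent) == r)
    return matches / len(gold_by_dep)


def disagreements(
    predicted: List[Relation], gold: List[Relation]
) -> List[Tuple[Relation, Optional[Relation]]]:
    """Pairs (hand-coded, automatic) for every word where the two differ."""
    pred_by_dep = {r.dependent: r for r in predicted}
    return [
        (g, pred_by_dep.get(g.dependent))
        for g in gold
        if pred_by_dep.get(g.dependent) != g
    ]


# Hypothetical utterance: "you want more juice"
gold = [
    Relation(1, 2, "SUBJ"),
    Relation(2, 0, "ROOT"),
    Relation(3, 4, "QUANT"),
    Relation(4, 2, "OBJ"),
]
predicted = [
    Relation(1, 2, "SUBJ"),
    Relation(2, 0, "ROOT"),
    Relation(3, 2, "JCT"),  # wrong head and wrong label
    Relation(4, 2, "OBJ"),
]

score = labeled_agreement(predicted, gold)
errors = disagreements(predicted, gold)
```

Listing the mismatched pairs, rather than reporting only a score, mirrors the stated goal of analyzing deviations in detail to find ways of improving accuracy.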

Automation of these scoring profiles benefits academics and clinical professionals working with bilingual and minority populations, delayed talkers, and children with hearing problems. In addition, the grammatical relation dependency structures being computed are at the core of language acquisition theory, underlying topics such as thematic role structure, binding, relativization, complementation, and movement. The specific set of grammatical relations being encoded was determined by paying close attention to structures of traditional importance in the literature and by an open solicitation, posted to the child language bulletin board, for additional suggested relations. After consolidating these new parsing methods and automating the relevant child language measurement instruments, the project will extend the results to language development in other languages, as well as to adult spoken-language corpora. The final system will be disseminated by facilitating web-based analyses of online corpora and by packaging it as a simple desktop application.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0414630
Program Officer: Tatiana D. Korelsky
Budget Start: 2004-12-15
Budget End: 2008-11-30
Fiscal Year: 2004
Total Cost: $405,999
Name: Carnegie-Mellon University
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213