Linguistic code switching (LCS) is the practice of switching back and forth between the shared languages of bilingual or multilingual speakers. This phenomenon is particularly prevalent in geographic regions with linguistic boundaries or where there are large immigrant groups. Various levels of language (phonological, morphological, syntactic, semantic and discourse-pragmatic) may be implicated in LCS in different language pairs and/or genres. Computational algorithms trained for a single language quickly break down when the input includes LCS. A major barrier to research on LCS in computational linguistics (CL) has been the lack of large, accurately annotated corpora of LCS data. In this project, a large repository of LCS data is collected and a large annotation infrastructure is developed. It is consistently annotated in different modalities (speech and text), at various levels of linguistic granularity, and across different language pairs reflecting different linguistic typologies (Standard Arabic and Dialectal Arabic, Arabic-English, Spanish-English, Chinese-English, Hindi-English). The focus of the effort is on intra-sentential LCS.

This infrastructure and unified large LCS data resource is eagerly awaited by the CL research community, since annotated LCS data provides a natural test-bed for adaptive learning algorithms and the handling of diverse data sources, as well as a framework for genuine multilingual processing. It will also be of benefit to sociolinguistic and theoretical linguistic researchers, and provide a platform for collaborative interdisciplinary research. Finally, research on LCS helps overcome biases against multilingual speakers by demonstrating the creativity of such speakers in exploiting their verbal repertoires. Such a result is particularly important for K-12 education and testing policies in the USA with its diverse immigrant population.

Agency
National Science Foundation (NSF)
Institute
Division of Computer and Network Systems (CNS)
Type
Standard Grant (Standard)
Application #
1205556
Program Officer
Tatiana Korelsky
Project Start
Project End
Budget Start
2012-09-01
Budget End
2013-06-30
Support Year
Fiscal Year
2012
Total Cost
$400,000
Indirect Cost
Name
Columbia University
Department
Type
DUNS #
City
New York
State
NY
Country
United States
Zip Code
10027