Igbo is a language spoken by twenty million people, mostly in southern Nigeria. This EAGER project makes use of a corpus of spoken Igbo, which will cover all of the dialects of the language. The corpus is to be used for explorations in which statistical machine learning (ML) programs learn an "inter-Igbo" consisting of cognate sets (Igbo words that are pronounced differently in different locations but have the same meaning). These cognate sets enable the corpus to be treated as if it were spoken as a single language, even though the dialects at the extreme ends of the Igbo homeland are mutually unintelligible. Another aspect of our work is the extension of the existing corpus to fill gaps in dialect coverage, where there are currently recordings from locations that have no near geographical neighbors. The need for this stems from the fact that the closer a dialect's neighbors are, the more similar they are to it, and the easier it is for programs to locate words that differ systematically.

Achieving the goals of this exploratory project is of considerable interest for computational linguistics. In contrast to language change over time, language change over geography has received little computational attention, and finding appropriate ML models for this aspect of language variation is a considerable challenge.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1240178
Program Officer: Tatiana Korelsky
Project Start:
Project End:
Budget Start: 2012-07-01
Budget End: 2014-06-30
Support Year:
Fiscal Year: 2012
Total Cost: $91,000
Indirect Cost:
Name: Brown University
Department:
Type:
DUNS #:
City: Providence
State: RI
Country: United States
Zip Code: 02912