Language is one of the most complex aspects of human behavior and provides the foundation for many kinds of social interaction. The question of how people learn and use language is the subject of extensive research in several behavioral sciences, including cognitive science, psychology, and linguistics. There is a long tradition of using formal approaches to explore answers to this question, and recent work has begun to emphasize the importance of statistical models.
With support from the National Science Foundation, Dr. Griffiths at UC Berkeley and Dr. Johnson at Brown University will develop and investigate new methods and models for learning and analyzing natural languages based on Bayesian statistics. In Bayesian statistics, the information about the structure of language provided by linguistic data is combined with a "prior" distribution that constrains the structures under consideration. This approach can make it easier to learn the properties of a language from limited amounts of data, and it has a direct connection to theories of human language acquisition that emphasize the role of constraints in learning. This research project aims to integrate the statistical models used for learning and analyzing language with two methods from modern Bayesian statistics: Markov chain Monte Carlo algorithms and nonparametric Bayesian models. These methods make it possible to apply Bayesian inference to complex models of the kind typically used in cognitive science and linguistics. The results of this project will provide new ways of working with traditional models of language and may lead to new models relevant to explaining how people acquire language.
By exploring how contemporary statistical methods can be applied to the probabilistic models used in computational linguistics, this project will build closer connections between statistics, linguistics, and cognitive science, and provide opportunities for students to receive training in topics at the intersection of these disciplines.
This award was supported as part of the fiscal year 2006 Mathematical Sciences priority area special competition on Mathematical Social and Behavioral Sciences (MSBS).