The aim of this project is to build statistical language models that capture several kinds of regularities in natural language, chiefly local lexical patterns and long-range syntactic or semantic structure, in order to improve the performance of natural language applications. The work is conducted under the directed Markov random field paradigm, in which increasingly advanced syntactic structure and/or semantic topic components are sequentially embedded to form complex distributions over natural language. By exploiting the particular structure of each composite language model, seemingly complex statistical representations are decomposed into simpler ones; the estimation and inference algorithms for the simpler composite language models then serve as internal building blocks for estimating more complex composite models, ultimately solving the estimation problem for extremely complex, high-dimensional distributions.
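As a loose illustration of the composition idea (not the project's actual directed Markov random field formulation), a composite model can be sketched as combining a simple local component with a simple global component, each estimated on its own and then interpolated. The class name, method names, and the linear-interpolation scheme below are illustrative assumptions, not details taken from the project:

```python
from collections import defaultdict


class CompositeLM:
    """Toy composite language model: interpolates a local bigram
    component with a global unigram component standing in for a
    coarse "topic" signal. Purely illustrative."""

    def __init__(self, lam=0.7):
        self.lam = lam  # weight on the local (bigram) component
        self.bigram = defaultdict(lambda: defaultdict(int))
        self.unigram = defaultdict(int)
        self.total = 0

    def train(self, sentences):
        # Each simple component is estimated independently by counting;
        # the composite model only combines their predictions.
        for sent in sentences:
            tokens = ["<s>"] + sent
            for prev, cur in zip(tokens, tokens[1:]):
                self.bigram[prev][cur] += 1
                self.unigram[cur] += 1
                self.total += 1

    def prob(self, prev, cur):
        ctx = self.bigram[prev]
        # Local estimate from the bigram component (0 if context unseen).
        p_local = ctx[cur] / sum(ctx.values()) if ctx else 0.0
        # Global estimate from the unigram component.
        p_global = self.unigram[cur] / self.total if self.total else 0.0
        return self.lam * p_local + (1 - self.lam) * p_global


lm = CompositeLM(lam=0.5)
lm.train([["the", "cat", "sat"], ["the", "dog", "sat"]])
p = lm.prob("the", "cat")
```

The point of the sketch is only the decomposition: each component is trained and queried in isolation, so richer components (syntactic, topical) could in principle be slotted in without changing how the composite prediction is formed.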
The composite language models are scalable and may significantly improve the performance of state-of-the-art speech recognition and machine translation systems, which would constitute an important contribution to language modeling research. The techniques developed in this project may not only lead to effective, robust, and intelligent language technology applications but may also be extended and applied to problems in computational biology and computer vision. The project provides an excellent environment for interdisciplinary education in information technology, bridging language and speech processing, machine learning and computational statistics, and theoretical computer science, to the benefit of students at all levels.