RI: Small: CompCog: Modeling Latent Discrete Knowledge Across Utterances

Eisner, Jason

Abstract

Each human language is a system of conventions for communicating information. How, then, does everyone come to know this complex system? Describing it is difficult even for linguists, yet young children somehow figure out the rules and vocabulary of their native language. Adults continue to learn when confronted with unfamiliar words, with new conventions associated with social media, or with the layout conventions of a new website.

This project develops new artificial intelligence methods for tasks of this kind. These methods will enable computers to handle a wider variety of human language data, improving information access and global communication. They will also provide insight into why human intelligence is able to succeed at these problems.

The methods will seek to discern the systematic structure that explains the patterns in naturally occurring linguistic data. Specifically, our computers will analyze such data in order to learn:

* How to break down words into meaningful parts and reassemble those parts into new words. Linguists call this subject morphophonology. It is of practical importance for the automated analysis and translation of speech and text.

* How to break down sentences into meaningful phrases. This requires determining the basic word-order facts of the language: the problem of grammar induction, which is considered a central mystery of human language learning.

* How to extract machine-readable data from large websites that present databases in human-readable form. This involves automatically figuring out the database structure and layout conventions of a website.

* How to track names across large quantities of informal text. By discovering the principles that govern how people use and modify names, a computer can recognize that the nickname "Vlad P." or the misspelled patronymic "Vladimir Vladimirovich" might be variant ways of referring to "Vladimir Putin," especially in a political comment.

The project will address each of these domains in a principled way. In each domain, our strategy is to develop a novel Bayesian generative model together with efficient machine learning algorithms for approximate inference. We expect this work to expand the range of modeling and inference techniques available to the natural language processing community.

Innovative technical directions include the automatic reconstruction of phonological underlying forms, a novel treatment of grammar induction as structured prediction, a nonparametric model of databases and database-backed websites, and a phylogenetic model of name variation.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1423276
Program Officer: Tatiana Korelsky

Budget Start: 2014-08-01
Budget End: 2018-07-31
Fiscal Year: 2014
Total Cost: $457,999

Institution: Johns Hopkins University, Baltimore, MD, United States