Learning Word Relationships Using TupleFlow

Allan, James; Croft, W. Bruce

Abstract

The Center for Intelligent Information Retrieval (CIIR) is investigating the impact of statistically derived semantic word relationships on information retrieval. Exploiting these relationships, for example by identifying when different words express the same content, can lead to more effective rankings of retrieval results. Semantic relationships are not labeled explicitly in text and are too varied to be identified solely by hand. The CIIR is therefore mining massive corpora for direct and indirect word co-occurrence data, using both offline and retrieval-time computation. The work focuses on two families of techniques: those that create and use Web-based corpora of "comparable" sentences and text chunks to estimate word and phrase translation probabilities, and those that derive relationships from "context vectors" that represent word and phrase meanings. The quality of the discovered word relationships is being tested in large-scale retrieval experiments. In addition, the CIIR is addressing computational barriers to large-scale data mining by moving its new distributed computational framework, TupleFlow, to Hadoop. TupleFlow was developed for the kinds of indexing and analysis operations required for large-scale studies of relational structure in text. It is an extension of MapReduce, offering advantages in flexibility, scalability, and disk abstraction, along with low abstraction penalties. This work is expected to have broad impact by improving the quality of search results.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0844226
Program Officer: Maria Zemankova

Project Start
Project End
Budget Start: 2009-02-01
Budget End: 2012-01-31
Support Year
Fiscal Year: 2008
Total Cost: $450,000
Indirect Cost

Institution: University of Massachusetts Amherst, Amherst, MA, United States
