Natural language inference (NLI) and data-driven paraphrasing share related goals: detecting the semantic relationship between two natural language expressions, and rewording an input text so that the result preserves meaning while differing in form. On the one hand, work in recognizing textual entailment (RTE) within NLI has attempted to formalize the process of determining whether a natural language hypothesis is entailed by a natural language premise, an approach sometimes called "natural logic". Research in data-driven paraphrasing, on the other hand, attempts to extract paraphrases at a variety of levels of granularity, including lexical paraphrases (simple synonyms), phrasal paraphrases, phrasal templates (or "inference rules"), and sentential paraphrases, for various downstream applications such as question answering, information extraction, text generation, and summarization.

This EAGER award explores bridging that gap through analysis of sentential paraphrasing via synchronous context-free grammars (SCFGs), examining how such grammars may be coupled with formal constraints akin to recent phrase-based formulations of natural logic for RTE. Data-driven paraphrasing has largely neglected semantic formalisms, while NLI has relied heavily on hand-crafted resources like WordNet. If successful, this project will potentially lead to NLI systems that are more robust and paraphrasing systems that are better formalized; taken together, these improvements will allow better RTE systems to be developed. Moreover, the project has the potential to impact widely used human language technologies such as web search and natural language interfaces to mobile devices, and to further the connection between computational semantics and formal linguistics.

Project Report

In this project we conducted exploratory research into a hybrid of natural language inference (NLI) and data-driven paraphrasing. The goals of the two are interrelated: the goal of NLI is to detect the semantic relationship between two natural language expressions, while the goal of paraphrasing is to rewrite an input text so that the resulting text is meaning-equivalent but worded differently. Work in NLI and recognizing textual entailment (RTE) has attempted to formalize the process of determining whether a natural language hypothesis is entailed by a natural language premise. Research in data-driven paraphrasing attempts to extract paraphrases at a variety of levels of granularity, including lexical paraphrases (simple synonyms), phrasal paraphrases, phrasal templates (or "inference rules"), and sentential paraphrases, for various downstream applications such as question answering, information extraction, text generation, and summarization. NLI and RTE research has typically employed an underlying formal semantic representation, but often at the expense of covering only a small fragment of language. Conversely, paraphrasing has typically been atheoretical, lacking a formal semantic representation, but with very robust coverage of language. This project aimed to explore how data-driven paraphrase resources could be made more "formal" in capturing linguistic meaning. We analyzed a subset of the 100 million paraphrases contained in the Paraphrase Database (PPDB) and characterized the semantic relationships between paraphrase pairs, using the ontological relations in WordNet and using natural logic. This 1-year EAGER project grew into a larger research effort through the DARPA DEFT program.
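To make the characterization step concrete: classifying a paraphrase pair by its WordNet relation amounts to mapping synonymy, hypernymy, and antonymy onto natural-logic relations such as equivalence and forward entailment. The sketch below is purely illustrative and is not the project's actual code; the tiny hand-built lexicon stands in for WordNet's ontological relations, and all entries in it are hypothetical examples.

```python
# Illustrative sketch only: map word pairs onto coarse natural-logic
# relations. The miniature dictionaries below are hypothetical stand-ins
# for WordNet's synonym sets, hypernym hierarchy, and antonym links.
SYNONYMS = {("car", "automobile"), ("big", "large")}
HYPERNYM = {"sedan": "car", "car": "vehicle", "dog": "animal"}
ANTONYMS = {("hot", "cold"), ("open", "closed")}

def is_a(word, ancestor):
    """Walk the hypernym chain: does `word` denote a kind of `ancestor`?"""
    while word in HYPERNYM:
        word = HYPERNYM[word]
        if word == ancestor:
            return True
    return False

def natural_logic_relation(w1, w2):
    """Classify a word pair into a coarse natural-logic relation."""
    if w1 == w2 or (w1, w2) in SYNONYMS or (w2, w1) in SYNONYMS:
        return "Equivalence"        # car = automobile
    if is_a(w1, w2):
        return "ForwardEntailment"  # sedan entails car
    if is_a(w2, w1):
        return "ReverseEntailment"  # car is entailed by sedan
    if (w1, w2) in ANTONYMS or (w2, w1) in ANTONYMS:
        return "Exclusion"          # hot excludes cold
    return "Independence"           # unrelated or unknown

print(natural_logic_relation("car", "automobile"))  # Equivalence
print(natural_logic_relation("sedan", "vehicle"))   # ForwardEntailment
```

In practice the project worked over phrasal paraphrases from PPDB and the full WordNet ontology, where sense ambiguity makes the classification considerably harder than this word-level toy suggests.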

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1249516
Program Officer: Tatiana Korelsky
Project Start:
Project End:
Budget Start: 2012-08-15
Budget End: 2014-07-31
Support Year:
Fiscal Year: 2012
Total Cost: $99,535
Indirect Cost:
Name: Johns Hopkins University
Department:
Type:
DUNS #:
City: Baltimore
State: MD
Country: United States
Zip Code: 21218