Automatic paraphrasing is considered vital to applications as diverse as machine translation (MT), question answering, summarization, and dialogue systems. Paraphrasing has also been shown recently to hold promise for automatic methods of evaluating MT, when the paraphrases are of sufficiently high quality.

This project investigates novel methods for acquiring and generating such high quality paraphrases in order to automatically approximate the human translation error rate (HTER) metric for MT evaluation, where human annotators post-edit MT outputs into acceptable paraphrases of the reference translations. The project emphasizes the use of a linguistically informed, grammar-based parser and realizer for acquiring and generating paraphrases using disjunctive logical forms (DLFs), in sharp contrast to most recent work that relies entirely on shallow methods. Specifically, the project investigates methods of (1) engineering a broad coverage English grammar from the CCGbank, with semantic roles integrated from Propbank; (2) scaling up OpenCCG for efficient parsing and realization with this grammar, adapting supertagging and parse ranking methods for generation; (3) adapting and extending previous methods of acquiring paraphrases to work on DLFs; (4) generating high quality n-best paraphrases of one or more reference sentences; and (5) experimentally evaluating whether the automatically generated paraphrases can be used with current MT metrics to yield improved correlations with human judgments of translation quality.

By providing a way to automatically approximate the HTER metric, the project will help drive future MT research. Additionally, by dramatically extending the realization capacity of OpenCCG, the project promises to benefit a wide range of NLP tasks where the breadth of target texts is of crucial importance.

Project Report

In this project, we investigated methods of generating high-quality paraphrases using a broad coverage lexicalized grammar and automatically learned preferences for choosing among possible paraphrases. In recent years, word-level paraphrases have been investigated in a broad coverage setting, with applications in various text-to-text generation tasks such as summarization, simplification and machine translation, as well as in the automatic evaluation of machine translation systems. However, there has been little research on generating paraphrases of unrestricted text with a grammar.

We developed novel techniques for generating paraphrases by parsing a sentence into a representation of its meaning, then generating sentences that express the same meaning according to the grammar. In our case, the grammar was extracted from an enhanced version of the CCGbank, a corpus of sentence derivations based on the Penn Treebank following the principles of Combinatory Categorial Grammar (CCG). With a broad coverage grammar, many expressions can often be generated this way, and not all of them adequately express the original sentence's meaning in a fully grammatical way.

To select preferred outputs, we developed a realization ranker that improved upon existing models on all three interrelated sub-tasks traditionally considered part of the surface realization task in natural language generation research: inflecting lemmas with grammatical word forms, inserting function words, and linearizing the words in a grammatical and natural order. The model takes as its starting point two state-of-the-art probabilistic models of syntax developed for CCG parsing. Using averaged perceptron models, a form of discriminative machine learning, we trained a model that combines these existing syntactic models with several n-gram language models, which are simple probabilistic sequence models that do not require a treebank.
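As a concrete illustration, the averaged-perceptron update for ranking n-best realizations can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's OpenCCG code: the function names and the feature names (e.g. `syn_logprob`, `lm_logprob`) are hypothetical stand-ins for scores from the syntactic models and n-gram language models described above.

```python
# Hedged sketch: an averaged perceptron that learns to combine component
# model scores (syntactic models, n-gram LMs) as features when ranking
# n-best realization candidates. All names and data here are illustrative.

def score(weights, feats):
    """Dot product of a weight vector and a sparse feature dict."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def train_averaged_perceptron(nbest_lists, epochs=10):
    """nbest_lists: list of (candidates, gold_index) pairs, where each
    candidate is a feature dict such as {'syn_logprob': -12.3,
    'lm_logprob': -40.1}. Returns averaged feature weights."""
    weights, totals, t = {}, {}, 1
    for _ in range(epochs):
        for candidates, gold in nbest_lists:
            # predict the currently best-scoring realization
            pred = max(range(len(candidates)),
                       key=lambda i: score(weights, candidates[i]))
            if pred != gold:
                # promote the gold candidate's features, demote the prediction's
                for f, v in candidates[gold].items():
                    weights[f] = weights.get(f, 0.0) + v
                    totals[f] = totals.get(f, 0.0) + t * v
                for f, v in candidates[pred].items():
                    weights[f] = weights.get(f, 0.0) - v
                    totals[f] = totals.get(f, 0.0) - t * v
            t += 1
    # averaging over all time steps reduces overfitting to late updates
    return {f: w - totals.get(f, 0.0) / t for f, w in weights.items()}
```

Averaging the weights over all updates, rather than keeping only the final values, is what distinguishes the averaged perceptron from the basic perceptron and typically makes the learned ranker more stable.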
The resulting model improved upon the state of the art in terms of automatic evaluation scores on held-out test data, but our error analysis nevertheless revealed a surprising number of word order, function word and inflection errors, spurring us to tackle each issue in turn. To reduce the number of subject-verb agreement errors, we extended the model with features enabling it to make correct verb form choices in sentences involving complex coordinate constructions and with expressions such as 'a lot of', where the correct choice is not determined solely by the head noun. We also improved animacy agreement with relativizers, reducing the number of errors where 'that' or 'which' was chosen to modify an animate noun rather than 'who' or 'whom' (and vice versa), while still allowing both choices where corpus evidence was mixed.

With function words, we showed that we could improve the model's predictions of when to employ 'that'-complementizers using features inspired by Florian Jaeger's work on the principle of uniform information density, which holds that human language use tends to keep information density relatively constant in order to optimize communicative efficiency. In news text, complementizers are left out two times out of three, but in some cases the presence of 'that' is crucial to the interpretation. Generally, inserting a complementizer makes the onset of a complement clause more predictable, and thus less information dense, thereby avoiding a potential spike in information density of the kind associated with comprehension difficulty. See Figure 1 for an example.
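The intuition behind such uniform-information-density features can be sketched with a toy model: estimate the surprisal (negative log-probability) of the complement clause's first word given its context, and favor inserting 'that' when the bare onset would produce a spike. The bigram probabilities, threshold, and function names below are illustrative assumptions, not the project's actual features or data.

```python
import math

# Hedged sketch of the uniform-information-density intuition for
# 'that'-complementizer choice: if the onset of the complement clause is
# highly surprising given the preceding context, inserting 'that' smooths
# the spike. The probabilities form a toy bigram model for illustration.

bigram_prob = {  # toy P(next | prev) values, not real corpus estimates
    ('believe', 'the'): 0.05,
    ('believe', 'that'): 0.30,
    ('that', 'the'): 0.40,
}

def surprisal(prev, word, model, floor=1e-6):
    """Surprisal in bits of `word` following `prev`, with a floor for unseen pairs."""
    return -math.log2(model.get((prev, word), floor))

def prefer_complementizer(prev, onset, model, threshold=3.5):
    """Favor inserting 'that' when the bare clause onset's surprisal spikes
    above the threshold and the complementizer route spreads the
    information more evenly (lower peak surprisal)."""
    bare = surprisal(prev, onset, model)
    with_that = max(surprisal(prev, 'that', model),
                    surprisal('that', onset, model))
    return bare > threshold and with_that < bare
```

Here 'I believe the deal collapsed' yields a surprising onset after 'believe', so the sketch prefers 'I believe that the deal collapsed', matching the avoid-the-spike reasoning described above.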
Finally, to improve word ordering decisions, we demonstrated that incorporating a feature inspired by Edward Gibson's dependency locality theory can deliver statistically significant improvements in automatic evaluation scores, better match the distributional characteristics of sentence orderings, and significantly reduce the number of serious ordering errors, as confirmed by a targeted human evaluation. Gibson's theory holds that the preference to minimize the distance between headwords and their dependents strongly influences word ordering choices, as supported by comprehension and corpus studies. See Figure 2 for an illustration.

State-of-the-art models in natural language processing (NLP) typically incorporate hundreds of thousands of low-level features that are loosely correlated with linguistically explanatory features but usually have little or no linguistic motivation themselves, calling into question the relevance of linguistic theory for NLP. For this reason, we believe our results using uniform information density and dependency locality theory are exciting, as they demonstrate the relevance of human sentence processing research to developing state-of-the-art models.

The methods developed during the project for generating high-quality grammatical paraphrases can potentially benefit the tuning and evaluation of statistical machine translation (SMT) systems. In as-yet unpublished research, we found that our automatically generated paraphrases can improve the correlations of existing automatic MT metrics with human judgments. We are continuing experiments in this direction under defense department funding, and this follow-on project has already shown for the first time that automatic paraphrases can benefit the parameter tuning stage of training SMT systems even when four human-authored reference sentences are available.
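Returning to the dependency locality feature mentioned above: its core quantity, the summed surface distance between each word and its head, can be sketched as follows. The word ids and dependency pairs are hypothetical; the project's actual feature was computed over CCG derivations rather than plain index lists.

```python
# Hedged sketch of a dependency-locality feature: the summed distance
# between each word and its head in a candidate surface order. Orderings
# with smaller totals are preferred, per Gibson's theory.

def total_dependency_length(order, deps):
    """order: list of word ids in surface order.
    deps: iterable of (head_id, dependent_id) pairs.
    Returns the sum of head-dependent distances in this order."""
    pos = {w: i for i, w in enumerate(order)}
    return sum(abs(pos[h] - pos[d]) for h, d in deps)
```

Given two candidate orderings of the same words, a ranker with this feature can then prefer the one with the smaller total, all else being equal.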

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0812297
Program Officer: Tatiana D. Korelsky
Project Start:
Project End:
Budget Start: 2008-09-01
Budget End: 2012-08-31
Support Year:
Fiscal Year: 2008
Total Cost: $396,621
Indirect Cost:
Name: Ohio State University
Department:
Type:
DUNS #:
City: Columbus
State: OH
Country: United States
Zip Code: 43210