This project investigates trainable methods of paraphrasing natural language sentences to effectively disambiguate their meaning, using precise, bidirectional grammars induced from corpora to "close the loop" between parsing and generation. The approach generalizes previous work on probabilistically avoiding ambiguity in natural language generation to a broad-coverage setting, disambiguating only as necessary in order to better balance clarity and readability. Generating disambiguating paraphrases in a broad-coverage setting makes it possible to explore ways of adapting parsers to new domains using crowd-sourced judgments of meaning similarity. Accordingly, the project explores methods of (1) inducing OpenCCG grammars from the dependency output of parsers such as the C&C parser, (2) generating paraphrases with OpenCCG that explicitly aim to avoid likely distractor interpretations, (3) collecting meaning similarity judgments between the original sentence and paraphrases of its most likely interpretations, and (4) retraining the parser using the collected judgments. To evaluate the approach while also conducting outreach, the project involves data collection and experimentation at Ohio State's language research pod at the COSI science museum, as well as the use of Amazon's Mechanical Turk.
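The four-step loop above can be sketched in miniature. This is a toy illustration only: the function names, interpretation labels, and scores are invented stand-ins, and neither OpenCCG nor the C&C parser is actually invoked; real parser retraining would update model parameters rather than renormalize scores.

```python
# Hypothetical sketch of the four-step "closed loop" between parsing and
# generation. All names, interpretations, and numbers here are illustrative
# assumptions, not the project's actual tools or data.

def parse_interpretations(sentence):
    """Step 1 (stand-in): return ranked candidate interpretations with
    parser scores, as if parsed with a grammar induced from C&C output."""
    return [("see-with-binoculars(I, man)", 0.6),
            ("man-has-binoculars", 0.4)]

def generate_paraphrase(interpretation):
    """Step 2 (stand-in): realize a paraphrase expressing only the given
    interpretation, avoiding the distractor reading."""
    canned = {"see-with-binoculars(I, man)": "Using binoculars, I saw the man.",
              "man-has-binoculars": "I saw the man, who had binoculars."}
    return canned[interpretation]

def collect_similarity_judgment(original, paraphrase):
    """Step 3 (stand-in): a crowd worker's meaning-similarity rating
    between the original sentence and the paraphrase (fixed toy values)."""
    return 0.9 if "Using binoculars" in paraphrase else 0.3

def retrain_weights(scored):
    """Step 4 (stand-in): renormalize interpretation scores by the collected
    judgments, a crude proxy for retraining the parser."""
    total = sum(score * judgment for _, score, judgment in scored)
    return {interp: (score * judgment) / total
            for interp, score, judgment in scored}

def close_the_loop(sentence):
    """Run one iteration of parse -> paraphrase -> judge -> reweight."""
    scored = []
    for interp, score in parse_interpretations(sentence):
        paraphrase = generate_paraphrase(interp)
        judgment = collect_similarity_judgment(sentence, paraphrase)
        scored.append((interp, score, judgment))
    return retrain_weights(scored)

weights = close_the_loop("I saw the man with binoculars.")
```

In this toy run, the interpretation whose paraphrase crowd workers judge closer in meaning to the original ends up with a higher reweighted score, which is the intuition behind using such judgments as a retraining signal.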
By closing the loop between interpretation and generation, the project promises to dramatically enhance the prospects for using crowd-sourcing to adapt natural language processing tools to new domains. The project will also enable international collaborations with the University of Sydney, and help to educate the public about language science and technology, providing an inspirational example of science in action to the children who attend COSI.