The collaborating Principal Investigators from the University of Pennsylvania, University of Colorado, University of Washington, and Columbia University are performing preliminary investigations into the feasibility of creating a treebank with multiple representations, including both dependency structure and phrase structure.
Issues of interest include:
* To what extent can the conversion between dependency and phrase structure be automated, and to what extent is hand correction necessary after automatic conversion?
* How can we leverage experience from the creation of other treebanks?
To address these issues, the collaborating team is convening a working meeting of international scholars who have actively worked on treebanks representing many different formalisms and languages. Building on previous results in converting phrase structure treebanks to dependency structures, tree adjoining grammar (TAG) structures, and combinatory categorial grammar (CCG) structures, the team is performing a set of experiments demonstrating the feasibility of generating phrase structure from dependency structure. This conversion requires well-defined phrase structure and dependency structure guidelines for the language in question, so the experiments are currently limited to English. Other languages with tested guidelines for one of the two formalisms include Chinese, Korean, and Arabic (phrase structure) and Czech, German, Dutch, and Russian (dependency structure). The team is also extending its conversion experiments to one of these languages, creating the necessary additional guidelines in the process. The team is sharing a qualitative evaluation of its results with colleagues at the workshop and discussing the suitability of further experiments and alternative languages, as input to a resubmission of the CRI proposal.