This Small Business Innovation Research (SBIR) Phase I project implements and tests the efficacy of a software approach that uses client-specific data to customize the fully automatic translations produced by state-of-the-art Machine Translation (MT) systems. Based on this approach, the company leverages databases of previously translated material to produce client-customized, high-quality, fully automatic translations for commercial language service providers (LSPs) and their enterprise clients. These services are provided via a "software-as-a-service" (SaaS) model. The proposed approach provides a dramatically less expensive way to create client-specific customized MT engines. While it is broadly recognized that customization to client- and domain-specific data can greatly boost the translation quality of MT systems, the common approach of customizing the MT engine directly is costly, and is consequently practical only for major commercial enterprises with very large translation volumes. Instead, the company uses the client-specific data maintained by LSPs for their enterprise clients to augment and modify the translations produced by a state-of-the-art generic MT system. The same client-customized MT systems can also be incorporated into the human-based workflow that LSPs use to produce human-quality translations for clients, reducing translator effort and, with it, the overall cost and duration of translation projects.
The proposed SaaS-based solutions have the potential to fundamentally change the commercial translation landscape by removing barriers to widespread adoption of MT technology by the broad LSP industry and its even broader client base. The industry is dominated by a large number of small and medium-sized LSPs that possess large volumes of client-specific translation data but lack the resources and know-how to develop MT-based solutions that leverage these data. By partnering with such LSPs, the company can quickly gain access to a large number of commercial enterprise clients through the LSPs' existing business relationships.
1. Introduction

Safaba Translation Solutions is a technology start-up company that specializes in developing customized high-performance automated language translation systems for commercial enterprises. This Small Business Innovation Research Phase-I project funded the development and prototype-testing of Safaba’s core software solution for customizing the output of fully-automatic translations produced by state-of-the-art Machine Translation (MT) systems using client-specific data. The performance period of this SBIR Phase-I project was July 1, 2010 to December 31, 2010. All major software development tasks and benchmark testing were successfully completed on schedule. Significant pilot deployments of the resulting software products are currently in progress.

2. Goals

The primary goal of the project was to implement an automated software framework for building machine translation engines that are customized for individual enterprise clients. The main software product outcome is not a translation system – it is an automated multi-component software workflow for building data-driven client-customized MT systems and for creating the data resources and statistical models that these MT systems require. Secondary goals included the implementation of the necessary software components for processing and incorporating client-specific data into the statistical training process for building customized MT engines, as well as the software required for integrating the resulting customized MT systems as a hosted software service into standard translation systems and workflows used in the commercial translation industry.

3. Results

Technical accomplishments of this project include the following:

Framework for Building Automated Post-Editing (APE) Translation Engines: We implemented a complete and customized software workflow for building Automated Post-Editing Translation engines, based on proprietary adaptations and extensions to the components and standard software workflow of the open-source Moses toolkit for building phrase-based statistical MT systems.

Software Components for Processing Client-Specific Data: We implemented the software tools required to extract, transform and manipulate translation data from enterprise clients so that this data can be incorporated into the training process for building the client-customized APE engine.

System Integration Tasks: We implemented the software architecture for integrating a complete runtime translation workflow that generates client-customized translations by serially chaining together a baseline MT engine with a client-specific instance of our APE engine. Our integrated architecture also merges translations retrieved from client-specific databases (Translation Memories) and our customized MT system, and includes software converters for generating output in industry-standard formats for easy integration into commercial translation processing workflows.

Benchmark Testing and Evaluation: We put together a real-world benchmark testing and evaluation setup using English-Spanish data from a Fortune-100 industrial enterprise that was obtained via our commercial translation partner. This benchmark setup was used to test, debug, isolate problems and improve the various components in our developed framework, and conduct experimental research on software design and implementation variations. Evaluation results using standard MT measures (BLEU and METEOR) indicate that our customized MT system achieves impressive translation quality gains over the underlying baseline MT system.
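The runtime workflow described under System Integration Tasks can be illustrated with a minimal Python sketch. It is not Safaba's implementation. The function names, the toy lookup tables, and the rule that an exact Translation Memory match takes priority over MT output are all assumptions made for illustration, since the report does not specify how the two sources are merged. A real APE engine would be a statistical phrase-based model, not a term-replacement table.

```python
from typing import Callable, Dict

# A translator is any function that maps a text segment to a text segment.
Translator = Callable[[str], str]


def make_pipeline(translation_memory: Dict[str, str],
                  baseline_mt: Translator,
                  ape: Translator) -> Translator:
    """Build a runtime workflow: TM lookup, else baseline MT followed by APE.

    Assumption: an exact TM match is preferred over machine output.
    """
    def translate(segment: str) -> str:
        key = segment.strip()
        if key in translation_memory:
            # Reuse the stored client translation verbatim.
            return translation_memory[key]
        draft = baseline_mt(key)   # generic, non-customized translation
        return ape(draft)          # client-specific post-editing step
    return translate


# --- Hypothetical toy stand-ins for the real engines ---
GENERIC_MT = {"Restart the device.": "Reinicie el dispositivo."}
CLIENT_TERMS = {"dispositivo": "equipo"}  # client-preferred terminology
CLIENT_TM = {"Turn off the device.": "Apague el equipo."}


def toy_baseline_mt(segment: str) -> str:
    # Unknown segments pass through unchanged in this toy version.
    return GENERIC_MT.get(segment, segment)


def toy_ape(draft: str) -> str:
    # Rewrite generic wording into the client's preferred wording.
    for generic, preferred in CLIENT_TERMS.items():
        draft = draft.replace(generic, preferred)
    return draft


translate = make_pipeline(CLIENT_TM, toy_baseline_mt, toy_ape)
print(translate("Turn off the device."))  # served from the TM
print(translate("Restart the device."))   # baseline MT, then APE
```

Because the APE step consumes only the baseline engine's output, the baseline system can stay generic and be treated as a black box, while all client-specific behavior lives in the TM and the APE model.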
Pilot and Commercial Deployments: Several pilot and commercial deployments of the software products developed under this project are currently in progress. Together with our commercial translation partner, we are in the final stages of a feasibility study to measure translator productivity gains when using our customized English-to-Spanish MT system. Also in progress is a separate revenue-generating project to deploy a customized English-to-Japanese MT system using our technology for a Fortune-100 IT enterprise. This project is in partnership with an established commercial MT technology provider. Several other commercial deployments of MT engines using our technology are currently in advanced stages of negotiation.

4. Conclusions

Safaba’s Phase-I SBIR project has been a major success. The software developed in the course of this project provides a fully functional automated framework for training and deploying customized MT engines for commercial clients. This gives Safaba a complete, fully functional minimum viable product (MVP), which we are already using to build and deploy customized MT systems for initial commercial clients. This technology has allowed us to form a partnership with an established commercial MT technology provider and, through this partnership, to gain access to major enterprise clients.