Robust language processing capabilities are vital across academic disciplines, education, and industry, and they are important to national security, infrastructure development, and the competitiveness of American business. However, no robust, interoperable software infrastructure currently exists to support natural language processing (NLP) research and development. To fill this gap, this project establishes a large international collaboration among key players in the field. Together they will develop an open, web-based infrastructure with two roles. Through it, researchers, developers, and students can easily access massive, distributed language resources, in whole or in part. Within it, they can efficiently compose, disseminate, and consume tailored language services.
The goal of this project is to build a comprehensive network of web services and resources within the NLP community. This requires four specific activities: (1) design, develop, and promote a service-oriented architecture for NLP development that defines atomic and composite web services and supports service discovery, testing, and reuse; (2) construct a Language Application Grid (LAPPS Grid) based on the Service Grid Software developed in Japan; (3) provide an open advancement (OA) framework for component- and application-based evaluation that enables rapid identification of frequent error categories within modules, thereby helping to direct resources more effectively; and (4) actively promote adoption and use of, and community involvement in, the LAPPS Grid.
By providing access to cloud-based services and support for locally run services, the LAPPS Grid will lead to the development of a massive global network of language data and processing capabilities. Scientists and engineers from diverse disciplines will be able to use this network, because its components require no expertise in language processing. Research in sociology, psychology, economics, education, linguistics, digital media, and the humanities will benefit from the ability to easily manipulate and process diverse language data sources in multiple languages.