This Small Business Innovation Research Phase I project from Naftware, Inc. addresses the need for better information search methods. Recent advances in computational linguistics, particularly event semantics, present opportunities for major productivity gains for American knowledge workers by enabling the development of ScenarioNet, a system that automatically learns to extract deep event information from text. A major deficiency of current information extraction (IE) systems is that training examples must be painstakingly annotated by human experts. In ScenarioNet, examples are automatically extracted, ranked, and kept or discarded in rapid user-feedback cycles, dramatically streamlining the training of the system for new or revised domains. To enable a 'deeper' level of event information extraction, ScenarioNet incorporates a statistical full parser, event models, an event builder (with sub-event merging), relationship models (which extract multiple relationships from a single phrase or sentence), scenario models, and a scenario builder (with event merging algorithms). For deep IE, events themselves must be placed in the context of related events: ScenarioNet's hierarchical event and scenario models include representations of causal, temporal, and structural relationships. Event merging algorithms use coreference resolution techniques, including discourse-level and cross-document coreference, to recognize multiple elements of events and how events combine into scenarios. ScenarioNet thus eases cross-domain portability and enables deep event, relationship, and scenario information extraction.
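The event merging step described above can be illustrated with a minimal sketch. It is not ScenarioNet's actual algorithm, which the abstract does not specify. It assumes that coreference resolution has already mapped each mention to an entity identifier, and it merges two partial events when they share a type, agree on every role they both fill, and have at least one entity in common. The names `Event`, `compatible`, and `merge_events` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    """A partial event: a type plus role -> entity-id fillers.

    Entity ids are assumed to come from an upstream coreference
    resolver (hypothetical; not part of this sketch).
    """
    event_type: str
    roles: Dict[str, str] = field(default_factory=dict)


def compatible(a: Event, b: Event) -> bool:
    """Return True if two partial events may describe the same event."""
    if a.event_type != b.event_type:
        return False
    # Roles filled by both events must point to the same entity.
    shared_roles = a.roles.keys() & b.roles.keys()
    if any(a.roles[r] != b.roles[r] for r in shared_roles):
        return False
    # Require at least one coreferent entity as evidence for merging.
    return bool(set(a.roles.values()) & set(b.roles.values()))


def merge_events(events: List[Event]) -> List[Event]:
    """Greedily fold each event into the first compatible merged event."""
    merged: List[Event] = []
    for ev in events:
        for target in merged:
            if compatible(target, ev):
                target.roles.update(ev.roles)
                break
        else:
            # No compatible event found: start a new one (copy the roles
            # so the caller's input is not mutated by later merges).
            merged.append(Event(ev.event_type, dict(ev.roles)))
    return merged
```

A greedy single pass keeps the sketch short; a real system would more likely score candidate merges statistically and would also handle the causal, temporal, and structural relationships between events that the abstract mentions.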
A robust, easily customizable event, relationship, and scenario information extraction system has strong commercial potential in industries such as defense, intelligence, insurance (review of applications and claims), healthcare, financial services, legal services, business intelligence gathering, all levels of government (review of applications and reports), and engineering (to keep up with new developments). Naftware's proposed technology, ScenarioNet, removes the barriers to wide commercialization of IE by reducing the customization effort (cross-domain portability) and by incorporating deeper semantics into its object-oriented extraction models and templates.