Current AI systems still lack the knowledge and reasoning abilities needed to handle the semantic subtleties of language and the thematic breadth of human discourse and thinking. This project is developing a basic repertoire of lexical and other general knowledge for use in a powerful inference engine (EPILOG) designed expressly to support unrestricted language understanding and reasoning.
The methods being employed exploit the insights into language-based inference gained in recent years in the area of "natural logic", which makes systematic use of word-level and structural entailment properties of language. These properties are easily modeled in EPILOG, which uses a language-like meaning representation (Episodic Logic). Some very general semantic properties are being manually encoded; in addition, large numbers of knowledge items are being extracted computationally from lexical resources such as WordNet and VerbNet, and from word-similarity and paraphrase clusters derived from large text corpora.
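To make the WordNet extraction step concrete, here is a minimal sketch in Python (an assumed stand-in, not the project's actual extraction code) of how hypernym links can be read off as word-level entailment facts. It presupposes NLTK with the WordNet corpus installed (nltk.download('wordnet')), and the depth bound is an illustrative heuristic, not the project's actual error-avoidance method.

```python
# Minimal sketch of mining word-level entailment relations from
# WordNet hypernym links; assumes NLTK and its WordNet corpus.
from nltk.corpus import wordnet as wn

def hypernym_entailments(word, max_steps=2):
    """Collect (word, more-general-word) pairs, e.g. ('building',
    'structure'), by walking at most max_steps hypernym links.
    Bounding the walk is one crude, illustrative way to avoid
    over-general conclusions like 'a font is a communication'."""
    facts = set()
    for synset in wn.synsets(word, pos=wn.NOUN):
        frontier = [(synset, 0)]
        while frontier:
            current, depth = frontier.pop()
            if depth == max_steps:
                continue
            for hyper in current.hypernyms():
                for lemma in hyper.lemma_names():
                    facts.add((word, lemma.replace('_', ' ')))
                frontier.append((hyper, depth + 1))
    return facts

if __name__ == '__main__':
    for noun, general in sorted(hypernym_entailments('building')):
        # e.g. "Every building is a structure."
        print(f'Every {noun} is a {general}.')
```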
The expected result is a knowledge base of fundamental lexical and other commonsense knowledge that will allow demonstration of many previously infeasible language-based inferences, including both forward and backward reasoning and many multi-premise entailment inferences in existing test suites. This will significantly advance the state of the art in basic language understanding and in mechanizing "obvious inferences", with potential applications to intelligent dialogue-based agents (for question answering, tutoring, personal assistance, etc.), and to knowledge bootstrapping through machine reading. The results will be disseminated through conference and journal papers, and through websites that make EPILOG and the newly developed knowledge bases available.
The goal of Artificial Intelligence is to provide insight into thinking and intelligence, and to build useful and interesting artifacts that display these attributes, at least to some degree or in specialized areas. The grand goal that motivates much of the field is to endow AI systems with human-like conversational abilities and common sense, along with specialized expertise. At present, however, AI systems still lack the vast amounts of knowledge and the subtle reasoning abilities needed to handle the semantic complexities of language and the thematic breadth of human discourse and ordinary thinking. This project has taken steps toward that grand goal on both the reasoning front and the knowledge-accumulation front.

The group working on the project has continued the development of an inference engine called EPILOG 2, to handle various types of inferences that humans find utterly obvious. For example, given that "[the bank] refused to reveal the fact that the building was to be condemned", it follows that the building in question was to be condemned, and that the bank did not reveal this fact (presumably, when it should have done so). It also follows, for example, that an architectural structure was to be condemned, since, after all, buildings are architectural structures. Recent work in "natural logic" has focused on this type of inference, which rests in part on the semantics of so-called implicative words like refuse, reveal, and fact, and in part on entailment relationships among words, such as that every building is an architectural structure (a small sketch of this signature-based inference pattern appears below). After the addition of some new functionalities, EPILOG 2 easily and naturally handled these types of inferences, in part because the knowledge representation used by EPILOG is itself rather language-like.

Part of the work of demonstrating natural-logic reasoning in EPILOG consisted of assembling and formalizing a sizable collection of implicative words (by scouring existing collections and augmenting them with the help of lexicons and thesauri), and a much larger collection of entailment relations between words (more than 100,000). The latter were obtained by "mining" a very large online lexicon, WordNet, and one accomplishment of the project was to develop ways of avoiding many of the errors that such a mining process can lead to. For example, in following a sequence of ever more general word senses in WordNet, starting at font (i.e., a print style), we arrive at the term communication; but we do not wish to conclude that a font is a communication, when in fact it is just a print style that may be used in a communication.

Natural-logic inferences still fall far short of the kinds of inferences people make with ease, such as that dining entails eating a substantial meal, probably relatively late in the day (and probably only once per day), and that the diner probably starts out at least somewhat hungry and ends up more or less sated. Thus we need knowledge about the preconditions, process, and consequences entailed by verbs like dine, and by the many thousands of other verbs in any adequate lexicon. We also need general knowledge such as the typical times of day and frequency of meals, the enjoyment derived from eating, the cooking and preparations that are necessary, and so on. The project made progress on gathering and formalizing many pieces of knowledge of these types.
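The implicative-word inference mentioned above can be sketched as follows. The signature table and helper below are hypothetical simplifications (EPILOG's actual treatment is axiomatic within Episodic Logic, not a lookup table), following the standard natural-logic idea that each implicative or factive word maps the polarity of its embedding context to an entailed polarity for its complement.

```python
# Hypothetical implicative/factive "signatures" in the natural-logic
# style. Each signature maps the polarity of the embedding context
# ('+' or '-') to what is entailed about the complement: True (it
# holds), False (its negation holds), or None (no entailment).
SIGNATURES = {
    'refuse': {'+': False, '-': None},   # "X refused to V" => X did not V
    'manage': {'+': True,  '-': False},  # "X managed to V" => X V'd
    'reveal': {'+': True,  '-': True},   # factive: complement holds either way
    'fact':   {'+': True,  '-': True},   # "the fact that S" presupposes S
}

def complement_status(chain):
    """Given embedding words from outermost to innermost, return '+'
    if the innermost complement is entailed, '-' if its negation is
    entailed, or None if no entailment survives."""
    polarity = '+'
    for word in chain:
        entailed = SIGNATURES[word][polarity]
        if entailed is None:
            return None
        polarity = '+' if entailed else '-'
    return polarity

# "The bank refused to reveal the fact that the building was to be
# condemned":
print(complement_status(['refuse', 'reveal', 'fact']))  # '+': the building was to be condemned
print(complement_status(['refuse']))                    # '-': the bank did not reveal it
```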
One method that was employed was to formalize the essential commonalities and differences of verbs in families of related verbs (for example, run, walk, amble, saunter, march, etc., all refer to locomotion on foot, but differ in the manner and speed they imply). Another method was to mechanically extract meaningful fragments from large text collections (news media, Wikipedia, etc.) and shape these into general knowledge. For example, from a weblog containing the snippet "Before enjoying a delicious home-cooked meal, ...", the extraction code derives "Many meals are delicious" and "Many meals are cooked" (represented in the formalism used by EPILOG); a simplified sketch of this kind of pattern-driven extraction appears at the end of this summary. Many millions of such simple general facts have been gathered, and they will in the future be filtered (probably using crowdsourcing) to remove the 15% or so that are faulty in one way or another.

The work on the project has thus enhanced both the knowledge infrastructure and the reasoning capabilities thought to be necessary for human-like understanding and common sense. Along the way, a powerful new tool called TTT, for matching and transforming linguistic and logical patterns, was developed and shown to be useful for many aspects of linguistic processing and inference. This brings the AI enterprise a few steps closer to the goal of developing agents that can provide intelligent question answering, tutoring, personal assistance, and other services. The project has also broadened the knowledge and honed the research skills of the five graduate students and six undergraduate research assistants who contributed to the work.
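As a rough illustration of the extraction method described above, the following sketch uses a single hand-written Python regular expression as a stand-in for the project's TTT pattern language (which is far more general); the pattern and the naive pluralization are assumptions for illustration only.

```python
# Illustrative pattern-driven extraction of simple general claims
# from raw text; a crude stand-in for TTT-style pattern matching.
import re

# Matches e.g. "enjoying a delicious home-cooked meal": a gerund, an
# article, one or more adjective-like words, and a head noun.
PATTERN = re.compile(r'\b\w+ing\s+(?:a|an|the)\s+((?:[\w-]+\s+)+?)(\w+)\b(?!-)')

def extract_claims(text):
    """Turn each matched modifier-noun pair into a hedged general
    claim, e.g. 'Many meals are delicious.'"""
    claims = []
    for match in PATTERN.finditer(text):
        modifiers = match.group(1).split()
        noun = match.group(2)
        for mod in modifiers:
            claims.append(f'Many {noun}s are {mod}.')  # naive pluralization
    return claims

snippet = 'Before enjoying a delicious home-cooked meal, ...'
for claim in extract_claims(snippet):
    print(claim)
# Many meals are delicious.
# Many meals are home-cooked.
```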