This project addresses a key problem in advancing the state of the art in cognitive assistant systems that can interact naturally with humans to help them perform everyday tasks more effectively. Such a system would help not only people with cognitive disabilities but all individuals as they perform complex, unfamiliar tasks. The research focuses on structured activities of daily living that lend themselves to practical experimentation, such as meal preparation and other kitchen activities.
Specifically, the core focus of the research is activity recognition, i.e., systems that can identify the goals and individual actions a person is performing as they work on a task. Key innovations of this work are 1) that the activity models are learned from the user via intuitive, natural demonstration, and 2) that the system can reason over those activity models to generalize and adapt them. In contrast, current practice requires specialized training supervised by the researchers and supports no reasoning over the models. This advance is accomplished by integrating capabilities that are typically studied separately, including activity recognition, knowledge representation and reasoning, natural language understanding, and machine learning. The work represents a significant step toward the goal of building practical and flexible in-home automated assistants.
Activity Recognition and Learning for a Cognitive Agent

The ability to automatically recognize human activity has a wide range of potential applications, from automated assistants for those with cognitive impairments to human-robot collaborative activity. This project focused on a key problem: how systems can learn the underlying models of activity automatically from natural language demonstrations. We developed a range of innovative learning techniques that have been tested in a variety of domains, including assisting in kitchen-based activities such as cooking, monitoring activity in wet labs to ensure compliance, and automatically learning new procedures in text-editing tasks. A unique focus of this work was the use of language understanding, both for rapid learning of new tasks and for improving activity recognition when language is provided.

A key finding was the development of a general model of activity recognition that can take advantage of input at varying levels of detail, from low-level vision to high-level action descriptions in language. This required developing a new temporal model of action that supports probabilistic inference (using Markov Logic Networks) and demonstrating the model's effectiveness in integrating perception and language for activity recognition. We also demonstrated how a system can rapidly learn the meaning of new words, including adjectives, grounding these terms in its perceptual system. In addition, we developed a new technique for learning procedures in a text-editing domain, where the system rapidly learns new procedures from natural language instruction plus concrete examples. Related to these efforts, we also developed a new formalism for encoding and resolving scope ambiguities in natural language, which greatly expands the range of scoping phenomena that can be effectively resolved.

Finally, in the course of this research, it became clear that rapid learning of activity models from language requires extensive commonsense knowledge in order to link the language to the observed activities being demonstrated. To meet this need, we developed new techniques for acquiring commonsense knowledge by reading natural-language definitions, specifically those found in the lexical database WordNet.

We supported 14 PhD students at various stages of their work, combining our efforts across IHMC and the University of Rochester. Four of these students have already completed their PhDs, and many more will finish in the coming two years.
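To make the style of probabilistic inference described above concrete, the following is a minimal sketch of Markov Logic Network-style activity recognition: weighted rules relate perceptual and linguistic observations at particular time steps to an activity hypothesis, and the probability of the hypothesis is computed exactly by enumerating possible worlds. The predicates, rules, and weights here are illustrative assumptions made for this summary, not the project's published model.

# Minimal MLN-style sketch: weighted ground formulas over Boolean atoms,
# with exact inference by enumerating assignments to the hidden atoms.
import itertools, math

hidden = ["MakingTea"]                      # activity hypothesis to infer
evidence = {                                # observations from perception and language
    "BoilWater@1": True,                    # vision: water boiled at step 1
    "Say_tea@2": True,                      # language: the word "tea" was mentioned at step 2
    "PourCup@3": True,                      # vision: pouring into a cup at step 3
}

# Each rule is (weight, formula); higher weight makes satisfying worlds
# exponentially more likely, as in a Markov Logic Network.
rules = [
    (2.0, lambda w: (not w["BoilWater@1"]) or w["MakingTea"]),
    (3.0, lambda w: (not w["Say_tea@2"]) or w["MakingTea"]),
    (1.5, lambda w: (not w["PourCup@3"]) or w["MakingTea"]),
    (0.5, lambda w: not w["MakingTea"]),    # mild prior against the activity
]

def world_weight(world):
    # Log-linear form: exp of the summed weights of satisfied formulas.
    return math.exp(sum(wgt for wgt, f in rules if f(world)))

z, p_true = 0.0, 0.0
for vals in itertools.product([False, True], repeat=len(hidden)):
    world = dict(evidence, **dict(zip(hidden, vals)))
    wt = world_weight(world)
    z += wt
    if world["MakingTea"]:
        p_true += wt

print("P(MakingTea | observations) = %.3f" % (p_true / z))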
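Similarly, the following sketch illustrates the general idea of reading lexical definitions for commonsense knowledge, using the WordNet interface provided by the NLTK toolkit. The example word and the simple fact format are assumptions for illustration; this is not the project's actual acquisition pipeline.

# Sketch of mining simple facts from WordNet glosses and hypernyms.
# Requires: pip install nltk, then nltk.download('wordnet').
from nltk.corpus import wordnet as wn

def facts_for(word):
    facts = []
    for syn in wn.synsets(word, pos=wn.NOUN):
        # The gloss is a natural-language definition that can be parsed for knowledge.
        facts.append((syn.name(), "definition", syn.definition()))
        # Hypernyms provide immediate "is-a" commonsense relations.
        for hyper in syn.hypernyms():
            facts.append((syn.name(), "is_a", hyper.name()))
    return facts

for fact in facts_for("knife")[:5]:
    print(fact)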