This research project enables next-generation dialogue systems to collaborate with a user, without the limitations of system-initiative interaction, in order to solve complex tasks in an optimal manner. The research develops reinforcement learning (RL) strategies to learn mixed-initiative dialogue policies. The specific aims are to (a) extend RL to mixed-initiative dialogue interaction; (b) allow the system policy to adapt to different user types, such as people with poor memory or poor problem-solving skills; and (c) simultaneously learn the policy for the simulated user.
This approach will allow more advanced dialogue systems to be deployed, such as systems that help the elderly live independently longer or that provide health care information to rural areas. The project will also produce a toolkit that lets a wide range of users easily develop dialogue policies. The toolkit will (a) allow students to be trained effectively in this area, (b) lower the barrier for other researchers to contribute to the field, and (c) help transfer this new technology to industry.
The long-term goal of this research project is to enable next-generation dialogue systems that can collaborate with a user, without the limitations of system-initiative interaction, in order to solve complex tasks in an optimal manner. Our previous work has advanced this goal by allowing reinforcement learning (RL) to use a well-defined formalism for expressing the preconditions and effects of the actions that the system will take, and by allowing the dialogue policy to be learned against a simulated user whose policy is simultaneously learned with RL. The research objective of this proposal is to further develop this methodology so that RL can learn mixed-initiative dialogue policies.

Intellectual Merit: Most applications of RL to dialogue management focus on form-filling dialogues, in which the system merely elicits the user's answers to a fixed set of questions. To extend the use of RL, we investigated a task that requires collaboration, in which the user and system each have a set of preferences and must find the solution that best fits those preferences. For RL to learn an optimal policy, we constructed the state to include both action-selection variables and bookkeeping variables. The bookkeeping variables track how the user and system reached agreement, which allows RL to properly propagate the costs of reaching an agreement back to the appropriate state-action pairs.

Turn-taking is a major component of mixed-initiative interaction. We proposed a new approach to turn-taking for spoken dialogue systems, based on evidence of how people manage turn-taking: after each utterance, the system and user bid for making the next utterance. We used RL to determine a system policy for both what the system should say and how much it should bid for the turn, so that the system's bid reflects how important it believes its proposed turn is in terms of finishing the dialogue (a simplified sketch of this joint state and action space is given below). We also proposed an extension of this model that separates out attention actions, which determine what the system will talk about next, and learns all three action types using RL.

A key part of building a dialogue system is determining what mechanisms the system needs in order to participate in a dialogue, such as taking turns, showing understanding, and repairing mistakes. We investigated how the dialogue of children with typical development (TD) differs from that of children with Developmental Language Disorder (DLD) and Autism Spectrum Disorder (ASD). Children with ASD have social and language impairments; children with DLD tend to display language deficits similar to those of children with ASD, but without the social impairments. We found that children with ASD use the filler `uh' at the same rate as children with DLD and TD, while using `um' at a much lower rate. We also found that children with TD and DLD were more likely than children with ASD to produce a pause after `um'. This work suggests that `um' may result from social reasoning (a dialogue coordination mechanism), while `uh' may be due to speaker processing and not intended for the other conversant.

Broader Impacts: This grant partially funded two Ph.D. students, both U.S. citizens, one now in his fourth year and a second about to graduate. Two supplements to this grant (Research Experiences for Undergraduates) enabled two undergraduate students to become involved in the research project, one for one summer and a second for two summers.
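As a concrete illustration of the approach described under Intellectual Merit, the following is a minimal Python sketch of tabular Q-learning over a dialogue state that separates action-selection variables from bookkeeping variables, with joint actions that pair a speech act with a turn bid. All identifiers here (SPEECH_ACTS, BIDS, the State fields, the learning constants) are illustrative assumptions, not names from the project's code.

```python
import random
from collections import defaultdict, namedtuple

# The state separates action-selection variables (which drive what to do
# next) from bookkeeping variables (which record how agreement was reached,
# so credit for agreement costs reaches the right state-action pairs).
State = namedtuple("State", ["user_last_act", "options_open",    # action selection
                             "proposals_made", "who_conceded"])  # bookkeeping

SPEECH_ACTS = ["propose", "accept", "reject", "ask_preference"]
BIDS = [0.0, 0.5, 1.0]     # strength of the system's bid for the next turn
ACTIONS = [(act, bid) for act in SPEECH_ACTS for bid in BIDS]

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
Q = defaultdict(float)     # Q[(state, action)] -> estimated value

def choose_action(state):
    """Epsilon-greedy choice over joint (speech act, turn bid) actions."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    """One tabular Q-learning backup. Because the state carries the
    bookkeeping variables, costs incurred in reaching an agreement are
    propagated back to the state-action pairs responsible for them."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# In the full setup described above, a second learner with its own Q-table
# would play the simulated user, with the two policies updated in
# alternation; only the system side is sketched here.
```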
The PI created a lecture sequence on using reinforcement learning with Information State Update (ISU) rules, along with a sequence of five homework assignments: building hand-crafted system policies with an ISU engine, augmenting the engine to simulate dialogues between a user and a system, augmenting it to learn the system dialogue policy with RL, and using the resulting RL-ISU toolkit to experiment with different RL learning parameters (a minimal sketch of an ISU rule appears below). This material is available on the web. Our work on the dialogue of children with ASD demonstrates the feasibility of using dialogue behaviors as the basis for developing an objective instrument for diagnosing children with ASD, and for determining which dialogue functions might require remediation for children with ASD. This work is also relevant to the differential diagnosis of ASD versus DLD, which has been recognized as particularly problematic.
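To make the ISU formalism concrete, here is a minimal Python sketch of an update rule as a named precondition on a shared information state paired with effects that change that state. The field names, the rule engine, and the example rule are assumptions for illustration, not the RL-ISU toolkit's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

InfoState = Dict[str, object]   # the shared information state

@dataclass
class ISURule:
    """An ISU rule: a precondition on the information state, paired with
    effects that update that state when the rule fires."""
    name: str
    precondition: Callable[[InfoState], bool]
    effects: Callable[[InfoState], None]

def applicable(rules: List[ISURule], state: InfoState) -> List[ISURule]:
    """Rules whose preconditions hold. A hand-crafted policy chooses among
    them by fixed priority; an RL policy instead learns the choice."""
    return [rule for rule in rules if rule.precondition(state)]

# Illustrative rule: answer a pending question once the answer is known.
answer_rule = ISURule(
    name="answer_pending_question",
    precondition=lambda s: bool(s.get("pending_question")) and "answer" in s,
    effects=lambda s: s.update(last_move=("answer", s["answer"]),
                               pending_question=None),
)

state: InfoState = {"pending_question": "meeting_time", "answer": "3pm"}
for rule in applicable([answer_rule], state):
    rule.effects(state)         # fire the rule, updating the information state
```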