Current spoken dialog systems are generally unpleasant to interact with. While human interlocutors can deftly negotiate and control the pace of a conversation, and smoothly signal understanding, intentions regarding dialog control, attitude, and so on, most dialog systems handle these dimensions of interaction poorly, if at all. Lacking these abilities, such systems produce dialogs that tend to be stilted, awkward, and frustrating, to demand careful attention, and to be time-inefficient. To address these problems, this research program seeks to develop and evaluate techniques that allow dialog systems to interpret and generate non-verbal and other indications of attitude and feeling, thereby improving these real-time aspects of system usability.
The PIs are recording human-human dialogs in controlled domains and analyzing the prosodic and contextual cues that humans use to manage these aspects of interaction, seeking to interpret these cues as expressions of pragmatic dimensions of the interaction. The result will be a model of real-time interpersonal interaction as manifested in spoken dialog. This model will support the development of more usable systems for voice access to information, and the findings may also support the construction of spoken dialog systems for more challenging dialog types, such as teaching, advising, and selling.
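For concreteness, the sketch below illustrates the kind of prosodic measures such an analysis typically involves (pitch level and range, energy, and voicing). It is a minimal Python example assuming the librosa audio library; the file name and the specific feature set are illustrative assumptions, not the project's actual analysis pipeline.

```python
# A minimal sketch of prosodic-cue extraction, assuming librosa is available.
# The feature set here (pitch, energy, voicing) is illustrative only.
import librosa
import numpy as np

def prosodic_features(path: str) -> dict:
    """Extract a pitch contour and energy track from one recorded dialog turn."""
    y, sr = librosa.load(path, sr=None)  # keep the recording's native sample rate
    # Fundamental frequency (F0) via the pYIN algorithm; unvoiced frames are NaN.
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )
    rms = librosa.feature.rms(y=y)[0]  # frame-level energy
    return {
        "mean_f0": float(np.nanmean(f0)),                   # overall pitch level
        "f0_range": float(np.nanmax(f0) - np.nanmin(f0)),   # pitch excursion
        "mean_energy": float(rms.mean()),
        "voiced_fraction": float(voiced_flag.mean()),       # crude voicing/pausing proxy
    }

print(prosodic_features("dialog_turn.wav"))  # hypothetical recording
```

Contours like these, aligned with the dialog context in which they occur, are the raw material from which cue-to-pragmatic-function mappings of the kind described above can be studied.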