The human ability to use language flexibly is a hallmark of robust intelligence. In interactive dialog, utterances are dynamically tailored to the common ground or specific context with specific partners. However, interaction with spoken dialog systems is highly constrained and constraining, allowing speakers very little flexibility in what they can say while the system presents pre-determined messages. To make interactive dialog technology broadly useful, this exploratory interdisciplinary project collects a corpus of dialogs exhibiting some important sources of variation, analyzes the corpus, and uses the resulting analyses to develop models and prototype implementations of dynamic dialog strategies. The ultimate goal of this effort is to support the synthesis of entirely new, flexible, and robust spoken dialog systems that are capable of adapting on-line.
The Walking-Around corpus consists of 40 human-human dialog interactions in which a remotely located person gives directions to a pedestrian walking around in an urban or campus environment. The experimental paradigm varies the friendship relationship of the dialog partners, whether the director can see what the pedestrian sees, and the familiarity of both the director and the pedestrian with the environment. No other existing direction-giving corpus models dialog interaction in an outdoor real-time environment where the physical context grounds the dialog context. The resulting corpus is used to test hypotheses about, and develop models of, the evolution of local and global dialog adaptation strategies. Key to our effort is determining which adaptations are actually functional, that is, beneficial for a particular task or context in spoken dialog systems.
The human ability to use speech and language flexibly is a hallmark of robust intelligence. Speakers have many ways to express the same message; when they interact with each other, they dynamically tailor what they say by taking into account the common ground or environment they share. However, when people speak with an automated spoken dialog system, the possibilities are highly constrained; speakers often have very little flexibility in what they can say, while the system in turn is often inflexible in its pre-determined responses. To work toward a future in which interactive spoken dialog technology is easy to use and more responsive to people from diverse backgrounds, we collected a corpus of dialogs from pairs of volunteers (college students) that captured some important sources of speech variation and behavioral strategies, and then transcribed and analyzed the corpus. We had the students complete several additional tasks so that we could test hypotheses about how people generate referring expressions. The corpus and resulting analyses are expected to be useful for developing models for flexible automated dialogs for giving navigational directions to pedestrians.
Intellectual merit. This project enabled us to study "entrainment", or how two people come to converge on the same perspectives and referring expressions as they refer to something they've discussed before. The Walking-Around corpus consists of 36 dialogs collected while a remotely located person gave directions to a pedestrian who walked around to visit and photograph 18 pre-determined locations in an outdoor campus environment. The locations were (for the most part) previously unknown to the students, although the students all had some familiarity with the campus.
We varied the amount of visual information that the direction-giver had about each location, recorded each pair member's spatial ability, familiarity with the environment, and familiarity with the partner, and analyzed the variability in different speakers' referring expressions to the same locations across multiple contexts. In addition, we measured what each partner recalled individually about the 18 locations, coded whether these solitary memories converged on the same expression, and then reunited the members of the pair for 6 rounds of a communication task in which they worked together to match pictures of the locations they had discussed earlier in the navigation task. For that matching task we coded the degree to which the partners converged on the same perspective they had constructed together earlier, while the direction-follower walked around. Results so far indicate that people do keep track of what expressions they have used in conversation with a particular partner, and that these expressions often differ when they do solitary tasks. These results are described in two papers from the 35th Annual Meeting of the Cognitive Science Society (other papers are pending):
Brennan, S. E., Schuhmann, K. S., & Batres, K. M. (2013). Entrainment on the move and in the lab: The Walking Around Corpus. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Conference of the Cognitive Science Society (pp. 1934-1939). Austin, TX: Cognitive Science Society. (See http://mindmodeling.org/cogsci2013/ and http://cognitivesciencesociety.org/conference2013/index.html)
Brennan, S. E., Schuhmann, K., & Batres, K. (2013). Collaboratively setting perspectives and referring to locations across multiple contexts. In Proceedings, Production of referring expressions: Bridging the gap between cognitive and computational approaches to reference.
Pre-conference workshop held before the 35th Annual Meeting of the Cognitive Science Society, July 31, Berlin, Germany. (See http://pre2013.uvt.nl/workshop-program.html)
Broader Impacts. The Walking Around Corpus is a resource that includes digital audio recordings, transcripts, and experimental materials. The recordings capture spontaneous interaction between direction-givers and pedestrians walking to a systematic set of locations in a natural, outdoor environment. This corpus can be used to better understand spatial dialog and locative expressions, how referring expressions change in different situations, how a system might best give directions to pedestrians, how local and global adaptation strategies develop, and how pairs of people check for and repair conversational misunderstandings. The students who volunteered to participate have agreed to share their data with others for research purposes, so the corpus is available to other researchers. The ultimate goal of this exploratory interdisciplinary project is to understand human needs and behavior well enough to create increasingly natural and robust spoken dialog systems that are capable of adapting to individual human speakers. For more information, see: www.psychology.sunysb.edu/sbrennan-/wacpublic/