This proposal outlines a challenging goal of combining the ease of voice with the visual feedback of graphics to create a new communication medium. The project will create a graphical language for visualizing communication. Different versions of the graphical language will be used to mediate remote and co-located conversation, to create learning tools for acquiring new language skills and conducting speech therapy, to create new visualization techniques combining time and phase analysis, and to produce novel methods of archiving audio, speech, and voice.
Intellectual Merit. The intellectual merit of this project is the consideration of audio, speech, and voice processing from a perspective, and with goals, different from those of traditional audio engineering. The project begins with simple yet effective techniques that have not yet been studied and progresses to the development of new analysis and visualization techniques. The graphical languages will be evaluated through applications tailored to their use. This research studies the social cues and signals that are more readily apparent in mediated interaction than in face-to-face interaction; the vocal parameters that are most effective and easiest to interpret; and the use of color, shape, and motion mapped onto these voice parameters. The research will help us understand how publicly visualizing conversation affects vocal interaction in both co-located and remote spaces, and how much value is added by combining voice data with graphical data.
Broader Impact. This research will alter vocal communication and interaction in online environments and in physical, co-located spaces. The act of archiving in real time will have new consequences for voice communication and collaboration. The research will provide novel tools for helping users learn the subtleties of vocalization; these tools will be used for learning new languages and for speech therapy. The research extends the vocalization tools to tangible toy objects to encourage vocalization in children who have social impairments. The research will also provide an alternate graphical method for archiving large bodies of audio that can be searched at a glance rather than with a search engine. The research results will be disseminated in new courses and in research publications.
This work explored using visualization to emphasize and uncover cues in spoken language. The work initially focused on two domains: (1) visualizing small group conversation around a tabletop and (2) visualization to encourage vocalization in those diagnosed with Autism Spectrum Disorders (ASD). We later explored cues in written discourse as well. We highlight the projects below.

The Conversation Clock: The Conversation Clock explored small group dynamics around a tabletop. A visualization explicitly rendered participant contribution visible to the group from a third-person perspective. The system highlighted cues such as turn-taking, silence, domination, mimicry, "yes"-behavior, laughter, interruption, and prosody. In a series of studies of this visualization with groups of 3-4 people, we found that the visualization decreased participation among participants who had previously spoken above average and increased participation among participants who had previously spoken below average. We found significant differences in how participants modified their vocal behavior with respect to the number and length of turns in conversation. We also placed this tool in additional contexts: (1) to analyze and teach turn-taking skills to teenagers and adults with Asperger's Syndrome, where we found that "seeing" conversation helped participants better understand conversational goals; and (2) to explore visual "lies" in the clock interface, where we made slight modifications to the visualization to suggest that participants had spoken more than they actually had. We found we could influence participation to a degree: participants whose visualization suggested they spoke more subsequently spoke less. When the visualization was modified above a threshold, participants felt the microphones in the system were faulty.

Conversation Votes: Conversation Votes extended the Conversation Clock by allowing human annotation, in addition to vocal contribution, through votes on moments of significance. A precursor to the Facebook "Like" button, it was used in real time during vocal conversation. We wanted to highlight participants who spoke less but had something important to say; earlier work found that those higher in a power hierarchy dominate conversation, and we wanted to democratize participation. This tool produced vocal contribution behavior changes similar to those of the Conversation Clock. While we initially anticipated that participants who spoke less would vote more, we found that those who spoke more voted overwhelmingly more. In both the Conversation Clock and Conversation Votes, the table further served as a tool for indexing vocal conversation; Conversation Votes added the further feature of intentional annotation cues.

Conversation Clusters: We created a tool that visualizes the conversation topics spoken in groups. It shows when topics split, when they merge, and who contributed each topic.

The Spoken Impact Project: We created a series of visualizations designed to encourage vocalization in children with Autism Spectrum Disorders (ASD). The first series was designed to increase both speech-like and non-speech-like vocalization; the second was designed to increase speech-like vocalization. We conducted a single-subject, multiple-baseline study over the course of a year with 5 children. Four of the five children increased vocalization in short amounts of time throughout the course of the study. We followed this study with a wizard-of-oz segment that showed rewards when there was incremental improvement.
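The project summaries above do not specify implementation details. As one illustration, the following is a minimal sketch of vocalization-contingent feedback of the kind the Spoken Impact Project provided, assuming a simple RMS-amplitude threshold over short microphone frames. The function names, frame sizes, and threshold value are illustrative assumptions, not the project's actual code.

# Minimal sketch (assumed approach, not the project's code): trigger a
# placeholder reward whenever a short audio frame's RMS energy crosses a
# threshold, as a crude stand-in for detecting vocalization.
import numpy as np

FRAME_RATE = 16_000      # samples per second (assumed)
FRAME_LEN = 1_600        # 100 ms frames (assumed)
RMS_THRESHOLD = 0.05     # would be tuned empirically in a real system

def is_vocalizing(frame: np.ndarray, threshold: float = RMS_THRESHOLD) -> bool:
    """Return True when the frame's RMS energy exceeds the threshold."""
    rms = np.sqrt(np.mean(np.square(frame.astype(np.float64))))
    return rms > threshold

def feedback_loop(frames: np.ndarray) -> None:
    """Report a (placeholder) visual/audio reward whenever vocalization is detected."""
    for i, frame in enumerate(frames):
        if is_vocalizing(frame):
            # In the real systems this would drive a graphical or audio reward;
            # here we simply report the event.
            print(f"frame {i}: vocalization detected -> show reward")

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    quiet = 0.01 * rng.standard_normal((5, FRAME_LEN))   # simulated background noise
    loud = 0.2 * rng.standard_normal((3, FRAME_LEN))     # simulated vocalization
    feedback_loop(np.vstack([quiet, loud]))

In a deployed system the reward step would update the on-screen visualization rather than print, and the detector would need smoothing to avoid flicker on short pauses.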
An unanticipated finding was that for 3 of the 5 children, even though audio, video, and audio-video feedback were all helpful, audio feedback was more powerful than visual feedback.

VCode/VData: Analyzing the Spoken Impact Project proved quite challenging. Existing annotation tools lacked the specific phonetic tools required for the coding and the metrics of success. We created an annotation tool called VCode/VData. While designed for our project, it has since been downloaded over 10,000 times around the world and is used by leading researchers at universities as well as by sports coaches. It is available on GitHub and was recently forked to explore comedy show reactions.

VocSyl: We created a series of visualizations to explore prosody and syllable articulation in children diagnosed with speech delay. We explored various visualization metaphors to determine preference. We then evaluated syllable articulation and, in a series of studies at a speech and hearing clinic, found improvement in children who had used the tool.

Visualization of Skype Conversation: We created a series of visualizations for Skype, implemented as browser plug-ins, to highlight dyadic patterns over time.

Additional Projects: We created visualizations of Amazon reviews that condense a product's reviews into a single representative image highlighting positive, neutral, and negative sentiment. In a pilot study, subjects were more likely to buy a product whose visualization contained more positive terms than when using the traditional Amazon interface alone. We created models of tie strength in face-to-face relationships from Facebook conversation traces. We uncovered cues from text conversation and found that tie strength on Facebook and Twitter can be explained with essentially the same model; a minimal sketch of such a model appears below. We created a visualization to encourage vocal participation in large classrooms. We found that participation increased with the interface, due to a lowering of evaluation apprehension; the interface was highly dependent on the flexibility of the course instructor.
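The tie strength work above is described only at a high level. The sketch below illustrates the general shape of such a model, assuming tie strength is predicted as a linear combination of simple interaction features drawn from conversation traces; the feature names, values, and fitted weights are placeholders for illustration, not the published model or data.

# Minimal sketch (assumptions, not the published model): predict a tie-strength
# score from conversation-trace features via ordinary least squares.
import numpy as np

# Hypothetical features per friend: [messages exchanged, days since last message,
# mutual friends, intimacy words used]. Values are made up for illustration.
X = np.array([
    [120.0,  2.0, 35.0, 14.0],
    [  3.0, 90.0,  4.0,  0.0],
    [ 45.0, 10.0, 20.0,  5.0],
    [  8.0, 30.0,  9.0,  1.0],
])
# Hypothetical ground-truth tie-strength ratings (e.g., from a survey), in [0, 1].
y = np.array([0.9, 0.1, 0.6, 0.25])

# Fit a linear model with an intercept term.
X_design = np.hstack([np.ones((X.shape[0], 1)), X])
weights, *_ = np.linalg.lstsq(X_design, y, rcond=None)

def predict_tie_strength(features: np.ndarray) -> float:
    """Linear prediction: intercept plus a weighted sum of trace features."""
    return float(weights[0] + features @ weights[1:])

print(predict_tie_strength(np.array([60.0, 5.0, 25.0, 8.0])))

The same feature-to-score structure can be fit on traces from different platforms, which is one way a single model form could account for tie strength on both Facebook and Twitter.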