This research investigates automatic discourse segmentation of multimodal communication data encompassing gesture, speech, and gaze. The goal is to discern discourse structure through video and voice analysis of data that can reasonably be expected from ordinary video and its audio track. The research addresses the interpretation of gesture, speech, and gaze in discourse management, drawing on psycholinguistic models to explain how these modalities combine to express discourse structure; specifically, it develops algorithms for recognizing 'catchments', empirically grounded thematic segments identified by the partial recurrence of prosodic, gaze, and gesture features during natural discourse. The work will be integrated into a hierarchical model that is both amenable to computational implementation and reflective of human communicative realities. The approach involves experiments designed to discover and quantify cues in the various modalities and their relation to discourse management; the development of computational algorithms to detect and recognize these cues; and the integration of the cues into a cogent discourse management system. The team, comprising psycholinguistics, machine vision, and signal processing researchers, gains strength from its interdisciplinary scope. The technology developed will have significant impact on natural language understanding, human-computer interaction, and discourse and video databases.
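The idea of recognizing catchments from the partial recurrence of features can be illustrated with a minimal sketch. This is not the project's algorithm; it assumes hypothetical discrete per-frame features (e.g. gesture handedness, gaze target, a prosodic marker) and places a segment boundary wherever the overlap between consecutive frames' feature sets drops below a threshold.

```python
# Minimal sketch (hypothetical feature names and threshold): grouping frames
# into catchment-like spans by the partial recurrence of gesture, gaze, and
# prosodic features across time.

def jaccard(a, b):
    """Overlap between two feature sets (0.0 = disjoint, 1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def segment_catchments(frames, threshold=0.5):
    """Group per-frame feature sets into contiguous segments.

    frames: list of sets of discrete features observed in each time frame,
            e.g. {'two_handed', 'gaze_listener', 'pitch_rise'}.
    Returns a list of (start, end) index pairs, with end exclusive.
    """
    if not frames:
        return []
    segments = []
    start = 0
    for i in range(1, len(frames)):
        # A sharp drop in feature recurrence marks a candidate
        # discourse-segment boundary.
        if jaccard(frames[i - 1], frames[i]) < threshold:
            segments.append((start, i))
            start = i
    segments.append((start, len(frames)))
    return segments

frames = [
    {'two_handed', 'gaze_listener', 'pitch_rise'},
    {'two_handed', 'gaze_listener'},
    {'right_hand', 'gaze_away', 'pause'},  # feature set shifts: new catchment
    {'right_hand', 'gaze_away'},
]
print(segment_catchments(frames))  # [(0, 2), (2, 4)]
```

A real system would operate on continuous, noisily detected features from vision and speech processing rather than clean symbolic sets, but the same principle applies: thematic segments cohere through recurring feature configurations, and boundaries appear where that recurrence breaks down.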