Large-enrollment courses are a practical necessity for introductory instruction in science, technology, engineering, and mathematics (STEM) at many institutions. Innovative technologies, such as audience response systems, can enhance instruction in large-enrollment classes, but the information that can be conveyed with these existing technologies remains quite limited. The goal of this project is to develop a new role for technology in large STEM classes, one that exploits advances in computer vision and the rapid proliferation of digital video. By building a multi-camera array to simultaneously observe all individuals in a large classroom, the investigators will pursue foundational research in both education and computer vision. For computer vision, large classrooms provide a convenient microcosm of social interaction in which individual activities are constrained but not controlled; this project will leverage this property to develop vision-based recognition systems for large group activities. Educationally, this new vision system will be used to systematically study learning in large classrooms, something that has not previously been possible.
Insights gained from research in these two areas will be used to create a radically new tool for computer-assisted collaborative instruction. This project will develop systems to automatically detect and summarize real-time activity information for course instructors, thereby enhancing their ability to make optimal use of interactive class time. These systems are viewed as prototypes that can ultimately be replicated at other institutions. In addition, the results of this research into the nature of learning in large classrooms will serve as a basis for improving instruction in large-enrollment STEM courses nationwide, regardless of their technological assets. More broadly, the project offers a new paradigm for education research, in which small-scale ecological observations are scaled up by automated visual activity recognition.
Intellectual Merit. We have provided a cornerstone for computer vision-based observation, understanding, and augmentation of large-classroom learning. We built a multi-camera and microphone array to simultaneously observe student-to-student interactions for the duration of a four-month undergraduate physics course, and we used this system for foundational research in education and computer vision. For computer vision, large classrooms are a convenient microcosm of social interaction in which individual activities are constrained but not controlled; we leveraged this property to develop vision-based recognition systems for large group activities. We created a suite of computer vision tools, including a general framework for the video-based analysis of group behavior, new computational representations of group behavior, and new computational processes for localizing and recognizing salient group behaviors that are embedded in a crowd of non-participants.

For education, our system was used to study learning in large classrooms by systematically observing the interactions of an entire student population, something that had not previously been possible. Experts in education analyzed the audio and video manually, forming the basis for associating particular kinds of learning interactions with visible student behaviors and for developing a coding scheme of the behaviors that occur in this instructional environment. Based on our analysis of more than fifty video segments of student discussions, we have learned that in interactive lecture classrooms students work together to build new physics ideas; that the seating arrangement in a classroom significantly affects how students talk to one another; and that visual cues reveal the kind of conversations students are having, whether or not we can hear what they are saying.

Finally, we created an advanced audience response system, called Learning Catalytics, that substantially broadens the scope of interactive teaching. It is a web-based platform that acts as a computational catalyst for enhanced human-to-human interaction in classrooms. Unlike traditional "clickers", which allow only the one-way flow of multiple-choice or simple-text answers from students to instructor, this new platform also allows information to flow from instructor to students. Moreover, because it is web-based, it runs on common laptops and smartphones and can be readily extended to solicit, record, and analyze very rich forms of student responses, including graphs, drawings, free text, algebraic expressions, photographs, and more.

Broader Impacts. This project was carried out by a research team with significant expertise in computer vision, observation-based education research, and collaborative instruction. The results of our research into the nature of learning in large classrooms are a basis for improving instruction in large-enrollment STEM courses nationwide, regardless of their technological assets. Our project offers a new paradigm for education research, in which we aim to scale up small-scale ecological observations through automated visual activity recognition. The project also had broader impacts for research in computer vision. Machine learning techniques have shown tremendous potential for the automatic understanding of human interactions at a small scale, but to date we have lacked the "ground-truth" data required to train and test related approaches for large groups.
The constrained activity in classrooms provided an environment in which models for uncontrolled large-group interactions were trained and tested for the first time. Thus, although the focus of our research was large classrooms, our models are applicable more generally to surveillance, video archival and retrieval, and human-computer interaction.