In the past decade, we have seen a dramatic increase in the availability of on-line academic lecture material. Yet, in contrast to many other communicative activities, lecture processing has so far benefited little from advances in human language technology. The goal of this proposal is to enable fast, accurate and easy access to lecture content. To that end, we will develop new technologies in the areas of speech recognition, structure induction and summarization.
Our work will contribute to a better understanding of the relationship between written and spoken language, a long-standing issue in linguistics that has seen limited empirical research. We will address this gap through an extensive corpus-based study of the relationship, examining variation at levels ranging from vocabulary to discourse.
The tools we propose to develop will be integrated and tested in the framework of the MIT Open CourseWare Initiative, a large, publicly available on-line repository of teaching material from 500 MIT courses. We will also incorporate our tools into the Liberated Learning Initiative, which works on integrating students with disabilities into mainstream higher education.