Online learning environments are rapidly expanding the kinds of data that can be collected about students' opportunity to learn and about the details of the learning process. As high schools, community colleges, and universities enroll more students in blended or online courses, the volume of data about what students are doing exceeds what typical educational research methodologies can handle. Borrowing from multiple traditions, analysis of educational discourse data addresses several persistent problems in education, including developing better ways to predict student success during a course and designing interventions to provide support. The researchers in this proposal will build a community to address three important areas: ethics in data sharing; technical issues in data sharing; and plans for an infrastructure to support and develop capabilities in both. The community built in this effort will collaboratively create the necessary metadata and standards for data sets and tools, templates for obtaining appropriate approval to use online discourse data, and de-identification algorithms that can be shared across data platforms.
The research will be conducted through two interrelated workshops interspersed with planning meetings with an expert advisory board. The first workshop will focus on educational discourse and the undergraduate experience, identifying the infrastructure issues that must be resolved to support collaborative educational discourse research. The second workshop will focus on the specific infrastructure needs for working with discourse data from massive open online courses (MOOCs). A white paper will be written and disseminated that addresses ethical and technical issues and their solutions, as well as the infrastructure needed to facilitate collaborative educational discourse research.