This project will develop and evaluate statistical models of quality, derived through machine learning, for digital library content used in science and engineering education. A computational model of quality that approximates expert human judgments is a foundational requirement for building interfaces and tools that can optimize and scaffold human judgments about quality. The research will investigate: a) the characteristics of digital learning resources and library collections that serve as key markers of quality for experts engaged in resource selection and collection curation; b) machine learning and natural language processing techniques with sufficient discrimination to approximate human decision-making; and c) how quality markers might be modeled computationally, along with the design considerations this requires.
The research will be conducted by a collaborative partnership between investigators at the University of Colorado at Boulder and the Digital Library for Earth System Education (DLESE) Program Center at UCAR. Quality has emerged as a dominant yet poorly understood concern within large-scale national educational digital library efforts such as the National Science Digital Library (NSDL) and DLESE. Evaluating quality involves complex, time-consuming, and variable human judgments, and it is not clear that such human-intensive practices will scale to meet anticipated library growth. Computational models that support human quality judgments can thus play a critical role in building future diverse data repositories and networks of such repositories. Primary outcomes of this work will include a conceptual model of expert quality evaluation processes and a corresponding computational model that formalizes the conceptual model and can be empirically validated. Secondary outcomes include documentation of best practices for collection curation and preliminary guidelines for the design of tools that scaffold curation processes around quality. This research extends current theories in collaborative systems regarding the design and optimization of the flow of information about quality to support human judgments in distributed and evolving digital library environments.