This project explores the design and analysis of peer grading technology. A peer grading system is an online tool that collects student submissions, assigns review tasks to students and staff graders, and aggregates the reviews to produce assessments of both the submissions and the peer reviews. The PIs have developed a prototype system and have collected preliminary evidence suggesting that peer review has important potential benefits:
1. Learning by reviewing: Students learn from critical assessment of other students' work. In the PIs' prototype at Northwestern, 60% of the students reported that peer review helped them learn course material and 55% of the students reported that peer review helped them to prepare better homework solutions themselves.
2. Reduced grading staff: Peer grading reduces the grading load on course staff and allows for effective teaching with larger classes. This is especially important now, as enrollment in computer science classes is growing faster than teaching resources. In the PIs' prototype at Northwestern, the course staff graded only one fifth of the student submissions.
3. Promptness of feedback: Reduced teacher grading enables prompt feedback to students. In the PIs' prototype at Northwestern, peer reviews were available within three days, and final assessments of both the submissions and the peer reviews were available within five days. Prior to introducing peer review, assessments took one to two weeks.
A peer grading system comprises three main components:
1. The review matching algorithm determines which peers should review which submissions and which submissions should be reviewed by the teacher.
2. The submission grading algorithm aggregates the peer reviews of each submission and assigns it a grade.
3. The review grading algorithm compares the peer reviews with the teacher reviews and assigns grades to the peer reviews. Without this algorithm, peers may not put effort into providing quality reviews, and the reviews will be neither accurate for grading nor beneficial to the reviewers.
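To make the first component concrete, the following sketch shows one simple way a review matching algorithm might work. This is an illustrative round-robin scheme, not the PIs' actual algorithm; the balance parameter `k` and the fraction of submissions spot-checked by the teacher are hypothetical choices.

```python
import random

def match_reviews(students, k=3, teacher_fraction=0.2, seed=0):
    """Round-robin review matching: after a random shuffle, student i
    reviews the submissions of students i+1, ..., i+k (mod n), so no
    one reviews their own work, every submission receives exactly k
    peer reviews, and every student performs exactly k reviews.
    A random subset of submissions is also flagged for teacher review,
    which later anchors the review grading step."""
    n = len(students)
    assert k < n, "need more students than reviews per submission"
    rng = random.Random(seed)
    order = students[:]
    rng.shuffle(order)
    matches = {author: [] for author in order}  # author -> reviewers
    for i, reviewer in enumerate(order):
        for j in range(1, k + 1):
            author = order[(i + j) % n]
            matches[author].append(reviewer)
    # Teacher additionally grades a random fraction of submissions.
    n_teacher = max(1, round(teacher_fraction * n))
    teacher_set = set(rng.sample(order, n_teacher))
    return matches, teacher_set
```

A round-robin over a shuffled order is attractive because it balances reviewer load exactly; richer matching policies could, for instance, bias assignments toward pairing uncertain submissions with historically accurate reviewers.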
The details of these algorithms are crucial for the proper working of the peer review system. A main research effort of this project is to identify the algorithms to use for each of these components. The review matching algorithm affects the accuracy of the subsequent grading algorithms and the grading load of the teacher. The submission grading algorithm determines which peer reviews are accurate and which are inaccurate, and uses this understanding to assign grades that are representative of submission quality. The review grading algorithm incentivizes the peers to put in sufficient effort to determine whether a submission is good or bad, and it must be calibrated so that good and bad reviews receive the appropriate review grades.
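The idea that the submission grading algorithm simultaneously estimates review accuracy and submission quality can be illustrated with a minimal iterative scheme in the spirit of Dawid–Skene-style aggregation. This is a simplified sketch under assumed inputs (numeric scores, a small weight floor), not the PIs' algorithm.

```python
def grade_submissions(reviews, n_iters=10):
    """reviews: dict mapping submission -> list of (reviewer, score).
    Alternate two steps: (1) estimate each submission's grade as a
    weighted mean of its reviews; (2) reweight each reviewer by how
    closely their scores track the current consensus, so inaccurate
    reviewers influence the grades less."""
    reviewers = {r for revs in reviews.values() for r, _ in revs}
    weight = {r: 1.0 for r in reviewers}
    grades = {}
    for _ in range(n_iters):
        # Consensus grade per submission under current weights.
        for s, revs in reviews.items():
            total = sum(weight[r] for r, _ in revs)
            grades[s] = sum(weight[r] * score for r, score in revs) / total
        # Reviewer weight shrinks with mean squared deviation from consensus.
        sq_err = {r: [] for r in reviewers}
        for s, revs in reviews.items():
            for r, score in revs:
                sq_err[r].append((score - grades[s]) ** 2)
        for r in reviewers:
            mse = sum(sq_err[r]) / len(sq_err[r])
            weight[r] = 1.0 / (0.01 + mse)  # small floor avoids division by zero
    return grades, weight
```

On data with two consistent reviewers and one noisy one, the noisy reviewer's weight collapses and the consensus converges toward the accurate scores.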
The PIs have implemented these algorithms in a peer grading system that has been piloted in Northwestern computer science classes. However, the space of possible algorithms is large, and the PIs' work on the prototype has yet to determine which algorithms combine to give the best educational outcomes. A main focus of this project will be improving the understanding of which algorithms lead to the best educational outcomes.
Theoretical work in algorithms and machine learning provides a starting point for the project's study of good algorithms for peer grading systems. A key endeavor of the project is translating and applying these theoretical algorithms to the peer grading domain. As one example, proper scoring rules are a natural approach for grading the peer reviews. However, test runs of the PIs' prototype implementation suggest that these rules can perform poorly in practice. New models and algorithms are needed on the theoretical side, and these algorithms must be shown to work in practice.
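For readers unfamiliar with proper scoring rules, the quadratic (Brier) rule illustrates the idea: a reviewer who reports a probability that a submission is good maximizes their expected review grade exactly when they report their true belief. The binary good/bad framing and the specific rule are illustrative choices, not a description of the PIs' system.

```python
def brier_score(p, outcome):
    """Quadratic (Brier-style) proper scoring rule for a binary judgment.
    The reviewer reports probability p that the submission is 'good';
    outcome is 1 if the teacher's review deems it good, else 0.
    Returns a reward in [0, 1]."""
    return 1.0 - (outcome - p) ** 2

def expected_score(report, belief):
    """Expected reward of reporting `report` when the reviewer's true
    belief that the submission is good equals `belief`."""
    return (belief * brier_score(report, 1)
            + (1 - belief) * brier_score(report, 0))
```

Because the rule is strictly proper, `expected_score(p, b)` is uniquely maximized at `p == b`; the practical difficulties the PIs observed arise not from this incentive property but from issues such as effort costs and noisy teacher anchors, which the proposed research targets.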