Assessment of individuals' proficiency at complex tasks is often accomplished by observation and rating. Teachers or testing agencies, for example, rate students' essays and their solutions to complex problems in mathematics and science. School districts employ trained observers to rate teachers' performance in the classroom. Experts rate radiologists' ability to classify x-ray images. Ratings, however, may change over time due to changes in the way the rater perceives the work and/or changes in individuals' proficiency. The material being rated may also reflect more than one dimension of proficiency. Finally, summaries of these ratings may be misleading when the data collection design includes groupings (schools, hospitals, etc.) that introduce extraneous statistical dependence into the rating data. This project will expand the Hierarchical Rater Model (HRM), a multilevel item response theory model that accounts for dependencies among multiple ratings of the same work, into a framework that will accommodate (a) variation in ratings over time; (b) multidimensional assessments; and (c) clusters and other hierarchical structure introduced by the data collection design. This new framework will allow the HRM to provide estimates of the overall proficiencies of individuals on the rated tasks, as well as estimates of precision, accuracy, and other rater characteristics, under a broad variety of practical rating situations. Analytical work, simulation studies, and real-data applications will be used to explore and demonstrate the feasibility and applicability of the expanded HRM framework. In particular, a planned analysis of data from the Measures of Effective Teaching project (MET; Bill and Melinda Gates Foundation, 2012), a large study of classroom teaching in the United States, will demonstrate the feasibility of the proposed methodological advancements to the HRM. The research will culminate in a new HRM framework with unified notation and formulations, so that researchers may specify and estimate special cases of the generalized model as needed. The project will also provide computational tools, including algorithms and source code, so that researchers can apply the framework with ease.
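For context, a minimal sketch of the basic two-level HRM, as it is commonly formulated in the item response theory literature, is given below; the notation, symbols, and distributional choices are illustrative assumptions and are not taken from this abstract. At the first level, the rating X_{ijr} that rater r assigns to examinee i's work on item j is modeled as a noisy, possibly biased report of an "ideal" rating \xi_{ij}, for example via a discretized signal-detection form

\[
P(X_{ijr} = k \mid \xi_{ij}) \;\propto\; \exp\!\left\{ -\frac{\big(k - \xi_{ij} - \phi_r\big)^2}{2\psi_r^2} \right\},
\]

where \phi_r captures rater r's severity or leniency and \psi_r the rater's variability. At the second level, the ideal ratings follow a polytomous item response model given the examinee's latent proficiency \theta_i, for example a partial credit model

\[
P(\xi_{ij} = k \mid \theta_i) \;=\; \frac{\exp\!\big(\sum_{m=1}^{k} (\theta_i - \beta_{jm})\big)}{\sum_{c=0}^{K_j} \exp\!\big(\sum_{m=1}^{c} (\theta_i - \beta_{jm})\big)}, \qquad k = 0, \dots, K_j,
\]

with the empty sum (c = 0) taken to be zero and \theta_i assumed normally distributed across examinees. Under this reading, the proposed extensions would index ratings and proficiencies by time, replace the scalar \theta_i with a vector for multidimensional assessments, and add cluster-level random effects (e.g., school or hospital effects) to absorb dependence introduced by the data collection design.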

The new HRM framework will advance scientific and practical knowledge in two ways. It will enable researchers and practitioners to obtain high-quality estimates of proficiency that account for, and adjust to, complex structure in the ratings. It will also provide rich information about raters and the rating process. Ratings of work, performance, and behavior are an increasing part of high-stakes decisions in many fields, including human resources, medical diagnosis, and psychology. The largest impact of this project may be in education policy and research, where ratings of teachers and students are increasingly common. The new HRM framework will allow researchers and practitioners in these fields to produce more accurate assessments of the individuals being rated and to diagnose possible issues in the measurement and rating design, contributing to improved high-stakes decision making based on rating data.

Agency: National Science Foundation (NSF)
Institute: Division of Social and Economic Sciences (SES)
Type: Standard Grant (Standard)
Application #: 1324587
Program Officer: Cheryl Eavey
Budget Start: 2013-09-01
Budget End: 2018-08-31
Fiscal Year: 2013
Total Cost: $350,001
Name: Carnegie-Mellon University
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213