Computerized Adaptive Testing (CAT) has become popular in high-stakes testing programs. Examples of large-scale CATs include the Graduate Record Examination, the Graduate Management Admission Test, the National Council of State Boards of Nursing licensure examination, and the Armed Services Vocational Aptitude Battery. An advantage of CAT over paper-and-pencil exams is that it estimates abilities more efficiently, because it can tailor item selection to the estimated abilities of examinees. This research addresses two areas of critical importance for CAT. First, flexible models for response times will be developed to help control the duration of an exam and, in appropriate circumstances, to assist in the measurement of ability. Second, statistical methods for constraint management will be developed to ensure that an exam carries sufficient information to diagnose fine-grained skills while also providing an accurate summary score.
The impact of the research will be to provide technology that makes better use of response-time information and enhances the capacity of CAT to provide diagnostic information. Models for response-time distributions will be developed that make few assumptions about functional form and allow for dependence between response times and a latent trait representing ability on the studied domain. Estimation techniques will be developed that can be used with data previously collected from exams administered with CAT. Algorithms that use the estimated response-time distributions will be constructed to better manage exam duration and to extract information from response times that sharpens estimation of the ability the exam is designed to measure. In addition to addressing response times, the problem of managing diagnostic information to assess mastery of fine-grained skills will be studied for exams that also aim to provide a single summary score. CAT methods for adaptively selecting items to provide a score efficiently will be modified to simultaneously balance the coverage of skills and attributes of interest.
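As a minimal sketch of the kind of model involved, the fragment below implements the standard lognormal response-time model (log response time normal, with a person speed parameter and item time-intensity and time-discrimination parameters), and draws ability and speed from a correlated bivariate normal to capture their dependence. This is a simple parametric stand-in, not the project's more flexible models, and all parameter names are illustrative.

    import numpy as np

    def lognormal_rt_density(t, tau, alpha, beta):
        # ln T ~ Normal(beta - tau, 1/alpha**2): beta is item time intensity,
        # tau is person speed, alpha is the item's time discrimination.
        z = alpha * (np.log(t) - (beta - tau))
        return (alpha / t) * np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)

    def expected_rt(tau, alpha, beta):
        # Mean of a lognormal variable: exp(mu + sigma^2 / 2).
        return np.exp(beta - tau + 0.5 / alpha**2)

    def sample_person(rho, rng):
        # Ability theta and speed tau from a bivariate normal with
        # correlation rho; this is the dependence the project's models allow.
        cov = np.array([[1.0, rho], [rho, 1.0]])
        theta, tau = rng.multivariate_normal([0.0, 0.0], cov)
        return theta, tau

    rng = np.random.default_rng(0)
    theta, tau = sample_person(rho=0.3, rng=rng)
    print(expected_rt(tau, alpha=1.5, beta=4.0))  # expected seconds on one item

Replacing the normal form of ln T with a weaker, survival-analysis-style specification is one route to the few-assumptions models described above.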
Standardized testing plays a major role in our society, from elementary school through higher education, as well as in professional certification. Computers have allowed great advances in standardized testing, especially through computerized adaptive testing (CAT). CAT affords the opportunity to tailor items to the ability of the examinee and thereby measure ability more efficiently. Computer-administered tests also allow for recording response times, the time it takes to answer an item. However, little had been done with these response times to improve testing, and putting them to use was the aim of this project.

Among the developments resulting from this grant was a method for utilizing response times to better estimate a person's ability in whatever domain the exam concerns. Item selection by the maximum information criterion has been used for decades. We improved upon this method by fitting a model to response times and ability together, which allows one to select the item that optimally accrues information about the examinee's ability per unit of time. This means that in a fixed amount of time, abilities may be estimated more accurately than with previous methods. Another accomplishment of the grant was to create flexible statistical models for response times that may more accurately reflect reality. Statistical modeling almost always involves some assumptions that simplify the world. However, making such assumptions can be dangerous if the model does not adequately agree with the process being modeled. We utilized techniques from both psychometrics and survival analysis to construct models that can adapt to a variety of potential realities and give a fit that is adequate for use.

The ultimate goal of the project was to develop a set of tools that could advance CAT and standardized testing by paying careful attention to response times, previously an underutilized source of information. The primary accomplishments were to develop techniques for using response times in selecting a sequence of items for a given examinee and to create flexible models for response times. Response times may be used in many other ways that could be valuable directions for research, among them test formats that directly use response times in the estimation of ability and methods for ensuring integrity in test taking by examining response times. Though these two topics were not addressed by the current research, the models that have been developed may play a role in addressing them. The item selection method we have proposed is straightforward, efficient, and ready for use. In addition, the software used for the research is available and may assist others in studying this potentially fruitful area of testing.
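To make the selection rule concrete, the sketch below picks the unadministered item with the greatest Fisher information per expected second, assuming a two-parameter logistic (2PL) response model and the lognormal response-time model sketched earlier. The pool format and parameter names are hypothetical illustrations, not the project's actual implementation.

    import numpy as np

    def fisher_information_2pl(theta, a, b):
        # Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P).
        p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
        return a**2 * p * (1.0 - p)

    def expected_rt(tau, alpha, beta):
        # Expected response time under the lognormal model: exp(mu + sigma^2 / 2).
        return np.exp(beta - tau + 0.5 / alpha**2)

    def select_item(theta_hat, tau_hat, pool, administered):
        # Maximize information accrued per unit of expected time, rather than
        # raw information as in the classical maximum information criterion.
        best, best_rate = None, -np.inf
        for i, item in enumerate(pool):
            if i in administered:
                continue
            rate = (fisher_information_2pl(theta_hat, item["a"], item["b"])
                    / expected_rt(tau_hat, item["alpha"], item["beta"]))
            if rate > best_rate:
                best, best_rate = i, rate
        return best

    pool = [{"a": 1.2, "b": 0.0, "alpha": 1.5, "beta": 4.0},
            {"a": 0.8, "b": -0.5, "alpha": 2.0, "beta": 3.5}]
    print(select_item(theta_hat=0.2, tau_hat=0.1, pool=pool, administered={0}))

When two items are equally informative, this rule prefers the one the examinee is expected to answer more quickly, so information accrues faster over a fixed testing time.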