PI: Quellmalz, Edys Institution: WestEd
PROJECT SUMMARY
This project addresses the suitability of available assessments for measuring what students should know and be able to do in science using technology-based science tests. The investigators will design new assessments to gather evidence of complex science learning that incorporate recent changes in the practice of science and mathematics. They will conduct research on how to apply a new generation of dynamic science assessments, examine their construct validity, and compare student performance on tasks and items in static, active, and interactive modes. The project will provide new information on the construct validity of technology-enhanced assessment tasks and items designed to measure complex science learning. The investigators will compare the construct validity of dynamic (both active and interactive) assessment tasks with that of static formats intended to measure the same complex learning. They aim to identify principles for science assessment task design structures that incorporate technology to elicit knowledge of science systems and inquiry abilities. The investigation is conducted in collaboration with the American Association for the Advancement of Science (AAAS).
Edys S. Quellmalz1, Principal Investigator; Jodi L. Davenport1, Michael J. Timms2, and George DeBoer3, Co-Principal Investigators (WestEd1, Australian Council for Educational Research2, American Association for the Advancement of Science3)

How can assessments measure complex science learning? Although traditional multiple-choice items can efficiently measure knowledge of scientific facts or concepts, they are considered less well suited to providing evidence of science inquiry practices such as making observations or designing and conducting investigations. Thus, students who perform proficiently in "science" as measured by static, conventional tests may have strong factual knowledge but little ability to apply this knowledge to conduct meaningful investigations. As technology has advanced, interactive, simulation-based assessments promise to capture information about these more complex science practice skills. In this project, we tested whether interactive assessments are more effective than traditional, static assessments at discriminating student proficiency across the three types of science practices defined in the 2009 Science Framework for the National Assessment of Educational Progress: 1) identifying principles (e.g., recognizing principles), 2) using principles (e.g., applying knowledge to make predictions and generate explanations), and 3) conducting inquiry (e.g., designing experiments). The project explored three modalities of assessment: static, most similar to traditional items, in which the system presents still images and does not respond to student actions; active, in which the system presents dynamic portrayals, such as animations, that students can observe and review; and interactive, in which the system shows dynamic systems "in action" and responds to student input.
Three analyses were employed (a generalizability study, confirmatory factor analysis, and multidimensional item response theory) to evaluate how well each assessment modality distinguished performance on these three types of science practices. The study found that the interactive task sets were more effective than either static or active assessments at uniquely measuring students' ability to engage in inquiry practices. Therefore, assessment developers who wish to design assessments of science inquiry skills should consider using active and interactive assessment tasks.

Intellectual Merit. The Foundations of 21st Century Science Assessments study has extended principles from the fields of learning and testing to the design and validation of a new generation of dynamic science assessments. The study contributes reusable, research-based design principles for developing the next generation of dynamic science assessment tasks that take advantage of the capabilities of technology to measure complex science learning and inquiry strategies. The study also forged a new, principled framework for studying and documenting the validity of technology-enhanced science assessments.

Broader Impact. The study provides test developers, policymakers, and science educators with evidence of the validity and comparability of types of static and dynamic task and item designs on the next generation of computer-based science tests. The methods and findings provide guidance to state, national, and international educators for gathering evidence to interpret claims and make decisions based on results of tests transitioning from static to dynamic forms.
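To illustrate the first of the three analyses named above, the sketch below estimates variance components for a one-facet generalizability (G) study with persons crossed with tasks. The score matrix is hypothetical toy data invented for illustration (not project data), and the mean-squares approach shown is the standard textbook estimator, not necessarily the exact procedure the project used.

```python
# Toy one-facet G study: persons (rows) crossed with tasks (columns).
# Hypothetical, noise-free additive data, so the residual variance is ~0
# and the relative G coefficient comes out ~1.0.
scores = [
    [2.0, 3.0, 2.0],
    [4.0, 5.0, 4.0],
    [6.0, 7.0, 6.0],
    [8.0, 9.0, 8.0],
]
n_p, n_t = len(scores), len(scores[0])

grand = sum(sum(row) for row in scores) / (n_p * n_t)
person_means = [sum(row) / n_t for row in scores]
task_means = [sum(scores[p][t] for p in range(n_p)) / n_p for t in range(n_t)]

# Mean squares for a fully crossed persons x tasks design.
ms_p = n_t * sum((m - grand) ** 2 for m in person_means) / (n_p - 1)
ms_t = n_p * sum((m - grand) ** 2 for m in task_means) / (n_t - 1)
ms_res = sum(
    (scores[p][t] - person_means[p] - task_means[t] + grand) ** 2
    for p in range(n_p) for t in range(n_t)
) / ((n_p - 1) * (n_t - 1))

# Variance component estimates, clamped at zero.
var_res = ms_res
var_p = max(0.0, (ms_p - ms_res) / n_t)
var_t = max(0.0, (ms_t - ms_res) / n_p)

# Relative G coefficient: how dependably the tasks rank-order persons.
g_coef = var_p / (var_p + var_res / n_t)
print(round(g_coef, 3))
```

In a study like the one described, a high G coefficient for a task set indicates that person-to-person differences, rather than task sampling or noise, dominate the observed scores; comparing coefficients across static, active, and interactive task sets is one way to compare how dependably each modality measures a practice.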