The Cognition, Measurement and Evaluation (CME) project combines current thinking in cognitive science about scientific reasoning and its attainment with recent advances in instrument development techniques. The CME project will produce a linked system of developmentally appropriate multiple-choice and constructed-response assessment instruments designed to measure scientific reasoning skills across age levels. These tools will be useful to researchers, classroom teachers and other educators working with learners from 5th grade through college. The project focuses on two scientific reasoning skills: controlling variables and evaluating evidence. These skills will be defined in ways suitable for modern instrument development, and the work will result in a draft instrument and a model computer platform that will be ready for initial field testing. The ultimate goal is to create an assessment system that will provide information to: (1) STEM researchers who want to understand how innovative technologies, instructional approaches, and/or teaching practices affect students' scientific reasoning abilities, and (2) teachers who need to understand how students are responding to particular aspects of inquiry instruction. The institutions involved include the University of Minnesota, the University of Kansas, and surrounding school districts.

Based on a needs assessment, existing research and instruments, and advice from content experts, we will map the constructs by developmental level across the three age groups. Each construct map will include developmental levels and a qualitative description of scientific reasoning at each level. The item development process will include development of scoring systems that map responses to specific levels of the scientific reasoning constructs. Initial drafts of the instruments will be investigated through think-alouds, exit interviews, and focus group interviews. An initial pilot test with a more substantial sample will allow the data to be modeled using item response theory (IRT). We will use item fit, differential item functioning, and coverage maps to assess item properties. We will also use item fit and model comparisons to examine the structure of the constructs. During the model-building phase, Rasch family models, which are the most parsimonious, will be fitted to the pilot data. If Rasch family models do not fit the data, the construct, instrument, scoring model, and measurement model will be revised.
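To make the model-building step concrete, the following is a minimal sketch of the simplest member of the Rasch family, the dichotomous Rasch model, together with two commonly used mean-square item fit statistics (outfit and infit). It is an illustration only and is not taken from the project: the function names are hypothetical, person abilities and item difficulty are assumed to be already estimated, and the project's scoring systems may call for polytomous Rasch family models rather than the right/wrong case shown here.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model.

    theta is the person ability and b is the item difficulty, both in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def outfit_mean_square(responses, thetas, b):
    """Unweighted mean of squared standardized residuals for one item."""
    total = 0.0
    for x, theta in zip(responses, thetas):
        p = rasch_probability(theta, b)
        total += (x - p) ** 2 / (p * (1.0 - p))
    return total / len(responses)


def infit_mean_square(responses, thetas, b):
    """Information-weighted fit: summed squared residuals over summed variances."""
    squared_residuals = 0.0
    variances = 0.0
    for x, theta in zip(responses, thetas):
        p = rasch_probability(theta, b)
        squared_residuals += (x - p) ** 2
        variances += p * (1.0 - p)
    return squared_residuals / variances


# Hypothetical data: scored responses (1 = correct) of five people to one item,
# with assumed ability estimates and an assumed item difficulty.
example_responses = [0, 0, 1, 1, 1]
example_thetas = [-1.5, -0.5, 0.0, 0.8, 1.6]
example_difficulty = 0.2

example_outfit = outfit_mean_square(example_responses, example_thetas, example_difficulty)
example_infit = infit_mean_square(example_responses, example_thetas, example_difficulty)
```

Both statistics have an expected value near 1 when the data fit the model. Values well above 1 indicate responses that are noisier than the model predicts, and values well below 1 indicate responses that are more predictable than expected, so either can flag an item for review.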

One of the most pressing needs in the evaluation of science education programs is the need for appropriate measuring devices. Currently, the field uses a variety of assessment devices, making it almost impossible to determine national effects or to compare different approaches. Although assessment of scientific reasoning is not a new concept, recent ideas about its definition and its measurement may be transformative. The assessment devices developed by this project will be validated to ensure cultural relevancy and absence of bias and therefore will enhance the infrastructure available for research and education. Over time the use of the developed instruments will allow in-depth understanding of the impact of different programmatic approaches to the development of scientific reasoning, a critical goal of STEM education.

Project Report

The Cognition, Measurement and Evaluation (CME) project was designed to combine recent thinking from cognitive science about scientific reasoning with current evaluation and measurement techniques in order to develop a test of scientific reasoning. The test is designed to evaluate the impact of science education programs on the scientific reasoning of middle and high school students, as well as adults. The project used existing literature to develop a conceptualization of scientific reasoning consisting of five facets, each with specified skills. Facet I focuses on the ability to observe a situation or event, recognize that there is something to find out, recognize the difference between existing understanding and what more needs to be learned, and clearly articulate a question that can guide an empirical investigation. Facet II focuses on the ability to design tests of a hypothesis that correctly identify and manipulate all relevant variables, so that the resulting empirical evidence allows one to answer the question. A major aspect of this facet is controlling variables. Facet III focuses on students’ ability to interpret the results of an investigation and to draw justified inferences and/or conclusions based on those data. This facet involves the ability to coordinate theory and evidence so as to draw inferences that account for either causal or stochastic relationships. These activities employ theory, seek underlying theoretical causes for the evidence, and use models to describe patterns in the data. Facet V focuses on students’ ability to evaluate theory in light of experimental conclusions, reconcile new evidence with prior beliefs, and (if required) revise their theory and generate new predictions. The project produced a set of carefully developed and pilot-tested items, matched to the facets, to assess scientific reasoning. The pilot test data were used to examine the effectiveness of the items.
The complete list of items, along with the item characteristic data for each, is available on the project website. Also available are a detailed description of the methods used to develop the items and some suggested revisions of the items based on the pilot test data. The hope is that program evaluators will use these items to help evaluate science education programs.

University of Minnesota Twin Cities
United States