The goal of this project is to develop a nationally available, fully automated, web-based diagnostic system called ARCADE (Automated Reading Comprehension and Diagnostic Evaluation) that will be capable of assessing complex comprehension from student free response data. The system will combine information extraction (IE) technologies and advanced psychometric techniques in novel ways to provide detailed assessment and diagnostic information that teachers and students can use to improve classroom instruction and learning. The ARCADE system will also contribute to a data infrastructure usable by other investigators. Specifically, computer scientists interested in developing new information extraction technology, and cognitive scientists and educators interested in developing new theories of comprehension and assessment, would be able to access the database that will be built up as ARCADE is used with students. Thus, in addition to improving educational effectiveness in classrooms on a national level, the ARCADE system has the potential to provide a nationwide resource for advancing scientific research in both reading comprehension and information technology.

The specific research being pursued is to develop and empirically test the core mathematical algorithms of the ARCADE system with respect to their reliability and validity for assessing reading comprehension in foundational literacy in science and literature. The empirical database will consist of free response data generated by student examinees in grade school and junior high school classroom settings in response to open-ended probe questions. The core ARCADE system employs innovative combinations of information extraction and psychometric techniques to address a critical educational need, namely, ways to assess multiple dimensions of complex comprehension. These dimensions of comprehension are specified by a set of special semantic networks (called "knowledge digraphs") which embody meaning relations among ideas in texts and documents as well as relations to prior knowledge and inferences. A new statistical model of examinee behavior is then defined which incorporates techniques from Item Response Theory (IRT), Hidden Markov Model IE technology, and Knowledge Digraph Contribution analysis. The important innovation of this new statistical model is that multiple dimensions of comprehension, together with their respective standard errors, can be estimated directly from examinee free response data using Monte Carlo simulation and econometric methods. Moreover, using an approach analogous to that developed in IRT, these assessments of comprehension dimensions can be mathematically proven to be reliable across a given family of equivalent testing materials.
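
To make the idea of estimating a comprehension score together with its standard error by Monte Carlo simulation more concrete, the following is a minimal sketch only. It is not the ARCADE model, whose details are not given here. It assumes a single comprehension dimension, a one-parameter (Rasch) IRT model, a standard normal prior on ability, and free responses that have already been scored as 0 or 1 per probe question; the function names and the difficulty values are hypothetical.

```python
import math
import random


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch (1PL) IRT model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def likelihood(theta, responses, difficulties):
    """Likelihood of a 0/1 response pattern at ability level theta."""
    like = 1.0
    for x, b in zip(responses, difficulties):
        p = rasch_prob(theta, b)
        like *= p if x == 1 else (1.0 - p)
    return like


def estimate_ability(responses, difficulties, n_draws=20000, seed=0):
    """Monte Carlo estimate of ability and its standard error.

    Draws ability values from an assumed N(0, 1) prior, weights each draw
    by the likelihood of the observed responses, and returns the weighted
    mean (the estimate) and weighted standard deviation (its standard error).
    """
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, 1.0) for _ in range(n_draws)]
    weights = [likelihood(t, responses, difficulties) for t in draws]
    total = sum(weights)
    mean = sum(w * t for w, t in zip(weights, draws)) / total
    var = sum(w * (t - mean) ** 2 for w, t in zip(weights, draws)) / total
    return mean, math.sqrt(var)


if __name__ == "__main__":
    # Hypothetical difficulties for five probe questions.
    difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]
    theta, se = estimate_ability([1, 1, 1, 0, 0], difficulties)
    print(f"estimated ability: {theta:.2f} (standard error {se:.2f})")
```

A multidimensional version would replace the single ability value with a vector, one entry per comprehension dimension, and would need some way of mapping free response text onto those dimensions, which is the role the project assigns to the information extraction and knowledge digraph components.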

Project Start:
Project End:
Budget Start: 2001-11-15
Budget End: 2005-10-31
Support Year:
Fiscal Year: 2001
Total Cost: $393,793
Indirect Cost:
Name: University of Texas at Dallas
Department:
Type:
DUNS #:
City: Richardson
State: TX
Country: United States
Zip Code: 75080