This is a pilot project to provide enhanced access to two seminal social science data resources - the American National Election Studies (ANES) and General Social Survey (GSS) - by creating structured, machine-actionable metadata and a portal populated with new tools for data discovery and analysis. The project will also analyze the current workflows that produce the ANES and GSS data and make recommendations for transitioning to metadata-driven processes to streamline data production and guard against the metadata loss that currently occurs.

A team including representatives from the ANES, GSS, and ICPSR will develop rich metadata in Data Documentation Initiative (DDI) XML, a well-defined standard for the social sciences. This work will entail retrofitting all existing documentation, which currently exists in disparate formats, to a uniform XML structure. A sample of the ANES and GSS documentation will be enriched with detailed information on provenance, universe, and other contextual information that accumulates across the data lifecycle. This sample will also include information to facilitate comparison and harmonization. All documentation created for the project will be freely available.
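As a rough illustration of what a uniform XML structure makes possible, the sketch below reads variable-level documentation from a small, simplified fragment. The element names (`var`, `labl`, `qstn`/`qstnLit`, `universe`) are modeled on DDI Codebook, but the fragment omits namespaces, and the variable names and question wording are invented for the example. They are not taken from the ANES or GSS documentation.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment; real DDI files are namespaced and far richer.
DDI_FRAGMENT = """
<codeBook>
  <dataDscr>
    <var ID="V1" name="VOTED">
      <labl>Voted in last election</labl>
      <qstn><qstnLit>Did you vote in the last election?</qstnLit></qstn>
      <universe>All respondents</universe>
    </var>
    <var ID="V2" name="VOTECHOICE">
      <labl>Candidate voted for</labl>
      <qstn><qstnLit>Which candidate did you vote for?</qstnLit></qstn>
      <universe>Respondents who reported voting</universe>
    </var>
  </dataDscr>
</codeBook>
"""


def read_variables(xml_text):
    """Return one dict per <var> element with its label, question, and universe."""
    root = ET.fromstring(xml_text)
    variables = []
    for var in root.iter("var"):
        variables.append({
            "name": var.get("name"),
            "label": (var.findtext("labl") or "").strip(),
            "question": (var.findtext("qstn/qstnLit") or "").strip(),
            "universe": (var.findtext("universe") or "").strip(),
        })
    return variables


variables = read_variables(DDI_FRAGMENT)
for v in variables:
    print(f'{v["name"]}: {v["question"]} [universe: {v["universe"]}]')
```

Because the reader depends only on the element structure, the same function would work on any file that follows the same layout, whichever survey it documents.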

The Metadata Portal for the Social Sciences will demonstrate DDI-based open-source tools for advanced searching, dynamic codebooks, question banks, harmonization, and other functions. The portal will also feature links to bibliographic citations for both surveys and will provide opportunities for researchers and others to comment and interact. An important aspect of the project will be to re-envision the workflows currently used to produce the ANES and the GSS and to lay the groundwork for new metadata-driven workflows to realize a more seamless "interview to internet" process based on DDI and the Generic Statistical Business Process Model (GSBPM).

This project will facilitate transparent and user-friendly access to the ANES and the GSS to enable expert use of the data as well as exploration by more novice users. Transparency about how the data were produced is essential, and this project will provide structured metadata on the provenance of variables as well as detailed universe statements, so that users can understand the routing patterns for specific respondents and the reasons for missing data. Many researchers are interested in determining the comparability of data items and questions over time, and this project will demonstrate ways to assess comparability for these key time series. In general, the project will provide more information about the ANES and the GSS data than researchers have had access to in the past. Improving the ANES and GSS workflows will lead to the automated capture of more metadata "upstream" that can be made available across the life cycle.
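To make the idea of routing patterns concrete, here is a minimal sketch of how structured filter information could be traversed. It assumes each variable records which earlier questions acted as its filters. The `FILTERS` mapping and its variable names are hypothetical and stand in for whatever the structured metadata actually records.

```python
# Hypothetical routing table: variable -> questions that filter who is asked it.
FILTERS = {
    "REGISTERED": [],
    "VOTED": ["REGISTERED"],
    "VOTECHOICE": ["VOTED"],
    "VOTEMETHOD": ["VOTED"],
}


def upstream(var, filters):
    """Look backward: every filter question that determines who is asked `var`."""
    seen = []
    stack = list(filters.get(var, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(filters.get(current, []))
    return seen


def downstream(var, filters):
    """Look forward: every later question whose universe depends on `var`."""
    return sorted(v for v in filters if var in upstream(v, filters))


print(upstream("VOTECHOICE", FILTERS))
print(downstream("VOTED", FILTERS))
```

A respondent with no value for the hypothetical `VOTECHOICE` item can then be explained by checking the answers to the questions returned by `upstream`.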

Improved access to data through better search, extraction, and analysis tools will enable greater participation across all segments of society interested in the democratic process and social trends. The demonstration portal and its tools will illustrate the potential of using structured documentation as a foundation for tool development and will be extensible to other surveys, leading to improved accessibility for other social science data resources. The large base of metadata and the open-source applications developed for this project will encourage software developers to create new ways to access ANES, GSS, and other data. New workflows will focus on metadata re-use over the life cycle, leading to greater efficiencies and cost savings in creating DDI metadata for all social science data projects.

Project Report

Significant Results

This two-year project resulted in enhanced study- and variable-level DDI markup for both the American National Election Studies (ANES) and the General Social Survey (GSS). The project produced XML metadata following the Data Documentation Initiative (DDI) standard for 58 ANES surveys (79,521 variables) and the cumulative GSS 1972-2012 dataset (5,558 variables). This has created canonical versions of each survey for distribution from multiple sites that disseminate these data (ANES, GSS, ICPSR, and Roper Center). On the basis of the DDI metadata, which now covers all ANES and GSS data sets, we created model tools and services for researchers.

Question bank

The availability of variable-level metadata allows researchers to find and compare survey questions used across all ANES and GSS data sets. Users can examine question text and descriptive statistics about each question. This tool facilitates meta-analysis of multiple data sets and provides benchmarks for future research.

Visualizing question routing

We created an interactive codebook designed to help researchers understand the flow of questions within a survey. Advanced surveys like ANES and GSS have many branches and "skip" patterns, so that respondents are not asked irrelevant questions. Researchers must understand which respondents were included in each question. The interactive codebook allows them to easily look backward and forward to understand which questions were used as filters to select respondents for later questions.

Key outcomes

The standardized metadata in DDI XML created in this project is an important legacy for ANES, GSS, and the research community. Since XML is machine-actionable, it can be used to develop any number of tools for data discovery, visualization, and dissemination. The tools developed for this project are merely examples, and more sophisticated tools will follow. Since the information is embedded in the XML and not in the program code, anyone can invent and implement a new tool. Moreover, any tool that operates on GSS and ANES will operate on any other data set with DDI metadata.

This project will make it easier for researchers to search and use two of the premier data collections in the social sciences. We have created online documentation for legacy versions of the ANES and GSS going back to 1948, which are still widely used. Researchers will also have better tools for data discovery, variable harmonization, and understanding questionnaire design. New tools and workflows will reduce the time from interview to data release, save money, and produce more accurate and detailed documentation in machine-actionable form.
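The question bank idea can be sketched in a few lines once question text is available as structured records. The example below is only an illustration and is not the project's tool. The records, study names, and question wording in `QUESTION_BANK` are invented, and `difflib` is used as a simple stand-in for whatever comparison method a real question bank would apply.

```python
from difflib import SequenceMatcher

# Hypothetical records; a real question bank would be loaded from the DDI metadata.
QUESTION_BANK = [
    {"study": "Study A", "variable": "TRUST1",
     "text": "How much do you trust the national government?"},
    {"study": "Study B", "variable": "GOVTRUST",
     "text": "How much do you trust the government in the capital?"},
    {"study": "Study B", "variable": "HAPPY",
     "text": "How happy would you say you are these days?"},
]


def search(bank, keyword):
    """Return every question whose text contains the keyword (case-insensitive)."""
    kw = keyword.lower()
    return [q for q in bank if kw in q["text"].lower()]


def wording_similarity(a, b):
    """Score between 0 and 1 for how closely two question wordings match."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


hits = search(QUESTION_BANK, "trust")
for q in hits:
    print(q["study"], q["variable"], q["text"])

score = wording_similarity(hits[0]["text"], hits[1]["text"])
print(f"wording similarity: {score:.2f}")
```

Nothing in the code is specific to one survey. Pointing it at records from another data set with the same fields would work without changes, which is the point the report makes about tools built on shared metadata.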

Agency: National Science Foundation (NSF)
Institute: Division of Social and Economic Sciences (SES)
Type: Standard Grant (Standard)
Application #: 1229967
Program Officer: Brian Humes
Project Start:
Project End:
Budget Start: 2012-09-15
Budget End: 2014-08-31
Support Year:
Fiscal Year: 2012
Total Cost: $498,518
Indirect Cost:
Name: National Opinion Research Center
Department:
Type:
DUNS #:
City: Chicago
State: IL
Country: United States
Zip Code: 60637