This is a pilot project to provide enhanced access to two seminal social science data resources - the American National Election Studies (ANES) and General Social Survey (GSS) - by creating structured, machine-actionable metadata and a portal populated with new tools for data discovery and analysis. The project will also analyze the current workflows that produce the ANES and GSS data and make recommendations for transitioning to metadata-driven processes to streamline data production and guard against the metadata loss that currently occurs.
A team including representatives from the ANES, GSS, and ICPSR will develop rich metadata in Data Documentation Initiative (DDI) XML, a well-defined standard for the social sciences. This work will entail retrofitting all existing documentation, which is currently held in disparate formats, to a uniform XML structure. A sample of the ANES and GSS documentation will be enriched with detailed information on provenance, universe, and other context that accumulates across the data life cycle. This sample will also include information to facilitate comparison and harmonization. All documentation created for the project will be freely available.
The Metadata Portal for the Social Sciences will demonstrate DDI-based open-source tools for advanced searching, dynamic codebooks, question banks, harmonization, and other functions. The portal will also feature links to bibliographic citations for both surveys and will provide opportunities for researchers and others to comment and interact. An important aspect of the project will be to re-envision the workflows currently used to produce the ANES and the GSS and to lay the groundwork for new metadata-driven workflows to realize a more seamless "interview to internet" process based on DDI and the Generic Statistical Business Process Model (GSBPM).
This project will facilitate transparent and user-friendly access to the ANES and the GSS to enable expert use of the data as well as exploration by more novice users. Production transparency in terms of how data came into being is essential, and this project will provide structured metadata on provenance of variables as well as detailed universe statements to permit users to understand the routing patterns for specific respondents and missing data. Many researchers are interested in determining comparability of data items and questions over time, and this project will demonstrate ways to assess comparability for these key time series. In general, the project will provide more information about the ANES and the GSS data than researchers have had access to in the past. Improving the ANES and GSS workflows will lead to the automated capture of more metadata "upstream" that can be made available across the life cycle.
Improved access to data through better search, extraction, and analysis tools will enable greater participation across all segments of society interested in democratic process and social trends. The demonstration portal and its tools will illustrate the potential of using structured documentation as a foundation for tool development and will be extensible to other surveys, leading to improved accessibility for other social science data resources. The large base of metadata and the open-source applications developed for this project will encourage software developers to create new ways to access ANES, GSS, and other data. New workflows will focus on metadata re-use over the life cycle, leading to greater efficiencies and cost savings in creating DDI metadata for all social science data projects.
This two-year project has developed new tools and resources to improve access to the American National Election Studies (ANES) and the General Social Survey (GSS), which are among the most widely used sources of data about American society. The ANES has been conducting public opinion surveys in presidential election years since 1948. The GSS is distributed as a cumulative file spanning the years 1972 to the most recent wave (currently 2012). Since 1972 the GSS has asked a standard core of demographic, behavioral, and attitudinal questions, plus topics of special interest. By compiling standardized machine-readable metadata ("data about data") for all ANES and GSS surveys, we have enabled the application of new online services that assist users in finding and retrieving data. Users can now search for specific questions among more than 79,521 variables in 58 ANES surveys and the cumulative GSS 1972-2012 dataset, which contains 5,558 variables. A Question Bank allows users to compare questions and results across surveys, and a Data Extraction tool assists users in compiling customized subsets. We have also created a consolidated variable classification that allows users to find concepts across all waves of both studies. The tools developed for this project point to new ways for researchers to discover data. Currently, if a researcher wants to find a specific variable or combination of variables, she must download and examine codebooks and survey instruments for each potential source of data. We have demonstrated that this entire process can be provided online in ways that save time and provide much better information. Our examination of the workflows of the GSS and ANES has resulted in a new model for the continuous capture of metadata, which promises to reduce costs and produce higher quality documentation. While the production of research data is a highly automated process, production of metadata is still manual, tedious, and repetitive.
We have already initiated the process of creating a new standard for describing data transformations, which is required for automating the capture of metadata during the final stages of data production.