Baseline Socioeconomic Microdata for Population and Health Research

Ruggles, Steven; Warren, John

Abstract

This application seeks funding to create a complete set of microdata describing socioeconomic characteristics of the U.S. population in 1940. The project will digitize critical information on income, education, housing, and employment, greatly increasing the usefulness of the 1940 census for answering fundamental scientific questions about health and demographic change. The 1940 census was the first to collect information on years of schooling completed, wage and salary income, hours worked last week, and weeks worked last year. Data on parental income and education are essential for assessing childhood socioeconomic status. Accordingly, these indicators will be invaluable for assessing the effects of early life conditions on health outcomes. Because the database will cover the entire population with full geographic detail, it will provide contextual information on childhood neighborhood characteristics, including labor-market conditions. More broadly, because these data offer the earliest information available on key social and economic characteristics, they will provide an important baseline for studies of demographic and economic change. The socioeconomic variables will make a permanent and substantial addition to the nation's statistical infrastructure and will have far-reaching implications for research across the social and behavioral sciences.
The project involves (1) transcription of over one billion keystrokes of data describing socioeconomic characteristics of all individuals present in the United States in 1940; (2) evaluation of data quality through random blind verification and comparison with published census returns; (3) data cleaning, including editing and imputation of inconsistent and missing data values; (4) development of a data dictionary to convert approximately 80,000 different open-ended descriptions of institutions into numeric classifications compatible with previous and subsequent census data; (5) development of documentation, including full descriptions of data processing methods, detailed analysis of comparability issues, and comprehensive machine-processable metadata; (6) incorporation of the additional variables into the Integrated Public Use Microdata Series (IPUMS) data access system for free dissemination to the scientific community; and (7) implementation of secure data protection and preservation policies. The project will be executed by a team of highly experienced researchers with exceptional proficiency in large-scale data creation, integration, and dissemination. The project is a collaboration of the Minnesota Population Center with the nation's largest producer of genealogical data, the Census Bureau, and the National Archives and Records Administration. This collaboration allows a cost-effective use of scarce resources to develop shared infrastructure for population and health research.

Public Health Relevance

This project will provide basic infrastructure for health and population research, education, and policy-making. It will allow study of the impact of early life conditions, including parental income and education, on later health and mortality. It will enable new kinds of spatial analysis, providing contextual information on childhood neighborhood characteristics, including labor-market conditions. The proposed work is directly relevant to the central mission of the NIH as the steward of medical and behavioral research for the nation: the new data will advance fundamental knowledge about population health and population dynamics.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: Eunice Kennedy Shriver National Institute of Child Health & Human Development (NICHD)
Type: Research Project (R01)
Project #: 5R01HD073967-03
Application #: 8660558
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Clark, Rebecca L

Project Start: 2012-08-15
Project End: 2017-04-30
Budget Start: 2014-05-01
Budget End: 2015-04-30
Support Year: 3
Fiscal Year: 2014
Total Cost
Indirect Cost

Institution

Name: University of Minnesota Twin Cities
Department: Miscellaneous
Type: Schools of Arts and Sciences
DUNS #

City: Minneapolis
State: MN
Country: United States
Zip Code: 55455

Related projects


NIH 2016 R01 HD	Baseline Socioeconomic Microdata for Population and Health Research Ruggles, Steven; Warren, John Robert / University of Minnesota Twin Cities
NIH 2015 R01 HD	Baseline Socioeconomic Microdata for Population and Health Research Ruggles, Steven; Warren, John Robert / University of Minnesota Twin Cities	$585,641
NIH 2014 R01 HD	Baseline Socioeconomic Microdata for Population and Health Research Ruggles, Steven; Warren, John Robert / University of Minnesota Twin Cities
NIH 2013 R01 HD	Baseline Socioeconomic Microdata for Population and Health Research Ruggles, Steven; Warren, John Robert / University of Minnesota Twin Cities	$566,416
NIH 2012 R01 HD	Baseline Socioeconomic Microdata for Population and Health Research Ruggles, Steven; Warren, John Robert / University of Minnesota Twin Cities	$624,648

Publications

Ruggles, Steven; Fitch, Catherine; Roberts, Evan (2018) Historical Census Record Linkage. Annu Rev Sociol 44:19-37

Kugler, Tracy A; Fitch, Catherine A (2018) Interoperable and accessible census and survey data from IPUMS. Sci Data 5:180007

Roberts, Evan; Warren, John Robert (2017) Family structure and childhood anthropometry in Saint Paul, Minnesota in 1918. Hist Fam 22:258-290

Ruggles, Steven (2014) Big microdata for population research. Demography 51:287-297

Sobek, Matthew; Cleveland, Lara; Flood, Sarah et al. (2011) Big Data: Large-Scale Historical Infrastructure from the Minnesota Population Center. Hist Methods 44:61-68
