This application seeks funding to create a complete set of microdata describing socioeconomic characteristics of the U.S. population in 1940. The project will digitize critical information on income, education, housing, and employment, greatly increasing the usefulness of the 1940 census for answering fundamental scientific questions about health and demographic change. The 1940 census was the first to collect information on years of schooling completed, wage and salary income, hours worked last week, and weeks worked last year. Data on parental income and education are essential for assessing childhood socioeconomic status. Accordingly, these indicators will be invaluable for assessing the effect of early life conditions on health outcomes. Because the database will cover the entire population with full geographic detail, it will provide contextual information on childhood neighborhood characteristics, including labor-market conditions. More broadly, because these data offer the earliest information available on key social and economic characteristics, they will provide an important baseline for studies of demographic and economic change. The socioeconomic variables will make a permanent and substantial addition to the nation's statistical infrastructure and will have far-reaching implications for research across the social and behavioral sciences.
The project involves (1) transcription of over one billion keystrokes of data describing socioeconomic characteristics of all individuals present in the United States in 1940; (2) evaluation of data quality through random blind verification and comparison with published census returns; (3) data cleaning, including editing and imputation of inconsistent and missing data values; (4) development of a data dictionary to convert approximately 80,000 different open-ended descriptions of institutions into numeric classifications compatible with previous and subsequent census data; (5) development of documentation, including full descriptions of data processing methods, detailed analysis of comparability issues, and comprehensive machine-processable metadata; (6) incorporation of the additional variables into the Integrated Public Use Microdata Series (IPUMS) data access system for free dissemination to the scientific community; and (7) implementation of secure data protection and preservation policies. The project will be executed by a team of highly experienced researchers with exceptional proficiency in large-scale data creation, integration, and dissemination. The project is a collaboration of the Minnesota Population Center with the nation's largest producer of genealogical data, the Census Bureau, and the National Archives and Records Administration. This collaboration allows a cost-effective use of scarce resources to develop shared infrastructure for population and health research.

Public Health Relevance

This project will provide basic infrastructure for health and population research, education, and policy-making. It will allow study of the impact of early life conditions, including parental income and education, on later health and mortality. It will enable new kinds of spatial analysis, providing contextual information on childhood neighborhood characteristics, including labor-market conditions. The proposed work is directly relevant to the central mission of the NIH as the steward of medical and behavioral research for the nation: the new data will advance fundamental knowledge about population health and population dynamics.

Agency
National Institutes of Health (NIH)
Institute
Eunice Kennedy Shriver National Institute of Child Health & Human Development (NICHD)
Type
Research Project (R01)
Project #
4R01HD073967-05
Application #
9062472
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Clark, Rebecca L
Project Start
2012-08-15
Project End
2017-04-30
Budget Start
2016-05-01
Budget End
2017-04-30
Support Year
5
Fiscal Year
2016
Name
University of Minnesota Twin Cities
Department
Miscellaneous
Type
Schools of Arts and Sciences
DUNS #
555917996
City
Minneapolis
State
MN
Country
United States
Zip Code
55455
Ruggles, Steven; Fitch, Catherine; Roberts, Evan (2018) Historical Census Record Linkage. Annu Rev Sociol 44:19-37
Kugler, Tracy A; Fitch, Catherine A (2018) Interoperable and accessible census and survey data from IPUMS. Sci Data 5:180007
Roberts, Evan; Warren, John Robert (2017) Family structure and childhood anthropometry in Saint Paul, Minnesota in 1918. Hist Fam 22:258-290
Ruggles, Steven (2014) Big microdata for population research. Demography 51:287-297
Sobek, Matthew; Cleveland, Lara; Flood, Sarah et al. (2011) Big Data: Large-Scale Historical Infrastructure from the Minnesota Population Center. Hist Methods 44:61-68