Researchers will soon have access to the world's largest individual-level population database, comprising two billion individual records describing the characteristics of Americans enumerated in the U.S. censuses taken between 1790 and 2010. Most of these data are already in digital form; with support from NICHD, NIA, and NSF, they are being processed and will soon be available in an integrated format to the scientific community. The data series covers the entire enumerated population with full geographic detail. It provides the most comprehensive view of long-run population dynamics available for any place in the world and has the potential to transform our understanding of processes of demographic and economic change. This proposal seeks funding to realize this potential by filling a major gap in the series. The midsection of the data series, 1900 to 1930, is missing key information on socioeconomic and demographic characteristics, including variables describing fertility, mortality, marriage, economic activities, education, immigration, and housing. Through collaboration with the world's largest genealogical firm, it is now feasible to fill this gap and complete the data series. The data expansion will make a permanent and substantial addition to the nation's statistical infrastructure and will have far-reaching implications for health-related research across the social and behavioral sciences.
The project involves (1) transcription of 5.8 billion keystrokes of data describing demographic and economic characteristics of all individuals enumerated in the United States censuses between 1900 and 1930; (2) evaluation of data quality through random blind verification and comparison with published census returns; (3) data cleaning, including editing and imputation of inconsistent and missing data values; (4) development of data dictionaries to classify twenty million different open-ended descriptions of occupations, industries, languages, and institutions into numeric classifications compatible with previous and subsequent census data; (5) preparation of documentation, including full descriptions of data processing methods, detailed analysis of comparability issues, and comprehensive machine-processable metadata; (6) incorporation of the additional variables into the Integrated Public Use Microdata Series (IPUMS) data access system for free dissemination to the scientific community; and (7) implementation of secure data protection and preservation policies. The project will be executed by a team of highly experienced researchers with exceptional proficiency in large-scale data curation, integration, and dissemination. The collaboration of the Minnesota Population Center with the nation's largest producer of genealogical data allows a cost-effective use of scarce resources to develop shared infrastructure for population and health research.

Public Health Relevance

This project will provide basic infrastructure for health and population research, education, and policy-making. It will allow research on fertility, mortality, family composition, life-course transitions, economic and geographic mobility, and the impact of neighborhood conditions on health and demographic behavior; it will enable unprecedented analyses of long-run population dynamics across generations of change. The proposed work is directly relevant to the central mission of the NIH as the steward of medical and behavioral research for the nation: the new data will advance fundamental knowledge about population health and population dynamics and will spawn new methods of spatiotemporal analysis that can deepen understanding of the ongoing transformations of American society.

Agency: National Institutes of Health (NIH)
Institute: Eunice Kennedy Shriver National Institute of Child Health & Human Development (NICHD)
Type: Research Project (R01)
Project #: 1R01HD083829-01
Application #: 8858360
Study Section: Special Emphasis Panel (ZRG1-PSE-C (03))
Program Officer: Bures, Regina M
Project Start: 2015-03-16
Project End: 2020-02-29
Budget Start: 2015-03-16
Budget End: 2016-02-29
Support Year: 1
Fiscal Year: 2015
Total Cost: $624,616
Indirect Cost: $210,777

Name: University of Minnesota Twin Cities
Department:
Type: Organized Research Units
DUNS #: 555917996
City: Minneapolis
State: MN
Country: United States
Zip Code: 55455

Kugler, Tracy A; Fitch, Catherine A (2018) Interoperable and accessible census and survey data from IPUMS. Sci Data 5:180007
Roberts, Evan; Warren, John Robert (2017) Family structure and childhood anthropometry in Saint Paul, Minnesota in 1918. Hist Fam 22:258-290