This proposal seeks funding to expand the Integrated Public Use Microdata Series (IPUMS) by adding demographic and geographic data describing the entire enumerated population of the U.S. from 1790 to 1930. The project will provide data on the characteristics of over 600 million persons, quadrupling the quantity of U.S. census microdata available for scientific research. The data will cover entire populations with full geographic detail, providing contextual information on neighborhood characteristics, including ethnic composition, demographic behavior, and population mobility. These data offer the earliest information available on key social and economic characteristics, and they will provide invaluable insight into processes of long-run demographic and economic change. The data will make a permanent and substantial addition to the nation's statistical infrastructure and will have far-reaching implications for research across the social and behavioral sciences. The project is made possible by the donation of a massive, high-quality, verified transcription of information in the U.S. censuses, prepared by two major genealogical organizations.
Converting this immense body of raw data into a format suitable for scientific analysis will require the following tasks: (1) classify and code geographic locations to be compatible with categories used in the published census returns; (2) assess completeness and accuracy of the data transcription; (3) convert alphabetic string data into numeric categories that are comparable over time; (4) employ new data cleaning software to identify and correct common enumeration and transcription errors; (5) develop documentation, including full descriptions of data processing methods, detailed analysis of comparability issues, and comprehensive machine-processable metadata; (6) incorporate the data into the IPUMS data access system for free dissemination to the scientific community; and (7) implement secure data protection and preservation policies. The project will be executed by a team of highly experienced researchers with exceptional proficiency in large-scale data creation, integration, and dissemination and will leverage cutting-edge technology to process an unprecedented volume of data at reasonable cost. The project is a collaboration of the Minnesota Population Center with the world's largest producers of genealogical data, allowing cost-effective use of scarce resources to develop shared infrastructure for population and health research.
This project will provide basic infrastructure for health and population research, education, and policy-making. It will allow research on fertility, mortality, family composition, life-course transitions, mobility, and the impact of neighborhood conditions on demographic behavior. The proposed work is directly relevant to the central mission of the NIH as the steward of medical and behavioral research for the nation: the new data will advance fundamental knowledge about population health and population dynamics and will spawn new methods of spatiotemporal analysis that can deepen understanding of the ongoing transformations of American society.