This proposal seeks continuation funding to expand the 1900 Public Use Microdata Sample of the U.S. Census almost six-fold from 806,000 cases to approximately 4,626,000 cases, representing 6 percent of the population. Similar high-density samples are available for the period since 1980. With the rapid decline in data storage and processing costs during the past decade, scholars are increasingly capitalizing on the power of these large samples. Historical research based on census microdata is also growing rapidly. The Integrated Public Use Microdata Series (IPUMS), a series of census microdata spanning the period from 1850 to 2000, has opened important new avenues of research that have expanded our understanding of migration, nuptiality, fertility, the family, and labor markets. A new sample for 1900 providing information on 6 percent of the American population will stimulate a broad range of new research topics and methodological approaches to the study of long-run demographic change, ranging from the study of the oldest-old to multilevel analysis. 
This project involves seven main tasks: 1) Data entry of information on approximately 3,820,000 individuals from the original enumerators' manuscripts; 2) Development of dictionaries to translate each alphabetic entry into numeric codes compatible with IPUMS samples for other census years; 3) Evaluation of sample quality through consistency checks, random blind verification of approximately 115,000 cases, and comparison with aggregate statistics in the published census volumes; 4) Editing, cleaning, and allocation of missing, illegible, and inconsistent data through logical rules and imputation procedures; 5) Construction of new variables on household composition, relationships within families, geographic characteristics, and socioeconomic status; 6) Development of documentation, including full descriptions of the sampling and data processing methods, a detailed analysis of comparability issues, and a user's guide; and 7) Incorporation of the sample and documentation into the IPUMS data access system.

National Institutes of Health (NIH)
Eunice Kennedy Shriver National Institute of Child Health & Human Development (NICHD)
Research Project (R01)
Study Section
Social Sciences, Nursing, Epidemiology and Methods 4 (SNEM)
Program Officer
Evans, V Jeffrey
University of Minnesota Twin Cities
Other Domestic Higher Education
United States
Ruggles, Steven (2014) Big microdata for population research. Demography 51:287-297
Ruggles, Steven (2011) Intergenerational Coresidence and Family Transitions in the United States, 1850-1880. J Marriage Fam 73:138-148
Sobek, Matthew; Cleveland, Lara; Flood, Sarah et al. (2011) Big Data: Large-Scale Historical Infrastructure from the Minnesota Population Center. Hist Methods 44:61-68
Ruggles, Steven (2009) Reconsidering the Northwest European Family System: Living Arrangements of the Aged in Comparative Historical Perspective. Popul Dev Rev 35:249-273