This proposal seeks funding to make available a remarkable new data source for social and economic research. Volunteers working with the Church of Jesus Christ of Latter-day Saints (LDS) have invested approximately two million hours transcribing information from the 1880 U.S. Census of Population. This database, which covers the entire U.S. population, has the potential to become our most important resource for the study of the economic and social organization of late-nineteenth-century American society. The LDS holds the copyright to these data, but has agreed to make them freely available for academic use in exchange for modest assistance in cleaning the data. In addition to cleaning the data, we propose to convert them into a form suitable for statistical analysis. The needed work can be divided into ten main tasks: (1) correct several technical errors that were introduced inadvertently by the LDS in the course of data processing; (2) identify missing cases so the LDS can re-enter them; (3) correct flags distinguishing separate households and dwellings; (4) check for inconsistencies in family relationship, sex, marital status, and age, and make needed corrections; (5) create comprehensive data dictionaries to classify cases according to standardized coding systems for geographic location, group quarters, place of birth, and occupation; (6) allocate missing, illegible, and inconsistent data through logical rules and imputation procedures; (7) construct new variables to simplify statistical analysis; (8) develop documentation, including detailed descriptions of the cleaning and data processing methods and of the data collection procedures employed by the LDS; (9) integrate the database and documentation into the Integrated Public Use Microdata Series (IPUMS); and (10) disseminate the database via the Internet.
The late nineteenth century is a critical period in the study of fertility decline, urbanization, immigration, household composition, and occupational structure. The LDS 1880 database includes a wealth of information on these topics that can only be fully explored through the creation of a new microdata set. The 1880 census database will not only constitute an invaluable resource in its own right but will also enhance the value of the previously created historical microdata samples; used in combination, these microdata will constitute our most important resource for the study of nineteenth-century social structure. The database will be distributed via the "International Integrated Microdata Access System," an infrastructure project recently funded by the National Science Foundation, as well as through conventional data archives.
Ruggles, Steven (2014) Big microdata for population research. Demography 51:287-97
Sobek, Matthew; Cleveland, Lara; Flood, Sarah et al. (2011) Big Data: Large-Scale Historical Infrastructure from the Minnesota Population Center. Hist Methods 44:61-68
Ruggles, Steven (2011) Intergenerational Coresidence and Family Transitions in the United States, 1850-1880. J Marriage Fam 73:138-148
Goeken, Ron; Huynh, Lap; Lenius, Thomas et al. (2011) New Methods of Census Record Linking. Hist Methods 44:7-14
Ruggles, Steven (2009) Reconsidering the Northwest European Family System: Living Arrangements of the Aged in Comparative Historical Perspective. Popul Dev Rev 35:249-273