This project will develop record linkage methods necessary to create an unprecedented data resource covering the American population over seven decades. Specifically, this project will develop new strategies for placing unique protected identification keys (PIKs) on twentieth-century census records and will evaluate the results and optimize the data for population and health research. These strategies will facilitate linking census, survey, and administrative records to create an integrated database allowing life-course and intergenerational analysis of health and wellbeing. Within a secure data environment, the Census Bureau assigns PIKs to many recent census and survey datasets, which allows it to uniquely identify and link individuals across data sources to improve data quality and program efficiency while maintaining confidentiality. This project proposes research to achieve PIK rates on 1940 Census data that approach those the Bureau obtains on recent data. If successful, matching the 1940 cross-sectional data with recent cross-sectional and panel data will allow the research community to (1) construct longitudinal data on individuals over long periods of time; (2) construct longitudinal data on related individuals (siblings, and parents and children) over long periods of time; and (3) construct data on multiple generations of families (dynasties). Such data will be used to study fundamental issues of American society, including the effects of early-life living conditions on later-life health outcomes and the intergenerational transfer of wealth, health, and human capital. The 1940 Census is an excellent test bed for developing algorithms for assigning PIKs to earlier census data. It is the most recent decennial census for which the original manuscripts are available under the Census Bureau's 72-year rule for data release.
Names and addresses, as well as a host of demographic information for individuals and their household members, are easily accessible through IPUMS data, providing information that can potentially be used to uniquely identify individuals in other administrative data sources. This pilot project will evaluate: (1) the overall PIK rate of the 1940 Census using algorithms developed for recent census data, including how the PIK rate varies with demographic characteristics, especially age, sex, and race; (2) how additional data and new methods can be used to improve the PIK rate on pre-2000 data, including the use of Social Security data used to administer the OASDI program and military enlistment records; (3) the tradeoff between bias and completeness introduced by various matching methods; and (4) econometric methods for using records that are matched not uniquely but to a small number of candidate individuals. Findings from this study will inform future efforts to develop a data infrastructure program linking a range of data sources on individuals and families over long periods of time to study life-cycle and intergenerational issues.

Public Health Relevance

This project explores methods of constructing data on individuals from birth until death and constructing data on families across generations. There is increasing evidence that insults to the body early in life, even in utero, can have lasting effects on adult health, as well as evidence that certain diseases run in families. To date, no data have followed the entire population in a way that allows researchers to study these questions on large representative samples, an important gap in our knowledge that this project fills.

Agency
National Institutes of Health (NIH)
Institute
Eunice Kennedy Shriver National Institute of Child Health & Human Development (NICHD)
Type
Exploratory/Developmental Grants (R21)
Project #
5R21HD087914-02
Application #
9276711
Study Section
Social Sciences and Population Studies A Study Section (SSPA)
Program Officer
Bures, Regina M
Project Start
2016-05-20
Project End
2018-02-28
Budget Start
2017-03-01
Budget End
2018-02-28
Support Year
2
Fiscal Year
2017
Total Cost
$89,154
Indirect Cost
$21,654
Name
National Bureau of Economic Research
Department
Type
Research Institutes
DUNS #
054552435
City
Cambridge
State
MA
Country
United States
Zip Code
02138