Data Augmentation and Multiple Imputation for Unit Nonresponse

Peytchev, Andrey

Abstract

Key national and local health indicators rely on sample survey data obtained by selecting probability-based samples from the population. The inferential mechanism in surveys relies on obtaining responses from all eligible sample members, that is, on the absence or near absence of unit non-response. However, response rates in household surveys have declined and continue to decline. The inability to measure all sample members creates the potential for bias in survey estimates. This threat is real: in the National Health Interview Survey (NHIS), for example, non-respondents were 59 percent more likely to report being in poor or fair health (Khare, Mohadjer, Ezzati-Rice, and Waksberg, 1994). Such non-response bias can affect key estimates of prevalence rates, of change over time, and of the impact of introduced policies, leading to ill-informed policies and misallocation of government funds.

Adjustments can be constructed, but the information available on the entire sample is usually very limited, especially in telephone surveys. Weighting is a commonly used method to correct for bias due to unit non-response. Its effectiveness depends on the information available on all respondents and non-respondents in the sample. Unfortunately, such information is limited; in random-digit dialing (RDD) studies, it is often restricted to the aggregate geographic area associated with the telephone number. Additional information can be obtained to inform adjustments, but it is not available for the entire sample. Electronic information on individuals is increasingly being amassed into databases. These auxiliary data include variables that are known correlates of health-related behaviors and of unit non-response, creating conditions for reducing non-response bias. However, these data are not available for all individuals in the population. Alternative statistical procedures exist that can use incomplete auxiliary information.
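To make the limitation of weighting concrete, here is a minimal sketch of a weighting-class non-response adjustment, one common form of non-response weighting (assumed here for illustration; the abstract does not specify a particular weighting scheme). Respondents in each cell absorb the base weight of the non-respondents in the same cell, so the cell variable (for example, the geographic area of an RDD number) must be known for every sample member. The names `Case` and `weighting_class_adjustment` and the toy sample are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Case(NamedTuple):
    cell: str        # weighting class, e.g. geographic area of an RDD number (hypothetical)
    weight: float    # base (design) weight
    responded: bool  # True if the sample member completed the interview


def weighting_class_adjustment(cases: list[Case]) -> list[float]:
    """Return non-response-adjusted weights.

    Each respondent's base weight is multiplied by the inverse of the
    weighted response rate in its cell; non-respondents get weight 0.
    Assumes every cell contains at least one respondent (in practice,
    empty cells are collapsed with neighbouring cells).
    """
    total_w = defaultdict(float)  # sum of base weights of all eligible cases
    resp_w = defaultdict(float)   # sum of base weights of respondents
    for c in cases:
        total_w[c.cell] += c.weight
        if c.responded:
            resp_w[c.cell] += c.weight
    return [
        c.weight * total_w[c.cell] / resp_w[c.cell] if c.responded else 0.0
        for c in cases
    ]


sample = [
    Case("A", 10.0, True), Case("A", 10.0, False),
    Case("A", 10.0, True), Case("A", 10.0, False),
    Case("B", 20.0, True), Case("B", 20.0, True), Case("B", 20.0, False),
]
adjusted = weighting_class_adjustment(sample)
print(adjusted)  # [20.0, 0.0, 20.0, 0.0, 30.0, 30.0, 0.0]
```

The adjusted weights still sum to the total base weight (100 here), but the adjustment can only remove bias to the extent that the cell variable predicts both response and the survey outcomes, which is why richer auxiliary data matter.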
A potential solution lies in methods developed for item non-response. These methods generally exploit the associations between variables, using the information contained in observed variables to inform missing values. This permits the use of incomplete information: records from databases can be merged onto survey samples and used to inform missing values for unit non-respondents. Furthermore, the statistical objective can be shifted from adjusting for non-response rates to reducing non-response bias, producing more efficient and less biased estimates. This research has two goals: (1) to evaluate the properties and usefulness of commercially available data, and (2) to evaluate a theoretically different approach to addressing unit non-response that can use such incomplete data. If successful, this approach can be a valuable tool for reducing bias in health-related surveys.
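The idea of treating unit non-respondents' survey values as missing data can be illustrated with a simplified multiple-imputation sketch. It is not the project's data augmentation procedure. It assumes a single auxiliary variable `x` (standing in for a merged database field) that is observed for every sample member, a normal linear model for the outcome `y`, and simple random sampling. Parameter uncertainty is approximated by bootstrapping respondents, and the completed-data means are combined with Rubin's rules. All function names and the simulated data are hypothetical.

```python
import random
import statistics


def fit_ols(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual SD)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, (rss / (len(xs) - 2)) ** 0.5


def mi_mean(x_resp, y_resp, x_nonresp, m=20, seed=1):
    """Multiply impute y for unit non-respondents from auxiliary x and
    combine the m completed-data means with Rubin's rules.
    Returns (point estimate, total variance)."""
    rng = random.Random(seed)
    n_r = len(x_resp)
    n = n_r + len(x_nonresp)
    ests, within = [], []
    for _ in range(m):
        # Bootstrap respondents so parameter uncertainty propagates
        # into the imputations (an approximation to "proper" imputation).
        idx = [rng.randrange(n_r) for _ in range(n_r)]
        a, b, sd = fit_ols([x_resp[i] for i in idx], [y_resp[i] for i in idx])
        # Impute prediction plus a random residual, not the prediction alone.
        y_imp = [a + b * x + rng.gauss(0.0, sd) for x in x_nonresp]
        completed = list(y_resp) + y_imp
        ests.append(statistics.fmean(completed))
        within.append(statistics.variance(completed) / n)  # SRS variance of the mean
    q_bar = statistics.fmean(ests)
    w_bar = statistics.fmean(within)   # within-imputation variance
    b_m = statistics.variance(ests)    # between-imputation variance
    return q_bar, w_bar + (1 + 1 / m) * b_m


# Simulated example: response is more likely when x > 0 (missing at random given x).
rng = random.Random(42)
xs = [rng.gauss(0, 1) for _ in range(2000)]
ys = [2 + 3 * x + rng.gauss(0, 1) for x in xs]
responded = [rng.random() < (0.9 if x > 0 else 0.3) for x in xs]

x_r = [x for x, r in zip(xs, responded) if r]
y_r = [y for y, r in zip(ys, responded) if r]
x_nr = [x for x, r in zip(xs, responded) if not r]

full_mean = statistics.fmean(ys)  # known only because the data are simulated
cc_mean = statistics.fmean(y_r)   # respondents-only estimate
est, var = mi_mean(x_r, y_r, x_nr)
print(f"full {full_mean:.2f}  respondents-only {cc_mean:.2f}  MI {est:.2f} (SE {var ** 0.5:.3f})")
```

In this setup the respondents-only mean is biased upward because respondents over-represent large `x`, while the imputed estimate recovers the full-sample mean. When the auxiliary variable is itself missing for part of the sample, as with the commercial data targeted here, `x` and `y` would need to be imputed jointly, for example by alternating imputation and parameter-draw steps as in data augmentation; this sketch omits that step.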

Public Health Relevance

Much public health research and policymaking rely on survey data from probability samples, yet inference from surveys is increasingly undermined by non-response. The purpose of this research is to test the use of publicly and commercially available auxiliary data to inform missing values. It also employs statistical methods that can use such incomplete auxiliary data to produce more efficient estimates, supporting better-informed decision making.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Exploratory/Developmental Grants (R21)
Project #: 5R21CA140764-02
Application #: 8044021
Study Section: Special Emphasis Panel (ZRG1-RPHB-K (51))
Program Officer: Breen, Nancy

Project Start: 2010-04-01
Project End: 2014-03-31
Budget Start: 2011-04-01
Budget End: 2014-03-31
Support Year: 2
Fiscal Year: 2011
Total Cost: $69,407

Institution

Name: Research Triangle Institute
DUNS #: 004868105

City: Research Triangle
State: NC
Country: United States
Zip Code: 27709

Related projects


NIH 2011 R21 CA	Data Augmentation and Multiple Imputation for Unit Nonresponse Peytchev, Andrey Alexandrov / Research Triangle Institute	$69,407
NIH 2010 R21 CA	Data Augmentation and Multiple Imputation for Unit Nonresponse Peytchev, Andrey Alexandrov / Research Triangle Institute	$71,360
