Key national and local health indicators rely on sample survey data collected from probability-based samples of the population. The inferential mechanism in surveys relies on obtaining responses from all eligible sample members, that is, on the absence or near absence of unit non-response. However, response rates in household surveys have declined and continue to decline. The inability to measure all sample members creates the potential for bias in survey estimates. This threat is real: in the National Health Interview Survey (NHIS), for example, non-respondents were 59 percent more likely to report being in poor or fair health (Khare, Mohadjer, Ezzati-Rice, and Waksberg, 1994). Such non-response bias can affect key estimates of prevalence rates, estimates of change over time, and estimates of the impact of introduced policies, leading to ill-informed policies and misallocation of government funds. Adjustments can be constructed, but the information available on the entire sample is usually very limited, especially in telephone surveys. Weighting is a commonly used method to correct for bias due to unit non-response. The effectiveness of this approach relies on the information available on all respondents and non-respondents in the sample. Unfortunately, such information is limited; in random-digit dialing (RDD) studies, it is often restricted to the aggregate geographic area of the telephone number. Additional information can be obtained to inform adjustments, but it is not available for the entire sample. Electronic information on individuals is progressively being amassed into databases. These auxiliary data include variables that are known correlates of health-related behaviors and unit non-response, providing conditions for the reduction of non-response bias. However, these data are not available for all individuals in the population. Alternative statistical procedures exist that can utilize incomplete auxiliary information.
A potential solution lies in methods developed for dealing with item non-response. These methods generally exploit the associations between variables and use the information contained in other variables to inform missing values. This permits the use of incomplete information: records from databases can be merged with survey samples and used to inform missing values for unit non-respondents. Furthermore, the statistical objective can be shifted from adjusting for non-response rates to reducing non-response bias, producing more efficient and unbiased estimates. This research has two sets of goals: (1) to evaluate the properties and usefulness of commercially available data, and (2) to evaluate a theoretically different approach to addressing unit non-response that can utilize such incomplete data. If successful, this approach can be a valuable tool for the reduction of bias in health-related surveys.
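
As a rough illustration of treating unit non-response like item non-response, the sketch below fills in missing outcomes for non-respondents with the respondent mean in the same auxiliary class, and falls back to the overall respondent mean when no auxiliary record could be matched. It is a minimal sketch under stated assumptions, not the method evaluated in this project: the field names `poor_health` and `aux`, the toy records, and the choice of single class-mean imputation are all hypothetical, and a real application would more likely use model-based or multiple imputation with survey weights.

```python
from collections import defaultdict

# Hypothetical toy sample. "poor_health" is None for unit non-respondents;
# "aux" is an auxiliary class from a (hypothetical) external database and is
# None when no record could be matched to the sample member.
sample = [
    {"poor_health": 1, "aux": "low"},
    {"poor_health": 0, "aux": "low"},
    {"poor_health": 0, "aux": "high"},
    {"poor_health": 0, "aux": "high"},
    {"poor_health": 0, "aux": "high"},
    {"poor_health": 0, "aux": None},
    {"poor_health": None, "aux": "low"},
    {"poor_health": None, "aux": "low"},
    {"poor_health": None, "aux": "low"},
    {"poor_health": None, "aux": None},
]


def respondent_mean(records):
    """Unadjusted estimate: mean outcome among respondents only."""
    values = [r["poor_health"] for r in records if r["poor_health"] is not None]
    return sum(values) / len(values)


def class_mean_imputed_estimate(records):
    """Full-sample mean after class-mean imputation for non-respondents."""
    # Collect respondent outcomes within each auxiliary class.
    by_class = defaultdict(list)
    for r in records:
        if r["poor_health"] is not None and r["aux"] is not None:
            by_class[r["aux"]].append(r["poor_health"])
    class_means = {k: sum(v) / len(v) for k, v in by_class.items()}

    # Fallback for non-respondents with no matched auxiliary record
    # (or a class that contains no respondents).
    overall = respondent_mean(records)

    total = 0.0
    for r in records:
        if r["poor_health"] is not None:
            total += r["poor_health"]
        else:
            total += class_means.get(r["aux"], overall)
    return total / len(records)


naive = respondent_mean(sample)
adjusted = class_mean_imputed_estimate(sample)
print(f"respondents only: {naive:.3f}, with imputation: {adjusted:.3f}")
```

In this toy data the non-respondents are concentrated in the auxiliary class with the higher rate of poor health, so the imputed estimate is higher than the respondents-only estimate. The adjustment helps only to the extent that the auxiliary variable is associated with both the outcome and the propensity to respond.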

Public Health Relevance

Much public health research and public health policy making rely on survey data from probability samples, yet inference from surveys is increasingly undermined by non-response. The purpose of this research is to test the use of publicly and commercially available auxiliary data to inform missing values. This research also employs statistical methods that can utilize such incomplete auxiliary data and produce more efficient estimates, allowing for better and more informed decision making.

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Exploratory/Developmental Grants (R21)
Project #
1R21CA140764-01A1
Application #
7896261
Study Section
Special Emphasis Panel (ZRG1-RPHB-K (51))
Program Officer
Breen, Nancy
Project Start
2010-04-01
Project End
2012-03-31
Budget Start
2010-04-01
Budget End
2011-03-31
Support Year
1
Fiscal Year
2010
Total Cost
$71,360
Indirect Cost
Name
Research Triangle Institute
Department
Type
DUNS #
004868105
City
Research Triangle
State
NC
Country
United States
Zip Code
27709