We propose to develop and evaluate simple, variable-specific indices of non-ignorable selection bias for researchers in the health sciences working with data collected from non-probability samples. Classical methods of scientific probability sampling and corresponding "design-based" frameworks for making statistical inferences about populations have long been used in the health sciences to advance scientific knowledge. The random selection of elements from a population of interest into a probability sample, where all population elements have a known non-zero probability of selection, ensures that elements included in the sample mirror the population in expectation. That is, for all variables of interest, the mechanism of selection of a subset of elements into the sample is ignorable, following the theoretical framework for missing data mechanisms originally introduced by Donald Rubin. Unfortunately, the modern survey research environment has had a severe negative impact on these "tried and true" methods of survey research that rely on probability samples: it has become harder and harder to contact sampled units, survey response rates continue to decline in all modes of administration (face-to-face, telephone, etc.), and the costs of collecting and maintaining scientific probability samples are steadily rising. Health science researchers are thus turning to volunteers and relatively inexpensive sources of "big data" collected from samples where the probabilities of selection are unknown (e.g., commercial databases, or social media platforms like Twitter and Facebook). A key question that arises from analyses of these non-probability samples is how good the resulting population inferences are. If the mechanisms underlying selection into the non-probability sample depend on the variables of research interest, then estimates of population parameters may well be biased.
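The last point can be illustrated with a small simulation. The sketch below is not the proposed index; it is a hypothetical example in which the outcome distribution, the logistic selection model, and the name `simulate_selection` are all assumptions chosen for illustration. It compares the mean of a self-selected sample with the population mean when selection does or does not depend on the outcome.

```python
import math
import random


def simulate_selection(n_pop=100_000, dependence=1.0, seed=0):
    """Return (population_mean, sample_mean) for a simulated non-probability sample.

    Assumptions (illustrative only): the outcome y is standard normal, and each
    unit joins the sample with probability logistic(-2 + dependence * y).
    dependence = 0 means selection is unrelated to y (ignorable here);
    dependence != 0 means selection depends on y itself (non-ignorable).
    """
    rng = random.Random(seed)
    population = [rng.gauss(0.0, 1.0) for _ in range(n_pop)]

    sample = []
    for y in population:
        p_select = 1.0 / (1.0 + math.exp(-(-2.0 + dependence * y)))
        if rng.random() < p_select:
            sample.append(y)

    pop_mean = sum(population) / len(population)
    sample_mean = sum(sample) / len(sample)
    return pop_mean, sample_mean


if __name__ == "__main__":
    for dep in (0.0, 1.0):
        pop_mean, sample_mean = simulate_selection(dependence=dep)
        print(f"dependence={dep}: population mean={pop_mean:.3f}, "
              f"sample mean={sample_mean:.3f}")
```

Under these assumptions, the sample mean should track the population mean when `dependence` is zero and drift upward when selection favors units with larger outcomes, which is the kind of bias the proposed indices are meant to quantify.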
The proposed research aims to draw on recent developments in the survey statistics literature related to assessment of the bias arising from non-ignorable nonresponse in surveys, and develop simple but novel model-based indices of non-ignorable selection bias for non-probability samples, in addition to methods for adjusting population inferences based on those indices. The proposed indices offer advantages over competing indices in terms of their focus on specific survey variables. They are also entirely model-based, enabling researchers to develop appropriate models relating their key survey measures to auxiliary variables that are known for the responding persons and available in aggregate for the target population. This research will have widespread impact, enabling quantification of (and adjustment for) the bias in estimates arising from non-probability samples.

Public Health Relevance

The collection of data on health outcomes from high-quality scientific probability samples, where everyone in a population has a known probability of selection into the sample and methods for making representative population inferences are well established, has become much more difficult and expensive in recent years. Researchers in the health sciences are thus turning to relatively inexpensive analyses of data collected from non-probability samples (e.g., Twitter users), but these samples do not provide researchers with a statistical framework for making inferences about larger populations. We aim to provide survey researchers with simple, theoretically sound indices of the amount of selection bias in survey estimates arising from non-probability samples, in addition to related methods of adjusting population estimates for this bias.

Agency
National Institutes of Health (NIH)
Institute
Eunice Kennedy Shriver National Institute of Child Health & Human Development (NICHD)
Type
Exploratory/Developmental Grants (R21)
Project #
5R21HD090366-02
Application #
9569276
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Bures, Regina M
Project Start
2017-09-23
Project End
2019-08-31
Budget Start
2018-09-01
Budget End
2019-08-31
Support Year
2
Fiscal Year
2018
Total Cost
Indirect Cost
Name
University of Michigan Ann Arbor
Department
Type
Organized Research Units
DUNS #
073133571
City
Ann Arbor
State
MI
Country
United States
Zip Code
48109