Respondent-Driven Sampling (RDS) is a type of link-tracing network sampling used to study hard-to-reach human populations. Beginning with a convenience sample, respondents are given uniquely identified coupons to distribute to other population members, which make the recipients eligible to enroll. This design is effective at collecting large, diverse samples from many hard-to-reach populations. Despite this often highly effective sampling, statistical inference from RDS data is under-developed. Current estimators rely on strong assumptions that allow the data to be treated as a probability sample. This project develops methodology for RDS in three ways: (1) identifying sources of bias and elevated variance in real populations; (2) extending the network model-assisted inferential framework to address additional sources of bias; and (3) expanding the network model-assisted framework to arbitrarily large populations. In (1), the research will use simulated RDS samples from fully observed network data both to explore features of real-world network structures that induce bias or elevated variance in estimates and to calibrate diagnostics, computed from sampled data, that detect such features. Parts (2) and (3) involve advances to the network model-assisted estimator for RDS data. This is the most flexible estimator currently available for RDS data, but it is computationally burdensome. In (2), the project will extend the existing network model-assisted estimator to adjust for additional features of the network and sampling process. In (3), the project will develop a new variant of this estimator that is computationally practical in large populations.
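The simulation approach described in (1) can be illustrated with a minimal sketch: draw repeated RDS samples from a network whose every tie is known, then compare estimates with the known population value. Everything here is an illustrative assumption rather than the project's design: the random graph stands in for real network data, recruitment is without replacement with a fixed number of coupons per respondent passed to uniformly chosen unsampled neighbors, and the function names are hypothetical. The estimator shown is the standard Volz-Heckathorn (RDS-II) inverse-degree-weighted estimator, not the network model-assisted estimator this project extends.

```python
import random
from collections import deque


def make_network(n, p, seed=0):
    """Hypothetical stand-in for a fully observed network: an undirected
    random graph stored as a dict of adjacency sets."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def simulate_rds(adj, n_seeds, n_coupons, target, seed=0):
    """Simulate one RDS sample without replacement (assumed design): each
    respondent passes up to n_coupons coupons to randomly chosen neighbors
    who are not yet in the sample, until the target size is reached or
    recruitment dies out."""
    rng = random.Random(seed)
    seeds = rng.sample(sorted(adj), n_seeds)
    sampled = list(seeds)
    in_sample = set(seeds)
    queue = deque(seeds)  # respondents who still hold coupons
    while queue and len(sampled) < target:
        recruiter = queue.popleft()
        eligible = sorted(adj[recruiter] - in_sample)
        for recruit in rng.sample(eligible, min(n_coupons, len(eligible))):
            if len(sampled) >= target:
                break
            sampled.append(recruit)
            in_sample.add(recruit)
            queue.append(recruit)
    return sampled


def vh_estimate(sample, adj, trait):
    """Volz-Heckathorn (RDS-II) estimate of a population proportion:
    each respondent is weighted by 1 / degree. Degree is floored at 1 so
    that an isolated seed does not cause division by zero."""
    weights = {i: 1 / max(len(adj[i]), 1) for i in sample}
    num = sum(w for i, w in weights.items() if trait[i])
    return num / sum(weights.values())


if __name__ == "__main__":
    adj = make_network(300, 0.03, seed=1)
    mean_deg = sum(len(v) for v in adj.values()) / len(adj)
    # Hypothetical binary trait correlated with degree: RDS tends to reach
    # high-degree members more often, so the unweighted mean is typically
    # biased for such a trait.
    trait = {i: len(adj[i]) > mean_deg for i in adj}
    truth = sum(trait.values()) / len(adj)
    reps = 200
    naive_total = vh_total = 0.0
    for r in range(reps):
        s = simulate_rds(adj, n_seeds=5, n_coupons=3, target=60, seed=r)
        naive_total += sum(trait[i] for i in s) / len(s)
        vh_total += vh_estimate(s, adj, trait)
    print(f"true={truth:.3f} naive={naive_total / reps:.3f} "
          f"VH={vh_total / reps:.3f}")
```

Averaging each estimator over many simulated samples and subtracting the known population value gives a Monte Carlo estimate of its bias; the spread of the per-sample estimates likewise indicates its variance under the assumed network and recruitment process.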

Hard-to-reach populations are often on the disadvantaged side of disparities in wealth, opportunity, and acceptance, and at higher risk for negative social and health outcomes. In these populations, reduced social and health opportunities are coupled with difficulty in collecting traditional probability samples. Thus, RDS is widely used and is of special interest to social and behavioral scientists as well as public health officials. This project will allow practitioners to use RDS data confidently in a wider variety of contexts. Specifically, researchers will (a) better understand which network and sampling conditions might induce bias in their estimators and have better tools to find evidence of these features in their samples, (b) have more options for adjusting their estimators to account for non-ideal network and sampling conditions, and (c) be able to compute the resulting estimators more efficiently, making the methods practical even for very large target populations. The project is supported by the Methodology, Measurement, and Statistics Program and a consortium of federal statistical agencies as part of a joint activity to support research on survey and statistical methodology.

Agency: National Science Foundation (NSF)
Institute: Division of Social and Economic Sciences (SES)
Type: Standard Grant (Standard)
Application #: 1230081
Program Officer: Cheryl L. Eavey
Budget Start: 2012-09-15
Budget End: 2017-08-31
Fiscal Year: 2012
Total Cost: $199,938
Name: University of Massachusetts Amherst
City: Hadley
State: MA
Country: United States
Zip Code: 01035