Hidden or hard-to-reach populations such as sex workers, men who have sex with men, or people who inject drugs bear a disproportionately high burden of adverse health outcomes, yet they are among the most difficult to study. Members of these groups are often socially stigmatized or criminalized, so potential subjects cannot be directly enumerated and random sampling is usually impossible. For this reason, researchers have developed survey techniques that recruit subjects by tracing social network links between individuals. The most popular technique is respondent-driven sampling (RDS), which is widely used in epidemiological and clinical research on HIV, HBV, HCV, and syphilis; on tobacco, alcohol, and illicit drug use; and on access to treatment. RDS is also used by the CDC for HIV surveillance in the US, and by UNAIDS/WHO internationally. Remarkably, most of the network information contained in these samples is either discarded or misapplied by the standard approaches used to estimate characteristics of the target population from RDS data. Methodological research on RDS focuses on two related inferential targets: social network characteristics (e.g., clustering, degree, centrality) and population-level quantities (e.g., HIV prevalence, total population size). However, accurate estimation of network structures, disease rates, and risk factors in high-risk hidden and hard-to-reach populations remains a major unsolved problem in public health. In this proposal I outline a plan to develop rigorous methodology for social epidemiology from social link-tracing designs in hidden and hard-to-reach populations. The key insight in this work is that RDS reveals structural information about the social network of the target population that can be used to dramatically improve epidemiological inference.
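To illustrate what a "standard approach" typically uses, consider the widely applied Volz-Heckathorn (RDS-II) estimator of a population proportion, which weights each respondent by the inverse of their self-reported network degree. The Python sketch below is an illustration only, not the method proposed in this project; the function name `rds_ii_estimate` and the toy data are hypothetical. Note that the estimator takes only each respondent's outcome and degree as input, so the recruitment tree (who recruited whom) plays no role in the estimate.

```python
from typing import Sequence


def rds_ii_estimate(outcomes: Sequence[int], degrees: Sequence[float]) -> float:
    """Sketch of the Volz-Heckathorn (RDS-II) estimate of a population proportion.

    Assumes each respondent's inclusion probability is roughly proportional
    to their self-reported degree, so each respondent is weighted by 1/degree.
    Recruitment links between respondents are not used anywhere.
    """
    if len(outcomes) != len(degrees):
        raise ValueError("outcomes and degrees must have the same length")
    if any(d <= 0 for d in degrees):
        raise ValueError("degrees must be positive")
    weights = [1.0 / d for d in degrees]
    weighted_positives = sum(w * y for w, y in zip(weights, outcomes))
    return weighted_positives / sum(weights)


# Hypothetical toy data: HIV status (1 = positive) and self-reported degree.
hiv_status = [1, 0, 0, 1, 0, 0]
reported_degree = [10, 2, 4, 20, 5, 4]

naive = sum(hiv_status) / len(hiv_status)
weighted = rds_ii_estimate(hiv_status, reported_degree)
print(f"naive sample proportion: {naive:.3f}")
print(f"RDS-II estimate:         {weighted:.3f}")
```

In this toy sample the only positive respondents report high degrees, so inverse-degree weighting pulls the estimate (about 0.111) well below the naive sample proportion (about 0.333).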
I will begin by showing that existing statistical approaches to analyzing data from link-tracing studies rely on unrealistic assumptions, neglect important observable data, and produce seriously biased estimates. By rigorously characterizing the observed and missing network data for each sampling process, I will provide statistical and mathematical tools that allow researchers to leverage the network data revealed by RDS. The approach allows accurate estimation of population averages (e.g., HIV prevalence), assessment of risk factors associated with epidemiological outcomes using network regression, hidden population size estimation, and geospatial mapping of risk and health outcomes. This network-based perspective is a radical departure from established approaches to RDS and has the potential to revolutionize the way epidemiologists collect and analyze data from surveys of hidden and hard-to-reach risk groups. Finally, I will develop free, open-source, web-based software for the design and analysis of RDS studies that will be available to anyone anywhere in the world. Preliminary application of these ideas to empirical studies in real-world risk populations has already yielded promising results. The proposed work is innovative because it leverages previously neglected network information collected by every RDS study and uses it to dramatically improve the accuracy and precision of population-level estimates for key risk populations in public health.

Public Health Relevance

Understanding the social context of health outcomes and disease risk factors is a major focus in epidemiological research on vulnerable groups. Respondent-driven sampling (RDS) is the most common procedure for recruiting participants in epidemiological studies of hidden and hard-to-reach populations, but most of the network information contained in these samples is either discarded or misapplied in standard statistical approaches. The proposed work leverages the network information revealed by RDS to dramatically improve epidemiological and public health studies of the most at-risk populations.

National Institutes of Health (NIH)
Eunice Kennedy Shriver National Institute of Child Health & Human Development (NICHD)
NIH Director’s New Innovator Awards (DP2)
Study Section: Special Emphasis Panel (ZRG1-MOSS-C (56)R)
Program Officer: Lee, Sonia S
Yale University
Public Health & Prev Medicine
Schools of Medicine
New Haven
United States
Morozova, Olga; Cohen, Ted; Crawford, Forrest W (2018) Risk ratios for contagious outcomes. J R Soc Interface 15:
Gonsalves, Gregg S; Crawford, Forrest W; Cleary, Paul D et al. (2018) An Adaptive Approach to Locating Mobile HIV Testing Services. Med Decis Making 38:262-272
Gonsalves, Gregg S; Crawford, Forrest W (2018) Dynamics of the HIV outbreak and response in Scott County, IN, USA, 2011-15: a modelling study. Lancet HIV 5:e569-e577
Liu, Yiyi; Crawford, Forrest W (2018) Estimating dose-specific cell division and apoptosis rates from chemo-sensitivity experiments. Sci Rep 8:2705
Crawford, Forrest W; Ho, Lam Si Tung; Suchard, Marc A (2018) Computational methods for birth-death processes. Wiley Interdiscip Rev Comput Stat 10:
Pachankis, John E; Hatzenbuehler, Mark L; Wang, Katie et al. (2018) The Burden of Stigma on Health and Well-Being: A Taxonomy of Concealment, Course, Disruptiveness, Aesthetics, Origin, and Peril Across 93 Stigmas. Pers Soc Psychol Bull 44:451-474
Bazazi, Alexander R; Vijay, Aishwarya; Crawford, Forrest W et al. (2018) HIV Testing and awareness of HIV status among people who inject drugs in greater Kuala Lumpur, Malaysia. AIDS Care 30:59-64
Culbert, Gabriel J; Crawford, Forrest W; Murni, Astia et al. (2017) Predictors of Mortality within Prison and after Release among Persons Living with HIV in Indonesia. Res Rep Trop Med 8:25-35
Chen, Lin; Karbasi, Amin; Crawford, Forrest W (2016) Estimating the Size of a Large Network and its Communities from a Random Sample. Adv Neural Inf Process Syst 29:3072-3080