This proposal is largely motivated by our involvement with the Botswana Combination Prevention Project (BCPP), an ongoing large-scale cluster-randomized human immunodeficiency virus (HIV) prevention trial conducted in 30 communities across Botswana. As in most HIV prevention studies, incomplete data on HIV status and nonresponse to queries about sexual behavior are important challenges the study currently faces, with data likely missing not at random and in complex patterns across individuals. Recognizing that existing statistical methods for missing data are largely ill-suited to fully address this important problem in HIV research, we propose to develop the next generation of missing data methods, going well beyond the current theory of identification and inference. Specifically, we propose (1) to develop a unified theory of identification that brings together recent developments in identification based on causal graphs with recent identification results from the statistics literature, which will allow us to establish conditions under which, in complex missing data settings such as the BCPP, one can untangle features of the underlying population that may be of scientific interest from features of the nonresponse process that are not necessarily of scientific interest; (2) to build on (1) to develop corresponding inverse-probability-weighted and doubly robust methods for statistical inference in the BCPP, where data are likely to be missing not at random and in complex patterns; (3) to develop novel semiparametric imputation methods that rely solely on assumptions encoded in the nonresponse process, thus allowing the complete data distribution in the BCPP to remain unscathed by the imputation process; and (4) to develop user-friendly software to facilitate widespread use of the methods developed in Aims 1-3, and to evaluate their performance in extensive simulation studies as well as in answering scientific queries of primary interest in the BCPP.

Public Health Relevance

The proposed project will develop the next generation of methods to address selection bias due to missing data in HIV prevention research. The public health impact of these methods promises to be significant given the ubiquity of incomplete data in HIV research and the inadequacy of existing methods to address the severity of this problem. The performance of the proposed methods will be compared and contrasted with that of existing methods both in extensive simulation studies and in a number of applications, including but not limited to (a) the evaluation of HIV prevalence in Botswana subject to missing data on HIV status; (b) the estimation of HIV incidence in Botswana using a cross-sectional survey subject to incomplete data; (c) the evaluation of the UNAIDS 90-90-90 goals in Botswana, i.e., evaluating the proportion of HIV-infected people who know their status, the proportion of people who know their status and are on treatment for HIV, and the proportion of people on treatment who are virologically suppressed; and (d) the estimation of the rate of concurrency of sexual relationships in the presence of data missing not at random and in complex patterns. Queries such as (a)-(d) will be investigated using methods developed in this grant and applied to the Botswana Combination Prevention Project, an ongoing large-scale cluster-randomized HIV prevention trial in Botswana in which the investigators are closely involved.

National Institutes of Health (NIH)
National Institute of Allergy and Infectious Diseases (NIAID)
Research Project (R01)
Study Section
AIDS Clinical Studies and Epidemiology Study Section (ACE)
Program Officer
Gezmu, Misrak
University of Pennsylvania
Biostatistics & Other Math Sci
Sch of Business/Public Admin
United States
Dukes, Oliver; Martinussen, Torben; Tchetgen Tchetgen, Eric J et al. (2018) On doubly robust estimation of the hazard difference. Biometrics :
Sun, BaoLuo; Tchetgen Tchetgen, Eric J (2018) On Inverse Probability Weighting for Nonmonotone Missing at Random Data. J Am Stat Assoc 113:369-379
Sun, BaoLuo; Perkins, Neil J; Cole, Stephen R et al. (2018) Inverse-Probability-Weighted Estimation for Monotone and Nonmonotone Missing Data. Am J Epidemiol 187:585-591
Wang, Linbo; Tchetgen Tchetgen, Eric (2018) Bounded, efficient and multiply robust estimation of average treatment effects using instrumental variables. J R Stat Soc Series B Stat Methodol 80:531-550
Nabi, Razieh; Shpitser, Ilya (2018) Fair Inference on Outcomes. Proc Conf AAAI Artif Intell 2018:1931-1940
Miao, Wang; Tchetgen Tchetgen, Eric (2017) Invited Commentary: Bias Attenuation and Identification of Causal Effects With Multiple Negative Controls. Am J Epidemiol 185:950-953
Tchetgen Tchetgen, Eric J; Wirth, Kathleen E (2017) A general instrumental variable framework for regression analysis with outcome missing not at random. Biometrics 73:1123-1131