We do not have a broadly efficacious vaccine against HIV, a virus that causes approximately 2 million new infections each year. Current proof-of-concept studies using broadly neutralizing antibodies (bnAbs) against HIV aim to understand how prevention varies with genotypic characteristics of the virus. Because an exhaustive search over all genotypic characteristics leaves little statistical power to detect effects after adjusting for multiple comparisons, researchers typically pre-specify a small number of features to focus on. There is growing interest in using machine learning-based methods both to corroborate prior understanding and to suggest new genotypic characteristics that are important in predicting the sensitivity of HIV to bnAbs. While machine learning-based methods have the potential to yield valid predictive models, issues remain in using these methods to estimate importance. The proposed research will address three such issues: developing a model-free variable importance measure, incorporating information from complex sampling designs, and providing valid statistical inference both when a genotypic feature is truly important and when it is not.
First, the main classical tool for evaluating the importance of characteristics is the ANOVA decomposition, which makes strong modeling assumptions. Machine learning-based methods make minimal assumptions; however, these methods do not generally admit valid statistical inference, and the importance estimates are intimately tied to the technique employed. We will use ideas from the theory of semiparametric estimation and inference to develop a model-free measure of variable importance with valid confidence intervals for the true importance. Second, many HIV vaccine trials incorporate a nested case-control study, in which additional information is measured on a subset of the trial participants. Estimating importance using only the subset ignores information from the remaining participants, resulting in a loss of efficiency and potentially introducing bias into the variable importance estimates. The proposed research will develop methods that properly account for the sampling design. Finally, to determine whether a set of features can be excluded from further analyses, we need a procedure for testing whether the feature set truly has no importance. Hypothesis testing using machine learning-based methods is challenging, but we will build on recent advances in semiparametric inference to develop valid procedures for hypothesis testing in the context of variable importance. By combining advances in machine learning technology with ideas from semiparametric estimation and inference, we will determine which feature sets are important in predicting the sensitivity of HIV to bnAbs. In addition to yielding a deeper understanding of HIV neutralization, this information will allow researchers to make the best possible use of data from current clinical trials. This, in turn, could lead either to a shorter time to an HIV vaccine or to new bnAbs in the research pipeline that are more broadly efficacious or more potent. Any of these outcomes would transform preventative care for patients at risk of HIV infection.

Public Health Relevance

Patients living with HIV or AIDS usually require a combination of antiretroviral therapy (ART) and supportive care for life. Understanding how features of the HIV genotype explain the susceptibility of HIV to neutralization by broadly neutralizing antibodies (bnAbs) could lead to a broadly efficacious vaccine against HIV infection, either by using a combination of bnAbs or by developing more potent or longer-lasting bnAbs that target certain important regions of the HIV genotype. These advances have the potential to have a large public health impact by reducing the number of incident HIV infections, thus reducing the use of ART and the need for supportive care.

National Institutes of Health (NIH)
National Institute of Allergy and Infectious Diseases (NIAID)
Predoctoral Individual National Research Service Award (F31)
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Gezmu, Misrak
University of Washington
Biostatistics & Other Math Sci
Schools of Public Health
United States