(Shepherd and Shaw, R01, Statistical methods for correlated outcome and covariate errors in studies of HIV/AIDS) There is growing interest in using administrative electronic health record (EHR) data and other routinely collected data sources as cost-effective means to support HIV/AIDS research. Validation of observational cohort and EHR data has demonstrated substantial errors in these types of data. Errors may occur in failure and censoring times (e.g., time from ART initiation to clinical events), in event classifications, and in covariates (e.g., CD4 count at ART initiation), and the magnitudes of the errors in these variables can be strongly correlated. These correlated errors can bias estimation. Ideally, researchers could validate a subsample of their data and use information learned from this subsample to improve estimation for the entire cohort, thereby obtaining valid estimates without validating the entire database. However, the current lack of methods and software to correct for these types of errors in time-to-event outcomes is a major barrier to correct inference with these data. There is also little guidance on which records and variables to validate to make the best use of limited resources. This project will create novel statistical methods to reduce or eliminate bias caused by correlated errors in failure-time outcomes and associated covariates. The developed methods will use information on the structure of the measurement error, gained from data validation or audit subsets, to adjust estimation and correct for errors that remain in the unvalidated data. The project will develop and examine extensions of regression calibration, corrected scores, and multiple imputation methods, augmented with raking techniques, to address these correlated errors.
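The idea of learning from a validated subsample and carrying that information to the whole cohort can be illustrated with the simplest form of regression calibration. The sketch below is a hypothetical illustration, not the project's method: it assumes a linear outcome model, a single covariate with classical additive error X* = X + U independent of everything else, an error-free outcome, and a simple random validation subsample in which the true X is known. It therefore ignores the time-to-event outcomes and correlated outcome and covariate errors that the project targets. The simulated data, the function names `ols`, `simulate` and `regression_calibration`, and all parameter values are invented for illustration.

```python
import random
import statistics


def ols(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def simulate(n=5000, n_val=500, beta=2.0, seed=1):
    """Hypothetical data: true X, error-prone X* = X + U, outcome Y = beta * X + e."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0, 1) for _ in range(n)]
    x_obs = [x + rng.gauss(0, 1) for x in x_true]  # classical additive error, var(U) = var(X)
    y = [beta * x + rng.gauss(0, 1) for x in x_true]
    val_idx = rng.sample(range(n), n_val)  # simple random validation subsample
    return x_true, x_obs, y, val_idx


def regression_calibration(x_obs, y, val_idx, x_val):
    """Slope of Y on E[X | X*], with E[X | X*] fitted in the validated records only."""
    # Step 1: in the validation subsample, regress the validated X on the error-prone X*.
    a, b = ols([x_obs[i] for i in val_idx], x_val)
    # Step 2: replace X* with its calibrated prediction for every record in the cohort.
    x_hat = [a + b * x for x in x_obs]
    # Step 3: fit the outcome model using the calibrated covariate.
    return ols(x_hat, y)[1]


if __name__ == "__main__":
    x_true, x_obs, y, val_idx = simulate()
    x_val = [x_true[i] for i in val_idx]  # only these true values are assumed known
    naive = ols(x_obs, y)[1]
    corrected = regression_calibration(x_obs, y, val_idx, x_val)
    print(f"naive slope: {naive:.2f}  calibrated slope: {corrected:.2f}  (true slope 2.0)")
```

Because var(U) = var(X) here, the naive slope should be attenuated to roughly half the true value, while the calibrated slope should be close to 2. Standard errors from the final fit ignore the uncertainty in the calibration step; bootstrapping both steps together is one common remedy.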
The project will also develop efficient data validation and audit sampling designs that use adaptive, multi-wave sampling to target successive validation and audit subsets toward informative subgroups of patients. Open-source tools will be developed to allow researchers to implement these methods and study designs. The methods and designs will be applied to data from the International Epidemiologic Databases to Evaluate AIDS (IeDEA) to estimate the incidence of tuberculosis and Kaposi's sarcoma, along with their outcomes, risk factor associations, and temporal trends, among persons living with HIV in East Africa and Latin America.
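One standard way to target later validation waves at informative subgroups is Neyman allocation, which assigns validations to strata in proportion to N_h S_h (stratum size times the within-stratum standard deviation), with S_h estimated from earlier waves. The sketch below is an assumption for illustration, not the project's design: it allocates each new validation greedily to the stratum where it most reduces the variance term N_h^2 S_h^2 / n_h of a stratified estimate, taking the counts already validated in earlier waves as given. For regression parameters, S_h would be replaced by some other stratum-level measure of variability, which is outside this sketch. The strata, SD values and function names `marginal_gain` and `allocate_wave` are hypothetical.

```python
import math


def marginal_gain(N, S, n):
    """Drop in the variance term N^2 * S^2 / n when a stratum with N records,
    estimated SD S and n already validated records receives one more validation."""
    if n >= N:
        return -math.inf  # every record in the stratum is already validated
    if n == 0:
        return math.inf  # make sure each stratum gets at least one validation
    return N ** 2 * S ** 2 * (1 / n - 1 / (n + 1))


def allocate_wave(sizes, sds, already, n_new):
    """Split n_new additional validations across strata, one at a time,
    always choosing the stratum with the largest variance reduction."""
    counts = list(already)
    added = [0] * len(sizes)
    for _ in range(n_new):
        h = max(range(len(sizes)), key=lambda j: marginal_gain(sizes[j], sds[j], counts[j]))
        if marginal_gain(sizes[h], sds[h], counts[h]) == -math.inf:
            break  # all strata are exhausted
        counts[h] += 1
        added[h] += 1
    return added


if __name__ == "__main__":
    sizes = [1000, 1000, 1000]  # hypothetical phase-1 stratum sizes
    sds = [1.0, 2.0, 4.0]       # hypothetical SDs estimated from wave-1 validation
    already = [20, 20, 20]      # wave 1: equal allocation
    print(allocate_wave(sizes, sds, already, n_new=140))
```

With 200 total validations and SDs in the ratio 1:2:4, the cumulative counts after wave 2 should land close to the Neyman targets of about 29, 57 and 114. Because the marginal gains shrink as n_h grows, this greedy step-by-step choice gives the integer allocation that minimizes the summed variance term given the earlier waves.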

Public Health Relevance

Routinely collected data are increasingly used for HIV/AIDS research, but these data are prone to errors. We will create novel statistical methods, study designs, and tools to guide strategies for validating subsamples of patient records and incorporating data validation findings into analyses to improve estimation for time-to-event outcomes. The proposed methods will be applied to studies using data from the Latin American and East African regions of the International Epidemiologic Databases to Evaluate AIDS (IeDEA) network.

National Institutes of Health (NIH)
National Institute of Allergy and Infectious Diseases (NIAID)
Research Project (R01)
Study Section
AIDS Clinical Studies and Epidemiology Study Section (ACE)
Program Officer
Gezmu, Misrak
Vanderbilt University Medical Center
United States