Disparities in health and health care have been a longstanding challenge in the United States. One specific area of medical care in which racial/ethnic disparities have been identified is total joint arthroplasty (TJA), particularly total knee arthroplasty (TKA) and total hip arthroplasty (THA). The large, population-based studies needed to address healthcare disparities can be costly and difficult to perform, and may be compromised by sampling strategies and patient selection biases. Efficient alternatives are publicly available, nationally representative databases such as the Healthcare Cost and Utilization Project (HCUP) State Inpatient Databases (SID) and National Inpatient Sample (NIS). The SID provide information on all patients admitted to hospitals within participating states, allowing for comparison of health care access among many vulnerable populations, across states, and over time. The NIS is the largest publicly available all-payer inpatient health care database in the nation. It is sampled from the SID through a complex survey design, yielding national estimates of health care utilization, quality, and outcomes. A significant limitation of the NIS and the SID is the quantity of missing data. In particular, "patient race", a key indicator for health disparities research, has a high proportion of missing values. Multiple imputation (MI) approaches have become increasingly popular because they provide statistically sound methods to account for missing data. When conducting MI, it is suggested that imputation models be as general as the data allow, in order to accommodate a wide range of subsequent analyses of the imputed data sets. This requires that all relationships to be investigated in any subsequent analysis, such as nonlinearities and interactions, be included in the imputation model. Unfortunately, traditional MI methods, such as multivariate imputation by chained equations (MICE), are built on parametric imputation models.
These models are often not flexible enough to capture interactions and nonlinearities in high-dimensional, large-scale data settings. Unlike parametric models, machine learning techniques (MLTs) are model-free methods, and thus provide flexibility for missing data imputation. MLTs use algorithms that automatically and iteratively learn from all data to detect statistical dependencies in observations without being explicitly programmed where to look. The goal of this study is to make the two HCUP databases a more useful resource for the study of surgical disparities and other areas of medicine. Accordingly, we propose novel MI methods based on MLTs to impute missing data in the SID and the NIS, and we will use the imputed datasets to measure racial disparity in TKA.
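
As a rough illustration of the idea above, the following Python sketch imputes a missing categorical field several times with a nonparametric learner and then pools an estimate across the completed datasets. It is not the method proposed here: a k-nearest-neighbor donor draw stands in for the MLTs, which the abstract does not specify, and the record layout, the function names (`knn_donor_draw`, `multiply_impute`, `pool_proportion`), and the parameters `m` and `k` are all hypothetical. The sketch also ignores the NIS survey design and pools a simple proportion using Rubin's rules.

```python
import random
from statistics import mean, variance


def knn_donor_draw(target, donors, k, rng):
    """Draw one label at random from the k donors nearest to `target`.

    `donors` is a list of (features, label) pairs with observed labels.
    Distance is squared Euclidean on the numeric feature tuples.
    """
    nearest = sorted(
        donors,
        key=lambda d: sum((a - b) ** 2 for a, b in zip(d[0], target)),
    )[:k]
    return rng.choice(nearest)[1]


def multiply_impute(records, m=5, k=5, seed=0):
    """Return m completed copies of `records`.

    `records` is a list of (features, label) pairs (a hypothetical layout);
    label is None when the value is missing. Observed labels are kept as-is.
    """
    rng = random.Random(seed)
    observed = [r for r in records if r[1] is not None]
    completed = []
    for _ in range(m):
        # Resample the donor pool for each imputation so that the m datasets
        # differ, reflecting uncertainty in the observed data.
        pool = [rng.choice(observed) for _ in observed]
        completed.append([
            (x, y if y is not None else knn_donor_draw(x, pool, k, rng))
            for x, y in records
        ])
    return completed


def pool_proportion(completed, category):
    """Pool the proportion of records in `category` using Rubin's rules.

    Returns (pooled estimate, total variance), where total variance is
    within-imputation variance plus (1 + 1/m) times between-imputation variance.
    """
    m = len(completed)
    estimates, within = [], []
    for data in completed:
        n = len(data)
        p = sum(1 for _, y in data if y == category) / n
        estimates.append(p)
        within.append(p * (1 - p) / n)  # simple binomial variance, no survey weights
    between = variance(estimates) if m > 1 else 0.0
    return mean(estimates), mean(within) + (1 + 1 / m) * between
```

In use, each record would pair numeric covariates with the race field, `multiply_impute(records, m=5, k=5)` would produce five completed datasets, and `pool_proportion` would combine the per-dataset estimates. A realistic analysis would replace the proportion with the disparity measure of interest and account for the sampling weights.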

Public Health Relevance

It would be challenging to approach the goal of eliminating healthcare disparities without identifying disparities and analyzing the underlying causes using nationally representative databases such as the HCUP SID and NIS. This proposal imputes missing data for surgical disparities research using the two databases.

National Institutes of Health (NIH)
National Institute on Minority Health and Health Disparities (NIMHD)
Research Project (R01)
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Aviles-Santa, Larissa
George Washington University
Public Health & Prev Medicine
Schools of Public Health
United States