Peripheral arterial disease (PAD) is a major cause of morbidity and mortality in the United States, affecting more than eight million Americans, of whom 100,000 undergo major amputation each year. Current guidelines recommend medical therapy and aggressive risk factor modification for all patients with PAD, whether symptomatic or not, with attempted revascularization for patients with chronic limb-threatening ischemia (CLTI) or lifestyle-limiting claudication. Despite strongly worded standards of care, variability in PAD outcomes persists. Prior research has demonstrated that demographic factors such as gender, race, and socioeconomic status are associated with worse PAD care and outcomes, even after controlling for comorbidities. It remains unknown which specific patient, provider, and healthcare system factors drive these disparities. Efforts to determine which patients will suffer worse outcomes and disease progression have been hampered by the limitations of contemporary outcomes research techniques. Most PAD outcomes research relies on administrative claims databases, procedural registries, or single-center retrospective reviews. While each of these methods has some advantages, none offers the combination of patient- and disease-specific data, information about care provision at the provider and health-system levels, and outcomes across the full range of care locations. Furthermore, applying any of these methods at the scale necessary to draw well-powered conclusions is prohibitively time- and resource-intensive. The overall objective of this research is to use a novel natural language processing model to build a combined electronic health record (EHR) and Centers for Medicare & Medicaid Services (CMS) database, and to use that database to predict, with improved power and precision, which PAD patients are at highest risk of poor outcomes.
This proposal includes plans for collaboration with Duke Forge, which brings expertise in natural language processing (NLP) and machine learning, to efficiently identify PAD patients within our EHR and abstract information about them. Once identified, these patients can be linked to their CMS outcomes data, allowing assessment of how patient-, physician-, and health system-specific factors affect PAD outcomes. Our central hypothesis is that NLP powered by machine learning will permit efficient identification of patients with PAD, thereby facilitating higher-powered, higher-quality investigation of disparities in PAD outcomes. This research will pave the way for future interventions targeting sources of outcome inequality, which may include access to care, physician adherence to national guidelines, and patient preferences or health literacy.

Public Health Relevance

Peripheral arterial disease affects more than eight million Americans, and over one hundred thousand amputations are performed each year. Despite the prevalence and morbidity of this disease, little is known about which patients will require amputation or multiple surgeries or hospitalizations, or will suffer cardiovascular-related death. This proposal uses natural language processing to improve on current research methodology so that patient-specific, physician-specific, and system-specific factors in peripheral arterial disease care and outcomes can be accounted for.

National Institutes of Health (NIH)
National Heart, Lung, and Blood Institute (NHLBI)
Postdoctoral Individual National Research Service Award (F32)
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Purkiser, Kevin
Duke University
Internal Medicine/Medicine
Schools of Medicine
United States