As learning algorithms become ubiquitous in our lives, both observers and insiders have expressed concerns about potentially harmful or discriminatory biases and disparities. These issues may arise when algorithms use sensitive features in the data, such as race, age, gender, or sexual orientation, in inappropriate ways. Troubling racial disparities have been discovered for many kinds of health outcomes. In particular, African Americans are known to have a higher prevalence of coronary heart disease than other ethnic groups, and to suffer higher rates of post-operative morbidity and mortality after undergoing surgical interventions. While these disparities are well established in the literature, the extent to which they are due to biological factors, socioeconomic factors, or differences in offered care is not known. This proposal will address the conceptual, methodological, and practical gaps in assessing and addressing the reasons for disparities in health outcomes by combining tools from causal mediation analysis and fairness-aware algorithms with a rich dataset obtained from electronic health records. In addition, the project will allow disparities and algorithmic fairness to be introduced into the data science curriculum at the university. The methodological and practical innovations for quantifying and addressing disparities developed in this research are crucial to ensuring that the benefits of learning algorithms used for prediction and decision support in healthcare settings apply fairly and equitably to all.

The perspective on disparities and fairness in the proposed project builds on the team's preliminary work, in which fairness constraints correspond to vanishing causal effects along certain (domain-specific) "impermissible" pathways in a causal model. The formal framework of causal modeling has allowed the team to mathematize (un)fairness criteria in terms of causal path-specific effects (PSEs) that can be estimated from observed data and then imposed as constraints on the optimization task. The project will rigorously justify the proposed framework, making precise how the proposed formalization of fairness constraints (unlike previous proposals) is designed to intervene on cycles of injustice. Importantly, we will draw on the relevant literature from moral philosophy and philosophy of science here, since the crucial concepts -- fairness, systemic injustice, causal explanation -- have been the subject of much debate and analysis in philosophy for decades. To address the limitations of prior work, which only allowed high-quality solutions for relatively simple parametric models or entailed intractable methods such as rejection sampling, this project develops novel methodology that uses techniques from structural nested models in causal inference and empirical likelihood in statistics to rephrase the problem in the framework of maximum likelihood. These methods will scale more reliably to high-dimensional data and yield much higher-quality solutions to both prediction and policy learning problems than previously possible. This will make our methodology for assessing and satisfying fairness constraints applicable to the complex data found in healthcare. Finally, this project will apply the developed methodology to data on patients who have undergone heart surgery, and perform preliminary analyses that aim to assess the extent to which disparities are attributable to pathways associated with biology, socioeconomic status, and differences in care. The clinical team will begin to validate the models and the resulting findings.

While algorithmic fairness is a topic of considerable interest to the machine learning community, with multiple approaches already explored, this proposal is unique in three ways. First, the proposed framework is well motivated and provides a systematic way to evaluate the disparate and sometimes conflicting intuitions that underlie previous proposals. Second, the project is designed to break the cycles of injustice in a formal sense. Finally, the proposed approach to fair inference is not an incremental extension of a single method to the problem, but draws on insights from multiple communities, and can be viewed as a novel combination of tools from analytic philosophy, causal inference, semi-parametric statistics, and optimization.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Johns Hopkins University
United States