The imminent deployment of AI and machine learning in poorly characterized settings such as autonomous driving, personalized news feeds, and treatment recommendation systems has created an urgent need for machine learning systems that explain their decisions. Interpretability helps human experts ascertain whether machine learning systems, trained on technical objective functions, produce sensible outputs despite unmodeled unknowns. For example, a clinical decision support system will never know all of a patient's history, nor which of many side effects a specific patient is willing to tolerate. An important challenge, then, is how to design machine learning systems that both predict well and provide explanation. Within this broad challenge, this work develops techniques for domain-targeted interpretability: finding summaries of high-dimensional data that are relevant for making decisions. The proposed work focuses on healthcare applications, where interpretable models are essential for safety. However, the project aims to produce foundational learning algorithms applicable to a range of scientific and social domains. The developed methods will be tested on real problems in personalizing treatment recommendations and prognoses for sepsis, depression, and autism spectrum disorder. Thus, the successful completion of the work will impact both interpretable machine learning and clinical science. All software developed in the course of the project will be freely shared. The educational component of the proposed work will introduce early elementary students to the impact of statistics in medicine and inform policy-makers and legal scholars about how a right to explanation might be regulated in contexts such as clinical decision support systems. PI Doshi-Velez's lab also engages high school students, undergraduates, women, and researchers from underserved areas.

The proposed work addresses a specific challenge common in scientific settings: domain-targeted interpretability. In many scientific domains, unsupervised generative models are used by domain experts to understand patterns in the data, but as the dimensionality of the data grows, the most salient patterns may not be relevant for the specific investigation. For example, a psychiatrist may find that the strongest signals in the data from a patient cohort come from diabetes and heart disease, which may not be relevant for choosing therapies for depression. The proposed work leverages synergies between explaining domain-relevant patterns in the data and performing well on domain-relevant tasks to achieve domain-targeted interpretability. It defines a task-constrained approach to domain-targeted interpretability along with the essential inference techniques it requires, extends the approach to sequential decision making, and further extends it to improve downstream task performance while retaining interpretability. While there is a large body of work on making unsupervised learning models also useful for downstream tasks, none of these approaches fully manages the trade-off between providing an interpretation of the data and task performance. The proposed work addresses these shortcomings to make domain-targeted interpretability and task performance synergistic goals, and proposes a number of innovations to meet this objective. Innovations include combining the rich existing literature on inference for traditional unsupervised models with modern inference techniques, and directly searching for dimensions or patterns relevant to the downstream task.
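The task-constrained idea above can be sketched as a single objective that trades off an unsupervised explanation of the data against downstream task performance. The sketch below is purely illustrative and is not the proposal's actual formulation: the linear summary `Z = X @ W`, the logistic task head `beta`, and the trade-off weight `lam` are all assumptions chosen for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def task_constrained_loss(W, beta, X, y, lam):
    """Unsupervised reconstruction loss plus lam times downstream prediction loss.

    W    : (d, k) projection to a low-dimensional, interpretable summary
    beta : (k,)   weights predicting the domain-relevant label from the summary
    lam  : trade-off weight; lam = 0 recovers a purely unsupervised objective
    """
    Z = X @ W                       # low-dimensional summary of the data
    X_hat = Z @ W.T                 # linear reconstruction from the summary
    recon = np.mean((X - X_hat) ** 2)
    p = sigmoid(Z @ beta)           # predict the downstream label from the summary
    pred = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    return recon + lam * pred

# Toy data: only the first of five dimensions matters for the downstream task,
# mirroring the abstract's point that the most salient unsupervised patterns
# need not be the domain-relevant ones.
X = rng.normal(size=(50, 5))
y = (X[:, 0] > 0).astype(float)
W = rng.normal(scale=0.1, size=(5, 2))
beta = rng.normal(scale=0.1, size=(2,))

loss_unsup = task_constrained_loss(W, beta, X, y, lam=0.0)   # ignores the task
loss_joint = task_constrained_loss(W, beta, X, y, lam=1.0)   # constrained by the task
```

Minimizing the joint loss over `W` and `beta` (by any standard gradient method) pressures the learned summary to capture patterns that both explain the data and support the downstream prediction, rather than only the strongest unsupervised signal.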

National Science Foundation (NSF)
Division of Information and Intelligent Systems (IIS)
Program Officer: Sylvia Spengler
Institution: Harvard University, United States