Many machine learning technologies are built using black-box approaches, which can make it difficult to scrutinize the technology's decision-making process. This lack of interpretability represents a fundamental barrier to the adoption of machine learning technologies in some areas, such as health care, where transparency is key. Researchers have long relied on decision trees as a means of interpretable machine learning. In this approach, one develops a series of yes/no questions that eventually lead to a particular action being taken. Although decision trees are appealing in their simplicity, researchers have generally accepted that they perform worse than more complex (but less interpretable) approaches. This project will establish that this need not be the case. A new approach to creating decision trees will be developed, which has a strong theoretical basis and performance competitive with less interpretable algorithms that are considered state-of-the-art. Software to implement the new approach will be developed and made freely available. The developed methods will be applied to ongoing studies of preventive vaccines and are poised to have broad impacts by providing personalized recommendations for vaccination. The project will also support graduate students and develop pedagogical material pertaining to ethical issues arising in machine learning in public health and clinical care.

The typical process for building a decision tree involves recursively partitioning the feature space using a greedy search. While this approach conveniently yields a decision tree, it is generally accepted to perform worse than other tree-based strategies, such as random forests or boosted trees. The project seeks to develop an alternative approach, in which the feature space is partitioned using a penalization-based method called the highly adaptive lasso. The resultant prediction function enjoys desirable theoretical properties, but it is not immediately representable as a decision tree because it relies on non-recursive partitioning of the feature space. The first aim of the project will develop strategies for optimally representing a given partitioning via a recursive partitioning, thereby allowing a decision tree representation. A novel application of deep learning will be used to learn an optimal strategy for this representation, leveraging this archetypal uninterpretable algorithm to generate interpretable machine learning. In the second aim of the project, the approach will be extended to the context of personalized medicine, and optimal decision trees for assigning treatments will be developed. The developments will have broad impacts on the theory of causal inference and robust machine learning. In the final aim, the developed methods will be applied to several contemporary trials of preventive vaccines to develop personalized vaccination recommendations. In particular, the methods will be applied to help determine optimal dosing strategies for a preventive malaria vaccine in children, which could have broad impacts on informing future vaccination strategies in Sub-Saharan Africa.
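For readers unfamiliar with the conventional procedure that the project aims to improve upon, the greedy recursive partitioning described above can be sketched roughly as follows. This is only an illustration of the standard baseline for a regression outcome, not the project's highly adaptive lasso method or its software. The names `Node`, `best_split`, `grow_tree`, and `predict`, the squared-error criterion, and the stopping rules are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """A tree node; it is a leaf when `feature` is None (hypothetical structure)."""
    prediction: float
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(X: List[List[float]], y: List[float]) -> Optional[Tuple[int, float, float]]:
    """Greedy step: scan every feature and midpoint threshold, keep the lowest-loss split."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # guard against degenerate thresholds
            loss = sse(left) + sse(right)
            if best is None or loss < best[2]:
                best = (j, t, loss)
    return best


def grow_tree(X: List[List[float]], y: List[float],
              max_depth: int = 2, min_samples: int = 2) -> Node:
    """Recursively partition the data, choosing each split greedily."""
    node = Node(prediction=sum(y) / len(y))
    if max_depth == 0 or len(y) < min_samples:
        return node
    split = best_split(X, y)
    if split is None or split[2] >= sse(y):
        return node  # no split reduces the squared error
    j, t, _ = split
    left_idx = [i for i, row in enumerate(X) if row[j] <= t]
    right_idx = [i for i, row in enumerate(X) if row[j] > t]
    node.feature, node.threshold = j, t
    node.left = grow_tree([X[i] for i in left_idx], [y[i] for i in left_idx],
                          max_depth - 1, min_samples)
    node.right = grow_tree([X[i] for i in right_idx], [y[i] for i in right_idx],
                           max_depth - 1, min_samples)
    return node


def predict(node: Node, x: List[float]) -> float:
    """Follow the yes/no questions from the root down to a leaf."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction


if __name__ == "__main__":
    tree = grow_tree([[1.0], [2.0], [3.0], [4.0]], [0.0, 0.0, 1.0, 1.0])
    print(tree.feature, tree.threshold, predict(tree, [3.5]))
```

Because each split is chosen to be locally best and is never revisited, the resulting tree need not be the best tree overall. This is one common explanation for the performance gap relative to ensemble methods noted above.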

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 2015540
Program Officer: Huixia Wang
Project Start:
Project End:
Budget Start: 2020-09-01
Budget End: 2023-08-31
Support Year:
Fiscal Year: 2020
Total Cost: $219,995
Indirect Cost:
Name: Emory University
Department:
Type:
DUNS #:
City: Atlanta
State: GA
Country: United States
Zip Code: 30322