Causality is central to many of the most important questions in science and policy: Which cancer treatments are most effective for which patients? Would stricter gun laws result in fewer homicides? Causal inference is concerned with formulating such questions mathematically, exploring whether answers can be gleaned from data, and, if so, determining how well and with what statistical methods. Classical methods in causal inference tend to target simple summary effects, such as how outcomes would change on average if a treatment were applied to an entire population versus not at all. However, with big data, investigators can ask more complicated questions, such as how treatment effects vary with complex covariate information, or how outcome densities would change with sequential treatments applied over many timepoints. In this project, the PI will develop flexible statistical methods for answering such questions without imposing strong assumptions, and will study optimality, i.e., how well one can possibly answer such questions.

The above questions can be framed as high-dimensional functional estimation problems. The classical approach here is to use strong parametric assumptions to reduce these problems to finite-dimensional ones. This allows for standard methods and a deep understanding of optimality, but when the true parametric structure is unknown, incorrect assumptions can result in sizable bias and irrelevant efficiency bounds. By contrast, little is known in the nonparametric case. Thus, the PI will develop novel nonparametric estimators of high-dimensional causal functionals, study their risk, provide confidence bands and inferential tools, and explore minimax lower bounds. All methods will be made available as R software. This proposal focuses on the foundational problems of estimating (a) counterfactual densities and (b) heterogeneous treatment effects in three prominent domains of causal inference: (i) unconfounded point treatments, (ii) instrumental variables, and (iii) time-varying treatments. In addition, the PI will establish a general framework for bias-corrected estimation and inference for high-dimensional functionals.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

National Science Foundation (NSF)
Division of Mathematical Sciences (DMS)
Program Officer
Gabor Szekely
Carnegie-Mellon University
United States