A scientific mission of critical importance is to transform massive data into actionable knowledge, a task that largely centers on understanding causal relationships. Causal inference has become one of the three main tasks in data science, alongside descriptive and predictive analyses. This research project aims to close existing gaps in the estimation of heterogeneous causal effects and will make more statistical tools available for analyzing massive observational data. It will blend conventional statistical approaches to causal inference with fast-growing machine learning techniques and provide researchers and policymakers with powerful methodological tools to better evaluate the impact of interventions and thus to optimize decision making. Doctoral students in Statistics and Biostatistics will be involved in the development and implementation of the methods.

This project concerns the development of a stream of innovative Bayesian semiparametric methods for efficient and robust causal inference in the presence of effect heterogeneity in large observational datasets. Conventional statistical approaches are closely tied to randomized experiments, which makes them easy to interpret causally, but they may sacrifice efficiency. Recently developed nonparametric regression and machine learning methods, by contrast, focus primarily on outcome modelling and prediction; they may be vulnerable to confounding and are often more difficult to interpret. Furthermore, hidden bias from unmeasured confounding is a major concern in observational studies, and existing sensitivity analyses for assessing hidden bias do not accommodate complex data structures. The PIs will develop a robust Bayesian semiparametric framework for incorporating the treatment assignment process into the outcome modelling. The framework can easily accommodate complex heterogeneous effects or hierarchical structures in massive observational data, make use of experts' knowledge and existing causal theory on how the intervention might work, and effectively assess the impact of potential unmeasured confounders. Propensity scores will be incorporated in potential outcome models via Gaussian process priors, and connections with conventional matching estimators will be established. Moreover, the impact of unmeasured confounding will be assessed through Bayesian sensitivity analysis.
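
To make the idea of assessing hidden bias concrete, the following is a minimal sketch of a probabilistic sensitivity analysis. It is not the PIs' framework. It assumes a much simpler textbook setting: a single binary unmeasured confounder U that shifts the mean outcome by `gamma` in both treatment arms. The function names, the prior distributions, and the example numbers are all hypothetical and chosen only for illustration.

```python
import random
import statistics


def bias_adjusted_effect(observed_effect, gamma, p_treated, p_control):
    """Subtract the bias induced by a binary unmeasured confounder U.

    Assumption (illustrative, not from the project): U shifts the mean
    outcome by `gamma` in both arms, with prevalence `p_treated` among
    the treated and `p_control` among the controls. The naive difference
    in means then equals the true effect plus gamma * (p_treated - p_control).
    """
    return observed_effect - gamma * (p_treated - p_control)


def probabilistic_sensitivity(observed_effect, n_draws=10_000, seed=0):
    """Propagate prior uncertainty about U into the adjusted effect.

    Returns the mean of the adjusted effects and a central 95% interval.
    The priors below are hypothetical placeholders.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        gamma = rng.gauss(0.0, 1.0)          # prior on U's effect on the outcome
        p_treated = rng.betavariate(2, 2)    # prior on P(U = 1 | treated)
        p_control = rng.betavariate(2, 2)    # prior on P(U = 1 | control)
        draws.append(
            bias_adjusted_effect(observed_effect, gamma, p_treated, p_control)
        )
    draws.sort()
    lower = draws[int(0.025 * n_draws)]
    upper = draws[int(0.975 * n_draws) - 1]
    return statistics.mean(draws), (lower, upper)


if __name__ == "__main__":
    mean_effect, interval = probabilistic_sensitivity(observed_effect=2.0)
    print("adjusted effect (mean):", round(mean_effect, 3))
    print("95% interval:", tuple(round(v, 3) for v in interval))
```

If the resulting interval still excludes zero under plausible priors, the observed effect would be considered robust to this particular form of hidden bias. A fully Bayesian analysis would instead place the sensitivity parameters inside the outcome model and update them jointly with the other unknowns.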

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 2015552
Program Officer: Pena Edsel
Budget Start: 2020-07-01
Budget End: 2023-06-30
Fiscal Year: 2020
Total Cost: $250,000
Name: Ohio State University
City: Columbus
State: OH
Country: United States
Zip Code: 43210