The discovery of cause-and-effect relationships is a fundamental task in science. Traditional methods for finding such relationships rely on interventions or randomized experiments, which are often expensive or even impossible to carry out. Causal discovery aims to find the underlying causal structure or model from purely observational data and has applications in many disciplines. Despite its successes on a number of real-world problems, causal discovery is vulnerable to measurement error: error in the observed data can lead various causal discovery methods to produce seriously incorrect output. Because measurement error introduced by instruments or proxies in the measuring process is ubiquitous, this problem has been recognized as one of the main obstacles to reliable causal discovery. It is still unknown to what extent the causal structure over the relevant variables can be identified in the presence of measurement error, let alone how to develop practical algorithms to solve this problem. This project aims to fill this void. It will investigate what information about the causal model of interest can be recovered from observed data, and what assumptions are needed to recover that causal information successfully. Building on these theoretical results, the project will then investigate efficient estimation procedures.

The project will establish theoretical identifiability results for the true underlying causal structure and, in light of these results, develop practical causal discovery algorithms. Preliminary results show theoretically how measurement error changes the (conditional) independence and dependence relationships in the data, i.e., how the (conditional) independence and dependence relations between the observed variables differ from those between the measurement-error-free variables. Building on these preliminary results, several research tasks will be carried out. First, classical causal discovery often assumes a linear-Gaussian model, in which the causal relations are linear and the variables are jointly Gaussian. For this setting, the project will establish the conditions under which the underlying causal model is identifiable up to an equivalence class, and those under which it is only partially identifiable. Second, the project will investigate how the identifiability of the underlying causal structure in the presence of measurement error can benefit from the assumption of non-Gaussian noise. Third, the project will develop statistically more efficient estimation procedures by extending the Greedy Equivalence Search (GES) method, by exploiting suitable sparsity constraints, or by extending the A* Bayesian network learning procedure. Finally, these ideas will be extended to related models in causality and statistics, including other contamination models, nonlinear causal models, and Markov networks.
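To illustrate how measurement error can change conditional independence relations, here is a minimal simulation sketch. It assumes a three-variable linear-Gaussian chain X → Y → Z with arbitrarily chosen coefficients (0.8) and unit-variance measurement error on Y only; the helper `partial_corr` and all parameter values are hypothetical and are not taken from the project. In the error-free chain, X and Z are independent given Y, but given the noisy measurement of Y they remain dependent, so a conditional-independence-based method could wrongly add an edge between X and Z.

```python
import numpy as np


def partial_corr(a, b, c):
    """Partial correlation of a and b given c, computed from the residuals
    of ordinary least-squares regressions of a and b on c (with intercept)."""
    design = np.column_stack([np.ones_like(c), c])
    res_a = a - design @ np.linalg.lstsq(design, a, rcond=None)[0]
    res_b = b - design @ np.linalg.lstsq(design, b, rcond=None)[0]
    return np.corrcoef(res_a, res_b)[0, 1]


rng = np.random.default_rng(0)
n = 100_000

# Hypothetical linear-Gaussian chain X -> Y -> Z (coefficients chosen for illustration).
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(size=n)
z = 0.8 * y + rng.normal(size=n)

# Assumed measurement model: we only observe Y with additive Gaussian error.
y_obs = y + rng.normal(scale=1.0, size=n)

pc_true = partial_corr(x, z, y)      # near 0: X is independent of Z given the true Y
pc_obs = partial_corr(x, z, y_obs)   # clearly nonzero (population value is about 0.24)

print(f"partial corr given true Y:     {pc_true:.3f}")
print(f"partial corr given measured Y: {pc_obs:.3f}")
```

The population value in the second case follows from the covariances of the assumed model. The residual dependence shrinks as the measurement-error variance goes to zero, which is one way to see why the amount of error matters for what can be identified.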

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Budget Start: 2018-08-01
Budget End: 2019-07-31
Fiscal Year: 2018
Total Cost: $59,967
Name: Carnegie-Mellon University
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213