The proposed activity will address two problems: (1) transportability, and (2) data fusion. In the first topic, the project focuses on the problem of utilizing conclusions obtained in one environment in another by permitting reasoning agents to focus their reasoning on only the differences, while taking for granted that which is common to both environments. In the second topic, this project will formalize and reduce to algorithmic procedures the general problem of fusing data coherently from multiple heterogeneous sources. The proposed activities will develop effective procedures for determining whether unbiased estimates of causal relationships in a target environment can be synthesized from information obtained from a set of heterogeneous studies. These activities will lead to a theoretical understanding of the conditions under which a learning system can rely on previously learned information, transferred from a different environment.

Results from this research project have the potential to impact all data-related sciences where the transportability and data-fusion problems are ubiquitous. These two problems demand understanding of causal relationships in the domains being considered. Such causal relationships need to be addressed by causal calculi so as to extract the invariant features from each information source. The approach pursued in this project builds on previous work of the PI, for instance, reasoning with structural causal models and counterfactuals. The problems of transportability and data fusion are critical in the health and social sciences, where data is scarce and experiments are costly; they are of particular interest in the "Big Data" enterprise, which is driven by the premise that data availability will automatically result in data interpretability and where there are nuances among the contexts of data collection.

Project Report

The investigating team has tackled a scientific and computational problem that has lingered for at least two centuries. The problem is that of "generalizability": under what conditions can experimental results obtained in one environment be generalized to a different environment? This problem haunts all the empirical sciences, and it has now received formal treatment using graph-theoretic tools and perspectives. We currently know precisely when cross-domain extrapolations are feasible; that is, we have polynomial-time algorithms for deciding this question and for combining information from several diverse environments to synthesize a coherent estimate of the target quantity. We anticipate immediate practical applications of these results in improving the validity of Randomized Controlled Trials (RCTs), which have become the gold standard of program effectiveness tests in the health and social sciences yet suffer severely from a lack of external validity: subjects volunteering for these trials are not representative of the population as a whole. Another problem tackled using the language of causal graphs is that of "missing data." We showed that this problem, which has traditionally been handled by statistical techniques, is fundamentally a causal problem, and can benefit substantially from reasoning about the causal process responsible for some variables escaping measurement. This paradigm has led to new algorithms capable of recovering probabilistic and causal queries from data corrupted by missingness. Finally, the algorithmization of counterfactuals has enabled the investigating team to apply counterfactual logic to personal decision making, with the aim of equipping robots with the capability of acquiring autonomy based on their own experience.
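As a rough illustration of the external-validity issue described above, the sketch below re-weights stratum-specific trial results by the covariate distribution of a target population. It covers only the simplest case and is not the project's general algorithm. It assumes that the trial and target populations differ solely in the distribution of a single covariate Z, and that the effect within each stratum of Z is the same in both. The function name, strata, and all numbers are hypothetical.

```python
def transport_effect(outcome_rate_by_stratum, covariate_dist):
    """Average stratum-specific experimental results over a covariate distribution.

    outcome_rate_by_stratum: P(y | do(x), z) for each stratum z, taken from the trial.
    covariate_dist: P(z) in the population of interest; must sum to 1.
    Assumption (hypothetical setting): stratum-specific effects are invariant
    across populations, so only the distribution of z differs.
    """
    if set(outcome_rate_by_stratum) != set(covariate_dist):
        raise ValueError("strata must match between trial results and distribution")
    if abs(sum(covariate_dist.values()) - 1.0) > 1e-9:
        raise ValueError("covariate distribution must sum to 1")
    return sum(outcome_rate_by_stratum[z] * covariate_dist[z] for z in covariate_dist)


# Hypothetical trial results: recovery rate under treatment, by age group.
trial_outcome_rate = {"young": 0.30, "old": 0.60}

# Hypothetical age distributions: volunteers skew young, the target population does not.
trial_dist = {"young": 0.8, "old": 0.2}
target_dist = {"young": 0.4, "old": 0.6}

# Effect as observed among trial volunteers.
naive = transport_effect(trial_outcome_rate, trial_dist)

# Effect re-weighted to the target population's age distribution.
transported = transport_effect(trial_outcome_rate, target_dist)

print(f"trial population:  {naive:.2f}")
print(f"target population: {transported:.2f}")
```

With these made-up numbers the two estimates differ, which is the sense in which a trial on unrepresentative volunteers can mislead about the population as a whole.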

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1249822
Program Officer: Hector Munoz-Avila
Project Start:
Project End:
Budget Start: 2012-10-01
Budget End: 2014-09-30
Support Year:
Fiscal Year: 2012
Total Cost: $300,000
Indirect Cost:
Name: University of California Los Angeles
Department:
Type:
DUNS #:
City: Los Angeles
State: CA
Country: United States
Zip Code: 90095