The commercial success of data mining, and the considerable research interest the area attracts, show that there is a need for analyzing and understanding data that goes well beyond classical database queries. Users are often particularly interested in the causal relationships among data items and in the reasons behind observations.

Current database systems cannot explicitly model the causal structure within data (although it is often implicit in the data), and thus offer no specific support for causal queries. Without information about causal relationships, users must rely on techniques that mine data for statistically significant patterns, and causal relationships are then often simply inferred from statistical dependencies. This can lead to inaccurate conclusions, since correlation does not necessarily imply causation.

This project creates the foundations for a new breed of databases called causal databases. Causal databases can model causal information and support queries about causality and explanations, which are beyond the scope of current databases. They can also take advantage of causal information that is implicit, but unexploited, in some current databases, such as those for large engineering projects. The project develops new database models and query languages for representing and transforming causal information, with particular focus on large engineering databases and scientific databases. It also develops efficient and scalable techniques for processing causality and computing explanations in large causal databases. This involves both integrating causality processing into traditional database query processing architectures and developing special data stream techniques for scaling up to the most data-intensive applications.

Further information on the project can be found at the project web page: www.cs.cornell.edu/databases/causality/

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0911036
Program Officer: Maria Zemankova
Budget Start: 2009-09-01
Budget End: 2014-08-31
Fiscal Year: 2009
Total Cost: $2,353,128
Name: Cornell University
City: Ithaca
State: NY
Country: United States
Zip Code: 14850