Statistical conclusions from individual research studies can be misleading for a variety of reasons, including small sample sizes or confounding factors that are unknown to the investigators. One way to reduce the risk of misleading conclusions is to combine the results of multiple research studies using a technique referred to as "meta-analysis." Meta-analysis is one of the most widely used techniques for inferring knowledge from data in science. The idea behind meta-analysis is that the combined statistical conclusions from multiple research studies reflect the information in all of the studies and are therefore more likely to be accurate. The conclusions from meta-analyses are considered "better" or "more likely to generalize" than conclusions from single studies. However, this notion is not well formalized, and formalizing it is a goal of this project. In addition, existing meta-analysis methods do not take into account any knowledge of the similarities and differences between the studies. Taking advantage of these similarities and differences can improve the effectiveness of meta-analysis.

This project takes advantage of recent developments in the area of "causal inference," which is the study of inferring cause-and-effect relationships from data. These types of inferences utilize a type of graph called a causal graph, which graphically represents cause-and-effect relationships. This project develops an alternative framework for meta-analysis based on a novel type of causal graph, a selection graph. A selection graph formally represents the similarities and differences between the studies. This project provides a unifying framework and powerful methodology for meta-analysis. The methods developed in this project are applied to genetic studies, where meta-analyses have discovered thousands of variants involved in common human disease in the past few years.

Causal graphs have had a major impact on the way causality is taught and understood in cognitive science, statistics, and the health and social sciences. The proposed research promises to have similar impacts by transforming the approach to meta-analysis, one of the workhorses of statistical inference in the physical, life, and social sciences. The resulting techniques will be used to perform meta-analyses of genetic studies, which can lead to the discovery of variation involved in disease. The results of the project, including publications, software, data sets, and course materials, will be made freely available through the project web site: http://zarlab.cs.ucla.edu/causal-meta-analysis/.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1302448
Program Officer: Sylvia Spengler
Project Start:
Project End:
Budget Start: 2013-07-15
Budget End: 2019-06-30
Support Year:
Fiscal Year: 2013
Total Cost: $1,120,781
Indirect Cost:
Name: University of California Los Angeles
Department:
Type:
DUNS #:
City: Los Angeles
State: CA
Country: United States
Zip Code: 90095