III: Small: Causal and Statistical Inference in the Presence of Confounding Factors

Eskin, Eleazar

Abstract

The presence of unmeasured confounding factors can result in incorrect statistical and causal inferences if the confounding factors are correlated with the observed data. This phenomenon has been well documented in at least two important applications. One application is identifying genetic variation involved in disease from populations of related individuals. A second application is identifying genes active in a disease when comparing disease and healthy samples. We propose a new approach to correct for unobserved confounders by taking advantage of insights into how confounders affect high dimensional data. These insights motivate a formal definition for a specific type of confounder, which we term a 'low-rank confounder.' Formalizing this definition allows us to motivate methods for correcting for the effects of these types of confounders even when the confounders are not observed. Our proposal will develop a theory of how confounders affect data and under what conditions unobserved confounders can be corrected. The proposed theory is related to recent developments in understanding sparsity, which has been well studied in electrical engineering, computer science and statistics. The proposed work will lead to improved methods for applications where such confounders are present.

Nontechnical Abstract

Inference of knowledge from high dimensional data is a fundamental problem affecting virtually all areas of science including physics, astronomy, chemistry, computer science, social science and many areas of biology. Many of these problems are driven by recently available large sources of data and advances in measurement or data collection technologies. A major challenge is the presence of unknown (and unmeasured) confounding factors. Confounding factors are variables that are often not observed in the data, but are correlated with various features of the data. Unfortunately, confounding factors can cause incorrect inferences. This phenomenon has been well documented in at least two important applications: one application is identifying genetic variation involved in disease from populations of related individuals, and a second application is identifying genes active in a disease when comparing disease and healthy samples. Traditional approaches exist for performing inference when the confounders are observed in the data. However, dealing with unobserved confounders is more difficult. This project will develop and study a new approach to correct for unobserved confounders, taking advantage of insights into how confounders affect high dimensional data. The project has broad impact due to its utility in a wide range of scientific questions, through the interdisciplinary research opportunities provided to undergraduate and graduate students, and through the distribution of software and data.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1320589
Program Officer: Sylvia Spengler

Project Start
Project End
Budget Start: 2013-06-01
Budget End: 2017-05-31
Support Year
Fiscal Year: 2013
Total Cost: $499,919
Indirect Cost

University of California Los Angeles, Los Angeles, CA, United States
