Causal inference is a broad area of statistical research in which investigators seek to quantify cause-and-effect relationships between exposures and outcomes. Questions that fall under this framework span a vast canvas of applications, including medical sciences and biology, economics, social sciences, and environmental health. Specific examples include understanding the efficacy of treatments for a disease, the importance of genes and proteins in determining biological functions, the interplay between environmental factors and genetic variations in human mortality, and the role of careful product placement in modulating market behavior. This immense breadth naturally comes with its share of subtleties and pitfalls. One of the major challenges in a statistically principled analysis of cause and effect is the presence of other factors, known as confounders, which often mislead investigators into believing spurious relationships. Accounting for such confounders is therefore of great importance. The growing capacity to collect data has made it possible to measure many such factors in common causal inference studies. Although in principle this has made causal inference a more feasible and exciting field, the mathematical formalism of such studies still comes with a burden of assumptions for dealing with these confounders, which can often be extremely restrictive in practical applications. It is widely believed that tools from machine learning and artificial intelligence are natural choices for alleviating the burden of these assumptions. This project aims to understand the role of these tools in answering causal inference questions in a statistically principled and mathematically sound manner.
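
As a purely illustrative aside on how a confounder can produce a spurious relationship, the following minimal sketch simulates a toy data set. Everything in it is an assumption made for illustration and is not part of the project: a binary confounder `c`, an exposure `a` that is more likely when `c = 1`, an outcome `y` that depends only on `c`, and the hypothetical helper names `simulate`, `naive_difference`, and `adjusted_difference`. The exposure has no effect on the outcome by construction, yet the unadjusted comparison suggests one; averaging the within-stratum differences over the confounder removes it.

```python
import random
from statistics import mean


def simulate(n=20000, seed=0):
    """Toy data: the confounder c drives both exposure a and outcome y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = 1 if rng.random() < 0.5 else 0                   # confounder
        a = 1 if rng.random() < (0.8 if c else 0.2) else 0   # exposure depends on c
        y = 2.0 * c + rng.gauss(0.0, 1.0)                    # outcome ignores a entirely
        rows.append((c, a, y))
    return rows


def naive_difference(rows):
    """Unadjusted difference in mean outcome, exposed minus unexposed."""
    exposed = [y for c, a, y in rows if a == 1]
    unexposed = [y for c, a, y in rows if a == 0]
    return mean(exposed) - mean(unexposed)


def adjusted_difference(rows):
    """Within-stratum differences averaged over the distribution of c."""
    n = len(rows)
    total = 0.0
    for level in (0, 1):
        stratum = [(a, y) for c, a, y in rows if c == level]
        exposed = [y for a, y in stratum if a == 1]
        unexposed = [y for a, y in stratum if a == 0]
        total += (len(stratum) / n) * (mean(exposed) - mean(unexposed))
    return total


rows = simulate()
naive = naive_difference(rows)
adjusted = adjusted_difference(rows)
print(f"naive difference:    {naive:.3f}")
print(f"adjusted difference: {adjusted:.3f}")
```

Under these assumed settings the naive difference concentrates near 2 x (0.8 - 0.2) = 1.2, while the adjusted difference concentrates near the true effect of zero. The adjustment works here only because the single confounder is measured and discrete; the restrictive assumptions discussed above arise precisely when such conditions are harder to justify.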

As mentioned above, it is often argued that using machine learning methods to nonparametrically estimate nuisance parameters alleviates the burden of the assumptions made in observational studies. Although this is true in spirit, most machine learning methods are designed to attain low prediction error in regression-type problems, whereas estimating quantities such as causal effects may require a somewhat different understanding. This project is therefore aimed at disentangling some of the machine learning algorithms used in the study of causal effects. Its major goals fall into three areas: (i) exploring the crucial, and often overlooked, need for formal statistical methods for inferring causal effects that are adaptive over the standard assumptions made in practice, (ii) developing a causal mediation analysis framework that paves the way for seamless application of state-of-the-art machine learning methods, and (iii) the mathematical exploration of machine learning algorithms such as deep neural networks and generative adversarial networks in the context of these inferential problems. The resulting understanding will be used to explore the effect of early-life exposure to metal mixtures (such as lead, arsenic, and cadmium exposure through drinking water) on late-life neurological diseases (such as Alzheimer's disease) and the potential role of high-dimensional biomarkers, such as EV miRNAs, that might modulate such effects.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Start:
Project End:
Budget Start: 2020-02-01
Budget End: 2022-01-31
Support Year:
Fiscal Year: 2019
Total Cost: $120,700
Indirect Cost:
Name: Harvard University
Department:
Type:
DUNS #:
City: Cambridge
State: MA
Country: United States
Zip Code: 02138