There is a fundamental gap in correctly addressing causality in observational studies, arising from missing data, lack of randomization, and complications due to temporality. Measures of association are inapposite for making relevant policy recommendations because such recommendations involve suggestions for interventions, which are causal statements. The long-term goal is to address important environmental health causal questions that have policy-relevant consequences (e.g., discovering biological mechanisms linking air pollution to low birth weight or autism, and relating extreme weather conditions to heat stroke or worldwide nutritional deficiency), and to develop statistical methodology that correctly addresses causality in environmental health studies. The overall objective of this grant application is to correctly formulate and estimate causal environmental health effects, especially in the presence of intermediate variables, by transporting successful statistical tools developed in the fields of missing data (Rubin, 1978) and classical and modern multi-factorial randomized experiments over the past 80 years (essentially since Fisher, 1935). Guided by preliminary development and significant applications, the proposed research will consist of four specific aims: 1) Expand successful multiple-imputation methods for high-dimensional missing data to enable valid statistical inference, using standard complete-data methods, when confronted with missing data. Two settings will be considered: the first deals with multivariate time series, and the second with gold-standard prediction from less accurate but available measurements. 2) Develop statistical theory to estimate causal estimands from data collected in observational studies, which would be reconstructed to approximate data from a randomized experiment; one particularly interesting setting will consider intermediate variables on the causal pathway between an exposure and an outcome (also called mediators).
3) Expand standard methods developed for causal mediation analysis; mediation analysis has become a popular and still-developing tool for examining causal biological pathways and their relative contributions to adverse health effects. 4) Implement the methods developed in the three previous aims in new software that is compatible with software currently used by biomedical researchers. The proposed research, which targets correct formulation and estimation of causal environmental health effects, is innovative because it represents a substantial departure from the status quo: it transports successful methods and concepts developed in classical and modern statistics in two areas: 1) multiple-imputation techniques for handling missing data, and 2) analysis of complex multi-factorial randomized experiments, especially in the presence of intermediate variables and complex data (e.g., longitudinal, survival, and high-dimensional). The proposed statistical methodology will be significant to biomedical research because it will yield valid causal environmental health effect estimates under precisely stated assumptions; these estimates are expected to have a positive impact on policy decisions and to suggest appropriate interventions.