Causal analysis frequently suffers when relevant variables are left unobserved. For this reason, many modern public health datasets now include massive quantities of previously unavailable information on each individual. For example, a recent study of the spread of flu-like illness on college campuses collected numerous static and dynamic networks and biometric information, in addition to standard demographic data, for each individual. This project develops new statistical and computational tools that incorporate these data structures into the evaluation of interventions and produce interpretable causal analyses. Typical approaches to such analyses rely on strong modeling assumptions and dimension-reduction techniques that discard relevant information about individuals and can lead to biased causal estimates. For example, when network information is collected, it is frequently reduced to egocentric summaries that do not reflect the overall network structure. The project has two goals: (1) develop fast almost-matching-exactly algorithms that construct matched sets for causal inference in massive datasets; and (2) develop methods for matching on available network information in order to better understand how biological processes spread. These tools are widely applicable and may lead to new insights into complex causal mechanisms. In particular, this study will evaluate the efficacy of isolation interventions on the spread of flu-like illness and propose new, efficient interventions to combat pandemic spread.
Reliable and consistent causal analysis of public health interventions requires the use of massive, previously unavailable data streams. For example, evaluating the efficacy of isolation interventions on the spread of flu-like illness requires information on friendships and interactions between individuals and biometric information, in addition to standard demographic data. The proposed research provides statistical and computational tools for using these data to identify and quantify the causal effects of such treatments, which can lead to the development of better public health interventions.