The landscape of data formats is rapidly expanding, with image, text and other complex formats becoming available for health related outcomes. By considering such data within the context of observational causal inference, they can be leveraged to improve clinical decisions, help evaluate treatment efficacy by estimating individualized treatment effects and help develop intelligent therapeutic systems where individualized treatments can be deployed. In R01EB025021, we concentrate on understanding how nearly exact matching can be achieved in the presence of a large number of categorical covariates. The proposed approach (called FLAME - Fast Large Almost Matching Exactly) is able to quickly learn which categorical covariates are important and to produce high quality matches citep{wang2017flame,dieng2018collapsing}. The main shortfall in the proposed work for R01EB025021 is that it does not naturally extend to more complex data types, it only works for categorical data in which each feature is meaningful. { bf This proposal will develop new statistical and computational tools for causal analysis of complex data structures.} Our new approach is called { emph Matching After Learning to Stretch (MALTS)}. For each unit (e.g. patient), we propose learn a latent representation of their covariate information and a distance metric on the latent space such that units that are matched tend to provide accurate estimates of treatment effect. MALTS can use deep learning to encode the latent representations for the units, or it can learn basis transformations in linear space (stretching and rotation matrices) for simpler continuous data types. We will develop the MALTS algorithm, and apply it in a medical context. Our goal is to construct high quality matches for the following types of data: (i) medical images, such as x-rays and CT scans, (ii) medical record data, (iii) time series data (continuous EEG data), (iv) a combination of any of the first three types of data.
We aim to leverage the newly developed tools to continue our evaluation of the efficacy of isolation for flu-like ailments as well as to apply them more broadly to publicly available modern datasets such as the MIMIC III database.
Reliable and consistent causal analysis of public health interventions requires the use of massive previously unavailable datastreams. For example, evaluation of the efficacy of isolation interventions on flu-like-illness spread must include information on friendships and interactions between individuals, biometric information, imaging, longitudinal health record data as well as standard demographic data. The proposed research provides machine learning and deep learning tools for properly employing this data for the identification and quantification of causal effects of such treatments that can lead to the development of better public health interventions.