Pooling is a method in which multiple individual biospecimens are combined and measured as a single unit to reduce cost, improve analytic feasibility, and/or improve statistical efficiency. We previously showed that pooling allows highly accurate estimation of the mean, whereas random sampling provides a more efficient estimate of the variance; this finding led to our development of a cost-efficient hybrid design in which both pooled and unpooled samples are measured in an optimal proportion to efficiently estimate the unknown parameters of the biomarker distribution. While continuing to develop methods for discrimination analysis with pooled biomarkers, we have shifted our focus toward regression methods with either a pooled exposure or a pooled outcome. For exposure measurements, we developed methods not only for pools stratified by a covariate or outcome but also for pools formed independently of other variables, which may therefore contain individuals of mixed outcome status. The latter may be highly impactful because it relaxes the requirement for stratified pools common to current methods and allows secondary analysis of pooled biomarkers after pools have been formed under an optimal design that depends on a primary outcome. Furthermore, we extended these methods beyond normally distributed exposures and outcomes, allowing for skewed biomarkers in both roles. Notably, all of these techniques retain the flexibility of a hybrid pooled-unpooled design. Progress was also made in using pooled samples in a case-only design to estimate gene-environment interactions in disease; such investigations are prone to low statistical power because they require a sufficient number of individuals at each combination of disease, gene, and environment levels. Specifically, a case-only design was proposed that assumes gene-environment independence in controls for a rare disease.
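To make the motivation for the hybrid design concrete, here is a minimal sketch under a simplified model that is our assumption, not the project's published estimator: each specimen value is normal with mean mu and variance sigma2, and every assay adds independent measurement error with variance tau2. A pool of k specimens is then measured with variance sigma2/k + tau2, while a single specimen has variance sigma2 + tau2, so only the combination of pooled and unpooled assays separates sigma2 from tau2. The names `simulate_hybrid` and `moment_estimates` are hypothetical, and the estimates are simple method-of-moments quantities rather than maximum likelihood.

```python
import random
import statistics


def simulate_hybrid(n_pooled, n_unpooled, k, mu, sigma2, tau2, seed=1):
    """Simulate assay values for a hybrid pooled-unpooled design.

    Assumed (illustrative) model: each specimen X ~ N(mu, sigma2); every assay
    adds independent error E ~ N(0, tau2). A pool of k specimens is measured as
    the average of its members plus a single error term.
    """
    rng = random.Random(seed)
    sd_x, sd_e = sigma2 ** 0.5, tau2 ** 0.5
    pooled = []
    for _ in range(n_pooled):
        members = [rng.gauss(mu, sd_x) for _ in range(k)]
        pooled.append(statistics.fmean(members) + rng.gauss(0.0, sd_e))
    unpooled = [rng.gauss(mu, sd_x) + rng.gauss(0.0, sd_e) for _ in range(n_unpooled)]
    return pooled, unpooled


def moment_estimates(pooled, unpooled, k):
    """Method-of-moments estimates of (mu, sigma2, tau2) from both sample types.

    Var(pooled) = sigma2 / k + tau2 and Var(unpooled) = sigma2 + tau2, so their
    difference isolates sigma2; neither sample type alone can separate the two.
    """
    v_p = statistics.variance(pooled)
    v_u = statistics.variance(unpooled)
    sigma2 = (v_u - v_p) * k / (k - 1)
    tau2 = v_p - sigma2 / k
    # Both assay types have expectation mu; a simple unweighted average is
    # unbiased, though not the most efficient weighting.
    mu = statistics.fmean(pooled + unpooled)
    return mu, sigma2, tau2


pooled, unpooled = simulate_hybrid(5000, 5000, k=4, mu=2.0, sigma2=1.0, tau2=0.25)
print(moment_estimates(pooled, unpooled, k=4))
```

If `n_unpooled` is set to zero, sigma2 and tau2 are no longer identified in this model, which is the intuition for measuring both sample types in a chosen proportion.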
However, the gene-environment independence assumption is not always met; therefore, to retain the increased efficiency of the case-only estimator while being more robust to departures from this assumption, modifications of the traditional case-control estimator were proposed using a two-stage estimator and an empirical Bayes-type shrinkage estimator. Regarding biologically informed innovations in biostatistics for epidemiology, we continued to bring together laboratory science and epidemiology. Under this umbrella, two collaborative efforts, funded by a competitive external grant from the American Chemistry Council, were created to bridge biostatistics and etiologic research. The first series of papers reviewed current state-of-the-art statistical methods for handling missing data and promoted pragmatic, principled parametric (e.g., multiple imputation) and semi-parametric (e.g., inverse probability weighting) techniques, arguing that principled handling of missing data is as important as adjustment for confounding and should be equally prevalent in etiologic research. The second series of papers was motivated by interest in outcome-dependent sampling designs as fiscally and statistically efficient designs. Dependent sampling designs enrich a cohort based on an exposure or outcome of interest, thereby collecting data on the most informative individuals. These designs are accompanied by analysis techniques that account for the enrichment and provide valid inference. Because the existing literature was based on statistical idealizations, our group sought to broaden the appeal of these methods by tailoring the designs to specific epidemiologic applications. For example, a cluster-stratified, outcome-dependent case-control design was developed, motivated by the practical need to sample patients within clinics rather than across clinics.
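For the gene-environment work, a minimal illustration of how case-only, case-control, and shrinkage estimates relate can be written from 2x2 tables of gene (G) by environment (E) counts. The assumptions are ours: binary G and E, a rare disease, Woolf variances for log odds ratios, and a generic shrinkage weight that moves toward the case-control estimate as the estimated G-E association in controls grows relative to the case-control variance. This is a sketch in the spirit of empirical Bayes-type shrinkage, not the project's two-stage or shrinkage estimator; pooled measurement is not modeled, and `log_or` and `gxe_estimators` are hypothetical names.

```python
import math


def log_or(table):
    """Log odds ratio and Woolf variance for a 2x2 table of positive counts.

    table = ((n_G1E1, n_G1E0), (n_G0E1, n_G0E0)).
    """
    (a, b), (c, d) = table
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def gxe_estimators(cases, controls):
    """Case-only, case-control, and shrinkage estimates of a log-scale G x E interaction."""
    # Case-only: G-E association among cases; targets the interaction when G and
    # E are independent in controls and the disease is rare.
    beta_co, v_co = log_or(cases)
    # G-E association among controls; zero under the independence assumption.
    psi, v_ctrl = log_or(controls)
    # Case-control: ratio of case and control odds ratios (difference on log scale).
    beta_cc, v_cc = beta_co - psi, v_co + v_ctrl
    # Generic shrinkage weight on the case-control estimate (assumed form).
    weight = psi ** 2 / (psi ** 2 + v_cc)
    beta_shrunk = weight * beta_cc + (1 - weight) * beta_co
    return {"case_only": beta_co, "case_control": beta_cc, "shrinkage": beta_shrunk}


print(gxe_estimators(((40, 60), (60, 140)), ((60, 40), (40, 60))))
```

When the control table shows no G-E association, the shrinkage estimate coincides with the efficient case-only estimate; when controls show strong dependence, it moves toward the case-control estimate, which does not rely on the independence assumption.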
In all the papers, operating characteristics and potential trade-offs relative to a standard design were provided as practical guidance and as motivation for these highly efficient designs. More focused work on novel methods specific to reproductive and perinatal epidemiology has led to several valuable contributions. Building on our previous longitudinal methodology for the menstrual cycle and pregnancy, we addressed issues surrounding the timing of measurements of maternal and fetal weight gain during pregnancy, which are time-dependent exposures in some analyses and outcomes of interest in others. Specifically, we described a regression-based adjustment for gestational age that produces unbiased estimates of the association between maternal gestational weight gain and neonatal mortality risk. Similarly, we developed a time-to-delivery approach to assess the relationship between maternal weight gain and preterm birth within a survival framework, illustrating how several strategically timed measurements can yield unbiased risk estimates where a naïve analysis remains biased. Lastly, with recent developments in causal inference, especially the use of directed acyclic graphs (DAGs), many common epidemiologic terms, such as confounding, selection bias, and measurement error, have been more precisely defined. However, concepts such as overadjustment, specific cases of selection bias, and collinearity remained unexplored. We previously used DAGs to redefine overadjustment bias and truncation. Collinearity is another loosely defined term with broad impact in epidemiologic studies, where the convention is to delete or combine variables that are highly correlated (i.e., collinear). We succinctly showed the consequences of collinearity in linear and logistic regression in three fundamental causal scenarios: intermediates, confounders, and colliders.
Closed-form solutions for linear regression and simulation results for logistic regression characterized the bias and variance of total effect estimates; these results challenged the dogma of variable reduction and instead advocated a focus on a properly specified model, under which unbiased results can be achieved even with near-perfect correlation between the exposure and a given intermediate, confounder, or collider. With the increasing use of multiplex assays, interest in the exposome, and concerns over environmental chemical mixtures, these findings highlight a critical need to consider the causal framework rather than deleting variables for statistical convenience. We will continue to generate new methodologies that are born of real-world problems and that are cost-efficient and statistically principled while incorporating knowledge of the etiologic and measurement processes underlying most biomarkers.
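The confounder scenario can be illustrated with a small simulation. It is limited to linear regression with one exposure X, one confounder Z, and Gaussian errors; the data-generating values and the names `simulate_confounder`, `centered_cross`, and `fit` are hypothetical and are not the project's derivation. With corr(X, Z) = 0.95, the correctly specified model Y ~ X + Z still recovers the true exposure coefficient, whereas deleting the collinear confounder shifts the slope by gamma * rho.

```python
import random


def simulate_confounder(n, rho, beta, gamma, seed=0):
    """Confounder DAG Z -> X, Z -> Y, X -> Y with Var(X) = Var(Z) = 1, corr(X, Z) = rho."""
    rng = random.Random(seed)
    xs, zs, ys = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        x = rho * z + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        y = beta * x + gamma * z + rng.gauss(0, 1)
        xs.append(x)
        zs.append(z)
        ys.append(y)
    return xs, zs, ys


def centered_cross(u, v):
    """Sum of cross-products of mean-centered u and v."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def fit(xs, zs, ys):
    """Return (coefficient of X in OLS Y ~ X + Z, slope of OLS Y ~ X)."""
    sxx, szz, sxz = centered_cross(xs, xs), centered_cross(zs, zs), centered_cross(xs, zs)
    sxy, szy = centered_cross(xs, ys), centered_cross(zs, ys)
    # Solve the 2x2 normal equations for the centered predictors.
    adjusted = (sxy * szz - szy * sxz) / (sxx * szz - sxz ** 2)
    # "Variable reduction": drop the collinear confounder Z.
    unadjusted = sxy / sxx
    return adjusted, unadjusted


xs, zs, ys = simulate_confounder(20000, rho=0.95, beta=1.0, gamma=2.0)
print(fit(xs, zs, ys))  # adjusted near 1.0, unadjusted near 1.0 + 2.0 * 0.95
```

The price of keeping Z is precision rather than bias: in this setup the variance of the adjusted coefficient is inflated by roughly 1 / (1 - rho**2), about 10 for rho = 0.95, which a larger sample can offset, whereas the bias from deleting Z does not shrink with n.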

Support Year: 14
Fiscal Year: 2016
Name: U.S. National Inst/Child Hlth/Human Dev
Perkins, Neil J; Cole, Stephen R; Harel, Ofer et al. (2018) Principled Approaches to Missing Data in Epidemiologic Studies. Am J Epidemiol 187:568-575
Zhang, Wei; Liu, Aiyi; Albert, Paul S et al. (2018) A pooling strategy to effectively use genotype data in quantitative traits genome-wide association studies. Stat Med 37:4083-4095
Vernet, Céline; Philippat, Claire; Calafat, Antonia M et al. (2018) Within-Day, Between-Day, and Between-Week Variability of Urinary Concentrations of Phenol Biomarkers in Pregnant Women. Environ Health Perspect 126:037005
Van Domelen, Dane R; Mitchell, Emily M; Perkins, Neil J et al. (2018) Logistic regression with a continuous exposure measured in pools and subject to errors. Stat Med 37:4007-4021
Pollack, Anna Z; Mumford, Sunni L; Krall, Jenna R et al. (2018) Exposure to bisphenol A, chlorophenols, benzophenones, and parabens in relation to reproductive hormones in healthy women: A chemical mixture approach. Environ Int 120:137-144
Schildcrout, Jonathan S; Schisterman, Enrique F; Aldrich, Melinda C et al. (2018) Outcome-related, Auxiliary Variable Sampling Designs for Longitudinal Binary Data. Epidemiology 29:58-66
Sjaarda, Lindsey A; Ahrens, Katherine A; Kuhr, Daniel L et al. (2018) Pilot study of placental tissue collection, processing, and measurement procedures for large scale assessment of placental inflammation. PLoS One 13:e0197039
Schildcrout, Jonathan S; Schisterman, Enrique F; Mercaldo, Nathaniel D et al. (2018) Extending the Case-Control Design to Longitudinal Data: Stratified Sampling Based on Repeated Binary Outcomes. Epidemiology 29:67-75
Ananth, Cande V; Schisterman, Enrique F (2018) Reply. Am J Obstet Gynecol 218:366-367
Harel, Ofer; Mitchell, Emily M; Perkins, Neil J et al. (2018) Multiple Imputation for Incomplete Data in Epidemiologic Studies. Am J Epidemiol 187:576-584

Showing the most recent 10 out of 101 publications