Extensions of likelihood-based sufficient dimension reduction methods were proposed and studied for analyzing biomarkers that are left and/or right censored due to lower or upper limits of detection. These methods apply to any type of outcome, including continuous and categorical outcomes. Bias in estimates of exposure effects conditional on covariates was assessed when summary scores of confounders, instead of the confounders themselves, were used to analyze observational data. Two scores, the propensity score (PS) and the disease risk score (DRS), were studied in detail. New procedures were developed for seasonal adjustment and calibration of blood measurements of vitamin D to support the multicenter international Vitamin D Pooling Project for Breast and Colorectal Cancer. These methods were used to guide the analysis of associations between blood levels of vitamin D and the risk of breast and of colorectal cancer; the findings indicate protective effects for colorectal cancer. Parametric and semi-parametric mixture models were proposed for analyzing left- or interval-censored data from electronic health records. The new approach was used to produce the risk estimates that underlie current U.S. risk-based cervical cancer screening guidelines. A multiple imputation approach based on Additive Regression, Bootstrapping and Predictive mean matching (ARBP) methods was introduced to accurately impute missing values of the step data collected in the 2003-2004 National Health and Nutrition Examination Survey (NHANES). A novel class of functional data analysis models based on hierarchical stochastic differential equations was developed to address some limitations of existing methods. An efficient Hardy-Weinberg equilibrium test was developed to analyze genetic data collected from population-based household surveys, using pairwise composite likelihood methods that incorporate the sample weighting effect and the genetic correlation induced by complex sample designs.
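For orientation, the quantity examined by a Hardy-Weinberg equilibrium test can be illustrated with the classical one-degree-of-freedom chi-square test on unweighted genotype counts. This is a minimal sketch of the textbook test only, assuming a biallelic marker and independent, equally weighted individuals; it is not the pairwise composite likelihood method described above and it ignores sample weights and within-household correlation. The function name `hwe_chi_square` and its arguments are illustrative assumptions.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Classical chi-square test of Hardy-Weinberg equilibrium.

    Assumes a biallelic marker and independent, unweighted individuals
    (an illustrative simplification, not the survey-weighted method).
    Returns (chi-square statistic, p-value on 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    if n <= 0:
        raise ValueError("at least one genotype must be observed")

    # Allele frequency of A estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("marker is monomorphic; the test is undefined")

    # Expected genotype counts under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)

    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    stat, p_value = hwe_chi_square(298, 489, 213)
    print(f"chi-square = {stat:.3f}, p-value = {p_value:.3f}")
```

Under a complex survey design the genotype counts are neither independent nor equally weighted, which is the situation the composite likelihood approach above is designed to handle.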
A general procedure was developed for conducting gene and pathway analysis that uses only SNP-level summary statistics in combination with genotype correlations estimated from a reference panel of individual-level genetic data. A family of multi-locus testing procedures was developed for detecting the composite association between a set of genetic markers and two traits, based on a random effect model with two variance components, each representing the genetic effect on one trait. A likelihood-based test was developed for mutual exclusivity analysis to detect cancer driver genes and was applied to TCGA data as well as to a DCEG lung cancer study. A statistical framework and a computationally efficient software package were developed for identifying host genetic variants associated with microbiome beta diversity, with or without interaction with an environmental factor.
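As background for the microbiome analysis mentioned above, beta diversity is usually summarized as a matrix of pairwise dissimilarities between samples. The summary does not say which dissimilarity measure is used, so the sketch below assumes the Bray-Curtis dissimilarity, one common choice, computed on non-negative taxon abundance vectors. The function names are illustrative and are not taken from the software package described.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors.

    Assumes x and y list non-negative abundances for the same taxa in
    the same order. Returns a value in [0, 1]; 0 means identical
    composition, 1 means no shared taxa.
    """
    if len(x) != len(y):
        raise ValueError("abundance vectors must have the same length")
    if any(v < 0 for v in x) or any(v < 0 for v in y):
        raise ValueError("abundances must be non-negative")
    total = sum(x) + sum(y)
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(x, y)) / total


def beta_diversity_matrix(samples):
    """Symmetric matrix of pairwise Bray-Curtis dissimilarities."""
    n = len(samples)
    return [
        [0.0 if i == j else bray_curtis(samples[i], samples[j]) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Hypothetical abundance counts for three samples over three taxa.
    samples = [[6, 7, 4], [10, 0, 6], [6, 7, 4]]
    for row in beta_diversity_matrix(samples):
        print([round(d, 3) for d in row])
```

A matrix of this kind is the outcome against which host genetic variants would be tested; the association test itself is not sketched here.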