Bayesian Variable Selection in Generalized Linear Models with Missing Variables

Yang, Xiaowei

Abstract

A major challenge for biomedical research comes from missing values, which may arise for subjective reasons (e.g., nonresponse and dropout) or technical ones (e.g., censoring above or below a quantization level). Generalized linear models (GLMs) and generalized linear mixed models (GLMMs) are widely applied in biomedical data analysis, where a fundamental task is to identify a subset of independent variables (e.g., genetic, proteomic, behavioral, or environmental factors) to interpret or predict a dependent variable (e.g., therapeutic effectiveness and safety). Given an incomplete data set, practitioners may resort to case deletion, in which individuals are excluded from consideration if they are missing any of the variables targeted for analysis. This method not only sacrifices useful information but can also give rise to biased estimates, because it relies on strong assumptions about the missingness mechanism. A more satisfactory solution involves multiple imputation, where several imputations are created for the same set of missing values. Across multiply imputed data sets, however, traditional variable selection methods (based on significance tests or likelihood criteria) often yield models with different selected predictors, which raises the problem of how to combine the models to make final inferences. In this R01 proposal, we aim to develop alternative strategies of variable selection for GLMs with missing values by drawing on a Bayesian framework. The first strategy, "impute, then select" (ITS), involves initially performing multiple imputation and then applying Bayesian variable selection to the multiply imputed data sets.
The second strategy, "simultaneously impute and select" (SIAS), conducts Bayesian variable selection and missing data imputation simultaneously within a single Markov chain Monte Carlo (MCMC) process. ITS and SIAS offer two generic frameworks within which various Bayesian variable selection algorithms and missing data imputation algorithms can be implemented. The strategies will be extended to handle complex data sets, such as those with multi-level design structures and/or a large number of variables. The strategies will be developed, evaluated, and implemented in an R library for normal, binomial/multinomial, and Poisson regression models with mixed categorical and continuous explanatory variables. Simulated and practical data sets from studies on childhood autism and drug dependence will be used to assess the effectiveness and flexibility of the proposed strategies.
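
To make the "impute, then select" idea concrete, the following is a minimal sketch for a normal linear model, written in Python with NumPy rather than the R library the proposal describes. It is not the applicants' algorithm. Everything in it is an assumption made for illustration: the helper names (impute_once, inclusion_probs, impute_then_select), the simulated data, the stochastic regression imputation of a single incomplete covariate, and the use of BIC weights over all predictor subsets as a stand-in for a full Bayesian variable selection sampler. Inclusion probabilities are computed on each imputed data set and then averaged, which sidesteps the problem of combining models that select different predictors.

```python
import itertools
import numpy as np

def impute_once(X, y, rng):
    """Fill missing values in column 0 by stochastic regression imputation.

    Assumption: only column 0 has missing entries. The imputation model
    regresses column 0 on the other covariates and the outcome y.
    """
    X = X.copy()
    miss = np.isnan(X[:, 0])
    Z = np.column_stack([np.ones(len(X)), X[:, 1:], y])
    beta, *_ = np.linalg.lstsq(Z[~miss], X[~miss, 0], rcond=None)
    resid = X[~miss, 0] - Z[~miss] @ beta
    sigma = resid.std(ddof=Z.shape[1])
    # Add residual noise so that imputations differ across draws.
    X[miss, 0] = Z[miss] @ beta + rng.normal(0.0, sigma, miss.sum())
    return X

def inclusion_probs(X, y):
    """Approximate posterior inclusion probabilities by enumerating all
    subsets of predictors and weighting each model by exp(-BIC / 2)."""
    n, p = X.shape
    models, bics = [], []
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            D = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
            beta, *_ = np.linalg.lstsq(D, y, rcond=None)
            rss = np.sum((y - D @ beta) ** 2)
            bics.append(n * np.log(rss / n) + D.shape[1] * np.log(n))
            models.append(subset)
    bics = np.array(bics)
    weights = np.exp(-0.5 * (bics - bics.min()))
    weights /= weights.sum()
    probs = np.zeros(p)
    for w, subset in zip(weights, models):
        for j in subset:
            probs[j] += w
    return probs

def impute_then_select(X_miss, y, n_imputations=20, seed=0):
    """Impute several times, select on each completed data set, then
    average the inclusion probabilities across imputations."""
    rng = np.random.default_rng(seed)
    per_imputation = [
        inclusion_probs(impute_once(X_miss, y, rng), y)
        for _ in range(n_imputations)
    ]
    return np.mean(per_imputation, axis=0)

# Simulated example: predictors 0 and 2 matter, 1 and 3 do not.
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 4))
y = 1.5 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(size=n)
X_miss = X.copy()
X_miss[rng.random(n) < 0.3, 0] = np.nan  # about 30% missing completely at random

pooled = impute_then_select(X_miss, y)
print(np.round(pooled, 3))
```

A proper multiple imputation procedure would also draw the imputation model's parameters from their posterior, and a SIAS-style approach would instead update the missing values and the model indicators inside one MCMC loop. Both are omitted here to keep the sketch short.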

Public Health Relevance

Missing data are the norm when developing large data sets. The issue comes to the forefront when large data sets are used to develop personalized and individualized care. To avoid the loss of data caused by discarding incomplete cases, and to provide better predictions of risk and benefit, an imputation-based Bayesian variable selection strategy provides a powerful analytical tool. The availability of our new method and software package will greatly enhance the capacity and quality of medical research and healthcare delivery.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: Eunice Kennedy Shriver National Institute of Child Health & Human Development (NICHD)
Type: Research Project (R01)
Project #: 5R01HD061404-02
Application #: 8317303
Study Section: Special Emphasis Panel (ZRG1-AARR-F (52))
Program Officer: King, Rosalind B

Project Start: 2011-08-11
Project End: 2012-08-27
Budget Start: 2012-05-01
Budget End: 2012-08-27
Support Year: 2
Fiscal Year: 2012
Total Cost: $96,862
Indirect Cost: $24,229

Institution

Name: University of California Davis
Department: Public Health & Prev Medicine
Type: Schools of Medicine
DUNS #: 047120084

City: Davis
State: CA
Country: United States
Zip Code: 95618

Related projects


NIH 2013 R01 HD	Bayesian Variable Selection in Generalized Linear Models with Missing Variables	Yang, Xiaowei / Hunter College	$229,953
NIH 2012 R01 HD	Bayesian Variable Selection in Generalized Linear Models with Missing Variables	Yang, Xiaowei / University of California Davis	$96,862
NIH 2012 R01 HD	Bayesian Variable Selection in Generalized Linear Models with Missing Variables	Yang, Xiaowei / Hunter College	$95,377
NIH 2011 R01 HD	Bayesian Variable Selection in Generalized Linear Models with Missing Variables	Yang, Xiaowei / University of California Davis	$192,688

Publications

Kim, Soeun; Belin, Thomas R; Sugar, Catherine A (2016) Multiple imputation with non-additively related variables: Joint-modeling and approximations. Stat Methods Med Res :

Kim, Soeun; Sugar, Catherine A; Belin, Thomas R (2015) Evaluating model-based imputation methods for missing covariates in regression models with interactions. Stat Med 34:1876-88

Zhang, Xiaoshuai; Xue, Fuzhong; Liu, Hong et al. (2014) Integrative Bayesian variable selection with gene-based informative priors for genome-wide association studies. BMC Genet 15:130

Peng, Bin; Zhu, Dianwen; Ander, Bradley P et al. (2013) An integrative framework for Bayesian variable selection with informative priors for identifying genes and pathways. PLoS One 8:e67672

Zhang, Xiaoshuai; Yang, Xiaowei; Yuan, Zhongshang et al. (2013) A PLSPM-based test statistic for detecting gene-gene co-association in genome-wide association study with case-control design. PLoS One 8:e62129
