Bayesian Variable Selection in Generalized Linear Models with Missing Variables

Yang, Xiaowei

Abstract

In medical research, especially research on behavioral and social problems, missing values pose a challenge for statistical data analysis. Missing values may arise for subjective reasons (e.g., nonresponse and dropout) or technical reasons (e.g., censoring above or below a quantization level). Generalized linear models (GLMs) are widely applied in biomedical data analysis, where a fundamental task is to explain or predict an outcome variable using a subset of potentially explanatory variables. Given an incomplete data set, practitioners frequently resort to case-deletion, in which individuals are excluded from consideration if they are missing any of the variables targeted for analysis. This is the default option in many software packages. Yet case-deletion may not only sacrifice useful information but also give rise to biased estimates, because it requires strong assumptions about the missingness mechanism. A more satisfactory solution for missing data problems is multiple imputation, in which several imputations are created for the same set of missing values. The variance between imputations reflects the uncertainty due to missingness. Across multiply imputed data sets, however, traditional variable selection methods (based on significance tests or various criteria) often yield models with different selected predictors, which raises the problem of how to combine the models to make final inferences. In this R01 proposal with a 3-year research plan, we aim to develop two alternative strategies of variable selection for GLMs with missing values by drawing on a Bayesian framework. The first strategy, which we call "impute, then select" (ITS), involves initially performing multiple imputation and then applying Bayesian variable selection to the multiply imputed data sets.
The second strategy, "simultaneously impute and select" (SIAS), is to conduct Bayesian variable selection and missing data imputation simultaneously within one Markov chain Monte Carlo (MCMC) process. ITS and SIAS offer two generic frameworks within which various Bayesian variable selection algorithms and missing data imputation algorithms can be implemented. Both strategies will be developed, evaluated, and implemented in an R library for normal regression, binomial regression, and other GLMs with categorical and/or continuous explanatory variables. Practical data sets from several studies on substance abuse and childhood autism will be used to assess the effectiveness and flexibility of the proposed strategies. Developing these procedures and making the software available to statisticians and medical researchers would significantly improve the quality of evaluation of important and clinically relevant data.
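
As a rough illustration of the ITS idea described above, the sketch below first creates several imputed data sets and then averages variable inclusion probabilities across them. It is not the investigators' implementation (which is planned as an R library). It assumes a normal linear model, uses a simplified stochastic regression imputation as a stand-in for a proper imputation model, and uses BIC weights over all predictor subsets as an approximation to posterior model probabilities. The function names `impute_once`, `inclusion_probs`, and `impute_then_select` are hypothetical.

```python
import itertools
import numpy as np


def impute_once(X, y, rng):
    """Simplified stochastic regression imputation (assumed stand-in).

    Each column with NaNs is regressed on y and the other columns
    (temporarily mean-filled); missing entries are drawn from the fitted
    normal regression. Parameter uncertainty is ignored for brevity.
    """
    X_imp = X.copy()
    filled = np.where(np.isnan(X), np.nanmean(X, axis=0), X)
    for j in range(X.shape[1]):
        miss = np.isnan(X[:, j])
        if not miss.any():
            continue
        Z = np.column_stack([np.ones(len(y)), y, np.delete(filled, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z[~miss], X[~miss, j], rcond=None)
        resid = X[~miss, j] - Z[~miss] @ beta
        sigma = resid.std(ddof=Z.shape[1])
        X_imp[miss, j] = Z[miss] @ beta + rng.normal(0.0, sigma, miss.sum())
    return X_imp


def inclusion_probs(X, y):
    """Inclusion probabilities from BIC-weighted enumeration of all subsets."""
    n, p = X.shape
    models, bics = [], []
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            Z = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
            beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
            rss = np.sum((y - Z @ beta) ** 2)
            bics.append(n * np.log(rss / n) + Z.shape[1] * np.log(n))
            models.append(subset)
    bics = np.array(bics)
    weights = np.exp(-0.5 * (bics - bics.min()))  # approximate model probabilities
    weights /= weights.sum()
    probs = np.zeros(p)
    for w, subset in zip(weights, models):
        for j in subset:
            probs[j] += w
    return probs


def impute_then_select(X, y, m=5, seed=0):
    """ITS sketch: impute m times, select within each, average the results."""
    rng = np.random.default_rng(seed)
    per_imputation = [inclusion_probs(impute_once(X, y, rng), y) for _ in range(m)]
    return np.mean(per_imputation, axis=0)


# Simulated example: only the first two predictors affect the outcome,
# and about 20% of the second predictor is missing completely at random.
rng = np.random.default_rng(1)
X_full = rng.normal(size=(200, 4))
y = 2.0 * X_full[:, 0] - 1.5 * X_full[:, 1] + rng.normal(size=200)
X_obs = X_full.copy()
X_obs[rng.random(200) < 0.2, 1] = np.nan
probs = impute_then_select(X_obs, y, m=5)
print(np.round(probs, 3))
```

Averaging inclusion probabilities is only one possible way to combine results across imputed data sets; it is used here because it sidesteps the problem of reconciling different selected models. A SIAS-style approach would instead update the imputations and the model indicators within a single MCMC run.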

Public Health Relevance

Variable selection in generalized linear models (GLMs) is a fundamental task, and missing values are common in biomedical research. The proposed method of Bayesian variable selection within multiple imputation overcomes the limitations of traditional variable selection methods, especially in handling missing values. The resulting methodology and software will provide the research community with powerful statistical tools to enhance the quality of medical research.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: Eunice Kennedy Shriver National Institute of Child Health & Human Development (NICHD)
Type: Research Project (R01)
Project #: 5R01HD061404-04
Application #: 8471550
Study Section: Special Emphasis Panel (ZRG1-AARR-F (52))
Program Officer: King, Rosalind B

Project Start: 2011-08-11
Project End: 2014-04-30
Budget Start: 2013-05-01
Budget End: 2014-04-30
Support Year: 4
Fiscal Year: 2013
Total Cost: $229,953
Indirect Cost: $79,657

Institution

Name: Hunter College
Department
Type: Schools of Public Health
DUNS #: 620127915

City: New York
State: NY
Country: United States
Zip Code: 10065

Related projects


NIH 2013 R01 HD	Bayesian Variable Selection in Generalized Linear Models with Missing Variables Yang, Xiaowei / Hunter College	$229,953
NIH 2012 R01 HD	Bayesian Variable Selection in Generalized Linear Models with Missing Variables Yang, Xiaowei / University of California Davis	$96,862
NIH 2012 R01 HD	Bayesian Variable Selection in Generalized Linear Models with Missing Variables Yang, Xiaowei / Hunter College	$95,377
NIH 2011 R01 HD	Bayesian Variable Selection in Generalized Linear Models with Missing Variables Yang, Xiaowei / University of California Davis	$192,688

Publications

Kim, Soeun; Belin, Thomas R; Sugar, Catherine A (2016) Multiple imputation with non-additively related variables: Joint-modeling and approximations. Stat Methods Med Res :

Kim, Soeun; Sugar, Catherine A; Belin, Thomas R (2015) Evaluating model-based imputation methods for missing covariates in regression models with interactions. Stat Med 34:1876-88

Zhang, Xiaoshuai; Xue, Fuzhong; Liu, Hong et al. (2014) Integrative Bayesian variable selection with gene-based informative priors for genome-wide association studies. BMC Genet 15:130

Peng, Bin; Zhu, Dianwen; Ander, Bradley P et al. (2013) An integrative framework for Bayesian variable selection with informative priors for identifying genes and pathways. PLoS One 8:e67672

Zhang, Xiaoshuai; Yang, Xiaowei; Yuan, Zhongshang et al. (2013) A PLSPM-based test statistic for detecting gene-gene co-association in genome-wide association study with case-control design. PLoS One 8:e62129
