Project Background: Statistical inference in the medical research used to inform VA policy decisions, healthcare initiatives, and patient care frequently requires the use of complex regression models. Regression model diagnostics are therefore critically important in the analysis of these research studies, both to establish the appropriateness of the models on which inference is based and to protect the validity and trustworthiness of the inferences drawn from them. A sizable body of statistical theory and methodology exists for linear regression model diagnostics, and it functions well. Much of the diagnostic theory and methods for generalized linear models, such as logistic regression and Poisson regression, are direct translations and modifications of the residual-based diagnostic theory for linear models. However, several of the residual-based diagnostic methods may not perform as well for generalized linear models as they do for linear models.

Project Objectives: The proposed research will develop a non-residual-based statistical theory for generalized linear regression model diagnostics that will address many of the shortcomings of current residual-based methods.

Project Methods: For a generalized linear regression, the proposed research will demonstrate that the outcome and the predictors are independent given the correct regression function. Hence, if the regression function is well specified, the outcome and the predictors will appear independent given the value of the estimated regression function. This simple result opens numerous possibilities for diagnostic techniques. For example, with an estimated regression function close to the true regression function, simple scatterplots of the outcome against the predictors, conditional on the estimated regression function, should exhibit independence. We will use asymptotic theory for likelihood estimates under misspecified models, together with other areas of statistical theory, to develop simple graphical methods for assessing the fit of generalized linear models. Preliminary mathematical results indicate that these simple scatterplots, together with smoothing and aggregation of the plots, can diagnose the omission of interactions and transformations of the predictors from the regression function. In addition, the proposed research will investigate the use of moment generating functions, aggregation of p-values for within-strata tests, and stratified nonparametric tests to develop formal tests for lack of fit for generalized linear regression models.

Importance to VA: By developing methods that lead to improved, more reliable inference in epidemiological, clinical, and health services research, the proposed study will lead to more soundly established medical interventions and health programs that will directly impact veterans' health. Over the course of numerous such research studies, the cumulative indirect impact of this research could be substantial.
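The stratified diagnostic idea described under Project Methods can be illustrated with a small simulation. The sketch below is not the proposal's method; it is a minimal illustration under assumed choices: a logistic model with two simulated predictors, an omitted squared term as the misspecification, 20 equal-size strata of the fitted linear predictor, and a plain within-stratum correlation as the summary. All names (`fit_logistic`, `mean_within_stratum_corr`, the simulated coefficients) are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson; returns the coefficients."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)                # score vector
        hess = X.T @ (X * w[:, None])       # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def mean_within_stratum_corr(eta_hat, y, z, n_strata=20):
    """Average correlation of y with z inside equal-size strata of eta_hat."""
    order = np.argsort(eta_hat)
    corrs = []
    for idx in np.array_split(order, n_strata):
        ys, zs = y[idx], z[idx]
        if ys.std() == 0 or zs.std() == 0:
            continue                        # correlation undefined in this stratum
        corrs.append(np.corrcoef(ys, zs)[0, 1])
    return float(np.mean(corrs))


# Simulated data (assumed for illustration): the true logit depends on x1
# and on the centered square of x2.
rng = np.random.default_rng(0)
n = 20_000
x1, x2 = rng.standard_normal((2, n))
x2_sq = x2**2 - 1.0
p_true = 1.0 / (1.0 + np.exp(-(x1 + x2_sq)))
y = (rng.random(n) < p_true).astype(float)

ones = np.ones(n)
X_misspecified = np.column_stack([ones, x1, x2])   # omits the squared term
X_correct = np.column_stack([ones, x1, x2_sq])     # matches the true model

eta_misspecified = X_misspecified @ fit_logistic(X_misspecified, y)
eta_correct = X_correct @ fit_logistic(X_correct, y)

# Conditional on the fitted regression function, does y still vary with x2**2?
corr_misspecified = mean_within_stratum_corr(eta_misspecified, y, x2_sq)
corr_correct = mean_within_stratum_corr(eta_correct, y, x2_sq)

print(f"misspecified model: mean within-stratum corr = {corr_misspecified:.3f}")
print(f"correct model:      mean within-stratum corr = {corr_correct:.3f}")
```

Under these assumptions, the outcome should remain visibly associated with the omitted transformation inside strata of the misspecified fit, and show little association inside strata of the correctly specified fit. Because the strata have finite width, a small residual association can remain even for the correct model.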

Public Health Relevance

The VA uses statistical inference from epidemiologic, clinical, and health services research studies to inform policy decisions and healthcare initiatives. Statistical inference in epidemiology and health services research, as well as other areas of medical research, frequently requires use of complex generalized linear regression models. The proposed research will develop new theory and sets of methodologies for assessing the fit of these regression models. This theory will stem from foundational results establishing the conditional independence of an outcome and the predictors in the model given the values of the correctly specified regression function. These methods will be helpful to researchers in establishing the validity of the models used for statistical inference.
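
The conditional independence statement above can be written compactly. The notation here is assumed for illustration and does not come from the proposal: Y is the outcome, X the predictor vector, g the link function, β the coefficient vector, and η(X) the regression function. The sketch covers the standard setting in which the conditional distribution of the outcome depends on the predictors only through η(X), with any dispersion parameter held fixed. It requires the amsmath and amssymb packages.

```latex
\begin{align*}
g\bigl(\mathbb{E}[Y \mid X]\bigr) &= \eta(X) = X^{\top}\beta, \\
f(y \mid X = x) &= f\bigl(y;\, \eta(x)\bigr) \quad \text{for all } x, \\
\text{hence}\quad Y &\perp\!\!\!\perp X \mid \eta(X).
\end{align*}
```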

National Institutes of Health (NIH)
Veterans Affairs (VA)
Non-HHS Research Projects (I01)
Study Section
HSR-3 Informatics and Research Methods Development (HSR3)
Minneapolis VA Medical Center
United States