This project develops and refines statistical procedures for evaluating and selecting regression models. Two major areas are under investigation: multi-step variable selection in multiple regression analysis, and discrimination among alternative model specifications, including non-nested classes of models. Although variable selection procedures are widely applied in biostatistics and epidemiology, e.g., in the analysis of case-control and cohort studies with many potential risk factors, their statistical properties are poorly understood. Exploratory model selection affects the statistical properties of conventional estimators for the chosen model and, in particular, can bias them substantially. The problem of estimating the mean squared error of prediction (MSEP) for a model chosen by a subset selection procedure has been considered. Several bootstrap-type estimators, both parametric and nonparametric, that allow for the selection effect have been studied using theory and Monte Carlo simulations. The results show that although direct application of the bootstrap does not produce good results, some modified bootstrap estimators have much better statistical properties than conventional ones and can be applied successfully in practice. A FORTRAN program has been developed jointly with D. Midthune to evaluate MSEP using the different modified bootstrap approaches. Many alternative model specifications in applied biostatistical studies involve non-nested classes, and a methodology has been developed for discriminating among non-nested models. The approach is based on a nonparametric relevancy criterion that evaluates each model by comparing its performance on the observed data with its performance on generated pseudo-data. Computer simulations have demonstrated that the criterion has better statistical properties than many known procedures. Current research, conducted jointly with D. Midthune, includes applications of this approach to discriminating among different linear and non-linear regression models.
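
To make the idea of a bootstrap MSEP estimate that allows for the selection effect concrete, the following is a minimal sketch in Python. It is not the project's FORTRAN program, and it does not reproduce the modified estimators studied in the project. It uses a generic nonparametric optimism-corrected bootstrap in which the selection step is repeated inside every resample. The selection rule (keep the k predictors most correlated with the response) and all function names are hypothetical, chosen only for illustration.

```python
import numpy as np

def select_and_fit(X, y, k):
    """Hypothetical selection rule: keep the k predictors most correlated
    with y, then fit ordinary least squares with an intercept."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Quantity proportional to the absolute correlation of each column with y.
    score = np.abs(Xc.T @ yc) / np.sqrt((Xc ** 2).sum(axis=0))
    cols = np.sort(np.argsort(score)[-k:])
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return cols, beta

def predict(X, cols, beta):
    """Predictions from the selected submodel."""
    return beta[0] + X[:, cols] @ beta[1:]

def bootstrap_msep(X, y, k, n_boot=200, seed=0):
    """Return (apparent error, optimism-corrected MSEP estimate).

    The apparent error is the in-sample mean squared residual of the
    selected model. The correction averages, over bootstrap resamples,
    the gap between the error of a resample-selected model on the
    original data and its error on the resample itself."""
    rng = np.random.default_rng(seed)
    n = len(y)
    cols, beta = select_and_fit(X, y, k)
    apparent = np.mean((y - predict(X, cols, beta)) ** 2)
    optimism = 0.0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample cases with replacement
        Xb, yb = X[idx], y[idx]
        cols_b, beta_b = select_and_fit(Xb, yb, k)  # selection redone per resample
        err_boot = np.mean((yb - predict(Xb, cols_b, beta_b)) ** 2)
        err_orig = np.mean((y - predict(X, cols_b, beta_b)) ** 2)
        optimism += (err_orig - err_boot) / n_boot
    return apparent, apparent + optimism
```

Calling `bootstrap_msep(X, y, k=3)` on an `n × p` predictor array `X` and a response vector `y` returns both the apparent error and the corrected estimate. Redoing the selection inside the loop is the design choice that matters here: if the subset were chosen once and only the coefficients were refitted in each resample, the estimate would ignore the variability introduced by selection.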

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Intramural Research (Z01)
Project #
1Z01CN000187-02
Application #
5201424
Study Section
Project Start
Project End
Budget Start
Budget End
Support Year
2
Fiscal Year
1995
Total Cost
Indirect Cost
Name
Division of Cancer Prevention and Control
Department
Type
DUNS #
City
State
Country
United States
Zip Code