This project develops and refines statistical procedures for evaluating and selecting regression models. Two major areas are under investigation: multistep variable selection in multiple regression analysis, and discrimination among alternative model specifications, including non-nested classes of models. Although variable selection procedures are widely applied in biostatistics and epidemiology, e.g., in the analysis of case-control and cohort studies with many potential risk factors, their statistical properties are poorly understood. Subset selection involves two kinds of inference: 1) fitting the data according to a tentative model and 2) assessing the fit and, if possible, replacing the current model with a new one that fits better. Conventional assessment criteria, such as Mallows' Cp or AIC, were derived under the assumption that the subset under evaluation was chosen a priori, without reference to the data. Because model selection violates this assumption, the conventional criteria often lead to false inferences about the chosen subset and so fail to produce a reliable model at the end of subset selection. Several bootstrap-like assessment criteria that allow for the selection effect have been studied using theory and Monte Carlo simulations. The results show that although direct application of the bootstrap idea does not produce good results, some modified bootstrap assessment criteria have much better statistical properties than the conventional ones and can be applied successfully in practice. A FORTRAN program that provides subset selection criteria based on different modified bootstrap approaches has been developed jointly with D. Midthune.
Many alternative model specifications in applied biostatistical studies contain non-nested classes. Methodology for discriminating among non-nested models based on a nonparametric relevancy criterion has been refined and applied to the problem of choosing a change point, defined as the point that separates two or more data subsets following different regression models. Extensive computer simulations have demonstrated that the resulting criterion has better statistical properties than many known procedures.
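
The selection effect described in the abstract can be illustrated with a minimal sketch. It is not the project's FORTRAN program and not one of its modified bootstrap criteria, whose details the abstract does not give. It assumes a deliberately simple setting: pure-noise data, selection of the single best predictor, and a generic optimism-style bootstrap in which the selection step is repeated inside every resample. All function names (`fit_simple`, `select_best`, `bootstrap_optimism`, `make_noise_data`) are hypothetical and introduced only for illustration.

```python
import random


def fit_simple(x, y):
    """Least-squares (intercept, slope) for a single predictor."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0  # guard against a constant column
    return my - slope * mx, slope


def mse(x, y, coef):
    """Mean squared residual of the fitted line on (x, y)."""
    b0, b1 = coef
    return sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y)) / len(y)


def select_best(X, y):
    """Data-driven selection: the column with the smallest residual MSE.

    Returns (column index, coefficients, apparent MSE).
    """
    best = None
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        coef = fit_simple(col, y)
        err = mse(col, y, coef)
        if best is None or err < best[2]:
            best = (j, coef, err)
    return best


def bootstrap_optimism(X, y, n_boot=200, seed=0):
    """Average gap between original-data error and resample error.

    Selection is redone in every resample, so the estimate reflects
    the selection step and not only the coefficient fitting.
    """
    rng = random.Random(seed)
    n = len(y)
    total = 0.0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        j, coef, err_boot = select_best(Xb, yb)
        err_orig = mse([row[j] for row in X], y, coef)
        total += err_orig - err_boot
    return total / n_boot


def make_noise_data(n, p, seed=0):
    """Hypothetical test data: predictors and response are independent noise."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]
    return X, y


if __name__ == "__main__":
    X, y = make_noise_data(30, 10, seed=1)
    j, coef, apparent = select_best(X, y)
    optimism = bootstrap_optimism(X, y)
    print("selected column:", j)
    print("apparent MSE:", apparent)
    print("optimism-corrected MSE:", apparent + optimism)
```

Because no predictor is related to the response here, any gain the selected predictor appears to offer is an artifact of selection, and the corrected error is expected to exceed the apparent one. The abstract notes that a direct bootstrap of this kind does not perform well, so the sketch only shows what allowing for the selection effect means, not a recommended criterion.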

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Intramural Research (Z01)
Project #
1Z01CN000187-04
Application #
6161608
Study Section
Special Emphasis Panel (BB)
Support Year
4
Fiscal Year
1997
Name
Division of Cancer Prevention and Control
Country
United States