Statistical Inference in Model Selection

Kipnis, V.

Abstract

The focus of this project is the development and refinement of statistical procedures for evaluating and selecting regression models. The problems under investigation fall into two major areas: multistep variable selection in multiple regression analysis, and discrimination among alternative model specifications, including non-nested classes of models.

Although variable selection procedures are widely used in biostatistics and epidemiology, e.g., in the analysis of case-control and cohort studies with many potential risk factors, their statistical properties are poorly understood. Subset selection involves two kinds of inference: 1) fitting the data according to a tentative model, and 2) assessing the fit and, if possible, replacing the current model with a new one that fits better. Conventional assessment criteria, such as Cp or AIC, were derived under the assumption that the subset being evaluated was chosen a priori, without reference to the data. Because model selection violates this assumption, the conventional criteria often lead to false inference about the chosen subset and therefore fail to produce a reliable model at the end of subset selection. Several bootstrap-like assessment criteria that allow for this selection effect have been studied using theory and Monte Carlo simulations. The results show that, although a direct application of the bootstrap idea does not produce good results, some modified bootstrap assessment criteria have much better statistical properties than the conventional ones and can be applied successfully in practice. A FORTRAN program, developed jointly with D. Midthune, provides subset selection criteria based on different modified bootstrap approaches.

Many alternative model specifications in applied biostatistical studies involve non-nested classes of models. A methodology for discriminating among non-nested models, based on a nonparametric relevancy criterion, has been refined and applied to the problem of choosing the change point, i.e., the point that separates two or more data subsets following different regression models. Extensive computer simulations have demonstrated that the developed criterion has better statistical properties than many known procedures.
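
The selection effect on conventional criteria can be illustrated with a short simulation. The following is a minimal Python/NumPy sketch, not the project's FORTRAN program or its modified bootstrap criteria, which the abstract does not specify. It runs forward selection by Mallows' Cp, reports the conventional Cp-type estimate of prediction error for the chosen subset (which treats that subset as fixed in advance), and compares it with a plain, unmodified bootstrap "optimism" estimate in the style of Efron that repeats the whole selection on every resample. The function names, the forward-selection rule, and the simulated data are assumptions made for illustration; per the abstract, a direct bootstrap of this kind is exactly what may not perform well without modification.

```python
import numpy as np


def design(X, cols):
    """Design matrix: intercept column plus the predictors listed in cols."""
    return np.column_stack([np.ones(X.shape[0])] + [X[:, j] for j in cols])


def fit(X, y, cols):
    """OLS coefficients for y regressed on an intercept and X[:, cols]."""
    beta, *_ = np.linalg.lstsq(design(X, cols), y, rcond=None)
    return beta


def mse(X, y, cols, beta):
    """Mean squared error of a fitted model evaluated on the data (X, y)."""
    resid = y - design(X, cols) @ beta
    return float(resid @ resid) / len(y)


def mallows_cp(X, y, cols, sigma2):
    """Cp = RSS / sigma2 - n + 2k, with k = len(cols) + 1 (the +1 is the intercept)."""
    n = len(y)
    rss = mse(X, y, cols, fit(X, y, cols)) * n
    return rss / sigma2 - n + 2 * (len(cols) + 1)


def forward_select(X, y):
    """Hypothetical multistep procedure: add predictors while Cp keeps decreasing."""
    n, p = X.shape
    full = list(range(p))
    # sigma^2 is estimated from the full model, as in the usual definition of Cp.
    sigma2 = mse(X, y, full, fit(X, y, full)) * n / (n - p - 1)
    chosen, best = [], mallows_cp(X, y, [], sigma2)
    while len(chosen) < p:
        scores = {j: mallows_cp(X, y, chosen + [j], sigma2)
                  for j in range(p) if j not in chosen}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break
        chosen.append(j_best)
        best = scores[j_best]
    return sorted(chosen), sigma2


def bootstrap_error(X, y, n_boot=100, seed=0):
    """Plain (unmodified) optimism bootstrap that repeats the selection on each resample."""
    rng = np.random.default_rng(seed)
    n = len(y)
    cols, _ = forward_select(X, y)
    apparent = mse(X, y, cols, fit(X, y, cols))
    optimism = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        Xb, yb = X[idx], y[idx]
        cols_b, _ = forward_select(Xb, yb)  # the selection step itself is resampled
        beta_b = fit(Xb, yb, cols_b)
        optimism.append(mse(X, y, cols_b, beta_b) - mse(Xb, yb, cols_b, beta_b))
    return apparent + float(np.mean(optimism))


# Simulated data (hypothetical): 10 candidate predictors, only the first is active.
rng = np.random.default_rng(1)
n, p = 50, 10
X = rng.standard_normal((n, p))
y = X[:, 0] + rng.standard_normal(n)

cols, sigma2 = forward_select(X, y)
apparent = mse(X, y, cols, fit(X, y, cols))
# Conventional Cp-type estimate of prediction error: treats `cols` as chosen a priori.
naive = apparent + 2 * (len(cols) + 1) * sigma2 / n
boot = bootstrap_error(X, y, n_boot=100, seed=2)
print("selected:", cols, "conventional:", round(naive, 3), "bootstrap:", round(boot, 3))
```

The conventional estimate adds a fixed penalty of 2k sigma^2 / n to the apparent error, ignoring that the subset was picked from the same data; the bootstrap estimate instead re-runs the selection on each resample, so the variability of the selection itself enters the optimism term. The exact numbers depend on the random seeds.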

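The abstract does not define its nonparametric relevancy criterion, so the sketch below substitutes a familiar stand-in to show what choosing a change point among non-nested specifications looks like. Each candidate change point defines a different two-segment linear model (models with different change points are not nested in one another), and the candidate with the smallest leave-one-out cross-validation error is selected. The helper names, the minimum segment size `min_seg`, and the simulated data are hypothetical; this illustrates the problem setting, not the project's criterion.

```python
import numpy as np


def loo_sse(x, y):
    """Leave-one-out CV sum of squares for a straight-line OLS fit (hypothetical helper)."""
    D = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ beta
    # Leverages h_ii are the diagonal of the hat matrix D @ pinv(D).
    h = np.einsum("ij,ji->i", D, np.linalg.pinv(D))
    # For OLS, the leave-one-out residual equals resid_i / (1 - h_ii).
    return float(np.sum((resid / (1.0 - h)) ** 2))


def choose_change_point(x, y, min_seg=5):
    """Pick the split whose two separately fitted lines have the smallest LOO error."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_k, best_score = None, np.inf
    # Each k defines a different two-segment model; these models are non-nested.
    for k in range(min_seg, len(x) - min_seg + 1):
        score = loo_sse(x[:k], y[:k]) + loo_sse(x[k:], y[k:])
        if score < best_score:
            best_k, best_score = k, score
    return float(x[best_k]), best_score  # x value where the second segment starts


# Simulated example (hypothetical): slope +1 up to x = 6, slope -2 afterwards.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 60)
y = np.where(x < 6.0, x, 6.0 - 2.0 * (x - 6.0)) + rng.standard_normal(60) * 0.3
cp, score = choose_change_point(x, y)
print("estimated change point:", round(cp, 2))
```

Cross-validation is used here only because it compares non-nested candidates on a common predictive scale without requiring one model to be a special case of another; the `min_seg` floor keeps every segment large enough for the leverages to stay below 1.
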
Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Intramural Research (Z01)
Project #: 1Z01CN000187-04
Application #: 6161608
Study Section: Special Emphasis Panel (BB)

Support Year: 4
Fiscal Year: 1997

Institution

Name: Division of Cancer Prevention and Control
Country: United States

Related projects


NIH 1997 Z01 CA	Statistical Inference in Model Selection Kipnis, V. / Division of Cancer Prevention and Control
NIH 1996 Z01 CA	Statistical Inference in Model Selection Kipnis, V. / Division of Cancer Prevention and Control
NIH 1995 Z01 CA	Statistical Inference in Model Selection Kipnis, V. / Division of Cancer Prevention and Control
NIH 1994 Z01 CA	Statistical Inference in Model Selection Kipnis, V. / Division of Cancer Prevention and Control
