9704809 Foster and Stine The research studies the use of information theory for selecting statistical models. The first component of the research characterizes statistical model selection criteria as methods for data compression. From this viewpoint, criteria for model selection choose the model which minimizes the length of a compressed version of the observed data. The proposed research will allow one to make these characterizations precise and to extend this paradigm. The second area of research exploits the relationship between model selection and data compression, leading to methods of adaptive model selection. This research builds on and extends context trees, which are the underlying statistical devices used to obtain some of the most effective data compression algorithms. The third body of research concerns the statistical properties of the information theoretic estimators implied by these techniques. In order to allow one to estimate such statistical properties from observed data, the research also seeks to develop practical bootstrap resampling methods to determine the accuracy of the statistical estimates. Each phase of the proposed research combines statistical theory and information theory with computing. In addition to the use of simulations to validate the proposed methods, the research will provide publicly available software distributed via the Web that implements the various selection criteria and associated modeling techniques. Further numerical simulations will benchmark the various large sample results, measuring the practical performance of these procedures.

Given the increasing prevalence of large data sets in science and economics, traditional statistical model building faces new challenges. The use of traditional variable selection methods leads to overfitting, characterized by overly complex models that result from chance variation. This research attacks this problem in several ways, first by using information theory to construct a common framework in which to study the latest developments in statistical model identification. Building on this framework, the research expands the scope of model selection to encompass models that can adapt to structural changes. In addition, this work develops the associated statistical properties of these methods, allowing comparison to classical procedures and practical use in real-world applications.
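
The view of model selection as data compression described in the abstract can be illustrated with a small sketch. This is not the investigators' software or method; it is a generic two-part code length comparison, assuming a binary sequence, a fixed fair-coin model with no fitted parameters, and a fitted Bernoulli model whose single parameter is charged (1/2) log2 n bits. The function names are hypothetical and chosen only for this illustration.

```python
import math


def code_length_fixed(bits):
    """Bits needed under a fair-coin model: one bit per observation."""
    return float(len(bits))


def code_length_fitted(bits):
    """Two-part code length for a Bernoulli model with estimated p.

    Assumes a non-empty sequence. The parameter cost of (1/2) log2 n bits
    is a common convention, used here as an illustrative assumption.
    """
    n = len(bits)
    k = sum(bits)
    p = k / n
    data_cost = 0.0
    # Cost of the data given the fitted p; zero-count terms contribute nothing.
    for count, prob in ((k, p), (n - k, 1 - p)):
        if count > 0:
            data_cost -= count * math.log2(prob)
    param_cost = 0.5 * math.log2(n)
    return param_cost + data_cost


def select_model(bits):
    """Return the name of the model with the shorter total code length."""
    lengths = {
        "fixed": code_length_fixed(bits),
        "fitted": code_length_fitted(bits),
    }
    return min(lengths, key=lengths.get), lengths


if __name__ == "__main__":
    skewed = [1] * 90 + [0] * 10
    balanced = [1, 0] * 50
    print(select_model(skewed)[0])
    print(select_model(balanced)[0])
```

On a strongly skewed sequence the fitted model compresses the data enough to pay for its parameter, while on a balanced sequence the extra parameter only adds cost, which is the overfitting penalty in miniature.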

Agency
National Science Foundation (NSF)
Institute
Division of Mathematical Sciences (DMS)
Type
Standard Grant (Standard)
Application #
9704809
Program Officer
Dean M Evasius
Project Start
Project End
Budget Start
1997-09-15
Budget End
1999-08-31
Support Year
Fiscal Year
1997
Total Cost
$104,000
Indirect Cost
Name
University of Pennsylvania
Department
Type
DUNS #
City
Philadelphia
State
PA
Country
United States
Zip Code
19104