Proposal: DMS 9504495
PI: Trevor Hastie
Institution: Stanford University
Title: Flexible Regression and Classification

Abstract: The research concerns several directions with a common theme: to push widely accepted but limited statistical tools in more adventurous directions, while retaining some of their attractive features, such as model interpretability. Specifically, the research involves the development of:

a) nonparametric extensions of logistic regression for multiclass responses, including additive, projection pursuit and basis expansion techniques, as well as rank-reduced models similar to Fisher's LDA;

b) a new adaptive algorithm for basis selection, similar to Friedman's MARS model, which uses a natural penalized criterion to simultaneously select variables and shrink their coefficients;

c) a technique for locally adapting the nearest-neighbor distance metric to combat the curse of dimensionality.

Many important problems in data analysis and modeling focus on prediction. Important examples include computer-assisted diagnosis of disease (e.g. reading digital mammograms), heart disease risk assessment, automatic reading of handwritten digits (e.g. zip codes on envelopes), and speech recognition. This research is about enriching the current toolbox of well-established statistical models in a natural way to address some of these more complex scenarios. New and exotic techniques, such as neural networks, are often ``black boxes'' that appear to produce good results, but do not provide the analyst with an interpretable model, diagnostics, or similar feedback to give them confidence that the box has produced sensible results. Statistics can play an active role in these important prediction and data analysis problems through the development of competitive and defensible models. This research does just that by blending well-understood classical techniques with newer techniques that allow for model exploration.
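To make the idea in b) concrete, the sketch below shows how a penalized criterion can simultaneously select basis terms and shrink their coefficients. An L1 (lasso-type) penalty on a fixed polynomial basis stands in here for the proposal's MARS-like adaptive basis, and the coordinate-descent solver is an illustrative choice, not the proposal's actual method.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; exactly zero when |z| <= t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(B, y, lam, n_iter=200):
    """Minimize (1/2n)||y - B @ beta||^2 + lam * ||beta||_1 by coordinate descent.

    The soft-threshold update both selects terms (coefficients hit exactly
    zero) and shrinks the survivors, illustrating the 'select and shrink'
    behavior of the penalized criterion described in b).
    """
    n, p = B.shape
    beta = np.zeros(p)
    col_sq = (B ** 2).sum(axis=0) / n      # per-column curvature
    r = y - B @ beta                        # current residual
    for _ in range(n_iter):
        for j in range(p):
            r = r + B[:, j] * beta[j]       # remove coordinate j from the fit
            rho = B[:, j] @ r / n           # correlation of column j with residual
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            r = r - B[:, j] * beta[j]       # add updated coordinate back
    return beta
```

With a response built from the first two basis columns plus a pure-noise column, the fitted coefficient on the noise column is driven to (essentially) zero while the signal coefficients remain large but slightly shrunken.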
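The flavor of item c) can be sketched as follows: for each query point, a local neighborhood is used to reweight the distance metric so that directions in which the classes separate locally count for more. This is a heavily simplified illustration (a diagonal metric from local between- vs. within-class spread), not the proposal's actual algorithm; the function name and all parameter choices are hypothetical.

```python
import numpy as np

def adaptive_nn_predict(X, y, x0, k_local=20, k=5, eps=1e-6):
    """Classify x0 by k-NN under a locally adapted diagonal metric."""
    # Step 1: find a larger neighborhood around x0 under the Euclidean metric.
    d0 = np.sqrt(((X - x0) ** 2).sum(axis=1))
    local = np.argsort(d0)[:k_local]
    Xl, yl = X[local], y[local]

    # Step 2: per-feature between-class vs. within-class spread in the
    # neighborhood; features that separate the classes locally get more weight.
    overall_mean = Xl.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(yl):
        Xc = Xl[yl == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    w = (between + eps) / (within + eps)

    # Step 3: majority vote among the k nearest points under the adapted metric.
    d = np.sqrt((w * (X - x0) ** 2).sum(axis=1))
    nearest = np.argsort(d)[:k]
    vals, counts = np.unique(y[nearest], return_counts=True)
    return vals[np.argmax(counts)]
```

On data where only the first coordinate carries class information, the adapted metric downweights the noise coordinate, which is the sense in which a local metric can soften the curse of dimensionality.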