This project deals with statistical methods for large and complex data sets involving shape aspects. One example of a method relevant to this project is PRIM (Friedman and Fisher, 1999), a popular data mining algorithm with a wide range of possible applications. PRIM has been applied to several real-world problems, and two modifications have recently been developed (Becker and Fahrmeir, 2001; LeBlanc et al., 2002). Nevertheless, the algorithm still lacks a theoretical foundation that might provide a deeper understanding of it. Providing such an analysis of PRIM and its modifications is part of the proposed project. In fact, a preliminary analysis revealed a close connection to much better understood methods based on minimum volume sets. This finding also shows the connection to "shape statistics". In another subproject, the investigator will develop a novel projection-pursuit-type method for dimension reduction aimed at subsequent classification or mode hunting. This task involves both algorithmic and theoretical challenges, and again "shape" aspects (modes, antimodes, etc.) come into play.
In more general terms, there is an acknowledged lack of (theoretical) understanding of many statistical methods for large and complex data sets, and enhancing this knowledge is considered an important task of statistics (see Kettenring et al., 2003). This project aims to contribute to this task both directly and indirectly: (a) directly, by providing an analysis of some existing data mining procedures, and (b) indirectly, by developing novel methods for large and complex data sets, together with supporting statistical theory.