This research effort characterizes sparse model recovery for general model classes, extending existing results for generalized linear models and for classification. The central goals of this proposal are: (a) to define sparsity and target recovery in high-dimensional, low-sample-size settings; and (b) to show that empirical risk minimization with a lasso-type penalty allows target recovery under minimal assumptions. The investigators advance the use of a novel type of oracle inequality to show that the penalized empirical risk minimizers adapt to the unknown sparsity of the underlying statistical model. Special attention is given to random design regression and to classification with a reject option.
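To make "empirical risk minimization with a lasso-type penalty" concrete, here is a minimal sketch for one special case, assuming squared-error loss (random design regression). It minimizes (1/(2n)) Σ (y_i − x_iᵀb)² + λ‖b‖₁ with the standard iterative soft-thresholding algorithm (ISTA). The loss, the solver, and the function names `soft_threshold` and `lasso_ista` are illustrative assumptions. They do not represent the proposal's general model classes or its procedures.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def soft_threshold(v: float, tau: float) -> float:
    """Proximal map of tau*|.|: shrink v toward zero by tau, or set it to zero."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def lasso_ista(X: Matrix, y: Vector, lam: float, n_iter: int = 500) -> Vector:
    """Hypothetical helper: minimize (1/(2n)) * sum_i (y_i - x_i . b)^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    # Step size 1/L with L = ||X||_F^2 / n, an upper bound on the largest
    # eigenvalue of X^T X / n, so each proximal-gradient step is stable.
    frob_sq = sum(x_ij * x_ij for row in X for x_ij in row)
    step = n / frob_sq
    b = [0.0] * p
    for _ in range(n_iter):
        # Residuals r_i = y_i - x_i . b
        r = [y[i] - sum(X[i][j] * b[j] for j in range(p)) for i in range(n)]
        # Gradient of the empirical squared-error risk: -(1/n) X^T r
        grad = [-sum(X[i][j] * r[i] for i in range(n)) / n for j in range(p)]
        # Gradient step on the risk, then the l1 proximal (soft-threshold) step
        b = [soft_threshold(b[j] - step * grad[j], step * lam) for j in range(p)]
    return b
```

For example, with the orthogonal design `X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]`, responses `y = [3, 3, -3, -3]` (three times the first column), and `lam = 0.5`, the sketch returns approximately `[2.5, 0.0]`. The irrelevant coefficient is set exactly to zero and the relevant one is shrunk by λ. This is the sparsity-adaptive behavior that oracle inequalities quantify.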
High-dimensional data are increasingly common in many scientific disciplines, such as the biological and medical sciences. Accurately estimating and implementing the complex statistical models used to analyze such data is challenging. The aim of this project is to develop a unified theory for the analysis of computationally efficient procedures in high-dimensional settings. The usefulness of these techniques will be demonstrated by applications to gene expression data and concurrent EEG/fMRI data. The newly introduced classification procedures have a built-in reject option that allows the classifier to withhold a decision in cases that are hard to classify. This can substantially improve the performance of tumor classification, where the consequences of misdiagnosis are severe. The software for these procedures will be made freely available on the World Wide Web.
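The idea of withholding a decision can be illustrated with the classical plug-in reject rule (Chow's rule). This is a minimal sketch, not the proposal's built-in procedure. It assumes that estimated probabilities p(x) = P(Y = 1 | x) are already available, for instance from a fitted model, and that a misclassification costs 1 while a rejection costs d, with 0 ≤ d < 1/2. Under these costs, the rule rejects whenever min(p, 1 − p) > d and otherwise predicts the more likely label. The function name `classify_with_reject` is hypothetical.

```python
from typing import List, Optional


def classify_with_reject(probs: List[float], d: float) -> List[Optional[int]]:
    """Plug-in reject rule: return 1 or 0, or None to withhold a decision.

    probs: estimated P(Y = 1 | x) for each case (assumed given).
    d:     cost of rejecting, relative to a misclassification cost of 1.
    """
    if not 0.0 <= d < 0.5:
        raise ValueError("rejection cost d must lie in [0, 1/2)")
    decisions: List[Optional[int]] = []
    for p in probs:
        if min(p, 1.0 - p) > d:
            # Too ambiguous: the expected misclassification cost exceeds d
            decisions.append(None)
        else:
            decisions.append(1 if p >= 0.5 else 0)
    return decisions
```

With `d = 0.2`, the estimated probabilities `[0.95, 0.55, 0.10, 0.40]` yield `[1, None, 0, None]`. The two cases near 1/2 are withheld rather than forced into a possibly costly wrong class. Smaller values of d make rejection more expensive, so the rule rejects less often.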