This project seeks to make theoretical and methodological contributions to several critical areas of nonparametric statistical inference for high dimensional data. Specifically, it concentrates on (i) developing empirical likelihood methods for high dimensional data that, among other applications, allow for simultaneous testing of a large number of hypotheses at user-specified confidence levels even with a moderate sample size; (ii) developing bootstrap methodology for high dimensional data for post-variable-selection inference; (iii) developing limit theory for studying first- and higher-order asymptotic properties of statistical methods in high dimensions; and (iv) investigating theoretical properties of the proposed and existing resampling methods in high dimensions.
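As background for aim (i), the following is a minimal sketch of the classical, fixed-dimension empirical likelihood construction that the proposed high dimensional methods would extend; the notation (i.i.d. observations $X_1,\dots,X_n$ in $\mathbb{R}^p$ with hypothesized mean $\mu$) is illustrative rather than taken from the project description. Owen's empirical likelihood ratio for the mean is

\[
R(\mu) \;=\; \max\Big\{ \prod_{i=1}^{n} n w_i \;:\; w_i \ge 0,\ \sum_{i=1}^{n} w_i = 1,\ \sum_{i=1}^{n} w_i X_i = \mu \Big\},
\]

and for fixed $p$ the Wilks-type limit $-2\log R(\mu_0) \xrightarrow{d} \chi^2_p$ as $n \to \infty$ calibrates tests and confidence regions from chi-squared quantiles. When the number of hypotheses, and hence the dimension $p$, grows with the sample size $n$, this classical calibration is known to require modification, which is the high dimensional setting the proposed methods address.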
High dimensional data now arise routinely in many areas of science (e.g., molecular genetics, finance, climate studies, and brain mapping) and in an ever-increasing number of everyday activities (e.g., social networking and internet browsing). This presents unique challenges for information extraction, as traditional statistical methods do not perform well in such "needle in a haystack" situations, where the relevant information is obscured by a huge number of irrelevant variables. The proposed research addresses these challenges directly by developing novel statistical methods for high dimensional data that do not require stringent assumptions on the data structure.