The investigators continue to develop new methodology, and the accompanying mathematical theory, for problems of multiple testing and simultaneous inference, driven by the many burgeoning applications of the information age. Further motivation for valid methods stems from exploratory analysis of large data sets, where the process of "data snooping" (or "data mining") often leads to challenges of multiple testing and simultaneous inference. In such problems, the statistician must account for all possible errors arising from a complex analysis of the data, so that any resulting inferences or conclusions can reliably be viewed as "real" findings rather than spurious artifacts of the data. It is safe to say that the mathematical justification of sound statistical methods is not keeping pace with the demand for valid new tools. In particular, the investigators develop randomization tests as inferential methods for semiparametric and nonparametric models that do not rely on unverifiable assumptions. Resampling methods, such as the bootstrap and subsampling, succeed in a wide range of problems, at least in an asymptotic sense, but in other problems they are unsatisfactory. Examples from contemporary statistics include "high"-dimensional problems, where the "curse of dimensionality" may cause resampling methods to break down, and "non-regular" problems, where convergence of the approximation may fail to be even locally uniform in the underlying data-generating process, again causing resampling methods to break down. Specific problems addressed include Tobit regression and linear regression with weak instruments. Moreover, resampling methods do not enjoy exact finite-sample validity, which is perhaps the main reason permutation and rank tests are so commonly used in fields such as medical studies. The investigators apply randomization tests to many new problems that statisticians face, despite issues of high dimensionality, simultaneous inference, unknown dependence structures, non-Gaussianity, and so on. An exciting feature of the approach is that, properly constructed, randomization tests enjoy good robustness properties even in situations where the assumptions guaranteeing finite-sample validity fail. The investigators develop the mathematical theory as well as feasible computational methods.
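To illustrate the finite-sample validity property referenced above, the following is a minimal sketch of a classical two-sample permutation test, not the investigators' specific constructions; the function name, choice of test statistic (difference in sample means), and parameters are illustrative assumptions.

```python
import numpy as np

def permutation_test(x, y, num_permutations=10_000, seed=0):
    """Two-sample permutation test for equality of distributions.

    Uses the difference in sample means as the test statistic. When the
    pooled observations are exchangeable under the null hypothesis, the
    resulting p-value is valid in finite samples (up to Monte Carlo error).
    """
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    n_x = len(x)
    observed = np.mean(x) - np.mean(y)

    count = 0
    for _ in range(num_permutations):
        permuted = rng.permutation(pooled)
        stat = np.mean(permuted[:n_x]) - np.mean(permuted[n_x:])
        if abs(stat) >= abs(observed):
            count += 1

    # Counting the observed statistic itself keeps the test from being
    # anti-conservative at the nominal level.
    return (count + 1) / (num_permutations + 1)

# Example usage: two samples drawn from the same distribution (null holds).
x = np.random.default_rng(1).normal(size=30)
y = np.random.default_rng(2).normal(size=30)
print(permutation_test(x, y))
```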

Useful statistical methodology is essential to the analysis of any study or scientific experiment. Recently, the demand for efficient and reliable confirmatory statistical methods has grown rapidly, driven by problems arising in the analysis of DNA microarray biotechnology, econometrics, finance, educational evaluation, global warming, and astronomy, among many others. In general, the philosophical approach is to develop practical methods that have both robustness of validity and robustness of efficiency, so that they may be applied in increasingly complex situations as the scope of modern data analysis continues to grow. The broader impact of this work is potentially quite large because the resulting inferential tools can be applied in fields as diverse as genetics, bioengineering, image processing and neuroimaging, clinical trials, education, astronomy, finance, and econometrics. The results will be widely disseminated, and public software implementing the new statistical tools will be made available whenever possible. The many thriving fields of application demand new statistical methods, creating challenging and exciting opportunities for young scholars working under the direction of the investigators.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 1307973
Program Officer: Gabor Szekely
Project Start:
Project End:
Budget Start: 2013-09-01
Budget End: 2016-08-31
Support Year:
Fiscal Year: 2013
Total Cost: $150,000
Indirect Cost:
Name: Stanford University
Department:
Type:
DUNS #:
City: Stanford
State: CA
Country: United States
Zip Code: 94305