The main goal of this research proposal is the development of theory and methodology for problems in multiple testing and inference. A classical approach to dealing with multiplicity is to require decision rules that control the familywise error rate (FWER), the probability of rejecting at least one true hypothesis. But when the number of tests is large, control of the FWER is so stringent that alternative hypotheses have little chance of being detected. In response, the false discovery rate (FDR) of Benjamini and Hochberg has gained wide use. The investigator will also consider alternative measures, such as the probability of rejecting k or more true hypotheses, or ones based directly on the actual false discovery proportion (FDP). For each measure of error control, the aim is to construct procedures that control the error under the weakest possible assumptions. Subject to error control, the procedures should be efficient in their ability to detect alternative hypotheses. The main approach to developing methods that do not rely on unrealistic or unverifiable model assumptions will be the bootstrap, subsampling, and other computer-intensive methods. These tools offer viable ways of obtaining valid distributional approximations while assuming very little about the stochastic mechanism generating the data. Just as resampling has been enormously successful for questions of a single inference, its use can be extended fruitfully to questions of multiple inferences. While such an approach has been used with some success, its full potential is currently unrealized, and it is clear that more efficient and more broadly applicable methods will be advanced in the next few years. The power of the bootstrap and related methods is that they can capture the joint dependence structure of the individual test statistics, so that the resulting procedures need not be overly conservative.
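To make the contrast between the error measures concrete, the following is a minimal sketch, not a method from this proposal. It compares the Bonferroni rule, which controls the FWER, with the Benjamini-Hochberg step-up rule, which controls the FDR under independence of the p-values, and computes the realized FDP when the true nulls are known, as in a simulation. The function names, the level alpha = 0.05, and the p-values are illustrative assumptions.

```python
from typing import List


def bonferroni(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Reject H_i when p_i <= alpha / m; this controls the FWER at level alpha."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up rule.

    Find the largest rank k with p_(k) <= k * alpha / m and reject the
    hypotheses with the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


def false_discovery_proportion(reject: List[bool], is_true_null: List[bool]) -> float:
    """FDP = (false rejections) / (total rejections), defined as 0 with no rejections."""
    total = sum(reject)
    if total == 0:
        return 0.0
    false_rejections = sum(1 for r, null in zip(reject, is_true_null) if r and null)
    return false_rejections / total


if __name__ == "__main__":
    # Hypothetical p-values, chosen only for illustration.
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
    print("Bonferroni rejections:", sum(bonferroni(p)))
    print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg(p)))
```

Neither rule uses the joint dependence of the test statistics. That gap is what the resampling-based procedures described above are meant to address.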
Such new methodology will be investigated from theoretical, computational and practical points of view. Notably, the investigator will address the multiple inference problem when the number of hypotheses is large compared with the sample size, the open problem of directional errors, and the construction of efficient techniques that control the FDR and other measures of error.
Virtually any scientific experiment sets out to answer questions about the process under investigation, which often can be translated formally into a set of hypotheses. It is the exception that a single hypothesis is considered. Moreover, due to the effects of "data snooping" (or "data mining"), other inference questions arise as well. The statistician is then faced with the challenge of accounting for all possible errors resulting from a complex data analysis, so that any resulting inferences or interesting conclusions can reliably be viewed as real structure rather than artifacts of random data. While the history of statistical methods that deal with problems of simultaneous inference dates back at least half a century, most of the classical techniques either rely on strong assumptions or are inefficient. Driven by the advent of computers and the information age, there has been a growing demand for more reliable and efficient methods for multiple testing. For example, current methods in biotechnology and genomics generate DNA microarray experiments, where expression levels in cells for thousands of genes must be analyzed simultaneously. Similar problems arise in image processing, such as neuroimaging, and in econometrics. It is now not uncommon to encounter data consisting of megabytes of information. Thus, the statistician is faced with new challenges of devising techniques that are not based on strong assumptions and can effectively deal with problems of multiplicity in the presence of vast amounts of data.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 0404979
Program Officer: Gabor J. Szekely
Budget Start: 2004-07-01
Budget End: 2008-06-30
Fiscal Year: 2004
Total Cost: $89,998
Name: Stanford University
City: Palo Alto
State: CA
Country: United States
Zip Code: 94304