Theory and Methods for Multiple Testing and Inference

Romano, Joseph

Abstract

The main goal of this research proposal is the development of theory and methodology for problems in multiple testing and inference. A classical approach to dealing with multiplicity is to require decision rules that control the familywise error rate (FWER), the probability of rejecting at least one true hypothesis. But when the number of tests is large, control of the FWER is so stringent that alternative hypotheses have little chance of being detected. In response, the false discovery rate (FDR) of Benjamini and Hochberg has gained wide use. The investigator will also consider alternative measures, such as the probability of rejecting k or more true hypotheses, or ones based directly on the actual false discovery proportion (FDP). For each measure of error control, the aim is to construct procedures that control error under the weakest possible assumptions. Subject to error control, the procedures should be efficient in their ability to detect alternative hypotheses. The main approach to developing methods that do not rely on unrealistic or unverifiable model assumptions will be the bootstrap, subsampling, and other computer-intensive methods. These tools offer viable ways to obtain valid distributional approximations while assuming very little about the stochastic mechanism generating the data. Just as resampling has been enormously successful for questions of a single inference, its use can be extended fruitfully to questions of multiple inference. While such an approach has been used with some success, its full potential is currently unrealized, and it is clear that more efficient and more broadly applicable methods will be advanced in the next few years. The power of the bootstrap and related methods is that they can capture the joint dependence structure of the individual test statistics, so that methods need not be overly conservative.
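For reference, the error measures named above can be written compactly. The notation is an assumption introduced here and does not appear in the abstract: V denotes the number of true hypotheses that are rejected and R the total number of rejections.

```latex
\begin{align*}
\mathrm{FWER} &= P(V \ge 1), \\
k\text{-}\mathrm{FWER} &= P(V \ge k), \\
\mathrm{FDP} &= \frac{V}{\max(R, 1)}, \\
\mathrm{FDR} &= E[\mathrm{FDP}].
\end{align*}
```

One possible reading of a measure "based directly on" the FDP, offered only as an illustration, is a bound on its tail probability, P(FDP > \gamma) \le \alpha for a user-chosen \gamma in [0, 1).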
This new methodology will be investigated from theoretical, computational, and practical points of view. Notably, the investigator will address the multiple inference problem when the number of hypotheses is large compared with the sample size, the open problem of directional errors, and the construction of efficient techniques that control the FDR and other measures of error. Virtually any scientific experiment sets out to answer questions about the process under investigation, which can often be translated formally into a set of hypotheses. Rarely is only a single hypothesis considered. Moreover, due to the effects of "data snooping" (or "data mining"), other inference questions arise as well. The statistician is then faced with the challenge of accounting for all possible errors resulting from a complex data analysis, so that any resulting inferences or interesting conclusions can reliably be viewed as real structure rather than artifacts of random data. While statistical methods for simultaneous inference date back at least half a century, most of the classical techniques either rely on strong assumptions or are inefficient. Driven by the advent of computers and the information age, there has been a growing demand for more reliable and efficient methods for multiple testing. For example, current methods in biotechnology and genomics generate DNA microarray experiments, where expression levels in cells for thousands of genes must be analyzed simultaneously. Similar problems arise in image processing, such as neuroimaging, and in econometrics. It is now not uncommon to encounter data consisting of megabytes of information. Thus, the statistician is faced with the new challenge of devising techniques that are not based on strong assumptions and can deal effectively with problems of multiplicity in the presence of vast amounts of data.
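To make the contrast between FWER and FDR control concrete, the following is a minimal Python sketch of three classical p-value procedures: Bonferroni and Holm (which control the FWER) and the Benjamini-Hochberg step-up procedure (which controls the FDR under independence or certain forms of positive dependence). These are textbook baselines, not the resampling-based methods the proposal sets out to develop; the function names and the example p-values are illustrative assumptions.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H_i when p_i <= alpha / m. Controls the FWER at level alpha."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def holm(pvalues, alpha=0.05):
    """Holm step-down procedure. Controls the FWER under arbitrary dependence."""
    m = len(pvalues)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the (rank+1)-th smallest p-value with alpha / (m - rank).
        if pvalues[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # step-down: stop at the first failure
    return reject


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure for FDR control."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        # Track the largest rank whose p-value lies under the BH line.
        if pvalues[i] <= rank * alpha / m:
            cutoff = rank
    reject = [False] * m
    for i in order[:cutoff]:
        reject[i] = True
    return reject


if __name__ == "__main__":
    # Hypothetical p-values, chosen only to illustrate the procedures.
    p = [0.001, 0.008, 0.012, 0.02, 0.03, 0.2, 0.5, 0.9]
    print("Bonferroni rejections:", sum(bonferroni(p)))
    print("Holm rejections:", sum(holm(p)))
    print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg(p)))
```

On these eight hypothetical p-values at level 0.05, Bonferroni and Holm each reject one hypothesis while Benjamini-Hochberg rejects five, which illustrates how stringent FWER control can be relative to FDR control.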

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 0404979
Program Officer: Gabor J. Szekely

Budget Start: 2004-07-01
Budget End: 2008-06-30
Fiscal Year: 2004
Total Cost: $89,998

Institution

Stanford University, Palo Alto, CA, United States
