When an applied researcher tests a hypothesis, she can make one of two mistakes: she can reject a true null hypothesis (a type I error) or accept a false one (a type II error). To implement the test, the researcher picks a significance level, which is the maximal probability of committing the first type of error that she is willing to tolerate. The maximal probability with which a test commits the first type of error is called the size of the test.
Competing testing procedures of equal (large sample) size are typically ranked according to their relative power, where power denotes 1 minus the probability of the second type of error. However, because key assumptions underlying econometric models are often questionable in practice, the PI proposes an alternative ranking of tests according to their relative large sample size distortion under local violations of certain model assumptions. The comparison includes tests whose large sample size equals nominal size when the key model assumptions hold and that are consistent against fixed alternatives when the model is point identified. As a more ambitious goal, beyond ranking existing tests according to the new measure, the PI intends to investigate whether an optimal test exists. Such a test would have the smallest large sample size distortion, for a given degree of local violation of the model assumptions, within the class of tests that have correct asymptotic size when the model assumptions are true and are consistent when the model is point identified.
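One way to write these notions down formally is sketched below. The notation is introduced here for illustration only and is not taken from the proposal: a test φ_n equals 1 when it rejects, Θ_0 and Θ_1 are the parameter sets under the null and the alternative, γ indexes a violation of the model assumptions that is scaled by n^(-1/2), and h bounds its magnitude. At h = 0 the last line reduces to the excess of the asymptotic size over the nominal level α.

```latex
\begin{aligned}
\mathrm{Size}_n(\varphi) &= \sup_{\theta \in \Theta_0} P_{\theta}(\varphi_n = 1), \\
\mathrm{Power}_n(\varphi;\theta) &= 1 - P_{\theta}(\varphi_n = 0), \qquad \theta \in \Theta_1, \\
\mathrm{SD}(\varphi; h) &= \limsup_{n \to \infty}\ \sup_{\theta \in \Theta_0,\ \|\gamma\| \le h} P_{\theta,\,\gamma/\sqrt{n}}(\varphi_n = 1) - \alpha .
\end{aligned}
```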
Of the many possible applications, the PI focuses on two leading examples. First, the PI considers hypothesis tests on the structural parameter vector in the linear instrumental variables (IV) model, where the IVs and the structural error term may be correlated. The correlation is local: it shrinks to zero at rate n^(-1/2) as the sample size n grows to infinity. The PI considers tests that have correct large sample size when the correlation is exactly zero and that are consistent when the IVs are strong and uncorrelated with the error term. Among the tests considered, the PI finds that Anderson-Rubin-type tests are the least size distorted under local correlation between the IVs and the error term.
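The mechanism behind size distortion under local IV correlation can be seen in a small Monte Carlo experiment. The sketch below is a hypothetical illustration, not the PI's design: it assumes a single instrument, Gaussian data, no intercept, and a structural error u = ε + (c/√n)·z, so that the correlation between z and u vanishes at rate n^(-1/2). The function names `ar_stat` and `ar_rejection_rate` and all parameter values are invented for this example. The Anderson-Rubin statistic for H0: β = β0 is the squared t-statistic from regressing y − xβ0 on z, compared with the 5% χ²(1) critical value.

```python
import math
import random
from statistics import NormalDist


def ar_stat(y, x, z, beta0):
    """Anderson-Rubin statistic for H0: beta = beta0 with one instrument z."""
    n = len(y)
    e0 = [yi - xi * beta0 for yi, xi in zip(y, x)]  # restricted residual y - x*beta0
    zz = sum(zi * zi for zi in z)
    ze = sum(zi * ei for zi, ei in zip(z, e0))
    gamma = ze / zz  # OLS slope from regressing e0 on z
    resid = [ei - gamma * zi for ei, zi in zip(e0, z)]
    sigma2 = sum(r * r for r in resid) / (n - 1)
    # Squared t-statistic on z; approximately chi2(1) under H0 with a valid IV.
    return ze * ze / (zz * sigma2)


def ar_rejection_rate(c, n=200, reps=1000, beta=1.0, pi=0.5, alpha=0.05, seed=0):
    """Share of rejections of the true H0 when corr(z, u) is of order c / sqrt(n)."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2) ** 2  # chi2(1) critical value
    rejections = 0
    for _ in range(reps):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Structural error, locally correlated with the instrument.
        u = [rng.gauss(0.0, 1.0) + c / math.sqrt(n) * zi for zi in z]
        # First-stage error correlated with u, so x is endogenous.
        v = [0.5 * ui + rng.gauss(0.0, 1.0) for ui in u]
        x = [pi * zi + vi for zi, vi in zip(z, v)]
        y = [beta * xi + ui for xi, ui in zip(x, u)]
        if ar_stat(y, x, z, beta) > crit:  # H0 is true: beta0 = beta
            rejections += 1
    return rejections / reps


print(ar_rejection_rate(0.0), ar_rejection_rate(3.0))
```

With c = 0 the rejection rate should be close to the nominal 5%; with c = 3 the statistic is approximately a noncentral χ²(1) and the test over-rejects substantially. Ranking tests by the new criterion would amount to comparing such rejection rates across procedures for a given c.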
Second, the PI considers tests for the unknown parameter vector in partially identified models defined by moment inequalities. The PI ranks these tests by their large sample size distortion when the moment inequalities are locally violated at rate n^(-1/2). The PI finds that, among the tests considered, those based on plug-in asymptotic critical values are the least size distorted under local misspecification. An optimality theory is under investigation for both examples.
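A comparable sketch for moment inequalities is given below, again as a hypothetical illustration rather than the PI's procedure. It assumes k moment functions with E[m_j] ≥ 0 under the null, Gaussian data with unit variance, and uncorrelated moments (identity correlation matrix), which is a simplification made only to keep the code short. The test statistic is the modified-method-of-moments form Σ_j min(√n·m̄_j/s_j, 0)², and the plug-in asymptotic critical value treats every inequality as binding and simulates the 1 − α quantile of Σ_j min(Z_j, 0)² with Z_j independent standard normal. Local misspecification is modeled by setting each moment mean to −h/√n. The function names are invented for this example.

```python
import math
import random


def mmm_stat(data):
    """Modified-method-of-moments statistic: sum_j min(sqrt(n) * mean_j / sd_j, 0)^2."""
    n, k = len(data), len(data[0])
    stat = 0.0
    for j in range(k):
        col = [row[j] for row in data]
        mean = sum(col) / n
        sd = math.sqrt(sum((c - mean) ** 2 for c in col) / (n - 1))
        stat += min(math.sqrt(n) * mean / sd, 0.0) ** 2
    return stat


def pa_critical_value(k, alpha=0.05, draws=20000, seed=1):
    """Plug-in asymptotic critical value: all k inequalities treated as binding,
    with uncorrelated moments assumed for simplicity."""
    rng = random.Random(seed)
    sims = sorted(
        sum(min(rng.gauss(0.0, 1.0), 0.0) ** 2 for _ in range(k)) for _ in range(draws)
    )
    return sims[math.ceil((1 - alpha) * draws) - 1]  # empirical 1 - alpha quantile


def pa_rejection_rate(h, k=2, n=200, reps=800, alpha=0.05, seed=0):
    """Share of rejections when each moment has mean -h / sqrt(n); h = 0 is the boundary."""
    rng = random.Random(seed)
    crit = pa_critical_value(k, alpha, seed=seed + 1)
    rejections = 0
    for _ in range(reps):
        data = [[rng.gauss(-h / math.sqrt(n), 1.0) for _ in range(k)] for _ in range(n)]
        if mmm_stat(data) > crit:
            rejections += 1
    return rejections / reps


print(pa_rejection_rate(0.0), pa_rejection_rate(2.0))
```

At h = 0 all inequalities hold with equality, which is the least favorable case for this critical value, so the rejection rate should be near 5%. For h > 0 the rejection rate rises above the nominal level, which is the size distortion the proposed criterion measures.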
Broader Impact: If an applied researcher chooses a test based on its favorable power properties, her inference may suffer from severe size distortion when the model assumptions are slightly violated. Unfortunately, model violations appear to be pervasive in empirical applications. The new criterion instead recommends tests that limit size distortion while remaining consistent under standard assumptions. The proposed methods are expected to have a broad empirical impact and have the potential to improve inference. The methods developed in this research are expected to be used frequently by applied researchers in the social sciences, both in academia and in the public sector.