When an applied researcher tests a hypothesis, she can make one of two mistakes. She can reject a true null hypothesis or accept a false one. To implement the test, the applied researcher picks a significance level, which is the maximal probability with which she is willing to commit the first type of error. This maximal probability of committing the first type of error is also called the size of the test.
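In standard notation (ours, not the award text's), with \mathcal{P}_0 denoting the set of null data generating processes and \alpha the chosen significance level, the test is implemented so that

    \mathrm{Size} \;=\; \sup_{P \in \mathcal{P}_0} \Pr_P\bigl(\text{reject } H_0\bigr) \;\le\; \alpha .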

Competing testing procedures of equal (large sample) size are typically ranked according to their relative power properties, where power denotes one minus the probability of the second type of error. However, noting that key assumptions underlying econometric models are often questionable in practice, the PI proposes an alternative ranking of tests according to their relative large sample size distortion under local violations of certain model assumptions. The PI includes in the comparison tests that have large sample size equal to nominal size when the key model assumptions hold true and that are consistent against fixed alternatives when the model is point identified. As a more ambitious goal, beyond ranking existing tests according to the new measure, the PI intends to investigate whether there exists an optimal test, that is, a test that, for a given degree of local violation of the model assumptions, has the smallest large sample size distortion within the class of tests that have correct asymptotic size when the model assumptions are true and are consistent when the model is point identified.
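One way to formalize the proposed ranking criterion (the notation below is ours and may differ from the PI's exact definition) is through the asymptotic size under n^(-1/2)-local violations of magnitude \delta \ge 0,

    \mathrm{AsySz}(\delta) \;=\; \limsup_{n \to \infty} \; \sup_{P \in \mathcal{P}_{0,n}(\delta)} \Pr_P\bigl(\text{reject } H_0\bigr),

where \mathcal{P}_{0,n}(\delta) collects null data generating processes whose violation of the key model assumption is of order at most \delta/\sqrt{n}; tests of nominal level \alpha are then ranked by the size distortion \mathrm{AsySz}(\delta) - \alpha.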

Out of many possible examples, the PI focuses on two leading ones. First, the PI considers hypothesis tests involving the structural parameter vector in the linear instrumental variables (IVs) model where the IVs and the structural error term may be correlated, with the correlation fading away at rate n^(-1/2) as the sample size n increases to infinity. The PI considers tests that have correct large sample size when the correlation is in fact zero and that are consistent when the IVs are strong and uncorrelated with the error term. Among the various tests considered, the PI finds that Anderson-Rubin-type tests are the least size distorted under local correlation between the IVs and the error term.
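The following Monte Carlo sketch (our illustration, not the PI's code; the function name ar_rejection_rate and all parameter values are hypothetical) shows how the size distortion of the Anderson-Rubin test can be quantified when the correlation between the instruments and the error drifts to zero at rate c/sqrt(n):

    # Rejection rate of the Anderson-Rubin (AR) test of H0: beta = 0 in a
    # linear IV model with k instruments, where the first instrument is
    # locally correlated with the structural error: Cov(Z1, u) = c / sqrt(n).
    import numpy as np
    from scipy import stats

    def ar_rejection_rate(n=1000, k=1, c=0.0, pi=0.5, alpha=0.05, reps=2000, seed=0):
        rng = np.random.default_rng(seed)
        rejections = 0
        for _ in range(reps):
            Z = rng.standard_normal((n, k))
            u = rng.standard_normal(n) + (c / np.sqrt(n)) * Z[:, 0]  # local endogeneity of Z
            v = rng.standard_normal(n)
            X = Z @ (pi * np.ones(k)) + v                            # first stage
            y = 0.0 * X + u                                          # true beta = 0, so H0 holds
            e0 = y - 0.0 * X                                         # restricted residuals under H0
            Pe = Z @ np.linalg.solve(Z.T @ Z, Z.T @ e0)              # projection of e0 on Z
            ar = (n - k) / k * (e0 @ Pe) / (e0 @ e0 - e0 @ Pe)       # AR statistic (F form)
            rejections += ar > stats.f.ppf(1 - alpha, k, n - k)
        return rejections / reps

    print(ar_rejection_rate(c=0.0), ar_rejection_rate(c=2.0))

With c = 0 the rejection rate is close to the nominal 5%, while with c = 2 it lies well above 5%; the size of this gap, compared across tests, is the quantity the proposed ranking is based on.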

Second, the PI considers tests for the unknown parameter vector in partially identified models defined by moment inequalities. The PI ranks the tests with respect to their large sample size distortion when the moment inequalities are locally violated at rate n^(-1/2). The PI finds that, among the tests considered, those based on plug-in asymptotic critical values are the least size distorted under local misspecification. An optimality theory is under investigation for both examples.
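As an illustration of the plug-in asymptotic (PA) approach (a minimal sketch under our own choice of test statistic and names, not the PI's implementation), the critical value can be simulated from the least favorable limit in which all moments bind, with the estimated correlation matrix plugged in:

    # Moment inequality test with a plug-in asymptotic (PA) critical value
    # for H0: E[m_j(W, theta)] >= 0, j = 1, ..., k.  Illustrative only.
    import numpy as np

    def pa_test(m, alpha=0.05, sims=5000, seed=0):
        """m: (n, k) array with m[i, j] = m_j(W_i, theta)."""
        rng = np.random.default_rng(seed)
        n, k = m.shape
        mbar = m.mean(axis=0)
        sd = m.std(axis=0, ddof=1)
        # Statistic penalizes studentized sample moments that are negative.
        stat = np.sum(np.minimum(np.sqrt(n) * mbar / sd, 0.0) ** 2)
        # PA critical value: simulate the least favorable limit (all
        # inequalities binding) with the estimated correlation matrix.
        corr = np.atleast_2d(np.corrcoef(m, rowvar=False))
        Z = rng.multivariate_normal(np.zeros(k), corr, size=sims)
        crit = np.quantile(np.sum(np.minimum(Z, 0.0) ** 2, axis=1), 1 - alpha)
        return stat, crit, stat > crit

    # Example: two inequalities that hold with a small amount of slack.
    rng = np.random.default_rng(1)
    print(pa_test(rng.standard_normal((500, 2)) + 0.1))

The returned tuple reports the statistic, the simulated PA critical value, and the rejection decision; replacing the plug-in critical value by a subsampling or generalized moment selection (GMS) counterpart would permit the kind of comparison described above.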

Broader Impact: If an applied researcher chooses a test based on its favorable power properties, then her inference may suffer from severe size distortion when the model assumptions are slightly violated. Unfortunately, model violations seem to be pervasive in empirical applications. The new criterion instead suggests using tests that limit the size distortion while still being consistent under standard assumptions. The proposed methods will have broad empirical impact and have the potential to improve inference. It is expected that the methods provided by this research will find frequent use by applied researchers in the social sciences within academia and the public sector.

Project Report

When a hypothesis is being tested, two types of mistakes can occur: a true hypothesis may be rejected, or a false hypothesis may fail to be rejected. In frequentist inference, one often fixes the probability of a type I error uniformly over all null data generating processes (that is, one fixes the size of the test), at 5% say, and, subject to this constraint, tries to "minimize" the probability of a type II error. The starting point of this research is the observation that the underlying models are only approximations to the truth and as such inherently wrong. We are looking for testing procedures whose (asymptotic) size is robust, relative to other testing procedures, when the model is "locally wrong"; that is, we want to find testing procedures whose probability of a type I error remains relatively close to the nominal size (e.g., 5%) in the likely scenario where the assumed model differs from the true model. Several important economic models are studied, in particular 1) the linear instrumental variables (IVs) model where the instruments may not be exactly exogenous and 2) partially identified models defined by moment inequalities where the inequalities may be locally violated.

In more detail, we first consider hypothesis tests involving the structural parameter vector in the linear instrumental variables model where the IVs and the structural error term may be correlated, the correlation fades away at rate O(n^(-1/2)) as the sample size n increases to infinity, and the IVs may be only weakly correlated with the endogenous variable. We show that the Anderson and Rubin (1949, AR) test is less size distorted than the LM and CLR tests. All of these tests have correct asymptotic size when the correlation is in fact zero and are consistent against fixed alternatives when the IVs are strong and uncorrelated with the error term.

Second, we consider tests for the unknown parameter vector in partially identified models defined by moment inequalities. We rank the tests with respect to their large sample size distortion when the moment inequalities are locally violated at rate O(n^(-1/2)) and find that, among the tests considered, those based on plug-in asymptotic critical values are less size distorted under local misspecification than subsampling or GMS tests, and that the latter two tests have the same amount of size distortion. This last point is particularly interesting because it is known that, based on power considerations, one should prefer the GMS tests over subsampling tests. The new criterion of choosing a test based on asymptotic size robustness in locally misspecified scenarios therefore provides additional discriminatory power. We also make suggestions as to which test statistic is to be preferred when robustness to local model misspecification is the choice criterion.

In terms of the broader impact of the research findings, if an applied researcher chooses a test based on its favorable power properties, then her inference may suffer from severe size distortion when the model assumptions are slightly violated. Unfortunately, model violations seem to be pervasive in empirical applications. The new criterion instead suggests using tests that limit the size distortion while still being consistent under standard assumptions. Our proposed methods will have broad empirical impact and have the potential to improve inference. The methods provided by this research will find frequent use by applied researchers in the social sciences within academia and the public sector.
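For concreteness, one standard way to write the two local misspecification schemes studied here (again in our notation, not necessarily the paper's) is

    \text{IV model:} \quad E[Z_i u_i] = \frac{c}{\sqrt{n}},
    \qquad\qquad
    \text{moment inequalities:} \quad E[m_j(W_i,\theta)] = -\frac{\delta_j}{\sqrt{n}}, \;\; \delta_j \ge 0,

so that the violations vanish in the limit yet shift the asymptotic rejection probability of a nominal \alpha test away from \alpha by an amount that differs across tests.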

Agency: National Science Foundation (NSF)
Institute: Division of Social and Economic Sciences (SES)
Type: Standard Grant (Standard)
Application #: 1346827
Program Officer: Georgia Kosmopoulou
Project Start:
Project End:
Budget Start: 2013-07-01
Budget End: 2014-08-31
Support Year:
Fiscal Year: 2013
Total Cost: $81,310
Indirect Cost:
Name: Pennsylvania State University
Department:
Type:
DUNS #:
City: University Park
State: PA
Country: United States
Zip Code: 16802