The proposed research involves developing new inference procedures for a variety of nonlinear models with cross-sectional or panel data. The models discussed, such as the binary choice and Roy models, have seen widespread use in empirical work.
The proposed activity can be divided into three parts. The first pertains to panel data versions of models with self-selection. Self-selection models enable the econometrician to control for the optimal decisions of economic agents. For example, an individual's observed wage reflects the choice of the sector whose wage offer exceeds the offers in all other sectors. Panel data models, in which an agent's outcomes are observed over multiple time periods, have become increasingly popular in empirical research. The increased availability of longitudinal panel data sets has given econometricians new opportunities to control for unobserved heterogeneity across individuals. Important work on nonlinear panel data models is surveyed in Arellano and Honore (2001). However, there has been very little work on panel data versions of models with self-selection, and the proposed research aims to address this gap.
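To illustrate the self-selection problem described above, the following is a minimal simulation sketch of a two-sector cross-sectional Roy model. It assumes jointly normal log wage offers with hypothetical parameter values (not taken from the proposal) and is not the panel data method proposed here. Each individual works in the sector with the higher offer, so the econometrician sees only the maximum offer, and the wages observed in a sector form a selected sample. With these particular parameter values, the average observed wage in each sector exceeds the average offer in that sector; in general, the sign of the selection effect depends on the covariance of the offers.

```python
import numpy as np

def simulate_roy(n=200_000, mu=(1.0, 1.2), sd=(0.5, 0.4), rho=0.3, seed=0):
    """Simulate a two-sector Roy model with bivariate normal log wage offers.

    All parameter values are hypothetical and chosen only for illustration.
    Returns the chosen sector (0 or 1), the observed log wage, and the
    latent offers in both sectors.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[sd[0] ** 2, rho * sd[0] * sd[1]],
                    [rho * sd[0] * sd[1], sd[1] ** 2]])
    offers = rng.multivariate_normal(mu, cov, size=n)  # shape (n, 2)
    sector = offers.argmax(axis=1)   # the agent picks the higher offer
    observed = offers.max(axis=1)    # the econometrician sees only this wage
    return sector, observed, offers

sector, observed, offers = simulate_roy()
for s in (0, 1):
    in_s = sector == s
    # Compare the mean latent offer with the mean wage among those who chose s.
    print(f"sector {s}: share={in_s.mean():.3f}, "
          f"mean offer={offers[:, s].mean():.3f}, "
          f"mean observed wage={observed[in_s].mean():.3f}")
```

Under these assumed values, roughly 36% of individuals choose sector 0, and their average observed log wage is about 1.37, compared with an average sector-0 offer of 1.0. A naive average of observed wages would therefore misstate the offer distribution.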
Inference methods are proposed under both stationary and nonstationary conditions. The former refers to the assumption that each individual's unobserved components have the same distribution over time. The latter relaxes this assumption but imposes that the unobserved components of different individuals in the cross section have the same distribution within a given time period. In both cases, the new methods can estimate sharp sets for parameters of interest, such as the slope of a labor supply curve. A sharp set is the smallest set of parameter values consistent with the observed data and the assumptions of the econometric model.
The second part of this proposal pertains to cross-sectional binary choice models with discrete endogenous covariates. Such models arise frequently in the treatment effect literature, where the endogenous variable is often the treatment status and the outcome variable is binary, such as employment status. A parameter that is often of interest in these situations is the coefficient on treatment in a regression framework. Two approaches to identifying such a parameter that have been considered in the literature are the control function method and the instrumental variable method. The proposed activity here is to establish a relation between the two methods. In particular, a theorem is established for a control function model that demonstrates how difficult it is to conduct inference on the treatment effect parameter of interest. This is analogous to the theorem in Khan and Tamer (2010) for the instrumental variable model. Consequently, inference is nonstandard, and new inference methods are proposed.
The third part concerns establishing optimality results for a wide class of cross-sectional censored regression models with self-selection, such as the Roy model. First, conditions that ensure point identification of the parameters of interest, such as independence or support conditions, are considered, and efficiency bounds are derived. Point identification means that the sharp set reduces to a single value. An efficiency bound is the smallest asymptotic variance attainable by an estimation procedure under the assumptions of the econometric model. Such bounds are useful in two ways: first, they make it possible to measure the relative efficiency of methods adopted in practice, and second, they suggest new estimation procedures that attain the bound.
References
Arellano, M., and B. Honore (2001): "Panel Data Models: Some Recent Developments," in Handbook of Econometrics, Vol. 5, 3229-3296.
Khan, S., and E. Tamer (2010): "Irregular Identification, Support Conditions and Inverse Weight Estimation," Econometrica, forthcoming.
The proposal comprised three sections. Each section proposed a new inference method for an econometric model of wide interest in applied fields such as labor economics, industrial organization, and public economics. As I describe in more detail below, each section has resulted in new methods, working papers for peer-reviewed journals, and new software code for implementing the methods, which has been made publicly available. In the first part, a new inference method was developed for a wide class of nonlinear panel data models. Such models have received increasing interest in fields such as labor economics and public finance because of the growing availability of large panel data sets, that is, data sets with a large cross section in which each cross-sectional unit is observed over multiple time periods. Such data allow researchers to control for unobserved individual heterogeneity in ways that cross-sectional data alone do not. However, despite their promise, these data can be difficult to exploit in the nonlinear models that arise in applied microeconometrics, where the outcome variable is binary, such as employment status, or censored, such as public expenditures. The first part of the proposal addresses this open problem, and an entirely new method has been fully developed. This work has resulted in multiple working papers as well as new software code for implementing the methods. The papers and code have been made available on the PI's web page, http://public.econ.duke.edu/~shakeebk/Personal_Homepage.html. One of the main papers from this part of the proposal has been revised and resubmitted to the Journal of Econometrics. In the second part of the proposal, new inference methods were proposed for a class of models with endogenous discrete variables.
These models are of wide interest in applied fields such as labor economics, where one is interested in the effect of a binary variable, such as job training, on a binary outcome, such as employment status. Another example comes from industrial organization, where one studies market entry decisions: a firm decides whether to enter a particular market, and a key determining factor is whether a competing firm is already present in that market. The statistical analysis in such examples can be highly nonstandard and complicated. The proposal develops new methods for these settings, which have resulted in multiple working papers; one of them, "Information Structure and Statistical Information in Discrete Response Models" (with D. Nekipelov), has been revised and resubmitted to Quantitative Economics. In the third part of the proposal, a new method was developed to conduct optimal inference in a wide class of competing risks models. These models are of wide interest in both biostatistics and microeconometrics. In the former, one is interested in survival analysis, specifically in how long a patient will live after receiving a treatment. In the latter, one is interested in estimating a labor supply curve when the agent chooses which sector to work in, and the statistical analysis is complicated by the fact that the observed wage is the highest one offered to the individual. While this model has been widely studied in both biostatistics and econometrics, there has been little, if any, work on optimal inference, that is, on methods that are statistically efficient. The proposal fully develops such a method, which has resulted in a new working paper with D. Nekipelov.
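The competing risks structure mentioned above can be illustrated with a small simulation sketch. It assumes two independent exponential latent durations with hypothetical rates, which are not values from the proposal, and it is not the efficient inference method developed in this work. Only the earliest failure time and the cause that produced it are observed. This mirrors the labor supply example, where only the highest wage offer is seen. Under these assumptions, the observed duration is exponential with rate λ1 + λ2, and cause 1 occurs with probability λ1 / (λ1 + λ2). The simulated quantities can therefore be checked against closed forms.

```python
import numpy as np

def simulate_competing_risks(n=200_000, rate1=0.5, rate2=1.5, seed=0):
    """Simulate two independent exponential latent durations (hypothetical rates).

    Only the minimum duration and the index of the risk that realized it
    are returned, which is what a competing risks data set records.
    """
    rng = np.random.default_rng(seed)
    t1 = rng.exponential(scale=1.0 / rate1, size=n)  # latent time to cause 1
    t2 = rng.exponential(scale=1.0 / rate2, size=n)  # latent time to cause 2
    observed = np.minimum(t1, t2)                    # first failure time
    cause = np.where(t1 <= t2, 1, 2)                 # which risk occurred first
    return observed, cause

rate1, rate2 = 0.5, 1.5
observed, cause = simulate_competing_risks(rate1=rate1, rate2=rate2)
print("P(cause 1): simulated", round((cause == 1).mean(), 3),
      "theory", rate1 / (rate1 + rate2))
print("mean observed duration:", round(observed.mean(), 3),
      "theory", 1 / (rate1 + rate2))
print("naive mean of cause-1 durations:", round(observed[cause == 1].mean(), 3),
      "vs latent mean", 1 / rate1)
```

In this sketch, naively averaging the durations observed with cause 1 gives about 0.5, well below the latent mean of 2.0. Other risks censor long cause-1 durations, so the latent distributions are not directly observed, and inference must account for this.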