Modern statistical and econometric models in the social and natural sciences are complex and typically include many unknown parameters. Some of these parameters are of particular interest to researchers and policymakers (e.g., the mean effect of a treatment), while others are not (e.g., the exact form of a regression function or the probability law of the observed covariates). The latter are usually called nuisance parameters: researchers are not interested in them, but their values are needed to conduct valid statistical inference on the parameters of interest. An important class of such models consists of the so-called semiparametric models, whose distinctive feature is that the nuisance parameters are functions (as opposed to numbers). A modern, well-established approach in statistics and econometrics to conduct inference in semiparametric models is to first estimate the nuisance parameters from the available data in a flexible, non-parametric way, and then use these estimates as a preliminary guess of their true values when conducting inference on the parameters of interest. This procedure is generically referred to as semiparametric inference, and it is particularly useful because of its flexibility and its lack of sensitivity to biases generated by model misspecification.
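The two-step logic described above can be illustrated with a minimal Python sketch of Robinson's (1988) classical estimator for the partially linear model y = beta*d + g(z) + error, where the unknown function g is the nuisance parameter. The sketch assumes a scalar z, a Gaussian kernel, a leave-one-out Nadaraya-Watson first step with bandwidth h, and simulated data; the function names nw_leave_one_out and robinson_plm are hypothetical, and this is a textbook estimator rather than any procedure developed in this project. The bandwidth h is the kind of user-chosen smoothing parameter discussed in this abstract.

```python
import numpy as np


def nw_leave_one_out(z, v, h):
    """Leave-one-out Nadaraya-Watson estimate of E[v | z] at each z_i (Gaussian kernel).

    Hypothetical helper for illustration; h is the smoothing (bandwidth) parameter.
    """
    u = (z[:, None] - z[None, :]) / h
    w = np.exp(-0.5 * u**2)
    np.fill_diagonal(w, 0.0)  # drop observation i from its own fit
    return w @ v / w.sum(axis=1)


def robinson_plm(y, d, z, h):
    """Two-step estimate of beta in y = beta*d + g(z) + error (hypothetical helper).

    Step 1: estimate the nuisance functions E[y|z] and E[d|z] nonparametrically.
    Step 2: regress the y-residuals on the d-residuals by least squares.
    """
    ry = y - nw_leave_one_out(z, y, h)
    rd = d - nw_leave_one_out(z, d, h)
    return float(rd @ ry / (rd @ rd))


if __name__ == "__main__":
    # Simulated data (assumption): true beta = 2, g(z) = cos(2*pi*z).
    rng = np.random.default_rng(0)
    n = 1000
    z = rng.uniform(size=n)
    d = np.sin(2 * np.pi * z) + rng.normal(size=n)
    y = 2.0 * d + np.cos(2 * np.pi * z) + rng.normal(size=n)
    # Same data, different user-chosen bandwidths.
    for h in (0.02, 0.05, 0.2, 0.5):
        print(f"h = {h:.2f}: beta_hat = {robinson_plm(y, d, z, h):.3f}")
```

Running the script prints one estimate per bandwidth for the same sample; comparing them is a quick, informal way to see how much the first-step smoothing choice moves the estimate of beta.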

Semiparametric inference procedures are very popular among theoretical researchers, partly because of their attractive and well-understood large-sample properties (approximations that assume a large amount of data). However, these procedures are considerably less popular among empirical researchers and policymakers, mainly because they are known to be highly sensitive to the way they are implemented in practice. Specifically, an important drawback of most semiparametric inference procedures is that they rely on non-parametric techniques to estimate the nuisance parameters, which in turn require the selection of tuning and smoothing parameters (for example, the bandwidth of a kernel estimator or the number of terms in a series approximation). These additional parameters are introduced artificially into the inference procedure to approximate the unknown functions (the nuisance parameters) flexibly, yet the large-sample approximations employed in the literature ignore their effect. This leads to an important lack of robustness: small changes in the choice of tuning and smoothing parameters can lead to dramatically different empirical results, making applied work unreliable. In other words, this lack of robustness usually translates into incorrect statistical inference, which may lead researchers and policymakers to draw flawed conclusions from empirical work that employs these semiparametric inference procedures.

The main goal of the proposed research agenda is to develop new, alternative large-sample approximations for commonly used semiparametric inference procedures that (at least partially) account for the effect of the specific user-defined choices of tuning and smoothing parameters. This alternative asymptotic theory leads to more "robust" statistical inference procedures because it captures the effect of certain terms that conventional large-sample approximations assume away. The project will proceed in two main stages. First, alternative large-sample approximations will be developed for specific semiparametric examples, including weighted average derivatives and the partially linear model. Not only are these models of interest in their own right; they will also provide some of the key ingredients needed to understand the new theoretical features that emerge from the non-standard large-sample approximations studied in this proposal. Among other problems, the goal is to establish an alternative first-order large-sample distribution, derive valid standard-error estimators, develop new ways of selecting the tuning and smoothing parameters, study the validity of commonly used resampling procedures, and explore the higher-order implications of the alternative asymptotic approximations. Once these particular semiparametric procedures are well understood, the second stage of the investigation will be to generalize and unify the theoretical results obtained for these special examples, covering many other problems of interest.
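As an illustration of the first example, one well-known member of the weighted average derivative class is the density-weighted average derivative of Powell, Stock and Stoker (1989), theta = E[f(x) g'(x)], where g(x) = E[y | x] and f is the density of x. The sketch below is a minimal version under stated assumptions: a scalar regressor, a Gaussian kernel, and simulated data in which g(x) = x and x is standard normal, so that theta = 1/(2*sqrt(pi)) ≈ 0.282. The helper name dwad is hypothetical, and the code implements the classical kernel estimator, not the new large-sample theory or standard errors proposed here.

```python
import numpy as np


def dwad(y, x, h):
    """Density-weighted average derivative for a scalar regressor (hypothetical helper).

    Estimates theta = E[f(x) g'(x)], with g(x) = E[y | x] and f the density of x,
    via the U-statistic
        theta_hat = C(n, 2)^{-1} * sum_{i<j} -h^{-2} * Kdot((x_i - x_j) / h) * (y_i - y_j),
    where Kdot(u) = -u * phi(u) is the derivative of the standard normal kernel phi.
    """
    n = len(y)
    u = (x[:, None] - x[None, :]) / h
    kdot = -u * np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    pair = -kdot * (y[:, None] - y[None, :]) / h**2
    # pair[i, j] == pair[j, i] and the diagonal is zero, so the sum over i < j is
    # half of the full sum; dividing the full sum by n(n-1) applies C(n, 2)^{-1}.
    return float(pair.sum() / (n * (n - 1)))


if __name__ == "__main__":
    # Simulated data (assumption): g(x) = x, so theta = E[f(x)] = 1 / (2 * sqrt(pi)).
    rng = np.random.default_rng(1)
    n = 1500
    x = rng.normal(size=n)
    y = x + 0.5 * rng.normal(size=n)
    print(f"target theta = {1 / (2 * np.sqrt(np.pi)):.4f}")
    # Same data, different user-chosen bandwidths.
    for h in (0.05, 0.1, 0.3, 1.0):
        print(f"h = {h:.2f}: theta_hat = {dwad(y, x, h):.4f}")
```

The estimator is a U-statistic whose kernel depends on the bandwidth h; printing estimates over a grid of bandwidths shows, in a given simulated sample, how the choice of h shifts the estimate, which is the kind of sensitivity the alternative approximations aim to capture.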

The results of this research are expected to benefit several fields of study, ranging from Economics and Political Science to Biostatistics and Public Health, by allowing researchers to conduct "robust" inference in semiparametric models and making semiparametric inference more attractive to researchers and policymakers. To further increase the impact of this research, a key goal is to provide computer code for commonly used software platforms and to write a non-technical survey discussing the theory and implementation of both the classical results and the new results emerging from the proposed research.

Project Report

Conducting credible empirical work in Economics and other social sciences is one of the most important and difficult tasks in both academic and policy work. Academic researchers and policymakers prefer statistical inference procedures that are both flexible and reliable when used in empirical work. Unfortunately, flexibility often comes at the price of robustness: many econometric and statistical procedures used in Economics and other sciences require choosing tuning and other parameters, and these choices make the procedures quite sensitive in applications. The main goal of this grant was to introduce and develop a new large-sample distribution theory that can be used to construct econometric and statistical procedures that are more credible (that is, more robust) in applications. Specifically, the grant developed new interval estimators for parameters related to treatment effects (that is, the effect of a policy or intervention on some outcome of interest) for two classes of models: (1) weighted average derivatives and (2) linear regression models with many covariates. These models are commonly used in empirical work related to program evaluation, so the results from this grant may be useful in a variety of fields, ranging from Economics and Political Science to Public Policy and Public Health. While the research underlying this grant was mostly theoretical, several practical results of importance for empirical work were also obtained. These are: (1) new standard-error and confidence-interval formulas for weighted average derivatives, (2) new standard-error and confidence-interval formulas for linear regression models with many covariates, (3) new inference procedures based on resampling techniques (the bootstrap) for weighted average derivative estimators, (4) new tuning parameter selectors for different estimators in these models, and (5) new software for estimation in these models, together with simulation evidence.

National Science Foundation (NSF)
Division of Social and Economic Sciences (SES)
Standard Grant (Standard)
Program Officer: Georgia Kosmopoulou
University of California Berkeley
United States