This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5).

In the context of missing data and semiparametric regression models (i.e., models with both finite-dimensional and infinite-dimensional parameters), little work has been done on efficient estimation, and still less on estimating general functionals. Most studies limit their attention to estimating the mean response. In contrast, this research project studies the estimation of arbitrary expectations involving the response and the covariates. The investigator will also address the estimation of densities and distribution functions. The focus is on efficient estimation in semiparametric regression with responses missing at random.

The analysis of semiparametric models is an important topic with practical, real-world implications: in applications, some information about the structure of the data is typically available, but not enough to specify an appropriate parametric model; semiparametric methods make optimal use of that information. However, even simple and widely used semiparametric models, such as the partly linear model, are not yet fully understood. This research will further that understanding. Most of the anticipated results will also apply when the data are complete.

The first research strand aims to derive efficient estimators of expectations of the covariates and the response variable in semiparametric regression. A second strand focuses on estimating the response density in the nonlinear regression model. The investigator intends to show that, for certain classes of well-behaved regression functions, the response density can be estimated at the root-n rate and, moreover, efficiently. It is not anticipated that the density can always be estimated at the parametric root-n rate; limitations and possible alternative approaches will be investigated. The key methodological innovation in these two strands is the combination of full imputation, efficiency, and empirical likelihood ideas.
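
The full-imputation idea behind the first two strands can be illustrated with a minimal sketch. It assumes a simple linear regression Y = theta0 + theta1 X + eps, with errors independent of the covariate and responses missing at random given X; the function names, the complete-case least-squares fit and the Gaussian kernel are illustrative choices, not the investigator's estimators. The expectation E h(X, Y) is estimated by averaging h(X_i, r(X_i) + eps_j) over all n covariates (including those whose responses are missing) and all complete-case residuals eps_j, and the response density f_Y(t) by averaging a kernel estimate of the error density at the n points t - r(X_i).

```python
import numpy as np

# Hypothetical setting (an assumption, not the project's model):
# Y = theta0 + theta1 * X + eps, eps independent of X, and Y missing at
# random, i.e. whether Y is observed depends on X only.

def fit_complete_case(x, y, observed):
    """Least-squares estimate of (theta0, theta1) from the complete cases."""
    design = np.column_stack([np.ones(observed.sum()), x[observed]])
    theta, *_ = np.linalg.lstsq(design, y[observed], rcond=None)
    return theta

def fitted_and_residuals(x, y, observed):
    """Fitted values r(X_i) for all n cases; residuals eps_j for complete cases."""
    theta = fit_complete_case(x, y, observed)
    fitted_all = theta[0] + theta[1] * x
    residuals = y[observed] - fitted_all[observed]
    return fitted_all, residuals

def fully_imputed_mean(h, x, y, observed):
    """Estimate E[h(X, Y)] by averaging h(X_i, r(X_i) + eps_j) over every
    covariate X_i (observed or not) and every complete-case residual eps_j."""
    fitted_all, residuals = fitted_and_residuals(x, y, observed)
    imputed = fitted_all[:, None] + residuals[None, :]  # n x N grid of responses
    return float(np.mean(h(x[:, None], imputed)))

def response_density(grid, x, y, observed, bandwidth):
    """Plug-in estimate of f_Y(t) = E f_eps(t - r(X)): a Gaussian kernel
    estimate of f_eps from the residuals, averaged over all n covariates."""
    fitted_all, residuals = fitted_and_residuals(x, y, observed)
    values = []
    for t in grid:
        u = (t - fitted_all[:, None] - residuals[None, :]) / bandwidth
        values.append(np.mean(np.exp(-0.5 * u**2)) / (bandwidth * np.sqrt(2 * np.pi)))
    return np.array(values)

# Usage on simulated data; missing responses are set to NaN and never read.
rng = np.random.default_rng(0)
n = 400
x = rng.uniform(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)
observed = rng.uniform(size=n) < 0.5 + 0.4 * x  # MAR: depends on x only
y[~observed] = np.nan

mean_estimate = fully_imputed_mean(lambda xx, yy: yy, x, y, observed)  # true E[Y] = 2
grid = np.linspace(-2.0, 6.0, 161)
density_values = response_density(grid, x, y, observed, bandwidth=0.2)
```

Using every covariate, not only those from complete cases, is what separates this estimator from a complete-case average; the efficiency and empirical likelihood refinements described in the project are not attempted in this sketch.
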
The third strand considers estimation of the error distribution function in nonparametric regression with missing responses.
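
For the third strand, a natural baseline, assumed here purely for illustration and not claimed to be the project's method, fits the regression function nonparametrically on the complete cases (here with a Gaussian-kernel Nadaraya-Watson smoother) and takes the empirical distribution function of the complete-case residuals. The function names, the bandwidth and the simulated model below are hypothetical.

```python
import numpy as np

def nadaraya_watson(x_eval, x_obs, y_obs, bandwidth):
    """Gaussian-kernel Nadaraya-Watson estimate of r(x) = E[Y | X = x]."""
    u = (x_eval[:, None] - x_obs[None, :]) / bandwidth
    weights = np.exp(-0.5 * u**2)
    return weights @ y_obs / weights.sum(axis=1)

def error_cdf(t_values, x, y, observed, bandwidth):
    """Empirical distribution function of the complete-case residuals
    eps_j = Y_j - r_hat(X_j), with r_hat fitted on the complete cases only."""
    x_c, y_c = x[observed], y[observed]
    residuals = y_c - nadaraya_watson(x_c, x_c, y_c, bandwidth)
    return np.mean(residuals[None, :] <= np.asarray(t_values)[:, None], axis=1)

# Hypothetical simulated data: responses missing at random given x.
rng = np.random.default_rng(1)
n = 500
x = rng.uniform(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, n)
observed = rng.uniform(size=n) < 0.4 + 0.5 * x  # MAR: depends on x only
y[~observed] = np.nan

t_values = np.array([-0.3, 0.0, 0.3])
cdf_estimate = error_cdf(t_values, x, y, observed, bandwidth=0.05)
# For N(0, 0.3^2) errors the true values are about 0.159, 0.5 and 0.841.
```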

Many scientific investigations depend on statistical analysis to draw conclusions. In many cases, however, incomplete data threaten the accuracy of those conclusions. This applies in many fields, including epidemiology, pharmaceutical research, and social and behavioral studies that analyze survey data. The results of this research project will enable data sets with missing values to be analyzed more efficiently, improving the accuracy of the statistical conclusions drawn from them. Despite significant recent progress, inefficient methods remain in frequent use. Examples include listwise deletion of incomplete cases and imputation methods that do not use all the available information about the data. Deleting or disregarding unique or scarce data is clearly undesirable. Efficient analysis makes use of all available information about the structure of the data, leading to asymptotically unbiased estimators with minimal dispersion: in other words, greater accuracy.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 0907014
Program Officer: Gabor J. Szekely
Budget Start: 2009-07-01
Budget End: 2012-08-31
Fiscal Year: 2009
Total Cost: $113,928
Name: Texas A&M Research Foundation
City: College Station
State: TX
Country: United States
Zip Code: 77845