The objective of this research is to develop a new conceptual framework, together with an effective analytical approach, for survey sampling. The new framework centers on explicitly linking survey sampling to missing data problems. Conceptually, inference from a survey sample is similar to inference in a missing data problem: study variables are observed for subjects in the sample but are missing for those outside the sample. The investigator will develop efficient estimation methods in two steps: first constructing efficient estimators for missing data problems with independent and identically distributed data, for example based on nonparametric maximum likelihood, and then extending those estimators to survey sampling.

Survey sampling is widely used for information gathering and analysis in various settings, including government agencies, academic institutions, and industries. This research will help to draw more accurate inferences than before from survey data. This will improve the cost-effectiveness of surveys and lead to better informed policy analysis and scientific investigation. Computer programs for implementing the methods will be made publicly available.

Project Report

Survey sampling is an area of statistics that is important to the social and economic sciences and to government agencies, among others. It is often regarded as a unique area because the finite population under study is fixed whereas the sampling process is random. We proposed a novel approach to survey sampling by exploiting the connection between sampling and missing data problems: the data on individuals who are not in the sample are missing by design. The proposed estimators (i) take the simple form of generalized regression and calibration estimators, (ii) are design-efficient, like an optimal but complicated regression estimator, under rejective sampling or high-entropy sampling (including Rao-Sampford sampling), and (iii) remain consistent and asymptotically normal under a general sampling design, like existing estimators but often with greater efficiency. Moreover, we developed inference methods for complex survey data based on a weighted Kullback-Leibler distance. With suitable choices of the weight factor, the proposed approach includes the pseudo empirical likelihood method and the calibrated likelihood method as special cases. Accurate confidence intervals can be constructed using a ratio statistic, which has an asymptotic scaled chi-squared distribution, as does the pseudo empirical likelihood ratio statistic. The new methods are disseminated as part of the R package iWeigReg, publicly available at http://cran.r-project.org/web/packages/iWeigReg/index.html.
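To make the form of generalized regression and calibration estimators concrete, the following is a minimal Python sketch of the textbook linear calibration estimator with a single auxiliary variable. It is not the estimator proposed in this project and it does not use the iWeigReg interface. The function names, the simulated population, and the use of Poisson sampling (chosen only because it is easy to simulate) are all assumptions made for illustration. The sketch starts from the design weights 1/pi_i used by the Horvitz-Thompson estimator, which treats unsampled units as missing by design, and adjusts them so that the weighted sample reproduces the known population size and the known population total of the auxiliary variable.

```python
import random


def ht_total(y, pi):
    """Horvitz-Thompson estimate of a population total: sum of y_i / pi_i."""
    return sum(yi / p for yi, p in zip(y, pi))


def calibration_weights(x, pi, pop_size, x_total):
    """Linear (chi-square distance) calibration weights for the auxiliary
    vector (1, x_i). The returned weights w_i = d_i * (1 + lam0 + lam1 * x_i)
    satisfy sum(w_i) = pop_size and sum(w_i * x_i) = x_total."""
    d = [1.0 / p for p in pi]  # design weights
    # Design-weighted moments of the auxiliary vector (1, x_i).
    s0 = sum(d)
    s1 = sum(di * xi for di, xi in zip(d, x))
    s2 = sum(di * xi * xi for di, xi in zip(d, x))
    # Gaps between the known totals and their Horvitz-Thompson estimates.
    r0 = pop_size - s0
    r1 = x_total - s1
    # Solve the 2x2 system [[s0, s1], [s1, s2]] @ lam = [r0, r1] by Cramer's rule.
    det = s0 * s2 - s1 * s1
    lam0 = (r0 * s2 - r1 * s1) / det
    lam1 = (s0 * r1 - s1 * r0) / det
    return [di * (1.0 + lam0 + lam1 * xi) for di, xi in zip(d, x)]


def greg_total(y, x, pi, pop_size, x_total):
    """Generalized regression estimate of the total of y, written as a
    calibration-weighted sum."""
    w = calibration_weights(x, pi, pop_size, x_total)
    return sum(wi * yi for wi, yi in zip(w, y))


def demo(seed=0):
    """Simulate a hypothetical finite population and one Poisson sample."""
    rng = random.Random(seed)
    pop_size = 1000
    x_pop = [rng.uniform(1.0, 10.0) for _ in range(pop_size)]
    # y is exactly linear in x here, so the calibrated estimate should
    # recover the true total up to rounding error.
    y_pop = [2.0 + 3.0 * xi for xi in x_pop]
    # Unequal inclusion probabilities that increase with x (an assumption).
    pi_pop = [0.02 + 0.02 * xi for xi in x_pop]
    sample = [i for i in range(pop_size) if rng.random() < pi_pop[i]]
    x = [x_pop[i] for i in sample]
    y = [y_pop[i] for i in sample]
    pi = [pi_pop[i] for i in sample]
    x_total = sum(x_pop)
    w = calibration_weights(x, pi, pop_size, x_total)
    return {
        "pop_size": pop_size,
        "x_total": x_total,
        "y_total": sum(y_pop),
        "sum_w": sum(w),
        "sum_wx": sum(wi * xi for wi, xi in zip(w, x)),
        "ht": ht_total(y, pi),
        "greg": greg_total(y, x, pi, pop_size, x_total),
    }


if __name__ == "__main__":
    for name, value in demo().items():
        print(name, value)
```

In this sketch the Horvitz-Thompson estimate varies from sample to sample, whereas the calibrated estimate matches the true total because the simulated y is an exact linear function of x. With real data the relationship is only approximate, and the gain from calibration depends on how well the auxiliary variable predicts the study variable.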

Agency: National Science Foundation (NSF)
Institute: Division of Social and Economic Sciences (SES)
Type: Standard Grant (Standard)
Application #: 1155668
Program Officer: Cheryl Eavey
Project Start:
Project End:
Budget Start: 2012-09-15
Budget End: 2013-08-31
Support Year:
Fiscal Year: 2011
Total Cost: $50,000
Indirect Cost:
Name: Rutgers University
Department:
Type:
DUNS #:
City: Piscataway
State: NJ
Country: United States
Zip Code: 08854