This research project will develop a unified framework for survey weighting through novel modifications of multilevel regression and poststratification (MRP) that incorporate design-based information into modeling. Real-life survey data are often unrepresentative because of selection bias and nonresponse. Existing methods for adjusting for known differences between the sample and the population from which it is drawn have some advantages but also practical limitations. Classical weights are subject to large variability and can result in unstable estimators, while regression approaches present computational and modeling challenges. The new framework developed by these investigators will allow adjustment for selection bias and nonresponse as well as improvements in design-respecting inference. Using this approach, survey analysts will be able to properly account for non-ignorable design issues in the regression framework, and practitioners who conduct surveys in government, academic, commercial, and non-profit sectors will be able to construct statistically efficient survey weights in a routine manner. This new framework may be applicable to problems arising from the recent explosion of "big data," such as integration of surveys from multiple sources, analysis of streaming data, and respondent-driven sampling. The project will develop software that can be accessed by the general research community.
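The instability of classical weights noted above can be illustrated with Kish's effective sample size, a standard approximation that is not part of the project description itself. The sketch below is a minimal illustration with made-up weights; the function name and both weight vectors are hypothetical.

```python
def kish_effective_sample_size(weights):
    """Kish's approximation: n_eff = (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    total_of_squares = sum(w * w for w in weights)
    return total * total / total_of_squares


# Hypothetical weights for 100 respondents (illustrative only).
equal_weights = [1.0] * 100                  # every respondent counts equally
skewed_weights = [1.0] * 90 + [10.0] * 10    # ten respondents carry large weights

n_eff_equal = kish_effective_sample_size(equal_weights)
n_eff_skewed = kish_effective_sample_size(skewed_weights)

print(f"equal weights:  n_eff = {n_eff_equal:.1f}")
print(f"skewed weights: n_eff = {n_eff_skewed:.1f}")
```

With equal weights the effective sample size equals the nominal 100; with the skewed weights it falls to roughly 33, which suggests how a handful of large weights can inflate the variance of a weighted estimator.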

This research project will connect survey weighting with poststratification under the framework of MRP. In MRP, data are partially pooled during the modeling process, and the resulting local estimates are then combined via poststratification to obtain population-level inference. This smoothed estimation borrows information from neighboring poststratification cells and allows flexible multilevel modeling strategies that have the potential to be robust to model misspecification. The project generalizes MRP to handle weighting adjustments for regression, deep interactions, calibration for non-census variables, complex survey design, multistage sampling, multiple survey frames, and other complications that arise in real-world survey analysis. The new methods will be applied to two ongoing surveys, the New York Longitudinal Poverty Measure study and the Fragile Families and Child Wellbeing study. Computations will be performed using the open-source Bayesian program Stan, and the resulting software will be freely disseminated. The project is supported by the Methodology, Measurement, and Statistics Program and a consortium of federal statistical agencies as part of a joint activity to support research on survey and statistical methodology.
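The two MRP steps described above, partial pooling followed by poststratification, can be sketched in a few lines. This is a simplified illustration and not the investigators' method: it replaces the multilevel regression (which the project fits in Stan) with a normal-normal shrinkage of cell means toward the overall sample mean, and it assumes the within-cell and between-cell standard deviations are known. The cell names, responses, population counts, and both standard deviations are hypothetical.

```python
def partially_pooled_means(cell_data, sigma_y, sigma_cell):
    """Shrink each cell mean toward the overall sample mean.

    Assumes a normal-normal model with known within-cell sd (sigma_y) and
    between-cell sd (sigma_cell). A full multilevel model would estimate
    these, and the overall mean, jointly.
    """
    all_values = [y for ys in cell_data.values() for y in ys]
    grand_mean = sum(all_values) / len(all_values)
    pooled = {}
    for cell, ys in cell_data.items():
        n = len(ys)
        if n == 0:
            # An empty cell borrows entirely from the other cells.
            pooled[cell] = grand_mean
            continue
        cell_mean = sum(ys) / n
        data_precision = n / sigma_y ** 2
        prior_precision = 1.0 / sigma_cell ** 2
        shrink = data_precision / (data_precision + prior_precision)
        pooled[cell] = shrink * cell_mean + (1.0 - shrink) * grand_mean
    return pooled


def poststratify(cell_estimates, population_counts):
    """Combine cell estimates using population cell counts as weights."""
    total = sum(population_counts.values())
    return sum(population_counts[c] * cell_estimates[c]
               for c in population_counts) / total


# Hypothetical binary responses by age cell (illustrative only).
cell_data = {
    "young": [1.0, 0.0],
    "middle": [1.0, 1.0, 0.0, 1.0],
    "old": [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
}
# Hypothetical population counts: the sample over-represents "old".
population_counts = {"young": 500, "middle": 300, "old": 200}

cell_estimates = partially_pooled_means(cell_data, sigma_y=0.5, sigma_cell=0.2)
population_estimate = poststratify(cell_estimates, population_counts)
print(cell_estimates)
print(population_estimate)
```

Cells with few respondents are pulled more strongly toward the overall mean, and the final estimate weights each cell by its population share rather than its share of the sample.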

Agency: National Science Foundation (NSF)
Institute: Division of Social and Economic Sciences (SES)
Type: Standard Grant (Standard)
Application #: 1534414
Program Officer: Cheryl Eavey
Project Start:
Project End:
Budget Start: 2015-10-01
Budget End: 2018-09-30
Support Year:
Fiscal Year: 2015
Total Cost: $91,332
Indirect Cost:
Name: Columbia University
Department:
Type:
DUNS #:
City: New York
State: NY
Country: United States
Zip Code: 10027