This research project will develop a unified framework for survey weighting through novel modifications of multilevel regression and poststratification (MRP) that incorporate design-based information into modeling. Real-life survey data are often unrepresentative because of selection bias and nonresponse. Existing methods for adjusting for known differences between the sample and the population from which it is drawn have advantages but also practical limitations: classical weights are subject to large variability and can result in unstable estimators, while regression approaches present computational and modeling challenges. The new framework developed by these investigators will allow adjustment for selection bias and nonresponse as well as improvements in design-respecting inference. Using this approach, survey analysts will be able to properly account for non-ignorable design issues in the regression framework, and practitioners who conduct surveys in government, academic, commercial, and non-profit sectors will be able to construct statistically efficient survey weights in a routine manner. The framework may also be applicable to problems arising from the recent growth of "big data," such as integration of surveys from multiple sources, analysis of streaming data, and respondent-driven sampling. The project will develop software that can be accessed by the general research community.
This research project will connect survey weighting with poststratification under the framework of MRP. In MRP, data are partially pooled during the modeling process, and the resulting local (cell-level) estimates are then combined via poststratification to obtain population inference. This smoothed estimation borrows information from neighboring poststratification cells and allows flexible multilevel modeling strategies that have the potential to be robust to model misspecification. The project generalizes MRP to handle weighting adjustments for regression, deep interactions, calibration for non-census variables, complex survey design, multistage sampling, multiple survey frames, and other complications that arise in real-world survey analysis. The new methods will be applied to two ongoing surveys, the New York Longitudinal Poverty Measure study and the Fragile Families and Child Wellbeing study. Computations will be performed using the open source Bayesian program Stan, and the resulting software will be freely disseminated. The project is supported by the Methodology, Measurement, and Statistics Program and a consortium of federal statistical agencies as part of a joint activity to support research on survey and statistical methodology.
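
The two MRP steps described above, partial pooling followed by poststratification, can be illustrated with a small Python sketch. This is a toy simplification and not the project's method: the project fits full Bayesian multilevel models in Stan, whereas the sketch uses a closed-form shrinkage of cell means with variance components that are simply assumed known. The function names `partial_pool` and `poststratify`, the variance values, and all numbers are hypothetical and chosen only for illustration.

```python
def partial_pool(cell_means, cell_sizes, sigma_y2, sigma_a2):
    """Shrink each cell's sample mean toward the sample-size-weighted grand mean.

    sigma_y2: assumed within-cell variance (hypothetical, treated as known).
    sigma_a2: assumed between-cell variance (hypothetical, treated as known).
    """
    total_n = sum(cell_sizes)
    grand = sum(m * n for m, n in zip(cell_means, cell_sizes)) / total_n
    pooled = []
    for m, n in zip(cell_means, cell_sizes):
        # The weight on a cell's own mean grows with its sample size,
        # so sparse cells borrow more strength from the other cells.
        w = (n / sigma_y2) / (n / sigma_y2 + 1.0 / sigma_a2)
        pooled.append(w * m + (1.0 - w) * grand)
    return pooled


def poststratify(cell_estimates, pop_counts):
    """Combine cell estimates using known population cell counts N_j."""
    total = sum(pop_counts)
    return sum(e * N for e, N in zip(cell_estimates, pop_counts)) / total


if __name__ == "__main__":
    # Hypothetical data: three poststratification cells.
    means = [0.20, 0.55, 0.80]   # sample means per cell
    sizes = [5, 40, 15]          # sample sizes per cell
    pops = [3000, 5000, 2000]    # population counts per cell

    smoothed = partial_pool(means, sizes, sigma_y2=0.25, sigma_a2=0.05)
    print("smoothed cell estimates:", smoothed)
    print("population estimate:", poststratify(smoothed, pops))
```

In this sketch the smallest cell is pulled most strongly toward the grand mean, and the final estimate reweights the smoothed cells by their population shares rather than their sample shares, which is what corrects for an unrepresentative sample.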