The purpose of the proposed research is to develop methodologies for the analysis of data with highly differential probabilities of inclusion, using model-based methods that combine the robustness of the design-based approach with the optimality obtainable through a model. In the unequal-probability-of-selection sample designs often found in national, population-based, complex-sample-design health surveys, correlations between the probability of selection and the sampled data can induce bias. Weights equal to the inverse of the probability of selection are often used to counteract this bias. Highly disproportional sample designs have large weights, which can introduce unnecessary variability in statistics such as the estimate of the population mean. Weight trimming reduces large weights to a fixed cutpoint value and adjusts weights below this value to maintain the untrimmed weight sum. This reduces variability at the cost of introducing some bias. Standard approaches are not "data-driven": they do not use the data to make the appropriate bias-variance tradeoff, or else do so in a highly inefficient fashion. We propose to develop model-based methods for "weight trimming" to supplement standard, ad hoc design-based methods in disproportional probability-of-inclusion designs where the variance introduced by the sample weights outweighs the bias correction they provide. We will also consider the use of these models to estimate population parameters in linear and generalized linear regression models. We will develop these models in the context of stratified and poststratified known-probability sample designs, and extend their use to more general multistage cluster sample designs. We plan to develop Bayesian models, as we believe that they offer both theoretical and practical advantages over traditional frequentist analyses and that their use has been neglected in the analysis of survey data.
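The standard design-based trimming step described above can be sketched as follows. This is a minimal illustration, not the model-based method proposed here: the function names `trim_weights` and `kish_design_effect`, the proportional redistribution of the trimmed excess, and the example weights are all assumptions made for the sketch. Kish's approximate design effect is included only as a rough indicator of the variability attributable to unequal weights.

```python
import numpy as np


def trim_weights(weights, cutpoint):
    """Cap weights at `cutpoint` and rescale the remaining weights so the
    total equals the untrimmed weight sum (hypothetical helper).

    Assumption: the excess is redistributed proportionally among the
    uncapped weights; rescaling is repeated if it pushes a weight above
    the cutpoint.
    """
    w = np.asarray(weights, dtype=float).copy()
    if np.any(w <= 0):
        raise ValueError("weights must be positive")
    total = w.sum()
    if cutpoint * w.size < total:
        raise ValueError("cutpoint too small to preserve the weight sum")

    capped = np.zeros(w.size, dtype=bool)
    while True:
        newly_capped = (w > cutpoint) & ~capped
        if not newly_capped.any():
            break
        capped |= newly_capped
        w[capped] = cutpoint
        free = ~capped
        if not free.any():
            break
        # Scale uncapped weights so that the overall sum is unchanged.
        target = total - cutpoint * capped.sum()
        w[free] *= target / w[free].sum()
    return w


def kish_design_effect(weights):
    """Kish's approximation: n * sum(w^2) / (sum(w))^2."""
    w = np.asarray(weights, dtype=float)
    return w.size * np.sum(w ** 2) / np.sum(w) ** 2


# Illustrative (made-up) inverse-probability weights.
raw = np.array([1.0, 2.0, 3.0, 10.0, 24.0])
trimmed = trim_weights(raw, cutpoint=12.0)
print(trimmed, trimmed.sum())
print(kish_design_effect(raw), kish_design_effect(trimmed))
```

In this sketch the cutpoint is fixed in advance by the analyst, which is precisely the sense in which such procedures are not data-driven: nothing in the observed outcomes informs how much bias is accepted in exchange for the reduction in variance.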
We will consider three major applications: determining predictors and mechanisms of injury to children in passenger vehicle crashes, using the population-based surveillance dataset of the Partners for Child Passenger Safety; exploring the development of cardiovascular risk factors in children, using the National Health and Nutrition Examination Survey; and determining the prevalence of smoking and cancer screening behavior among adults, using calibration estimators developed to combine data from the Behavioral Risk Factor Surveillance System and the National Health Interview Survey.

Agency
National Institutes of Health (NIH)
Institute
National Heart, Lung, and Blood Institute (NHLBI)
Type
Research Project (R01)
Project #
5R01HL068987-03
Application #
6922810
Study Section
Social Sciences, Nursing, Epidemiology and Methods 4 (SNEM)
Program Officer
Wolz, Michael
Project Start
2003-08-10
Project End
2005-08-31
Budget Start
2005-08-01
Budget End
2005-08-31
Support Year
3
Fiscal Year
2005
Total Cost
$1
Indirect Cost
Name
University of Pennsylvania
Department
Biostatistics & Other Math Sci
Type
Schools of Medicine
DUNS #
042250712
City
Philadelphia
State
PA
Country
United States
Zip Code
19104
Elliott, Michael R (2009) Model Averaging Methods for Weight Trimming in Generalized Linear Regression Models. J Off Stat 25:1-20
Elliott, Michael R (2008) Model Averaging Methods for Weight Trimming. J Off Stat 24:517-540