Diagnostics for linear regression models are included as options in many statistical packages and are now readily available to analysts. However, these tools are generally designed for ordinary or weighted least squares regression and do not account for the stratification, clustering, and survey weights that are features of data sets collected in complex sample surveys. Ordinary least squares diagnostics can mislead users because the standard procedures usually estimate the variances of model parameter estimates incorrectly, and variance or standard error estimates are an integral part of many diagnostics. This research will adapt existing diagnostics for use with survey data and, where necessary, develop new ones. The project will also study the properties of existing linear regression diagnostics when they are applied to complex survey data. Extensions are needed to cover both clustered and unclustered data. The particular techniques to be studied are: leverages for linear regression and their heuristic cutoffs for influence; distributions of leverages, including histograms and quantiles; modification of unit-deletion measures of influence on model parameter estimates and predicted values, and of the rules of thumb used to identify influential observations; change in standard error estimates due to deletion of an observation or group of observations; and extension of collinearity diagnostics, including variance inflation factors and variance decompositions for parameter estimates.
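As a rough illustration of the first technique listed above, the sketch below computes leverages for a weighted simple linear regression (intercept plus one predictor) and flags units using the conventional least squares rule of thumb of 2p/n. It is a minimal sketch, not the project's method: the function names, the data, and the weights are all hypothetical, and the 2p/n cutoff is the familiar ordinary least squares heuristic, not a cutoff adapted to complex survey designs.

```python
def weighted_leverages(x, w):
    """Diagonal of H = X (X'WX)^{-1} X'W for the model y = b0 + b1 * x.

    For an intercept plus one predictor this reduces to the closed form
    h_i = w_i * (1 / sum(w) + (x_i - xbar_w)^2 / Sxx_w),
    where xbar_w and Sxx_w are the weighted mean and weighted sum of squares.
    """
    total_w = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return [wi * (1.0 / total_w + (xi - xbar) ** 2 / sxx)
            for wi, xi in zip(w, x)]


def flag_high_leverage(h, p=2, multiplier=2.0):
    """Return indices whose leverage exceeds multiplier * p / n.

    Assumption: this is the usual OLS heuristic; it ignores strata and
    clusters entirely.
    """
    cutoff = multiplier * p / len(h)
    return [i for i, hi in enumerate(h) if hi > cutoff]


# Hypothetical data: six sample units, the last with an extreme x value
# and a much larger survey weight than the rest.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
w = [10.0, 10.0, 10.0, 10.0, 10.0, 50.0]

h = weighted_leverages(x, w)
flagged = flag_high_leverage(h)
```

In this setup the leverages sum to the number of model parameters (here 2), as they do in the unweighted case, and only the last unit exceeds the cutoff. The sketch uses the weights only in the point estimation; it says nothing about how stratification or clustering should change the cutoff, which is the question the abstract raises.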

The data collected in many surveys sponsored by U.S. government agencies and other domestic and international organizations are used to fit statistical models. These models are used to understand the correlates of disease, unemployment, educational achievement levels, and other topics. The surveys are typically stratified, single-stage or multistage designs in which units can have substantially different survey weights. Examples of substantive areas include medical conditions, expenditures for medical care, the social welfare of families and children, and progress in education. Evaluating and improving existing methods of model fitting and diagnosis is important for making the most of the data collected in these surveys and for avoiding misleading or erroneous conclusions. The research is supported by the Methodology, Measurement, and Statistics Program and a consortium of federal statistical agencies as part of a joint activity to support research on survey and statistical methodology.

National Science Foundation (NSF)
Division of Social and Economic Sciences (SES)
Standard Grant (Standard)
Program Officer
Cheryl L. Eavey
University of Michigan Ann Arbor
Ann Arbor
United States