This project supports research on statistical methods for epidemiologic and surveillance analyses of cancer and other diseases using national health surveys.
We developed statistical methods and software for displaying scatter plots and for estimating kernel density smoothers that yield conditional mean and percentile plots from weighted cluster samples. These methods have been useful in examining residual plots from multiple linear, logistic, and Cox regressions to identify data points that might disproportionately influence the results of an analysis.
We developed methods for constructing confidence intervals for rare binary outcomes observed in a survey. Because binomial theory breaks down when the data are weighted and correlated within sampled clusters, our approach modifies the methods used to obtain exact binomial confidence limits: an effective sample size is determined that accounts for the complex sample design's inflation of the variance, and this effective sample size is then used in the exact confidence limit formulas.
A problem arising in logistic regression analysis of risk factors for disease is determining how well the estimated model fits the data. We have developed a method to test the goodness of fit of a logistic regression model with survey data. In this approach, the distribution of a Wald test that compares the observed and expected counts within deciles of risk is simulated under the null hypothesis. This approach is particularly promising for logistic models with small numbers of outcomes, where the asymptotic distribution of the Wald test is not accurate. We are extending this simulation approach to tests of regression coefficients from logistic regression when the numbers of outcomes in covariate cells are sparse, and we are comparing it with score tests under the same sparse-data conditions.
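The effective-sample-size adjustment to exact binomial confidence limits described above can be illustrated with a minimal sketch. It assumes the effective sample size is the ratio of the simple-random-sampling variance p(1-p)/n to the design-based variance of the estimated proportion, and it uses the beta-quantile form of the exact (Clopper-Pearson) limits so that a non-integer effective sample size is allowed. The function names, the bisection-based quantile routine, and the example numbers are illustrative assumptions, not the project's software, and any degrees-of-freedom refinement is omitted.

```python
from math import exp, lgamma, log


def _beta_cf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd steps of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    ln_front = (lgamma(a + b) - lgamma(a) - lgamma(b)
                + a * log(x) + b * log(1.0 - x))
    front = exp(ln_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def beta_ppf(q, a, b, tol=1e-12):
    """Beta quantile by bisection on the CDF (slow but dependency-free)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def effective_sample_size(p_hat, design_var, n):
    """Assumed definition: n_eff = p(1-p) / design variance of p_hat."""
    if p_hat <= 0.0 or p_hat >= 1.0 or design_var <= 0.0:
        return float(n)  # fall back to the nominal sample size
    return p_hat * (1.0 - p_hat) / design_var


def survey_exact_ci(p_hat, design_var, n, alpha=0.05):
    """Exact-type limits with n_eff substituted for the sample size."""
    n_eff = effective_sample_size(p_hat, design_var, n)
    x = p_hat * n_eff  # effective number of events, possibly non-integer
    lower = 0.0 if x <= 0.0 else beta_ppf(alpha / 2, x, n_eff - x + 1.0)
    upper = 1.0 if x >= n_eff else beta_ppf(1.0 - alpha / 2, x + 1.0, n_eff - x)
    return lower, upper


# Hypothetical example: 5% prevalence, n = 100, design effect of 2.
srs_var = 0.05 * 0.95 / 100
ci_srs = survey_exact_ci(0.05, srs_var, 100)
ci_design = survey_exact_ci(0.05, 2 * srs_var, 100)
```

With a design effect of 1 the sketch reduces to the ordinary exact binomial interval. A larger design variance shrinks the effective sample size and widens the interval.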
When regression analysis such as multiple linear, logistic, or Cox regression is used, it is useful to estimate the average predicted response from the regression for each level of a risk factor as if everyone in the population had been exposed to that level of risk. This is called a predictive margin. We have developed variance estimates for predictive margins when the sample data are from a survey. This methodology has been used to analyze the relationship of cancer screening to type of health insurance.
We wrote a graduate-level textbook for instructing students and researchers in public health and epidemiology on how to analyze national health survey data.
Research is under way on the use of survey methods for analyzing two-stage case-control studies. We have been examining the application of jackknife replication methods, which are used in survey research for estimating variances, to the problem of variance estimation for logistic regression coefficients from two-stage case-control studies.
We are developing methods for making inferences about superpopulation parameters. We have developed adjustments to classical finite population variance estimators that can provide accurate variances for superpopulation means. We have extended these variance estimators to ratio and regression parameters and are applying them to the National Health Interview Survey, the National Hospital Discharge Survey, and the Third National Health and Nutrition Examination Survey.
We are researching methods for using latent class theory to analyze dietary survey data. We have developed jackknife methods for estimating standard errors of latent class parameter estimators and have investigated Wald procedures for testing hypotheses about these parameters. These methods have been successfully applied to dietary intake data from the USDA Continuing Survey of Food Intakes by Individuals to estimate the proportion of individuals who meet NCI guidelines for consuming fruits and vegetables.
We are developing design-based consistent estimators of population variance components that improve on existing inconsistent estimators. Simulation studies are under way to investigate the small-sample properties of these design-based estimators.
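The predictive margin defined earlier in this abstract can be sketched in a few lines: for a chosen level of the risk factor, every sampled person is assigned that level, the fitted regression predicts each person's response from their own remaining covariates, and the predictions are averaged using the survey weights. The sketch below assumes a logistic model whose coefficients are already fitted. The coefficient values, covariate names, weights, and function names are hypothetical and chosen only for illustration, and the variance estimation developed in this project is not shown.

```python
from math import exp


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + exp(-z))


def predictive_margin(predict, level, covariates, weights):
    """Weighted mean prediction with everyone assigned to `level`.

    predict(level, z) returns the model's predicted response for a person
    with exposure `level` and other covariates `z`.
    """
    total_weight = sum(weights)
    weighted_sum = sum(w * predict(level, z)
                       for z, w in zip(covariates, weights))
    return weighted_sum / total_weight


# Hypothetical fitted logistic model: outcome = screened (yes/no),
# exposure = type of health insurance, adjusting for age.
INTERCEPT = -1.0
INSURANCE_COEF = {"none": 0.0, "public": 0.5, "private": 0.9}
AGE_COEF = 0.02  # per year of age above 50


def predict_screening(level, z):
    linear = INTERCEPT + INSURANCE_COEF[level] + AGE_COEF * (z["age"] - 50)
    return expit(linear)


# Hypothetical sample: covariates and survey weights for four respondents.
sample_covariates = [{"age": 45}, {"age": 52}, {"age": 60}, {"age": 71}]
sample_weights = [1200.0, 800.0, 1500.0, 500.0]

margins = {
    level: predictive_margin(predict_screening, level,
                             sample_covariates, sample_weights)
    for level in INSURANCE_COEF
}
```

Because each margin uses the same weighted covariate distribution, differences between margins reflect the modeled effect of the risk factor and not differences in the covariate mix of the exposure groups.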

National Institutes of Health (NIH)
Division of Cancer Epidemiology and Genetics (NCI)
Intramural Research (Z01)
Cancer Epidemiology and Genetics
United States
Kant, A K; Graubard, B I (1999) Variability in selected indexes of overall diet quality. Int J Vitam Nutr Res 69:419-27
Breslow, R A; Wideroff, L; Graubard, B I et al. (1999) Alcohol and prostate cancer in the NHANES I epidemiologic follow-up study. First National Health and Nutrition Examination Survey of the United States. Ann Epidemiol 9:254-61
Forman, M R; Zhang, J; Nebeling, L et al. (1999) Relative validity of a food frequency questionnaire among tin miners in China: 1992/93 and 1995/96 diet validation studies. Public Health Nutr 2:301-15
Stillman, F; Hartman, A; Graubard, B et al. (1999) The American Stop Smoking Intervention Study. Conceptual framework and evaluation design. Eval Rev 23:259-80
Graubard, B I; Korn, E L (1999) Analyzing health surveys for cancer-related objectives. J Natl Cancer Inst 91:1005-16
Graubard, B I; Korn, E L (1999) Predictive margins with survey data. Biometrics 55:652-9
Kulldorff, M; Graubard, B; Velie, E (1999) The P-value and P-value function. Epidemiology 10:345-7