The ready availability of public-use data from large national population-based complex surveys has immense potential for assessing (1) the population frequency of cancer (incidence and prevalence); (2) hospital length of stay and related treatment costs; (3) cancer screening rates; and (4) newly discovered associations between risk factors (e.g., screening rates, diet) and different cancers. The goal of this project is to demonstrate this potential using novel statistical methods applied to at least seven United States complex surveys. Specifically, we will use the Behavioral Risk Factor Surveillance System and the Health Information National Trends Survey to describe screening rates; the National Health and Nutrition Examination Survey to explore behaviors (diet, smoking, etc.) in current and future cancer patients; the Nationwide Inpatient Sample and the Medical Expenditure Panel Survey to describe hospital length of stay and related costs for treating cancer; the National Home and Hospice Care Survey to explore end-of-life care for cancer patients; and the National Health Interview Survey to examine follow-up of cancer survivors. Complex sample surveys present distinctive analytic problems, and we will develop models and methods appropriate for them. Our proposal has three broad aims of significance to medical researchers: (1) new statistical approaches for small subgroup analyses, in which the standard large-sample complex survey methods can be inappropriate; (2) new statistical procedures for databases that are too large for the usual complex survey approaches to be feasible; and (3) complex survey methods for skewed data. An additional goal is to make the newly developed statistical/epidemiological methodology widely accessible to non-statisticians. For the methods described in each aim, we plan to create macros and procedures that can be used with existing, widely used statistical packages (e.g., SAS). 
Statistical macros and procedures will be documented and made available on the Internet, together with instructions on how to apply these macros to the examples analyzed in the resulting publications.
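
For readers unfamiliar with the "standard large-sample complex survey methods" that the aims refer to, the following is a minimal sketch of one such textbook method: a design-weighted mean with a Taylor-linearized standard error, assuming a stratified design with primary sampling units (PSUs) treated as sampled with replacement. It is not the methodology proposed in this project. The function name `weighted_mean_with_se` and the `(stratum, psu, weight, y)` record layout are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from math import sqrt


def weighted_mean_with_se(records):
    """Design-weighted mean and linearized standard error.

    records: iterable of (stratum, psu, weight, y) tuples (hypothetical layout).
    Assumes PSUs are sampled with replacement within strata, and that every
    stratum contains at least two PSUs.
    """
    records = list(records)
    total_w = sum(w for _, _, w, _ in records)
    mean = sum(w * y for _, _, w, y in records) / total_w

    # Linearized score for the ratio estimator, accumulated to the PSU level.
    psu_scores = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in records:
        psu_scores[stratum][psu] += w * (y - mean) / total_w

    variance = 0.0
    for stratum, psus in psu_scores.items():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} has fewer than two PSUs")
        scores = list(psus.values())
        score_mean = sum(scores) / n_h
        # Between-PSU variability within the stratum.
        variance += n_h / (n_h - 1) * sum((s - score_mean) ** 2 for s in scores)

    return mean, sqrt(variance)


# Toy data: two strata, two PSUs each (values are made up for illustration).
toy = [
    (1, "a", 1.0, 1.0), (1, "a", 1.0, 3.0),
    (1, "b", 1.0, 5.0), (1, "b", 1.0, 7.0),
    (2, "c", 2.0, 2.0), (2, "d", 2.0, 4.0),
]
mean, se = weighted_mean_with_se(toy)
```

The standard error here rests on a large-sample approximation driven by the number of PSUs per stratum, which is why small subgroups (few PSUs contributing data) and heavily skewed outcomes, as named in the aims, can make this kind of estimator unreliable.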

Public Health Relevance

National complex survey data are often used in cancer epidemiology. We propose new approaches for analyzing such data that are theoretically valid, technically simple, and implementable within most standard sample survey packages.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Project (R01)
Study Section
Epidemiology of Cancer Study Section (EPIC)
Program Officer
Feuer, Eric J
Brigham and Women's Hospital
United States
Lin, Yan; Lipsitz, Stuart R; Sinha, Debajyoti et al. (2018) Exact Bayesian p-values for a test of independence in a 2 × 2 contingency table with missing data. Stat Methods Med Res 27:3411-3419
Lipsitz, Stuart; Fitzmaurice, Garrett; Sinha, Debajyoti et al. (2017) One-Step Generalized Estimating Equations with Large Cluster Sizes. J Comput Graph Stat 26:734-737
Rader, Kevin A; Lipsitz, Stuart R; Fitzmaurice, Garrett M et al. (2017) Bias-corrected estimates for logistic regression models for complex surveys with application to the United States' Nationwide Inpatient Sample. Stat Methods Med Res 26:2257-2269
Lipsitz, Stuart R; Fitzmaurice, Garrett M; Sinha, Debajyoti et al. (2017) Efficient Computation of Reduced Regression Models. Am Stat 71:171-176
Fraser, Raphael André; Lipsitz, Stuart R; Sinha, Debajyoti et al. (2016) Approximate median regression for complex survey data with skewed response. Biometrics 72:1336-1347
Lipsitz, Stuart R; Fitzmaurice, Garrett M; Arriaga, Alex et al. (2015) Using the jackknife for estimation in log link Bernoulli regression models. Stat Med 34:444-53
Lipsitz, Stuart R; Fitzmaurice, Garrett M; Sinha, Debajyoti et al. (2015) Testing for independence in J×K contingency tables with complex sample survey data. Biometrics 71:832-40
Fitzmaurice, Garrett M; Lipsitz, Stuart R; Arriaga, Alex et al. (2014) Almost efficient estimation of relative risk regression. Biostatistics 15:745-56
Carter, Stacey C; Lipsitz, Stuart; Shih, Ya-Chen T et al. (2014) Population-based determinants of radical prostatectomy operative time. BJU Int 113:E112-8
Fitzmaurice, Garrett; Lipsitz, Stuart; Natarajan, Sundar et al. (2014) Simple methods of determining confidence intervals for functions of estimates in published results. PLoS One 9:e98498

Showing the most recent 10 out of 14 publications