We will develop and evaluate improved statistical methods for the design and analysis of biomedical studies conducted under general biased sampling schemes: the univariate and multivariate outcome-auxiliary-dependent sampling (OADS) designs and the two-stage OADS designs. The advantage of such designs is that they combine prospective and retrospective samples. The prospective sample provides the benefits of a cohort study, and the retrospective sample enables investigators to concentrate resources where the information is greatest, i.e., on judiciously chosen subsets defined by the outcome and auxiliary covariate information. New statistical methods are needed to realize the potential statistical efficiency of these designs. We will extend the simple outcome-dependent sampling (ODS) design to allow the sampling probability to depend on a continuous outcome and a continuous auxiliary covariate. We will also develop optimal two-stage OADS designs under the budget and precision/power constraints commonly encountered in practice. Tools and benchmarks for comparing the available sampling options at the planning stage of a study will be developed: the relative-budget-index (RBI) for the fixed precision/power case and the relative-gain-index (RGI) for the fixed budget case. The proposed methods are particularly useful in cancer and environmental research, where auxiliary exposure information and expensive exposure assessment are frequent challenges.

The proposal consists of six projects. The first project deals with semiparametric efficient inference for the two-stage OADS design, where the first-stage data can come either from a simple random sample or from an ODS sample. The second project concerns the optimal two-stage OADS design for a fixed budget and the development of a formal evaluation criterion (RGI) that measures how close an alternative design is to the optimal one. The third project concerns the optimal two-stage OADS design for a given precision/power and the development of a formal evaluation criterion (RBI) that measures how close an alternative design is to the optimal one (the one with the minimal budget). The fourth project considers multivariate OADS and multivariate two-stage OADS designs and develops semiparametric inference for correlated responses under the multivariate OADS. The fifth project concerns a partial linear model for nonlinear exposure effects in both fixed and random effects regression analysis under OADS and two-stage OADS designs. The sixth project investigates variable selection and hypothesis testing techniques for data from the two-stage OADS design.

The strengths and weaknesses of the proposed methods will be critically examined via theoretical investigations and simulations. Cost-effective sampling strategies for a given setting will be investigated. Comparisons with existing methods will be conducted. Related software will be developed. Data sets from epidemiologic and environmental studies on the effects of environmental exposures, and on cancer and other diseases, will be analyzed. These include the Cancer Risk in Uranium Miners Study, the Magnetic Fields and Breast Cancer Risk Study, the Collaborative Perinatal Project, the Family Heart Study, and the DDE-antiandrogen Study.
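As a rough illustration of the sampling idea described above, the following sketch draws a simple outcome-dependent sample from a simulated cohort: a simple random sample (the prospective component) plus supplemental samples from the lower and upper tails of a continuous outcome (the retrospective component). It is not the proposal's method. The function names, the one-standard-deviation cutpoints, the sample sizes, and the linear data-generating model are all assumptions made for illustration, and the auxiliary-covariate dependence of a full OADS design is omitted.

```python
import random
import statistics


def simulate_cohort(n, beta0=0.0, beta1=0.5, seed=1):
    """Simulate a hypothetical cohort with outcome = beta0 + beta1 * exposure + noise."""
    rng = random.Random(seed)
    exposure = [rng.gauss(0.0, 1.0) for _ in range(n)]
    outcome = [beta0 + beta1 * x + rng.gauss(0.0, 1.0) for x in exposure]
    return exposure, outcome


def draw_ods_sample(outcomes, n_srs, n_low, n_high, cut=1.0, seed=0):
    """Select indices for an SRS component plus supplemental tail samples.

    The tails are defined (as an assumption) by mean +/- cut * SD of the outcome.
    """
    rng = random.Random(seed)
    indices = list(range(len(outcomes)))
    mean_y = statistics.fmean(outcomes)
    sd_y = statistics.pstdev(outcomes)
    low_cut = mean_y - cut * sd_y
    high_cut = mean_y + cut * sd_y

    # Prospective component: simple random sample from the whole cohort.
    srs = rng.sample(indices, n_srs)
    chosen = set(srs)

    # Retrospective component: supplemental draws from the outcome tails,
    # excluding subjects already selected in the SRS.
    low_pool = [i for i in indices if i not in chosen and outcomes[i] < low_cut]
    high_pool = [i for i in indices if i not in chosen and outcomes[i] > high_cut]
    low = rng.sample(low_pool, min(n_low, len(low_pool)))
    high = rng.sample(high_pool, min(n_high, len(high_pool)))

    return {"srs": srs, "low": low, "high": high, "cutpoints": (low_cut, high_cut)}


exposure, outcome = simulate_cohort(2000)
sample = draw_ods_sample(outcome, n_srs=200, n_low=50, n_high=50)
print({key: len(sample[key]) for key in ("srs", "low", "high")})
```

In a design of this kind, the expensive exposure would be measured only on the selected subjects. Because the tails are oversampled, an analysis that ignores the sampling scheme would generally be biased, which is why design-specific inference methods are needed.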

Public Health Relevance

We propose and investigate new study designs and analytical methods that will allow biomedical studies to be conducted at lower cost while still providing good statistical power to detect the effects of interest. These designs allow investigators to conduct their studies more efficiently for a given budget and hence can help improve the overall efficiency and productivity of public health research.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Project (R01)
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Feuer, Eric J
University of North Carolina Chapel Hill
Biostatistics & Other Math Sci
Schools of Public Health
Chapel Hill
United States
Xu, Wangli; Zhou, Haibo (2012) Mixed effect regression analysis for a cluster-based two-stage outcome-auxiliary-dependent sampling design with a continuous outcome. Biostatistics 13:650-64
Liu, Yanyan; Yuan, Zhongshang; Cai, Jianwen et al. (2012) Marginal hazard regression for correlated failure time data with auxiliary covariates. Lifetime Data Anal 18:116-38
Zhou, Haibo; You, Jinhong; Qin, Guoyou et al. (2011) A Partially Linear Regression Model for Data from an Outcome-Dependent Sampling Design. J R Stat Soc Ser C Appl Stat 60:559-574
Zhou, Haibo; Qin, Guoyou; Longnecker, Matthew P (2011) A partial linear model in the outcome-dependent sampling setting to evaluate the effect of prenatal PCB exposure on cognitive function in children. Biometrics 67:876-85
Zhou, Haibo; Wu, Yuanshan; Liu, Yanyan et al. (2011) Semiparametric inference for a 2-stage outcome-auxiliary-dependent sampling design with continuous outcome. Biostatistics 12:521-34
Qin, Guoyou; Zhou, Haibo (2011) Partial linear inference for a 2-stage outcome-dependent sampling design with a continuous outcome. Biostatistics 12:506-20
Zhou, Haibo; Zou, Baiming; Hazucha, Milan et al. (2011) Nasal nitric oxide and lifestyle exposure to tobacco smoke. Ann Otol Rhinol Laryngol 120:455-9
Zhou, Haibo; Song, Rui; Wu, Yuanshan et al. (2011) Statistical inference for a two-stage outcome-dependent sampling design with a continuous outcome. Biometrics 67:194-202
Liu, Yanyan; Wu, Yuanshan; Cai, Jianwen et al. (2010) Additive-multiplicative rates model for recurrent events. Lifetime Data Anal 16:353-73
Carson, Johnny L; Lu, Tsui-Shan; Brighton, Luisa et al. (2010) Phenotypic and physiologic variability in nasal epithelium cultured from smokers and non-smokers exposed to secondhand tobacco smoke. In Vitro Cell Dev Biol Anim 46:606-12

Showing the most recent 10 out of 25 publications