The Data Compilation Core (Core B) will develop and maintain a central resource of analysis-ready, annotated and documented data sets from clinical trials and related studies for use by the program's investigators. These data sets will be used to evaluate the methods developed in this program and to demonstrate the software developed in the Computational Resource Core (Core C). The primary source of data will be the clinical trials and related studies of the Cancer and Leukemia Group B (CALGB), one of the major NCI-sponsored cancer cooperative groups. Data from cancer research studies conducted at two large NCI-designated Comprehensive Cancer Centers (the Lineberger Comprehensive Cancer Center at UNC and the Duke Comprehensive Cancer Center) will also be used. This is a major advantage for the program: the data sets can be exceptionally well annotated and documented because the clinical and statistical scientists who designed and analyzed the original studies will be directly involved.

Public Health Relevance

A major disadvantage of using public data sets is that investigators often cannot properly interpret the clinical and molecular data, because the data are released without adequate documentation. Indeed, a thorough statistical analysis of clinical trial data is not possible without understanding the study design, the specifics of the data collection process, the history of the study, and the relevant medical issues. This core will address these problems by providing analysis-ready data sets with extensive annotation and documentation.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Program Projects (P01)
Study Section: Special Emphasis Panel (ZCA1-RPRB-7)
University of North Carolina Chapel Hill
Chapel Hill
United States