The Childhood Cancer Data Initiative (CCDI) 2019 symposium hosted by the National Cancer Institute (NCI) identified "a critical need to collect, analyze, and share data to address the burden of cancer in children, adolescents and young adults." Currently, cancer registries in the U.S. hold structured information on every cancer case, including pediatric cancers, within their respective catchment areas. For childhood cancer patients and survivors, late effects, recurrence, subsequent malignant neoplasms (SMN), and follow-up are critically important to consider and study further. It is also important to account for survivors' life trajectories, which often include moving to different states, away from where their data were initially collected.
The aim of the National Childhood Cancer Registry (NCCR) is to build a connected data infrastructure that includes longitudinal data from multiple sources and enables secure sharing of childhood cancer data with vetted research investigators. These efforts will support childhood cancer research and provide a population-level dataset on all childhood cancer patients. As a base for the infrastructure, data currently collected at targeted cancer registries, including those in the Surveillance, Epidemiology, and End Results (SEER) program, will be used to aggregate key information on childhood cancer patients and survivors. In addition to registry data, NCCR aims to incorporate unique data sources, including the National Death Index (NDI), state vital records, LexisNexis (residential history and social determinants of health data), Virtual Pooled Registry (VPR) data on subsequent primaries, and pharmacy and radiation oncology data. Additional categories of information that may be integrated into the database include detailed diagnostic characterization of the tumor, treatment information, indicators of tumor recurrence, identification of multiple primary cancers, and genomic characterization of initial and potentially recurrent disease.

The Department of Energy's (DOE) Oak Ridge National Laboratory (ORNL) is the largest multipurpose science laboratory in the DOE national laboratory system. The mission of ORNL is to deliver scientific discoveries and technical breakthroughs that will accelerate the development and deployment of solutions to meet pressing global challenges aligned with DOE's goals. This "science-to-solutions" mission depends on the integration and application of distinctive capabilities in basic and applied research, including leadership in high-performance computing (HPC), advanced visualization and data fusion, computational science, and systems engineering and integration.
ORNL is recognized as a leader in health and data sciences research and development. Within the DOE national laboratory system, ORNL has a highly capable and proven data and computing enclave certified to host and analyze protected health information (PHI). ORNL's certified PHI systems, its HPC capabilities and facilities, and its technical expertise, proven in a recent multi-year project with the Centers for Medicare and Medicaid Services (CMS) and the Department of Veterans Affairs (VA) and in its broader engagement with the National Institutes of Health (NIH), are unmatched in the domestic private sector, making ORNL uniquely qualified to perform the work required for this effort.

With the growing complexity of cancer diagnosis and treatment, the national cancer surveillance program faces increasing challenges in capturing the information needed to understand the effectiveness of cancer treatments in the context of our complex medical and social environment. The capacity to collect information about cancer patients automatically and comprehensively would enhance the ability of cancer registry data to support a broad variety of cancer research. It would also provide an infrastructure for evaluating how well cancer diagnostics and therapies generalize beyond the clinical trial setting to the 97% of the general population not covered by clinical trials. These advances will be important for future healthcare applications beyond cancer, and the knowledge gained will inform technology for future HPC architectures and help make HPC more accessible to biomedical research. The current DOE collaboration with NIH/NCI, known as the Joint Design of Advanced Computing Solutions for Cancer, has produced several useful tools and accompanying documentation that have been used by the cancer surveillance community for adult cancer populations.
ORNL has developed deep learning models and data analytics methods that are of particular interest to the NCCR. ORNL's expertise in building tools that draw on multiple data sources can be used to support a variety of data queries, assess the feasibility of developing trials, and enable population-level comparisons of key characteristics between all childhood cancer patients and patients enrolled in clinical trials (e.g., Children's Oncology Group [COG] trials). Furthermore, ORNL's expertise and computing capabilities will enable NCI to refine existing data extraction algorithms and scale them to obtain information from the text of pediatric cancer abstracts that would otherwise be difficult to capture. This infrastructure will serve as the central data index and warehouse for childhood cancer data through large-scale data identification and linkage, and it has the potential to provide real-world evidence to support data-driven clinical guidelines. Combined with this data infrastructure, the resulting tools will also support a broad range of research questions. Through follow-up information shared across registries and other linked data sources, annual follow-up data for childhood cancer survivors will be greatly enhanced, benefiting pediatric oncology research. The purpose of this agreement is to provide funding to (1) support the development and refinement of advanced computational tools (e.g., artificial intelligence, knowledge graphs, graph analytics, text extraction algorithms, predictive modeling, and other advanced computational methodologies) using key data from the NCCR infrastructure and various external data sources, and (2) use NCCR to further refine existing tools developed by DOE/ORNL.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
NIH Inter-Agency Agreements (Y01)