The National Cancer Institute (NCI) will deploy an integrating biomedical informatics infrastructure, the cancer Biomedical Informatics Grid (caBIG™), to expedite the cancer research community's access to key bioinformatics platforms. In partnership with the cancer research community, the NCI is creating a common, extensible informatics platform that integrates diverse data types and supports interoperable analytic tools. This platform will allow research groups to tap into the rich collection of emerging cancer research data while supporting their individual investigations.

The Integrative Cancer Research Workspace will provide tools and systems to enable integration and sharing of information among cancer researchers. These tools will facilitate the integration of data not only from different centers but also of different types, including clinical and basic research data, thereby enabling translational and integrative research. The Workspace is tasked to develop a well-documented and validated toolset for use throughout the cancer research community. Workspace activities will include platforms and standards to facilitate the sharing of datasets and repositories, and those appropriate for testing the caBIG™ infrastructure are being asked to participate. A major goal of this Workspace will be a demonstration of how a shared informatics platform can allow a comprehensive, federated grid of information to be made available to the cancer research community.

The main goal of the Integrative Cancer Research Workspace is to assemble data, tools, and infrastructure that facilitate the cross-silo use of cancer biology information to promote integrated cancer research.
Working towards this goal, the NCICB is developing an integrative application framework, known as caIntegrator, designed to facilitate cross-data analysis in support of ongoing cancer research.

2.0 Objectives

This SOW is intended to cover three major projects/initiatives at the NCI Center for Bioinformatics: the Computational Portal and Analysis System (CPAS), Cancer Genetic Markers of Susceptibility (CGEMS), and caIntegrator. The following high-level objectives are intended to be achieved by the work described herein.

CPAS: The Computational Portal and Analysis System is part of the deliverable from the mouse Biomarker Discovery Project (BDI), funded by NCI. Data generated under the BDI contract is to be delivered to NCICB in addition to the CPAS system. NCICB will make the data accessible to the public via an NCBI proteomics portal, based on CPAS. Several enhancements are planned to adapt CPAS from a pipeline-based system to a public repository. Support is also required to:
(a) Verify and ensure caBIG Silver-level compatibility (https://cabig.nci.nih.gov/guidelines_documentation);
(b) Get the mouse BDI data (including mass spectrometry and 2D gel) loaded into NCICB's installation of CPAS;
(c) Load proteomics data from other studies, as these become available;
(d) Fix bugs;
(e) Install and maintain the caBIG Silver-level API when this becomes available.

CGEMS: The Cancer Genetic Markers of Susceptibility project is an effort to identify germ-line single nucleotide polymorphisms (SNPs) that correlate with disease. As currently scoped, the CGEMS project will conduct two genotyping studies: one for prostate cancer and one for breast cancer. For each study, 1200 cases and 1200 controls will be genotyped for 500,000 SNPs. The goal is to identify SNPs that predispose a person to disease.

The CGEMS project is coordinated by the Office of Cancer Genomics (OCG), Division of Cancer Epidemiology and Genetics (DCEG), and the Core Genotyping Facility (CGF).
Initial analysis of the data will be performed by CGF. All data to be made public will be delivered to NCICB. NCICB is responsible for long-term storage and presentation of the data through a publicly accessible portal. This will consist of:
(a) a caBIG object API to the data;
(b) a web interface allowing query by genomic region, significance of association with disease, etc.
NCICB expects to store the data in an instance of the Clinical Genomics Object Model (CGOM), expanded to accommodate data types needed for CGEMS.

caIntegrator: caIntegrator (http://caintegrator.nci.nih.gov/) is a translational informatics platform that allows researchers and bioinformaticians to access and analyze clinical and experimental data from clinical trials and studies. The caIntegrator framework provides a mechanism for integrating and aggregating biomedical research data and provides access to a variety of data types (e.g., immunohistochemistry (IHC), microarray-based gene expression, single nucleotide polymorphisms (SNPs), and clinical trials data) in a cohesive fashion. The caIntegrator knowledge framework provides researchers with the ability to perform ad hoc querying, reporting, and analysis across multiple domains. At the heart of caIntegrator is the Clinical Genomics Object Model (CGOM), which provides programmatic access to the integrated biomedical data collected in the caIntegrator data system. One of the major charters of this project is to build an analytical framework that allows clinicians, scientists, and biostatisticians to conduct translational analysis of study-specific data in a user-friendly manner, for example, predictive analysis of microarrays, Kaplan-Meier/Cox survival analysis, and descriptive statistics.
The following high-level features are proposed for this period of performance:
(a) Extend the Clinical Genomics Object Model (CGOM) to support LOH, genotype, mutation, FISH, protein detection (IHC, cell lysate arrays, TMA), tissue microarray data, DNA methylation data, microRNA data, proteomics data, MRI, and clinical findings;
(b) Develop and extend a caBIG Silver-compatible set of APIs (Query By Example APIs using the caCORE SDK (http://ncicb.nci.nih.gov/NCICB/infrastructure/cacoresdk), and a middle tier for specific high-performance translational queries) to retrieve clinical genomic objects that map to the translational data in the study data marts;
(c) Publish caIntegrator APIs to caGRID 1.0;
(d) Implement a generic, stand-alone Higher Level Analysis API to provide run-time analysis of clinical genomic datasets that uses caBIG services such as GenePattern's middleware, geWorkbench (http://gforge.nci.nih.gov/projects/geworkbench/), and DWD (http://gforge.nci.nih.gov/softwaremap/trove_list.php?form_cat=366) tools;
(e) Publish caIntegrator's High Level Analysis Services to caGRID 1.0 as a caGRID Analytical Service;
(f) Implement an ETL process for caIntegrator using the caAdapter data mapping utility (http://trials dev.nci.nih.gov/projects/infrastructureProject/caAdapter), CTOM (Clinical Trials Object Model), and caArray (http://caarray.nci.nih.gov/) APIs. Explore the use of COTS ETL tools for this purpose;
(g) Continue to implement caIntegrator to support studies with translational datasets: Rembrandt, the I-SPY breast cancer study, CGEMS (http://cgems.cancer.gov/index.asp), TCGA (http://cancergenome.nih.gov/index.asp), and DCEG's EAGLE.
Background on the I-SPY breast cancer study and DCEG's EAGLE study is provided below.
(h) Provide API support for caBIG's caTRIP (http://gforge.nci.nih.gov/projects/catrip/), a distributed translational query platform.

Background on I-SPY breast cancer study: In an effort to validate surrogate markers of response and to establish a basis for tailoring therapy, the American College of Radiology Imaging Network (ACRIN), the Cancer and Leukemia Group B (CALGB), and the Specialized Program of Research Excellence (SPORE) are collaborating in a multi-center trial of serial imaging and biopsy of women with tumors at least 3 cm in size who undergo neoadjuvant chemotherapy, entitled Investigation of Serial Studies to Predict Your Therapeutic Response with Imaging And moLecular analysis (I-SPY TRIAL), CALGB protocols 150007 and 150012 and ACRIN protocol 6657.

The primary objective of the I-SPY Trial is to identify surrogate markers of response to neoadjuvant chemotherapy that are predictive of pathologic remissions and survival in Stage III breast cancer. These markers include clinical size change, change in volume and longest diameter as measured by MRI, and residual disease at time of surgery. Molecular and imaging characteristics are being utilized to identify those patients likely to respond to novel therapeutic agents, which could then be tested in the neoadjuvant setting.
Secondary objectives include: 1) the identification of molecular markers and/or MRI results that predict 3-year disease-free survival in patients with Stage III breast cancer, 2) the correlation of molecular markers with specific imaging patterns seen on MRI, and 3) the prediction of response with MRI results and marker data from cell cycle checkpoints, proliferation, angiogenesis, hormone receptors, and molecular profiles.

In addition to the identification of predictors of response and ascertainment of potential therapeutic alternatives, a principal goal of the analysis involves continual quality control throughout the course of the trial. Regular assessment is required of the sufficiency of the tissue cores, the quality and quantity of DNA and RNA, and IHC quality. Finally, the analysis includes cross-platform validation (Her2 by IHC, FISH, expression array, CGH arrays, proteomics, and serum) and assay validation (p53 conformal mutation analysis vs. p53 IHC).

The trial consists of eight cancer centers across the country: the University of California, San Francisco (UCSF), which acts as the main coordinating site, Georgetown University, Memorial Sloan Kettering (MSKCC), University of Alabama, University of North Carolina at Chapel Hill (UNC), University of Pennsylvania (Penn), University of Texas Southwestern, and University of Washington. There are plans to add an additional site, University of Chicago, in Winter 2005. In addition to patient accrual, study institutions are responsible for: tissue processing and marker core (UNC); cyclin assays and MRI pathology coordination (Penn); clinical study design and patient accrual (MSKCC); serum markers (UCSF); MRI imaging and marker correlation (UCSF); and genomic and expression markers (UCSF and UNC).
Additionally, UCSF is responsible for oversight of the conduct of the trial, integration of the entire series of nested studies (ACRIN imaging, CALGB clinical and correlative science, and InterSPORE studies), and implementation and beta-testing of web-based programs for data management, analysis, and communication (NCICB website and database design) that will be utilized in this trial and future correlative science studies.

One of the charters for caIntegrator is to develop an n-tiered application to facilitate integration between diverse data types from the I-SPY study.

Background on DCEG/EAGLE lung cancer study: NCI's Division of Cancer Epidemiology and Genetics has designed and conducted EAGLE (Environmental and Genetic Lung Cancer Etiology), an interdisciplinary multi-center case-control study of lung cancer situated in Milan, Italy, designed to explore the genetic determinants of both lung cancer and smoking. The study has enrolled 2,000 incident lung cancer cases, including both males and females of Italian nationality, ages 35 to 79 years, with verified lung cancer of any histologic type. In addition, 2,000 healthy controls, randomly selected from the catchment area and matched to cases by age, gender, and residence, are enrolled. Extensive epidemiologic data, clinical data, blood, and lung tissue paraffin blocks have been collected.
From 500 surgical cases, multiple fresh normal lung tissue and tumor samples have also been obtained.

Objectives of EAGLE: The major objectives for this study are: genetic profiling of all individuals by 15 STR markers; analysis of gene expression in adenocarcinoma lung cancer tissue of smokers and non-smokers; study of the histologic characteristics of lung cancer in relation to genotype, gene expression, somatic mutations, and smoking; a planned analysis of therapy efficacy and survival of lung cancer patients; and identification of lung cancer-affected siblings of cases and the unaffected siblings in the same sibships. The goal is to be able to perform integrative analyses of the above-mentioned datasets in the context of the epidemiological data from the study.

The study will generate data on 4,000 participants that will include epidemiologic, clinical, and genomic data. The genomic data will include SNP array, microarray, and proteomic data.

One of the charters for caIntegrator is to develop an n-tiered application to facilitate integration between diverse data types from the EAGLE study.

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
NIH Inter-Agency Agreements (Y01)
Project #
Y1CO8108-1-0-1
Application #
7713589
Study Section
Project Start
Project End
Budget Start
Budget End
Support Year
Fiscal Year
2008
Total Cost
$140,055
Indirect Cost
Name
National Cancer Institute
Department
Type
DUNS #
City
State
Country
United States
Zip Code