Recent advances in high-throughput measurements of critical parameters related to cancer genesis and development have led to a wealth of cancer-related information available in public and private databases. To realize the promise of dramatic advancement in integrative cancer research enabled by this rapidly expanding information, novel informatics tools that allow researchers to efficiently integrate these data are needed. The main objective of this proposal is to enable enhanced understanding and modeling of cancer processes through the development of the Cancer Biology Data Integration (CBDI) System, a collection of caGrid services and grid client applications capable of integrating data and information from disparate sources. The CBDI System aims to exploit the rich semantic metadata and the robust data exchange standards established through the Cancer Biomedical Informatics Grid (caBIG) Initiative, enhancing their use through the provision of a coherent ontological view of caBIG semantics, so that data sources can be queried using a standard semantic query language. The CBDI System contains an Ontology View Generator, which exposes caBIG semantics using the Web Ontology Language (OWL), and a distributed Semantic Query Processor, which implements the semCDI query model to execute RDF-based SPARQL queries against caGrid data services. The CBDI System also enables the integration of local private data collected by investigators and research institutions. An intuitive user interface providing multiple visualization and concept-searching capabilities is used to build queries and view results. In Phase I of the CBDI project, key algorithms and mechanisms of the CBDI System were developed, including the exposure of ontology views and the conversion of queries from SPARQL into caBIG's common query language. In addition, the overall feasibility of the CBDI System was demonstrated with proof-of-concept prototypes of system components.
During Phase II, the complete CBDI System will be implemented and tested with caBIG data services at six research institutions, evaluating its use in a variety of real-world operating conditions and functional scenarios.
The Cancer Biology Data Integration System is a collection of caBIG-compatible services that formulate a coherent ontological view of caBIG semantics so that ontology-based queries can be performed using the SPARQL query language over distributed caBIG-compatible data services.
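The conversion of a SPARQL query into a query against a caBIG-compatible data service can be illustrated with a minimal sketch. It is not the CBDI implementation: the function `triples_to_query` and the sample triples are hypothetical, and the XML element and attribute names (`CQLQuery`, `Target`, `Group`, `Attribute`, `EQUAL_TO`) are only loosely modeled on caBIG's common query language and should be read as assumptions. The sketch handles a single subject variable with one `rdf:type` pattern and literal-valued constraints.

```python
import xml.etree.ElementTree as ET

RDF_TYPE = "rdf:type"


def triples_to_query(triples):
    """Translate triple patterns sharing one subject into a CQL-style XML query.

    `triples` is a list of (subject, predicate, object) tuples, as a SPARQL
    basic graph pattern might be represented after parsing (an assumption).
    """
    subjects = {s for s, _, _ in triples}
    if len(subjects) != 1:
        raise ValueError("this sketch handles exactly one subject variable")

    # The rdf:type pattern selects the target class of the data service query.
    types = [o for _, p, o in triples if p == RDF_TYPE]
    if len(types) != 1:
        raise ValueError("exactly one rdf:type pattern is required")

    # Every other pattern becomes an equality constraint on an attribute.
    filters = [(p, o) for _, p, o in triples if p != RDF_TYPE]

    query = ET.Element("CQLQuery")
    target = ET.SubElement(query, "Target", name=types[0])

    # Several constraints are combined conjunctively under one group element.
    parent = target
    if len(filters) > 1:
        parent = ET.SubElement(target, "Group", logicRelation="AND")
    for predicate, value in filters:
        ET.SubElement(parent, "Attribute",
                      name=predicate, value=value, predicate="EQUAL_TO")
    return query


if __name__ == "__main__":
    # Corresponds roughly to:
    #   SELECT ?g WHERE { ?g rdf:type :Gene ; :symbol "BRCA1" ; :taxon "human" }
    pattern = [
        ("?g", RDF_TYPE, "Gene"),
        ("?g", "symbol", "BRCA1"),
        ("?g", "taxon", "human"),
    ]
    print(ET.tostring(triples_to_query(pattern), encoding="unicode"))
```

A full query processor would also need to handle joins across classes, optional patterns, and disjunction, none of which this sketch attempts.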
Shironoshita, E Patrick; Jean-Mary, Yves R; Bradley, Ray M et al. (2009) semQA: SPARQL with Idempotent Disjunction. IEEE Trans Knowl Data Eng 21:401-414