Information Integration of Heterogeneous Data Sources

Kabuka, Mansur

Abstract

The wealth of biological and biomedical data constantly being generated promises dramatic advancement in the life sciences. To realize this promise, this rapidly expanding pool of information needs to be efficiently integrated, that is, combined in such a way that it can be queried to extract relevant data that can subsequently be analyzed to answer meaningful research questions. The main objective of this proposal is to develop the GeneTegra System, an information integration solution that provides a common interaction environment to query data and knowledge from multiple sources. Two main obstacles have to be overcome in order to attain an effective integration of knowledge from different data sources: syntactic heterogeneity, where data sources have different representation and access mechanisms; and semantic variability, where similar lexical terms may refer to multiple concepts or dissimilar terms may refer to the same concept. The GeneTegra System addresses these obstacles through the use of Semantic Web technologies: ontologies constructed using the Web Ontology Language (OWL) as a common data and knowledge representation for data sources of diverse formats, automated mechanisms for the generation and maintenance of these ontology representations, and a robust system architecture based on reusable, service-oriented mediators. The core of the proposed system consists of general algorithms, procedures, and mechanisms developed during Phase I of this project that enable the automatic generation of ontologies, the automated identification of semantic correspondences between ontology models, and the creation and execution of queries over these ontology-modeled, distributed, heterogeneous sources.
In Phase II, the GeneTegra System will be developed, implemented, and tested as a human-centered solution that builds on the core components developed during Phase I. It will incorporate a highly usable interface for query creation and execution; a mechanism for registration, sharing, and re-use of information using Web Services standards; a mechanism for determining quality of data and query reliability; and a security and privacy subsystem that allows the construction of collaborative communities while ensuring that users are properly authenticated and authorized to access information through the system. The GeneTegra System will be designed and evaluated specifically to address the integration of sources relevant to investigations of genotype-phenotype associations and to the identification of genes responsible for human diseases and conditions.
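
As a rough illustration of the two obstacles named in the abstract, the sketch below queries two toy sources that differ in format (CSV versus JSON) and in vocabulary (different field names for the same concepts) through one shared vocabulary. It is not the GeneTegra implementation: the source contents, the field names, the shared concept names `gene` and `phenotype`, and the functions `to_common` and `query` are all assumptions made for this example, and the hand-written `MAPPINGS` table merely stands in for the ontology correspondences that the proposal says are identified automatically.

```python
import csv
import io
import json

# Hypothetical source A: CSV text with its own column names.
SOURCE_A = "gene_symbol,disease\nBRCA1,breast cancer\nCFTR,cystic fibrosis\n"

# Hypothetical source B: JSON text using different keys for the same concepts.
SOURCE_B = '[{"symbol": "BRCA1", "phenotype": "ovarian cancer"}]'

# Hand-written correspondences from each source's local terms to shared
# concept names (an assumption; a stand-in for ontology matching).
MAPPINGS = {
    "A": {"gene_symbol": "gene", "disease": "phenotype"},
    "B": {"symbol": "gene", "phenotype": "phenotype"},
}


def read_a():
    """Wrapper for source A: parse CSV rows into dictionaries."""
    return list(csv.DictReader(io.StringIO(SOURCE_A)))


def read_b():
    """Wrapper for source B: parse a JSON array into dictionaries."""
    return json.loads(SOURCE_B)


def to_common(source_id, record):
    """Rename a record's local field names to the shared concept names."""
    mapping = MAPPINGS[source_id]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def query(gene):
    """Return (source, phenotype) pairs for a gene across both sources."""
    results = []
    for source_id, reader in (("A", read_a), ("B", read_b)):
        for record in reader():
            common = to_common(source_id, record)
            if common.get("gene") == gene:
                results.append((source_id, common["phenotype"]))
    return results


print(query("BRCA1"))
```

The per-source reader functions absorb the syntactic differences, while the mapping table absorbs the naming differences, so the query itself is written once against the shared vocabulary.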

Public Health Relevance

The GeneTegra System is an information integration solution that provides a common interaction environment to query data and knowledge from multiple heterogeneous sources. It uses ontologies as the base formalism for semantic and syntactic modeling, and contains automated mechanisms for the generation of these ontologies and for the reuse and sharing of integration configurations. It is specifically designed to address the integrated querying of sources relevant to investigations of genotype-phenotype associations and to the identification of genes responsible for human diseases and conditions.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Center for Research Resources (NCRR)
Type: Small Business Innovation Research Grants (SBIR) - Phase II (R44)
Project #: 5R44RR018667-05
Application #: 7798074
Study Section: Biomedical Computing and Health Informatics Study Section (BCHI)
Program Officer: Brazhnik, Olga

Project Start: 2003-07-01
Project End: 2011-12-31
Budget Start: 2010-04-01
Budget End: 2011-12-31
Support Year: 5
Fiscal Year: 2010
Total Cost: $507,489

Institution

Name: Infotech Soft, Inc.
DUNS #: 035354070

City: Miami
State: FL
Country: United States
Zip Code: 33131

Related projects


NIH 2010 R44 RR	Information Integration of Heterogeneous Data Sources	Kabuka, Mansur R. / Infotech Soft, Inc.	$507,489
NIH 2009 R44 RR	Information Integration of Heterogeneous Data Sources	Kabuka, Mansur R. / Infotech Soft, Inc.	$505,764
NIH 2008 R44 RR	Information Integration of Heterogeneous Data Sources	Kabuka, Mansur R. / Infotech Soft, Inc.	$473,392

Publications

Jean-Mary, Yves R; Shironoshita, E Patrick; Kabuka, Mansur R (2009) Ontology Matching with Semantic Verification. Web Semant 7:235-251

Shironoshita, E Patrick; Jean-Mary, Yves R; Bradley, Ray M et al. (2009) semQA: SPARQL with Idempotent Disjunction. IEEE Trans Knowl Data Eng 21:401-414
