The broad long-term objective of this proposal is to create and evaluate an infrastructure (GeneSeek) for searching across heterogeneous source databases (genomic and citation databases) to find the information needed to curate an existing database of clinical knowledge (GeneClinics).
The Specific Aims are: 1) to use a novel, general-purpose knowledge representation language to capture the schema of an existing database of clinical knowledge (the GeneClinics genetic testing database); 2) to build a shared schema for mediating cross-database queries by extending the GeneClinics schema and incorporating pertinent schema elements from other structured and semi-structured information sources; 3) to create and test interfaces from the shared query mediation schema to the targeted genetic information sources (databases and other structured information); 4) to adapt the existing Tukwila data integration system to implement cross-database query planning, query execution, and query result aggregation over the shared query mediation schema and the multiple structured genetic information sources, thereby creating the GeneSeek data integration system; and 5) to evaluate the Tukwila-based GeneSeek data integration system and the shared data schema for precision and recall in finding information relevant to the curation of a clinical database (the GeneClinics genetic testing database). The broad health relatedness of the project is that data integration tools are needed to help clinicians apply the ever-growing body of medical information to patient care. These tools are needed by curators of databases of medical knowledge as well as by care providers themselves. Nowhere is the growth in information more apparent than in the Human Genome Project, hence the choice of genetics as the domain in which to test this data integration system. The specific genetics database whose curation will be used to evaluate the GeneSeek system is the GeneClinics database. If successful, these data integration systems could be applied more broadly to other domains in biomedicine.
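The precision and recall named in Aim 5 can be illustrated with a minimal sketch. The record identifiers and the function name are hypothetical, and the proposal does not state how relevance judgments would be collected; this only shows how the two measures are computed once a curator has marked which records are relevant.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for retrieved records versus curator-judged relevant ones."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # records that were both returned and judged relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical record identifiers, for illustration only.
retrieved = {"rec1", "rec2", "rec3", "rec4"}  # what the integration system returned
relevant = {"rec2", "rec3", "rec5"}           # what a curator judged relevant

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Precision falls when the system returns records the curator does not need; recall falls when it misses records the curator does need.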
The research design is to apply recent developments in data integration from the artificial intelligence and database areas of computer science to a real-world clinical genetics data integration problem, in order to evaluate the applicability of this system to biomedical information retrieval tasks. The methods are to expand an existing collaboration between the current GeneClinics content and informatics teams and investigators in the Department of Computer Science to: 1) enhance the Tukwila data integration architecture and its related CARIN knowledge representation language, and 2) use these tools and the existing GeneClinics data model to implement and evaluate this data integration system in the specific domain of medical genetics.
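The idea of a shared schema that mediates queries across sources can be sketched as follows. This is a toy illustration under stated assumptions, not Tukwila or CARIN: the `Source` class, the `mediated_query` function, the attribute names, and the in-memory rows are all hypothetical stand-ins for remote genomic and citation databases.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """A hypothetical information source with its own field names."""
    name: str
    mapping: dict  # shared-schema attribute -> this source's field name
    rows: list     # in-memory stand-in for a remote database

    def query(self, attr, value):
        """Translate a shared-schema query into this source's terms and run it."""
        src_field = self.mapping.get(attr)
        if src_field is None:
            return []  # this source cannot answer queries on attr
        results = []
        for row in self.rows:
            if row.get(src_field) == value:
                # Translate the matching row back into shared-schema attributes.
                shared = {a: row[f] for a, f in self.mapping.items() if f in row}
                shared["source"] = self.name
                results.append(shared)
        return results


def mediated_query(sources, attr, value):
    """Send one shared-schema query to every source and aggregate the results."""
    results = []
    for source in sources:
        results.extend(source.query(attr, value))
    return results


# Illustrative placeholder data only.
genomic = Source(
    name="genomic_source",
    mapping={"gene": "symbol", "location": "locus"},
    rows=[{"symbol": "GENE_A", "locus": "locus_1"},
          {"symbol": "GENE_B", "locus": "locus_2"}],
)
citations = Source(
    name="citation_source",
    mapping={"gene": "gene_mentioned", "title": "article_title"},
    rows=[{"gene_mentioned": "GENE_A", "article_title": "Example article 1"}],
)

for record in mediated_query([genomic, citations], "gene", "GENE_A"):
    print(record)
```

A real system would also need query planning across sources with different capabilities, which this sketch leaves out; it only shows the translation between a shared schema and source-specific fields, followed by result aggregation.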

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
1R01HG002288-01
Application #
6031661
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Brooks, Lisa
Project Start
2000-08-15
Project End
2003-07-31
Budget Start
2000-08-15
Budget End
2001-07-31
Support Year
1
Fiscal Year
2000
Total Cost
$354,198
Indirect Cost
Name
University of Washington
Department
Pediatrics
Type
Schools of Medicine
DUNS #
135646524
City
Seattle
State
WA
Country
United States
Zip Code
98195
Cadag, Eithon; Tarczy-Hornoch, Peter; Myler, Peter J (2012) Learning virulent proteins from integrated query networks. BMC Bioinformatics 13:321
Sarkar, Indra Neil; Butte, Atul J; Lussier, Yves A et al. (2011) Translational bioinformatics: linking knowledge across biological and clinical realms. J Am Med Inform Assoc 18:354-7
Shen, Terry H; Tarczy-Hornoch, Peter; Detwiler, Landon T et al. (2010) Evaluation of probabilistic and logical inference for a SNP annotation system. J Biomed Inform 43:407-18
Lacson, Ronilda; Pitzer, Erik; Hinske, Christian et al. (2009) Evaluation of a large-scale biomedical data annotation initiative. BMC Bioinformatics 10 Suppl 9:S10
Shen, Terry H; Carlson, Christopher S; Tarczy-Hornoch, Peter (2009) Evaluating the accuracy of a functional SNP annotation system. BMC Bioinformatics 10 Suppl 9:S11
Shen, Terry H; Carlson, Christopher S; Tarczy-Hornoch, Peter (2009) SNPit: a federated data integration system for the purpose of functional SNP annotation. Comput Methods Programs Biomed 95:181-9
Louie, Brenton; Tarczy-Hornoch, Peter; Higdon, Roger et al. (2008) Validating annotations for uncharacterized proteins in Shewanella oneidensis. OMICS 12:211-5
(2007) Bio*Medical Informatics and Genomic Medicine: Research and Training. J Biomed Inform 40:1-4
Cadag, Eithon; Louie, Brent; Myler, Peter J et al. (2007) Biomediator data integration and inference for functional annotation of anonymous sequences. Pac Symp Biocomput :343-54
Anderson, Nicholas R; Lee, E Sally; Brockenbrough, J Scott et al. (2007) Issues in biomedical research data management and analysis: needs and barriers. J Am Med Inform Assoc 14:478-88

Showing the most recent 10 out of 20 publications