This application for an NIH Mentored Quantitative Research Career Development Award requests support for Dr. Kei-Hoi Cheung as he embarks on a faculty career focused on genome-related bioinformatics. It presents a research career development plan in bioinformatics, a field that bridges computer science and biology. The plan includes two partially overlapping phases: (1) a didactic phase that emphasizes training, including coursework and laboratory work in genetics and genomics to complement Dr. Cheung's doctoral training in Computer Science, and (2) a development phase that focuses on intensive development of the proposed research. Both phases will be closely supervised by a steering committee of senior scientists in biology and bioinformatics, who will serve as mentors or advisors.

The Human Genome Project and rapid advances in genomic technology (e.g., microarrays) have produced numerous local, national, and international genome databases, many of which are Web-accessible. To answer questions that arise in advanced genome research projects, researchers often need to analyze large amounts of data collected from multiple related databases. It is therefore important to explore (1) how to integrate the databases involved in a flexible and useful fashion and (2) how to perform large-scale data analyses as easily and rapidly as possible. To this end, we propose two complementary approaches.

1. The problem of data integration or interoperation is difficult because of the syntactic and semantic heterogeneities involved. To address this problem, we propose a metadata-driven approach using eXtensible Markup Language (XML), which incorporates standardized vocabulary to map heterogeneous Web-accessible data sets into a common format that facilitates interoperability.

2. To facilitate and speed up analysis of large quantities of data, we will also explore a range of computational techniques, including the use of Turbogenomics, which represents a collaboration with the high-performance computing group within the Yale Department of Computer Science. These techniques allow (i) heterogeneous software components (analysis tools) to be integrated easily and (ii) the power of parallel computing to be exploited.

We will design, develop, test, and evaluate these approaches in the context of current database projects, including (1) TRIPLES, which manages data for large-scale yeast genome analysis (with Prof. Snyder), and (2) ALFRED, which stores gene frequency data on different human populations (with Prof. Kidd). We have identified a number of related external Web-accessible databases and tools that users would like to access from TRIPLES and ALFRED in an integrated fashion. We will initially develop and apply our approach to integrate these databases and tools. We will then extend it to other types of genomic data, such as microarray data, which both laboratories and others will soon be generating in large quantities.
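
As a rough illustration of the metadata-driven XML mapping described in approach 1, the sketch below converts records from two sources with different field names into one shared XML layout. It is only a minimal sketch under stated assumptions: the source names, field names, common vocabulary, and sample records are all hypothetical and are not taken from TRIPLES, ALFRED, or any database named in the application.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata: for each source, map its local field names
# onto a shared vocabulary. None of these names come from a real database.
FIELD_MAPS = {
    "source_a": {"orf": "gene_id", "descr": "description"},
    "source_b": {"GeneName": "gene_id", "Summary": "description"},
}


def to_common_xml(source, record):
    """Convert one record (a dict) from a named source into the common XML layout."""
    mapping = FIELD_MAPS[source]
    entry = ET.Element("entry", {"source": source})
    for local_name, value in record.items():
        common_name = mapping.get(local_name)
        if common_name is None:
            continue  # field has no counterpart in the shared vocabulary
        ET.SubElement(entry, common_name).text = str(value)
    return entry


def integrate(records_by_source):
    """Merge records from several sources under a single <dataset> root."""
    root = ET.Element("dataset")
    for source, records in records_by_source.items():
        for record in records:
            root.append(to_common_xml(source, record))
    return root


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    merged = integrate({
        "source_a": [{"orf": "GENE0001", "descr": "sample record", "internal_flag": 1}],
        "source_b": [{"GeneName": "GENE0001", "Summary": "another sample record"}],
    })
    print(ET.tostring(merged, encoding="unicode"))
```

Because the mapping lives in data rather than in code, supporting an additional source in this sketch means adding one more entry to `FIELD_MAPS` instead of writing a new converter.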

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Mentored Quantitative Research Career Development Award (K25)
Project #
5K25HG002378-03
Application #
6649804
Study Section
Ethical, Legal, Social Implications Review Committee (GNOM)
Program Officer
Good, Peter J
Project Start
2001-09-01
Project End
2006-08-31
Budget Start
2003-09-01
Budget End
2004-08-31
Support Year
3
Fiscal Year
2003
Total Cost
$141,992
Indirect Cost
Name
Yale University
Department
Anesthesiology
Type
Schools of Medicine
DUNS #
043207562
City
New Haven
State
CT
Country
United States
Zip Code
06520
Crasto, Chiquito J; Marenco, Luis N; Liu, Nian et al. (2007) SenseLab: new developments in disseminating neuroscience information. Brief Bioinform 8:150-62
Lam, Hugo Y K; Marenco, Luis; Clark, Tim et al. (2007) AlzPharm: integration of neurodegeneration data using RDF. BMC Bioinformatics 8 Suppl 3:S4
Smith, Andrew K; Cheung, Kei-Hoi; Yip, Kevin Y et al. (2007) LinkHub: a Semantic Web system that facilitates cross-database queries and information retrieval in proteomics. BMC Bioinformatics 8 Suppl 3:S5
Yip, Kevin Y; Qi, Peishen; Schultz, Martin et al. (2006) SemBiosphere: a semantic web approach to recommending microarray clustering services. Pac Symp Biocomput :188-99
Lam, Hugo Y K; Marenco, Luis; Shepherd, Gordon M et al. (2006) Using web ontology language to integrate heterogeneous databases in the neurosciences. AMIA Annu Symp Proc :464-8
Carriero, Nicholas; Osier, Michael V; Cheung, Kei-Hoi et al. (2005) A high productivity/low maintenance approach to high-performance computation for biomedicine: four case studies. J Am Med Inform Assoc 12:90-8
Cheung, Kei-Hoi; Yip, Kevin Y; Smith, Andrew et al. (2005) YeastHub: a semantic web use case for integrating data in the life sciences domain. Bioinformatics 21 Suppl 1:i85-96
Osier, Michael V; Zhao, Hongyu; Cheung, Kei-Hoi (2004) Handling multiple testing while interpreting microarrays with the Gene Ontology Database. BMC Bioinformatics 5:124
de Knikker, Remko; Guo, Youjun; Li, Jin-Long et al. (2004) A web services choreography scenario for interoperating bioinformatics applications. BMC Bioinformatics 5:25
Cheung, Kei-Hoi; de Knikker, Remko; Guo, Youjun et al. (2004) Biosphere: the interoperation of web services in microarray cluster analysis. Appl Bioinformatics 3:253-6

Showing the most recent 10 out of 14 publications