With the completion of the Human Genome Project, there is a need to translate genome-era discoveries into clinical utility. One difficulty in making bench-to-bedside translations with gene-expression and proteomic data is our current inability to relate these findings with each other and with clinical measurements. A translational researcher studying a particular biological process using microarrays or proteomics will want to gather as many relevant publicly-available data sets as possible, to compare findings. Translational investigators wanting to relate clinical or chemical data with multiple genomic or proteomic measurements will want to find and join related data sets. Unfortunately, finding and joining relevant data sets is particularly challenging today, as the useful annotations of this data are still represented only by unstructured free-text, limiting its secondary use. A question we have sought to answer is whether prior investments in biomedical ontologies can provide leverage in determining the context of genomic data in an automated manner, thereby enabling integration of gene expression and proteomic data and the secondary use of genomic data in multiple fields of research beyond those for which the data sets were originally targeted. The three specific aims to address this question are to (1) develop tools that comprehensively map contextual annotations to the largest biomedical ontology, the Unified Medical Language System (UMLS), built and supported by the National Library of Medicine, validate, and disseminate the mappings, (2) execute a four-pronged strategy to evaluate experiment-concept mappings, and (3) apply experiment-context mappings to find and integrate data within and across microarray and proteomics repositories. To keep these tools relevant to biomedical investigators, we have included three Driving Biological Projects (DBPs), in the domains of breast cancer, organ transplantation, and T-cell biology. To accomplish these DBPs, our tools and mappings will be used to find and join experimental data within and across microarray and proteomic repositories. Having DBPs to address will focus our development on a set of scalable tools that can access and analyze experimental data covering a large variety of diseases. Through our advisory committee of world-renowned NIH-funded investigators, we will ensure that our findings will have broad applicability and are useful to a wide variety of biomedical researchers.
Showing the most recent 10 out of 48 publications