The analysis of high-throughput data such as gene-expression assays usually results in a long list of "significant genes." One commonly used method to gain insight into the biological significance of alterations in gene expression levels is to determine whether the Gene Ontology (GO) terms about specific biological processes, molecular functions, or cellular components are over- or under-represented in the annotations of the gene sets generated as the output of the statistical analysis. This analysis method, often referred to as "enrichment analysis," can be used to summarize and profile a gene set, as well as other genome-scale data. While the GO has been the principal focus for enrichment analysis, we can carry out the same sort of profiling using any ontology available in the biomedical domain. We can perform enrichment analysis using disease ontologies - such as SNOMED-CT. For example, by annotating known protein mutations with disease terms, Mort et al. identified a class of diseases - blood coagulation disorders - that are associated with a significant depletion in substitutions at O-linked glycosylation sites. We can apply the enrichment analysis methodology to other datasets of interest - such as patient cohorts. For example, enrichment analysis might detect specific co-morbidities that have an increased incidence in rheumatoid arthritis patients - a topic of recent discussion in the literature and considered essential to providing high-quality care. We can also ask translational questions; for example, by identifying other disease associations for the genes involved in a certain disease of interest, we can gain insight into how the causation of seemingly unrelated diseases might be related, e.g., Werner's syndrome, Cockayne syndrome, Burkitt's lymphoma, and Rothmund-Thomson syndrome are all related by the fact that they share the same underlying gene related to aging. Despite widespread adoption, GO-based enrichment analysis has intrinsic drawbacks.
Our goal is to develop and apply general enrichment analysis methods - that can use any biomedical ontology - to profile diverse datasets, such as patient cohorts from electronic medical records and sets of genes deemed significant in genomic analyses. We propose to address some of the key shortcomings of the current enrichment-analysis methods, to expand significantly the ontologies that are used for such analyses, and to apply enrichment analysis on novel data sources for asking translational questions. The hypothesis spanning all our aims is that if we are successful, enrichment analysis - a widely used analysis approach by bioinformatics scientists - will be possible with more than just the GO and the method will be extended to ask clinical questions.
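As an illustration of the over-representation test described above, the following is a minimal sketch, not the method proposed in this project. It assumes a one-sided hypergeometric test, which is one common choice for enrichment analysis; the function names (`hypergeom_sf`, `enrichment`) and the toy gene and term identifiers are hypothetical and chosen only for this example.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when N items are drawn without replacement from a
    population of M items, n of which carry the annotation."""
    upper = min(n, N)
    total = comb(M, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / total


def enrichment(gene_set, background, annotations):
    """Return {term: p-value} for over-representation of each ontology
    term in gene_set, relative to the background population."""
    background = set(background)
    gene_set = set(gene_set) & background
    results = {}
    for term, annotated in annotations.items():
        annotated = set(annotated) & background
        k = len(annotated & gene_set)  # annotated items in the set of interest
        results[term] = hypergeom_sf(
            k, len(background), len(annotated), len(gene_set)
        )
    return results


# Toy data (hypothetical identifiers, for illustration only).
background = [f"g{i}" for i in range(10)]
annotations = {
    "term_A": ["g0", "g1", "g2"],  # fully contained in the gene set
    "term_B": ["g7", "g8"],        # absent from the gene set
}
significant = ["g0", "g1", "g2"]

p_values = enrichment(significant, background, annotations)
```

The same function applies unchanged if the "genes" are patients and the terms come from a disease ontology, which is the generalization the abstract argues for. Testing under-representation would use the lower tail instead, and because many terms are tested at once, a multiple-testing correction would normally be applied to the resulting p-values.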

Public Health Relevance

If we are successful, enrichment analysis - a widely used analysis approach by bioinformatics scientists - will be possible with more than just the GO and the method will be extended to ask clinical questions. Our work is significant because we will extend the scope of enrichment analysis to the clinical domain. To the best of our knowledge, our work will be the first to analyze a large corpus of millions of free-text clinical notes with "omics"-inspired, ontology-based methods to profile off-label drug usage and the associated safety profiles.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section: Special Emphasis Panel (ZLM1-ZH-C (01))
Program Officer: Ye, Jane
Stanford University
Internal Medicine/Medicine
Schools of Medicine
United States
Jung, Kenneth; LePendu, Paea; Chen, William S et al. (2014) Automated detection of off-label drug use. PLoS One 9:e89324
Huang, Sandy H; LePendu, Paea; Iyer, Srinivasan V et al. (2014) Toward personalizing treatment for depression: predicting diagnosis and severity. J Am Med Inform Assoc 21:1069-75