The analysis of high-throughput data such as gene-expression assays usually results in a long list of significant genes. One commonly used method to gain insight into the biological significance of alterations in gene expression levels is to determine whether the Gene Ontology (GO) terms about specific biological processes, molecular functions, or cellular components are over- or under-represented in the annotations of the gene sets generated as the output of the statistical analysis. This analysis method, often referred to as enrichment analysis, can be used to summarize and profile a gene set, as well as other genome-scale data. While the GO has been the principal focus for enrichment analysis, we can carry out the same sort of profiling using any ontology available in the biomedical domain. We can perform enrichment analysis using disease ontologies - such as SNOMED-CT. For example, by annotating known protein mutations with disease terms, Mort et al. identified a class of diseases - blood coagulation disorders - that are associated with a significant depletion in substitutions at O-linked glycosylation sites. We can apply the enrichment analysis methodology to other datasets of interest - such as patient cohorts. For example, enrichment analysis might detect specific co-morbidities that have an increased incidence in rheumatoid arthritis patients - a topic of recent discussion in the literature and considered essential to providing high-quality care. We can also ask translational questions; for example, by identifying other disease associations for the genes involved in a certain disease of interest, we can gain insight into how the causation of seemingly unrelated diseases might be related, e.g., Werner's syndrome, Cockayne syndrome, Burkitt's lymphoma, and Rothmund-Thomson syndrome are all related by the fact that they share the same underlying gene related to aging. Despite widespread adoption, GO-based enrichment analysis has intrinsic drawbacks.
Our goal is to develop and apply general enrichment analysis methods - that can use any biomedical ontology - to profile diverse datasets, such as patient cohorts from electronic medical records and sets of genes deemed significant in genomic analyses. We propose to address some of the key shortcomings of the current enrichment-analysis methods, to expand significantly the ontologies that are used for such analyses, and to apply enrichment analysis to novel data sources for asking translational questions. The hypothesis spanning all our aims is that if we are successful, enrichment analysis - an analysis approach widely used by bioinformatics scientists - will be possible with more than just the GO, and the method will be extended to ask clinical questions.
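
As a rough illustration of the over- and under-representation test described above, the following minimal sketch scores one ontology term against a study set using a one-sided hypergeometric test. The hypergeometric test is a common choice for this kind of analysis, but the abstract does not say which statistic this project uses, so it is an assumption here. The function names and the toy gene identifiers are hypothetical, and a real analysis would also correct for testing many terms at once.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """Probability of drawing exactly k annotated items when n items are
    sampled without replacement from N items, K of which are annotated."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def enrichment_p_values(N, K, n, k):
    """Return (p_over, p_under) for observing k annotated items.

    p_over  = P(X >= k): small values suggest over-representation.
    p_under = P(X <= k): small values suggest under-representation (depletion).
    """
    lowest = max(0, n - (N - K))   # fewest annotated items a sample can hold
    highest = min(K, n)            # most annotated items a sample can hold
    p_over = sum(hypergeom_pmf(i, N, K, n) for i in range(k, highest + 1))
    p_under = sum(hypergeom_pmf(i, N, K, n) for i in range(lowest, k + 1))
    return p_over, p_under


def term_enrichment(study, background, annotated):
    """Score one ontology term from plain sets of identifiers (hypothetical helper)."""
    background = set(background)
    study = set(study) & background          # ignore items outside the background
    annotated = set(annotated) & background
    return enrichment_p_values(
        N=len(background),
        K=len(annotated),
        n=len(study),
        k=len(study & annotated),
    )


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    background = {f"g{i}" for i in range(20)}
    annotated = {"g0", "g1", "g2", "g3", "g4"}   # items carrying the term
    study = {"g0", "g1", "g2", "g3", "g10"}      # e.g. a list of significant genes
    p_over, p_under = term_enrichment(study, background, annotated)
    print(f"p_over={p_over:.6f} p_under={p_under:.6f}")
```

Because the helper only takes sets of identifiers, the same sketch applies unchanged if the items are patients and the annotations are disease terms rather than genes and GO terms.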

Public Health Relevance

If we are successful, enrichment analysis - an analysis approach widely used by bioinformatics scientists - will be possible with more than just the GO, and the method will be extended to ask clinical questions. Our work is significant because we will extend the scope of enrichment analysis to the clinical domain. To the best of our knowledge, our work will be the first to analyze a large corpus of millions of free-text clinical notes with 'omics'-inspired, ontology-based methods to profile off-label usage and the associated safety profiles.

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM011369-03
Application #: 8909186
Study Section: Special Emphasis Panel (ZLM1)
Program Officer: Ye, Jane
Project Start: 2013-09-01
Project End: 2016-08-31
Budget Start: 2015-09-01
Budget End: 2016-08-31
Support Year: 3
Fiscal Year: 2015
Total Cost:
Indirect Cost:
Name: Stanford University
Department: Internal Medicine/Medicine
Type: Schools of Medicine
DUNS #: 009214214
City: Stanford
State: CA
Country: United States
Zip Code: 94304

Callahan, Alison; Winnenburg, Rainer; Shah, Nigam H (2018) U-Index, a dataset and an impact metric for informatics tools and databases. Sci Data 5:180043
Coulet, Adrien; Shah, Nigam H; Wack, Maxime et al. (2018) Predicting the need for a reduced drug dose, at first prescription. Sci Rep 8:15558
Wang, Liwei; Rastegar-Mojarad, Majid; Ji, Zhiliang et al. (2018) Detecting Pharmacovigilance Signals Combining Electronic Medical Records With Spontaneous Reports: A Case Study of Conventional Disease-Modifying Antirheumatic Drugs for Rheumatoid Arthritis. Front Pharmacol 9:875
Ravikumar, K E; Rastegar-Mojarad, Majid; Liu, Hongfang (2017) BELMiner: adapting a rule-based relation extraction system to extract biological expression language statements from bio-medical literature evidence sentences. Database (Oxford) 2017:
Agarwal, Vibhu; Shah, Nigam H (2017) LEARNING ATTRIBUTES OF DISEASE PROGRESSION FROM TRAJECTORIES OF SPARSE LAB VALUES. Pac Symp Biocomput 22:184-194
Oellrich, Anika; Collier, Nigel; Groza, Tudor et al. (2016) The digital revolution in phenotyping. Brief Bioinform 17:819-30
Li, Dingcheng; Wang, Zhen; Wang, Liwei et al. (2016) A Text-Mining Framework for Supporting Systematic Reviews. Am J Inf Manag 1:1-9
Agarwal, Vibhu; Podchiyska, Tanya; Banda, Juan M et al. (2016) Learning statistical models of phenotypes using noisy labeled training data. J Am Med Inform Assoc 23:1166-1173
Hripcsak, George; Ryan, Patrick B; Duke, Jon D et al. (2016) Characterizing treatment pathways at scale using the OHDSI network. Proc Natl Acad Sci U S A 113:7329-36
Rastegar-Mojarad, Majid; Komandur Elayavilli, Ravikumar; Liu, Hongfang (2016) BELTracker: evidence sentence retrieval for BEL statements. Database (Oxford) 2016:
