Methods for generalized ontology terms enrichment analysis

Shah, Nigam

Abstract

The analysis of high-throughput data such as gene-expression assays usually results in a long list of significant genes. One commonly used method to gain insight into the biological significance of alterations in gene expression levels is to determine whether the Gene Ontology (GO) terms about specific biological processes, molecular functions, or cellular components are over- or under-represented in the annotations of the gene sets generated as the output of the statistical analysis. This analysis method, often referred to as enrichment analysis, can be used to summarize and profile a gene set, as well as other genome-scale data. While the GO has been the principal focus for enrichment analysis, we can carry out the same sort of profiling using any ontology available in the biomedical domain. We can perform enrichment analysis using disease ontologies - such as SNOMED-CT. For example, by annotating known protein mutations with disease terms, Mort et al. identified a class of diseases - blood coagulation disorders - that are associated with a significant depletion in substitutions at O-linked glycosylation sites. We can apply the enrichment analysis methodology to other datasets of interest - such as patient cohorts. For example, enrichment analysis might detect specific co-morbidities that have an increased incidence in rheumatoid arthritis patients - a topic of recent discussion in the literature and considered essential to providing high-quality care. We can also ask translational questions; for example, by identifying other disease associations for the genes involved in a certain disease of interest, we can gain insight into how the causation of seemingly unrelated diseases might be related, e.g., Werner's syndrome, Cockayne syndrome, Burkitt's lymphoma, and Rothmund-Thomson syndrome are all related by the fact that they share the same underlying gene related to aging. Despite widespread adoption, GO-based enrichment analysis has intrinsic drawbacks.
Our goal is to develop and apply general enrichment analysis methods - that can use any biomedical ontology - to profile diverse datasets, such as patient cohorts from electronic medical records and sets of genes deemed significant in genomic analyses. We propose to address some of the key shortcomings of the current enrichment-analysis methods, to expand significantly the ontologies that are used for such analyses, and to apply enrichment analysis on novel data sources for asking translational questions. The hypothesis spanning all our aims is that if we are successful, enrichment analysis - a widely used analysis approach by bioinformatics scientists - will be possible with more than just the GO and the method will be extended to ask clinical questions.
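
As an illustration of the over- and under-representation idea described in the abstract, the following is a minimal sketch, not the project's own method. It assumes a one-sided hypergeometric test, which is one common choice for this kind of analysis, and it uses hypothetical names throughout (`enrichment`, `prob_at_least`, `prob_at_most`, `term_A`, and the item identifiers `g0` to `g19`). Items could equally stand for genes or for patients in a cohort, and terms for entries in any ontology.

```python
from math import comb


def prob_at_least(k, population, annotated, drawn):
    """P(X >= k) for a hypergeometric X: over-representation p-value."""
    upper = min(annotated, drawn)
    hits = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, drawn)


def prob_at_most(k, population, annotated, drawn):
    """P(X <= k) for a hypergeometric X: under-representation p-value."""
    upper = min(k, annotated, drawn)
    hits = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(0, upper + 1)
    )
    return hits / comb(population, drawn)


def enrichment(annotations, study_set, background):
    """Return {term: (overlap, p_over, p_under)} for each ontology term.

    annotations: dict mapping a term to the set of items annotated with it
    study_set:   items of interest (e.g. significant genes, a patient cohort)
    background:  all items that could have been selected
    """
    background = set(background)
    study = set(study_set) & background
    results = {}
    for term, items in annotations.items():
        annotated = set(items) & background
        overlap = len(annotated & study)
        sizes = (len(background), len(annotated), len(study))
        results[term] = (
            overlap,
            prob_at_least(overlap, *sizes),
            prob_at_most(overlap, *sizes),
        )
    return results


# Toy data (entirely made up): 20 background items, 5 annotated with term_A,
# and a study set of 5 items that contains 4 of the annotated ones.
background = {f"g{i}" for i in range(20)}
annotations = {"term_A": {"g0", "g1", "g2", "g3", "g4"}}
study_set = {"g0", "g1", "g2", "g3", "g10"}

for term, (overlap, p_over, p_under) in enrichment(
    annotations, study_set, background
).items():
    print(term, overlap, p_over, p_under)
```

In practice many terms are tested at once, so the resulting p-values would normally be adjusted for multiple testing before any term is reported as enriched or depleted.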

Public Health Relevance

If we are successful, enrichment analysis - a widely used analysis approach by bioinformatics scientists - will be possible with more than just the GO and the method will be extended to ask clinical questions. Our work is significant because we will extend the scope of enrichment analysis to the clinical domain. To the best of our knowledge, our work will be the first to analyze a large corpus of millions of free-text clinical notes with 'omics' inspired, ontology-based methods to profile off-label usage and their associated safety profiles.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM011369-03
Application #: 8909186
Study Section: Special Emphasis Panel (ZLM1)
Program Officer: Ye, Jane

Project Start: 2013-09-01
Project End: 2016-08-31
Budget Start: 2015-09-01
Budget End: 2016-08-31
Support Year: 3
Fiscal Year: 2015
Total Cost
Indirect Cost

Institution

Name: Stanford University
Department: Internal Medicine/Medicine
Type: Schools of Medicine
DUNS #: 009214214

City: Stanford
State: CA
Country: United States
Zip Code: 94304

Related projects


NIH 2020 R01 LM	From enrichment to insights Shah, Nigam / Stanford University
NIH 2019 R01 LM	From enrichment to insights Shah, Nigam / Stanford University
NIH 2018 R01 LM	From enrichment to insights Shah, Nigam / Stanford University
NIH 2017 R01 LM	From enrichment to insights Shah, Nigam / Stanford University
NIH 2016 R01 LM	Methods for generalized ontology terms enrichment analysis Shah, Nigam / Stanford University
NIH 2015 R01 LM	Methods for generalized ontology terms enrichment analysis Shah, Nigam / Stanford University
NIH 2014 R01 LM	Methods for generalized ontology terms enrichment analysis Shah, Nigam / Stanford University
NIH 2013 R01 LM	Methods for generalized ontology terms enrichment analysis Shah, Nigam / Stanford University	$477,224

Publications

Callahan, Alison; Winnenburg, Rainer; Shah, Nigam H (2018) U-Index, a dataset and an impact metric for informatics tools and databases. Sci Data 5:180043

Coulet, Adrien; Shah, Nigam H; Wack, Maxime et al. (2018) Predicting the need for a reduced drug dose, at first prescription. Sci Rep 8:15558

Wang, Liwei; Rastegar-Mojarad, Majid; Ji, Zhiliang et al. (2018) Detecting Pharmacovigilance Signals Combining Electronic Medical Records With Spontaneous Reports: A Case Study of Conventional Disease-Modifying Antirheumatic Drugs for Rheumatoid Arthritis. Front Pharmacol 9:875

Ravikumar, K E; Rastegar-Mojarad, Majid; Liu, Hongfang (2017) BELMiner: adapting a rule-based relation extraction system to extract biological expression language statements from bio-medical literature evidence sentences. Database (Oxford) 2017:

Agarwal, Vibhu; Shah, Nigam H (2017) Learning attributes of disease progression from trajectories of sparse lab values. Pac Symp Biocomput 22:184-194

Oellrich, Anika; Collier, Nigel; Groza, Tudor et al. (2016) The digital revolution in phenotyping. Brief Bioinform 17:819-30

Li, Dingcheng; Wang, Zhen; Wang, Liwei et al. (2016) A Text-Mining Framework for Supporting Systematic Reviews. Am J Inf Manag 1:1-9

Agarwal, Vibhu; Podchiyska, Tanya; Banda, Juan M et al. (2016) Learning statistical models of phenotypes using noisy labeled training data. J Am Med Inform Assoc 23:1166-1173

Hripcsak, George; Ryan, Patrick B; Duke, Jon D et al. (2016) Characterizing treatment pathways at scale using the OHDSI network. Proc Natl Acad Sci U S A 113:7329-36

Rastegar-Mojarad, Majid; Komandur Elayavilli, Ravikumar; Liu, Hongfang (2016) BELTracker: evidence sentence retrieval for BEL statements. Database (Oxford) 2016:

Showing the most recent 10 out of 36 publications
