Integration of comprehensive cancer mutation and expression-associated data for biomarker evaluation and discovery

Mazumder, Raja; Crichton, Daniel

Abstract

Current technologies for cancer genomics research generate petabytes of data that are dispersed across multiple archives in non-standard formats. This dispersal poses major challenges to comprehensive analyses that depend on integrating such data. Two common types of secondary data generated by sequencing-based studies are mutations and gene expression changes associated with the cancer state, inferred by comparing matched tumor and normal samples. Massive collaborations like the Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) are instrumental in generating the sequence data and provide a modicum of standardization through best practices, but their standards are not always consistent between projects. Moreover, proprietary databases like the Catalogue of Somatic Mutations in Cancer (COSMIC) generally store and annotate data in formats optimized for their own database and individual business needs. Integrating mutation and expression data across resources is therefore a massive undertaking, requiring data curation, unification, harmonization, and appropriate annotation before the data can be properly represented at a central location. Additionally, it is difficult to comprehensively collect protein functional sites from databases such as UniProt, RefSeq, and many others and map them to mutation sites, because the underlying sequences can differ between these databases. To address this challenge, the Early Detection Research Network (EDRN) Associate Membership funded the development of BioMuta and BioXpress, cancer-associated mutation and expression databases, respectively, which provide access to unified data from several popular cancer repositories together with functional data from well-known molecular biology resources. Links to BioMuta are available through the EDRN portal and UniProt.
The focus of the proposed project is to provide a custom portal encompassing up-to-date releases of BioMuta and BioXpress, leveraging the existing EDRN framework and data. The portal will extend understanding of the cancer landscape toward the proteomic space and will work synergistically with other Informatics Technology for Cancer Research (ITCR) resources. To supplement these data, we further propose to integrate normal expression data across several species, which can be used to gain a deeper understanding of cancer-associated expression profiles. Text mining will also be applied to retrieve literature evidence for the identified cancer-related mutation and expression profiles to aid in interpreting the findings. We expect that such large-scale integration of cancer data and supporting information will not only benefit cancer research but will also become critical for synthesizing information efficiently and, therefore, for developing the earliest possible detection methods.

Public Health Relevance

The proposed research will both streamline and advance cancer biomarker identification pipelines by making numerous pre-analyzed, cancer-relevant mutation and expression data, mapped to protein functional sites and protein functional information, available in a unified manner through a single user interface.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Research Project--Cooperative Agreements (U01)
Project #: 1U01CA215010-01
Application #: 9296729
Study Section: Special Emphasis Panel (ZCA1)
Program Officer: Abrams, Natalie

Project Start: 2017-05-01
Project End: 2020-04-30
Budget Start: 2017-05-01
Budget End: 2018-04-30
Support Year: 1
Fiscal Year: 2017

Institution

Name: George Washington University
Department: Biochemistry
Type: Schools of Medicine
DUNS #: 043990498

City: Washington
State: DC
Country: United States
Zip Code: 20052

Related projects


NIH 2019 U01 CA	Integration of comprehensive cancer mutation and expression-associated data for biomarker evaluation and discovery Mazumder, Raja; Crichton, Daniel / George Washington University
NIH 2018 U01 CA	Integration of comprehensive cancer mutation and expression-associated data for biomarker evaluation and discovery Mazumder, Raja; Crichton, Daniel / George Washington University
NIH 2017 U01 CA	Integration of comprehensive cancer mutation and expression-associated data for biomarker evaluation and discovery Mazumder, Raja; Crichton, Daniel / George Washington University

Publications

Hu, Yu; Dingerdissen, Hayley; Gupta, Samir et al. (2018) Identification of key differentially expressed MicroRNAs in cancer patients through pan-cancer analysis. Comput Biol Med 103:183-197

Gupta, Samir; Dingerdissen, Hayley; Ross, Karen E et al. (2018) DEXTER: Disease-Expression Relation Extraction from Text. Database (Oxford) 2018:
