Community Platform for Data Wrangling of Gene and Genetic Variant Annotations

Wu, Chunlei

Abstract

zed and structured in the form of annotations of biological entities such as genes, genetic variants, diseases, and pathways. These annotations are fragmented across dozens of data repositories like NCBI Entrez, Ensembl, UniProt, and hundreds (or more) of other specialized databases. While the volume and breadth of annotations are valuable, their fragmentation across many data silos is often frustrating and inefficient. Bioinformaticians everywhere must continuously and repetitively engage in data wrangling to integrate knowledge from all these resources comprehensively, and these uncoordinated efforts represent an enormous duplication of work. The problem of fragmentation is exacerbated (perhaps even fundamentally caused) by the inability of data providers to efficiently contribute to existing repositories. As a result, annotation providers must create new resources to host newly generated annotations that are unavailable in the central repositories. In this proposal, we will create a hybrid solution that combines the high performance of a centralized system with the flexibility and breadth of a federated system. The centralized component will provide high-performance computational infrastructure for the integration, query, and access of biological annotations. The technical design of this component will be based on our successful MyGene.info web services (mygene.info). The federated component builds on our extensive background in crowdsourcing. We will build community infrastructure that allows the small- and medium-scale data wrangling that is already being performed (and repeated) by many scientists to be aggregated into a single big-data resource. Additionally, semantic interoperability will be added to our system to ensure that it will integrate with current and future Linked Data applications.
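
As a rough illustration of the aggregation idea described in the abstract, the sketch below merges small per-source annotation records into one document per gene. It is not the project's implementation: the function name `merge_annotations`, the shared `gene_id` key, the source names, and the sample records are all assumptions made for this example.

```python
from collections import defaultdict


def merge_annotations(sources):
    """Merge per-source annotation records into one document per gene ID.

    `sources` maps a source name to a list of records. Each record is a
    dict with a shared "gene_id" key plus source-specific fields.
    (Hypothetical data model, chosen only for illustration.)
    """
    merged = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            gene_id = record["gene_id"]
            # Keep each source's fields under its own key so that
            # contributions from different providers never overwrite
            # one another.
            fields = {k: v for k, v in record.items() if k != "gene_id"}
            merged[gene_id].setdefault(source_name, {}).update(fields)
    return dict(merged)


# Illustrative records only; "lab_x" stands in for a community contributor.
sources = {
    "entrez": [{"gene_id": "1017", "symbol": "CDK2"}],
    "uniprot": [{"gene_id": "1017", "accession": "P24941"}],
    "lab_x": [{"gene_id": "1017", "note": "community-contributed annotation"}],
}
merged = merge_annotations(sources)
```

Namespacing fields by source is one possible design choice: it keeps the provenance of each annotation visible in the merged document, which matters when many independent contributors describe the same gene.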

Public Health Relevance

A primary challenge in the biomedical Big Data era is that the vast amount of scientific discoveries outpaces the traditional efforts of structuring them in a computable form. Successful completion of this work will result in a platform to harvest structured data from individual researchers directly, and speed up biomedical research with this aggregated community intelligence.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project--Cooperative Agreements (U01)
Project #: 3U01HG008473-02S1
Application #: 9268840
Study Section: Special Emphasis Panel (ZRG1-BST-N (50)R)
Program Officer: Sofia, Heidi J

Project Start: 2015-06-01
Project End: 2018-05-31
Budget Start: 2016-09-26
Budget End: 2017-05-31
Support Year: 2
Fiscal Year: 2016
Total Cost: $372,625
Indirect Cost: $115,625

Institution

Name: Scripps Research Institute
DUNS #: 781613492

City: La Jolla
State: CA
Country: United States
Zip Code: 92037

Related projects


NIH 2017 U01 HG	Community Platform for Data Wrangling of Gene and Genetic Variant Annotations Wu, Chunlei / Scripps Research Institute
NIH 2016 U01 HG	Community Platform for Data Wrangling of Gene and Genetic Variant Annotations Wu, Chunlei / Scripps Research Institute
NIH 2016 U01 HG	Community Platform for Data Wrangling of Gene and Genetic Variant Annotations Wu, Chunlei / Scripps Research Institute	$372,625
NIH 2015 U01 HG	Community Platform for Data Wrangling of Gene and Genetic Variant Annotations Wu, Chunlei / Scripps Research Institute

Publications

Xin, Jiwen; Afrasiabi, Cyrus; Lelong, Sebastien et al. (2018) Cross-linking BioThings APIs through JSON-LD to facilitate knowledge exploration. BMC Bioinformatics 19:30

Wilkinson, Mark D; Sansone, Susanna-Assunta; Schultes, Erik et al. (2018) A design framework and exemplar metrics for FAIRness. Sci Data 5:180118

Cai, Binghuang; Li, Biao; Kiga, Nikki et al. (2017) Matching phenotypes to whole genomes: Lessons learned from four iterations of the personal genome project community challenges. Hum Mutat 38:1266-1276

Xin, Jiwen; Mark, Adam; Afrasiabi, Cyrus et al. (2016) High-performance web services for querying gene and variant annotation. Genome Biol 17:91
