An innovative WGS platform for discovery, annotation, and interpretation of all forms of human genetic variation.

Pedersen, Brent

Abstract

Modern DNA sequencing technologies have transformed our ability to interrogate human genomes in a single experiment, thereby eliminating the inherent blind spots of gene panels and whole exome sequencing (WES). Furthermore, recent improvements in speed and economy are driving the cost of whole genome sequencing (WGS) down to that of WES; therefore, we foresee a transition over the next two years to WGS as the de facto test for human disease research and diagnosis in academic labs, hospitals, and both biotechnology and pharmaceutical companies. Indeed, conservative estimates project that 20 million human genomes will be sequenced in the next decade. However, the transition to research and diagnostics driven by WGS presents a substantial data processing burden: a single WGS sample represents at least 100 gigabytes of data, and converting the raw data into a comprehensive set of genetic variation requires an intricate, rapidly changing, and computationally onerous workflow. Based on our history of developing innovative computational methods for genomic research and motivated by the acute need for advanced, scalable computing platforms, the applicant team founded Base2 Genomics (Base2). Base2 has created an innovative platform for WGS data processing, quality control, variant detection and prioritization, and data visualization using Amazon Web Services (AWS) cloud computing. Developed in close collaboration with AWS engineers, the Base2 platform has as its fundamental strengths its speed, cost, capacity for parallelization, and, most importantly, its ability to accurately identify all forms of genetic variation, whereas most other commercial offerings focus solely on the forms of variation that are easiest to discover (SNPs and INDELs). We argue that, in order to maximize the research, diagnostic, and pharmacogenetic utility of WGS, it is imperative to create a complete catalog of all variation in each sequenced genome.
In this proposal, we will further improve our technologies with the following aims:
Aim 1. Develop proprietary technologies for prioritizing and annotating copy-number and structural variation (SV) via population-scale databases. We have developed STIX (STructural variant IndeX), a proprietary compression algorithm and database for efficiently profiling evidence for SV among thousands of human genomes. We propose to leverage this innovation to create unique, proprietary STIX databases and an associated SV annotation engine to facilitate accurate prioritization of SV in customer WGS cohorts.
Aim 2. Create a secure, high-performance customer data submission portal. We will develop a secure customer data submission portal that maximizes efficiency and security while allowing customers to upload data and invoke processing through the Base2 platform.

Public Health Relevance

Modern DNA sequencing technologies have transformed our ability to interrogate human genomes in a single experiment, thereby eliminating the inherent blind spots of gene panels and whole exome sequencing. However, the transition to research and diagnostics driven by whole genome sequencing presents a substantial data processing burden, and converting the raw data into a comprehensive set of genetic variation requires an intricate, rapidly changing, and computationally onerous workflow. This proposal from Base2 Genomics, LLC seeks to develop new software and algorithms that empower human genome analysis and interpretation in both diagnostic and research settings.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Small Business Technology Transfer (STTR) Grants - Phase I (R41)
Project #: 1R41HG010126-01A1
Application #: 9620349
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Sofia, Heidi J

Project Start: 2018-09-12
Project End: 2019-08-31
Budget Start: 2018-09-12
Budget End: 2019-08-31
Support Year: 1
Fiscal Year: 2018

Institution

Name: BASE2 Genomics, LLC
DUNS #: 080087921

City: Salt Lake City
State: UT
Country: United States
Zip Code: 84105

Related projects


NIH 2019 R41 HG	An innovative WGS platform for discovery, annotation, and interpretation of all forms of human genetic variation. Pedersen, Brent / BASE2 Genomics, LLC
NIH 2018 R41 HG	An innovative WGS platform for discovery, annotation, and interpretation of all forms of human genetic variation. Pedersen, Brent / BASE2 Genomics, LLC
