Modern DNA sequencing technologies have transformed our ability to interrogate entire human genomes in a single experiment, thereby eliminating the inherent blind spots of gene panels and whole exome sequencing (WES). Furthermore, recent improvements in sequencing speed and efficiency are driving the cost of whole genome sequencing (WGS) down to that of WES; we therefore foresee a transition over the next two years to WGS as the de facto test for human disease research and diagnosis in academic labs, hospitals, and biotechnology and pharmaceutical companies. Indeed, conservative estimates project that 20 million human genomes will be sequenced in the next decade. However, the transition to WGS-driven research and diagnostics presents a substantial data processing burden: a single WGS sample represents at least 100 gigabytes of raw data, and converting these data into a comprehensive set of genetic variants requires an intricate, rapidly changing, and computationally onerous workflow. Building on our history of developing innovative computational methods for genomic research, and motivated by the acute need for advanced, scalable computing platforms, the applicant team founded Base2 Genomics (Base2). Base2 has created an innovative platform for WGS data processing, quality control, variant detection and prioritization, and data visualization using Amazon Web Services (AWS) cloud computing. The Base2 platform, developed in close collaboration with AWS engineers, offers speed, low cost, and extensive parallelization and, most importantly, accurately identifies all forms of genetic variation, whereas most other commercial offerings focus solely on the forms of variation that are easiest to discover (SNPs and INDELs). We argue that, to maximize the research, diagnostic, and pharmacogenetic utility of WGS, it is imperative to create a complete catalog of all variation in each sequenced genome.
In this proposal, we will further improve our technologies with the following aims:
Aim 1. Develop proprietary technologies for prioritizing and annotating copy-number and structural variation (SV) via population-scale databases. We have developed STIX (STructural variant IndeX), a proprietary compression algorithm and database for efficiently profiling evidence for SV across thousands of human genomes. We propose to leverage this innovation to create unique, proprietary STIX databases and an associated SV annotation engine that will enable accurate prioritization of SV in customer WGS cohorts.
Aim 2. Create a secure, high-performance customer data submission portal. We will develop a customer data submission portal that maximizes transfer efficiency and data security while allowing customers to upload data and launch processing on the Base2 platform.

Public Health Relevance

Modern DNA sequencing technologies have transformed our ability to interrogate human genomes in a single experiment, thereby eliminating the inherent blind spots of gene panels and whole exome sequencing. However, the transition to research and diagnostics driven by whole genome sequencing presents a substantial data processing burden, because converting the raw data into a comprehensive set of genetic variants requires an intricate, rapidly changing, and computationally onerous workflow. This proposal from Base2 Genomics, LLC seeks to develop new software and algorithms that empower human genome analysis and interpretation in both diagnostic and research settings.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Small Business Technology Transfer (STTR) Grants - Phase I (R41)
Project #
1R41HG010126-01A1
Application #
9620349
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Sofia, Heidi J
Project Start
2018-09-12
Project End
2019-08-31
Budget Start
2018-09-12
Budget End
2019-08-31
Support Year
1
Fiscal Year
2018
Total Cost
Indirect Cost
Name
BASE2 Genomics, LLC
Department
Type
DUNS #
080087921
City
Salt Lake City
State
UT
Country
United States
Zip Code
84105