The production of whole genome sequences (de novo assembly) is fundamental to both basic and clinical genomics research in the areas of gene regulation, structural variant detection, haplotype phasing, and metagenomic analysis. The 10,000-fold reduction in raw Next-Generation DNA Sequencing (NGS) costs over the past seven years has made genomics research much more affordable. However, while economical, these technologies currently produce highly fragmented genome assemblies. Dovetail Genomics has developed a novel and cost-effective Multi-Scale Linking (MSL) sequencing library that leverages existing NGS technology to improve assembly contiguity more than 100-fold, from an N50 of less than 100 kbp to greater than 10 Mbp. N50 is the length L such that sequences of length L or longer contain half of the assembled bases. Our service will use these libraries and our novel assembly algorithms to produce high-quality, more complete genomes starting from DNA alone; no cells or tissue are required. Furthermore, we will do so rapidly (in less than one month) and economically (for less than $30,000). Our service can also be used to improve existing assemblies for less than $10,000. A principal feature of these libraries that enables such improvement is the production of genomic read pairs spanning many length scales, including extremely long distances. Our software pipeline leverages this feature to produce more contiguous and complete genome assemblies. Our long-term goal in providing these high-quality genomes is to improve human health and to expand human knowledge of ourselves and other organisms. Higher-quality genomes enable more powerful studies and a deeper understanding of human genomic disease. They also place personalized medicine within reach through the rapid and affordable provision of individual patient genomes. In Dovetail's first year, we demonstrated that our technologies can increase genome assembly contiguity 100-fold. Moving forward, we are striving to refine both our library production and analysis platforms to a commercially viable level.
This will require increasing the value of the data produced by our proprietary sequencing libraries, as well as refining and deploying our assembly pipeline and other analysis software.
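To make the contiguity metric cited above concrete, the following is a minimal Python sketch that computes N50 from a list of contig or scaffold lengths using the standard definition. It is a generic illustration, not Dovetail's software; the function name `n50` and the toy lengths are assumptions chosen for the example.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of an assembly.

    N50 is the largest length L such that sequences of length >= L
    together contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("cannot compute N50 of an empty assembly")
    total = sum(lengths)
    running = 0
    # Walk sequences from longest to shortest, accumulating bases
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop once the accumulated bases reach half of the total
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Hypothetical toy assembly of five sequences (lengths in bp)
    toy_lengths = [100, 200, 300, 400, 500]
    print(n50(toy_lengths))  # 400
```

Because the lengths are sorted in descending order, the first sequence at which the running total reaches half the assembly is exactly the N50. Joining fragments into fewer, longer sequences raises this value even when the total number of assembled bases is unchanged.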
Our specific aims are to: (1) increase the efficiency and genomic range of our library production platform; (2) improve our existing prototype assembly pipeline, optimizing speed and scalability by transitioning to cloud-based compute infrastructure; and (3) improve existing pipelines for genomic phasing and for the detection and characterization of structural variation in humans for clinical genomics. There are commercial opportunities for such a service in academic and clinical research, as well as in the clinic itself. Sufficiently complete and cost-effective genome assemblies will enable broader and more powerful studies of genomes at both the population and the individual level. In clinical research, where large patient cohorts are the norm, they will enable the discovery and description of genomic drivers of human disease. In the clinic, they will enable the rapid acquisition of patient genomes for the diagnosis and treatment of many diseases, especially cancer.
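Long-range read pairs improve an assembly by linking its fragments. One simplified way to see this is to count the read pairs whose two mates map to different contigs. Many such pairs between the same two contigs suggest that those contigs lie near each other in the genome. The sketch below is a minimal, hypothetical Python illustration of that general idea, not the MSL assembly pipeline. The input format (one `(contig_of_mate1, contig_of_mate2)` tuple per read pair), the function name `count_contig_links`, and the `min_links` threshold are all assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def count_contig_links(
    pairs: Iterable[Pair], min_links: int = 3
) -> Dict[Pair, int]:
    """Count read pairs joining two different contigs (hypothetical sketch).

    Each input item names the contig that each mate of one read pair maps to.
    Returns contig pairs supported by at least `min_links` read pairs.
    """
    links: Counter = Counter()
    for contig_a, contig_b in pairs:
        # Pairs with both mates on one contig carry no joining evidence
        if contig_a == contig_b:
            continue
        # Sort names so (A, B) and (B, A) are counted as the same link
        links[tuple(sorted((contig_a, contig_b)))] += 1
    # Keep only links with enough supporting read pairs
    return {edge: n for edge, n in links.items() if n >= min_links}


if __name__ == "__main__":
    # Hypothetical mapped read pairs
    toy_pairs = (
        [("ctg1", "ctg2")] * 4
        + [("ctg2", "ctg1")]
        + [("ctg1", "ctg3")]
        + [("ctg2", "ctg2")] * 10
    )
    print(count_contig_links(toy_pairs))  # {('ctg1', 'ctg2'): 5}
```

A practical scaffolder would also need to use the genomic distance implied by each pair, orient and order the contigs, and model spurious links. The fixed `min_links` threshold here only stands in for that filtering.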

Public Health Relevance

Many genomic diseases, cancer foremost among them, are associated with large rearrangements in the genome that can be adequately discovered and characterized only with long-range genomic sequence information. Similarly, understanding the inheritance patterns of disease-associated genomic variants and economically acquiring personal genomes for medical purposes both benefit tremendously from the same information. Consequently, obtaining extremely long-range genomic sequence information has significant implications for human health, particularly for the study and treatment of genomic diseases.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Small Business Innovation Research Grants (SBIR) - Phase II (R44)
Project #: 5R44HG008719-02
Application #: 9135446
Study Section: Special Emphasis Panel (ZRG1-IMST-J (15)B)
Program Officer: Smith, Michael
Project Start: 2015-09-01
Project End: 2017-08-31
Budget Start: 2016-09-01
Budget End: 2017-08-31
Support Year: 2
Fiscal Year: 2016
Total Cost: $760,913
Organization: Dovetail Genomics, LLC
DUNS #: 079097900
City: Santa Cruz
State: CA
Country: United States
Zip Code: 95060