Hepatitis infection is a widespread and persistent crisis, both in the U.S. and worldwide. Chronic infection with the hepatitis B virus (HBV) is a major cause of end-stage liver disease, with staggering long-term healthcare costs. Thus, HBV research and innovation have been deemed national priorities in the U.S., as evidenced by calls to action from the Institute of Medicine and the Department of Health and Human Services. The intra-host HBV infection comprises a genetically diverse population of variants (quasispecies) that is an important determinant of pathogenesis and treatment outcome. Mapping the quasispecies is required; however, map construction is difficult owing to HBV's complex genome structure, variant divergence from reference genomes, and a lack of accurate tools. Current de novo assembly algorithms intended for viral genome assembly produce inadequate single linear representations of a viral population. Algorithms meant for diploid genome assembly handle virology data poorly, produce unnecessarily complex output, and are computationally expensive. For this Phase I project, GATACA, LLC proposes to develop the Assembly Tool to accurately map intra-host HBV strains from short read data. Using novel steps, the tool will assemble the reads into multiple interconnected consensus sequences (contigs) as a map of global haplotypes. The contig sets will provide valuable reference data backbones for subsequent analyses. The tool will improve inter-host comparisons, which depend on accurate HBV quasispecies parameters. The Assembly Tool will be integrated into existing software developed by GATACA.
Specific Aims for Phase I are: (1) Develop, test, and prototype a de novo algorithm based on novel iterative clustering and priority merging steps, and represent global HBV variation as interconnected graphs. (2) Develop a validation algorithm for generating simulated HBV data, incorporating patient-derived HBV data, and benchmarking the performance of the Assembly Tool against that of other viral genome assemblers. In Phase II, GATACA will develop HepBbase, a commercial web-based platform that will provide data management and allow users to plug in familiar analysis tools alongside HBV-specific functions. An Assembly pipeline will be developed in Phase II to automate the labor-intensive steps of developing HBV draft genomes. GATACA will begin HepBbase commercialization efforts in Q2 2017 during Phase II development. Potential customers are HBV virologists in all research-based disciplines, who lack adequate or centralized user-friendly HBV management and analysis software. Discoveries made with our tool will also inform clinicians, based on assembled patient reference genomes. As the first virus-specific, large-scale bioinformatics platform, HepBbase will eliminate bottlenecks and facilitate collaboration.

Public Health Relevance

Chronic infection with hepatitis B virus (HBV) is associated with serious health concerns (such as cancer, cirrhosis, and liver failure) and escalating healthcare costs. Deep sequencing of viral infections using next-generation sequencing (NGS) methods yields extensive data about the genetic complexity of HBV and other viral infections. Newer sequencing technologies are able to produce longer, usable sequences, but are years away from reaching the maturity required for widespread adoption in virology. Meanwhile, NGS continues to yield new insights on heterologous infections and to identify resistance-associated variants (RAVs), even if the resistance requires a long-range interaction and occurs in rare variants. Current analysis tools struggle to provide an accurate intra-host variant map; they detect only point mutations at limited viral regions. Virologists require tools for aggregating the NGS fragments into accurate variant strains to advance research in the therapeutics and clinical fields, and to guide clinical management of patients. In this Phase I SBIR, GATACA, LLC proposes to develop an innovative short read assembler to bridge the de novo single-consensus assembly approach and the haplotype inference problem that is especially challenging in HBV intra-host populations.

Agency
National Institutes of Health (NIH)
Institute
National Institute of Allergy and Infectious Diseases (NIAID)
Type
Small Business Innovation Research Grants (SBIR) - Phase I (R43)
Project #
1R43AI122785-01A1
Application #
9141787
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Koshy, Rajen
Project Start
2016-02-19
Project End
2018-01-31
Budget Start
2016-02-19
Budget End
2018-01-31
Support Year
1
Fiscal Year
2016
Total Cost
Indirect Cost
Name
Gataca, LLC
Department
Type
DUNS #
608613043
City
Newport
State
VA
Country
United States
Zip Code
24128