Commercialization of HaploSeq as a Service (HaaS) for generating chromosome-span phased genome and exome sequence information
Arima Genomics
Project Summary/Abstract
Over 90% of next-generation sequencing (NGS) data is generated on Illumina short-read sequencers because of their cost-effectiveness and fast turnaround times. However, short-read sequencing loses critical contiguity information, which limits its ability to assemble genomes de novo and to reconstruct the maternal and paternal haplotypes of diploid genomes. Contiguity information is valuable for understanding the genetics of human health and disease and is therefore critical for advancing precision and personalized medicine. Long-read technologies (e.g. Pacific Biosciences) reach only megabase-scale contiguity, yet they are 5-7X more expensive than Illumina short-read sequencing, which limits their use. Recent advances in DNA preparation can preserve long-range information in a form compatible with Illumina short-read sequencing. These "synthetic" long-read (SLR) methods combine long-range contiguity with short-read economy. However, the maximal SLR contiguity is only 1-5% of the length of an average (~100 Mb) human chromosome. To achieve multi-megabase contiguity, SLR methods require genomic DNA (gDNA) fragments longer than 100-150 kb, but such long gDNA fragments are difficult to obtain, and this limits current SLR methods. Arima Genomics has optimized HiC technology, an SLR-based DNA protocol, to establish Arima HiC (A-HiC), which preserves chromosome-span contiguity for de novo assembly, haplotype phasing, and metagenomics, and whose libraries can be sequenced on Illumina short-read instruments. Rather than using purified gDNA, A-HiC leverages the long-range contiguity information preserved naturally in the three-dimensional (3D) organization of genomes in cells. This 3D information is not only long-range but spans entire chromosomes.
A-HiC improves on multiple features of HiC. Whereas HiC is a laborious, time-consuming, and costly 3-day procedure, A-HiC is easy to perform, generates libraries of consistent quality, costs >70% less, takes only 6 hours, and is compatible with standard library preps such as KAPA HyperPrep. Together, these properties make A-HiC automatable. Following successful manual processing in 96-well plates, we propose to automate A-HiC on a liquid handler (in partnership with Agilent). Furthermore, we aim to make A-HiC robust to a wide range of sample types (cells, tissues, blood, human and non-human) to serve diverse customers through this automated service platform, referred to as HaaS. In addition to optimizing the A-HiC experimental protocol, we are developing several algorithms to generate chromosome-span phase information for genomes and exomes, which we will publish as open-source software (OSS). We also leverage existing OSS for other HiC-based apps, and we will automate the software for all HiC-enabled apps on supercomputing infrastructure to enable quick turnaround for our diverse customers. The HaaS architecture thus automates both the experimental (A-HiC) and computational (OSS for HiC-based apps) components. Many commercial players (e.g. Novogene) provide sequencing services. Arima's HaaS provides chromosomal contiguity and is thus differentiated from traditional (category-1) service providers such as Novogene, which provide fragmented contiguity based on short-read or long-read methods. In fact, we propose to collaborate with Novogene. We are also in communication with PacBio about a potential collaboration and marketing agreement. On the other hand, we compete directly with category-2 players who offer HiC services, specifically for de novo assembly. Nonetheless, HiC services from other category-2 companies suffer significant limitations:
(1) high prices (>$10,000, whereas Arima charges <$5,000 for large genomes and <$3,000 for small genomes); (2) low-quality data from traditional HiC (whereas Arima uses optimized A-HiC); and (3) non-automatable traditional HiC (whereas Arima uses fully automatable A-HiC). In addition, we have developed specialized phasing algorithms to attract a broad customer base. To date, Arima has done little marketing of its services, yet word-of-mouth has brought in new customers and repeat business, which reflects the quality of Arima's services and demonstrates significant market demand. This sets the stage for rapid growth, to be supported by more deliberate traditional marketing of the proposed HaaS business model. In this proposal, we develop and benchmark HaaS in collaboration with key opinion leaders across multiple sample types (blood, cells, tissues, human and non-human samples) for several HiC-based apps.

Public Health Relevance

Project Narrative
Next-generation short-read sequencing (e.g. Illumina) has advantages in price, speed, accuracy, and broad adoption. With short-read sequencing, however, a key property in genomics is severely limited: contiguity, a measure of fully linked DNA sequence. Contiguity, both for haplotype phasing (delineating maternal and paternal copies) and for de novo assembly, is critical to understanding the structure and function of the genome and to genomics-based medicine. Several companies have attempted to improve contiguity with long-read or synthetic long-read technologies by increasing read length, by barcoding, or by reconstituting artificial chromatin. These methods are limited to 1-5% of the contiguity of an average human chromosome, require specialized and expensive equipment, and are technically demanding. Arima has overcome these deficiencies by using HiC, a method that can generate chromosomal contiguity. We have optimized the HiC protocol into a 96-well-compatible, highly reproducible, low-cost, 6-hour protocol referred to as Arima-HiC (A-HiC). Through this proposal, Arima aims to automate A-HiC and the associated computational applications (apps) for phasing, de novo assembly, and metagenomics through the HaaS architecture. The HaaS service will receive samples, process them through the A-HiC DNA prep protocol and standard library prep in an automated fashion (implemented on a liquid handler), and sequence the resulting libraries on standard Illumina short-read instruments. The resulting data can be analyzed to reveal chromosome-span contiguity for de novo assembly or for phasing of genomes and exomes. HaaS aims to serve diverse customers with maximum contiguity and genomic information at low cost and with fast turnaround time.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Small Business Innovation Research Grants (SBIR) - Phase II (R44)
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Smith, Michael
Arima Genomics, Inc.
San Diego
United States