The most persistent and significant hindrance to genomic studies of Trypanosoma cruzi is the lack of an acceptable reference genome. Several factors have impeded progress toward a high-quality, fully assembled T. cruzi reference sequence: the high repeat content (50%) of the T. cruzi genome, and the extreme genome-wide heterozygosity and genetic mosaicism of the strain (CL-Brener) that was chosen for sequencing and reference genome assembly. As a result, the current T. cruzi reference genome has hundreds of gaps and regions of assembly collapse, has never been fully assembled, and is in fact presented as two genomes because of the heterozygosity of the reference strain: each chromosome is presented as two versions that differ substantially in gene content, in sequence identity where the two chromosome sequences align, and even in length. These issues have greatly reduced the efficacy and power of previous and ongoing whole-genome sequencing studies of this causative agent of human disease, and have made virtually any endeavor that requires scrutinizing the reference genome difficult and frequently unproductive. We propose to address the problems with the current T. cruzi reference genome through three modifications of the approach originally used to generate it. First, we will use single-molecule, long-read (up to 10 kb) sequencing to close gaps and expand regions of assembly collapse that arose when primarily Sanger reads (700 bp) were used to assemble the original CL-Brener reference genome. Second, we will select T. cruzi isolates for sequencing that are homozygous and do not have mosaic genomes, which will simplify the assemblies. Third, because all T. cruzi lines are derived from two ancestral, divergent, homozygous, non-mosaic lineages, we will generate a reference sequence for each parental lineage in order to cover the scope of genetic content in all T. cruzi isolates.
Trypanosoma cruzi, the causative agent of Chagas disease, a frequently fatal human disease, is a blood-borne protozoan parasite endemic to the Americas. Studies of Chagas disease drug treatment, prevention, and epidemiology are critically needed, and all are hampered by the lack of a suitable reference genome for T. cruzi. This proposal addresses the need for a high-quality, fully assembled reference genome by using newly available single-molecule, long-read sequencing technology and by strategically selecting isolates from the two ancestral lineages of T. cruzi. The result will be two fully assembled and annotated reference genomes that, between them, cover the full range of genetic content in all T. cruzi isolates.