Generation of reference genomes for Trypanosoma cruzi using PacBio sequencing

Tarleton, Rick; Minning, Todd

Abstract

The most persistent and significant hindrance to genomic studies of Trypanosoma cruzi is the lack of an acceptable reference genome. Several factors have impeded progress in generating a high-quality, fully assembled T. cruzi reference sequence: the high repeat content (50%) of the T. cruzi genome, extreme genome-wide heterozygosity in the reference strain (CL-Brener) that was chosen for sequencing and reference genome assembly, and genetic mosaicism in that strain. As a result, the current reference genome for T. cruzi has hundreds of gaps and regions of assembly collapse, has never been fully assembled, and is in fact presented as two genomes because of the heterozygosity of the reference strain: each chromosome is presented as two versions that differ substantially in gene content, in sequence identity where the two chromosome sequences align, and even in length. These issues have greatly reduced the efficacy and power of previous and ongoing whole-genome sequencing studies of this causative agent of human disease, and have made virtually any endeavor that requires scrutinizing the reference genome difficult and frequently unproductive. We propose to address the issues with the current T. cruzi reference genome using three modifications of the approach originally used to generate it. First, we propose to use single-molecule, long-read (up to 10 kb) sequencing to close gaps and expand regions of assembly collapse that arose when primarily Sanger reads (700 bp) were used to assemble the original CL-Brener reference genome. Second, to simplify the assemblies, we will select T. cruzi isolates for sequencing that are homozygous and do not have mosaic genomes. Third, because all T. cruzi lines are derived from two ancestral, divergent, homozygous, non-mosaic lineages, we will generate a reference sequence for each parental lineage in order to cover the scope of genetic content in all T. cruzi isolates.

Public Health Relevance

Trypanosoma cruzi, the causative agent of Chagas disease, a frequently fatal human disease, is a blood-borne protozoan parasite endemic to the Americas. Studies of Chagas disease drug treatment, prevention, and epidemiology are critically needed, and all are hampered by the lack of a suitable reference genome for T. cruzi. This proposal addresses the need for a high-quality, fully assembled reference genome by using newly available single-molecule, long-read sequencing technology and strategic selection of isolates from the two ancestral lineages of T. cruzi to produce two fully assembled and annotated reference genomes that, between them, will cover the full range of genetic content in all T. cruzi isolates.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of Allergy and Infectious Diseases (NIAID)
Type: Small Research Grants (R03)
Project #: 1R03AI124228-01
Application #: 9101686
Study Section: Pathogenic Eukaryotes Study Section (PTHE)
Program Officer: Joy, Deirdre A

Project Start: 2016-04-01
Project End: 2018-03-31
Budget Start: 2016-04-01
Budget End: 2017-03-31
Support Year: 1
Fiscal Year: 2016
Total Cost
Indirect Cost

Institution

Name: University of Georgia
Department: Public Health & Prev Medicine
Type: Organized Research Units
DUNS #: 004315578

City: Athens
State: GA
Country: United States
Zip Code: 30602

Related projects


NIH 2017 R03 AI	Generation of reference genomes for Trypanosoma cruzi using PacBio sequencing Tarleton, Rick L. / University of Georgia
NIH 2016 R03 AI	Generation of reference genomes for Trypanosoma cruzi using PacBio sequencing Tarleton, Rick L.; Minning, Todd A. / University of Georgia
