Toxoplasma gondii is an important protist pathogen of humans. A genome sequence is available, but the research community is greatly hampered by critical assembly errors that make it difficult to discover new gene duplications associated with virulence and to study known ones. This proposal focuses on the ascertainment and resolution of genome compressions and assembly errors in the reference genome sequence for Toxoplasma gondii ME49. Copy number variation is linked to differences in phenotype and virulence in many pathogens. The goal of this project is to identify and disambiguate local genome segment duplications that were collapsed or merged as an artifact of current genome assembly algorithms. Bioinformatics analysis of the assembled reference genome sequence for T. gondii strain ME49 reveals no segmental genome duplications, a highly anomalous result that indicates the extent to which replicated regions have been collapsed. In contrast, an analysis of sequence data from 62 T. gondii strains released by the community T. gondii genome project has revealed an excess of SNPs and an excess of sequence reads at many genomic locations when compared to the T. gondii ME49 reference. This finding indicates that the genome assemblies contain multiple repetitive regions and that strains differ in the number of repetitive regions and in the SNPs each contains. The critical step needed to correct the reference genome sequence and disambiguate replicated regions is to obtain long-read single-molecule sequences (5-10 kb) that span the genome sufficiently well to cover duplicated regions.
Two aims are proposed that focus on resolving duplicated genome regions via long-read sequences for several key T. gondii strains, with an emphasis on the reference strain, ME49. Resolved sequences will be compared with each other to catalog the extent and types of replicated regions. Finally, comparisons of the duplicated regions themselves will be used to catalog affected genes. All data will be released to the community via ToxoDB.org and archived appropriately. This study will provide a reference genome sequence with greatly reduced errors and much-needed insight into the scope and potential significance of genome duplications in the evolution of Toxoplasma gondii strains.
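
The excess-read signal described above can be illustrated with a minimal sketch. It assumes that reads have already been mapped to the ME49 reference and summarized as mean depth per fixed-size window; the function name, the 1.5x threshold, and the example depths are hypothetical and are not taken from the project. A window whose depth is a multiple of the genome-wide median is a candidate for a region where several copies were collapsed into one.

```python
from statistics import median


def flag_collapsed_windows(window_depths, min_ratio=1.5):
    """Flag windows whose read depth suggests a collapsed duplication.

    window_depths: mean read depth per fixed-size window (assumed input).
    min_ratio: hypothetical cutoff relative to the genome-wide median depth.
    Returns a list of (window_index, depth, estimated_copy_number).
    """
    if not window_depths:
        return []
    baseline = median(window_depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    flagged = []
    for index, depth in enumerate(window_depths):
        ratio = depth / baseline
        if ratio >= min_ratio:
            # A ratio near 2 suggests two copies merged into one, and so on.
            flagged.append((index, depth, round(ratio)))
    return flagged


# Illustrative depths only; not real T. gondii data.
depths = [30, 31, 29, 92, 88, 30, 61, 30]
candidates = flag_collapsed_windows(depths)
for index, depth, copies in candidates:
    print(f"window {index}: depth {depth}, about {copies} copies")
```

Depth alone cannot place the individual copies or recover their sequences, which is why the proposal relies on long reads that span the duplicated regions.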

Public Health Relevance

Toxoplasma gondii is a parasite of global significance that primarily causes disease in pregnant women and immunocompromised individuals. In this proposal we seek to correct errors in the reference genome sequence and define the extent and potential significance of recently duplicated, nearly identical regions of the parasite genome. We anticipate that the results of these studies will ultimately lead to a better understanding of genetic differences that may be linked to pathogenicity.

Agency
National Institutes of Health (NIH)
Institute
National Institute of Allergy and Infectious Diseases (NIAID)
Type
Small Research Grants (R03)
Project #
5R03AI115339-02
Application #
8965503
Study Section
Pathogenic Eukaryotes Study Section (PTHE)
Program Officer
Joy, Deirdre A
Project Start
2014-11-10
Project End
2017-10-31
Budget Start
2015-11-01
Budget End
2017-10-31
Support Year
2
Fiscal Year
2016
Total Cost
Indirect Cost
Name
University of Georgia
Department
Public Health & Prev Medicine
Type
Organized Research Units
DUNS #
004315578
City
Athens
State
GA
Country
United States
Zip Code
30602
Buscaglia, Carlos A; Kissinger, Jessica C; Agüero, Fernán (2015) Neglected Tropical Diseases in the Post-Genomic Era. Trends Genet 31:539-555