Trichomonas vaginalis is a flagellated, anaerobic protist that causes trichomoniasis, the most common non-viral sexually transmitted disease in humans. Its prevalence is estimated at 13% among non-Hispanic Black women in the USA and 5% among women aged 15-49 globally. Trichomoniasis can induce painful genital tract inflammation and discharge, increase the risk of HIV acquisition and transmission, and lead to pregnancy sequelae such as preterm delivery and low birth weight. The Centers for Disease Control and Prevention has identified trichomoniasis as a neglected parasitic infection and targeted it as a priority for public health action. Beyond T. vaginalis, virtually all known trichomonads are parasites or commensals of vertebrates. They include other human-infecting species (Trichomonas tenax, Pentatrichomonas hominis), devastating pathogens of birds (Trichomonas gallinae, Trichomonas stableri), an economically important pathogen of cattle (Tritrichomonas foetus), and pathogens of pets (Tr. foetus, P. hominis). Alarmingly, the host ranges of some trichomonads suggest that they can cause disease transmitted between humans and animals (i.e., they can be zoonotic). Research on trichomonads remains neglected: genomic studies to date have been largely confined to T. vaginalis. Our group published a draft assembly of that genome (TvG3_2007) in 2007 and deposited it in public databases, including the NIH-funded Bioinformatics Resource Center (BRC) TrichDB, part of the EuPathDB suite of eukaryotic pathogen databases. TvG3_2007 was groundbreaking and fruitful for research into the basic biology of T. vaginalis. However, it is highly fragmented because the genome carries an enormous complement of high-copy-number sequences, particularly long transposable elements (TEs), as well as expanded gene families.
Assembly fragmentation makes gene and TE copy numbers uncertain. The problem is compounded by the large fraction (>70%) of the ~50,000 predicted protein-coding genes that could not be annotated beyond `hypothetical' or `conserved hypothetical' status. TvG3_2007 remains the only trichomonad genome deposited in TrichDB, and only its TE annotation has been updated since its release. Together, these factors hinder the study of T. vaginalis and other trichomonad pathogens at the genomic, evolutionary, and molecular levels. This proposal seeks to remedy this situation in two steps. First, we will use modern databases and tools, including those available from the EuPathDB BRC, to annotate a new, high-quality, long-read T. vaginalis assembly that we recently generated. Second, we will transfer that annotation to 17 assemblies that we and our colleagues have generated from eight trichomonad species. All data will be deposited in TrichDB, greatly expanding the genomic resources available for T. vaginalis and trichomonad research. We will then use the improved TrichDB resource to conduct comparative genomics across the trichomonads. These analyses will characterize differences in genome characteristics, gene family expansion, and, importantly, TE burden, to identify correlates of host range and assess implications for potential zoonosis.
The flagellated, anaerobic parasite Trichomonas vaginalis infects an estimated 3.7 million people in the United States. Yet current genomic datasets for this species and for other, potentially zoonotic trichomonads that infect cattle, pets, and birds are severely limited. The only available resource is a single, highly fragmented T. vaginalis genome assembly from 2007, with functional annotation for <30% of its gene models. A greatly improved long-read T. vaginalis assembly containing ~40,000 putative protein-coding genes is now available, along with 17 short-read assemblies from eight other species, but these assemblies require manual curation and transitive annotation. This project will curate the new assemblies to high standards using modern tools, including those available in the Bioinformatics Resource Center TrichDB, and deposit them in TrichDB. It will also conduct the first comparative genomics analyses across trichomonad genera to assess the sizes of pathogenesis-related gene families, transposable element burden, and the implications of genomic differences for host range preference and potential for zoonosis.