The rapidly growing database of completely and nearly completely sequenced genomes of bacteria, archaea, eukaryotes and viruses (several thousand genomes already available and many more in progress) creates both extensive new opportunities and major new challenges for genome research. During the last year, we performed a variety of studies that took advantage of the genomic information to establish fundamental principles of genome evolution. Much of our work was aimed at understanding the evolution of viruses and mobile elements. Virus genomes are prone to extensive gene loss, gain, and exchange and share no universal genes. Therefore, in a broad-scale study of virus evolution, gene and genome network analyses can complement traditional phylogenetics. We performed an exhaustive comparative analysis of the genomes of double-stranded DNA (dsDNA) viruses using the bipartite network approach and found robust hierarchical modularity in the dsDNA virosphere. Bipartite networks consist of two classes of nodes, with nodes in one class, in this case genomes, being connected via nodes of the second class, in this case genes. Such a network can be partitioned into modules that combine nodes from both classes. The bipartite network of dsDNA viruses includes 19 modules that form 5 major and 3 minor supermodules. Of these modules, 11 include tailed bacteriophages, reflecting the diversity of this largest group of viruses. The module analysis quantitatively validates and refines previously proposed nontrivial evolutionary relationships. An expansive supermodule combines the large and giant viruses of the putative order Megavirales with diverse moderate-sized viruses and related mobile elements. All viruses in this supermodule share a distinct morphogenetic toolkit with a double jelly-roll major capsid protein.
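The genome-gene bipartite network described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the genome names, the placeholder gene labels (`g1` to `g6`), and the helper functions are hypothetical, and the toy data are not real annotations. Connected components are used only as a crude stand-in for modules; the sketch does not implement the module-detection procedure used in the study.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: genome -> set of gene families it encodes.
TOY_GENOMES = {
    "tailed_phage_1": {"HK97_MCP", "g1", "g2"},
    "tailed_phage_2": {"HK97_MCP", "g1", "g3"},
    "herpesvirus_1": {"HK97_MCP", "g4"},
    "giant_virus_1": {"DJR_MCP", "g5", "g6"},
    "small_virus_1": {"DJR_MCP", "g5"},
}


def invert(genome_to_genes):
    """Build the other side of the bipartite graph: gene -> set of genomes."""
    gene_to_genomes = defaultdict(set)
    for genome, genes in genome_to_genes.items():
        for gene in genes:
            gene_to_genomes[gene].add(genome)
    return dict(gene_to_genomes)


def hub_genes(genome_to_genes, min_degree):
    """Gene nodes connected to at least `min_degree` genomes, sorted by name."""
    gene_to_genomes = invert(genome_to_genes)
    return sorted(g for g, members in gene_to_genomes.items()
                  if len(members) >= min_degree)


def shared_gene_counts(genome_to_genes):
    """Genome-genome projection: number of genes shared by each genome pair."""
    counts = {}
    for a, b in combinations(sorted(genome_to_genes), 2):
        shared = len(genome_to_genes[a] & genome_to_genes[b])
        if shared:
            counts[(a, b)] = shared
    return counts


def genome_components(genome_to_genes):
    """Group genomes that are linked through any chain of shared genes.

    This is a crude proxy for modules, not a modularity optimisation.
    """
    gene_to_genomes = invert(genome_to_genes)
    unvisited = set(genome_to_genes)
    components = []
    while unvisited:
        start = unvisited.pop()
        component, stack = {start}, [start]
        while stack:
            genome = stack.pop()
            for gene in genome_to_genes[genome]:
                for neighbour in gene_to_genomes[gene]:
                    if neighbour in unvisited:
                        unvisited.remove(neighbour)
                        component.add(neighbour)
                        stack.append(neighbour)
        components.append(component)
    return sorted(components, key=sorted)


if __name__ == "__main__":
    print(hub_genes(TOY_GENOMES, min_degree=3))
    print(shared_gene_counts(TOY_GENOMES))
    print(genome_components(TOY_GENOMES))
```

In this toy input, the genomes fall into two groups, one linked by the HK97-like capsid protein label and one by the double jelly-roll label, which mirrors the kind of grouping the text describes without reproducing the actual analysis.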
Herpesviruses and tailed bacteriophages constitute another supermodule, held together by a distinct set of morphogenetic proteins centered on the HK97-like major capsid protein. Together, these two supermodules cover the great majority of currently known dsDNA viruses. We formally identified a set of 14 viral hallmark genes that constitute the hubs of the network and account for most of the intermodule connections.
The empirical research into the evolutionary relationships between viral genomes was complemented by theoretical modeling of virus-host coevolution, with the model predictions tested against comparative genomic data. Almost all cellular life forms are hosts to diverse genetic parasites with various levels of autonomy, including plasmids, transposons and viruses. Theoretical modeling of the evolution of primordial replicators indicates that parasites ('cheaters') necessarily evolve in such systems and can be kept at bay primarily via compartmentalization. Given the (near) ubiquity, abundance and diversity of genetic parasites, the question becomes pertinent: are such parasites intrinsic to life? At least in prokaryotes, the persistence of parasites is linked to the rate of horizontal gene transfer (HGT). We mathematically derived the threshold value of the minimal transfer rate required for selfish element persistence, which depends on the element duplication and loss rates as well as the cost to the host. Estimation of the characteristic gene duplication, loss and transfer rates for transposons, plasmids and virus-related elements in multiple groups of diverse bacteria and archaea indicates that most of these rates are compatible with the long-term persistence of parasites. Notably, a small but non-zero rate of HGT is also required for the persistence of non-parasitic genes. We hypothesize that cells cannot tune their horizontal transfer rates to be below the threshold required for parasite persistence without experiencing highly detrimental side effects.
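The idea of a transfer-rate threshold can be illustrated with a deliberately simplified toy model. This is not the threshold derived in the project, whose exact form is not given here. The sketch assumes that the mean copy number n of an element changes as dn/dt = (d + h - l - s) n, where d, h, l and s are hypothetical per-copy rates of duplication, horizontal transfer, loss, and removal by selection due to the cost to the host. Under that assumption, the element persists only if h exceeds l + s - d. All function and parameter names are illustrative.

```python
import math


def net_growth_rate(transfer, duplication, loss, cost):
    """Per-copy growth rate in the assumed linear toy model: d + h - l - s."""
    for name, value in (("transfer", transfer), ("duplication", duplication),
                        ("loss", loss), ("cost", cost)):
        if value < 0:
            raise ValueError(f"{name} rate must be non-negative, got {value}")
    return duplication + transfer - loss - cost


def min_transfer_rate(duplication, loss, cost):
    """Transfer rate at which the toy growth rate is exactly zero.

    Returns 0.0 when duplication alone already balances loss and cost.
    """
    return max(0.0, loss + cost - duplication)


def persists(transfer, duplication, loss, cost):
    """True if the element's mean copy number grows in the toy model."""
    return net_growth_rate(transfer, duplication, loss, cost) > 0


def mean_copy_number(n0, time, transfer, duplication, loss, cost):
    """Closed-form solution n(t) = n0 * exp(r * t) of the linear toy model."""
    rate = net_growth_rate(transfer, duplication, loss, cost)
    return n0 * math.exp(rate * time)


if __name__ == "__main__":
    # Hypothetical rates in arbitrary units, chosen only for illustration.
    d, l, s = 0.25, 0.5, 0.125
    h_min = min_transfer_rate(d, l, s)
    print("threshold transfer rate:", h_min)
    print("persists below threshold:", persists(h_min / 2, d, l, s))
    print("persists above threshold:", persists(h_min * 2, d, l, s))
```

The toy model ignores copy-number saturation, population structure, and stochastic loss, so it only conveys the qualitative point that persistence requires transfer to offset the net effect of loss and cost.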
As a lower bound on the minimum DNA transfer rate that a cell can withstand, we consider the process of genome degradation and mutational meltdown of populations through Muller's ratchet. A numerical assessment of this hypothesis suggests that microbial populations cannot purge parasites while escaping Muller's ratchet. Thus, genetic parasites appear to be virtually inevitable in cellular organisms.
Casposons are a superfamily of putative self-synthesizing transposable elements that we discovered during our studies into the evolution of CRISPR-Cas systems. The casposons are predicted to employ a homolog of the Cas1 protein as a recombinase and could have contributed to the origin of the CRISPR-Cas adaptive immunity systems in archaea and bacteria. Casposons remain uncharacterized experimentally, except for the recent demonstration of the integrase activity of the Cas1 homolog, and, given their relative rarity in archaea and bacteria, the original comparative genomic analysis did not provide direct indications of their mobility. We obtained evidence of casposon mobility by comparing the genomes of 62 strains of the archaeon Methanosarcina mazei. In these genomes, casposons are variably inserted at three distinct sites, indicative of multiple recent gains and losses. Some casposons are inserted into other mobile genetic elements that might provide vehicles for horizontal transfer of the casposons. Additionally, many M. mazei genomes contain previously undetected solo terminal inverted repeats that apparently are derived from casposons and could resemble intermediates in CRISPR evolution. We further demonstrated the sequence specificity of casposon insertion and noted clear parallels with the adaptation mechanism of CRISPR-Cas. Finally, besides identifying additional representatives in each of the three originally defined families, we described a new, fourth family of casposons.
Taken together, these studies advance the current understanding of genome evolution in diverse life forms, in particular viruses and mobile elements, and provide new insights into the general principles of genome evolution.

Support Year: 21
Fiscal Year: 2016
Name: National Library of Medicine
Krupovic, Mart; Cvirkaite-Krupovic, Virginija; Iranzo, Jaime et al. (2018) Viruses of archaea: Structural, functional, environmental and evolutionary genomics. Virus Res 244:181-193
Yutin, Natalya; Makarova, Kira S; Gussow, Ayal B et al. (2018) Discovery of an expansive bacteriophage family that includes the most abundant viruses from the human gut. Nat Microbiol 3:38-46
He, Fei; Bhoobalan-Chitty, Yuvaraj; Van, Lan B et al. (2018) Anti-CRISPR proteins encoded by archaeal lytic viruses inhibit subtype I-D immunity. Nat Microbiol 3:461-469
Shmakov, Sergey A; Makarova, Kira S; Wolf, Yuri I et al. (2018) Systematic prediction of genes functionally linked to CRISPR-Cas systems by gene neighborhood analysis. Proc Natl Acad Sci U S A 115:E5307-E5316
Pushkarev, Alina; Inoue, Keiichi; Larom, Shirley et al. (2018) A distinct abundant group of microbial rhodopsins discovered using functional metagenomics. Nature 558:595-599
Amarasinghe, Gaya K; Aréchiga Ceballos, Nidia G; Banyard, Ashley C et al. (2018) Taxonomy of the order Mononegavirales: update 2018. Arch Virol 163:2283-2294
Yutin, Natalya; Bäckström, Disa; Ettema, Thijs J G et al. (2018) Vast diversity of prokaryotic virus genomes encoding double jelly-roll major capsid proteins uncovered by genomic and metagenomic sequence analysis. Virol J 15:67
Ferrer, Manuel; Sorokin, Dimitry Y; Wolf, Yuri I et al. (2018) Proteomic Analysis of Methanonatronarchaeum thermophilum AMET1, a Representative of a Putative New Class of Euryarchaeota, "Methanonatronarchaeia". Genes (Basel) 9:
Galperin, Michael Y; Makarova, Kira S; Wolf, Yuri I et al. (2018) Phyletic Distribution and Lineage-Specific Domain Architectures of Archaeal Two-Component Signal Transduction Systems. J Bacteriol 200:
Sorokin, Dimitry Y; Makarova, Kira S; Abbas, Ben et al. (2017) Discovery of extremely halophilic, methyl-reducing euryarchaea provides insights into the evolutionary origin of methanogenesis. Nat Microbiol 2:17081

Showing the most recent 10 out of 196 publications