The rapidly growing database of completely sequenced genomes of bacteria, archaea and eukaryotes (over 1200 genomes available by the beginning of 2011 and many more in progress) creates both new opportunities and new challenges for genome research. Over the last year, we performed several studies that took advantage of this genomic information to establish fundamental principles of genome evolution and function. In particular, we performed a comprehensive genome analysis of a new giant virus, the mamavirus, which led to significant progress in the characterization of the gene repertoire of giant viruses. In another major project, we performed an extensive comparative genomic analysis of the diverse functional systems that protect archaea and bacteria from viruses and other forms of alien DNA. Statistical analysis of the distribution of genes involved in various forms of defense in bacterial and archaeal genomes demonstrated the existence of numerous "defense islands" that are enriched for these genes. By comparing the distributions of defense genes and mobile elements on bacterial and archaeal chromosomes, we concluded that defense islands have not evolved as a specific adaptation but rather result from preferential attachment caused by the addictive properties of toxin-antitoxin and possibly other defense systems. In addition, this research has led to the prediction of several novel defense systems and mechanisms. A new classification and a comprehensive evolutionary scenario for the CRISPR-Cas system of prokaryotic adaptive immunity have been developed. In a separate project, we applied genome-wide analysis of rare genomic changes associated with conserved amino acids (RGC_CAs) and used several independent techniques to obtain date estimates for the divergence of the major lineages of eukaryotes with calibration intervals for insects, land plants and vertebrates.
Accurate estimation of the divergence time of the extant eukaryotes is a fundamentally important but extremely difficult problem, owing primarily to gross violations of the molecular clock at long evolutionary distances and the lack of appropriate calibration points close to the date of interest. These difficulties are intrinsic to the dating of ancient divergence events and are reflected in the large discrepancies between estimates obtained with different approaches. Estimates of the age of the Last Eukaryotic Common Ancestor (LECA) vary approximately twofold, from 1,100 million years ago (Mya) to 2,300 Mya. The results of our analysis suggest an early divergence of monocot and dicot plants, approximately 340 Mya, raising the possibility of plant-insect coevolution. The divergence of bilaterian animal phyla is estimated at 400-700 Mya, a range of dates that is consistent with cladogenesis immediately preceding the Cambrian explosion. The origin of opisthokonts (the supergroup of eukaryotes that includes metazoa and fungi) is estimated at 700-1,000 Mya, and the age of LECA at 1,000-1,300 Mya. We separately analyzed the red algal calibration interval, which is based on a single fossil. This analysis produced time estimates that were systematically older than the other estimates. Nevertheless, the majority of the estimates for the age of the LECA using the red algal data fell within the 1,200-1,400 Mya interval. The inference of a "young LECA" is compatible with the latest of the previously estimated dates and has substantial biological implications. If these estimates are valid, the approximately 1 to 1.4 billion years of evolution of eukaryotes that is open to comparative-genomic study was probably preceded by hundreds of millions of years of evolution that might have included extinct diversity inaccessible to comparative approaches.
In a separate, theoretical study, we investigated a fundamental problem in the origin of life: what evolutionary factors could have driven the emergence of DNA as a dedicated molecule for the storage and transmission of genetic information? The division of labor between template and catalyst is a fundamental property of all living systems: DNA stores genetic information whereas proteins function as catalysts. The RNA world hypothesis, however, posits that, at the earlier stages of evolution, RNA acted as both template and catalyst. Why would such a division of labor evolve in the RNA world? We investigated the evolution of DNA-like molecules, i.e. molecules that can function only as templates, in minimal computational models of RNA replicator systems. In the models, RNA can function as both a template-directed polymerase and a template, whereas DNA can function only as a template. Two classes of models were explored. In the surface models, replicators are attached to surfaces with finite diffusion. In the compartment models, replicators are compartmentalized by vesicle-like boundaries. Both classes of models displayed the evolution of DNA and the ensuing division of labor between templates and catalysts. In the surface model, DNA provides the advantage of greater resistance against parasitic templates. However, this advantage is at least partially offset by the disadvantage of slower multiplication due to the increased complexity of the replication cycle. In the compartment model, DNA can significantly delay the intra-compartment evolution of RNA towards catalytic deterioration. These results are explained in terms of the trade-off between template and catalyst that is inherent in RNA-only replication cycles: DNA releases RNA from this trade-off by making it unnecessary for RNA to serve as a template, thereby rendering the system more resistant against evolving parasitism.
Our analysis of these simple models suggests that the lack of catalytic activity in DNA by itself can generate a sufficient selective advantage for RNA replicator systems to produce DNA. Given the widespread notion that DNA evolved owing to its superior chemical properties as a template, this study offers a novel insight into the evolutionary origin of DNA.
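The template-catalyst trade-off described above can be illustrated with a deliberately minimal sketch. This is not the surface or compartment model used in the study; it is a hypothetical two-type replicator update inside a single compartment, and the function name, the parameter `template_adv` and all numeric values are assumptions chosen only for illustration. A catalytic variant competes with a non-catalytic variant, and the non-catalytic variant is assumed to be copied `template_adv` times more efficiently. In an RNA-only cycle this advantage is assumed to exceed 1 (losing catalysis makes a better template); with a DNA-like dedicated template the advantage is assumed to vanish.

```python
def catalyst_fraction(x0, template_adv, generations):
    """Track the fraction of catalyst-encoding templates in one compartment.

    x0            -- initial fraction of the catalytic variant (0 < x0 < 1)
    template_adv  -- assumed relative copying efficiency of the
                     non-catalytic (parasitic) variant; 1.0 = no advantage
    generations   -- number of discrete replication rounds
    """
    x = x0
    history = [x]
    for _ in range(generations):
        # Discrete replicator update: each variant is copied in proportion
        # to its frequency times its efficiency as a template.
        x = x / (x + (1.0 - x) * template_adv)
        history.append(x)
    return history


# RNA-only cycle: hypothetical 20% template advantage for non-catalysts.
rna_only = catalyst_fraction(0.99, template_adv=1.2, generations=50)

# DNA-like templates: hypothetical case with no template advantage.
with_dna = catalyst_fraction(0.99, template_adv=1.0, generations=50)

print("RNA-only, final catalyst fraction:", rna_only[-1])
print("With DNA, final catalyst fraction:", with_dna[-1])
```

In this toy setting the catalytic variant is progressively displaced in the RNA-only case and is simply maintained when the template advantage is removed. The study reports a delay of catalytic deterioration rather than its complete prevention, so the sketch should be read only as an illustration of the direction of the effect.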

National Library of Medicine
Sorokin, Dimitry Y; Makarova, Kira S; Abbas, Ben et al. (2017) Discovery of extremely halophilic, methyl-reducing euryarchaea provides insights into the evolutionary origin of methanogenesis. Nat Microbiol 2:17081
Koonin, Eugene V; Makarova, Kira S; Wolf, Yuri I (2017) Evolutionary Genomics of Defense Systems in Archaea and Bacteria. Annu Rev Microbiol :
Koonin, Eugene V; Krupovic, Mart (2017) Polintons, virophages and transpovirons: a tangled web linking viruses, transposons and immunity. Curr Opin Virol 25:7-15
Koonin, Eugene V; Makarova, Kira S; Zhang, Feng (2017) Diversity, classification and evolution of CRISPR-Cas systems. Curr Opin Microbiol 37:67-78
Shmakov, Sergey; Smargon, Aaron; Scott, David et al. (2017) Diversity and evolution of class 2 CRISPR-Cas systems. Nat Rev Microbiol 15:169-182
Krupovic, Mart; Béguin, Pierre; Koonin, Eugene V (2017) Casposons: mobile genetic elements that gave rise to the CRISPR-Cas adaptation machinery. Curr Opin Microbiol 38:36-43
Koonin, Eugene V (2017) Evolution of RNA- and DNA-guided antivirus defense systems in prokaryotes and eukaryotes: common ancestry vs convergence. Biol Direct 12:5
Iranzo, Jaime; Krupovic, Mart; Koonin, Eugene V (2016) The Double-Stranded DNA Virosphere as a Modular Hierarchical Network of Gene Sharing. MBio 7:
Lobkovsky, Alexander E; Wolf, Yuri I; Koonin, Eugene V (2016) Evolvability of an Optimal Recombination Rate. Genome Biol Evol 8:70-7
Smith, Richard H; Hallwirth, Claus V; Westerman, Michael et al. (2016) Germline viral "fossils" guide in silico reconstruction of a mid-Cenozoic era marsupial adeno-associated virus. Sci Rep 6:28965

Showing the most recent 10 out of 183 publications