The rapid accumulation of genome sequences and protein structures during the last decade has been paralleled by major advances in sequence database search methods. The powerful Position-Specific Iterated BLAST (PSI-BLAST) method developed at the NCBI formed the basis of our work on protein motif analysis. In addition, Hidden Markov Models (HMMs), protein profile-against-profile comparison implemented in the HHsearch method, protein structure comparison methods, and genome context analysis were extensively applied. Over the last year, we made further progress in detailed analysis of the classification, evolution, and functions of several classes of proteins and domains. Specifically, we studied in detail the protein complexes and functional systems that are involved in adaptive immunity in prokaryotes. The CRISPR-Cas adaptive immunity systems that are present in most Archaea and many Bacteria function by incorporating fragments of alien genomes into specific genomic loci, transcribing the inserts, and using the transcripts as guide RNAs to destroy the genome of the cognate virus or plasmid. This RNA interference-like immune response is mediated by numerous, diverse, and rapidly evolving Cas (CRISPR-associated) proteins, several of which form the Cascade complex involved in the processing of CRISPR transcripts and cleavage of the target DNA. Our comparative analysis of the Cas protein sequences and structures led to a new classification of the CRISPR-Cas systems into three distinct Types (I, II and III). A detailed comparison of the available sequences and structures of Cas proteins revealed several previously unnoticed homologous relationships. The Repeat-Associated Mysterious Proteins (RAMPs), which contain a distinct form of the RNA Recognition Motif (RRM) domain and are major components of the CRISPR-Cas systems, were classified into three large groups: Cas5, Cas6 and Cas7.
Each of these groups includes many previously uncharacterized proteins now shown to adopt the RAMP structure. Evidence is presented that the large subunits contained in most of the CRISPR-Cas systems could be homologous to Cas10 proteins, which contain a polymerase-like Palm domain and are predicted to be enzymatically active in Type III CRISPR-Cas systems but inactivated in Type I systems. These findings, the fact that the CRISPR polymerases, RAMPs and Cas2 all contain core RRM domains, and the distinct gene arrangements in the three types of CRISPR-Cas systems together provide for a simple scenario for the origin and evolution of the CRISPR-Cas machinery. Under this scenario, the CRISPR-Cas system originated in thermophilic Archaea and subsequently spread horizontally among prokaryotes. Thus, unification of Cas protein families previously considered unrelated provided for substantial improvement in the classification of CRISPR-Cas systems and a reconstruction of their evolution. In addition, we have extensively characterized protein domains and predicted numerous novel protein complexes involved in other forms of antivirus defense in prokaryotes. The arms race between cellular life forms and viruses is a major driving force of evolution. A substantial fraction of bacterial and archaeal genomes is dedicated to antivirus defense. We analyzed the distribution of defense genes and typical mobilome components (such as viral and transposon genes) in bacterial and archaeal genomes, and demonstrated statistically significant clustering of antivirus defense systems and mobile genes and elements in genomic islands. The defense islands are enriched in putative operons and contain numerous over-represented gene families.
A detailed sequence analysis of the proteins encoded by genes in these families shows that many of them are diverged variants of known defense system components, whereas others show features, such as characteristic operonic organization, that are suggestive of novel defense systems. Thus, genomic islands provide abundant material for experimental study of bacterial and archaeal antivirus defense. Except for the CRISPR-Cas systems, different classes of defense systems, in particular toxin-antitoxin and restriction-modification systems, show non-random clustering in defense islands. It remains unclear to what extent these associations reflect functional cooperation between different defense systems and to what extent the islands are genomic 'sinks' that accumulate diverse non-essential genes, particularly those acquired via horizontal gene transfer (HGT). The characteristics of defense islands resemble those of mobilome islands. Defense and mobilome genes are non-randomly associated in islands, suggesting non-adaptive evolution of the islands via a preferential attachment-like mechanism underpinned by the addictive properties of defense systems such as toxins-antitoxins and an important role of horizontal mobility in the evolution of these islands.
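
The abstract does not describe how the significance of the defense gene clustering was assessed. As an illustration only, one common way to ask whether flagged genes sit next to each other more often than chance would allow is a label permutation test. The sketch below is not the project's procedure; the function names, the adjacency statistic, and the toy genome are all assumptions made for this example.

```python
import random


def adjacent_pairs(flags):
    """Count neighbouring gene pairs in which both genes are flagged."""
    return sum(1 for a, b in zip(flags, flags[1:]) if a and b)


def clustering_p_value(flags, n_perm=2000, seed=0):
    """One-sided permutation p-value for clustering of flagged genes.

    flags: list of bools in chromosomal gene order (True = defense gene).
    Returns (observed statistic, estimated p-value).
    """
    rng = random.Random(seed)
    observed = adjacent_pairs(flags)
    shuffled = list(flags)
    hits = 0
    for _ in range(n_perm):
        # Shuffling keeps the number of flagged genes but destroys their order.
        rng.shuffle(shuffled)
        if adjacent_pairs(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy genome: 200 genes, 20 defense genes in two "islands".
toy_genome = [False] * 200
for i in list(range(30, 40)) + list(range(120, 130)):
    toy_genome[i] = True

# Hypothetical control: the same 20 defense genes spread evenly.
even_genome = [i % 10 == 0 for i in range(200)]

if __name__ == "__main__":
    print(clustering_p_value(toy_genome))
    print(clustering_p_value(even_genome))
```

A real analysis would need to account for genome circularity, operon structure, and multiple testing across genomes, none of which this sketch attempts.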

National Library of Medicine
Koonin, Eugene V (2017) Evolution of RNA- and DNA-guided antivirus defense systems in prokaryotes and eukaryotes: common ancestry vs convergence. Biol Direct 12:5
Smargon, Aaron A; Cox, David B T; Pyzocha, Neena K et al. (2017) Cas13b Is a Type VI-B CRISPR-Associated RNA-Guided RNase Differentially Regulated by Accessory Proteins Csx27 and Csx28. Mol Cell 65:618-630.e7
Koonin, Eugene V; Makarova, Kira S; Wolf, Yuri I (2017) Evolutionary Genomics of Defense Systems in Archaea and Bacteria. Annu Rev Microbiol :
Krupovic, Mart; Koonin, Eugene V (2017) Homologous Capsid Proteins Testify to the Common Ancestry of Retroviruses, Caulimoviruses, Pseudoviruses, and Metaviruses. J Virol 91:
Koonin, Eugene V; Makarova, Kira S; Zhang, Feng (2017) Diversity, classification and evolution of CRISPR-Cas systems. Curr Opin Microbiol 37:67-78
Shmakov, Sergey; Smargon, Aaron; Scott, David et al. (2017) Diversity and evolution of class 2 CRISPR-Cas systems. Nat Rev Microbiol 15:169-182
Mekhedov, Sergei L; Makarova, Kira S; Koonin, Eugene V (2017) The complex domain architecture of SAMD9 family proteins, predicted STAND-like NTPases, suggests new links to inflammation and apoptosis. Biol Direct 12:13
Krupovic, Mart; Koonin, Eugene V (2017) Multiple origins of viral capsid proteins from cellular ancestors. Proc Natl Acad Sci U S A 114:E2401-E2410
Faure, Guilhem; Ogurtsov, Aleksey Y; Shabalina, Svetlana A et al. (2016) Role of mRNA structure in the control of protein folding. Nucleic Acids Res :
Kapitonov, Vladimir V; Makarova, Kira S; Koonin, Eugene V (2016) ISC, a Novel Group of Bacterial and Archaeal DNA Transposons That Encode Cas9 Homologs. J Bacteriol 198:797-807

Showing the most recent 10 out of 105 publications