The rapid accumulation of genome sequences and protein structures during the last decade has been paralleled by major advances in sequence database search methods. The powerful Position-Specific Iterated BLAST (PSI-BLAST) method developed at the NCBI formed the basis of our work on protein motif analysis. In addition, Hidden Markov Models (HMMs), protein profile-against-profile comparison implemented in the HHsearch method, protein structure comparison methods and genome context analysis were extensively applied. Over the last year, we made further progress in detailed analysis of the classification, evolution, and functions of several classes of proteins and domains. Specifically, we studied in detail the protein complexes and functional systems that are involved in adaptive immunity in prokaryotes. The CRISPR-Cas adaptive immunity systems that are present in most Archaea and many Bacteria function by incorporating fragments of alien genomes into specific genomic loci, transcribing the inserts and using the transcripts as guide RNAs to destroy the genome of the cognate virus or plasmid. This RNA interference-like immune response is mediated by numerous, diverse and rapidly evolving Cas (CRISPR-associated) proteins, several of which form the Cascade complex involved in the processing of CRISPR transcripts and cleavage of the target DNA. Our comparative analysis of the Cas protein sequences and structures led to a new classification of the CRISPR-Cas systems into three distinct Types (I, II and III). A detailed comparison of the available sequences and structures of Cas proteins revealed several previously unnoticed homologous relationships. The Repeat-Associated Mysterious Proteins (RAMPs), which contain a distinct form of the RNA Recognition Motif (RRM) domain and are major components of the CRISPR-Cas systems, were classified into three large groups, Cas5, Cas6 and Cas7. 
Each of these groups includes many previously uncharacterized proteins now shown to adopt the RAMP structure. Evidence is presented that large subunits contained in most of the CRISPR-Cas systems could be homologous to Cas10 proteins, which contain a polymerase-like Palm domain and are predicted to be enzymatically active in Type III CRISPR-Cas systems but inactivated in Type I systems. These findings, the fact that the CRISPR polymerases, RAMPs and Cas2 all contain core RRM domains, and the distinct gene arrangements in the three types of CRISPR-Cas systems together suggest a simple scenario for the origin and evolution of the CRISPR-Cas machinery. Under this scenario, the CRISPR-Cas system originated in thermophilic Archaea and subsequently spread horizontally among prokaryotes. Thus, unification of Cas protein families previously considered unrelated substantially improved the classification of CRISPR-Cas systems and enabled a reconstruction of their evolution. In addition, we have extensively characterized protein domains and predicted numerous novel protein complexes involved in other forms of antivirus defense in prokaryotes. The arms race between cellular life forms and viruses is a major driving force of evolution. A substantial fraction of bacterial and archaeal genomes is dedicated to antivirus defense. We analyzed the distribution of defense genes and typical mobilome components (such as viral and transposon genes) in bacterial and archaeal genomes, and demonstrated statistically significant clustering of antivirus defense systems and mobile genes and elements in genomic islands. The defense islands are enriched in putative operons and contain numerous over-represented gene families. 
A detailed sequence analysis of the proteins encoded by genes in these families shows that many of them are diverged variants of known defense system components, whereas others show features, such as characteristic operonic organization, that are suggestive of novel defense systems. Thus, genomic islands provide abundant material for experimental study of bacterial and archaeal antivirus defense. Except for the CRISPR-Cas systems, different classes of defense systems, in particular toxin-antitoxin and restriction-modification systems, show non-random clustering in defense islands. It remains unclear to what extent these associations reflect functional cooperation between different defense systems and to what extent the islands are genomic 'sinks' that accumulate diverse non-essential genes, particularly those acquired via horizontal gene transfer (HGT). The characteristics of defense islands resemble those of mobilome islands. Defense and mobilome genes are non-randomly associated in islands, suggesting non-adaptive evolution of the islands via a preferential attachment-like mechanism underpinned by the addictive properties of defense systems such as toxin-antitoxin systems, and an important role of horizontal mobility in the evolution of these islands.
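
The claim of statistically significant clustering of defense genes can be illustrated with a simple permutation test. The sketch below is not the procedure used in this project, which the summary does not specify. It assumes a genome represented as an ordered list of genes, each flagged as defense-related or not, and it uses the number of adjacent defense-gene pairs as a hypothetical clustering statistic. The function names, the toy genome and the choice of statistic are all illustrative assumptions.

```python
import random


def count_adjacent_pairs(flags):
    """Count neighbouring gene pairs in which both genes are flagged."""
    return sum(1 for a, b in zip(flags, flags[1:]) if a and b)


def clustering_p_value(flags, n_permutations=2000, seed=0):
    """Permutation test for clustering of flagged genes along a genome.

    The gene order is shuffled repeatedly to build a null distribution of
    the adjacency count. Returns the observed count and an empirical
    one-sided p-value (with the usual +1 correction).
    Assumption: the genome is treated as linear, not circular.
    """
    rng = random.Random(seed)
    observed = count_adjacent_pairs(flags)
    shuffled = list(flags)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if count_adjacent_pairs(shuffled) >= observed:
            at_least_as_extreme += 1
    return observed, (at_least_as_extreme + 1) / (n_permutations + 1)


# Toy genome (hypothetical): 200 genes, 20 defense genes in two islands of 10.
toy_flags = [False] * 200
for island_start in (30, 120):
    toy_flags[island_start:island_start + 10] = [True] * 10

if __name__ == "__main__":
    observed, p_value = clustering_p_value(toy_flags)
    print(f"adjacent defense pairs: {observed}, empirical p-value: {p_value:.4f}")
```

A real analysis would need to account for operon structure, genome circularity and multiple replicons, and would likely use a window-based or distance-based statistic instead of simple adjacency.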

National Library of Medicine
Yutin, Natalya; Makarova, Kira S; Gussow, Ayal B et al. (2018) Discovery of an expansive bacteriophage family that includes the most abundant viruses from the human gut. Nat Microbiol 3:38-46
He, Fei; Bhoobalan-Chitty, Yuvaraj; Van, Lan B et al. (2018) Anti-CRISPR proteins encoded by archaeal lytic viruses inhibit subtype I-D immunity. Nat Microbiol 3:461-469
Shmakov, Sergey A; Makarova, Kira S; Wolf, Yuri I et al. (2018) Systematic prediction of genes functionally linked to CRISPR-Cas systems by gene neighborhood analysis. Proc Natl Acad Sci U S A 115:E5307-E5316
Pushkarev, Alina; Inoue, Keiichi; Larom, Shirley et al. (2018) A distinct abundant group of microbial rhodopsins discovered using functional metagenomics. Nature 558:595-599
Yutin, Natalya; Bäckström, Disa; Ettema, Thijs J G et al. (2018) Vast diversity of prokaryotic virus genomes encoding double jelly-roll major capsid proteins uncovered by genomic and metagenomic sequence analysis. Virol J 15:67
Ferrer, Manuel; Sorokin, Dimitry Y; Wolf, Yuri I et al. (2018) Proteomic Analysis of Methanonatronarchaeum thermophilum AMET1, a Representative of a Putative New Class of Euryarchaeota, "Methanonatronarchaeia". Genes (Basel) 9:
Koonin, Eugene V; Makarova, Kira S (2018) Discovery of Oligonucleotide Signaling Mediated by CRISPR-Associated Polymerases Solves Two Puzzles but Leaves an Enigma. ACS Chem Biol 13:309-312
Galperin, Michael Y; Makarova, Kira S; Wolf, Yuri I et al. (2018) Phyletic Distribution and Lineage-Specific Domain Architectures of Archaeal Two-Component Signal Transduction Systems. J Bacteriol 200:
Krupovic, Mart; Cvirkaite-Krupovic, Virginija; Iranzo, Jaime et al. (2018) Viruses of archaea: Structural, functional, environmental and evolutionary genomics. Virus Res 244:181-193
Smargon, Aaron A; Cox, David B T; Pyzocha, Neena K et al. (2017) Cas13b Is a Type VI-B CRISPR-Associated RNA-Guided RNase Differentially Regulated by Accessory Proteins Csx27 and Csx28. Mol Cell 65:618-630.e7

Showing the most recent 10 out of 117 publications