The rapid accumulation of genome sequences and protein structures during the last decade has been paralleled by major advances in sequence database search methods. The powerful Position-Specific Iterating BLAST (PSI-BLAST) method developed at the NCBI formed the basis of our work on protein motif analysis. In addition, Hidden Markov Models (HMM), protein profile-against-profile comparison implemented in the HHSearch method, protein structure comparison methods and genome context analysis were extensively applied. Over the last year, we made further progress in detailed analysis of the classification, evolution, and functions of several classes of proteins and domains. Specifically, we studied the evolution and functions of protein domains that are involved in virus-host interactions, from both the host and the virus sides. The major role of enzymatic toxins that target nucleic acids in biological conflicts at all levels has become increasingly apparent thanks in large part to the advances of comparative genomics. Typically, toxins evolve rapidly hampering the identification of these proteins by sequence analysis. Here we analyze an unexpectedly widespread superfamily of toxin domains most of which possess RNase activity. The HEPN superfamily is comprised of all α-helical domains that were first identified as being associated with DNA polymerase β-type nucleotidyltransferases in prokaryotes and animal Sacsin proteins. Using sensitive sequence and structure comparison methods, we vastly extend the HEPN superfamily by identifying numerous novel families and by detecting diverged HEPN domains in several known protein families. The new HEPN families include the RNase LS and LsoA catalytic domains, KEN domains (e.g. RNaseL and Ire1) and the RNase domains of RloC and PrrC. The majority of HEPN domains contain conserved motifs that constitute a metal-independent endoRNase active site. Some HEPN domains lacking this motif probably function as non-catalytic RNA-binding domains, such as in the case of the mannitol repressor MtlR. Our analysis shows that HEPN domains function as toxins that are shared by numerous systems implicated in intra-genomic, inter-genomic and intra-organismal conflicts across the three domains of cellular life. In prokaryotes HEPN domains are essential components of numerous toxin-antitoxin (TA) and abortive infection (Abi) systems and in addition are tightly associated with many restriction-modification (R-M) and CRISPR-Cas systems, and occasionally with other defense systems such as Pgl and Ter. We present evidence of multiple modes of action of HEPN domains in these systems, which include direct attack on viral RNAs (e.g. LsoA and RNase LS) in conjunction with other RNase domains (e.g. a novel RNase H fold domain, NamA), suicidal or dormancy-inducing attack on self RNAs (RM systems and possibly CRISPR-Cas systems), and suicidal attack coupled with direct interaction with phage components (Abi systems). These findings are compatible with the hypothesis on coupling of pathogen-targeting (immunity) and self-directed (programmed cell death and dormancy induction) responses in the evolution of robust antiviral strategies. We propose that altruistic cell suicide mediated by HEPN domains and other functionally similar RNases was essential for the evolution of kin and group selection and cell cooperation. HEPN domains were repeatedly acquired by eukaryotes and incorporated into several core functions such as endonucleolytic processing of the 5.8S-25S/28S rRNA precursor (Las1), a novel ER membrane-associated RNA degradation system (C6orf70), sensing of unprocessed transcripts at the nuclear periphery (Swt1). Multiple lines of evidence suggest that, similar to prokaryotes, HEPN proteins were recruited to antiviral, antitransposon, apoptotic systems or RNA-level response to unfolded proteins (Sacsin and KEN domains) in several groups of eukaryotes. Extensive sequence and structure comparisons reveal unexpectedly broad presence of the HEPN domain in an enormous variety of defense and stress response systems across the tree of life. In addition, HEPN domains have been recruited to perform essential functions, in particular in eukaryotic rRNA processing. These findings are expected to stimulate experiments that could shed light on diverse cellular processes across the three domains of life. We also performed a comprehensive comparative genomic analysis of proteins and domain that are involved in antivirus defense in prokaryotes. Our knowledge of prokaryotic defense systems has vastly expanded as the result of comparative genomic analysis, followed by experimental validation. This expansion is both quantitative, including the discovery of diverse new examples of known types of defense systems, such as restriction-modification or toxin-antitoxin systems, and qualitative, including the discovery of fundamentally new defense mechanisms, such as the CRISPR-Cas immunity system. Large-scale statistical analysis reveals that the distribution of different defense systems in bacterial and archaeal taxa is non-uniform, with four groups of organisms distinguishable with respect to the overall abundance and the balance between specific types of defense systems. The genes encoding defense system components in bacterial and archaea typically cluster in defense islands. In addition to genes encoding known defense systems, these islands contain numerous uncharacterized genes, which are candidates for new types of defense systems. The tight association of the genes encoding immunity systems and dormancy- or cell death-inducing defense systems in prokaryotic genomes suggests that these two major types of defense are functionally coupled, providing for effective protection at the population level. Jointly, these ongoing studies provide a new perspective on the remarkable diversity of protein domains involved in virus-host interactions.

Project Start
Project End
Budget Start
Budget End
Support Year
20
Fiscal Year
2013
Total Cost
$285,594
Indirect Cost
Name
National Library of Medicine
Department
Type
DUNS #
City
State
Country
Zip Code
Krupovic, Mart; Cvirkaite-Krupovic, Virginija; Iranzo, Jaime et al. (2018) Viruses of archaea: Structural, functional, environmental and evolutionary genomics. Virus Res 244:181-193
Yutin, Natalya; Makarova, Kira S; Gussow, Ayal B et al. (2018) Discovery of an expansive bacteriophage family that includes the most abundant viruses from the human gut. Nat Microbiol 3:38-46
He, Fei; Bhoobalan-Chitty, Yuvaraj; Van, Lan B et al. (2018) Anti-CRISPR proteins encoded by archaeal lytic viruses inhibit subtype I-D immunity. Nat Microbiol 3:461-469
Shmakov, Sergey A; Makarova, Kira S; Wolf, Yuri I et al. (2018) Systematic prediction of genes functionally linked to CRISPR-Cas systems by gene neighborhood analysis. Proc Natl Acad Sci U S A 115:E5307-E5316
Pushkarev, Alina; Inoue, Keiichi; Larom, Shirley et al. (2018) A distinct abundant group of microbial rhodopsins discovered using functional metagenomics. Nature 558:595-599
Yutin, Natalya; Bäckström, Disa; Ettema, Thijs J G et al. (2018) Vast diversity of prokaryotic virus genomes encoding double jelly-roll major capsid proteins uncovered by genomic and metagenomic sequence analysis. Virol J 15:67
Ferrer, Manuel; Sorokin, Dimitry Y; Wolf, Yuri I et al. (2018) Proteomic Analysis of Methanonatronarchaeum thermophilum AMET1, a Representative of a Putative New Class of Euryarchaeota, ""Methanonatronarchaeia"". Genes (Basel) 9:
Koonin, Eugene V; Makarova, Kira S (2018) Discovery of Oligonucleotide Signaling Mediated by CRISPR-Associated Polymerases Solves Two Puzzles but Leaves an Enigma. ACS Chem Biol 13:309-312
Galperin, Michael Y; Makarova, Kira S; Wolf, Yuri I et al. (2018) Phyletic Distribution and Lineage-Specific Domain Architectures of Archaeal Two-Component Signal Transduction Systems. J Bacteriol 200:
Koonin, Eugene V; Makarova, Kira S; Wolf, Yuri I (2017) Evolutionary Genomics of Defense Systems in Archaea and Bacteria. Annu Rev Microbiol 71:233-261

Showing the most recent 10 out of 117 publications