The rapid accumulation of genome sequences and protein structures during the last decade has been paralleled by major advances in sequence database search methods. The powerful Position-Specific Iterating BLAST (PSI-BLAST) method developed at the NCBI formed the basis of our work on protein motif analysis. In addition, Hidden Markov Models (HMM), protein profile-against-profile comparison implemented in the HHSearch method, protein structure comparison methods and genome context analysis were extensively applied. Over the last year, we made further progress in detailed analysis of the classification, evolution, and functions of several classes of proteins and domains. Specifically, we studied in detail the protein complexes and functional systems that are involved in adaptive immunity in prokaryotes. The CRISPR-Cas adaptive immunity systems that are present in most Archaea and many Bacteria function by incorporating fragments of alien genomes into specific genomic loci, transcribing the inserts and using the transcripts as guide RNAs to destroy the genome of the cognate virus or plasmid. This RNA interference-like immune response is mediated by numerous, diverse and rapidly evolving Cas (CRISPR-associated) proteins, several of which form the Cascade complex involved in the processing of CRISPR transcripts and cleavage of the target DNA. Our comparative analysis of the Cas protein sequences and structures led to a new classification of the CRISPR-Cas systems into three distinct Types (I, II and III). A detailed comparison of the available sequences and structures of Cas proteins revealed several previously unnoticed homologous relationships. The Repeat-Associated Mysterious Proteins (RAMPs) containing a distinct form of the RNA Recognition Motif (RRM) domain, which are major components of the CRISPR-Cas systems, were classified into three large groups, Cas5, Cas6 and Cas7. Each of these groups includes many previously uncharacterized proteins now shown to adopt the RAMP structure. Evidence is presented that large subunits contained in most of the CRISPR-Cas systems could be homologous to Cas10 proteins which contain a polymerase-like Palm domain and are predicted to be enzymatically active in Type III CRISPR-Cas systems but inactivated in Type I systems. These findings, the fact that the CRISPR polymerases, RAMPs and Cas2 all contain core RRM domains, and distinct gene arrangements in the three types of CRISPR-Cas systems together provide for a simple scenario for origin and evolution of the CRISPR-Cas machinery. Under this scenario, the CRISPR-Cas system originated in thermophilic Archaea and subsequently spread horizontally among prokaryotes. Thus. unification of Cas protein families previously considered unrelated provided for substantial improvement in the classification of CRISPR-Cas systems and a reconstruction of their evolution. In addition, we have extensively characterized protein domains and predicted numerous novel protein complexes involved in other forms of antivirus defense in prokaryotes. The arms race between cellular life forms and viruses is a major driving force of evolution. A substantial fraction of bacterial and archaeal genomes is dedicated to antivirus defense. We analyzed the distribution of defense genes and typical mobilome components (such as viral and transposon genes) in bacterial and archaeal genomes, and demonstrated statistically significant clustering of antivirus defense systems and mobile genes and elements in genomic islands. The defense islands are enriched in putative operons and contain numerous over-represented gene families. A detailed sequence analysis of the proteins encoded by genes in these families shows that many of them are diverged variants of known defense system components, whereas others show features, such as characteristic operonic organization, that are suggestive of novel defense systems. Thus, genomic islands provide abundant material for experimental study of bacterial and archaeal antivirus defense. Except for the CRISPR-Cas systems, different classes of defense systems, in particular toxin-antitoxin and restriction-modification systems, show non-random clustering in defense islands. It remains unclear to what extant these associations reflect functional cooperation between different defense systems and to what extent the islands are genomic 'sinks'that accumulate diverse non-essential genes, particularly those acquired via HGT. The characteristics of defense islands resemble those of mobilome islands. Defense and mobilome genes are non-randomly associated in islands, suggesting non-adaptive evolution of the islands via a preferential attachment-like mechanism underpinned by the addictive properties of defense systems such as toxins-antitoxins and an important role of horizontal mobility in the evolution of these islands.
Showing the most recent 10 out of 117 publications