The rapidly growing database of completely and nearly completely sequenced genomes of bacteria, archaea, eukaryotes and viruses (several thousand genomes already available and many more in progress) creates both extensive new opportunities and major new challenges for genome research. During the last year, we performed a variety of studies that took advantage of this genomic information to establish fundamental principles of genome evolution. We worked toward a general theory of genome evolution, starting from ideas and models of condensed matter physics. Biological systems attain a hierarchical complexity that has no counterpart outside biology, yet biological entities undoubtedly obey the fundamental laws of physics. Can today's physics provide an explanatory framework for understanding the evolution of biological complexity? We argue that the physical foundation for understanding the origin and evolution of complexity can be gleaned at the interface between the theory of frustrated states, which result in pattern formation in glass-like media, and the theory of self-organized criticality (SOC). Frustration arises when competing interactions cannot all be satisfied simultaneously. On the one hand, SOC has been shown to emerge in spin-glass systems of high dimensionality. On the other hand, SOC is often viewed as the most appropriate physical description of evolutionary transitions in biology. We unify these two faces of SOC by showing that the emergence of complex features in biological evolution is typically, if not always, triggered by frustration caused by competing interactions at different organizational levels. Such competing interactions lead to SOC, which represents the optimal conditions for the emergence of complexity. Competing interactions and frustrated states permeate biology at all organizational levels and are tightly linked to the ubiquitous competition for limiting resources.
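Frustration is easiest to see in the smallest possible case: three Ising spins on a triangle with antiferromagnetic couplings, where every bond favors anti-aligned neighbors. The sketch below is a textbook toy model, not a model taken from these studies; the coupling value and energy convention are the usual Ising choices, assumed here for illustration. Exhaustive enumeration shows that no configuration satisfies all three bonds.

```python
from itertools import product

# Toy model (assumed for illustration): three Ising spins on a triangle.
# J < 0 is antiferromagnetic, so each bond prefers anti-aligned spins.
J = -1
BONDS = [(0, 1), (1, 2), (0, 2)]


def energy(spins):
    """Ising energy H = -J * sum over bonds of s_i * s_j."""
    return -J * sum(spins[i] * spins[j] for i, j in BONDS)


def unsatisfied_bonds(spins):
    """Count bonds whose two spins are aligned, i.e. bonds left unsatisfied."""
    return sum(spins[i] == spins[j] for i, j in BONDS)


# Enumerate all 2**3 spin configurations and keep the lowest-energy ones.
configs = list(product((-1, 1), repeat=3))
ground_energy = min(energy(s) for s in configs)
ground_states = [s for s in configs if energy(s) == ground_energy]

print(ground_energy, len(ground_states))              # -1 6
print({unsatisfied_bonds(s) for s in ground_states})  # {1}
```

Every ground state leaves exactly one bond unsatisfied, and because that bond can sit on any edge, the ground state is sixfold degenerate: the system has several equally good compromises rather than a single optimum, which is a key ingredient of spin-glass behavior.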
This perspective extends from the comparatively simple phenomena occurring in glasses to large-scale events of biological evolution, such as the major evolutionary transitions. Frustration caused by competing interactions in multidimensional systems could be the general driving force behind the emergence of complexity, within and beyond the domain of biology.
On a different front, we applied network-analysis methods to cancer genomics. Cancer genomics has produced extensive information on cancer-associated genes, but the number and specificity of cancer-driver mutations remain a matter of debate. We constructed a bipartite network in which 7,665 tumors from 30 cancer types are connected via shared mutations in 198 previously identified cancer genes. About 27% of the tumors can be assigned to statistically supported modules, most of which encompass one or two cancer types. The remaining tumors belong to a diffuse network component, suggesting lower gene specificity of driver mutations. We used linear regression of the mutational loads in cancer genes to estimate the number of drivers required for the onset of different cancers. The mean number of drivers in known cancer genes is approximately two, with a range of one to five. Cancers associated with modules have more drivers than those in the diffuse network component, suggesting that the latter harbor unidentified and/or interchangeable drivers.
We also studied the selection processes that shape the evolution of genetic signals such as the start and stop codons of protein-coding genes. The modes of stop codon evolution in protein-coding genes, especially the conservation of UAA, have been debated for many years. We reconstructed the evolution of stop codons in 40 groups of closely related prokaryotic and eukaryotic genomes. The results indicate that UAA codons are maintained by purifying selection in all domains of life.
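Reconstructing stop codon evolution amounts to inferring ancestral stop codons at the internal nodes of a tree and counting switches along its branches. As a minimal illustration, the sketch below applies Fitch's small-parsimony algorithm, a standard textbook technique swapped in here; the studies themselves may have used a different (for example, likelihood-based) reconstruction. The four-genome tree, the genome names and the tip states are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple, Union

# Fitch small parsimony: a textbook reconstruction method used here for
# illustration only, not necessarily the method used in these studies.
# A rooted binary tree is either a leaf name or a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, tip_states: Dict[str, str]) -> Tuple[FrozenSet[str], int]:
    """Return the most-parsimonious state set at the root of `tree`
    and the minimum number of state changes needed on the tree."""
    if isinstance(tree, str):                      # leaf: its observed stop codon
        return frozenset({tip_states[tree]}), 0
    left, right = tree
    left_set, left_cost = fitch(left, tip_states)
    right_set, right_cost = fitch(right, tip_states)
    shared = left_set & right_set
    if shared:                                     # children agree: no extra change
        return shared, left_cost + right_cost
    # Children disagree: keep both candidates and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical group of four closely related genomes and the stop codon
# of one orthologous gene in each.
tree = (("genome_1", "genome_2"), ("genome_3", "genome_4"))
stops = {"genome_1": "UAA", "genome_2": "UAA", "genome_3": "UAG", "genome_4": "UAA"}

root_states, changes = fitch(tree, stops)
print(sorted(root_states), changes)  # ['UAA'] 1
```

Summed over many genes and genome groups, such switch counts are the raw material for asking whether a codon is retained more often (purifying selection) or replaced more often (positive selection) than expected under a neutral model; that statistical comparison is not shown here.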
By contrast with UAA, switches from UAG to other stop codons appear to be driven by positive selection in prokaryotes, but not in eukaryotes. Changes in stop codons are significantly associated with increased substitution frequency immediately downstream of the stop codon. These positions are otherwise more strongly conserved than sites farther downstream, suggesting that such substitutions are compensatory. Although GC content has a major impact on stop codon frequencies, its contribution to the decreased frequency of UAA differs between bacteria and archaea, presumably due to differences in their translation termination mechanisms.
We also continued research into the comparative genomics of CRISPR-Cas systems, in particular the evolution of the RNA molecules required for the function of Cas9 and related CRISPR effectors. Trans-activating CRISPR (tracr) RNA is a distinct RNA species that interacts with the CRISPR (cr) RNA to form the dual guide (g) RNA in type II and subtype V-B CRISPR-Cas systems. The tracrRNA-crRNA interaction is essential for pre-crRNA processing as well as for target recognition and cleavage. The tracrRNA consists of an antirepeat, which forms an imperfect hybrid with the repeat in the crRNA, and a distal region containing a Rho-independent terminator. An exhaustive comparative analysis of the sequences and predicted structures of Class 2 CRISPR guide RNAs shows that all these guide RNAs share distinct structural features, in particular the nexus stem-loop that separates the repeat-antirepeat hybrid from the distal portion of the tracrRNA, and the conserved GU pair at that end of the hybrid. These structural constraints might ensure full exposure of the spacer for target recognition. Reconstruction of tracrRNA evolution in four tight bacterial groups reveals random drift of repeat-antirepeat complementarity within a window of hybrid stability that is apparently maintained by selection.
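Repeat-antirepeat complementarity can be approximated very crudely by aligning the two strands antiparallel without gaps and counting positions that can pair, allowing G-U wobble pairs alongside Watson-Crick pairs. This is a simplifying assumption for illustration only: real repeat-antirepeat hybrids contain bulges and internal loops, and hybrid stability is properly estimated as a free energy with thermodynamic tools such as ViennaRNA's RNAcofold. The helper names, the 12-nt sequences and the window bounds (`min_frac`, `max_frac`) below are hypothetical.

```python
# Crude ungapped proxy for hybrid stability (illustrative assumption,
# not a thermodynamic model). Allowed pairs: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def to_rna(seq: str) -> str:
    """Normalize to uppercase RNA (T -> U)."""
    return seq.upper().replace("T", "U")


def paired_positions(repeat: str, antirepeat: str) -> int:
    """Count pairing positions with the strands aligned antiparallel and ungapped.

    The 5' end of the repeat faces the 3' end of the antirepeat; if the lengths
    differ, the unmatched 3' part of the repeat or 5' part of the antirepeat is ignored.
    """
    repeat, antirepeat = to_rna(repeat), to_rna(antirepeat)
    return sum((a, b) in PAIRS for a, b in zip(repeat, reversed(antirepeat)))


def within_window(repeat: str, antirepeat: str,
                  min_frac: float = 0.6, max_frac: float = 1.0) -> bool:
    """Hypothetical stability window on the fraction of paired positions."""
    n = min(len(repeat), len(antirepeat))
    return min_frac <= paired_positions(repeat, antirepeat) / n <= max_frac


repeat = "GUUUUAGAGCUA"                           # hypothetical repeat fragment
print(paired_positions(repeat, "UAGCUCUAAAAC"))  # 12: perfect complement
print(paired_positions(repeat, "UAGCUCUAAAAU"))  # 12: terminal G-U wobble still pairs
print(paired_positions(repeat, "UAGCUCUAAAAA"))  # 11: G-A mismatch
```

Under this toy measure, the drift described above would show up as substitutions that change the paired count while keeping the paired fraction inside the window.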
We propose an evolutionary scenario in which tracrRNAs evolved on multiple occasions, via rearrangements of a CRISPR array that placed an antirepeat at different locations relative to the array. A functional tracrRNA would form if, in its new location, the antirepeat were flanked by sequences meeting the minimal requirements for a promoter and a Rho-independent terminator. Alternatively, or additionally, the antirepeat sequence could occasionally be 'reset' by recombination with a repeat, restoring the function of tracrRNAs whose hybrid stability has drifted below the required minimum.
Taken together, these studies advance the understanding of both the general principles and the specific aspects of genome evolution in diverse life forms, in particular viruses and mobile elements, as well as of cancer genome evolution.

Support Year: 23
Fiscal Year: 2018
Name: National Library of Medicine
Krupovic, Mart; Cvirkaite-Krupovic, Virginija; Iranzo, Jaime et al. (2018) Viruses of archaea: Structural, functional, environmental and evolutionary genomics. Virus Res 244:181-193
Yutin, Natalya; Makarova, Kira S; Gussow, Ayal B et al. (2018) Discovery of an expansive bacteriophage family that includes the most abundant viruses from the human gut. Nat Microbiol 3:38-46
He, Fei; Bhoobalan-Chitty, Yuvaraj; Van, Lan B et al. (2018) Anti-CRISPR proteins encoded by archaeal lytic viruses inhibit subtype I-D immunity. Nat Microbiol 3:461-469
Shmakov, Sergey A; Makarova, Kira S; Wolf, Yuri I et al. (2018) Systematic prediction of genes functionally linked to CRISPR-Cas systems by gene neighborhood analysis. Proc Natl Acad Sci U S A 115:E5307-E5316
Pushkarev, Alina; Inoue, Keiichi; Larom, Shirley et al. (2018) A distinct abundant group of microbial rhodopsins discovered using functional metagenomics. Nature 558:595-599
Amarasinghe, Gaya K; Aréchiga Ceballos, Nidia G; Banyard, Ashley C et al. (2018) Taxonomy of the order Mononegavirales: update 2018. Arch Virol 163:2283-2294
Yutin, Natalya; Bäckström, Disa; Ettema, Thijs J G et al. (2018) Vast diversity of prokaryotic virus genomes encoding double jelly-roll major capsid proteins uncovered by genomic and metagenomic sequence analysis. Virol J 15:67
Ferrer, Manuel; Sorokin, Dimitry Y; Wolf, Yuri I et al. (2018) Proteomic Analysis of Methanonatronarchaeum thermophilum AMET1, a Representative of a Putative New Class of Euryarchaeota, "Methanonatronarchaeia". Genes (Basel) 9:
Galperin, Michael Y; Makarova, Kira S; Wolf, Yuri I et al. (2018) Phyletic Distribution and Lineage-Specific Domain Architectures of Archaeal Two-Component Signal Transduction Systems. J Bacteriol 200:
Koonin, Eugene V (2017) Evolution of RNA- and DNA-guided antivirus defense systems in prokaryotes and eukaryotes: common ancestry vs convergence. Biol Direct 12:5

The list shows the 10 most recent of 196 publications.