The rapidly growing database of completely sequenced genomes of bacteria, archaea and eukaryotes (over 500 genomes available by the middle of 2004 and many more in progress) creates both new opportunities and new challenges for genome research. During the last year, we performed several studies that took advantage of this genomic information to establish fundamental principles of genome evolution and function. In particular, we proposed a new type of rare genomic changes (RGCs) designated RGC_CAMs (after Conserved Amino acids-Multiple substitutions), which are inferred using a genome-scale analysis of protein alignments and the underlying nucleotide sequence alignments. The RGC_CAM approach utilizes amino acid residues that are conserved in major eukaryotic lineages, with the exception of a few species comprising a putative clade. To reduce homoplasy, it selects for phylogenetic inference only those amino acid replacements that require 2 or 3 nucleotide substitutions. The RGC_CAM analysis was combined with a procedure for rigorous statistical testing of competing phylogenetic hypotheses, and the method was shown to be robust to branch length differences and taxon sampling. When applied to animal phylogeny, the RGC_CAM approach strongly supports the coelomate clade, which unites chordates with arthropods, as opposed to the ecdysozoan (molting animals) clade. This conclusion runs against the view of animal evolution that currently prevails in the evo-devo community. RGC_CAM and other RGC-based methods are expected to be crucial for future, definitive phylogenetic studies.
In another major study, we performed a detailed evolutionary analysis of conserved eukaryotic genes in an attempt to gain new insights into the origin of eukaryotes. The set of conserved eukaryotic protein-coding genes includes distinct subsets, one of which appears to be most closely related to and, by inference, derived from archaea, whereas another appears to be of bacterial, possibly endosymbiotic, origin. The "archaeal" genes of eukaryotes primarily encode components of information-processing systems, whereas the "bacterial" genes are predominantly operational. The precise nature of the archaeo-eukaryotic relationship remains uncertain: it has been variously argued that eukaryotic informational genes evolved from the homologous genes of Euryarchaeota or Crenarchaeota (the major branches of extant archaea), or that the origin of eukaryotes lies outside the known diversity of archaea. We describe a comprehensive set of 355 eukaryotic genes of apparent archaeal origin identified through ortholog detection and phylogenetic analysis. Phylogenetic hypothesis testing using constrained trees, combined with a systematic search for shared derived characters in the form of homologous inserts in conserved proteins, indicates that, for the majority of these genes, the preferred tree topology is one in which the eukaryotic branch is placed outside the extant diversity of archaea, although small subsets of genes show crenarchaeal and euryarchaeal affinities. Thus, the archaeal genes in eukaryotes appear to descend from a distinct, ancient, and otherwise uncharacterized archaeal lineage that acquired some euryarchaeal and crenarchaeal genes via early horizontal gene transfer.
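
The RGC_CAM selection criterion described above (keep only amino acid replacements that require 2 or 3 nucleotide substitutions) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the standard genetic code and, as a simplification, takes the minimum number of differing codon positions over all synonymous codons of the two residues, whereas the abstract states that the actual analysis uses the underlying nucleotide alignments. The function names `min_substitutions` and `is_multiple_substitution` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid):
    """Return every codon that encodes the given one-letter amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def min_substitutions(aa_from, aa_to):
    """Smallest number of codon positions that must differ between the two
    residues, minimized over all of their synonymous codons (assumption:
    the real codons are unknown, so the most conservative count is used)."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in codons_for(aa_from)
        for c2 in codons_for(aa_to)
    )


def is_multiple_substitution(aa_from, aa_to):
    """True when the replacement needs at least 2 nucleotide substitutions,
    which is the filter the abstract describes for reducing homoplasy."""
    return min_substitutions(aa_from, aa_to) >= 2


if __name__ == "__main__":
    for pair in [("D", "E"), ("M", "W"), ("K", "F")]:
        print(pair, min_substitutions(*pair), is_multiple_substitution(*pair))
```

Under this sketch, a replacement such as aspartate to glutamate (one substitution) would be discarded, while methionine to tryptophan (two) or lysine to phenylalanine (three) would be retained as candidate characters.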

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Intramural Research (Z01)
Project #: 1Z01LM000073-13
Application #: 7735071
Study Section:
Project Start:
Project End:
Budget Start:
Budget End:
Support Year: 13
Fiscal Year: 2008
Total Cost: $603,505
Indirect Cost:

Name: National Library of Medicine
Department:
Type:
DUNS #:
City:
State:
Country: United States
Zip Code:

Ivankov, Dmitry N; Payne, Samuel H; Galperin, Michael Y et al. (2013) How many signal peptides are there in bacteria? Environ Microbiol 15:983-90
Rogozin, Igor B; Carmel, Liran; Csuros, Miklos et al. (2012) Origin and evolution of spliceosomal introns. Biol Direct 7:11
Mulkidjanian, Armen Y; Bychkov, Andrew Yu; Dibrova, Daria V et al. (2012) Open questions on the origin of life at anoxic geothermal fields. Orig Life Evol Biosph 42:507-16
Denoeud, France; Henriet, Simon; Mungpakdee, Sutada et al. (2010) Plasticity of animal genome architecture unmasked by rapid evolution of a pelagic tunicate. Science 330:1381-5
Lee, Renny C H; Gill, Erin E; Roy, Scott W et al. (2010) Constrained intron structures in a microsporidian. Mol Biol Evol 27:1979-82
Wolf, Yuri I; Novichkov, Pavel S; Karev, Georgy P et al. (2009) Inaugural Article: The universal distribution of evolutionary rates of genes and distinct characteristics of eukaryotic genes of different apparent ages. Proc Natl Acad Sci U S A 106:7273-80
Koonin, Eugene V (2009) On the origin of cells and viruses: primordial virus world scenario. Ann N Y Acad Sci 1178:47-64
Basu, Malay Kumar; Poliakov, Eugenia; Rogozin, Igor B (2009) Domain mobility in proteins: functional and evolutionary implications. Brief Bioinform 10:205-16
Koonin, Eugene V; Senkevich, Tatiana G; Dolja, Valerian V (2009) Compelling reasons why viruses are relevant for the origin of cells. Nat Rev Microbiol 7:615; author reply 615
Koonin, E V; Wolf, Y I; Puigbò, P (2009) The phylogenetic forest and the quest for the elusive tree of life. Cold Spring Harb Symp Quant Biol 74:205-13
