In the universal genetic code, UGA is used both as a signal for termination of protein synthesis and as a selenocysteine codon. Genes for selenocysteine-containing proteins lack common nucleotide and amino acid sequence motifs, and no algorithms are available that identify such genes or distinguish between UGA-encoded selenocysteine and stop signals. Two eukaryotic genomes have been sequenced, and other genomes, including the human genome, soon will be. Resolving the uncertainty of the dual function of UGA is important for identifying and characterizing the proteins encoded in these genomes. This proposal aims to answer that challenge by identifying eukaryotic selenocysteine-encoding UGA codons through a new, efficient, sequence-based genomic approach. This is possible because all eukaryotic selenoprotein genes contain a common stem-loop structure, the selenocysteine insertion sequence (SECIS) element, in their 3' untranslated regions. The laboratory of the P.I. developed a computer program, SECISearch, that identifies selenoprotein genes by recognizing SECIS elements on the basis of their primary and secondary structures and free energy requirements. The proposed study will be guided by three specific aims: 1) development of a computer program that distinguishes between terminator and selenocysteine UGA codons; 2) identification of all or the majority of selenoprotein genes in completely sequenced eukaryotic genomes, including the human genome; and 3) experimental evidence for the occurrence and localization of newly identified selenoproteins. The feasibility of this approach has been tested. The initial version of SECISearch was used to successfully analyze the genomes of S. cerevisiae and C. elegans, and it detected selenoprotein genes in a variety of other eukaryotic organisms. In addition, new mammalian selenoprotein genes were found in nucleotide sequence databases and partially characterized.
These data establish that it is feasible to identify and characterize all or the vast majority of eukaryotic genes for selenocysteine-containing proteins encoded in completely sequenced genomes. Identification of selenoprotein genes will distinguish UGA codons for selenocysteine from terminator codons and may explain many biological and biomedical effects of selenium.
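
The UGA ambiguity described above can be illustrated with a minimal Python sketch. It is not SECISearch and does not reproduce its criteria: the function names and the `TOY_SECIS_PATTERN` regular expression are hypothetical placeholders introduced only for illustration, and the sketch checks primary sequence alone, whereas the program described above also evaluates secondary structure and free energy. The sketch lists in-frame UGA codons and keeps those followed by a downstream match to the placeholder pattern.

```python
import re

# Hypothetical, deliberately simplified stand-in for a SECIS-like motif.
# It is NOT the set of criteria used by SECISearch.
TOY_SECIS_PATTERN = r"AUGA[ACGU]{10,30}AA[ACGU]{10,30}GA"


def normalize(seq: str) -> str:
    """Upper-case the sequence and convert DNA (T) to RNA (U)."""
    return seq.upper().replace("T", "U")


def in_frame_uga_positions(seq: str, start: int = 0) -> list[int]:
    """Return 0-based offsets of UGA codons in the frame that begins at `start`."""
    rna = normalize(seq)
    return [i for i in range(start, len(rna) - 2, 3) if rna[i:i + 3] == "UGA"]


def candidate_sec_codons(seq: str, start: int = 0,
                         pattern: str = TOY_SECIS_PATTERN) -> list[int]:
    """Keep in-frame UGA codons that have a pattern match somewhere downstream.

    A UGA with no downstream match is treated here as an ordinary stop codon.
    """
    rna = normalize(seq)
    hits = []
    for pos in in_frame_uga_positions(rna, start):
        if re.search(pattern, rna[pos + 3:]):
            hits.append(pos)
    return hits


if __name__ == "__main__":
    # Synthetic example: an in-frame UGA at offset 6, a UAA stop, then a
    # 3' region containing a match to the toy pattern.
    utr = "C" + "AUGA" + "C" * 12 + "AA" + "C" * 12 + "GA" + "CC"
    mrna = "AUGGCUUGAGCUUAA" + utr
    print(in_frame_uga_positions(mrna))
    print(candidate_sec_codons(mrna))
```

In this toy setting, the same in-frame UGA is reported as a candidate selenocysteine codon only when the downstream region matches the placeholder pattern, which mirrors the idea of using the 3' untranslated region to resolve the codon's meaning.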