Completion of the DNA sequence of the yeast genome has opened a large number of questions about the organization and expression of eukaryotic genomes to investigation. Important among these is defining the complete minimum set of proteins necessary for eukaryotic cell growth and regulation, which is key to understanding human cancer. A hallmark of eukaryotes is the abundant presence of introns, internal gene sequences that are absent from the mature messenger RNAs (mRNAs) that specify the protein coding capacity of the genome. The presence of introns obscures open reading frames in the genomic sequence. To understand the complete coding capacity of the yeast genome, and of other eukaryotic genomes, we must first be able to recognize introns in the genomic sequence. With the complete sequence of the yeast genome in hand, we have the opportunity to map the positions of all the nuclear pre-mRNA introns in the yeast genome, and thus reveal its protein coding capacity. At this writing 220 yeast introns are known or predicted, but these have been identified in a biased, ad hoc fashion. We have developed a powerful molecular approach to the direct detection of introns in a manner not biased by the contents of the genes in which they are embedded. Oligonucleotides complementary to the unique lariat sequence formed during splicing ("branchmers") specifically prime reverse transcription of lariat intron RNA. Mutations that inactivate the lariat debranching enzyme cause dramatic accumulation of intron RNA in yeast. Branchmer oligonucleotides will therefore be used to generate expressed intron probes from strains carrying such mutations.
Our aims are (1) to create and screen libraries of "expressed intron tag" clones derived from strains of yeast that accumulate large amounts of intron RNA, and to sequence these clones to generate a database of expressed intron sequences; (2) to identify genomic sequences similar to known introns using informatic approaches and to test these for splicing potential in vivo; and (3) to refine each approach through repeated application until a complete set of confirmed introns is mapped to the sequence of the genome. Finding all the introns will be essential to a complete understanding of the coding capacity of the genome.
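As an illustration of the kind of informatic search named in aim (2), the following is a minimal sketch and not the method proposed here. It assumes that candidate introns can be flagged by exact matches to the canonical yeast splicing signals written as DNA: the 5' splice site GTATGT, the branchpoint TACTAAC, and a 3' splice site of the form YAG. The function name `find_candidate_introns` and the distance limits are hypothetical values chosen for the example. Real introns deviate from these consensus sequences, so a search of this type would miss some introns and return false positives that still require testing in vivo.

```python
import re

# Canonical S. cerevisiae splicing signals, written as DNA.
# Exact matching is an assumption of this sketch; real introns vary.
FIVE_PRIME = "GTATGT"
BRANCHPOINT = "TACTAAC"
THREE_PRIME = re.compile(r"[CT]AG")  # YAG


def find_candidate_introns(seq, min_bp_offset=20, max_intron=1000,
                           max_bp_to_3ss=60):
    """Return (start, end) half-open spans of candidate introns.

    Only the given strand is scanned. The distance limits are
    hypothetical defaults, not measured values.
    """
    seq = seq.upper()
    candidates = []
    start = seq.find(FIVE_PRIME)
    while start != -1:
        # First branchpoint match downstream of this 5' splice site.
        bp = seq.find(BRANCHPOINT, start + min_bp_offset, start + max_intron)
        if bp != -1:
            bp_end = bp + len(BRANCHPOINT)
            # First YAG within the allowed window after the branchpoint.
            match = THREE_PRIME.search(seq, bp_end, bp_end + max_bp_to_3ss)
            if match:
                candidates.append((start, match.end()))
        start = seq.find(FIVE_PRIME, start + 1)
    return candidates


def remove_span(seq, span):
    """Return seq with the half-open span removed (a simulated splice)."""
    start, end = span
    return seq[:start] + seq[end:]


# Toy sequence: two short exons flanking one consensus intron.
toy = "ATGAAA" + "GTATGT" + "C" * 30 + "TACTAAC" + "C" * 10 + "TAG" + "GGG"
spans = find_candidate_introns(toy)
```

Removing a reported span with `remove_span(toy, spans[0])` joins the flanking sequence, which is how a candidate could be checked against a predicted open reading frame before any experimental test.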