This project will develop the generation, cloning, and use of 5' Short Sequence Tags (5' SSTs) as a new and powerful tool for rare mRNA identification, genome annotation, and promoter discovery. These tags represent the 20-100 nucleotides adjacent to the 5'-cap of mRNA. Because of their short length, they can be used for highly specific subtractive hybridization that removes abundant messages and enables identification of rare ones. This Phase I project will work out the fragmentation and purification details necessary to produce high-quality 5' SST libraries. To date, mapping genes in genomic DNA sequence has relied on sequence data from expressed sequence tags (ESTs). EST data are highly redundant because abundant mRNA species are sequenced repeatedly. The size and purity of the 5' SSTs are expected to allow more efficient removal of abundant mRNA, potentially giving >1000x improved representation of rare mRNA species at a small fraction of the cost of existing EST technology. 5' SSTs can also be used for fast annotation of newly sequenced genomes. Importantly, the technology would allow fast mapping of 5'-upstream promoters, few of which are currently known, and discovery of disease-associated SNPs in these promoters.