Techniques are proposed for identifying full-length cDNA clones based on single-pass 5' EST data. The purpose of this classification is to select clones that are good candidates for full-insert sequencing and to find significant alternative transcripts that have not yet been described. The project is driven by a collaborative effort between the laboratories of Thomas Casavant and Bento Soares. The Soares lab is well known for its capabilities in producing high-quality cDNA libraries enriched for full-length mRNA transcripts. The Casavant lab has significant experience in managing and analyzing large amounts of EST data and in full-insert sequence assembly. The project will first further develop a pipeline to handle the specialized analysis unique to 5' ESTs from full-length-enriched libraries. The pipeline will use primarily homology-based methods to identify ESTs that should be selected for full-insert sequencing and assembly. Software will also be developed to identify ESTs that are candidates for full-length sequencing but lack supporting evidence from homology to known genes. This has the potential to uncover interesting transcripts from previously uncharacterized genes and proteins. Finally, approaches that use existing genome-based prediction tools will be explored for their utility in correctly assigning clones by using a combination of EST and genomic sequence data. The results from each method will be evaluated for effectiveness in selecting sequence-confirmed clones.
Kalari, Krishna R; Casavant, Melanie; Bair, Thomas B et al. (2006) First exons and introns--a survey of GC content and gene structure in the human genome. In Silico Biol 6:237-42
Bonaldo, Maria F; Bair, Thomas B; Scheetz, Todd E et al. (2004) 1274 full-open reading frames of transcripts expressed in the developing mouse nervous system. Genome Res 14:2053-63