With the rapid appearance of large portions of prokaryotic and eukaryotic genome sequences, it is important to assess the capabilities and limitations of current computer methods for the biological interpretation of these sequences, and to extend those methods so that a greater range of functional features can be identified. Detailed analyses of the sequence of yeast chromosome III and of portions of the E. coli chromosome are being performed as test cases to define an adequate set of methods. From the yeast chromosome III sequence (315 kilobases), the 182 initially reported open reading frames (ORFs), together with additional ORFs completely contained within these ORFs or in intergenic regions, were subjected to a BLAST-based strategy for comparison with sequence databases. Very low BLAST score thresholds were used initially, and every similarity suspected of being biologically significant was assessed in detail using multiple alignment methods. This analysis identified meaningful similarities to other proteins for 31 ORFs, in addition to the 29 described previously and the 37 known genes, raising to over 50 percent the fraction of ORFs for which a functional assignment and/or sequence relationships were available. Significant sequence similarity was found for only one additional ORF in an intergenic region, suggesting that genes are loosely packed in this yeast chromosome. In addition, strategies for identifying transmembrane segments found 49 possible membrane proteins, of which 17 contained multiple probable transmembrane helices and may function as transporters or receptors. Statistical methods are also being applied to assess the significance of any clustering of various ORF attributes and other DNA sequence features.
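The abstract does not say which transmembrane-prediction strategy was used. As a minimal illustration only, the sketch below substitutes a classic Kyte-Doolittle hydropathy scan: the mean hydropathy of a sliding window is computed along the protein, and runs of windows above a cutoff are merged into candidate membrane-spanning segments. The window of 19 residues, the cutoff of 1.6, and the helper names `window_scores`, `predict_tm_segments` and `is_multi_pass` are assumptions chosen for illustration, not the project's actual parameters or code.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_scores(seq: str, window: int = 19) -> List[float]:
    """Mean hydropathy of every full-length window along the sequence.

    Unknown residue codes are scored as 0.0 (an assumption of this sketch).
    """
    vals = [KD.get(aa, 0.0) for aa in seq.upper()]
    return [sum(vals[i:i + window]) / window
            for i in range(len(vals) - window + 1)]


def predict_tm_segments(seq: str, window: int = 19,
                        threshold: float = 1.6) -> List[Tuple[int, int]]:
    """Merge consecutive above-threshold windows into candidate segments.

    Returns 0-based, half-open (start, end) residue coordinates.
    Window size and threshold are hypothetical defaults for illustration.
    """
    scores = window_scores(seq, window)
    segments: List[Tuple[int, int]] = []
    start = None
    for i, s in enumerate(scores):
        if s >= threshold:
            if start is None:
                start = i  # first window of a hydrophobic run
        elif start is not None:
            # last passing window was i - 1; it ends at residue i - 1 + window
            segments.append((start, i - 1 + window))
            start = None
    if start is not None:
        segments.append((start, len(scores) - 1 + window))
    return segments


def is_multi_pass(seq: str) -> bool:
    """Flag proteins with two or more candidate transmembrane segments."""
    return len(predict_tm_segments(seq)) >= 2


if __name__ == "__main__":
    print(predict_tm_segments("L" * 25))  # → [(0, 25)]
    toy = "K" * 20 + "L" * 22 + "K" * 30 + "I" * 22 + "K" * 20
    print(len(predict_tm_segments(toy)), is_multi_pass(toy))  # → 2 True
```

A hydropathy scan like this is fast enough to apply to every ORF in a chromosome, which fits the screening role described above; flagging proteins with two or more segments parallels the abstract's distinction between all possible membrane proteins and those with multiple probable helices. A production screen would typically tune the window and cutoff and cross-check predictions against other methods.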
The significance of the project lies in its potential to yield improved strategies for computer analysis of gene function and of the arrangement of genes and DNA features in large genomes.

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Intramural Research (Z01)
Project #: 1Z01LM000027-01
Application #: 3845115
Support Year: 1
Fiscal Year: 1992
Name: National Library of Medicine
Country: United States