Expressed Sequence Tags (ESTs) are partial nucleotide sequences derived from clones that are randomly selected from cDNA libraries. The accumulation and analysis of ESTs have become an important component of genome research. The rate of EST sequence acquisition is accelerating, and more than 25,000 ESTs have been accessioned into our database, dbEST, during the past two years.
dbEST aims to provide value-added annotation and timely access to new data and analyses. We have developed a relational database to manage EST data, as well as a custom software system for data analysis. This system performs periodic homology updates after screening the query sequences and masking contaminating or uninformative subsequences. The database also stores information about the availability of physical DNA clones and genetic map locations. These data and analyses are made available to the public in four ways: 1) as a line-item database for network and email BLAST searches; 2) as full reports available from est_report@ncbi.nlm.nih.gov; 3) as a FASTA-formatted file available by anonymous ftp; and 4) in the new EST Division of GenBank. Plans are underway to implement an Internet Gopher service for EST information retrieval. We will also expand our storage and retrieval capability for genetic mapping data and will accept submissions of exon-trapped sequences as well as ESTs.
A number of discoveries of medical significance have already been made by using the dbEST resource to accelerate the cloning of human genes whose homologs have been characterized in other species. Previous work has shown that the S. cerevisiae CDC27 gene acts late in G2 of the cell cycle, after DNA replication but before the onset of chromosome segregation, and that the encoded protein binds to the mitotic spindle. Several attempts to clone the human homolog by conventional techniques were unsuccessful. A search of the database of expressed sequence tags showed a weak but significant match to a previously unidentified human brain protein. Additional sequencing confirmed the homology, and the human counterpart of the yeast CDC27 gene was mapped to human chromosome 17q21-24, making it a candidate gene for human breast cancer susceptibility.
A computer study was performed using a training set of known yeast-human homolog pairs to assess the general utility of this method for gene discovery and to optimize search parameters. Over the next three years, we will systematically search for new yeast-human homolog pairs and map approximately 1,000 of these to human chromosomes. We expect to find a number of yeast proteins that will become models for human disease genes by virtue of their map locations.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Intramural Research (Z01)
Project #
1Z01LM000015-02
Application #
3781263
Support Year
2
Fiscal Year
1993
Name
National Library of Medicine
United States