Well Characterized Tools for Gene Discovery and Analysis

Fickett, James

Abstract

It is well known both that the speed of sequencing far outstrips the speed at which sequence can be experimentally analyzed to identify new genes and classify them according to function, and that combining computational and experimental evidence significantly speeds this analysis. More accurate algorithms will be economically as well as scientifically advantageous, since the decision to perform an experiment is often based in part on computational results. This proposal is to significantly improve the state of the art of computational methods for gene identification and classification, in each of three areas:

1. Benchmarking. Current algorithms often fail to use the best methods, primarily because most methods are of unknown accuracy. Recently, Fickett & Tung made the first comprehensive assessment of coding region detection measures. This study showed that while many packages still base coding region detection on codon counts, in-phase hexamer counts can give better accuracy. It was also shown that merely combining the six best coding measures with a linear discriminant gives improvement over the already impressive Coding Recognition Module of GRAIL. Further assessment will be done for decision methods; for transcription, splicing, and translation signal detection; and for characterization of overall gene syntax. The best methods will be refined.

2. Biology. Current algorithms incorporate a number of elegant computational and statistical techniques, but none incorporates a model of transcription, splicing, and translation that is current with biological understanding. The Kozak rules for location of the translation initiation codon provide one clear example. Another is that, while it is not yet possible to describe eukaryotic promoters in detail, the current norm of always requiring a simple consensus CAAT and TATA box can be improved upon. Also, it can be shown that taking the domain structure of genomes into account reduces prediction errors by 20%.

3. Integration. Most investigators currently gather information independently (to a first approximation) from experiment, from database searches, and from gene identification algorithms, and afterwards mentally integrate it to arrive at tentative locations and possible functions of genes in a sequence. However, data from each of these sources can influence not only the interpretation of data from the others, but even which data are brought to one's attention. Under this proposal algorithms will be developed that can take voluminous low-level data from the three sources and give an overall summary consistent (insofar as possible) with all the information.
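
To make the in-phase hexamer idea from the benchmarking item concrete, the following is a minimal Python sketch and not the measure as evaluated by Fickett & Tung. It assumes a simple log-likelihood-ratio score with add-one pseudocounts. All function names and the toy training sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log


def inphase_hexamers(seq, frame=0):
    """Yield hexamers that start on codon boundaries of the given frame."""
    seq = seq.upper()
    for i in range(frame, len(seq) - 5, 3):
        yield seq[i:i + 6]


def all_hexamers(seq):
    """Yield every overlapping hexamer, ignoring reading frame."""
    seq = seq.upper()
    for i in range(len(seq) - 5):
        yield seq[i:i + 6]


def hexamer_frequencies(seqs, extractor, pseudocount=1.0):
    """Return a function mapping a hexamer to its smoothed frequency."""
    counts = Counter()
    for s in seqs:
        counts.update(extractor(s))
    # Assumption: add-one style smoothing over all 4**6 possible hexamers.
    total = sum(counts.values()) + pseudocount * 4 ** 6
    return lambda h: (counts[h] + pseudocount) / total


def coding_score(window, coding_freq, background_freq, frame=0):
    """Mean log-likelihood ratio of the in-phase hexamers in a window."""
    hexes = list(inphase_hexamers(window, frame))
    if not hexes:
        return 0.0
    return sum(log(coding_freq(h) / background_freq(h)) for h in hexes) / len(hexes)


# Hypothetical toy training data, for illustration only.
coding_train = ["ATGGCCGCCGCCGCCGCCTAA"]
noncoding_train = ["ATATATATATATATATATAT"]
coding_freq = hexamer_frequencies(coding_train, inphase_hexamers)
background_freq = hexamer_frequencies(noncoding_train, all_hexamers)

# A positive score suggests the window resembles the coding training set.
print(coding_score("GCCGCCGCCGCC", coding_freq, background_freq))
print(coding_score("ATATATATATAT", coding_freq, background_freq))
```

In practice the frequency tables would be estimated from large sets of annotated coding and noncoding sequence, and a score like this would be one of several measures fed to a discriminant, as the abstract describes.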

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 1R01HG000981-01A1
Application #: 2209225
Study Section: Genome Study Section (GNM)

Project Start: 1994-08-01
Project End: 1997-05-31
Budget Start: 1994-08-01
Budget End: 1995-05-31
Support Year: 1
Fiscal Year: 1994
Total Cost
Indirect Cost

Institution

Name: Los Alamos National Lab
Department
Type: Organized Research Units
DUNS #

City: Los Alamos
State: NM
Country: United States
Zip Code: 87545

Related projects


NIH 1996 R01 HG	Well Characterized Tools for Gene Discovery and Analysis Fickett, James W. / GlaxoSmithKline
NIH 1995 R01 HG	Well Characterized Tools for Gene Discovery and Analysis Fickett, James W. / Los Alamos National Lab
NIH 1994 R01 HG	Well Characterized Tools for Gene Discovery and Analysis Fickett, James W. / Los Alamos National Lab

Publications

Wasserman, W W; Palumbo, M; Thompson, W et al. (2000) Human-mouse genome comparisons to locate regulatory sites. Nat Genet 26:225-8

Wasserman, W W; Fickett, J W (1998) Identification of regulatory regions which confer muscle-specific gene expression. J Mol Biol 278:167-81

Guigo, R (1998) Assembling genes from predicted exons in linear time with dynamic programming. J Comput Biol 5:681-702

Fickett, J W (1998) Predictive methods using nucleotide sequences. Methods Biochem Anal 39:231-45

Fickett, J W; Hatzigeorgiou, A G (1997) Eukaryotic promoter recognition. Genome Res 7:861-78

Guigo, R (1997) Computational gene identification: an open problem. Comput Chem 21:215-22

Fickett, J W (1996) Coordinate positioning of MEF2 and myogenin binding sites. Gene 172:GC19-32

Fickett, J W (1996) Finding genes by computer: the state of the art. Trends Genet 12:316-20

Fickett, J W (1996) Quantitative discrimination of MEF2 sites. Mol Cell Biol 16:437-41

Burset, M; Guigo, R (1996) Evaluation of gene structure prediction programs. Genomics 34:353-67
