The goals of this project are to: 1) develop theoretical foundations for sequence analysis research; and 2) provide a consistent methodology for locating putative functional domains in unannotated nucleic acid sequences. Computational experiments have been performed to determine which of the existing methods (including those developed in previous years of this project) gives the best estimate of the location of distinct functional domains in unannotated nucleotide sequences. It has also been necessary to examine the theoretical foundations of sequence analysis themselves. In particular, conceptual algorithms that allow nucleotide sequences to be interpreted through a linguistic framework have been developed and implemented in new sequence analysis software. Universal information-theoretic principles governing the distribution of short oligonucleotides have been discovered and are now being studied in detail. It is believed that these novel principles provide a powerful basis for discriminant analysis algorithms and might also shed some light on the mechanistic aspects of the expression of genome fragments (not only genes). An international workshop (Open Problems of Computational Molecular Biology, Telluride, CO, June 1991) has been organized to address the so-called biological coding problem emerging from this project. The formulation of this problem and methods for its solution are now being studied by several research groups worldwide.

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Intramural Research (Z01)
Project #
1Z01CB008394-04
Application #
3796475
Study Section
Project Start
Project End
Budget Start
Budget End
Support Year
4
Fiscal Year
1992
Total Cost
Indirect Cost
Name
Division of Cancer Biology and Diagnosis
Department
Type
DUNS #
City
State
Country
United States
Zip Code