This work involves a form of data mining: the discovery of motifs that may exist in a database of objects. The problem becomes more significant when little is known in advance about the motifs. Ultimately, one would like to determine whether the set of discovered motifs acts as a good classifier. The data objects may be sequences, trees, graphs, or records.

Examples of the use of these methods include: 1) the determination of 3D motifs in biomolecules. The motifs that the algorithms find are rigid substructures that may occur in a graph after allowing for an arbitrary number of rotations and translations, as well as a small number of node insert/delete operations in the motifs or graphs. By combining a "geometric hashing" technique with "block detection" algorithms for undirected graphs, we are able to find approximate motifs in a set of graphs; 2) the determination of the largest approximately common substructures of two trees based on an edit distance metric. Using a method known as "selective memorization", the algorithm was used to discover motifs in multiple RNA secondary structures, which can be represented as trees; 3) pattern discovery in sequence data, as mentioned above. Protein sequences were classified with a 98% precision rate. The technique is currently being applied to recognize splice junction sites.

Z01 BC 10045-04 - Computer analysis, databases, Protein families, Protein Sequences, Sequence analysis, Datamining, motifs, RNA secondary structure, 3D molecular motifs, computational biology, Computational methods, RNA folding
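
The geometric hashing idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the project's implementation: the function names (`local_frame`, `frame_keys`, `build_table`, `vote`, `rigid_move`), the bin size, and the toy coordinates are all assumptions made for illustration, and the sketch omits the node insert/delete tolerance and the block detection step. It shows only why coordinates expressed in a frame built from three of the motif's own points are unchanged by rotation and translation, and how a hash table of such coordinates lets a scene vote for a matching motif basis.

```python
import math
from collections import defaultdict
from itertools import permutations

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a, eps=1e-9):
    """Return a / |a|, or None when a is (numerically) the zero vector."""
    n = math.sqrt(dot(a, a))
    return None if n < eps else (a[0] / n, a[1] / n, a[2] / n)

def local_frame(p0, p1, p2):
    """Orthonormal axes anchored at p0; None if the three points are collinear."""
    x = unit(sub(p1, p0))
    if x is None:
        return None
    z = unit(cross(x, sub(p2, p0)))
    if z is None:
        return None
    return x, cross(z, x), z

def frame_keys(points, basis, bin_size=0.5):
    """Quantized coordinates of the non-basis points in the frame of `basis`."""
    i, j, k = basis
    frame = local_frame(points[i], points[j], points[k])
    if frame is None:
        return None
    keys = set()
    for idx, p in enumerate(points):
        if idx in basis:
            continue
        d = sub(p, points[i])
        keys.add(tuple(round(dot(d, axis) / bin_size) for axis in frame))
    return keys

def build_table(model, bin_size=0.5):
    """Hash table: quantized coordinate -> set of model bases that produce it."""
    table = defaultdict(set)
    for basis in permutations(range(len(model)), 3):
        keys = frame_keys(model, basis, bin_size)
        if keys is None:  # skip degenerate (collinear) bases
            continue
        for key in keys:
            table[key].add(basis)
    return table

def vote(table, scene, scene_basis, bin_size=0.5):
    """Count, per model basis, how many scene keys hit the table."""
    votes = defaultdict(int)
    for key in frame_keys(scene, scene_basis, bin_size) or ():
        for model_basis in table.get(key, ()):
            votes[model_basis] += 1
    return dict(votes)

def rigid_move(points, theta, shift):
    """Rotate by theta about the z-axis, then translate by shift."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in points]

# Toy motif (hypothetical coordinates) and a rotated, translated copy of it.
motif = [(0, 0, 0), (2, 0, 0), (0, 2, 0), (1, 1, 1), (0, 1, 2)]
moved = rigid_move(motif, 0.7, (3.0, -1.0, 2.5))
votes = vote(build_table(motif), moved, (0, 1, 2))
```

In this sketch a model basis that collects a vote from every non-basis scene point is a candidate rigid match. Quantizing with `round` is the simplest choice but is sensitive to points that fall near a bin boundary, so a practical version would also probe neighbouring bins.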

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Intramural Research (Z01)
Project #
1Z01BC010045-04
Application #
6289315
Study Section
Special Emphasis Panel (LECB)
Project Start
Project End
Budget Start
Budget End
Support Year
4
Fiscal Year
1999
Total Cost
Indirect Cost
Name
National Cancer Institute Division of Basic Sciences
Department
Type
DUNS #
City
State
Country
United States
Zip Code