Pattern Discovery for Combinatorial Databases

Shapiro, B

Abstract

Determining motifs that classify entries in a database is important, especially when little is known about the motifs that define a specific family. This project, a form of data mining, centers on computationally finding the largest approximately identical substructures in a set of data objects. The discovered substructures, or motifs, are then tested against the data to see how well they characterize it, that is, how well they perform as classifiers. The database of objects can consist of entities such as sequences, trees, graphs, or records. We have applied these classification techniques to biological databases in the following three areas: 1) 3-D graphs representing biomolecules. We showed a 91% precision rate in determining motifs that classify three different families of molecules. 2) Tree structures representing RNA secondary structure. We discovered tree motifs that classified three families of RNA structures. 3) Strings representing protein sequences. Using five methods for protein sequence classification, one of them our own, we found that the five methods give complementary information. Thus, by using the five methods together, one can obtain high-confidence classifications or suggest alternative hypotheses. (Z01 BC 10045-02 LMMB to LECB)
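
As a rough illustration of the idea in the abstract, for the string case only, the sketch below searches for the longest substring that occurs approximately in every sequence of a family and then scores it as a classifier by precision. It is not the project's actual algorithm, which the abstract does not specify. The function names (`discover_motif`, `occurs_approx`, `precision`), the use of Hamming distance as the notion of "approximately identical", and the brute-force search are all assumptions made for illustration.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_approx(motif, seq, max_mismatches):
    """True if some window of seq matches motif with at most max_mismatches substitutions."""
    m = len(motif)
    return any(
        hamming(motif, seq[i:i + m]) <= max_mismatches
        for i in range(len(seq) - m + 1)
    )


def discover_motif(seqs, max_mismatches=1, min_support=None):
    """Brute-force search (hypothetical helper) for the longest substring of any
    input sequence that occurs approximately in at least min_support sequences.
    Ties at the same length are broken alphabetically."""
    if min_support is None:
        min_support = len(seqs)
    # Try candidate lengths from longest to shortest and stop at the first hit.
    for length in range(max(len(s) for s in seqs), 0, -1):
        candidates = {
            s[i:i + length] for s in seqs for i in range(len(s) - length + 1)
        }
        for cand in sorted(candidates):
            support = sum(occurs_approx(cand, s, max_mismatches) for s in seqs)
            if support >= min_support:
                return cand
    return None


def precision(motif, family, others, max_mismatches=1):
    """Fraction of sequences matched by the motif that belong to the family."""
    tp = sum(occurs_approx(motif, s, max_mismatches) for s in family)
    fp = sum(occurs_approx(motif, s, max_mismatches) for s in others)
    return tp / (tp + fp) if tp + fp else 0.0


if __name__ == "__main__":
    # Toy data invented for this example; not from the project.
    family = ["abMOTIFcd", "efMOTAFgh", "ijMOTIFkl"]
    others = ["mnopqrstu", "MOTxFvw"]
    motif = discover_motif(family, max_mismatches=1)
    print(motif, precision(motif, family, others, max_mismatches=1))
```

The exhaustive search is only practical for small inputs; it is meant to show the two stages the abstract names, motif discovery followed by evaluation of the motif as a classifier, not an efficient method.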

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Intramural Research (Z01)
Project #: 1Z01BC010045-02
Application #: 6161135
Study Section: Special Emphasis Panel (LECB)

Support Year: 2
Fiscal Year: 1997

Institution

Name: National Cancer Institute Division of Basic Sciences

Country: United States

Related projects


NIH 1999 Z01 CA: Pattern Discovery for Combinatorial Databases. Shapiro, Bruce / National Cancer Institute Division of Basic Sciences
NIH 1998 Z01 CA: Pattern Discovery for Combinatorial Databases. Shapiro, B A. / National Cancer Institute Division of Basic Sciences
NIH 1997 Z01 CA: Pattern Discovery for Combinatorial Databases. Shapiro, B A. / National Cancer Institute Division of Basic Sciences
