Classification and Statistical Learning Approaches to Molecular Structure Analysis

States, D J.

Abstract

Molecular sequence databases contain approximately 5,000 independent families of protein sequences. A small number of these span multiple phyla and must therefore represent ancient, evolutionarily conserved protein families. For well-studied phyla, most of these ancient families now appear to be represented in the molecular sequence databases.

Proposed course: An algorithm, HHS, has been developed that takes the pairwise similarity relations generated by the program BLASTP and assembles them into classes of mutually related proteins. It works in two phases. In the first phase, the ungapped high-scoring segments identified by BLAST are assembled into sets of mutually consistent diagonals, each forming a gapped sequence alignment. In the second phase, the extents of these gapped alignments on each protein are compared; overlapping alignments indicate the presence of a protein sequence domain. A connected-set definition, under which domains linked directly or through intermediate alignments belong to the same family, is then used to map out each family of protein domains. The algorithm is computationally efficient and has been used to classify the results of BLAST searches run between all pairs of sequences in the NCBI non-redundant sequence database. Future work will implement a name generator for these protein domains so that they can be used as an automated source of annotation for molecular sequences. The evolution of individual domains is also being explored.
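The abstract describes HHS only in outline. The Python sketch below illustrates the two phases and the connected-set step under stated assumptions; it is not the HHS implementation. It assumes each BLAST high-scoring segment pair is available as a hypothetical `HSP` record with 0-based, half-open coordinates. It treats two segments as mutually consistent when one lies entirely before the other on both sequences, and it picks the best-scoring consistent set with a standard O(n²) chaining recurrence. This technique is substituted here because the abstract does not say how HHS chooses among diagonals. The overlap rule (`min_overlap`, the fraction of the shorter footprint that must be shared) and all function names are likewise hypothetical, and the connected sets are computed with a union-find structure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HSP:
    """Hypothetical ungapped segment pair; 0-based, half-open [start, end)."""
    query: str
    subject: str
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    score: float


def chain_hsps(hsps):
    """Phase 1 sketch (assumed chaining DP, not HHS itself): keep the
    highest-scoring set of HSPs that are ordered consistently on BOTH
    sequences, i.e. one gapped alignment. Returns its extents as
    (query, q_start, q_end, subject, s_start, s_end)."""
    hsps = sorted(hsps, key=lambda h: (h.q_start, h.s_start))
    best = [h.score for h in hsps]  # best chain score ending at HSP i
    prev = [None] * len(hsps)       # back-pointers for reconstruction
    for i, cur in enumerate(hsps):
        for j in range(i):
            p = hsps[j]
            if p.q_end <= cur.q_start and p.s_end <= cur.s_start:
                if best[j] + cur.score > best[i]:
                    best[i] = best[j] + cur.score
                    prev[i] = j
    i = max(range(len(hsps)), key=best.__getitem__)
    chain = []
    while i is not None:
        chain.append(hsps[i])
        i = prev[i]
    return (chain[0].query,
            min(h.q_start for h in chain), max(h.q_end for h in chain),
            chain[0].subject,
            min(h.s_start for h in chain), max(h.s_end for h in chain))


class DisjointSet:
    """Union-find used to compute connected sets."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def domain_families(alignments, min_overlap=0.5):
    """Phase 2 sketch. Each alignment leaves a footprint on both proteins.
    Footprints on one protein overlapping by >= min_overlap of the shorter
    are treated as the same domain; both ends of an alignment are linked.
    Connected sets of these links are returned as domain families."""
    footprints = []
    for q, qs, qe, s, ss, se in alignments:
        footprints += [(q, qs, qe), (s, ss, se)]
    ds = DisjointSet(len(footprints))
    for k in range(0, len(footprints), 2):
        ds.union(k, k + 1)  # the two ends of one alignment
    by_protein = defaultdict(list)
    for idx, (protein, _, _) in enumerate(footprints):
        by_protein[protein].append(idx)
    for idxs in by_protein.values():
        for n, a in enumerate(idxs):
            for b in idxs[n + 1:]:
                _, a_lo, a_hi = footprints[a]
                _, b_lo, b_hi = footprints[b]
                overlap = min(a_hi, b_hi) - max(a_lo, b_lo)
                shorter = min(a_hi - a_lo, b_hi - b_lo)
                if shorter > 0 and overlap >= min_overlap * shorter:
                    ds.union(a, b)
    families = defaultdict(set)
    for idx, fp in enumerate(footprints):
        families[ds.find(idx)].add(fp)
    return sorted((sorted(f) for f in families.values()), key=lambda f: f[0])


if __name__ == "__main__":
    ab = [HSP("A", "B", 0, 40, 10, 50, 60.0),
          HSP("A", "B", 50, 100, 60, 110, 70.0),
          HSP("A", "B", 30, 60, 200, 230, 20.0)]  # out of order: not chained
    alignments = [chain_hsps(ab),
                  chain_hsps([HSP("A", "C", 150, 250, 0, 100, 90.0)]),
                  chain_hsps([HSP("B", "D", 15, 105, 5, 95, 80.0)])]
    for family in domain_families(alignments):
        print(family)
```

In this toy run the out-of-order A/B segment is dropped from the chain, and protein A's two footprints (residues 0 to 100 and 150 to 250) fall into different families. Under this scheme, that is how a multi-domain protein would show up. Because connected sets are transitive, a single permissive overlap can merge two otherwise separate families, so the value chosen for `min_overlap` matters.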

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Intramural Research (Z01)
Project #: 1Z01LM000028-01
Application #: 3845116
Study Section

Project Start
Project End
Budget Start
Budget End
Support Year: 1
Fiscal Year: 1992
Total Cost
Indirect Cost

States, D J.
National Library of Medicine, United States
