Statistics of Gapping in Protein Motif Matching

Spouge, J

Abstract

The need for rigorous statistics in sequence analysis is now generally conceded, particularly in light of the success of the BLAST suite of programs at NCBI. Insertions and deletions in proteins pose statistical problems in sequence matching, and these problems are at present only partially solved. Classifying proteins into families has been shown to improve detection of distant homologs in the protein database, because it provides a broader picture of motif conservation in a particular protein. Several approaches to protein classification are presently available. Andy Neuwald has pursued a strategy that uses Gibbs sampling to analyze the motifs in a protein family, but until recently the Gibbs sampler could not take advantage of gapping information. This information can be described as follows. The distances between different sequence elements in a protein motif usually reflect loops between conserved secondary structure elements in the protein, and these loops are known to often have a tight, well-defined length distribution. The gaps between the motif elements can be included in match scores, but an assessment of the statistical significance of the resulting scores has largely been lacking up to now. With improved combinatorial computational techniques, we discovered some new relationships between distant members of a protein family called the "AAA+" family. This discovery indicates that current methods of similarity detection are overlooking information in the gap lengths between protein motifs.
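
The idea of including gap lengths in a match score can be illustrated with a minimal sketch. This is not the project's method. It assumes, purely for illustration, that gap lengths observed in a family are compared against a uniform background over lengths 0 to `max_gap`, with a simple additive pseudocount. The function name, parameters, and example data are all hypothetical.

```python
import math


def gap_log_odds(gap_length, observed_gaps, max_gap, pseudocount=1.0):
    """Log-odds score (in bits) for one gap length.

    Illustrative assumptions (not from the abstract):
    - the family distribution is estimated from observed gap lengths
      with an additive pseudocount;
    - the background is uniform over lengths 0..max_gap.
    """
    if not 0 <= gap_length <= max_gap:
        raise ValueError("gap_length must lie in 0..max_gap")
    n_bins = max_gap + 1
    count = sum(1 for g in observed_gaps if g == gap_length)
    # Smoothed estimate of the family's gap-length probability.
    p_family = (count + pseudocount) / (len(observed_gaps) + pseudocount * n_bins)
    # Uniform background probability for any single gap length.
    p_background = 1.0 / n_bins
    return math.log2(p_family / p_background)


# Hypothetical loop lengths between two motif elements in a family.
family_gaps = [10, 10, 11, 10, 12, 10, 11]

typical = gap_log_odds(10, family_gaps, max_gap=49)   # length seen often
atypical = gap_log_odds(30, family_gaps, max_gap=49)  # length never seen
```

Under these assumptions a tight length distribution gives a positive score to typical gap lengths and a negative score to atypical ones, so the term could be added to a motif match score. Assessing the statistical significance of the combined score is the open problem the abstract describes, and this sketch does not address it.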

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Intramural Research (Z01)
Project #: 1Z01LM000081-02
Application #: 6111080
Study Section: Special Emphasis Panel (CBB)

Project Start
Project End
Budget Start
Budget End
Support Year: 2
Fiscal Year: 1998
Total Cost
Indirect Cost

Institution

Name: National Library of Medicine
Department
Type
DUNS #

City
State
Country: United States
Zip Code

Related projects


NIH 1998 Z01 LM	Statistics of Gapping in Protein Motif Matching Spouge, J L. / National Library of Medicine
NIH 1997 Z01 LM	Statistics of Gapping in Protein Motif Matching Spouge, J L. / National Library of Medicine
