The construction of a complete, searchable library of protein fold models is one of the primary goals of this resubmitted proposal. The folds are to be represented as Discrete State Models (DSMs): statistical models of all sequences of secondary structure elements and solvent exposure patterns compatible with a given protein 3D fold. The proposal extends initial work in which such models were assembled automatically, directly from the 3D coordinates of a set of 58 proteins of determined structure; the most recent work brings this number to 790. The entire library can be searched with a query amino acid sequence, using the Hidden Markov Model (HMM) forward algorithm to find the model with the highest posterior probability. Given the most probable model or models, the HMM forward-backward algorithm can then be used to assign to each amino acid in the query sequence the probability that it occurs in each of the modeled structural states. Given such a library of modeled folds, it is also proposed to exploit recent work on combining functionally diagnostic sequence patterns with their proper structural context. This is to be done by embedding such patterns, represented as profiles, directly into the DSMs as HMM state amino acid emission probabilities. The result will be a second library of DSMs diagnostic of both functional and structural protein families. Finally, Dr. Smith proposes, as a component of this revised submission, to extend the use of the DSM/HMM library to the structural dissection of multidomain proteins of unknown structure. The availability of these two libraries will greatly facilitate the analysis of new genomic sequences, which in turn will provide new insights into the function of many biologically and medically important proteins.
In addition, these libraries and their associated analysis tools will provide data to guide the design of experiments that test predictions about the cellular roles of such proteins.
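
The two HMM steps named in the abstract, scoring a query against every model in a library with the forward algorithm and then assigning per-residue state probabilities with the forward-backward algorithm, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the proposal: the two-state toy models ("helix_rich", "strand_rich"), their four-letter amino acid alphabet, all probability values, the uniform prior over models, and the function names `forward_backward` and `rank_models`. The actual DSM state topology is not described in the abstract, so a generic discrete HMM is used instead.

```python
import math

def forward_backward(init, trans, emit, obs):
    """Scaled forward-backward pass for a generic discrete HMM.

    init[i]      : probability of starting in state i
    trans[i][j]  : probability of moving from state i to state j
    emit[i][sym] : probability that state i emits symbol sym
    Returns (log P(obs | model), posteriors), where posteriors[t][i] is
    P(state i at position t | obs), or (-inf, None) if obs is impossible.
    """
    n, T = len(init), len(obs)
    alpha, scales = [], []
    for t, sym in enumerate(obs):
        if t == 0:
            row = [init[j] * emit[j].get(sym, 0.0) for j in range(n)]
        else:
            prev = alpha[-1]
            row = [sum(prev[i] * trans[i][j] for i in range(n))
                   * emit[j].get(sym, 0.0) for j in range(n)]
        c = sum(row)                      # scaling factor avoids underflow
        if c == 0.0:
            return float("-inf"), None
        alpha.append([x / c for x in row])
        scales.append(c)
    log_like = sum(math.log(c) for c in scales)

    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        sym = obs[t + 1]
        beta[t] = [sum(trans[i][j] * emit[j].get(sym, 0.0) * beta[t + 1][j]
                       for j in range(n)) / scales[t + 1] for i in range(n)]
    posteriors = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    return log_like, posteriors

def rank_models(library, obs):
    """Rank models by posterior probability given obs (uniform prior assumed)."""
    log_joint = {name: forward_backward(*params, obs)[0] - math.log(len(library))
                 for name, params in library.items()}
    best = max(log_joint.values())
    if best == float("-inf"):
        raise ValueError("query has zero probability under every model")
    norm = best + math.log(sum(math.exp(v - best) for v in log_joint.values()))
    ranked = [(name, math.exp(v - norm)) for name, v in log_joint.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)

# Hypothetical two-state emissions: state 0 ~ helix-like, state 1 ~ strand-like.
EMIT = [{"A": 0.4, "L": 0.4, "V": 0.1, "I": 0.1},
        {"A": 0.1, "L": 0.1, "V": 0.4, "I": 0.4}]
# Hypothetical library: (initial probabilities, transition matrix, emissions).
LIBRARY = {
    "helix_rich":  ([0.8, 0.2], [[0.9, 0.1], [0.3, 0.7]], EMIT),
    "strand_rich": ([0.2, 0.8], [[0.7, 0.3], [0.1, 0.9]], EMIT),
}

query = "ALLAAL"
ranking = rank_models(LIBRARY, query)
best_name = ranking[0][0]
_, state_posteriors = forward_backward(*LIBRARY[best_name], query)
for residue, probs in zip(query, state_posteriors):
    print(residue, [round(p, 3) for p in probs])
```

The per-position scaling keeps the computation numerically stable for long sequences, and the model posterior is normalized in log space for the same reason. A real library search would replace the uniform prior and toy parameters with values estimated from the modeled folds.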

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
1R01GM060564-01A1
Application #
6231333
Study Section
Molecular and Cellular Biophysics Study Section (BBCA)
Program Officer
Edmonds, Charles G
Project Start
2001-05-01
Project End
2004-04-30
Budget Start
2001-05-01
Budget End
2002-04-30
Support Year
1
Fiscal Year
2001
Total Cost
$244,500
Indirect Cost
Name
Boston University
Department
Engineering (All Types)
Type
Schools of Engineering
DUNS #
042250712
City
Boston
State
MA
Country
United States
Zip Code
02215