A new statistical model for predicting protein secondary structure achieved 76% accuracy on three structural classes. The model applied penalized maximum likelihood techniques to a quadratic logistic model based on 17-residue neighborhoods. The model parameters were interpreted in light of known preference patterns for residue-residue contacts. A nonparametric kernel density estimation approach produced greater than 60% accuracy and could effectively incorporate both homology and structural class information into the predictions. Data visualization techniques were employed to aid understanding and communication of features of the Brookhaven Data Base of protein structures. C-alpha distance maps and C-alpha contact maps, when appropriately presented, revealed patterns and textures related to regularities in structure. The statistical distribution of alpha-carbon pair separation distances as a function of chain separation revealed significant patterns related to secondary structure. Normal and lognormal distributions gave good approximations to the observed distribution, with some significant departures. Our previously developed algorithm for multiple sequence alignment was upgraded and optimized for implementation on DOS-based machines. The algorithm is presently being implemented on the Intel parallel machine in collaboration with CSL. Other groups (ICOT, Japan) have adapted our algorithm to good effect on parallel architectures. The algorithm promises to be highly efficient in a parallel setting.
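
As an illustration of the C-alpha distance maps, contact maps, and distance-versus-chain-separation statistics mentioned above, the following is a minimal Python sketch. It is not the project's code: the function names, the 8 Å contact cutoff, and the toy coordinates are assumptions chosen only for illustration, and real input would be C-alpha coordinates read from a structure file.

```python
import math


def distance_map(coords):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    n = len(coords)
    dmap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            dmap[i][j] = dmap[j][i] = d
    return dmap


def contact_map(dmap, cutoff=8.0):
    """Threshold a distance map: True where two residues lie within cutoff.

    The 8.0 default is an assumed, illustrative cutoff in angstroms.
    """
    return [[d <= cutoff for d in row] for row in dmap]


def distances_by_separation(dmap):
    """Group pair distances by chain separation k = j - i (for i < j)."""
    n = len(dmap)
    groups = {}
    for i in range(n):
        for j in range(i + 1, n):
            groups.setdefault(j - i, []).append(dmap[i][j])
    return groups


# Hypothetical toy chain: five points spaced 3.8 angstroms apart on a line.
toy = [(3.8 * i, 0.0, 0.0) for i in range(5)]
dmap = distance_map(toy)
cmap = contact_map(dmap)
by_sep = distances_by_separation(dmap)
```

The per-separation groups in `by_sep` are the kind of samples one could compare against fitted normal or lognormal distributions; the fitting step itself is left out of this sketch.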

Agency: National Institutes of Health (NIH)
Institute: Center for Information Technology (CIT)
Type: Intramural Research (Z01)
Project #: 1Z01CT000226-02
Application #: 3838537
Support Year: 2
Fiscal Year: 1992
Name: Center for Information Technology
Country: United States