We have completed and tested a novel potential for protein fold recognition that makes use of higher-order interactions, namely up to four-body interactions, among geometrically close amino acid residue side chains. Careful investigations have shown that the potential is a significant improvement over potentials involving only single and pairwise interactions, and that particular interacting sets of residues, for example positive charge-negative charge-hydrophobic residue triples, can interact in a surprising, stabilizing way. Use of this potential may improve protein threading studies beyond the current levels of performance obtained in other laboratories.
We have developed hidden Markov models (HMMs) of protein secondary structure sequences for 14 distinct topologies of protein folds. These models have been tested in a systematic cross-validated setting and show a substantial ability to recognize distantly related protein sequences as members of their respective classes. For example, one model correctly recognizes the OB gene product leptin as a member of the cytokine fold family. This set of models and the associated recognition methodology should prove extremely useful for assigning both structure and function to novel sequences emerging from large-scale sequencing projects. A project to make the HMMs available as a Web-based service has begun and should be completed in the coming months. This service will allow users to submit sequences for possible recognition by one of the models in our current library. The library will be augmented with additional models as the protein structure database grows and as time permits.
In collaboration with the NIH Clinical Center Clinical Pathology Department, ABS staff have designed a new intelligent computational technology for enhancing quality control in laboratory instruments. The methodology employs a novel blend of statistical, signal processing, and neural network techniques for early on-line detection of bias and precision errors in test values associated with specimens drawn from patients and other human subjects. If proven effective, this methodology will enable faster verification of test results, thereby facilitating patient management decisions, improving patient care, and, more generally, advancing certain biomedical research objectives. A working prototype and patent filing are expected to be completed by the end of calendar year 1997.
The Section continues to support and develop a Web server that provides users with protein secondary structure predictions based on sequence. The server offers the quadratic logistic (QL) and GOR-IV algorithms, along with numerous pointers to other algorithms and resources provided elsewhere. The Section gave a course on this topic in the DCRT training program, and an extended version of this workshop was presented at the 5th International Meeting of Intelligent Systems in Molecular Biology (ISMB-97), held in Halkidiki, Greece.
The Section also continues to distribute and support programs for ligand binding data analysis (LIGAND and ALLFIT) for NIH users. Statistical consultations were provided to several groups in this area and in the areas of experimental design, protein structure recognition, and gene function identification.