Current DNA and protein sequencing technology allows rapid acquisition of information concerning the primary structure of proteins. The growth of information concerning three-dimensional protein structure, which is critical to understanding protein function, is markedly slower, however. Although the primary structure of a protein completely determines its secondary and tertiary structure, no procedure can yet predict the complete structure of a protein from sequence alone. This project will apply neural network simulations to the protein folding problem. A back-propagation neural network architecture, already implemented and optimized on the Minnesota Cray 2 supercomputer, will be configured to form an association between sequence information and three-dimensional structure for 100 of the smaller proteins with known structure. After the network has "learned" this training set, after perhaps 100-1000 presentations, it will be tested for retention of some of the rules governing protein folding: we will present the network with sequences which are new to it but which have known structures. We will compare its "predictions" with actual structure; successful performance would be predictions within 2 Å for 95% of the novel proteins. The neural network program we will use was developed here to model arbitrarily large networks. The program includes a Network Description Language (NDL) which allows an experimenter to configure the network for input-output data sets of arbitrary structure and dimensionality. Using the Minnesota Supercomputer Center Cray 2, its interactive UNICOS operating system and the University Ethernet network, NDL programs can be created and executed interactively and the results of a simulation reviewed graphically at a Sun or Macintosh II workstation. Experiments with the network show that it can learn associations between one-dimensional inputs (e.g. sequence) and multi-dimensional outputs (e.g. 3D structure) and that it can recall some structural elements of small proteins. Support is requested for a year of experimentation with the network using data sets (learning sets) representing relationships between protein sequence and tertiary structure.
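As a rough illustration of the approach described above, the following Python sketch trains a very small back-propagation network to map a one-hot encoded sequence to flat (x, y, z) coordinates, and then scores predictions against a 2 Å cutoff. It is not the project's NDL program or its Cray 2 implementation. The names `one_hot`, `TinyBackprop`, `rmsd`, and `fraction_within` are hypothetical, the toy sequences and coordinates are invented, and reading "2 Å" as a root-mean-square deviation computed without structural superposition is an assumption made only for this example.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def one_hot(sequence):
    """Encode a sequence as a flat 0/1 vector, 20 slots per residue."""
    vec = []
    for residue in sequence:
        slot = [0.0] * len(AMINO_ACIDS)
        slot[AMINO_ACIDS.index(residue)] = 1.0
        vec.extend(slot)
    return vec

class TinyBackprop:
    """One hidden tanh layer, linear outputs, squared-error loss (illustrative only)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = [sum(w * h for w, h in zip(row, hidden)) + b
               for row, b in zip(self.w2, self.b2)]
        return hidden, out

    def train_step(self, x, target, lr=0.05):
        """One presentation of (x, target); returns the loss before the update."""
        hidden, out = self.forward(x)
        err = [o - t for o, t in zip(out, target)]  # gradient of 0.5 * sum(err^2)
        # Propagate the error back to the hidden layer before w2 is changed.
        d_hidden = [(1.0 - h * h) * sum(err[k] * self.w2[k][j] for k in range(len(err)))
                    for j, h in enumerate(hidden)]
        for k, e in enumerate(err):
            for j, h in enumerate(hidden):
                self.w2[k][j] -= lr * e * h
            self.b2[k] -= lr * e
        for j, d in enumerate(d_hidden):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d * xi
            self.b1[j] -= lr * d
        return 0.5 * sum(e * e for e in err)

def rmsd(pred, actual):
    """RMS deviation in Å between flat coordinate lists (x, y, z per atom), no superposition."""
    n_atoms = len(pred) // 3
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / n_atoms)

def fraction_within(preds, actuals, cutoff=2.0):
    """Fraction of predicted structures whose RMSD to the actual structure is <= cutoff."""
    hits = sum(1 for p, a in zip(preds, actuals) if rmsd(p, a) <= cutoff)
    return hits / len(preds)

# Toy training set: invented 3-residue sequences with invented coordinates (not real structures).
toy = [("ACD", [0.0, 0.0, 0.0, 3.8, 0.0, 0.0, 7.6, 0.0, 0.0]),
       ("WYV", [0.0, 0.0, 0.0, 0.0, 3.8, 0.0, 0.0, 7.6, 0.0])]
net = TinyBackprop(n_in=60, n_hidden=8, n_out=9)
first_loss = sum(net.train_step(one_hot(s), c) for s, c in toy)
for _ in range(500):  # repeated presentations of the training set
    last_loss = sum(net.train_step(one_hot(s), c) for s, c in toy)
print(first_loss, last_loss)
```

In a real test of generalization, `fraction_within` would be applied to sequences held out of training, and the result compared with the 95% target stated in the abstract; the toy run here only shows the training loss falling over repeated presentations.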

Agency: National Institutes of Health (NIH)
Institute: National Center for Research Resources (NCRR)
Type: Small Research Grants (R03)
Project #: 1R03RR005294-01
Application #: 3431648
Study Section: Biotechnology Resources Review Committee (BRC)
Project Start: 1989-09-30
Project End: 1990-09-29
Budget Start: 1989-09-30
Budget End: 1990-09-29
Support Year: 1
Fiscal Year: 1989
Total Cost:
Indirect Cost:
Name: University of Minnesota Twin Cities
Department:
Type: Schools of Medicine
DUNS #: 168559177
City: Minneapolis
State: MN
Country: United States
Zip Code: 55455