Recent advances in neural network theory, in which the present authors have played a significant role, have resulted in machine learning algorithms of great power. In an initial investigation, the authors have applied these algorithms to detecting and exploiting pattern regularities in DNA and in amino acid sequences. In the two situations considered thus far (determining whether a fragment of DNA codes for a protein, and predicting protein secondary structure from amino acid sequence), the results of the neural net analysis equal or exceed those of conventional methods. We propose to intensively investigate these two problems with the goal of verifying our initial results and extending them, particularly to DNA sequences of other species, especially humans. We plan to expand our investigations to include pattern recognition searches for promoter/terminator sequences, intron/exon splice junctions, and other regulatory signals. Methods for the sequence-to-structure problem will be extended to include new results in energy minimization techniques for analogue models that contain numerous local minima. Different network architectures and different representations for the data will be investigated. When a neural net method exceeds a conventional method in accuracy, we plan to analyze the network connections with the goal of understanding what rules the network developed (by virtue of the learning algorithm) that yielded the increased accuracy. Other machine learning algorithms, such as "classifier systems," will also be applied, as well as new approaches to information theoretic constructions of default hierarchies.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM040789-02
Application #: 3298717
Study Section: (SSS)
Project Start: 1988-07-01
Project End: 1991-06-30
Budget Start: 1989-07-01
Budget End: 1990-06-30
Support Year: 2
Fiscal Year: 1989
Total Cost:
Indirect Cost:
Name: Los Alamos National Lab
Department:
Type: Organized Research Units
DUNS #:
City: Los Alamos
State: NM
Country: United States
Zip Code: 87545
Stolorz, P; Lapedes, A; Xia, Y (1992) Predicting protein secondary structure using neural net and statistical methods. J Mol Biol 225:363-77
Farber, R; Lapedes, A; Sirotkin, K (1992) Determination of eukaryotic protein coding regions using neural networks and information theory. J Mol Biol 226:471-9