Recent advances in neural network theory, in which the present authors have played a significant role, have resulted in machine learning algorithms of great power. In an initial investigation, the authors have applied these algorithms to detecting and exploiting pattern regularities in DNA and in amino acid sequences. In the two situations considered thus far (determining whether or not a fragment of DNA codes for a protein, and predicting protein secondary structure from amino acid sequence), the results of the neural net analysis technique equal or exceed those of conventional methods. We propose to investigate these two problems intensively, with the goal of verifying our initial results and extending them, particularly to DNA sequences of other species, especially humans. We plan to expand our investigations to include pattern recognition searches for promoter/terminator sequences, intron/exon splice junctions, and other regulatory signals. Methods for the sequence-to-structure problem will be extended to include new results in energy minimization techniques for analogue models that contain numerous local minima. Different network architectures and different representations for the data will be investigated. When a neural net method exceeds a conventional method in accuracy, we plan to analyze the network connections with the goal of understanding what rules the network developed (by virtue of the learning algorithm) that yielded the increased accuracy. Other machine learning algorithms, such as "classifier systems," will also be applied, as well as new approaches to information-theoretic constructions of default hierarchies.
Stolorz, P; Lapedes, A; Xia, Y (1992) Predicting protein secondary structure using neural net and statistical methods. J Mol Biol 225:363-77
Farber, R; Lapedes, A; Sirotkin, K (1992) Determination of eukaryotic protein coding regions using neural networks and information theory. J Mol Biol 226:471-9