We will continue our development of methods for recognizing and representing functional domains in biological sequences. This includes methods to identify regulatory sites in DNA starting from unaligned sequences, and to develop models that allow new sites to be predicted accurately. This will involve the adoption of better statistical models so that the most significant alignments can be obtained more readily. We will also develop improved methods for recognizing functional motifs in RNA that are defined by both sequence and structure. These methods will be useful for identifying regulatory domains that operate post-transcriptionally, and for determining the common motifs in RNAs selected in vitro for particular activities. We will further enhance methods for representing conserved domains in protein families so that new members of the families can be identified more reliably. This will involve the use of neural network methods that optimize the discrimination of protein family members from the other sequences in the database. Finally, we will continue several collaborations with biologists who can take advantage of our methods in their work, and develop new collaborations as opportunities arise.