The development of computational tools in molecular biology has often been hindered by the lack of predefined algorithms and data structures corresponding to operations on mathematical objects that represent molecular sequences. Consequently, developing a sequence analysis method often requires first defining (explicitly or implicitly) and implementing such objects and operations. This duplicates effort and, in some cases, leads to poorly designed algorithms. This project seeks to address this problem. To that end, mathematical objects representing various attributes of molecular sequences, together with commonly used operations on these objects, were defined. These include an alphabet (for proteins or nucleic acids), a sequence, a set of sequences, a sequence segment, an alignment of segments, a pattern (that is, a regular expression), a motif (that is, a model representing the frequency with which specific residues or bases occur at various positions in a local multiple alignment), and several types of scoring matrices. These objects and operations were implemented in the C programming language and have facilitated the development of a variety of new methods, including a depth-first pattern-searching algorithm and several Gibbs sampling methods for motif detection, among others. Reusing them can often save substantial development time. The code for these structures is being made available to the biological community over the network.
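
The abstract does not give the project's actual C interfaces. As a minimal sketch under stated assumptions, the following shows how three of the objects it lists (an alphabet, a sequence segment, and a motif kept as per-column residue counts over a gapless local alignment) might be represented in C. The type and function names (`Alphabet`, `Segment`, `Motif`, `motif_add`, `motif_freq`, `motif_odds`), the fixed size limits, the per-residue pseudocount, and the uniform background used for scoring are all hypothetical illustrations, not the project's code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical limits, chosen only for this illustration. */
#define MAX_ALPHA 25
#define MAX_WIDTH 32

/* An alphabet: residue letters in index order, e.g. "ACGT" for DNA. */
typedef struct {
    const char *letters;
    int size;
} Alphabet;

/* A segment: `width` residues of `seq` starting at offset `start`.
   The caller is assumed to ensure start + width lies within seq. */
typedef struct {
    const char *seq;
    int start;
    int width;
} Segment;

/* A motif: residue counts at each column of a gapless local alignment. */
typedef struct {
    const Alphabet *alpha;
    int width;
    int nseg;                          /* number of segments added */
    int counts[MAX_WIDTH][MAX_ALPHA];  /* counts[column][residue index] */
} Motif;

/* Index of residue c in the alphabet, or -1 if c is not a member. */
static int alpha_index(const Alphabet *a, char c) {
    const char *p = (c == '\0') ? NULL : strchr(a->letters, c);
    return p ? (int)(p - a->letters) : -1;
}

/* Start an empty motif of the given width over alphabet a. */
static void motif_init(Motif *m, const Alphabet *a, int width) {
    assert(width > 0 && width <= MAX_WIDTH && a->size <= MAX_ALPHA);
    memset(m, 0, sizeof *m);
    m->alpha = a;
    m->width = width;
}

/* Add one aligned segment. Returns 1 on success, or 0 (leaving the
   motif unchanged) if any residue is not in the alphabet. */
static int motif_add(Motif *m, const Segment *s) {
    int idx[MAX_WIDTH];
    assert(s->width == m->width);
    for (int j = 0; j < m->width; j++) {
        idx[j] = alpha_index(m->alpha, s->seq[s->start + j]);
        if (idx[j] < 0)
            return 0;
    }
    for (int j = 0; j < m->width; j++)
        m->counts[j][idx[j]]++;
    m->nseg++;
    return 1;
}

/* Estimated frequency of residue r at column j, adding pseudocount pc
   to every residue so unseen residues keep a nonzero frequency. */
static double motif_freq(const Motif *m, int j, int r, double pc) {
    return (m->counts[j][r] + pc) / (m->nseg + pc * m->alpha->size);
}

/* Likelihood ratio of a segment under the motif versus a uniform
   background (each residue has probability 1/size). Returns 0.0 for a
   segment containing a residue outside the alphabet (an assumption). */
static double motif_odds(const Motif *m, const Segment *s, double pc) {
    double odds = 1.0;
    assert(s->width == m->width);
    for (int j = 0; j < m->width; j++) {
        int r = alpha_index(m->alpha, s->seq[s->start + j]);
        if (r < 0)
            return 0.0;
        odds *= motif_freq(m, j, r, pc) * m->alpha->size;
    }
    return odds;
}
```

For example, adding the segments ACGT, ACGA and ACGT to a width-4 DNA motif gives the first column a count of 3 for A, so with a pseudocount of 1 its estimated frequency of A is (3 + 1) / (3 + 4) = 4/7. Storing counts rather than frequencies makes it cheap to add or remove a single segment and re-derive the model, which is the kind of update a Gibbs sampling motif search typically repeats many times.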

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Intramural Research (Z01)
Project #
1Z01LM000058-01
Application #
3759326
Support Year
1
Fiscal Year
1994
Name
National Library of Medicine
Country
United States