Technical progress in the isolation and sequence analysis of DNA has greatly accelerated the rate at which new sequences accumulate. The use of computers to store and analyze nucleic acid sequence data has also increased. Computers are currently used in molecular biology for the following purposes: (i) to set up databanks for storing sequences with specific annotations and comments on special sequence features; (ii) to analyze a given sequence for patterns of restriction enzyme sites, repetitive sequences and potential coding regions; (iii) to search for consensus regions carrying specific biological signals; (iv) to search a databank for sequences homologous to a given sequence; and (v) to perform dot-matrix analyses to determine the evolutionary relationship between homologous sequences. Several unusual features that may have functional significance, such as tandem repetitions, interspersed repetitive sequences, consensus sequence regions, and AT-rich or GC-rich regions, have recently been observed in natural DNA sequences. A systematic statistical analysis that brings to light the important distribution characteristics of a given sub-sequence could contribute significantly to the understanding of DNA sequence repetitions and the distributions of other sequence elements. To this end, we have carried out a systematic statistical analysis of DNA sequences, using sequences available from the databanks and various statistical methods.
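
The abstract does not name the statistical methods used. As an illustration only, the kind of sub-sequence statistics it describes might be sketched as follows, assuming plain Python and uppercase A/C/G/T strings. The function names `kmer_counts`, `occurrence_positions`, `gc_fraction` and `windowed_gc` are hypothetical and are not taken from the project.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping sub-sequence of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def occurrence_positions(seq, sub):
    """Return the 0-based start positions of all overlapping occurrences of sub."""
    seq, sub = seq.upper(), sub.upper()
    found = []
    i = seq.find(sub)
    while i != -1:
        found.append(i)
        i = seq.find(sub, i + 1)  # advance by one so overlaps are kept
    return found


def gc_fraction(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def windowed_gc(seq, window, step):
    """GC fraction of each full window of the given size, moved by step bases."""
    return [
        gc_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]
```

For example, `occurrence_positions(seq, "GAATTC")` would list where an EcoRI recognition site starts in `seq`, and `windowed_gc(seq, 100, 50)` would give a coarse profile in which AT-rich or GC-rich stretches stand out. The spacing between successive positions is one simple distribution characteristic of a sub-sequence that could then be examined statistically.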

Agency
National Institutes of Health (NIH)
Institute
Center for Information Technology (CIT)
Type
Intramural Research (Z01)
Project #
1Z01CT000131-01
Application #
4692541
Study Section
Project Start
Project End
Budget Start
Budget End
Support Year
1
Fiscal Year
1985
Total Cost
Indirect Cost
Name
Computer Research and Technology
Department
Type
DUNS #
City
State
Country
United States
Zip Code