DNA sequences show enormous statistical heterogeneity in their local compositional characteristics. This project is investigating the possible causes of this mosaicism in sequence complexity, particularly the roles of mutational biases, medium- and long-range combinatorial constraints, and correlates of specific functional and structural classes of sequence.

A. Distributions of local complexity: Using formal definitions of the local compositional complexity of DNA subsequences, the robustness of the Salamon/Konopka maximum entropy relationship has been explored: given a functionally equivalent set of DNA sequences, the distribution of complexity among all subsequences of this set appears to be as random as possible, consistent with the mean complexity of these subsequences. It has now been shown that this maximum entropy effect follows as a consequence of the dynamics of almost any mutational mechanism that incorporates a bias toward low complexity, for example the neighbor-dependent biases in substitution observed among human genomic mutations.

B. Feathered structure of medium-range complexity distributions: DNA segments 40 to 200 nucleotides long have distributions of compositional complexity with a novel regularity of structure ("feathering"). This is a consequence of the mathematical properties of the local compositional complexity measures when applied to relatively long sequences. The pattern, which is observed in genomic sequences but not in random sequences, reflects the nonuniform representation of different complexity classes in natural DNA sequences.

Significance of project: The project is beginning to provide detailed explanations for some of the puzzling features found in statistical analyses of DNA sequences, including aspects of the so-called "long range correlations" inferred by other research groups from spectral analysis.
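The abstract does not state which formal definition of local compositional complexity the project uses. As an illustration only, the sketch below assumes one common stand-in: the Shannon entropy of the base composition within a sliding window. The function names `window_entropy` and `local_complexity`, and the default window width, are hypothetical and chosen for this example; they are not taken from the project.

```python
from collections import Counter
from math import log2


def window_entropy(window: str) -> float:
    """Shannon entropy (bits per nucleotide) of the base composition of one window.

    Assumption: entropy of composition stands in for the project's
    unspecified "local compositional complexity" measure.
    """
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def local_complexity(seq: str, width: int = 12, step: int = 1) -> list[float]:
    """Entropy of every full-length window of `width` bases along `seq`."""
    return [
        window_entropy(seq[i:i + width])
        for i in range(0, len(seq) - width + 1, step)
    ]


if __name__ == "__main__":
    # A homopolymer run followed by a compositionally mixed stretch.
    example = "AAAAAAAAAAAAACGTACGTTGCA"
    for start, value in enumerate(local_complexity(example, width=8)):
        print(start, round(value, 3))
```

Under this assumed measure, a window containing a single base scores 0 bits and a window with all four bases in equal proportion scores 2 bits. Because the score depends only on the base counts in the window, a fixed window length admits only a finite set of possible values, which is the kind of discreteness that any distribution of such scores has to be read against.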

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Intramural Research (Z01)
Project #
1Z01LM000050-02
Application #
3759318
Support Year
2
Fiscal Year
1994
Name
National Library of Medicine
Country
United States