Dr. Archie will examine patterns of evolution of DNA sequences in a broad array of organisms. Using information in DNA sequence databases, the research will emphasize evaluating levels of accumulated random substitutions within DNA sequences and examining how these substitutions affect our ability to analyze sequence data. It is generally acknowledged that DNA sequences contain a large amount of information on the evolutionary history of living organisms that can be used to make inferences about relationships among species. However, it is also recognized that these sequences contain a large amount of random information, both because of the nature of the origin of change at sites in the sequences and because each site can take only four different forms. In previous research, biologists have assumed that the sequences retain substantial residual information, a by-product of the evolutionary history of the organisms, that can be decoded using modern computer-based methods of analysis. However, the methods for analyzing these data break down and produce less certain results as the level of randomness increases. Levels of randomness are expected to vary among different regions of the DNA within the nucleus, as well as in nucleotide sequences outside of the nucleus, and are expected to increase with the amount of time since the species being analyzed diverged. Since many species and species groups being compared have diverged over millions, tens of millions, or even hundreds of millions of years, biologists expect substantial amounts of random information to be present in the sequence data in many comparisons. The research program will analyze levels of random sequence information for a variety of different groups of organisms and for different areas in the genome.
It will rely on sequence information available in DNA and protein sequence databases. These databases are growing rapidly in size because of recent technological advances in molecular biology. The sensitivity of a variety of numerical techniques for analyzing sequence data to the accumulation of random information will be examined. Several new analytical techniques have been developed over the last 2-3 years and have generated a substantial amount of interest. However, neither these nor the older techniques have been examined from the perspective to be used in this research. The results from the study will help to indicate which sequences and parts of sequences will be useful for inferring relationships among species and species groups by identifying regions of the genome that tend to accumulate evolutionarily useful and random changes at different rates. The results will also help indicate the lengths of sequences that must be examined in order to make reasonable inferences about evolutionary relationships.