This project is a continuing study of which similarities can be expected to occur purely by chance when two protein or DNA sequences are compared. A related, subsidiary question is how to define scoring systems that are optimal for distinguishing biologically meaningful patterns from chance similarities. Advances this year include a study of the distribution of optimal scores from local alignments allowing gaps, which showed empirically that the characteristic value of this distribution grows linearly with the log of the search space size and does not require a log-log term. As a result, the two relevant statistical parameters can be determined from a random simulation for a single search space size. These parameters were estimated for a number of frequently used amino acid substitution matrices and a wide range of gap penalties. It was also shown that the statistics for the sum of the scores of the best locally optimal segment pairs (Karlin & Altschul, 1993) may be extended to alignments allowing gaps. Together, these advances made possible a modification of the BLAST database search programs that permits gaps and reports accurate statistical significance. The work was done in collaboration with Warren Gish (Washington University, St. Louis) and is described in a paper soon to appear in Methods in Enzymology.
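As an illustration of how two parameters might be recovered from a simulation at a single search space size, the sketch below assumes that the parameters are the conventional Karlin-Altschul lambda and K, and that optimal scores follow an extreme value (Gumbel) distribution with characteristic value u = ln(K m n) / lambda for sequence lengths m and n. It fits lambda and K by the method of moments, which is only one simple estimator and not necessarily the one used in the study. The function names, the synthetic continuous scores, and the numeric values are hypothetical; a real simulation would align random sequences and record integer scores.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def estimate_lambda_k(scores, m, n):
    """Method-of-moments fit of assumed Gumbel parameters (lambda, K).

    Uses the Gumbel identities: variance = pi^2 / (6 lambda^2) and
    mean = u + gamma / lambda, with u = ln(K m n) / lambda.
    """
    mean = statistics.fmean(scores)
    std = statistics.stdev(scores)
    lam = math.pi / (std * math.sqrt(6.0))
    u = mean - EULER_GAMMA / lam           # characteristic value
    k = math.exp(lam * u) / (m * n)
    return lam, k


def expected_evalue(score, lam, k, m, n):
    """Expected number of chance alignments scoring at least `score`."""
    return k * m * n * math.exp(-lam * score)


def simulate_gumbel_scores(lam, k, m, n, count, seed=0):
    """Hypothetical stand-in for a random-sequence simulation.

    Draws continuous scores directly from the assumed Gumbel law by
    inverse-CDF sampling instead of aligning random sequences.
    """
    rng = random.Random(seed)
    u = math.log(k * m * n) / lam
    scores = []
    while len(scores) < count:
        p = rng.random()
        if p == 0.0:                       # avoid log(0); vanishingly rare
            continue
        scores.append(u - math.log(-math.log(p)) / lam)
    return scores


if __name__ == "__main__":
    m = n = 1000                           # one fixed search space size
    sample = simulate_gumbel_scores(0.27, 0.04, m, n, count=20000, seed=1)
    lam_hat, k_hat = estimate_lambda_k(sample, m, n)
    print(f"lambda ~ {lam_hat:.3f}, K ~ {k_hat:.3f}")
    print(f"E-value at score 60: {expected_evalue(60, lam_hat, k_hat, m, n):.3g}")
```

Because the characteristic value is assumed to be linear in the log of the search space size, parameters fitted at one size can be reused in expected_evalue for other values of m and n; a log-log correction term would break that reuse.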