This project is a continuing study of what similarities can be expected to arise purely by chance when two protein or DNA sequences are compared. A related, subsidiary question is how to define scoring systems that best distinguish biologically meaningful patterns from chance similarities. Work this year includes:

a) A study of the distribution of optimal scores for local alignments that allow gaps. We showed empirically that the characteristic value of this distribution grows linearly with the logarithm of the search space size, with no need for a log-log term. Consequently, the two relevant statistical parameters (conventionally denoted λ and K) can be determined from a random simulation at a single search space size. These parameters were estimated for a number of frequently used amino acid substitution matrices over a wide range of gap costs. We also showed that the statistics for the sum of the scores of the r best locally optimal segment pairs extend to alignments that allow gaps. Together, these advances made possible a modification of the BLAST database search programs that permits gaps and reports accurate statistical significance estimates.

b) A refinement of the statistical treatment of multiple distinct, locally optimal subalignments arising from the comparison of two sequences. When two proteins share several distinct regions of similarity, their statistical significance should be assessed jointly. Earlier treatments allowed the relative order of the corresponding regions within the two sequences to be taken into account; the new treatment also permits constraints to be placed on the distances between the conserved regions.
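The random simulation described in item a) needs the optimal gapped local alignment score for many pairs of random sequences of fixed lengths m and n. The sketch below computes that score with the Smith-Waterman algorithm using Gotoh's affine-gap recurrences, then collects scores over random trials. It is illustrative only: the uniform four-letter alphabet, the match/mismatch scores, the gap-cost convention (a gap of length k costs gap_open + k * gap_extend), and the names gapped_local_score and simulate_max_scores are assumptions, not the amino acid substitution matrices or software used in the study.

```python
import random


def gapped_local_score(a, b, match=1, mismatch=-2, gap_open=5, gap_extend=2):
    """Optimal local alignment score with affine gaps (Smith-Waterman-Gotoh).

    Assumed convention: a gap of length k costs gap_open + k * gap_extend.
    Keeps only two rows of each matrix, so memory is O(len(b)).
    """
    neg_inf = float("-inf")
    n = len(b)
    h_prev = [0] * (n + 1)        # H, previous row: best local score ending at each cell
    e_prev = [neg_inf] * (n + 1)  # E, previous row: alignments ending with a residue of a opposite a gap
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * (n + 1)
        e_cur = [neg_inf] * (n + 1)
        f = neg_inf  # F, current row: alignments ending with a residue of b opposite a gap
        for j in range(1, n + 1):
            # Either extend an existing gap or open a new one from H.
            e_cur[j] = max(e_prev[j] - gap_extend, h_prev[j] - gap_open - gap_extend)
            f = max(f - gap_extend, h_cur[j - 1] - gap_open - gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: floor at 0, i.e. allow a fresh start anywhere.
            h = max(0, h_prev[j - 1] + s, e_cur[j], f)
            h_cur[j] = h
            best = max(best, h)
        h_prev, e_prev = h_cur, e_cur
    return best


def simulate_max_scores(m, n, trials, alphabet="ACGT", seed=0, **scoring):
    """Optimal gapped local scores for `trials` pairs of random sequences
    of lengths m and n, letters drawn uniformly from `alphabet`."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        a = "".join(rng.choice(alphabet) for _ in range(m))
        b = "".join(rng.choice(alphabet) for _ in range(n))
        scores.append(gapped_local_score(a, b, **scoring))
    return scores
```

A pure-Python loop does m * n work per trial, so a realistic simulation would use compiled code; the recurrences are the point here. Gap costs must be large enough that optimal scores grow logarithmically with sequence length; with very weak gap penalties, scores grow linearly and an extreme-value fit is not meaningful.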
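Item a) relies on the characteristic value of the optimal-score distribution being linear in the log of the search space size, i.e. u = ln(Kmn)/λ for sequences of lengths m and n. Under that assumption, scores simulated at one value of m * n suffice to recover both λ and K. The sketch below fits an extreme-value (Gumbel) distribution by the method of moments (Gumbel variance π²/(6λ²), mean u + γ/λ with γ Euler's constant), inverts u = ln(Kmn)/λ for K, and converts a score S to an expected count E = Kmn e^(-λS) and a P-value 1 - e^(-E). Method of moments is a swapped-in choice for brevity (maximum likelihood is a common alternative), and the function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_lambda_k(scores, m, n):
    """Method-of-moments Gumbel fit to optimal scores simulated at a single
    search space size m*n. Returns (lam, K), assuming the characteristic
    value is u = ln(K*m*n) / lam."""
    count = len(scores)
    mean = sum(scores) / count
    var = sum((s - mean) ** 2 for s in scores) / (count - 1)
    lam = math.pi / math.sqrt(6.0 * var)  # Gumbel variance: pi^2 / (6 lam^2)
    u = mean - EULER_GAMMA / lam          # Gumbel mean: u + gamma / lam
    k = math.exp(lam * u) / (m * n)       # invert u = ln(K m n) / lam
    return lam, k


def expected_count(score, lam, k, m, n):
    """E = K m n exp(-lam * score): approximate expected number of chance
    alignments scoring at least `score`."""
    return k * m * n * math.exp(-lam * score)


def p_value(score, lam, k, m, n):
    """P = 1 - exp(-E), using expm1 for accuracy when E is small."""
    return -math.expm1(-expected_count(score, lam, k, m, n))
```

Because u depends on m and n only through ln(mn), the fitted λ and K can then be applied at other search space sizes, to the extent the linearity holds. With integer-valued scores the moment estimates are only approximate.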
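Item a) also extends sum statistics for the r best segment pairs to gapped alignments. In the Karlin-Altschul sum statistic, each score S_i is normalized as S'_i = λS_i - ln(Kmn), and for a large sum t of the normalized scores the tail probability is approximately P(T >= t) ≈ e^(-t) t^(r-1) / (r!(r-1)!). For r = 1 this reduces to e^(-t) = Kmn e^(-λS), the single-score expected count. The sketch below applies that asymptotic formula with λ and K assumed to come from a gapped fit; the function name is hypothetical, and it ignores the ordering and distance constraints discussed in item b).

```python
import math


def sum_statistic_pvalue(scores, lam, k, m, n):
    """Asymptotic Karlin-Altschul sum-statistic P-value for r scores.

    Each score is normalized as lam*S - ln(K*m*n); for a large sum t of the
    normalized scores, P(T >= t) ~ exp(-t) * t**(r-1) / (r! * (r-1)!).
    """
    r = len(scores)
    if r == 0:
        raise ValueError("need at least one score")
    log_kmn = math.log(k * m * n)
    t = sum(lam * s - log_kmn for s in scores)
    if t <= 0:
        # The large-t approximation does not apply; 1.0 is a trivial bound.
        return 1.0
    p = math.exp(-t) * t ** (r - 1) / (math.factorial(r) * math.factorial(r - 1))
    return min(1.0, p)
```

The design point is that several moderately scoring regions can together be more significant than the best of them alone. A fuller treatment, such as the one outlined in item b), would also take the order and spacing of the regions into account; those refinements are omitted here.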