This project is a continuing study of what similarities can be expected to occur purely by chance when two protein or DNA sequences are compared. A related, subsidiary question concerns the definition of scoring systems that are optimal for distinguishing biologically meaningful patterns from chance similarities. Work this year includes: a) The definition of a new method for scoring gaps within protein alignments, and the empirical study of the statistics of optimal alignment scores using this scoring system. Because a single mutational event can delete or insert multiple residues, affine gap costs for sequence alignment charge a penalty for the existence of a gap, and a further length-dependent penalty. Structural and multiple alignments of distantly related proteins show that conserved residues frequently fall into ungapped blocks separated by relatively non-conserved regions. To take advantage of this structure, a simple generalization of affine gap costs was proposed which allows non-conserved regions to be effectively ignored. The distribution of scores from local alignments using these generalized gap costs was shown empirically to follow an extreme value distribution. In many cases generalized affine gap costs yield superior alignments in terms of both statistical significance and alignment accuracy. Guidelines for selecting generalized affine gap costs were developed. b) The development of statistics for local alignments seeded by a pattern. The recently developed PHI-BLAST program constructs optimal local alignments seeded by a pattern specified by a researcher. The random distribution of these local alignments was studied both analytically and empirically.
The statistics developed were incorporated into the PHI-BLAST program, allowing it in many instances to detect significant similarity between homologous proteins that were not recognizably related using traditional single-pass database search methods.
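The two technical ingredients of item a) can be illustrated with a minimal sketch. The text above does not state the gap-cost formula, so the code assumes one common form of generalized affine gap costs: a gap region that leaves k1 residues of one sequence and k2 residues of the other unaligned costs a + b*|k1 - k2| + c*min(k1, k2). When k2 = 0 this reduces to the ordinary affine cost a + b*k1. The extreme value tail is likewise assumed to take the usual form P(S >= x) = 1 - exp(-K*m*n*exp(-lambda*x)). The function names, the parameter names (a, b, c, lam, K) and all numeric values are hypothetical and chosen for illustration only; in practice lam and K would have to be estimated empirically for the chosen scoring system.

```python
import math


def generalized_affine_gap_cost(k1, k2, a, b, c):
    """Cost of a gap region leaving k1 and k2 residues unaligned.

    Assumed form (not given in the text): a + b*|k1 - k2| + c*min(k1, k2).
    a: cost for the existence of the gap region
    b: cost per residue of length difference (ordinary gap extension)
    c: cost per pair of residues left unaligned against each other
    """
    if k1 < 0 or k2 < 0:
        raise ValueError("residue counts must be non-negative")
    if k1 == 0 and k2 == 0:
        return 0  # no gap region at all, so nothing is charged
    return a + b * abs(k1 - k2) + c * min(k1, k2)


def evd_p_value(score, m, n, lam, K):
    """Tail probability of a local alignment score under an assumed
    extreme value distribution: 1 - exp(-K*m*n*exp(-lam*score)).

    m, n: lengths of the two sequences (or query and database)
    lam, K: statistical parameters, assumed to be estimated empirically
    """
    expected_hits = K * m * n * math.exp(-lam * score)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x
    return -math.expm1(-expected_hits)
```

With the illustrative values a = 11, b = 1, c = 3, a gap region with 5 and 3 unaligned residues costs 11 + 1*2 + 3*3 = 22, whereas a plain gap of 5 residues costs 11 + 1*5 = 16. A small c makes it cheap to skip a pair of non-conserved stretches, which is the behavior the generalization is meant to allow.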