Inexact simple repeats can be quantified as local sums in a Markov additive process (MAP). The maximum of the local sums has an asymptotic Gumbel distribution, whose parameters are given by general MAP formulas. These general formulas are usually computationally intractable, but an essential simplification in the case of repeats permits the Gumbel parameters to be computed from matrices whose dimension equals the size of the relevant alphabet. My analytic results for ungapped repeats are more detailed than those derived by simulation in G. Achaz et al. (2006), "Repseek, a tool to retrieve approximate repeats from large DNA sequences," Bioinformatics. A working prototype of a program that finds repeats with the Ruzzo-Tompa algorithm and then evaluates the results is available internally within NCBI. Martin Frith is currently incorporating our code for inexact gapless repeats into a tool for detecting genomic repeats. We are also making significant progress in quantifying the statistics of gapped repeats.
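The abstract mentions finding repeats with the Ruzzo-Tompa algorithm. The Python code below is a minimal sketch of the standard Ruzzo-Tompa procedure for listing all maximal scoring subsequences, applied to the ungapped case by scoring one diagonal (offset `d`) of a sequence against itself. It is not the NCBI prototype, and it does not cover the generalised version with gaps. The helper names `ruzzo_tompa` and `diagonal_scores` are hypothetical, as is the +1/-1 match/mismatch scoring.

```python
from typing import List, Sequence, Tuple

# (start, end_exclusive, L, R): L is the cumulative score before `start`,
# R is the cumulative score through position end_exclusive - 1.
Segment = Tuple[int, int, float, float]


def ruzzo_tompa(scores: Sequence[float]) -> List[Tuple[int, int, float]]:
    """Return all maximal scoring subsequences as (start, end_exclusive, score)."""
    segs: List[Segment] = []
    cum: float = 0
    for i, s in enumerate(scores):
        if s <= 0:
            # Non-positive scores never start a segment; they only shift the running total.
            cum += s
            continue
        start, end, lo, hi = i, i + 1, cum, cum + s
        cum += s
        while True:
            # Step 1: find the rightmost segment j whose L_j is below the new segment's L.
            j = len(segs) - 1
            while j >= 0 and segs[j][2] >= lo:
                j -= 1
            if j < 0 or segs[j][3] >= hi:
                # Steps 2-3: no such j, or R_j >= R, so keep the new segment as is.
                segs.append((start, end, lo, hi))
                break
            # Step 4: extend the new segment left to the start of segment j,
            # discard segments j onward (none are maximal), and search again.
            start, lo = segs[j][0], segs[j][2]
            del segs[j:]
    return [(a, b, r - l) for a, b, l, r in segs]


def diagonal_scores(seq: str, d: int, match: int = 1, mismatch: int = -1) -> List[int]:
    """Hypothetical ungapped repeat scoring: compare seq[i] with seq[i + d]."""
    return [match if seq[i] == seq[i + d] else mismatch for i in range(len(seq) - d)]


if __name__ == "__main__":
    # Period-4 repeat with one mismatch: ACGT ACGA ACGT
    print(ruzzo_tompa(diagonal_scores("ACGTACGAACGT", 4)))  # [(0, 7, 5)]
```

In the usage example, the single mismatch is bridged, so one segment of score 5 reports positions 0 to 6 aligned with positions 4 to 10, an inexact tandem repeat of period 4. This straightforward version rescans the segment list at each step and can take quadratic time in the worst case. The published algorithm adds extra bookkeeping to run in linear time.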

Support Year: 2
Fiscal Year: 2009
Total Cost: $38,700
Name: National Library of Medicine
Spouge, John L; Mariño-Ramírez, Leonardo; Sheetlin, Sergey L (2014) Searching for repeats, as an example of using the generalised Ruzzo-Tompa algorithm to find optimal subsequences with gaps. Int J Bioinform Res Appl 10:384-408