Inexact simple repeats can be quantified as local sums in a Markov additive process (MAP). The maximum of the local sums has an asymptotic Gumbel distribution, whose parameters are given by general MAP formulas. The general MAP formulas are usually computationally intractable, but an essential simplification in the case of repeats permits the Gumbel parameters to be computed from matrices whose dimension equals the size of the relevant alphabet. Dr. Spouge's analytic results for ungapped repeats are more detailed than those derived by simulation in G. Achaz et al. (2006) Repseek, a tool to retrieve approximate repeats from large DNA sequences. Dr. Sheetlin has a working prototype of a program for finding gapped tandem repeats using variants of the Ruzzo-Tompa algorithm, and Dr. Mariño-Ramírez is using empirical methods to determine the efficacy of the repeat retrieval methods. Drs. Sheetlin and Guirguis are extending the methods for gapped tandem repeats to more general repeat-finding techniques based on hidden Markov models.
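To make the idea of a repeat as a local sum concrete, the following is a minimal sketch, not the project's software. It assumes an ungapped repeat at a fixed lag, scores each position +1 when a letter equals the letter `lag` positions later and a negative value otherwise, and takes the highest-scoring contiguous run. The function names, the scoring values, and the use of a simple maximum-subarray scan are all illustrative assumptions; the scan returns only the single best segment, whereas the Ruzzo-Tompa algorithm finds all maximal scoring segments, and nothing here computes the Gumbel parameters.

```python
def best_local_sum(scores):
    """Return (best_sum, start, end) for the highest-scoring contiguous run.

    `end` is exclusive. An empty run with sum 0 is returned if no score
    is positive. This is a plain maximum-subarray scan (an assumption
    made for illustration), not the Ruzzo-Tompa algorithm.
    """
    best, best_start, best_end = 0, 0, 0
    running, run_start = 0, 0
    for i, s in enumerate(scores):
        if running <= 0:
            # A non-positive prefix can never help; restart the run here.
            running, run_start = 0, i
        running += s
        if running > best:
            best, best_start, best_end = running, run_start, i + 1
    return best, best_start, best_end


def tandem_repeat_score(seq, lag, match=1, mismatch=-2):
    """Score an ungapped repeat at a fixed lag (hypothetical helper).

    Position i contributes `match` if seq[i] == seq[i + lag] and
    `mismatch` otherwise; the default values are arbitrary.
    """
    scores = [match if seq[i] == seq[i + lag] else mismatch
              for i in range(len(seq) - lag)]
    return best_local_sum(scores)


if __name__ == "__main__":
    # Exact period-3 repeat flanked by unrelated letters.
    print(tandem_repeat_score("TTACGACGACGTT", lag=3))
    # Inexact repeat: a milder mismatch penalty lets the run span a mismatch.
    print(tandem_repeat_score("ACGACGATGACG", lag=3, mismatch=-1))
```

With a mismatch penalty of -2 the best run stops at the first mismatch, while a penalty of -1 lets it extend across one; this is the sense in which the scoring scheme determines how inexact a retrieved repeat may be.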
Spouge, John L; Mariño-Ramírez, Leonardo; Sheetlin, Sergey L (2014) Searching for repeats, as an example of using the generalised Ruzzo-Tompa algorithm to find optimal subsequences with gaps. Int J Bioinform Res Appl 10:384-408