Inexact simple repeats can be quantified as local sums in a Markov additive process (MAP). The maximum of the local sums has an asymptotic Gumbel distribution, whose parameters are given by general MAP formulas. These formulas are usually computationally intractable, but an essential simplification in the case of repeats permits the Gumbel parameters to be computed from matrices whose dimension equals the size of the relevant alphabet. My analytic results for ungapped repeats are more detailed than those derived by simulation in G. Achaz et al. (2006), "Repseek, a tool to retrieve approximate repeats from large DNA sequences," Bioinformatics. A working prototype of a program that finds repeats with the Ruzzo-Tompa algorithm and then evaluates the results is available internally within NCBI.
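As an illustration of the repeat-finding step only, the following is a minimal Python sketch of the Ruzzo-Tompa procedure for extracting all maximal-scoring segments from a sequence of real-valued scores. It is not the NCBI prototype. The helper `shift_scores`, its match/mismatch values (+1 and -2), and the idea of scoring a sequence against a shifted copy of itself are assumptions made for this example; the abstract does not specify a scoring scheme. The Gumbel evaluation step is not shown, because its parameters come from the MAP analysis described above.

```python
def ruzzo_tompa(scores):
    """Return all maximal-scoring segments as (start, end, score),
    with half-open index ranges [start, end)."""
    candidates = []  # each entry: [start, end, L, R]
    total = 0        # cumulative score so far
    for i, s in enumerate(scores):
        prev = total
        total += s
        if s <= 0:
            continue  # non-positive scores never start a segment
        # New one-element candidate; L is the cumulative total before
        # the segment, R the cumulative total at its end.
        cur = [i, i + 1, prev, total]
        while True:
            # Find the rightmost earlier candidate with a smaller L.
            j = len(candidates) - 1
            while j >= 0 and candidates[j][2] >= cur[2]:
                j -= 1
            if j < 0 or candidates[j][3] >= cur[3]:
                candidates.append(cur)
                break
            # Otherwise extend cur leftward to absorb candidate j,
            # discard candidates j onward, and re-examine.
            cur = [candidates[j][0], cur[1], candidates[j][2], cur[3]]
            del candidates[j:]
    return [(a, b, r - l) for a, b, l, r in candidates]


def shift_scores(seq, shift, match=1, mismatch=-2):
    """Hypothetical scoring: compare seq[i] with seq[i + shift]."""
    return [match if seq[i] == seq[i + shift] else mismatch
            for i in range(len(seq) - shift)]


example = [4, -5, 3, -3, 1, 2, -2, 2, -2, 1, 5]
print(ruzzo_tompa(example))
print(ruzzo_tompa(shift_scores("ACGTACGT", 4)))
```

Each returned segment is a local sum that cannot be extended or trimmed to increase its score; the largest of these is the quantity whose distribution the Gumbel approximation describes.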

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Intramural Research (Z01)
Project #: 1Z01LM200882-01
Application #: 7735090
Study Section:
Project Start:
Project End:
Budget Start:
Budget End:
Support Year: 1
Fiscal Year: 2008
Total Cost: $17,243
Indirect Cost:
Name: National Library of Medicine
Department:
Type:
DUNS #:
City:
State:
Country: United States
Zip Code: