A tandem repeat is an occurrence of two or more adjacent, often approximate, copies of a nucleotide sequence. Tandem repeats have known functional roles, including coding with loss of function, switching, and acting as modifiers of gene expression. They are also primary components of chromosomal structures, and they are useful for genetic linkage analysis, bacterial strain typing, DNA fingerprinting, and studies of changes in DNA over short time scales.
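To make the definition concrete, the following is a minimal Python sketch that finds exact tandem repeats by direct scanning. It is an illustration only and not the genome-scale software described in this abstract: the function name `find_exact_tandem_repeats` and its parameters `min_unit`, `max_unit`, and `min_copies` are hypothetical, and the sketch deliberately ignores approximate copies (mismatches and indels), which real repeat-finding tools must handle.

```python
def find_exact_tandem_repeats(seq, min_unit=1, max_unit=6, min_copies=2):
    """Return sorted (start, unit, copies) tuples for runs of exact adjacent copies.

    Hypothetical helper for illustration: exact matches only, naive scanning.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        i = 0
        while i + 2 * unit_len <= n:
            unit = seq[i:i + unit_len]
            # Skip units that are themselves repeats of a shorter unit
            # (e.g. "TT" or "ACAC"); the shorter unit length reports those.
            if unit in (unit + unit)[1:-1]:
                i += 1
                continue
            # Count how many adjacent exact copies of the unit start at i.
            copies = 1
            while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
                copies += 1
            if copies >= min_copies:
                hits.append((i, unit, copies))
                i += copies * unit_len  # jump past the whole run
            else:
                i += 1
    return sorted(hits)


if __name__ == "__main__":
    for start, unit, copies in find_exact_tandem_repeats("GACACACTTTG", max_unit=3):
        print(f"start={start} unit={unit} copies={copies}")
```

Each reported tuple gives the 0-based start position, the repeat unit, and the copy number. Extending this to approximate copies would require an alignment-based comparison between adjacent copies, which this sketch does not attempt.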
Identification of tandem repeats has been made easier by new software that can process an entire genome. This rapid analysis permits the identification and annotation of repeats and their clustering into families for further study. A multi-genome Tandem Repeats Database will bring together information about repeats and serve as the platform for the development of new tools. These tools include algorithms for comparing and clustering repeats and for identifying predictive criteria for copy number polymorphisms. They will further enable the annotation of repeats and repeat families, including genomic environment, copy number polymorphisms, whole-genome properties, and family properties. All of this will be available through a web site with integrated data visualization and a data model specification for transfer to other formats.