The genomes of most eukaryotes include vast numbers of interspersed repeats (IRs), which are the remnants of mostly selfishly amplified transposable elements. Transposable elements have an exceptionally wide-ranging mutagenic effect on genomes, while recognition of IRs provides unparalleled information on genome evolution and is crucial in many aspects of bioinformatics. This grant would support maintenance and further development of RepeatMasker, a computational tool that has become the de facto standard for identification and characterization of IRs, and the GESTALT Workbench, a graphical user interface for detailed visualization of RepeatMasker results in their genomic context. The source code of these tools is freely available to the academic community. Development will emphasize the following: a) RepeatMasker needs to be rewritten so that it can scale with the increasing amount and variety of genomic sequence data. b) "Phylogenetic interpolation" will be used to construct repeat libraries for species for which only limited sequence is available (e.g. macaque, dog). This will be done by selecting the relevant subset of IR families from related species and by applying appropriate, lineage-specific alignment parameters. c) RepeatMasker's functionality for contamination detection and for masking only lineage-specific IRs will be substantially expanded, to facilitate the generation of interspecies genomic alignments. d) A public web server for RepeatMasker and GESTALT will be built at the Institute for Systems Biology. This server will enable real-time analysis of private sequences and offer pre-computed RepeatMasker results for all publicly available genomic sequences. It will also include novel repeat-based analysis services, such as genome sequence comparison, contamination detection and transcript prediction.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section
Genome Study Section (GNM)
Program Officer
Good, Peter J
Institute for Systems Biology
United States
Agarwal, Prasoon; Enroth, Stefan; Teichmann, Martin et al. (2016) Growth signals employ CGGBP1 to suppress transcription of Alu-SINEs. Cell Cycle 15:1558-71
Hubley, Robert; Finn, Robert D; Clements, Jody et al. (2016) The Dfam database of repetitive DNA families. Nucleic Acids Res 44:D81-9
Hoen, Douglas R; Hickey, Glenn; Bourque, Guillaume et al. (2015) A call for benchmarking transposable element annotation methods. Mob DNA 6:13
Suh, Alexander; Churakov, Gennady; Ramakodi, Meganathan P et al. (2015) Multiple lineages of ancient CR1 retroposons shaped the early genome evolution of amniotes. Genome Biol Evol 7:205-17
Rosenbloom, Kate R; Armstrong, Joel; Barber, Galt P et al. (2015) The UCSC Genome Browser database: 2015 update. Nucleic Acids Res 43:D670-81
Carbone, Lucia; Harris, R Alan; Gnerre, Sante et al. (2014) Gibbon genome and the fast karyotype evolution of small apes. Nature 513:195-201
Caballero, Juan; Smit, Arian F A; Hood, Leroy et al. (2014) Realistic artificial DNA sequences as negative controls for computational genomics. Nucleic Acids Res 42:e99
Knijnenburg, Theo A; Ramsey, Stephen A; Berman, Benjamin P et al. (2014) Multiscale representation of genomic signals. Nat Methods 11:689-94
Green, Richard E; Braun, Edward L; Armstrong, Joel et al. (2014) Three crocodilian genomes reveal ancestral patterns of evolution among archosaurs. Science 346:1254449
Chong, Amanda Y; Kojima, Kenji K; Jurka, Jerzy et al. (2014) Evolution and gene capture in ancient endogenous retroviruses - insights from the crocodilian genomes. Retrovirology 11:71