Most eukaryotic genomes include vast numbers of interspersed repeats (IRs), which are the remnants of mostly selfishly amplified transposable elements. Transposable elements have an exceptionally wide-ranging mutagenic effect on genomes, while recognition of IRs provides unparalleled information on genome evolution and is crucial in many aspects of bioinformatics. This grant would continue support for the maintenance and further development of RepeatMasker, a computational tool that has become the de facto standard for identification and characterization of IRs, and support the development of RepeatModeler, a program designed to derive RepeatMasker-grade databases of IR consensus sequences. The source code for these tools is freely available to the public. Development will emphasize the following:
a) With the rapid growth in the number of sequenced mammalian species, the building of mammalian repeat libraries has become our highest priority. The RepeatModeler program already excels in its consensus building ability and IR classification scheme, but it is still in an early phase and many modules need to be developed.
b) RepeatMasker development will initially be focused on the annotation modules. These need to be parallelized and made auditable in order to link annotations to the relevant database entries. We also present strategies to improve RepeatMasker's detection of ancient, highly fragmented IRs and of IRs in draft genomes, and one that allows it to recognize genomic recombination sites within IRs.
c) For many applications of RepeatMasker, including interspecies genome alignments and inference of species phylogenies, knowledge of the age and species distribution of IRs is crucial. We aim to automate and refine the process of "phylogenetic labeling" of consensus sequences in the library.
d) We will further develop our website by adding our transcript prediction program FEAST, increasing the number of pre-analyzed genomes, expanding our new protein-based repeat masking services, and optionally presenting data in a graphical form.
Agarwal, Prasoon; Enroth, Stefan; Teichmann, Martin et al. (2016) Growth signals employ CGGBP1 to suppress transcription of Alu-SINEs. Cell Cycle 15:1558-71
Hubley, Robert; Finn, Robert D; Clements, Jody et al. (2016) The Dfam database of repetitive DNA families. Nucleic Acids Res 44:D81-9
Hoen, Douglas R; Hickey, Glenn; Bourque, Guillaume et al. (2015) A call for benchmarking transposable element annotation methods. Mob DNA 6:13
Suh, Alexander; Churakov, Gennady; Ramakodi, Meganathan P et al. (2015) Multiple lineages of ancient CR1 retroposons shaped the early genome evolution of amniotes. Genome Biol Evol 7:205-17
Rosenbloom, Kate R; Armstrong, Joel; Barber, Galt P et al. (2015) The UCSC Genome Browser database: 2015 update. Nucleic Acids Res 43:D670-81
Carbone, Lucia; Harris, R Alan; Gnerre, Sante et al. (2014) Gibbon genome and the fast karyotype evolution of small apes. Nature 513:195-201
Caballero, Juan; Smit, Arian F A; Hood, Leroy et al. (2014) Realistic artificial DNA sequences as negative controls for computational genomics. Nucleic Acids Res 42:e99
Knijnenburg, Theo A; Ramsey, Stephen A; Berman, Benjamin P et al. (2014) Multiscale representation of genomic signals. Nat Methods 11:689-94
Green, Richard E; Braun, Edward L; Armstrong, Joel et al. (2014) Three crocodilian genomes reveal ancestral patterns of evolution among archosaurs. Science 346:1254449
Chong, Amanda Y; Kojima, Kenji K; Jurka, Jerzy et al. (2014) Evolution and gene capture in ancient endogenous retroviruses - insights from the crocodilian genomes. Retrovirology 11:71