Technological advances in DNA sequencing are enabling new areas of genomic research, allowing fine-grained connections to be drawn between genomic variants and phenotype. We study transcriptomes and genomes from a variety of eukaryotic and prokaryotic non-model organisms and have contributed biological insights from these data. We are currently collaborating with researchers at Corpoica, Colombia, to continue characterizing the Cape gooseberry transcriptome with additional data generated by next-generation sequencing technologies, and we plan to construct a database using NCBI tools to display the newly assembled transcriptome. Additionally, in collaboration with Richa Agarwala and Roberto Vera Alvarez, we are assembling and annotating the genomes of native Salmonella enterica isolates from Colombia using NCBI's Prokaryotic Genome Annotation Pipeline.
In collaboration with John Spouge and Sergey Sheetlin, we used the generalized Ruzzo-Tompa algorithm to find repeats in genomic sequences and implemented it in a program called RepWords. The implementation allows the penalized deletion of unfavorable letters, so the algorithmic generalization includes gaps. RepWords, which finds inexact simple repeats in DNA, exemplifies the general concepts by outperforming similar, previously described ad hoc tools.
In another study of repeated sequences in genomes, carried out in collaboration with I. King Jordan, we tested the possibility that mammalian-wide interspersed repeats (MIRs) contribute functional enhancers to the human genome. We found that MIRs are highly concentrated in enhancers of the K562 and HeLa human cell types, and that MIR-derived enhancers are a rich source of transcription factor binding sites. MIRs can therefore exercise a regulatory function in the human genome, and this study helps explain their role as the most ancient family of transposable elements.
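To illustrate the kind of computation behind the Ruzzo-Tompa approach mentioned above, here is a minimal Python sketch of the standard (ungapped) algorithm for finding all maximal scoring segments in a list of scores. It is not the RepWords implementation and does not include the gap generalization described above. The function names, the period-based scoring scheme, and the match and mismatch values are assumptions made only for illustration. For brevity, the sketch scans the candidate list directly, so it does not achieve the linear running time of the original algorithm.

```python
def maximal_scoring_segments(scores):
    """Return (start, end, score) for every maximal scoring segment.

    Segments are half-open index ranges [start, end) into `scores`.
    Follows the standard Ruzzo-Tompa candidate-list procedure, using a
    plain right-to-left scan instead of the linked pointers that make
    the original algorithm linear time.
    """
    candidates = []  # each entry: [start, end, L, R] with L/R cumulative sums
    total = 0
    for i, s in enumerate(scores):
        before = total
        total += s
        if s <= 0:
            continue  # non-positive scores only change the running total
        cand = [i, i + 1, before, total]
        while True:
            # Find the rightmost earlier candidate whose L is strictly smaller.
            j = len(candidates) - 1
            while j >= 0 and candidates[j][2] >= cand[2]:
                j -= 1
            if j < 0 or candidates[j][3] >= cand[3]:
                candidates.append(cand)
                break
            # Otherwise extend cand leftwards over candidates j..end and retry.
            cand = [candidates[j][0], cand[1], candidates[j][2], cand[3]]
            del candidates[j:]
    return [(a, b, r - l) for a, b, l, r in candidates]


def find_periodic_repeats(seq, period, match=1, mismatch=-2):
    """Hypothetical scoring for illustration only: position i scores `match`
    if seq[i] == seq[i - period], else `mismatch`. Returns half-open
    (start, end, score) ranges in `seq` coordinates."""
    scores = [match if seq[i] == seq[i - period] else mismatch
              for i in range(period, len(seq))]
    return [(a, b + period, sc)
            for a, b, sc in maximal_scoring_segments(scores)]


if __name__ == "__main__":
    print(maximal_scoring_segments([4, -5, 3, -3, 1, 2, -2, 2, -2, 1, 5]))
    # -> [(0, 1, 4), (2, 3, 3), (4, 11, 7)]
    print(find_periodic_repeats("GTACACACACTG", period=2))
    # -> [(2, 10, 6)], i.e. the run "ACACACAC"
```

In this toy scoring, mismatches are penalized but never skipped; allowing the penalized deletion of unfavorable letters, as described above, would require the generalized form of the algorithm.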
We contributed to the assembly and annotation of the first genome sequences of three Vibrio navarrensis strains obtained from clinical and environmental sources. Hybrid assemblies were constructed using Pacific Biosciences and Illumina technologies, and the genomes were annotated with the NCBI Prokaryotic Genome Automatic Annotation Pipeline (PGAAP). This research was done in collaboration with I. King Jordan and Cheryl Tarr (CDC). In a separate study, in collaboration with Carlos Y. Soto, we identified novel genes for glycopeptidolipid biosynthesis in Mycobacterium colombiense, contributing to the understanding of sliding motility, biofilm formation, and glycopeptidolipid production in these mycobacteria.