In a resequencing experiment, assembling reads into a coherent picture enables joint analysis of raw reads, offering an unbiased approach to detect genomic differences between individuals in population studies or to identify somatic changes in cancer research. This approach is gaining interest as large-scale studies, such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) projects, compile their preliminary findings. Our implementation of a de novo assembly algorithm and its downstream analysis pipelines are popular tools in the field for interrogating genomes (ABySS) and transcriptomes (Trans-ABySS). Using these tools, our team has been contributing analysis results to a number of cancer studies, including several TCGA and ICGC projects. We also make this software available to the community; as of January 2014, ABySS and Trans-ABySS have collectively received over 700 citations (source: Thomson-Reuters) and support active user discussion groups on Google Groups. Building on the success of our analysis platforms, we will continue developing our algorithms and will adapt them to data from rapidly evolving sequencing technologies. We propose to improve the performance of ABySS and Trans-ABySS, and to continue supporting a growing user base with better genome, transcriptome, and metagenome assembly and analysis tools. We will also expand the functionality of our analysis pipelines to integrate orthogonal data that support detected events; present alternative isoform usage in assembled transcriptomes as splice graphs; reconstruct 3' untranslated regions; and refine contig-to-reference alignments and their interpretation for better structural variation and chimeric transcript detection.
To accomplish these goals, we will focus on (1) algorithmic improvements to the primary sequence assembly and alignment approaches; (2) high-performance computing platforms, optimizing our analysis approaches for the next generation of central processing unit (CPU) architectures; and (3) downstream analysis pipelines, building streamlined standard operating procedures. With sequencing technologies changing rapidly, and their throughput still increasing exponentially, there is a need to adapt established bioinformatics tools, such as ABySS and Trans-ABySS, improve their performance, and make their use accessible to a growing community. The continued development of our tools will enable translational genomics studies on the road to precise personal medicine.
Analysis tools we developed to investigate DNA and RNA sequences from normal and diseased samples are being used by a wide group of investigators. With sequencing technologies changing rapidly, and their costs dropping sharply, there is a need to adapt established bioinformatics tools, improve their performance, and make their use accessible to a growing community. The continued development of our tools will enable translational genomics studies on the road to precise personal medicine.
|Paulino, Daniel; Warren, René L; Vandervalk, Benjamin P et al. (2015) Sealer: a scalable gap-closing application for finishing draft genomes. BMC Bioinformatics 16:230|
|Birol, Inanç; Chu, Justin; Mohamadi, Hamid et al. (2015) Spaced Seed Data Structures for De Novo Assembly. Int J Genomics 2015:196591|
|Birol, Inanç; Raymond, Anthony; Chiu, Readman et al. (2015) Kleat: cleavage site analysis of transcriptomes. Pac Symp Biocomput :347-58|
|Vandervalk, Benjamin P; Yang, Chen; Xue, Zhuyi et al. (2015) Konnector v2.0: pseudo-long reads from paired-end sequencing data. BMC Med Genomics 8 Suppl 3:S1|
|Warren, René L; Yang, Chen; Vandervalk, Benjamin P et al. (2015) LINKS: Scalable, alignment-free scaffolding of draft genomes with long reads. Gigascience 4:35|
|Mohamadi, Hamid; Vandervalk, Benjamin P; Raymond, Anthony et al. (2015) DIDA: Distributed Indexing Dispatched Alignment. PLoS One 10:e0126409|