In a resequencing experiment, assembling reads into a coherent picture enables joint analysis of the raw reads, offering an unbiased approach to detecting genomic differences between individuals in population studies or identifying somatic changes in cancer research. This approach is gaining interest as large-scale studies, such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) projects, compile their preliminary findings. Our de novo assembly software and its downstream analysis pipelines are popular tools in the field for interrogating genomes (ABySS) and transcriptomes (Trans-ABySS). Using these tools, our team has contributed analysis results to a number of cancer studies, including several TCGA and ICGC projects. We also make this software available to the community; as of January 2014, ABySS and Trans-ABySS have collectively received over 700 citations (source: Thomson Reuters) and have active user discussion groups on Google Groups. Building on the success of our analysis platforms, we will continue developing our algorithms and adapt them to data from rapidly evolving sequencing technologies. We propose to improve the performance of ABySS and Trans-ABySS, and to continue supporting a growing user base with better genome, transcriptome, and metagenome assembly and analysis tools. We will also expand the functionality of our analysis pipelines to integrate orthogonal data that support detected events; present alternative isoform usage in assembled transcriptomes as splice graphs; reconstruct 3' untranslated regions; and refine contig-to-reference alignments and their interpretation for better detection of structural variations and chimeric transcripts.
To accomplish these goals, we will focus on (1) algorithmic improvements to the primary sequence assembly and alignment approaches; (2) high-performance computing platforms, optimizing our analysis approaches for the next generation of central processing unit (CPU) architectures; and (3) downstream analysis pipelines, building streamlined standard operating procedures. With sequencing technologies changing rapidly, and their throughput still increasing exponentially, there is a need to adapt established bioinformatics tools such as ABySS and Trans-ABySS, improve their performance, and make them accessible to a growing community. The continued development of our tools will enable translational genomics studies on the road to precise personal medicine.

Public Health Relevance

The analysis tools we developed to investigate DNA and RNA sequences from normal and diseased samples are used by a wide group of investigators. With sequencing technologies changing rapidly, and their costs dropping sharply, there is a need to adapt established bioinformatics tools, improve their performance, and make them accessible to a growing community. The continued development of our tools will enable translational genomics studies on the road to precise personal medicine.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG007182-03
Application #: 9002847
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Felsenfeld, Adam
Project Start: 2014-03-04
Project End: 2017-01-31
Budget Start: 2016-02-01
Budget End: 2017-01-31
Support Year: 3
Fiscal Year: 2016
Total Cost: $249,665
Indirect Cost: $16,641

Name: British Columbia Cancer Agency
Department:
Type:
DUNS #: 209137736
City: Vancouver
State: BC
Country: Canada
Zip Code: V5Z 1L3

Coombe, Lauren; Zhang, Jessica; Vandervalk, Benjamin P et al. (2018) ARKS: chromosome-scale scaffolding of human genome drafts with linked read kmers. BMC Bioinformatics 19:234
Yeo, Sarah; Coombe, Lauren; Warren, René L et al. (2018) ARCS: scaffolding genome drafts with linked reads. Bioinformatics 34:725-731
Chiu, Readman; Nip, Ka Ming; Chu, Justin et al. (2018) TAP: a targeted clinical genomics pipeline for detecting transcript variants using RNA-seq data. BMC Med Genomics 11:79
Khan, Hamza; Mohamadi, Hamid; Vandervalk, Benjamin P et al. (2018) ChopStitch: exon annotation and splice graph construction using transcriptome assembly and whole genome sequencing data. Bioinformatics 34:1697-1704
Kucuk, Erdi; Chu, Justin; Vandervalk, Benjamin P et al. (2017) Kollector: transcript-informed, targeted de novo assembly of gene loci. Bioinformatics 33:1782-1788
Mohamadi, Hamid; Khan, Hamza; Birol, Inanc (2017) ntCard: a streaming algorithm for cardinality estimation in genomics data. Bioinformatics 33:1324-1330
Hammond, S Austin; Warren, René L; Vandervalk, Benjamin P et al. (2017) The North American bullfrog draft genome provides insight into hormonal regulation of long noncoding RNA. Nat Commun 8:1433
Hasan, Nabeeh A; Warren, René L; Epperson, L Elaine et al. (2017) Complete Genome Sequence of Mycobacterium chimaera SJ42, a Nonoutbreak Strain from an Immunocompromised Patient with Pulmonary Disease. Genome Announc 5:
Chu, Justin; Mohamadi, Hamid; Warren, René L et al. (2017) Innovations and challenges in detecting long read overlaps: an evaluation of the state-of-the-art. Bioinformatics 33:1261-1270
Yang, Chen; Chu, Justin; Warren, René L et al. (2017) NanoSim: nanopore sequence read simulator based on statistical characterization. Gigascience 6:1-6

Showing the most recent 10 out of 18 publications