De Novo Assembly Tools: Research with Unbiased Engines (DNA-TRUE)

Birol, Inanc

Abstract

In a resequencing experiment, assembling reads into a coherent picture enables joint analysis of the raw reads, offering an unbiased approach to detecting genomic differences between individuals in population studies or identifying somatic changes in cancer research. This approach is gaining interest as large-scale studies, such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) projects, compile their preliminary findings. Our implementation of a de novo assembly algorithm and its downstream analysis pipelines are popular tools in the field for interrogating genomes (ABySS) and transcriptomes (Trans-ABySS). Using these tools, our team has contributed analysis results to a number of cancer studies, including several TCGA and ICGC projects. We also make this software available to the community; as of January 2014, ABySS and Trans-ABySS had collectively received over 700 citations (source: Thomson Reuters), and both have active user discussion forums on Google Groups. Building on the success of our analysis platforms, we will continue developing our algorithms and adapting them to data from rapidly evolving sequencing technologies. We propose to improve the performance of ABySS and Trans-ABySS and to continue supporting a growing user base with better genome, transcriptome, and metagenome assembly and analysis tools. We will also expand the functionality of our analysis pipelines in four ways: integrating orthogonal data that support detected events; presenting alternative isoform usage in assembled transcriptomes as splice graphs; reconstructing 3' untranslated regions; and refining contig-to-reference alignments and their interpretation for better detection of structural variants and chimeric transcripts.
To accomplish these goals, we will focus on (1) algorithmic improvements to the primary sequence assembly and alignment approaches; (2) high-performance computing platforms, optimizing our analysis approaches for the next generation of central processing unit (CPU) architectures; and (3) downstream analysis pipelines, building streamlined standard operating procedures. With sequencing technologies changing rapidly and their throughput still increasing exponentially, there is a need to adapt established bioinformatics tools such as ABySS and Trans-ABySS, improve their performance, and make them accessible to a growing community. The continued development of our tools will enable translational genomics studies on the road to precise, personalized medicine.

Public Health Relevance

The analysis tools we developed to investigate DNA and RNA sequences from normal and diseased samples are used by a wide group of investigators. With sequencing technologies changing rapidly and their costs dropping sharply, there is a need to adapt established bioinformatics tools, improve their performance, and make them accessible to a growing community. The continued development of our tools will enable translational genomics studies on the road to precise, personalized medicine.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG007182-02
Application #: 8816112
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Felsenfeld, Adam

Project Start: 2014-03-04
Project End: 2017-01-31
Budget Start: 2015-02-01
Budget End: 2016-01-31
Support Year: 2
Fiscal Year: 2015
Total Cost: $249,665
Indirect Cost: $16,641

Institution

Name: British Columbia Cancer Agency
DUNS #: 209137736

City: Vancouver
State: BC
Country: Canada
Zip Code: V5 1-L3

Related projects


NIH 2020 R01 HG	De Novo Assembly Tools: Research with Unbiased Engines - Renewal (DNA-TRUER) Birol, Inanc / Provincial Health Services Authority
NIH 2019 R01 HG	De Novo Assembly Tools: Research with Unbiased Engines - Renewal (DNA-TRUER) Birol, Inanc / Provincial Health Services Authority
NIH 2018 R01 HG	De Novo Assembly Tools: Research with Unbiased Engines - Renewal (DNA-TRUER) Birol, Inanc / Provincial Health Services Authority
NIH 2017 R01 HG	De Novo Assembly Tools: Research with Unbiased Engines - Renewal (DNA-TRUER) Birol, Inanc / British Columbia Cancer Agency
NIH 2017 R01 HG	De Novo Assembly Tools: Research with Unbiased Engines - Renewal (DNA-TRUER) Birol, Inanc / Provincial Health Services Authority
NIH 2016 R01 HG	De Novo Assembly Tools: Research with Unbiased Engines (DNA-TRUE) Birol, Inanc / British Columbia Cancer Agency	$249,665
NIH 2015 R01 HG	De Novo Assembly Tools: Research with Unbiased Engines (DNA-TRUE) Birol, Inanc / British Columbia Cancer Agency	$249,665
NIH 2014 R01 HG	De Novo Assembly Tools: Research with Unbiased Engines (DNA-TRUE) Birol, Inanc / British Columbia Cancer Agency	$249,465

Publications

Coombe, Lauren; Zhang, Jessica; Vandervalk, Benjamin P et al. (2018) ARKS: chromosome-scale scaffolding of human genome drafts with linked read kmers. BMC Bioinformatics 19:234

Yeo, Sarah; Coombe, Lauren; Warren, René L et al. (2018) ARCS: scaffolding genome drafts with linked reads. Bioinformatics 34:725-731

Chiu, Readman; Nip, Ka Ming; Chu, Justin et al. (2018) TAP: a targeted clinical genomics pipeline for detecting transcript variants using RNA-seq data. BMC Med Genomics 11:79

Khan, Hamza; Mohamadi, Hamid; Vandervalk, Benjamin P et al. (2018) ChopStitch: exon annotation and splice graph construction using transcriptome assembly and whole genome sequencing data. Bioinformatics 34:1697-1704

Kucuk, Erdi; Chu, Justin; Vandervalk, Benjamin P et al. (2017) Kollector: transcript-informed, targeted de novo assembly of gene loci. Bioinformatics 33:1782-1788

Mohamadi, Hamid; Khan, Hamza; Birol, Inanc (2017) ntCard: a streaming algorithm for cardinality estimation in genomics data. Bioinformatics 33:1324-1330

Hammond, S Austin; Warren, René L; Vandervalk, Benjamin P et al. (2017) The North American bullfrog draft genome provides insight into hormonal regulation of long noncoding RNA. Nat Commun 8:1433

Hasan, Nabeeh A; Warren, René L; Epperson, L Elaine et al. (2017) Complete Genome Sequence of Mycobacterium chimaera SJ42, a Nonoutbreak Strain from an Immunocompromised Patient with Pulmonary Disease. Genome Announc 5:

Chu, Justin; Mohamadi, Hamid; Warren, René L et al. (2017) Innovations and challenges in detecting long read overlaps: an evaluation of the state-of-the-art. Bioinformatics 33:1261-1270

Yang, Chen; Chu, Justin; Warren, René L et al. (2017) NanoSim: nanopore sequence read simulator based on statistical characterization. Gigascience 6:1-6

Showing the most recent 10 out of 18 publications
