This grant renewal proposal seeks to develop new software that will allow health researchers to take advantage of recent advances in DNA sequencing. Over the last decade, technological advances have made DNA sequencing a routine and cost-effective method in many fields of life sciences research. The dominant technology today generates millions of short sequences of 75-300 base pairs (the "letters" that make up the DNA sequence). These short "reads" have to be assembled in the right order to make sense of the data. Dr. Birol and his team are world leaders in genome assembly, and the award-winning software they have developed (with support from their existing NIH grant and other funding) has been used in diverse DNA sequencing projects, including The Cancer Genome Atlas project. Newer technologies are now becoming available that generate information on much longer stretches of the input DNA, in the form of long reads or linked reads. Long read platforms can sequence over 100,000 base pairs per read, though with a very high error rate and low throughput. Linked read platforms can associate multiple reads over similar lengths, although the data contain many gaps. Still, if coupled with bioinformatics tools that can leverage the rich information they provide, these new sequencing platforms will open new frontiers in health research. Dr. Birol is seeking to renew his NIH funding so that he can develop specialized software that will quickly, accurately, and efficiently assemble and analyse long and linked sequence reads. These tools would provide advanced capabilities in a range of projects, such as tracking infectious disease outbreaks and using genetic information to select the best drugs to treat an individual patient's cancer. The new tools will be made freely available online for other non-profit researchers to use in their own sequencing projects, allowing teams around the world to make faster progress in health research.

Public Health Relevance

DNA sequence analysis tools developed by Dr. Birol are used by health researchers in the US and around the world in projects that have furthered our understanding of many different types of disease. The proposed new software tools, designed to match recent advances in DNA sequencing technology, will make genetic analysis accessible to even more health research teams. While the new tools will be particularly relevant to understanding how infectious diseases spread and how cancer develops, they will find applications in diverse aspects of health research.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 2R01HG007182-04A1
Application #: 9382151
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Felsenfeld, Adam
Project Start: 2014-03-04
Project End: 2018-03-31
Budget Start: 2017-09-01
Budget End: 2018-03-31
Support Year: 4
Fiscal Year: 2017
Total Cost:
Indirect Cost:

Name: British Columbia Cancer Agency
Department:
Type:
DUNS #: 209137736
City: Vancouver
State: BC
Country: Canada
Zip Code: V5 1L3

Coombe, Lauren; Zhang, Jessica; Vandervalk, Benjamin P et al. (2018) ARKS: chromosome-scale scaffolding of human genome drafts with linked read kmers. BMC Bioinformatics 19:234
Yeo, Sarah; Coombe, Lauren; Warren, René L et al. (2018) ARCS: scaffolding genome drafts with linked reads. Bioinformatics 34:725-731
Chiu, Readman; Nip, Ka Ming; Chu, Justin et al. (2018) TAP: a targeted clinical genomics pipeline for detecting transcript variants using RNA-seq data. BMC Med Genomics 11:79
Khan, Hamza; Mohamadi, Hamid; Vandervalk, Benjamin P et al. (2018) ChopStitch: exon annotation and splice graph construction using transcriptome assembly and whole genome sequencing data. Bioinformatics 34:1697-1704
Kucuk, Erdi; Chu, Justin; Vandervalk, Benjamin P et al. (2017) Kollector: transcript-informed, targeted de novo assembly of gene loci. Bioinformatics 33:1782-1788
Mohamadi, Hamid; Khan, Hamza; Birol, Inanc (2017) ntCard: a streaming algorithm for cardinality estimation in genomics data. Bioinformatics 33:1324-1330
Hammond, S Austin; Warren, René L; Vandervalk, Benjamin P et al. (2017) The North American bullfrog draft genome provides insight into hormonal regulation of long noncoding RNA. Nat Commun 8:1433
Hasan, Nabeeh A; Warren, René L; Epperson, L Elaine et al. (2017) Complete Genome Sequence of Mycobacterium chimaera SJ42, a Nonoutbreak Strain from an Immunocompromised Patient with Pulmonary Disease. Genome Announc 5:
Chu, Justin; Mohamadi, Hamid; Warren, René L et al. (2017) Innovations and challenges in detecting long read overlaps: an evaluation of the state-of-the-art. Bioinformatics 33:1261-1270
Yang, Chen; Chu, Justin; Warren, René L et al. (2017) NanoSim: nanopore sequence read simulator based on statistical characterization. Gigascience 6:1-6

Showing the most recent 10 out of 18 publications