Software Tools for Next-Generation Sequencer Data

Marth, Gabor

Abstract

The vast throughput of next-generation sequencing technologies will enable cost-effective organismal polymorphism discovery, complete mutational profiling, and individual human resequencing. The 1000 Genomes Project, an ambitious undertaking, aims to discover all common human genetic variation by sequencing a large number of individuals. These projects will generate enormous volumes of data that pose formidable challenges for data storage and analysis. The shorter read length of next-generation technologies and the need to support new sequencing applications demand new, efficient informatics tools. Building on our existing prototype software, we will develop a complete suite of tools to support next-generation resequencing applications. Specifically, we will develop base-calling programs that improve upon the native software supplied by the machine manufacturers. We will delineate those regions of genomes that can be unambiguously resequenced with the shorter next-generation reads, and propose novel protocols for efficient representation of such annotations. We will develop a flexible, high-performance read alignment program that can map billions of reads to large, complex genome sequences. We will expand our existing SNP and short-INDEL polymorphism discovery program, and build new software for structural variation discovery. Finally, we will develop a graphical assembly viewer program to aid data validation and hypothesis generation by integrating gene annotations with primary data views. Our tools will be used in both whole-genome and targeted individual human resequencing applications: in normal samples, to discover segregating markers for medical association studies; in cases and controls, to identify the causative alleles in regions implicated by such studies; and in cancer samples, to find point mutations and structural rearrangements.
The projects enabled by our tools will help researchers understand the genetic causes of human diseases, leading to improved diagnostic procedures and more successful treatment. We are developing computer software for DNA sequencing projects that aim to uncover the genetic causes of human diseases. The discoveries made in these projects will help us better understand, diagnose, and treat these diseases.
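The abstract's notion of genome regions that "can be unambiguously resequenced" with short reads can be illustrated with a simple exact-match mappability mask. The sketch below is a hypothetical illustration, not the project's software: it assumes error-free reads of one fixed length and marks a start position as uniquely mappable when the read-length k-mer beginning there occurs exactly once in the genome, counting both strands. The function names (`reverse_complement`, `canonical`, `unique_start_mask`) are invented for this example.

```python
from collections import Counter

# Translation table for DNA complementation; non-ACGT symbols (e.g. N) pass through unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    # A read may come from either strand, so a k-mer and its reverse
    # complement are treated as the same key (the lexicographically smaller one).
    return min(kmer, reverse_complement(kmer))


def unique_start_mask(genome: str, read_length: int) -> list[bool]:
    """For each start position i, return True if genome[i:i + read_length]
    occurs exactly once in the genome on either strand (exact matches only;
    this hypothetical sketch ignores sequencing errors and mismatches)."""
    genome = genome.upper()
    n = len(genome) - read_length + 1
    if n <= 0:
        return []
    keys = [canonical(genome[i:i + read_length]) for i in range(n)]
    counts = Counter(keys)
    return [counts[k] == 1 for k in keys]
```

For example, `unique_start_mask("AAAACCCCAAAA", 4)` marks the first and last positions as ambiguous, because the 4-mer `AAAA` starts at both. Production mappability annotations would additionally need to consider near-matches within the aligner's mismatch tolerance and a compact encoding for whole chromosomes.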

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG004719-04
Application #: 8119767
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Brooks, Lisa

Project Start: 2008-09-01
Project End: 2014-06-30
Budget Start: 2011-07-01
Budget End: 2014-06-30
Support Year: 4
Fiscal Year: 2011
Total Cost: $546,108
Indirect Cost

Institution

Name: Boston College
Department: Biology
Type: Schools of Arts and Sciences
DUNS #: 045896339

City: Chestnut Hill
State: MA
Country: United States
Zip Code: 02467

Related projects

NIH 2011 R01 HG	Software Tools for Next-Generation Sequencer Data	Marth, Gabor T. / Boston College	$546,108
NIH 2010 R01 HG	Software Tools for Next-Generation Sequencer Data	Marth, Gabor T. / Boston College	$666,447
NIH 2009 R01 HG	Software Tools for Next-Generation Sequencer Data	Marth, Gabor T. / Boston College	$525,371
NIH 2009 R01 HG	Software Tools for Next-Generation Sequencer Data	Marth, Gabor T. / Boston College	$314,553
NIH 2008 R01 HG	Software Tools for Next-Generation Sequencer Data	Marth, Gabor T. / Boston College	$468,377

Publications

Lee, Wan-Ping; Wu, Jiantao; Marth, Gabor T (2015) Toolbox for mobile-element insertion detection on cancer genomes. Cancer Inform 14:37-44

Challis, Danny; Antunes, Lilian; Garrison, Erik et al. (2015) The distribution and mutagenesis of short coding INDELs from 1,128 whole exomes. BMC Genomics 16:143

Wu, Jiantao; Lee, Wan-Ping; Ward, Alistair et al. (2014) Tangram: a comprehensive toolbox for mobile element insertion detection. BMC Genomics 15:795

Farrell, Andrew; Coleman, Bradley I; Benenati, Brian et al. (2014) Whole genome profiling of spontaneous and chemically induced mutations in Toxoplasma gondii. BMC Genomics 15:354

Lee, Wan-Ping; Wu, Jiantao; Marth, Gabor T (2014) Toolbox for mobile-element insertion detection on cancer genomes. Cancer Inform 13:45-52

Lee, Wan-Ping; Stromberg, Michael P; Ward, Alistair et al. (2014) MOSAIK: a hash-based algorithm for accurate next-generation sequencing short-read mapping. PLoS One 9:e90581

Qiao, Yi; Quinlan, Aaron R; Jazaeri, Amir A et al. (2014) SubcloneSeeker: a computational framework for reconstructing tumor clone structure for cancer variant interpretation and prioritization. Genome Biol 15:443

Zhao, Mengyao; Lee, Wan-Ping; Garrison, Erik P et al. (2013) SSW library: an SIMD Smith-Waterman C/C++ library for use in genomic applications. PLoS One 8:e82138

Busby, Michele A; Stewart, Chip; Miller, Chase A et al. (2013) Scotty: a web tool for designing RNA-Seq experiments to measure differential gene expression. Bioinformatics 29:656-7

Miller, Chase A; Anthony, Jon; Meyer, Michelle M et al. (2013) Scribl: an HTML5 Canvas-based graphics library for visualizing genomic data over the web. Bioinformatics 29:381-3

Showing the most recent 10 out of 25 publications
