Adapting Phred/Phrap/Consed for NextGen Sequencing

Green, Philip

Abstract

New methods for DNA sequencing produce far more data at a fraction of the cost of traditional technologies, so DNA sequencing is now used more widely in biomedical research than ever before. However, the software for analyzing the output of these new technologies could be significantly improved. This proposal is to upgrade the widely used phred/phrap/consed package for these "next-generation" sequencers.

We have developed a new base-calling and image analysis program, next_phred, for the Illumina sequencer. It gives 80%-90% more reads than the Illumina software and 50% fewer base-calling errors, which significantly reduces sequencing costs and allows more confident detection of sequence variants. We will make further performance improvements and investigate whether changes to the Illumina experimental protocol can increase yield still further. We will also calibrate the error probabilities for the base-callers of other next-generation sequencers.

We will enable consed (the visualization, finishing, and analysis tool) to efficiently handle assemblies of up to several billion reads, a large reference sequence, and high depth of coverage; to detect structural variants and determine SNPs using a probabilistic model; to directly read the output of assemblers commonly used with next-generation data; and to perform batch correction of erroneous assemblies and consensus bases.

We will further improve cross_match (the flexible sequence alignment program that is part of phred/phrap/consed) and our new ultrafast aligner, phaster, for mapping large numbers of genomic or RNA-Seq reads to a reference genome. Both programs will receive speed and functionality enhancements, including the ability to handle paired reads and to output alignments in a more compact file format.
We will create a bioinformatics environment allowing even small labs to manage the massive amounts of data from next-generation sequencers. This will include the implementation of compact file formats, prescriptions for data storage, generation of files usable in a variety of applications, and pipelines for Illumina and 454 data processing.
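The abstract mentions calibrating the error probabilities reported by base-callers. Phred-style quality values encode an error probability P as Q = -10 log10(P), so Q20 means a 1% chance the call is wrong and Q30 means 0.1%. A rough sketch of how calibration can be checked, assuming the calls have already been compared against a trusted reference: group calls by reported quality and compare the observed error rate in each group with the rate the quality value predicts. The function names and the (quality, is_error) input format are hypothetical illustrations and are not taken from next_phred or any phred/phrap/consed program.

```python
import math
from collections import defaultdict


def phred_quality(p_error: float) -> float:
    """Convert an error probability to a phred-scaled quality, Q = -10 * log10(P)."""
    return -10.0 * math.log10(p_error)


def error_probability(q: float) -> float:
    """Inverse mapping from a phred quality to an error probability, P = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def calibration_table(calls):
    """Compare predicted and observed error rates for each reported quality value.

    `calls` is an iterable of (reported_q, is_error) pairs (a hypothetical format),
    where is_error is True when the base call disagrees with a trusted reference.
    Returns {q: (n_calls, observed_error_rate, predicted_error_rate)}.
    A well-calibrated base-caller has observed rates close to predicted rates.
    """
    counts = defaultdict(lambda: [0, 0])  # q -> [total calls, erroneous calls]
    for q, is_error in calls:
        counts[q][0] += 1
        counts[q][1] += int(is_error)

    table = {}
    for q in sorted(counts):
        total, errors = counts[q]
        table[q] = (total, errors / total, error_probability(q))
    return table


if __name__ == "__main__":
    # Toy data: 100 calls reported at Q20, one of which is wrong.
    toy_calls = [(20, False)] * 99 + [(20, True)]
    for q, (n, observed, predicted) in calibration_table(toy_calls).items():
        print(f"Q{q}: n={n} observed={observed:.4f} predicted={predicted:.4f}")
```

In practice the table would be built from many millions of calls, and any systematic gap between the observed and predicted columns is what a recalibration step would correct.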

Public Health Relevance

New DNA sequencing technologies are vastly increasing the amount of data available to decipher the genetic basis of human disease. Software able to fully exploit this data is currently lacking. Our software, commonly used for older types of sequencing machines, will be improved to meet this challenge and to significantly lower sequencing costs.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG005710-02
Application #: 8144487
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Bonazzi, Vivien

Project Start: 2010-09-18
Project End: 2013-06-30
Budget Start: 2011-07-01
Budget End: 2012-06-30
Support Year: 2
Fiscal Year: 2011
Total Cost: $600,454
Indirect Cost

Institution

Name: University of Washington
Department: Genetics
Type: Schools of Medicine
DUNS #: 605799469

City: Seattle
State: WA
Country: United States
Zip Code: 98195

Related projects


NIH 2012 R01 HG	Adapting Phred/Phrap/Consed for NextGen Sequencing Green, Philip P. / University of Washington	$605,005
NIH 2011 R01 HG	Adapting Phred/Phrap/Consed for NextGen Sequencing Green, Philip P. / University of Washington	$600,454
NIH 2010 R01 HG	Adapting Phred/Phrap/Consed for NextGen Sequencing Green, Philip P. / University of Washington	$591,336

Publications

Gordon, David; Green, Phil (2013) Consed: a graphical editor for next-generation sequencing. Bioinformatics 29:2936-7
