Algorithms for Optimal Base-Calling in Sequencing-by-Synthesis

Vikalo, Haris

Abstract

Next generation sequencing-by-synthesis platforms enable fast and affordable DNA sequencing. However, the read-lengths they achieve are still shorter than those provided by the costly Sanger sequencing, and their accuracy is insufficient for most medical studies. To determine the order of nucleotides in a DNA fragment, sequencing-by-synthesis relies on enzymatic synthesis of the complementary strand on the fragment. The synthesis is enabled by the sequential addition of free nucleotides; extension of the complementary strand with the Watson-Crick complement of the first unpaired base of the DNA fragment is detected optically. However, the signal generated by sequencing a single DNA molecule is weak, and thus its detection requires complex and expensive hardware. Ensemble-based systems provide an efficient alternative: they amplify the signal by sequencing a large number of identical copies of the DNA fragment in parallel. To fully reap the benefits of having multiple signal sources, extension of the complementary strands should progress at the same rate (so that the signals add in phase). However, synthesis of the strands in an ensemble falls out of sync because nucleotide incorporation occasionally fails in some strands while others extend prematurely. These so-called phasing effects, probabilistic in nature, limit the achievable accuracy and read-lengths of sequencing-by-synthesis. The goal of the proposed project is to develop practical algorithms for optimal base-calling in sequencing-by-synthesis systems, improving their effective read-lengths and accuracy. To this end, we rely on concepts and tools from signal processing and information theory. We address two widely used systems: Illumina's four-color platform and Roche's (454 Life Sciences) pyrosequencing platform. If successful, as we expect based on preliminary results, our research will have immediate impact on various applications which require high-performance DNA sequencing.
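As a rough illustration of the phasing effects described above, the following toy Python sketch tracks how an ensemble of identical strands drifts out of sync over successive synthesis cycles. It is not the model used in the project: the function name `phasing_distribution`, the parameters `p_lag` and `p_lead`, and their default values are assumptions chosen for illustration. It also assumes that each strand, independently in every cycle, either fails to extend, extends by one base, or extends by two bases.

```python
def phasing_distribution(n_cycles, p_lag=0.01, p_lead=0.005):
    """Return, for each cycle, the fraction of strands that are in phase.

    dist[k] is the fraction of strands whose complementary strand has
    been extended by k bases. In each cycle a strand fails to extend
    with probability p_lag (hypothetical value), extends by two bases
    with probability p_lead (hypothetical value), and extends by
    exactly one base otherwise.
    """
    p_ok = 1.0 - p_lag - p_lead
    dist = [1.0]  # before the first cycle every strand has length 0
    in_phase = []
    for cycle in range(1, n_cycles + 1):
        new = [0.0] * (len(dist) + 2)
        for k, mass in enumerate(dist):
            new[k] += mass * p_lag        # incomplete incorporation
            new[k + 1] += mass * p_ok     # normal single-base extension
            new[k + 2] += mass * p_lead   # premature extension
        dist = new
        # A strand is in phase after cycle c if it has exactly c bases.
        in_phase.append(dist[cycle])
    return in_phase


if __name__ == "__main__":
    fractions = phasing_distribution(100)
    print("in-phase fraction after cycle 1:  ", fractions[0])
    print("in-phase fraction after cycle 100:", fractions[-1])
```

Under these assumptions the in-phase fraction shrinks as the cycle count grows, which is one way to see why the signals stop adding in phase and why longer reads become harder to call.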

Public Health Relevance

Performance of next generation DNA sequencing is fundamentally limited by the stochastic nature of the underlying biochemical process. Drawing on concepts from signal processing and information theory, we propose to design practical algorithms which may significantly improve the accuracy and effective read-lengths of next generation DNA sequencing systems. If successful, as we expect based on preliminary results, our research will have immediate impact on various applications which require high-performance DNA sequencing.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Exploratory/Developmental Grants (R21)
Project #: 1R21HG006171-01
Application #: 8095652
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Bonazzi, Vivien

Project Start: 2011-09-01
Project End: 2013-07-31
Budget Start: 2011-09-01
Budget End: 2012-07-31
Support Year: 1
Fiscal Year: 2011
Total Cost: $178,309

Institution

Name: University of Texas Austin
Department: Engineering (All Types)
Type: Schools of Engineering
DUNS #: 170230239

City: Austin
State: TX
Country: United States
Zip Code: 78712

Related projects


NIH 2012 R21 HG	Algorithms for Optimal Base-Calling in Sequencing-by-Synthesis Vikalo, Haris / University of Texas Austin	$177,883
NIH 2011 R21 HG	Algorithms for Optimal Base-Calling in Sequencing-by-Synthesis Vikalo, Haris / University of Texas Austin	$178,309

Publications

Das, Shreepriya; Vikalo, Haris (2013) Base calling for high-throughput short-read sequencing: dynamic programming solutions. BMC Bioinformatics 14:129

Shen, Xiaohu; Vikalo, Haris (2012) ParticleCall: a particle filter for base calling in next-generation sequencing systems. BMC Bioinformatics 13:160

Das, Shreepriya; Vikalo, Haris (2012) OnlineCall: fast online parameter estimation and base calling for Illumina's next-generation sequencing. Bioinformatics 28:1677-83
