A general Bayesian polymorphism discovery tool

Marth, Gabor

Abstract

Genetic variations are landmarks that allow us to track our genetic ancestry, and their structure in the genome informs us about the molecular and demographic forces that have shaped it. For medical research the most important polymorphisms are disease-causing variants, but non-functional polymorphisms are also useful as markers for linkage and association studies. The detection of single-nucleotide polymorphisms (SNPs) and short insertions/deletions (INDELs) from DNA sequences is challenging because one must align and compare sequences from varied sources, and differentiate true polymorphisms from sequencing errors. There is a growing need to find rare, medically important alleles in deep alignments of clonal sequences and diploid sequence traces; to identify large numbers of markers for mapping studies in humans, model organisms, and plants; and to discover informative polymorphisms for pathogen strain identification. Building on our existing software, POLYBAYES, we propose to develop a general polymorphism discovery tool that meets these challenges. We will organize fragmentary sequences by layering them upon the genome reference sequence; discard paralogous sequences from similar, duplicated genome regions; and use base quality values in a rigorous Bayesian scheme to compare sequences of arbitrary quality standards. Specifically, we propose methods to align multi-exon genes, and novel methods for paralog filtering based either on complete mapping information or on genome distributions of sequence divergence. We will develop new algorithms for the difficult problem of INDEL detection; integrate heterozygote detection in diploid traces into our software; enhance sensitivity to detect rare alleles; and include a new measure to estimate the true positive rate of our candidate predictions. We will implement a fast, reliable, full-functionality discovery tool that is free for academic research, performs well in large discovery projects yet runs on desktop computers, and is easily accessible to biologists in small or medium-sized laboratories.
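To illustrate how base quality values might enter a Bayesian comparison of aligned sequences, the following is a minimal Python sketch. It is not the POLYBAYES algorithm. It is a toy model built on stated assumptions: qualities are Phred-scaled, reads at a site are independent, errors are spread evenly over the three wrong bases, and a polymorphic site carries exactly two alleles sampled with equal probability. The function names and the prior value of 0.001 are hypothetical.

```python
from itertools import combinations

BASES = "ACGT"


def phred_to_error(q):
    """Convert a Phred-scaled quality value to a base-call error probability."""
    return 10.0 ** (-q / 10.0)


def base_likelihood(observed, true_base, q):
    """P(observed base | true base), with errors split evenly over 3 wrong bases."""
    e = phred_to_error(q)
    return 1.0 - e if observed == true_base else e / 3.0


def snp_posterior(column, prior_snp=0.001):
    """Posterior probability that one alignment column is polymorphic.

    column    : list of (base, phred_quality) pairs, one per aligned read
    prior_snp : assumed prior probability of a polymorphic site (hypothetical)

    Plain products are used for clarity; very deep columns would need
    log-space arithmetic to avoid floating-point underflow.
    """
    # H0: monomorphic site; every read comes from one allele (uniform over 4).
    like_mono = 0.0
    for allele in BASES:
        p = 1.0
        for base, q in column:
            p *= base_likelihood(base, allele, q)
        like_mono += p / len(BASES)

    # H1: biallelic site; each read comes from either allele with probability 0.5
    # (uniform over the 6 unordered allele pairs).
    pairs = list(combinations(BASES, 2))
    like_poly = 0.0
    for a1, a2 in pairs:
        p = 1.0
        for base, q in column:
            p *= 0.5 * base_likelihood(base, a1, q) + 0.5 * base_likelihood(base, a2, q)
        like_poly += p / len(pairs)

    # Bayes' rule over the two hypotheses.
    numerator = prior_snp * like_poly
    return numerator / (numerator + (1.0 - prior_snp) * like_mono)


if __name__ == "__main__":
    concordant = [("A", 40)] * 6
    split = [("A", 40)] * 3 + [("G", 40)] * 3
    low_quality_mismatch = [("A", 40)] * 5 + [("G", 5)]
    for name, col in [("concordant", concordant), ("split", split),
                      ("low-quality mismatch", low_quality_mismatch)]:
        print(name, snp_posterior(col))
```

Under this toy model, a single discordant base with a low quality value contributes little evidence for a polymorphism, while the same discordant base at high quality contributes much more. That is the sense in which quality values let sequences of differing accuracy be compared on a common footing.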

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 1R01HG003698-01
Application #: 6959489
Study Section: Special Emphasis Panel (ZRG1-BST-D (51))
Program Officer: Brooks, Lisa

Project Start: 2005-09-16
Project End: 2010-07-31
Budget Start: 2005-09-16
Budget End: 2006-07-31
Support Year: 1
Fiscal Year: 2005
Total Cost: $369,355
Indirect Cost

Institution

Name: Boston College
Department: Biology
Type: Schools of Arts and Sciences
DUNS #: 045896339

City: Chestnut Hill
State: MA
Country: United States
Zip Code: 02467

Related projects


NIH 2009 R01 HG	A general Bayesian polymorphism discovery tool	Marth, Gabor T. / Boston College	$326,511
NIH 2009 R01 HG	A general Bayesian polymorphism discovery tool	Marth, Gabor T. / Boston College	$378,000
NIH 2008 R01 HG	A general Bayesian polymorphism discovery tool	Marth, Gabor T. / Boston College	$326,511
NIH 2007 R01 HG	A general Bayesian polymorphism discovery tool	Marth, Gabor T. / Boston College	$350,215
NIH 2006 R01 HG	A general Bayesian polymorphism discovery tool	Marth, Gabor T. / Boston College	$336,263
NIH 2005 R01 HG	A general Bayesian polymorphism discovery tool	Marth, Gabor T. / Boston College	$369,355

Publications

1000 Genomes Project Consortium; Abecasis, Gonçalo R; Auton, Adam et al. (2012) An integrated map of genetic variation from 1,092 human genomes. Nature 491:56-65

Huang, Weichun; Li, Leping; Myers, Jason R et al. (2012) ART: a next-generation sequencing read simulator. Bioinformatics 28:593-4

1000 Genomes Project Consortium; Abecasis, Gonçalo R; Altshuler, David et al. (2010) A map of human genome variation from population-scale sequencing. Nature 467:1061-73

Hillier, LaDeana W; Marth, Gabor T; Quinlan, Aaron R et al. (2008) Whole-genome sequencing and variant discovery in C. elegans. Nat Methods 5:183-8

Huang, Weichun; Marth, Gabor (2008) EagleView: a genome assembly viewer for next-generation sequencing technologies. Genome Res 18:1538-43

Quinlan, Aaron R; Stewart, Donald A; Stromberg, Michael P et al. (2008) Pyrobayes: an improved base caller for SNP discovery in pyrosequences. Nat Methods 5:179-81
