Genome-wide prediction and analysis of coding variants

Chan, Agnes

Abstract

The identification of genetic variants that are associated with disease is an important step in linking sequence data with new approaches to improve human health. Among the sequence variants currently known to be directly linked with human disease, 57% are mutations that result in a single nonsynonymous amino acid substitution in the corresponding protein. An additional 23% of variants linked with disease are due to small insertions and deletions (indels) in genes. Therefore, an important problem in human health is the identification of coding variants, both SNPs and indels, that affect protein function and might be associated with disease. To this end, we developed SIFT, an algorithm that predicts whether an amino acid substitution affects protein function. This algorithm, available at the SIFT website (http://blocks.fhcrc.org/sift/SIFT.html), is widely used by the research community and often serves as a benchmark for similar prediction algorithms. The popularity of SIFT and similar tools emphasizes the need to analyze coding variants and prioritize those most likely to have a phenotypic effect. Moreover, advances in DNA sequencing technologies are generating large numbers of variants, including SNPs and indels, that will require analysis. We propose to expand SIFT by developing an algorithm that will predict which small coding indels affect protein function and hence may be involved in disease. In addition, we propose to enhance the ability of SIFT to perform large-scale analysis of coding variants. These new features will be incorporated into the SIFT web server to enable genome-wide analysis. Executables and code will also be made freely available to the research community.

Recent advances in DNA sequencing technologies are generating large numbers of genetic variants that necessitate analysis. Small insertions and deletions (indels) are a common type of variation, and 23% of variants linked with disease are due to indels. To improve the identification of disease variants, we propose to expand our existing algorithm, SIFT, by enabling the prediction of small coding indels that affect protein function. In addition, we propose to enhance the ability of SIFT to perform large-scale analysis of coding variants.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG004701-03
Application #: 7879322
Study Section: Special Emphasis Panel (ZRG1-BST-Q (01))
Program Officer: Brooks, Lisa

Project Start: 2008-09-01
Project End: 2012-06-30
Budget Start: 2010-07-01
Budget End: 2011-06-30
Support Year: 3
Fiscal Year: 2010
Total Cost: $286,848

Institution

Name: J. Craig Venter Institute, Inc.
DUNS #: 076364392

City: Rockville
State: MD
Country: United States
Zip Code: 20850

Related projects


NIH 2011 R01 HG	Genome-wide prediction and analysis of coding variants Chan, Agnes / J. Craig Venter Institute, Inc.	$298,740
NIH 2010 R01 HG	Genome-wide prediction and analysis of coding variants Chan, Agnes / J. Craig Venter Institute, Inc.	$286,848
NIH 2009 R01 HG	Genome-wide prediction and analysis of coding variants Murphy, Sean / J. Craig Venter Institute, Inc.	$298,202
NIH 2008 R01 HG	Genome-wide prediction and analysis of coding variants Ng, Pauline / J. Craig Venter Institute, Inc.	$293,229

Publications

Choi, Yongwook; Chan, Agnes P (2015) PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels. Bioinformatics 31:2745-7

Sim, Ngak-Leng; Kumar, Prateek; Hu, Jing et al. (2012) SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Res 40:W452-7

Choi, Yongwook; Sims, Gregory E; Murphy, Sean et al. (2012) Predicting the functional effect of amino acid substitutions and indels. PLoS One 7:e46688

Kumar, Prateek; Henikoff, Steven; Ng, Pauline C (2009) Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc 4:1073-81
