The identification of genetic variants associated with disease is an important step in linking sequence data to new approaches for improving human health. Among the sequence variants currently known to be directly linked to human disease, 57% are mutations that encode a single nonsynonymous amino acid substitution in the corresponding protein, and an additional 23% are small insertions and deletions (indels) in genes. An important problem in human health is therefore to identify the coding variants, both SNPs and indels, that affect protein function and might be associated with disease. To this end, we developed SIFT, an algorithm that predicts whether an amino acid substitution affects protein function. SIFT, available at the SIFT website (http://blocks.fhcrc.org/sift/SIFT.html), is widely used by the research community and is often used as a benchmark for similar prediction algorithms. The popularity of SIFT and similar tools underscores the need to analyze coding variants and to prioritize those most likely to have a phenotypic effect. Moreover, advances in DNA sequencing technologies are generating large numbers of variants, including SNPs and indels, that will require analysis. We propose to expand SIFT by developing an algorithm that predicts which small coding indels affect protein function and may therefore be involved in disease. In addition, we propose to enhance SIFT's capacity for large-scale analysis of coding variants. These new features will be incorporated into the SIFT web server to enable genome-wide analysis, and executables and code will be made freely available to the research community.

Recent advances in DNA sequencing technologies are generating large numbers of genetic variants that require analysis. Small insertions and deletions (indels) are a common type of variation, and 23% of variants linked to disease are due to indels. To improve the identification of disease variants, we propose to expand our existing algorithm, SIFT, so that it can predict which small coding indels affect protein function. In addition, we propose to enhance SIFT's capacity for large-scale analysis of coding variants.
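
SIFT's predictions are conservation-based: a protein position at which homologous sequences tolerate only a few residues is more likely to be functionally constrained, so a substitution to a residue rarely or never seen there is more likely to affect function. The Python sketch below is a simplified illustration of that idea under stated assumptions, not SIFT's actual implementation. It scores a substitution by the frequency of the new residue in a single alignment column, scaled so that the most frequent residue scores 1.0, and applies the 0.05 cutoff that SIFT publications use for predicting an effect on function. The toy alignment column and the function names `column_scores` and `predict` are hypothetical, and SIFT's sequence weighting and pseudocounts are deliberately omitted.

```python
from collections import Counter

# Cutoff used by SIFT: scaled scores below 0.05 are predicted to affect function.
DELETERIOUS_CUTOFF = 0.05


def column_scores(column: str) -> dict[str, float]:
    """Scale each residue's frequency in one alignment column so the most
    frequent residue scores 1.0.

    Simplification (assumption): real SIFT weights sequences and adds
    pseudocounts; here, residues never observed in the column score 0.0.
    """
    counts = Counter(column)
    total = len(column)
    freqs = {aa: n / total for aa, n in counts.items()}
    top = max(freqs.values())
    return {aa: f / top for aa, f in freqs.items()}


def predict(column: str, substituted: str) -> tuple[float, str]:
    """Return (scaled score, label) for substituting `substituted` at this position."""
    score = column_scores(column).get(substituted, 0.0)
    label = "affects function" if score < DELETERIOUS_CUTOFF else "tolerated"
    return score, label


if __name__ == "__main__":
    # Hypothetical column: residues seen at one position in 10 homologs.
    toy_column = "LLLLLLLIIV"
    for residue in ("L", "I", "D"):
        print(residue, predict(toy_column, residue))
```

In this toy column, leucine scores 1.0 and isoleucine about 0.29 (both tolerated), while aspartate never appears and scores 0.0, so it is predicted to affect function. Dividing by the column maximum, rather than using raw frequencies, keeps scores comparable between highly conserved and highly variable positions.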

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
5R01HG004701-04
Application #
8133148
Study Section
Special Emphasis Panel (ZRG1-BST-Q (01))
Program Officer
Brooks, Lisa
Project Start
2008-09-01
Project End
2013-06-30
Budget Start
2011-07-01
Budget End
2013-06-30
Support Year
4
Fiscal Year
2011
Total Cost
$298,740
Name
J. Craig Venter Institute, Inc.
DUNS #
076364392
City
Rockville
State
MD
Country
United States
Zip Code
20850
Choi, Yongwook; Chan, Agnes P (2015) PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels. Bioinformatics 31:2745-7
Choi, Yongwook; Sims, Gregory E; Murphy, Sean et al. (2012) Predicting the functional effect of amino acid substitutions and indels. PLoS One 7:e46688
Sim, Ngak-Leng; Kumar, Prateek; Hu, Jing et al. (2012) SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Res 40:W452-7
Kumar, Prateek; Henikoff, Steven; Ng, Pauline C (2009) Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc 4:1073-81