The identification of genetic variants that are associated with disease is an important step in linking sequence data with new approaches to improve human health. Among the sequence variants currently known to be directly linked with human disease, 57% are mutations that encode a single nonsynonymous amino acid substitution in the corresponding protein. An additional 23% of variants linked with disease are due to small insertions and deletions (indels) in genes. Therefore, an important problem in human health is the identification of coding variants, both SNPs and indels, that affect protein function and might be associated with disease. To this end, we developed SIFT, an algorithm that predicts whether an amino acid substitution affects protein function. This algorithm, available at the SIFT website (http://blocks.fhcrc.org/sift/SIFT.html), is widely used by the research community and often serves as a benchmark for similar prediction algorithms. The popularity of SIFT and similar tools underscores the need to analyze coding variants and prioritize those most likely to have a phenotypic effect. Moreover, advances in DNA sequencing technologies are generating large numbers of variants, including SNPs and indels, that will require analysis. We propose to expand SIFT by developing an algorithm that will predict which small coding indels affect protein function and hence may be involved in disease. In addition, we propose to enhance the ability of SIFT to perform large-scale analysis of coding variants. These new features will be incorporated into the SIFT web server to enable genome-wide analysis. Executables and code will also be made freely available to the research community.
Recent advances in DNA sequencing technologies are generating large numbers of genetic variants that require analysis. Small insertions and deletions (indels) are a common type of variation, and 23% of variants linked with disease are due to indels. To improve the identification of disease variants, we propose to expand our existing algorithm, SIFT, so that it can predict which small coding indels affect protein function. In addition, we propose to enhance the ability of SIFT to perform large-scale analysis of coding variants.
Choi, Yongwook; Chan, Agnes P (2015) PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels. Bioinformatics 31:2745-7
Choi, Yongwook; Sims, Gregory E; Murphy, Sean et al. (2012) Predicting the functional effect of amino acid substitutions and indels. PLoS One 7:e46688
Sim, Ngak-Leng; Kumar, Prateek; Hu, Jing et al. (2012) SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Res 40:W452-7
Kumar, Prateek; Henikoff, Steven; Ng, Pauline C (2009) Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc 4:1073-81