High-throughput sequencing has produced millions of protein sequences that lack solved structures and functional annotations, which raises demand for computational tools, especially user-friendly web servers, for elucidating protein structure and function. This project will transform RaptorX, a popular protein structure modeling web server, into one that can also annotate the functions of a protein sequence and assess the quality of a theoretical protein model in the absence of the corresponding native structure. The resulting server will greatly facilitate the interpretation and proper use of a theoretical protein model, much as the E-value does for homology search. The server will also predict the functions of a protein sequence with coverage beyond what native-structure-based methods can reach and with accuracy much higher than that of sequence-based methods. Ultimately, the project will deliver a long-term, sustainable cyber-infrastructure for protein sequence, structure, and function analysis that enables transformative biological and biomedical research. This project will also advance protein structure and function prediction by developing several sophisticated computational methods for model quality assessment and function prediction.
Proteins play fundamental roles in all biological processes. A complete description of protein structures and functions is a fundamental step towards understanding biological life. This project will benefit a broad range of biological and biomedical applications, such as the study of plant metabolic pathways, drug design, and bio-energy development. The research results will be communicated to the broader community through a variety of venues (wiki, talks, papers, and posters). The software will be freely available to the public. Since its first release in August 2011, RaptorX has processed tens of thousands of protein modeling and analysis jobs for more than 3500 users around the world. Once the new RaptorX is implemented, it will contribute much more to the broader community. This project will also contribute to computer science by studying machine learning problems inspired by protein bioinformatics. This project will enrich and disseminate knowledge on protein bioinformatics, machine learning, and web programming. It will also train minority students, future K-12 science teachers, and students nationwide in the Illinois online bioinformatics program. All involved students will receive training at the intersection of computer science, molecular biology, biophysics, and biochemistry. The research results will be integrated into course materials, which will be used in classes and made freely available to the public.