Biologists are deluged with sequence data yet have derived comparatively little biological information from it. The accurate annotation of protein function is key to understanding life, but experimentally determining what each protein does is costly and difficult, and cannot scale up to accommodate the vast amount of sequence data already available. Therefore, discovering protein function by computational rather than experimental means is of primary importance. Genomic sequence data are available from thousands of species, and those are coupled with massive high-throughput experimental data. Together, these data have created new opportunities as well as challenges for computational function prediction. As a result, many computational annotation methods have been developed by research groups worldwide, but their accuracy and applicability need to improve. The mission of the Automated Function Prediction Special Interest Group (AFP-SIG) is to bring together computational biologists, experimental biologists and biocurators who are dealing with the important problem of predicting protein function, to share ideas, and to create collaborations. To improve computational function prediction methods, the Critical Assessment of protein Function Annotation algorithms (CAFA) was established as an ongoing experiment. CAFA was designed to provide a large-scale assessment of computational methods dedicated to predicting protein function. By challenging dozens of research groups worldwide to develop and provide their best software for function prediction, the researchers involved in the AFP-SIG will improve the ability of biologists to understand life at the molecular level. The AFP-SIG researchers will also generate experimental data from fruit flies, fungi and bacteria, both to serve as benchmarks for testing the software participating in CAFA and to deepen understanding of these model organisms.

It is now possible to collect data that comprehensively profile many different states of complex biological systems. Using these data, it should be possible to understand and explain the underlying systems, but significant challenges remain. One of the primary challenges is that, as researchers collect more data from many different organisms in many different systems, they discover more and different genes. Assigning functions to these newly discovered genes is a key step towards interpreting high-throughput data. This leads to a critical need to assess the quality of the function prediction methods that researchers have developed in recent years. The mission of the Automated Function Prediction Special Interest Group (AFP-SIG), founded in 2005, is to bring together bioinformaticians and biologists who are addressing this key challenge of gene function prediction. In addition to sharing ideas and creating collaborations, AFP-SIG has created CAFA: the Critical Assessment of (protein) Function Annotation. CAFA is a community-driven challenge to assess the performance of protein function prediction software, and it has been carried out twice since 2010. The investigators will provide the following outcomes: (1) robust open-source software to be used in function prediction and in the assessment of function prediction methods, incorporated into the high-profile annotation pipelines of UniProt-GOA; (2) expansion of the AFP community by engaging bioinformaticians, biocurators and experimentalists, thereby improving the quality and relevance of function prediction methods; (3) large-scale experimental screens in Drosophila, Candida and Pseudomonas for novel associations of targeted functional terms with genes; (4) an expanded CAFA event in the last two years of the project, incorporating both curated annotations from the literature and the project's own experimental screens. The progress of the AFP-SIG and CAFA will be available from http://BioFunctionPrediction.org.

Agency: National Science Foundation (NSF)
Institute: Division of Biological Infrastructure (DBI)
Type: Standard Grant (Standard)
Application #: 1458390
Program Officer: Peter McCartney
Project Start:
Project End:
Budget Start: 2015-09-01
Budget End: 2019-08-31
Support Year:
Fiscal Year: 2014
Total Cost: $498,724
Indirect Cost:
Name: University of Pennsylvania
Department:
Type:
DUNS #:
City: Philadelphia
State: PA
Country: United States
Zip Code: 19104