Thanks to continuing developments in DNA sequencing technology, we now know the exact genetic makeup ("genome") of thousands of different organisms, encoding millions of different proteins, and these numbers continue to grow rapidly. But knowing the chemical specification (the "sequence") of these proteins is only a first step: the ultimate goal is to discover how genes and proteins function to support the diversity of life, and how some of them can be used in commercial and biotechnology applications. This research project will expand the capability of scientists and their students to advance their analyses from sequences to functions by bringing together multiple state-of-the-art approaches. Each of these approaches draws on both computational expertise (necessary to address a problem of this magnitude) and broad biological expertise.

The general approach in this project is to classify proteins into families of related proteins and, wherever possible, to describe how each family relates to function. These relations may be very complex, and scientific accuracy will require the application of multiple, diverse methods. To accomplish this aim, the project will expand InterPro, a widely used resource that already contains (though with limited integration) three of the leading databases for protein family and functional classification: PANTHER, Pfam and TIGRFAM. A fourth classification resource, the Structure-Function Linkage Database (SFLD), will also be incorporated into InterPro. These four databases use complementary methodologies to represent and describe protein relationships; integrating them will address the problem of protein function classification with unprecedented accuracy, precision and ease of use. The products of this work will be used to improve sequence analysis tools that support the scientific community and to provide enhanced educational materials, and will be broadly accessible over the web at http://ebi.ac.uk/interpro.

Agency: National Science Foundation (NSF)
Institute: Division of Biological Infrastructure (DBI)
Application #: 1458808
Program Officer: Peter McCartney
Project Start:
Project End:
Budget Start: 2015-07-01
Budget End: 2019-06-30
Support Year:
Fiscal Year: 2014
Total Cost: $1,679,187
Indirect Cost:
Name: University of Southern California
Department:
Type:
DUNS #:
City: Los Angeles
State: CA
Country: United States
Zip Code: 90089