New York University is awarded a grant to develop a public database that provides information about the structure and inferred function of proteins found in two plant genomes: the model species Arabidopsis thaliana and the crop plant Oryza sativa (rice). Proteins are synthesized in the cell as long polymers that fold into three-dimensional shapes critical for their function, so knowledge of a protein's 3D structure can be crucial for inferring its specific function. Multiple state-of-the-art methods for predicting protein structure from protein sequence will be applied to all annotated proteins in these plant genomes, including fold-recognition approaches, in which protein sequences are mapped onto known folds, and Rosetta de novo structure prediction, in which proteins are folded in silico. We will also integrate these structure predictions with knowledge of how proteins (and functional sites on folded proteins) evolve, by estimating the phylogenies of all protein domain families in these genomes and using codon-based molecular evolution models to identify positively selected amino acid sites in these protein families; these sites can then be mapped onto the predicted structures. This integration of structural and evolutionary information will yield annotated functional inferences that will be useful to a wide cross-section of biologists working on several plant species. These methods will be especially useful for annotating the large fraction of proteins in plant genomes whose functions are currently unknown, the majority of which have no annotated 3D folded structure (i.e., no detectable similarity to another protein with known structure). The bioinformatics resources for this project can also be extended to other sequenced plant genomes.
The project is a collaborative effort between New York University and the American Museum of Natural History and will be carried out on the World Community Grid (a loosely coupled computing platform composed of more than 400,000 volunteers, organized by IBM), sidestepping the computational barrier to the required genome-wide structure prediction in a cost-effective way. By participating in the World Community Grid (wcgrid.org), the project will also provide a forum for explaining plant genomics to several hundred thousand Grid participants spanning all geographic, age, and socioeconomic categories. Finally, the project will be coupled to a continuing education program for high school teachers at the NYU Steinhardt School that will train teachers to incorporate bioinformatics into high school science curricula.

Agency
National Science Foundation (NSF)
Institute
Division of Biological Infrastructure (DBI)
Type
Standard Grant (Standard)
Application #
0820757
Program Officer
Julie Dickerson
Project Start
Project End
Budget Start
2008-08-15
Budget End
2011-07-31
Support Year
Fiscal Year
2008
Total Cost
$1,621,765
Indirect Cost
Name
New York University
Department
Type
DUNS #
City
New York
State
NY
Country
United States
Zip Code
10012