Determining protein interactions and functions is central to most proteomics projects. This research focuses on developing statistical and computational methods for analyzing protein interaction data from high-throughput proteomic technologies, such as yeast two-hybrid assays and mass spectrometry, and protein function data from databases of large-scale function annotations. The research addresses two important problems in biology: (1) identifying domain-domain and protein-domain interactions from a large number of protein-protein interactions, and (2) assigning functions to proteins of unknown function using the functions of annotated proteins, gene expression profiles, gene knockouts, protein sequence similarities, and protein-protein interactions. The results obtained for yeast proteins can help predict the interactions and functions of human proteins. Training postdoctoral associates and graduate students from mathematics, statistics, computer science, and molecular biology for proteomic research is an important part of the proposed research. Building on the existing excellent education program in computational biology and bioinformatics within the Center for Computational and Experimental Genomics (CCEG) at USC, the principal investigators plan to train future researchers through rigorous course work, seminars, discussion groups, and participation in the research project.

In recent years, an increasing number of genomes of model organisms have been sequenced. Using these genomic sequences, researchers have made tremendous progress in the study of genomes. Beyond these successes lies the far more challenging and rewarding task of understanding proteomes. In addition to genome sequences, many other databases, covering protein-protein physical interactions, genetic interactions, protein-DNA interactions, and gene expression, have become available. An important problem is how to combine information from these different databases to understand the biological processes and functions of proteins. The principal investigators will develop new statistical and computational methods for estimating and predicting protein functions and for addressing important biological problems by integrating several relevant large databases. They will also train graduate students and postdoctoral associates in computational biology and bioinformatics through participation in research activities related to the proposed project. The project will generate a suite of computer algorithms for analyzing protein-protein interactions and predicting protein functions. Both the algorithms and the results will be disseminated through the web. By identifying protein functions, the results of this study will be important for basic biological studies as well as for disease-related studies.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 0241102
Program Officer: Mary Ann Horn
Budget Start: 2003-07-01
Budget End: 2008-06-30
Fiscal Year: 2002
Total Cost: $1,036,000
Name: University of Southern California
City: Los Angeles
State: CA
Country: United States
Zip Code: 90089