Carnegie Mellon University is awarded a grant to develop general-purpose software and a data resource for homology detection and classification of multi-domain sequence families. Reliable homology identification is essential for accurate gene annotation, function prediction and comparative genomics. Homology identification is a well-studied problem for single-domain proteins but remains unsolved for complex, multi-domain proteins. The project will develop a resource of hand-curated validation data sets; high-accuracy methods for detecting multi-domain homologs and orthologs; methods for automated classification of multi-domain families; and a publicly available database with a web-based interface and visualization tools for exploratory analysis of the data. A key component of the proposed platform is a novel, accurate method for inferring multi-domain family relationships by exploiting the local structure of the sequence similarity network. The project will produce a platform for studying the evolutionary processes of domain shuffling; the histories of gene duplications and of domain duplications, insertions, deletions and rearrangements in the evolution of specific multi-domain families; and the functional roles these families play in contemporary organisms. Accurate methods for identifying multi-domain homologs will enhance a broad range of applications in gene function prediction and comparative genomics, thus contributing to essential research infrastructure. Scientifically, the proposed research addresses basic classification problems that are fundamental to a broad range of widely used genomic analyses, including protein function prediction, especially for complex, multi-domain proteins. Such proteins are of particular interest in vertebrate genomes, where they are implicated in development, neural function, tissue repair and the immune system.
The research themes of the proposal will be incorporated into lectures for the Pittsburgh Supercomputing Center's workshop on Developing Bioinformatics Programs. This two-week course prepares faculty from the MARC (Minority Access to Research Careers) program to teach bioinformatics courses at their home campuses. Undergraduates will participate in research on individual multi-domain families through course projects in "Evolution and the History of Life".

Agency: National Science Foundation (NSF)
Institute: Division of Biological Infrastructure (DBI)
Type: Standard Grant (Standard)
Application #: 0641313
Program Officer: Reed Beaman
Budget Start: 2007-07-01
Budget End: 2010-06-30
Fiscal Year: 2006
Total Cost: $677,319
Name: Carnegie-Mellon University
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213