Intruding into the midnight zone of protein comparisons

Rost, Burkhard

Abstract

The 'twilight zone' of protein sequence comparison is the region in which sequence similarity does not suffice to infer, for example, structural similarity. The vast majority of protein pairs of similar structure populate a 'midnight zone', i.e. their sequences differ too much for sequence-based comparisons. Here, we propose to refine, extend, and specialise methods combining sequence alignment, structure prediction, and functional information. The goal is to unravel hidden similarities in entirely sequenced organisms with a reliable, automatic tool. Towards the end of our project, the sequences for most protein families realised by life will supposedly be available. We hope that our system will correctly detect a relation for most of these.

(1) Prediction-based threading combines sequence alignments with predictions of secondary structure and accessibility to find remote similarities. We hope to considerably improve detection and alignment accuracy by comparing families with families rather than single proteins.

(2) About one third of all proteins in worm and fly seem to have long regions lacking regular secondary structure. We hope to develop a method tailored to reliably detect and compare such regions.

(3) No current method finds similarities between extremely diverged membrane proteins. We propose to develop such a method by combining 'membrane threading' with classifications of membrane proteins.

(4) Since sequence comparison in the twilight zone and below is an extremely demanding task, most existing methods have very low levels of accuracy. In practice, experts compare aspects of function between the protein pair under investigation. We want to develop an automatic method that evaluates functional aspects. In particular, we intend to start with proteins that bind to DNA. The tasks will be to (i) predict DNA-binding sites in proteins, and (ii) restrict the threading to the subset of proteins for which binding regions were found. In the following step, we hope to use general sequence motifs for the automatic comparison.

(5) Threading entire genomes: the first task will be to find all proteins in an entire organism for which the structure is known. However, the particular edge of our method will be to find remote similarities even in the absence of experimental information about structure.
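
The idea behind aim (1), scoring an alignment on sequence, predicted secondary structure, and predicted accessibility together, and comparing families with families, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the project's actual method: the column layout (one probability distribution each for residues, secondary-structure states, and accessibility states), the weights, the dot-product column score, and the plain global alignment with a linear gap penalty.

```python
from typing import Dict, Sequence

# One family-profile column: a distribution over residues ("aa"), over
# predicted secondary-structure states ("ss"), and over predicted
# accessibility states ("acc"). This layout is hypothetical.
Column = Dict[str, Dict[str, float]]

# Hypothetical weights for the three information sources.
W_SEQ, W_SS, W_ACC = 0.5, 0.3, 0.2


def overlap(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Dot product of two discrete distributions (0 = disjoint, 1 = identical point masses)."""
    return sum(prob * q.get(state, 0.0) for state, prob in p.items())


def column_score(a: Column, b: Column) -> float:
    """Weighted agreement between two profile columns across all three sources."""
    return (W_SEQ * overlap(a["aa"], b["aa"])
            + W_SS * overlap(a["ss"], b["ss"])
            + W_ACC * overlap(a["acc"], b["acc"]))


def align_score(fam_a: Sequence[Column], fam_b: Sequence[Column],
                gap: float = -0.2) -> float:
    """Best global alignment score of two family profiles (linear gap penalty)."""
    m = len(fam_b)
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, len(fam_a) + 1):
        cur = [i * gap] + [0.0] * m
        for j in range(1, m + 1):
            cur[j] = max(
                prev[j - 1] + column_score(fam_a[i - 1], fam_b[j - 1]),  # align columns
                prev[j] + gap,                                           # gap in fam_b
                cur[j - 1] + gap,                                        # gap in fam_a
            )
        prev = cur
    return prev[m]


# Two columns with different residues but the same predicted structure.
helix_ala = {"aa": {"A": 1.0}, "ss": {"H": 1.0}, "acc": {"buried": 1.0}}
helix_leu = {"aa": {"L": 1.0}, "ss": {"H": 1.0}, "acc": {"buried": 1.0}}
strand_val = {"aa": {"V": 1.0}, "ss": {"E": 1.0}, "acc": {"exposed": 1.0}}
```

In this toy setting, `helix_ala` and `helix_leu` share no residue yet still score above zero because their predicted structural states agree, which is the kind of signal a purely sequence-based comparison would miss.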

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 1R01GM063029-01
Application #: 6321115
Study Section: Special Emphasis Panel (ZRG1-BBCB (01))
Program Officer: Edmonds, Charles G

Project Start: 2001-04-05
Project End: 2005-03-31
Budget Start: 2001-04-05
Budget End: 2002-03-31
Support Year: 1
Fiscal Year: 2001
Total Cost: $279,773
Indirect Cost

Institution

Name: Columbia University (N.Y.)
Department: Biochemistry
Type: Schools of Medicine
DUNS #: 167204994

City: New York
State: NY
Country: United States
Zip Code: 10032

Related projects


NIH 2004 R01 GM	Intruding into the midnight zone of protein comparisons Rost, Burkhard / Columbia University (N.Y.)	$266,506
NIH 2003 R01 GM	Intruding into the midnight zone of protein comparisons Rost, Burkhard / Columbia University (N.Y.)	$266,057
NIH 2002 R01 GM	Intruding into the midnight zone of protein comparisons Rost, Burkhard / Columbia University (N.Y.)	$280,192
NIH 2001 R01 GM	Intruding into the midnight zone of protein comparisons Rost, Burkhard / Columbia University (N.Y.)	$279,773

Publications

Ofran, Yanay; Rost, Burkhard (2007) ISIS: interaction sites identified from sequence. Bioinformatics 23:e13-6

Mika, Sven; Rost, Burkhard (2006) Protein-protein interactions more conserved within species than across species. PLoS Comput Biol 2:e79

Mika, Sven; Rost, Burkhard (2005) NMPdb: Database of Nuclear Matrix Proteins. Nucleic Acids Res 33:D160-3

Schlessinger, Avner; Rost, Burkhard (2005) Protein flexibility and rigidity predicted from sequence. Proteins 61:115-26

Przybylski, Dariusz; Rost, Burkhard (2004) Improving fold recognition without folds. J Mol Biol 341:255-69

Mika, Sven; Rost, Burkhard (2004) Protein names precisely peeled off free text. Bioinformatics 20 Suppl 1:i241-7

Liu, Jinfeng; Rost, Burkhard (2004) Sequence-based prediction of protein domains. Nucleic Acids Res 32:3522-30

Bigelow, Henry R; Petrey, Donald S; Liu, Jinfeng et al. (2004) Predicting transmembrane beta-barrels in proteomes. Nucleic Acids Res 32:2566-77

Wrzeszczynski, Kazimierz O; Rost, Burkhard (2004) Cataloging proteins in cell cycle control. Methods Mol Biol 241:219-33

Liu, Jinfeng; Rost, Burkhard (2003) Domains, motifs and clusters in the protein universe. Curr Opin Chem Biol 7:5-11

Showing the most recent 10 out of 36 publications
