Finding Protein Sequence Motifs - Methods and Applications

Koonin, E

Abstract

The generation of protein sequence data on a genome scale has greatly increased the demand for rapid, sensitive and reliable methods for detecting functionally important, conserved motifs (cm) in proteins. A method for detecting cm in protein sequence databases and assessing their statistical significance was developed and implemented in the CAP (Consistent Alignment Parser) and MoST (Motif Search Tool) programs. The MoST procedure consists of iteratively abstracting from an alignment block a weight matrix representing the cm, scanning the database with this matrix, and locating new segments to add to the alignment block. The approach is based on the statistics of score distributions for position-dependent weight matrices. This method was generalized to allow searches with two alignment blocks separated by a variable distance; this procedure was implemented in the MoST2 program. Methods for motif detection are further used in conjunction with other methods for protein sequence analysis in order to identify conserved domains and delineate protein superfamilies. This strategy was applied to a variety of biologically important groups of proteins. Selected examples: An S-adenosylmethionine-binding motif was identified in the eukaryotic nucleolar proteins fibrillarins, and it was predicted that fibrillarins possess rRNA methyltransferase activity. A dinucleotide-binding domain was detected in a family of guanine nucleotide exchange proteins, one of which is implicated in human hereditary blindness. A superfamily of proteins containing a lyase domain was delineated, and unexpectedly, such a domain was detected in adducin, a eukaryotic cytoskeletal protein implicated in hereditary hypertension. A nucleotidyltransferase domain, an acetyltransferase domain, and a putative new protein-protein interaction domain were detected in a family of eukaryotic translation initiation factors.
A library of conserved motifs that characterize protein families with representatives encoded in the Escherichia coli genome was constructed. The library consists of 166 conserved alignment blocks that can be used by the MoST program. The significance of the project lies in the development of a coherent strategy for identifying cm and domains and delineating protein superfamilies, and in the prediction of the functions of a number of biologically important proteins using these methods.
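
The iterative procedure described in the abstract (abstract a weight matrix from an alignment block, scan the database, add new segments to the block) can be illustrated with a minimal Python sketch. This is not the MoST or MoST2 code. All function names are hypothetical, and the uniform background frequencies, the pseudocount, the fixed raw-score cutoff (standing in for the statistical significance assessment the abstract mentions), and the summing of the two block scores are simplifying assumptions made only for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_weight_matrix(block, pseudocount=1.0):
    """Log-odds weight matrix from equal-length, ungapped aligned segments.

    Assumption: uniform background frequencies and a flat pseudocount.
    """
    background = 1.0 / len(AMINO_ACIDS)
    total = len(block) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for pos in range(len(block[0])):
        counts = Counter(seg[pos] for seg in block)
        matrix.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return matrix


def score_segment(matrix, segment):
    """Sum of per-position weights; unknown residues contribute 0."""
    return sum(col.get(aa, 0.0) for col, aa in zip(matrix, segment))


def scan(matrix, sequences, cutoff):
    """Collect every window in the database scoring at or above the cutoff."""
    width = len(matrix)
    hits = set()
    for seq in sequences:
        for i in range(len(seq) - width + 1):
            window = seq[i:i + width]
            if score_segment(matrix, window) >= cutoff:
                hits.add(window)
    return hits


def iterative_motif_search(seed_block, sequences, cutoff, max_rounds=10):
    """Rebuild the matrix and rescan until no new segments are found."""
    block = list(dict.fromkeys(seed_block))  # drop duplicates, keep order
    for _ in range(max_rounds):
        matrix = build_weight_matrix(block)
        new_segments = sorted(scan(matrix, sequences, cutoff) - set(block))
        if not new_segments:
            break
        block.extend(new_segments)
    return block


def best_two_block_hit(matrix_a, matrix_b, seq, min_gap, max_gap):
    """Best (score, start_a, start_b) for block A followed by block B.

    Assumption: the combined score is the plain sum of the two block
    scores, with the spacer length allowed to vary in [min_gap, max_gap].
    """
    width_a, width_b = len(matrix_a), len(matrix_b)
    best = None
    for i in range(len(seq) - width_a + 1):
        for gap in range(min_gap, max_gap + 1):
            j = i + width_a + gap
            if j + width_b > len(seq):
                break
            total = (score_segment(matrix_a, seq[i:i + width_a])
                     + score_segment(matrix_b, seq[j:j + width_b]))
            if best is None or total > best[0]:
                best = (total, i, j)
    return best


if __name__ == "__main__":
    database = ["MKLGAGKTAAA", "PPPPPPPPPP"]  # toy sequences, not real data
    print(iterative_motif_search(["GAGKS", "GSGKT"], database, cutoff=3.0))
```

In this toy run, the seed block picks up one additional segment from the first database sequence and the search then stops because a rescan finds nothing new. A real search would replace the fixed cutoff with a significance estimate derived from the score distribution of the matrix.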

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Intramural Research (Z01)
Project #: 1Z01LM000061-02
Application #: 5203632
Support Year: 2
Fiscal Year: 1995

Institution

Name: National Library of Medicine

Country: United States

Publications

Makarova, Kira S; Wolf, Yuri I; Koonin, Eugene V (2009) Comprehensive comparative-genomic analysis of type 2 toxin-antitoxin systems and related mobile stress response systems in prokaryotes. Biol Direct 4:19

Makarova, Kira S; Wolf, Yuri I; van der Oost, John et al. (2009) Prokaryotic homologs of Argonaute proteins are predicted to function as key components of a novel system of defense against mobile genetic elements. Biol Direct 4:29

Ng, C Leong; Waterman, David G; Koonin, Eugene V et al. (2009) Conformational flexibility and molecular interactions of an archaeal homologue of the Shwachman-Bodian-Diamond syndrome protein. BMC Struct Biol 9:32

Yutin, Natalya; Wolf, Maxim Y; Wolf, Yuri I et al. (2009) The origins of phagocytosis and eukaryogenesis. Biol Direct 4:9

Wolf, Yuri I; Novichkov, Pavel S; Karev, Georgy P et al. (2009) Inaugural Article: The universal distribution of evolutionary rates of genes and distinct characteristics of eukaryotic genes of different apparent ages. Proc Natl Acad Sci U S A 106:7273-80

Koonin, Eugene V; Aravind, L (2009) Comparative genomics, evolution and origins of the nuclear envelope and nuclear pore complex. Cell Cycle 8:1984-5

Galperin, Michael Y (2008) Telling bacteria: do not LytTR. Structure 16:657-9

Hou, Shaobin; Makarova, Kira S; Saw, Jimmy H W et al. (2008) Complete genome sequence of the extremely acidophilic methanotroph isolate V4, Methylacidiphilum infernorum, a representative of the bacterial phylum Verrucomicrobia. Biol Direct 3:26

Basu, Malay Kumar; Carmel, Liran; Rogozin, Igor B et al. (2008) Evolution of protein domain promiscuity in eukaryotes. Genome Res 18:449-61

Elkins, James G; Podar, Mircea; Graham, David E et al. (2008) A korarchaeal genome reveals insights into the evolution of the Archaea. Proc Natl Acad Sci U S A 105:8102-7

Showing the most recent 10 out of 50 publications
