Finding Protein Sequence Motifs - Methods and Applications

Koonin, E

Abstract

Sequence information is accumulating far faster than experimental data on protein function, so sensitive methods for protein sequence analysis, including the detection of subtle but functionally important motifs, are becoming steadily more important. The goals of this project include developing a coherent strategy for delineating protein superfamilies and predicting protein function, with the eventual aim of constructing a comprehensive database of protein functional motifs.
The methods used included: database searches with individual sequences (programs of the BLAST and FASTA families) and with multiple sequence alignments (the HMMer package, which builds hidden Markov models from multiple alignments and applies them to database screening); methods for detecting motifs in protein sequences, including those developed at an earlier stage of this project (programs PAST, CAP, MoST, GIBBS); multiple sequence alignment methods (programs MACAW, CLUSTALW); methods for partitioning protein sequences into predicted globular and non-globular domains (program SEG with varying parameters); methods for predicting protein secondary structure (programs PHD, COILS), transmembrane domains (PHDhtm), and signal peptides (SignalP); a method for predicting coding regions in DNA based on non-homogeneous Markov models (GeneMark); and methods for clustering proteins by sequence similarity (CLUS).
These methods were combined into a sequence analysis strategy designed primarily to analyze efficiently the sequences of large, multidomain proteins, which comprise the majority of the products of genes implicated in human diseases. The protein sequences were first partitioned into putative globular and non-globular domains; database searches were then conducted separately with the sequences of the individual globular domains, using a combination of transitive BLAST searches and motif analysis. In addition to general-purpose sequence databases, separate, smaller databases were constructed using information on protein function and/or phylogenetic origin.
Two large data sets, namely the products of genes involved in animal development and the products of positionally cloned human disease genes, were analyzed using these approaches. A variety of previously uncharacterized but potentially functionally important domains and motifs were discovered. Two important examples are a putative FAD-binding domain in the human choroideremia protein, whose modified dinucleotide-binding consensus had prevented its earlier detection, and a domain designated BRCT, which is conserved in a number of proteins involved in DNA damage-responsive cell cycle checkpoints, including the product of the human BRCA1 gene implicated in hereditary breast and ovarian cancers.
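
The first step of the strategy described in the abstract, partitioning a protein into putative globular and non-globular regions before searching each globular region separately, can be illustrated with a rough sketch. This is not the SEG algorithm itself: it substitutes a plain sliding-window Shannon entropy for SEG's complexity measure, and the function names, window length, and threshold are hypothetical values chosen only for illustration.

```python
import math
from collections import Counter


def window_entropy(window: str) -> float:
    """Shannon entropy (bits per residue) of a window's residue composition."""
    counts = Counter(window)
    total = len(window)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def partition(seq: str, window: int = 12, threshold: float = 2.2):
    """Split seq into (start, end, label) segments, with end exclusive.

    A residue is labelled "low" (compositionally biased, a crude stand-in
    for a putative non-globular region) if any window covering it has
    entropy below the threshold; all other residues are labelled "high".
    Both parameters are illustrative assumptions, not published settings.
    """
    low = [False] * len(seq)
    for i in range(max(len(seq) - window + 1, 0)):
        if window_entropy(seq[i:i + window]) < threshold:
            for j in range(i, i + window):
                low[j] = True

    # Merge runs of identically labelled residues into segments.
    segments = []
    start = 0
    for i in range(1, len(seq) + 1):
        if i == len(seq) or low[i] != low[start]:
            segments.append((start, i, "low" if low[start] else "high"))
            start = i
    return segments


if __name__ == "__main__":
    # Toy sequence: two diverse stretches flanking a serine-rich linker.
    toy = "MKVLAAGIWDERTYHFNPQC" + "S" * 20 + "ACDEFGHIKLMNPQRSTVWY"
    for start, end, label in partition(toy):
        print(start, end, label, toy[start:end])
```

In a workflow like the one outlined above, each "high" segment would then be submitted as its own query to a database search program, so that matches to the biased region do not swamp the matches to the conserved domains.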

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Intramural Research (Z01)
Project #: 1Z01LM000061-03
Application #: 2578634
Study Section: Special Emphasis Panel (CBB)

Support Year: 3
Fiscal Year: 1996

Institution

Name: National Library of Medicine
Country: United States

Publications

Ng, C Leong; Waterman, David G; Koonin, Eugene V et al. (2009) Conformational flexibility and molecular interactions of an archaeal homologue of the Shwachman-Bodian-Diamond syndrome protein. BMC Struct Biol 9:32

Yutin, Natalya; Wolf, Maxim Y; Wolf, Yuri I et al. (2009) The origins of phagocytosis and eukaryogenesis. Biol Direct 4:9

Wolf, Yuri I; Novichkov, Pavel S; Karev, Georgy P et al. (2009) Inaugural Article: The universal distribution of evolutionary rates of genes and distinct characteristics of eukaryotic genes of different apparent ages. Proc Natl Acad Sci U S A 106:7273-80

Koonin, Eugene V; Aravind, L (2009) Comparative genomics, evolution and origins of the nuclear envelope and nuclear pore complex. Cell Cycle 8:1984-5

Makarova, Kira S; Wolf, Yuri I; Koonin, Eugene V (2009) Comprehensive comparative-genomic analysis of type 2 toxin-antitoxin systems and related mobile stress response systems in prokaryotes. Biol Direct 4:19

Makarova, Kira S; Wolf, Yuri I; van der Oost, John et al. (2009) Prokaryotic homologs of Argonaute proteins are predicted to function as key components of a novel system of defense against mobile genetic elements. Biol Direct 4:29

Galperin, Michael Y (2008) Telling bacteria: do not LytTR. Structure 16:657-9

Hou, Shaobin; Makarova, Kira S; Saw, Jimmy H W et al. (2008) Complete genome sequence of the extremely acidophilic methanotroph isolate V4, Methylacidiphilum infernorum, a representative of the bacterial phylum Verrucomicrobia. Biol Direct 3:26

Basu, Malay Kumar; Carmel, Liran; Rogozin, Igor B et al. (2008) Evolution of protein domain promiscuity in eukaryotes. Genome Res 18:449-61

Elkins, James G; Podar, Mircea; Graham, David E et al. (2008) A korarchaeal genome reveals insights into the evolution of the Archaea. Proc Natl Acad Sci U S A 105:8102-7

Showing the most recent 10 out of 50 publications
