Detecting Homology in the "Twilight Zone" of Sequence Similarity

Patterson, Randen

Abstract

The "protein problem" has remained unsolved despite decades of research [1, 2]. In principle, the primary amino acid sequence of a protein determines its structure, function, and evolutionary (SF&E) characteristics. Yet there is still no reliable method for predicting the native-state structure of a protein, or its function, from sequence alone. Inferring the evolutionary relationships among highly divergent protein sequences is equally daunting. In general, when pairwise alignments between protein sequences fall below 25% identity, statistical measures do not provide support robust enough to identify clear phylogenetic relationships, despite intensive research in this area [1, 3, 4]. The recent growth in available knowledge bases and in computational techniques for analyzing complex data has created an unprecedented opportunity to extract information from protein sequences. Starting from the premise that protein sequence encodes information about SF&E, we developed a unified, knowledge-based framework for inferring SF&E from sequence: we measure the similarity between a query sequence and a set of biologically relevant profiles in an unbiased manner. Results from this Gestalt Domain Detection Algorithm-Basic Local Alignment Tool (GDDA-BLAST) provide phylogenetic profiles that can model SF&E relationships among diverse proteins. Indeed, GDDA-BLAST can derive deep phylogenetic relationships for highly divergent proteins in a quantifiable manner [5, 6].
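To make the 25% identity threshold mentioned above concrete, the following is a minimal sketch of one common way to compute percent identity for a pair of already-aligned sequences. The function names, the gap character, and the choice of denominator (columns where neither sequence has a gap) are assumptions for illustration; percent identity has several definitions in the literature, and the proposal does not state which one it uses.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: the two strings are rows of an existing pairwise alignment,
    so they have the same length and use `gap` for inserted positions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns (one of several possible conventions)
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def in_twilight_zone(aligned_a: str, aligned_b: str, threshold: float = 25.0) -> bool:
    """True when the pair falls below the identity threshold (25% by default)."""
    return percent_identity(aligned_a, aligned_b) < threshold


if __name__ == "__main__":
    # Toy sequences, not real proteins.
    print(percent_identity("ACDE", "ACDF"))
    print(in_twilight_zone("ACDEFGHK", "AWWWWWWW"))
```

Counting only ungapped columns keeps the sketch short; dividing by the full alignment length or by the shorter sequence length are equally common conventions and would give lower values for gapped alignments.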
Preliminary results from our computational case study of the highly divergent family of retroelements accord with those previously reported, and demonstrate that GDDA-BLAST measurements can be treated as "fingerprints" from which distance estimates, and hence phylogenetic relationships, can be derived without prior information, multiple sequence alignment, or manual editing. We propose that sequence information present within the "twilight zone" of sequence similarity can provide key insight into SF&E relationships among distantly related and/or rapidly evolving proteins. This proposal aims to push the limits of homology detection within the "twilight zone" by evaluating and optimizing GDDA-BLAST performance on benchmark and experimental data sets. Armed with these refined GDDA-BLAST measurements, we propose to conduct a comprehensive, ab initio phylogenetic study of retroelements and of RNA-dependent RNA polymerases from the positive-strand family of RNA viruses (+ssRNA). Simultaneously, we will derive high-resolution maps of domain boundaries and empirically validate functional annotations and predictions of the key residues for those activities. This work aims to carry translational research from the computer to the laboratory benchtop. We expect that the tools and resources generated from this grant will be accessible and user-friendly to the bench scientist, thereby speeding discovery in other clinically relevant research endeavors.
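The idea of treating per-sequence measurements as "fingerprints" and deriving distances from them can be sketched as follows. This is not the GDDA-BLAST implementation: the fingerprint values are made-up toy numbers, each fingerprint is assumed to be a fixed-length vector of scores against the same ordered set of profiles, and Euclidean distance is an assumed choice of metric, since the abstract does not specify one.

```python
import math
from typing import Dict, List, Tuple


def euclidean(u: List[float], v: List[float]) -> float:
    """Euclidean distance between two equal-length score vectors."""
    if len(u) != len(v):
        raise ValueError("fingerprints must be scored against the same profiles")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def distance_matrix(fingerprints: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """All-against-all distances between named fingerprints."""
    names = list(fingerprints)
    return {
        a: {b: euclidean(fingerprints[a], fingerprints[b]) for b in names}
        for a in names
    }


def closest_pair(matrix: Dict[str, Dict[str, float]]) -> Tuple[str, str, float]:
    """Return the two distinct sequences with the smallest distance."""
    names = list(matrix)
    best = None
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = matrix[a][b]
            if best is None or d < best[2]:
                best = (a, b, d)
    if best is None:
        raise ValueError("need at least two sequences")
    return best


if __name__ == "__main__":
    # Hypothetical fingerprints: one score per profile, toy values only.
    toy = {
        "seqA": [0.9, 0.1, 0.0],
        "seqB": [0.8, 0.2, 0.0],
        "seqC": [0.0, 0.1, 0.9],
    }
    matrix = distance_matrix(toy)
    print(closest_pair(matrix)[:2])
```

A distance matrix of this kind is the usual input to distance-based tree-building methods such as neighbor joining, which is one way such fingerprints could yield a phylogeny without a multiple sequence alignment.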

Public Health Relevance

The long-term implication of this proposal is the development of a unified framework for high-resolution, simultaneous measurements of structure, function, and evolution. Should this prove possible: (i) functional and evolutionary measurements could quantitatively inform structural modeling to derive accurate atomic-resolution protein structures; (ii) structural and functional measurements could inform evolutionary histories to derive accurate evolutionary rates, deep-branch relationships, and homologous spaces within each protein; and (iii) structural and evolutionary measurements could indicate the location of functionalities within any protein and the regulatory elements that control these functions. Armed with this information, the speed at which diseases could be understood, and pharmacophores and therapies developed to combat them, would likely increase dramatically.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM087410-05
Application #: 8288082
Study Section: Genetic Variation and Evolution Study Section (GVE)
Program Officer: Lyster, Peter

Project Start: 2009-04-10
Project End: 2014-03-31
Budget Start: 2012-04-01
Budget End: 2014-03-31
Support Year: 5
Fiscal Year: 2012
Total Cost: $232,788
Indirect Cost: $49,803

Institution

Name: University of California Davis
Department: Physiology
Type: Schools of Medicine
DUNS #: 047120084

City: Davis
State: CA
Country: United States
Zip Code: 95618

Related projects


NIH 2012 R01 GM	Detecting Homology in the "Twilight Zone" of Sequence Similarity Patterson, Randen Lee / University of California Davis	$232,788
NIH 2011 R01 GM	Detecting Homology in the "Twilight Zone" of Sequence Similarity Patterson, Randen Lee / University of California Davis	$232,472
NIH 2010 R01 GM	Detecting Homology in the "Twilight Zone" of Sequence Similarity Patterson, Randen Lee / Pennsylvania State University	$141,578
NIH 2010 R01 GM	Detecting Homology in the "Twilight Zone" of Sequence Similarity Patterson, Randen Lee / University of California Davis	$132,732
NIH 2009 R01 GM	Detecting Homology in the "Twilight Zone" of Sequence Similarity Patterson, Randen Lee / Pennsylvania State University	$265,987

Publications

Chintapalli, Sree V; Bhardwaj, Gaurav; Patel, Reema et al. (2015) Molecular dynamic simulations reveal the structural determinants of Fatty Acid binding to oxy-myoglobin. PLoS One 10:e0128496

Lindy, Amanda S; Parekh, Puja K; Zhu, Richard et al. (2014) TRPV channel-mediated calcium transients in nociceptor neurons are dispensable for avoidance behaviour. Nat Commun 5:4734

Chintapalli, Sree V; Bhardwaj, Gaurav; Babu, Jagadish et al. (2013) Reevaluation of the evolutionary events within recA/RAD51 phylogeny. BMC Genomics 14:240

Todd, George K; Boosalis, Casey A; Burzycki, Aaron A et al. (2013) Towards neuronal organoids: a method for long-term culturing of high-density hippocampal neurons. PLoS One 8:e58996

Bhardwaj, Gaurav; Ko, Kyung Dae; Hong, Yoojin et al. (2012) PHYRN: a robust method for phylogenetic analysis of highly divergent sequences. PLoS One 7:e34261

Hong, Yoojin; Chintapalli, Sree Vamsee; Ko, Kyung Dae et al. (2011) Predicting protein folds with fold-specific PSSM libraries. PLoS One 6:e20557

Han, Qingxia; Aligo, Jason; Manna, David et al. (2011) Conserved GXXXG- and S/T-like motifs in the transmembrane domains of NS4B protein are required for hepatitis C virus replication. J Virol 85:6464-79

Hong, Yoojin; Kang, Jaewoo; Lee, Dongwon et al. (2010) Adaptive GDDA-BLAST: fast and efficient algorithm for protein sequence embedding. PLoS One 5:e13596

Kiselyov, Kirill; van Rossum, Damian B; Patterson, Randen L (2010) TRPC channels in pheromone sensing. Vitam Horm 83:197-213

Hong, Yoojin; Chalkia, Dimitra; Ko, Kyung Dae et al. (2009) Phylogenetic Profiles Reveal Structural and Functional Determinants of Lipid-binding. J Proteomics Bioinform 2:139-149
