Core Research Grant

Nicholas, Hugh

Abstract

This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. The subproject and investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is for the Center, which is not necessarily the institution for the investigator.

In the past year we have developed an algorithm that identifies the residues in biological macromolecules that confer the necessary specificity of interaction on the members of a paralogous family of molecules that carry out the same function with or upon different and distinct partners or substrates. For example, each paralogous tRNA interacts selectively with only one of twenty different aminoacyl tRNA synthetases. Each paralogous serine protease cleaves peptide bonds involving only particular amino acids, and each paralogous heterotrimeric G binding protein binds only a specific receptor and activates only particular kinases or other members of specific signaling pathways. The algorithm we have developed identifies the sequence features that confer this specificity on individual molecules. In applying this analysis, we have discovered that it not only identifies sequence elements that confer the desired unique properties but also identifies ensembles of co-evolving sequence elements, which we describe below. We believe these co-evolving ensembles to be an important part of the mechanism by which biological macromolecules fine-tune their specificity of action.

The identification of sequence elements that confer specificity of action on biological macromolecules is achieved by dividing sequence residues into three categories, the second of which is the focus of our research:

1. Highly conserved sequence residues essential to the structure and activity of the entire homologous family of macromolecules.
2. Highly circumscribed sequence residues that maintain the specificity of the activity within the paralogous subfamilies.
3. Sequence residues that may vary freely.

We assign residues to these three categories based on the amounts of two different kinds of entropy associated with each sequence residue in a multiple sequence alignment. The first is the family relative entropy, the entropy calculated at a particular position in the alignment over all of the sequences in the alignment (all the sequences in the family). The family relative entropy achieves its highest value when all of the sequences in an alignment have the same kind of residue at that position and that kind of residue is rare compared to other possible residues. The family relative entropy is computed as:

sum(i) {p_i * log2(p_i / q_i)}

where p_i is the fraction of residue type i in a particular position of the alignment and q_i is the fraction of residue type i expected in random sequence. q_i is usually taken from the fractions of residue types in an appropriate sequence database.

The second kind of entropy considered is the group cross entropy. The group cross entropy achieves its highest value when only a single kind of residue is found within the group and a different single kind of residue is found in the rest of the sequences in the alignment. It is computed as:

sum(i) {(p_i - q_i) * log2(p_i / q_i)}

where p_i is the fraction of residue type i in a particular position of the alignment for sequences in the predefined group and q_i is the fraction of residue type i in the same position for sequences not in the predefined group. This form of the cross entropy is symmetric and hence usable as a distance measure in various clustering procedures.

Category 1 residues are those that have a high family relative entropy and a low group cross entropy for all groups. Category 2 residues are those that have a low family relative entropy and a high group cross entropy for at least one of the predefined groups. Category 3 residues are those where both the family relative entropy and the group cross entropy are low. We generally define a high entropy score to be a normalized Z score of 3 or greater, although for some analyses a value as low as 2 can be useful. (The normalized Z score is the raw score minus the average score, divided by the standard deviation of the scores.) Note that the underlying entropy values are not normally distributed, and thus the Z scores should not be used for inferring statistical significance.

The analysis is not specific to either protein or nucleic acid sequences. This allowed me to check that the methods would work on a biologically important system where the answers were already known from extensive experimental work as well as previous analysis. I applied the new analysis to the same system of 67 tRNAs that I had analyzed earlier with the initial, simple counting models (McClain and Nicholas, 1987). The experimental work confirming the correctness of this earlier analysis is reviewed in McClain (1995). The new, information-based methods provided the same answers with a substantially improved signal-to-noise ratio (Nicholas, 1999). I presented this as an invited talk at the 'Emerging Sources of RNA Information' workshop in December of 1998. The improved signal-to-noise ratio will make it easier to select which answers to test by laboratory experiment.
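The two entropy scores described in the abstract can be illustrated with a small sketch. This is not the investigators' implementation. It assumes the family relative entropy takes the standard relative entropy form sum(i) {p_i * log2(p_i / q_i)}, and it writes the group cross entropy as sum(i) {(p_i - q_i) * log2(p_i / q_i)} so that the score is non-negative. The toy alignment, the uniform background, the pseudocount, the use of the population standard deviation, and all function names are hypothetical choices made only for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGU"   # toy RNA alphabet (assumption)
PSEUDOCOUNT = 0.5   # avoids log(0) and division by zero; not from the source


def frequencies(residues):
    """Residue-type fractions at one alignment position, with pseudocounts."""
    counts = Counter(residues)
    total = len(residues) + PSEUDOCOUNT * len(ALPHABET)
    return {a: (counts[a] + PSEUDOCOUNT) / total for a in ALPHABET}


def family_relative_entropy(column, background):
    """sum_i p_i * log2(p_i / q_i), with q_i the background fractions."""
    p = frequencies(column)
    return sum(p[a] * math.log2(p[a] / background[a]) for a in ALPHABET)


def group_cross_entropy(in_group, out_group):
    """Symmetric score: sum_i (p_i - q_i) * log2(p_i / q_i)."""
    p = frequencies(in_group)
    q = frequencies(out_group)
    return sum((p[a] - q[a]) * math.log2(p[a] / q[a]) for a in ALPHABET)


def z_scores(values):
    """(raw - mean) / standard deviation; population SD is an assumption."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]


# Hypothetical toy alignment: six sequences, three positions.
alignment = ["AGA", "AGC", "AGG", "ACU", "ACA", "ACC"]
in_group = [True, True, True, False, False, False]  # predefined group labels
background = {a: 0.25 for a in ALPHABET}            # uniform q_i (assumption)

columns = list(zip(*alignment))
family = [family_relative_entropy(col, background) for col in columns]
cross = []
for col in columns:
    inside = [r for r, g in zip(col, in_group) if g]
    outside = [r for r, g in zip(col, in_group) if not g]
    cross.append(group_cross_entropy(inside, outside))

for i, (f, c) in enumerate(zip(family, cross)):
    print(f"position {i}: family relative entropy {f:.3f}, "
          f"group cross entropy {c:.3f}")
```

In this toy alignment, position 0 is fully conserved and receives the largest family relative entropy with zero group cross entropy. Position 1 carries one residue type inside the group and a different one outside, and it receives the largest group cross entropy. Position 2 varies freely and scores low on both. Three positions are far too few for the Z score thresholds of 2 or 3 to be meaningful, so z_scores is included only to show the normalization step.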

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Center for Research Resources (NCRR)
Type: Biotechnology Resource Grants (P41)
Project #: 5P41RR006009-18
Application #: 7723100
Study Section: Special Emphasis Panel (ZRG1-BCMB-Q (40))

Project Start: 2008-08-01
Project End: 2009-07-31
Budget Start: 2008-08-01
Budget End: 2009-07-31
Support Year: 18
Fiscal Year: 2008
Total Cost: $473
Indirect Cost

Institution

Name: Carnegie-Mellon University
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 052184116

City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213
