Advances in genome-scale sequencing have opened a unique pathway for understanding biological function. The challenge lies in mapping the information content of sequences to the dynamic physicochemical and biological properties of biomolecules. Over the previous funding period, in contrast to conventional alignment-based methods, we pursued an alignment-independent classification approach that builds on information theory and on a search engine technology previously used successfully to classify medical records. We analyzed the probabilistic occurrence of n-gram patterns NP(r,s) (segments of n = r + s contiguous residues, including s wild cards) in protein sequences. We found, and verified, that NP(4,2) patterns provide a highly informative description of conservation behavior and secondary structural propensities, while triplets of residues (3-grams) distinguished by their unique or rare natural occurrence appear to impart specificity. We have now built a fully automated tool for screening any query sequence against the more than 5 million sequences accessible in UniProtKB, to help identify distinctive n-grams that play functional roles. In this competing renewal application we plan to extend our previous studies by evaluating n-gram patterns and conservation profiles with regard to the physicochemical properties of residues (aim 1); by examining n-gram pattern covariance within single-domain proteins and at the interfaces of protein-protein and protein-DNA complexes in all major protein families (aim 2); by analyzing inter-residue contacts for representative protein family members that have 3-dimensional structures (aim 2); and by examining correlations between residue motions during the equilibrium dynamics of proteins and their complexes (aim 3).
A systematic methodology for mapping between n-gram patterns, residue co-variations, and dynamic correlations, together with the flexible server framework launched during the initial funding period, will form the basis for an integrated web-based tool that helps users map the information content of sequences to the dynamic and structural properties that are important for biological function (aim 4). Two major application areas are the structural characterization of membrane proteins and the assessment of possible sites of allosteric interaction in multimeric structures and complexes. Selected systems, including glutamate transporters and receptors as membrane proteins, HIV protease and DNA helicase as multimeric enzymes, and transcription factors that form complexes with DNA, will serve as prototypes for refining the computational methodology and for answering important biological questions.
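The NP(r,s) definition in the abstract (windows of n = r + s contiguous residues, s of which are wild cards) can be illustrated with a minimal sketch. This is not the project's tool. The function names, the "." wildcard symbol, and the rule that the first and last positions of a window are always fixed residues are assumptions made here for illustration only.

```python
from collections import Counter
from itertools import combinations

WILDCARD = "."  # assumed wildcard symbol; the abstract does not specify one


def wildcard_masks(r, s):
    """Return all placements of s wildcards in a window of length n = r + s.

    Assumption (not from the abstract): the first and last positions stay
    fixed, so every pattern spans exactly n residues.
    """
    n = r + s
    return list(combinations(range(1, n - 1), s))


def count_patterns(sequence, r, s):
    """Count every NP(r, s) pattern occurring in a protein sequence."""
    n = r + s
    masks = wildcard_masks(r, s)
    counts = Counter()
    # Slide a window of length n over the sequence.
    for start in range(len(sequence) - n + 1):
        window = sequence[start:start + n]
        # Emit one pattern per wildcard placement.
        for mask in masks:
            pattern = "".join(
                WILDCARD if i in mask else residue
                for i, residue in enumerate(window)
            )
            counts[pattern] += 1
    return counts
```

Under these assumptions, `count_patterns(seq, 4, 2)` enumerates NP(4,2) patterns and `count_patterns(seq, 3, 0)` reduces to plain 3-gram counting. Turning the counts into occurrence probabilities would require a reference set such as UniProtKB, which this sketch does not cover.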

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Ye, Jane
University of Pittsburgh
Schools of Medicine
United States
Koes, David R; Vries, John K (2017) Evaluating amber force fields using computed NMR chemical shifts. Proteins 85:1944-1956
Koes, David R; Vries, John K (2017) Error assessment in molecular dynamics trajectories using computed NMR chemical shifts. Comput Theor Chem 1099:152-166
Dutta, Anindita; Krieger, James; Lee, Ji Young et al. (2015) Cooperative Dynamics of Intact AMPA and NMDA Glutamate Receptors: Similarities and Subfamily-Specific Differences. Structure 23:1692-1704
Dutta, Arpana; Altenbach, Christian; Mangahas, Sheryll et al. (2014) Differential dynamics of extracellular and cytoplasmic domains in denatured states of rhodopsin. Biochemistry 53:7160-9
Huang, Grace T; Cunningham, Kathryn I; Benos, Panayiotis V et al. (2013) Spectral clustering strategies for heterogeneous disease expression data. Pac Symp Biocomput :212-23
Coronnello, Claudia; Benos, Panayiotis V (2013) ComiR: Combinatorial microRNA target prediction tool. Nucleic Acids Res 41:W159-64
Zomot, Elia; Bahar, Ivet (2013) Intracellular gating in an inward-facing state of aspartate transporter Glt(Ph) is regulated by the movements of the helical hairpin HP2. J Biol Chem 288:8231-7
Schlattner, Uwe; Tokarska-Schlattner, Malgorzata; Ramirez, Sacnicte et al. (2013) Dual function of mitochondrial Nm23-H4 protein in phosphotransfer and intermembrane lipid transfer: a cardiolipin-dependent switch. J Biol Chem 288:111-21
Kshirsagar, Meghana; Carbonell, Jaime; Klein-Seetharaman, Judith (2013) Multitask learning for host-pathogen protein interactions. Bioinformatics 29:i217-26
Jain, Shilpa; Kapetanaki, Maria G; Raghavachari, Nalini et al. (2013) Expression of regulatory platelet microRNAs in patients with sickle cell disease. PLoS One 8:e60932

Showing the most recent 10 out of 77 publications