Computational Learning & Discovery for Biological Sequence, Structure, Function

Reddy, Raj

Abstract

This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. The subproject and investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is for the Center, which is not necessarily the institution for the investigator.

Seven focus areas in protein structure have been identified for application of the language analogy approach: protein folding, conformational changes, protein-protein interactions, protein/gene networks and pathways, prediction and segmentation of secondary structure and repetitive folds, protein family classification, and genome comparison. The ultimate goal is to develop, for each area, linguistic models capable of advancing its understanding. The protocol consists of several steps. The first step is to use existing benchmark datasets, or to define new ones, suitable for training and testing these models. Second, as controls, existing approaches in each focus area (where available) are studied, and a scheme is designed for evaluating the language-model approaches and comparing them with those existing approaches. The next step is to implement our language approach. This implementation initially needs to meet one or both of two requirements: (i) the system must perform as well as or better than the existing systems identified in step 2, and/or (ii) it must provide interpretable biological hypotheses. For example, a neural network might be the best-performing algorithm on a classification task, yet the features underlying that performance may be unclear. A language-based approach with somewhat lower performance, but which lets the researcher analyze which features drive successful classification, can be used to build hypotheses about the fundamental building blocks of the protein sequence language.
The final step in the protocol is to design and carry out experiments that specifically test these hypotheses. The following systems have been chosen as experimental test cases for the language models: G protein-coupled receptors (GPCRs) such as rhodopsin and the metabotropic glutamate receptors, the epidermal growth factor receptor, the viral tailspike protein, the virus infection process, and peptide n-grams. For each of the seven focus areas, we are identifying or developing benchmark datasets for training and testing the linguistic models. Students and postdoctoral fellows participate in all aspects of the projects.
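The abstract does not specify how the "language" features are defined. As a minimal sketch, assuming overlapping amino-acid n-grams (as in the peptide n-gram test case) serve as the vocabulary, the Python below counts n-grams in two sets of sequences and ranks them by a smoothed log-odds score. It illustrates the interpretability argument above: each score is attached to a readable peptide fragment rather than to a hidden network feature. The function names, the add-alpha smoothing, and the toy sequences are hypothetical choices for illustration, not the project's method or data.

```python
from collections import Counter
from math import log


def peptide_ngrams(seq: str, n: int) -> list[str]:
    """Return all overlapping length-n substrings (n-grams) of a protein sequence."""
    seq = seq.upper()
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def ngram_counts(seqs: list[str], n: int) -> Counter:
    """Pool n-gram counts over a collection of sequences."""
    counts: Counter = Counter()
    for s in seqs:
        counts.update(peptide_ngrams(s, n))
    return counts


def discriminative_ngrams(pos: list[str], neg: list[str], n: int = 2,
                          alpha: float = 1.0, top: int = 5) -> list[tuple[str, float]]:
    """Rank n-grams by smoothed log-odds of occurring in `pos` versus `neg`.

    A positive score means the n-gram is relatively enriched in `pos`.
    Add-alpha (Laplace) smoothing keeps n-grams seen in only one set finite.
    This scoring scheme is an illustrative assumption, not the project's model.
    """
    cp, cn = ngram_counts(pos, n), ngram_counts(neg, n)
    vocab = set(cp) | set(cn)
    v = len(vocab)
    tp, tn = sum(cp.values()), sum(cn.values())
    scores = {
        g: log((cp[g] + alpha) / (tp + alpha * v))
           - log((cn[g] + alpha) / (tn + alpha * v))
        for g in vocab
    }
    # Highest scores first; ties broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


if __name__ == "__main__":
    # Made-up toy sequences, for illustration only (not project data).
    group_a = ["MKTAYIAK", "MKTLLVAK"]
    group_b = ["GGSGGSGG", "GSGSGGAK"]
    for gram, score in discriminative_ngrams(group_a, group_b, n=2, top=3):
        print(f"{gram}\t{score:+.2f}")
```

A more accurate scorer (for example, a trained linear classifier) could replace the log-odds step while keeping the same interpretability property, provided each learned weight still maps to a named n-gram.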

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Center for Research Resources (NCRR)
Type: Biotechnology Resource Grants (P41)
Project #: 5P41RR010888-10
Application #: 7369285
Study Section: Special Emphasis Panel (ZRG1-BECM (03))

Project Start: 2006-07-01
Project End: 2007-06-30
Budget Start: 2006-07-01
Budget End: 2007-06-30
Support Year: 10
Fiscal Year: 2006
Total Cost: $1,196
Indirect Cost

Institution

Name: Boston University
Department: Biochemistry
Type: Schools of Medicine
DUNS #: 604483045

City: Boston
State: MA
Country: United States
Zip Code: 02118

Publications

Lu, Yanyan; Jiang, Yan; Prokaeva, Tatiana et al. (2017) Oxidative Post-Translational Modifications of an Amyloidogenic Immunoglobulin Light Chain Protein. Int J Mass Spectrom 416:71-79

Sethi, Manveen K; Zaia, Joseph (2017) Extracellular matrix proteomics in schizophrenia and Alzheimer's disease. Anal Bioanal Chem 409:379-394

Hu, Han; Khatri, Kshitij; Zaia, Joseph (2017) Algorithms and design strategies towards automated glycoproteomics analysis. Mass Spectrom Rev 36:475-498

Ji, Yuhuan; Bachschmid, Markus M; Costello, Catherine E et al. (2016) S- to N-Palmitoyl Transfer During Proteomic Sample Preparation. J Am Soc Mass Spectrom 27:677-85

Hu, Han; Khatri, Kshitij; Klein, Joshua et al. (2016) A review of methods for interpretation of glycopeptide tandem mass spectral data. Glycoconj J 33:285-96

Pu, Yi; Ridgeway, Mark E; Glaskin, Rebecca S et al. (2016) Separation and Identification of Isomeric Glycans by Selected Accumulation-Trapped Ion Mobility Spectrometry-Electron Activated Dissociation Tandem Mass Spectrometry. Anal Chem 88:3440-3

Wang, Yun Hwa Walter; Meyer, Rosana D; Bondzie, Philip A et al. (2016) IGPR-1 Is Required for Endothelial Cell-Cell Adhesion and Barrier Function. J Mol Biol 428:5019-5033

Srinivasan, Srimathi; Chitalia, Vipul; Meyer, Rosana D et al. (2015) Hypoxia-induced expression of phosducin-like 3 regulates expression of VEGFR-2 and promotes angiogenesis. Angiogenesis 18:449-62

Yu, Xiang; Sargaeva, Nadezda P; Thompson, Christopher J et al. (2015) In-Source Decay Characterization of Isoaspartate and β-Peptides. Int J Mass Spectrom 390:101-109

Steinhorn, Benjamin S; Loscalzo, Joseph; Michel, Thomas (2015) Nitroglycerin and Nitric Oxide--A Rondo of Themes in Cardiovascular Therapeutics. N Engl J Med 373:277-80

Showing the most recent 10 out of 253 publications
