This subproject is one of many research subprojects that use the resources provided by a Center grant funded by NIH/NCRR. The subproject and its principal investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is that of the Center, which is not necessarily the investigator's institution.

Seven focus areas in protein structure have been identified for applying the language analogy approach: protein folding; conformational changes; protein-protein interactions; protein/gene networks and pathways; prediction and segmentation of secondary structure and repetitive folds; protein family classification; and genome comparison. The ultimate goal is to develop, for each area, linguistic models capable of advancing its understanding.

The protocol followed in this process consists of several steps. The first step is to use existing benchmark datasets, or to define new ones, suitable for training and testing these models. Second, as controls, existing approaches in each focus area, where available, are studied, and a scheme is designed for evaluating the language-model approaches and comparing them with those existing approaches. The next step is to implement our language approach. The implementation initially needs to meet at least one of two requirements: (i) the system must perform as well as or better than existing systems, as measured by the evaluation scheme from the second step, or (ii) it must provide interpretable biological hypotheses. For example, a neural network might achieve the best performance in a classification task while leaving unclear which underlying features produce that performance. A language-based approach with somewhat lower performance, but which lets the researcher analyze the types of features that drive successful classification, can be used to build hypotheses about the fundamental building blocks of protein sequence language.
The final step in the protocol is to design and carry out experiments that specifically test these hypotheses. The following systems have been chosen as experimental test cases for the language models: G protein-coupled receptors (GPCRs) such as rhodopsin; metabotropic glutamate receptors; the epidermal growth factor receptor; the viral tailspike protein; the virus infection process; and peptide n-grams. For each of the seven focus areas, we are working to identify or develop benchmark datasets for training and testing the linguistic models. Students and postdoctoral fellows participate in all aspects of the projects.
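To make the "protein sequence as language" idea more concrete, the sketch below treats a protein sequence as a string of one-letter amino-acid codes and counts its overlapping peptide n-grams, the word-like units that give the kind of inspectable features contrasted above with opaque neural-network features. This is a hypothetical illustration under stated assumptions, not the project's method: the function name `peptide_ngrams`, the default n = 3, and the example sequence are all made up for this sketch.

```python
from collections import Counter


def peptide_ngrams(sequence: str, n: int = 3) -> Counter:
    """Count overlapping n-grams ("words") in a protein sequence.

    Hypothetical helper for illustration only; not the project's implementation.
    Assumes the sequence uses one-letter amino-acid codes.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    seq = sequence.upper()
    # Slide a window of width n across the sequence, one residue at a time;
    # a sequence of length L yields L - n + 1 n-grams (none if L < n).
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


if __name__ == "__main__":
    # Arbitrary example sequence, not taken from any dataset used in the project.
    counts = peptide_ngrams("MKTAYIAKQRQISFVKSHFSRQ", n=2)
    # Each count is directly readable, e.g. which dipeptides occur most often.
    for gram, count in counts.most_common(3):
        print(gram, count)
```

Because every feature is a named peptide fragment with a count, a classifier built on such counts can be inspected to see which fragments drive its decisions, which is the interpretability argument made in the abstract.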

Agency: National Institutes of Health (NIH)
Institute: National Center for Research Resources (NCRR)
Type: Biotechnology Resource Grants (P41)
Project #: 5P41RR010888-10
Application #: 7369285
Study Section: Special Emphasis Panel (ZRG1-BECM (03))
Project Start: 2006-07-01
Project End: 2007-06-30
Budget Start: 2006-07-01
Budget End: 2007-06-30
Support Year: 10
Fiscal Year: 2006
Total Cost: $1,196
Indirect Cost:

Institution Name: Boston University
Department: Biochemistry
Institution Type: Schools of Medicine
DUNS #: 604483045
City: Boston
State: MA
Country: United States
Zip Code: 02118