Large-scale omics techniques, including proteomics and RNA-seq, have become important tools for identifying disease mechanisms and therapeutic targets. However, these experiments have largely not considered "proteoforms": protein variants encoded by the same gene, arising for example through alternative splicing (AS) and post-translational modifications (PTMs), that can serve different cellular functions and whose distributions are often altered in disease. In the heart in particular, alternative splicing is implicated in broad pathological processes in heart failure and cardiomyopathy, but at present we have a poor understanding of the expression status and molecular functions of many alternative splice isoform products at the protein level. We recently developed and optimized a computational pipeline that integrates RNA-seq and proteomics data to recover protein isoform information that is otherwise lost in proteomics analyses. Our goal now is to perform a targeted secondary analysis of publicly available quantitative proteomics data on heart diseases that are housed in persistent data repositories. Specifically, Aim 1 will (i) identify and quantify alternative splice isoforms in heart failure and atrial fibrillation proteomics data using custom sequence databases constructed from RNA-seq data; and (ii) determine the intersections between AS isoforms and PTM sites at regulatory hotspots, with the aid of mass-tolerant open-search algorithms that can recover unexpected PTMs in proteomics data. By reanalyzing existing datasets with our pipeline, we aim to extract isoform-level knowledge from existing data, which we expect will open unforeseen avenues in heart disease research and add value to the rich data resources already available to our research community.
Protein variants from the same gene often have different functions because of biochemical modifications to their amino acid sequences, but these differences often go unresolved in large-scale studies of cardiac diseases because of technical limitations. Here we propose a secondary analysis of publicly available protein expression datasets to extract hidden information on proteoforms using a computational approach we recently developed. If successful, the study may improve researchers' ability to identify a new class of disease biomarkers (changes in variant proteins), which can in turn aid the diagnosis and prognosis of heart diseases.