Bioinformatics and Biostatistics Core (BBC): Project Summary

1. Our bioinformatics effort will support the overall goals of the Center via the following.
   a) Integrate proteomic, transcriptomic, and genomic data to allow isoform-level interrogation of the proteome.
   b) Provide an isoform-centric database of mRNA and protein abundance in rodent, and extend it to non-human primate and human species.
   c) Integrate data across individuals, conditions, and species through the development of multi-level network analyses to complement pathway-level analyses.

2. The proposed biostatistics efforts include the following components.
   a) Provide statistical guidance for current and future experimental design, sample quality assessment, exploratory analysis, and visualization for proteomics data, including label-free, multiple reaction monitoring, and data-independent acquisition.
   b) Develop a downstream statistical analysis framework for MS/proteomics data that includes data normalization and significance analysis of differentially expressed proteins or peptides.
   c) Construct an automated web-based analysis pipeline in collaboration with the YPED team.

3. We will make the following improvements to the Yale Protein Expression Database (YPED).
   a) Incorporate new types of proteomics data and associated data analyses into the web interface.
   b) Collaborate with the Bioinformatics and Biostatistics teams to incorporate new results obtained using their new analysis pipelines for LC-MRM, SWATH, and RNA-Seq.
   c) Link YPED to external data sources via interoperation with the Neuroscience Information Framework (NIF). Use NIF ontologies to standardize YPED data annotation and facilitate integration with RNA-Seq data.
   d) Expand the YPED repository (the public portion of YPED) to enable more rapid dissemination of a wider variety of proteomics data types to the scientific community.

4. The high performance computing (HPC) resource provides the following support.
   a) Provide continued support of large-scale peptide sequence alignment and support novel pipelines to integrate genomic, transcriptomic, and proteomic datasets.
   b) Work closely with the database, bioinformatics, and biostatistics teams to help benchmark, scale, optimize, and speed up computing tasks involving large-scale MS data.
   c) Develop open-source proteomics pipelines (e.g., Skyline) in HPC settings.

5. Training and education of graduate students and postdoctoral fellows.