The Biostatistics and Bioinformatics Core (BBC) supports the statistical, bioinformatic, and computational needs of the Discovery and Targeted Proteomics Cores, as well as Center Investigators, their postdoctoral associates and students, and Pilot Grant awardees. The BBC has four inter-related Specific Aims: 1) Biostatistics; 2) Bioinformatics; 3) High Performance Computing; 4) Training and Education.
In Aim 1, we will provide statistical guidance on experimental design and data analysis, including sample quality assessment and exploratory analysis, for a wide range of proteomics data sets; continue to add features and functions to ProteomicsBrowser, a proteomics data analysis and visualization tool developed by the BBC to help Center users interpret complex proteomics data; develop and implement novel statistical methods to impute missing values in proteomics data; and develop and implement an online tool for proteomics data preprocessing, including data normalization, batch effect correction, and missing data imputation.
In Aim 2, we will provide advanced bioinformatics software and approaches to help Center investigators and Pilot Grant awardees fully interpret their comparative protein and protein post-translational modification profiling data; leverage information from single-cell RNA-seq by incorporating stochastic expression at the cell-type level into an analytic framework to deconvolve tissue-level transcriptomics (RNA-seq) data into fractions of constituent cell types for individual samples and to identify genes and cell types showing significant discrepancies between RNA and protein levels; and develop a unified computational framework for the detection of allele-specific peptides and allele-specific events from existing whole-genome sequencing, whole-transcriptome sequencing, and proteomics data generated from post-mortem human brain samples.
In Aim 3, we will provide continued support for large-scale peptide sequence alignment and support novel pipelines to integrate genomic, transcriptomic, and proteomic datasets; work closely with the bioinformatics and biostatistics teams to help benchmark, scale, optimize, and speed up computing tasks involving large-scale data analyses and database queries; and explore alternatives to traditional high performance computing environments, such as container systems and private cloud computing.
In Aim 4, we will provide training and education in biostatistics, bioinformatics, databases, and high performance computing through interaction and collaboration with Center investigators, including working closely with the Yale Medical Library Bioinformatics Support Program and other Yale organizations.