In recent years, the development and availability of omic-based technologies have moved analytical research to the forefront of biology. A desirable approach to systems-level biology research is to iterate between computation and experimentation: computational, statistical, and visualization-based techniques are used to interrogate the data, and the resulting experimental hypotheses are then tested in the laboratory. However, the volume and heterogeneity of the data generated by high-throughput methods have created a need for improved methods of data integration and interpretation. The focus of this proposal is the continued development and maintenance of our existing visual analytics software, Platform for Proteomics Peptide and Protein data exploration (PQuad), a multi-resolution environment that can currently integrate genomic and proteomic data for complex prokaryotic datasets. PQuad can currently identify peptides and proteins that are differentially expressed between two experiments and perform basic data integration of categorical information. The interrogation of multiple lines of evidence in prokaryotic systems has immediate significance for identifying virulence determinants in pathogens. We propose to continue the development of PQuad in two core areas: (1) advanced user interaction and (2) enhanced visualizations.
Specific Aim #1: The development of an advanced user interface that will guide users in uploading multiple sources of information (both experimental data and metadata), performing queries to target specific biomolecules of interest, and exporting specific queries of interest for further exploration outside of PQuad. In addition, we will offer the ability to perform basic statistical analyses of MS-based proteomic peptide identifications, the results of which can be used to threshold queries and visualizations.
Specific Aim #2: The development of new visualizations to support the analysis and integration of data sources and queries. New visual paradigms that are not genome-centric, but are instead targeted at facilitating the biological interpretation of available data sources or of specific queries as defined in Aim 1, will be incorporated into the software. Through collaboration with users associated with one of the NIAID-funded Biodefense Proteomics Research Centers (www.proteomicsresource.org/PRC/About.aspx), we will demonstrate the data integration capabilities with the end goal of virulence determinant discovery in Salmonella.