We propose to develop and test the feasibility of a Recombinant Proteomic Data Resource (RPDR), a cloud-based software system that provides a venue for researchers to (i) easily upload existing data for both productive and failed protein production outcomes, (ii) find and compare experimental details, and (iii) become more efficient by increasing research reproducibility. The RPDR would provide the first working platform for future big data mining efforts to better understand the underlying drivers for successful protein production. We will develop the RPDR software system, consisting of a web-hosted extensible protein production results database as well as UPLOADER, EXPLORER and CONFIGURATOR companion web applications (Aim 1). We will test the server-installed RPDR by uploading 10,000 records from the publicly available TargetTrack DB. Once the system is deemed functional, we will populate and test it by uploading ~40,000 protein production data records, comprising non-peer-reviewed data as well as negative expression results, from our seven collaborators (Aim 2). Our moderated upload process is designed to impose only minimal, if any, data format requirements, namely unique target identification and a normalized score for experimental outcome. In total we aim to enter and confirm the integrity of 30,000 records from a minimum of four different sources. We will test the performance of this new informatics system by making the EXPLORER and CONFIGURATOR tools available to researchers, who can use them to find records in the RPDR and leverage these for the design of new protein production systems. The search function of the EXPLORER module will be developed to include potentially helpful production results from related records by retrieving UniProt cross-referenced homologous targets via an API and by sorting and grouping records.
We will develop the CONFIGURATOR functionality to allow selection of data from preferred records and to aggregate these into a preliminary production plan (Target Production Scratch Pad). To attract users, we will grant free-of-charge access to the EXPLORER and CONFIGURATOR modules to an invited group of previously contacted, interested researchers and promote the system at an industry event. We will work with these researchers to use the RPDR and eventually upload new experimental results into the system. We plan to carry out a validation study, collecting feedback on the utility of the RPDR platform, gathering new feature requests and exploring commercialization preferences (Aim 3). We plan to assess the system's utility for the creation of new recombinant protein production plans by benchmarking target production plans, collecting user analytics and soliciting direct feedback from biomedical researchers. Further development of the Recombinant Proteomic Data Resource will be pursued through a Phase 2 SBIR proposal.
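As an illustration of the minimal data format requirement described above (a unique target identification plus a normalized outcome score), an uploaded record and its grouping by target might be sketched as follows. This is a sketch under stated assumptions, not the proposed RPDR design: the class name, field names, helper functions, and the 0-to-1 score range are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ProductionRecord:
    """Hypothetical minimal record; all field names are assumptions."""
    target_id: str        # unique target identification (assumed to be a string key)
    outcome_score: float  # normalized outcome, assumed 0.0 (failed) to 1.0 (successful)
    source: str = "unknown"                       # contributing lab or database
    details: dict = field(default_factory=dict)   # free-form experimental details


def validate(record):
    """Return a list of problems; an empty list means the record passes."""
    errors = []
    if not record.target_id.strip():
        errors.append("missing target identifier")
    if not 0.0 <= record.outcome_score <= 1.0:
        errors.append("outcome score outside the range 0 to 1")
    return errors


def group_by_target(records):
    """Group records by target and sort each group by descending score."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.target_id].append(record)
    for group in grouped.values():
        group.sort(key=lambda r: r.outcome_score, reverse=True)
    return dict(grouped)
```

Because negative results are as valuable here as positive ones, the validation in this sketch rejects only records that lack an identifier or carry an out-of-range score; a score of 0.0 is a valid entry.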
Exposing currently inaccessible protein production information to many researchers should have a major impact on the economics of establishing new recombinant protein production systems. To achieve this goal, we propose to develop a web-based Recombinant Proteomic Data Resource that allows entry of a diverse set of recombinant protein production data and thereby facilitates the configuration of new protein production systems.