Rapid improvements in DNA sequencing and synthesis have the potential to usher in a new era of precision medicine. To realize this vision, however, we must re-imagine the computational and storage infrastructure used to manage and extract actionable results from the massive data sets made possible by widely available advances in DNA sequencing and synthetic biology. In this application, we propose to harden and improve a hybrid cloud hardware and software platform that can move biomedical analyses and data between local and remote computational and storage resources, in compliance not only with institutional policies but also with user preferences regarding the underlying time and cost trade-offs. A 300 TB, 500-core prototype of such a system has been in production since 2007, federated across two data centers. This prototype has contributed to informatics analyses in dozens of publications, which have collectively received over 1,000 citations. Commercialization of this open-source system, which this grant will greatly accelerate, will permit organizations to seamlessly span on-premises and hosted cloud operating systems and will vastly simplify data management and computation, all while facilitating compliance with institutional policies and regulatory requirements.
The delivery of healthcare based on molecular data specific to an individual patient (i.e., precision medicine) will require the creation of a new ecosystem of Clinical Decision Support (CDS) applications. This work will provide a platform that makes the development of such applications faster, easier, and less expensive.