Computer systems are being applied to increasingly demanding cross-disciplinary applications. The environments in which they function and the resources they manage are increasingly diverse, distributed, and dynamic. This is particularly true of biomedical applications, where the need for flexible, dynamic, data-intensive computing functionality is critical. This revolution in modern medicine requires novel and efficient approaches to support resource integration and scientific workflows (SWFs) that capture, analyze, retrieve, integrate, and produce molecular information, including biological specimens derived from tissue, cells, or blood, combined with medical records or clinical trial data. Such workflows are increasingly used in e-science discovery and are characterized by frequent changes that are part of the scientific discovery process. The research develops a uniform approach to support live updates of scientific workflows that access distributed resources. The goal of live updates is to allow continuous execution in the presence of changes. The research supports updates of both the structure and nature of SWF processes and of the data they consume and produce. The research divides the live update problem for SWFs into four main technical aims: software update, persistent storage update, workflow update, and evaluation and testing. It develops the mechanisms for updates as well as automatic checks for the safety of a planned update. The research impacts information-based drug discovery and development, and healthcare decision support systems.
Overview of the Project

The project aimed at developing techniques and tools for dynamic software update. Unlike the familiar kind of software update, which typically requires restarting the application or even the device before it takes effect, dynamic software update updates a software application while it is running. Our study was not confined to traditional software applications; it also looked at updates of scientific workflows. A scientific workflow typically consists of many software components that may be developed by different developers and that are put together to carry out a scientific analysis. Executing scientific workflows is time consuming, and there is a need to support experimentation in which modified versions of the same workflow are executed in order to find the best way to analyze scientific data. An important goal of the work was to expose computer science students to work with scientists from the general biotechnology field through internships at the Translational Genomics Institute.

Outcomes

We have been successful in achieving many of the project goals.

Publications. Ten publications in conferences and workshops resulted from this project, with some additional publications in preparation.

Development of a general mechanism for dynamic software update. We have developed a general approach and software tool for updating programs while they are executing. We call the tool UpStare. It is a general dynamic software update tool, and it is available together with a user manual for download and use by other researchers.

Identification of the inherent overhead introduced by dynamic software update. We have shown that any general dynamic software update tool that places no restriction on when an update can occur must introduce overhead to the execution of a program, even when the program is not being updated.
We have proposed, but not implemented, a mechanism whose overhead would be the lowest possible.

Improvement of the Structural Prediction for pRotein fOlding UTility System (SPROUTS). SPROUTS was supported by a previous NSF grant and is available to scientists to compare and integrate various analyses for protein folding. As part of the effort under this project, SPROUTS was made much more usable, and its ability to compare multiple analyses was significantly improved. As a result of these improvements, SPROUTS' collection grew to 900 entries for various proteins.

Involvement of undergraduate students in research. Five undergraduate students worked on research related to the project. The work led to one student being a co-author on a workshop paper. Another student, who started working on the project before the expiration of the grant, is continuing to work on the topic of dynamic database update. A high school student was also hosted at ASU as part of the PI's outreach effort to local high schools.

Training of workforce. Three computer science and engineering students working on the grant got the chance to work with life scientists and to help make it easier for the scientists to solve their scientific problems. One PhD student and one Master's student have already graduated, and one PhD student and one Master's student are expected to graduate in 2014. One computer science student spent part of a summer in France working with French scientists.

Identification of conditions under which a dynamic software update can be applied safely. Updates to computer software are typically written with the assumption that they will not be applied dynamically, and such updates might not work as expected if they are applied dynamically. We have identified conditions under which an update can be safely applied dynamically.

Development of an approach for efficient execution of multiple versions of a scientific workflow.
We have formally defined the data reuse problem, in which data produced by one version of a workflow can be reused to save on the time needed to execute another version of the workflow. We devised a very efficient strategy for data reuse that minimizes the time needed to execute multiple versions of a workflow.

Identification of patterns of scientific workflow changes. In order to support efficient updates of scientific workflows, it is important to understand the patterns of changes in these workflows. We were able to identify some patterns of changes in scientific workflows as a result of a significant effort to revamp an existing scientific workflow.

New technique for browser security. This was not a planned outcome. We have designed and developed a new tool for more secure access to sensitive websites. We call the tool Auto-FBI because it automatically creates a fresh browser instance (window) to access sensitive content.
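
To illustrate the data reuse problem described under the multiple-version workflow outcome above, the following is a minimal Python sketch. It is not the strategy developed in this project. It assumes a simple fingerprinting scheme in which a step's output can be reused whenever the step's name, its code version, and the fingerprints of its inputs are all unchanged. The functions `step_key` and `run_workflow` and the three-step example workflow are hypothetical names introduced only for illustration.

```python
import hashlib


def step_key(name, version, input_keys):
    """Fingerprint a step by its name, code version, and its inputs' fingerprints."""
    text = "|".join([name, version, *input_keys])
    return hashlib.sha256(text.encode()).hexdigest()


def run_workflow(steps, cache, executed):
    """Run steps in the given (topological) order, reusing cached outputs.

    steps:    list of (name, version, function, upstream step names)
    cache:    dict mapping fingerprint -> output, shared across workflow versions
    executed: list that collects the names of steps that actually ran
    """
    keys, outputs = {}, {}
    for name, version, func, deps in steps:
        key = step_key(name, version, [keys[d] for d in deps])
        if key not in cache:
            # No earlier workflow version produced this exact result: compute it.
            cache[key] = func(*[outputs[d] for d in deps])
            executed.append(name)
        keys[name] = key
        outputs[name] = cache[key]
    return outputs


# Hypothetical steps: two workflow versions that differ only in the last step.
def load():
    return [3, 1, 2]


def clean(data):
    return sorted(data)


def analyze_v1(data):
    return sum(data)


def analyze_v2(data):
    return max(data)


workflow_v1 = [
    ("load", "1", load, []),
    ("clean", "1", clean, ["load"]),
    ("analyze", "1", analyze_v1, ["clean"]),
]
workflow_v2 = [
    ("load", "1", load, []),
    ("clean", "1", clean, ["load"]),
    ("analyze", "2", analyze_v2, ["clean"]),
]

cache = {}
ran_v1, ran_v2 = [], []
result_v1 = run_workflow(workflow_v1, cache, ran_v1)
result_v2 = run_workflow(workflow_v2, cache, ran_v2)
print(ran_v1, ran_v2)
```

In this sketch the second version recomputes only the modified `analyze` step, because the outputs of `load` and `clean` are found in the shared cache. A change to an upstream step would alter the fingerprints of every step that depends on it, so those downstream steps would be rerun as well.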