Seattle Children's Hospital is awarded a grant to conduct two workshops on the opportunities that cloud computing offers for uncovering scientific knowledge from the enormous amounts of data generated by biological research. The workshop goals are responsive to the NSF strategic vision on Cyberinfrastructure Framework for 21st Century Science and Engineering, which challenges the community to develop and sustain the cyberinfrastructure needed to enable science and engineering in the 21st century. Cloud computing offers an unprecedented opportunity to address this data bottleneck and open up a new era in Data-Intensive Science (DIS). The two workshops will bring practitioners in biological informatics together to discuss challenges and opportunities and to propose short- and long-term strategies for taking on these challenges. The potential for widespread applicability is significant and timely, because many disciplines now routinely generate data sets that overwhelm storage and analysis infrastructures. The workshops will not only showcase the communities and their challenges but, more importantly, address how best to meet those challenges.
The workshops will connect computational, data analysis, and interdisciplinary research communities, including researchers, analysts, developers, educators, community and tribal leaders, scientific administrators, and policymakers. This will enable both high-level (strategic) and specific (operational) discussion and development of user requirements, user-based evaluations, and standardized development, with broad impact beyond the particular community challenges.
Cloud computing can have a major impact on four main types of diversity issues and institutions. First, clouds can give research groups at institutes of all sizes access to extensive compute resources, particularly small to mid-sized institutes that cannot afford to expand their local compute infrastructure. Second, minority-serving institutes (e.g. Howard University) and, third, gender-serving institutions (e.g. Wellesley College) can similarly take advantage of a common resource to boost their compute capabilities. Fourth, young investigators can have ready access to resources beyond their current support levels, while more senior investigators can adapt to the increased need for compute resources in their field.
These workshops will be held on September 19-20, 2010 (Seattle, WA; Seattle Children's Research Institute) and March 20-21, 2011 (Washington, D.C.; J. Craig Venter Institute). Further information on the workshops and their outcomes will be available via the PI's lab home page at http://kolkerlab.proteinspire.org/.
The Public Report for Data-Intensive Science Workshops DISW1 and DISW2
Opportunities and Challenges for the Life Sciences Community
September 19-20, 2010 and May 16-17, 2011
Editor: Eugene Kolker
Seattle Children’s Research Institute, Seattle WA.
Keywords: Bioinformatics, Data-intensive/Data-enabled Science, Life Sciences community, cloud, 4th paradigm
Address correspondence to:
Eugene Kolker, Ph.D.
Seattle Children's Research Institute
1900 Ninth Avenue C9S-9
Seattle, WA 98101
E-mail: eugene.kolker@seattlechildrens.org
Modern life sciences are data enabled sciences (DES) that seek to understand biological processes through computationally intensive techniques. Currently, the rate of data generation in the life sciences exceeds that of Moore’s law: the increase in the amount of data generated is exceeding the rate of increase of computer capabilities. In addition, existing data resources and tools lack continuity and can be difficult to disseminate and maintain because the resources (both people and cyberinfrastructure) are not organized to sustain them.
The first NSF-funded Data Intensive Workshop (DISW1, Seattle WA, September 19-20, 2010) had 6 working groups (Policy, Communication, Biology, Education, Technology, Bioinformatics) that identified the challenges within the topic and summarized findings in order to build a platform for the second workshop. Challenges identified included:
- The research necessity of the life sciences community to work across diverse domains and with computer, cyberinfrastructure, and data experts to leverage opportunities in DES.
- Scientific progress and accelerated rate of life sciences result in a pressing need for reproducibility.
- A perceived gap between the needs of data-enabled life sciences and current funding initiatives.
- A specific need to integrate data-enabled sciences with major international and national initiatives.
The second NSF-funded DIS workshop (DISW2) in Washington DC (May 16-17, 2011) was organized to plan for the transition to a cloud-based paradigm for data-intensive/data-enabled sciences. This transition is seen as a shift that will facilitate solutions to many of the challenges identified in the first workshop. As the workshop progressed, animated discussions of the transitional issues made it clear that there is a need to think about a supporting infrastructure that organizes, supports and provides resources and services to the scientific community.
Based on the findings of DISW1 and DISW2, the following overarching recommendation to NSF was developed: Establish a community alliance to be the voice and framework of the community. Its immediate goals would be to 1) synergize research and educational efforts across the life sciences using contemporary compute approaches to comprehend large, diverse data, 2) cohesively address the needs of the community through the ecosystem of federal agencies, foundations, academia, and industry, and 3) make the alliance an integral part of the international and national developments to address challenges of data-enabled sciences.
Specific action items that this alliance and its supporters are addressing include:
Life Sciences Informatics Catalogs and Journal
1. Catalog, evaluate and rank existing data resources.
2. Catalog, evaluate and rank existing analysis tools.
3. Establish a Data Journal for Life Sciences.
Access and Support Cyberinfrastructure
4. Develop multiple distributed data and meta-data repositories and associate them with compute resources.
5. Provide access to standardized tools & supporting documentation.
6. Develop and secure resources for data repositories and tools maintenance and support.
People and Policies
7. Align merit evaluations and funding strategies with key DES needs, products, infrastructure and management resources.
8. Synergize DES training & collaborative research between life and computer scientists, educators, data and infrastructure experts.
In these times of severe budget cuts, the cloud-based data access solution will provide more value for every funding dollar, as data collected in one lab can be used by many others. Access to quality data resources will also be a notable educational asset. Finally, highly accessible data and tools will lead to scientific advances and collaborative research efforts.