Data-intensive science has rapidly emerged as the Fourth Paradigm of scientific discovery, after empirical, theoretical, and computational methods. This is particularly true in data assimilation and ensemble prediction. Yet significant barriers exist to using data efficiently or integrating them into data assimilation or ensemble prediction systems, because the scientific community lacks easy-to-use common cyberinfrastructure frameworks. By some estimates, researchers may spend 80 percent of their time on data discovery, access, and processing, and only 20 percent "doing science" by way of interpretation, synthesis, and knowledge creation.

The goal of the National Science Foundation's EarthCube initiative is to transform the conduct of research by supporting the development of community-guided cyberinfrastructure. It is critical that EarthCube both be shaped by and benefit the scientific communities it targets.

This project will fund a workshop that brings the research, education, and information technology communities together to discuss science, technology, and cyberinfrastructure issues related to distributed but shared mesoscale modeling, data assimilation, and ensemble prediction. The workshop, planned for 17-18 December 2012 in Boulder, CO, is titled "Shaping the Development of EarthCube to Enable Advances in Data Assimilation and Ensemble Prediction."

One goal of the workshop is to shape the development of EarthCube and to help build a cyberinfrastructure and a scientific ecosystem in which "data friction" is reduced and data transparency and ease of use are significantly increased. We believe that achieving the workshop goals will help the mesoscale ensemble prediction and data assimilation communities work toward transforming the conduct of data-centric research and education. To that end, we would like to assemble a team from across the country to develop a multi-institutional, multi-model, multi-data-assimilation, regional-scale ensemble prediction and analysis system capable of both real-time forecasts and historical reanalysis. Workshop participants are expected to come from U.S. universities, NCAR and UCAR, NOAA, NSF, and other research organizations.

Broader Impacts: There is an urgent need to educate and train the next generation of students in the United States in mesoscale modeling, data assimilation, and ensemble and probabilistic forecasting. The workshop will engage students pursuing careers in these three areas of research. It is envisioned that the products from the planned data assimilation and ensemble prediction systems will be readily accessible to a broad community of university researchers, students, and educators for exploring atmospheric dynamics and physics, and for helping the next generation of students gain knowledge and expertise in advanced numerical weather prediction. The real-time ensemble prediction system can also serve as a complementary tool for operational forecasters, especially for probabilistic forecasting of severe weather and tropical cyclones. Finally, this workshop will help build capacity among researchers and users of ensemble prediction and data assimilation and will foster further collaboration to advance research in mesoscale meteorology.

Project Report

EXECUTIVE SUMMARY

Organizers: Mohan Ramamurthy, Unidata/UCAR; Fuqing Zhang, Penn State U.; and Russ Schumacher, Colorado State U.
Dates of the workshop: 17-18 December 2012
EarthCube Workshop Title: Shaping the Development of EarthCube to Enable Advances in Data Assimilation and Ensemble Prediction

This workshop was held to shape the development of EarthCube from the perspectives of the mesoscale modeling, data assimilation, and ensemble prediction communities, and to help build a robust geosciences cyberinfrastructure. There were 72 registered participants from all sectors of the atmospheric science community (academia, government, and the private sector), representing geographically distributed universities, research labs, and organizations that provide data to the atmospheric research community.

1. Important science drivers and challenges: Participants identified several high-priority science questions that will be the focus of interdisciplinary efforts during the next 5-15 years.
• What are the limits of predictability in the atmosphere? What are the sources of uncertainty and error, and how do they affect predictability?
• What observations are critically needed to enhance atmospheric predictions, and where? What is the optimal configuration of the observation network?
• What are the appropriate types, combinations, and configurations of parameterization schemes for high-resolution mesoscale models? How can the errors and biases in these parameterizations be quantified and corrected?
• What is the optimal ensemble configuration for accurately predicting the distribution of possible outcomes? How many ensemble members are needed, and how should the ensembles be initialized?
• What are the advantages and disadvantages of variational versus ensemble-based data assimilation techniques, and of the different types of hybrid approaches?
• What are the most effective ways to post-process ensemble forecasts to achieve reliable, calibrated probabilistic predictions?

2. Current challenges to high-impact, interdisciplinary science: Several themes emerged as consistent challenges within and across the involved disciplines.
• Significant barriers exist to using data efficiently or integrating them into data assimilation or ensemble prediction systems. Today, there is too much overhead in doing research, e.g., setting up one's data and analyzing it, and few tools genuinely reduce this overhead.
• The scientific community lacks easy-to-use common cyberinfrastructure frameworks, data format standards, sufficient metadata for observations, and methods and tools for quality-controlling observations, mining large volumes of data, visualization, and verification.
• While many good facilities exist in this field (e.g., Unidata, DTC, and DART), they sometimes operate in silos, and their activities and services are not always well coordinated or integrated.
• There is no central repository for finding, accessing, and using data and software.
• Students face significant spin-up time in preparing, using, processing, and analyzing data. Researchers face similar challenges, but the problem is particularly acute for students, who have limited time before they graduate.
• There are barriers to collaboration between closely linked disciplines, e.g., Atmospheric Sciences, Computer Science, Mathematics, and Statistics.

TECHNICAL INFORMATION/ISSUES/CHALLENGES

1. Desired tools, databases, etc. needed for pursuing key science questions, with brief elaboration:
• Centralized data repositories and services that link existing and future data systems. For example, a centralized community repository could be created for data submission and sharing.
• Advanced software, toolkits, and services for quality control, in-depth data analysis, visualization, verification, and mining of data (observational and model output). These tools and services need to be user-friendly and accessible to the whole scientific community.
• Common data formats and frameworks for assimilation, modeling, analysis, and visualization.
• A common data assimilation framework; currently, each assimilation system uses its own framework for data I/O, processing, and running algorithms.
• Collaboration tools, platforms, and frameworks (e.g., a wiki for data).
• Server-side tools for data processing, analysis, and visualization.

COMMUNITY NEXT STEPS

1. What the community needs to do next to move forward, and how it can use EarthCube to achieve those goals:
• A pilot project on coordinated, distributed national ensemble prediction involving interested universities.
• Developing a prototype system that links data sets and systems together, starting with the most used projects such as the reanalysis data sets; once that system works seamlessly, expanding it to include other data sets and systems.
• Continued discussion aimed at a concrete plan for greater coordination of ongoing and future programs and facilities that serve the data assimilation and prediction communities, and at developing a next-generation testbed facility to advance the science.
• PI meetings to leverage and expand communication, enhance data sharing, and facilitate sustained interactions.
• Entraining current undergraduate and graduate students into research and educational activities related to "big data", ensemble prediction, data assimilation, and EarthCube, so that the future scientific workforce can carry these initiatives forward.
• Reaching out to other geoscience communities, including climate, space weather, oceanography, hydrology, and air quality, as well as the computer and information science communities.

Agency: National Science Foundation (NSF)
Institute: Division of Atmospheric and Geospace Sciences (AGS)
Type: Standard Grant (Standard)
Application #: 1266399
Program Officer: A. Gannet Hallar
Budget Start: 2012-12-15
Budget End: 2013-11-30
Fiscal Year: 2012
Total Cost: $41,225
Name: University Corporation for Atmospheric Res
City: Boulder
State: CO
Country: United States
Zip Code: 80301