The Milky Way Laboratory will provide interactive access to one of the largest numerical simulations of cosmological structure ever created. With a data volume close to a petabyte, it illustrates the emerging data challenge -- how new breakthroughs require non-incremental changes in the way we do science. With datasets doubling in size every year, traditional analysis techniques are no longer adequate. Scientists need a new instrument, a "microscope of data" or data-scope, that can examine both the fine details and the large-scale features in the mountains of digital data. Increased raw computing power is necessary but not sufficient -- new computational thinking is also needed to develop new approaches. This NSF-supported research aims to achieve such a breakthrough in data-intensive computational analysis.
The science focuses on the compelling question of the origin and evolution of the Milky Way Galaxy, the home of the Sun and our solar system. Understanding our place in the Universe will in turn lead to insight into the nature of the enigmatic 'dark matter', the substance whose gravity holds the Milky Way together, but about which little is known. The project will create a publicly accessible 'Milky Way Laboratory' that will combine the output of the largest state-of-the-art computer simulations of the formation of the Milky Way with the largest observational datasets of the properties of stars (like the Sun) in our Galaxy, and provide the means to answer questions that go beyond the scope of either dataset alone.
The publicly accessible data and the tools developed will enable scientists to run their own innovative experiments at unprecedented scales, demonstrating that it is possible to tackle problems on the petabyte scale in a university setting today. The project will provide a blueprint to address similar data-intensive problems in other areas of science.
To accomplish its ambitious goals, the project brings together an international team of astronomers, particle physicists and computer scientists from academia (Johns Hopkins and UC Santa Cruz) and industry. The project will use the NSF-funded 5.5-petabyte Data-Scope system, under construction at the Johns Hopkins University, a unique data-intensive environment designed to carry out experiments like the Milky Way Laboratory. The Data-Scope will be located at a new computational facility at JHU, funded by an NSF ARRA grant.