The grid computing model connects computers scattered over a wide geographic area so that their computing power can be shared. Just as the World Wide Web enables access to information, computer grids enable access to distributed computing resources, including detectors, computing cycles, storage, visualization tools, and more. Grids can thus combine the resources of thousands of underused computers into a single, massively powerful resource that, with grid software, is accessible from a personal computer.
For scientists in international collaborations, grid computing provides the power that lets widely dispersed members work together effectively. Grids can also allow simulations that might take weeks on a single PC to run in hours. Further, building computing grids builds new communities: grids both encourage and require people from different countries and cultures to work together to solve problems.
Grid computing works because the people participating in a grid opt to share their computing power with others. This raises many questions, both social and technical. Who should be allowed to use each grid? Whose job should get priority in the queue for grid power? What is the best way to protect user security? How will users pay for grid usage? Answering these questions requires new technical solutions, each of which must evolve as other grid and information technologies develop. Since grids involve countries and regions all over the world, these solutions must also suit different technical requirements, limitations and usage patterns.
The Large Hadron Collider (LHC), the accelerator facility discussed in this proposal, is a particle accelerator built by a collaboration of more than 50 countries. The world's largest machine, it accelerates particles to nearly the speed of light and then steers them into 600 million collisions every second. Data from these collisions are expected to change our basic understanding of antimatter, dark energy and more. The LHC will produce 15 million gigabytes of data a year, roughly the storage capacity of 20 million CDs. Thousands of physicists all over the world want timely access to these data.
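The CD comparison above can be checked with a short calculation. The sketch below assumes a capacity of about 700 MB per CD and decimal units (1 GB = 1000 MB); neither figure is stated in this proposal, so the result is only an order-of-magnitude check.

```python
# Rough check of the "15 million gigabytes is about 20 million CDs" comparison.
# Assumptions (not from the proposal): 700 MB per CD, 1 GB = 1000 MB.
ANNUAL_DATA_GB = 15_000_000   # from the text: 15 million gigabytes per year
CD_CAPACITY_MB = 700          # assumed typical CD capacity


def cds_needed(data_gb: float, cd_mb: float = CD_CAPACITY_MB) -> float:
    """Number of CDs needed to hold data_gb gigabytes."""
    return data_gb * 1000 / cd_mb


if __name__ == "__main__":
    print(f"{cds_needed(ANNUAL_DATA_GB):,.0f} CDs")
```

Under these assumptions the result is a little over 21 million CDs, consistent with the rounded figure in the text.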
The LHC Computing Grid (LCG) combines the computing resources of more than 140 computing centers. It aims to harness the power of 100,000 computers to process, analyze and store the data produced by the LHC, making the data equally available to all partners regardless of their physical location.
Through this proposal, a new LCG computing model for the CMS experiment at the LHC will enable dynamic access to existing world-wide data caches and will allow applications on any laptop, server, or cluster to access data seamlessly from wherever it is stored. Data access will no longer require the operation of large-scale storage infrastructures local to the participating processors.
This model will give the distributed physics groups automated access to "Any Data" at "Anytime" from "Anywhere", decreasing data access costs and reducing application failures. When pre-staging voluminous datasets into local storage is not feasible, because of either a lack of local hardware capability or instantaneous demand, it will be replaced with on-demand access to only those data objects the analysis actually requires, from any remote site where the data are available. This significantly reduces the overall I/O and storage space requirements outside the CMS computer systems. The total cost of ownership of computer centers at universities throughout the US is thus significantly reduced, as the dominant human and hardware costs of provisioning and operating a storage infrastructure disappear. The project will give smaller-scale university clusters and physicists' desktop computers access to all types of CMS data and thus improve the scientific output of CMS scientists nationwide.
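The difference between pre-staging a full dataset and reading only the required objects on demand can be illustrated with a toy model. The sketch below is not the CMS software or any real data-access protocol: `RemoteSite`, `prestage`, `read_on_demand`, and the object names and sizes are all hypothetical, chosen only to show how the transferred volume compares under the two approaches.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class RemoteSite:
    """Hypothetical stand-in for a remote site that holds data objects."""
    objects: Dict[str, bytes]
    bytes_served: int = 0

    def read(self, name: str) -> bytes:
        """Return one object and count the bytes sent over the network."""
        data = self.objects[name]
        self.bytes_served += len(data)
        return data


def prestage(site: RemoteSite) -> Dict[str, bytes]:
    """Copy the whole dataset to local storage before the analysis starts."""
    return {name: site.read(name) for name in list(site.objects)}


def read_on_demand(sites: List[RemoteSite], names: Iterable[str]) -> Dict[str, bytes]:
    """Fetch only the requested objects, each from the first site that has it."""
    result = {}
    for name in names:
        for site in sites:
            if name in site.objects:
                result[name] = site.read(name)
                break
        else:
            raise KeyError(f"no site holds {name!r}")
    return result


def make_site() -> RemoteSite:
    """Toy dataset: 10 objects of 1,000 bytes each (illustrative sizes only)."""
    return RemoteSite({f"obj{i}": bytes(1000) for i in range(10)})


if __name__ == "__main__":
    staged_site = make_site()
    prestage(staged_site)

    demand_site = make_site()
    read_on_demand([demand_site], ["obj2", "obj7"])  # the analysis needs two objects

    print("pre-staging transferred:", staged_site.bytes_served, "bytes")
    print("on-demand transferred: ", demand_site.bytes_served, "bytes")
```

In this toy setup an analysis that touches two of ten equally sized objects transfers one fifth of the data that pre-staging would, and needs no local copy of the rest. Real savings would depend on how selective each analysis is and on network conditions, which this sketch does not model.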
Further, this proposed infrastructure could be used for data-driven research in all fields, ranging from other natural sciences to social sciences and the humanities.