The project establishes 100G connectivity to Johns Hopkins University (JHU) in support of data-intensive science. Recently, the NSF has funded several regional centers to bring their connection speed to Internet2 and the NLR to 100G. In October 2011, DOE and Internet2 announced the world's first transcontinental deployment of a 100G network based on coherent technology. Connections are now operational between (among others) New York, Washington DC, Chicago, and Sunnyvale. This comprehensive transformation includes the Mid-Atlantic Crossroads (MAX), which provides JHU's connectivity to Internet2.
Johns Hopkins is also in the process of establishing its own high-speed data overlay research network, connecting 6 locations across the University with multiple 10G connections. These would be aggregated into a single 100G outgoing path to MAX and beyond: to the TeraGrid, Internet2, and other national resources supporting computational science.
JHU has been awarded an NSF MRI grant to build the 5PB Data-Scope, a novel instrument to observe large data sets. The system will have not only large storage but also extreme IO bandwidth (450GBps) and GPU-based processing capability (~200TF), aimed at highly data-intensive analyses simply not possible anywhere else today. Data sets to be analyzed include very large simulations from astrophysics (>1PB), ocean circulation models (600TB), computational fluid dynamics (300TB), bioinformatics and neuroscience (2-300TB each). The main difficulty is bringing the data sets to the Data-Scope: most of them are generated externally, for example on the TeraGrid or on the Jaguar and Kraken systems at Oak Ridge.
These amounts of data push the limits of even a 10G connection. Demonstrating the ability to move petabytes of data and analyze them in a timely fashion would encourage others to follow, and would change the way large data problems are approached in all areas of science today. It would also connect the Data-Scope to the whole US community, who can explore how to use such an instrument. The Data-Scope can also serve as a local hub for aggregating data sets for fast uploads into other data-intensive national facilities, like the Gordon system in San Diego and the Open Science Data Grid, centered in Chicago. Without the high-speed network it will be quite challenging to move petabytes: transferring a 1PB data set at an effective 5Gbps would take 18.5 days. Moving data at 100Gbps would shrink the time to about 1 day, a make-or-break difference at the petabyte scale.
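The transfer-time figures above can be checked with a short calculation. The following is a minimal sketch, assuming decimal units (1PB = 10^15 bytes, 1Gbps = 10^9 bits per second) and a sustained effective rate with no protocol overhead; the function name `transfer_days` and the constants are illustrative and not part of the project.

```python
SECONDS_PER_DAY = 86_400
PB = 1e15    # bytes in one petabyte (decimal units, an assumption)
GBPS = 1e9   # bits per second in one Gbps (decimal units, an assumption)


def transfer_days(size_bytes: float, rate_bits_per_s: float) -> float:
    """Days needed to move size_bytes at a sustained effective rate."""
    seconds = size_bytes * 8 / rate_bits_per_s  # convert bytes to bits first
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # Compare the effective rates discussed in the text for a 1PB data set.
    for rate in (5, 10, 100):
        days = transfer_days(PB, rate * GBPS)
        print(f"{rate:>3} Gbps: {days:.1f} days")
```

Under these assumptions, 1PB takes about 18.5 days at 5Gbps and a little under one day at a fully utilized 100Gbps.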
To upgrade to 100G, the award activities provide end equipment at MAX, the corresponding optics at JHU, Cisco optics, a 100G-capable card for the Cisco Nexus 7K switch on the JHU side, and a 6x40G-based card for the Nexus 7K to link to the internal JHU research network hubs around the University, where data-intensive resources (clusters, instruments, storage) are located. This strategy would enable the delivery of large amounts of data to the most critical locations at JHU.
The proposed high-speed connectivity would enable JHU and its partners to move petabyte-scale data sets, and would enable a wide community to tackle cutting-edge problems, from traditional HPC and CFD to studies of the connectivity of the Internet. Students and researchers could do research on topics involving very large data sets, a very current and timely area, and acquire data management and analysis skills on a scale previously unheard of in a university setting.