The Department of Systems Biology (DSB) at the Columbia University Medical Center (CUMC) hosts and operates a state-of-the-art high performance computing environment (HPCE), assembled specifically as a Core resource to serve the needs of biomedical and systems biology research at Columbia University. The HPCE has been instrumental in supporting the research of DSB members, enabling demanding computational investigations that required hundreds of thousands of CPU hours, involved the processing of many terabytes of genomic data, and led to numerous high-impact publications. Beyond its use by DSB faculty, the DSB HPCE is a resource that is available to the entire CUMC community and is extensively utilized by many non-DSB investigators. Importantly, it is crucial for the operation of the Columbia Genome Center, providing high performance computing and storage capabilities that are essential for the bioinformatics analysis of next generation sequencing data.

Central to the operation of our HPCE is a multi-tiered, network-accessible storage architecture capable of accommodating both high performance computing and long term data storage and backup. The key hardware components of this architecture are 28 EMC/Isilon nodes that provide a total capacity of 1.2 Petabytes of usable disk space. These nodes have been in service for several years and are now approaching the end of their service life. Through this application we seek to retire these aging modules and replace them with a new 1.6 Petabyte storage solution that can accommodate the current and future needs of our investigator community. Further, we aim to optimize the overall cost and performance profile of our architecture by refining the mix of high-end (fast) and low-end (slower) drives to create a solution that is both more cost-effective and better aligned with the historical and projected data usage patterns of our user base.
Specifically, we propose to replace our existing EMC/Isilon modules with a new storage cluster from EMC/Isilon comprising their latest nodes, models X410 and NL410. The X410 nodes are designed for high performance cluster computing and provide optimized I/O and network performance. The NL410 nodes are designed for high capacity general storage usage. Both node types incorporate advanced replication and redundancy technology to ensure data protection and reliability. Our design also includes commodity hardware from RAID Inc., which will provide additional lower cost storage to accommodate the archiving of non-critical datasets. The new equipment is mission-critical for our continued ability to support the high performance computing needs of NIH-funded research at the CUMC. No adequate HPCE alternatives exist at Columbia or nearby institutions, and the cost of commercial cloud computing options is prohibitive given the sheer volume of our high performance computing and data storage needs.

Public Health Relevance

Increasingly, modern biomedical research depends critically on computationally demanding analyses of large genomic datasets, requiring access to sophisticated and expensive high-performance computing infrastructure that individual investigator labs cannot afford or manage. To address this challenge, Columbia University has assembled a state-of-the-art High Performance Computing Environment (HPCE), which serves as a Core resource supporting biomedical research at the University. In this application we seek to replace aging data storage equipment so that our HPCE can continue serving the needs of NIH-sponsored research for the foreseeable future.

National Institutes of Health (NIH)
Office of The Director, National Institutes of Health (OD)
Biomedical Research Support Shared Instrumentation Grants (S10)
Study Section: Special Emphasis Panel (ZRG1-BST-M (30))
Program Officer: Klosek, Malgorzata
Columbia University (N.Y.)
Schools of Medicine
New York
United States
Chiu, Hua-Sheng; Martínez, María Rodríguez; Komissarova, Elena V et al. (2018) The number of titrated microRNA species dictates ceRNA regulation. Nucleic Acids Res 46:4354-4369
Ding, Hongxu; Wang, Wanxin; Califano, Andrea (2018) iterClust: a statistical framework for iterative clustering analysis. Bioinformatics 34:2865-2866
Thorsson, Vésteinn; Gibbs, David L; Brown, Scott D et al. (2018) The Immune Landscape of Cancer. Immunity 48:812-830.e14
Risom, Tyler; Langer, Ellen M; Chapman, Margaret P et al. (2018) Differentiation-state plasticity is a targetable resistance mechanism in basal-like breast cancer. Nat Commun 9:3815
Tomljanovic, Zeljko; Patel, Mitesh; Shin, William et al. (2018) ZCCHC17 is a master regulator of synaptic gene expression in Alzheimer's disease. Bioinformatics 34:367-371
Ding, Hongxu; Douglass Jr, Eugene F; Sonabend, Adam M et al. (2018) Quantitative assessment of protein activity in orphan tissues and single cells using the metaVIPER algorithm. Nat Commun 9:1471
Rajbhandari, Presha; Lopez, Gonzalo; Capdevila, Claudia et al. (2018) Cross-Cohort Analysis Identifies a TEAD4-MYCN Positive Feedback Loop as the Core Regulatory Element of High-Risk Neuroblastoma. Cancer Discov 8:582-599
Cesana, Marcella; Guo, Michael H; Cacchiarelli, Davide et al. (2018) A CLK3-HMGA2 Alternative Splicing Axis Impacts Human Hematopoietic Stem Cell Molecular Identity throughout Development. Cell Stem Cell 22:575-588.e7
Dionne, Gilman; Qiu, Xufeng; Rapp, Micah et al. (2018) Mechanotransduction by PCDH15 Relies on a Novel cis-Dimeric Architecture. Neuron 99:480-492.e5
Boboila, Shuobo; Lopez, Gonzalo; Yu, Jiyang et al. (2018) Transcription factor activating protein 4 is synthetically lethal and a master regulator of MYCN-amplified neuroblastoma. Oncogene 37:5451-5465

Showing the most recent 10 out of 47 publications