We request funds to purchase an integrated supercomputer to unite five highly productive and collaborative laboratories with complementary expertise in the microbiome, proteomics, text mining, and supercomputing, and to extend these capabilities to the broader NIH-funded biomedical research community via cloud and web applications. The critical shared need, which is not met by other systems on campus, unavailable in commercial clouds, and oversubscribed at national labs, is for a system that can run jobs that require high memory (8-32 GB/core) and long duration (>2 weeks wall-time), and that is optimized for high-IO tasks that saturate network or storage on other systems. The system will consist of 128 servers, each using 2x8-core 2.93GHz Intel Sandy Bridge CPUs. 20 large-memory nodes will each have 512GB of RAM (32GB/core), and 100 compute nodes will each have 128GB of RAM (8GB/core). These 120 nodes will each use two 10Gbps Ethernet ports bonded together for a 20Gbps/node (2.5GB/s) connection to the rest of the system, and each node will have 2.4TB of raw high-performance local storage. The total aggregate performance of these local disks is over 36GB/s sustained (>300MB/s per node). The remaining 8 nodes will be used for administration, support for advanced software tools and infrastructure, and user interaction. A central high-performance Lustre parallel file system will provide 1.15PB of usable scratch space and sustain 36GB/s to the 128 clients. An archival system of 4 drives/300 tapes will sustain >1GB/s aggregate (accounting for compression), provide 450TB of raw capacity, store ~4.5 PB of user data, and scale to 5x this size. The system, valued at $4.5 million but quoted at $2 million by HP due to the strategic importance of this partnership, will be housed in a state-of-the-art machine room in the new Jennie Smoly Caruthers Biotechnology Building on the Boulder campus (opening Feb 2012), and will connect to the rest of the campus at 40Gbps.
The system will be a key enabling technology for scientific areas where data growth is exponential and current systems on campus are end-of-life, solely dedicated to other purposes, or optimized for other tasks. The major users will use the instrument largely for time-consuming one-time tasks such as parameter optimization for microbiome and genome assembly workflows, building knowledgebases, and performing simulations and database searches. These tasks will provide resources that are re-used by much broader user communities (hundreds of collaborators; thousands of end users) who lack supercomputing access. One innovative aspect of this proposal is the configuration of part of the system as an academic cloud, which will allow us to pilot workflows that can later be deployed by diverse users on commercial clouds (e.g. Amazon EC2) and academic clouds (e.g. Magellan and DIAG) once those clouds are upgraded. The system will also build a broad expertise base in high-performance computing in the life sciences through outreach to promising new faculty and trainees on NIH training grants, and collaborations with new users of the Sequencing Core. The proposed instrument will thus have a profound impact on NIH-funded research.
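
The per-core memory, per-node bandwidth, and aggregate disk throughput quoted in the abstract follow from the node counts by simple arithmetic. The short Python sketch below recomputes them as a consistency check. Every input number is taken from the text above; the variable names are illustrative only, and decimal units (1 GB = 1000 MB, 8 bits per byte) are assumed.

```python
# Consistency check of the hardware figures quoted in the abstract.
# Inputs come from the text; names and decimal-unit conversions are assumptions.

CORES_PER_NODE = 2 * 8  # two 8-core CPUs per server

# Memory per core (GB) for the two worker-node classes
large_mem_per_core = 512 / CORES_PER_NODE    # 20 large-memory nodes, 512 GB each
compute_mem_per_core = 128 / CORES_PER_NODE  # 100 compute nodes, 128 GB each

# Two bonded 10 Gbps Ethernet ports, converted from gigabits/s to gigabytes/s
bonded_gbps = 2 * 10
bonded_gbytes_per_s = bonded_gbps / 8

# Aggregate local-disk throughput over the 120 worker nodes,
# using the quoted per-node floor of 300 MB/s
worker_nodes = 20 + 100
aggregate_disk_gbytes_per_s = worker_nodes * 300 / 1000

# Raw capacity per tape implied by 450 TB spread over 300 tapes
tb_per_tape = 450 / 300

print(f"large-memory nodes: {large_mem_per_core:g} GB/core")
print(f"compute nodes:      {compute_mem_per_core:g} GB/core")
print(f"bonded link:        {bonded_gbytes_per_s:g} GB/s per node")
print(f"local disks:        {aggregate_disk_gbytes_per_s:g} GB/s aggregate")
print(f"tape archive:       {tb_per_tape:g} TB raw per tape")
```

The results match the parenthetical values in the abstract (32 and 8 GB/core, 2.5 GB/s per node, 36 GB/s aggregate); the per-tape figure is derived here and is not stated in the text.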

National Institutes of Health (NIH)
Office of The Director, National Institutes of Health (OD)
Biomedical Research Support Shared Instrumentation Grants (S10)
Study Section
Special Emphasis Panel (ZRG1-BST-F (30))
Program Officer
Levy, Abraham
University of Colorado at Boulder
Schools of Arts and Sciences
United States
Tripodi, Ignacio J; Allen, Mary A; Dowell, Robin D (2018) Detecting Differential Transcription Factor Activity from ATAC-Seq Data. Molecules 23:
Fulbright, Scott P; Robbins-Pianka, Adam; Berg-Lyons, Donna et al. (2018) Bacterial community changes in an industrial algae production system. Algal Res 31:147-156
Azofeifa, Joseph G; Allen, Mary A; Hendrix, Josephina R et al. (2018) Enhancer RNA profiling predicts transcription factor activity. Genome Res :
Scott, Amber L; Richmond, Phillip A; Dowell, Robin D et al. (2017) The Influence of Polyploidy on the Evolution of Yeast Grown in a Sub-Optimal Carbon Source. Mol Biol Evol 34:2690-2703
Lladser, Manuel E; Azofeifa, Joseph G; Allen, Mary A et al. (2017) RNA Pol II transcription model and interpretation of GRO-seq data. J Math Biol 74:77-97
Stefferson, Michael W; Norris, Samantha L; Vernerey, Franck J et al. (2017) Effects of soft interactions and bound mobility on diffusion in crowded environments: a model of sticky and slippery obstacles. Phys Biol 14:045008
Azofeifa, Joseph G; Allen, Mary A; Lladser, Manuel E et al. (2017) An Annotation Agnostic Algorithm for Detecting Nascent RNA Transcripts in GRO-Seq. IEEE/ACM Trans Comput Biol Bioinform 14:1070-1081
Blackwell, Robert; Edelmaier, Christopher; Sweezy-Schindler, Oliver et al. (2017) Physical determinants of bipolar mitotic spindle assembly and stability in fission yeast. Sci Adv 3:e1601603
Azofeifa, Joseph G; Dowell, Robin D (2017) A generative model for the behavior of RNA polymerase. Bioinformatics 33:227-234
Funk, Christopher S; Cohen, K Bretonnel; Hunter, Lawrence E et al. (2016) Gene Ontology synonym generation rules lead to increased performance in biomedical concept recognition. J Biomed Semantics 7:52

Showing the most recent 10 out of 14 publications