Recent dramatic increases in capacity and reductions in cost for Next Generation Sequencing (NGS) are generating revolutionary new information for the biomedical sciences. This information is providing new insights into genomic aberrations and the factors underlying complex diseases, and is uncovering new non-coding RNAs as well as novel RNA modifications associated with cell function. However, NGS data require intensive computing even to be put into usable form, and modeling and analysis of these data require even more computing. Simply put, without matching computational resources the NGS machines are useless. The University of Pennsylvania has six installed NGS machines, with four additional machines planned, which are being used by a large community of NIH-sponsored investigators. As costs continue to come down, the user community will only grow. Processing the raw data from 10 NGS machines requires up to 14.4 million CPU hours of computing per year. Existing computational instruments can meet only 20-30% of this demand. The only other possible source, external commercial cloud computing, has significantly higher costs, carries data security risks, and still requires additional infrastructure to store the resulting data. To meet the critical demand of NGS technologies, we propose to purchase and operate a dedicated high-performance computing instrument. The proposed instrument from IBM (iDataPlex/SONAS) will have 1,440 computing cores and expandable multi-tier storage with a total capacity of 1.9 petabytes. This instrument features efficient power and cooling, which is critical for extremely large-scale computing, and a modular storage system that can be fine-tuned for NGS performance and cost-effectively enlarged using a balance of hard drives and tapes. Unlike most other specialized equipment, this high-performance computing instrument for biomedical data will impact the research of hundreds of investigators, postdoctoral fellows, and graduate trainees.
The instrument will remove a significant bottleneck to utilizing NGS technology; potentially save NIH-sponsored investigators more than $1 million in computing costs per year; and make prior institutional and NIH investments more efficient and useful.

Agency
National Institutes of Health (NIH)
Institute
Office of The Director, National Institutes of Health (OD)
Type
Biomedical Research Support Shared Instrumentation Grants (S10)
Project #
1S10OD012312-01
Application #
8334809
Study Section
Special Emphasis Panel (ZRG1-BST-F (30))
Program Officer
Levy, Abraham
Project Start
2013-04-22
Project End
2014-04-21
Budget Start
2013-04-22
Budget End
2014-04-21
Support Year
1
Fiscal Year
2013
Total Cost
$1,954,859
Indirect Cost
Name
University of Pennsylvania
Department
Biology
Type
Schools of Arts and Sciences
DUNS #
042250712
City
Philadelphia
State
PA
Country
United States
Zip Code
19104