Computational appliance: a supercomputer for modern biomedical research

Datta, Somalee

Abstract

PAR16054 Application, PI: Somalee Datta, PhD, Stanford University

NIH investigators at Stanford are increasingly analyzing terabyte- to petabyte-scale datasets generated using state-of-the-art biomedical technologies. It is no longer unusual to find studies that analyze hundreds of samples, correlate with other available large-scale cohorts (e.g. UK10K, TCGA), or involve longitudinal multimodal data. Analysis and interpretation of these large-scale complex data require a computational environment that is fast and affordable.

Stanford researchers have access to computational capacity in the form of traditional clusters. These clusters exhibit significant I/O bottlenecks, thereby slowing down the rate and amount of analysis and, in some cases, making the analysis impractical. Stanford has also made significant investment to prototype certain biomedical applications using the "Big Data" distributed programming stack, Hadoop, such as memory-intensive Spark clusters or Google BigQuery. Unfortunately, it is impractical to rewrite hundreds of common biomedical applications for Hadoop.

We therefore propose a supercomputer that allows our biomedical applications to scale without additional investment in personnel to rewrite the commonly used tools. The supercomputer has a large number of processors (256 CPU cores, 4 Tesla K80 GPUs) and memory (8 TB RAM, 16 TB "NVMe flash") to provide performance for large data. To optimally support the needs of our research community, we propose to co-locate the supercomputer with our existing computational capacity. The system will be hosted at a Stanford-approved data center and managed by Stanford IT.

Researchers using this device will have access to an extensive biomedical stack with over 350 commonly used tools as well as many commonly used reference datasets (e.g. 1000 Genomes) and access-controlled datasets (e.g. GTEx or TCGA). With the addition of the supercomputer, the computational environment will provide a full-stack biomedical research environment including capacity, a range of computational capabilities, extensive biomedical tools, common reference datasets, and easily available training and consulting support. By making this supercomputer available to a large cross-disciplinary biomedical research community at Stanford, we expect to invigorate development of novel algorithms and mathematical and statistical approaches unhindered by the limitations of current capabilities found in typical academic clusters and public clouds.

Public Health Relevance

The requested supercomputer will be used to analyze terabyte and petabyte scale data generated using modern sequencing and imaging technologies in NIH-sponsored studies at Stanford University. These studies play an essential role in pursuit of understanding human health and disease. They also represent a wide spectrum of applications, from exploring the fundamental building blocks in biology to investigating undiagnosed diseases. The supercomputer is specifically designed to meet unmet computational needs and will equip researchers with a modern computational environment necessary to properly analyze the valuable experimental data in a cost-effective and timely manner.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: Office of The Director, National Institutes of Health (OD)
Type: Biomedical Research Support Shared Instrumentation Grants (S10)
Project #: 1S10OD023452-01
Application #: 9273288
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Horska, Alena

Project Start: 2017-03-15
Project End: 2018-03-14
Budget Start: 2017-03-15
Budget End: 2018-03-14
Support Year: 1
Fiscal Year: 2017

Institution

Stanford University, Stanford, CA, United States