PAR16054 Application, PI: Somalee Datta, PhD, Stanford University

NIH investigators at Stanford are increasingly analyzing terabyte- to petabyte-scale datasets generated using state-of-the-art biomedical technologies. It is no longer unusual to find studies that analyze hundreds of samples, correlate with other available large-scale cohorts (e.g. UK10K, TCGA), or involve longitudinal multimodal data. Analysis and interpretation of these large-scale, complex data require a computational environment that is fast and affordable. Stanford researchers have access to computational capacity in the form of traditional clusters. These clusters exhibit significant I/O bottlenecks, thereby slowing down the rate and amount of analysis and, in some cases, making the analysis impractical. Stanford has also made significant investment to prototype certain biomedical applications using the "Big Data" distributed programming stack, Hadoop, such as memory-intensive Spark clusters or Google BigQuery. Unfortunately, it is impractical to rewrite hundreds of common biomedical applications for Hadoop. We therefore propose a supercomputer that allows our biomedical applications to scale without additional investment in personnel to rewrite the commonly used tools. The supercomputer has a large number of processors (256 CPU cores, 4 Tesla K80 GPUs) and large memory (8 TB RAM, 16 TB "NVMe flash") to provide performance for large data. To optimally support the needs of our research community, we propose to co-locate the supercomputer with our existing computational capacity. The system will be hosted at a Stanford-approved data center and managed by Stanford IT.
Researchers using this device will have access to an extensive biomedical stack with over 350 commonly used tools as well as many commonly used reference datasets (e.g. 1000 Genomes) and access-controlled datasets (e.g. GTEx or TCGA). With the addition of the supercomputer, the computational environment will provide a full-stack biomedical research environment including capacity, a range of computational capabilities, extensive biomedical tools, common reference datasets, and easily available training and consulting support. By making this supercomputer available to a large cross-disciplinary biomedical research community at Stanford, we expect to invigorate development of novel algorithms and mathematical and statistical approaches unhindered by the limitations of current capabilities found in typical academic clusters and public clouds.

Project Summary/Abstract
Project Narrative: The requested supercomputer will be used to analyze terabyte- and petabyte-scale data generated using modern sequencing and imaging technologies in NIH-sponsored studies at Stanford University. These studies play an essential role in the pursuit of understanding human health and disease. They also represent a wide spectrum of applications, from exploring the fundamental building blocks in biology to investigating undiagnosed diseases. The supercomputer is specifically designed to meet unmet computational needs and will equip researchers with a modern computational environment necessary to properly analyze the valuable experimental data in a cost-effective and timely manner.