PAR16054 Application, PI: Somalee Datta, PhD, Stanford University
NIH investigators at Stanford are increasingly analyzing terabyte- to petabyte-scale datasets generated using state-of-the-art biomedical technologies. It is no longer unusual to find studies that analyze hundreds of samples, correlate with other available large-scale cohorts (e.g. UK10K, TCGA), or involve longitudinal multimodal data. Analysis and interpretation of these large-scale, complex data require a computational environment that is fast and affordable. Stanford researchers have access to computational capacity in the form of traditional clusters. These clusters exhibit significant I/O bottlenecks, thereby slowing down the rate and amount of analysis and, in some cases, making the analysis impractical. Stanford has also made significant investment to prototype certain biomedical applications using the "Big Data" distributed programming stack, Hadoop, such as memory-intensive Spark clusters or Google BigQuery. Unfortunately, it is impractical to rewrite hundreds of common biomedical applications for Hadoop. We therefore propose a supercomputer that allows our biomedical applications to scale without additional investment in personnel to rewrite the commonly used tools. The supercomputer has a large number of processors (256 CPU cores, 4 Tesla K80 GPUs) and a large amount of memory (8 TB RAM, 16 TB "NVMe flash") to provide performance for large data. To optimally support the needs of our research community, we propose to co-locate the supercomputer with our existing computational capacity. The system will be hosted at a Stanford-approved data center and managed by Stanford IT.
Researchers using this device will have access to an extensive biomedical stack with over 350 commonly used tools, as well as many commonly used reference datasets (e.g. 1000 Genomes) and access-controlled datasets (e.g. GTEx or TCGA). With the addition of the supercomputer, the computational environment will provide a full-stack biomedical research environment including capacity, a range of computational capabilities, extensive biomedical tools, common reference datasets, and easily available training and consulting support. By making this supercomputer available to a large cross-disciplinary biomedical research community at Stanford, we expect to invigorate development of novel algorithms and mathematical and statistical approaches unhindered by the limitations of current capabilities found in typical academic clusters and public clouds.

Public Health Relevance

Project Narrative: The requested supercomputer will be used to analyze terabyte- and petabyte-scale data generated using modern sequencing and imaging technologies in NIH-sponsored studies at Stanford University. These studies play an essential role in pursuit of understanding human health and disease. They also represent a wide spectrum of applications, from exploring the fundamental building blocks in biology to investigating undiagnosed diseases. The supercomputer is specifically designed to meet unmet computational needs and will equip researchers with a modern computational environment necessary to properly analyze the valuable experimental data in a cost-effective and timely manner.

Agency
National Institutes of Health (NIH)
Institute
Office of The Director, National Institutes of Health (OD)
Type
Biomedical Research Support Shared Instrumentation Grants (S10)
Project #
1S10OD023452-01
Application #
9273288
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Horska, Alena
Project Start
2017-03-15
Project End
2018-03-14
Budget Start
2017-03-15
Budget End
2018-03-14
Support Year
1
Fiscal Year
2017
Total Cost
Indirect Cost
Name
Stanford University
Department
Genetics
Type
Schools of Medicine
DUNS #
009214214
City
Stanford
State
CA
Country
United States
Zip Code
94304