Computational and data science has transformed biomedical scientific discovery: its approaches are embedded into a wide range of workflows for diseases such as schizophrenia, depression, Alzheimer's, epilepsy, influenza, autism, drug addiction, pediatric cardiac care, Inflammatory Bowel Disease, prostate cancer and multiple myleloma. Sixty-one basic and translational researchers at Mount Sinai representing over $100 million in NIH funding, along with their collaborators from 75 external institutions, have utilized the Big Omics Data Engine (BODE) supercomputer to elucidate significant scientific findings in over 167 publications, including high impact journals such as Nature and Science, with 2,427 citations in three years. These researchers have also shared the data generated on BODE throughout their consortia and into national data sharing repositories. BODE is nearing the end of its vendor maintainable life, and researchers need increased computational throughput and storage space. To empower researchers to not only continue their inquiries, but to also tackle more complex scientific questions with decreased time to solution, we propose the Big Omics Data Engine 2 Supercomputer (BODE2). BODE2 will contain a total of 3,200 Intel Cascade Lake cores with 15 terabytes of memory and 14 petabytes of raw storage, and will leverage an existing 250 terabytes of SSDs. An instrument of this size is not available elsewhere affordably. With the proposed instrument, researchers will be able to take advantage of three major benefits: (1) the ability to receive results faster for overall greater scientific throughput; (2) the ability to increase the fidelity of their simulations and analyses; and (3) the ability to migrate research applications seamlessly to the software environment for greater scientific productivity. As with data produced on BODE, BODE2 data products will also be shared with the broader scientific community. BODE2 will provide the critical infrastructure needed by the wide range of researchers and clinicians for the genetics and population analysis, gene expression, machine learning and structural and chemical biology approaches used to make advances in these diseases.
A specialized Big Omics Data Engine 2 Supercomputer instrument will provide necessary computational and data science infrastructure for 61 research projects with 75 collaborating institutions in diverse areas such as Alzheimer's, autism, schizophrenia, drug addiction, influenza, pediatric cardiac care, depression, epilepsy, prostate cancer and multiple myeloma. Data generated from this instrument will be shared in national databases.