This project acquires a High Performance Computing (HPC) cluster and an ancillary parallel file system to meet the demand for computationally intensive data analysis and simulation created by the increased scale and complexity of biological and biotechnical data. The underlying work addresses the challenge of analyzing, managing, and relating the information generated by the state's interdisciplinary distributed research center, BioSNTR (Biochemical Spatio-Temporal Network Resource). The HPC cluster supports BioSNTR's data analysis research activities, which span genomics, optical microscopy, satellite imaging, computational chemistry, and electrical and mechanical engineering. It will serve as a hub for BioSNTR, a research center distributed across the state that was created by another NSF-funded project and has resulted in 10 new tenure-track hires across South Dakota. The center has acquired new optical microscopes that produce massive amounts of data and require intensive computational reconstruction methods, which drives a pressing need for local HPC resources. A previous MRI award enabled the creation of a next-generation gene sequencing facility that, coupled with BioSNTR hires in bioinformatics, drives the need for expanded HPC capabilities for a range of genomics applications. BioSNTR aims to link advanced imaging with next-generation sequencing to connect system-level genotype to phenotype. The HPC platform is essential for integrating these large data sets. Additionally, the South Dakota Image Processing Laboratory (IPLab) would benefit from this equipment.

Engineering applications include satellite imaging and fluid dynamics computation. IPLab develops novel techniques to account for atmospheric anomalies that adversely affect spectral reflectance signatures measured from land features. Future work includes extensive computational processing of Landsat scenes for climatological analysis (3 to 4 million data sets). Estimates indicate that this work will consume all the processing power of the current cluster for an entire year. Simulating efficiencies in energy power systems relies heavily on highly parallel processing. Bioscience and biotechnology have entered a new era in which new instrumentation transforms the scale and complexity of biological data. These instruments produce expansive information about organismal phenotype, function, and genotype, creating a computational challenge in analyzing, managing, and relating this information. Beyond the more traditional disciplines, the instrumentation will facilitate research in emerging areas. Examples include data-intensive methods for analyzing high-frequency financial transactions in economics, and journalism research that uses data mining techniques to study the social factors that contribute to collaborative conversational tendencies.

Broader Impacts: The instrumentation allows parallel computer processing to be included in graduate courses within the biological, chemical, and statistical sciences and enhances curriculum development within the Computer Science program. The new cluster leverages a recently funded REU site focused on early undergraduate students from tribal and community colleges. The investigators are able to coordinate yearly training workshops in transcriptomics and image analysis in collaboration with BioSNTR. These workshops will be open to students, researchers, and faculty throughout the state.

Agency
National Science Foundation (NSF)
Institute
Division of Computer and Network Systems (CNS)
Type
Standard Grant (Standard)
Application #
1726946
Program Officer
Rita Rodriguez
Project Start
Project End
Budget Start
2017-10-01
Budget End
2018-12-31
Support Year
Fiscal Year
2017
Total Cost
$796,359
Indirect Cost
Name
South Dakota State University
Department
Type
DUNS #
City
Brookings
State
SD
Country
United States
Zip Code
57007