This project will acquire a state-of-the-art High Performance Computing (HPC) cluster to support large-scale, data-driven research. The instrument will support a variety of projects from computer science, electrical engineering, ecology, evolutionary biology, neuroscience, and genomics. In neuroscience, the cluster will allow the use of advanced statistical techniques at scale to identify and connect anatomical and functional brain-imaging features of diseased and healthy subjects with specific underlying genetic profiles. In computer science, using machine learning algorithms deployed on the instrument, researchers will seek new ways to protect the security and privacy of users in large-scale networked systems. The cluster will also enable research that will improve our understanding of evolutionary history and the molecular complexities of traits through the analysis of multi-animal, large-scale genomic datasets. In addition, through short courses and multiday boot camps, the instrument will provide valuable opportunities for training postdoctoral fellows, graduate students, and advanced undergraduates in large-scale computational data science. The instrument will also be a valuable asset for certificate programs in statistics and machine learning (one for undergraduate students, the other for graduate students) and for a certificate program in computational science, all of which will support broadening participation of groups underrepresented in STEM. The research and training enabled by the instrument are expected to help improve our understanding of human health and well-being, help create new knowledge that will aid economic competitiveness, and help maintain the country's leadership in science and engineering.
The computing cluster will be composed of nodes with very large memory. The system complements the institution's investments in research cyberinfrastructure and will be managed by the Princeton Institute for Computational Science and Engineering (PICSciE) and the Office of Information Technology (OIT). The instrument will initially be used by five research groups that are part of the Center for Statistics and Machine Learning (CSML), which will leverage existing programs and partnerships to increase participation in data science. The five initial projects are united under a common theme: machine learning will be used to analyze big data sets that may not be easily broken into smaller pieces for processing. Specifically, they will examine the following: 1) the use of probabilistic models for large-scale scientific analysis and de novo design in application areas such as mechanical metamaterials and mixed-signal circuit development; 2) statistical machine learning in genomics, biomedicine, and health biostatistics, including the analysis of hospital records to aid doctors in taking early action to improve patient outcomes, the heritability of neuropsychiatric diseases and drug responses, and statistical and experimental examination of cardiovascular disease risk; 3) security and privacy challenges in networked systems, using machine learning techniques to detect and isolate attackers in systems such as social media; 4) large-scale machine learning for neuroscience, such as joint analysis of many large-scale, multi-subject fMRI datasets, where the size and number of the datasets are the central computational challenge; and 5) evolutionary genomic and epigenomic analyses through collection and analysis of large datasets to investigate the evolutionary history and molecular complexities of traits. Collectively, these research groups comprise forty graduate students and ten postdocs and host, on average, thirteen undergraduate research projects per year.
The instrument will also be used by other researchers engaged in large-scale, data-driven research across a wide variety of disciplines. Both the capacity and the capability of the proposed instrument will therefore be highly utilized, enabling the continued advancement of research at the University.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.