An award is made to the University of California, Riverside (UCR) to acquire a highly scalable Big Data compute cluster dedicated to long-term support for data-intensive NSF research. By freeing and expanding currently overbooked research compute resources, the proposed compute cluster will have a significant impact on training and educating graduate and undergraduate students, including a high percentage from populations traditionally underrepresented in STEM disciplines. Based on its underrepresented minority (URM) enrollment, UCR has been designated by the Department of Education as a Hispanic-Serving Institution (HSI). The availability of adequate computing and its beneficial impact on research programs will serve to attract outstanding students to a large number of undergraduate and graduate programs, including successful NSF-funded REU and IGERT programs. New courses planned in conjunction with system acquisition will train students in parallel computing concepts and give them expanded access so that they can integrate Big Data computing into their research projects. The Big Data compute cluster will also support several smaller biotechnology companies in California that already use the computing facility of the Institute for Integrative Genome Biology (IIGB). Future users from external research institutions, minority-serving colleges, and industrial partners, particularly start-ups, will be actively recruited to gain access to IIGB's computing resources. Combined with UCR's ethnic diversity and research mission, this investment will benefit a wide array of research directions and technology-based economic development initiatives at UCR, an institution that serves as an important driver of economic development in California's Inland Empire.
The goal of this project is to enable Big Data-driven research at UCR to address grand challenge questions in a highly interdisciplinary environment. Questions include: How do different organism groups adapt to, and defend themselves against, extreme environmental conditions or pathogens? How can more efficient and selective small molecules be developed to accelerate discovery-oriented chemical biology and genomics research? How can this knowledge translate into improved stress and pathogen tolerance, for example in crops, to respond to global climate change and feed a growing world population? Recent advances in high-throughput and monitoring technologies offer, for the first time, novel methods to address these challenges systematically, comprehensively, and with unprecedented resolution. The new Big Data compute cluster substantially strengthens UCR's high-performance compute infrastructure and provides a critical enabling resource for UCR researchers from a broad spectrum of research specializations, including environmental science, chemical genomics, evolution, statistics, computational biology, and genome biology of multiple organism groups. Since this research relies on high-throughput and computational modeling approaches, its success and future growth are critically dependent on high-performance computing resources to manage and process large and rapidly increasing data sets. The system will be managed by experienced personnel in the Research Compute Facility of the Institute for Integrative Genome Biology (IIGB). The IIGB facility serves a broad user population of 160 active users from more than 50 research groups distributed across departments. The requested computing system will therefore reach a maximum number of NSF-funded investigators at UCR and constitutes a cost-effective investment of NSF and UCR funds.