This project targets members of the NSF research workforce who need to develop machine learning (ML) applications that will run on the national cyberinfrastructure (CI). At the heart of this project is the CI-enabled Machine Learning (CIML) training system and repository, which will support the development of cyberliteracy in the ML space. Unlike much of the ML-related training material readily found online, CIML material will be centered on science and engineering applications that make use of CI-enabled ML techniques, and will be used to train a research workforce capable of understanding the challenges of working with CI, new HPC architectures, software, and applications. The user community for CIML training material will include students (undergraduate and graduate), postdocs, PIs, researchers, educators, and HPC trainers, each with their own diverse backgrounds and application requirements. The project will support national educational goals by ensuring that the CI modules run on advanced CI tools and resources, and that core literacy and discipline-appropriate skills in advanced CI are integrated into curricula and instructional material. CIML will address national security concerns by fostering a workforce capable of developing ML applications in scientific domains such as climate and weather, the biosciences, physics, and chemistry. Through outreach and extension of the training efforts, this program will impact thousands of users and help develop the next generation of the CI research workforce. Because CIML training material will be available online, the project has the potential to reach beyond the NSF cyber workforce to other communities, including hospital and medical treatment systems, transportation and electrical monitoring systems, stock market monitoring systems, and disaster response systems.

The Cyberinfrastructure-enabled Machine Learning (CIML) training system and repository will use a “best practices” approach to develop a unique program targeted towards members of the research workforce who use machine learning (ML) and big data analytics methods for their domain-specific applications or instructional material on large-scale cyberinfrastructure. The project will apply methods of Cyber Literacy and HPC Competencies to define a set of core ML and domain-specific literacy areas as a function of the dimensions of learning, ranging from a technological focus to a problem-solving focus or a focus on ML or computational science. Sources for the CIML system will be drawn from existing HPC training work, HPC researchers and users, and collaborators, as well as new code and methods. The materials developed will be available via the CIML repository, which includes a web site, documentation, and GitHub repositories for code, data, and related materials. CIML will become a useful tool for two communities: users who want to understand what technologies and skills they need to master in order to run a particular ML application, what systems to use, and which software libraries are suggested; and trainers who need to know what topics to teach. These efforts will produce a community of machine learning and data analytics CI Users (CIU) and Contributors (CIC) who actively contribute to the training material repository and incorporate the materials into their projects and courses.
As a result of these efforts, the CIML program will extend the scope of ongoing education and training across the research workforce by developing cyberinfrastructure-based materials that utilize and contribute to training material developed for XSEDE training, higher education, and other programs. The program will impact thousands of existing and new users, including students (undergraduate and graduate), postdocs, PIs, researchers, and educators, each with their own diverse backgrounds and application requirements. Because CIML training material will be available online, the project has the potential to reach beyond the cyber workforce and to impact many communities, including hospital and medical treatment systems, transportation and electrical monitoring systems, stock market monitoring systems, and disaster response systems.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Advanced CyberInfrastructure (ACI)
Type: Standard Grant (Standard)
Application #: 2017767
Program Officer: Alan Sussman
Project Start:
Project End:
Budget Start: 2020-09-01
Budget End: 2024-08-31
Support Year:
Fiscal Year: 2020
Total Cost: $500,000
Indirect Cost:

Name: University of California San Diego
Department:
Type:
DUNS #:
City: La Jolla
State: CA
Country: United States
Zip Code: 92093