NSF Convergence Accelerator Track D: Towards Intelligent Sharing and Search for AI Models and Datasets

Shang, Jingbo; Gupta, Rajesh; Ohno-Machado, Lucila; Kumar, Arun; Quer, Giorgio

Abstract

The NSF Convergence Accelerator supports use-inspired, team-based, multidisciplinary efforts that address challenges of national importance and will produce deliverables of value to society in the near future. A major goal of AI-driven applications is to discover the underlying patterns in domain-specific datasets, which typically requires extensive field experience and interdisciplinary knowledge to design, or even to select, suitable AI models. This project will develop a hub and portal for AI datasets and models. It will offer dataset and model matching recommendations, use domain knowledge to improve search strategies for datasets and models, and support privacy. The hub and portal will engage a broad range of users in STEM and non-STEM fields who create AI-driven innovations in domains that we can only imagine today. Successful execution will provide new tangible artifacts, consisting of model and data schemas, software, systems, and services, that make AI models and datasets easily discoverable, accessible, interoperable, and reproducible.

Four novel techniques will be used to realize the envisioned system:
(1) A fine-grained privacy control technique with adaptive descriptive statistics, which balances the privacy needs of data owners against application-driven usability. All other components will access only the privacy-controlled data.
(2) An automated metadata generation method that exploits various kinds of information about AI models and datasets (e.g., data values, model parameters, auxiliary descriptions) to incorporate domain logic into semantics. This metadata, together with the models and datasets, will be organized as a text-rich network.
(3) A representation learning method that maps information in the text-rich network into a latent space in which datasets and models with similar semantics lie close to each other. Learning over this multimodal data will enable a comprehensive understanding of models and datasets.
(4) A constrained learning-to-match model that bridges datasets and models. The constraints are induced mainly from schema alignment between models and datasets; they also filter out obviously incompatible model and dataset pairings, significantly expediting the search and matching process.
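The interplay of the latent space in technique (3) and the schema constraints in technique (4) can be illustrated with a minimal sketch. It is not the project's method: it assumes each dataset and model already has a precomputed embedding vector (hypothetical values), reduces "schema alignment" to a simple check that every input column a model requires exists in the dataset with the same type, and substitutes plain cosine-similarity ranking for the learned matching model. All class, function, dataset, and model names below are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    schema: dict[str, str]   # hypothetical: column name -> column type
    embedding: list[float]   # hypothetical: precomputed latent-space vector


@dataclass
class Model:
    name: str
    input_schema: dict[str, str]  # hypothetical: required input column -> type
    embedding: list[float]        # hypothetical: precomputed latent-space vector


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def schema_compatible(dataset: Dataset, model: Model) -> bool:
    """Simplified constraint: every column the model needs exists in the dataset with the same type."""
    return all(dataset.schema.get(col) == typ for col, typ in model.input_schema.items())


def match(dataset: Dataset, models: list[Model], top_k: int = 3) -> list[tuple[str, float]]:
    """Drop schema-incompatible models first, then rank the survivors by latent-space similarity."""
    candidates = [m for m in models if schema_compatible(dataset, m)]
    scored = [(m.name, cosine(dataset.embedding, m.embedding)) for m in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical usage: "image_cnn" is filtered out by the schema constraint before
# any similarity is computed; the remaining models are ranked by cosine similarity.
cohort = Dataset("cohort_table", {"heart_rate": "float", "age": "int", "label": "int"}, [0.9, 0.1, 0.0])
models = [
    Model("hr_classifier", {"heart_rate": "float", "age": "int"}, [0.8, 0.2, 0.1]),
    Model("image_cnn", {"pixels": "tensor"}, [0.9, 0.1, 0.0]),
    Model("age_regressor", {"age": "int"}, [0.1, 0.9, 0.2]),
]
for name, score in match(cohort, models):
    print(f"{name}: {score:.3f}")
```

Applying the constraint filter before scoring mirrors the abstract's point that schema alignment prunes obviously incompatible pairings, so the more expensive similarity computation runs only on plausible candidates. Note that "image_cnn" has an embedding identical to the dataset's yet is never suggested, because its input schema cannot be satisfied.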

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Office of International and Integrative Activities (IIA)
Type: Standard Grant (Standard)
Application #: 2040727
Program Officer: Mike Pozmantier

Budget Start: 2020-09-15
Budget End: 2021-05-31
Fiscal Year: 2020
Total Cost: $947,200

Institution: University of California San Diego, La Jolla, CA, United States