The NSF Convergence Accelerator supports use-inspired, team-based, multidisciplinary efforts that address challenges of national importance and will produce deliverables of value to society in the near future. A major goal of AI-driven applications is to discover the underlying patterns in domain-specific datasets, and designing or even selecting suitable AI models for this typically requires extensive field experience and interdisciplinary knowledge. This project will develop a hub and portal for AI datasets and models. It will offer dataset and model matching recommendations, use domain knowledge to improve search strategies for datasets and models, and support privacy. The hub and portal will engage a broad range of users (in STEM and non-STEM fields) in creating AI-driven innovations in various domains that we can only imagine today. Successful execution will provide new tangible artifacts consisting of model and data schemas, software, systems, and services that will make AI models and datasets easily discoverable, accessible, interoperable, and reproducible.

Four novel techniques will be used to realize the envisioned system: (1) A fine-grained privacy control technique with adaptive descriptive statistics, achieving a balance between the privacy needs of data owners and application-driven usability. All other components will have access only to the privacy-controlled data; (2) An automated metadata generation method that exploits various kinds of information about AI models and datasets (e.g., data values, model parameters, auxiliary descriptions) to incorporate domain logic into semantics. This metadata, together with the models and datasets, will be organized as a text-rich network; (3) A representation learning method that transforms information in the text-rich network into a latent space, where datasets and models with similar semantics are close to each other. This learning over multimodal data will enable a comprehensive understanding of models and datasets; (4) A constrained learning-to-match model that bridges datasets and models. The constraints are mainly induced from schema alignment between models and datasets, which also filters out obviously incompatible model and dataset choices, significantly expediting the search and matching process.
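
As a rough illustration of how techniques (3) and (4) could fit together, the sketch below first discards models whose input schema does not align with a dataset, then ranks the remaining models by cosine similarity in a latent space. It is a minimal sketch under stated assumptions, not the project's implementation: the `Dataset` and `Model` classes, the exact-match schema rule, the hand-written embeddings, and the use of cosine similarity in place of a learned matching model are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import sqrt


@dataclass
class Dataset:
    # Hypothetical record: column name -> type, plus a latent vector.
    name: str
    columns: dict[str, str]
    embedding: tuple[float, ...]


@dataclass
class Model:
    # Hypothetical record: required input name -> type, plus a latent vector.
    name: str
    inputs: dict[str, str]
    embedding: tuple[float, ...]


def schema_compatible(dataset: Dataset, model: Model) -> bool:
    """Assumed constraint: every model input must exist in the dataset with the same type."""
    return all(dataset.columns.get(col) == typ for col, typ in model.inputs.items())


def cosine(u: tuple[float, ...], v: tuple[float, ...]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_models(dataset: Dataset, models: list[Model]) -> list[Model]:
    """Filter by schema alignment first, then rank survivors by latent similarity."""
    candidates = [m for m in models if schema_compatible(dataset, m)]
    return sorted(
        candidates,
        key=lambda m: cosine(dataset.embedding, m.embedding),
        reverse=True,
    )
```

Applying the schema filter before ranking means that similarity is computed only for candidates that could actually consume the dataset, which is the sense in which the constraints expedite search and matching.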

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

University of California San Diego
La Jolla
United States