Progress in threat detection research is greatly hindered by the fact that many data sets related to areas of national security cannot be shared with experts in academia or industry due to security clearance barriers. The limited access to meaningful data sets prevents many researchers from contributing their expertise in algorithm development and verification. This research effort is poised to solve this important problem by developing a rigorous mathematical framework for the faithful and privacy-preserving generation of synthetic data. The goal is to create a data set that is as realistic as possible, one that maintains the nuances of the original data without endangering important pieces of sensitive information. The results of this project will play a key role in advancing research in threat detection and many other fields where privacy is key. The strong expectation of success for this project rests on solid theoretical achievements by the investigators in high-dimensional probability, signal processing, and mathematical data science, as well as their expertise in turning advanced mathematical concepts into real-world applications in the areas of artificial intelligence, signal processing, medical diagnostics, threat detection, and communications engineering.
This research effort is a fusion of several areas of cutting-edge mathematics with state-of-the-art artificial intelligence. It seeks to bring advanced techniques from optimization, probability, and machine learning to data science in the form of robust and efficient computational methods. Theoretical deliverables are expected to be in the form of new mathematical concepts for the development of multimodal, scalable synthetic data. Computational deliverables will be in the form of numerical algorithms for privacy-protecting artificial intelligence. Beyond the project's broad technological impact, it will serve as a model for the kind of cross-disciplinary activity critical for research and education at the frontier of mathematics and data science. The payoffs for society at large are many, including increased privacy protection while maintaining the benefits of data-driven discovery. The users of synthetic data will include researchers in the national security sector, computer scientists, privacy experts, health administrators, medical information system developers, epidemiologists, oncologists, and health economists.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.