Generative adversarial networks (GAN), which estimate the real data distribution through an adversarial game, have achieved great success in image generation and text generation. However, many decision models from real world applications are trained on relational data that contains both numerical and categorical attributes. Furthermore, real world applications often have legal or ethical requirements on the generated data, e.g., discrimination free or privacy preservation, or expect to generate complementary samples to their existing data that may only contain normal samples. The research significantly improves the applicability of generative adversarial networks from image/text data to relational data with application requirements and contributes to the limited base of knowledge in the area of using generative adversarial networks for constraint aware relational data generation. The findings, tools, software code, and curricular materials documents are disseminated to the research community, IT industry, and users, which expects to help domain users generate constraint aware data to meet their business needs.
This EAGER research develops novel techniques that enable the current generative adversarial networks to generate realistic relational data with constraints. The developed framework adds a decoder to the generator to generate both numerical and categorical data and incorporates constraint terms into the objective functions of generator and discriminator or introduces multiple discriminators to enforce requirement constraints. The research adopts f-divergences to analyze the convergence of the constraint aware GAN framework when complex constraints are introduced. The research then focuses on developing under the unifying framework two specific models, fair GAN for generating discrimination-free data, and complementary GAN for generating negative samples when only positive samples are available in the training data. The research conducts empirical evaluations of the framework and two specific models in terms of accuracy and convergence, implements and integrates the developed algorithms into TensorFlow, an open source deep learning software system. The developed framework expects to advance theoretical understanding of generative adversarial networks and the two specific GAN models expect to improve the current research on fairness aware learning and fraud detection. In particular, the fair GAN introduces the new approach of fair data generation based on GAN as current fairness aware learning research mainly adopts data modification. The complimentary GAN outperforms existing one-class classification models for fraud detection by generating complementary samples and enabling the trained discriminator to accurately separate abnormal samples from normal ones.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.