Preliminary research has demonstrated that by gaining control of the input data or the computation procedures, attackers can render machine-learning-based security analysis ineffective. The history of cybersecurity suggests that such attacks will soon become more prevalent in the real world. This project undertakes the challenge of developing a systematic, foundational, and practical framework to understand attacks, quantify vulnerabilities, and fortify machine-learning-based security analytics. The framework's effectiveness is evaluated and demonstrated in realistic, user-facing environments, using real malware datasets. This project aims to fundamentally change how machine-learning-based systems are designed, developed, and deployed for security and malware analytics, for cybersecurity more broadly, and, because the use of machine learning is ubiquitous, for numerous other application areas in science, education, and technology. The findings can lead to new breeds of adaptive defense systems that are highly resilient to current and future security attacks, helping protect the nation and its citizens from cyber harm.
This project combines multiple novel ideas synergistically, organized into four interrelated research thrusts: (1) a theoretical machine learning framework, based on machine teaching and active learning, for understanding attacks, quantifying vulnerabilities, and measuring both adversaries' capabilities and model robustness; (2) algorithmic techniques for machine learning resilience that adaptively counter adversaries' feature and sample manipulation strategies; (3) extensive evaluation of the identified attack and defense strategies with real and mutated malware datasets on existing security systems, and demonstration of the improved attack resilience of the new, fortified machine learning system; (4) system-level countermeasures in real-world, user-facing security analysis environments.