Most websites use JavaScript to deliver personalized content to their users. At the same time, a growing number of attackers use the web to deliver attacks, often through malicious JavaScript. Detection of malicious JavaScript must be fast enough that it does not interfere with users' normal activities (non-invasive), yet effective enough to protect them from the majority of attacks. Rule-based and signature-based detection mechanisms often fail to detect malicious JavaScript that has been obfuscated. Behavior-based detection mechanisms are more robust against obfuscation and have been effective in identifying variants of known attacks. However, monitoring behavior during execution is usually invasive: it requires too much time and too many resources to run in web browsers while users are interacting with websites.
This project investigates non-invasive detection of malicious JavaScript using classifiers (data mining techniques) trained on malicious scripts, including obfuscated ones. Preliminary results show that it is possible to detect the vast majority of malicious scripts without full de-obfuscation while labeling very few benign scripts as malicious. Because the detection mechanism correctly identifies most benign scripts, resource-intensive detection mechanisms can use it as a pre-filter that removes most benign scripts, and then focus on the remainder only.
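The pre-filtering idea can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the feature set (entropy, escape sequences, counts of calls such as `eval`), the function names, and the threshold are hypothetical, and `toy_score` is only a stand-in for a classifier that would in practice be trained on labeled scripts.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical feature set; the project description does not list its features.
SUSPICIOUS_CALLS = ("eval", "unescape", "fromCharCode")
ESCAPE_RE = re.compile(r"%[0-9a-fA-F]{2}|\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}")
STRING_RE = re.compile(r'"[^"]*"|\'[^\']*\'')


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits (0.0 for empty input)."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(text).values())


def extract_features(script: str) -> Dict[str, float]:
    """Cheap static features computed without executing the script."""
    literals = STRING_RE.findall(script)
    features = {
        "length": float(len(script)),
        "entropy": shannon_entropy(script),
        # Length of the longest string literal, excluding its two quotes.
        "longest_string": float(max((len(s) - 2 for s in literals), default=0)),
        "escape_count": float(len(ESCAPE_RE.findall(script))),
    }
    for name in SUSPICIOUS_CALLS:
        features["count_" + name] = float(script.count(name))
    return features


def toy_score(features: Dict[str, float]) -> float:
    """Placeholder scorer (assumption); a trained classifier would go here."""
    calls = sum(features["count_" + name] for name in SUSPICIOUS_CALLS)
    return 1.0 if calls > 0 and features["escape_count"] > 0 else 0.0


def triage(scripts: Iterable[str],
           score: Callable[[Dict[str, float]], float],
           threshold: float = 0.5) -> Tuple[List[str], List[str]]:
    """Split scripts into (likely benign, needs heavier analysis)."""
    benign: List[str] = []
    suspicious: List[str] = []
    for script in scripts:
        if score(extract_features(script)) >= threshold:
            suspicious.append(script)
        else:
            benign.append(script)
    return benign, suspicious
```

Only the scripts in the second list would be handed to a resource-intensive, behavior-based detector. The scorer is passed in as a parameter so that the placeholder can be swapped for any trained model without touching the triage logic.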
Key elements of the envisioned solution are: (a) automatic collection of malicious JavaScript; (b) a partial de-obfuscator that extracts features for the classifiers; (c) classifiers that assess the maliciousness of scripts; (d) redirection graphs that chronicle the connections between websites hosting known malicious scripts; and (e) a feedback mechanism to assist JavaScript collection and classifier re-training.
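As an illustration of element (d), a redirection graph could be represented as a directed graph of observed redirects between hosts. The sketch below is one possible use, assumed here for illustration rather than taken from the project: starting from hosts known to serve malicious scripts, it walks the graph backwards to find hosts that lead to them, which could then be prioritized for collection. The function names and the input format (pairs of source and target host) are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple


def build_reverse_graph(
        redirects: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each target host to the set of hosts that redirect to it."""
    reverse: Dict[str, Set[str]] = defaultdict(set)
    for source, target in redirects:
        reverse[target].add(source)
    return reverse


def upstream_hosts(redirects: Iterable[Tuple[str, str]],
                   known_malicious: Set[str]) -> Set[str]:
    """Hosts that reach a known malicious host through one or more redirects."""
    reverse = build_reverse_graph(redirects)
    seen = set(known_malicious)
    queue = deque(known_malicious)
    # Breadth-first search over reversed edges, starting at the malicious hosts.
    while queue:
        host = queue.popleft()
        for parent in reverse.get(host, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen - set(known_malicious)


# Example with made-up hosts (assumption, not data from the project).
observed = [
    ("a.example", "b.example"),
    ("b.example", "evil.example"),
    ("c.example", "d.example"),
]
candidates = upstream_hosts(observed, {"evil.example"})
```

In this example `candidates` contains `a.example` and `b.example`, since both lead to the known malicious host, while `c.example` and `d.example` are left out.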