Unlawful online business often leaves behind human-readable text traces for interacting with its targets (e.g., defrauding victims, advertising illicit products to intended customers) or coordinating among the criminals involved. Such text content is valuable for detecting various types of cybercrimes and understanding how they happen, the perpetrator's strategies, capabilities and infrastructures and even the ecosystem of the underground business. Automatic discovery and analysis of such text traces, however, are challenging, due to their deceptive content that can easily blend into legitimate communication, and the criminal's extensive use of secret languages to hide their communication, even on public platforms (such as social media and forums). The project aims at systematically studying how to automatically discover such text traces and intelligently utilize them to fight against online crime. The research outcomes will contribute to more effective and timely control of online criminal activities, and the team's collaboration with industry also enables the team to get feedback and facilitate the transformation of new techniques to practical use.
This project focuses on both criminals' communication with their targets and the underground communications among miscreants. To discover and understand illicit online activities, the research looks for any semantic inconsistency between text content and its context (such as advertisements for selling illegal drugs on an .edu domain) and for inappropriate operations being triggered (such as a malware download). Inconsistencies are captured by the Natural Language Processing (NLP) techniques customized to various security settings. Further, based upon crime-related content discovered, the project will study various machine learning techniques that support automatic extraction and analysis of threat intelligence and criminal activities. The techniques are evaluated using data collected from various sources (public datasets, underground forums and others), and the findings they make are validated through a process that involves manual labeling, communication with affected parties, and collaborations with industry partners. This work will help create in-depth knowledge about underground ecosystems and lead to more effective control of illicit operations of these online businesses.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.