TWC: Medium: Collaborative: HIMALAYAS: Hierarchical Machine Learning Stack for Fine-Grained Analysis of Malware Domain Groups

Gu, Guofei

Abstract

The domain name system (DNS) protocol plays a significant role in the operation of the Internet by enabling the bi-directional association of domain names with IP addresses. It is also increasingly abused by malware, particularly botnets, through: (1) automated domain generation algorithms for rendezvous with a command-and-control (C&C) server, (2) DNS fast flux as a way to hide the location of malicious servers, and (3) DNS as a carrier channel for C&C communications. This project explores the development of a scalable, hierarchical machine-learning stack, called HIMALAYAS, which specializes in algorithms for automatically mining DNS data for malware activity. In particular, we are interested in isolating both ordered and unordered sets of malware domain groups whose access patterns are temporally and logically correlated.
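To make the idea of temporally correlated domain groups concrete, the following is a minimal sketch and not the project's actual algorithm. It assumes a hypothetical log of (timestamp in seconds, client, domain) tuples; the function name `co_queried_pairs`, the 60-second window, and the client threshold are all illustrative choices.

```python
from collections import defaultdict
from itertools import combinations

def co_queried_pairs(log, window=60, min_clients=2):
    """Count domain pairs that the same client queried within `window` seconds.

    Only pairs observed for at least `min_clients` distinct clients are kept.
    `log` is an iterable of (timestamp_seconds, client, domain) tuples
    (a hypothetical format chosen for this sketch).
    """
    # Group query events per client.
    by_client = defaultdict(list)
    for ts, client, domain in log:
        by_client[client].append((ts, domain))

    # For every client, record which domain pairs fall inside the time window.
    pair_clients = defaultdict(set)
    for client, events in by_client.items():
        events.sort()
        for (t1, d1), (t2, d2) in combinations(events, 2):
            if d1 != d2 and abs(t2 - t1) <= window:
                pair_clients[tuple(sorted((d1, d2)))].add(client)

    return {pair: len(clients)
            for pair, clients in pair_clients.items()
            if len(clients) >= min_clients}

# Made-up sample data for illustration only.
log = [
    (0, "hostA", "qxzvbn.example"),
    (5, "hostA", "pwkrty.example"),
    (900, "hostA", "news.example"),
    (10, "hostB", "qxzvbn.example"),
    (12, "hostB", "pwkrty.example"),
    (20, "hostC", "news.example"),
]
result = co_queried_pairs(log)
```

Here the two random-looking names are reported as a pair because two different clients looked them up seconds apart, while the unrelated lookup of the third domain is ignored. The pairwise comparison is quadratic per client, so a real system would need a more scalable approach.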

HIMALAYAS performs a task of increasing complexity at each level, starting from scalable clustering and feature selection at lower levels and moving to more advanced malware domain subsequence identification algorithms at higher levels. This design has multiple benefits, including speed, accuracy, interpretability, and the ability to use domain knowledge, which make it very well suited for malware analysis and related tasks. The analysis by HIMALAYAS should accelerate the identification and takedown of malware domains on the Internet and improve services such as Google SafeSearch.
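As a rough illustration of what a lower-level feature might look like, the sketch below computes the character entropy of a domain's leftmost label and uses it for a coarse split. This is an assumption made for illustration: the abstract does not name the features HIMALAYAS uses, and the function names and the 3.0-bit cutoff are hypothetical.

```python
import math
from collections import Counter

def label_entropy(domain):
    """Shannon entropy (bits per character) of the leftmost DNS label."""
    label = domain.split(".")[0].lower()
    counts = Counter(label)
    n = len(label)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def coarse_bucket(domain, entropy_cut=3.0):
    """Split domains into two coarse groups by entropy (cutoff is illustrative)."""
    return "high-entropy" if label_entropy(domain) >= entropy_cut else "low-entropy"
```

A single lexical feature like this is far too weak to identify malware on its own; it only shows the kind of cheap, scalable signal that a lower level could compute before more expensive analysis runs on the resulting groups.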

The machine-learning stack developed as part of the HIMALAYAS project has broader application to many important data mining problems, such as financial data analysis and mining user patterns from web access logs. The project provides opportunities for students to participate in the development and transition of the technology.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Type: Standard Grant (Standard)
Application #: 1314823
Program Officer: Shannon Beck

Budget Start: 2013-10-01
Budget End: 2018-09-30
Fiscal Year: 2013
Total Cost: $250,000

Institution

Texas A&M Engineering Experiment Station, College Station, TX, United States
