Applied Machine Learning in Recognition of DGA Domain Names

Miroslav Štampar¹ and Krešimir Fertalj²

  1. SekuriPy LLC, Mirka Račkog 10
    10360 Zagreb, Croatia
    miroslav.stampar@sekuripy.hr
  2. Faculty of Electrical Engineering and Computing, Unska 3
    10000 Zagreb, Croatia
    kresimir.fertalj@fer.hr

Abstract

Recognition of domain names generated by domain generation algorithms (DGAs) is an essential part of malware detection based on inspection of network traffic. Beyond basic heuristics (HE) and the limited detection offered by blacklists, machine learning (ML) appears to be the most promising approach. Few studies extensively compare different ML models for DGA binary classification, including both conventional and deep learning (DL) representatives. The few that exist either focus on a small set of models, use a poor feature set in their ML models, or fail to ensure unbiased independence between training and evaluation samples. To overcome these limitations, we engineered a robust feature set and used it to train and evaluate 14 ML, 9 DL, and 2 comparative models on two independent datasets. Results show that, if ML features are properly engineered, the difference in overall score between the top ML and DL representatives is only marginal. This paper represents the first attempt to neutrally compare the performance of many different models for the recognition of DGA domain names, and the best models perform as well as the top representatives from the literature.
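
To make the idea of feature engineering for DGA detection more concrete, the following is a minimal sketch of a few lexical features commonly computed from a domain name. It is not the feature set used in the paper, which the abstract does not list. The function names (`shannon_entropy`, `lexical_features`) and the choice of features (length, character entropy, digit ratio, vowel ratio) are illustrative assumptions, and taking only the leftmost label is a simplification.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def shannon_entropy(text: str) -> float:
    """Shannon entropy (in bits) of the character distribution of text."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def lexical_features(domain: str) -> dict:
    """Compute a few illustrative lexical features (hypothetical feature set).

    Simplification: only the leftmost label is inspected, so "example.com"
    is reduced to "example". Real pipelines usually handle public suffixes
    and subdomains more carefully.
    """
    label = domain.lower().rstrip(".").split(".")[0]
    length = len(label)
    letters = [c for c in label if c.isalpha()]
    return {
        "length": length,
        "entropy": shannon_entropy(label),
        "digit_ratio": sum(c.isdigit() for c in label) / length if length else 0.0,
        "vowel_ratio": sum(c in VOWELS for c in letters) / len(letters) if letters else 0.0,
    }
```

A feature dictionary of this kind could be turned into a numeric vector and passed to any conventional binary classifier. Algorithmically generated names often, though not always, show higher character entropy and a lower vowel ratio than human-chosen names.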

Key words

domain generation algorithm, binary classification, supervised machine learning, deep learning, blind evaluation

Digital Object Identifier (DOI)

https://doi.org/10.2298/CSIS210104046S

Publication information

Volume 19, Issue 1 (January 2022)
Year of Publication: 2022
ISSN: 2406-1018 (Online)
Publisher: ComSIS Consortium

How to cite

Štampar, M., Fertalj, K.: Applied Machine Learning in Recognition of DGA Domain Names. Computer Science and Information Systems, Vol. 19, No. 1, 205-227. (2022), https://doi.org/10.2298/CSIS210104046S