DGA Botnet Detection Using Supervised Learning Methods
[Figure 1: Taxonomy for the supervised learning methods in DGA botnet detection: SVM, C4.5, ELM, HMM, LSTM, Recurrent SVM, CNN+LSTM, Bidirectional LSTM.]
(SVM), Recurrent SVM [9], [10], CNN+LSTM [11], and Bidirectional LSTM [12], which have not been validated in this application domain.

[Figure 2(b): Entropy of Alexa, Ramnit, Ranbyus, Suppobox and Banjori domains.]
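Figure 2(b) contrasts the character-level entropy of Alexa domains with that of several DGA families. For illustration only, a minimal Python sketch of this classic hand-crafted feature (the function name is ours, not the paper's):

    import math
    from collections import Counter

    def char_entropy(domain):
        """Shannon entropy (bits per character) of a domain string."""
        counts = Counter(domain)
        n = len(domain)
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    # Dictionary-based families such as Suppobox score close to English words,
    # while uniformly random generators such as Ramnit score noticeably higher.
    print(char_entropy('google'))
    print(char_entropy('xjw3qpzk1vbn'))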
[Figure 3(a): LSTM memory block with input, forget and output gates.]
Long Short-Term Memory network (LSTM) [13], [14] holds more promise for recognizing DGA malwares since it is capable of modeling temporal sequences and their long-term dependencies [8]. Traditional HMM is limited to a discrete state space, while LSTM has Turing capabilities, making it more suitable for all sequence learning tasks. The LSTM basic unit is the memory block, containing one or more memory cells and three multiplicative gating units (see Fig. 3a). LSTM aims at mapping an input sequence to an output sequence by using the following equations iteratively from t = 1 to T:

    i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i)    (4)
    f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f)    (5)
    c_t = f_t c_{t-1} + i_t \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)    (6)
    o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o)        (7)
    h_t = o_t \tanh(c_t)                                                (8)

where \sigma is the logistic sigmoid, i_t, f_t and o_t are the input, forget and output gate activations, and c_t and h_t denote the cell state and hidden output at time t.
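To make the architecture concrete, here is a minimal Keras sketch of a character-level LSTM classifier in the spirit of [8]; the vocabulary size, domain length and embedding width are illustrative assumptions, not the paper's exact configuration:

    from keras.models import Sequential
    from keras.layers import Embedding, LSTM, Dense

    MAX_LEN = 75       # assumed maximum domain length
    VOCAB_SIZE = 40    # assumed character vocabulary (letters, digits, '-', '.')

    model = Sequential([
        Embedding(VOCAB_SIZE, 128, input_length=MAX_LEN),  # characters to dense vectors
        LSTM(128),                                         # 128 memory blocks
        Dense(1, activation='sigmoid'),                    # binary DGA vs. non-DGA output
    ])
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])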
Motivated by the pioneering work of LSTM, several LSTM variants have been developed in the literature. Recurrent SVM is constructed using the idea of replacing the Softmax with an SVM. Softmax tries to minimize the cross-entropy, while the aim of SVM is to find the maximum margin between samples from different classes [9].
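One way to approximate this idea with the Keras library used later in the paper is to keep the recurrent layer but train a linear output layer with a hinge loss; this is a hedged sketch of the margin objective, not the authors' exact implementation:

    from keras.models import Sequential
    from keras.layers import Embedding, LSTM, Dense

    MAX_LEN, VOCAB_SIZE, NUM_CLASSES = 75, 40, 38   # assumed: 1 Alexa + 37 DGA classes

    model = Sequential([
        Embedding(VOCAB_SIZE, 128, input_length=MAX_LEN),
        LSTM(128),
        Dense(NUM_CLASSES, activation='linear'),    # linear scores in place of Softmax
    ])
    # categorical_hinge penalizes margin violations between the true class
    # score and the highest competing score, mimicking the SVM objective.
    model.compile(loss='categorical_hinge', optimizer='adam')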
In CNN+LSTM, the input domain is fed into a single Convolutional Neural Network (CNN) with max pooling across the sequence for each convolutional feature to find the morphological patterns [11]. The output of the CNN is then treated as the input of the LSTM to reduce the temporal variations. As a consequence, each CNN and LSTM block captures information about the input representation at a different scale [21]. For this reason, CNN+LSTM is expected to be a better alternative to the original LSTM.
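A minimal Keras sketch of this hybrid follows; the filter count, kernel size and pooling width are assumptions for illustration, not the settings of [11]:

    from keras.models import Sequential
    from keras.layers import Embedding, Conv1D, MaxPooling1D, LSTM, Dense

    MAX_LEN, VOCAB_SIZE = 75, 40                    # assumed, as in the sketches above

    model = Sequential([
        Embedding(VOCAB_SIZE, 128, input_length=MAX_LEN),
        Conv1D(128, 3, activation='relu'),          # n-gram-like morphological patterns
        MaxPooling1D(2),                            # pool across the sequence
        LSTM(128),                                  # model the remaining temporal structure
        Dense(1, activation='sigmoid'),             # binary DGA vs. non-DGA output
    ])
    model.compile(loss='binary_crossentropy', optimizer='adam')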
Bidirectional LSTM is an extension of the traditional LSTM, consisting of a forward and a backward LSTM. It is observed to achieve higher generalization performance on sequence classification problems [12]. In Bidirectional LSTM, the forward hidden sequence \overrightarrow{h}_t, the backward hidden sequence \overleftarrow{h}_t and the output sequence y_t are computed as follows:

    \overrightarrow{h}_t = \mathcal{H}(W_{x\overrightarrow{h}} x_t + W_{\overrightarrow{h}\overrightarrow{h}} \overrightarrow{h}_{t-1} + b_{\overrightarrow{h}})    (10)
    \overleftarrow{h}_t = \mathcal{H}(W_{x\overleftarrow{h}} x_t + W_{\overleftarrow{h}\overleftarrow{h}} \overleftarrow{h}_{t+1} + b_{\overleftarrow{h}})      (11)
    y_t = W_{\overrightarrow{h} y} \overrightarrow{h}_t + W_{\overleftarrow{h} y} \overleftarrow{h}_t + b_y                                                    (12)

where \mathcal{H} is an update function, which is implemented by combining Eqs. (4) and (8). The Bidirectional LSTM allows the output units y_t to learn a representation from both the past and future information without having a fixed-size window around t [23]. In this paper, it is based on two LSTM layers (forward and backward) with 128 memory blocks in each direction.
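In Keras, the two directions can be expressed with the Bidirectional wrapper; a minimal sketch consistent with the 128-memory-blocks-per-direction setup described above (other hyperparameters are assumed):

    from keras.models import Sequential
    from keras.layers import Embedding, LSTM, Bidirectional, Dense

    MAX_LEN, VOCAB_SIZE = 75, 40                    # assumed, as in the sketches above

    model = Sequential([
        Embedding(VOCAB_SIZE, 128, input_length=MAX_LEN),
        Bidirectional(LSTM(128)),                   # forward and backward layers, 128 memory blocks each
        Dense(1, activation='sigmoid'),
    ])
    model.compile(loss='binary_crossentropy', optimizer='adam')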
3 EXPERIMENTS

This section is dedicated to assessing the various supervised learning methods, i.e., HMM, C4.5, ELM, SVM, LSTM, Recurrent SVM, CNN+LSTM and Bidirectional LSTM. In particular, we evaluate these methods on both the binary (DGA vs. non-DGA) and multiclass (which DGA?) problems. The Wilcoxon signed ranks test is also performed to compare each pair of methods based on their F1-scores. All the code was written using the Keras and scikit-learn libraries [26], [27], and was executed on a PC running Ubuntu 16.04 x64 with an Intel Core i5 and 8 GB of RAM.

3.1 Dataset Specification

The experiments are carried out on a real-world dataset that contains 1 non-DGA (Alexa) and 37 DGA classes. It is collected from two sources: the Alexa top 1 million domains [28] and the OSINT DGA feed from Bambenek Consulting [29]. In total, there are 88,357 legitimate domains and 81,490 DGA domains. The dataset also includes some notable DGA families such as Cryptolocker, Locky, Kraken and Gameover Zeus. Matsnu, Cryptowall, Suppobox and Volatile are based on domains which were generated using an English dictionary word list. Table 1 illustrates the dataset specification.
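As a rough illustration of how such a dataset can be assembled (the file names and feed layout below are assumptions, not a prescription from the paper):

    import csv

    # Alexa top 1 million list [28]: lines of the form "rank,domain".
    with open('top-1m.csv') as f:                   # assumed local file name
        benign = [row[1] for row in csv.reader(f)]

    # Bambenek OSINT DGA feed [29]: '#' comment lines, then "domain,description,...".
    dga = []
    with open('dga-feed.txt') as f:                 # assumed local file name
        for line in f:
            if line.startswith('#') or not line.strip():
                continue
            domain, family = line.split(',')[:2]    # the description names the DGA family
            dga.append((domain, family))

    # The paper works with 88,357 legitimate and 81,490 DGA domains in total.
    print(len(benign), len(dga))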
3.3 Results
averaging recall, precision and F1-score are shown in Tables 2 and 3. Macro-averaging treats all classes equally, while micro-averaging favors the classes that have more samples. Macro-averaging should be a better measure; however, micro-averaging is also presented for interested readers. It can be proved that macro-averaging recall is equal to the accuracy reported in the literature. In Table 2, HMM achieves a lower detection rate than expected. The rationale for this is that HMM requires a huge amount of data to train the model, while several DGA classes, such as Tempedreve and Corebot, have very little representation in the training data. We note that in [2] HMM was only evaluated using the Conficker, Murofet, Bobax and Sinowal malwares.
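The difference between the two averages can be made concrete with scikit-learn (a toy sketch with invented labels; 0 stands for Alexa, 1 and 2 for two DGA families):

    from sklearn.metrics import f1_score

    # Toy example: labels and predictions invented for illustration only.
    y_true = [0, 0, 0, 0, 1, 1, 2]
    y_pred = [0, 0, 0, 1, 1, 1, 0]

    # Macro averages the per-class F1-scores, so a rare family such as
    # Tempedreve counts as much as Alexa; micro pools all decisions, so
    # the classes with more samples dominate the score.
    print(f1_score(y_true, y_pred, average='macro'))
    print(f1_score(y_true, y_pred, average='micro'))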
It becomes obvious that C4.5 is superior to both the SVM and ELM. The dominance of Recurrent SVM and Bidirectional LSTM was established by a large margin, leaving LSTM and CNN+LSTM in a group with a very small difference between them. LSTM cannot recognize 15 DGA families. This number is reduced to 8 when Bidirectional LSTM is applied to detect malicious domains. Apart from HMM, the implicit-feature-based methods were observed to be better than the hand-crafted-feature-based ones.

[Table 4: Ranks computed by the Wilcoxon test]

performers. A significant difference is also observed between Bidirectional LSTM and the other supervised learning methods.

Table 6 illustrates the evaluation time, which is critical for practical use. There is almost no computation cost in LSTM (9 ms). In Bidirectional LSTM, additional processing is needed because updating the input and output layers cannot be achieved at once [12]. Bidirectional LSTM requires 27 ms to process a domain. The evaluation time related to ELM is given in [7]. C4.5, ELM and SVM are the most computationally expensive, since these methods are based on hand-crafted attributes. It is clear that HMM, C4.5, ELM and SVM are not suitable for real-time DGA detection applications.

There are still 8 DGA malwares that cannot be detected by any of the supervised learning methods. Geodo, Tempedreve, Hesperbot, Fobber, Dircrypt, Qadars and Locky are misclassified as Ramnit since these malwares share a common generator, which has a uniform distribution over the characters. Matsnu is based on pronounceable domains. Hence, it cannot be isolated from Alexa.
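The pairwise comparison behind Table 4 can be reproduced in outline with SciPy; the per-class F1 vectors below are placeholders, not the paper's measured scores:

    from scipy.stats import wilcoxon

    # Placeholder per-class F1-scores for two methods over the same classes.
    f1_bilstm = [0.98, 0.93, 0.88, 0.99, 0.75, 0.96]
    f1_hmm = [0.72, 0.61, 0.55, 0.90, 0.40, 0.66]

    # Paired Wilcoxon signed-rank test [24]: ranks the absolute differences
    # and compares the sums of positive and negative ranks.
    stat, p = wilcoxon(f1_bilstm, f1_hmm)
    print(stat, p)  # a small p-value indicates a significant difference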
4 CONCLUSIONS

DGA botnets have become a technology backbone to support cyber-criminals. Supervised learning provides a means to recognize and shut down this type of botnet. We have thoroughly evaluated various supervised learning methods, including Hidden Markov Model, C4.5 decision tree, Support Vector Machines, Extreme Learning Machine, Long Short-Term Memory network, Recurrent SVM, CNN+LSTM and Bidirectional LSTM. Experiments demonstrate that Bidirectional LSTM and Recurrent SVM achieve the highest detection rate on both the binary and multiclass classification problems. These methods share some important features with the LSTM, thus making them amenable to real-time detection applications.

ACKNOWLEDGMENTS

This research is supported by the Vietnam Ministry of Education and Training research project “Development of DDoS attack prevention and Botnet detection system” B2016-BKA-06.

REFERENCES

[1] S. Yadav, A. K. K. Reddy, A. L. N. Reddy, and S. Ranjan, Detecting algorithmically generated domain-flux attacks with DNS traffic analysis, IEEE/ACM Transactions on Networking 20.5 (2012): 1663-1677.
[2] M. Antonakakis, R. Perdisci, Y. Nadji, N. Vasiloglou, S. Abu-Nimeh, W. Lee, and D. Dagon, From Throw-Away Traffic to Bots: Detecting the Rise of DGA-Based Malware, in: The 21st USENIX Security Symposium (USENIX Security 12), 2012.
[3] Y. Zhou, Q.-S. Li, Q. Miao, and K. Yin, DGA-Based Botnet Detection Using DNS Traffic, Journal of Internet Services and Information Security 3.3/4 (2013): 116-123.
[4] S. Schiavoni, F. Maggi, L. Cavallaro, and S. Zanero, Phoenix: DGA-based botnet tracking and intelligence, International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA), LNCS 8550 (2014): 192-211.
[5] H. Zhang, M. Gharaibeh, S. Thanasoulas, and C. Papadopoulos, BotDigger: Detecting DGA bots in a single network, Proceedings of the IEEE International Workshop on Traffic Monitoring and Analysis, 2016.
[6] L. Bilge, E. Kirda, C. Kruegel, and M. Balduzzi, EXPOSURE: Finding Malicious Domains Using Passive DNS Analysis, NDSS, 2011.
[7] Y. Shi, C. Gong, and L. Juntao, Malicious Domain Name Detection Based on Extreme Machine Learning, Neural Processing Letters (2017): 1-11.
[8] J. Woodbridge, H. S. Anderson, A. Ahuja, and D. Grant, Predicting Domain Generation Algorithms with Long Short-Term Memory Networks, arXiv preprint arXiv:1611.00791 (2016).
[9] Y. Tang, Deep learning using linear support vector machines, arXiv preprint arXiv:1306.0239 (2013).
[10] S. X. Zhang, R. Zhao, C. Liu, J. Li, and Y. Gong, Recurrent support vector machines for speech recognition, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016.
[11] Y. Kim, et al., Character-Aware Neural Language Models, AAAI, 2016.
[12] A. Graves and J. Schmidhuber, Framewise phoneme classification with bidirectional LSTM and other neural network architectures, Neural Networks 18.5 (2005): 602-610.
[13] S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Computation 9(8) (1997): 1735-1780.
[14] F. A. Gers, J. Schmidhuber, and F. Cummins, Learning to forget: Continual prediction with LSTM, Neural Computation 12(10) (2000): 2451-2471.
[15] J. Jacobs, Building a DGA Classifier: Feature Engineering, October 2014. Available online at: http://datadrivensecurity.info/blog/posts/2014/Oct/dga-part2/.
[16] S. Krishnan, T. Taylor, F. Monrose, and J. McHugh, Crossing the threshold: Detecting network malfeasance via sequential hypothesis testing, 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) (2013): 1-12.
[17] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.
[18] J. Milgram, M. Cheriet, and R. Sabourin, “One against one” or “one against all”: Which one is better for handwriting recognition with SVMs?, Tenth International Workshop on Frontiers in Handwriting Recognition, La Baule, 2006.
[19] J. R. Quinlan, C4.5: Programs for Machine Learning, Elsevier, 2014.
[20] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, Extreme learning machine: theory and applications, Neurocomputing 70.1 (2006): 489-501.
[21] W. Yin, K. Kann, M. Yu, and H. Schütze, Comparative Study of CNN and RNN for Natural Language Processing, arXiv preprint arXiv:1702.01923 (2017).
[22] V. Tong and G. Nguyen, A method for detecting DGA botnet based on semantic and cluster analysis, Proceedings of the Seventh Symposium on Information and Communication Technology, ACM, 2016.
[23] P. Su, X. Ding, Y. Zhang, Y. Li, and N. Zhao, Predicting Blood Pressure with Deep Bidirectional LSTM Network, arXiv preprint arXiv:1705.04524 (2017).
[24] J. Demšar, Statistical comparisons of classifiers over multiple data sets, Journal of Machine Learning Research 7 (2006): 1-30.
[25] J. Alcalá-Fdez, A. Fernandez, J. Luengo, J. Derrac, S. García, L. Sánchez, and F. Herrera, KEEL data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework, Journal of Multiple-Valued Logic and Soft Computing 17(2-3) (2011): 255-287.
[26] F. Chollet, Keras (2015). Available online at: http://keras.io (2017).
[27] F. Pedregosa, et al., Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011): 2825-2830.
[28] Does Alexa have a list of its top-ranked websites? Available online at: https://support.alexa.com/hc/en-us/articles/200449834-Does-Alexa-have-a-list-of-its-top-ranked-websites- (2017).
[29] Bambenek Consulting - Master Feeds. Available online at: http://osint.bambenekconsulting.com/feeds/ (2016).
[30] M. Masud, T. Al-khateeb, L. Khan, B. Thuraisingham, and K. Hamlen, Flow-based identification of botnet traffic by mining multiple log files, First International Conference on Distributed Framework and Applications (DFmA 2008), Oct. 2008, pp. 200-206.
[31] M. Antonakakis, et al., Building a Dynamic Reputation System for DNS, USENIX Security Symposium, 2010.
[32] S. B. Kotsiantis, I. Zaharakis, and P. Pintelas, Supervised machine learning: A review of classification techniques (2007): 3-24.