An Efficient Intrusion Detection Method Based on LightGBM and Autoencoder
Abstract
1. Introduction
- The feature selection method based on the LightGBM algorithm is applied to intrusion detection;
- A threshold on the reconstruction error of the autoencoder is set to separate normal from attack behavior, and an IDS is developed on the basis of this decision rule;
- The proposed LightGBM-AE is compared with deep learning models (VAE and DAE) and with other machine learning algorithms such as Decision Tree and Random Forest. The results verify that LightGBM-AE can effectively distinguish normal behavior from attack behavior in the NSL-KDD dataset.
2. Related Work
3. Dataset and Methodology
3.1. NSL-KDD Dataset
- Probe: Probe includes attacks that collect information about the network to effectively avoid the security control systems.
- DoS: DoS includes attacks that slow down or shut down a machine by sending the server more traffic than the system can process. Legitimate network traffic and access to services are disrupted as a result.
- R2L: R2L includes attacks that illegally access computers by sending remote spoofing packets to the system.
- U2R: U2R includes attacks that obtain root access. In this case, the attacker starts out on the system as a normal user and exploits a system vulnerability to gain root privileges.
3.2. Methodology
3.3. Data Preprocessing
3.3.1. Data Normalization
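As a minimal sketch of this step, assuming min-max scaling to the range [0, 1] (a common choice for autoencoder inputs, taken here as an assumption rather than as the authors' confirmed formula), each continuous feature could be rescaled as below. The helper name `min_max_scale` and the sample values are hypothetical.

```python
def min_max_scale(column):
    """Rescale a list of numbers to [0, 1] using its own minimum and maximum."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical values of one continuous feature (for example src_bytes).
src_bytes = [0, 181, 239, 1032]
scaled = min_max_scale(src_bytes)
print(scaled)
```

In practice the minimum and maximum would normally be computed on the training set only and then reused for the test set, so that no test statistics leak into preprocessing.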
3.3.2. One-Hot-Encoding
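The sketch below illustrates one-hot encoding of the three symbolic NSL-KDD features (protocol_type, service, and flag). The helper `one_hot_encode` and the toy records are illustrative assumptions, not the authors' code. On the full dataset these three features are commonly reported to have 3, 70, and 11 distinct values, which would turn the 41 original features into 38 + 84 = 122 dimensions, in line with the 122-feature setting in the results tables.

```python
def one_hot_encode(records, symbolic_keys):
    """Replace each symbolic field with one 0/1 column per observed category."""
    # Build a sorted vocabulary of categories for every symbolic field.
    vocab = {k: sorted({r[k] for r in records}) for k in symbolic_keys}
    encoded = []
    for r in records:
        # Keep the non-symbolic fields unchanged.
        row = {k: v for k, v in r.items() if k not in symbolic_keys}
        for k in symbolic_keys:
            for cat in vocab[k]:
                row[f"{k}_{cat}"] = 1 if r[k] == cat else 0
        encoded.append(row)
    return encoded


# Toy records with hypothetical values.
records = [
    {"duration": 0, "protocol_type": "tcp", "service": "http", "flag": "SF"},
    {"duration": 2, "protocol_type": "udp", "service": "domain_u", "flag": "SF"},
    {"duration": 0, "protocol_type": "icmp", "service": "ecr_i", "flag": "SF"},
]
encoded = one_hot_encode(records, ["protocol_type", "service", "flag"])
print(sorted(encoded[0]))
```

The vocabulary should be built from the training data and reused for the test data so that both sets share the same columns.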
3.4. Feature Selection
3.5. Autoencoders
3.5.1. Autoencoder
3.5.2. Variational Autoencoder
3.5.3. Denoising Autoencoder
3.6. Classification
Algorithm 1. The detection algorithm with the trained AE
Input: the test dataset;
Output: accuracy, precision, recall, and F1-score.
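The detection step named in Algorithm 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the reconstruction error is taken to be the per-sample mean squared error, normal traffic is labeled 1 and attacks 0, and `reconstruct` is a hypothetical stand-in for the trained AE (here a toy function that only reproduces values up to 0.5).

```python
import numpy as np


def detect(x_test, y_test, reconstruct, threshold):
    """Label a sample normal (1) if its reconstruction error is within the threshold."""
    x_test = np.asarray(x_test, dtype=float)
    y_test = np.asarray(y_test)
    # Assumption: per-sample mean squared reconstruction error.
    errors = np.mean((x_test - reconstruct(x_test)) ** 2, axis=1)
    y_pred = (errors <= threshold).astype(int)  # 1 = normal, 0 = attack

    # "Normal" is treated as the positive class.
    tp = int(np.sum((y_pred == 1) & (y_test == 1)))
    tn = int(np.sum((y_pred == 0) & (y_test == 0)))
    fp = int(np.sum((y_pred == 1) & (y_test == 0)))
    fn = int(np.sum((y_pred == 0) & (y_test == 1)))

    accuracy = (tp + tn) / len(y_test)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def toy_reconstruct(x):
    """Hypothetical stand-in for a trained AE: values above 0.5 are not reproduced."""
    return np.clip(x, 0.0, 0.5)


# Toy test set: two normal samples followed by three attacks.
x_test = [[0.1, 0.2], [0.3, 0.4], [0.9, 0.8], [1.0, 0.1], [0.5, 0.6]]
y_test = [1, 1, 0, 0, 0]
metrics = detect(x_test, y_test, toy_reconstruct, threshold=0.05)
print(metrics)
```

The last toy sample is an attack that reconstructs almost perfectly, so it is counted as a false positive. This shows how the choice of threshold trades precision against recall.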
4. Experimental Results
4.1. Experimental Conditions
4.2. Performance Evaluation
4.3. Parameter Settings and Training Details
4.4. Results and Analysis
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
Abbreviations
IDS | Intrusion Detection System |
AE | Autoencoder |
VAE | Variational Autoencoder |
DAE | Denoising Autoencoder |
R2L | Remote-to-Local |
U2R | User-to-Root |
DoS | Denial-of-Service |
TP | True Positives |
TN | True Negatives |
FP | False Positives |
FN | False Negatives |
References
- Ahmed, M.; Mahmood, A.N.; Hu, J. A survey of network anomaly detection techniques. J. Netw. Comput. Appl. 2016, 60, 19–31.
- Abuadlla, Y.; Kvascev, G.; Gajin, S.; Jovanovic, Z. Flow-based anomaly intrusion detection system using two neural network stages. Comput. Sci. Inf. Syst. 2014, 11, 601–622.
- Liu, W.; Ci, L.; Liu, L. A New Method of Fuzzy Support Vector Machine Algorithm for Intrusion Detection. Appl. Sci. 2020, 10, 1065.
- Maalouf, M.; Homouz, D.; Trafalis, T.B. Logistic regression in large rare events and imbalanced data: A performance comparison of prior correction and weighting methods. Comput. Intell. 2018, 34, 161–174.
- Bhattacharya, S.; Krishnan, S.S.R.; Maddikunta, P.K.R.; Kaluri, R.; Singh, S.; Gadekallu, T.R.; Alazab, M.; Tariq, U. A Novel PCA-Firefly Based XGBoost Classification Model for Intrusion Detection in Networks Using GPU. Electronics 2020, 9, 219.
- Li, Z.; Gurgel, H.; Dessay, N.; Hu, L.; Xu, L.; Gong, P. Semi-Supervised Text Classification Framework: An Overview of Dengue Landscape Factors and Satellite Earth Observation. Int. J. Environ. Res. Public Health 2020, 17, 4509.
- Malowany, D.; Guterman, H. Biologically Inspired Visual System Architecture for Object Recognition in Autonomous Systems. Algorithms 2020, 13, 167.
- Shankar, K.; Elhoseny, M.; Lakshmanaprabu, S.K.; Ilayaraja, M.; Vidhyavathi, R.M.; Elsoud, M.A.; Alkhambashi, M. Optimal feature level fusion based ANFIS classifier for brain MRI image classification. Concur. Comput. Pract. Exp. 2020, 32, e4887.
- Almiani, M.; Abughazleh, A.; Alrahayfeh, A.; Atiewi, S.; Razaque, A. Deep recurrent neural network for IoT intrusion detection system. Simul. Model. Pract. Theory 2020, 101, 102031.
- Congyuan, X.; Jizhong, S.; Xin, D. A Method of Few-Shot Network Intrusion Detection Based on Meta-Learning Framework. IEEE Trans. Inf. Forensics Secur. 2020, 15, 3540–3552.
- Tavallaee, M.; Bagheri, E.; Lu, W.; Ghorbani, A.A. A detailed analysis of the KDD CUP 99 data set. In Proceedings of the 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, Ottawa, ON, Canada, 8–10 July 2009; pp. 1–6.
- Alqatf, M.; Lasheng, Y.; Alhabib, M.; Alsabahi, K. Deep Learning Approach Combining Sparse Autoencoder With SVM for Network Intrusion Detection. IEEE Access 2018, 6, 52843–52856.
- Khraisat, A.; Gondal, I.; Vamplew, P.; Kamruzzaman, J.; Alazab, A. Hybrid Intrusion Detection System Based on the Stacking Ensemble of C5 Decision Tree Classifier and One Class Support Vector Machine. Electronics 2020, 9, 173.
- Tchakoucht, T.A.; Ezziyyani, M. Multilayered Echo-State Machine: A Novel Architecture for Efficient Intrusion Detection. IEEE Access 2018, 6, 72458–72468.
- Dey, S.K.; Rahman, M.M. Effects of Machine Learning Approach in Flow-Based Anomaly Detection on Software-Defined Networking. Symmetry 2019, 12, 7.
- Yang, K.; Liu, J.; Zhang, C.; Fang, Y. Adversarial Examples Against the Deep Learning Based Network Intrusion Detection Systems. In Proceedings of the 2018 IEEE Military Communications Conference (MILCOM), Los Angeles, CA, USA, 29–31 October 2018; pp. 559–564.
- Yin, C.; Zhu, Y.; Fei, J.; He, X. A Deep Learning Approach for Intrusion Detection Using Recurrent Neural Networks. IEEE Access 2017, 5, 21954–21961.
- Lotfollahi, M.; Siavoshani, M.J.; Zade, R.S.; Saberian, M. Deep Packet: A Novel Approach For Encrypted Traffic Classification Using Deep Learning. Soft Comput. 2020, 24, 1999–2012.
- Zavrak, S.; İskefiyeli, M. Anomaly-Based Intrusion Detection From Network Flow Features Using Variational Autoencoder. IEEE Access 2020, 8, 108346–108358.
- Ieracitano, C.; Adeel, A.; Morabito, F.C.; Hussain, A. A Novel Statistical Analysis and Autoencoder Driven Intelligent Intrusion Detection Approach. Neurocomputing 2020, 387, 51–62.
- Devan, P.; Khare, N. An efficient XGBoost–DNN-based classification model for network intrusion detection system. Neural Comput. Appl. 2020, 32, 12499–12514.
- Ke, G.; Meng, Q.; Finley, T.W.; Wang, T.; Chen, W.; Ma, W.; Qiwei, Y.; Liu, T. LightGBM: A highly efficient gradient boosting decision tree. In Neural Information Processing Systems; Neural Information Processing Systems Foundation: Long Beach, CA, USA, 2017.
- Hinton, G.E.; Salakhutdinov, R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507.
- Doersch, C. Tutorial on Variational Autoencoders. arXiv 2016, arXiv:1606.05908.
- Lee, S.M.; Kim, H.J.; Kim, S.B. Dynamic dispatching system using a deep denoising autoencoder for semiconductor manufacturing. Appl. Soft Comput. 2020, 86, 105904.
- Wan, F.; Guo, G.; Zhang, C.; Guo, Q.; Liu, J. Outlier Detection for Monitoring Data Using Stacked Autoencoder. IEEE Access 2019, 7, 173827–173837.
- Zhou, Y.; Qin, R.; Xu, H.; Sadiq, S.; Yu, Y. A Data Quality Control Method for Seafloor Observatories: The Application of Observed Time Series Data in the East China Sea. Sensors 2018, 18, 2628.
- Langer, M.; Hall, A.; He, Z.; Rahayu, W. MPCA SGD—A Method for Distributed Training of Deep Learning Models on Spark. IEEE Trans. Parallel Distrib. Syst. 2018, 29, 2540–2556.
NSL-KDD | Normal | Probe | DoS | R2L | U2R |
---|---|---|---|---|---|
Train (125,973) | 67,343 (53.46%) | 11,656 (9.25%) | 45,927 (36.46%) | 995 (0.79%) | 52 (0.04%) |
Test (22,544) | 9711 (43.07%) | 2421 (10.74%) | 7458 (33.08%) | 2754 (12.22%) | 200 (0.89%) |
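The class shares in the table above can be recomputed from the raw counts, which also makes the imbalance between the training and test sets explicit (R2L, for example, is rare in training but common in testing). The short sketch below uses only the counts from the table; the helper name `shares` is hypothetical.

```python
train = {"Normal": 67343, "Probe": 11656, "DoS": 45927, "R2L": 995, "U2R": 52}
test = {"Normal": 9711, "Probe": 2421, "DoS": 7458, "R2L": 2754, "U2R": 200}


def shares(counts):
    """Return each class count as a percentage of the total."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


for split_name, counts in (("Train", train), ("Test", test)):
    pct = shares(counts)
    print(split_name, sum(counts.values()), {k: round(v, 2) for k, v in pct.items()})
```

The recomputed percentages should agree with the table to within about 0.01 percentage points; small differences can arise from rounding versus truncation.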
Attack Category | Attack Name |
---|---|
Probe | Satan, Saint, Ipsweep, Portsweep, Nmap, Mscan |
DoS | Apache2, Smurf, Neptune, Back, Teardrop, Pod, Land, Mailbomb, Processtable, UDPstorm |
R2L | WarezClient, Guess_Password, WarezMaster, Imap, Ftp_Write, Named, MultiHop, Phf, Spy, Sendmail, SnmpGetAttack, Worm, Xsnoop, Xlock, SnmpGuess |
U2R | Buffer_Overflow, Httptunnel, Rootkit, Perl, Ps, Xterm, SQLattack, LoadModule |
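For binary detection, the attack names in the table above have to be mapped to their category and then to a normal/attack label. A minimal sketch of such a lookup follows. The names are the table entries in lowercase (with the standard spelling "httptunnel"), which is an assumption: the exact label strings in a given copy of NSL-KDD may differ, and the function names are hypothetical.

```python
ATTACKS_BY_CATEGORY = {
    "Probe": ["satan", "saint", "ipsweep", "portsweep", "nmap", "mscan"],
    "DoS": ["apache2", "smurf", "neptune", "back", "teardrop", "pod", "land",
            "mailbomb", "processtable", "udpstorm"],
    "R2L": ["warezclient", "guess_password", "warezmaster", "imap", "ftp_write",
            "named", "multihop", "phf", "spy", "sendmail", "snmpgetattack",
            "worm", "xsnoop", "xlock", "snmpguess"],
    "U2R": ["buffer_overflow", "httptunnel", "rootkit", "perl", "ps", "xterm",
            "sqlattack", "loadmodule"],
}

# Invert the table: attack name -> category.
CATEGORY_OF = {
    name: category
    for category, names in ATTACKS_BY_CATEGORY.items()
    for name in names
}


def to_category(label):
    """Map a raw label to Normal, Probe, DoS, R2L, or U2R."""
    label = label.lower()
    return "Normal" if label == "normal" else CATEGORY_OF[label]


def is_attack(label):
    """Binary view used for anomaly detection: anything but normal is an attack."""
    return to_category(label) != "Normal"


print(to_category("Neptune"), is_attack("normal"))
```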
Threshold | Number of Features | Accuracy (%) |
---|---|---|
0 | 41 | 99.12 |
0.001 | 33 | 99.10 |
0.002 | 31 | 99.10 |
0.003 | 30 | 99.07 |
0.004 | 29 | 99.07 |
0.005 | 28 | 99.10 |
0.006 | 25 | 99.13 |
0.007 | 23 | 99.18 |
0.009 | 21 | 99.20 |
0.013 | 20 | 99.07 |
0.019 | 17 | 99.06 |
0.021 | 16 | 99.03 |
0.024 | 15 | 98.80 |
0.029 | 14 | 98.69 |
0.031 | 13 | 98.72 |
0.036 | 11 | 98.67 |
0.037 | 10 | 98.55 |
0.044 | 9 | 98.37 |
0.047 | 7 | 98.34 |
0.048 | 6 | 98.40 |
0.049 | 4 | 95.05 |
0.061 | 3 | 94.17 |
0.087 | 2 | 92.88 |
0.154 | 1 | 88.55 |
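The table above pairs each importance threshold with the number of features that survive it. One plausible reading, sketched below as an assumption, is that importances are normalized to sum to 1 and a feature is kept when its importance is at least the threshold. The function `select_features` and the toy importance scores are hypothetical; in practice the scores would come from a fitted LightGBM model, for example the `feature_importances_` attribute of `LGBMClassifier`.

```python
def select_features(importances, threshold):
    """Keep features whose normalized importance is at least `threshold`."""
    total = sum(importances.values())
    return [
        name
        for name, score in importances.items()
        if score / total >= threshold
    ]


# Toy importance scores (hypothetical, not taken from the paper).
importances = {
    "src_bytes": 40,
    "dst_bytes": 25,
    "service": 20,
    "flag": 10,
    "duration": 5,
    "land": 0,
}

for threshold in (0.0, 0.15, 0.3):
    kept = select_features(importances, threshold)
    print(threshold, len(kept), kept)
```

With a threshold of 0 every feature is kept, matching the first row of the table (41 features), and raising the threshold removes the least important features first. Each candidate subset would then be scored by retraining the classifier, which is how an accuracy column like the one above could be produced.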
ID | Data Type | Feature |
---|---|---|
F1 | Continuous | duration |
F2 | Symbolic | protocol_type |
F3 | Symbolic | service |
F4 | Symbolic | flag |
F5 | Continuous | src_bytes |
F6 | Continuous | dst_bytes |
F10 | Continuous | hot |
F12 | Binary | logged_in |
F23 | Continuous | count |
F24 | Continuous | srv_count |
F30 | Continuous | diff_srv_rate |
F32 | Continuous | dst_host_count |
F33 | Continuous | dst_host_srv_count |
F34 | Continuous | dst_host_same_srv_rate |
F35 | Continuous | dst_host_diff_srv_rate |
F36 | Continuous | dst_host_same_src_port_rate |
F37 | Continuous | dst_host_srv_diff_host_rate |
F38 | Continuous | dst_host_serror_rate |
F39 | Continuous | dst_host_srv_serror_rate |
F40 | Continuous | dst_host_rerror_rate |
F41 | Continuous | dst_host_srv_rerror_rate |
Actual \ Predicted | Normal | Attack |
---|---|---|
Normal | TP | FN |
Attack | FP | TN |
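Under the convention of the confusion matrix above, where normal traffic is the positive class, the four reported metrics would follow the standard definitions below. These are the textbook formulas, restated here as an assumption because the section's own equations are not reproduced.

```latex
\begin{align}
\mathrm{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN}, \\
\mathrm{Precision} &= \frac{TP}{TP + FP}, \\
\mathrm{Recall}    &= \frac{TP}{TP + FN}, \\
\mathrm{F1}        &= \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.
\end{align}
```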
Model | Accuracy/% | Precision/% | Recall/% | F1-Score/% |
---|---|---|---|---|
AE | 89.82 | 91.81 | 90.16 | 90.98 |
VAE | 84.28 | 84.92 | 88.01 | 86.43 |
DAE | 84.30 | 85.22 | 87.63 | 86.41 |
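As a quick consistency check on the table above, the F1-score can be recomputed as the harmonic mean of the reported precision and recall. The sketch below uses only the values from the table; the helper name `f1_from` is hypothetical.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) per model, copied from the table.
reported = {
    "AE": (91.81, 90.16, 90.98),
    "VAE": (84.92, 88.01, 86.43),
    "DAE": (85.22, 87.63, 86.41),
}

recomputed = {
    model: round(f1_from(p, r), 2) for model, (p, r, _) in reported.items()
}
for model, (_, _, f1) in reported.items():
    print(model, "reported:", f1, "recomputed:", recomputed[model])
```

Because precision and recall are themselves rounded to two decimals, the recomputed value may differ from the reported one in the last digit.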
Features | VAE | DAE | DT | RF | KNN | GBDT | XGBoost | Proposed Model (AE) |
---|---|---|---|---|---|---|---|---|
122 | 82.56 | 82.79 | 78.72 | 70.10 | 76.25 | 78.31 | 75.85 | 87.48 |
102 | 84.28 | 84.30 | 80.09 | 71.06 | 76.51 | 78.34 | 78.61 | 89.82 |
HL1 | HL2 | HL3 | HL4 | Accuracy, All Features (%) | Accuracy, 21 Features (%) |
---|---|---|---|---|---|
64 | - | - | - | 85.33 | 87.07 |
64 | 32 | - | - | 82.37 | 83.89 |
64 | 32 | 16 | - | 84.33 | 86.40 |
64 | 32 | 8 | - | 83.98 | 84.07 |
64 | 32 | 16 | 8 | 81.06 | 83.64 |
64 | 16 | - | - | 84.75 | 85.31 |
64 | 8 | - | - | 80.22 | 81.03 |
48 | - | - | - | 79.25 | 82.45 |
48 | 32 | - | - | 81.32 | 83.21 |
48 | 32 | 16 | - | 87.48 | 89.82 |
48 | 32 | 16 | 8 | 80.75 | 82.61 |
32 | - | - | - | 82.65 | 83.49 |
32 | 16 | - | - | 82.39 | 83.51 |
32 | 8 | - | - | 81.57 | 82.69 |
32 | 16 | 8 | - | 79.58 | 81.31 |
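To make the best-scoring configuration in the table above (hidden layers of 48, 32, and 16 units) concrete, the sketch below builds an untrained autoencoder with those sizes and runs a forward pass. Several points are assumptions rather than details confirmed by the table: HL1 to HL3 are read as encoder layers, the decoder is assumed to mirror the encoder, ReLU and sigmoid activations are chosen for illustration, and the 102-dimensional input corresponds to the reduced feature set after one-hot encoding.

```python
import numpy as np


def init_autoencoder(input_dim, hidden=(48, 32, 16), seed=0):
    """Create random weights for an encoder and a mirrored decoder (assumption)."""
    rng = np.random.default_rng(seed)
    sizes = [input_dim, *hidden, *reversed(hidden[:-1]), input_dim]
    return [
        (rng.normal(0.0, 0.1, size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def forward(params, x):
    """ReLU on hidden layers, sigmoid on the output so values stay in (0, 1)."""
    h = x
    for i, (w, b) in enumerate(params):
        z = h @ w + b
        is_last = i == len(params) - 1
        h = 1.0 / (1.0 + np.exp(-z)) if is_last else np.maximum(z, 0.0)
    return h


params = init_autoencoder(102)
x = np.random.default_rng(1).random((4, 102))  # four random samples in [0, 1)
x_hat = forward(params, x)
errors = np.mean((x - x_hat) ** 2, axis=1)  # per-sample reconstruction error
print([w.shape for w, _ in params], errors.shape)
```

The weights here are random, so the reconstruction errors carry no meaning. The sketch only shows the layer shapes and where the per-sample error used for thresholding would come from once the network is trained.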
Work | Classifier | ACC (%) |
---|---|---|
Khraisat et al. [13] | C5.0 decision tree + one-class SVM | 83.24 |
Yang et al. [16] | DNN | 89.00 |
Yin et al. [17] | RNN | 83.28 |
Ieracitano et al. [20] | AE + softmax | 84.21 |
Proposed AE model | AE | 89.82 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Tang, C.; Luktarhan, N.; Zhao, Y. An Efficient Intrusion Detection Method Based on LightGBM and Autoencoder. Symmetry 2020, 12, 1458. https://doi.org/10.3390/sym12091458