Abstract
Voice phishing, or vishing, is a fraudulent phone call in which an attacker lures recipients into providing their personal information. Damage from vishing is a serious problem worldwide and is increasing in frequency. This study therefore aims to detect vishing in real time. Because research on spam detection in low-resource languages is lacking, we detect vishing in the Korean language using basic machine-learning models. We collected data from actual vishing cases and converted the voice files into text so that spam could be detected using natural language processing techniques. The focus is on determining whether vishing can be detected rapidly, rather than on model development. The results suggest that machine-learning models can detect vishing in real time and require only a short training time.



Data availability statement
The datasets collected for the current study are available at https://anonymous.4open.science/r/vishing-1AF8. Additional information used in this study can be obtained from the corresponding author upon reasonable request.
Acknowledgements
This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No. IITP-2021-0-00358, AI big data-based cyber security orchestration, and automated response technology development). This research was also supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2021R1A4A3022102).
Author information
Authors and Affiliations
Contributions
ML and EP designed the study. ML collected and analyzed the data. EP presented the results. ML and EP wrote and revised the manuscript. All authors reviewed the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors have no conflicts or competing interests to declare.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A. Data analysis
Table 6 lists the 100 most frequently used words in the spam and nonspam text content, respectively. The nonspam cases mostly contained everyday words, such as us, movies, people, and me, whereas the spam cases contained words such as loan, investigation, bank accounts, bank, illegality, and victims (given in bold). Therefore, understanding the meaning of these words is important in vishing detection.
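As an illustration of this kind of per-class word-frequency analysis, the following minimal Python sketch counts the most frequent tokens in two small groups of transcripts. It is not the authors' code: the function name top_words, the whitespace tokenization, and the toy English transcripts are assumptions made for the example. Korean text would normally require a morphological analyzer rather than a simple split on whitespace.

```python
from collections import Counter


def top_words(texts, n=100):
    """Return the n most frequent whitespace-separated tokens in texts."""
    counts = Counter()
    for text in texts:
        # Assumption: lowercase and split on whitespace; real Korean
        # transcripts would need morphological analysis instead.
        counts.update(text.lower().split())
    return counts.most_common(n)


# Hypothetical toy transcripts, not taken from the study's dataset.
spam = [
    "your bank account is under investigation",
    "the loan needs your bank account",
]
nonspam = [
    "let us watch movies tonight",
    "people told me about the movies",
]

spam_top = top_words(spam, n=3)
nonspam_top = top_words(nonspam, n=3)

print("spam:", spam_top)
print("nonspam:", nonspam_top)
```

Comparing the two ranked lists side by side is one simple way to see which words are characteristic of each class.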
Appendix B. Speech-to-text tool examples
We converted the collected .mp3 files of voice phishing speech into text. Table 7 shows the actual voice scripts alongside the results obtained with the Google speech-to-text API and the Naver Clova Speech speech-to-text conversion tool.
Cite this article
Lee, M., Park, E. Real-time Korean voice phishing detection based on machine learning approaches. J Ambient Intell Human Comput 14, 8173–8184 (2023). https://doi.org/10.1007/s12652-021-03587-x
DOI: https://doi.org/10.1007/s12652-021-03587-x