Improving detection accuracy of politically motivated cyber-hate using heterogeneous stacked ensemble (HSE) approach

Mullah, Nanlir Sallau; Zainon, Wan Mohd Nazmee Wan

doi:10.1007/s12652-022-03763-7

Original Research
Published: 04 March 2022

Volume 14, pages 12179–12190, (2023)
Journal of Ambient Intelligence and Humanized Computing

Abstract

The surge in cyber-hate crimes is largely fuelled by the popularization of social media (SM) platforms. Cyber-hate has consequently become a growing concern for most countries, especially democracies. With the rising trend of SM politics, the influence of SM on political discourse has become an important research area, and it has become necessary to address the problem using automated social intelligence. To this end, we built a novel heterogeneous stacked ensemble (HSE) classifier for detecting politically motivated cyber-hate on Twitter. The ensemble stacks eight baseline estimators and uses TF-IDF for feature vectorisation. We used the Twitter API to harvest tweets during a gubernatorial election in Nigeria for training and evaluating the stacked ensemble model. A total of 15,502 tweets were collected and, after preliminary cleaning, 5,876 tweets were manually labelled as hate (1) or non-hate (0). The labelled tweets comprise 16.87% hate and 83.13% non-hate tweets. This article makes three contributions: a critical review of the literature on detecting politically motivated cyber-hate, a new dataset, and the proposed stacked ensemble method. Two other public datasets (Kaggle and HASOC) were used to test the performance of our method, with the F1-score as the comparison metric. Our method is better by 12% on the Kaggle dataset and 4% on the HASOC dataset. We are working on more data for deep learning experiments.
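The pipeline described in the abstract (TF-IDF features feeding a stack of different base learners, scored with F1) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, uses three hypothetical base estimators rather than the paper's eight (which the abstract does not name), picks logistic regression as the meta-learner as an assumption, and runs on an invented toy corpus rather than the Nigerian election tweets.

```python
from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Invented placeholder corpus (NOT the paper's data): 1 = hate, 0 = non-hate.
texts = [
    "threat attack opponent supporters",
    "insult abuse candidate tribe",
    "attack insult voters region",
    "abuse threat party members",
    "insult attack governor tribe",
    "threat abuse opponent region",
    "peaceful rally candidate speech",
    "vote registration polling unit",
    "debate policy education health",
    "results announced polling peaceful",
    "campaign speech policy jobs",
    "voters queue registration debate",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Heterogeneous base learners: different model families (assumed choice).
base_estimators = [
    ("nb", MultinomialNB()),
    ("svm", LinearSVC()),
    ("tree", DecisionTreeClassifier(random_state=0)),
]

# The meta-learner is trained on out-of-fold predictions of the base learners.
model = make_pipeline(
    TfidfVectorizer(),
    StackingClassifier(
        estimators=base_estimators,
        final_estimator=LogisticRegression(),
        cv=2,  # small fold count only because the toy corpus is tiny
    ),
)

model.fit(texts, labels)
predictions = model.predict(texts)

# F1 on the training texts, shown only to demonstrate the metric call;
# a real evaluation would use held-out data.
train_f1 = f1_score(labels, predictions)
print(f"F1 on toy training data: {train_f1:.2f}")
```

F1 is a natural metric here because the labelled data is imbalanced (16.87% hate): a classifier that always predicts non-hate would reach about 83% accuracy while scoring an F1 of zero on the hate class.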

Additional information

Correspondence and requests for the dataset and any material should be addressed to MN.

Funding

No funds, grants, or other support was received.

Author information

Authors and Affiliations

School of Computer Sciences, Universiti Sains Malaysia, 11800, Penang, Malaysia
Nanlir Sallau Mullah & Wan Mohd Nazmee Wan Zainon
Federal College of Education Pankshin, PMB1027, Pankshin, Plateau State, Nigeria
Nanlir Sallau Mullah

Corresponding author

Correspondence to Nanlir Sallau Mullah.

Ethics declarations

Conflict of interest

The authors declare no competing interests relevant to this article.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mullah, N.S., Zainon, W.M.N.W. Improving detection accuracy of politically motivated cyber-hate using heterogeneous stacked ensemble (HSE) approach. J Ambient Intell Human Comput 14, 12179–12190 (2023). https://doi.org/10.1007/s12652-022-03763-7

Received: 08 July 2021
Accepted: 09 February 2022
Published: 04 March 2022
Issue Date: September 2023
DOI: https://doi.org/10.1007/s12652-022-03763-7
