
Multimedia Systems
https://doi.org/10.1007/s00530-020-00701-5

SPECIAL ISSUE PAPER

Cyberbullying detection solutions based on deep learning architectures

Celestine Iwendi (BCC of Central South University of Forestry and Technology, Changsha, China) · Gautam Srivastava (Brandon University, Brandon, MB, Canada; China Medical University, Taichung, Taiwan, ROC) · Suleman Khan (Air University Islamabad, Islamabad, Pakistan) · Praveen Kumar Reddy Maddikunta (Vellore Institute of Technology, Vellore, Tamil Nadu, India)

Received: 1 July 2020 / Accepted: 24 September 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract
Cyberbullying is a disturbing and troubling form of online misconduct. It appears in various forms and is usually textual in most social networks. Intelligent systems are necessary for the automated detection of these incidents. Some recent experiments have tackled this issue with traditional machine learning models, and most of those models have been applied to one social network at a time. The latest research has seen different models based on deep learning algorithms make an impact on the detection of cyberbullying. These detection mechanisms have resulted in the efficient identification of incidents, while others have the limitations of standard identification versions. This paper performs an empirical analysis to determine the effectiveness and performance of deep learning algorithms in detecting insults in social commentary. Four deep learning models were used for the experimental results, namely: Bidirectional Long Short-Term Memory (BLSTM), Gated Recurrent Units (GRU), Long Short-Term Memory (LSTM), and Recurrent Neural Network (RNN). Data pre-processing steps were followed that included text cleaning, tokenization, stemming, lemmatization, and removal of stop words. After performing data pre-processing, the clean textual data is passed to the deep learning algorithms for prediction. The results show that the BLSTM model achieved higher accuracy and F1-measure scores than RNN, LSTM, and GRU. Our in-depth results show which deep learning models can be most effective against cyberbullying when directly compared with others, and pave the way for future hybrid technologies that may be employed to combat this serious online issue.

Keywords Cyberbullying · Social media · Deep learning · NLP · Mining · Emotions

1 Introduction

The development of information and networking technology has created open online communication channels. Unfortunately, trolls have exploited this technology for cyber-attacks and threats. Statistics show that about 18% of Europe's children were affected by people bullying or harassing them via the Internet and mobile communication. The EU Kids Online Report of 2014 stated that nearly 20% of kids between the ages of 11 and 16 are vulnerable to cyberbullying [19]. Quantitative research [29] indicates cyber-victimization rates among adolescents ranging from 20 to 40%. All of this shows how important it is to find an adequate, speedy, and tested approach to solving this online pandemic.

There is a need to consider and tackle cyber-bullying from various viewpoints, including automatic detection and avoidance of these incidents. There are methods already developed that can mark instances of bullying, including the engagement of [9] services that seek to help the victims

[32]. Besides, most online channels that are widely used by adolescents have safety centers, such as the YouTube Safety Center and Twitter Safety and Protection, which offer user assistance and track communications.

A jet-age transformation of cyberbullying is now in force and has become very common, with students seeing it as fun on cyberspace to harass their friends and enemies. The authors in [22] discussed the metamorphosis in the domain of electronics and how the influence has become negative, creating problems for the world. Furthermore, their research follows an extension of an original study intended to evaluate by experimental procedures the nature and extent of cyberbullying. They aim to add to the fact that there can be a systemic stoppage of communication by intersecting between communication and computers, hence providing a backdrop against which there can be continuity of their research.

In cyberspace, cyber-bullying takes place through several mediums, mainly on social media, which youth and adults access almost all the time. A cyber-bullying research group surveyed high school students between July and October 2016, and the findings show that 34% of students had encountered cyber-bullying in their lifetime [3]. Given this, automatic and valid identification of cyber-bullying is necessary to resolve such issues. Researchers have suggested that cyber-bullying comes in various forms, such as embarrassment, stalking, coercion, exploitation, or domination of a designated victim [10]. All types can be summarized in text format where the words are explicit or implied. Explicit expressions arise from using profane words with a negative emotion, whereas implicit expressions may come as ironic or cynical phrases that have no foul words. There has been a lot of research on explicit speech identification. Still, a lot of work is needed to handle the implicit language that makes detecting cyber-bullying on social media a challenging job.

Nevertheless, progress has been made in the detection of cyber-bullying using machine learning (ML) and deep learning (DL) approaches. However, much of the current work needs to be further developed to provide a reliable approach that takes clear and indirect factors into account. Analytical research aimed at evaluating the output of DL algorithms is discussed in this paper.

The contributions of this paper include:

1. Deep Bidirectional Long Short-Term Memory (BLSTM) is used for prediction. We also used linguistic methods to evaluate our results, using parts of speech to see which type of pattern is followed by normal and abusive tweets.
2. Text cleaning, tokenization, stemming, lemmatization, and removal of stop words are performed as pre-processing steps.
3. Application of four deep learning models for the experimental results, namely: Bidirectional Long Short-Term Memory (BLSTM), Gated Recurrent Units (GRU), Long Short-Term Memory (LSTM) and Recurrent Neural Network (RNN).
4. Carried out an empirical analysis to determine the effectiveness and performance of deep learning algorithms in detecting insults in social commentary.
5. Finally, a comparison between all the DL algorithms used. Our result shows that the BLSTM model achieved high accuracy and F1-measure scores in contrast to RNN, LSTM and GRU in detecting insults in social commentary.

The rest of the paper is organized as follows: Sect. 2 is about related work, the proposed methodology is discussed in Sect. 3, Sect. 4 discusses experimental results and finally, Sect. 5 concludes the paper.

2 Related work

In recent times, there has been an increase in online activities by teens, especially on social networking sites, which have invariably exposed them to cyber-bullying. Comments containing abusive words affect the psychology of teens, demoralize them, and may lead them to depression or suicide. Two rules for feature extraction that were used to detect perceived negative and offensive comments often directed towards peers were presented by the authors of [7]. Their combining of hand-crafted features with traditional feature extraction tends to increase the detection accuracy of the system. Although current methods inspired by deep learning and machine learning have enhanced the accuracy of cyber-bullying detection, the fact remains that the lack of good standard labeled datasets limits the advancement of this approach. Therefore, the authors of [8] proposed a system where crowdsourcing is applied to label a dataset of user comments, capturing the real-time scenario of deliberately abusive words or kind words.

The authors in [28] argue that cybersecurity experts should not be pushed away with the advent of AI-controlled cyberbullying detection. They must be allowed to continue doing their job and testing networks, just as doctors are still allowed to read the results of X-ray scans in situations where human intelligence is needed to control the artificial intelligence. This is the goal of the next generation of artificial intelligence, known as AI 2.0. The idea of using soft computing methods for the detection of cyberbullying, especially on social media platforms, was studied by [18]. They compared their study with previous studies and came up with the idea that social media platforms should use a meta-analytic method in tackling cyberbullying detection. A method for identifying and classifying personalized text-based cyberstalking was created by [13]. It was an ethical


framework and a way to detect text-based cyberstalking. They went further to focus on using other initiatives, such as digital forensics, to perform author identification.

The authors in [31] presented a multi-stage, multi-technique system that first uses crowdsourcing for post and hashtag annotation and subsequently uses machine-learning methods to identify extra posts for annotation. This is proper research against which we can compare our latest results. They concluded that you could have an excellent performance of models if the dataset is trained with their approach. Meanwhile, according to [21, 26], results from multivariate logistic regression techniques show that physical violence is associated with peers who smoke or with the feature of carrying weapons before the cyberbullying enactment. They highlight the importance of ensuring positive and reliable monitoring by parents, teachers, or peers to improve cyberbullying prevention efforts. The author in [35] proposed a method that is capable of analyzing the hidden feature structure of cyberbullying evidence and acquiring a vigorous and discriminative depiction of text. Their results and approaches perform better than other baseline text depiction methods.

The authors in [25] used 22 studies and experiments to validate current practices in automatic cyberbullying detection. They ended with results indicating that cyberbullying is frequently distorted, creating the assumption that it is not a big deal after all. With the imbalance of datasets, it is difficult to have an actual practical impartation of the consequences of cyberbullying. Rosa et al. used two data sets for their studies, which are the intimidation trace and Formspring databases. To build the prediction models, they implemented Support Vector Machines (SVM), Random Forest, and Logistic Regression. The F1 score was used to test the outcomes of the experiments, and the embedding achieves an F1 score of 0.45. This same approach was considered by [27]; the only difference is that their response grading system assessed the ruthlessness of the cyberbullying and gave suitable responses.

Rakib et al. first developed a word embedding application using Reddit, followed by a cyberbullying identification model using the Kaggle dataset comprising 6594 comments. A Random Forest model was used to train the system. The prediction model obtained 0.90 Area Under the Curve (AUC) and 0.89 Precision. The drawback of this analysis, however, is that the sample is imbalanced, with cyberbullying texts consisting of only 25% [24]. The authors of [1, 11] repeated another related study with three real-world datasets, namely: Formspring, Twitter, and Wikipedia. They applied deep learning algorithms to construct the prediction models after the datasets were extracted with three word embedding structures, including random vector initialization, GloVe, and Sentiment-Specific Word Embedding (SSWE). They used over-sampling methods, and a limitation of this application and reproducibility study comes from integrating other sources of information and from the impact of gaining access to the social media profile.

Haidar et al. suggested a multilingual cyberbullying prevention system [14]. The authors strive to avoid cyberbullying attacks in the Arabic language. The experiment was performed on a real-time Arabic dataset from Arab countries. During this cycle, the authors used Dataiku DSS and WEKA, supporting Arabic. Naive Bayes and SVM classifiers are used for prediction, achieving satisfactory results. Nevertheless, this research can be expanded by considering deep learning and increasing the dataset size. In [2], the researchers implemented a cyberbullying strategy by collecting 20,000 random tweets. Data pre-processing was applied to remove noisy and unwanted data. The pre-processed data is divided into training and testing data. For training data, tweet classification was provided to mark tweets. Later, deep convolutional neural networks were used to classify the dataset, but no encouraging experimental results were achieved. The research must be expanded by considering a large dataset and several languages. Similarly, the authors in [4] used deep convolutional neural networks by considering a Twitter dataset of 69,874 tweets. Tweets were mapped to vectors through GloVe's open-source word embedding. The experimental results showed that with deep convolutional neural networks, the authors achieved 93.7% accuracy. However, detecting cyberbullying in chats containing Hindi and English together can further expand the research.

The Wiki-Detox dataset was the main point of the research of [33]. They presented a classifier that can generate a result roughly as good as 3 human workers in total, as calculated by the ROC curve and Spearman correlation. In terms of model construction, they looked at three dimensions: architecture model (Logistic Regression vs Multi-Layer Perceptron), n-gram type (word vs char) and label type (one-hot vs empirical distribution). They then answer questions on identifying harassment using their classifier.

We used the work of [6] as a practical application based on Turkish content, since the detection of cyberbullying there has been ignored. The authors designed eight different artificial neural network models that detected cyberbullying in Turkish social media. According to the evaluation results, they had a 91% F1-measure score and a better performance than the machine learning classifiers experimented with in their previous study. Another similar scenario is the work done by [23]. Their paper presented a solution for detecting and stopping cyberbullying with a focus on content written in the Arabic language. Finally, deep learning was optimized with the algorithm, generating good parameter tuning. The authors in [17] used another scenario and proved the inefficiency of previous classification methods. The only limitation which we are now considering in our research is the lack of regressive training


of the system that makes sure that cyberbullying is detected in real-time chats. It would also create another channel where cyberbullying can be detected in chats that contain a mixture of words from different languages. We can conclude, after considering the stipulated approaches, that our approach will solve most of the limitations of past and present research in the detection of words used to harass and intimidate others on social media.

3 Methodology

The proposed methodology is depicted in Fig. 1. The steps applied in the methodology are listed below:

– Load the dataset from the Kaggle repository.
– Perform pre-processing of the dataset by doing text cleaning, tokenization, stemming, lemmatization and stop word removal.
– After cleaning the text, apply some linguistic approaches to analyze the pattern of bad comments.
– Then split the dataset into training and testing data.
– Train the dataset with various deep learning algorithms.
– Evaluate the performance of the deep learning algorithms by using the testing dataset with several metrics.

Fig. 1 Proposed model

3.1 Dataset

In this research, a Kaggle dataset has been used [5] for the detection of insults over social media platforms. The dataset used in this research will help us to solve a single-class classification problem. The label is either 0, which means a neutral statement, or 1, which means an offensive comment. In other words, a comment is labeled neutral whenever it does not belong to the class of insults. The first attribute to consider is the date the comment was made. This is often blank, which means it is not possible to get an exact timestamp. The material is based mainly on commentary in the English language, with some editing on different occasions.
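To make this step concrete, the following is a minimal sketch of loading and splitting the data, assuming the Kaggle insults data ships as a CSV with Date, Comment, and a 0/1 Insult label (the file and column names are our assumption for illustration, not confirmed by the paper):

import pandas as pd
from sklearn.model_selection import train_test_split

# Load the Kaggle "insults in social commentary" data.
# File name and column names here are assumed for illustration.
df = pd.read_csv("train.csv")

texts = df["Comment"].astype(str)   # raw user comments
labels = df["Insult"]               # 0 = neutral, 1 = insult

# Hold out a test split for the evaluation metrics reported later.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)
print(len(X_train), "training comments,", len(X_test), "test comments")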
length with an n-gram word-based tokenizer.
– Train the dataset by various deep learning algorithms.
– Evaluate the performance of the deep learning algorithms
by using the testing dataset with several metrics. 3.2.2 Tokenization

3.1 Dataset Tokenization has been used in this process to address a sce-


nario where a given text will be separated into smaller bits
In this research, the Kaggle dataset has been used [5] for the known as tokens. The following are also regarded as tokens.
detection of insult over social media platforms. The dataset They include Words, numbers, and punctuation marks. In
used in this research will help us to solve the problem of the addition, another non-sensitive equivalent element replaced
classification of a single class. The label is either 0, which by a sensitive data element with no meaning or value. We
means a neutral statement or one which means an offen- assured that the tokenization method used was protected and
sive comment. In order words, we are using it as neutral tested using the best standards relevant to the safety of con-
irrespective that it does not belong to the class of insults. fidential data. The tokenization framework methodology we
The first attribute to consider is the date the comment was have used offers authority and APIs for obtaining tokens for

Fig. 1  Proposed model


data processing applications where necessary, and tokens can be detokenized back to sensitive data.

3.2.3 Stemming

The next step, after we had gone through the tokenization system, is to transform the tokens into another standard format. Stemming simply means that we change the words back to their root form, with a decrease in the number of word types and/or classes in the data. For example, the words "Running," "Ran," and "Runner" were reduced to the word "run". This shows that stemming can actually be used to aid classification.

3.2.4 Lemmatization

Like stemming, the purpose of lemmatization is to reduce inflectional forms to a specific base form. Unlike stemming, lemmatization does not simply break off inflections; it uses lexical information to arrive at the correct base forms of vocabulary.

3.2.5 Stopwords

Insignificant words in a language are capable of creating noise when used as features in text classification. Such terms are called stop words. We can see them used in sentences to assist in connecting our thinking while helping with the way the sentences are constructed. Articles, prepositions, conjunctions, and some pronouns, for example, are considered stop words. Our method extracted common terms from the records, such as "a, an, are, at, by, for, from, how, in, is, like, on, or, the, these, this, too, was, when, where," etc. Afterward, we store the processed documents, prepared for the next step.
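The full pre-processing chain described in Sects. 3.2.1-3.2.5 can be sketched as follows. This is a minimal illustration using NLTK; the choice of library is our assumption, as the paper does not name the NLP toolkit it used:

import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")

stop_words = set(stopwords.words("english"))
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

def preprocess(text):
    # Text cleaning: lower-case and drop punctuation/non-letter characters.
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    # Tokenization: split the cleaned text into word tokens.
    tokens = nltk.word_tokenize(text)
    # Stop word removal: drop connective words that carry little signal.
    tokens = [t for t in tokens if t not in stop_words]
    # Stemming followed by lemmatization, reducing words to base forms.
    return [lemmatizer.lemmatize(stemmer.stem(t)) for t in tokens]

print(preprocess("The runners were RUNNING and ran again!"))
# -> ['runner', 'run', 'ran']; note the Porter stemmer does not
#    merge the irregular past tense "ran" into "run".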
3.3 Application of long short term memory (LSTM)

RNN, which we know as a class of artificial neural networks with connections between nodes, has some setbacks due to the excess number of network layers. This was a tough challenge, and recent study shows that the LSTM network is an answer to it, due to its chain structure of multiple repeating neural network modules, similar to an RNN. Figure 2 represents the LSTM architecture, consisting of various gates: the 1st gate is the input gate, the 2nd gate is the output gate, and there is also a forget gate used inside the LSTM model. We have used these gates to select how information is accepted and rejected across the network.

Fig. 2 Long short term memory architecture

The input gate i_t, whose tanh activation function ranges from −1 to 1, combines the current input x_t with the attributes h_{t−1} and C_{t−1}. Inside the forget gate f_t, tanh and sigmoid functions are used as activation functions. It is interesting to note here that the forget gate decides how much information to retain when it receives information from the previous output. For example, when the value is 1, the data is transferred through the network; but if it is 0, the data will not be allowed to pass through the network. The output gate o_t also has a sigmoid activation function with a range of −1 to 1. At any time mark t, the gates i_t, o_t and f_t are computed by applying Eqs. (1), (2), and (3), respectively:

i_t = σ(W_i · [C_{t−1}, h_{t−1}, x_t] + b_i),    (1)

o_t = σ(W_o · [C_{t−1}, h_{t−1}, x_t] + b_o),    (2)

f_t = σ(W_f · [C_{t−1}, h_{t−1}, x_t] + b_f).    (3)

What we have used is different from the usual traditional LSTM: it works bi-directionally. In this research we have used two LSTMs: one LSTM for upward and downward scanning, and the other LSTM for right-to-left scanning. We have imputed the second LSTM as a summation of the first LSTM. Therefore, our proposed LSTM utilizes double input gates, output gates and forget gates in comparison to the traditional LSTM. This achievement gives better accuracy; however, our proposed model has more computational complexity and cost in terms of performance.


3.4 Bidirectional long short term memory (BLSTM)

The Bidirectional LSTM (BLSTM) model retains two separate forward and backward input states provided by two different LSTMs. The first LSTM processes a regular sequence starting from the beginning of the paragraph, while in the second LSTM, a standard sequence, the series of inputs is fed in the opposite order. The concept behind the bi-directional network is to gather knowledge about the inputs around each position. It typically learns more rapidly than a one-way approach, though this depends on the task. Figure 3 represents the structure of the BLSTM model used.

The first layer of the BLSTM model consists of 200 neurons, and the 2nd layer has 400 neurons. We have 3 dense layers: the 1st dense layer has 128 neurons, the 2nd dense layer has 64 neurons, and the 3rd dense layer has 32 neurons, respectively. We also used 3 dropout layers to avoid over-fitting.
cell state from the sigmoid function.
to avoid over-fitting.

3.4.1 Forget gate 3.4.2 Input gate

We have applied the Forget gate to be responsible for the The duty of the input gate is for the addition of cell state
way extraction is done with the cell-state information by information. This was done by first involving a sigmoid
multiplying a filter. This stage removes the information that function where it regulates which values are to be added to
we don’t need to make the LSTM understand things or less cell state.

Fig. 3  Structure of BLSTM


used


i_t = σ(W_i · [h_{t−1}, x_t] + b_i)    (5)

C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C)    (6)

The sigmoid gate is similar to the forget gate seen before, acting as a filter for all the information in [h_{t−1}, x_t]. The tanh function then generates a vector C̃_t that includes all possible values applicable to the cell state (as interpreted from h_{t−1} and x_t), with outputs ranging from −1 to +1. Finally, the value of the regulatory filter (the sigmoid gate) is multiplied with the vector generated by the tanh function, and this information is then added to the cell state through an addition operation.

3.4.3 Output gate

The output gate acts as the selection center: important cell state information is picked as output. A vector is created by applying the tanh function to the cell state, scaling the values down to the range −1 to +1.

o_t = σ(W_o · [h_{t−1}, x_t] + b_o)    (7)

h_t = o_t ∗ tanh(C_t)    (8)

We then used the values of h_{t−1} and x_t to make a filter, created using the sigmoid function, that controls the values to be extracted from the generated vector. Finally, the value of this regulatory filter is multiplied with the vector generated using the tanh function.
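Putting Eqs. (4)-(8) together, one LSTM time step can be written out in plain NumPy. This is an illustrative sketch of the standard cell; the cell-state update line, C_t = f_t ∗ C_{t−1} + i_t ∗ C̃_t, is the usual formulation implied by the prose above rather than a numbered equation in the paper:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM time step; W and b hold per-gate weights/biases."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate, Eq. (4)
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate,  Eq. (5)
    C_tilde = np.tanh(W["C"] @ z + b["C"])   # candidate,   Eq. (6)
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate, Eq. (7)
    C_t = f_t * C_prev + i_t * C_tilde       # standard cell-state update
    h_t = o_t * np.tanh(C_t)                 # Eq. (8)
    return h_t, C_t

# Toy dimensions: 4 hidden units, 3 input features.
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(4, 7)) for k in "fiCo"}
b = {k: np.zeros(4) for k in "fiCo"}
h, C = lstm_step(rng.normal(size=3), np.zeros(4), np.zeros(4), W, b)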
3.4.4 Recurrent neural networks

Due to the vanishing gradient problem, conventional Neural Networks (NN) do not provide a satisfactory result when implemented on time series data. In 1982, John Hopfield implemented the RNN to address the above-mentioned subject matter. Figure 4 represents the structure of the RNN model.

Fig. 4 Working model of RNN

RNN improves on the trend of using an NN to learn over a time frame. RNNs have been used to forecast serial data such as actions in a video based on past events, voice audio, text events, etc. Figure 4 demonstrates the operating configuration of the RNN. In the figure, D_t denotes the input word matrix, X_t stands for the hidden layer state, and E_t is the output layer vector; A, C and B are the corresponding input, recurrent and output weight matrices. The hidden layer at timestamp t is computed using Equation (9):

X_t = σ(A × D_t + C × X_{t−1}),    (9)

E_t = σ(B × X_t),    (10)

where σ(⋅) is considered to be the activation function; it may be Sigmoid, ReLU or Tanh. At any time mark t, the concealed (hidden) state X_t is calculated by using Equation (9) with the required parameters and inputs.

3.5 Gated recurrent unit (GRU)

We have applied the GRU as the latest variant of RNN, designed to deal with short-term memory problems similarly to LSTM. Note that the GRU does not have a cell state; it makes use of the hidden state to carry information. It consists of two gates: a reset gate r_t and an update gate z_t, represented by Eqs. (11), (12). The update gate performs functions similar to the forget gate and input gate of an LSTM, with the responsibility to choose which information should be dropped or included. The reset gate determines the amount of previous data to be forgotten. Since the GRU has fewer gates compared to the LSTM, it speeds up the training process.

z_t = σ(W_z · x_t + U_z · h_{t−1} + b_z),    (11)

r_t = σ(W_r · x_t + U_r · h_{t−1} + b_r),    (12)

where z_t denotes the update gate, σ(⋅) represents the sigmoid function, W, U and b are parameter matrices and vectors, h_t denotes the output vector, and x_t denotes the input vector.
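As with the LSTM above, Eqs. (11)-(12) can be made concrete with a small NumPy sketch. The candidate-state and interpolation lines (h̃_t and the final h_t) are the standard GRU update, which the paper does not write out explicitly, so treat them as our assumption of the usual formulation:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W, U, b):
    """One GRU time step for Eqs. (11)-(12) plus the standard update."""
    z_t = sigmoid(W["z"] @ x_t + U["z"] @ h_prev + b["z"])  # update gate, Eq. (11)
    r_t = sigmoid(W["r"] @ x_t + U["r"] @ h_prev + b["r"])  # reset gate,  Eq. (12)
    # Candidate state: the reset gate decides how much past data to forget.
    h_tilde = np.tanh(W["h"] @ x_t + U["h"] @ (r_t * h_prev) + b["h"])
    # The update gate interpolates between the old state and the candidate.
    return (1.0 - z_t) * h_prev + z_t * h_tilde

rng = np.random.default_rng(1)
W = {k: rng.normal(size=(4, 3)) for k in "zrh"}  # input weights
U = {k: rng.normal(size=(4, 4)) for k in "zrh"}  # recurrent weights
b = {k: np.zeros(4) for k in "zrh"}
h = gru_step(rng.normal(size=3), np.zeros(4), W, U, b)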
4 Experiment results

Experimentation results are presented in this section, following the ideas of the authors in [15, 16, 20]. The experimentation is carried out using "Google Colab", Google's online Graphical Processing Unit (GPU) environment. In this research we were equipped with Python 3.7 as our programming language and a good personal computer operating with a higher-capacity operating system and processor.

4.1 Performance metrics

This paper uses the performance metrics below for evaluating and projecting how our proposed BLSTM performed.

Confusion matrix: we have used the confusion matrix to evaluate the performance of a classification model. The following metrics are typical usages of the confusion matrix:

Accuracy = (True Positives + True Negatives) / Total Instances    (13)

Precision = True Positives / Instances Predicted True    (14)

Recall = True Positives / Actual Number of True Instances    (15)

F1-Measure = (2 × Precision × Recall) / (Precision + Recall)    (16)
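As a hedged illustration, these four metrics can be computed directly from a model's predictions with scikit-learn; the thresholding at 0.5 is an assumption for the sigmoid output of the sketched model:

from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

# Threshold sigmoid outputs at 0.5 to get hard 0/1 predictions.
y_pred = (model.predict(test_seq).ravel() >= 0.5).astype(int)

print("confusion matrix (rows: actual, cols: predicted):")
print(confusion_matrix(y_test, y_pred))
print("accuracy :", accuracy_score(y_test, y_pred))   # Eq. (13)
print("precision:", precision_score(y_test, y_pred))  # Eq. (14)
print("recall   :", recall_score(y_test, y_pred))     # Eq. (15)
print("F1       :", f1_score(y_test, y_pred))         # Eq. (16)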
The word cloud we have implemented is an image collection of cyberbullying words used in a precise text or subject in open online communication, in which the size of each word designates its occurrence or significance [12, 30, 34].

Fig. 5 Wordcloud for neutral words

Fig. 6 Frequency bargraph for neutral words
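Figures such as Fig. 5 and Fig. 8 can be reproduced with the wordcloud package; a minimal sketch (the library choice is our assumption, not stated in the paper) is:

import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Join all comments of one class (here, neutral) into one string;
# word size in the rendered image reflects word frequency.
neutral_text = " ".join(X_train[y_train == 0])
cloud = WordCloud(width=800, height=400,
                  background_color="white").generate(neutral_text)

plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()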

13
Cyberbullying detection solutions based on deep learning architectures

Therefore, from Figs. 5 and 6 we can see that the majority words used in neutral sentences are "people", "like", "just", "make", "now", "right", "can", "one", "think"; these are the most frequent words used in neutral sentences. Figure 7 depicts that the majority of words in neutral sentences are positive, with a count of 4232, while 3888 words are negative. 1831 words contain anger. Similarly, 1983 words contain fear, and 1013 words contain sadness. Anticipation, disgust, joy and trust words number 2174, 1488, 1783 and 2945 in the sentences, respectively.

From Figs. 8 and 9, we can see that the majority words used in bad sentences are "people", "shit", "think", "idiot", "life", "little", "bitch", "back", "dumb"; these are the most frequent words used in bad sentences. Figure 10 depicts that the majority of words in bad sentences are negative, with a count of 1930, while 947 words are positive. 703 words contain anger. Similarly, 632 words contain fear, and 704 words contain sadness. Anticipation, disgust, joy and trust words number 488, 943, 417 and 641 in the sentences, respectively (see Figs. 11, 12).

In this research, 4 deep learning models were used for the detection of cyberbullying. The results show that BLSTM outperformed the other deep learning models when we consider accuracy, Precision, Recall and F1-Measure. From Table 2 we can see that BLSTM testing accuracy is 82.18% while testing loss is 1.8 after 20 epochs. Similarly, GRU and RNN achieved 81.46% and 81.01% testing accuracy, respectively. Loss for the GRU and RNN models is 1.9 and 1.5, respectively. LSTM testing accuracy and loss scores are 80.86% and 2.1, respectively.

Precision, Recall and F1-Measure scores for the normal class using the BLSTM model are 86%, 91% and 88%, respectively, as seen in Fig. 11. Similarly, for the Insult class, Precision is 71%, Recall is 60% and F1-Measure is 65%. For GRU, the normal class Precision, Recall and F1-Measure scores are 86%, 89% and 87%, respectively; similarly, for the Insult class, the Precision, Recall and F1-Measure scores are 68%, 62% and 65%, respectively, as shown in Table 1 and Fig. 11. Classifier accuracy scores are given in Fig. 12. The LSTM and RNN models' Precision score for the normal class is 85% each. Similarly, the Recall and F1-Measure scores for both models are 90% and 87% each, respectively.

Fig. 7  Emotion mining neutral words


Fig. 8  Wordcloud for bad words

Table 1 Classification report for all models

Model   Labels   Precision   Recall   F1-Measure
BLSTM   Normal   86          91       88
        Insult   71          60       65
GRU     Normal   86          89       87
        Insult   68          62       65
LSTM    Normal   85          90       87
        Insult   68          80       63
RNN     Normal   85          90       87
        Insult   69          57       62

Table 2 Accuracy of all models

Model   Testing accuracy (%)
BLSTM   82.18
GRU     81.46
LSTM    80.86
RNN     81.01
Fig. 9  Frequency bargraph for bad words


Fig. 10  Emotion mining bad words

Fig. 12  Classifiers accuracy

Fig. 11 Classification report

Receiver Operating Characteristic (ROC) curves are shown in Fig. 13 for BLSTM and Fig. 14 for GRU. The GRU area under the curve (AUC) score is 74.72%, which BLSTM increases by about 2%: the BLSTM AUC score is 76.56%. Similarly, the AUC scores for LSTM and RNN are 73.30%, as shown in Figs. 15 and 16, respectively. The BLSTM AUC is 3% higher than LSTM and 5% higher than the RNN algorithm. BLSTM on this dataset thus outperforms the others in terms of Precision, Recall, F1-Measure and AUC.

The LSTM model for the insult class achieved 68% Precision, 80% Recall and 63% F1-Measure, respectively. Similarly, the Precision, Recall and F1-Measure scores for the RNN model's insult class are 69%, 57% and 62%, respectively. In this research, we found which types of most frequent words are used to insult someone over social media. We also found the different emotions inside the text, in both insulting text and normal text.

Fig. 13 BLSTM ROC curve

Fig. 16 RNN ROC curve

Table 3 Confusion matrix

Classifiers   TP     FP    FN    TN
BLSTM         1735   176   294   433
GRU           1699   212   277   450
LSTM          1712   199   306   421
RNN           1722   189   312   415
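As a consistency check (our arithmetic, not stated in the paper), the testing accuracies in Table 2 follow from Table 3 via Eq. (13). For BLSTM:

Accuracy = (TP + TN) / (TP + FP + FN + TN)
         = (1735 + 433) / (1735 + 176 + 294 + 433)
         = 2168 / 2638 ≈ 0.8218,

which matches the 82.18% reported in Table 2; the same calculation reproduces the GRU, LSTM and RNN rows as well.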

Fig. 14 GRU ROC curve
From Table 3, we can see that for Bidirectional Long Short Term Memory, 1735 tweets were detected correctly as non-abusing tweets, while 176 tweets were wrongly predicted as abusing tweets. For Long Short Term Memory, the true positive and true negative counts are 1712 and 421, respectively; similarly, the false positive and false negative counts for LSTM are 199 and 306, respectively. The true positive counts for Gated Recurrent Unit and Recurrent Neural Network are 1699 and 1722, respectively. Similarly, the true negative counts for GRU and RNN are 450 and 415, respectively. For GRU, the false positive and false negative counts are 212 and 277, respectively; similarly, the false positive and false negative counts for RNN are 189 and 312, respectively.

Fig. 15 LSTM ROC curve

5 Conclusion and future work

The advent of information and networking technology has created the good, the bad, and the ugly of online communication responses. These responses are often abusive and have caused irreparable emotional damage that most often


leads to depression and suicide in innocent individuals when they are unable to speak out to get help from different agencies or family members. Meanwhile, some researchers have previously addressed this issue with traditional machine learning models; however, most of the models built in these experiments can be applied to only one social network at a time. In this paper, our novel methodology is compared with the latest research using deep learning-based models for the detection of cyberbullying incidents. Our proposed LSTM utilizes doubled input gates, output gates, and forget gates in comparison to the traditional LSTM in use. This achievement gives better accuracy; however, our proposed model has more computational complexity and cost in terms of performance. This paper takes a closer look at resolving the limitations of previous studies, with better identification efficiency when directly compared to standard versions. Furthermore, our paper has performed an empirical analysis to determine the effectiveness and performance of deep learning algorithms in detecting insults in social commentary. Four deep learning models, namely RNN, LSTM, GRU, and BLSTM, are used in the experiments. Data pre-processing steps are applied that include text cleaning, tokenization, stemming, and lemmatization, as well as the removal of stop words, to keep harmful content in the communication chain from reaching gullible users. The clean textual data from the pre-processing step is then passed directly into the deep learning algorithms for prediction. We can conclude that the BLSTM model achieved high accuracy and F1-measure scores in comparison to RNN, LSTM, and GRU. In the future, we shall integrate our deep learning approach with automatic detection by tracking and an object identification mechanism, through the application of the next phase of artificial intelligence (AI 2.0) interfaces, to aid law enforcement agencies in the smart city in curbing the menace of cyberbullying.

References

1. Agrawal, S., Awekar, A.: Deep learning for detecting cyberbullying across multiple social media platforms. In: European Conference on Information Retrieval, pp. 141–153. Springer (2018)
2. Al-Ajlan, M.A., Ykhlef, M.: Optimized Twitter cyberbullying detection based on deep learning. In: 2018 21st Saudi Computer Society National Computer Conference (NCC), pp. 1–5. IEEE (2018)
3. Al-Hashedi, M., Soon, L.K., Goh, H.N.: Cyberbullying detection using deep learning and word embeddings: An empirical study. In: Proceedings of the 2019 2nd International Conference on Computational Intelligence and Intelligent Systems, pp. 17–21 (2019)
4. Banerjee, V., Telavane, J., Gaikwad, P., Vartak, P.: Detection of cyberbullying using deep neural network. In: 2019 5th International Conference on Advanced Computing & Communication Systems (ICACCS), pp. 604–607. IEEE (2019)
5. Bhaskaran, J., Kamath, A., Paul, S.: DISCo: Detecting insults in social commentary. Stanford CS 229 Repository (2017)
6. Bozyiğit, A., Utku, S., Nasiboğlu, E.: Cyberbullying detection by using artificial neural network models. In: 2019 4th International Conference on Computer Science and Engineering (UBMK), pp. 520–524. IEEE (2019)
7. Chavan, V.S., Shylaja, S.: Machine learning approach for detection of cyber-aggressive comments by peers on social media network. In: 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 2354–2358. IEEE (2015)
8. Chen, H., Mckeever, S., Delany, S.J.: Presenting a labelled dataset for real-time detection of abusive user posts. In: Proceedings of the International Conference on Web Intelligence, pp. 884–890 (2017)
9. Chen, Y., Zhou, Y., Zhu, S., Xu, H.: Detecting offensive language in social media to protect adolescent online safety. In: 2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Conference on Social Computing, pp. 71–80. IEEE (2012)
10. Chisholm, J.F.: Review of the status of cyberbullying and cyberbullying prevention. J. Inf. Syst. Educ. 25(1), 77 (2014)
11. Dadvar, M., Eckert, K.: Cyberbullying detection in social networks using deep learning based models; a reproducibility study. arXiv preprint arXiv:1812.08046 (2018)
12. Dwivedi, A.D., Malina, L., Dzurenda, P., Srivastava, G.: Optimized blockchain model for internet of things based healthcare applications. In: 2019 42nd International Conference on Telecommunications and Signal Processing (TSP), pp. 135–139 (2019)
13. Frommholz, I., Al-Khateeb, H.M., Potthast, M., Ghasem, Z., Shukla, M., Short, E.: On textual analysis and machine learning for cyberstalking detection. Datenbank-Spektrum 16(2), 127–135 (2016)
14. Haidar, B., Chamoun, M., Serhrouchni, A.: Multilingual cyberbullying detection system: Detecting cyberbullying in Arabic content. In: 2017 1st Cyber Security in Networking Conference (CSNet), pp. 1–8. IEEE (2017)
15. Iwendi, C., Jalil, Z., Javed, A.R., Reddy, T., Kaluri, R., Srivastava, G., Jo, O.: KeySplitWatermark: zero watermarking algorithm for software protection against cyber-attacks. IEEE Access 8, 72650–72660 (2020)
16. Javed, A.R., Sarwar, M.U., Khan, S., Iwendi, C., Mittal, M., Kumar, N.: Analyzing the effectiveness and contribution of each axis of tri-axial accelerometer sensor for accurate activity recognition. Sensors 20(8), 2216 (2020)
17. Jeyasheeli, P.G., Selva, J.J.: An IoT design for smart lighting in green buildings based on environmental factors. In: 2017 4th International Conference on Advanced Computing and Communication Systems (ICACCS), pp. 1–5. IEEE (2017)
18. Kumar, A., Sachdeva, N.: Cyberbullying detection on social multimedia using soft computing techniques: a meta-analysis. Multimed. Tools Appl. 78(17), 23973–24010 (2019)
19. Livingstone, S., Haddon, L., Hasebrink, U., Ólafsson, K., O'Neill, B., Smahel, D., Staksrud, E.: EU Kids Online: Findings, methods, recommendations. LSE, London, EU Kids Online. http://lsedesignunit.com/EUKidsOnline (2014). Accessed May 2020
20. Mittal, M., Iwendi, C., Khan, S., Rehman Javed, A.: Analysis of security and energy efficiency for shortest route discovery in low-energy adaptive clustering hierarchy protocol using Levenberg-Marquardt neural network and gated recurrent unit for intrusion detection system. Trans. Emerg. Telecommun. Technol. (2020). https://doi.org/10.1002/ett.3997
21. Paez, G.R.: Assessing predictors of cyberbullying perpetration among adolescents: the influence of individual factors, attachments, and prior victimization. Int. J. Bullying Prev. 2, 149–159 (2020). https://doi.org/10.1007/s42380-019-00025-7

22. Patchin, J.W., Hinduja, S.: Bullies move beyond the schoolyard: a preliminary look at cyberbullying. Youth Viol. Juv. Just. 4(2), 148–169 (2006)
23. Pawar, R., Raje, R.R.: Multilingual cyberbullying detection system. In: 2019 IEEE International Conference on Electro Information Technology (EIT), pp. 040–044. IEEE (2019)
24. Rakib, T.B.A., Soon, L.K.: Using the Reddit corpus for cyberbully detection. In: Asian Conference on Intelligent Information and Database Systems, pp. 180–189. Springer (2018)
25. Rosa, H., Pereira, N., Ribeiro, R., Ferreira, P.C., Carvalho, J.P., Oliveira, S., Coheur, L., Paulino, P., Simão, A.V., Trancoso, I.: Automatic cyberbullying detection: a systematic review. Comput. Hum. Behav. 93, 333–345 (2019)
26. Siriaraya, P., Zhang, Y., Wang, Y., Kawai, Y., Mittal, M., Jeszenszky, P., Jatowt, A.: Witnessing crime through tweets: A crime investigation tool based on social media. In: Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 568–571 (2019)
27. Sugandhi, R., Pande, A., Agrawal, A., Bhagat, H.: Automatic monitoring and prevention of cyberbullying. Int. J. Comput. Appl. 8, 17–19 (2016)
28. Taddeo, M.: Three ethical challenges of applications of artificial intelligence in cybersecurity. Minds Mach. 29(2), 187–191 (2019)
29. Tokunaga, R.S.: Following you home from school: a critical review and synthesis of research on cyberbullying victimization. Comput. Hum. Behav. 26(3), 277–287 (2010)
30. Vallathan, G., John, A., Thirumalai, C., Mohan, S., Srivastava, G., Lin, J.C.W.: Suspicious activity detection using deep learning in secure assisted living IoT environments. J. Supercomput. (2020). https://doi.org/10.1007/s11227-020-03387-8
31. Van Bruwaene, D., Huang, Q., Inkpen, D.: A multi-platform dataset for detecting cyberbullying in social media. Lang. Resour. Eval. (2020). https://doi.org/10.1007/s10579-020-09488-3
32. Van der Zwaan, J., Dignum, M., Jonker, C.: Simulating peer support for victims of cyberbullying. In: BNAIC 2010: 22nd Benelux Conference on Artificial Intelligence, Luxembourg, 25–26 October 2010. Citeseer (2010)
33. Wulczyn, E., Thain, N., Dixon, L.: Ex machina: Personal attacks seen at scale. In: Proceedings of the 26th International Conference on World Wide Web, pp. 1391–1399 (2017)
34. Yazdinejad, A., HaddadPajouh, H., Dehghantanha, A., Parizi, R.M., Srivastava, G., Chen, M.Y.: Cryptocurrency malware hunting: A deep recurrent neural network approach. Appl. Soft Comput., 106630 (2020)
35. Zhao, R., Mao, K.: Cyberbullying detection based on semantic-enhanced marginalized denoising auto-encoder. IEEE Trans. Affect. Comput. 8(3), 328–339 (2016)

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.