
A Study of Cyberbullying Detection Using Machine Learning Techniques

1. Introduction
In this article, we propose a cyberbullying detection framework that generates features from Twitter
content (tweets) by leveraging a pointwise mutual information (PMI) technique. Based on these features, we
have developed a supervised machine learning solution for cyberbullying detection and multi-class
categorization of its severity on Twitter. We applied embedding, sentiment, and lexicon features
along with PMI-semantic orientation. The extracted features were used with Naïve Bayes, KNN, Decision
Tree, Random Forest, and Support Vector Machine algorithms.
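
To make the PMI-based feature more concrete, the following is a minimal sketch of pointwise mutual information and a semantic-orientation score computed over a toy labelled corpus. The corpus, the label names, the crude additive smoothing, and the function names are illustrative assumptions, not the data or the exact formulation used in this study.

```python
import math

# Toy corpus of (tokens, label) pairs; purely illustrative, not the study's data.
CORPUS = [
    (["you", "are", "stupid"], "bullying"),
    (["stupid", "ugly", "loser"], "bullying"),
    (["you", "are", "kind"], "normal"),
    (["have", "a", "nice", "day"], "normal"),
]


def pmi(word, label, corpus, smoothing=1.0):
    """PMI(word, label) = log2(P(word, label) / (P(word) * P(label))).

    Probabilities are estimated from document counts. The additive
    smoothing is a crude assumption that only serves to avoid log(0).
    """
    n_docs = len(corpus)
    n_word = sum(1 for tokens, _ in corpus if word in tokens)
    n_label = sum(1 for _, lab in corpus if lab == label)
    n_both = sum(1 for tokens, lab in corpus if lab == label and word in tokens)
    p_joint = (n_both + smoothing) / (n_docs + smoothing)
    p_word = (n_word + smoothing) / (n_docs + smoothing)
    p_label = (n_label + smoothing) / (n_docs + smoothing)
    return math.log2(p_joint / (p_word * p_label))


def semantic_orientation(word, corpus):
    """Positive values lean towards 'bullying', negative towards 'normal'."""
    return pmi(word, "bullying", corpus) - pmi(word, "normal", corpus)
```

In this sketch a word such as "stupid", which appears only in bullying examples, receives a positive score, while a word that is equally frequent in both classes scores zero. Such a score can then be used as one numeric feature per tweet, for example by averaging over its tokens.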

In this article, we first briefly present background on the key areas on which our study focuses. In Section 2,
we outline related work on the classification of cyberbullying severity. Section 3
provides the background on data usage for cyberbullying detection and its accessibility. Sections 4 and 5
describe the research methodology framework used for detecting cyberbullying and its severity. The evaluation
and results of the proposed framework are presented in Section 6, and a comparison of the baseline and proposed
framework results is provided in Section 7. Finally, the article draws some conclusions on the
significance of the proposed framework and suggests future work.

1.1. Online social network (OSN)

The Internet has become an essential component of individuals' lives, and the growth of social media from
standard web pages (Web 1.0) to the Internet of Things (Web 3.0) has changed how users access data, interact with
others and seek out information. ‘Social media’ refers to a set of tools developed and dedicated to supporting
social interactions online. The most popular are web-based technologies termed online social networks (OSNs).
Facebook, Twitter, Instagram and YouTube are examples of such OSNs. The empowerment that these networks have
brought has resulted in an interpersonal and expressive phenomenon that has enabled the connection of thousands
of users to other people around the world [1,2]. Users treat these OSNs as creative communication tools
where they can create profiles and communicate with others regardless of location or other limitations [3]. Besides
supporting social interaction and communication, social networking platforms have given us incalculably more
opportunities than ever before. Education, information, entertainment, and social communication can be obtained
efficiently by merely going online. For the vast majority, these opportunities are considered valuable, allowing
people to acquire understanding and knowledge at a much quicker pace than past generations.

Despite the undeniable benefits that OSNs can bring, people can be humiliated, insulted, bullied, and harassed on
OSNs by anonymous users, strangers, or peers [4]. OSN users can be reached every minute of every day, and some
users are able to remain anonymous whenever they want. Unfortunately, this means that OSNs can provide an
opportunity for bullying to take place anywhere and at any time, beyond the limits of normal social
situations [5]. Consequently, the rise of OSNs has led to a substantial increase in cyberbullying behaviours,
particularly among young people [6].

1.2. Adverse consequences

Although the use of the internet and social media has clear advantages for societies, their frequent use
also has significant adverse consequences. These include unwanted sexual exposure, cybercrime and
cyberbullying. Sexual exposure is where offenders impersonate victims in online ads and falsely suggest that
their victims are interested in sex [7]. Cybercrime includes intellectual property theft, spam, phishing,
cyberbullying, and other forms of social engineering [8].

Because OSNs are built to let users share information such as links, messages, videos and photos
[9], cybercriminals have exploited them in new ways to commit different types of cybercrime [10].

Cyberbullying, a type of bullying, has been proclaimed a serious risk to public health, and the general public has
already received warnings from, for example, the Centers for Disease Control and Prevention (CDC) [11]. Globally,
millions of people are affected every year across all cultures and social fields [12].

Cyberbullying can be defined as the use of information and communication technology by an individual or a group
of individuals to harass, threaten and humiliate other users [13]. Cyberbullying is a kind of harassment associated
with significant psychosocial problems [14]. Exposure to such incidents has been connected to depression, low
self-confidence, loneliness, anxiety and suicidal thoughts [15–20].

1.3. Severity of cyberbullying

Cyberbullying takes various forms, such as circulating filthy rumours on the basis of race, gender, disability,
religion and sexuality; humiliating a person; social exclusion; stalking; threatening someone online; and disclosing
personal information about an individual that was shared in confidence [21].

According to a national advocacy group in the US, bullying can take several forms; racism and sexuality are two
of these [22]. A report by the Pew Research Center describes two distinct categories of online harassment
among internet users. The first category comprises less severe experiences such as swearing and
humiliation; it is regarded as less severe because those who see or experience it often say they ignore it. The
second category of harassment, although targeting a smaller number of online users, includes more severe
experiences such as physical threats, long-term harassment, trapping and sexual harassment [23].

Assessing the severity level of a cyberbullying incident may be important for describing the different correlations
observed in cyberbullying victims and, principally, how these incidents affect victims’ experience of
cyberbullying [24]. Researchers, however, have not paid enough attention to the extent to which different
cyberbullying incidents could have a more severe impact upon victims. Therefore, it is important to develop a
method to identify the severity of cyberbullying in OSNs.

Our contributions can be summarized as follows:

• We highlight the limitations of existing techniques related to cyberbullying detection and its severity
levels.

• We provide a systematic framework for identifying cyberbullying severity in online social networks, which
is based on previous research from different disciplines. We build a machine learning multi-classifier for
classifying cyberbullying severity into different levels. Our cyberbullying detection model works for
multi-class classification problems as well as for binary classification problems.

2. Literature Survey

In 2020, Vimala Balakrishnan et al. [1] presented an automatic cyberbullying detection mechanism that takes
Twitter users’ psychological features into account. The three main stages discussed are Twitter data
collection, feature extraction, and cyberbullying detection and classification. The annotated dataset
contained 9484 tweets, of which 4.5% of users were labelled as bullies, 31.8% as spammers, 3.4% as
aggressors, and 60.3% as normal. However, the final dataset contained 5453 tweets after the pre-processing
step, which removed non-English tweets, profiles containing no data, and special characters. The features
extracted were text features, user features, and network features. The model was executed using WEKA 3.8
with 10-fold cross-validation. Naïve Bayes performed poorly during preliminary experimental analysis and was
eliminated, while Random Forest and J48 continued to perform well. The classifiers were trained using
manually annotated data.

In 2020, Jaideep Yadav et al. [2] proposed an approach built on the pre-trained BERT model developed by
Google researchers, which generates contextual embeddings and task-specific embeddings. The base model is a
deep neural network called the Transformer. BERT contains 12 layers that encode the input data and is built
on top of this base model. The data is tokenized and padded accordingly and fed into the model, which
generates the final embeddings. The classifier layer classifies the embeddings generated by the previous
layers and produces the final output. Using a pre-trained BERT model, they were able to achieve efficient
and stable results in comparison to previous models for detecting cyberbullying.

In 2020, Sudhanshu Baliram Chavan et al. [3] proposed an approach to detect cyberbullying on Twitter. The
required dataset was collected from sources such as GitHub and Kaggle. The data is first pre-processed, and
features are extracted using a TF-IDF vectorizer. The tweets are then passed through Naïve Bayes and SVM
models and classified accordingly. When a tweet is categorized as bullying, ten other tweets from that
user's account are fetched and passed through the Naïve Bayes and SVM classifiers again. If the overall
probability of that user’s tweets lies above 0.5, the tweet is considered a bullying tweet. Based on the
accuracy scores and the results, the SVM model outperformed Naïve Bayes with an accuracy score of 71.25%.

In 2019, John Hani et al. [4] presented a supervised learning approach to detect cyberbullying. In the
preprocessing step, the data is cleaned by removing noise and unnecessary text, using tokenization,
lowercasing, stop-word removal, encoding cleaning, and word correction. The second step is feature
extraction, which uses TF-IDF and sentiment analysis, including n-grams (2-gram, 3-gram, and 4-gram) to
consider different combinations of words. The cyberbullying dataset from Kaggle is split into training and
test sets in the ratio 0.8 to 0.2. SVM and neural networks are used as classifiers and run on different
n-gram language models. Accuracy, recall, precision, and F-score are the performance measures. The neural
network performed better than the SVM classifier: it achieved an average F-score of 91.9%, while SVM
achieved an average F-score of 89.8%.

In 2018, Monirah Abdullah Al-Ajlan and Mourad Ykhlef [6] proposed a novel algorithm, CNN-CB, which is based
on a convolutional neural network and adapts the idea of word embedding. The architecture comprises four
layers: Embedding, Convolution, Max Pooling, and Dense. The first layer, word embedding, creates a vector
space of the vocabulary, which is the input to the subsequent convolutional layer. The convolutional layer
compresses the input vector without losing significant features. The third layer, the max pooling layer,
takes the output of the second layer as its input and finds the maximum value of the chosen region to keep
only the significant features. The last layer, the dense layer, performs the classification. This gave a
precision of 95%.

In 2018, Monirah A. Al-Ajlan et al. [7] proposed optimized Twitter cyberbullying detection based on deep
learning (OCDD), which does not extract features from tweets. Instead, it represents a tweet as a set of
word vectors that are fed to a convolutional neural network (CNN) for classification. Hence, the feature
extraction and selection phases are eliminated in this approach. To represent the semantics between words,
word embedding is used and is generated using the GloVe technique. A CNN has many parameters, and a
metaheuristic optimization algorithm is used to find optimal or near-optimal values for them, which are then
used for classification. The CNN showed great results.

In 2017, Yee Jang Foong and Mourad Oussalah [10] presented an automated cyberbullying detection system that
uses natural language processing techniques, text mining, and machine learning. The dataset comes from
ASKfm, a social media platform where users can anonymously ask questions and view a sample of a user’s
profile. As part of the preprocessing procedure, web links and unknown characters are removed, any incorrect
wordings are corrected, and lexicons are replaced with equivalent textual expressions. A combination of
features is used, which includes TF-IDF, unusual capitalization count, LIWC, and a dependency parser. The
dataset is split into a 70% training set and a 30% testing set. An SVM with a linear kernel was trained on
the training data and used as the classifier. The Amazon Mechanical Turk service was used to label the
training posts. The combination of features mentioned above yielded the highest performance in terms of
accuracy, precision, recall, F1, and F2 scores.

In 2016, X. Zhang et al. [11] proposed a novel approach based on a pronunciation-based convolutional neural
network (PCNN). Word-to-pronunciation conversion is used to group a set of incorrectly spelled words, which
have the same meaning and pronunciation, together with the corrected word. Two separate CNNs are used to
establish baselines. For the first baseline feature set, word embedding based on Google’s word vectors was
used. For the feature set of the second baseline, CNN Random, arbitrarily generated vectors were used. For
the PCNN feature set, the phoneme codes were arbitrarily introduced into vectors. To handle class imbalance,
three techniques were implemented: threshold moving, cost function adjusting, and a hybrid solution. Of
these, cost function adjusting was the most effective.

In 2016, Michele Di Capua et al. [12] presented an unsupervised approach to detect cyberbullying using a
design model inspired by Growing Hierarchical SOMs. First, features are divided into four groups: syntactic
features, semantic features, sentiment features, and social features. The Growing Hierarchical
Self-Organizing Map (GHSOM) network algorithm, which is well suited to classifying a large collection of
documents, is then used. It uses a hierarchical structure of multiple layers, where each layer consists of a
variety of independent SOMs. A single SOM is employed at the root layer. For every unit in this map, a SOM
can be added to the subsequent layer of the hierarchy. The GHSOM network is trained and tested using a
K-fold partitioning of the data.

In 2014, Sourabh Parime and Vaibhav Suri [13] presented an approach that uses data mining and machine
learning techniques to detect cyberbullying. Text mining is performed on unstructured data using machine
learning techniques to extract knowledge from the text. It includes multiple stages such as document
clustering, data pre-processing, and attribute generation, for which an in-built classifier is used to
generate labels from the features fed into it. Occurrences are counted, a weight is assigned to each label,
and irrelevant attributes are removed, which helps to estimate the nature of the comments. Sentiment
analysis is used to determine the tone of the given text. Two classes of data are considered, one with
positive emotions and the other with negative emotions. These are stored in a vector and used to train a
supervised learning algorithm, SVM.

In 2011, Kelly Reynolds et al. [14] presented a language-based method for detecting cyberbullying. Data for
the dataset was collected from the website Formspring.me, a question-and-answer website where users openly
invite others to ask and answer questions. This website is highly populated by teens and college students,
which increases the percentage of bullying content. Workers on Amazon’s Mechanical Turk labelled a post as
"yes" if it was a cyberbullying post and "no" otherwise. Out of 2696 posts in the training set, 196 received
a final class label of “yes,” and out of 1219 posts in the test set, 173 were recognized as cyberbullying.
The SUM and TOTAL features, which were used to measure the overall badness of a post, were included in both
versions of the dataset, NUM and NORM. Weka, a software suite for machine learning, was used with the J48,
JRIP, IBK, and SMO algorithms. Using 10-fold cross-validation, it was observed that the NORM training set
outperforms the NUM training set except for the SMO algorithm.

In 2011, Dinakar, Reichart, and Lieberman [15] proposed an approach to detect cyberbullying on social media
using a range of binary and multiclass classifiers. They used a dataset from the YouTube comment section and
grouped it into the labels sexuality, physical appearance, race, and intelligence. They trained various
supervised models such as JRip, SVM, J48, and Naive Bayes. They experimented with binary classifiers trained
for specific labels and multiclass classifiers trained on all labels combined. On examining the kappa
statistic and accuracy, it was evident that the label-specific classifiers outperformed the multiclass
classifiers in detecting cyberbullying.

REFERENCES

[1] V. Balakrishnan, S. Khan and H. Arabnia, "Improving Cyberbullying Detection using Twitter Users’ Psychological Features and Machine Learning," Computers & Security, 90, 101710, 2020, doi: 10.1016/j.cose.2019.101710.
[2] J. Yadav, D. Kumar and D. Chauhan, "Cyberbullying Detection using Pre-Trained BERT Model," 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC), Coimbatore, India, 2020, pp. 1096-1100, doi: 10.1109/ICESC48915.2020.9155700.
[3] R. R. Dalvi, S. Baliram Chavan and A. Halbe, "Detecting A Twitter Cyberbullying Using Machine Learning," 2020 4th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 2020, pp. 297-301, doi: 10.1109/ICICCS48265.2020.9120893.
[4] J. Hani, M. Nashaat, M. Ahmed, Z. Emad, E. Amer and A. Mohammed, "Social Media Cyberbullying Detection using Machine Learning," International Journal of Advanced Computer Science and Applications (IJACSA), 10(5), 2019.
[5] R. Pawar and R. R. Raje, "Multilingual Cyberbullying Detection System," 2019 IEEE International Conference on Electro Information Technology (EIT), Brookings, SD, USA, 2019, pp. 040-044, doi: 10.1109/EIT.2019.8833846.
[6] M. A. Al-Ajlan and M. Ykhlef, "Deep Learning Algorithm for Cyberbullying Detection," International Journal of Advanced Computer Science and Applications (IJACSA), 9(9), 2018. http://dx.doi.org/10.14569/IJACSA.2018.090927
[7] M. A. Al-Ajlan and M. Ykhlef, "Optimized Twitter Cyberbullying Detection based on Deep Learning," 2018 21st Saudi Computer Society National Computer Conference (NCC), Riyadh, 2018, pp. 1-5, doi: 10.1109/NCG.2018.8593146.
[8] B. Haidar, M. Chamoun and A. Serhrouchni, "Arabic Cyberbullying Detection: Using Deep Learning," 2018 7th International Conference on Computer and Communication Engineering (ICCCE), Kuala Lumpur, 2018, pp. 284-289, doi: 10.1109/ICCCE.2018.8539303.
[9] Noviantho, S. M. Isa and L. Ashianti, "Cyberbullying Classification Using Text Mining," 2017 1st International Conference on Informatics and Computational Sciences (ICICoS), Semarang, 2017, pp. 241-246, doi: 10.1109/ICICOS.2017.8276369.
[10] Y. J. Foong and M. Oussalah, "Cyberbullying System Detection and Analysis," 2017 European Intelligence and Security Informatics Conference (EISIC), Athens, 2017, pp. 40-46, doi: 10.1109/EISIC.2017.43.
[11] X. Zhang et al., "Cyberbullying Detection with a Pronunciation Based Convolutional Neural Network," 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), Anaheim, CA, 2016, pp. 740-745, doi: 10.1109/ICMLA.2016.0132.
[12] M. Di Capua, E. Di Nardo and A. Petrosino, "Unsupervised cyber bullying detection in social networks," 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, 2016, pp. 432-437, doi: 10.1109/ICPR.2016.7899672.
[13] S. Parime and V. Suri, "Cyberbullying detection and prevention: Data mining and psychological perspective," 2014 International Conference on Circuits, Power and Computing Technologies [ICCPCT-2014], Nagercoil, 2014, pp. 1541-1547, doi: 10.1109/ICCPCT.2014.7054943.
[14] K. Reynolds, A. Kontostathis and L. Edwards, "Using Machine Learning to Detect Cyberbullying," 2011 10th International Conference on Machine Learning and Applications and Workshops, Honolulu, HI, 2011, pp. 241-244, doi: 10.1109/ICMLA.2011.152.
[15] K. Dinakar, R. Reichart and H. Lieberman, "Modeling the Detection of Textual Cyberbullying," The Social Mobile Web (2011).

Methodology:

Data collection: The major sources for data collection are social networks such as Twitter, Facebook, and
Instagram. Data can be extracted from these websites using authorized application programming interfaces
(APIs) by supplying keywords, hashtags, and user profiles [13]. The drawback of extracting data through
social media APIs is the limited amount of data returned [14].

Data Wrangling: Data wrangling is the process of structuring and cleaning the collected data into a
meaningful format. It is one of the crucial steps in building the models because the extracted data may
be in an unstructured format, which affects the performance of the model. It consists of tasks such as
finding missing values, identifying outliers in the raw data, and removing unnecessary data. Data
wrangling is also referred to as data pre-processing (Fig. 1).
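
As an illustration of these wrangling tasks, the sketch below cleans raw tweet text and drops missing or duplicate entries. The specific cleaning rules (removing URLs, mentions, and special characters) and the function names are assumptions chosen for the example; a real pipeline would adapt them to its own data.

```python
import re


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, @mentions, '#' signs and other symbols."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # drop links
    text = re.sub(r"@\w+", " ", text)           # drop user mentions
    text = text.replace("#", " ")               # keep the hashtag word, drop the sign
    text = re.sub(r"[^a-z0-9\s']", " ", text)   # drop remaining special characters
    return re.sub(r"\s+", " ", text).strip()    # collapse repeated whitespace


def wrangle(tweets):
    """Drop missing or empty entries and exact duplicates, preserving order."""
    seen, cleaned = set(), []
    for raw in tweets:
        if not raw:                # None or "" is treated as a missing value
            continue
        tweet = clean_tweet(raw)
        if tweet and tweet not in seen:
            seen.add(tweet)
            cleaned.append(tweet)
    return cleaned
```

Note that this sketch only handles ASCII letters and digits; multilingual data would need a different character filter.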

Feature Engineering: The performance of predictive models depends on choosing the right feature
set to train the model [15]. This task plays a vital role in building cyberbullying prediction
models with machine learning algorithms [16, 17]. The right combination of features yields an effective
classifier for predicting cyberbullying activity [18]. Figure 2
displays some of the features used for cyberbullying prediction models [19].

Feature selection techniques: Feature selection techniques are very useful for increasing the performance
of cyberbullying prediction models. The chi-square test and PCA are used to extract significant features from
the feature set [20]. Information gain measures the relation between features in the training dataset and the
class labels [21]. Pearson correlation can be used to reduce the feature dimensionality of a classification
model. The chi-square test is used to test independence between a feature and its class.
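
The chi-square test of independence mentioned above can be sketched as follows. The contingency table counts are invented for illustration (rows: feature present or absent; columns: bullying or normal), and the sketch assumes every expected count is non-zero.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if feature and class were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Critical value for 1 degree of freedom (a 2x2 table) at the 0.05 level.
CRITICAL_1DF_05 = 3.841

# Hypothetical counts: a term appears in 30 bullying and 10 normal tweets,
# and is absent from 20 bullying and 40 normal tweets.
example_stat = chi_square([[30, 10], [20, 40]])
keep_feature = example_stat > CRITICAL_1DF_05
```

A statistic above the critical value suggests the feature and the class are not independent, so the feature is a candidate to keep; a value near zero suggests the feature carries little information about the class.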

Machine learning algorithms: Most cyberbullying detection methods follow one of three kinds of
approaches: (1) a supervised learning approach, (2) a lexicon-based approach,
and (3) a rule-based approach.

Support Vector Machine (SVM): It is a kind of supervised classifier mostly used in the classification of
text [24]. Chen et al. developed a prediction model for the offensive text in social media [25].
Mangaonkar et al. implemented the SVM model for the detection of cyberbullying on Twitter [26].
Dinakar et al. considered comments from YouTube to predict cyberbullying using SVM and concluded
that SVM is more accurate than Naïve Bayes and J48 [27].

Naïve Bayes (NB): NB classifiers apply Bayes’ theorem under the assumption that features are independent of
one another. NB is mostly used for the classification of text and has been used to build prediction models
for cyberbullying [28]. Shreyas Kumar et al. used the NB algorithm for Twitter bullying detection [29]. NB is
a very frequently used algorithm in machine learning.
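
A minimal sketch of a multinomial Naïve Bayes text classifier is shown below to illustrate how Bayes' theorem and the independence assumption combine. The toy documents, the label names, and the use of add-one (Laplace) smoothing are assumptions made for this example rather than details taken from the cited studies.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs: list of (tokens, label). Returns (class counts, word counts, vocabulary)."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(tokens, model):
    """Return the label with the highest log posterior for the given tokens."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / n_docs)              # log prior P(label)
        total = sum(word_counts[label].values())
        for tok in tokens:
            # Independence assumption: add one log-likelihood term per token,
            # with add-one smoothing so unseen words do not zero the product.
            score += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, purely illustrative.
TRAIN = [
    (["you", "are", "stupid"], "bullying"),
    (["ugly", "stupid", "loser"], "bullying"),
    (["you", "are", "kind"], "normal"),
    (["nice", "day"], "normal"),
]
MODEL = train_nb(TRAIN)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.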

Random forest (RF): RF combines decision trees with an ensemble process. RF selects feature variables
randomly for classification. RF has been used to build cyberbullying prediction models [30]. RF works on the
principle of bagging, which typically improves classification performance compared with a single decision
tree. The reported advantages of RF are its handling of missing values and its reduced tendency to overfit.
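
The two ingredients named above, bagging and random feature selection, together with the final vote, can be sketched with three small helpers. This is not a full random forest: the tree-building step is omitted, and the function names are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (the bagging step for one tree)."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(n_features, k, rng):
    """Pick k distinct feature indices at random for one tree or split."""
    return sorted(rng.sample(range(n_features), k))


def majority_vote(predictions):
    """Combine the per-tree predictions into the forest's final prediction."""
    return Counter(predictions).most_common(1)[0][0]
```

In a complete implementation, each tree would be trained on its own bootstrap sample using only its random feature subset, and `majority_vote` would be applied to the trees' outputs for each new message.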

Decision Tree (DT): C4.5 is an improved DT algorithm that is used in cyberbullying
prediction models [31]. C4.5 follows a divide-and-conquer method. DT can be used
for both regression and classification problems, and for both categorical and numerical data. Information
gain and entropy are the two measures used to build decision trees. Overfitting can occur in decision trees
because of noisy data.
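
The two measures mentioned above can be computed as in the following sketch. It shows plain entropy and information gain for a categorical feature; C4.5 itself normalizes information gain into a gain ratio, which is not shown here. The function names and toy labels are assumptions for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, feature_values):
    """Reduction in entropy obtained by splitting the labels on a categorical feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

A tree builder following the divide-and-conquer idea would pick the feature with the highest score at each node, split the data on it, and recurse on each subset.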

K-nearest neighbor (KNN): KNN is a basic non-parametric classification algorithm in machine learning. It
commonly uses Euclidean distance as its distance measure. A KNN classifier has been used for the prediction
of cyberbullying messages in Turkish [32]. KNN is also called a lazy learner because it does not involve an
explicit training process. The disadvantage of KNN is that processing becomes slow as the volume of data
increases.
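
A minimal KNN sketch using Euclidean distance is given below. The two-dimensional feature vectors and labels are invented for the example, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label). Returns the majority label of the k nearest points."""
    # No training step: every prediction scans the stored examples ("lazy learning").
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


# Toy data, purely illustrative.
TRAIN_POINTS = [
    ((0, 0), "normal"), ((0, 1), "normal"), ((1, 0), "normal"),
    ((5, 5), "bullying"), ((5, 6), "bullying"), ((6, 5), "bullying"),
]
```

Because each prediction sorts the whole training set, the cost grows with the amount of stored data, which is the slowdown noted above.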

Logistic regression (LR): LR is a statistical technique used for classification. It uses the logistic sigmoid
function to transform its output and return a probability value. LR separates the two classes with a
hyperplane, using the logistic function. LR takes features (variables) as inputs and generates a probability
value as output. If the probability value is >0.5, the instance is assigned to the positive class, otherwise
to the negative class. Figure 3 shows the frequency of algorithms used in cyberbullying (Fig. 4).
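
The prediction step described above can be sketched as follows. The weights and bias are hypothetical values chosen for illustration; in practice they would be learned from training data.

```python
import math


def sigmoid(z):
    """Logistic sigmoid: maps any real number to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class for one feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def classify(features, weights, bias, threshold=0.5):
    """Return 1 (positive class) if the probability exceeds the threshold, else 0."""
    return 1 if predict_proba(features, weights, bias) > threshold else 0
```

The decision boundary is the set of points where the linear score is zero, which is exactly where the sigmoid returns 0.5.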

Results and Discussion

This section presents the results of the experiments and discusses their significance. First, each
classifier’s performance is listed and discussed in Table 2, which shows the
evaluation of each classifier in terms of precision, recall, and F1 score. Second, the
training time complexity of each algorithm is illustrated in Table 3. These are discussed in detail in
the following sections.

Both Random Forest and J48 performed well for the cyberbullying classifications, although J48
performed slightly better (no significant difference). As the intent of the study is not to determine the
best algorithm, only the results for J48 are presented in this section. Results from both Tables 2
and 3 are in accordance with empirical evidence showing personalities (Alonso and Romero,
2017; van Geel et al., 2017; Festl and Quandt, 2013) and sentiments (Xu et al., 2012; Dinakar et al., 2011;
Nahar et al., 2014) to be linked with cyberbullying perpetration. Although a direct comparison with other
cyberbullying detection studies is not possible due to the nature of the dataset used and the different
classification algorithms and analysis mechanisms, our findings generally indicate a higher effectiveness
in cyberbullying detection (e.g., our AUC of 0.970 as opposed to 0.943 in Al-garadi et al. (2016), 0.817 in
Dani et al. (2017), and 0.815 in Chatzakou et al. (2017a)). We conclude that this is probably due to the
inclusion of users’ psychological features, namely their personalities and sentiment, which were not
investigated in any of these studies.

As personalities were found to improve cyberbullying
detection, a further analysis was conducted in which each individual personality trait was examined
for its impact on cyberbullying detection. Table 4 provides the accuracy and F-scores for the individual
models. Higher accuracy and F-scores for extraversion, agreeableness, and neuroticism (Big
Five) and psychopathy (Dark Triad) indicate that these traits have greater impacts on cyberbullying
detection. T-tests indicate significant differences between extraversion, agreeableness and neuroticism
on the one hand and openness and conscientiousness on the other (i.e., p < 0.05). As for the Dark
Triad, psychopathy was found to be significantly different from Machiavellianism and narcissism (p <
0.001). Similar outcomes were reflected in previous empirical studies in which the traits were found to
have significant relations with cyberbullying perpetration (van Geel et al., 2017). For example,
extraverted people have a higher tendency to engage in cyberbullying perpetration to increase their
social status (van Geel et al., 2017), and they communicate and use social media more compared to
those who score low on extraversion (Marshall et al., 2015).

The inclusion of emotion, however, had no
positive impact on the detection model’s performance (see Table 2). In fact, emotion resulted in lower
accuracies in both Baseline + Personality + Sentiment + Emotion (i.e., 91.12%) and Baseline + Sentiment
+ Emotion (i.e., 89.95%) compared to the models without it, thus indicating no significant effect of emotion
in cyberbullying detection, an observation that was reflected in Patch (2015). This could be attributed to
the nature of the dataset. For example, negative emotions such as anger, fear, and
embarrassment are most often related to cyber victims (Balakrishnan, 2018; Xu et al., 2012; Gan et al., 2014;
Kokkinos et al., 2014), although there is a tendency among bullies to exhibit these emotions to a certain
extent (Balakrishnan, 2018; Schenk et al., 2013). The dataset lacked tweets related to victims; hence this
may have affected the impact of emotion on the detection mechanism.

Looking at the best performing
model (i.e., Baseline + Personality + Sentiment), we wanted to identify the specific features that
may have contributed to the cyberbullying detection. For this reason, a data dimensionality reduction
technique was applied, specifically the wrapper feature selection method. The wrapper method
uses a predetermined learning algorithm (e.g., K-Means, Affinity Propagation, Greedy algorithm etc.) to
prepare, evaluate and select the best features. The study used the wrapper method due to its high
accuracy and ability to consider interactions between features and predictive models (Jindal and Kumar,
2017). To identify the top 10 key features, the greedy algorithm based on best-first search was
applied. This technique lists the best features first (or deletes the worst feature first) in
each round (Hall et al., 2009; Jindal and Kumar, 2017; Panthong and Srivihok, 2015). The 10 key features
produced were number of followers, following, popularity, user favorite count and status count (i.e.,
Twitter features), extraversion, agreeableness and neuroticism (Big Five), psychopathy (Dark Triad) and
sentiment. These key features were integrated into a single model and compared against the best
performing model in Table 2. Fig. 3 clearly indicates that when key features are used, the performance
of the cyberbullying detection model is further improved. The finding indicates that although multiple
features can be used to enhance cyberbullying detection, specific features play more profound roles in
the process of detecting bullying patterns online. Table 5 depicts the breakdown of the classification for
the key feature model, indicating that the model performed best in detecting bullies online (i.e., 92.88%)
compared to the Baseline + Personality + Sentiment model.

The present study showed not only that
personalities and sentiment can be effectively used to detect cyberbullying, but also that focusing on
specific features further improves the detection process. The findings add support to existing empirical
evidence linking specific personalities, particularly extraversion, agreeableness, neuroticism and
psychopathy, to cyberbullying perpetration, as these key traits were shown to significantly
improve online bullying detection. The top Twitter features extracted by the dimension reduction
technique, namely number of followers, following, popularity, user favorite count and status count,
belong to the user and network-based features, suggesting that the activities and connectivity of a user in
the network play important roles in identifying bullies and non-bullies, a phenomenon observed in
Al-garadi et al. (2016) and Chatzakou et al. (2017a).

4.1. Evaluation Metrics

The effectiveness of the proposed model was examined in this study using several evaluation
measures to assess how successfully the model can differentiate cyberbullying from
non-cyberbullying. In this study, seven machine learning algorithms were constructed, namely LR, Light
Gradient Boosting Machine (LGBM), SGD, RF, AdaBoost, Naive Bayes, and SVM. It is essential to review the
standard assessment metrics used in the research community to understand the performance of competing
models. The most widely used criteria for evaluating cyberbullying classifiers on SM platforms (e.g.,
Twitter) are as follows:

Accuracy

Accuracy calculates the ratio of correctly classified cases (true positives and true negatives) to all
cases, and it has been utilized to evaluate models of cyberbullying prediction in [60,65,79]. It is
calculated as follows:

Accuracy = (tp + tn) / (tp + fp + tn + fn)    (5)

where tp denotes true positives, tn true negatives, fp false positives, and fn false negatives.

• Precision calculates the proportion of tweets assigned to a specific group (true positives, tp, plus
false positives, fp) that actually belong to that group.

• Recall calculates the ratio of retrieved relevant tweets over the total number of relevant tweets.

• F-Measure provides a way to combine precision and recall into a single measure that captures both
properties.
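
As a minimal sketch, the four measures described here can be computed directly from the confusion counts tp, fp, tn and fn, using the standard definitions of accuracy, precision, recall and F-measure. The function name `classification_metrics` and the example counts are hypothetical and chosen only for illustration; they are not results from this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall and F-measure from confusion counts.

    Each ratio falls back to 0.0 when its denominator is zero, which is one
    common convention (an assumption, not something the study specifies).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts for illustration only (not the study's data).
example = classification_metrics(tp=80, fp=20, tn=90, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

With these illustrative counts, precision (0.80) is lower than recall (about 0.89), and the F-measure lies between the two.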

The three evaluation measures listed above have been utilized to evaluate cyberbullying prediction
models in [67,79,98,104]. They are calculated as follows:

Precision = tp / (tp + fp), Recall = tp / (tp + fn), F-measure = (2 × precision × recall) / (precision + recall)

4.2. Performance Result of Classifiers

The proposed model utilizes the selected seven ML classifiers with two different feature extraction
techniques. These techniques were set empirically to achieve higher accuracy. LR achieved the best
F1 score on our dataset (0.9280), with a classification accuracy of 90.57%. The differences between
the LR, SGD, and LGBM classifiers are slight: SGD achieved a marginally higher accuracy of 90.6% but
a lower F1 score than LR, while the LGBM classifier achieved an accuracy of 90.55% and an F1 score
of 0.9271. Judged by F1 score, LR therefore performs better than the other classifiers, as shown in
Table 2. Moreover, RF and AdaBoost achieved almost the same accuracy, but in terms of F1 score RF
performs better than AdaBoost. Multinomial NB achieved a low accuracy and precision of 81.39% and
0.7952, respectively; however, its excellent recall offsets the low precision, giving a good
F-measure of 0.8754, as illustrated in Table 2. Finally, SVM achieved the lowest accuracy and
precision on our dataset, as shown in Figure 5. Nevertheless, it achieved the best recall of all the
classifiers implemented in the current research. Some earlier studies have also examined the
automatic detection of cyberbullying incidents; for example, an effect analysis based on a lexicon
and SVM was found to be effective in detecting cyberbullying. However, the accuracy decreased as the
data size increased, suggesting that SVM may not be ideal for dealing with the common language
ambiguities typical of cyberbullying [61]. This is consistent with the low accuracy achieved by SVM
being due to the large dataset used in this research.
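
As a rough check on the Multinomial NB figures quoted in this section, the recall implied by the reported precision (0.7952) and F-measure (0.8754) can be recovered by rearranging the F-measure formula, F = 2PR / (P + R), into R = F·P / (2P − F). The sketch below is only an arithmetic illustration; the function name is hypothetical, and the result is derived from rounded values rather than taken from Table 2.

```python
def recall_from_precision_and_f(precision, f_measure):
    """Solve F = 2PR / (P + R) for R, given P and F."""
    denom = 2 * precision - f_measure
    if denom <= 0:
        raise ValueError("No valid recall exists for this precision/F pair.")
    return f_measure * precision / denom


# Precision and F-measure as reported in the text for Multinomial NB.
nb_recall = recall_from_precision_and_f(precision=0.7952, f_measure=0.8754)
print(f"Implied recall: {nb_recall:.3f}")
```

The implied recall is roughly 0.97, which illustrates how a very high recall can offset a comparatively low precision in the F-measure.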

F-measure is one of the most effective evaluation metrics. In this research, the seven classifiers'
performances were computed using the F-measure metric, as shown in Figure 6. Furthermore, the
performance of all ML classifiers is enhanced by producing additional data using data synthesizing
techniques. Multinomial NB assumes that every feature is independent, but this does not hold in real
situations [115]. Therefore, it does not outperform LR in our research either. As stated in [116], LR
performs well for binary classification problems and works better as the data size increases. LR
updates its parameters iteratively to minimize the error. In contrast, SGD uses a single sample
and a similar approximation to update the parameters. Therefore, SGD performs almost as well as LR,
but the error is not reduced as much as in LR [92]. Consequently, it is not surprising that LR also
outperforms the other classifiers in our study.
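
The contrast between the two update rules can be sketched in plain Python. This is an illustrative sketch only: it assumes a logistic model trained by plain gradient descent (library implementations of LR typically use other solvers), and the toy data, learning rate and function names are hypothetical rather than taken from this study.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Probability of the positive class for one sample."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def log_loss(w, b, X, y, eps=1e-12):
    """Mean logistic loss over the whole data set."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict_proba(w, b, xi)
        total -= yi * math.log(p + eps) + (1 - yi) * math.log(1 - p + eps)
    return total / len(X)


def batch_step(w, b, X, y, lr):
    """One update using the gradient averaged over every sample."""
    n = len(X)
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for xi, yi in zip(X, y):
        err = predict_proba(w, b, xi) - yi
        grad_w = [g + err * xj / n for g, xj in zip(grad_w, xi)]
        grad_b += err / n
    return [wj - lr * g for wj, g in zip(w, grad_w)], b - lr * grad_b


def sgd_step(w, b, xi, yi, lr):
    """One update using the gradient of a single sample."""
    err = predict_proba(w, b, xi) - yi
    return [wj - lr * err * xj for wj, xj in zip(w, xi)], b - lr * err


# Toy, linearly separable data (hypothetical, for illustration only).
X = [[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]]
y = [1, 1, 0, 0]
loss_start = log_loss([0.0, 0.0], 0.0, X, y)

w_batch, b_batch = [0.0, 0.0], 0.0
for _ in range(50):
    w_batch, b_batch = batch_step(w_batch, b_batch, X, y, lr=0.1)

rng = random.Random(0)  # fixed seed so the sampled order is reproducible
w_sgd, b_sgd = [0.0, 0.0], 0.0
for _ in range(50):
    i = rng.randrange(len(X))
    w_sgd, b_sgd = sgd_step(w_sgd, b_sgd, X[i], y[i], lr=0.1)

loss_batch = log_loss(w_batch, b_batch, X, y)
loss_sgd = log_loss(w_sgd, b_sgd, X, y)
print(loss_start, loss_batch, loss_sgd)
```

Both procedures reduce the loss from its starting value. The single-sample update is cheaper per step but follows a noisier estimate of the gradient, which is the trade-off described in the paragraph above.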

4.3. Time Complexity of Algorithms

Table 3 shows the time complexity of the best and the worst algorithms in terms of training and
prediction time. The results in Table 3 indicate that Multinomial NB achieved the best training time
(0.014 s) and RF the worst (2.5287 s). Meanwhile, LR outperforms all the classifiers implemented in
this research. However, there were only slight differences between SGD and Multinomial NB compared
to LR, as shown in Table 3.
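
Training and prediction times of this kind are typically measured as wall-clock durations around the fit and predict calls. The sketch below shows one way to do so with the standard library; it assumes a model object exposing `fit` and `predict` methods, and the `MajorityClassifier` stand-in and helper name are hypothetical, not the classifiers or code used in this study.

```python
import time


def time_fit_and_predict(model, X_train, y_train, X_test):
    """Return (training seconds, prediction seconds, predictions)."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_seconds = time.perf_counter() - start

    start = time.perf_counter()
    predictions = model.predict(X_test)
    predict_seconds = time.perf_counter() - start
    return train_seconds, predict_seconds, predictions


class MajorityClassifier:
    """Hypothetical stand-in model: always predicts the most frequent label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


train_s, predict_s, preds = time_fit_and_predict(
    MajorityClassifier(), [[0], [1], [2]], [1, 1, 0], [[3], [4]]
)
print(f"train: {train_s:.6f} s, predict: {predict_s:.6f} s, predictions: {preds}")
```

Measured times depend on hardware and on the size of the data, so values obtained this way are comparable only when all classifiers are timed under the same conditions.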
