Urdu Sentiment Analysis With Deep Learning Methods



Article in IEEE Access · June 2021


DOI: 10.1109/ACCESS.2021.3093078



Received May 20, 2021, accepted June 19, 2021, date of publication June 28, 2021, date of current version July 15, 2021.
Digital Object Identifier 10.1109/ACCESS.2021.3093078

Urdu Sentiment Analysis With Deep Learning Methods
LAL KHAN 1, AMMAR AMJAD 1, NOMAN ASHRAF 2, HSIEN-TSUNG CHANG 1,3,4,5 (Member, IEEE), AND ALEXANDER GELBUKH 2
1 Department of Computer Science and Information Engineering, Chang Gung University, Taoyuan 333, Taiwan
2 Centro de Investigación en Computación (CIC), Instituto Politécnico Nacional, Ciudad de México 07738, Mexico
3 Bachelor Program in Artificial Intelligence, Chang Gung University, Taoyuan 333, Taiwan
4 Department of Physical Medicine and Rehabilitation, Chang Gung Memorial Hospital, Taoyuan 333, Taiwan
5 Artificial Intelligence Research Center, Chang Gung University, Taoyuan 333, Taiwan

Corresponding author: Hsien-Tsung Chang (smallpig@widelab.org)


This work was supported in part by Chang Gung Memorial Hospital under Grant CMRPD2J0023, and in part by Chang Gung University
under Grant BMRPA07.

ABSTRACT Although over 169 million people in the world are familiar with the Urdu language and a large
quantity of Urdu data is generated on social websites daily, very few research efforts have been made
to build language resources for Urdu and to examine user sentiments.
The objective of this study is twofold: (1) to develop a benchmark dataset for sentiment analysis in the
resource-deprived Urdu language and (2) to evaluate various machine and deep learning algorithms for
sentiment analysis. To find the best technique, we compare two modes of text representation: a count-based
one, where the text is represented using word n-gram feature vectors, and one based on fastText pre-trained
word embeddings for Urdu. We consider a set of machine learning classifiers (RF, NB, SVM, AdaBoost, MLP,
LR) and deep learning classifiers (1D-CNN and LSTM) to run the experiments for all the feature types.
Our study shows that the combination of word n-gram features with LR outperformed the other classifiers on
the sentiment analysis task, obtaining the highest F1 score of 82.05% using a combination of features.

INDEX TERMS Urdu sentiment analysis, machine learning, deep learning, natural language processing.

I. INTRODUCTION
In recent years, with the remarkable increase in the use of hand-held devices and the internet, individual users have increasingly used social media such as Twitter, Facebook, and blogs to express their emotions and sentiments [1]–[3]. Currently, people want to publicly share their opinions, feedback, reviews, and feelings about products, politics, or any viral news. As a result, businesses and institutes are searching for useful information from social media [4]–[7]. Therefore, there is a need for intelligent systems such as sentiment analyzers, which can convert raw social media user data into useful information. For emotion detection and sentiment analysis, languages such as English, French, German, and other European languages are considered rich languages in terms of tool accessibility. Nevertheless, languages such as Urdu, Punjabi, and Hindi are judged resource deprived [8]. Urdu is very different from other languages due to its morphological structure, as it is written from right to left. Due to its morphological structure, the Urdu script is not very common; therefore, a standard dataset or corpus is required to perform natural language processing tasks.

Sentiment analysis of the Urdu language is as essential as it is in other languages, as it assists non-Urdu speakers in grasping the basic feelings, emotions, and opinions of a user behind the text. Urdu is the national and official language of Pakistan and a commonly spoken medium in many states of India.1 On social media, many native Urdu speakers use Urdu script on platforms such as Twitter, Facebook, and YouTube to express their emotions, feelings, and opinions. As a result, it is important to analyze Urdu text to understand the opinions and feelings of native Urdu speakers.

There are many problems with Urdu sentiment analysis, such as a shortage of recognized lexical resources [9]–[11].

The associate editor coordinating the review of this manuscript and approving it for publication was Hao Ji.
1 https://www.ethnologue.com/language/urd

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 9, 2021 97803
L. Khan et al.: Urdu Sentiment Analysis With Deep Learning Methods

Mostly, Urdu websites are developed in a descriptive arrangement rather than a proper text encoding structure; due to this hurdle, it is challenging to create a benchmark corpus in Urdu. Urdu sentiment analysis has not yet been investigated completely despite the considerable use of the language; most of the existing studies focus on different aspects of language processing [12], [13].

In this paper, the primary focus is to contribute a benchmark corpus for Urdu sentiment analysis. Our corpus is known as the Urdu Corpus for Sentiment Analysis (UCSA). This new dataset and the accompanying experiments provide a benchmark enabling further research in sentiment analysis in the Urdu language.

The main contributions of this research are as follows:
• A new sentiment analysis corpus in Urdu is collected that contains user reviews about various services: products, games, and politics. It is manually annotated by experts following a set of guidelines (publicly available; see a link below);
• We provide baseline results for the state-of-the-art machine learning (RF, NB, SVM, AdaBoost, MLP, LR) and deep learning (1D-CNN, LSTM) models on our UCSA corpus using two text representations: word n-gram features and fastText pre-trained word embeddings;
• To the best of our knowledge, no research study shows the use of deep learning models with pre-trained word embedding models for Urdu sentiment analysis; therefore, we studied the effectiveness of word embedding models in resource-deprived languages such as Urdu.

Our corpus UCSA is publicly available.2

The rest of the paper is organized as follows. Section II presents the background and related work. Section III describes the corpus collection details. Section IV presents the methodology of the paper. Section V analyzes the experimental setting and results. Finally, Section VI concludes the paper.

2 http://ieee-dataport.org/documents/urdu-corpus-sentiment-analysis; last visited: 20-06-2021

II. BACKGROUND AND RELATED WORK
In this section, we discuss well-known datasets as well as machine and deep learning techniques for sentiment analysis.

A. SENTIMENT ANALYSIS DATASETS AND TECHNIQUES
Among efforts to create benchmark datasets for sentiment analysis, the SemEval contests are considered one of the most noticeable in the literature. In the series of SemEval competitions on sentiment analysis, researchers performed distinct tasks using different datasets. These datasets were developed in Arabic and English [14]. Generally, these datasets contain user tweets from Twitter that are related to different products such as laptops, TVs, and mobiles. The 2013 edition of the SemEval corpus consists of Twitter and SMS data; the tweets were divided into three sets: training (9,728), development (1,654), and test (3,813), while the SMS messages, 2,093 in total, were used for testing only. Similarly, the 2014 version of the SemEval Twitter dataset contains 1,853 user tweets and 1,142 LiveJournal news [15]. The 2016 and 2017 versions of the SemEval datasets were split into training, development, and test sets for each subtask [16]. In this edition there were five subtasks: A, B, C, D, and E.

In addition to the SemEval efforts, Korean, German, and Indonesian languages have also been investigated for sentiment analysis. A Korean dataset (KOSAC) was created that contains 332 news articles. Its primary aim was to examine sentiments in Korean, and the Korean subjectivity markup language was used to annotate the dataset [17]. Another dataset was developed that contains customer reviews about various Amazon products [18]. An Amazon review parser was used for the dataset collection. Human experts annotated each review according to its semantic meaning. A total of 63,067 reviews were collected about different products. Another effort was made to develop an Indonesian corpus. The Twitter Streaming API was used to collect this dataset, and geolocation was used to collect only Indonesian dialect tweets. The Indonesian dataset contains 5.3 million tweets [19].

Recently, deep learning methods were implemented to investigate text representations and to overcome the problem of sentiment classification on large social network datasets [20]–[22]. In addition, improved word vectors (IWVs) were recommended for word embedding because of their higher performance in the domain of sentiment analysis [23].

A few studies were performed on the sentiment analysis of social network data to support intelligent transportation systems [24]–[26]. Data were gathered from various social networking sites such as Facebook, Twitter, and TripAdvisor. They achieved an accuracy of 93% on their sentiment analysis dataset. In addition, based on social network data, a real-time observation framework was suggested to detect traffic accidents and analyze traffic conditions by using BiLSTM [26]. They achieved an accuracy of 97% for traffic event detection analysis.

B. URDU DATASETS FOR SENTIMENT ANALYSIS
Although a considerable quantity of data is available on the internet, research on sentiment analysis in Urdu is still at an initial level compared to resource-rich languages such as English. A large quantity of data is required to create a benchmark dataset for sentiment analysis. The drawbacks of existing corpora are that they are too small or contain data about limited genres.

In the first study [27], the authors collected user reviews to create two corpora to evaluate the efficiency of their models. The first corpus contains 322 positive and 328 negative movie reviews. The second corpus contains reviews about electronic appliances. This dataset contains 650 user reviews, of which 322 are positive and 328 are negative. In this study, they used a grammar-based approach and focused on the grammatical structure of sentences. They achieved 82.5%


accuracy on their best model. There are several problems with their dataset: they did not mention any data annotation technique, and their corpus is not publicly available. In another study [28], the authors extracted Urdu text on a particular topic from Urdu news websites such as BBC Urdu news and Dawn news for corpus generation. The authors used a lexicon-based architecture and assigned polarity to each token according to its sentiment. To evaluate the model, they performed experiments on only 124 comments, which they extracted from different websites. The lexicon-based model reached an overall accuracy of 66%. The most significant effort to build an Urdu sentiment analysis corpus was made by the authors of study [27]. This study began with a collection of Urdu blogs of different genres. A total of 6,025 Urdu sentences were gathered from 151 different blogs. Three human experts annotated the collected sentences into positive, negative, and neutral classes. After applying basic pre-processing techniques such as stop word removal, the authors used the LIBSVM library to implement decision tree (DT) and k-nearest neighbors (k-NN) algorithms for classification. They achieved the highest accuracy of 67.01% with the k-NN classifier. The corpus used in this study is not publicly available.

Note that Urdu is a resource-deprived language, linguistically and technically. According to the existing literature, many of the procedures applicable to sentiment analysis of other languages are not relevant to the Urdu language due to its morphological structure [29], [30]. Additionally, the shortage of linguistic resources such as lexicons and corpora makes it difficult to apply the sentiment analysis methods currently available in the literature. Moreover, accessible annotated datasets are not sufficient for implementing useful sentiment analysis. In addition, datasets and sentences generally belong to the same or limited genres. To reduce this deficiency, this study emphasizes building an Urdu dataset containing sentences from six different domains. We implemented machine learning and deep learning models on our constructed corpus, UCSA; these models have not yet been studied fully for the sentiment analysis of Urdu data.

III. BUILDING THE DATASET
This section describes the procedures to create an annotated Urdu dataset for sentiment analysis. The stages of building the Urdu corpus are collecting user reviews from the internet, preparing annotation rules, manual annotation, and producing the final version of the corpus.

A. COLLECTING REVIEWS FROM THE INTERNET
To build a benchmark dataset for Urdu sentiment analysis, user reviews about various services, products, games, and politics were collected from different websites that allow users to post their views in Urdu. Urdu is a resource-deprived language; therefore, the authors decided to collect data about different genres from easily accessible internet repositories to construct a standard Urdu language text corpus. The consumer reviews contain information about politics, movies, Urdu drama, TV talk shows, and sports. Four individuals were hired for manual data collection. They were native Urdu speakers, and it took 3 months to collect the raw data. Initially, the data was gathered in an Excel sheet.

FIGURE 1. Dataset examples with their translation in English and roman Urdu.

B. ANNOTATIONS GUIDELINES
This section explains the annotation process that the authors used in manual corpus generation. This step includes preparing the rules or guidelines for annotation and the manual annotation of the complete dataset by native Urdu speakers. We designed the rules for sentiment annotation from the existing literature. Figure 1 shows examples of user reviews belonging to the positive and negative classes.
• A sentence is labeled as positive if it conveys an overall positive sentiment, if it expresses both positive and neutral sentiment, or if it contains agreement or approval [31], [32];
• Sentences with words such as congratulations and admiration were also marked as positive [32];
• A sentence is labeled as negative if it conveys an overall negative sentiment or if it has more negative words than words of other sentiments [19];
• If a sentence shows any disagreement, then the sentence is classified as negative [32];
• If a sentence has terms such as ban, penalizing, and assessing, it is labeled negative [32];
• If a sentence comprises a negative word with a positive adjective, it is classified as negative [33].

C. DATASET STATISTICS
The Urdu dataset was manually annotated by three human experts (X, Y, and Z) to create a benchmark dataset, hereafter named the Urdu Corpus for Sentiment Analysis (UCSA). Native Urdu speakers annotated all the user reviews, and all held master's degrees in the Urdu language. The annotators were familiar with sentiment analysis and with the annotation rules discussed above. Experts X and Y annotated each sentence as either positive or negative, following the above-discussed rules. Conflicts between X and Y were resolved by Z, who labeled the disputed review. We obtained an

inter-annotator agreement (IAA) of 73.91% and a Cohen's kappa score of 59.7% (moderate) on our UCSA dataset. The IAA and the moderate kappa score indicate that the annotators followed the annotation guidelines during the labeling phase. UCSA contains 9,601 user reviews, of which 4,843 are positive and the remaining 4,758 are negative, as shown in Table 1. The statistics in Table 1 show that our corpus is class balanced.

TABLE 1. Statistics of dataset.

Very few scholars in the existing literature have made efforts to create datasets for carrying out experiments. Moreover, most of the currently available datasets are very small and cover one or very few genres rather than a range of genres. The corpora in [19] and [27] are small and contain user reviews from specific fields.
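The two agreement figures reported for the annotators, raw agreement and Cohen's kappa, can be illustrated with a short sketch. The label lists below are hypothetical toy data, not the UCSA annotations, and the function names are chosen here for illustration only; the paper does not state which tool was used to compute these scores.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which two annotators assign the same label."""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_observed = percent_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both annotators pick the same
    # label if each labels at random according to their own label frequencies.
    p_chance = sum(
        counts_a[label] * counts_b[label]
        for label in set(labels_a) | set(labels_b)
    ) / (n * n)
    if p_chance == 1.0:  # both annotators always use one identical label
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical annotations from two annotators (toy data only).
annotator_x = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
annotator_y = ["pos", "neg", "neg", "neg", "pos", "pos", "pos", "neg", "pos", "neg"]

print(percent_agreement(annotator_x, annotator_y))
print(cohen_kappa(annotator_x, annotator_y))
```

Kappa is lower than raw agreement because, with two roughly balanced classes, about half of the matches would be expected by chance alone.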

IV. METHODOLOGY
This section focuses on the experimental details of our machine learning and deep learning models: the support vector machine (SVM), naïve Bayes (NB), random forest (RF), AdaBoost, multilayer perceptron (MLP), logistic regression (LR), 1-dimensional convolutional neural network (1D-CNN), and long short-term memory (LSTM). All these machine and deep learning models have been implemented on our proposed UCSA corpus. Figure 2 represents the overall architecture of the system.

FIGURE 2. High-level system architecture for Urdu sentiment analysis.

A. PREPROCESSING
The preprocessing of Urdu text is essential to make it easy and useful for NLP tasks. To enhance our model’s accuracy, emojis, URLs, email addresses, phone numbers, numbers, currency symbols, and punctuation marks were removed. Additionally, the following text preprocessing steps were performed to increase our model’s effectiveness for Urdu text.

1) STOP WORDS
The words used to complete sentences are called stop words. Words such as ‘‘ ’’ and ‘‘ ’’ are commonly used words in Urdu. We removed these words from our corpus. Nevertheless, due to the Urdu language’s morphological structure and poor resources, it is challenging to remove stop words automatically. Figure 3 shows the flowchart of the Urdu stop word removal steps. All commonly used Urdu stop words were collected in a file, and then all of them were eliminated from the corpus.

2) NORMALIZATION
The normalization of Urdu text is essential to make it usable for NLP-related tasks. This step is used to solve the issue of correct encoding for the Urdu characters. Normalization is used to bring all the characters into the required Unicode range (0600-06FF) for Urdu text. This step is also used to avoid the concatenation of different Urdu words. For example, ‘‘ ’’ is one word (unigram) with two different strings. These two strings (khush and bash) are part of the same word concerning syntax and semantics. If the space between the two strings is omitted, then we obtain ‘‘ ’’ which is an incorrect word in the Urdu language. With the help of normalization, the authors attempt to minimize this effect. We used the UrduHack library for this task.3

3 https://pypi.org/project/urduhack/; last visited: 20-06-2021

B. N-GRAM FEATURES
In natural language processing tasks such as text classification, the text is generally denoted as a vector of weighted features. In this study, different n-gram models are used; these are the models


that allocate probabilities to a sequence of words. An n-gram is a sequence of n words; a unigram is a model that contains a sequence of one word such as ‘‘homework’’; similarly, a bigram is a sequence of two words such as ‘‘your homework’’ and a trigram model contains a sequence of three words such as ‘‘complete your homework’’. We explored n-gram features such as unigram, bigram and trigram on our dataset.

FIGURE 3. Flowchart of the Urdu stop words removal module.

C. PRE-TRAINED WORD EMBEDDINGS
Recently, pre-trained word vector models have been applied in many natural language processing tasks and have shown state-of-the-art results. The basic concept behind these pre-trained models is to train them on very large corpora and fine-tune them for specific tasks. fastText [33] is a word vector model trained on Wikipedia and Common Crawl datasets. This model is trained for a total of 157 languages, including Urdu. This is the motive behind using the fastText word embedding model for this task with deep learning models. The fastText model was trained using skip-gram and continuous bag of words (CBOW) [34], [35]. The fastText model is an extension of skip-gram that breaks down the unigrams (words) into bags of character n-grams (sub-words) and allocates a vector value to individual character n-grams. Therefore, each single word is represented by the summation of its related n-gram vectors.

D. CLASSIFICATION MODELS
Various machine and deep learning models, namely, SVM, NB, RF, AdaBoost, MLP, LR, 1D-CNN, and LSTM are used to find the effectiveness of our corpus and achieve state-of-the-art results. We do not explain the conventional machine learning models here because these models are prevalent and well known. Two deep learning classifiers, 1D-CNN and LSTM, were applied to find the best performing classifier on our dataset. The Keras neural-network library4 was used to implement 1D-CNN and LSTM as state-of-the-art baseline approaches for the comprehensive evaluation of sentiment analysis on our dataset.

4 https://keras.io/

Primarily, 1D-CNN is used for computer vision; however, it performs well on classification tasks in the natural language processing domain. A 1D-CNN is extremely capable when you expect to acquire new attributes from short fixed-length chunks of the overall data set and where the placement of the feature is not of relevance [36]–[38]. LSTM [39] is a recurrent neural network architecture and shows state-of-the-art results for sequential data. Basically, LSTM is designed to capture the long-term dependencies in text data. At each time step, the LSTM model takes the current word and the output from the previous word as input and produces an output, which is fed to the next state. The hidden layer from the last state (and sometimes all hidden layers) is then used for classification. The high-level system architecture of an LSTM network with fastText embedding is shown in Figure 4. A typical LSTM network contains four main components: input gate, forget gate, memory cell, and output gate. Basically, these gates control the flow of data in and out at the current time step. The working of an LSTM is divided into three parts as follows:

FIGURE 4. Proposed system architecture of an LSTM network with a fastText word embedding layer for Urdu sentiment classification.

1) STEP 1
In the first step, the LSTM identifies insignificant information and removes it from the cell. A sigmoid layer performs this identification and elimination: taking the output of the last LSTM unit $h_{t-1}$ at time $t-1$ and the current input $X_t$ at time $t$, the sigmoid function determines which part of the old output should be removed. Its output, in the range between 0 and 1, is stored in the vector $f_t$ for every cell state $Y_{t-1}$. Based on this output, the sigmoid function decides which information is kept or discarded.

$f_t = \sigma(W_f [h_{t-1}, X_t] + b_f)$ (1)


TABLE 2. Urdu sentiment analysis results using machine learning models with word N-gram features.

TABLE 3. Urdu sentiment analysis results using deep learning models with pre-trained word embeddings.

Here $\sigma$ represents the sigmoid function in Eq. (1), while $W_f$ and $b_f$ specify the weight matrix and bias, respectively, of the forget state.

2) STEP 2
In step 2, we store the new input $X_t$ and update the cell state. Two actions are executed: one by a sigmoid layer and the other by a tanh layer. The sigmoid layer decides which information needs to be updated or discarded, while the tanh layer allocates weights to the passing values. These values are then multiplied to update the cell state, and the new memory is added to the old memory $Y_{t-1}$, which results in $Y_t$ [2].

$i_t = \sigma(W_i [h_{t-1}, X_t] + b_i)$, (2)
$N_t = \tanh(W_n [h_{t-1}, X_t] + b_n)$, (3)
$Y_t = Y_{t-1} \ast f_t + N_t \ast i_t$. (4)

TABLE 4. Comparison with existing state-of-the-art results.

FIGURE 5. Performance comparison of machine learning models using unigram features.

FIGURE 6. Performance comparison of machine learning models using bigram features.

where $Y_{t-1}$ and $Y_t$ are the cell states at times $t-1$ and $t$, while $W$ and $b$ represent the weight matrices and biases of the cell state.

3) STEP 3
The last step produces the output values $h_t$, which are based on the cell state $Y_t$ but in a filtered form. To create the output, a sigmoid layer chooses which part of the cell state to emit. The output of this sigmoid gate, written $o_t$ here to keep it distinct from the cell state, is then multiplied by the values that the tanh layer produces from the cell state $Y_t$:

$o_t = \sigma(W_o [h_{t-1}, X_t] + b_o)$ (5)
$h_t = o_t \ast \tanh(Y_t)$ (6)

$W_o$ and $b_o$ denote the weight matrix and bias of the output gate.
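The three steps can be put together as one LSTM time step. The following is a minimal plain-Python sketch for illustration only, not the Keras implementation used in the experiments. The names `lstm_step`, `affine`, and `params` are assumptions made here, `y` denotes the cell state, and `o` denotes the output of the sigmoid gate in Step 3.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, vector, bias):
    """Compute W v + b, with W given as a list of rows."""
    return [
        sum(w * v for w, v in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


def lstm_step(x_t, h_prev, y_prev, params):
    """One LSTM time step following Eqs. (1)-(6); y is the cell state."""
    v = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, X_t]
    f = [sigmoid(z) for z in affine(params["Wf"], v, params["bf"])]    # Eq. (1)
    i = [sigmoid(z) for z in affine(params["Wi"], v, params["bi"])]    # Eq. (2)
    n = [math.tanh(z) for z in affine(params["Wn"], v, params["bn"])]  # Eq. (3)
    # Eq. (4): keep part of the old memory and add the gated new memory.
    y = [yp * ft + nt * it for yp, ft, nt, it in zip(y_prev, f, n, i)]
    o = [sigmoid(z) for z in affine(params["Wo"], v, params["bo"])]    # Eq. (5)
    h = [ot * math.tanh(yt) for ot, yt in zip(o, y)]                   # Eq. (6)
    return h, y


# Toy usage: hidden size 2, input size 3, all weights and biases zero,
# so every sigmoid gate outputs 0.5 and the candidate memory is 0.
hidden, inputs = 2, 3
zero_w = [[0.0] * (hidden + inputs) for _ in range(hidden)]
zero_b = [0.0] * hidden
params = {"Wf": zero_w, "bf": zero_b, "Wi": zero_w, "bi": zero_b,
          "Wn": zero_w, "bn": zero_b, "Wo": zero_w, "bo": zero_b}
h_t, y_t = lstm_step([0.1, 0.2, 0.3], [0.0, 0.0], [1.0, -2.0], params)
print(h_t, y_t)
```

With zero weights the forget gate halves the previous cell state, which makes the role of each equation easy to check by hand; a trained network would of course use learned weights.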


FIGURE 7. Performance comparison of machine learning models using trigram features.

FIGURE 8. Performance comparison of machine learning models using combination (1-2) of features.

FIGURE 9. Performance comparison of machine learning models using combination (1-3) of features.
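The feature sets named in these captions (unigram, bigram, trigram, and the combinations 1-2 and 1-3) can be illustrated with a small sketch. It assumes that a combination such as 1-3 means all word n-grams from unigrams through trigrams, which is the usual reading but is not spelled out in the captions. The helper names `word_ngrams` and `count_vector` are hypothetical, and the sketch is plain Python rather than the feature extractor used in the experiments.

```python
from collections import Counter


def word_ngrams(tokens, n_min=1, n_max=3):
    """Return all word n-grams with n_min <= n <= n_max, in order of n."""
    grams = []
    for n in range(n_min, n_max + 1):
        for start in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[start:start + n]))
    return grams


def count_vector(tokens, vocabulary, n_min=1, n_max=3):
    """Count-based feature vector: one n-gram count per vocabulary entry."""
    counts = Counter(word_ngrams(tokens, n_min, n_max))
    return [counts[gram] for gram in vocabulary]


tokens = "complete your homework".split()
print(word_ngrams(tokens, 1, 1))  # unigram features
print(word_ngrams(tokens, 2, 2))  # bigram features
print(word_ngrams(tokens, 1, 3))  # combination (1-3) of features
```

A classifier such as LR would then be trained on vectors like those produced by `count_vector`, with the vocabulary built from the training reviews.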

E. EVALUATION MEASURES
We evaluate the effectiveness of our sentiment analysis models using Recall (R), Precision (P), and F1-measure. The mathematical equations are as follows:

$\text{Precision} = \frac{TP}{TP + FP}$,
$\text{Recall} = \frac{TP}{TP + FN}$,
$F_1 = \frac{2 \times P \times R}{P + R}$,

where TP and FP stand for true positive and false positive, and FN stands for false negative.

V. EXPERIMENTAL SETTINGS AND RESULTS
We performed our experiments on UCSA, which is publicly available to the research community. UCSA contains 9,601 Urdu reviews, which belong to politics, dramas, movies, TV talk shows, sports, and software domains. The dataset is split into training, which contains 80% of user reviews, and testing, which contains 20%. In all the experiments for machine learning models we used default


parameters. For deep learning algorithms, we used mean accuracy has been achieved for Urdu sentiment analysis using
square error (MSE) as a loss function, Adam as an optimizer. various machine and deep learning models. After perform-
We set the number of epochs to 25. ing various experiments based on two text representations:
n-gram features and pre-trained word embeddings, we achieve
A. RESULT AND DISCUSSION the highest F1 score of 82.05% using LR with combination of
Each of the six machine learning classifiers is run on the features. The SVM classifier is the second highest performer
UCSA dataset using word n-gram features. All the revealed for this task and its average performance is better than all
results are carefully examined to improve the results and iden- other classifiers. This study open a new domain for future
tify the finest machine learning classifier with features that researchers to explore resource-deprived languages. One of
achieve better results than the others concerning the accuracy, the limitations of this study is that it includes only positive
precision, recall, and F1 score. By witnessing the Table 2 and negative classes; our future work will include a neutral
results, all the machine learning classifiers’ performances class in our dataset. In the future, we will also include
are quite poor with the trigram feature. Generally, there are state-of-the-art classifiers in the benchmark techniques such
discriminative models (SVM, LR, etc.) and generative classi- as BERT.
fication models (NB).
Both SVM and LR achieve satisfactory results, as both REFERENCES
classifiers belong to discriminative models. Logistic regres- [1] S.-U. Hassan, A. Akram, and P. Haddawy, ‘‘Identifying important citations
sion is a supervised machine learning algorithm that is used using contextual information from full text,’’ in Proc. ACM/IEEE Joint
Conf. Digit. Libraries (JCDL), Jun. 2017, pp. 1–8.
when problems are categorical in nature and it is the most [2] Y. Liu, F. Du, J. Sun, T. Silva, Y. Jiang, and T. Zhu, ‘‘Identifying social roles
commonly used classifier when the data have two classes, using heterogeneous features in online social networks,’’ J. Assoc. Inf. Sci.
either positive or negative. Overall, the highest accuracy Technol., vol. 70, pp. 660–674, Mar. 2019.
[3] Z. Luo, S. Huang, and K. Q. Zhu, ‘‘Knowledge empowered prominent
of 81.94%, precision of 79.95%, recall of 84.26%, and F1 aspect extraction from product reviews,’’ Inf. Process. Manage., vol. 56,
score of 82.05% were achieved by LR with the combination no. 3, pp. 408–423, May 2019.
of n-gram features. The SVM classifier achieves the second [4] F. Anwaar, N. Iltaf, H. Afzal, and R. Nawaz, ‘‘HRS-CE: A hybrid frame-
work to integrate content embeddings in recommender systems for cold
highest accuracy, precision, recall and F1 score, which were start items,’’ J. Comput. Sci., vol. 29, pp. 9–18, Nov. 2018.
81.47%, 80.32%, 82.36%, and 81.47%, respectively, with the [5] R. Nawaz, P. Thompson, and S. Ananiadou, ‘‘Identification of manner in
unigram feature. bio-events,’’ in Proc. LREC, 2012, pp. 3505–3510.
[6] H. Qadir, O. Khalid, M. U. S. Khan, A. U. R. Khan, and R. Nawaz,
The worst accuracy out of all classifiers was 55.25% gain ‘‘An optimal ride sharing recommendation framework for carpooling ser-
by RF with trigram features. All classifiers perform better vices,’’ IEEE Access, vol. 6, pp. 62296–62313, 2018.
with bigram features as compared to trigram features. The [7] M. Z. Asghar, A. Sattar, A. Khan, A. Ali, F. M. Kundi, and S. Ahmad,
‘‘Creating sentiment lexicon for sentiment analysis in Urdu: The case
overall results using different machine learning models with of a resource-poor language,’’ Expert Syst., vol. 36, no. 3, Jun. 2019,
different features shown in Table 2. Figures 5, 6, 7, 8 and 9 Art. no. e12397.
describe the comparison of each model in terms of accuracy, [8] A. Z. Syed, M. Aslam, and A. M. Martinez-Enriquez, ‘‘Lexicon based
sentiment analysis of Urdu text using sentiunits,’’ in Proc. Mexican Int.
precision, recall and F1 measure with word n-gram features. Conf. Artif. Intell. Berlin, Germany: Springer, 2010, pp. 32–43.
Table 3 presents the results of the deep learning models on our dataset. LSTM achieves slightly better accuracy than the 1D-CNN model: 75.96% for LSTM and 75.73% for 1D-CNN. The deep learning results are slightly lower than those of the machine learning models because some of the words are out of vocabulary in the fastText pre-trained model. Overall, for both machine learning and deep learning, our results are in line with state-of-the-art results.
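
The out-of-vocabulary effect mentioned above can be quantified before training. The sketch below measures the share of corpus tokens that have no entry in a pre-trained embedding vocabulary. The function name `oov_rate`, the toy reviews, and the toy vocabulary are hypothetical; in practice the vocabulary would be the set of words that have vectors in the pre-trained model.

```python
def oov_rate(tokenized_reviews, embedding_vocab):
    """Return the fraction of tokens that are missing from embedding_vocab."""
    total = 0
    missing = 0
    for tokens in tokenized_reviews:
        for token in tokens:
            total += 1
            if token not in embedding_vocab:
                missing += 1
    return missing / total if total else 0.0


if __name__ == "__main__":
    # Toy data for illustration only (not the paper's corpus or vocabulary).
    reviews = [
        ["یہ", "فلم", "بہت", "اچھی", "ہے"],
        ["یہ", "کھانا", "اچھا", "نہیں", "ہے"],
    ]
    vocab = {"یہ", "فلم", "بہت", "ہے", "نہیں"}
    print(f"OOV rate: {oov_rate(reviews, vocab):.1%}")
```

A high rate means that many input words are mapped to a zero or random vector, which leaves the network less lexical information than the n-gram classifiers see.
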
As previously stated, little research has applied machine learning algorithms to Urdu sentiment analysis. The few existing studies used different machine learning classifiers on very small datasets. Our dataset contains more user reviews than those of previous studies. The results of our study reveal that each of our models performs better than existing models. A comparison of our study with existing studies is presented in Table 4.
VI. CONCLUSION AND FUTURE WORK
Few research studies have been reported in the Urdu sentiment analysis domain. In this paper, high classification


LAL KHAN was born in D. G. Khan, Punjab, Pakistan, in 1990. He received the M.S. degree in computer science from the Federal Urdu University of Arts, Science and Technology, Islamabad, in 2017. He is currently a Ph.D. Scholar with the Department of Computer Science and Information Engineering, Chang Gung University, Taiwan. He is also working on NLP tasks for resource-deprived languages. His research interests include machine learning, deep learning, natural language processing (NLP), and speech recognition.

AMMAR AMJAD received the master's degree in computer science from the National College of Business Administration and Economics, in March 2017. He is currently pursuing the Ph.D. degree in electrical engineering with the Division of Computer Science and Information Engineering, Chang Gung University, Taiwan. His main research interests include speech processing, language learning, speech analysis, speech synthesis, voice pathologies, auditory neuroscience, and machine learning.

NOMAN ASHRAF received the master's degree in computer science from the National University of Computer and Emerging Sciences, Islamabad, Pakistan. He worked as a Lecturer with The University of Lahore, Pakistan, from 2017 to 2019. He is currently a Ph.D. Scholar with the Centro de Investigación en Computación, Instituto Politécnico Nacional (IPN). His research interests include natural language processing (NLP), machine learning, and deep learning.

HSIEN-TSUNG CHANG (Member, IEEE) received the M.S. and Ph.D. degrees from the Department of Computer Science and Information (CSIE), National Chung Cheng University, in July 2000 and July 2007, respectively. He joined the Faculty of the Department of Computer Science and Information Engineering, Chang Gung University, and served as an Associate Professor. He is currently a member of the Artificial Intelligence Research Center, Chang Gung University, and the Department of Physical Medicine and Rehabilitation, Chang Gung Memorial Hospital. He is also the Director of the Web Information and Data Engineering Laboratory (WIDE Lab). His research interests include artificial intelligence, natural language processing, information retrieval, big data, web services, and search engines.

ALEXANDER GELBUKH is currently a Research Professor and the Head of the Natural Language Processing Laboratory, Center for Computing Research, Instituto Politécnico Nacional, Mexico, and an Honorary Professor of Amity University, India. He has authored or coauthored more than 500 publications in computational linguistics, natural language processing, and artificial intelligence, recently with a focus on sentiment analysis and opinion mining. He is a member of the Mexican Academy of Sciences, a Founding Member of the Mexican Academy of Computing, and a National Researcher of Mexico (SNI) at excellence level 3 (highest). He is the Editor-in-Chief, an associate editor, or an editorial board member of more than 20 international journals, and has been the chair or the program committee chair of over 50 international conferences.
