Sentiment Analysis Using CATAtools
3 authors, including:
Saroj Date
Jawaharlal Nehru Engineering College
All content following this page was uploaded by Saroj Date on 01 May 2023.
Abstract. Recently, the use of computerized text analysis tools to assess an
individual’s linguistic, emotional and psychological characteristics has grown rapidly
in the field of empirical psychology. As a result, information about what people
convey through their words can be swiftly and reliably extracted and analyzed. The
key purpose of this research work is to analyze text data to assess linguistic and
emotional characteristics with the help of computer-assisted text analysis tools.
The analysis employed two widely available text and sentiment analysis tools, Empath
and LIWC. Children’s storybook reviews, written by children for children, served as
the text data. Empath and LIWC helped to measure the reviewers’ sentiment, analytical
ability and cognition level. Finally, from the Pearson correlation coefficients
calculated for the selected variables, it is inferred that Empath correlates highly
with LIWC.
1 Introduction
An emerging area of research in computational social science and human-computer
interaction uses technology to interpret the emotions and sentiments expressed in
natural language. For example, scholarly writing analysis has been a subject of active
cognitive research for more than two decades. Its techniques range from time-consuming
traditional methods to automated text analysis tools. To extract high-quality
information from text, such as cognition level, communication, authenticity, tone and
analytical ability, the automated techniques combine linguistic, statistical and machine
learning methods. For automated text analysis, researchers collect data from common
sources such as social media, product reviews, news articles and blogs.
As social media users have access to ever larger and more diverse data from the
Internet, it becomes important to scale our ability to conduct such analyses with breadth
and accuracy. In this paper, reviews written for children’s storybooks are collected to
perform sentiment analysis tasks. These reviews were obtained from a website and were
written by children of different age groups. The collected reviews are processed and
analyzed with the help of two computer-assisted text analysis (CATA) tools, LIWC and Empath.
• One of the most basic approaches to text analysis is the traditional approach. It
involves judgment-based content analysis by human beings. Human judges, such as
subject matter experts, study and categorize a sample of textual data based on content
similarities.
• Basic computer-based automation techniques are widely used to analyze text data.
These techniques simply count word frequencies and categorize the data. They may
also display the results in visual forms such as word clouds.
• Another popular computerized technique for finding the sentiment of text data is the use
of existing data dictionaries. In addition, researchers can design their own dictionaries
to measure certain constructs and employ them for content analysis. This is similar to a
keyword search method in which the researcher identifies all relevant phrases related
to a construct and then searches the texts for these words. Some dictionaries for
sentiment classification, such as LIWC and DICTION, are publicly available.
• Advanced Natural Language Processing (NLP) tools and techniques, such as latent
Dirichlet allocation (LDA) and latent semantic analysis (LSA), can be employed to
identify the language constructs that exist in a corpus. These techniques may be combined
with data dictionaries and basic automation.
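The dictionary-based approach in the list above can be illustrated with a minimal Python sketch. The word lists, the category names in TOY_DICTIONARY and the helper names are assumptions made for illustration only; they are not the LIWC, DICTION or Empath lexicons, and real tools apply considerably richer matching rules.

```python
import re
from collections import Counter

# Hypothetical toy dictionary (category -> word set); NOT a real LIWC/DICTION lexicon.
TOY_DICTIONARY = {
    "positive_emotion": {"happy", "love", "fun", "great"},
    "negative_emotion": {"sad", "boring", "scary", "bad"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_scores(text, dictionary=TOY_DICTIONARY):
    """Return, per category, the share of tokens found in that category's word set."""
    tokens = tokenize(text)
    counts = Counter(tokens)  # basic automation step: word frequencies
    total = len(tokens)
    if total == 0:
        return {category: 0.0 for category in dictionary}
    return {
        category: sum(counts[word] for word in words) / total
        for category, words in dictionary.items()
    }


scores = category_scores("I love this book. It is fun, but the ending is sad.")
print(scores)
```

Dividing by the total token count makes scores comparable across texts of different lengths, which is one common design choice for dictionary-based measures.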
Several software packages are available for analyzing text data, such as NVivo, Linguistic
Inquiry and Word Count (LIWC), Atlas.ti, DICTION, Cat Scanner, MonoConc Pro, General
Inquirer (GI), Empath, WordStat, Leximancer, Textpak, Textual Analysis Computing Tools
(TACT) and Automap.
In this paper, the LIWC and Empath tools are used to analyze the text data. Empath is
a modern text analysis tool that enables researchers to build and test new lexical
categories on demand by combining machine learning approaches and crowdsourcing.
LIWC is software that analyzes a text corpus by counting words in lexical
categories. It analyzes social, cognitive, emotional and other psychological dimensions
of written text. Its significant features include fast data processing, easy
interpretation of results and an extensively validated dictionary.
2 Literature Survey
The significant research work carried out in the domain of sentiment analysis which uses
computer-assisted text analysis (CATA) tools is presented in this section. Researchers
of this domain used CATA for emotion/sentiment analysis. As per the survey, there is a
Sentiment Analysis Using Computer-Assisted Text Analysis Tools 673
considerable increase in the usage of this type of software for sentiment analysis as well
as evaluating linguistic and psychological properties of human language.
Over the last two decades, Emotion/Sentiment Analysis from text has been regarded
as a demanding and interesting task. Vaibhav Tripathi et al. conducted an extensive
survey on the computational analysis of emotions. They compiled a list of the various
methodologies, datasets, and resources for sentiment analysis [1].
Zachary Dau reported the outcomes of a computational analysis of Twitter activity by
politicians. He found a significant increase in Twitter activity throughout the pandemic
[2]. In the field of Human Resource Management, Emily D. Campion and Michael
A. Campion employed computer-assisted text analysis [3].
Von Selasinsky and Isaak used computer-aided text analysis to investigate crowdfunding
success factors. They examined textual information from video subtitles as well as project
titles and descriptions. They found that including subtitle information increases the
variance explained by the respective models and, as a result, their predictive potential for
financing success [4]. Bumsoo Kim used computer-assisted text analysis to determine
which social grooming characteristics diminish incivility among social media users when
discussing or posting about the COVID-19 situation in South Korea. According to the
findings, the size of one’s social network is a negative predictor of civility [5].
Maverick Ferreira et al. demonstrated a method for automatically classifying the
content of messages in online discussions. To accomplish this task, they presented a
method based on a mix of classic text mining features and word counts retrieved
using proven linguistic frameworks [6]. Shihab Elbagir and Jing Yang classified sentiments
conveyed in Twitter data using the Valence Aware Dictionary and sEntiment Reasoner
(VADER) [7].
For sentiment analysis of the most “Fanned” Facebook Pages, Alan R. Peslak
employed LIWC, the most widely explored method [8]. Saifuddin Ahmed et al. studied
whether online protest activities have the same emotional underpinnings as offline
protest actions for sustaining and nourishing a social movement, and how these emotions
alter across different stages of the social movement [9]. Ryan L. Boyd highlighted how
language may reveal profound insights into the minds of others using well-established
and straightforward psychometric approaches [10].
3 Methodology
In academic research, text data analysis using various approaches has a long history.
The primary goal of this research work is to analyze text data with the help of modern
computer-assisted text analysis tools. As stated in Sect. 1, the tools under
consideration are LIWC and Empath. Both Empath and LIWC analyses are driven by
dictionary-based word counts. Empath works on a broader range of categories and can build
and validate new categories on demand using unsupervised language modeling, whereas LIWC
provides a highly validated dictionary for analysis. The results obtained are discussed
in Sect. 4 of this paper. To accomplish this task, the domain of scholarly writing
analysis is considered. One of the major characteristics of scholarly writing is that it
should be written in concise statements and should show an understanding of the topic.
674 S. S. Date et al.
For this work, reviews of children’s storybooks were collected from a website. These
are book reviews written by children for children. The reviews were then processed
by custom-written Python scripts that transformed the text into a form that could be
analyzed using Empath and LIWC. Empath and LIWC produce hundreds of measures as
output. For this study, to identify the sentiments of the reviews, we limited the
analysis to three Empath variables and ten LIWC variables. The score values of the
selected Empath and LIWC categories were processed and compared for performance
analysis.
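The custom-written Python scripts are not reproduced in this paper. As a rough illustration of the kind of transformation described, the following sketch cleans one scraped review into plain text. The function name clean_review and the specific cleaning steps (entity decoding, tag removal, Unicode normalization, whitespace collapsing) are assumptions, not the authors’ actual pipeline.

```python
import html
import re
import unicodedata


def clean_review(raw):
    """Turn a scraped review into plain, single-spaced text (hypothetical steps)."""
    text = html.unescape(raw)                    # decode entities such as &amp;
    text = re.sub(r"<[^>]+>", " ", text)         # drop leftover HTML tags
    text = unicodedata.normalize("NFKC", text)   # normalize Unicode compatibility forms
    text = re.sub(r"\s+", " ", text)             # collapse runs of whitespace
    return text.strip()


cleaned = clean_review("<p>I loved this book &amp; its   pictures!</p>\n")
print(cleaned)
```

Text cleaned this way could then be written to one file per review or per age group, whichever input layout the analysis tool expects.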
4 Results
The results of the analysis with the two CATA tools are presented in this section. The
storybook reviews considered for the experiment were written by children of different age
groups for children. The age groups are 8 to 10 years, 10 to 12 years, and 12 years
and above, labeled Group A, Group B and Group C respectively. The analysis of the
different parameters is done group-wise.
[Figure: bar chart over Group A, Group B and Group C; vertical axis from 0 to 300; data labels 255, 153 and 118.]
Some variables common to both CATA tools were evaluated to see how closely the
Empath and LIWC categories are correlated. These variables are communication,
positive_emotion and negative_emotion. Communication refers to the exchange of
information by speaking, writing or using some other medium. A positive emotion is
an emotional reaction that conveys a pleasant feeling, and a negative emotion
conveys an unpleasant feeling. The score values for the communication, positive_emotion
and negative_emotion variables are shown in Fig. 4.
Group-wise sentiment analysis score values are shown in Table 2. These are used
to calculate the Pearson correlation coefficient (PCC), a widely used measure
in research for determining the strength and direction of the linear relation between
two variables. Its value ranges from −1 to 1: −1 signifies a perfect negative linear
correlation, 0 signifies no linear correlation, and +1 signifies a perfect positive
linear correlation.
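For readers who want to reproduce this step, the following is a minimal sketch of the Pearson correlation coefficient computed directly from its definition (the covariance of the two variables divided by the product of their standard deviations). The function name pearson and the sample values are illustrative assumptions; the values are not the scores from Table 2.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of centered products; the common 1/n factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Illustrative placeholder values only, not data from this study.
r_positive = pearson([1, 2, 3], [2, 4, 6])
r_negative = pearson([1, 2, 3], [6, 4, 2])
r_none = pearson([1, 2, 3, 4], [1, -1, -1, 1])
print(r_positive, r_negative, r_none)
```

In this setting, x and y would hold the scores of one common variable (for example positive_emotion) as reported by Empath and by LIWC for the same set of texts.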
In Table 3, Group (ABC) denotes the average of the scores of Groups A, B and
C. A close analysis of the average scores of the common variables shows that the
selected Empath categories are highly correlated with the corresponding LIWC
categories, with an average PCC of r = 0.909. This implies that Empath’s data-driven
word counts are quite close to those of the extensively validated LIWC dictionary.
5 Conclusion
In this paper, we presented the use of Computer-Assisted Text Analysis tools to identify
natural language constructs in order to assess the linguistic and emotional characteristics
of text data. To analyze storybook reviews, Empath and LIWC tools were used. The
analysis indicates that children’s scholarly writing improves linearly with
age. The experimental work shows that Empath has a high correlation with
LIWC. However, the overall observation is that, while Empath covers a broader
set of categories than LIWC, it has no counterparts to LIWC variables such
as Analytic, Clout, Authentic and Tone, which are LIWC’s highly condensed summary
variables. It is also worth noting that Empath can generate and validate new categories
from a few seed words, whereas LIWC’s dictionaries are created and validated rigorously
using semi-automatic approaches.
This work has some limitations. The focus is on analyzing storybook reviews, which
means that the findings may not be applicable beyond this domain. Nonetheless, we
would expect similar results for the same domain. Future researchers can extend this
work by comparing and contrasting the results obtained with other text analysis software.
They can also assess other aspects of LIWC and Empath besides analytical ability,
cognition, communication, positive emotion and negative emotion.
References
1. Tripathi, V., Joshi, A., Bhattacharyya, P.: Emotion analysis from text: A survey. Center for
Indian Language Technology Surveys 11(8), 66–69 (2016).
2. Dau, Z.: A Computational Analysis of the Coronavirus Pandemic Response of Tri-State Area
Politicians on Twitter. In Corpus Linguistics, (2021).
3. Campion, E. D., Campion, M. A.: Using Computer-assisted Text Analysis (CATA) to Inform
Employment Decisions: Approaches, Software, and Findings. Research in Personnel and
Human Resources Management, (2020).
4. Von Selasinsky, C., Isaak, A. J.: It’s all in the (Sub-) title? Expanding Signal Evaluation in
Crowdfunding Research. arXiv preprint arXiv:2010.14389, (2020).
5. Kim, B.: Effects of social grooming on incivility in COVID-19. Cyberpsychology, Behavior,
and Social Networking 23(8), 519–525 (2020).
6. Ferreira, M., Rolim, V., Mello, R. F., Lins, R. D., Chen, G., Gašević, D.: Towards automatic
content analysis of social presence in transcripts of online discussions. In Proceedings of the
tenth international conference on learning analytics & knowledge, pp. 141–150 (2020).
7. Elbagir, S., Yang, J.: Twitter sentiment analysis using natural language toolkit and VADER
sentiment. In Proceedings of the international multiconference of engineers and computer
scientists (Vol. 122, p. 16), (2019).
8. Peslak, A.: Facebook Fanatics: A Linguistic and Sentiment Analysis of the Most “Fanned”
Facebook Pages. Journal of Information Systems Applied Research, 11(1), 23, (2018).
9. Ahmed, S., Jaidka, K., Cho, J.: Tweeting India’s Nirbhaya protest: A study of emotional
dynamics in an online social movement. Social Movement Studies 16(4), 447–465 (2017).
10. Boyd, R. L.: Psychological text analysis in the digital humanities. In Data analytics in digital
humanities (pp. 161–189). Springer, Cham (2017).
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-
NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/),
which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any
medium or format, as long as you give appropriate credit to the original author(s) and the source,
provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.