Sentiment Classification of Code-Switched Text using Pre-trained Multilingual Embeddings and Segmentation

Aryal, Saurav K.; Prioleau, Howard; Washington, Gloria

arXiv:2210.16461 (cs)

[Submitted on 29 Oct 2022]

Abstract: With increasing globalization and immigration, various studies have estimated that about half of the world's population is bilingual. Consequently, individuals concurrently use two or more languages or dialects in casual conversational settings. However, most research in natural language processing is focused on monolingual text. To further the work in code-switched sentiment analysis, we propose a multi-step natural language processing algorithm that utilizes points of code-switching in mixed text and conducts sentiment analysis around those identified points. The proposed sentiment analysis algorithm uses semantic similarity derived from large pre-trained multilingual models with a handcrafted set of positive and negative words to determine the polarity of code-switched text. The proposed approach outperforms a comparable baseline model by 11.2% for accuracy and 11.64% for F1-score on a Spanish-English dataset. Theoretically, the proposed algorithm can be expanded for sentiment analysis of multiple languages with limited human expertise.
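
The general idea in the abstract (segment the text at code-switch points, then score each segment by its semantic similarity to positive and negative seed words) can be illustrated with a minimal sketch. This is not the authors' implementation. Everything here is an assumption made for illustration: the two-dimensional `TOY_VECTORS` stand in for embeddings from a pre-trained multilingual model, the `SPANISH` word set stands in for a real language identifier, and the seed lists, the per-segment averaging, and the `margin` threshold are hypothetical choices.

```python
import math

# Hypothetical 2-d vectors standing in for multilingual embeddings.
# A real system would obtain vectors from a pre-trained multilingual model.
TOY_VECTORS = {
    "love": (0.9, 0.1), "good": (0.8, 0.2), "bueno": (0.8, 0.2),
    "encanta": (0.9, 0.1), "hate": (0.1, 0.9), "bad": (0.2, 0.8),
    "malo": (0.2, 0.8), "odio": (0.1, 0.9),
}
NEUTRAL = (0.5, 0.5)  # fallback for words without a toy vector

# Toy language lookup (assumption): anything not listed counts as English.
SPANISH = {"me", "encanta", "pero", "es", "malo", "bueno", "odio",
           "esta", "película"}

# Small handcrafted seed lists (assumption, not the paper's word set).
POSITIVE_SEEDS = ["good", "bueno"]
NEGATIVE_SEEDS = ["bad", "malo"]


def embed(word):
    return TOY_VECTORS.get(word, NEUTRAL)


def centroid(words):
    """Average the vectors of a list of words."""
    vectors = [embed(w) for w in words]
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def segment_at_switches(tokens):
    """Start a new segment each time the token language changes."""
    segments, previous = [], None
    for token in tokens:
        language = "es" if token in SPANISH else "en"
        if language != previous:
            segments.append([])
            previous = language
        segments[-1].append(token)
    return segments


def classify(text, margin=1e-9):
    """Sum per-segment (positive minus negative) similarity and threshold it."""
    tokens = text.lower().split()
    positive = centroid(POSITIVE_SEEDS)
    negative = centroid(NEGATIVE_SEEDS)
    score = 0.0
    for segment in segment_at_switches(tokens):
        vector = centroid(segment)
        score += cosine(vector, positive) - cosine(vector, negative)
    if score > margin:
        return "positive"
    if score < -margin:
        return "negative"
    return "neutral"
```

Calling `classify("I love esta película")` splits the input into one English and one Spanish segment before scoring. Swapping `embed` for a call to an actual multilingual model would leave the rest of the sketch unchanged.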

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2210.16461 [cs.CL]
	(or arXiv:2210.16461v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2210.16461

Submission history

From: Saurav Keshari Aryal PhD
[v1] Sat, 29 Oct 2022 01:52:25 UTC (341 KB)
