Scientific Project
SOCIAL MEDIA
SUPERVISED BY
❏ Introduction
❏ Problem Statement
❏ Objectives
❏ Main Objective
❏ Specific Objectives
❏ Literature Review
❏ Comparative Studies of the different techniques
❏ Proposed Methodology
❏ Conclusions / Further Works
❏ References
INTRODUCTION
The emergence of Web 2.0 has changed the world of social media. Online social
media is used not only by individuals to connect and to share information and
personal opinions with others, but also by businesses to communicate with
customers, understand them, and improve their products and services. The number
of social media users increases every day, and it was estimated that there would
be up to 2.77 billion social media users worldwide in 2019.
Sentiment analysis is an approach that uses Natural Language Processing (NLP)
to extract, convert, and interpret opinions from a text and classify them as
positive, negative, or neutral sentiment.
PROBLEM STATEMENT
The main issues with sentiment analysis include slow models, which may partly be
due to the complexity of text data: the context of words, phrases, or sentences;
the cultural background of the user; negation, sarcasm, and irony; and the
difficulty of determining the user's stance. Models trained on poorly prepared
data are a further issue. These problems affect industries such as healthcare,
where they can hinder effective communication between patients and doctors:
providers may fail to understand patients' needs and which services dissatisfied
them, and may be unable to monitor the side effects of medications based on
social media posts and so prevent unforeseen consequences.
OBJECTIVES
MAIN OBJECTIVE
❏ The main objective of this paper is to obtain a fast and efficient model of sentiment
analysis for social media.
SPECIFIC OBJECTIVES
The main problem this work tackles is the low performance of
lexicon-based approaches in terms of precision and coverage. To
address it, this paper investigates whether computationally cheap
techniques such as document filtering, text preprocessing, and
frequency cut-off can improve the performance of rule-based
techniques, and whether machine learning techniques relying only on
lexicon emotion scores can serve as a baseline for robust, complex,
and fast models that are portable across languages.
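To make the idea of a frequency cut-off and lexicon emotion scores concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the method of the work discussed here: the toy lexicon, the emotion names, and the helper names `filter_lexicon` and `emotion_scores` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (word -> emotion scores); values are made up
# for illustration and are not taken from any published lexicon.
TOY_LEXICON = {
    "happy": {"joy": 0.9, "sadness": 0.05},
    "sad": {"joy": 0.05, "sadness": 0.9},
    "win": {"joy": 0.7, "sadness": 0.1},
    "loss": {"joy": 0.1, "sadness": 0.8},
}


def tokenize(text):
    """Basic pre-processing: lowercase and keep alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def filter_lexicon(lexicon, corpus, min_count):
    """Frequency cut-off: keep lexicon words seen at least min_count times."""
    counts = Counter(tok for doc in corpus for tok in tokenize(doc))
    return {w: s for w, s in lexicon.items() if counts[w] >= min_count}


def emotion_scores(text, lexicon):
    """Average the emotion scores of the lexicon words found in the text."""
    tokens = [t for t in tokenize(text) if t in lexicon]
    if not tokens:
        return {}  # no lexicon coverage for this text
    emotions = sorted({e for t in tokens for e in lexicon[t]})
    return {
        e: sum(lexicon[t].get(e, 0.0) for t in tokens) / len(tokens)
        for e in emotions
    }


corpus = ["Happy happy win", "sad loss", "happy sad"]
filtered = filter_lexicon(TOY_LEXICON, corpus, min_count=2)
features = emotion_scores("Happy win!", filtered)
```

The resulting score dictionary could be used directly for a rule-based decision (pick the highest-scoring emotion) or passed as a feature vector to a machine learning model.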
LITERATURE REVIEW
To this end, a new lexicon, DepecheMood++, was built upon the publicly available
DepecheMood lexicon, which is generated from news sources distantly annotated with
emotion scores. It was evaluated and released to the community as an extension of
the original lexicon, built using a larger dataset, together with a novel emotion
lexicon targeting the Italian language and built with the same methodology.
Experiments were then performed on six datasets/tasks exhibiting wide diversity in
domain (news, blog posts, mental health forum posts, Twitter), language (English
and Italian), setting (supervised and unsupervised), and task (regression and
classification).
DepecheMood++ is an upgrade of DepecheMood, which was built on a dataset of 25.3k documents
and 13.5M words (about 530 words per document on average). The new lexicon uses an expanded
dataset in order (i) to rebuild the English lexicon on a larger corpus, and (ii) to build a
novel lexicon targeting the Italian language.
In another study, several machine learning classifiers were used for sentiment classification,
since no single classifier performs well on all kinds of datasets. The authors combined three
classifiers using a voting approach, in which each classifier votes for a label and the label
that receives the most votes is chosen.
The proposed methodology has four main components: preprocessing,
feature extraction, meta-learning, and training data.
The models used here are Support Vector Machines, Naive Bayes,
and Maximum Entropy.
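The voting approach described above can be sketched as follows. This is a minimal illustration, not the reviewed paper's implementation: the three keyword-based functions are hypothetical stand-ins for trained Support Vector Machine, Naive Bayes, and Maximum Entropy classifiers, and the tie-breaking rule is an assumption.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    Ties are broken in favour of the earliest classifier in the list
    (an assumption made for this sketch).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical stand-ins for three trained classifiers; each maps a text
# to one of "positive", "negative", or "neutral".
def clf_a(text):
    return "positive" if "good" in text else "neutral"


def clf_b(text):
    return "negative" if "not" in text else "positive"


def clf_c(text):
    return "positive" if "good" in text or "great" in text else "negative"


def ensemble_predict(classifiers, text):
    """Collect one vote per classifier and return the winning label."""
    return majority_vote([clf(text) for clf in classifiers])


label = ensemble_predict([clf_a, clf_b, clf_c], "a good phone")
```

In practice each stand-in would be replaced by the prediction function of a trained model; the voting step itself stays the same.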
PROS
❏ Lexicon-based sentiment analysis methods are easily accessible as many publicly available resources (e.g.,
SentiWordNet, DepecheMood) exist.
❏ They are less expensive because they do not require implementing advanced sentiment analysis algorithms.
❏ There is no need for training data, especially if companies use a dictionary-based approach, as the tags are
determined manually, and there is quick access to the meaning of the words.
CONS
● Lexicon-based sentiment analysis methods usually do not identify sarcasm, negation, grammar mistakes,
misspellings, or irony. Thus, they may not be suitable for analyzing data gathered from social media platforms.
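The negation problem mentioned above can be shown with a small example. The sketch below is illustrative only: the polarity lexicon, the negator list, and the "flip the next sentiment word" heuristic are assumptions made for this example, not a method from the reviewed literature.

```python
import re

# Hypothetical polarity lexicon and negator list, made up for illustration.
POLARITY = {"good": 1, "great": 1, "bad": -1, "terrible": -1}
NEGATORS = {"not", "no", "never"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def naive_score(text):
    """Plain lexicon lookup: sums word polarities and ignores negation."""
    return sum(POLARITY.get(tok, 0) for tok in tokenize(text))


def negation_aware_score(text):
    """Flip the polarity of the first sentiment word after a negator."""
    score, negate = 0, False
    for tok in tokenize(text):
        if tok in NEGATORS:
            negate = True
            continue
        value = POLARITY.get(tok, 0)
        if value:
            score += -value if negate else value
            negate = False
    return score


sentence = "The service was not good."
naive = naive_score(sentence)            # counts "good" as positive
aware = negation_aware_score(sentence)   # flips "good" after "not"
```

A plain lookup scores the sentence as positive, while even a simple negation rule reverses it. Sarcasm and irony are not addressed by such a rule.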
LITERATURE REVIEW
PROS
● Can be trained to detect sarcasm, irony, or negation in sentiment analysis. This can ease social
media sentiment analysis.
● Learn the affective valence of the words, so they do not require a pre-determined dataset.
● Are faster than traditional sentiment analysis methods.
● Provide more accurate results.
CONS