BAHIR DAR UNIVERSITY Project Proposal
No Name ID
1. Elias Worku 1103141
2. Eba Assefa 1102993
3. Natnael Worku 1102396
4. Nibretu Atinkut 1102390
Name Signature
1. ____________ ____________
2. ____________ ____________
3. ____________ ____________
4. ____________ ____________
Date of Submission:
Declaration
This project proposal has been submitted for examination with my approval as university advisor.
Acknowledgment
First and foremost, praise and thanks to God, the Almighty, for His showers of blessings.
We sincerely thank Bahir Dar Institute of Technology (BiT) for giving us the golden opportunity to do this wonderful project. We would like to thank our academic advisor, Mr. Addis, for his fruitful advice and for encouraging us to start this project. Additionally, we want to thank Mr. Yessuneh for approving the project title and giving us the chance to work on this project.
Executive Summary
The exponential growth of social media such as Facebook, Telegram, Twitter and community forums has revolutionized communication and content publishing, but these platforms are also increasingly exploited to propagate hate speech and organize hate-based activities. The anonymity and mobility they afford have made it effortless to breed and spread hate speech in the virtual world, which can eventually lead to hate crimes.
The term ‘hate speech’ has been formally defined as any communication that disparages a person or a group on the basis of some characteristic (referred to as a type of hate or hate class) such as race, color, ethnicity, gender, sexual orientation, nationality, religion, or other characteristics.
This project aims to build an automated system that detects, tags, eliminates and controls hateful comments, posts, news feeds, reviews and updates, a task that would be time-consuming, tedious and boring to do manually. With such a system, we would be able to detect and monitor the spread of hate speech on social media and thus reduce the dangerous consequences of societal conflict and civil war. For Ethiopian indigenous languages such as Amharic and Afan Oromo, as well as for English, sentiment analysis can be applied to social media content together with other natural language processing methods to perform the detection.
We will implement the project in the Python programming language using the TensorFlow framework, because TensorFlow is powerful and works well with many other libraries. All data processing will also be done in Python.
Keywords: hate speech, RNN, CNN, machine learning, offensive language
Table of Contents
Declaration
Acknowledgment
Executive Summary
Table of Contents
List of Figures
List of Tables
Abbreviations
Background
Literature Review
Problem Statement
Objective of Study
General Objective
Specific Objectives
Methodology
Methodology for hate speech detection
Methodology for violence image detection
Scope of the Project
Significance of the Study
Time Frame for the Project/Work Plan
Cost and Materials Required
References
List of Figures
Figure 1: Methodology diagram for hate speech detection
Figure 2: Methodology diagram for violence image detection
List of Tables
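Abbreviations
BiT: Bahir Dar Institute of Technology
CNN: Convolutional Neural Network
HTML: HyperText Markup Language
LSTM: Long Short-Term Memory
RNN: Recurrent Neural Network
URL: Uniform Resource Locator
WWW: World Wide Web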
Background
A total of 5.07 billion people around the world use the internet today, equivalent to 63.5 percent of the world's total population, and there are 29.83 million internet users in Ethiopia, 25.0 percent of the country's population (https://www.statista.com/). These figures reflect the rapid growth of social networking websites, which users all over the world rely on to express opinions and communicate with each other easily. Nowadays it is easy to spread an individual's ideas through a population in different languages so that they reach the intended web users, and this produces a huge amount of user-generated data available to an enormous online audience. These opportunities are also widely used to direct hateful statements at large groups or specific individuals with malicious intent: with freedom of speech on social networks and anonymity on the World Wide Web (WWW), people are free to post hate, insults, fake news, and disinformation. Hate speech can have an adverse effect on human behavior and livelihoods, it also affects society as a whole, and its consequences can be fatal if it is not controlled in time.
The explosive growth of hate speech and its erosion of democracy, justice, peace-building, and public trust have increased the demand for automated hate speech detection and technological intervention. Hate speech posts are especially influential in Ethiopian communities because individuals are sensitive to topics they identify with personally; they follow posts about those topics and take part through comments and shares. Posts shared by different users, deliberately or unknowingly, create bad feelings on some sides and become a cause of conflict, which is why the government of Ethiopia has often blocked social media sites during unrest to minimize their effect. For the government to have organized knowledge about such posts and take appropriate measures, automated detection of hate speech posts in the most widely used languages, such as English, Amharic and Afan Oromo, is needed, because the huge amount of unstructured social media data is difficult to analyze manually. Therefore, we propose a system that detects hate speech posts by applying deep learning techniques and algorithms.
Literature Review
In hate and offensive speech detection, there is no single definition that everybody agrees on, so the definition itself is a hotly debated topic among experts. However, governments set legal definitions based on their own culture and other factors. In Ethiopia, hate speech is legally defined as follows:
“‘Hate speech’ is the speech that intentionally promotes discrimination, hatred, or attack against a discernable group of identity or person, based on race, ethnicity, gender, religion or disability.”
(Ethiopian Hate Speech and Disinformation Prevention and Suppression Proclamation No. 1185/2020, page 12339)
Many social media platforms define hate speech from different angles and set rules and policies to prevent it.
There is a considerable body of literature on hate speech detection in many languages, some of which is reviewed below.
A study carried out by Surafel Getachew Tesfaye (2020) concentrated on hate speech detection, with accuracy as its major focus. He tried to increase detection accuracy using an RNN with word embeddings and achieved an accuracy of 95%, with some limitations. The main limitation of this research was that the dataset was collected only from Facebook posts and comments, and the model was not integrated with any social media platform. [1]
Z. Mossie and J.-H. Wang (2018) achieved a higher accuracy of 97% than the study above with a dataset of comparable size, which suggests that their modeling suited their dataset well. Its limitations are not much different from those of the research above: it is not integrated with a social media platform, and it did not compare different machine learning algorithms during training. [2]
In the article “Automatic detection of hate speech in text”, Paula Fortuna covered a broad range of topics, from dataset organization to the final stages, and also presented dataset annotation for Portuguese. [3]
Problem Statement
Hate speech posts on social media have become a major problem for both social media users and the government. They provoke users into conflict through hateful speech that breeds hatred among social groups and incites violent acts based on ethnicity, political attitude, religion and other social attributes. When the government wants to understand situations caused by hateful posts, and how society reacts to hate speech about real-time events, in order to make decisions, doing so manually is difficult because of the huge amount of social media data.
Amharic is the language most widely used by social media users in Ethiopia, followed by English and Afan Oromo, and it is used to post essential texts about business, social, political and other activities. Detecting hate speech in Amharic posts is difficult because of the language's unique set of punctuation marks, rich verb morphology, compound words, abbreviations and spelling variation, and because of the unstructured nature of social media posts. To overcome these difficulties and to provide predictions based on analysis of those texts, an automated hate speech detection model based on deep learning is needed.
Much research on hate speech detection has already been undertaken for English, Amharic and other languages. However, this work has several problems: it used small datasets; it used machine learning algorithms with low accuracy to classify text as hateful or offensive, even though hate speech keeps changing and new trends emerge; it identified only text, not images; and it was not integrated with Telegram, a currently popular social media platform. To overcome these problems, our system will detect both text and images and will be integrated with Telegram. In addition, to stay ahead of trends, we aim to identify potentially inflammatory speech that has not yet been reviewed for possible removal from social media by improving the test accuracy of the system.
Objective of Study
General Objective
The general objective of the project is to develop a model that detects hate speech posts by analyzing English, Amharic and Afan Oromo text and image posts from social media using a deep learning approach.
Specific Objectives:
1. Analyze the general structure of English, Amharic and Afan Oromo statements related to opinions and sentiments, for example by identifying negative, positive and neutral statements.
2. Analyze the semantics of opinion expressions across the three languages.
3. Prepare a labeled dataset for each language.
4. Develop a model to detect hate speech posts.
5. Build domain-specific and general-purpose lexicons of English, Amharic and Afan Oromo opinion terms, with each term tagged accordingly.
Methodology
Methodology for hate speech detection
First, we will collect data from social media posts and comments.
Then we will label the collected data as hate speech or normal speech.
Next, we will clean the dataset by removing extra whitespace, digits, non-word characters, HTML, URLs, and non-Amharic characters.
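The cleaning step could look roughly like the following sketch, which uses only Python's standard `re` module. It is a minimal sketch under stated assumptions: the name `clean_text` is hypothetical, and "Amharic characters" is taken to mean the Ethiopic syllables U+1200 to U+135A, so Ethiopic punctuation and numerals are removed together with Latin letters and ASCII digits. English and Afan Oromo posts, which use the Latin script, would need a different whitelist.

```python
import re

# Assumption: Amharic letters are the Ethiopic syllables U+1200..U+135A.
# Ethiopic punctuation (e.g. the word separator U+1361 and full stop U+1362)
# and Ethiopic numerals lie outside this range, so they are removed as well.
HTML_TAG = re.compile(r"<[^>]+>")
URL = re.compile(r"(?:https?://|www\.)\S+")
NON_AMHARIC = re.compile(r"[^\u1200-\u135A]+")

def clean_text(text: str) -> str:
    """Remove HTML tags, URLs, digits, punctuation and non-Amharic characters."""
    text = HTML_TAG.sub(" ", text)
    text = URL.sub(" ", text)
    # Replace (rather than delete) everything else with a space, so words that
    # are separated only by punctuation such as "፡" are not glued together.
    text = NON_AMHARIC.sub(" ", text)
    return " ".join(text.split())  # collapse repeated whitespace
```

For example, `clean_text("<b>ሰላም</b> ለሁሉም፡ሰው 123 http://example.com hello!")` returns `"ሰላም ለሁሉም ሰው"`.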
Then we will normalize the dataset by replacing characters that share the same pronunciation but are written differently with a single form.
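A possible normalization table is sketched below. It assumes (our choice, not a fixed standard) that the homophone groups ሀ/ሐ/ኀ, ሰ/ሠ, አ/ዐ and ጸ/ፀ are each collapsed to one letter, and it relies on each Ethiopic consonant family occupying consecutive code points for its seven vowel orders. Labialized forms are not covered, and the names `HOMOPHONE_FAMILIES` and `normalize_amharic` are illustrative.

```python
# Hypothetical homophone table: (first code point of the variant family,
# first code point of the canonical family). Each family has 7 vowel orders.
HOMOPHONE_FAMILIES = [
    (0x1210, 0x1200),  # ሐ-family -> ሀ-family
    (0x1280, 0x1200),  # ኀ-family -> ሀ-family
    (0x1220, 0x1230),  # ሠ-family -> ሰ-family
    (0x12D0, 0x12A0),  # ዐ-family -> አ-family
    (0x1340, 0x1338),  # ፀ-family -> ጸ-family
]

# str.translate table mapping every vowel order of a variant to the canonical one.
NORMALIZE_TABLE = {
    variant + order: canonical + order
    for variant, canonical in HOMOPHONE_FAMILIES
    for order in range(7)
}

def normalize_amharic(text: str) -> str:
    """Replace homophone characters with one canonical form."""
    return text.translate(NORMALIZE_TABLE)
```

With this table, `normalize_amharic("ሐገር")` returns `"ሀገር"`, so both spellings of the same word map to one token.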
After that, we will tokenize the data.
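For a neural model, tokenization usually means splitting text into tokens and mapping each token to an integer id in a fixed-length sequence. The sketch below assumes the text has already been cleaned and normalized, so whitespace splitting is enough; the helper names, the reserved ids (0 for padding, 1 for unknown words) and right-padding are assumptions. In TensorFlow, the `tf.keras.layers.TextVectorization` layer can do a comparable job.

```python
from collections import Counter

PAD_ID, OOV_ID = 0, 1  # assumed reserved ids: padding and out-of-vocabulary

def tokenize(text):
    """Whitespace tokenization; assumes text is already cleaned and normalized."""
    return text.split()

def build_vocab(texts, max_words=None):
    """Give the most frequent tokens ids starting at 2 (0 and 1 are reserved)."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    return {tok: i for i, (tok, _) in enumerate(counts.most_common(max_words), start=2)}

def encode(text, vocab, max_len):
    """Map tokens to ids, truncate to max_len, then right-pad with PAD_ID."""
    ids = [vocab.get(tok, OOV_ID) for tok in tokenize(text)][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))
```

For example, `build_vocab(["ሰላም ለሁሉም", "ሰላም ሰው"])` gives `{"ሰላም": 2, "ለሁሉም": 3, "ሰው": 4}`, and `encode("ሰላም ሰው አዲስ", vocab, 5)` gives `[2, 4, 1, 0, 0]`, where the unseen word becomes 1 and the tail is padding.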
After tokenization, we will split the data into two sets: a test set and a training set.
Then we will train the model on the training set.
We plan to use 40% of the training set for validation.
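The split and the planned 40% validation hold-out could be done as follows. The 20% test fraction and the fixed random seed are assumptions, since the proposal does not state them, and `split_dataset` is a hypothetical helper.

```python
import random

def split_dataset(samples, test_fraction=0.2, val_fraction=0.4, seed=42):
    """Shuffle once, hold out a test set, then take a validation set from the rest.

    test_fraction and seed are assumed values; val_fraction=0.4 follows the plan.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)           # reproducible shuffle
    n_test = int(len(items) * test_fraction)
    test, train_full = items[:n_test], items[n_test:]
    n_val = int(len(train_full) * val_fraction)  # 40% of the training portion
    val, train = train_full[:n_val], train_full[n_val:]
    return train, val, test
```

With 1,000 labeled posts this yields 480 training, 320 validation and 200 test posts. Shuffling first also matters if Keras is used instead: `Model.fit(..., validation_split=0.4)` takes the validation samples from the end of the provided arrays, before shuffling. For imbalanced classes, a stratified split may be preferable.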
Then we will design a model architecture suited to our dataset, such as a CNN, an LSTM or another architecture. We mainly plan to use CNN and RNN models with word2vec embeddings.
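When a CNN or RNN uses word2vec, the pre-trained vectors are typically copied into an embedding matrix whose row i belongs to the word with id i. The sketch below assumes a `vocab` dictionary mapping words to ids (0 and 1 reserved) and a plain `dict` of word vectors exported from a trained word2vec model; `build_embedding_matrix` is hypothetical. Such a matrix can initialize a Keras `Embedding` layer, for example through `embeddings_initializer=tf.keras.initializers.Constant(matrix)`.

```python
import random

def build_embedding_matrix(vocab, vectors, dim, seed=0):
    """Return (matrix, hits) where row i holds the vector for the word with id i.

    vocab:   word -> integer id (ids 0 and 1 assumed reserved for padding/OOV)
    vectors: word -> list of `dim` floats, e.g. exported from a word2vec model
    """
    rng = random.Random(seed)
    size = max(vocab.values(), default=1) + 1
    matrix = [[0.0] * dim for _ in range(size)]  # reserved rows stay zero
    hits = 0
    for word, idx in vocab.items():
        vec = vectors.get(word)
        if vec is not None and len(vec) == dim:
            matrix[idx] = list(vec)
            hits += 1
        else:
            # Words missing from word2vec get small random vectors to be learned.
            matrix[idx] = [rng.uniform(-0.05, 0.05) for _ in range(dim)]
    return matrix, hits
```

The `hits` count shows how much of the vocabulary word2vec actually covers, which is worth checking for Amharic and Afan Oromo, where pre-trained coverage may be limited.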
The final stage of the project is integration with Telegram: after completing the steps above, we will build a Telegram bot and test it.
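A minimal shape for the Telegram integration is sketched below using only the standard library and the Telegram Bot API methods `getUpdates` and `deleteMessage`. Deleting flagged messages is our assumption of what "eliminate" means here (the bot would need delete rights in the group), `is_hate` stands in for a wrapper around the trained model, and `call_bot_api`, `decide_action` and `poll_forever` are hypothetical names.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.telegram.org/bot{token}/{method}"

def call_bot_api(token, method, params):
    """POST form-encoded params to a Bot API method and return the decoded JSON."""
    data = urllib.parse.urlencode(params).encode()
    url = API_URL.format(token=token, method=method)
    with urllib.request.urlopen(url, data=data, timeout=30) as resp:
        return json.load(resp)

def decide_action(update, is_hate):
    """Return (method, params) for a Bot API call, or None if no action is needed.

    update:  one Update object from getUpdates, as a dict
    is_hate: callable text -> bool wrapping the trained model (hypothetical)
    """
    message = update.get("message")
    if not message or "text" not in message:
        return None  # this sketch ignores non-text updates
    if not is_hate(message["text"]):
        return None
    return ("deleteMessage", {"chat_id": message["chat"]["id"],
                              "message_id": message["message_id"]})

def poll_forever(token, is_hate):
    """Long-poll getUpdates and act on each new message."""
    offset = 0
    while True:
        result = call_bot_api(token, "getUpdates", {"offset": offset, "timeout": 25})
        for update in result.get("result", []):
            offset = update["update_id"] + 1  # acknowledge this update
            action = decide_action(update, is_hate)
            if action:
                call_bot_api(token, *action)
```

Keeping `decide_action` free of network calls means the moderation logic can be tested with hand-written updates before the bot is connected to a real group.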
Methodology for violence image detection
Scope of the Project
The main aim of this project is to develop a sentence-level hate speech detection model for three languages: Amharic, Afan Oromo and English.
We plan to gather data from Facebook, YouTube and Twitter posts and comments.
We plan to classify posts into the following classes:
o ethnic hate speech
o religious hate speech
o gender-based hate speech (sexism or adult language)
o normal speech
In addition, the project aims to detect violent images posted on social media. The final stage of the project is integration with Telegram.
Significance of the Study
Ethiopia has many social media users because social media is easily accessible, and the many posts from different users influence the community either positively or negatively. One of the negative impacts of social media is that it can trigger conflict and create serious disagreements and quarrels among social groups, which affects both society and the government. Hate speech detection systems create a monitored environment in which hate speech is not tolerated. Tagged hate speech can be investigated, and the people who spread it can be held accountable for actions that may have catastrophic results. In the long run, freedom from hate speech would build social unity and peace for all people. The study is also meant to raise awareness of how serious hate speech is and of its dangerous outcomes.
The study is expected to:
1. Detect hate speech in community posts and comments from English, Amharic and Afan Oromo text and images.
2. Help social media platforms improve their hate speech detection for local languages.
3. Protect users from hate speech and conflict-triggering posts.
4. Create a safe and tolerant web environment where everyone interacts respectfully and peacefully.
Time Frame for the Project/Work Plan
References
[1] S. Getachew, Automated Amharic Hate Speech Posts and Comments Detection Model Using Recurrent Neural Network, Addis Ababa, Ethiopia: Addis Ababa Science and Technology University, 2020.
[2] Z. Mossie, J. Wang, Social Network Hate Speech Detection for Amharic Language, Taipei, Taiwan: National Taipei University of Technology, 2018.
[3] P. Fortuna, S. Nunes, Automatic Detection of Hate Speech in Text, Portugal: INESC TEC and Faculty of Engineering, University of Porto, 2019.