
A

Mini Project report on

EMOTION RECOGNITION FROM TEXT USING BERT:
A DEEP LEARNING APPROACH IN NLP
Submitted To
Nalla Narasimha Reddy Education Society’s Group Of Institutions
In Partial Fulfillment of the Requirements for the Award of Degree Of

BACHELOR OF TECHNOLOGY

IN

COMPUTER SCIENCE AND ENGINEERING

Submitted By

G. VARUN TEJA 227Z5A6603
G. SHRUTHI 217Z1A6623
B. ABHIRAM 217Z1A6607

Under the Guidance of


Mrs. P. MANASA
Assistant Professor

SCHOOL OF ENGINEERING
Department of Computer Science and Engineering

NALLA NARASIMHA REDDY
EDUCATION SOCIETY’S GROUP OF INSTITUTIONS
(Approved by AICTE, New Delhi, Affiliated to Autonomous-Hyderabad)
Chowdariguda (Vill) Korremula 'x' Roads, Via Narapally, Ghatkesar (Mandal)
Medchal (Dist), Telangana-500088

2024-2025
CERTIFICATE

This is to certify that the project report titled “EMOTION RECOGNITION FROM TEXT USING BERT: A DEEP LEARNING APPROACH IN NLP” is being
submitted by G. Varun Teja (227Z5A6603), G. Shruthi (217Z1A6623) and B. Abhiram
(217Z1A6607) in partial fulfilment of the requirements for the award of the degree of Bachelor of Technology in Computer
Science & Engineering, and is a record of bonafide work carried out by them. The results embodied
in this report have not been submitted to any other University for the award of any degree.

Internal Guide Head of the Department
(Mrs. P. Manasa) (Dr. K. Rameshwaraiah)

Submitted for the University Examination held on…………………………….

External Examiner
DECLARATION

We, G. Varun Teja, G. Shruthi and B. Abhiram, students of Bachelor of Technology
in Computer Science and Engineering, Nalla Narasimha Reddy Education Society’s
Group Of Institutions, Hyderabad, Telangana, hereby declare that the work presented in
this project entitled “Emotion Recognition From Text Using BERT: A Deep Learning
Approach In NLP” is the outcome of our own bonafide work, is correct to the best of our
knowledge, and was undertaken with due regard for engineering ethics. It contains
no material previously published or written by another person, nor material which has been
accepted for the award of any other degree or diploma of the university or other institute of
higher learning.

G. VARUN TEJA 227Z5A6603
G. SHRUTHI 217Z1A6623
B. ABHIRAM 217Z1A6607

Date:

Signature:
ACKNOWLEDGEMENT

We express our sincere gratitude to our guide Mrs. P. MANASA, Assistant Professor,
Computer Science and Engineering Department, NNRESGI, who motivated us throughout
the project, for her valuable and intellectual suggestions, and for her
adequate guidance and constant encouragement throughout our work.

We wish to record our deep sense of gratitude to our Project In-charge Mrs. INDRANI,
Assistant Professor, Computer Science and Engineering Department, NNRESGI, for
the insight and advice she provided during the review sessions, for her constant
monitoring as we completed the project, and for giving us this opportunity to
present the project work.

We profoundly express our sincere thanks to Dr. K. Rameshwaraiah, Professor &
Head, Department of Computer Science and Engineering, NNRESGI, for his cooperation
and encouragement in completing the project successfully.

We wish to express our sincere thanks to Dr. G. Janardhana Raju, Dean, School of
Engineering, NNRESGI, for providing the facilities for completion of the project.

We wish to express our sincere thanks to Dr. C. V. Krishna Reddy, Director,
NNRESGI, for providing the facilities for completion of the project.

Finally, we would like to thank the Project Co-Ordinator, the Project Review Committee
(PRC) members, and all the faculty members and supporting staff of the Department of Computer
Science and Engineering, NNRESGI, for extending their help in all circumstances.

By :
G. VARUN TEJA 227Z5A6603
G. SHRUTHI 217Z1A6623
B. ABHIRAM 217Z1A6607
ABSTRACT

Emotion recognition from text is a critical task in Natural Language Processing (NLP),
with wide-ranging applications in sentiment analysis, mental health monitoring, and
human-computer interaction. This project investigates the application of Bidirectional Encoder
Representations from Transformers (BERT) to emotion recognition. BERT’s ability to
capture contextual dependencies and nuances in text through its bidirectional transformer
architecture makes it well suited to this task. By fine-tuning a pre-trained BERT model on an
emotion-labeled dataset, we classify emotions such as happiness, sadness, anger, surprise,
fear, and others. The model leverages BERT's pre-trained knowledge of language and its
capacity to capture complex syntactic and semantic patterns. Experimental evaluations
show that BERT achieves higher accuracy in detecting emotions than traditional
machine learning models such as SVM and logistic regression, owing to its deep contextual
embeddings. Furthermore, our approach generalizes better across varied emotional
expressions, demonstrating BERT's robustness in emotion-aware NLP tasks. These results
suggest that BERT-based models hold significant promise for improving the accuracy and
effectiveness of emotion recognition systems. The main tool used to build the application is
Streamlit, which makes it straightforward to create interactive web applications that expose
NLP-based emotion detection to users.

Keywords: Sentiment Analysis, Probabilistic model, Streamlit, Natural Language Processing,
Machine Learning, Bidirectional Encoder Representations from Transformers Algorithm,
Feature Extraction, Deep Learning Architectures.
TABLE OF CONTENTS
Page No.
Abstract
List of Tables i
List of Figures iii

1. INTRODUCTION 1

1.1 Motivation 1
1.2 Problem Statement 1
1.3 Purpose 1
1.4 Scope 1
1.5 Project objective 2
1.6 Limitations 2

2. LITERATURE SURVEY 3
2.1 Introduction 3
2.2 Existing System 4
2.3 Proposed System 5

3. SYSTEM ANALYSIS 6
3.1 Functional Requirements 6
3.2 Non Functional Requirements 7
3.3 Interface Requirements 9

4. SYSTEM DESIGN 10
4.1 DFD/ER/UML Diagrams 11
4.2 Modules 16

5. IMPLEMENTATION & RESULTS 18
5.1 Method of Implementation 18
5.2 Explanation of Key Functions 22
5.3 Output Screens 26

6. SYSTEM TESTING 29
6.1 Introduction for testing 29
6.2 Various Testcase Scenarios 31

7. CONCLUSION 33
7.1 Project Conclusion 33
7.2 Future Enhancement 33

8. REFERENCES 34
8.1 Paper References 34
8.2 Websites 35
8.3 Text Books 35
LIST OF FIGURES

Figure No. Name Of The Figure Page No

4.1.1.1 DFD Level-0 11
4.1.1.2 DFD Level-1 11
4.1.2.1 Use Case Diagram 12
4.1.3.1 Class Diagram 13
4.1.4.1 Sequence Diagram 14
4.1.5.1 Activity Diagram 15
5.1.1 Output Screen 1 - Open Link 26
5.1.2 Output Screen 2 - Home Page 26
5.1.3 Output Screen 3 - Input 27
5.1.4 Output Screen 4 - Output 22


LIST OF TABLES

Table No. Name Of The Table Page No.

6.2.1 Test Cases 31-32


1. INTRODUCTION

1.1 MOTIVATION
In the last few years, research on understanding human emotions from text data has
become especially important in areas like customer service, mental health analysis, and
human-computer interaction. However, many current emotion detection models fail to
capture fine-grained emotions, sarcasm, or more nuanced context. The main purpose of
this system is to increase the accuracy of emotion recognition in text and so provide deeper
insights into emotional states. The main motivation of the project is to use BERT's
contextual understanding to better detect emotions in text, which in turn improves
sentiment analysis and feedback systems.

1.2 PROBLEM STATEMENT


Detecting emotions accurately from text can be challenging, as traditional methods
struggle to interpret subtle emotions, sarcasm, and context, leading to incorrect
classification. This lack of accuracy affects applications like sentiment analysis and
customer service interactions. This project aims to use BERT's advanced deep learning
capabilities to enhance emotion recognition from text, providing more precise
emotional insights.

1.3 PURPOSE
This project uses BERT to enhance emotion detection in text, providing more
accurate information for applications such as sentiment analysis and customer service,
resulting in better decisions and interactions.

1.4 SCOPE
With the growing importance of emotion detection in areas like customer service,
mental health, and social media analysis, this project aims to apply BERT for more
accurate emotion recognition in real-world applications, improving human-computer
interactions and decision-making processes.

1.5 PROJECT OBJECTIVE
The objective is to improve emotion recognition from text using BERT. By leveraging
BERT’s contextual understanding, the project aims to significantly improve the accuracy
and nuance of emotion detection, enhancing applications such as sentiment analysis,
customer service, and social media monitoring.

1.6 LIMITATIONS
Using BERT for emotion recognition requires significant computational resources
and memory. Its accuracy also depends on the quality and diversity of the training data, and
it may face challenges with domain-specific language or imbalanced datasets.

BERT models are computationally expensive, requiring substantial memory and
processing power for both training and inference, which can limit their accessibility for
smaller organizations or low-resource environments. Additionally, BERT needs a sufficient
amount of labeled data for fine-tuning to achieve optimal performance, which can be a
challenge in domains where such data is scarce.

BERT's vocabulary is fixed during pre-training, and its WordPiece tokenizer breaks unseen
words into subword pieces, so the model may struggle with rare, domain-specific, or rapidly
evolving language such as new slang. Moreover, BERT is not designed to handle long
sequences efficiently: the cost of its self-attention grows quadratically with input length,
and the standard pre-trained models accept at most 512 tokens, making it less effective for
tasks involving very long texts.
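To illustrate the quadratic cost, the scaled dot-product attention used in every Transformer layer is written out below. For an input of n tokens the score matrix QK^T is n × n for each attention head in each layer, so doubling the input length roughly quadruples the attention cost; at the 512-token limit of the standard pre-trained BERT models that is 262,144 scores per head per layer.

```latex
\begin{aligned}
\mathrm{Attention}(Q, K, V) &= \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,
  \qquad Q, K \in \mathbb{R}^{n \times d_k},\; V \in \mathbb{R}^{n \times d_v} \\
QK^{\top} &\in \mathbb{R}^{n \times n}
  \;\Longrightarrow\; O(n^{2} d_k)\ \text{time and}\ O(n^{2})\ \text{memory per head} \\
n = 512 &\;\Longrightarrow\; n^{2} = 262{,}144\ \text{attention scores per head per layer}
\end{aligned}
```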


2. LITERATURE SURVEY

2.1 INTRODUCTION
Emotion recognition from text has become increasingly critical in today’s digital
landscape, impacting various fields such as customer service, mental health analysis,
and social media monitoring. Accurate emotion detection is essential for understanding
user sentiments, improving customer interactions, and analyzing social trends.
Traditional emotion detection models often struggle with capturing the complexity of
human emotions, including subtleties such as sarcasm, irony, and context-specific
sentiments. This limitation can lead to inaccuracies in sentiment analysis and reduced
effectiveness in applications relying on emotional insights.

Recent advancements in deep learning have significantly enhanced emotion
recognition capabilities. One of the most notable developments is the introduction of
BERT (Bidirectional Encoder Representations from Transformers). BERT represents
a major leap forward in natural language processing (NLP) by offering a more
sophisticated understanding of text. Unlike earlier left-to-right language models, BERT
processes text in a bidirectional manner, meaning it considers the context both before and
after a word, which allows for a deeper and more nuanced understanding of language.
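The difference between bidirectional and left-to-right context can be shown with a toy self-attention computation. This is a simplified sketch, not BERT itself: it assumes a single attention head, made-up 2-dimensional token vectors, and reuses the same vectors as queries and keys instead of learned projections. Without a mask (BERT-style), the word "not" can attend to the later word "happy"; with a causal mask, as in a left-to-right language model, that weight is exactly zero.

```python
import math


def softmax(scores):
    """Numerically stable softmax; scores of -inf receive weight 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(vectors, causal=False):
    """Scaled dot-product self-attention weights softmax(q.k / sqrt(d)).

    Simplification: the same vectors serve as queries and keys (no learned
    projections, one head). causal=False is BERT-style: every token attends
    to every token. causal=True mimics a left-to-right model where token i
    only sees tokens 0..i.
    """
    d = len(vectors[0])
    weights = []
    for i, q in enumerate(vectors):
        row = []
        for j, k in enumerate(vectors):
            if causal and j > i:
                row.append(float("-inf"))  # hide tokens to the right
            else:
                row.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(d))
        weights.append(softmax(row))
    return weights


# Made-up 2-d vectors for the tokens of "i am not happy".
tokens = ["i", "am", "not", "happy"]
vectors = [[0.1, 0.3], [0.2, 0.1], [0.9, -0.4], [0.5, 0.8]]

bidirectional = attention_weights(vectors, causal=False)
left_to_right = attention_weights(vectors, causal=True)

# Weight that "not" (position 2) places on the later word "happy" (position 3).
print(f"bidirectional: {bidirectional[2][3]:.3f}")
print(f"left-to-right: {left_to_right[2][3]:.3f}")  # 0.000: cannot look ahead
```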

BERT’s architecture is designed to handle complex linguistic features and capture
the context-dependent meanings of words. By providing richer contextual embeddings,
BERT can more accurately identify emotional cues in text, such as detecting subtle
variations in sentiment or understanding context-specific emotional expressions that
traditional models might overlook.

Incorporating BERT into emotion recognition systems aims to address the
limitations of previous approaches by enhancing the accuracy and depth of emotion
classification. With BERT, the goal is to achieve a more precise understanding of
emotional nuances, which can significantly improve applications like sentiment
analysis, customer feedback processing, and social media monitoring. By leveraging
BERT’s advanced capabilities, organizations can gain deeper insights into user
emotions, leading to more informed decision-making and improved user experiences.

2.2 EXISTING SYSTEM


The existing system follows an organized method for emotion detection that starts with the
collection of an Emotion Dataset. This dataset provides examples of a range of emotions
such as joy, fear, disgust, sadness, and guilt.

Then, we conduct Data Pre-Processing (NLP), which removes stop words and applies
stemming to clean and prepare the data for analysis. This step ensures that the data
is in the desired format for the subsequent processes.
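The stop-word removal and stemming step can be sketched with the Python standard library. The stop-word list and suffix rules below are illustrative assumptions, the suffix stripper is a deliberately crude stand-in for a real stemmer such as NLTK's PorterStemmer, and the helper names are hypothetical. Negations like "not" are left out of the stop-word list, since dropping them can reverse the emotion of a sentence.

```python
import re

# Small illustrative stop-word list (an assumption; NLTK and similar libraries
# ship much longer lists). Negations such as "not" are deliberately kept,
# because removing them can flip the emotion of a sentence.
STOP_WORDS = {"a", "an", "the", "i", "me", "my", "is", "am", "are", "was",
              "and", "to", "of", "in", "on", "at", "with", "it", "this", "that"}

# Crude suffix-stripping rules, tried longest first. This is a simplified
# stand-in for a real stemmer such as the Porter stemmer.
SUFFIXES = ("ingly", "edly", "ing", "ed", "ly", "es", "s")


def stem(word):
    """Remove the first matching suffix if at least 3 letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lower-case, extract word tokens, drop stop words, then stem."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [stem(tok) for tok in tokens if tok not in STOP_WORDS]


print(preprocess("I was feeling really happy and laughing with my friends"))
# ['feel', 'real', 'happy', 'laugh', 'friend']
```

The crude rules over-stem some words ("really" becomes "real"), which is one reason a well-tested stemmer is preferable in practice.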

In the Data Training Process, the dataset is divided into training data and test data.
Evaluating the model on held-out test data gives a more reliable estimate of how it performs
on unseen text and helps reveal problems such as overfitting.
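A minimal sketch of the train/test split, assuming an 80/20 ratio and a split stratified by emotion (the report states neither); the function name `stratified_split` is hypothetical. Stratifying keeps rare emotions represented in the test set. scikit-learn's `train_test_split` with its `stratify` argument offers the same behaviour.

```python
import random
from collections import defaultdict


def stratified_split(texts, labels, test_ratio=0.2, seed=42):
    """Split (text, label) pairs into train/test lists so that each emotion
    keeps roughly the same share in both sets."""
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append((text, label))
    rng = random.Random(seed)  # fixed seed -> the same split on every run
    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_test = round(len(items) * test_ratio)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


texts = [f"sample sentence {i}" for i in range(20)]
labels = ["joy"] * 10 + ["fear"] * 5 + ["sadness"] * 5
train, test = stratified_split(texts, labels)
print(len(train), len(test))  # 16 4
```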

In the following step, we apply a machine learning algorithm, Logistic Regression,
to the training data. Logistic regression was chosen because it is effective for
classification problems, and emotion prediction is a classification problem.
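The logistic regression step can be sketched from scratch as multinomial (softmax) logistic regression over bag-of-words counts, trained by gradient descent on the cross-entropy loss. The report does not specify the feature representation or training settings, so the bag-of-words features, learning rate, epoch count, helper names and toy documents below are all assumptions; a real implementation would more likely use scikit-learn's `LogisticRegression` together with `CountVectorizer` or `TfidfVectorizer`.

```python
import math


def build_vocab(docs):
    """Assign a column index to every distinct whitespace token."""
    vocab = {}
    for doc in docs:
        for tok in doc.split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def bag_of_words(doc, vocab):
    """Count vector over the vocabulary; unseen tokens are ignored."""
    vec = [0.0] * len(vocab)
    for tok in doc.split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def logits(x, W, b):
    return [sum(w * xi for w, xi in zip(row, x)) + bias for row, bias in zip(W, b)]


def fit(X, y, n_classes, lr=0.5, epochs=200):
    """Multinomial logistic regression trained with full-batch gradient
    descent on the average cross-entropy loss."""
    n = len(X)
    W = [[0.0] * len(X[0]) for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        grad_W = [[0.0] * len(X[0]) for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for x, target in zip(X, y):
            probs = softmax(logits(x, W, b))
            for c in range(n_classes):
                err = probs[c] - (1.0 if c == target else 0.0)  # dLoss/dlogit_c
                grad_b[c] += err
                for j, xj in enumerate(x):
                    grad_W[c][j] += err * xj
        for c in range(n_classes):
            b[c] -= lr * grad_b[c] / n
            for j in range(len(W[c])):
                W[c][j] -= lr * grad_W[c][j] / n
    return W, b


def predict(x, W, b):
    scores = logits(x, W, b)
    return scores.index(max(scores))


# Made-up toy training data: two short "documents" per emotion.
EMOTIONS = ["joy", "fear", "sadness"]
docs = ["happy glad smile", "glad joyful happy", "scared afraid dark",
        "afraid nervous scared", "sad tears lonely", "lonely sad cry"]
y = [0, 0, 1, 1, 2, 2]

vocab = build_vocab(docs)
X = [bag_of_words(d, vocab) for d in docs]
W, b = fit(X, y, n_classes=len(EMOTIONS))
print(EMOTIONS[predict(bag_of_words("so happy and glad", vocab), W, b)])  # joy
```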

In the Model Prediction step, the trained model is used to predict emotions for new,
unseen text.

Finally, the system outputs the identified emotion for each input: joy, fear, disgust,
sadness, or guilt.
2.3 PROPOSED SYSTEM
For emotion detection, we follow a systematic strategy that starts with the
acquisition of an Emotion Dataset. Initially, data is collected for various emotions such
as joy, fear, disgust, sadness, and guilt.

Then, in Data Pre-Processing (NLP), the data is cleaned by removing stop words and
applying stemming to prepare it for analysis. This gives the data the proper structure
for the following steps.

Afterwards, we carefully examine the data by means of Exploratory Data Analysis
(EDA) to find the key insights and patterns underneath. This reveals any outliers or
trends that need to be handled before training.
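Two checks that an EDA step commonly includes can be sketched as follows: the distribution of emotion labels, which reveals class imbalance, and the distribution of text lengths, which informs the maximum sequence length chosen later. The label counts and sentences are made up and the helper names are hypothetical.

```python
import statistics
from collections import Counter


def label_report(labels):
    """Return (label, count, share) rows, most frequent first, and the ratio
    between the most and least frequent emotion (1.0 = perfectly balanced)."""
    counts = Counter(labels)
    total = len(labels)
    rows = [(label, n, n / total) for label, n in counts.most_common()]
    imbalance = rows[0][1] / rows[-1][1]
    return rows, imbalance


def length_summary(texts):
    """Word counts per text; useful when choosing a maximum sequence length."""
    lengths = [len(t.split()) for t in texts]
    return {"min": min(lengths), "max": max(lengths),
            "median": statistics.median(lengths)}


# Made-up label counts for illustration.
labels = ["joy"] * 50 + ["sadness"] * 30 + ["fear"] * 15 + ["disgust"] * 5
rows, imbalance = label_report(labels)
for label, n, share in rows:
    print(f"{label:<8} {n:>4} {share:6.1%}")
print("imbalance ratio:", imbalance)  # imbalance ratio: 10.0

print(length_summary(["i am so happy today", "this is scary", "why me",
                      "great news everyone"]))
```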

The Data Training Process divides the dataset into training data and test data. The model
is then trained on the training data using a deep learning algorithm, BERT (Bidirectional
Encoder Representations from Transformers). BERT has become a widely acknowledged tool
because of its effectiveness in comprehending and interpreting natural language, and it
helps us understand and predict emotions more effectively.

After training, the Model Prediction step uses the model to predict emotions for new input
data. The held-out test data is used to estimate the model's accuracy.

Finally, the detection model identifies the emotion in each input as joy, fear, disgust,
sadness, or guilt. Streamlit is the platform used to visualize the outcomes of the
detection model, and evaluation results such as model accuracy are displayed on the app page.

3. SYSTEM ANALYSIS

3.1 FUNCTIONAL REQUIREMENTS


In an emotion detection system based on the BERT (Bidirectional Encoder
Representations from Transformers) algorithm using Natural Language Processing
(NLP), the functional requirements define the system's expected capabilities. Here are
key functional requirements for such a system:

1. Input Handling

Text Input: The system must accept text inputs (sentences, paragraphs, or documents)
from the user.

Preprocessing: The system must perform text preprocessing tasks such as
tokenization, padding, and truncation, as required by BERT.

Language Support: It should handle multiple languages or specific ones based on the
training data.

2. Model Loading and Configuration

Pre-trained BERT Model Loading: The system should load a pre-trained BERT
model (e.g., bert-base-uncased) or a fine-tuned version specific to emotion detection.

Tokenizer Integration: The system must utilize the appropriate tokenizer for the
BERT model to convert input text into token IDs.

GPU/CPU Processing: It should support both GPU and CPU processing, enabling
faster inference if a GPU is available.

3. Emotion Detection

Emotion Prediction: The system must classify the input text into a predefined set of
emotions (e.g., joy, sadness, anger, fear, disgust, surprise, etc.).

Multi-Class Classification: The system should be capable of handling multi-class
classification, assigning a single emotion label to each input.

Confidence Score: For each prediction, the system must provide a confidence score or
probability for the detected emotion.

4. Batch Processing

Batch Input Support: The system should allow for batch processing of multiple texts
at once to increase processing efficiency.

5. Model Training (if included)

Fine-Tuning BERT: The system should support fine-tuning the pre-trained BERT
model on a specific emotion-labeled dataset.

Training Data Handling: The system must allow loading and preprocessing of custom
datasets for model training and validation.

Evaluation Metrics: The system should provide performance metrics such as
accuracy, precision, recall, and F1-score during and after training.

6. User Interface (if applicable)

Interactive Interface: If integrated into an application (e.g., Streamlit), the system
should provide a user-friendly interface to input text and view emotion predictions.

Visualization: The system should display the predicted emotion and its confidence
score visually, perhaps with bar charts or color-coded labels.
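The confidence-score, batch-processing and visualization requirements can be sketched together: the model's raw scores (logits) for each text are converted to probabilities with a softmax, the highest probability is reported as the confidence, and the full ranked distribution is kept for a bar chart. `fake_model` is a hypothetical keyword-based stand-in for the fine-tuned BERT classifier so the sketch runs without downloading a model, and the six-emotion label set is an assumption.

```python
import math

EMOTIONS = ["joy", "sadness", "anger", "fear", "disgust", "surprise"]


def softmax(logits):
    """Turn raw model scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def batched(items, batch_size):
    """Yield consecutive chunks of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_batch(texts, score_fn, batch_size=8):
    """Score texts batch by batch; for each text return the top emotion,
    its confidence and the full ranked distribution (for charts)."""
    results = []
    for batch in batched(texts, batch_size):
        for logits in score_fn(batch):  # one logit vector per text
            probs = softmax(logits)
            ranked = sorted(zip(EMOTIONS, probs), key=lambda p: p[1], reverse=True)
            emotion, confidence = ranked[0]
            results.append({"emotion": emotion, "confidence": confidence,
                            "ranked": ranked})
    return results


def fake_model(batch):
    """Hypothetical stand-in for the fine-tuned classifier: keyword-based
    made-up logits so the example runs without downloading a model."""
    out = []
    for text in batch:
        logits = [0.0] * len(EMOTIONS)
        if "happy" in text.lower():
            logits[EMOTIONS.index("joy")] = 3.0
        if "scared" in text.lower():
            logits[EMOTIONS.index("fear")] = 3.0
        out.append(logits)
    return out


for r in predict_batch(["I am so happy!", "I'm scared of the dark."], fake_model, batch_size=1):
    print(r["emotion"], f"{r['confidence']:.2f}")
# joy 0.80
# fear 0.80
```

A custom `batched` helper is used because `itertools.batched` only exists from Python 3.12, while the system targets Python 3.11.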

3.2 NON-FUNCTIONAL REQUIREMENTS


The non-functional requirements (NFRs) for a BERT-based emotion detection
system define the qualities and constraints that the system must meet to ensure usability,
performance, security, and maintainability. These requirements help shape how well the
system performs, rather than what the system does. Here are key non-functional
requirements for such a system:

1. Performance

Latency: The system should have minimal response time (e.g., under 1 second per text
input) for emotion prediction, ensuring a smooth user experience, especially in real-time
applications.

Throughput: The system should be able to process a high number of text inputs per
second, particularly when handling batch processing or API calls.

Scalability: The system should scale efficiently, whether vertically (increasing
computational power) or horizontally (adding more instances), to handle increased
workloads without significant performance degradation.

2. Accuracy and Reliability

Prediction Accuracy: The emotion detection model should maintain high accuracy,
with a desired F1-score, precision, or recall above a specified threshold (e.g., 85%+ on a
benchmark dataset).

Consistency: The system should produce consistent results for similar inputs,
minimizing model instability.

Fault Tolerance: In case of component failure (e.g., model loading errors, processing
errors), the system should recover gracefully without impacting other functionalities.

3. Usability

User-Friendly Interface: If deployed as an application (e.g., using Streamlit), the
interface should be intuitive, easy to navigate, and provide clear instructions for users to
input text and view results.

Learnability: Users should be able to easily learn how to interact with the system,
with minimal training or documentation required.

4. Resource Efficiency

Memory Usage: The system should optimize memory usage to avoid crashes or
slowdowns, particularly for large-scale models like BERT.

CPU/GPU Utilization: The system should make efficient use of available hardware,
leveraging GPUs for model inference when available to minimize processing time.

Energy Efficiency: The system should aim for minimal energy consumption,
particularly in cloud-based deployments, to reduce operational costs and environmental
impact.

5. Availability and Uptime

High Availability: The system should have high availability (e.g., 99.9% uptime),
especially in production environments, ensuring that users can access it at any time.

Redundancy: The system should have failover mechanisms or redundancy (e.g.,
backup models, databases) in case of hardware or software failures.

3.3 INTERFACE REQUIREMENTS
3.3.1 User Requirements
Data preprocessing
Test frameworks
Web frameworks
Real-time Prediction

3.3.2 System Requirements:

3.3.2.1 Hardware Requirements:

Processor : Intel Core i5 or higher
Hard Disk : 128 GB
RAM : 8 GB
3.3.2.2 Software Requirements:
Operating System : Windows 10/11
Coding Language : Python 3.11

3.3.3 Performance Requirements:

1. Accuracy: Measure how often the model correctly predicts the emotion. High accuracy
is crucial, especially if the application is sensitive to misclassifications.

2. Precision and Recall: Precision indicates the proportion of positive identifications
that were actually correct, while recall indicates the proportion of actual positives that
were correctly identified. Balancing these metrics is important for practical applications.

3. F1 Score: This is the harmonic mean of precision and recall. It is a good metric to assess
the model's performance, especially in cases where class distribution is imbalanced.
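These metrics can be computed per emotion and then averaged. The sketch below uses the standard definitions, precision = TP / (TP + FP), recall = TP / (TP + FN) and F1 = 2PR / (P + R), and also reports macro-F1 (the unweighted mean of the per-class F1 scores), which is less dominated by frequent emotions than accuracy. The labels are made up and `emotion_metrics` is a hypothetical helper; scikit-learn's `classification_report` computes the same figures.

```python
from collections import Counter


def emotion_metrics(y_true, y_pred):
    """Per-class precision/recall/F1 plus overall accuracy and macro-F1."""
    labels = sorted(set(y_true) | set(y_pred))
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)
    actual = Counter(y_true)
    per_class = {}
    for label in labels:
        precision = true_pos[label] / predicted[label] if predicted[label] else 0.0
        recall = true_pos[label] / actual[label] if actual[label] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(true_pos.values()) / len(y_true)
    macro_f1 = sum(m["f1"] for m in per_class.values()) / len(labels)
    return per_class, accuracy, macro_f1


# Made-up gold labels and predictions for six test sentences.
y_true = ["joy", "joy", "joy", "fear", "fear", "sadness"]
y_pred = ["joy", "joy", "fear", "fear", "sadness", "sadness"]
per_class, accuracy, macro_f1 = emotion_metrics(y_true, y_pred)
for label, m in per_class.items():
    print(f"{label:<8} P={m['precision']:.2f} R={m['recall']:.2f} F1={m['f1']:.2f}")
print(f"accuracy={accuracy:.3f} macro-F1={macro_f1:.3f}")  # accuracy=0.667 macro-F1=0.656
```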

4. SYSTEM DESIGN

4.1 UML DIAGRAMS


UML stands for Unified Modeling Language. UML is a standardized general-purpose
modeling language in the field of object-oriented software engineering. The
standard was created by, and is managed by, the Object Management Group. The goal is
for UML to become a common language for creating models of object-oriented computer
software. In its current form, UML consists of two major components: a meta-model
and a notation. In the future, some form of method or process may also be added to, or
associated with, UML.
The Unified Modeling Language is a standard language for specifying,
visualizing, constructing and documenting the artifacts of software systems, as well as
for business modeling and other non-software systems. The UML represents a collection
of best engineering practices that have proven successful in the modeling of large and
complex systems.
The UML is a very important part of developing object-oriented software and the
software development process. The UML uses mostly graphical notations to express the
design of software projects.

Goals:
The Primary goals in the design of the UML are as follows:
Provide users a ready-to-use, expressive visual modeling Language so that they can
develop and exchange meaningful models.
Provide extendibility and specialization mechanisms to extend the core concepts.
Be independent of particular programming languages and development process.
Provide a formal basis for understanding the modeling language.
Encourage the growth of the OO tools market.
Support higher level development concepts such as collaborations, frameworks,
patterns and components.

4.1.1 Data Flow Diagram
A data flow diagram (DFD) maps out the flow of information for any process or
system.It uses defined symbols like rectangles, circles and arrows, plus short text
labels, to show data inputs, outputs, storage points and the routes between each
destination. There are 2 levels in the below diagram.

Fig :4.1.1.1 Data Flow Diagram : Level 0

Fig :4.1.1.2 Data Flow Diagram : Level 1

4.1.2 Use Case Diagram:
A use case diagram in the Unified Modeling Language (UML) is a type of
behavioral diagram defined by and created from a Use-case analysis. Its purpose is to
present a graphical overview of the functionality provided by a system in terms of
actors, their goals (represented as use cases), and any dependencies between those use
cases. The main purpose of a use case diagram is to show what system functions are
performed for which actor. Roles of the actors in the system can be depicted.

Fig :4.1.2.1 Use Case Diagram

4.1.3 Class Diagram:
In software engineering, a class diagram in the Unified Modeling Language (UML)
is a type of static structure diagram that describes the structure of a system by showing
the system's classes, their attributes, operations (or methods), and the relationships
among the classes. It explains which class contains information.

Fig :4.1.3.1 Class Diagram

4.1.4 Sequence Diagram:
A sequence diagram in Unified Modelling Language (UML) is a kind of
interaction diagram that shows how processes operate with one another and in what
order. It is a construct of a Message Sequence Chart. Sequence diagrams are
sometimes called event diagrams, event scenarios, and timing diagrams.

Fig :4.1.4.1 Sequence Diagram

4.1.5 Activity Diagram:
Activity diagrams are graphical representations of workflows of stepwise activities
and actions with support for choice, iteration and concurrency. In the Unified Modeling
Language, activity diagrams can be used to describe the business and operational step-
by-step workflows of components in a system. An activity diagram shows the overall
flow of control.

Fig :4.1.5.1 Activity Diagram


4.2 MODULES:

4.2.1 System Module

Data Collection: Collect a diverse set of text data from various sources (e.g., social media
posts, reviews, or conversational data) labelled with different emotions such as joy,
sadness, anger, fear, etc. Ensure the data covers a wide range of emotional expressions to
provide robust training for emotion classification. Obtain pre-trained BERT models for text
embedding as a starting point.

Preprocessing: Clean the collected text data by removing unnecessary elements such as
special characters, URLs, and stopwords. Tokenize the text and convert it into a format
suitable for BERT, including adding [CLS] and [SEP] tokens for sentence representation.
Perform label encoding for emotions to ensure they are in a numerical format for model
training. Split the dataset into training and testing sets for model evaluation.
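The tokenization, [CLS]/[SEP], truncation, padding and label-encoding steps can be sketched with a toy version of BERT's greedy longest-match-first WordPiece tokenizer. This is a minimal illustration, not the Hugging Face implementation: the tiny vocabulary is made up, words are split only on whitespace (the real tokenizer also splits off punctuation), and the helper names `wordpiece`, `encode` and `encode_labels` are hypothetical. In practice a pre-trained tokenizer such as `BertTokenizer` from the `transformers` library performs these steps with the model's own vocabulary.

```python
# Tiny made-up vocabulary; real BERT vocabularies have tens of thousands
# of entries. Pieces starting with "##" continue a word.
VOCAB = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "i", "am", "so", "happy",
         "un", "##happy", "today", "feel", "##ing", "sad"]
TOKEN_TO_ID = {tok: idx for idx, tok in enumerate(VOCAB)}


def wordpiece(word):
    """Greedy longest-match-first WordPiece split of one lower-cased word."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            candidate = word[start:end] if start == 0 else "##" + word[start:end]
            if candidate in TOKEN_TO_ID:
                piece = candidate
                break
            end -= 1
        if piece is None:  # no known piece fits: the whole word becomes [UNK]
            return ["[UNK]"]
        pieces.append(piece)
        start = end
    return pieces


def encode(text, max_len=8):
    """Tokenize, wrap in [CLS] ... [SEP], truncate and pad to max_len.
    Returns (input_ids, attention_mask); the mask is 0 on padding."""
    tokens = [p for word in text.lower().split() for p in wordpiece(word)]
    tokens = ["[CLS]"] + tokens[: max_len - 2] + ["[SEP]"]
    mask = [1] * len(tokens) + [0] * (max_len - len(tokens))
    tokens += ["[PAD]"] * (max_len - len(tokens))
    return [TOKEN_TO_ID[t] for t in tokens], mask


def encode_labels(labels):
    """Label encoding: map emotion names to integers in sorted order."""
    classes = sorted(set(labels))
    return [classes.index(label) for label in labels], classes


print(wordpiece("unhappy"))       # ['un', '##happy']
print(encode("I am so unhappy"))  # ([2, 4, 5, 6, 8, 9, 3, 0], [1, 1, 1, 1, 1, 1, 1, 0])
print(encode_labels(["joy", "fear", "joy", "sadness"]))  # ([1, 0, 1, 2], ['fear', 'joy', 'sadness'])
```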

Training: Fine-tune the pre-trained BERT model using the preprocessed text data. Split
the preprocessed data into training and validation sets for model training. Adjust
hyperparameters such as learning rate, batch size, and maximum sequence length for
optimal performance.
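One way to organise the hyperparameter adjustment is a small grid search. The values below are the fine-tuning ranges recommended in the original BERT paper (batch size 16 or 32, learning rate 5e-5, 3e-5 or 2e-5, and 2 to 4 epochs); the maximum sequence length of 128 and the helper names are assumptions, and `evaluate` must be supplied by the caller because actual fine-tuning is outside this sketch.

```python
from itertools import product

# Fine-tuning search space recommended in the original BERT paper.
LEARNING_RATES = [5e-5, 3e-5, 2e-5]
BATCH_SIZES = [16, 32]
EPOCHS = [2, 3, 4]


def hyperparameter_grid(max_seq_len=128):
    """Every combination to try. max_seq_len=128 is an assumed default; pick
    it from the dataset's text-length distribution instead."""
    return [
        {"learning_rate": lr, "batch_size": bs, "epochs": ep,
         "max_seq_len": max_seq_len}
        for lr, bs, ep in product(LEARNING_RATES, BATCH_SIZES, EPOCHS)
    ]


def best_config(grid, evaluate):
    """Return the config with the highest validation score. `evaluate` is a
    caller-supplied (hypothetical) function that fine-tunes BERT with the
    config and returns, e.g., macro-F1 on the validation set."""
    return max(grid, key=evaluate)


def dummy_score(config):
    """Placeholder scorer for illustration only (no training happens)."""
    return (-abs(config["learning_rate"] - 3e-5)
            - abs(config["batch_size"] - 16) / 100
            - abs(config["epochs"] - 3))


grid = hyperparameter_grid()
print(len(grid))  # 18
print(best_config(grid, dummy_score))
```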

Evaluation: Evaluate the fine-tuned BERT model using evaluation metrics such as
accuracy, F1-score, precision, and recall. Analyze the model's performance across different
emotional categories and review its precision in detecting emotions from text. Visualize
prediction probabilities and compare them with the true labels to assess the model’s
predictive accuracy.

Deployment: Deploy the trained BERT-based emotion detection model in a user-friendly
application (e.g., a Streamlit app). Ensure the system’s scalability, reliability, and security
for real-time text emotion detection. Continuously monitor the model's performance and
update it with new data or fine-tune it periodically to improve its ability to detect emotions
accurately from text.

4.2.2 End User Module

Users can log into the application using a username and password, ensuring secure
access. Upon successful login, users are directed to the main interface of the emotion detection
system. Users can input any text or sentence into the provided text field for emotion analysis.
The input text is preprocessed and prepared for emotion detection using the BERT-based
model. Upon submitting the text, the system predicts the emotion conveyed in the input. The
BERT-based model processes the text and classifies it into one of the predefined emotion
categories (e.g., joy, anger, sadness, etc.). The application displays the predicted emotion
along with a confidence score. Users can view a graphical representation of the emotion
prediction, such as a probability chart or emoji-based feedback for better understanding. The
application may also provide a comparison of various emotion probabilities detected in the
input text.
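The report states that users log in with a username and password but does not say how credentials are stored. A minimal sketch, assuming salted password hashing with PBKDF2 from Python's standard `hashlib` module and a hypothetical in-memory `UserStore` class (the iteration count is also an assumption), could look like this:

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # PBKDF2 work factor; an assumed value to tune per deployment


def hash_password(password, salt=None):
    """Return (salt, digest) using salted PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


class UserStore:
    """Hypothetical in-memory user table; a real app would persist it."""

    def __init__(self):
        self._users = {}

    def register(self, username, password):
        if username in self._users:
            raise ValueError("username already exists")
        self._users[username] = hash_password(password)  # never store plain text

    def login(self, username, password):
        record = self._users.get(username)
        if record is None:
            return False
        salt, expected = record
        _, digest = hash_password(password, salt)
        return hmac.compare_digest(digest, expected)  # constant-time comparison


store = UserStore()
store.register("demo_user", "s3cret!")
print(store.login("demo_user", "s3cret!"))  # True
print(store.login("demo_user", "wrong"))    # False
```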

5. IMPLEMENTATION AND RESULTS

5.1 METHOD OF IMPLEMENTATION


5.1.1 What is Python :-
Python is currently one of the most widely used multi-purpose, high-level programming
languages. Python allows programming in Object-Oriented and Procedural paradigms.
Python programs are generally shorter than equivalent programs in languages like Java.
Programmers have to type relatively little, and the language's indentation requirement
keeps programs readable.

Python
Python is an interpreted high-level programming language for general-purpose
programming. Created by Guido van Rossum and first released in 1991, Python has
a design philosophy that emphasizes code readability, notably using significant
whitespace.

Python features a dynamic type system and automatic memory management. It
supports multiple programming paradigms, including object-oriented, imperative,
functional and procedural, and has a large and comprehensive standard library.

Python is Interpreted − Python is processed at runtime by the interpreter. You
do not need to compile your program before executing it. This is similar to Perl
and PHP.

Python is Interactive − you can actually sit at a Python prompt and interact
with the interpreter directly to write your programs.
Python also acknowledges that speed of development is important. Readable and
terse code is part of this, and so is access to powerful constructs that avoid tedious
repetition of code. Maintainability also ties into this. Lines of code may be an all but
useless metric, but they do say something about how much code you have to scan, read
and/or understand to troubleshoot a program. This speed of development, the ease with which
a programmer of other languages can pick up basic Python skills, and the huge standard
library are key to another area where Python excels. All its tools have been quick to
implement, saved a lot of time, and several of them have later been patched and updated
by people with no Python background, without breaking.

5.1.2 What is Deep Learning : -

Deep learning is a subset of machine learning that focuses on using neural networks
with multiple layers, known as deep neural networks, to model complex patterns and
representations from data. It is an approach that has revolutionized artificial intelligence (AI),
enabling machines to perform tasks such as image recognition, natural language processing,
speech recognition, and autonomous driving with unprecedented accuracy. Unlike traditional
machine learning algorithms, deep learning does not require manual feature extraction but
learns features automatically from raw data through hierarchical representations.

Neural Networks: The Foundation of Deep Learning
At the core of deep learning lies the artificial neural network (ANN), which is inspired by
the structure and function of the human brain. A neural network consists of interconnected
layers of nodes, also called neurons, that process input data and learn to make decisions based on it.
• Input Layer: This layer receives the input data, such as an image, a piece of text, or a sound.
• Hidden Layers: These layers lie between the input and output layers. Each neuron in a hidden
layer transforms the data using a mathematical function and passes it to the next layer. In deep
learning, the number of hidden layers is large, allowing the network to capture increasingly
abstract representations of the data.
• Output Layer: This layer produces the final output, such as a classification (e.g., recognizing
an object in an image) or a prediction (e.g., estimating the price of a stock).
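The input, hidden and output layers described above can be illustrated with a forward pass through a tiny fully connected network in plain Python. The weights are made up rather than learned, and the layer sizes (3 inputs, 2 hidden neurons, 3 output classes) are chosen only for illustration; the softmax at the end turns the output layer's scores into class probabilities.

```python
import math


def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for every neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def softmax(zs):
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, hidden_layers, output_layer):
    """Input layer -> ReLU hidden layers -> linear output layer + softmax."""
    for weights, biases in hidden_layers:
        x = dense(x, weights, biases, relu)
    weights, biases = output_layer
    return softmax(dense(x, weights, biases, identity))


# Made-up weights: 3 input features -> 2 hidden neurons -> 3 output classes.
hidden = [([[0.2, -0.1, 0.4], [0.5, 0.3, -0.2]], [0.0, 0.1])]
output = ([[1.0, -1.0], [-0.5, 2.0], [0.3, 0.3]], [0.0, 0.0, 0.0])

probs = forward([1.0, 0.5, -0.5], hidden, output)
print([round(p, 3) for p in probs])
```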

Transformer architectures, introduced in natural language processing (NLP), have replaced
recurrent neural networks (RNNs) for many NLP tasks because they handle long-range dependencies
more efficiently. One of the best-known transformer models is BERT (Bidirectional Encoder
Representations from Transformers), which achieved state-of-the-art results on tasks such as
sentiment analysis, natural language inference, and question answering.
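The sketch below illustrates, under simplifying assumptions, why attention copes well with long-range dependencies: in single-head scaled dot-product self-attention every token is compared with every other token in one matrix product, instead of information being passed step by step as in an RNN. The sizes and random matrices are placeholders; BERT itself stacks many multi-head attention layers with learned weights, residual connections, and feed-forward sublayers.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (no masking).

    X has one row per token. Each token's query is scored against every
    token's key, so each output row mixes information from the whole
    sequence in a single step, however far apart the tokens are.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # (n_tokens, n_tokens)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over each row
    return weights @ V, weights

rng = np.random.default_rng(0)
n_tokens, d_model, d_head = 5, 16, 8
X = rng.normal(size=(n_tokens, d_model))           # stand-in token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))

out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape, weights.shape)  # (5, 8) (5, 5)
```

Row i of weights shows how much token i draws on each of the other tokens; the first and last tokens are connected just as directly as neighbouring ones.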

Modules Used in Project:
ktrain

The ktrain module is a lightweight Python library designed to simplify the process
of building, training, and deploying machine learning models. It provides an intuitive
and user-friendly interface for both traditional machine learning algorithms and modern
deep learning models, making it easier for developers and researchers to work with
complex frameworks like TensorFlow and Keras.

Numpy
Numpy is a general-purpose array-processing package. It provides a high-
performance multidimensional array object, and tools for working with these arrays. It
is the fundamental package for scientific computing with Python. It contains various
features including these important ones:
A powerful N-dimensional array object.
Sophisticated (broadcasting) functions.
Tools for integrating C/C++ and Fortran code.
Useful linear algebra, Fourier transform, and random number capabilities.
Besides its obvious scientific uses, Numpy can also be used as an efficient multi-
dimensional container of generic data. Arbitrary data-types can be defined using Numpy
which allows Numpy to seamlessly and speedily integrate with a wide variety of
databases.
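As a small illustration of the N-dimensional array object and broadcasting (the scores are made-up values, not model output), dividing a 2-D array by its per-row totals normalises every row without an explicit loop:

```python
import numpy as np

# A 2-D array: 3 texts (rows) x 4 emotion scores (columns); illustrative values only
scores = np.array([[2.0, 0.5, 0.1, 0.4],
                   [0.2, 1.5, 0.3, 0.0],
                   [1.0, 1.0, 1.0, 1.0]])

# Broadcasting: the (3, 1) column of row totals is stretched across the
# 4 columns, so each row is divided by its own total
row_totals = scores.sum(axis=1, keepdims=True)
normalised = scores / row_totals

print(normalised.sum(axis=1))     # [1. 1. 1.]
print(normalised.argmax(axis=1))  # column index of the largest score in each row
```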

Pandas
Pandas is an open-source Python library providing high-performance data
manipulation and analysis tools through its powerful data structures. Before Pandas, Python
was mainly used for data munging and preparation and contributed very little to data
analysis. Pandas solved this problem.

Using Pandas, we can accomplish five typical steps in the processing and analysis
of data, regardless of the origin of the data: load, prepare, manipulate, model, and analyze.

Python with Pandas is used in a wide range of academic and commercial domains,
including finance, economics, statistics, and analytics.
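A minimal sketch of the load, prepare, manipulate, and analyze steps on a tiny in-memory dataset follows. The column names Emotion and Text and the four rows are hypothetical stand-ins for the project's data; the modelling step is left to BERT.

```python
import io
import pandas as pd

# Load: a tiny in-memory CSV standing in for the real dataset (hypothetical rows)
csv = io.StringIO(
    "Emotion,Text\n"
    "anger,She is so annoying\n"
    "joy,What a wonderful day\n"
    "neutral,I went to the store\n"
    "joy,I passed my exam\n"
)
df = pd.read_csv(csv)

# Prepare: drop missing or duplicate rows and normalise the label text
df = df.dropna().drop_duplicates()
df["Emotion"] = df["Emotion"].str.strip().str.lower()

# Manipulate: derive a new column from an existing one
df["Num_Words"] = df["Text"].str.split().str.len()

# Analyze: class balance and average text length per emotion
print(df["Emotion"].value_counts())
print(df.groupby("Emotion")["Num_Words"].mean())
```

Checking the class balance this way before training shows whether some emotions are under-represented in the data.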

Matplotlib
Matplotlib is a Python 2D plotting library which produces publication-quality figures
in a variety of hardcopy formats and interactive environments across platforms.
Matplotlib can be used in Python scripts, the Python and IPython shells, the
Jupyter Notebook, web application servers, and four graphical user interface toolkits.
Matplotlib tries to make easy things easy and hard things possible. You can generate
plots, histograms, power spectra, bar charts, error charts, scatter plots, etc., with just a
few lines of code.

For simple plotting, the pyplot module provides a MATLAB-like interface, particularly
when combined with IPython. Power users have full control of line styles,
font properties, axes properties, etc., via an object-oriented interface or via a set of
functions familiar to MATLAB users.
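For example, a bar chart of per-class probabilities, similar in spirit to the one on the project's output screen, takes only a few pyplot calls. The probabilities below are illustrative values rather than real predictions, and the Agg backend is chosen so that the sketch also runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window is required
import matplotlib.pyplot as plt

# Illustrative class probabilities for one input text (not real model output)
probs = {"anger": 0.70, "joy": 0.05, "sadness": 0.05, "fear": 0.10, "neutral": 0.10}

fig, ax = plt.subplots(figsize=(6, 3))
bars = ax.bar(list(probs.keys()), list(probs.values()), color="tab:red")
ax.set_xlabel("Emotion")
ax.set_ylabel("Probability")
ax.set_ylim(0, 1)
ax.set_title("Prediction probabilities")
fig.tight_layout()

# Render to an in-memory PNG; pass a file name instead to write it to disk
buf = io.BytesIO()
fig.savefig(buf, format="png")
```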

Scikit-learn
Scikit-learn provides a range of supervised and unsupervised learning algorithms via
a consistent interface in Python. It is licensed under a permissive simplified BSD license
and is distributed with many Linux distributions, encouraging academic and
commercial use.

Python

Python is an interpreted high-level programming language for general-purpose
programming. Created by Guido van Rossum and first released in 1991, Python has a
design philosophy that emphasizes code readability, notably using significant whitespace.
5.2 EXTENSION OF KEY FUNCTION:

5.2.1 Data Preprocessing:

We import the neattext library's functions module under the alias nfx. The neattext library is a
tool for cleaning and preprocessing text data, often used in NLP (Natural Language
Processing) tasks.

We apply the remove_userhandles function from the neattext library to each entry in the Text
column of the DataFrame df. The remove_userhandles function removes user handles (e.g.,
@username) commonly found in social media posts like tweets. The cleaned text is stored in
a new column called Clean_Text.

We then apply the remove_stopwords function from neattext to the Clean_Text column. Stopwords
are common words (e.g., "the," "is," "and") that are often removed in text preprocessing as
they do not carry significant meaning in most NLP tasks. The resulting text remains in the
Clean_Text column.
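Since neattext may not be installed everywhere, the sketch below reimplements the two cleaning steps with the standard library so that their effect is easy to see. The regular expression and the short stop-word list are assumptions for illustration; neattext's own functions use their own patterns and a much larger stop-word list, so their results can differ.

```python
import re

# Deliberately small stop-word list for illustration (an assumption, not neattext's list)
STOPWORDS = {"a", "an", "the", "is", "am", "are", "and", "or", "to", "of",
             "so", "i", "she", "he", "it"}

HANDLE_RE = re.compile(r"@\w+")  # "@" followed by letters, digits or underscores

def remove_userhandles(text: str) -> str:
    """Drop @username mentions and collapse the leftover whitespace."""
    return " ".join(HANDLE_RE.sub(" ", text).split())

def remove_stopwords(text: str) -> str:
    """Keep only the tokens whose lower-cased form is not a stop word."""
    return " ".join(tok for tok in text.split() if tok.lower() not in STOPWORDS)

raw = ["@john_doe She is so annoying", "What a wonderful day @sunny99 !"]
clean = [remove_stopwords(remove_userhandles(t)) for t in raw]
print(clean)  # ['annoying', 'What wonderful day !']
```

In the project, the same two steps are applied column-wise, roughly as df["Clean_Text"] = df["Text"].apply(nfx.remove_userhandles) followed by df["Clean_Text"] = df["Clean_Text"].apply(nfx.remove_stopwords).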

5.2.2 Training

BERT Algorithm:

The function text.texts_from_array in ktrain is part of the text classification workflow,
especially when working with transformer models like BERT. It converts raw text into a format
suitable for BERT-based models: it tokenizes the input text and prepares the data for efficient
processing during model training. BERT requires specific input formats such as tokenized
sequences and attention masks, and this function sets them up. These preprocessing steps are
required before text data can be fed into a BERT model for training or inference.
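To show what "a format suitable for BERT" means, here is a hedged sketch of the fixed-length inputs a BERT classifier consumes: token ids wrapped in [CLS] and [SEP], padding up to a maximum length, an attention mask that marks real tokens, and segment ids. The toy vocabulary and whitespace splitting are assumptions standing in for BERT's WordPiece tokenizer, and this is not ktrain's internal code.

```python
from typing import Dict, List

# Hypothetical toy vocabulary; real BERT uses a WordPiece vocabulary of about 30k entries
VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
         "she": 4, "is": 5, "so": 6, "annoying": 7}

def encode(text: str, maxlen: int = 8) -> Dict[str, List[int]]:
    """Turn one sentence into BERT-style fixed-length inputs.

    Whitespace splitting stands in for WordPiece tokenisation here.
    """
    words = text.lower().split()[: maxlen - 2]      # leave room for [CLS] and [SEP]
    tokens = ["[CLS]"] + words + ["[SEP]"]
    ids = [VOCAB.get(tok, VOCAB["[UNK]"]) for tok in tokens]
    pad = maxlen - len(ids)
    return {
        "input_ids": ids + [VOCAB["[PAD]"]] * pad,
        "attention_mask": [1] * len(ids) + [0] * pad,  # 1 = real token, 0 = padding
        "token_type_ids": [0] * maxlen,                # single sentence: all segment 0
    }

print(encode("She is so annoying"))
```

Unknown words map to [UNK] and longer texts are truncated, which is why the maximum sequence length is one of the main settings to choose during preprocessing.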

5.2.3 Evaluation

Accuracy:

The one-cycle learning rate policy involves varying the learning rate during training. The
learning rate starts small, increases to a maximum value, and then decreases. This approach can
help in achieving better convergence.
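A simplified triangular version of the one-cycle schedule can be written in a few lines. The starting value of max_lr / 10 and the linear ramps are assumptions for illustration; library implementations, including the one behind ktrain's learner.fit_onecycle, may add details such as momentum cycling or a final annealing phase.

```python
def one_cycle_lr(step: int, total_steps: int, max_lr: float, div: float = 10.0) -> float:
    """Learning rate at `step` (0..total_steps) under a triangular one-cycle policy.

    The rate climbs linearly from max_lr / div to max_lr during the first half
    of training, then falls linearly back to max_lr / div in the second half.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    low = max_lr / div
    half = total_steps / 2
    if step <= half:
        frac = step / half                  # warm-up: 0 -> 1
    else:
        frac = (total_steps - step) / half  # cool-down: 1 -> 0
    return low + (max_lr - low) * frac

# Example: 10 optimisation steps with a peak learning rate of 2e-5
schedule = [one_cycle_lr(s, 10, 2e-5) for s in range(11)]
for s, lr in enumerate(schedule):
    print(f"step {s:2d}: lr = {lr:.2e}")
```

The small initial rate keeps early updates from disturbing the pre-trained weights too much, the peak lets training move quickly, and the final decrease helps the model settle.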

Over the epochs, both training and validation accuracy generally improve, indicating that the
model is learning and generalizing well. This steady improvement in both training and validation
metrics suggests that the training process is effective.
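Training and validation accuracy are the same metric computed on different data: the fraction of texts whose predicted emotion matches the true label. A minimal sketch with made-up labels (not this project's results):

```python
from typing import Sequence

def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Toy labels for illustration only
y_true = ["anger", "joy", "neutral", "sadness", "joy"]
y_pred = ["anger", "joy", "joy", "sadness", "joy"]
print(accuracy(y_true, y_pred))  # 0.8
```

scikit-learn's sklearn.metrics.accuracy_score computes the same ratio. A validation accuracy that stays close to the training accuracy is what supports the claim that the model generalizes rather than memorizing the training set.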

• BERT Algorithm:
BERT (Bidirectional Encoder Representations from Transformers) is a powerful
transformer-based model introduced by Google for natural language understanding tasks. It has
been widely used in various NLP tasks due to its ability to capture deep contextual relationships
in text.

Unlike traditional models that process text in a left-to-right or right-to-left manner, BERT
processes text bidirectionally. This means it looks at the entire sentence (or context)
simultaneously to understand the meaning of each word in relation to others.
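The difference between left-to-right and bidirectional processing can be pictured as a visibility mask, where entry (i, j) says whether token i may attend to token j. This sketch only prints the two masks for a short example sentence; it is an illustration, not part of BERT's code.

```python
from typing import List

def visibility_mask(n: int, bidirectional: bool) -> List[List[int]]:
    """Return an n x n matrix where mask[i][j] == 1 if token i may look at token j."""
    return [[1 if bidirectional or j <= i else 0 for j in range(n)]
            for i in range(n)]

tokens = ["she", "is", "so", "annoying"]
for name, bidirectional in [("left-to-right", False), ("bidirectional", True)]:
    print(name)
    for tok, row in zip(tokens, visibility_mask(len(tokens), bidirectional)):
        print(f"  {tok:>9}: {row}")
```

In the left-to-right mask the row for "so" is [1, 1, 1, 0], so it cannot see "annoying"; in the bidirectional mask every word sees the whole sentence, which lets BERT interpret each word using context on both sides.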

BERT is initially trained on a large corpus of text data (such as Wikipedia) to learn general
language representations. It uses tasks like Masked Language Modeling (MLM) and Next
Sentence Prediction (NSP) during this phase.
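Masked Language Modeling can be sketched as follows, using the corruption rule described in the BERT paper: about 15% of positions are selected for prediction, and of those, 80% are replaced by [MASK], 10% by a random word, and 10% are left unchanged. The tiny vocabulary and whitespace tokens are assumptions; Next Sentence Prediction is not shown.

```python
import random
from typing import List, Tuple

def mask_tokens(tokens: List[str], vocab: List[str], rng: random.Random,
                mask_prob: float = 0.15) -> Tuple[List[str], List[int]]:
    """Corrupt a token list BERT-style.

    Returns the corrupted copy and the positions the model must predict.
    The input list itself is not modified.
    """
    corrupted = list(tokens)
    targets = []
    for i in range(len(tokens)):
        if rng.random() < mask_prob:
            targets.append(i)
            r = rng.random()
            if r < 0.8:
                corrupted[i] = "[MASK]"           # 80%: mask token
            elif r < 0.9:
                corrupted[i] = rng.choice(vocab)  # 10%: random word
            # remaining 10%: keep the original token
    return corrupted, targets

rng = random.Random(0)
sentence = "she is so annoying but i still like her".split()
corrupted, targets = mask_tokens(sentence, ["happy", "sad", "day"], rng)
print(corrupted, targets)
```

During pre-training the loss is computed only at the target positions, where the model must recover the original word from the surrounding context on both sides.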

After pre-training, BERT can be fine-tuned on specific tasks (like text classification) using
a smaller, task-specific dataset. This step adapts BERT's general language understanding to the
particular task at hand.
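For text classification, fine-tuning typically adds a small layer on top of BERT's final [CLS] representation that maps it to one score per class, followed by a softmax. The sketch below shows only that head in NumPy; the five-label set, the random [CLS] vector, and the weights are hypothetical placeholders (768 is the hidden size of BERT-base), and in real fine-tuning BERT's own weights are updated together with the head.

```python
import numpy as np

EMOTIONS = ["anger", "fear", "joy", "neutral", "sadness"]  # hypothetical label set

def classify(cls_vector: np.ndarray, W: np.ndarray, b: np.ndarray):
    """Map a [CLS] vector to a probability per emotion and the top label."""
    logits = W @ cls_vector + b              # one score per class
    exp = np.exp(logits - logits.max())      # numerically stable softmax
    probs = exp / exp.sum()
    return dict(zip(EMOTIONS, probs.tolist())), EMOTIONS[int(probs.argmax())]

rng = np.random.default_rng(0)
hidden = 768                                 # hidden size of BERT-base
cls_vector = rng.normal(size=hidden)         # placeholder for BERT's [CLS] output
W = rng.normal(scale=0.02, size=(len(EMOTIONS), hidden))  # new task-specific weights
b = np.zeros(len(EMOTIONS))

probs, label = classify(cls_vector, W, b)
print(label, probs)
```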

• LSTM Algorithm:
LSTM (Long Short-Term Memory) is a type of Recurrent Neural Network (RNN)
architecture specifically designed to handle sequential data and long-term dependencies. Unlike
traditional RNNs, which struggle with the vanishing gradient problem, LSTMs are capable of
learning long-range dependencies in sequences.

LSTM networks are well-suited for tasks involving sequences of data, such as time series
forecasting, text analysis, and speech recognition. They process data sequentially, maintaining a
hidden state that captures information from previous steps.
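A single LSTM time step can be sketched in NumPy using the standard gate equations: the forget, input, and output gates decide what to drop from, add to, and expose from the cell state. The sizes and random weights are illustrative assumptions; a real model learns W and b by backpropagation through time.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step.

    W maps the concatenated [h_prev, x] to the four gate pre-activations,
    stacked in the order [forget, input, candidate, output].
    """
    z = W @ np.concatenate([h_prev, x]) + b
    f, i, g, o = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)
    g = np.tanh(g)                 # candidate values to write into memory
    c = f * c_prev + i * g         # cell state: keep part of the old memory, add new
    h = o * np.tanh(c)             # hidden state passed to the next step
    return h, c

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
W = rng.normal(scale=0.1, size=(4 * hidden_size, hidden_size + input_size))
b = np.zeros(4 * hidden_size)

h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for x in rng.normal(size=(5, input_size)):  # a sequence of 5 input vectors
    h, c = lstm_step(x, h, c, W, b)
print(h)
```

The additive update of c is what lets information (and gradients) flow across many steps, which is how LSTMs ease the vanishing gradient problem.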

LSTMs are effective for tasks where understanding the order and context of elements in a
sequence is crucial. They excel in scenarios like sentiment analysis and sequence prediction,
where maintaining long-term dependencies in data is necessary.

5.3 OUTPUT SCREENS:

Fig 5.1.1 Open Link

In the above screen, the server has started and the link can now be opened.

Fig 5.1.2 Home Page

In the above screen, the user enters the input text in the text area.

Fig 5.1.3 Input

In the above screen, click the 'Submit' button to get the prediction.

The output screen shows the result of the emotion detection system analyzing the emotion
conveyed in the text. In this instance, the input text, "She is so annoying," has been analyzed
by the system, which interprets the emotion conveyed as anger. This is a reasonable inference, as
the word "annoying" often carries negative connotations that can evoke feelings of frustration or
irritation. The system highlights this by explicitly labeling the detected emotion as anger,
accompanied by an angry emoji.

Fig 5.1.4 Output

A bar chart is also provided to show the prediction probabilities for various
emotions. Among the emotions such as joy, sadness, fear, anger, neutral, and
others, anger has the highest probability, reaching around 0.7 (or 70%). Other
emotions, including joy, sadness, and neutral, register very low probabilities,
essentially close to zero. This visualization further reinforces the system’s conclusion
that anger is the dominant emotion in the analyzed text.

6. SYSTEM TEST

The purpose of testing is to discover errors. Testing is the process of trying to
discover every conceivable fault or weakness in a work product. It provides a way to check
the functionality of components, subassemblies, assemblies and/or a finished product. It is
the process of exercising software with the intent of ensuring that the software system meets
its requirements and user expectations and does not fail in an unacceptable manner. There
are various types of tests, and each test type addresses a specific testing requirement.

6.1 TYPES OF TESTS


Unit testing
Unit testing involves the design of test cases that validate that the internal program logic
is functioning properly and that program inputs produce valid outputs. All decision branches
and internal code flow should be validated. It is the testing of individual software units of
the application, done after the completion of an individual unit and before integration. This
is structural testing that relies on knowledge of the unit's construction and is invasive. Unit tests
perform basic tests at component level and test a specific business process, application,
and/or system configuration. Unit tests ensure that each unique path of a business process
performs accurately to the documented specifications and contains clearly defined inputs and
expected results.
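As an example of a unit test at the component level, the sketch below uses Python's built-in unittest module to check a user-handle removal function. The function is a hypothetical stand-in written with a regular expression, not the neattext implementation the project calls.

```python
import re
import unittest

def remove_userhandles(text: str) -> str:
    """Hypothetical stand-in for the project's handle-removal step."""
    return " ".join(re.sub(r"@\w+", " ", text).split())

class RemoveUserhandlesTest(unittest.TestCase):
    def test_handle_is_removed(self):
        # Valid input with a handle: the handle must disappear
        self.assertEqual(remove_userhandles("@john_doe She is so annoying"),
                         "She is so annoying")

    def test_text_without_handles_is_unchanged(self):
        self.assertEqual(remove_userhandles("I went to the store"),
                         "I went to the store")

    def test_empty_input(self):
        # Boundary case: empty input must not raise
        self.assertEqual(remove_userhandles(""), "")

if __name__ == "__main__":
    unittest.main(argv=["unit-tests"], exit=False)
```

Each test has a clearly defined input and expected result, which is what the definition above asks of unit tests.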

Integration testing
Integration tests are designed to test integrated software components to determine if they
actually run as one program. Testing is event driven and is more concerned with the basic
outcome of screens or fields. Integration tests demonstrate that although the components
were individually satisfactory, as shown by successful unit testing, the combination of
components is correct and consistent. Integration testing is specifically aimed at exposing
the problems that arise from the combination of components.
Functional test
Functional tests provide systematic demonstrations that functions tested are available as
specified by the business and technical requirements, system documentation, and user
manuals.

Functional testing is centered on the following items:

Valid Input: identified classes of valid input must be accepted.
Invalid Input: identified classes of invalid input must be rejected.
Functions: identified functions must be exercised.
Output: identified classes of application outputs must be exercised.
Systems/Procedures: interfacing systems or procedures must be invoked.

Organization and preparation of functional tests is focused on requirements, key
functions, or special test cases. In addition, systematic coverage of identified
business process flows, data fields, predefined processes, and successive processes must be
considered for testing. Before functional testing is complete, additional tests are identified
and the effective value of current tests is determined.

System Test
System testing ensures that the entire integrated software system meets requirements. It
tests a configuration to ensure known and predictable results. An example of system testing
is the configuration-oriented system integration test. System testing is based on process
descriptions and flows, emphasizing pre-driven process links and integration points.

White Box Testing


White box testing is testing in which the software tester has knowledge of the
inner workings, structure and language of the software, or at least its purpose.
It is used to test areas that cannot be reached from a black box level.
Black Box Testing

Black box testing is testing the software without any knowledge of the inner workings,
structure or language of the module being tested. Black box tests, like most other kinds of
tests, must be written from a definitive source document, such as a specification or
requirements document. In this kind of testing, the software under test is treated as a black
box: you cannot "see" into it. The test provides inputs and responds to outputs without
considering how the software works.

6.2 VARIOUS TESTCASE SCENARIOS

Fig 6.2.1 Testcases

Testcase ID | Test case description | Test case steps | Test data | Expected result | Actual result | Status
----------- | --------------------- | --------------- | --------- | --------------- | ------------- | ------
TC01 | Accuracy | 1. Give keywords to see high accuracy. | "I am Furious" | The output should be "anger" with high accuracy. | The output is "anger" with an accuracy of 1. | Success
TC02 | Neutral Text | 1. Test the model's behavior with neutral or ambiguous text. | "I went to the Store" | The output should be "neutral". | The output is "neutral". | Success
TC03 | Multiple Emotions | 1. Verify the output when the input text expresses mixed emotions. | "I am Happy but also Nervous" | The output should show two emotions. | Only one emotion is predicted. | Fail
TC04 | Empty input | 1. Verify that the model handles empty input gracefully. | " " | The program should handle errors, indicating invalid input. | The output is "invalid input". | Success
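Test case TC04 expects empty input to be rejected gracefully. One way to do that, sketched here under the assumption that the model is wrapped in a callable returning a dictionary of emotion probabilities (the predict argument and fake_predict below are hypothetical), is to validate the text before the model is ever invoked:

```python
from typing import Callable, Dict, Optional

def predict_emotion(text: Optional[str],
                    predict: Callable[[str], Dict[str, float]]) -> str:
    """Return the most likely emotion, or an error message for empty input.

    `predict` is a hypothetical wrapper around the trained model that maps a
    text to an {emotion: probability} dictionary.
    """
    if text is None or not text.strip():
        # TC04: reject empty or whitespace-only input before touching the model
        return "Invalid input: please enter some text."
    probs = predict(text.strip())
    return max(probs, key=probs.get)

def fake_predict(text: str) -> Dict[str, float]:
    # Stand-in for the real model, used only to demonstrate the control flow
    return {"anger": 0.7, "joy": 0.1, "neutral": 0.2}

print(predict_emotion("   ", fake_predict))           # Invalid input: please enter some text.
print(predict_emotion("I am Furious", fake_predict))  # anger
```

Keeping the check outside the model means whitespace-only input never reaches the tokenizer, and the web page can show the message directly.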

7. CONCLUSION AND FUTURE ENHANCEMENT

7.1 PROJECT CONCLUSION:


In this project, we successfully implemented a BERT-based text emotion detection
system using Natural Language Processing (NLP) techniques. The BERT
(Bidirectional Encoder Representations from Transformers) model, known for its state-
of-the-art performance in a wide range of NLP tasks, was fine-tuned to detect emotions
from textual data. This model leverages the pre-trained language understanding of BERT
and was adapted for the specific task of emotion classification, providing high accuracy
in detecting subtle emotional cues from text.

7.2 FUTURE ENHANCEMENT:

At present, most BERT-based emotion detection models are limited to English,
which restricts their use in non-English-speaking regions or multilingual contexts.
Expanding the system to support multiple languages would greatly increase its utility,
especially for global applications like customer service and social media sentiment
analysis. This could be achieved by integrating models like mBERT (Multilingual BERT)
or XLM-R (Cross-lingual RoBERTa), allowing the model to understand and detect
emotions in a variety of languages, thereby enhancing its global reach and effectiveness.

The current model often analyzes emotions on a sentence-by-sentence basis, which
can result in inaccurate predictions when emotions shift over the course of a conversation.
Future enhancements could focus on making the system context-aware, allowing it to
analyze an entire dialogue or text sequence to better capture emotional transitions. This
would provide a more nuanced understanding of emotions in longer texts, such as email
threads, social media conversations, or customer support chats. Implementing sequence-based
learning or using models that track emotional states across multiple sentences can
significantly improve the model's contextual accuracy.

8. REFERENCES

8.1 PAPER REFERENCES:

[1] R. Cowie, E. Douglas-Cowie, N. Tsapatsoulis, G. Votsis, and S. Kollias, "Emotion
recognition in human-computer interaction," IEEE Signal Processing Magazine, vol. 18,
no. 1, pp. 32–80, Jan. 2001, doi: 10.1109/79.911197.

[2] J. Guo, "Deep learning approach to text analysis for human emotion detection from big
data," Journal of Intelligent Systems, 2022.
J. Vijayalakshmi and K. PandiMeena, "Agriculture TalkBot using AI," International Journal
of Recent Technology and Engineering (IJRTE), Jul. 2019.

[3] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep
bidirectional transformers for language understanding," arXiv:1810.04805, Oct. 2018.

[4] M. Abdul-Mageed and L. Ungar, "EmoNet: Fine-grained emotion detection with gated
recurrent neural networks," in Proc. 55th Annu. Meeting Assoc. Comput. Linguistics,
vol. 4, Jul. 2017, pp. 718–728.

[5] S. Ibrahiem, K. Bahnasy, M. Morsey, and M. Aref, "Feature extraction enhancement in
users' attitude detection," International Journal of Intelligent Computing and Information
Sciences, 2018.

[6] H. Binali, C. Wu, and V. Potdar, "Computational approaches for emotion detection in
text," in Proc. 4th IEEE Int. Conf. Digital Ecosystems and Technologies (DEST), 2010,
pp. 172–177.

[7] C.-H. Wu, Z.-J. Chuang, and Y.-C. Lin, "Emotion recognition from text using semantic
labels and separable mixture models," ACM Transactions on Asian Language Information
Processing (TALIP), vol. 5, no. 2, pp. 165–183, 2006.

[8] Y. Kim, "Convolutional neural networks for sentence classification," arXiv:1408.5882, 2014.

[9] P. Zhong and C. Miao, "ntuer at SemEval-2019 Task 3: Emotion classification with word
and sentence representations in RCNN," Feb. 2019.

[10] https://www.geeksforgeeks.org/explanation-of-bert-model-nlp/

[11] https://www.analyticsvidhya.com/blog/2020/05/what-is-tokenization-nlp/

8.2 WEBSITES
[1] https://www.sciencedirect.com/science/article/abs/pii/S0893608022000958
[2] https://ieeexplore.ieee.org/document/9317523
[3] https://ieeexplore.ieee.org/document/10053943

8.3 TEXT BOOKS

1. "Sentiment Analysis and Opinion Mining" by Bing Liu

2. "Foundations of Statistical Natural Language Processing" by Christopher D. Manning
and Hinrich Schütze

3. "Natural Language Processing with Python: Analyzing Text with the Natural
Language Toolkit" by Steven Bird, Ewan Klein, and Edward Loper

4. "Deep Learning for Natural Language Processing" by Palash Goyal, Sumit Pandey, and
Karan Jain
