Last Data Analytics Report-1267
on
Empowering Sentiment Analysis with Hugging Face on Amazon SageMaker
during
III Year II Semester Summer
submitted to
The Department of Information Technology
Bachelor of Technology
in
Information Technology
by
Mr. M. Dhanaraju
Assistant Professor
An Autonomous Institution
Affiliated to
Jawaharlal Nehru Technological University
Hyderabad - 500085
Department of Information Technology
Sreenidhi Institute of Science and Technology
Department of Information Technology
CERTIFICATE
This is to certify that this Summer Industry Internship – II Report on “Empowering Sentiment Analysis
with Hugging Face on Amazon SageMaker”, submitted by Sai charan Baru (20311A1282) in the year
2023 in partial fulfillment of the academic requirements of Jawaharlal Nehru Technological University for the
award of the degree of Bachelor of Technology in Information Technology, is a bona fide work in industry
internship that has been carried out during the III B-Tech IT II Semester under our guidance. This report has not
been submitted to any other institute or university for the award of any degree.
External Examiner
Date:
Internship Certificate:
DECLARATION
I would like to express my gratitude to all the people behind the scenes who helped me to transform
an idea into a real application. I would like to thank our internship coordinator, Mr. M. Dhanaraju, for
his technical guidance, constant encouragement and support in carrying out my project at college. I
profoundly thank Dr. Sunil Bhutada, Head of the Department of Computer Science &
Engineering, who has been an excellent guide and a great source of inspiration for my work.
I would like to express my heartfelt gratitude to my parents, without whom I would not have been
privileged to achieve and fulfill my dreams. I am grateful to our principal, Dr. T. Ch. Siva Reddy,
who most ably runs the institution and has had a major hand in enabling me to do my project. The
satisfaction and euphoria that accompany the successful completion of a task would be incomplete
without mentioning the people who made it possible, whose constant guidance and encouragement
crowned all the efforts with success. In this context, I would like to thank all the other
staff members, both teaching and non-teaching, who extended their timely help and eased my
task.
Abstract:
In this project, I have explained a combined AWS solution built from several AWS services, namely
Amazon Redshift, Amazon S3, IAM, Amazon SageMaker and a few others. Deployed as a project or
product, this solution would be useful both to companies and to individual users. In this project, we
have taken a sample dataset, India GDP data, and, using Amazon Redshift, S3 and other modules,
explored the data and interpreted it in the form of charts and graphs. To use this service, the user or
company follows the steps we have provided: upload their data into the S3 storage service, load it
into the query editor, and then perform the analysis and interpretation. These services support
effective and efficient analysis and interpretation of data. Every step is explained in detail with
figures, so that by reading the description and studying the figure for each step, the user should be
able to implement the desired services with ease. Data visualization helps both in analyzing the data
and in deciding the next steps to take.
Rishitha Goshika
20311A1267
INDEX
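Abstract
1. INTRODUCTION
1.1 About the Internship and Plan of Training Program
1.2 Scope
1.3 Proposed System
2. SYSTEM ANALYSIS
3. SYSTEM ARCHITECTURE AND UML DIAGRAM
4. SYSTEM IMPLEMENTATION
5. INTERNSHIP FEEDBACK
6. CONCLUSION AND FUTURE SCOPE
7. REFERENCES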
1. INTRODUCTION:
1.1. About the Internship and Plan of Training Program:
In Part 1 of the internship, AWS Cloud Technology, the mentors explained in detail the services
that AWS provides to its users and how those services are implemented. They also covered the
basics of AWS cloud services, which are very useful for a beginner. In the cloud internship, the
student gets a broad overview of AWS cloud technology and some of the key techniques that
can be used to develop his or her idea.
In Part 2 of the internship, AWS Data Analytics, the students were taught how to perform
different kinds of operations on different types of data, based on the requirements.
The modules explain how to load data into the platform and how to handle it. The course
contains 8 labs, and each module explains one AWS service that helps in uploading data,
understanding it and interpreting it with different techniques.
1.2. Scope:
Data visualization is the graphical representation of information and data. By using visual
elements like charts, graphs, and maps, data visualization tools provide an accessible way to
see and understand trends, outliers, and patterns in data. Additionally, it provides an
excellent way for employees or business owners to present data to non-technical audiences
without confusion. In the world of Big Data, data visualization tools and technologies are
essential to analyse massive amounts of information and make data-driven decisions.
As the “age of Big Data” kicks into high gear, visualization is an increasingly key tool to
make sense of the trillions of rows of data generated every day. Data visualization helps to
tell stories by curating data into a form easier to understand, highlighting the trends and
outliers.
A good visualization tells a story, removing the noise from data and highlighting useful
information.
Building a BI and data visualization service in the cloud allows you to take advantage of
capabilities such as scalability, availability, redundancy, and enterprise-grade security. It also
lowers the barrier to data connectivity and allows access to a far wider range of data sources,
both traditional, such as databases, and non-traditional, such as SaaS sources. An
added advantage of a cloud-based data visualization service is the elimination of
undifferentiated heavy lifting related to managing server infrastructure.
Amazon Web Services (AWS) offers numerous services for different applications, such as storing,
analysing and interpreting data and managing connections, through services like Amazon EC2,
Amazon S3 and Amazon RDS. A data visualization solution on AWS combines several of these
services. Amazon Redshift is one of the most useful AWS services for data analysis, interpretation
and visualization. From Amazon Redshift, we used clusters, the query editor and Query Editor v2.
In Amazon Redshift, the user loads the data securely and then, using the query editors, can
visualize and understand the data with different techniques and save the results for further
analysis and prediction work.
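When data is loaded from S3 into a Redshift table through the query editor, the operation underneath is Redshift's COPY command. The sketch below only assembles that SQL text so the pieces are visible; the table india_gdp, the bucket, the object key, the role ARN and the region are hypothetical placeholders, and a CSV file with one header row is an assumption about the uploaded data. The target table must already exist with columns matching the file.

```python
import re

# Conservative check for unquoted Redshift identifiers (letters, digits, underscore).
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_copy_sql(table: str, s3_uri: str, iam_role_arn: str, region: str) -> str:
    """Return a Redshift COPY statement that loads a CSV file with one header row."""
    if not _IDENT.fullmatch(table):
        raise ValueError(f"unsafe table name: {table!r}")
    if not s3_uri.startswith("s3://"):
        raise ValueError("expected an s3:// URI")
    for value in (s3_uri, iam_role_arn, region):
        if "'" in value:  # keep the string literals below well-formed
            raise ValueError("single quotes are not allowed in literals")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"REGION '{region}'\n"
        "FORMAT AS CSV\n"
        "IGNOREHEADER 1;"
    )


if __name__ == "__main__":
    # Hypothetical names: replace with your own table, bucket, role and region.
    print(build_copy_sql(
        "india_gdp",
        "s3://my-gdp-bucket/data/india_gdp.csv",
        "arn:aws:iam::123456789012:role/MyRedshiftRole",
        "us-east-1",
    ))
```

Pasting the printed statement into Query Editor v2 loads the file, provided the role is associated with the cluster and is allowed to read the bucket.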
2. SYSTEM ANALYSIS:
Any operating system (Windows, macOS, Linux, etc.) with a reliable web browser and a high-speed
internet connection is suitable for performing the operations.
The user should have an AWS account to access the AWS Management Console.
AWS also offers a mobile application, but it supports only some of the AWS services, so the
web-based console is the better option.
Because the tasks involve working in more than one session or dialog at the same time, the
operating system and processor should be able to handle this, so a fast computer is
recommended.
3. SYSTEM ARCHITECTURE AND UML DIAGRAM
In UML, use-case diagrams model the behavior of a system and help to capture the
requirements of the system. Use-case diagrams describe the high-level functions and
scope of a system. These diagrams also identify the interactions between the system
and its actors.
4. SYSTEM IMPLEMENTATION:
* AmazonS3FullAccess
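AmazonS3FullAccess is an AWS managed policy attached to the IAM role used in these tasks. As a minimal sketch, assuming the role is the one a SageMaker notebook instance assumes, the role needs a trust policy naming the SageMaker service principal plus the managed policy's ARN. The role name MySageMakerRole is hypothetical; create_role and attach_role_policy are real boto3 IAM client methods, but they only run if you call create_role with boto3 installed and AWS credentials configured.

```python
import json

# ARN of the AWS managed policy named in the report.
S3_FULL_ACCESS_ARN = "arn:aws:iam::aws:policy/AmazonS3FullAccess"


def sagemaker_trust_policy() -> dict:
    """Trust policy that lets the SageMaker service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_role(role_name: str = "MySageMakerRole") -> str:
    """Create the role and attach AmazonS3FullAccess (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the policy helper works without boto3

    iam = boto3.client("iam")
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(sagemaker_trust_policy()),
    )
    iam.attach_role_policy(RoleName=role_name, PolicyArn=S3_FULL_ACCESS_ARN)
    return role["Role"]["Arn"]


if __name__ == "__main__":
    print(json.dumps(sagemaker_trust_policy(), indent=2))
```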
Output Screen:
Fig. 4.3: Creating buckets in S3
Output Screen:
Fig. 4.4: Output screen of S3
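S3 rejects bucket names that break its naming rules, which is a common stumbling block when creating the bucket. The sketch below checks a subset of the documented rules for general-purpose buckets (length, allowed characters, first and last character, adjacent periods, IP-address form and two reserved affixes) and then uploads a local file. The function names are hypothetical, the list of reserved prefixes and suffixes is not complete, and global uniqueness of the name can only be checked by S3 itself. upload_file is a real boto3 S3 client method and needs AWS credentials.

```python
import ipaddress
import re

# Lowercase letters, digits, '.' and '-', beginning and ending with a letter or digit.
_ALLOWED = re.compile(r"[a-z0-9][a-z0-9.-]*[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Check a subset of the S3 general-purpose bucket naming rules."""
    if not 3 <= len(name) <= 63 or not _ALLOWED.fullmatch(name):
        return False
    if ".." in name or name.startswith("xn--") or name.endswith("-s3alias"):
        return False
    try:
        ipaddress.IPv4Address(name)  # names formatted like 192.168.5.4 are not allowed
    except ValueError:
        return True
    return False


def upload_dataset(path: str, bucket: str, key: str) -> None:
    """Upload a local file to S3 (needs boto3 and AWS credentials)."""
    if not is_valid_bucket_name(bucket):
        raise ValueError(f"invalid bucket name: {bucket!r}")
    import boto3  # lazy import so the name check works without boto3

    boto3.client("s3").upload_file(path, bucket, key)
```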
Task 3: Go to Amazon Redshift and create a cluster:
Fig. 4.7: Cluster dashboard
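The console form for this task maps onto the parameters of the Redshift CreateCluster API. The sketch below builds the keyword arguments for the boto3 Redshift client's create_cluster call; the identifier rules it checks are a subset of the documented ones, and the defaults (node type dc2.large, database dev, admin user awsuser) are illustrative assumptions, since node-type availability varies by region. ManageMasterPassword asks Redshift to keep the admin password in AWS Secrets Manager instead of in code.

```python
import re
from typing import Optional

# Subset of the cluster identifier rules: lowercase letters, digits and hyphens,
# first character a letter, at most 63 characters, no "--", no trailing "-".
_CLUSTER_ID = re.compile(r"[a-z](?!.*--)[a-z0-9-]{0,62}(?<!-)")


def cluster_request(identifier: str, node_type: str = "dc2.large", nodes: int = 1,
                    db_name: str = "dev", admin_user: str = "awsuser",
                    iam_role_arn: Optional[str] = None) -> dict:
    """Build keyword arguments for the boto3 Redshift client's create_cluster call."""
    if not _CLUSTER_ID.fullmatch(identifier):
        raise ValueError(f"invalid cluster identifier: {identifier!r}")
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    request = {
        "ClusterIdentifier": identifier,
        "NodeType": node_type,
        "MasterUsername": admin_user,
        "ManageMasterPassword": True,  # admin password is stored in Secrets Manager
        "DBName": db_name,
    }
    if nodes == 1:
        request["ClusterType"] = "single-node"
    else:
        # NumberOfNodes is required only for multi-node clusters.
        request["ClusterType"] = "multi-node"
        request["NumberOfNodes"] = nodes
    if iam_role_arn:
        # Role the cluster uses for COPY from S3.
        request["IamRoles"] = [iam_role_arn]
    return request


def create_cluster(request: dict) -> dict:
    """Submit the request (needs boto3 and AWS credentials)."""
    import boto3  # lazy import so cluster_request works without boto3

    return boto3.client("redshift").create_cluster(**request)
```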
Output Screens:
Task 4: Creating a Jupyter notebook with Amazon SageMaker
On the AWS Management Console, on the Services menu, choose Amazon SageMaker.
Fig. 4.6: IAM service
From the navigation menu, choose Notebook instances.
Fill in the required details and grant the required permissions through an IAM role.
Choose Create notebook instance.
The Jupyter notebook instance is created.
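The same steps can be scripted. The sketch below builds the arguments for the boto3 SageMaker client's create_notebook_instance call and, in create_notebook, waits for the notebook_instance_in_service waiter and requests a presigned Jupyter URL. The instance name, the ml.t3.medium instance type and the 5 GB volume are illustrative assumptions, and the role ARN would be the IAM role granted in the permissions step.

```python
import re

# Name pattern: alphanumerics, optionally separated by hyphens; at most 63 characters.
_NOTEBOOK_NAME = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9])*")


def notebook_request(name: str, role_arn: str, instance_type: str = "ml.t3.medium",
                     volume_gb: int = 5) -> dict:
    """Build keyword arguments for the boto3 SageMaker client's create_notebook_instance."""
    if len(name) > 63 or not _NOTEBOOK_NAME.fullmatch(name):
        raise ValueError(f"invalid notebook instance name: {name!r}")
    return {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,
        "RoleArn": role_arn,  # the IAM role whose permissions the notebook uses
        "VolumeSizeInGB": volume_gb,
    }


def create_notebook(request: dict) -> str:
    """Create the instance, wait until it is InService and return a Jupyter URL."""
    import boto3  # lazy import: needs boto3 and AWS credentials

    sm = boto3.client("sagemaker")
    name = request["NotebookInstanceName"]
    sm.create_notebook_instance(**request)
    sm.get_waiter("notebook_instance_in_service").wait(NotebookInstanceName=name)
    url = sm.create_presigned_notebook_instance_url(NotebookInstanceName=name)
    return url["AuthorizedUrl"]
```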
Output Screens:
Fig. 4.8: Notebook instances
This is the Jupyter notebook page that we have just created.
Fig. 4.9: Jupyter notebook
Task 5: Creating visualizations (line graph) with Bokeh:
After successfully creating the Jupyter notebook, open it.
Using a few lines of code, we create a line graph.
After choosing Run, Bokeh creates a file called lines.html.
Then save the notebook as Create line graph.
Open the Jupyter dashboard by choosing the Jupyter logo.
From the list, open the lines.html file.
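The report does not reproduce the notebook code for this task. With Bokeh, a line graph is typically produced with output_file("lines.html"), figure(...), p.line(x, y) and show(p) from bokeh.plotting. Because those calls need the Bokeh package, the sketch below is a dependency-free stand-in (plain Python, not Bokeh) that shows what a line graph does underneath: it scales each data point into pixel coordinates, flipping the y axis because SVG's origin is at the top left, and writes an HTML page containing an SVG polyline. The function names and sample values are hypothetical.

```python
from pathlib import Path


def scale_points(xs, ys, width=400, height=300, pad=20):
    """Map data coordinates to SVG pixel coordinates (y grows downward in SVG)."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_span = (x_max - x_min) or 1  # avoid division by zero for constant data
    y_span = (y_max - y_min) or 1
    points = []
    for x, y in zip(xs, ys):
        px = pad + (x - x_min) / x_span * (width - 2 * pad)
        py = height - pad - (y - y_min) / y_span * (height - 2 * pad)
        points.append((round(px, 1), round(py, 1)))
    return points


def write_line_chart(path, xs, ys, title="simple line example"):
    """Write an HTML page with the data drawn as an SVG polyline; return the HTML."""
    width, height = 400, 300
    pts = " ".join(f"{px},{py}" for px, py in scale_points(xs, ys, width, height))
    html = (
        f"<!DOCTYPE html><html><head><title>{title}</title></head><body>"
        f"<h3>{title}</h3>"
        f'<svg width="{width}" height="{height}" xmlns="http://www.w3.org/2000/svg">'
        f'<polyline points="{pts}" fill="none" stroke="steelblue" stroke-width="2"/>'
        "</svg></body></html>"
    )
    Path(path).write_text(html, encoding="utf-8")
    return html
```

In a notebook cell, write_line_chart("lines.html", [1, 2, 3, 4, 5], [6, 7, 2, 4, 5]) writes a file that can be opened from the Jupyter file list in the same way as the lines.html that Bokeh produces.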
Output Screen:
Fig. 4.10: Visualization of data
5. INTERNSHIP FEEDBACK:
6. CONCLUSION AND FUTURE SCOPE:
Data visualization is one of the most effective ways of interpreting data. Compared with plain
representations such as tables and spreadsheets, visualizations communicate more about the
data. Using visualized data, one can more easily spot outliers, handle noise, understand and
classify the data, and anticipate a company's future needs, which is very helpful for
entrepreneurs. We therefore recommend that users and customers use this service and the
services that the platform provides. An organization that wants to understand itself thoroughly
must look at every aspect of the company, including its production, progress and the revenue it
is making. This service provides a reliable and efficient platform for such goals, because, as
shown in the previous sections, the services it provides cover some of the main tasks that every
organization expects to perform. Using the services explained above, anyone can visualize,
interpret and analyze their data.
7. REFERENCES:
1. https://docs.aws.amazon.com/wellarchitected/latest/analytics-lens/data-visualization.html
2. https://www.tableau.com/learn/articles/data-visualization
3. https://en.wikipedia.org/wiki/Amazon_Web_Services
4. https://aws.amazon.com/console
5. https://signin.aws.amazon.com/signin?redirect
6. https://aws.amazon.com/certification/certification-prep/testing
7. https://aws.amazon.com/about-aws/whats-new/2017/05/aws-training-and-certification-portal-now-live/
8. https://aws.amazon.com/certification/certified-cloud-practitioner/
APPENDIX A
Abstract
In this project, we have explained a combined solution built from Amazon SageMaker and the
Hugging Face Hub. Amazon SageMaker provides several built-in machine learning (ML) algorithms
that you can use for a variety of problem types. According to AWS, these algorithms provide
high-performance, scalable machine learning, are optimized for speed, scale and accuracy, can
train on petabyte-scale data, and are designed to provide up to 10x the performance of other
available implementations. In this project, we use Hugging Face's Transformers library with the
Hugging Face extension of the Amazon SageMaker Python SDK to fine-tune a pre-trained
transformer for binary text classification. The pre-trained model is fine-tuned on the SST-2 dataset.
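Fine-tuning of this kind is usually launched from the notebook with the HuggingFace estimator in the SageMaker Python SDK (sagemaker.huggingface). The sketch below is an assumption-laden outline, not the project's actual code: the training script train.py, the ./scripts directory, the hyperparameter names that script reads, the instance type and the framework versions are hypothetical choices that must match a real training script and a supported Hugging Face container. Only sst2_label runs without AWS; it maps a text-classification prediction to SST-2's classes (0 = negative, 1 = positive), covering both the generic LABEL_0/LABEL_1 names a Transformers model reports when no id2label mapping is set and models that already report NEGATIVE/POSITIVE.

```python
# SST-2 is a binary sentiment dataset: label 0 is negative, label 1 is positive.
SST2_CLASSES = {0: "negative", 1: "positive"}

# Hypothetical hyperparameters; SageMaker passes them to the entry-point script as
# command-line arguments, so they must match what train.py actually parses.
HYPERPARAMETERS = {"epochs": 1, "train_batch_size": 32, "model_name": "distilbert-base-uncased"}


def sst2_label(prediction: dict) -> str:
    """Map a prediction such as {'label': 'LABEL_1', 'score': 0.98} to 'positive'."""
    label = prediction["label"]
    if label.upper() in ("NEGATIVE", "POSITIVE"):  # model already carries id2label
        return label.lower()
    if label.startswith("LABEL_"):  # generic names when id2label was not configured
        return SST2_CLASSES[int(label.split("_", 1)[1])]
    raise ValueError(f"unexpected label: {label!r}")


def launch_training(role_arn: str, train_s3: str, test_s3: str):
    """Start a SageMaker training job (needs the sagemaker SDK and AWS credentials)."""
    from sagemaker.huggingface import HuggingFace  # lazy import

    estimator = HuggingFace(
        entry_point="train.py",        # hypothetical training script
        source_dir="./scripts",        # hypothetical directory containing it
        instance_type="ml.p3.2xlarge",
        instance_count=1,
        role=role_arn,
        transformers_version="4.26",   # assumed supported version combination
        pytorch_version="1.13",
        py_version="py39",
        hyperparameters=HYPERPARAMETERS,
    )
    # Each dict key becomes a data channel available to the training script.
    estimator.fit({"train": train_s3, "test": test_s3})
    return estimator
```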
APPENDIX B
DOMAIN OF INTERNSHIP AND NATURE OF INTERNSHIP
Nature of Project
Title | Product | Application | Research | Others (please specify)
Empowering Sentiment Analysis with Hugging Face on Amazon SageMaker | | | |
Table 3: Domain of the Project/Internship Work (please tick √ as appropriate for your project)