1sj18cs064 Nisarga B
DECLARATION
Institute of Technology, Chickballapur, hereby declare that the Internship work entitled
“Amazon Product Review Analysis ” has been independently carried out by me under the
requirement for the award of degree in Bachelor of Engineering in Computer Science &
further declare that the report has not been submitted to any other University for the award of
PLACE: NISARGA B
Date: USN: 1SJ18CS064
ABSTRACT
A sentiment analysis API uses natural language processing (NLP) tasks to not only identify
aspects of the products from the Amazon reviews but also enable brands to look beyond star
ratings. Amazon review data analysis can give insightful customer information that can be
harnessed for product betterment.
We further scrutinized the reviews to analyze the qualitative sentiment beyond the numerical star
rating. Sometimes people give a high rating even though the text of their review is negative. The
reviews are written by actual users of the product, so what is written is usually a first-hand
experience. This data is a treasure trove for brands because people can be very vocal about their
experience and give key information about the positives and negatives of a product, including its
delivery and the customer service they received.
Another important reason is that even though a product may have received 4 stars, it doesn't
necessarily mean that the product was good. Digging deeper into the comments often shows that
the review mentions many negative points even though the person gave a high star rating, and
vice versa.
ACKNOWLEDGEMENT
With reverential pranam, I express my sincere gratitude and salutations to the feet of his
holiness Byravaikya Padmabhushana Sri Sri Sri Dr. Balagangadharanatha Maha Swamiji,
and his holiness Jagadguru Sri Sri Sri Dr. Nirmalanandanatha Swamiji of Sri
Adichunchanagiri Mutt for their unlimited blessings. First and foremost, I wish to express my
deep and sincere gratitude to our institution, Sri Jagadguru Chandrashekaranatha
Swamiji Institute of Technology, for providing me an opportunity to complete my
internship work successfully.
I extend a deep sense of sincere gratitude to Dr. G T Raju, Principal, S J C Institute of
Technology, Chickballapur, for providing an opportunity to complete the Internship Work.
I extend special, heartfelt, and sincere gratitude to our HOD Dr. Manujunath
Kumar, Professor and Head of the Department, Computer Science and Engineering, S J C
Institute of Technology, Chickballapur, for her constant support and valuable guidance during the
Internship Work.
I convey my sincere thanks to my Internship Internal Guide Prof. Harshavardhana
Doddamani, Assistant Professor, Department of Computer Science and Engineering, S J C
Institute of Technology, for their constant support, valuable guidance and suggestions during the
Internship Work.
I am thankful to my Internship External Guide Dhanush S, Project Engineer, Compsoft
Technologies, Rajajinagar, for providing valuable guidance and encouragement during the
Internship Work.
I also feel immense pleasure in expressing deep and profound gratitude to our Internship
Coordinator Prof. Narendra Babu, Assistant Professor, Department of Computer Science
and Engineering, S J C Institute of Technology, for his guidance and suggestions during the
Internship Work.
Finally, I would like to thank all faculty members of the Department of Computer Science
and Engineering, S J C Institute of Technology, Chickballapur, for their support.
I also thank all those who extended their support and co-operation while bringing out this
Internship Report.
Nisarga B (1SJ18CS064)
CONTENTS
Declaration
Abstract
Acknowledgement
Contents
List of Figures
3 TASK PERFORMED
4 REFLECTION NOTES
4.1 Experience
4.2 Technical Outcomes
4.2.1 System Requirement Specification
4.3 System Analysis and Design
4.3.1 Existing System
4.3.2 Disadvantages of the Existing System
4.3.3 Proposed System
4.3.4 Advantages of the Proposed System
4.4 System Architecture
4.4.1 Data Flow Diagram
4.5 Implementation
4.5.1 Modules
4.6 Screen Shots
5 CONCLUSION
BIBLIOGRAPHY
LIST OF FIGURES
CHAPTER - 1
COMPANY PROFILE
Our resources are experts in designing and developing applications using Agile and Scrum
methodologies. Whatever your software development methodology may be, our resources
have experience in broad areas and can carry any project through successfully.
We work hard to continuously enhance our reputation for accessibility, professionalism,
performance, and the depth and quality of our long-term consultative relationships with our
clients. We endeavor to be valued as an industry leader in client satisfaction, quality
performance and reputation. All activities will be conducted to the highest ethical and
professional standards.
We help our clients achieve their objectives by serving as their manpower consulting firm.
Compsoft Technologies has one-to-one relationships with a number of clients, helping them
benefit from all of the technologies available to them and build a better solution that exceeds
the client's expectations. Our goal is to offer a full range of software, consulting, support and
automation services, combined with a wide range of technologies, that enable clients to
consider how they could achieve their objectives.
1.1.1 Objectives
We are committed to going the extra mile to bring success to the clients consistently. We are
dedicated to delivering the right people, solutions, and services to the clients that they require
to meet their technology challenges and business goals.
Amazon Product Review Analysis Company Profile
Delivering the most efficient and best solution to every client, leveraging
leading technologies and industry best practices.
We have a proven record of building highly scalable, world-class consulting processes that
offer tremendous business advantages to our clients in the form of huge cost benefits,
definitive results and consistent project deliveries across the globe.
We strive to improve your business by delivering the full range of competencies,
including improving operational performance, developing and applying business strategies to
improve financial reports, and defining, measuring and managing strategic goals.
We have a proven record of evaluating the best candidates for your requirement and stand by
their quality throughout the project implementation.
We leverage the enormous talent of our passionate and proven individuals. We are a strongly
customer-centric organization that is bent upon fulfilling the needs of our customers
beyond their expectations. We host a consortium of experienced
professionals who work in synergy in order to gain an edge over the market. We look at
ourselves as a team that co-creates with our customers.
Through its integrated Embedded and IoT services, Techno soft helps build intelligent &
connected devices that can be remotely monitored and controlled while leveraging edge
and cloud computing for a host of intelligent applications and analytics.
7 Full Stack Web Development
Full stack web development is the practice of working on both the front-end and back-
end of a program. Full Stack is a layer of software or web development consisting of the
front-end and the back-end portions of an application. Front-end is what the users will
see or interact with on your application. Back-end part is what users do not see, such as
application’s logic, database, server, etc. A full-stack web developer is comfortable
working with both back-end and front-end technologies which make a website or
application function properly.
Amazon Product Review Analysis About the Department
component runs in co-ordination with the existing components. Unit and Integration testing
are iteratively performed until the complete product is built. Once the complete product is
built, it is again tested against multiple test cases and all the functionalities.
The product could be working fine in the developer’s environment but might not necessarily
work well in all other environments that the users could be using. Hence, the product is also
tested under multiple environments (Various operating systems and devices). At every step, if
a flaw is observed, the component is rebuilt to fix the bugs. This way, testing is done
hierarchically and iteratively.
Since the internship was online, the company had additional individuals who ensured easy
onboarding of interns and the smooth running of the online training.
Operation and Strategy Head: Ensured there were no difficulties for interns while
onboarding, and arranged mentors and doubt-clarifying sessions.
Technical Lead: Ensured that the technical side of the online training ran smoothly, and
arranged the platforms for our meetings and trainings.
Mentors: Helped us understand the concepts, gave us tasks to get
practical takeaways, and clarified our doubts.
Interns: Worked through the given tasks either individually or in a group.
Sentiment analysis has come a long way since it was first introduced decades ago, and it is
now something we encounter in our day-to-day conversations. From Siri answering questions in a
more humanized way to chatbots being able to understand what we want from them, this field
has grown considerably in recent years, but there are still challenges that need solving before
it can be considered reliable.
In this project we design an interface for a retailer to register and log in. If the retailer is an
existing user, he can log in directly. Once the retailer logs in, he can view the
review analysis. The reviews are categorised as good, bad and average. Similarly, based on
the names of the customers, a pie chart of the gender of those who posted the reviews is also shown.
User Interface:
The user interface was developed in the Python language on the Anaconda platform, using the
tkinter module. Functions were created for registering a new user, for the login page, and
for the password check.
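The register, login and password-check functions described above can be sketched as follows. This is only an illustration, not the project's actual code: the function names, the in-memory `users` dictionary and the use of a SHA-256 digest are all assumptions made for the example. The tkinter window is built inside a function so that the credential logic can be used without a display.

```python
import hashlib


def hash_password(password):
    """Return a SHA-256 digest of the password (a simplification for this sketch;
    a real system would use a salted, slow password hash)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def register(users, username, password):
    """Add a new retailer. Return False if a field is empty or the name is taken."""
    if not username or not password or username in users:
        return False
    users[username] = hash_password(password)
    return True


def check_password(users, username, password):
    """Return True only if the user exists and the password matches."""
    return users.get(username) == hash_password(password)


def build_login_window(users):
    """Build a small tkinter form with Register and Login buttons."""
    import tkinter as tk  # imported here so the functions above work without a display
    from tkinter import messagebox

    root = tk.Tk()
    root.title("Retailer Login")
    tk.Label(root, text="Username").grid(row=0, column=0)
    tk.Label(root, text="Password").grid(row=1, column=0)
    name_entry = tk.Entry(root)
    pass_entry = tk.Entry(root, show="*")  # mask the typed password
    name_entry.grid(row=0, column=1)
    pass_entry.grid(row=1, column=1)

    def on_register():
        ok = register(users, name_entry.get(), pass_entry.get())
        messagebox.showinfo("Register", "Registered" if ok else "Could not register")

    def on_login():
        ok = check_password(users, name_entry.get(), pass_entry.get())
        messagebox.showinfo("Login", "Welcome" if ok else "Invalid credentials")

    tk.Button(root, text="Register", command=on_register).grid(row=2, column=0)
    tk.Button(root, text="Login", command=on_login).grid(row=2, column=1)
    return root

# To open the window: build_login_window({}).mainloop()
```

Keeping the credential checks separate from the widgets means they can be exercised directly, while the window only wires them to buttons.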
At the Backend,
1. Data Scraping: Crawl the Amazon review URL to extract all required details from it.
The text must be converted to the required format. For example, line-break tags
(such as <br>) have a special meaning to the browser, so each such tag must be
explicitly converted to a space, or else the crawling result will be improper. When
working with online reviews there is always the question of how far a review can be
trusted. This is less of a concern with Amazon reviews, because readers can upvote or
downvote a review, and this is collectively available as the helpful count. The datasets
were provided by the company for the analysis.
2. Data Cleaning and Processing: The extracted data needs to be cleaned so that we get
proper review text on which analysis can be performed. The crawled data is cleaned
by removing all special characters (such as “:/.,’#$*^&-) in order to get the
best results. After cleaning, the crawled content is copied into a CSV file. The next step is
processing the cleaned data. First, each review is classified as a service, feature or product
review. If the review is a feature review, feature extraction is done using POS
tagging and the grammar rules stated below. After feature extraction, the feature
opinion polarity is obtained.
3. All processed output is stored in one CSV file for further use.
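The cleaning step described above can be sketched in a few lines. This is a minimal illustration under assumptions: the function names are hypothetical, line-break tags are assumed to look like `<br>`, `<br/>` or `<br />`, and special characters are replaced by a space (rather than deleted) so that neighbouring words do not merge.

```python
import csv
import io
import re

# Assumption: line-break tags appear as <br>, <br/> or <br />.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)
# Anything that is not a letter, digit or whitespace is treated as "special".
SPECIAL = re.compile(r"[^A-Za-z0-9\s]")
SPACES = re.compile(r"\s+")


def clean_review(raw):
    """Turn line-break tags into spaces, drop special characters, tidy whitespace."""
    text = BR_TAG.sub(" ", raw)
    text = SPECIAL.sub(" ", text)
    return SPACES.sub(" ", text).strip()


def write_cleaned(reviews, out_file):
    """Write one cleaned review per row to an open text file (or StringIO)."""
    writer = csv.writer(out_file)
    writer.writerow(["review"])
    for raw in reviews:
        writer.writerow([clean_review(raw)])


# Example: write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
write_cleaned(["Great phone!<br/>Battery: 2-day life..."], buffer)
```

With a real file, the same call would be made inside `with open(path, "w", newline="") as f:` so that the `csv` module controls the line endings.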
Algorithm 1
1. Crawl the Amazon review CSV to extract all required details from it. Special
care must be taken over the required format of the information.
3. Read the CSV file for processing, and for each review do the following:
   i. Perform a service review test, in which the review is tested for the occurrence
      of service words. If the review is shorter than 15 words and service words
      are found in it, the review is classified as a service review. If the review
      is longer than 15 words, more than 2 service words must occur in it for it
      to be a service review.
   ii. If the review fails the service test, it is tested for features of a
      product (such as camera, microphone and battery). If these exist, the
      review is classified as a feature review.
      a. For each feature, its sentiment is extracted from the review using POS
         tagging and rule-based extraction (using regular expressions).
   iii. If the review also fails the feature test, it is classified as a
      product review. A new final CSV is generated with the classification
      and the sentiment of the feature phrases.
4. This CSV is then loaded into the database, and the visualizations are created by
querying data from the database.
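The classification in step 3 of Algorithm 1 can be sketched as below. The word lists are hypothetical, since the report does not give the actual service and feature vocabularies, and a review of exactly 15 words is treated as a long review, which is an assumption because the text only covers "shorter" and "greater" than 15.

```python
import re

# Hypothetical vocabularies; the report does not list the real ones.
SERVICE_WORDS = {"delivery", "delivered", "shipping", "packaging",
                 "seller", "refund", "courier", "service"}
FEATURE_WORDS = {"camera", "microphone", "battery"}


def tokenize(review):
    """Lower-case the review and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", review.lower())


def classify_review(review):
    """Return 'service', 'feature' or 'product' following Algorithm 1."""
    words = tokenize(review)
    service_hits = sum(1 for w in words if w in SERVICE_WORDS)

    if len(words) < 15:
        is_service = service_hits >= 1   # short review: any service word is enough
    else:
        is_service = service_hits > 2    # long review: more than 2 service words

    if is_service:
        return "service"
    if any(w in FEATURE_WORDS for w in words):
        return "feature"
    return "product"
```

For instance, `classify_review("The camera is very good")` falls through the service test and is labelled as a feature review.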
The polarity of a service or product review is the rating the user provides for that review.
Good reviews are those rated 5 or 4 stars, Average reviews are those rated 3 stars, and Bad
reviews are those rated 2 stars or 1 star.
Finally, when a feature sentiment is extracted, the sentiment phrase is sent to a polarizer
method, which returns +1 if the phrase expresses a positive sentiment and -1 if it expresses a
negative sentiment. The polarizer applies three tests in order.
First, the phrase is tested for indirect opinions such as “Battery no better than
iPhone 4s” by checking it against certain pre-defined phrases that were found during
manual analysis of reviews.
Next, if the phrase test fails, the phrase is tested for the word “not”. If “not” exists, every
word after it is tested for whether it is a positive or a negative word, the polarities of these
words are added, and the sum is negated. For example, “Camera is not good” is
classified as negative because the word “good” is negated by the word “not”.
Lastly, if both the phrase test and the “not” test fail, the phrase is broken down into words,
the polarity of each word is looked up in a dictionary of sentiment words divided into good
and bad words, and the collective polarity is considered: if the sum is below 0 the outcome is
negative (-1), otherwise the outcome is positive (+1).
Rules for feature extraction
The following are some rules that our model uses to extract a feature and its sentiment:
Adjective + Noun
Noun + Adjective
Adverb + Noun
Noun + Verb
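The extraction rules and the polarizer described in this section can be sketched together. Everything named here is an assumption for illustration: the sentiment word sets and the indirect-phrase table are made up, and the rules are matched on adjacent (word, tag) pairs that are assumed to come from a POS tagger using Penn Treebank tags (for example NLTK's `nltk.pos_tag`), rather than through the regular expressions the report mentions.

```python
# Hypothetical lexicons; the report's actual dictionaries are not given.
GOOD_WORDS = {"good", "great", "excellent", "clear", "fast", "amazing"}
BAD_WORDS = {"bad", "poor", "slow", "blurry", "weak", "terrible"}
INDIRECT_PHRASES = {"no better than": -1}  # pre-defined phrase -> polarity

# Tag prefixes (Penn Treebank): JJ adjective, NN noun, RB adverb, VB verb.
RULES = [("JJ", "NN"), ("NN", "JJ"), ("RB", "NN"), ("NN", "VB")]


def extract_pairs(tagged):
    """Return adjacent word pairs whose tags match one of the rules."""
    pairs = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if any(t1.startswith(a) and t2.startswith(b) for a, b in RULES):
            pairs.append((w1, w2))
    return pairs


def word_polarity(word):
    """+1 for a good word, -1 for a bad word, 0 for an unknown word."""
    if word in GOOD_WORDS:
        return 1
    if word in BAD_WORDS:
        return -1
    return 0


def polarize(phrase):
    """Return +1 or -1 using the phrase test, the 'not' test, then the word sum."""
    text = phrase.lower()
    words = text.split()
    for pattern, polarity in INDIRECT_PHRASES.items():
        if pattern in text:                      # 1. indirect-opinion phrases
            return polarity
    if "not" in words:                           # 2. negate the sum after "not"
        after = words[words.index("not") + 1:]
        total = -sum(word_polarity(w) for w in after)
    else:                                        # 3. plain dictionary sum
        total = sum(word_polarity(w) for w in words)
    return -1 if total < 0 else 1
```

Note that, following the stated rule literally, a phrase with no known sentiment words sums to 0 and is therefore reported as positive.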
Algorithm 2
Step 1: Load the important libraries
A bar graph was implemented to view the review results, and a pie chart was implemented
to view the gender of the reviewers.
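As an illustration of how the bar graph data could be prepared, the sketch below groups star ratings into the Good, Average and Bad categories defined earlier and draws the counts. The function names are hypothetical, and the use of matplotlib is an assumption, since the report does not name the plotting library.

```python
from collections import Counter


def rating_category(stars):
    """Map a star rating to Good (4-5 stars), Average (3 stars) or Bad (1-2 stars)."""
    if stars >= 4:
        return "Good"
    if stars == 3:
        return "Average"
    return "Bad"


def category_counts(ratings):
    """Count the reviews per category, always in Good/Average/Bad order."""
    counts = Counter(rating_category(r) for r in ratings)
    return {label: counts.get(label, 0) for label in ("Good", "Average", "Bad")}


def plot_counts(counts):
    """Draw the category counts as a bar graph and return the figure."""
    import matplotlib.pyplot as plt  # assumed plotting library, imported lazily

    fig, ax = plt.subplots()
    ax.bar(list(counts.keys()), list(counts.values()))
    ax.set_xlabel("Review category")
    ax.set_ylabel("Number of reviews")
    ax.set_title("Review analysis")
    return fig
```

A pie chart of reviewer gender could be drawn the same way with `ax.pie`, given counts per gender.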
4.1 Experience
In our experience during the internship, Compsoft Technologies follows a good work
culture and has friendly employees, from the staff level to the management level.
The trainers are well versed in their fields, and they treat everyone equally. No distinction
is made between fresh graduates and experienced professionals, and everyone is respected equally.
There is a lot of teamwork in every task, be it hard or easy, and a calm
and friendly atmosphere is always maintained. There is a lot of scope for self-improvement thanks
to the good communication and support available. Interns were treated and
taught well, and all our doubts and concerns regarding the training or the company were
properly answered. All in all, Compsoft Technologies is a great place for a fresher to
start a career and also for an experienced professional to advance one. It has been a great
experience to be an intern in such a reputed organization.
These are the requirements that deal with what the system should do for its users. They
describe the behavior of the system as it relates to the system's functionality
and include the designation of the required functions.
System Features
Under the system features, the super administrator has a lot of freedom, can perform a wide
variety of tasks, and can manage the system effectively and efficiently to extract the full
potential and advantage of the Disease Prediction System.
1. The super administrator can manage the information along with adding the symptoms
of the patient.
2. The user can generate reports based on the symptoms that the patient has provided to
the user. He then directs the patients to take the tests to confirm the presence of the
disease.
3. The hardware requirements are an Intel Core Duo 2.0 GHz processor with 1 GB or
more RAM and an 80 GB or larger hard disk. It also requires a 15" CRT or LCD monitor,
a normal or multimedia keyboard and a compatible mouse.
4. The software requirements consist of the Windows operating system (10 or later) and
the Python 3.8 programming language.
3.2 Non Functional Requirements
● It is a requirement specification that specifies criteria that can be used to judge the
operation of a system rather than specific behavior. These requirements are also
called quality attributes of a system, as they include the majority of the metrics that
define the standard and quality of the system. Some of the parameters coming under
this include Performance, Security etc.
● Performance: The term performance is mainly used to measure the parameters of time
and space. This project uses very little space, and the actions or operations
performed complete within a fraction of a second. There is no issue of memory
size going out of bounds.
● Security: Security or authorization is one of the major parameters of all computerized
applications. As details are confidential no malicious user must be allowed to operate
on
The existing system allows reviews to be collected from consumers. These reviews must be read
and manually evaluated. Since Amazon has millions of customers, evaluating each review
takes a lot of effort. Reviews tell you which products and features are trending, what's in
demand, what's no longer relevant, how your products and those of your competitors are
doing, and more.
It’s no wonder that some sellers have been found willing to pay up to $5 per review with no
questions asked. That said, ensuring that you’re analyzing authentic, uncensored reviews is
key, as 62 percent of consumers say they will not support brands that engage in review
censorship, and consumers are on the lookout for fake reviews.
As a result, a seller can't afford not to have an effective way to monitor authentic customer
opinions, enabling them to stay ahead of the competition, and secure their place on top of
search rankings.
Properly monitoring these customer opinions also allows sellers to identify trends and
patterns over time so they can improve their products.
What makes this process challenging is that there are millions of new product reviews every
month. This high volume makes it impossible for sellers to monitor these customer opinions
manually, without compromising their profit margins or risking missing out on important
trends or patterns over time.
If reviews are evaluated carelessly, the business may suffer and could even risk shutting down.
Amazon is one of the largest online vendors in the world. People often look over the
products and their reviews before buying a product on Amazon itself. But the
reviews on Amazon are not necessarily about the product; they are a mixture of product reviews
and service reviews (related to Amazon or to the product company). The buyer is misled because the
overall sentiment (rating classification) that Amazon gives is a collective one, and there is no
separation between service reviews and product reviews. The proposed model
segregates service and product reviews, and in addition it classifies a review as a
feature review if the user talks about some product feature. A feature review is a kind of
product review, for which our model also gives the sentiment of the text about the product feature. For
example, if the user writes in his review, “the camera for this phone is very good.”, then we
also classify the camera feature as positive. We aim to build a system that visualizes the
sentiment of the reviews in the form of charts.
Model
4.4 Implementation
4.4.1 MODULES
Authorization: The analysis must be done by the retailer to maintain confidentiality.
Therefore, only authorized users must be able to log in to analyze the reviews.
Data Preprocessing: The extracted data needs to be cleaned so that we get proper review
text on which analysis can be performed. The crawled data is cleaned by removing
all special characters (such as “:/.,’#$*^&-) in order to get the best results. After
cleaning, the crawled content is copied into a CSV file.
Two algorithms are used. The first algorithm segregates the product reviews and the service
reviews. This algorithm also stores the service reviews in a CSV file.
The second algorithm is a support vector algorithm. This algorithm classifies the
reviews as Good, Average and Bad.
Figure-4.6.1 Output 1
Figure-4.6.2 Output 2