
CSB4244 – INTERNSHIP

ECS51801

Landmark Recognition Technology

INTERNSHIP

ROLL NO: 22112125, 22112280

NAME: Korukonda Harish Kumar, Konka Phanikumar

CLASS: CSE-3C, CSE-E

B. TECH CSE

BONAFIDE CERTIFICATE
Certified that this internship report "Landmark Recognition Technology Internship" is the bonafide work of 22112125 and 22112280, who carried out the internship during the academic year 2022-2023.

MENTOR: Radhika

CLASS INCHARGE: Mrs. Vasugi Paulraj, ASSISTANT PROFESSOR (SG), DEPARTMENT OF CSE

HEAD OF DEPARTMENT: Dr. Thangakumar J, ASSOCIATE PROFESSOR, DEPARTMENT OF CSE

EXTERNAL EXAMINER
Name: ________________________
Designation: ___________________
Institution Name: _______________

INTERNAL EXAMINER
Name: ________________________
Designation: ___________________

Project Viva-voce conducted on __________________

ABOUT

1 STOP
ACCENTURE

Accenture Services Pvt Ltd (“Accenture”)

OFFICE ADDRESS:
17E 18th CROSS ROAD, SECTOR 3, HSR LAYOUT, BENGALURU,
KARNATAKA 560102
Internship Domain: AI / MACHINE LEARNING

Skills Acquired: This technology has applications in various industries, including tourism, navigation, and augmented reality.

Duration: 3 Weeks (09 JUL, 2023 to 31 JUL, 2023)

Project Title: Landmark Recognition Technology

Project Methodology:

Week 1:

• Project understanding
• Learn the basics
• Choose a framework
• Identify the requirements that need to be delivered for this project
• Identify which tasks to focus on for AI / machine learning
Week 2:
• Exploring the dataset
• Understanding the machine learning algorithms to be used
• Understanding the attributes and the relationship model provided by the client

Week 3:

• Implementation required for AI / machine learning:
1. Pre-processing of the data
2. Training of the model
3. Evaluation of the model
CONCLUSION:
Overall, my internship experience has given me a deep understanding of Landmark Recognition Technology using deep learning, and I have developed valuable skills that I can apply to future projects. I enjoyed working with the Accenture team and appreciated the opportunities to learn and grow throughout the internship. Based on my experience, I recommend that future interns focus on developing their Python and deep learning skills and explore different data analytics techniques to better understand the data and its insights. I also recommend that Accenture continues to invest.
PROOF OF WORK

CERTIFICATE OF INTERNSHIP


PROJECT REPORT

RESEARCH INTERNSHIP SCHEME - MAY 2023
(Online Mode)

TITLE: Landmark Recognition Technology

GUIDE NAME: Sanaiya

STUDENT DETAILS: Name: Korukonda Harish Kumar, Konka Phanikumar

Regno: 22112125, 22112280

Hindustan Institute of Technology and Science, Chennai
CONTENTS
CHAPTER 1 INTRODUCTION
CHAPTER 2 PROJECT OVERVIEW
2.1 LITERATURE SURVEY
2.2 PROBLEM DESCRIPTION
2.3 REQUIREMENTS GATHERING
2.4 REQUIREMENTS ANALYSIS
2.4.1 FUNCTIONAL REQUIREMENTS
2.4.2 NON-FUNCTIONAL REQUIREMENTS
2.5 DATA SOURCE
2.6 COST ESTIMATION
CHAPTER 3 ARCHITECTURE AND DESIGN
3.1 SYSTEM ARCHITECTURE
3.2 INTERFACE PROTOTYPING
3.3 DATA FLOW DESIGN
CHAPTER 4 IMPLEMENTATION
4.1 DATABASE DESIGN
4.2 USER INTERFACE
CHAPTER 5 CONCLUSION
5.1 EXPERIMENT AND WORK
5.2 CONCLUSION AND FUTURE WORK
ABSTRACT
Landmark Recognition is the technology that can predict landmark names directly from image pixels, to help people better understand and organize their photo collections, and to help law enforcement officials estimate the location of images submitted as evidence. Image classification techniques have shown remarkable improvements over the last few years. To further improve computer vision technologies and methodologies, researchers are now concentrating on highly specific types of classification. Instead of classifying cats, cars or buildings, researchers are trying to classify among different types of landmarks, both natural and man-made. At present, a major roadblock in landmark recognition research is the lack of large, well-labelled datasets. To rectify this, Google has released the Google Landmark Recognition Dataset, which contains 1.2 million images of 15,000 categories of landmarks. For this project, a subset of the Google Landmark Recognition dataset has been used. Several recent classification architectures, namely AlexNet, ResNet, SE-ResNet, VGG-16 and Inception v3, have been implemented to classify the images. Among them, the SE-ResNet architecture achieves the lowest loss value of 0.0985 and an accuracy of 98% on the training set.
LIST OF ABBREVIATIONS

Sr No   Abbreviation   Full Form
1.      SE-ResNet      Squeeze-and-Excitation Residual Network
2.      ML             Machine Learning
3.      COCOMO         Constructive Cost Model
4.      SIFT           Scale Invariant Feature Transform
5.      CNN            Convolutional Neural Network
6.      ResNet         Residual Network
7.      URL            Uniform Resource Locator
CHAPTER 1

INTRODUCTION

Image recognition is the technology that identifies places, symbols, people, objects, buildings and many other elements in images. Consumers are sharing massive quantities of data through apps, social networks and websites, and camera-equipped mobile phones are producing an ever-growing volume of images and videos. Image recognition is a part of computer vision: the process of identifying and classifying an object or feature in an image or video. Computer vision is a wider field that includes methods for acquiring, processing and analysing data from the real world. Historically, image recognition has always been linked to Machine Learning, as most image recognition tasks were made simpler by the use of machine learning techniques.

Machine Learning is the scientific study of algorithms and statistical models that computers use to progressively improve their performance on a wide variety of tasks. These systems build a model from training data in order to make predictions without being explicitly programmed to perform the task. Machine learning algorithms are used in applications such as email filtering, network intrusion detection and computer vision. Some of the algorithms used for classification are Logistic Regression, Support Vector Machines and Decision Trees. The disadvantage of using these methods for image classification is that the features of every image must be specified manually, instead of the algorithm finding them automatically. This makes Deep Learning important for image classification.
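To make the "manual features" point concrete, here is a minimal sketch, assuming NumPy and synthetic 8x8 images. A hand-chosen colour-histogram feature feeds a simple nearest-centroid classifier, which is swapped in for Logistic Regression, SVMs or Decision Trees purely for brevity. The function names, labels ("sunset", "sea") and data are illustrative and not part of the project. Whichever classifier is used, a person still has to decide that colour histograms are the right features.

```python
import numpy as np

def colour_histogram(image, bins=4):
    """Hand-crafted feature: normalised per-channel intensity histogram.

    `image` is an H x W x 3 uint8 array; returns a vector of length 3 * bins.
    """
    feats = []
    for c in range(3):
        hist, _ = np.histogram(image[:, :, c], bins=bins, range=(0, 256))
        feats.append(hist / hist.sum())
    return np.concatenate(feats)

class NearestCentroid:
    """Toy classifier: predict the label whose mean feature vector is closest."""

    def fit(self, X, y):
        y = np.asarray(y)
        self.labels_ = sorted(set(y.tolist()))
        self.centroids_ = np.stack([X[y == lab].mean(axis=0) for lab in self.labels_])
        return self

    def predict(self, X):
        # distance from every sample to every centroid -> (n_samples, n_labels)
        dists = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return [self.labels_[i] for i in dists.argmin(axis=1)]

# Hypothetical data: "sunset" images are mostly red, "sea" images mostly blue.
rng = np.random.default_rng(0)

def fake_image(dominant_channel):
    img = rng.integers(0, 64, size=(8, 8, 3), dtype=np.uint8)
    img[:, :, dominant_channel] += 192  # push one channel into the top histogram bin
    return img

X_train = np.stack([colour_histogram(fake_image(0)) for _ in range(5)]
                   + [colour_histogram(fake_image(2)) for _ in range(5)])
y_train = ["sunset"] * 5 + ["sea"] * 5
clf = NearestCentroid().fit(X_train, y_train)

X_test = np.stack([colour_histogram(fake_image(0)), colour_histogram(fake_image(2))])
preds = clf.predict(X_test)
print(preds)  # ['sunset', 'sea']
```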

Deep Learning is a subfield of Machine Learning, and neural networks are at its core. Deep learning is part of a broader family of ML methods based on learning data representations, as opposed to task-specific algorithms. Learning can be supervised, semi-supervised or unsupervised.

Deep learning architectures such as deep neural networks and recurrent neural networks have been applied to fields including computer vision, where they have produced results comparable to, and in some cases better than, human experts.

Deep learning models are loosely inspired by information processing in biological nervous systems, but they differ in many ways from the structural and functional properties of biological brains (especially human brains).

Image classification in Deep Learning is performed with Convolutional Neural Networks (CNNs). A convolutional filter (kernel) is a small matrix of numerical values. The convolution operation slides this matrix across the image, multiplying it element-wise with the pixel values underneath and summing the result, thus detecting edges, shapes and other features of the image. Because the filter values are learned during training, this happens automatically, without manual intervention to specify the features.
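As a concrete illustration, the following minimal NumPy sketch (function names are illustrative, not from the project) applies one 3x3 filter to a tiny synthetic image. The filter here is a Sobel kernel chosen by hand; in a CNN the nine values would be learned during training.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution as computed by CNN layers.

    Like deep-learning libraries, this slides the kernel without flipping it
    (strictly, cross-correlation): each output pixel is the sum of the
    element-wise product of the kernel and the patch under it.
    """
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A hand-picked vertical-edge filter (Sobel-x). In a CNN these nine numbers
# would be learned from data rather than chosen by hand.
sobel_x = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])

# 5x6 grey image: dark left half (0), bright right half (1)
image = np.zeros((5, 6))
image[:, 3:] = 1.0

edges = conv2d(image, sobel_x)
print(edges)  # every row is [0, 4, 4, 0]: strong response only around the edge
```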

Using deep learning CNN architectures such as AlexNet and DenseNet, researchers have achieved state-of-the-art performance on image recognition tasks. This is mainly due to the nature of the convolutional layer, which detects features on its own. By combining many convolutional filters, various features of the image can be detected, some not even visible to the naked eye. Thus, Deep Learning models perform better on image recognition tasks than plain Machine Learning models.
Some of the reasons for using Deep Learning for image recognition tasks are as follows. Earlier image recognition techniques relied on feature extraction methods such as SIFT and SURF. These techniques required programmers to manually extract features from images and store them in a database for comparison with test images. Deep Learning methods allow automatic feature extraction using Convolutional Neural Networks. Since the dataset has a large number of images, manually extracting features for all of them would require a great deal of effort. Thus, neural networks and Deep Learning algorithms are the better option.
CHAPTER 2 PROJECT OVERVIEW

For the project, the task is to classify images based on the landmarks contained in them. For example, if the sample image is an image of the Taj Mahal, the output of the system should be Taj Mahal. For this task, Convolutional Neural Networks and their various architectures have been chosen. Previously, landmark recognition was done using the GPS information stored in the image itself. Without an active internet connection, this information will not be collected by the camera.

The proposed system will be trained on the Google Landmark Dataset, which contains 1.2 million images of around 15,000 types of landmarks. This system does not need an internet connection to determine the location or the landmark present in the image: running the trained model on a test image predicts the landmark it contains. This system will also return results within 1-2 ms. To decide on the architecture for the model, various research papers were read, and experiments were run on a subset of the dataset with the architectures described in these papers.

The existing methodology relies on GPS information and the metadata present in the image for classifying images based on location and landmark. The disadvantage of this method is that if there is no internet connection when the picture is taken, this metadata does not exist. In such a case, classifying the image would require searching the Internet, which can be a cumbersome process. Also, some lesser-known landmarks have still not been mapped, which results in a wrong classification.

A method for classifying images using Deep Learning has been proposed, which does not require an Internet connection. The system will be faster than existing systems and can also be trained to recognize landmarks that it does not currently recognize. The system will be tested on the dataset using various existing algorithms, and the architecture which gives the best results will be chosen.

2.1 LITERATURE SURVEY

Alex Krizhevsky et al. [1] proposed an architecture made up of 5 convolutional layers followed by 3 fully connected layers. AlexNet uses ReLU as the non-linear activation function. An additional problem that this architecture addressed was overfitting, which it reduced by means of Dropout layers following the fully connected layers. This architecture achieved a top-5 test error rate of 15.3%, compared with 26.2% achieved by the second-best entry, on the ImageNet 2012 dataset.
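The two ideas named above, ReLU and Dropout, fit in a few lines of NumPy. This is a hedged sketch with illustrative names: it uses the "inverted" dropout variant common in modern frameworks such as Keras, which rescales surviving activations during training, whereas the AlexNet paper instead scaled outputs at test time. Both keep the expected activation unchanged.

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: max(0, x), applied element-wise."""
    return np.maximum(x, 0.0)

def dropout(x, rate, rng, training=True):
    """Inverted dropout: during training zero a fraction `rate` of activations
    and rescale the survivors by 1 / (1 - rate); at inference return x unchanged."""
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate   # True with probability 1 - rate
    return x * keep / (1.0 - rate)

rng = np.random.default_rng(42)
activations = relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0]))  # negatives clipped to 0
dropped = dropout(np.ones(10_000), rate=0.5, rng=rng)
print(activations)
print(round(dropped.mean(), 2))  # close to 1.0: the expected value is preserved
```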

Karen Simonyan and Andrew Zisserman [2] proposed an improvement over AlexNet by replacing large kernel-sized filters with multiple 3x3 filters applied one after another. For a given receptive field, several stacked small kernels are better than one large kernel, because the additional non-linear layers increase the depth of the network, which enables it to learn more complex features at a lower cost. The convolutional layers are followed by 3 fully connected layers. The width of the network starts at 64 channels and doubles after each pooling layer. It achieves a top-5 accuracy of 92.3% on ImageNet.
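The "lower cost" claim can be checked with a little arithmetic. This sketch assumes stride-1 convolutions with C input and output channels and ignores biases; C = 64 is only an example width. Stacking n 3x3 layers covers the same (2n+1)x(2n+1) region as a single large kernel while using fewer weights (18C² vs 25C² for two layers, 27C² vs 49C² for three), and it adds a non-linearity between layers.

```python
def conv_weights(kernel, channels):
    """Weights of one k x k conv layer mapping C channels to C channels (bias ignored)."""
    return kernel * kernel * channels * channels

def receptive_field(kernel, num_layers):
    """Receptive field of `num_layers` stacked stride-1 k x k convolutions."""
    return 1 + num_layers * (kernel - 1)

C = 64  # example width, the first block of VGG
for n_small, big in [(2, 5), (3, 7)]:
    # n stacked 3x3 layers see exactly the same region as one big x big layer
    assert receptive_field(3, n_small) == receptive_field(big, 1) == big
    print(f"{n_small} x 3x3: {n_small * conv_weights(3, C)} weights "
          f"vs 1 x {big}x{big}: {conv_weights(big, C)} weights")
# 2 x 3x3: 73728 weights vs 1 x 5x5: 102400 weights
# 3 x 3x3: 110592 weights vs 1 x 7x7: 200704 weights
```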

Sergey et al., 2015 [3] proposed the Inception design, which achieves high accuracy on the ImageNet dataset. GoogLeNet introduced a component called the inception module, which approximates a sparse CNN with a normal dense construction. Since only a small number of neurons are effective, the number of convolutional filters of a particular kernel size is kept small. In addition, the module uses convolutions of different sizes to capture details at varied scales. It also has a bottleneck layer (1x1 convolutions), which helps reduce the computation required. It achieves 93.3% top-5 accuracy on ImageNet and is much faster than VGG.
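The saving from the 1x1 bottleneck is easiest to see by counting multiply-accumulate operations. The sketch assumes a 28x28x192 input, a 5x5 branch with 32 output channels and a 16-channel 1x1 reduction (the sizes of the 5x5 branch in GoogLeNet's first inception module, used here only as an illustration), with stride 1 and "same" padding.

```python
def conv_macs(height, width, c_in, c_out, kernel):
    """Multiply-accumulates of a stride-1, 'same'-padded k x k convolution."""
    return height * width * c_out * kernel * kernel * c_in

H = W = 28
direct = conv_macs(H, W, 192, 32, 5)          # 5x5 applied straight to 192 channels
bottleneck = (conv_macs(H, W, 192, 16, 1)     # 1x1 reduces 192 -> 16 channels
              + conv_macs(H, W, 16, 32, 5))   # 5x5 on the reduced tensor
print(direct, bottleneck, round(direct / bottleneck, 1))  # 120422400 12443648 9.7
```

Roughly a tenfold reduction in computation for the same output shape, which is why the module can afford several kernel sizes in parallel.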

Kaiming He et al. [4] proposed a residual block which can be inserted between the layers of a neural network. Increasing the depth of a neural network should improve the accuracy of the system, provided overfitting is dealt with. However, the problem with increased depth is that the gradient signal needed to update the weights becomes very small by the time it reaches the initial layers. Residual networks make such deep networks trainable by building them out of modules called residual blocks. ResNet achieves better accuracy than VGG and GoogLeNet while using fewer resources than VGG. ResNet-152 achieves 95.51% top-5 accuracy.
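A residual block can be sketched in NumPy as follows. This is a simplification under stated assumptions: two dense weight matrices stand in for the 3x3 convolutions of a real ResNet block, and the names are illustrative. As in the original ResNet block, the ReLU is applied after the shortcut is added. With all-zero weights the block reduces to (ReLU of) the identity, which shows why adding depth this way need not make the network worse.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = relu(F(x) + x) with residual mapping F(x) = w2 @ relu(w1 @ x).

    Dense weight matrices stand in for the convolutions of a real ResNet block.
    The identity shortcut needs F(x) to have the same shape as x.
    """
    f = w2 @ relu(w1 @ x)   # F(x): what the stacked layers learn
    return relu(f + x)      # add the shortcut, then apply the non-linearity

x = np.array([1.0, -2.0, 3.0])
# With all-zero weights F(x) = 0, so the block passes x straight through
# (apart from the final ReLU): an identity mapping is trivially representable.
y_identity = residual_block(x, np.zeros((4, 3)), np.zeros((3, 4)))

rng = np.random.default_rng(0)
y = residual_block(x, rng.standard_normal((4, 3)), rng.standard_normal((3, 4)))
print(y_identity, y.shape)
```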

Jie Hu et al. [5] proposed the use of a new squeeze-and-excitation (SE) block to help the network learn the task at hand more effectively. CNNs use filters to extract information from images. Lower layers detect low-level details such as edges, while upper layers detect faces and text. Convolutions work by combining the spatial and channel information of an image. The SE block adds a parameter to each channel, giving it a single scalar that judges how relevant that channel is. These values can then be used as weights on the feature maps, ranking the channels by their importance.

SENet achieves a 2.251% top-5 error rate on the ImageNet 2012 dataset.
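The squeeze and excitation steps can be sketched in NumPy. This mirrors the layer sequence of `squeeze_excite_block` in Annexure I (global average pooling, a bias-free dense layer of size C/r with ReLU, a bias-free dense layer of size C with sigmoid, then channel-wise multiplication); the random weights are placeholders for trained ones, and the function name is illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def squeeze_excite(features, w1, w2):
    """Squeeze-and-excitation on an H x W x C feature map.

    w1: C x (C // r) and w2: (C // r) x C are the two bias-free dense layers.
    Returns the rescaled features and the per-channel weights.
    """
    squeezed = features.mean(axis=(0, 1))            # squeeze: global average pool -> (C,)
    hidden = np.maximum(squeezed @ w1, 0.0)          # excitation, layer 1 + ReLU -> (C // r,)
    channel_weights = sigmoid(hidden @ w2)           # excitation, layer 2 + sigmoid -> (C,)
    return features * channel_weights, channel_weights  # broadcast over H and W

rng = np.random.default_rng(0)
C, r = 8, 4
feature_map = rng.standard_normal((4, 4, C))
scaled, weights = squeeze_excite(feature_map,
                                 rng.standard_normal((C, C // r)),
                                 rng.standard_normal((C // r, C)))
print(np.round(weights, 2))  # one importance score in (0, 1) per channel
```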

Barret Zoph et al. [6] built NASNet using the Neural Architecture Search (NAS) framework. The goal of NAS is to use a data-driven, intelligent method to construct the network architecture instead of relying on intuition and trial and error. The Inception paper showed that a complex arrangement of filters in a cell can considerably improve results. The NAS framework treats the design of such a cell as an optimization problem, and then stacks many copies of the best cell to construct a large network. Finally, two different cells are built and used to train the full model.

Gao Huang et al. [7] proposed DenseNet, made up of dense blocks in which the layers are densely connected: the outputs of all previous layers are passed as inputs to the following layer. As a result, no information from previous layers is lost. Each layer in a dense block consists of 3 parts: a Batch Normalization operation, followed by a ReLU non-linearity and a convolutional filter of size 3x3. DenseNet does not require very wide layers, since there is no loss of information from the input image to the output layer. However, this results in a large number of complex computations, which increases the number of weights required for the model and directly increases the model size. A DenseNet model is several times the size of a comparable ResNet model, although both give similar accuracies. Therefore, DenseNet is not a suitable choice for lean, efficient applications.
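The channel bookkeeping of a dense block can be sketched in NumPy. Assumptions: a random per-pixel linear map followed by ReLU stands in for the BN, ReLU, 3x3 convolution composite, and the sizes (16 input channels, 4 layers, growth rate 12) are illustrative. Each layer's input is the concatenation of everything before it, so the block's output has C0 + L·k channels and still contains the original input unchanged. The same growth is why the number of weights per layer keeps increasing.

```python
import numpy as np

def dense_block(x, num_layers, growth_rate, rng):
    """Dense block on an H x W x C0 array.

    Every layer receives the concatenation of the block input and all earlier
    layer outputs, and contributes `growth_rate` new channels.
    """
    features = x
    for _ in range(num_layers):
        # Stand-in for BN -> ReLU -> 3x3 conv: a random per-pixel linear map + ReLU.
        w = rng.standard_normal((features.shape[-1], growth_rate))
        new_maps = np.maximum(features @ w, 0.0)
        features = np.concatenate([features, new_maps], axis=-1)
    return features

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16))
out = dense_block(x, num_layers=4, growth_rate=12, rng=rng)
print(out.shape)  # (8, 8, 64): 16 input channels + 4 layers * 12 new channels
```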

2.2 PROBLEM DESCRIPTION

Landmark Recognition is a booming field for researchers. Previously, this used to be done with the help of GPS information and metadata stored in the image when it is taken. The drawback of this method is that this information is not stored in the image if no internet connection is present at the time the picture is taken. In such a case, classifying the image would require scouring the internet to find this information.

This project proposes a different method for classifying landmarks, one which relies on CNNs and their various architectures. Five candidate algorithms have been chosen: AlexNet, VGG-16, ResNet, SE-ResNet and Inception v3. The algorithm which provides the best results will be chosen as the base for the system.

The problem is to correctly predict landmark labels directly from images in order to:

● Help people recognize known and unknown landmarks and automatically organize their photo collections
● Help law enforcement authorities recognize locations based on certain landmarks, which can be used to catch criminals
2.3 REQUIREMENTS GATHERING

● Questionnaire:
○ The preliminary requirements will be found by asking the stakeholders to fill in a questionnaire. The questionnaire will provide vital information on what the stakeholders are seeking and the areas of focus that need to be tackled.

● Interview:
○ The requirements will be further developed by interviewing the concerned stakeholders. This helps to clarify any misunderstood requirements as well as uncover any implied requirements.

● Brainstorming:
○ A brainstorming session will be held with the developers to discuss the requirements. This step is necessary to find various approaches to solving the stakeholders’ problems. Both divergent and convergent thinking will help to gather solutions efficiently.
2.4 REQUIREMENTS ANALYSIS
2.4.1 FUNCTIONAL REQUIREMENTS
● The system should accept any image and predict the landmark in it.
● The system should not take more than 0.5 seconds to predict the landmark.
● The system should have good accuracy (~90%).

2.4.2 NON-FUNCTIONAL REQUIREMENTS

● Correctness: The system should predict the category correctly 98% of the time.
● Reliability: The system should not crash when many people are using it at once (downtime of less than 5 seconds).
● Security: A user should have access only to their own images.

2.5 DATA SOURCE


The data source used is the Google-Landmarks dataset provided by Google. The dataset contains a CSV file with the filenames of the images along with a unique ID for each image. Each image can be downloaded by following the URL provided in the CSV file. For the project, a Python script has been written to automatically download the dataset and skip any missing or duplicate images.
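A minimal sketch of such a download script, using only the Python standard library, is shown below. Assumptions not taken from the report: the CSV has `id`, `url` and `landmark_id` columns (assumed to match the Kaggle Landmark Recognition Challenge `train.csv`), files are saved as `<out_dir>/<landmark_id>/<id>.jpg` (the one-folder-per-landmark layout used later in the pipeline), and "duplicate" means either a repeated ID or byte-identical image content. The function names are hypothetical; the `fetcher` argument lets the download step be replaced, for example in tests.

```python
import csv
import hashlib
import os
import urllib.request

def fetch(url, timeout=10):
    """Return the bytes at `url`, or None if the download fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except Exception:  # dead links, HTTP errors, timeouts, malformed URLs
        return None

def download_dataset(csv_file, out_dir, fetcher=fetch):
    """Save every image listed in `csv_file` (an open text file) under out_dir.

    Rows whose id was already seen, whose download fails, or whose bytes
    match an image already saved are skipped. Returns (saved, skipped).
    """
    seen_ids, seen_hashes = set(), set()
    saved = skipped = 0
    for row in csv.DictReader(csv_file):
        if row["id"] in seen_ids:          # duplicate row
            skipped += 1
            continue
        seen_ids.add(row["id"])
        data = fetcher(row["url"])
        if not data:                       # missing image
            skipped += 1
            continue
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen_hashes:          # same picture under a different id
            skipped += 1
            continue
        seen_hashes.add(digest)
        class_dir = os.path.join(out_dir, row["landmark_id"])
        os.makedirs(class_dir, exist_ok=True)
        with open(os.path.join(class_dir, row["id"] + ".jpg"), "wb") as f:
            f.write(data)
        saved += 1
    return saved, skipped

# Usage (hypothetical paths):
# with open("train.csv", newline="") as f:
#     print(download_dataset(f, "data/images"))
```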

The dataset contains 1.2 million images of about 15,000 categories. The following image depicts the geographical distribution of the landmarks:
URL for dataset: https://www.kaggle.com/c/landmarkrecognitionchallenge/data

2.5 Geographical Distribution of Landmarks

2.6 COST ESTIMATION


COCOMO model has been used to estimate the cost of the project.

2.6 COCOMO Model

The software project is said to be of the organic type because:

● the required team size is sufficiently small
● the problem is well understood and has been solved before
● the team members have relatively little experience with the problem

We will use the COCOMO model to estimate the cost of the project.

Here,

KLOC = thousand lines of code; a, b, c = coefficients

2.6 COCOMO Model Coefficients


Effort Estimation:
Our project consists of ~2000 LOC = 2 KLOC. Now, plugging the values into the effort formula, where E = effort in staff-months, a = 3.6 and b = 1.2:

$$E = a \cdot (\text{KLOC})^{b} = 3.6 \times 2^{1.2} = 8.27 \approx 8 \text{ man-months}$$

Productivity:

$$P = \frac{\text{Size}}{E} = \frac{\text{KLOC}}{\text{man-months}} = \frac{2}{8.27} = 0.242$$

Staffing:

Here E = 8.27 man-months and D = 4.91 months.

Therefore, staffing = E / D = 8.27 / 4.91 = 1.68 ≈ 2 people
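The estimates above can be reproduced with a few lines of Python. This is a sketch: a = 3.6 and b = 1.2 come from the report, while the development-time formula D = c·E^d with c = 2.5 and d = 0.32 is an assumption. Those are the values in the same (embedded-mode) row of Boehm's basic COCOMO table as a = 3.6 and b = 1.2, and they reproduce the quoted duration. For comparison, that table lists a = 2.4, b = 1.05, c = 2.5, d = 0.38 for organic-mode projects. The function name is illustrative.

```python
def basic_cocomo(kloc, a, b, c, d):
    """Basic COCOMO: effort E = a * KLOC**b (person-months), duration D = c * E**d (months)."""
    effort = a * kloc ** b
    duration = c * effort ** d
    staffing = effort / duration        # average number of people
    productivity = kloc / effort        # KLOC delivered per person-month
    return effort, duration, staffing, productivity

# a, b from the report; c, d assumed (basic COCOMO's embedded-mode row)
E, D, staff, prod = basic_cocomo(2, a=3.6, b=1.2, c=2.5, d=0.32)
print(f"E = {E:.2f} person-months, D = {D:.2f} months, "
      f"staffing = {staff:.2f}, productivity = {prod:.3f} KLOC/person-month")
```

D comes out at about 4.92 months, matching the 4.91 above to within rounding; the effort, productivity and staffing figures match directly.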

CHAPTER 3 ARCHITECTURE AND DESIGN
3.1 SYSTEM ARCHITECTURE
The flow of the system starts from the CSV file. Using a download script, all the images are downloaded from the URLs provided in the file. Next, the images are sorted into folders according to their landmark ID. Using another Python script, the images are resized to 128*128 pixels. Next, the images are split into a training and a test dataset. The training images are fed into the model, and the accuracy and loss values are monitored. The trained model is saved into a .h5 file for future testing.
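The 80:20 split step can be sketched with the standard library. Assumptions: the downloaded images sit in one folder per landmark ID, as described above; each landmark is split separately (a stratified split) with a fixed random seed so the split is reproducible; and the output layout `<dst>/{train,test}/<landmark_id>/` is hypothetical, chosen because Keras' `flow_from_directory` reads one subfolder per class. The function names are illustrative.

```python
import os
import random
import shutil

def split_files(filenames, train_ratio=0.8, seed=42):
    """Shuffle deterministically and cut into (train, test) lists."""
    files = sorted(filenames)                 # sort first so the seed fully fixes the split
    random.Random(seed).shuffle(files)
    cut = round(len(files) * train_ratio)
    return files[:cut], files[cut:]

def split_dataset(src_dir, dst_dir, train_ratio=0.8, seed=42):
    """Copy src_dir/<landmark_id>/* into dst_dir/train/<landmark_id>/ and dst_dir/test/<landmark_id>/."""
    for landmark in sorted(os.listdir(src_dir)):
        class_dir = os.path.join(src_dir, landmark)
        if not os.path.isdir(class_dir):
            continue
        train, test = split_files(os.listdir(class_dir), train_ratio, seed)
        for subset, names in (("train", train), ("test", test)):
            out_dir = os.path.join(dst_dir, subset, landmark)
            os.makedirs(out_dir, exist_ok=True)
            for name in names:
                shutil.copy2(os.path.join(class_dir, name), os.path.join(out_dir, name))

# Usage (hypothetical paths): split_dataset("data/images", "data/split")
```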

In the GUI, the user selects the image to be classified and clicks on the predict button. Using the saved model, the system predicts the landmark in the selected image.

3.1 Architecture Diagram for Landmark Recognition
System Modules:
● Image Downloader

It is used to download the dataset used for training and testing the
model. The links for the various images are provided in a CSV file.
Using a Python script, we automatically download the images.

● Data preprocessor
This module is used to preprocess the images for optimum training
and testing. It involves extracting, resizing, and compressing the
images. We use OpenCV to resize the images to a size of 128*128
pixels. Once resized, we use the Keras flow_from_directory API to load
images in batches of 32 to the model.

● Data splitter

This is used to split the dataset into training and test sets. The ratio
used is 80:20.

● Machine Learning model - Residual Block

Instead of hoping that every few stacked layers directly fit a desired underlying mapping, we explicitly let these layers fit a residual mapping. Formally, denoting the desired underlying mapping as H(x), we let the stacked nonlinear layers fit another mapping F(x) := H(x) − x. The original mapping is recast into F(x) + x.

● Machine Learning model - Shortcut connection:

Shortcut connections are connections which skip one or more layers. In the figure, an identity shortcut is added to the learned residual map F(x) to obtain the desired mapping H(x).

● Machine Learning model - Squeeze and excitation block

The features of the neural network are passed through a squeeze operation, which results in a vector of size n, where n is the number of channels. This vector is then passed to the excitation operation, which outputs the weights of the n channels. These weights are applied to the original features to scale the channels according to their importance.

● Prediction Result

After the model has been trained on the training set, it will be saved in a Hierarchical Data Format (HDF, .h5) file. For predicting the landmark in a given image, the saved model will be loaded and the test image will be fed to the model, which will output the landmark and the confidence of the prediction.

3.2 INTERFACE PROTOTYPING

3.2 User Interface for Landmark Recognition System

1. The main component of the user interface will be the image uploader, where the user will be able to upload any image of their choice.
2. After this, the system will run the prediction model on the given image to predict which landmark is present in it.
3.3 DATA FLOW DESIGN

LEVEL 0

3.3 Level 0 Data Flow Diagram

The Level 0 diagram depicts the overview of the system. The image dataset
goes to the machine learning model which ultimately gives the result.

LEVEL 1

CHAPTER 4 IMPLEMENTATION
4.1 DATABASE DESIGN
4.1.1 ER DIAGRAM

4.1.1 ER Diagram for Landmark Recognition System

The Images table has the following attributes:
● Image ID (primary key)
● Image URL
● Landmark Category
● Train/Test

The Result table has the following attributes:
● Result ID (primary key)
● Image Name
● Predicted Category

These tables have a 1:1 relationship.

4.2 USER INTERFACE

4.2 User Screen Before Prediction

4.2 User Screen After Prediction

CHAPTER 5 CONCLUSION
5.1 EXPERIMENT AND WORK

The flow of the system starts from the CSV file. Using a download script, all the images are downloaded from the URLs provided in the file. Next, the images are sorted into folders according to their landmark ID. Using another Python script, the images are resized to 128*128 pixels. Next, the images are split into a training and a test dataset. The training images are fed into the model, and the accuracy and loss values are monitored. The trained model is saved into a .h5 file for future testing. In the GUI, the user selects the image to be classified and clicks on the predict button. Using the saved model, the system predicts the landmark in the selected image.
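The final "predict" step can be sketched as follows, assuming the model outputs a softmax probability vector with one entry per landmark class. In the real application this vector would come from `model.predict` on a model loaded with `keras.models.load_model` from the saved .h5 file, and the index-to-landmark mapping would be built by inverting the training generator's `class_indices` dictionary. Here both are replaced by hard-coded, hypothetical values so the snippet runs without Keras; the function name is illustrative.

```python
import numpy as np

def decode_prediction(probs, class_names, top_k=3):
    """Turn one softmax output vector into [(landmark, probability), ...], best first."""
    probs = np.asarray(probs, dtype=float)
    best = np.argsort(probs)[::-1][:top_k]
    return [(class_names[i], float(probs[i])) for i in best]

# Hypothetical stand-ins: in the application, class_names would come from the
# training generator's class_indices and probs would be model.predict(batch)[0].
class_names = ["Taj Mahal", "Eiffel Tower", "Colosseum", "Big Ben"]
probs = [0.05, 0.10, 0.80, 0.05]
print(decode_prediction(probs, class_names, top_k=2))
# [('Colosseum', 0.8), ('Eiffel Tower', 0.1)]
```

Returning the probability alongside the name is what lets the GUI show how confident the model is, rather than only its single best guess.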

5.2 CONCLUSION AND FUTURE WORK:

The system is made up of these four files: resize.py, download.py, split.py and app.py. download.py is used to download all the images in the dataset, after which resize.py resizes the images to 128*128 pixels. The dataset is split into a training and a test set by split.py, and app.py is used to build and run the model.
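The resizing performed by resize.py and the batching of 32 images described in the Data preprocessor module can be sketched in plain NumPy. This is an assumption-laden stand-in, not the project's code: nearest-neighbour resampling replaces `cv2.resize` (whose default interpolation is bilinear), pixels are scaled to [0, 1], and the function names are hypothetical. In Keras, `flow_from_directory(..., target_size=(128, 128), batch_size=32)` performs the resizing and batching in one call.

```python
import numpy as np

def resize_nearest(image, size=(128, 128)):
    """Nearest-neighbour resize of an H x W x C array to size = (height, width)."""
    h, w = image.shape[:2]
    out_h, out_w = size
    rows = np.arange(out_h) * h // out_h     # source row for each output row
    cols = np.arange(out_w) * w // out_w     # source column for each output column
    return image[rows[:, None], cols[None, :]]

def batches(images, labels, batch_size=32, size=(128, 128), seed=0):
    """Yield shuffled (x, y) batches; x is float32 in [0, 1] with shape (n, height, width, C)."""
    order = np.random.default_rng(seed).permutation(len(images))
    for start in range(0, len(order), batch_size):
        chosen = order[start:start + batch_size]
        x = np.stack([resize_nearest(images[i], size) for i in chosen]).astype(np.float32) / 255.0
        y = np.asarray([labels[i] for i in chosen])
        yield x, y

# Usage (hypothetical): for x, y in batches(list_of_images, list_of_landmark_ids): ...
```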

The user uploads an image to the system. The image downloader downloads the image dataset. The data processor resizes the images and extracts the features. Then the dataset is split into training and test sets in an 80:20 ratio. The machine learning module is trained. The user’s image is then classified and the result is displayed.

ANNEXURE - I
from keras import layers
from keras import models
import os
import cv2
import numpy as np
from keras.utils import to_categorical
from keras import optimizers
from keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt

img_height = 128
img_width = 128
img_channels = 3
cardinality = 32


def squeeze_excite_block(input, ratio=16):
    """Squeeze-and-excitation: reweight each channel of `input` by a learned score."""
    init = input
    channel_axis = -1
    filters = init.shape[channel_axis]  # number of channels (replaces init._keras_shape)
    se_shape = (1, 1, filters)

    se = layers.GlobalAveragePooling2D()(init)  # squeeze: one value per channel
    se = layers.Reshape(se_shape)(se)
    se = layers.Dense(filters // ratio, activation='relu',
                      kernel_initializer='he_normal', use_bias=False)(se)
    se = layers.Dense(filters, activation='sigmoid',
                      kernel_initializer='he_normal', use_bias=False)(se)  # excitation

    x = layers.multiply([init, se])  # scale each channel by its weight
    return x


def residual_network(x):
    """
    ResNeXt by default. For ResNet set `cardinality` = 1 above.
    """

    def add_common_layers(y):
        y = layers.BatchNormalization()(y)
        y = layers.LeakyReLU()(y)
        return y

    def grouped_convolution(y, nb_channels, _strides):
        # when `cardinality` == 1 this is just a standard convolution
        if cardinality == 1:
            return layers.Conv2D(nb_channels, kernel_size=(3, 3),
                                 strides=_strides, padding='same')(y)

        assert not nb_channels % cardinality
        _d = nb_channels // cardinality

        # in a grouped convolution layer, input and output channels are divided
        # into `cardinality` groups, and convolutions are separately performed
        # within each group
        groups = []
        for j in range(cardinality):
            # bind j now (j=j): a plain closure would see the final loop value
            # whenever the model is re-run after construction
            group = layers.Lambda(lambda z, j=j: z[:, :, :, j * _d:j * _d + _d])(y)
            groups.append(layers.Conv2D(_d, kernel_size=(3, 3), strides=_strides,
                                        padding='same')(group))

        # the layer's output is the concatenation of the per-group outputs
        return layers.concatenate(groups)

    # (the remainder of residual_network is not included in this excerpt)
