
Pan Card Tampering Detection

A Report Submitted
In Partial Fulfilment of the Requirement
For the degree of Bachelor of Technology

Jayendra Pratap Singh 202110101110703


Soumya Rai 2021101011107

Under the Guidance of


Er. Abhilasha Mandal

FACULTY OF DEPARTMENT OF COMPUTER SCIENCE


AND ENGINEERING
INSTITUTE OF TECHNOLOGY
SHRI RAMSWAROOP MEMORIAL UNIVERSITY

April, 2024
CERTIFICATE

It is certified that the work contained in the project report titled “Pan Card Tampering
Detection”, by “Jayendra Pratap Singh and Soumya Rai”, has been carried out under my
supervision and that this work has not been submitted elsewhere for any other degree.

Signature
Er. Abhilasha Mandal
(Assistant Professor)
Department of Computer Science and Engineering
Shri Ramswaroop Memorial University

Signature
Dr. Satya Bhushan Verma
(Head of Department)
Department of Computer Science and Engineering
Shri Ramswaroop Memorial University

April, 2024

I
ABSTRACT

Our project is PAN Card Tampering Detection. It detects tampering of a PAN card using
computer vision, the field that trains computers to interpret and understand the visual world.
Using digital images from cameras and videos together with deep learning models, machines
can accurately identify and classify objects and then react to what they "see". This project
will help different organizations detect whether the ID, i.e. the PAN card, provided to them
by their employees, customers, or anyone else is original or not.

For this project we calculate the structural similarity between the original PAN card and the
PAN card uploaded by the user. Image processing mainly focuses on processing raw input
images to enhance them or to prepare them for other tasks, while computer vision focuses on
extracting information from input images or videos so that the visual input can be understood
and interpreted much like the human brain does. This comparison is the soul of the project
and is discussed in detail later in the report.

Accordingly, in this project, with the help of image processing and the techniques of
computer vision, we detect whether a given image of a PAN card is an original or a tampered
(fake) PAN card.

II
ACKNOWLEDGEMENT

The present work would remain incomplete unless we expressed our gratitude towards the
many people who delightfully cooperated with us in the process of this work. First of all, we
would like to thank our team leader for her encouragement and support during the course of
our study. We extend our hearty and sincere gratitude to our project guide, Er. Abhilasha
Mandal, Assistant Professor in the Department of Computer Science, for her valuable
direction, suggestions and exquisite guidance with ever-enthusiastic encouragement ever
since the commencement of the project.

This project would not have taken shape without the guidance provided by our project
coordinator, Er. Abhilasha Mandal, who helped in our project, resolved all the technical as
well as other project-related problems, and always provided us with a helping hand whenever
we faced any bottlenecks, in spite of being quite busy with her hectic schedule.

III
TABLE OF CONTENT
ABSTRACT II

ACKNOWLEDGEMENT III

TABLE OF CONTENT IV

TABLE OF FIGURES VII

CHAPTER 1 1

1. INTRODUCTION 1

System Study 1

1.1 Current Challenges and Limitations 1

1.2 Proposed System Requirements 2

CHAPTER 2: 3

OVERVIEW OF PROJECT 3

CHAPTER 3: 4

HARDWARE AND SOFTWARE REQUIREMENT 4

Hardware Requirements: 4

Software Requirements: 4

3.1. Programming Languages: 4

3.2. Libraries and Frameworks: 4

3.3. Development Tools: 4

3.4. Image Processing Tools: 4

3.5. Web Application Deployment: 5

CHAPTER 4: 6

FEASIBILITY STUDY 6

4.1 Technical Feasibility: 6

4.2. Operational Feasibility: 6

4.3. Economic Feasibility: 6

4.4 Legal and Ethical Feasibility: 6

4.5 Conclusion: 7

CHAPTER 5: 8

PRODUCT PERSPECTIVE 8

IV
5.1 System Interfaces 8

5.2 User Interfaces 8

5.3 Hardware Interfaces 8

5.4 Software Interfaces 8

5.5 Communication Interfaces 8

5.6 Memory Constraints 9

CHAPTER 6: 10

PROJECT DESIGN 10

6.1 Data Model 10

6.1.1 Database Design 10

6.2 Process Model 10

6.3 Functional Decomposition Diagram 10

CHAPTER 7: 12

ARCHITECTURAL DESIGN 12

7.1 Architectural Model 12

7.1.1 Client-side Interface: 12

7.1.2 Server-side Processing: 12

7.1.3. Database Integration: 12

7.1.4. Tampering Detection Module: 12

7.1.5. Reporting and Visualization: 12

7.1.6. Integration with External Systems: 13

7.1.7. Security and Scalability: 13

CHAPTER 8 14

IMPLEMENTING TECHNOLOGIES 14

Steps involved in this project are as follows: 14

8.1 Computer Vision 14

8.2 Concept of Edge Detection 15

Applications of OpenCV 17

OpenCV Functionality 18

ImageProcessing 18

DigitalImage 18

V
Big Data : 20

Machine learning : 20

STRUCTURAL SIMILARITY INDEX MEASURE (SSIM) : 21

CHAPTER 9 22

Data Flow Diagram 22

CHAPTER 10 23

UNDERSTANDING THE LIBRARIES 23

10.1 Model of The Project 23

CHAPTER 11 24

IMPLEMENTATION 24

11.1 Open image and display 24

Converting the format of tampered image similar to original image. 24

11.2 SSIM 27

11.3 SSIM Functions 27

11.4 Real world use of SSIM 27

11.5 SSIM help in detection 27

Example 29

Tech Stack 29

Working 29

References 35

VI
TABLE OF FIGURES

S.NO FIGURES PAGE NO

1 FIG 1 14

2 FIG 2 15

3 DATA FLOW DIAGRAM 20

4 ORIGINAL 23

5 TAMPERED 24

6 CSV OUTPUT 27

7 ORIGINAL IMAGE WITH CONTOUR 28

8 INFERENCE 29

9 THRESHOLD 32

VII
CHAPTER 1
1. INTRODUCTION
Computer vision is a field of artificial intelligence that trains computers to interpret and
understand the visual world. Using digital images from cameras and videos and deep learning
models, machines can accurately identify and classify objects and then react to what they
"see". Similarly, in this project, with the help of computer vision we detect whether a given
image of a PAN card is original or tampered (fake). Computer vision tasks include methods
for acquiring, processing, analysing and understanding digital images, and for extracting
high-dimensional data from the real world in order to produce numerical or symbolic
information, e.g. in the form of decisions. Understanding in this context means the
transformation of visual images (the input of the retina) into descriptions of the world that
make sense to thought processes and can elicit appropriate action. This image understanding
can be seen as the disentangling of symbolic information from image data using models
constructed with the aid of geometry, physics, statistics, and learning theory.

System Study

The system study for the "Pan Card Tampering Detection" project involves a comprehensive
analysis of the current processes and challenges related to verifying the authenticity of Pan
Cards. This study aims to identify the limitations of the existing methods and establish the
requirements for the proposed system.

1.1 Current Challenges and Limitations

1. Manual Inspection: Currently, the process of verifying Pan Cards relies heavily on manual
inspection by authorities. This approach is time-consuming, prone to human error, and lacks
scalability as the number of Pan Card applications continues to grow.

2. Lack of Automated Verification: There is a lack of an automated, reliable, and efficient
system to detect tampering on Pan Cards. The existing methods are often subjective and do
not provide a standardized, quantifiable way to assess the authenticity of the cards.

3. Increasing Sophistication of Tampering Techniques: Fraudsters are continuously
developing more sophisticated techniques to tamper with Pan Cards, making it increasingly
challenging for manual inspection to detect such alterations.

1
4. Centralized Database Limitations: The current centralized database of Pan Card records
maintained by the Income Tax Department has limitations in terms of accessibility, real-time
updates, and integration with other systems.

1.2 Proposed System Requirements

1. Automated Tampering Detection: The new system should be able to automatically detect
tampering on Pan Cards by leveraging computer vision and machine learning techniques,
reducing the reliance on manual inspection.

2. Scalable and Efficient Processing: The system should be designed to handle a large
volume of Pan Card verification requests efficiently, ensuring timely and accurate results.

3. Standardized Tampering Assessment: The system should provide a standardized,
quantifiable method to assess the authenticity of Pan Cards, reducing the subjectivity and
inconsistencies in the current verification process.

4. Integration with Existing Systems: The system should be designed to integrate with
the existing Pan Card database and other relevant systems, enabling seamless data
exchange and real-time updates.

5. User-friendly Interface: The system should have a user-friendly interface, allowing both
authorities and the general public to easily submit Pan Card images for verification and
access the results.

6. Secure and Reliable: The system should be designed with robust security measures to
protect the integrity of the Pan Card data and ensure the reliability of the tampering
detection process.

2
CHAPTER 2:

OVERVIEW OF PROJECT

2.1 Functionality

The project "Pan Card Tampering Detection" aims to develop a system that can detect tampering or
alterations in PAN (Permanent Account Number) cards. This functionality involves implementing
image processing techniques to analyze PAN card images for any signs of tampering or manipulation.
By utilizing advanced algorithms and image recognition technology, the system will be able to
identify discrepancies in PAN card images, ensuring the authenticity and integrity of the document.
This functionality is crucial in preventing fraud and ensuring the reliability of PAN cards for
identification and financial transactions.

The key functionality of the project includes:

 Implementing Structural Similarity Index Measure (SSIM) to compare a given PAN card image
with an original template and detect any tampering or alterations

 Utilizing computer vision and image processing techniques, such as contour detection and
thresholding, to identify discrepancies in the PAN card image

 Providing a web application or tool that allows users to upload PAN card images and receive a
determination on whether the card has been tampered with

 Ensuring the reliability and authenticity of PAN cards, which are crucial for identification and
financial transactions in India

3
CHAPTER 3:

HARDWARE AND SOFTWARE REQUIREMENT


Based on the provided sources, the hardware and software requirements for the "Pan Card
Tampering Detection" project can be summarized as follows:

Hardware Requirements:
1. Computer System: A standard computer system with sufficient processing power and
memory to handle image processing tasks efficiently.
2. Web Camera: For capturing images of Pan Cards for analysis and verification.
3. Storage: Adequate storage space to store the Pan Card images and processed data.

Software Requirements:

3.1. Programming Languages:

 Python 3.x: Specifically used for implementing machine learning algorithms and image
processing tasks.

3.2. Libraries and Frameworks:

 OpenCV: Essential for image processing tasks, such as image resizing, cropping, and
segmentation.
 Numpy: Required for numerical computations and array operations.
 TensorFlow: Utilized for building and training deep learning models, particularly for
distinguishing between real and fake Pan Cards.
 Scikit-learn: Used for machine learning tasks and model evaluation.
 MediaPipe: Possibly used for additional image processing functionalities.
 Tqdm: For displaying progress bars during model training and testing.

3.3. Development Tools:

 Jupyter Notebook: Used for loading and preprocessing datasets, implementing the
tampered and original image analysis, and evaluating the model.

3.4. Image Processing Tools:

 PIL (Python Imaging Library): Potentially used for image manipulation tasks.

4
3.5. Web Application Deployment:

 Anaconda Prompt: Used for running commands to set up and deploy the web
application for tampering detection.

3.6. Version Control:

 Git: Utilized for version control and collaboration on the project codebase.

The libraries listed in this chapter can be installed together, as shown in the example below.
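A sketch of such an installation command from a Jupyter Notebook cell or the Anaconda Prompt (the exact package set and versions are an assumption based on the list above and depend on the final setup):

!pip install opencv-python scikit-image imutils numpy pillow requests tensorflow scikit-learn tqdm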

5
CHAPTER 4:

FEASIBILITY STUDY
Feasibility Study for "Pan Card Tampering Detection"

4.1 Technical Feasibility:

 Algorithm Suitability: The use of Convolutional Neural Networks (CNNs) for analyzing
Pan Card images and detecting tampering indicates the technical feasibility of the project.

 Preprocessing Techniques: The application of image preprocessing steps like resizing,
cropping, and grayscale conversion demonstrates the technical viability of preparing the
data for analysis.

 Integration of Tools: The integration of OpenCV, TensorFlow, and other libraries in the
project showcases the technical feasibility of implementing machine learning algorithms
for fraud detection.

4.2. Operational Feasibility:

 Real-time Monitoring: The proposed system's ability to continuously monitor Pan Card
images in real time for fraudulent activities indicates operational feasibility in detecting
tampering promptly.

 User Interface: The user-friendly interface and accessibility of the system suggest
operational feasibility in terms of user interaction and ease of use.

4.3. Economic Feasibility:

 Resource Requirements: The project's reliance on standard computer systems and
common programming languages like Python indicates economic feasibility in terms of
resource availability and cost-effectiveness.

 Scalability: The potential to train the system using a large dataset of Pan Card images
suggests economic feasibility by leveraging existing resources efficiently.

4.4 Legal and Ethical Feasibility:

 Data Privacy: Ensuring the privacy and security of Pan Card data during the verification
process is crucial for legal and ethical compliance.

 Model Transparency: Maintaining transparency in the machine learning models used
for fraud detection is essential to ensure fairness and accountability.

6
4.5 Conclusion:

The feasibility study indicates that the "Pan Card Tampering Detection" project is
technically, operationally, economically, and ethically viable. The utilization of machine
learning and computer vision techniques, along with the availability of necessary tools and
resources, supports the feasibility of developing an effective system for detecting fraudulent
activities related to Pan Cards.

7
CHAPTER 5:

PRODUCT PERSPECTIVE
5.1 System Interfaces

 The system will interface with the existing Pan Card database maintained by the Income
Tax Department to access reference images and other relevant data for tampering
detection.

 The system will integrate with other government agencies and financial institutions to
enable seamless verification of Pan Card authenticity.

5.2 User Interfaces

 The system will provide a user-friendly web-based interface that allows users (both
authorities and the general public) to upload Pan Card images for verification.

 The interface will display the tampering detection results, highlighting any areas of
concern on the Pan Card.

 The system will offer clear instructions and guidance to users on how to properly
submit Pan Card images for analysis.

5.3 Hardware Interfaces

 The system will require a web camera or scanner to capture high-quality images of Pan
Cards for the tampering detection process.

 The system will be compatible with standard computer hardware, such as desktops and
laptops, to ensure accessibility and ease of use.

5.4 Software Interfaces

 The system will utilize various software libraries and frameworks, including OpenCV,
TensorFlow, and Scikitlearn, to implement the computer vision and machine learning
algorithms for tampering detection.

 The system will be developed using Python 3.x, a widely used programming language
for data analysis and machine learning tasks.

5.5 Communication Interfaces

The system will provide a secure communication channel for users to submit Pan Card
images and receive the tampering detection results.

8
The system will also enable seamless data exchange with the existing Pan Card database and
other relevant systems to ensure real-time updates and information sharing.

5.6 Memory Constraints

The system will need to handle a large volume of Pan Card images and associated data,
requiring adequate storage and memory resources to ensure efficient processing and analysis.

The system will be designed to optimize memory usage and leverage cloud-based storage
solutions, if necessary, to address any memory constraints.

9
CHAPTER 6:

PROJECT DESIGN

6.1 Data Model

6.1.1 Database Design

The system will utilize a relational database to store the Pan Card images, associated
metadata, and the results of the tampering detection process. The database schema will
include tables for storing the following information:

 Pan Card image data (e.g., image file, PAN number, name, date of birth)
 Tampering detection results (e.g., SSIM score, classification as genuine or tampered)
 User and authentication data (if the system requires user accounts)

The database design will ensure data integrity, security, and efficient retrieval of
information for the tampering detection process. A minimal sketch of such a schema is
given below.
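This sketch uses Python's built-in sqlite3 module; the table and column names are illustrative assumptions, not the final design.

import sqlite3

# Illustrative schema only; the actual table and column names may differ.
conn = sqlite3.connect("pan_tampering.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS pan_cards (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    image_path    TEXT NOT NULL,
    pan_number    TEXT,
    holder_name   TEXT,
    date_of_birth TEXT
);
CREATE TABLE IF NOT EXISTS detection_results (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    card_id     INTEGER REFERENCES pan_cards(id),
    ssim_score  REAL,
    is_tampered INTEGER,                -- 1 = tampered, 0 = genuine
    checked_at  TEXT DEFAULT CURRENT_TIMESTAMP
);
""")
conn.commit()
conn.close()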

6.2 Process Model

6.2.1 Information Flow Diagram

The information flow diagram illustrates the key steps involved in the Pan Card tampering
detection process (a code-level sketch of this flow is given after the list):
1. User uploads a Pan Card image to the system.
2. The system preprocesses the image (e.g., resizing, cropping, grayscale conversion).
3. The system calculates the Structural Similarity Index Measure (SSIM) between the
uploaded image and a reference image.
4. The system employs a Convolutional Neural Network (CNN) model to classify the Pan
Card as genuine or tampered.
5. The system generates a tampering detection report and displays the results to the user.
6. The system stores the Pan Card image, metadata, and tampering detection results in the
database.
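The sketch below expresses the same flow in Python; the function names are illustrative placeholders, and the optional CNN classification step is omitted.

import cv2
from skimage.metrics import structural_similarity

def preprocess(image_path, size=(250, 160)):
    # Step 2: read the image, scale it to a common shape and convert to grayscale
    image = cv2.imread(image_path)
    image = cv2.resize(image, size)
    return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

def verify_pan_card(uploaded_path, reference_path, threshold=0.80):
    # Steps 3-4: compare the uploaded card against the reference image with SSIM
    uploaded = preprocess(uploaded_path)
    reference = preprocess(reference_path)
    score, _ = structural_similarity(reference, uploaded, full=True)
    # Step 5: report the result (a CNN classifier could refine this decision)
    return {"ssim_score": score, "genuine": score >= threshold}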

6.3 Functional Decomposition Diagram

 The functional decomposition diagram breaks down the key functionalities of the "Pan
Card Tampering Detection" system into smaller, more manageable components:

10
 Image Acquisition: Handles the user interface for uploading Pan Card images.
 Image Preprocessing: Performs tasks like resizing, cropping, and grayscale conversion
on the uploaded images.
 Tampering Detection: Calculates the SSIM and utilizes the CNN model to classify the
Pan Card as genuine or tampered.
 Reporting and Visualization: Generates the tampering detection report and displays the
results to the user.
 Database Management: Handles the storage and retrieval of Pan Card images, metadata,
and tampering detection results.
 User Management (if applicable): Manages user accounts and authentication for the
system.

11
CHAPTER 7:
ARCHITECTURAL DESIGN

7.1 Architectural Model


The architectural model for the "Pan Card Tampering Detection" system can be described as
a webbased application with the following components:

7.1.1 Client-side Interface:

 The system will provide a user-friendly web interface that allows users (both authorities
and the general public) to upload Pan Card images for verification.
 The interface will be designed to be responsive and accessible across different devices
and platforms.

7.1.2 Server-side Processing:

 The uploaded Pan Card images will be processed on the server side using the
implemented machine learning and computer vision algorithms.[5]
 The server will handle tasks such as image preprocessing, tampering detection, and
generating the verification report (an illustrative sketch of such an endpoint follows).
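This sketch assumes Flask as the web framework (the report does not fix one) and a hypothetical module named detection providing the verify_pan_card helper sketched in Chapter 6.

from flask import Flask, request, jsonify
# "detection" is a hypothetical module name used only for this illustration
from detection import verify_pan_card

app = Flask(__name__)

@app.route("/verify", methods=["POST"])
def verify():
    # Receive the uploaded PAN card image from the client-side interface
    uploaded = request.files["pan_card"]
    uploaded.save("uploads/candidate.png")
    # Run preprocessing and SSIM-based tampering detection on the server
    result = verify_pan_card("uploads/candidate.png", "reference/original.png")
    return jsonify(result)

if __name__ == "__main__":
    app.run(debug=True)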

7.1.3. Database Integration:

 The system will integrate with a relational database to store the uploaded Pan Card
images, associated metadata, and the results of the tampering detection process.[4][5]
 The database will enable efficient storage, retrieval, and management of the Pan Card
data.

7.1.4. Tampering Detection Module:

 The core of the system will be the tampering detection module, which will utilize
Convolutional Neural Networks (CNNs) and the Structural Similarity Index Measure
(SSIM) to analyze the Pan Card images and detect any signs of tampering.[5]
 This module will be responsible for the image preprocessing, feature extraction, and
classification of the Pan Cards as genuine or tampered.

7.1.5. Reporting and Visualization:

The system will generate detailed reports on the tampering detection results, highlighting
the areas of concern on the Pan Card.

12
The reports will be presented in a userfriendly format, allowing users to easily understand
the verification outcomes.

7.1.6. Integration with External Systems:

 The system will integrate with the existing Pan Card database maintained by the Income
Tax Department to access reference images and other relevant data for the tampering
detection process.[4]
 The system may also integrate with other government agencies and financial institutions
to enable seamless verification of Pan Card authenticity.[4]

7.1.7. Security and Scalability:

 The architectural design will incorporate robust security measures to protect the
integrity and confidentiality of the Pan Card data.[5]
 The system will be designed to handle a large volume of Pan Card verification requests
efficiently, ensuring scalability and high availability.[5]

13
CHAPTER 8

IMPLEMENTING TECHNOLOGIES

The purpose of this project is to detect tampering of a PAN card using computer vision,
which trains computers to interpret and understand the visual world. Using digital images
from cameras and videos and deep learning models, machines can accurately identify and
classify objects and then react to what they "see". This project will help different
organizations detect whether the ID, i.e. the PAN card, provided to them by their employees,
customers, or anyone else is original or not.

For this project we calculate the structural similarity between the original PAN card and the
PAN card uploaded by the user. This is the soul of this project and is discussed later in the
report.

Steps involved in this project are as follows:

 Import the necessary libraries
 Scrape the tampered and original PAN card images from websites
 Scale down the shape of the tampered image to match the original image
 Read the original and tampered images
 Convert the images into grayscale
 Apply the Structural Similarity Index (SSIM) technique between the two images
 Calculate the threshold and contours
 Experience real-time contours and thresholds on the images

8.1 Computer Vision

Computer vision is an interdisciplinary scientific field that deals with how computers can
gain highlevel understanding from digital images or videos. From the perspective
of engineering, it seeks to understand and automate tasks that the human visual system can do.
for acquiring, processing, analyzing and understanding digital images, and extraction
of highdimensional data from the real world in order to produce numerical or symbolic
information, e.g., in the forms of decisions. Understanding in this context means the
transformation of visual images (the input of the retina) into descriptions of the world that
make sense to thought processes and can elicit appropriate action. This image understanding

14
can be seen as the disentangling of symbolic information from image data using models
constructed with the aid of geometry, physics, statistics, and learning theory.

The scientific discipline of computer vision is concerned with the theory behind artificial
systems that extract information from images. The image data can take many forms, such as
video sequences, views from multiple cameras, multidimensional data from a 3D scanner, or
a medical scanning device. The technological discipline of computer vision seeks to apply its
theories and models to the construction of computer vision systems.

Subdomains of computer vision include scene reconstruction, object detection, event
detection, video tracking, object recognition, 3D pose estimation, learning, indexing, motion
estimation, visual servoing, 3D scene modeling, and image restoration.

8.2 Concept of Edge Detection

Edge detection is used to detect the location and presence of edges from changes in the
intensity of an image. Different operators are used in image processing to detect edges. An
edge detector responds to variations in grey level, but it also responds quickly when noise is
present. In image processing, edge detection is a very important task: it is the main tool in
pattern recognition, image segmentation and scene analysis. It is a type of filter applied to
extract the edge points in an image. Sudden changes in an image occur where an image
contour crosses a change in the brightness of the image.

In image processing, edges are interpreted as a single class of singularity. In a function, a
singularity is characterized as a discontinuity at which the gradient approaches infinity.

Since image data is in discrete form, the edges of an image are defined as the local maxima
of the gradient.

Edges mostly exist between object and object, primitive and primitive, or object and
background, where the reflected intensities are discontinuous. Edge detection methods study
the change in grey level from one pixel of an image to the next.

Edge detection is mostly used for the measurement, detection and location of changes in the
grey levels of an image. Edges are a basic feature of an image. In an object, the clearest parts
are the edges and lines, and with their help the structure of an object is known. That is why
extracting the edges is a very important technique in graphics processing and feature
extraction.
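As an illustration of the edge-detection idea (this example is separate from the tampering-detection pipeline itself and uses an assumed file name), OpenCV's Canny detector can extract an edge map from a grayscale image:

import cv2

# Read a card image (illustrative path) and convert it to grayscale
image = cv2.imread("pan_card.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Canny edge detection: gradients above 200 are strong edges, gradients below 100
# are discarded, and in-between pixels are kept only if connected to a strong edge
edges = cv2.Canny(gray, 100, 200)
cv2.imwrite("pan_card_edges.png", edges)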

15
OpenCV

OpenCV is a huge open-source library for computer vision, machine learning, and
image processing, and it now plays a major role in real-time operation, which is very
important in today's systems. Using it, one can process images and videos to identify
objects, faces, or even the handwriting of a human. When it is integrated with various
libraries, such as NumPy, Python is capable of processing the OpenCV array structure for
analysis. To identify an image pattern and its various features we use vector space and
perform mathematical operations on these features.

The first OpenCV version was 1.0. OpenCV is released under a BSD license and hence it is
free for both academic and commercial use. It has C++, C, Python and Java interfaces and
supports Windows, Linux, Mac OS, iOS and Android. When OpenCV was designed the
main focus was real-time applications for computational efficiency. Everything is written in
optimized C/C++ to take advantage of multi-core processing.

Look at the following images

FIG 1:

16
From the above original image, a lot of the information present in the image can be obtained.
For example, there are two faces in the image, and the person in it is wearing a bracelet, a
watch, etc.; with the help of OpenCV we can extract all these types of information from the
original image. This is a basic introduction to OpenCV; its applications are discussed next.

Applications of OpenCV

There are lots of applications which are solved using OpenCV; some of them are listed
below:
 Face recognition
 Automated inspection and surveillance
 Counting the number of people (foot traffic in a mall, etc.)
 Vehicle counting on highways along with their speeds
 Interactive art installations
 Anomaly (defect) detection in the manufacturing process (the odd defective products)
 Street view image stitching
 Video/image search and retrieval
 Robot and driverless car navigation and control
 Object recognition
 Medical image analysis

17
 Movies – 3D structure from motion
 TV Channels advertisement recognition

OpenCV Functionality

 Image/video I/O, processing, display (core, imgproc, highgui)
 Object/feature detection (objdetect, features2d, nonfree)
 Geometry-based monocular or stereo computer vision (calib3d, stitching, videostab)
 Computational photography (photo, video, superres)
 Machine learning & clustering (ml, flann)
 CUDA acceleration (gpu)

Image Processing

Image processing is a method to perform some operations on an image, in order to get an
enhanced image or to extract some useful information from it. In terms of a basic definition,
“image processing is the analysis and manipulation of a digitized image, especially in order
to improve its quality”.

Digital Image

An image may be defined as a two-dimensional function f(x, y), where x and y are
spatial (plane) coordinates, and the amplitude of f at any pair of coordinates (x, y) is called
the intensity or grey level of the image at that point. In other words, an image is nothing
more than a two-dimensional matrix (3D in the case of coloured images) defined by the
mathematical function f(x, y), which at any point gives the pixel value of the image at that
point; the pixel value describes how bright that pixel is, and what colour it should be.
Image processing is basically signal processing in which the input is an image and the output
is an image or characteristics associated with that image, according to the requirement.

Image processing basically includes the following three steps:

1. Importing the image


2. Analysing and manipulating the image

18
3. Output, in which the result can be an altered image or a report based
on image analysis

How Does a Computer Read an Image?

Consider the below image:

FIG 2:
As humans, we can easily make out that it is the image of a person. But if we ask the
computer "is it a photo of a person?", the computer cannot say anything, because it does not
figure that out on its own. The computer reads any image as a range of values between 0 and
255. For any colour image, there are 3 primary channels: red, green and blue.
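A short sketch of how this looks in code with OpenCV and NumPy arrays (the file name is illustrative; note that OpenCV stores the channels in B, G, R order):

import cv2

image = cv2.imread("pan_card.png")                 # colour image: 3 channels (B, G, R)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)     # single intensity channel

print(image.shape)     # e.g. (height, width, 3)
print(gray.shape)      # e.g. (height, width)
print(image[0, 0])     # B, G, R values of the top-left pixel, each between 0 and 255
print(gray[0, 0])      # one intensity value between 0 and 255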

19
Big Data :

Big data is a field that treats ways to analyse, systematically extract information from, or
otherwise deal with data sets that are too large or complex to be dealt with by traditional
data-processing application software. Data with many fields (columns) offer greater statistical
power, while data with higher complexity (more attributes or columns) may lead to a higher
false discovery rate. Big data analysis challenges include capturing data, data storage, data
analysis, search, sharing, transfer, visualization, querying, updating, information privacy, and
data sourcing. Big data was originally associated with three key concepts: volume, variety, and
velocity.

Machine learning :

Machine learning (ML) is the study of computer algorithms that can improve automatically
through experience and by the use of data. It is seen as a part of artificial intelligence.
Machine learning algorithms build a model based on sample data, known as training data, in
order to make predictions or decisions without being explicitly programmed to do so.
Machine learning algorithms are used in a wide variety of applications, such as in medicine,
email filtering, speech recognition, and computer vision, where it is difficult or unfeasible to
develop conventional algorithms to perform the needed tasks.

A subset of machine learning is closely related to computational statistics, which focuses on
making predictions using computers; but not all machine learning is statistical learning. The
study of mathematical optimization delivers methods, theory and application domains to the
field of machine learning. Data mining is a related field of study, focusing on exploratory
data analysis through unsupervised learning. Some implementations of machine learning use
data and neural networks in a way that mimics the working of a biological brain. In its
application across business problems, machine learning is also referred to as predictive
analytics.

Grayscale
A grayscale image is one in which the value of each pixel is a single sample representing
only an amount of light; that is, it carries only intensity information. Grayscale images, a kind
of black-and-white or gray monochrome, are composed exclusively of shades of gray. The
contrast ranges from black at the weakest intensity to white at the strongest.

20
Grayscale images are distinct from one-bit bi-tonal black-and-white images, which, in the
context of computer imaging, are images with only two colors: black and white (also called
bilevel or binary images). Grayscale images have many shades of gray in between.

STRUCTURAL SIMILARITY INDEX MEASURE (SSIM):

The structural similarity index measure (SSIM) is a method for predicting the perceived
quality of digital television and cinematic pictures, as well as other kinds of digital images
and videos. SSIM is used for measuring the similarity between two images. The SSIM index
is a full-reference metric; in other words, the measurement or prediction of image quality is
based on an initial uncompressed or distortion-free image as reference.

SSIM is a perception-based model that considers image degradation as perceived change in
structural information, while also incorporating important perceptual phenomena, including
both luminance masking and contrast masking terms. The difference from other techniques
such as MSE or PSNR is that those approaches estimate absolute errors. Structural
information is the idea that the pixels have strong interdependencies, especially when they are
spatially close. These dependencies carry important information about the structure of the
objects in the visual scene. Luminance masking is a phenomenon whereby image distortions
(in this context) tend to be less visible in bright regions, while contrast masking is a
phenomenon whereby distortions become less visible where there is significant activity or
"texture" in the image.

21
CHAPTER 9

Data Flow Diagram

Fig. DFD

22
CHAPTER 10

UNDERSTANDING THE LIBRARIES

10.1 Model of The Project


# import the necessary packages
from skimage.metrics import structural_similarity
import imutils
import cv2
from PIL import Image
import requests

 Skimage: Scikit-image, or skimage, is an open-source Python package designed for image
preprocessing. In this project most of the image processing techniques are used via
scikit-image.
 imutils: Imutils is a series of convenience functions that make basic image processing
operations such as translation, rotation, resizing, skeletonization, and displaying images
easier with OpenCV.
 cv2: OpenCV (Open Source Computer Vision Library) is a library of programming
functions mainly aimed at real-time computer vision. In this project the main reading
and writing of the images is done via cv2.
 PIL: PIL (Python Imaging Library) is a free and open-source additional library for the
Python programming language that adds support for opening, manipulating, and saving
many different image file formats. In this project PIL is used to store images in variables
so that they can be manipulated as needed.
 requests: Requests is a simple yet elegant HTTP library. Here the image scraping from
websites is done via the requests library.

Make folders and subfolders for storing images

You may create them manually; it's completely up to you.

 !mkdir pan_card_tampering
 !mkdir pan_card_tampering/image

Output (on Windows, when the folders already exist): "A subdirectory or file
pan_card_tampering already exists. The syntax of the command is incorrect." A
cross-platform alternative is sketched below.
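Sketch using Python's standard library (os.makedirs with exist_ok=True does not fail when the folder already exists):

import os

# exist_ok=True silently ignores the "already exists" case shown above
os.makedirs("pan_card_tampering/image", exist_ok=True)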

23
CHAPTER 11
IMPLEMENTATION

11.1 Open image and display

original = Image.open(requests.get('https://www.thestatesman.com/wp-content/uploads/2019/07/pancard.jpg', stream=True).raw)
tampered = Image.open(requests.get('https://assets1.cleartaxcdn.com/s/img/20170526124335/Pan4.png', stream=True).raw)

In the above code snippet, we are web scraping the images from different sources
using the requests library.

Loading original and user provided images.

The file format of the source file.


print("Original image format : ",original.format)
print("Tampered image format : ",tampered.format)

Image size, in pixels. The size is given as a 2-tuple (width, height).

print("Original image size : ",original.size)


print("Tampered image size : ",tampered.size)
Original image format : JPEG Tampered image format : PNG Original image size : (1200,
800) Tampered image size : (282, 179)

As you can see in the above output, the original sizes of the original image and the tampered
image are different, which would give unwanted/false results during image processing;
that is why scaling both images down to the same shape is needed.

Converting the format of tampered image similar to original image.

Resize images
original = original.resize((250, 160))
print(original.size)
original.save('pan_card_tampering/image/original.png')  # save image
tampered = tampered.resize((250, 160))
print(tampered.size)
tampered.save('pan_card_tampering/image/tampered.png')  # save image

Output: (250, 160) (250, 160)

Now, if you will see the output the shape of both the images (Original image and tampered
image) is scaled down to equal shape i.e. (250,160). Now the image processing will be
smoother and more accurate than it was before.

We can change the format of the image (png or jpg) if needed.

Change image type if required, e.g. from png to jpg

tampered = Image.open('pan_card_tampering/image/tampered.png')
tampered.save('pan_card_tampering/image/tampered.png')  # can also save as jpg here

Display the original PAN card image which will be used for comparison.

Display original image

Original

Display the user-provided image, which will be compared with the original PAN card image.
Display user given image

25
Tampered

Reading images using opencv.

load the two input images


original = cv2.imread('pan_card_tampering/image/original.png')
tampered = cv2.imread('pan_card_tampering/image/tampered.png')

Now in the above code we are reading both the images (Original and Tampered) using
cv2's imread() function.
Convert the images into grayscale

Convert the images to grayscale

original_gray = cv2.cvtColor(original, cv2.COLOR_BGR2GRAY)


tampered_gray = cv2.cvtColor(tampered, cv2.COLOR_BGR2GRAY)
In the above code we have converted the original images (original PAN card and user-given
PAN card) to grayscale images using cv2's cvtColor() function with the parameter
cv2.COLOR_BGR2GRAY.

26
Why do we convert them into grayscale?

Converting images into grayscale is very beneficial for the accuracy of image processing,
because many image processing operations do not need colour to identify the important
edges, and coloured images are more complex for a machine to process, since they have
3 channels while a grayscale image has only 1.

Applying Structural Similarity Index (SSIM) technique between the two images.

11.2 SSIM

The Structural Similarity Index (SSIM) is a perceptual metric that quantifies the image
quality degradation that is caused by processing such as data compression or by losses in data
transmission.

11.3 SSIM Functions

This metric is basically a full reference that requires 2 images from the same shot, this means
2 graphically identical images to the human eye. The second image generally is compressed
or has a different quality, which is the goal of this index.

11.4 Real world use of SSIM

SSIM is usually used in the video industry, but has as well a strong application
in photography.

11.5 SSIM help in detection

SSIM actually measures the perceptual difference between two similar images. It cannot
judge which of the two is better: that must be inferred from knowing which is the original one
and which has been exposed to additional processing such as compression or filters.

Compute the Structural Similarity Index (SSIM) between the two images, ensuring that the
difference image is returned.

(score, diff) = structural_similarity(original_gray, tampered_gray, full=True)
diff = (diff * 255).astype("uint8")
print("SSIM Score is : {}".format(score * 100))
if score >= 0.80:
    print("The given pan card is original")
else:
    print("The given pan card is tampered")

Output: SSIM Score is : 31.678790332739425 The given pan card is tampered

Let's break down what just happened in the above code!

The structural similarity index helps us determine exactly where, in terms of x, y coordinate
locations, the image differences are. Here, we are trying to find similarities between the
original and tampered image.

The lower the SSIM score, the lower the similarity; i.e. the SSIM score is directly proportional
to the similarity between the two images.

We have set a threshold of 0.80, i.e. if the score is >= 0.80 (80 on the percentage scale) the
card is regarded as an original PAN card, otherwise as a tampered one.

Generally, SSIM values of about 0.97, 0.98 or 0.99 indicate good-quality reconstruction.

Experience real-time thresholds and contours on images

Contour detection can be explained simply as finding a curve joining all the continuous
points (along a boundary) having the same colour or intensity. The algorithm does indeed
find edges of images but also puts them in a hierarchy.

Calculating threshold and contours

thresh = cv2.threshold(diff, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
cnts = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = imutils.grab_contours(cnts)

Here we are using the threshold function of computer vision, which applies an automatically
computed (Otsu) threshold to the difference image stored in the form of an array. This
function transforms the grayscale image into a binary image using a mathematical formula.

findContours works on a binary image and retrieves the contours. These contours are a useful
tool for shape analysis and recognition. grab_contours grabs the appropriate value of the
contours.

28
OCR stands for Optical Character Recognition. It is used to read text from images such
as a scanned document or a picture.

Example

The above image will be cropped and converted as:

Tech Stack

Python, DNN, Darknet framework, GPU training, Computer Vision, VoTT data annotation
tool, and the image augmentation library Albumentations.

Working

A pipelined Python script converts scanned documents, in the form of images from users, into
formatted text. Building an OCR requires detection of regions in an image and their
recognition. These detections are such that the selected text from the original image can be
obtained. Thus, after the text is detected, its recognition is required.

29
Challenges

A big data set is a prerequisite for training a good model. Datasets were augmented by
changing parameters like brightness, contrast, and orientation of the raw images before data
labelling.
GPU computing was available, but with limited access. To manage the rotated and skewed
image regions generated by the model before performing OCR, a simple solution was
developed of hitting the authentication database API twice for each anti-symmetric text
region obtained.
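A hedged sketch of this kind of augmentation with the Albumentations library mentioned in the tech stack (the specific transforms, limits and file paths below are illustrative assumptions):

import cv2
import albumentations as A

# Illustrative augmentation pipeline: brightness/contrast jitter plus small rotations
augment = A.Compose([
    A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.2, p=0.7),
    A.Rotate(limit=10, p=0.5),
])

image = cv2.imread("raw_images/sample_card.png")
augmented = augment(image=image)["image"]
cv2.imwrite("raw_images/sample_card_aug.png", augmented)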

30
Creating bounding boxes (contours)

# loop over the contours
for c in cnts:
    # applying contours on the images
    (x, y, w, h) = cv2.boundingRect(c)
    cv2.rectangle(original, (x, y), (x + w, y + h), (0, 0, 255), 2)
    cv2.rectangle(tampered, (x, y), (x + w, y + h), (0, 0, 255), 2)

The bounding rectangle helps in finding the ratio of width to height of the bounding rectangle
of the object. We compute the bounding box of each contour and then draw the bounding box
on both input images to show where the two images differ.

Display original image with contours

print('Original Format Image')

original_contour = Image.fromarray(original)

original_contour.save("pan_card_tampering/image/original_contour_image.png")

original_contour

Original Format Image

31
FIG original image with contour

Inference:

In the above output, you can see that the original image is shown with the contours
(bounding boxes) on it; the array is converted back to an image using the fromarray() function.

You can also simply save the image using the save() function (optional).

Need of edge detection

Edge detection is an image processing technique for finding the boundaries of objects within
images. It works by detecting discontinuities in brightness. Edge detection is used for image
segmentation and data extraction in areas such as image processing, computer vision, and
machine vision.

Display tampered image with contours

print('Tampered Image')

tampered_contour = Image.fromarray(tampered)

tampered_contour.save("pan_card_tampering/image/tampered_contours_image.png")

32
tampered_contour

Tampered Image

Inference: The same applies to the tampered image, but one can notice that some of the
contours are missing in the tampered image.

Display difference image (in black)

print('Different Image')

difference_image = Image.fromarray(diff)

difference_image.save("pan_card_tampering/image/difference_image.png")

difference_image

Different Image

Inference:

Here is another very intuitive way to show the contours, in terms of a heat-map-like
threshold: by finding the "hot" zones (text/image regions) and the normal zones (without
text/image).

The hot zone, i.e. the zone that contains text/images, is shown as a dark (black) region, and
the other as a light (nearly white) zone.

Display threshold image with white

33
print('Threshold Image')
threshold_image = Image.fromarray(thresh)
threshold_image.save("pan_card_tampering/image/threshold_image.png")
threshold_image

Threshold Image

Inference:

Everything here is just the same; all we can see is that the roles of the colours are swapped:
here white shows the hot zone and black shows the normal zone.

34
References

Citations:

[1] https://www.projectbank.in

[2] https://github.com/Pranav-Nagpure/Pan-Card-Tampering-Detection

[3] https://www.linkedin.com/posts/sumeetsagar_github-ssagar012pan-card-tampering-detection-activity-6979773128092655616-K6Cm

[4] https://github.com/Priyanshu88/Pan-Card-Tampering-Detection

[5] https://www.kaggle.com/code/seherbal/pan-card-tampering/code

35
