CHAPTER 1
INTRODUCTION
Fraud detection concerns a large number of financial institutions and banks, as this crime costs them around $67 billion per year. There are different types of fraud: insurance fraud, credit card fraud, statement fraud, securities fraud, etc. Of all of them, credit card fraud is the most common type. It is defined as an unauthorized use of a credit card account. It occurs when the cardholder and the card issuer are not aware that the card is being used by a third party. The fraudsters can obtain goods without paying, or gain illegal access to funds from an account. Credit card fraud is classified into different types based on the nature of the fraudulent activities.
Simple theft (offline fraud): a stolen card is the most straightforward type of credit card fraud. It is also the fastest to be detected.
Application fraud: individuals obtain new credit cards using false personal information.
Bankruptcy fraud: using a credit card while insolvent, purchasing goods while knowing that one is not able to pay. This type can be prevented with credit scoring techniques.
Internal fraud: bank employees steal card details and use them remotely.
In the current state of the world, financial organizations expand the availability of financial facilities by employing innovative services such as credit cards, Automated Teller Machines (ATMs), and internet and mobile banking. Along with the rapid advances of e-commerce, the use of credit cards has become a convenient and necessary part of financial life. A credit card is a payment card supplied to customers as a system of payment. There are many advantages in using credit cards, such as:
Ease of purchase: Credit cards can make life easier. They allow customers to purchase on credit at an arbitrary time, location and amount without carrying cash, and they provide a convenient payment method for purchases made on the internet, over the telephone, through ATMs, etc.
Keeping a customer credit history: Having a good credit history is often important in identifying loyal customers. This history is valuable not only for credit cards, but also for other financial services like loans, rental applications, or even some jobs. Lenders and issuers of credit, such as mortgage companies, credit card companies, retail stores, and utility companies, can review a customer's credit score and history to see how punctual and responsible customers are in paying back their debts.
Protection of purchases: Credit cards may also offer customers additional protection if the purchased merchandise becomes lost, damaged, or stolen. Both the buyer's credit card statement and the card company can confirm that the customer made the purchase if the original receipt is lost or stolen. In addition, some credit card companies provide insurance for large purchases.
In spite of all the mentioned advantages, fraud is a serious issue in e-banking services that threatens credit card transactions in particular. Fraud is an intentional deception with the purpose of obtaining financial gain or causing loss by an implicit or explicit trick; it is a public law violation in which the fraudster gains an unlawful advantage or causes unlawful damage. Estimates of the damage caused by fraudulent activities indicate that fraud costs a very considerable sum of money. Credit card fraud is increasing significantly with the development of modern technology, resulting in the loss of billions of dollars worldwide each year. Statistics from the Internet Crime Complaint Center show that there has been a significant rise in reported fraud in the last decade. Financial losses caused by online fraud in the US alone were reported to be $3.4 billion in 2011.
Fraud detection involves identifying scarce fraud activities among numerous legitimate transactions as quickly as possible. Fraud detection methods are developing rapidly in order to adapt to new fraudulent strategies emerging across the world. However, the development of new fraud detection techniques is made more difficult by the severe limitation on the exchange of ideas in fraud detection. Moreover, fraud detection is essentially a rare event problem, which has been variously called outlier analysis, anomaly detection, exception mining, mining rare classes, mining imbalanced data, etc. The number of fraudulent transactions is usually a very low fraction of the total transactions. Hence the task of detecting fraudulent transactions in an accurate and efficient manner is fairly difficult and challenging. Therefore, the development of efficient methods which can distinguish rare fraud activities from billions of legitimate transactions is essential.
Although credit card fraud detection has gained attention and extensive study, especially in recent years, existing surveys of this kind of fraud neither classify all credit card fraud detection techniques nor analyse the datasets and attributes involved. Therefore, we attempt to collect and integrate a complete set of studies from the literature and analyse them from various aspects.
To the best of our knowledge, the absence of a complete and detailed credit card fraud detection survey is an important issue, which is addressed here by analysing the state of the art in credit card fraud detection.
The state-of-the-art fraud detection techniques are described and classified from different aspects, such as supervised versus unsupervised learning and numerical versus categorical data.
In credit card fraud research, each researcher has used their own dataset; there is no standard dataset or benchmark to evaluate detection methods. We attempt to gather the different datasets investigated by researchers, categorize them into real and synthesized groups, and extract the common attributes that affect the quality of detection.
Illegal use of a credit card or its information without the knowledge of the owner is referred to as credit card fraud. Credit card fraud tricks belong mainly to two groups: application fraud and behavioural fraud [3]. Application fraud takes place when fraudsters apply for new cards from a bank or issuing company using false or other people's information. Multiple applications may be submitted by one user with one set of user details (called duplication fraud) or by different users with identical details (called identity fraud).
Behavioural fraud, on the other hand, has four principal types: stolen/lost card, mail theft, counterfeit card and 'card holder not present' fraud. Stolen/lost card fraud occurs when fraudsters steal a credit card or gain access to a lost card. Mail theft fraud occurs when the fraudster intercepts a credit card or personal information in the mail before it reaches the actual cardholder. In both counterfeit and 'card holder not present' fraud, credit card details are obtained without the knowledge of the card holder. In counterfeit card fraud, counterfeit cards are made based on the card information, while in 'card holder not present' fraud remote transactions are conducted using the card details through mail, phone, or the Internet.
Based on statistical data from 2012, the high-risk countries facing credit card fraud threats are illustrated in Fig. 1. Ukraine has the highest fraud rate at a staggering 19%, closely followed by Indonesia at 18.3%. After these two, Yugoslavia, with a rate of 17.8%, is the riskiest country. The next highest fraud rates belong to Malaysia (5.9%), Turkey (9%) and finally the United States. Other countries prone to credit card fraud, with rates below 1%, are not shown.
The first group of techniques deals with supervised classification at the transaction level. In these methods, transactions are labelled as fraudulent or normal based on previous historical data. This dataset is then used to create classification models which can predict the state (normal or fraud) of new records. There are numerous model creation methods for a typical two-class classification task, such as rule induction [1], decision trees [2] and neural networks [3]. This approach has been proven to reliably detect most fraud tricks which have been observed before [4]; it is also known as misuse detection.
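As a concrete illustration of this supervised (misuse detection) setting, the following is a minimal sketch that trains a decision tree on historical transactions labelled as fraud or normal. The file name "transactions.csv" and the "Class" column are illustrative assumptions, not part of the surveyed work.

```python
# Minimal sketch of misuse (supervised) detection: a decision tree trained on
# transactions previously labelled as fraud (1) or normal (0).
# The file name and column names are illustrative assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = pd.read_csv("transactions.csv")          # historical, labelled transactions
X = data.drop(columns=["Class"])                # transaction attributes
y = data["Class"]                               # 1 = fraud, 0 = normal

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

clf = DecisionTreeClassifier(max_depth=5, random_state=42)
clf.fit(X_train, y_train)                       # learn the observed fraud/normal patterns
print(clf.predict(X_test[:5]))                  # predict the state of new records
```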
The second approach deals with unsupervised methodologies which are based on account behaviour. In this method a transaction is flagged as fraudulent if it contrasts with the user's normal behaviour. This is because we do not expect fraudsters to behave the same as the account owner or to be aware of the owner's behaviour model. To this end, we need to extract the legitimate user behavioural model (e.g. a user profile) for each account and then detect fraudulent activities according to it. By comparing new behaviours with this model, activities that are different enough are flagged as frauds. The profiles may contain activity information of the account, such as merchant types, amounts, locations and times of transactions. This method is also known as anomaly detection.
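A minimal sketch of this profile-based idea follows: it builds a simple per-account profile from legitimate history and flags transactions whose amount deviates strongly from it. The file and column names are assumptions for illustration; a real system would profile many more attributes (merchant type, location, time).

```python
# Minimal sketch of user-behaviour (anomaly) detection: build a simple profile of
# each account's legitimate spending and flag transactions that deviate from it.
# Column names ("account_id", "amount") and the file name are assumptions.
import pandas as pd

history = pd.read_csv("legitimate_history.csv")             # past legitimate activity
profiles = history.groupby("account_id")["amount"].agg(["mean", "std"])

def is_suspicious(account_id, amount, threshold=3.0):
    """Flag a transaction whose amount is far from the account's usual behaviour."""
    mean, std = profiles.loc[account_id]
    if std == 0:
        return amount != mean
    return abs(amount - mean) / std > threshold             # simple z-score rule

print(is_suspicious(account_id=1001, amount=2500.0))
```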
It is important to highlight the key differences between the user behaviour analysis and fraud analysis approaches. The fraud analysis method can detect known fraud tricks with a low false positive rate. These systems extract the signatures and models of the fraud tricks present in a labelled dataset and can then determine exactly which frauds the system is currently experiencing. If the test data does not contain any fraud signatures, no alarm is raised; thus the false positive rate can be kept extremely low. However, since the learning of a fraud analysis system (i.e. a classifier) is based on limited and specific fraud records, it cannot detect novel frauds. As a result, the false negative rate may be extremely high, depending on how ingenious the fraudsters are. User behaviour analysis, on the other hand, greatly addresses the problem of detecting novel frauds. These methods do not search for specific fraud patterns, but rather compare incoming activities with the constructed model of legitimate user behaviour. Any activity that is different enough from the model will be considered a possible fraud. Although user behaviour analysis approaches are powerful in detecting innovative frauds, they suffer from high false alarm rates. Moreover, if a fraud occurs during the training phase, this fraudulent behaviour will be entered into the baseline model and assumed to be normal in further analysis.
In this report we briefly introduce some current fraud detection techniques which are applied to credit card fraud detection tasks; the main advantages and disadvantages of each approach are also discussed.
CHAPTER 2
LITERATURE SURVEY
The literature survey is the most important step in the software development process. Before developing the tool it is necessary to determine the time factor, economy and company strength. Once these things are satisfied, the next steps are to determine which operating system and programming language can be used for developing the tool. Once the programmer starts building the tool, the programmer needs a lot of external support. This support can be obtained from senior programmers, from books and from websites. Before building the system, the above considerations are taken into account for developing the proposed system.
Some efforts have been reported in the literature on credit card fraud detection. Early rule-based expert systems rely on situations established beforehand and do not take into account the variable nature of fraud; they do not consider the individual characteristics of cardholders' behaviour; and the control of such a rule-based system is a rather complex task for the expert.
CHAPTER 3
PROBLEM STATEMENT
System Analysis
System analysis and design is the application of the systems approach to problem solving, generally using computers. To reconstruct a system, the analyst must consider its elements: outputs and inputs, processors, controls, feedback and environment.
Analysis
Analysis is a detailed study of the various operations performed by a system and of their relationships within and outside the system. One aspect of analysis is defining the boundaries of the system and determining whether or not a candidate system should consider other related systems. During analysis, data are collected on the available files, decision points and transactions handled by the present system. This involves gathering information and using structured tools for analysis.
The problem is summarized as follows: using imbalanced classification approaches, the number of
false alarms generated is higher than the number of frauds that are detected.
In the existing system, research has been reported on a case study involving credit card fraud detection, where data normalization is applied before cluster analysis. The results obtained from the use of cluster analysis and artificial neural networks on fraud detection have shown that by clustering attributes the neural network inputs can be minimized, and that promising results can be obtained when normalized data are used to train an MLP.
This research was based on unsupervised learning. The significance of this paper was to find new methods for fraud detection and to increase the accuracy of the results. The dataset for this paper is based on real-life transactional data from a large European company, and the personal details in the data are kept confidential.
The accuracy of the algorithm is around 50%. The significance of this paper was to find an algorithm that reduces the cost measure; the cost was reduced by 23%, and the algorithm found was Bayes minimum risk.
Another problematic issue in credit card fraud detection is the scarcity of available data due to confidentiality issues, which gives the community little chance to share real datasets and assess existing techniques.
Fraud detection systems are prone to several difficulties and challenges, enumerated below. An effective fraud detection technique should be able to address these difficulties in order to achieve the best performance.
Fraud detection cost: The system should take into account both the cost of fraudulent
behavior that is detected and the cost of preventing it.
Imbalanced data: Credit card fraud detection data is imbalanced by nature: only a very small percentage of all credit card transactions is fraudulent. This makes the detection of fraudulent transactions very difficult and imprecise (a minimal sketch of coping with this imbalance follows this list).
Nonexistence of a standard algorithm: No single algorithm known in the credit card fraud literature outperforms all others. Each technique has its own advantages and disadvantages; combining effective algorithms so that they support each other's strengths and cover each other's weaknesses would be of great interest.
Nonexistence of suitable metrics: The lack of good metrics for evaluating the results of fraud detection systems is still an open issue. Without such metrics, researchers and practitioners cannot compare different approaches or determine which fraud detection system is the most efficient.
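The sketch below illustrates the imbalanced-data point above: it shows how rare the positive class typically is and one common mitigation, re-weighting the classes and scoring with AUPRC instead of raw accuracy. The file and column names are assumptions.

```python
# Minimal sketch of coping with the imbalanced nature of fraud data, assuming a
# "Class" column where fraud is the rare positive class. class_weight="balanced"
# re-weights errors on the minority class instead of optimising raw accuracy.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import average_precision_score

data = pd.read_csv("transactions.csv")
X, y = data.drop(columns=["Class"]), data["Class"]
print(y.value_counts(normalize=True))            # typically far below 1% fraud

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]
print("AUPRC:", average_precision_score(y_te, scores))   # better suited than accuracy
```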
Billions of dollars are lost every year to fraudulent credit card transactions. Fraud is as old as humanity itself and can take an unlimited variety of forms. The PwC global economic crime survey of 2017 suggests that approximately 48% of organizations experienced economic crime. Therefore, there is definitely an urgent need to solve the problem of credit card fraud detection. Moreover, the development of new technologies provides additional ways in which criminals may commit fraud. The use of credit cards is prevalent in modern society and credit card fraud has kept growing in recent years. The resulting huge financial losses affect not only merchants and banks, but also the individuals who use the cards. Fraud may also affect the reputation and image of a merchant, causing non-financial losses that, though difficult to quantify in the short term, may become visible in the long term. For example, if a cardholder is a victim of fraud with a certain company, he may no longer trust its business and choose a competitor.
The credit card fraud detection problem consists of modelling past credit card transactions with the knowledge of which ones turned out to be fraud. This model is then used to identify whether a new transaction is fraudulent or not. Our aim here is to detect 100% of the fraudulent transactions while minimizing the incorrect fraud classifications. Misuse detection uses classification methods to determine whether an incoming transaction is fraud or not. Usually, such an approach has to know about the existing types of fraud in order to build models by learning the various fraud patterns. Anomaly detection builds a profile of the normal transaction behaviour of a cardholder based on his/her historical transaction data, and flags a new transaction as a potential fraud if it deviates from that normal behaviour. However, an anomaly detection method needs enough consecutive sample data to characterize the normal transaction behaviour of a cardholder.
• Several solutions have been proposed in a large body of work which, to the best of our knowledge, are built on machine learning algorithms.
• As this is a classification paradigm, only classification algorithms that can differentiate fraud and non-fraud transactions are utilized.
• Support Vector Machine, Logistic Regression and Random Forest algorithms are used, as they are the best methods according to the three considered performance measures (Accuracy, Sensitivity and AUPRC).
• A comparative analysis on the desired parameters, such as the confusion matrix, F-measure, precision, accuracy, intervention and recall, is used to compare these algorithms.
• We will develop a model for the class imbalance problem to find a trade-off between sensitivity and accuracy.
• The tabulated results will depict the differences between the algorithms, to realize the trade-offs among them.
In the proposed system, we apply the random forest algorithm to classify the credit card dataset. Random Forest is an algorithm for classification and regression; in summary, it is a collection of decision tree classifiers. Random forest has an advantage over a single decision tree in that it corrects the tendency of decision trees to overfit their training set. A subset of the training set is sampled randomly to train each individual tree, and a decision tree is built; each node is then split on a feature selected from a random subset of the full feature set. Even for large datasets with many features and data instances, training is extremely fast in random forest, because each tree is trained independently of the others. The Random Forest algorithm has been found to provide a good estimate of the generalization error and to be resistant to overfitting.
Advantages
Random forest ranks the importance of variables in a regression or classification problem in a natural way.
The 'Amount' feature is the transaction amount. The 'Class' feature is the target class for the binary classification; it takes the value 1 for the positive case (fraud) and 0 for the negative case (non-fraud). A minimal training sketch on such a dataset follows.
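The sketch below applies a Random Forest classifier to a credit card dataset with an "Amount" feature and a binary "Class" target; the file name "creditcard.csv" and the exact column layout are assumptions.

```python
# Minimal sketch of the proposed approach: Random Forest on a credit card dataset
# with an "Amount" feature and a "Class" target (1 = fraud, 0 = non-fraud).
# The file name is an illustrative assumption.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = pd.read_csv("creditcard.csv")
X, y = data.drop(columns=["Class"]), data["Class"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=42)

rf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
rf.fit(X_tr, y_tr)                               # each tree trained on a bootstrap sample
print(rf.score(X_te, y_te))                      # accuracy on unseen transactions

# Ranking the importance of variables, as mentioned above
importances = pd.Series(rf.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head())
```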
3.3 Objectives
• Different machine learning methods are utilized, including the K-means clustering algorithm among the unsupervised learning algorithms, the Random Forest algorithm among the regression-based algorithms, and Support Vector Machines (SVM) (linear, RBF and sigmoid kernels) and ANN among the supervised algorithms.
• The objectives of credit card fraud detection are to reduce losses due to payment fraud for both merchants and issuing banks, and to increase revenue opportunities for merchants.
• The aim is to reduce false alarms and thereby also increase accuracy.
• Our goal is to identify the issues that must be solved to produce a highly efficient solution to the class imbalance problem.
• Our aim here is to detect 100% of the fraudulent transactions while minimizing the incorrect
fraud classifications.
• Performance evaluation using the confusion matrix, F-measure, precision, accuracy, intervention and recall is used to compare these algorithms; a Python and SKlearn based implementation is carried out and the results are tabulated. A short example of computing these metrics follows this list.
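A minimal sketch of this evaluation step, using scikit-learn's metric functions on placeholder test labels and predictions:

```python
# Minimal sketch of the evaluation described above: confusion matrix, precision,
# recall, F-measure and accuracy computed with scikit-learn.
# y_true / y_pred are illustrative placeholders, not real results.
from sklearn.metrics import (confusion_matrix, precision_score, recall_score,
                             f1_score, accuracy_score)

y_true = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]          # actual classes (1 = fraud)
y_pred = [0, 0, 1, 0, 0, 0, 1, 1, 0, 0]          # classes predicted by a model

print(confusion_matrix(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("F-measure:", f1_score(y_true, y_pred))
print("accuracy: ", accuracy_score(y_true, y_pred))
```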
CHAPTER 4
SYSTEM REQUIREMENTS
Hardware Requirements
RAM – 4GB
i3 or i5 processor
Software Requirements
Anaconda
CHAPTER 5
SYSTEM ARCHITECTURE
Frequent itemsets are sets of items that occur together in at least as many transactions as the user-defined minimum support. The support of an itemset is defined as the fraction of records of the database that contain that itemset.
This means that fraudsters are intruding into customer accounts only after learning their genuine behaviour. Therefore, instead of finding a common pattern for fraudster behaviour, it is more valid to identify fraud patterns for each customer. Thus, in this research, we construct two patterns for each customer: a legal pattern and a fraud pattern. When frequent pattern mining is applied to the credit card transaction data of a particular customer, it returns sets of attributes showing the same values in a group of transactions specified by the support.
Generally, frequent pattern mining algorithms such as Apriori return many such groups, and the longest group, containing the maximum number of attributes, is selected as that particular customer's legal pattern. The training (pattern recognition) algorithm is given below (a minimal sketch of the frequent-itemset step follows the algorithm):
Step 1. Separate each customer's transactions from the whole transaction database.
Step 2. From each customer’s transactions separate his/her legal and fraud transactions.
Step 3. Apply Apriori algorithm to the set of legal transactions of each customer. The Apriori
algorithm returns a set of frequent item sets. Take the largest frequent itemset as the legal pattern
corresponding to that customer. Store these legal patterns in legal pattern database.
Step 4. Apply Apriori algorithm to the set of fraud transactions of each customer. The Apriori
algorithm returns a set of frequent item sets. Take the largest frequent itemset as the fraud pattern
corresponding to that customer. Store these fraud patterns in fraud pattern database.
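The following minimal sketch illustrates Steps 3 and 4 for a single customer. The use of the mlxtend library and the attribute=value encoding are assumptions made for illustration; the original method only requires an Apriori implementation.

```python
# Minimal sketch: find the largest frequent itemset of one customer's (legal or
# fraud) transactions with Apriori. The mlxtend library and the attribute names
# are assumptions, not part of the original method.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

# Each transaction is a set of attribute=value items for one customer
transactions = [
    ["merchant=grocery", "amount=low", "time=day"],
    ["merchant=grocery", "amount=low", "time=night"],
    ["merchant=grocery", "amount=low", "time=day"],
]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)
frequent = apriori(onehot, min_support=0.6, use_colnames=True)

# Take the largest frequent itemset as this customer's pattern
pattern = max(frequent["itemsets"], key=len)
print(pattern)
```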
After finding the legal and fraud patterns for each customer, the fraud detection system traverses these fraud and legal pattern databases in order to detect frauds. These pattern databases are much smaller than the original customer transaction databases, as they contain only one record per customer. This research proposes a matching algorithm which traverses the pattern databases for a match with the incoming transaction in order to detect fraud. If a closer match is found with the legal pattern of the corresponding customer, the matching algorithm returns "0", giving a green signal to the bank to allow the transaction. If a closer match is found with the fraud pattern of the corresponding customer, the matching algorithm returns "1", giving an alarm to the bank to stop the transaction.
The size of the pattern databases is (number of customers) × (number of attributes). The matching (testing) algorithm is explained below; a minimal sketch of this matching procedure follows it.
Step 1. Count the number of attributes in the incoming transaction that match the legal pattern of the corresponding customer; call this count L.
Step 2. Count the number of attributes in the incoming transaction that match the fraud pattern of the corresponding customer; call this count F.
Step 3. If F = 0 and L exceeds the user-defined matching percentage, then the incoming transaction is legal.
Step 4. If L = 0 and F exceeds the user-defined matching percentage, then the incoming transaction is fraud.
Step 5. If both L and F meet the matching percentage, the closer match decides the outcome.
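A minimal sketch of this matching procedure follows. Representing patterns as sets of attribute=value items, the 60% matching percentage and the tie-breaking rule are assumptions made for illustration.

```python
# Minimal sketch of the matching (testing) algorithm reconstructed above.
# Patterns are sets of attribute=value items; the matching percentage of 60%
# and the helper names are illustrative assumptions.
def match_count(transaction, pattern):
    """Number of attributes of the incoming transaction that match the pattern."""
    return len(set(transaction) & set(pattern))

def classify(transaction, legal_pattern, fraud_pattern, matching_pct=0.6):
    legal = match_count(transaction, legal_pattern)
    fraud = match_count(transaction, fraud_pattern)
    if fraud == 0 and legal >= matching_pct * len(legal_pattern):
        return 0                      # green signal: allow the transaction
    if legal == 0 and fraud >= matching_pct * len(fraud_pattern):
        return 1                      # alarm: stop the transaction
    return 1 if fraud > legal else 0  # otherwise the closer match decides

incoming = ["merchant=grocery", "amount=high", "time=night"]
print(classify(incoming,
               legal_pattern=["merchant=grocery", "amount=low", "time=day"],
               fraud_pattern=["amount=high", "time=night"]))
```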
CHAPTER 6
ALGORITHMS
Unsupervised techniques
The unsupervised techniques do not need previous knowledge of fraudulent and normal records. These methods raise an alarm for those transactions that are most dissimilar from the normal ones, and they are often used in the user behaviour approach. ANNs can produce acceptable results for a sufficiently large transaction dataset, but they need a long training phase. The self-organizing map (SOM) is one of the most popular unsupervised neural network learning methods. SOM provides a clustering method which is appropriate for constructing and analysing customer profiles in credit card fraud detection. SOM operates in two phases: training and mapping. In the former phase, the map is built and the weights of the neurons are updated iteratively based on input samples; in the latter, test data are classified automatically into normal and fraudulent classes through the mapping procedure. After training the SOM, new unseen transactions are compared to the normal and fraud clusters: if a transaction is similar to the normal records, it is classified as normal, and new fraud transactions are detected similarly.
One of the advantages of using unsupervised neural networks over similar techniques is that these methods can learn from a data stream. The more data passed to a SOM model, the more the result adapts and improves; more specifically, the SOM adapts its model as time passes. Therefore it can be used and updated online in banks or other financial corporations, and as a result the fraudulent use of a card can be detected quickly and effectively. However, neural networks have some drawbacks and difficulties, mainly related to specifying a suitable architecture on the one hand and the excessive training required to reach the best performance on the other.
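A minimal sketch of SOM training and mapping follows, using the third-party MiniSom library (an assumption; the text does not prescribe a specific implementation) on placeholder transaction features.

```python
# Minimal sketch of SOM-based profiling with the MiniSom library (an assumption).
# After training, a new transaction is mapped to its best-matching unit and can be
# compared with the clusters formed by normal activity.
import numpy as np
from minisom import MiniSom

X = np.random.rand(1000, 4)                 # placeholder transaction features, scaled to [0, 1]

som = MiniSom(10, 10, input_len=4, sigma=1.0, learning_rate=0.5, random_seed=42)
som.random_weights_init(X)
som.train_random(X, num_iteration=5000)     # training phase: weights updated iteratively

new_tx = np.random.rand(4)
print(som.winner(new_tx))                   # mapping phase: coordinates of the winning neuron
```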
Hybrid supervised and unsupervised techniques
In addition to supervised and unsupervised learning models of neural networks, some researchers
have applied hybrid models. John ZhongLei et.Al. proposed hybrid supervised (SICLN) and
unsupervised (ICLN)learning network for credit card fraud detection. They improved the reward only
rule of SICLN model to ICLN in order to update weights according to both reward and penalty. This
improvement appeared in terms of increasing stability and reducing the training time. Moreover, the
number of final clusters of the ICLN is independent from the number of initial network neurons. As a
result the inoperable neurons can be omitted from the clusters by applying the penalty rule. The
results indicated that both the ICLN and the SICLN have high performance, but the SICL outperforms
well-known unsupervised clustering algorithms.
The support vector machine (SVM) takes labelled samples as input; it gives accuracy comparable to sophisticated neural networks with elaborate features in a handwriting recognition task. It is also used in many applications, such as handwriting analysis and face analysis, especially for pattern classification and regression based applications.
The foundations of Support Vector Machines (SVM) were developed by Vapnik, and SVMs have gained popularity due to many promising features such as better empirical performance. The formulation uses the Structural Risk Minimization (SRM) principle, which has been shown to be superior to the traditional Empirical Risk Minimization (ERM) principle used by conventional neural networks. SRM minimizes an upper bound on the expected risk, whereas ERM minimizes the error on the training data. It is this difference which equips SVMs with a greater ability to generalize, which is the goal in statistical learning. SVMs were developed to solve the classification problem, but recently they have been extended to solve regression problems.
Early work with neural networks for supervised and unsupervised learning showed good results for such learning applications. MLPs use feed-forward and recurrent networks. Multilayer perceptron (MLP) properties include universal approximation of continuous nonlinear functions, learning from input-output patterns, and support for advanced network architectures.
These are simple visualizations, just to give an overview of how a neural network looks. Some issues can be noticed: neural networks can have many local minima, and deciding how many neurons are needed for a task is another issue which determines whether the optimum of that NN is reached. Another thing to note is that even if the neural network solutions tend to converge, this may not result in a unique solution [11]. Now let us look at another example where we plot the data and try to classify it; we see that there are many hyperplanes which can classify it, but which one is better? This is where the need for SVM arises. (The legends are not described, as these are sample plots intended only to illustrate the concepts involved.) From the above illustration, there are many linear classifiers (hyperplanes) that separate the data; however, only one of these achieves maximum separation. The reason we need it is that if we use an arbitrary hyperplane to classify, it might end up closer to one set of data points than to the other, and we do not want this to happen; thus the concept of the maximum margin classifier, or maximum margin hyperplane, appears as the solution. The next illustration gives the maximum margin classifier example which provides a solution (Figure 4: Illustration of a linear SVM, taken from Andrew W. Moore's slides, 2003 [2]).
The above illustration shows the maximum margin linear classifier; in this context it is an example of a simple linear SVM classifier. Another interesting question is: why maximum margin? There are some good explanations, including better empirical performance. Another reason is that even if we have made a small error in the location of the boundary, this choice gives us the least chance of causing a misclassification. Further advantages are avoiding local minima and better classification. Now we express the SVM mathematically; in this overview we present a linear SVM. The goals of SVM are to separate the data with a hyperplane and to extend this to non-linear boundaries using the kernel trick. For the SVM, the goal is to correctly classify all the training data, which requires:
[a] If yi = +1, then w · xi + b ≥ 1
[b] If yi = −1, then w · xi + b ≤ −1
[c] For all i: yi (w · xi + b) ≥ 1
In these equations xi is a data point (vector) and w is the weight vector. To separate the data, condition [c] must hold for every point. Among all possible hyperplanes, SVM selects the one for which the distance to the closest points is as large as possible: if the training data are good and every test vector is located within a radius r of a training vector, then a hyperplane located as far as possible from the data will also classify the test vectors correctly. The desired hyperplane which maximizes the margin also bisects the line between the closest points on the convex hulls of the two classes. Thus we have conditions [a], [b] and [c].
The distance of the closest point on the hyperplane to the origin can be found by maximizing x, as x lies on the hyperplane; the points on the other side give a similar expression. Solving and subtracting the two distances gives the summed distance from the separating hyperplane to the nearest points:
Maximum margin: M = 2 / ||w||
Maximizing the margin is the same as minimizing ||w||. We now have a quadratic optimization problem and need to solve for w and b: optimize a quadratic function subject to linear constraints. The solution involves constructing a dual problem in which a Lagrange multiplier αi is associated with each constraint. We need to find w and b such that Φ(w) = ½ wᵀw is minimized, subject to yi (w · xi + b) ≥ 1 for all {(xi, yi)}.
Solving, we obtain w = Σ αi yi xi and b = yk − w · xk for any xk such that αk ≠ 0. The classifying function then has the form f(x) = Σ αi yi (xi · x) + b.
SVM Representation
Here we present the QP formulation for SVM classification; this is a simple representation only.
minimize over α:   (1/2) Σ(i=1..l) Σ(j=1..l) αi αj yi yj K(xi, xj)  −  Σ(i=1..l) αi
subject to:   0 ≤ αi ≤ C for all i,   and   Σ(i=1..l) αi yi = 0
The slack variables ξi measure the error made at point (xi, yi). Training an SVM becomes quite challenging when the number of training points is large, and a number of methods for fast SVM training have been proposed.
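A minimal sketch of an SVM classifier for the fraud/non-fraud task with scikit-learn's SVC and an RBF kernel; the data file and columns are assumptions, and the features are scaled because SVMs are sensitive to feature scale.

```python
# Minimal sketch of an SVM classifier for fraud/non-fraud classification.
# The file name and column layout are illustrative assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

data = pd.read_csv("creditcard.csv")
X, y = data.drop(columns=["Class"]), data["Class"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

scaler = StandardScaler().fit(X_tr)                      # SVMs are sensitive to feature scale
svm = SVC(kernel="rbf", C=1.0, class_weight="balanced")  # C controls the slack penalty
svm.fit(scaler.transform(X_tr), y_tr)
print(svm.score(scaler.transform(X_te), y_te))
```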
Random Forest is a flexible, easy-to-use machine learning algorithm that produces a good result most of the time, even without hyper-parameter tuning. It is also one of the most used algorithms, because of its simplicity and the fact that it can be used for both classification and regression tasks. In this section, you are going to learn how the random forest algorithm works and several other important things about it.
Table of Contents
How it works
Real Life Analogy
Feature Importance
Difference between Decision Trees and Random Forests
Important Hyperparameters (predictive power, speed)
Advantages and Disadvantages
Use Cases
Summary
How it works
Random Forest is a supervised learning algorithm. As you can already see from its name, it creates a forest and makes it somehow random. The "forest" it builds is an ensemble of Decision Trees, most of the time trained with the "bagging" method. The general idea of the bagging method is that a combination of learning models increases the overall result.
One big advantage of random forest is that it can be used for both classification and regression problems, which form the majority of current machine learning systems. I will talk about random forest in classification, since classification is sometimes considered the building block of machine learning. Below you can see how a random forest would look with two trees:
Random Forest has nearly the same hyperparameters as a decision tree or a bagging classifier. Fortunately, you don't have to combine a decision tree with a bagging classifier; you can simply use the classifier class of Random Forest. As already mentioned, with Random Forest you can also deal with regression tasks by using the Random Forest regressor.
Random Forest adds additional randomness to the model while growing the trees. Instead of searching for the most important feature while splitting a node, it searches for the best feature among a random subset of features. This results in a wide diversity that generally leads to a better model. Therefore, in Random Forest, only a random subset of the features is taken into consideration by the algorithm for splitting a node. You can even make trees more random by additionally using random thresholds for each feature rather than searching for the best possible thresholds (as a normal decision tree does). A short sketch of this feature-subset randomness follows.
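The sketch below shows how this randomness is exposed in scikit-learn through max_features, together with the bootstrap sampling used by bagging; the toy data is generated only for illustration.

```python
# Minimal sketch of the extra randomness described above: max_features limits each
# split to a random subset of the features ("sqrt" of the total here).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, weights=[0.95, 0.05],
                           random_state=42)              # imbalanced toy data

rf = RandomForestClassifier(n_estimators=200,
                            max_features="sqrt",          # random feature subset per split
                            bootstrap=True,               # bagging: bootstrap sample per tree
                            random_state=42)
rf.fit(X, y)
print(rf.score(X, y))
```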
Real Life Analogy:
Imagine a guy named Andrew who wants to decide which places he should travel to during a one-year vacation trip. He asks people who know him for advice. First, he goes to a friend, who asks Andrew where he travelled to in the past and whether he liked it or not. Based on the answers, the friend will give Andrew some advice.
This is a typical decision tree algorithm approach: Andrew's friend created rules to guide his decision about what he should recommend, using Andrew's answers. Afterwards, Andrew starts asking more and more of his friends to advise him, and they again ask him different questions from which they can derive some recommendations. Then he chooses the places that were recommended to him the most, which is the typical Random Forest algorithm approach.
Feature Importance:
Another great quality of the random forest algorithm is that it is very easy to measure the relative importance of each feature for the prediction. Sklearn provides a great tool for this, which measures a feature's importance by looking at how much the tree nodes that use that feature reduce impurity across all trees in the forest. It computes this score automatically for each feature after training and scales the results so that the sum of all importances is equal to 1.
.If you don’t know how a decision tree works and if you don’t know what a leaf or node is, here is a
good description from Wikipedia: In a decision tree each internal node represents a “test” on an
attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the
test, and each leaf node represents a class label (decision taken after computing all attributes). A node
that has no children is a leaf.
By looking at the feature importance, you can decide which features you may want to drop because they contribute little or nothing to the prediction process. This is important, because a general rule in machine learning is that the more features you have, the more likely your model is to suffer from overfitting, and vice versa.
As already mentioned, Random Forest is a collection of Decision Trees, but there are some differences.
If you input a training dataset with features and labels into a decision tree, it will formulate a set of rules, which will be used to make the predictions.
For example, if you want to predict whether a person will click on an online advertisement, you could collect the ads the person clicked on in the past and some features that describe his decision. If you put the features and labels into a decision tree, it will generate some rules; then you can predict whether the advertisement will be clicked or not. In comparison, the Random Forest algorithm randomly selects observations and features to build several decision trees and then averages the results.
Another difference is that "deep" decision trees might suffer from overfitting. Random Forest prevents overfitting most of the time by creating random subsets of the features and building smaller trees using these subsets; afterwards, it combines the subtrees. Note that this doesn't work every time, and it also makes the computation slower, depending on how many trees your random forest builds.
Important Hyperparameters:
The hyperparameters in random forest are used either to increase the predictive power of the model or to make the model faster. I will discuss here the hyperparameters of sklearn's built-in random forest function.
random_state makes the model's output replicable. The model will always produce the same results when it has a definite value of random_state and is given the same hyperparameters and the same training data.
Lastly, there is the oob_score (also called oob sampling), which is a random forest cross-validation method. In this sampling, about one third of the data is not used to train a given tree and can instead be used to evaluate its performance. These samples are called out-of-bag samples. It is very similar to the leave-one-out cross-validation method, but almost no additional computational burden goes along with it.
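A minimal sketch of these two hyperparameters in scikit-learn, on generated toy data:

```python
# Minimal sketch of the two hyperparameters discussed above: random_state makes the
# result reproducible and oob_score evaluates the model on out-of-bag samples.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

rf = RandomForestClassifier(n_estimators=100,
                            oob_score=True,     # use out-of-bag samples for validation
                            random_state=42)    # same output on every run
rf.fit(X, y)
print(rf.oob_score_)                            # accuracy estimated without a separate test set
```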
As already mentioned, an advantage of random forest is that it can be used for both regression and classification tasks and that it is easy to view the relative importance it assigns to the input features. Random Forest is also considered a very handy and easy-to-use algorithm, because its default hyperparameters often produce a good prediction result. The number of hyperparameters is also not that high, and they are straightforward to understand.
One of the big problems in machine learning is overfitting, but most of the time this won't happen that easily to a random forest classifier: if there are enough trees in the forest, the classifier won't overfit the model.
The main limitation of Random Forest is that a large number of trees can make the algorithm too slow and ineffective for real-time predictions. In general, these algorithms are fast to train, but quite slow
to create predictions once they are trained. A more accurate prediction requires more trees, which
results in a slower model. In most real-world applications the random forest algorithm is fast enough,
but there can certainly be situations where run-time performance is important and other approaches
would be preferred.
And of course Random Forest is a predictive modelling tool and not a descriptive tool. That means, if
you are looking for a description of the relationships in your data, other approaches would be
preferred.
Use Cases:
The random forest algorithm is used in a lot of different fields, like banking, the stock market, medicine and e-commerce. In banking it is used, for example, to detect customers who will use the bank's services more frequently than others and repay their debt on time. In this domain it is also used to detect fraudulent customers who want to scam the bank. In finance, it is used to determine a stock's behaviour in the future. In the healthcare domain it is used to identify the correct combination of components in medicine and to analyse a patient's medical history to identify diseases. And lastly, in e-commerce random forest is used to determine whether a customer will actually like the product or not.
Summary:
Random Forest is a great algorithm to train early in the model development process to see how it performs, and it's hard to build a "bad" Random Forest because of its simplicity. This algorithm is also a great choice if you need to develop a model in a short period of time. On top of that, it provides a pretty good indicator of the importance it assigns to your features.
Random Forests are also very hard to beat in terms of performance. Of course you can probably always find a model that performs better, like a neural network, but these usually take much more time to develop. And on top of that, random forests can handle a lot of different feature types, like binary, categorical and numerical.
Overall, Random Forest is a (mostly) fast, simple and flexible tool, although it has its limitations.
CHAPTER 7
MODULES
Data Cleansing:
When going through our data cleaning process, it's best to perform all of our cleaning in a coarse-to-fine style: start with the biggest, most glaring issues and work your way down to the nitty-gritty details. Based on this approach, the first thing we'll do is remove any unrelated or irrelevant features. Perform a very quick exploration of your dataset to determine which features aren't highly correlated with the output you want to predict. You can do this in a few ways (a short sketch follows this list):
Perform a correlation analysis of the feature variables.
Check how many rows each feature variable is missing. If a variable is missing 90% of its data points then it's probably wise to just drop it altogether.
Consider the nature of the variable itself. Is it actually useful, from a practical point of view, to be using this feature variable? Only drop it if you're quite sure it won't be helpful.
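A minimal sketch of this quick exploration with pandas, assuming a "transactions.csv" file with a "Class" target column:

```python
# Minimal sketch of the quick exploration described above: correlation with the
# target and the fraction of missing values per feature (file and columns assumed).
import pandas as pd

data = pd.read_csv("transactions.csv")

# Correlation of each numeric feature with the target class
print(data.corr(numeric_only=True)["Class"].sort_values(ascending=False))

# Fraction of missing values per column; features missing ~90% can usually be dropped
print(data.isna().mean().sort_values(ascending=False))
```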
Handling missing values
We’ve already dropped the feature variables with a high percentage of missing values. Now we want
to handle those feature variables that we do actually need but also have missing values. Again we
have a few options:
Fill in the missing rows with an arbitrary value
Fill in the missing rows with a value computed from the data’s
statistics Ignore missing rows
The first one can be done if you know what a good default value should be. But if you can compute a value from some kind of statistical analysis, that is often highly preferred, since it at least has some support from the data. The last option can be taken if we have a large enough dataset to afford throwing away some of the rows. However, before you do this, be sure to take a quick look at the data to be sure that those data points aren't critically important.
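A minimal sketch of the three options with pandas; the column names are assumptions:

```python
# Minimal sketch of the three options above: fill with a default value, fill with a
# statistic computed from the data, or drop the remaining incomplete rows.
import pandas as pd

data = pd.read_csv("transactions.csv")

data["merchant_type"] = data["merchant_type"].fillna("unknown")     # arbitrary default
data["amount"] = data["amount"].fillna(data["amount"].median())     # value from statistics
data = data.dropna()                                                # ignore remaining rows
```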
Formatting the data
When datasets are collected, the data will often be entered by human users as plain text. This can cause complications with the data format. For example, there are many ways to enter the name of the state of California: CA, C.A, California, Cali; these all need to be standardised into one uniform format. In addition, there may be cases where the data is continuous and we want to make it discrete, or vice versa. The typical steps are therefore (a brief sketch follows this list):
Standardising data formats, including acronyms, capitalisation, and style.
Discretising continuous data, or vice versa.
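A minimal sketch of both steps with pandas, following the California example; the mapping and the amount bins are assumptions:

```python
# Minimal sketch of standardising free-text values and discretising a continuous
# column (the mapping and the bins are illustrative assumptions).
import pandas as pd

data = pd.DataFrame({"state": ["CA", "C.A", "California", "Cali"],
                     "amount": [12.0, 250.0, 999.0, 15000.0]})

data["state"] = data["state"].str.replace(".", "", regex=False).str.upper()
data["state"] = data["state"].replace({"CALIFORNIA": "CA", "CALI": "CA"})

# Discretise the continuous amount into categories
data["amount_band"] = pd.cut(data["amount"], bins=[0, 100, 1000, float("inf")],
                             labels=["low", "medium", "high"])
print(data)
```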
Training:
In general, a learning problem considers a set of n samples of data and then tries to predict properties
of unknown data. If each sample is more than a single number and, for instance, a multi-dimensional
entry (aka multivariate data), it is said to have several attributes or features.
When evaluating different settings (“hyperparameters”) for estimators, such as the C setting that must
be manually set for an SVM, there is still a risk of overfitting on the test set because the parameters
can be tweaked until the estimator performs optimally. This way, knowledge about the test set can
“leak” into the model and evaluation metrics no longer report on generalization performance. To
solve this problem, yet another part of the dataset can be held out as a so-called “validation set”:
training proceeds on the training set, after which evaluation is done on the validation set, and when
the experiment seems to be successful, final evaluation can be done on the test set.
However, by partitioning the available data into three sets, we drastically reduce the number of
samples which can be used for learning the model, and the results can depend on a particular random
choice for the pair of (train, validation) sets.
A solution to this problem is a procedure called cross-validation (CV for short). A test set should still
be held out for final evaluation, but the validation set is no longer needed when doing CV. In the
basic approach, called k-fold CV, the training set is split into k smaller sets (other approaches are
described below, but generally follow the same principles). The following procedure is followed for
each of the k “folds”:
A model is trained using k−1 of the folds as training data;
the resulting model is validated on the remaining part of the data (i.e., it is used as a test set to compute a performance measure such as accuracy).
The performance measure reported by k-fold cross-validation is then the average of the values computed in the loop. This approach can be computationally expensive, but does not waste too much data (as is the case when fixing an arbitrary validation set), which is a major advantage in problems such as inverse inference where the number of samples is very small.
The cross_validate function returns a dictionary containing fit times, score times (and optionally training scores as well as fitted estimators) in addition to the test scores.
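A minimal sketch of k-fold cross-validation with scikit-learn's cross_validate on generated toy data:

```python
# Minimal sketch of k-fold cross-validation with cross_validate, which returns the
# fit times, score times and test scores mentioned above.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=42)

results = cross_validate(clf, X, y, cv=5, scoring="recall",
                         return_train_score=True)
print(results["test_score"].mean())        # average of the k fold scores
print(results.keys())                      # fit_time, score_time, train/test scores
```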
Step 1 — Importing Scikit-learn
Let’s begin by installing the Python module Scikit-learn, one of the best and most documented
machine learning libraries for Python.
To begin our coding project, let’s activate our Python 3 programming environment.
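A minimal check that the library is available in the active environment (it can be installed with conda or pip if missing):

```python
# Verify that Scikit-learn is importable in the active Python 3 environment.
# If this fails, install it first, e.g. `conda install scikit-learn` or
# `pip install scikit-learn`.
import sklearn
print(sklearn.__version__)
```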
BIBLIOGRAPHY
[1] P. Richhariya and P. K. Singh, "Evaluating and emerging payment card fraud challenges and resolution," Int. J. Comput. Appl., vol. 107, no. 14, pp. 5–10, Jan. 2014.
[2] S. Bhattacharyya, S. Jha, K. Tharakunnel, and J. C. Westland, "Data mining for credit card fraud: A comparative study," Decis. Support Syst., vol. 50, no. 3, pp. 602–613, 2011.
[3] A. Dal Pozzolo, O. Caelen, Y.-A. L. Borgne, S. Waterschoot, and G. Bontempi, "Learned lessons in credit card fraud detection from a practitioner perspective," Expert Syst. Appl., vol. 41, no. 10, pp. 4915–4928, 2014.
[4] C. Phua, D. Alahakoon, and V. Lee, "Minority report in fraud detection: Classification of skewed data," ACM SIGKDD Explorations Newslett., vol. 6, no. 1, pp. 50–59, 2004.
[5] Z.-H. Zhou and X.-Y. Liu, "Training cost-sensitive neural networks with methods addressing the class imbalance problem," IEEE Trans. Knowl. Data Eng., vol. 18, no. 1, pp. 63–77, Jan. 2006.
[6] S. Ertekin, J. Huang, and C. L. Giles, "Active learning for class imbalance problem," in Proc. 30th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retr., 2007, pp. 823–824.
[7] M. Wasikowski and X. Chen, "Combating the small sample class imbalance problem using feature selection," IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1388–1400, Oct. 2010.
[8] S. Wang and X. Yao, "Multiclass imbalance problems: Analysis and potential solutions," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 42, no. 4, pp. 1119–1130, Aug. 2012.
[9] R. J. Bolton and D. J. Hand, "Statistical fraud detection: A review," Stat. Sci., vol. 17, no. 3, pp. 235–249, Aug. 2002.
[10] D. J. Weston, D. J. Hand, N. M. Adams, C. Whitrow, and P. Juszczak, "Plastic card fraud detection using peer group analysis," Adv. Data Anal. Classification, vol. 2, no. 1, pp. 45–62, 2008.
[11] E. Duman and M. H. Ozcelik, "Detecting credit card fraud by genetic algorithm and scatter search," Expert Syst. Appl., vol. 38, no. 10, Sep. 2011.