Paper 65-Fraud Detection in Credit Cards
Abstract—Due to the increasing number of customers as well as the increasing number of companies that use credit cards for completing financial transactions, the number of fraud cases has increased dramatically. Dealing with noisy and imbalanced data, as well as with outliers, has accentuated this problem. In this work, fraud detection using artificial intelligence is proposed. The proposed system uses logistic regression to build the classifier to prevent fraud in credit card transactions. To handle dirty data and to ensure a high degree of detection accuracy, a pre-processing step is used. The pre-processing step uses two novel main methods to clean the data: the mean-based method and the clustering-based method. Compared to two well-known classifiers, the support vector machine classifier and voting classifier, the proposed classifier shows better results in terms of accuracy, sensitivity, and error rate.

Keywords—Classifier; logistic regression; accuracy; smoothing; artificial intelligence; cross validation

I. INTRODUCTION
According to the definition of fraud [1], the aim of fraud is to achieve personal or financial gain through deception. Based on this, fraud detection and prevention are the two significant methods for avoiding loss due to fraud. Fraud prevention is the proactive technique for avoiding the occurrence of fraudulent acts, and fraud detection is the technique for the detection of fraudulent transactions by fraudsters [2]. A variety of payment cards, including credit, charge, debit, and prepaid cards, are currently widely available. They are the most popular means of payment in some countries [3]. Indeed, advances in digital technologies have paved the way for changes in how we handle money, especially for payment methods that have changed from being a physical activity to a digital activity using electronic means [4]. This has revolutionized the landscape of monetary policy, including the business strategies and operations of both large and small companies. Credit card fraud is the fraudulent use of credit card details to buy a product or service. These transactions can be physically or digitally performed [5]. In physical transactions, the credit card is physically present. On the other hand, digital transactions take place over the internet or telephone. A cardholder normally provides their card number, card verification number, and expiration date through a website or telephone call. With the rapid rise in e-commerce over the past few years, credit card use has increased tremendously [1,3].

In Malaysia, the number of transactions performed through credit cards in 2011 was approximately 317 million, and this number increased to 447 million in 2018 [4]. In 2015, global credit card fraud reached a record of $21.84 billion, as reported by [2]. The number of fraud cases has been rising with the increased use of credit cards. While various verification methods have been implemented, the number of fraud cases involving credit cards has not decreased significantly [6]. The potential for substantial monetary gains, combined with the ever-changing nature of financial services, creates a wide range of opportunities for fraudsters [7]. Funds from payment card fraud are often used in criminal activities that are hard to prevent, e.g., to support terrorist acts [8]. The internet is where fraudsters prefer to be because they are able to conceal their location and identity. The recent increase in credit card fraud has hit the financial sector hard. Losses due to credit card fraud mainly impact merchants because they bear all expenses, including the fees from their card issuer, administrative fees and other charges [9]. All the losses are borne by the merchants, leading to increases in the prices of goods and decreases in discounts. Hence, reducing this loss is highly important. An effective fraud detection system is required to minimize the number of cases of fraud.

A. Motivation
The use of credit cards to perform financial transactions at banks or other institutions is a common action in light of the currently available technology. Online payments (or any other online transactions) bring benefits to companies and individuals in terms of the convenience, velocity, and flexibility of performing daily duties [10,11]. The work in [12] presented a statistical analysis related to the usage of credit cards over five years (from 2006 to 2010). This reflected the huge dependency on credit cards by both people and organizations. To take advantage of advanced technologies, companies try to use advanced techniques to provide high-quality services to customers. Automation can be seen as the best solution for attracting more customers and consequently collecting more financial gain [13]. The process of converting a manual system to a fully automatic one, as found in smart cities, is not without risk.

B. Problem Statement
According to [14], it is estimated that 10,000 transactions take place via credit cards every second worldwide. Owing to such a high transaction frequency, credit cards have become the primary targets of fraud. Indeed, since the Diners Club
www.ijacsa.thesai.org
(IJACSA) International Journal of Advanced Computer Science and Applications,
Vol. 11, No. 12, 2020
II. RELATED WORK
This section first provides a brief background about the research domain. Then, the related work is presented in detail.

A. Background
The background refers to the credit card research field in terms of the intersection of multiple research sectors. This field can be viewed as the intersection of four main domains, as illustrated in Fig. 2.

The definitions of the domains and terms that are applied in this study are listed below.

Artificial Intelligence (AI): It can be defined as the science that addresses the methods used for training machines to mimic the brains of humans. In other words, machines can be used to make decisions on behalf of human users. In this context, data mining tasks, such as classification, clustering, applying association rules, and using neural networks, are employed [2].

Financial Systems: These can be defined as the systems that are used to convert manual transactions into digital transactions. In this context, the term “transaction” denotes any financial activity that may be performed by a user based on a specific system [27].

Chip Industry: This term refers to the manufacturing of chips to store critical information on the card of the user. The information acts as a key to trigger any transaction. However, the chip is programmed to match some passwords to allow access to financial interfaces [28].

Internet of Things (IoT): It can be defined as a collection of devices connected via a network. The devices vary from small devices with low processing power (such as watches) to large devices with high processing power, such as mobile devices. Using IoT devices to perform financial transactions is vital, especially in light of the goal of shifting toward smart cities [29].

B. Groups of AI-based Techniques
Artificial intelligence (AI) is defined as enabling machines to make decisions on behalf of human users. In this context, data mining tasks, such as classification, clustering, applying association rules, and using neural networks, are employed [2]. In addition, AI is employed to build systems for fraud detection, such as classification-based systems [19,6,7,8], clustering-based systems [17,20,21], neural network-based systems [18,22,23] and support vector machine-based systems [9].

The techniques employed to construct credit card fraud detection systems using AI can be categorized into four main groups. This idea is shown in Fig. 3.

1) Classification-based systems: The authors in [19] tried to achieve two main objectives in their work: (1) enhancing the accuracy of the classifications output by credit card fraud detection systems and (2) lowering the response times of these systems. To achieve the first goal, the authors proposed a hybrid model that fuses two classifiers to generate a new (or enhanced) one. The first classifier used is the K-means classifier, which deals with overlapping data because such data cause poor accuracy. The second classifier is the artificial bee colony algorithm (ABC), which is used to enhance the performance of the system. The first classifier forms the first level, and the second classifier forms the second level of the classification process proposed in the same model. The database used in this work was generated by using the C# programming language, where the number of instances was 100,000. In addition, 12 features were selected to include in the training phase. The selected features were based on a rule engine.

Moreover, previous systems suffered from problems in real-time environments [6]. In the context of credit card fraud detection, such problems include imbalanced data, noisy data, and concept drift. The authors applied the bag creation technique to solve the data problems; this technique involves performing the sampling process on the collected data in real time. To clean the data, they applied naïve Bayes networks for the effective handling of noisy data. An incremental learning-based method was presented to address concept drift. The data set used in this work is summarized in Table I.

[TABLE I. Columns: Start day | End day | Instances | % Fraudulent Transactions]

The strength of this study is the enhancement in performance achieved by using Spark to implement the system in parallel. In addition, the reduction in cost is considered an important feature of this system, and this was achieved by employing naïve Bayes networks in the process of classification. The weakness of the proposed system is that it does not handle cyclic recurrences that may be included in concept drift. Cyclic recurrences refer to cyclic repetitions in the distributions of data.
The authors in [7] evaluated the current fraud detection system with regard to credit card transactions. The problem is that there are two stages for automatic classification: real-time (RT) and near-real-time (NRT). They focused on the NRT stage by using a rule-based classification technique that considers the final evaluation of the human element of fraud. The authors did not improve the design of the system, discover any new rules, or improve the arithmetic efficiency of individual rules. Instead, they manipulated the rules to form a decision-making system to improve both the accuracy and the performance. The key idea is to calculate the contribution of each rule involved in the system. Calculating the contribution of a rule depends on the difference between two values, which are (1) the performance of the system when the rule is used and (2) the performance of the system without using the rule. The degree of performance improvement is high if the rule is not redundant and is low if it is redundant with other rules or rule groups. For the measurement of performance, the precision, recall and F-score metrics were employed. A real database, which consists of 359,862 records provided by some industrial partners, was used for the training phase.

The authors in [8] addressed credit card fraud detection. In this study, the authors relied on the fact that "the features of the financial transactions in institutions change over time". This shows that the problem of credit card fraud detection should be considered in real time. Therefore, they converted this problem into real working transactions. In terms of artificial intelligence, the class should not be provided to the classifier immediately during the training stage. The key idea of the proposed approach is to follow a strict strategy that has three main steps: (1) analysing the real conditions under which the real transactions are performed; (2) employing these conditions to train the classifier using two main data sets; and (3) testing the classifier after the training stage is completed and supporting it by using the feedback of the users (their interactions) to improve the accuracy of the classifier. Table II summarizes the dataset used.

TABLE II. DETAILS OF THE DATASET USED [8]
Id          End day      Instances     Features   %Fraudulent Transactions
2013        2014-01-18   21'830'330    51         0.19%
2014-2015   2015-05-31   54'764'384    51         0.24%

2) Clustering-based systems: To address the problem of detecting credit card fraud through transactions, the authors in [17] dealt with the problem of online shopping fraud and concept drift. They proposed a strategy consisting of four stages: (1) based on both the previous transaction data and the information of the cardholders, they used the clustering method to divide the cardholders into different groups for the purpose of comparing their behaviours; (2) they proposed a sliding window strategy to group the transactions in each group to extract the behavioural patterns for each cardholder; (3) they trained a set of classifiers for each group to measure behavioural patterns; and (4) they used a group of classifiers by training them on cardholder behaviours and output the highest behaviour pattern. A feedback mechanism was used to solve the concept drift problem. Four dataset simulators were generated to manually create the data sets.

The authors in [20] proposed a clustering-based method. This study addresses the fraud detection problem in e-commerce, where systems may be exploited by highly skilled hackers. The methods proposed to address such problems suffer from low accuracy and effectiveness. In addition, the methods used for detecting fraud may make some mistakes in identifying fraudulent transactions. The reason behind such shortcomings is that the proposed approaches focus on order analysis rather than anything else. Motivated by these facts, the authors proposed a method that focuses on the hackers themselves. The key idea is to extract some recognized features, such as the address of delivery, customer name, and methods of payment, and then, based on these features, the similarity among the attackers is calculated. Based on these similarities, the attackers are grouped in some clusters for detection. A main feature of their proposed method is that two current methods, agglomerative clustering and sampling, are selectively used in a reasonable amount of time for recursively grouping orders into small clusters. The dataset used for the training process was inspired by the Zalando website. This website periodically receives approximately 29 million orders (some of them are normal and others are fraudulent).

The authors in [21] tried to evaluate the detection problem by extracting the general pattern of the dataset to represent the fraud. In other words, the enhancement of the clustering methods relies only on the clusters used; this technique is called general enhancement. The authors proposed an approach that enables the application of local enhancement as well as general enhancement for fraud detection in financial transactions. They proposed the “Hierarchical Clusters-based Deep Neural Networks (HC-DNN)” method that uses the anomalous features of hierarchical clusters that are pretrained based on an autoencoder as the initial weights for neural networks. In detail, the data are grouped based on abnormal features that refer to fraud. These features are then used as the initial weights for the input layers of neural networks, as shown in Fig. 4.

Fig. 4. Key Idea of the HC-DNN Method [21].
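The rule-contribution measure described above for [7] (performance with the rule minus performance without it) can be illustrated with a minimal sketch. The rules, the transactions, and the assumption that the system raises an alert when any rule fires are all hypothetical and chosen only for illustration; they are not taken from [7].

```python
from typing import Callable, Dict, List, Sequence

Transaction = Dict[str, float]
Rule = Callable[[Transaction], bool]


def f_score(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """F1 score (harmonic mean of precision and recall); 0.0 if no true positives."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def flag(rules: Sequence[Rule], tx: Transaction) -> bool:
    # Assumption: the rule system raises an alert if ANY rule fires.
    return any(rule(tx) for rule in rules)


def rule_contributions(rules: Dict[str, Rule],
                       transactions: List[Transaction],
                       labels: List[bool]) -> Dict[str, float]:
    """Contribution of a rule = F-score with all rules - F-score without that rule."""
    baseline = f_score([flag(list(rules.values()), tx) for tx in transactions], labels)
    result = {}
    for name in rules:
        others = [r for n, r in rules.items() if n != name]
        without = f_score([flag(others, tx) for tx in transactions], labels)
        result[name] = baseline - without
    return result


# Hypothetical rules and labelled transactions (True = fraudulent).
rules = {
    "high_amount": lambda tx: tx["amount"] > 1000,
    "very_high_amount": lambda tx: tx["amount"] > 5000,  # redundant with high_amount
    "night_time": lambda tx: tx["hour"] < 5,
}
transactions = [
    {"amount": 6000, "hour": 14},
    {"amount": 1500, "hour": 13},
    {"amount": 40, "hour": 3},
    {"amount": 20, "hour": 12},
    {"amount": 1200, "hour": 10},
    {"amount": 30, "hour": 15},
]
labels = [True, True, True, False, False, False]

contributions = rule_contributions(rules, transactions, labels)
for name, value in contributions.items():
    print(f"{name}: {value:+.3f}")
```

In this toy data the redundant rule contributes nothing, because every transaction it flags is already flagged by another rule, which matches the observation that redundant rules yield a low degree of improvement.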
The authors used a dataset containing 19,505 records, including fraudulent and non-fraudulent records. The dataset is skewed and consists of 19,313 non-fraudulent and 192 fraudulent cases. Some preprocessing steps were performed on the data to mitigate the negative impact of the imbalanced data before using them for actual training.

3) Neural network-based systems: The authors in [18] discussed issues related to increasing fraud detection in online shopping transactions and payments, especially those related to credit cards. To detect credit card fraud, they proposed a neural network-based system. It uses backpropagation to enhance the output of the neural network so that the error (the difference between the actual or desirable value and the output of the neural network) is distributed back by adjusting the weights of the inputs. The strategy followed in this work can be summarized through the following steps:
a) A new Neuroph Project was created in Neuroph Studio using the Java programming language.
b) The actual perceptron network was constructed.
c) The training data set was prepared.
d) The training process was started by considering the desired value (the accuracy of fraud detection) set by an expert in the field.
e) The trained network was tested.

The data used for training were collected from a data mining blog. They include 20,000 active credit card holders with transactions spanning more than six months.

The authors in [22] proposed a “Convolutional Neural Network (CNN)” in their work. Similar to previous works, the problem studied was how to detect a pattern that represents fraudulent transactions. In their method, the CNN forms a classifier that takes features of the transactions as inputs. The features are extracted from each transaction and stored in a feature matrix. The classifier has the ability to deal with imbalanced data based on the sampling technique. The key idea behind the sampling technique is to use higher than normal costs to generate fraudulent transactions. Fig. 5 illustrates the general scenario of the CNN model.

Fig. 5. General Scenario of the Fraud Detection System Proposed in the Work in [22].

The data used include more than 260 million credit card transactions in one year. Approximately four thousand transfers are listed as fraudulent, and the remainder are legal.

A hybrid fraud detection system was proposed in [23]. The key idea is to use neural networks as classifiers. Since the network needs to update the weights of the input layer, a swarm optimization method was employed for this purpose. Finally, the model was tested and evaluated. Fig. 6 illustrates the general structure of the proposed system, which is called the “Particle Swarm Optimization Auto-associative Neural Network (PSOAANN)”.

Fig. 6. Structure of the PSOAANN-based System [23].

4) Support vector machine-based systems: The authors in [9] used a support vector machine (SVM) to improve the accuracy of the classifier in the process of detecting fraud in credit card transactions. The key idea behind using an SVM is to split the features that represent transactions, where these features are used for the clustering process. In other words, the data are cleaned initially. Then, the features of transactions are extracted. Third, the features are measured to calculate the similarity among them. To isolate the features as much as possible, the SVM is used. Fourth, the K-means clustering algorithm is used to cluster the data based on the isolated (i.e., as far as possible) features. The classifier is then trained on the clusters. The classifier that deals with fraudulent transactions is used to detect fraud. The database used for training contains 5310 records in total. Among them, 490 records are fraudulent data and 4820 are non-fraudulent data, and 1174 characteristic variables are included.
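The record counts quoted above for the datasets in [21] and [9] make the class imbalance concrete. The short sketch below uses only those reported counts and computes the share of fraudulent records, together with the accuracy that a trivial classifier labelling every record as non-fraudulent would reach. The helper names are illustrative and not from the cited works.

```python
# (total records, fraudulent records) as reported in the text for [21] and [9].
datasets = {
    "[21]": (19505, 192),
    "[9]": (5310, 490),
}


def fraud_share(total: int, fraud: int) -> float:
    """Fraction of records that are fraudulent."""
    return fraud / total


def majority_baseline_accuracy(total: int, fraud: int) -> float:
    """Accuracy of a classifier that always predicts 'non-fraudulent'."""
    return (total - fraud) / total


summary = {
    name: {
        "fraud_share": fraud_share(total, fraud),
        "baseline_accuracy": majority_baseline_accuracy(total, fraud),
    }
    for name, (total, fraud) in datasets.items()
}

for name, stats in summary.items():
    print(f"{name}: fraud share {stats['fraud_share']:.2%}, "
          f"always-legal accuracy {stats['baseline_accuracy']:.2%}")
```

Because such a trivial classifier detects no fraud at all yet still scores a high accuracy on skewed data, accuracy is usually read together with sensitivity when evaluating fraud detectors.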
Fig. 11. Data Exploration based on the Two Main Classes of the Data.
B. Data Cleaning
The goal of this step is to clean the data and prepare it for the training phase of the classifier. In general, data in reality are noisy. Therefore, a cleaning step is necessary. In the context of the data cleaning process, the procedure is as follows:
Fig. 7. Flow Chart of the Proposed Approach.
1) Fill in the missing values. A missing value means that a cell of a given record is empty due to a mistake during entry.
2) Solve any inconsistencies. This means that if there is a
collision in the data, this collision must be resolved.
3) Remove any outliers. Outliers refer to abnormal values
(i.e., very high values or very low values).
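The three cleaning steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: missing values are filled with the column mean, a "collision" is taken to mean two records sharing the same `id` (the first is kept), and outliers are removed with the common 1.5 × IQR rule. The field names `id` and `amount` and the sample records are hypothetical.

```python
from statistics import mean, quantiles
from typing import Any, Dict, List

Record = Dict[str, Any]


def fill_missing(records: List[Record], field: str) -> List[Record]:
    """Step 1: replace missing (None) values with the mean of the observed values."""
    fill = mean(r[field] for r in records if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]


def resolve_collisions(records: List[Record], key: str) -> List[Record]:
    """Step 2 (assumption): records sharing a key collide; keep the first one seen."""
    first_seen: Dict[Any, Record] = {}
    for r in records:
        first_seen.setdefault(r[key], r)
    return list(first_seen.values())


def remove_outliers(records: List[Record], field: str, k: float = 1.5) -> List[Record]:
    """Step 3 (assumption): drop values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles([r[field] for r in records], n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in records if low <= r[field] <= high]


# Hypothetical raw records: one missing amount, one colliding id, one extreme value.
raw = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": 12.0},
    {"id": 3, "amount": None},
    {"id": 4, "amount": 11.0},
    {"id": 4, "amount": 14.0},
    {"id": 5, "amount": 13.0},
    {"id": 6, "amount": 9.0},
    {"id": 7, "amount": 60.0},
]

cleaned = remove_outliers(
    resolve_collisions(fill_missing(raw, "amount"), "id"), "amount"
)
for record in cleaned:
    print(record)
```

The steps are applied in the order listed in the text. Note that a mean-based fill is itself pulled toward extreme values when it is computed before outliers are removed.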
Fig. 8. Loading the used Dataset.
The logistic regression equation can be obtained from the linear regression equation. The mathematical steps to obtain the logistic regression equation are given below.

The equation of the straight line can be written as:

$$y = a_0 + a_1 \times x_1 + a_2 \times x_2 + \cdots + a_k \times x_k \tag{1}$$

In logistic regression, y can be between 0 and 1 only, so we divide y by (1 − y):

$$\frac{y}{1-y}, \quad \text{which is } 0 \text{ for } y = 0 \text{ and } \infty \text{ for } y = 1 \tag{2}$$

As a result, the logistic regression equation is defined as:

$$\log\left[\frac{y}{1-y}\right] = a_0 + a_1 \times x_1 + a_2 \times x_2 + \cdots + a_k \times x_k \tag{3}$$

Fig. 17. A Visual Comparison between Linear and Logistic Regression [31].

F. Evaluating the Classifier
In general, a confusion matrix is an effective benchmark for analysing how well a classifier can recognize records of different classes [34]. The confusion matrix is formed based on the following terms:
1) True positives (TP): positive records that are correctly labelled by the classifier.
2) True negatives (TN): negative records that are correctly labelled by the classifier.
3) False positives (FP): negative records that are incorrectly labelled positive.
4) False negatives (FN): positive records that are mislabelled negative.

Table III shows the confusion matrix in terms of the TP, FN, FP, and TN values.

Relying on the confusion matrix, the accuracy, sensitivity, and error rate metrics are derived. For a given classifier, the accuracy can be calculated by considering the recognition rate, which is the percentage of records in the test set that are correctly classified (fraudulent or non-fraudulent). The accuracy is defined as:

$$\text{Accuracy} = \frac{TP + TN}{\text{number of all records in the testing set}} \tag{5}$$
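As a worked illustration of Eq. (3) and Eq. (5), the sketch below solves Eq. (3) for y, which gives the logistic function y = 1 / (1 + e^(−z)) with z = a_0 + a_1 × x_1 + ⋯ + a_k × x_k, labels a record as positive when y ≥ 0.5, and then derives the metrics from the confusion-matrix counts. The coefficients, the test records, and the 0.5 threshold are hypothetical. Sensitivity and error rate use their standard definitions, TP / (TP + FN) and (FP + FN) / all records, which are assumed here because only the accuracy formula appears in this excerpt.

```python
import math
from typing import Sequence, Tuple


def predict_probability(coefficients: Sequence[float], features: Sequence[float]) -> float:
    """Solve Eq. (3) for y: y = 1 / (1 + exp(-(a0 + a1*x1 + ... + ak*xk)))."""
    a0, *weights = coefficients
    z = a0 + sum(a * x for a, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def confusion_counts(predicted: Sequence[int], actual: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN), where 1 = positive (fraudulent) and 0 = negative."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)
    return tp, tn, fp, fn


# Hypothetical model with one feature: a0 = -4.0, a1 = 2.0.
coefficients = (-4.0, 2.0)
test_features = [[0.5], [1.0], [3.0], [2.5], [1.5], [3.5]]
test_labels = [0, 0, 1, 1, 1, 0]

# Assumption: a record is labelled positive when the predicted probability is >= 0.5.
predictions = [1 if predict_probability(coefficients, x) >= 0.5 else 0 for x in test_features]

tp, tn, fp, fn = confusion_counts(predictions, test_labels)
total = tp + tn + fp + fn
accuracy = (tp + tn) / total       # Eq. (5)
sensitivity = tp / (tp + fn)       # standard definition; needs at least one positive record
error_rate = (fp + fn) / total     # standard definition; equals 1 - accuracy

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print(f"accuracy={accuracy:.3f} sensitivity={sensitivity:.3f} error rate={error_rate:.3f}")
```

Taking the logarithm of y / (1 − y) for any predicted probability recovers the linear term z, which is the relationship Eq. (3) states.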
[21] Kim, Jeongrae, Han-Joon Kim, and Hyoungrae Kim. "Fraud detection for job placement using hierarchical clusters-based deep neural networks." Applied Intelligence 49.8 (2019): 2842-2861.
[22] Fu, Kang, et al. "Credit card fraud detection using convolutional neural networks." International Conference on Neural Information Processing. Springer, Cham, 2016.
[23] Kamaruddin, Sk, and Vadlamani Ravi. "Credit card fraud detection using big data analytics: use of PSOAANN based one-class classification." Proceedings of the International Conference on Informatics and Analytics. 2016.
[24] Arun, C., and C. Lakshmi. "Class Imbalance in Software Fault Prediction Data Set." Artificial Intelligence and Evolutionary Computations in Engineering Systems. Springer, Singapore, 2020. 745-757.
[25] Thabtah, Fadi, et al. "Data imbalance in classification: Experimental evaluation." Information Sciences 513 (2020): 429-441.
[26] Maung, Ei Thinzar Win. Comparison of Data Mining Classification Algorithms: C5.0 and CART for Car Evaluation and Credit Card Information Datasets. Diss. University of Computer Studies, Yangon, 2020.
[27] Rike, James B. "Cylinder support system." U.S. Patent Application No. 29/641,843.
[28] Freund, Peter C. "Method and system for performing purchase and other transactions using tokens with multiple chips." U.S. Patent No. 10,282,536. 7 May 2019.
[29] Benamar, Lamya, Christine Balagué, and Zeling Zhong. "Internet of Things devices appropriation process: the Dynamic Interactions Value Appropriation (DIVA) framework." Technovation 89 (2020): 102082.
[30] Kaggle, website (2020). Available: https://www.kaggle.com/mlg-ulb/creditcardfraud (access 22 July 2020).
[31] DataCamp, website (2020). Available: https://www.datacamp.com/community/tutorials/understanding-logistic-regression-python (access 28 July 2020).
[32] Salillari, Denisa, and Luela Prifti. "Comparison Study of Logistic Regression Model for Albanian Texts." Journal of Advances in Mathematics 12.9 (2016): 6572-6575.
[33] javatpoint, website (2020). Available: https://www.javatpoint.com/logistic-regression-in-machine-learning (access 22 July 2020).
[34] Mona Alfifi, Mohamad Shady Alrahhal, Samir Bataineh and Mohammad Mezher, “Enhanced Artificial Intelligence System for Diagnosing and Predicting Breast Cancer using Deep Learning” International Journal of Advanced Computer Science and Applications (IJACSA), 11(7), 2020. http://dx.doi.org/10.14569/IJACSA.2020.0110763
[35] Alrahhal, Mohamad Shady, Maher Khemakhem, and Kamal Jambi. "Agent-Based System for Efficient kNN Query Processing with Comprehensive Privacy Protection." International Journal of Advanced Computer Science and Applications 9.1 (2018): 52-66.
[36] Alrahhal, Mohamad Shady, et al. "AES-route server model for location based services in road networks." International Journal of Advanced Computer Science and Applications 8.8 (2017): 361-368.
[37] Alrahhal, Mohamad Shady, Maher Khemakhem, and Kamal Jambi. "A SURVEY ON PRIVACY OF LOCATION-BASED SERVICES: CLASSIFICATION, INFERENCE ATTACKS, AND CHALLENGES." Journal of Theoretical & Applied Information Technology 95.24 (2017).
[38] Alrahhal, Mohamad Shady, Maher Khemekhem, and Kamal Jambi. "Achieving load balancing between privacy protection level and power consumption in location based services." (2018).
[39] Alrahhal, H.; Alrahhal, M.S.; Jamous, R.; Jambi, K. A Symbiotic Relationship Based Leader Approach for Privacy Protection in Location Based Services. ISPRS Int. J. Geo-Inf. 2020, 9, 408.
[40] Al-Rahal, M. Shady, Adnan Abi Sen, and Abdullah Ahmad Basuhil. "High level security based steganoraphy in image and audio files." Journal of Theoretical and Applied Information Technology 87.1 (2016): 29.
[41] Alluhaybi, Bandar, et al. "A Survey: Agent-based Software Technology Under the Eyes of Cyber Security, Security Controls, Attacks and Challenges." International Journal of Advanced Computer Science and Applications (IJACSA) 10.8 (2019).
[42] Fouz, Fadi, et al. "Optimizing Communication And Cooling Costs In Hpc Data Center." Journal of Theoretical and Applied Information Technology 85.2 (2016): 112.
[43] Alrahhal, Mohamad Shady, and Adnan Abi Sen. "Data mining, big data, and artificial intelligence: An overview, challenges, and research questions." (2018).