Credit Card Fraud Analysis Using Predictive Modeling
INTRODUCTION
1.1 INTRODUCTION
Online shopping is growing day by day. Credit cards are used for purchasing goods and services with the help of a virtual card or a physical card, where the virtual card is used for online transactions and the physical card for offline transactions. In a physical-card based purchase, the cardholder presents the card physically to a merchant to make a payment. To carry out fraudulent transactions in this kind of purchase, an attacker has to steal the credit card. If the cardholder does not realize the loss of the card, it can lead to a substantial financial loss to the credit card company. In the online payment mode, attackers need only a little information to carry out a fraudulent transaction (secure code, card number, expiration date, etc.). In this purchase method, transactions are mainly done through the Internet or over the telephone.
To commit fraud in these types of purchases, a fraudster simply needs to know the card details.
Most of the time, the genuine cardholder is not aware that someone else has seen or stolen his card
information. The only way to detect this kind of fraud is to analyse the spending patterns on every
card and to figure out any inconsistency with respect to the “usual” spending patterns.
Fraud detection based on the analysis of existing purchase data of cardholder is a promising way to
reduce the rate of successful credit card frauds. Since humans tend to exhibit specific behavioristic
profiles, every cardholder can be represented by a set of patterns containing information about the
typical purchase category, the time since the last purchase, the amount of money spent, etc.
Deviation from such patterns is a potential threat to the system.
1.2 PURPOSE OF THE PROJECT
The main objective of the project on the Credit Card Fraud Detection System is to manage the details of credit cards, transactions, datasets, files and predictions. It manages all the information about credit cards, customers and predictions. The project is built entirely at the administrative end, and thus only the administrator is granted access. The purpose of the project is to build an application program to reduce the manual work of managing credit cards, transactions, customers and datasets. It tracks all the details about datasets, files and predictions.
PROBLEM STATEMENT
Credit card fraud is a major problem for financial institutions worldwide, and the annual losses due to it scale to billions of dollars. This can be observed from many financial reports. For example, the 10th annual online fraud report by CyberSource shows that the estimated loss due to online fraud was $4 billion in 2008, an 11% increase over the $3.6 billion loss in 2007 (Bhattacharyya et al., 2011). Fraud in the United Kingdom alone was estimated to be £535 million in 2007 and is now costing around £13.9 billion a year (Mahdi et al., 2010). From 2006 to 2008, the UK alone lost between £427.0 million and £609.9 million to credit and debit card fraud (Woolsey & Schulz, 2011). Although there has been some decrease in such losses after the implementation of detection and prevention systems by governments and banks, card-not-present fraud losses are increasing at a higher rate due to online transactions. Worse still, such fraud keeps growing in ways that existing protection and detection measures do not catch.
Over the years, governments and banks have implemented steps to subdue these frauds, but along with the evolution of fraud detection and control methods, perpetrators are also evolving their methods and practices to avoid detection.
LITERATURE REVIEW
…predicting the score according to the algorithm. Finally, a confusion matrix was plotted using the true and predicted labels.
Advantages:
The results obtained by the Logistic Regression algorithm are the best compared with the other algorithms.
The accuracy obtained was close to one hundred percent, which shows that using the logistic regression algorithm gives the best results.
The plots were generated from the data processed during the implementation.
2.3 FEASIBILITY STUDY
A preliminary investigation examines project feasibility, the likelihood that the system will be useful to the organization. The main objective of the feasibility study is to test the technical, operational and economical feasibility of adding new modules and debugging the old running system. Any system is feasible if it has unlimited resources and infinite time. The following aspects are covered in the feasibility study portion of the preliminary investigation:
Technical Feasibility
Operational Feasibility
Economical Feasibility
2.3.1 TECHNICAL FEASIBILITY
The technical issues usually raised during the feasibility stage of the investigation include the following:
Does the necessary technology exist to do what is suggested?
Does the proposed equipment have the technical capacity to hold the data required to use the new system?
Will the proposed system provide an adequate response to inquiries, regardless of the number or location of users?
Can the system be upgraded if developed?
Are there technical guarantees of accuracy, reliability, ease of access and data security?
Earlier, no system existed to cater to the needs of the 'Secure Infrastructure Implementation System'. The current system developed is technically feasible. It is a web based user interface for audit workflow at NIC-CSD, and thus it provides easy access to the users. The database's purpose is to create, establish and maintain a workflow among various entities in order to facilitate all concerned users in their various capacities or roles. Permission to the users would be granted based
on the roles specified. Therefore, it provides the technical guarantee of accuracy, reliability and security. The software and hardware requirements for the development of this project are few and are already available in-house at NIC or are available free as open source. The work for the project is done with the current equipment and existing software technology. The necessary bandwidth exists for providing fast feedback to the users irrespective of the number of users using the system.
2.3.2 OPERATIONAL FEASIBILITY
Proposed projects are beneficial only if they can be turned into information systems that meet the organization's operating requirements. Operational feasibility aspects of the project are to be taken as an important part of the project implementation. Some of the important issues raised to test the operational feasibility of a project include the following:
Is there sufficient support for the project from the management and from the users?
Will the system be used and work properly when it is developed and implemented?
Will there be any resistance from the users that will undermine the possible application benefits?
This system is targeted to be in accordance with the above-mentioned issues. Beforehand, the management issues and user requirements have been taken into consideration, so there is no question of resistance from the users that could undermine the possible application benefits. The well-planned design would ensure the optimal utilization of the computer resources and would help in the improvement of performance status.
2.3.3 ECONOMICAL FEASIBILITY
A system that can be developed technically and that will be used if installed must still be a good investment for the organization. In the economic feasibility study, the development cost of creating the system is evaluated against the ultimate benefit derived from the new system. The financial benefits must equal or exceed the costs.
The system is economically feasible. It does not require any additional hardware or software. Since the interface for this system is developed using the existing resources and technologies available at NIC, there is only nominal expenditure, and economic feasibility is certain.
SYSTEM DESIGN
3.1. INTRODUCTION
Software design sits at the technical kernel of the software engineering process and is applied regardless of the development paradigm and area of application. Design is the first step in the development phase for any engineered product or system. The designer's goal is to produce a model or representation of an entity that will later be built. Once system requirements have been specified and analyzed, system design is the first of the three technical activities - design, code and test - that are required to build and verify the software.
The importance of design can be stated with a single word: "quality". Design is the place where quality is fostered in software development. Design provides us with representations of software whose quality can be assessed. Design is the only way that we can accurately translate a customer's view into a finished software product or system. Software design serves as a foundation for all the software engineering steps that follow. Without a strong design we risk building an unstable system - one that will be difficult to test, and one whose quality cannot be assessed until the last stage. The purpose of the design phase is to plan a solution to the problem specified by the requirements document. This phase is the first step in moving from the problem domain to the solution domain. In other words, starting with what is needed, design takes us toward how to satisfy those needs. The design of a system is perhaps the most critical factor affecting the quality of the software; it has a major impact on the later phases, particularly testing and maintenance. The output of this phase is the design document. This document is similar to a blueprint for the solution and is used later during implementation, testing and maintenance. The design activity is often divided into two separate phases: System Design and Detailed Design.
System Design, also called top-level design, aims to identify the modules that should be in the system, the specifications of these modules, and how they interact with each other to produce the desired results.
3.2 SYSTEM DESIGN
Systems design is the process of defining the architecture, product design, modules, interfaces, and data for a system to satisfy specified requirements. Systems design can be seen as the application of systems theory to product development. The procedure we followed to predict the result was: understand the problem statement and the data by performing statistical analysis and visualization; check whether the data is balanced or not (in this dataset the data is imbalanced, so it was balanced using oversampling); scale the data using standardization and normalization; and test the data with different ML algorithms. For any data science project some packages are very important, such as NumPy (numeric Python) and pandas, and, for visualization of the data, matplotlib and
seaborn, which builds on matplotlib with some extra features. Anaconda Navigator is used as it has several IDEs installed in it, and the Python programming language is used to implement the machine learning algorithms as it is easy to learn and use. In this project a Jupyter notebook is used to run the complete code, where the code can be viewed as blocks, making it easier to run each section and identify errors.
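A minimal sketch of this preparation step is given below. It assumes the Kaggle data has been downloaded locally as a file named creditcard.csv with a binary Class column (0 = valid, 1 = fraud); the file name and the use of simple random oversampling are assumptions made only for illustration, not part of the original implementation.

# Sketch of the data preparation described above (assumed file name creditcard.csv).
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.utils import resample

df = pd.read_csv("creditcard.csv")          # hypothetical local path to the Kaggle data
print(df["Class"].value_counts())           # check whether the data is balanced

# Balance the classes by oversampling the minority (fraud) class with replacement.
fraud = df[df["Class"] == 1]
valid = df[df["Class"] == 0]
fraud_up = resample(fraud, replace=True, n_samples=len(valid), random_state=42)
balanced = pd.concat([valid, fraud_up])

# Scale the features using standardization (zero mean, unit variance).
X = balanced.drop(columns=["Class"])
y = balanced["Class"]
X_scaled = StandardScaler().fit_transform(X)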
The fraud detection module works in the following steps (a small sketch follows the list):
1. The incoming set of transactions and amounts is treated as credit card transactions.
2. The credit card transactions are given to the machine learning algorithms as input.
3. The output is either a fraud or a valid transaction, obtained by analyzing the data, observing patterns and using machine learning algorithms to perform anomaly detection.
4. A fraud transaction alerts the user that a fraudulent transaction has occurred, so the user can block the card to prevent further financial loss to themselves as well as to the credit card company.
5. The valid transactions are treated as genuine transactions.
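As a hedged illustration of the steps above, the sketch below trains a logistic regression classifier on labelled transactions and flags incoming transactions as fraud or valid. It assumes the same hypothetical creditcard.csv file as the earlier sketch, skips the oversampling step for brevity, and uses a simple printed alert in place of a real card-blocking mechanism.

# Sketch of the detection flow: train on labelled transactions, then classify
# incoming transactions and raise an alert on predicted fraud.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("creditcard.csv")          # hypothetical local path, as above
X = StandardScaler().fit_transform(df.drop(columns=["Class"]))
y = df["Class"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Treat the first few test rows as incoming credit card transactions.
for i, prediction in enumerate(model.predict(X_test[:10])):
    if prediction == 1:
        print(f"Transaction {i}: ALERT - possible fraud, the card can be blocked.")
    else:
        print(f"Transaction {i}: valid transaction, treated as genuine.")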
3.3 ARCHITECTURE DIAGRAM
HARDWARE REQUIREMENTS:
RAM: 4 GB or higher
Processor: Intel i3 or above
Hard disk: 500 GB minimum
SOFTWARE REQUIREMENTS:
OS: Windows or Linux
Python IDE: Python 2.7.x or above
METHODOLOGY
Data scientists typically use reinforcement learning to teach a machine to complete a multistep
process for which there are clearly defined rules. Data scientists program an algorithm to complete
a task and give it positive or negative cues as it works out how to complete a task. But for the
most part, the algorithm decides on its own what steps to take along the way.
4.2 ALGORITHMS
Logistic Regression is one of the classification algorithms, used to predict a binary value from a given set of independent variables (1 / 0, Yes / No, True / False). To represent binary / categorical values, dummy variables are used. Logistic regression can be viewed as a linear regression in which, when the resulting variable is categorical, the log of odds is used as the dependent variable; it predicts the probability of occurrence of an event by fitting the data to a logistic function:
O = e^(I0 + I1*x) / (1 + e^(I0 + I1*x)) (3.1)
where O is the predicted output, I0 is the bias or intercept term and I1 is the coefficient for the single input value (x). Logistic regression starts from the simple linear regression equation, with the dependent variable enclosed in a link function.
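Equation (3.1) can be written directly in NumPy. The sketch below is illustrative only; the intercept I0 and coefficient I1 are chosen arbitrarily rather than fitted to the project data.

# Equation (3.1): O = e^(I0 + I1*x) / (1 + e^(I0 + I1*x))
import numpy as np

def logistic(x, I0, I1):
    # Predicted probability O of the event for a single input value x.
    z = I0 + I1 * x
    return np.exp(z) / (1.0 + np.exp(z))

# Arbitrary illustrative parameters: the probability rises as x grows.
print(logistic(np.array([-2.0, 0.0, 2.0]), I0=0.5, I1=1.2))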
A decision tree is a type of supervised learning algorithm (having a pre-defined target variable) that is mostly used in classification problems. It works for both categorical and continuous input and output variables. In this technique, we split the population or sample into two or more homogeneous sets (or sub-populations) based on the most significant splitter / differentiator in the input variables (a small sketch follows the terminology list below).
5. Pruning: When we remove sub-nodes of a decision node, this process is called pruning. It can be seen as the opposite of splitting.
6. Branch / Sub-Tree: A sub-section of the entire tree is called a branch or sub-tree.
7. Parent and Child Node: A node which is divided into sub-nodes is called the parent node of those sub-nodes, whereas the sub-nodes are the children of the parent node.
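A minimal decision tree sketch follows. It uses synthetic data generated by scikit-learn rather than the project's transaction data, and limiting max_depth is shown only as one simple way of restricting splitting (a form of pruning).

# Decision tree classifier on synthetic, imbalanced data (illustrative only).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, weights=[0.95, 0.05],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_depth limits how far splitting continues, which acts like pre-pruning.
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_train, y_train)
print("test accuracy:", tree.score(X_test, y_test))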
UML combines the best techniques from data modeling (entity relationship diagrams), business
modeling (work flows), object modeling, and component modeling. It can be used with all
processes, throughout the software development life cycle, and across different implementation
technologies. UML has synthesized the notations of the Booch method, the Object-modeling
technique (OMT) and Object-oriented software engineering (OOSE) by fusing them into a single,
common and widely usable modeling language. UML aims to be a standard modeling language
which can model concurrent and distributed systems.
5.1 UML DIAGRAMS FOR CREDIT CARD FRAUD DETECTION
Class diagram:
[Class diagram: an "algorithm" class (logistic regression) with an apply() method, and a "validate" class with fraud / non-fraud outcomes and a check() method.]
Sequence diagram:
A sequence diagram in Unified Modeling Language (UML) is a kind of interaction diagram that
shows how processes operate with one another and in what order. It is a construct of a Message
Sequence Chart. A sequence diagram shows, as parallel vertical lines ("lifelines"), different
processes or objects that live simultaneously, and, as horizontal arrows, the messages exchanged
between them, in the order in which they occur. This allows the specification of simple runtime
scenarios in a graphical manner.
Component diagram:
[Component diagram: the Kaggle dataset, the pandas library and the server components.]
Deployment diagram:
[Deployment diagram: a system node hosting the logistic regression model and the pandas library.]
Nearest neighbors:
Algorithm
Example of k-NN classification. The test sample (green circle) should be classified either to the
first class of blue squares or to the second class of red triangles. If k = 3 (solid line circle) it is
assigned to the second class because there are 2 triangles and only 1 square inside the inner circle.
If k = 5 (dashed line circle) it is assigned to the first class (3 squares vs. 2 triangles inside the outer
circle).
The training examples are vectors in a multidimensional feature space, each with a class label. The
training phase of the algorithm consists only of storing the feature vectors and class labels of the
training samples.
In the classification phase, k is a user-defined constant, and an unlabeled vector (a query or test
point) is classified by assigning the label which is most frequent among the k training samples
nearest to that query point.
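The worked example above (2 triangles against 1 square for k = 3, and 3 squares against 2 triangles for k = 5) can be reproduced with a short sketch. The coordinates below are invented purely to recreate those neighbour counts and are not taken from any real dataset.

# k-NN classification of a query point for k = 3 and k = 5.
# The coordinates are invented so that the 3 nearest neighbours contain
# 2 points of class 1 ("triangles") and the 5 nearest contain 3 of class 0 ("squares").
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1.0, 1.0], [1.2, 0.8], [3.0, 3.0],   # class 0 ("squares")
              [0.2, 0.1], [0.1, -0.2]])             # class 1 ("triangles")
y = np.array([0, 0, 0, 1, 1])
query = np.array([[0.0, 0.0]])

for k in (3, 5):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    print("k =", k, "-> predicted class:", knn.predict(query)[0])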
A commonly used distance metric for continuous variables is Euclidean distance. For discrete
variables, such as for text classification, another metric can be used, such as the overlap
metric (or Hamming distance). In the context of gene expression microarray data, for example, k-
NN has also been employed with correlation coefficients such as Pearson and Spearman.[3] Often,
the classification accuracy of k-NN can be improved significantly if the distance metric is learned
with specialized algorithms such as Large Margin Nearest Neighbor or Neighbourhood
components analysis.
A drawback of the basic "majority voting" classification occurs when the class distribution is
skewed. That is, examples of a more frequent class tend to dominate the prediction of the new
example, because they tend to be common among the k nearest neighbors due to their large
number.[4] One way to overcome this problem is to weight the classification, taking into account
the distance from the test point to each of its k nearest neighbors. The class (or value, in regression
problems) of each of the k nearest points is multiplied by a weight proportional to the inverse of
the distance from that point to the test point. Another way to overcome skew is by abstraction in
data representation. For example, in a self-organizing map (SOM), each node is a representative (a
center) of a cluster of similar points, regardless of their density in the original training data. K-NN
can then be applied to the SOM.
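Distance-weighted voting as described above is available in scikit-learn by setting weights='distance'. The brief sketch below compares it with uniform voting on synthetic, skewed data; the dataset and parameter values are assumptions made for illustration.

# Distance-weighted k-NN: each neighbour's vote is weighted by the inverse of
# its distance to the query point, reducing the effect of a skewed class
# distribution among the k nearest neighbours.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

uniform = KNeighborsClassifier(n_neighbors=7, weights="uniform").fit(X_train, y_train)
weighted = KNeighborsClassifier(n_neighbors=7, weights="distance").fit(X_train, y_train)
print("uniform voting accuracy:", uniform.score(X_test, y_test))
print("distance-weighted accuracy:", weighted.score(X_test, y_test))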
Parameter selection
The best choice of k depends upon the data; generally, larger values of k reduce the effect of noise on the classification,[5] but make boundaries between classes less distinct. A good k can be
selected by various heuristic techniques (see hyperparameter optimization). The special case
where the class is predicted to be the class of the closest training sample (i.e. when k = 1) is called
the nearest neighbor algorithm.
The accuracy of the k-NN algorithm can be severely degraded by the presence of noisy or
irrelevant features, or if the feature scales are not consistent with their importance. Much research
effort has been put into selecting or scaling features to improve classification. A particularly
popular approach is the use of evolutionary algorithms to optimize feature scaling. Another
popular approach is to scale features by the mutual information of the training data with the
training classes.
In binary (two class) classification problems, it is helpful to choose k to be an odd number as this
avoids tied votes. One popular way of choosing the empirically optimal k in this setting is via the bootstrap method.
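One hedged way of picking k in practice is a cross-validated grid search over odd values, as sketched below. Note that this uses cross-validation as a stand-in for the bootstrap method mentioned above, and the synthetic data is illustrative only.

# Choosing k by cross-validated grid search over odd values (odd k avoids
# tied votes in binary problems).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=600, random_state=2)
search = GridSearchCV(KNeighborsClassifier(),
                      {"n_neighbors": [1, 3, 5, 7, 9, 11]}, cv=5)
search.fit(X, y)
print("best k:", search.best_params_["n_neighbors"])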
The 1-nearest neighbor classifier
The most intuitive nearest neighbour type classifier is the one nearest neighbour classifier, which assigns a point to the class of its single closest training example.
As the size of training data set approaches infinity, the one nearest neighbour classifier guarantees
an error rate of no worse than twice the Bayes error rate (the minimum achievable error rate given
the distribution of the data).
Properties
k-NN is a special case of a variable-bandwidth, kernel density "balloon" estimator with a
uniform kernel.
The naive version of the algorithm is easy to implement by computing the distances from the test
example to all stored examples, but it is computationally intensive for large training sets. Using an
approximate nearest neighbor search algorithm makes k-NN computationally tractable even for
large data sets. Many nearest neighbor search algorithms have been proposed over the years; these
generally seek to reduce the number of distance evaluations actually performed.
k-NN has some strong consistency results. As the amount of data approaches infinity, the two-
class k-NN algorithm is guaranteed to yield an error rate no worse than twice the Bayes error
rate (the minimum achievable error rate given the distribution of the data). Various improvements
to the k-NN speed are possible by using proximity graphs.
For multi-class k-NN classification, Cover and Hart (1967) prove an upper bound error rate of
R* <= R_kNN <= R* (2 - M R* / (M - 1)),
where R* is the Bayes error rate (which is the minimal error rate possible), R_kNN is the k-NN error rate, and M is the number of classes in the problem. For M = 2 and as the Bayesian error rate R* approaches zero, this limit reduces to "not more than twice the Bayesian error rate".
Error rates
There are many results on the error rate of the k nearest neighbour classifiers. The k-nearest
neighbour classifier is strongly (that is for any joint distribution on) consistent provided diverges
and converges to zero as,Let denote the k nearest algorithm
Dimension reduction
For high-dimensional data (e.g., with number of dimensions more than 10) dimension reduction is
usually performed prior to applying the k-NN algorithm in order to avoid the effects of the curse
of dimensionality.
The curse of dimensionality in the k-NN context basically means that Euclidean distance is
unhelpful in high dimensions because all vectors are almost equidistant to the search query vector
(imagine multiple points lying more or less on a circle with the query point at the center; the
distance from the query to all data points in the search space is almost the same).
Feature extraction and dimension reduction can be combined in one step using principal
component analysis (PCA), linear discriminant analysis(LDA), or canonical correlation
analysis (CCA) techniques as a pre-processing step, followed by clustering by k-NN on feature
vectors in reduced-dimension space. In machine learning this process is also called low-
dimensional embedding.
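Dimension reduction followed by k-NN can be combined into a single pipeline. The sketch below uses PCA with an arbitrarily chosen number of components on synthetic data and is an illustration only, not the procedure used in this project.

# PCA as a pre-processing step before k-NN, reducing the effect of the curse
# of dimensionality on high-dimensional data.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=50, n_informative=10,
                           random_state=3)
model = make_pipeline(StandardScaler(), PCA(n_components=10),
                      KNeighborsClassifier(n_neighbors=5))
model.fit(X, y)
print("training accuracy after PCA + k-NN:", model.score(X, y))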
For very-high-dimensional datasets (e.g. when performing a similarity search on live video
streams, DNA data or high-dimensional time series) running a fast approximate k-NN search
using locality sensitive hashing, "random projections", "sketches" or other high-dimensional
similarity search techniques from the VLDB toolbox might be the only feasible option.
Decision boundary
Nearest neighbor rules in effect implicitly compute the decision boundary. It is also possible to
compute the decision boundary explicitly, and to do so efficiently, so that the computational
complexity is a function of the boundary complexity.
Data reduction
Data reduction is one of the most important problems for work with huge data sets. Usually, only
some of the data points are needed for accurate classification. Those data are called
the prototypes and can be found as follows:
1. Select the class-outliers, that is, training data that are classified incorrectly by k-NN (for a given k).
2. Separate the rest of the data into two sets: (i) the prototypes that are used for the classification decisions and (ii) the absorbed points that can be correctly classified by k-NN using the prototypes.
The absorbed points can then be removed from the training set.
CNN for data reduction
Condensed nearest neighbor (CNN, the Hart algorithm) is an algorithm designed to reduce the
data set for k-NN classification. It selects the set of prototypes U from the training data, such that
1NN with U can classify the examples almost as accurately as 1NN does with the whole data set.
Use U instead of X for classification. The examples that are not prototypes are called "absorbed"
points.
It is efficient to scan the training examples in order of decreasing border ratio. The border ratio of
a training example x is defined as
a(x) = ||x'-y||/ ||x-y||
where ||x-y|| is the distance to the closest example y having a different color than x, and ||x'-y|| is
the distance from y to its closest example x' with the same label as x.
The border ratio is in the interval [0,1] because ||x'-y|| never exceeds ||x-y||. This ordering gives preference to the borders of the classes for inclusion in the set of prototypes U. A point with a different label than x is called external to x. The calculation of the border ratio is illustrated in the figure: the data points are labeled by colors, the initial point is x and its label is red. External points are blue and green. The external point closest to x is y. The red point closest to y is x'. The border ratio a(x) = ||x'-y|| / ||x-y|| is the attribute of the initial point x.
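The border ratio a(x) can be computed directly from the definition above. The NumPy sketch below is a naive O(n^2) illustration under the assumption of a small dataset, not an optimized implementation.

# Naive computation of the border ratio a(x) = ||x'-y|| / ||x-y|| for every
# training example x, following the definition above.
import numpy as np

def border_ratios(X, y):
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    ratios = np.empty(len(X))
    for i in range(len(X)):
        other = np.where(y != y[i])[0]      # points with a different label than x
        j = other[np.argmin(D[i, other])]   # y: the closest external point to x
        same = np.where(y == y[i])[0]
        k = same[np.argmin(D[j, same])]     # x': the closest same-label point to y
        ratios[i] = D[j, k] / D[i, j]
    return ratios

X = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [1.1, 0.9]])
y = np.array([0, 0, 1, 1])
print(border_ratios(X, y))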
Below is an illustration of CNN in a series of figures. There are three classes (red, green and
blue). Fig. 1: initially there are 60 points in each class. Fig. 2 shows the 1NN classification map:
each pixel is classified by 1NN using all the data. Fig. 3 shows the 5NN classification map. White
areas correspond to the unclassified regions, where 5NN voting is tied (for example, if there are
two green, two red and one blue points among 5 nearest neighbors). Fig. 4 shows the reduced data
set. The crosses are the class-outliers selected by the (3,2)NN rule (all the three nearest neighbors
of these instances belong to other classes); the squares are the prototypes, and the empty circles
are the absorbed points. The left bottom corner shows the numbers of the class-outliers, prototypes
and absorbed points for all three classes. The number of prototypes varies from 15% to 20% for
different classes in this example. Fig. 5 shows that the 1NN classification map with the prototypes
is very similar to that with the initial data set. The figures were produced using the Mirkes applet.
Fig. 5. The 1NN classification map based on the CNN extracted prototypes.
FCNN (for Fast Condensed Nearest Neighbor) is a variant of CNN, which turns out to be one of
the fastest data set reduction algorithms for k-NN classification.
K-NN regression
In k-NN regression, the k-NN algorithm is used for estimating continuous variables. One such
algorithm uses a weighted average of the k nearest neighbors, weighted by the inverse of their
distance. This algorithm works as follows (a short sketch follows the steps):
Compute the Euclidean or Mahalanobis distance from the query example to the labeled
examples.
Order the labeled examples by increasing distance.
Find a heuristically optimal number k of nearest neighbors, based on RMSE. This is done
using cross validation.
Calculate an inverse distance weighted average with the k-nearest multivariate neighbors.
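An inverse-distance-weighted k-NN regression along the lines of the steps above can be sketched with scikit-learn's KNeighborsRegressor. The data, the candidate values of k and the use of cross-validated RMSE below are assumptions made for illustration.

# k-NN regression with inverse-distance weighting; k is chosen among a few
# candidate values by cross-validated RMSE.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

best_k, best_rmse = None, np.inf
for k in (2, 5, 10, 20):
    scores = cross_val_score(KNeighborsRegressor(n_neighbors=k, weights="distance"),
                             X, y, cv=5, scoring="neg_root_mean_squared_error")
    rmse = -scores.mean()
    if rmse < best_rmse:
        best_k, best_rmse = k, rmse
print("best k:", best_k, "cross-validated RMSE:", round(best_rmse, 3))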
K-NN outlier
The distance to the kth nearest neighbor can also be seen as a local density estimate and thus is
also a popular outlier score in anomaly detection. The larger the distance to the k-NN, the lower
the local density, the more likely the query point is an outlier. To take into account the whole
neighborhood of the query point, the average distance to the k-NN can be used. Although quite
simple, this outlier model, along with another classic data mining method, local outlier factor,
works quite well also in comparison to more recent and more complex approaches, according to a
large scale experimental analysis.
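The distance to the k-th nearest neighbour can be used as an outlier score with scikit-learn's NearestNeighbors, as in the sketch below; the synthetic data and the number of flagged points are arbitrary illustrative choices.

# k-NN distance as an outlier score: the larger the distance to the k-th
# nearest neighbour, the lower the local density around the point.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, size=(200, 2)),    # a dense cluster
               rng.uniform(-6, 6, size=(5, 2))])   # a few sparse points

k = 5
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)    # +1 because each point is its own neighbour
distances, _ = nn.kneighbors(X)
score = distances[:, -1]                           # distance to the k-th true neighbour
print("indices of the 5 most outlying points:", np.argsort(score)[-5:])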
Validation of results
A confusion matrix or "matching matrix" is often used as a tool to validate the accuracy of k-NN
classification. More robust statistical methods such as likelihood-ratio test can also be applied.
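A confusion matrix for a k-NN classifier can be produced with scikit-learn as in the short sketch below; the synthetic data is an assumption, and the classification report is printed only as an additional illustration.

# Validating k-NN predictions with a confusion matrix.
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=4)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=4)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
y_pred = knn.predict(X_test)
print(confusion_matrix(y_test, y_pred))   # rows: true classes, columns: predicted classes
print(classification_report(y_test, y_pred))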
Output Screen
Step 1:
Step 2:
Step 3:
Step 4:
Finally, the confusion matrix was obtained and the accuracy was high.
8.1 INTRODUCTION
Software testing is a critical element of software quality assurance and represents the
ultimate review of specification, design and coding. In fact, testing is the one step in the software
engineering process that could be viewed as destructive rather than constructive.
A strategy for software testing integrates software test case design methods into a well-planned series of steps that result in the successful construction of software. Testing is the set of activities that can be planned in advance and conducted systematically. The underlying motivation of program testing is to affirm software quality with methods that can be applied economically and effectively to both large and small-scale systems.
[Testing levels diagram: unit testing and module testing make up component testing; sub-system testing and system testing make up integration testing; acceptance testing is carried out as user testing.]
To follow the concept of white box testing, we tested each form we created independently to verify that the data flow is correct, that all conditions are exercised to check their validity, and that all loops are executed on their boundaries.
SYSTEM SECURITY
9.1 INTRODUCTION
The protection of computer based resources that include hardware, software, data,
procedures and people against unauthorized use or natural Disaster is known as System Security.
System Security can be divided into four related issues:
1. Security
2. Integrity
3. Privacy
4. Confidentiality
SYSTEM SECURITY refers to the technical innovations and procedures applied to the hardware
and operation systems to protect against deliberate or accidental damage from a defined threat.
DATA SECURITY is the protection of data from loss, disclosure, modification and destruction.
SYSTEM INTEGRITY refers to the proper functioning of hardware and programs, appropriate physical security, and safety against external threats such as eavesdropping and wiretapping.
PRIVACY defines the rights of the user or organizations to determine what information they are
willing to share with or accept from others and how the organization can be protected against
unwelcome, unfair or excessive dissemination of information about it.
CONFIDENTIALITY is a special status given to sensitive information in a database to minimize
the possible invasion of privacy. It is an attribute of information that characterizes its need for
protection.
9.2 SECURITY SOFTWARE
It is a technique used for the purpose of covert communication. It transfers a message secretly by embedding it into a cover medium with the use of information hiding techniques. It is one of the conventional techniques capable of hiding a large secret message in a cover image without introducing many perceptible distortions.
.NET has two kinds of security:
Role Based Security
Code Access Security
The Common Language Runtime (CLR) allows code to perform only those operations that the code
has permission to perform. So CAS is the CLR's security system that enforces security policies by
preventing unauthorized access to protected resources and operations. Using the Code Access
Security, you can do the following:
Restrict what your code can do
CONCLUSION
This project showed how to tackle the problem of credit card fraud detection using machine learning. It is fairly easy to come up with a simple model, implement it in Python and get great results for the Credit Card Fraud Detection task on Kaggle.
REFERENCES
1. L.J.P. van der Maaten and G.E. Hinton, "Visualizing High-Dimensional Data Using t-SNE", Journal of Machine Learning Research, 2014.
2. Machine Learning Group - ULB, "Credit Card Fraud Detection", Kaggle, 2018.
5. K. Chaudhary and B. Mallick, "Credit Card Fraud: The Study of Its Impact and Detection Techniques", International Journal of Computer Science and Network (IJCSN), vol. 1, no. 4, pp. 31-35, 2012, ISSN: 2277-5420.