Fraud Detection Using Machine Learning PDF
Machine Learning
AWS Implementation Guide
Soji Adeshina
Vishaal Kapoor
Nathalie Rauschmayr
Cyrus Vahid
Chaitanya Hazarey
May 2019
Contents
Overview
  Cost
  Architecture Overview
Solution Components
  Amazon SageMaker
  Algorithm
  Dataset
Considerations
  Customization
  Regional Deployment
AWS CloudFormation Template
Automated Deployment
  What We’ll Cover
  Step 1. Launch the Stack
  Step 2. Run the Notebook
  Step 3. Enable the CloudWatch Events Rule
  Step 4. Verify the Lambda Function Is Processing Transactions
Security
  Amazon Kinesis Data Firehose
Additional Resources
Appendix A: Data Visualization
Appendix B: Acknowledgements
Source Code
Document Revisions
Amazon Web Services – Fraud Detection Using Machine Learning May 2019
The guide is intended for developers and data scientists who have practical experience with
machine learning and architecting on the AWS Cloud.
Overview
Fraud is an ongoing problem that can cost businesses billions of dollars annually and damage
customer trust. Many companies use a rule-based approach to detect fraudulent activity, in
which fraud patterns are defined as rules. However, implementing and maintaining rules can be
a complex, time-consuming process because fraud is constantly evolving, rules require fraud
patterns to be known in advance, and rules can produce false positives or false negatives.
Machine learning (ML) can provide a more flexible approach to fraud detection. ML models
do not use pre-defined rules to determine whether activity is fraudulent. Instead, ML models
are trained to recognize fraud patterns in datasets, and the models are self-learning, which
enables them to adapt to new, previously unknown fraud patterns.
Amazon SageMaker is a fully managed service that enables developers and data scientists to
quickly and easily build, train, and deploy machine learning models at any scale. Amazon
SageMaker removes the barriers that typically slow down developers who want to use
machine learning. This ability makes Amazon SageMaker applicable for a variety of use cases,
including fraud detection.
To help customers more easily leverage Amazon SageMaker for real-time fraud detection,
AWS offers the Fraud Detection Using Machine Learning solution. This solution automates
the detection of potentially fraudulent activity, and flags that activity for review. The solution
also includes an example dataset, but you can modify the solution to work with any dataset.
Cost
You are responsible for the cost of the AWS services used while running this solution. As of
the date of publication, the one-time cost to train the solution’s ML model in the US East (N.
Virginia) Region is $0.50 for the Amazon SageMaker ml.c4.large instance. The cost to
process transactions using the example dataset is approximately $0.30 per hour. Prices
are subject to change. For full details, see the pricing webpage for each AWS service you will
be using in this solution.
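As a rough illustration of the figures above, the following sketch adds the one-time training cost to the approximate hourly processing cost. The function name and the choice of whole hours are assumptions made for this example; actual charges depend on current pricing and on your usage.

```python
from decimal import Decimal

# Figures quoted in this guide (US East (N. Virginia), as of publication).
TRAINING_COST_USD = Decimal("0.50")             # one-time model training
PROCESSING_COST_PER_HOUR_USD = Decimal("0.30")  # approximate, example dataset


def estimate_cost_usd(hours_running: int) -> Decimal:
    """Approximate cost of training once and processing transactions for N hours."""
    if hours_running < 0:
        raise ValueError("hours_running must be non-negative")
    return TRAINING_COST_USD + PROCESSING_COST_PER_HOUR_USD * hours_running
```

For example, `estimate_cost_usd(24)` returns `Decimal("7.70")`, that is, roughly $7.70 for the first day under these published figures.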
Architecture Overview
Deploying this solution builds the following environment in the AWS Cloud.
The AWS CloudFormation template deploys an example dataset of credit card transactions
contained in an Amazon Simple Storage Service (Amazon S3) bucket and an Amazon
SageMaker endpoint with an ML model that will be trained on the dataset.
The solution also deploys an Amazon CloudWatch Events rule that is configured to run every
minute. The rule triggers an AWS Lambda function that processes transactions from the example
dataset and invokes the Amazon SageMaker endpoint, which predicts whether those transactions
are fraudulent based on the trained ML model. An Amazon Kinesis Data Firehose delivery stream
loads the processed transactions into another Amazon S3 bucket for storage.
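The flow described above can be sketched roughly as follows. This is not the solution's actual Lambda code: the delivery stream name is a hypothetical placeholder, the response shape (a JSON document with a `predictions` list containing `score` and `predicted_label`) is an assumption based on the built-in Linear Learner binary classifier output, and the two clients are passed in so that the sketch can be exercised without AWS access. In a real function they would be `boto3.client("sagemaker-runtime")` and `boto3.client("firehose")`.

```python
import json

ENDPOINT_NAME = "fraud_detection_endpoint"        # endpoint name used in this guide
DELIVERY_STREAM_NAME = "fraud-detection-results"  # hypothetical stream name


def score_and_store(transaction, runtime_client, firehose_client):
    """Score one transaction (a list of numeric features) and forward the result.

    runtime_client is expected to behave like boto3.client("sagemaker-runtime")
    and firehose_client like boto3.client("firehose").
    """
    payload = ",".join(str(value) for value in transaction)

    # Ask the endpoint for a prediction on a single CSV row.
    response = runtime_client.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="text/csv",
        Body=payload,
    )
    # Assumed response shape: {"predictions": [{"score": ..., "predicted_label": ...}]}
    result = json.loads(response["Body"].read())
    prediction = result["predictions"][0]

    # Append the prediction to the original row and hand it to Firehose,
    # which delivers it to the results bucket.
    record = f"{payload},{prediction['predicted_label']},{prediction['score']}\n"
    firehose_client.put_record(
        DeliveryStreamName=DELIVERY_STREAM_NAME,
        Record={"Data": record.encode("utf-8")},
    )
    return prediction
```

Passing the clients as arguments is a design choice for this sketch; it keeps the scoring logic separate from credential and Region configuration.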
Once the transactions have been loaded into Amazon S3, you can use analytics tools and
services, including Amazon QuickSight, for visualization, reporting, ad-hoc queries, and
more detailed analysis. For customers who want to use Amazon QuickSight to visualize the
processed transactions, see Appendix A.
By default, the solution is configured to process transactions from the example dataset. To
use your own dataset, you must modify the solution. For more information, see
Customization.
Solution Components
Amazon SageMaker
Fraud Detection Using Machine Learning uses an Amazon SageMaker notebook instance,
which is a fully managed machine learning (ML) Amazon Elastic Compute Cloud (Amazon
EC2) compute instance that runs the solution’s Jupyter notebook. The notebook is used to
train and deploy the solution’s ML model. For more information on notebook instances, see
Use Notebook Instances in the Amazon SageMaker Developer Guide.
Algorithm
Amazon SageMaker provides several built-in machine learning algorithms that you can use
for a variety of problem types. This solution uses the built-in Linear Learner algorithm,
an algorithm used for solving either classification or regression problems, to provide a
scalable linear model that classifies transactions as fraudulent or legitimate. For more
information, see How Linear Learner Works in
the Amazon SageMaker Developer Guide.
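Because the solution predicts whether a transaction is fraudulent, the model is used here for binary classification. The sketch below illustrates only the general idea of scoring with a linear model (a weighted sum of features passed through a logistic function and compared with a threshold). It is not the Linear Learner implementation, and the weights, bias, and threshold are hypothetical values that training would normally determine.

```python
import math


def fraud_score(features, weights, bias):
    """Return a score in [0, 1] from a linear model followed by a logistic function."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    linear_term = sum(w * x for w, x in zip(weights, features)) + bias
    # Numerically stable logistic function for large positive or negative inputs.
    if linear_term >= 0:
        return 1.0 / (1.0 + math.exp(-linear_term))
    exp_term = math.exp(linear_term)
    return exp_term / (1.0 + exp_term)


def predict_label(features, weights, bias, threshold=0.5):
    """Return 1 (flag as potentially fraudulent) if the score reaches the threshold."""
    return 1 if fraud_score(features, weights, bias) >= threshold else 0
```

Lowering the threshold flags more transactions for review at the cost of more false positives; raising it does the opposite.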
Dataset
Fraud Detection Using Machine Learning contains a publicly available anonymized credit
card transaction dataset that is used to train the solution’s machine learning (ML) model.
The dataset was collected and analyzed during a research collaboration of Worldline and the
Machine Learning Group of Université Libre de Bruxelles on big data mining and fraud
detection. The dataset consists of anonymized credit card transactions over a two-day period
in 2013 by European card holders. Because the dataset is derived from real data, the
distribution of fraud is low compared to legitimate transactions. Fraudulent transactions
make up 0.172% of the total transactions. For more information, see Appendix B.
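To see why such a low fraud rate matters when evaluating a model, consider a trivial model that labels every transaction as legitimate. The sketch below works through the arithmetic. The batch size of 100,000 transactions is a hypothetical figure chosen for round numbers; only the 0.172% rate comes from this guide.

```python
def baseline_metrics(total_transactions, fraud_rate):
    """Metrics for a trivial model that labels every transaction as legitimate."""
    fraud_count = round(total_transactions * fraud_rate)
    legitimate_count = total_transactions - fraud_count
    accuracy = legitimate_count / total_transactions  # every legitimate row is "correct"
    recall = 0.0  # no fraudulent transaction is ever flagged
    return {"fraud_count": fraud_count, "accuracy": accuracy, "recall": recall}
```

For example, `baseline_metrics(100_000, 0.00172)` reports 172 fraudulent transactions and an accuracy of about 99.8%, with a recall of zero. Accuracy alone is therefore a poor measure on this dataset; metrics such as precision and recall on the fraud class are more informative.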
Considerations
Customization
By default, Fraud Detection Using Machine Learning uses a credit card fraud dataset to train
the machine learning (ML) model. However, you can customize the solution to use your own
dataset. To train the model on your own dataset, you must modify the included notebook to
point the model to your dataset. You must also modify the solution’s AWS Lambda function
to process your events.
We also recommend replacing the solution’s Amazon CloudWatch Events rule that triggers
the Lambda function with an Amazon API Gateway endpoint that is externally invoked by a
transaction or event from your existing business infrastructure.
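A minimal sketch of what such a function might look like behind an Amazon API Gateway proxy integration is shown below. It assumes the request body is a JSON document with a `features` list, which is a hypothetical payload format, and it leaves the call to the Amazon SageMaker endpoint as a stub named `score_transaction` that you would replace with your own invocation code.

```python
import json


def score_transaction(features):
    """Placeholder for the call to the SageMaker endpoint (hypothetical stub)."""
    raise NotImplementedError("wire this to your endpoint invocation")


def lambda_handler(event, context, scorer=score_transaction):
    """Handle one transaction posted through an API Gateway proxy integration."""
    try:
        # With proxy integration, the request body arrives as a string.
        body = json.loads(event.get("body") or "{}")
        features = [float(value) for value in body["features"]]
    except (KeyError, TypeError, ValueError):
        return {
            "statusCode": 400,
            "body": json.dumps({"error": "expected JSON with a 'features' list"}),
        }

    prediction = scorer(features)
    return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}
```

The `scorer` argument exists only so that the handler can be tested locally with a stand-in function; AWS Lambda calls the handler with `event` and `context` only.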
Regional Deployment
Fraud Detection Using Machine Learning uses Amazon SageMaker and Amazon Kinesis Data
Firehose, which are currently available in specific AWS Regions only. Therefore, you must
launch this solution in an AWS Region where these services are available.
Automated Deployment
Before you launch the automated deployment, please review the considerations discussed in
this guide. Follow the step-by-step instructions in this section to configure and deploy Fraud
Detection Using Machine Learning into your account.
Time to deploy: Approximately five minutes
• Enter values for required parameters: Stack Name, Model and Data Bucket Name,
Results Bucket Name
• Review the other template parameters, and adjust if necessary.
Note: You are responsible for the cost of the AWS services used while running this
solution. See the Cost section for more details. For full details, see the pricing webpage
for each AWS service you will be using in this solution.
Note: This solution uses the Amazon SageMaker and Amazon Kinesis Data Firehose
services, which are currently available in specific AWS Regions only. Therefore, you
must launch this solution in an AWS Region where these services are available. For
the most current availability by region, see AWS service offerings by region.
3. On the Select Template page, verify that you selected the correct template and choose
Next.
4. On the Specify Details page, assign a name to your solution stack.
5. Under Parameters, review the parameters for the template and modify them as
necessary.
This solution uses the following default values.
Important: If you delete the solution stack, you must manually delete the Amazon
SageMaker endpoint (fraud_detection_endpoint).
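One way to perform this manual cleanup is through the AWS SDK for Python (boto3), as in the sketch below. The helper name is an assumption for illustration, and the client is passed in as an argument so that the Region and credentials remain under your control.

```python
ENDPOINT_NAME = "fraud_detection_endpoint"  # endpoint name given in this guide


def delete_fraud_endpoint(sagemaker_client, endpoint_name=ENDPOINT_NAME):
    """Delete the endpoint that remains after the solution stack is deleted.

    sagemaker_client is expected to behave like boto3.client("sagemaker").
    """
    sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
    return endpoint_name
```

With boto3 installed and credentials configured for the Region where you launched the stack, a call such as `delete_fraud_endpoint(boto3.client("sagemaker"))` removes the endpoint.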
Security
When you build systems on AWS infrastructure, security responsibilities are shared between
you and AWS. This shared model can reduce your operational burden as AWS operates,
manages, and controls the components from the host operating system and virtualization
layer down to the physical security of the facilities in which the services operate. For more
information about security on AWS, visit the AWS Security Center.
Additional Resources
AWS services
• Amazon SageMaker
• AWS Lambda
• Amazon CloudWatch Events
• Amazon Kinesis Data Firehose
• Amazon Simple Storage Service
• Amazon QuickSight
• AWS CloudFormation
Appendix A: Data Visualization
{
  "fileLocations": [
    {
      "URIPrefixes": [
        "https://s3-us-east-1.amazonaws.com/bucket-name/"
      ]
    }
  ],
  "globalUploadSettings": {
    "format": "CSV",
    "delimiter": ",",
    "textqualifier": "'",
    "containsHeader": "false"
  }
}
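The manifest above follows the format that Amazon QuickSight reads for Amazon S3 data sources. If you prefer to generate it rather than edit it by hand, a small helper such as the following could be used. The function name is hypothetical, and the URL in the usage note is the same placeholder as in the manifest, not a real bucket.

```python
import json


def build_manifest(bucket_url_prefix):
    """Return a manifest dict pointing at CSV files under the given S3 URL prefix."""
    return {
        "fileLocations": [{"URIPrefixes": [bucket_url_prefix]}],
        "globalUploadSettings": {
            "format": "CSV",
            "delimiter": ",",
            "textqualifier": "'",
            "containsHeader": "false",
        },
    }


def manifest_json(bucket_url_prefix):
    """Serialize the manifest as indented JSON text, ready to save to a file."""
    return json.dumps(build_manifest(bucket_url_prefix), indent=2)
```

For example, `manifest_json("https://s3-us-east-1.amazonaws.com/bucket-name/")` produces JSON text with the same fields and values as the manifest shown above.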
Appendix B: Acknowledgements
Fraud Detection Using Machine Learning contains a publicly available anonymized credit
card transaction dataset that was collected and analyzed during a research collaboration of
Worldline and the Machine Learning Group of Université Libre de Bruxelles on big data
mining and fraud detection. For more details on current and past fraud detection projects,
see ResearchGate and DEFEATFRAUD.
• Dal Pozzolo, Andrea, Caelen, Olivier, Johnson, Reid A., & Bontempi, Gianluca (2015).
Calibrating probability with undersampling for unbalanced classification. In Symposium
on Computational Intelligence and Data Mining (CIDM), IEEE
• Dal Pozzolo, Andrea, Caelen, Olivier, Le Borgne, Yann-Aël, Waterschoot, Serge, &
Bontempi, Gianluca (2014). Learned lessons in credit card fraud detection from a
practitioner perspective. Expert Systems With Applications, 41(10), 4915-4928,
Pergamon
• Dal Pozzolo, Andrea, Boracchi, Giacomo, Caelen, Olivier, Alippi, Cesare, & Bontempi,
Gianluca (2018). Credit card fraud detection: A realistic modeling and a novel learning
strategy. IEEE Transactions on Neural Networks and Learning Systems, 29(8),
3784-3797, IEEE
• Dal Pozzolo, Andrea. Adaptive machine learning for credit card fraud detection. ULB
MLG PhD thesis (supervised by Bontempi, Gianluca)
• Carcillo, Fabrizio, Dal Pozzolo, Andrea, Le Borgne, Yann-Aël, Caelen, Olivier, Mazzer,
Yannis, & Bontempi, Gianluca (2018). Scarff: A scalable framework for streaming credit
card fraud detection with Spark. Information Fusion, 41, 182-194, Elsevier
• Carcillo, Fabrizio, Le Borgne, Yann-Aël, Caelen, Olivier, & Bontempi, Gianluca (2018).
Streaming active learning strategies for real-life credit card fraud detection: Assessment
and visualization. International Journal of Data Science and Analytics, 5(4), 285-300,
Springer International Publishing
Source Code
You can visit our GitHub repository to download the templates and scripts for this solution,
and to share your customizations with others.
Document Revisions
Date        Change                In sections
May 2019    Initial publication   --
Notices
Customers are responsible for making their own independent assessment of the information in this document.
This document: (a) is for informational purposes only, (b) represents current AWS product offerings and
practices, which are subject to change without notice, and (c) does not create any commitments or assurances
from AWS and its affiliates, suppliers or licensors. AWS products or services are provided “as is” without
warranties, representations, or conditions of any kind, whether express or implied. The responsibilities and
liabilities of AWS to its customers are controlled by AWS agreements, and this document is not part of, nor
does it modify, any agreement between AWS and its customers.
Fraud Detection Using Machine Learning is licensed under the terms of the Apache License Version 2.0
available at https://www.apache.org/licenses/LICENSE-2.0.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.