Module_11(c)
The definition of both “normal” and anomalous data significantly varies depending on the context.
Below are a few examples of anomaly detection in action.
1. Financial transactions
Outlier: A massive withdrawal from Ireland on the same account, hinting at potential fraud.
2. Cybersecurity
Outlier: An abrupt increase in data transfer or the use of unknown protocols, signaling a potential
breach or malware.
3. Healthcare
Outlier: A sudden increase in heart rate and decrease in blood pressure, indicating a potential
emergency or equipment failure.
Anomaly detection includes many types of unsupervised methods for identifying divergent samples.
Data specialists choose among them based on the type of anomaly and the context, structure, and
characteristics of the dataset at hand. We'll cover them in the coming sections.
Even though we saw some examples above, let’s look at a real-life story of how anomaly detection
works in finance.
Shaquille O'Neal, a four-time NBA champion, gets traded from the Miami Heat to the Phoenix Suns.
When Shaq arrives at the empty apartment provided by the Phoenix Suns, he wants to furnish it
immediately, in the middle of the night. So he goes to Walmart and makes the biggest purchase in
Walmart history: $70,000. Or at least, he tries to; his card gets declined.
He wonders what could possibly be the problem (he can't be broke!). At 2 a.m., American Express
security calls him and tells him that his card was flagged as stolen, because somebody appeared to
be making a $70,000 purchase at a Walmart in Phoenix.
There are many other real-world applications of anomaly detection beyond finance and fraud
detection:
Cybersecurity
Healthcare
Anomaly detection is deeply woven into the daily services we use, and often we don't even notice it.
Data is the most precious commodity in data science, and anomalies are the most disruptive threats
to its quality. Bad data quality means bad:
Statistical tests
Dashboards
Decisions
As almost all machine learning models depend heavily on the quality of their training data, timely
detection of anomalies is crucial.
Types of Anomalies
Anomaly detection encompasses two broad practices: outlier detection and novelty detection.
Outliers are abnormal or extreme data points that exist only in training data. In contrast, novelties
are new or previously unseen instances compared to the original (training) data.
For example, consider a dataset of daily temperatures in a city. Most days, the temperatures range
between 20°C and 30°C. However, one day, there’s a spike of 40°C. This extreme temperature is an
outlier as it significantly deviates from the usual daily temperature range.
Now, imagine that the city installs a new, more accurate weather monitoring station. As a result, the
dataset starts consistently recording slightly higher temperatures, ranging from 25°C to 35°C. This
sustained increase in temperatures is a novelty, representing a new pattern introduced by the
improved monitoring system.
Anomaly, on the other hand, is a broad term covering both outliers and novelties: it can describe
any abnormal instance in any context.
Identifying the type of anomalies is crucial as it allows you to choose the right algorithm to detect
them.
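In practice, the distinction shapes how a detector is used: an outlier detector is fit on data that already contains anomalies, while a novelty detector is fit on clean data and then judges previously unseen points. Below is a minimal sketch of the novelty case using scikit-learn's Local Outlier Factor on made-up temperature readings (the 20–30°C range and the 40°C spike mirror the example above; all numbers are hypothetical):

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# Training data: hypothetical daily temperatures, mostly between 20 and 30 °C.
train = rng.uniform(20, 30, size=(300, 1))

# Novelty detection: fit on clean training data, then judge unseen points.
detector = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(train)

new_days = np.array([[25.0], [40.0]])  # a typical day and a 40 °C spike
labels = detector.predict(new_days)    # 1 = normal, -1 = novelty
print(labels)
```

With `novelty=True`, the detector refuses to score its own training data and only evaluates new samples, which is exactly the novelty-detection setting described above.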
Types of Outliers
Just as there are two types of anomalies, there are two types of outliers: univariate and
multivariate. Depending on the type, we will use different detection algorithms.
1. Univariate outliers exist in a single variable or feature in isolation. Univariate outliers are
extreme or abnormal values that deviate from the typical range of values for that specific
feature.
2. Multivariate outliers are found by combining the values of multiple variables at the same
time.
For example, consider a dataset of housing prices in a neighborhood. Most houses cost between
$200,000 and $400,000, but there is House A with an exceptionally high price of $1,000,000. When
we analyze only the price, House A is a clear outlier.
Now, let’s add two more variables to our dataset: the square footage and the number of bedrooms.
When we consider the square footage, the number of bedrooms, and the price together, it's House B
that looks odd.
When we look at these variables individually, they seem ordinary. Only when we put them together
do we find out that House B is a clear multivariate outlier.
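To make this concrete, here is a small sketch with made-up housing numbers (all values hypothetical). Each of House B's features falls inside the range observed elsewhere in the neighborhood, so no univariate rule flags it; only a derived, multivariate quantity such as price per square foot reveals the oddity:

```python
import numpy as np

# Hypothetical neighborhood: columns are [square_feet, bedrooms, price].
neighborhood = np.array([
    [2400, 3, 350_000],
    [2800, 4, 390_000],
    [2000, 2, 280_000],
    [3200, 4, 400_000],
    [2600, 3, 360_000],
    [1800, 2, 250_000],
])
house_b = np.array([3200, 2, 260_000])  # large house, few bedrooms, low price

# Feature by feature, House B sits inside the observed ranges...
in_range = (house_b >= neighborhood.min(axis=0)) & (house_b <= neighborhood.max(axis=0))
print(in_range)  # every univariate check passes

# ...yet its price per square foot is far below every other house.
ppsf = neighborhood[:, 2] / neighborhood[:, 0]
print("neighborhood min $/sqft:", ppsf.min())
print("House B $/sqft:", house_b[2] / house_b[0])
```

This is the core difficulty with multivariate outliers: no single column betrays them, so per-feature thresholds pass them through.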
Anomaly detection algorithms differ depending on the type of outliers and the structure in the
dataset.
1. Z-score (standard score): the z-score measures how many standard deviations a data point
is from the mean. Generally, instances with an absolute z-score over 3 are flagged as outliers.
2. Interquartile range (IQR): the IQR is the range between the first quartile (Q1) and the third
quartile (Q3) of a distribution. When an instance falls below Q1 or above Q3 by more than
some multiple of the IQR, it is considered an outlier. The most common multiplier is 1.5,
making the normal range [Q1 − 1.5 * IQR, Q3 + 1.5 * IQR]; anything outside it is an outlier.
3. Modified z-scores: similar to z-scores, but modified z-scores use the median and a measure
called the Median Absolute Deviation (MAD) to find outliers. Since the mean and standard
deviation are easily skewed by outliers, modified z-scores are generally considered more robust.
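The three statistical rules above can be sketched in a few lines of NumPy on made-up data. The cutoffs of 3, 1.5, and 3.5 are the common conventions mentioned above, not fixed laws, and the 0.6745 constant scales MAD to be comparable with the standard deviation for normal data:

```python
import numpy as np

# Hypothetical sensor readings clustered around 100, plus one injected outlier.
data = np.array([100, 102, 98, 101, 99, 103, 97, 100, 104, 96,
                 101, 99, 100, 102, 98, 103, 97, 101, 99, 100, 500.0])

# 1. Z-score: flag points more than 3 standard deviations from the mean.
z_scores = (data - data.mean()) / data.std()
z_outliers = data[np.abs(z_scores) > 3]

# 2. IQR: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
iqr_outliers = data[(data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)]

# 3. Modified z-score: median and MAD instead of mean and std; 3.5 is a
#    commonly used cutoff.
median = np.median(data)
mad = np.median(np.abs(data - median))
modified_z = 0.6745 * (data - median) / mad
mad_outliers = data[np.abs(modified_z) > 3.5]

print(z_outliers, iqr_outliers, mad_outliers)  # each flags only the 500.0
```

Note that with very small samples or many extreme values, the plain z-score can fail to flag an outlier at all, because the outlier itself inflates the mean and standard deviation; that is exactly the weakness the modified z-score addresses.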
For multivariate outliers, we generally use machine learning algorithms. Because they can model
many features jointly, they are able to find intricate patterns in complex datasets:
1. Isolation Forest: uses a collection of isolation trees (similar to decision trees) that recursively
divide complex datasets until each instance is isolated. The instances that get isolated the
quickest are considered outliers.
2. Local Outlier Factor (LOF): LOF measures the local density deviation of a sample compared
to its neighbours. Points with significantly lower density are chosen as outliers.
3. Clustering techniques: techniques such as k-means or hierarchical clustering divide the
dataset into groups. Points that don’t belong to any group or are in their own little clusters
are considered outliers.
4. Angle-based Outlier Detection (ABOD): ABOD looks at the angles a point forms with pairs of
other points. Points far from the bulk of the data see the rest of the dataset within a narrow,
low-variance range of angles, so they can be flagged as outliers.
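As a sketch of the first two methods, scikit-learn ships both as ready-made estimators. The housing-style data below is made up, with one multivariate outlier (a House-B-like point) planted at the last row; `contamination` tells each detector roughly what fraction of points to treat as outliers:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(42)
# Hypothetical housing data: columns are [square footage, price].
normal = rng.normal(loc=[2000, 300_000], scale=[300, 40_000], size=(200, 2))
# Planted outlier: each value is plausible alone, the combination is odd
# (small house, very high price).
house_b = np.array([[1200, 900_000]])
X = np.vstack([normal, house_b])

# Isolation Forest: points that random splits isolate quickest get label -1.
iso = IsolationForest(contamination=0.01, random_state=0).fit(X)
iso_labels = iso.predict(X)  # 1 = inlier, -1 = outlier

# LOF: points in regions of much lower local density get label -1.
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.01)
lof_labels = lof.fit_predict(X)

print("Isolation Forest flags rows:", np.where(iso_labels == -1)[0])
print("LOF flags rows:", np.where(lof_labels == -1)[0])
```

Both detectors flag the planted point at row 200; which one to prefer depends on the trade-offs discussed next.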
Apart from the type of anomalies, you should consider dataset characteristics and project
constraints. For example, Isolation Forest works well on almost any dataset, but it is slower and
more computation-heavy since it is an ensemble method. In comparison, LOF is very fast to train
but may not perform as well as Isolation Forest.