Data Mining Using Learning Techniques For Fraud Detection
3. TYPE OF LEARNING FOR FRAUD DETECTION
Anomaly Detection
Anomalies are the data points that are considerably different from the remainder of the data. An anomaly is a pattern in the data that does not conform to the expected behaviour. Anomaly detection is an unsupervised method for fraud detection.
Applications: credit card fraud detection, telecommunication fraud detection, network intrusion detection, fault detection.
General Steps
- Build a profile of the normal behaviour. The profile can be patterns or summary statistics for the overall population.
- Use the normal profile to detect anomalies. Anomalies are observations whose characteristics differ significantly from the normal profile.
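The two general steps can be illustrated with a minimal sketch. It assumes the profile is just the mean and standard deviation of one attribute, and that an observation "differs significantly" when it lies more than three standard deviations from the mean. The function names, the threshold and the call-length figures are hypothetical and are not taken from the source.

```python
from statistics import mean, stdev

def build_profile(normal_history):
    """Step 1: summarise known-normal observations as (mean, standard deviation)."""
    return mean(normal_history), stdev(normal_history)

def deviates(value, profile, threshold=3.0):
    """Step 2: True when value lies more than `threshold` standard deviations from the mean."""
    mu, sigma = profile
    if sigma == 0:
        # Degenerate profile: every normal value was identical.
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical call lengths in minutes, assumed to be normal behaviour.
normal_calls = [4.0, 5.5, 3.8, 6.1, 4.9, 5.2, 4.4, 5.8]
profile = build_profile(normal_calls)

# New observations are checked against the profile.
new_calls = [5.0, 4.2, 47.0]
flags = [deviates(c, profile) for c in new_calls]
```

Only the 47-minute call is flagged here. A real profile would usually cover several attributes and would be rebuilt as behaviour changes.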
[Figure: plot with regions N1 to N4 and points O1 to O5; a second plot marks point P1 as an anomaly]
Example of Statistical Approach
Apply a statistical test that depends on:
- the data distribution
- the parameters of the distribution (e.g., mean, variance)
- the number of expected outliers (confidence limit)
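As a sketch of such a test, the code below assumes the data follow a normal distribution, estimates its mean and standard deviation from the sample, and treats values outside a two-sided 90% confidence limit as outliers. The distribution choice, the 90% level and the monthly call counts are assumptions made for illustration.

```python
from statistics import NormalDist, mean, stdev

def confidence_limits(values, confidence=0.90):
    """Two-sided limits under an assumed normal distribution."""
    mu, sigma = mean(values), stdev(values)
    tail = (1.0 - confidence) / 2.0          # probability mass in each tail
    z = NormalDist().inv_cdf(1.0 - tail)     # critical value, about 1.645 for 90%
    return mu - z * sigma, mu + z * sigma

def outliers(values, confidence=0.90):
    """Values that fall outside the confidence limits."""
    low, high = confidence_limits(values, confidence)
    return [v for v in values if v < low or v > high]

# Hypothetical number of calls per month for one customer.
monthly_calls = [52, 48, 50, 47, 53, 49, 51, 50, 46, 54, 120]
suspicious = outliers(monthly_calls)
```

Because the mean and variance are estimated from data that already contain the outlier, the limits are widened by it. Estimating the parameters from known-normal data, where available, avoids this masking effect.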
In this example, N1, N2, N3 and N4 are regions of normal behaviour, and points O1, O2, O3, O4 and O5 are anomalies.
4. TYPES OF ANOMALY DETECTION
Graphical and Statistical-Based
- Calculation of various statistical parameters such as averages, quantiles, performance metrics and probability distributions. For example, the averages may include the average length of a call, the average number of calls per month and the average delay in bill payment.
- Models and probability distributions of various business activities, expressed either in terms of various parameters or as probability distributions.
- The box plot (1-D), scatter plot (2-D) and spin plot (3-D) are the graphical approaches for detecting fraud.
Example of Graphical Approach
Here the point P1 is different from the other points in the series, so it is an anomaly (outlier).
The major limitations of the graphical approach to detecting fraud are: [list missing in the source]
[Figure: probability against data value, with regions labelled 90%, 5% and 5%]
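The 1-D box plot mentioned under the graphical approach has a common numeric counterpart, sketched below: values beyond 1.5 times the interquartile range from the quartiles are treated as outliers. The 1.5 multiplier is the usual box-plot convention rather than something stated in the source, and the payment-delay data are hypothetical.

```python
from statistics import quantiles

def boxplot_outliers(values, whisker=1.5):
    """Values lying beyond the box-plot whiskers (quartile +/- whisker * IQR)."""
    q1, _, q3 = quantiles(values, n=4)   # first quartile, median, third quartile
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical delays in bill payment, in days.
payment_delays = [2, 3, 1, 4, 2, 3, 5, 2, 3, 30]
late_payers = boxplot_outliers(payment_delays)
```

Unlike reading the plot by eye, the rule gives the same answer every time, although the choice of multiplier remains a judgement call.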
Distance-Based Approach
- Nearest-neighbour based. Key: normal points have close neighbours, while anomalies are located far from other points.
- Density based. Key: compute the local densities of particular regions and declare instances in low-density regions as potential anomalies.
- Clustering. Key assumption: normal data records belong to large and dense clusters, while anomalies do not belong to any of the clusters or form very small clusters.
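The nearest-neighbour idea can be sketched by scoring each point with the distance to its k-th nearest neighbour and flagging points whose score exceeds a threshold. The values of k and the threshold, and the sample points, are illustrative assumptions. This brute-force version compares every pair of points, so it suits small data sets only.

```python
import math

def knn_scores(points, k=2):
    """Distance from each point to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

def knn_anomalies(points, k=2, threshold=3.0):
    """Points whose k-th nearest neighbour is farther away than `threshold`."""
    return [p for p, s in zip(points, knn_scores(points, k)) if s > threshold]

# Two hypothetical dense groups of normal points and one isolated point.
points = [
    (1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3),
    (5.0, 5.0), (5.2, 4.9), (4.9, 5.3),
    (9.0, 0.5),
]
isolated = knn_anomalies(points)
```

A density-based variant would instead count how many neighbours fall within a fixed radius of each point and flag the points with the lowest counts.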
Distributed Anomaly Detection
Data in many anomaly detection applications may come from different sources, for example in network intrusion detection, credit card fraud and aviation safety. Failures that occur in multiple locations simultaneously may go undetected when only the data from a single location is analysed, so there is a need for high-performance, distributed algorithms for the correlation and integration of anomalies. There are two basic techniques for distributed anomaly detection: the simple data exchange technique and the distributed nearest-neighbour technique.
In this diagram, O2 is the nearest neighbour of cluster C2, and C1 illustrates the density-based approach.
5. CONTEXTUAL AND COLLECTIVE BASED
Contextual: identifies the context around a data instance and determines whether the instance is anomalous with respect to that context, using a set of behavioural attributes.
Conditional: each data point is represented as (x, y) coordinates, where x denotes the environmental attributes and y denotes the indicator attributes.
Advantage: detects anomalies that are hard to detect when analysed from a global perspective.
Challenge: it is difficult to identify good contextual attributes.
Collective Based
- Detects collective anomalies.
- Exploits the relationships among the data instances.
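The contextual approach described in this section, where each record is an (x, y) pair, can be sketched as follows. The sketch assumes x is a categorical context (here a month) and y is a single indicator (here an amount spent), and it compares each y only with the other values that share the same x. The threshold, field choices and figures are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

def contextual_anomalies(records, threshold=2.0):
    """Flag (context, value) pairs whose value is unusual within its own context."""
    by_context = defaultdict(list)
    for context, value in records:
        by_context[context].append(value)

    flagged = []
    for context, value in records:
        group = by_context[context]
        if len(group) < 2:
            continue  # not enough data to describe this context
        mu, sigma = mean(group), stdev(group)
        if sigma > 0 and abs(value - mu) / sigma > threshold:
            flagged.append((context, value))
    return flagged

# Hypothetical spending: high amounts are usual in December but not in July.
records = (
    [("December", v) for v in (850, 900, 880, 920, 870)]
    + [("July", v) for v in (200, 220, 210, 190, 205, 215, 195, 225, 208, 900)]
)
unusual = contextual_anomalies(records)
```

The amount 900 is flagged in July but not in December, which is the kind of anomaly that a single global threshold over all amounts would miss.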
Collective-based anomalies are of three types:
- Sequential anomaly: detects anomalous sequences.
- Spatial anomaly: detects anomalous sub-regions in a spatial data set.
- Graphical anomaly: detects anomalous sub-graphs in graph data.
Online Anomaly Detection
In many rare-event applications, data arrives continuously at an enormous pace, and analysing such data is a significant challenge. Examples of such rare events occur in video analysis, network traffic monitoring, aircraft safety and fraudulent credit card transactions.
Drawback: if arriving data points start to form a new data cluster, this method will not be able to detect these points as outliers, nor the time at which the change occurs.
CONCLUSION
- Anomaly detection is based on a profile that represents the normal behaviour of the users or the networks, and detects attacks as significant deviations from this profile.
- The major benefit of anomaly detection is its potential to recognise fraud and unforeseen attacks.
- The major approaches used for fraud and anomaly detection are statistical methods, clustering, expert systems and outlier detection schemes.
- Anomaly detection can detect the critical information in data.
- The nature of the anomaly detection problem depends on the application domain; for example, cases such as credit card fraud and web intrusion are addressed by online and distributed anomaly detection techniques.