Data Mining Using Learning Techniques For Fraud Detection


Proceedings of the 5th National Conference; INDIACom-2011, Computing For Nation Development, March 10-11, 2011
Bharati Vidyapeeth's Institute of Computer Applications and Management, New Delhi

Data Mining Using Learning Techniques for Fraud Detection


Pooja Sachdeva (1) and Sangeeta Behl (2)
(1) Sr. Lecturer, (2) Lecturer, (1,2) DAV Institute of Management, NH 3, Faridabad
(1) pooja.sachdeva78@gmail.com and (2) behl.sangeeta@gmail.com

ABSTRACT
Data mining is a combination of database and artificial intelligence technologies. It is the process of identifying and extracting patterns from data, particularly from very large and/or complex data sets. The major focus of machine learning research is to automatically learn to recognize complex patterns and make intelligent decisions based on data. Data mining and machine learning are relatively new techniques that are proving to be extremely effective in detecting fraud, and they offer insurers new opportunities to reduce losses. Fraud involving cell phones, insurance claims, tax return claims, credit card transactions and the like represents a significant problem for governments and businesses, yet detecting and preventing fraud is not a simple task. Fraud is an adaptive crime, so it needs special methods of intelligent data analysis to detect and prevent it. These methods exist in the areas of Knowledge Discovery in Databases (KDD), data mining, machine learning and statistics.
Techniques used for fraud detection fall into two primary classes: statistical techniques and artificial intelligence (AI). Examples of statistical techniques include data preprocessing for detection, validation, error correction and filling in of missing or incorrect data, and the calculation of statistical parameters such as averages, quantiles, performance metrics and probability distributions. The main AI techniques used for fraud management include data mining, which classifies, clusters and segments the data and automatically finds associations and rules that may signify interesting patterns, including those related to fraud, and machine learning, which automatically identifies characteristics of fraud. This paper presents the detection of fraud through data mining and machine learning techniques.

1.INTRODUCTION
Data mining is the extraction of hidden predictive information from large databases.
Data mining derives its name from the similarities between searching for valuable business information in a large database (for example, finding linked products in gigabytes of store scanner data) and mining a mountain for a vein of valuable ore. Data mining is a combination of database and artificial intelligence technologies.
Machine learning is the ability of a program to learn from experience, that is, to modify its execution on the basis of newly acquired information. In other words, it is the ability of a machine to improve its performance based on previous results.
Fraud is any act of deception carried out for the purpose of unfair, undeserved and/or unlawful gain. Because fraud is an adaptive crime, it needs special methods of intelligent data analysis to detect and prevent it. Types of fraud include credit card fraud, insurance claim fraud, mobile/cell phone fraud and insider trading.
Fraud detection is concerned with the detection of fraud cases from logged data of system and user behavior. Data mining and machine learning are relatively new techniques that are proving to be extremely effective in detecting fraud, and they offer insurers new opportunities to reduce losses.

2.COMMON MACHINE LEARNING
There are many types of machine learning.
Supervised Learning, in which the data is labeled with the correct answers. The two most common types of supervised learning are classification and regression. An example of a classification problem is for the computer to learn how to recognize handwritten digits. Supervised learning can also be used in medical diagnosis: for instance, given a set of attributes about potential cancer patients, and whether those patients actually had cancer, the computer could learn how to distinguish between likely cancer patients and possible false alarms.
Unsupervised Learning, in which we are given a collection of unlabeled data that we wish to analyze and discover patterns within, e.g. dimension reduction and clustering.
The goal is to have the computer learn how to do something that we don't tell it how to do. Clustering can be useful when there is enough data to form clusters (though this turns out to be difficult at times), and especially when additional data about members of a cluster can be used to produce further results due to dependencies in the data.
Reinforcement Learning, in which an agent seeks to learn the optimal actions to take based on its past actions, e.g. a robot.
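
To make the difference between supervised and unsupervised learning concrete, here is a minimal Python sketch. It is not taken from the paper: the transaction amounts, the labels "legit" and "fraud", and the helper names `train_centroids`, `classify` and `two_means` are all hypothetical, chosen only for illustration. The supervised part learns one average amount per label and classifies by the nearest average. The unsupervised part receives the same amounts without labels and splits them into two clusters with a one-dimensional k-means (k = 2).

```python
from statistics import mean

# Hypothetical toy data: transaction amounts with labels (assumed, not from the paper).
labeled = [(12.0, "legit"), (15.0, "legit"), (9.0, "legit"),
           (480.0, "fraud"), (510.0, "fraud")]


def train_centroids(examples):
    """Supervised step: average the amounts observed for each label."""
    by_label = {}
    for amount, label in examples:
        by_label.setdefault(label, []).append(amount)
    return {label: mean(values) for label, values in by_label.items()}


def classify(amount, centroids):
    """Predict the label whose average amount is closest to the given amount."""
    return min(centroids, key=lambda label: abs(amount - centroids[label]))


def two_means(values, iterations=10):
    """Unsupervised step: split unlabeled values into two clusters (1-D k-means, k=2)."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not near_low or not near_high:
            break  # all values collapsed into one cluster; stop refining
        low, high = mean(near_low), mean(near_high)
    return low, high


centroids = train_centroids(labeled)           # uses the labels
prediction = classify(450.0, centroids)        # label for a new amount
cluster_centres = two_means([amount for amount, _ in labeled])  # ignores the labels
```

In this toy data the two clusters found without labels happen to coincide with the two labeled groups; with real transaction data there is no such guarantee, which is one reason clustering can be difficult in practice.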

Copy Right INDIACom-2011 ISSN 0973-7529 ISBN 978-93-80544-00-7


3.TYPE OF LEARNING FOR FRAUD DETECTION
Anomaly Detection: Anomalies are the set of data points that are considerably different from the remainder of the data. An anomaly is a pattern in the data that does not conform to the expected behaviour. Anomaly detection is an unsupervised method for fraud detection.
Applications: credit card fraud detection, telecommunication fraud detection, network intrusion detection and fault detection.
General Steps: First, build a profile of the normal behavior; the profile can be patterns or summary statistics for the overall population. Then, use the normal profile to detect anomalies; anomalies are observations whose characteristics differ significantly from the normal profile.
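
The two general steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the profile is taken to be the mean and standard deviation of past observations, "differs significantly" is interpreted as lying more than `threshold` standard deviations from the mean, and the spending figures and function names are hypothetical.

```python
from statistics import mean, stdev


def build_profile(normal_values):
    """Step 1: summarise normal behaviour as (mean, standard deviation)."""
    return mean(normal_values), stdev(normal_values)


def is_anomaly(value, profile, threshold=3.0):
    """Step 2: flag a value that lies far from the normal profile."""
    centre, spread = profile
    if spread == 0:
        return value != centre  # no variation seen, so any change is unusual
    return abs(value - centre) / spread > threshold


# Hypothetical daily card spend for one customer (assumed data).
history = [42.0, 38.5, 45.0, 40.0, 41.5, 39.0, 43.0]
profile = build_profile(history)
flags = [is_anomaly(v, profile) for v in (44.0, 250.0)]
```

The threshold of three standard deviations is a common convention rather than something the paper prescribes; a lower value flags more observations and raises the number of false alarms.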

[Figure: example data set with normal regions and anomalies.] Here, in this example, N1, N2, N3 and N4 are regions of normal behaviour, and the points O1, O2, O3, O4 and O5 are anomalies.

4.TYPES OF ANOMALY DETECTION
Graphical & Statistical-based: Calculation of various statistical parameters such as averages, quantiles, performance metrics and probability distributions. For example, the averages may include average length of call, average number of calls per month and average delay in bill payment. Models of various business activities can be expressed either in terms of various parameters or as probability distributions. Box plots (1-D), scatter plots (2-D) and spin plots (3-D) are the graphical approaches for detecting fraud.
Example of Graphical Approach: [Figure: series of points with P1 marked as an anomaly.] Here the point P1 is different from the other points in the series, so it is an anomaly or outlier. The major limitations of the graphical approach to detecting fraud are that it is time consuming and subjective.
Example of Statistical Approach: Apply a statistical test that depends on the data distribution, the parameters of the distribution (e.g., mean, variance) and the number of expected outliers (confidence limit). [Figure: probability plotted against data value, with 90% in the central region and 5% in each tail.]
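
The statistical approach described above (a test that depends on the data distribution, its parameters and a confidence limit) can be sketched as follows. The sketch assumes the data are roughly normally distributed, estimates the mean and standard deviation from past observations, and treats anything outside the central 90% of the fitted distribution (5% in each tail) as a potential outlier. The call durations and function names are hypothetical; `NormalDist` comes from the Python standard library (Python 3.8 or later).

```python
from statistics import NormalDist


def central_interval(samples, coverage=0.90):
    """Fit a normal distribution and return the bounds of its central region."""
    dist = NormalDist.from_samples(samples)  # estimates mean and std. deviation
    tail = (1.0 - coverage) / 2.0            # probability mass left in each tail
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


def outliers(values, interval):
    """Return the values that fall in either tail of the fitted distribution."""
    lower, upper = interval
    return [v for v in values if v < lower or v > upper]


# Hypothetical average call lengths in minutes (assumed data).
call_minutes = [3.1, 2.8, 3.4, 2.9, 3.0, 3.3, 2.7, 3.2]
interval = central_interval(call_minutes)
suspicious = outliers([3.0, 9.5, 2.9, 0.2], interval)
```

The choice of 90% coverage mirrors the figure; by construction about 10% of genuinely normal observations would also fall outside such an interval, so flagged values are candidates for review, not confirmed fraud.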

Distance Based Approach
Nearest-neighbor based: The key assumption is that normal points have close neighbors, while anomalies are located far from other points.
Density based: Compute the local densities of particular regions and declare instances in low-density regions as potential anomalies.
Clustering: The key assumption is that normal data records belong to large and dense clusters, while anomalies do not belong to any of the clusters or form very small clusters.
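
As an illustration of the nearest-neighbor idea, the following Python sketch scores each point by the distance to its k-th nearest neighbor, so that isolated points receive large scores. The points, the value of `k` and the function name `knn_scores` are hypothetical, and the brute-force search is only suitable for small data sets.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def knn_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        distances = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(distances[k - 1])
    return scores


# Hypothetical 2-D observations: four close together and one far away.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2), (8.0, 8.0)]
scores = knn_scores(points, k=2)
most_anomalous = points[scores.index(max(scores))]
```

A density-based variant would compare each point's score with the scores of its neighbors instead of using the raw distance, which helps when normal regions have different densities.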


Distributed Anomaly Detection
Data in many anomaly detection applications may come from different sources, for example in network intrusion detection, credit card fraud and aviation safety. A failure that occurs in multiple locations simultaneously may go undetected when only data from a single location is analyzed, so there is a need for high-performance, distributed algorithms for the correlation and integration of anomalies. There are two basic techniques for distributed anomaly detection: the simple data exchange technique and the distributed nearest neighbor technique.
[Figure] Here, in this diagram, O2 is the nearest neighbor of cluster C2, and C1 illustrates the density-based approach.

5.CONTEXTUAL AND COLLECTIVE BASED
Contextual: It identifies the context around a data instance and determines whether the data instance is anomalous with respect to that context, using a set of behavioral attributes.
Conditional: Each data point is represented as (x, y) coordinates, where x denotes environmental attributes and y denotes indicator attributes.
Advantage: It detects anomalies that are hard to detect when analyzed from a global perspective.
Challenge: It is difficult to identify good contextual attributes.
Collective Based: It detects collective anomalies by exploiting the relationships among the data instances.
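
The contextual idea in this section can be illustrated with a short Python sketch. Each record is a (context, value) pair, where the context plays the role of the environmental attribute and the value is the behavioral attribute. A value is flagged when it deviates strongly from the other values in the same context, even if it looks ordinary across the whole data set. The seasonal bill amounts, the threshold and the function name are hypothetical assumptions, not part of the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev


def contextual_anomalies(records, threshold=2.0):
    """Flag (context, value) pairs that are unusual within their own context."""
    by_context = defaultdict(list)
    for context, value in records:
        by_context[context].append(value)

    flagged = []
    for context, value in records:
        values = by_context[context]
        spread = pstdev(values)  # spread of the behavioral attribute in this context
        if spread > 0 and abs(value - mean(values)) / spread > threshold:
            flagged.append((context, value))
    return flagged


# Hypothetical monthly bills: about 300 is normal in winter but not in summer.
winter = [("winter", v) for v in (300.0, 310.0, 290.0, 305.0, 295.0, 300.0)]
summer = [("summer", v) for v in (60.0, 65.0, 55.0, 62.0, 58.0,
                                  61.0, 59.0, 63.0, 57.0, 300.0)]
flagged = contextual_anomalies(winter + summer)
```

The summer bill of 300 is flagged while the identical winter bill is not, which is the kind of anomaly that is hard to see from a global perspective. Choosing the context attribute (here, the season) is the difficult part in practice.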
Collective-based anomalies are of three types.
Sequential Anomaly: Detects anomalous sequences.
Spatial Anomaly: Detects anomalous sub-regions in a spatial data set.
Graphical: Detects anomalous sub-graphs in graph data.
Online Anomaly Detection
Data for many rare events arrives continuously at an enormous pace, and analyzing such data is a significant challenge. Examples of such rare events arise in video analysis, network traffic monitoring, aircraft safety and fraudulent credit card transactions.
Drawback: If arriving data points start to create a new data cluster, this method will not be able to detect these points as outliers, nor the time at which the change occurs.

CONCLUSION
Anomaly detection is based on a profile that represents the normal behavior of the users or the networks, and it detects attacks as significant deviations from this profile. The major benefit of anomaly detection is its potential to recognize fraud and unforeseen attacks. The major approaches used for fraud/anomaly detection are statistical methods, clustering, expert systems and outlier detection schemes. Anomaly detection can detect critical information in data. The nature of the anomaly detection problem depends on the application domain; for example, cases like credit card fraud and web intrusion are addressed by online and distributed anomaly detection techniques.
