Outlier detection
Recent papers in Outlier detection
The paper presents a model based on fuzzy methods for churn prediction in retail banking. The study was done on real, anonymised data of 5000 clients of a retail bank. Real data are a great strength of the study, as a lot of studies often... more
ABSTRACT This article aims to present the main results of an empirical study modelling the time series of the nominal interest rate on lending operations for credit to individuals in Portugal, carried out by Caiado (1997).... more
All known robust location and scale estimators with high breakdown point for multivariate samples are very expensive to compute. In practice, this computation has to be carried out using an approximate subsampling procedure. In this work... more
Spatial data mining is the process of discovering interesting and previously unknown, but potentially useful, patterns from large spatial datasets. Extracting interesting and useful patterns from spatial datasets is more difficult than... more
The analysis of UCR data provides a basis for crime prevention in the United States as well as a sound decision making tool for policy makers. The decisions made with the use of UCR data range from major funding for resource allocation... more
A distance-based outlier detection method that finds the top outliers in an unlabeled data set and provides a subset of it, called outlier detection solving set, that can be used to predict the outlierness of new unseen objects, is... more
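As a rough illustration of the distance-based idea in the abstract above, the following Python sketch scores each point by the sum of distances to its k nearest neighbours and returns the highest-scoring points. The scoring rule, the function names `knn_outlier_scores` and `top_outliers`, and the toy data are assumptions made for illustration; this is not the paper's solving set algorithm.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by the sum of distances to its k nearest neighbours.

    Hypothetical helper: a brute-force O(n^2) scan, fine for small data only.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]))
    return scores


def top_outliers(points, n=1, k=2):
    """Return the indices of the n points with the largest scores."""
    scores = knn_outlier_scores(points, k)
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return order[:n]


# Toy data (assumed): four clustered points and one distant point.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
print(top_outliers(data, n=1, k=2))
```

On this toy data the distant point at index 4 receives the largest score. Predicting the outlierness of unseen objects from a reduced subset, as the abstract describes, would need the paper's full method.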
Clustering is an extremely important task in a wide variety of application domains especially in management and social science research. In this paper, an iterative procedure of two-way clustering method based on multivariate outlier... more
Outlier (or anomaly) detection is an important problem for many domains, including fraud detection, risk analysis, network intrusion and medical diagnosis, and the discovery of significant outliers is becoming an integral aspect of data... more
When a system fails to function properly, health-related data are collected for troubleshooting. However, it is challenging to effectively identify anomalies from the voluminous amount of noisy, high-dimensional data. The traditional... more
Techniques based on agglomerative hierarchical clustering constitute one of the most frequent approaches in unsupervised clustering. Some are based on the single linkage methodology, which has been shown to produce good results with sets... more
Detecting outliers in data is an important problem with interesting applications in a myriad of domains ranging from data cleaning to financial fraud detection and from network intrusion detection to clinical diagnosis of diseases. Over... more
Outlier detection is an important task in many applications; it can lead to the discovery of unexpected, useful or interesting objects in data analysis. Many outlier detection methods are available. However, they are limited by... more
Special thanks are given to my supervisors in Lisbon and Stavanger, Prof. Paulo Urbano and Prof. Chunming Rong, who agreed to supervise this master's thesis project at two different universities, in two countries so far apart, for the... more
Before implementing any multivariate statistical analysis based on empirical covariance matrices, it is important to check whether outliers are present because their existence could induce significant biases. In this article, we present... more
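The bias mentioned in the abstract above can be seen with a small worked example. The Python sketch below computes the sample covariance of two variables with and without a single contaminating observation. The helper `sample_cov` and all data values are assumptions chosen for illustration, not taken from the article.

```python
def sample_cov(xs, ys):
    """Unbiased sample covariance of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Clean data (assumed): y is exactly twice x, so the covariance is positive.
x_clean = [1.0, 2.0, 3.0, 4.0, 5.0]
y_clean = [2.0, 4.0, 6.0, 8.0, 10.0]

# The same data with one contaminating observation appended.
x_dirty = x_clean + [6.0]
y_dirty = y_clean + [-20.0]

print(sample_cov(x_clean, y_clean))
print(sample_cov(x_dirty, y_dirty))
```

With these values a single added observation is enough to flip the sign of the empirical covariance, which is why a check for outliers is advisable before any covariance-based analysis.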
Given the widespread use of modern information technology, a large number of time series may be collected during normal business operations. We use a fast-food restaurant franchise as a case to illustrate how data mining can be applied to... more
An outlier is an object that differs from the other objects in a dataset. In data mining, outlier detection is a growing research area. Generally, outlier detection methods find exceptions or rare cases in a dataset without considering... more
Increased interest in the opportunities provided by artificial intelligence and machine learning has spawned a new field of healthcare research. The new tools under development are targeting many aspects of medical practice, including... more
Empirical QSAR models are only valid in the domain they were trained and validated. Application of the model to substances outside the domain of the model can lead to grossly erroneous predictions. Partial least squares (PLS) regression... more
Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument... more
Every second, a plethora of reviews on various product lines is posted on trending e-commerce websites. The objective of a review section on such websites is to analyze customer satisfaction for sales growth and to help buyers make... more
There is increasing concern about the control of customs operations. While globalization encourages the opening of markets, increasing volumes of imports and exports have been used to conceal several illicit activities, such as tax... more
When analyzing data, outlying observations cause problems because they may strongly influence the result. Robust statistics aims at detecting the outliers by searching for the model fitted by the majority of the data. We present an... more
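A simple univariate instance of fitting to the majority of the data, as the abstract above puts it, is to replace the mean and standard deviation with the median and the median absolute deviation (MAD). The Python sketch below is an illustration under stated assumptions, not the method presented in the paper: the function name `mad_outliers`, the conventional consistency constant 0.6745, the cutoff 3.5, and the sample values are all chosen here for the example.

```python
import statistics


def mad_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold.

    The modified z-score is 0.6745 * (v - median) / MAD, where MAD is the
    median of the absolute deviations from the median.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # More than half of the values are identical: the score is undefined.
        return []
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]


# Toy measurements (assumed): five similar readings and one gross error.
readings = [10.1, 9.9, 10.0, 10.2, 9.8, 25.0]
print(mad_outliers(readings))
```

Because the median and the MAD are barely moved by the single gross error, the value 25.0 is flagged, whereas a mean-and-standard-deviation rule would be distorted by the very observation it is trying to detect.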
Outlier detection is an important branch of data mining concerned with discovering data that deviate markedly from other data patterns. Outlier identification methods can be classified into formal and informal methods. This paper deals with the informal... more
By comparing historical trading data such as daily Open, High, Low, Close, Volume, Number of Trades, Turnover, Delivery percentage etc. of a particular stock with those of its Peer Group and Non Peer Group companies' stocks for a... more
The present article discusses various preprocessing techniques suitable for dealing with time series data in environmental science-related studies. Errors or noise due to electronic sensor faults, faults in the communication channel,... more
As wine becomes increasingly popular, many countries are supporting the growth of this beverage industry. Certification is also needed to assure consumers of quality and to prevent counterfeiting of wine products.... more
In this paper we focus on the impact of additive level outliers on the calculation of risk measures, such as minimum capital risk requirements, and compare four alternatives of reducing these measures' estimation biases. The first three... more
Credit card fraud is occurring at an ever-increasing rate and has become a major problem in the financial sector. Because of this fraud, card users are hesitant to make purchases, and both merchants and financial institutions bear heavy... more
Raw data collected through surveys, experiments, coding of textual artifacts or other quantitative means may not meet the assumptions upon which statistical analyses rely. The presence of univariate or multivariate outliers, skewness or... more
These days, with the popularity and significant advancement of emerging technologies such as the Internet of Things (IoT), Cyber-Physical Systems (CPS), and other wireless sensor technologies, a huge volume of sensor data has been generated for... more
The minimum covariance determinant (MCD) estimator is a highly robust estimator of multivariate location and scatter. It can be computed efficiently with the FAST-MCD algorithm of Rousseeuw and Van Driessen. Since estimating the... more
Center of Mass (CoM) estimation plays a crucial role in legged locomotion. Most walking pattern generators and real-time gait stabilizers commonly assume that the CoM position and velocity are available for feedback. In this thesis we... more
An outlier is an observation that deviates from, or lies far away from, the rest of the data. There are two kinds of outlier methods: tests of discordancy and labeling methods. In this paper, we have considered the medical diagnosis data set finding... more
Outliers in a set of data represent observations that are distinguished from the expected patterns in the observed data. We are attracted to them because they are somehow atypical of what we expect to see in the distribution of the data... more
Modelling with multiple linear regression analysis involves follow-up steps after the full model has been formed. The follow-up is carried out to obtain the best evaluation results for the model. Modelling with multiple regression analysis... more
Outlier detection has recently become an important problem in many industrial and financial applications. In this paper, a novel feature bagging approach for detecting outliers in very large, high dimensional and noisy databases is... more
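The general shape of a feature bagging scheme can be sketched as follows: run a base detector on several random feature subsets and combine the resulting scores. The abstract above is truncated, so every specific in this Python sketch is an assumption made for illustration, including the k-th nearest neighbour distance as the base detector, subset sizes drawn between d/2 and d-1, the summing of scores, and the names `knn_score` and `feature_bagging_scores`.

```python
import math
import random


def knn_score(points, k):
    """Base detector (assumed): distance to the k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def feature_bagging_scores(rows, rounds=10, k=2, seed=0):
    """Sum base-detector scores over random feature subsets."""
    rng = random.Random(seed)
    d = len(rows[0])
    total = [0.0] * len(rows)
    for _ in range(rounds):
        # Draw a subset size, then pick that many features without replacement.
        size = rng.randint(max(1, d // 2), max(1, d - 1))
        features = rng.sample(range(d), size)
        projected = [tuple(row[f] for f in features) for row in rows]
        for i, score in enumerate(knn_score(projected, k)):
            total[i] += score
    return total


# Toy data (assumed): five similar rows and one row that is far away
# in every feature.
rows = [
    (1.0, 1.1, 0.9, 1.0),
    (1.1, 1.0, 1.0, 0.9),
    (0.9, 0.9, 1.1, 1.1),
    (1.0, 1.0, 1.0, 1.0),
    (1.1, 0.9, 0.9, 1.1),
    (9.0, 9.0, 9.0, 9.0),
]
combined = feature_bagging_scores(rows)
print(max(range(len(rows)), key=lambda i: combined[i]))
```

Because each round sees only part of the feature space, a noisy feature affects only some of the rounds, which is the usual motivation for combining detectors in this way.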
For many KDD applications, such as detecting criminal activities in E-commerce, finding the rare instances or the outliers, can be more interesting than finding the common patterns. Existing work in outlier detection regards being an... more
Most outlier detection rules for multivariate data are based on the assumption of elliptical symmetry of the underlying distribution. We propose an outlier detection method which does not need the assumption of symmetry and does not rely... more
In this paper we propose a method for correctly detecting outliers based on a new technique developed to simultaneously evaluate mean, variance and outliers. This method is capable of self-regulating its robustness to suit the... more
Plagiarism detection can be divided into external and intrinsic methods. Naive external plagiarism analysis suffers from computationally demanding full nearest neighbor searches within a reference corpus. We present a conceptually simple... more
Multivariate calibration 1 (MVC1), a MATLAB® toolbox for implementing up to 12 different first-order calibration methodologies through easily managed graphical user interfaces, is presented. The toolbox accepts different input data... more
Cognitive radio is an enabling technology that allows opportunistic users to reuse licensed spectrum in order to overcome the artificial spectrum scarcity. In cognitive radio networks, opportunistic users collaboratively perform spectrum... more