Fuzzy based binary feature profiling for modus operandi analysis
- Subject Areas
- Algorithms and Analysis of Algorithms, Artificial Intelligence, Data Mining and Machine Learning
- Keywords
- Modus Operandi Analysis, Fuzzy Inference Systems, Binary Feature Analysis, Classification, Association Rule Mining
- Copyright
- © 2015 Chamikara et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
- Cite this article
- 2015. Fuzzy based binary feature profiling for modus operandi analysis. PeerJ PrePrints 3:e1532v1 https://doi.org/10.7287/peerj.preprints.1532v1
Abstract
It is well known that some criminals follow perpetual methods of operation, known as modus operandi (MO), a term commonly used to describe habitual ways of committing an act, especially in the context of criminal investigations. These modus operandi can then be used to link criminals to other crimes in which the suspect has not yet been identified. This paper presents a method for identifying the perpetual modus operandi of criminals by analyzing their previous convictions. The method generates a feature matrix for a particular suspect based on the flow of events. From this feature matrix, two representative modus operandi are derived: the complete modus operandi and the dynamic modus operandi. These two representative modus operandi are compared with the flow of events of the crime under investigation in order to relate it to a particular criminal. The comparison uses several operations to produce two further outputs: a completeness probability and a deviation probability. These two outputs are fed into a fuzzy inference system, which generates a score measuring the similarity between the suspect and the crime at hand. The method was evaluated using actual crime data and four open data sets, and ROC analysis was then performed to assess the validity and generalizability of the proposed method. In addition, a comparison with five other classification algorithms showed that the proposed method performs competitively with related methods.
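The abstract does not give the exact definitions of the feature matrix, the two representative modus operandi, the two probabilities, or the fuzzy rule base, so the following is only a minimal Python sketch under assumed definitions, not the authors' method. Assumptions: the complete MO is the union of features across the suspect's past convictions; the dynamic MO keeps features present in at least half of them (the `threshold` parameter is hypothetical); completeness is the share of dynamic-MO features seen in the new crime; deviation is the share of the crime's features that fall outside the complete MO; and the score comes from a zero-order Sugeno-style inference with four invented rules, swapped in for whatever fuzzy system the paper actually uses. All function names are hypothetical.

```python
from typing import List, Sequence

Vector = List[int]


def complete_mo(matrix: Sequence[Sequence[int]]) -> Vector:
    """Assumed 'complete MO': a feature is set if it occurred in ANY past conviction."""
    return [int(any(col)) for col in zip(*matrix)]


def dynamic_mo(matrix: Sequence[Sequence[int]], threshold: float = 0.5) -> Vector:
    """Assumed 'dynamic MO': a feature is set if it occurred in at least
    `threshold` of the past convictions (hypothetical parameter)."""
    n = len(matrix)
    return [int(sum(col) / n >= threshold) for col in zip(*matrix)]


def completeness(crime: Sequence[int], mo: Sequence[int]) -> float:
    """Assumed definition: share of the MO's features that also appear in the crime."""
    expected = sum(mo)
    hits = sum(c & m for c, m in zip(crime, mo))
    return hits / expected if expected else 0.0


def deviation(crime: Sequence[int], mo: Sequence[int]) -> float:
    """Assumed definition: share of the crime's features that lie outside the MO."""
    observed = sum(crime)
    outside = sum(c & (1 - m) for c, m in zip(crime, mo))
    return outside / observed if observed else 0.0


def low(x: float) -> float:
    # Shoulder membership: 1 at x = 0, falling linearly to 0 at x = 1.
    return max(0.0, min(1.0, 1.0 - x))


def high(x: float) -> float:
    # Shoulder membership: 0 at x = 0, rising linearly to 1 at x = 1.
    return max(0.0, min(1.0, x))


def fuzzy_score(comp: float, dev: float) -> float:
    """Zero-order Sugeno inference over four hypothetical rules; min acts as AND
    and the crisp score is the firing-strength-weighted average of the rule outputs."""
    rules = [
        (min(high(comp), low(dev)), 1.0),   # complete and consistent -> strong match
        (min(high(comp), high(dev)), 0.5),  # complete but deviating -> moderate match
        (min(low(comp), low(dev)), 0.3),    # incomplete but consistent -> weak match
        (min(low(comp), high(dev)), 0.0),   # incomplete and deviating -> no match
    ]
    total = sum(w for w, _ in rules)
    return sum(w * z for w, z in rules) / total if total else 0.0


if __name__ == "__main__":
    # Rows: a suspect's past convictions; columns: binary flow-of-events features.
    history = [
        [1, 1, 0, 1, 0],
        [1, 1, 0, 0, 0],
        [1, 0, 1, 1, 0],
    ]
    crime = [1, 1, 0, 1, 0]
    comp = completeness(crime, dynamic_mo(history))
    dev = deviation(crime, complete_mo(history))
    print(f"{fuzzy_score(comp, dev):.2f}")  # 1.00
```

Keeping the two probabilities separate before the fuzzy step mirrors the abstract's structure: one input rewards overlap with the suspect's habitual behaviour, the other penalises behaviour never seen before, and the rule base decides how to trade them off.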
Author Comment
This is the first version of the paper. The paper is currently under review at PeerJ.
Supplemental Information
Balloons Dataset
This data set was generated from an experiment on stretching a collection of balloons, carried out with a group of adults and children. In the data set, Inflated is true if (color = yellow and size = small) or (age = adult and act = stretch). There are two output classes, T (inflated) and F (not inflated); two colors, yellow and purple; two sizes, large and small; two act types, stretch and dip; and two age groups, adult and child. After the preprocessing step, the total number of input columns became 8.
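Assuming the preprocessing is a one-hot encoding (two values for each of four attributes gives 2 + 2 + 2 + 2 = 8 columns), the following Python sketch encodes every possible attribute combination and applies the labelling rule stated above. It enumerates all 16 combinations for illustration only; the actual UCI files list particular records and may repeat combinations, so this is not the data set itself. `SCHEMA`, `one_hot` and `inflated` are hypothetical names.

```python
from itertools import product

# Attribute order and values as listed in the data set description.
SCHEMA = {
    "color": ["yellow", "purple"],
    "size": ["large", "small"],
    "act": ["stretch", "dip"],
    "age": ["adult", "child"],
}


def one_hot(record: dict) -> list:
    """Expand each categorical attribute into one binary column per value."""
    bits = []
    for attr, values in SCHEMA.items():
        bits.extend(int(record[attr] == v) for v in values)
    return bits


def inflated(record: dict) -> str:
    """Labelling rule stated for this data set."""
    rule = (record["color"] == "yellow" and record["size"] == "small") or (
        record["age"] == "adult" and record["act"] == "stretch"
    )
    return "T" if rule else "F"


rows = [dict(zip(SCHEMA, combo)) for combo in product(*SCHEMA.values())]
encoded = [(one_hot(r), inflated(r)) for r in rows]
print(len(encoded[0][0]))                         # 8
print(sum(label == "T" for _, label in encoded))  # 7
```

Seven of the 16 combinations are inflated: 4 satisfy the color/size clause, 4 satisfy the age/act clause, and 1 satisfies both.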
Crime modus operandi dataset
This data set consists of 48 columns, each indicating whether a particular event in the flow of a crime took place. It contains the modus operandi of 20 criminals.
Balance dataset
This data set was generated to model psychological experimental results. Each example is classified as having the balance scale tip to the right, tip to the left, or be balanced. The attributes are the left weight, the left distance, the right weight, and the right distance. After the preprocessing step, the number of input variables became 20 binary values.
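In the UCI Balance Scale data each of the four attributes takes the values 1 to 5, and the class is found by comparing left weight × left distance with right weight × right distance. Assuming the preprocessing is a one-hot encoding, 4 attributes × 5 levels gives the 20 binary inputs mentioned above. The Python sketch below enumerates all 5^4 = 625 attribute combinations, which matches the instance count of the UCI version; `tip` and `one_hot` are hypothetical names.

```python
from collections import Counter
from itertools import product

LEVELS = range(1, 6)  # each attribute takes the values 1..5


def tip(lw: int, ld: int, rw: int, rd: int) -> str:
    """Class rule: compare the torque weight * distance on each side."""
    left, right = lw * ld, rw * rd
    if left > right:
        return "L"
    if right > left:
        return "R"
    return "B"


def one_hot(values) -> list:
    """Encode four attributes with five levels each as 4 * 5 = 20 binary inputs."""
    return [int(v == level) for v in values for level in LEVELS]


# Attribute order: left weight, left distance, right weight, right distance.
data = [(one_hot(x), tip(*x)) for x in product(LEVELS, repeat=4)]
counts = Counter(label for _, label in data)
print(len(data), len(data[0][0]))            # 625 20
print(counts["L"], counts["R"], counts["B"])  # 288 288 49
```

The balanced class is small because only 49 of the 625 combinations give equal torques; the two tipping classes split the rest symmetrically.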
Dermatology dataset
This data set was created from dermatology tests carried out on skin samples taken for the evaluation of 22 histopathological features. The values of the histopathological features were determined by analyzing the samples under a microscope. The data set was modified to suit the proposed algorithm, following the data preprocessing method proposed in the paper. As a result, the processed data set has 97 input variables and one class variable.
Car evaluation dataset
The Car Evaluation Database was derived from a simple hierarchical decision model originally developed for the demonstration of DEX. The preprocessed data set contains 21 input variables.
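Assuming the preprocessing one-hot encodes each categorical attribute (the description does not say so explicitly), the count of 21 follows from the six attributes of the UCI Car Evaluation data, which have 4, 4, 4, 3, 3 and 3 possible values. A minimal Python sketch; `SCHEMA`, `COLUMNS` and `encode_line` are hypothetical names, and the sample record is written in the UCI comma-separated format with the class label last.

```python
# Attribute order and values of the UCI Car Evaluation data (class label is the last field).
SCHEMA = [
    ("buying", ["vhigh", "high", "med", "low"]),
    ("maint", ["vhigh", "high", "med", "low"]),
    ("doors", ["2", "3", "4", "5more"]),
    ("persons", ["2", "4", "more"]),
    ("lug_boot", ["small", "med", "big"]),
    ("safety", ["low", "med", "high"]),
]

# One binary column per (attribute, value) pair: 4 + 4 + 4 + 3 + 3 + 3 = 21.
COLUMNS = [f"{name}={value}" for name, values in SCHEMA for value in values]


def encode_line(line: str):
    """Split a raw comma-separated record into 21 binary inputs and its class label."""
    *fields, label = line.strip().split(",")
    if len(fields) != len(SCHEMA):
        raise ValueError(f"expected {len(SCHEMA)} attributes, got {len(fields)}")
    bits = []
    for (name, values), field in zip(SCHEMA, fields):
        if field not in values:
            raise ValueError(f"unknown value {field!r} for {name}")
        bits.extend(int(field == v) for v in values)
    return bits, label


bits, label = encode_line("vhigh,vhigh,2,2,small,low,unacc")
print(len(COLUMNS), sum(bits), label)  # 21 6 unacc
```

Rejecting unknown values, rather than silently emitting an all-zero group, keeps every encoded row at exactly six active bits.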