1 Dashen - ATM - Fasika Wondimu 2019
Designing a model for identifying the reason for ATM out of service:
The case of Dashen bank
By
Fasika Wondimu
12 February 2019
Designing a model for identifying the reason for ATM out of service:
The case of Dashen bank
By
Fasika Wondimu
Advisor
12 February 2019
Designing a model for identifying the reason for ATM out of service:
The case of Dashen bank
By
Fasika Wondimu
_________________________________ Examiner______________________________
ACKNOWLEDGEMENT
First, I give glory to God, who is above all and who gave me faith and trust throughout this thesis work.
Second, I thank my beloved and courageous advisor, Dr. Million Meshesha, who worked
hard with me on this thesis.
My special thanks go to my classmate Berihun Hadis, who was with me during the implementation of this
thesis.
Thank you to Dashen Bank staff members Asnake Kebede and Metasebiya Ayisow, who helped me
during data collection.
Lastly, my thanks also go to Dr. Tibebe Besha for his kind encouragement.
DEDICATION
I dedicate this thesis to my spouse Bethlehem Abera and my son Joshua Fasika.
Table of Contents
LIST OF FIGURES
LIST OF TABLES
LIST OF ACRONYMS
ABSTRACT
CHAPTER ONE
1.1 Background
1.2 Dashen bank and its services
1.3 Statement of the problem
1.4 Objective of the study
1.4.1 General objective
1.4.2 Specific objectives
1.5 Scope and limitation of the study
1.6 Significance of the study
1.7 Methodology of the study
1.7.1 Research design
1.7.2 Understanding of the Problem
1.7.3 Understanding of the Data
1.7.4 Preparation of the data
1.7.5 Data mining for predictive modeling
1.7.6 Evaluation of the Discovered Knowledge
1.7.7 Use of the Discovered Knowledge
1.8 Operational definition
1.9 Organization of the research
CHAPTER TWO
2.1 Overview of data mining
2.2 Data mining tasks
2.3 DM process models
2.3.1 Academic Research Models
2.3.2 Industrial Model
2.3.3 Hybrid model
2.4 Data mining applications
2.5 Related works
2.6 Related works of ATM Banking service
CHAPTER THREE
3.1 Understanding of the problem domain
3.2 Understanding of the data
3.3 Data Preparation
CHAPTER FOUR
4.1 Building the model
4.2 Experimental result using J48 decision tree
4.3 Experiment result using PART
4.4 Experimental results of Naïve Bayes
4.5 Experimental result of multilayer perceptron
4.6 Evaluation
CHAPTER FIVE
5.1 Research Model
5.2 The System Development
5.3 The Prototype
5.4 Evaluation
5.5 Analysis and interpretation
CHAPTER SIX
6.1 Conclusions
6.2 Recommendation
Reference
APPENDIX
LIST OF FIGURES
Figure 2.1 Data mining models and tasks [4]
Figure 2.2 ANN model [1]
Figure 2.3 Backpropagation learning process [3]
Figure 2.4 Graphical representation of sigmoid function [3]
Figure 2.5 Sequential structure of the KDP model [1]
Figure 2.7 The six-step KDP model [1]
Figure 3.1 Event table on Database Server
Figure 3.2 ProView Monitoring Consoles
Figure 3.4 Event message out of service
Figure 3.5 The class event time stamp 2018 and its values
Figure 4.1 Precision-recall curve of class April
Figure 5.1 ATM out of service reason identifier system: Prototype
LIST OF TABLES
Table 1.1 ATM availability report, from Jan. 1 – Dec. 31, 2017
Table 1.2 Binary classes of confusion matrix
Table 3.1 Event content and description
Table 3.2 Event table from ProView Data Model 4.2/40 (C) Wincor Nixdorf International 2015
Table 3.3 Event table on ProView Monitoring Console
Table 3.4 Description of event table on ProView Monitoring Console
Table 3.5 Size of ATM instances and location
Table 3.6 Selected dataset attributes
Table 3.7 New derived attributes and descriptions
Table 3.8 Data transformation of Time stamp
Table 3.9 List of attributes used for model building
Table 4.1 Experimental result of J48 decision tree algorithm
Table 4.2 Confusion matrix for the result of J48 decision tree
Table 4.3 Experimental result of PART algorithm
Table 4.4 Confusion matrix for the result of PART rule induction
Table 4.5 The result of Naïve Bayes algorithm accuracy
Table 4.6 Confusion matrix result of Naïve Bayes
Table 4.7 Experimental result of multilayer perceptron
Table 4.8 Confusion matrix result of multilayer perceptron
Table 4.9 Summarized classifier output
Table 5.1 Detailed summary of questionnaire result
LIST OF ACRONYMS
ANN Artificial Neural Network
ATM Automatic Teller Machine
B24 Base 24
BI Business Intelligence
CPO Cashier Payment Order
CSV Comma-Separated Values
Dashen bank S.C. Dashen bank Share Company
DM Data Mining
E-banking Electronic Banking
EPP Encrypting PIN Pad
IDS Intrusion detection system
KDD Knowledge Discovery in Databases
KDP Knowledge Discovery Process
ML Machine Learning
MS Microsoft
MLP Multilayer Perceptron
MSE Mean Squared Error
NB Naive Bayes
NGO Non-Governmental Organization
PIN Personal Identification Number
POS Point of Sale
ProM Process Miner
QoS Quality of Service
RDBMS Relational database management system
ROC Receiver Operating Characteristic
AUROC Area Under the Receiver Operating Characteristic curve
SOP Supervisor of Proview
UT Usability Testing
VB Visual Basic
WEKA Waikato Environment for Knowledge Analysis
ABSTRACT
Ideally, ATM service is expected to be available 24 hours a day, 7 days a week, and 365 days a
year; in practice, however, these self-service machines are frequently out of service. This study
therefore aims to find the reasons for ATMs being out of service, taking Dashen Bank ATMs as a
case.
The research follows a six-step hybrid knowledge discovery process (KDP) model. The first step,
understanding the problem, helps to describe the problem and identify attributes for data
collection from 14 purposively selected ATMs over the period December 31, 2017 – May 04, 2018.
This is followed by understanding the data and preparing the dataset for data mining tasks.
Preparing the dataset includes constructing new attributes from the existing dataset and removing
unnecessary attributes and inconsistent values. From the dataset collected from the ProView ATM
real-time monitoring console, 44 events describing out-of-service occurrences were identified in
line with the objective of this research. The crucial step in KDP is data mining, which enables
the design of a predictive model. In this study, the WEKA knowledge discovery tool is used to
identify the reasons for ATMs being out of service.
Experimental analysis shows that the J48 decision tree achieves the best result, with an accuracy
of 52.7059%. Finally, using the best predictive model, the J48 decision tree, we designed a
prototype on the NetBeans IDE 8.2 platform. Domain experts evaluated this prototype based on the
ISO 9241-11 usability testing (UT) features. According to the UT, the prototype was effective and
efficient and achieved user satisfaction. However, automatic integration of the model with ProView
would help to detect and solve problems immediately, improving ATM in-service time.
CHAPTER ONE
INTRODUCTION
1.1 Background
The financial industry, such as banks, continually generates enormous and varied data.
Accumulated data is of little value unless it is turned into useful information. Enterprises
analyze their historical data using appropriate technologies for business intelligence. One
technology applicable to business intelligence is data mining, whose goal is to help companies
gain competitive advantage and hence become more profitable [1].
One of the important bank services nowadays is the ATM, and cash withdrawal at ATMs has become
the most common feature of electronic banking. Government, the public sector, businesses and
others primarily choose to make payments electronically using ATMs. Banks manage, control and
provide secure electronic payment for their customers' satisfaction and, in turn, work hard to gain
profit. Customers who have a plastic card and a personal identification number (PIN) can
withdraw cash, make deposits or transfer funds between accounts wherever an ATM is located,
depending on the bank's products [12].
ATMs, like the core banking system, are data sources that produce financial transactions.
Self-service ATMs, the core banking system and staff in bricks-and-mortar branches all generate
massive data. As a result, data is growing alarmingly in size, in speed of generation and in
variety of data types. This makes data analysis a challenging task for human experts and
necessitates automatic data analysis techniques, called data mining. The aim of data mining is to
make sense of large amounts of structured, unstructured and semi-structured data available in a
given domain [1].
According to Witten et al. [2], data mining is the extraction of implicit, previously unknown and
potentially useful information from large amounts of data, which can be critical for problem
solving and decision making.
Data mining is a technology that uses various classification, clustering and association rule
discovery techniques to extract hidden knowledge from heterogeneous and distributed historical
data stored in large databases, data warehouses and other massive information repositories, so as
to find patterns in data that are [3]:
valid: they not only represent the current state but also hold on new data with some
certainty
novel: they are non-obvious to the system and are generated as new facts
useful: it should be possible to act on the item or problem
understandable: humans should be able to interpret the pattern
According to Dunham [4], data mining tasks are classified as predictive and descriptive.
Predictive modeling is supervised learning because the class label is identified before data
analysis. It involves two processes: the first finds the best-fit model using a classification
algorithm, and the second uses that model to determine the class of new records and instances.
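The two processes can be illustrated with a minimal sketch in plain Python. It assumes a toy labeled dataset with hypothetical attributes (`event_group`, `time_band`) and hypothetical class values; it is not the study's data. For brevity, the "best-fit model" here is a 1R (OneR) learner, which keeps the single attribute whose value-to-majority-class rules make the fewest training errors; it stands in for the J48, PART, Naïve Bayes and multilayer perceptron learners the study actually uses in Weka.

```python
from collections import Counter, defaultdict

# Hypothetical labeled records: (attribute values, class label).
TRAIN = [
    ({"event_group": "cassette", "time_band": "night"}, "cash_dispensing"),
    ({"event_group": "cassette", "time_band": "day"}, "cash_dispensing"),
    ({"event_group": "link_down", "time_band": "day"}, "network_b24"),
    ({"event_group": "link_down", "time_band": "night"}, "network_b24"),
    ({"event_group": "supervisor", "time_band": "day"}, "daily_operation"),
    ({"event_group": "sensor", "time_band": "night"}, "hardware_fault"),
]


def fit_one_r(records):
    """Process 1: choose the attribute whose value -> majority-class rules
    make the fewest errors on the labeled training records."""
    best = None
    for attr in records[0][0]:
        counts = defaultdict(Counter)  # attribute value -> class counts
        for features, label in records:
            counts[features[attr]][label] += 1
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - c[rules[v]] for v, c in counts.items())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    # Fallback class for attribute values never seen during training.
    default = Counter(label for _, label in records).most_common(1)[0][0]
    return best[0], best[1], default


def predict(model, features):
    """Process 2: apply the fitted rules to a new, unlabeled record."""
    attr, rules, default = model
    return rules.get(features.get(attr), default)


model = fit_one_r(TRAIN)
print(model[0])  # event_group
print(predict(model, {"event_group": "link_down", "time_band": "night"}))  # network_b24
```

The split between `fit_one_r` and `predict` mirrors the two processes: the learned rules are fixed once training ends and are only looked up for new records.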
There are many kinds of classification algorithms available for predictive modeling. Some of the
best known are the Naïve Bayesian algorithm, neural networks and decision trees [1][3]. Each
algorithm has its own strengths, and each is best suited to specific conditions, depending on the
best fit and the application.
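To make one of these algorithms concrete, the following is a minimal categorical Naïve Bayes classifier in plain Python. It assumes nominal attributes, the "naïve" assumption that attributes are conditionally independent given the class, and Laplace (add-one) smoothing. The records and attribute names are hypothetical, and Weka's `NaiveBayes` implementation differs in detail (for example, in how it treats numeric attributes).

```python
import math
from collections import Counter, defaultdict

# Hypothetical nominal training records: (attribute values, class label).
TRAIN = [
    ({"event_group": "cassette", "time_band": "night"}, "cash_dispensing"),
    ({"event_group": "cassette", "time_band": "day"}, "cash_dispensing"),
    ({"event_group": "link_down", "time_band": "day"}, "network_b24"),
    ({"event_group": "link_down", "time_band": "night"}, "network_b24"),
    ({"event_group": "supervisor", "time_band": "day"}, "daily_operation"),
    ({"event_group": "sensor", "time_band": "night"}, "hardware_fault"),
]


def train_nb(records):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(label for _, label in records)
    value_counts = defaultdict(Counter)  # (class, attribute) -> value counts
    domains = defaultdict(set)           # attribute -> observed values
    for features, label in records:
        for attr, value in features.items():
            value_counts[(label, attr)][value] += 1
            domains[attr].add(value)
    return class_counts, value_counts, domains


def classify_nb(model, features):
    """Return the class maximizing log P(c) + sum of log P(x_i | c)."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_c in class_counts.items():
        score = math.log(n_c / total)
        for attr, value in features.items():
            # Laplace smoothing keeps unseen values from zeroing the product.
            count = value_counts[(label, attr)][value]
            score += math.log((count + 1) / (n_c + len(domains[attr])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_nb(TRAIN)
print(classify_nb(model, {"event_group": "link_down", "time_band": "day"}))  # network_b24
```

Working in log space avoids numeric underflow when many attribute probabilities are multiplied together.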
Descriptive mining tasks characterize the general properties of records in the data repository
[3]. Clustering, summarization, association rules and sequence discovery are considered
descriptive. Descriptive mining is considered unsupervised learning; unlike predictive mining, it
uses no class label.
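For contrast with the supervised sketches, the following shows an unsupervised (descriptive) task: a minimal one-dimensional k-means that groups values without any class label. The out-of-service durations (in minutes), k = 2 and the fixed starting centroids are illustrative assumptions, not figures from the study.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Group 1-D values around k centroids; no class labels are used."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            # Assign each value to its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


durations = [5, 7, 6, 40, 45, 50, 8]  # hypothetical minutes out of service
centroids, clusters = kmeans_1d(durations, [0.0, 60.0])
print(centroids)  # [6.5, 45.0]
print(clusters)   # [[5, 7, 6, 8], [40, 45, 50]]
```

The result describes the data (short versus long outages) rather than predicting a predefined class.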
Data mining is a multidisciplinary science that reaches a vast array of areas, such as biomedical
science, business and commerce, financial institutions, telecommunications and the retail
industry. Even though data mining is applicable in so many fields, Han and Kamber [3] point out
that it is not free from challenges regarding mining methodology, user interaction, performance,
and diverse data types of poor quality.
Because data is often incomplete and noisy, proper data preprocessing methods are required to
produce clean data, free of noise and incompleteness, so that the objective of data mining can be
achieved.
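A minimal cleaning sketch, assuming records are Python dictionaries: it drops records with a missing required attribute (incompleteness), records with an implausible numeric value (treated as noise) and exact duplicates. The field names, the sample rows and the 0 to 1440 minute range are hypothetical; the ATM identifiers only borrow the naming style of Table 1.1.

```python
def clean(records, required, valid_ranges):
    """Return records that are complete, within plausible ranges and not duplicated."""
    seen = set()
    cleaned = []
    for rec in records:
        # Incompleteness: a required attribute is missing or empty.
        if any(rec.get(f) in (None, "") for f in required):
            continue
        # Noise: a numeric attribute falls outside its plausible range.
        if any(f in rec and not lo <= rec[f] <= hi for f, (lo, hi) in valid_ranges.items()):
            continue
        # Redundancy: an exact duplicate of a record already kept.
        key = tuple(sorted(rec.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


raw = [
    {"atm": "PATML003", "event": "cassette empty", "duration_min": 35},
    {"atm": "PATML003", "event": "cassette empty", "duration_min": 35},  # duplicate
    {"atm": "PATML005", "event": "", "duration_min": 12},                # missing event
    {"atm": "PATML009", "event": "link down", "duration_min": -4},       # impossible value
    {"atm": "PATML019", "event": "link down", "duration_min": 90},
]
cleaned = clean(raw, required=["atm", "event", "duration_min"],
                valid_ranges={"duration_min": (0, 1440)})
print([r["atm"] for r in cleaned])  # ['PATML003', 'PATML019']
```

Dropping records is the simplest remedy; when data is scarce, filling missing values is often preferred instead.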
A performance-related challenge for data mining is the efficiency and scalability of data mining
algorithms: the running time of a data mining algorithm must be predictable and acceptable on
large databases [3].
Presentation and visualization of data mining results is another requirement. The extracted
knowledge must be easily understood and directly usable by humans; therefore, it should be
presented as trees, tables, rules, graphs, charts, crosstabs, matrices or curves.
Another challenge is the requirement to develop numerous data mining techniques, including data
characterization, discrimination, association and correlation analysis, classification,
prediction, clustering, outlier analysis and evolution analysis.
The need for background knowledge of the specific area is also an issue. Understanding the domain
is therefore key, because domain experts can easily interpret the required knowledge.
also at the forefront of introducing up-to-date technologies because of its mission: “Provide efficient
and customer focused domestic and international banking services by overcoming the continuous
challenges for excellence through the application of appropriate technology” [6]. According to
Gardachew [7], Dashen Bank is a leader in introducing e-banking services in Ethiopia.
The use of ATM service for cash withdrawal keeps increasing. Dashen Bank's 2017 annual report
outlined the expansion of ATM card banking, which grew by 28%, with 123,198 new customers joining
the card banking service; there are now a total of 556,688 cardholders, 305 ATMs and 837 POS
terminals (used for cash transfer) [6]. These ATMs and POS terminals also accept international
cards, including Visa, MasterCard, UnionPay and American Express [6]. These services are found in
different locations such as hotels and resorts, universities, hospitals, tour and travel agencies,
galleries and jewelry shops, cafés and restaurants, fuel stations, supermarkets and malls, among
others.
Different Dashen Bank S.C. departments work cooperatively to achieve high in-service time for
ATMs. Ideally, customers who have a plastic card and a PIN can withdraw cash or transfer funds
between accounts at any time wherever ATMs and POS terminals are located.
The concern of this study is to apply data mining to the accumulated ATM data so as to improve
the ATM card banking services of Dashen Bank. The components of the self-service ATM, both
hardware and software, are expected to perform their intended tasks effectively and efficiently
and to communicate continuously and in real time with remote management about every event that
occurs.
The application areas of data mining are enormous. According to Han and Kamber [3], DM is applied
in financial analysis, telecommunications, biomedicine and science, counterterrorism (including
and beyond intrusion detection) and mobile (wireless) data. Financial data collected in banks and
the financial industry are often relatively complete, reliable and of high quality, which
facilitates systematic data analysis and data mining [3]. In the banking industry, DM is applied
to loan payment prediction and customer credit policy analysis, detection of money laundering and
other financial crimes, classification and clustering of customers for targeted marketing, and
the design and construction of data warehouses for multidimensional data analysis to find general
properties of the data [3].
1.3 Statement of the problem
One technological advancement in banking service is the introduction of ATMs, which make work
easier and more effective. Ideally, ATM service should be available 24 hours a day, 7 days a
week, and 365 days a year. However, the availability report of Dashen Bank S.C.'s ATM ProView
system management and administration tool [8] presents the in-service and out-of-service rates
and the variables that make ATMs go out of service. As presented in Table 1.1, the out-of-service
rates of 12 selected ATMs range from 0.37% to 20.12%. The report also points out that the causes
of out of service are hardware faults, cash dispensing, daily operations, network issue B24 and
net connectivity. Of these causes, daily operations, network issue B24 and net connectivity of
ProView are the dominant factors for ATMs being out of service.
Table 1.1 ATM availability partial report, from Jan. 1 – Dec. 31, 2017
(All values in %; the last five columns give the reasons for out of service.)
ATM name | In service | Out of service | Hardware faults | Cash dispensing | Daily operations | Network issue B24 | Net connectivity of ProView
PATML003_Hilton | 89.63 | 10.37 | 0.75 | 3.79 | 3.23 | 5.77 | 2.02
PATML004_ADAMS | 80.25 | 19.75 | 0.20 | 0.00 | 5.31 | 2.48 | 13.89
PATML005_SHEBELLE | 79.88 | 20.12 | 0.00 | 0.00 | 2.74 | 11.80 | 5.64
PATML006_ETHIOP | 87.77 | 12.23 | 2.23 | 1.82 | 4.16 | 3.28 | 3.74
PATML007_DHGEDA | 84.97 | 15.03 | 1.53 | 3.43 | 6.36 | 9.00 | 1.26
PATML009_LUCY | 86.99 | 13.01 | 2.71 | 2.72 | 5.30 | 5.97 | 1.49
PATML012_RASHOT | 89.11 | 10.89 | 1.59 | 1.74 | 3.14 | 4.19 | 2.22
PATML013_MAIN | 95.92 | 4.08 | 0.02 | 0.00 | 0.00 | 0.00 | 3.59
PATML019_TK | 82.02 | 17.98 | 1.58 | 2.11 | 7.94 | 7.52 | 2.55
PATML022_HARMONY | 82.80 | 17.20 | 0.10 | 1.66 | 14.10 | 9.46 | 0.07
PATML029_USEMBA | 99.63 | 0.37 | 0.20 | 0.00 | 0.04 | 0.00 | 0.00
PATML029_USEMBA | 90.10 | 9.90 | 4.51 | 1.16 | 1.99 | 2.91 | 1.58
SUM | 1049.07 | 150.93 | 15.42 | 18.43 | 54.31 | 62.38 | 38.05
AVERAGE | 87.43 | 12.58 | 1.29 | 1.54 | 4.53 | 5.20 | 3.17
As shown in Table 1.1, the ATMs are out of service 12.58% of the time on average. In principle,
an ATM out-of-service rate greater than 10% greatly affects the business [9][10]. Dashen Bank's
ATMs are thus 2.58 percentage points above this level, which needs close attention. Among the
reasons for out of service, network issue B24 is the dominant factor, with a 5.20% contribution,
followed by daily operations and net connectivity with 4.53% and 3.17%, respectively. Hardware
faults and cash dispensing also contribute to ATMs being out of service.
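The averages in Table 1.1, the gap to the 10% level and the dominant reason can be reproduced with a short Python sketch. The figures are copied from the table; the column keys and the `TARGET` name are illustrative choices, not names used by ProView or the bank.

```python
from statistics import mean

# Per-ATM percentages copied from Table 1.1 (Jan. 1 to Dec. 31, 2017).
# A list (not a dict) because PATML029_USEMBA appears twice in the table.
ROWS = [
    ("PATML003_Hilton", 89.63, 10.37, 0.75, 3.79, 3.23, 5.77, 2.02),
    ("PATML004_ADAMS", 80.25, 19.75, 0.20, 0.00, 5.31, 2.48, 13.89),
    ("PATML005_SHEBELLE", 79.88, 20.12, 0.00, 0.00, 2.74, 11.80, 5.64),
    ("PATML006_ETHIOP", 87.77, 12.23, 2.23, 1.82, 4.16, 3.28, 3.74),
    ("PATML007_DHGEDA", 84.97, 15.03, 1.53, 3.43, 6.36, 9.00, 1.26),
    ("PATML009_LUCY", 86.99, 13.01, 2.71, 2.72, 5.30, 5.97, 1.49),
    ("PATML012_RASHOT", 89.11, 10.89, 1.59, 1.74, 3.14, 4.19, 2.22),
    ("PATML013_MAIN", 95.92, 4.08, 0.02, 0.00, 0.00, 0.00, 3.59),
    ("PATML019_TK", 82.02, 17.98, 1.58, 2.11, 7.94, 7.52, 2.55),
    ("PATML022_HARMONY", 82.80, 17.20, 0.10, 1.66, 14.10, 9.46, 0.07),
    ("PATML029_USEMBA", 99.63, 0.37, 0.20, 0.00, 0.04, 0.00, 0.00),
    ("PATML029_USEMBA", 90.10, 9.90, 4.51, 1.16, 1.99, 2.91, 1.58),
]
COLUMNS = ("in_service", "out_of_service", "hardware_faults", "cash_dispensing",
           "daily_operations", "network_b24", "net_connectivity")

# Column-wise mean over the 12 rows (the AVERAGE row of Table 1.1).
averages = {col: mean(row[i + 1] for row in ROWS) for i, col in enumerate(COLUMNS)}

TARGET = 10.0  # out-of-service level above which the business is greatly affected
excess = averages["out_of_service"] - TARGET

# The reason column with the largest average contribution.
dominant = max(COLUMNS[2:], key=averages.get)

for col, value in averages.items():
    print(f"{col:<17} {value:6.2f}%")
print(f"excess over {TARGET:.0f}%: {excess:.2f} percentage points; dominant reason: {dominant}")
```

The out-of-service column sums to 150.93, so its mean is 150.93 / 12 = 12.5775%, which rounds to the reported 12.58%.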
Clearly, excessive ATM downtime is a risk to the business. The bank aims to reduce out-of-service
time to less than 10%. To this end, all ups and downs of ATMs are monitored 24 hours a day from
headquarters for remedial action. Even when an ATM is up, the proper working of machine
components such as the cash dispenser, the various sensors, cash cassettes, application software
and the operating system needs to be checked to determine whether e-payment is possible.
Solutions for ATM problems always involve delays. In the simplest scenario, replenishing cash in
empty cassettes of ATMs in shopping centers and hotels is delayed by 30 to 45 minutes because of
traffic jams in Addis Ababa. The fastest measures taken to minimize out-of-service time are manual
observation through calls to the headquarters call center and real-time monitoring using Base 24¹
and the ProView monitoring console. The call center forwards the incident to the responsible
section. Communication to report failures and to bring the ATM back online and operational may not
be smooth; it can take not just minutes but even a day, because most ATMs located in hotels,
shopping centers and universities are run by headquarters. Observation and diagnostic solutions
for ATMs in all areas involve delays and are not automatic. On the other hand, preventive
maintenance is ineffective, because the maintenance schedule may not address some of the
variables behind out of service, such as power, network and application problems.
The ProView monitoring system is not a complete product by itself. First, the data generated by
ProView is incomplete; second, the availability of the application itself is a concern; and
third, system product auditing, predictive maintenance and configuration are left untouched and
produce stale reports.
There is no proactive, predictive detection or preventive maintenance of ATMs going out of
service. This means that corrective action is taken only after a problem happens, which takes
more time, effort and cost. The result is customer dissatisfaction with the ATM service, and the
bank also loses revenue. For banks, analyzing such problems and finding solutions beyond routine
work has become a critical task. Applying DM technology enables the design of a model that
identifies the causes of ATMs being out of service, so that corrective action can be scheduled
accordingly. On the other hand, the volume of ATM event data, including transactions, is so large
that it cannot be analyzed with ordinary descriptive statistics. Moreover, data mining offers more
powerful accuracy and visualization than descriptive statistics.
¹ Base 24 is an application software used to manage ATMs.
Few studies apply data mining to ATM service [13]. These studies focus, first, on the ATM
operation menu so as to minimize customers' waiting time in queues [11]. The study by Gümüş et
al. [12] considers customer satisfaction in the common use of ATMs, with the purpose of
identifying the satisfaction levels of common ATM users. Madhavi et al. [13] explore an ATM
transaction dataset to pinpoint several downsides of the service, predict the ATM usage level,
identify the peak time of an ATM in a day or month, and spot any ATM transaction.
No local or international study has attempted to investigate the reasons for ATMs being out of
service using data mining techniques.
The aim of this study is therefore to explore the reasons for ATMs being out of service using data
mining classification algorithms. To this end, this study attempts to explore and answer the
following research questions:
What are the suitable attributes of ATMs that describe ATMs in service and out of
service?
Which DM technique is more suitable for designing a model to identify the reasons for
ATMs being out of service?
What are the most interesting patterns that determine the reasons for Dashen Bank S.C.
ATMs being out of service?
To what extent does the model identify the reasons for ATMs being out of service?
1.4 Objective of the study
1.4.1 General objective
The main objective of this study is to design a model using data mining technology so as to
identify the reasons for ATMs being out of service and to take corrective action proactively.
There are two major data mining tasks: predictive and descriptive modeling [4]. The preferred DM
task here is designing a model using classification algorithms in the Weka knowledge discovery
tool [2]. According to Han and Kamber [3], predictive modeling is applied when forecasting is
essential. This study therefore selects only predictive modeling, using classification
algorithms to identify the reasons for ATMs being out of service.
This research is aligned with Dashen Bank S.C.'s work plan, control, management, policy and
strategy, and will provide input particularly on electronic payment using ATM service. The
desired output of this work will support decision makers in setting the future direction of ATM
service. In other words, predicting ATMs going out of service will lead to quality service and
cost reduction. Similarly, other banks may apply such mining to make sound decisions.
This study will be an initial local attempt and may motivate other researchers interested in the
same area to conduct further research that enhances the expansion of ATM service in the country by
the different commercial banks offering the service.
1.7 Methodology of the study
This study aims to provide a solution to the problem of ATMs being out of service. Such issues
include: Why is the ATM out of service? Which components of the ATM determine whether cash
withdrawal is possible?
Producing a solution for a given problem presupposes following scientific principles and
guidelines from start to end. These principles systematically answer the questions that lead to
achieving the objective of the study. To this end, various methodologies are followed depending on
the research characteristics or discipline. The methodology defines the step-by-step procedure the
research has to follow, defining the concepts or phenomena that need research in such a way that
the complexity of the problem at hand becomes clear and understandable [14].
To understand the domain, the researcher used primary sources, such as discussions with domain experts, and secondary sources, including document analysis of the ProView Operation Manual V4240 and Console User Manual V4240, Dashen Bank annual reports, the intranet and extranet portals, magazines, and the internet, to gain deep insight.
Building on the foundation of the kick-off step, this phase performs the following tasks [1]:-
Data collection and sampling, and deciding which data to use for the data mining task.
The data are checked for completeness, redundancy, missing values, and believability of attribute values.
The final task in this phase is verifying the usefulness of the data with respect to the DM goals.
In this stage, data were collected from ProView. In the ProView Data Model 4.2/40 ((C) Wincor Nixdorf International 2015) [17], which is the event log database used for monitoring Dashen Bank ATMs, the data are organized in many tables. The researcher, in collaboration with domain experts, selected the event table and the attributes of interest for this particular research. The data obtained from the ProView monitoring console were exported in MS Excel format for visualization and for checking data quality.
1.7.4 Preparation of the data
Building on the two preceding phases, this phase decides which data will be used as input for the DM methods in the next step. Data mining results are highly dependent on data preparation, and poor data preparation leads to incorrect results. This step is time-consuming and completes the following tasks:-
Checking the completeness of data records, which includes correcting noise or outliers and filling missing values with correct attribute values.
Data preparation also includes making the quality data suitable for the selected DM tool.
The data preparation task was performed using MS Excel and Weka. It covered checking the completeness of records, constructing new attributes, data transformation, data reduction, attribute selection, and filling missing values.
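The missing-value step can be illustrated with a minimal Java sketch. It fills a nominal attribute with its most frequent value (the mode). This is the same idea that Weka's ReplaceMissingValues filter applies to nominal attributes. The class name ModeImputation, the "?" missing-value marker (borrowed from the ARFF convention), and the example values are assumptions for illustration. They are not the thesis's actual preprocessing code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: fills missing values of one nominal attribute with its mode. */
final class ModeImputation {
    /** Missing-value marker, following the "?" convention of Weka ARFF files (assumption). */
    static final String MISSING = "?";

    private ModeImputation() {}

    /** Most frequent non-missing value; ties go to the value that reached the top count first. */
    static String mode(List<String> column) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String v : column) {
            if (v == null || v.equals(MISSING)) {
                continue; // missing entries are not counted
            }
            int c = counts.merge(v, 1, Integer::sum);
            if (c > bestCount) {
                best = v;
                bestCount = c;
            }
        }
        return best; // null if every entry is missing
    }

    /** Returns a copy of the column with null or "?" entries replaced by the column's mode. */
    static List<String> fillWithMode(List<String> column) {
        String m = mode(column);
        List<String> filled = new ArrayList<>(column.size());
        for (String v : column) {
            filled.add(v == null || v.equals(MISSING) ? m : v);
        }
        return filled;
    }
}
```

For example, the hypothetical column (cash, ?, cash, printer, missing) becomes (cash, cash, cash, printer, cash).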
Can the tool run on different operating systems or platforms?
Because it is well known among researchers, open source, and fulfills the criteria listed above, Weka version 3.8.2 was selected as the data mining tool for the task at hand.
Weka contains a collection of algorithms for data mining tasks, including data preprocessing, association mining, classification, clustering, attribute selection, and visualization [2]. For preprocessing, it provides both supervised and unsupervised filters that operate on attributes and on instances.
To design the required model, this research applied the J48 decision tree, PART rule induction, multilayer perceptron neural network, and Naïve Bayes classification algorithms. These algorithms are widely used in predictive modeling to forecast from past trends and have produced good results [1].
Effectiveness metrics such as accuracy, precision, recall, and the ROC curve are used to evaluate the discovered knowledge. All of these metrics are derived from confusion matrix counts. The confusion matrix is a useful tool for analyzing how well a classifier recognizes records of different classes [1]. Given m classes, a confusion matrix is a table of at least size m by m. The table below shows the confusion matrix for two classes.
                  Predicted Negative   Predicted Positive
Actual Negative   TN                   FP
Actual Positive   FN                   TP
For binary classification, the possible outcomes are TN (true negative), TP (true positive), FN (false negative), and FP (false positive).
If the actual instance class is positive and it is classified as positive, it is counted as a true
positive. If the instance is positive and it is classified as negative, it is counted as a false
negative. If the instance is negative and it is classified as negative, it is counted as a true
negative. If the instance is negative and it is classified as positive, it is counted as a false
positive.
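The four outcomes above lead directly to the common effectiveness measures:

- accuracy = (TP + TN) / (TP + TN + FP + FN)
- precision = TP / (TP + FP)
- recall = TP / (TP + FN)

The following minimal Java sketch tallies the confusion matrix from actual and predicted labels and then computes these measures. The class name BinaryMetrics is an assumption for illustration, as is the convention of returning 0 when a precision or recall denominator is zero.

```java
/** Hypothetical helper: binary confusion matrix and the measures derived from it. */
final class BinaryMetrics {
    int tp, tn, fp, fn;

    /** Tallies TP, TN, FP, FN from parallel arrays of actual and predicted labels (true = positive). */
    static BinaryMetrics fromLabels(boolean[] actual, boolean[] predicted) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("actual and predicted must have the same length");
        }
        BinaryMetrics m = new BinaryMetrics();
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] && predicted[i]) {
                m.tp++; // positive classified as positive
            } else if (actual[i]) {
                m.fn++; // positive classified as negative
            } else if (predicted[i]) {
                m.fp++; // negative classified as positive
            } else {
                m.tn++; // negative classified as negative
            }
        }
        return m;
    }

    double accuracy() {
        return (double) (tp + tn) / (tp + tn + fp + fn);
    }

    double precision() {
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp); // 0 by convention when nothing is predicted positive
    }

    double recall() {
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn); // 0 by convention when there are no positives
    }
}
```

For actual labels (P, P, N, N, P) and predictions (P, N, N, P, P), the sketch counts TP = 2, FN = 1, TN = 1, and FP = 1. This gives an accuracy of 0.6 and a precision and recall of 2/3 each.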
AUROC (the area under the ROC curve) is another metric. It is used when the trade-offs along the ROC curve, which compares two operating characteristics (the true positive rate, i.e. recall, and the false positive rate), do not give enough evidence on their own. AUROC is a more powerful metric and is easy to understand. The preferred model or classifier is the one with the largest area under its ROC curve: an area of 0.5 corresponds to random guessing and an area of 1 to a perfect classifier [1].
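The area under the ROC curve can be computed without drawing the curve, using its rank interpretation. AUROC equals the probability that a randomly chosen positive instance receives a higher classifier score than a randomly chosen negative one, with ties counted as one half. The minimal Java sketch below compares every positive/negative pair; the class name Auc is hypothetical. Because it examines every pair, its cost is quadratic in the number of instances, so it only suits small datasets. A sort-based version scales better.

```java
/** Hypothetical helper: AUROC via pairwise comparison of positive and negative scores. */
final class Auc {
    private Auc() {}

    /** Fraction of (positive, negative) pairs ranked correctly; ties count one half. */
    static double pairwiseAuc(double[] scores, boolean[] positive) {
        if (scores.length != positive.length) {
            throw new IllegalArgumentException("one label per score is required");
        }
        double correct = 0.0;
        long pairs = 0;
        for (int i = 0; i < scores.length; i++) {
            if (!positive[i]) {
                continue; // i must be a positive instance
            }
            for (int j = 0; j < scores.length; j++) {
                if (positive[j]) {
                    continue; // j must be a negative instance
                }
                pairs++;
                if (scores[i] > scores[j]) {
                    correct += 1.0;
                } else if (scores[i] == scores[j]) {
                    correct += 0.5;
                }
            }
        }
        if (pairs == 0) {
            throw new IllegalArgumentException("need at least one positive and one negative instance");
        }
        return correct / pairs;
    }
}
```

Consider the scores (0.9, 0.8, 0.4, 0.3) with labels (positive, negative, positive, negative). Three of the four pairs are ordered correctly, so the AUROC is 0.75. A perfect ranking gives 1.0, and identical scores give 0.5.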
1.7.7 Use of the Discovered Knowledge
The final step is designing a prototype that demonstrates the use of the knowledge discovered with the classification algorithm. The prototype was developed in Java, a high-level programming language used to develop many kinds of applications, such as Android apps, desktop applications, and video games [26].
Users checked, evaluated, and rated the usability of the prototype with respect to efficiency, effectiveness, and satisfaction.
In-service: an ATM is in service when the machine is operating or functioning normally. Terms such as available, availability, up, uptime, and online are used interchangeably with in service when the self-service ATM can complete electronic payments successfully.
chapter is data preparation. Chapter four covers modeling and evaluation. Chapter five presents the research model, the use of the discovered knowledge, and its implementation and evaluation, and the last chapter presents conclusions and recommendations.
CHAPTER TWO
LITERATURE REVIEW
2.1 Overview of data mining
Data mining is a multidisciplinary science. Its application in fields such as medicine and commerce has had a surprising impact. Domain experts may not be able to dig deep enough on their own to find the hidden knowledge and interesting patterns in their big data in the way data mining does. Because of this, and because computers keep getting more powerful while data grow continually, data mining has become vital [1]. Data mining uses various mathematical formulae and algorithms to find valuable knowledge in historical data.
According to Bastos et al. [18], the factors behind the emergence of KDD and DM methodologies are the development of automated data collection tools, the tremendous explosion of data, the urgent need to interpret and exploit massive data volumes, and the existence of supporting tools.
According to Cios et al. [1], data mining came into existence because of the alarming growth of data and technological advancement. Advances in computer speed, capacity, architecture, and algorithms were a great motivation for data mining. The World Wide Web also broke the limitations of location and time. It made it easy to collect different data types (text, images, audio, and video), transfer them at high speed, and store terabytes and zettabytes of data.
All the data in the world are of no value without mechanisms to efficiently and effectively
extract relevant information and knowledge from them. Early pioneers such as Fayyad, Mannila,
Piatetsky-Shapiro, Djorgovski, Frawley, and Smith recognized this urgent need, and the data
mining field was born [1].
Data mining is related to statistics in the way it models objects. In statistics, researchers frequently deal with the problem of finding the smallest data size that gives sufficiently confident estimates. DM deals with the opposite problem: the data size is large, and the interest is in building a data model that is small (not too complex) but still describes the data well [1].
As a major subfield of mathematics, statistics plays a very important role in research on information theory, data mining, web mining, and so on. For example, when studying a certain population, analyzing the whole population is cumbersome, so only a limited sample is taken to support decisions. For this purpose, statistical measures such as the mean, variance, MSE (mean squared error), standard deviation, and confidence interval are used [1].
In any kind of research, the methodology or approach conceptualizes reality in order to understand the process from start to end. This conceptualization includes modeling. In statistics, most descriptions of models, including data models, take the form of mathematical equations. For example, Bayesian methods, latent semantic analysis, and neural networks all produce models.
As a young science, data mining is becoming increasingly attractive. Revealing unknown dimensions of data and discovering surprising knowledge make it advantageous for business analysis. Other organizations, in both the government and private sectors, also acquire knowledge scientifically from their stored data. What they all have in common is data that grows exponentially. Stored data that is never interpreted is merely an idle resource, and any available resource needs optimum utilization. Refined data from different application domains, for example health care and the financial industry, brings great benefit to each specific domain. To survive in a competitive market, businesses pay close attention to their accumulated data.
Data that pass through the knowledge discovery process yield knowledge that was hidden in the data. This hidden knowledge supports accurate and precise decisions on future business operations.
2.2 Data mining tasks
According to Dunham [4], data mining tasks are classified as predictive and descriptive. A predictive data mining task uncovers unknown patterns from a dataset of predictor attributes [4]. The revealed pattern becomes a model that guides decisions about future trends in the application domain. Predictive modeling is supervised learning because the class labels are known before data analysis. Data classification is a two-step process [1]: building the classifier (model), and using the classifier for classification. The first step is the learning phase, in which the classification algorithm builds the classifier from a training set made up of database instances and their associated class labels. Each instance in the training set belongs to a predefined category or class. In the second step, the classifier is used for classification. Test data are used to estimate the accuracy of the classification rules, and if the accuracy is considered acceptable, the rules can be applied to new data instances.
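The two steps can be mimicked with a simple holdout split. The instances are shuffled, one part is used to build the classifier, and the remaining part is kept aside to estimate its accuracy. The minimal Java sketch below only produces the two index lists; any classifier could then be trained and tested on them. The class name HoldoutSplit, the fixed random seed, and the 66% training fraction used in the example are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper: a seeded holdout split of instance indices. */
final class HoldoutSplit {
    private HoldoutSplit() {}

    /** Returns [trainingIndices, testIndices] after shuffling 0..numInstances-1 with a fixed seed. */
    static List<List<Integer>> split(int numInstances, double trainFraction, long seed) {
        if (trainFraction < 0.0 || trainFraction > 1.0) {
            throw new IllegalArgumentException("trainFraction must be between 0 and 1");
        }
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) {
            idx.add(i);
        }
        Collections.shuffle(idx, new Random(seed)); // same seed, same split
        int cut = (int) Math.round(numInstances * trainFraction);
        List<List<Integer>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(idx.subList(0, cut)));            // step 1: build the classifier
        parts.add(new ArrayList<>(idx.subList(cut, numInstances))); // step 2: estimate its accuracy
        return parts;
    }
}
```

For example, `HoldoutSplit.split(100, 0.66, 1L)` returns 66 training indices and 34 test indices, and the two lists never overlap.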
Many different classification algorithms are available for predictive modeling. Some well-known ones are the Naïve Bayesian algorithm, neural networks, and decision trees [1]. Each algorithm has its own strengths and specific conditions under which it is best suited, depending on the fit to the data and the application.
Descriptive mining tasks characterize the general properties of the data in the database [3]. Clustering, summarization, association rules, and sequence discovery are considered descriptive. A well-known example is identifying products that are purchased together, called market basket analysis, which falls under association rules. Descriptive mining is considered unsupervised learning. Unlike predictive mining, it does not use class labels.
Clustering segments a dataset into subgroups based on the similarity of its records [1]. The degree of similarity differentiates one group from the others: the more similar two records are, the more likely they are to be placed in the same cluster. Higher similarity within groups yields more homogeneous clusters inside a dataset that is heterogeneous overall. Clusters or groups are thus identified by their similarity or nearness.
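Clustering is not used in this study, but the similarity idea can be made concrete with k-means. K-means is one common clustering algorithm, chosen here only for illustration; the text does not name a specific algorithm. In the minimal Java sketch below, similarity is the absolute distance between one-dimensional values. Each point joins its nearest centroid, and each centroid moves to the mean of its members until no assignment changes. The class name KMeans1D and the caller-supplied initial centroids are assumptions.

```java
import java.util.Arrays;

/** Hypothetical helper: k-means on one-dimensional values. */
final class KMeans1D {
    private KMeans1D() {}

    /** Returns the cluster index of each point, starting from the given centroids. */
    static int[] cluster(double[] points, double[] initialCentroids, int maxIter) {
        double[] c = initialCentroids.clone();
        int[] assign = new int[points.length];
        Arrays.fill(assign, -1);
        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = false;
            // Assignment step: each point joins the nearest (most similar) centroid.
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                for (int k = 1; k < c.length; k++) {
                    if (Math.abs(points[i] - c[k]) < Math.abs(points[i] - c[best])) {
                        best = k;
                    }
                }
                if (assign[i] != best) {
                    assign[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // stable clusters
            }
            // Update step: move each centroid to the mean of its members.
            for (int k = 0; k < c.length; k++) {
                double sum = 0.0;
                int n = 0;
                for (int i = 0; i < points.length; i++) {
                    if (assign[i] == k) {
                        sum += points[i];
                        n++;
                    }
                }
                if (n > 0) {
                    c[k] = sum / n;
                }
            }
        }
        return assign;
    }
}
```

With the points {1, 2, 3, 10, 11, 12} and initial centroids {0, 5}, the sketch settles on the groups {1, 2, 3} and {10, 11, 12}.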
Association rule mining determines which items or instances go together. Like classification, it is a two-step process: the first step finds frequently occurring patterns, and the second generates association rules among the frequent items in the dataset [3].
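The two steps rely on two standard measures:

- Support is the fraction of transactions that contain an itemset. It is used to decide which patterns are frequent.
- Confidence, for a rule X ⇒ Y, is support(X ∪ Y) / support(X). It is used to keep only strong rules.

The minimal Java sketch below computes both measures for a market-basket example. It is not a full frequent-pattern algorithm such as Apriori. The class name RuleMeasures and the example items are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: support and confidence of association rules. */
final class RuleMeasures {
    private RuleMeasures() {}

    /** Fraction of transactions that contain every item of the itemset. */
    static double support(List<Set<String>> transactions, Set<String> itemset) {
        int hits = 0;
        for (Set<String> t : transactions) {
            if (t.containsAll(itemset)) {
                hits++;
            }
        }
        return (double) hits / transactions.size();
    }

    /** Confidence of lhs => rhs: support(lhs and rhs together) / support(lhs); NaN if lhs never occurs. */
    static double confidence(List<Set<String>> transactions, Set<String> lhs, Set<String> rhs) {
        Set<String> both = new HashSet<>(lhs);
        both.addAll(rhs);
        return support(transactions, both) / support(transactions, lhs);
    }
}
```

Consider the transactions {milk, bread}, {milk, bread, butter}, {bread}, and {milk}. The rule milk ⇒ bread has support 0.5 and confidence 0.5 / 0.75 ≈ 0.67.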
The data mining task considered in this study is predictive modeling based on classification algorithms. There are different classification algorithms, such as decision trees, Naïve Bayes, and the multilayer perceptron neural network.
Decision tree models are commonly used in data mining to examine the data and induce a tree and its rules, which are then used to make predictions [19].
The two types of decision trees are classification trees and regression trees. Classification trees are used to predict categorical variables, whereas regression trees predict continuous variables [19].
The main points of the learning algorithm for inducing a decision tree from training tuples are summarized as follows [3].
The algorithm is called with three parameters: D (a data partition), attribute list (the attributes describing the tuples), and an attribute selection method such as information gain. Information gain is a feature selection method that is also used to build decision trees for classification. It is defined mathematically as the difference between the original information requirement and the new requirement:
$$\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D)$$
where Info(D) is the entropy of D and Info_A(D) is the expected information required to classify a tuple from D based on the partitioning by A.
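The information gain criterion can be illustrated with a minimal Java sketch. It assumes the standard definition of entropy, Info(D) = -Σ p_i log2(p_i) over the class proportions p_i. It computes Info_A(D) as the size-weighted entropy of the partitions that attribute A induces. The class name InfoGain is hypothetical, and this is not the J48 implementation used in Weka.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: entropy and information gain for nominal attributes. */
final class InfoGain {
    private InfoGain() {}

    /** Info(D) = -sum of p_i * log2(p_i) over the class proportions in the labels. */
    static double entropy(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String y : labels) {
            counts.merge(y, 1, Integer::sum);
        }
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / labels.size();
            h -= p * Math.log(p) / Math.log(2); // log base 2
        }
        return h;
    }

    /** Gain(A) = Info(D) - Info_A(D), where Info_A(D) weights each partition's entropy by its size. */
    static double gain(List<String> attributeValues, List<String> labels) {
        if (attributeValues.size() != labels.size()) {
            throw new IllegalArgumentException("one attribute value per label is required");
        }
        Map<String, List<String>> partitions = new HashMap<>();
        for (int i = 0; i < labels.size(); i++) {
            partitions.computeIfAbsent(attributeValues.get(i), k -> new ArrayList<>()).add(labels.get(i));
        }
        double infoA = 0.0;
        for (List<String> part : partitions.values()) {
            infoA += (double) part.size() / labels.size() * entropy(part);
        }
        return entropy(labels) - infoA;
    }
}
```

Take the class labels {yes, yes, no, no}. An attribute whose values {a, a, b, b} separate the classes perfectly has a gain of 1 bit. An attribute with the values {a, b, a, b} has a gain of 0.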
The recursive partitioning stops only when any one of the following terminating conditions is
true:
All of the tuples in partition D (represented at node N) belong to the same class.
There are no remaining attributes on which the tuples may be further partitioned. In this case, node N is converted into a leaf and labeled with the most common class in D.
There are no tuples for a given branch, that is, a partition D_j is empty. In this case, a leaf is created with the majority class in D.
$$P(C_j \mid X) = \frac{P(X \mid C_j)\,P(C_j)}{P(X)} \qquad (2.2)$$
From the above definitions and equations, Bayes' rule can be applied in data mining to group and classify objects based on given statistical information. Through classification, it is possible to predict the class to which an object belongs from a probabilistic model. The posterior probability computed with Bayes' rule is used to predict the class.
The Naïve Bayes classifier is based on Bayes' rule and assumes that all attributes are (1) equally important and (2) independent of one another given the class. It is a probabilistic classifier, defined by the following equation [3].
$$C_{\text{NaiveBayes}} = \arg\max_{j} P(C_j) \prod_{i} P(A_i \mid C_j) \qquad (2.3)$$
Naïve Bayes is simple, and its mathematical formulae are easy to understand. The Naïve Bayesian algorithm usually has the following steps [3].
First, the probability of each attribute value x_i taking domain value w_{i,t} in class C_k is estimated, and these estimates determine the class probabilities. If P(C_i|x) > P(C_j|x), x is assigned to class C_i; otherwise, x is assigned to class C_j. The Bayesian approach classifies a new instance by assigning the most probable target value, P(C_i|x), given the attribute values {w_1, w_2, ..., w_n} that describe the instance, according to Equation 2.3.
$$P(C_i \mid x) = \frac{P(x \mid C_i)\,P(C_i)}{P(x)} \qquad (2.4)$$
The Naïve Bayes classifier makes the simplifying assumption that the attribute values are conditionally independent given the target value, so the likelihood factorizes according to the following equation:
$$P(x \mid C_i) = \prod_{k=1}^{n} P(w_k \mid C_i) \qquad (2.5)$$
Step 3. The probability of each class is calculated according to the following equation:
$$P(C_i \mid x) = \frac{P(C_i) \prod_{k=1}^{n} P(w_k \mid C_i)}{P(x)} \qquad (2.6)$$
Step 4. The class with the maximal probability is selected according to the following equation:
$$C^{*} = \arg\max_{i} P(C_i \mid x) \qquad (2.7)$$
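Equation 2.3 can be turned into a small counting-based classifier. The minimal Java sketch below estimates P(C_j) and P(A_i | C_j) as relative frequencies from nominal training data. It then returns the class with the largest product. It assumes nominal attributes only and applies no smoothing, so an attribute value never seen with a class drives that class's score to zero. Practical implementations usually add a smoothing correction. The class name, attribute values, and class labels in the example are hypothetical and do not come from the Dashen Bank data.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical counting-based Naive Bayes for nominal attributes (no smoothing). */
final class NaiveBayesSketch {
    private final Map<String, Integer> classCounts = new HashMap<>();
    // Nested counts: class -> "attributeIndex=value" -> frequency within that class.
    private final Map<String, Map<String, Integer>> valueCounts = new HashMap<>();
    private int total = 0;

    /** Learning step: count class frequencies and per-class attribute-value frequencies. */
    void train(List<String[]> instances, List<String> classes) {
        if (instances.size() != classes.size()) {
            throw new IllegalArgumentException("one class label per instance is required");
        }
        for (int r = 0; r < instances.size(); r++) {
            String c = classes.get(r);
            classCounts.merge(c, 1, Integer::sum);
            Map<String, Integer> counts = valueCounts.computeIfAbsent(c, k -> new HashMap<>());
            String[] x = instances.get(r);
            for (int i = 0; i < x.length; i++) {
                counts.merge(i + "=" + x[i], 1, Integer::sum);
            }
            total++;
        }
    }

    /** P(C_j) times the product over i of P(A_i | C_j), estimated as relative frequencies. */
    double score(String c, String[] x) {
        int nc = classCounts.getOrDefault(c, 0);
        if (nc == 0) {
            return 0.0;
        }
        Map<String, Integer> counts = valueCounts.get(c);
        double s = (double) nc / total;
        for (int i = 0; i < x.length; i++) {
            s *= (double) counts.getOrDefault(i + "=" + x[i], 0) / nc;
        }
        return s;
    }

    /** Classification step: the class with the highest score (argmax). */
    String classify(String[] x) {
        String best = null;
        double bestScore = -1.0;
        for (String c : classCounts.keySet()) {
            double s = score(c, x);
            if (s > bestScore) {
                best = c;
                bestScore = s;
            }
        }
        return best;
    }
}
```

Suppose the sketch is trained on these hypothetical instances:

- (cash_low, day) → out
- (cash_low, night) → out
- (ok, day) → in
- (ok, night) → in
- (cash_low, day) → in

It then scores (cash_low, day) as 0.2 for out versus about 0.13 for in, and so predicts out.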
ANN is a concept derived from the interconnection of neurons, used to compute and analyze logic. An ANN is a mathematical or computational model inspired by the structure or functional aspects of biological neural networks. A neural network consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation. An ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase [20]. There are two types of neural network [1]: feed-forward networks, and networks with feedback loops, called recurrent networks. In recurrent networks, the output is fed back to the input, where it is processed again recursively to produce a new output.
The popular ANN architectures are either single-layered or multilayered [20]. An ANN with no hidden layer is called single-layered, whereas one with hidden layers is multilayered [3]. There are no clear rules for the “best” number of hidden layer units. The computation of an ANN is considered a black box, because the mathematics by which an ANN finds a solution to a given problem is complex and lengthy. Figure 2.2 shows the main framework of the ANN.
Figure 2.2 ANN model [1]
Computing the output involves summing the products of the weights and the input values that point to a given output node, as shown in equation 2.9.
$$y = \sum_{j=1}^{m} w_j x_j \qquad (2.9)$$
Another aspect of ANNs is backpropagation learning. It is named for its “backwards” direction: errors are propagated from the output layer through each hidden layer down to the first hidden layer. The backpropagation learning process is described as follows [3]:
Initialize the weights with small random values (e.g., ranging from -1.0 to 1.0, or -0.5 to 0.5) and set the other network parameters.
Read in the inputs and the desired outputs.
Compute the actual output (by working forward through the layers).
Compute the error (the difference between the actual and desired output).
Change the weights by working backward through the hidden layers.
Repeat these steps until the weights stabilize.
As depicted in Figure 2.3, the training algorithm of backpropagation is expressed by the following mathematical formulae [3]:
$$y = \sum_{j=1}^{m} w_j x_j \qquad (2.10)$$
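Apply the activation function (in this case, a step function) to the weighted sum:
$$y = \begin{cases} 0 & \text{if } \sum_{j=1}^{m} w_j x_j < 0 \\ 1 & \text{if } \sum_{j=1}^{m} w_j x_j \ge 0 \end{cases} \qquad (2.11)$$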
Update the weights according to the error:
$$w_j \leftarrow w_j + \alpha\,(y_T - y)\,x_j \qquad (2.12)$$
where α is the learning rate and y_T is the target (desired) output.
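Equations 2.10 to 2.12 describe the learning rule of a single unit with a step activation, that is, a perceptron. The minimal Java sketch below follows them:

- The weights start as small random values in [-0.5, 0.5).
- Each example's output is computed with the step function.
- The weights are updated by the learning rate times (target − output) times x_j.
- Passes over the data repeat until a full pass produces no error.

The learning-rate name alpha, the bias weight w[0] with a constant input of 1, the class name, and the parameter values are assumptions added for illustration. The sketch covers a single layer and does not implement multilayer backpropagation.

```java
import java.util.Random;

/** Hypothetical single-unit perceptron following Equations 2.10 to 2.12. */
final class PerceptronSketch {
    private final double[] w;   // w[0] is a bias weight paired with a constant input of 1
    private final double alpha; // learning rate (assumed name)

    PerceptronSketch(int numInputs, double alpha, long seed) {
        Random rnd = new Random(seed);
        this.w = new double[numInputs + 1];
        for (int j = 0; j < w.length; j++) {
            w[j] = rnd.nextDouble() - 0.5; // small random start in [-0.5, 0.5)
        }
        this.alpha = alpha;
    }

    /** Weighted sum (Eq. 2.10) passed through a step function (Eq. 2.11). */
    int output(double[] x) {
        double sum = w[0];
        for (int j = 0; j < x.length; j++) {
            sum += w[j + 1] * x[j];
        }
        return sum >= 0 ? 1 : 0;
    }

    /** Eq. 2.12 for one example: w_j += alpha * (target - output) * x_j. Returns the error. */
    int update(double[] x, int target) {
        int error = target - output(x);
        w[0] += alpha * error; // the bias input is always 1
        for (int j = 0; j < x.length; j++) {
            w[j + 1] += alpha * error * x[j];
        }
        return error;
    }

    /** Repeats passes until a whole pass makes no error (weights stop changing); returns epochs used or -1. */
    int train(double[][] xs, int[] targets, int maxEpochs) {
        for (int epoch = 1; epoch <= maxEpochs; epoch++) {
            int errors = 0;
            for (int i = 0; i < xs.length; i++) {
                if (update(xs[i], targets[i]) != 0) {
                    errors++;
                }
            }
            if (errors == 0) {
                return epoch;
            }
        }
        return -1;
    }
}
```

The sketch can be trained on the logical AND function, with inputs {0,0}, {0,1}, {1,0}, {1,1} and targets 0, 0, 0, 1. It converges to weights that reproduce all four targets, because AND is linearly separable.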
An activation function is usually applied to limit the output, for example to values between 0 and 1. There are many activation functions that bound the output in this way. The most widely used is the sigmoid function [3], shown in equation 2.8. Its graphical representation is presented in Figure 2.4.
$$a = \frac{1}{1 + e^{-x}} \qquad (2.8)$$
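The weighted sum and the sigmoid can be combined to give the output of a single neuron. This minimal Java sketch first computes y = Σ w_j x_j and then a = 1 / (1 + e^(−y)), so the output always lies in the range 0 to 1. The class name SigmoidUnit is hypothetical.

```java
/** Hypothetical single neuron: weighted sum followed by the logistic sigmoid. */
final class SigmoidUnit {
    private SigmoidUnit() {}

    /** Eq. 2.8: a = 1 / (1 + e^(-x)). */
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /** Eq. 2.9 then Eq. 2.8: y = sum of w_j * x_j, output = sigmoid(y). */
    static double output(double[] w, double[] x) {
        if (w.length != x.length) {
            throw new IllegalArgumentException("one weight per input is required");
        }
        double y = 0.0;
        for (int j = 0; j < w.length; j++) {
            y += w[j] * x[j];
        }
        return sigmoid(y);
    }
}
```

For example, weights (1, -1) with inputs (2, 2) give a weighted sum of 0, and the sigmoid maps that sum to 0.5.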
Before training can begin, the network topology must be defined by deciding on the number of units in the input layer, the number of hidden layers (if more than one), the number of units in each hidden layer, and the number of units in the output layer [3].
2.3 DM process models
An accepted format for the knowledge discovery process (KDP) within a common framework is known as a process model [1]. Each data mining process model has its own framework. As shown in Figure 2.5, the DM process defines a sequence of steps (with possible feedback loops) that should be followed to discover knowledge (e.g., patterns) in data. Each step is usually realized with the help of available commercial or open-source software tools [3].
The following are the three common knowledge discovery process models: the academic research model, the industrial model, and the hybrid model [1].
1) Understanding the application domain and the value of the final product of the KDP.
2) Forming a dataset: a valuable sample subset that will be the input for the KDP.
3) Data cleaning and preprocessing: - treating missing values, removing outliers, and accounting for time-sequence information and known changes.
4) Data reduction and projection: - searching for the determinant features or attributes that align with the purpose of the KDP, and transforming the data so that it is representative of the dataset.
5) Choosing the data mining task appropriate to the application domain: deciding whether it is predictive or descriptive, for example clustering, association, or classification.
6) Selecting an algorithm to create a model that outperforms the other models.
7) Data mining (finding patterns or knowledge): the knowledge can be represented in forms such as decision trees or induced rules.
8) Interpreting mined patterns: - the knowledge representation can be visualized as graphs or text, depending on the selected model.
9) Consolidating the acquired knowledge: - this includes reporting to the concerned parties, documentation, deployment, and taking appropriate action or using the knowledge as input for further research.
1) Business understanding: - understanding the business goals and the input the business needs. At this stage, the business problem and the available resources are identified and translated into a data mining problem definition. A preliminary project plan is also formed, detailing each task to be performed to achieve the objectives.
2) Data understanding: - starts with data collection, followed by describing the dataset, exploring it to learn what it contains, verifying data quality, and enumerating and locating every feature.
3) Data preparation: - the most time-consuming and a very crucial part of data mining. It includes table, record, and attribute selection; integration of datasets; data cleaning; construction of new attributes; and data transformation.
4) Modeling: - selecting the best algorithms and techniques for creating an applicable model. Default, calibrated, and optimal parameter values are tested to find the best model. It is an iterative process.
5) Evaluation: - all the models created are evaluated against the business objective criteria. Before a final decision is made at this step, all the previous processes are reviewed for the selected model.
6) Deployment: - the delivered knowledge, which depends on the application domain, is reported, presented, and submitted in a way that is easy to understand, or it serves as input for another KDP.
This model has several feedback loops, with detailed reasons for each feedback.
The extracted knowledge can be extended to other application domains.
It has a research-oriented structure.
It emphasizes data mining.
One such model is a six-step KDP model (see Figure 2.7 below) developed by Cios et al. [1].
The steps are discussed as follows,
Figure 2.7: The six-step KDP model [1]
1) Understanding of the problem domain: - The kick-off step is understanding the data mining application domain. This means looking closely at that specific domain, working with domain experts to define the problem clearly and determine the project goals, identifying key people, and learning about current solutions to the problem. It also involves learning domain-specific terminology. This step includes the following tasks.
A description of what the problem is, how and why it happens, and its restrictions is prepared.
The business goals of the application domain are translated into DM goals.
An initial selection is made, from the many available commercial and open-source DM tools, of those to be used later in the process.
2) Understanding of the data: - Building on the kick-off step, this phase performs the following tasks.
The main task is collecting and sampling data and deciding which data, including format and size, will be used.
The final task in this phase is verifying the usefulness of the data with respect to the DM goals.
3) Preparation of the data: - Building on the two preceding phases, this phase decides which data will be used as input for the DM methods in the next step. Data mining results depend heavily on data preparation, and poor data preparation yields incorrect results. This step is time-consuming and completes the following tasks.
Data cleaning, which includes collecting data from different sources (integration).
Data preparation also includes making the quality data suitable for the selected DM tool.
4) Data mining: - this step uses the quality data from the previous step to find the desired knowledge. Different techniques and algorithms are applicable for extracting knowledge. The following are some of them.
5) Evaluation: - At this step, the extracted knowledge is checked for interestingness and applicability. Domain experts also check the interpretation of the results and the interestingness of the knowledge before the model is retained.
6) Use of knowledge: - the final step covers documentation, deployment, application in the current domain, extension to other domains, and planning where and how to use the discovered knowledge.
Financial data collected in the banking and financial industries are often relatively complete, reliable, and of high quality, which facilitates systematic data analysis and data mining. In the banking industry, data mining is applied to loan payment prediction and customer credit policy analysis, detection of money laundering and other financial crimes, and classification and clustering of customers for targeted marketing. It is also used to design and construct data warehouses for multidimensional data in order to find general properties of the data [3].
According to Han and Kamber [3], another application area is the retail industry. Many stores have their own websites, and some have no brick-and-mortar location but exist solely online, where their customers can purchase items. In the retail industry, data mining is used to identify customer buying behaviors, discover customer shopping patterns and trends, improve the quality of customer service, achieve better customer retention and satisfaction, enhance goods consumption ratios, design more effective goods transportation and distribution policies, and reduce the cost of business. Some applications of DM in the retail industry are:
- the design and construction of data warehouses;
- multidimensional analysis of sales, customers, products, time, and region;
- analysis of the effectiveness of sales campaigns;
- customer retention (analysis of customer loyalty);
- product recommendation and cross-referencing of items.
The telecommunication industry is also inclined to apply DM to understand the business involved, identify telecommunication patterns, catch fraudulent activities, make better use of resources, and improve the quality of service [3]. The following are some applications:
Multidimensional analysis of telecommunication data- to identify and compare the data traffic,
system workload, resource usage, user group behavior, and profit.
Fraudulent pattern analysis and the identification of unusual patterns: to (1) identify potentially
fraudulent users and their atypical usage patterns; (2) detect attempts to gain fraudulent entry to
customer accounts; and (3) discover unusual patterns that may need special attention, such as
busy-hour frustrated call attempts, switch and route congestion patterns, and periodic calls from
automatic dial-out equipment (like fax machines) that have been improperly programmed.
Multidimensional association and sequential pattern analysis: This can help promote the sales of
specific long-distance and cellular phone combinations and improve the availability of particular
services in the region.
Mobile telecommunication services: to design adaptive solutions enabling users to obtain useful information with relatively few keystrokes. DM is also used for telecommunication data analysis and visualization.
Han and Kamber [3] describe the power of DM in intrusion detection, where it can be more precise than traditional intrusion detection systems (IDS). Moreover, a traditional IDS needs highly qualified people to find the subtleties of anomaly signatures, whereas DM requires far less manual processing and input from human experts. The following are areas in which data mining technology may be applied or further developed for intrusion detection [3]:
Development of data mining algorithms for misuse detection and anomaly detection.
Association and correlation analysis, and aggregation to help select and build discriminating
attributes.
Analysis of stream data for real-time intrusion detection.
Distributed data mining to analyze network data from several network locations in order to detect
these distributed attacks.
Visualization and querying tools for viewing any anomalous patterns detected.
An empirical study by Farooqi et al. [28] describes the need to apply DM to historical data to extract useful information that can be a source of sound decisions. It also presents DM techniques and their useful applications in the banking industry, such as marketing and retail management, CRM, investment banking, portfolio management, risk management, and fraud detection.
Investment Banking
DM techniques are very helpful in selecting the best investment according to the client's profile. Neural networks and linear regression can be applied to predict stock prices. Based on DM predictions, the selected investment can be the one with the maximum ROI [28].
Portfolio Management
With DM techniques, investors can allocate their budgets appropriately in trading activities to maximize profit and minimize risk [28].
Only recently have ATM transactional data been analyzed, in some journal articles, to extract hidden patterns. The goals of such analysis using data mining technology are to:
- maximize profit;
- increase return on investment;
- support business intelligence;
- satisfy customers through customer relationship management;
- survive and lead in a fiercely competitive market.
The first reviewed article is on the construction of adaptive automated teller machines, by Shaikh and Mahmood [11]. The ATM transactional data were collected and preprocessed, and data mining technology was applied to construct the ATM user interface. The data mining tool used was ProM, and a quantitative research method was then used as an extended approach to reinforce the findings. The findings revealed that most ATM customers preferred the adaptive automated teller machine.
Quality ATM service is desired by both customers and service providers, whenever and wherever the ATM is used. This research and the related articles reviewed in this thesis are concerned with improving ATM service by applying data mining technology to collected historical data. Data mining currently attracts business and other fields, and because of this, research continues on ATM transactional data to forecast future business trends of ATMs.
Different types of users who hold an ATM card access ATMs at any time, wherever the ATMs are located, to obtain different services. Under normal conditions, the time a customer takes to access and use an ATM depends on a response time of thirty seconds. However, the type of customer, the time of access, and the location of the ATM can determine the time span. In densely populated areas and at peak ATM transaction hours, queues for access are common.
The objective of the adaptive automated teller machine was to develop adaptive ATM interfaces that minimize ATM usage time for a population of customers, particularly at heavy-traffic ATMs [11]. Two approaches were used to achieve this objective. The first and most used was data mining technology: after the data were preprocessed, interesting knowledge was extracted. The results identified withdrawal as the operation most frequently performed by customers, followed by purchase and balance inquiry. Based on these findings, the authors developed interfaces that minimize the time taken to use the ATM.
The second approach to achieving the objective of the adaptive automated teller machine was a quantitative research method. The ATM interfaces developed after the knowledge discovery process were evaluated with an online survey (questionnaire). The evaluation showed that users preferred the adaptive ATM interface or menu.
Indeed, data mining technology produced hidden knowledge that improves ATM service in densely populated areas and at peak ATM transaction hours. However, reducing waiting times or queues at ATMs is not enough. Identifying the specific reason an ATM is out of service is the first step toward increasing and improving the quality of service.
Madhavi et al. [13] conducted an ATM service analysis using data mining. ATM transactional data were collected and preprocessed. Predictive data mining was then used to find when the ATM is used most frequently, identify the peak time of an ATM in a day or month, and spot any ATM transaction. The data mining tool was Weka, and the findings support decision makers in taking the necessary actions.
The objective of the research was to provide graphical visualization for easy identification of ATM usage levels and to monitor ATM peak time. To achieve this objective, predictive data mining was applied. After the data were preprocessed, the extracted knowledge was visualized: the peak day of an ATM in a month and the peak hour of an ATM in a day were identified, the regularly occurring transaction types were recognized, the locations of the ATMs providing the service were identified together with their usage levels, and ATM usage for every location was calculated. Based on these business intelligence findings, bank decision makers can more easily determine the future direction of the business.
Predictive data mining thus improved ATM service and increased Quality of Service (QoS). One QoS gain is that cash can be replenished in a timely manner, reducing out-of-stock situations at an ATM. Another piece of uncovered hidden knowledge was the peak and idle levels of each ATM, which help decision makers take appropriate action.
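The peak-time analysis described for [13] can be illustrated with a minimal sketch that counts transactions per hour of the day and per day of the month and reports the busiest value of each. The timestamp format and the sample timestamps are hypothetical, and [13] performed its analysis in Weka rather than with code like this.

```python
from collections import Counter
from datetime import datetime

# Hypothetical transaction timestamps for a single ATM; the format is an assumption.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
timestamps = [
    "2018-01-05 08:15:02",
    "2018-01-05 12:40:10",
    "2018-01-05 12:55:41",
    "2018-01-06 12:05:00",
    "2018-01-06 18:30:12",
    "2018-01-25 12:20:33",
    "2018-01-25 17:45:09",
]


def peak(counter):
    """Return the (value, count) pair with the highest count."""
    return counter.most_common(1)[0]


parsed = [datetime.strptime(t, TIME_FORMAT) for t in timestamps]
peak_hour = peak(Counter(dt.hour for dt in parsed))  # busiest hour of the day
peak_day = peak(Counter(dt.day for dt in parsed))    # busiest day of the month

print(f"Peak hour: {peak_hour[0]}:00 ({peak_hour[1]} transactions)")
print(f"Peak day of month: {peak_day[0]} ({peak_day[1]} transactions)")
```

The same counting applied per ATM location would give the usage level per location mentioned above.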
The above studies opened the door for further analysis of ATM transaction datasets, because ATM data are currently not analyzed extensively [13]. Both studies view and analyze ATM transaction datasets in their own way. The research at hand investigates a different dimension, the out-of-service problem, to identify the reasons for ATMs being out of service through a designed model.
Gümüş et al. [12] attempted to analyze the service provided by banks to their customers in order to determine customer satisfaction with the common use of ATMs. The researchers pinpoint the need for a standard measurement in the banking industry for providing quality service in the dynamics of ATM usage. For businesses such as banks, customer care and customer relationship management, meeting customers' expectations and determining how those expectations will be met, are key. For any business to reach a high position, keep it for long, and remain competitive, it must find new customers or duly satisfy its existing ones.
According to Gümüş et al. [12], customer satisfaction has today become one of the busiest segments of marketing. For this reason, they studied the expected service quality and the perceived service quality of common ATMs and thus determined the satisfaction level of ATM users.
The extent to which the existing service for users of common ATMs met their expectations was not known. To determine the satisfaction level of common ATM users, the study followed an approach called the screening model. Questionnaires based on demographic characteristics were used to identify the difference between the expected and the perceived service quality of common ATMs, which indicates the satisfaction level of ATM users. For the findings, the sample group was divided by demographic characteristics: gender, age, educational status, marital status, income status, the number of banks/credit cards used, and the most frequently used credit cards. The findings showed that the measurements were statistically significant, and almost all perceived measurement scores were lower than the expected scores.
The research found that the low satisfaction level of banks' ATM users was due to unsatisfactory quality of service. It recommended that banks periodically measure their customer satisfaction levels through updated questionnaire screening operations. Further, banks are recommended to improve their processes in order to provide services to customers on time and in a fast, flawless, enthusiastic, and reliable manner. With respect to common ATM use, bank employees should improve their skills, become equipped with the necessary know-how, and foster a feeling of trust in customers.
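The comparison of expected and perceived service quality in [12] can be sketched as a per-item gap score, perceived minus expected, where a negative gap means the expectation was not met. The item names and mean scores below are hypothetical, and the statistical significance tests of [12] are not reproduced.

```python
from statistics import mean

# Hypothetical mean questionnaire scores per item on a 1-5 scale.
expected = {"fast service": 4.6, "cash always available": 4.8, "clear menu": 4.2}
perceived = {"fast service": 3.9, "cash always available": 3.5, "clear menu": 4.3}


def gap_scores(expected, perceived):
    """Gap = perceived - expected per item; negative values mean expectations were not met."""
    return {item: round(perceived[item] - expected[item], 2) for item in expected}


gaps = gap_scores(expected, perceived)
overall_gap = round(mean(gaps.values()), 2)
unmet = sorted(item for item, gap in gaps.items() if gap < 0)

print(gaps)
print("overall gap:", overall_gap, "unmet items:", unmet)
```

A mostly negative set of gaps corresponds to the finding that perceived scores fell below expected ones.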
All the research reviewed opens the way for further research, typically on ATM quality of service. The research at hand, however, investigates the reasons for the ATM out-of-service problem, a different dimension that has so far been overlooked.
The research methodology followed was an importance-performance approach, referencing the commonly used five-attribute model of tangibles, reliability, responsiveness, empathy, and assurance to measure service quality (SQ) and customer satisfaction.
To develop the measurement scales, the study referenced 25 known valid ATM SQ attributes from different sources, comprising tangibles (6 items), reliability (6 items), responsiveness (6 items), assurance (3 items), and empathy (4 items). A Likert scale was applied to these 25 questionnaire items, which were then administered by email to 353 ATM card users.
Overall, 53% of the respondents were satisfied with ATM services, and although all five referenced SQ attributes were significantly associated with customer satisfaction, reliability correlated strongly with ATM service satisfaction.
The study remarked that banks in Malawi should make responsiveness and empathy SQ a standard for competitive advantage, because employee performance on ATM card applications, reconciliation processes, and management functionality had been perceived as poor.
The demographic characteristics covered were limited. Thus, future research on ATM customer satisfaction is recommended to include a wider range of demographic characteristics, along with customer satisfaction research on mobile and internet banking.
Chan et al. [30], in "Modified automatic teller machine prototype for older adults: A case study of participative approach to inclusive design", studied the existing ATM interface with ATM card users aged 60 or above who have physical or cognitive limitations, because the existing human-ATM interface did not suit elderly people.
The research methodology followed a user-participative approach to adopt a universal design that can accommodate older adults. A total of 187 older adults participated in two stages: 91 were assigned to the existing human-ATM interface test group and 96 to the modified universal human-ATM interface prototype group.
For participants with a lower education level, the results indicated that the universal design should reduce the amount of information to be recalled and processed, reduce the transaction time, present information more slowly, simplify the presentation of information on screen, use a single screen to complete one operation/transaction, and use graphics and visual effects.
The research noted a tradeoff of the universal design: the reduced functionality of the modified ATM prototype would inconvenience younger users. Hence, it recommended that future research work on achieving optimum functionality. Another research gap concerned the participants: all respondents were gathered from a single elderly center. It is therefore recommended that future work draw respondents globally, or from a wider area and from people of different status.
Moslehi et al. [31], in "Analyzing and Investigating the Use of Electronic Payment Tools in Iran using Data Mining Techniques", investigated e-payment instruments, namely ATM and Pin Pad transactions: first, the factors that affect the number and amount of transactions; second, the relationship between the period and the volume of transactions; and third, the relationship between the bank and the province.
To address the problem, the CRISP-DM methodology was followed, and the techniques used were clustering and classification. The data were collected from the Statistical Center of the Central Bank of Iran and consisted of performance statistics of electronic payment tools (ATM) from 1986 to 1994; 49,425 records of 12 variables were used for DM. The DM tool was Clementine 12. First, the K-Means algorithm was applied to determine the target field categories, and then the CART decision tree algorithm was used to explore the factors affecting the average amount of ATM transactions and the average amount of Pin Pad transactions.
The CART decision tree showed that ATM transactions in seven banks and seven provinces increased significantly in the years 1991 to 1994 in summer, autumn, and winter compared with spring. Another rule was that an increase in ATM terminals was directly proportional to Pin Pad transactions. The research was also noted to enhance e-banking services and future e-banking policy. Future work was recommended on telephone banking, internet banking, mobile banking, and POS for a comprehensive review of bank performance in electronic payment.
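The K-Means step in [31], which turned a numeric target into categories before CART was applied, can be sketched in pure Python for a single numeric variable. The amounts, the choice of three clusters, the category names, and the deterministic initialization are assumptions for illustration; Clementine's K-Means implementation is not reproduced here.

```python
def kmeans_1d(values, k=3, iterations=100):
    """Lloyd's k-means on one numeric variable; returns (centroids, labels). Requires k >= 2."""
    data = sorted(values)
    # Deterministic initialization: k values spread evenly over the sorted data.
    centroids = [float(data[i * (len(data) - 1) // (k - 1)]) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members (kept if the cluster is empty).
        updated = []
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break  # converged
        centroids = updated
    return centroids, labels


def to_categories(values, names=("low", "medium", "high")):
    """Turn a numeric target into nominal categories ordered by cluster centroid."""
    centroids, labels = kmeans_1d(values, k=len(names))
    order = sorted(range(len(names)), key=lambda c: centroids[c])
    rank = {cluster: position for position, cluster in enumerate(order)}
    return [names[rank[label]] for label in labels], sorted(centroids)


# Hypothetical average transaction amounts.
amounts = [120, 130, 125, 560, 540, 580, 1500, 1450, 1550]
categories, centroids = to_categories(amounts)
print(categories)
print(centroids)
```

The resulting nominal categories would then serve as the class attribute for a decision tree such as CART.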
Bastos et al. [18], in "A maintenance prediction system using data mining techniques", studied the mitigation of industrial machine failure. Using visualization techniques and an ANN back-propagation algorithm, the system could predict failure occurrence with an accuracy of 72%. For the future, the authors suggested that adding monitoring data to the system would make it possible to predict failure occurrence in a timely manner and to intervene on equipment before breakdowns, reducing maintenance effort and cost.
Thus, the proposed future work pointed out that predicting new failures requires more features, particularly from the equipment monitoring database, to understand the reasons for all malfunction occurrences.
CHAPTER THREE
DATA PREPARATION
The Dashen Bank external portal lists the following four main products and services [5]: domestic banking, international banking, Dashen card, and e-banking.
Domestic banking contains three products and services. The first is the current account, a convenient medium to make and receive payments that allows money to be accessed easily using cheques or a VISA card. The second is loans, offered in different kinds such as agriculture, manufacturing, import/export, trade and services, building and construction, and transport. The third is money transfer, the remittance of funds from one branch to another. Common local means of remitting are mail transfers, telegraphic or telephone transfers, local drafts, and the Cashier Payment Order (CPO), and the well-known international ones are Western Union and MoneyGram.
International banking products and services mainly include Foreign Exchange Permits and Import and Export. The approval process for foreign exchange permits requires the presentation of different sets of documents for each transaction. Import and Export, which is related to Foreign Exchange Permits, enables the purchase of commodities, machinery, materials, etc. from abroad and allows them to enter Ethiopian territory.
Dashen cards come in different varieties. The first is the debit card, which allows the holder to access ATMs/POS at any time, operate multiple accounts with a single card, withdraw up to 5,000 Birr per day per card subject to the account balance, purchase goods or services up to 8,000 Birr per day per card at any Dashen Bank merchant, check the balances of all accounts linked to the card, and obtain a mini-statement that lists the last ten transactions in any of the accounts linked to the card. The others are the Salary Card, which is useful for organizations' salary payments, and the Student Card for students.
E-banking services offered by the bank include ATM/POS, Internet Banking, Mobile Banking,
and Agency banking [5].
Services available on Dashen Bank ATMs are cash withdrawal, balance inquiry, mini-statement, fund transfer between accounts attached to a single card, PIN change, and PIN unblock. POS allows the purchase of goods from Dashen merchants who have POS machines.
Internet banking enables customers all over the world to access their accounts through web browsers such as Mozilla Firefox, Chrome, etc. The major services are account information; enquiries (mini statement, full statement); daily exchange rate; loan statement; fund transfer within Dashen Bank and to accounts in other local banks; salary and provident fund upload; electronic bill payment (utility payment); stop cheque payment; cheque book order; password change; and other services.
Mobile banking services are fund transfer between one's own bank account and wallet account; fund transfer through mobile phone for those registered for the mobile service; fund transfer to others who have only a mobile phone number and are not registered for the mobile service; mini statement and account history of the wallet account; balance enquiry on the bank account and wallet account; PIN change; fund transfer from bank account to bank account for those registered for the mobile service; merchant payment; bill payment; and other services.
Agency banking's major services are account information, e-wallet account opening (registration), cash deposit to a wallet account, cash withdrawal from a wallet account, fund transfer to others who have only a mobile phone number, bill payment, merchant payment, facilitating regular bank account opening, and other services.
The ATM data collection was studied on the ProView monitoring console [16] together with a domain expert, in a collaborative effort to better understand the problem domain. In other words, in this step, in order to properly understand the problem, we explored the problem domain and gained insight focused on ATM data, including the following issues: What are the variables? Which variables make an ATM go out of service? What kind of data is it? How was the data collected? Where is the data? Why was the data collected? And how should the data be translated to achieve the DM goals using the tool Weka [1]?
For this reason, the researcher, in consultation with a domain expert, learned about the Dashen Bank S.C. ATM monitoring system called ProView. ProView is an agent- and PC-based event monitoring and incident management system for the monitoring and administration of ATMs [8]. The agent on each ATM terminal or self-service application and the self-service hardware send all collected events to Wincor Nixdorf's ProView management system. For this purpose, MS Windows 2013 is installed as the network operating system, and SQL Server 2008 is installed on this platform as the relational database management system. The Wincor Nixdorf ProView application, which monitors and administers all ATMs, is installed on Windows 7.
Using this application, the administrator can view in real time the cash status, mechanical problems, and other statuses of ATMs to take remedial action, and can generate different kinds of reports, such as ATM uptime and downtime.
The main components of ATMs are the card reader, keypad (EPP), cash dispenser, display screen, speaker, and receipt printer. Live communication between the ATMs and ProView enables administrators to monitor the status of all Dashen ATMs. On the other side, the B24 remote host processor manages every customer withdrawal request and the other requests available on the ATM display menu. Every event on the ATM interface is sent to the ProView monitoring console, which is the part of interest for this research. Each event provides the information described in Table 3.1 below.
Table 3.1 Event content and description
Event | Description
Time stamp | Includes month, date, year, hour, minute and seconds
Event message | Activities going on when the ATM interface is touched for different purposes, based on the menu
Server time stamp | A time stamp like the event time stamp, but recorded when the events arrive on the ProView server
Component | The ATM has two components: the upper one (the computer) and the dispenser (cash cassettes)
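A single event with the fields of Table 3.1 can be represented as a small record type. The sketch below parses an event table exported to CSV; the column names, the time stamp format, and the sample rows are assumptions for illustration, not the actual ProView export layout.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

TIME_FORMAT = "%m/%d/%Y %H:%M:%S"  # assumed format of the exported time stamps


@dataclass
class AtmEvent:
    """One ProView event with the fields described in Table 3.1."""
    time_stamp: datetime
    event_message: str
    server_time_stamp: datetime
    component: Optional[str]  # None when the component value is missing


def parse_events(csv_text: str) -> List[AtmEvent]:
    """Parse an exported event table (hypothetical column names) into AtmEvent records."""
    events = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        events.append(AtmEvent(
            time_stamp=datetime.strptime(row["Time stamp"], TIME_FORMAT),
            event_message=row["Event message"].strip(),
            server_time_stamp=datetime.strptime(row["Server time stamp"], TIME_FORMAT),
            component=row["Component"].strip() or None,
        ))
    return events


SAMPLE = """Time stamp,Event message,Server time stamp,Component
01/15/2018 10:02:11,Card reader: Blank track,01/15/2018 10:02:13,Computer
01/15/2018 10:05:40,Dispenser: Pick failure - out of bills,01/15/2018 10:05:41,
"""

events = parse_events(SAMPLE)
for event in events:
    delay = (event.server_time_stamp - event.time_stamp).total_seconds()
    print(event.event_message, event.component, f"arrived after {delay:.0f} s")
```

Keeping both time stamps makes the delay between the ATM and the ProView server visible, and an empty component is kept as a missing value rather than dropped.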
Based on the events collected, ATM components are hardware and software. The hardware components of an ATM are the card reader, cash dispenser, keypad, and screen buttons, while the application and the operating system are considered software. Supervisor activities and the network are other independent determinant components of an ATM.
Card reader: This ATM component reads the plastic card. It captures the account information stored on the magnetic stripe on the back of an ATM/debit or credit card, and the host processor uses this information to route the transaction to the cardholder's bank. Whenever the card reader has a problem reading, it is impossible to process an electronic payment and the ATM goes down [32]. Some card reader values when it is nonfunctional are read error, blank track, device disconnected, card jam, and no smart card response [16].
Cash dispenser: This is the heart of an ATM, comprising the safe and the cash-dispensing mechanism. The entire bottom portion of most ATMs is a safe that contains the cash. If this component or any of its subcomponents has a problem, electronic payment becomes impossible and the ATM goes down [32]. Some cash dispenser values when it is nonfunctional are device not in use, device not accessible, bill cassette is empty, bill cassette is missing, reject cassette removed, pick failure, presenter clamping mechanism failed, sensor failure, and currency jam [16].
Keypad: This component lets the cardholder tell the bank what kind of transaction is required (cash withdrawal, balance inquiry, etc.) and for what amount. The bank also requires the cardholder's personal identification number (PIN) for verification [32]. The keypad may also fail and put the ATM out of service: if the keypad or EPP hardware fails, the ATM becomes nonfunctional. The keypad values are invalid key or key doesn't exist [16].
Application: Failures of this software make transactions or cash withdrawal impossible. The software found on ATMs is the Windows operating system and an application system, and both can affect electronic payment and make an ATM down or unavailable. Some of the values are attempt to reboot via system, transaction failed: unable to process transaction, and unable to dispense cash [16].
Network: This is the communication between the ATM self-service terminal and the remote host server, which manages and controls transactions and connects to the bank's core system and database. The values of this independent ATM component are communication offline, TCP/IP address not accessible, and device offline [16].
Supervisor activities (SOP) are also part of the ATM components. Whenever the supervisor performs service or makes changes, the ATM is out of service. The values of SOP are terminal closed for customers and changed to supervisor mode [16].
This stage discusses the data collection activity of this research. The log event database for Dashen Bank ATM monitoring, ProView Data Model 4.2/40 (C) Wincor Nixdorf International 2015, organizes its data in many tables [24]. The researcher, in collaboration with domain experts, selected the portion of those datasets that is of interest for this particular research. The original data attributes available in the log database are presented in Table 3.2.
Table 3.2 Event table from ProView Data Model 4.2/40 (C) Wincor Nixdorf International 2015
The most important are the event tables, which can be exported to MS Excel. The following are some of the datasets from different tables that need refining to obtain a complete, interesting, and non-redundant subset of data.
Figure 3.2 ProView Monitoring Consoles
The left side shows the hierarchy of all ATMs, and the right side shows the events of a single ATM at Capital Hotel.
Table 3.3 Event table on ProView Monitoring Console
Table 3.3 (ProView Monitoring Console) shows 7 attributes. According to [3], data quality can be measured in terms of accuracy, completeness, and consistency. To make the Dashen Bank S.C. ATM data accurate, complete, and consistent, the following section shows what each attribute represents and which values it consists of.
Data were collected from 14 ATMs from December 31, 2017 to May 04, 2018. Table 3.5 depicts every ATM together with the instances collected in MS Excel. The size of each ATM's data varies because it depends on the transactions made. At this stage, the data of the 14 independent ATMs were combined into one file of 15.9 MB with a total of 365,522 instances.
The dataset attributes and values shown in Figure 3.2 (ProView Monitoring Consoles) are valid and complete except 'component', which has 7% missing values.
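The share of missing values per attribute, such as the 7% reported for 'component', can be checked with a few lines of code. The markers treated as missing (empty string, '?', or an absent key) and the sample rows are assumptions for illustration.

```python
def missing_percentage(rows, attribute, missing_markers=("", "?", None)):
    """Percentage of rows whose value for `attribute` is absent or a missing-value marker."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(attribute) in missing_markers)
    return 100.0 * missing / len(rows)


# Hypothetical sample: 93 events with a component value and 7 without one.
rows = [{"component": "Dispenser"}] * 93 + [{"component": ""}] * 7
print(f"component: {missing_percentage(rows, 'component'):.1f}% missing")
```

Running the same function over every attribute gives a quick completeness profile before attribute selection.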
Original message, device name and location, and event number were considered irrelevant because they are not useful for the problem at hand, the prediction of ATM out of service, so these attributes were not used. Collecting the data of all ATMs is too expensive. Based on domain expert consultation, 14 ATMs were selected, located in the Addis Ababa city area (Bole, Lagar, Bole Premium, and Ras Desta), shopping centers (Adams and Edna Mall), hotels (Filuha, Hilton, Shebell, Sheraton, and Yolly), a university (Hawassa), and upcountry areas (Nekemete and Woliso). These ATMs were selected because of their high transaction volume and because their data are more complete than those of the other ATMs.
After attribute selection, the dataset at this stage became the input shown in Table 3.4. The tasks applied in this study were data transformation and the derivation of new attributes.
These attributes are a subset of the event message attribute and were derived according to the construction of new attributes described by Cios et al. [1].
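The derivation of component attributes from the event message can be sketched as a keyword lookup that places each message into the column of its component and marks the other columns as missing. The attribute names follow Table 3.9 and the J48 output, but the keyword rules are hypothetical; in particular, 'Device offline' is listed in this thesis under both a dispenser attribute and NetworkConnectionB24, so message text alone cannot always decide the component.

```python
# Hypothetical keyword rules, checked in order; attribute names follow Table 3.9 and the J48 output.
RULES = [
    ("CardReader", ("card reader",)),
    ("DispenserPickCash", ("pick failure", "purge bin", "too many bills", "clamping")),
    ("DispenserCurrencyCassete", ("cassette", "cashout module")),
    ("DispenserTransporter", ("transport",)),
    ("SOP", ("supervisor mode", "closed for customers")),
    ("Application", ("reboot", "transaction failed")),
    ("NetworkConnectionB24", ("communication offline", "tcp/ip", "device offline")),
]
ATTRIBUTES = [name for name, _ in RULES]


def component_of(message):
    """Return the component attribute a message belongs to, or None if no rule matches."""
    text = message.lower()
    for component, keywords in RULES:
        if any(keyword in text for keyword in keywords):
            return component
    return None


def to_record(message, missing="?"):
    """Derive one attribute per component: the message fills its own column, the rest are missing."""
    record = {name: missing for name in ATTRIBUTES}
    component = component_of(message)
    if component is not None:
        record[component] = message
    return record


print(to_record("Bill cassette 4 is empty"))
```

The "?" marker matches the missing-value symbol Weka expects in ARFF files.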
Figure 3.3 Event message count
As shown in Figure 3.3, there were a total of 135 distinct event message values in the 365,522 instances. After removing all the in-service values, which are not the concern of the current study, the out-of-service event values were organized to prepare the dataset for experimentation. The selected out-of-service event messages are shown in Figure 3.4.
Figure 3.4 Event message out of service
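Keeping only the out-of-service events can be sketched as a filter against the list of selected messages. The selected set below is drawn from messages named in this thesis, while the in-service sample messages are hypothetical.

```python
from collections import Counter

# Selected out-of-service messages (a subset of those named in this thesis).
OUT_OF_SERVICE = {
    "Card reader: Blank track",
    "Card reader: Shutter jammed open",
    "Bill cassette 4 is empty",
    "Device CashOut Module offline",
    "Dispenser: Pick failure - out of bills",
    "Operator switch changed to supervisor mode",
    "Attempt to reboot via system",
}


def select_out_of_service(messages, selected=OUT_OF_SERVICE):
    """Keep only events whose message is one of the selected out-of-service values."""
    return [m for m in messages if m in selected]


events = [
    "Card inserted",             # hypothetical in-service message
    "Card reader: Blank track",
    "Transaction completed",     # hypothetical in-service message
    "Card reader: Blank track",
    "Bill cassette 4 is empty",
]

distinct_all = len(set(events))
kept = select_out_of_service(events)
print(distinct_all, len(kept), Counter(kept).most_common())
```

Using an explicit list of selected messages, rather than a list of messages to drop, keeps any unexpected new message out of the dataset until it is reviewed.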
Data transformation
According to Han and Kamber [3], in discretization and concept hierarchy generation, raw data values for attributes are replaced by ranges or higher conceptual levels. After aggregation, the time stamp attribute became three distinct class values, as shown in Table 3.8.
Table 3.8 Data transformation of Time stamp
Attribute: time stamp | Time stamp values | Aggregated values
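The aggregation of the time stamp into month values, which produced the January, March, and April classes, can be sketched as follows. The time stamp format and the sample values are assumptions; month names are hard-coded so the result does not depend on the system locale.

```python
from collections import Counter
from datetime import datetime

# Locale-independent month names (strftime("%B") would depend on the locale).
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")
TIME_FORMAT = "%m/%d/%Y %H:%M:%S"  # assumed export format


def month_class(time_stamp):
    """Replace a raw time stamp by its month, a higher conceptual level."""
    return MONTHS[datetime.strptime(time_stamp, TIME_FORMAT).month - 1]


raw = ["01/15/2018 10:02:11", "03/02/2018 14:20:05", "04/21/2018 09:11:47", "04/22/2018 19:03:00"]
classes = [month_class(t) for t in raw]
print(Counter(classes))
```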
Table 3.9 shows the attributes used in this research together with their values and data types.
6 | DispenserTransporter | Nominal | Currency jam in presenter transport or transport sensor failure, Sensor failure or currency jam in main transport
7 | SOP | Nominal | Operator switch changed to supervisor mode, Terminal is closed for customers
8 | Application | Nominal | Attempt to reboot via special electronic, Attempt to reboot via system, Transaction failed, Transaction failed: Unable to dispense cash, Transaction failed: Unable to Process Transaction
9 | NetworkConnectionB24 | Nominal | Communication Offline, Device offline, TCP/IP address not accessible
10 | Eventime_Timestamp_2018 | Nominal | January, March, April
After preprocessing, the combined 365,522 instances became a quality dataset of 20,880 out-of-service events, which was the input for the data mining tool Weka. MS Excel and VB were used for data cleaning. This quality dataset file was converted to comma-separated values (CSV) and Attribute-Relation File Format (ARFF) to make it suitable for Weka.
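Writing the prepared dataset in ARFF, the format Weka reads, can be sketched with the standard library alone. Nominal values containing spaces or punctuation are enclosed in single quotes and missing values are written as '?'. The relation name and sample rows are hypothetical, and only nominal attributes are handled.

```python
SPECIAL = set(" ,{}'\"%\t")


def arff_quote(value):
    """Quote a nominal value or name for ARFF if it contains spaces or special characters."""
    if value == "?":
        return value  # ARFF missing-value marker
    if value == "" or any(ch in SPECIAL for ch in value):
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    return value


def to_arff(relation, attributes, rows):
    """Build ARFF text. attributes: [(name, [nominal values])]; rows: dicts, absent keys become '?'."""
    lines = [f"@relation {arff_quote(relation)}", ""]
    for name, values in attributes:
        nominal = ",".join(arff_quote(v) for v in values)
        lines.append(f"@attribute {arff_quote(name)} {{{nominal}}}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(arff_quote(row.get(name, "?")) for name, _ in attributes))
    return "\n".join(lines) + "\n"


attributes = [
    ("SOP", ["Operator switch changed to supervisor mode", "Terminal is closed for customers"]),
    ("Eventime_Timestamp_2018", ["January", "March", "April"]),
]
rows = [
    {"SOP": "Terminal is closed for customers", "Eventime_Timestamp_2018": "April"},
    {"Eventime_Timestamp_2018": "January"},  # SOP value missing
]
arff = to_arff("atm_out_of_service", attributes, rows)
print(arff)
```

Declaring the full list of nominal values up front lets Weka treat every component attribute as categorical.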
According to Cios et al. [1], a classifier is a model of data used for classification: given a new input, it assigns that input to one of the classes it was designed/trained to recognize. The dependent (class) attribute was the event time stamp, which has three class values.
Figure 3.5 The class event time stamp 2018 and its values
CHAPTER FOUR
Modeling
4.1 Building the model
Here the data miner uses various DM methods to derive knowledge from the preprocessed data [1]. In this study, classification algorithms are used to find patterns describing the reasons for ATMs being out of service. The data mining tool used was Weka, which contains a variety of machine learning algorithms; the selected predictive algorithms were PART, J48, Naïve Bayes, and multilayer perceptron, run on Weka (Waikato Environment for Knowledge Analysis) version 3.8.2.
For the test design, Weka's default test option, 10-fold cross-validation, was used for all experiments. The dataset is partitioned randomly into 10 folds. All folds except one are combined to train and produce a model, and the single held-out fold is used as the test set to evaluate the model. This training and testing is repeated 10 times, so that each fold serves once as the test set. Cross-validation is one of the test design options in Weka for evaluating a predictive model.
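The 10-fold partitioning described above can be sketched as follows. Note that this sketch draws plain random folds; Weka stratifies the folds by class when the class attribute is nominal, so its partitions will differ. The seed value is an arbitrary choice.

```python
import random


def k_fold_splits(n, k=10, seed=1):
    """Yield (train, test) index lists for k-fold cross-validation over n instances."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)          # random partition
    folds = [indices[i::k] for i in range(k)]     # k folds of near-equal size
    for i, test in enumerate(folds):
        # Train on the other k - 1 folds, test on the held-out fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_splits(20880, k=10))
print(len(splits), [len(test) for _, test in splits][:3])
```

With 20,880 instances, each test fold holds 2,088 instances and each training set the remaining 18,792.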
The following sections present the experimental results achieved by each algorithm when the models were built.
Table 4.1 Experimental result of J48 decision tree algorithm
The experimental results for J48 in Table 4.1 show that, in cross-validation test mode with unpruned set to false and binary split set to true, the accuracy of 52.7059% is greater than all the other results. That means the J48 algorithm correctly classified about 52.7059% of the instances and incorrectly classified 47.2941%.
Table 4.2 shows the confusion matrix for the pruned tree with binary splits.
Table 4.2 Confusion matrix for the result of J48 decision tree
   a    b    c   <-- classified as
1208 3386  157 | a = January
 668 9563  134 | b = April
 795 4735  234 | c = March
The confusion matrix shows that, out of 20,880 instances, 11,005 (the sum of the diagonal cells) were classified correctly (52.7059%) and the remaining 9,875 (the off-diagonal cells) were classified incorrectly (47.2941%).
4.3 Experiment result using PART
In this experiment, the PART rule induction parameters binary split and unpruned were calibrated by setting them to true or false, whereas the rest of the parameters were kept at their default values. A summary of the experimental results is depicted in Table 4.3.
The experimental results for PART in Table 4.3, in cross-validation test mode, show a best accuracy of 52.591%, greater than all the other results. That means the PART algorithm correctly classified about 52.591% of the instances and incorrectly classified 47.409%.
Table 4.4 shows the confusion matrix for the pruned configuration with binary splits.
Table 4.4 Confusion matrix for the result of PART rule induction
   a    b    c   <-- classified as
1340 3230  181 | a = January
 791 9283  291 | b = April
 897 4509  358 | c = March
The confusion matrix shows that, out of 20,880 instances, 10,981 (the sum of the diagonal cells) were classified correctly (52.591%) and the remaining 9,899 (the off-diagonal cells) were classified incorrectly (47.409%).
4.4 Experimental results of Naïve Bayes
For the Naïve Bayes algorithm, the parameters calibrated away from their default values were the kernel estimator and supervised discretization; the rest of the parameters were kept at their default values. The experiment was done by setting these parameters to true or false, although both cannot be true at the same time. The results were similar, as shown in Table 4.5.
Table 4.5 shows that, in cross-validation test mode, the Naïve Bayes accuracy is the same whether the default values or the calibrated parameters Use Kernel Estimator and Use Supervised Discretization are used: 52.6868%. That means the Naïve Bayes algorithm correctly classified about 52.6868% of the instances and incorrectly classified 47.3132%.
Table 4.6 shows the confusion matrix of Naïve Bayes, which is similar to those of J48 and PART.
Table 4.6 Confusion matrix result of Naïve Bayes
   a    b    c   <-- classified as
1320 3218  213 | a = January
 775 9309  281 | b = April
 874 4518  372 | c = March
The confusion matrix shows that, out of 20,880 instances, 11,001 (the sum of the diagonal cells) were classified correctly (52.6868%) and the remaining 9,879 (the off-diagonal cells) were classified incorrectly (47.3132%). The J48, PART, and Naïve Bayes confusion matrices are clearly similar.
The experimental results for the multilayer perceptron in Table 4.7, in cross-validation test mode, show a best accuracy of 52.2701%, greater than all the other results. That means the multilayer perceptron correctly classified about 52.2701% of the instances and incorrectly classified 47.7299%. Note that the maximum time taken to build the model was 495.64 seconds with the default parameters, and the minimum was 157.05 seconds with the hidden layers set to 7 and a learning rate of 0.6.
The confusion matrix of the multilayer perceptron is similar to those of all the algorithms above.
Table 4.8 Confusion matrix result of multilayer perceptron
   a    b    c   <-- classified as
1093 3378  280 | a = January
 630 9388  347 | b = April
 724 4607  433 | c = March
The confusion matrix shows that, out of 20,880 instances, 10,914 (the sum of the diagonal cells) were classified correctly (52.2701%) and the remaining 9,966 (the off-diagonal cells) were classified incorrectly (47.7299%). The J48, PART, Naïve Bayes, and multilayer perceptron confusion matrices are comparable, but J48 performs best.
4.6 Evaluation
Evaluation includes understanding the results, checking whether the discovered knowledge is novel and interesting, interpretation of the results by domain experts, and checking the impact of the discovered knowledge [1]. In this research, classification models were developed to determine the reasons for Dashen Bank S.C. ATMs being out of service. The J48 decision tree exhibited model performance that matched the business objective, and it was therefore selected for extending the use of the discovered knowledge according to the hybrid KDP model.
Table 4.9 summarizes the classifier outputs. Looking closely, each classifier's output has its own unique presentation. Additionally, the Weka Preprocess tab has features for visualizing and counting instances that make them easy to observe.
Table 4.9 Summarized classifier output
Name of the algorithm | Tree: J48 | Rule: PART | Bayes: Naïve Bayes | Function: Multilayer Perceptron
Correctly classified instances | 52.7059 % | 52.591 % | 52.6868 % | 52.2701 %
Incorrectly classified instances | 47.2941 % | 47.409 % | 47.3132 % | 47.7299 %
Time taken to build model (seconds) | 1.46 | 0.3 | 0.05 | 495.64
Additionally, Weka has features for visualizing data mining metrics in an easy-to-understand way. The other metrics selected to support the accuracy results were precision, recall, and ROC. The following figures are snapshots taken from Weka; the first, Figure 4.1, is the precision-recall curve.
Like accuracy, precision-recall curves are important for visualizing classifier performance. The aim is to observe whether the precision-recall curve lies toward the upper right corner of the chart. The P-R curve is displayed with respect to the class value April, with recall (true positive rate) on the x-axis and precision on the y-axis.
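Weka draws the P-R curve in Figure 4.1 from the class probabilities it predicts. The precision and recall of the J48 classifier's actual decisions for April can, however, be read directly from the confusion matrix in Table 4.2, treating April as the positive class and the other two months as negative. The sketch below does this and also computes the false positive rate used on the ROC x-axis.

```python
CLASSES = ["January", "April", "March"]
# J48 confusion matrix from Table 4.2: rows are actual classes, columns are predicted classes.
J48_MATRIX = [[1208, 3386, 157], [668, 9563, 134], [795, 4735, 234]]


def one_vs_rest(matrix, k):
    """Precision, recall (TPR) and false positive rate for class index k against all others."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    predicted_k = sum(matrix[i][k] for i in range(n))  # column sum
    actual_k = sum(matrix[k])                           # row sum
    fp = predicted_k - tp
    return {
        "precision": tp / predicted_k,
        "recall": tp / actual_k,
        "fpr": fp / (total - actual_k),
    }


april = one_vs_rest(J48_MATRIX, CLASSES.index("April"))
print({name: round(value, 3) for name, value in april.items()})
```

J48 assigns April to 17,684 of the 20,880 instances, which explains both the high recall and the high false positive rate for this class.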
Besides accuracy and P-R, another important measure of model performance is ROC. The ROC curve in Figure 4.2 is displayed with respect to the class value April. The aim is for the curve to lie close to the upper left corner, where the true positive rate on the y-axis is one. Another interpretation is given by the area under the ROC curve (AUROC), also shown in Figure 4.2. The ROC curve compares two operating characteristics (TPR on the y-axis and FPR on the x-axis), and an AUROC between 0.5 and 1 indicates that the model performs better than random guessing, with values closer to 1 indicating better performance.
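The area under the ROC curve has a convenient equivalent: it is the probability that a randomly chosen positive instance (here, April) receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it that way for hypothetical scores; Weka computes the value in Figure 4.2 from the J48 class probabilities, which are not reproduced here. The pairwise loop is quadratic and meant only for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve as P(score of a positive > score of a negative), ties count 0.5."""
    positives = [s for label, s in zip(labels, scores) if label]
    negatives = [s for label, s in zip(labels, scores) if not label]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative instance")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical P(April) scores; label 1 = actually April, 0 = another month.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.1]
print(auc(labels, scores))      # 0.75: 3 of 4 positive/negative pairs ranked correctly
print(auc([1, 0], [0.5, 0.5]))  # 0.5: uninformative scores
```

An AUROC of 0.5 therefore corresponds to a model that ranks positives and negatives no better than chance.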
In this research, the J48 decision tree classifier performed best on the evaluation metrics discussed above, so it was selected for using the discovered knowledge. The following chapter shows the implementation of the J48 model to produce a prototype.
| | | | | | | | | | Application = Attempt to reboot via special electronic: January (294.0/179.0)
| | | | | | | | | | Application != Attempt to reboot via special electronic
| | | | | | | | | | | DispenserCurrencyCassete = Bill cassette 4 is empty: January (64.0/38.0)
| | | | | | | | | | | DispenserCurrencyCassete != Bill cassette 4 is empty
| | | | | | | | | | | | SOP = Operator switch changed to supervisor mode: January (406.0/238.0)
| | | | | | | | | | | | SOP != Operator switch changed to supervisor mode
| | | | | | | | | | | | | DispenserCurrencyCassete = Device CashOut Module offline: January (236.0/139.0)
| | | | | | | | | | | | | DispenserCurrencyCassete != Device CashOut Module offline
| | | | | | | | | | | | | | CardReader = Card reader: Shutter jammed open: March (3.0)
| | | | | | | | | | | | | | CardReader != Card reader: Shutter jammed open
| | | | | | | | | | | | | | | CardReader = Card reader: Blank track: January (228.0/108.0)
| | | | | | | | | | | | | | | CardReader != Card reader: Blank track
| | | | | | | | | | | | | | | | DispenserPickCash = Dispenser: Purge bin not present: March (8.0/2.0)
| | | | | | | | | | | | | | | | DispenserPickCash != Dispenser: Purge bin not present
| | | | | | | | | | | | | | | | | DispenserPickCash = Dispenser: Pick failure - out of bills: March (246.0/153.0)
| | | | | | | | | | | | | | | | | DispenserPickCash != Dispenser: Pick failure - out of bills: April (17768.0/8163.0)
Number of Leaves : 19
Size of the tree : 37
Time taken to build model: 1.3 seconds
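Each leaf in the J48 output above ends with an annotation such as (294.0/179.0). In Weka's usual J48 text output, the first number is the (weighted) number of training instances that reach the leaf and the second is how many of them the leaf misclassifies; a single number such as (3.0) means none were misclassified. The Java sketch below, with hypothetical class and method names, parses that annotation from one line of the tree and derives the leaf's training accuracy. For example, the April leaf covers 17768 instances of which 8163 are misclassified, so 9605 are classified correctly.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class J48LeafStats {
    // Matches a trailing leaf annotation "(N)" or "(N/E)" at the end of a tree line.
    private static final Pattern LEAF =
            Pattern.compile("\\((\\d+(?:\\.\\d+)?)(?:/(\\d+(?:\\.\\d+)?))?\\)\\s*$");

    /** Returns {instancesAtLeaf, misclassified, leafAccuracy}, or null for a non-leaf line. */
    static double[] parse(String line) {
        Matcher m = LEAF.matcher(line);
        if (!m.find()) return null;
        double total = Double.parseDouble(m.group(1));
        double errors = m.group(2) == null ? 0.0 : Double.parseDouble(m.group(2));
        double accuracy = total == 0 ? 0.0 : (total - errors) / total;
        return new double[] {total, errors, accuracy};
    }

    public static void main(String[] args) {
        double[] s = parse("DispenserPickCash != Dispenser: Pick failure - out of bills: April (17768.0/8163.0)");
        System.out.println(s[0] + " instances, " + s[1] + " misclassified"); // 17768.0 instances, 8163.0 misclassified
    }
}
```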
CHAPTER FIVE
Research Model, Use of the Discovered Knowledge, Implementation and Evaluation
5.1 Research Model
The research is based on experimentation. According to the World Book Encyclopedia [15],
experimentation is a method used to discover facts and to test ideas. In this study, to identify the
reasons for ATMs being out of service, sixteen experiments were conducted with four classifier
algorithms, J48, PART, Naive Bayes (NB), and Multilayer Perceptron (MLP), using the default
cross-validation test design, and these experiments produced the models. The selected J48 model
was then used to put the discovered knowledge to use. The discovered knowledge revealed the
reasons for ATMs being out of service sought in this thesis, which were hidden in the timestamps
of ATM events. They are the following events of ATM components:
DispensoerDevices: 1) Device disconnected, 2) Device offline
DispenserCurrencyCassete: 1) Bill cassette Slot 4 is empty, 2) Retracted cards/bills, 3) Bill cassette 4 is empty, 4) Device CashOut Module offline
DispenserPickCash: 1) Presenter clamping mechanism failed or jammed, 2) Too many bills rejected, 3) Purge bin not present, 4) Pick failure - out of bills
CardReader: 1) Card Reader disconnected, 2) Shutter jammed open, 3) Blank track
Application: 1) Transaction failed: Unable to dispense cash, 2) Attempt to reboot via system, 3) Attempt to reboot via special electronic
SOP: 1) Operator switch changed to supervisor mode
Network ConnectionB24: 1) Device offline
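The cross-validation test design mentioned above splits the dataset into k folds; each fold serves once as the test set while the remaining folds form the training set, and the k results are averaged. Weka's Explorer defaults to 10 folds and stratifies them by class. The Java sketch below is a simplified, unstratified illustration of the fold assignment only; the class and method names are hypothetical and it is not Weka's own implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class KFold {
    /** Assigns each of n instance indices to one of k folds after a seeded shuffle. */
    static List<List<Integer>> folds(int n, int k, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) indices.add(i);
        // A fixed seed makes the split reproducible across experiment runs.
        Collections.shuffle(indices, new Random(seed));
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) folds.add(new ArrayList<>());
        // Round-robin dealing keeps fold sizes within one instance of each other.
        for (int i = 0; i < n; i++) folds.get(i % k).add(indices.get(i));
        return folds;
    }

    public static void main(String[] args) {
        List<List<Integer>> folds = folds(25, 10, 1L);
        for (int f = 0; f < folds.size(); f++) {
            // Fold f is the test set in round f; the other nine folds are the training set.
            System.out.println("round " + f + ": test size " + folds.get(f).size());
        }
    }
}
```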
Figure 5.1 shows the model of this research.
Figure 5.1 Research model for identifying the reasons for ATMs being out of service
Figure 5.1 shows a snapshot of the user interface, which displays one of the class-label
timestamps (January), its frequency, and the reasons for ATMs being out of service.
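The screen described above shows, for a chosen month label, how often it occurs and which reasons fall under it. A minimal Java sketch of that grouping follows; it assumes each event is a hypothetical {month, reason} pair (the prototype's real data structures are not described here), and the sample events in main are made up for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class MonthSummary {
    /** Counts events per month class label; each event is a {month, reason} pair. */
    static Map<String, Integer> frequency(String[][] events) {
        Map<String, Integer> freq = new TreeMap<>();
        for (String[] e : events) freq.merge(e[0], 1, Integer::sum);
        return freq;
    }

    /** Distinct out-of-service reasons recorded under one month label. */
    static Set<String> reasonsFor(String[][] events, String month) {
        Set<String> reasons = new TreeSet<>();
        for (String[] e : events) if (e[0].equals(month)) reasons.add(e[1]);
        return reasons;
    }

    public static void main(String[] args) {
        String[][] events = {
            {"January", "Card reader: Blank track"},
            {"April", "Dispenser: Pick failure - out of bills"},
            {"January", "Bill cassette 4 is empty"},
            {"January", "Card reader: Blank track"}
        };
        System.out.println(frequency(events));             // {April=1, January=3}
        System.out.println(reasonsFor(events, "January")); // [Bill cassette 4 is empty, Card reader: Blank track]
    }
}
```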
Figure 5.1 ATM out of service reason identifier system: prototype
The input for the ATM out of service reason identifier system comes from the Weka J48 classifier
output, which is the designed model. In other words, the model serves as the input to the prototype.
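One possible way to turn the J48 output into prototype logic is to transcribe each root-to-leaf path as a rule. The Java sketch below hand-encodes only the tail of the tree listed in chapter four; it assumes the ancestor conditions of that subtree (not shown in the listing) already hold, represents an event as a hypothetical attribute-to-value map, and treats a missing attribute as taking the "!=" branch. The class and method names are illustrative; how the actual prototype consumes the model is not specified in this chapter.

```java
import java.util.Map;

class J48TailRules {
    /**
     * Predicted month label for one event, following the visible tail of the tree.
     * Each "!=" branch falls through to the next test, so an if/return chain is equivalent.
     */
    static String classify(Map<String, String> event) {
        if (is(event, "Application", "Attempt to reboot via special electronic")) return "January";
        if (is(event, "DispenserCurrencyCassete", "Bill cassette 4 is empty")) return "January";
        if (is(event, "SOP", "Operator switch changed to supervisor mode")) return "January";
        if (is(event, "DispenserCurrencyCassete", "Device CashOut Module offline")) return "January";
        if (is(event, "CardReader", "Card reader: Shutter jammed open")) return "March";
        if (is(event, "CardReader", "Card reader: Blank track")) return "January";
        if (is(event, "DispenserPickCash", "Dispenser: Purge bin not present")) return "March";
        if (is(event, "DispenserPickCash", "Dispenser: Pick failure - out of bills")) return "March";
        return "April";
    }

    // True when the event has the given value for the attribute (missing means not equal).
    private static boolean is(Map<String, String> event, String attribute, String value) {
        return value.equals(event.get(attribute));
    }

    public static void main(String[] args) {
        System.out.println(classify(Map.of("CardReader", "Card reader: Blank track"))); // January
    }
}
```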
5.4 Evaluation
Seven ATM system administrators took part in the usability test, which evaluated the prototype
(developed in NetBeans IDE 8.2) in terms of effectiveness, efficiency, and user satisfaction,
following the ISO 9241-11 usability features [25]. Five of the seven system administrators are
senior staff in this area. The system administrators section is responsible for following up any
incidents on ATMs. Before the evaluation, a description of the prototype was given to the
evaluators.
A two-part guiding questionnaire (see APPENDIX A) was prepared to facilitate the experts'
responses. The first part of the questionnaire helped identify their experience with ATM out of
service reasons. The second part used a five-level Likert scale (strongly agree (5), agree (4),
neutral (3), disagree (2), and strongly disagree (1)) for the responses to the usability test
questions.
Table 5.1 below summarizes the responses of the system administrators.
Question  Strongly agree (5)  Agree (4)  Neutral (3)  Disagree (2)  Strongly disagree (1)  Mean
1         7                   0          0            0             0                      5
2         5                   2          0            0             0                      4.7
3         7                   0          0            0             0                      5
4         3                   4          0            0             0                      4.4
5         7                   0          0            0             0                      5
6         7                   0          0            0             0                      5
7         7                   0          0            0             0                      5
8         7                   0          0            0             0                      5
9         7                   0          0            0             0                      5
10        7                   0          0            0             0                      5
11        7                   0          0            0             0                      5
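The mean in table 5.1 is the weighted average of the Likert weights over the seven respondents. For question 2, for instance, (5 × 5 + 2 × 4) / 7 = 33 / 7 ≈ 4.71, reported as 4.7; for question 4, (3 × 5 + 4 × 4) / 7 = 31 / 7 ≈ 4.43, reported as 4.4. A small Java sketch of this calculation follows (the class name is illustrative).

```java
class LikertMean {
    /**
     * Weighted mean of one Likert item. counts[0..4] are the numbers of respondents choosing
     * strongly agree (5), agree (4), neutral (3), disagree (2), and strongly disagree (1).
     */
    static double mean(int[] counts) {
        int[] weights = {5, 4, 3, 2, 1};
        int respondents = 0;
        int weightedSum = 0;
        for (int i = 0; i < weights.length; i++) {
            respondents += counts[i];
            weightedSum += weights[i] * counts[i];
        }
        return (double) weightedSum / respondents;
    }

    public static void main(String[] args) {
        // Question 2 of table 5.1: five respondents strongly agree, two agree.
        double q2 = mean(new int[] {5, 2, 0, 0, 0});
        System.out.println(Math.round(q2 * 10) / 10.0); // 4.7
    }
}
```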
CHAPTER SIX
Conclusions and Recommendation
6.1 Conclusions
ATM services are part of our day-to-day life. Many locations, such as hotels, shops, bank
branches, universities, and hospitals, host ATMs. These ATMs are intended to provide service 24
hours a day, seven days a week. However, ATM customers commonly see an out of service message
on the display of these self-service machines. This study set out to find the reasons for ATMs
being out of service.
The researcher considered Dashen Bank S.C. ATMs to answer why ATMs are out of service. The
approach followed was data mining, applying the six-step hybrid knowledge discovery process
(KDP) model. One of the crucial steps in the KDP is data mining (DM), which makes it possible to
design a model; the researcher used the Weka knowledge discovery tool to identify the causes of
ATMs being out of service. The DM objective was accomplished through 16 experiments, followed
by evaluation using appropriate metrics (accuracy, PR, ROC, and AUROC), which showed that the
experiments were worthwhile.
First, the researcher, together with a domain expert, studied the event data to understand it and
to prepare a dataset for the data mining task. To identify out of service event data, 14 ATMs were
purposely selected for the period December 31, 2017 to May 04, 2018. From the dataset collected
from the ProView console, 44 events describing out of service occurrences were identified as
relevant to the objective of this research; 73% of them were hardware problems and 27% were
software problems. Next, the dataset was prepared to make it suitable for DM. Data preparation
included constructing new attributes from the existing dataset, removing unnecessary attributes
and inconsistent values, and aggregating attribute values.
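Constructing new attributes can be illustrated with the class label itself: the month (January, March, or April) is derived from each event's timestamp. The Java sketch below is a minimal illustration that assumes a "yyyy-MM-dd HH:mm:ss" timestamp string; the actual ProView export format is not given here and may differ, and the class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.TextStyle;
import java.util.Locale;

class MonthAttribute {
    // Assumed timestamp layout; adjust to the real export format.
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Derives the nominal month attribute (e.g. "April") from an event timestamp. */
    static String monthLabel(String timestamp) {
        LocalDateTime time = LocalDateTime.parse(timestamp, FORMAT);
        return time.getMonth().getDisplayName(TextStyle.FULL, Locale.ENGLISH);
    }

    public static void main(String[] args) {
        System.out.println(monthLabel("2018-04-12 08:30:00")); // April
    }
}
```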
After the completion of DM, it was found that the basic reasons for ATMs being out of service were
the following non-functioning ATM components and events, which were hidden in the timestamps
of ATM events:
DispensoerDevices: 1) Device disconnected, 2) Device offline
DispenserCurrencyCassete: 1) Bill cassette Slot 4 is empty, 2) Retracted cards/bills, 3) Bill cassette 4 is empty, 4) Device CashOut Module offline
DispenserPickCash: 1) Presenter clamping mechanism failed or jammed, 2) Too many bills rejected, 3) Purge bin not present, 4) Pick failure - out of bills
CardReader: 1) Card Reader disconnected, 2) Shutter jammed open, 3) Blank track
Application: 1) Transaction failed: Unable to dispense cash, 2) Attempt to reboot via system, 3) Attempt to reboot via special electronic
SOP: 1) Operator switch changed to supervisor mode
Network ConnectionB24: 1) Device offline
The findings show that 87% of ATM out of service events occurred in April, 10.7% in January, and
2.2% in March. The major problem (87%) was the hardware event DispenserPickCash "Dispenser:
Pick failure - out of bills", followed by CardReader, Application, SOP, and Network
ConnectionB24.
The final step of the hybrid KDP is use of the discovered knowledge. To use this knowledge, the
J48 model was used to produce a prototype, implemented in Java on the NetBeans IDE 8.2 platform.
The result is an ATM out of service reason identifier system that lets users easily identify the
reasons for ATMs being out of service.
The best classification algorithm selected was J48. Its accuracy was 52.7059%, and its PR curve
showed good performance, starting at a recall of 1 (x-axis) with a precision of 0.5 (y-axis). The
ROC curve is bow-shaped and reaches the point (1, 1), and the area under the ROC curve is 0.6161,
which is above the 0.5 expected of a random classifier.
The model designed in this study identifies the reasons for ATMs being out of service and
classifies them. Classifying these reasons supports ATM supervisors, increases availability, and
improves the service. Achieving higher availability of ATM components and service requires
further research.
6.2 Recommendation
High-quality ATM service is essential for customer satisfaction and for a better return on
investment. The banking industry is responsible for maximizing the in-service time of ATMs. This
study looked closely at Dashen Bank ATMs in order to identify the reasons for ATMs being out of
service. Based on the findings of the study, the following recommendations are put forward.
In this study, an attempt was made to identify the causes of ATMs being out of service using the
scarce data captured by ProView. Preparing enough good-quality data should be considered in
future studies to simplify the data mining task.
In this study, an attempt was made to use the discovered knowledge to identify the reasons for
ATMs being out of service. We recommend integrating the data mining results with ProView so
that the reasons are detected automatically and immediate corrective action can be taken.
Further research on ATMs could predict the time to failure of hardware parts such as the card
reader, the dispenser, and all their subcomponents, in order to classify ATM out of service events
by time.
Keeping currency cassettes from running empty increases ATM availability; constructing a
predictive model that proactively prevents ATMs from running out of cash is therefore left for
further research.
Reference
[1] Cios et al., Data Mining A Knowledge Discovery Approach, New York: Springer, 2007.
[2] I. H. Witten and E. Frank, Data Mining: Practical machine learning tools and techniques, San
Francisco: Morgan Kaufmann, 2016.
[3] J. Han and M. Kamber, Data mining: concepts and techniques, San Francisco: Morgan
Kaufmann, 2006.
[4] M. H. Dunham, Data mining: Introductory and advanced topics, NJ: Pearson Education,
2006.
[5] Dashen Bank, "Company profile," Dashen Bank, 4 September 2018. [Online]. Available:
https://dashenbanksc.com. [Accessed 4 September 2018].
[6] Dashen Bank, "21st Annual Report for the year," Dashen Bank, Addis Ababa, 2017.
[7] W. Gardachew, "Electronic-banking in Ethiopia-practices, opportunities and challenges,"
Journal of internet Banking and commerce, vol. 15, no. 2, pp. 1-8, 2010.
[8] Wincor Nixdorf, Administration & System Management ProView Reporting API,
Paderborn: Wincor Nixdorf, 2015.
[9] Dashen Bank, Risk Management policy Manual, Addis Ababa: Dashen Bank S.C., 2017.
[10] Dashen Bank, "Internal Audit Policy," Internal Audit Policy, pp. 10-15, 15 February
2017.
[11] T. Mahmood and G. M. Shaikh, "Adaptive Automated Teller Machines," Expert Systems
with Applications, vol. 40, pp. 1152-1169, 2013.
[12] Gümüş et al., "Ultimate Point in the Service Provided by the Banks to Their Customers:
Customer Satisfaction in the Common Use of ATMs," Social and Behavioral Sciences, vol. 207,
pp. 98-110, 2015.
[13] S. Madhavi, S. Abirami, C. Bharathi, B. Ekambaram, T. Krishna Sankar, A. Nattudurai,
N. Vijayarangan, "ATM Service Analysis Using Predictive Data Mining," International Journal
of Computer, Information, Systems and Control Engineering, vol. 8, no. 2, 2014.
[14] WordWeb software, "WordWeb 8.03a," Princeton University, 2016.
[15] Field enterprises Education Corporation, The world book encyclopedia Volume 6, USA
FGA: Field enterprises Education Corporation, 1967.
[16] Wincor Nixdorf, ProView Console 4.2/40, Paderborn: Wincor Nixdorf, 2015.
[17] Wincor Nixdorf, ProView Data Model 4.2/40(C), Paderborn: Wincor Nixdorf, 2015.
[18] P. Bastos et al., "A Maintenance Prediction System using Data Mining Techniques,"
Proceedings of the World Congress on Engineering, vol. III, pp. 1148-1453, 2012.
[19] Two Crows Corporation, Introduction to data mining and knowledge, Potomac: Two
Crows Corporation, 2005.
[20] G. Tewary, "Data Mining Through Neural Networks Using Recurrent Network," itccma,
pp. 57-74, 2015.
[21] Fayyad et al., "From Data Mining to Knowledge Discovery in Databases," AI Magazine,
vol. 17, no. 3, 1996.
[22] Transaction Network Services, What Payments Trends Should You follow in 2018?,
Virginia: Transaction Network Services, 2017.
[23] INETCO, Unlocking Your ATM “Big Data”: Understanding the, Burnaby: INETCO,
2015.
[24] Wincor Nixdorf, ProView Data Model 4.2/40(C), Paderborn: Wincor Nixdorf, 2015.
[25] ISO, "ISO Online Browsing platform," The International Organization for
Standardization, 26 January 2018. [Online]. Available: https://www.iso.org/obp/ui/#search.
[Accessed 26 January 2018].
[26] NetBeans, "NetBeans," NetBeans, 26 January 2018. [Online]. Available:
https://netbeans.org/. [Accessed 26 January 2018].
[27] Cabena et al., Data Mining: From Concepts to Implementation, New Jersey: Prentice Hall
Saddle River, 1998.
[28] Farooqi et al., "Effectiveness of Data mining in Banking Industry: An empirical study,"
International Journal of Advanced Research in Computer Science, vol. 8, no. 5, pp. 827-830,
2017.
[29] C. Mwatsika, "Customers’ satisfaction with ATM banking in Malawi," African Journal of
Business Management, pp. 218-227, 2014.
[30] Chan et al., "Modified automatic teller machine prototype for older adults: A case study
of," Applied Ergonomics, no. 40, pp. 151-160, 2009.
[31] Moslehi et al., "Analyzing and Investigating the Use of Electronic Payment Tools in
Iran," Journal of AI and Data Mining, pp. 417-437, 2018.
[32] Electro Magnetic Components Inc., "ATMs: How They Work and Basic ATM Parts,"
Electro Magnetic Components Inc., 2015. [Online]. Available:
http://www.atmparts.net/atm-parts/. [Accessed 28 February 2019].
APPENDIX
ATM OUT OF SERVICE REASON IDENTIFIER SYSTEM
Usability Testing Questionnaire (Users: - System Administrator)
This questionnaire is intended to assess administrators' knowledge of ATM out of service reasons
based on the existing systems ProView, B24, NetMon, and others.
3. Which ATM monitoring system tool do you have experience with for identifying ATM status?
__________________________
4. Are you familiar with the ATM out of service reasons? ☐ Yes ☐ No
II. Prototype
The following items relate to usability testing of the ATM out of service reason identifier
system. Please indicate your agreement by ticking the appropriate box.
No.  Question  Strongly agree  Agree  Neutral  Disagree  Strongly disagree
1  Do you think classifying ATM out of service events by month timestamp in the prototype is essential?
2  Are you satisfied with events being labeled or classified by the timestamp of the three months (January, March, and April)?
3  Do you think the ATM out of service reason identifier system prototype can be used easily?
4  Do you think the system is a good enough product?
5  Do you think the response time for most operations is fast enough?
6  Do you think the cost of the ATM out of service reason identifier system is lower than that of any real-time monitoring system?
7  Do you think the system can be used by any user with basic knowledge of computers?
8  Do you think the text that appears on the pages is clearly readable throughout operation of the system?
9  Do you think the interaction needed to accomplish tasks is simple and can be completed within a few seconds?
10  Do you think the menu items are consistently located and work without failure?
11  Is the ATM out of service reason identifier system easy to use for viewing, editing, and copying?
Please write any other comment about the ATM out of service reason identifier system:
______________________________________________________________________________
______________________________________________________________________________
________ ________________________________________________________________