1 Dashen - ATM - Fasika Wondimu 2019

This thesis examines designing a model to identify reasons for ATMs being out of service at Dashen Bank in Ethiopia. The author, Fasika Wondimu, conducted research under the supervision of Dr. Million Meshesha at Addis Ababa University's School of Information Sciences to fulfill the requirements for a Master of Science degree in Information Science. Various data mining techniques were applied to ATM log data from Dashen Bank to build predictive models for identifying the most common reasons for ATMs being out of service. The best performing models were found to be decision tree and Naive Bayes classifiers.


Addis Ababa University

College of Natural Sciences


School of Information Sciences

Designing a model for identifying the reason for ATM out of service:
The case of Dashen bank

By

Fasika Wondimu

A Thesis Submitted to the School of Graduate Studies of Addis

Ababa University in Partial Fulfillment of the Requirements for the

Degree of Master of Science in Information Science

12 February 2019

Addis Ababa, Ethiopia


Addis Ababa University
College of Natural Sciences
School of Information Sciences

Designing a model for identifying the reason for ATM out of service:
The case of Dashen bank

By

Fasika Wondimu

Advisor

Dr. Million Meshesha

12 February 2019

Addis Ababa, Ethiopia


Addis Ababa University
College of Natural Sciences
School of Information Sciences

Designing a model for identifying the reason for ATM out of service:
The case of Dashen bank

By

Fasika Wondimu

Name and Signature of Members of the Examining Board

Name Title Signature

__________________________________ Advisor ______________________________

_________________________________ Examiner ______________________________

_________________________________ Examiner______________________________
ACKNOWLEDGEMENT
First, I give glory to God, who is above all and gave me the faith and strength for this thesis work.

Second, I thank my beloved and encouraging advisor, Dr. Million Meshesha, who worked hard with me on this thesis.

My special thanks go to my classmate Berihun Hadis, who supported me in the implementation of this thesis.

Thank you to the Dashen bank staff members Asnake Kebede and Metasebiya Ayisow, who helped me during data collection.

Lastly, my thanks go to Dr. Tibebe Besha for his kind encouragement.
DEDICATION
I dedicate this thesis to my spouse, Bethlehem Abera, and my son, Joshua Fasika.
Table of Contents
LIST OF FIGURES ........................................................................................................................................... 8
LIST OF TABLES .............................................................................................................................................9
LIST OF ACRONYMS.....................................................................................................................................10
ABSTRACT ................................................................................................................................................... 12
CHAPTER ONE............................................................................................................................................... 1
1.1 Background ................................................................................................................................... 1
1.2 Dashen bank and its services ....................................................................................................... 3
1.3 Statement of the problem ........................................................................................................... 5
1.4 Objective of the study .................................................................................................................. 9
1.4.1 General objective ................................................................................................................. 9
1.4.2 Specific objectives ................................................................................................................ 9
1.5 Scope and limitation of the study................................................................................................ 9
1.6 Significance of the study ............................................................................................................ 10
1.7 Methodology of the study ......................................................................................................... 11
1.7.1 Research design .................................................................................................................. 11
1.7.2 Understanding of the Problem .......................................................................................... 11
1.7.3 Understanding of the Data................................................................................................. 12
1.7.4 Preparation of the data ...................................................................................................... 13
1.7.5 Data mining for predictive modeling ................................................................................. 13
1.7.6 Evaluation of the Discovered Knowledge .......................................................................... 14
1.7.7 Use of the Discovered Knowledge ..................................................................................... 16
1.8 Operational definition................................................................................................................ 16
1.9 Organization of the research ..................................................................................................... 16
CHAPTER TWO ............................................................................................................................................ 18
2.1 Overview of data mining ............................................................................................................ 18
2.2 Data mining tasks ....................................................................................................................... 20
2.3 DM process models .................................................................................................................... 29
2.3.1 Academic Research Models ............................................................................................... 29
2.3.2 Industrial Model ................................................................................................................. 30
2.3.3 Hybrid model ...................................................................................................................... 31
2.4 Data mining applications ........................................................................................................... 35
2.5 Related works ............................................................................................................................. 37
2.6 Related works of ATM Banking service ..................................................................................... 41
CHAPTER THREE.......................................................................................................................................... 45
3.1 Understanding of the problem domain ..................................................................................... 45
3.2 Understanding of the data ......................................................................................................... 50
3.3 Data Preparation ........................................................................................................................ 56
CHAPTER FOUR ........................................................................................................................................... 63
4.1 Building the model ..................................................................................................................... 63
4.2 Experimental result using J48 decision tree .............................................................................. 63
4.3 Experiment result using PART .................................................................................................... 65
4.4 Experimental results of Naïve Bayes ......................................................................................... 66
4.5 Experimental result of multilayer perceptron ........................................................... 67
4.6 Evaluation ................................................................................................................................... 68
CHAPTER FIVE ............................................................................................................................................. 73
5.1 Research Model .......................................................................................................................... 73
5.2 The System Development .......................................................................................................... 74
5.3 The Prototype ............................................................................................................................. 74
5.4 Evaluation ................................................................................................................................... 75
5.5 Analysis and interpretation ....................................................................................................... 76
CHAPTER SIX ............................................................................................................................................... 77
6.1 Conclusions ................................................................................................................................. 77
6.2 Recommendation ....................................................................................................................... 79
Reference.................................................................................................................................................... 80
APPENDIX ................................................................................................................................................... 82
LIST OF FIGURES
Figure 2.1 Data mining models and tasks [4] ............................................................................... 20
Figure 2.2 ANN model [1] ........................................................................................................... 26
Figure 2.3 Backpropagation learning process [3] ......................................................... 27
Figure 2.4 Graphical representation of sigmoid function [3] ....................................................... 28
Figure 2.5 Sequential structure of the KDP model [1] ................................................................. 29
Figure 2.7 The six-step KDP model [1] ........................................................................................ 32
Figure 3.1 Event table on Database Server ................................................................................... 51
Figure 3.2 ProView Monitoring Consoles .................................................................................... 52
Figure 3.4 Event message out of service ...................................................................................... 59
Figure 3.5 The class event time stamp 2018 and its values ........................................... 62
Figure 4.1 precision-recall curve of class April ............................................................................ 69
Figure 5.1 ATM out of service reason identifier system: - Prototype .......................................... 75
LIST OF TABLES
Table 1.1 ATM availability report, from Jan. 1 – Dec. 31, 2017 ................................................... 6
Table 1.2 Binary classes of confusion matrix ............................................................................... 15
Table 3.1 Event content and description ....................................................................................... 48
Table 3.2 Event table from ProView Data Model 4.2/40 (C) Wincor Nixdorf International 2015 ... 50
Table 3.3 Event table on ProView Monitoring Console ............................................................... 53
Table 3.4 Description of event table on ProView Monitoring Console ....................................... 54
Table 3.5 Size of ATM instances and location ............................................................................. 55
Table 3.6 Selected dataset attributes ............................................................................................. 56
Table 3.7 New derived attributes and descriptions. ...................................................................... 57
Table 3.8 Data transformation of Time stamp .............................................................................. 60
Table 3.9 List of attribute used for model building ...................................................................... 60
Table 4.1 Experimental result of J48 decision tree algorithm ...................................................... 64
Table 4.2 Confusion matrix for the result of J48 decision tree ..................................................... 64
Table 4.3 Experimental result of PART algorithm ....................................................................... 65
Table 4.4 Confusion matrix for the result of PART rule induction .............................................. 65
Table 4.5 The result of Naïve Bayes algorithm accuracy ............................................................. 66
Table 4.6 Confusion matrix result of Naïve Bayes ....................................................................... 66
Table 4.7 Experimental result of multilayer perceptron ............................................... 67
Table 4.8 Confusion matrix result of multilayer perceptron ......................................... 68
Table 4.9 Summarized classifier output........................................................................................ 69
Table 5.1 Detailed summary of questionnaire result .................................................................... 76
LIST OF ACRONYMS
ANN Artificial Neural Network
ATM Automatic Teller Machine
B24 Base 24
BI Business Intelligence
CPO Cashier Payment Order
CSV Comma separated value
Dashen bank S.C. Dashen bank Share Company
DM Data Mining
E-banking Electronic Banking
EPP Encrypting PIN Pad
IDS Intrusion detection system
KDD Knowledge Discovery in Databases
KDP Knowledge Discovery Processes
ML Machine Learning
MS Microsoft
MLP Multilayer Perceptron
MSE Mean Square Error
NB Naive Bayes
NGO Non-Governmental Organization
PIN Personal Identification Number
POS Point of Sales
ProM Process Miner
QoS Quality of Service
RDBMS Relational database management system
ROC Receiver Operating Characteristic
AUROC Area Under the Receiver Operating Characteristics
SOP Supervisor of Proview
UT Usability Testing
VB Visual Basic
WEKA Waikato Environment for Knowledge Analysis
ABSTRACT
Ideally, an ATM is expected to be in service 24 hours a day, 7 days a week, and 365 days a year; in practice, however, these self-service machines are frequently seen out of service. This study therefore aims to find the reasons for ATMs being out of service, taking Dashen bank ATMs as a case.

The research follows a hybrid knowledge discovery process (KDP) model with six steps. The first step is understanding the problem, which helps to describe the problem and identify attributes for data collection from 14 purposely selected ATMs over the period December 31, 2017 – May 04, 2018. This is followed by understanding the data and preparing the dataset for the data mining tasks. Data preparation includes constructing new attributes from the existing dataset and removing unnecessary attributes and inconsistent values. In line with the objective of this research, 44 events describing out-of-service occurrences were identified from the dataset collected from the ProView real-time ATM monitoring console. The crucial step in the KDP is data mining, which enables the design of a predictive model. In this study, the WEKA knowledge discovery tool is used to identify the reasons for ATMs being out of service.

Experimental analysis shows that the J48 decision tree registers the best result, with an accuracy of 52.7059%. Finally, using the best predictive model, the J48 decision tree, we designed a prototype on the NetBeans IDE 8.2 platform. The prototype was evaluated by domain experts against ISO 9241-11 usability testing (UT) criteria. According to the UT test, the prototype was effective, efficient and satisfying to users. However, automatic integration of the model with ProView would help to detect and solve problems immediately, enhancing ATM in-service time.
CHAPTER ONE
INTRODUCTION
1.1 Background
Financial institutions such as banks continually generate a variety of enormous data. The accumulated data does not make any sense unless it becomes valuable. Enterprises analyze their historical data with appropriate technologies for business intelligence. The technology applicable for business intelligence is data mining, the goal of which is to help companies gain competitive advantage and hence become profitable [1].

One of the important bank services nowadays is the ATM. Withdrawal in particular has become the most common feature of electronic banking on ATMs. Government, public-sector businesses and others increasingly choose payments to be made electronically using ATMs. Banks manage, control and provide secure electronic payment for their customers' satisfaction and in turn work hard to gain profit. Customers who have a plastic card and a personal identification number (PIN) can withdraw, make deposits or transfer funds between accounts wherever an ATM is located, depending on the bank's products [12].

ATMs are a data source just like the core bank system and produce financial transactions. Both ATMs, as self-service machines, and the core bank system operated by bankers in brick-and-mortar branches generate massive data. Because of this, data is growing alarmingly in size, in speed of generation and in variety of data types. This makes data analysis a challenging task for human experts and necessitates the introduction of automatic data analysis techniques, called data mining. The aim of data mining is to make sense of large amounts of structured, unstructured, and semi-structured data available in a given domain [1].

According to Witten et al. [2], data mining is the extraction of implicit, previously unknown, and potentially useful information from large amounts of data that can be critical for problem solving and decision making.

Data mining is a technology that uses various classification, clustering and association rule discovery techniques to extract hidden knowledge from heterogeneous and distributed historical data stored in large databases, data warehouses and other massive information repositories, so as to find patterns in data that are [3]:

 valid: not only represent the current state, but also hold on new data with some certainty
 novel: non-obvious to the system, generated as new facts
 useful: it should be possible to act on the item or problem
 understandable: humans should be able to interpret the pattern

According to Dunham [4], data mining tasks are classified as predictive and descriptive. Predictive modeling is supervised learning, because the class label is identified before data analysis. It involves two processes: the first finds the best-fit model based on the classification algorithm; the model is then used to determine the class of new records and instances.

There are many kinds of classification algorithms available for predictive modeling. Some of the well-known ones are the Naïve Bayes algorithm, neural networks, and decision trees [1][3]. Each algorithm has its own strengths, and there are specific conditions under which each is best suited for the data mining purpose, depending on fit and application.
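As a hedged illustration of the two processes described above, the sketch below implements a miniature categorical Naïve Bayes classifier (one of the algorithms just named) in plain Python. This is not the thesis implementation, which uses WEKA; the attribute names (power_state, network_state) and the records are invented for demonstration only.

```python
# Illustrative sketch only -- NOT the thesis implementation (which uses WEKA).
# A miniature categorical Naive Bayes classifier showing the two processes of
# predictive modeling: (1) fit a model on labeled records, (2) use it to
# classify new records. Attribute values and records below are hypothetical.
import math
from collections import Counter, defaultdict

class TinyNaiveBayes:
    def fit(self, records, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # attr_counts[class][attr_index][value] -> frequency in training data
        self.attr_counts = defaultdict(lambda: defaultdict(Counter))
        for rec, lab in zip(records, labels):
            for i, value in enumerate(rec):
                self.attr_counts[lab][i][value] += 1
        return self

    def predict(self, rec):
        best_label, best_score = None, float("-inf")
        for lab, count in self.class_counts.items():
            # log P(class) + sum of log P(attr=value | class), Laplace-smoothed
            score = math.log(count / self.total)
            for i, value in enumerate(rec):
                num = self.attr_counts[lab][i][value] + 1
                den = count + len(self.attr_counts[lab][i]) + 1
                score += math.log(num / den)
            if score > best_score:
                best_label, best_score = lab, score
        return best_label

# Hypothetical ATM event records: (power_state, network_state) -> reason class
records = [("ok", "down"), ("ok", "down"), ("fail", "ok"), ("ok", "ok")]
labels = ["network", "network", "hardware", "in_service"]
model = TinyNaiveBayes().fit(records, labels)
print(model.predict(("ok", "down")))  # prints "network"
```

The first phase (fit) estimates class and attribute-value frequencies from labeled data; the second phase (predict) applies those estimates to an unseen record, which is exactly the two-step structure described by Dunham.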

Descriptive mining tasks characterize the general properties of records in the data repository [3]. Clustering, summarization, association rules, and sequence discovery are considered descriptive. Descriptive mining is considered unsupervised learning; unlike predictive mining, no class label is considered in a descriptive mining task.
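To illustrate one such unsupervised task, clustering, the following sketch runs a minimal one-dimensional k-means over hypothetical daily out-of-service durations. The data, the initialization scheme and the function itself are assumptions for demonstration; no class labels are supplied, and the grouping is discovered from the values alone.

```python
# Illustrative sketch only: clustering, a descriptive (unsupervised) task.
# A minimal 1-D k-means groups hypothetical daily out-of-service minutes
# without any class labels; data and initialization are assumptions.
def kmeans_1d(values, k, iters=20):
    # spread the initial centers across the sorted values
    step = max(1, len(values) // k)
    centers = sorted(values)[::step][:k]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # move each center to the mean of its cluster (keep it if empty)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

minutes = [5, 7, 6, 120, 130, 125, 60, 55]
centers, clusters = kmeans_1d(minutes, k=3)
```

Short outages, medium outages and long outages end up in separate groups without anyone telling the algorithm what "short" or "long" means, which is the defining property of a descriptive task.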

Data mining is a multidisciplinary science reaching a vast array of areas, such as biomedical science, commerce and financial institutions, telecommunications, and the retail industry. Even though data mining is applicable in so many fields, Han and Kamber [3] pointed out that it is not free from challenges regarding mining methodology, user interaction, performance, and diverse data types with poor quality.

Data is often incomplete and noisy; therefore, proper data preprocessing methods are required to produce cleaned data, remove noise and incompleteness, and achieve the objective of data mining.

A performance challenge in data mining is the efficiency and scalability of data mining algorithms. The running time of a data mining algorithm must be predictable and acceptable on large databases [3].

Presentation and visualization of data mining results is another requirement. The extracted knowledge needs to be presented so that it is easily understood and directly usable by humans. Therefore, the presentation should use trees, tables, rules, graphs, charts, crosstabs, matrices, or curves.

Another challenge is the requirement to develop numerous data mining techniques, including data characterization, discrimination, association and correlation analysis, classification, prediction, clustering, outlier analysis, and evolution analysis.

The need for background knowledge of the specific area is also an issue. Understanding the domain is therefore key, because a domain expert can easily interpret the extracted knowledge.

1.2 Dashen bank and its services


Dashen bank is one of the foremost banks in the Ethiopian financial industry. It was established by eleven visionary shareholders and veteran bankers with an initial capital of Birr 14.9 million in September 1995 [5]. As of 2017, it operates through a network of more than 370 branches, ten dedicated forex bureaus, 305 ATMs and more than 812 point-of-sale (POS) terminals spread across the length and breadth of the nation [5]. It has established correspondent banking relationships with 462 banks covering 70 countries and 170 cities across the world [6]. The bank is also at the forefront of introducing up-to-date technologies because of its mission: "Provide efficient and customer focused domestic and international banking services by overcoming the continuous challenges for excellence through the application of appropriate technology" [6]. According to Gardachew [7], Dashen bank is a leader in introducing E-banking services in Ethiopia.

ATM use for cash withdrawal increases from time to time. Dashen bank's 2017 annual report describes the expansion of ATM card banking, which grew by 28%, with 123,198 new customers joining the card banking service; there are now a total of 556,688 card holders, 305 ATMs and 837 POS terminals (used for cash transfer) [6]. These ATMs and POS terminals also accept international cards, including Visa, MasterCard, UnionPay, and American Express [6]. These services are found in different locations such as hotels and resorts, universities, hospitals, tour and travel agencies, gallery and jewelry shops, cafés and restaurants, fuel stations, supermarkets, and malls, among others.

Different Dashen Bank S.C. departments work cooperatively to keep in-service time on ATMs high. Ideally, customers who have a plastic card and a PIN can withdraw cash or transfer funds between accounts at any time, wherever ATMs/POSs are located.

The concern of this study is to apply data mining to the accumulated ATM data so as to improve the ATM card banking services of Dashen bank. Both the hardware and software components of the self-service ATM are expected to perform their intended tasks effectively and efficiently, and to communicate every event to remote management continuously and in real time.

Application areas of data mining are enormous. According to Han and Kamber [3], DM is applied in financial analysis, telecommunications, biomedicine and science, counterterrorism (including and beyond intrusion detection) and mobile (wireless) data. Financial data collected in the banking and financial industries are often relatively complete, reliable, and of high quality, which facilitates systematic data analysis and data mining [3]. DM in the banking industry is applied for loan payment prediction and customer credit policy analysis, detection of money laundering and other financial crimes, classification and clustering of customers for targeted marketing, and also to design and construct data warehouses of multidimensional data to find general properties of the data [3].

1.3 Statement of the problem
One of the technological advancements in banking services is the introduction of the ATM, which makes work easier and more effective. Ideally, ATM service should be available 24 hours a day, 7 days a week, and 365 days a year. However, the availability report of Dashen bank S.C.'s ProView ATM system management and administration tool [8] presents the in-service time, the out-of-service time, and the variables that make ATMs out of service. As presented in Table 1.1 below, the out-of-service share of 12 selected ATMs ranges from 0.37% to 20.12%. The report also points out that the causes of out-of-service time are hardware faults, cash dispensing, daily operations, B24 network issues, and net connectivity. Of these causes, daily operations, B24 network issues, and ProView network connectivity are the dominant factors for ATMs being out of service.

Table 1.1 ATM availability partial report, from Jan. 1 – Dec. 31, 2017

ATM Name          | In Service | Out-of-Service | Hardware Faults | Cash Dispensing | Daily Operations | Network Issue B24 | Net Connectivity of ProView
PATML003_Hilton   | 89.63%  | 10.37%  | 0.75%  | 3.79%  | 3.23%  | 5.77%  | 2.02%
PATML004_ADAMS    | 80.25%  | 19.75%  | 0.20%  | 0.00%  | 5.31%  | 2.48%  | 13.89%
PATML005_SHEBELLE | 79.88%  | 20.12%  | 0.00%  | 0.00%  | 2.74%  | 11.80% | 5.64%
PATML006_ETHIOP   | 87.77%  | 12.23%  | 2.23%  | 1.82%  | 4.16%  | 3.28%  | 3.74%
PATML007_DHGEDA   | 84.97%  | 15.03%  | 1.53%  | 3.43%  | 6.36%  | 9.00%  | 1.26%
PATML009_LUCY     | 86.99%  | 13.01%  | 2.71%  | 2.72%  | 5.30%  | 5.97%  | 1.49%
PATML012_RASHOT   | 89.11%  | 10.89%  | 1.59%  | 1.74%  | 3.14%  | 4.19%  | 2.22%
PATML013_MAIN     | 95.92%  | 4.08%   | 0.02%  | 0.00%  | 0.00%  | 0.00%  | 3.59%
PATML019_TK       | 82.02%  | 17.98%  | 1.58%  | 2.11%  | 7.94%  | 7.52%  | 2.55%
PATML022_HARMONY  | 82.80%  | 17.20%  | 0.10%  | 1.66%  | 14.10% | 9.46%  | 0.07%
PATML029_USEMBA   | 99.63%  | 0.37%   | 0.20%  | 0.00%  | 0.04%  | 0.00%  | 0.00%
PATML029_USEMBA   | 90.10%  | 9.90%   | 4.51%  | 1.16%  | 1.99%  | 2.91%  | 1.58%
SUM               | 1049.07 | 150.93  | 15.42  | 18.43  | 54.31  | 62.38  | 38.05
AVERAGE           | 87.43%  | 12.58%  | 1.29%  | 1.54%  | 4.53%  | 5.20%  | 3.17%

As shown in Table 1.1, the ATMs are out of service on average 12.58% of the time. In principle, an ATM out-of-service rate greater than 10% greatly affects the business [9][10]. Dashen bank's ATMs thus show a 2.58 percentage point excess of out-of-service time, which needs close attention. Among the reasons for out-of-service time, the B24 network issue is the dominant factor, contributing 5.20%. This is followed by daily operations and net connectivity, at 4.53% and 3.17% respectively. Hardware faults and cash dispensing cases also contribute to ATMs being out of service.
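The averages discussed above can be reproduced directly from the per-ATM figures; a quick sanity check in Python, with the out-of-service column copied from Table 1.1:

```python
# Sanity check of Table 1.1: the mean out-of-service share across the 12
# listed ATMs, with the per-ATM figures copied from the table above.
out_of_service = [10.37, 19.75, 20.12, 12.23, 15.03, 13.01,
                  10.89, 4.08, 17.98, 17.20, 0.37, 9.90]
mean = sum(out_of_service) / len(out_of_service)
print(f"{sum(out_of_service):.2f}  {mean:.2f}")  # 150.93  12.58
```

The computed sum and mean match the SUM (150.93) and AVERAGE (12.58%) rows of the table.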

It is clear that too much ATM out-of-service time is a risk for the business. The bank's target is to keep out-of-service time below 10%. To this end, all ups and downs of the ATMs are monitored 24 hours a day from headquarters for remedial action. Even when an ATM is up, there is a need to check the proper working of machine components such as the cash dispenser, the various sensors, the cash cassettes, the application software and the operating system, to determine whether E-payment is possible.

Solutions for ATM problems are always delayed. Even the simplest scenario, replenishing money for an empty cassette in a shopping center or hotel ATM, involves a delay of 30–45 minutes because of traffic jams in Addis Ababa. The fastest measure taken to minimize out-of-service time is manual observation, by calling the headquarters call center and by real-time monitoring using Base 24¹ and the ProView monitoring console. The call center forwards the incident to the responsible section. Communication to report failures and to bring an ATM back online and operational might not be smooth; it can take not a few minutes but even a day, because most ATMs located in hotels, shopping centers, and universities are run by headquarters. Observation- and diagnostics-based solutions for all ATM areas are delayed and are not automatic. On the other hand, preventive maintenance is ineffective, because the maintenance schedule may not address some of the out-of-service variables, such as power, network and application.

The ProView monitoring system by itself is not a complete product. First, the data generated from ProView is incomplete; second, the availability of the application itself is an issue; and third, system product auditing, predictive maintenance, and the configurations are untouched and produce stale reports.

There is no proactive predictive detection or preventive maintenance for ATMs going out of service. This means that corrective action is taken after a problem happens, which takes more time, effort and cost. This results in customers' dissatisfaction with the ATM service, and the bank also loses revenue. For the bank, analyzing such a problem and finding a solution besides the routine work has become a critical task. The application of DM technology enables the design of a model that identifies the causes of ATMs being out of service, so that corrective action can be scheduled accordingly. On the other hand, the ATM event data, including transactions, is so big that it cannot be analyzed with ordinary descriptive statistics. Moreover, the accuracy and visualization of data mining are more powerful than those of descriptive statistics.

¹ Base 24 is application software used to manage ATMs.

There are limited studies that apply data mining to ATM service [13]. These studies focus firstly on the ATM operation menu, so as to minimize customers' waiting time in the queue [11]. The study conducted by Gümüş et al. [12] considers customer satisfaction in the common use of ATMs, with the purpose of identifying the satisfaction levels of common ATM users. Madhavi et al. [13] attempt to explore an ATM transaction dataset so as to pinpoint several downsides in its service, to predict the ATM usage level, identify the peak time of an ATM in a day/month, and spot any ATM transaction.

There is no local or international study that attempts to investigate the reasons for ATM out of service using data mining techniques.

It is therefore the aim of this study to explore the reasons for ATM out of service using data mining classification algorithms. To this end, this study attempts to explore and answer the following research questions:

 What are the suitable attributes of ATMs that describe an ATM in service and out of service?
 Which DM technique is more suitable for designing a model that identifies the reason an ATM is out of service?
 What are the most interesting patterns that determine the reasons for Dashen bank S.C. ATMs being out of service?
 To what extent does the model identify the reasons for ATM out of service?

1.4 Objective of the study
1.4.1 General objective
The main objective of this study is to design a model using data mining technology so as to
identify the reason for ATM out of service and take corrective action proactively.

1.4.2 Specific objectives


To achieve the general objective, this research accomplishes the following specific objectives:

 To review literature so as to understand and select DM techniques, algorithms and approaches well suited for the DM classification task at hand
 To understand the problem and collect historical ATM data from the right source
 To generate quality data by applying data preparation tasks such as data preprocessing, data cleaning, data reduction and data transformation
 To design a model that can identify the reason for ATMs being out of service
 To evaluate the performance of the models using effectiveness measures such as accuracy, recall, precision and ROC
 To prepare a usability test for evaluating the model that serves as input for the prototype

1.5 Scope and limitation of the study


Though ATM services are widely used nowadays, the problem of frequent ATM out of service affects Dashen bank's consistent and timely service provision for customers. Hence, this study aims to apply data mining to determine the reasons for Dashen bank's ATMs being out of service. The five reasons for out of service are hardware faults, cash dispensing, daily operation, network issue B24, and network connectivity of ProView. Of these reasons, only daily operation, network issue B24, hardware faults, and cash dispensing, which are the dominant factors, are considered. Generally, data mining is applicable to different kinds of business, but this study looked only at Dashen bank's ATM database, extracting data covering December 31, 2017 to May 04, 2018: 16 MB in size, with 365,522 records and 6 attributes from the device event table as the data source.

There are two major data mining tasks: predictive and descriptive modeling [4]. The preferred DM task here is to design a model using classification algorithms in the Weka knowledge discovery tool [2]. According to Han and Kamber [3], a predictive model is applied when forecasting is essential. This study therefore selects only predictive modeling to identify the reason for ATM out of service, using classification algorithms.

1.6 Significance of the study


The output of this research will benefit bank customers, banks, and researchers. If cash withdrawal services are available at any time, this saves time, labor and cost, and facilitates conducting transactions cleanly, safely and securely at any given time. This will enhance customers' satisfaction and increase banks' revenue.

This research aligns with Dashen bank S.C.'s work plan, control, management, policy and strategy, and will provide input particularly on electronic payment using the ATM service. The output report of this work supports decision makers in setting the future direction of the ATM service. In other words, the prediction of out-of-service ATMs will produce quality service and cost reduction. In similar fashion, other banks may apply such mining and reach sound decisions.

This study will be an initial attempt locally and may motivate other researchers who have an interest in the same area to conduct further research that enhances the expansion of the ATM service offered in the country by the different commercial banks.

1.7 Methodology of the study
This study aims to provide a solution to the problem of ATMs being out of service. The issues include: why is the ATM out of service? What are the determinant components of the ATM that make cash withdrawal impossible?

Producing a solution for a given problem presupposes following scientific principles and guidelines from start to end. These principles systematically answer the questions that lead to achieving the objective of the study. To this end, there is a variety of methodologies to follow, depending on the research characteristics or discipline. The methodology defines the step-by-step procedure the research has to follow, defining the concepts or phenomena under study in such a way that the complexity of the problem at hand becomes clear and understandable [14].

1.7.1 Research design


This study follows experimental research. As The World Book Encyclopedia [15] states, experimentation is a method used to discover facts and to test ideas; science that proceeds by experimentation produces proofs of hypotheses by experiment.
To conduct the experiment systematically, the study uses the hybrid DM process model [1]. This model is selected for the following reasons: it emphasizes understanding of the problem domain and the data; it has a research-oriented structure; it contains several feedback loops; and the extracted knowledge can be extended to other application domains. The hybrid process model has six steps [1]: understanding of the problem, understanding of the data, preparation of the data, data mining, evaluation, and use of the knowledge.

1.7.2 Understanding of the Problem


The kick-off step is understanding the data mining application domain: looking closely at that specific domain, working with the domain expert to define the problem clearly and determine the project goals, identifying key people, and learning about current solutions to the problem. Learning domain-specific terminology is key at this stage [1].

To understand the domain, the researcher used primary sources, such as discussion with the domain expert, and secondary sources, which include document analysis of the ProView Operation Manual V4240 and Console User Manual V4240, Dashen bank annual reports, the intranet and extranet portals, magazines, and the internet, to gain deep insight.

1.7.3 Understanding of the Data


The data for this research are available from Dashen bank S.C.'s ATM database, in which data have been collected and saved in real time over years and can be retrieved. From that database, the research considers the events database, whose event table is the data source of this research, because the ProView monitoring console [16] contains the necessary attributes and their values. The event table contains the activity, time, and case or sequence of events.

Upon the foundation of the kick-off step, this phase performs the following tasks [1]:

 Data collection and sampling, and deciding which data to use for the data mining task.

 Checking the data for completeness, redundancy, missing values, and believability of attribute values.

 Finally, verifying the usefulness of the data with respect to the DM goals.

In this stage, data collection was done from ProView. In the ProView Data Model 4.2/40 (C) Wincor Nixdorf International 2015 [17], the log event database for Dashen bank ATM monitoring organizes its data in many tables. The researcher, in collaboration with domain experts, selected the event table and the attributes of interest for this particular research. The data found from the ProView monitoring console were exported in MS Excel format for visualization and checking of data quality.

1.7.4 Preparation of the data
Building on the two preceding phases, the task in this phase is to decide which data are used as input for the DM methods in the subsequent step. Data mining results are highly dependent on data preparation; poor data preparation yields incorrect results. This step takes much time to complete the following tasks:

 Checking the completeness of data records, which includes correcting for noise or outliers and filling missing values with correct attribute values.

 Construction of new attributes from the given ones.

 Data transformation by normalization (bringing attribute values within specified boundaries) and aggregation (using a concept hierarchy).

 Data reduction, i.e., removing irrelevant and duplicate attributes and reducing the number of instances by sampling.

 Making the resulting quality data suitable for the selected DM tool.

The data preparation task was performed using MS Excel and Weka: checking the completeness of records, construction of a new attribute, data transformation, data reduction, attribute selection and filling of missing values.

1.7.5 Data mining for predictive modeling


This step uses the quality data prepared in the previous step to find the desired knowledge. There are different techniques and algorithms applicable to the extraction of knowledge, and many open-source and commercial data mining tools. The following criteria are applied to select an appropriate DM tool [3].

 Can the tool run on different operating system or platform?

 Can it perform well with high speed?

 Can it support varieties of algorithms and data mining tasks?

 How many features and instances can be supported?

 Can users make preprocessing and visualization?

Being famous, well known by researchers, and an open-source tool that fulfills the criteria listed above, Weka version 3.8.2 is preferred as the data mining tool for the task at hand.

Weka contains a collection of algorithms for data mining tasks, including data preprocessing, association mining, classification, clustering, attribute selection and visualization [2]. For the preprocessing task it provides filters, both supervised and unsupervised, applied to attributes and instances.

To design the required model, this research applied classification algorithms such as the J48 decision tree, PART rule induction, multilayer perceptron neural networks, and Naive Bayes. These algorithms are widely used in predictive modeling to forecast based on past trends and yield competent results [1].

1.7.6 Evaluation of the Discovered Knowledge


Executing the algorithms produces models, and finding the best-fit model requires classifier evaluation. Evaluation includes understanding the results, checking whether the discovered knowledge is novel and interesting, interpretation of the results by domain experts, and checking the impact of the discovered knowledge [1].

The effectiveness metrics accuracy, precision, recall, and ROC are used to evaluate the discovered knowledge. All these metrics are computed from the confusion matrix, a useful tool for analyzing how well the classifier can recognize records of different classes [1]. Given m classes, a confusion matrix is a table of size at least m by m. The table below shows a binary-class confusion matrix.

Table 1.2 Binary-class confusion matrix

                 Predicted class
Actual class     Negative    Positive
Negative         TN          FP
Positive         FN          TP

For binary classification: the possible outcomes of classification are TN (True negative), TP
(True positive), FN (False negative), and FP (False positive).

If the actual instance class is positive and it is classified as positive, it is counted as a true
positive. If the instance is positive and it is classified as negative, it is counted as a false
negative. If the instance is negative and it is classified as negative, it is counted as a true
negative. If the instance is negative and it is classified as positive, it is counted as a false
positive.
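As a quick sketch, the accuracy, precision and recall defined over these four outcomes can be computed directly from the confusion-matrix counts. The counts below are made-up illustrative numbers, not results from this study.

```python
# Effectiveness measures computed from the binary confusion matrix
# in Table 1.2 (TP, TN, FP, FN as defined in the text).

def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)   # of predicted positives, how many were right
    recall = tp / (tp + fn)      # of actual positives, how many were found
    return accuracy, precision, recall

acc, prec, rec = metrics(tp=40, tn=45, fp=5, fn=10)
print(round(acc, 2), round(prec, 2), round(rec, 2))  # 0.85 0.89 0.8
```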

AUROC is another metric, used when the tradeoffs of the ROC curve or the comparison of two operating characteristics (recall and precision) would not be enough evidence. AUROC is a more powerful metric and is easy to understand: it amounts to choosing the model/classifier that has the maximum area under its ROC curve, greater than 0.5 and less than 1 [1].
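AUROC can also be computed directly from its probabilistic reading: the area under the ROC curve equals the probability that a randomly chosen positive instance is ranked above a randomly chosen negative one. The scores and labels below are illustrative, not taken from the ATM dataset.

```python
# AUROC as the probability that a positive instance outscores a negative one
# (ties count as half). Equivalent to the area under the ROC curve.

def auroc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.7, 0.4, 0.3]
labels = [1,   1,   0,   1,   0]
print(auroc(scores, labels))  # about 0.833, better than the 0.5 chance level
```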

1.7.7 Use of the Discovered Knowledge
The final step is designing a prototype that demonstrates the use of the knowledge discovered by the classification algorithm. The design was implemented in the high-level programming language Java, which is used to develop many kinds of applications, such as Android apps, desktop apps, and video games [26].

The prototype usability with respect to efficiency, effectiveness and satisfaction were checked,
evaluated and rated by users.

1.8 Operational definition


The following term definitions are purposefully contextualized for this research.
 Out of service: an ATM service could be out of service because of the non-functionality
of different inside and outside components of the machine itself; hardware parts,
communication, application (software), operating system, and unresponsive remote
server. Terms such as unavailability, downtime, and down are also used interchangeably
with out of service when the independent self-service machine ATM can’t make
electronic payment successfully.

 In-service: an ATM service could be in service when the ATM machine is at the normal
operation or function. Terms such as available, availability, up, uptime, and online are
used interchangeably with in service when the independent self-service machine ATM
can make electronic payment successfully.

1.9 Organization of the research


This research report contains six chapters. The first chapter describes the background of the study, the statement of the problem, the objectives, scope, significance, methodology and operational definitions. The second chapter presents a literature review of data mining tasks, techniques and algorithms, DM process models, data mining application areas and related works. The third chapter is data preparation. Chapter four covers modeling and evaluation. Chapter five presents the research model, the use of the discovered knowledge, and its implementation and evaluation, and the last chapter gives conclusions and recommendations.

CHAPTER TWO
LITERATURE REVIEW
2.1 Overview of data mining
Data mining is a multidisciplinary science. Its application in different fields, such as medicine and commerce, has had a surprising impact. Specialized domain experts may not be able to dig deep enough on their own to find the hidden knowledge and interesting patterns in their big data the way data mining does. Because of this, and because computers grow more powerful while the size of data grows continually, data mining has become vital [1]. Data mining uses different mathematical formulae and algorithms to find precious knowledge in historical data.

According to Bastos et al. [18], KDD and DM methodologies emerged from the development of automated data collection tools, the tremendous data explosion, the urgent need for interpretation and exploitation of massive data volumes, and the existence of supporting tools.

According to Cios et al. [1], data mining came into existence because of the alarming growth of data and technological advancement. Advances in computer capacity and speed, architecture and algorithms were a great motivation for data mining. The World Wide Web also breaks the limitations of location and time, enabling easy collection of different data types (text, images, audio, and video) transferred at high speed, with tera- and zettabytes of data saved.

All the data in the world are of no value without mechanisms to efficiently and effectively extract relevant information and knowledge from them. Early pioneers such as Fayyad, Mannila, Piatetsky-Shapiro, Djorgovski, Frawley, and Smyth recognized this urgent need, and the data mining field was born [1].

Data mining is related to statistics in modeling objects. In statistics, researchers frequently deal with the problem of finding the smallest data size that gives sufficiently confident estimates. DM deals with the opposite problem: the data size is large, and we are interested in building a data model that is small (not too complex) but still describes the data well [1].

As a major subfield of mathematics, statistics plays a very important role in research on information theory, data mining, web mining and so on. For example, when we study a certain population, taking the whole population for analysis would be cumbersome; therefore only a limited sample is taken to support decisions. For this purpose, different statistical measurements are used, such as the mean, variance, MSE (mean square error), standard deviation, and confidence interval [1].

For any kind of research, the methodology conceptualizes reality so that the process can be understood from start to end. This conceptualization includes modeling. Most descriptions of modeling, including data modeling, take the form of mathematical equations in statistics; for example, Bayes, latent semantic analysis and neural networks produce models.

Being a young science, data mining's attractiveness keeps increasing, because viewing otherwise unknown dimensions of data and discovering surprising knowledge make it advantageous in business analysis. Government and private sectors alike acquire knowledge scientifically from their saved data in the course of their work. What is common to all is data that grow exponentially. Storing data without interpretation means merely holding a resource, and any available resource needs optimal utilization. Refined data from different application domains, for example health care and the financial industry, bring great benefit to their specific domains. To stay alive in the competitive market, businesses give high attention to their accumulated data.

The data that pass through the knowledge discovery process yield knowledge that was hidden in the data. This hidden knowledge becomes applicable to accurate and precise decisions on future business operations.

2.2 Data mining tasks
According to Dunham [4], data mining tasks are classified as predictive and descriptive.

Figure 2.1: Data mining models and tasks [4]

Predictive data mining tasks uncover unknown patterns based on a predictor dataset [4]. The revealed pattern becomes a model and determines future-trend decisions in the desired application domain. Predictive models are supervised learning, because the class label is identified before data analysis. Data classification is a two-step process [1]: building the classifier (model) and using the classifier for classification. The first step is the learning phase, in which the classification algorithm builds the classifier from a training set made up of database instances and their associated class labels; each instance in the training set belongs to a category or class. Then the classifier is used for classification: the test data are used to estimate the accuracy of the classification rules, and if the accuracy is considered acceptable, the rules can be applied to new data instances.
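The two-step process can be illustrated with a minimal stand-in classifier. The 1-nearest-neighbour rule and the tiny numeric dataset below are illustrative assumptions, not the algorithms or data evaluated in this thesis.

```python
# Step 1: build a classifier from labeled training instances.
# Step 2: estimate its accuracy on held-out test instances.

def build_classifier(train):
    # "Learning" here is simply storing the training instances with labels;
    # classification returns the label of the nearest stored instance.
    def classify(x):
        nearest = min(train, key=lambda inst: sum((a - b) ** 2
                                                  for a, b in zip(inst[0], x)))
        return nearest[1]
    return classify

train = [((1.0, 1.0), "in-service"), ((1.2, 0.9), "in-service"),
         ((8.0, 7.5), "out-of-service"), ((7.8, 8.1), "out-of-service")]
test = [((1.1, 1.0), "in-service"), ((8.1, 8.0), "out-of-service")]

classify = build_classifier(train)
correct = sum(classify(x) == y for x, y in test)
print(correct / len(test))  # 1.0 on this toy data
```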

There are many different kinds of classification algorithms available for predictive modeling.
Some of the well-known are Naïve Bayesian algorithm, neural network, and decision tree [1].
Each algorithm has its own importance and a specific condition where they are best suited to
apply for the purpose of data mining depending on the best fit and application.

Descriptive mining tasks characterize the general properties of the data in the database [3]. Clustering, summarization, association rules, and sequence discovery are considered descriptive. A well-known example is identifying products that are purchased together, called market basket analysis, which is categorized under association rules. Descriptive mining is considered unsupervised learning; unlike predictive mining, no class label is considered in a descriptive mining task.

Clustering is segmenting a dataset into subgroups based on similarity [1]. The degree of similarity differentiates one group from the others: the higher the similarity between records, the more likely they are to be placed in the same cluster, and the more homogeneous each cluster becomes within a dataset of heterogeneous data. Clusters or groups are thus identified by similarity or nearness.

Association rule mining determines which instances go together. Like classification, it has two processes: the first finds frequently occurring patterns, and the second generates association rules between the frequent records in the dataset [3].

The data mining task considered in this study is predictive modeling based on classification algorithms. There are different classification algorithms, such as decision trees, Naïve Bayes, and the multilayer perceptron neural network.

2.2.1 Decision tree


Decision trees are a way of representing a series of rules that lead to a class or value [19]. The tree structure of root, branches, and leaves realizes the classification. A decision tree is a flow-chart-like tree structure, where each node denotes a test on an attribute value, each branch represents an outcome of the test, and tree leaves represent classes or class distributions [3].

Hence decision trees models are commonly used in data mining to examine the data and induce
the tree and its rules that will be used to make predictions [19].

The two types of decision trees are classification and regression trees: classification trees predict categorical variables, whereas regression trees predict continuous variables [19].

Most decision tree models are constructed in a top-down, recursive, divide-and-conquer manner [3]. Decision trees can easily be translated into a collection of rules: for each rule, traverse the tree starting from its root and moving down to one of the terminal nodes [1].

The main points of the learning algorithm for inducing a decision tree from training tuples are summarized as follows [3].

 The algorithm is called with three parameters: D (a data partition), the attribute list (describing the tuples), and an attribute selection method such as information gain. Information gain is a feature selection method, also used in building decision trees for classification, mathematically defined as the difference between the original information requirement and the new requirement:

Gain(A) = Info(D) - InfoA(D)…………………. (2.1)

Where Info(D) is the entropy of D and InfoA (D) is the expected information required to
classify a tuple from D based on the partitioning by A.

 The tree starts as a single node, N, representing the training tuples in D.


 If the tuples in D are all of the same class, then node N becomes a leaf and is labeled with
that class.
 Otherwise, the algorithm calls Attribute selection method to determine the splitting
criterion. The splitting criterion indicates the splitting attribute and may also indicate
either a split-point or a splitting subset.
 The node N is labeled with the splitting criterion, which serves as a test at the node. A
branch is grown from node N for each of the outcomes of the splitting criterion.

The recursive partitioning stops only when any one of the following terminating conditions is
true:

 All of the tuples in partition D (represented at node N) belong to the same class.
 There are no remaining attributes on which the tuples may be further partitioned. This
involves converting node N into a leaf and labeling it with the most common class in D.
 There are no tuples for a given branch, that is, a partition D j is empty. In this case, a leaf
is created with the majority class in D.
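The information gain measure of equation 2.1, which drives the attribute selection step above, can be computed directly. The small (attribute value, class label) dataset below is invented for illustration; it is not drawn from the ATM event data.

```python
# Information gain for a categorical attribute A:
# Gain(A) = Info(D) - Info_A(D), per equation 2.1.

from collections import Counter
from math import log2

def info(labels):
    """Entropy Info(D) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(rows):
    """rows: list of (attribute_value, class_label) pairs."""
    labels = [y for _, y in rows]
    # Info_A(D): expected information after partitioning D by attribute A.
    partitions = {}
    for v, y in rows:
        partitions.setdefault(v, []).append(y)
    info_a = sum(len(p) / len(rows) * info(p) for p in partitions.values())
    return info(labels) - info_a

rows = [("net", "down"), ("net", "down"), ("hw", "down"),
        ("hw", "up"), ("cash", "up"), ("cash", "up")]
print(round(gain(rows), 3))  # 0.667: the attribute reduces class entropy
```

The algorithm would compute this gain for every candidate attribute and split on the one with the maximum value.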

2.2.2 Naïve Bayes


A well-known statistical method for classification is Naïve Bayes. Bayes' theorem determines how likely an event is to happen given prior and posterior probabilities, and is based on the theory of probability. Given a hypothesis H and evidence E, Bayes' theorem states that the relationship between the probability of the hypothesis before getting the evidence, P(H), and the probability of the hypothesis after getting the evidence, P(H|E), is defined as follows [3]:

P(H|E) = P(E|H) P(H) / P(E) ………………………….(2.2)

From the definition above, Bayes' rule is applicable in data mining for grouping and classification based on the given statistical information. Through classification it is possible to predict the group a specific instance belongs to, based on a probabilistic model specification: the probability computed using Bayes' rule predicts a class.

The Naïve Bayes classifier is based on Bayes' rule under the assumption that all attributes are: 1) equally important and 2) independent of one another given the class. It is a probabilistic classifier. The mathematical equations are as follows [3]:

P(A1, A2, …, An | Cj) = P(A1 | Cj) P(A2 | Cj) … P(An | Cj)

C_NaiveBayes = argmax_j P(Cj) Π_i P(Ai | Cj) ……………………………… (2.3)

Naïve Bayes is simple, and its mathematical formulae are easy to understand. The Naïve Bayes algorithm usually has the following steps [3].

1. A training dataset provides attributes xi, class values Ck and attribute values w.

2. It calculates the probability of an instance having a given class and attribute value. The classifier assigns x to class Ci if P(Ci|x) > P(Cj|x) for every other class Cj; otherwise x is in class Cj. The Bayesian approach to classifying a new instance is thus to assign the most probable target class, maximizing P(Ci|x), given the attribute values {w1, w2, ..., wn} that describe the instance. By Bayes' rule:

P(Ci|x) = P(x|Ci) P(Ci) / P(x) …..……………………. (2.4)

Since the Naïve Bayes classifier makes the simplifying assumption that the attribute values are conditionally independent given the target class:

P(x|Ci) = Π_k P(wk|Ci) ………………………… (2.5)

3. It calculates the probability of each class as its relative frequency in the training data:

P(Ci) = |Ci,D| / |D| …..……………………. (2.6)

4. It selects the class with the maximal probability, according to the following equation:

C_NaiveBayes = argmax_j P(Cj) Π_i P(Ai | Cj) …………………………………. (2.7)

In other words, the predicted class label is the class Cj for which P(Cj) Π_i P(Ai | Cj) is the maximum.
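As a sketch only, the four steps above can be implemented from scratch for categorical attributes. The toy event data are invented, and the frequency estimates carry no smoothing, unlike most production implementations.

```python
# From-scratch Naïve Bayes for categorical attributes (equations 2.4-2.7):
# estimate P(Cj) and P(Ai|Cj) by counting, then pick the class maximizing
# P(Cj) * prod_i P(Ai|Cj).

from collections import Counter, defaultdict

def train_nb(rows):
    """rows: list of (attribute_tuple, class_label)."""
    class_counts = Counter(y for _, y in rows)
    # cond[(i, value, cls)] = count of attribute i taking `value` in class cls
    cond = defaultdict(int)
    for x, y in rows:
        for i, v in enumerate(x):
            cond[(i, v, y)] += 1
    priors = {c: n / len(rows) for c, n in class_counts.items()}  # eq. 2.6
    return priors, cond, class_counts

def predict(x, priors, cond, class_counts):
    best_class, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior                                    # P(Cj)
        for i, v in enumerate(x):                        # eq. 2.5: product of
            score *= cond[(i, v, c)] / class_counts[c]   # P(Ai|Cj) terms
        if score > best_score:                           # eq. 2.7: argmax
            best_class, best_score = c, score
    return best_class

rows = [(("net", "night"), "down"), (("net", "day"), "down"),
        (("hw", "night"), "down"), (("ok", "day"), "up"),
        (("ok", "night"), "up"), (("ok", "day"), "up")]
priors, cond, counts = train_nb(rows)
print(predict(("net", "day"), priors, cond, counts))  # down
```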

2.2.3 Neural network


Another popular supervised learning method for classification is the neural network, whose modeling approach differs from the statistical one. A neural network, or artificial neural network (ANN), is another important data mining model for classifying features; even though Bayes, Naïve Bayes and neural networks all address the classification task, they differ.

The ANN is a concept derived from the interconnection of neurons to compute and analyze logic. It is a mathematical or computational model inspired by the structure and functional aspects of biological neural networks. A neural network consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation. An ANN is an adaptive system that changes its structure based on external or internal information flowing through the network during the learning phase [20]. There are two types of neural networks [1]: feed-forward networks, and networks with feedback loops, called recurrent networks. In recurrent networks, data from the output feed back into the input, recursively serving as input again before becoming output.

The popular ANN architecture is either single- or multi-layered [20]. When an ANN has no hidden layer it is called single-layered, whereas if it has hidden layers it is multilayered [3]. There are no clear rules as to the "best" number of hidden-layer units. The computation of an ANN is considered a black box, because the mathematics of finding an ANN solution for a given problem is very lengthy. Figure 2.2 shows the main framework of the ANN.

Figure 2.2 ANN model [1]

a = 1 / (1 + e^(-x)) ……………………………………………….(2.8)

Computing the output involves calculating the summation of the products of the weights with the input values pointing to a given output node, as shown in equation 2.9:

y = Σ_{j=1}^{m} w_j x_j ……………………….. (2.9)

Another ANN learning process is backpropagation, named for the "backwards" direction: from the output layer, through each hidden layer, down to the first hidden layer. It is described as follows [3]:

 Initialize the weights with random values (small random numbers, e.g., ranging from -1.0 to 1.0, or -0.5 to 0.5) and set the other network parameters,
 Read in the inputs and the desired outputs,
 Compute the actual output (by working forward through the layers),
 Compute the error (the difference between the actual and desired output),
 Change the weights by working backward through the hidden layers,
 Repeat these steps until the weights stabilize.

Figure 2.3 Backpropagation learning process [3]

As depicted in figure 2.3, the training algorithm, with its mathematical formulae, is as follows [3]:

 Initialize the weights and threshold to small random numbers.

 Present a vector x to the neuron inputs and calculate the output using the adder function:

y = Σ_{j=1}^{m} w_j x_j ………………………........(2.10)

 Apply the activation function (in this case a step function) such that

y = 0 if y < 0, and y = 1 if y ≥ 0 …………………………. (2.11)

 Update the weights according to the error:

W_j = W_j + η (y_T - y) x_j ………………….. (2.12)

Usually, to limit the output, an activation function is applied. This function bounds the value between 0 and 1. There are many activation functions that impose such a boundary; the most used one is the sigmoid function [3], as shown in equation 2.8. A graphical representation of the sigmoid function is presented in figure 2.4.

a = 1 / (1 + e^(-x)) ....................................................(2.8)

Figure 2.4 Graphical representation of the sigmoid function [3]

Note that when defining the ANN topology, before training can begin, one must decide on the number of units in the input layer, the number of hidden layers (if more than one), the number of units in each hidden layer, and the number of units in the output layer [3].
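The single-neuron training loop of equations 2.10-2.12 (adder function, step activation, weight update) can be sketched in Python. The OR-function data, the constant bias input, the learning rate and the epoch count are illustrative choices, not values taken from the thesis.

```python
# Single-neuron (perceptron) training implementing equations 2.10-2.12:
# y = sum(w_j * x_j), step activation, W_j <- W_j + eta * (y_T - y) * x_j.

import random

def step(y):
    return 1 if y >= 0 else 0          # eq. 2.11 (step activation)

def train_perceptron(data, eta=0.1, epochs=20, seed=0):
    rng = random.Random(seed)
    n = len(data[0][0])
    w = [rng.uniform(-0.5, 0.5) for _ in range(n)]   # small random weights
    for _ in range(epochs):
        for x, target in data:
            y = step(sum(wj * xj for wj, xj in zip(w, x)))  # eq. 2.10
            w = [wj + eta * (target - y) * xj               # eq. 2.12
                 for wj, xj in zip(w, x)]
    return w

# Learn the logical OR function; the last input is a constant bias term.
data = [((0, 0, 1), 0), ((0, 1, 1), 1), ((1, 0, 1), 1), ((1, 1, 1), 1)]
w = train_perceptron(data)
print([step(sum(wj * xj for wj, xj in zip(w, x))) for x, _ in data])
```

Full backpropagation extends this update through the hidden layers using the sigmoid of equation 2.8 in place of the hard step.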

2.3 DM process models
An acceptable format for the knowledge discovery process (KDP) within a common framework is known as a process model [1]. A data mining process model has its own framework: the DM process defines, as shown in figure 2.5, a sequence of steps (with eventual feedback loops) that should be followed to discover knowledge (e.g., patterns) in data. Each step is usually realized with the help of available commercial or open-source software tools [3].

Figure 2.5 Sequential structure of the KDP model [1]

The following are the three common knowledge discovery process models: the academic research model, the industrial model and the hybrid model [1].

2.3.1 Academic Research Models


As Cios et al. [1] states; the academia model was created first in 1996 by Fayyad et al.[21]. It
consists of nine steps in KDP which is the foundation for other models.

1) Understanding the application domain and the value of the final product of the KDP.
2) Forming a dataset that yields a valuable sample subset to serve as input to the KDP.
3) Data cleaning and preprocessing: treating missing values, removing outliers, and
accounting for time-sequence information and known changes.
4) Data reduction and projection: searching for the determinant features or attributes that
align with the purpose of the KDP, applying data transformations so that the result is
representative of the dataset.
5) Choosing the data mining task appropriate to the application domain, i.e., whether it is
predictive or descriptive: for example clustering, association, or classification.

6) Selecting an algorithm to create a model that outperforms alternative models.
7) Data mining (finding patterns or knowledge): the knowledge representation can take the
form of rules such as a decision tree, rule induction, etc.
8) Interpreting the mined patterns: the knowledge representation can be visualized as a
graph or as text, to be read according to the selected model.
9) Consolidating the acquired knowledge: this includes reporting to the concerned parties,
documentation, deployment, taking appropriate action, or serving as input for further
advanced research.

2.3.2 Industrial Model


According to Cios et al. [1], the two industrial models are the five-step model by Cabena et al.
[27], developed with support from IBM, and the six-step CRISP-DM (Cross-Industry Standard
Process for Data Mining) model, developed by a large consortium of European companies.
CRISP-DM is the forerunner among industrial models. Figure 2.6 shows the CRISP-DM
process model.

Figure 2.6 six steps of the CRISP-DM KDP [1]

1) Business understanding: understanding the business goals and the desired input needed
for the business. At this stage the business problem and the available resources are
identified and translated into a data mining problem definition. A preliminary project
plan is also formed, detailing each task to be performed to achieve the objectives.

2) Data understanding: starting with data collection, then describing the dataset, exploring
it to learn what it contains, verifying data quality, and enumerating and locating every
feature.
3) Data preparation: this is the most time-consuming and most crucial part of data mining.
It includes table, record, and attribute selection; integration of datasets; data cleaning;
construction of new attributes; and transformation of data.
4) Modeling: selecting the best algorithm and techniques for creating an applicable model.
Default, calibrated, and optimized parameter values are tested in order to find the best
model. This is an iterative process.
5) Evaluation: all the models created are evaluated against the business objective criteria.
At this step, before a final decision is made, a review of all the previous steps is
required for the selected model to be confirmed.
6) Deployment: the resulting knowledge, depending on the application domain, is reported,
presented, and delivered in a way that is easy to understand, or it becomes input for
another KDP.

2.3.3 Hybrid model


The hybrid model combines the academic and industrial models. This research uses the
hybrid knowledge discovery process model for the following reasons [1].

 It provides several feedback loops, with detailed reasons for each.
 The extracted knowledge can be extended to other application domains.
 It has a research-oriented structure.
 It emphasizes data mining.

One such model is the six-step KDP model (see Figure 2.7 below) developed by Cios et al. [1].
The steps are discussed as follows.

Figure 2.7: The six-step KDP model [1]

1) Understanding of the problem domain: the kick-off step is understanding the data mining
application domain: looking closely at that specific domain, working in collaboration with
the domain expert to define the problem clearly and determine the project goals, identifying
key people, and learning about current solutions to the problem. Learning domain-specific
terminology is key. This step includes the following tasks.

 A description of the problem (what, how, and why it happened) and its
restrictions is prepared.
 The business application domain goals are translated into DM goals.
 An initial selection of DM tools, from the many available commercial and
open-source options, to be used later in the process is performed.

2) Understanding of the data: upon the foundation of the kick-off step, this phase comprises
the following tasks.

 Data collection and sampling, and deciding which data, including format and
size, will be used, is the main task.
 Data are checked for completeness, redundancy, missing values, believability
of attribute values, etc.
 The final task in this phase is verification of the usefulness of the data with
respect to the DM goals.

3) Preparation of the data: the task in this phase, built upon the two preceding phases, is
deciding which data will be used as input for the DM methods in the subsequent step. Data
mining results depend heavily on data preparation; poor data preparation yields incorrect
results. This step takes considerable time to complete the following tasks.

 Data cleaning, which includes collecting and integrating data from different sources.
 Checking the completeness of data records, which includes removing or correcting
noise or outliers and filling missing values with correct attribute values, etc.
 Construction of new attributes from the given ones.
 Data transformation by normalization (keeping attribute values within specified
boundaries) and aggregation (using concept hierarchies).
 Data reduction: removing irrelevant and duplicate attributes and reducing the number
of instances by sampling.
 Data preparation also includes suiting quality data to the selected DM tool.
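As a small illustration of two of these preparation tasks, the sketch below fills missing values with the attribute mean and then applies min-max normalization; the column of uptime hours is invented for the example:

```python
def fill_missing(values, missing=None):
    """Replace missing entries with the mean of the observed values."""
    observed = [v for v in values if v is not missing]
    mean = sum(observed) / len(observed)
    return [mean if v is missing else v for v in values]

def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values into [new_min, new_max] (min-max scaling)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [new_min + (v - lo) / span * (new_max - new_min) for v in values]

# Hypothetical ATM 'uptime hours' column with one missing value.
raw = [20, None, 24, 16]
clean = fill_missing(raw)          # missing entry replaced by the mean, 20.0
scaled = min_max_normalize(clean)  # all values now within [0, 1]
```

The same two steps are what most DM tools (Weka included) perform behind their "ReplaceMissingValues" and "Normalize" style filters.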

4) Data mining: this step takes the quality data from the previous step and finds the desired
knowledge. Different techniques and algorithms are applicable for the extraction of
knowledge; the following are some of them.

 Classification techniques: decision tree, rule induction, neural networks, using
algorithms such as the J48 decision tree, PART rule induction, multilayer
perceptron neural networks, and Bayesian classifiers.
 Association techniques: pattern discovery of relationships between items, for
example among shopping items, using algorithms such as Apriori and FP-Growth.
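To make the association idea concrete, the following sketch counts itemset support over a few invented shopping baskets. It implements only the support-counting core of Apriori (itemsets up to size 2), not the full candidate-generation loop:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Keep itemsets (size 1 and 2) whose support meets min_support.

    Support is the fraction of transactions containing the itemset.
    """
    n = len(transactions)
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for size in (1, 2):
            for combo in combinations(items, size):
                counts[combo] = counts.get(combo, 0) + 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

# Invented shopping baskets, purely for illustration.
baskets = [["bread", "milk"], ["bread", "butter"],
           ["bread", "milk", "butter"], ["milk"]]
freq = frequent_itemsets(baskets, min_support=0.5)
# e.g. freq[("bread", "milk")] == 0.5, since 2 of 4 baskets contain both
```

A full Apriori implementation would iterate this counting over growing candidate sizes, pruning candidates whose subsets are infrequent.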

5) Evaluation: at this step the extracted knowledge is checked for interestingness and
applicability. Domain experts also check the interpretation of the results and the
interestingness of the knowledge before retaining the model.
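Evaluation at this step typically rests on measures such as accuracy, precision, and recall computed over a hold-out set. The sketch below, with invented labels, shows how they are derived from paired actual/predicted lists:

```python
def evaluate(actual, predicted, positive="out_of_service"):
    """Compute accuracy, precision and recall from paired label lists."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == positive and p == positive)
    fp = sum(1 for a, p in zip(actual, predicted) if a != positive and p == positive)
    fn = sum(1 for a, p in zip(actual, predicted) if a == positive and p != positive)
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return {
        "accuracy": correct / len(actual),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Invented hold-out labels for illustration.
actual    = ["out_of_service", "in_service", "out_of_service", "in_service"]
predicted = ["out_of_service", "in_service", "in_service", "in_service"]
print(evaluate(actual, predicted))
# {'accuracy': 0.75, 'precision': 1.0, 'recall': 0.5}
```

These are the same figures Weka reports in its classifier output summary.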

6) Use of knowledge: the final step covers documentation, deployment, and application in
the current domain, extension to other domains, and planning where and how to use the
discovered knowledge.

2.4 Data mining applications


The application areas of data mining are enormous. According to Han and Kamber [3], DM is
applied in financial analysis, telecommunications, biomedicine and science, counterterrorism
(including and beyond intrusion detection), and mobile (wireless) data. In business, DM is used
both to increase revenues and to reduce costs [19].

Financial data collected in the banking and financial industries are often relatively complete,
reliable, and of high quality, which facilitates systematic data analysis and data mining. In the
banking industry, data mining is applied to loan payment prediction and customer credit policy
analysis, detection of money laundering and other financial crimes, classification and clustering
of customers for targeted marketing, and the design and construction of data warehouses for
multidimensional data to find general properties of the data [3].

According to Han and Kamber [3], another application area is the retail industry. Many stores
open their own websites without any brick-and-mortar location, existing solely online, where
their customers can purchase items. Data mining in the retail industry is used to identify
customer buying behaviors, discover customer shopping patterns and trends, improve the quality
of customer service, achieve better customer retention and satisfaction, enhance goods
consumption ratios, design more effective goods transportation and distribution policies, and
reduce the cost of business. Some applications of DM in retail are the design and construction of
data warehouses; multidimensional analysis of sales, customers, products, time, and region;
analysis of the effectiveness of sales campaigns; customer retention through analysis of customer
loyalty; and product recommendation and cross-referencing of items.

The telecommunication industry is likewise inclined to apply DM for understanding the business
involved, identifying telecommunication patterns, catching fraudulent activities, making better
use of resources, and improving the quality of service [3]. Some applications are listed below:

 Multidimensional analysis of telecommunication data: to identify and compare data traffic,
system workload, resource usage, user group behavior, and profit.
 Fraudulent pattern analysis and the identification of unusual patterns: to (1) identify potentially
fraudulent users and their atypical usage patterns; (2) detect attempts to gain fraudulent entry to
customer accounts; and (3) discover unusual patterns that may need special attention, such as
busy-hour frustrated call attempts, switch and route congestion patterns, and periodic calls from
automatic dial-out equipment (like fax machines) that has been improperly programmed.
 Multidimensional association and sequential pattern analysis: this can help promote the sales of
specific long-distance and cellular phone combinations and improve the availability of particular
services in a region.
 Mobile telecommunication services: to design adaptive solutions enabling users to obtain useful
information with relatively few keystrokes. DM is also used for telecommunication data analysis
and visualization.

Han and Kamber [3] highlight the power of DM applied to intrusion detection. DM is more
precise at intrusion detection than a traditional intrusion detection system (IDS). Moreover, an
IDS needs highly qualified people to find the subtleties of anomaly signature detection, whereas
DM requires far less manual processing and input from human experts. The following are areas
in which data mining technology may be applied or further developed for intrusion detection [3]:

 Development of data mining algorithms for misuse detection and anomaly detection.
 Association and correlation analysis, and aggregation to help select and build discriminating
attributes.
 Analysis of stream data for real-time intrusion detection.
 Distributed data mining to analyze network data from several network locations in order to
detect distributed attacks.
 Visualization and querying tools for viewing any anomalous patterns detected.

An empirical study by Farooqi et al. [28] describes the necessity of applying DM to historical
data to extract useful information that can be the basis of sound decisions. It also presents DM
techniques and their useful applications in the banking industry, such as marketing and retail
management, CRM, investment banking, portfolio management, risk management, and fraud
detection.

Investment Banking
DM techniques are highly supportive in selecting the best investment for a client's profile.
Neural networks and linear regression can be applied to predict stock prices; an investment
selected on the basis of a DM prediction can maximize the return on investment (ROI) [28].

Portfolio Management
With DM techniques, investors can allocate budgets appropriately across trading activities in
order to maximize profit with minimum risk [28].

2.5 Related works


Data mining has become predominant and has been applied in many kinds of businesses and
industries. One business that uses data mining techniques for competitive advantage or business
intelligence is the financial industry, notably banks. The innovation of new technologies and
software applications in banking continues. The emerging self-service systems, or ATMs, using
a plastic card are moving toward contactless operation and video conferencing capabilities [22].
On the other side, expanding services, more sophisticated devices, and the growing number of
customers choosing self-service channel options are resulting in an explosion of ATM "big
data". Electronic payment via ATM produces vast transactional data, classified as customer
engagement, payment, and cash management [23]. In this section, related works applying data
mining techniques to ATM transactional data are reviewed and presented along these categories.

Only recently have ATM transactional data been used in the literature to extract hidden
patterns. The goals of such analysis using data mining technology are to maximize profit,
increase return on investment, support business intelligence, satisfy customers in customer
relationship management, and stay alive as a forerunner in a fiercely competitive market.

The first reviewed article is the construction of adaptive automated teller machines by Shaikh
and Mahmood [11]. ATM transactional data were collected and preprocessed, and data mining
technology was applied to the construction of the ATM user interface. The data mining tool used
was ProM, and a quantitative research method was then employed to reinforce the findings. The
findings revealed that the adaptive automated teller machine was the preference of most ATM
customers.

Quality of ATM service, whenever and wherever, is desired by both customers and service
providers. This research and the related articles reviewed in this thesis are concerned with
improving ATM service, based on historical data, using data mining technology. Data mining
now attracts business and selected fields, and as a result research continues on ATM
transactional data to forecast future trends in the ATM business.

Different types of users holding ATM plastic cards access ATMs at any time, wherever the
ATMs are located, to obtain different services. Under normal conditions, the time taken for a
customer to access and use an ATM depends on a response time of about thirty seconds.
However, the type of customer, the time of access, and the location of the ATM can determine
the span of time. In very densely populated areas and at peak transaction hours, it is common to
see queues forming at ATMs.

The objective of the adaptive automated teller machine work was the development of adaptive
ATM interfaces to minimize ATM usage time for a population of customers, particularly at
heavy-traffic ATMs [11]. Two approaches were used to achieve this objective. The first, and the
principal one, was data mining technology. After preprocessing of the data, interesting
knowledge was extracted. The results identified withdrawal as the most frequent operation
performed by customers, followed by purchase and balance inquiry. Based on this finding, the
authors developed interfaces that minimize the time taken when an ATM is accessed.

The second approach to achieving the objective was a quantitative research method. The
adaptive ATM interfaces developed after the knowledge discovery process were evaluated using
an online survey (questionnaire). The evaluation proved that users preferred the adaptive ATM
interface or menu.

Indeed, data mining technology produced the hidden knowledge required to improve ATM
service in densely populated areas and at peak transaction hours. However, improving wait time
or queues at ATMs is not enough; identifying the specific reason an ATM is out of service is the
first step toward increasing and improving quality of service.

Madhavi et al. [13] conducted an ATM service analysis using data mining. ATM transactional
data were collected and preprocessed, and predictive data mining provided solutions for finding
at which times an ATM is used more frequently, identifying the peak time of an ATM in a day or
month, and spotting any ATM transaction. The data mining tool was Weka, and the findings
support decision makers in taking the necessary actions.

The objective of the research was to provide graphical visualization for easy identification of
ATM usage levels and to monitor ATM peak times. Predictive data mining technology was
applied to achieve this objective. After preprocessing of the data, the extracted knowledge
visualized the peak day of an ATM in a month and the peak hour of an ATM in a day; the types
of transactions that occur regularly were recognized; the locations of the ATMs providing
service to customers were identified along with their usage levels; and the ATM usage for every
location was calculated. Based on this business intelligence, bank decision makers can readily
determine the future direction of the business.

Predictive data mining improved ATM service and increased quality of service (QoS). One
aspect of QoS is replenishing money in a timely manner, thereby reducing out-of-stock situations
at the ATM. The other uncovered hidden knowledge concerned the peak and idle levels of each
ATM, allowing decision makers to take appropriate action.

The above studies opened the door for further analysis of ATM transaction datasets, since ATMs
today remain under-analyzed [13]. Both studies view and analyze ATM transaction datasets in
their own way. The research at hand likewise investigates, in a different dimension, the out-of-
service problem, identifying the reasons for ATMs being out of service through a designed model.

Gümüş et al. [12] attempted to analyze the ultimate point of the service provided by banks to
their customers so as to determine customer satisfaction in the common use of ATMs. The
researchers pinpoint the necessity of standard measurement in the banking industry for the
provision of quality service in the dynamics of ATM usage. For businesses like banks, customer
care and customer relationship management, meeting customer expectations, and determining
how such expectations can be met are key. For any business, in order to reach a higher position,
keep it for long, and remain competitive, it is necessary to find new customers or duly satisfy the
existing ones. According to Gümüş et al. [12], customer satisfaction has today become one of the
busiest segments of marketing; they therefore studied the expected and the perceived service
quality of common ATMs and thus determined the satisfaction level of ATM users.

The extent to which the service provided to common ATM users was perceived to meet
expectations was not known. To determine the satisfaction level of common ATM users, the
approach followed was a screening model. Screening based on a demographic characteristics
questionnaire was used to identify the difference between the expected and the perceived service
quality of common ATMs, which indicates the satisfaction level of ATM users.

With respect to demographic characteristics, the sample group was divided by gender, age,
educational status, marital status, income status, the number of bank/credit cards used, and the
most frequently used credit cards. The findings showed that the measurements were statistically
significant, and almost all perceived measurement scores were less than the expected ones.

The research found that the low satisfaction level of banks' ATM customers was due to
unsatisfactory quality of service. It recommended that banks measure their customer satisfaction
levels periodically through updated questionnaire screening operations. Further, banks are
recommended to improve their processes in order to provide services to customers on time and
in a fast, flawless, enthusiastic, and reliable manner. With respect to common ATM use, bank
employees should improve their skills, become equipped with the necessary know-how, and
foster a feeling of trust in their customers.

All the reviewed research motivates further studies on ATM quality of service. The research at
hand investigates the reasons for the ATM out-of-service problem, a dimension that has so far
been overlooked and untouched.

2.6 Related works of ATM Banking service


Charles Mwatsika [29], in "Customers' satisfaction with ATM banking in Malawi", studied
ATM service quality (SQ) and customer satisfaction. ATM SQ and customer satisfaction are
coupled: well-fitted SQ produces a high level of performance, and it is the performance of the
service that satisfies or dissatisfies customers. The researcher investigated whether the existing
ATM banking service satisfied customers or not.

The research methodology followed an importance-performance approach and referenced the
commonly used five-attribute model of tangibles, reliability, responsiveness, empathy, and
assurance to measure SQ and customer satisfaction.

To develop the measurement scales, 25 known valid ATM SQ attributes from different sources
were referenced, comprising tangibles (6 items), reliability (6 items), responsiveness (6 items),
assurance (3 items), and empathy (4 items). A Likert scale was applied to these 25 questionnaire
items, which were then administered by email to 353 ATM card users.

The finding was that, overall, 53% of the respondents were satisfied with ATM services, and
although all five referenced SQ attributes are significantly associated with customer satisfaction,
reliability correlates most strongly with ATM service satisfaction.

The study remarked that responsiveness and empathy SQ in Malawian banks should be raised to
standard for competitive advantage, because employee performance on ATM card applications,
reconciliation processes, and management functionality was perceived as not so good.

The demographic characteristics covered were limited. It is therefore recommended that future
ATM customer satisfaction research include demographic characteristics from a wider area, as
well as separate customer satisfaction research on mobile and internet banking.

Chan et al. [30], in "Modified automatic teller machine prototype for older adults: A case study
of a participative approach to inclusive design", studied the existing ATM interface and ATM
card users aged 60 or above who have physical or cognitive limitations, because the existing
human-ATM interface did not suit elderly people.

The research methodology followed a user-participative approach to arrive at a universal design
that can accommodate older adults. A total of 187 older adults participated in two stages: 91
were assigned to the existing human-ATM interface test group and 96 to the modified universal
human-ATM interface prototype group.

The results for participants with a lower education level indicated that the universal design
should reduce the amount of information to be recalled and processed, reduce the transaction
time, present information more slowly, simplify the presentation of information on screen, use a
single screen to complete one operation or transaction, and use graphics and visual effects.

The research noted the trade-off of the universal design: the modified ATM prototype's
reduction in functionality would inconvenience younger users. Future research is therefore
advised to work out a resolution that achieves optimum functionality. Another gap was the
participants: all respondents were gathered from a single elderly center, so future work is
recommended to draw respondents globally, or at least from a wide area and from people of
different status.

Moslehi et al. [31], in "Analyzing and Investigating the Use of Electronic Payment Tools in Iran
using Data Mining Techniques", investigated e-payment instruments: ATM and PIN pad
transactions. They examined, first, the factors that affect the number and amount of transactions;
second, the relationship between the period and the volume of transactions; and third, the
relationship between the bank and the province.

To address the problem, the CRISP-DM methodology was followed. The techniques were
clustering and classification. Data were collected from the Statistical Center of the Central Bank
of Iran, consisting of performance statistics of the electronic payment tool ATM from 1986 to
1994, with 49,425 records of 12 variables used for DM. The DM tool was Clementine 12. First,
the K-Means algorithm was implemented to determine the target field categories; then the CART
decision tree algorithm was used to explore the factors affecting the average amount of ATM
transactions and the average amount of PIN pad transactions.
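The two-stage idea (K-Means to derive a target field, then a decision tree over it) can be sketched in miniature. The sketch below runs a tiny one-dimensional two-cluster K-Means on invented transaction amounts; the resulting labels could then serve as the target field for a CART-style tree:

```python
def kmeans_1d(values, iters=20):
    """Tiny two-cluster 1-D k-means used to derive a target field."""
    centroids = [min(values), max(values)]   # simple initialization
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest centroid wins.
        labels = [0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
                  for v in values]
        # Update step: recompute each centroid as the mean of its members.
        for j in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return centroids, labels

# Invented average transaction amounts per terminal.
amounts = [120, 130, 110, 900, 950, 870]
centroids, labels = kmeans_1d(amounts)
# labels separate low-volume from high-volume terminals; the cluster label
# can then feed a decision tree such as CART as its target field.
```

Clementine's K-Means and CART nodes chain these same two stages, only over many variables at once.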

The finding from the CART decision tree algorithm was that, for ATMs in seven banks and
seven provinces in the years 1991 to 1994, transactions increased significantly more in summer,
autumn, and winter than in spring. Another rule was that an increase in ATM terminals was
directly proportional to PIN pad transactions. It is additionally noted that the research enhances
e-banking services and future e-banking policy.

Future work is recommended using telephone banking, internet banking, mobile banking, and
POS for a comprehensive review of bank performance in electronic payment.

Bastos et al. [18], in "A maintenance prediction system using data mining techniques", studied
the mitigation of industrial machine failure.

The methodology demonstrated a new conceptual framework: a decentralized predictive
machine maintenance system based on the application of data mining techniques to forecast the
possibility of breakdowns, which may increase system reliability, and to generate a set of
scheduled notifications for maintenance action.

The finding was that this system, using visualization techniques on an ANN back-propagation
algorithm, can predict failure occurrence with an accuracy level of 72%. With the addition of
monitoring data into the system, the authors suggest it may become possible to predict failure
occurrence within a proper time and to perform interventions on equipment before breakdowns,
reducing maintenance effort and cost.

The forwarded future work pointed out that predicting new failures requires more features,
particularly from the monitored equipment database, to understand the reason for all malfunction
occurrences.
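The back-propagation principle underlying such a predictor can be illustrated at its smallest scale: a single sigmoid neuron trained by gradient descent. All values below are invented for illustration, not taken from Bastos et al.:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Invented training data: (vibration_level, failed_within_week).
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]

w, b, lr = 0.0, 0.0, 1.0
for _ in range(2000):                      # plain stochastic gradient descent
    for x, y in data:
        p = sigmoid(w * x + b)             # forward pass
        grad = p - y                       # dLoss/dz for cross-entropy loss
        w -= lr * grad * x                 # backward pass (chain rule)
        b -= lr * grad

def predict(x):
    return sigmoid(w * x + b)
# After training, high vibration scores near 1 (failure likely), low near 0.
```

A multilayer network repeats this forward/backward pattern layer by layer, which is all that "back-propagation" adds to the single-neuron case.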

CHAPTER THREE
DATA PREPARATION

3.1 Understanding of the problem domain


To translate data into data mining goals, it is vital to understand the problem domain at this
initial step. Since the hybrid model has a research-oriented setup, learning is the key to
understanding the problem area.

The Dashen Bank external portal lists the following four main products and services [5]:
domestic banking, international banking, Dashen Card, and e-banking.

Domestic banking contains three products and services. The first is the current account, a
convenient medium for making and receiving payments; it allows money to be accessed with
ease using cheques or a VISA card. The second is loans, offered in different kinds such as
agriculture, manufacturing, import/export, trade and services, building and construction, and
transport. The third is money transfer, the remittance of funds from one branch to another.
Common local means of remitting are mail transfers, telegraphic or telephone transfers, local
drafts, and cashier payment orders (CPO); well-known international ones are Western Union and
MoneyGram.

International banking products and services mainly include foreign exchange permits and import
and export. The approval process for foreign exchange permits requires the presentation of
different sets of documents for each transaction. Import and export, which is related to foreign
exchange permits, enables the purchase of commodities, machinery, materials, etc. from abroad,
which are then allowed to enter Ethiopian territory.

Dashen Cards come in different varieties. The first is the debit card, which allows access to
ATM/POS at any time; operating multiple accounts with a single card; withdrawing up to 5,000
Birr per day per card, subject to the account balance; purchasing goods or services up to 8,000
Birr per day per card at any Dashen Bank merchant; checking the balances of all accounts linked
to the card; and obtaining a mini-statement listing the last ten transactions in any account linked
to the card. The others are the salary card, useful for organizational salary payments, and the
student card for students.

E-banking services offered by the bank include ATM/POS, Internet Banking, Mobile Banking,
and Agency banking [5].

The services available on Dashen Bank ATMs are cash withdrawal, balance inquiry, mini-
statement, fund transfer between accounts attached to a single card, PIN change, and PIN
unblock. POS allows the purchase of goods from Dashen merchants who have POS machines.

Internet banking is a service that enables the customer to access the bank through any web
browser, such as Mozilla Firefox or Chrome, from anywhere in the world. The major services
are account information; enquiries (mini-statement, full statement, daily exchange rate, loan
statement); fund transfer within Dashen Bank and to accounts at other local banks; salary and
provident fund upload; electronic bill (utility) payment; stop cheque payment; cheque book
order; password change; and other services.

Mobile banking services are fund transfer between one's own bank account and wallet account;
fund transfer by mobile phone for those registered for the mobile service; fund transfer to others
who have only a mobile phone number, for those unregistered for the mobile service; mini-
statement and checking of the account history of the wallet account; balance enquiry on the bank
account and wallet account; PIN change; fund transfer from bank account to bank account for
those registered for the mobile service; merchant payment; bill payment; and other services.

The major agency banking services are account information; e-wallet account opening
(registration); cash deposit to a wallet account; cash withdrawal from a wallet account; fund
transfer to others who have only a mobile phone number; bill payment; merchant payment;
facilitation of regular bank account opening; and other services.

The ATM data were studied on the ProView monitoring console [16], in collaboration with a
domain expert, to understand the problem domain more fully. In other words, at this step, in
order to properly understand the problem, we explored the problem domain and gained insight
focused particularly on the ATM data, considering the following issues: What are the variables?
Which variables cause an ATM to be out of service? What kind of data are they? How were the
data collected? Where are the data? Why were the data collected, and how are they translated to
achieve the DM goals using the Weka tool [1]?

For this reason, the researcher, in consultation with a domain expert, learned about Dashen Bank
S.C.'s ATM monitoring system, called ProView. ProView is an agent- and PC-based event
monitoring and incident management system for the monitoring and administration of ATMs
[8]. The agent on each ATM terminal, along with the self-service application and the self-service
hardware, sends all collected events to Wincor Nixdorf's ProView management system. For this
purpose, MS Windows 2013 was installed as the network operating system, with SQL Server
2008 installed as the relational database management system on that platform. The Wincor
Nixdorf ProView application that monitors and administers all ATMs was installed on
Windows 7.

On this application, the administrator can view, in real time, cash status, mechanical problems,
and other ATM statuses in order to take remedial action, and can generate different kinds of
reports, such as ATM uptime and downtime.

The main components of an ATM are the card reader, keypad (EPP), cash dispenser, display
screen, speaker, and receipt printer. Live communication between the ATMs and ProView
enables administrators to monitor the status of all Dashen ATMs. On the other side, the B24
remote host processor manages every customer withdrawal request and the other functions found
on the ATM display menu. Every event on the ATM interface is sent to the ProView monitoring
console, which is the interesting part of this research. The events provide the information
described in table 3.1 below.

Table 3.1 Event content and description

Event                    Description
time stamp               includes month, date, year, hour, minute, and seconds
event number             unique session identification number
event message            activities going on when the ATM interface is touched for
                         different purposes, based on the menu
server time stamp        a time like the event time stamp, but recorded when the
                         events arrive at the ProView server
original message         messages coming from ATM activities; like the event
                         message, but including the transaction amount
ATM device name and ID   the ID of the ATM, a description of its location, and its name
component                the ATM has two components: the upper computer and the
                         dispenser (cash cassettes)

Based on the events collected, the ATM components fall into hardware and software. The
hardware components of the ATM are the card reader, cash dispenser, keypad, and screen
buttons; the application and operating system are considered software. Supervisor activities and
the network are other independent determinant components of the ATM.

The card reader is the component of an ATM that reads the plastic card. The card reader captures
the account information stored on the magnetic stripe on the back of an ATM/debit or credit
card. The host processor uses this information to route the transaction to the cardholder's bank.
Whenever the card reader has a problem reading, it becomes impossible to process electronic
payment and the ATM machine goes down [32]. Some of the card reader values when it is
nonfunctional are read error, blank track, device disconnected, card jam, and no smart card
response [16].

Cash dispenser: the heart of an ATM, comprising the safe and the cash-dispensing mechanism.
The entire bottom portion of most ATMs is a safe that contains the cash. If this component of an
ATM or its subcomponents develops a problem, electronic payment becomes impossible and the
ATM machine goes down [32]. Some of the cash dispenser values when it is nonfunctional are
device not in use, device not accessible, bill cassette is empty, bill cassette is missing, reject
cassette removed, pick failure, presenter clamping mechanism failed, sensor failure, and
currency jam [16].

Keypad: this component of the ATM lets the cardholder tell the bank what kind of transaction is
required (cash withdrawal, balance inquiry, etc.) and for what amount. The bank also requires
the cardholder's personal identification number (PIN) for verification [32]. The keypad may also
fail and put the ATM out of service; if the keypad or EPP hardware fails, the ATM becomes
nonfunctional. The values of the keypad are invalid key and key does not exist [16].

Application: the software is what can make a transaction or cash withdrawal impossible. The
software found on an ATM is the Windows operating system and an application system; both
can affect electronic payment and make an ATM down or unavailable. Some of the values are
attempt to reboot via system, transaction failed: unable to process transaction, and unable to
dispense cash [16].

Network: this is the communication between the end-terminal ATM self-service machine and
the remote host server, the manager and controller of transactions, which connects to the bank's
core system and database. The values of this independent ATM component are communication
offline, TCP/IP address not accessible, and device offline [16].

Supervisor activities are also part of the ATM components. Whenever the supervisor performs
service or makes a change, the ATM will be out of service. The SOP values are terminal closed
for customers and changed to supervisor mode [16].

3.2 Understanding of the data


This research considers data from ATMs located in the city area, upcountry areas, hotels,
shopping centers, branches, and universities. The Dashen Bank ATMs are NCR and Diebold
machines. At present there are a total of 305, of which only 15% are Diebold. These ATMs are
found in Addis Ababa and upcountry; 66% of the ATMs are found in the capital city, Addis
Ababa [6].

In this stage, the data collection activity carried out in this research is discussed. ProView Data
Model 4.2/40 (C) Wincor Nixdorf International 2015, the log event database for monitoring
Dashen Bank's ATMs, organizes its data in many tables [24]. The researcher, in collaboration
with domain experts, selected the portion of those datasets that is of interest for this particular
research. The original data attributes available in the log database are presented in table 3.2.

Table 3.2 Event table from ProView Data Model 4.2/40 (C) Wincor Nixdorf International 2015

Name             Comment                                                          Data Type  Mandatory
deviceid         Device name                                                      string     X
timestamp        Time the event was created on the device ("YYYYMMDDhhmmss")      datetime   X
messageno        ProView Agent event number (see table EVENTCONVERSION)           numeric    X
orgmessage       Original device event text                                       string
servertimestamp  Time the event arrived at the ProView Server ("YYYYMMDDhhmmss")  datetime
devicestate      New device state                                                 numeric
eventno          ProView Server event number (see table EVENTBASE)                numeric
eventcount       Unique event counter                                             numeric    X
eventgroupid     Event group id                                                   numeric

(The primary key and foreign key columns of the data model carry no marks for these attributes.)

The most important are the event tables, which can be exported to MS Excel. The following are
some of the datasets from different tables that need refining to obtain a complete, interesting,
and non-redundant subset of the data.

Figure 3.1 Event table on Database Server

The following figure is a screenshot of ProView.

Figure 3.2 ProView Monitoring Consoles

The left side shows the hierarchy of all ATMs, and the right side shows the events of a single
ATM at the Capital Hotel.

Table 3.3 Event table on ProView Monitoring Console

Device name and location: PATMT_043 DILLA (all rows)

Event Number  Component                Event Timestamp     Event Message               Server Timestamp    Original Message
800100        Cash Dispenser           4/27/18 2:45:16 PM  Transaction: Withdrawal     4/27/18 2:45:56 PM  01:WITHDRAWAL ETB200.00
800200                                 4/27/18 2:45:16 PM  Transaction: Record Number  4/27/18 2:45:56 PM  01: RECORD NO. 2848
30017         Status Retract Cassette  4/27/18 2:45:16 PM  Cash presented              4/27/18 2:45:56 PM  01: [020t 14:45:10 NOTES PRESENTED 0,0,0,2
300506        Cash Dispenser           4/27/18 2:45:10 PM  The Money has been taken    4/27/18 2:45:50 PM  WFS_SRVE_CDM_ITEMSTAKEN (CurrencyDispenser1)
30007                                  4/27/18 2:44:56 PM  PIN entered                 4/27/18 2:45:35 PM  01: [020t 14:44:47 PIN ENTERED
300004        Card Reader              4/27/18 2:44:46 PM  Card inserted               4/27/18 2:45:28 PM  01: [020t CARD INSERTED

As shown in table 3.3, ProView displays the following attributes:

 Component
 Event time stamp
 Event message
 Event number
 Server time stamp
 Original message
 Device name and location

The number of attributes shown in table 3.3 from the ProView Monitoring Console is 7.
According to [3], data quality can be measured in terms of accuracy, completeness, and
consistency. To assess whether the Dashen Bank S.C. ATM data is accurate, complete, and
consistent, the following section shows what each attribute represents and which values it
consists of.

Table 3.4 Description of event table on ProView Monitoring Console

Attributes                Data type      Description
Device name and location  Characters     ATM name and location
Event Number              Numeric (INT)  Unique session ID
Component                 Characters     Components of the ATM
Event Timestamp           date/time      The specific time of the happening
Event Message             Characters     Activities, from start to end
Server Timestamp          date/time      Time stamp at the ProView server
Original Message          Characters     Activities, from start to end, including the amount of
                                         the transaction

The data collection was made from 14 ATMs over the period December 31, 2017 – May 04, 2018.
Table 3.5 below depicts every ATM together with the instances collected in MS Excel. The size
of each ATM's data varies because it depends on the transactions made. The combination of the
14 independent ATMs' data at this stage became one file of 15.9MB with a sum of 365,522
instances.

Table 3.5 Size of ATM instances and location


Number ATMs Location of ATMs Size of instance
1 Bole City Area ATM 27,500
2 Lagar City Area ATM 39,500
3 Bole premium City Area ATM 17,500
4 RasDesta City Area ATM 25,222
5 Adams Pavilion shopping center ATM 16,000
6 Edna Mall shopping center ATM 18,000
7 Filuha Hotel ATM 8,500
8 Hilton Hotel ATM 19,500
9 Shebell Hotel ATM 17,500
10 Sheraton Hotel ATM 23,500
11 Yoly Hotel ATM 16,000
12 Awasssa University ATM 66,000
13 Nekemete Up country ATM 27,000
14 Woliso Up country ATM 43,500
Sum of records = 365,222

The dataset attributes and values shown in figure 3.2, the ProView Monitoring Console, are
valid and complete except for 'component', which has 7% missing values.

The original message, device name and location, and event number attributes are considered
irrelevant because they are not useful for the problem at hand, the prediction of ATM out of
service, and these attributes are unused. It is too expensive to collect data from all of the ATMs.
Based on domain expert consultation, 14 ATMs were selected: those located in the Addis Ababa
city area (Bole, Lagar, Bole Premium, and RasDesta); shopping centers (Adams Pavilion and
Edna Mall); hotels (Filuha, Hilton, Shebell, Sheraton, and Yoly); Hawassa University; and the
upcountry area (Nekemete and Woliso). These ATMs were selected because of their high
transaction volume, and their target data collections are more complete than those of the other
ATMs.

The dataset at this stage, after attribute selection, became the input shown in table 3.6 below.

Table 3.6 Selected dataset attributes

Number  Attributes        Descriptions of attribute values
1       Event time stamp  Consists of date, month, year, hour, and minutes
2       Event message     Activities on the ATM interface indicating in service and out of service

The following section 3.3 details the data preparation.

3.3 Data Preparation


As noted by Cios et al. [1], this step concerns deciding which data will be used as input for the
DM methods in the subsequent step. The major tasks in data preparation are data cleaning, data
integration, data reduction, and data transformation. It also includes sampling the dataset,
derivation of new attributes (discretization), and summarization of data (data granularization).

The tasks applied in this study were data transformation and derivation of new attributes.

Deriving new attributes


The following table 3.7 shows the newly derived attributes. According to Han and Kamber [3],
during attribute construction new attributes are constructed from the given set of attributes and
added to help the mining process. The new derived attributes, which are components of the
ATM, are CardReader, EPP, DispenserDevices, DispenserCurrencyCassette, DispenserPickCash,
DispenserTransporter, SOP, Application, and NetworkConnectionB24. They are derived from
the 'Event message' attribute.

These attributes are subsets of the event message attribute and are derived following Cios et
al.'s [1] construction of new attributes.

Table 3.7 New derived attributes and descriptions.


Number Attributes Descriptions
1 CardReader Hardware device that read ATM card
2 EPP Hardware Encrypting PIN Pad
3 DispenserDevices Dispenser devices
4 DispenserCurrencyCassette Dispenser cash cassette
5 DispenserPickCash Dispenser pick cash
6 DispenserTransporter Dispenser cash transporter
7 SOP Supervisor of ProView
8 Application An application and operating system on ATM
9 NetworkConnectionB24 Communication between ATM and the manager base 24
server
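The derivation described above, splitting the single 'Event message' attribute into per-component attributes, can be sketched as a keyword lookup. The mapping below is illustrative (a small subset of the values listed in tables 3.7 and 3.9), not the full mapping used in the study:

```python
# Sketch: derive the component attribute from the 'Event message' text.
# The keyword-to-component mapping is illustrative; the real mapping
# would cover all 44 out-of-service event messages.
COMPONENT_KEYWORDS = {
    "CardReader": ["Card reader", "Blank track", "Card jam"],
    "EPP": ["Invalid key", "key does not exist"],
    "DispenserPickCash": ["Pick failure", "Purge bin", "bills rejected"],
    "DispenserCurrencyCassette": ["Bill cassette", "Retracted cards/bills"],
    "NetworkConnectionB24": ["Communication Offline", "TCP/IP address"],
    "SOP": ["supervisor mode", "closed for customers"],
    "Application": ["Attempt to reboot", "Transaction failed"],
}

def derive_component(event_message):
    """Return the derived component attribute for one event message,
    or None for an in-service event that the study discards."""
    for component, keywords in COMPONENT_KEYWORDS.items():
        if any(kw.lower() in event_message.lower() for kw in keywords):
            return component
    return None

print(derive_component("Dispenser: Pick failure - out of bills"))
# -> DispenserPickCash
```

In practice the same lookup, implemented here in MS Excel and VB, fills one nominal column per component.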

The following is a snapshot of the event message frequencies in the weka preprocess tab.

Figure 3.3 Event message count

As shown in figure 3.3, there were a total of 135 distinct event message values within the
365,522 instances. After removing all the in-service values, which are not the concern of the
current study, the out-of-service event values were organized to prepare the dataset for
experimentation. The selected out-of-service event messages are shown in figure 3.4.

Figure 3.4 Event message out of service

Data transformation
According to Han and Kamber [3], in discretization and concept hierarchy generation, raw data
values for attributes are replaced by ranges or higher conceptual levels. After aggregation, the
time stamp attribute took three distinct class values, as shown in table 3.8 below.

Table 3.8 Data transformation of Time stamp

Attribute    Time stamp values  Aggregated values
Time stamp   Jan 1-31, 2018     January
             March 1-31, 2018   March
             April 1-30, 2018   April
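The aggregation in table 3.8 can be sketched as a small conversion function. The input format assumed here is the "YYYYMMDDhhmmss" string that ProView stores for each event (table 3.2):

```python
from datetime import datetime

def month_class(timestamp):
    """Aggregate a raw ProView event timestamp ("YYYYMMDDhhmmss")
    into one of the three class values January, March, or April."""
    month = datetime.strptime(timestamp, "%Y%m%d%H%M%S").month
    # Only the three months present in the prepared dataset are mapped;
    # anything else returns None.
    return {1: "January", 3: "March", 4: "April"}.get(month)

print(month_class("20180427144510"))  # -> April
```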

The following table 3.9 shows the attributes and their values used for this research, together
with their data types.

Table 3.9 List of attributes used for model building

No.  Attribute name             Data Type  Attribute values
1    CardReader                 Nominal    Card reader read error; blank track; no smart card
                                           response; card reader disconnected; shutter jammed
                                           open; device still inoperative; card jam during
                                           capture; card jam
2    EPP                        Nominal    PIN/EPP: invalid key; key does not exist
3    DispenserDevices           Nominal    Device not in use; disconnected; not accessible;
                                           offline
4    DispenserCurrencyCassette  Nominal    Retracted cards/bills; Device CashOut Module
                                           offline; Bill cassette Slot 4 is inoperative; Reject
                                           cassette removed; Bill cassette Slot 4 is missing;
                                           Bill cassette Slot 4 is empty; Bill cassette 4 is
                                           empty
5    DispenserPickCash          Nominal    Pick failure; Pick failure - out of bills; Presenter
                                           clamping mechanism failed or jammed; Purge bin not
                                           present; Too many bills rejected
6    DispenserTransporter       Nominal    Currency jam in presenter transport or transport
                                           sensor failure; Sensor failure or currency jam in
                                           main transport
7    SOP                        Nominal    Operator switch changed to supervisor mode; Terminal
                                           is closed for customers
8    Application                Nominal    Attempt to reboot via special electronic; Attempt to
                                           reboot via system; Transaction failed; Transaction
                                           failed: Unable to dispense cash; Transaction failed:
                                           Unable to Process Transaction
9    NetworkConnectionB24       Nominal    Communication Offline; Device offline; TCP/IP
                                           address not accessible
10   Eventime_Timestamp_2018    Nominal    January, March, April

After preprocessing, the combined 365,522 instances became a quality dataset of 20,880 out-of-
service events, which was the input for the data mining tool weka. MS Excel and VB were used
for data cleaning. This quality dataset file was converted to comma separated value (CSV) and
attribute relation file format (ARFF) to make it suitable for weka.
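The CSV-to-ARFF conversion step could be scripted as follows. This is a minimal sketch that treats every attribute as nominal and assumes no value needs ARFF quoting (values containing commas or spaces would); the relation name and file paths are hypothetical:

```python
import csv

def csv_to_arff(csv_path, arff_path, relation="atm_out_of_service"):
    """Convert a cleaned CSV dataset to weka's ARFF format, declaring
    each attribute as nominal with the value set found in the data."""
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))
    header, data = rows[0], rows[1:]
    with open(arff_path, "w") as out:
        out.write(f"@relation {relation}\n\n")
        for i, name in enumerate(header):
            values = sorted({r[i] for r in data})  # observed nominal values
            out.write("@attribute %s {%s}\n" % (name, ",".join(values)))
        out.write("\n@data\n")
        for r in data:
            out.write(",".join(r) + "\n")
```

weka can also do this conversion itself via its CSVLoader; the sketch only illustrates the file layout.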

As Cios et al. [1] note, a classifier is a model of data used for classification: given a new input,
it assigns that input to one of the classes it was designed/trained to recognize. The dependent
class attribute was the event time stamp, which classifies instances into three classes.

Figure 3.5 The class event time stamp 2018 and its values

CHAPTER FOUR
Modeling
4.1 Building the model
Here the data miner uses various DM methods to derive knowledge from the preprocessed data
[1]. In this study, classification algorithms are used to derive patterns describing the reasons for
ATM out of service. The data mining tool used was weka, which contains a variety of machine
learning algorithms; the selected predictive algorithms were PART, J48, Naïve Bayes, and
multilayer perceptron, run on the weka version 3.8.2 knowledge analyzer.

For the test design, the default weka test option of 10-fold cross-validation was used for all
experiments. The dataset is partitioned randomly into 10 folds; nine folds are combined and used
to train and produce the model, while the single remaining fold is used as the test set to evaluate
it. This training and testing is repeated 10 times, once per fold. Cross-validation in weka is one
of the test design options for evaluating a predictive model.
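The 10-fold procedure described above can be sketched as follows; `train_and_evaluate` is a placeholder standing in for any of the classifiers used in this study:

```python
import random

def cross_validate(instances, train_and_evaluate, k=10, seed=1):
    """Sketch of k-fold cross-validation: shuffle the dataset, split it
    into k folds, then train on k-1 folds and test on the held-out fold,
    k times. `train_and_evaluate(train, test)` must build a classifier
    on `train` and return its accuracy on `test`."""
    data = list(instances)
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        scores.append(train_and_evaluate(train, test))
    return sum(scores) / k  # mean accuracy over the k runs
```

weka performs exactly this partition-train-test loop internally when the cross-validation test option is selected.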

The following sections present the experimental results achieved with each algorithm when the
models were built.

4.2 Experimental result using J48 decision tree


In this experiment the binary split and unpruned parameters of the J48 decision tree were
calibrated by changing their default values to true or false, while the rest of the parameters were
kept at their default values. A summary of the experimental results is presented in table 4.1.

Table 4.1 Experimental result of J48 decision tree algorithm

Experiment  Unpruned  Binary Split  Correctly Classified  Incorrectly Classified  Time to build model (seconds)
1           True      True          52.4808%              47.5192%                0.23
2           True      False         52.4808%              47.5192%                0.07
3           False     True          52.7059%              47.2941%                1.46
4           False     False         52.3707%              47.6293%                0.29

The J48 accuracy results shown in table 4.1, under cross-validation test mode with unpruned set
to false and binary split set to true, reach 52.7059%, greater than all the other results. That
means the J48 algorithm correctly classified about 52.7059% of instances and 47.2941%
incorrectly.

The following table 4.2 shows the confusion matrix for the pruned model with binary split.
Table 4.2 Confusion matrix for the result of J48 decision tree
a b c <===Classified as
1208 3386 157 a= January
668 9563 134 b = April
795 4735 234 c = March

The above confusion matrix shows that out of 20,880 instances, 11,005 (the sum of the diagonal)
were classified correctly (52.7059%) and 9,875 were classified incorrectly (47.2941%).
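The figures quoted from a confusion matrix can be rechecked mechanically: accuracy is the diagonal sum divided by the total instance count. A quick check against the values transcribed from table 4.2:

```python
def accuracy_from_confusion(matrix):
    """Return (correct, total, accuracy %) for a square confusion
    matrix: correct is the sum of the diagonal, i.e. the instances
    whose predicted class equals their actual class."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct, total, 100.0 * correct / total

# J48 confusion matrix from table 4.2 (rows/columns: January, April, March).
j48 = [[1208, 3386, 157],
       [668, 9563, 134],
       [795, 4735, 234]]
print(accuracy_from_confusion(j48))  # -> (11005, 20880, 52.7059...)
```

The same function applies unchanged to the PART, Naïve Bayes, and multilayer perceptron matrices in the following sections.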

4.3 Experimental result using PART
In this experiment the binary split and unpruned parameters of PART rule induction were
calibrated by changing their default values to true or false, while the rest of the parameters were
kept at their default values. A summary of the experimental results is depicted in table 4.3
below.

Table 4.3 Experimental result of PART algorithm

Experiment  Unpruned  Binary Split  Correctly Classified  Incorrectly Classified  Time to build model (seconds)
1           True      True          52.4761%              47.5239%                0.98
2           True      False         52.4808%              47.5192%                0.32
3           False     True          52.591%               47.409%                 0.3
4           False     False         52.3515%              47.6485%                0.28

The PART accuracy results are shown in table 4.3 under cross-validation test mode. The
accuracy of 52.591% is greater than all the others; that means the PART algorithm correctly
classified about 52.591% of instances and 47.409% incorrectly.

The following table shows the confusion matrix for the pruned model with binary split.
Table 4.4 Confusion matrix for the result of PART rule induction
a b c <====Classified as
1340 3230 181 a= January
791 9283 291 b = April
897 4509 358 c = March

The above confusion matrix shows that out of 20,880 instances, 10,981 (the sum of the diagonal)
were classified correctly (52.591%) and 9,899 were classified incorrectly (47.409%).

4.4 Experimental results of Naïve Bayes
In the case of the Naïve Bayes algorithm, the parameters calibrated away from their default
values were the kernel estimator and supervised discretization; the rest of the parameter values
were kept at their defaults. The experiment was done by changing each value to true or false,
although both parameters cannot be true at the same time. The results were all similar, as shown
in table 4.5.

Table 4.5 The result of Naïve Bayes algorithm accuracy

Experiment  Use Kernel Estimator  Use Supervised Discretization  Correctly Classified  Incorrectly Classified  Time to build model (seconds)
1           True/False            False/True (or both False)     52.6868%              47.3132%                0.05

The Naïve Bayes accuracy results shown in table 4.5 under cross-validation test mode are all
similar, whether the default values are kept or the Use Kernel Estimator and Use Supervised
Discretization parameters are calibrated. The accuracy is 52.6868%; that means the Naïve Bayes
algorithm correctly classified about 52.6868% of instances and 47.3132% incorrectly.

The following table 4.6 shows the confusion matrix of Naïve Bayes, which is similar to those of
J48 and PART.
Table 4.6 Confusion matrix result of Naïve Bayes
a b c <===Classified as
1320 3218 213 a= January
775 9309 281 b = April
874 4518 372 c = March

The above confusion matrix shows that out of 20,880 instances, 11,001 (the sum of the diagonal)
were classified correctly (52.6868%) and 9,879 were classified incorrectly (47.3132%). It is
clear that the J48, PART, and Naïve Bayes confusion matrices are similar.

4.5 Experimental result of multilayer perceptron


The last experiment was done using the neural network multilayer perceptron. The calibrated
parameters were the hidden layers and the learning rate, with 10-fold cross-validation. The
hidden layer count is one of the parameters of the multilayer perceptron. The default value of
this parameter is 'a', which stands for (number of attributes + classes)/2, in this case
(9+3)/2 = 6. This default value was used and then changed to observe the ANN classification
accuracy. The learning rate is another multilayer perceptron parameter, defined as the amount
by which the weights are updated; its default value is 0.3. The results of the neural network
model with default and calibrated parameter values are shown in table 4.7.
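The default 'a' hidden-layer setting can be reproduced with one line; for this dataset's 9 predictor attributes and 3 class values it gives 6 hidden units:

```python
def default_hidden_units(n_attributes, n_classes):
    """weka's 'a' setting for the MultilayerPerceptron hidden layer:
    (number of attributes + number of classes) / 2, integer division."""
    return (n_attributes + n_classes) // 2

print(default_hidden_units(9, 3))  # -> 6
```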

Table 4.7 Experimental result of multilayer perceptron

Experiment                                   Correctly Classified  Incorrectly Classified  Time to build model (seconds)
1) Default parameters                        51.7912%              48.2088%                494.86
2) Hidden layers 6 ('a'), learning rate 0.4  51.8199%              48.1801%                490.93
3) Hidden layers 6, learning rate 0.5        52.2701%              47.7299%                495.64
4) Hidden layers 6, learning rate 0.6        51.0967%              48.9033%                494.49
5) Hidden layers 7, learning rate 0.6        51.2644%              48.7356%                157.05

The multilayer perceptron accuracy results are shown in table 4.7 under cross-validation test
mode. The accuracy of 52.2701% is greater than all the others; that means the multilayer
perceptron correctly classified about 52.2701% of instances and 47.7299% incorrectly. Notice
that the maximum time taken to build a model was 495.64 seconds (hidden layers 6, learning
rate 0.5) and the minimum was 157.05 seconds (hidden layers 7, learning rate 0.6).

The confusion matrix of the multilayer perceptron is similar to those of all the algorithms above.
Table 4.8 Confusion matrix result of multilayer perceptron
a b c Classified as
1093 3378 280 a= January
630 9388 347 b = April
724 4607 433 c = March

The above confusion matrix shows that out of 20,880 instances, 10,914 (the sum of the diagonal)
were classified correctly (52.2701%) and 9,966 were classified incorrectly (47.7299%). The J48,
PART, Naïve Bayes, and multilayer perceptron confusion matrices are comparable, but J48's
performance is the best.

4.6 Evaluation
Evaluation includes understanding the results, checking whether the discovered knowledge is
novel and interesting, interpretation of the results by domain experts, and checking the impact of
the discovered knowledge [1]. In this research, classification models were developed to
determine the reasons for the Dashen Bank S.C. ATMs being out of service. The J48 decision
tree exhibited model performance that matched the business objective, so it is used to extend the
discovered knowledge according to the hybrid KDP model.

The following table 4.9 is the summarized classifier output. Looking closely at each classifier's
output, each has its own unique presentation. Additionally, the weka preprocess tab has a feature
to visualize and count instances that can be easily observed.

Table 4.9 Summarized classifier output

                                  Tree: J48  Rule: PART  Bayes: Naïve Bayes  Function: Multilayer Perceptron
Correctly Classified Instances    52.7059%   52.591%     52.6868%            52.2701%
Incorrectly Classified Instances  47.2941%   47.409%     47.3132%            47.7299%
Time to build model (seconds)     1.46       0.3         0.05                495.64

Additionally, weka has features to visualize data mining metrics that are easy to understand. The
other selected metrics that validate the experiments, besides accuracy, were precision, recall,
and ROC. The following figures are snapshots taken from weka. The first, figure 4.1, is the
precision-recall curve.

Figure 4.1 precision-recall curve of class April

Precision-recall curves, like accuracy, are important for visualizing classifier performance. The
aim is to observe whether the precision-recall curve tends towards the upper right corner of the
chart. The P-R curve is displayed with respect to the class value April; the x-axis is recall (true
positive rate) and the y-axis is precision.
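The per-class precision and recall that such a curve summarizes can also be read directly off a confusion matrix. A sketch for the class April (index 1) using the J48 matrix transcribed from table 4.2:

```python
def precision_recall(matrix, class_index):
    """One-vs-rest precision and recall for one class of a confusion
    matrix (rows = actual class, columns = predicted class)."""
    tp = matrix[class_index][class_index]
    predicted = sum(row[class_index] for row in matrix)  # column sum
    actual = sum(matrix[class_index])                    # row sum
    return tp / predicted, tp / actual

# J48 confusion matrix from table 4.2 (January, April, March).
j48 = [[1208, 3386, 157],
       [668, 9563, 134],
       [795, 4735, 234]]
p, r = precision_recall(j48, 1)  # class April
```

For this matrix the recall for April is high while the precision is much lower, because many January and March instances are also predicted as April.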

The second, figure 4.2, is the ROC curve.

Figure 4.2 ROC curve of class value April

Like accuracy and P-R, the other important measurement of model performance is ROC. The
ROC curve displayed in figure 4.2 is with respect to the class value April. The aim when
producing a ROC curve is to have the curve close to the upper left corner, where the y-axis
value is one. The other interpretation is the area under the ROC curve (AUROC), as shown in
figure 4.2. The AUROC is a comparison of two operating characteristics (TPR on the y-axis and
FPR on the x-axis), and the measured AUROC describes good model performance when it is
greater than 0.5 and less than 1.
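Given the operating points of a ROC curve, the AUROC can be approximated with the trapezoidal rule. The points below are illustrative, not taken from figure 4.2:

```python
def auc(fpr, tpr):
    """Area under a ROC curve by the trapezoidal rule. `fpr` and `tpr`
    are the curve's operating points, sorted by increasing FPR."""
    area = 0.0
    for i in range(1, len(fpr)):
        # width of the segment times the average height of its two ends
        area += (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
    return area

print(auc([0.0, 0.2, 0.5, 1.0], [0.0, 0.5, 0.8, 1.0]))
```

A random classifier traces the diagonal and scores 0.5; a perfect one hugs the upper left corner and scores 1.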

In this research, based on the evaluation metrics discussed above, the J48 decision tree classifier
performed better, so it was selected for the use of the discovered knowledge. The following
chapter shows the implementation of the J48 model to produce a prototype.

The following is the J48 pruned tree:


------------------
DispensoerDevices = Device disconnected: January (86.0)
DispensoerDevices != Device disconnected
| CardReader = Card Reader disconnected: January (34.0)
| CardReader != Card Reader disconnected
| | DispensoerDevices = Device offline: January (13.0)
| | DispensoerDevices != Device offline
| | | Network ConnectionB24 = Device offline: January (13.0)
| | | Network ConnectionB24 != Device offline
| | | | DispenserPickCash = Dispenser: Presenter clamping mechanism failed or jammed:
March (66.0/10.0)
| | | | DispenserPickCash != Dispenser: Presenter clamping mechanism failed or jammed
| | | | | DispenserPickCash = Dispenser: Too many bills rejected: January (36.0/12.0)
| | | | | DispenserPickCash != Dispenser: Too many bills rejected
| | | | | | DispenserCurrencyCassete = Bill cassette Slot 4 is empty: January (54.0/23.0)
| | | | | | DispenserCurrencyCassete != Bill cassette Slot 4 is empty
| | | | | | | DispenserCurrencyCassete = Retracted cards/bills: January (822.0/479.0)
| | | | | | | DispenserCurrencyCassete != Retracted cards/bills
| | | | | | | | Application = Transaction failed: Unable to dispense cash: March
(209.0/119.0)
| | | | | | | | Application != Transaction failed: Unable to dispense cash
| | | | | | | | | Application = Attempt to reboot via system: January (294.0/179.0)
| | | | | | | | | Application != Attempt to reboot via system

| | | | | | | | | | Application = Attempt to reboot via special electronic: January
(294.0/179.0)
| | | | | | | | | | Application != Attempt to reboot via special electronic
| | | | | | | | | | | DispenserCurrencyCassete = Bill cassette 4 is empty: January
(64.0/38.0)
| | | | | | | | | | | DispenserCurrencyCassete != Bill cassette 4 is empty
| | | | | | | | | | | | SOP = Operator switch changed to supervisor mode: January
(406.0/238.0)
| | | | | | | | | | | | SOP != Operator switch changed to supervisor mode
| | | | | | | | | | | | | DispenserCurrencyCassete = Device CashOut Module offline:
January (236.0/139.0)
| | | | | | | | | | | | | DispenserCurrencyCassete != Device CashOut Module offline
| | | | | | | | | | | | | | CardReader = Card reader: Shutter jammed open: March (3.0)
| | | | | | | | | | | | | | CardReader != Card reader: Shutter jammed open
| | | | | | | | | | | | | | | CardReader = Card reader: Blank track: January
(228.0/108.0)
| | | | | | | | | | | | | | | CardReader != Card reader: Blank track
| | | | | | | | | | | | | | | | DispenserPickCash = Dispenser: Purge bin not present:
March (8.0/2.0)
| | | | | | | | | | | | | | | | DispenserPickCash != Dispenser: Purge bin not present
| | | | | | | | | | | | | | | | | DispenserPickCash = Dispenser: Pick failure - out of
bills: March (246.0/153.0)
| | | | | | | | | | | | | | | | | DispenserPickCash != Dispenser: Pick failure - out of
bills: April (17768.0/8163.0)
Number of Leaves : 19
Size of the tree : 37
Time taken to build model: 1.3 seconds

CHAPTER FIVE
Research model, Use of the Discovered Knowledge, Implementation and
Evaluation
5.1 Research Model
The research was conducted through experimentation. According to The World Book
Encyclopedia [15], experimentation is a method used to discover facts and to test ideas. In this
study, to identify the reasons for ATM out of service, the four classifier algorithms J48, PART,
NB, and MLP, under the default cross-validation test design, produced models after sixteen
experiments. The selected J48 model was used for the extended use of the discovered
knowledge. Thus the discovered knowledge revealed the desired reasons for ATM out of service
in this thesis, which are the following components of ATMs and their values.
DispensoerDevices: 1) Device disconnected 2) Device offline
DispenserCurrencyCassete: 1) Bill cassette Slot 4 is empty 2) Retracted cards/bills 3) Bill
cassette 4 is empty 4) Device CashOut Module offline
DispenserPickCash: 1) Presenter clamping mechanism failed or jammed 2) Too many bills
rejected 3) Purge bin not present 4) Pick failure - out of bills
CardReader: 1) Card Reader disconnected 2) Shutter jammed open 3) Blank track
Application: 1) Transaction failed: Unable to dispense cash 2) Attempt to reboot via system
3) Attempt to reboot via special electronic
SOP: 1) Operator switch changed to supervisor mode
Network ConnectionB24: 1) Device offline
These reasons were hidden in the timestamps of the ATM events.
The following figure 5.1 is the model of this research.

Figure 5.1 Research model for identifying the reason for ATM out of service

5.2 The System Development


For the implementation of the graphical user interface that enables the use of the discovered
knowledge to identify the reason for ATM out of service, the desktop application development
tool NetBeans IDE 8.2 was used. NetBeans is an open-source integrated development
environment (IDE) for developing with Java, PHP, C++, and other programming languages.
NetBeans is also referred to as a platform of modular components used for developing Java
desktop applications [26].

5.3 The Prototype


The prototype is designed based on the hidden rules identified by the classification algorithm,
the J48 decision tree. The prototype attempts to identify the reasons for ATM out of service
based on the given J48 classifier output.

Figure 5.2 shows a snapshot of the user interface, which displays one of the class labels, the
time stamp January, together with its frequency and the reasons for the ATMs being out of
service.

Figure 5.2 ATM out of service reason identifier system: the prototype

The input for the ATM out of service reason identifier system came from the weka J48 classifier
output, which is the designed model. In other words, the model was the input for the prototype.
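The core of such a prototype is a lookup from a class label to the reasons the J48 model associates with it. The sketch below is illustrative, with abridged reason lists taken from some of the J48 leaves in chapter four; the actual prototype, built in Java on NetBeans, covers every leaf of the tree:

```python
# Abridged, illustrative mapping from class label (month) to the
# out-of-service reasons read off the J48 leaves in chapter four.
REASONS_BY_MONTH = {
    "January": ["Device disconnected", "Card Reader disconnected",
                "Device offline", "Retracted cards/bills"],
    "March": ["Presenter clamping mechanism failed or jammed",
              "Purge bin not present", "Pick failure - out of bills"],
    "April": ["all remaining out-of-service events (default leaf)"],
}

def reasons_for(month):
    """Return the out-of-service reasons the J48 model associates
    with the given class label, or an empty list if none."""
    return REASONS_BY_MONTH.get(month, [])
```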

5.4 Evaluation
To conduct the usability test, seven ATM system administrators were involved in evaluating the
prototype built in NetBeans IDE 8.2 from the points of view of effectiveness, efficiency, and
satisfaction, according to the ISO 9241-11 usability testing features [25]. Of the seven system
administrators, five are seniors in the area. The system administration section is the one
responsible for following up any incidents on the ATMs. Before conducting the evaluation, a
description of the prototype was given to these evaluators.

A guiding questionnaire with two parts was prepared (see APPENDIX A) to facilitate the
experts' responses. The first part of the questionnaire was supportive in identifying their
experience with ATM out of service reasons. The second part used a five-level Likert scale
(strongly agree (5), agree (4), neutral (3), disagree (2), and strongly disagree (1)) for the replies
to the usability test questions.

Table 5.1 below summarizes the responses of the system administrators.

Table 5.1: Detailed summary of questionnaire result

Question No.  Strongly Agree  Agree  Neutral  Disagree  Strongly Disagree  Mean Value
1             7               0      0        0         0                  5
2             5               2      0        0         0                  4.7
3             7               0      0        0         0                  5
4             3               4      0        0         0                  4.4
5             7               0      0        0         0                  5
6             7               0      0        0         0                  5
7             7               0      0        0         0                  5
8             7               0      0        0         0                  5
9             7               0      0        0         0                  5
10            7               0      0        0         0                  5
11            7               0      0        0         0                  5

Overall system usability (%): 98%
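The per-question means and the overall usability percentage in table 5.1 can be recomputed from the raw response counts:

```python
def likert_summary(counts_per_question, scale=(5, 4, 3, 2, 1)):
    """Mean score per question and overall usability percentage from
    Likert response counts ordered strongly agree .. strongly disagree.
    The overall figure is the mean of the question means, expressed as
    a percentage of the maximum score."""
    means = []
    for counts in counts_per_question:
        n = sum(counts)
        means.append(sum(s * c for s, c in zip(scale, counts)) / n)
    overall = sum(means) / len(means) / scale[0] * 100
    return means, overall

# Response counts from table 5.1 (7 evaluators, 11 questions).
counts = [(7, 0, 0, 0, 0), (5, 2, 0, 0, 0), (7, 0, 0, 0, 0),
          (3, 4, 0, 0, 0)] + [(7, 0, 0, 0, 0)] * 7
means, overall = likert_summary(counts)
```

Running this reproduces the table: question 2 averages about 4.7, question 4 about 4.4, and the overall usability rounds to 98%.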

5.5 Analysis and interpretation


The part-two questions of the guiding questionnaire, based on the ISO 9241-11 usability features [25], are classified into three groups. The first, effectiveness, has one question out of eleven; the second, efficiency, has three; and the third, user satisfaction, has seven. From the analysis of the questionnaire it is concluded that, with an overall usability score of 98%, the respondents found the system effective, efficient and satisfactory, and accepted it as a prototype.
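The overall usability figure can be reproduced from the raw response counts in Table 5.1. A minimal Java sketch follows, under the assumption (implied but not stated explicitly in the text) that the overall score is the mean of the per-question Likert means, expressed as a percentage of the maximum score of 5; the class and method names are illustrative only:

```java
public class UsabilityScore {
    // Likert weights: strongly agree=5, agree=4, neutral=3, disagree=2, strongly disagree=1
    static double likertMean(int[] counts) {
        int[] weights = {5, 4, 3, 2, 1};
        int sum = 0, n = 0;
        for (int i = 0; i < counts.length; i++) {
            sum += weights[i] * counts[i];
            n += counts[i];
        }
        return (double) sum / n;
    }

    public static void main(String[] args) {
        // Response counts per question from Table 5.1:
        // {strongly agree, agree, neutral, disagree, strongly disagree}
        int[][] responses = {
            {7, 0, 0, 0, 0}, {5, 2, 0, 0, 0}, {7, 0, 0, 0, 0}, {3, 4, 0, 0, 0},
            {7, 0, 0, 0, 0}, {7, 0, 0, 0, 0}, {7, 0, 0, 0, 0}, {7, 0, 0, 0, 0},
            {7, 0, 0, 0, 0}, {7, 0, 0, 0, 0}, {7, 0, 0, 0, 0}
        };
        double totalOfMeans = 0;
        for (int[] q : responses) totalOfMeans += likertMean(q);
        // Overall usability: mean of per-question means as a fraction of the maximum (5)
        double overall = 100.0 * totalOfMeans / (responses.length * 5);
        System.out.printf("Overall system usability: %.0f%%%n", overall); // prints 98%
    }
}
```

Under this assumed formula, the computed value rounds to the 98% reported in Table 5.1.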

CHAPTER SIX
Conclusions and Recommendation
6.1 Conclusions
ATM service is part of our day-to-day life. Selected areas such as hotels, shops, bank branches, universities and hospitals host ATMs. These ATMs are intended to provide service 24 hours a day, seven days a week. However, customers commonly observe an out of service message on the displays of these self-service machines. This study set out to find the reasons for ATMs being out of service.

The researcher considered Dashen Bank S.C. ATMs to answer why ATMs are out of service. The approach followed was data mining, applying the six steps of the hybrid knowledge discovery process (KDP) model. One of the crucial steps in KDP is data mining (DM), which enables designing a model; the researcher therefore used the Weka knowledge discovery tool to identify the causes of ATM out of service. The DM objective was accomplished through 16 experiments, followed by evaluation using appropriate metrics (accuracy, the PR curve, the ROC curve and the area under the ROC curve) to prove that the experiments were worthy.

Firstly, the researcher, together with a domain expert, studied the event data in detail to understand it and prepare a dataset for the data mining task. To identify out of service event data, 14 ATMs were purposely selected within the period December 31, 2017 – May 04, 2018. From the dataset collected from the ProView console, 44 events describing out of service occurrences were identified in line with the objective of this research, of which 73% were hardware problems and 27% software. The next step was preparing a dataset suitable for DM. Preparation of the dataset included construction of new attributes from the existing dataset, removal of unnecessary attributes and inconsistent values, and aggregation of attribute values.
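As an illustration of the attribute-construction step, a month label (used later to classify out of service events by January, March and April) can be derived from a raw event timestamp. This is a sketch only; the timestamp pattern shown is an assumption, not the actual ProView format:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.TextStyle;
import java.util.Locale;

public class MonthAttribute {
    // Derive a "month" attribute from a raw event timestamp string.
    // The "yyyy-MM-dd HH:mm:ss" pattern is an assumed export format.
    static String monthOf(String timestamp) {
        DateTimeFormatter f = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        LocalDateTime t = LocalDateTime.parse(timestamp, f);
        return t.getMonth().getDisplayName(TextStyle.FULL, Locale.ENGLISH);
    }

    public static void main(String[] args) {
        // An event logged in April gets the month label "April"
        System.out.println(monthOf("2018-04-12 09:30:00")); // April
    }
}
```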

After the completion of DM, it was found that the basic reasons for ATM out of service were the non-functionality of ATM components, namely:

DispenserDevices: 1) Device disconnected 2) Device offline
DispenserCurrencyCassette: 1) Bill cassette slot 4 is empty 2) Retracted cards/bills 3) Bill cassette 4 is empty 4) Device CashOut module offline
DispenserPickCash: 1) Presenter clamping mechanism failed or jammed 2) Too many bills rejected 3) Purge bin not present 4) Pick failure - out of bills
CardReader: 1) Card reader disconnected 2) Shutter jammed open 3) Blank track
Application: 1) Transaction failed: unable to dispense cash 2) Attempt to reboot via system 3) Attempt to reboot via special electronic
SOP: 1) Operator switch changed to supervisor mode
NetworkConnectionB24: 1) Device offline

These reasons were hidden in the timestamps of the ATM events. The finding shows that 87% of ATM out of service events occurred in April, 10.7% in January, and 2.2% in March. The major problem (87%) was the hardware event DispenserPickCash: Pick failure - out of bills. The remaining events belonged to CardReader, Application, SOP and NetworkConnectionB24.
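The component-to-reason breakdown above can be represented as a simple lookup structure, similar in spirit to what the prototype classifies. This is an illustrative sketch, not the actual prototype code; the class and method names are hypothetical:

```java
import java.util.List;
import java.util.Map;

public class OutOfServiceReasons {
    // Out of service reasons per ATM component, as identified by the J48 model
    static final Map<String, List<String>> REASONS = Map.of(
        "DispenserDevices", List.of("Device disconnected", "Device offline"),
        "DispenserCurrencyCassette", List.of("Bill cassette slot 4 is empty",
            "Retracted cards/bills", "Bill cassette 4 is empty",
            "Device CashOut module offline"),
        "DispenserPickCash", List.of("Presenter clamping mechanism failed or jammed",
            "Too many bills rejected", "Purge bin not present",
            "Pick failure - out of bills"),
        "CardReader", List.of("Card reader disconnected", "Shutter jammed open",
            "Blank track"),
        "Application", List.of("Transaction failed: unable to dispense cash",
            "Attempt to reboot via system", "Attempt to reboot via special electronic"),
        "SOP", List.of("Operator switch changed to supervisor mode"),
        "NetworkConnectionB24", List.of("Device offline"));

    // Look up which component a logged event message points to.
    // Note: "Device offline" appears under two components, so for that
    // message the result depends on map iteration order.
    static String componentFor(String eventMessage) {
        for (var e : REASONS.entrySet())
            if (e.getValue().contains(eventMessage)) return e.getKey();
        return "Unknown";
    }

    public static void main(String[] args) {
        System.out.println(componentFor("Pick failure - out of bills")); // DispenserPickCash
    }
}
```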

The final step of the hybrid KDP is use of the discovered knowledge. To use this discovered knowledge, the resulting J48 model was used to produce a prototype, implemented in Java on the NetBeans IDE 8.2 platform. The knowledge extended from this result is the ATM out of service reason identifier system, which lets users easily identify the reasons for ATM out of service.

The best classification algorithm selected was J48. Its DM measurement metrics were: accuracy of 52.7059%; a PR curve showing good performance, starting at recall (x axis) 1 and precision (y axis) 0.5; and a bow-like ROC curve whose peak reached (1, 1), with an area under the ROC curve of 0.6161, which is worthy.

The designed model found in this study identifies the reasons for ATM out of service and classifies them. Classified ATM out of service reasons support ATM supervisors, increase availability and improve the service. Achieving higher availability of ATM components and service requires further research.

6.2 Recommendation
Quality ATM service is desired above all for customer satisfaction and for a better return on investment. The banking industry is responsible for keeping ATMs in service as much as possible. This study looked closely at Dashen Bank ATMs in order to identify the reasons for ATMs being out of service. Based on the findings of the study, the following recommendations are forwarded.

 In this study, an attempt is made to identify the causes of ATMs out of service using the scarce data captured by ProView. However, preparing enough quality data needs to be considered in future studies to simplify the data mining task.

 In this study, an attempt is made to use the discovered knowledge for identifying the reasons for ATM out of service. We recommend integrating the data mining result with ProView so as to detect the reasons automatically and take immediate corrective action.

 To enhance the accuracy of the predictive model, further study needs to be conducted using other classification algorithms.

 One further research direction on ATMs is time-to-failure prediction for the hardware parts (the card reader, the dispenser and all their sub-components) in order to classify ATM out of service by time.

 If currency cassettes are not empty, ATM availability increases; therefore, how to proactively keep an ATM from running empty of cash requires constructing a predictive model, which is left for further research.

References
[1] Cios et al., Data Mining A Knowledge Discovery Approach, New York: Springer, 2007.
[2] W. Ian H and F. Eibe, Data Mining: Practical machine learning tools and techniques, San
Francisco: Morgan Kaufmann, 2016.
[3] J. Han and M. Kamber, Data mining: concepts and techniques, San Francisco: Morgan
Kaufmann, 2006.
[4] M. H. Dunham, Data mining: Introductory and advanced topics, NJ: Pearson Education,
2006.
[5] Dashen Bank, "Company profile," Dashen Bank, 4 September 2018. [Online]. Available:
https://dashenbanksc.com. [Accessed 4 September 2018].
[6] Dashen Bank, "21st Annual Report for the year," Dashen Bank, Addis Ababa, 2017.
[7] W. Gardachew, "Electronic-banking in Ethiopia-practices, opportunities and challenges,"
Journal of internet Banking and commerce, vol. 15, no. 2, pp. 1-8, 2010.
[8] Wincor Nixdorf, Administration & System Management ProView Reporting API,
Paderborn: Wincor Nixdorf, 2015.
[9] Dashen Bank, Risk Management policy Manual, Addis Ababa: Dashen Bank S.C., 2017.
[10] Dashen Bank, "Internal Audit Policy," Internal Audit Policy, pp. 10-15, 15 February 2017.
[11] T. Mahmood and G. M. Shaikh, "Adaptive Automated Teller Machines," Expert Systems
with Applications, vol. 40, pp. 1152-1169, 2013.
[12] Gümüş et al., "Ultimate Point in the Service Provided by the Banks to Their Customers:
Customer Satisfaction in the Common Use of ATMs," Social and Behavioral Sciences, vol. 207,
pp. 98-110, 2015.
[13] S. Madhavi, S. Abirami, C. Bharathi, B. Ekambaram, T. Krishna Sankar, A. Nattudurai,
N. Vijayarangan, "ATM Service Analysis Using Predictive Data Mining," International Journal
of Computer, Information, Systems and Control Engineering, vol. 8, no. 2, 2014.
[14] WordWeb software, "WordWeb 8.03a," Princeton University, 2016.
[15] Field enterprises Education Corporation, The world book encyclopedia Volume 6, USA
FGA: Field enterprises Education Corporation, 1967.
[16] Wincor Nixdorf, ProView Console 4.2/40, Paderborn: Wincor Nixdorf, 2015.
[17] Wincor Nixdorf, ProView Data Model 4.2/40(C), Paderborn: Wincor Nixdorf, 2015.
[18] P. Bastos et al, "A Maintenance Prediction System using Data Mining Techniques,"
Proceedings of the World Congress on Engineering , vol. III, pp. 1148-1453, 2012.

[19] Two Crows Corporation, Introduction to Data Mining and Knowledge Discovery, Potomac: Two Crows Corporation, 2005.
[20] G. Tewary, "Data Mining Through Neural Networks Using Recurrent Network," itccma,
pp. 57-74, 2015.
[21] Fayyad et al., "From Data Mining to Knowledge Discovery in Databases," AI Magazine,
vol. 17, no. 3, 1996.
[22] Transaction Network Services, What Payments Trends Should You follow in 2018?,
Virginia: Transaction Network Services, 2017.
[23] INETCO, Unlocking Your ATM “Big Data”: Understanding the, Burnaby: INETCO,
2015.
[24] Wincor Nixdorf, ProView Data Model 4.2/40(C), Paderborn: Wincor Nixdorf, 2015.
[25] ISO, "ISO Online Browsing platform," The International Organization for
Standardization, 26 January 2018. [Online]. Available: https://www.iso.org/obp/ui/#search.
[Accessed 26 January 2018].
[26] NetBeans, "NetBeans," NetBeans, 26 January 2018. [Online]. Available:
https://netbeans.org/. [Accessed 26 January 2018].
[27] Cabena et al, Data Mining:From Concepts to Implementation, New Jersey: Prentice Hall
Saddle River, 1998.
[28] Farooqi et al, "Effectiveness of Data mining in Banking Industry: An empirical study,"
International Journal of Advanced Research in Computer Science, vol. 8, no. 5, pp. 827-830,
2017.
[29] C. Mwatsika, "Customers’ satisfaction with ATM banking in Malawi," African Journal of
Business Management, pp. 218-227, 2014.
[30] Chan et al., "Modified automatic teller machine prototype for older adults: A case study
of," Applied Ergonomics, no. 40, pp. 151-160, 2009.
[31] Moslehi et al, "Analyzing and Investigating the Use of Electronic Payment Tools in
Iran," Journal of AI and Data Mining, pp. 417-437, 2018.

[32] Electro Magnetic Components Inc. , "ATMs: How They Work and Basic ATM Parts,"
Electro Magnetic Components Inc. , 2015. [Online]. Available:
http://www.atmparts.net/atm-parts/. [Accessed 28 February 2019].

APPENDIX
ATM OUT OF SERVICE REASON IDENTIFIER SYSTEM
Usability Testing Questionnaire (Users: - System Administrator)

This questionnaire is intended to assess administrators' knowledge of ATM out of service reasons, based on existing systems such as ProView, B24 and NetMon.

I. Background Information about the company and the user

1. Name of the company ________________________________

2. Your position ________________________

3. On which ATM monitoring system tool do you have experience in identifying ATM status?
__________________________

4. Are you familiar with the ATM out of service reasons? ☐ Yes ☐ No

5. How many out of service reasons did you identify?

6. List the reasons for out of service.

7. Do you or your company proactively identify ATM out of service? ☐ Yes ☐ No

8. If your answer for question number 7 is Yes, please describe in detail


___________________________________________________________________

9. If your answer for question number 7 is No, please describe in detail


______________________________________________________________

II. Prototype

The following items relate to the usability testing of the ATM out of service reason identifier
system. Please indicate your agreement by marking a check in the boxes.

No. Question (each answered on the scale: Strongly Agree / Agree / Neutral / Disagree / Strongly Disagree)

1. Do you think classifying ATM out of service with respect to the timestamp of months in the prototype is essential?
2. Are you satisfied when events are labeled or classified with the time stamp of the three months (January, March and April)?
3. Do you think the ATM out of service reason identifier system prototype can be used easily?
4. Do you think the system is a good enough product?
5. Do you think the response time for most operations is fast enough?
6. Do you think the cost of the ATM out of service reason identifier system is lower than that of any real-time monitoring system?
7. Do you think the system can be used by any user with basic knowledge of computers?
8. Do you think the text which appears on the pages is clearly readable throughout operation of the system?
9. Do you think the interaction to accomplish tasks is simple and completes within a few seconds?
10. Do you think the menu items are consistently located and work without failure?
11. Is the ATM out of service reason identifier system easy to use to view, edit and copy?

Please write any other comment about the ATM out of service reason identifier system:
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________

