Expert Systems With Applications: Daniel Mateos-García, Jorge García-Gutiérrez, José C. Riquelme-Santos
Article info
Keywords:
Evolutionary computation
Nearest-neighbour
Weighted voting
Abstract
This work presents an evolutionary approach, called EvoNN, to modify the voting system of the k-nearest neighbours (kNN) rule. Our approach results in a real-valued vector which provides the optimal relative contribution of the k nearest neighbours. We compare two possible versions of our algorithm. One of them (EvoNN1) introduces a constraint on the resulting real-valued vector so that the greatest value is assigned to the nearest neighbour. The second version (EvoNN2) does not impose any particular constraint on the order of the weights. We compare both versions with classical kNN and four other weighted variants of the kNN on 48 datasets from the UCI repository. Results show that EvoNN1 outperforms EvoNN2 and obtains statistically better results than the rest of the compared methods.
© 2015 Elsevier Ltd. All rights reserved.
1. Introduction
Weighting in machine learning is a common technique for emphasizing some characteristics of data to improve the resulting models. For example, weighting has been used to outline the importance of some particular instances (Blachnik & Duch, 2011) or features (Zhi, Fan, & Zhao, 2014), or to rank a set of techniques in the context of ensembles (Berikov, 2014). In a broad sense, Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs) can also be seen as examples of using weights in learning models, but the k-nearest neighbours (kNN) rule has been the most common technique to benefit from weights (Mateos-García, García-Gutiérrez, & Riquelme-Santos, 2012).
kNN and its variants have been widely used in the literature to solve real problems. Rodger (2014) used a hybrid model to predict the demand for natural gas. The system was implemented by integrating regression, fuzzy logic, nearest neighbour and neural networks, and by considering several variables such as the price, operating expenses, cost to drill new wells, etc. If we focus on biological data, Park and Kim (2015) selected significant genes from microarrays by using a nearest-neighbour-based ensemble of classifiers. On the other hand, Park, Park, Jung, and Lee (2015) tackled the problem of designing recommender systems. For this purpose, the authors presented Reversed CF (RCF), a fast item-based collaborative filtering algorithm which utilizes a k-nearest neighbour graph.
The main goal of a weighting system lies in the optimization (commonly by metaheuristics) of a set of weights in the training step to obtain the highest accuracy while trying not to overfit the resulting model.
If we focus on kNN weighting methods, many proposals weighting features or instances can be found. In Raymer, Punch, Goodman,
Kuhn, and Jain (2000) a weighting method to obtain an optimal set
of features was provided. The set of features was selected by means
of a kNN-based genetic algorithm using a bit vector to indicate if a
feature was in the selection or not. In a later work, the same authors
presented a hybrid evolutionary algorithm using a Bayesian discriminant function (Raymer, Doom, Kuhn, & Punch, 2003) to isolate characteristics belonging to large datasets of biomedical origin.
Moreover, Paredes and Vidal (2006) used different similarity functions to improve the behaviour of the kNN. In a first approximation, they considered a weight per feature and instance on training data, resulting in a non-viable number of parameters in the learning process. Then, the authors proposed three types of reduction: a weight by class and feature (label dependency), a weight by prototype (prototype dependency), and a combination of the previous ones. The optimization process was carried out by gradient descent. In the same line, Tahir, Bouridane, and Kurugollu (2007) showed an approach that was able to both select and weight features simultaneously by using tabu search. Furthermore, Mohemmed and Zhang (2008) presented a nearest-centroid-based classifier. This method calculated prototypical instances by computing arithmetic averages from the training data. To classify an instance, the method calculated the distance to every prototype and then selected the nearest one. The optimization of the best centroids that minimized the classification error was carried out through particle swarm optimization. Fernandez and Isasi (2008) also proposed a weighting system by using a prototype-based classifier. After a data normalization based on the position of the instances with respect to regions, the weights were iteratively calculated. More recently, AlSukker, Khushaba, and Al-Ani (2010) used differential evolution to find weights for different aspects of data. They described four approaches: feature weighting, neighbour weighting, class weighting and mixed weighting (features and classes), with the latter being the one providing the best results.
Weighting has also been applied to the voting system of the kNN. Thus, the distance-weighted k-nearest neighbour rule (WKNN) proposed by Dudani (1976) has been known for a long time. WKNN weights the votes of the k nearest neighbours (w_i^w) according to Eq. (1), where d_i is the distance of the ith nearest neighbour (d_1 being the distance of the nearest) with respect to the instance to be classified. A similar version using uniform weighting (UWKNN) has also been proposed, where each weight is inversely proportional to the rank reached among the neighbours (i.e., w_i^u = 1/i). Recently, both techniques have been explored working together as a new classifier called Dual-Weighted kNN (DWKNN), showing promising results, where each weight was calculated according to Eq. (2) (Gou, Xiong, & Kuang, 2011). A later work of Gou, Du, Zhang, and Xiong (2012) provided another version of DWKNN where the weights were calculated differently, according to Eq. (3).
w_i^{w} = \begin{cases} \frac{d_k - d_i}{d_k - d_1} & \text{if } d_k \neq d_1 \\ 1 & \text{if } d_k = d_1 \end{cases}    (1)

w_i^{dw1} = w_i^{w} \times w_i^{u}    (2)

w_i^{dw2} = \begin{cases} \frac{(d_k - d_i)(d_k + d_1)}{(d_k - d_1)(d_k + d_i)} & \text{if } d_k \neq d_1 \\ 1 & \text{if } d_k = d_1 \end{cases}    (3)
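As an illustration of how these voting schemes operate, the following sketch computes the weights of Eqs. (1)–(3) from a sorted list of neighbour distances (the function name and interface are ours, not part of any of the cited proposals):

import numpy as np

def voting_weights(d, scheme="wknn"):
    """Voting weights for k sorted neighbour distances d[0] <= ... <= d[k-1].

    scheme: 'wknn'   - distance-weighted rule of Eq. (1)
            'uwknn'  - uniform (rank-based) rule, w_i = 1/i
            'dwknn1' - dual weighting of Eq. (2)
            'dwknn2' - dual weighting of Eq. (3)
    """
    d = np.asarray(d, dtype=float)
    k = len(d)
    ranks = np.arange(1, k + 1)
    if d[-1] == d[0]:                                   # all neighbours equally distant
        w_dist = np.ones(k)
    else:
        w_dist = (d[-1] - d) / (d[-1] - d[0])           # Eq. (1)
    if scheme == "wknn":
        return w_dist
    if scheme == "uwknn":
        return 1.0 / ranks
    if scheme == "dwknn1":
        return w_dist / ranks                           # Eq. (2): w_i^w * w_i^u
    if scheme == "dwknn2":
        if d[-1] == d[0]:
            return np.ones(k)
        return w_dist * (d[-1] + d[0]) / (d[-1] + d)    # Eq. (3)
    raise ValueError(scheme)

# Example: weights for 5 neighbours at increasing distances
print(voting_weights([0.1, 0.4, 0.5, 0.7, 0.9], scheme="dwknn2"))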
Although all the previous approaches provided improvements with respect to the classical kNN performance, they have not explored the possibly better suitability of evolutionary computation for the optimization of the neighbours' weights. Thus, we propose an evolutionary method to improve the kNN rule by altering the voting system with knowledge obtained in the training phase. We also explore the use of constraints in learning weights with two different versions of our approach, and we study their reliability compared with classical kNN and four other weighted variants on UCI datasets (Lichman, 2013). Finally, results are statistically validated to reinforce the final conclusions.
The rest of the paper is organized as follows. Section 2 presents the elements of the two versions of the evolutionary algorithm designed to weight the voting system of the kNN. The results and several statistical tests are given in Section 3. Finally, Section 4 presents the main findings and future work.
2. Method
In this section, two variants of our voting optimization system
called Evolutionary Voting of Nearest Neighbours (EvoNN) are described.
2.1. Purpose and functionality
The aim of our work was to find a set of weights to modify the influence of every nearest neighbour when voting to assign a label to an unlabelled instance. Moreover, our approach also provides a measurement of the influence of the proximity of the neighbours by means of the optimized weights. Thus, the evolutionary process provides the optimal weights found to improve the classification accuracy of the domain under study.
To formalize our approach, we assume that the set of classes (or labels) L is represented by the natural numbers from 1 to b. Given an unlabelled instance x and its k nearest neighbours x_1, ..., x_k in the training set TR, the classical kNN rule assigns the most voted label:

label(x) = \arg\max_{l \in \{1..b\}} \sum_{i=1}^{k} \delta(x_i, l)    (4)

where

\delta(x_i, l) = \begin{cases} 1 & \text{if } label(x_i) = l \\ 0 & \text{otherwise} \end{cases}    (5)

In our weighted rule, the vote of the ith nearest neighbour is multiplied by its weight w_i:

label_w(x, w) = \arg\max_{l \in \{1..b\}} \sum_{i=1}^{k} w_i \, \delta(x_i, l)    (6)

The function to minimize is the sum of all prediction errors of every instance x from TR. Thus, if we define the error function as

error(x, w) = \begin{cases} 0 & \text{if } label_w(x, w) = label(x) \\ 1 & \text{otherwise} \end{cases}    (7)

the weights are obtained by solving

\min_{w \in \mathbb{R}^k} \sum_{x \in TR} error(x, w)    (8)
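To make the voting rule of Eqs. (4)–(6) concrete, a minimal sketch (with helper names of our own) could be:

import numpy as np

def weighted_vote(neighbour_labels, w, n_classes):
    """Weighted kNN vote (Eq. (6)); w = ones(k) recovers classical kNN (Eq. (4))."""
    scores = np.zeros(n_classes)
    for label, weight in zip(neighbour_labels, w):
        scores[label - 1] += weight          # labels are the naturals 1..b
    return int(np.argmax(scores)) + 1

def prediction_error(true_label, neighbour_labels, w, n_classes):
    """0/1 prediction error of Eq. (7) for a single training instance."""
    return 0 if weighted_vote(neighbour_labels, w, n_classes) == true_label else 1

# Three classes, k = 4 neighbours: the two farther neighbours vote for class 3
print(weighted_vote([1, 2, 3, 3], w=[0.9, 0.7, 0.3, 0.1], n_classes=3))  # -> 1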
In practice, this error is estimated on the training set by repeated cross-validation, as detailed in the following fitness function:

Fitness(W, k, TR): error
  error = 0
  for i = 1 to m do
    Divide TR into s bags: B1, ..., Bs
    partialError = 0
    for j = 1 to s do
      partialError = partialError + Evaluate(W, k, TR − Bj, Bj)
    end for
    partialError = partialError / s
    error = error + partialError
  end for
  error = error / m
  return error
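A minimal Python sketch of this fitness evaluation, assuming a helper Evaluate(W, k, train, test) that builds the weighted kNN on train and returns its error rate on test (not shown here), could be:

import random

def fitness(W, k, TR, m=3, s=5, evaluate=None):
    """Average error of the weight vector W over m repetitions of an s-bag split of TR."""
    error = 0.0
    for _ in range(m):
        data = list(TR)
        random.shuffle(data)
        bags = [data[j::s] for j in range(s)]           # divide TR into s bags
        partial_error = 0.0
        for j in range(s):
            train = [x for b, bag in enumerate(bags) if b != j for x in bag]
            partial_error += evaluate(W, k, train, bags[j])
        error += partial_error / s
    return error / m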
The crossover operator combines the parents gene by gene using BLX-α recombination (Eshelman & Schaffer, 1993). In EvoNN1 the recombination is constrained so that the descending order of the weights is preserved (Eq. (9)), whereas in EvoNN2 every gene is recombined from parent1(i) and parent2(i) without restrictions (Eq. (10)):

offspring(i) = \begin{cases} \mathrm{BLX}\text{-}\alpha(parent_1(i), parent_2(i)) & \text{if } i = 1 \\ \mathrm{BLX}\text{-}\alpha(parent_1(i), parent_2(i)) \text{ bounded by } offspring(i-1) & \text{otherwise} \end{cases}    (9)

offspring(i) = \mathrm{BLX}\text{-}\alpha(parent_1(i), parent_2(i))    (10)
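The following sketch shows one possible implementation of this crossover; the clamping used to keep the weights in descending order is our reading of Eq. (9) rather than a verbatim transcription of the paper's operator:

import random

def blx_alpha(a, b, alpha=0.5):
    """Standard BLX-alpha: sample uniformly from an interval widened around the parents' genes."""
    lo, hi = min(a, b), max(a, b)
    span = hi - lo
    return random.uniform(lo - alpha * span, hi + alpha * span)

def crossover(parent1, parent2, ordered=True, alpha=0.5):
    """ordered=True mimics EvoNN1 (descending weights); False mimics EvoNN2 (Eq. (10))."""
    child = []
    for i in range(len(parent1)):
        gene = blx_alpha(parent1[i], parent2[i], alpha)
        if ordered and i > 0:
            gene = min(gene, child[i - 1])   # keep w_1 >= w_2 >= ... >= w_k
        child.append(gene)
    return child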
Regarding the mutation operator, the ith gene of the individual indiv could change according to Eq. (11) in EvoNN1, where δ was initially a random value between 0 and 1. Then, to improve the fit through the generations, δ was drawn from the interval [0, 1] during the first 10 generations, from [0, 0.9] during the next 10, and so on. On the other hand, in EvoNN2 the mutation is not constrained and follows Eq. (12):

indiv'(i) = \begin{cases} indiv(i) + \delta \cdot indiv(i) & \text{if } i = 1 \\ indiv(i) - \delta \cdot indiv(i) & \text{if } i = k \\ indiv(i) \pm \delta \cdot indiv(i) & \text{otherwise} \end{cases}    (11)

indiv'(i) = indiv(i) \pm \delta \cdot indiv(i)    (12)
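A possible reading of the mutation operator in code, with the shrinking range of δ and a re-sorting step to restore the EvoNN1 ordering, is sketched below (our interpretation, not the paper's exact implementation):

import random

def mutate(indiv, generation, p_mut=0.1, ordered=True):
    """Perturb each gene by +/- delta * gene, with delta drawn from a range that
    shrinks every 10 generations ([0, 1], then [0, 0.9], ...)."""
    upper = max(0.1, 1.0 - 0.1 * (generation // 10))
    mutated = list(indiv)
    for i in range(len(mutated)):
        if random.random() < p_mut:
            delta = random.uniform(0.0, upper)
            sign = 1 if random.random() < 0.5 else -1
            mutated[i] += sign * delta * mutated[i]
    if ordered:                                  # EvoNN1: restore descending order
        mutated.sort(reverse=True)
    return mutated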
Table 1
Comparison of accuracies of EvoNN1 and EvoNN2.

Dataset            EvoNN1   EvoNN2
anneal             97.98    97.95
arrhythmia         60.13    59.76
audiology          67.12    65.97
australian         83.89    84.11
autos              75.42    75.12
balance            86.27    86.23
breast c.          71.27    69.64
breast w.          97.15    97.17
bridges v1         70.7     67.86
bridges v2         69.79    66.42
car                83.43    86.89
cmc                63.21    63.07
colic              79.67    79.55
credit a           84.34    83.75
credit g           72.17    72.81
cylinder b         80.19    80.19
dermatology        95.39    95.62
diabetes           71.73    71.6
ecoli              93.17    92.81
flags              55.4     54.73
glass              60.8     62.57
haberman           72.77    68.49
hayes              62.35    62.26
heart c            82.52    82.14
heart h            86.78    86.04
heart stat         78.38    78.13
hepatitis          83.44    81.26
hypothyroid        93.06    93.04
ionosphere         90.45    90.36
iris               99.78    99.77
kr vs kp           98.5     98.55
labor              81.69    81.22
landsat            93.86    93.77
liver disorders    61.49    60.72
lung cancer        81.4     75.79
lymph              80.87    80.47
mfeat factors      97.1     97.13
mfeat fourier      86.19    85.91
mfeat karhunen     98.04    98
mfeat morph.       97.24    97.31
mfeat pixel        96.93    96.92
mfeat zernike      92.05    92.05
molecular bio.     38.34    32.47
monks 1            47.39    54.79
monks 2            47.12    54.35
monks 3            47.78    46.22
mushroom           100      100
nursery            95.91    96.22
Table 2
Accuracy of every studied algorithm throughout 48 datasets from UCI.

Dataset            EvoNN1   IBk      WKNN     UWKNN    DWKNN1   DWKNN2
anneal             97.98    97.1     97.85    98.43    99.14    98.23
arrhythmia         60.13    59.39    57.53    58.45    54.9     57.23
audiology          67.12    60.73    68.4     68.78    66.48    68.44
australian         83.89    84.38    82.16    83.46    80.4     82.11
autos              75.42    59.79    73.22    75.87    76.19    76.13
balance            86.27    88.28    86.81    82.65    82.01    86.81
breast c.          71.27    73.86    69.63    70.95    68.12    69.63
breast w.          97.15    97       96.09    96.36    95.39    96.04
bridges v1         70.7     58.34    65.11    64.93    64.08    65.24
bridges v2         69.79    60.47    62.75    63.37    62.15    62.75
car                83.43    93.13    93.13    88.16    88.16    93.13
cmc                63.21    45.83    45.26    45.37    44.16    44.56
colic              79.67    79.2     73.7     77.32    70.79    73.5
credit a           84.34    83.46    81.41    83.65    80.12    81.47
credit g           72.17    72.59    71.87    72.9     71.33    71.79
cylinder b         80.19    72.29    78.7     77.37    79.16    78.81
dermatology        95.39    96.26    96.32    95.79    95.16    96.32
diabetes           71.73    74.72    72.67    72.91    70.59    72.54
ecoli              93.17    86.49    82.84    83.78    80.24    82.18
flags              55.4     54.31    59.12    60.31    57.93    59.12
glass              60.8     66.13    70.27    68.05    69.49    70.07
haberman           72.77    71.08    70.46    67.55    64.92    69.77
hayes              62.35    27.66    67.52    67.47    74.62    67.52
heart c            82.52    83.31    80.6     80.14    76.66    80.27
heart h            86.78    79.41    78.56    78.56    76.67    78.54
heart stat         78.38    78.36    76.59    79.39    75.7     76.31
hepatitis          83.44    82.67    81.8     83.91    79.95    80.64
hypothyroid        93.06    93.25    92.62    93.13    91.2     92.5
ionosphere         90.45    85.61    87.46    86.33    86.89    87.62
iris               99.78    95.59    95.19    95.66    95.66    95.19
kr vs kp           98.5     96.2     95.95    95.12    93.7     95.95
labor              81.69    80.07    84.01    85.78    85.67    84.13
landsat            93.86    90.29    90.53    90.37    90.13    90.56
liver disorders    61.49    58.36    61.28    60.57    59.34    61.01
lung cancer        81.4     80.95    71.44    78.02    67.87    71.44
lymph              80.87    78.53    82.82    84.97    82.37    82.85
mfeat factors      97.1     96.56    96.69    96.51    96.08    96.68
mfeat fourier      86.19    81.56    81.02    81.26    80.07    80.94
mfeat karhunen     98.04    96.11    96.8     96.75    96.18    96.77
mfeat morph.       97.24    71.08    68.03    68.44    65.9     67.7
mfeat pixel        96.93    95.92    96.71    96.78    96.36    96.7
mfeat zernike      92.05    80.53    79.09    79.39    79.03    79.07
molecular bio.     38.34    32.57    33.56    34.76    35.24    33.56
monks 1            47.39    52.25    35.8     39.51    37.19    35.8
monks 2            47.12    54.13    38.27    40.78    38.03    38.27
monks 3            47.78    38.48    35.28    37.59    38.93    35.45
mushroom           100      100      100      100      100      100
nursery            95.91    98.36    93.63    96.1     93.23    93.63
Table 3
Mean ranking reached by every compared technique.

Technique    Ranking
EvoNN1       2.270
UWKNN        3.062
IBk          3.458
WKNN         3.594
DWKNN2       3.781
DWKNN1       4.833
Table 3 shows the mean rankings obtained in our case. Then, we carried out the Friedman test. The Friedman statistic was 48.95, distributed according to a chi-square with 5 degrees of freedom. The p-value for the Friedman test was around 2.302E−9, so the null hypothesis (no statistical difference among the compared techniques) could be rejected with α = 0.05.
Because the Friedman test should be accompanied by a post-hoc procedure for multiple pairwise comparisons, we used the Holm procedure. This procedure took our approach (EvoNN1) as the control method and compared it with every other technique in a pairwise fashion; the results are shown in Table 4.
Table 4
Results for the Holm post-hoc procedure.

Technique   p-value     z       Holm's adjusted α
DWKNN1      1.94E−11    6.710   0.010
DWKNN2      7.65E−5     3.955   0.013
WKNN        5.32E−4     3.464   0.017
IBk         0.002       3.109   0.025
UWKNN       0.038       2.073   0.050
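For readers who want to reproduce this kind of analysis, the Friedman test can be computed with SciPy and the Holm correction applied to the resulting pairwise p-values; the accuracy matrix below is a random placeholder, not the paper's data:

import numpy as np
from scipy.stats import friedmanchisquare

# rows = datasets, columns = techniques (placeholder values, not the paper's results)
acc = np.random.rand(48, 6)

stat, p = friedmanchisquare(*[acc[:, j] for j in range(acc.shape[1])])
print(f"Friedman statistic = {stat:.2f}, p-value = {p:.3g}")

def holm(p_values, alpha=0.05):
    """Holm procedure: compare sorted p-values against alpha / (m - rank)."""
    m = len(p_values)
    order = np.argsort(p_values)
    rejected = np.zeros(m, dtype=bool)
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            break
    return rejected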
References

Berikov, V. (2014). Weighted ensemble of algorithms for complex data clustering. Pattern Recognition Letters, 38, 99–106.
Blachnik, M., & Duch, W. (2011). LVQ algorithm with instance weighting for generation of prototype-based rules. Neural Networks, 24(8), 824–830. (Artificial Neural Networks: Selected Papers from ICANN 2010).
Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7, 1–30.
Dudani, S. A. (1976). The distance-weighted k-nearest-neighbor rule. IEEE Transactions on Systems, Man and Cybernetics, 6(4), 325–327.
Eshelman, L. J., & Schaffer, J. D. (1993). Real-coded genetic algorithms and interval-schemata. In D. L. Whitley (Ed.), Foundations of genetic algorithms 2 (pp. 187–202). San Mateo, CA: Morgan Kaufmann.
Fernandez, F., & Isasi, P. (2008). Local feature weighting in nearest prototype classification. IEEE Transactions on Neural Networks, 19(1), 40–53.
García-Gutiérrez, J., Gonçalves-Seco, L., & Riquelme-Santos, J. C. (2011). Automatic environmental quality assessment for mixed-land zones using lidar and intelligent techniques. Expert Systems with Applications, 38(6), 6805–6813. http://dx.doi.org/10.1016/j.eswa.2010.12.065.
García, S., & Herrera, F. (2008). An extension on statistical comparisons of classifiers over multiple data sets for all pairwise comparisons. Journal of Machine Learning Research, 9, 2677–2694.
Gou, J., Du, L., Zhang, Y., & Xiong, T. (2012). A new distance-weighted k-nearest neighbor classifier. Journal of Information & Computational Science, 9(6), 1429–1436.
Gou, J., Xiong, T., & Kuang, J. (2011). A novel weighted voting for k-nearest neighbor rule. Journal of Computers, 6(5), 833–840.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The WEKA data mining software: an update. SIGKDD Explorations, 11(1).
Lichman, M. (2013). UCI machine learning repository. http://archive.ics.uci.edu/ml.
Mateos-García, D., García-Gutiérrez, J., & Riquelme-Santos, J. C. (2012). On the evolutionary optimization of k-NN by label-dependent feature weighting. Pattern Recognition Letters, 33(16), 2232–2238. doi:10.1016/j.patrec.2012.08.011.
Mohemmed, A. W., & Zhang, M. (2008). Evaluation of particle swarm optimization based centroid classifier with different distance metrics. In Proceedings of the IEEE Congress on Evolutionary Computation '08 (pp. 2929–2932).
Paredes, R., & Vidal, E. (2006). Learning weighted metrics to minimize nearest-neighbor classification error. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(7), 1100–1110.
Park, C. H., & Kim, S. B. (2015). Sequential random k-nearest neighbor feature selection for high-dimensional data. Expert Systems with Applications, 42(5), 2336–2342. http://dx.doi.org/10.1016/j.eswa.2014.10.044.
Park, Y., Park, S., Jung, W., & Lee, S.-g. (2015). Reversed CF: a fast collaborative filtering algorithm using a k-nearest neighbor graph. Expert Systems with Applications, 42(8), 4022–4028. http://dx.doi.org/10.1016/j.eswa.2015.01.001.
Raymer, M., Doom, T., Kuhn, L., & Punch, W. (2003). Knowledge discovery in medical and biological datasets using a hybrid Bayes classifier/evolutionary algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 33(5), 802–813.
Raymer, M., Punch, W., Goodman, E., Kuhn, L., & Jain, A. (2000). Dimensionality reduction using genetic algorithms. IEEE Transactions on Evolutionary Computation, 4(2), 164–171.
Rodger, J. A. (2014). A fuzzy nearest neighbor neural network statistical model for predicting demand for natural gas and energy cost savings in public buildings. Expert Systems with Applications, 41(4, Part 2), 1813–1829. http://dx.doi.org/10.1016/j.eswa.2013.08.080.
Tahir, M. A., Bouridane, A., & Kurugollu, F. (2007). Simultaneous feature selection and feature weighting using hybrid tabu search/k-nearest neighbor classifier. Pattern Recognition Letters, 28(4), 438–446.
Zhi, X., Fan, J., & Zhao, F. (2014). Robust local feature weighting hard c-means clustering algorithm. Neurocomputing, 134, 20–29.