Expert Systems With Applications 43 (2016) 9–14


An evolutionary voting for k-nearest neighbours


Daniel Mateos-García, Jorge García-Gutiérrez, José C. Riquelme-Santos
Department of Computer Science, Avda. Reina Mercedes S/N, Seville 41012, Spain

Keywords:
Evolutionary computation
Nearest-neighbour
Weighted voting

Abstract
This work presents an evolutionary approach, which we call EvoNN, to modify the voting system of the k-nearest neighbours (kNN) rule. Our approach results in a real-valued vector which provides the optimal relative contribution of the k nearest neighbours. We compare two possible versions of our algorithm. One of them (EvoNN1) introduces a constraint on the resulting real-valued vector, where the greatest value is assigned to the nearest neighbour. The second version (EvoNN2) does not include any particular constraint on the order of the weights. We compare both versions with the classical kNN rule and 4 other weighted variants of kNN on 48 datasets of the UCI repository. Results show that EvoNN1 outperforms EvoNN2 and statistically obtains better results than the rest of the compared methods.
© 2015 Elsevier Ltd. All rights reserved.

1. Introduction
Weighting in machine learning is a common technique for emphasizing some characteristics of data to improve the resulting models. For example, weighting has been used to outline the importance of some particular instances (Blachnik & Duch, 2011) or features (Zhi, Fan, & Zhao, 2014), or to rank a set of techniques in the context of ensembles (Berikov, 2014). In a broad sense, Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs) can also be seen as examples of using weights in learning models, but the k-nearest neighbours (kNN) rule has been the most common technique to benefit from weights (Mateos-García, García-Gutiérrez, & Riquelme-Santos, 2012).
kNN and its variants have been widely used in the literature to solve real problems. Rodger (2014) used a hybrid model to predict the demand for natural gas. The system was implemented by integrating regression, fuzzy logic, nearest neighbour and neural networks, and considering several variables such as the price, operating expenses, cost to drill new wells, etc. If we focus on biological data, Park and Kim (2015) selected significant genes from microarrays by using a nearest-neighbour-based ensemble of classifiers. On the other hand, Park, Park, Jung, and Lee (2015) tackled the problem of designing recommender systems. For this purpose the authors presented Reversed CF (RCF), a fast item-based collaborative filtering algorithm which utilizes a k-nearest neighbour graph.

Corresponding author. Tel.: +34954555964; Fax: +34954557139.

E-mail addresses: mateosg@us.es (D. Mateos-García), jorgarcia@us.es (J. García-Gutiérrez), riquelme@us.es (J.C. Riquelme-Santos).
URL: http://www.lsi.us.es (D. Mateos-García)

http://dx.doi.org/10.1016/j.eswa.2015.08.017
0957-4174/© 2015 Elsevier Ltd. All rights reserved.

The main goal of a weighting system lies in the optimization (commonly by metaheuristics) of a set of weights in the training step to obtain the highest accuracy while trying not to overfit the resulting model.
If we focus on kNN weighting methods, many proposals weighting features or instances can be found. In Raymer, Punch, Goodman, Kuhn, and Jain (2000) a weighting method to obtain an optimal set of features was provided. The set of features was selected by means of a kNN-based genetic algorithm using a bit vector to indicate whether a feature was in the selection or not. In a later work, the same authors presented a hybrid evolutionary algorithm using a Bayesian discriminant function (Raymer, Doom, Kuhn, & Punch, 2003), trying to isolate characteristics belonging to large datasets of biomedical origin. Moreover, Paredes and Vidal (2006) used different similarity functions to improve the behaviour of the kNN. In a first approximation, they considered a weight per feature and instance on the training data, resulting in a non-viable number of parameters in the learning process. Then, the authors proposed three types of reduction: a weight by class and feature (label dependency), a weight by prototype (prototype dependency) and a combination of the previous ones. The optimization process was carried out by gradient descent. In the same line, Tahir, Bouridane, and Kurugollu (2007) showed an approach that was able to both select and weight features simultaneously by using tabu search. Furthermore, Mohemmed and Zhang (2008) presented a nearest-centroid-based classifier. This method calculated prototypical instances by considering the arithmetic average of the training data. To classify an instance, the method calculated the distance to every prototype and then selected the nearest one. The optimization of the best centroids that minimized the classification error was carried out through particle swarm optimization. Fernandez and Isasi (2008) also proposed a weighting system by using a prototype-based classifier.


After a data normalization based on the position of the instances with respect to regions, the weights were iteratively calculated. More recently, AlSukker, Khushaba, and Al-Ani (2010) used differential evolution to find weights for different aspects of data. They described four approaches: feature weighting, neighbour weighting, class weighting and mixed weighting (features and classes), with the latter being the one providing the best results.
Weighting has also been applied to the voting system of the kNN. Thus, the distance-weighted k-nearest neighbour rule (WKNN) proposed by Dudani (1976) has been known for long. WKNN weights the votes of the k nearest neighbours (w_i) according to Eq. (1), where d_i is the distance of the ith nearest neighbour (d_1 being the distance of the nearest) with respect to an instance to be classified. A similar version using a uniform weighting (UWKNN) has also been proposed, where each weight is inversely proportional to the position reached among the neighbours (i.e., w_i^u = 1/i). Recently, both techniques have been explored working together as a new classifier called Dual-Weighted kNN (DWKNN), showing promising results, where each weight was calculated according to Eq. (2) (Gou, Xiong, & Kuang, 2011). A later work by Gou, Du, Zhang, and Xiong (2012) provided another version of DWKNN where the weights were calculated differently, according to Eq. (3).

w_i^{w} = \begin{cases} \frac{d_k - d_i}{d_k - d_1} & \text{if } d_i \neq d_1 \\ 1 & \text{if } d_i = d_1 \end{cases}    (1)

w_i^{dw1} = w_i^{w} \cdot w_i^{u}    (2)

w_i^{dw2} = \begin{cases} \frac{d_k - d_i}{d_k - d_1} \cdot \frac{d_k + d_1}{d_k + d_i} & \text{if } d_i \neq d_1 \\ 1 & \text{if } d_i = d_1 \end{cases}    (3)
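To make these voting schemes concrete, the following minimal NumPy sketch (our illustration, not code from the cited works; function names are ours) computes the weight vectors of Eqs. (1)-(3) from the sorted distances d_1 <= ... <= d_k of the k nearest neighbours:

import numpy as np

def wknn_weights(d):
    # Distance-weighted kNN (Dudani, 1976), Eq. (1); d must be sorted in ascending order
    d = np.asarray(d, dtype=float)
    d1, dk = d[0], d[-1]
    if dk == d1:
        return np.ones_like(d)
    return (dk - d) / (dk - d1)

def uwknn_weights(d):
    # Uniform weighting: w_i^u = 1 / i
    return 1.0 / np.arange(1, len(d) + 1)

def dwknn1_weights(d):
    # Dual-weighted kNN (Gou et al., 2011), Eq. (2): product of the two previous weights
    return wknn_weights(d) * uwknn_weights(d)

def dwknn2_weights(d):
    # DWKNN variant of Gou et al. (2012), Eq. (3)
    d = np.asarray(d, dtype=float)
    d1, dk = d[0], d[-1]
    if dk == d1:
        return np.ones_like(d)
    return (dk - d) / (dk - d1) * (dk + d1) / (dk + d)

# example: distances of the k = 5 nearest neighbours, already sorted
d = np.array([0.2, 0.5, 0.7, 0.9, 1.3])
print(wknn_weights(d), dwknn1_weights(d), dwknn2_weights(d))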

Although all the previous approaches provided improvements over the classical kNN performance, they have not explored the possibly better suitability of evolutionary computation for the optimization of the neighbours' weights. Thus, we propose an evolutionary method to improve the kNN rule by altering its voting system with knowledge obtained in the training phase. We also explore the use of constraints in learning the weights with two different versions of our approach, and we study their reliability compared with classical kNN and 4 other weighted variants on UCI datasets (Lichman, 2013). Finally, the results are statistically validated to reinforce the final conclusions.
The rest of the paper is organized as follows. Section 2 presents the elements of the two versions of the evolutionary algorithm designed to weight the voting system of the kNN. The results and several statistical tests are specified in Section 3. Finally, Section 4 presents the main findings and future work.

2. Method

In this section, two variants of our voting optimization system, called Evolutionary Voting of Nearest Neighbours (EvoNN), are described.

2.1. Purpose and functionality

The aim of our work was to find a set of weights to modify the influence of every nearest neighbour when they voted to assign a label to an unlabelled instance. Moreover, our approach also provided a measurement of the influence of the proximity of the neighbours by means of the optimized weights. Thus, the evolutionary process provides the optimal weights that have been found to improve the classification accuracy of the domain under study.
To formalize our approach we assume that the set of classes (or labels) L is represented by the natural numbers from 1 to b, with b being the number of labels. Thus, let D = {(e, l) | e ∈ ℝ^f and l ∈ L = {1, 2, ..., b}} be the dataset under study, with f being the number of features and b the number of labels. Let label be a function that assigns to every element e the real class to which it belongs. Let us suppose that D is divided into the sets TR and TS, the training set and the testing set respectively, such that D = TR ∪ TS and TR ∩ TS = ∅. In the training step the classification error is minimized with participation only by instances from TR. This classification error is calculated as follows. For each x ∈ TR we compute its k nearest neighbours according to a distance function d. Let x_i, with i = 1...k, be the neighbours of x ranked by distance, i.e., d(x, x_1) ≤ d(x, x_2) ≤ ... ≤ d(x, x_k) and, ∀y ∈ TR with y ≠ x_i, d(x, x_k) < d(x, y). According to the standard kNN rule, the prediction of the label of x from the labels of its neighbours can be formalized as:

predLabel(x) = \arg\max_{l \in \{1..b\}} \sum_{i=1}^{k} \delta(l, label(x_i))    (4)

where

\delta(l, label(x_i)) = \begin{cases} 1 & \text{if } label(x_i) = l \\ 0 & \text{otherwise} \end{cases}    (5)

If instead of a unitary vote we consider that the ith neighbour contributes with the weight w_i ∈ ℝ, the function (4) is redefined as:

predLabel(x, w) = \arg\max_{l \in \{1..b\}} \sum_{i=1}^{k} w_i \, \delta(l, label(x_i))    (6)

The function to minimize is the sum of the prediction errors over every instance x from TR. Thus, if we define the error function as

error(x, w) = \begin{cases} 1 & \text{if } predLabel(x, w) \neq label(x) \\ 0 & \text{otherwise} \end{cases}    (7)

then the function to optimize is:

\min_{w \in \mathbb{R}^k} \sum_{x \in TR} error(x, w)    (8)
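As an illustration of Eqs. (4)-(6), a minimal weighted-voting predictor could be written as follows (a sketch assuming Euclidean distance and NumPy; it is not the authors' implementation and the names are ours):

import numpy as np

def pred_label(x, X_train, y_train, w):
    # Weighted kNN prediction of Eq. (6): the i-th nearest neighbour votes with weight w[i]
    k = len(w)
    dist = np.linalg.norm(X_train - x, axis=1)           # distances to every training instance
    nn = np.argsort(dist)[:k]                            # indices of the k nearest neighbours
    scores = {}
    for weight, label in zip(w, y_train[nn]):
        scores[label] = scores.get(label, 0.0) + weight  # accumulate weighted votes per label
    return max(scores, key=scores.get)

# toy example: classic kNN is recovered with unitary weights
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]])
y = np.array([0, 0, 1, 1])
print(pred_label(np.array([0.2, 0.1]), X, y, w=[1, 1, 1]))   # prints 0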

With regard to the two versions of the evolutionary algorithm, they were called EvoNN1 and EvoNN2. For EvoNN1, the nearest neighbours (those with the lowest distances to the unlabelled instance) were heavier and, therefore, their influence (weight) had to be greater. In EvoNN2, the weights had no constraints. Regarding the design of the evolutionary algorithm, EvoNN1 and EvoNN2 were similar except for the encoding, crossover and mutation.
2.2. Voting optimization
This subsection details the search algorithm used to calculate the optimum contribution of the k nearest neighbours, carried out by two versions of an evolutionary algorithm. It is therefore necessary to define their main characteristics, i.e., individual encoding, genetic operators, fitness function and generational replacement policy.
2.2.1. Individual encoding
In EvoNN1 and EvoNN2, an individual was a real-valued vector of size k symbolizing the relative contribution of the k nearest neighbours in the voting system of the kNN rule. Position 1 in the vector was associated with the nearest neighbour in distance and position k with the furthest. Moreover, a constraint was established in EvoNN1 to assure that the closest neighbours were more important, i.e., w_1 ≥ w_2 ≥ ... ≥ w_k.
Regarding the initial population, both designs integrated individuals with k random values between 0 and 1 (ordered in EvoNN1). To include individuals representing classic kNN, the initial population also included several vectors with the first values set to 1 and the remaining set to 0, e.g., (1.0, 0.0, ..., 0.0) for k = 1, (1.0, 1.0, ..., 0.0) for k = 2, etc. Finally, the maximum value of 1 for a weight could be surpassed during the evolutionary process to highlight the importance of a particular neighbour, as explained in the following section.
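A possible initialization routine matching this encoding is sketched below (NumPy-based and illustrative only; the seeding details and function names are assumptions, not the authors' code):

import numpy as np

rng = np.random.default_rng(0)

def random_individual(k, ordered):
    # random weight vector in [0, 1]; sorted in decreasing order for EvoNN1
    w = rng.random(k)
    return np.sort(w)[::-1] if ordered else w

def knn_like_individual(k, j):
    # seed individual mimicking classic kNN with j effective neighbours: (1, ..., 1, 0, ..., 0)
    w = np.zeros(k)
    w[:j] = 1.0
    return w

def initial_population(size, k, ordered=True):
    pop = [knn_like_individual(k, j) for j in range(1, k + 1)]          # kNN-like seeds
    pop += [random_individual(k, ordered) for _ in range(size - len(pop))]
    return pop

pop = initial_population(size=100, k=5)   # population of 100 weight vectors of length 5
print(pop[0], pop[-1])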

1:  Fitness(W, k, TR) : error
2:    error = 0
3:    for i = 1 to m do
4:      Divide TR in s bags: B1...Bs
5:      partialError = 0
6:      for j = 1 to s do
7:        partialError = partialError + Evaluate(W, k, TR - Bj, Bj)
8:      end for
9:      partialError = partialError / s
10:     error = error + partialError
11:   end for
12:   error = error / m
13:   return error

14: Evaluate(W, k, Train, Test) : error
15:   error = 0
16:   for each instTest in Test do
17:     lab = NearestN(W, k, Train, instTest)
18:     if lab ≠ label(instTest) then
19:       error = error + 1
20:     end if
21:   end for
22:   error = error / size(Test)
23:   return error

24: NearestN(W, k, Train, y) : labY
25:   sortedInst and kNeighbours are empty sorted sets
26:   for each x in Train do
27:     insert x in sortedInst sorted by d(x, y)
28:   end for
29:   kNeighbours = sortedInst.get(k)
30:   labY = majorityLabel(kNeighbours, W)
31:   return labY

Fig. 1. Fitness function.

2.2.2. Crossover and mutation


The goal of the crossover operator was the generation of a new individual (offspring) from the genotypic characteristics of a set of selected parents (two in our case, parent1 and parent2). There is also a constraint on the order of the genes in EvoNN1. Thus, the crossover operator for the ith gene in EvoNN1 was defined as in Eq. (9), where BLX-α stands for the crossover operator defined in Eshelman and Schaffer (1993) and is calculated from parent1(i) and parent2(i), β is a random value between 0 and 1, max = offspring(i − 1) and min = minimum(parent1(i), parent2(i), offspring(i − 1)).

offspring(i) = \begin{cases} BLX\text{-}\alpha(parent1(i), parent2(i)) & \text{if } i = 1 \\ \beta \,(max - min) + min & \text{otherwise} \end{cases}    (9)

In EvoNN2, because the individuals were freely built, the crossover operator was simpler, see Eq. (10):

offspring(i) = BLX\text{-}\alpha(parent1(i), parent2(i))    (10)

Regarding the mutation operator, the ith gene of the individual indiv could change according to Eq. (11) in EvoNN1, where γ was initially a random value between 0 and 1. Then, to improve the fit through the generations, the γ value was drawn from the interval [0, 1] during the first 10 generations. For the next 10, it was drawn from [0, 0.9], etc. On the other hand, the mutation operator in EvoNN2 worked as can be seen in Eq. (12).

indiv'(i) = \begin{cases} indiv(i) + \gamma \, indiv(i) & \text{if } i = 1 \\ indiv(i) - \gamma \, indiv(i) & \text{if } i = k \\ \gamma \,(indiv(i-1) - indiv(i+1)) + indiv(i+1) & \text{otherwise} \end{cases}    (11)

indiv'(i) = indiv(i) \pm \gamma \, indiv(i)    (12)
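The following sketch shows one way to implement the EvoNN1 operators of Eqs. (9) and (11), assuming a standard BLX-α recombination with α = 0.5 and a given mutation amplitude gamma; it is an interpretation of the formulas above, not the authors' code, and all names are illustrative:

import numpy as np

rng = np.random.default_rng(1)

def blx_alpha(p1, p2, alpha=0.5):
    # BLX-alpha (Eshelman & Schaffer, 1993): sample uniformly around the parents' interval
    lo, hi = min(p1, p2), max(p1, p2)
    span = hi - lo
    return rng.uniform(lo - alpha * span, hi + alpha * span)

def crossover_evonn1(parent1, parent2, alpha=0.5):
    # Eq. (9): first gene by BLX-alpha, the rest constrained so that w1 >= w2 >= ... >= wk
    k = len(parent1)
    child = np.empty(k)
    child[0] = blx_alpha(parent1[0], parent2[0], alpha)
    for i in range(1, k):
        beta = rng.random()                                 # random value in [0, 1]
        hi = child[i - 1]                                   # max = offspring(i - 1)
        lo = min(parent1[i], parent2[i], child[i - 1])      # min of both parents and the previous gene
        child[i] = beta * (hi - lo) + lo                    # always lands in [lo, hi], keeping the order
    return child

def mutate_gene_evonn1(indiv, i, gamma):
    # Eq. (11), with i taken 0-based: the first gene grows, the last gene shrinks,
    # and interior genes are resampled between their two neighbours
    out = indiv.copy()
    k = len(indiv)
    if i == 0:
        out[i] = indiv[i] + gamma * indiv[i]
    elif i == k - 1:
        out[i] = indiv[i] - gamma * indiv[i]
    else:
        out[i] = gamma * (indiv[i - 1] - indiv[i + 1]) + indiv[i + 1]
    return out

p1 = np.array([0.9, 0.7, 0.5, 0.3, 0.1])
p2 = np.array([1.0, 0.6, 0.4, 0.2, 0.1])
child = crossover_evonn1(p1, p2)
print(child, mutate_gene_evonn1(child, i=0, gamma=0.4))

Note that, consistently with Section 2.2.1, the mutation of the first gene may push its weight above 1, and both operators in this sketch preserve the decreasing order of the weights.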

2.2.3. Fitness function


EvoNN1 and EvoNN2 were defined by the same fitness function. This is because the proposed evolutionary designs were thought to minimize the classification error and, therefore, the order of the weights did not have any influence on the fitness function. Both approaches used TR ⊂ D exclusively to obtain the contributions of the neighbours in the training step. Since we know the labels of the instances from TR, the fitness function was based on a cross-validation error rate using kNN and the weighted voting system.
Fig. 1 shows the fitness calculation with m × s cross-validations, where m stands for the number of iterations of the validation process (line 3) and s is the number of partitions of the training data TR (line 4). Thus, the set TR is randomly divided into the bags B1, B2, ..., Bs for each validation. Then, every bag Bj is evaluated through a classification process by using TR − Bj as a training set. This evaluation is driven by the function Evaluate, which we will describe later. The classification error on every Bj is accumulated on average by partialError (lines 7 and 9), and by error in every validation (line 10). Finally, the fitness value is the result of calculating the mean over all the validations (line 12).
The input parameters of the function Evaluate are the weight vector W, the k value, and the subsets TR − Bj and Bj (line 7).


Therefore, the result of this function is the error rate on Bj, taking TR − Bj as the reference to calculate the neighbours.
For every single instance from the set used to measure the fitness (line 16), the label returned by the function NearestN is the majority one according to the k nearest instances belonging to the set used as training data (line 17). If the returned label does not correspond to the real label of the testing instance, the error is increased by 1 (line 19). Then, the resulting error is normalized by the size of the set used as testing data (line 22). Therefore, the value returned by Evaluate is a real number between 0 (all instances are well classified) and 1 (all instances are misclassified).
The function NearestN calculates the nearest instances to the example y belonging to the set under evaluation (line 24 and seq.). Every example of the neighbours' bag is then inserted in a sorted set according to the distance to y. Thus, the example at the first position will be the nearest to y and the one at the last position will be the furthest (line 27). When the algorithm selects the k nearest neighbours from the sorted set (line 29), the majority label is returned according to the relative contribution of each neighbour expressed by the vector W and according to Eq. (6) (lines 30 and 31).
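For completeness, a compact Python counterpart of the pseudocode in Fig. 1 could look as follows (Euclidean distance, the default values m = 3 and s = 5, and all names are assumptions made for the sketch; the paper does not state them):

import numpy as np

def pred_label(x, X, y, w):
    # same weighted voting rule as in the earlier sketch of Eq. (6)
    nn = np.argsort(np.linalg.norm(X - x, axis=1))[:len(w)]
    votes = {}
    for wi, lab in zip(w, y[nn]):
        votes[lab] = votes.get(lab, 0.0) + wi
    return max(votes, key=votes.get)

def fitness(W, X_tr, y_tr, m=3, s=5, seed=0):
    # m repetitions of an s-fold split of TR; returns the mean weighted-kNN error (cf. Fig. 1)
    rng = np.random.default_rng(seed)
    n = len(y_tr)
    total = 0.0
    for _ in range(m):
        folds = np.array_split(rng.permutation(n), s)         # divide TR in s bags B1..Bs
        rep_err = 0.0
        for bag in folds:
            train = np.setdiff1d(np.arange(n), bag)           # TR - Bj as reference set
            errs = [pred_label(X_tr[i], X_tr[train], y_tr[train], W) != y_tr[i] for i in bag]
            rep_err += np.mean(errs)
        total += rep_err / s
    return total / m

# toy usage with random data and k = 5 weights
rng = np.random.default_rng(0)
X, y = rng.random((60, 4)), rng.integers(0, 2, 60)
print(fitness(np.array([1.0, 0.8, 0.6, 0.4, 0.2]), X, y))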
2.2.4. Generational policy
Regarding the transition between generations, we chose an elitist design where the best individual was transferred from one generation to the next without being affected by the mutation operator. The remaining population was built as follows: with N being the number of individuals, C − 1 individuals were created by cloning the best individual from the previous generation, and the next N − C individuals resulted from the crossover operation. The selection of the individuals to cross was carried out by the tournament method. Furthermore, all individuals except the first one were subject to mutation with a probability of p.
3. Results
Applying optimization techniques to a kNN classifier is a line of research that has been widely discussed in the literature with considerable success. Finding an optimal weighting (for features or instances) increases the degrees of freedom of kNN. This allows the model to achieve a better fit in the training step and, consequently, to improve the accuracy in testing. The novelty of this work lies in that we do not weight features or instances but the influence of each of the k nearest neighbours. In the traditional kNN model, the selected neighbours participate with a unitary vote. Our approach modifies the weight of the votes to modulate the contribution of each neighbour. Thus, our approach introduces a flexibility mechanism able to learn geometric arrangements of the different classes that depend on the domain under study. For example, if we focus on a dataset where the first neighbour is twice as important as the second one, and the second neighbour is in turn twice as important as the third, fourth and fifth ones, the traditional kNN is not able to exploit this singularity, but our proposal could suggest a weighted vector (4, 2, 1, 1, 1), which would be an optimal solution.
To assess the quality of our approaches, we used 48 datasets from the UCI repository (Lichman, 2013) with different types of features and numbers of classes. All data were preprocessed with the same techniques, i.e., binarization of nominal features, replacement of missing values and normalization to avoid the Hughes effect. Regarding the evolutionary search configuration, and after a trial-and-error process, we used a population of 100 individuals, 100 generations, 10% of elitism and a mutation probability of 0.1. Regarding the parameters α (crossover), g (mutation) and k (number of neighbours), their values were set at 0.5, 20 and 5 respectively for both EvoNN1 and EvoNN2.
Although the experiments were carried out on four Intel Xeon E7-4820 processors, an increase in the computational cost is usually derived when using evolutionary computation. Thus, the CPU time for the optimization process can be expressed as (#Generations × #Individuals × kNN time) / #threads, where the number of available threads depends on the workload. Taking the German Credit Data as an example (credit g in Table 1, with 1000 instances and 20 features), the evolutionary algorithm took about thirteen and a half minutes to run once.
In a first level, a comparison between EvoNN1 and EvoNN2 was established. Table 1 shows the results obtained. Every row represents the mean accuracy on a dataset after five 10-fold cross-validations (10FCV) with different random seeds. We provide the mean accuracy to decrease the influence of randomness in the evolutionary algorithms. The results show that EvoNN1 obtained better results for most of the datasets.

Table 1
Comparison of accuracies of EvoNN1 and EvoNN2. In bold, the best.

Dataset        EvoNN1   EvoNN2     Dataset            EvoNN1   EvoNN2
anneal          97.98    97.95     heart h             86.78    86.04
arrhythmia      60.13    59.76     heart stat          78.38    78.13
audiology       67.12    65.97     hepatitis           83.44    81.26
australian      83.89    84.11     hypothyroid         93.06    93.04
autos           75.42    75.12     ionosphere          90.45    90.36
balance         86.27    86.23     iris                99.78    99.77
breast c.       71.27    69.64     kr vs kp            98.5     98.55
breast w.       97.15    97.17     labor               81.69    81.22
bridges v1      70.7     67.86     landsat             93.86    93.77
bridges v2      69.79    66.42     liver disorders     61.49    60.72
car             83.43    86.89     lung cancer         81.4     75.79
cmc             63.21    63.07     lymph               80.87    80.47
colic           79.67    79.55     mfeat factors       97.1     97.13
credit a        84.34    83.75     mfeat fourier       86.19    85.91
credit g        72.17    72.81     mfeat karhunen      98.04    98
cylinder b      80.19    80.19     mfeat morph.        97.24    97.31
dermatology     95.39    95.62     mfeat pixel         96.93    96.92
diabetes        71.73    71.6      mfeat zernike       92.05    92.05
ecoli           93.17    92.81     molecular bio.      38.34    32.47
flags           55.4     54.73     monks 1             47.39    54.79
glass           60.8     62.57     monks 2             47.12    54.35
haberman        72.77    68.49     monks 3             47.78    46.22
hayes           62.35    62.26     mushroom            100      100
heart c         82.52    82.14     nursery             95.91    96.22
After testing the performance of both versions, we applied the Wilcoxon signed-rank test to try to assure that the differences between the two versions were statistically significant. The reason for using a non-parametric test lies in the fact that the results did not meet the necessary conditions to apply parametric tests, especially the sphericity condition (Demšar, 2006; García & Herrera, 2008). The statistic for Wilcoxon was 290.0 and the p-Value 0.0062, so we could state that they behaved significantly differently from each other.
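As a hedged illustration (SciPy is not necessarily the tooling used by the authors), the same kind of paired comparison can be run as follows; the eight accuracy pairs below are only the first rows of Table 1, so the output will not reproduce the statistic of 290.0 reported over all 48 datasets:

from scipy.stats import wilcoxon

# per-dataset mean accuracies of EvoNN1 and EvoNN2 (first rows of Table 1, as an illustration)
evonn1 = [97.98, 60.13, 67.12, 83.89, 75.42, 86.27, 71.27, 97.15]
evonn2 = [97.95, 59.76, 65.97, 84.11, 75.12, 86.23, 69.64, 97.17]

stat, p = wilcoxon(evonn1, evonn2)   # paired, non-parametric comparison
print(stat, p)                       # a p-value below 0.05 would indicate a significant difference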
To measure the quality of our best approach, we established a second level of comparison among IBk (the implementation of kNN in the WEKA framework (Hall et al., 2009)), WKNN, UWKNN, DWKNNv1 (Gou et al., 2011), DWKNNv2 (Gou et al., 2012), and EvoNN1. All the weighting algorithms were developed from the original IBk by changing its voting system and keeping the same k value of 5. Table 2 shows the mean accuracy obtained by the analyzed algorithms. Every dataset was evaluated for each technique as the mean of five 10FCV using five different seeds. As can be seen, the performance of our algorithm was the best in 25 out of the 48 datasets, and the second best in 7 out of the remaining 23.
Although our approach seemed to outperform the rest of the competitors, the results were statistically validated again to reinforce this conclusion. Thus, we carried out a non-parametric Friedman test and a Holm post-hoc procedure to elucidate whether the performance of the different algorithms was statistically different. The first step of the Friedman test is the calculation of the mean ranking reached by each technique (a ranking of 1 is the best).


Table 2
Accuracy of every studied algorithm throughout 48 datasets from UCI.

Datasets            EvoNN1   IBk     WKNN    UWKNN   DWKNN1  DWKNN2
anneal               97.98   97.1    97.85   98.43   99.14   98.23
arrhythmia           60.13   59.39   57.53   58.45   54.9    57.23
audiology            67.12   60.73   68.4    68.78   66.48   68.44
australian           83.89   84.38   82.16   83.46   80.4    82.11
autos                75.42   59.79   73.22   75.87   76.19   76.13
balance              86.27   88.28   86.81   82.65   82.01   86.81
breast c.            71.27   73.86   69.63   70.95   68.12   69.63
breast w.            97.15   97      96.09   96.36   95.39   96.04
bridges v1           70.7    58.34   65.11   64.93   64.08   65.24
bridges v2           69.79   60.47   62.75   63.37   62.15   62.75
car                  83.43   93.13   93.13   88.16   88.16   93.13
cmc                  63.21   45.83   45.26   45.37   44.16   44.56
colic                79.67   79.2    73.7    77.32   70.79   73.5
credit a             84.34   83.46   81.41   83.65   80.12   81.47
credit g             72.17   72.59   71.87   72.9    71.33   71.79
cylinder b           80.19   72.29   78.7    77.37   79.16   78.81
dermatology          95.39   96.26   96.32   95.79   95.16   96.32
diabetes             71.73   74.72   72.67   72.91   70.59   72.54
ecoli                93.17   86.49   82.84   83.78   80.24   82.18
flags                55.4    54.31   59.12   60.31   57.93   59.12
glass                60.8    66.13   70.27   68.05   69.49   70.07
haberman             72.77   71.08   70.46   67.55   64.92   69.77
hayes                62.35   27.66   67.52   67.47   74.62   67.52
heart c              82.52   83.31   80.6    80.14   76.66   80.27
heart h              86.78   79.41   78.56   78.56   76.67   78.54
heart stat           78.38   78.36   76.59   79.39   75.7    76.31
hepatitis            83.44   82.67   81.8    83.91   79.95   80.64
hypothyroid          93.06   93.25   92.62   93.13   91.2    92.5
ionosphere           90.45   85.61   87.46   86.33   86.89   87.62
iris                 99.78   95.59   95.19   95.66   95.66   95.19
kr vs kp             98.5    96.2    95.95   95.12   93.7    95.95
labor                81.69   80.07   84.01   85.78   85.67   84.13
landsat              93.86   90.29   90.53   90.37   90.13   90.56
liver disorders      61.49   58.36   61.28   60.57   59.34   61.01
lung cancer          81.4    80.95   71.44   78.02   67.87   71.44
lymph                80.87   78.53   82.82   84.97   82.37   82.85
mfeat factors        97.1    96.56   96.69   96.51   96.08   96.68
mfeat fourier        86.19   81.56   81.02   81.26   80.07   80.94
mfeat karhunen       98.04   96.11   96.8    96.75   96.18   96.77
mfeat morph.         97.24   71.08   68.03   68.44   65.9    67.7
mfeat pixel          96.93   95.92   96.71   96.78   96.36   96.7
mfeat zernike        92.05   80.53   79.09   79.39   79.03   79.07
molecular bio.       38.34   32.57   33.56   34.76   35.24   33.56
monks 1              47.39   52.25   35.8    39.51   37.19   35.8
monks 2              47.12   54.13   38.27   40.78   38.03   38.27
monks 3              47.78   38.48   35.28   37.59   38.93   35.45
mushroom             100     100     100     100     100     100
nursery              95.91   98.36   93.63   96.1    93.23   93.63

Table 3
Mean ranking reached by every compared technique.

Technique   Ranking
EvoNN1      2.270
UWKNN       3.062
IBk         3.458
WKNN        3.594
DWKNN2      3.781
DWKNN1      4.833

Table 3 shows the rankings in our case. Then, we carried out the Friedman test. The statistic for Friedman was 48.95, distributed according to a chi-square with 5 degrees of freedom. The p-Value for Friedman was around 2.302E−9, so the null hypothesis (no statistical difference among the compared techniques) could be rejected with α = 0.05.
Because the Friedman test should be accompanied by a post-hoc procedure for multiple pairwise comparisons, we used the Holm procedure. This procedure took our approach (EvoNN1) as the control method and compared it with each other technique by a pairwise Friedman test. The results of the procedure can be seen in Table 4 (p-Value, Friedman statistic and adjusted α for the Holm procedure). In this case, every test rejected the hypothesis of no pairwise difference (p-Value below the adjusted α), so we could state that our algorithm was significantly better than its competitors from a statistical point of view, taking into account that it obtained the best ranking and also behaved statistically differently from the rest.

Table 4
Results for the Holm post-hoc procedure.

Algorithm   p-Value     Statistic   Holm's adjusted α
DWKNNv1     1.94E−11    6.710       0.010
DWKNNv2     7.65E−5     3.955       0.013
WKNN        5.32E−4     3.464       0.017
IBk         0.002       3.109       0.025
UWKNN       0.038       2.073       0.050
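For readers who want to reproduce this kind of validation pipeline, a sketch with SciPy and statsmodels follows. These libraries are an assumption, not the authors' tooling; random numbers stand in for the 48 × 6 accuracy matrix of Table 2, so the tests will rightly report no differences here, and the pairwise step uses Wilcoxon tests instead of the z statistics reported in Table 4:

import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon
from statsmodels.stats.multitest import multipletests

# accuracy matrix: rows = datasets, columns = techniques (random stand-in for Table 2)
rng = np.random.default_rng(0)
acc = rng.random((48, 6))
names = ["EvoNN1", "IBk", "WKNN", "UWKNN", "DWKNN1", "DWKNN2"]

# Friedman test over all techniques at once
stat, p = friedmanchisquare(*[acc[:, j] for j in range(acc.shape[1])])
print("Friedman:", stat, p)

# pairwise comparisons against the control (EvoNN1) with Holm's correction
raw_p = [wilcoxon(acc[:, 0], acc[:, j]).pvalue for j in range(1, 6)]
reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method="holm")
print(dict(zip(names[1:], zip(adj_p.round(4), reject))))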
From our experimentation, some facts should be outlined. First, an evolutionary algorithm adjusting the contribution of the neighbourhood improved the behaviour of the classical kNN rule.


This fact could not be completely assured for the rest of the weighting techniques, probably due to a better adaptability of metaheuristics compared with strict functions such as those suggested by DWKNN or UWKNN.
Second, the correct working of the evolutionary approach was highly correlated with the distance of the weighted neighbours to the unlabelled instances in the training phase. In other words, although EvoNN2 had more degrees of freedom in the search space, EvoNN1 reached the best performance. Although this proved true, we think this fact can be related to problems in the evolutionary search in EvoNN2, so more work could be needed to make it reach its potentially better weights.
4. Conclusions
This work presented an evolutionary method to optimize the kNN algorithm. Unlike the classical approaches that use weights on features and instances of data, we focused on adjusting the contribution of each one of the k nearest neighbours. The results showed a satisfactory performance that was statistically supported.
On the other hand, despite the use of multitasking architectures, optimization techniques, and more specifically evolutionary heuristics, usually consume more computational resources than lazy classifiers. Therefore it will be necessary to establish flexible and scalable parallelization pathways that are in line with the growing amount of data.
In future works, we will focus on alternative metrics to accuracy for use in unbalanced data classification. Thus, precision, recall, false positive rate, false negative rate or F-measure could be considered as the objective function to optimize.
In addition, Big Data is a clear research target. Although there are not yet many studies on using kNN in Big Data, we will try to extrapolate our method to apply it to massive datasets.
Regression problems are approachable too. The kNN algorithm works effectively when the class to predict is numeric, so our method can be used in a natural way.
Finally, the strengths of our system were tested on 48 heterogeneous domains and the obtained results encourage us to implement future experiments on more specific problems. In this context, we have experience with remote sensing data (García-Gutiérrez, Gonçalves-Seco, & Riquelme-Santos, 2011) and this approach could complement the cited study in a new work.
References
AlSukker, A., Khushaba, R., & Al-Ani, A. (2010). Optimizing the k-NN metric weights using differential evolution. In Proceedings of the international conference on multimedia computing and information technology (MCIT), 2010 (pp. 89–92).
Berikov, V. (2014). Weighted ensemble of algorithms for complex data clustering. Pattern Recognition Letters, 38, 99–106.
Blachnik, M., & Duch, W. (2011). LVQ algorithm with instance weighting for generation of prototype-based rules. Neural Networks, 24(8), 824–830. (Artificial Neural Networks: Selected Papers from ICANN 2010).
Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7, 1–30.
Dudani, S. A. (1976). The distance-weighted k-nearest-neighbor rule. IEEE Transactions on Systems, Man and Cybernetics, 6(4), 325–327.
Eshelman, L. J., & Schaffer, J. D. (1993). Real-coded genetic algorithms and interval-schemata. In D. L. Whitley (Ed.), Foundations of genetic algorithms 2 (pp. 187–202). San Mateo, CA: Morgan Kaufmann.
Fernandez, F., & Isasi, P. (2008). Local feature weighting in nearest prototype classification. IEEE Transactions on Neural Networks, 19(1), 40–53.
García-Gutiérrez, J., Gonçalves-Seco, L., & Riquelme-Santos, J. C. (2011). Automatic environmental quality assessment for mixed-land zones using lidar and intelligent techniques. Expert Systems with Applications, 38(6), 6805–6813. http://dx.doi.org/10.1016/j.eswa.2010.12.065.
García, S., & Herrera, F. (2008). An extension on statistical comparisons of classifiers over multiple data sets for all pairwise comparisons. Journal of Machine Learning Research, 9, 2677–2694.
Gou, J., Du, L., Zhang, Y., & Xiong, T. (2012). A new distance-weighted k-nearest neighbor classifier. Journal of Information & Computational Science, 9(6), 1429–1436.
Gou, J., Xiong, T., & Kuang, J. (2011). A novel weighted voting for k-nearest neighbor rule. Journal of Computers, 6(5), 833–840.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The WEKA data mining software: an update. SIGKDD Explorations, 11(1).
Lichman, M. (2013). UCI machine learning repository. http://archive.ics.uci.edu/ml.
Mateos-García, D., García-Gutiérrez, J., & Riquelme-Santos, J. C. (2012). On the evolutionary optimization of k-NN by label-dependent feature weighting. Pattern Recognition Letters, 33(16), 2232–2238. doi:10.1016/j.patrec.2012.08.011.
Mohemmed, A. W., & Zhang, M. (2008). Evaluation of particle swarm optimization based centroid classifier with different distance metrics. In Proceedings of the IEEE congress on evolutionary computation '08 (pp. 2929–2932).
Paredes, R., & Vidal, E. (2006). Learning weighted metrics to minimize nearest-neighbor classification error. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(7), 1100–1110.
Park, C. H., & Kim, S. B. (2015). Sequential random k-nearest neighbor feature selection for high-dimensional data. Expert Systems with Applications, 42(5), 2336–2342. http://dx.doi.org/10.1016/j.eswa.2014.10.044.
Park, Y., Park, S., Jung, W., & Lee, S.-g. (2015). Reversed CF: a fast collaborative filtering algorithm using a k-nearest neighbor graph. Expert Systems with Applications, 42(8), 4022–4028. http://dx.doi.org/10.1016/j.eswa.2015.01.001.
Raymer, M., Doom, T., Kuhn, L., & Punch, W. (2003). Knowledge discovery in medical and biological datasets using a hybrid Bayes classifier/evolutionary algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 33(5), 802–813.
Raymer, M., Punch, W., Goodman, E., Kuhn, L., & Jain, A. (2000). Dimensionality reduction using genetic algorithms. IEEE Transactions on Evolutionary Computation, 4(2), 164–171.
Rodger, J. A. (2014). A fuzzy nearest neighbor neural network statistical model for predicting demand for natural gas and energy cost savings in public buildings. Expert Systems with Applications, 41(4, Part 2), 1813–1829. http://dx.doi.org/10.1016/j.eswa.2013.08.080.
Tahir, M. A., Bouridane, A., & Kurugollu, F. (2007). Simultaneous feature selection and feature weighting using hybrid tabu search/k-nearest neighbor classifier. Pattern Recognition Letters, 28(4), 438–446.
Zhi, X., Fan, J., & Zhao, F. (2014). Robust local feature weighting hard c-means clustering algorithm. Neurocomputing, 134, 20–29.
