Article
A Comparison of Feature Selection and Forecasting Machine
Learning Algorithms for Predicting Glycaemia in Type 1
Diabetes Mellitus
Ignacio Rodríguez-Rodríguez 1, * , José-Víctor Rodríguez 2 , Wai Lok Woo 3 , Bo Wei 3 and
Domingo-Javier Pardo-Quiles 2
Abstract: Type 1 diabetes mellitus (DM1) is a metabolic disease derived from falls in pancreatic insulin production resulting in chronic hyperglycemia. DM1 subjects usually have to undertake a number of assessments of blood glucose levels every day, employing capillary glucometers for the monitoring of blood glucose dynamics. In recent years, advances in technology have allowed for the creation of revolutionary biosensors and continuous glucose monitoring (CGM) techniques. This has enabled the monitoring of a subject's blood glucose level in real time. On the other hand, few attempts have been made to apply machine learning techniques to predicting glycaemia levels, but dealing with a database containing such a high level of variables is problematic. In this sense, to the best of the authors' knowledge, the issues of proper feature selection (FS), the stage before applying predictive algorithms, have not been subject to in-depth discussion and comparison in past research when it comes to forecasting glycaemia. Therefore, in order to assess how a proper FS stage could improve the accuracy of the glycaemia forecasted, this work has developed six FS techniques alongside four predictive algorithms, applying them to a full dataset of biomedical features related to glycaemia. These were harvested through a wide-ranging passive monitoring process involving 25 patients with DM1 in practical real-life scenarios. From the obtained results, we affirm that Random Forest (RF) as both predictive algorithm and FS strategy offers the best average performance (Root Mean Square Error, RMSE = 18.54 mg/dL) throughout the 12 considered predictive horizons (up to 60 min in steps of 5 min), while Support Vector Machines (SVM) show the best accuracy as a forecasting algorithm when considering, in turn, the average of the six FS techniques applied (RMSE = 20.58 mg/dL).
Keywords: diabetes mellitus; machine learning; feature selection; time series forecasting
1. Introduction
Type 1 diabetes mellitus (DM1) is generally accompanied by excessive blood sugar levels caused by the body's failure to create insulin. In healthy subjects, blood glucose levels are regulated by glucose homeostasis, a closed-loop system [1]. The pancreas is home to the β cells that react to excessive glucose levels and create insulin to combat hyperglycemia. In DM1 subjects, such regulatory processes do not occur. DM1 is an autoimmune disease that causes the immune system to attack the pancreas' insulin-producing cells, and it is the most aggressive type of diabetes. DM1 subjects are incapable of producing insulin, so they have to rely on either exogenous injections of the hormone or on wearing an insulin pump for the regulation of glucose levels [2]. Management of diabetes aims to maintain homeostasis and to keep blood glucose close to normal levels.
Many of these predictive algorithms have only been applied individually to blood glucose prediction; to the best of the authors' knowledge, there has been no comparative study covering not only the prediction techniques but also the feature selection techniques. It is difficult to make fair comparisons among the published results, since each study uses a different database with different added features, evaluates performance in different ways (at different predictive horizons, with different error metrics, etc.), and covers a different monitoring period. Consequently, it is hard to determine which approach behaves better, and no overall conclusions can be drawn. In addition, the preliminary phase of variable selection has not been compared either, so its influence on the improvement of the prediction has not been studied. The aim of this paper is therefore to make a fair comparison of these algorithms using a complete database. Moreover, the variables collected in our database are rarely included in studies on blood glucose prediction.
Thus, in this paper, a rich and complete dataset has been acquired, and a novel feature selection process that enhances both the performance and the comparison of the predictions has been developed. We believe that this paper will enable a confident combination of feature selection and forecasting techniques to achieve more precise predictions within an acceptable predictive horizon (PH).
In short, the main contributions of this paper are as follows:
- To provide a brief literature review on variable selection and prediction methods for glycaemia values in diabetic subjects.
- To use an innovative database in the field of DM1, both in terms of the number of
patients/variables considered and the monitoring time covered.
- To test different variable selection techniques.
- To combine these feature selection techniques with different predictive algorithms.
- To discuss the influence of the variable selection techniques on the performance of the
predictive algorithm, as well as to study the accuracy achieved.
This research has employed a number of cutting-edge modeling and forecasting
techniques. Implementation and analysis have been undertaken using six feature selection
techniques alongside four forecasting techniques. The paper is organized as follows:
Section 2 describes some previous works on forecasting glycaemia in DM1 to frame our
research. In Section 3, we present the feature selection techniques and predictive algorithms.
The monitoring campaign is described in Section 4, while Section 5 details the methodology
followed in our work, with descriptions of the ML techniques. Section 6 offers the main
results and discussion, and finally, we conclude the paper in Section 7.
2. Related Works
As mentioned in Section 1, when predicting blood glucose levels it is crucial to employ an accurate range of variables and an effective data collection methodology, and to take note of the way these have previously been used. The variables must be treated as time-series data, since the history of past values is considered significant. Feature selection on a time series is not the same as feature selection on standard data. With standard static data, target values refer only to the features' contemporary values. With time-series feature selection, however, the target values are related to feature values at various points in the past as well as to the contemporary value. This means that in time-series feature selection it is essential to excise redundant/irrelevant variables and features and to select the relevant previous values for the creation of an effective dataset. It is widely recognized that previous values matter when forecasting glucose levels. One example is the research of Eskaf et al. [9], who found, by employing discrete Fourier transformations, that blood glucose levels vary as a result of meals within a single timeframe of eating. Selecting the correct influential variables, and thereby reducing the dimensionality, is therefore both possible and crucial. This is an important processing stage [10] prior to applying a data mining algorithm to any dataset.
Feature selection methodologies for time series can be separated into filter, wrapper, and embedded techniques [11]. Filter techniques use only the data themselves in deciding which features to retain. Wrapper techniques employ a learning algorithm wrapped around the feature search, selecting features on the basis of its performance. With embedded methods, weighting is employed to control the parameter values. Variable selection methods of all three kinds have been applied to diabetes mellitus. Balakrishnan et al. [12] applied Support Vector Machines (SVM) to rank variables affecting type 2 diabetes, and some hybrid methods have been proposed to optimize the diagnosis of diabetes [13]. However, few studies have applied variable selection methods to the features that affect the immediate course of blood glucose in patients with type 1 DM. This could be because some variables have only recently been considered for predicting blood sugar [14]. In 2019, a sequential backward selection algorithm was successfully applied using a linear model in a cross-validation (CV) setting, obtaining an optimized and reduced subset [15]. Despite this, a study comparing the performance of different feature selection methods applied to diabetic patients' glycaemia has not yet been carried out.
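To make the filter/wrapper distinction concrete, the following minimal sketch (in Python with scikit-learn, purely illustrative; the experiments in this paper were run in WEKA, and the synthetic data here stand in for the real biomedical features) contrasts a univariate filter score with a wrapper search driven by a learner's performance:

```python
# Illustrative filter vs. wrapper feature selection on synthetic data.
# The paper's experiments used WEKA; this scikit-learn sketch only
# demonstrates the two strategies.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE, SelectKBest, f_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=500, n_features=20, n_informative=5,
                       random_state=0)

# Filter: score each feature against the target on its own, keep the top k.
filter_sel = SelectKBest(score_func=f_regression, k=5).fit(X, y)
print("filter keeps:", np.flatnonzero(filter_sel.get_support()))

# Wrapper: recursively discard features according to the learner's weights.
wrapper_sel = RFE(LinearRegression(), n_features_to_select=5).fit(X, y)
print("wrapper keeps:", np.flatnonzero(wrapper_sel.get_support()))
```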
Numerous attempts have been made to develop reliable glucose prediction for DM1 patients. There have been approaches from a univariate point of view [16], using Autoregressive Integrated Moving Average (ARIMA), Random Forest (RF) and Support Vector Machines (SVM), with acceptable results. However, although univariate approaches can be interesting in computationally restricted environments, multivariate methods have demonstrated higher accuracy [17]. In this regard, some forecasting strategies whose results improve after a proper preliminary feature selection stage should be highlighted.
Linear Regression (LR) is probably the simplest methodology. This group of models seeks an estimate of the model parameters such that the sum of the squared errors is minimized. Although it is the simplest, its accuracy can be sufficient, and thanks to this simplicity it is easily executable even on limited hardware. In any case, recent approaches using Least Absolute Shrinkage and Selection Operator (LASSO) regression have achieved acceptable accuracy and good performance [18].
There are other methods in the same vein, for example, Gaussian Processes (GP) with Radial Basis Function kernels (RBF) [19], which permit overall smoothness and an unlimited number of basis functions. These are not often employed, although some researchers have used such techniques and the results have shown promise [20]. Some recent research has also looked at GP [21], examining the potential for automatic insulin delivery that could reduce the number of hypoglycemic events.
GP is a non-parametric methodology that revolves around modeling the observable responses at a number of points in the training data (function values) as multivariate normal random variables [22]. The function values are assumed to be distributed in such a way that the function operates smoothly: when the corresponding input vectors are close (in the Euclidean distance sense), the two function values will be highly correlated, with the correlation decaying as the vectors diverge. Applying basic probability manipulations to this assumed distribution yields the posterior distribution of hitherto unobserved function values.
RF algorithms use a method referred to as bagging, which re-samples the data instances a number of times in order to create several training subsets from the same training data [23]. A decision tree is then built for every training subset until a tree ensemble has been constructed. Every tree then casts a unit vote influencing the outcome for the incoming data instance's cost label. Xu et al. [24] predicted type 2 diabetes risk employing RF on a public hospital dataset, and this methodology performed better (85% success) than a number of other methodologies such as the ID3 (Iterative Dichotomiser 3) algorithm (78.57%), the naïve Bayes algorithm (79.89%), and the AdaBoost algorithm (84.19%).
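A minimal sketch of the bagging mechanism (scikit-learn on synthetic data, not the WEKA configuration used later in this paper): each bootstrap resample trains one tree, and the ensemble aggregates the trees' outputs.

```python
# Bagging sketch: bootstrap-resample the training instances, fit one tree
# per resample (the default base learner of BaggingRegressor is a decision
# tree), and average the trees' outputs. Synthetic data, illustrative only.
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=1)

bagger = BaggingRegressor(
    n_estimators=50,   # size of the tree ensemble
    bootstrap=True,    # resample instances with replacement
    random_state=1,
).fit(X, y)

print(bagger.predict(X[:3]))  # each output is the average over 50 trees
```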
SVM is a dual learning algorithm that processes examples just by computing dot-products. Such dot-products between feature vectors can be computed efficiently by employing a kernel function, without explicitly constructing the corresponding feature space. Once the SVM learner has been supplied with the kernel function, it seeks out a hyperplane that separates negative and positive examples while simultaneously maximizing the size of their separation (margin). SVM is not prone to over-fitting and generalizes well as a result of the max-margin criterion employed throughout optimization. Moreover, although MLP solutions may only reach a local optimum, SVMs will always converge to a global optimum as a result of the corresponding convex optimization formulation. The work in [25] offered a useful method of hypoglycemia detection based on SVM, employing galvanic skin response, skin temperature, and heart rate monitoring from a small wearable band. Regrettably, the size and type of the dataset used in that research unintentionally limited the applicability of the results. SVM revolves around employing high-dimensional feature spaces (built by transforming the original variables) and penalizing the resulting complexity via a penalty term integrated within the error function [26].
Other approaches like the stacking-based General Regression Neural Network (GRNN) ensemble model are truly promising [27–29], but they have not previously been applied to DM1. In this sense, the authors of this work intend to analyze them in future research.
In relation to the foregoing, it can be concluded that these techniques have shown
promise in terms of forecasting the dynamics of glycaemia. Nevertheless, as far as the
authors are aware, no research has yet been undertaken employing selection techniques
and a variety of forecasting algorithms utilizing real-world data and a proper feature set to
reveal the most accurate of the options available.
predictions are [37]. One way of doing this is to compute the correlation between the features and the targets (our predictions); the optimal features are those with the highest correlation.
Embedded methods (shrinkage) represent a selection strategy built into the model. In this strategy, features are neither explicitly selected nor excised; instead, certain controls are applied to the parameter values (weights). One such technique is LASSO regression, in which regularization drives certain regression coefficients towards zero [38]; as coefficients fall to zero they are dropped/rejected. Another technique is Ridge Regression (Tikhonov regularization), which incorporates a penalty that increases with the square of the coefficient magnitude [39]. Every coefficient is diminished by a similar factor, which means no predictor undergoes elimination.
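The contrast between the two penalties can be seen in a few lines (a scikit-learn sketch on synthetic data; the regularization strength alpha = 1.0 is arbitrary): LASSO drives weak coefficients exactly to zero, whereas ridge shrinks all coefficients but eliminates none.

```python
# Shrinkage sketch: the L1 (LASSO) penalty zeroes out weak coefficients,
# while the L2 (ridge) penalty only shrinks them. Synthetic data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("coefficients set to zero by LASSO:", np.sum(lasso.coef_ == 0))
print("coefficients set to zero by ridge:", np.sum(ridge.coef_ == 0))  # usually 0
```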
A selection of the above techniques will be employed in this research using a Ranker strategy [40], which ranks features so as to minimize the Root Mean Squared Error (RMSE) and leads to reductions in the feature set; minimizing this metric in turn improves the forecast. The two groups follow differing approaches, one univariate and one multivariate. Univariate techniques are quicker and easier to scale, but they do not take account of variable dependencies. Conversely, multivariate techniques can model feature dependencies, but they are not as fast or as easy to scale as univariate techniques [41]. The methodology section gives further details on the selected techniques.
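For reference, the metric minimized here is the usual root mean squared error over the N evaluated forecasts, where y_i is the measured glycaemia value and its prediction is denoted with a hat:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i-\hat{y}_i\right)^2}$$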
3.2. Forecasting
Once the FS is complete, we can begin the time-series forecasting task. Wolpert and Macready [42] stated that when we lack information regarding the underlying model, we cannot say with certainty that any particular model will always outperform another.
This means that the optimal strategy is to experiment with a number of techniques in order
to discover the most effective model. This research has used both linear and non-linear
techniques to focus on the algorithms that show the greatest promise.
Linear regression is one of the simplest techniques. In this model, we search for an estimate of the model parameters that minimizes the sum of the squared errors [43]. Variants include partial least squares and penalized models, e.g., ridge regression or LASSO.
One advantage claimed by proponents of such models is that they are easy to interpret. Relationships are indicated by the coefficients, and these are generally very simple to calculate, meaning that we can afford to employ a large number of features. However, the performance of these models may be limited [44]. Good results are achieved if the predictor/response relationship falls on a hyperplane; with higher-order relationships, e.g., quadratic or cubic, such models may not accurately capture the nonlinearity, and we have to look for a different approach [45].
Certain models are capable of capturing non-linear trends without our needing to know the precise type of nonlinearity prior to constructing the model. One of the most widely used is Support Vector Machines (SVM). These are dual learning algorithms that process the data by computing dot-products [46]. Such dot-products between feature vectors can be computed efficiently using a kernel function [47]. Using such a function, SVM learners seek out the hyperplane separating the examples with the maximum separation (margin) between them. SVMs are recognized to be resistant to overfitting and to generalize well as a result of the max-margin criterion employed during the optimization process. Additionally, although alternative solutions may produce only local optima, SVM will converge to a global optimum due to the corresponding convex optimization formulation [48].
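A minimal regression counterpart of this idea is sketched below (scikit-learn's SVR on synthetic data; note that the WEKA configuration used later in this paper relies on SMOreg with a polynomial kernel, so the RBF kernel and parameters here are illustrative assumptions):

```python
# SVM regression sketch: the kernel evaluates dot-products in a
# high-dimensional feature space without constructing it explicitly,
# and the fit maximizes the margin. Synthetic data, illustrative only.
from sklearn.datasets import make_regression
from sklearn.svm import SVR

X, y = make_regression(n_samples=300, n_features=6, noise=2.0, random_state=2)

svr = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X, y)
print(svr.predict(X[:3]))
```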
There has been considerable interest in the Regression Trees family of modeling algorithms in recent times. Tree-based modeling employs if/then statements to find the predictors that will be used for data partitioning; within the resulting subsets, a model is employed for forecasting outcomes [49]. Statistically, the addition of randomness when constructing the trees helps to reduce correlations between predictors. This is used in the Random Forests (RF) technique [50]. All models in the ensemble are employed to build predictions for new data, and the average of their predictions supplies the ultimate forecast.
RF models achieve variance reduction through the selection of robust complex learners with low bias. This decreases the number of errors and additionally proves strong in overcoming noisy responses [51].
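As a sketch of how such an ensemble can be pointed at glycaemia forecasting (scikit-learn on a synthetic series; the lag count, horizon, and train/test split are hypothetical choices, not the WEKA pipeline used in this work), past samples become lagged features and the forest's averaged trees supply the forecast at the chosen PH:

```python
# RF forecasting sketch: build lagged features from a glucose-like series
# and predict the value one predictive horizon (PH) ahead. Synthetic data;
# lag count, horizon and split are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
glucose = 120 + 30 * np.sin(np.arange(2000) / 40) + rng.normal(0, 5, 2000)

lags, horizon = 12, 6  # 12 past samples (one hour at 5 min), 30 min ahead
X = np.array([glucose[t - lags:t] for t in range(lags, len(glucose) - horizon)])
y = glucose[lags + horizon:]

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X[:-200], y[:-200])     # hold out the last 200 samples for testing
pred = rf.predict(X[-200:])    # each prediction is averaged over all trees
print("test RMSE:", np.sqrt(np.mean((pred - y[-200:]) ** 2)))
```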
Gaussian Processes (GPs) with Radial Basis Function kernels (RBF) [52] and other comparable strategies create overall consistency and allow for an unlimited number of basis functions, but they are rarely used, even though some past research has demonstrated that they can show promise [53].
GP methodology is nonparametric, with a focus on taking the discernible responses at a variety of training data points (function values) and modeling them as multivariate normal random variables [54]. A prior distribution over these function values is assumed, guaranteeing that the function operates smoothly: when the corresponding input vectors are close (in the Euclidean distance sense), the function values will be highly correlated, with the correlation decaying as the vectors diverge. We may subsequently calculate how the unobserved function values are distributed by employing this assumed distribution and applying basic probability manipulations.
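A minimal sketch of the GP idea (scikit-learn with an RBF kernel on synthetic one-dimensional data; the paper's WEKA setup differs): conditioning the multivariate normal prior on the training points yields a posterior mean and an uncertainty for unseen inputs.

```python
# GP sketch: an RBF kernel makes nearby inputs highly correlated, and the
# posterior supplies both a prediction and its uncertainty. Synthetic data.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.linspace(0.0, 10.0, 25).reshape(-1, 1)
y = np.sin(X).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0)).fit(X, y)
mean, std = gp.predict(np.array([[5.5]]), return_std=True)
print(mean, std)  # posterior mean and standard deviation at x = 5.5
```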
and achieves a reasonable level of accuracy (11.4% Mean Absolute Relative Difference, MARD). The subjects were asked to note down their fast-acting and slow-acting insulin dosages, as well as the carbohydrates they consumed through food, meaning that the data were empirical rather than subjective.
The CGM sensor has a maximum lifespan of 14 days, but it can cease functioning before that: it cannot be reattached once it has fallen off, and it can stop working through accident, failed adhesion, or excessive humidity. In addition, readings during the initial days of use can be inaccurate, as the calibration has not yet settled. It was therefore proposed that data be harvested from nine days of the usage period, with the initial days excised because calibration was still taking place and the final days excised because the device may not cover the full 14-day lifespan. Thus, the experimental phase had 5400 h of data to consider (25 patients × 9 days × 24 h).
The FreeStyle Libre is a flash glucose monitoring device. Glucose levels are transmitted instantaneously when required, employing Near Field Communication (NFC), which requires the patient to actively request the data. Certain devices act as NFC-to-Bluetooth transducers (e.g., the popular MiaoMiao: https://miaomiao.cool/?lang=en, accessed January 2021). The Libre sensor can be attached to such a device so that data can be transmitted to a smartphone at regular intervals.
The dataset was rounded out by use of the Fitbit Charge HR© smart band. This advanced fitness device automatically tracks various data and monitors the wearer's heart rate at all times. It can record sleep time, altitude climbed, step count, distance travelled, and heart rate. It connects using Bluetooth Low Energy and was linked with a computer and a smartphone so that all necessary trends could be monitored. A number of other researchers have already employed Fitbit trackers to monitor subjects' health [55]. All volunteers were given smart bands to keep a continuous record of their physical activity (step count) over the fortnight, along with heart rate and sleep data. Although these devices are not designed for precise medical use, it has been demonstrated that they are sufficiently accurate for the data to be used in research.
This monitoring was undertaken in 2018 and was continually supervised by the
Endocrinology Departments of the Virgen de la Arrixaca and Morales Meseguer hospitals,
two well-respected facilities in Murcia, Spain.
To the best of the authors' knowledge, there is unfortunately no previous work in the scientific literature with such comprehensive data acquisition: previously published studies consider only partial monitoring, collect only some of the recordable features, or simply focus on a limited number of patients and/or data collection over just a few days.
Once all data had been acquired, the dataset was preprocessed to clean outliers and gaps. For outliers, extreme value analysis was employed, either by inspecting scatter plots or by searching for values that deviated by more than double the mean. For gaps, interpolation methods were employed, with finger-stick glucose values being added where possible. The sampling period was set at five minutes, sufficient to capture tendencies and rapid changes but not so fine-grained that the ML algorithms would be overloaded. Data storage complied with the highest level of data protection regulations regarding personal information. Additionally, the Ethics Committee of the Universidad de Murcia, Spain, undertook supervision of the way the patients were monitored.
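A sketch of this cleaning step (pandas; the column name, the outlier rule mirroring the "double the mean" criterion above, and the time-based interpolation are assumptions for illustration, not the exact procedure applied to the real dataset):

```python
# Preprocessing sketch: mark extreme values as missing, interpolate gaps,
# and enforce the 5 min sampling grid. Column name, threshold and
# interpolation method are illustrative assumptions.
import numpy as np
import pandas as pd

idx = pd.date_range("2018-06-01", periods=500, freq="5min")
cgm = pd.DataFrame({"glucose": np.random.default_rng(0).normal(130, 25, 500)},
                   index=idx)

mean = cgm["glucose"].mean()
outliers = (cgm["glucose"] - mean).abs() > 2 * mean   # 'double the mean' rule
cgm.loc[outliers, "glucose"] = np.nan

cgm["glucose"] = cgm["glucose"].interpolate(method="time")  # fill the gaps
cgm = cgm.resample("5min").mean()                           # uniform sampling
```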
The data gathered will be of great assistance to this research and other researchers
in the future. Table 1 shows a variety of data relating to the population covered by
our monitoring.
5. Methodology
5.1. The Waikato Environment for Knowledge Analysis (WEKA)
The University of Waikato, New Zealand, has produced the open source software
Waikato Environment for Knowledge Analysis (WEKA v.3.8) (https://waikato.github.
io/weka-wiki/, accessed December 2020). This free software is licensed with the GNU
General Public License. WEKA comprises various algorithms and visualization tools to
analyze data for use in predictive modeling, alongside graphical user interfaces that allow such functions to be easily accessed. A number of standard data mining routines can be performed within this environment.
# Multi-Layer Perceptron (MLP) [63]: this wrapper method examines the output neurons (those which correspond with the problem classes) and uses the information to identify a subset of pertinent usable attributes to be employed in supervised pattern classification.
# Instance-Based k-nearest neighbor algorithm (IBk) [64]: this is a K-nearest
neighbor classifier that selects a suitable value for K on the basis of CV; it can
also perform distance weighting.
- Filter Methods. For univariate methods, we will employ the predictors listed below.
# Relief Attribute (Rlf) [65]: Relief feature selection works on the basis of cre-
ating a score by identifying feature value differences for nearest neighbor
instance pairs.
# Principal Component Analysis (PCA) [66]: With this method, we introduce a novel set of orthogonal coordinate axes that maximize the variance of the sample data. Directions with smaller variance thus carry less significance and can be removed from the dataset. PCA is extremely effective in transforming data to lower dimensions and can also reveal simplified underlying data patterns.
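A compact PCA sketch (scikit-learn; the 0.95 retained-variance threshold mirrors the -R 0.95 option in the WEKA command below, while the data are synthetic):

```python
# PCA sketch: project onto orthogonal axes of maximal variance and keep
# only enough components to cover 95% of it. Synthetic data.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(200, 15))
pca = PCA(n_components=0.95)            # keep 95% of the total variance
X_reduced = pca.fit_transform(X)
print(X.shape, "->", X_reduced.shape)   # lower-dimensional representation
```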
Technique | Command
Ranker | weka.attributeSelection.Ranker -T -1.8E308 -N -1
Classifier LR | weka.attributeSelection.ClassifierAttributeEval -execution-slots 100 -B weka.classifiers.functions.LinearRegression -F 5 -T 0.01 -R 1 -E RMSE -- -S 0 -R 1.0E-8 -num-decimal-places 4" -S "weka.attributeSelection.Ranker -T -1.8E308 -N 100"
Classifier RF | weka.attributeSelection.ClassifierAttributeEval -execution-slots 100 -B weka.classifiers.trees.RandomForest -F 5 -T 0.01 -R 1 -E RMSE -- -P 100 -I 100 -num-slots 1 -K 0 -M 1.0 -V 0.001 -S 1" -S "weka.attributeSelection.Ranker -T -1.8E308 -N 100"
Classifier MLP | weka.attributeSelection.ClassifierAttributeEval -execution-slots 1 -B weka.classifiers.functions.MultilayerPerceptron -F 5 -T 0.01 -R 1 -E RMSE -- -L 0.3 -M 0.2 -N 500 -V 0 -S 0 -E 20 -H a" -S "weka.attributeSelection.Ranker -T -1.8E308 -N 100"
Classifier IBk | weka.attributeSelection.ClassifierAttributeEval -execution-slots 1 -B weka.classifiers.lazy.IBk -F 5 -T 0.01 -R 1 -E RMSE -- -K 1 -W 0 -A \"weka.core.neighboursearch.LinearNNSearch -A \\\"weka.core.EuclideanDistance -R first-last\\\"\"" -S "weka.attributeSelection.Ranker -T -1.8E308 -N 100"
Rlf | weka.attributeSelection.ReliefFAttributeEval -M -1 -D 1 -K 10" -S "weka.attributeSelection.Ranker -T -1.8E308 -N 100"
PCA | weka.attributeSelection.PrincipalComponents -R 0.95 -A 5" -S "weka.attributeSelection.Ranker -T -1.8E308 -N -1"
LR: Linear Regression; RF: Random Forest; MLP: Multi-Layer Perceptron; IBk: Instance-Based K-nearest neighbor; Rlf: Relief Attribute; PCA: Principal Component Analysis.
Technique | Command
LR | weka.classifiers.functions.LinearRegression -S 0 -R 1.0E-8 -num-decimal-places 4
RF | weka.classifiers.trees.RandomForest -P 100 -I 100 -num-slots 1 -K 0 -M 1.0 -V 0.001 -S 1
SVM | weka.classifiers.functions.SMOreg -C 1.0 -N 0 -I "weka.classifiers.functions.supportVector.RegSMOImproved -T 0.001 -V -P 1.0E-12 -L 0.001 -W 1" -K "weka.classifiers.functions.supportVector.PolyKernel -E 1.0 -C 250007"
GP | weka.classifiers.functions.GaussianProcesses -L 1.0 -N 0 -K "weka.classifiers.functions.supportVector.PolyKernel -E 1.0 -C 250007" -S 1
LR: Linear Regression; RF: Random Forest; SVM: Support Vector Machines; GP: Gaussian Process.
Figure 1. Example of the training stage with the RF algorithm on the RF subset from patient '01' for a 60 min PH glycaemia prediction. Red: real glycaemia data; blue: modeled data.
The results of the forecasting task are tabulated in Table 5. With each predictive algorithm, and for each subset of data, we calculated the accuracy over the next 60 min (12 steps) of the glycaemia data. Using the CV technique, we obtained the RMSE (mg/dL) for each future step as an average over the 25 patients. Then, as a measure of overall performance, we further averaged the 12 RMSE values for each FS technique (RMSE). We also estimated the standard deviation of each predicted series with the aim of assessing the variability of the accuracy. We performed the Shapiro–Wilk test to determine whether the data followed a normal distribution for each 12-step prediction; the results showed that the data were normally distributed (p-values > 0.05).
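The bookkeeping behind these figures can be summarized as follows (a sketch only; the residual array and its shape are assumptions): for each of the 12 horizons, the RMSE is computed across patients, and the 12 values are then averaged into the summary figure reported as RMSE.

```python
# Evaluation sketch: RMSE per predictive horizon (pooled over patients and
# time points), then averaged across the 12 horizons. Shapes and residuals
# here are illustrative assumptions.
import numpy as np

n_patients, n_steps, n_points = 25, 12, 100
rng = np.random.default_rng(0)
residuals = rng.normal(0, 20, (n_patients, n_steps, n_points))  # pred - real

rmse_per_step = np.sqrt((residuals ** 2).mean(axis=(0, 2)))  # one per horizon
summary_rmse = rmse_per_step.mean()  # the averaged figure reported in Table 5
print(np.round(rmse_per_step, 2), round(summary_rmse, 2))
```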
As shown in Table 5, the lowest RMSE averaged across the 12 predictions (RMSE) was obtained using RF as the forecasting algorithm with the RF subset (RMSE = 18.54 mg/dL), while, averaging across the FS techniques, the best performance was obtained using SVM as the predictive technique (RMSE = 20.58 mg/dL). Note, however, that the better performances are concentrated in the early predictions (5, 10, 15 min), after which the error rises. Figure 2 shows the evolution of the accuracy per forecasting algorithm with the different FS approaches.
In relation to Figure 2, we can observe that LR as a forecasting technique (upper left) behaves adequately, but mainly at short PHs (5 or 10 min); beyond these, the error grows with the PH. The application of a variable selection method produces a general improvement in performance, in light of the differences between the "no FS" line and the others.
This difference in performance according to FS technique is also observed with the RF predictive technique (upper right). RF stands out as the best approach for selecting variables. In this case, RF as a forecasting technique generates good short-term accuracy, and in the long term it stabilizes at a value that may be acceptable under the best selection techniques.
SVM as a predictive algorithm (lower left) presents excellent performance at near PHs, but in the long term the error rises in a linear fashion. Again, it is observed that variable selection is necessary to reduce the error, but not all the techniques produce similar behavior; once more, RF as the FS technique provides the best result.
GP shows the worst performance as a prediction algorithm. The variability according to the selection technique is wide, with the worst precision obtained when no variable selection is used and the best when RF is used for this task.
Table 5. RMSE in the test stage for up to 12-step glycaemia forecasting. Averaged results of 25 subjects. Columns give the PH in minutes.
RMSE (mg/dL)
Subset FS 5 10 15 20 25 30 35 40 45 50 55 60 RMSE Stnd. Dev.
Forecasting technique: LR
No F.S. 9.32 15.15 18.94 22.72 26.23 29.30 31.26 33.33 35.48 37.69 40.03 42.55 28.50 10.32
LR 9.00 14.17 16.84 20.07 21.79 23.52 25.41 27.36 29.29 31.24 33.25 35.33 23.94 7.98
RF 9.29 14.53 17.16 20.26 21.82 23.38 25.08 26.84 28.58 30.32 32.09 33.91 23.60 7.41
MLP 9.22 14.44 17.09 20.25 21.89 23.51 25.26 27.06 28.83 30.61 32.42 34.29 23.74 7.57
IBk 9.51 14.88 17.57 20.69 22.26 23.83 25.54 27.31 29.07 30.84 32.63 34.48 24.05 7.50
Rlf 9.75 15.68 18.91 22.47 24.31 26.05 27.91 29.81 31.64 33.39 35.07 36.71 25.98 8.19
PCA 9.53 14.97 17.77 21.04 22.77 24.42 26.20 28.11 29.92 31.81 33.75 35.74 24.67 7.89
RMSE 24.93
Forecasting technique: RF
No F.S. 13.17 20.87 24.78 28.65 31.03 31.78 32.40 32.92 33.33 33.66 33.92 34.15 29.22 6.50
LR 9.75 14.96 17.89 20.71 22.37 22.84 23.20 23.43 23.61 23.70 23.76 23.80 20.84 4.45
RF 7.91 13.21 16.22 19.08 19.74 20.21 20.57 20.83 21.01 21.16 21.26 21.33 18.54 4.14
MLP 8.88 14.18 17.14 19.97 21.68 22.18 22.53 22.78 22.94 23.04 23.10 23.10 20.13 4.52
IBk 7.95 13.29 16.27 19.12 20.81 21.30 21.87 22.15 22.35 22.50 22.58 22.59 19.40 4.63
Rlf 12.39 17.42 20.38 23.25 25.06 25.74 26.27 26.72 27.10 27.40 27.64 27.82 23.93 4.84
PCA 11.93 17.10 20.08 21.35 22.80 23.80 24.67 25.03 25.65 26.17 26.56 26.85 22.67 4.47
RMSE 22.10
Forecasting technique: SVM
No F.S. 2.38 9.16 16.35 20.10 22.62 25.13 27.69 29.84 32.11 34.25 36.43 38.63 24.56 11.07
LR 1.99 7.64 13.53 16.45 18.37 20.39 22.51 24.27 26.11 27.85 29.65 31.48 20.02 8.96
RF 2.33 5.70 11.64 14.57 16.50 18.52 20.63 22.37 24.19 25.92 27.71 29.52 18.30 8.55
MLP 0.99 6.65 12.56 15.51 17.48 19.54 21.69 23.47 25.32 27.09 28.93 30.78 19.17 9.06
IBk 3.56 7.73 13.66 16.57 18.48 20.50 22.61 24.36 26.18 27.92 29.73 31.57 20.24 8.69
Rlf 4.02 8.44 14.82 18.02 20.14 22.30 24.40 25.98 27.62 29.12 30.65 32.16 21.47 8.82
PCA 3.26 7.77 13.74 16.67 18.62 20.64 22.76 24.52 26.35 28.09 29.89 31.71 20.33 8.79
RMSE 20.58
Forecasting technique: GP
No F.S. 15.96 26.16 37.01 43.79 45.80 47.29 47.62 47.98 48.31 48.66 49.03 49.43 42.25 10.68
LR 12.08 25.82 31.17 33.37 34.34 34.79 35.02 35.15 35.23 35.28 35.32 35.35 31.91 6.83
RF 5.32 17.40 22.37 24.49 25.43 25.86 26.07 26.18 26.24 26.28 26.30 26.32 23.19 6.20
MLP 7.11 19.96 25.26 27.51 28.51 28.97 29.19 29.31 29.38 29.42 29.45 29.47 26.13 6.60
IBk 10.30 23.40 28.52 30.64 31.57 32.01 32.24 32.36 32.43 32.47 32.50 32.52 29.25 6.53
Rlf 7.84 24.73 32.56 36.34 38.26 39.28 39.84 40.17 40.38 40.51 40.60 40.66 35.10 9.78
PCA 15.62 24.24 27.85 29.37 30.02 30.31 30.44 30.50 30.53 30.54 30.55 30.55 28.38 4.42
RMSE 30.89
F.S.: Feature Selection; LR: Linear Regression; RF: Random Forest; SVM: Support Vector Machines; GP: Gaussian Process; MLP: Multi-Layer
Perceptron; IBk: Instance-Based K-nearest neighbor; Rlf: Relief Attribute; PCA: Principal Component Analysis.
Figure 2. Test stage. RMSE up to 12-step glycemia forecasting. Averaged results of 25 subjects. Upper left, LR. Upper right,
RF. Lower left, SVM. Lower right, GP. F.S.: Feature Selection; LR: Linear Regression; RF: Random Forest; SVM: Support
Vector Machines; GP: Gaussian Process; MLP: Multi Layer Perceptron; IBk: Instance-Based K-nearest neighbor; Rlf: Relief
Attribute; PCA: Principal Component Analysis.
In general, according to the mean values in Table 5, SVM is the best forecasting technique, with an RMSE value of 20.58 mg/dL (considering all the FS techniques employed), and the best FS algorithm is RF. The latter (RF as the best FS technique) holds in all four predictive cases. It is also worth noting that variable selection always improves accuracy, and in some cases the choice of technique itself makes a significant difference. Additionally, for very short PHs SVM works very well, but taken more broadly, RF could be the more balanced option. The variation in the error values at each step across the 25 patients (standard deviation) is generally contained and does not differ greatly between methods.
Figure 3 shows an example of how the prediction behaves at 60 min (12 steps). As can be seen, the blue line (prediction) follows the red line (actual data) quite closely, although the error increases in situations of variation; this does not happen under conditions of glycemic stability. The patient's own control will therefore influence the accuracy of the prediction. For this reason, the predictions have been made with a CV method that covers all types of real situations (stability and oscillations) throughout each subject's monitoring days.
Figure 3. Example of the forecasting stage with the RF algorithm on the RF subset from patient '01' for a 60 min PH glycaemia prediction. Red: real glycaemia data; blue: predicted data.
nition, and the gradient boosting algorithm, using other databases such as that of the D1NAMO project [71], which involved monitoring 20 healthy subjects and 9 patients by recording their electrocardiograms, breathing, and accelerometer signals as well as glucose levels. Including more biosensors that provide more variables in real time would thereby improve the accuracy of the glycaemia prediction, extend the PH within the glycemic series, and provide early warning in health monitoring [72].
Author Contributions: Conceptualization, I.R.-R. and J.-V.R.; methodology, I.R.-R., J.-V.R. and
D.-J.P.-Q.; software, I.R.-R. and D.-J.P.-Q.; validation, J.-V.R., B.W. and D.-J.P.-Q.; formal analysis,
I.R.-R., B.W. and J.-V.R.; investigation, J.-V.R. and D.-J.P.-Q.; resources, W.L.W.; data curation, B.W.
and D.-J.P.-Q.; writing—original draft preparation, I.R.-R. and J.-V.R.; writing—review and editing,
I.R.-R., W.L.W. and J.-V.R.; visualization, B.W. and W.L.W.; supervision, J.-V.R.; project administration,
W.L.W. and B.W.; funding acquisition, W.L.W. and J.-V.R. All authors have read and agreed to the
published version of the manuscript.
Funding: Ignacio Rodríguez-Rodríguez would like to thank the support of Programa Operativo
FEDER Andalucía 2014–2020 under Project No. UMA18-FEDERJA-023 and Universidad de Málaga,
Campus de Excelencia Internacional Andalucía Tech.
Institutional Review Board Statement: The study was conducted according to the guidelines of
the Declaration of Helsinki, and approved by the Ethical Research Commission of the University of
Murcia on 25 January 2018 (Id.16 83/2017).
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Fowler, M.J. Diabetes: Magnitude and Mechanisms. Clin. Diabetes 2007, 25, 25–28. [CrossRef]
2. DeWitt, D.E.; Hirsch, I.B. Outpatient insulin therapy in type 1 and type 2 diabetes mellitus: Scientific review. JAMA 2003, 289,
2254–2264. [CrossRef]
3. Davidson, M.B.; Davidson, M.B. Diabetes Mellitus: Diagnosis and Treatment; Saunders: Philadelphia, PA, USA, 1998.
4. Sherr, J.L.; Tauschmann, M.; Battelino, T.; De Bock, M.; Forlenza, G.; Roman, R.; Hood, K.; Maahs, D.M. ISPAD Clinical Practice
Consensus Guidelines 2018: Diabetes technologies. Pediatr. Diabetes 2018, 19, 302–325. [CrossRef]
5. Westman, E.C.; Tondt, J.; Maguire, E.; Yancy, W.S., Jr. Implementing a low-carbohydrate, ketogenic diet to manage type 2 diabetes
mellitus. Expert Rev. Endocrinol. Metab. 2018, 13, 263–272. [CrossRef] [PubMed]
6. Kowalski, A. Can We Really Close the Loop and How Soon? Accelerating the Availability of an Artificial Pancreas: A Roadmap
to Better Diabetes Outcomes. Diabetes Technol. Ther. 2009, 11, S113. [CrossRef]
7. Nguyen, B.P.; Ho, Y.; Wu, Z.; Chui, C.-K. Implementation of model predictive control with modified minimal model on low-power
RISC microcontrollers. In Proceedings of the Third Symposium on Virtual Reality Modeling Language-VRML, Monterey, CA,
USA, 16–19 February 2012. [CrossRef]
8. Chui, C.-K.; Nguyen, B.P.; Ho, Y.; Wu, Z.; Nguyen, M.; Hong, G.S.; Mok, D.; Sun, S.; Chang, S. Embedded Real-Time Model
Predictive Control for Glucose Regulation. In XXVI Brazilian Congress on Biomedical Engineering; Springer Nature: Berlin, Germany,
2013; Volume 39, pp. 1437–1440.
9. Eskaf, E.K.; Badawi, O.; Ritchings, T. Predicting blood glucose levels in diabetics using feature extraction and Artificial Neural
Networks. In Proceedings of the 2008 3rd International Conference on Information and Communication Technologies: From
Theory to Applications, Damascus, Syria, 7–11 April 2008; pp. 1–6.
10. Cai, J.; Luo, J.; Wang, S.; Yang, S. Feature selection in machine learning: A new perspective. Neurocomputing 2018, 300, 70–79.
[CrossRef]
11. Kohavi, R.; John, G.H. Wrappers for feature subset selection. Artif. Intell. 1997, 97, 273–324. [CrossRef]
12. Balakrishnan, S.; Narayanaswamy, R.; Savarimuthu, N.; Samikannu, R. SVM ranking with backward search for feature selection
in type II diabetes databases. In Proceedings of the 2008 IEEE International Conference on Systems, Man and Cybernetics,
Singapore, 12–15 October 2008; pp. 2628–2633.
13. Tomar, D.; Agarwal, S. Hybrid Feature Selection Based Weighted Least Squares Twin Support Vector Machine Approach for
Diagnosing Breast Cancer, Hepatitis, and Diabetes. Adv. Artif. Neural Syst. 2015, 2015, 1–10. [CrossRef]
14. Rodríguez-Rodríguez, I.; Rodríguez, J.-V.; Zamora-Izquierdo, M. Variables to Be Monitored via Biomedical Sensors for Complete
Type 1 Diabetes Mellitus Management: An Extension of the “On-Board” Concept. J. Diabetes Res. 2018, 2018, 1–14. [CrossRef]
[PubMed]
15. Rodríguez-Rodríguez, I.; Rodríguez, J.-V.; González-Vidal, A.; Zamora, M.; Rodríguez, R.; Vidal, G. Feature Selection for Blood
Glucose Level Prediction in Type 1 Diabetes Mellitus by Using the Sequential Input Selection Algorithm (SISAL). Symmetry 2019,
11, 1164. [CrossRef]
16. Rodríguez-Rodríguez, I.; Chatzigiannakis, I.; Rodríguez, J.-V.; Maranghi, M.; Gentili, M.; Zamora-Izquierdo, M. Utility of Big
Data in Predicting Short-Term Blood Glucose Levels in Type 1 Diabetes Mellitus Through Machine Learning Techniques. Sensors
2019, 19, 4482. [CrossRef]
17. Rodríguez-Rodríguez, I.; Rodríguez, J.V.; Molina-García-Pardo, J.M.; Zamora-Izquierdo, M.Á.; Martínez-Inglés, M.T. A Comparison of Different Models of Glycemia Dynamics for Improved Type 1 Diabetes Mellitus Management with Advanced Intelligent Analysis in an Internet of Things Context. Appl. Sci. 2020, 10, 4381. [CrossRef]
18. Xie, J.; Wang, Q. Benchmarking Machine Learning Algorithms on Blood Glucose Prediction for Type I Diabetes in Comparison
with Classical Time-Series Models. IEEE Trans. Biomed. Eng. 2020, 67, 3101–3124. [CrossRef]
19. Sun, S.; Zhang, G.; Wang, C.; Zeng, W.; Li, J.; Grosse, R. Differentiable compositional kernel learning for Gaussian processes.
In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 4828–4837.
20. Ortmann, L.; Shi, D.; Dassau, E.; Doyle, F.J.; Leonhardt, S.; Misgeld, B.J. Gaussian process-based model predictive control of blood
glucose for patients with type 1 diabetes mellitus. In Proceedings of the 2017 11th Asian Control Conference (ASCC), Gold Coast,
QLD, Australia, 17–20 December 2017.
21. Ortmann, L.; Shi, D.; Dassau, E.; Doyle, F.J.; Misgeld, B.J.; Leonhardt, S. Automated Insulin Delivery for Type 1 Diabetes Mellitus
Patients using Gaussian Process-based Model Predictive Control. In Proceedings of the 2019 American Control Conference
(ACC), Philadelphia, PA, USA, 10–12 July 2019.
22. Rasmussen, C.E.; Williams, C.K.I. Gaussian Processes for Machine Learning, 1st ed.; The MIT Press: Cambridge, MA, USA,
2016; pp. 33–77.
23. Sage, A.J.; Genschel, U.; Nettleton, D. Tree aggregation for random forest class probability estimation. Stat. Anal. Data Min. 2020,
13, 134–150. [CrossRef]
24. Xu, W.; Zhang, J.; Zhang, Q.; Wei, X. Risk prediction of type II diabetes based on random forest model. In Proceedings of the
2017 Third International Conference on Advances in Electrical, Electronics, Information, Communication and Bio-Informatics
(AEEICB), Chennai, India, 27–28 February 2017; pp. 382–386.
25. Marling, C.; Xia, L.; Bunescu, R.; Schwartz, F. Machine Learning Experiments with Noninvasive Sensors for Hypoglycemia
Detection. In Proceedings of the IJCAI Workshop on Knowledge Discovery in Healthcare Data, New York, NY, USA, 19–24 June
2016.
26. Rodríguez-Rodríguez, I.; Zamora, M.Á.; Rodríguez, J.V. On predicting glycaemia in type 1 diabetes mellitus patients by using
support vector machines. In Proceedings of the 1st International Conference on Internet of Things and Machine Learning,
Liverpool, UK, 17–18 October 2017; pp. 1–2.
27. Izonin, I.; Tkachenko, R.; Verhun, V.; Zub, K. An approach towards missing data management using improved GRNN-SGTM
ensemble method. Eng. Sci. Technol. Int. J. 2020, in press. [CrossRef]
28. Tkachenko, R.; Izonin, I.; Kryvinska, N.; Dronyuk, I.; Zub, K. An Approach towards Increasing Prediction Accuracy for the
Recovery of Missing IoT Data based on the GRNN-SGTM Ensemble. Sensors 2020, 20, 2625. [CrossRef]
29. Izonin, I.; Tkachenko, R.; Vitynskyi, P.; Zub, K.; Tkachenko, P.; Dronyuk, I. Stacking-based GRNN-SGTM Ensemble Model for
Prediction Tasks. In Proceedings of the 2020 International Conference on Decision Aid Sciences and Application (DASA), Zallaq,
Bahrain, 8–9 November 2020; pp. 326–330.
30. Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003, 3, 1157–1182.
31. Sheikhpour, R.; Sarram, M.A.; Gharaghani, S.; Chahooki, M.A.Z. A Survey on semi-supervised feature selection methods. Pattern
Recognit. 2017, 64, 141–158. [CrossRef]
32. Hastie, T.; Tibshirani, R.; Tibshirani, R.J. Extended comparisons of best subset selection, forward stepwise selection, and the lasso.
arXiv 2017, arXiv:1707.08692.
33. Rodríguez-Rodríguez, I.; Rodríguez, J.V.; Pardo-Quiles, D.J.; Heras-González, P.; Chatzigiannakis, I. Modeling and Forecasting
Gender-Based Violence through Machine Learning Techniques. Appl. Sci. 2020, 10, 8244.
34. Karegowda, A.G.; Manjunath, A.S.; Jayaram, M.A. Feature Subset Selection Problem using Wrapper Approach in Supervised
Learning. Int. J. Comput. Appl. 2010, 1, 13–17. [CrossRef]
35. Yang, K.; Yoon, H.; Shahabi, C. A supervised feature subset selection technique for multivariate time series. In Proceedings of
the Workshop on Feature Selection for Data Mining: Interfacing Machine Learning with Statistics, New Port Beach, CA, USA,
23 April 2005; pp. 92–101.
36. Crone, S.F.; Kourentzes, N. Feature selection for time series prediction—A combined filter and wrapper approach for neural
networks. Neurocomputing 2010, 73, 1923–1936. [CrossRef]
37. Sánchez-Maroño, N.; Alonso-Betanzos, A.; Tombilla-Sanromán, M. Filter Methods for Feature Selection—A Comparative
Study. In Proceedings of the International Conference on Intelligent Data Engineering and Automated Learning, Guilin, China,
30 October–1 November 2017; pp. 178–187.
38. Fonti, V.; Belitser, E. Feature Selection Using Lasso. VU Amst. Res. Pap. Bus. Anal. 2017, 30, 1–25.
39. Zhang, H.; Zhang, R.; Nie, F.; Li, X. A Generalized Uncorrelated Ridge Regression with Nonnegative Labels for Unsupervised
Feature Selection. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),
Calgary, AB, Canada, 15–20 April 2018; pp. 2781–2785.
40. Bolón-Canedo, V.; Sánchez-Maroño, N.; Alonso-Betanzos, A. A review of feature selection methods on synthetic data. Knowl. Inf.
Syst. 2012, 34, 483–519. [CrossRef]
41. Bolón-Canedo, V.; Sánchez-Maroño, N.; Alonso-Betanzos, A. Distributed feature selection: An application to microarray data
classification. Appl. Soft Comput. 2015, 30, 136–150. [CrossRef]
42. Wolpert, D.; Macready, W. No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 1997, 1, 67–82. [CrossRef]
43. Shmueli, G.; Lichtendahl, K.C., Jr. Practical Time Series Forecasting with r: A Hands-on Guide; Axelrod Schnall Publishers: Green
Cove Springs, FL, USA, 2016.
44. Faloutsos, C.; Gasthaus, J.; Januschowski, T.; Wang, Y. Forecasting big time series: Old and new. Proc. VLDB Endow. 2018, 11,
2102–2105. [CrossRef]
45. Kalekar, P.S. Time Series Forecasting Using Holt-Winters Exponential Smoothing; Kanwal Rekhi School of Information Technology:
Powai, Mumbai, 2004; pp. 1–13.
46. Vapnik, V. The Nature of Statistical Learning Theory; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013.
47. Schölkopf, B.; Smola, A.J. A short introduction to learning with kernels. In Advanced Lectures on Machine Learning; Springer:
Berlin/Heidelberg, Germany, 2003; pp. 41–64.
48. Kuhn, M.; Johnson, K. Applied Predictive Modeling, 1st ed.; Springer: New York, NY, USA, 2013; ISBN 978-1-4614-6848-6.
49. Fierrez, J.; Morales, A.; Vera-Rodriguez, R.; Camacho, D. Multiple classifiers in biometrics. part 1: Fundamentals and review. Inf.
Fusion 2018, 44, 57–64. [CrossRef]
50. Liaw, A.; Wiener, M. Classification and regression by randomForest. R News 2002, 2, 18–22.
51. Oshiro, T.M.; Perez, P.S.; Baranauskas, J.A. How Many Trees in A Random Forest? In International Workshop on Machine Learning
and Data Mining in Pattern Recognition; Springer: Berlin/Heidelberg, Germany, 2012; pp. 154–168.
52. Blomqvist, K.; Kaski, S.; Heinonen, M. Deep Convolutional Gaussian Processes. In Proceedings of the Mining Data for Financial
Applications, Ghent, Belgium, 14–18 September 2020; pp. 582–597.
53. Rodríguez-Rodríguez, I.; Rodríguez, J.V.; Chatzigiannakis, I.; Zamora Izquierdo, M.Á. On the Possibility of Predicting Glycaemia ‘On the Fly’ with Constrained IoT Devices in Type 1 Diabetes Mellitus Patients. Sensors 2019, 19, 4538. [CrossRef]
54. Seeger, M. Gaussian processes for machine learning. Int. J. Neural Syst. 2004, 14, 69–106. [PubMed]
55. Whelan, M.E.; Orme, M.; Kingsnorth, A.P.; Sherar, L.B.; Denton, F.L.; Esliger, D.W. Examining the Use of Glucose and Physical
Activity Self-Monitoring Technologies in Individuals at Moderate to High Risk of Developing Type 2 Diabetes: Randomized Trial.
JMIR Mhealth Uhealth 2019, 7, e14195. [CrossRef]
56. Bondia, J.; Vehi, J. Physiology-Based Interval Models: A Framework for Glucose Prediction Under Intra-patient Variability. In
Advances in Bioprocess Engineering and Technology; Springer Nature: Berlin, Germany, 2015; pp. 159–181.
57. Garg, S.K.; Weinzimer, S.A.; Tamborlane, W.V.; Buckingham, B.A.; Bode, B.W.; Bailey, T.S.; Brazg, R.L.; Ilany, J.; Slover, R.H.;
Anderson, S.M.; et al. Glucose Outcomes with the In-Home Use of a Hybrid Closed-Loop Insulin Delivery System in Adolescents
and Adults with Type 1 Diabetes. Diabetes Technol. Ther. 2017, 19, 155–163. [CrossRef]
58. Hussain, S.; Dahan, N.A.; Ba-Alwi, F.M.; Ribata, N. Educational Data Mining and Analysis of Students’ Academic Performance
Using WEKA. Indones. J. Electr. Eng. Comput. Sci. 2018, 9, 447–459. [CrossRef]
59. Kiranmai, S.A.; Laxmi, A.J. Data mining for classification of power quality problems using WEKA and the effect of attributes on
classification accuracy. Prot. Control. Mod. Power Syst. 2018, 3, 29. [CrossRef]
60. Lang, S.; Bravo-Marquez, F.; Beckham, C.; Hall, M.; Frank, E. WekaDeeplearning4j: A deep learning package for Weka based on
Deeplearning4j. Knowl.-Based Syst. 2019, 178, 48–50. [CrossRef]
61. Kotthoff, L.; Thornton, C.; Hoos, H.H.; Hutter, F.; Leyton-Brown, K. Auto-WEKA: Automatic Model Selection and Hyperparameter
Optimization in WEKA. In Automated Machine Learning: Methods, Systems, Challenges; Hutter, F., Kotthoff, L., Vanschoren, J., Eds.;
Springer International Publishing: Cham, Switzerland, 2019; pp. 81–95. ISBN 978-3-030-05318-5.
62. Novakovic, J.; Strbac, P.; Bulatovic, D. Toward optimal feature selection using ranking methods and classification algorithms.
Yugosl. J. Oper. Res. 2011, 21, 119–135. [CrossRef]
63. Gasca, E.; Sánchez, J.; Alonso, R. Eliminating redundancy and irrelevance using a new MLP-based feature selection method.
Pattern Recognit. 2006, 39, 313–315. [CrossRef]
64. Aha, D.W.; Kibler, D.; Albert, M.K. Instance-based learning algorithms. Mach. Learn. 1991, 6, 37–66.
65. Kononenko, I. Estimating Attributes: Analysis and Extensions of RELIEF. In Proceedings of the European Conference on Machine
Learning, Catania, Italy, 6–8 April 1994; pp. 171–182.
66. Abdi, H.; Williams, L.J. Principal component analysis. Wiley Interdiscip. Rev. Comput. Stat. 2010, 2, 433–459. [CrossRef]
67. Bergmeir, C.; Benítez, J.M. On the use of cross-validation for time series predictor evaluation. Inf. Sci. 2012, 191, 192–213.
[CrossRef]
68. Snijders, T.A.B. On Cross-Validation for Predictor Evaluation in Time Series. In Lecture Notes in Economics and Mathematical
Systems; Springer Nature: Berlin, Germany, 1988; Volume 307, pp. 56–69.
69. Frank, E.; Hall, M.A.; Holmes, G.; Kirkby, R.B.; Pfahringer, B.; Witten, I.H.; Trigg, L. Weka-A Machine Learning Workbench for
Data Mining. In Data Mining and Knowledge Discovery Handbook; Springer: Boston, MA, USA, 2009; pp. 1269–1277.
70. Nguyen, B.P.; Tay, W.-L.; Chui, C.-K. Robust Biometric Recognition from Palm Depth Images for Gloved Hands. IEEE Trans.
Hum.-Mach. Syst. 2015, 45, 799–804. [CrossRef]
71. Dubosson, F.; Ranvier, J.-E.; Bromuri, S.; Calbimonte, J.-P.; Ruiz, J.; Schumacher, M. The open D1NAMO dataset: A multi-modal
dataset for research on non-invasive type 1 diabetes management. Inform. Med. Unlocked 2018, 13, 92–100. [CrossRef]
72. Woo, W.L.; Koh, B.H.; Gao, B.; Nwoye, E.O.; Wei, B.; Dlay, S.S. Early Warning of Health Condition and Visual Analytics for
Multivariable Vital Signs. In Proceedings of the 2020 International Conference on Computing, Networks and Internet of Things,
Sanya, China, 24–26 April 2020; pp. 206–211.