Tutorial SVM Matlab


LS-SVMlab Toolbox User's Guide

version 1.7
K. De Brabanter, P. Karsmakers, F. Ojeda, C. Alzate, J. De Brabanter, K. Pelckmans, B. De Moor, J. Vandewalle, J.A.K. Suykens

Katholieke Universiteit Leuven
Department of Electrical Engineering, ESAT-SCD-SISTA
Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium
{kris.debrabanter,johan.suykens}@esat.kuleuven.be
http://www.esat.kuleuven.be/sista/lssvmlab/

ESAT-SISTA Technical Report 10-146

September 2010

Acknowledgements

Research supported by Research Council KUL: GOA AMBioRICS, GOA MaNet, CoE EF/05/006 Optimization in Engineering (OPTEC), IOF-SCORES4CHEM, several PhD/post-doc & fellow grants; Flemish Government: FWO: PhD/postdoc grants, projects G.0452.04 (new quantum algorithms), G.0499.04 (Statistics), G.0211.05 (Nonlinear), G.0226.06 (cooperative systems and optimization), G.0321.06 (Tensors), G.0302.07 (SVM/Kernel), G.0320.08 (convex MPC), G.0558.08 (Robust MHE), G.0557.08 (Glycemia2), G.0588.09 (Brain-machine) research communities (ICCoS, ANMMM, MLDM); G.0377.09 (Mechatronics MPC), IWT: PhD Grants, McKnow-E, EurekaFlite+, SBO LeCoPro, SBO Climaqs, POM, Belgian Federal Science Policy Office: IUAP P6/04 (DYSCO, Dynamical systems, control and optimization, 2007-2011); EU: ERNSI; FP7-HD-MPC (INFSO-ICT-223854), COST intelliCIS, EMBOCOM, Contract Research: AMINAL, Other: Helmholtz, viCERP, ACCM, Bauknecht, Hoerbiger. JS is a professor at K.U.Leuven Belgium. BDM and JWDW are full professors at K.U.Leuven Belgium.

Preface to LS-SVMLab v1.7

We have added new functions to the toolbox and updated some of the existing commands with respect to the previous version v1.6. Because many readers are familiar with the layout of version 1.5 and version 1.6, we have tried to change it as little as possible. Here is a summary of the main changes:

The major difference with the previous version is the optimization routine used to find the minimum of the cross-validation score function. The tuning procedure consists of two steps: 1) Coupled Simulated Annealing determines suitable tuning parameters and 2) a simplex method uses these values as starting values in order to fine-tune the parameters. The major advantage is speed. The number of function evaluations needed to find optimal parameters is reduced from 200 in v1.6 to 50 in this version.

The construction of bias-corrected approximate $100(1-\alpha)\%$ pointwise/simultaneous confidence and prediction intervals has been added to this version.

Some bug fixes have been made in the function roc. The class labels do not need to be +1 or -1, but can also be 0 and 1. The conversion is done automatically.

The LS-SVMLab Team Heverlee, Belgium September 2010

Preface to LS-SVMLab v1.6

We have added new functions to the toolbox and updated some of the existing commands with respect to the previous version v1.5. Because many readers are familiar with the layout of version 1.5, we have tried to change it as little as possible. The major difference is the speed-up of several methods. Here is a summary of the main changes:

Chapter/solver/function: What's new

1. A bird's eye on LS-SVMLab

2. LS-SVMLab toolbox examples:
   Roadmap to LS-SVM; Addition of more regression and classification examples; Easier interface for multi-class classification; Changed implementation for robust LS-SVM.

3. Matlab functions:
   Possibility of regression or classification using only one command!; The function validate has been deleted; Faster (robust) training and (robust) model selection criteria are provided; In case of robust regression different weight functions are provided to be used with iteratively reweighted LS-SVM.

4. LS-SVM solver:
   All CMEX and/or C files have been removed. The linear system is solved by using the Matlab command backslash (\).

The LS-SVMLab Team Heverlee, Belgium June 2010

Contents

1 Introduction

2 A bird's eye view on LS-SVMlab
2.1 Classification and regression
2.1.1 Classification extensions
2.1.2 Tuning and robustness
2.1.3 Bayesian framework
2.2 NARX models and prediction
2.3 Unsupervised learning
2.4 Solving large scale problems with fixed size LS-SVM

3 LS-SVMlab toolbox examples
3.1 Roadmap to LS-SVM
3.2 Classification
3.2.1 Hello world
3.2.2 Example
3.2.3 Using the object oriented interface: initlssvm
3.2.4 LS-SVM classification: only one command line away!
3.2.5 Bayesian inference for classification
3.2.6 Multi-class coding
3.3 Regression
3.3.1 A simple example
3.3.2 LS-SVM regression: only one command line away!
3.3.3 Bayesian Inference for Regression
3.3.4 Using the object oriented model interface
3.3.5 Confidence/Prediction Intervals for Regression
3.3.6 Robust regression
3.3.7 Multiple output regression
3.3.8 A time-series example: Santa Fe laser data prediction
3.3.9 Fixed size LS-SVM
3.4 Unsupervised learning using kernel principal component analysis

A MATLAB functions
A.1 General notation
A.2 Index of function calls
A.2.1 Training and simulation
A.2.2 Object oriented interface
A.2.3 Training and simulating functions
A.2.4 Kernel functions
A.2.5 Tuning, sparseness and robustness
A.2.6 Classification extensions
A.2.7 Bayesian framework
A.2.8 NARX models and prediction
A.2.9 Unsupervised learning
A.2.10 Fixed size LS-SVM
A.2.11 Demos
A.3 Alphabetical list of function calls
A.3.1 AFEm
A.3.2 bay_errorbar
A.3.3 bay_initlssvm
A.3.4 bay_lssvm
A.3.5 bay_lssvmARD
A.3.6 bay_modoutClass
A.3.7 bay_optimize
A.3.8 bay_rr
A.3.9 cilssvm
A.3.10 code, codelssvm
A.3.11 crossvalidate
A.3.12 deltablssvm
A.3.13 denoise_kpca
A.3.14 eign
A.3.15 gcrossvalidate
A.3.16 initlssvm, changelssvm
A.3.17 kentropy
A.3.18 kernel_matrix
A.3.19 kpca
A.3.20 latentlssvm
A.3.21 leaveoneout
A.3.22 lin_kernel, MLP_kernel, poly_kernel, RBF_kernel
A.3.23 linf, mae, medae, misclass, mse
A.3.24 lssvm
A.3.25 plotlssvm
A.3.26 predict
A.3.27 predlssvm
A.3.28 preimage_rbf
A.3.29 prelssvm, postlssvm
A.3.30 rcrossvalidate
A.3.31 ridgeregress
A.3.32 robustlssvm
A.3.33 roc
A.3.34 simlssvm
A.3.35 trainlssvm
A.3.36 tunelssvm, linesearch & gridsearch
A.3.37 windowize & windowizeNARX

Chapter 1

Introduction

Support Vector Machines (SVM) is a powerful methodology for solving problems in nonlinear classification, function estimation and density estimation, which has also led to many other recent developments in kernel based learning methods in general [14, 5, 27, 28, 48, 47]. SVMs have been introduced within the context of statistical learning theory and structural risk minimization. In these methods one solves convex optimization problems, typically quadratic programs. Least Squares Support Vector Machines (LS-SVM) are reformulations of standard SVMs [32, 43] which lead to solving linear KKT systems. LS-SVMs are closely related to regularization networks [10] and Gaussian processes [51] but additionally emphasize and exploit primal-dual interpretations. Links between kernel versions of classical pattern recognition algorithms such as kernel Fisher discriminant analysis and extensions to unsupervised learning, recurrent networks and control [33] are available. Robustness, sparseness and weightings [7, 34] can be imposed on LS-SVMs where needed, and a Bayesian framework with three levels of inference has been developed [44]. LS-SVM-like primal-dual formulations have been given for kernel PCA [37, 1], kernel CCA and kernel PLS [38]. For very large scale problems and on-line learning a method of Fixed Size LS-SVM is proposed [8], based on the Nyström approximation [12, 49] with active selection of support vectors and estimation in the primal space. The methods with primal-dual representations have also been developed for kernel spectral clustering [2], data visualization [39], dimensionality reduction and survival analysis [40].

The present LS-SVMlab toolbox User's Guide contains Matlab implementations for a number of LS-SVM algorithms related to classification, regression, time-series prediction and unsupervised learning. All functions are tested with Matlab R2008a, R2008b, R2009a, R2009b and R2010a. References to commands in the toolbox are written in typewriter font.

A main reference and overview on least squares support vector machines is

J.A.K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, J. Vandewalle, Least Squares Support Vector Machines, World Scientific, Singapore, 2002 (ISBN 981-238-151-1).

The LS-SVMlab homepage is

http://www.esat.kuleuven.be/sista/lssvmlab/

The LS-SVMlab toolbox is made available under the GNU general license policy:

Copyright (C) 2010 KULeuven-ESAT-SCD

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the website of LS-SVMlab or the GNU General Public License for a copy of the GNU General Public License specifications.

Chapter 2

A bird's eye view on LS-SVMlab

The toolbox is mainly intended for use with the commercial Matlab package. The Matlab toolbox is compiled and tested for different computer architectures including Linux and Windows. Most functions can handle datasets up to 20,000 data points or more. LS-SVMlab's interface for Matlab consists of a basic version for beginners as well as a more advanced version with programs for multiclass encoding techniques and a Bayesian framework. Future versions will gradually incorporate new results and additional functionalities. A number of functions are restricted to LS-SVMs (these include the extension lssvm in the function name), the others are generally usable. A number of demos illustrate how to use the different features of the toolbox. The Matlab function interfaces are organized in two principal ways: the functions can be called either in a functional way or using an object oriented structure (referred to as the model) as e.g. in Netlab [22], depending on the user's choice (see footnote 1).


2.1 Classification and regression

Function calls: trainlssvm, simlssvm, plotlssvm, prelssvm, postlssvm, cilssvm, predlssvm;
Demos: Subsections 3.2, 3.3, demofun, democlass, democonfint.

The Matlab toolbox is built around a fast LS-SVM training and simulation algorithm. The corresponding function calls can be used for classification as well as for function estimation. The function plotlssvm displays the simulation results of the model in the region of the training points. The linear system is solved via the flexible and straightforward code implemented in Matlab (lssvmMATLAB.m), which is based on the Matlab matrix division (backslash command \). Functions for single and multiple output regression and classification are available. Training and simulation can be done for each output separately by passing different kernel functions, kernel and/or regularization parameters as a column vector. It is straightforward to implement other kernel functions in the toolbox.

The performance of a model depends on the scaling of the input and output data. A dedicated algorithm detects and appropriately rescales continuous, categorical and binary variables (prelssvm, postlssvm).

An important tool accompanying the LS-SVM for function estimation is the construction of interval estimates such as confidence intervals. In the area of kernel based regression, a popular tool to construct interval estimates is the bootstrap (see e.g. [15] and references therein). The functions cilssvm and predlssvm result in confidence and prediction intervals respectively for LS-SVM [9]. This method is not based on the bootstrap and thus obtains interval estimates quickly.

Footnote 1: See http://www.kernel-machines.org/software.html for other software in kernel based learning techniques.
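To make the "linear system solved with backslash" remark concrete, here is a minimal sketch of an LS-SVM classifier written in plain MATLAB, without any toolbox calls. It is an illustration under stated assumptions, not the code in lssvmMATLAB.m: the kernel is assumed to have the form exp(-||x - z||^2 / sig2), the data are a small deterministic grid, and all variable names are chosen for this example only.

```matlab
% Minimal LS-SVM classifier sketch (plain MATLAB, no toolbox functions).
% Assumption: RBF kernel of the form K(x,z) = exp(-||x - z||^2 / sig2).

% Toy data on a regular grid (deterministic, so no random seed is needed)
[g1, g2] = meshgrid(linspace(-1, 1, 8));
X = [g1(:), g2(:)];                  % each row is one data point
Y = sign(sin(X(:,1)) + X(:,2));      % labels in {-1,+1}
Y(Y == 0) = 1;                       % guard against points exactly on the boundary

gam  = 10;                           % regularization parameter
sig2 = 0.4;                          % squared kernel bandwidth
N    = size(X, 1);

% Pairwise squared distances and kernel matrix
sq = sum(X.^2, 2);
D2 = bsxfun(@plus, sq, sq') - 2 * (X * X');
K  = exp(-D2 ./ sig2);

% Dual system of the LS-SVM classifier:
%   [0   Y'             ] [b    ]   [0]
%   [Y   Omega + I/gam  ] [alpha] = [1],   Omega(i,j) = Y(i)*Y(j)*K(i,j)
Omega = (Y * Y') .* K;
A     = [0, Y'; Y, Omega + eye(N) ./ gam];
rhs   = [0; ones(N, 1)];
sol   = A \ rhs;                     % backslash solves the linear system
b     = sol(1);
alpha = sol(2:end);

% Latent (continuous) output and predicted labels on the training points
latent = K * (alpha .* Y) + b;
Yhat   = sign(latent);
trainAccuracy = mean(Yhat == Y);
```

The first row of the system enforces `Y' * alpha = 0`, and the remaining rows imply `Y .* latent = 1 - alpha ./ gam`, which is a convenient sanity check on any implementation.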


2.1.1 Classification extensions

Function calls: codelssvm, code, deltablssvm, roc, latentlssvm;
Demos: Subsection 3.2, democlass.

A number of additional function files are available for the classification task. The latent variable of a classification model (latentlssvm) is the continuous output obtained by simulation, which is discretised to make the final decisions. The Receiver Operating Characteristic curve [16] (roc) can be used to measure the performance of a classifier. Multiclass classification problems are decomposed into multiple binary classification tasks [45]. Several coding schemes can be used at this point: minimum output, one-versus-one, one-versus-all and error correcting coding schemes. To decode a given result, the Hamming distance, loss function distance and Bayesian decoding can be applied. A correction of the bias term can be done, which is especially interesting for small data sets.


2.1.2 Tuning and robustness

Function calls: tunelssvm, crossvalidatelssvm, leaveoneoutlssvm, robustlssvm;
Demos: Subsections 3.2.2, 3.2.6, 3.3.6, 3.3.8, demofun, democlass, demomodel.

A number of methods to estimate the generalization performance of the trained model are included. For classification, the rate of misclassifications (misclass) can be used. Estimates based on repeated training and validation are given by crossvalidatelssvm and leaveoneoutlssvm. A robust crossvalidation score function (based on iteratively reweighted LS-SVM) [7, 6] is called by rcrossvalidatelssvm. In the case of outliers in the data, corrections to the support values will improve the model (robustlssvm) [34]. These performance measures can be used to determine the tuning parameters (e.g. the regularization and kernel parameters) of the LS-SVM (tunelssvm).

In this version, the tuning of the parameters is conducted in two steps. First, a state-of-the-art global optimization technique, Coupled Simulated Annealing (CSA) [52], determines suitable parameters according to some criterion. Second, these parameters are given to a second optimization procedure (simplex or gridsearch) to perform a fine-tuning step. CSA has already proven to be more effective than multi-start gradient descent optimization [35]. Another advantage of CSA is that it uses the acceptance temperature to control the variance of the acceptance probabilities with a control scheme. This leads to an improved optimization efficiency because it reduces the sensitivity of the algorithm to the initialization parameters while guiding the optimization process to quasi-optimal runs. By default, CSA uses five multiple starters.
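The two-step idea (a global stage to find a reasonable starting point, then a simplex refinement) can be sketched in plain MATLAB as follows. This is only an illustration: the first stage below is a coarse grid scan used as a simple stand-in for CSA and is not CSA itself, the LS-SVM regressor and the 5-fold cross-validation cost are written out by hand, the kernel form exp(-(a-c)^2 / sig2) is an assumption, and every name is hypothetical. The refinement uses `fminsearch`, MATLAB's Nelder-Mead simplex routine.

```matlab
% Two-step tuning sketch (plain MATLAB, no toolbox functions).
% Stage 1: coarse grid scan, a STAND-IN for Coupled Simulated Annealing.
% Stage 2: Nelder-Mead simplex refinement with fminsearch.

% Deterministic toy regression data
x = linspace(-1, 1, 40)';
y = sin(3 * x) + 0.1 * cos(37 * x);      % smooth trend plus a small wiggle

% Interleaved fold labels for L-fold cross-validation
L    = 5;
fold = mod((1:numel(x))' - 1, L) + 1;

% RBF kernel for 1-D inputs (assumed form)
rbf = @(a, c, sig2) exp(-bsxfun(@minus, a, c').^2 ./ sig2);

% LS-SVM regression system matrix: [0 1'; 1 K + I/gam]
sysmat = @(xtr, gam, sig2) [0, ones(1, numel(xtr)); ...
    ones(numel(xtr), 1), rbf(xtr, xtr, sig2) + eye(numel(xtr)) ./ gam];

% Train on (xtr, ytr) and predict at xte: [1, K(xte,xtr)] * [b; alpha]
fitpred = @(xtr, ytr, xte, gam, sig2) ...
    [ones(numel(xte), 1), rbf(xte, xtr, sig2)] * (sysmat(xtr, gam, sig2) \ [0; ytr]);

% Mean squared error on fold k, and the L-fold score in log-parameters
foldmse = @(k, gam, sig2) mean((y(fold == k) - ...
    fitpred(x(fold ~= k), y(fold ~= k), x(fold == k), gam, sig2)).^2);
cvcost  = @(p) mean(arrayfun(@(k) foldmse(k, exp(p(1)), exp(p(2))), 1:L));

% Stage 1: coarse scan over p = [log(gam), log(sig2)]
[G, S]  = meshgrid(linspace(-2, 8, 6), linspace(-6, 2, 6));
cand    = [G(:), S(:)];
score1  = arrayfun(@(i) cvcost(cand(i, :)), (1:size(cand, 1))');
[best1, i1] = min(score1);

% Stage 2: simplex refinement started from the best coarse candidate
[pOpt, best2] = fminsearch(cvcost, cand(i1, :));
gam  = exp(pOpt(1));
sig2 = exp(pOpt(2));
```

Searching in log-parameters keeps `gam` and `sig2` positive without explicit constraints. Because the simplex starts at the best coarse candidate, the refined score `best2` cannot be worse than `best1`.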


2.1.3 Bayesian framework

Function calls: bay_lssvm, bay_optimize, bay_lssvmARD, bay_errorbar, bay_modoutClass, kpca, eign;
Demos: Subsections 3.2.5, 3.3.3.

Functions for calculating the posterior probability of the model and hyper-parameters at different levels of inference are available (bay_lssvm) [41]. Error bars are obtained by taking into account model and hyper-parameter uncertainties (bay_errorbar). For classification [44], one can estimate the posterior class probabilities (this is also called the moderated output) (bay_modoutClass). The Bayesian framework makes use of the eigenvalue decomposition of the kernel matrix. The size of the matrix grows with the number of data points. Hence, one needs approximation techniques to handle large datasets. It is known that mainly the principal eigenvalues and corresponding eigenvectors are relevant. Therefore, iterative approximation methods such as the Nyström method [46, 49], which is also frequently used in Gaussian processes, are included. Input selection can be done by Automatic Relevance Determination (bay_lssvmARD) [42]. In a backward variable selection, the third level of inference of the Bayesian framework is used to infer the most relevant inputs of the problem.


2.2 NARX models and prediction

Function calls: predict, windowize;
Demo: Subsection 3.3.8.

Extensions towards nonlinear NARX systems for time-series applications are available [38]. A NARX model can be built based on a nonlinear regressor by estimating in each iteration the next output value given the past output (and input) measurements. A dataset is converted into a new input (the past measurements) and output set (the future output) by windowize and windowizeNARX for respectively the time-series case and in general the NARX case with exogenous input. Iteratively predicting (in recurrent mode) the next output based on the previous predictions and starting values is done by predict.
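The windowing and recurrent prediction described here can be illustrated with a few lines of plain MATLAB. This is a sketch of the general idea only: the column ordering and return values of the toolbox functions windowize and predict may differ, and the one-step predictor `f` below is a hypothetical linear extrapolation standing in for a trained model.

```matlab
% Turn a scalar time series into lagged inputs and one-step-ahead targets,
% then predict iteratively by feeding predictions back in (plain MATLAB).

z = (1:8)';                          % toy time series
p = 3;                               % number of past values used as regressors

n   = numel(z) - p;                  % number of (window, target) pairs
idx = bsxfun(@plus, (1:n)', 0:p-1);  % row t holds the indices t, ..., t+p-1
Xw  = z(idx);                        % n-by-p matrix of past values
Yw  = z(p+1:end);                    % value that follows each window

% Hypothetical one-step predictor (linear extrapolation from the last two values);
% in practice this would be a model trained on (Xw, Yw).
f = @(w) 2 * w(end) - w(end-1);

% Recurrent prediction: start from the last observed window
w      = z(end-p+1:end)';
nAhead = 3;
pred   = zeros(nAhead, 1);
for k = 1:nAhead
    pred(k) = f(w);
    w = [w(2:end), pred(k)];         % drop the oldest value, append the prediction
end
```

For this toy series the first window is `[1 2 3]` with target `4`, and the three recurrent predictions continue the series with `9`, `10` and `11`.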


2.3 Unsupervised learning

Function calls: kpca, denoise_kpca, preimage_rbf;
Demo: Subsection 3.4.

Unsupervised learning can be done by kernel based PCA (kpca) as described by [30], for which a primal-dual interpretation with least squares support vector machine formulation has been given in [37], which has also been further extended to kernel canonical correlation analysis [38] and kernel PLS.


2.4 Solving large scale problems with fixed size LS-SVM

Function calls: demo_fixedsize, AFEm, kentropy;
Demos: Subsection 3.3.9, demo_fixedsize, demo_fixedclass.

Classical kernel based algorithms like e.g. LS-SVM [32] typically have memory and computational requirements of $O(N^2)$. Work on large scale methods proposes solutions to circumvent this bottleneck [38, 30].

For large datasets it would be advantageous to solve the least squares problem in the primal weight space because then the size of the vector of unknowns is proportional to the feature vector dimension and not to the number of datapoints. However, the feature space mapping induced by the kernel is needed in order to obtain non-linearity. For this purpose, a method of fixed size LS-SVM is proposed [38]. Firstly, the Nyström method [44, 49] can be used to estimate the feature space mapping. The link between Nyström approximation, kernel PCA and density estimation has been discussed in [12]. In fixed size LS-SVM these links are employed together with the explicit primal-dual LS-SVM interpretations. The support vectors are selected according to a quadratic Renyi entropy criterion (kentropy). In a last step a regression is done in the primal space, which makes the method suitable for solving large scale nonlinear function estimation and classification problems. The method of fixed size LS-SVM is suitable for handling very large data sets.

An alternative criterion for subset selection was presented by [3, 4], which is closely related to [49] and [30]. It measures the quality of approximation of the feature space and the space induced by the subset (see Automatic Feature Extraction or AFEm). In [49] the subset was taken as a random subsample from the data (subsample).
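As a rough illustration of how a subset of points can yield an explicit approximate feature map, the following plain MATLAB sketch builds a Nyström-type map from the eigendecomposition of the subset kernel matrix. It is not the toolbox's AFEm implementation: the subset is simply every third point (a hypothetical choice, not the Renyi entropy criterion), the kernel form exp(-||x - z||^2 / sig2) is assumed, and the eigenvalue cut-off is an arbitrary numerical safeguard.

```matlab
% Nystrom-type approximate feature map from a subset of M points (sketch).

[g1, g2] = meshgrid(linspace(-1, 1, 6));
X    = [g1(:), g2(:)];               % N = 36 data points, one per row
sig2 = 0.5;

% Assumed RBF kernel between the rows of A and the rows of B
rbf = @(A, B) exp(-(bsxfun(@plus, sum(A.^2, 2), sum(B.^2, 2)') - 2 * (A * B')) ./ sig2);

sel = 1:3:size(X, 1);                % hypothetical subset: every third point
Xs  = X(sel, :);

% Eigendecomposition of the small M-by-M kernel matrix
Kmm = rbf(Xs, Xs);
Kmm = (Kmm + Kmm') / 2;              % enforce exact symmetry before eig
[U, Lam] = eig(Kmm);
lam  = diag(Lam);
keep = lam > 1e-10 * max(lam);       % drop numerically negligible directions
U    = U(:, keep);
lam  = lam(keep);

% Approximate feature map: one row per data point, one column per kept eigenpair
Phi = rbf(X, Xs) * U * diag(1 ./ sqrt(lam));

% Phi*Phi' approximates the full kernel matrix
K       = rbf(X, X);
Kapprox = Phi * Phi';
relErr  = norm(K - Kapprox, 'fro') / norm(K, 'fro');
```

With `Phi` in hand, a linear least squares or ridge regression on its columns works in a space whose dimension is set by the subset size rather than by the number of data points, which is the motivation given above for estimating in the primal space.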

Chapter 3

LS-SVMlab toolbox examples

3.1 Roadmap to LS-SVM

In this Section we briefly sketch how to obtain an LS-SVM model (valid for classification and regression), see Figure 3.1.

1. Choose between the functional or object oriented interface (initlssvm), see A.3.16
2. Search for suitable tuning parameters (tunelssvm), see A.3.36
3. Train the model given the previously determined tuning parameters (trainlssvm), see A.3.35
4a. Simulate the model on e.g. test data (simlssvm), see A.3.34
4b. Visualize the results when possible (plotlssvm), see A.3.25

Figure 3.1: List of commands for obtaining an LS-SVM model



3.2 Classification

At first, the possibilities of the toolbox for classification tasks are illustrated.

3.2.1 Hello world

A simple example shows how to start using the toolbox for a classification task. We start with constructing a simple example dataset according to the correct formatting. Data are represented as matrices where each row of the matrix contains one datapoint:

>> X = 2.*rand(100,2)-1;
>> Y = sign(sin(X(:,1))+X(:,2));
>> X

X =
    0.9003   -0.9695
   -0.5377    0.4936
    0.2137   -0.1098
   -0.0280    0.8636
    0.7826   -0.0680
    0.5242   -0.1627
      ....      ....
   -0.4556    0.7073
   -0.6024    0.1871

>> Y

Y =
    -1
    -1
     1
     1
     1
     1
   ...
     1
    -1

In order to make an LS-SVM model (with Gaussian RBF kernel), we need two tuning parameters: $\gamma$ (gam) is the regularization parameter, determining the trade-off between the training error minimization and smoothness. In the common case of the Gaussian RBF kernel, $\sigma^2$ (sig2) is the squared bandwidth:

>> gam = 10;
>> sig2 = 0.4;
>> type = 'classification';
>> [alpha,b] = trainlssvm({X,Y,type,gam,sig2,'RBF_kernel'});

The parameters and the variables relevant for the LS-SVM are passed as one cell. This cell allows for consistent default handling of LS-SVM parameters and syntactical grouping of related arguments. This definition should be used consistently throughout the use of that LS-SVM model. The corresponding object oriented interface to LS-SVMlab leads to shorter function calls (see demomodel).

By default, the data are preprocessed by application of the function prelssvm to the raw data and the function postlssvm on the predictions of the model. This option can explicitly be switched off in the call:

>> [alpha,b] = trainlssvm({X,Y,type,gam,sig2,'RBF_kernel','original'});

or be switched on (by default):

>> [alpha,b] = trainlssvm({X,Y,type,gam,sig2,'RBF_kernel','preprocess'});

Remember to consistently use the same option in all successive calls. To evaluate new points for this model, the function simlssvm is used.

>> Xt = 2.*rand(10,2)-1;
>> Ytest = simlssvm({X,Y,type,gam,sig2,'RBF_kernel'},{alpha,b},Xt);

The LS-SVM result can be displayed if the dimension of the input data is two.

>> plotlssvm({X,Y,type,gam,sig2,'RBF_kernel'},{alpha,b});

All plotting is done with this simple command. It looks for the best way of displaying the result (Figure 3.2).

Figure 3.2: Figure generated by plotlssvm in the simple classification task (LS-SVM with RBF kernel, $\gamma = 10$, $\sigma^2 = 0.4$, two classes; axes X1 and X2).



3.2.2 Example

The well-known Ripley dataset problem consists of two classes where the data for each class have been generated by a mixture of two normal distributions (Figure 3.3a). First, let us build an LS-SVM on the dataset and determine suitable tuning parameters. These tuning parameters are found by using a combination of Coupled Simulated Annealing (CSA) and a standard simplex method. First, CSA finds good starting values and these are passed to the simplex method in order to fine-tune the result.

>> % load dataset ...
>> type = 'classification';
>> L_fold = 10; % L-fold crossvalidation
>> [gam,sig2] = tunelssvm({X,Y,type,[],[],'RBF_kernel'},'simplex',...
       'crossvalidatelssvm',{L_fold,'misclass'});
>> [alpha,b] = trainlssvm({X,Y,type,gam,sig2,'RBF_kernel'});
>> plotlssvm({X,Y,type,gam,sig2,'RBF_kernel'},{alpha,b});

It is still possible to use a gridsearch in the second run, i.e. as a replacement for the simplex method:

>> [gam,sig2] = tunelssvm({X,Y,type,[],[],'RBF_kernel'},'gridsearch',...
       'crossvalidatelssvm',{L_fold,'misclass'});

The Receiver Operating Characteristic (ROC) curve gives information about the quality of the classifier:

>> [alpha,b] = trainlssvm({X,Y,type,gam,sig2,'RBF_kernel'});
>> % latent variables are needed to make the ROC curve
>> Y_latent = latentlssvm({X,Y,type,gam,sig2,'RBF_kernel'},{alpha,b},X);
>> [area,se,thresholds,oneMinusSpec,Sens]=roc(Y_latent,Y);
>> [thresholds oneMinusSpec Sens]

ans =
   -2.1915    1.0000    1.0000
   -1.1915    0.9920    1.0000
   -1.1268    0.9840    1.0000
   -1.0823    0.9760    1.0000
       ...       ...       ...
   -0.2699    0.1840    0.9360
   -0.2554    0.1760    0.9360
   -0.2277    0.1760    0.9280
   -0.1811    0.1680    0.9280
       ...       ...       ...
    1.1184         0    0.0080
    1.1220         0         0
    2.1220         0         0

The corresponding ROC curve is shown in Figure 3.3b.


Figure 3.3: ROC curve of the Ripley classification task. (a) Original LS-SVM classifier ($\gamma = 11.7704$, $\sigma^2 = 1.2557$, two classes; axes X1 and X2). (b) Receiver Operating Characteristic curve (area = 0.96403, std = 0.009585; sensitivity versus 1 - specificity).
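For readers who want to see what lies behind the three columns printed above, here is a small plain MATLAB sketch that computes sensitivity, 1 - specificity and the area under the curve from latent outputs. It is not the toolbox's roc function (which also returns a standard error); the six latent values and labels are made up for the illustration.

```matlab
% ROC curve from latent (continuous) classifier outputs, plain MATLAB sketch.

latent = [-2.0; -1.2; -0.4; 0.3; 0.9; 1.5];   % made-up latent outputs
labels = [-1;   -1;    1;  -1;   1;   1];     % true classes in {-1,+1}

thr  = sort(latent, 'descend');               % one threshold per observed value
sens = zeros(numel(thr) + 1, 1);              % sensitivity (true positive rate)
fpr  = zeros(numel(thr) + 1, 1);              % 1 - specificity (false positive rate)

% The first entry stays at (0,0): nothing is predicted positive
for k = 1:numel(thr)
    pred      = latent >= thr(k);             % predict +1 above the threshold
    sens(k+1) = sum(pred & labels ==  1) / sum(labels ==  1);
    fpr(k+1)  = sum(pred & labels == -1) / sum(labels == -1);
end

area = trapz(fpr, sens);                      % area under the ROC curve
```

On this toy input, 8 of the 9 (positive, negative) pairs are ranked correctly, so the area equals 8/9.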




3.2.3 Using the object oriented interface: initlssvm

Another possibility to obtain the same results is by using the object oriented interface. This goes as follows:

>> % load dataset ...
>> % gateway to the object oriented interface
>> model = initlssvm(X,Y,type,[],[],'RBF_kernel');
>> model = tunelssvm(model,'simplex','crossvalidatelssvm',{L_fold,'misclass'});
>> model = trainlssvm(model);
>> plotlssvm(model);
>> % latent variables are needed to make the ROC curve
>> Y_latent = latentlssvm(model,X);
>> [area,se,thresholds,oneMinusSpec,Sens]=roc(Y_latent,Y);


3.2.4 LS-SVM classification: only one command line away!

The simplest way to obtain an LS-SVM model goes as follows (binary classification problems and one versus one encoding for multiclass):

>> % load dataset ...
>> type = 'classification';
>> Yp = lssvm(X,Y,type);

The lssvm command automatically tunes the tuning parameters via 10-fold cross-validation (CV) or leave-one-out CV depending on the sample size. This function will automatically plot (when possible) the solution. By default, the Gaussian RBF kernel is taken. Further information can be found in A.3.24.




3.2.5 Bayesian inference for classification

This Subsection further proceeds on the results of Subsection 3.2.2. A Bayesian framework is used to optimize the tuning parameters and to obtain the moderated output. The optimal regularization parameter gam and kernel parameter sig2 can be found by optimizing the cost on the second and the third level of inference, respectively. It is recommended to initiate the model with appropriate starting values:

>> [gam, sig2] = bay_initlssvm({X,Y,type,gam,sig2,'RBF_kernel'});

Optimization on the second level leads to an optimal regularization parameter:

>> [model, gam_opt] = bay_optimize({X,Y,type,gam,sig2,'RBF_kernel'},2);

Optimization on the third level leads to an optimal kernel parameter:

>> [cost_L3,sig2_opt] = bay_optimize({X,Y,type,gam_opt,sig2,'RBF_kernel'},3);

The posterior class probabilities are found by incorporating the uncertainty of the model parameters:

>> gam = 10;
>> sig2 = 1;
>> Ymodout = bay_modoutClass({X,Y,type,10,1,'RBF_kernel'},'figure');

One can specify a prior class probability in the moderated output in order to compensate for an unbalanced number of training data points in the two classes. When the training set contains $N^+$ positive instances and $N^-$ negative ones, the prior used in the moderated output is calculated as:

$$\text{prior} = \frac{N^+}{N^+ + N^-}$$

>> Np = 10;
>> Nn = 50;
>> prior = Np / (Nn + Np);
>> Posterior_class_P = bay_modoutClass({X,Y,type,10,1,'RBF_kernel'},...
       'figure', prior);

The results are shown in Figure 3.4.



Figure 3.4: (a) Moderated output of the LS-SVM classifier on the Ripley data set. The colors indicate the probability to belong to a certain class; (b) This example shows the moderated output of an unbalanced subset of the Ripley data; (c) One can compensate for unbalanced data in the calculation of the moderated output. Notice that the area of the blue zone with the positive samples increases by the compensation. The red zone shrinks accordingly.




Multi-class coding

The following example shows how to use an encoding scheme for multi-class problems. The encoding and decoding are considered as a separate and independent preprocessing and postprocessing step respectively (Figure 3.5(a) and 3.5(b)). A demo file demomulticlass is included in the toolbox.

>> % load multiclass data ...
>> [Ycode, codebook, old_codebook] = code(Y,'code_MOC');
>>
>> [alpha,b] = trainlssvm({X,Ycode,'classifier',gam,sig2});
>> Yhc = simlssvm({X,Ycode,'classifier',gam,sig2},{alpha,b},Xtest);
>>
>> Yh = code(Yhc,old_codebook,[],codebook,'codedist_hamming');

In multiclass classification problems, it is easiest to use the object oriented interface which integrates the encoding in the LS-SVM training and simulation calls:

>> % load multiclass data ...
>> model = initlssvm(X,Y,'classifier',[],[],'RBF_kernel');
>> model = tunelssvm(model,'simplex',...
                     'leaveoneoutlssvm',{'misclass'},'code_OneVsOne');
>> model = trainlssvm(model);
>> plotlssvm(model);

The last argument of the tunelssvm routine can be set to

   'code_OneVsOne': One versus one coding
   'code_MOC': Minimum output coding
   'code_ECOC': Error correcting output code
   'code_OneVsAll': One versus all coding
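To make the idea of a codebook concrete, the sketch below encodes class labels with a one-versus-all scheme and decodes them again by minimal Hamming distance. It is written in plain MATLAB for illustration only: the labels are hypothetical and the toolbox routine code is not used, so its internal conventions may differ.

```matlab
% Illustrative one-versus-all coding (not the toolbox routine 'code').
Y = [1; 3; 2; 3; 1];              % hypothetical class labels
classes = unique(Y);              % sorted list of distinct labels
nc = numel(classes);

% Codebook: column k is the codeword of class k (+1 for class k, -1 otherwise).
codebook = 2*eye(nc) - 1;

% Encoding: each label becomes a row of nc binary targets.
Ycode = zeros(numel(Y), nc);
for i = 1:numel(Y)
    k = find(classes == Y(i));
    Ycode(i,:) = codebook(:,k)';
end

% Decoding: choose the class whose codeword is closest in Hamming distance
% to the binary outputs (here the exact codes are decoded).
Yh = zeros(size(Ycode,1), 1);
for i = 1:size(Ycode,1)
    dist = sum(repmat(Ycode(i,:)', 1, nc) ~= codebook, 1);
    [~, k] = min(dist);
    Yh(i) = classes(k);
end
```

Each column of Ycode defines one binary classification task. Note that one-versus-all codewords differ in only two positions, so a single wrong binary output can already produce a tie between two classes.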



[Figure 3.5 panels (a) to (d): multi-class classifier plots showing class 1, class 2 and class 3; horizontal axis X1.]





Figure 3.5: LS-SVM multi-class example: (a) one versus one encoding; (b) error correcting output code; (c) Minimum output code; (d) One versus all encoding.


A simple example

This is a simple demo, solving a simple regression task using LS-SVMlab. A dataset is constructed in the correct formatting. The data are represented as matrices where each row contains one datapoint:

>> X = linspace(-1,1,50)';
>> Y = (15*(X.^2-1).^2.*X.^4).*exp(-X)+normrnd(0,0.1,length(X),1);
>> X

X =
   -1.0000
   -0.9592
   -0.9184
   -0.8776
   -0.8367
   -0.7959
   ...
    0.9592
    1.0000

>> Y

Y =
    0.0138
    0.2953
    0.6847
    1.1572
    1.5844
    1.9935
   ...
   -0.0613
   -0.0298


In order to obtain an LS-SVM model (with the RBF kernel), we need two extra tuning parameters: γ (gam) is the regularization parameter, determining the trade-off between the training error minimization and smoothness of the estimated function; σ² (sig2) is the kernel function parameter. In this case we use leave-one-out CV to determine the tuning parameters.

>> type = 'function estimation';
>> [gam,sig2] = tunelssvm({X,Y,type,[],[],'RBF_kernel'},'simplex',...
                          'leaveoneoutlssvm',{'mse'});
>> [alpha,b] = trainlssvm({X,Y,type,gam,sig2,'RBF_kernel'});
>> plotlssvm({X,Y,type,gam,sig2,'RBF_kernel'},{alpha,b});

The parameters and the variables relevant for the LS-SVM are passed as one cell. This cell allows for consistent default handling of LS-SVM parameters and syntactical grouping of related arguments. This definition should be used consistently throughout the use of that LS-SVM model. The object oriented interface to LS-SVMlab leads to shorter function calls (see demomodel).

By default, the data are preprocessed by application of the function prelssvm to the raw data and the function postlssvm on the predictions of the model. This option can be explicitly switched off in the call:

>> [alpha,b] = trainlssvm({X,Y,type,gam,sig2,'RBF_kernel','original'});

or can be switched on (by default):

>> [alpha,b] = trainlssvm({X,Y,type,gam,sig2,'RBF_kernel','preprocess'});

Remember to consistently use the same option in all successive calls. To evaluate new points for this model, the function simlssvm is used. At first, test data is generated:

>> Xt = rand(10,1).*sign(randn(10,1));

Then, the obtained model is simulated on the test data:

>> Yt = simlssvm({X,Y,type,gam,sig2,'RBF_kernel','preprocess'},{alpha,b},Xt)

Yt =
    0.0847
    0.0378
    1.9862
    0.4688
    0.3773
    1.9832
    0.2658
    0.2515
    1.5571
    0.3130

The LS-SVM result can be displayed if the dimension of the input data is one or two.

>> plotlssvm({X,Y,type,gam,sig2,'RBF_kernel','preprocess'},{alpha,b});


All plotting is done with this simple command. It looks for the best way of displaying the result (Figure 3.6).
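For readers who want to see what training and simulation amount to numerically, the following plain MATLAB sketch solves the standard LS-SVM regression dual system [0 1'; 1 K + I/gam] [b; alpha] = [0; Y] and evaluates the resulting model. It is an illustration under stated assumptions, not the toolbox implementation: the kernel form exp(-||x-z||^2/sig2), the noise-free toy target and the values of gam and sig2 are all chosen here for the example, and no preprocessing is applied.

```matlab
% Sketch of LS-SVM regression through its dual linear system.
% Assumption: RBF kernel K(x,z) = exp(-||x-z||^2 / sig2).
X = linspace(-1,1,50)';
Y = sin(pi*X);                    % noise-free toy target (hypothetical)
gam = 100; sig2 = 0.2;            % hypothetical tuning parameters
N = size(X,1);

% Kernel matrix from pairwise squared distances (X is N x 1 here).
D = (repmat(X,1,N) - repmat(X',N,1)).^2;
K = exp(-D / sig2);

% Solve [0 1'; 1 K + I/gam] [b; alpha] = [0; Y].
A = [0, ones(1,N); ones(N,1), K + eye(N)/gam];
sol = A \ [0; Y];
b = sol(1);
alpha = sol(2:end);

% Evaluate the model: yhat(x) = sum_i alpha_i K(x, x_i) + b.
Xt = [-0.5; 0; 0.5];
Dt = (repmat(Xt,1,N) - repmat(X',numel(Xt),1)).^2;
Yt = exp(-Dt / sig2) * alpha + b;
```

The first row of the system enforces that the support values alpha sum to zero, which is the role of the bias term b.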


LS-SVM regression: only one command line away!

As an alternative one can use the one line lssvm command:

>> type = 'function estimation';
>> Yp = lssvm(X,Y,type);

By default, the Gaussian RBF kernel is used. Further information can be found in A.3.24.
[Figure 3.6 plot: function estimation using LS-SVM with RBF kernel, γ = 21.1552, σ² = 0.17818; horizontal axis X.]


Figure 3.6: Simple regression problem. The solid line indicates the estimated outputs, the dotted line represents the true underlying function. The dots indicate the training data points.




Bayesian Inference for Regression

An example on the sinc data is given:

>> type = 'function approximation';
>> X = linspace(-2.2,2.2,250)';
>> Y = sinc(X) +normrnd(0,.1,size(X,1),1);
>> [Yp,alpha,b,gam,sig2] = lssvm(X,Y,type);

The error bars on the training data are computed using Bayesian inference:

>> sig2e = bay_errorbar({X,Y,type, gam, sig2},'figure');

See Figure 3.7 for the resulting error bars.
[Figure 3.7 plot: LS-SVM estimate (γ = 79.9993, σ² = 1.3096) and its 68% (1σ) and 95% (2σ) error bands; Y against X.]



Figure 3.7: This figure gives the 68% error bars (green dotted and green dashed-dotted line) and the 95% error bars (red dotted and red dashed-dotted line) of the LS-SVM estimate (solid line) of a simple sinc function.

In the next example, the procedure of the automatic relevance determination is illustrated:

>> X = normrnd(0,2,100,3);
>> Y = sinc(X(:,1)) + 0.05.*X(:,2) +normrnd(0,.1,size(X,1),1);

Automatic relevance determination is used to determine the subset of the most relevant inputs for the proposed model:

>> inputs = bay_lssvmARD({X,Y,type, 10,3});
>> [alpha,b] = trainlssvm({X(:,inputs),Y,type, 10,1});




Using the object oriented model interface

This case illustrates how one can use the model interface. Here, regression is considered, but the extension towards classification is analogous.

>> type = 'function approximation';
>> X = normrnd(0,2,100,1);
>> Y = sinc(X) +normrnd(0,.1,size(X,1),1);
>> kernel = 'RBF_kernel';
>> gam = 10;
>> sig2 = 0.2;

A model is defined:

>> model = initlssvm(X,Y,type,gam,sig2,kernel);
>> model

model =
           type: 'f'
          x_dim: 1
          y_dim: 1
        nb_data: 100
    kernel_type: 'RBF_kernel'
     preprocess: 'preprocess'
      prestatus: 'ok'
         xtrain: [100x1 double]
         ytrain: [100x1 double]
       selector: [1x100 double]
            gam: 10
    kernel_pars: 0.2000
       x_delays: 0
       y_delays: 0
          steps: 1
         latent: 'no'
           code: 'original'
       codetype: 'none'
    pre_xscheme: 'c'
    pre_yscheme: 'c'
      pre_xmean: -0.0690
       pre_xstd: 1.8282
      pre_ymean: 0.2259
       pre_ystd: 0.3977
         status: 'changed'
        weights: []

Training, simulation and making a plot is executed by the following calls:

>> model = trainlssvm(model);
>> Xt = normrnd(0,2,150,1);
>> Yt = simlssvm(model,Xt);
>> plotlssvm(model);

The second level of inference of the Bayesian framework can be used to optimize the regularization parameter gam. For this case, a Nyström approximation of the 20 principal eigenvectors is used:

>> model = bay_optimize(model,2,'eign', 50);

Optimization of the cost associated with the third level of inference gives an optimal kernel parameter. For this procedure, it is recommended to initiate the starting points of the kernel parameter. This optimization is based on Matlab's optimization toolbox. It can take a while.

>> model = bay_initlssvm(model);
>> model = bay_optimize(model,3,'eign',50);


Confidence/Prediction Intervals for Regression

Consider the following example (Fossil data set):

>> % Load data set X and Y

Initializing and tuning the parameters:

>> model = initlssvm(X,Y,'f',[],[], 'RBF_kernel');
>> model = tunelssvm(model,'simplex','crossvalidatelssvm',{10,'mse'});

Bias corrected approximate 100(1 − α)% pointwise confidence intervals on the estimated LS-SVM model can then be obtained by using the command cilssvm:

>> ci = cilssvm(model,alpha,'pointwise');

Typically, the value of the significance level alpha is set to 5%. The confidence intervals obtained by this command are pointwise. For example, by looking at two pointwise confidence intervals in Figure 3.8(a) (Fossil data set [26]) we can make the following two statements separately:

   (0.70743, 0.70745) is an approximate 95% pointwise confidence interval for m(105);
   (0.70741, 0.70744) is an approximate 95% pointwise confidence interval for m(120).

However, as is well known in multiple comparison theory, it is wrong to state that m(105) is contained in (0.70743, 0.70745) and simultaneously m(120) is contained in (0.70741, 0.70744) with 95% confidence. Therefore, it is not correct to connect the pointwise confidence intervals to produce a band around the estimated function. In order to make these statements we have to modify the interval to obtain simultaneous confidence intervals. Three major groups exist to modify the interval: Monte Carlo simulations, Bonferroni and Šidák corrections, and results based on distributions of maxima and upcrossing theory [25, 36, 18]. The latter is implemented in the software. Figure 3.8(b) shows the 95% pointwise and simultaneous confidence intervals on the estimated LS-SVM model. As expected the simultaneous intervals are much wider than pointwise intervals. Simultaneous confidence intervals can be obtained by

>> ci = cilssvm(model,alpha,'simultaneous');

In some cases one may also be interested in the uncertainty on the prediction for a new observation Xt. This type of requirement is fulfilled by the construction of a prediction interval. As before, pointwise and simultaneous prediction intervals can be found by

>> pi = predlssvm(model,Xt,alpha,'pointwise');

and

>> pi = predlssvm(model,Xt,alpha,'simultaneous');

respectively. We illustrate both types of prediction intervals on the following example. Note that the software can also handle heteroscedastic data. Also, cilssvm and predlssvm can be called by the functional interface (see A.3.9 and A.3.27).



[Figure 3.8 panels (a) and (b): estimated m(X) plotted against X.]










Figure 3.8: (a) Fossil data with two pointwise 95% confidence intervals; (b) Simultaneous and pointwise 95% confidence intervals. The outer (inner) region corresponds to simultaneous (pointwise) confidence intervals. The full line (in the middle) is the estimated LS-SVM model. For illustration purposes the 95% pointwise confidence intervals are connected.

>> X = linspace(-5,5,200)';
>> Y = sin(X)+sqrt(0.05*X.^2+0.01).*randn(200,1);
>> model = initlssvm(X,Y,'f',[],[], 'RBF_kernel');
>> model = tunelssvm(model,'simplex','crossvalidatelssvm',{10,'mae'});
>> Xt = linspace(-4.5,4.7,200)';

Figure 3.9 shows the 95% pointwise and simultaneous prediction intervals on the test set Xt. As expected the simultaneous intervals are again much wider than pointwise intervals.

[Figure 3.9 plot: estimated m(X) against X on the interval from -5 to 5.]

Figure 3.9: Pointwise and simultaneous 95% prediction intervals for the above given data. The outer (inner) region corresponds to simultaneous (pointwise) prediction intervals. The full line (in the middle) is the estimated LS-SVM model. For illustration purposes the 95% pointwise prediction intervals are connected.
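The Bonferroni and Šidák corrections mentioned in this section can be illustrated with a few lines of plain MATLAB. This is only a sketch of those two textbook corrections for n intervals evaluated jointly; it is not what cilssvm or predlssvm compute, since the software relies on upcrossing theory. The number of points n is a hypothetical choice, and the variable is named alpha_level to avoid confusion with the support values alpha.

```matlab
alpha_level = 0.05;   % overall significance level
n = 200;              % hypothetical number of jointly considered points

% Per-point significance levels that guarantee an overall level alpha_level.
alpha_bonf  = alpha_level / n;                  % Bonferroni correction
alpha_sidak = 1 - (1 - alpha_level)^(1/n);      % Sidak correction

fprintf('Bonferroni: %.4e, Sidak: %.4e\n', alpha_bonf, alpha_sidak);
```

Both per-point levels are far smaller than 5%, which is why simultaneous intervals are wider than pointwise ones; the Šidák level is slightly larger, hence slightly less conservative, than the Bonferroni level.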



As a final example, consider the Boston Housing data set (multivariate example). We selected randomly 338 training data points and 168 test data points. The corresponding simultaneous confidence and prediction intervals are shown in Figure 3.10(a) and Figure 3.10(b) respectively. The outputs on training as well as on test data are sorted and plotted against their corresponding index. Also, the respective intervals are sorted accordingly. For illustration purposes the simultaneous confidence/prediction intervals are not connected.

>> % load full data set X and Y
>> sel = randperm(506);
>>
>> % Construct test data
>> Xt = X(sel(1:168),:);
>> Yt = Y(sel(1:168));
>>
>> % training data
>> X = X(sel(169:end),:);
>> Y = Y(sel(169:end));
>>
>> model = initlssvm(X,Y,'f',[],[],'RBF_kernel');
>> model = tunelssvm(model,'simplex','crossvalidatelssvm',{10,'mse'});
>> model = trainlssvm(model);
>>
>> Yhci = simlssvm(model,X);
>> Yhpi = simlssvm(model,Xt);
>> [Yhci,indci] = sort(Yhci,'descend');
>> [Yhpi,indpi] = sort(Yhpi,'descend');
>>
>> % Simultaneous confidence intervals
>> ci = cilssvm(model,0.05,'simultaneous'); ci = ci(indci,:);
>> plot(Yhci); hold all, plot(ci(:,1),'g.'); plot(ci(:,2),'g.');
>>
>> % Simultaneous prediction intervals
>> pi = predlssvm(model,Xt,0.05,'simultaneous'); pi = pi(indpi,:);
>> plot(Yhpi); hold all, plot(pi(:,1),'g.'); plot(pi(:,2),'g.');

[Figure 3.10 panels: (a) sorted estimated m(X) (training data); (b) sorted estimated m(Xt) (test data); both plotted against the index.]



















Figure 3.10: (a) Simultaneous 95% confidence intervals for the Boston Housing data set (dots). Sorted outputs are plotted against their index; (b) Simultaneous 95% prediction intervals for the Boston Housing data set (dots). Sorted outputs are plotted against their index.




Robust regression

First, a dataset containing 15% outliers is constructed:

>> X = (-5:.07:5)';
>> epsilon = 0.15;
>> sel = rand(length(X),1)>epsilon;
>> Y = sinc(X)+sel.*normrnd(0,.1,length(X),1)+(1-sel).*normrnd(0,2,length(X),1);

Robust tuning of the tuning parameters is performed by rcrossvalidatelssvm. Also notice that the preferred loss function is the L1 (mae). The weighting function in the cost function is chosen to be the Huber weights. Other possibilities, included in the toolbox, are logistic weights, myriad weights and Hampel weights.

>> model = initlssvm(X,Y,'f',[],[],'RBF_kernel');
>> L_fold = 10; %10 fold CV
>> model = tunelssvm(model,'simplex',...
                     'rcrossvalidatelssvm',{L_fold,'mae'},'whuber');

Robust training is performed by robustlssvm:

>> model = robustlssvm(model);
>> plotlssvm(model);
[Figure 3.11 panels: function estimation using LS-SVM with γ = 0.14185, σ² = 0.047615, and with γ = 95025.4538, σ² = 0.66686; each panel shows the LS-SVM estimate, the data and the real function, Y against X.]



Figure 3.11: Experiments on a noisy sinc dataset with 15% outliers: (a) Application of the standard training and hyperparameter selection techniques; (b) Application of an iteratively reweighted LS-SVM training together with a robust crossvalidation score function, which enhances the test set performance.



In a second, more extreme, example, we have taken the contamination distribution to be a cubic standard Cauchy distribution and ε = 0.3.

>> X = (-5:.07:5)';
>> epsilon = 0.3;
>> sel = rand(length(X),1)>epsilon;
>> Y = sinc(X)+sel.*normrnd(0,.1,length(X),1)+(1-sel).*trnd(1,length(X),1).^3;

As before, we use the robust version of cross-validation. The weight function in the cost function is chosen to be the myriad weights. This type of weighting function is especially designed to handle extreme outliers. All weight functions $W : \mathbb{R} \to [0, 1]$, with $W(r) = \psi(r)/r$ satisfying $W(0) = 1$, are shown in Table 3.1 with corresponding loss function $L(r)$ and score function $\psi(r) = \frac{dL(r)}{dr}$. The results are shown in Figure 3.12. Three of the four weight functions contain parameters which have to be tuned (see Table 3.1). The software automatically tunes the parameters of the Huber and myriad weight functions according to the best performance for these two weight functions. The two parameters of the Hampel weight function are set to b1 = 2.5 and b2 = 3.

>> model = initlssvm(X,Y,'f',[],[],'RBF_kernel');
>> L_fold = 10; %10 fold CV
>> model = tunelssvm(model,'simplex',...
                     'rcrossvalidatelssvm',{L_fold,'mae'},'wmyriad');
>> model = robustlssvm(model);
>> plotlssvm(model);
[Figure 3.12 panels: function estimation using LS-SVM with RBF kernel, with γ = 0.018012, σ² = 0.93345, and with γ = 35.1583, σ² = 0.090211; each panel shows the LS-SVM estimate, the data and the real function, Y against X.]



Figure 3.12: Experiments on a noisy sinc dataset with extreme outliers. (a) Application of the standard training and tuning parameter selection techniques; (b) Application of an iteratively reweighted LS-SVM training (myriad weights) together with a robust cross-validation score function, which enhances the test set performance.



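Table 3.1: Definitions for the Huber, Hampel, Logistic and Myriad (with parameter $\delta \in \mathbb{R}_0^+$) weight functions $W(\cdot)$. The corresponding loss $L(\cdot)$ and score function $\psi(\cdot)$ are also given.

Huber
   $W(r) = 1$ if $|r| < \beta$;  $W(r) = \beta/|r|$ if $|r| \geq \beta$.
   $L(r) = r^2$ if $|r| < \beta$;  $L(r) = \beta|r| - \tfrac{1}{2}c^2$ if $|r| \geq \beta$.

Hampel
   $W(r) = 1$ if $|r| < b_1$;  $W(r) = \frac{b_2 - |r|}{b_2 - b_1}$ if $b_1 \leq |r| \leq b_2$;  $W(r) = 0$ if $|r| > b_2$.
   $L(r) = r^2$ if $|r| < b_1$;  $L(r) = \frac{b_2 r^2 - |r^3|}{b_2 - b_1}$ if $b_1 \leq |r| \leq b_2$;  $L(r) = 0$ if $|r| > b_2$.

Logistic
   $W(r) = \frac{\tanh(r)}{r}$
   $L(r) = r \tanh(r)$

Myriad
   $W(r) = \frac{\delta^2}{\delta^2 + r^2}$
   $L(r) = \log(\delta^2 + r^2)$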
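The four weight functions of Table 3.1 can be written compactly as anonymous functions. The sketch below is plain MATLAB for illustration and does not call the toolbox; the Hampel constants b1 = 2.5 and b2 = 3 come from the text, while the Huber threshold beta_h and the myriad parameter delta are hypothetical values chosen here (the software tunes them automatically).

```matlab
% Weight functions W(r) of Table 3.1 (illustrative sketch).
beta_h = 1.345;        % hypothetical Huber threshold
b1 = 2.5; b2 = 3;      % Hampel parameters as stated in the text
delta = 1;             % hypothetical myriad parameter

% Huber: 1 inside the threshold, beta/|r| outside.
huber = @(r) (abs(r) < beta_h) + (abs(r) >= beta_h) .* beta_h ./ max(abs(r), beta_h);

% Hampel: 1, then a linear decrease between b1 and b2, then 0.
hampel = @(r) (abs(r) < b1) + (abs(r) >= b1 & abs(r) <= b2) .* (b2 - abs(r)) / (b2 - b1);

% Logistic: tanh(r)/r, with the limit value 1 substituted at r = 0.
logistic = @(r) (r ~= 0) .* tanh(r + (r == 0)) ./ (r + (r == 0)) + (r == 0);

% Myriad: delta^2 / (delta^2 + r^2).
myriad = @(r) delta^2 ./ (delta^2 + r.^2);

r = [0 1 2.75 10];
W = [huber(r); hampel(r); logistic(r); myriad(r)];   % one row per weight function
```

All four functions return 1 at r = 0 and decrease for large residuals, so that outliers receive little or no weight in the reweighted training.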


Multiple output regression

In the case of multiple output data one can treat the different outputs separately. One can also let the toolbox do this by passing the right arguments. This case illustrates how to handle multiple outputs:

>> % load data in X, Xt and Y
>> % where size Y is N x 3
>>
>> gam = 1;
>> sig2 = 1;
>> [alpha,b] = trainlssvm({X,Y,'function estimation',gam,sig2});
>> Yhs = simlssvm({X,Y,'function estimation',gam,sig2},{alpha,b},Xt);

Using different kernel parameters per output dimension:

>> gam = 1;
>> sigs = [1 2 1.5];
>> [alpha,b] = trainlssvm({X,Y,'function estimation',gam,sigs});
>> Yhs = simlssvm({X,Y,'function estimation',gam,sigs},{alpha,b},Xt);

Tuning can be done per output dimension:

>> % tune the different parameters
>> [gam,sigs] = tunelssvm({X,Y,'function estimation',[],[],'RBF_kernel'},'simplex',...
                          'crossvalidatelssvm',{10,'mse'});




A time-series example: Santa Fe laser data prediction

Using the static regression technique, a nonlinear feedforward prediction model can be built. The NARX model takes the past measurements as input to the model.

>> % load time-series in X and Xt
>> lag = 50;
>> Xu = windowize(X,1:lag+1);
>> Xtra = Xu(1:end-lag,1:lag); %training set
>> Ytra = Xu(1:end-lag,end);   %training set
>> Xs=X(end-lag+1:end,1);      %starting point for iterative prediction
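The windowing step rearranges a time series into rows of consecutive values, so that the first columns serve as inputs and the last column as the target. The following plain MATLAB sketch shows this construction on a hypothetical short series; it does not call the toolbox function windowize, whose treatment of the series boundaries may differ.

```matlab
% Illustrative Hankel-type windowing of a time series.
x = (1:8)';                  % hypothetical short time series
lag = 3;
nrows = numel(x) - lag;      % number of complete windows of length lag+1

Xu = zeros(nrows, lag+1);
for t = 1:nrows
    Xu(t,:) = x(t:t+lag)';   % row t holds x(t), ..., x(t+lag)
end

Xtra = Xu(:, 1:lag);         % past values: model inputs
Ytra = Xu(:, end);           % next value: model target
```

With lag = 3 the first row is [1 2 3 4]: the values 1, 2, 3 are the inputs and 4 is the value to be predicted.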

Cross-validation is based upon feedforward simulation on the validation set using the feedforwardly trained model:

>> [gam,sig2] = tunelssvm({Xtra,Ytra,'f',[],[],'RBF_kernel'},'simplex',...
                          'crossvalidatelssvm',{10,'mae'});

Prediction of the next 100 points is done in a recurrent way:

>> [alpha,b] = trainlssvm({Xtra,Ytra,'f',gam,sig2,'RBF_kernel'});
>> %predict next 100 points
>> prediction = predict({Xtra,Ytra,'f',gam,sig2,'RBF_kernel'},Xs,100);
>> plot([prediction Xt]);

In Figure 3.13 results are shown for the Santa Fe laser data.
[Figure 3.13 plot: iterative prediction and Santa Fe laser data against the discrete time index.]



Figure 3.13: The solid line denotes the Santa Fe chaotic laser data. The dashed line shows the iterative prediction using LS-SVM with the RBF kernel with optimal hyper-parameters obtained by tuning.
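Prediction "in a recurrent way" means that each predicted value is fed back as an input for the next step. The sketch below illustrates this loop in plain MATLAB. The one-step model onestep is a hypothetical linear stand-in for the trained LS-SVM, chosen so that the example is self-contained; it is not the toolbox function predict.

```matlab
% Recurrent (iterative) prediction with a generic one-step-ahead model.
% Assumption: 'onestep' replaces the trained LS-SVM by a simple linear map.
onestep = @(w) 0.5*w(end) + 0.25*w(end-1);

lag = 2;
Xs = [1; 2];                 % last 'lag' observed values (starting point)
nsteps = 5;

window = Xs(end-lag+1:end);
prediction = zeros(nsteps,1);
for k = 1:nsteps
    prediction(k) = onestep(window);          % predict the next value
    window = [window(2:end); prediction(k)];  % feed the prediction back in
end
```

Because predictions are reused as inputs, errors can accumulate over the prediction horizon, which is why the quality of the one-step model matters.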




Fixed size LS-SVM

The fixed size LS-SVM is based on two ideas (see also Section 2.4): the first is to exploit the primal-dual formulations of the LS-SVM in view of a Nyström approximation (Figure 3.14). The second one is to do active support vector selection (here based on entropy criteria).

Figure 3.14: Fixed Size LS-SVM is a method for solving large scale regression and classification problems. The number of support vectors is fixed beforehand and the support vectors are selected from a pool of training data. After estimating eigenfunctions in relation to a Nyström approximation with selection of the support vectors according to an entropy criterion, the LS-SVM model is estimated in the primal space.

The first step is implemented as follows:

>> % X,Y contains the dataset, svX is a subset of X
>> sig2 = 1;
>> features = AFEm(svX,'RBF_kernel',sig2, X);
>> [Cl3, gam_optimal] = bay_rr(features,Y,1,3);
>> [W,b] = ridgeregress(features, Y, gam_optimal);
>> Yh = features*W+b;
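The feature extraction step can be sketched in plain MATLAB as follows. This is an illustration of the Nyström idea under stated assumptions, not the code of AFEm: the kernel form exp(-||x-z||^2/sig2), the one-dimensional data and the chosen subset are hypothetical, and the toolbox may scale the features differently.

```matlab
% Sketch of Nystrom feature extraction from a subset of support vectors.
X    = linspace(-3,3,200)';      % full data set (hypothetical)
idx  = 1:20:200;                 % hypothetical choice of 10 support vectors
svX  = X(idx);
sig2 = 1;

% RBF kernel between two column vectors of one-dimensional points.
rbf = @(A,B) exp(-(repmat(A,1,numel(B)) - repmat(B',numel(A),1)).^2 / sig2);

Kmm = rbf(svX, svX);             % kernel matrix on the subset
[U, L] = eig((Kmm + Kmm')/2);    % symmetric eigendecomposition
lam = diag(L);

keep = lam > 1e-10 * max(lam);   % drop numerically zero eigenvalues
U = U(:,keep); lam = lam(keep);

% Explicit feature map: phi(x) = diag(lam)^(-1/2) * U' * k(svX, x).
features = rbf(X, svX) * U * diag(1 ./ sqrt(lam));

% On the subset itself the features reproduce the kernel matrix.
G = features(idx,:) * features(idx,:)';
```

The matrix features has one row per data point and can be used directly in a linear ridge regression in the primal space.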

Optimal values for the kernel parameters and the capacity of the fixed size LS-SVM can be obtained using a simple Monte Carlo experiment. For different kernel parameters and capacities (number of chosen support vectors), the performance on random subsets of support vectors is evaluated. The means of the performances are minimized by an exhaustive search (Figure 3.15b):

>> caps = [10 20 50 100 200];
>> sig2s = [.1 .2 .5 1 2 4 10];
>> nb = 10;
>> for i=1:length(caps),
     for j=1:length(sig2s),
       for t = 1:nb,
         sel = randperm(size(X,1));
         svX = X(sel(1:caps(i)),:);
         features = AFEm(svX,'RBF_kernel',sig2s(j), X);
         [Cl3, gam_opt] = bay_rr(features,Y,1,3);
         [W,b] = ridgeregress(features, Y, gam_opt);
         Yh = features*W+b;
         performances(t) = mse(Y - Yh);
       end
       minimal_performances(i,j) = mean(performances);
     end
   end

The kernel parameter and capacity corresponding to a good performance are searched:

>> [minp,ic] = min(minimal_performances,[],1);
>> [minminp,is] = min(minp);
>> capacity = caps(ic(is));
>> sig2 = sig2s(is);

The following approach optimizes the selection of support vectors according to the quadratic Renyi entropy:

>> % load data X and Y, capacity and the kernel parameter sig2
>> sv = 1:capacity;
>> max_c = -inf;
>> for i=1:size(X,1),
     replace = ceil(rand.*capacity);
     subset = [sv([1:replace-1 replace+1:end]) i];
     crit = kentropy(X(subset,:),'RBF_kernel',sig2);
     if max_c <= crit, max_c = crit; sv = subset; end
   end
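The entropy criterion rewards subsets that are spread out over the input space. A simple kernel-based estimate of the quadratic Renyi entropy, H = -log((1/M^2) sum_ij K(x_i, x_j)), makes this visible. The sketch below is plain MATLAB under stated assumptions: the kernel form exp(-||x-z||^2/sig2) is assumed, normalization constants of the density estimate are omitted, and the toolbox function kentropy may compute the criterion differently.

```matlab
% Kernel-based estimate of the quadratic Renyi entropy of a point set
% (illustrative sketch; normalization constants are omitted).
sig2 = 1;
rbf   = @(A,B) exp(-(repmat(A,1,numel(B)) - repmat(B',numel(A),1)).^2 / sig2);
renyi = @(S) -log(sum(sum(rbf(S,S))) / numel(S)^2);

spread    = linspace(-3, 3, 10)';       % hypothetical well-spread subset
clustered = linspace(-0.1, 0.1, 10)';   % hypothetical tightly clustered subset

H_spread    = renyi(spread);
H_clustered = renyi(clustered);
```

The spread subset yields the larger entropy value, so a loop that keeps a candidate subset whenever the criterion does not decrease drives the support vectors apart.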

This selected subset of support vectors is used to construct the final model (Figure 3.15a):

>> svX = X(sv,:);
>> features = AFEm(svX,'RBF_kernel',sig2, X);
>> [Cl3, gam_optimal] = bay_rr(features,Y,1,3);
>> [W,b, Yh] = ridgeregress(features, Y, gam_optimal);
[Figure 3.15 panels: (a) fixed-size LS-SVM on 20.000 noisy sinc data points, showing training data, support vectors, real function and estimated function, Y against X; (b) estimated cost surface of fixed-size LS-SVM on repeated i.i.d. subsampling, as a function of subset capacity and σ².]




Figure 3.15: Illustration of fixed size LS-SVM on a noisy sinc function with 20.000 data points: (a) fixed size LS-SVM selects a subset of the data after Nyström approximation. The regularization parameter for the regression in the primal space is optimized here using the Bayesian framework; (b) Estimated cost surface of the fixed size LS-SVM based on random subsamples of the data, of different subset capacities and kernel parameters.

The same idea can be used for learning a classifier from a huge data set.

>> % load the input and output of the training data in X and Y
>> cap = 25;

The first step is the same: the selection of the support vectors by optimizing the entropy criterion. Here, the pseudo code is shown. For the working code, one can study the code of demo_fixedclass.m.

% initialise a subset of cap points: Xs
>> for i = 1:1000,
     Xs_old = Xs;
     % substitute a point of Xs by a new one
     crit = kentropy(Xs, kernel, kernel_par);
     % if crit is not larger than in the previous loop,
     % substitute Xs by the old Xs_old
   end

By taking the values -1 and +1 as targets in a linear regression, the Fisher discriminant is obtained:

>> features = AFEm(Xs,kernel, sigma2,X);
>> [w,b] = ridgeregress(features,Y,gamma);

New data points can be simulated as follows:

>> features_t = AFEm(Xs,kernel, sigma2,Xt);
>> Yht = sign(features_t*w+b);

An example of a resulting classifier and the selected support vectors is displayed in Figure 3.16 (see demo_fixedclass).
[Figure 3.16 plot: approximation by fixed size LS-SVM based on maximal entropy (2.3195), showing negative points, positive points and support vectors; X2 against X1.]


Figure 3.16: An example of a binary classifier (Ripley data set) obtained by application of a fixed size LS-SVM (20 support vectors) on a classification task.
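The regression on -1/+1 targets used above can be sketched in plain MATLAB with a ridge regression that includes a bias term. This is a minimal illustration, assuming the cost 0.5*w'*w + (gam/2)*sum((Y - F*w - b).^2); the feature matrix F, the targets and gam are hypothetical, and the toolbox function ridgeregress may use a different formulation.

```matlab
% Ridge regression with bias on -1/+1 targets (illustrative sketch).
F = [-2; -1; 1; 2];           % hypothetical feature matrix (N x d, here d = 1)
Y = [-1; -1; 1; 1];           % class labels used as regression targets
gam = 10;                     % hypothetical regularization parameter
[N, d] = size(F);

% Normal equations of the regularized least squares problem in (w, b).
A = [F'*F + eye(d)/gam, F'*ones(N,1); ones(1,N)*F, N];
sol = A \ [F'*Y; sum(Y)];
w = sol(1:d);
b = sol(end);

Yh = sign(F*w + b);           % predicted class labels
```

For this symmetric toy problem the bias is zero and all four points are classified correctly.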




Unsupervised learning using kernel principal component analysis

A simple example shows the idea of denoising in the input space by means of kernel PCA. The demo can be called by:

>> demo_yinyang

and uses the routine preimage_rbf.m, which is a fixed-point iteration algorithm for computing pre-images in the case of RBF kernels. The pseudo-code is shown as follows:

>> % load training data in Xtrain and test data in Xtest
>> dim = size(Xtrain,2);
>> nb_pcs = 4;
>> factor = 0.25;
>> sig2 = factor*dim*mean(var(Xtrain)); % A rule of thumb for sig2;
>> [lam,U] = kpca(Xtrain,'RBF_kernel',sig2,Xtest,'eigs',nb_pcs);

The whole dataset is denoised by computing approximate pre-images:

>> Xd = preimage_rbf(Xtrain,sig2,U,[Xtrain;Xtest],'d');

Figure 3.17 shows the original dataset in gray (+) and the denoised data in blue (o). Note that the denoised data points preserve the underlying nonlinear structure of the data, which is not the case in linear PCA.
[Figure 3.17 plot: denoising (o) by computing an approximate pre-image; x2 against x1.]

Figure 3.17: Denoised data (o) obtained by reconstructing the data-points (+) using 4 kernel principal components with the RBF kernel.
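The kernel PCA step underlying this demo can be sketched in plain MATLAB: the kernel matrix is centered in feature space, decomposed, and the data are projected onto the leading components. This is an illustration only, not the toolbox routine kpca; the data set and the kernel form exp(-||x-z||^2/sig2) are assumptions, while the rule of thumb for sig2 follows the pseudo-code above.

```matlab
% Sketch of kernel PCA on training data.
t = linspace(0, 2*pi, 40)';
X = [cos(t), sin(t)];                     % hypothetical 2-D data on a circle
N = size(X,1);
sig2 = 0.25 * size(X,2) * mean(var(X));   % rule of thumb from the text

% RBF kernel matrix from pairwise squared distances.
sq = sum(X.^2, 2);
D  = repmat(sq,1,N) + repmat(sq',N,1) - 2*(X*X');
K  = exp(-D / sig2);

% Centering in feature space: Kc = J*K*J with J = I - (1/N)*ones.
J  = eye(N) - ones(N)/N;
Kc = J * K * J;

% Eigendecomposition, sorted by decreasing eigenvalue.
[U, L] = eig((Kc + Kc')/2);
[lam, order] = sort(diag(L), 'descend');
U = U(:, order);

% Projections of the training points onto the first nb_pcs components.
nb_pcs = 4;
scores = Kc * U(:,1:nb_pcs) * diag(1 ./ sqrt(lam(1:nb_pcs)));
```

Denoising then amounts to finding input points whose feature-space images are close to the projections onto these components, which is the task of the pre-image routine.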

Appendix A

MATLAB functions
A.1 General notation

In the full syntax description of the function calls, a star (*) indicates that the argument is optional. In the description of the arguments, a (*) denotes the default value. In this extended help of the function calls of LS-SVMlab, a number of symbols and notations recur in the explanation and the examples. These are defined as follows:

Variables   Explanation
d           Dimension of the input vectors
empty       Empty matrix ([])
m           Dimension of the output vectors
N           Number of training data
Nt          Number of test data
nb          Number of eigenvalues/eigenvectors used in the eigenvalue decomposition approximation
X           N×d matrix with the inputs of the training data
Xt          Nt×d matrix with the inputs of the test data
Y           N×m matrix with the outputs of the training data
Yt          Nt×m matrix with the outputs of the test data
Zt          Nt×m matrix with the predicted latent variables of a classifier

This toolbox supports a classical functional interface as well as an object oriented interface. The latter has a few dedicated structures which will appear many times:

Structures  Explanation
bay         Object oriented representation of the results of the Bayesian inference
model       Object oriented representation of the LS-SVM model





Index of function calls

Training and simulation
Function Call   Short Explanation                                                                      Reference
latentlssvm     Calculate the latent variables of the LS-SVM classifier                                A.3.20
plotlssvm       Plot the LS-SVM results in the environment of the training data                        A.3.25
simlssvm        Evaluate the LS-SVM at the given points                                                A.3.34
trainlssvm      Find the support values and the bias term of a Least Squares Support Vector Machine    A.3.35
lssvm           One line LS-SVM                                                                        A.3.24
cilssvm         Pointwise or simultaneous confidence intervals                                         A.3.9
predlssvm       Pointwise or simultaneous prediction intervals                                         A.3.27




Object oriented interface

This toolbox supports a classical functional interface as well as an object oriented interface. The latter has a few dedicated functions. This interface is recommended for the more experienced user.

Function Call   Short Explanation                                                             Reference
changelssvm     Change properties of an LS-SVM object
demomodel       Demo introducing the use of the compact calls based on the model structure
initlssvm       Initiate the LS-SVM object before training                                    A.3.16





Training and simulating functions

Function Call    Short Explanation                    Reference
lssvmMATLAB.m    MATLAB implementation of training
prelssvm         Internally called preprocessor       A.3.29
postlssvm        Internally called postprocessor      A.3.29




Kernel functions
Function Call   Short Explanation                                            Reference
lin_kernel      Linear kernel for MATLAB implementation                      A.3.22
poly_kernel     Polynomial kernel for MATLAB implementation                  A.3.22
RBF_kernel      Radial Basis Function kernel for MATLAB implementation       A.3.22
MLP_kernel      Multilayer Perceptron kernel for MATLAB implementation       A.3.22




Tuning, sparseness and robustness

Function Call     Short Explanation                                                                         Reference
crossvalidate     Estimate the model performance with L-fold crossvalidation                                A.3.11
gcrossvalidate    Estimate the model performance with generalized crossvalidation                           A.3.15
rcrossvalidate    Estimate the model performance with robust L-fold crossvalidation                         A.3.30
gridsearch        A two-dimensional minimization procedure based on exhaustive search in a limited range    A.3.36
leaveoneout       Estimate the model performance with leave-one-out crossvalidation                         A.3.21
mae, medae        L1 cost measures of the residuals                                                         A.3.23
linf, misclass    L∞ and L0 cost measures of the residuals                                                  A.3.23
mse               L2 cost measures of the residuals                                                         A.3.23
tunelssvm         Tune the tuning parameters of the model with respect to the given performance measure    A.3.36
robustlssvm       Robust training in the case of non-Gaussian noise or outliers                             A.3.32




Classication extensions
Function Call       Short Explanation                                                                      Reference
code                Encode and decode a multi-class classification task to multiple binary classifiers     A.3.10
code_ECOC           Error correcting output coding                                                         A.3.10
code_MOC            Minimum Output Coding                                                                  A.3.10
code_OneVsAll       One versus All encoding                                                                A.3.10
code_OneVsOne       One versus One encoding                                                                A.3.10
codedist_hamming    Hamming distance measure between two encoded class labels                              A.3.10
codelssvm           Encoding the LS-SVM model                                                              A.3.10
deltablssvm         Bias term correction for the LS-SVM classifier                                         A.3.12
roc                 Receiver Operating Characteristic curve of a binary classifier                         A.3.33




Bayesian framework
Function Call      Short Explanation                                                                                            Reference
bay_errorbar       Compute the error bars for a one dimensional regression problem                                              A.3.2
bay_initlssvm      Initialize the tuning parameters for Bayesian inference                                                      A.3.3
bay_lssvm          Compute the posterior cost for the different levels in Bayesian inference                                    A.3.4
bay_lssvmARD       Automatic Relevance Determination of the inputs of the LS-SVM                                                A.3.5
bay_modoutClass    Estimate the posterior class probabilities of a binary classifier using Bayesian inference                   A.3.6
bay_optimize       Optimize model- or tuning parameters with respect to the different inference levels                          A.3.7
bay_rr             Bayesian inference for linear ridge regression                                                               A.3.8
eign               Find the principal eigenvalues and eigenvectors of a matrix with Nyström's low rank approximation method     A.3.14
kernel_matrix      Construct the positive (semi-) definite kernel matrix                                                        A.3.18
kpca               Kernel Principal Component Analysis                                                                          A.3.19
ridgeregress       Linear ridge regression                                                                                      A.3.31




NARX models and prediction

Function Call     Short Explanation                                                                                              Reference
predict           Iterative prediction of a trained LS-SVM NARX model (in recurrent mode)                                        A.3.26
windowize         Rearrange the data points into a Hankel matrix for (N)AR time-series modeling                                  A.3.37
windowize_NARX    Rearrange the input and output data into a (block) Hankel matrix for (N)AR(X) time-series modeling             A.3.37




Unsupervised learning
Function Call    Short Explanation                                                          Reference
AFEm             Automatic Feature Extraction from Nyström method                           A.3.1
denoise_kpca     Reconstruct the data mapped on the principal components                    A.3.13
kentropy         Quadratic Renyi Entropy for a kernel based estimator                       A.3.17
kpca             Compute the nonlinear kernel principal components of the data              A.3.19
preimage_rbf     Compute an approximate pre-image in the input space (for RBF kernels)      A.3.28




Fixed size LS-SVM

The idea of fixed size LS-SVM is still under development. However, in order to enable the user to explore this technique a number of related functions are included in the toolbox. A demo illustrates how to combine these in order to build a fixed size LS-SVM.

Function Call      Short Explanation                                                            Reference
AFEm               Automatic Feature Extraction from Nyström method                             A.3.1
bay_rr             Bayesian inference of the cost on the 3 levels of linear ridge regression    A.3.8
demo_fixedsize     Demo illustrating the use of fixed size LS-SVMs for regression
demo_fixedclass    Demo illustrating the use of fixed size LS-SVMs for classification
kentropy           Quadratic Renyi Entropy for a kernel based estimator                         A.3.17
ridgeregress       Linear ridge regression                                                      A.3.31




Name of the demo    Short Explanation
demofun             Simple demo illustrating the use of LS-SVMlab for regression
demo_fixedsize      Demo illustrating the use of fixed size LS-SVMs for regression
democlass           Simple demo illustrating the use of LS-SVMlab for classification
demo_fixedclass     Demo illustrating the use of fixed size LS-SVMs for classification
demomodel           Simple demo illustrating the use of the object oriented interface of LS-SVMlab
demo_yinyang        Demo illustrating the possibilities of unsupervised learning by kernel PCA
democonfint         Demo illustrating the construction of confidence intervals for LS-SVMs (regression)




Alphabetical list of function calls


AFEm

Purpose
Automatic Feature Extraction by Nyström method

Basic syntax
>> features = AFEm(X, kernel, sig2, Xt)

Description
Using the Nyström approximation method, the mapping of data to the feature space can be evaluated explicitly. This gives features that one can use for a parametric regression or classification in the primal space. The decomposition of the mapping to the feature space relies on the eigenvalue decomposition of the kernel matrix. The Matlab ('eigs') or Nyström's ('eign') approximation using the nb most important eigenvectors/eigenvalues can be used. The eigenvalue decomposition is not re-calculated if it is passed as an extra argument.

Full syntax
>> [features, U, lam] = AFEm(X, kernel, sig2, Xt)
>> [features, U, lam] = AFEm(X, kernel, sig2, Xt, etype)
>> [features, U, lam] = AFEm(X, kernel, sig2, Xt, etype, nb)
>> features = AFEm(X, kernel, sig2, Xt, [], [], U, lam)

Outputs
  features    Nt×nb matrix with extracted features
  U(*)        N×nb matrix with eigenvectors
  lam(*)      nb×1 vector with eigenvalues
Inputs
  X           N×d matrix with input data
  kernel      Name of the used kernel (e.g. 'RBF_kernel')
  sig2        Kernel parameter(s) (for linear kernel, use [])
  Xt          Nt×d data from which the features are extracted
  etype(*)    'eig'(*), 'eigs' or 'eign'
  nb(*)       Number of eigenvalues/eigenvectors used in the eigenvalue decomposition approximation
  U(*)        N×nb matrix with eigenvectors
  lam(*)      nb×1 vector with eigenvalues

See also: kernel_matrix, RBF_kernel, demo_fixedsize
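The explicit feature map described for AFEm can be sketched in plain MATLAB/Octave without the toolbox. This is only an illustration of the Nyström idea, not the AFEm source: the data, the values of sig2 and nb, the helper names sqdist and rbf, and the kernel scaling exp(-||x-z||^2/sig2) are all assumptions made for the example.

```matlab
% Illustrative Nystrom feature map with an RBF kernel (plain MATLAB/Octave,
% not the AFEm source). Data, sig2, nb and the kernel scaling are assumptions.
X    = [linspace(-1, 1, 8)', sin(linspace(0, 3, 8))'];   % N x d training inputs
Xt   = [0 0; 0.5 0.2; -0.3 0.9];                         % Nt x d points to map
sig2 = 0.5;                                              % kernel bandwidth
nb   = 3;                                                % number of components kept

% Squared Euclidean distances between the rows of A and the rows of B
sqdist = @(A, B) bsxfun(@plus, sum(A.^2, 2), sum(B.^2, 2)') - 2 * (A * B');
rbf    = @(A, B) exp(-sqdist(A, B) / sig2);              % assumed RBF scaling

K = rbf(X, X);
K = (K + K') / 2;                                        % enforce exact symmetry
[Uall, D]  = eig(K);
[lam, idx] = sort(diag(D), 'descend');
U   = Uall(:, idx(1:nb));                                % N x nb eigenvectors
lam = lam(1:nb);                                         % nb x 1 eigenvalues

% Feature i of a point x: (1/sqrt(lam_i)) * sum_k U(k,i) * K(x, x_k)
features      = rbf(Xt, X) * U * diag(1 ./ sqrt(lam));   % Nt x nb
featuresTrain = K * U * diag(1 ./ sqrt(lam));            % N x nb
```

On the training points themselves the features reduce to U*diag(sqrt(lam)), so featuresTrain*featuresTrain' is the rank-nb approximation of the kernel matrix. The resulting columns can then be used as regressors in a linear (primal) model.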




bay_errorbar

Purpose
Compute the error bars for a one dimensional regression problem

Basic syntax
>> sig_e = bay_errorbar({X,Y,'function',gam,sig2}, Xt)
>> sig_e = bay_errorbar(model, Xt)

Description
The computation takes into account the estimated noise variance and the uncertainty of the model parameters, estimated by Bayesian inference. sig_e is the estimated standard deviation of the error bars of the points Xt. A plot is obtained by replacing Xt by the string 'figure'.

Full syntax
Using the functional interface:
>> sig_e = bay_errorbar({X,Y,'function',gam,sig2,kernel,preprocess}, Xt)
>> sig_e = bay_errorbar({X,Y,'function',gam,sig2,kernel,preprocess}, Xt, etype)
>> sig_e = bay_errorbar({X,Y,'function',gam,sig2,kernel,preprocess}, Xt, etype, nb)
>> sig_e = bay_errorbar({X,Y,'function',gam,sig2,kernel,preprocess}, 'figure')
>> sig_e = bay_errorbar({X,Y,'function',gam,sig2,kernel,preprocess}, 'figure', etype, nb)

Outputs
  sig_e           Nt×1 vector with the σ² error bars of the test data
Inputs
  X               N×d matrix with the inputs of the training data
  Y               N×1 vector with the outputs of the training data
  type            'function estimation' ('f')
  gam             Regularization parameter
  sig2            Kernel parameter
  kernel(*)       Kernel type (by default 'RBF_kernel')
  preprocess(*)   'preprocess'(*) or 'original'
  Xt              Nt×d matrix with the inputs of the test data
  etype(*)        'svd'(*), 'eig', 'eigs' or 'eign'
  nb(*)           Number of eigenvalues/eigenvectors used in the eigenvalue decomposition approximation

Using the object oriented interface:
>> [sig_e, bay, model] = bay_errorbar(model, Xt)
>> [sig_e, bay, model] = bay_errorbar(model, Xt, etype)
>> [sig_e, bay, model] = bay_errorbar(model, Xt, etype, nb)
>> [sig_e, bay, model] = bay_errorbar(model, 'figure')
>> [sig_e, bay, model] = bay_errorbar(model, 'figure', etype)
>> [sig_e, bay, model] = bay_errorbar(model, 'figure', etype, nb)

Outputs
  sig_e       Nt×1 vector with the σ² error bars of the test data
  model(*)    Object oriented representation of the LS-SVM model
  bay(*)      Object oriented representation of the results of the Bayesian inference
Inputs
  model       Object oriented representation of the LS-SVM model
  Xt          Nt×d matrix with the inputs of the test data
  etype(*)    'svd'(*), 'eig', 'eigs' or 'eign'
  nb(*)       Number of eigenvalues/eigenvectors used in the eigenvalue decomposition approximation

See also: bay_lssvm, bay_optimize, bay_modoutClass, plotlssvm




bay_initlssvm

Purpose
Initialize the tuning parameters γ and σ² before optimization with bay_optimize

Basic syntax
>> [gam, sig2] = bay_initlssvm({X,Y,type,[],[]})
>> model = bay_initlssvm(model)

Description
A starting value for σ² is only given if the model has kernel type 'RBF_kernel'.

Full syntax
Using the functional interface:
>> [gam, sig2] = bay_initlssvm({X,Y,type,[],[],kernel})

Outputs
  gam          Proposed initial regularization parameter
  sig2         Proposed initial 'RBF_kernel' parameter
Inputs
  X            N×d matrix with the inputs of the training data
  Y            N×1 vector with the outputs of the training data
  type         'function estimation' ('f') or 'classifier' ('c')
  kernel(*)    Kernel type (by default 'RBF_kernel')

Using the object oriented interface:
>> model = bay_initlssvm(model)

Outputs
  model    Object oriented representation of the LS-SVM model with initial tuning parameters
Inputs
  model    Object oriented representation of the LS-SVM model

See also: bay_lssvm, bay_optimize




bay_lssvm

Purpose
Compute the posterior cost for the 3 levels in Bayesian inference

Basic syntax
>> cost = bay_lssvm({X,Y,type,gam,sig2}, level, etype)
>> cost = bay_lssvm(model, level, etype)

Description
Estimate the posterior probabilities of model tuning parameters on the different inference levels. By taking the negative logarithm of the posterior and neglecting all constants, one obtains the corresponding cost. Computation is only feasible for one dimensional output regression and binary classification problems. Each level has its own input and output syntax:

- First level: The cost associated with the posterior of the model parameters (support values and bias term) is determined. The etype can be:
    - 'train': do a training of the support values using trainlssvm. The total cost, the cost of the residuals (Ed) and the regularization parameter (Ew) are determined by the solution of the support values
    - 'retrain': do a retraining of the support values using trainlssvm
    - the cost terms can also be calculated from an (approximate) eigenvalue decomposition of the kernel matrix: 'svd', 'eig', 'eigs' or Nyström's 'eign'
- Second level: The cost associated with the posterior of the regularization parameter is computed. The etype can be 'svd', 'eig', 'eigs' or Nyström's 'eign'.
- Third level: The cost associated with the posterior of the chosen kernel and kernel parameters is computed. The etype can be 'svd', 'eig', 'eigs' or Nyström's 'eign'.

Full syntax
Outputs on the first level
>> [costL1,Ed,Ew,bay] = bay_lssvm({X,Y,type,gam,sig2,kernel,preprocess}, 1)
>> [costL1,Ed,Ew,bay] = bay_lssvm({X,Y,type,gam,sig2,kernel,preprocess}, 1, etype)
>> [costL1,Ed,Ew,bay] = bay_lssvm({X,Y,type,gam,sig2,kernel,preprocess}, 1, etype, nb)
>> [costL1,Ed,Ew,bay] = bay_lssvm(model, 1)
>> [costL1,Ed,Ew,bay] = bay_lssvm(model, 1, etype)
>> [costL1,Ed,Ew,bay] = bay_lssvm(model, 1, etype, nb)

With
  costL1    Cost proportional to the posterior
  Ed(*)     Cost of the training error term
  Ew(*)     Cost of the regularization parameter
  bay(*)    Object oriented representation of the results of the Bayesian inference

Outputs on the second level
>> [costL2,DcostL2, optimal_cost, bay] = ...
       bay_lssvm({X,Y,type,gam,sig2,kernel,preprocess}, 2, etype, nb)
>> [costL2,DcostL2, optimal_cost, bay] = bay_lssvm(model, 2, etype, nb)

With
  costL2             Cost proportional to the posterior on the second level
  DcostL2(*)         Derivative of the cost
  optimal_cost(*)    Optimality of the regularization parameter (optimal = 0)
  bay(*)             Object oriented representation of the results of the Bayesian inference

Outputs on the third level
>> [costL3,bay] = bay_lssvm({X,Y,type,gam,sig2,kernel,preprocess}, 3, etype, nb)
>> [costL3,bay] = bay_lssvm(model, 3, etype, nb)

With
  costL3    Cost proportional to the posterior on the third level
  bay(*)    Object oriented representation of the results of the Bayesian inference

Inputs using the functional interface
>> bay_lssvm({X,Y,type,gam,sig2,kernel,preprocess}, level, etype, nb)

  X                N×d matrix with the inputs of the training data
  Y                N×1 vector with the outputs of the training data
  type             'function estimation' ('f') or 'classifier' ('c')
  gam              Regularization parameter
  sig2             Kernel parameter(s) (for linear kernel, use [])
  kernel(*)        Kernel type (by default 'RBF_kernel')
  preprocess(*)    'preprocess'(*) or 'original'
  level            1, 2, 3
  etype(*)         'svd'(*), 'eig', 'eigs', 'eign'
  nb(*)            Number of eigenvalues/eigenvectors used in the eigenvalue decomposition approximation

Inputs using the object oriented interface
>> bay_lssvm(model, level, etype, nb)

  model       Object oriented representation of the LS-SVM model
  level       1, 2, 3
  etype(*)    'svd'(*), 'eig', 'eigs', 'eign'
  nb(*)       Number of eigenvalues/eigenvectors used in the eigenvalue decomposition approximation

See also: bay_lssvmARD, bay_optimize, bay_modoutClass, bay_errorbar




bay_lssvmARD

Purpose
Bayesian Automatic Relevance Determination of the inputs of an LS-SVM

Basic syntax
>> dimensions = bay_lssvmARD({X,Y,type,gam,sig2})
>> dimensions = bay_lssvmARD(model)

Description
For a given problem, one can determine the most relevant inputs for the LS-SVM within the Bayesian evidence framework. To do so, one assigns a different weighting parameter to each dimension in the kernel and optimizes this using the third level of inference. According to the used kernel, one can remove inputs based on the larger or smaller kernel parameters. This routine only works with the 'RBF_kernel' with a sig2 per input. In each step, the input with the largest optimal sig2 is removed (backward selection). For every step, the generalization performance is approximated by the cost associated with the third level of Bayesian inference.

The ARD is based on backward selection of the inputs based on the sig2s corresponding in each step with a minimal cost criterion. Minimizing this criterion can be done by the 'continuous' or by the 'discrete' method. The former uses in each step continuous varying kernel parameter optimization, the latter decides which one to remove in each step by binary variables for each component (this can only be applied for rather low dimensional inputs, as the number of possible combinations grows exponentially with the number of inputs). If working with the 'RBF_kernel', the kernel parameter is rescaled appropriately after removing an input variable.

The computation of the Bayesian cost criterion can be based on the singular value decomposition 'svd' of the full kernel matrix or on an approximation of these eigenvalues and vectors by the 'eigs' or 'eign' approximation based on nb data points.

Full syntax
Using the functional interface:
>> [dimensions, ordered, costs, sig2s] = ...
       bay_lssvmARD({X,Y,type,gam,sig2,kernel,preprocess}, method, etype, nb)

Outputs
  dimensions       r×1 vector of the relevant inputs
  ordered(*)       d×1 vector with inputs in decreasing order of relevance
  costs(*)         Costs associated with third level of inference in every selection step
  sig2s(*)         Optimal kernel parameters in each selection step
Inputs
  X                N×d matrix with the inputs of the training data
  Y                N×1 vector with the outputs of the training data
  type             'function estimation' ('f') or 'classifier' ('c')
  gam              Regularization parameter
  sig2             Kernel parameter(s) (for linear kernel, use [])
  kernel(*)        Kernel type (by default 'RBF_kernel')
  preprocess(*)    'preprocess'(*) or 'original'
  method(*)        'discrete'(*) or 'continuous'
  etype(*)         'svd'(*), 'eig', 'eigs', 'eign'
  nb(*)            Number of eigenvalues/eigenvectors used in the eigenvalue decomposition approximation

Using the object oriented interface:
>> [dimensions, ordered, costs, sig2s, model] = bay_lssvmARD(model, method, etype, nb)

Outputs
  dimensions    r×1 vector of the relevant inputs
  ordered(*)    d×1 vector with inputs in decreasing order of relevance
  costs(*)      Costs associated with third level of inference in every selection step
  sig2s(*)      Optimal kernel parameters in each selection step
  model(*)      Object oriented representation of the LS-SVM model trained only on the relevant inputs
Inputs
  model         Object oriented representation of the LS-SVM model
  method(*)     'discrete'(*) or 'continuous'
  etype(*)      'svd'(*), 'eig', 'eigs', 'eign'
  nb(*)         Number of eigenvalues/eigenvectors used in the eigenvalue decomposition approximation

See also: bay_lssvm, bay_optimize, bay_modoutClass, bay_errorbar




bay_modoutClass

Purpose
Estimate the posterior class probabilities of a binary classifier using Bayesian inference

Basic syntax
>> [Ppos, Pneg] = bay_modoutClass({X,Y,'classifier',gam,sig2}, Xt)
>> [Ppos, Pneg] = bay_modoutClass(model, Xt)

Description
Calculate the probability that a point will belong to the positive or negative classes, taking into account the uncertainty of the parameters. Optionally, one can express prior knowledge as a probability between 0 and 1, where prior equal to 2/3 means that the prior positive class probability is 2/3 (more likely to occur than the negative class). For binary classification tasks with a two dimensional input space, one can make a surface plot by replacing Xt by the string 'figure'.

Full syntax
Using the functional interface:
>> [Ppos, Pneg] = bay_modoutClass({X,Y,'classifier',...
                       gam,sig2,kernel,preprocess}, Xt)
>> [Ppos, Pneg] = bay_modoutClass({X,Y,'classifier',...
                       gam,sig2,kernel,preprocess}, Xt, prior)
>> [Ppos, Pneg] = bay_modoutClass({X,Y,'classifier',...
                       gam,sig2,kernel,preprocess}, Xt, prior, etype)
>> [Ppos, Pneg] = bay_modoutClass({X,Y,'classifier',...
                       gam,sig2,kernel,preprocess}, Xt, prior, etype, nb)
>> bay_modoutClass({X,Y,'classifier',...
                       gam,sig2,kernel,preprocess}, 'figure')
>> bay_modoutClass({X,Y,'classifier',...
                       gam,sig2,kernel,preprocess}, 'figure', prior)
>> bay_modoutClass({X,Y,'classifier',...
                       gam,sig2,kernel,preprocess}, 'figure', prior, etype)
>> bay_modoutClass({X,Y,'classifier',...
                       gam,sig2,kernel,preprocess}, 'figure', prior, etype, nb)

Outputs
  Ppos             Nt×1 vector with probabilities that testdata Xt belong to the positive class
  Pneg             Nt×1 vector with probabilities that testdata Xt belong to the negative (zero) class
Inputs
  X                N×d matrix with the inputs of the training data
  Y                N×1 vector with the outputs of the training data
  type             'function estimation' ('f') or 'classifier' ('c')
  gam              Regularization parameter
  sig2             Kernel parameter(s) (for linear kernel, use [])
  kernel(*)        Kernel type (by default 'RBF_kernel')
  preprocess(*)    'preprocess'(*) or 'original'
  Xt(*)            Nt×d matrix with the inputs of the test data
  prior(*)         Prior knowledge of the balancing of the training data (or [])
  etype(*)         'svd'(*), 'eig', 'eigs' or 'eign'
  nb(*)            Number of eigenvalues/eigenvectors used in the eigenvalue decomposition approximation

Using the object oriented interface:
>> [Ppos, Pneg, bay, model] = bay_modoutClass(model, Xt)
>> [Ppos, Pneg, bay, model] = bay_modoutClass(model, Xt, prior)
>> [Ppos, Pneg, bay, model] = bay_modoutClass(model, Xt, prior, etype)
>> [Ppos, Pneg, bay, model] = bay_modoutClass(model, Xt, prior, etype, nb)
>> bay_modoutClass(model, 'figure')
>> bay_modoutClass(model, 'figure', prior)
>> bay_modoutClass(model, 'figure', prior, etype)
>> bay_modoutClass(model, 'figure', prior, etype, nb)

Outputs
  Ppos        Nt×1 vector with probabilities that testdata Xt belong to the positive class
  Pneg        Nt×1 vector with probabilities that testdata Xt belong to the negative (zero) class
  bay(*)      Object oriented representation of the results of the Bayesian inference
  model(*)    Object oriented representation of the LS-SVM model
Inputs
  model       Object oriented representation of the LS-SVM model
  Xt(*)       Nt×d matrix with the inputs of the test data
  prior(*)    Prior knowledge of the balancing of the training data (or [])
  etype(*)    'svd'(*), 'eig', 'eigs' or 'eign'
  nb(*)       Number of eigenvalues/eigenvectors used in the eigenvalue decomposition approximation

See also: bay_lssvm, bay_optimize, bay_errorbar, ROC




bay_optimize

Purpose
Optimize the posterior probabilities of model (hyper-) parameters with respect to the different levels in Bayesian inference

Basic syntax
One can optimize on the three different inference levels as described in section 2.1.3.

- First level: In the first level one optimizes the support values α and the bias b.
- Second level: In the second level one optimizes the regularization parameter gam.
- Third level: In the third level one optimizes the kernel parameter. In the case of the common 'RBF_kernel' the kernel parameter is the bandwidth sig2.

This routine is only tested with Matlab R2008a, R2008b, R2009a, R2009b and R2010a using the corresponding optimization toolbox.

Full syntax
Outputs on the first level:
>> [model, alpha, b] = bay_optimize({X,Y,type,gam,sig2,kernel,preprocess}, 1)
>> [model, alpha, b] = bay_optimize(model, 1)

With
  model       Object oriented representation of the LS-SVM model optimized on the first level of inference
  alpha(*)    Support values optimized on the first level of inference
  b(*)        Bias term optimized on the first level of inference

Outputs on the second level:
>> [model, gam] = bay_optimize({X,Y,type,gam,sig2,kernel,preprocess}, 2)
>> [model, gam] = bay_optimize(model, 2)

With
  model       Object oriented representation of the LS-SVM model optimized on the second level of inference
  gam(*)      Regularization parameter optimized on the second level of inference

Outputs on the third level:
>> [model, sig2] = bay_optimize({X,Y,type,gam,sig2,kernel,preprocess}, 3)
>> [model, sig2] = bay_optimize(model, 3)

With
  model       Object oriented representation of the LS-SVM model optimized on the third level of inference
  sig2(*)     Kernel parameter optimized on the third level of inference

Inputs using the functional interface
>> model = bay_optimize({X,Y,type,gam,sig2,kernel,preprocess}, level)
>> model = bay_optimize({X,Y,type,gam,sig2,kernel,preprocess}, level, etype)
>> model = bay_optimize({X,Y,type,gam,sig2,kernel,preprocess}, level, etype, nb)

  X                N×d matrix with the inputs of the training data
  Y                N×1 vector with the outputs of the training data
  type             'function estimation' ('f') or 'classifier' ('c')
  gam              Regularization parameter
  sig2             Kernel parameter(s) (for linear kernel, use [])
  kernel(*)        Kernel type (by default 'RBF_kernel')
  preprocess(*)    'preprocess'(*) or 'original'
  level            1, 2, 3
  etype(*)         'eig', 'svd'(*), 'eigs', 'eign'
  nb(*)            Number of eigenvalues/eigenvectors used in the eigenvalue decomposition approximation

Inputs using the object oriented interface
>> model = bay_optimize(model, level)
>> model = bay_optimize(model, level, etype)
>> model = bay_optimize(model, level, etype, nb)

  model       Object oriented representation of the LS-SVM model
  level       1, 2, 3
  etype(*)    'eig', 'svd'(*), 'eigs', 'eign'
  nb(*)       Number of eigenvalues/eigenvectors used in the eigenvalue decomposition approximation

See also: bay_lssvm, bay_lssvmARD, bay_modoutClass, bay_errorbar




bay_rr

Purpose
Bayesian inference of the cost on the three levels of linear ridge regression

Basic syntax
>> cost = bay_rr(X, Y, gam, level)

Description
This function implements the cost functions related to the Bayesian framework of linear ridge regression [44]. Optimizing these criteria results in optimal model parameters w, b and tuning parameters. The criterion can also be used for model comparison. The obtained model parameters w and b are optimal on the first level for J = 0.5*w'*w + gam*0.5*sum((Y-X*w-b).^2).

Full syntax
Outputs on the first level: Cost proportional to the posterior of the model parameters.
>> [costL1, Ed, Ew] = bay_rr(X, Y, gam, 1)

With
  costL1    Cost proportional to the posterior
  Ed(*)     Cost of the training error term
  Ew(*)     Cost of the regularization parameter

Outputs on the second level: Cost proportional to the posterior of gam.
>> [costL2, DcostL2, Deff, mu, ksi, eigval, eigvec] = bay_rr(X, Y, gam, 2)

With
  costL2        Cost proportional to the posterior on the second level
  DcostL2(*)    Derivative of the cost proportional to the posterior
  Deff(*)       Effective number of parameters
  mu(*)         Relative importance of the fitting error term
  ksi(*)        Relative importance of the regularization parameter
  eigval(*)     Eigenvalues of the covariance matrix
  eigvec(*)     Eigenvectors of the covariance matrix

Outputs on the third level: The following commands can be used to compute the level 3 cost function for different models (e.g. models with different selected sets of inputs). The best model can then be chosen as the model with the best level 3 cost (costL3).
>> [costL3, gam_optimal] = bay_rr(X, Y, gam, 3)

With
  costL3            Cost proportional to the posterior on the third inference level
  gam_optimal(*)    Optimal regularization parameter obtained from optimizing the second level

Inputs:
>> cost = bay_rr(X, Y, gam, level)

  X        N×d matrix with the inputs of the training data
  Y        N×1 vector with the outputs of the training data
  gam      Regularization parameter
  level    1, 2, 3

See also:
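The first-level criterion J = 0.5*w'*w + gam*0.5*sum((Y-X*w-b).^2) has a closed-form minimiser, which the following plain MATLAB/Octave sketch computes directly. It is not the bay_rr source; the data and the value of gam are assumptions chosen so that the result is easy to check.

```matlab
% Minimiser of J = 0.5*w'*w + gam*0.5*sum((Y - X*w - b).^2), solved in closed form.
% Plain MATLAB/Octave sketch; the data and gam are illustrative assumptions.
X   = [0 1; 1 0; 1 1; 2 1; 0 2; 3 2];          % N x d inputs
Y   = X * [2; -1] + 0.5;                       % N x 1 noiseless linear outputs
gam = 1e4;                                     % large gam = weak regularization
[N, d] = size(X);

% Setting dJ/dw = 0 and dJ/db = 0 gives one linear system in [w; b]
A   = [X' * X + eye(d) / gam, X' * ones(N, 1);
       ones(1, N) * X,        N];
rhs = [X' * Y; sum(Y)];
sol = A \ rhs;
w = sol(1:d);
b = sol(d + 1);

J     = 0.5 * (w' * w) + gam * 0.5 * sum((Y - X * w - b).^2);
gradw = w - gam * X' * (Y - X * w - b);        % gradient w.r.t. w, ~0 at the optimum
gradb = -gam * sum(Y - X * w - b);             % gradient w.r.t. b, ~0 at the optimum
```

With a large gam the penalty on w is weak, so w and b come out close to the generating values [2; -1] and 0.5; decreasing gam shrinks w towards zero.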






cilssvm

Purpose
Construction of bias corrected 100(1 − α)% pointwise or simultaneous confidence intervals

Basic syntax
>> ci = cilssvm({X,Y,type,gam,kernel_par,kernel,preprocess},alpha,conftype)
>> ci = cilssvm(model,alpha,conftype)

Description
This function calculates bias corrected 100(1 − α)% pointwise or simultaneous confidence intervals. The procedure supports homoscedastic as well as heteroscedastic data sets. The construction of the confidence intervals is based on the central limit theorem for linear smoothers combined with bias correction and variance estimation.

Full syntax
Using the functional interface:
>> ci = cilssvm({X,Y,type,gam,kernel_par,kernel,preprocess})
>> ci = cilssvm({X,Y,type,gam,kernel_par,kernel,preprocess}, alpha)
>> ci = cilssvm({X,Y,type,gam,kernel_par,kernel,preprocess}, alpha, conftype)

Outputs
  ci               N×2 matrix containing the lower and upper confidence intervals
Inputs
  X                Training input data used for defining the LS-SVM and the preprocessing
  Y                Training output data used for defining the LS-SVM and the preprocessing
  type             'function estimation' ('f') or 'classifier' ('c')
  gam              Regularization parameter
  kernel_par       Kernel parameter(s) (for linear kernel, use [])
  kernel(*)        Kernel type (by default 'RBF_kernel')
  preprocess(*)    'preprocess'(*) or 'original'
  alpha(*)         Significance level (by default 5%)
  conftype(*)      Type of confidence interval: 'pointwise' or 'simultaneous' (by default 'simultaneous')

Using the object oriented interface:
>> ci = cilssvm(model)
>> ci = cilssvm(model, alpha)
>> ci = cilssvm(model, alpha, conftype)

Outputs
  ci             N×2 matrix containing the lower and upper confidence intervals
Inputs
  model          Object oriented representation of the LS-SVM model
  alpha(*)       Significance level (by default 5%)
  conftype(*)    Type of confidence interval: 'pointwise' or 'simultaneous' (by default 'simultaneous')

See also: trainlssvm, simlssvm, predlssvm




code, codelssvm

Purpose
Encode and decode a multi-class classification task into multiple binary classifiers

Basic syntax
>> Yc = code(Y, codebook)

Description
The coding is defined by the codebook. The codebook is represented by a matrix where the columns represent all different classes and the rows indicate the result of the binary classifiers. An example is given: the 3 classes with original labels [1 2 3] can be encoded in the following codebook (using Minimum Output Coding):

>> codebook = [-1 -1  1;
                1 -1  1]

For this codebook, a member of the first class is found if the first binary classifier is negative and the second classifier is positive. A don't care is represented by NaN. By default it is assumed that the original classes are represented as different numerical labels. One can overrule this by passing the old_codebook which contains information about the old representation.

A codebook can be created by one of the functions (codefct) code_MOC, code_OneVsOne, code_OneVsAll, code_ECOC. Additional arguments to this function can be passed as a cell in codefct_args.

>> Yc = code(Y,codefct,codefct_args)
>> Yc = code(Y,codefct,codefct_args, old_codebook)
>> [Yc, codebook, oldcodebook] = code(Y,codefct,codefct_args)

To detect the classes of a disturbed encoded signal given the corresponding codebook, one needs a distance function (fctdist) with optional arguments given as a cell (fctdist_args). By default, the Hamming distance (of function codedist_hamming) is used.

>> Yc = code(Y, codefct, codefct_args, old_codebook, fctdist, fctdist_args)

A simple example is given here; a more elaborate example is given in section 3.2.6. Here, a short categorical signal Y is encoded in Yc using Minimum Output Coding and decoded again to its original form:

>> Y = [1; 2; 3; 2; 1]
>> [Yc,codebook,old_codebook] = code(Y,'code_MOC') % encode
>> Yc = [-1 -1
         -1  1
          1 -1
         -1  1
         -1 -1]
>> codebook = [-1 -1  1
               -1  1 -1]
>> old_codebook = [1 2 3]
>> code(Yc, old_codebook, [], codebook, 'codedist_hamming') % decode
ans = [1; 2; 3; 2; 1]

Different encoding schemes are available:

- Minimum Output Coding (code_MOC): Here the minimal number of bits $n_b$ is used to encode the $n_c$ classes: $n_b = \lceil \log_2 n_c \rceil$.
- Error Correcting Output Code (code_ECOC): This coding scheme uses redundant bits. Typically, one bounds the number of binary classifiers $n_b$ by $n_b \le 15 \lceil \log_2 n_c \rceil$. However, it is not guaranteed to have a valid $n_b$-representation of $n_c$ classes for all combinations. This routine, based on backtracking, can take some memory and time.
- One versus All Coding (code_OneVsAll): Each binary classifier $k = 1, ..., n_c$ is trained to discriminate between class k and the union of the others.
- One versus One Coding (code_OneVsOne): Each of the $n_b$ binary classifiers is used to discriminate between a specific pair of the $n_c$ classes: $n_b = n_c (n_c - 1) / 2$.

Different decoding schemes are implemented:

- Hamming Distance (codedist_hamming): This measure equals the number of bits in which the binary result and the codeword differ. Typically, it is used for the Error Correcting Code.
- Bayesian Distance Measure (codedist_bay): The Bayesian moderated output of the binary classifiers is used to estimate the posterior probability.

Encoding using the previous algorithms of the LS-SVM multi-class classifier can easily be done by codelssvm. It will be invoked by trainlssvm if an appropriate encoding scheme is defined in a model. An example shows how to use the Bayesian distance measure to extract the estimated class from the simulated encoded signal. Assumed are input and output data X and Y (of size Ntrain × Din and Ntrain × 1, respectively), a kernel parameter sig2 and a regularization parameter gam. Yt corresponding to a set of data points Xt (of size Ntest × Din) is to be estimated:

% encode for training
>> model = initlssvm(X, Y, 'classifier', gam, sig2)
>> model = changelssvm(model, 'codetype', 'code_MOC')
>> model = changelssvm(model, 'codedist_fct', 'codedist_hamming')
>> model = codelssvm(model) % implicitly called by next command
>> model = trainlssvm(model)
>> plotlssvm(model);
% decode for simulating
>> model = changelssvm(model, 'codedist_fct', 'codedist_bay')
>> model = changelssvm(model, 'codedist_args',...
                       {bay_modoutClass(model,Xt)})
>> Yt = simlssvm(model, Xt)
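The mechanics of a codebook (columns are classes, rows are binary classifiers) and of Hamming decoding can be reproduced in a few lines of plain MATLAB/Octave. This is a minimal sketch and not the code or codedist_hamming source; the variable names labels, cls, Dham and best are introduced here for illustration only.

```matlab
% Encode class labels with a codebook and decode them again by Hamming distance.
% Plain MATLAB/Octave sketch; variable names are illustrative assumptions.
labels   = [1 2 3];                      % original class labels (old codebook)
codebook = [-1 -1  1;                    % row 1: output of binary classifier 1
            -1  1 -1];                   % row 2: output of binary classifier 2
Y = [1; 2; 3; 2; 1];                     % categorical signal to encode

nbMOC = ceil(log2(numel(labels)));       % minimal number of bits for nc classes
nbOvO = numel(labels) * (numel(labels) - 1) / 2;   % one-versus-one classifiers

% Encode: pick the codeword (column) of each sample's class
[~, cls] = ismember(Y, labels);
Yc = codebook(:, cls)';                  % N x nbits encoded outputs

% Decode: count differing bits against every codeword, keep the closest class
Dham = zeros(size(Yc, 1), numel(labels));
for k = 1:numel(labels)
    Dham(:, k) = sum(bsxfun(@ne, Yc, codebook(:, k)'), 2);
end
[~, best] = min(Dham, [], 2);
Yd = reshape(labels(best), [], 1);       % decoded labels, N x 1
```

Because Minimum Output Coding uses no redundant bits, a single flipped bit in Yc already maps to another codeword; redundancy as in code_ECOC is what allows errors to be corrected.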

Full syntax
We denote the number of used binary classifiers by nbits and the number of different represented classes by nc.

For encoding:
>> [Yc, codebook, old_codebook] = code(Y, codefct)
>> [Yc, codebook, old_codebook] = code(Y, codefct, codefct_args)
>> Yc = code(Y, given_codebook)

Outputs
  Yc                   N×nbits encoded output classifier
  codebook(*)          nbits*nc matrix representing the used encoding
  old_codebook(*)      d*nc matrix representing the original encoding
Inputs
  Y                    N×d matrix representing the original classifier
  codefct(*)           Function to generate a new codebook (e.g. code_MOC)
  codefct_args(*)      Extra arguments for codefct
  given_codebook(*)    nbits*nc matrix representing the encoding to use

For decoding:
>> Yd = code(Yc, codebook,[], old_codebook)
>> Yd = code(Yc, codebook,[], old_codebook, codedist_fct)
>> Yd = code(Yc, codebook,[], old_codebook, codedist_fct, codedist_args)

Outputs
  Yd                  N×nc decoded output classifier
Inputs
  Y                   N×d matrix representing the original classifier
  codebook            d*nc matrix representing the original encoding
  old_codebook        nbits*nc matrix representing the encoding of the given classifier
  codedist_fct        Function to calculate the distance between two encoded classifiers (e.g. codedist_hamming)
  codedist_args(*)    Extra arguments of codedist_fct

See also:

code_ECOC, code_MOC, code_OneVsAll, code_OneVsOne, codedist_hamming





crossvalidate

Purpose
Estimate the model performance of a model with L-fold crossvalidation.

CAUTION!! Use this function only to obtain the value of the crossvalidation score function given the tuning parameters. Do not use this function together with tunelssvm, but use crossvalidatelssvm instead. The latter is a faster implementation which uses previously computed results.

Basic syntax
>> cost = crossvalidate({Xtrain,Ytrain,type,gam,sig2})
>> cost = crossvalidate(model)

Description
The data is permuted randomly once, then it is divided into L (by default 10) disjoint sets. In the i-th (i = 1, ..., L) iteration, the i-th set is used to estimate the performance (validation set) of the model trained on the other L − 1 sets (training set). Finally, the L different estimates of the performance are combined (by default by the 'mean'). The assumption is made that the input data are independently and identically distributed over the input space. As additional output, the costs in the different folds (costs) of the data are returned:

>> [cost, costs] = crossvalidate(model)

Some commonly used criteria are:

>> cost = crossvalidate(model, 10, 'misclass', 'mean')
>> cost = crossvalidate(model, 10, 'mse', 'mean')
>> cost = crossvalidate(model, 10, 'mae', 'median')

Full syntax
Using LS-SVMlab with the functional interface:
>> [cost, costs] = crossvalidate({X,Y,type,gam,sig2,kernel,preprocess})
>> [cost, costs] = crossvalidate({X,Y,type,gam,sig2,kernel,preprocess}, L)
>> [cost, costs] = crossvalidate({X,Y,type,gam,sig2,kernel,preprocess},...
                       L, estfct, combinefct)

Outputs
  cost             Cost estimation of the L-fold cross-validation
  costs(*)         L×1 vector with costs estimated on the L different folds
Inputs
  X                Training input data used for defining the LS-SVM and the preprocessing
  Y                Training output data used for defining the LS-SVM and the preprocessing
  type             'function estimation' ('f') or 'classifier' ('c')
  gam              Regularization parameter
  sig2             Kernel parameter(s) (for linear kernel, use [])
  kernel(*)        Kernel type (by default 'RBF_kernel')
  preprocess(*)    'preprocess'(*) or 'original'
  L(*)             Number of folds (by default 10)
  estfct(*)        Function estimating the cost based on the residuals (by default mse)
  combinefct(*)    Function combining the estimated costs on the different folds (by default mean)

Using the object oriented interface:
>> [cost, costs] = crossvalidate(model)
>> [cost, costs] = crossvalidate(model, L)
>> [cost, costs] = crossvalidate(model, L, estfct)
>> [cost, costs] = crossvalidate(model, L, estfct, combinefct)

Outputs
  cost             Cost estimation of the L-fold cross-validation
  costs(*)         L×1 vector with costs estimated on the L different folds
Inputs
  model            Object oriented representation of the LS-SVM model
  L(*)             Number of folds (by default 10)
  estfct(*)        Function estimating the cost based on the residuals (by default mse)
  combinefct(*)    Function combining the estimated costs on the different folds (by default mean)

See also: leaveoneout, gcrossvalidate, trainlssvm, simlssvm
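The L-fold procedure described above (permute once, split into L disjoint sets, validate on each set in turn, combine the fold costs) can be written out in plain MATLAB/Octave as follows. The sketch uses a least-squares line fit in place of an LS-SVM so that it runs without the toolbox; the data, the model and the choice of mse and mean are assumptions for illustration.

```matlab
% L-fold cross-validation sketch for a least-squares line fit (plain MATLAB/Octave).
% The model, the data and the mse criterion are illustrative assumptions.
N = 20;  L = 10;
X = linspace(0, 1, N)';
Y = 2 * X + 1;                                 % noiseless, so every fold cost is ~0

perm = randperm(N);                            % permute the data once
fold = mod(0:N-1, L) + 1;                      % fold label for each permuted position
costs = zeros(L, 1);
for i = 1:L
    val = perm(fold == i);                     % i-th set: validation
    trn = perm(fold ~= i);                     % remaining L-1 sets: training
    coef = [X(trn), ones(numel(trn), 1)] \ Y(trn);
    res  = Y(val) - [X(val), ones(numel(val), 1)] * coef;
    costs(i) = mean(res.^2);                   % estfct: mse on the residuals
end
cost = mean(costs);                            % combinefct: mean over the folds
```

Replacing the last line by median(costs) corresponds to passing 'median' as the combining function.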





deltablssvm

Purpose
Bias term correction for the LS-SVM classifier

Basic syntax
>> model = deltablssvm(model, b_new)

Description
This function is only useful in the object oriented function interface. Set explicitly the bias term b_new of the LS-SVM model.

Full syntax
>> model = deltablssvm(model, b_new)

Outputs
  model    Object oriented representation of the LS-SVM model with the updated bias term
Inputs
  model    Object oriented representation of the LS-SVM model
  b_new    m×1 vector with new bias term(s) for the model

See also: roc, trainlssvm, simlssvm, changelssvm
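The effect of replacing the bias term can be seen on a toy example. An LS-SVM classifier outputs the sign of a latent value plus the bias b, so changing b moves the decision threshold. The sketch below is plain MATLAB/Octave with made-up latent values; it does not call deltablssvm, and the names latent, Yold and Ynew are hypothetical.

```matlab
% Effect of a new bias term on the decisions of a binary classifier.
% latent holds hypothetical values of sum_k alpha_k*y_k*K(x, x_k), i.e. the
% classifier output before the bias is added (illustrative numbers only).
latent = [-1.2; -0.3; 0.1; 0.4; 1.5];
b      = 0;                              % original bias term
b_new  = -0.25;                          % corrected bias term

Yold = sign(latent + b);                 % decisions with the original bias
Ynew = sign(latent + b_new);             % decisions after the bias correction
```

Here the third point changes from the positive to the negative class, which is the kind of operating-point shift one would select from an ROC curve.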




denoise_kpca

Purpose
Reconstruct the data mapped on the most important principal components.

Basic syntax
>> Xd = denoise_kpca(X, kernel, kernel_par);

Description
Denoising can be done by moving the point in input space so that its corresponding map to the feature space is optimized. This means that the data point in feature space is as close as possible to its corresponding reconstructed point obtained by using the principal components. If the principal components are to be calculated on the same data X as one wants to denoise, use the command:

>> Xd = denoise_kpca(X, kernel, kernel_par);
>> [Xd,lam,U] = denoise_kpca(X, kernel, kernel_par, [], etype, nb);

When one wants to denoise data Xt other than the data used to obtain the principal components:

>> Xd = denoise_kpca(X, kernel, kernel_par, Xt);
>> [Xd, lam, U] = denoise_kpca(X, kernel, kernel_par, Xt, etype, nb);

Full syntax
>> [Xd, lam, U] = denoise_kpca(X, kernel, kernel_par, Xt);
>> [Xd, lam, U] = denoise_kpca(X, kernel, kernel_par, Xt, etype);
>> [Xd, lam, U] = denoise_kpca(X, kernel, kernel_par, Xt, etype, nb);

Outputs
  Xd            N×d (Nt×d) matrix with denoised data X (Xt)
  lam(*)        nb×1 vector with eigenvalues of principal components
  U(*)          N×nb (Nt×d) matrix with principal eigenvectors
Inputs
  X             N×d matrix with data points used for finding the principal components
  kernel        Kernel type (e.g. 'RBF_kernel')
  kernel_par    Kernel parameter(s) (for linear kernel, use [])
  Xt(*)         Nt×d matrix with noisy points (if not specified, X is denoised instead)
  etype(*)      'eig'(*), 'svd', 'eigs', 'eign'
  nb(*)         Number of principal components used in approximation

>> Xd = denoise_kpca(X, U, lam, kernel, kernel_par, Xt);

Outputs
  Xd            N×d (Nt×d) matrix with denoised data X (Xt)
Inputs
  X             N×d matrix with data points used for finding the principal components
  U             N×nb (Nt×d) matrix with principal eigenvectors
  lam           nb×1 vector with eigenvalues of principal components
  kernel        Kernel type (e.g. 'RBF_kernel')
  kernel_par    Kernel parameter(s) (for linear kernel, use [])
  Xt(*)         Nt×d matrix with noisy points (if not specified, X is denoised instead)

See also: kpca, kernel_matrix, RBF_kernel





eign

Find the principal eigenvalues and eigenvectors of a matrix with Nyström's low rank approximation method.

Basic syntax

>> D = eign(A, nb)
>> [V, D] = eign(A, nb)

Description

When this method is used for low rank approximation and decomposition of the kernel matrix, one can call the function without explicitly constructing the matrix A.

>> D = eign(X, kernel, kernel_par, nb)
>> [V, D] = eign(X, kernel, kernel_par, nb)

Full syntax

We denote the size of the positive definite matrix A by a×a.

Given the full matrix:

>> D = eign(A, nb)
>> [V, D] = eign(A, nb)

Outputs
  V(*)         a×nb matrix with estimated principal eigenvectors of A
  D            nb×1 vector with principal estimated eigenvalues of A
Inputs
  A            a×a positive definite symmetric matrix
  nb(*)        Number of approximated principal eigenvalues/eigenvectors

Given the function to calculate the matrix elements:

>> D = eign(X, kernel, kernel_par, nb)
>> [V, D] = eign(X, kernel, kernel_par, nb)

Outputs
  V(*)         a×nb matrix with estimated principal eigenvectors of A
  D            nb×1 vector with estimated principal eigenvalues of A
Inputs
  X            N×d matrix with the training data
  kernel       Kernel type (e.g. 'RBF_kernel')
  kernel_par   Kernel parameter(s) (for linear kernel, use [])
  nb(*)        Number of eigenvalues/eigenvectors used in the eigenvalue decomposition approximation

See also: eig, eigs, kpca, bay_lssvm
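The Nyström idea behind this entry can be illustrated in base MATLAB without the toolbox. The sketch below is only an illustration under stated assumptions and is not the implementation of eign: it assumes an RBF kernel of the form exp(-||xi - xj||^2 / sig2), picks a hypothetical evenly spaced subsample idx of m points, and uses the textbook Nyström scaling (N/m for eigenvalues, sqrt(m/N) for eigenvectors).

```matlab
% Nystrom low rank approximation, illustrative sketch (not the toolbox's eign)
N = 20;  m = 5;  nb = m;             % N points, m subsampled points, nb components kept
X = linspace(0, 1, N)';              % toy one-dimensional inputs
sig2 = 0.05;                         % assumed RBF kernel parameter
idx = 1:4:N;                         % hypothetical subsample (5 evenly spaced points)

% Only the N x m block of the kernel matrix is evaluated
KNm = exp(-(X*ones(1, m) - ones(N, 1)*X(idx)').^2 / sig2);
Kmm = KNm(idx, :);                   % m x m block among the subsampled points

% Eigendecomposition of the small block, sorted by decreasing eigenvalue
[Um, Lm] = eig((Kmm + Kmm')/2);      % symmetrize against round-off
[lam_m, order] = sort(diag(Lm), 'descend');
Um = Um(:, order);

% Extend to approximate eigenpairs of the full N x N kernel matrix
lam_approx = (N/m) * lam_m(1:nb);
V_approx = sqrt(m/N) * KNm * Um(:, 1:nb) * diag(1 ./ lam_m(1:nb));
```

With nb = m, the product V_approx*diag(lam_approx)*V_approx' reproduces the kernel matrix exactly on the subsampled rows and columns; choosing nb < m truncates the approximation further.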





gcrossvalidate

Estimate the model performance of a model with generalized crossvalidation.

CAUTION!! Use this function only to obtain the value of the generalized crossvalidation score function given the tuning parameters. Do not use this function together with tunelssvm, but use gcrossvalidatelssvm instead. The latter is a faster implementation which uses previously computed results.

Basic syntax

>> cost = gcrossvalidate({Xtrain,Ytrain,type,gam,sig2})
>> cost = gcrossvalidate(model)

Description

Instead of dividing the data into L disjoint sets, one takes the complete data and the effective degrees of freedom (effective number of parameters) into account. The assumption is made that the input data are independently and identically distributed over the input space.

>> cost = gcrossvalidate(model)

Some commonly used criteria are:

>> cost = gcrossvalidate(model, 'misclass')
>> cost = gcrossvalidate(model, 'mse')
>> cost = gcrossvalidate(model, 'mae')

Full syntax

Using LS-SVMlab with the functional interface:

>> cost = gcrossvalidate({X,Y,type,gam,sig2,kernel,preprocess})
>> cost = gcrossvalidate({X,Y,type,gam,sig2,kernel,preprocess}, estfct)

Outputs
  cost            Cost estimation of the generalized cross-validation
Inputs
  X               Training input data used for defining the LS-SVM and the preprocessing
  Y               Training output data used for defining the LS-SVM and the preprocessing
  type            'function estimation' ('f') or 'classifier' ('c')
  gam             Regularization parameter
  sig2            Kernel parameter(s) (for linear kernel, use [])
  kernel(*)       Kernel type (by default 'RBF_kernel')
  preprocess(*)   'preprocess'(*) or 'original'
  estfct(*)       Function estimating the cost based on the residuals (by default mse)

Using the object oriented interface:

>> cost = gcrossvalidate(model)
>> cost = gcrossvalidate(model, estfct)

Outputs
  cost            Cost estimation of the generalized cross-validation
Inputs
  model           Object oriented representation of the LS-SVM model
  estfct(*)       Function estimating the cost based on the residuals (by default mse)

See also: leaveoneout, crossvalidatelssvm, trainlssvm, simlssvm
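To make the role of the effective degrees of freedom concrete, here is a minimal base-MATLAB sketch of a generalized cross-validation score for a linear smoother. It is an illustration under assumptions, not the toolbox code: it uses a kernel smoother without bias term, S = K (K + I/gam)^(-1), an RBF kernel of the form exp(-||xi - xj||^2 / sig2), and the common GCV form mse / (1 - trace(S)/N)^2.

```matlab
% Generalized cross-validation for a linear smoother, illustrative sketch
N = 30;
X = linspace(-1, 1, N)';
Y = sin(pi*X) + 0.1*cos(7*X);            % deterministic toy targets
gam = 10;  sig2 = 0.5;                   % assumed tuning parameters

D2 = (X*ones(1, N) - ones(N, 1)*X').^2;  % squared pairwise distances
K = exp(-D2/sig2);                       % RBF kernel matrix

S = K / (K + eye(N)/gam);                % smoother matrix, Yhat = S*Y
Yhat = S*Y;
df = trace(S);                           % effective number of parameters
gcv = mean((Y - Yhat).^2) / (1 - df/N)^2;
```

No data splitting is needed: the training residuals are inflated by a factor that grows with the effective number of parameters, which penalizes overly flexible models.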




initlssvm, changelssvm

Only for use with the object oriented model interface.

Description

The Matlab toolbox interface is organized in two equivalent ways. In the functional way, function calls need explicit input and output arguments. An advantage is their similarity with the mathematical equations. An alternative syntax is based on the concept of a model, gathering all the relevant signals, parameters and algorithm choices. The model is initialized by model=initlssvm(...), or will be initiated implicitly by passing the arguments of initlssvm(...) in one cell as the argument of the LS-SVM specific functions, e.g. for training:

>> model = trainlssvm({X,Y,type,gam,sig2})
...
>> model = changelssvm(model,field,value)

After training, the model contains the solution of the training including the used default values. All contents of the model can be requested (model.<contenttype>) or changed (changelssvm) at any moment. The user is advised not to change the fields of the model by model.<field>=<value>, as the toolbox can then no longer guarantee consistency. The different options are given in the following tables.

General options representing the kind of model:

  type:           'classifier', 'function estimation'
  status:         Status of this model ('trained' or 'changed')
  alpha:          Support values of the trained LS-SVM model
  b:              Bias term of the trained LS-SVM model
  duration:       Number of seconds the training lasts
  latent:         Returning latent variables ('no', 'yes')
  x_delays:       Number of delays of eXogeneous variables (by default 0)
  y_delays:       Number of delays of responses (by default 0)
  steps:          Number of steps to predict (by default 1)
  gam:            Regularisation parameter
  kernel_type:    Kernel function
  kernel_pars:    Extra parameters of the kernel function
  weights:        Weighting function for robust regression

Fields used to specify the used training data:

  x_dim:          Dimension of input space
  y_dim:          Dimension of responses
  nb_data:        Number of training data
  xtrain:         (preprocessed) inputs of training data
  ytrain:         (preprocessed, coded) outputs of training data
  selector:       Indexes of training data effectively used during training
  costCV:         Cost of the cross-validation score function when model is tuned

Fields with the information for pre- and post-processing (only given if appropriate):

  preprocess:     'preprocess' or 'original'
  schemed:        Status of the preprocessing ('coded', 'original' or 'schemed')
  pre_xscheme:    Scheme used for preprocessing the input data
  pre_yscheme:    Scheme used for preprocessing the output data
  pre_xmean:      Mean of the input data
  pre_xstd:       Standard deviation of the input data
  pre_ymean:      Mean of the responses
  pre_ystd:       Standard deviation of the responses

The specifications of the used encoding (only given if appropriate):

  code:           Status of the coding ('original', 'changed' or 'encoded')
  codetype:       Used function for constructing the encoding for multiclass classification (by default 'none')
  codetype_args:  Arguments of the codetype function
  codedist_fct:   Function used to calculate to which class a coded result belongs
  codedist_args:  Arguments of the codedist function
  codebook2:      Codebook of the new coding
  codebook1:      Codebook of the original coding

Full syntax

>> model = initlssvm(X, Y, type, gam, sig2, kernel, preprocess)

Outputs
  model           Object oriented representation of the LS-SVM model
Inputs
  X               N×d matrix with the inputs of the training data
  Y               N×1 vector with the outputs of the training data
  type            'function estimation' ('f') or 'classifier' ('c')
  gam             Regularization parameter
  sig2            Kernel parameter(s) (for linear kernel, use [])
  kernel(*)       Kernel type (by default 'RBF_kernel')
  preprocess(*)   'preprocess'(*) or 'original'

>> model = changelssvm(model, field, value)

Outputs
  model(*)        Obtained object oriented representation of the LS-SVM model
Inputs
  model           Original object oriented representation of the LS-SVM model
  field           Field of the model that one wants to change (e.g. 'preprocess')
  value           New value of the field of the model that one wants to change

See also: trainlssvm, initlssvm, simlssvm, plotlssvm.





kentropy

Quadratic Renyi entropy for a kernel based estimator.

Basic syntax

Given the eigenvectors and the eigenvalues of the kernel matrix, the entropy is computed by

>> H = kentropy(X, U, lam)

The eigenvalue decomposition can also be computed (or approximated) implicitly:

>> H = kentropy(X, kernel, sig2)

Full syntax

>> H = kentropy(X, kernel, kernel_par)
>> H = kentropy(X, kernel, kernel_par, etype)
>> H = kentropy(X, kernel, kernel_par, etype, nb)

Outputs
  H            Quadratic Renyi entropy of the kernel matrix
Inputs
  X            N×d matrix with the training data
  kernel       Kernel type (e.g. 'RBF_kernel')
  kernel_par   Kernel parameter(s) (for linear kernel, use [])
  etype(*)     'eig'(*), 'eigs', 'eign'
  nb(*)        Number of eigenvalues/eigenvectors used in the eigenvalue decomposition approximation

>> H = kentropy(X, U, lam)

Outputs
  H            Quadratic Renyi entropy of the kernel matrix
Inputs
  X            N×d matrix with the training data
  U            N×nb matrix with principal eigenvectors
  lam          nb×1 vector with eigenvalues of principal components

See also: kernel_matrix, demo_fixedsize, RBF_kernel
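The link between the two call forms (kernel given, or eigenvectors and eigenvalues given) can be sketched in base MATLAB. The snippet assumes the common kernel estimate H = -log( (1/N^2) 1' Omega 1 ) and an RBF kernel of the form exp(-||xi - xj||^2 / sig2); the normalization used inside kentropy itself may differ, so treat this as an illustration only.

```matlab
% Quadratic Renyi entropy from a kernel matrix, illustrative sketch
X = [0 0.1 0.2 0.9 1.0 1.1 2.0 2.1]';       % toy one-dimensional data
N = size(X, 1);
sig2 = 0.5;                                  % assumed RBF kernel parameter
D2 = (X*ones(1, N) - ones(N, 1)*X').^2;
Omega = exp(-D2/sig2);                       % N x N kernel matrix

% Direct evaluation: minus the log of the mean kernel value
H_direct = -log(sum(Omega(:)) / N^2);

% Same quantity from the eigendecomposition Omega = U*diag(lam)*U'
[U, L] = eig((Omega + Omega')/2);
lam = diag(L);
H_eig = -log(((sum(U, 1)/N).^2) * lam);
```

Keeping only the nb largest eigenvalues in the last line gives an approximation of the same quantity, which is why a truncated decomposition can be used.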




kernel_matrix

Construct the positive (semi-) definite and symmetric kernel matrix.

Basic syntax

>> Omega = kernel_matrix(X, kernel_fct, sig2)

Description

This matrix should be positive definite if the kernel function satisfies the Mercer condition. Construct the kernel values for all test data points in the rows of Xt, relative to the points of X:

>> Omega_Xt = kernel_matrix(X, kernel_fct, sig2, Xt)

Full syntax

>> Omega = kernel_matrix(X, kernel_fct, sig2)
>> Omega = kernel_matrix(X, kernel_fct, sig2, Xt)

Outputs
  Omega     N×N (N×Nt) kernel matrix
Inputs
  X         N×d matrix with the inputs of the training data
  kernel    Kernel type (by default 'RBF_kernel')
  sig2      Kernel parameter(s) (for linear kernel, use [])
  Xt(*)     Nt×d matrix with the inputs of the test data

See also: RBF_kernel, lin_kernel, kpca, trainlssvm
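As a small illustration of the two shapes involved (N×N for training data alone, N×Nt against test data), the following base-MATLAB sketch builds both matrices by hand. It is not the toolbox function: the helper sqdist is hypothetical, and the RBF form exp(-||xi - xj||^2 / sig2) is an assumption about the kernel scaling.

```matlab
% Kernel matrices for training and test points, illustrative sketch
X  = [0 0; 1 0; 0 2];          % N = 3 training points, d = 2
Xt = [1 1; 0 0];               % Nt = 2 test points
sig2 = 2;                      % assumed RBF kernel parameter

% Hypothetical helper: squared Euclidean distances between rows of A and rows of B
sqdist = @(A, B) sum(A.^2, 2)*ones(1, size(B, 1)) ...
               + ones(size(A, 1), 1)*sum(B.^2, 2)' - 2*A*B';

Omega    = exp(-sqdist(X, X)  / sig2);   % N x N, symmetric
Omega_Xt = exp(-sqdist(X, Xt) / sig2);   % N x Nt, one column per test point
```

Column j of Omega_Xt holds the kernel values between the j-th test point and all training points, which is the quantity needed when evaluating a trained model on new data.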





kpca

Kernel Principal Component Analysis (KPCA)

Basic syntax

>> [eigval, eigvec] = kpca(X, kernel_fct, sig2)
>> [eigval, eigvec, scores] = kpca(X, kernel_fct, sig2, Xt)

Description

Compute the nb largest eigenvalues and the corresponding rescaled eigenvectors, which correspond to the principal components in the feature space of the centered kernel matrix. To calculate the eigenvalue decomposition of this N×N matrix, Matlab's eig is called by default. The decomposition can also be approximated by Matlab (eigs) or by Nyström's method (eign) using nb components. In some cases one wants to disable ('original') the rescaling of the principal components in feature space to unit length. The scores of a test set Xt on the principal components are computed by the call:

>> [eigval, eigvec, scores] = kpca(X, kernel_fct, sig2, Xt)

Full syntax

>> [eigval, eigvec, empty, omega] = kpca(X, kernel_fct, sig2)
>> [eigval, eigvec, empty, omega] = kpca(X, kernel_fct, sig2, [], etype)
>> [eigval, eigvec, empty, omega] = kpca(X, kernel_fct, sig2, [], etype, nb)
>> [eigval, eigvec, empty, omega] = kpca(X, kernel_fct, sig2, [], etype, nb, rescaling)
>> [eigval, eigvec, scores, omega] = kpca(X, kernel_fct, sig2, Xt)
>> [eigval, eigvec, scores, omega] = kpca(X, kernel_fct, sig2, Xt, etype)
>> [eigval, eigvec, scores, omega] = kpca(X, kernel_fct, sig2, Xt, etype, nb)
>> [eigval, eigvec, scores, omega] = kpca(X, kernel_fct, sig2, Xt, etype, nb, rescaling)
>> [eigval, eigvec, scores, omega, recErrors] = kpca(X, kernel_fct, sig2, Xt, etype)
>> [eigval, eigvec, scores, omega, recErrors] = kpca(X, kernel_fct, sig2, Xt, ...
       etype, nb)
>> [eigval, eigvec, scores, omega, recErrors] = kpca(X, kernel_fct, sig2, Xt, ...
       etype, nb, rescaling)
>> [eigval, eigvec, scores, omega, recErrors, optOut] = kpca(X, kernel_fct, ...
       sig2, Xt, etype)
>> [eigval, eigvec, scores, omega, recErrors, optOut] = kpca(X, kernel_fct, sig2, Xt, ...
       etype, nb)
>> [eigval, eigvec, scores, omega, recErrors, optOut] = kpca(X, kernel_fct, sig2, Xt, ...
       etype, nb, rescaling)

Outputs
  eigval         N (nb)×1 vector with eigenvalues
  eigvec         N×N (N×nb) matrix with the principal directions
  scores(*)      Nt×nb matrix of the scores of test data (or [])
  omega(*)       N×N centered kernel matrix
  recErrors(*)   Nt×1 vector with the reconstruction errors of test data
  optOut(*)      1×2 cell array with the centered test kernel matrix in optOut{1} and the squared norms of the test points in the feature space in optOut{2}
Inputs
  X              N×d matrix with the inputs of the training data
  kernel         Kernel type (e.g. 'RBF_kernel')
  sig2           Kernel parameter(s) (for linear kernel, use [])
  Xt(*)          Nt×d matrix with the inputs of the test data (or [])
  etype(*)       'svd', 'eig'(*), 'eigs', 'eign'
  nb(*)          Number of eigenvalues/eigenvectors used in the eigenvalue decomposition approximation
  rescaling(*)   'original size' ('o') or 'rescaling'(*) ('r')

See also: bay_lssvm, bay_optimize, eign





latentlssvm

Calculate the latent variables of the LS-SVM classifier at the given test data.

Basic syntax

>> Zt = latentlssvm({X,Y,'classifier',gam,sig2,kernel}, {alpha,b}, Xt)
>> Zt = latentlssvm({X,Y,'classifier',gam,sig2,kernel}, Xt)
>> [Zt, model] = latentlssvm(model, Xt)

Description

The latent variables of a binary classifier are the continuous simulated values of the test or training data which are used to make the final classifications. The classification of a test point depends on whether the latent value exceeds the model's threshold (b). If appropriate, the model is first trained by the standard procedure (trainlssvm).

Full syntax

Using the functional interface:

>> Zt = latentlssvm({X,Y,'classifier',gam,sig2,kernel}, {alpha,b}, Xt)
>> Zt = latentlssvm({X,Y,type,gam,sig2,kernel,preprocess}, Xt)

Outputs
  Zt              Nt×m matrix with predicted latent simulated outputs
Inputs
  X               N×d matrix with the inputs of the training data
  Y               N×m vector with the outputs of the training data
  type            'classifier' ('c')
  gam             Regularization parameter
  sig2            Kernel parameter(s) (for linear kernel, use [])
  kernel(*)       Kernel type (by default 'RBF_kernel')
  preprocess(*)   'preprocess'(*) or 'original'
  alpha(*)        N×1 matrix with the support values
  b(*)            the bias terms
  Xt              Nt×d matrix with the inputs of the test data

Using the object oriented interface:

>> [Zt, model] = latentlssvm(model, Xt)

Outputs
  Zt              Nt×m matrix with continuous latent simulated outputs
  model(*)        Trained object oriented representation of the LS-SVM model
Inputs
  model           Object oriented representation of the LS-SVM model
  Xt              Nt×d matrix with the inputs of the test data

See also: trainlssvm, simlssvm





leaveoneout

Estimate the performance of a trained model with leave-one-out crossvalidation.

CAUTION!! Use this function only to obtain the value of the leave-one-out crossvalidation score function given the tuning parameters. Do not use this function together with tunelssvm, but use leaveoneoutlssvm instead. The latter is a faster implementation based on one full matrix inverse.

Basic syntax

>> leaveoneout({X,Y,type,gam,sig2})
>> leaveoneout(model)

Description

In each iteration, one leaves out one point and fits a model on the other data points. The performance of the model is estimated based on the point left out. This procedure is repeated for each data point. Finally, all the different estimates of the performance are combined (by default by computing the mean). The assumption is made that the input data are independently and identically distributed over the input space.

Full syntax

Using the functional interface for the LS-SVMs:

>> cost = leaveoneout({X,Y,type,gam,sig2,kernel,preprocess})
>> cost = leaveoneout({X,Y,type,gam,sig2,kernel,preprocess}, estfct, combinefct)

Outputs
  cost            Cost estimated by leave-one-out crossvalidation
Inputs
  X               Training input data used for defining the LS-SVM and the preprocessing
  Y               Training output data used for defining the LS-SVM and the preprocessing
  type            'function estimation' ('f') or 'classifier' ('c')
  gam             Regularization parameter
  sig2            Kernel parameter(s) (for linear kernel, use [])
  kernel(*)       Kernel type (by default 'RBF_kernel')
  preprocess(*)   'preprocess'(*) or 'original'
  estfct(*)       Function estimating the cost based on the residuals (by default mse)
  combinefct(*)   Function combining the estimated costs on the different folds (by default mean)

Using the object oriented interface for the LS-SVMs:

>> cost = leaveoneout(model)
>> cost = leaveoneout(model, estfct)
>> cost = leaveoneout(model, estfct, combinefct)

Outputs
  cost            Cost estimated by leave-one-out crossvalidation
Inputs
  model           Object oriented representation of the model
  estfct(*)       Function estimating the cost based on the residuals (by default mse)
  combinefct(*)   Function combining the estimated costs on the different folds (by default mean)

See also: crossvalidate, trainlssvm, simlssvm
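The leave-one-out procedure described in this entry can be written as a short loop in base MATLAB. The sketch below is only an illustration: a plain least-squares line stands in for the LS-SVM so that the example is self-contained, the squared residual plays the role of estfct, and the mean plays the role of combinefct.

```matlab
% Leave-one-out crossvalidation loop, illustrative sketch with a linear stand-in model
X = (1:8)';
Y = [1.1 1.9 3.2 3.9 5.1 6.0 6.8 8.1]';
N = size(X, 1);

errs = zeros(N, 1);
for i = 1:N
    keep = [1:i-1, i+1:N];                 % indices of the N-1 points used for fitting
    A = [X(keep, :), ones(N-1, 1)];        % design matrix with intercept column
    coef = A \ Y(keep);                    % least-squares fit: [slope; intercept]
    errs(i) = Y(i) - [X(i, :), 1]*coef;    % residual on the point left out
end

cost = mean(errs.^2);                      % squared error per fold, combined by the mean
```

The loop refits the model N times, which explains why the entry points to leaveoneoutlssvm when the score has to be evaluated repeatedly during tuning.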




lin_kernel, MLP_kernel, poly_kernel, RBF_kernel

Kernel implementations used with the Matlab training and simulation procedure.

Description

lin_kernel
  Linear kernel: $K(x_i, x_j) = x_i^T x_j$

poly_kernel
  Polynomial kernel: $K(x_i, x_j) = (x_i^T x_j + t)^d$, with $t$ the intercept and $d$ the degree of the polynomial.

RBF_kernel
  Radial Basis Function kernel: $K(x_i, x_j) = e^{-\frac{\|x_i - x_j\|^2}{\sigma^2}}$, with $\sigma^2$ the variance of the Gaussian kernel.

MLP_kernel
  Multilayer perceptron kernel: $K(x_i, x_j) = \tanh(s\, x_i^T x_j + \theta)$, with $\theta$ and $s$ tuning parameters.

Full syntax

>> v = RBF_kernel(x1, X2, sig2)

Outputs
  v       N×1 vector with kernel values
Calls
  RBF_kernel or lin_kernel, MLP_kernel, poly_kernel,...
Inputs
  x1      1×d matrix with a data point
  X2      N×d matrix with data points
  sig2    Kernel parameters

See also: kernel_matrix, kpca, trainlssvm
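For a quick numerical feel of the four kernels, the sketch below evaluates each of them for one pair of points in base MATLAB. The parameter values are arbitrary, the name theta for the offset of the MLP kernel is a label chosen for this illustration, and the RBF scaling exp(-||xi - xj||^2 / sig2) is an assumption; the toolbox functions themselves are not called.

```matlab
% Evaluating the four kernel formulas for a single pair of points, illustrative sketch
xi = [1 2];  xj = [3 -1];          % two points in a 2-dimensional input space
t = 1;  d = 2;                     % polynomial kernel: intercept and degree
sig2 = 4;                          % RBF kernel parameter
s = 0.5;  theta = 0.1;             % MLP kernel: scale and offset (theta is an illustrative name)

k_lin  = xi*xj';                           % inner product
k_poly = (xi*xj' + t)^d;                   % shifted inner product raised to the degree
k_rbf  = exp(-sum((xi - xj).^2) / sig2);   % decays with the squared distance
k_mlp  = tanh(s*(xi*xj') + theta);         % squashed, scaled inner product
```

Here the inner product is 1 and the squared distance is 13, so the values can be checked by hand against the formulas.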




linf, mae, medae, misclass, mse

Cost measures of residuals.

Description

A variety of global distance measures can be defined:

  mae ($L_1$):               $C_{L_1}(e) = \frac{\sum_{i=1}^{N} |e_i|}{N}$
  medae ($L_1$, median):     $C_{L_1}^{\mathrm{median}}(e) = \mathrm{median}_{i=1}^{N} |e_i|$
  linf ($L_\infty$):         $C_{L_\infty}(e) = \sup_i |e_i|$
  misclass ($L_0$):          $C_{L_0}(e) = \frac{\sum_{i=1}^{N} |y_i \neq \hat{y}_i|}{N}$
  mse ($L_2$):               $C_{L_2}(e) = \frac{\sum_{i=1}^{N} e_i^2}{N}$

Full syntax

>> C = mse(e)

Outputs
  C          Estimated cost of the residuals
Calls
  mse        mae, medae, linf or mse
Inputs
  e          N×d matrix with residuals

>> [C, which] = trimmedmse(e, beta, norm)

Outputs
  C          Estimated cost of the residuals
  which(*)   N×d matrix with indexes of the used residuals
Inputs
  e          N×d matrix with residuals
  beta(*)    Trimming factor (by default 0.15)
  norm(*)    Function implementing norm (by default squared norm)

>> [rate, n, which] = misclass(Y, Yh)

Outputs
  rate       Rate of misclassification (between 0 (none misclassified) and 1 (all misclassified))
  n(*)       Number of misclassified data points
  which(*)   Indexes of misclassified points
Inputs
  Y          N×d matrix with true class labels
  Yh         N×d matrix with estimated class labels

See also: crossvalidate, leaveoneout, rcrossvalidate
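The cost measures listed in this entry reduce to one-liners in base MATLAB. The sketch below evaluates them on a small hand-made residual vector and label pair; it mirrors the definitions only and does not call the toolbox functions of the same names.

```matlab
% Cost measures of residuals on toy data, illustrative sketch
e  = [0.5 -1 2 -0.5]';        % residuals
Y  = [1 -1 1  1]';            % true class labels
Yh = [1  1 1 -1]';            % estimated class labels

c_mae   = mean(abs(e));       % L1: mean absolute error
c_medae = median(abs(e));     % L1, median: median absolute error
c_linf  = max(abs(e));        % L-infinity: largest absolute error
c_mse   = mean(e.^2);         % L2: mean squared error
rate    = mean(Y ~= Yh);      % L0: fraction of misclassified points
```

The single large residual dominates c_linf and c_mse, while c_medae is barely affected by it, which is the usual reason for preferring the median-based measure when outliers are expected.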





lssvm

Construct an LS-SVM model with one command line and visualize results if possible.

Basic syntax

>> Yp = lssvm(X,Y,type)
>> Yp = lssvm(X,Y,type,kernel)

Description

type can be 'classifier' or 'function estimation' (these strings can be abbreviated to 'c' or 'f', respectively). X and Y are matrices holding the training input and training output. The i-th data point is represented by the i-th row X(i,:) and Y(i,:). The tuning parameters are automatically tuned via leave-one-out cross-validation or 10-fold cross-validation, depending on the size of the data set. Leave-one-out cross-validation is used when the data set contains 300 points or fewer. The loss functions for cross-validation are mse for regression and misclass for classification. If possible, the results will be visualized using plotlssvm. By default the Gaussian RBF kernel is used. Other kernels can be used, for example

>> Yp = lssvm(X,Y,type,'lin_kernel')
>> Yp = lssvm(X,Y,type,'poly_kernel')

When using the polynomial kernel there is no need to specify the degree of the polynomial; the software will automatically tune it to obtain the best performance on the cross-validation or leave-one-out score functions.

>> Yp = lssvm(X,Y,type,'RBF_kernel')
>> Yp = lssvm(X,Y,type,'lin_kernel')
>> Yp = lssvm(X,Y,type,'poly_kernel')

Full syntax

>> [Yp,alpha,b,gam,sig2,model] = lssvm(X,Y,type)
>> [Yp,alpha,b,gam,sig2,model] = lssvm(X,Y,type,kernel)

Inputs
  X           N×d matrix with the inputs of the training data
  Y           N×1 vector with the outputs of the training data
  type        'function estimation' ('f') or 'classifier' ('c')
  kernel(*)   Kernel type (by default 'RBF_kernel')
Outputs
  Yp          N×m matrix with output of the training data
  alpha(*)    N×m matrix with support values of the LS-SVM
  b(*)        1×m vector with bias term(s) of the LS-SVM
  gam(*)      Regularization parameter (determined by cross-validation)
  sig2(*)     Squared bandwidth (determined by cross-validation), for linear kernel sig2=0
  model(*)    Trained object oriented representation of the LS-SVM model

See also: trainlssvm, simlssvm, crossvalidate, leaveoneout, plotlssvm.





plotlssvm

Plot the LS-SVM results in the environment of the training data.

Basic syntax

>> plotlssvm({X,Y,type,gam,sig2,kernel})
>> plotlssvm({X,Y,type,gam,sig2,kernel}, {alpha,b})
>> model = plotlssvm(model)

Description

The first argument specifies the LS-SVM. The second specifies the results of the training if already known; otherwise, the training algorithm is called first. One can specify the precision of the plot by specifying the grain of the grid. By default this value is 50. The dimensions (seldims) of the input data to display can be selected as an optional argument in case of higher dimensional inputs (> 2). A grid will be taken over these dimensions, while the other inputs remain constant (0).

Full syntax

Using the functional interface:

>> plotlssvm({X,Y,type,gam,sig2,kernel,preprocess}, {alpha,b})
>> plotlssvm({X,Y,type,gam,sig2,kernel,preprocess}, {alpha,b}, grain)
>> plotlssvm({X,Y,type,gam,sig2,kernel,preprocess}, {alpha,b}, grain, seldims)
>> plotlssvm({X,Y,type,gam,sig2,kernel,preprocess})
>> plotlssvm({X,Y,type,gam,sig2,kernel,preprocess}, [], grain)
>> plotlssvm({X,Y,type,gam,sig2,kernel,preprocess}, [], grain, seldims)

Inputs
  X               N×d matrix with the inputs of the training data
  Y               N×1 vector with the outputs of the training data
  type            'function estimation' ('f') or 'classifier' ('c')
  gam             Regularization parameter
  sig2            Kernel parameter(s) (for linear kernel, use [])
  kernel(*)       Kernel type (by default 'RBF_kernel')
  preprocess(*)   'preprocess'(*) or 'original'
  alpha(*)        Support values obtained from training
  b(*)            Bias term obtained from training
  grain(*)        The grain of the grid evaluated to compose the surface (by default 50)
  seldims(*)      The principal inputs one wants to span a grid (by default [1 2])

Using the object oriented interface:

>> model = plotlssvm(model)
>> model = plotlssvm(model, [], grain)
>> model = plotlssvm(model, [], grain, seldims)

Outputs
  model(*)        Trained object oriented representation of the LS-SVM model
Inputs
  model           Object oriented representation of the LS-SVM model
  grain(*)        The grain of the grid evaluated to compose the surface (by default 50)
  seldims(*)      The principal inputs one wants to span a grid (by default [1 2])

See also: trainlssvm, simlssvm.





predict

Iterative prediction of a trained LS-SVM NARX model (in recurrent mode).

Basic syntax

>> Yp = predict({Xw,Yw,type,gam,sig2}, Xt, nb)
>> Yp = predict(model, Xt, nb)

Description

The model needs to be trained using Xw, Yw, which are the result of windowize or windowizeNARX. The number of time lags for the model is determined by the dimension of the input or, if not appropriate, by the number of given starting values. By default, the model is evaluated on the past points using simlssvm. However, if one wants to use this procedure for other models, this default can be overridden by a simulation function of your choice. This function (denoted by simfct) has to follow the syntax

>> simfct(model,inputs,arguments)

thus:

>> Yp = predict(model, Xt, nb, simfct)
>> Yp = predict(model, Xt, nb, simfct, arguments)

Full syntax

Using the functional interface for the LS-SVMs:

>> Yp = predict({Xw,Yw,type,gam,sig2,kernel,preprocess}, Xt)
>> Yp = predict({Xw,Yw,type,gam,sig2,kernel,preprocess}, Xt, nb)

Outputs
  Yp              nb×1 matrix with the predictions
Inputs
  Xw              N×d matrix with the inputs of the training data
  Yw              N×1 matrix with the outputs of the training data
  type            'function estimation' ('f') or 'classifier' ('c')
  gam             Regularization parameter
  sig2            Kernel parameter(s) (for linear kernel, use [])
  kernel(*)       Kernel type (by default 'RBF_kernel')
  preprocess(*)   'preprocess' or 'original' (by default)
  Xt              nb×1 matrix of the starting points for the prediction
  nb(*)           Number of outputs to predict

Using the object oriented interface with LS-SVMs:

>> Yp = predict(model, Xt)
>> Yp = predict(model, Xt, nb)

Outputs
  Yp              nb×1 matrix with the predictions
Inputs
  model           Object oriented representation of the LS-SVM model
  Xt              nb×1 matrix of the starting points for the prediction
  nb(*)           Number of outputs to predict

Using another model:

>> Yp = predict(model, Xt, nb, simfct, arguments)

Outputs
  Yp              nb×1 matrix with the predictions
Inputs
  model           Object oriented representation of the LS-SVM model
  Xt              nb×1 matrix of the starting points for the prediction
  nb              Number of outputs to predict
  simfct          Function used to evaluate a test point
  arguments(*)    Cell with the extra arguments passed to simfct

See also: windowize, trainlssvm, simlssvm.
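Recurrent (iterative) prediction, as performed by this function, feeds each prediction back as an input for the next step. The base-MATLAB sketch below shows only that feedback loop: the one-step-ahead model is a hypothetical fixed linear map standing in for a trained LS-SVM, and its one-argument signature is a simplification of the simfct(model,inputs,arguments) convention described above.

```matlab
% Iterative prediction in recurrent mode, illustrative sketch with a stand-in model
step = @(past) 0.5*past(end) + 0.25*past(end-1);   % hypothetical one-step-ahead model, 2 lags

Xt = [1; 2];                 % starting values, oldest first
nb = 3;                      % number of outputs to predict

window = Xt;                 % most recent values seen by the model
Yp = zeros(nb, 1);
for k = 1:nb
    Yp(k) = step(window);                  % predict the next value from the window
    window = [window(2:end); Yp(k)];       % drop the oldest value, append the prediction
end
```

Because every prediction after the first depends on earlier predictions, errors can accumulate over the horizon nb; this is the main difference from evaluating the model on measured past values.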





predlssvm

Construction of bias corrected 100(1 - α)% pointwise or simultaneous prediction intervals.

Basic syntax

>> pi = predlssvm({X,Y,type,gam,sig2,kernel,preprocess}, Xt, alpha, conftype)
>> pi = predlssvm(model, Xt, alpha, conftype)

Description

This function calculates bias corrected 100(1 - α)% pointwise or simultaneous prediction intervals. The procedure supports homoscedastic as well as heteroscedastic data sets. The construction of the prediction intervals is based on the central limit theorem for linear smoothers combined with bias correction and variance estimation.

Full syntax

Using the functional interface:

>> pi = predlssvm({X,Y,type,gam,sig2,kernel,preprocess}, Xt)
>> pi = predlssvm({X,Y,type,gam,sig2,kernel,preprocess}, Xt, alpha)
>> pi = predlssvm({X,Y,type,gam,sig2,kernel,preprocess}, Xt, alpha, conftype)

Outputs
  pi              N×2 matrix containing the lower and upper prediction intervals
Inputs
  X               Training input data used for defining the LS-SVM and preprocessing
  Y               Training output data used for defining the LS-SVM and preprocessing
  type            'function estimation' ('f') or 'classifier' ('c')
  gam             Regularization parameter
  sig2            Kernel parameter(s) (for linear kernel, use [])
  kernel(*)       Kernel type (by default 'RBF_kernel')
  preprocess(*)   'preprocess'(*) or 'original'
  Xt              Test points where prediction intervals are calculated
  alpha(*)        Significance level (by default 5%)
  conftype(*)     Type of prediction interval, 'pointwise' or 'simultaneous' (by default 'simultaneous')

Using the object oriented interface:

>> pi = predlssvm(model)
>> pi = predlssvm(model, Xt, alpha)
>> pi = predlssvm(model, Xt, alpha, conftype)

Outputs
  pi              N×2 matrix containing the lower and upper prediction intervals
Inputs
  model           Object oriented representation of the LS-SVM model
  alpha(*)        Significance level (by default 5%)
  conftype(*)     Type of prediction interval, 'pointwise' or 'simultaneous' (by default 'simultaneous')

See also: trainlssvm, simlssvm, cilssvm




preimage_rbf

Reconstruction or denoising after kernel PCA with RBF kernels, i.e. finding the approximate pre-image (in the input space) of the corresponding feature space expansions.

Basic syntax

>> Xdtr = preimage_rbf(Xtr,sig2,U) % denoising on training data;

Description

This method uses a fixed-point iteration scheme to obtain approximate pre-images for RBF kernels only. Denoising a test set Xnoisy can be done using:

>> Xd = preimage_rbf(Xtr,sig2,U,Xnoisy,'d');

and for reconstructing feature space expansions:

>> Xr = preimage_rbf(Xtr,sig2,U,projections,'r');

Full syntax

>> Ximg = preimage_rbf(Xtr,sig2,U,B,type);
>> Ximg = preimage_rbf(Xtr,sig2,U,B,type,npcs);
>> Ximg = preimage_rbf(Xtr,sig2,U,B,type,npcs,maxIts);

Outputs
  Ximg      N×d (Nt×d) matrix with reconstructed or denoised data
Inputs
  Xtr       N×d matrix with training data points used for finding the principal components
  sig2      parameter of the RBF kernel
  U         N×npcs matrix of principal eigenvectors
  B         for reconstruction B are the projections, for denoising B is the Nt×d matrix of noisy data. If B is not specified, then Xtr is denoised instead
  type      'reconstruct' or 'denoise'
  npcs      number of PCs used for approximation
  maxIts    maximum iterations allowed, 1000 by default

See also: denoise_kpca, kpca, kernel_matrix, RBF_kernel




prelssvm, postlssvm

Pre- and postprocessing of the LS-SVM.

Description

These functions should only be called by trainlssvm or by simlssvm. First, the preprocessing assigns a label to each input and output component ('a' for categorical, 'b' for binary variables or 'c' for continuous). According to this label each dimension is rescaled:

  continuous:    zero mean and unit variance
  categorical:   no preprocessing
  binary:        labels -1 and +1

Full syntax

Using the object oriented interface:

Preprocessing:

>> model = prelssvm(model)
>> Xp = prelssvm(model, Xt)
>> [empty, Yp] = prelssvm(model, [], Yt)
>> [Xp, Yp] = prelssvm(model, Xt, Yt)

Outputs
  model    Preprocessed object oriented representation of the LS-SVM model
  Xp       Nt×d matrix with the preprocessed inputs of the test data
  Yp       Nt×d matrix with the preprocessed outputs of the test data
Inputs
  model    Object oriented representation of the LS-SVM model
  Xt       Nt×d matrix with the inputs of the test data to preprocess
  Yt       Nt×d matrix with the outputs of the test data to preprocess

Postprocessing:

>> model = postlssvm(model)
>> Xt = postlssvm(model, Xp)
>> [empty, Yt] = postlssvm(model, [], Yp)
>> [Xt, Yt] = postlssvm(model, Xp, Yp)

Outputs
  model    Postprocessed object oriented representation of the LS-SVM model
  Xt       Nt×d matrix with the postprocessed inputs of the test data
  Yt       Nt×d matrix with the postprocessed outputs of the test data
Inputs
  model    Object oriented representation of the LS-SVM model
  Xp       Nt×d matrix with the inputs of the test data to postprocess
  Yp       Nt×d matrix with the outputs of the test data to postprocess
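For continuous variables, the rescaling described in this entry amounts to standardizing with the training mean and standard deviation, and postprocessing inverts that map. The base-MATLAB sketch below illustrates this round trip on toy data; it is not the toolbox code, and the variable names mu and sd are chosen for the illustration only.

```matlab
% Zero-mean, unit-variance preprocessing and its inverse, illustrative sketch
Xtrain = [1 10; 2 20; 3 60];             % N = 3 training inputs, d = 2 continuous dimensions
N = size(Xtrain, 1);

mu = mean(Xtrain, 1);                    % per-dimension mean of the training data
sd = std(Xtrain, 0, 1);                  % per-dimension standard deviation

% Preprocess the training data
Xtrain_p = (Xtrain - ones(N, 1)*mu) ./ (ones(N, 1)*sd);

% Preprocess a test point with the training statistics, then map it back
Xt = [4 30];
Xp = (Xt - mu) ./ sd;                    % preprocessed test point
Xback = Xp .* sd + mu;                   % postprocessing recovers the original scale
```

Test data are always rescaled with the statistics of the training data, so the trained model sees test points on the same scale as the points it was fitted on.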





rcrossvalidate

Estimate the model performance with robust L-fold crossvalidation (only regression).

CAUTION!! Use this function only to obtain the value of the robust L-fold crossvalidation score function given the tuning parameters. Do not use this function together with tunelssvm, but use rcrossvalidatelssvm instead.

Basic syntax

>> cost = rcrossvalidate(model)
>> cost = rcrossvalidate({X,Y,'function',gam,sig2})

Description

Robustness in the L-fold crossvalidation score function is obtained by iteratively reweighting schemes. This routine is ONLY valid for regression!!

Full syntax

Using LS-SVMlab with the functional interface:

>> [cost, costs] = rcrossvalidate({X,Y,type,gam,sig2,kernel,preprocess})
>> [cost, costs] = rcrossvalidate({X,Y,type,gam,sig2,kernel,preprocess}, L)
>> [cost, costs] = rcrossvalidate({X,Y,type,gam,sig2,kernel,preprocess}, L, ...
       wfun, estfct)
>> [cost, costs] = rcrossvalidate({X,Y,type,gam,sig2,kernel,preprocess}, L, ...
       wfun, estfct, combinefct)

Outputs
  cost            Cost estimation of the robust L-fold cross-validation
  costs(*)        L×1 vector with costs estimated on the L different folds
Inputs
  X               Training input data used for defining the LS-SVM and the preprocessing
  Y               Training output data used for defining the LS-SVM and the preprocessing
  type            'function estimation' ('f') or 'classifier' ('c')
  gam             Regularization parameter
  sig2            Kernel parameter(s) (for linear kernel, use [])
  kernel(*)       Kernel type (by default 'RBF_kernel')
  preprocess(*)   'preprocess'(*) or 'original'
  L(*)            Number of folds (by default 10)
  wfun(*)         Weighting scheme (by default 'whuber')
  estfct(*)       Function estimating the cost based on the residuals (by default mse)
  combinefct(*)   Function combining the estimated costs on the different folds (by default mean)

Using the object oriented interface:

>> [cost, costs] = rcrossvalidate(model)
>> [cost, costs] = rcrossvalidate(model, L)
>> [cost, costs] = rcrossvalidate(model, L, wfun)
>> [cost, costs] = rcrossvalidate(model, L, wfun, estfct)
>> [cost, costs] = rcrossvalidate(model, L, wfun, ...
       estfct, combinefct)

Outputs
  cost            Cost estimation of the robust L-fold cross-validation
  costs(*)        L×1 vector with costs estimated on the L different folds
  ec(*)           N×1 vector with residuals of all data
Inputs
  model           Object oriented representation of the LS-SVM model
  L(*)            Number of folds (by default 10)
  wfun(*)         Weighting scheme (by default 'whuber')
  estfct(*)       Function estimating the cost based on the residuals (by default mse)
  combinefct(*)   Function combining the estimated costs on the different folds (by default mean)

See also: mae, weightingscheme, crossvalidate, trainlssvm, robustlssvm





ridgeregress

Linear ridge regression

Basic syntax
>> [w, b] = ridgeregress(X, Y, gam)
>> [w, b, Yt] = ridgeregress(X, Y, gam, Xt)

Description
Ordinary least squares on the training errors together with a regularization term on the regression coefficients; the regularization parameter gam sets the trade-off between the two.

Full syntax
>> [w, b] = ridgeregress(X, Y, gam)
>> [w, b, Yt] = ridgeregress(X, Y, gam, Xt)

Outputs
  w        d×1 vector with the regression coefficients
  b        Bias term
  Yt(*)    Nt×1 vector with predicted outputs of test data
Inputs
  X        N×d matrix with the inputs of the training data
  Y        N×1 vector with the outputs of the training data
  gam      Regularization parameter
  Xt(*)    Nt×d matrix with the inputs of the test data

See also: bay_rr, bay_lssvm
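To make the role of gam concrete, here is a minimal plain-MATLAB sketch of ridge regression with an unpenalized bias term. It is not the toolbox's ridgeregress code. It assumes the objective 0.5*w'*w + 0.5*gam*sum(e.^2), so that a large gam approaches ordinary least squares; the data values are illustrative only.

```matlab
% Sketch only: ridge regression with a bias term (not the toolbox source).
% Assumed objective: 0.5*w'*w + 0.5*gam*sum((Y - X*w - b).^2)
X   = [0 1; 1 0; 1 1; 2 1; 3 5];     % N x d training inputs (illustrative)
Y   = X*[2; -1] + 0.5;               % N x 1 outputs from a known linear model
gam = 1e8;                           % large gam: almost ordinary least squares

[N, d] = size(X);
Xe  = [X ones(N,1)];                 % append a column of ones for the bias
P   = diag([ones(d,1); 0]) / gam;    % penalize w only, not b
sol = (Xe'*Xe + P) \ (Xe'*Y);        % regularized normal equations
w   = sol(1:d);                      % d x 1 regression coefficients
b   = sol(end);                      % bias term

Xt  = [4 4; 0 0];                    % Nt x d test inputs
Yt  = Xt*w + b;                      % Nt x 1 predicted outputs
```

Lowering gam in this sketch shrinks w towards zero while leaving the bias free.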





robustlssvm

Robust training in the case of non-Gaussian noise or outliers

Basic syntax
>> [alpha, b] = robustlssvm({X,Y,type,gam,sig2,kernel})
>> model = robustlssvm(model)

Robustness towards outliers can be achieved by reducing the influence of support values corresponding to large errors. One should first use the function tunelssvm so that all the necessary parameters are optimally tuned before calling this routine.

Full syntax
Using the functional interface:
>> [alpha, b] = robustlssvm({X,Y,type,gam,sig2})
>> [alpha, b] = robustlssvm({X,Y,type,gam,sig2,kernel})
>> [alpha, b] = robustlssvm({X,Y,type,gam,sig2,kernel, preprocess})
>> [alpha, b] = robustlssvm({X,Y,type,gam,sig2,kernel, preprocess}, {alpha,b})

Outputs
  alpha           N×1 matrix with support values of the robust LS-SVM
  b               1×1 vector with bias term(s) of the robust LS-SVM
Inputs
  X               N×d matrix with the inputs of the training data
  Y               N×1 vector with the outputs of the training data
  type            'function estimation' ('f') or 'classifier' ('c')
  gam             Regularization parameter
  sig2            Kernel parameter(s) (for linear kernel, use [])
  kernel(*)       Kernel type (by default 'RBF_kernel')
  preprocess(*)   'preprocess'(*) or 'original'
  alpha(*)        Support values obtained from training
  b(*)            Bias term obtained from training

Using the object oriented interface:
>> model = robustlssvm(model)

Outputs
  model           Robustly trained object oriented representation of the LS-SVM model
Inputs
  model           Object oriented representation of the LS-SVM model

See also: trainlssvm, tunelssvm, rcrossvalidate
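The idea of reducing the influence of large errors can be illustrated with Huber-type weights computed from residuals. The following plain-MATLAB sketch is not the toolbox implementation: the tuning constant c = 1.345, the MAD-based scale estimate and the residual values are all assumptions chosen for illustration.

```matlab
% Sketch only: Huber-type weights from residuals (not the toolbox source).
% Assumptions: tuning constant c = 1.345 and a MAD-based robust scale.
e = [0.1; -0.2; 0.05; 3.0; -0.15];        % residuals, with one clear outlier
s = 1.483 * median(abs(e - median(e)));   % robust scale estimate (MAD)
r = abs(e) / s;                           % standardized absolute residuals
c = 1.345;                                % assumed Huber tuning constant
v = ones(size(e));                        % weight 1 for small residuals
big = r > c;                              % residuals treated as large
v(big) = c ./ r(big);                     % down-weight large residuals
```

In a weighted training step, the k-th error term would then be multiplied by v(k), so the outlier in position 4 contributes far less than the other points.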





roc

Receiver Operating Characteristic (ROC) curve of a binary classifier

Basic syntax
>> [area, se, thresholds, oneMinusSpec, sens, TN, TP, FN, FP] = roc(Zt, Y)

Description
The ROC curve [11] shows the separation abilities of a binary classifier: by setting different possible classifier thresholds, the data set is tested on misclassifications [16]. As a result, a plot is shown where the various outcomes are described. If the plot has an area under the curve of 1 on test data, a perfectly separating classifier is found (on that particular dataset); if the area equals 0.5, the classifier has no discriminative power at all. In general, this function can be called with the latent variables Zt and the corresponding class labels Yclass:
>> Zt = [-.7 .3 1.5 ... -.2]
>> Yclass = [-1 -1 1 ... 1]
>> roc(Zt, Yclass)

For use in LS-SVMlab, a shorthand notation allows making the ROC curve on the training data. Implicit training and simulation of the latent values simplifies the call.
>> roc({X,Y,'classifier',gam,sig2,kernel})
>> roc(model)

Full syntax
Standard call (LS-SVMlab independent):
>> [area, se, thresholds, oneMinusSpec, sens, TN, TP, FN, FP] = roc(Zt, Y)
>> [area, se, thresholds, oneMinusSpec, sens, TN, TP, FN, FP] = roc(Zt, Y, 'figure')

Outputs
  area(*)           Area under the ROC curve
  se(*)             Standard deviation of the residuals
  thresholds(*)     N×1 different threshold values
  oneMinusSpec(*)   1-Specificity of each threshold value
  sens(*)           Sensitivity for each threshold value
  TN(*)             Number of true negative predictions
  TP(*)             Number of true positive predictions
  FN(*)             Number of false negative predictions
  FP(*)             Number of false positive predictions
Inputs
  Zt                N×1 latent values of the predicted outputs
  Y                 N×1 vector of true class labels
  figure(*)         'figure'(*) or 'nofigure'

Using the functional interface for the LS-SVMs:
>> [area, se, thresholds, oneMinusSpec, sens, TN, TP, FN, FP] = ...
       roc({X,Y,'classifier',gam,sig2,kernel})
>> [area, se, thresholds, oneMinusSpec, sens, TN, TP, FN, FP] = ...
       roc({X,Y,'classifier',gam,sig2,kernel}, 'figure')

Outputs
  area(*)           Area under the ROC curve
  se(*)             Standard deviation of the residuals
  thresholds(*)     Different thresholds
  oneMinusSpec(*)   1-Specificity of each threshold value
  sens(*)           Sensitivity for each threshold value
  TN(*)             Number of true negative predictions
  TP(*)             Number of true positive predictions
  FN(*)             Number of false negative predictions
  FP(*)             Number of false positive predictions
Inputs
  X                 N×d matrix with the inputs of the training data
  Y                 N×1 vector with the outputs of the training data
  type              'classifier' ('c')
  gam               Regularization parameter
  sig2              Kernel parameter(s) (for linear kernel, use [])
  kernel(*)         Kernel type (by default 'RBF_kernel')
  preprocess(*)     'preprocess'(*) or 'original'
  figure(*)         'figure'(*) or 'nofigure'

Using the object oriented interface for the LS-SVMs:
>> [area, se, thresholds, oneMinusSpec, sens, TN, TP, FN, FP] = roc(model)
>> [area, se, thresholds, oneMinusSpec, sens, TN, TP, FN, FP] = roc(model, 'figure')

Outputs
  area(*)           Area under the ROC curve
  se(*)             Standard deviation of the residuals
  thresholds(*)     N×1 vector with different thresholds
  oneMinusSpec(*)   1-Specificity of each threshold value
  sens(*)           Sensitivity for each threshold value
  TN(*)             Number of true negative predictions
  TP(*)             Number of true positive predictions
  FN(*)             Number of false negative predictions
  FP(*)             Number of false positive predictions
Inputs
  model             Object oriented representation of the LS-SVM model
  figure(*)         'figure'(*) or 'nofigure'

See also: deltablssvm, trainlssvm
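How the threshold sweep produces sensitivity, 1-specificity and the area under the curve can be sketched in plain MATLAB. This is an illustration only, not the toolbox's roc code: the latent values and labels are made up, and the area is computed with the trapezoidal rule via the built-in trapz.

```matlab
% Sketch only: ROC points and area under the curve (not the toolbox source).
Zt = [-0.7; 0.3; 1.5; -0.2; 0.9; -1.1];   % latent outputs (illustrative values)
Y  = [-1;  -1;   1;    1;   1;   -1];     % true labels in {-1,+1}

thresholds = sort(unique(Zt), 'descend'); % one threshold per distinct latent value
nPos = sum(Y == 1);
nNeg = sum(Y == -1);
sens         = zeros(numel(thresholds)+1, 1);   % first point is (0,0)
oneMinusSpec = zeros(numel(thresholds)+1, 1);
for k = 1:numel(thresholds)
    pred = (Zt >= thresholds(k));                       % predicted positive
    sens(k+1)         = sum(pred & (Y == 1))  / nPos;   % true positive rate
    oneMinusSpec(k+1) = sum(pred & (Y == -1)) / nNeg;   % false positive rate
end
area = trapz(oneMinusSpec, sens);         % trapezoidal area under the ROC curve
```

For these six points, 8 of the 9 positive/negative pairs are ranked correctly, which matches the area of 8/9 that the sweep yields.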





simlssvm

Evaluate the LS-SVM at given points

Basic syntax
>> Yt = simlssvm({X,Y,type,gam,sig2,kernel}, {alpha,b}, Xt)
>> Yt = simlssvm({X,Y,type,gam,sig2,kernel}, Xt)
>> Yt = simlssvm(model, Xt)

Description
The matrix Xt represents the points one wants to predict. The first cell contains all arguments needed for defining the LS-SVM (see also trainlssvm, initlssvm). The second cell contains the results of training this LS-SVM model. The cell syntax allows for flexible and consistent default handling.

Full syntax
Using the functional interface:
>> [Yt, Zt] = simlssvm({X,Y,type,gam,sig2}, Xt)
>> [Yt, Zt] = simlssvm({X,Y,type,gam,sig2,kernel}, Xt)
>> [Yt, Zt] = simlssvm({X,Y,type,gam,sig2,kernel,preprocess}, Xt)
>> [Yt, Zt] = simlssvm({X,Y,type,gam,sig2,kernel}, {alpha,b}, Xt)

Outputs
  Yt              Nt×m matrix with predicted output of test data
  Zt(*)           Nt×m matrix with predicted latent variables of a classifier
Inputs
  X               N×d matrix with the inputs of the training data
  Y               N×m vector with the outputs of the training data
  type            'function estimation' ('f') or 'classifier' ('c')
  gam             Regularization parameter
  sig2            Kernel parameter(s) (for linear kernel, use [])
  kernel(*)       Kernel type (by default 'RBF_kernel')
  preprocess(*)   'preprocess'(*) or 'original'
  alpha(*)        Support values obtained from training
  b(*)            Bias term obtained from training
  Xt              Nt×d inputs of the test data

Using the object oriented interface:
>> [Yt, Zt, model] = simlssvm(model, Xt)

Outputs
  Yt              Nt×m matrix with predicted output of test data
  Zt(*)           Nt×m matrix with predicted latent variables of a classifier
  model(*)        Object oriented representation of the LS-SVM model
Inputs
  model           Object oriented representation of the LS-SVM model
  Xt              Nt×d matrix with the inputs of the test data

See also: trainlssvm, initlssvm, plotlssvm, code, changelssvm
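As a rough illustration of what evaluating a trained model involves, the sketch below computes function-estimation outputs as a kernel expansion plus bias in plain MATLAB. It is not the toolbox's simlssvm code. The kernel form K(x,z) = exp(-||x-z||^2/sig2) is an assumption, and the values of alpha and b are made up rather than obtained from training.

```matlab
% Sketch only: evaluating an LS-SVM regression model (not the toolbox source).
% Assumed kernel: K(x,z) = exp(-||x-z||^2 / sig2); alpha and b are made up.
X     = [0; 1; 2];            % N x d training inputs (d = 1)
alpha = [0.5; -1.0; 0.5];     % N x 1 support values (illustrative)
b     = 0.2;                  % bias term (illustrative)
sig2  = 1;                    % kernel parameter
Xt    = [0; 1.5];             % Nt x d test inputs

Nt = size(Xt,1);
N  = size(X,1);
K  = zeros(Nt, N);            % kernel matrix between test and training points
for i = 1:Nt
    for k = 1:N
        K(i,k) = exp(-sum((Xt(i,:) - X(k,:)).^2) / sig2);
    end
end
Yt = K*alpha + b;             % Nt x 1 predicted outputs
```

For a classifier, the analogous expansion would give the latent variables Zt, with the class prediction taken from their sign.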





trainlssvm

Train the support values and the bias term of an LS-SVM for classification or function approximation

Basic syntax
>> [alpha, b] = trainlssvm({X,Y,type,gam,kernel_par,kernel,preprocess})
>> model = trainlssvm(model)

Description
type can be 'classifier' or 'function estimation' (these strings can be abbreviated to 'c' or 'f', respectively). X and Y are matrices holding the training input and training output. The i-th data point is represented by the i-th row X(i,:) and Y(i,:). gam is the regularization parameter: a low gam emphasizes minimizing the complexity of the model, while a high gam stresses fitting of the training data points. kernel_par is the parameter of the kernel; in the common case of an RBF kernel, a large sig2 indicates a stronger smoothing. The kernel type indicates the function that is called to compute the kernel value (by default RBF_kernel). Other kernels can be used, for example:
>> [alpha, b] = trainlssvm({X,Y,type,gam,[d; p],'poly_kernel'})
>> [alpha, b] = trainlssvm({X,Y,type,gam,[],'lin_kernel'})
The kernel parameter(s) are passed as a column vector; if no kernel parameter is needed, pass the empty vector. Training can either be preceded by the preprocessing function ('preprocess', the default) or not ('original'). The training calls the preprocessing (prelssvm, postlssvm) and the encoder (codelssvm) if appropriate. In the remainder of the text, the content of the cell determining the LS-SVM is given by {X,Y,type,gam,sig2}. However, the additional arguments in this cell can always be added in the calls. If one uses the object oriented interface (see also A.3.16), the training is done by
>> model = trainlssvm(model)
>> model = trainlssvm(model, X, Y)
The status of the model indicates whether retraining is needed. The extra arguments X, Y allow re-initializing the model with new training data, as long as its dimensions are the same as those of the old initiation.

One implementation is included:

The Matlab implementation: a straightforward implementation based on the matrix division '\' (lssvmMATLAB.m).

This implementation allows training a multidimensional output problem. If each output uses the same kernel type, kernel parameters and regularization parameter, this is straightforward. If not, one can specify the different types and/or parameters as a row vector in the appropriate argument. Each dimension will be trained with the corresponding column in this vector.
>> [alpha, b] = trainlssvm({X, [Y_1 ... Y_d],type,...
                            [gam_1 ... gam_d], ...
                            [sig2_1 ... sig2_d],...
                            {kernel_1,...,kernel_d}})

Full syntax
Using the functional interface:
>> [alpha, b] = trainlssvm({X,Y,type,gam,sig2})
>> [alpha, b] = trainlssvm({X,Y,type,gam,sig2,kernel})
>> [alpha, b] = trainlssvm({X,Y,type,gam,sig2,kernel,preprocess})

Outputs
  alpha           N×m matrix with support values of the LS-SVM
  b               1×m vector with bias term(s) of the LS-SVM
Inputs
  X               N×d matrix with the inputs of the training data
  Y               N×m vector with the outputs of the training data
  type            'function estimation' ('f') or 'classifier' ('c')
  gam             Regularization parameter
  sig2            Kernel parameter(s) (for linear kernel, use [])
  kernel(*)       Kernel type (by default 'RBF_kernel')
  preprocess(*)   'preprocess'(*) or 'original'

Using the object oriented interface:
>> model = trainlssvm(model)
>> model = trainlssvm({X,Y,type,gam,sig2})
>> model = trainlssvm({X,Y,type,gam,sig2,kernel})
>> model = trainlssvm({X,Y,type,gam,sig2,kernel,preprocess})

Outputs
  model(*)        Trained object oriented representation of the LS-SVM model
Inputs
  model           Object oriented representation of the LS-SVM model
  X(*)            N×d matrix with the inputs of the training data
  Y(*)            N×m vector with the outputs of the training data
  type(*)         'function estimation' ('f') or 'classifier' ('c')
  gam(*)          Regularization parameter
  sig2(*)         Kernel parameter(s) (for linear kernel, use [])
  kernel(*)       Kernel type (by default 'RBF_kernel')
  preprocess(*)   'preprocess'(*) or 'original'

See also: simlssvm, initlssvm, changelssvm, plotlssvm, prelssvm, codelssvm
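The matrix-division approach mentioned for the Matlab implementation can be sketched for function estimation as one linear system in b and alpha, solved with the backslash operator. The code below is a plain-MATLAB illustration and not the contents of lssvmMATLAB.m. The kernel form exp(-||x-z||^2/sig2), the data and the parameter values are assumptions.

```matlab
% Sketch only: LS-SVM function estimation via one linear solve
% (not the toolbox source). Assumed kernel: exp(-||x-z||^2 / sig2).
X    = (0:0.5:3)';                      % N x d training inputs (d = 1)
Y    = sin(X);                          % N x 1 training outputs
gam  = 10;                              % regularization parameter (illustrative)
sig2 = 1;                               % kernel parameter (illustrative)

N  = size(X,1);
sq = sum(X.^2, 2);                      % squared norms of the rows of X
D  = sq*ones(1,N) + ones(N,1)*sq' - 2*(X*X');   % pairwise squared distances
K  = exp(-D / sig2);                    % N x N kernel matrix

A   = [0,         ones(1,N);            % first row enforces sum(alpha) = 0
       ones(N,1), K + eye(N)/gam];      % regularized kernel block
sol = A \ [0; Y];                       % solve for bias and support values
b     = sol(1);
alpha = sol(2:end);
```

A larger gam makes the eye(N)/gam term smaller, so the fitted values K*alpha + b follow the training outputs more closely.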




tunelssvm, linesearch & gridsearch

Tune the tuning parameters of the model with respect to the given performance measure

Basic syntax
>> [gam, sig2, cost] = tunelssvm({X,Y,type,[],[]}, optfun, costfun, costargs)
where the values for the tuning parameters (fourth and fifth argument) are set to empty. Using the object oriented interface this becomes:
>> model = tunelssvm(model, optfun, costfun, costargs)
where model is the object oriented interface of the LS-SVM. This is created by the command initlssvm.
>> model = initlssvm(X,Y,type,[],[]);

Description
There are three optimization algorithms: simplex, which works for all kernels; gridsearch, which is restricted to 2-dimensional tuning parameter optimization; and linesearch, which is used with the linear kernel. The complete tuning process goes as follows. First, for every kernel, Coupled Simulated Annealing (CSA) determines suitable starting points for every method. The search limits of the CSA method are set to [exp(-10), exp(10)]. Second, these starting points are given to one of the three optimization routines above. These routines have to be explicitly specified by the user. CSA has already proven to be more effective than multi-start gradient descent optimization. Another advantage of CSA is that it uses the acceptance temperature to control the variance of the acceptance probabilities with a control scheme. This leads to an improved optimization efficiency because it reduces the sensitivity of the algorithm to the initialization parameters while guiding the optimization process to quasi-optimal runs. By default, CSA uses five multiple starters. The tuning parameters are the regularization parameter gam and the squared kernel parameter (or sig2 in the case of the 'RBF_kernel'). costfun gives an estimate of the performance of the model. Possible functions for costfun are 'crossvalidatelssvm', 'leaveoneoutlssvm', 'rcrossvalidatelssvm' and 'gcrossvalidatelssvm'.
Possible combinations are
>> model = tunelssvm(model, 'simplex', 'crossvalidatelssvm', {10,'mse'})
>> model = tunelssvm(model, 'gridsearch', 'crossvalidatelssvm', {10,'mse'})
>> model = tunelssvm(model, 'linesearch', 'crossvalidatelssvm', {10,'mse'})

In the robust cross-validation case, other possibilities for the weights are 'whampel', 'wlogistic' and 'wmyriad'. In the case of function approximation for a linear kernel:
>> gam = tunelssvm({X,Y,'f',[],[],'lin_kernel'}, 'simplex',...
                   'leaveoneoutlssvm', {'mse'});
>> gam = tunelssvm({X,Y,'f',[],[],'lin_kernel'}, 'linesearch',...
                   'leaveoneoutlssvm', {'mse'})

In the case of the RBF kernel:
>> [gam, sig2] = tunelssvm({X,Y,'f',[],[],'RBF_kernel'}, 'simplex',...
                           'leaveoneoutlssvm', {'mse'});
>> [gam, sig2] = tunelssvm({X,Y,'f',[],[],'RBF_kernel'}, 'gridsearch',...
                           'leaveoneoutlssvm', {'mse'});

In the case of the polynomial kernel (the degree is automatically tuned) and robust 10-fold cross-validation (combined with logistic weights):
>> [gam, sig2] = tunelssvm({X,Y,'f',[],[],'poly_kernel'}, 'simplex',...
                           'rcrossvalidatelssvm', {10,'mae'}, 'wlogistic')

In the case of classification (notice the use of the function misclass):
>> gam = tunelssvm({X,Y,'c',[],[],'lin_kernel'}, 'simplex',...
                   'leaveoneoutlssvm', {'misclass'});
>> gam = tunelssvm({X,Y,'c',[],[],'lin_kernel'}, 'linesearch',...
                   'leaveoneoutlssvm', {'misclass'});

In the case of the RBF kernel where the 10-fold cross-validation cost function is the number of misclassifications (misclass):
>> [gam,sig2] = tunelssvm({X,Y,'c',[],[],'RBF_kernel'}, 'simplex',...
                          'crossvalidatelssvm', {10,'misclass'});
>> [gam,sig2] = tunelssvm({X,Y,'c',[],[],'RBF_kernel'}, 'gridsearch',...
                          'crossvalidatelssvm', {10,'misclass'})

The simplest algorithm to determine the minimum of a cost function with possibly multiple optima is to evaluate a grid over the parameter space and to pick the minimum. This procedure iteratively zooms in on the candidate optimum. The StartingValues determine the limits of the grid over parameter space.
>> Xopt = gridsearch(fun, StartingValues)

This optimization function can be customized by passing extra options and the corresponding value. These options cannot be changed in the tunelssvm command. The default values of gridsearch, linesearch or simplex are used when invoking tunelssvm.
>> [Xopt, Yopt, Evaluations, fig] = gridsearch(fun, startvalues, funargs,...
                                               option1,value1,...)
The possible options and their default values are:
  nofigure    = 'figure';
  maxFunEvals = 190;
  TolFun      = .0001;
  TolX        = .0001;
  grain       = 10;
  zoomfactor  = 5;

An example is given:
>> fun = inline('1-exp(-norm([X(1) X(2)]))','X');
>> gridsearch(fun,[-4 3; 2 -3])
The corresponding grid which is evaluated is shown in Figure A.1.
>> gridsearch(fun,[-3 3; 3 -3],{},'nofigure','nofigure','MaxFunEvals',1000)



[Figure A.1 omitted: plot of the cost over the two-dimensional grid in X.]

Figure A.1: This figure shows the grid which is optimized given the limit values [-4 3; 2 -3].

Full syntax

Optimization by exhaustive search over a two-dimensional grid:

>> [Xopt, Yopt, Evaluations, fig] = gridsearch(fun, startvalues, funargs,...
                                               option1,value1,...)

Outputs
  Xopt             Optimal parameter set
  Yopt             Criterion evaluated at Xopt
  Evaluations      Used number of iterations
  fig              Handle to the figure of the optimization
Inputs
  CostFunction     Function implementing the cost criterion
  StartingValues   2*d matrix with limit values of the widest grid
  FunArgs(*)       Cell with optional extra function arguments of fun
  option(*)        The name of the option one wants to change
  value(*)         The new value of the option one wants to change

The different options:
  Nofigure         'figure'(*) or 'nofigure'
  MaxFunEvals      Maximum number of function evaluations (default: 100)
  GridReduction    Grid reduction parameter (e.g. 2: small reduction; 10: heavy reduction; default 5)
  TolFun           Minimal toleration of improvement on function value (default: 0.0001)
  TolX             Minimal toleration of improvement on X value (default: 0.0001)
  Grain            Square root number of function evaluations in one grid (default: 10)
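The zooming grid search described for gridsearch can be sketched in plain MATLAB as follows. This is an illustration of the idea only, not the toolbox routine: the fixed number of zoom steps (nZoom) is a made-up stopping rule, and reading the limit values [-4 3; 2 -3] as the box [-4, 2] by [-3, 3] is an assumption.

```matlab
% Sketch only: grid search that repeatedly zooms in on the best grid point
% (not the toolbox source). nZoom is a hypothetical stopping rule.
fun = @(X) 1 - exp(-norm([X(1) X(2)]));   % cost function from the example above
lo  = [-4 -3];                            % lower limits per dimension (assumed reading)
hi  = [ 2  3];                            % upper limits per dimension (assumed reading)
grain = 10;  zoomfactor = 5;  nZoom = 4;

best = inf;
Xopt = [NaN NaN];
for it = 1:nZoom
    g1 = linspace(lo(1), hi(1), grain);   % grid along the first parameter
    g2 = linspace(lo(2), hi(2), grain);   % grid along the second parameter
    for i = 1:grain
        for j = 1:grain
            c = fun([g1(i) g2(j)]);
            if c < best                   % keep the lowest cost seen so far
                best = c;
                Xopt = [g1(i) g2(j)];
            end
        end
    end
    w  = (hi - lo) / zoomfactor;          % shrink the box by the zoom factor
    lo = Xopt - w/2;                      % and re-center it on the best point
    hi = Xopt + w/2;
end
```

Each pass costs grain^2 function evaluations, which is why a cap on the number of evaluations matters in practice.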

Optimization by exhaustive search of linesearch:
>> [Xopt, Yopt, Evaluations, fig] = linesearch(fun, startvalues, funargs,...
                                               option1,value1,...)

Outputs
  Xopt             Optimal parameter set
  Yopt             Criterion evaluated at Xopt
  iterations       Used number of iterations
  fig              Handle to the figure of the optimization
Inputs
  CostFun          Function implementing the cost criterion
  StartingValues   2*d matrix with limit values of the widest grid
  FunArgs(*)       Cell with optional extra function arguments of fun
  option(*)        The name of the option one wants to change
  value(*)         The new value of the option one wants to change

The different options:
  Nofigure         'figure'(*) or 'nofigure'
  MaxFunEvals      Maximum number of function evaluations (default: 20)
  GridReduction    Grid reduction parameter (e.g. 1.5: small reduction; 10: heavy reduction; default 2)
  TolFun           Minimal toleration of improvement on function value (default: 0.01)
  TolX             Minimal toleration of improvement on X value (default: 0.01)
  Grain            Number of evaluations per iteration (default: 10)

Full syntax
SIMPLEX - multidimensional unconstrained non-linear optimization. Simplex finds a local minimum of a function, via a function handle fun, starting from an initial point X. The local minimum is located via the Nelder-Mead simplex algorithm [23], which does not require any gradient information. opt contains the user specified options via a structure. The different options are set via a structure with members denoted by opt.*
>> Xopt = simplex(fun,X,opt)

The different options:
  opts.Chi           Parameter governing expansion steps (default: 2)
  opts.Delta         Parameter governing size of initial simplex (default: 1.2)
  opts.Gamma         Parameter governing contraction steps (default: 0.5)
  opts.Rho           Parameter governing reflection steps (default: 1)
  opts.Sigma         Parameter governing shrinkage steps (default: 0.5)
  opts.MaxIter       Maximum number of optimization steps (default: 15)
  opts.MaxFunEvals   Maximum number of function evaluations (default: 25)
  opts.TolFun        Stopping criterion based on the relative change in value of the function in each step (default: 1e-6)
  opts.TolX          Stopping criterion based on the change in the minimizer in each step (default: 1e-6)
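For readers who want to try a Nelder-Mead search without the toolbox, the sketch below uses MATLAB's built-in fminsearch, which also implements the Nelder-Mead simplex method. It is a stand-in for illustration, not the toolbox's simplex function, and its option names (set through optimset) differ from the opts.* members listed above. The test function and starting point are made up.

```matlab
% Sketch only: Nelder-Mead via the built-in fminsearch (a stand-in, not the
% toolbox's simplex function). Test function and start point are made up.
fun  = @(X) (X(1) - 1)^2 + (X(2) + 2)^2;     % convex function, minimum at [1 -2]
opts = optimset('MaxIter', 500, ...          % cap on optimization steps
                'MaxFunEvals', 1000, ...     % cap on function evaluations
                'TolFun', 1e-6, ...          % tolerance on the function value
                'TolX', 1e-6);               % tolerance on the minimizer
[Xopt, Yopt] = fminsearch(fun, [0.5 -1], opts);
```

No gradient is supplied anywhere; the search relies only on function values at the simplex vertices.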

See also: trainlssvm, crossvalidate




windowize & windowizeNARX

Re-arrange the data points into a (block) Hankel matrix for (N)AR(X) time-series modeling

Basic syntax
>> w = windowize(A, window)
>> [Xw,Yw] = windowizeNARX(X,Y,xdelays, ydelays, steps)

Description
Use the windowize function to make a nonlinear AR predictor with a nonlinear regressor. The last elements of the resulting matrix will contain the future values of the time-series, the others will contain the past inputs. window is the relative index of data points in matrix A that are selected to make a window. Each window is put in a row of matrix W. The matrix W contains as many rows as there are different windows selected in A. Schematically, this becomes
>> A = [a1 a2 a3;
        b1 b2 b3;
        c1 c2 c3;
        d1 d2 d3;
        e1 e2 e3;
        f1 f2 f3;
        g1 g2 g3];
>> W = windowize(A, [1 2 3])
W =
    a1 a2 a3 b1 b2 b3 c1 c2 c3
    b1 b2 b3 c1 c2 c3 d1 d2 d3
    c1 c2 c3 d1 d2 d3 e1 e2 e3
    d1 d2 d3 e1 e2 e3 f1 f2 f3
    e1 e2 e3 f1 f2 f3 g1 g2 g3
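The schematic above can be reproduced with a few lines of plain MATLAB. This sketch is not the toolbox's windowize code; it only mirrors the layout shown, using numeric stand-ins for the symbolic entries a1 ... g3.

```matlab
% Sketch only: building the windowed matrix by hand (not the toolbox source).
A = [ 1  2  3;  4  5  6;  7  8  9; 10 11 12; ...
     13 14 15; 16 17 18; 19 20 21];        % 7 x 3 numeric stand-in for a1 ... g3
window = [1 2 3];                          % relative row indices of one window

nRows = size(A,1) - max(window) + 1;       % number of complete windows
W = zeros(nRows, numel(window)*size(A,2));
for i = 1:nRows
    rows   = A(i + window - 1, :)';        % transpose so reshape reads row by row
    W(i,:) = reshape(rows, 1, []);         % concatenate the selected rows of A
end
```

Here W has 5 rows and 9 columns, and its first row is the first three rows of A laid end to end, as in the schematic.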

The function windowizeNARX converts the time-series and its exogenous variables into a block Hankel format useful for training a nonlinear function approximation as a nonlinear ARX model.

Full syntax
>> Xw = windowize(X, window)
The length of window is denoted by w.

Outputs
  Xw          (N-w+1)×w matrix of the sequences of windows over X
Inputs
  X           N×1 vector with data points
  w           w×1 vector with the relative indices of one window

>> [Xw, Yw, xdim, ydim, n] = windowizeNARX(X, Y, xdelays, ydelays)
>> [Xw, Yw, xdim, ydim, n] = windowizeNARX(X, Y, xdelays, ydelays, steps)

Outputs
  Xw          Matrix of the data used for input including the delays
  Yw          Matrix of the data used for output including the next steps
  xdim(*)     Number of dimensions in new input
  ydim(*)     Number of dimensions in new output
  n(*)        Number of new data points
Inputs
  X           N×m vector with input data points
  Y           N×d vector with output data points
  xdelays     Number of lags of X in new input
  ydelays     Number of lags of Y in new input
  steps(*)    Number of future steps of Y in new output (by default 1)

See also: windowizeNARX, predict, trainlssvm, simlssvm



[1] Alzate C. and Suykens J.A.K. (2008), Kernel Component Analysis using an Epsilon Insensitive Robust Loss Function, IEEE Transactions on Neural Networks, 19(9), 1583-1598.
[2] Alzate C. and Suykens J.A.K. (2010), Multiway Spectral Clustering with Out-of-Sample Extensions through Weighted Kernel PCA, IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(2), 335-347.
[3] Baudat G., Anouar F. (2001), Kernel-based methods and function approximation, in International Joint Conference on Neural Networks (IJCNN 2001), Washington DC USA, 1244-1249.
[4] Cawley G.C., Talbot N.L.C. (2002), Efficient formation of a basis in a kernel induced feature space, in Proc. European Symposium on Artificial Neural Networks (ESANN 2002), Brugge Belgium, 1-6.
[5] Cristianini N., Shawe-Taylor J. (2000), An Introduction to Support Vector Machines, Cambridge University Press.
[6] De Brabanter J., Pelckmans K., Suykens J.A.K., Vandewalle J. (2002), Robust cross-validation score function for LS-SVM non-linear function estimation, International Conference on Artificial Neural Networks (ICANN 2002), Madrid, Spain, Aug. 2002, 713-719.
[7] De Brabanter K., Pelckmans K., De Brabanter J., Debruyne M., Suykens J.A.K., Hubert M., De Moor B. (2009), Robustness of Kernel Based Regression: a Comparison of Iterative Weighting Schemes, Proc. of the 19th International Conference on Artificial Neural Networks (ICANN), Limassol, Cyprus, September, 100-110.
[8] De Brabanter K., De Brabanter J., Suykens J.A.K., De Moor B. (2010), Optimized Fixed-Size Kernel Models for Large Data Sets, Computational Statistics & Data Analysis, 54(6), 1484-1504.
[9] De Brabanter K., De Brabanter J., Suykens J.A.K., De Moor B. (2010), Approximate Confidence and Prediction Intervals for Least Squares Support Vector Regression, Technical Report, 10-156.
[10] Evgeniou T., Pontil M., Poggio T. (2000), Regularization networks and support vector machines, Advances in Computational Mathematics, 13(1), 1-50.
[11] Fawcett T. (2006), An Introduction to ROC analysis, Pattern Recognition Letters, 27, 861-874.
[12] Girolami M. (2002), Orthogonal series density estimation and the kernel eigenvalue problem, Neural Computation, 14(3), 669-688.
[13] Golub G.H. and Van Loan C.F. (1989), Matrix Computations, Johns Hopkins University Press, Baltimore, MD.
[14] Györfi L., Kohler M., Krzyżak A., Walk H. (2002), A Distribution-Free Theory of Nonparametric Regression, Springer.
[15] Hall P. (1992), On Bootstrap Confidence Intervals in Nonparametric Regression, Annals of Statistics, 20(2), 695-711.
[16] Hanley J.A., McNeil B.J. (1982), The meaning and use of the area under a receiver operating characteristic (ROC) curve, Radiology, 143, 29-36.
[17] Huber P.J. (1964), Robust estimation of a location parameter, Ann. Math. Statist., 35, 73-101.
[18] Loader C. (1999), Local Regression and Likelihood, Springer-Verlag.
[19] MacKay D.J.C. (1992), Bayesian interpolation, Neural Computation, 4(3), 415-447.
[20] Mika S., Schölkopf B., Smola A., Müller K.-R., Scholz M., Rätsch G. (1999), Kernel PCA and de-noising in feature spaces, Advances in Neural Information Processing Systems 11, 536-542, MIT Press.
[21] Mika S., Rätsch G., Weston J., Schölkopf B., Müller K.-R. (1999), Fisher discriminant analysis with kernels, in Neural Networks for Signal Processing IX, 41-48, IEEE.
[22] Nabney I.T. (2002), Netlab: Algorithms for Pattern Recognition, Springer.
[23] Nelder J.A. and Mead R. (1965), A simplex method for function minimization, Computer Journal, 7, 308-313.
[24] Poggio T., Girosi F. (1990), Networks for approximation and learning, Proc. of the IEEE, 78, 1481-1497.
[25] Rice S.O. (1939), The distribution of the maxima of a random curve, American Journal of Mathematics, 61(2), 409-416.
[26] Ruppert D., Wand M.P. and Carroll R.J. (2003), Semiparametric Regression, Cambridge University Press.
[27] Schölkopf B., Burges C., Smola A. (Eds.) (1998), Advances in Kernel Methods - Support Vector Learning, MIT Press.
[28] Schölkopf B., Smola A.J., Müller K.-R. (1998), Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation, 10, 1299-1319.
[29] Schölkopf B., Smola A. (2002), Learning with Kernels, MIT Press.
[30] Smola A.J., Schölkopf B. (2000), Sparse greedy matrix approximation for machine learning, Proc. 17th International Conference on Machine Learning, 911-918, San Francisco, Morgan Kaufman.
[31] Stone M. (1974), Cross-validatory choice and assessment of statistical predictions, J. Royal Statist. Soc. Ser. B, 36, 111-147.
[32] Suykens J.A.K., Vandewalle J. (1999), Least squares support vector machine classifiers, Neural Processing Letters, 9(3), 293-300.
[33] Suykens J.A.K., Vandewalle J. (2000), Recurrent least squares support vector machines, IEEE Transactions on Circuits and Systems-I, 47(7), 1109-1114.
[34] Suykens J.A.K., De Brabanter J., Lukas L., Vandewalle J. (2002), Weighted least squares support vector machines: robustness and sparse approximation, Neurocomputing, Special issue on fundamental and information processing aspects of neurocomputing, 48(1-4), 85-105.
[35] Suykens J.A.K., Vandewalle J., De Moor B. (2001), Intelligence and cooperative search by coupled local minimizers, International Journal of Bifurcation and Chaos, 11(8), 2133-2144.
[36] Sun J. and Loader C.R. (1994), Simultaneous confidence bands for linear regression and smoothing, Annals of Statistics, 22(3), 1328-1345.
[37] Suykens J.A.K., Van Gestel T., Vandewalle J., De Moor B. (2002), A support vector machine formulation to PCA analysis and its Kernel version, IEEE Transactions on Neural Networks, 14(2), 447-450.
[38] Suykens J.A.K., Van Gestel T., De Brabanter J., De Moor B., Vandewalle J. (2002), Least Squares Support Vector Machines, World Scientific, Singapore.
[39] Suykens J.A.K. (2008), Data Visualization and Dimensionality Reduction using Kernel Maps with a Reference Point, IEEE Transactions on Neural Networks, 19(9), 1501-1517.
[40] Van Belle V., Pelckmans K., Suykens J.A.K., Van Huffel S. (2010), Additive survival least squares support vector machines, Statistics in Medicine, 29(2), 296-308.
[41] Van Gestel T., Suykens J.A.K., Baestaens D., Lambrechts A., Lanckriet G., Vandaele B., De Moor B., Vandewalle J. (2001), Financial time series prediction using least squares support vector machines within the evidence framework, IEEE Transactions on Neural Networks (special issue on Neural Networks in Financial Engineering), 12(4), 809-821.
[42] Van Gestel T., Suykens J.A.K., De Moor B., Vandewalle J. (2001), Automatic relevance determination for least squares support vector machine classifiers, Proc. of the European Symposium on Artificial Neural Networks (ESANN 2001), Bruges, Belgium, 13-18.
[43] Van Gestel T., Suykens J.A.K., Baesens B., Viaene S., Vanthienen J., Dedene G., De Moor B., Vandewalle J. (2001), Benchmarking least squares support vector machine classifiers, Machine Learning, 54(1), 5-32.
[44] Van Gestel T., Suykens J.A.K., Lanckriet G., Lambrechts A., De Moor B., Vandewalle J. (2002), Bayesian framework for least squares support vector machine classifiers, gaussian processes and kernel fisher discriminant analysis, Neural Computation, 15(5), 1115-1148.
[45] Van Gestel T., Suykens J.A.K., Lanckriet G., Lambrechts A., De Moor B., Vandewalle J. (2002), Multiclass LS-SVMs: moderated outputs and coding-decoding schemes, Neural Processing Letters, 15(1), 45-58.
[46] Van Gestel T., Suykens J.A.K., De Moor B., Vandewalle J. (2002), Bayesian inference for LS-SVMs on large data sets using the Nyström method, International Joint Conference on Neural Networks (WCCI-IJCNN 2002), Honolulu, USA, May 2002, 2779-2784.
[47] Vapnik V. (1995), The Nature of Statistical Learning Theory, Springer-Verlag, New York.
[48] Vapnik V. (1998), Statistical Learning Theory, John Wiley, New York.
[49] Williams C.K.I., Seeger M. (2001), Using the Nyström method to speed up kernel machines, Advances in Neural Information Processing Systems, 13, 682-688, MIT Press.
[50] Wahba G., Wold S. (1975), A completely automatic french curve: fitting spline functions by cross-validation, Comm. Statist., 4, 1-17.
[51] Wahba G. (1990), Spline Models for observational data, SIAM, 39.
[52] Xavier de Souza S., Suykens J.A.K., Vandewalle J., Bollé D. (2010), Coupled Simulated Annealing, IEEE Transactions on Systems, Man and Cybernetics - Part B, 40(2), 320-335.
