Abstract
For binary classification problems, the twin support vector machine (TSVM) learns faster than the support vector machine (SVM) by seeking a pair of nonparallel hyperplanes. However, TSVM has two deficiencies: poor discriminant ability and poor sparsity. To alleviate them, we propose a novel sparse discriminant twin support vector machine (SD-TSVM). Inspired by the Fisher criterion, which maximizes the between-class scatter and minimizes the within-class scatter, SD-TSVM introduces twin Fisher regularization terms that may improve its discriminant ability. Moreover, SD-TSVM achieves good sparsity by utilizing both the 1-norm of the model coefficients and the hinge loss; thus, it can efficiently perform data reduction. Classification results on nine real-world datasets show that SD-TSVM performs satisfactorily compared with related methods.







References
Chen T, Guo Y, Hao S (2020) Unsupervised feature selection based on joint spectral learning and general sparse regression. Neural Comput Appl 32:6581–6589
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
Cristianini N, Shawe-Taylor J (2000) An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, Cambridge
Demsar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7(1):1–30
Dheeru D, Karra Taniskidou E (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml
Duda RO, Hart PE, Stork DG (2000) Pattern classification. Wiley, Hoboken
Dunn OJ (1961) Multiple comparisons among means. J Am Stat Assoc 56(293):52–64
den Hertog D (1994) Interior point approach to linear, quadratic and convex programming: algorithms and complexity. Kluwer Academic Publishers, Dordrecht
Gu Z, Zhang Z, Sun J, Li B (2017) Robust image recognition by l1-norm twin-projection support vector machine. Neurocomputing 223:1–11
Horn RA, Johnson CR (1985) Matrix analysis. Cambridge University Press, Cambridge
Jayadeva, Khemchandani R, Chandra S (2007) Twin support vector machine for pattern classification. IEEE Trans Pattern Anal Mach Intell 29(5):905–910
Jiang J, Ma J, Chen C, Jiang X, Wang Z (2017) Noise robust face image super-resolution through smooth sparse representation. IEEE Trans Cybern 47(11):3991–4002
Kumar MA, Gopal M (2009) Least squares twin support vector machine for pattern classification. Expert Syst Appl 36:7535–7543
Liu L, Chu M, Yang Y, Gong R (2020) Twin support vector machine based on adjustable large margin distribution for pattern classification. Int J Mach Learn Cybern 11:2371–2389
Ma J, Tian J, Bai X, Tu Z (2013) Regularized vector field learning with sparse approximation for mismatch removal. Pattern Recognit 46:3519–3532
Mangasarian OL (2006) Exact 1-norm support vector machines via unconstrained convex differentiable minimization. J Mach Learn Res 7(3):1517–1530
Mangasarian OL, Wild E (2006) Multisurface proximal support vector machine classification via generalized eigenvalues. IEEE Trans Pattern Anal Mach Intell 28(1):69–74
Richhariya B, Tanveer M (2021) A fuzzy universum least squares twin support vector machine (FULSTSVM). Neural Comput Appl. https://doi.org/10.1007/s00521-021-05721-4
Shao Y, Zhang C, Wang X, Deng N (2011) Improvements on twin support vector machine. IEEE Trans Neural Netw 22(6):962–968
Shi Y, Miao J, Niu L (2019) Feature selection with MCP2 regularization. Neural Comput Appl 31:6699–6709
Tanveer M (2015) Robust and sparse linear programming twin support vector machines. Cogn Comput 7(1):137–149
Le Thi HA, Phan DN (2017) DC programming and DCA for sparse Fisher linear discriminant analysis. Neural Comput Appl 28:2809–2822
Tian Y, Ju X, Qi Z (2014) Efficient sparse nonparallel support vector machines for classification. Neural Comput Appl 24(5):1089–1099
Vapnik VN (2000) The nature of statistical learning theory. Springer, Berlin
Zhang L, Zhou W, Chang P, Liu J, Yan Z, Wang T, Li F (2012) Kernel sparse representation-based classifier. IEEE Trans Signal Process 60(4):1684–1695
Zhang L, Zhou W, Jiao L (2004) Hidden space support vector machine. IEEE Trans Neural Netw 15(6):1424–1434
Zhang L, Zhou WD (2016) Fisher-regularized support vector machine. Inf Sci 343–344:79–93
Zhang Z, Zhen L, Deng N, Tan J (2014) Sparse least square twin support vector machine with adaptive norm. Appl Intell 41(4):1097–1107
Zheng X, Zhang L, Xu Z (2021) L1-norm Laplacian support vector machine for data reduction in semi-supervised learning. Neural Comput Appl. https://doi.org/10.1007/s00521-020-05609-9
Zheng X, Zhang L, Yan L (2020) Feature selection using sparse twin bounded support vector machine. In: International conference on neural information processing. Springer, Bangkok, pp 357–369
Zheng X, Zhang L, Yan L (2021) CTSVM: a robust twin support vector machine with correntropy-induced loss function for binary classification problems. Inf Sci 559:22–45
Zheng X, Zhang L, Yan L (2021) Sample reduction using \(\ell _1\)-norm twin bounded support vector machine. In: Zhang H, Yang Z, Zhang Z, Wu Z, Hao TY (eds) Neural computing for advanced applications. Springer, Singapore, pp 141–153
Acknowledgements
This work was supported in part by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant No. 19KJA550002, by the Six Talent Peak Project of Jiangsu Province of China under Grant No. XYDXX-054, by the Priority Academic Program Development of Jiangsu Higher Education Institutions, and by the Collaborative Innovation Center of Novel Software Technology and Industrialization.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Human and animal rights
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
1.1 Appendix 1: Proof of Theorem 1
According to the properties of convex programming [10], an optimization problem is a convex programming problem if and only if both its objective function and its constraints are convex. Equivalently, an optimization problem is non-convex if its objective function or any of its constraints is non-convex. We now show that the objective of optimization problem (17) or (18) is non-convex. Because the two problems are similar and symmetric, we mainly discuss one of them, namely the optimization problem (17).
For the sake of simplification, we define
and
In doing so, we can express the optimization problem (17) as
If we assume that (45) is a convex programming problem, then \(J_1({\mathbf {w}}_1, b_1)\), \(J_2({\mathbf {w}}_1)\), and \(J_3({\mathbf {w}}_1, b_1)\) must be convex functions.
By substituting \(f_1({\mathbf {X}}_1)={\mathbf {X}}_1{\mathbf {w}}_1+b_1\) into (42), the first term in the optimization problem (45) can be expressed as:
Obviously, the second-order derivative of \(J_1({\mathbf {w}}_1, b_1)\) is \({\mathbf {X}}_1^T{\mathbf {X}}_1\), which is a positive semi-definite matrix. Thus, \(J_1({\mathbf {w}}_1, b_1)\) is convex.
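The positive semi-definiteness used here (and again for \(J_2\) below) follows from a standard fact about Gram matrices: for any matrix \({\mathbf {A}}\) and any vector \({\mathbf {z}}\) of compatible dimension,
\[
{\mathbf {z}}^T{\mathbf {A}}^T{\mathbf {A}}{\mathbf {z}}=\left\| {\mathbf {A}}{\mathbf {z}}\right\| _2^2\ge 0 ,
\]
so \({\mathbf {A}}^T{\mathbf {A}}\) is always positive semi-definite; taking \({\mathbf {A}}={\mathbf {X}}_1\) gives the claim for \(J_1({\mathbf {w}}_1, b_1)\).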
By substituting \(f_1({\mathbf {X}}_1)={\mathbf {X}}_1{\mathbf {w}}_1+b_1\) and \(\overline{f_1({\mathbf {X}}_1)}={\mathbf {m}}_1^T{\mathbf {w}}_1+b_1\) into (43), the second term in the optimization problem (45) can be expressed as:
The second-order derivative of \(J_2({\mathbf {w}}_1)\) is \(\left( {\mathbf {X}}_1-{\mathbf {e}}_1{\mathbf {m}}_1^T\right) ^T\left( {\mathbf {X}}_1-{\mathbf {e}}_1{\mathbf {m}}_1^T\right)\), which is also a positive semi-definite matrix. Thus, \(J_2({\mathbf {w}}_1)\) is convex.
By substituting \(f_1({\mathbf {X}}_2)={\mathbf {X}}_2{\mathbf {w}}_1+b_1\) and \(\overline{f_1({\mathbf {X}}_1)}={\mathbf {m}}_1^T{\mathbf {w}}_1+b_1\) into (44), the third term \(J_3({\mathbf {w}}_1, b_1)\) in the optimization problem (45) can be expressed as:
The second-order derivative of \(J_3({\mathbf {w}}_1,b_1)\) is \(-C_1\left( -{\mathbf {X}}_2-{\mathbf {e}}_2{\mathbf {m}}_1^T\right) ^T\left( -{\mathbf {X}}_2-{\mathbf {e}}_2{\mathbf {m}}_1^T\right)\), which is not positive semi-definite. Thus, \(J_3({\mathbf {w}}_1, b_1)\) is non-convex.
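To make the sign argument explicit, write \({\mathbf {A}}=-{\mathbf {X}}_2-{\mathbf {e}}_2{\mathbf {m}}_1^T\). Then for any vector \({\mathbf {z}}\),
\[
{\mathbf {z}}^T\left( -C_1{\mathbf {A}}^T{\mathbf {A}}\right) {\mathbf {z}}=-C_1\left\| {\mathbf {A}}{\mathbf {z}}\right\| _2^2\le 0
\]
since the regularization parameter \(C_1\) is positive, and the value is strictly negative whenever \({\mathbf {A}}{\mathbf {z}}\ne {\mathbf {0}}\). Hence this Hessian is negative semi-definite rather than positive semi-definite, so \(J_3({\mathbf {w}}_1, b_1)\) is concave and therefore non-convex in general.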
We have shown that both \(J_1({\mathbf {w}}_1, b_1)\) and \(J_2({\mathbf {w}}_1)\) are convex, whereas \(J_3({\mathbf {w}}_1, b_1)\) is non-convex. This contradicts the assumption that the optimization problem (17) is convex.
In the same way, the non-convexity of the optimization problem (18) can be proved; we omit the details here. This completes the proof of Theorem 1.
1.2 Appendix 2: Derivation of optimization problems
To construct convex counterparts of the optimization problems (17) and (18), we first remove the non-convex parts from the objective functions and then recast them as constraints. We illustrate the transformation using (17); (18) can be handled in the same way.
In (17), minimizing the non-convex part \(-\left\| -f_1({\mathbf {X}}_2)-\overline{f_1({{\mathbf {X}}_1})}\right\| ^2_2\) is equivalent to maximizing \(\left\| -f_1({\mathbf {X}}_2)-\overline{f_1({{\mathbf {X}}_1})}\right\| ^2_2\). We move the latter into the constraints and obtain the following variant of (17):
where \(\epsilon _1\) is a nonnegative constant.
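This step follows a general pattern: the concave part is deleted from the objective and the separation it encouraged is instead enforced through an explicit constraint. Schematically, writing the remaining convex part of (17) as \(h\) and the removed (weighted) quantity as \(g({\mathbf {w}}_1,b_1)=\left\| -f_1({\mathbf {X}}_2)-\overline{f_1({{\mathbf {X}}_1})}\right\| ^2_2\),
\[
\min \; h({\mathbf {w}}_1,b_1)-g({\mathbf {w}}_1,b_1)
\quad \Longrightarrow \quad
\min \; h({\mathbf {w}}_1,b_1)\quad \text {s.t.}\quad g({\mathbf {w}}_1,b_1)\ge \epsilon _1 .
\]
The two problems are not equivalent, and the squared-norm constraint is itself still non-convex; convexity is only recovered once this constraint is replaced by linear inequalities, as done next.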
Moreover, the constraints of (49) can be replaced by
If the second group of inequalities in (50) held, then \(\overline{f_1({\mathbf {X}}_1)}\) would have to be as large as possible and \(f_1({\mathbf {x}}_{2i})\) as small as possible, since \(f_1({\mathbf {X}}_2)\le {0}\) and \(f_1({\mathbf {X}}_1)\ge {0}\); this is inconsistent with the goal of minimizing \(\left\| f_1({\mathbf {X}}_1)\right\| ^2\). Consequently, these inequalities in (50) are redundant and can be ignored. Therefore, (49) can be simplified to:
To account for possible outliers or noise in the data, we relax the constraints by adding slack variables \(\xi _{2i}\ge 0\) and set \(\epsilon _1=1\). Finally, we obtain the convex variant of (17) as follows:
where \(C_3>0\) and \({\varvec{\xi }}_2=[\xi _{21},\dots ,\xi _{2n_2}]^T\).
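Although (19) is not restated here, the definitions of \({\mathbf {G}}_1\) and \({\mathbf {G}}_2\) in “Appendix 3” suggest that the relaxed constraints take the per-sample form
\[
-f_1({\mathbf {x}}_{2i})-\overline{f_1({{\mathbf {X}}_1})}\ge 1-\xi _{2i},\qquad \xi _{2i}\ge 0,\qquad i=1,\dots ,n_2 ,
\]
with the total slack penalized through the term \(C_3{\mathbf {e}}_2^T{\varvec{\xi }}_2\) in the objective; this is a reading of (19) inferred from the surrounding definitions rather than a restatement of it.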
This completes the derivation from (17) to (19). Similarly, the optimization problem (18) can be transformed into (20).
1.3 Appendix 3: Proof of Theorem 2
Here, we need to prove that the objective functions and constraints of the optimization problems (19) and (20) are convex. Since (19) and (20) have a similar structure, we discuss only (19) in detail.
To simplify the expression of (19), we reuse \(J_1({\mathbf {w}}_1,b_1)\) in (42) and \(J_2({\mathbf {w}}_1)\) in (43) from “Appendix 1” and define \(J_4({\varvec{\xi }}_2)=C_3{\mathbf {e}}_2^T{\varvec{\xi }}_2\). Therefore, (19) can be rewritten as
where \({\varvec{\nu }}_2=[{\mathbf {w}}_1^T,b_1,{\varvec{\xi }}_2^T]^T\), \(J_{c_1}({\varvec{\nu }}_2) = {\mathbf {G}}_1 {\varvec{\nu }}_2\) with \({\mathbf {G}}_1 = \left[ -{\mathbf {X}}_2-{\mathbf {e}}_2{\mathbf {m}}_1^T,\,-2{\mathbf {e}}_2,\,{\mathbf {I}}_{n_2\times n_2}\right]\), and \(J_{c_2}({\varvec{\nu }}_2) = {\mathbf {G}}_2 {\varvec{\nu }}_2\) with \({\mathbf {G}}_2 = \left[ {\mathbf {O}}_{n_2\times m},\,{\mathbf {0}}_{n_2},\,{\mathbf {I}}_{n_2\times n_2}\right]\).
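As a quick check that these definitions match the constraints derived in “Appendix 2”, expanding the product gives
\[
{\mathbf {G}}_1{\varvec{\nu }}_2=-\left( {\mathbf {X}}_2{\mathbf {w}}_1+b_1{\mathbf {e}}_2\right) -\left( {\mathbf {m}}_1^T{\mathbf {w}}_1+b_1\right) {\mathbf {e}}_2+{\varvec{\xi }}_2=-f_1({\mathbf {X}}_2)-\overline{f_1({{\mathbf {X}}_1})}\,{\mathbf {e}}_2+{\varvec{\xi }}_2 ,
\]
with \({\mathbf {G}}_1\in {\mathbb {R}}^{n_2\times (m+1+n_2)}\), and \({\mathbf {G}}_2{\varvec{\nu }}_2={\varvec{\xi }}_2\).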
According to the proof of Theorem 1, both \(J_1({\mathbf {w}}_1, b_1)\) and \(J_2({\mathbf {w}}_1)\) are convex functions. In addition, \(J_4({\varvec{\xi }}_2)\) is also convex since it is a linear function. Thus, the objective function of the problem (19) is convex.
Next, we discuss the constraint functions \(J_{c_1}({\varvec{\nu }}_2)\) and \(J_{c_2}({\varvec{\nu }}_2)\). Their second-order derivatives can be written as:
If the second-order derivative of a function is positive semi-definite, then the function is convex. Therefore, both \(J_{c_1}({\varvec{\nu }}_2)\) and \(J_{c_2}({\varvec{\nu }}_2)\) are convex.
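In fact, the convexity of the constraints can also be seen without computing derivatives: each component of \(J_{c_1}({\varvec{\nu }}_2)={\mathbf {G}}_1{\varvec{\nu }}_2\) and \(J_{c_2}({\varvec{\nu }}_2)={\mathbf {G}}_2{\varvec{\nu }}_2\) is an affine function of \({\varvec{\nu }}_2\), so its second-order derivative is the zero matrix,
\[
\nabla ^2\left( {\mathbf {g}}^T{\varvec{\nu }}_2\right) ={\mathbf {O}} ,
\]
where \({\mathbf {g}}^T\) denotes any row of \({\mathbf {G}}_1\) or \({\mathbf {G}}_2\); the zero matrix is trivially positive semi-definite, and affine functions are both convex and concave.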
Since the objective function and the constraints of (19) are convex, the optimization problem (19) is a convex programming problem. In a similar way, we can prove that the optimization problem (20) is also convex.
This completes the proof.
1.4 Appendix 4: Derivation from (23) to (25)
For convenience, we restate the optimization problem (23):
The first term in the objective function of the optimization problem (23) can be expressed as:
The second term in the objective function of (23) can be rewritten as:
where \({\mathbf {X}}_0=C_1^{\frac{1}{2}}({\mathbf {X}}_1-{\mathbf {e}}_1{\mathbf{m}}_1^T)\).
Next, we rewrite the last two terms in the objective of (23) as:
The inequality constraints in (23) can be directly written as:
For simplicity, we denote \({\varvec{\alpha }}_1=\left[ {\varvec{\beta }}_1^{*T},\, {\varvec{\beta }}_1^T,\, \gamma _1^*,\, \gamma _1,\, {\varvec{\xi }}_2^T\right] ^T \in {\mathbb {R}}^{2m+2+n_2}\). Thus, the objective function of the optimization problem (23) can be represented as (25). Namely,
where
with
and
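From a computational point of view, once the quadratic and linear terms of (25) and its constraint matrices have been assembled, the resulting problem takes the form of a standard convex quadratic program that can be handed to any off-the-shelf QP solver. The following is only a minimal sketch of this step, not the implementation used in the paper: it assumes dense NumPy arrays and the CVXOPT package, and the names H, f, A and b are hypothetical placeholders for the quantities defined above (the nonnegativity constraints on \({\varvec{\alpha }}_1\) are assumed to be encoded as additional rows of A and b).

import numpy as np
from cvxopt import matrix, solvers

def solve_dual_qp(H, f, A, b):
    # Sketch: solve  min_alpha 0.5 * alpha^T H alpha + f^T alpha
    #         s.t.   A alpha <= b,
    # where H, f, A, b are placeholder arrays assembled from (25);
    # constraints such as alpha >= 0 are assumed to appear in A and b
    # as rows of -I and zeros.
    solvers.options["show_progress"] = False
    sol = solvers.qp(matrix(H.astype(float)),
                     matrix(f.astype(float).reshape(-1, 1)),
                     matrix(A.astype(float)),
                     matrix(b.astype(float).reshape(-1, 1)))
    return np.array(sol["x"]).ravel()  # optimal alpha_1

A dedicated interior-point or decomposition solver would of course scale better on large datasets; the snippet is meant only to make the structure of (25) concrete.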
About this article
Cite this article
Zheng, X., Zhang, L. & Yan, L. Sparse discriminant twin support vector machine for binary classification. Neural Comput & Applic 34, 16173–16198 (2022). https://doi.org/10.1007/s00521-022-07001-1