Abstract
An endocytoscope provides ultramagnified observation that enables physicians to achieve minimally invasive and real-time diagnosis in colonoscopy. However, this diagnosis requires extensive pathological knowledge and clinical experience. A computer-aided diagnosis (CAD) system that decreases the chance of overlooking neoplastic polyps in endocytoscopy is therefore required. Towards the construction of such a CAD system, we have developed texture-feature-based classification between neoplastic and non-neoplastic images of polyps. We propose a feature-selection method that selects discriminative features from texture features for such two-category classification by searching for an optimal manifold. On an optimal manifold, where the selected features are distributed, the distance between the two linear category subspaces is maximised. We experimentally evaluated the proposed method by comparing the classification accuracy before and after feature selection for texture features and deep-learning features. Furthermore, we clarified the characteristics of an optimal manifold by exploring the relation between the classification accuracy and the output probability of a support vector machine (SVM). The classification with our feature-selection method achieved 84.7% accuracy, which is 7.2% higher than the direct application of Haralick features and an SVM.
Keywords
- Feature selection
- Manifold learning
- Texture feature
- Convolutional neural network
- Endocytoscopic images
- Automated pathological diagnosis
1 Introduction
An endocytoscope was recently developed as a new endoscopic imaging modality for minimally invasive diagnosis. Endocytoscopy enables direct observation of cells and their nuclei on the colon wall at a maximum ultramagnification of 500 times, as shown in Fig. 1. However, extensive pathological knowledge and clinical experience are necessary to achieve accurate endocytoscopic diagnosis. Automated pathological diagnosis is required to prevent the overlooking of neoplastic lesions and to support physicians [12, 18, 19]. Such automated pathological diagnosis amounts to robust two-category image classification. Figure 2 shows typical examples of neoplastic and non-neoplastic endocytoscopic images; the differences between the two categories appear as differences of texture.
Robust two-category classification is a fundamental problem in pattern recognition, since multi-category classification is also built on the two-category concept. Robust classification can be achieved by an optimal pipeline of feature extraction, feature selection, and classification of the selected features. Recent approaches in image pattern recognition adopt deep learning architectures [15, 16, 23] as a full pipeline from feature extraction to classification of the extracted features. This deep learning approach can achieve robust multi-category classification given a sufficiently large training dataset. However, deep learning fails to find optimal parameters with a small dataset, and in medical image classification, collecting a large amount of training data with sufficient patient cases is difficult. Therefore, we have to tackle medical image classification, especially for a new medical modality, using handcrafted features and a robust classifier. Previous works [12, 18, 19, 25, 26] adopted texture-based feature extraction and support-vector-machine classification without feature selection.
Principal component analysis (PCA) is a fundamental methodology for analysis and compression in data processing [14]. PCA is also used for feature selection [2, 20], finding a small number of principal components that represent the patterns of a category. This selection is useful and optimal for representing the distribution of a pattern with respect to the mean squared error. However, it is not optimal for classifying patterns of different categories: principal components that are common or similar among patterns of different categories can lead to incorrect classification. Fukunaga and Koontz proposed a PCA-based feature-selection method for clustering [7] that removes the features shared by two categories; its validity was experimentally demonstrated on phantom data for which the category means are known. Fukui et al. proposed a constraint mutual subspace method [5] that removes the common subspace among patterns of different categories, but this method was designed only for the mutual subspace method [17].
Grassmann distance between two category subspaces. (a) and (b) show the Grassmann distance between category subspaces in D-dimensional space for different feature selections \(\psi \) and \(\tilde{\psi }\). The Grassmann distance for the \(m_L\)-dimensional subspaces in (a) is larger than that for the \(m_S\)-dimensional subspaces in (b). An optimal manifold gives the largest distance between the two category subspaces for classification.
We propose a new feature-selection method for the linear classification of neoplastic and non-neoplastic lesions in endocytoscopic images. The selection is achieved by searching for an optimal manifold for classification. Figure 3 summarises the concept of our feature-selection method. Triangles \(\mathcal {L}_1\) and \(\mathcal {L}_2\) represent the linear category subspaces spanned by the features of each category. An ideal discriminative feature gives a hyperplane between the features of the two categories, as shown in Fig. 3(a); such ideal feature extraction enables robust classification by a linear classifier. The difference between the two category subspaces can be represented by the canonical angles between them. If a feature contains worthless elements, it gives a common subspace of the two categories with small canonical angles, and these worthless elements can lead to classification errors. The overlap region of the triangles in Fig. 3(b) shows the common subspace of the two categories. For feature selection, we have to find a map \(\varPsi \) that maximises the canonical angles between the two category subspaces on a manifold, as shown in Fig. 3(b). By projection with \(\varPsi \), we obtain discriminative features, as shown in Fig. 3(a). To find the linear map \(\varPsi \), we use feature normalisation [7] and the Grassmann distance [4, 8, 10, 22].
Feature normalisation clarifies the importance of each eigenvector for category representation in PCA [7]. The Grassmann distance represents the difference between two linear subspaces by their canonical angles [1, 4, 8, 10, 22, 24]. We obtain discriminative features by selecting the eigenvectors of the normalised features that give the maximum Grassmann distance between the two category subspaces, as shown in Fig. 4. Our proposed method can analyse the extracted features and improve the classification accuracy. Furthermore, we integrated our proposed method into the classification flow shown in Fig. 5, which outputs the probability of each category for practical applications. In practical applications such as computer-aided diagnosis, accurate per-category probabilities are helpful information for physicians during diagnosis. We evaluated the proposed method and its classification procedure on the classification of neoplastic and non-neoplastic colorectal endocytoscopic images in numerical experiments.
Processing flow of the classification of an endocytoscopic image. We use texture features for feature extraction. Classification is achieved by a linear support vector machine with probability estimation, in which we utilise a rejection option based on the estimated probabilities. To select discriminative features, we propose a new selection method.
2 Mathematical Preliminaries
2.1 Linear Subspace of a Pattern
We have a set \(\{ \varvec{x}_i \}_{i=1}^N\) of N extracted D-dimensional features from observed patterns \( \{ \mathcal {X}_i \}_{i=1}^N\) of a category with the condition \(D \ll N\). These extracted features span \(\mathcal {L} = \mathrm {span}(\varvec{x}_1, \dots , \varvec{x}_N )\) for a category. We call this linear subspace a category subspace, which is approximated by
\(\mathcal {L} \simeq \mathrm {span}(\varvec{y}_1, \varvec{y}_2, \dots , \varvec{y}_m),\)
where \(m \le D\). In such standard approaches as subspace methods [11, 17, 20, 28], we obtain the bases \(\varvec{y}_i\), \(i=1,2, \dots , m\), from the eigendecomposition
\(\frac{1}{N}\sum _{j=1}^{N} \varvec{x}_j \varvec{x}_j^{\top } \varvec{y}_i = \eta _i \varvec{y}_i.\)
In this eigendecomposition, an eigenvector \(\varvec{y}_i\) corresponds to an eigenvalue \(\eta _i\) with the conditions \(\eta _1 \ge \eta _2 \ge \dots \ge \eta _D \) and \(\eta _i \ge 0\). We project an input feature vector \(\varvec{x}\) onto the linear subspace \(\mathrm {span}(\varvec{y}_1,\dots , \varvec{y}_m)\) by \(\varvec{Y}^{\top }\varvec{x}\), where \(\varvec{Y} = [\varvec{y}_1, \varvec{y}_2, \dots , \varvec{y}_m] \).
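To make this construction concrete, the following is a minimal NumPy sketch of obtaining a category-subspace basis from feature vectors; the function name and array layout are our own illustrative choices, not from the paper.

```python
import numpy as np

def category_subspace(X, m):
    """Basis of an m-dimensional category subspace.

    X : (N, D) array whose rows are D-dimensional feature vectors of one category.
    Returns Y, a (D, m) matrix whose columns are the eigenvectors of the
    autocorrelation matrix with the m largest eigenvalues.
    """
    N, _ = X.shape
    A = X.T @ X / N                          # D x D autocorrelation matrix
    eta, Y = np.linalg.eigh(A)               # eigenvalues in ascending order
    return Y[:, np.argsort(eta)[::-1][:m]]   # top-m eigenvectors as basis

# An input feature x is projected onto the subspace by Y.T @ x.
```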
2.2 Grassmannian and Canonical Angle
The Grassmannian manifold (Grassmannian) \(\mathcal {G} (m,D)\) is the set of m-dimensional linear subspaces of \(\mathbb {R}^D\) [8]. An element of \(\mathcal {G} (m,D)\) can be represented by an orthogonal matrix \(\varvec{Y}\) of size D by m, where \(\varvec{Y}\) is comprised of the m basis vectors for a set of patterns in \(\mathbb {R}^D\).
Let \(\varvec{Y}_1\) and \(\varvec{Y}_2\) be the orthogonal matrices of size \(D \times m\). Canonical angles (principal angles) \(0 \le \theta _1 \le \dots \le \theta _m \le \frac{\pi }{2}\) between two subspaces \(\mathcal {L}_1= \mathrm {span} (\varvec{Y}_1)\) and \(\mathcal {L}_2 = \mathrm {span} (\varvec{Y}_2)\) are defined recursively by
\(\cos \theta _i = \max _{\varvec{u} \in \mathcal {L}_1} \max _{\varvec{v} \in \mathcal {L}_2} \varvec{u}^{\top }\varvec{v}\) subject to \(\Vert \varvec{u} \Vert = \Vert \varvec{v} \Vert = 1\) and \(\varvec{u}^{\top }\varvec{u}_j = \varvec{v}^{\top }\varvec{v}_j = 0\),
where \(j=1,2, \dots , i-1\) and \(\varvec{u}_j, \varvec{v}_j\) attain the jth canonical angle.
For two linear subspaces \(\mathcal {L}_1\) and \(\mathcal {L}_2\), we have the projection matrices \(\varvec{P}=\varvec{Y}_1\varvec{Y}_1^{\top }\) and \(\varvec{Q} = \varvec{Y}_2\varvec{Y}_2^{\top }\). We also have the set of canonical angles \(\{ \theta _i \}_{i=1}^m\) between these two linear subspaces with the condition \(\theta _1 \le \theta _2 \le \dots \le \theta _m\). We obtain the canonical angles from the solution of the eigendecomposition problem
The solutions of these eigendecomposition problems are coincident [17]. For the practical computation of canonical angles, we have the singular value decomposition
\(\varvec{Y}_1^{\top }\varvec{Y}_2 = \varvec{U}\varvec{\varSigma }\varvec{V}^{\top },\)
where \(\varvec{\varSigma }= \mathrm {diag}(\cos \theta _1, \cos \theta _2, \dots , \cos \theta _m)\) is a diagonal matrix, and \(\varvec{U}, \varvec{V} \in \mathbb {R}^{m \times m} \) are orthogonal matrices. Note that we have the relation \(\lambda _i = \cos ^2 \theta _i\) for the eigenvalues \(\lambda _i\), \(i=1, 2, \dots, m\), in the eigendecomposition in Eq. (4). Canonical angles are used to define the geodesic distance on a Grassmannian.
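As an illustration of this computation, the canonical angles follow directly from the singular values of \(\varvec{Y}_1^{\top }\varvec{Y}_2\); below is a minimal NumPy sketch (our own helper, assuming inputs with orthonormal columns).

```python
import numpy as np

def canonical_angles(Y1, Y2):
    """Canonical angles between span(Y1) and span(Y2).

    Y1, Y2 : matrices with orthonormal columns.
    Returns angles in ascending order, each in [0, pi/2].
    """
    s = np.linalg.svd(Y1.T @ Y2, compute_uv=False)  # singular values = cos(theta_i)
    s = np.clip(s, 0.0, 1.0)                        # guard against round-off
    return np.sort(np.arccos(s))
```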
2.3 Grassmannian Distances
For the two points represented by \(\varvec{Y}_1\) and \(\varvec{Y}_2\) on a Grassmann manifold, we have the following seven distances on the Grassmannian:

1. \(d_p(\varvec{Y}_1,\varvec{Y}_2)\): projection metric
2. \(d_{\mu }(\varvec{Y}_1,\varvec{Y}_2)\): mean distance
3. \(d_{\mathrm {min}}(\varvec{Y}_1,\varvec{Y}_2)\): minimum canonical angle
4. \(d_{\mathrm {max}}(\varvec{Y}_1,\varvec{Y}_2)\): maximum canonical angle
5. \(d_{BC}(\varvec{Y}_1,\varvec{Y}_2)\): Binet–Cauchy metric
6. \(d_g(\varvec{Y}_1,\varvec{Y}_2)\): geodesic distance
7. \(d_{c}(\varvec{Y}_1,\varvec{Y}_2)\): Procrustes (chordal) distance
These seven distances are defined using the canonical angles between two linear subspaces. The projection metric and mean distance are defined by
and
Furthermore, the sines of the maximum and minimum canonical angles
are also used as distances on a Grassmannian. Moreover, the Binet–Cauchy distance is defined by
where the product of \(\cos \theta _i\) represents the similarity between two linear subspaces. Using canonical angles, we have geodesic distance
on a Grassmann manifold. We have two definitions of the Procrustes (chordal) distance,
where \(\Vert \cdot \Vert _{\mathrm {F}}\) is the Frobenius norm.
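For reference, here is a sketch of these distances computed from the canonical angles. The formulas follow common textbook definitions [8]; the paper's exact formulas for the mean distance and the chordal variant are not reproduced above, so the corresponding lines are our assumptions.

```python
import numpy as np

def grassmann_distances(theta):
    """Distances on the Grassmannian from canonical angles theta (1-D array)."""
    return {
        "projection":   np.sqrt(np.sum(np.sin(theta) ** 2)),
        "mean":         np.mean(theta),                       # assumed definition
        "min_angle":    np.sin(np.min(theta)),
        "max_angle":    np.sin(np.max(theta)),
        "binet_cauchy": np.sqrt(1.0 - np.prod(np.cos(theta)) ** 2),
        "geodesic":     np.sqrt(np.sum(theta ** 2)),
        "chordal":      2.0 * np.sqrt(np.sum(np.sin(theta / 2.0) ** 2)),
    }
```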
2.4 Normalisation of Features
We project the extracted features onto the discriminative feature space for accurate classification. We extract a set of features \(\{ \varvec{x}_i \in \mathbb {R}^D \}_{i=1}^N\) from a set of images with condition \(D \ll N\). Let \(\varvec{\mu } = \mathrm {E} (\varvec{x}_i)\) be the mean of the features. By setting \( \bar{\varvec{x}}_i = \varvec{x}_i - \varvec{\mu }\), we obtain a set of centred features \(\{ \bar{\varvec{x}}_i \}_{i=1}^N \). We assume each image belongs to either category \(\mathcal {C}_1\) or \(\mathcal {C}_2\). Therefore, set \(\{ \bar{\varvec{x}}_i \}_{i=1}^N \) is divided into two sets \(\{ \varvec{x}_i^{(1)} \}_{i=1}^{N_1}\) and \(\{ \varvec{x}_i^{(2)} \}_{i=1}^{N_2}\), where \(N_1\) and \(N_2\), respectively, represent the number of images in the first and second categories.
We define the autocorrelation matrices in the centred feature space as
\(\varvec{A}_1 = \frac{1}{N_1}\varvec{X}_1\varvec{X}_1^{\top }\) and \(\varvec{A}_2 = \frac{1}{N_2}\varvec{X}_2\varvec{X}_2^{\top },\)
where \(\varvec{X}_1 = [\varvec{x}^{(1)}_1, \varvec{x}^{(1)}_2, \dots , \varvec{x}^{(1)}_{N_1}]\) and \(\varvec{X}_2 = [\varvec{x}^{(2)}_1, \varvec{x}^{(2)}_2, \dots , \varvec{x}^{(2)}_{N_2}]\), for the two categories. We define the covariance matrix of all the features as
\(\varvec{C} = P(\mathcal {C}_1)\varvec{A}_1 + P(\mathcal {C}_2)\varvec{A}_2,\)
where we set \(P(\mathcal {C}_1)= \frac{N_1}{N_1+N_2}\) and \(P(\mathcal {C}_2) = \frac{N_2}{N_1+N_2}\). Fukunaga and Koontz [7] used the autocorrelation matrix of all the features instead of \(\varvec{C}\) in Eq. (15). Fukunaga [6] used the covariance matrices \(\varvec{C}_1\) and \(\varvec{C}_2\) of the two categories instead of \(\varvec{A}_1\) and \(\varvec{A}_2\) in Eq. (15). In this manuscript, we adopt the covariance matrix \(\varvec{C}\) of all the features to remove the common features of the two categories, since the autocorrelation matrices \(\varvec{A}_1\) and \(\varvec{A}_2\) include the gaps between the mean of all the features and the mean of each category. The covariance matrix gives the eigendecomposition problem \(\varvec{CV} = \varvec{V}\varvec{\varXi }\), where \(\varvec{\varXi }= \mathrm {diag}(\xi _1,\xi _2,\dots ,\xi _D)\) consists of the eigenvalues \(\xi _i\), \(i=1,2, \dots , D\), with the condition \(\xi _1\ge \xi _2 \ge \dots \ge \xi _{D}\). These eigendecomposition results give the whitening matrix \(\varvec{W} = \varvec{\varXi }^{-\frac{1}{2}}\varvec{V}^{\top }\).
Using this whitening matrix and Eq. (15), we obtain the relation
\(\tilde{\varvec{A}}_1 + \tilde{\varvec{A}}_2 = \varvec{W}\varvec{C}\varvec{W}^{\top } = \varvec{I},\)
for \(\tilde{\varvec{A}}_j = P(\mathcal {C}_j)\varvec{W}\varvec{A}_j\varvec{W}^{\top }\), where \(\varvec{I}\) is an identity matrix. The solutions of the eigenvalue problems
\(\tilde{\varvec{A}}_j \varvec{\phi }_i^{(j)} = \lambda _i^{(j)} \varvec{\phi }_i^{(j)},\)
where we set \(j=1,2\), give the bases of the two category subspaces. From Eqs. (16) and (17), we have
\(\tilde{\varvec{A}}_1 \varvec{\phi }_i^{(2)} = (\varvec{I} - \tilde{\varvec{A}}_2) \varvec{\phi }_i^{(2)}.\)
This leads to \(\tilde{\varvec{A}}_1 \varvec{\phi }_i^{(2)} = (1-\lambda _i^{(2)})\varvec{\phi }_i^{(2)} \). These relations give
\(\varvec{\phi }_i^{(1)} = \varvec{\phi }_{D-i+1}^{(2)}\)
and
\(\lambda _i^{(1)} = 1 - \lambda _{D-i+1}^{(2)}.\)
Equations (19) and (20) show that both eigenvalue problems give the same set of eigenvectors and the corresponding eigenvalues. Note that the two sets of eigenvalues are reversely ordered: the eigenvalues in Eq. (20) satisfy \(1 \ge \lambda _i^{(j)} \ge 0\) with \(\lambda _1^{(j)} \ge \lambda _2^{(j)} \ge \dots \ge \lambda _D^{(j)}\) for \(j=1,2\).
These relations imply that the eigenvectors corresponding to the large eigenvalues of \(\tilde{\varvec{A}}_1\) contribute to the representation of the subspace of \(\mathcal {C}_1\), although they make only a minor contribution to the representation of \(\mathcal {C}_2\). Therefore, we obtain discriminative features by projecting the features onto the linear subspace given by \(\mathrm {span}(\varvec{\phi }_{1}^{(1)},\varvec{\phi }_{1}^{(2)},\varvec{\phi }_{2}^{(1)},\varvec{\phi }_{2}^{(2)}, \dots , \varvec{\phi }_{d}^{(1)},\varvec{\phi }_{d}^{(2)} )\). We discuss how to choose the number d in the next section.
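A minimal sketch of this normalisation step, with whitening followed by the per-category eigendecomposition (the small regulariser eps is our own addition for numerical stability):

```python
import numpy as np

def normalised_eigvecs(X1, X2, eps=1e-10):
    """Whitening and per-category eigendecomposition (Fukunaga-Koontz style).

    X1, X2 : (N1, D) and (N2, D) arrays of features of the two categories.
    Returns the whitening matrix W, the eigenvalues of the whitened
    category-1 matrix in descending order, the shared eigenvectors Phi,
    and the global mean mu.
    """
    mu = np.vstack([X1, X2]).mean(axis=0)
    X1c, X2c = X1 - mu, X2 - mu
    N1, N2 = len(X1c), len(X2c)
    Xc = np.vstack([X1c, X2c])
    C = Xc.T @ Xc / (N1 + N2)                     # covariance of all centred features
    xi, V = np.linalg.eigh(C)
    W = np.diag(1.0 / np.sqrt(xi + eps)) @ V.T    # whitening matrix
    A1 = X1c.T @ X1c / N1                         # category-1 autocorrelation
    A1t = (N1 / (N1 + N2)) * (W @ A1 @ W.T)       # whitened and weighted
    lam, Phi = np.linalg.eigh(A1t)                # ascending eigenvalues
    return W, lam[::-1], Phi[:, ::-1], mu         # descending order

# The leading columns of Phi represent category 1 best; the trailing columns
# (the leading eigenvectors of the category-2 matrix) represent category 2 best.
```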
2.5 Linear Classification with Rejection Option
We use a linear support vector machine (SVM) as a classifier. The SVM classifies an input feature vector \(\varvec{x} \in \mathbb {R}^D\) by \(\mathrm {sign}(f(\varvec{x}))\), where \(f(\cdot )\) is a decision function. The parameters and hyperparameters of this decision function are optimised in a training procedure with training data. With the optimised decision function, we can estimate the probability that an input belongs to each category. For two categories with label \(\mathcal {L} \in \{ 0, 1\}\), we can approximately estimate the probability
\(P(\mathcal {L}=1 | \varvec{x}) = \frac{1}{1+\exp (Af(\varvec{x})+B)},\)
where A and B are the parameters of Platt's method [21]. \(P(\mathcal {L}=0 |\varvec{x})\) is obtained as \(1-P(\mathcal {L}=1 |\varvec{x})\). After training the decision function, we obtain A and B by maximum-likelihood estimation on the training dataset. In our method, we represent the non-neoplastic and neoplastic categories by 0 and 1. The SVM can thus output probabilities [21] \(p_\mathrm{{non}}\) and \(p_\mathrm{{neo}}\) for non-neoplastic and neoplastic images that satisfy \(p_\mathrm{{non}}+p_\mathrm{{neo}}=1\) and \(p_\mathrm{{non}},p_\mathrm{{neo}} \in [0,1]\). We adopt a rejection option to remove classifications with low confidence [2]. The rejection option discards classifications with low probabilities such that \(\mathrm {max}\ (p_{\mathrm {non}}, p_{\mathrm {neo}}) < \kappa \), where the criterion \(\kappa \) is decided from preliminary experiments.
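This classification step can be sketched with scikit-learn, whose SVC with probability=True wraps LIBSVM's Platt scaling [3, 21]; the threshold value below is illustrative, not the paper's.

```python
from sklearn.svm import SVC

def classify_with_rejection(X_train, y_train, X_test, kappa=0.7):
    """Linear SVM with Platt probability estimates and a rejection option.

    Labels: 0 = non-neoplastic, 1 = neoplastic.
    Returns predicted labels, an acceptance mask, and the probabilities.
    """
    clf = SVC(kernel="linear", probability=True)  # Platt scaling inside
    clf.fit(X_train, y_train)
    proba = clf.predict_proba(X_test)             # columns follow clf.classes_
    labels = proba.argmax(axis=1)
    accepted = proba.max(axis=1) >= kappa         # reject low-confidence cases
    return labels, accepted, proba
```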
3 Feature-Selection Method
We have a set \(\{ \mathcal {X}_i \}_{i=1}^N \) of three-channel images. We extract a feature vector \(\varvec{x}_i \in \mathbb {R}^D\) from each image \(\mathcal {X}_i\). In the same manner as in Sect. 2.4, we divide \(\{ \varvec{x}_i \}_{i=1}^N \) into two sets \(\{ \varvec{x}_i^{(1)} \}_{i=1}^{N_1}\) and \(\{ \varvec{x}_i^{(2)} \}_{i=1}^{N_2}\), where \(N_1\) and \(N_2\) represent the numbers of images in the first and second categories.
We find a linear map \(\varPsi (\varvec{x})=\varvec{P}^{*}\varvec{W}(\varvec{x}-\varvec{\mu })\) by solving
where \(d(\cdot ,\cdot )\) represents one of the seven Grassmann distances. In this equation, \(\max (\cdot )\) returns one or more matrices that give the maximum distance, and \(\min (\cdot )\) selects the matrix with the minimum rank among them. The solution is an orthogonal matrix \(\varvec{P}^{*}= [\varvec{\phi }_{1}^{(1)},\varvec{\phi }_{1}^{(2)},\varvec{\phi }_{2}^{(1)},\varvec{\phi }_{2}^{(2)}, \dots , \varvec{\phi }_{m}^{(1)},\varvec{\phi }_{m}^{(2)}]^{\top }\), which is comprised of 2m eigenvectors. Algorithm 1, which summarises the procedure for finding \(\varvec{P}^{*}\) from the training data, also returns the selected features for the training data.
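Algorithm 1 itself is not reproduced above; the following is our own sketch of the search, assuming candidate projections are formed by pairing the top and bottom eigenvectors of the whitened category-1 matrix for increasing d, and reusing canonical_angles from Sect. 2.2. The strict comparison keeps the lowest-rank maximiser, matching the min over the argmax set.

```python
import numpy as np

def select_projection(Phi, Y1, Y2, distance_fn, d_max):
    """Search for the projection P* maximising a Grassmann distance
    between the projected category subspaces span(Y1) and span(Y2)."""
    best_dist, best_P = -np.inf, None
    D = Phi.shape[1]
    for d in range(1, d_max + 1):
        idx = list(range(d)) + list(range(D - d, D))  # phi^(1) and phi^(2) pairs
        P = Phi[:, idx]                               # candidate (D, 2d) projection
        B1, _ = np.linalg.qr(P.T @ Y1)                # orthonormal bases of the
        B2, _ = np.linalg.qr(P.T @ Y2)                # projected category subspaces
        dist = distance_fn(canonical_angles(B1, B2))
        if dist > best_dist:                          # ">" keeps the minimum rank
            best_dist, best_P = dist, P
    return best_P

# Example with the geodesic distance as the criterion:
# P_star = select_projection(Phi, Y1, Y2,
#                            lambda th: np.sqrt(np.sum(th ** 2)), d_max=50)
```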

4 Numerical Experiments
We evaluated our feature-selection method by applying it to the classification of neoplastic and non-neoplastic images of colon polyps. We used images of the magnified surfaces of polyps captured by an endocytoscope (CF-H290ECI, Olympus, Tokyo) with IRB approval. The neoplastic and non-neoplastic labels of the images were annotated by expert physicians. The numbers of images of neoplastic and non-neoplastic polyps are summarised in Table 1. We extracted two kinds of features from these images: Haralick features [9] and convolutional neural network (CNN) features [15]. In this section, we first compare the classification accuracy before and after feature selection for the Haralick features, and next for the combined Haralick and CNN features. Finally, we analyse the relation between the output probability and the accuracy with the rejection option; this relation clarifies the validity of our feature selection as a manifold optimisation. In this final analysis, we also show the performance of two-category classification with a rejection option.
In these evaluations, we used an SVM [27] for the classifications, trained on the training data with the best hyperparameters, which we obtained by five-fold cross validation on the training data. For the practical computation of the SVM, we used LIBSVM [3]. We note that, in our preliminary experiments, a kernel SVM with a radial basis function gave lower classification accuracy than the linear SVM for endocytoscopic images. The classification accuracy of the original Haralick, CNN and combined features without feature selection is summarised in Table 2.
4.1 Haralick Feature
The Haralick features represent texture information measured by fourteen statistics of a gray-level co-occurrence matrix. In this work, we used thirteen of these statistics for eight directions and three scales. Before computing the statistics, we applied contrast normalisation to each local region to achieve robustness against illumination changes. We thus extracted a 312-dimensional Haralick feature vector per image, each element of which is normalised to the range [0, 1]. We applied the proposed feature-selection method to these normalised Haralick features.
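The extraction settings above are the paper's; as a simplified stand-in, gray-level co-occurrence statistics can be computed with scikit-image, whose graycoprops supports only a subset of the fourteen Haralick statistics, so the feature below is shorter than 312 dimensions.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_texture_vector(gray_img):
    """Simplified GLCM texture vector (gray_img: 2-D uint8 array):
    four statistics over eight directions and three scales."""
    distances = [1, 2, 4]                      # three scales (assumed values)
    angles = np.arange(8) * np.pi / 8          # eight directions
    glcm = graycomatrix(gray_img, distances, angles,
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    feat = np.concatenate([graycoprops(glcm, p).ravel() for p in props])
    return feat  # the paper normalises each element to [0, 1] over the dataset
```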
Figure 7(a) shows the eigenvalues of the whitened autocorrelation matrices for the two categories, and Figs. 7(b) and (c) show the Grassmann distances. Table 3 summarises the dimensions and classification accuracy of each feature selection with respect to the seven Grassmann distances.
4.2 Combination of Haralick and CNN Features
We combined the Haralick and CNN features into a single feature vector. Figure 6 illustrates the architecture of our CNN. Its settings were decided by preliminary experiments [12], in which we compared several parameter settings with AlexNet [15], VGG Net [23], and their modified versions; our shallow architecture gave the best classification accuracy among them. Before extracting the CNN features, we trained the CNN from scratch on the training data, applying batch normalisation and dropout to the convolutional and fully connected layers. We used the activations of fully connected layer F7 in Fig. 6 for the feature extraction, and each element of the extracted CNN feature was normalised to the range [0, 1]. For the CNN implementation, we used the Caffe platform [13]. We combined the normalised Haralick and normalised CNN features into an 888-dimensional column vector.
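Extracting the activations of an intermediate fully connected layer can be sketched in PyTorch as a stand-in for the authors' Caffe setup; the layer sizes here are hypothetical placeholders, except that a 576-dimensional F7 output is consistent with the 888-dimensional combined vector (888 = 312 + 576).

```python
import torch
import torch.nn as nn

class ShallowCNN(nn.Module):
    """Hypothetical shallow CNN standing in for the network of Fig. 6."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.BatchNorm2d(64), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.f7 = nn.Linear(64 * 4 * 4, 576)    # layer used for feature extraction
        self.classifier = nn.Linear(576, n_classes)

    def forward(self, x, return_feature=False):
        h = self.features(x).flatten(1)
        f = torch.relu(self.f7(h))              # F7 activations
        return f if return_feature else self.classifier(f)
```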
We applied the proposed method to these normalised combined features. Figure 7(d) shows the eigenvalues of the whitened autocorrelation matrices for the two categories, and Figs. 7(e) and (f) show the Grassmann distances. Table 4 summarises the dimensions of the selected features and the classification accuracy of the selection with respect to the seven Grassmann distances.
Eigenvalues and Grassmann distances in feature selection: the upper and lower rows represent results for the Haralick features and the combined features, respectively. (a) and (d) show the eigenvalues of \(\tilde{\varvec{A}}_1\) and \(\tilde{\varvec{A}}_2\) for non-neoplastic and neoplastic images; the horizontal and vertical axes represent the eigenvalue indices and the eigenvalues. (b), (c), (e) and (f) summarise the Grassmann distance after projection with respect to \(\varvec{P}\) given by a criterion \(\tau \); the horizontal and vertical axes represent \(\tau \) and the Grassmann distance.
4.3 Analysis of an Optimal Manifold
We evaluated the classification accuracy with respect to divided ranges of the output probabilities. The relation between the classification accuracy and the output probability characterises the optimal manifold given by our feature selection. Figure 8(a) illustrates the distributions of the output probabilities for the original Haralick features and the selected features. Table 5 summarises the accuracy for each range of output probabilities for the original Haralick features and the selected features. We also evaluated the classification accuracy with the rejection option for \(\kappa \in \{0.50, 0.55, \dots , 0.90\}\), where the rejection rate is the ratio of rejected images in the test data. The classification results with the rejection option are summarised in Fig. 8(b). We obtained classification accuracies of 82.1%, 82.9% and 84.7% for the original Haralick features, the selected Haralick features and the selected combined features, respectively, at the same rejection rate of about 20%; in these cases, we set \(\kappa \) to 0.70, 0.60 and 0.72, respectively. Note that the percentage of images inappropriate for practical diagnosis is close to 20–30%. Figure 9 shows examples of the images rejected in the classification.
5 Discussion
The curves shown in Fig. 7(a) imply that only a small number of eigenvectors is discriminative for classification, since almost all the eigenvalues are close to 0.5. Figures 7(b) and (c) imply that \(d_{\mu }\) gives the largest distance with the fewest selected eigenvectors. The accuracy improves after the selection based on \(d_{\mu }\), as shown in Table 3; the dimension of the selected features is 30% of that of the original Haralick features.
The curves shown in Fig. 7(d) imply that there are no particularly discriminative eigenvectors, since the eigenvalues are distributed almost uniformly from zero to one. In Table 4, \(d_g\), \(d_p\) and \(d_c\) give the largest distance with all the eigenvectors; in this case, the features are just whitened and used for classification. In both the Haralick and the combined-feature cases, our proposed method found discriminative features and improved the classification accuracy.
Analysis of the optimal manifold. (a) Distributions of output probabilities for the original and selected features: the figure illustrates the probability distributions of the original Haralick, the selected Haralick and the selected Haralick\(+\)CNN features. The vertical axis represents the percentage of output probabilities in each range, and the horizontal axis represents the divided ranges. In this figure, p is given as \(\mathrm {max}(p_{\mathrm {non}}, p_{\mathrm {neo}})\). (b) Receiver operating characteristic (ROC) curves for classification accuracy and rejection rate: the vertical and horizontal axes represent the classification accuracy and the rejection rate for the original Haralick, selected Haralick and Haralick\(+\)CNN features.
The results summarised in Fig. 8(b) indicate that the rejection option improved the classification accuracy by correctly removing low-confidence classifications in both the Haralick and combined-feature cases. Table 5 also supports the validity of the rejection option: we observe low classification accuracy for low output probabilities. Figure 9 shows that the rejected images are inappropriate due to bad observation conditions: over-staining, bad illumination, and a lack of discriminative texture.
The mean distance gives the best feature selection for the Haralick features. In this case, the distribution of the output probabilities is characteristic: few samples fall in \(p > 0.9\), a medium share falls in \(0.5 < p \le 0.6\), and the largest share falls in \(0.7 < p \le 0.8\). For this distribution, the output probability approximates the classification accuracy well, as shown in gray in Table 5. This relation between classification accuracy and output probability suggests the validity of the obtained manifold. In the case of the combined features, we did not observe the same characteristic even though the selected combined features achieved the highest classification accuracy.
Examples of rejected endocytoscopic images in the classification: (a) insufficient texture for classification; (b) bad (too dark) illumination; (c) over-staining of a polyp. These images were labelled as inappropriate for practical diagnosis; in practice, medical doctors also recognised them as inappropriate.
6 Conclusions
We proposed a feature-selection method that improves two-category classification accuracy by searching for an optimal manifold for classification. We experimentally evaluated the proposed method by comparing the classification accuracy before and after feature selection for about 19,000 endocytoscopic images. The experimental results showed the validity of our method through the improvement of classification accuracy. Furthermore, we experimentally demonstrated the validity of the obtained optimal manifold by exploring the relation between output probability and classification accuracy; the output probability is helpful information for practical diagnosis. Moreover, we achieved robust classification with our feature-selection method and a linear classifier with a rejection option. The classification accuracy was 84.7% at a rejection rate of 20%, which is 7.2% higher than the classification with the original Haralick features and an SVM.
References
Absil, P.A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton (2009)
Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)
Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27:1–27:27 (2011)
Edelman, A., Arias, T., Smith, S.: The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 20(2), 303–353 (1998)
Fukui, K., Maki, A.: Difference subspace and its generalization for subspace-based methods. IEEE Trans. Pattern Anal. Mach. Intell. 37(11), 2164–2177 (2015)
Fukunaga, K.: Introduction to Statistical Pattern Recognition, 2nd edn. Academic Press, London (1990)
Fukunaga, K., Koontz, W.L.G.: Application of the Karhunen–Loève expansion to feature selection and ordering. IEEE Trans. Comput. C–19(4), 311–318 (1970)
Hamm, J., Lee, D.: Grassmann discriminant analysis: a unifying view on subspace-based learning. In: Proceedings of International Conference on Machine Learning, pp. 376–383 (2008)
Haralick, R.M., Shanmugam, K., Dinstein, I.: Textural features for image classification. IEEE Trans. Syst., Man, Cybern. 3(6), 610–621 (1973). https://doi.org/10.1109/TSMC.1973.4309314
Harandi, M., Sanderson, C., Shen, C., Lovell, B.: Dictionary learning and sparse coding on Grassmann manifolds: an extrinsic solution. In: Proceedings of The IEEE International Conference on Computer Vision, pp. 3120–3127 (2013)
Iijima, T.: Theory of pattern recognition. In: Electronics and Communications in Japan, pp. 123–134 (1963)
Itoh, H., Mori, Y., Misawa, M., Oda, M., Kudo, S.E., Mori, K.: Cascade classification of endocytoscopic images of colorectal lesions for automated pathological diagnosis. In: Proceedings of SPIE Medical Imaging (2018, in Press)
Jia, Y., et al.: Caffe: convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093 (2014)
Jollife, I.T.: Principal Component Analysis. Springer, New York (2002). https://doi.org/10.1007/b98835
Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. In: Proceedings of International Conference on Neural Information Processing Systems, vol. 1, pp. 1097–1105 (2012)
Lecun, Y., et al.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
Maeda, K.: From the subspace methods to the mutual subspace method. In: Cipolla, R., Battiato, S., Farinella, G.M. (eds.) Computer Vision. SCI, vol. 285, pp. 135–156. Springer, Berlin (2010). https://doi.org/10.1007/978-3-642-12848-6_5
Mori, Y., et al.: Impact of an automated system for endocytoscopic diagnosis of small colorectal lesions: an international web-based study. Endoscopy 48, 1110–1118 (2016)
Mori, Y., et al.: Novel computer-aided diagnostic system for colorectal lesions by using endocytoscopy. Gastrointest. Endosc. 81, 621–629 (2015)
Oja, E.: Subspace Methods of Pattern Recognition. Research Studies Press, Boston (1983)
Platt, J.C.: Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv. Large Margin Classif. 10, 61–74 (1999). MIT Press
Shigenaka, R., Raytchev, B., Tamaki, T., Kaneda, K.: Face sequence recognition using Grassmann distances and Grassmann kernels. In: Proceedings of International Joint Conference on Neural Networks, pp. 1–7 (2012)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Proceedings of International Conference on Learning Representations (2015)
Slama, R., Wannous, H., Daoudi, M., Srivastava, A.: Accurate 3D action recognition using learning on the Grassmann manifold. Pattern Recognit. 48(2), 556–567 (2015)
Tamaki, T., et al.: Computer-aided colorectal tumor classification in NBI endoscopy using CNN features. In: Proceedings of Korea-Japan Joint Workshop on Frontiers of Computer Vision (2016)
Tamaki, T., et al.: Computer-aided colorectal tumor classification in NBI endoscopy using local features. Med. Image Anal. 17, 78–100 (2013)
Vapnik, V.N.: Statistical Learning Theory. Wiley, New York (1998)
Watanabe, S., Pakvasa, N.: Subspace method of pattern recognition. In: Proceedings of of the 1st International Joint Conference of Pattern Recognition (1973)
Acknowledgements
Parts of this research were supported by the Research on Development of New Medical Devices from the Japan Agency for Medical Research and Development (No. 18hk0102034h0103), and MEXT KAKENHI (No. 26108006, No. 17H00867).