3 DressClassification 22220231
Mohamed Elleuch 1,3, Anis Mezghani 2, Mariem Khemakhem 3 and Monji Kherallah 3
1 National School of Computer Science (ENSI), University of Manouba, Tunisia
2 Higher Institute of Industrial Management, University of Sfax, Tunisia
3 Faculty of Sciences, University of Sfax, Tunisia
elleuch.mohameds@gmail.com
Abstract. Powerful visual analytics tools have become a necessity today,
especially as pictures proliferate on the Internet and are often used in place
of text. In this paper, a new approach for clothing style classification is
presented. The types of clothing items considered in the proposed system include
shirts, pants, suits, dresses and so on. Clothing style classification is a
recent computer vision research topic with several attractive applications,
including e-commerce, criminal law and online advertising. In the proposed
approach, classification is carried out by a deep Convolutional Neural Network
(CNN), Inception-v3, a deep learning architecture that has shown very good
performance on various object recognition problems. For deep feature extraction,
we use transfer learning, a machine learning technique, to refine pretrained
models. Experiments are performed on two clothing datasets, in particular on the
large public dataset ImageNet. According to the obtained results, the developed
system provides better results than those reported in the state of the art.
1 Introduction
Today, pictures have become the main content on the Internet. The volume of
digital pictures collected from online users has grown rapidly. Analyzing the
collected data makes it possible to better predict consumer behavior and to
recommend products [1]. Recently, several studies have been conducted on clothing
recognition [2, 3], clothing item retrieval [4, 5, 6, 7] and clothing style
recognition [8, 9, 10]. In many cultures, clothing reflects information about
social status, age, gender and lifestyle. Clothing is also an interesting
descriptor for identifying people. Style and texture variations present a major
problem for clothing recognition. In addition, clothes are often subject to
deformation and occlusion, and their appearance varies widely across scenarios,
for example selfies compared with online photos. Clothing recognition algorithms
usually rely on handcrafted features such as HOG, SIFT and histogram analysis.
Yamaguchi et al. [2] proposed a clothing recognition system consisting of
three classifiers for each pixel. All the results are then combined for the final
prediction. The authors created a dataset containing 158,235 images and used only
685 images for validation. Di et al. [6] used LBP, SIFT and HOG features to
classify garments into 12 classes using SVM classifiers. The same features were
used by Chen et al. [11] to classify garments into 10 fashion style classes based
on a sparse-coding approach.
Deep learning has recently outperformed classical machine learning methods,
especially when recognizing images from large amounts of data. Deep learning
techniques learn features automatically from unlabelled input data and transform
them nonlinearly at each layer to extract more representative, discriminative
hidden features. Convolutional Neural Networks (CNNs) are currently widely used
in pattern recognition and signal processing research.
Based on the observation that various problem domains can benefit from the same
low- and mid-level features, transfer learning is attracting more and more
attention [12]. Several studies have shown that transfer learning can be
efficient when transferring a model from a large-scale dataset to other tasks
[13, 14]. Many researchers have used transfer learning with deep neural networks,
especially CNNs, in the field of clothing classification and retrieval [15, 16].
ImageNet is the most widely used dataset in this research area, as it is
considered one of the largest datasets for image object recognition, with 1.2
million 256 × 256 RGB images [17]. Chen et al. [14] implemented a dual-path deep
neural network to classify the input garment, in which each deep network models
one garment domain. Lin et al. [18] proposed a clothing retrieval system based on
a hierarchical deep search framework. Transfer learning was applied after
pre-training the network with mid-level visual features. The experiments were
conducted on 15 clothing classes in a dataset composed of 161,234 images from
Yahoo shopping websites.
VGGNet [19] has been widely used because of its architectural simplicity. On the
other hand, it requires a large amount of computation. GoogLeNet's Inception
architecture [20] has the advantage of being designed to perform well even under
strict constraints on memory and computational budget. As a result, GoogLeNet
uses fewer parameters than VGGNet and AlexNet. The low computational cost of
Inception has prompted researchers to use Inception networks for image
recognition on large-scale datasets [21].
Few works have addressed garment class recognition based on deep and transfer
learning. The purpose of this work is to recognize the clothing type shown in an
image from a given dataset. To achieve this, the raw images are transformed using
a pre-trained Inception deep neural network to build deep features, which are
then used to train the classifiers. The rest of the paper is organized as follows.
Section 2 details the proposed method. Section 3 presents the experimental
results, in which the proposed deep learning architecture is validated on the
popular ImageNet clothing dataset [22]. Finally, the results are discussed and
concluding remarks are given.
2 Proposed Method
A Convolutional Neural Network (ConvNet/CNN) is a powerful deep neural network
that has been widely used to solve hard machine learning problems. Many studies
apply CNNs directly to raw pixel images, without needing to define features a
priori. CNNs use fewer parameters than a fully connected network by computing
convolutions on small regions of the input space and by sharing parameters
between regions. This allows the models to be trained on larger input windows,
thus improving the detection of relevant patterns. Various CNN-based
architectures have been developed for 1000-class image classification, such as
AlexNet [17], VGGNet [19], ResNet [23] and GoogLeNet [20]. In this work, the
model used to build the clothing recognition system is Inception-v3 by Google.
The Inception-v3 architecture is depicted in Fig. 1. It consists of three
Inception modules (A, B and C) interleaved with grid size reduction steps.
Towards the end of training, when accuracies approach saturation, the auxiliary
classifiers act as regularizers, especially when they include Dropout or
BatchNorm.
Fig. 1. Inception-v3 model with TensorFlow (BatchNorm and ReLU are employed after Conv)
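The parameter saving that comes from local connectivity and weight sharing can be made concrete with a small count. The sketch below is only illustrative: the 299 × 299 RGB input size and the 32 output units/channels are assumptions chosen for the example, not settings reported in this paper.

```python
def dense_params(in_h, in_w, in_c, out_units):
    """Weights plus biases of a fully connected layer on a flattened input."""
    return in_h * in_w * in_c * out_units + out_units


def conv_params(k_h, k_w, in_c, out_c):
    """Weights plus biases of a convolutional layer.

    The count does not depend on the image size, because the same kernel
    is shared across all spatial positions.
    """
    return k_h * k_w * in_c * out_c + out_c


# Illustrative sizes (assumptions): 299 x 299 RGB input, 32 outputs.
H, W, C, OUT = 299, 299, 3, 32

fc = dense_params(H, W, C, OUT)
conv = conv_params(3, 3, C, OUT)

print(f"fully connected: {fc:,} parameters")
print(f"3x3 convolution: {conv:,} parameters")
```

Under these assumptions, the convolutional layer needs several orders of magnitude fewer parameters than the fully connected one, which is the effect described above.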
Inception-v3 factorizes convolutions by employing smaller 1-D filters, as
indicated in Fig. 3, to reduce the number of multiply-and-accumulate operations
(MACs) and weights. Factorizing convolutions in this way allows the network to go
deeper, to 42 layers. In conjunction with batch normalization [24], used with
Inception-v2, Inception-v3 reaches a top-5 error over 3% lower than v1 with a
2.5× increase in computation [21]. Inception-v4 uses residual connections [25].
[Fig. 3: image not recovered; surviving label: 1 × 7 filter]
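The saving obtained by replacing an n × n convolution with a 1 × n convolution followed by an n × 1 convolution can be checked with simple arithmetic. The sketch below counts weights and MACs for n = 7. The channel count (192) and the 17 × 17 output grid are assumptions chosen for illustration, and the output size is assumed unchanged by each convolution (same padding).

```python
def conv_cost(k_h, k_w, in_c, out_c, out_h, out_w):
    """Return (weights, MACs) for one convolution, ignoring biases."""
    weights = k_h * k_w * in_c * out_c
    macs = weights * out_h * out_w  # each weight is used once per output position
    return weights, macs


def factorized_cost(n, channels, out_h, out_w):
    """Cost of a 1 x n convolution followed by an n x 1 convolution."""
    w1, m1 = conv_cost(1, n, channels, channels, out_h, out_w)
    w2, m2 = conv_cost(n, 1, channels, channels, out_h, out_w)
    return w1 + w2, m1 + m2


# Illustrative sizes (assumptions): 7-tap filters, 192 channels, 17 x 17 grid.
N, CHANNELS, SIZE = 7, 192, 17

full_w, full_m = conv_cost(N, N, CHANNELS, CHANNELS, SIZE, SIZE)
fact_w, fact_m = factorized_cost(N, CHANNELS, SIZE, SIZE)

print(f"7x7 convolution:   {full_w:,} weights, {full_m:,} MACs")
print(f"1x7 then 7x1:      {fact_w:,} weights, {fact_m:,} MACs")
print(f"reduction factor:  {full_w / fact_w:.1f}x")
```

In general the factorized pair costs 2n weights per channel pair instead of n², so the reduction factor is n/2 when the channel counts are equal.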
In our apparel recognition system, we used transfer learning in order to reuse
the feature extraction part of the network and re-train the classification part
on a clothing dataset. Fig. 4 shows the training model.
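The principle of reusing a frozen feature extractor and re-training only the classification part can be sketched in a few lines of plain Python. This is a toy stand-in, not the system used in this paper: the "pretrained" extractor is a hypothetical fixed random projection with ReLU rather than Inception-v3, the data are synthetic, and all sizes and hyperparameters are assumptions.

```python
import copy
import math
import random

random.seed(0)  # fixed seed so the toy run is repeatable

IN_DIM, FEAT_DIM, NUM_CLASSES = 4, 8, 3  # toy sizes, not the paper's

# Stand-in for the pretrained feature extractor: a fixed random projection
# followed by ReLU. These weights are frozen, i.e. never updated below.
FROZEN_W = [[random.gauss(0, 1) for _ in range(IN_DIM)] for _ in range(FEAT_DIM)]


def extract_features(x):
    """Frozen part: map a raw input vector to a feature vector."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def head_scores(head, feats):
    """Linear classifier; the last weight of each row is the bias."""
    ext = feats + [1.0]
    return [sum(w * v for w, v in zip(row, ext)) for row in head]


def train_head(feature_rows, labels, epochs=100, lr=0.05):
    """Trainable part: softmax regression fitted by plain SGD."""
    head = [[0.0] * (FEAT_DIM + 1) for _ in range(NUM_CLASSES)]
    for _ in range(epochs):
        for feats, label in zip(feature_rows, labels):
            probs = softmax(head_scores(head, feats))
            ext = feats + [1.0]
            for c in range(NUM_CLASSES):
                grad = probs[c] - (1.0 if c == label else 0.0)
                for j, v in enumerate(ext):
                    head[c][j] -= lr * grad * v
    return head


def predict(head, x):
    scores = head_scores(head, extract_features(x))
    return scores.index(max(scores))


# Synthetic data: class c is a noisy one-hot vector with a 1 at position c.
inputs, labels = [], []
for c in range(NUM_CLASSES):
    for _ in range(20):
        inputs.append([(1.0 if i == c else 0.0) + random.gauss(0, 0.05)
                       for i in range(IN_DIM)])
        labels.append(c)

frozen_before = copy.deepcopy(FROZEN_W)
features = [extract_features(x) for x in inputs]  # computed once, then reused
head = train_head(features, labels)

correct = sum(predict(head, x) == y for x, y in zip(inputs, labels))
print(f"training accuracy: {correct / len(inputs):.2f}")
print("extractor unchanged:", FROZEN_W == frozen_before)
```

Because the extractor is frozen, the features can be computed once and cached, and only the small classification head is optimized. This is what keeps re-training cheap compared with training the whole network from scratch.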
In this section, we present the clothing datasets used for the experiments.
Thereafter, we detail and discuss the experimental settings. The obtained results
are then presented and compared with systems proposed in the state of the art.
3.1 Dataset
From Table 1, it is clear that the Inception-v3 method achieves better results
than VGG16 and GoogLeNet and requires less time in testing. Consequently, even
with 42 layers as described above, the reduction in the number of parameters
provided by the Inception design did not decrease the efficiency of the network.
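[Figure labels (clothing classes): Manteau (coat), Poncho, Chemisier (blouse), T-shirt, Gilet (vest), Costume (suit), Robe (dress), Chemise (shirt), Lingerie, Uniforme (uniform), Jacket, Pull (sweater), Pull de sport (sports sweater)]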
From the obtained results (see Fig. 6), we notice that the classes Manteau,
Gilet, Uniforme and Pull de sport record the highest rates, at almost 90% (see
Fig. 7). The other classes (Chemisier, T-shirt, Chemise, Lingerie, Pull) yield a
low rate (50%) because of the similarity between these five classes.
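Per-class rates of this kind, and the confusions between similar classes, can be computed from the true and predicted labels as sketched below. The label lists are hypothetical and serve only to illustrate the computation; they are not results from this paper.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of samples of each true class that were predicted correctly."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {cls: hits[cls] / totals[cls] for cls in totals}


def confusions(true_labels, predicted_labels):
    """Count (true, predicted) pairs for misclassified samples only."""
    return Counter((t, p) for t, p in zip(true_labels, predicted_labels) if t != p)


# Hypothetical labels for illustration only, not results from the paper.
y_true = ["Manteau"] * 4 + ["Chemise"] * 4 + ["Chemisier"] * 4
y_pred = ["Manteau", "Manteau", "Manteau", "Manteau",
          "Chemise", "Chemise", "Chemisier", "Chemisier",
          "Chemisier", "Chemise", "Chemisier", "Chemise"]

rates = per_class_accuracy(y_true, y_pred)
errors = confusions(y_true, y_pred)

for cls, rate in rates.items():
    print(f"{cls}: {rate:.0%}")
for (true_cls, pred_cls), count in errors.most_common():
    print(f"{true_cls} predicted as {pred_cls}: {count}")
```

Listing the most frequent (true, predicted) pairs makes it easy to see which visually similar classes are being mistaken for one another.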
A comparison with other results is presented in Table 2. Our clothing recognition
system achieved a promising result, with an accuracy rate of about 70%, compared
to other research works based on traditional methods [11, 9] and a deep learning
approach [15].
4 Conclusions
Deep learning has had a major impact on various areas of research. We built on
this success to propose a solution for clothing recognition. In this work, we
used deep Convolutional Neural Networks with the Inception architecture to
identify the clothing class. We used pre-trained weights as a starting point to
avoid a very long training time. We then compared the proposed approach with
several approaches based on hand-crafted features and shallow machine learning
structures. The proposed system provided promising recognition results on the
clothing dataset, which demonstrates the effectiveness of the proposed approach
for clothing recognition.
References
1. Yamaguchi, K., Berg, TL., Ortiz, LE.: Chic or social: visual popularity analysis in online
fashion networks. In: ACM conference on multimedia, pp 773–776 (2014).
2. Yamaguchi, K., Kiapour, MH., Ortiz, LE., Berg, TL.: Parsing clothing in fashion photo-
graphs. In: IEEE conference on computer vision and pattern recognition, pp 3570–3577
(2012).
3. Yamaguchi, K., Kiapour, MH., Berg, TL.: Paper doll parsing: retrieving similar styles to
parse clothing items. In: International conference on computer vision, pp 3519–3526
(2013).
4. Kalantidis, Y., Kennedy, L., Li, LJ.: Getting the look: clothing recognition and
segmentation for automatic product suggestions in everyday photos. In: ACM international
conference in multimedia retrieval, pp 105–112 (2013).
5. Liu, S., Feng, J., Domokos, C., Xu, H., Huang, J., Hu, Z., Yan, S.: Fashion parsing with
weak color-category labels. IEEE Trans Multimedia 16(1):253–265 (2014).
6. Di, W., Wah, C., Bhardwaj, A., Piramuthu, R., Sundaresan, N.: Style finder: Fine-grained
clothing style detection and retrieval. In CVPR Workshops, pages 8–13 (2013).
7. Liang, X., Lin, L., Yang, W., Luo, P., Huang, J., Yan, S.: Clothes co-parsing via joint im-
age segmentation and labeling with application to clothing retrieval. In IEEE Transactions
on Multimedia (2016).
8. Chen, JC., Liu, CF.: Visual-based deep learning for clothing from large database. In: ASE
BigData & SocialInformatics (2015).
9. Bossard, L., Dantone, M., Leistner, C., Wengert, C., Quack, T., Van Gool, L.: Apparel
classification with style. In ACCV, pages 321–335 (2012).
10. Veit, A., Kovacs, B., Bell, S., McAuley, J., Bala, K., Belongie, S.: Learning visual clothing
style with heterogeneous dyadic co-occurrences. In ICCV, pages 4642–4650 (2015).
11. Chen, JC., Xue, BF., Lin Kawuu, W.: Dictionary learning for discovering visual elements
of fashion styles. In: CEC workshop (2015).
12. Oquab, M., Bottou, L., Laptev, I., Sivic, J.: Learning and transferring mid-level image rep-
resentations using convolutional neural networks. In: IEEE conference on computer vision
and pattern recognition, pp 1717–1724 (2014).
13. Huang, J., Feris, RS., Chen, Q., Yan, S.: Cross-domain image retrieval with a dual attrib-
ute-aware ranking network. arXiv preprint arXiv:1505.07922 (2015).
14. Chen, Q., Huang, J., Feris, R., Brown, LM., Dong, J., Yan, S.: Deep domain adaptation for
describing people based on fine-grained clothing attributes. In: IEEE conference on com-
puter vision and pattern recognition, pp 5315–5324 (2015)
15. Liu, Z., Luo, P., Qiu, S., Wang, X., Tang, X.: DeepFashion: Powering Robust Clothes
Recognition and Retrieval with Rich Annotations, 2016 IEEE Conference on Computer
Vision and Pattern Recognition (CVPR) (2016).
16. Chen, JC., Liu, CF.: Deep net architectures for visual-based clothing image recognition on
large database, Soft Computing, vol 21, pp. 2923-2939 (2017).
17. Krizhevsky, A., Sutskever, I., and Hinton, G.: Imagenet classification with deep convolu-
tional neural networks. NIPS, pp. 1106–1114 (2012).
18. Lin, K., Yang, HF., Liu, KH., Hsiao, JH., Chen, CS.: Rapid clothing retrieval via deep
learning of binary codes and hierarchical search. In: ACM international conference in
multimedia retrieval, pp 499–502 (2015).
19. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image
recognition. arXiv preprint arXiv:1409.1556 (2014).
20. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke,
V., Rabinovich, A.: Going deeper with convolutions. In Proceedings of the IEEE Confer-
ence on Computer Vision and Pattern Recognition, pages 1–9 (2015).
21. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the Inception Ar-
chitecture for Computer Vision. 2016 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR) (2016).
22. Deng, J., Dong, W., Socher, R., -j. Li, L., Li, K., Fei-fei, L.: ImageNet : A Large-Scale Hi-
erarchical Image Database. CVPR (2009).
23. He, K., Zhang, X., Ren, S., & Sun, J.: Deep residual learning for image recognition.
In Proceedings of the IEEE conference on computer vision and pattern recognition, pages
770-778 (2016).
24. Ioffe, S., & Szegedy, C.: Batch normalization: Accelerating deep network training by re-
ducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015).
25. Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A. A.: Inception-v4, inception-resnet and
the impact of residual connections on learning. In Thirty-First AAAI Conference on Artifi-
cial Intelligence (2017, February).
26. Sze, V., Chen, Y. H., Yang, T. J., & Emer, J. S.: Efficient processing of deep neural net-
works: A tutorial and survey. Proceedings of the IEEE, 105(12), 2295-2329 (2017).