
Clothing Classification using Deep CNN Architecture based on Transfer Learning

Mohamed Elleuch 1,3, Anis Mezghani 2, Mariem Khemakhem 3 and Monji Kherallah 3
1 National School of Computer Science (ENSI), University of Manouba, Tunisia
2 Higher Institute of Industrial Management, University of Sfax, Tunisia
3 Faculty of Sciences, University of Sfax, Tunisia
elleuch.mohameds@gmail.com

Abstract. Powerful visual analytics tools have become a necessity today, especially with the proliferation of pictures on the Internet and their frequent use in place of text. In this paper, a new approach for clothing style classification is presented. The types of clothing items considered in the proposed system include shirts, pants, suits, dresses and so on. Clothing style classification is a recent computer vision research subject with several attractive applications, including e-commerce, criminal law and online advertising. In our proposed approach, classification is carried out by Deep Convolutional Neural Networks (CNNs). The deep learning architecture used, Inception-v3, has shown very good performance on different object recognition problems. For deep feature extraction, we use a machine learning technique called transfer learning to refine pre-trained models. Experiments are performed on two clothing datasets, in particular on the large public dataset ImageNet. According to the obtained results, the developed system provides better results than those reported in the state of the art.

Keywords: Convolutional Neural Network, Deep Learning, Inception-v3, Clothing image recognition.

1 Introduction

Today, pictures have become the main content on the Internet, and the volume of digital pictures collected from online users has grown rapidly. Analyzing the collected data makes it possible to better predict consumer behavior and to recommend products [1]. Recently, several studies have addressed clothing recognition [2, 3], clothing item retrieval [4, 5, 6, 7] and clothing style recognition [8, 9, 10]. In many cultures, clothing reflects information about social status, age, gender and lifestyle. Clothing is also an interesting descriptor for identifying humans. Style and texture variations are a major problem for clothing recognition. In addition, clothes are often subject to deformation and occlusion, and their appearance varies widely across scenarios, for example selfies compared to online photos. Clothing recognition algorithms usually rely on handcrafted features, such as HOG, SIFT and histogram analysis. Yamaguchi et al. [2] proposed a clothing recognition system consisting of three classifiers for each pixel, whose results are then combined for the final prediction. The authors created a dataset containing 158,235 images and used only 685 images for validation. Di et al. [6] used LBP, SIFT and HOG features to classify garments into 12 classes with SVM classifiers. The same features were used by Chen et al. [11] to classify garments into 10 fashion style classes based on a sparse-coding approach.
Deep learning has recently outperformed classical machine learning methods, especially when recognizing images from large amounts of data. Deep learning learns features automatically from the input data and transforms them non-linearly in each layer to extract increasingly representative and discriminative hidden features. Convolutional Neural Networks (CNNs) are currently widely used in pattern recognition and signal processing research.
Because different problem domains can benefit from the same low- and mid-level features, transfer learning is attracting increasing attention [12]. Several studies have shown that transfer learning is effective when a model trained on a large-scale dataset is transferred to other tasks [13, 14]. Many researchers have used transfer learning with deep neural networks, especially CNNs, for clothing classification and retrieval [15, 16]. ImageNet is the most widely used dataset in this research area, as it is one of the largest datasets for image object recognition, with 1.2 million 256 × 256 RGB images [17]. Chen et al. [14] implemented a dual-path deep neural network to classify the input garment, in which each path models one garment domain. Lin et al. [18] proposed a clothing retrieval system based on a hierarchical deep search framework, in which transfer learning was applied after pre-training the network with mid-level visual features. The experiments were conducted on 15 clothing classes of a dataset of 161,234 images from Yahoo shopping websites.
VGGNet [19] has been widely used thanks to its architectural simplicity. However, it requires a large amount of computation. GoogLeNet's Inception architecture [20] has the advantage of being designed to perform well even under strict constraints on memory and computational budget. Consequently, GoogLeNet uses fewer parameters than VGGNet and AlexNet. The low computational cost of Inception has prompted researchers to use Inception networks for image recognition on large-scale datasets [21].
Few works have addressed garment class recognition based on deep and transfer learning. The purpose of this work is to recognize the clothing type in an image from a given dataset. To achieve this, the raw images are transformed by a pre-trained Inception deep neural network to build deep features, which are then used to train the classifiers. The rest of the paper is organized as follows. Section 2 details the proposed method. Section 3 presents the experimental results; in this section, the proposed deep learning architecture is validated on the popular ImageNet clothing dataset [22]. Finally, the results are discussed and concluding remarks are given.

2 Proposed Method
A Convolutional Neural Network (ConvNet/CNN) is a powerful deep neural network that has been widely used to solve hard machine learning problems. Many studies apply CNNs directly to raw pixel images, without having to define features a priori. CNNs use fewer parameters than fully connected networks by computing convolutions over small regions of the input and by sharing parameters between regions. This allows models to be trained on larger input windows, thus improving the detection of relevant patterns. Various CNN-based architectures have been designed for 1000-class image classification, such as AlexNet [17], VGGNet [19], ResNet [23] and GoogLeNet [20]. In this work, the model used to build the clothing recognition system is Google's Inception-v3. The Inception-v3 architecture, depicted in Fig. 1, consists of three types of Inception modules (A, B and C) interleaved with grid-size reduction steps. Toward the end of training, when accuracy approaches saturation, the auxiliary classifiers act as regularizers, especially when they include Dropout or BatchNorm.
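To make the parameter-sharing argument concrete, the sketch below counts the weights of a 3×3 convolution and of a fully connected layer that map the same input to an output of the same size. The layer sizes (a 299×299×3 input, 32 output channels, stride 1 with "same" padding) are illustrative assumptions, not the configuration of any particular Inception-v3 layer.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, out_channels, bias=True):
    """Parameters of a 2-D convolution: one kernel per (input, output) channel
    pair, shared across every spatial position, plus one bias per output channel."""
    weights = kernel_h * kernel_w * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def dense_params(in_units, out_units, bias=True):
    """Parameters of a fully connected layer: every input unit is connected
    to every output unit, plus one bias per output unit."""
    return in_units * out_units + (out_units if bias else 0)


# Illustrative sizes (assumed): 299x299x3 input -> 299x299x32 output.
H, W, C_IN, C_OUT = 299, 299, 3, 32

conv = conv2d_params(3, 3, C_IN, C_OUT)
dense = dense_params(H * W * C_IN, H * W * C_OUT)

print(f"3x3 convolution : {conv:,} parameters")
print(f"fully connected : {dense:,} parameters")
print(f"ratio           : {dense / conv:.2e}")
```

The convolution needs 896 parameters whatever the image size, whereas the dense layer grows with the product of the input and output sizes and reaches hundreds of billions of parameters here.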

Fig.1. Inception-v3 model in TensorFlow (BatchNorm and ReLU are applied after each Conv)

2.1 Inception-v3 architecture


The Inception deep convolutional approach was introduced as GoogLeNet, a 22-layer network. It is composed of parallel connections (see Fig. 2), whereas earlier architectures had only a single serial connection. Since its introduction in 2014, Inception has gone through several versions: v1, v2/v3 and v4.

Fig.2. Inception module from GoogLeNet [20]

Inception-v3 factorizes convolutions into smaller 1-D filters, as shown in Fig. 3, to reduce the number of multiply-and-accumulate operations (MACs) and weights; the savings from these factorized convolutions allow the network to go deeper, to 42 layers. Combined with the batch normalization [24] introduced in Inception-v2, v3 achieves a top-5 error more than 3% lower than v1, with a 2.5× increase in computation [21]. Inception-v4 uses residual connections [25].

Fig. 3. Decomposing larger filters into smaller ones: a 7×7 support is built from a 1×7 filter and a 7×1 filter carried out in sequence
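The factorization in Fig. 3 can be checked numerically. In the single-channel, purely linear case, applying a 1×7 filter and then a 7×1 filter gives the same output as a single 7×7 filter equal to their outer product, while storing 14 weights instead of 49. The sketch assumes arbitrary integer filter values and a 12×12 toy image. In Inception-v3 itself, each factorized layer has its own BatchNorm and ReLU and many channels, so the equivalence below only illustrates the 7×7 receptive field and the weight savings.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation, the operation CNN libraries call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


n = 7
row = [[1, 2, 3, 4, 3, 2, 1]]                     # 1 x 7 filter (arbitrary values)
col = [[v] for v in [1, 0, -1, 2, -1, 0, 1]]      # 7 x 1 filter (arbitrary values)
full = [[c[0] * r for r in row[0]] for c in col]  # 7 x 7 outer product of the two

image = [[(3 * i + 5 * j) % 11 for j in range(12)] for i in range(12)]

two_step = conv2d_valid(conv2d_valid(image, row), col)  # 1x7, then 7x1
one_step = conv2d_valid(image, full)                     # single 7x7

print(two_step == one_step)            # True: same 6x6 output, same 7x7 support
print("weights:", n * n, "vs", 2 * n)  # 49 vs 14 per channel pair
```

The same 14/49 ratio applies to the multiply-and-accumulate operations per output position, which is the saving the text attributes to factorized convolutions.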

2.2 Transfer Learning


Transfer learning can be seen as the ability of a system to recognize and apply knowledge and skills learned in previous tasks to new tasks or domains that share similarities. Training a convolutional neural network requires a huge volume of data because it has to learn millions of weights. Therefore, rather than training a convolutional neural network from scratch, it is common to use a pre-trained model to automatically extract features from a new dataset. This method, called transfer learning, is a practical solution for applying deep learning algorithms without requiring a large dataset or very long training.

In our apparel recognition system, we use transfer learning to reuse the feature extraction part of the network and retrain the classification part on our clothing dataset. Fig. 4 shows the training model.

Fig.4. Training model
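The training scheme of Fig. 4, a frozen feature extractor followed by a retrained classifier, can be sketched in a few lines. In this hypothetical, dependency-free illustration, `frozen_features` stands in for the pre-trained Inception-v3 feature extractor (here just a fixed, hand-set projection followed by tanh), and only the new softmax layer is trained, by stochastic gradient descent on cross-entropy. The toy 2-D data, the stand-in extractor weights and the hyperparameters are all assumptions made for the example.

```python
import math

# Hypothetical stand-in for the frozen, pre-trained feature extractor.
# Its weights are fixed and are never updated during training.
FROZEN_W = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0], [1.0, 1.0]]


def frozen_features(x):
    """Fixed projection followed by tanh (plays the role of the deep features)."""
    return [math.tanh(sum(w_k * x_k for w_k, x_k in zip(w, x))) for w in FROZEN_W]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(W, b, f):
    return [sum(w_k * f_k for w_k, f_k in zip(W[c], f)) + b[c] for c in range(len(W))]


def train_head(samples, num_classes, lr=0.5, epochs=100):
    """Train only the new softmax classification layer on the frozen features."""
    dim = len(FROZEN_W)
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    feats = [(frozen_features(x), y) for x, y in samples]  # extracted once, reused
    for _ in range(epochs):
        for f, y in feats:
            p = softmax(logits(W, b, f))
            for c in range(num_classes):
                g = p[c] - (1.0 if c == y else 0.0)  # d(cross-entropy)/d(logit_c)
                b[c] -= lr * g
                for k in range(dim):
                    W[c][k] -= lr * g * f[k]
    return W, b


def predict(W, b, x):
    z = logits(W, b, frozen_features(x))
    return max(range(len(z)), key=z.__getitem__)


# Toy data: three well-separated clusters, one per class.
centers = [(2.0, 0.0), (-2.0, 0.0), (0.0, 2.0)]
offsets = [(-0.4, -0.4), (0.4, -0.4), (-0.4, 0.4), (0.4, 0.4), (0.0, 0.0)]
samples = [((cx + dx, cy + dy), label)
           for label, (cx, cy) in enumerate(centers) for dx, dy in offsets]

W, b = train_head(samples, num_classes=len(centers))
print([predict(W, b, c) for c in centers])
```

Because the features are computed once and never change, training the head is cheap; this is what makes transfer learning practical on a small dataset.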

3 Experiments and Results

In this section, we present the clothing datasets used for the experiments. We then detail and discuss the experimental settings. Finally, the obtained results are presented and compared with state-of-the-art systems.

3.1 Dataset

To evaluate the proposed system, we created a clothing dataset composed of 80,000 images of 13 style classes: Coat, Poncho, Blouse, Dress, Shirt, Vest, Lingerie, T-shirt, Uniform, Suit, Sweater, Jacket and Sports sweater (in French: 'Manteau', 'Poncho', 'Chemisier', 'Robe', 'Chemise', 'Gilet', 'Lingerie', 'T-shirt', 'Uniforme', 'Costume', 'Pull', 'Jacket', 'Pull de sport'). All images are resized to 299×299 pixels. We train and validate the model on a subset of 4100 images and test it on another subset of 2050 images. For validation, 20% of the training set is held out for parameter tuning. In the adopted split, no clothing item appears in more than one subset. Some samples of clothing images from the created dataset are shown in Fig. 5.
Although the created dataset contains a large number of clothing images, it is too small to train an accurate deep neural network with over a million parameters from scratch. Using deep features from an Inception-v3 model pre-trained on the 1.2 million images of ImageNet is a practical solution for applying deep learning without requiring a very large dataset or very long training.

Fig.5. Samples of clothing images
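The split described in Section 3.1 (20% of the 4100 training images, i.e. 820 images, held out for validation, with no clothing item shared between subsets) can be obtained by splitting on item identifiers rather than on individual images. The sketch below shows one way to do this. The `(item_id, path)` records and the two-images-per-item toy data are hypothetical, since the paper does not describe how items are identified.

```python
import random
from collections import defaultdict


def split_by_item(images, val_fraction=0.2, seed=0):
    """Split (item_id, path) records so that no clothing item appears in both
    subsets. Whole items are moved at once, so the validation share is only
    approximately val_fraction when items have different numbers of images."""
    by_item = defaultdict(list)
    for item_id, path in images:
        by_item[item_id].append((item_id, path))

    items = sorted(by_item)
    random.Random(seed).shuffle(items)

    target = round(val_fraction * len(images))
    train, val = [], []
    for item in items:
        (val if len(val) < target else train).extend(by_item[item])
    return train, val


# Hypothetical records: 4100 training images, two views per clothing item.
images = [(i // 2, f"img_{i:05d}.jpg") for i in range(4100)]
train, val = split_by_item(images)
print(len(train), len(val))  # 3280 820
```

Splitting at the item level prevents two photos of the same garment from landing in different subsets, which would otherwise inflate validation and test accuracy.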

3.2 Experimental settings


The architecture is implemented with the Python deep learning library TensorFlow, an open-source machine learning library created and released by Google. To validate our proposed CNN/Inception-v3 system, we use the local database of clothing images, which is divided into two parts: a training set and a testing set. The input image shape used in the experiments is 299 × 299 × 3.
Inception-v3 was trained on the data of the 2012 ILSVRC (ImageNet Large Scale Visual Recognition Challenge) competition. We reuse this trained network but retrain it to distinguish clothing types using our own image examples. The Inception-v3 configuration is characterized by: the RMSProp optimizer, which speeds up learning; a factorized 7×7 convolution; BatchNorm in the auxiliary classifiers; and label smoothing to prevent over-fitting.
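Label smoothing, as proposed with Inception-v3 [21], replaces the one-hot target with a mixture of the one-hot vector and a uniform distribution over the K classes: q'(k) = (1 - ε)·δ(k, y) + ε/K. The sketch below applies it to the 13 clothing classes. ε = 0.1 is the value reported in [21] for ImageNet; the value used in this work is not stated, so it is an assumption here.

```python
import math


def smooth_labels(true_class, num_classes, epsilon=0.1):
    """Mix the one-hot target with a uniform distribution over the classes."""
    return [(1.0 - epsilon) * (1.0 if k == true_class else 0.0) + epsilon / num_classes
            for k in range(num_classes)]


def cross_entropy(target, probs):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


NUM_CLASSES = 13  # clothing style classes in the created dataset
target = smooth_labels(true_class=3, num_classes=NUM_CLASSES)

print(round(target[3], 4), round(target[0], 4))  # 0.9077 0.0077
print(round(sum(target), 6))                      # 1.0

# A nearly one-hot prediction is now penalized relative to predicting the
# smoothed target itself, so the model is not pushed toward extreme logits.
confident = [1e-6] * NUM_CLASSES
confident[3] = 1.0 - 1e-6 * (NUM_CLASSES - 1)
print(cross_entropy(target, confident) > cross_entropy(target, target))  # True
```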
The training parameters of our model are: a batch size of 32, the RMSProp optimizer with a learning rate (LR) of 0.001 and a decay rate of 0.3, and 20 training epochs, which were sufficient for convergence.
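For reference, one common form of the RMSProp update keeps a running average of squared gradients and divides each step by its square root: v ← ρ·v + (1 - ρ)·g², θ ← θ - LR·g / (√v + ε). The paper does not say whether its "decay rate of 0.3" is this ρ or a learning-rate decay factor. The sketch below assumes the former (the discounting factor that TensorFlow 1.x's `RMSPropOptimizer` calls `decay`), uses LR = 0.001 from the text, assumes ε = 1e-8, and minimizes a toy one-parameter quadratic instead of the real network loss.

```python
import math


def rmsprop_minimize(grad, theta, lr=0.001, rho=0.3, eps=1e-8, steps=6000):
    """Plain RMSProp (no momentum) on a single scalar parameter.
    rho is assumed to be the paper's 'decay rate'."""
    v = 0.0  # running average of squared gradients
    for _ in range(steps):
        g = grad(theta)
        v = rho * v + (1.0 - rho) * g * g
        theta -= lr * g / (math.sqrt(v) + eps)
    return theta


# Toy objective (assumed): f(theta) = (theta - 3)^2, gradient 2 * (theta - 3).
theta = rmsprop_minimize(lambda t: 2.0 * (t - 3.0), theta=0.0)
print(f"{theta:.3f}")
```

Because each step is normalized by the recent gradient magnitude, every update moves θ by roughly LR regardless of the gradient scale, which is why about 3000 steps are needed here to travel a distance of 3 at LR = 0.001.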

3.3 Results and Discussion


Automatic extraction of relevant features through deep learning can save merchants and customers a lot of time. Our proposed system is reliable and fast, with a satisfactory recognition rate of 70%. This is an improvement of 3.5 percentage points over the GoogLeNet approach and of 5.7 percentage points over the VGG16 network (Table 1).

Table 1. Performance of our proposed method

Method          Recognition rate   Test time (ms)
Inception-v3    70 %               1.4
VGG16           64.3 %             3.0
GoogLeNet       66.5 %             2.7

Table 1 shows that Inception-v3 achieves better results than VGG16 and GoogLeNet while requiring less test time. Hence, despite the depth of 42 layers described above, the parameter reduction provided by the Inception design did not decrease the effectiveness of the network.

Fig.6. Accuracy rate (%) for each class of clothing (classes: Manteau, Poncho, Chemisier, T-shirt, Gilet, Costume, Robe, Chemise, Lingerie, Uniforme, Jacket, Pull, Pull de sport)

Fig.7. Accuracy rate for styles “manteaux” and “gilet”



The results in Fig. 6 show that the classes Manteau, Gilet, Uniforme and Pull de sport record the highest rates, at almost 90% (see Fig. 7). The other classes (Chemisier, T-shirt, Chemise, Lingerie, Pull) have a low rate (50%) because of the visual similarity among these five classes.
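Per-class rates such as those in Fig. 6 are usually read off a confusion matrix: the rate for a class is its diagonal entry divided by the number of test images of that class. The sketch below uses a hypothetical three-class confusion matrix (not the paper's data) in which two visually similar classes, B and C, are often confused with each other, as the text reports for Chemisier, T-shirt, Chemise, Lingerie and Pull.

```python
def per_class_accuracy(confusion, class_names):
    """confusion[i][j] = number of test images of true class i predicted as class j."""
    return {name: row[i] / sum(row)
            for i, (name, row) in enumerate(zip(class_names, confusion))}


def overall_accuracy(confusion):
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical counts: rows are true classes, columns are predictions.
names = ["A", "B", "C"]
confusion = [
    [9, 1, 0],
    [2, 5, 3],  # B is often predicted as C ...
    [0, 4, 6],  # ... and C as B
]
print(per_class_accuracy(confusion, names))   # {'A': 0.9, 'B': 0.5, 'C': 0.6}
print(round(overall_accuracy(confusion), 4))  # 0.6667
```

With the same number of test images per class, the overall rate equals the mean of the per-class rates; with unbalanced classes the two can differ.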
Table 2 compares our results with those of other works. Our clothing recognition system achieves a promising accuracy rate of about 70%, compared with research works based on traditional methods [11, 9] and a deep learning approach [15].

Table 2. Performance comparison

Work                        Method                               Recognition rate
Our proposed                Inception-v3                         70 %
Liu et al. (2016) [15]      FashionNet (deep model)              66.43 % (top-3), 73.16 % (top-5)
Chen et al. (2015) [11]     Low-level features + sparse coding   68 %
Bossard et al. (2012) [9]   Transfer Forest                      41.36 %
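Table 2 mixes rates measured under different criteria (top-3 and top-5 for FashionNet). Under top-k accuracy, a prediction counts as correct when the true class is among the k highest-scoring classes, so the rate can only grow as k increases. A minimal sketch with hypothetical scores:

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)


# Hypothetical class scores for three images over three classes.
scores = [
    [0.1, 0.6, 0.3],  # true class 2 is ranked second
    [0.7, 0.2, 0.1],  # true class 0 is ranked first
    [0.2, 0.3, 0.5],  # true class 0 is ranked third
]
labels = [2, 0, 0]
for k in (1, 2, 3):
    print(k, round(top_k_accuracy(scores, labels, k), 3))
```

This prints 0.333, 0.667 and 1.0 for k = 1, 2 and 3, which is why rates measured at different k are not directly comparable.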

4 Conclusions

Deep learning has had a major impact on many research areas. We built on this success to propose a solution for clothing recognition. In this work, we used deep Convolutional Neural Networks with the Inception architecture to identify clothing classes. We used pre-trained weights as a starting point to avoid very long training. We then compared the proposed approach with several hand-crafted and machine-learning-based shallow approaches. The proposed system provided promising recognition results on the clothing dataset, which demonstrates the effectiveness of the proposed approach for clothing recognition.

References
1. Yamaguchi, K., Berg, T.L., Ortiz, L.E.: Chic or social: visual popularity analysis in online fashion networks. In: ACM Conference on Multimedia, pp. 773–776 (2014).
2. Yamaguchi, K., Kiapour, M.H., Ortiz, L.E., Berg, T.L.: Parsing clothing in fashion photographs. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3570–3577 (2012).
3. Yamaguchi, K., Kiapour, M.H., Berg, T.L.: Paper doll parsing: retrieving similar styles to parse clothing items. In: International Conference on Computer Vision, pp. 3519–3526 (2013).
4. Kalantidis, Y., Kennedy, L., Li, L.J.: Getting the look: clothing recognition and segmentation for automatic product suggestions in everyday photos. In: ACM International Conference on Multimedia Retrieval, pp. 105–112 (2013).

5. Liu, S., Feng, J., Domokos, C., Xu, H., Huang, J., Hu, Z., Yan, S.: Fashion parsing with weak color-category labels. IEEE Transactions on Multimedia 16(1), 253–265 (2014).
6. Di, W., Wah, C., Bhardwaj, A., Piramuthu, R., Sundaresan, N.: Style finder: fine-grained clothing style detection and retrieval. In: CVPR Workshops, pp. 8–13 (2013).
7. Liang, X., Lin, L., Yang, W., Luo, P., Huang, J., Yan, S.: Clothes co-parsing via joint image segmentation and labeling with application to clothing retrieval. IEEE Transactions on Multimedia (2016).
8. Chen, J.C., Liu, C.F.: Visual-based deep learning for clothing from large database. In: ASE BigData & SocialInformatics (2015).
9. Bossard, L., Dantone, M., Leistner, C., Wengert, C., Quack, T., Van Gool, L.: Apparel classification with style. In: ACCV, pp. 321–335 (2012).
10. Veit, A., Kovacs, B., Bell, S., McAuley, J., Bala, K., Belongie, S.: Learning visual clothing style with heterogeneous dyadic co-occurrences. In: ICCV, pp. 4642–4650 (2015).
11. Chen, J.C., Xue, B.F., Lin Kawuu, W.: Dictionary learning for discovering visual elements of fashion styles. In: CEC Workshop (2015).
12. Oquab, M., Bottou, L., Laptev, I., Sivic, J.: Learning and transferring mid-level image representations using convolutional neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1717–1724 (2014).
13. Huang, J., Feris, R.S., Chen, Q., Yan, S.: Cross-domain image retrieval with a dual attribute-aware ranking network. arXiv preprint arXiv:1505.07922 (2015).
14. Chen, Q., Huang, J., Feris, R., Brown, L.M., Dong, J., Yan, S.: Deep domain adaptation for describing people based on fine-grained clothing attributes. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 5315–5324 (2015).
15. Liu, Z., Luo, P., Qiu, S., Wang, X., Tang, X.: DeepFashion: powering robust clothes recognition and retrieval with rich annotations. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016).
16. Chen, J.C., Liu, C.F.: Deep net architectures for visual-based clothing image recognition on large database. Soft Computing 21, 2923–2939 (2017).
17. Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. In: NIPS, pp. 1106–1114 (2012).
18. Lin, K., Yang, H.F., Liu, K.H., Hsiao, J.H., Chen, C.S.: Rapid clothing retrieval via deep learning of binary codes and hierarchical search. In: ACM International Conference on Multimedia Retrieval, pp. 499–502 (2015).
19. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
20. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015).
21. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the Inception architecture for computer vision. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016).
22. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR (2009).
23. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016).
24. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015).

25. Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: Thirty-First AAAI Conference on Artificial Intelligence (2017).
26. Sze, V., Chen, Y.H., Yang, T.J., Emer, J.S.: Efficient processing of deep neural networks: a tutorial and survey. Proceedings of the IEEE 105(12), 2295–2329 (2017).
