Article
Research on a Surface Defect Detection Algorithm
Based on MobileNet-SSD
Yiting Li, Haisong Huang *, Qingsheng Xie, Liguo Yao and Qipeng Chen
Key Laboratory of Advanced Manufacturing Technology, Ministry of Education, Guizhou University,
Guiyang 550025, China; tgl226537@163.com (Y.L.); qsxie@gzu.edu.cn (Q.X.); yaoliguo1990@163.com (L.Y.);
cqplll@gmail.com (Q.C.)
* Correspondence: huang_h_s@126.com; Tel.: +86-139-851-46670
Received: 26 August 2018; Accepted: 13 September 2018; Published: 17 September 2018
Abstract: This paper aims to achieve real-time and accurate detection of surface defects by using
a deep learning method. For this purpose, the Single Shot MultiBox Detector (SSD) network was
adopted as the meta-structure and combined with the base convolutional neural network (CNN)
MobileNet to form MobileNet-SSD. Then, a detection method for surface defects was proposed based
on the MobileNet-SSD. Specifically, the structure of the SSD was optimized without sacrificing its
accuracy, and the network structure and parameters were adjusted to streamline the detection model.
The proposed method was applied to the detection of typical defects like breaches, dents, burrs and
abrasions on the sealing surface of a container in the filling line. The results show that our method can
automatically detect surface defects more accurately and rapidly than lightweight network methods
and traditional machine learning methods. The research results shed new light on defect detection in
actual industrial scenarios.
1. Introduction
Intellisense and pattern recognition technologies have made progress in robotics [1–3], computer
engineering [4,5], health-related issues [6], natural sciences [7] and industrial academic areas [8,9].
Among them, computer vision technology develops particularly quickly. It mainly uses a binary
camera, digital camera, depth camera and charge-coupled device (CCD) camera to collect target images,
extract features and establish corresponding mathematical models, and to complete the processing of
target recognition, tracking and measurement. For example, Kamal et al. comprehensively consider
the continuity and constraints of human motion. After contour extraction of the acquired depth
image data, the Hidden Markov Model (HMM) is used to identify human activity. This system is
highly accurate in recognition and has the ability to effectively deal with rotation and deficiency of the
body [10]. Jalal et al. use texture and shape vectors to reduce the feature dimensionality and extract important
features in facial recognition through density matching scores and boundary fixation, so as to manage the
key processing requirements of face recognition (accuracy, speed and security) [11]. In [12],
vehicle damage is classified by a deep learning method, and the recognition accuracy on a small
data set reached 89.5% through the introduction of transfer learning and ensemble learning methods.
This provides a new way for the automatic processing of vehicle insurance claims. Zhang et al. combine the
four features of color, time motion, gradient norm and residual motion to identify the hand position in each
frame of a video. The method uses a weighted linear combination to evaluate different combinations
of these features and establishes a precise hand detector [13]. With the continuous improvement of
computer hardware and the deepening of research on complex image classification, the application
prospect of computer vision technology will be more and more extensive.
Surface defect detection is an important issue in modern industry. Traditionally, surface defects
are often detected in the following steps: first, pre-processing of the target image by image processing
algorithms. Image pre-processing technology can process pixels accurately. By setting and adjusting
various parameters according to actual requirements, the image quality can be improved by de-noising,
changing brightness and improving contrast, laying a foundation for subsequent processing; second,
carry out histogram analysis, wavelet transform or Fourier transform. The above transformation
methods can obtain the representation of an image in a specific space, which is convenient for the
manual design and extraction of features; finally, the image is classified according to its features
using a classifier. Common methods include thresholding, decision trees or support vector machine
(SVM). Most of the existing surface defect detection algorithms are based on machine vision [14–19].
Considering the mirror-like surface of ceramic balls, [17] obtains the stripe distortion image of defective
parts according to the principle of fringe reflection and locates the defect positions by reverse ray
tracing. This method is suitable for surface defect detection of ceramic balls and similar parts,
but fails to achieve high accuracy due to the selection and design of the ray models in reverse
ray tracing. Jian et al. realize the automatic detection of glass surface defects on cell phone screens
through fuzzy C-means clustering. Specifically, the image was aligned by contour registration during
pre-processing, and then the defective area was segmented by projection segmentation. Despite the
high accuracy, the detection approach consumes too much time (1.6601 s) [18]. Win et al. integrate
a median-based Otsu image thresholding algorithm with contrast adjustment to achieve automatic
detection of the surface defects on titanium coating. The proposed method is simple and, to some
extent, immune to variation in light and contrast. However, when the sample size is large, the optimal
threshold calculation is too inefficient and the grey-level information is easily contaminated by noise
points [19]. To sum up, the above surface detection methods can only extract a single feature, and derive
a comprehensive description of surface defects from it. These types of approaches only work well
on small sample datasets, but not on large samples and complex objects and backgrounds in actual
production. To solve this problem, one viable option is to improve the approach with deep learning.
In recent years, deep learning has been successfully applied to image classification,
speech recognition and natural language processing [20–22]. Compared with the traditional machine
learning method, it has the following characteristics: deep learning can simplify or even omit the
pre-processing of data, and directly use the original data for model training; deep learning is composed
of multi-layer neural networks, which solve the defects in the traditional machine learning methods
of artificial feature extraction and optimization. So far, deep learning has been extensively adopted
for surface defect detection. For example, in [23], Deep Belief Network (DBN) was adopted to obtain
the mapping relationship between training images of solar cells and non-defect templates, and the
comparison between reconstructed images and defect images was used to complete the defect detection
of the test images. Cha et al. employ a deep convolutional neural network (CNN) to identify concrete
cracks in complex situations, e.g., strong spots, shadows and ultra-thin cracks, and prove that the
deep CNN outperforms traditional tools like the Canny and Sobel edge detectors [24].
Han et al. detect various types of defects on hub surfaces with residual network (ResNet)-101 as the
base net and the faster region-based CNN (Faster R-CNN) as the detector, achieving a high mean
average precision of 86.3% [25]. The above studies fully verify the excellent performance of deep
learning in detecting surface defects. Nevertheless, there are few studies on product surface defect
detection using several main target detection networks in recent years, such as YOLO (You Only
Look Once), SSD (Single Shot MultiBox Detector) [26] and so on. The detection performance of these
networks in surface defect detection needs to be further verified and optimized.
This paper presents a surface defect detection method based on MobileNet-SSD. By optimizing the
network structure and parameters, this method can meet the requirements of real-time and accuracy
in actual production. It was verified in the filling line and the results show that our method can
automatically locate and classify the defects on the surface of the products.
The K-fold cross validation method divides the expanded dataset C into K discrete subsets.
During network training, a subset was selected as the test set, while the remaining (K−1) subsets were
combined into the training set. Each training outputs a classification accuracy of the network model
on the selected test set. The same process was repeated K times to get the mean accuracy, i.e., the true
accuracy of the model.
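For illustration, this validation loop can be sketched as follows, assuming scikit-learn's KFold and a hypothetical train_and_evaluate routine standing in for the network training described above:

```python
import numpy as np
from sklearn.model_selection import KFold

def k_fold_accuracy(samples, labels, train_and_evaluate, k=5):
    """Mean test accuracy over K train/test splits of the expanded dataset C."""
    kf = KFold(n_splits=k, shuffle=True, random_state=0)
    accuracies = []
    for train_idx, test_idx in kf.split(samples):
        # Train on the K-1 combined subsets, test on the held-out subset.
        acc = train_and_evaluate(samples[train_idx], labels[train_idx],
                                 samples[test_idx], labels[test_idx])
        accuracies.append(acc)
    # The mean over the K runs is taken as the true accuracy of the model.
    return float(np.mean(accuracies))
```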
$$K = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{1}$$
The purpose of Gaussian filtering is to smooth the image and remove as much noise as possible.
Then, the Sobel operator was introduced to obtain the gradient amplitude and its direction, which can
enhance the image and highlight the points with significant changes in neighboring pixels. The operator
contains two groups of 3 × 3 kernels. One group is transverse detection kernels, and the other is
vertical detection kernels. The formula of the operator is as follows:
$$G_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix} * A, \qquad G_y = \begin{bmatrix} +1 & +2 & +1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} * A \tag{2}$$
where A is the smoothed image; Gx is the image after transverse gradient detection; and Gy is the image
after vertical gradient detection. The gradient can be expressed as:
$$|\nabla f| = \mathrm{mag}(\nabla f) = \left[\left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2\right]^{1/2} \tag{3}$$
$$a(x, y) = \arctan\left(\frac{G_y}{G_x}\right) \tag{4}$$
During edge detection, non-maximum suppression (NMS) was adopted to find the maximum gradient of
local pixels by comparing the gradients and gradient directions. After obtaining the binary edge image, the Hough
circle transform was employed to detect the circles. Under the coordinate system (c1 , c2 , r), the formula
of Hough circle transform can be described as:
$$(x - c_1)^2 + (y - c_2)^2 = r^2 \tag{5}$$

where $(c_1, c_2)$ is the center coordinate; and r is the radius. The detection was realized through the following steps: first, the non-zero points in the image are traversed, and line segments along the gradient direction (radius direction) and the opposite direction are drawn. The intersection point of the segments is the circle center. Then, the maximum circle is obtained by setting the threshold value. After that, the rectangular regions are generated by the maximum radius. The size of the image was normalized to 300 × 300 × 3. Figure 2 shows the regional planning process.
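The regional planning pipeline can be sketched with OpenCV as follows. This is a minimal illustration rather than the authors' code: cv2.HoughCircles with HOUGH_GRADIENT internally applies the Sobel-based gradient voting of Equations (2)–(5), and every threshold value below is an assumed placeholder.

```python
import cv2
import numpy as np

def plan_region(bgr_image):
    """Gaussian smoothing -> Hough circle detection -> rectangular crop -> 300x300."""
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    smooth = cv2.GaussianBlur(gray, (5, 5), 1.5)            # Equation (1)
    # Gradient-based Hough transform, Equations (2)-(5); thresholds are illustrative.
    circles = cv2.HoughCircles(smooth, cv2.HOUGH_GRADIENT, dp=1, minDist=100,
                               param1=100, param2=50, minRadius=50, maxRadius=300)
    if circles is None:
        return None
    # Keep the maximum circle and generate the rectangular region from its radius.
    c1, c2, r = max(np.round(circles[0]).astype(int), key=lambda c: c[2])
    roi = bgr_image[max(c2 - r, 0):c2 + r, max(c1 - r, 0):c1 + r]
    return cv2.resize(roi, (300, 300))                      # 300 x 300 x 3 input
```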
2.2.1. Principles of MobileNet Feature Extraction

The MobileNet network [27] was developed to improve the real-time performance of deep learning under limited hardware conditions. This network can reduce the number of parameters without sacrificing accuracy. Previous studies have shown that MobileNet needs only 1/33 of the parameters of Visual Geometry Group-16 (VGG-16) to achieve the same classification accuracy in ImageNet-1000 classification tasks.

Figure 3 shows the basic convolution structure of MobileNet. Conv_Dw_Pw is a depthwise separable convolution structure. It is composed of depth-wise layers (Dw) and point-wise layers (Pw). The Dw are depthwise convolutional layers using 3 × 3 kernels, while the Pw are common convolutional layers using 1 × 1 kernels. Each convolution result is treated by the batch normalization algorithm and the rectified linear unit (ReLU) activation function.
In this paper, the activation function ReLU is replaced by ReLU6, and the normalization is carried out by the batch normalization (BN) algorithm, which supports the automatic adjustment of data distribution. The ReLU6 activation function can be expressed as:

$$\mathrm{ReLU6}(x) = \min(\max(0, x), 6) \tag{6}$$

The standard convolution can be expressed as:

$$G_N = \sum_M K_{M,N} * F_M \tag{7}$$

where $K_{M,N}$ is the filter; M and N are respectively the number of input channels and output channels; and $F_M$ denotes the input images, including feature maps, which use the fill style of zero padding.
When the size and the number of channels of the input image are $D_F \times D_F$ and M respectively, it is necessary to have N filters with M channels and the size of $D_K \times D_K$ before outputting N feature maps of the size $D_F \times D_F$. The computing cost is $D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F$.

By contrast, the Dw formula can be expressed as:

$$\hat{G}_M = \hat{K}_{1,M} * F_M \tag{8}$$

where $\hat{K}_{1,M}$ is the filter, and $F_M$ has the same meaning as in Formula (7). When the step size is one, zero padding ensures that the size of the feature map is unchanged after the application of the depthwise separable convolution structure. When the step size is two, zero padding ensures that the size of the feature map obtained after the application of the depthwise separable convolution structure becomes half of that of the input image/feature map; that is, the dimensionality reduction operation is realized.
The depthwise separable convolution structure of MobileNet can obtain the same outputs as those of standard convolution based on the same inputs. The Dw phase needs M filters with one channel and the size of $D_K \times D_K$. The Pw phase needs N filters with M channels and the size of 1 × 1. In this case, the computing cost of the depthwise separable convolution structure is $D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F$, about $\frac{1}{N} + \frac{1}{D_K^2}$ of that of standard convolution.
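As a quick sanity check of this ratio, consider one hypothetical layer; the channel and feature map sizes below are illustrative, not values from the paper:

```python
# Illustrative layer: 3x3 kernels, 512 input/output channels, 14x14 feature map.
DK, M, N, DF = 3, 512, 512, 14

standard_cost = DK * DK * M * N * DF * DF                   # standard convolution
separable_cost = DK * DK * M * DF * DF + M * N * DF * DF    # Dw phase + Pw phase

print(separable_cost / standard_cost)   # ~0.1131
print(1 / N + 1 / DK**2)                # ~0.1131, matching 1/N + 1/DK^2
```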
Besides, the data distribution is changed by each convolution layer during network training. If the data fall on the saturated edge of the activation function, the gradient will disappear and the parameters will no longer be updated. The BN algorithm normalizes the data towards a standard normal distribution and adjusts them with two learnable parameters, thereby preventing gradient disappearance and reducing the need to tune complex parameters (e.g., the learning rate and dropout ratio).
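Putting these pieces together, a Conv_Dw_Pw block as described for Figure 3 can be sketched in Keras; this is a minimal reading of the structure, not the authors' implementation:

```python
import tensorflow as tf

def conv_dw_pw(x, pointwise_filters, stride=1):
    """3x3 depth-wise conv + BN + ReLU6, then 1x1 point-wise conv + BN + ReLU6."""
    # Dw: one 3x3 filter per input channel; stride 2 halves the feature map size.
    x = tf.keras.layers.DepthwiseConv2D(3, strides=stride, padding="same",
                                        use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.ReLU(max_value=6.0)(x)    # ReLU6, Equation (6)
    # Pw: common 1x1 convolution combining the channels across feature maps.
    x = tf.keras.layers.Conv2D(pointwise_filters, 1, padding="same",
                               use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    return tf.keras.layers.ReLU(max_value=6.0)(x)
```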
In each selected feature map, there are k frames that differ in size and width-to-height ratio. These frames are called the default boxes. Figure 4 expresses default boxes on feature maps of different convolutional layers. Each default box predicts the B class scores and the four position parameters. Hence, $B \cdot k \cdot w \cdot h$ class scores and $4 \cdot k \cdot w \cdot h$ position parameters must be predicted for a $w \times h$ feature map. This requires $(B + 4) \cdot k \cdot w \cdot h$ convolution kernels of the size 3 × 3 to process the feature map. Then, the convolution results should be taken as the final features for classification regression and bounding box regression. Here, B is set to four because there are four typical defects on the sealing surface of a container in the filling line. The scale of the default boxes for each feature map is computed as:

$$S_k = S_{min} + \frac{S_{max} - S_{min}}{m - 1}(k - 1), \quad k \in [1, m] \tag{9}$$

where m is the number of feature maps; and $S_{max}$, $S_{min}$ are parameters that can be set. In order to control the fairness of feature vectors in the training and test experiments, the same five kinds of width-to-height ratios $a_r = \{1, 2, 3, 0.5, 0.33\}$ were used to generate default boxes. Then, each default box can be described as:

$$w_k^a = S_k \sqrt{a_r}, \qquad h_k^a = S_k / \sqrt{a_r} \tag{10}$$

where $w_k^a$ is the width of default boxes; and $h_k^a$ is the height of default boxes.

Next, a default box with scale $S_k' = \sqrt{S_k S_{k+1}}$ should be added when the width-to-height ratio is one. The center of each default box is $\left(\frac{i + 0.5}{|f_k|}, \frac{j + 0.5}{|f_k|}\right)$, where $|f_k|$ is the size of the k-th feature map and $i, j \in [0, |f_k|]$.
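Equations (9) and (10) can be sketched as follows. $S_{min} = 0.2$ and $S_{max} = 0.9$ are the common SSD defaults and are assumed here, since the paper does not list its settings; the extra box of the last layer uses a scale bound of 1.0 by the usual SSD convention:

```python
import math

def default_box_shapes(m=6, s_min=0.2, s_max=0.9, ratios=(1, 2, 3, 0.5, 0.33)):
    """Per-layer (width, height) pairs of default boxes, Equations (9) and (10)."""
    scales = [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]
    layers = []
    for k, s in enumerate(scales):
        shapes = [(s * math.sqrt(a), s / math.sqrt(a)) for a in ratios]
        # Extra box S'_k = sqrt(S_k * S_{k+1}) for width-to-height ratio one.
        s_next = scales[k + 1] if k + 1 < m else 1.0
        shapes.append((math.sqrt(s * s_next), math.sqrt(s * s_next)))
        layers.append(shapes)
    return layers
```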
The intersection over union (IoU) between area A and area B can be calculated as:

$$IoU = \frac{area(A) \cap area(B)}{area(A) \cup area(B)} \tag{11}$$
If the IoU of a default box and a calibration box (ground-truth box) is greater than 0.5, it means the default box matches the calibration box of that category.
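Equation (11) and the 0.5 matching rule translate directly into code for axis-aligned boxes, for example:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2), Equation (11)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)            # area(A) ∩ area(B)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter                          # area(A) ∪ area(B)
    return inter / union if union > 0 else 0.0

def matches(default_box, ground_truth_box, threshold=0.5):
    """A default box matches a calibration box when their IoU exceeds 0.5."""
    return iou(default_box, ground_truth_box) > threshold
```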
The SSD is an end-to-end training model. The overall loss function of the training contains the confidence loss $L_{conf}(s, c)$ of the classification regression and the position loss $L_{loc}(r, l, g)$ of the bounding box regression. This function can be described as:

$$L(s, r, c, l, g) = \frac{1}{N}\left(L_{conf}(s, c) + \alpha L_{loc}(r, l, g)\right) \tag{12}$$
where α is a parameter to balance the confidence loss and the position loss; s and r are the eigenvectors of confidence loss and position loss, respectively; c is the classification confidence; l is the offset of the predicted box, including the translation offset of the center coordinate and the scaling offset of the height and width; g is the calibration box of the target's actual position; and N is the number of default boxes that match the calibration boxes of this category.
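A simplified sketch of Equation (12) follows, assuming the usual SSD choices of softmax cross-entropy for the confidence loss and the smooth L1 (Huber) loss for the position loss; hard-negative mining and the box-matching step are omitted for brevity:

```python
import tensorflow as tf

def ssd_loss(cls_logits, loc_preds, cls_targets, loc_targets, alpha=1.0):
    """L = (1/N) * (L_conf + alpha * L_loc) over one image, Equation (12)."""
    positive = tf.cast(cls_targets > 0, tf.float32)        # boxes matched to a defect
    n = tf.maximum(tf.reduce_sum(positive), 1.0)           # N, guarded against zero
    # Confidence loss: softmax cross-entropy over the class scores of all boxes.
    l_conf = tf.reduce_sum(tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=cls_targets, logits=cls_logits))
    # Position loss: smooth L1 on the offsets l of matched default boxes only.
    huber = tf.keras.losses.Huber(reduction=tf.keras.losses.Reduction.NONE)
    l_loc = tf.reduce_sum(huber(loc_targets, loc_preds) * positive)
    return (l_conf + alpha * l_loc) / n
```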
In the filling line, the sealing surface is easily damaged by friction, collision and extrusion during the recycling and transport of pressure vessels. The common defects include breaches, dents, burrs and abrasions on the sealing surface. In this paper, the MobileNet-SSD model can greatly reduce the number of parameters and achieve higher accuracy under limited hardware conditions.
Figure 5. MobileNet-Single Shot MultiBox Detector (SSD) network feature pyramid.
A 300 × 300 image was taken as the input. The six layers of the pyramid respectively contain 4, 6, 6, 6, 6 and 6 default boxes. Besides, different 3 × 3 kernels were adopted for classification and location with the step length of one. The numbers in brackets are the amounts of 3 × 3 filters applied around each location in the feature map: the number of default boxes × the number of categories (classification) and the number of default boxes × 4 (location), respectively. The general structure of MobileNet-SSD is shown in Table 2.
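To make the box counts concrete, the total number of default boxes follows directly from the pyramid. The feature map sizes below (19, 10, 5, 3, 2, 1) are the usual MobileNet-SSD values for a 300 × 300 input and are assumed here, as the paper does not list them:

```python
# Boxes per location (4, 6, 6, 6, 6, 6 from the text) on assumed feature map sizes.
map_sizes = [19, 10, 5, 3, 2, 1]
boxes_per_location = [4, 6, 6, 6, 6, 6]

total = sum(s * s * k for s, k in zip(map_sizes, boxes_per_location))
print(total)   # 2278 default boxes; each predicts B class scores and 4 offsets
```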
Figure 6. Image acquisition device. CCD = charge-coupled device. LED = light-emitting diode.

The detection network was trained on the following hardware: an Intel Core i7-7700K processor (Vietnam, 2017), which has a main frequency of 4.2 GHz, 32 GB memory and a GeForce TITAN X graphics processing unit (GPU). The software part used the Ubuntu 14.04.2 operating system and the TensorFlow deep learning framework. Twenty percent of the samples in the pre-processed library were allocated to the test set and the other 80% to the training set.

3.2. Comparison of Three Deep Learning Networks
Figure 7. Loss function and accuracy of the three detection networks. VGG = visual geometry group.
Table 3. Training parameters and results of the three detection networks (VGG-16, MobileNet and MobileNet-SSD). SGD = stochastic gradient descent.
In the training process, the detection networks were tested once after every two hundred iterations of the training set. The loss function and accuracy in Table 3 were mean values obtained from 40 to 50 iterations of the test set. It is clear that the MobileNet-SSD detection algorithm achieved better accuracy than the other two networks with fewer network parameters.
Table 4. Detection results for the four typical defects on the sealing surface.

| Defect Type | Sample Number | Successful Detection Number | Leakage Number | Error Detection Number | Positive Rate (%) |
|---|---|---|---|---|---|
| Breach | 30 | 30 | 0 | 0 | 100.00 |
| Dent | 30 | 27 | 2 | 1 | 90.00 |
| Burr | 30 | 28 | 1 | 1 | 93.33 |
| Abrasion | 30 | 29 | 1 | 0 | 96.67 |
| Total | 120 | 114 | 4 | 2 | 95.00 |
It can be seen from Table 4 that the surface defect detection network completes the defect marking of the 120 defect samples with a 95.00% positive rate. There were missed and falsely detected samples among the dent and burr defects, and missed samples among the abrasion defects. This is because the breaches are more obvious than the other defects; the result is also related to the image quality and the subjective judgment involved in labeling.
When the filling line was in operation, the container passed by the image acquisition device
within a certain distance, triggering the CCD camera to take photos. The defect detection network
performed the forward operation. If there were defects in the image, the alarm would buzz and the defect type and location were identified by the host (Figure 8). A single forward pass of the network took 0.12 s per image.
Figure 8. Detection effect of sealing surface defects.
3.4. Degree of Defect Detection

The defects of the same type may differ in terms of severity. Here, the pre-processed datasets were divided into three categories based on the defect severity: easy, medium and hard. The recognition result can serve as a yardstick of the network classification quality. Seventy percent of all samples were divided into the training set and the remaining 30% into the test set. The detection results of the breaches are shown in Figure 9.
Figure 9. Static image detection results of notches.
Figure 10. (a) Easy precision–recall (PR) curves of three different algorithms; (b) Medium PR curves of three different algorithms; (c) Hard PR curves of three different algorithms. MS = MobileNet-SSD. MTCNN = multi-task convolution neural network.
3.5. Contrast Experiment

Three comparative experiments were designed to further validate the proposed algorithm. In the first experiment, the proposed algorithm was compared to five lightweight feature extraction networks, including SqueezeNet [31], MobileNet, performance vs accuracy net (PVANet) [32], MTCNN and Faceness-Net. The feature extraction accuracy of each algorithm for the ImageNet classification task is displayed in Table 5. In the second experiment, the above five networks were contrasted with the proposed algorithm in defect detection of the filling line in terms of correct detection rate, training time and the detection time per image (Table 6).

Table 5. Feature extraction accuracy of each algorithm. GFLOPS = floating-point operations per second. GPU = graphics processing unit. PVANet = performance vs accuracy net.
Table 6. Correct detection rate, training time and the detection time per image of each network.

| Number | Model | Positive Rate | Training Time (Day) | Detection Time (s) |
|---|---|---|---|---|
| 1 | SqueezeNet | 85.83% | 2 | 0.31 |
| 2 | Faceness-Net | 90.83% | 3 | 0.59 |
| 3 | MobileNet | 90.83% | <1 | 0.54 |
| 4 | MTCNN | 91.67% | 3 | 0.64 |
| 5 | PVANet | 94.17% | 2 | 0.25 |
| 6 | MobileNet-SSD | 95.00% | <1 | 0.12 |
As shown in the two tables, the MobileNet-SSD surface defect model is fast and stable, thanks to the improved SSD meta-structure of the feature pyramid. In general, the proposed algorithm outperformed the competing algorithms in detection rate, training time and detection time. The final detection time of our algorithm was merely 120 milliseconds per piece, which meets the real-time requirements of the industrial filling line.
In Contrast Experiment 3, four traditional defect recognition methods, namely k-nearest neighbor (KNN) [33], HMM [34–36], combined SVM and HMM [37] and back propagation neural network (BPNN) [38], were implemented and compared with the method in this paper. The KNN method selects the Euclidean distance as its distance function; the HMM model adopts a sampling window of 5 × 4 size and uses the discrete cosine transform (DCT) coefficients as the observation vector of the HMM; the combined SVM and HMM method is the same as in the literature [37]; and the hidden layer number of the BP neural network is set to 30. The above models were also applied to detect the defects on the sealing surface of a container in the filling line. The statistical results are shown in Table 7.
Table 7. Correct detection rate and the detection time per image of each model. HMM = Hidden Markov Model. SVM = support vector machine. KNN = k-nearest neighbor. BPNN = back propagation neural network.
As can be seen from Table 7, compared with the other traditional defect detection methods, the MobileNet-SSD method has a higher positive detection rate. Under the same hardware conditions, MobileNet-SSD still maintains the optimal speed, despite the small differences among the above five methods. In addition, the results of HMM and KNN are not ideal. The reason for this may be that the proportion of defects is small and the sealing surface of a container contains a lot of background information, while KNN and HMM did not extract specific features of the image before classifying. By contrast, both the BP neural network and MobileNet-SSD are based on neural networks, which can automatically learn features by themselves, so the accuracy rates of the two methods are relatively high. MobileNet-SSD, due to its unique depthwise separable convolution structure, can learn deep features of defects with a bigger receptive field, so it can achieve a higher positive detection rate.
4. Conclusions
This paper proposes a surface defect detection method based on the MobileNet-SSD network,
and applies it to identify the types and locations of surface defects. In the pre-processing phase,
a regional planning method was presented to cut out the main body of the defect, reduce redundant
parameters and improve detection speed and accuracy. Meanwhile, the robustness of the algorithm was
elevated by data enhancement. The philosophy of MobileNet, a lightweight network, was introduced
to enhance the detection accuracy, reduce the computing load and shorten the training time of this
algorithm. The MobileNet and SSD were adjusted to detect the surface defects, such that the proposed
method could differentiate small defects from the background. The feasibility of the proposed method
was verified by defect detection for the sealing surface of an oil chili filling production line in Guizhou,
China. Specifically, an image acquisition device was established for the sealing surface and the deep
learning framework was adopted to mark the defect positions. The results show that the proposed
method can identify most defects in the production environment at high speed and with high accuracy. However, the system also has its limitations. Deep learning models have a certain dependence on the hardware platform because of their computationally intensive processes, and they are not suitable for embedded systems with ordinary performance. Future research will further improve the proposed method through integration with embedded chips and the Internet of Things, balance the classification accuracy against the number of parameters of the detection method, and expand the application scope of our method to complex defects in industrial processes.
Author Contributions: Project administration, Y.L.; validation, Y.L.; resources, H.H. and Q.X.; investigation, L.Y.
and Q.C.
Funding: This study was supported by the Major Project of Science and Technology in Guizhou Province
(No. [2017]3004) and Natural Science Foundation in Guizhou Province (No. [2015]2043).
Conflicts of Interest: The authors declare no conflict of interest. The founding sponsors had no role in the design
of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript, and in the
decision to publish the results.
References
1. Uddin, M.T.; Uddiny, M.A. Human activity recognition from wearable sensors using extremely randomized
trees. In Proceedings of the International Conference on Electrical Engineering and Information
Communication Technology, Dhaka, Bangladesh, 21–23 May 2015; pp. 1–6.
2. Jalal, A.; Sarif, N.; Kim, J.T.; Kim, T.S. Human Activity Recognition via Recognized Body Parts of Human
Depth Silhouettes for Residents Monitoring Services at Smart Home. Indoor Built Environ. 2013, 22, 271–279.
[CrossRef]
3. Zhan, Y.; Kuroda, T. Wearable sensor-based human activity recognition from environmental background
sounds. J. Ambient Intell. Hum. Comput. 2014, 5, 77–89. [CrossRef]
4. Jalal, A. Security Architecture for Third Generation (3G) using GMHS Cellular Network. In Proceedings
of the International Conference on Emerging Technologies, Islamabad, Pakistan, 12–13 November 2008;
pp. 74–79.
5. Shire, A.N.; Khanapurkar, M.M.; Mundewadikar, R.S. Plain Ceramic Tiles Surface Defect Detection Using
Image Processing. In Proceedings of the International Conference on Emerging Trends in Engineering and
Technology, Port Louis, Mauritius, 18–20 November 2012; pp. 215–220.
6. Shang, L.; Yang, Q.; Wang, J.; Li, S.; Lei, W. Detection of rail surface defects based on CNN image recognition
and classification. In Proceedings of the International Conference on Advanced Communication Technology,
Chuncheon-si, Korea, 11–14 February 2018; pp. 45–51.
7. Jalal, A.; Kim, S. Advanced Performance Achievement using Multi-Algorithmic Approach of Video
Transcoder for Low Bitrate Wireless Communication. ICGST Int. J. Graph. Vis. Image Process. 2004, 5,
27–32.
Appl. Sci. 2018, 8, 1678 16 of 17
8. Deutschl, E.; Gasser, C.; Niel, A.; Werschonig, J. Defect detection on rail surfaces by a vision based system.
In Proceedings of the Intelligent Vehicles Symposium, Parma, Italy, 14–17 June 2004; pp. 507–511.
9. Yazdchi, M.; Yazdi, M.; Mahyari, A.G. Steel Surface Defect Detection Using Texture Segmentation Based
on Multifractal Dimension. In Proceedings of the International Conference on Digital Image Processing,
Bangkok, Thailand, 7–9 March 2009; pp. 346–350.
10. Kamal, S.; Jalal, A.; Kim, D. Depth Images-based Human Detection, Tracking and Activity Recognition Using
Spatiotemporal Features and Modified HMM. J. Electr. Eng. Technol. 2016, 11, 1921–1926. [CrossRef]
11. Jalal, A.; Kim, S. Global Security Using Human Face Understanding under Vision Ubiquitous Architecture
System. World Acad. Sci. Eng. Technol. 2006, 13, 7–11.
12. Patil, K.; Kulkarni, M.; Sriraman, A.; Karande, S. Deep Learning Based Car Damage Classification.
In Proceedings of the IEEE International Conference on Machine Learning and Applications, Cancun,
Mexico, 18–21 December 2017; pp. 50–54.
13. Zhang, Z.; Alonzo, R.; Athitsos, V. Experiments with computer vision methods for hand detection.
In Proceedings of the Petra 2011 International Conference on Pervasive Technologies Related to Assistive
Environments, Crete, Greece, 25 May 2011; pp. 1–6.
14. Jeon, Y.J.; Choi, D.C.; Lee, S.J.; Yun, J.P.; Kim, S.W. Steel-surface defect detection using a switching-lighting
scheme. Appl. Opt. 2016, 55, 47–57. [CrossRef] [PubMed]
15. Kabouri, A.; Khabbazi, A.; Youlal, H. Applied multiresolution analysis to infrared images for defects
detection in materials. NDT E Int. 2017, 92, 38–49.
16. Krummenacher, G.; Ong, C.S.; Koller, S.; Kobayashi, S.; Buhmann, J.M. Wheel Defect Detection with Machine
Learning. IEEE Trans. Intell. Transp. Syst. 2018, 19, 1176–1187. [CrossRef]
17. Fu, L.; Wang, Z.; Liu, C. Research on surface defect detection of ceramic ball based on fringe reflection.
Opt. Eng. 2017, 56, 104104.
18. Jian, C.; Gao, J.; Ao, Y. Automatic Surface Defect Detection for Mobile Phone Screen Glass Based on Machine
Vision. Appl. Soft Comput. 2016, 52, 348–358. [CrossRef]
19. Win, M.; Bushroa, A.R.; Hassan, M.A.; Hilman, N.M.; Ide-Ektessabi, A. A Contrast Adjustment Thresholding
Method for Surface Defect Detection Based on Mesoscopy. IEEE Trans. Ind. Inform. 2017, 11, 642–649.
[CrossRef]
20. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [CrossRef] [PubMed]
21. Esteva, A.; Kuprel, B.; Novoa, R.A.; Ko, J.; Swetter, S.M.; Blau, H.M.; Thrun, S. Dermatologist-level
classification of skin cancer with deep neural networks. Nature 2017, 542, 115–118. [CrossRef] [PubMed]
22. Litjens, G.; Kooi, T.; Bejnordi, B.E.; Setio, A.A.; Ciompi, F.; Ghafoorian, M.; van der Laak, J.A.;
Van Ginneken, B.; Sánchez, C.I. A survey on deep learning in medical image analysis. Med. Image Anal. 2017,
42, 60–88. [CrossRef] [PubMed]
23. Xian-Bao, W.; Jie, L.; Ming-Hai, Y.; Wen-Xiu, H.; Yun-Tao, Q. Solar Cells Surface Defects Detection Based on
Deep Learning. Pattern Recognit. Artif. Intell. 2014, 27, 517–523.
24. Cha, Y.J.; Choi, W.; Büyüköztürk, O. Deep Learning-Based Crack Damage Detection Using Convolutional
Neural Networks. Comput. Aided Civ. Infrastruct. Eng. 2017, 32, 361–378. [CrossRef]
25. Han, K.; Sun, M.; Zhou, X.; Zhang, G.; Dang, H.; Liu, Z. A new method in wheel hub surface defect detection:
Object detection algorithm based on deep learning. In Proceedings of the International Conference on
Advanced Mechatronic Systems, Xiamen, China, 6–9 December 2017; pp. 335–338.
26. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox
Detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands,
11–14 October 2016; pp. 21–37.
27. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H.
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017,
arXiv:1704.04861.
28. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv
2014, arXiv:1409.1556.
29. Zhang, K.; Zhang, Z.; Li, Z.; Qiao, Y. Joint Face Detection and Alignment Using Multitask Cascaded
Convolutional Networks. IEEE Signal Process. Lett. 2016, 23, 1499–1503. [CrossRef]
30. Yang, S.; Luo, P.; Loy, C.C.; Tang, X. Faceness-Net: Face Detection through Deep Facial Part Responses.
IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 1845–1859. [CrossRef] [PubMed]
Appl. Sci. 2018, 8, 1678 17 of 17
31. Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level
accuracy with 50× fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360.
32. Hong, S.; Roh, B.; Kim, K.H.; Cheon, Y.; Park, M. PVANet: Lightweight Deep Neural Networks for Real-time
Object Detection. arXiv 2016, arXiv:1611.08588.
33. Hunt, M.A.; Karnowski, T.P.; Kiest, C.; Villalobos, L. Optimizing automatic defect classification feature
and classifier performance for post. In Proceedings of the 2000 IEEE/SEMI Advanced Semiconductor
Manufacturing Conference and Workshop, Boston, MA, USA, 12–14 September 2000; pp. 116–123.
34. Jalal, A.; Kamal, S.; Kim, D. A depth video sensor-based life-logging human activity recognition system for
elderly care in smart indoor environments. Sensors 2014, 14, 11735–11759. [CrossRef] [PubMed]
35. Jalal, A.; Kim, Y.H.; Kim, Y.J.; Kamal, S.; Kim, D. Robust human activity recognition from depth video using
spatiotemporal multi-fused features. Pattern Recognit. 2017, 61, 295–308. [CrossRef]
36. Jalal, A.; Kamal, S.; Kim, D. Individual detection-tracking-recognition using depth activity images.
In Proceedings of the 2015 12th International Conference on IEEE Ubiquitous Robots and Ambient
Intelligence (URAI), Goyang, Korea, 28–30 October 2015; pp. 450–455.
37. Wu, H.; Pan, W.; Xiong, X.; Xu, S. Human activity recognition based on the combined svm&hmm.
In Proceedings of the 2014 IEEE International Conference on Information and Automation (ICIA), Hailar,
China, 28–30 July 2014; pp. 219–224.
38. Islam, M.A.; Akhter, S.; Mursalin, T.E.; Amin, M.A. A suitable neural network to detect textile defects.
In Proceedings of the International Conference on Neural Information Processing, Hong Kong, China,
3–6 October 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 430–438.
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).