Accepted Manuscript: Neurocomputing

Communicated by Yue Gao
Multiscale road centerlines extraction from high-resolution aerial imagery
Ruyi Liu, Qiguang Miao, Jianfeng Song, Yining Quan, Yunan Li,
Pengfei Xu, Jing Dai
PII: S0925-2312(18)31224-4
DOI: https://doi.org/10.1016/j.neucom.2018.10.036
Reference: NEUCOM 20058
To appear in: Neurocomputing
Received date: 5 October 2017

Revised date: 12 August 2018
Accepted date: 9 October 2018
Please cite this article as: Ruyi Liu, Qiguang Miao, Jianfeng Song, Yining Quan, Yunan Li, Pengfei Xu,
Jing Dai, Multiscale road centerlines extraction from high-resolution aerial imagery, Neurocomputing
(2018), doi: https://doi.org/10.1016/j.neucom.2018.10.036
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service
to our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please
note that during the production process errors may be discovered which could affect the content, and
all legal disclaimers that apply to the journal pertain.
Multiscale road centerlines extraction from high-resolution aerial imagery✩
Ruyi Liu^a, Qiguang Miao^{a,∗}, Jianfeng Song^a, Yining Quan^a, Yunan Li^a, Pengfei Xu^b, Jing Dai^c
^a School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi, 710071, China
^b School of Information Science and Technology, Northwest University, Xi’an, Shaanxi, 710127, China
^c The hospital of Cheng Du military area authority, Chengdu, Sichuan, 610011, China
Abstract
Accurate road extraction from high-resolution aerial imagery has many applications, such as urban planning and vehicle navigation systems. Common road extraction methods are based on classification algorithms, which require robust handcrafted features for roads. However, designing such features is difficult. For the road centerline extraction problem, existing algorithms have limitations such as producing spurs and being time consuming. To address these issues to some extent, we introduce deep-learning-based feature learning to extract robust features automatically, and present a method to extract road centerlines based on multiscale Gabor filters and multiple directional non-maximum suppression. The proposed algorithm consists of the following four steps. Firstly, the aerial imagery is classified by a pixel-wise classifier based on a convolutional neural network (CNN). Specifically, the CNN is used to learn features from raw data automatically, especially the structural features. Then, edge-preserving filtering is conducted on the resulting classification map, with the original imagery serving as the guidance image. It is exploited to preserve the edges and the details of the road. After that, we do some post-processing
✩ Fully documented templates are available in the elsarticle package on CTAN.
∗ Corresponding author
Email address: qgmiao@126.com (Qiguang Miao)
URL: http://web.xidian.edu.cn/qgmiao/en/index.html (Qiguang Miao)
Preprint submitted to Journal of LATEX Templates October 16, 2018

based on shape features to extract more reliable roads. Finally, multiscale Gabor filters and multiple directional non-maximum suppression are integrated to get a complete and accurate road network. Experimental results show that the proposed method can achieve comparable or higher quantitative results, as well as more satisfactory visual performance.
Keywords: Convolutional Neural Network (CNN), Edge-preserving filtering,
multiscale Gabor filters, centerlines extraction
1. Introduction
Road information has become essential because it supports many applications, such as vehicle navigation systems, change detection and urban planning, so road extraction from high-resolution images is very important. In recent years, much effort has been made and various road extraction methods have been developed. Roads are modeled as a network of intersections and links between these intersections, and are found by grouping processes [1]. A scheme for road extraction in rural areas that integrates three different modules with specific strengths is presented. It gets good results for simple open areas with bright ribbon roads, but does not work well for images with complex backgrounds. Shao et al. [2] used a fast linear detector to extract road centerlines on high-contrast road pixels with increased performance. Besides linear features, road intersections are another signature in road networks. Owing to the simple road extraction operator of only two directions and the structure optimization algorithm, the algorithm can be realized rapidly. However, this method emphasizes the speed of the algorithm but disregards its effectiveness.
Most popular and successful methods rely on classification [3] [4] [5] [6] [7] [8] [9] [10] [11] [12]. Yager et al. applied SVM to road extraction from remote sensing images using edge-based features [3]. Song and Civco exploited smoothness and compactness to reduce the misclassification between roads and other objects in an SVM classifier [4]. To improve the accuracy of road extraction, Zhang and Couloigner [5] integrated traditional k-means clustering with the angular texture signature. Multiscale structural features and support vector machines were applied in road network extraction by Huang and Zhang [6]. Das et al. designed a multistage framework to extract road networks based on probabilistic SVM and histogram-based features, such as mean, standard deviation, skew, energy, and entropy [7]. Classification is applied to label the pixels or voxels as belonging to the structure of interest or to the background [8]. What makes the problem challenging is the complex structure of the prior: roads form a connected network of thin segments which meet at junctions and crossings. Such a priori knowledge is more difficult to turn into a tractable model than standard smoothness or co-occurrence assumptions. Shi et al. proposed an integrated method for urban main road centerline extraction. SVM, general adaptive neighborhood, local Geary’s C and local linear kernel smoothing regression were utilized [9]. Xu et al. presented a bio-inspired model for road extraction from remote sensing imagery. The model was an improved support vector machine (SVM) based on the pooling of feature vectors [10]. In order to obtain a comprehensive feature extraction method for road extraction, Miao et al. proposed a novel object-based automatic method [11]. This method used not only spectral information but also other spatial and spectral features derived from objects. A semisupervised method was introduced by Cheng et al. [12], which explored the intrinsic structures between the labeled samples and the unlabeled ones. From 2015 onwards, some new methods have been proposed.
An information fusion based approach is proposed in [13]. Spectral and shape features are explored at the pixel level to select road segments using two different methods (i.e., expectation maximization clustering and linearness filtering). A novel multi-stage object-based approach for road extraction from VHR satellite images is proposed in [14]. Object-based information is embedded as heuristic information in the ant colony optimization (ACO) algorithm for handling the road network extraction problem [15]. Wei et al. proposed an RSRCNN approach which incorporates road structure in learning the CNN model with structured output of road regions [16]. Liu et al. proposed a multiview dictionary learning formulation to approximate the Hough transform for straight road detection in multispectral images [17]. A new method for extracting roads from high-resolution imagery based on hierarchical graph-based image segmentation is presented in [18]. Abdollahi et al. proposed a new automatic method for road extraction by integrating the SVM and level set methods. The classification probability estimated by the SVM is used as input to the level set method [19]. Zang et al. proposed a task-oriented enhancing technique for extracting road networks from satellite images [20]. An unsupervised road detection method based on a Gaussian mixture model and object-based features is proposed in [21].
The aforementioned methods have achieved accurate road extraction from remote sensing imagery. They all depend on features that can characterize the distinctiveness of a road region with respect to its surrounding area, so designing robust features is critical for the performance of road extraction. Recently, feature learning has been a topic of interest and considerable progress has been achieved. Features learned by deep learning have resulted in state-of-the-art performance in various classification tasks [22] [23] [24] [25] [26]. Deep learning is a machine learning approach that establishes deep hierarchical models to represent and analyze data. Deep learning methods mainly include convolutional neural networks (CNN), deep belief networks (DBN) and stacked auto-encoders (SAE) [27]. The CNN is a popular deep model and has been widely used in computer vision tasks such as object detection, image classification and image segmentation. In a CNN model, trainable filters and local neighborhood pooling operations are applied alternately on the raw input images. Such a multilayer architecture can extract robust features from raw pixels automatically. Features extracted by such a network are highly versatile and often more effective than traditional handcrafted features [28]. Inspired by this, we obtain features for roads by training a CNN. Our trained CNN can also classify raw pixels into road or non-road. During the last decade, edge-preserving filtering has been successfully applied to hyperspectral image classification in order to improve the classification accuracy [29]. Here, we also use edge-preserving filtering to smooth the classification map and recover the real road boundaries.
Many approaches have been proposed to extract road centerlines. These approaches include the Radon transform [5], the morphological thinning algorithm [6], the Hough transform [30], and regression [9] [31]. Although each of them can extract road centerlines, they also have some disadvantages. The morphological thinning algorithm always produces some spurs. The Radon transform based method can only extract the centerlines of straight road segments. Due to the limitations of the Hough transform, some false centerlines exist. Some regression based methods fail to extract the centerlines of complicated junctions. To overcome the aforementioned shortcomings of centerline extraction algorithms, we use multiscale Gabor filters [32] [33] to enhance the centerlines. This helps to find the accurate location of the centerlines. The main contributions of our approach are highlighted as follows. a) We use a CNN to capture local contrast, texture and shape information, and to predict the label of each pixel without the need for hand-crafted features. b) We introduce edge-preserving filtering based on guided image filtering to align the initial road map with real road boundaries. c) We propose a multiscale centerline extraction method using Gabor filters and multiple directional non-maximum suppression.

In this paper, we propose a method to extract road centerlines from high-resolution aerial images. The pixel-wise classification map is obtained by the CNN and then processed with edge-preserving filtering. To extract more reliable roads, we do some post-processing based on shape features. Finally, the road centerlines are extracted by multiscale Gabor filters and multiple directional non-maximum suppression. The rest of this paper is organized as follows. In Section 2, the proposed method is described. Section 3 gives the experimental results and provides a discussion. Finally, the concluding remarks are given in Section 4.
2. Proposed methodology
The proposed method consists of four steps: pixel-wise classification with CNN, edge-preserving filtering based on guided image filtering, post-processing based on shape features, and multiscale road centerlines extraction using Gabor filters and multiple directional non-maximum suppression. The organization of this method is shown in Figure 1. The details of each step are presented as follows.
[Figure 1 flowchart: Input image → Pixel-wise classification based on CNN (patches → predictions) → Initial road map → Edge-preserving filtering → Post-processing based on shape features → Multiscale centerlines extraction using Gabor filters and multiple directional non-maximum suppression → Final road centerlines map]
Figure 1: Flowchart of the proposed method.
2.1. Pixel-wise classification with CNN

CNN has been successfully applied to classification tasks. It is a trainable multistage feed-forward neural network [34]. The layers of a CNN can be divided into three types: convolution layers, pooling layers and multi-layer perceptron layers. The convolution and pooling layers together compose the feature extractor of the CNN. After extracting features with a multilayer convolutional network, fully connected layers with a classifier are added to output class predictions. A large number of labeled samples is needed to train a CNN that generalizes well. Given an image $I$, the input image patches are represented as $x_O = \{x_1, x_2, \ldots, x_s\}$, where $s$ is the number of training samples. These input samples are obtained by extracting the patches centered on the pixels $O = \{o_1, o_2, \ldots, o_s\}$. For the input image patches $x_O = \{x_1, x_2, \ldots, x_s\}$, the corresponding network output is expressed as $y = \{y_1, y_2, \ldots, y_s\}$. Each $y_i$ takes its value from a finite set of classes $\Omega_1 = \{1, 2, \ldots, K\}$, where $K$ is the number of classes. The labels in our dataset are binary, i.e., road or non-road. Real-world applications often face a constrained time budget, so the design of the network architecture should trade off factors such as depth, number of filters and filter sizes. Our network consists of five layers, with three convolutional layers and two fully connected layers. Each layer contains learnable parameters. The network takes an RGB image patch of 31 × 31 pixels as input, and uses a softmax regression model as the output layer to generate the probabilities of the central pixel being road and non-road. The architecture details are listed in Table 1.

Table 1: Architecture details of our network. C: convolutional layer; BN: batch normalization, which reduces internal covariate shift and thereby dramatically accelerates the training of deep neural nets; P: pooling; F: fully connected layer; R: ReLU, the most commonly used activation function in deep learning models, which returns 0 for any negative input and returns x for any positive input x; S: softmax; Channels: the number of output feature maps; Input size: the spatial size of the input feature maps.

Layer            1         2         3         4       5 (Output)
Type             C+BN+P    C+BN+P    C+BN+R    F       F+S
Channels         12        24        50        50      2
Filter size      6×6       6×6       4×4       -       -
Pooling size     2×2       2×2       -         -       -
Pooling stride   2         2         -         -       -
Input size       31×31     13×13     4×4       1×1     1×1
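The input sizes in Table 1 can be checked with a few lines of arithmetic. The sketch below assumes stride-1 convolutions without padding, which the paper does not state explicitly but which reproduces the listed sizes; the function names are illustrative only.

```python
def conv_output_size(size, kernel):
    """Spatial size after a stride-1 convolution without padding (assumed)."""
    return size - kernel + 1


def pool_output_size(size, window, stride):
    """Spatial size after pooling with the given window and stride."""
    return (size - window) // stride + 1


# (filter size, pooling size, pooling stride) of the three convolutional
# layers in Table 1; None means the layer has no pooling stage.
CONV_LAYERS = [(6, 2, 2), (6, 2, 2), (4, None, None)]


def trace_feature_map_sizes(input_size, layers=CONV_LAYERS):
    """Return the input size of every layer, starting from the patch size."""
    sizes = [input_size]
    size = input_size
    for kernel, pool, stride in layers:
        size = conv_output_size(size, kernel)
        if pool is not None:
            size = pool_output_size(size, pool, stride)
        sizes.append(size)
    return sizes


print(trace_feature_map_sizes(31))  # [31, 13, 4, 1]
```

A 31 × 31 patch therefore shrinks to a 1 × 1 map before the fully connected layers, so each patch yields one prediction for its central pixel.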
For the datasets used in our paper, we randomly select 1% of the labeled patches of size 31 × 31 as training samples. All the 31 × 31 patches centered on the pixels are used as testing samples. To label the training patches, we mainly consider the ground truth of their central pixels as well as the overlaps between the patches and the ground truth road mask. A patch $B$ is labeled as a positive training example if i) the central pixel belongs to the road class, and ii) it sufficiently overlaps with the ground truth road region $G$: $|B \cap G| \geq 0.8 \times \min(|B|, |G|)$. Similarly, a patch $B$ is labeled as a negative training example if i) the central pixel is located within the background, and ii) its overlap with the ground truth road region is less than a predefined threshold: $|B \cap G| < 0.2 \times \min(|B|, |G|)$. In order to better understand the CNN, we show the feature maps of the convolutional layers in Figure 2. Figure 2(a) is the original image patch. Figure 2(b) and (c) are the feature maps of the first and second convolutional layers, respectively. It is observed that the CNN learns the features well. Besides, the extracted features become more abstract and global as the layers get deeper.

Figure 2: Feature maps of different convolutional layers. (a) The original patch with the size of 31 × 31; (b) The feature maps of the first convolutional layer; (c) The feature maps of the second convolutional layer.
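The labeling rule for training patches can be written as a small decision function. This is a minimal sketch: the function name, its arguments and the choice to return `None` for patches that satisfy neither condition are assumptions made for illustration, since the paper does not say how such patches are handled.

```python
def label_patch(center_is_road, overlap, patch_area, road_area,
                pos_ratio=0.8, neg_ratio=0.2):
    """Label a training patch from its overlap with the ground truth road mask.

    overlap    -- |B ∩ G|, number of road pixels inside the patch
    patch_area -- |B|, number of pixels in the patch (961 for 31 x 31)
    road_area  -- |G|, number of pixels in the ground truth road region
    Returns 1 (positive), 0 (negative) or None (neither rule applies;
    treating such patches as unused is an assumption of this sketch).
    """
    reference = min(patch_area, road_area)
    if center_is_road and overlap >= pos_ratio * reference:
        return 1
    if not center_is_road and overlap < neg_ratio * reference:
        return 0
    return None


# A 31 x 31 patch centered on a road pixel and mostly covered by road.
print(label_patch(True, overlap=800, patch_area=961, road_area=5000))
```

With these thresholds, patches whose overlap falls between 20% and 80% of the reference area are ambiguous and receive no label in this sketch.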
2.2. Edge-preserving based on guided image filtering

The result of pixel-wise classification appears noisy and is not aligned with real road boundaries. To solve this problem, it is optimized by edge-preserving filtering based on guided image filtering [29] [35]. The guided filter is based on a local linear model which assumes that the filtering output $q$ can be represented as a linear transform of the guidance image $I$ in a local window $w_j$ of size $(2r + 1) \times (2r + 1)$ as follows:
\[ q_i = a_j I_i + b_j, \quad \forall i \in w_j \tag{1} \]
This model ensures $\nabla q \approx a \nabla I$, which means that the filtering output $q$ will have an edge only if the guidance image $I$ has an edge. To determine the coefficients $a_j$ and $b_j$, the output $q$ is modeled as the input $p$ minus some unwanted components $n$ such as noise or textures:
\[ q_i = p_i - n_i \tag{2} \]
A solution that minimizes the difference between $q$ and $p$ while maintaining the linear model (1) is proposed in [35]. The following cost function is minimized in the window $w_j$:
\[ E(a_j, b_j) = \sum_{i \in w_j} \left( (a_j I_i + b_j - p_i)^2 + \varepsilon a_j^2 \right) \tag{3} \]
where $\varepsilon$ is a regularization parameter deciding the degree of blurring of the guided filter. Figure 3 shows an illustration of the guided filtering process. We adopt the input image as the color guidance image. For the guided filter, (1) can be rewritten so that the filtering output at a pixel is expressed as a weighted average:
\[ q_i = \sum_j w_{i,j}(I)\, p_j \tag{4} \]
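To make the role of the cost function (3) concrete, the following sketch applies a guided filter to a 1-D signal. It uses the standard closed-form minimizer of (3), $a_j = \mathrm{cov}(I, p) / (\mathrm{var}(I) + \varepsilon)$ and $b_j = \bar{p}_j - a_j \bar{I}_j$, and then averages the coefficients of all windows covering a pixel. The 1-D grayscale setting, the border handling by window truncation and all names are simplifications assumed here; the paper filters 2-D maps with a color guidance image.

```python
def box_mean(values, r):
    """Mean over a window of radius r, truncated at the signal borders."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - r), min(n, i + r + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def guided_filter_1d(guide, p, r, eps):
    """Guided filtering of p with guidance signal guide (both 1-D lists)."""
    mean_I = box_mean(guide, r)
    mean_p = box_mean(p, r)
    corr_Ip = box_mean([i * v for i, v in zip(guide, p)], r)
    corr_II = box_mean([i * i for i in guide], r)
    # Closed-form minimizer of the cost function in every window.
    a = [(cip - mi * mp) / (cii - mi * mi + eps)
         for cip, cii, mi, mp in zip(corr_Ip, corr_II, mean_I, mean_p)]
    b = [mp - ak * mi for ak, mi, mp in zip(a, mean_I, mean_p)]
    # Average the coefficients of all windows that contain each pixel.
    mean_a = box_mean(a, r)
    mean_b = box_mean(b, r)
    return [ma * i + mb for ma, mb, i in zip(mean_a, mean_b, guide)]


guide = [0.0] * 8 + [1.0] * 8      # step edge in the guidance signal
noisy = list(guide)
noisy[3] = 1.0                     # isolated false "road" response
smoothed = guided_filter_1d(guide, noisy, r=2, eps=0.01)
print([round(v, 2) for v in smoothed])
```

In flat regions of the guidance signal the variance is zero, so the filter reduces to local averaging and the isolated response is damped, while the step edge of the guidance signal is kept sharp in the output.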
where $i$ and $j$ are pixel indexes and $p$ represents the result of pixel-wise classification. The filtering weight $w_{i,j}$ is chosen so that the filter can preserve the edges of the guidance image $I$. It can be expressed as follows:
\[ w_{i,j}(I) = \frac{1}{|w|^2} \sum_{k \in w_i,\, k \in w_j} \left( 1 + \frac{(I_i - \mu_k)(I_j - \mu_k)}{\sigma_k^2 + \varepsilon} \right) \tag{5} \]
where $w_i$ and $w_j$ are local windows around pixels $i$ and $j$, respectively, $\mu_k$ and $\sigma_k^2$ are the mean and variance of $I$ in $w_k$, and $|w|$ is the number of pixels in $w_k$. A 1-D step edge example is presented in Figure 4 to demonstrate the edge-preserving property of the filtering weight of the guided filter. As shown in Figure 4, when $I_i$ and $I_j$ are on the same side of an edge, the term $(I_i - \mu_k)(I_j - \mu_k)$ has a positive sign. However, if the two pixels are on different sides, the term has a negative sign, so pixels on opposite sides of an edge receive small weights and are not averaged together. Figure 5 shows an example of edge-preserving filtering. It shows that the noise in Figure 5(b) can be effectively smoothed. More importantly, the image obtained by edge-preserving filtering tends to be aligned with edges in the guidance image. Then a thresholding algorithm is applied to extract desirable road segments. The thresholding method can be expressed mathematically as
\[ q_i = \begin{cases} 1, & q_i > \text{threshold} \\ 0, & \text{otherwise} \end{cases} \tag{6} \]
Figure 3: Illustrations of the guided filtering process.
Figure 4: Example of 1-D step edge. Here, µ and σ are shown for a filtering kernel centered exactly at an edge.
Figure 5: Guided image filtering. (a) The guidance image I; (b) The result of pixel-wise classification; (c) An image obtained by edge-preserving filtering.
2.3. Post-processing based on shape features

After the above processes, false roads still exist. A refinement process is necessary to improve the reliability of road extraction. In general, roads have the following characteristics: 1) they do not have small areas, so regions with small areas can be regarded as noise and should be removed, and 2) they are narrow and long [36]. Based on these characteristics, shape features filtering [31] and multidirectional morphological filtering [37] are used to distinguish potential road segments from other segments. For holes caused by noise in some road regions, a morphological closing operator is applied to fill the holes. We apply the closing operation with a structure element whose size is 2 to 4 pixels in the high-resolution images.
A typical shape features filtering method uses the linear feature index (LFI). We use connected component analysis to divide pixels into connected components. Each component is then converted to a rectangle which satisfies
\[ LW = n_p \tag{7} \]
where $L$ is the length of the new rectangle, $W$ is the width of the new rectangle, and $n_p$ is the area of the road segment (i.e., its number of pixels).
The LFI can be calculated by
\[ \mathrm{LFI} = \frac{L}{W} = \frac{L}{n_p / L} = \frac{L^2}{n_p} \tag{8} \]
Roads should have large LFI values, so regions with small LFI values can be regarded as nonlinear features and are removed.
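As a small worked example of (7) and (8), the sketch below computes the LFI of two components and filters them with a threshold. How the length $L$ of a component is measured is not specified in this section, so it is taken as a given input here; the dictionary layout and the threshold value of 5 are hypothetical.

```python
def linear_feature_index(length, n_pixels):
    """LFI = L / W with W = n_p / L, i.e. L^2 / n_p."""
    return length * length / n_pixels


def equivalent_width(length, n_pixels):
    """Width W of the rectangle with the same area, from L * W = n_p."""
    return n_pixels / length


def keep_linear_components(components, min_lfi):
    """Keep components whose LFI reaches min_lfi (threshold is hypothetical)."""
    return [c for c in components
            if linear_feature_index(c["length"], c["n_pixels"]) >= min_lfi]


components = [
    {"name": "road-like strip", "length": 50, "n_pixels": 200},  # 50 x 4
    {"name": "compact blob", "length": 10, "n_pixels": 100},     # 10 x 10
]
for c in components:
    print(c["name"], linear_feature_index(c["length"], c["n_pixels"]))
print([c["name"] for c in keep_linear_components(components, min_lfi=5)])
```

The elongated strip has an LFI of 12.5 while the square blob has an LFI of 1, which is why a lower bound on the LFI separates road-like segments from compact false detections.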
In order to ensure the independence of each road candidate, directional morphological filtering is applied to eliminate neighboring non-road segments. We perform the morphological opening operation using the structure element $E_{L,\alpha}$. It can be expressed as
\[ f = \bigcup_{\alpha_i = -90^\circ}^{90^\circ} I \circ E_{L,\alpha_i} \tag{9} \]
\[ E_{L,\alpha_i} = \{(x_i, y_i)\}, \quad \begin{cases} y_i = x_i \tan\alpha_i, \; x_i = 0, \pm 1, \ldots, \pm \dfrac{(L-1)\cos\alpha_i}{2}, & \text{if } |\alpha_i| \le 45^\circ \\[2mm] x_i = y_i \cot\alpha_i, \; y_i = 0, \pm 1, \ldots, \pm \dfrac{(L-1)\sin\alpha_i}{2}, & \text{if } 45^\circ \le |\alpha_i| < 90^\circ \end{cases} \tag{10} \]
where $\alpha_i$ is the orientation angle, $L$ is the length of the line structure element, $I$ is the image after shape features filtering, and $\circ$ is the morphological opening operator.
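The line structure element of (10) can be generated as a list of integer pixel offsets. The sketch below follows the two cases of (10); rounding the coordinates to the pixel grid, truncating the half-length to an integer and treating $|\alpha_i| = 90^\circ$ with the second case are assumptions of this illustration.

```python
import math


def line_structuring_element(length, alpha_deg):
    """Integer offsets (x, y) of a line of the given length and orientation."""
    alpha = math.radians(alpha_deg)
    points = set()
    if abs(alpha_deg) <= 45:
        # Step along x and derive y = x * tan(alpha).
        half = int((length - 1) * math.cos(alpha) / 2)
        for x in range(-half, half + 1):
            points.add((x, round(x * math.tan(alpha))))
    else:
        # Step along y and derive x = y * cot(alpha).
        half = int((length - 1) * abs(math.sin(alpha)) / 2)
        for y in range(-half, half + 1):
            points.add((round(y / math.tan(alpha)), y))
    return sorted(points)


print(line_structuring_element(5, 0))
print(line_structuring_element(5, 45))
print(line_structuring_element(5, 90))
```

Opening the binary road map with each of these elements and taking the union, as in (9), keeps a pixel only if a line of length $L$ in some orientation fits inside the road region around it.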
2.4. Multiscale road centerlines extraction using Gabor filters and multiple directional non-maximum suppression
Road centerline extraction has been an active research topic. In order to produce a smoother and more accurate road network, multiscale Gabor filters [32] [33] and multiple directional non-maximum suppression [38] are introduced.
A 2-D even Gabor function is a Gaussian modulated by a cosine, as the following equation illustrates:
\[ h_{F,\theta}(x, y) = \frac{1}{2\pi\sigma^2} \exp\left\{ -\frac{x'^2 + y'^2}{2\sigma^2} \right\} \cos(2\pi F x') \tag{11} \]
\[ (x', y') = (x\cos\theta + y\sin\theta, \; -x\sin\theta + y\cos\theta) \tag{12} \]
\[ F = \frac{1}{2 \cdot \mathit{width}} \tag{13} \]
\[ \sigma = \sqrt{\frac{\ln 2}{2}} \; \frac{2^{B_F} + 1}{\pi F \left(2^{B_F} - 1\right)} \tag{14} \]
where $\theta$ and $F$ denote the orientation of the filter and the spatial center frequency, respectively, $\mathit{width}$ represents the width of a region, $\sigma$ is the standard deviation of the Gaussian envelope, and $B_F$ is the spatial frequency bandwidth. Figure 6 illustrates the support area of a symmetrical Gabor filter.
Figure 6: A 2-D even Gabor filter in spatial domain at orientation θ = 0 rendered as (a) a 2D intensity map and (b) a 3D surface.
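The filter parameters follow directly from the road width. The sketch below evaluates (13) and (14) for a given width and samples the even Gabor function of (11) and (12) at a point; the function names and the default $B_F = 1$ are choices made for this illustration.

```python
import math


def center_frequency(width):
    """Spatial center frequency F = 1 / (2 * width)."""
    return 1.0 / (2.0 * width)


def gabor_sigma(F, BF=1.0):
    """Standard deviation of the Gaussian envelope for bandwidth BF."""
    return (math.sqrt(math.log(2) / 2) / (math.pi * F)
            * (2 ** BF + 1) / (2 ** BF - 1))


def even_gabor(x, y, F, theta, BF=1.0):
    """Value of the 2-D even Gabor function at (x, y)."""
    sigma = gabor_sigma(F, BF)
    xr = x * math.cos(theta) + y * math.sin(theta)
    yr = -x * math.sin(theta) + y * math.cos(theta)
    envelope = math.exp(-(xr * xr + yr * yr) / (2 * sigma * sigma))
    return envelope * math.cos(2 * math.pi * F * xr) / (2 * math.pi * sigma ** 2)


F = center_frequency(10)
print(F, gabor_sigma(F))
print(even_gabor(0, 0, F, theta=0.0), even_gabor(5, 0, F, theta=0.0))
```

Because $F = 1 / (2 \cdot \mathit{width})$, the cosine term first reaches zero at $x' = \pm \mathit{width} / 2$, so the central positive lobe of the filter is exactly one road width wide. This is what makes the response peak when the filter is centered on a road of matching width.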
Centerline extraction from binary images aims to obtain a maximum response map, which should have large values on centerlines and gradually decrease as the distance of a point to the centerlines increases. In order to make the road centerline positions have local maximum values, we convolve the road segmentation image with a bank of Gabor filters tuned at different orientations and frequencies, and choose the maximum response among the different scales and orientations. Thus, the response maps $I_{res}(x, y)$ can be given as
\[ I_{res}(x, y) = |I * h_{F,\theta}(x, y)| \tag{15} \]
where $I$ denotes the road segmentation image and $*$ denotes the convolution operation. After obtaining the above convolution results, we find the maximum of the corresponding pixel values of the images $I_{res}(x, y)$ as
\[ I_{max} = \max_{F,\theta} I_{res}(x, y) \tag{16} \]
where $I_{max}$ denotes the maximum response map.
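The maximum response map of (15) and (16) can be sketched with a direct, unoptimized convolution. The example below is an illustration under stated assumptions: zero padding at the image border, a truncated kernel of hypothetical radius 8, a single scale and four orientations rather than the twelve orientations and five scales used in the experiments.

```python
import math


def even_gabor(x, y, width, theta, BF=1.0):
    """Even Gabor function with F = 1 / (2 * width)."""
    F = 1.0 / (2.0 * width)
    sigma = (math.sqrt(math.log(2) / 2) / (math.pi * F)
             * (2 ** BF + 1) / (2 ** BF - 1))
    xr = x * math.cos(theta) + y * math.sin(theta)
    yr = -x * math.sin(theta) + y * math.cos(theta)
    envelope = math.exp(-(xr * xr + yr * yr) / (2 * sigma * sigma))
    return envelope * math.cos(2 * math.pi * F * xr) / (2 * math.pi * sigma ** 2)


def max_gabor_response(img, widths, thetas, radius=8):
    """Pixel-wise maximum of |img * h| over all scales and orientations."""
    H, W = len(img), len(img[0])
    best = [[0.0] * W for _ in range(H)]
    offsets = range(-radius, radius + 1)
    for width in widths:
        for theta in thetas:
            kernel = [(dx, dy, even_gabor(dx, dy, width, theta))
                      for dy in offsets for dx in offsets]
            for y in range(H):
                for x in range(W):
                    acc = 0.0
                    for dx, dy, k in kernel:
                        yy, xx = y - dy, x - dx
                        if 0 <= yy < H and 0 <= xx < W:  # zero padding
                            acc += img[yy][xx] * k
                    best[y][x] = max(best[y][x], abs(acc))
    return best


# Binary image with a vertical road of width 5 (columns 8 to 12).
road = [[1 if 8 <= x <= 12 else 0 for x in range(21)] for _ in range(21)]
thetas = [k * math.pi / 4 for k in range(4)]
response = max_gabor_response(road, widths=[5], thetas=thetas)
middle_row = response[10]
print(middle_row.index(max(middle_row)))
```

For this synthetic road the strongest response in the middle row lies on the center column of the road, which is the property the subsequent non-maximum suppression relies on.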
Figure 7: The proposed road centerlines extraction method. (a) A binary image; (b) Maximum response map; (c) Final centerline result.
To get the road centerlines, multiple directional non-maximum suppression [38] is applied to the maximum response map. Non-maximum suppression is the process of setting to zero all pixels whose intensity is not maximal within a certain local neighbourhood. The shape of this local neighbourhood is usually a square or a rectangular window. We use eight linear windows at angles of 0°, 22.5°, 45°, 67.5°, 90°, 112.5°, 135° and 157.5°. As Figure 7 shows, our method can extract smooth and accurate centerlines.
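One possible reading of non-maximum suppression with eight linear windows is sketched below. The text does not say how the eight directions are combined, so this sketch assumes that a pixel is kept if it is a strict maximum along at least one window; the window radius of 2 and the border handling by clamping are also assumptions, and the method of [38] may differ.

```python
import math

ANGLES_DEG = [0, 22.5, 45, 67.5, 90, 112.5, 135, 157.5]


def line_offsets(angle_deg, radius):
    """Integer offsets (dx, dy) of a linear window, excluding the center."""
    a = math.radians(angle_deg)
    offsets = set()
    for t in range(-radius, radius + 1):
        if t != 0:
            offsets.add((round(t * math.cos(a)), round(t * math.sin(a))))
    offsets.discard((0, 0))
    return sorted(offsets)


def directional_nms(img, radius=2):
    """Keep pixels that are a strict maximum along at least one direction."""
    H, W = len(img), len(img[0])
    windows = [line_offsets(a, radius) for a in ANGLES_DEG]
    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            v = img[y][x]
            if v <= 0:
                continue
            for offsets in windows:
                # Neighbours outside the image are clamped to the border.
                if all(v > img[min(max(y + dy, 0), H - 1)]
                              [min(max(x + dx, 0), W - 1)]
                       for dx, dy in offsets):
                    out[y][x] = v
                    break
    return out


# Response map with a vertical ridge whose crest lies in column 3.
ridge = [[0, 1, 2, 3, 2, 1, 0] for _ in range(7)]
for row in directional_nms(ridge):
    print(row)
```

On this ridge only the crest column survives: a crest pixel is strictly larger than its neighbours in the horizontal window, while every flank pixel has an equal or larger neighbour in all eight windows.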
3. Experiments and analysis
Our experiments are performed on a PC with an i7-4790 3.6 GHz CPU and an Nvidia GeForce GTX Titan X GPU, using the MatConvNet toolbox. To demonstrate the performance of our approach, we run experiments on aerial images and discuss our results. For evaluation, we use two public datasets in our experiments. One is the EPFL dataset, which was developed by Turetken et al. [39]; the other is the Massachusetts Roads dataset. Some images have complex backgrounds and occlusions by trees. Figure 8(a) shows an aerial image. The result of pixel-wise classification based on CNN is shown in Figure 8(b). Figure 8(c) shows the image after edge-preserving filtering. Figure 8(d) shows the result of applying the thresholding algorithm to Figure 8(c). Figure 8(e) shows the image after post-processing based on shape features. The road centerline result is shown in Figure 8(f). In this experiment, we use even Gabor filters of twelve different orientations and five scales, corresponding to $\mathit{width} \in \{9, 10, 11, 12, 13\}$; $B_F$ is set to 1.
Figure 8: The result of each step. (a) Aerial image; (b) Road extraction result by CNN; (c) The image obtained by edge-preserving filtering; (d) Result by a thresholding algorithm; (e) Image after post-processing based on shape features; (f) Result of the proposed method.
We compare against SVM to test the classification performance. For each window representing a road or non-road sample, histogram-based measures, namely mean, standard deviation, skew, energy, kurtosis and entropy, are computed to form an 18-D feature vector for training the SVM. For a fair comparison, the same training samples are used to train the SVM. Figure 9 shows the comparison results. Table 2 and Figure 10 present the corresponding classification accuracy. As can be seen, our method achieves higher classification accuracy than the SVM-based method. A visual comparison reveals that the roads extracted by our method are more accurate than those extracted by SVM. Although the initial road network is noisy and not entirely aligned with real road boundaries, this can be addressed by the subsequent processing steps.
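The handcrafted baseline features can be illustrated as follows. The paper does not state how the 18 dimensions arise; this sketch assumes the six measures are computed for each of the three RGB channels. It also assumes common definitions for energy (sum of squared histogram probabilities) and entropy (Shannon entropy in bits), which may differ from those used in the experiments.

```python
import math
from collections import Counter


def histogram_features(values):
    """Mean, standard deviation, skew, energy, kurtosis and entropy."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        skew = kurtosis = 0.0  # convention chosen here for constant windows
    else:
        skew = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
        kurtosis = sum((v - mean) ** 4 for v in values) / (n * std ** 4)
    probs = [count / n for count in Counter(values).values()]
    energy = sum(p * p for p in probs)
    entropy = -sum(p * math.log2(p) for p in probs)
    return [mean, std, skew, energy, kurtosis, entropy]


def window_features(rgb_pixels):
    """Concatenate the six measures of each RGB channel (assumed 3 x 6 = 18)."""
    features = []
    for channel in range(3):
        features.extend(histogram_features([px[channel] for px in rgb_pixels]))
    return features


window = [(0, 10, 20), (0, 10, 20), (255, 10, 40), (255, 10, 60)]
print(len(window_features(window)))
```

Such descriptors summarize only the intensity distribution of a window and discard its spatial layout, which is one plausible reason why learned convolutional features separate roads from background more reliably.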
Figure 9: (a) (d) Original images; (b) (e) Road extraction results by SVM; (c) (f) Road extraction results by the pixel-wise classification with CNN.
Table 2: Comparison of the classification accuracy of SVM and our method.

Experiment     Methods       Classification accuracy
Figure 9(a)    SVM           78.13%
               Our method    88.70%
Figure 9(d)    SVM           75.85%
               Our method    88.23%

Figure 10: The classification accuracy of SVM and our method.

To quantify the performance, we use the following three accuracy measures proposed by Wiedemann et al. [40]:
\[ \text{Completeness} = \frac{TP}{TP + FN} \tag{17} \]
\[ \text{Correctness} = \frac{TP}{TP + FP} \tag{18} \]
\[ \text{Quality} = \frac{TP}{TP + FP + FN} \tag{19} \]
where $TP$ is the number of road pixels obtained by an extraction algorithm that coincide with the reference data, $FP$ is the number of obtained road pixels that are not in the reference data, and $FN$ is the number of road pixels that are in the reference data but not in the obtained result.
Figure 11: Comparison with different methods ([2], [39], [9] and [41]) on aerial images. (a1) and (a2) Aerial images; (b1) and (b2) Ground truth; (c1) and (c2) Results of Shao’s method [2]; (d1) and (d2) Results of Sironi’s method [39]; (e1) and (e2) Results of Shi’s method [9]; (f1) and (f2) Results of Liu’s method [41]; (g1) and (g2) Results of the proposed method; (h1) and (h2) Magnified sub-regions of the results obtained by different methods.
Figure 12: Comparison with different methods ([2], [39], [9] and [41]) on aerial images. (a3), (a4), (a5) and (a6) Aerial images; (b3), (b4), (b5) and (b6) Ground truth; (c3), (c4), (c5) and (c6) Results of Shao’s method [2]; (d3), (d4), (d5) and (d6) Results of Sironi’s method [39]; (e3), (e4), (e5) and (e6) Results of Shi’s method [9]; (f3), (f4), (f5) and (f6) Results of Liu’s method [41]; (g3), (g4), (g5) and (g6) Results of the proposed method.
To verify its performance, we compared the proposed algorithm with four
existing road extraction methods from the literature, introduced by Shao et
al. [2], Sironi et al. [39], Shi et al. [9] and Liu et al. [41]. Figure 11 and
Figure 12 give the comparison results of the different road extraction
methods. A visual comparison suggests that the completeness and correctness of
the proposed method are superior to those of the other methods. To evaluate
these methods quantitatively, the completeness, correctness, and quality of
each method are computed. The results are given in Table 3 and Figure 13. As
can be seen, for some images with complex backgrounds, Sironi’s method
achieves higher completeness than ours. However, the road centerlines
extracted by the proposed method are generally more correct than those of the
other methods. As a result, our proposed method achieves the highest Quality
values, where Quality is an overall evaluation index and a more general
measure of the final result that combines completeness and correctness. Shao’s
method obtains the lowest Quality values because it emphasizes the speed of
the algorithm at the expense of its effectiveness. Besides, the magnified
images in Figure 11 show that the centerlines extracted by our method contain
less noise. Overall, our method performs well in centerline extraction from
high-resolution aerial imagery.
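
The way Quality combines the other two measures can be checked numerically. The following is a minimal sketch, assuming the standard definitions of Wiedemann et al. [40] in terms of matched (TP), spurious (FP) and missed (FN) centerline length: completeness = TP/(TP+FN), correctness = TP/(TP+FP) and quality = TP/(TP+FP+FN). Under these assumptions, 1/quality = 1/completeness + 1/correctness − 1. The function name `quality_from` is hypothetical and is not taken from the authors' implementation.

```python
def quality_from(completeness: float, correctness: float) -> float:
    """Quality implied by completeness and correctness (all in [0, 1]).

    Assumes completeness = TP/(TP+FN), correctness = TP/(TP+FP) and
    quality = TP/(TP+FP+FN), which gives
    1/quality = 1/completeness + 1/correctness - 1.
    """
    if completeness <= 0.0 or correctness <= 0.0:
        return 0.0  # no matched road length, so quality is zero
    return 1.0 / (1.0 / completeness + 1.0 / correctness - 1.0)


# Proposed method on Figure 11(a1): completeness 96.23%, correctness 95.70%.
print(round(100 * quality_from(0.9623, 0.9570), 2))
```

With the completeness and correctness reported for the proposed method on Figure 11(a1), this relation gives a value that agrees with the Quality entry in Table 3 to two decimal places, which is a quick consistency check for any row of the table.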
To further verify the robustness of the proposed algorithm against different
complex backgrounds, we chose two images from the Massachusetts Roads dataset.
These two images, shown in Figure 14 (a1) and (a2), are 1500 × 1500 pixels in
size, and both have complex backgrounds. Figure 14 (b1) and (b2) show the
ground truth segmentation, and Figure 14 (c1) and (c2) show the roads
extracted by our method. Visually, our method extracts an accurate and
complete road network.

Table 3: The quantitative evaluation results of various methods.

Experiment      Methods                Completeness   Correctness   Quality
Figure 11(a1)   Shao’s method [2]      91.78%         84.61%        78.65%
                Sironi’s method [39]   98.86%         92.08%        91.12%
                Shi’s method [9]       94.69%         92.88%        88.28%
                Liu’s method [41]      95.46%         95.92%        91.74%
                Proposed method        96.23%         95.70%        92.24%

                Shao’s method [2]      93.89%         82.89%        78.65%
                Sironi’s method [39]   98.15%         87.96%        86.53%
                Liu’s method [41]      90.84%         90.79%        83.18%
                Proposed method        94.12%         91.77%        86.79%

                Shao’s method [2]      82.60%         65.33%        57.43%
                Sironi’s method [39]   87.67%         74.96%        67.81%
                Liu’s method [41]      88.20%         95.60%        84.80%
                Proposed method        93.91%         92.72%        87.46%

                Shao’s method [2]      89.28%         54.09%        50.79%
                Sironi’s method [39]   85.92%         76.88%        68.28%
                Liu’s method [41]      86.72%         78.34%        69.95%
                Proposed method        95.92%         84.80%        81.85%

                Shao’s method [2]      85.96%         57.53%        52.59%
                Sironi’s method [39]   92.77%         68.79%        65.29%
                Liu’s method [41]      82.58%         81.74%        69.72%
                Proposed method        95.58%         83.04%        79.97%

                Shao’s method [2]      85.43%         69.26%        61.94%
                Sironi’s method [39]   96.75%         70.27%        68.65%
                Liu’s method [41]      88.08%         84.37%        75.72%
                Proposed method        96.65%         91.84%        89.00%

Figure 13: The visualization of quantitative evaluation results.

A CNN consists of convolutional layers, pooling layers and fully connected
layers. The pooling and fully connected layers often take only 5-10% of the
computational time [42], so most of the computational time is consumed by the
convolutional layers. As shown in [42], the total time complexity of all
convolutional layers is

\[ O\left( \sum_{l=1}^{d} t_{l-1} \times v_l^{2} \times t_l \times m_l^{2} \right) \tag{20} \]

where $l$ is the index of a convolutional layer and $d$ is the depth (the
number of convolutional layers). $t_l$ is the number of filters in the $l$-th
layer, and $t_{l-1}$ is also known as the number of input channels of the
$l$-th layer. Moreover, $v_l$ and $m_l$ are the spatial sizes of the filter
and of the output feature map, respectively.
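
To make Eq. (20) concrete, the sum inside the O(·) can be evaluated directly for a given layer configuration. The sketch below is illustrative only: the function name `conv_time_complexity` and the example layer sizes are assumptions chosen for demonstration, not the network architecture used in this paper.

```python
from typing import Iterable, Tuple


def conv_time_complexity(in_channels: int,
                         layers: Iterable[Tuple[int, int, int]]) -> int:
    """Evaluate the sum inside Eq. (20).

    Each layer is a tuple (t, v, m): number of filters, filter side length,
    and output feature-map side length. The number of input channels of a
    layer is the filter count of the previous layer (in_channels for the
    first layer).
    """
    total = 0
    t_prev = in_channels
    for t, v, m in layers:
        total += t_prev * v * v * t * m * m
        t_prev = t
    return total


# Hypothetical two-layer network on a 3-channel input (sizes are made up).
example_layers = [(32, 5, 27), (64, 3, 11)]
print(conv_time_complexity(3, example_layers))
```

Each term grows quadratically in both the filter size and the output feature-map size, which is why the convolutional layers dominate the running time of the classification stage.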
Our proposed approach consists of four parts: pixel-wise classification with a
CNN, edge-preserving filtering, post-processing, and multiscale centerline
extraction. Hence, its computational complexity is the sum of the
computational complexities of the four parts. Although this makes the proposed
approach computationally expensive, it achieves good performance, and its
promising results make it worth further study.

4. Conclusion
In this paper, we present a method to extract road centerlines accurately from
high-resolution aerial images. A pixel-wise classification map is obtained by
the CNN, and edge-preserving filtering is used to optimize this map. Shape
feature filtering, multidirectional morphological filtering and hole filling
are used to improve the reliability of road extraction. The centerlines are
then extracted by multiscale Gabor filters and multiple directional
non-maximum suppression. Experimental results demonstrate the effectiveness of
the proposed method. However, the proposed method still has several flaws that
require further research. Its main limitation is that some of the extracted
centerlines are not single-pixel wide, so a more accurate method needs to be
studied.

Acknowledgment
The work was jointly supported by the National Natural Science Foundation of
China under grant Nos. 61772396, 61472302, 61772392 and 61672409, the
Fundamental Research Funds for the Central Universities under grant Nos.
JB170306 and JB170304, the Natural Science Foundation of Hebei Province of
China under grant No. F2018203096, and the China Postdoctoral Science
Foundation under grant No. 2017M611188.
Figure 14: Road area extraction on the Massachusetts Roads dataset. (a1) and (a2) Two
test areas; (b1) and (b2) Ground truth; (c1) and (c2) Corresponding road extraction results
produced by the proposed method.
References
[1] A. Baumgartner, C. Steger, H. Mayer, W. Eckstein, H. Ebner, Automatic road
extraction based on multi-scale, grouping, and context, Photogrammetric
Engineering and Remote Sensing 65 (1999) 777–786.
[2] Y. Shao, B. Guo, X. Hu, L. Di, Application of a fast linear feature
detector to road extraction from remotely sensed imagery, IEEE Journal of
Selected Topics in Applied Earth Observations and Remote Sensing 4 (3) (2011)
626–631.
[3] N. Yager, A. Sowmya, Support vector machines for road extraction from
remotely sensed images, in: International Conference on Computer Analysis of
Images and Patterns, Springer, 2003, pp. 285–292.
[4] M. Song, D. Civco, Road extraction using SVM and image segmentation,
Photogrammetric Engineering & Remote Sensing 70 (12) (2004) 1365–1371.
[5] Q. Zhang, I. Couloigner, Benefit of the angular texture signature for the
separation of parking lots and roads on high resolution multi-spectral
imagery, Pattern Recognition Letters 27 (9) (2006) 937–946.
[6] X. Huang, L. Zhang, Road centreline extraction from high-resolution
imagery based on multiscale structural features and support vector machines,
International Journal of Remote Sensing 30 (8) (2009) 1977–1987.
[7] S. Das, T. Mirnalinee, K. Varghese, Use of salient features for the design
of a multistage framework to extract roads from high-resolution multispectral
satellite images, IEEE Transactions on Geoscience and Remote Sensing 49 (10)
(2011) 3906–3931.
[8] J. D. Wegner, J. A. Montoya-Zegarra, K. Schindler, A higher-order CRF
model for road network extraction, in: Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, 2013, pp. 1698–1705.
[9] W. Shi, Z. Miao, J. Debayle, An integrated method for urban main-road
centerline extraction from optical remotely sensed imagery, IEEE Transactions
on Geoscience and Remote Sensing 52 (6) (2014) 3359–3372.
[10] J. Xu, R. Wang, S. Yue, Bio-inspired classifier for road extraction from
remote sensing imagery, Journal of Applied Remote Sensing 8 (1) (2014)
5946–5957.
[11] Z. Miao, W. Shi, P. Gamba, Z. Li, An object-based method for road network
extraction in VHR satellite images, IEEE Journal of Selected Topics in Applied
Earth Observations and Remote Sensing 8 (10) (2015) 4853–4862.
[12] G. Cheng, F. Zhu, S. Xiang, C. Pan, Road centerline extraction via
semisupervised segmentation and multidirection nonmaximum suppression, IEEE
Geoscience and Remote Sensing Letters 13 (4) (2016) 545–549.
[13] Z. Miao, W. Shi, A. Samat, G. Lisini, P. Gamba, Information fusion for
urban road extraction from VHR optical satellite images, IEEE Journal of
Selected Topics in Applied Earth Observations and Remote Sensing 9 (5) (2016)
1817–1829.
[14] M. Maboudi, J. Amini, M. Hahn, M. Saati, Road network extraction from VHR
satellite images using context aware object feature integration and tensor
voting, Remote Sensing 8 (8) (2016) 637.
[15] M. Maboudi, J. Amini, M. Hahn, M. Saati, Object-based road extraction
from satellite images using ant colony optimization, International Journal of
Remote Sensing 38 (1) (2017) 179–198.
[16] Y. Wei, Z. Wang, M. Xu, Road structure refined CNN for road extraction in
aerial image, IEEE Geoscience and Remote Sensing Letters 14 (5) (2017)
709–713.
[17] W. Liu, Z. Zhang, X. Chen, S. Li, Y. Zhou, Dictionary learning-based
Hough transform for road detection in multispectral image, IEEE Geoscience and
Remote Sensing Letters 14 (12) (2017) 2330–2334.
[18] R. Alshehhi, P. R. Marpu, Hierarchical graph-based segmentation for
extracting road networks from high-resolution satellite images, ISPRS Journal
of Photogrammetry and Remote Sensing 126 (2017) 245–260.
[19] A. Abdollahi, H. R. R. Bakhtiari, M. P. Nejad, Investigation of SVM and
level set interactive methods for road extraction from Google Earth images,
Journal of the Indian Society of Remote Sensing (2017) 1–8.
[20] Y. Zang, C. Wang, Y. Yu, L. Luo, K. Yang, J. Li, Joint enhancing filtering
for road network extraction, IEEE Transactions on Geoscience and Remote
Sensing 55 (3) (2017) 1511–1525.
[21] J. Li, Q. Hu, M. Ai, Unsupervised road extraction via a Gaussian mixture
model with object-based features, International Journal of Remote Sensing
39 (8) (2018) 2421–2440.
[22] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, T.
Darrell, DeCAF: A deep convolutional activation feature for generic visual
recognition, in: ICML, 2014, pp. 647–655.
[23] R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies
for accurate object detection and semantic segmentation, in: Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp.
580–587.
[24] C. Nebauer, Evaluation of convolutional neural networks for visual
recognition, IEEE Transactions on Neural Networks 9 (4) (1998) 685–696.
[25] Y. LeCun, K. Kavukcuoglu, C. Farabet, et al., Convolutional networks and
applications in vision, in: ISCAS, 2010, pp. 253–256.
[26] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with
deep convolutional neural networks, in: Advances in Neural Information
Processing Systems, 2012, pp. 1097–1105.
[27] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015)
436–444.
[28] G. Li, Y. Yu, Visual saliency based on multiscale deep features, in:
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
2015, pp. 5455–5463.
[29] X. Kang, S. Li, J. A. Benediktsson, Spectral–spatial hyperspectral image
classification with edge-preserving filtering, IEEE Transactions on Geoscience
and Remote Sensing 52 (5) (2014) 2666–2677.
[30] C. Poullis, S. You, Delineation and geometric modeling of road networks,
ISPRS Journal of Photogrammetry and Remote Sensing 65 (2) (2010) 165–181.
[31] Z. Miao, W. Shi, H. Zhang, X. Wang, Road centerline extraction from
high-resolution imagery based on shape features and multivariate adaptive
regression splines, IEEE Geoscience and Remote Sensing Letters 10 (3) (2013)
583–587.
[32] N. Sang, Q. Tang, X. Liu, W. Weng, Multiscale centerline extraction of
angiogram vessels using Gabor filters, in: International Conference on
Computational and Information Science, Springer, 2004, pp. 570–575.
[33] Z. Cao, X. Liu, B. Peng, Y.-S. Moon, DSA image registration based on
multiscale Gabor filters and mutual information, in: 2005 IEEE International
Conference on Information Acquisition, IEEE, 2005, pp. 6–pp.
[34] P. Sermanet, S. Chintala, Y. LeCun, Convolutional neural networks applied
to house numbers digit classification, in: Pattern Recognition (ICPR), 2012
21st International Conference on, IEEE, 2012, pp. 3288–3291.
[35] K. He, J. Sun, X. Tang, Guided image filtering, in: European Conference
on Computer Vision, Springer, 2010, pp. 1–14.
[36] P. P. Singh, R. Garg, A two-stage framework for road extraction from
high-resolution satellite images by using prominent features of impervious
surfaces, International Journal of Remote Sensing 35 (24) (2014) 8074–8107.
[37] T. Chao, T. Yihua, C. Huajie, et al., Object-oriented method of
hierarchical urban building extraction from high resolution remote sensing
imagery, Acta Geodaetica et Cartographica Sinica 39 (1) (2010) 39–45.
[38] C. Sun, P. Vallotton, Fast linear feature detection using multiple
directional non-maximum suppression, Journal of Microscopy 234 (2) (2009)
147–157.
[39] A. Sironi, V. Lepetit, P. Fua, Multiscale centerline detection by
learning a scale-space distance transform, in: 2014 IEEE Conference on
Computer Vision and Pattern Recognition, IEEE, 2014, pp. 2697–2704.
[40] C. Wiedemann, C. Heipke, H. Mayer, O. Jamet, Empirical evaluation of
automatically extracted road axes, Empirical Evaluation Techniques in Computer
Vision (1998) 172–187.
[41] R. Liu, Q. Miao, B. Huang, J. Song, J. Debayle, Improved road centerlines
extraction in high-resolution remote sensing images using shear transform,
directional morphological filtering and enhanced broken lines connection,
Journal of Visual Communication and Image Representation 40 (2016) 300–311.
[42] K. He, J. Sun, Convolutional neural networks at constrained time cost,
in: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, 2015, pp. 5353–5360.

Biography

Ruyi Liu received the B.S. degree from Shaanxi Normal University, Shaanxi,
China, in 2012. She is currently working toward the Ph.D. degree in the School
of Computer Science and Technology, Xidian University, Shaanxi, China. Her
current interests include image classification and segmentation, and computer
vision methods with applications in remote sensing. Email:
ruyi198901210121@126.com

Qiguang Miao received the M.Eng. and Doctor degrees in Computer Science from
Xidian University, China. He is currently working as a professor at the School
of Computer Science, Xidian University. His research interests include
intelligent information processing, intelligent image processing, and
multiscale geometric representations for images. Email: qgmiao@126.com

Jianfeng Song received his B.E. and M.Eng. degrees in Computer Application
Technology at Xidian University, China. He is currently working as a Lecturer
at the School of Computer Science and Technology, Xidian University. His
research interests include intelligent image processing and system security.
Email: jfsong@mail.xidian.edu.cn

Yining Quan is an associate professor and graduate student supervisor of the
School of Computer Science and Technology i[…] received his doctor degree in
cryptology from Xidian University in 2010. His research interests include
network computing and security. Email: ynquan@xidian.edu.cn

Yunan Li received his B.S. degree from the School of Computer Science and
Technology, Xidian University, Xi’an, China, in 2014. He is currently working
toward the Ph.D. degree at Xidian University. His research interests include
pattern recognition and digital image processing. Email: xdfzliyunan@163.com

Pengfei Xu is a lecturer at the School of Information Science and Technology,
Northwest University, China. His main research interests include image
processing and pattern recognition. Email: pfxu@nwu.edu.cn
