Accepted Manuscript: Neurocomputing
Ruyi Liu, Qiguang Miao, Jianfeng Song, Yining Quan, Yunan Li,
Pengfei Xu, Jing Dai
PII: S0925-2312(18)31224-4
DOI: https://doi.org/10.1016/j.neucom.2018.10.036
Reference: NEUCOM 20058
Please cite this article as: Ruyi Liu, Qiguang Miao, Jianfeng Song, Yining Quan, Yunan Li, Pengfei Xu,
Jing Dai, Multiscale road centerlines extraction from high-resolution aerial imagery, Neurocomputing
(2018), doi: https://doi.org/10.1016/j.neucom.2018.10.036
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service
to our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please
note that during the production process errors may be discovered which could affect the content, and
all legal disclaimers that apply to the journal pertain.
Ruyi Liu (a), Qiguang Miao (a,∗), Jianfeng Song (a), Yining Quan (a), Yunan Li (a), Pengfei Xu (b), Jing Dai (c)
(a) School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi, 710071, China
(b) School of Information Science and Technology, Northwest University, Xi’an, Shaanxi, 710127, China
(c) The hospital of Cheng Du military area authority, Chengdu, Sichuan, 610011, China
Abstract
Accurate road extraction from high-resolution aerial imagery has many applications, such as urban planning and vehicle navigation systems. Common road extraction methods are based on classification algorithms, which require robust handcrafted features for roads. However, designing such features is difficult. For the road centerline extraction problem, existing algorithms have limitations, such as spurs and high time consumption. To address these issues to some extent, we introduce feature learning based on deep learning [...] features from raw data automatically, especially the structural features. Then, edge-preserving filtering is conducted on the resulting classification map, with the original imagery serving as the guidance image. It is exploited to preserve the edges and the details of the road. After that, we do some post-processing based on shape features to extract more reliable roads. Finally, multiscale Gabor filters and multiple directional non-maximum suppression are integrated to get a complete and accurate road network. Experimental results show that the proposed method can achieve comparable or higher quantitative results, as well as more satisfactory visual performance.
Keywords: Convolutional Neural Network (CNN), Edge-preserving filtering, multiscale Gabor filters, centerlines extraction
∗ Corresponding author
Email address: qgmiao@126.com (Qiguang Miao)
URL: http://web.xidian.edu.cn/qgmiao/en/index.html (Qiguang Miao)
1. Introduction
Road information has become essential because it supports many applications, such as vehicle navigation systems, change detection and urban planning, so road extraction from high-resolution images is very important. In recent years, much effort has been made and various road extraction methods have been developed. Roads are modeled as a network of intersections and links between these intersections, and are found by grouping processes [1]. A scheme for road extraction in rural areas which integrates three different modules with specific strengths is presented. It gets good results for simple open areas with bright ribbon roads, but does not work well for images with complex backgrounds. Shao et al. [2] used a fast linear detector to extract road centerlines on high-contrast road pixels with increased performance. Besides linear features, road intersections are another signature in road networks. Owing to the simple road extraction operator of only two directions and the structure optimization algorithm, the algorithm can be realized rapidly. However, this method emphasizes the speed of the algorithm but disregards its effectiveness.
Most popular and successful methods rely on classification [3] [4] [5] [6] [7] [8] [9] [10] [11] [12]. Yager et al. applied SVM to road extraction from remote sensing images using edge-based features [3]. Song and Civco exploited smoothness and compactness to reduce the misclassification between roads and other objects in an SVM classifier [4]. To improve the accuracy of road extraction, Zhang and Couloigner [5] integrated traditional k-means clustering with
[...] deviation, skew, energy, and entropy [7]. Classification is applied to label the pixels or voxels as belonging to the structure of interest or to the background [8]. What makes the problem challenging is the complex structure of the prior: roads form a connected network of thin segments which meet at junctions and crossings. Such a priori knowledge is more difficult to turn into a tractable model than standard smoothness or co-occurrence assumptions. Shi et al. proposed an integrated method for urban main road centerline extraction. SVM, general adaptive neighborhood, local Geary’s C and local linear kernel smoothing regression were utilized [9]. Xu et al. presented a bio-inspired model for road extraction from remote sensing imagery. The model was an improved support vector machine (SVM) based on the pooling of feature vectors [10]. In order to obtain a comprehensive feature extraction method for road extraction, Miao et al. proposed a novel object-based automatic method [11]. This method used not only spectral information but also other spatial and spectral features derived using two different methods (i.e., expectation maximization clustering and linearness filtering). A novel multi-stage object-based approach for road extraction from VHR satellite images is proposed in [14]. Object-based information is embedded [...]
[...] straight road detection in multispectral images [17]. A new method for extracting roads from high-resolution imagery based on hierarchical graph-based image segmentation is presented in [18]. Abdollahi et al. proposed a new automatic method for road extraction by integrating the SVM and level set methods. The classification probability estimated by the SVM is used as input to the level set method [19]. Zang et al. proposed a task-oriented enhancing technique for extracting road networks from satellite images [20]. An unsupervised road detection method based on a Gaussian mixture model and object-based features is proposed in [21].
The aforementioned methods have achieved accurate road extraction from remote sensing imagery. They all depend on features that characterize the distinctiveness of a road region with respect to its surrounding area, so designing robust features is critical for the performance of road extraction. Recently, feature learning has been a topic of interest and considerable progress has been achieved. Features learned by deep learning have resulted in state-of-the-art performance in various classification tasks [22] [23] [24] [25] [26]. Deep learning is a machine learning approach which establishes deep hierarchical models to represent and analyze data. Deep learning methods mainly include Convolutional Neural Networks (CNN), deep belief networks (DBN) and stacked auto-encoders (SAE) [27]. CNN is a popular deep model and has been widely used in computer vision tasks such as object detection, image classification and image segmentation. In a CNN model, trainable filters and local neighborhood pooling operations are applied alternately on the raw input images. Such a multilayer architecture can extract robust features from raw pixels automatically. Features extracted by such a network are highly versatile and often more effective than traditional handcrafted features [28]. Inspired by this, we obtain features for roads by training a CNN. Our trained CNN can also classify raw pixels into road or non-road. During the last decade, edge-preserving filtering has been successfully applied to hyperspectral image classification in order to improve the classification accuracy [29]. Here, we also use edge-preserving filtering to smooth the classification map and recover the real road boundaries.
[...] thinning algorithm always produces some spurs. The Radon transform based method can only extract the centerlines of straight road segments. Due to the limitations of the Hough transform, some false centerlines exist. Some regression based methods fail to extract the centerlines of complicated junctions. To overcome the aforementioned shortcomings of centerline extraction algorithms, we use multiscale Gabor filters [32] [33] to enhance the centerlines. This helps to find the accurate location of centerlines. The main contributions of our approach are highlighted as follows. a) We use a CNN to capture local contrast, texture and shape information, and to predict the label of each pixel without the need for hand-crafted features. b) We introduce edge-preserving filtering based on guided image filtering to make the initial road map align with real road boundaries. c) A multiscale centerlines extraction using Gabor filters and multiple [...] road centerlines are extracted by multiscale Gabor filters and multiple directional non-maximum suppression. The rest of this paper is organized as follows.
2. Proposed methodology
[...] on shape features, and multiscale road centerlines extraction using Gabor filters and multiple directional non-maximum suppression. The organization of this method is shown in Figure 1. The details of each step are presented as follows.
[Figure 1: flowchart of the method. Recoverable labels: Input image; Patches; Predictions; Initial road map by pixel-wise classification based on CNN; Edge-preserving filtering; Process; Final road centerlines map.]
[...] divided into three types: convolution layer, pooling layer and multi-layer perceptron layer. All the convolution and pooling layers compose the feature extractor of the CNN. After extracting features with a multilayer convolutional network, fully connected layers with a classifier are added to output class predictions. A large number of labeled samples are needed to train a CNN that generalizes well. Given an image $I$, the input image patches are represented as $x_O = \{x_1, x_2, \ldots, x_s\}$, where $s$ is the number of training samples. These input samples are obtained by extracting the patches centered on the pixels $O = \{o_1, o_2, \ldots, o_s\}$. For the input image patches $x_O = \{x_1, x_2, \ldots, x_s\}$, the corresponding network output is expressed as $y = \{y_1, y_2, \ldots, y_s\}$. Each $y_s$ takes its value from a finite set of classes $\Omega_1 = \{1, 2, \ldots, K\}$, where $K$ is the number of classes. The labels in our dataset are binary, i.e., road or non-road. For real-world [...] five layers, with three convolutional layers and two fully connected layers. Each layer contains learnable parameters. The network takes an RGB image patch of 31 × 31 pixels as input, and uses a softmax regression model as the output layer to generate the probabilities of the central pixel being road and non-road. The architecture details are listed in Table 1.

Table 1: Architecture details of our network. C: convolutional layer; BN: batch normalization, which reduces internal covariate shift and in doing so dramatically accelerates the training of deep neural nets; P: pooling; F: fully connected layer; R: ReLU, the most commonly used activation function in deep learning models, which returns 0 for any negative input and returns x for any positive value x; S: softmax; Channels: the number of output feature maps; Input size: the spatial size of input feature maps.

Layer            1         2         3         4       5 (Output)
Type             C+BN+P    C+BN+P    C+BN+R    F       F+S
Channels         12        24        50        50      2
Filter size      6×6       6×6       4×4       -       -
Pooling size     2×2       2×2       -         -       -
Pooling stride   2         2         -         -       -
Input size       31×31     13×13     4×4       1×1     1×1
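The input sizes in Table 1 can be checked with simple arithmetic. The sketch below assumes unpadded convolutions with stride 1, which the text does not state explicitly; the helper names `layer_output_size` and `input_sizes` are hypothetical and only serve this illustration.

```python
def layer_output_size(size, filter_size, pool_size=None, pool_stride=None):
    """Spatial size after one convolution and an optional pooling step."""
    # Assumption: no padding and stride 1 for the convolution.
    size = size - filter_size + 1
    if pool_size is not None:
        size = (size - pool_size) // pool_stride + 1
    return size


# (filter size, pooling size, pooling stride) of the three convolutional
# layers, copied from Table 1; layer 3 has no pooling.
CONV_LAYERS = [(6, 2, 2), (6, 2, 2), (4, None, None)]


def input_sizes(patch_size=31):
    """Input size of each convolutional layer, followed by the final map size."""
    sizes = [patch_size]
    for filter_size, pool_size, pool_stride in CONV_LAYERS:
        sizes.append(layer_output_size(sizes[-1], filter_size, pool_size, pool_stride))
    return sizes


print(input_sizes())  # [31, 13, 4, 1]
```

Under these assumptions the sizes 31, 13, 4 and 1 match the "Input size" row of Table 1, so the third convolution already reduces each patch to a 1 × 1 feature vector that feeds the fully connected layers.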
For the datasets used in our paper, we randomly select 1% of the labeled patches of size 31 × 31 as training samples. All patches of size 31 × 31 centered on the pixels are used as testing samples. To label the training patches, we mainly consider the ground truth of their central pixels as well as the overlaps between the patches and the ground truth road mask. A patch B is labeled as a positive training example if i) its central pixel belongs to the road class, and ii) it sufficiently overlaps with the ground truth road region G: |B ∩ G| ≥ 0.8 × min(|B|, |G|). Similarly, a patch B is labeled as a negative training example if i) its central pixel is located within the background, and ii) its overlap with the ground truth road region is less than a predefined threshold: |B ∩ G| < 0.2 × min(|B|, |G|). In order to better understand the CNN, we show the feature maps of the convolutional layers in Figure 2. Figure 2(a) is the original image patch. Figure 2(b) and (c) are feature maps of the first and the second convolutional layers, respectively. It is observed that the CNN can learn the features well. Besides, the extracted features become more abstract and global as the layers get deeper.
Figure 2: Feature maps of different convolutional layers. (a) The original patch with the size of 31 × 31; (b) The feature maps of the first convolutional layer; (c) The feature maps of the second convolutional layer.
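The two labeling rules for training patches can be written down as a short routine. This is a minimal sketch rather than the authors' implementation: it assumes that G is the set of all road pixels in the ground truth mask, that the patch lies fully inside the image, and that patches satisfying neither rule are simply not used; `label_patch` is a hypothetical name.

```python
def label_patch(mask, row, col, half=15, pos_ratio=0.8, neg_ratio=0.2):
    """Return 1 (road), 0 (non-road) or None (neither rule applies).

    mask is a binary ground truth road mask given as a list of rows (1 = road).
    The patch B is the (2*half+1) x (2*half+1) window centred on (row, col).
    """
    size = 2 * half + 1
    patch_area = size * size                               # |B|
    road_area = sum(sum(values) for values in mask)        # |G| (assumed: all road pixels)
    overlap = sum(                                         # |B ∩ G|
        mask[r][c]
        for r in range(row - half, row + half + 1)
        for c in range(col - half, col + half + 1)
    )
    limit = min(patch_area, road_area)
    if mask[row][col] == 1 and overlap >= pos_ratio * limit:
        return 1
    if mask[row][col] == 0 and overlap < neg_ratio * limit:
        return 0
    return None


# Toy example with 3 x 3 patches (half=1) on a 5 x 5 mask.
wide_road = [[0] * 5] + [[1] * 5 for _ in range(3)] + [[0] * 5]
print(label_patch(wide_road, 2, 2, half=1))  # 1
```

With the default `half=15` the window is the 31 × 31 patch used in the text.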
The result of pixel-wise classification appears noisy and is not aligned with real road boundaries. To solve this problem, it is refined by edge-preserving filtering based on guided image filtering [29] [35]. The guided filter is based on a local linear model which assumes that the filtering output $q$ can be represented as a linear transform of the guidance image $I$ in a local window $w_j$ of size $(2r + 1) \times (2r + 1)$ as follows:
$$q_i = a_j I_i + b_j, \quad \forall i \in w_j \tag{1}$$
This model ensures $\nabla q \approx a \nabla I$, which means that the filtering output $q$ will have an edge only if the guidance image $I$ has an edge. To determine the coefficients $a_j$ and $b_j$, the output $q$ is modeled as the input $p$ minus some unwanted components $n$ such as noise or textures:
$$q_i = p_i - n_i \tag{2}$$
A solution that minimizes the difference between $q$ and $p$ while maintaining the linear model (1) is proposed in [35]. The following cost function is minimized in the window $w_j$:
$$E(a_j, b_j) = \sum_{i \in w_j} \left( (a_j I_i + b_j - p_i)^2 + \varepsilon a_j^2 \right) \tag{3}$$
where $\varepsilon$ is a regularization parameter deciding the degree of blurring for the guided filter. Figure 3 shows an illustration of the guided filtering process. We adopt the input image as the color guidance image. For the guided filter, (1) can be represented in a weighted average form as (4), so the filtering output at a pixel [...] where $i$ and $j$ are pixel indexes and $p$ represents the result of pixel-wise classification. The filtering weight $w_{i,j}$ is chosen so that the filter can preserve edges of [...]
$$w_{i,j}(I) = \frac{1}{|w|^2} \sum_{k \in w_i,\, k \in w_j} \left( 1 + \frac{(I_i - \mu_k)(I_j - \mu_k)}{\sigma_k^2 + \varepsilon} \right) \tag{5}$$
where $w_i$ and $w_j$ are local windows around pixels $i$ and $j$, respectively, $\mu_k$ and $\sigma_k^2$ are the mean and variance of $I$ in $w_k$, and $|w|$ is the number of pixels in [...]
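The behaviour of the local linear model (1) and the cost (3) can be illustrated on a one-dimensional signal. The sketch below is a simplification under stated assumptions: it uses a grayscale 1-D guidance signal instead of the color guidance image adopted in the paper, and the closed-form coefficients are those of the guided filter in [35], not something derived in the text above. The names `box_mean` and `guided_filter_1d` are hypothetical.

```python
def box_mean(values, r):
    """Mean over the window [i - r, i + r], clipped at the signal borders."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - r), min(n, i + r + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def guided_filter_1d(guide, p, r=2, eps=1e-3):
    """Filter p with guidance I under the model q_i = a_j * I_i + b_j."""
    mean_i = box_mean(guide, r)
    mean_p = box_mean(p, r)
    mean_ip = box_mean([g * v for g, v in zip(guide, p)], r)
    mean_ii = box_mean([g * g for g in guide], r)
    a, b = [], []
    for mi, mp, mip, mii in zip(mean_i, mean_p, mean_ip, mean_ii):
        var_i = mii - mi * mi
        a_j = (mip - mi * mp) / (var_i + eps)  # minimiser of the cost (3), from [35]
        a.append(a_j)
        b.append(mp - a_j * mi)
    # Each pixel is covered by several windows, so the coefficients are averaged.
    mean_a = box_mean(a, r)
    mean_b = box_mean(b, r)
    return [ma * g + mb for ma, mb, g in zip(mean_a, mean_b, guide)]


# A noisy road probability profile filtered with a clean step edge as guidance.
guide = [0.0] * 6 + [1.0] * 6
noisy = [0.1, 0.0, 0.2, 0.0, 0.1, 0.0, 0.9, 1.0, 0.8, 1.0, 0.9, 1.0]
smoothed = guided_filter_1d(guide, noisy)
```

In flat regions of the guidance the variance is zero, so `a_j` vanishes and the output is the local mean of `p`; across the step the output follows the guidance, which is the edge-preserving behaviour described in the text.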
Figure 3: Illustrations of the guided filtering process.
[...] filtering. It shows that the noise in Figure 5(b) can be effectively smoothed. More importantly, the image obtained by edge-preserving filtering tends to be aligned with edges in the guidance image. Then a thresholding algorithm is applied to extract desirable road segments. The thresholding method can be expressed mathematically as
$$q_i = \begin{cases} 1, & q_i > \text{threshold} \\ 0, & \text{otherwise} \end{cases} \tag{6}$$
Figure 4: Example of 1-D step edge. Here, µ and σ are shown for a filtering kernel centered exactly at an edge.
Figure 5: Guided image filtering. (a) The guidance image I; (b) The result of pixel-wise classification; (c) An image obtained by edge-preserving filtering.
After the above processes, false roads still exist. A refinement process is necessary to improve the reliability of road extraction. In general, roads have the following characteristics: 1) they do not have small areas, so regions with small areas can be regarded as noise and should be removed, and 2) they are narrow and long [36]. In terms of roads’ characteristics, shape features filtering [31] [...]
A typical shape features filtering method uses the linear feature index (LFI). We use connected component analysis to divide pixels into connected components. Each component is then converted to a rectangle which satisfies
$$LW = n_p \tag{7}$$
where $L$ is the length of the new rectangle, $W$ is the width of the new rectangle, and $n_p$ is the area of the road segment (also known as the pixel number).
The LFI can be calculated by
$$\mathrm{LFI} = \frac{L}{W} = \frac{L}{n_p / L} = \frac{L^2}{n_p} \tag{8}$$
According to the characteristics of roads, they should have large values of LFI, so regions with small values of LFI can be regarded as nonlinear features and are removed.
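The LFI filtering step can be sketched as follows. Two choices here are assumptions, since the text does not specify them: components are taken as 8-connected, and the length L is approximated by the diagonal of the axis-aligned bounding box. The threshold passed as `min_lfi` is likewise hypothetical.

```python
import math
from collections import deque


def connected_components(mask):
    """Group foreground pixels of a binary mask into 8-connected components."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for r0 in range(h):
        for c0 in range(w):
            if mask[r0][c0] == 0 or seen[r0][c0]:
                continue
            seen[r0][c0] = True
            queue, pixels = deque([(r0, c0)]), []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < h and 0 <= cc < w and mask[rr][cc] and not seen[rr][cc]:
                            seen[rr][cc] = True
                            queue.append((rr, cc))
            components.append(pixels)
    return components


def linear_feature_index(pixels):
    """LFI = L^2 / n_p as in (8); L is the bounding-box diagonal (assumption)."""
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    length = math.hypot(max(rows) - min(rows) + 1, max(cols) - min(cols) + 1)
    return length ** 2 / len(pixels)


def filter_by_lfi(mask, min_lfi):
    """Keep only the components whose LFI is at least min_lfi."""
    out = [[0] * len(mask[0]) for _ in mask]
    for pixels in connected_components(mask):
        if linear_feature_index(pixels) >= min_lfi:
            for r, c in pixels:
                out[r][c] = 1
    return out
```

For example, a 1 × 10 pixel line gives an LFI of about 10.1 under this approximation, while a 3 × 3 blob gives 2.0, so a threshold between the two keeps the elongated segment and removes the compact one.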
In order to ensure the independence of each road candidate, directional morphological filtering is applied to eliminate these neighboring non-road segments.
$$f = \bigcup_{\alpha_i = -90^\circ}^{90^\circ} I \circ E_{L,\alpha_i} \tag{9}$$
$$E_{L,\alpha_i} = (x_i, y_i), \quad \begin{cases} y_i = x_i \tan\alpha_i, \; x_i = 0, \pm 1, \ldots, \pm \dfrac{(L-1)\cos\alpha_i}{2}, & \text{if } |\alpha_i| \le 45^\circ \\[2mm] x_i = y_i \cot\alpha_i, \; y_i = 0, \pm 1, \ldots, \pm \dfrac{(L-1)\sin\alpha_i}{2}, & \text{if } 45^\circ \le |\alpha_i| < 90^\circ \end{cases} \tag{10}$$
where $\alpha_i$ is the orientation angle, $L$ in (9) and (10) stands for $L_{se}$, the length of the line structure element, $I$ is the image after shape features filtering, and $\circ$ is the morphological opening operator.
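The directional opening of (9) and (10) can be sketched in a few lines. This is an illustration under assumptions: offsets are rounded to the nearest pixel, the vertical case of 90 degrees is handled by the second branch of (10), and the set of angles passed to `directional_opening` is chosen by the caller because the angular step is not given in the text. All function names are hypothetical.

```python
import math


def line_structuring_element(length, angle_deg):
    """Pixel offsets (x, y) of a line element of given length and orientation, after (10)."""
    alpha = math.radians(angle_deg)
    offsets = set()
    if abs(angle_deg) <= 45:
        half = int(round((length - 1) * math.cos(alpha) / 2))
        for x in range(-half, half + 1):
            offsets.add((x, int(round(x * math.tan(alpha)))))
    else:
        half = int(round((length - 1) * abs(math.sin(alpha)) / 2))
        cot = math.cos(alpha) / math.sin(alpha)
        for y in range(-half, half + 1):
            offsets.add((int(round(y * cot)), y))
    return sorted(offsets)


def opening(mask, offsets):
    """Binary opening: erosion followed by dilation with the same symmetric element."""
    h, w = len(mask), len(mask[0])

    def inside(r, c):
        return 0 <= r < h and 0 <= c < w

    eroded = [[int(all(inside(r + dy, c + dx) and mask[r + dy][c + dx] for dx, dy in offsets))
               for c in range(w)] for r in range(h)]
    opened = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if eroded[r][c]:
                for dx, dy in offsets:
                    opened[r + dy][c + dx] = 1
    return opened


def directional_opening(mask, length, angles_deg):
    """Union over orientations of the openings of mask, as in (9)."""
    h, w = len(mask), len(mask[0])
    result = [[0] * w for _ in range(h)]
    for angle in angles_deg:
        opened = opening(mask, line_structuring_element(length, angle))
        for r in range(h):
            for c in range(w):
                result[r][c] |= opened[r][c]
    return result
```

A segment survives if a line element of the chosen length fits inside it in at least one orientation, so elongated road candidates are kept while small blobs attached to or near them are removed.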
2.4. Multiscale road centerlines extraction using Gabor filters and multiple directional Non-maximum suppression
Figure 6: A 2-D even Gabor filter in spatial domain at orientation θ = 0 rendered as (a) a 2D intensity map and (b) a 3D surface.
$$(x', y') = (x \cos\theta + y \sin\theta,\; -x \sin\theta + y \cos\theta) \tag{12}$$
$$F = \frac{1}{2 \cdot width} \tag{13}$$
$$\sigma = \sqrt{\frac{\ln 2}{2}} \cdot \frac{2^{B_F} + 1}{\pi F \left(2^{B_F} - 1\right)} \tag{14}$$
where $\theta$ and $F$ denote the orientation of the filter and the spatial center frequency, respectively. $width$ represents the width of a region. $\sigma$ is the standard [...] centerlines positions have local maximum values, we convolve the road segmentation image with a bank of Gabor filters tuned at different orientations and frequencies, and choose the maximum response among the different scales and [...]
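Equations (12) to (14) translate directly into code. In the sketch below, `gabor_parameters` and `rotate` follow (13), (14) and (12); the kernel function `even_gabor` is an assumption, because the kernel equation itself is not available in the text: it uses a standard cosine Gabor with an isotropic Gaussian envelope. The widths and the bandwidth value used in the example are the ones reported in the experiments.

```python
import math


def gabor_parameters(width, bandwidth=1.0):
    """Centre frequency F of (13) and standard deviation sigma of (14)."""
    freq = 1.0 / (2.0 * width)
    ratio = (2.0 ** bandwidth + 1.0) / (2.0 ** bandwidth - 1.0)
    sigma = math.sqrt(math.log(2.0) / 2.0) * ratio / (math.pi * freq)
    return freq, sigma


def rotate(x, y, theta):
    """Rotated coordinates (x', y') of (12)."""
    return (x * math.cos(theta) + y * math.sin(theta),
            -x * math.sin(theta) + y * math.cos(theta))


def even_gabor(x, y, theta, width, bandwidth=1.0):
    """Even (cosine) Gabor value at (x, y); the isotropic envelope is an assumption."""
    freq, sigma = gabor_parameters(width, bandwidth)
    xr, yr = rotate(x, y, theta)
    envelope = math.exp(-(xr * xr + yr * yr) / (2.0 * sigma * sigma))
    return envelope * math.cos(2.0 * math.pi * freq * xr)


# Parameters for the five scales used in the experiments (B_F = 1).
for width in (9, 10, 11, 12, 13):
    freq, sigma = gabor_parameters(width)
    print(width, round(freq, 4), round(sigma, 2))
```

Because $F = 1/(2 \cdot width)$, half a period of the cosine equals the road width, so the positive central lobe of the filter covers a road of that width and the response peaks near its centerline.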
[...] where $I$ denotes the road segmentation image and $*$ denotes the convolution operation. After obtaining the above convolution results, we find the maximum of the corresponding pixel values of the images $I_{res}(x, y)$ as [...] where $I_{max}$ denotes the maximum response map.
Figure 7: The proposed road centerlines extraction method. (a) A binary image; (b) Maximum response map; (c) Final centerline result.
[...] is the process of marking all pixels whose intensity is not maximal within a certain local neighbourhood as zero. The shape of this local neighbourhood is usually a square or a rectangular window. We use eight linear windows at angles of 0°, 22.5°, 45°, 67.5°, 90°, 112.5°, 135° and 157.5°. As Figure 7 shows, our method can extract smooth and accurate centerlines.
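Non-maximum suppression with eight linear windows can be sketched as below. Several details are assumptions, since the text does not give them: the window half-length `radius`, the rule that a pixel is kept if it is a maximum in at least one direction, the requirement that it be strictly larger than at least one neighbour (to discard flat regions), and the choice to skip a direction whose window leaves the image.

```python
import math

ANGLES_DEG = [0.0, 22.5, 45.0, 67.5, 90.0, 112.5, 135.0, 157.5]


def line_offsets(angle_deg, radius):
    """Integer (dx, dy) offsets of a linear window, excluding the centre pixel."""
    a = math.radians(angle_deg)
    offsets = []
    for t in range(1, radius + 1):
        dx = int(round(t * math.cos(a)))
        dy = int(round(t * math.sin(a)))
        for off in ((dx, dy), (-dx, -dy)):
            if off != (0, 0) and off not in offsets:
                offsets.append(off)
    return offsets


def multi_directional_nms(image, radius=2):
    """Zero every pixel that is not a local maximum along any of the eight directions."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    windows = [line_offsets(a, radius) for a in ANGLES_DEG]
    for r in range(h):
        for c in range(w):
            v = image[r][c]
            if v <= 0:
                continue
            for offsets in windows:
                neighbours = []
                inside = True
                for dx, dy in offsets:
                    rr, cc = r + dy, c + dx
                    if not (0 <= rr < h and 0 <= cc < w):
                        inside = False  # assumption: ignore windows that leave the image
                        break
                    neighbours.append(image[rr][cc])
                if inside and all(v >= n for n in neighbours) and any(v > n for n in neighbours):
                    out[r][c] = v
                    break
    return out


# A vertical ridge: only the central column should survive.
ridge = [[1, 2, 5, 2, 1] for _ in range(5)]
thinned = multi_directional_nms(ridge, radius=1)
```

On the toy ridge, the central column is a maximum along the horizontal window and is kept, while its flanks are suppressed in every direction, which is the thinning effect that turns the maximum response map into centerlines.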
Our experiments are performed on a PC with an i7-4790 3.6 GHz CPU and an Nvidia GeForce GTX Titan X GPU, using the MatConvNet toolbox. To demonstrate the performance of our approach, we conduct experiments with aerial images and discuss our results. For evaluation, we use two public datasets. One is the EPFL dataset, which was developed by Turetken et al. [39]; the other is the Massachusetts Roads dataset. Some images contain complex backgrounds and occlusions by trees. Figure 8(a) shows an aerial image. The result of pixel-wise classification based on CNN is shown in Figure 8(b). Figure 8(c) shows the image after edge-preserving filtering. Figure 8(d) shows the result of applying a thresholding algorithm to Figure 8(c). Figure 8(e) shows the image after post-processing based on shape features. The road centerlines result is shown in Figure 8(f). In this experiment, we use even Gabor filters of twelve different orientations and five scales, $width \in \{9, 10, 11, 12, 13\}$, and $B_F$ is set to 1.
Figure 8: The result of each step. (a) Aerial image; (b) Road extraction result by CNN; (c) The image obtained by edge-preserving filtering; (d) Result by a thresholding algorithm; (e) Image after post-processing based on shape features; (f) Result of the proposed method.
[...] responding classification accuracy. As can be seen, our method achieves higher classification accuracy than the SVM based method, which uses the mean, standard deviation, skew, energy, kurtosis and entropy of samples. A visual comparison reveals that the road extracted by our method is more accurate than that extracted by SVM. Although the initial road network is noisy and not entirely aligned with real road boundaries, this can be solved by the subsequent processing procedure.
Figure 9: (a) (d) Original images; (b) (e) Road extraction results by SVM; (c) (f) Road extraction results by the pixel-wise classification with CNN.
Table 2: Comparison of the classification accuracy of SVM and our method.

Experiment     Methods       Classification accuracy
Figure 9(a)    SVM           78.13%
Figure 9(a)    Our method    88.70%
Figure 9(d)    SVM           75.85%
Figure 9(d)    Our method    88.23%

For quantifying the performance, we use the following three accuracy measures [...]
$$\mathrm{Completeness} = \frac{TP}{TP + FN} \tag{17}$$
$$\mathrm{Correctness} = \frac{TP}{TP + FP} \tag{18}$$
$$\mathrm{Quality} = \frac{TP}{TP + FP + FN} \tag{19}$$
where $TP$ is the number of road pixels obtained by an extraction algorithm which coincide with the reference data, $FP$ is the number of obtained road pixels which are not in the reference data, and $FN$ is the number of road pixels which are in the reference data but not in the obtained result.
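The three measures (17) to (19) can be computed from two binary masks as in the sketch below. It assumes strict pixel-wise matching between the extracted result and the reference data; whether a spatial tolerance is used in the paper's evaluation is not stated in the available text. The function name `evaluate` is hypothetical.

```python
def evaluate(extracted, reference):
    """Return (completeness, correctness, quality) for two binary masks."""
    tp = fp = fn = 0
    for row_e, row_r in zip(extracted, reference):
        for e, r in zip(row_e, row_r):
            if e and r:
                tp += 1        # extracted and present in the reference
            elif e:
                fp += 1        # extracted but absent from the reference
            elif r:
                fn += 1        # in the reference but missed
    completeness = tp / (tp + fn) if tp + fn else 0.0
    correctness = tp / (tp + fp) if tp + fp else 0.0
    quality = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return completeness, correctness, quality


# One hit, one false alarm and one miss.
print(evaluate([[1, 1, 0, 0]], [[1, 0, 1, 0]]))
```

Quality penalizes both false alarms and misses, so it can never exceed either completeness or correctness.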
Figure 11: Comparison with different methods ([2], [39], [9] and [41]) on aerial images. (a1) and (a2) Aerial images; (b1) and (b2) Ground truth; (c1) and (c2) Results of Shao’s method [2]; (d1) and (d2) Results of Sironi’s method [39]; (e1) and (e2) Results of Shi’s method [9]; (f1) and (f2) Results of Liu’s method [41]; (g1) and (g2) Results of the proposed method; (h1) and (h2) Results of magnified images of the sub-regions obtained by different methods.
[...] (a3), (a4), (a5) and (a6) Aerial images; (b3), (b4), (b5) and (b6) Ground truth; (c3), (c4), (c5) and (c6) Results of Shao’s method [2]; (d3), (d4), (d5) and (d6) Results of Sironi’s method [39]; (e3), (e4), (e5) and (e6) Results of Shi’s method [9]; (f3), (f4), (f5) and (f6) Results of Liu’s method [41]; (g3), (g4), (g5) and (g6) Results of the proposed method.
[...] extraction methods. A visual comparison reveals that the completeness and correctness of the proposed method are superior to those of the other methods. To evaluate these methods quantitatively, the completeness, correctness, and quality of each method are computed. The results are given in Table 3 and Figure 13. As can be seen, for the image with complex background, the completeness of the road extracted by Sironi’s method is higher than ours. The road centerlines extracted by the proposed method are more correct than the others. Thus our proposed method achieves relatively the highest Quality values; Quality is an overall evaluation index and a more general measure of the final result, combining completeness and correctness. Shao’s method gets the lowest Quality values because it emphasizes the speed of the algorithm but disregards its effectiveness. Besides, from the magnified images in Figure 11 we can see that the centerlines extracted by our method have less noise. Our method performs well in centerline extraction from high-resolution aerial imagery. To further verify the robustness of the proposed algorithm for different complex backgrounds, we chose two images from the Massachusetts dataset. The size of these two images, shown in Figure 14 (a1) and (a2), is 1500 × 1500. Both have complex backgrounds. Figure 14 (b1) and (b2) show the ground truth segmentation, and Figure 14 (c1) and (c2) show the roads extracted by our method. One may see that our method can obtain an accurate and complete road network.
Table 3 [caption and several row labels lost; recoverable entries only]

Image           Method                  Completeness   Correctness   Quality
Figure 11(a1)   Shi’s method [9]        94.69%         92.88%        88.28%
[...]           Liu’s method [41]       95.46%         95.92%        91.74%
[...]           Shao’s method [2]       93.89%         82.89%        78.65%
[...]           Sironi’s method [39]    87.67%         74.96%        67.81%
[...]           [...]                   94.12%         91.77%        86.79%
[...]           [...]                   82.60%         65.33%        57.43%
[...]           [...]                   90.79%         83.18%        [...]
[...]           Proposed method         [...]          [...]         [...]

A CNN consists of convolutional layers, pooling layers and fully connected layers. However, the pooling layers and fully connected layers often take only 5-10% of the computational time [42]. Most of the computational time is consumed by the convolutional layers. As shown in [42], the total time complexity of all convolutional layers is
$$O\left( \sum_{l=1}^{d} t_{l-1} \cdot v_l^2 \cdot t_l \cdot m_l^2 \right) \tag{20}$$
where $l$ is the index of a convolutional layer, and $d$ is the depth (the number of convolutional layers). $t_l$ is the number of filters in the $l$-th layer, and $t_{l-1}$ is also known as the number of input channels of the $l$-th layer. Moreover, $v_l$ and $m_l$ are the sizes of the filter and the output feature map, respectively.
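As a worked instance of (20), the sum can be evaluated for the three convolutional layers of Table 1. The output map sizes used below (26, 8 and 1) are assumptions that follow from unpadded, stride-1 convolutions on a 31 × 31 RGB patch; the resulting figure counts multiply-accumulate operations per patch and is only an order-of-magnitude illustration. The name `conv_cost` is hypothetical.

```python
def conv_cost(layers, in_channels):
    """Evaluate the sum in (20): t_{l-1} * v_l^2 * t_l * m_l^2 over all layers."""
    total = 0
    prev = in_channels
    for n_filters, filter_size, out_size in layers:
        total += prev * filter_size ** 2 * n_filters * out_size ** 2
        prev = n_filters
    return total


# (t_l, v_l, m_l) per convolutional layer; t_l and v_l come from Table 1,
# m_l is the assumed convolution output size before pooling.
TABLE1_CONV = [(12, 6, 26), (24, 6, 8), (50, 4, 1)]

print(conv_cost(TABLE1_CONV, in_channels=3))  # 1558848
```

Under these assumptions the first two layers account for almost all of the cost (876,096 and 663,552 operations, against 19,200 for the third), because they operate on the largest feature maps.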
In fact, our proposed approach mainly consists of four parts, pixel-wise classification [...] performance. Therefore, our proposed approach deserves to be studied due to its promising results.
4. Conclusion
[...] the effectiveness of the proposed method. However, the proposed method still has several flaws which require further research. The main limitation of the proposed method is that some centerlines extracted by our method are not [...]
Acknowledgment
The work was jointly supported by the National Natural Science Foundation of China under grant Nos. 61772396, 61472302, 61772392, 61672409, and the Fundamental Research Funds for the Central Universities under grant Nos. JB170306, JB170304. Natural Science Foundation of Hebei Province of China [...]
Figure 14: Road area extraction on the Massachusetts Roads dataset. (a1) and (a2) Two
test areas; (b1) and (b2) Ground truth; (c1) and (c2) Corresponding road extraction results
produced by the proposed method.
References
[2] Y. Shao, B. Guo, X. Hu, L. Di, Application of a fast linear feature detector to road extraction from remotely sensed imagery, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 4 (3) (2011) 626–631.
[3] N. Yager, A. Sowmya, Support vector machines for road extraction from remotely sensed images, in: International Conference on Computer Analysis of Images and Patterns, Springer, 2003, pp. 285–292.
[4] M. Song, D. Civco, Road extraction using SVM and image segmentation, Photogrammetric Engineering & Remote Sensing 70 (12) (2004) 1365–1371.
[5] Q. Zhang, I. Couloigner, Benefit of the angular texture signature for the separation of parking lots and roads on high resolution multi-spectral imagery, Pattern Recognition Letters 27 (9) (2006) 937–946.
[7] S. Das, T. Mirnalinee, K. Varghese, Use of salient features for the design [...]
[10] J. Xu, R. Wang, S. Yue, Bio-inspired classifier for road extraction from remote sensing imagery, Journal of Applied Remote Sensing 8 (1) (2014) 5946–5957.
[11] Z. Miao, W. Shi, P. Gamba, Z. Li, An object-based method for road network extraction in VHR satellite images, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 8 (10) (2015) 4853–4862.
[12] G. Cheng, F. Zhu, S. Xiang, C. Pan, Road centerline extraction via semisupervised segmentation and multidirection nonmaximum suppression, IEEE Geoscience and Remote Sensing Letters 13 (4) (2016) 545–549.
[13] Z. Miao, W. Shi, A. Samat, G. Lisini, P. Gamba, Information fusion for urban road extraction from VHR optical satellite images, IEEE Journal of [...]
[16] Y. Wei, Z. Wang, M. Xu, Road structure refined CNN for road extraction in aerial image, IEEE Geoscience and Remote Sensing Letters 14 (5) (2017) 709–713.
[...] level set interactive methods for road extraction from Google Earth images, Journal of the Indian Society of Remote Sensing (2017) 1–8.
[20] Y. Zang, C. Wang, Y. Yu, L. Luo, K. Yang, J. Li, Joint enhancing filtering for road network extraction, IEEE Transactions on Geoscience and Remote Sensing 55 (3) (2017) 1511–1525.
[21] J. Li, Q. Hu, M. Ai, Unsupervised road extraction via a Gaussian mixture model with object-based features, International Journal of Remote Sensing 39 (8) (2018) 2421–2440.
[23] R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for [...]
[27] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436–444.
[28] G. Li, Y. Yu, Visual saliency based on multiscale deep features, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 5455–5463.
[29] X. Kang, S. Li, J. A. Benediktsson, Spectral–spatial hyperspectral image classification with edge-preserving filtering, IEEE Transactions on Geoscience and Remote Sensing 52 (5) (2014) 2666–2677.
[...] ISPRS Journal of Photogrammetry and Remote Sensing 65 (2) (2010) 165–181.
[31] Z. Miao, W. Shi, H. Zhang, X. Wang, Road centerline extraction from high-resolution imagery based on shape features and multivariate adaptive regression splines, IEEE Geoscience and Remote Sensing Letters 10 (3) (2013) 583–587.
[33] Z. Cao, X. Liu, B. Peng, Y.-S. Moon, DSA image registration based on multiscale Gabor filters and mutual information, in: 2005 IEEE International Conference on Information Acquisition, IEEE, 2005, pp. 6–pp.
[35] K. He, J. Sun, X. Tang, Guided image filtering, in: European Conference on Computer Vision, Springer, 2010, pp. 1–14.
[37] T. Chao, T. Yihua, C. Huajie, et al., Object-oriented method of hierarchical urban building extraction from high resolution remote sensing imagery, Acta Geodaetica et Cartographica Sinica 39 (1) (2010) 39–45.
[38] C. Sun, P. Vallotton, Fast linear feature detection using multiple directional non-maximum suppression, Journal of Microscopy 234 (2) (2009) 147–157.
[39] A. Sironi, V. Lepetit, P. Fua, Multiscale centerline detection by learning a scale-space distance transform, in: 2014 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2014, pp. 2697–2704.
[...] 311.
Biography
Ruyi Liu received the B.S. degree from Shaanxi Normal University, Shaanxi, China in 2012. She is currently working toward the Ph.D. degree in the School of Computer and technology, Xidian University, Shaanxi, China. Her current interests include image classification and segmentation, and computer vision [...]
Qiguang Miao received the M.Eng and Doctor degrees in Computer Science from Xidian University, China. He is currently working as a professor at the School of Computer Science, Xidian University. His research interests include the [...]
Jianfeng Song received his BEs and M.Eng degrees in Computer Appli[...]
Yining Quan is the associate professor and graduate student supervisor of School of Computer Science and Technology [...] received his doctor degree in cryptology from Xidian University in 2010. His research interests include network computing and security. Email: ynquan@xidian.edu.cn
Yunan Li received his B.S. degree from the School of Computer Science and Technology, Xidian University, Xi’an, China in 2014. He is currently working toward the Ph.D. degree at Xidian University. His research interests include pattern recognition and digital image processing. Email: xdfzliyunan@163.com
Pengfei Xu is a lecturer at the Information Science and Technology School, Northwest University in China. His main research interests include image processing and pattern recognition. Email: pfxu@nwu.edu.cn