Pattern Recognition
journal homepage: www.elsevier.com/locate/patcog
A review of lane detection methods based on deep learning

J. Tang, S. Li ∗, P. Liu

Article history: Received 11 December 2019; Revised 13 June 2020; Accepted 29 August 2020; Available online 15 September 2020

Keywords: Lane detection; Deep learning; Semantic segmentation; Instance segmentation

∗ Corresponding author. E-mail address: lisongbin@mail.ioa.ac.cn (S. Li).

https://doi.org/10.1016/j.patcog.2020.107623

Abstract

Lane detection is an application of environmental perception, which aims to detect lane areas or lane lines by camera or lidar. In recent years, gratifying progress has been made in detection accuracy. To the best of our knowledge, this paper is the first attempt to make a comprehensive review of vision-based lane detection methods. First, we introduce the background of lane detection, including traditional lane detection methods and related deep learning methods. Second, we group the existing lane detection methods into two categories: two-step and one-step methods. Around this summary, we introduce lane detection methods from the following two perspectives: (1) network architectures, including classification- and object detection-based methods, end-to-end image segmentation-based methods, and some optimization strategies; (2) related loss functions. For each method, its contributions and weaknesses are introduced. Then, a brief comparison of representative methods is presented. Finally, we conclude this survey with the current challenges, such as expensive computation and the lack of generalization, and point out directions to be further explored in the future, namely semi-supervised learning, meta-learning, and neural architecture search.

© 2020 Elsevier Ltd. All rights reserved.
1. Introduction

With the development of intelligent transportation, environment perception, as an essential task for autonomous driving, has become a research hotspot. Lane detection is an important part of environmental perception, and many efforts have been made during the last decades. However, it is still a challenge to develop a robust detector under unconstrained conditions, because there are too many variables, such as fog, rain, illumination variation, and partial occlusion, that may affect the final results.

Pre-processing steps play an important role in heuristic recognition-based lane detection methods. To remove unwanted noise, many filters are used, including mean, median [1], Gaussian [2], and FIR [3] filters. To deal with illumination variation, the general solutions employ threshold segmentation algorithms [4], including Otsu [5] and PLSF [6], etc. A region of interest (ROI) is usually used to reduce redundant information; fixed-size ROI [7], vanishing point-based ROI [2], and adaptive ROI [8] have been widely explored. Color is another source of information for pre-processing: color space conversion between RGB and YCbCr or HLS is generally used to enhance the quality of lane markings.

Feature extraction and lane modeling are critical for obtaining a mathematical description of lanes. Many algorithms, including Sobel [7], Canny [9], FIR filters [10], and the Hough transform [11], are applied to extract features. Many algorithms model lanes as straight lines; for modeling curves, parabolic [12], Catmull-Rom spline [13], cubic B-spline [2], and clothoid curve [14] models are used. In complex conditions, inverse perspective transformation [15], image enhancement [16], stereo cameras [17], and wavelet analysis [18] are used.
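To make the heuristic pipeline concrete, here is a minimal sketch of the classic stages just described (Gaussian smoothing, Canny edge extraction, a fixed-size ROI, and a Hough transform for straight-line modeling), assuming OpenCV; the thresholds and the ROI polygon are illustrative placeholders rather than values from any cited method.

```python
import cv2
import numpy as np

def detect_lane_candidates(bgr_image):
    """Heuristic lane detection: smooth, extract edges, mask an ROI, fit lines."""
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)   # suppress sensor noise
    edges = cv2.Canny(blurred, 50, 150)           # gradient-based edge map

    # Fixed-size trapezoidal ROI over the lower half of the frame
    h, w = edges.shape
    roi = np.zeros_like(edges)
    polygon = np.array([[(0, h), (w, h), (w // 2 + 50, h // 2),
                         (w // 2 - 50, h // 2)]], dtype=np.int32)
    cv2.fillPoly(roi, polygon, 255)
    masked = cv2.bitwise_and(edges, roi)

    # Probabilistic Hough transform models lanes as straight-line segments
    return cv2.HoughLinesP(masked, 2, np.pi / 180, 50,
                           minLineLength=40, maxLineGap=100)
```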
In 2012, the convolutional neural network (CNN) AlexNet [19] won the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC). Since then, deep learning algorithms have become a promising tool. Over the past several years, via multi-layer nonlinear transforms, deep learning has achieved promising results in many fields, and a variety of deep learning methods have been applied to tackle the lane detection task, ranging from early CNN-based methods (e.g., [20,21]) to end-to-end segmentation-based methods (e.g., GCN [22], SCNN [23]) and GAN-based methods (e.g., EL-GAN [24]). In addition, knowledge distillation [25] and attention maps [26] have brought new ideas to lane detection (e.g., SAD [27]), and DAGMapper [28] gives a good explanation of how to understand the structure of lane lines from the perspective of directed acyclic graphs. Although promising results have been achieved, the lack of generalization ability is still a main challenge of existing methods: a CNN trained in one scenario may perform less accurately in another, especially at night.
Fig. 1. Architecture of a classification CNN. It consists of eight convolution layers and three fully connected layers. The output is a probability distribution over 1000 categories. Picture from [36].
To the best of our knowledge, this paper is the first article to make a comprehensive review of recent deep learning-based lane detection algorithms. The rest of the survey is arranged as follows. Section 2 describes the background of related CNN. Section 3 describes the CNN architectures, loss functions, pre-processing, and post-processing of deep learning-based lane detection algorithms. In Section 4, we conduct experiments to demonstrate four state-of-the-art and representative algorithms. The conclusion and future work are discussed in Section 5.

2. Background of related convolutional neural networks

The vision-based lane detection task, as an application of computer vision, can be defined as image classification, object detection, or semantic segmentation. CNN have revealed their powerful effects in a wide range of computer vision tasks. Due to the close link between lane detection and these fundamental vision tasks, it is necessary to introduce the background of the related convolutional neural networks.

2.1. Image classification

In 2012, AlexNet won ILSVRC with five convolution layers and three fully connected layers; it essentially extends the depth of LeNet [29] and applies techniques such as ReLU [30] and Dropout [31]. The structure of AlexNet is relatively simple but demonstrated the remarkable potential of CNN.

It has been proved that the solution space of a CNN can be expanded by increasing its depth or its width [14]. Following AlexNet [19], GoogLeNet [32] and VGG [33] achieved higher accuracy with deeper and wider architectures in ILSVRC2014: VGG increased the depth to 16-19 layers, while GoogLeNet increased both depth (22 layers) and width. GoogLeNet is also named Inception-v1, and it has evolved from Inception-v1 to Inception-v4 [34] with different optimizations. The generalization performance of VGG is better, and it is often used to extract image features in many fields. Deeper CNN may suffer from exploding or vanishing gradients; the short-cut connections of ResNet [35] make their training possible. The architecture of a single-pipeline CNN can be seen in Fig. 1.

2.2. Object detection

We can group the existing deep learning-based detectors into two categories: two-stage and one-stage methods. Two-stage methods include R-CNN [37], Fast R-CNN [38], Faster R-CNN [39], CoupleNet [40], and Light-Head R-CNN [41], etc., which first generate candidate regions by CNN or traditional methods and then classify them into categories. One-stage methods include YOLO [42], G-CNN [43], SSD [44], DSSD [45], and RON [46], etc.; they directly generate the category probability and position coordinates without a region proposal stage. Major methods of deep learning-based object detection are shown in Fig. 2.

2.3. Semantic segmentation

Semantic segmentation is another fundamental task of computer vision, which aims to classify every pixel into a category. In 2015, Jonathan Long et al. proposed Fully Convolutional Networks (FCN) [48]. Following FCN, the encoder-decoder architecture has been widely used to address image segmentation; an encoder-decoder CNN architecture is shown in Fig. 3. To fuse different contextual information, GCN [49] used large convolution kernels, PSP-Net [50] proposed a pyramid pooling module, the Deeplab series [51] adopts dilated spatial pyramid pooling, and EncNet [52] introduced a channel attention method. For real-time segmentation, ENet [53] proposed a bottleneck module, ERFNet [54] used residual connections and factorized convolutions, and EDANet [55] designed efficient dense modules and discarded deconvolution layers in order to remain efficient while retaining remarkable accuracy. In addition, PSANet [56] captures pixel-wise relations by a convolution layer. Readers can refer to the survey [57] for a more comprehensive review.
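As an illustration of the encoder-decoder pattern sketched in Fig. 3, the following minimal PyTorch model (not any specific published architecture) shows how convolution with pooling forms the encoder, transposed convolution forms the decoder, and a skip connection helps preserve structure:

```python
import torch
import torch.nn as nn

class TinyEncoderDecoder(nn.Module):
    """Minimal encoder-decoder for per-pixel classification (e.g., lane vs. background)."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)                        # downsample by 2
        self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)  # upsample by 2
        self.head = nn.Conv2d(32, num_classes, 1)          # 1x1 per-pixel classifier

    def forward(self, x):                       # assumes even H and W
        f1 = self.enc1(x)                       # full-resolution features
        f2 = self.enc2(self.pool(f1))           # half-resolution features
        up = self.up(f2)                        # back to full resolution
        fused = torch.cat([up, f1], dim=1)      # skip connection preserves detail
        return self.head(fused)                 # per-pixel class scores
```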
3. Deep learning for lane detection

We can group the existing lane detection methods into two categories: two-step and one-step methods. Two-step methods are composed of a feature-extraction step and a post-processing step. To be specific, feature extraction includes heuristic recognition-based feature extraction and deep learning-based feature extraction; post-processing mainly contains clustering and fitting. One-step methods obtain the detection and clustering results directly from the input image; hence, there is no need for clustering, and we summarize those methods as one-step methods.

Around this summary, we will discuss the existing deep learning-based lane detection algorithms from two perspectives: network architectures and loss functions. Post-processing is another critical part of two-step algorithms, which will be introduced in Section 3.4 together with pre-processing.

3.1. Network architecture

As a specific application, there are many strategies for lane detection. From the perspective of how the lane detection task is defined, they can be summarized into the following three categories:
Fig. 2. Major methods of deep learning-based object detection. Anchor-free (in red) and AutoML (in green) techniques have become two important research directions. Picture from [47].
Fig. 3. Sketch of an encoder-decoder CNN. Convolution with pooling constitutes the encoder section; transposed convolution constitutes the decoder section. Skip connections between encoder and decoder are usually used for structure preservation. Picture from [36].
1) classification-based methods, which combine some prior information to determine lane positions; 2) object detection-based methods: by labeling regression bounding boxes or feature points for each lane segment, lanes can be detected by coordinate regression; 3) segmentation-based methods: lanes and background pixels are labeled as different classes, and the detection results are obtained in the form of pixel-level classification (semantic segmentation/instance segmentation). From the perspective of model structure, it can be summarized as the following two types: 1) single-task models, where only lane detection is considered and other road signs are not involved; 2) multi-task models, which combine lane detection with other tasks, such as drivable area detection, road marking recognition, and road type or lane type classification. In practice, many available structures and ideas can be extracted from existing CNN, such as the feature extractors VGG and ResNet, and the end-to-end architecture of FCN, etc.

3.1.1. Benchmark of CNN based method for lane detection

In 2014, Jiun Kim and Minho Lee [20] proposed a detector where a CNN was first used to extract lane features and random sample consensus (RANSAC) was used for clustering. The CNN architecture is shown in Fig. 4; it consists of 8 layers, with 3 convolutional, 2 subsampling, and 3 fully-connected layers. The training dataset is composed of images after ROI selection and edge detection. The last fully-connected layer outputs the predicted image (100 × 15), where predicted lane pixels are denoted as white.

Though the CNN sketch of [20] is relatively simple, it can be seen as an approximation of the complex mapping between the input and output spaces. Despite the advancement of this method compared to traditional ones, the following problems exist: (1) the pre-processed input of this method leads to complicated data processing; therefore, can we cancel the pre-processing steps and train the CNN in a more concise way? (2) this method has an eight-layer architecture; hence, can more complicated CNN architectures (with different depths, widths, and topologies) achieve better results?

3.1.2. Classification and object detection based methods

A. Classification-based lane detection methods

The application of image classification generally aims to discriminate what object is contained in the input. To all appearances, this alone cannot obtain the location of the lanes, so some tricks are needed to bridge classification and lane detection. We assume that some location-dependent prior knowledge is known, denoted as pk(p). A CNN, as a mapping function f(x), can be combined with pk(p) to form a new formulation o = f(x, pk(p)). DeepLane [21] is a method based on this idea; the overall CNN architecture is shown in Fig. 5. In detail, the training dataset consisted of images (resolution: 240 × 360) from laterally-mounted, down-facing cameras. To obtain a probability distribution of lane positions, a softmax function is applied to the output of the last fully connected layer (317 outputs: 316 possible classes for lane positions and one class for the absence of a lane marker). Hence, the CNN outputs a vector Y_i = (y_0, ..., y_316). To locate the lane marking, the estimated position e_i is defined as Eq. (3.1).

$$e_i = \arg\max(y_i), \quad 0 \le i \le 316 \tag{3.1}$$

DeepLane got a better result with a more complex network (normalization layer, dropout layer) than [20]. However, the prior position setting limits its application scenarios. A more general approach ...
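To make the decoding step of Eq. (3.1) concrete, here is a minimal sketch of how the 317-way classification output described above can be turned into a lane position; the tensor layout and the use of PyTorch are our illustrative assumptions, not details from [21]:

```python
import torch

def decode_lane_position(logits):
    """logits: (batch, 317) raw outputs; classes 0..315 are column positions,
    class 316 means 'no lane marker visible'."""
    probs = torch.softmax(logits, dim=1)   # probability distribution over positions
    e = torch.argmax(probs, dim=1)         # Eq. (3.1): most likely class per sample
    has_lane = e != 316                    # mask out the absence class
    return e, has_lane
```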
Fig. 4. CNN architecture in [20]. The detector consists of three convolution, two subsampling, and three fully connected layers. The 1500 output nodes of the last fully connected layer are reshaped to 100 × 15, which is considered the predicted map.
Fig. 6. Sketch of VPGNet [61]. A four-branch CNN is designed for grid regression, object detection, multi-label classification, and vanishing point prediction.
Fig. 8. Sketch of LMD [72]. Three dilated convolution layers are added between the encoder and decoder.
For a more detailed illustration of dilated convolution, readers can refer to [71]. The advantages of dilated convolution have been established, but how to design effective CNN structures based on dilated convolution is a new problem.

Based on VGG, Shao-Yuan Lo et al. proposed LMD (Lane Marking Detection) [72]. Three dilated convolution layers are embedded between the encoder and decoder; its architecture can be seen in Fig. 8. The output is a predicted binary segmentation image: predicted lane pixels are denoted as 1, and predicted background pixels as 0. In 2019, based on EDANet, Shao-Yuan Lo et al. [73] proposed another lane detection CNN with embedded dilated convolutions. By rethinking the relationship between downsampling operations and spatial information, FSS (Feature Size Selection) and DDB (Degressive Dilation Block) were proposed.
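To illustrate how dilated convolution enlarges the receptive field without downsampling, here is a hedged sketch of an LMD-style context block of three dilated layers (as in Fig. 8); the channel width and the dilation rates are illustrative assumptions rather than the exact values of [72]:

```python
import torch.nn as nn

def dilated_context_block(channels=128):
    """Three 3x3 convolutions with growing dilation: the receptive field grows
    from 3 to 7 to 15 pixels while the feature resolution stays unchanged."""
    layers = []
    for rate in (1, 2, 4):  # illustrative dilation schedule
        layers += [nn.Conv2d(channels, channels, kernel_size=3,
                             padding=rate, dilation=rate),  # padding=rate keeps H x W
                   nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)
```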
How to effectively obtain associated long-range information is another problem. Inspired by non-local means [74] in classic computer vision, Wang Xiaolong et al. proposed learnable non-local operations [75] to capture long-range dependencies. Learnable non-local long-range dependencies can be used to establish the connection between two pixels at an uncertain distance in an image, between two frames in a video, or between different words in a paragraph, etc. The non-local operation meets the needs of lane detection. In 2019, Li Wenhui et al. added non-local operations to IANet (Instance batch normalization and Attention Network) [76] to force the CNN to focus on lane regions. Experimental results show that the mechanism is suitable for two-class segmentation scenes.

B. Combining prior knowledge with segmentation

In terms of road geometric features, GCLNet (Geometric Constrained Network) [77] went further than VPGNet. In GCLNet, Zhang Jie et al. proposed a multiple-task framework with mutually interlinked sub-structures between lane segmentation and lane boundary detection to improve overall performance. As shown in Fig. 9, each decoder is connected to a link encoder to transfer complementary information between the two tasks, and thus the features of the two decoders can be reciprocally refined. This thorough exploration of the relationship between lane areas and lane boundaries provides ideas for future researchers trying to adopt a multi-task strategy. Vijay John et al. proposed PsiNet [78] for detecting ...
Fig. 10. Sketch of CNN-LSTM [79]. The figure does not show the details of the convolution layers of the encoder and decoder, which can be U-Net or SegNet.
Fig. 11. Sketch of CooNet [64]. A maximum of four lane lines can be detected. Each 1 × 30 output vector denotes the coordinates of 15 points.
Fig. 12. Sketch of SCNN [23]. The encoder can be VGG or another effective feature extractor. We can consider the slice-by-slice convolution as four 1 × n and n × 1 convolutions.
Fig. 13. Sketch of LaneNet [81]. This method is based on [80]; the two branches are the same as in ENet.
3.1.4. Some optimization strategies of deep learning-based lane detection methods

How to remove the post-processing steps? How to get good performance with a small dataset? Despite what has been achieved, there are still many problems that need to be solved. These problems also exist in other applications of deep learning in computer vision.

A. Good performance with a small dataset

The effectiveness of the pre-training strategy implies that models can share many intermediate representations across different datasets. How can a trained model help the training of new models? We summarize the feasible methods used in lane detection into the following two types: transfer learning and knowledge distillation methods.

We can divide the datasets for transfer learning into two categories: the source dataset and the target dataset. The source dataset refers to additional data that is not directly related to the task; the target dataset is directly related to the task. When both target data and source data are labeled, fine-tuning is a common method to deal with this problem. In order to achieve good transfer performance, there is a wide discussion on which layers to fix and which layers to train [82]. In 2015, knowledge distillation (KD) [25] was proposed by Hinton et al., which uses a well-performing network (teacher network) to guide the training of a network with fewer parameters (student network) to improve the performance of the student network. Furthermore, the studies [26,83] expanded KD to attention distillation.

How can the above methods be applied to lane detection? In 2017, Jiman Kim and Chanjong Park proposed TLELane (Transfer Learning for Ego Lane detection) [84], based on two transfer learning steps: the first step changes the representation domain of the network from the general scene to the road scene, and the second step reduces the target from general road objects to the left and right ego lanes. In a trained segmentation-based lane detection network, the attention maps from different layers capture rich contextual information, which indicates the lanes' location and rough outline. Hence, it is feasible to let a preceding block mimic the attention maps of a deeper block. In [27], Yuenan Hou et al. proposed a self-attention distillation algorithm: different from attention distillation, the network learns from itself (e.g., block 3 mimics block 4 and block 2 mimics block 3), hence the name self-attention distillation (a sketch is given below).
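The following is a simplified sketch of the self-attention distillation idea of [27], under our own simplifying assumptions: an activation-based attention map is formed by summing squared channels, and each block is trained to mimic the (detached) map of the next deeper block, with interpolation handling resolution mismatch. The exact map definition and loss weighting in [27] may differ.

```python
import torch
import torch.nn.functional as F

def attention_map(features):
    """Collapse (N, C, H, W) features into a normalized spatial attention map."""
    amap = features.pow(2).sum(dim=1, keepdim=True)   # activation energy per location
    return F.normalize(amap.flatten(1), dim=1)        # (N, H*W), unit norm

def sad_loss(block_feats):
    """block_feats: feature maps from successive blocks (shallow -> deep).
    Each block mimics the attention map of its successor (e.g., block2 -> block3)."""
    loss = 0.0
    for shallow, deep in zip(block_feats[:-1], block_feats[1:]):
        target = attention_map(deep).detach()         # the deeper map is the teacher
        resized = F.interpolate(shallow, size=deep.shape[2:],
                                mode='bilinear', align_corners=False)
        loss = loss + F.mse_loss(attention_map(resized), target)
    return loss
```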
Fig. 14. Sketch of EL-GAN. The generator and discriminator are Tiramisu DenseNet-based and two-headed DenseNet-based, respectively. L_emb denotes the loss on the feature maps, which can be considered a perceptual loss.
B. To remove post-processing

In Section 3.1.3.C, we discussed how to simplify the post-processing, which mainly focuses on removing the clustering. In this section, we discuss how to remove the post-processing entirely, including both the clustering step and the fitting step. That is, the output of the CNN contains the predicted lanes and a parameterized description of each detected lane.

RLaneNet (Real-time Lane Network) [85] is another attempt to fuse CNN and LSTM, where the LSTM serves as a solution to the uncertain number of lanes and as a decoder to decode the parameters of each lane. The predicted values of RLaneNet are the three X coordinates of the points lying on the intersections of the lane with three horizontal lines (Y = 0, Y = h/2, Y = h), where h is the image height after IPM. There is an assumption that lanes can be described by three point coordinates and a quadratic function.

DLFNet (Differentiable Least-squares Fitting Network) [86] is a more general strategy, which estimates lane curvature parameters by solving a weighted least-squares problem in-network. The weights are generated by a deep network conditioned on the input image. A geometric loss function is used to train the network to minimize the area between the predicted lane line and the ground-truth. The weighted least-squares fitting problem can be considered as Eq. (3.3), where ω is the pixel weight map and X, Y are the coordinate matrices. Therefore, the parameters β of the best-fitting curve through the weighted pixel coordinates can be obtained from Eq. (3.3).

$$\omega X \beta = \omega Y \tag{3.3}$$
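A hedged sketch of the in-network weighted least-squares step of Eq. (3.3), assuming a quadratic lane model x = β0 + β1·y + β2·y²; because the normal equations are solved with differentiable operations, gradients flow back into the weight-generating network. The curve order and normalization here follow [86] only loosely.

```python
import torch

def weighted_lane_fit(weights, ys, xs):
    """Solve (X^T W X) beta = X^T W Y for a quadratic lane model.

    weights: (P,) per-pixel weights from the network; ys/xs: (P,) pixel coords.
    Every op is differentiable, so d(beta)/d(weights) exists for backprop."""
    X = torch.stack([torch.ones_like(ys), ys, ys ** 2], dim=1)  # (P, 3) design matrix
    W = weights.unsqueeze(1)                                    # (P, 1)
    lhs = X.t() @ (W * X)                                       # X^T W X, (3, 3)
    rhs = X.t() @ (W * xs.unsqueeze(1))                         # X^T W Y, (3, 1)
    beta = torch.linalg.solve(lhs, rhs)                         # curve parameters
    return beta.squeeze(1)                                      # (beta0, beta1, beta2)
```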
C. Better output structure-preserving

As mentioned in Section 3.1.3.A, the emphasis of semantic segmentation is to obtain an accurate classification per pixel rather than a specific shape. In 2014, Ian J. Goodfellow et al. [87] proposed the generative adversarial network (GAN) architecture. The basic principle of GAN is a game between a generator network and a discriminator network: the generator generates synthetic data from a given noise (generally drawn from a uniform or a normal distribution), and the discriminator discriminates between the output of the generator and real data. The former attempts to produce data that is closer to reality; accordingly, the latter attempts to accurately distinguish whether the input is real or generated. Readers can refer to [88–90] for further discussions of GAN.

Based on the principle of GAN, Mohsen Ghafoorian et al. proposed EL-GAN (Embedding loss GAN) [24] for structure-preserving lane detection. DenseNet [91,92] is used in the generator and discriminator, as shown in Fig. 14. As a matter of fact, we can regard the embedding loss as a perceptual loss [93], and EL-GAN as a combination of CGAN and perceptual loss.

3.2. A summary of deep learning-based representative lane detection algorithms

In this section, we give a summary of the methods, advantages, and limitations of existing representative deep learning-based lane detection algorithms in Table 1. We divide those algorithms into two categories: two-step algorithms and one-step algorithms.

3.3. Loss function

As a key component of deep learning, loss functions are used to calculate the inconsistency between the predicted output and the ground truth and to guide the model optimization. Different loss functions focus on different tasks; therefore, a variety of loss functions are adopted to guide the lane detection task. In this section, we take a look at the loss functions used in the lane detection field.

3.3.1. For classification

L1 loss, L2 loss, and Cross-Entropy loss are widely used in many tasks, such as lane line type classification and pixel-level classification. The equations of L1 and L2 are shown in (3.4) and (3.5), where h, w, c denote the height, width, and number of channels of an image, respectively, and Î and I are the predicted output and the input. As shown in Eq. (3.6), an inter-class competition mechanism is adopted in Cross-Entropy loss. When C = 2, the Cross-Entropy loss can be defined as Eq. (3.7), where s_i denotes the predicted probability of class t_i. In Eq. (3.8), different weights are given to each category; this is an effective solution for the problem of sample imbalance. For example, in lane detection, the ratio of lane line areas to background is extremely unbalanced. In [79] and [23], the weights of lane lines and background were set to 1.0 and 0.4, respectively.

$$L_1(\hat{I}, I) = \frac{1}{hwc}\sum_{i,j,k}\left|\hat{I}_{i,j,k} - I_{i,j,k}\right| \tag{3.4}$$

$$L_2(\hat{I}, I) = \frac{1}{hwc}\sum_{i,j,k}\left(\hat{I}_{i,j,k} - I_{i,j,k}\right)^2 \tag{3.5}$$

$$L_{ce} = -\sum_{i}^{C} t_i \log(s_i) \tag{3.6}$$

$$L_{bce} = -\sum_{i=1}^{C=2} t_i \log(s_i) = -t_1\log(s_1) - (1 - t_1)\log(1 - s_1) \tag{3.7}$$

$$L_{wce} = -\sum_{i}^{C} w_i t_i \log(s_i) \tag{3.8}$$
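As a concrete instance of Eq. (3.8), here is a sketch of weighted pixel-wise Cross-Entropy with the 1.0/0.4 lane/background weights reported for [79] and [23], assuming PyTorch's built-in class weighting:

```python
import torch
import torch.nn as nn

# Class 0 = background (weight 0.4), class 1 = lane (weight 1.0), as in [79] and [23]
criterion = nn.CrossEntropyLoss(weight=torch.tensor([0.4, 1.0]))

logits = torch.randn(2, 2, 256, 512)          # (batch, classes, H, W) raw scores
target = torch.randint(0, 2, (2, 256, 512))   # per-pixel ground-truth labels
loss = criterion(logits, target)              # Eq. (3.8) averaged over pixels
```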
Compared with the L1 loss, the L2 loss is more sensitive to large errors but less sensitive to small errors. Assume that the simplified L2 loss function is given by Eq. (3.9), with ŷ_i = σ(W x_i + b). Its partial derivative is shown in Eq. (3.10), where
σ′(W x_i + b) = σ(W x_i + b)(1 − σ(W x_i + b)) and σ is the sigmoid function. Thus, we can see that when σ(W x_i + b) is close to 0 or 1, dJ/dW will be close to 0, which leads to slow convergence at the beginning of training. For the Cross-Entropy loss function, the partial derivative is shown in Eq. (3.11). Therefore, in semantic segmentation or segmentation-based lane detection algorithms, Cross-Entropy loss or weighted Cross-Entropy loss is more applicable.

$$J = \frac{1}{2}\left(y_i - \hat{y}_i\right)^2 \tag{3.9}$$

$$\frac{dJ}{dW} = \left(y_i - \hat{y}_i\right)\sigma'(W x_i + b)\,x_i \tag{3.10}$$

$$\frac{dL_{ce}}{dW} = \left[\sigma(s_i) - y_i\right]\cdot x_i \tag{3.11}$$

However, due to the inter-class competition mechanism, Cross-Entropy loss only cares about the accuracy of the predicted probability of the correct label and ignores the differences among the incorrect labels. This makes the learned features scattered. To overcome this drawback, from the perspective of the activation function there are L-Softmax [94] and A-Softmax [95], etc. From the perspective of the loss function, [77] proposed an IoU loss for lane detection. We name it L_IoU-soft because it represents the relationship between the predicted probability and the ground-truth. As shown in Eq. (3.12), I denotes the set of image pixels, y_p is the output probability of pixel p, g ∈ {0, 1}^{M×N} denotes the set of ground-truth pixels, and '×' denotes pixel-wise multiplication. To increase the intersection-over-union between the predicted lane pixels and the ground-truth lane pixels, another IoU loss was proposed in [27]. We name it L_IoU-hard because it represents the relationship between the predicted results and the ground-truth. L_IoU-hard is defined by Eq. (3.13), where N_p is the number of predicted lane pixels, N_g is the number of ground-truth lane pixels, and N_o is the number of lane pixels in the overlap between the predicted lane areas and the ground-truth lane areas.

$$L_{IoU\text{-}soft} = 1 - \frac{\sum_{p\in I}(y_p \times g_p)}{\sum_{p\in I}(y_p + g_p - y_p \times g_p)} \tag{3.12}$$

$$L_{IoU\text{-}hard} = 1 - N_p/(N_p + N_g - N_o) \tag{3.13}$$

In GAN, the loss functions are defined as Eq. (3.16)-(3.18), where x is the ground-truth data and z is the input noise data. It is a variation of Cross-Entropy loss.

$$\min_G \max_D V(D, G) = \mathbb{E}_{x\sim P_{data}(x)}[\log D(x)] + \mathbb{E}_{z\sim P_z(z)}[\log(1 - D(G(z)))] \tag{3.16}$$

$$\max_D V(D, G) = \mathbb{E}_{x\sim P_{data}(x)}[\log D(x)] + \mathbb{E}_{z\sim P_z(z)}[\log(1 - D(G(z)))] \tag{3.17}$$

$$\min_G V(D, G) = \mathbb{E}_{z\sim P_z(z)}[\log(1 - D(G(z)))] \tag{3.18}$$

In EL-GAN, the loss function is constituted by the adversarial loss L_adv, the Cross-Entropy loss L_cce, and the L2 loss L_emb, as shown in Eq. (3.19)-(3.22). The L_emb loss can be seen as a perceptual loss, which compares the features obtained by convolving real images with the features obtained by convolving generated images, so as to make the high-level information (content and global structure) close to each other. The perceptual loss is widely used in the super-resolution field [93] for better structure preservation.

$$L_{fit} = L_{fit}(G(x;\theta_{gen}), y) = L_{cce}(G(x;\theta_{gen}), y) \tag{3.19}$$

$$L_{cce}(\dot{y}, y) = -\frac{1}{wh}\sum_i^w \sum_j^h y_{i,j}\ln \dot{y}_{i,j} \tag{3.20}$$

$$L_{adv} = \mathbb{E}_{x\sim P(x)}[\log(1 - D(G(x)))] \tag{3.21}$$

$$L_{emb}(\dot{y}, y; x, \theta_{disc}) = \left\| D_e(y; x, \theta_{disc}) - D_e(\dot{y}; x, \theta_{disc})\right\|_2 \tag{3.22}$$

In this section, we introduced the various loss functions widely used in the lane detection field. For a multi-branch detector, a weighted average of multiple loss functions is frequently used. There are many other loss functions in the lane detection field; however, they can be considered combinations of the functions mentioned in this section.
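A minimal sketch of L_IoU-soft from Eq. (3.12), assuming `probs` holds per-pixel lane probabilities and `gt` the binary ground-truth map:

```python
import torch

def iou_soft_loss(probs, gt, eps=1e-6):
    """Eq. (3.12): 1 - soft intersection over soft union.

    probs: (N, H, W) predicted lane probabilities y_p in [0, 1]
    gt:    (N, H, W) binary ground-truth map g_p in {0, 1}"""
    inter = (probs * gt).sum(dim=(1, 2))                 # sum_p y_p * g_p
    union = (probs + gt - probs * gt).sum(dim=(1, 2))    # sum_p y_p + g_p - y_p * g_p
    return (1.0 - inter / (union + eps)).mean()          # eps guards empty images
```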
3.4. Pre-processing and post-processing
Fig. 16. An illustration of the DBSCAN algorithm on a smile-shaped dataset. The dataset is clustered into five clusters when min-points = 4 and epsilon = 1.
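Because DBSCAN needs only a neighborhood radius and a minimum point count rather than a preset number of clusters, it suits post-processing where the number of lanes is unknown. A sketch with the Fig. 16 parameters, assuming scikit-learn and a toy set of predicted lane-pixel coordinates:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# points: (P, 2) coordinates of pixels predicted as 'lane' (or their embeddings);
# these toy values are illustrative only
points = np.array([(10, 5), (11, 5), (12, 5), (11, 6),
                   (80, 40), (81, 40), (82, 40), (81, 41)], dtype=float)

# eps = epsilon (neighborhood radius), min_samples = min-points, as in Fig. 16
labels = DBSCAN(eps=1.0, min_samples=4).fit_predict(points)

# Each non-negative label is one lane instance; -1 marks noise pixels
for lane_id in set(labels) - {-1}:
    lane_pixels = points[labels == lane_id]  # pixels of a single lane, ready for fitting
```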
Table 1
Summary of various deep learning-based lane detection algorithms.

Two-step algorithms:

Jiun Kim et al. (2014) [20]
  Method: Classification. (A) 8-layer CNN; (B) RANSAC.
  Advantages: (1) Compared with traditional methods, it achieved good performance.
  Limitations: (1) The network structure is not efficient enough.

Brody Huval et al. (2015) [59]
  Method: Object detection. (A) Vehicle detection and lane detection; (B) depth estimation.
  Advantages: (1) Better satisfies autopilot requirements; (2) robustness against occlusion to some extent; (3) used CNN to estimate geometric information.
  Limitations: (1) Complicated data collection and annotation; (2) repeated detection and coordinate regression.

Seokju Lee et al. (2017) [61]
  Method: Object detection. (A) Lane and road marking recognition; (B) vanishing point estimation.
  Advantages: (1) Improved robustness under various conditions.
  Limitations: (1) Post-processing has high computational complexity.

Huang et al. (2018) [63]
  Method: Object detection. (A) IPM; (B) coordinate regression; (C) sub-image extraction.
  Advantages: (1) Used temporal and spatial constraints to reduce the search area.
  Limitations: (1) Complicated data flow and architecture; (2) when the initial assumptions are not met, pre-processing cannot provide valid results.

Ping-Rong Chen et al. (2018) [72]
  Method: Segmentation. (A) End-to-end; (B) dilated convolution.
  Advantages: (1) Dilated convolution is employed to enlarge the receptive fields.
  Limitations: (1) This is just an application of dilated convolution; the performance does not reach SOTA.

Zhang Jie et al. (2018) [77]
  Method: Segmentation. (A) End-to-end; (B) multiple-task framework.
  Advantages: (1) Pays more attention to the interlinked relationship of sub-structures.
  Limitations: (1) The complex loss function and network structure bring many difficulties in training.

Zou Qin et al. (2019) [79]
  Method: Segmentation. (A) Combining CNN and LSTM; (B) the encoder input is 5 consecutive frames.
  Advantages: (1) The exploration of temporal information improves performance in occlusion scenes.
  Limitations: (1) High computational complexity; (2) when the input images do not change much, the improvement is conditional.

Mohsen Ghafoorian et al. (2018) [24]
  Method: Segmentation. (A) CGAN network.
  Advantages: (1) The embedding loss in the discriminator can effectively push the output boundary closer to the label.
  Limitations: (1) Large number of parameters.

One-step algorithms:

Gurghian et al. (2016) [21]
  Method: Classification. (A) Combines prior position and classification results to estimate lane position.
  Advantages: (1) Fast detection; (2) simple network structure.
  Limitations: (1) Limited application scenarios; (2) fixed camera parameters.

Li Wenhui et al. (2019) [76]
  Method: Segmentation. (A) End-to-end; (B) non-local attention; (C) instance batch normalization.
  Advantages: (1) Verified that attention is suitable for the two-class segmentation task with only lane and background.
  Limitations: (1) Non-local operations add large computation.

Shriyash Chougule et al. (2018) [64]
  Method: Regression. (A) Multi-branch; (B) coordinate regression; (C) data augmentation.
  Advantages: (1) This strategy does not require a clustering step; (2) lightweight network.
  Limitations: (1) Only a fixed number of lane lines can be detected.

Xingang Pan et al. (2018) [23]
  Method: Segmentation. (A) End-to-end; (B) slice-by-slice convolution.
  Advantages: (1) Slice-by-slice convolution is suitable for long continuous shape structures.
  Limitations: (1) High computational complexity.

Neven Davy et al. (2018) [81]
  Method: Segmentation. (A) Instance segmentation.
  Advantages: (1) Proposed an H-Net to estimate the IPM transformation matrix; (2) the number of lanes is not fixed.
  Limitations: (1) The H-Net is not very effective.

Jiman Kim et al. (2017) [84]
  Method: Segmentation. (A) Two-step transfer learning.
  Advantages: (1) Overcomes the weak points of a small dataset.
  Limitations: (1) Only ego lanes can be detected.

Yuenan Hou et al. (2019) [27]
  Method: Segmentation. (A) Self-attention distillation.
  Advantages: (1) The self-attention distillation strategy is efficient.
  Limitations: (1) The complex training strategies and loss functions make hyperparameter adjustment difficult.

Wang Ze et al. (2018) [85]
  Method: Regression. (A) Edge proposal; (B) parameter regression.
  Advantages: (1) The LSTM serves as a solution to the uncertain number of lanes; (2) does not need any post-processing.
  Limitations: (1) The ordinates of the three points to be detected are predefined.

Van Gansbeke et al. (2019) [86]
  Method: Segmentation. (A) Generating coordinate weight maps; (B) a differentiable least-squares fitting module.
  Advantages: (1) A more general strategy without any predefined conditions.
  Limitations: (1) Only a fixed number of lane lines can be detected; (2) when the number of weight maps is increased, the performance degrades.
Fig. 20. Experiments of CarND under different circumstances. Line 1: ideal environment. Line 2: complex conditions. From left to right: input, after IPM, after binarization, after fitting, histogram map.
Fig. 21. Three results under curved lanes. Line 1: result of LaneNet [81]. Line 2: result of SCNN [23]. Line 3: result of ERFNet-DLSF [86].
Table 2
The TuSimple dataset.

Name | Frames | Train | Validation | Test | Resolution
TuSimple | 6408 | 3268 | 358 | 2782 | 1280 × 720

Table 3
Comparisons among some representative deep models on TuSimple (ACC in %).

Paper | Method | Dataset | ACC | FPR | FNR
[35] | ResNet18-based | TuSimple | 92.69 | 0.0948 | 0.0822
[35] | ResNet34-based | TuSimple | 92.84 | 0.0918 | 0.0796
[53] | ENet | TuSimple | 93.02 | 0.0886 | 0.0734
[86] | ERFNet-DLSF | TuSimple | 93.38 | 0.1064 | 0.0983
[27] | ENet-SAD | TuSimple | 96.64 | 0.0602 | 0.0205
[81] | LaneNet | TuSimple | 96.38 | 0.0442 | 0.0197
[23] | SCNN | TuSimple | 96.53 | 0.0617 | 0.0180
[24] | EL-GAN | TuSimple | 96.39 | 0.0412 | 0.0336
[79] | CNN-LSTM (SegNet+) | TuSimple | 97.30 | 0.0416 | 0.0186
[79] | CNN-LSTM (UNet+) | TuSimple | 97.20 | 0.0424 | 0.0184
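For readers reproducing Table 3, the following is a simplified sketch of how TuSimple-style ACC/FPR/FNR are commonly computed (point-wise accuracy with a pixel threshold plus lane-level matching); the 20-px and 0.85 thresholds follow the common benchmark convention and are our assumption here, so the official evaluation script should be treated as authoritative.

```python
import numpy as np

PIXEL_THRESH = 20    # a point counts as correct within 20 px of the ground truth
MATCH_RATIO = 0.85   # a lane counts as detected if >= 85% of its points are correct

def point_accuracy(pred_xs, gt_xs):
    """Fraction of benchmark rows where the predicted x is close enough.
    pred_xs, gt_xs: (R,) x-coordinates at fixed rows; negative gt marks absent rows."""
    valid = gt_xs >= 0
    if not valid.any():
        return 0.0
    return float((np.abs(pred_xs[valid] - gt_xs[valid]) < PIXEL_THRESH).mean())

def evaluate_image(pred_lanes, gt_lanes):
    """Simplified per-image ACC / FP / FN in the spirit of Table 3."""
    scores = [max((point_accuracy(p, g) for p in pred_lanes), default=0.0)
              for g in gt_lanes]
    acc = float(np.mean(scores)) if scores else 1.0
    fn = sum(s < MATCH_RATIO for s in scores)                        # missed lanes
    fp = max(len(pred_lanes) - sum(s >= MATCH_RATIO for s in scores), 0)
    return acc, fp, fn
```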
The ResNet18-based and ResNet34-based methods use spatial upsampling as the decoder rather than deconvolution. The overall architectures of the four selected algorithms [23,79,81,86] have been shown in Section 3. As shown in Fig. 21 and Fig. 22, the experiments are conducted under curved lanes and shadow conditions.

Feature maps of the middle layers reflect what the CNN has extracted. For an encoder-decoder CNN, the encoder encodes the input image into multi-dimensional, low-resolution feature maps. Fig. 23 shows several feature maps of the encoder outputs of [23,79,86]. We can see that SCNN extracts more obvious linear features.

Fig. 22. Four results under shadow conditions. Line 1: results of LaneNet [81]; from left to right: semantic segmentation map, embedding map, instance segmentation map, and fused image. Line 2: results of SCNN [23]; from left to right: instance segmentation map, fused image. Line 3: results of ERFNet-DLSF [86]; from left to right: weight map of the left lane line, weight map of the right lane line, fused weight map of the above two, predicted result. Line 4: result of CNN-LSTM [79]; from left to right: four continuous frames of input, result of semantic segmentation.

Fig. 23. From left to right: feature maps of the encoder outputs of [79], [23], and [86], respectively.

5. Conclusion

In this paper, we present an overall review of recent deep learning-based lane detection algorithms. There are three main contributions. First, this paper is the first overall review of recent deep learning-based lane detection algorithms, which will help readers understand how to apply deep learning to the lane detection task. Second, we introduce those algorithms in terms of CNN architectures and loss functions, which will help researchers design their own detectors. Third, four representative state-of-the-art algorithms are highlighted and evaluated on the TuSimple dataset, which will help readers understand the best performance and the optimization directions. We will clean up and release all the test code used in this work, which will further help readers move from theory to application.

Thanks to the improvement of hardware computing power and GPUs, a variety of vision-based assisted driving systems have been deployed on vehicle platforms in recent years, such as Lane Departure Warning and Lane Change Assist systems. The accuracy of lane detection has increased to 97% [79] on the TuSimple dataset. However, many underlying challenges have not been overcome. There are two main challenges: (1) the lack of generalization ability. [23,27] proposed two modules that can be easily transplanted into other CNN to obtain better performance; however, supervised learning methods cannot make appropriate adjustments to situations that have not appeared in the training set. (2) Deployment on mobile devices is difficult. Complicated, high-performing CNN are usually accompanied by millions of parameters, which is a challenge for real-time computation on mobile devices.

Along with those deficiencies, several directions may be further explored in the future. First, semantic segmentation remains
a computationally intensive algorithm for embedded deployment, and more efficient CNN architectures should be explored. Second, supervised learning requires a large amount of annotated data, and labeling data is a tedious and costly task; semi-supervised or weakly-supervised algorithms have been developed to make semi-supervised semantic segmentation possible. In addition, how to make accurate predictions under variable environments is critical; meta-learning should be a viable exploration. Finally, existing segmentation architectures rely on experience-driven design; auto-machine learning may provide new ideas for more efficient feature extractors.

Declaration of Competing Interest

We declare that we have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work was supported in part by the Important Science and Technology Project of Hainan Province under Grant ZDKJ201807, in part by the Hainan Provincial Natural Science Foundation of China under Grant 618QN309, in part by the Scientific Research Foundation Project of Haikou Laboratory, Institute of Acoustics, Chinese Academy of Sciences, and in part by the IACAS Young Elite Researcher Project under Grant QNYC201829 and Grant QNYC20174.

References

[1] J.-G. Wang, C.-J. Lin, S.-M. Chen, Applying fuzzy method to vision-based lane detection and departure warning system, Expert Syst. Appl. 37 (1) (2010) 113–126.
[2] P.-Y. Hsiao, C.-W. Yeh, S.-S. Huang, L.-C. Fu, A portable vision-based real-time lane departure warning system: day and night, IEEE Transactions on Vehicular Technology 58 (4) (2008) 2089–2094.
[3] X. Wang, Y. Wang, C. Wen, Robust lane detection based on gradient-pairs constraint, in: Proceedings of the 30th Chinese Control Conference, IEEE, 2011, pp. 3181–3185.
[4] J. Duan, Y. Zhang, B. Zheng, Lane line recognition algorithm based on threshold segmentation and continuity of lane line, in: 2016 2nd IEEE International Conference on Computer and Communications (ICCC), IEEE, 2016, pp. 680–684.
[5] Y. Chai, S.J. Wei, X.C. Li, The multi-scale Hough transform lane detection method based on the algorithm of Otsu and Canny, in: Advanced Materials Research, Vol. 1042, Trans Tech Publ, 2014, pp. 126–130.
[6] V. Gaikwad, S. Lokhande, Lane departure identification for advanced driver assistance, IEEE Transactions on Intelligent Transportation Systems 16 (2) (2014) 910–918.
[7] C. Mu, X. Ma, Lane detection based on object segmentation and piecewise fitting, TELKOMNIKA Indones. J. Electr. Eng. 12 (5) (2014) 3491–3500.
[8] D. Ding, C. Lee, K.-Y. Lee, An adaptive road ROI determination algorithm for lane detection, in: 2013 IEEE International Conference of IEEE Region 10 (TENCON 2013), IEEE, 2013, pp. 1–4.
[9] P.-C. Wu, C.-Y. Chang, C.H. Lin, Lane-mark extraction for automobiles under complex conditions, Pattern Recognit. 47 (8) (2014) 2756–2767.
[10] T. Aung, M.H. Zaw, Video based lane departure warning system using Hough transform, in: International Conference on Advances in Engineering and Technology, 2014, pp. 29–30.
[11] J. Niu, J. Lu, M. Xu, P. Lv, X. Zhao, Robust lane detection using two-stage feature extraction with curve fitting, Pattern Recognit. 59 (2016) 225–233.
[12] J.C. McCall, M.M. Trivedi, Video-based lane estimation and tracking for driver assistance: survey, system, and evaluation, IEEE Trans. Intelligent Transportation Systems (2006) 20–37.
[13] Y. Wang, D. Shen, E.K. Teoh, Lane detection using spline model, Pattern Recognit. Lett. 21 (8) (2000) 677–689.
[14] G.F. Montufar, R. Pascanu, K. Cho, Y. Bengio, On the number of linear regions of deep neural networks, in: Advances in Neural Information Processing Systems, 2014, pp. 2924–2932.
[15] M. Fu, X. Wang, H. Ma, Y. Yang, M. Wang, Multi-lanes detection based on panoramic camera, in: 11th IEEE International Conference on Control & Automation (ICCA), IEEE, 2014, pp. 655–660.
[16] Y. Li, L. Chen, H. Huang, X. Li, W. Xu, L. Zheng, J. Huang, Nighttime lane markings recognition based on Canny detection and Hough transform, in: 2016 IEEE International Conference on Real-time Computing and Robotics (RCAR), IEEE, 2016, pp. 411–415.
[17] J.-G. Kim, J.-H. Yoo, J.-C. Koo, Road and lane detection using stereo camera, in: 2018 IEEE International Conference on Big Data and Smart Computing (BigComp), IEEE, 2018, pp. 649–652.
[18] D. Mingfang, W. Junzheng, L. Nan, L. Duoyang, Shadow lane robust detection by image signal local reconstruction, International Journal of Signal Processing, Image Processing and Pattern Recognition 9 (3) (2016) 89–102.
[19] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[20] J. Kim, M. Lee, Robust lane detection based on convolutional neural network and random sample consensus, in: International Conference on Neural Information Processing, Springer, 2014, pp. 454–461.
[21] A. Gurghian, T. Koduri, S.V. Bailur, K.J. Carey, V.N. Murali, DeepLanes: end-to-end lane position estimation using deep neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2016, pp. 38–45.
[22] W. Zhang, T. Mahale, End to end video segmentation for driving: lane detection for autonomous car, arXiv:1812.05914, 2018.
[23] X. Pan, J. Shi, P. Luo, X. Wang, X. Tang, Spatial as deep: spatial CNN for traffic scene understanding, in: Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[24] M. Ghafoorian, C. Nugteren, N. Baka, O. Booij, M. Hofmann, EL-GAN: embedding loss driven generative adversarial networks for lane detection, in: Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018.
[25] G. Hinton, O. Vinyals, J. Dean, Distilling the knowledge in a neural network, arXiv:1503.02531, 2015.
[26] S. Zagoruyko, N. Komodakis, Paying more attention to attention: improving the performance of convolutional neural networks via attention transfer, arXiv:1612.03928, 2016.
[27] Y. Hou, Z. Ma, C. Liu, C.C. Loy, Learning lightweight lane detection CNNs by self-attention distillation, in: Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 1013–1021.
[28] N. Homayounfar, W.-C. Ma, J. Liang, X. Wu, J. Fan, R. Urtasun, DAGMapper: learning to map by discovering lane topology, in: Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 2911–2920.
[29] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, et al., Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.
[30] V. Nair, G.E. Hinton, Rectified linear units improve restricted Boltzmann machines, in: Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, pp. 807–814.
[31] G.E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, R.R. Salakhutdinov, Improving neural networks by preventing co-adaptation of feature detectors, arXiv:1207.0580, 2012.
[32] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.
[33] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, Computer Science (2014).
[34] C. Szegedy, S. Ioffe, V. Vanhoucke, A.A. Alemi, Inception-v4, Inception-ResNet and the impact of residual connections on learning, in: Thirty-First AAAI Conference on Artificial Intelligence, 2017.
[35] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[36] Z. Hu, J. Tang, Z. Wang, K. Zhang, L. Zhang, Q. Sun, Deep learning for image-based cancer detection and diagnosis: a survey, Pattern Recognit. 83 (2018) 134–149.
[37] R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580–587.
[38] R. Girshick, Fast R-CNN, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440–1448.
[39] S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: towards real-time object detection with region proposal networks, in: Advances in Neural Information Processing Systems, 2015, pp. 91–99.
[40] Y. Zhu, C. Zhao, J. Wang, X. Zhao, Y. Wu, H. Lu, CoupleNet: coupling global structure with local parts for object detection, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 4126–4134.
[41] Z. Li, C. Peng, G. Yu, X. Zhang, Y. Deng, J. Sun, Light-Head R-CNN: in defense of two-stage object detector, arXiv:1711.07264, 2017.
[42] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: unified, real-time object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.
[43] M. Najibi, M. Rastegari, L.S. Davis, G-CNN: an iterative grid based object detector, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2369–2377.
[44] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, A.C. Berg, SSD: single shot multibox detector, in: European Conference on Computer Vision, Springer, 2016, pp. 21–37.
[45] C.-Y. Fu, W. Liu, A. Ranga, A. Tyagi, A.C. Berg, DSSD: deconvolutional single shot detector, arXiv:1701.06659, 2017.
[46] T. Kong, F. Sun, A. Yao, H. Liu, M. Lu, Y. Chen, RON: reverse connection with objectness prior networks for object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5936–5944.
[47] X. Wu, D. Sahoo, S.C. Hoi, Recent advances in deep learning for object detection, Neurocomputing (2020).
[48] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.
[49] C. Peng, X. Zhang, G. Yu, G. Luo, J. Sun, Large kernel matters: improve semantic segmentation by global convolutional network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4353–4361.
[50] H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, Pyramid scene parsing network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2881–2890.
[51] L.-C. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking atrous convolution for semantic image segmentation, arXiv:1706.05587, 2017.
[52] H. Zhang, K. Dana, J. Shi, Z. Zhang, X. Wang, A. Tyagi, A. Agrawal, Context encoding for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7151–7160.
[53] A. Paszke, A. Chaurasia, S. Kim, E. Culurciello, ENet: a deep neural network architecture for real-time semantic segmentation, arXiv:1606.02147, 2016.
[54] E. Romera, J.M. Alvarez, L.M. Bergasa, R. Arroyo, ERFNet: efficient residual factorized convnet for real-time semantic segmentation, IEEE Transactions on Intelligent Transportation Systems 19 (1) (2017) 263–272.
[55] S.-Y. Lo, H.-M. Hang, S.-W. Chan, J.-J. Lin, Efficient dense modules of asymmetric convolution for real-time semantic segmentation, in: Proceedings of ACM Multimedia Asia, 2019, pp. 1–6.
[56] H. Zhao, Y. Zhang, S. Liu, J. Shi, C. Change Loy, D. Lin, J. Jia, PSANet: point-wise spatial attention network for scene parsing, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 267–283.
[57] Y. Guo, Y. Liu, T. Georgiou, M.S. Lew, A review of semantic segmentation using deep neural networks, Int. J. Multimed. Inf. Retr. 7 (2) (2018) 87–93.
[58] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, Y. LeCun, OverFeat: integrated recognition, localization and detection using convolutional networks, in: International Conference on Learning Representations, 2013.
[59] B. Huval, T. Wang, S. Tandon, J. Kiske, W. Song, J. Pazhayampallil, M. Andriluka, P. Rajpurkar, T. Migimatsu, R. Cheng-Yue, et al., An empirical evaluation of deep learning on highway driving, CoRR (2015).
[60] O. Elharrouss, N. Almaadeed, S. Almaadeed, Y. Akbari, Image inpainting: a review, Neural Processing Letters (2019) 1–22.
[61] S. Lee, J. Kim, J. Shin Yoon, S. Shin, O. Bailo, N. Kim, T.-H. Lee, H. Seok Hong, S.-H. Han, I. So Kweon, VPGNet: vanishing point guided network for lane and road marking detection and recognition, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 1947–1955.
[62] A. Borji, Vanishing point detection with convolutional neural networks, arXiv:1609.00967, 2016.
[63] Y. Huang, S. Chen, Y. Chen, Z. Jian, N. Zheng, Spatial-temporal based lane detection using deep learning, in: IFIP International Conference on Artificial Intelligence Applications and Innovations, Springer, 2018, pp. 143–154.
[64] S. Chougule, N. Koznek, A. Ismail, G. Adam, V. Narayan, M. Schulze, Reliable multilane detection and classification by utilizing CNN as a regression network, in: Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018.
[65] K.-Y. Chiu, S.-F. Lin, Lane detection using color-based segmentation, in: IEEE Proceedings, Intelligent Vehicles Symposium, 2005, IEEE, 2005, pp. 706–711.
[66] L. Riera, K. Ozcan, J. Merickel, M. Rizzo, S. Sarkar, A. Sharma, Driver behavior analysis using lane departure detection under challenging conditions, arXiv:1906.00093, 2019.
[67] K. He, G. Gkioxari, P. Dollar, R. Girshick, Mask R-CNN, IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017.
[68] S. Chougule, A. Ismail, A. Soni, N. Kozonek, V. Narayan, M. Schulze, An efficient encoder-decoder CNN architecture for reliable multilane detection in real time, in: 2018 IEEE Intelligent Vehicles Symposium (IV), IEEE, 2018, pp. 1444–1451.
[69] F. Pizzati, F. García, Enhanced free space detection in multiple lanes based on single CNN with scene identification, in: 2019 IEEE Intelligent Vehicles Symposium (IV), IEEE, 2019, pp. 2536–2541.
[70] F. Yu, V. Koltun, Multi-scale context aggregation by dilated convolutions, in: International Conference on Learning Representations, 2015.
[71] P. Wang, P. Chen, Y. Yuan, D. Liu, Z. Huang, X. Hou, G. Cottrell, Understanding convolution for semantic segmentation, in: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, 2018, pp. 1451–1460.
[72] P.-R. Chen, S.-Y. Lo, H.-M. Hang, S.-W. Chan, J.-J. Lin, Efficient road lane marking detection with deep learning, in: 2018 IEEE 23rd International Conference on Digital Signal Processing (DSP), IEEE, 2018, pp. 1–5.
[73] S.-Y. Lo, H.-M. Hang, S.-W. Chan, J.-J. Lin, Multi-class lane semantic segmentation using efficient convolutional networks, in: 2019 IEEE 21st International Workshop on Multimedia Signal Processing (MMSP), IEEE, 2019, pp. 1–6.
[74] A. Buades, B. Coll, J.-M. Morel, A non-local algorithm for image denoising, in: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), Vol. 2, IEEE, 2005, pp. 60–65.
[75] X. Wang, R. Girshick, A. Gupta, K. He, Non-local neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7794–7803.
[76] W. Li, F. Qu, J. Liu, F. Sun, Y. Wang, A lane detection network based on IBN and attention, Multimed. Tools Appl. (2019) 1–14.
[77] J. Zhang, Y. Xu, B. Ni, Z. Duan, Geometric constrained joint lane segmentation and lane boundary detection, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 486–502.
[78] V. John, N.M. Karunakaran, C. Guo, K. Kidono, S. Mita, Free space, visible and missing lane marker estimation using the PSINet and extra trees regression, in: 2018 24th International Conference on Pattern Recognition (ICPR), IEEE, 2018, pp. 189–194.