Pattern Recognition
journal homepage: www.elsevier.com/locate/patcog
A review of lane detection methods based on deep learning

J. Tang, S. Li ∗, P. Liu

Article history: Received 11 December 2019; Revised 13 June 2020; Accepted 29 August 2020; Available online 15 September 2020

Keywords: Lane detection; Deep learning; Semantic segmentation; Instance segmentation

∗ Corresponding author. E-mail address: lisongbin@mail.ioa.ac.cn (S. Li).

https://doi.org/10.1016/j.patcog.2020.107623

Abstract

Lane detection is an application of environmental perception, which aims to detect lane areas or lane lines by camera or lidar. In recent years, gratifying progress has been made in detection accuracy. To the best of our knowledge, this paper is the first attempt to make a comprehensive review of vision-based lane detection methods. First, we introduce the background of lane detection, including traditional lane detection methods and related deep learning methods. Second, we group the existing lane detection methods into two categories: two-step and one-step methods. Around this summary, we introduce lane detection methods from the following two perspectives: (1) network architectures, including classification- and object detection-based methods, end-to-end image segmentation-based methods, and some optimization strategies; (2) related loss functions. For each method, its contributions and weaknesses are introduced. Then, a brief comparison of representative methods is presented. Finally, we conclude this survey with the current challenges, such as expensive computation and the lack of generalization, and point out directions to be further explored in the future, namely semi-supervised learning, meta-learning, and neural architecture search.

© 2020 Elsevier Ltd. All rights reserved.
1. Introduction

With the development of intelligent transportation, environment perception, as an essential task for autonomous driving, has become a research hotspot. Lane detection is an important part of environmental perception, and many efforts have been made during the last decades. However, it is still a challenge to develop a robust detector under unconstrained conditions, because there are too many variables, such as fog, rain, illumination variation, and partial occlusion, that may affect the final results.

Pre-processing steps play an important role in heuristic recognition-based lane detection methods. To remove unwanted noise, many filters are used, including mean, median [1], Gaussian [2], and FIR [3] filters. To deal with illumination variation, the general solutions employ threshold segmentation algorithms [4], including Otsu [5] and PLSF [6], etc. A region of interest (ROI) is usually used to reduce redundant information; fixed-size ROI [7], vanishing point-based ROI [2], and adaptive ROI [8] have been widely explored. Color is another source of information for pre-processing: color space conversion between RGB and YCbCr or HLS is generally used to enhance the quality of lane markings.

Feature extraction and lane modeling are critical for obtaining a mathematical description of lanes. Many algorithms, including Sobel [7], Canny [9], FIR filters [10], and the Hough transform [11], are applied to extract features. Many algorithms model lanes as straight lines; for modeling curves, parabolic [12], Catmull-Rom spline [13], cubic B-spline [2], and clothoid curve [14] models are used. In complex conditions, inverse perspective transformation [15], image enhancement [16], stereo cameras [17], and wavelet analysis [18] are used.
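To make the heuristic pipeline concrete, here is a minimal sketch of the classic stages just described (Gaussian smoothing, Canny edge extraction, a fixed-size ROI, and a Hough transform for straight-line modeling), assuming OpenCV; the thresholds and the ROI polygon are illustrative placeholders rather than values from any cited method.

```python
import cv2
import numpy as np

def detect_lane_candidates(bgr_image):
    """Heuristic lane detection: smooth, extract edges, mask an ROI, fit lines."""
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)   # suppress sensor noise
    edges = cv2.Canny(blurred, 50, 150)           # gradient-based edge map

    # Fixed-size trapezoidal ROI over the lower half of the frame
    h, w = edges.shape
    roi = np.zeros_like(edges)
    polygon = np.array([[(0, h), (w, h), (w // 2 + 50, h // 2),
                         (w // 2 - 50, h // 2)]], dtype=np.int32)
    cv2.fillPoly(roi, polygon, 255)
    masked = cv2.bitwise_and(edges, roi)

    # Probabilistic Hough transform models lanes as straight-line segments
    return cv2.HoughLinesP(masked, 2, np.pi / 180, 50,
                           minLineLength=40, maxLineGap=100)
```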
In 2012, the convolutional neural network (CNN) AlexNet [19] won the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC). Since then, deep learning algorithms have become a promising tool. Over the past several years, via multi-layer nonlinear transforms, deep learning has achieved promising results in many fields, and a variety of deep learning methods have been applied to tackle the lane detection task, ranging from early CNN-based methods (e.g., [20,21]) to end-to-end segmentation-based methods (e.g., GCN [22], SCNN [23]) and GAN-based methods (e.g., EL-GAN [24]). In addition, knowledge distillation [25] and attention maps [26] have brought new ideas to lane detection (e.g., SAD [27]), and DAGMapper [28] gives a good explanation of how to understand the structure of lane lines from the perspective of directed acyclic graphs. Although promising results have been achieved, the lack of generalization ability is still a main challenge of existing methods: a CNN trained in one scenario may perform less accurately in another, especially at night.
Fig. 1. Architecture of a classification CNN. It consists of eight convolution layers and three fully connected layers. The output is a probability distribution over 1000 categories. Picture from [36].
To the best of our knowledge, this paper is the first article to make a comprehensive review of recent deep learning-based lane detection algorithms. The rest of the survey is arranged as follows. Section 2 describes the background of related CNN. Section 3 describes the CNN architectures, loss functions, pre-processing, and post-processing of deep learning-based lane detection algorithms. In Section 4, we conduct experiments to demonstrate four state-of-the-art and representative algorithms. The conclusion and future work are discussed in Section 5.

2. Background of related convolutional neural networks

The vision-based lane detection task, as an application of computer vision, can be defined as image classification, object detection, or semantic segmentation. CNN have revealed their powerful effects in a wide range of computer vision tasks. Due to the close link between lane detection and these fundamental vision tasks, it is necessary to introduce the background of the related convolutional neural networks.

2.1. Image classification

In 2012, AlexNet won ILSVRC with five convolution layers and three fully connected layers; it essentially extends the depth of LeNet [29] and applies techniques such as ReLU [30] and Dropout [31]. The structure of AlexNet is relatively simple but demonstrated the remarkable potential of CNN.

It has been proved that the solution space of a CNN can be expanded by increasing its depth or its width [14]. Following AlexNet [19], GoogLeNet [32] and VGG [33] achieved higher accuracy with deeper and wider architectures in ILSVRC2014: VGG increased the depth to 16-19 layers, while GoogLeNet increased both depth (22 layers) and width. GoogLeNet is also named Inception-v1, and it has evolved from Inception-v1 to Inception-v4 [34] with different optimizations. The generalization performance of VGG is better, and it is often used to extract image features in many fields. Deeper CNN may suffer from exploding or vanishing gradients; the short-cut connections of ResNet [35] make their training possible. The architecture of a single-pipeline CNN can be seen in Fig. 1.

2.2. Object detection

We can group the existing deep learning-based detectors into two categories: two-stage and one-stage methods. Two-stage methods include R-CNN [37], Fast R-CNN [38], Faster R-CNN [39], CoupleNet [40], and Light-Head R-CNN [41], etc., which first generate candidate regions by CNN or traditional methods and then classify them into categories. One-stage methods include YOLO [42], G-CNN [43], SSD [44], DSSD [45], and RON [46], etc.; they directly generate the category probability and position coordinates without a region proposal stage. Major methods of deep learning-based object detection are shown in Fig. 2.

2.3. Semantic segmentation

Semantic segmentation is another fundamental task of computer vision, which aims to classify every pixel into a category. In 2015, Jonathan Long et al. proposed Fully Convolutional Networks (FCN) [48]. Following FCN, the encoder-decoder architecture has been widely used to address image segmentation; an encoder-decoder CNN architecture is shown in Fig. 3. To fuse different contextual information, GCN [49] used large convolution kernels, PSP-Net [50] proposed a pyramid pooling module, the Deeplab series [51] adopts dilated spatial pyramid pooling, and EncNet [52] introduced a channel attention method. For real-time segmentation, ENet [53] proposed a bottleneck module, ERFNet [54] used residual connections and factorized convolutions, and EDANet [55] designed efficient dense modules and discarded deconvolution layers in order to remain efficient while retaining remarkable accuracy. In addition, PSANet [56] captures pixel-wise relations by a convolution layer. Readers can refer to the survey [57] for a more comprehensive review.
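As an illustration of the encoder-decoder pattern sketched in Fig. 3, the following minimal PyTorch model (not any specific published architecture) shows how convolution with pooling forms the encoder, transposed convolution forms the decoder, and a skip connection helps preserve structure:

```python
import torch
import torch.nn as nn

class TinyEncoderDecoder(nn.Module):
    """Minimal encoder-decoder for per-pixel classification (e.g., lane vs. background)."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)                        # downsample by 2
        self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)  # upsample by 2
        self.head = nn.Conv2d(32, num_classes, 1)          # 1x1 per-pixel classifier

    def forward(self, x):                       # assumes even H and W
        f1 = self.enc1(x)                       # full-resolution features
        f2 = self.enc2(self.pool(f1))           # half-resolution features
        up = self.up(f2)                        # back to full resolution
        fused = torch.cat([up, f1], dim=1)      # skip connection preserves detail
        return self.head(fused)                 # per-pixel class scores
```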
3. Deep learning for lane detection

We can group the existing lane detection methods into two categories: two-step and one-step methods. Two-step methods are composed of a feature-extraction step and a post-processing step. To be specific, feature extraction includes heuristic recognition-based feature extraction and deep learning-based feature extraction; post-processing mainly contains clustering and fitting. One-step methods obtain the detection and clustering results directly from the input image; hence, there is no need for clustering, and we summarize those methods as one-step methods.

Around this summary, we will discuss the existing deep learning-based lane detection algorithms from two perspectives: network architectures and loss functions. Post-processing is another critical part of two-step algorithms, which will be introduced in Section 3.4 together with pre-processing.

3.1. Network architecture

As a specific application, there are many strategies for lane detection. From the perspective of how the lane detection task is defined, they can be summarized into the following three categories:
Fig. 2. Major methods of deep learning-based object detection. Anchor-free (in red) and AutoML (in green) techniques have become two important research directions. Picture from [47].
Fig. 3. Sketch of an encoder-decoder CNN. Convolution with pooling constitutes the encoder section; transposed convolution constitutes the decoder section. Skip connections between encoder and decoder are usually used for structure preservation. Picture from [36].
1) classification-based methods, which combine some prior information to determine lane positions; 2) object detection-based methods: by labeling regression bounding boxes or feature points for each lane segment, lanes can be detected by coordinate regression; 3) segmentation-based methods: lanes and background pixels are labeled as different classes, and the detection results are obtained in the form of pixel-level classification (semantic segmentation/instance segmentation). From the perspective of model structure, it can be summarized as the following two types: 1) single-task models, where only lane detection is considered and other road signs are not involved; 2) multi-task models, which combine lane detection with other tasks, such as drivable area detection, road marking recognition, and road type or lane type classification. In practice, many available structures and ideas can be extracted from existing CNN, such as the feature extractors VGG and ResNet, and the end-to-end architecture of FCN, etc.

3.1.1. Benchmark of CNN based method for lane detection

In 2014, Jiun Kim and Minho Lee [20] proposed a detector where a CNN was first used to extract lane features and random sample consensus (RANSAC) was used for clustering. The CNN architecture is shown in Fig. 4; it consists of 8 layers, with 3 convolutional, 2 subsampling, and 3 fully-connected layers. The training dataset is composed of images after ROI selection and edge detection. The last fully-connected layer outputs the predicted image (100 × 15), where predicted lane pixels are denoted as white.

Though the CNN sketch of [20] is relatively simple, it can be seen as an approximation of the complex mapping between the input and output spaces. Despite the advancement of this method compared to traditional ones, the following problems exist: (1) the pre-processed input of this method leads to complicated data processing; therefore, can we cancel the pre-processing steps and train the CNN in a more concise way? (2) this method has an eight-layer architecture; hence, can more complicated CNN architectures (with different depths, widths, and topologies) achieve better results?

3.1.2. Classification and object detection based methods

A. Classification-based lane detection methods

The application of image classification generally aims to discriminate what object is contained in the input. To all appearances, this alone cannot obtain the location of the lanes, so some tricks are needed to bridge classification and lane detection. We assume that some location-dependent prior knowledge is known, denoted as pk(p). A CNN, as a mapping function f(x), can be combined with pk(p) to form a new formulation o = f(x, pk(p)). DeepLane [21] is a method based on this idea; the overall CNN architecture is shown in Fig. 5. In detail, the training dataset consisted of images (resolution: 240 × 360) from laterally-mounted, down-facing cameras. To obtain a probability distribution of lane positions, a softmax function is applied to the output of the last fully connected layer (317 outputs: 316 possible classes for lane positions and one class for the absence of a lane marker). Hence, the CNN outputs a vector Y_i = (y_0, ..., y_316). To locate the lane marking, the estimated position e_i is defined as Eq. (3.1).

$$e_i = \arg\max(y_i), \quad 0 \le i \le 316 \tag{3.1}$$

DeepLane got a better result with a more complex network (normalization layer, dropout layer) than [20]. However, the prior position setting limits its application scenarios. A more general approach ...
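To make the decoding step of Eq. (3.1) concrete, here is a minimal sketch of how the 317-way classification output described above can be turned into a lane position; the tensor layout and the use of PyTorch are our illustrative assumptions, not details from [21]:

```python
import torch

def decode_lane_position(logits):
    """logits: (batch, 317) raw outputs; classes 0..315 are column positions,
    class 316 means 'no lane marker visible'."""
    probs = torch.softmax(logits, dim=1)   # probability distribution over positions
    e = torch.argmax(probs, dim=1)         # Eq. (3.1): most likely class per sample
    has_lane = e != 316                    # mask out the absence class
    return e, has_lane
```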
Fig. 4. CNN architecture in [20]. The detector consists of three convolution, two subsampling, and three fully connected layers. The 1500 output nodes of the last fully connected layer are reshaped to 100 × 15, which is considered the predicted map.
Fig. 6. Sketch of VPGNet [61]. A four-branch CNN is designed for grid regression, object detection, multi-label classification, and vanishing point prediction.
Fig. 8. Sketch of LMD [72]. Three dilated convolution layers are added between the encoder and decoder.
For a more detailed illustration of dilated convolution, readers can refer to [71]. The advantages of dilated convolution have been established, but how to design effective CNN structures based on dilated convolution is a new problem.

Based on VGG, Shao-Yuan Lo et al. proposed LMD (Lane Marking Detection) [72]. Three dilated convolution layers are embedded between the encoder and decoder; its architecture can be seen in Fig. 8. The output is a predicted binary segmentation image: predicted lane pixels are denoted as 1, and predicted background pixels as 0. In 2019, based on EDANet, Shao-Yuan Lo et al. [73] proposed another lane detection CNN with embedded dilated convolutions. By rethinking the relationship between downsampling operations and spatial information, FSS (Feature Size Selection) and DDB (Degressive Dilation Block) were proposed.
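To illustrate how dilated convolution enlarges the receptive field without downsampling, here is a hedged sketch of an LMD-style context block of three dilated layers (as in Fig. 8); the channel width and the dilation rates are illustrative assumptions rather than the exact values of [72]:

```python
import torch.nn as nn

def dilated_context_block(channels=128):
    """Three 3x3 convolutions with growing dilation: the receptive field grows
    from 3 to 7 to 15 pixels while the feature resolution stays unchanged."""
    layers = []
    for rate in (1, 2, 4):  # illustrative dilation schedule
        layers += [nn.Conv2d(channels, channels, kernel_size=3,
                             padding=rate, dilation=rate),  # padding=rate keeps H x W
                   nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)
```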
How to effectively obtain associated long-range information is another problem. Inspired by non-local means [74] in classic computer vision, Wang Xiaolong et al. proposed learnable non-local operations [75] to capture long-range dependencies. Learnable non-local long-range dependencies can be used to establish the connection between two pixels at an uncertain distance in an image, between two frames in a video, or between different words in a paragraph, etc. The non-local operation meets the needs of lane detection. In 2019, Li Wenhui et al. added non-local operations to IANet (Instance batch normalization and Attention Network) [76] to force the CNN to focus on lane regions. Experimental results show that the mechanism is suitable for two-class segmentation scenes.

B. Combining prior knowledge with segmentation

In terms of road geometric features, GCLNet (Geometric Constrained Network) [77] went further than VPGNet. In GCLNet, Zhang Jie et al. proposed a multiple-task framework with mutually interlinked sub-structures between lane segmentation and lane boundary detection to improve overall performance. As shown in Fig. 9, each decoder is connected to a link encoder to transfer complementary information between the two tasks, and thus the features of the two decoders can be reciprocally refined. This thorough exploration of the relationship between lane areas and lane boundaries provides ideas for future researchers trying to adopt a multi-task strategy. Vijay John et al. proposed PsiNet [78] for detecting ...
Fig. 10. Sketch of CNN-LSTM [79]. The figure does not show the details of the convolution layers of the encoder and decoder, which can be U-Net or SegNet.
Fig. 11. Sketch of CooNet [64]. A maximum of four lane lines can be detected. Each 1 × 30 output vector denotes the coordinates of 15 points.
Fig. 12. Sketch of SCNN [23]. The encoder can be VGG or another effective feature extractor. We can consider the slice-by-slice convolution as four 1 × n and n × 1 convolutions.
Fig. 13. Sketch of LaneNet [81]. This method is based on [80]; the two branches are the same as in ENet.
3.1.4. Some optimization strategies of deep learning-based lane detection methods

How to remove the post-processing steps? How to get good performance with a small dataset? Despite what has been achieved, there are still many problems that need to be solved. These problems also exist in other applications of deep learning in computer vision.

A. Good performance with a small dataset

The effectiveness of the pre-training strategy implies that models can share many intermediate representations across different datasets. How can a trained model help the training of new models? We summarize the feasible methods used in lane detection into the following two types: transfer learning and knowledge distillation methods.

We can divide the datasets for transfer learning into two categories: the source dataset and the target dataset. The source dataset refers to additional data that is not directly related to the task; the target dataset is directly related to the task. When both target data and source data are labeled, fine-tuning is a common method to deal with this problem. In order to achieve good transfer performance, there is a wide discussion on which layers to fix and which layers to train [82]. In 2015, knowledge distillation (KD) [25] was proposed by Hinton et al., which uses a well-performing network (teacher network) to guide the training of a network with fewer parameters (student network) to improve the performance of the student network. Furthermore, the studies [26,83] expanded KD to attention distillation.

How can the above methods be applied to lane detection? In 2017, Jiman Kim and Chanjong Park proposed TLELane (Transfer Learning for Ego Lane detection) [84], based on two transfer learning steps: the first step changes the representation domain of the network from the general scene to the road scene, and the second step reduces the target from general road objects to the left and right ego lanes. In a trained segmentation-based lane detection network, the attention maps from different layers capture rich contextual information, which indicates the lanes' location and rough outline. Hence, it is feasible to let a preceding block mimic the attention maps of a deeper block. In [27], Yuenan Hou et al. proposed a self-attention distillation algorithm: different from attention distillation, the network learns from itself (e.g., block 3 mimics block 4 and block 2 mimics block 3), hence the name self-attention distillation (a sketch is given below).
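The following is a simplified sketch of the self-attention distillation idea of [27], under our own simplifying assumptions: an activation-based attention map is formed by summing squared channels, and each block is trained to mimic the (detached) map of the next deeper block, with interpolation handling resolution mismatch. The exact map definition and loss weighting in [27] may differ.

```python
import torch
import torch.nn.functional as F

def attention_map(features):
    """Collapse (N, C, H, W) features into a normalized spatial attention map."""
    amap = features.pow(2).sum(dim=1, keepdim=True)   # activation energy per location
    return F.normalize(amap.flatten(1), dim=1)        # (N, H*W), unit norm

def sad_loss(block_feats):
    """block_feats: feature maps from successive blocks (shallow -> deep).
    Each block mimics the attention map of its successor (e.g., block2 -> block3)."""
    loss = 0.0
    for shallow, deep in zip(block_feats[:-1], block_feats[1:]):
        target = attention_map(deep).detach()         # the deeper map is the teacher
        resized = F.interpolate(shallow, size=deep.shape[2:],
                                mode='bilinear', align_corners=False)
        loss = loss + F.mse_loss(attention_map(resized), target)
    return loss
```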
Fig. 14. Sketch of EL-GAN. The generator and discriminator are Tiramisu DenseNet-based and two-headed DenseNet-based, respectively. L_emb denotes the loss on the feature maps, which can be considered a perceptual loss.
B. To remove post-processing

In Section 3.1.3.C, we discussed how to simplify the post-processing, which mainly focuses on removing the clustering. In this section, we discuss how to remove the post-processing entirely, including both the clustering step and the fitting step. That is, the output of the CNN contains the predicted lanes and a parameterized description of each detected lane.

RLaneNet (Real-time Lane Network) [85] is another attempt to fuse CNN and LSTM, where the LSTM serves as a solution to the uncertain number of lanes and as a decoder to decode the parameters of each lane. The predicted values of RLaneNet are the three X coordinates of the points lying on the intersections of the lane with three horizontal lines (Y = 0, Y = h/2, Y = h), where h is the image height after IPM. There is an assumption that lanes can be described by three point coordinates and a quadratic function.

DLFNet (Differentiable Least-squares Fitting Network) [86] is a more general strategy, which estimates lane curvature parameters by solving a weighted least-squares problem in-network. The weights are generated by a deep network conditioned on the input image. A geometric loss function is used to train the network to minimize the area between the predicted lane line and the ground-truth. The weighted least-squares fitting problem can be considered as Eq. (3.3), where ω is the pixel weight map and X, Y are the coordinate matrices. Therefore, the parameters β of the best-fitting curve through the weighted pixel coordinates can be obtained from Eq. (3.3).

$$\omega X \beta = \omega Y \tag{3.3}$$
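A hedged sketch of the in-network weighted least-squares step of Eq. (3.3), assuming a quadratic lane model x = β0 + β1·y + β2·y²; because the normal equations are solved with differentiable operations, gradients flow back into the weight-generating network. The curve order and normalization here follow [86] only loosely.

```python
import torch

def weighted_lane_fit(weights, ys, xs):
    """Solve (X^T W X) beta = X^T W Y for a quadratic lane model.

    weights: (P,) per-pixel weights from the network; ys/xs: (P,) pixel coords.
    Every op is differentiable, so d(beta)/d(weights) exists for backprop."""
    X = torch.stack([torch.ones_like(ys), ys, ys ** 2], dim=1)  # (P, 3) design matrix
    W = weights.unsqueeze(1)                                    # (P, 1)
    lhs = X.t() @ (W * X)                                       # X^T W X, (3, 3)
    rhs = X.t() @ (W * xs.unsqueeze(1))                         # X^T W Y, (3, 1)
    beta = torch.linalg.solve(lhs, rhs)                         # curve parameters
    return beta.squeeze(1)                                      # (beta0, beta1, beta2)
```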
C. Better output structure-preserving

As mentioned in Section 3.1.3.A, the emphasis of semantic segmentation is to obtain an accurate classification per pixel rather than a specific shape. In 2014, Ian J. Goodfellow et al. [87] proposed the generative adversarial network (GAN) architecture. The basic principle of GAN is a game between a generator network and a discriminator network: the generator generates synthetic data from a given noise (generally drawn from a uniform or a normal distribution), and the discriminator discriminates between the output of the generator and real data. The former attempts to produce data that is closer to reality; accordingly, the latter attempts to accurately distinguish whether the input is real or generated. Readers can refer to [88–90] for further discussions of GAN.

Based on the principle of GAN, Mohsen Ghafoorian et al. proposed EL-GAN (Embedding loss GAN) [24] for structure-preserving lane detection. DenseNet [91,92] is used in the generator and discriminator, as shown in Fig. 14. As a matter of fact, we can regard the embedding loss as a perceptual loss [93], and EL-GAN as a combination of CGAN and perceptual loss.

3.2. A summary of deep learning-based representative lane detection algorithms

In this section, we give a summary of the methods, advantages, and limitations of existing representative deep learning-based lane detection algorithms in Table 1. We divide those algorithms into two categories: two-step algorithms and one-step algorithms.

3.3. Loss function

As a key component of deep learning, loss functions are used to calculate the inconsistency between the predicted output and the ground truth and to guide the model optimization. Different loss functions focus on different tasks; therefore, a variety of loss functions are adopted to guide the lane detection task. In this section, we take a look at the loss functions used in the lane detection field.

3.3.1. For classification

L1 loss, L2 loss, and Cross-Entropy loss are widely used in many tasks, such as lane line type classification and pixel-level classification. The equations of L1 and L2 are shown in (3.4) and (3.5), where h, w, c denote the height, width, and number of channels of an image, respectively, and Î and I are the predicted output and the input. As shown in Eq. (3.6), an inter-class competition mechanism is adopted in Cross-Entropy loss. When C = 2, the Cross-Entropy loss can be defined as Eq. (3.7), where s_i denotes the predicted probability of class t_i. In Eq. (3.8), different weights are given to each category; this is an effective solution for the problem of sample imbalance. For example, in lane detection, the ratio of lane line areas to background is extremely unbalanced. In [79] and [23], the weights of lane lines and background were set to 1.0 and 0.4, respectively.

$$L_1(\hat{I}, I) = \frac{1}{hwc}\sum_{i,j,k}\left|\hat{I}_{i,j,k} - I_{i,j,k}\right| \tag{3.4}$$

$$L_2(\hat{I}, I) = \frac{1}{hwc}\sum_{i,j,k}\left(\hat{I}_{i,j,k} - I_{i,j,k}\right)^2 \tag{3.5}$$

$$L_{ce} = -\sum_{i}^{C} t_i \log(s_i) \tag{3.6}$$

$$L_{bce} = -\sum_{i=1}^{C=2} t_i \log(s_i) = -t_1\log(s_1) - (1 - t_1)\log(1 - s_1) \tag{3.7}$$

$$L_{wce} = -\sum_{i}^{C} w_i t_i \log(s_i) \tag{3.8}$$
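As a concrete instance of Eq. (3.8), here is a sketch of weighted pixel-wise Cross-Entropy with the 1.0/0.4 lane/background weights reported for [79] and [23], assuming PyTorch's built-in class weighting:

```python
import torch
import torch.nn as nn

# Class 0 = background (weight 0.4), class 1 = lane (weight 1.0), as in [79] and [23]
criterion = nn.CrossEntropyLoss(weight=torch.tensor([0.4, 1.0]))

logits = torch.randn(2, 2, 256, 512)          # (batch, classes, H, W) raw scores
target = torch.randint(0, 2, (2, 256, 512))   # per-pixel ground-truth labels
loss = criterion(logits, target)              # Eq. (3.8) averaged over pixels
```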
Compared with the L1 loss, the L2 loss is more sensitive to large errors but less sensitive to small errors. Assume that the simplified L2 loss function is given by Eq. (3.9), with ŷ_i = σ(W x_i + b). Its partial derivative is shown in Eq. (3.10), where
σ′(W x_i + b) = σ(W x_i + b)(1 − σ(W x_i + b)) and σ is the sigmoid function. Thus, we can see that when σ(W x_i + b) is close to 0 or 1, dJ/dW will be close to 0, which leads to slow convergence at the beginning of training. For the Cross-Entropy loss function, the partial derivative is shown in Eq. (3.11). Therefore, in semantic segmentation or segmentation-based lane detection algorithms, Cross-Entropy loss or weighted Cross-Entropy loss is more applicable.

$$J = \frac{1}{2}\left(y_i - \hat{y}_i\right)^2 \tag{3.9}$$

$$\frac{dJ}{dW} = \left(y_i - \hat{y}_i\right)\sigma'(W x_i + b)\,x_i \tag{3.10}$$

$$\frac{dL_{ce}}{dW} = \left[\sigma(s_i) - y_i\right]\cdot x_i \tag{3.11}$$

However, due to the inter-class competition mechanism, Cross-Entropy loss only cares about the accuracy of the predicted probability of the correct label and ignores the differences among the incorrect labels. This makes the learned features scattered. To overcome this drawback, from the perspective of the activation function there are L-Softmax [94] and A-Softmax [95], etc. From the perspective of the loss function, [77] proposed an IoU loss for lane detection. We name it L_IoU-soft because it represents the relationship between the predicted probability and the ground-truth. As shown in Eq. (3.12), I denotes the set of image pixels, y_p is the output probability of pixel p, g ∈ {0, 1}^{M×N} denotes the set of ground-truth pixels, and '×' denotes pixel-wise multiplication. To increase the intersection-over-union between the predicted lane pixels and the ground-truth lane pixels, another IoU loss was proposed in [27]. We name it L_IoU-hard because it represents the relationship between the predicted results and the ground-truth. L_IoU-hard is defined by Eq. (3.13), where N_p is the number of predicted lane pixels, N_g is the number of ground-truth lane pixels, and N_o is the number of lane pixels in the overlap between the predicted lane areas and the ground-truth lane areas.

$$L_{IoU\text{-}soft} = 1 - \frac{\sum_{p\in I}(y_p \times g_p)}{\sum_{p\in I}(y_p + g_p - y_p \times g_p)} \tag{3.12}$$

$$L_{IoU\text{-}hard} = 1 - N_p/(N_p + N_g - N_o) \tag{3.13}$$

In GAN, the loss functions are defined as Eq. (3.16)-(3.18), where x is the ground-truth data and z is the input noise data. It is a variation of Cross-Entropy loss.

$$\min_G \max_D V(D, G) = \mathbb{E}_{x\sim P_{data}(x)}[\log D(x)] + \mathbb{E}_{z\sim P_z(z)}[\log(1 - D(G(z)))] \tag{3.16}$$

$$\max_D V(D, G) = \mathbb{E}_{x\sim P_{data}(x)}[\log D(x)] + \mathbb{E}_{z\sim P_z(z)}[\log(1 - D(G(z)))] \tag{3.17}$$

$$\min_G V(D, G) = \mathbb{E}_{z\sim P_z(z)}[\log(1 - D(G(z)))] \tag{3.18}$$

In EL-GAN, the loss function is constituted by the adversarial loss L_adv, the Cross-Entropy loss L_cce, and the L2 loss L_emb, as shown in Eq. (3.19)-(3.22). The L_emb loss can be seen as a perceptual loss, which compares the features obtained by convolving real images with the features obtained by convolving generated images, so as to make the high-level information (content and global structure) close to each other. The perceptual loss is widely used in the super-resolution field [93] for better structure preservation.

$$L_{fit} = L_{fit}(G(x;\theta_{gen}), y) = L_{cce}(G(x;\theta_{gen}), y) \tag{3.19}$$

$$L_{cce}(\dot{y}, y) = -\frac{1}{wh}\sum_i^w \sum_j^h y_{i,j}\ln \dot{y}_{i,j} \tag{3.20}$$

$$L_{adv} = \mathbb{E}_{x\sim P(x)}[\log(1 - D(G(x)))] \tag{3.21}$$

$$L_{emb}(\dot{y}, y; x, \theta_{disc}) = \left\| D_e(y; x, \theta_{disc}) - D_e(\dot{y}; x, \theta_{disc})\right\|_2 \tag{3.22}$$

In this section, we introduced the various loss functions widely used in the lane detection field. For a multi-branch detector, a weighted average of multiple loss functions is frequently used. There are many other loss functions in the lane detection field; however, they can be considered combinations of the functions mentioned in this section.
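A minimal sketch of L_IoU-soft from Eq. (3.12), assuming `probs` holds per-pixel lane probabilities and `gt` the binary ground-truth map:

```python
import torch

def iou_soft_loss(probs, gt, eps=1e-6):
    """Eq. (3.12): 1 - soft intersection over soft union.

    probs: (N, H, W) predicted lane probabilities y_p in [0, 1]
    gt:    (N, H, W) binary ground-truth map g_p in {0, 1}"""
    inter = (probs * gt).sum(dim=(1, 2))                 # sum_p y_p * g_p
    union = (probs + gt - probs * gt).sum(dim=(1, 2))    # sum_p y_p + g_p - y_p * g_p
    return (1.0 - inter / (union + eps)).mean()          # eps guards empty images
```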
3.4. Pre-processing and post-processing
Fig. 16. An illustration of the DBSCAN algorithm on a smile-shaped dataset. The dataset is clustered into five clusters when min-points = 4 and epsilon = 1.
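Because DBSCAN needs only a neighborhood radius and a minimum point count rather than a preset number of clusters, it suits post-processing where the number of lanes is unknown. A sketch with the Fig. 16 parameters, assuming scikit-learn and a toy set of predicted lane-pixel coordinates:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# points: (P, 2) coordinates of pixels predicted as 'lane' (or their embeddings);
# these toy values are illustrative only
points = np.array([(10, 5), (11, 5), (12, 5), (11, 6),
                   (80, 40), (81, 40), (82, 40), (81, 41)], dtype=float)

# eps = epsilon (neighborhood radius), min_samples = min-points, as in Fig. 16
labels = DBSCAN(eps=1.0, min_samples=4).fit_predict(points)

# Each non-negative label is one lane instance; -1 marks noise pixels
for lane_id in set(labels) - {-1}:
    lane_pixels = points[labels == lane_id]  # pixels of a single lane, ready for fitting
```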
Table 1
Summary of various deep learning-based lane detection algorithms.

Two-step algorithms:

Jiun Kim et al. (2014) [20]
  Method: Classification. (A) 8-layer CNN; (B) RANSAC.
  Advantages: (1) Compared with traditional methods, it achieved good performance.
  Limitations: (1) The network structure is not efficient enough.

Brody Huval et al. (2015) [59]
  Method: Object detection. (A) Vehicle detection and lane detection; (B) depth estimation.
  Advantages: (1) Better satisfies autopilot requirements; (2) robustness against occlusion to some extent; (3) used CNN to estimate geometric information.
  Limitations: (1) Complicated data collection and annotation; (2) repeated detection and coordinate regression.

Seokju Lee et al. (2017) [61]
  Method: Object detection. (A) Lane and road marking recognition; (B) vanishing point estimation.
  Advantages: (1) Improved robustness under various conditions.
  Limitations: (1) Post-processing has high computational complexity.

Huang et al. (2018) [63]
  Method: Object detection. (A) IPM; (B) coordinate regression; (C) sub-image extraction.
  Advantages: (1) Used temporal and spatial constraints to reduce the search area.
  Limitations: (1) Complicated data flow and architecture; (2) when the initial assumptions are not met, pre-processing cannot provide valid results.

Ping-Rong Chen et al. (2018) [72]
  Method: Segmentation. (A) End-to-end; (B) dilated convolution.
  Advantages: (1) Dilated convolution is employed to enlarge the receptive fields.
  Limitations: (1) This is just an application of dilated convolution; the performance does not reach SOTA.

Zhang Jie et al. (2018) [77]
  Method: Segmentation. (A) End-to-end; (B) multiple-task framework.
  Advantages: (1) Pays more attention to the interlinked relationship of sub-structures.
  Limitations: (1) The complex loss function and network structure bring many difficulties in training.

Zou Qin et al. (2019) [79]
  Method: Segmentation. (A) Combining CNN and LSTM; (B) the encoder input is 5 consecutive frames.
  Advantages: (1) The exploration of temporal information improves performance in occlusion scenes.
  Limitations: (1) High computational complexity; (2) when the input images do not change much, the improvement is conditional.

Mohsen Ghafoorian et al. (2018) [24]
  Method: Segmentation. (A) CGAN network.
  Advantages: (1) The embedding loss in the discriminator can effectively push the output boundary closer to the label.
  Limitations: (1) Large number of parameters.

One-step algorithms:

Gurghian et al. (2016) [21]
  Method: Classification. (A) Combines prior position and classification results to estimate lane position.
  Advantages: (1) Fast detection; (2) simple network structure.
  Limitations: (1) Limited application scenarios; (2) fixed camera parameters.

Li Wenhui et al. (2019) [76]
  Method: Segmentation. (A) End-to-end; (B) non-local attention; (C) instance batch normalization.
  Advantages: (1) Verified that attention is suitable for the two-class segmentation task with only lane and background.
  Limitations: (1) Non-local operations add large computation.

Shriyash Chougule et al. (2018) [64]
  Method: Regression. (A) Multi-branch; (B) coordinate regression; (C) data augmentation.
  Advantages: (1) This strategy does not require a clustering step; (2) lightweight network.
  Limitations: (1) Only a fixed number of lane lines can be detected.

Xingang Pan et al. (2018) [23]
  Method: Segmentation. (A) End-to-end; (B) slice-by-slice convolution.
  Advantages: (1) Slice-by-slice convolution is suitable for long continuous shape structures.
  Limitations: (1) High computational complexity.

Neven Davy et al. (2018) [81]
  Method: Segmentation. (A) Instance segmentation.
  Advantages: (1) Proposed an H-Net to estimate the IPM transformation matrix; (2) the number of lanes is not fixed.
  Limitations: (1) The H-Net is not very effective.

Jiman Kim et al. (2017) [84]
  Method: Segmentation. (A) Two-step transfer learning.
  Advantages: (1) Overcomes the weak points of a small dataset.
  Limitations: (1) Only ego lanes can be detected.

Yuenan Hou et al. (2019) [27]
  Method: Segmentation. (A) Self-attention distillation.
  Advantages: (1) The self-attention distillation strategy is efficient.
  Limitations: (1) The complex training strategies and loss functions make hyperparameter adjustment difficult.

Wang Ze et al. (2018) [85]
  Method: Regression. (A) Edge proposal; (B) parameter regression.
  Advantages: (1) The LSTM serves as a solution to the uncertain number of lanes; (2) does not need any post-processing.
  Limitations: (1) The ordinates of the three points to be detected are predefined.

Van Gansbeke et al. (2019) [86]
  Method: Segmentation. (A) Generating coordinate weight maps; (B) a differentiable least-squares fitting module.
  Advantages: (1) A more general strategy without any predefined conditions.
  Limitations: (1) Only a fixed number of lane lines can be detected; (2) when the number of weight maps is increased, the performance degrades.
Fig. 20. Experiments of CarND under different circumstances. Line 1: ideal environment. Line 2: complex conditions. From left to right: input, after IPM, after binarization, after fitting, histogram map.
Fig. 21. Three results under curved lanes. Line 1: result of LaneNet [81]. Line 2: result of SCNN [23]. Line 3: result of ERFNet-DLSF [86].
Table 2
The TuSimple dataset.

Name | Frames | Train | Validation | Test | Resolution
TuSimple | 6408 | 3268 | 358 | 2782 | 1280 × 720

Table 3
Comparisons among some representative deep models on TuSimple (ACC in %).

Paper | Method | Dataset | ACC | FPR | FNR
[35] | ResNet18-based | TuSimple | 92.69 | 0.0948 | 0.0822
[35] | ResNet34-based | TuSimple | 92.84 | 0.0918 | 0.0796
[53] | ENet | TuSimple | 93.02 | 0.0886 | 0.0734
[86] | ERFNet-DLSF | TuSimple | 93.38 | 0.1064 | 0.0983
[27] | ENet-SAD | TuSimple | 96.64 | 0.0602 | 0.0205
[81] | LaneNet | TuSimple | 96.38 | 0.0442 | 0.0197
[23] | SCNN | TuSimple | 96.53 | 0.0617 | 0.0180
[24] | EL-GAN | TuSimple | 96.39 | 0.0412 | 0.0336
[79] | CNN-LSTM (SegNet+) | TuSimple | 97.30 | 0.0416 | 0.0186
[79] | CNN-LSTM (UNet+) | TuSimple | 97.20 | 0.0424 | 0.0184
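For readers reproducing Table 3, the following is a simplified sketch of how TuSimple-style ACC/FPR/FNR are commonly computed (point-wise accuracy with a pixel threshold plus lane-level matching); the 20-px and 0.85 thresholds follow the common benchmark convention and are our assumption here, so the official evaluation script should be treated as authoritative.

```python
import numpy as np

PIXEL_THRESH = 20    # a point counts as correct within 20 px of the ground truth
MATCH_RATIO = 0.85   # a lane counts as detected if >= 85% of its points are correct

def point_accuracy(pred_xs, gt_xs):
    """Fraction of benchmark rows where the predicted x is close enough.
    pred_xs, gt_xs: (R,) x-coordinates at fixed rows; negative gt marks absent rows."""
    valid = gt_xs >= 0
    if not valid.any():
        return 0.0
    return float((np.abs(pred_xs[valid] - gt_xs[valid]) < PIXEL_THRESH).mean())

def evaluate_image(pred_lanes, gt_lanes):
    """Simplified per-image ACC / FP / FN in the spirit of Table 3."""
    scores = [max((point_accuracy(p, g) for p in pred_lanes), default=0.0)
              for g in gt_lanes]
    acc = float(np.mean(scores)) if scores else 1.0
    fn = sum(s < MATCH_RATIO for s in scores)                        # missed lanes
    fp = max(len(pred_lanes) - sum(s >= MATCH_RATIO for s in scores), 0)
    return acc, fp, fn
```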
The ResNet18-based and ResNet34-based methods use spatial upsampling as the decoder rather than deconvolution. The overall architectures of the four selected algorithms [23,79,81,86] have been shown in Section 3. As shown in Fig. 21 and Fig. 22, the experiments are conducted under curved lanes and shadow conditions.

Feature maps of the middle layers reflect what the CNN has extracted. For an encoder-decoder CNN, the encoder encodes the input image into multi-dimensional, low-resolution feature maps. Fig. 23 shows several feature maps of the encoder outputs of [23,79,86]. We can see that SCNN extracts more obvious linear features.

Fig. 22. Four results under shadow conditions. Line 1: results of LaneNet [81]; from left to right: semantic segmentation map, embedding map, instance segmentation map, and fused image. Line 2: results of SCNN [23]; from left to right: instance segmentation map, fused image. Line 3: results of ERFNet-DLSF [86]; from left to right: weight map of the left lane line, weight map of the right lane line, fused weight map of the above two, predicted result. Line 4: result of CNN-LSTM [79]; from left to right: four continuous frames of input, result of semantic segmentation.

Fig. 23. From left to right: feature maps of the encoder outputs of [79], [23], and [86], respectively.

5. Conclusion

In this paper, we present an overall review of recent deep learning-based lane detection algorithms. There are three main contributions. First, this paper is the first overall review of recent deep learning-based lane detection algorithms, which will help readers understand how to apply deep learning to the lane detection task. Second, we introduce those algorithms in terms of CNN architectures and loss functions, which will help researchers design their own detectors. Third, four representative state-of-the-art algorithms are highlighted and evaluated on the TuSimple dataset, which will help readers understand the best performance and the optimization directions. We will clean up and release all the test code used in this work, which will further help readers move from theory to application.

Thanks to the improvement of hardware computing power and GPUs, a variety of vision-based assisted driving systems have been deployed on vehicle platforms in recent years, such as Lane Departure Warning and Lane Change Assist systems. The accuracy of lane detection has increased to 97% [79] on the TuSimple dataset. However, many underlying challenges have not been overcome. There are two main challenges: (1) the lack of generalization ability. [23,27] proposed two modules that can be easily transplanted into other CNN to obtain better performance; however, supervised learning methods cannot make appropriate adjustments to situations that have not appeared in the training set. (2) Deployment on mobile devices is difficult. Complicated, high-performing CNN are usually accompanied by millions of parameters, which is a challenge for real-time computation on mobile devices.

Along with those deficiencies, several directions may be further explored in the future. First, semantic segmentation remains
a computationally intensive algorithm for embedded deployment, and more efficient CNN architectures should be explored. Second, supervised learning requires a large amount of annotated data, and labeling data is a tedious and costly task; semi-supervised or weakly-supervised algorithms have been developed to make semi-supervised semantic segmentation possible. In addition, how to make accurate predictions under variable environments is critical; meta-learning should be a viable exploration. Finally, existing segmentation architectures rely on experience-driven design; auto-machine learning may provide new ideas for more efficient feature extractors.

Declaration of Competing Interest

We declare that we have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work was supported in part by the Important Science and Technology Project of Hainan Province under Grant ZDKJ201807, in part by the Hainan Provincial Natural Science Foundation of China under Grant 618QN309, in part by the Scientific Research Foundation Project of Haikou Laboratory, Institute of Acoustics, Chinese Academy of Sciences, and in part by the IACAS Young Elite Researcher Project under Grant QNYC201829 and Grant QNYC20174.

References

[1] J.-G. Wang, C.-J. Lin, S.-M. Chen, Applying fuzzy method to vision-based lane detection and departure warning system, Expert Syst. Appl. 37 (1) (2010) 113–126.
[2] P.-Y. Hsiao, C.-W. Yeh, S.-S. Huang, L.-C. Fu, A portable vision-based real-time lane departure warning system: day and night, IEEE Transactions on Vehicular Technology 58 (4) (2008) 2089–2094.
[3] X. Wang, Y. Wang, C. Wen, Robust lane detection based on gradient-pairs constraint, in: Proceedings of the 30th Chinese Control Conference, IEEE, 2011, pp. 3181–3185.
[4] J. Duan, Y. Zhang, B. Zheng, Lane line recognition algorithm based on threshold segmentation and continuity of lane line, in: 2016 2nd IEEE International Conference on Computer and Communications (ICCC), IEEE, 2016, pp. 680–684.
[5] Y. Chai, S.J. Wei, X.C. Li, The multi-scale Hough transform lane detection method based on the algorithm of Otsu and Canny, in: Advanced Materials Research, Vol. 1042, Trans Tech Publ, 2014, pp. 126–130.
[6] V. Gaikwad, S. Lokhande, Lane departure identification for advanced driver assistance, IEEE Transactions on Intelligent Transportation Systems 16 (2) (2014) 910–918.
[7] C. Mu, X. Ma, Lane detection based on object segmentation and piecewise fitting, TELKOMNIKA Indones. J. Electr. Eng. 12 (5) (2014) 3491–3500.
[8] D. Ding, C. Lee, K.-Y. Lee, An adaptive road ROI determination algorithm for lane detection, in: 2013 IEEE International Conference of IEEE Region 10 (TENCON 2013), IEEE, 2013, pp. 1–4.
[9] P.-C. Wu, C.-Y. Chang, C.H. Lin, Lane-mark extraction for automobiles under complex conditions, Pattern Recognit. 47 (8) (2014) 2756–2767.
[10] T. Aung, M.H. Zaw, Video based lane departure warning system using Hough transform, in: International Conference on Advances in Engineering and Technology, 2014, pp. 29–30.
[11] J. Niu, J. Lu, M. Xu, P. Lv, X. Zhao, Robust lane detection using two-stage feature extraction with curve fitting, Pattern Recognit. 59 (2016) 225–233.
[12] J.C. McCall, M.M. Trivedi, Video-based lane estimation and tracking for driver assistance: survey, system, and evaluation, IEEE Trans. Intelligent Transportation Systems (2006) 20–37.
[13] Y. Wang, D. Shen, E.K. Teoh, Lane detection using spline model, Pattern Recognit. Lett. 21 (8) (2000) 677–689.
[14] G.F. Montufar, R. Pascanu, K. Cho, Y. Bengio, On the number of linear regions of deep neural networks, in: Advances in Neural Information Processing Systems, 2014, pp. 2924–2932.
[15] M. Fu, X. Wang, H. Ma, Y. Yang, M. Wang, Multi-lanes detection based on panoramic camera, in: 11th IEEE International Conference on Control & Automation (ICCA), IEEE, 2014, pp. 655–660.
[16] Y. Li, L. Chen, H. Huang, X. Li, W. Xu, L. Zheng, J. Huang, Nighttime lane markings recognition based on Canny detection and Hough transform, in: 2016 IEEE International Conference on Real-time Computing and Robotics (RCAR), IEEE, 2016, pp. 411–415.
[17] J.-G. Kim, J.-H. Yoo, J.-C. Koo, Road and lane detection using stereo camera, in: 2018 IEEE International Conference on Big Data and Smart Computing (BigComp), IEEE, 2018, pp. 649–652.
[18] D. Mingfang, W. Junzheng, L. Nan, L. Duoyang, Shadow lane robust detection by image signal local reconstruction, International Journal of Signal Processing, Image Processing and Pattern Recognition 9 (3) (2016) 89–102.
[19] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[20] J. Kim, M. Lee, Robust lane detection based on convolutional neural network and random sample consensus, in: International Conference on Neural Information Processing, Springer, 2014, pp. 454–461.
[21] A. Gurghian, T. Koduri, S.V. Bailur, K.J. Carey, V.N. Murali, DeepLanes: end-to-end lane position estimation using deep neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2016, pp. 38–45.
[22] W. Zhang, T. Mahale, End to end video segmentation for driving: lane detection for autonomous car, arXiv:1812.05914, 2018.
[23] X. Pan, J. Shi, P. Luo, X. Wang, X. Tang, Spatial as deep: spatial CNN for traffic scene understanding, in: Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[24] M. Ghafoorian, C. Nugteren, N. Baka, O. Booij, M. Hofmann, EL-GAN: embedding loss driven generative adversarial networks for lane detection, in: Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018.
[25] G. Hinton, O. Vinyals, J. Dean, Distilling the knowledge in a neural network, arXiv:1503.02531, 2015.
[26] S. Zagoruyko, N. Komodakis, Paying more attention to attention: improving the performance of convolutional neural networks via attention transfer, arXiv:1612.03928, 2016.
[27] Y. Hou, Z. Ma, C. Liu, C.C. Loy, Learning lightweight lane detection CNNs by self-attention distillation, in: Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 1013–1021.
[28] N. Homayounfar, W.-C. Ma, J. Liang, X. Wu, J. Fan, R. Urtasun, DAGMapper: learning to map by discovering lane topology, in: Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 2911–2920.
[29] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, et al., Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.
[30] V. Nair, G.E. Hinton, Rectified linear units improve restricted Boltzmann machines, in: Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, pp. 807–814.
[31] G.E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, R.R. Salakhutdinov, Improving neural networks by preventing co-adaptation of feature detectors, arXiv:1207.0580, 2012.
[32] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.
[33] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, Computer Science (2014).
[34] C. Szegedy, S. Ioffe, V. Vanhoucke, A.A. Alemi, Inception-v4, Inception-ResNet and the impact of residual connections on learning, in: Thirty-First AAAI Conference on Artificial Intelligence, 2017.
[35] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[36] Z. Hu, J. Tang, Z. Wang, K. Zhang, L. Zhang, Q. Sun, Deep learning for image-based cancer detection and diagnosis: a survey, Pattern Recognit. 83 (2018) 134–149.
[37] R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580–587.
[38] R. Girshick, Fast R-CNN, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440–1448.
[39] S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: towards real-time object detection with region proposal networks, in: Advances in Neural Information Processing Systems, 2015, pp. 91–99.
[40] Y. Zhu, C. Zhao, J. Wang, X. Zhao, Y. Wu, H. Lu, CoupleNet: coupling global structure with local parts for object detection, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 4126–4134.
[41] Z. Li, C. Peng, G. Yu, X. Zhang, Y. Deng, J. Sun, Light-Head R-CNN: in defense of two-stage object detector, arXiv:1711.07264, 2017.
[42] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: unified, real-time object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.
[43] M. Najibi, M. Rastegari, L.S. Davis, G-CNN: an iterative grid based object detector, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2369–2377.
[44] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, A.C. Berg, SSD: single shot multibox detector, in: European Conference on Computer Vision, Springer, 2016, pp. 21–37.
[45] C.-Y. Fu, W. Liu, A. Ranga, A. Tyagi, A.C. Berg, DSSD: deconvolutional single shot detector, arXiv:1701.06659, 2017.
[46] T. Kong, F. Sun, A. Yao, H. Liu, M. Lu, Y. Chen, RON: reverse connection with objectness prior networks for object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5936–5944.
[47] X. Wu, D. Sahoo, S.C. Hoi, Recent advances in deep learning for object detection, Neurocomputing (2020).
[48] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.
[49] C. Peng, X. Zhang, G. Yu, G. Luo, J. Sun, Large kernel matters: improve semantic segmentation by global convolutional network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4353–4361.
[50] H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, Pyramid scene parsing network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2881–2890.
[51] L.-C. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking atrous convolution for semantic image segmentation, arXiv:1706.05587, 2017.
[52] H. Zhang, K. Dana, J. Shi, Z. Zhang, X. Wang, A. Tyagi, A. Agrawal, Context encoding for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7151–7160.
[53] A. Paszke, A. Chaurasia, S. Kim, E. Culurciello, ENet: a deep neural network architecture for real-time semantic segmentation, arXiv:1606.02147, 2016.
[54] E. Romera, J.M. Alvarez, L.M. Bergasa, R. Arroyo, ERFNet: efficient residual factorized convnet for real-time semantic segmentation, IEEE Transactions on Intelligent Transportation Systems 19 (1) (2017) 263–272.
[55] S.-Y. Lo, H.-M. Hang, S.-W. Chan, J.-J. Lin, Efficient dense modules of asymmetric convolution for real-time semantic segmentation, in: Proceedings of ACM Multimedia Asia, 2019, pp. 1–6.
[56] H. Zhao, Y. Zhang, S. Liu, J. Shi, C. Change Loy, D. Lin, J. Jia, PSANet: point-wise spatial attention network for scene parsing, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 267–283.
[57] Y. Guo, Y. Liu, T. Georgiou, M.S. Lew, A review of semantic segmentation using deep neural networks, Int. J. Multimed. Inf. Retr. 7 (2) (2018) 87–93.
[58] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, Y. LeCun, OverFeat: integrated recognition, localization and detection using convolutional networks, in: International Conference on Learning Representations, 2013.
[59] B. Huval, T. Wang, S. Tandon, J. Kiske, W. Song, J. Pazhayampallil, M. Andriluka, P. Rajpurkar, T. Migimatsu, R. Cheng-Yue, et al., An empirical evaluation of deep learning on highway driving, CoRR (2015).
[60] O. Elharrouss, N. Almaadeed, S. Almaadeed, Y. Akbari, Image inpainting: a review, Neural Processing Letters (2019) 1–22.
[61] S. Lee, J. Kim, J. Shin Yoon, S. Shin, O. Bailo, N. Kim, T.-H. Lee, H. Seok Hong, S.-H. Han, I. So Kweon, VPGNet: vanishing point guided network for lane and road marking detection and recognition, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 1947–1955.
[62] A. Borji, Vanishing point detection with convolutional neural networks, arXiv:1609.00967, 2016.
[63] Y. Huang, S. Chen, Y. Chen, Z. Jian, N. Zheng, Spatial-temporal based lane detection using deep learning, in: IFIP International Conference on Artificial Intelligence Applications and Innovations, Springer, 2018, pp. 143–154.
[64] S. Chougule, N. Koznek, A. Ismail, G. Adam, V. Narayan, M. Schulze, Reliable multilane detection and classification by utilizing CNN as a regression network, in: Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018.
[65] K.-Y. Chiu, S.-F. Lin, Lane detection using color-based segmentation, in: IEEE Proceedings, Intelligent Vehicles Symposium, 2005, IEEE, 2005, pp. 706–711.
[66] L. Riera, K. Ozcan, J. Merickel, M. Rizzo, S. Sarkar, A. Sharma, Driver behavior analysis using lane departure detection under challenging conditions, arXiv:1906.00093, 2019.
[67] K. He, G. Gkioxari, P. Dollar, R. Girshick, Mask R-CNN, IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017.
[68] S. Chougule, A. Ismail, A. Soni, N. Kozonek, V. Narayan, M. Schulze, An efficient encoder-decoder CNN architecture for reliable multilane detection in real time, in: 2018 IEEE Intelligent Vehicles Symposium (IV), IEEE, 2018, pp. 1444–1451.
[69] F. Pizzati, F. García, Enhanced free space detection in multiple lanes based on single CNN with scene identification, in: 2019 IEEE Intelligent Vehicles Symposium (IV), IEEE, 2019, pp. 2536–2541.
[70] F. Yu, V. Koltun, Multi-scale context aggregation by dilated convolutions, in: International Conference on Learning Representations, 2015.
[71] P. Wang, P. Chen, Y. Yuan, D. Liu, Z. Huang, X. Hou, G. Cottrell, Understanding convolution for semantic segmentation, in: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, 2018, pp. 1451–1460.
[72] P.-R. Chen, S.-Y. Lo, H.-M. Hang, S.-W. Chan, J.-J. Lin, Efficient road lane marking detection with deep learning, in: 2018 IEEE 23rd International Conference on Digital Signal Processing (DSP), IEEE, 2018, pp. 1–5.
[73] S.-Y. Lo, H.-M. Hang, S.-W. Chan, J.-J. Lin, Multi-class lane semantic segmentation using efficient convolutional networks, in: 2019 IEEE 21st International Workshop on Multimedia Signal Processing (MMSP), IEEE, 2019, pp. 1–6.
[74] A. Buades, B. Coll, J.-M. Morel, A non-local algorithm for image denoising, in: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), Vol. 2, IEEE, 2005, pp. 60–65.
[75] X. Wang, R. Girshick, A. Gupta, K. He, Non-local neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7794–7803.
[76] W. Li, F. Qu, J. Liu, F. Sun, Y. Wang, A lane detection network based on IBN and attention, Multimed. Tools Appl. (2019) 1–14.
[77] J. Zhang, Y. Xu, B. Ni, Z. Duan, Geometric constrained joint lane segmentation and lane boundary detection, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 486–502.
[78] V. John, N.M. Karunakaran, C. Guo, K. Kidono, S. Mita, Free space, visible and missing lane marker estimation using the PSINet and extra trees regression, in: 2018 24th International Conference on Pattern Recognition (ICPR), IEEE, 2018, pp. 189–194.