An efficient encode-decode deep learning network for lane marking instance segmentation

Received Oct 18, 2020 Revised May 25, 2021 Accepted Jun 12, 2021

Nowadays, advanced driver assistance systems (ADAS) have incorporated a range of progressive and essential features. One of the most fundamental and significant ADAS features is lane marking detection, which enables a vehicle to keep itself within a particular road lane. Lane markings have traditionally been detected using highly specialized, handcrafted features and distinct post-processing approaches, leading to less accurate, less efficient, and computationally expensive frameworks under different environmental conditions. Hence, this research proposes a simple encode-decode deep learning approach for detecting lane markings more accurately and efficiently under distinct environmental effects such as different daytimes, multiple lanes, different traffic conditions, and good and medium weather conditions. The proposed model is based on a simple encode-decode SegNet framework incorporating the VGG16 architecture, trained using the discriminative and cross-entropy losses to obtain a more accurate instance segmentation of the lane markings. The framework has been trained and tested on a large public dataset named Tusimple, which includes around 3.6K training and 2.7K testing image frames covering different environmental conditions. The model achieved the highest accuracy of 96.61%, an F1 score of 96.34%, a precision of 98.91%, and a recall of 93.89%. It also obtained the lowest false positive and false negative values, 3.125% and 1.259% respectively, which transcend some of the previous research. It is expected to contribute significantly to the field of lane marking detection using deep neural networks.


INTRODUCTION
For many decades, the concern of traffic safety has received considerable attention. To minimize the incidence of vehicle accidents and improve road safety, modern vehicles have integrated an increasing range of advanced driver assistance features, such as lane departure warning, lane-keeping, and automatic emergency braking [1]. A key requirement for these technologies is identifying lanes in challenging situations, and many researchers have devoted their efforts to this emerging area in recent times [2]. One of the most successful inventions in road scene analysis for autonomous vehicles is the detection of lane markings [3]. It also becomes easier to avoid sudden lane changes and collisions if the vehicle knows where the lanes are located. The significance of locating lane markings lies not only in lane-keeping effectiveness but also in the traffic rules that the lane markings convey on the streets [4].

RESEARCH METHOD
For lane marking detection, the proposed technique uses a simple encode-decode deep learning approach based on the SegNet architecture, which is very efficient for semantic segmentation [18]. The proposed technique's layout is depicted in Figure 1.

Input dataset and pre-processing
The widely used Tusimple dataset has been used for training the proposed method since it contains a vast number of image frames with proper annotations. It has around 3.6K image frames for training and around 2.7K completely unseen image frames for testing. A notable property of the Tusimple dataset is that it annotates the full lane boundary rather than only the lane markings. The dimension of the images is 720×1280. The dataset provides three JSON files, which include the paths of the clips containing the 3626 training image frames, the positions of the lanes, and the heights of the corresponding lanes as lists. After extracting the lane features, a line has been fitted through all the relevant points on each lane. To create the binary and instance label images, every lane pixel has been set to 1 and every pixel that does not belong to a lane to 0. Finally, the image frames have been resized to 224×224 without random cropping to keep the input dimensions consistent and reduce the computational complexity. The outputs of the data processing are the original image, the binary label, and the instance label.
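The labelling step above can be sketched as follows. This is a minimal, illustrative version: the function name `make_labels`, the 5-pixel lane thickness, and the nearest-neighbour resize are assumptions, not details taken from the paper; the input format (per-lane x-coordinates plus shared y-samples, with -2/-1 marking missing points as non-negative checks) mirrors the Tusimple JSON convention.

```python
import numpy as np

def make_labels(lanes, h_samples, height=720, width=1280, out=224):
    """Rasterise Tusimple-style lane annotations into a binary mask
    (lane = 1) and an instance mask (one integer id per lane).
    Names and details are illustrative, not the paper's exact code."""
    binary = np.zeros((height, width), dtype=np.uint8)
    instance = np.zeros((height, width), dtype=np.uint8)
    for lane_id, xs in enumerate(lanes, start=1):
        pts = [(x, y) for x, y in zip(xs, h_samples) if x >= 0]
        # Connect consecutive annotated points with an interpolated line
        # and draw it with a small horizontal thickness (assumed 5 px).
        for (x0, y0), (x1, y1) in zip(pts[:-1], pts[1:]):
            n = max(abs(x1 - x0), abs(y1 - y0), 1)
            for t in range(n + 1):
                x = int(round(x0 + (x1 - x0) * t / n))
                y = int(round(y0 + (y1 - y0) * t / n))
                binary[y, max(0, x - 2):x + 3] = 1          # lane pixels -> 1
                instance[y, max(0, x - 2):x + 3] = lane_id  # one id per lane
    # Nearest-neighbour resize to 224x224 without random cropping.
    ys = np.arange(out) * height // out
    xs_idx = np.arange(out) * width // out
    return binary[np.ix_(ys, xs_idx)], instance[np.ix_(ys, xs_idx)]
```

The two returned masks correspond to the binary label and the instance label fed to the network together with the resized original image.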

Model architecture
The proposed model is a combination of an encode-decode SegNet deep learning model and a pre-trained VGG16 model. The encoding section of the SegNet architecture has been incorporated with the convolutional layers of the pre-trained VGG16 model to extract more features from the particular dataset. The proposed model architecture is shown in Figure 2. There are two sections in the proposed model: the encoding section, which extracts the lane marking information from the dataset, and the decoding section, which reconstructs the information from the encoding section. Thirteen convolutional layers have been used in the encoding section of the proposed approach. Every convolutional layer contains a two-dimensional convolution with the ReLU activation function, followed by a batch normalization layer so that the model trains on analogous data at higher speed. The model has been trained with a kernel size of 3×3, stride 1, and padding 1. Also, thirteen additional convolutional layers have been adopted from the pre-trained VGG16 model so that the architecture can extract more lane features and provide an efficient result. Furthermore, six max-pooling layers with a kernel of 2×2 and stride 2 have been used in the architecture. In the decoding stage, sixteen convolutional layers with five max-unpooling layers have been applied to decode the extracted lane information from the encoding section. Finally, two additional convolutional layers have been applied to produce the predicted binary segmentation and instance-segmented lane images.
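A back-of-envelope sketch of the spatial arithmetic may help here. Assuming the standard VGG16 grouping of the 13 convolutions into five pooled blocks (the additional pooling stages described above would halve the feature map further in exactly the same way), the 3×3/stride-1/padding-1 convolutions preserve the spatial size and each 2×2/stride-2 pooling halves it:

```python
def conv_out(size, kernel=3, stride=1, pad=1):
    # Standard convolution output-size formula:
    # floor((size + 2*pad - kernel) / stride) + 1
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    return (size - kernel) // stride + 1

def encoder_feature_size(in_size=224, blocks=(2, 2, 3, 3, 3)):
    """Trace the spatial size through a VGG16-style encoder:
    13 convolutions (3x3, stride 1, pad 1) in five blocks, each
    block followed by one 2x2/stride-2 max pooling (assumed layout)."""
    size = in_size
    for n_convs in blocks:
        for _ in range(n_convs):
            size = conv_out(size)   # 3x3/s1/p1 preserves the spatial size
        size = pool_out(size)       # each pooling halves it
    return size
```

With a 224×224 input this yields a 7×7 deepest encoder map; the max-unpooling layers in the decoder reverse the pooling steps to recover the input resolution.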

Loss measurement
The loss has been calculated for backpropagation, to update the weights accordingly and extract the lane information more accurately. Two types of losses have been used for the two segmented outputs: cross-entropy loss and discriminative loss. As the binary segmentation images contain the information as 0 and 1, the cross-entropy loss has been measured according to (1) [21].
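Since equation (1) is not reproduced here, the following is a minimal sketch of the standard pixel-wise binary cross-entropy it refers to (the clipping epsilon is an assumption for numerical stability):

```python
import numpy as np

def binary_cross_entropy(p, y, eps=1e-7):
    """Pixel-wise binary cross-entropy, averaged over all pixels.
    p: predicted lane probabilities in (0, 1); y: 0/1 ground-truth mask."""
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))
```

The loss is low when lane pixels receive high probability and background pixels low probability, and grows as predictions move toward the wrong class.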
As the instance segmentation establishes the exact position of each lane, the discriminative loss [21] has been used for this output. Under this loss, pixels with the same label are pulled to nearby positions, and pixels with different labels are pushed to distant positions. Therefore, pixels from the same lane fall into the same cluster, and pixels from different lanes fall into their respective different clusters. The whole process is carried out through three different terms for instance separation, neighbourhood, and regularization. The separation term extends the distance from one lane cluster to another when the cluster centres are closer than the threshold value δ_sep. The neighbourhood term reduces the distance, keeping the lane pixels in one particular cluster, when these pixels are farther from the cluster centre than the threshold value δ_neighb. Besides, the regularization term keeps the cluster centres close to the origin. Decisively, the discriminative loss function can be calculated by (2), (3).
where N_c is the number of lane clusters, N_e is the number of elements in a lane cluster, M is the mean of the instances in the cluster, and x_i are the instances. The cumulative value of the cross-entropy and discriminative losses has been calculated as the total loss of the network. Backpropagation has been operated through this cumulative loss to update the weights of the network and obtain more accurate binary and instance segmentation outputs.
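The three terms described above can be sketched as follows. This is an illustrative NumPy version of the discriminative loss, not the paper's implementation; the threshold values and the weights `alpha`, `beta`, `gamma` are assumed defaults:

```python
import numpy as np

def discriminative_loss(embeddings, labels, delta_neighb=0.5, delta_sep=1.5,
                        alpha=1.0, beta=1.0, gamma=0.001):
    """Sketch of the discriminative instance loss: a neighbourhood (pull)
    term drawing pixel embeddings toward their lane-cluster mean, a
    separation (push) term spreading different cluster means apart, and a
    small regularisation term on the means (assumed formulation)."""
    ids = [i for i in np.unique(labels) if i != 0]  # 0 = background
    means = [embeddings[labels == i].mean(axis=0) for i in ids]
    # Neighbourhood term: penalise pixels farther than delta_neighb
    # from their own cluster mean.
    pull = np.mean([np.mean(np.maximum(
        np.linalg.norm(embeddings[labels == i] - m, axis=1) - delta_neighb,
        0.0) ** 2) for i, m in zip(ids, means)])
    # Separation term: penalise pairs of cluster means closer than delta_sep.
    push = 0.0
    if len(means) > 1:
        pairs = [(a, b) for ai, a in enumerate(means) for b in means[ai + 1:]]
        push = np.mean([np.maximum(delta_sep - np.linalg.norm(a - b), 0.0) ** 2
                        for a, b in pairs])
    # Regularisation term: keep cluster means close to the origin.
    reg = np.mean([np.linalg.norm(m) for m in means])
    return float(alpha * pull + beta * push + gamma * reg)
```

Well-separated lane clusters incur almost no loss, while overlapping clusters are penalised by the separation term, which is what drives pixels of different lanes into distinct clusters.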

Interfacing
The output of the deep learning model is the accumulation of pixels for every lane on the predicted binary and instance segmentation images. The eventual task is to interface, or fit, the predicted lane pixels with the input images. Therefore, density-based spatial clustering of applications with noise (DBSCAN) has been used to interface the predicted lane pixels with the input images. DBSCAN works more efficiently than other clustering techniques such as K-means in the case of arbitrary and noisy clusters [22]. As the positions of the lanes are close to each other and arbitrary in shape, both straight and curved, DBSCAN increases the efficiency of interfacing the lane pixels. The closest-distance (eps) parameter of DBSCAN has been set to 0.05 for considering pixels part of the same lane. If the distance to a lane point is less than or equal to the mentioned eps value, the point is considered to be in the same lane; otherwise, the point is considered to belong to a different cluster. The process continues according to the predicted information until all the points on the lanes have converged.
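The clustering step can be illustrated with a minimal self-contained DBSCAN (in practice a library implementation such as scikit-learn's would be used). The `eps=0.05` default mirrors the value mentioned above; `min_pts=3` is an assumption:

```python
import numpy as np

def dbscan(points, eps=0.05, min_pts=3):
    """Minimal DBSCAN sketch for grouping predicted lane pixels into
    lanes. Returns one integer label per point; -1 marks noise."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    labels = np.full(n, -1)
    # Pairwise distances and the eps-neighbourhood of every point.
    dist = np.linalg.norm(points[:, None] - points[None, :], axis=2)
    neighbours = [np.where(dist[i] <= eps)[0] for i in range(n)]
    cluster = 0
    for i in range(n):
        # Only unlabelled core points (enough neighbours) seed a cluster.
        if labels[i] != -1 or len(neighbours[i]) < min_pts:
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:                     # expand the cluster outwards
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbours[j]) >= min_pts:
                    queue.extend(neighbours[j])
    return labels
```

Each resulting cluster corresponds to one lane, and isolated mispredicted pixels are discarded as noise, which is why DBSCAN copes well with both straight and curved lanes.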

RESULTS AND DISCUSSION
The processed dataset was fed into the developed deep learning method for forecasting lane markings on roads, and the result was assessed in terms of accuracy. Since accuracy is not the only reliable performance metric for evaluating research performance, other metrics such as false positive, false negative, and F1 score can also provide a reliable evaluation of the work [23]-[28]. The performance parameter equations are stated in (4)-(7). The model has been trained for 100 epochs with a 224×224 image size and a batch size of 4. Moreover, in the proposed method, PReLU has been investigated as an activation function. The model has been compiled with the Adam optimizer with a learning rate of 0.0001, and the strides, valid padding, and losses mentioned in the research methodology. The proposed model has been trained and tested on a GTX 1080 Ti under a Linux operating system, version 18.04. Figure 3 indicates the evaluation outcome of the proposed model for lane marking detection. The model has achieved the highest 96.53% accuracy, 96.11% F1 score, 97.02% precision, and 93.69% recall. It has also obtained the lowest false positive and false negative values, 3.421% and 1.369%, respectively. The proposed approach was also examined at various epochs, such as 20, 40, 60, 80, and 100, as displayed in Table 1. Table 1 shows that the performance of the proposed architecture has increased gradually and reached its highest result at 100 epochs. The calculation of loss is also an essential issue for deep learning model evaluation, as a minimum loss indicates an efficient and optimized model. Figure 4 shows the model's total loss, reflecting the gradual decrease in losses during the training period.
The lowest total loss was recorded at 0.0279, indicating the proposed model's efficiency. The proposed method's efficiency is also compared to some of the more recent lane marking detection approaches, as seen in Table 2. The proposed method outperforms the other deep neural network-based lane marking recognition models in Table 2. Compared to the other tested methods, the proposed system has the best accuracy, recall, precision, and F1 score. In addition, compared to other deep learning methods in the field of lane marking detection, the suggested approach has the lowest false positive and false negative values. The proposed model is more effective for detecting lane markings than others due to its superior evaluation outcome compared to current deep learning techniques. The proposed method uses a straightforward encode-decode deep neural network structure with fewer weights, implying lower computational complexity. Furthermore, since DBSCAN was used to interconnect the predicted segmented image pixels rather than a separate convolutional network, it ensures less computational complexity and consistency with straight and curved lanes. The model was trained and validated on the Tusimple dataset, which includes image frames of different critical environmental factors such as straight lanes, curved lanes, shadow, distinct lighting, and so on. Figure 5 depicts some sample input-output images of lane markings for visualising the result of the proposed system. From left to right, Figure 5 includes the original image, the predicted image, the corresponding colour image, and the interface on the original image, respectively. Figure 5 shows that the model can identify road lane markings more precisely and effectively, as its accuracy and other evaluation metrics are higher than those of other existing technologies. It is also expected that the proposed method will have a substantial effect on lane marking identification.
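Equations (4)-(7) are not reproduced here; the metrics reported above are assumed to follow the standard confusion-matrix definitions, which can be sketched as:

```python
def lane_metrics(tp, fp, fn, tn):
    """Standard accuracy, precision, recall, and F1 score from
    confusion-matrix counts (assumed to match equations (4)-(7))."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)   # how many predicted lane pixels are correct
    recall = tp / (tp + fn)      # how many true lane pixels are found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```

The high precision and lower recall reported for the model indicate that predicted lane pixels are rarely wrong, while a small fraction of true lane pixels is missed.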

CONCLUSION
A simple encode-decode SegNet framework incorporating VGG16 has been proposed to detect lane markings under distinct environmental effects. The open-source Tusimple dataset, which includes distinct, intricate environmental conditions, has been used for training and testing the framework. The proposed system achieved higher accuracy, F1 score, precision, and recall than previous research with lower computational complexity. It has also achieved minimum loss, false positive, and false negative values during the training process. As a result, the proposed approach is a more efficient algorithm for detecting lane markings that outperforms current methods in terms of the listed performance parameters. The outcome can be improved further by using a larger dataset that includes more complex environmental variables, so that the algorithm can learn more from the larger dataset.