Report
This code implements a simple training and testing pipeline for a semantic segmentation model, using the PyTorch deep learning framework in Python together with several related libraries. Here is a summary of the key points of the code.
Libraries used
• from dataset import camvidLoader: Import a custom CamVid dataset loader.
• import numpy as np: Import the NumPy library to handle array operations.
• from dataset import from_label_to_rgb: Import a function that converts label maps to
RGB images.
• from skimage.io import imsave: Import a function for saving images.
• import os: Import an operating system-related library for file path operations.
• from unet import UNet: Import a custom U-Net model.
• import torch: Import the PyTorch deep learning framework.
• from torch.utils.data import DataLoader: Import PyTorch's utility for batched data
loading.
Actions taken
This code implements a training, validation, and testing process for a semantic
segmentation model based on U-Net. By iterating over the training data, the model
gradually learns to improve its semantic segmentation of images. At the end of each
epoch, the model is evaluated on a validation dataset to gauge its generalization
ability. Finally, inference is performed on the test dataset, the performance of the
model is evaluated, and the resulting images are saved.
import numpy as np
import os
import torch
from torch.utils.data import DataLoader
from skimage.io import imsave
from dataset import camvidLoader, from_label_to_rgb
from unet import UNet

# Configuration
data_root = 'CamVid/sunny'
num_classes = 14
labels = ['Sky', 'Building', 'Pole', 'Road', 'LaneMarking',
          'Bicyclist', 'Others']
batch_size = 4
num_workers = 0
lr = 5e-6          # the earlier value of 0.0002 is overridden by this one
epochs = 300
# The original listing only shows a commented-out 'cpu'; selecting CUDA when
# available is an assumption.
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Output directories
train_result_dir = 'train_internal_data//'
result_dir = "output//"
weight_dir = "weights//"
if not os.path.exists(train_result_dir):
    os.makedirs(train_result_dir)
if not os.path.exists(result_dir):
    os.makedirs(result_dir)
if not os.path.exists(weight_dir):
    os.makedirs(weight_dir)
# Datasets and data loaders. The camvidLoader arguments beyond root were
# truncated in the original listing; the split keywords below are assumptions.
train_dataset = camvidLoader(root=data_root, split='train')
train_loader = DataLoader(train_dataset,
                          num_workers=num_workers, batch_size=batch_size,
                          shuffle=True, drop_last=True)
val_dataset = camvidLoader(root=data_root, split='val')
val_loader = DataLoader(val_dataset,
                        num_workers=num_workers, batch_size=batch_size,
                        shuffle=True, drop_last=True)
test_dataset = camvidLoader(root=data_root, split='test')
# The original passed val_dataset to the test loader, which looks like a bug;
# test_dataset is used here instead.
test_loader = DataLoader(test_dataset,
                         num_workers=num_workers, batch_size=batch_size,
                         shuffle=True, drop_last=True)
# Model, loss, and optimizer. The UNet constructor call is missing from the
# original listing; its argument name is an assumption.
unet = UNet(num_classes=num_classes)
unet = unet.to(device)
loss_func = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(unet.parameters(), lr=lr)
# Checkpoint path, inferred from the torch.load call in the test stage.
model_location = weight_dir + "nearest.pt"

for epoch in range(epochs):
    # ---- training ----
    unet.train()
    train_loss = 0
    count = 0
    # The dataset is assumed to yield (image, label, filename) triples.
    for idx_batch, (imagergb, labelmask, filename) in enumerate(train_loader):
        optimizer.zero_grad()
        x = imagergb.to(device)      # input image batch
        y_ = labelmask.to(device)    # ground-truth class index map
        y = unet(x)                  # per-pixel class logits
        loss = loss_func(y, y_)
        loss.backward()
        optimizer.step()
        if idx_batch % 10 == 0:
            print("epoch = " + str(epoch) + ", batch = " + str(idx_batch))
        train_loss += loss.item()
        count += 1
    train_loss /= count
    # ---- validation ----
    unet.eval()
    val_loss = 0
    count = 0
    for idx_batch, (imagergb, labelmask, filename) in enumerate(val_loader):
        with torch.no_grad():
            x = imagergb.to(device)
            y_ = labelmask.to(device)
            y = unet(x)
            loss = loss_func(y, y_)
            val_loss += loss.item()
            count += 1
            if idx_batch % 5 == 0:
                for idx in range(0, y.shape[0]):
                    name = filename[idx]
                    # Predicted class index per pixel
                    max_index = torch.argmax(y[idx], dim=0).cpu().int().numpy()
                    label_rgb_vis = from_label_to_rgb(max_index)
                    gt_correct_format = y_[idx].cpu().int().numpy()
                    gt_vis = from_label_to_rgb(gt_correct_format)
                    # The next two expressions were truncated in the original
                    # listing; a side-by-side layout and 8-bit scaling are
                    # assumptions.
                    result_vis = np.concatenate((gt_vis, label_rgb_vis), axis=1)
                    result_vis_for_saving = (result_vis * 255).astype(np.uint8)
                    # The remaining format arguments were also truncated; batch
                    # index, sample index and file name are assumptions.
                    imsave(train_result_dir +
                           "valdation_seg_result_{}_{}_{}_{}.png".format(
                               epoch, idx_batch, idx, name),
                           result_vis_for_saving)
    val_loss /= count
    # Save the latest model weights at the end of every epoch.
    torch.save(unet, model_location)
# ---- testing ----
# Reload the last saved checkpoint and evaluate on the test set.
unet = torch.load(weight_dir + "nearest.pt")
unet = unet.to(device)
unet.eval()
test_loss = 0
count = 0
for idx_batch, (imagergb, labelmask, filename) in enumerate(test_loader):
    with torch.no_grad():
        x = imagergb.to(device)
        y_ = labelmask.to(device)
        y = unet(x)
        loss = loss_func(y, y_)
        test_loss += loss.item()
        count += 1
        for idx in range(0, y.shape[0]):
            name = filename[idx]
            max_index = torch.argmax(y[idx], dim=0).cpu().int().numpy()
            label_rgb_vis = from_label_to_rgb(max_index)
            gt_correct_format = y_[idx].cpu().int().numpy()
            gt_vis = from_label_to_rgb(gt_correct_format)
            # 8-bit scaling is an assumption; the original expression was truncated.
            result_vis_for_saving = (label_rgb_vis * 255).astype(np.uint8)
            imsave(result_dir + "%s.png" % name, result_vis_for_saving)
test_loss /= count
print("test loss = " + str(test_loss))
The U-Net model is a deep learning architecture commonly used for image
segmentation tasks. It is characterized by a U-shaped network structure composed of a
symmetrical encoder and decoder, which allows it to capture features at different
scales and achieve fine segmentation results. The model was originally developed for
biomedical image segmentation tasks, such as cell and organ segmentation, where it
delivered accurate segmentation on relatively small datasets and simple scenes while
handling subtle edges and details well. With further improvement and extension,
U-Net has since been applied successfully to a much wider range of segmentation
tasks, including natural images and remote sensing images, showing strong
performance and broad applicability.
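As an illustration of this encoder-decoder idea, the sketch below shows a minimal U-Net-style module in PyTorch. The two-level depth and the channel widths are illustrative assumptions, not the configuration of the custom UNet imported in the code above.

import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU: the basic U-Net building block
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

class TinyUNet(nn.Module):
    def __init__(self, num_classes=14):
        super().__init__()
        self.enc1 = double_conv(3, 32)                      # fine-scale encoder
        self.enc2 = double_conv(32, 64)                     # coarse-scale encoder
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)   # decoder upsampling
        self.dec1 = double_conv(64, 32)                     # 64 = 32 upsampled + 32 skip
        self.head = nn.Conv2d(32, num_classes, 1)           # per-pixel class scores

    def forward(self, x):
        e1 = self.enc1(x)                           # fine-scale features
        e2 = self.enc2(self.pool(e1))               # coarse-scale features
        d1 = self.up(e2)                            # back to fine resolution
        d1 = self.dec1(torch.cat([d1, e1], dim=1))  # skip connection from the encoder
        return self.head(d1)                        # logits of shape (N, num_classes, H, W)

For example, an input batch of shape (4, 3, 360, 480) produces logits of shape (4, 14, 360, 480), one score map per class.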
Street View image segmentation is a challenging task with the following complexities:
1. Complex scene structure: Street View images usually contain a variety of complex
scene structures, such as buildings, roads, pedestrians, and vehicles, and these objects
may be occluded, overlapping, or irregularly shaped, which increases the difficulty for
segmentation algorithms.
2. Lighting and weather changes: Street View imagery is affected by weather and
lighting conditions, and scenes may be bright, dark, or overcast. This variation alters
the color, texture, and contrast of the image, making it difficult for segmentation
algorithms to accurately distinguish between different objects and areas.
3. Perspective and scale changes: Street View images are often taken from different
perspectives and scales, including close-up, long-range, and various angles, which
changes the shape, size, and proportion of objects and poses challenges for
segmentation algorithms.
4. Category richness and class imbalance: Street View images cover many object
categories, such as buildings, trees, sky, and roads, and their number and proportion
may be imbalanced. This biases the model's learning and recognition of different
categories, affecting the accuracy and robustness of the segmentation.
For datasets captured under bright or cloudy conditions, failure cases may look like this:
1. Uneven lighting: In bright or cloudy conditions, lighting in the image may be
strongly uneven, resulting in overexposure or pronounced shadows in some areas and
making it difficult for the segmentation algorithm to accurately extract object
boundaries and detail information.
2. Color distortion: Changing lighting conditions may distort colors in the image, so
that the colors of different areas deviate from their true appearance; this interferes
with the model's use of color features and reduces segmentation accuracy.
3. Blurred details: Cloudy weather may blur details in some areas of the image,
especially in distant or dimly lit places, making it difficult for the model to accurately
distinguish object boundaries and texture information and thus affecting the accuracy
of segmentation.
Results
The test loss value is 0.555295467376709. The loss function is commonly used to
measure the discrepancy between the model's predictions and the actual labels, with a
lower loss indicating better model performance.
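Since the training script above uses torch.nn.CrossEntropyLoss, the short sketch below illustrates how such a loss value is obtained from per-pixel logits and integer labels; the tensor shapes and random values are purely illustrative.

import torch

# Hypothetical logits for a batch of 2 images, 14 classes, 4x4 pixels,
# and a matching map of ground-truth class indices.
logits = torch.randn(2, 14, 4, 4)
target = torch.randint(0, 14, (2, 4, 4))

loss = torch.nn.CrossEntropyLoss()(logits, target)
print(loss.item())  # lower values mean the predictions agree better with the labels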
The recall rate of the model is 0.9775319386082872. This metric measures the
proportion of actual positive samples correctly identified by the model out of all
actual positive samples.
The test mean Intersection over Union (mIoU) achieved by the model is
0.762967828608658. mIoU is a crucial metric in image segmentation tasks,
quantifying the average overlap ratio between the predicted regions and the actual
regions.
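As a sketch of how these two metrics can be computed for a segmentation output, the functions below derive per-class recall and mIoU from a confusion matrix of integer class maps. This illustrates one common definition and is not the exact evaluation code behind the reported numbers.

import numpy as np

def confusion_matrix(pred, gt, num_classes=14):
    # pred and gt are integer class-index maps of the same shape
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for p, g in zip(pred.ravel(), gt.ravel()):
        cm[g, p] += 1
    return cm

def mean_recall(cm):
    # Recall per class = TP / (TP + FN): the fraction of each class's
    # ground-truth pixels that the model labels correctly
    tp = np.diag(cm).astype(float)
    fn = cm.sum(axis=1) - tp
    denom = tp + fn
    recall = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)
    return np.nanmean(recall)

def mean_iou(cm):
    # IoU per class = TP / (TP + FP + FN), averaged over classes that appear
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    denom = tp + fp + fn
    iou = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)
    return np.nanmean(iou)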
Conclusions
The F1 score of 0.983 confirms the model's balanced capability in both precision and
recall, which is crucial for applications where both type I and type II errors need to be
minimized. Furthermore, the mIoU score of 0.763 in an image segmentation task
underscores the model's efficacy in aligning the predicted regions with the actual
ones, which is pivotal for visual data interpretation. These metrics collectively suggest
that the model performs robustly, with high accuracy and reliability in recognizing
and predicting the correct labels for its specified task. However, ongoing
improvements and adjustments may still be needed to address any potential
underfitting indicated by the loss value and to enhance the model's ability to
generalize to unseen data.
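For reference, the F1 score is the harmonic mean of precision and recall, which is high only when both are high. A minimal computation is sketched below; the report does not state precision explicitly, but with the reported recall of about 0.978, an F1 of 0.983 would imply a precision of roughly 0.989.

def f1_score(precision, recall):
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(0.989, 0.9775319386082872), 3))  # approximately 0.983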