paper_hal

Download as pdf or txt
Download as pdf or txt
You are on page 1of 13

Automated Ship Detection and Characterization in

Sentinel-2 Images: A Comprehensive Approach


Bou-Laouz Moujahid, Vadaine Rodolphe, Hajduch Guillaume, Ronan Fablet

To cite this version:


Bou-Laouz Moujahid, Vadaine Rodolphe, Hajduch Guillaume, Ronan Fablet. Automated Ship Detec-
tion and Characterization in Sentinel-2 Images: A Comprehensive Approach. 2023. �hal-04359761�

HAL Id: hal-04359761


https://hal.science/hal-04359761v1
Preprint submitted on 21 Dec 2023

HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est


archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents
entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non,
lished or not. The documents may come from émanant des établissements d’enseignement et de
teaching and research institutions in France or recherche français ou étrangers, des laboratoires
abroad, or from public or private research centers. publics ou privés.

Distributed under a Creative Commons Attribution 4.0 International License


1

Automated Ship Detection and Characterization in


Sentinel-2 Images: A Comprehensive Approach
Bou-Laouz Moujahid1 , Vadaine Rodolphe2 , Hajduch Guillaume2 , and Fablet Ronan1
1
IMT Atlantique, Plouzané France
2
CLS, Collecte Localisation Satellites, Plouzané France

Abstract—The automatic detection and characterization of resolution (VHR) (0.3 m resolution) [8] and high-resolution
ships in optical remote sensing images is a key challenge for (HR) (0.4 − 2 m resolution) [9] imagery. However, these
maritime surveillance applications. This paper presents an satellites images are typically accessible exclusively through
automated system specically designed for ship detection in
medium-resolution Sentinel-2 images. The proposed approach tasking modes, resulting in high acquisition costs and a limited
relies on a deep learning model trained on a dataset comprising interest for continuous monitoring and surveillance tasks. By
over 6000 annotated Sentinel-2 images. It achieves a detection contrast, medium-resolution (MR) optical imagery (typically,
rate of 93%, with an average of 2.1 to 3.9 false alarms per a 10 m resolution), as deployed on Sentinel-2, delivers freely
Sentinel-2 image. Besides the detection task, it also addresses the available remote sensing images for numerous locations on
estimation of ship lengths as well as ship headings. It yields a
mean error of 15.36m ± 19.57m for ship lengths, and estimates Earth with a revisit time ranging between 5 and 10 days. This
ship headings with an accuracy of 93%. This contribution seems particularly adapted to maritime surveillance tasks. Yet,
signicantly enhances the performance of ship detection and only few studies [7] have addressed the automated detection
characterization systems in optical remote sensing imagery. and characterization of ships in MR optical satellite imagery.
Here, we address these challenging issues and present
Index Terms: Deep Neural Network, Sentinel-2 Images, Ship
Detection, Ship Characterization a deep learning approach. We rst collect a representative
groundtruthed dataset comprising more than 12000 ship ex-
amplars from 6000 Sentinel-2 images. Our multi-task deep
I. I NTRODUCTION learning scheme relies on a Faster R-CNN. Our numerical
Spaceborne remote sensing imagery conveys invaluable in- experiments explore data splitting strategies during the training
formation for the monitoring of maritime activities, especially phase to account for class imbalance. We also asses the impact
the maritime trafc. This is of critical importance for both of the neural backbone of the Faster R-CNN architecture.
surveillance and defense issues [1]. We may cite among Overall, we report state-of-the-art performance with a detec-
others the monitoring of maritime borders, the identication tion rate above 93% including for small ships with a ship
of illegal maritime behaviours, the ght against illegal shing length below 20 m. These results support the relevance MR
and smuggling activities, search and rescue operations... optical imagery for maritime monitoring besides HR optical
We can distinguish two main categories of satellite imagery and SAR imagery.
for maritime surveillance topics: Synthetic Aperture Radar This paper is organized as follows. In Section II, we provide
(SAR) imagery [2] and optical imagery [3]. Currently, SAR an overview of related work and analyze their drawbacks. The
imagery is more widespread due to its applicability both details of the proposed approach are presented in Section III.
day and night and in all weather conditions (e.g., cloudy Section IV substantiates the relevance of our proposed meth-
conditions). Ship detection in SAR images [4] relies on ods through experiments. Discussion is covered in Section V,
relatively simple low-level image processing schemes for and the conclusion is presented in Section VI.
ships which possess a metallic structure highly responsive to
radar signals. This may lead to detection ambiguities along the
II. R ELATED WORK AND M OTIVATION
seashore, where other metallic structures (buoys, pontoons,
etc.) cause high-amplitude patterns in SAR images [5], as Recent review papers [7], [10] provide surveys on ship
well as in the case of sea clutter [6]. Additionally, inatable detection and characterization in optical satellite images.
and wooden boats like Zodiacs and plastic boats can hardly They compile over a hundred research articles dating back
be identied in SAR images. By contrast, optical imagery to 1978, providing a comprehensive overview of the subject.
offers a practical alternative for ship detection, increasing not The majority of these studies have investigated ship detection
only detection capabilities in complex environments but also using High-Resolution (HR) and Very High-Resolution
providing the ability to detect all types of ships. (VHR) images, employing deep learning approaches such as
Faster R-CNN [11], YOLO [12] and U-net [13]. However, our
In recent years, ship detection from optical imagery has seen specic challenge is different from working with high or very
increased research effort, with a focus on deep learning meth- high-resolution (<5 m) images. In these cases, identifying
ods [7]. Most studies focus on ship detection from very-high- ships among other objects is relatively straightforward due to
2

their visually distinct features. As a result, the challenge of fully conclusive compared with the performance reported for
ship detection in high-resolution images primarily revolves HR and SAR imagery.
around maximizing the detection rate and closely aligns The objective of our work is to create an automated system
with the established problem of object detection in computer for detecting and characterizing ships in Sentinel-2 images.
vision, involving issues such as adjacent and small object
detection [11]. III. P ROPOSED APPROACH
We present a multi-stage approach for ship detection and
In medium-resolution imagery (10 m in our case), as illus- characterization in Sentinel-2 images. In the initial stage,
trated in Figure 1, ships, especially smaller ones of 50-meter- we employ a sliding window technique to systematically
long or below, the visual detection and characterization may cover the image. Each window is of size 100 × 100 pixels,
be complex: for instance, a large ship can resemble parts of a representing 1 km × 1 km, and overlaps with neighboring
platform; a small island may have a topology similar to a ship; windows by 25%. This overlap ensures that if a ship extends
a very small ship might appear almost identical to a bright spot across two windows, a signicant portion of it remains
on rough waters, or even be mistaken for a small cloud. As a detectable in at least one of them. A window is considered
result, a key challenge is to maximize the detection rate while valid if it contains at least 5% of sea pixels, determined using
minimizing false alarms. This may question how above cited the land/sea mask. Subsequently, these valid windows are
studies for HR images apply to MR optical imagery. categorized as either ”Ship” or ”No ship” using a Resnet-type
[22] classier. For the windows classied as ”Ship”, we
utilize a Faster R-CNN detector to obtain a bounding box
around the detected ships (only the coordinates of the ships
are necessary). This detector incorporates a dedicated branch
to estimate the ship’s heading. Figure 2 provides an overview
of our ship detection and heading estimation system.
Fig. 1: Illustration of complex ship images, presenting chal- When different adjacent patches detect parts of the same
lenges in visual detection and characterization. ship, we apply the non-maximum suppression (NMS)
algorithm. However, in our approach, the suppression is not
based on network condence but rather on the size of the
Only a limited number of studies have specically explored
bounding box. Our primary objective in this context is to
ship detection in MR imagery. Most of these works tend to
retain the largest bounding box to obtain the most precise
concentrate on specic aspects. For example, [14] delves into
ship coordinates. Finally, once the coordinates are identied,
ship detection and characterization in favorable conditions.
we create a 50 × 50 image centered on these coordinates to
[15] addresses the issue of scarce annotated ship data and
estimate the ship length using a Resnet-type network (see
introduces a ’self-supervised learning’ approach. Additionally,
Figure 3).
[16] introduces a method for identifying particular ship shapes
The motivation behind proposing a two-stage strategy is
associated with migrant activity. Few articles adopt a compre-
as follows. In current state-of-the-art schemes, candidate
hensive approach [17], which is our primary area of interest.
proposals are typically classied based on their internal
This scarcity of studies is, in part, attributable to the
features. Consequently, a region containing a very small
absence of freely available ship datasets. In contrast to ship
vessel may exhibit the same features as a region containing
detection at high resolution, which benets from an established
a portion of rough sea. The context becomes crucial in this
reference dataset [18], the only publicly available Sentinel-2
scenario, and the initial classication step is necessary to
ship datasets are [19] and [15]. These datasets comprise 31
capture the context within sufciently large patches that
(2000 ship exemplars) and 16 (1053 ship exemplars) Sentinel-
encompass the entire context while remaining small enough
2 images respectively, which may be limited to deploy state-
to avoid missing very small vessels.
of-the-art learning-based frameworks.
Only [17] addresses the detection of ships in medium- In our approach, each individual component undergoes
resolution satellite optical images within a relatively general distinct training phases, following a traditional methodology
framework. In this article, the analyzed images are sourced that includes training, validation, and testing. It is strongly
from both the Sentinel-2 mission satellites and the Planet recommended to include a substantial number of complete
Labs Dove satellite constellation. The images are divided into Sentinel-2 images during the model development phase. This
800 × 800 pixels patches, which are then used as input for approach not only allows for the identication and resolution
the Faster R-CNN [20] object detection model. The annotation of potential methodological limitations but also facilitates a
process relies on colocating the satellite images with AIS (Au- comprehensive examination of the test dataset’s distribution.
tomatic Identication System) data [21]. AIS data comprise
ship identiers and locations to groundtruth bounding boxes
in the considered dataset. As most small ships are not equipped
with AIS systems, this study only addresses large ships (> 100 A. Classication phase
m). Overall, this study reports a 85% detection rate but does Developing a ”Ship” and ”No ship” category classier
not document the false alarm rate. These results do not appear involves two main phases. Firstly, we focus on generating
3

Fig. 2: An Overview of Our Ship Detection and Heading Estimation System on Sentinel-2 Images. The Faster R-CNN model
comprises a backbone constructed from the Resnet18 classier trained on Sentinel-2 data. The layer names in black used to
build detection feature map are derived from keras applications [23].

approximately 12000 ship images and 20000 non-ship images.


To our knowledge, this represents the rst instance where
a signicantly large dataset has been employed for ship
detection in medium-resolution optical imagery. The ”No ship”
category receives particular attention, as it must encompass
all the scenarios encountered in optical images, including
sea, land, and clouds. Images consisting solely of land are
excluded, as their presence does not impact performance due
to the availability of a land mask that eliminates these false
alarms. Furthermore, it is essential to ensure that our dataset
adequately represented objects resembling ships, such as small
Fig. 3: Ship Length Estimation Component. clouds, small islands, rough seas, and platforms. Manual
verication is performed extensively to conrm the presence
of all these ”sub-categories” within our dataset.
datasets and verifying their distribution. Secondly, we work on 2) Verication of Distribution: The partitioning of train-
the model aspect, which includes selecting models and tuning ing, validation, and test sets may appear straightforward in
parameters. the majority of machine learning problems. However, when
1) Considered datasets: We generate our dataset by ex- dealing with automation applied to a wide range of diverse
tracting patches with dimensions of 100 × 100 pixels from images under various scenarios, a careful partitioning becomes
Sentinel-2 images. Specically, we utilize only the Red, Green, crucial. Let us assume that our network is trained on images
and Blue (RGB) bands (10 m resolution). This selection aims featuring rough seas but is not tested on samples from this
to leverage the advantages of a pre-trained neural network specic scenario to assess its ”understanding”. This can lead to
which is a powerful tool for increasing the accuracy and the vulnerabilities during the deployment phase: results on the test
robustness of our classier [24]. These images are generated set may appear very promising, but performance on real-world
using the vessel detection reports provided by CLS analysts. images may be signicantly poorer. Indeed, our issue involves
These vessel detection reports are routinely generated by CLS binary classication. However, given the diversity of scenarios,
in the context of its commercial activities and contains for each we consider working with subcategories. In our work, We
Sentinel-2 satellite image analysed the position of the detected divide the ”Ship” category into 11 sub-categories: Ship 10,
vessels along its characteristics such as length and heading Ship 20, Ship 30, Ship 40, Ship 50, Ship 100, Ship 150, Ship
when these are measurable. Using the vessel detection reports 200, Ship 300, Ship 400, and Ship 458. In this terminology,
available, we process over 6000 Sentinel 2 images, extracting ”Ship x” signies a ship with a length less than ”x”. Figure
4

4 illustrates the distribution of ”Ship” category in our dataset


based on their size.

(a) Others (b) Artifcats (c) Cloud

(d) Island (e) Land (f) Swell

Fig. 6: Examples of No ship sub-categories images.

Fig. 4: Distribution of ship sub-categories.


subsequently divided into training, evaluation, and test sets.
This approach ensures that the neural network is trained and
tested on a wider range of data, resulting in a model that is
not only more accurate but also more robust. Our approach is
summarized in Diagram 7.
We also extensively employ data augmentation techniques
to ensure a balanced representation of all our sub-cateogies.
For the ’Ship’ category, we apply vertical and horizontal
ips as well as rotations. Regarding the ’No ship’ category,
we employ the same augmentation techniques in addition to
elastic deformation to diversify the shapes of islands and
clouds. It’s worth noting that conducting a comprehensive
enumeration of all the natural sub-categories is of paramount
importance. The more comprehensive the enumeration, the
more effective the network will be in practical applications.
Carrying out this enumeration during the data generation phase
can save considerable time and effort, as it reduces the need
for frequent ne-tuning of the network.
Fig. 5: Distribution of No ship sub-categories. 3) Classication Network: We conduct extensive experi-
ments using various pretrained models on the Imagenet dataset
[26], including InceptionV3 [27], Resnet18, Resnet34, and
We also subdivide the ”No ship” category into 6 sub-
Resnet152 [22], while ne-tuning different parameters. Our
categories: Others compries patches devoid of vessels,
ndings consistently favore the Resnet architecture, which
randomly sampled from 6000 Sentinel-2 images, Artifacts
demonstrates superior performance on our dataset.
contains patches with unwanted visual anomalies or distortions
In our training process, binary cross-entropy serves as our
that can occur during image acquisition or processing, Cloud
loss function. We utilize the Adam Optimizer with a batch
contains various cloud formations, Island includes patches
size of 128 and a learning rate set at 10−4 .
featuring islands and underwater reliefs, Land predominantly
To further enhance our model’s ability to reduce false
consists of coastal areas, ports, and a few images with only
alarms, we implemented an ensemble approach using two
land terrain, lastly, Swell contains various forms of swell.
classiers: ’Resnet152’ and ’Resnet34.’ These two models
Figure 5 illustrates the distribution of ”No ship” category
exhibit comparable results and show signicant improvements
based on this subdivision. Figure 6 shows some exemples of
in false alarm reduction, albeit with a slight decrease in ship
each sub-category.
detection performance.
To reduce the random variability caused by the train/test
split, we further divide each of these subclasses into 5 clus- B. Detection phase
ters using an automatic classication method, specically K- The core of our system is the classication component,
means [25]. This clustering is based on the histogram values which plays a pivotal role. The main goal of the detection
of the image’s 3 color channels. Each of these clusters is component is to assign bounding boxes to ships that
5

Fig. 7: Our K-means method for forming the training, validation, and test sets.

our classier identies within patches labeled as ’Ship’. essential information required to serve as the backbone of
Additionally, it may eliminate some false alarms if no ship is our detector. While the use of the initial 25 × 25 feature
detected in the patch by the detector. To build our detection map from the classication network aids in proposing smaller
network, we employe the Faster R-CNN framework [20]. regions that may contain small ships, it remains relatively
Despite its introduction in 2015, Faster R-CNN, along with shallow for capturing information about larger vessels and
adapted versions, continues to demonstrate its state-of-the-art other objects within the image. To address this, we integrate
performance on established reference datasets [28],[29]. intermediate feature maps of sizes 13 × 13 and 7 × 7. By
However, it’s important to note that we do not explore other employing upsampling techniques, we combine these feature
detector types in this work, including YOLO and its variants maps to create the feature map for region proposals, ensuring
[30] and detection transformer (DETR) and its variants [31]. a comprehensive representation of potential objects of interest
[34].
We utilize a modied version of Faster R-CNN for detec- The region proposal network (RPN) architecture, after cre-
tion, based on the code referenced in [32], implemented on the ating the feature maps, remains unchanged from the original
TensorFlow [33] framework. The primary motivation for these Faster R-CNN. It consists solely of a convolutional layer (512
modications is to enhance the detection of small vessels. The channels) and two additional convolutional layers, one for
Faster R-CNN architecture can be divided into three key com- classication and the other for regression.
ponents: the Backbone network, the Region Proposal Network During the training of the RPN, an anchor is considered
(RPN), and the Detector network. Since we are dealing with positive only if its Intersection over Union (IOU) with a
a single object type, we can use only the RPN along with ground truth box exceeds 0.7. Conversely, it is labeled as
the backbone network to detect ships. However, by employing negative if its IOU falls below 0.3. Anchors falling within
the entire structure, we achieve signicantly improved results. this range are not used for training purposes. To address the
This improvement is attributed to the Detector network, which class imbalance issue, we employ a weighted loss function,
incorporates Fully Connected layers, rening the outcomes of which helps balance the small number of positive anchors per
the RPN, which relies solely on convolutional operations. image.
Our patches are sized at 100 × 100 pixels and contain After proposing the regions of interest, we apply ROI
sometimes very small vessels, representing just 1 pixel of pooling to transform these regions into a xed size of 6 × 6,
the image, and sometimes large ships, occupying more than which is determined based on the lengths of the ships in our
40 pixels. To accommodate this size range, we congure the datasets. Subsequently, these resized regions are forwarded to
Region Proposal Network (RPN) to propose different regions the detector network. The structure of our modied Faster R-
of interest, even for the smallest objects. To achieve this, we CNN is illustrated within the green rectangle in Figure 2.
utilize anchor boxes of various sizes based on the statistics Only the RPN part and the Detector part are trained with a
of the ship length in our dataset, we choose 4 × 4, 7 × 7, learning rate of 10−4 . The weights of the backbone layers
10 × 10, 14 × 14, 18 × 18, and 28 × 28 for square-shaped are frozen. We believe that since the classier performs
anchor boxes. Additionally, we maintain the original aspect exceptionally well on a large dataset containing both ”Ship”
ratios from the Faster R-CNN paper [20], which are 1, 0.5, and ”No ship” images, the network only needs to learn how
and 2. Furthermore, the input feature map for the RPN is set to express the precise position of the ship, which is what the
to a size of 25 × 25 (with a stride length of 4) to preserve RPN and Detector are designed to accomplish.
crucial information from very small vessels, especially in
challenging scenarios. This conguration allows the RPN to C. Characterization of ships
effectively handle objects of different sizes within our patches. In this section, we introduce the deep learning component
for the estimation of the heading of the ship by adding an
The feature map is constructed from the classication additional branch to the Faster R-CNN architecture. Addition-
network mentioned earlier. In fact, the classication net- ally, we use a ResNet type model to estimate the size of the
work demonstrates excellent results in distinguishing between ship.
”Ship” and ”No ship” images. As a result, we believe that 1) Headings: While it was feasible to train a bounding box
the feature maps generated by this network encompass the with the ship’s orientation, as demonstrated in [35], we have
6

access to a dataset containing 3000 ships annotated with their ships, low-density ship images, and cloud-covered scenes. In
heading. This information proves to be more valuable than a total, the 60 images collectively contain 878475 valid patches.
simple rotated bounding box for our specic task. A patch is considered valid if it contains at least 5% of sea
Concerning the training process, we keep all the weights of pixels. In total, the 60 images collectively contain around 1147
our previously discussed Faster R-CNN model frozen. Then, ships. The image annotations are carried out by CLS analysts.
we introduce an additional branch after the ROI pooling layer, It’s important to note that very small ships may not have
as illustrated within the sky blue rectangle in Figure 2, to han- been annotated due to the inherent difculty and resolution
dle ship orientation estimation. In this branch, we utilize both limitations. Our evaluation set is available in [36].
the cosine and sine of the ship’s angle as model outputs. This
approach is adopted to convey circular information effectively
to the network.
2) Length Estimation: Since regions of interest (ROIs) are
resized during the process, some size information is inevitably
lost. To address this issue, one approach is to recover size
information using ship orientation and the scaling factor.
However, given the current resolution, attempts to introduce an
additional branch for length estimation, similar to what we do
for ship heading, do not yield satisfactory results. Therefore,
we decided to employ a separate network for this purpose.
We utilize the Resnet50 network, with inputs sized at 50×50
pixels and centered on the ship. While smaller patches can be Fig. 8: Location of the 60 Sentinel 2 Images Constituting the
considered (46 since the biggest ship we can meet is 458), Evaluation Set.
during the deployment of the entire ship detection system, the
inputs for the length estimation model are constructed around
the center of the ship’s bounding box predicted by our detector.
B. Classication Experimentations
This bounding box may be slightly affected by ship wakes or
irregular ship shapes. Hence, we opt for this patch size, taking 1) Evaluating Distribution Verication Effectiveness: In
into account potential small errors in the detection phase. this experiment, we analyze and compare the results achieved
It is essential to note that for ensuring the robustness of by employing various data splitting methods. We utilize the
our approach, we systematically exclude all images featuring pre-trained Resnet152 network on the Imagenet dataset, as it
partial cropping and inadequately labeled lengths. However, exhibits the best performance. We examine three distinct data
we do not exclude images of ships partially obscured by splitting congurations:
clouds if their lengths are still measurable; otherwise, they are • In the rst approach, we implement a random split, as
excluded. This renement leads to a dataset consisting of 7460 depicted in the rst phase of the diagram 7. This split is
images, distributed randomly across training (80%), validation referred to as the ”Random split.”
(10%), and test (10%) sets. • In the second approach, we employ a split based on
subclasses, as shown in the second phase of the diagram
IV. E XPERIMENTS AND R ESULTS 7. This split is referred to as the ”Subclass split.”
• In the third approach, we create a split based on clusters
In this section, we present the outcomes of our system,
within each subclass, represented in the third phase of
along with experiments demonstrating the effectiveness of the
the diagram 7. This split is referred to as the ”Subclass
methods used in both the classication and detection phases.
k-means split.”
We commence by introducing the experiments and results of
the classication phase, followed by those of the detection We present the performance metrics for the test sets in
phase. Subsequently, we delve into the characterization phase, Table I, each comprising approximately 6500 images, with
and nally, we present the results of our complete ship 2500 ship images and 4000 no-ship images. The metrics used
detection system. To assess the performance of our system, to assess our classiers include the False Positive Number
we utilize a distinct dataset comprising 60 Sentinel-2 images, (FP) and Recall. We opted for FP instead of precision due to
herein referred to as the Evaluation set. It is essential to note the minimal occurrence of false positives. These metrics are
that this dataset is distinct from the training, validation, and compared across different condence thresholds: 0.50, 0.75,
test datasets used for training each component of the system and 0.95.
Table I shows comparative results; however, we cannot draw
conclusions without testing the methods on a common dataset:
A. Description of the Evaluation Set the evaluation set (constructed using 60 Sentinel-2 images, as
The evaluation set consists of 60 Sentinel-2 images captured previously mentioned).
from approximately 20 different Earth locations, as indicated Table II displays the results of the three distinct models,
by red stars in Figure 8, at various acquisition times. These evaluated on the Evaluation set. The FP metric represents
images encompass a wide range of scenarios, including coast- the aggregate count of false positives observed across all
lines, turbulent seas, artifacts, images with a high density of 60 evaluation images, encompassing approximately 877470
7

0.5 0.75 0.95


Split / Threshold
FP Rec.(%) FP Rec.(%) FP Rec.(%)
Random 45 98.1 34 97.7 15 96.0
Subclass 23 98.3 12 97.9 5 96.0
Subclass k-means 25 98.1 16 97.5 8 95.9

TABLE I: Evaluation of Data Splitting Methods on Their


Respective Test Sets Using Resnet152. Fig. 9: The only four false positives produced by the
Resnet152 and Resnet34 ensemble.

0.5 0.75 0.95


Split / Threshold
FP Rec.(%) FP Rec.(%) FP Rec.(%) C. Detection Experimentations
Random 2529 98.2 1818 97.6 1003 96.1 1) Evaluating our Modied Faster R-CNN: In this exper-
Subclass 1050 98.4 840 97.9 605 96.2
Subclass k-means 546 98.2 406 97.1 240 95.5 iment, we assess the effectiveness of our modied Faster
TABLE II: Evaluation of Data Splitting Methods on The R-CNN in comparison to the original version. To ensure a
Evaluation Set Using Resnet152. robust comparison, we adjust the classical Faster R-CNN with
a ResNet18 backbone pretrained on the ImageNet dataset,
solely due to differences in input size. The feature map
used for generating proposals is derived from the layer
valid patches devoid of ships. Remarkably, the ’k-means split’ ”Stage 4 unit1 relu1”. To evaluate the performance of these
method outperforms both the ’random split’ and ’subclass models, we employ Detection rate, False alarm rate , F1 score,
split’ approaches, resulting in the lowest occurrences of false and Average Precision, noting that a detection is considered
positives, averaging approximately 4 per Sentinel-2 image. correct only if it’s Intersection over Union (IOU) exceeds
This outcome underscores the signicance of employing veri- 0.5. Additionally, we conduct a performance comparison using
cation techniques and methodological renements to reduce ResNet152 and ResNet34 backbones. Our adapted version
the random variability inherent in splitting datasets, partic- of the Faster R-CNN is denoted as MRS-Faster R-CNN,
ularly in complex scenarios with a wide range of image underscoring its focus on medium-resolution ship detection.
characteristics.
2) Ensemble network classication: Throughout the re- Model / Metric D(%) FA(%) F1 score(%) AP(%)

mainder of our results, we exclusively employ the dataset Faster R-CNN / ResNet18 70 66 45.7 30.9
MRS-Faster R-CNN / ResNet18 80.4 18.8 75.4 77.1
split derived from the k-means method. To minimize the MRS-Faster R-CNN / ResNet34 77.5 23.7 76.9 74.4
False Positive Number, we conduct extensive experiments with MRS-Faster R-CNN / ResNet152 78 27.8 75 73.1
various models pretrained on Imagenet. Table III summarizes TABLE IV: Performance Metrics for Different detection Mod-
the outcomes of these models in terms of False Positive els.
Number (FP) and Recall metrics.
Our MRS-Faster R-CNN demonstrates promising perfor-
Model / Threshold
0.5 0.75 0.95 mance in comparison to a standard version of Faster R-CNN.
FP Rec.(%) FP Rec.(%) FP Rec.(%) Even though Resnet152 and Resnet34 yield superior classi-
InceptionV3 36 97.6 25 97.0 11 95.0 cation results (Table III), the feature maps constructed from
Resnet18 40 97.0 28 96.9 13 94.0 Resnet18 produce the best outcomes. We hypothesize that the
Resnet34 33 97.5 24 97.2 9 94.8
Resnet152 25 98.1 16 97.5 8 95.9 comparatively shallower architecture of Resnet18 allows the
Ens. Resnet34/Resnet152 14 96.5 10 95.3 4 93.0 intermediate feature maps which are of a relatively large size
TABLE III: Evaluation of Various Pretrained Models’ Perfor- to encompass a richer set of information. It’s worth noting that
mance on the Test Dataset. utilizing a backbone constructed from Resnet152 or Resnet34
for detection will help minimize image processing time in our
system, albeit with a minor trade-off in performance.
Our aim is to create the optimal ensemble, prioritizing Detection and False Alarm rates are inuenced by two key
FP reduction while tolerating a slight drop in Recall. To factors. Firstly, our dataset comprises numerous small vessels
achieve this, we employ ensemble models and manually se- with wakes, leading to instances where wakes are considered
lected aggregation rules, guided by a criterion: False Positives as part of the ship due to resolution limitations. However, our
should not include any images clearly devoid of vessels. detector consistently distinguishes between wakes and vessels
Consequently, we choose Resnet152 and Resnet54 with an on larger ships, as depicted in Figure 10. Secondly, our model
aggregation rule stipulating that both models must output encounters challenges in detecting closely positioned ships,
a condence score exceeding 0.95 to classify an image as particularly small vessels, as illustrated in Figure 11. Rening
containing a ship. the Intersection over Union (IOU) condition to 0.25, Detection
The Ensemble network yields a Recall of 93% with only rate reaches up to 95%, and False alarm rate decreases to 5%.
4 false positives on the test set. Figure 9 illustrates images
of these four false positives. It is not evident whether these 2) Direct application of the Faster R-CNN: In this exper-
images contain a ship. iment, we directly apply our detector to the evaluation set,
8

Fig. 10: Model detections exemples : Green bounding boxes


denote annotations, while yellow indicates model predictions.

Fig. 12: System Performance evaluation : Overall Detection


rate : 93.2%, Potential False Alarms (Range): 128-234, Aver-
age False Alarms for One Image (Range): 2.1-3.9.

Fig. 13: Examples of False Alarms.


Fig. 11: Model struggles with close ship differentiation, gen-
erating a single bounding box for closely positioned vessels.
left part of the gure, a predicted heading is considered accu-
rate if the absolute difference between the predicted heading
excluding the use of the classication component, following (hp ) and the ground truth heading (hgt ) is less than a specied
the approch of [17]. This direct application yields a Detection heading error (herror ), with herror taking values of 15◦ , 30◦ ,
Rate of 96% and an Average False Alarm number per image and 45◦ . In the right part of the gure, accuracy is determined
of 50. The expected outcome aligns with the inherent design by verifying if the absolute difference between the predicted
of the Faster R-CNN, which lacks contextual information. To heading and the actual heading modulo 180◦ is less than herror .
ensure a fair experiment with direct applications, additional This relaxation of the rst condition helps accurately assess the
enhancements are essential. Incorporating branches to encode model’s capability, especially in situations where the direction
contextual information, as demonstrated in [37], represents is unclear due to resolution limitations, as depicted in Figure
a potential avenue for improvement. Notably, such enhance- 15. The accuracy for predicting headings within a range of
ments were not explored in our work. ±30 degrees is 93%.

D. System Performance evaluation


We assess the performance of our ship detection system,
emphasizing the detection rate and false alarm count. Figure 12
presents the detection rates for various ship sizes, showcasing
an overall detection rate of 93.2%. The system records an
average of 2.1 to 3.9 false alarms per Sentinel-2 image. The
uncertainty in false alarm numbers stems from the presence
of 106 detections resembling small ships (< 20m) but not
annotated by CLS analysts due to resolution limitations. No-
tably, our system excels in detecting large vessels (> 100m),
achieving a detection rate exceeding 97%. Table V presents
detailed results for each image, wherein false alarms are pre-
Fig. 14: Heading Prediction Results for 600 Ship
dominantly attributed to harbor facilities, offshore platforms,
Images. Left: A correct prediction is indicated when
and swell (Figure 13). These results demonstrate signicant
|hp − hgt | ≤ herror . Right: A correct prediction occurs when
promise for achieving complete automation in ship detection
|hp − hgt | mod 180 ≤ herror .
on Sentinel-2 images and, more broadly, on optical medium-
resolution images.
2) Length estimation Results: The results on length esti-
mation are based on 746 ship images with dimensions of
E. Ship Characterization Results 50 × 50. The mean length error is reported as 15.36m ±
1) Heading Results: Figure 14 displays the heading results 19.57m, accompanied by an R-squared value of 92 (Figure
for 600 ship images. We assess accuracy in two ways. In the 16). The relatively high standard deviation error is attributed
9

Image Name Main Features Total Ships DR(%) FA Potential Ships, Not Clear
S2A MUL ORT 13 20220521T084745 20220521T084745 00064 Coast, Calm sea 15 86.6 1 0
S2A MUL ORT 13 20220521T084756 20220521T084756 00064 Coast, Calm sea 13 100 0 7
S2A MUL ORT 13 20220702T063238 20220702T063238 00091 Clouds 0 . 0 0
S2A MUL ORT 13 20220707T083745 20220707T083745 00021 Coast, Calm sea, Harbor 140 94.2 6 3
S2A MUL ORT 13 20220712T063233 20220712T063233 00091 Clouds 0 . 0 0
S2A MUL ORT 13 20220719T074600 20220719T074600 00049 Coast, Islands 18 100 0 13
S2A MUL ORT 13 20220719T074613 20220719T074613 00049 Coast, Islands, Clouds 26 100 4 20
S2A MUL ORT 13 20220719T074628 20220719T074628 00049 Coast, Islands, Clouds, Underwater re- 15 93.3 4 3
lief
S2A MUL ORT 13 20220719T074638 20220719T074638 00049 Coast, Islands, Clouds, Underwater re- 8 100 4 3
lief, Artifacts
S2A MUL ORT 13 20220801T063228 20220801T063228 00091 Small Clouds 0 . 0 0
S2A MUL ORT 13 20220801T063229 20220801T063229 00091 Small Clouds, Islands 0 . 0 0
S2A MUL ORT 13 20220801T063235 20220801T063235 00091 Small Clouds, Islands 0 . 0 0
S2A MUL ORT 13 20220808T062238 20220808T062238 00048 Clouds 0 . 0 0
S2A MUL ORT 13 20220818T062238 20220818T062238 00048 Clouds 1 0 0 0
S2A MUL ORT 13 20220821T063236 20220821T063236 00091 Clouds 0 . 0 0
S2A MUL ORT 13 20220828T062238 20220828T062238 00048 Complete Cloud Cover 0 . 0 0
S2A MUL ORT 13 20220908T004106 20220908T004106 00059 Coast, Small Clouds, Underwater relief 14 85.7 0 2
S2A MUL ORT 13 20220908T004119 20220908T004119 00059 Coast, Small Clouds, Underwater relief 0 . 0 1
S2A MUL ORT 13 20220908T004130 20220908T004130 00059 Coast, Small Clouds, Underwater relief 16 81.3 5 0
S2A MUL ORT 13 20220910T063231 20220910T063231 00091 Calm Sea, Small Clouds 0 . 0 0
S2A MUL ORT 13 20220910T063234 20220910T063234 00091 Clouds 0 . 0 0
S2A MUL ORT 13 20220912T220946 20220912T220946 00129 Clouds, Swell 0 . 7 0
S2A MUL ORT 13 20220915T034601 20220915T034601 00018 Clouds, Platforms 14 100 2 5
S2A MUL ORT 13 20220915T034604 20220915T034604 00018 Clouds, Platforms 15 100 1 5
S2A MUL ORT 13 20220915T034630 20220915T034630 00018 Clouds 3 100 0 0
S2A MUL ORT 13 20220915T222001 20220915T222001 00029 Clouds 0 . 0 0
S2A MUL ORT 13 20220918T035449 20220918T035449 00061 Clouds 40 90 6 3
S2A MUL ORT 13 20220918T035521 20220918T035521 00061 Coast, Clouds 8 75 0 0
S2A MUL ORT 13 20220918T035607 20220918T035607 00061 Coast, Clouds, Islands 20 80 4 2
S2A MUL ORT 13 20220918T035616 20220918T035616 00061 Coast, Clouds 17 70 2 0
S2A MUL ORT 13 20220918T035622 20220918T035622 00061 Coast, Clouds, Islands 97 92.7 1 4
S2A MUL ORT 13 20220920T063234 20220920T063234 00091 Clouds 0 . 0 0
S2B MUL ORT 13 20220509T085722 20220509T085722 00107 Coast, Calm Sea 31 100 4 3
S2B MUL ORT 13 20220704T062230 20220704T062230 00048 Clouds 0 . 0 0
S2B MUL ORT 13 20220709T083234 20220709T083234 00121 Coast, Clouds, Underwater relief 69 92.7 6 1
S2B MUL ORT 13 20220709T083248 20220709T083248 00121 Coast, Calm Sea 13 100 1 1
S2B MUL ORT 13 20220710T145840 20220710T145840 00139 Coast, Clouds, Artifacts 8 100 2 2
S2B MUL ORT 13 20220717T063228 20220717T063228 00091 Clouds 0 . 0 0
S2B MUL ORT 13 20220718T155111 20220718T155111 00111 Coast, Clouds 0 . 2 0
S2B MUL ORT 13 20220719T083234 20220719T083234 00121 Coast, Calm Sea, Underwater relief 87 91.9 1 2
S2B MUL ORT 13 20220719T083238 20220719T083238 00121 Calm Sea, Underwater relief, Platforms 147 91.8 10 11
S2B MUL ORT 13 20220722T070213 20220722T070213 00020 Coast, Clouds, Swell 129 96.8 34 5
S2B MUL ORT 13 20220727T063228 20220727T063228 00091 Clouds 0 . 0 0
S2B MUL ORT 13 20220813T062230 20220813T062230 00048 Clouds 0 . 0 0
S2B MUL ORT 13 20220816T063227 20220816T063227 00091 Clouds 1 0 0 0
S2B MUL ORT 13 20220828T083001 20220828T083001 00121 Coast, Harbor, Calm Sea 107 97.1 9 3
S2B MUL ORT 13 20220828T083030 20220828T083030 00121 Coast, Clouds 31 74.1 3 0
S2B MUL ORT 13 20220909T005948 20220909T005948 00002 Small Clouds 0 . 0 0
S2B MUL ORT 13 20220909T005951 20220909T005951 00002 Small Clouds 1 100 1 0
S2B MUL ORT 13 20220909T005956 20220909T005956 00002 Clouds, Islands 7 57 2 0
S2B MUL ORT 13 20220909T005959 20220909T005959 00002 Calm Sea, Clouds 5 100 0 0
S2B MUL ORT 13 20220909T010014 20220909T010014 00002 Calm Sea, Clouds 5 80 0 0
S2B MUL ORT 13 20220909T010028 20220909T010028 00002 Coast, Calm Sea 1 100 0 2
S2B MUL ORT 13 20220912T062227 20220912T062227 00048 Clouds 0 . 0 0
S2B MUL ORT 13 20220916T040541 20220916T040541 00104 Coast, Clouds 4 50 2 1
S2B MUL ORT 13 20220916T040545 20220916T040545 00104 Clouds, Islands 0 . 0 2
S2B MUL ORT 13 20220916T040610 20220916T040610 00104 Coast, Clouds 11 90 2 1
S2B MUL ORT 13 20220916T040624 20220916T040624 00104 Clouds 3 100 1 0
S2B MUL ORT 13 20220916T040628 20220916T040628 00104 Clouds 1 100 1 1
S2B MUL ORT 13 20220916T040634 20220916T040634 00104 Clouds 6 100 0 0

TABLE V: Detection System’s performance on the Evaluation Set [36] with Detection Rate and False Alarm Count metrics.

of 695 images, yielding a reduced mean length error of 11.29m


± 10.17m. Despite these challenges, our results demonstrate
commendable performance, particularly considering the reso-
lution limitations of Sentinel-2 data.

V. D ISCUSSION
Fig. 15: Examples of Ambiguous Heading. The detection of ships in medium resolution optical images
remains an underexplored eld, with only a few studies
adopting a comprehensive approach to this task. The scarcity
to instances where wake shapes or partial cloud coverage of research in this area is not indicative of its insignicance
make the length measurement task challenging (Figure 17). By but rather stems from the limited availability of data and
excluding these challenging instances, the dataset now consists annotated resources. In an attempt to bridge this gap, we
10

tributed to instances where the heading is visually unclear due


to resolution limitations. It is noteworthy that, to the best of
our knowledge, we did not nd any studies presenting results
in heading estimation specically on medium-resolution (MR)
optical imagery.
The demonstrated results hold promise for achieving com-
plete task automation. However, it is essential to note that full
conrmation of these ndings awaits deployment in production
settings (Figure 18), considering the diversity and occasional
complexity of scenes encountered. Our system exhibits a time
limitation, requiring approximately 10 minutes to process a
Fig. 16: Predicted vs. True ship lengths with an R-squared of complete Sentinel-2 image on an 8-Core CPU. This processing
0.92. time is attributed to the multiple stages involved, which,
with the application of alternative ideas, could potentially
be consolidated into a single stage without compromising
performance.
One potential avenue for system enhancement involves
introducing a context branch to the Faster R-CNN architecture
[37]. This modication holds promise in addressing issues
such as false alarms triggered by large vessels, which may
(a) 145(142) (b) 177(173) (c) 68(180) (d) 218(103)
generate noise in their surroundings, producing bright points
that our detector may misinterpret as small ships. To overcome
Fig. 17: Examples of Ship Length Prediction. The rst number the time limitations inherent in our system, the utilization of a
denotes the ground truth length, while the predicted length is one-stage detector, such as YOLO [38], could be considered.
enclosed in parentheses Additionally, we acknowledge the challenge of accurately
detecting very close ships, reecting a broader issue in general
object detection [39]. Future research efforts may explore
make available 60 annotated Sentinel-2 images (1147 ship innovative strategies to mitigate these challenges and further
exemplars) which include length, and heading information rene the performance of ship detection in optical images.
when theses are mesurable [36].
Our study demonstrates a signicant improvement in per- VI. C ONCLUSION
formance compared to existing results: our detection rate is In this research, we introduced an automated system de-
93%, compared to 75% in [15]. Notably, our detection rate signed to detect and characterize ships in Sentinel-2 images
on large vessels (> 100 m) is 97% compared to 85% in [17]. through a multi-stage process. Our system comprises distinct
It is crucial to highlight that this improvement is primarily components focusing on classication, detection, and char-
associated with the substantial difference in the dataset sizes acterization. Within the classication component, we high-
rather than the methodology employed. Importantly, it should lighted the importance of validating distribution in complex
be noted that reproducing our results does not necessitate an environments and diverse scenarios. State-of-the-art models,
equivalent volume of data. Our dataset contains a substantial specically Resnet152 and Resnet34, were employed to mini-
amount of redundant information, suggesting that achieving mize false positives. During the detection phase, modications
comparable results may be feasible with a more carefully were made to the original Faster R-CNN to effectively handle
selected dataset. our specic dataset. In the characterization phase, a dedicated
It’s crucial to highlight that our primary research objective branch was integrated into the Faster R-CNN to estimate
centers around establishing an effective ship detection system, ship headings, and Resnet50 was separately used for length
with ship heading and length estimation emerging as subse- estimation. The results of our system are promising, indicating
quent considerations. While there is room for improvement, the potential for complete automation of ship detection and
particularly in length estimation (our mean error: 15.36m ± characterization in Sentinel-2 images. Additionally, the pre-
19.57m on 746 ship images), in [14], the reported mean error sented approach is adaptable to other satellite constellations.
is 15.48m ± 10.94m on 34 ships, only 3 of which contain However, our system encounters challenges in differentiating
ship wakes. Our higher standard deviation error arises from small ships from their wakes and struggles with detecting
the presence of ships partially obscured under clouds and a closely positioned ships. Acknowledging these limitations,
signicant number of images with ship wakes. Upon removing further research could explore these specic points, ensuring
these challenging images, the mean length error is reduced to a continuous improvement in automated ship detection and
11.29m ± 10.17m. characterization methodologies.
The results for heading estimation have proven highly
satisfactory. We report a 93% accuracy in heading estimation, VII. ACKNOWLEDGEMENTS
where correct heading estimation is dened as ground truth The dataset employed in this paper was provided by Collecte
heading ±30 degrees. The remaining 7% discrepancy is at- Localisation Satellites (CLS). Our sincere appreciation goes to
11

Fig. 18: Integration of the Ship Detection System in CLS Processing Pipeline - Initial System Trials.

Vincent Kerbaol for his pivotal support, making this project [12] G. Tang, S. Liu, I. Fujino, C. Claramunt, Y. Wang, and S. Men, “H-
possible. We express our gratitude to the CLS analysts for yolo: A single-shot ship detection approach based on region of interest
preselected network,” Remote Sensing, vol. 12, no. 24, 2020.
their meticulous annotation of the datasets. [13] S. Karki and S. Kulkarni, “Ship detection and segmentation using unet,”
Special recognition is extended to Etienne Gauchet, Mo- in 2021 International Conference on Advances in Electrical, Computing,
hammed Benayad, and Maxime Vandevoorde for their contri- Communication and Sustainable Technologies (ICAECT), pp. 1–7, 2021.
[14] H. Heiselberg, “A direct and fast methodology for ship recognition in
butions to data cleaning and generation. We offer a heartfelt sentinel-2 multispectral imagery,” Remote Sensing, vol. 8, no. 12, 2016.
thanks to Chaı̈mae Sriti and Aurélien Colin for their compre- [15] A. Ciocarlan and A. Stoian, “Ship detection in sentinel 2 multi-spectral
hensive review of the paper and the insightful comments they images with self-supervised learning,” Remote Sensing, vol. 13, no. 21,
2021.
provided. [16] U. Kanjir, “Detecting migrant vessels in the mediterranean sea: Using
sentinel-2 images to aid humanitarian actions,” Acta Astronautica,
vol. 155, pp. 45–50, 2019.
R EFERENCES [17] D. Štepec, T. Martinčič, and D. Skočaj, “Automated system for ship
detection from medium resolution satellite optical imagery,” in OCEANS
[1] EMSA, “Emsa outlook 2023,” 2022. 2019 MTS/IEEE SEATTLE, pp. 1–10, 2019.
[2] A. Iodice and G. Di Martino, Maritime Surveillance with Synthetic [18] S. Kızılkaya, U. Alganci, and E. Sertel, “Vhrships: An extensive
Aperture Radar. Institution of Engineering and Technology, 2020. benchmark dataset for scalable deep learning-based ship detection
[3] V. W. G. H, Satellite Imaging for Maritime Surveillance of the European applications,” ISPRS International Journal of Geo-Information, vol. 11,
Seas. Berlin (Germany): Springer Science+Business Media B.V., 2008. no. 8, 2022.
[4] D. J. Crisp, “The state-of-the-art in ship detection in synthetic aperture [19] F. D. Vieilleville, A. Lagrange, N. Dublé, and B. L. Saux, “Sentinel-2
radar imagery,” tech. rep., Defence Science and Technology Organisation dataset for ship detection,” Mar. 2022.
Salisbury (Australia) Info . . . , 2004. [20] S. Ren, K. He, R. B. Girshick, and J. Sun, “Faster r-cnn: Towards
[5] L. Zhai, Y. Li, and Y. Su, “Segmentation-based ship detection in real-time object detection with region proposal networks.,” in NIPS
harbor for sar images,” in 2016 CIE International Conference on Radar (C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett,
(RADAR), pp. 1–4, 2016. eds.), pp. 91–99, 2015.
[6] M. Stasolla, J. J. Mallorqui, G. Margarit, C. Santamaria, and N. Walker, [21] A. Milios, K. Bereta, K. Chatzikokolakis, D. Zissis, and S. Matwin, “Au-
“A comparative study of operational vessel detectors for maritime tomatic fusion of satellite imagery and ais data for vessel detection,” in
surveillance using satellite-borne synthetic aperture radar,” IEEE Journal 2019 22th International Conference on Information Fusion (FUSION),
of Selected Topics in Applied Earth Observations and Remote Sensing, pp. 1–5, 2019.
vol. 9, no. 6, pp. 2687–2701, 2016. [22] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
[7] B. LI, X. XIE, X. WEI, and W. TANG, “Ship detection and classication recognition,” in 2016 IEEE Conference on Computer Vision and Pattern
from optical remote sensing images: A survey,” Chinese Journal of Recognition (CVPR), pp. 770–778, 2016.
Aeronautics, vol. 34, no. 3, pp. 145–163, 2021. [23] F. Chollet et al., “Keras applications.” https://keras.io/api/applications/,
[8] S. Voinov, D. Krause, and E. Schwarz, “Towards automated vessel 2015.
detection and type recognition from vhr optical satellite images,” in [24] D. Hendrycks, K. Lee, and M. Mazeika, “Using pre-training can improve
IGARSS 2018 - 2018 IEEE International Geoscience and Remote model robustness and uncertainty,” CoRR, vol. abs/1901.09960, 2019.
Sensing Symposium, pp. 4823–4826, 2018. [25] J. MacQueen, “Some methods for classication and analysis of mul-
[9] W. Chen, B. Han, Z. Yang, and X. Gao, “Mssdet: Multi-scale ship- tivariate observations,” Proceedings of the 5th Berkeley Symposium on
detection framework in optical remote-sensing images and new bench- Mathematical Statistics and Probability, Volume 1: Statistics, University
mark,” Remote Sensing, vol. 14, no. 21, 2022. of California Press, Berkeley, pp. 281–297, 1967.
[10] U. Kanjir, H. Greidanus, and K. Oštir, “Vessel detection and classi- [26] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet:
cation from spaceborne optical images: A literature survey,” Remote A large-scale hierarchical image database,” in 2009 IEEE conference on
Sensing of Environment, vol. 207, pp. 1–26, 2018. computer vision and pattern recognition, pp. 248–255, Ieee, 2009.
[11] S. Zhang, R. Wu, K. Xu, J. Wang, and W. Sun, “R-cnn-based ship [27] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna,
detection from high resolution remote sensing imagery,” Remote Sensing, “Rethinking the inception architecture for computer vision,” CoRR,
vol. 11, no. 6, 2019. vol. abs/1512.00567, 2015.
12

[28] K. He, R. B. Girshick, and P. Dollár, “Rethinking imagenet pre-training,”


CoRR, vol. abs/1811.08883, 2018.
[29] T. Mahendrakar, A. Ekblad, N. Fischer, R. White, M. Wilde, B. Kish, and
I. Silver, “Performance study of yolov5 and faster r-cnn for autonomous
navigation around non-cooperative targets,” in 2022 IEEE Aerospace
Conference (AERO), pp. 1–12, 2022.
[30] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look
once: Unied, real-time object detection,” in 2016 IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), pp. 779–788, 2016.
[31] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and
S. Zagoruyko, “End-to-end object detection with transformers,” in
Computer Vision – ECCV 2020: 16th European Conference, Glasgow,
UK, August 23–28, 2020, Proceedings, Part I, (Berlin, Heidelberg),
p. 213–229, Springer-Verlag, 2020.
[32] B. Trzynadlowski, “Faster r-cnn.” https://github.com/trzy/FasterRCNN,
2022.
[33] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S.
Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow,
A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur,
J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah,
M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker,
V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wat-
tenberg, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: Large-scale
machine learning on heterogeneous systems,” 2015. Software available
from tensorow.org.
[34] C. Cao, B. Wang, W. Zhang, X. Zeng, X. Yan, Z. Feng, Y. Liu, and
Z. Wu, “An improved faster r-cnn for small object detection,” IEEE
Access, vol. 7, pp. 106838–106846, 2019.
[35] X. Xie, G. Cheng, J. Wang, X. Yao, and J. Han, “Oriented r-cnn
for object detection,” 2021 IEEE/CVF International Conference on
Computer Vision (ICCV), pp. 3500–3509, 2021.
[36] M. Bou-laouz, R. Vadaine, G. Hajduch, and R. Fablet, “Sentinel-
2 dataset for ship detection and characterization,” Dec. 2023.
https://doi.org/10.5281/zenodo.10222276.
[37] J. Leng, Y. Liu, T. Zhang, and P. Quan, “Context learning network
for object detection,” in 2018 IEEE International Conference on Data
Mining Workshops (ICDMW), pp. 667–673, 2018.
[38] J. Terven and D. Cordova-Esparza, “A Comprehensive Review
of YOLO: From YOLOv1 and Beyond,” arXiv e-prints,
p. arXiv:2304.00501, Apr. 2023.
[39] R. Kaur and S. Singh, “A comprehensive review of object detection with
deep learning,” Digital Signal Processing, vol. 132, p. 103812, 2023.

You might also like