Bueno2022 Article AutomatedCervicalSpinalCordSeg

Download as pdf or txt
Download as pdf or txt
You are on page 1of 12

Journal of Digital Imaging

https://doi.org/10.1007/s10278-022-00637-4

ORIGINAL PAPER

Automated Cervical Spinal Cord Segmentation in Real‑World


MRI of Multiple Sclerosis Patients by Optimized Hybrid Residual
Attention‑Aware Convolutional Neural Networks
América Bueno1 · Ignacio Bosch1 · Alejandro Rodríguez2 · Ana Jiménez3 · Joan Carreres4 ·
Matías Fernández2 · Luis Marti‑Bonmati2,5 · Angel Alberich‑Bayarri2,3

Received: 12 April 2021 / Revised: 22 March 2022 / Accepted: 9 April 2022


© The Author(s) under exclusive licence to Society for Imaging Informatics in Medicine 2022

Abstract
Magnetic resonance (MR) imaging is the most sensitive clinical tool in the diagnosis and monitoring of multiple sclerosis
(MS) alterations. Spinal cord evaluation has gained interest in this clinical scenario in recent years, but, unlike the brain,
there is a more limited choice of algorithms to assist spinal cord segmentation. Our goal was to investigate and develop an
automatic MR cervical cord segmentation method, enabling automated and seamless spinal cord atrophy assessment and
setting the stage for the development of an aggregated algorithm for the extraction of lesion-related imaging biomarkers.
The algorithm was developed using a real-world MR imaging dataset of 121 MS patients (96 cases used as a training dataset
and 25 cases as a validation dataset). Transversal, 3D T1-weighted gradient echo MR images (TE/TR/FA = 1.7–2.7 ms/5.6–
8.2 ms/12°) were acquired in a 3 T system (Signa HD, GEHC) as standard of care in our clinical practice. Experienced
radiologists supervised the manual labelling, which was considered the ground-truth. The 2D convolutional neural network
consisted of a hybrid residual attention-aware segmentation method trained to delineate the cervical spinal cord. The train-
ing was conducted using a focal loss function, based on the Tversky index to address label imbalance, and an automatic
optimal learning rate finder. Our automated model provided an accurate segmentation, achieving a validation DICE coef-
ficient of 0.904 ± 0.101 compared with the manual delineation. An automatic method for cervical spinal cord segmentation
on T1-weighted MR images was successfully implemented. It will have direct implications serving as the first step for
accelerating the process for MS staging and follow-up through imaging biomarkers.

Keywords Segmentation · MRI · Multiple sclerosis · Deep learning · Residual attention-aware · CNN

Introduction

Multiple sclerosis (MS) is an inflammatory and autoimmune


* América Bueno disease of the central nervous system (CNS) which generally
bgomez.america@gmail.com results in the demyelination and axonal loss over time [1], due
1
to the presence of large focal lesions in the white and grey
Instituto de Tecnologías y Aplicaciones Multimedia,
matters of the brain and spinal cord. Because of its high sensi-
Universitat Politècnica de Valencia, Valencia, Spain
2
tivity for the evaluation of inflammatory and neurodegenera-
Biomedical Imaging Research Group (GIBI230), Hospital tive processes, magnetic resonance (MR) has become the most
Universitario y Politécnico e Instituto de Investigación
Sanitaria La Fe, Valencia, Spain important tool in confirming diagnosis and in monitoring MS
3 treatment trials. MR images quantify the lesion load, spatial
Quantitative Imaging Biomarkers in Medicine, QUIBIM S.L,
Valencia, Spain dissemination, and longitudinal evolution over time [2]. Spinal
4 cord evaluation by MR has become an increasingly relevant
Radiology Department, Hospital Universitario y Politécnico
La Fe, Valencia, Spain topic in the diagnosis and management of the disease [3].
5 Initially, MS evaluation focused on demyelinated plaques
Imaging La Fe node at Distributed Network for Biomedical
Imaging (ReDIB) Unique Scientific and Technical in the white matter. Later, investigators realized that lesions
Infrastructures (ICTS), Valencia, Spain are also present in the grey matter, including the cortex, the

13
Vol.:(0123456789)
Journal of Digital Imaging

basal ganglia, and the brain stem, as well as the grey matter algorithms has been proven in many domains, such as com-
of the spinal cord. Moreover, neurodegeneration gives rise puter vision [17]. Furthermore, CNNs are known as the
to axonal loss in the white matter and diffuse changes in the state-of-the-art in biomedical image segmentation, demon-
entire grey matter, affecting the brain and spinal cord in a strating to be highly robust to the variation of image qual-
global sense. These changes will finally result in profound ity characteristics [18]. The main innovation of CNN with
brain tissue loss and atrophy, most pronounced in the pro- respect to traditional machine learning models is their ability
gressive stages of the disease [4]. Several neuropathological to learn which filters to apply to the input images to extract
and MR imaging studies have proved the involvement of the the most appropriate features to solve a specific problem,
spinal cord in MS [5–7]. such as image classification.
Furthermore, neurodegeneration of the spinal cord is In the image segmentation domain, the U-Net architecture
considered to be the main pathological cause of irreversible [19] has meant an important improvement in performance.
locomotor disability [8, 9]. In particular, MR imaging of the The success is built on a down-sampling path, to encode
spinal cord has shown consequential signs of axonal loss input images and get context, and an up-sampling path, to
by quantifying cord atrophy, which is defined as the reduc- decode feature maps for a segmentation map and recover
tion of the cord cross-sectional area correlated with physical spatial information. Recent efforts have attempted to develop
disability. In addition, MR images also show cervical cord fully automated methods with deep learning architectures to
T2-weighted hyperintense areas related to parenchyma inju- obtain an accurate spinal cord segmentation, such as the Spi-
ries in more than 90% of MS patients [10]. nal Cord Toolbox (SCT), which is a comprehensive open-
An effective method in the assessment of MS patients source software, dedicated to the processing of spinal cord
is the MR evaluation of spinal cord atrophy and lesions. MR images [20]. Another existing spinal cord automatic
The evaluation of the cross-sectional area of the spinal cord segmentation method (BASICSeg) from axial T2-weighted
volume and lesions act as imaging biomarkers, evaluating MR imaging is based on U-Net architecture and uses pre-
its structural integrity [11]. However, MR imaging stand- processed data obtained from SCT [21]. These methods
ardization and subsequent analysis are particularly difficult use pre-processing techniques such as data transformation,
[12] due to the presence of varying MR acquisition proto- region cropping, feature extraction, and selection, which
cols, which leads to different images (signal-to-noise ratio, usually results in time-consuming steps through the need
contrast-to-noise ratio, temporal, and spatial resolutions) for user interaction, hindering their acceptance of use as
and the presence of artefacts, such as susceptibility, motion, a medical device and, therefore, their clinical application.
chemical shift, ghosting, blurring and Gibbs artefact. These Different pre-processing techniques are often applied to
factors result in special MR images features that limit their also deal with the unbalanced spinal cord images problem.
use without pre-processing steps and also would affect to In the proposed methodology, we handle this limitation
the generalization of a deep learning model that can work through the CNN design and parameterization, so that we
with a huge variety of MR images, one of our main goals. automatically obtain the whole cervical spinal cord seg-
Also, there is a wide range of spinal cord shapes, lengths, mentation from real-world data, i.e. no pre-processed MR
and pathologies. All these aspects introduce complexity into images, thus avoiding time-consuming techniques which
spinal cord evaluation and segmentation. entails the need of human interaction and enabling the use
Spinal cord localization, atrophy measurement, and of common data acquired in MR systems used in usual hos-
focal areas of inflammatory demyelination detection can pital practice. Nowadays, it is easy to find real-world data-
be depicted by segmentation techniques [13]. Several auto- sets which have irregular class distributions, such as those
matic segmentation methods have been proposed [13–23]. encountered in the MS spinal cord segmentation task; i.e.
Whereas these methods have shown good performance [13], most of the data instances belong to one class and only a few
they often require initial specifications or landmarks and instances belong to others. That is why training CNNs on
are restricted to specific MR image acquisition protocols as very unbalanced datasets is an important research challenge
mentioned previously [14]. Some of their limitations relate [22, 23]. Data with < 1% of spinal cord voxels remains a
to traditional intensity-based deterministic algorithms, continued issue of concern and needs to be addressed [24].
which are limited by factors relating to changing sensitivity To overcome this issue, a residual attention-aware
between MR images [15]. Lesion intensities can be confused mechanism [25] was used, which has become renowned in
with normal structures due to non-optimal image quality the fields of computer vision [26] and image classification
[15]. Moreover, the lack of validation against multicentre [27]. This technique generates attention-aware features that
data or cases with spinal cord damage has limited their change adaptively as layers going deeper and are based
application in large clinical studies [16]. on spatial features to solve the imbalanced data problem.
In recent years, outperformance of convolutional neural Furthermore, the residual units reduce the vanishing gradi-
networks (CNNs) as compared to traditional deterministic ent problem which take place in networks with too many

13
Journal of Digital Imaging

layers, decreasing their performance and stability as the a scientific computer with an Intel i7 processor, working at
value of gradients approaches zero in early layers. 2.2 GHz and 16 Gb of RAM memory. The model training was
Further to a proof-of-concept study developed with a carried out on a Nvidia GeForce GTX 2080 with 8 Gb memory.
preliminary methodology in a reduced cohort [28], in this Early stopping callback of Keras was used to monitor
extended work, we aim to present a novel approach for auto- the validation loss at each epoch during the training process
mated cervical spinal cord segmentation from MR, based on and impose a stop criterion (if the validation loss had not
hybrid residual attention-aware mechanisms that, together improved in the last ten epochs, training was interrupted).
with a focal loss function [29] with the Tversky-index [30] The proposed method was divided into 4 main steps
as main metric, addresses the problems of unbalanced anno- (Fig. 1). After image acquisition, data was resized and nor-
tation and MR imaging pre-processing. We evaluate the per- malized to avoid heterogeneous image intensities. Then, all
formance of this methodology by its application to a dataset the MR images were manually segmented through the delin-
of real-world MR images (3D-T1 weighted images acquired eation of the cervical spinal cord, leading to the ground-truth
in a 3 T system) acquired in patients suffering from MS. dataset. Finally, part of the acquired data was used to train
the designed CNN and validation metrics were obtained in
the rest of the dataset (validation dataset).
Materials and Methods
A) Datasets
All models and methods were implemented in Python 3.7
and Anaconda framework using Keras (v2.2.4) and Tensor-   The ethics committee approved the observational
Flow (v1.14.0) libraries. Model training was carried out by non-interventional study for which the dataset was col-

Fig. 1  Pipeline designed to build the proposed automatic cervical generation (step 3), to finally train and validate a CNN-based archi-
spinal cord segmentation. Going from the dataset generation (step 1), tecture (step 4)
going through the data normalization (step 2) and expert ground-truth

13
Journal of Digital Imaging

lected. The informed consent was waived. Anonymised nearest-neighbour interpolation. Manual labelling was
MR exams of 121 MS patients from the hospital Picture performed slice by slice by two radiologists (with 12
Archiving and Communication System (PACS) were and 30 years of experience, respectively) using the ITK-
collected and used for the development and validation SNAP tool [33]. Discrepancies were solved by consen-
of the proposed algorithm. sus. The resulting cervical spinal cord mask was consid-
  The population series consisted of 34 men and 87 ered as being a ground-truth.
women, with a mean age of 45 years old (the range being
23 to 74). In terms of MS clinical course, 70% of the D) Cervical Spinal Cord Segmentation Network Training
patients were diagnosed as relapsing–remitting (RR), 16% and Hyperparameter Tuning
as secondary-progressive (SP), 8% as primary-progressive
(PP), 5% as clinically isolated syndrome (CIS), and 1% as To reach our goal of obtaining the desired cervical spinal
progressive-relapsing (1%). We do not know a priori the cord segmentation, a 2D residual attention-aware U-Net,
number of patients with presence or not of lesions. based on a residual attention mechanism and U-Net connec-
  3D T1-weighted gradient echo, axial MR images (TE/ tions, was used as architecture. The residual block enables
TR/FA = 1.7–2.7 ms/5.6–8.2 ms/12°) were acquired on a deeper network by having hundreds of layers, whereas the
various 3 T scanners (Signa HDxt and Optima MR360, attention mechanism learns to focus on relevant locations.
GE Healthcare). All MR images had 1-mm slice Regarding training design, it was made with an automatic
thickness and an in-plane isotropic resolution from optimal learning rate finder, a focal loss function based on
0.43 × 0.43 to 1.02 × 1.02 mm. the Tversky-index (FTL) [29] to address the problem of label
  This dataset was randomly divided into two different imbalance and the Matthews correlation coefficient (MCC)
groups, one for training (80%, 96 cases, 15,668 images) as metrics during training [34].
and the other one for validation (20%, 25 cases, 4,260 Finally, to evaluate the performance of both the repro-
images), following the traditional distribution based on ducibility of manual segmentations and the spatial overlap
the Pareto principle [31]. accuracy of automatic segmentation, the DICE coefficient
  We selected MR images acquired in routine clini- [35] was used as statistical validation metric.
cal practice for reproducibility and feasibility consid-
erations. Cervical spinal cord region is selected for our
analysis since accumulating evidence supports routine Residual Attention‑Aware U‑net architecture
measurement of cervical spine (C-spine) area loss in MS
[32] We are aware that better cervical spinal cord seg- To solve the high-class imbalance problem (proportion of
mentations can be obtained through very high-resolution spinal cord compared to the rest of the image), and due
images, movement synchronism and higher acquisition to its ability to improve the model performance, a “very
times not used in most clinical centres. deep” architecture was selected for the cervical spinal cord
segmentation.
B) Data Normalization In the selected network, residual blocks were used in the
different U-net layers, except the first and the last layers, to
  All MR 2D images were resized to a common shape solve the gradient vanishing problem by using identity map-
of 256 × 256 pixels using a bicubic algorithm interpola- pings as the skip connections (Fig. 2). Because the residual
tion over 4 × 4 pixels. To avoid inconsistent tissue inten- units directly propagate features from early to late convolu-
sities in the acquisition process, image normalization at tion, an improvement was noted in the model performance.
slice level was used to set the pixel values between 0 and The stacked residual blocks were introduced to mitigate
1, as in (1): the effect of network degradation on model learning by using
identity mappings as the skip connections.
z(i, j) = (x(i, j) − xmin )∕(xmax − xmin ) (1) The residual unit was made up of a block that contains a
where z(i, j) was the normalized pixel value, x(i, j) the batch normalization layer (BN), an activation (ReLU) layer,
original pixel intensity, and xmin , xmax the minimum and and a convolutional layer, repeated three times. The detailed
maximum image intensity values, respectively. residual unit is shown in Fig. 3.
The batch normalization was employed for each convolu-
C) Cervical Spinal Cord Ground-Truth Generation tional layer and yields the mean and standard deviation to nor-
malize the activation of current layer before advancing to the
  To harmonize the image size, ground-truth images next layer. A convolutional identity mapping connection was
were also resized to a shape of 256 × 256 pixels using a used to ensure the accuracy as the network goes “deeper” [36].

13
Journal of Digital Imaging

Fig. 2  Overall network architecture based on residual blocks and attention gates. Residual blocks allow a deeper network by solving the vanish-
ing gradient problem. Attention mechanisms solve the unbalanced data problem by learning to focus on locations that are relevant

Residual attention learning divided the attention module unbalanced data and small regions of interest (ROIs) such as
into a trunk branch, used to process the original features, the spinal cord, FN detections need to be weighted higher than
and a soft branch, used to construct the identity mapping, FPs to improve recall rate. This was addressed by the focal
enhancing the good features, and reducing the noise from Tversky loss index, which controls the FP and FN and controls
the trunk branch. the loss contribution in training examples, with difficult ROI
Applied to our domain, by using this attention residual detection.
mechanism, the original feature information was preserved 1
through the trunk branch and focused on cervical spinal cord

FTLc = (1 − TIc ) 𝛾
relevant locations by the soft mask branch. c

where 𝛾 [1, 3] exponentiates the loss to focus on hard classes


Focal Tversky Loss detected with lower probability and TIc is the Tversky simi-
larity index.
The classic segmentation losses (such as DICE loss) tend to
be less accurate when the training dataset is unbalanced. This Matthews Correlation Coefficient
is because the DICE loss function equally weights false posi-
tives (FP) and false negatives (FN), leading to segmentation The Matthews correlation coefficient (MCC) is a measure
maps with high precision but low recall [36]. With highly unaffected by the unbalanced datasets issue which produces a
high score only if the prediction provided good results in all of
the four confusion matrix categories (true positives (TP), false
negatives (FN), true negatives (TN), and false positives (FP)),
proportionally both to the size of positive elements and the
size of negative elements in the dataset. MCC reads as follows:
TP ⋅ TN − FP ⋅ FN
MCC = √
(TP + FP) ⋅ (TP + FN) ⋅ (TN + FP) ⋅ (TN + FN)

Learning Rate Finder

A cyclical learning rate (CLR) method [37] was used, which


aims to train neural network with a learning rate that changes
Fig. 3  Residual block design. The identity mapping is performed as
the skip connection, the output of the identity mapping will be added
in a cyclical way for each batch, instead of in a monotonic
to the output of the stacked layers way. The minimum and maximum boundaries were set to

13
Journal of Digital Imaging

Fig. 4  Triangular learning rate policy. Orange line: learning rate value
changing between bounds. Step size is the number of iterations in half
a cycle

baselr and maxlr (Fig. 4). Keras learning rate finder technique
was used to automatically find a suitable maxlr and baselr for
cyclical learning rate scheduling and apply it to our experi-
ments (Fig. 5).
Fig. 6  Loss over learning rate to identify optimal CLR bounds, maxlr
We started by training during 2 to 5 epochs to see the :loss starts to increase, baselr : loss starts to decrease
overall behaviour of our model loss function over time. The
lower boundary was set to a small value (1e−10 ) to let the
network learn while the upper boundary was defined with Results
a too large value (1e1 ) for our model to learn. After training
completion, a smoothed loss over time was plotted (Fig. 5), An initial cervical spinal cord segmentation was per-
enabling us to define when the learning rate was just large formed, applying residual attention-aware U-net archi-
enough for loss to decrease (optimal lower bound) and then tecture with focal Tversky as loss function and setting
too large, to the point where loss starts to increase (optimal the learning rate with the CLR method while training.
upper bound). Finally, the network was fully trained along Our model provided an automated and accurate method,
all the desired epochs. achieving a MCC of 0.95 as training metric and a DICE
In Fig. 6, we can see the plot of the overall behaviour of coefficient of 0.9 as statistical validation metric. The
the loss function during 2 epochs, which led us to determine model was trained using a batch size of 15 along 30
the optimal CLR bounds for the whole training. In our case, epochs, defined with an Early Stopping callback (the train-
maxlr = 1e−3 (loss started to increase) and baselr = 1e−5 ing stops after 10 epochs of not training loss improve-
(loss started to decrease). ment). Figure 8 shows how the MCC and loss obtained
Once the final CLR bounds were determined, the net- evolved over the different training epochs on the validation
work was trained from scratch varying the learning rate in a set of our final model.
linear manner between the minimum (baselr ) and the maxi- The trained method was validated on the 20% of the
mum ( maxlr ) (Fig. 7). In this way, we were able to reduce dataset described above (validation dataset), where the
the work in setting learning rates and improve the network target region class is significantly smaller than the back-
performance. ground class.

Fig. 5  Learning rate finder,


basic procedure

13
Journal of Digital Imaging

Cyclic Learning Rate (CLR) a slight improvement over those trained with U-net, since
0,0012 deep networks have a higher learning capability. Finally, the
0,001
results show that residual attention-aware U-net architec-
ture with a CLR finder has the best performance, i.e., high
0,0008
DICE coefficient, low standard deviation (STD), and high,
Learning rate

0,0006 similar percentile values. In Fig. 9, the validation dataset


0,0004 DICE distribution histogram of the first (*LH) and the final
0,0002
(*RH) trained models are shown. Figure 9 shows that with
a traditional U-Net architecture a wider DICE histogram is
0
0 5000 10000 15000 20000 25000 30000
generated, which means that lower DICE values are obtained
Training iteraons in a high number of samples. However, the DICE values
obtained with the proposed architecture and configuration
Fig. 7  Learning rate varying between CLR bounds. “Training itera- are associated with a narrower histogram with almost all the
tions” label is the number of weight updates per epoch, i.e., the num- values concentrated over a DICE value of 0.8.
ber of total training examples divided by the batch size DICE values obtained along the different training epochs
are represented in Fig. 10 comparing the results obtained
To reach the final model, we did several tests by studying using a CLR or a fixed learning rate of 1e−4 (a common
a large amount of variation within U-net, different learn- value chosen in the literature). It is possible to realize how
ing rate methods and multiple loss functions. We decided to by using a CLR, higher accuracy was achieved with fewer
apply the cyclic learning rate method in those models that experiments and limited hyperparameter tuning. In our case,
showed a better performance with a fixed learning rate and by training with CLR a higher and earlier (around 20 epochs
thus try to achieve an improvement on them. before) DICE were obtained when compared to the fixed
Table 1 shows the comparison of our proposed method learning rate.
with other state-of-the-art CNN configurations trained with The performance of the developed model over a ran-
our training dataset, in terms of the median DICE coefficient domly selected axial slice is shown in Fig. 11. Figure 11a
and their statistical parameters obtained with the data dedi- shows the real-world MR image that is given as an input to
cated to validation. the network. The red box indicates the part of the 2D axial
Models trained with the U-net algorithm generally per- image that is zoomed in the remaining images, to observe
formed poorly, even though Tversky loss and focal Tver- each detail clearly. Ground-truth (manually labelled mask)
sky loss were used as loss functions. Using RA-U-net as over real-world data can be depicted in Fig. 11b, whereas
CNN architecture offered an improved performance. The Fig. 11c presents the mask automatically predicted by the
model obtained with the deep U-net algorithm also showed proposed model. To better compare these two masks, the

Fig. 8  Validation loss and accu- Validaon loss/ MCC


racy evolution over the whole
1
training process

0,8

0,6
Loss/MCC

0,4

val_mahews_correlaon val_loss

0,2

0
0 5 10 15 20 25 30
Epoch

13
Journal of Digital Imaging

Table 1  Results and comparison between model and loss function schemes on validation dataset
Model Loss function Parameters Learning rate Mean DICE ± STD MCC P25 Median (P50) P75

U-net *LH TL α = 0.4, β = 0.6 1e−5 0.66 ± 0.24 0.68 0.57 0.75 0.84
U-net TL α = 0.3, β = 0.7 1e −5 0.68 ± 0.24 0.71 0.61 0.77 0.85
U-net FTL γ = 1.33, α = 0.7, β = 0.3 1e−5 0.62 ± 0.25 0.64 0.53 0.71 0.81
U-net FTL γ = 1.33, α = 0.6, β = 0.4 1e−5 0.68 ± 0.27 0.70 0.62 0.80 0.87
U-net FTL γ = 1.6, α = 0.6, β = 0.4 1e−5 0.74 ± 0.23 0.77 0.70 0.82 0.88
RA-U-net FTL γ = 1.33, α = 0.6, β = 0.4 1e −5 0.86 ± 0.11 0.89 0.85 0.89 0.92
RA-U-net FTL γ = 1.6, α = 0.6, β = 0.4 1e−4 0.80 ± 0.22 0.84 0.82 0.87 0.90
Deep U-net FTL γ = 1.33, α = 0.6, β = 0.4 CLR 0.79 ± 0.25 0.83 0.82 0.89 0.92
(maxlr = 1e−2 , baselr = 1e−4
RA-U-net **RH FTL γ = 1.33, α = 0.6, β = 0.4 CLR 0.904 ± 0.101 0.95 0.90 0.92 0.94
(maxlr = 1e−3 , baselr = 1e−5

Column “Loss function”: Tversky loss (TL), focal Tversky loss (FTL). Column “Parameters” refers to loss function data. Column “Models”:
U-net, RA-U-net, deep U-net [38]. Column “MCC”: Matthews correlation coefficient, final training metric. Column “P25”: 25th percentile. Col-
umn “Median (P50)”: median DICE and 50th percentile. Column “P75”: 75th percentile
*LH left histogram, **RH right histogram

overlap of both is shown in Fig. 11d. The red voxels identi- Fig. 12 where a segmentation error example is shown. In
fied the mismatched area (pixels that are only present in one 3D and sagittal views, we can see two inhomogeneities
of the masks) and the green voxels indicate the matching that are surrounded in yellow. These segmentation failures,
area (pixels that belong to both the manual and predicted which appear as two spikes toward the end of the cervi-
mask). cal spinal cord, are caused by the settings in slices 35,
According to Fig. 11, only one random example from 36, and 37. In these images, we can see that the cervical
more than 2000 axial images that make up the validation spinal cord portion is very close to another anatomical
dataset is shown. It is observed that our model achieves structure of similar intensity that seem to be practically
a successful automatic prediction which hardly differs connected. That is why the prediction mask is bigger than
from that performed by the expert radiologist. Despite those obtained for slices 34, 38, and 39 (where the cervical
good overall performance, there are some situations that spinal cord region is more separated from that anatomical
show that our automatic segmentation tool can make some structure) and, therefore, the segmentation mask is fitted to
prediction mistakes. One example of that can be seen on the cervical spinal cord region and there is no confusion.

Fig. 9  DICE distribution histogram. a First model distribution, scattered distribution. b Final model distribution, narrow distribution

13
Journal of Digital Imaging

gradient echo MR axial images. Due to the huge variety of


MR scanners and the absence of standard acquisition MR
images in the real-world data, MR images of MS patients
differ as they cover different spinal cord regions with dif-
ferent tissue intensities, different fields of view, different
pixel spacing, and signal intensity. The major proposal of
this work is to handle the cervical spinal cord segmentation
by the CNN design and parameterization, avoiding pre-
processing MR images techniques that require human inter-
action, so that an automatic segmentation can be obtained
from common data acquired in MR systems used in usual
hospital practice. The cervical spinal cord manual segmen-
tation involves a time-consuming task for radiologists, so
it will be of useful assistance for them and it will be an
important prior step for medical analysis issues like the
Fig. 10  DICE evolution during training epochs using a CLR and a cervical spinal cord atrophy measurement by calculating
fixed learning rate of 1­ e−4. With CLR, a higher DICE is achieved in a
the volumetry, as it is proved that a volume reduction has
lower number of epochs
direct implications with the atrophy.
Discussion This tool was developed thinking about solving some
deficiencies that other models present, such as the SCT. This
In the present study, we obtained an accurate, automatic method is based on a sequence of two CNNs and achieved
cervical spinal cord segmentation from 3D T1-weighted a median DICE coefficient of 0.95. The first CNN with

Fig. 11  a Real-world MR


data, b ground-truth, c CNN-
predicted mask, d ground-truth
over predicted mask (green-
matched area, red-mismatched
area)

13
Journal of Digital Imaging

Fig. 12  Poor segmentation performance setting. Slices 35, 36, and 37 and 39 show a well-delimited cervical spinal cord region which leads
show the cervical spinal cord very close to another anatomical struc- us to a correct segmentation mask, and these correspond to the adja-
ture that could cause oversized prediction masks and is the last inho- cent peak areas
mogeneity that is surrounded in 3D and sagittal views. Slices 34, 38,

2D-dilated convolutions detects the spinal cord centreline (when carrying out back-propagation) and the degradation
being necessary human interaction to initialize the spinal issue, which results in poor training results. The residual
cord location (e.g., three points in the spinal cord) for an blocks were stacked in the U-net layers, except the first and
optimal performance and also carrying out the next CNN last layers, to solve the vanishing gradient problem by allow-
for spinal cord segmentation in a framework that is more ing the network to have hundreds of layers. The attention
sensitive to the quality of that detection module. mechanism learns to focus on locations that are relevant by
There is another method that depends on the previous allowing the retention of the original feature information
one, for automatic segmentation of the spinal cord after trau- through the trunk branch and paying attention to those spi-
matic contusion injury from axial T2-weighted MR imag- nal cord regions by the soft mask branch. The training was
ing and whose DICE coefficient is 0.93. Firstly, the optic designed with a focal loss function, based on the Tversky
algorithm [39] is used from the SCT to first detect the spinal index, to address the problem of label imbalance in medical
cord centreline and create a square mask around it. Secondly, image segmentation and an automatic optimal learning rate
three different novel 2D-CNN architectures are applied to finder. Finally, a DICE coefficient of 0.904 was achieved.
achieve the segmentation. Moving away from our method As we mentioned during the paper, the usefulness of this
generalization purpose, this study cohort is derived from a model, and as future work in development, this segmentation
single institution with all the imaging being performed on a algorithm will allow the quantification of the cervical spinal
single MR imaging scanner using similar parameters, poten- cord volume and, therefore, atrophy assessment through the
tially biasing the results. Furthermore, the pre-processing comparison of the obtained volume against that associated
step of cropping the image around the spinal cord centreline to a healthy population. In addition, this model will help to
depends on a third-party tool. develop an aggregated lesion detection model from STIR
In our study, overcoming what we understand as other image sequences which will be focused on the lesion locali-
methods limitations and with the aim of avoiding any type of zation on the cervical spine region, since it would be used as
pre-processing techniques and reducing human interaction, first step by obtaining the cervical spinal cord location in the
a 2D residual attention-aware U-Net, based on a residual corresponding STIR space after applying image registration
attention mechanism and U-Net connections, was imple- algorithms between both sequences.
mented to obtain the desired spinal cord segmentation. We Limitations detected during our study will be investi-
selected a deep architecture since it is proved that deeper gated in future works. Some of them primarily relate to the
networks have a higher learning capacity, although they relatively small sample size of patients. Consequently, to
show some drawbacks like the gradient vanishing problem proceed with a K-fold cross-validation scheme would have

13
Journal of Digital Imaging

been reasonable as it ensures that every observation from our Funding This work was funded by a Generalitat Valenciana PhD fel-
limited dataset has the chance of appearing in training and lowship (grant number ACIF/2017/057) and the Universitat Politècnica
de València and Polytechnic La Fe Hospital research project Deep-
validation set. A performance comparison between exist- Medul (grant number 2018/0274) (Deep Learning for spinal cord seg-
ing tools would better clarify the quality of our proposed mentation in Multiple Sclerosis).
model compared to others although this can sometimes
be complicated since not all of them work with the same Data Availability Not applicable.
MR image types. The need for external validation is one of
them; our method should be externally validated in other Code Availability Not applicable.
centres using different MR scanners from different vendors,
magnetic fields, and acquisition parameters. In this way, we Declarations
will be able to evaluate the interoperability of our method.
Ethics Approval This research study was conducted retrospectively
Increasing patient numbers will improve our database and from data obtained for observational purposes. An official waiver of
lead us to a greater number of training samples, therefore, ethical approval was granted from GIBI230 − IISLAFE of the Hos-
achieving fuller training and better performance. pital Universitari I Politècnic of Valencia.

Consent to Participate Not applicable. It is not necessary to obtain


consent because of MR images without identifying information in this
study.
Conclusion
Consent for Publication Not applicable. It is not necessary to obtain
consent because of MR images without identifying information in this
An optimal automated method for cervical spinal cord seg- study.
mentation in real-world MRI of MS patients was imple-
mented by optimized hybrid residual attention-aware CNN Conflict of Interest The authors declare no competing interests.
with focal Tversky as loss function and CLR method to
set, change and tweak the learning rate during training. A
dataset consisting of 3D T1-weighted images from 121 MS
patients was used to train and validate the algorithm. Our
References
method can address the class imbalance problem without 1. Matthews PM, De Stefano N, Narayanan S, et al (1998) Putting
pre-processing techniques, so by using directly real-world magnetic resonance spectroscopy studies in context: Axonal
data, a better clinical integration can be achieved. Lastly, a damage and disability in multiple sclerosis. Semin. Neurol.
comparison of our architecture in combination with several 18:327–336
2. Magraner MJ, Bosca I, Simó-Castelló M, et al (2012) Brain atro-
loss functions and learning rate methods and values is pre- phy and lesion load are related to CSF lipid-specific IgM oligo-
sented. The main contributions of this proposed method are clonal bands in clinically isolated syndromes. In: Neuroradiology.
the lack of human interaction, the use of real-world data to pp 5–12
approach implementable solutions in usual clinical practice, 3. Cordovez MJ, Gálvez GM, Rojas CG, et al (2013) Uso de volumetría
y carga lesional en el seguimiento de pacientes con esclerosis múlti-
the minimization of the unbalanced data problem without ple. Experiencia local y revisión de la literature. Rev Chil Radiol
pre-processing techniques, and the time optimisation due to 19:156–164. https://​doi.​org/​10.​4067/​S0717-​93082​01300​04000​04
the CLR method. 4. Lassmann H (2018) Multiple sclerosis pathology. Cold Spring
The proposed automatic segmentation model will have Harb. Perspect. Med. 8:a028936
5. Bakshi R, Dandamudi VSR, Neema M, et al (2005) Measurement
direct implications for accelerating the MS phenotyping, of brain and spinal cord atrophy by magnetic resonance imaging
follow-up, and extraction of imaging biomarkers, since we as a tool to monitor multiple sclerosis. J. Neuroimaging 15
achieve a time optimisation by subtracting effort from the 6. Bjartmar C, Kidd G, Mörk S, et al (2000) Neurological disabil-
segmentation task and, moreover, increasing resources in the ity correlates with spinal cord axonal loss and reduced N-acetyl
aspartate in chronic multiple sclerosis patients. Ann Neurol
disease’s evolution monitoring in order to define the factors 48:893–901. https://​doi.​org/​10.​1002/​1531-​8249(200012)​48:​6<​
behind the increase of irreversible disability, the principal 893::​AID-​ANA10​>3.​0.​CO;2-B
MS symptom. These implications are addressed in subse- 7. Trapp BD, Ransohoff RM, Fisher E, Rudick RA (1999) Neurodegenera-
quent studies, since the purpose of this paper is to reflect tion in Multiple Sclerosis: Relationship to Neurological Disability.
Neurosci 5:48–57. https://​doi.​org/​10.​1177/​10738​58499​00500​107
the first segmentation method necessary to develop them. 8. Cohen AB, Neema M, Arora A, et al (2012) The Relationships
among MRI-Defined Spinal Cord Involvement, Brain Involve-
Supplementary Information The online version contains supplemen- ment, and Disability in Multiple Sclerosis. J Neuroimaging
tary material available at https://d​ oi.o​ rg/1​ 0.1​ 007/s​ 10278-0​ 22-0​ 0637-4. 22:122–128. https://​doi.​org/​10.​1111/j.​1552-​6569.​2011.​00589.x
9. Lundell H, Svolgaard O, Dogonowski AM, et al (2017) Spinal
Author Contribution Not applicable. cord atrophy in anterior-posterior direction reflects impairment

13
Journal of Digital Imaging

in multiple sclerosis. Acta Neurol Scand 136:330–337. https://​ 27. Huang P, Wang J, Zhang J, et al (2020) Attention-aware Residual
doi.​org/​10.​1111/​ane.​12729 Network based Manifold Learning for White Blood Cells Clas-
10. Valsasina P, Rocca MA, Horsfield MA, et al (2013) Regional cervical sification. IEEE J Biomed Heal Informatics 1–1. https://​doi.​org/​
cord atrophy and disability in multiple sclerosis: A voxel-based analy- 10.​1109/​jbhi.​2020.​30127​11
sis. Radiology 266:853–861. https://​doi.​org/​10.​1148/​radiol.​12120​813 28. Bueno Gómez A, Alberich-Bayarri A, Bosch I, Carreres Polo J
11. Filippi M, Agosta F (2010) Imaging biomarkers in multiple scle- (2021) Automatic MR Spinal Cord Segmentation by Hybrid Residual
rosis. J. Magn. Reson. Imaging 31:770–788 Attention-Aware Convolutional Neural Networks and Learning Rate
12. Yiannakas MC, Mustafa AM, De Leener B, et al (2016) Fully Optimization on Real World Data. In: IFMBE Proceedings. Springer
automated segmentation of the cervical cord from T1-weighted Science and Business Media Deutschland GmbH, pp 158–168
MRI using PropSeg: Application to multiple sclerosis. NeuroIm- 29. Abraham N, Khan NM (2018) A Novel Focal Tversky loss func-
age Clin 10:71–77. https://​doi.​org/​10.​1016/j.​nicl.​2015.​11.​001 tion with improved Attention U-Net for lesion segmentation. Proc
13. De Leener B, Taso M, Cohen-Adad J, Callot V (2016) Segmenta- - Int Symp Biomed Imaging 2019-April:683–687
tion of the human spinal cord. Magn. Reson. Mater. Physics, Biol. 30. Salehi SSM, Erdogmus D, Gholipour A (2017) Tversky loss func-
Med. 29:125–153 tion for image segmentation using 3D fully convolutional deep
14. Horsfield MA, Sala S, Neema M, et al (2010) Rapid semi-automatic networks. In: Lecture Notes in Computer Science (including sub-
segmentation of the spinal cord from magnetic resonance images: series Lecture Notes in Artificial Intelligence and Lecture Notes
Application in multiple sclerosis. Neuroimage 50:446–455. https://​ in Bioinformatics). Springer Verlag, pp 379–387
doi.​org/​10.​1016/j.​neuro​image.​2009.​12.​121 31. The Pareto Principle | Dunford | The Plymouth Student Scien-
15. Chen M, Carass A, Oh J, et al (2013) Automatic magnetic reso- tist. https://​bcur.​org/​journ​als/​index.​php/​TPSS/​artic​le/​view/​408.
nance spinal cord segmentation with topology constraints for vari- Accessed 9 Dec 2019
able fields of view. Neuroimage 83:1051–1062. https://d​ oi.o​ rg/1​ 0.​ 32. Mina Y, Azodi S, Dubuche T, et al (2021) Cervical and thoracic
1016/j.​neuro​image.​2013.​07.​060 cord atrophy in multiple sclerosis phenotypes: Quantification and
16. Jovicich J, Czanner S, Greve D, et al (2006) Reliability in multi- correlation with clinical disability. NeuroImage Clin 30:102680.
site structural MRI studies: Effects of gradient non-linearity cor- https://​doi.​org/​10.​1016/J.​NICL.​2021.​102680
rection on phantom and human data. Neuroimage 30:436–443. 33. Yushkevich PA, Gerig G (2017) ITK-SNAP: An Intractive Medi-
https://​doi.​org/​10.​1016/j.​neuro​image.​2005.​09.​046 cal Image Segmentation Tool to Meet the Need for Expert-Guided
17. Szlávik Z, Szirányi T (2004) Face Analysis Using CNN-UM Segmentation of Complex Medical Images. IEEE Pulse 8:54–57.
18. Litjens G, Kooi T, Bejnordi BE, et al (2017) A survey on deep https://​doi.​org/​10.​1109/​MPUL.​2017.​27014​93
learning in medical image analysis. Med. Image Anal. 42:60–88 34. Chicco D, Jurman G The advantages of the Matthews correlation
19. Ronneberger O, Fischer P, Brox T (2015) U-net: Convolutional networks coefficient (MCC) over F1 score and accuracy in binary classifica-
for biomedical image segmentation. In: Lecture Notes in Computer Sci- tion evaluation. https://​doi.​org/​10.​1186/​s12864-​019-​6413-7
ence (including subseries Lecture Notes in Artificial Intelligence and 35. Eelbode T, Bertels J, Berman M, et al (2020) Optimization for
Lecture Notes in Bioinformatics). Springer Verlag, pp 234–241 Medical Image Segmentation: Theory and Practice When Evaluat-
20. De Leener B, Lévy S, Dupont SM, et al (2017) SCT: Spinal Cord ing With Dice Score or Jaccard Index. IEEE Trans Med Imaging
Toolbox, an open-source software for processing spinal cord MRI 39:3679–3690. https://​doi.​org/​10.​1109/​TMI.​2020.​30024​17
data. Neuroimage 145:24–43. https://​doi.​org/​10.​1016/j.​neuro​image.​ 36. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for
2016.​10.​009 image recognition. In: Proceedings of the IEEE Computer Society
21. McCoy DB, Dupont SM, Gros C, et al (2019) Convolutional neu- Conference on Computer Vision and Pattern Recognition. IEEE
ral network–based automated segmentation of the spinal cord and Computer Society, pp 770–778
contusion injury: Deep learning biomarker correlates of motor 37. Smith LN (2017) Cyclical learning rates for training neural net-
impairment in acute spinal cord injury. Am J Neuroradiol 40:737– works. In: Proceedings - 2017 IEEE Winter Conference on Appli-
744. https://​doi.​org/​10.​3174/​ajnr.​A6020 cations of Computer Vision, WACV 2017. Institute of Electrical
22. Lin T-Y, Goyal P, Girshick R, et al (2017) Focal Loss for Dense and Electronics Engineers Inc., pp 464–472
Object Detection 38. Li R, Liu W, Yang L, et al (2017) DeepUNet: A Deep Fully Con-
23. Montahaei E, Ghorbani M, Baghshah MS, Rabiee HR (2018) volutional Network for Pixel-level Sea-Land Segmentation. IEEE
Adversarial Classifier for Imbalanced Problems J Sel Top Appl Earth Obs Remote Sens 11:3954–3962
24. Buda M, Maki A, Mazurowski MA (2017) A systematic study of 39. Gros C, De Leener B, Dupont SM, et al (2018) Automatic spinal
the class imbalance problem in convolutional neural networks. cord localization, robust to MRI contrasts using global curve opti-
https://​doi.​org/​10.​1016/j.​neunet.​2018.​07.​011 mization. Med Image Anal 44:215–227. https://d​ oi.o​ rg/1​ 0.1​ 016/j.​
25. Wang F, Jiang M, Qian C, et al (2017) Residual attention network media.​2017.​12.​001
for image classification. In: Proceedings - 30th IEEE Conference
on Computer Vision and Pattern Recognition, CVPR 2017. Insti- Publisher's Note Springer Nature remains neutral with regard to
tute of Electrical and Electronics Engineers Inc., pp 6450–6458 jurisdictional claims in published maps and institutional affiliations.
26. Gao L, Li Y, Ning J (2019) Residual attention convolutional net-
work for online visual tracking. IEEE Access 7:94097–94105.
https://​doi.​org/​10.​1109/​ACCESS.​2019.​29277​91

13

You might also like