ARTICLE IN PRESS
Image and Vision Computing xxx (2007) xxx–xxx
www.elsevier.com/locate/imavis
Watershed segmentation using prior shape and appearance knowledge
Ghassan Hamarneh *, Xiaoxing Li
School of Computing Science, Simon Fraser University, Burnaby, BC, Canada V5A 1S6
Received 20 November 2005; received in revised form 29 August 2006; accepted 20 October 2006
Abstract
Watershed transformation is a common technique for image segmentation. However, its use for automatic medical image segmentation has been limited particularly due to oversegmentation and sensitivity to noise. Employing prior shape knowledge has demonstrated
robust improvements to medical image segmentation algorithms. We propose a novel method for enhancing watershed segmentation by
utilizing prior shape and appearance knowledge. Our method iteratively aligns a shape histogram with the result of an improved k-means
clustering algorithm of the watershed segments. Quantitative validation of magnetic resonance imaging segmentation results supports the
robust nature of our method.
2007 Elsevier B.V. All rights reserved.
Keywords: Watershed transformation; Prior shape knowledge; Segmentation; k-Means clustering
1. Introduction
The concept of watersheds is well known in topography.
It was first proposed as a potential method for image segmentation in [1]. In [2], watershed transformation was simulated based on an immersion process, enabling an increase
in speed and accuracy. Parallel watershed segmentation was
later developed [3], offering clear partitions within images.
Watershed transformation has increasingly been recognized as a powerful segmentation process due to its many
advantages [4], including simplicity, speed and complete
division of the image. Even with target regions having low
contrast and weak boundaries, watershed transformation
can always provide closed contours. In addition, watersheds
typically occur at the most obvious contours of the object,
even when oversegmentation is severe. This positioning of
contours can offer a stable and accurate initialization for
other post-processing and segmentation techniques. A
detailed review of algorithms that make use of the watershed
transformation for image segmentation can be found in [5].
*
Corresponding author. Tel.: +1 604 291 3007; fax: +1 604 291 3045.
E-mail addresses: hamarneh@cs.sfu.ca (G. Hamarneh), xli1@cs.sfu.ca
(X. Li).
Oversegmentation and sensitivity to noise continue to
plague watershed transformation with respect to medical
image data. Typically, the gradient magnitude of the original image is computed before the watershed transformation
is applied. Fluctuations in the gradient magnitude image, as
well as negative impulse noise being regarded as a local minimum, can result in undesired additional watershed segments. Several methods have been proposed to overcome
these drawbacks. Among the most notable is the use of
region markers [6–8], in which certain desired local minima
are selected as markers, then geodesic reconstruction is
applied to fill the other minima to non-minimum plateaus.
However, automatic marker selection is a difficult process
and in most applications human interaction is required. In
addition, if inappropriate markers are selected, geodesic
reconstruction can cause important image features to be
overlooked, thus dramatically changing the final result.
It has been demonstrated in several medical image segmentation techniques that improved convergence and
robustness can be obtained when prior shape knowledge
is utilized. In the seminal work of Cootes et al. [9], a
training set of shapes represented by landmarks were used
to build a point distribution model, which is used to constrain the segmented shapes to lie within an allowable
0262-8856/$ - see front matter 2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.imavis.2006.10.009
Please cite this article in press as: G. Hamarneh, X. Li, Watershed segmentation using prior shape and appearance knowledge, Image
Vis. Comput. (2007), doi:10.1016/j.imavis.2006.10.009
ARTICLE IN PRESS
2
G. Hamarneh, X. Li / Image and Vision Computing xxx (2007) xxx–xxx
Fig. 1. The training cardiac image data [16,17]. (a) A selection of the grayscale MR images in the training set. (b) The expert-segmented binary masks of
the cardiac muscle of each of the images in (a). (c) The shape histograms, constructed from the aligned binary images less the one belonging to the
corresponding image in (a). This example is presented for illustration purposes in order to clarify the steps of the algorithm. Refer to Section 3 for more
elaborate results and validation.
shape domain. In [12], prior shape models were constructed
for level set based shape representations through the use of
the signed distance transform. Prior shape knowledge has
also been encoded through capturing the variability of
Fourier-based shape descriptors in a deformable model
framework [13,14]. In [10], the geometrical and physical
layers of classical deformable models were complemented
with behavioral and cognitive layers that facilitate encoding more complex prior shape and contextual knowledge.
The major detraction toward using prior shape and
appearance models is the need to label a training set of
images. The set should be large enough to encompass the
typical shape and appearance variations one wishes to capture and model. Nevertheless, knowledge about typical
shape variations of a structure of interest in medical images
can be made available to support the incorporation of prior
knowledge. Clearly, by using prior shape models one sacrifices the generality of the method in order to achieve
improved robustness only for a specific class of object
shapes. So although prior shape models ensure plausible
shapes, they may produce incorrect results if the novel data
is too different from the training set.
In this paper we propose a novel method for incorporating prior shape and appearance knowledge into the
watershed segmentation technique.1 The method is composed of a training stage and a segmentation stage. In
1
This work was first presented at the Canadian Conference on
Computer and Robot Vision [15].
the training stage, a ‘shape histogram’ and image intensity
statistics are used to model prior shape and appearance
knowledge, respectively. The segmentation stage is an
automatic iterative procedure and consists of four steps:
classical watershed transformation, improved k-means
clustering, shape alignment, and refinement.
The rest of the paper is organized as follows. Using
short-axis, cardiac MR image data as an example, Section
2 discusses the details of the algorithm including modeling
prior knowledge (Section 2.1), watershed segmentation
(Section 2.2), improved k-means clustering (Section 2.3),
current estimate alignment (Section 2.4) and refinement
(Section 2.5). In Section 3, we apply our method to segmenting the corpora callosa from a set of 51 brain MR
images and provide numerical validation and discussion
of the results. In Section 4, we conclude with a summary
of our work and possible future directions.
2. Methods
2.1. Modeling shape and appearance knowledge
A training set of gray level images containing the object
of interest is collected. In addition, the training set includes
data representing expert segmentation of the images. This
data is used to form binary images directly relating to the
shape of the object of interest. Fig. 1b shows an example
subset of binary images, created from the expert segmentation of the cardiac left ventricle wall from axial MRI slices
Please cite this article in press as: G. Hamarneh, X. Li, Watershed segmentation using prior shape and appearance knowledge, Image
Vis. Comput. (2007), doi:10.1016/j.imavis.2006.10.009
ARTICLE IN PRESS
3
G. Hamarneh, X. Li / Image and Vision Computing xxx (2007) xxx–xxx
seen in Fig. 1a [16,17]. The binary shape images are then
aligned by applying translation, rotation, and scaling factors, in order to normalize the location of the centroid,
the direction of the principle axis, and the area of each of
the binary shapes, respectively.
A suitable shape model should represent the detailed
shape information and be able to model possible shape
variations. To this end we provide a modification to the
idea of a probabilistic atlas by incorporating additional
information derived from the distance transform. First, a
shape histogram is obtained by adding the values of corresponding pixels of the set of aligned binary shape images.
The value at pixel p in the shape histogram, denoted by
SH(p), reflects the number of shapes containing p. Shape
histograms for the sets of aligned expert-segmented regions
are shown in Fig. 1c.
Second, by overlapping and aligning the shape histogram with the instance of the target structure in a new test
image, we can construct a probability map in which the value of each pixel represents the probability that the underlying pixel p belongs to the target. Alignment details are
described in Section 2.4. The probability map is denoted
by PM(p) and defined as
PMðpÞ ¼
s2 þ s1 SHðpÞ
s2 ½1 DistðpÞ
if p 2 R;
if p 2
6 R;
ð1Þ
where R is the set of nonzero pixels in the shape histogram.
Dist(p) is the normalized two-dimensional Euclidean distance transform for the binary image having R as foreground [18]. The two scalars, s1 and s2, satisfy s1 + s2 = 1
and determine the weight of shape information when constructing the probability map (Fig. 2).
In addition to capturing shape variation information
from the expert-segmented binary images, we model the
appearance knowledge from the intensity patch of the object
of interest in the training set of gray level images. In medical images, anatomical structures have characteristic intensity features that can be utilized when locating similar
structures in new images. Histogram equalization is first
applied to the training set of gray level images and then
the mean, Mapp, and variance, Vapp, of the intensity patches
are calculated and used as two appearance descriptors.
2.2. Watershed transformation
By utilizing prior knowledge our method can cope with
the oversegmentation problem of the standard watershed
transformation. In this section we present a brief review
of the standard watershed segmentation algorithm for the
discrete case using topographical distance. The reader is
referred to [5] for a detailed treatment.
The lower slope, LS(p), is the maximum slope connecting pixel p in the input image f, with its neighbors of lower
altitude, which can be written as
f ðpÞ f ðqÞ
LSðpÞ ¼ max
:
ð2Þ
q2N ðpÞ[p
dðp; qÞ
N(p) is the set of neighbors of pixel p and d(p, q) is the
Euclidean distance between p and q. Note that when
q = p, the right hand side of (2) is forced to be zero; thus
we have a lower slope value even when p is a local minimum. Consequently, the cost for moving from pixel p to
q is defined as
8
if f ðpÞ > f ðqÞ;
>
< LSðpÞ dðp; qÞ
if f ðpÞ < f ðqÞ;
costðp; qÞ ¼ LSðqÞ dðp; qÞ
>
:1
½LSðpÞ þ LSðqÞ dðp; qÞ if f ðpÞ ¼ f ðqÞ:
2
ð3Þ
If there exists a path p = (p0, . . . , pl) from p0 = p to pl = q,
the topographical distance along p between the two pixels
p and q is expressed as
T pf ðp; qÞ ¼
l1
X
dðpi ; piþ1 Þcostðpi ; piþ1 Þ:
ð4Þ
i¼0
T pf ðp; qÞ is the minimum of the topographical distances of
all paths linking p and q. Following similar definition, the
catchment basin CB(mi) of a local minimum mi is defined
as the set of pixels which have smaller topographical distances to mi than any other local minimum. Finally, the
set of pixels which do not belong to any catchment basin
Fig. 2. The shape histogram in (a) is used to generate the probability map in (b). The probability profile values along the line ABCD in (b) are shown in
(c). Note that the likelihood of a shape is higher in brighter regions, with the probability decreasing as we move away from the center of the probability
map.
Please cite this article in press as: G. Hamarneh, X. Li, Watershed segmentation using prior shape and appearance knowledge, Image
Vis. Comput. (2007), doi:10.1016/j.imavis.2006.10.009
ARTICLE IN PRESS
4
G. Hamarneh, X. Li / Image and Vision Computing xxx (2007) xxx–xxx
are referred to as the watershed pixels. Note that, in practice, the watershed transformation is usually computed on
the gradient magnitude image instead of the gray value image. An example of watershed transformation can be seen
in Fig. 3b.
2.3. Improved k-means clustering
After performing a standard watershed transformation,
we make use of the k-means clustering algorithm to assign
the resulting segments into different classes. k-means is a
clustering algorithm that assigns N data points to k disjoint
subsets, Sj, j = 1, 2, . . . , k, each containing Nj data points,
by minimizing the sum-of-squares criterion given by
J¼
k X qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
X
jxn lj j2 ;
ð5Þ
j¼1 n2S j
where xn is the value of nth data point and lj is the mean
value of the data points within the cluster Sj.
k-means clustering algorithm can effectively assign the
watershed segments into several clusters according to different features. In our algorithm, we jointly utilize two
kinds of features for each segment; (i) the mean intensity
and (ii) the spatial centroid position of each segment. For
the nth segment, these features are denoted by In and Cn,
respectively. Accordingly, we design an improved k-means
clustering scheme with a modified sum-of-squares criterion,
given by
k X qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
X
2
2
ð6Þ
jI n gj j þ a jC n lj j ;
J0 ¼
j¼1 n2S j
where gj and lj denote the average values of In and Cn,
respectively, " n belonging to the jth cluster. In (6), note
that the values of In and Cn are each normalized in advance. By minimizing (6), each resulting cluster of watershed segments will contain segments with similar
intensity. Furthermore, the watershed segments within
each cluster tend to be in close proximity and the likelihood
of assigning non-neighboring segments to the same cluster
is reduced. The parameter a determines the weight for
penalizing distant segments when clustering. The value of
a is chosen to be inversely proportional to the average distance between the two farthest points of an object in the
training set (zooming). An excessively large a, for example,
can hinder segments belonging to a very thin or poorly
connected structure from being grouped together.
To overcome the sensitivity of the k-means algorithm to
different initializations, we begin the process by ordering
the segments according to their mean intensities and
assigning the same number of segments to each of the k
clusters.
After clustering, we locate an initial estimate of the target structure by selecting the cluster whose intensity profile
Fig. 3. Calculating the initial estimate of left ventricle segmentation. (a) Original MR cardiac image. (b) Watershed transformation of the image. (c)
Coalesced watershed segments resulting from improved k-means clustering. (d) The initial segmentation estimate.
Please cite this article in press as: G. Hamarneh, X. Li, Watershed segmentation using prior shape and appearance knowledge, Image
Vis. Comput. (2007), doi:10.1016/j.imavis.2006.10.009
ARTICLE IN PRESS
5
G. Hamarneh, X. Li / Image and Vision Computing xxx (2007) xxx–xxx
Fig. 4. Iterative shape alignment to the wall of the left ventricle. The intensity gradation represents the prior shape probability map, which is masked by
the current estimate of the target. Note how (dark) pixels with lower probability are gradually eliminated and an improved estimate of the target is
obtained. (a) Initial alignment estimate. (b) After sixth iteration. (c) After 15th iteration. (d) Convergence after 21st iteration.
is most similar to the appearance knowledge obtained from
the training set (Section 2.1). Specifically, after histogram
equalization, we calculate the mean and variance of the
intensity for each cluster. We then choose the cluster having the most similar intensity statistics to those calculated
in the training. This is expressed as finding the jth cluster
that minimizes
DFðjÞ ¼
qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ðmj M app Þ2 þ ðvj V app Þ;
ð7Þ
where mj and vj denote the mean and variance of the intensity of the jth cluster, respectively. The set of all watershed
pixels belonging to this selected cluster is considered to be
the initial segmentation estimate and will be iteratively
Fig. 5. Examples of corpus callosum segmentation results.
Please cite this article in press as: G. Hamarneh, X. Li, Watershed segmentation using prior shape and appearance knowledge, Image
Vis. Comput. (2007), doi:10.1016/j.imavis.2006.10.009
ARTICLE IN PRESS
6
G. Hamarneh, X. Li / Image and Vision Computing xxx (2007) xxx–xxx
improved (Section 2.4). Fig. 3 demonstrates initializing the
first estimate of the region to be segmented by applying our
improved k-means clustering to a watershed segmentation
of an MR image. Note in Fig. 3b how classic watershed
segmentation is characterized by excessive oversegmentation. Our clustering technique detects a good first approximation of the region of interest (Fig. 3c), albeit initially
including some undesirable segments (upper left corner
and along bottom of Fig. 3d). The iterative shape alignment (Section 2.4) will gradually remove these undesired
segments which are typically expected to appear.
histogram and the desired structure is well segmented. An
example illustrating the intermediate results of this process
is shown in Fig. 4. In this case, reasonable convergence
occurred after 21 iterations. Using the target obtained in
the final iteration, we can determine which segments should
Table 1
The mean, maximum, and standard deviation of the Hausdorff distances
between our result and expert segmented corpora callosa boundaries
MRI No.
Max error (pixels)
Mean error (pixels)
SD error (pixels)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22*
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
3.6056
4.0000
3.1623
2.0000
2.2361
2.2361
2.0000
2.0000
3.0000
3.6056
2.2361
3.1623
2.8284
2.0000
1.4142
3.6056
4.0000
1.4142
2.0000
1.4142
2.0000
2.0000
2.8284
2.2361
2.2361
3.0000
3.6056
2.0000
2.0000
3.0000
–
2.0000
2.2361
2.0000
1.4142
3.1623
2.8284
3.6056
1.4142
1.4142
2.0000
2.0000
1.4142
2.0000
2.0000
3.6056
1.0000
2.0000
1.4142
2.8284
1.4142
0.4451
0.6514
0.5835
0.5212
0.4004
0.5205
0.5148
0.5994
0.4962
0.5975
0.3973
0.6404
0.5402
0.4812
0.3367
0.5143
0.4762
0.4034
0.5200
0.3430
0.4075
0.5234
0.6127
0.4324
0.5491
0.5519
0.5179
0.5019
0.5335
0.5929
–
0.3944
0.4308
0.4907
0.4407
0.5894
0.5002
0.4650
0.4198
0.3758
0.4346
0.4231
0.4716
0.4516
0.5768
0.5109
0.3754
0.3650
0.4785
0.3875
0.5364
0.6340
0.7902
0.5784
0.5737
0.5269
0.5441
0.5427
0.5286
0.5723
0.6183
0.5350
0.6677
0.7000
0.5288
0.4777
0.6551
0.6633
0.4954
0.5258
0.4794
0.5302
0.5495
0.6517
0.5364
0.5530
0.6029
0.6969
0.5279
0.5694
0.6114
–
0.5164
0.5503
0.5152
0.5118
0.5674
0.6280
0.6325
0.5035
0.4930
0.5231
0.5252
0.5155
0.5485
0.5212
0.6629
0.4704
0.4889
0.5131
0.5538
0.5319
Average
2.3716
0.4856
0.5648
SD
0.7927
0.0806
0.0678
2.4. Iterative shape alignment
In this section we iteratively merge segments and eliminate undesired ones by aligning the current estimate with
the shape histogram. Initially, however, we perform morphological closing operation on the initial estimate
(obtained in Section 2.3) to join the segments separated
by watershed pixels, the result of which is called target T.
Now, in the ith iteration of the shape alignment step, the
area, orientation and centroid of the current target Ti are
calculated. Then the shape histogram is scaled, rotated
and translated (using bi-cubic interpolation) to be aligned
with Ti, and a probability map is constructed accordingly.
Finally the pixels in Ti with the lowest probabilities are
removed, resulting in an updated target Ti+1. The ith iteration of this procedure can be expressed by the following
steps:
Step 1. Align: Resize, rotate and translate the shape histogram to align it with the target
SHi ¼ R½ðAT i =AR Þ SH þ ðC T i C R Þ;
Step 2. Probability map: Construct the probability map
PMi based on SHi according to (1);
Step 3. Remove: The pixels in target Ti having the lowest
probability are removed.
T iþ1 ¼ fpjp 2 T i \
PMi ðpÞ > r maxðPMi ðqÞÞ þ ð1 rÞ minðPMi ðqÞÞg:
q2T i
q2T i
In the above procedure, SH represents the shape histogram obtained from the training set of images. AR and
C R are the area and centroid of the nonzero region R of
the shape histogram, respectively. Similarly, AT i and C T i
are the area and centroid of the target Ti, respectively. By
applying the rotation matrix R, the shape histogram will
rotate by a degree of H ¼ hT i hR , where hT i and hR are
the orientations of the principle axes of Ti and R, respectively. r is the step size associated with pixel removal for
each iteration. A small r generates a more accurate result
but requires more iterations for the algorithm to converge.
The algorithm converges when all the remaining pixels
have probabilities larger than s2, which indicates that the
shape of the current target is similar enough to the shape
*
k = 4.
Please cite this article in press as: G. Hamarneh, X. Li, Watershed segmentation using prior shape and appearance knowledge, Image
Vis. Comput. (2007), doi:10.1016/j.imavis.2006.10.009
ARTICLE IN PRESS
G. Hamarneh, X. Li / Image and Vision Computing xxx (2007) xxx–xxx
7
Fig. 6. Effect of initialization. (a) Original MRI. (b) Initialization with k = 7. (c)–(d) Intermediate results of iterative alignment based on initialization of
(b). (e) Satisfactory segmentation result. (f) Initialization with k = 4. (g)–(h) Intermediate results of iterative alignment based on initialization of (f). (i)
Unsuccessful segmentation result.
be kept. These segments are merged together to form the
segmented result of the desired organ.
2.5. Refining segmentation results
In some areas along the boundary of the segmented
result (acquired in Section 2.4), the extracted boundary
may fail to follow the true boundary perfectly. This can
occur when some of the small segments near the boundary
of the object are mis-clustered because their mean intensities are not similar enough to the segments inside the
organ. This is remedied by a simple refinement step in
which nearby segments are merged only if they are covered
by R and have mean intensities similar to that of the target
structure. Furthermore, the boundary segmentation result
exhibits a characteristic staircase effect. This is due to the fact
that watershed segments, and consequently the extracted
boundary, is a union of whole pixel (and not subpixels).
A smooth contour with subpixel resolution is easily
obtained by initializing an active contour model (snake)
[20–22] with the resulting watershed boundary pixels.
Subsequent snake deformation iterations according to
image gradient and tensile and flexural forces, provides
satisfactory results with only a few iterations (see the
results in Section 3).
3. Results
The proposed algorithm was tested and validated on the
51 brain MR images in [19], in which expert-segmented
corpora callosa for all images in the data set were provided.
To ensure cross validation, when segmenting an image, the
corresponding expert-segmented result and appearance
information were not included in the training set, i.e. a
leave-one-out scheme was applied in which 50 MR images
with the corresponding expert segmentation were used as
the training set for segmenting the remaining single image.
The shape histogram is obtained by aligning and adding
the expert segmented results. Each test image contains a
brain region cropped from the original image with a dimension of 91 · 141. The cropped brain parts of the 51 MR
images mostly cover the region beginning at the calvaria,
reach the upper part of the pons and have a fixed aspect
ratio. The intensity and contrast of these images vary from
one to another and the shape variation of the corpora callosa is considerably large. The original image has been preprocessed by a median filter prior to computing the
gradient. Example corpus callosum segmentation results
are presented in Fig. 5.
A constant value of a = 0.5 for all the MR images was
used in k-means clustering. By selecting a < 1, we actually
put more emphasis on intensity information and use spatial
position as additional constraints. We choose r = 2.5% as
the removal step size in each iteration. In general, for less
noisy images, r can be larger since there will not be many
small segments spreading around the structure. However,
to ensure the accuracy of the results, it should be no larger
than 5%. On the other hand, for images with much more
noise, r should be smaller and the algorithm will require
more iterations to converge. In our experiment, most cases
converged in less than 10 iterations. The two scalars s1 and
s2 in (1) are selected to be s1 = 0.7 and s2 = 0.3. We always
use s1 > s2 since shape information is the dominant factor.
The results are compared with the corresponding expert
segmented ones in the original data set. To demonstrate the
accuracy of our method, the maximum, average and standard deviation of Hausdorff distance [23] are calculated
and reported for all cases (Table 1). In our earlier work
on segmenting the corpus callosum from the same set of
mid-sagittal MR images [11], only 26 out of 51 images were
successfully segmented with the same set of parameters,
Fig. 7. A failure case. (a) Original image No. 31. (b) Segmentation result.
Please cite this article in press as: G. Hamarneh, X. Li, Watershed segmentation using prior shape and appearance knowledge, Image
Vis. Comput. (2007), doi:10.1016/j.imavis.2006.10.009
ARTICLE IN PRESS
8
G. Hamarneh, X. Li / Image and Vision Computing xxx (2007) xxx–xxx
compared to 49 out 51 in this work. Our results differ very
slightly from the expert segmented results.
However, we report two failure cases where a satisfactory result cannot be generated automatically. One failure
case is image No. 22. We used an experimental value of
k = 7 in k-means clustering and obtained satisfactory
results for all the images except image No. 22, which had
exceptionally low contrast. Consequently, the k-means
algorithm did not successfully classify the segments belonging to the corpora callosa into the same cluster. For that
case only, a good result is still obtained with k adjusted
to k = 4. In Table 1, this case is marked with an asterisk
(*) and the maximum, average and standard deviation of
Hausdorff distance are all calculated using k = 4.
It is important to note that the quality of initialization is
crucial for the subsequent iterative shape alignment and in
turn depends on the choice of the number of clusters k in
the k-means clustering algorithm. The difficulty in determining k is a intrinsic limitation of the k-means algorithm.
However, since all training images exhibit the same field of
view (or ‘zooming’ as described in Section 2.3) and brain
constituents (white matter, gray matter and cerebrospinal
fluid), the intensity distribution was concentrated around
certain intensity ranges and a single k value was suitable
for all images (except the failure case reported). When
the desired target is clustered into different clusters or too
many undesired segments are clustered into the same cluster along with the target, the algorithm fails to produce a
Fig. 8. Obtaining smooth boundaries with subpixel resolution. (a) The results acquired after iterative shape alignment (note the staircase effect in the
detected boundaries). (b)–(d) An iterative snake process employed to clean aliased edges and provide boundaries with subpixel resolution. (e) The final
segmented result.
Fig. 9. Qualitative comparison with typical results of other methods. Simple thresholding (a) clearly does not give acceptable results, classical watershed
segmentation suffers from over-segmenting the image. Snakes (c), region growing (d), and level set based deformable models (e), leak due the weak
boundary. Our proposed method (f) incorporates prior shape and appearance knowledge to merge watershed segments.
Please cite this article in press as: G. Hamarneh, X. Li, Watershed segmentation using prior shape and appearance knowledge, Image
Vis. Comput. (2007), doi:10.1016/j.imavis.2006.10.009
ARTICLE IN PRESS
G. Hamarneh, X. Li / Image and Vision Computing xxx (2007) xxx–xxx
correct segmentation. Fig. 6 demonstrates the effect of
changing k. In Fig. 6a, we set k = 7 and obtain the initialization shown in Fig. 6b. This yields a satisfactory segmentation after seven iterations (Fig. 6e). Setting k = 4 for the
same image results in a different initialization (Fig. 6f) and
22 iterations for the algorithm to converge to the erroneous
result in Fig. 6i. Such a failure due to an improper initialization, which is caused by an improper choice of k, can be
detected by comparing the segmentation results with the
training set and observing that a large number of iterations
is required for convergence. In those cases a different k
must be chosen.
Another failure case is image No. 31. The middle part of
the corpora callosa in this image is very thin and noisy as
shown in Fig. 7a. The watershed transformation was
unable to distinguish this part as a separate segment. Consequently, only the two ends of corpora callosa were well
segmented and the result is hence broken in the middle,
as shown in Fig. 7b.
Active contour models (snakes) was then initialized with
the previous results in order to obtain contours with subpixel accuracy and only a few snake deformation iterations
are executed2 (see Fig. 8). The error values reported (Table
1) compared our results prior to applying snakes to the
expert segmented results, simply because expert segmented
results are obtained without subpixel accuracy.
In Fig. 9c, we show a typical example of running snakes
directly on the original data. The figure demonstrates how
snakes leak from areas of weak edges. Fig. 9 presents other
qualitative comparisons with other methods (thresholding,
classical watershed, region growing, and level set based segmentation3). Fig. 9 helps to emphasize the advantage of
our algorithm in automatically locating and segmenting
the target. This advantage stems from the utilization of prior shape and appearance knowledge while relying on
watershed segments derived from the image.
9
ally affected by some of the limitations of the k-means clustering algorithm and a failure case is reported. However,
this case can be detected by comparing the segmentation
result with the shape histogram and adjusting the numberof-clusters parameter in k-means to obtain a satisfactory
result.
Future work includes applying our method to other data
sets as well as extending it to 3D. Given that the main
building blocks of our method (watershed, k-means, and
spatial transformations for alignment) are well developed
in 3D, we foresee no major difficulties. Segment-level iterative alignment techniques are also being developed to further ensure segments along weak boundaries or thin
structures are not lost. Additionally, more advanced
appearance models may be used when the intensity variations within the target structures are more complex as in
active appearance models [24]. This was not an issue for
the corpora callosa segmentation application.
Acknowledgements
GH was supported in part by the Canadian National
Science and Engineering Research Council (NSERC) discovery grant and Simon Fraser University President’s Research Grant. The cardiac MR data set was acquired and
annotated by Jens C. Nilsson and Bjørn A. Grønning, with
the Danish Research Centre for Magnetic Resonance
(DRCMR) [16,17]. The brain MR data was provided by
Martha Shenton of the Harvard Medical School and Polina Golland at MIT. The authors thank the reviewers for
their valuable comments, Tao Jia from Virginia Tech and
Peter Plett from Simon Fraser University for their help
with code development, proofreading, and manuscript
preparation.
References
4. Conclusion
In this paper we propose a new approach to improve
watershed segmentation by incorporating shape and
appearance knowledge. The method overcomes some
intrinsic problems with watershed transformation. The
oversegmentation problem is handled by clustering and
merging appropriate watershed segments. The effect of
noise is suppressed by computing the mean intensity of
each segment. As a result, even when the input image has
a relatively low resolution and low contrast, our algorithm
achieves relatively accurate results. Our method is margin2
Snakes software written in MATLAB (The MathWorks, Inc.) was
used. The code is available for download from http://www.cs.sfu.ca/
~hamarneh.
3
Region growing and level set methods are implemented using
‘itkConnectedThresholdImageFilter’ and ‘ThresholdSegmentationLevelSetImageFilter’ from the Insight Segmentation and Registration Toolkit,
ITK (www.itk.org).
[1] S. Beucher, C. LantuTjoul, Use of watersheds in contour detection,
in: Proceedings of International Workshop on Image Processing,
Real-time Edge and Motion Detection/Estimation, vol. 132, 1979, pp.
2.1–2.12.
[2] L. Vincent, P. Soille, Watersheds in digital spaces: an efficient
algorithm based on immersion simulations, IEEE transactions on
pattern analysis and machine intelligence 13 (1991) 583–598.
[3] A.N. Moga, M. Gabbouj, Parallel image component labelling with
watershed transformation, IEEE transactions on pattern analysis and
machine intelligence. 19 (1997) 441–450.
[4] S. Beucher, F. Meyer, The Morphological approach to segmentation:
the watershed transform, in: E.R. Dougherty (Ed.), Mathematical
Morphology in Image Processing, Marcel Dekker, New York, 1993,
pp. 433–481.
[5] J.B.T.M. Roerdink, A. Meijster, The watershed transform: definitions, algorithms and parallelization strategies, Fundamenta Informaticae 41 (2000) 187–228.
[6] V. Grau, A.U.J. Mewes, M. Alcañiz, Improved watershed transform
for medical image segmentation using prior information, IEEE
transactions on medical imaging 23 (4) (2004) 447–458.
[7] J.L. Vincent, Morphological grayscale reconstruction in image
analysis: application and efficient algorithms, IEEE transactions on
image processing 2 (1993) 176–201.
Please cite this article in press as: G. Hamarneh, X. Li, Watershed segmentation using prior shape and appearance knowledge, Image
Vis. Comput. (2007), doi:10.1016/j.imavis.2006.10.009
ARTICLE IN PRESS
10
G. Hamarneh, X. Li / Image and Vision Computing xxx (2007) xxx–xxx
[8] S. Beucher, Watershed, hierarchical segmentation and waterfall
algorithm, in: Mathematical Morphology and Its Applications to
Image Processing, Kluwer, Dordrecht, The Netherlands, 1994, pp.
69–76.
[9] T.F. Cootes, D. Cooper, C.J. Taylor, J. Graham, Active shape models
– their training and application, Computer Vision and Image
Understanding 61 (1) (1995) 38–59.
[10] G. Hamarneh, T. McInerney, D. Terzopoulos, Deformable organisms
for automatic medical image analysis, in: Proceedings of Medical
Image Computing and Computer-Assisted Intervention, in: Lecture
Notes in Computer Science, vol. 2208, 2001, pp. 66–75.
[11] T. McInerney, G. Hamarneh, M. Shenton, D. Terzopoulos, Deformable organisms for automatic medical image analysis, Journal of
Medical Image Analysis 6 (3) (2002) 251–266.
[12] M. Leventon, W. Grimson, O. Faugeras, Statistical Shape Influence
in Geodesic Active Contours, in: Proceedings of the Computer Vision
and Pattern Recognition, 2000, pp. 1316–1323.
[13] G. Szekely, A. Kelemen, C. Brechbuehler, G. Gerig, Segmentation of
3D objects from MRI volume data using constrained elastic deformations of flexible Fourier surface models, Medical Image Analysis 1
(1) (1996) 19–34.
[14] G. Hamarneh, T. Gustavsson, Statistically constrained snake deformations, Proceedings of the Systems, Man, and Cybernetics 3 (2000)
1610–1615.
[15] X. Li, G. Hamarneh, Modeling prior shape and appearance knowledge in watershed segmentation, Canadian Conference on Computer
and Robot Vision (2005) 27–33.
[16] M.B. Stegmann, R. Fisker, B.K. Ersbøll, Extending and applying
active appearance models for automated, high precision segmentation
in different image modalities, Proceedings of the 12th Scandinavian
Conference on Image Analysis 1 (2001) 90–97.
[17] M.B. Stegmann, An Annotated Dataset of 14 Cardiac MR Images,
Technical report, Informatics and Mathematical Modelling, Technical University of Denmark, 2002.
[18] H. Breu, J. Gil, D. Kirkpatrick, M. Werman, Linear time euclidean
distance transform algorithms, IEEE Transactions on Pattern Analysis and Machine Intelligence 17 (5) (1995) 529–533.
[19] M.E. Shenton, R. Kikinis, F.A. Jolesz, S.D. Pollak, M. LeMay, C.G.
Wible, H. Hokama, J. Martin, D. Metcalf, M. Coleman, R.W.
McCarley, Abnormalities in the left temporal lobe and thought
disorder in schizophrenia: a quantitative magnetic resonance imaging
study, New England Journal of Medicine 327 (1992) 604–612.
[20] M. Kass, A. Witkin, D. Terzopoulos, Snakes: active contour models,
International Journal of Computer Vision 1 (4) (1988) 321–331.
[21] G. Hamarneh, A. Chodorowski, T. Gustavsson, Active contour
models: application to oral lesion detection in color images, IEEE
International Conference on Systems, Man, and Cybernetics 4 (2000)
2458–2463.
[22] S. Lobregt, M.A. Viergever, Discrete dynamic contour model, IEEE
Transactions on Medical Imaging 14 (1) (1995) 1224.
[23] P.K. Agarwal, M. Sharir, Efficient algorithms for geometric optimization, ACM Computing Surveys 30 (4) (1998) 412–458.
[24] T.F. Cootes, G.J. Edwards, C.J. Taylor, Active appearance models,
IEEE PAMI. 34 (6) (2001) 681–685.
Please cite this article in press as: G. Hamarneh, X. Li, Watershed segmentation using prior shape and appearance knowledge, Image
Vis. Comput. (2007), doi:10.1016/j.imavis.2006.10.009