BAI151A Module 5 Textbook

Chapter 9 Morphological Image Processing

9.1 PRELIMINARIES

Note: Before proceeding, you will find it helpful to review the discussion in Section 2.4 dealing with representing images, the discussion on connectivity in Section 2.5, and the discussion on sets in Section 2.6.

The language of mathematical morphology is set theory. As such, morphology offers a unified and powerful approach to numerous image processing problems. When working with images, sets in mathematical morphology represent objects in those images. In binary images, the sets in question are members of the 2-D integer space Z², where each element of a set is a tuple (2-D vector) whose coordinates are the coordinates of an object (typically foreground) pixel in the image. Grayscale digital images can be represented as sets whose components are in Z³. In this case, two components of each element of the set refer to the coordinates of a pixel, and the third corresponds to its discrete intensity value. Sets in higher dimensional spaces can contain other image attributes, such as color and time-varying components.
Morphological operations are defined in terms of sets. In image processing, we use
morphology with two types of sets of pixels: objects and structuring elements (SE’s).
Typically, objects are defined as sets of foreground pixels. Structuring elements can
be specified in terms of both foreground and background pixels. In addition, struc-
turing elements sometimes contain so-called “don’t care” elements, denoted by ×,
signifying that the value of that particular element in the SE does not matter. In this
sense, the value can be ignored, or it can be made to fit a desired value in the evalu-
ation of an expression; for example, it might take on the value of a pixel in an image
in applications in which value matching is the objective.
Because the images with which we work are rectangular arrays, and sets in general
are of arbitrary shape, applications of morphology in image processing require that
sets be embedded in rectangular arrays. In forming such arrays, we assign a back-
ground value to all pixels that are not members of object sets. The top row in Fig. 9.1
shows an example. On the left are sets in the graphical format you are accustomed
to seeing in book figures. In the center, the sets have been embedded in a rectangular
background (white) to form a graphical image.† On the right, we show a digital image
(notice the grid) which is the format we use for digital image processing.
Structuring elements are defined in the same manner, and the second row in Fig. 9.1
shows an example. There is an important difference between the way we represent
digital images and digital structuring elements. Observe on the top right that there is
a border of background pixels surrounding the objects, while there is none in the SE.
As you will learn shortly, structuring elements are used in a form similar to spatial
convolution kernels (see Fig. 3.28), and the image border just described is similar
to the padding we discussed in Sections 3.4 and 3.5. The operations are different in
morphology, but the padding and sliding operations are the same as in convolution.
In addition to the set definitions given in Section 2.6, the concept of set reflection
and translation are used extensively in morphology in connection with structuring
elements. The reflection of a set (structuring element) B about its origin, denoted by
B̂, is defined as


Sets are shown as drawings of objects (e.g. squares and triangles) of arbitrary shape. A graphical image contains
sets that have been embedded into a background to form a rectangular array. When we intend for a drawing to
be interpreted as a digital image (or structuring element), we include a grid in illustrations that might otherwise
be ambiguous. Objects in all drawings are shaded, and the background is shown in white. When working with
actual binary images, we say that objects are foreground pixels. All other pixels are background.


FIGURE 9.1 Top row. Left: Objects represented as graphical sets. Center: Objects embedded in a background to form a graphical image. Right: Object and background are digitized to form a digital image (note the grid). Second row: Example of a structuring element represented as a set, a graphical image, and finally as a digital SE.

B̂ = {w | w = −b, for b ∈ B}    (9-1)

That is, if B is a set of points in 2-D, then B̂ is the set of points in B whose (x, y) coordinates have been replaced by (−x, −y). Figure 9.2 shows several examples of digital sets (structuring elements) and their reflections. The dot denotes the origin of the SE. Note that reflection consists simply of rotating an SE by 180° about its origin, and that all elements, including the background and don't care elements, are rotated.

Note: Reflection is the same operation we performed with kernels prior to spatial convolution, as explained in Section 3.4.

The translation of a set B by point z = (z1, z2), denoted (B)z, is defined as

(B)z = {c | c = b + z, for b ∈ B}    (9-2)

That is, if B is a set of pixels in 2-D, then (B)z is the set of pixels in B whose (x, y) coordinates have been replaced by (x + z1, y + z2). This construct is used to translate (slide) a structuring element over an image, and at each location perform a set

FIGURE 9.2 Structuring elements and their reflections about the origin (the ×'s are don't care elements, and the dots denote the origin). Reflection is rotation by 180° of an SE about its origin.


FIGURE 9.3 (a) A binary image containing one object (set), A. (b) A structuring element, B. (c) Image resulting from a morphological operation (see text).

operation between the structuring element and the area of the image directly under
it, as we explained in Fig. 3.28 for correlation and convolution. Both reflection and
translation are defined with respect to the origin of B.
As an introduction to how morphological operations between images and struc-
turing elements are performed, consider Fig. 9.3, which shows a simple binary image, I,
consisting of an object (set) A, shown shaded, and a 3 × 3 SE whose elements are
all 1’s (foreground pixels). The background pixels (0’s) are shown in white. We are
interested in performing the following morphological operations: (1) form a new
image, of the same size as I, consisting only of background values initially, (2) trans-
late (slide) B over image I, and (3) at each increment of translation, if B is completely
contained in A, mark the location of the origin of B as a foreground pixel in the new
image; otherwise, leave it as a background point. Figure 9.3(c) is the result after the
origin of B has visited every element of I. We see that, when the origin of B is on
a border element of A, part of B ceases to be contained in A, thus eliminating that
location of the origin of B as a possible foreground point of the new image. The net
result is that the boundary of set A is eroded, as Fig. 9.3(c) shows. Because of the way
in which we defined the operation, the maximum excursion needed for B in I is when
the origin of B (which is at its center) is contained in A. With B being of size 3 × 3,
the narrowest background padding we needed was one pixel wide, as shown in Fig.
9.3(a). By using the smallest border needed for an operation, we keep the drawings smaller. In practice, we specify the width of padding based on the maximum dimensions of the structuring elements used, regardless of the operations being performed.

Note: The reason we generally specify the padding border to be of the same dimensions as B is that some morphological operations are defined for an entire structuring element, and cannot be interpreted with respect to the location of its origin.

When we use terminology such as "the structuring element B is contained in set A," we mean specifically that the foreground elements of B overlap only elements of A. This becomes an important issue when B also contains background and, possibly, don't care elements. Also, we use set A to denote all foreground pixels of I. Those foreground elements can be a single object, as in Fig. 9.3, or they can represent disjoint subsets of foreground elements, as in the first row of Fig. 9.1. We will discuss binary images and structuring elements from Sections 9.2 through 9.7. Then, in Section 9.8, we will extend the binary ideas to grayscale images and structuring elements.

9.2 EROSION AND DILATION

We begin the discussion of morphology by studying two operations: erosion and dilation. These operations are fundamental to morphological processing. In fact,
many of the morphological algorithms discussed in this chapter are based on these
two primitive operations.


EROSION
Note: Remember, set A can represent (be the union of) multiple disjoint sets of foreground pixels (i.e., objects).

Morphological expressions are written in terms of structuring elements and a set, A, of foreground pixels, or in terms of structuring elements and an image, I, that contains A. We consider the former approach first. With A and B as sets in Z², the erosion of A by B, denoted A ⊖ B, is defined as

A ⊖ B = {z | (B)z ⊆ A}    (9-3)

where A is a set of foreground pixels, B is a structuring element, and the z’s are
foreground values (1’s). In words, this equation indicates that the erosion of A by B
is the set of all points z such that B, translated by z, is contained in A. (Remember,
displacement is defined with respect to the origin of B.) Equation (9-3) is the formu-
lation that resulted in the foreground pixels of the image in Fig. 9.3(c).
As noted, we work with sets of foreground pixels embedded in a set of back-
ground pixels to form a complete image, I. Thus, inputs and outputs of our morpho-
logical procedures are images, not individual sets. We could make this fact explicit
by writing Eq. (9-3) as

I ⊖ B = {z | (B)z ⊆ A and A ⊆ I} ∪ {Ac | Ac ⊆ I}    (9-4)

where I is a rectangular array of foreground and background pixels. The contents of the first braces say the same thing as Eq. (9-3), with the added clarification that A is
a subset of (i.e., is contained in) I. The union with the operation inside the second set
of braces “adds” the pixels that are not in subset A (i.e., Ac , which is the set of back-
ground pixels) to the result from the first braces, requiring also that the background
pixels be part of the rectangle defined by I. In words, all this equation says is that
erosion of I by B is the set of all points, z, such that B, translated by z, is contained in
A. The equation also makes explicit that A is contained in I, that the result is embed-
ded in a set of background pixels, and that the entire process is of the same size as I.
Of course, we do not use the cumbersome notation of Eq. (9-4), which we show only to emphasize an important point. Instead, we use the notation A ⊖ B when a morphological operation uses only foreground elements, and I ⊖ B when the operation uses foreground and background elements. This distinction may seem trivial, but suppose that we want to perform erosion with Eq. (9-3), using the foreground elements of the structuring element in the last column in Fig. 9.2. This structuring element also has background elements, but Eq. (9-3) assumes that B only has foreground elements. In fact, erosion is defined only for operations between foreground elements, so writing I ⊖ B would be meaningless without the "explanation" embedded in Eq. (9-4). To avoid confusion, we use A in morphological expressions when the operation involves only foreground elements, and I when the operation also involves background and/or "don't-care" elements. We also avoid using standard morphological symbols like ⊖ when working with "mixed" SEs. For example, later in Eq. (9-17) we use the symbol ⊛ in the expression I ⊛ B = {z | (B)z ⊆ I}, which has the same form as Eq. (9-3), but instead involves an entire image and the mixed-value SE in the last column of Fig. 9.2. As you will see, using SE's with mixed values adds considerable power to morphological operations.


Returning to our discussion of Eq. (9-3), because the statement that B has to be
contained in A is equivalent to B not sharing any common elements with the back-
ground (i.e., the set complement of A), we can express erosion equivalently as

A ⊖ B = {z | (B)z ∩ Ac = ∅}    (9-5)

where, as defined in Section 2.6, ∅ is the empty set.


Figure 9.4 shows an example of erosion. The elements of set A (shaded) are the
foreground pixels of image I, and, as before, the background is shown in white. The
solid boundary inside the dashed boundary in Fig. 9.4(c) is the limit beyond which
further displacements of the origin of B would cause some elements of the struc-
turing element to cease being completely contained in A. Thus, the locus of points
(locations of the origin of B) within (and including) this boundary constitutes the
foreground elements of the erosion of A by B. We show the resulting erosion shaded
in Fig. 9.4(c), and the background as white. Erosion is the set of values of z that sat-
isfy Eqs. (9-3) or (9-5). The boundary of A is shown dashed in Figs. 9.4(c) and (e) as a
reference; it is not part of the erosion. Figure 9.4(d) shows an elongated structuring
element, and Fig. 9.4(e) shows the erosion of A by this element. Note that the origi-
nal object was eroded to a line. As you can see, the result of erosion is controlled by
the shape of the structuring element. In both cases, the assumption is that the image
was padded to accommodate all excursions of B, and that the result was cropped to
the same size as the original image, just as we did with images processed by spatial
convolution in Chapter 3.
Equations (9-3) and (9-5) are not the only definitions of erosion (see Problems 9.12
and 9.13 for two additional, equivalent definitions). However, the former equations
have the advantage of being more intuitive when the structuring element B is viewed
as if it were a spatial kernel that slides over a set, as in convolution.
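For readers who want to experiment, the following is a small sketch of Eq. (9-3) in Python, assuming NumPy and SciPy are available; the example image and SE are invented for illustration, and scipy.ndimage.binary_erosion performs the same translate-and-test-containment operation described above.

import numpy as np
from scipy import ndimage

# Binary image I: foreground (set A) is True, background is False.
I = np.zeros((10, 10), dtype=bool)
I[2:8, 2:8] = True                      # a 6 x 6 square object

B = np.ones((3, 3), dtype=bool)         # 3 x 3 SE of foreground elements

# Erosion of A by B, Eq. (9-3): keep z only where B, translated to z, fits inside A.
eroded = ndimage.binary_erosion(I, structure=B)

# The same containment test, spelled out at one location z = (4, 4):
r, c = 4, 4
patch = I[r - 1:r + 2, c - 1:c + 2]     # image area under the 3 x 3 SE
assert patch[B].all() == eroded[r, c]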

FIGURE 9.4 (a) Image I, consisting of a set (object) A, and background. (b) Square SE, B (the dot is the origin). (c) Erosion of A by B (shown shaded in the resulting image). (d) Elongated SE. (e) Erosion of A by B. (The erosion is a line.) The dotted border in (c) and (e) is the boundary of A, shown for reference.


EXAMPLE 9.1 : Using erosion to remove image components.


Figure 9.5(a) is a binary image depicting a simple wire-bond mask. As mentioned previously, we gener-
ally show the foreground pixels in binary images in white and the background in black. Suppose that
we want to remove the lines connecting the center region to the border pads in Fig. 9.5(a). Eroding the
image (i.e., eroding the foreground pixels of the image) with a square structuring element of size 11 × 11
whose components are all 1’s removed most of the lines, as Fig. 9.5(b) shows. The reason that the two
vertical lines in the center were thinned but not removed completely is that their width is greater than
11 pixels. Changing the SE size to 15 × 15 elements and eroding the original image again did remove all
the connecting lines, as Fig. 9.5(c) shows. An alternate approach would have been to erode the image
in Fig. 9.5(b) again, using the same 11 × 11, or smaller, SE. Increasing the size of the structuring element
even more would eliminate larger components. For example, the connecting lines and the border pads
can be removed with a structuring element of size 45 × 45 elements applied to the original image, as
Fig. 9.5(d) shows.
We see from this example that erosion shrinks or thins objects in a binary image. In fact, we can
view erosion as a morphological filtering operation in which image details smaller than the structuring
element are filtered (removed) from the image. In Fig. 9.5, erosion performed the function of a “line
filter.” We will return to the concept of morphological filters in Sections 9.4 and 9.8.

DILATION
With A and B as sets in Z², the dilation of A by B, denoted as A ⊕ B, is defined as

A ⊕ B = {z | (B̂)z ∩ A ≠ ∅}    (9-6)

FIGURE 9.5 Using erosion to remove image components. (a) A 486 × 486 binary image of a wire-bond mask in which foreground pixels are shown in white. (b)–(d) Image eroded using square structuring elements of sizes 11 × 11, 15 × 15, and 45 × 45 elements, respectively, all valued 1.


This equation is based on reflecting B about its origin and translating the reflection
by z, as in erosion. The dilation of A by B then is the set of all displacements, z, such
that the foreground elements of B̂ overlap at least one element of A. (Remember,
z is the displacement of the origin of B̂.) Based on this interpretation, Eq. (9-6) can
be written equivalently as

A ⊕ B = {z | [(B̂)z ∩ A] ⊆ A}    (9-7)

Equations (9-6) and (9-7) are not the only definitions of dilation currently in use
(see Problems 9.14 and 9.15 for two different, yet equivalent, definitions). As with
erosion, the preceding definitions have the advantage of being more intuitive when
structuring element B is viewed as a convolution kernel. As noted earlier, the basic
process of flipping (rotating) B about its origin and then successively displacing it
so that it slides over set A is analogous to spatial convolution. However, keep in
mind that dilation is based on set operations and therefore is a nonlinear operation,
whereas convolution is a sum of products, which is a linear operation.
Unlike erosion, which is a shrinking or thinning operation, dilation “grows” or
“thickens” objects in a binary image. The manner and extent of this thickening is con-
trolled by the shape and size of the structuring element used. Figure 9.6(a) shows the
same object used in Fig. 9.4 (the background area is larger to accommodate the dila-
tion), and Fig. 9.6(b) shows a structuring element (in this case B̂ = B because the SE
is symmetric about its origin). The dashed line in Fig. 9.6(c) shows the boundary of
the original object for reference, and the solid line shows the limit beyond which any
further displacements of the origin of B̂ by z would cause the intersection of B̂ and
A to be empty. Therefore, all points on and inside this boundary constitute the dila-
tion of A by B. Figure 9.6(d) shows a structuring element designed to achieve more
dilation vertically than horizontally, and Fig. 9.6(e) shows the dilation achieved with
this element.
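As a hedged illustration (not from the text), the sketch below exercises Eq. (9-6) with an asymmetric SE, assuming SciPy's binary_dilation follows the same definition; the reflection B̂ is obtained by flipping the array.

import numpy as np
from scipy import ndimage

I = np.zeros((10, 10), dtype=bool)
I[4:6, 4:6] = True                        # a small 2 x 2 object (set A)

B = np.zeros((3, 3), dtype=bool)          # asymmetric SE: origin plus its right neighbor
B[1, 1] = B[1, 2] = True
B_hat = B[::-1, ::-1]                     # reflection of B about its origin

# Eq. (9-6): z is in the dilation when the reflected SE, translated to z, overlaps A.
dilated = ndimage.binary_dilation(I, structure=B)

r, c = 4, 6                               # a background pixel just right of the object
patch = I[r - 1:r + 2, c - 1:c + 2]
assert (patch & B_hat).any() == dilated[r, c]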

EXAMPLE 9.2 : Using dilation to repair broken characters in an image.


One of the simplest applications of dilation is for bridging gaps. Figure 9.7(a) shows the same image
with broken characters that we studied in Fig. 4.48 in connection with lowpass filtering. The maximum
length of the breaks is known to be two pixels. Figure 9.7(b) shows a structuring element that can be
used for repairing the gaps. As noted earlier, we use white (1) to denote the foreground and black (0) for
the background when working with images. Figure 9.7(c) shows the result of dilating the original image
with the structuring element. The gaps were bridged. One important advantage of the morphological
approach over the lowpass filtering method we used to bridge the gaps in Fig. 4.48 is that the morpho-
logical method resulted directly in a binary image. Lowpass filtering, on the other hand, started with
a binary image and produced a grayscale image that would require thresholding to convert it back to
binary form (we will discuss thresholding in Chapter 10). Observe that set A in this application consists
of numerous disjointed objects of foreground pixels.


FIGURE 9.6 (a) Image I, composed of set (object) A and background. (b) Square SE (the dot is the origin). (c) Dilation of A by B (shown shaded). (d) Elongated SE. (e) Dilation of A by this element. The dotted line in (c) and (e) is the boundary of A, shown for reference.

FIGURE 9.7 (a) Low-resolution text showing broken characters (see magnified view). (b) Structuring element (a 3 × 3 array of 1's). (c) Dilation of (a) by (b). Broken segments were joined.


DUALITY
Erosion and dilation are duals of each other with respect to set complementation
and reflection. That is,

(A ⊖ B)c = Ac ⊕ B̂    (9-8)

and

(A ⊕ B)c = Ac ⊖ B̂    (9-9)

Equation (9-8) indicates that erosion of A by B is the complement of the dilation of Ac by B̂, and vice versa. The duality property is useful when the structuring element values are symmetric with respect to its origin (as often is the case), so that B̂ = B.
Then, we can obtain the erosion of A simply by dilating its background (i.e., dilating
Ac ) with the same structuring element and complementing the result. Similar com-
ments apply to Eq. (9-9).
We proceed to prove formally the validity of Eq. (9-8) in order to illustrate a typical approach for establishing the validity of morphological expressions. Starting with the definition of erosion, it follows that

(A ⊖ B)c = {z | (B)z ⊆ A}c

If set (B)z is contained in A, then it follows that (B)z ∩ Ac = ∅, in which case the preceding expression becomes

(A ⊖ B)c = {z | (B)z ∩ Ac = ∅}c

But the complement of the set of z's that satisfy (B)z ∩ Ac = ∅ is the set of z's such that (B)z ∩ Ac ≠ ∅. Therefore,

(A ⊖ B)c = {z | (B)z ∩ Ac ≠ ∅}
         = Ac ⊕ B̂

where the last step follows from the definition of dilation in Eq. (9-6) and its equiva-
lent form in Eq. (9-7). This concludes the proof. A similar line of reasoning can be
used to prove Eq. (9-9) (see Problem 9.16).
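A quick numeric check of Eq. (9-8) can be written as follows; this is a sketch, assuming SciPy's binary erosion and dilation match the definitions above, with the image given a one-pixel background border as the text assumes throughout.

import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
A = rng.random((64, 64)) > 0.5                     # an arbitrary binary set A
A[0, :] = A[-1, :] = A[:, 0] = A[:, -1] = False    # one-pixel background border

B = np.array([[0, 1, 0],
              [1, 1, 1],
              [0, 0, 0]], dtype=bool)              # an asymmetric SE
B_hat = B[::-1, ::-1]                              # B reflected about its origin

# Eq. (9-8): the complement of (A eroded by B) equals (A complement) dilated by B-hat.
lhs = ~ndimage.binary_erosion(A, structure=B)
rhs = ndimage.binary_dilation(~A, structure=B_hat)
print(np.array_equal(lhs, rhs))                    # expected: True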

9.3 OPENING AND CLOSING

As you saw in the previous section, dilation expands the components of a set and
erosion shrinks it. In this section, we discuss two other important morphological
operations: opening and closing. Opening generally smoothes the contour of an
object, breaks narrow isthmuses, and eliminates thin protrusions. Closing also tends to smooth sections of contours, but, as opposed to opening, it generally fuses narrow breaks and long thin gulfs, eliminates small holes, and fills gaps in the contour.

The opening of set A by structuring element B, denoted by A ∘ B, is defined as

A ∘ B = (A ⊖ B) ⊕ B    (9-10)

Thus, the opening of A by B is the erosion of A by B, followed by a dilation of the result by B.

Similarly, the closing of set A by structuring element B, denoted A • B, is defined as

A • B = (A ⊕ B) ⊖ B    (9-11)

which says that the closing of A by B is simply the dilation of A by B, followed by erosion of the result by B.
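The compositions in Eqs. (9-10) and (9-11) translate directly into code; the sketch below (an illustration, not from the text) builds opening and closing from erosion and dilation and compares them with SciPy's one-call equivalents.

import numpy as np
from scipy import ndimage

I = np.zeros((20, 20), dtype=bool)
I[5:15, 5:15] = True
I[9:11, 0:5] = True                      # a thin protrusion that opening should remove

B = np.ones((5, 5), dtype=bool)

# Eq. (9-10): opening = erosion followed by dilation with the same SE.
opened = ndimage.binary_dilation(ndimage.binary_erosion(I, B), B)

# Eq. (9-11): closing = dilation followed by erosion with the same SE.
closed = ndimage.binary_erosion(ndimage.binary_dilation(I, B), B)

assert np.array_equal(opened, ndimage.binary_opening(I, B))
assert np.array_equal(closed, ndimage.binary_closing(I, B))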
Note: When a circular structuring element is used for opening, the analogy is often made of the shape of the opening being determined by a "rolling ball" reaching as far as it can on the inner boundary of a set. For morphological closing the ball rolls outside, and the shape of the closing is determined by how far the ball can reach into the boundary.

Equation (9-10) has a simple geometrical interpretation: The opening of A by B is the union of all the translations of B so that B fits entirely in A. Figure 9.8(a) shows an image containing a set (object) A and Fig. 9.8(b) is a solid, circular structuring element, B. Figure 9.8(c) shows some of the translations of B such that it is contained within A, and the set shown shaded in Fig. 9.8(d) is the union of all such possible translations. Observe that, in this case, the opening is a set composed of two disjoint subsets, resulting from the fact that B could not fit in the narrow segment in the center of A. As you will see shortly, the ability to eliminate regions narrower than the structuring element is one of the key features of morphological opening. The interpretation that the opening of A by B is the union of all the translations of B such that B fits entirely within A can be written in equation form as

A ∘ B = ∪ {(B)z | (B)z ⊆ A}    (9-12)

where ∪ denotes the union of the sets inside the braces.

FIGURE 9.8 (a) Image I, composed of set (object) A and background. (b) Structuring element, B. (c) Translations of B while being contained in A. (A is shown dark for clarity.) (d) Opening of A by B.


FIGURE 9.9 (a) Image I, composed of set (object) A, and background. (b) Structuring element B. (c) Translations of B such that B does not overlap any part of A. (A is shown dark for clarity.) (d) Closing of A by B.

Closing has a similar geometric interpretation, except that now we translate B outside A. The closing is then the complement of the union of all translations of B that do not overlap A. Figure 9.9 illustrates this concept. Note that the boundary of the closing is determined by the furthest points B could reach without going inside any part of A. Based on this interpretation, we can write the closing of A by B as

A • B = [∪ {(B)z | (B)z ∩ A = ∅}]c    (9-13)

EXAMPLE 9.3 : Morphological opening and closing.


Figure 9.10 shows in more detail the process and properties of opening and closing. Unlike Figs. 9.8
and 9.9, whose main objectives are overall geometrical interpretations, this figure shows the individual
processes and also pays more attention to the relationship between the scale of the final results and the
size of the structuring elements.
Figure 9.10(a) shows an image containing a single object (set) A, and a disk structuring element.
Figure 9.10(b) shows various positions of the structuring element during erosion. This process resulted
in the disjoint set in Fig. 9.10(c). Note how the bridge between the two main sections was eliminated.
Its width was thin in relation to the diameter of the structuring element, which could not be completely
contained in this part of the set, thus violating the definition of erosion. The same was true of the two
rightmost members of the object. Protruding elements where the disk did not fit were eliminated. Figure
9.10(d) shows the process of dilating the eroded set, and Fig. 9.10(e) shows the final result of opening.
Morphological opening removes regions that cannot contain the structuring element, smoothes object
contours, breaks thin connections, and removes thin protrusions.
Figures 9.10(f) through (i) show the results of closing A with the same structuring element. As with
opening, closing also smoothes the contours of objects. However, unlike opening, closing tends to join narrow breaks, fill long thin gulfs, and fill holes smaller than the structuring element. In this example,
the principal result of closing was that it filled the small gulf on the left of set A.


As with erosion and dilation, opening and closing are duals of each other with respect to set complementation and reflection:

(A ∘ B)c = (Ac • B̂)    (9-14)

and

(A • B)c = (Ac ∘ B̂)    (9-15)

We leave the proof of these equations as an exercise (see Problem 9.20).

FIGURE 9.10 Morphological opening and closing. (a) Image I, composed of a set (object) A and background; a solid, circular structuring element is shown also. (The dot is the origin.) (b) Structuring element in various positions. (c)–(i) The morphological operations used to obtain the opening and closing.


Morphological opening has the following properties:

(a) A ∘ B is a subset of A.
(b) If C is a subset of D, then C ∘ B is a subset of D ∘ B.
(c) (A ∘ B) ∘ B = A ∘ B.

Similarly, closing satisfies the following properties:

(a) A is a subset of A • B.
(b) If C is a subset of D, then C • B is a subset of D • B.
(c) (A • B) • B = A • B.

Note from condition (c) in both cases that multiple openings or closings of a set have
no effect after the operation has been applied once.

EXAMPLE 9.4 : Using opening and closing for morphological filtering.


Morphological operations can be used to construct filters similar in concept to the spatial filters discussed
in Chapter 3. The binary image in Fig. 9.11(a) shows a section of a fingerprint corrupted by noise. In
terms of our previous notation, A is the set of all foreground (white) pixels, which includes objects of
interest (the fingerprint ridges) as well as white specks of random noise. The background is black, as
before. The noise manifests itself as white specks on a dark background and dark specks on the white
components of the fingerprint. The objective is to eliminate the noise and its effects on the print, while
distorting it as little as possible. A morphological filter consisting of an opening followed by a closing can
be used to accomplish this objective.
Figure 9.11(b) shows the structuring element we used. The rest of Fig. 9.11 shows the sequence of
steps in the filtering operation. Figure 9.11(c) is the result of eroding A by B. The white speckled noise
in the background was eliminated almost completely in the erosion stage of opening because in this case
most noise components are smaller than the structuring element. The size of the noise elements (dark
spots) contained within the fingerprint actually increased in size. The reason is that these elements are
inner boundaries that increase in size as objects are eroded. This enlargement is countered by perform-
ing dilation on Fig. 9.11(c). Figure 9.11(d) shows the result.
The two operations just described constitute the opening of A by B. We note in Fig. 9.11(d) that the
net effect of opening was to reduce all noise components in both the background and the fingerprint
itself. However, new gaps between the fingerprint ridges were created. To counter this undesirable effect,
we perform a dilation on the opening, as shown in Fig. 9.11(e). Most of the breaks were restored, but the
ridges were thickened, a condition that can be remedied by erosion. The result, shown in Fig. 9.11(f), is
the closing of the opening of Fig. 9.11(d). This final result is remarkably clean of noise specks, but it still
shows some specks of noise that appear as single pixels. These could be eliminated by methods we will
discuss later in this chapter.
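The open-then-close filter of this example can be sketched as follows; the synthetic stripe image and noise level are stand-ins invented for illustration, since the fingerprint data are not reproduced here.

import numpy as np
from scipy import ndimage

rng = np.random.default_rng(1)

# Synthetic "ridges": vertical stripes, then salt-and-pepper noise (a stand-in for Fig. 9.11(a)).
clean = np.zeros((100, 100), dtype=bool)
clean[:, 4::8] = True
clean = ndimage.binary_dilation(clean, np.ones((1, 3), dtype=bool))   # stripes 3 pixels wide
noisy = clean ^ (rng.random(clean.shape) < 0.02)                      # flip about 2% of the pixels

B = np.ones((3, 3), dtype=bool)

# Morphological filter of Example 9.4: opening of A by B, followed by closing by B.
opened = ndimage.binary_opening(noisy, B)     # removes small white specks in the background
filtered = ndimage.binary_closing(opened, B)  # fills small dark specks and gaps in the stripes

# Number of incorrect pixels before and after filtering (the second count should be smaller).
print(np.count_nonzero(noisy ^ clean), np.count_nonzero(filtered ^ clean))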

9.4 THE HIT-OR-MISS TRANSFORM

The morphological hit-or-miss transform (HMT) is a basic tool for shape detection.
Let I be a binary image composed of foreground (A) and background pixels. Unlike the morphological methods discussed thus far, the HMT utilizes two


FIGURE 9.11 (a) Noisy image. (b) Structuring element. (c) Eroded image. (d) Dilation of the erosion (opening of A). (e) Dilation of the opening. (f) Closing of the opening. (Original image courtesy of the National Institute of Standards and Technology.)

structuring elements: B1, for detecting shapes in the foreground, and B2, for detecting shapes in the background. The HMT of image I is defined as

I ⊛ B1,2 = {z | (B1)z ⊆ A and (B2)z ⊆ Ac}
         = (A ⊖ B1) ∩ (Ac ⊖ B2)    (9-16)

where the second line follows from the definition of erosion in Eq. (9-3). In words, this equation says that the morphological HMT is the set of translations, z, of structuring elements B1 and B2 such that, simultaneously, B1 found a match in the foreground (i.e., B1 is contained in A) and B2 found a match in the background (i.e., B2 is contained in Ac). The word "simultaneous" implies that z is the same translation of both structuring elements. The word "miss" in the HMT arises from the fact that B2 finding a match in Ac is the same as B2 not finding (missing) a match in A.

Note: With reference to the explanation of Eq. (9-4), we show the morphological HMT operation working directly on image I, to make it explicit that the structuring elements work on sets of foreground and background pixels simultaneously.
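Equation (9-16) can be sketched with SciPy's binary_hit_or_miss, which takes the foreground SE (B1) and the background SE (B2) separately; the toy objects below are assumptions chosen so that B1 matches a 3 × 3 square and B2 matches its one-pixel background border.

import numpy as np
from scipy import ndimage

I = np.zeros((12, 12), dtype=bool)
I[3:6, 3:6] = True                       # object D: a 3 x 3 square
I[7:10, 6:10] = True                     # another, larger object

B1 = np.ones((3, 3), dtype=bool)         # matches D in the foreground
B2 = np.zeros((5, 5), dtype=bool)        # a one-pixel border around D, matched in the background
B2[0, :] = B2[-1, :] = B2[:, 0] = B2[:, -1] = True

# Eq. (9-16): (A eroded by B1) intersected with (A complement eroded by B2).
hits = ndimage.binary_erosion(I, B1) & ndimage.binary_erosion(~I, B2)

assert np.array_equal(hits, ndimage.binary_hit_or_miss(I, structure1=B1, structure2=B2))
print(np.argwhere(hits))                 # expected: only the origin (center) of D, at (4, 4)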
Figure 9.12 illustrates the concepts just introduced. Suppose that we want to find
the location of the origin of object (set) D in image I. Here, A is the union of all
object sets, so D is a subset of A. The need for two structuring elements capable


FIGURE 9.12 (a) Image I, consisting of a foreground (1's) equal to the union, A, of a set of objects, and a background of 0's. (b) Image Ic, with its foreground defined as Ac. (c) Structuring elements designed to detect object D. (d) Erosion of A by B1. (e) Erosion of Ac by B2. (f) Intersection of (d) and (e), showing the location of the origin of D, as desired. The dots indicate the origin of their respective components. Each dot is a single pixel.

of detecting properties of both the foreground and background becomes immediately obvious. All three objects are composed of foreground pixels, and one way of
ately obvious. All three objects are composed of foreground pixels, and one way of
explaining why they appear as different shapes is because each occupies a different
area of the background. In other words, the nature of a shape is determined by the
geometrical arrangement of both foreground and background pixels.
Figure 9.12(a) shows that I is composed of foreground (A) and background pixels.
Figure 9.12(b) is I c, the complement of I. The foreground of I c is defined as the set of
pixels in Ac, and the background is the union of the complement of the three objects.
Figure 9.12(c) shows the two structuring elements needed to detect D. Element B1 is
equal to D itself. As Fig. 9.12(d) shows, the erosion of A by B1 contains the origin of D, as desired, but it also contains parts of object C.


Structuring element B2 is designed to detect D in Ic. Because D is composed of background elements in Ic, and erosion works with foreground elements, B2 has to be designed to detect the border of D, which is composed of foreground pixels in Ic. The SE in Fig. 9.12(c) does precisely this. It consists of a rectangle of foreground elements one pixel thick. The size of the rectangle is such that it encloses D. Figure 9.12(e) shows (shaded) the erosion of the foreground of Ic by B2. It contains the origin of D, but it also contains parts of sets Ac and C. (The outer shaded area in Fig. 9.12(e) is larger than shown (see Problem 9.25); the result was cropped to the same size as image I for consistency.) The only element that is common to Figs. 9.12(d) and (e) is the origin of D, so the intersection of these two sets of elements gives the location of that point, as desired. Figure 9.12(f) shows the final result.
The preceding explanation is the classic way of presenting the HMT using erosion,
which is defined only for foreground pixels. A good question at this point is: Why
not try to detect D directly in image I using a single structuring element, instead
of going thorough such a laborious process? The answer is that it is possible to do
so, but not in the “traditional” context of erosion the way we defined it in Eqs. (9-3)
and (9-5). In order to detect D directly in image I, we would have to be able to pro-
cess foreground and background pixels simultaneously, rather than processing just
foreground pixels, as required by the definition of erosion.
To show how this can be done for the example in Fig. 9.12, we define a structuring
element, B, identical to D, but having in addition a border of background elements
with a width of one pixel. We can use a structuring element formed in such a way to
restate the HMT as

I ⊛ B = {z | (B)z ⊆ I}    (9-17)

The form is the same as Eq. (9-3), but now we test to see if ( B)z is a subset of image I,
which is composed of both foreground and background pixels. This formulation is
general, in the sense that B can be structured to detect any arrangement of pixels in
image I, as Figs. 9.13 and 9.14 will illustrate.
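A direct, brute-force reading of Eq. (9-17) for an SE containing foreground, background, and don't-care entries might look like the sketch below; the 1/0/−1 encoding and the particular corner-detector SE are illustrative assumptions, not taken from Fig. 9.14.

import numpy as np

def hmt_single_se(I, B):
    # Eq. (9-17): mark z where (B)z agrees with image I at every position that is
    # not a don't-care.  B entries: 1 = foreground, 0 = background, -1 = don't care.
    out = np.zeros_like(I, dtype=bool)
    pr, pc = B.shape[0] // 2, B.shape[1] // 2          # SE origin at its center
    care = B != -1
    for r in range(pr, I.shape[0] - pr):
        for c in range(pc, I.shape[1] - pc):
            patch = I[r - pr:r + pr + 1, c - pc:c + pc + 1]
            if np.array_equal(patch[care], B[care].astype(bool)):
                out[r, c] = True
    return out

# A hypothetical upper-right-corner detector in the spirit of the second row of Fig. 9.14.
B = np.array([[-1,  0,  0],
              [ 1,  1,  0],
              [-1,  1, -1]])

I = np.zeros((8, 8), dtype=bool)
I[2:6, 1:5] = True                       # a square object
print(np.argwhere(hmt_single_se(I, B)))  # expected: the object's top-right corner, (2, 4)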
Figure 9.13 shows graphically the same solution as Fig. 9.12(f), but using the
single structuring element discussed in the previous paragraph. Figure 9.14 shows
several examples based on using Eq. (9-17). The first row shows the result of using
a small SE composed of both foreground (shaded) and background elements. This
SE is designed to detect one-pixel holes (i.e., one background pixel surrounded by a
connected border of foreground pixels) contained in image I. The SE in the second
row is capable of detecting the foreground corner pixel of the top, right corner of
the object in I. Using this SE in Eq. (9-17) yielded the image on the right. As you
can see, the correct pixel was identified. The last row of Fig. 9.14 is more interest-
ing, as it shows a structuring element composed of foreground, background, and
“don’t care” elements which, as mentioned earlier, we denote by ×’s. You can think
of the value of a don’t care element as always matching its corresponding pixel in
an image. In this example, when the SE is centered on the top, right corner pixel,
the don’t care elements in the top of the SE can be considered to be background,
and the don’t care elements on the bottom row as foreground, producing a correct


FIGURE 9.13 Same solution as in Fig. 9.12, but using Eq. (9-17) with a single structuring element.

match. When the SE is centered on the bottom, right corner pixel, the role of the
don’t care elements is reversed, again resulting in a correct match. The other border
pixels between the two corners were similarly detected by considering all don’t care
elements as foreground. Thus, using don’t care elements increases the flexibility of
structuring elements to perform multiple roles.

9.5 SOME BASIC MORPHOLOGICAL ALGORITHMS

With the preceding discussion as a foundation, we are now ready to consider some
practical uses of morphology. When dealing with binary images, one of the principal
applications of morphology is in extracting image components that are useful in the

FIGURE 9.14 Three examples of using a single structuring element and Eq. (9-17) to detect specific features. First row: detection of single-pixel holes. Second row: detection of an upper-right corner. Third row: detection of multiple features.


representation and description of shape. In particular, we consider morphological
algorithms for extracting boundaries, connected components, the convex hull, and
the skeleton of a region. We also develop several methods (for region filling, thinning,
thickening, and pruning) that are used frequently for pre- or post-processing. We
make extensive use in this section of “mini-images,” designed to clarify the mechan-
ics of each morphological method as we introduce it. These binary images are shown
graphically with foreground (1’s) shaded and background (0’s) in white, as before.

BOUNDARY EXTRACTION
The boundary of a set A of foreground pixels, denoted by β(A), can be obtained by first eroding A by a suitable structuring element B, and then performing the set difference between A and its erosion. That is,

β(A) = A − (A ⊖ B)    (9-18)
Figure 9.15 illustrates the mechanics of boundary extraction. It shows a simple binary
object, a structuring element B, and the result of using Eq. (9-18). The structuring
element in Fig. 9.15(b) is among the most frequently used, but it is not unique. For
example, using a 5 × 5 structuring element of 1’s would result in a boundary between
2 and 3 pixels thick. It is understood that the image in Fig. 9.15(a) was padded with
a border of background elements, and that the results were cropped back to the
original size after the morphological operations were completed.
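Equation (9-18) is a one-liner once erosion is available; a minimal sketch using SciPy follows, with an example square that is an assumption rather than Fig. 9.15's set.

import numpy as np
from scipy import ndimage

I = np.zeros((10, 10), dtype=bool)
I[2:8, 2:8] = True                        # set A: a solid 6 x 6 square

B = np.ones((3, 3), dtype=bool)           # 3 x 3 SE of 1's, as in Fig. 9.15(b)

# Eq. (9-18): boundary = A minus the erosion of A by B (set difference as AND-NOT).
boundary = I & ~ndimage.binary_erosion(I, B)

print(boundary.astype(int))               # a one-pixel-thick square outline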

EXAMPLE 9.5 : Boundary extraction.


Figure 9.16 further illustrates the use of Eq. (9-18) using a 3 × 3 structuring element of 1’s. As before
when working with images, we show foreground pixels (1’s) in white and background pixels (0’s) in
black. The elements of the SE, which are 1’s, also are treated as white. Because of the size of the structur-
ing element used, the boundary in Fig. 9.16(b) is one pixel thick.

HOLE FILLING
As mentioned in the discussion of Fig. 9.14, a hole may be defined as a background
region surrounded by a connected border of foreground pixels. In this section, we
develop an algorithm based on set dilation, complementation, and intersection for

FIGURE 9.15 (a) Set, A, of foreground pixels. (b) Structuring element. (c) A eroded by B. (d) Boundary of A.


FIGURE 9.16 (a) A binary image. (b) Result of using Eq. (9-18) with the structuring element in Fig. 9.15(b).

filling holes in an image. Let A denote a set whose elements are 8-connected bound-
aries, with each boundary enclosing a background region (i.e., a hole). Given a point
in each hole, the objective is to fill all the holes with foreground elements (1’s).
We begin by forming an array, X0, of 0's (the same size as I, the image containing A), except at locations in X0 that correspond to pixels that are known to be holes, which we set to 1. Then, the following procedure fills all the holes with 1's:

Xk = (Xk−1 ⊕ B) ∩ Ic    k = 1, 2, 3, …    (9-19)

where B is the symmetric structuring element in Fig. 9.17(c). The algorithm terminates at iteration step k if Xk = Xk−1. Then, Xk contains all the filled holes. The set union of Xk and I contains all the filled holes and their boundaries.

Note: Remember, the dilation of image X by B is the dilation of the foreground elements of X by B.
The dilation in Eq. (9-19) would fill the entire area if left unchecked, but the
intersection at each step with I c limits the result to inside the region of interest. This is
our first example of how a morphological process can be conditioned to meet a desired
property. In the current application, the process is appropriately called conditional
dilation. The rest of Fig. 9.17 illustrates further the mechanics of Eq. (9-19). This exam-
ple only has one hole, but the concept applies to any finite number of holes, assuming
that a point inside each hole is given (we remove this requirement in Section 9.6).
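The conditional dilation of Eq. (9-19) can be sketched as below; the ring-shaped test image, the seed location, and the use of a cross-shaped SE (the text's Fig. 9.17(c) is not reproduced here) are assumptions, and SciPy's binary_fill_holes is shown only as an independent cross-check.

import numpy as np
from scipy import ndimage

I = np.zeros((9, 9), dtype=bool)           # a square ring of foreground pixels
I[2:7, 2:7] = True
I[3:6, 3:6] = False                        # the enclosed hole

B = np.array([[0, 1, 0],
              [1, 1, 1],
              [0, 1, 0]], dtype=bool)      # symmetric, cross-shaped SE (assumed form)

X = np.zeros_like(I)
X[4, 4] = True                             # a point known to lie inside the hole

# Eq. (9-19): Xk = (Xk-1 dilated by B) AND Ic, iterated until nothing changes.
while True:
    X_next = ndimage.binary_dilation(X, B) & ~I
    if np.array_equal(X_next, X):
        break
    X = X_next

filled = X | I                             # union of the filled hole and its boundary
print(np.array_equal(filled, ndimage.binary_fill_holes(I)))   # expected: True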

EXAMPLE 9.6 : Morphological hole filling.


Figure 9.18(a) shows an image of white circles with black holes. An image such as this might result from
thresholding into two levels a scene containing polished spheres (e.g., ball bearings). The dark circular
areas inside the spheres would result from reflections. The objective is to eliminate the reflections by
filling the holes in the image. Figure 9.18(b) shows the result of filling all the spheres. Because it must be
known whether black points are background points or sphere inner points (i.e., holes), fully automating
this procedure requires that additional “intelligence” be built into the algorithm. We will give a fully
automatic approach in Section 9.6 based on morphological reconstruction (see Problem 9.36 also).


FIGURE 9.17 Hole filling. (a) Set A (shown shaded) contained in image I. (b) Complement of I. (c) Structuring element B. Only the foreground elements are used in computations. (d) Initial point inside hole, set to 1. (e)–(h) Various steps of Eq. (9-19). (i) Final result [union of (a) and (h)].

EXTRACTION OF CONNECTED COMPONENTS


Note: Connectivity and connected components are discussed in Section 2.5.

Being able to extract connected components from a binary image is central to many automated image analysis applications. Let A be a set of foreground pixels consisting of one or more connected components, and form an image X0 (of the same size as I, the image containing A) whose elements are 0's (background values), except at each location known to correspond to a point in each connected component in A,

a b
FIGURE 9.18
(a) Binary image.
The white dots
inside the regions
(shown enlarged
for clarity) are the
starting points for
the hole-filling
algorithm.
(b) Result of
filling all holes.


FIGURE 9.19 (a) Structuring element. (b) Image containing a set with one connected component. (c) Initial array containing a 1 in the region of the connected component. (d)–(g) Various steps in the iteration of Eq. (9-20).

which we set to 1 (foreground value). The objective is to start with X 0 and find all
the connected components in I. The following iterative procedure accomplishes this:

Xk = (Xk−1 ⊕ B) ∩ I    k = 1, 2, 3, …    (9-20)

where B is the SE in Fig. 9.19(a). The procedure terminates when Xk = Xk−1, with Xk containing all the connected components of foreground pixels in the image.
Both Eqs. (9-19) and (9-20) use conditional dilation to limit the growth of set dila-
tion, but Eq. (9-20) uses I instead of I c . This is because here we are looking for
foreground points, while the objective of (9-19) is to find background points. Figure
9.19 illustrates the mechanics of Eq. (9-20), with convergence being achieved for
k = 6. Note that the shape of the structuring element used is based on 8-connec-
tivity between pixels. As in the hole-filling algorithm, Eq. (9-20) is applicable to
any finite number of connected components contained in I, assuming that a point
is known in each. See Problem 9.37 for a completely automated procedure that
removes this requirement.
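Equation (9-20) differs from hole filling only in the conditioning image; the sketch below (seed point and toy components are assumptions) also shows scipy.ndimage.label, which performs the fully automatic labeling used later in Example 9.7.

import numpy as np
from scipy import ndimage

I = np.zeros((10, 10), dtype=bool)
I[1:4, 1:4] = True                         # connected component 1
I[6:9, 5:9] = True                         # connected component 2

B = np.ones((3, 3), dtype=bool)            # SE based on 8-connectivity, as in Fig. 9.19(a)

X = np.zeros_like(I)
X[2, 2] = True                             # a point known to lie in component 1

# Eq. (9-20): Xk = (Xk-1 dilated by B) AND I, iterated until nothing changes.
while True:
    X_next = ndimage.binary_dilation(X, B) & I
    if np.array_equal(X_next, X):
        break
    X = X_next
# X now holds exactly the connected component containing the seed.

# Fully automatic alternative (no seed needed): label every component at once.
labels, n = ndimage.label(I, structure=np.ones((3, 3), dtype=int))
print(n, np.array_equal(X, labels == labels[2, 2]))   # expected: 2 True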

EXAMPLE 9.7 : Using connected components to detect foreign objects in packaged food.
Connected components are used frequently for automated inspection. Figure 9.20(a) shows an X-ray
image of a chicken breast that contains bone fragments. It is important to be able to detect such foreign
objects in processed foods before shipping. In this application, the density of the bones is such that their
nominal intensity values are significantly different from the background. This makes extraction of the
bones from the background a simple matter by using a single threshold (thresholding was introduced in
Section 3.1 and we will discuss it in more detail in Section 10.3). The result is the binary image in Fig. 9.20(b).
The most significant feature in this figure is the fact that the points that remain after thresholding
are clustered into objects (bones), rather than being scattered. We can make sure that only objects of


FIGURE 9.20 (a) X-ray image of a chicken filet with bone fragments. (b) Thresholded image (shown as the negative for clarity). (c) Image eroded with a 5 × 5 SE of 1's. (d) Number of pixels in the connected components of (c). (Image (a) courtesy of NTB Elektronische Geraete GmbH, Diepholz, Germany, www.ntbxray.com.)

Connected component    No. of pixels in connected comp
01                     11
02                     9
03                     9
04                     39
05                     133
06                     1
07                     1
08                     743
09                     7
10                     11
11                     11
12                     9
13                     9
14                     674
15                     85

“significant” size are contained in the binary image by eroding its foreground. In this example, we define
as significant any object that remains after erosion with a 5 × 5 SE of 1’s. Figure 9.20(c) shows the result
of erosion. The next step is to analyze the size of the objects that remain. We label (identify) these
objects by extracting the connected components in the image. The table in Fig. 9.20(d) lists the results
of the extraction. There are 15 connected components, with four of them being dominant in size. This is
enough evidence to conclude that significant, undesirable objects are contained in the original image. If
needed, further characterization (such as shape) is possible using the techniques discussed in Chapter 11.

CONVEX HULL
A set, S, of points in the Euclidean plane is said to be convex if and only if a straight
line segment joining any two points in S lies entirely within S. The convex hull, H,
of S is the smallest convex set containing S. The convex deficiency of S is defined as
the set difference H − S. Unlike the Euclidean plane, the digital image plane (see
Fig. 2.19) only allows points at discrete coordinates. Thus, the sets with which we
work are digital sets. The same concepts of convexity are applicable to digital sets,
but the definition of a convex digital set is slightly different. A digital set, A, is said
to be convex if and only if its Euclidean convex hull only contains digital points

belonging to A. A simple way to visualize if a digital set of foreground points is convex is to join its boundary points by straight (continuous) Euclidean line segments.
If only foreground points are contained within the set formed by the line segments,
then the set is convex; otherwise it is not. The definitions of convex hull and convex
deficiency given above for S, extend directly to digital sets. The following morpho-
logical algorithm can be used to obtain an approximation of the convex hull of a set
A of foreground pixels, embedded in a binary image, I.
Let B^i, i = 1, 2, 3, 4, denote the four structuring elements in Fig. 9.21(a). The procedure consists of implementing the morphological equation

X_k^i = (X_{k−1}^i ⊛ B^i) ∪ X_{k−1}^i    i = 1, 2, 3, 4 and k = 1, 2, 3, …    (9-21)

with X_0^i = I. When the procedure converges using the ith structuring element (i.e., when X_k^i = X_{k−1}^i), we let D^i = X_k^i. Then, the convex hull of A is the union of the four results:

C(A) = D^1 ∪ D^2 ∪ D^3 ∪ D^4    (9-22)

Thus, the method consists of iteratively applying the hit-or-miss transform to I with B^1 until convergence, then letting D^1 = X_k^1, where k is the step at which convergence occurred. The procedure is repeated with B^2 (applied to I) until no further changes occur, and so on. The union of the four resulting D^i constitutes the convex hull of A. The algorithm is initialized with k = 0 and X_0^i = I every time that i (i.e., the structuring element) changes.
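A sketch of the iteration in Eqs. (9-21) and (9-22) follows. The hit-or-miss helper uses the 1/0/−1 encoding from the earlier sketch, and the particular SE (a column of foreground on the left, background at the center, don't-cares elsewhere) together with its three rotations is an assumption standing in for Fig. 9.21(a), which is not reproduced here.

import numpy as np

def hmt(X, B):
    # Hit-or-miss with a mixed SE: 1 = foreground, 0 = background, -1 = don't care.
    out = np.zeros_like(X, dtype=bool)
    p = B.shape[0] // 2
    care = B != -1
    for r in range(p, X.shape[0] - p):
        for c in range(p, X.shape[1] - p):
            patch = X[r - p:r + p + 1, c - p:c + p + 1]
            out[r, c] = np.array_equal(patch[care], B[care].astype(bool))
    return out

B1 = np.array([[1, -1, -1],
               [1,  0, -1],
               [1, -1, -1]])                      # assumed form of B^1
Bs = [np.rot90(B1, -k) for k in range(4)]         # the four SEs as 90-degree rotations

I = np.zeros((13, 13), dtype=bool)
I[3:10, 6] = True                                 # set A: a plus-shaped object
I[6, 3:10] = True

# Eq. (9-21): X_k = (X_k-1 HMT B^i) OR X_k-1 until stable; Eq. (9-22): union over i.
C = np.zeros_like(I)
for B in Bs:
    X = I.copy()
    while True:
        X_next = hmt(X, B) | X
        if np.array_equal(X_next, X):
            break
        X = X_next
    C |= X
print(C.astype(int))                              # the plus grows toward a convex blob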
Figure 9.21 illustrates the use of Eqs. (9-21) and (9-22). Figure 9.21(a) shows the
structuring elements used to extract the convex hull. The origin of each element is
at its center. As before, the × entries indicate “don’t care” elements. Recall that the
HMT is said to have found a match of structuring element Bi in a 3 × 3 region of
I, if all the elements of Bi find corresponding matches in that region. As noted ear-
lier, when computing a match, a “don’t care” element can be interpreted as always
matching the value of its corresponding element in the image. Note in Fig. 9.21(a)
that Bi is a clockwise rotation of Bi−1 by 90°.
Figure 9.21(b) shows a set A for which the convex hull is sought. As before, the
set is embedded in an array of background elements to form an image, I. Starting
with X_0^1 = I resulted in the set in Fig. 9.21(c) after five iterations of Eq. (9-21). Then, letting X_0^2 = I and again using Eq. (9-21) resulted in the set in Fig. 9.21(d) (con-
vergence was achieved in only two steps in this case). The next two results were
obtained in the same manner. Finally, forming the union of the sets in Figs. 9.21(c),
(d), (e), and (f) resulted in the convex hull in Fig. 9.21(g). The contribution of each
structuring element is highlighted in the composite set shown in Fig. 9.21(h).
One obvious shortcoming of the procedure just discussed is that the convex hull
can grow beyond the minimum dimensions required to guarantee convexity, thus
violating the definition of the convex hull. This, in fact, is what happened in this
case. One simple approach to reduce this growth is to place limits so that it does
not extend beyond the vertical and horizontal dimensions of set A. Imposing this


FIGURE 9.21 (a) Structuring elements. (b) Set A. (c)–(f) Results of convergence with the structuring elements shown in (a). (g) Convex hull. (h) Convex hull showing the contribution of each structuring element.

limitation on the example in Fig. 9.21 resulted in Fig. 9.22(a). Joining the boundary
pixels of the reduced set (remember, the pixels are the center points of the squares)
show that no set points lie outside these lines, indicating that the set is convex. By
inspection, you can see that no points can be deleted from this set without losing
convexity, so the reduced set is the convex hull of A.
Of course, the limits we used to produce Fig. 9.22 do not constitute a general
approach for obtaining the minimum convex set enclosing a set in question; it is
simply an easy-to-implement heuristic. The reason why the convex hull algorithm
did not yield a closer approximation of the actual convex hull is because of the
structuring elements used. The SEs in Fig. 9.21(a) “look” only in four orthogonal
directions. We could achieve greater accuracy by looking in additional directions,
such as the diagonals, for example. The price paid is increased algorithm complexity
and a higher computational load.


FIGURE 9.22 (a) Result of limiting growth of the convex hull algorithm. (b) Straight lines connecting the boundary points show that the new set is convex also.

THINNING
Thinning of a set A of foreground pixels by a structuring element B, denoted A ⊗ B, can be defined in terms of the hit-or-miss transform:

A ⊗ B = A − (A ⊛ B)
      = A ∩ (A ⊛ B)c    (9-23)

where the second line follows from the definition of set difference given in Eq. (2-40). A more useful expression for thinning A symmetrically is based on a sequence of structuring elements:

{B} = {B^1, B^2, B^3, …, B^n}    (9-24)

Using this concept, we now define thinning by a sequence of structuring elements as

A ⊗ {B} = ((…((A ⊗ B^1) ⊗ B^2) …) ⊗ B^n)    (9-25)

The process is to thin A by one pass with $B^1$, then thin the result with one pass of $B^2$, and so on, until A is thinned with one pass of $B^n$. The entire process is repeated until
no further changes occur after one complete pass through all structuring elements.
Each individual thinning pass is performed using Eq. (9-23).
As before, we assume that the image containing A was padded to accommodate all excursions of B, and that the result was cropped. We show only A for simplicity.
Figure 9.23(a) shows a set of structuring elements used routinely for thinning (note that $B^i$ is equal to $B^{i-1}$ rotated clockwise by 45°), and Fig. 9.23(b) shows a set A to be thinned, using the procedure just discussed. Figure 9.23(c) shows the result of thinning A with one pass of $B^1$ to obtain $A_1$. Figure 9.23(d) is the result of thinning $A_1$ with $B^2$, and Figs. 9.23(e) through (k) show the results of passes with the remaining structuring elements (there were no changes from $A_7$ to $A_8$, or from $A_9$ to $A_{11}$). Convergence was achieved after the second pass of $B^6$. Figure 9.23(l) shows the thinned result. Finally, Fig. 9.23(m) shows the thinned set converted to m-connectivity (see Section 2.5 and Problem 9.29) to eliminate multiple paths.
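The following is a minimal sketch of this sequential thinning procedure [Eqs. (9-23) through (9-25)] using SciPy's hit-or-miss transform. The two base masks are assumptions consistent with the commonly used thinning detectors (the book defines the eight SEs only graphically in Fig. 9.23(a)); 1 means the pixel must be foreground, 0 background, and −1 is a “don't care.”

```python
import numpy as np
from scipy import ndimage

# Two base masks; rotating each clockwise by 90 degrees three times and
# interleaving the results gives the eight-element sequence {B} of Eq. (9-24),
# in which B^i is B^(i-1) rotated clockwise by 45 degrees.
B1 = np.array([[ 0,  0,  0],
               [-1,  1, -1],
               [ 1,  1,  1]])
B2 = np.array([[-1,  0,  0],
               [ 1,  1,  0],
               [-1,  1, -1]])
SEQ = [se for pair in zip([np.rot90(B1, -k) for k in range(4)],
                          [np.rot90(B2, -k) for k in range(4)]) for se in pair]

def thin_once(A, B):
    """One thinning pass: A - (A hit-or-miss B), Eq. (9-23)."""
    hm = ndimage.binary_hit_or_miss(A, structure1=(B == 1), structure2=(B == 0))
    return A & ~hm

def thin(A):
    """Thin A with the sequence {B} until no further changes occur, Eq. (9-25)."""
    A = A.astype(bool)
    while True:
        prev = A.copy()
        for B in SEQ:
            A = thin_once(A, B)
        if np.array_equal(A, prev):
            return A
```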

THICKENING
Thickening is the morphological dual of thinning and is defined by the expression

$A \odot B = A \cup (A \circledast B)$   (9-26)


FIGURE 9.23 (a) Structuring elements. (b) Set A. (c) Result of thinning A with $B^1$ (shaded). (d) Result of thinning $A_1$ with $B^2$. (e)–(i) Results of thinning with the next six SEs. (There was no change between $A_7$ and $A_8$.) (j)–(k) Result of using the first four elements again. (l) Result after convergence. (m) Result converted to m-connectivity.
where B is a structuring element suitable for thickening. As in thinning, thickening can be defined as a sequential operation:

$A \odot \{B\} = ((\cdots((A \odot B^1) \odot B^2) \cdots) \odot B^n)$   (9-27)

The structuring elements used for thickening have the same form as those shown
in Fig. 9.23(a), but with all 1’s and 0’s interchanged. However, a separate algorithm
for thickening is seldom used in practice. Instead, the usual procedure is to thin the
background of the set in question, then complement the result. In other words, to
thicken a set A we form Ac , thin Ac , and then complement the thinned set to obtain
the thickening of A. Figure 9.24 illustrates this procedure. As before, we show only
set A and image I, and not the padded version of I.
Depending on the structure of A, this procedure can result in disconnected points,
as Fig. 9.24(d) shows. Hence thickening by this method usually is followed by post-
processing to remove disconnected points. Note from Fig. 9.24(c) that the thinned
background forms a boundary for the thickening process. This useful feature is not
present in the direct implementation of thickening using Eq. (9-27), and it is one of
the principal reasons for using background thinning to accomplish thickening.
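A small sketch of this background-thinning approach to thickening follows. It assumes the thin() helper from the previous sketch and an image padded as in the text; the removal of disconnected points is an assumed heuristic (keep only components of the thickened result that overlap the original set).

```python
import numpy as np
from scipy import ndimage

def thicken(A):
    """Thicken A by thinning its background and complementing the result."""
    A = A.astype(bool)
    thick = ~thin(~A)                      # complement, thin, complement
    # Post-processing (assumed heuristic): remove disconnected points by
    # keeping only connected components that intersect the original set A.
    labels, _ = ndimage.label(thick)
    keep = np.unique(labels[A])
    keep = keep[keep > 0]
    return np.isin(labels, keep)
```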


FIGURE 9.24 (a) Set A. (b) Complement of A. (c) Result of thinning the complement. (d) Thickened set obtained by complementing (c). (e) Final result, with no disconnected points.

SKELETONS

We will discuss skeletons in more detail in Section 11.2.
As Fig. 9.25 shows, the notion of a skeleton, S(A), of a set A is intuitively simple. We deduce from this figure that

(a) If z is a point of S(A), and $(D)_z$ is the largest disk centered at z and contained in A, one cannot find a larger disk (not necessarily centered at z) containing $(D)_z$ and simultaneously included in A. A disk $(D)_z$ satisfying these conditions is called a maximum disk.
(b) If $(D)_z$ is a maximum disk, it touches the boundary of A at two or more different places.

The skeleton of A can be expressed in terms of erosions and openings. That is, it can be shown (Serra [1982]) that

$S(A) = \bigcup_{k=0}^{K} S_k(A)$   (9-28)

with

$S_k(A) = (A \ominus kB) - (A \ominus kB) \circ B$   (9-29)

where B is a structuring element, and $(A \ominus kB)$ indicates k successive erosions starting with A; that is, A is first eroded by B, the result is eroded by B, and so on:

$(A \ominus kB) = ((\cdots((A \ominus B) \ominus B) \ominus \cdots) \ominus B)$   (9-30)

k times. K in Eq. (9-28) is the last iterative step before A erodes to an empty set. In other words,

$K = \max\{\, k \mid (A \ominus kB) \neq \varnothing \,\}$   (9-31)

The formulation in Eqs. (9-28) and (9-29) indicates that S(A) can be obtained as the union of the skeleton subsets $S_k(A)$, k = 0, 1, 2, …, K.


FIGURE 9.25 (a) Set A. (b) Various positions of maximum disks whose centers partially define the skeleton of A. (c) Another maximum disk, whose center defines a different segment of the skeleton of A. (d) Complete skeleton (dashed).

It can be shown (Serra [1982]) that A can be reconstructed from these subsets:

$A = \bigcup_{k=0}^{K} \left( S_k(A) \oplus kB \right)$   (9-32)

where $(S_k(A) \oplus kB)$ denotes k successive dilations, starting with $S_k(A)$; that is,

$(S_k(A) \oplus kB) = ((\cdots((S_k(A) \oplus B) \oplus B) \oplus \cdots) \oplus B)$   (9-33)

EXAMPLE 9.8 : Computing the skeleton of a simple set.


Figure 9.26 illustrates the concepts just discussed. The first column shows the original set (at the top)
and two erosions by the structuring element B shown in the figure. Note that one more erosion would
yield the empty set, so K = 2 in this case. The second column shows the opening by B of the sets in the
first column. These results are easily explained by the fitting characterization of the opening operation
discussed in connection with Fig. 9.8. The third column contains the set differences between the first and
second columns. Thus, the three entries in the third column are S0 ( A), S1 ( A), and S2 ( A), respectively.
The fourth column contains two partial skeletons, and the final result at the bottom of the column.
The final skeleton not only is thicker than it needs to be but, more important, it is not connected. This
result is not unexpected, as nothing in the preceding formulation of the morphological skeleton guar-
antees connectivity. Morphology produces an elegant formulation in terms of erosions and openings of
the given set. However, heuristic formulations (see Section 11.2) are needed if, as is usually the case, the
skeleton must be maximally thin, connected, and minimally eroded.


FIGURE 9.26 Implementation of Eqs. (9-28) through (9-33). The original set is at the top left, and its morphological skeleton is at the bottom of the fourth column. The reconstructed set is at the bottom of the sixth column. The columns show, left to right, $A \ominus kB$, $(A \ominus kB) \circ B$, $S_k(A)$, $\bigcup_k S_k(A)$, $S_k(A) \oplus kB$, and $\bigcup_k (S_k(A) \oplus kB)$.

The entries in the fifth and sixth columns deal with reconstructing the original set from its skeleton subsets. The fifth column shows the dilations of $S_k(A)$; that is, $S_0(A)$, $S_1(A) \oplus B$, and $S_2(A) \oplus 2B = (S_2(A) \oplus B) \oplus B$. Finally, the last column shows the reconstruction of set A which, according to Eq. (9-32), is the union of the dilated skeleton subsets shown in the fifth column.
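The following compact sketch implements Eqs. (9-28) through (9-33) with SciPy's binary morphology; a 3 × 3 structuring element of 1's is assumed, and the function names are illustrative only.

```python
import numpy as np
from scipy import ndimage

B = np.ones((3, 3), bool)

def skeleton_subsets(A, B=B):
    """Return [S_0(A), S_1(A), ..., S_K(A)] as defined in Eq. (9-29)."""
    A = A.astype(bool)
    subsets = []
    eroded = A                                        # A eroded k times, Eq. (9-30)
    while eroded.any():                               # stops at K, Eq. (9-31)
        opened = ndimage.binary_opening(eroded, B)
        subsets.append(eroded & ~opened)              # (A erode kB) - (A erode kB) open B
        eroded = ndimage.binary_erosion(eroded, B)
    return subsets

def skeleton(A, B=B):
    """S(A): union of the skeleton subsets, Eq. (9-28)."""
    S = np.zeros(A.shape, bool)
    for Sk in skeleton_subsets(A, B):
        S |= Sk
    return S

def reconstruct(subsets, B=B):
    """Rebuild A from its skeleton subsets by k successive dilations, Eq. (9-32)."""
    A = np.zeros(subsets[0].shape, bool)
    for k, Sk in enumerate(subsets):
        D = Sk.copy()
        for _ in range(k):                            # S_k(A) dilated k times, Eq. (9-33)
            D = ndimage.binary_dilation(D, B)
        A |= D
    return A
```

As in Example 9.8, reconstruct(skeleton_subsets(A)) should recover the original set, while skeleton() by itself is generally neither connected nor maximally thin.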

PRUNING
Pruning methods are an essential complement to thinning and skeletonizing algo-
rithms, because these procedures tend to leave spurs (“parasitic” components) that
need to be “cleaned up” by postprocessing. We begin the discussion with a pruning
problem, then develop a solution based on the material introduced in the preceding
sections. Thus, we take this opportunity to illustrate how to solve a problem by com-
bining several of the morphological techniques discussed up to this point.
A common approach in the automated recognition of hand-printed characters is
to analyze the shape of the skeleton of a character. These skeletons often contain
spurs, caused during erosion by noise and non-uniformities in the character strokes.


In this section we develop a morphological technique for handling this problem, starting with the assumption that the length of a parasitic component does not exceed a specified number of pixels.
We may define an end point as the center point of a 3 × 3 region that satisfies any of the arrangements in Fig. 9.27(b).
Figure 9.27(a) shows the skeleton of a hand-printed letter “a.” The spur on the leftmost part of the character exemplifies what we are interested in removing. The solution is based on suppressing a spur branch by successively eliminating its end point. Of course, this also shortens (or eliminates) other branches in the character but, in the absence of other structural information, the assumption in this example is that any branch with three or fewer pixels is to be eliminated. Thinning of a set A with a sequence of structuring elements designed to detect only end points achieves the desired result. That is, let

$X_1 = A \otimes \{B\}$   (9-34)

where {B} denotes the structuring element sequence in Fig. 9.27(b) [see Eq. (9-24)
regarding structuring-element sequences]. The sequence of structuring elements
consists of two different structures, each of which is rotated 90° for a total of eight
elements. The × in Fig. 9.27(b) signifies a “don’t care” condition, as defined earlier.
(Note that each SE is a detector for an end point in a particular orientation.)

FIGURE 9.27 (a) Set A of foreground pixels (shaded). (b) SEs used for deleting end points. (c) Result of three cycles of thinning. (d) End points of (c). (e) Dilation of end points conditioned on (a). (f) Pruned image.


Applying Eq. (9-34) to A three times yielded the set $X_1$ in Fig. 9.27(c). The next step is to “restore” the character to its original form, but with the parasitic branches removed. This requires that we first form a set $X_2$ containing all end points in $X_1$ [Fig. 9.27(d)]:
$X_2 = \bigcup_{k=1}^{8} (X_1 \circledast B^k)$   (9-35)

where the $B^k$ are the end-point detectors in Fig. 9.27(b). The next step is dilation
of the end points. Typically, the number of dilations is less than the number of end-
point removals to reduce the probability of “growing” back some of the spurs. In
this case, we know by inspection that no new spurs are created, so we dilate the end
points three times using A as a delimiter. This is the same number of thinning passes:

$X_3 = (X_2 \oplus H) \cap A$   (9-36)

where H is a 3 × 3 structuring element of 1's, and the intersection with A is applied after each step. As in the case of region filling, this type of conditional dilation prevents the creation of 1-valued elements outside the region of interest, as illustrated by the result in Fig. 9.27(e). Finally, the union of $X_1$ and $X_3$,

$X_4 = X_1 \cup X_3$   (9-37)

yields the desired result in Fig. 9.27(f).
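A sketch of the complete pruning sequence, Eqs. (9-34) through (9-37), is given below. The eight end-point detectors are defined only graphically in Fig. 9.27(b), so the two base masks here are assumptions consistent with the description in the text (each detects an end point in one orientation; 1 = foreground, 0 = background, −1 = “don't care”).

```python
import numpy as np
from scipy import ndimage

E1 = np.array([[-1,  0,  0],
               [ 1,  1,  0],
               [-1,  0,  0]])      # end point of a stroke arriving from the west
E2 = np.array([[ 1,  0,  0],
               [ 0,  1,  0],
               [ 0,  0,  0]])      # end point of a stroke arriving diagonally
END_SEQ = [np.rot90(E, -k) for E in (E1, E2) for k in range(4)]

def thin_once(A, B):
    """One thinning pass: A - (A hit-or-miss B), Eq. (9-23)."""
    hm = ndimage.binary_hit_or_miss(A, structure1=(B == 1), structure2=(B == 0))
    return A & ~hm

def prune(A, n=3):
    A = A.astype(bool)
    X1 = A
    for _ in range(n):                               # X1: A thinned n times, Eq. (9-34)
        for B in END_SEQ:
            X1 = thin_once(X1, B)
    X2 = np.zeros(A.shape, bool)                     # X2: end points of X1, Eq. (9-35)
    for B in END_SEQ:
        X2 |= ndimage.binary_hit_or_miss(X1, structure1=(B == 1), structure2=(B == 0))
    H = np.ones((3, 3), bool)
    X3 = X2
    for _ in range(n):                               # conditional dilation, Eq. (9-36)
        X3 = ndimage.binary_dilation(X3, H) & A
    return X1 | X3                                   # X4, Eq. (9-37)
```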


In more complex scenarios, using Eq. (9-36) sometimes picks up the “tips” of
some branches. This can occur when the end points of these branches are near the
skeleton. Although Eq. (9-36) may eliminate them, they can be picked up again
during dilation because they are valid points in A. However, unless entire parasitic
elements are picked up again (a rare case if these elements are short with respect to
valid strokes), detecting and eliminating the reconstructed elements is easy because
they are disconnected regions.
A natural thought at this juncture is that there must be easier ways to solve this
problem. For example, we could just keep track of all deleted points and simply
reconnect the appropriate points to all end points left after application of Eq. (9-34).
This argument is valid, but the advantage of the formulation just presented is that
we used existing morphological constructs to solve the problem. When a set of such
tools is available, the advantage is that no new algorithms have to be written. We
simply combine the necessary morphological functions into a sequence of opera-
tions.
Sometimes you will encounter end-point detectors based on a single structuring element, similar to the first SE in Fig. 9.27(b), but having “don't care” conditions along the entire first column, instead of having a foreground element separating the corner ×'s. This is incorrect. For example, the former element would identify the point located in the eighth row, fourth column of Fig. 9.27(a) as an end point, thus eliminating it and breaking the connectivity of that part of the stroke.


11.1 BACKGROUND

Although there is no universally accepted, formal definition of what constitutes an


image feature, there is little argument that, intuitively, we generally think of a fea-
ture as a distinctive attribute or description of “something” we want to label or
differentiate. For our purposes, the key words here are label and differentiate. The
“something” of interest in this chapter refers either to individual image objects, or
even to entire images or sets of images. Thus, we think of features as attributes that
are going to help us assign unique labels to objects in an image or, more gener-
ally, are going to be of value in differentiating between entire images or families of
images.
There are two principal aspects of image feature extraction: feature detection, and
feature description. That is, when we refer to feature extraction, we are referring
to both detecting the features and then describing them. To be useful, the extrac-
tion process must encompass both. The terminology you are likely to encounter in
image processing and analysis to describe feature detection and description varies,
but a simple example will help clarify our use of these terms. Suppose that we use
object corners as features for some image processing task. In this chapter, detection
refers to finding the corners in a region or image. Description, on the other hand,
refers to assigning quantitative (or sometimes qualitative) attributes to the detected
features, such as corner orientation, and location with respect to other corners. In
other words, knowing that there are corners in an image has limited use without
additional information that can help us differentiate between objects in an image,
or between images, based on corners and their attributes.
Given that we want to use features for purposes of differentiation, the next ques-
tion is: What are the important characteristics that these features must possess in
the realm of digital image processing? You are already familiar with some of these
characteristics. In general, features should be independent of location, rotation, and
scale. Other factors, such as independence of illumination levels and changes caused
by the viewpoint between the imaging sensor(s) and the scene, also are impor-
tant. Whenever possible, preprocessing should be used to normalize input images
before feature extraction. For example, in situations where changes in illumination
are severe enough to cause difficulties in feature detection, it would make sense to
preprocess an image to compensate for those changes. Histogram equalization or
specification come to mind as automatic techniques that we know are helpful in
this regard. The idea is to use as much a priori information as possible to preprocess
images in order to improve the chances of accurate feature extraction.
When used in the context of a feature, the word “independent” usually has one of
two meanings: invariant or covariant. A feature descriptor is invariant with respect
to a set of transformations if its value remains unchanged after the application (to
the entity being described) of any transformation from the family. A feature descriptor is covariant with respect to a set of transformations if applying any transformation from the set to the entity produces a corresponding change (the same transformation) in the value of the descriptor. For example, consider this set of affine transformations: {translation, reflection, rotation}, and suppose that we have an elliptical region to which we assign the feature descriptor area (see Table 2.3 regarding affine transformations). Clearly, applying any of these transformations to the region does not change its area.


Therefore, area is an invariant feature descriptor with respect to the given family of
transformations. However, if we add the affine transformation scaling to the fam-
ily, descriptor area ceases to be invariant with respect to the extended family. The
descriptor is now covariant with respect to the family, because scaling the area of the
region by any factor scales the value of the descriptor by the same factor. Similarly,
the descriptor direction (of the principal axis of the region) is covariant because
rotating the region by any angle has the same effect on the value of the descriptor.
Most of the feature descriptors we use in this chapter are covariant in general, in
the sense that they may be invariant to some transformations of interest, but not to
others that may be equally as important. As you will see shortly, it is good practice to normalize out as many covariances as possible, so that the resulting descriptors become invariant to the transformations of interest. For instance,
we can compensate for changes in direction of a region by computing its actual
direction and rotating the region so that its principal axis points in a predefined
direction. If we do this for every region detected in an image, rotation will cease to
be covariant.
Another major classification of features is local vs. global. You are likely to see
many different attempts to classify features as belonging to one of these two catego-
ries. What makes this difficult is that a feature may belong to both, depending on the
application. For example, consider the descriptor area again, and suppose that we
are applying it to the task of inspecting the degree to which bottles moving past an
imaging sensor on a production line are full of liquid. The sensor and its accompany-
ing software are capable of generating images of ten bottles at once, in which liquid
in each bottle appears as a bright region, and the rest of the image appears as dark
background. The area of a region in this fixed geometry is directly proportional to
the amount of liquid in a bottle and, if detected and measured reliably, area is the
only feature we need to solve the inspection problem. Each image has ten regions, so
we consider area to be a local feature, in the sense that it is applicable to individual
elements (regions) of an image. If the problem were to detect the total amount (area)
of liquid in an image, we would now consider area to be a global descriptor. But the
story does not end there. Suppose that the liquid inspection task is redefined so that
it calculates the entire amount of liquid per day passing by the imaging station. We
no longer care about the area of individual regions per se. Our units now are images.
If we know the total area in an image, and we know the number of images, calculat-
ing the total amount of liquid in a day is trivial. Now the area of an entire image is a
local feature, and the area of the total at the end of the day is global. Obviously, we
could redefine the task so that the area at the end of a day becomes a local feature
descriptor, and the area for all assembly lines becomes a global measure. And so on,
endlessly. In this chapter, we call a feature local if it applies to a member of a set,
and global if it applies to the entire set, where “member” and “set” are determined
by the application.
Features by themselves are seldom generated for human consumption, except in
applications such as interactive image processing, topics that are not in the main-
stream of this book. In fact, as you will see later, some feature extraction meth-
ods generate tens, hundreds, or even thousands of descriptor values that would
appear meaningless if examined visually. Instead, feature description typically is
used as a preprocessing step for higher-level tasks, such as image registration, object


recognition for automated inspection, searching for patterns (e.g., individual faces
and/or fingerprints) in image databases, and autonomous applications, such as robot
and vehicle navigation. For these applications, numerical features usually are “pack-
aged” in the form of a feature vector (i.e., a 1 × n or n × 1 matrix) whose elements are the descriptors. An RGB image is one of the simplest examples. As you know from Chapter 6, each pixel of an RGB image can be expressed as a 3-D vector,

$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$

in which x1 is the intensity value of the red image at a point, and the other com-
ponents are the intensity values of the green and blue images at the same point. If
color is used as a feature, then a region in an RGB image would be represented as
a set of feature vectors (points) in 3-D space. When n descriptors are used, feature
vectors become n-dimensional, and the space containing them is referred to as an
n-dimensional feature space. You may “visualize” a set of n-dimensional feature vec-
tors as a “hypercloud” of points in n-dimensional Euclidean space.
In this chapter, we group features into three principal categories: boundary,
region, and whole image features. This subdivision is not based on the applicabil-
ity of the methods we are about to discuss; rather, it is based on the fact that some
categories make more sense than others when considered in the context of what is
being described. For example, it is implied that when we refer to the “length of a
boundary” we are referring to the “length of the boundary of a region,” but it makes
no sense to refer to the “length” of an image. It will become clear that many of the
features we will be discussing are applicable to boundaries and regions, and some
apply to whole images as well.

11.2 BOUNDARY PREPROCESSING



The segmentation techniques discussed in the previous two chapters yield raw data
in the form of pixels along a boundary or pixels contained in a region. It is standard
practice to use schemes that compact the segmented data into representations that
facilitate the computation of descriptors. In this section, we discuss various bound-
ary preprocessing approaches suitable for this purpose.

BOUNDARY FOLLOWING (TRACING)


You will find it helpful to review the discussion in Section 2.5 on neighborhoods, adjacency, and connectivity, and the discussion in Section 9.6 dealing with connected components.
Several of the algorithms discussed in this chapter require that the points in the boundary of a region be ordered in a clockwise or counterclockwise direction. Consequently, we begin our discussion by introducing a boundary-following algorithm whose output is an ordered sequence of points. We assume (1) that we are working with binary images in which object and background points are labeled 1 and 0,
respectively; and (2) that images are padded with a border of 0’s to eliminate the
possibility of an object merging with the image border. For clarity, we limit the dis-
cussion to single regions. The approach is extended to multiple, disjoint regions by
processing the regions individually.

vtucircle.com

DIP4E_GLOBAL_Print_Ready.indb 814 6/16/2017 2:14:48 PM


11.2 Boundary Preprocessing 815

FIGURE 11.1 Illustration of the first few steps in the boundary-following algorithm. The point to be processed next is labeled in bold, black; the points yet to be processed are gray; and the points found by the algorithm are shaded. Squares without labels are considered background (0) values.

The following algorithm traces the boundary of a 1-valued region, R, in a binary image.

1. Let the starting point, b0, be the uppermost-leftmost point† in the image that is labeled 1. Denote by c0 the west neighbor of b0 [see Fig. 11.1(b)]. Clearly, c0 is always a background point. Examine the 8-neighbors of b0, starting at c0 and proceeding in a clockwise direction. Let b1 denote the first neighbor encountered whose value is 1, and let c1 be the (background) point immediately preceding b1 in the sequence. Store the location of b0 for use in Step 5. (See Section 2.5 for the definition of 4-neighbors, 8-neighbors, and m-neighbors of a point.)
in the sequence. Store the locations of b0 for use in Step 5.
2. Let b = b0 and c = c0 .
3. Let the 8-neighbors of b, starting at c and proceeding in a clockwise direction,
be denoted by n1 , n2 , … , n8 . Find the first neighbor labeled 1 and denote it by nk .
4. Let b = nk and c = nk –1 .
5. Repeat Steps 3 and 4 until b = b0 . The sequence of b points found when the
algorithm stops is the set of ordered boundary points.

Note that c in Step 4 is always a background point because nk is the first 1-valued
point found in the clockwise scan. This algorithm is referred to as the Moore bound-
ary tracing algorithm after Edward F. Moore, a pioneer in cellular automata theory.
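A minimal sketch of this tracing procedure is shown below, assuming a binary NumPy array that is already padded with a border of 0's, as stated above. It returns the ordered (clockwise) boundary as a list of (row, col) coordinates; the function name is illustrative only.

```python
import numpy as np

# 8-neighbor offsets of a point, listed in clockwise order starting at its west neighbor
OFFSETS = [(0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1)]

def trace_boundary(img):
    img = np.asarray(img, dtype=bool)
    rows, cols = np.nonzero(img)
    b0 = tuple(int(v) for v in min(zip(rows, cols)))     # uppermost-leftmost 1-valued point
    b, c = b0, (b0[0], b0[1] - 1)                        # c0: west neighbor (background)
    boundary = [b0]
    while True:
        # Steps 3-4: scan the 8-neighbors of b clockwise, starting at c
        start = OFFSETS.index((c[0] - b[0], c[1] - b[1]))
        for i in range(1, 9):
            dr, dc = OFFSETS[(start + i) % 8]
            nb = (b[0] + dr, b[1] + dc)
            if img[nb]:
                pr, pc = OFFSETS[(start + i - 1) % 8]
                c = (b[0] + pr, b[1] + pc)               # background point preceding nb
                b = nb
                break
        if b == b0:                                      # Step 5: starting point reached again
            return boundary
        boundary.append(b)
```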
Figure 11.1 illustrates the first few steps of the algorithm. It is easily verified (see
Problem 11.1) that continuing with this procedure will yield the correct boundary,
shown in Fig. 11.1(f), whose points are ordered in a clockwise sequence. The algo-
rithm works equally well with more complex boundaries, such as the boundary with
an attached branch in Fig. 11.2(a) or the self-intersecting boundary in Fig. 11.2(b).
Multiple boundaries [Fig. 11.2(c)] are handled by processing one boundary at a time.
If we start with a binary region instead of a boundary, the algorithm extracts the
outer boundary of the region. Typically, the resulting boundary will be one pixel
thick, but not always [see Problem 11.1(b)]. If the objective is to find the boundaries
of holes in a region (these are called the inner or interior boundaries of the region),

† As you will see later in this chapter and in Problem 11.8, the uppermost-leftmost point in a 1-valued boundary
has the important property that a polygonal approximation to the boundary has a convex vertex at that location.
Also, the left and north neighbors of the point are guaranteed to be background points. These properties make
it a good “standard” point at which to start boundary-following algorithms.


FIGURE 11.2 Examples of boundaries that can be processed by the boundary-following algorithm. (a) Closed boundary with a branch. (b) Self-intersecting boundary. (c) Multiple boundaries (processed one at a time).

a straightforward approach is to extract the holes (see Section 9.6) and treat them
as 1-valued regions on a background of 0’s. Applying the boundary-following algo-
rithm to these regions will yield the inner boundaries of the original region.
We could have stated the algorithm just as easily based on following a boundary
in the counterclockwise direction but you will find it easier to have just one algo-
rithm and then reverse the order of the result to obtain a sequence in the opposite
direction. We use both directions interchangeably (but consistently) in the following
sections to help you become familiar with both approaches.

CHAIN CODES
Chain codes are used to represent a boundary by a connected sequence of straight-
line segments of specified length and direction. We assume in this section that all
curves are closed, simple curves (i.e., curves that are closed and not self-intersecting).

Freeman Chain Codes


Typically, a chain code representation is based on 4- or 8-connectivity of the seg-
ments. The direction of each segment is coded by using a numbering scheme, as in Fig.
11.3. A boundary code formed as a sequence of such directional numbers is referred
to as a Freeman chain code.
Digital images usually are acquired and processed in a grid format with equal
spacing in the x- and y-directions, so a chain code could be generated by following a
boundary in, say, a clockwise direction and assigning a direction to the segments con-
necting every pair of pixels. This level of detail generally is not used for two principal
reasons: (1) The resulting chain would be quite long and (2) any small disturbances
along the boundary due to noise or imperfect segmentation would cause changes
in the code that may not be related to the principal shape features of the boundary.
An approach used to address these problems is to resample the boundary by
selecting a larger grid spacing, as in Fig. 11.4(a). Then, as the boundary is traversed, a
boundary point is assigned to a node of the coarser grid, depending on the proximity
of the original boundary point to that node, as in Fig. 11.4(b). The resampled bound-
ary obtained in this way can be represented by a 4- or 8-code. Figure 11.4(c) shows
the coarser boundary points represented by an 8-directional chain code. It is a simple
matter to convert from an 8-code to a 4-code and vice versa (see Problems 2.15, 9.27,


FIGURE 11.3 Direction numbers for (a) 4-directional chain code, and (b) 8-directional chain code.

and 9.29). For the same reason mentioned when discussing boundary tracing earlier
in this section, we chose the starting point in Fig. 11.4(c) as the uppermost-leftmost
point of the boundary, which gives the chain code 0766…1212. As you might suspect,
the spacing of the resampling grid is determined by the application in which the
chain code is used.
If the sampling grid used to obtain a connected digital curve is a uniform quad-
rilateral (see Fig. 2.19) all points of a Freeman code based on Fig. 11.3 are guaran-
teed to coincide with the points of the curve. The same is true if a digital curve is
subsampled using the same type of sampling grid, as in Fig. 11.4(b). This is because
the samples of curves produced using such grids have the same arrangement as in
Fig. 11.3, so all points are reachable as we traverse a curve from one point to the next
to generate the code.
The numerical value of a chain code depends on the starting point. However, the
code can be normalized with respect to the starting point by a straightforward pro-
cedure: We simply treat the chain code as a circular sequence of direction numbers
and redefine the starting point so that the resulting sequence of numbers forms an
integer of minimum magnitude. We can normalize also for rotation (in angles that
are integer multiples of the directions in Fig. 11.3) by using the first difference of the
chain code instead of the code itself. This difference is obtained by counting the num-
ber of direction changes (in a counterclockwise direction in Fig. 11.3) that separate
two adjacent elements of the code. If we treat the code as a circular sequence to nor-
malize it with respect to the starting point, then the first element of the difference is
computed by using the transition between the last and first components of the chain.

FIGURE 11.4 (a) Digital boundary with resampling grid superimposed. (b) Result of resampling. (c) 8-directional chain-coded boundary.


For instance, the first difference of the 4-directional chain code 10103322 is 3133030.
Size normalization can be achieved by altering the spacing of the resampling grid.
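A small sketch of these two normalizations, for a 4-directional Freeman code, is given below; the function names are illustrative only, and direction numbers follow Fig. 11.3(a).

```python
def first_difference(code, directions=4, circular=False):
    """Rotation normalization: count counterclockwise direction changes
    between adjacent code elements; optionally treat the code as circular."""
    seq = ([code[-1]] + list(code)) if circular else list(code)
    return [(seq[i + 1] - seq[i]) % directions for i in range(len(seq) - 1)]

def min_magnitude(code):
    """Starting-point normalization: the circular shift of the code that
    forms the integer of minimum magnitude."""
    code = list(code)
    return min(code[i:] + code[:i] for i in range(len(code)))

# Example from the text: the first difference of 10103322 is 3133030
print(first_difference([1, 0, 1, 0, 3, 3, 2, 2]))      # [3, 1, 3, 3, 0, 3, 0]
```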
The normalizations just discussed are exact only if the boundaries themselves
are invariant to rotation (again, in angles that are integer multiples of the directions
in Fig. 11.3) and scale change, which seldom is the case in practice. For instance,
the same object digitized in two different orientations will have different bound-
ary shapes in general, with the degree of dissimilarity being proportional to image
resolution. This effect can be reduced by selecting chain elements that are long in
proportion to the distance between pixels in the digitized image, and/or by orienting
the resampling grid along the principal axes of the object to be coded, as discussed
in Section 11.3, or along its eigen axes, as discussed in Section 11.5.

EXAMPLE 11.1 : Freeman chain code and some of its variations.


Figure 11.5(a) shows a 570 × 570-pixel, 8-bit gray-scale image of a circular stroke embedded in small,
randomly distributed specular fragments. The objective of this example is to obtain a Freeman chain
code, the corresponding integer of minimum magnitude, and the first difference of the outer boundary
of the stroke. Because the object of interest is embedded in small fragments, extracting its boundary
would result in a noisy curve that would not be descriptive of the general shape of the object. As you
know, smoothing is a routine process when working with noisy boundaries. Figure 11.5(b) shows the
original image smoothed using a box kernel of size 9 × 9 pixels (see Section 3.5 for a discussion of spa-
tial smoothing), and Fig. 11.5(c) is the result of thresholding this image with a global threshold obtained
using Otsu’s method. Note that the number of regions has been reduced to two (one of which is a dot),
significantly simplifying the problem.
Figure 11.5(d) is the outer boundary of the region in Fig. 11.5(c). Obtaining the chain code of this
boundary directly would result in a long sequence with small variations that are not representative
of the global shape of the boundary, so we resample it before obtaining its chain code. This reduces
insignificant variability. Figure 11.5(e) is the result of using a resampling grid with nodes 50 pixels apart
(approximately 10% of the image width) and Fig. 11.5(f) is the result of joining the sample points by
straight lines. This simpler approximation retained the principal features of the original boundary.
The 8-directional Freeman chain code of the simplified boundary is

00006066666666444444242222202202

The starting point of the boundary is at coordinates (2, 5) in the subsampled grid (remember from
Fig. 2.19 that the origin of an image is at its top, left). This is the uppermost-leftmost point in Fig. 11.5(f).
The integer of minimum magnitude of the code happens in this case to be the same as the chain code:

00006066666666444444242222202202

The first difference of the code is

00062600000006000006260000620626

Using this code to represent the boundary results in a significant reduction in the amount of data
needed to store the boundary. In addition, working with code numbers offers a unified way to analyze
the shape of a boundary, as we discuss in Section 11.3. Finally, keep in mind that the subsampled bound-
ary can be recovered from any of the preceding codes.


FIGURE 11.5 (a) Noisy image of size 570 × 570 pixels. (b) Image smoothed with a 9 × 9 box kernel. (c) Smoothed
image, thresholded using Otsu’s method. (d) Longest outer boundary of (c). (e) Subsampled boundary (the points
are shown enlarged for clarity). (f) Connected points from (e).

Slope Chain Codes


Using Freeman chain codes generally requires resampling a boundary to smooth
small variations, a process that implies defining a grid and subsequently assigning
all boundary points to their closest neighbors in the grid. An alternative to this
approach is to use slope chain codes (SCCs) (Bribiesca [1992, 2013]). The SCC of a
2-D curve is obtained by placing straight-line segments of equal length around the
curve, with the end points of the segments touching the curve.
Obtaining an SCC requires calculating the slope changes between contiguous line
segments, and normalizing the changes to the continuous (open) interval (−1, 1).
This approach requires defining the length of the line segments, as opposed to Free-
man codes, which require defining a grid and assigning curve points to it—a much
more elaborate procedure. Like Freeman codes, SCCs are independent of rotation,
but a larger range of possible slope changes provides a more accurate representa-
tion under rotation than the rotational independence of the Freeman codes, which is
limited to the eight directions in Fig. 11.3(b). As with Freeman codes, SCCs are inde-
pendent of translation, and can be normalized for scale changes (see Problem 11.8).


Figure 11.6 illustrates how an SCC is generated. The first step is to select the
length of the line segment to use in generating the code [see Fig. 11.6(b)]. Next, a
starting point (the origin) is specified (for an open curve, the logical starting point is
one of its end points). As Fig. 11.6(c) shows, once the origin has been selected, one
end of a line segment is placed at the origin and the other end of the segment is set
to coincide with the curve. This point becomes the starting point of the next line seg-
ment, and we repeat this procedure until the starting point (or end point in the case
of an open curve) is reached. As the figure illustrates, you can think of this process as
a sequence of identical circles (with radius equal to the length of the line segment)
traversing the curve. The intersections of the circles and the curve determine the
nodes of the straight-line approximation to the curve.
Once the intersections of the circles are known, we determine the slope changes
between contiguous line segments. Positive and zero slope changes are normalized to the half-open interval [0, 1), while negative slope changes are normalized to the
open interval (−1, 0). Not allowing slope changes of ±1 eliminates the implementa-
tion issues that result from having to deal with the fact that such changes result in
the same line segment with opposite directions.
The sequence of slope changes is the chain that defines the SCC approximation
to the original curve. For example, the code for the curve in Fig. 11.6(e) is 0.12, 0.20,
0.21, 0.11, −0.11, −0.12, −0.21, −0.22, −0.24, −0.28, −0.28, −0.31, −0.30. The accu-
racy of the slope changes defined in Fig. 11.6(d) is $10^{-2}$, resulting in an “alphabet” of 199 possible symbols (slope changes). The accuracy can be changed, of course. For instance, an accuracy of $10^{-1}$ produces an alphabet of 19 symbols (see Problem 11.6).
Unlike a Freeman code, there is no guarantee that the last point of the coded curve
will coincide with the last point of the curve itself. However, shortening the line

FIGURE 11.6 (a) An open curve. (b) A straight-line segment. (c) Traversing the curve using circumferences to deter-
mine slope changes; the dot is the origin (starting point). (d) Range of slope changes in the open interval (−1, 1)
(the arrow in the center of the chart indicates direction of travel). There can be ten subintervals between the slope
numbers shown.(e) Resulting coded curve showing its corresponding numerical sequence of slope changes. (Cour-
tesy of Professor Ernesto Bribiesca, IIMAS-UNAM, Mexico.)


length and/or increasing angle resolution often resolves the problem, because the
results of computations are rounded to the nearest integer (remember we work with
integer coordinates).
The inverse of an SCC is another chain of the same length, obtained by reversing
the order of the symbols and their signs. The mirror image of a chain is obtained by
starting at the origin and reversing the signs of the symbols. Finally, we point out
that the preceding discussion is directly applicable to closed curves. Curve following
would start at an arbitrary point (for example, the uppermost-leftmost point of the
curve) and proceed in a clockwise or counterclockwise direction, stopping when the
starting point is reached. We will illustrate a use of SCCs in Example 11.6.

BOUNDARY APPROXIMATIONS USING MINIMUM-PERIMETER POLYGONS
A digital boundary can be approximated with arbitrary accuracy by a polygon. For a closed curve, the approximation becomes exact when the number of segments of the polygon is equal to the number of points in the boundary, so each pair of adjacent points defines a segment of the polygon. (For an open curve, the number of segments of an exact polygonal approximation is equal to the number of points minus 1.) The goal of a polygonal approximation is to capture the essence of the shape in a given boundary using the fewest possible segments. Generally, this problem is not trivial, and can turn into
a time-consuming iterative search. However, approximation techniques of modest
complexity are well suited for image-processing tasks. Among these, one of the most
powerful is representing a boundary by a minimum-perimeter polygon (MPP), as
defined in the following discussion.

Foundation
An intuitive approach for computing MPPs is to enclose a boundary [see Fig. 11.7(a)]
by a set of concatenated cells, as in Fig. 11.7(b). Think of the boundary as a rubber
band contained in the gray cells in Fig. 11.7(b). As it is allowed to shrink, the rubber
band will be constrained by the vertices of the inner and outer walls of the region
of the gray cells. Ultimately, this shrinking produces the shape of a polygon of mini-
mum perimeter (with respect to this geometrical arrangement) that circumscribes
the region enclosed by the cell strip, as in Fig. 11.7(c). Note in this figure that all the
vertices of the MPP coincide with corners of either the inner or the outer wall.
The size of the cells determines the accuracy of the polygonal approximation.
In the limit, if the size of each (square) cell corresponds to a pixel in the boundary,
the maximum error in each cell between the boundary and the MPP approxima-
tion would be 2 d, where d is the minimum possible distance between pixels (i.e.,
the distance between pixels established by the resolution of the original sampled
boundary). This error can be reduced in half by forcing each cell in the polygonal
approximation to be centered on its corresponding pixel in the original boundary.
The objective is to use the largest possible cell size acceptable in a given application,
thus producing MPPs with the fewest number of vertices. Our objective in this sec-
tion is to formulate a procedure for finding these MPP vertices.
The cellular approach just described reduces the shape of the object enclosed
by the original boundary, to the area circumscribed by the gray walls in Fig. 11.7(b).


FIGURE 11.7 (a) An object boundary. (b) Boundary enclosed by cells (shaded). (c) Minimum-perimeter polygon
obtained by allowing the boundary to shrink. The vertices of the polygon are created by the corners of the inner
and outer walls of the gray region.

Figure 11.8(a) shows this shape in dark gray. Suppose that we traverse the bound-
ary of the dark gray region in a counterclockwise direction. Every turn encountered
in the traversal will be either a convex or a concave vertex (the angle of a vertex is defined as an interior angle of the boundary at that vertex). A convex vertex is the center point of a triplet of points that define an angle in the range 0° < θ < 180°; similarly, the angle of a concave vertex is in the range 180° < θ < 360°. An angle of 180° defines a degenerate vertex (i.e., a segment of a straight line), which cannot be an MPP vertex. Convex and concave vertices are shown, respectively, as white and blue dots in Fig. 11.8(b). Note that these vertices are the vertices of the inner wall of the light-gray bounding region in Fig. 11.8(b), and that every concave (blue) vertex in the dark gray region has a corresponding concave “mirror” vertex in the light gray wall, located diagonally opposite the vertex. Figure 11.8(c) shows the mirrors of all the concave vertices, with the MPP from Fig. 11.7(c) superimposed for reference. We see that the vertices of the MPP coincide either with convex vertices in the inner wall (white dots) or with the mirrors of the concave vertices (blue dots) in the outer wall. Only convex vertices of the inner wall and concave vertices of the outer wall can be vertices of the MPP. Thus, our algorithm needs to focus attention only on those vertices.

MPP Algorithm
The set of cells enclosing a digital boundary [e.g., the gray cells in Fig. 11.7(b)] is
called a cellular complex. We assume the cellular complexes to be simply connected, in the sense that the boundaries they enclose are not self-intersecting. Based on this
assumption, and letting white (W) denote convex vertices, and blue (B) denote mir-
rored concave vertices, we state the following observations:

1. The MPP bounded by a simply connected cellular complex is not self-intersecting.


2. Every convex vertex of the MPP is a W vertex, but not every W vertex of a bound-
ary is a vertex of the MPP.


FIGURE 11.8 (a) Region (dark gray) resulting from enclosing the original boundary by cells (see Fig. 11.7). (b) Convex
(white dots) and concave (blue dots) vertices obtained by following the boundary of the dark gray region in the
counterclockwise direction. (c) Concave vertices (blue dots) displaced to their diagonal mirror locations in the
outer wall of the bounding region; the convex vertices are not changed. The MPP (solid boundary) is superimposed
for reference.

3. Every mirrored concave vertex of the MPP is a B vertex, but not every B vertex
of a boundary is a vertex of the MPP.
4. All B vertices are on or outside the MPP, and all W vertices are on or inside the
MPP.
5. The uppermost-leftmost vertex in a sequence of vertices contained in a cellular
complex is always a W vertex of the MPP (see Problem 11.8).

These assertions can be proved formally (Sklansky et al. [1972], Sloboda et al. [1998],
and Klette and Rosenfeld [2004]). However, their correctness is evident for our pur-
poses (see Fig. 11.8), so we do not dwell on the proofs here. Unlike the angles of the
vertices of the dark gray region in Fig. 11.8, the angles sustained by the vertices of
the MPP are not necessarily multiples of 90°.
In the discussion that follows, we will need to calculate the orientation of triplets
of points. Consider a triplet of points, ( a, b, c ) , and let the coordinates of these points
be a = (ax , ay ), b = (bx , by ), and c = (cx , cy ). If we arrange these points as the rows of
the matrix

$A = \begin{bmatrix} a_x & a_y & 1 \\ b_x & b_y & 1 \\ c_x & c_y & 1 \end{bmatrix}$   (11-1)


Then, it follows from matrix analysis that

$\det(A) = \begin{cases} > 0 & \text{if } (a, b, c) \text{ is a counterclockwise sequence} \\ \;\;0 & \text{if the points are collinear} \\ < 0 & \text{if } (a, b, c) \text{ is a clockwise sequence} \end{cases}$   (11-2)

where det(A) is the determinant of A. In terms of this equation, movement in a


counterclockwise or clockwise direction is with respect to a right-handed coordinate
system (see the footnote in the discussion of Fig. 2.19). For example, using the image
coordinate system from Fig. 2.19 (in which the origin is at the top left, the positive
x-axis extends vertically downward, and the positive y-axis extends horizontally to
the right), the sequence a = (3, 4), b = (2, 3), and c = (3, 2) is in the counterclockwise
direction. This would give det(A) > 0 when substituted into Eq. (11-2). It is conve-
nient when describing the algorithm to define

sgn(a, b, c) ≡ det(A) (11-3)

so that sgn(a, b, c) > 0 for a counterclockwise sequence, sgn(a, b, c) < 0 for a clock-
wise sequence, and sgn(a, b, c) = 0 when the points are collinear. Geometrically,
sgn(a, b, c) > 0 indicates that point c lies on the positive side of pair (a, b) (i.e., c lies on
the positive side of the line passing through points a and b). Similarly, if sgn(a, b, c) < 0,
point c lies on the negative side of the line. Equations (11-2) and (11-3) give the same
result if the sequence (c, a, b) or (b, c, a) is used because the direction of travel in the
sequence is the same as for (a, b, c). However, the geometrical interpretation is differ-
ent. For example, sgn(c, a, b) > 0 indicates that point b lies on the positive side of the
line through points c and a.
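A small helper implementing Eqs. (11-1) through (11-3) is sketched below; it uses the image coordinate convention described above, and the triplet from the text is used as a quick check.

```python
import numpy as np

def sgn(a, b, c):
    """Orientation of the triplet (a, b, c): > 0 counterclockwise,
    0 collinear, < 0 clockwise [Eqs. (11-1)-(11-3)]."""
    A = np.array([[a[0], a[1], 1.0],
                  [b[0], b[1], 1.0],
                  [c[0], c[1], 1.0]])
    return np.linalg.det(A)

# a = (3, 4), b = (2, 3), c = (3, 2) is a counterclockwise sequence, so sgn > 0
print(sgn((3, 4), (2, 3), (3, 2)))    # > 0 (about 2.0)
```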
To prepare the data for the MPP algorithm, we form a list of triplets consisting
of a vertex label (e.g., V0 , V1 , etc.); the coordinates of each vertex; and an additional
element denoting whether the vertex is W or B. It is important that the concave ver-
tices be mirrored, as in Fig. 11.8(c), that the vertices be in sequential order,† and that
the first vertex be the uppermost-leftmost vertex, which we know from property 5
is a W vertex of the MPP. Let V0 denote this vertex. We assume that the vertices are
arranged in the counterclockwise direction. The algorithm for finding MPPs uses
two “crawler” points: a white crawler (WC ) and a blue crawler (BC ). WC crawls along
the convex (W) vertices, and BC crawls along the concave (B) vertices. These two
crawler points, the last MPP vertex found, and the vertex being examined are all that
is necessary to implement the algorithm.
The algorithm starts by setting WC = BC = V0 (recall that V0 is an MPP-vertex).
Then, at any step in the algorithm, let VL denote the last MPP vertex found, and let
Vk denote the current vertex being examined. One of the following three conditions
can exist between VL , Vk , and the two crawler points:


† Vertices of a boundary can be ordered by tracking the boundary using the boundary-following algorithm
discussed earlier.


(a) Vk is on the positive side of the line through the pair of points (VL, WC), in which case sgn(VL, WC, Vk) > 0.
(b) Vk is on the negative side of the line through pair (VL, WC) or is collinear with it; that is, sgn(VL, WC, Vk) ≤ 0. Simultaneously, Vk lies on the positive side of the line through (VL, BC) or is collinear with it; that is, sgn(VL, BC, Vk) ≥ 0.
(c) Vk is on the negative side of the line through pair (VL, BC), in which case sgn(VL, BC, Vk) < 0.

If condition (a) holds, the next MPP vertex is WC , and we let VL = WC ; then we
reinitialize the algorithm by setting WC = BC = VL , and start with the next vertex
after the newly changed VL .
If condition (b) holds, Vk becomes a candidate MPP vertex. In this case, we set
WC = Vk if Vk is convex (i.e., it is a W vertex); otherwise we set BC = Vk . We then
continue with the next vertex in the list.
If condition (c) holds, the next MPP vertex is BC and we let VL = BC ; then we
reinitialize the algorithm by setting WC = BC = VL and start with the next vertex
after the newly changed VL .
The algorithm stops when it reaches the first vertex again, and thus has processed
all the vertices in the polygon. The VL vertices found by the algorithm are the ver-
tices of the MPP. Klette and Rosenfeld [2004] have proved that this algorithm finds
all the MPP vertices of a polygon enclosed by a simply connected cellular complex.
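A sketch of this crawler procedure is shown below. It assumes the sgn() helper defined earlier and a prepared vertex list as described above: each entry is ((x, y), kind), with kind 'W' for convex vertices and 'B' for mirrored concave vertices, listed counterclockwise starting at the uppermost-leftmost vertex. The stopping rule is a simplified reading of “stop when the first vertex is reached again”; the step-by-step trace in Example 11.2 can be used to check the first few iterations.

```python
def mpp(vertices):
    n = len(vertices)
    VL_idx = 0                           # index of the last MPP vertex found (V0 is one)
    WC_idx = BC_idx = 0                  # white and blue crawler indices
    mpp_pts = [vertices[0][0]]
    k = 1
    while True:
        VL = vertices[VL_idx][0]
        WC, BC = vertices[WC_idx][0], vertices[BC_idx][0]
        Vk, kind = vertices[k % n]
        if sgn(VL, WC, Vk) > 0:                      # condition (a): WC is the next MPP vertex
            VL_idx = WC_idx
            mpp_pts.append(vertices[VL_idx][0])
            WC_idx = BC_idx = VL_idx
            k = VL_idx + 1                           # restart after the new VL
        elif sgn(VL, BC, Vk) < 0:                    # condition (c): BC is the next MPP vertex
            VL_idx = BC_idx
            mpp_pts.append(vertices[VL_idx][0])
            WC_idx = BC_idx = VL_idx
            k = VL_idx + 1
        else:                                        # condition (b): Vk is only a candidate
            if kind == 'W':
                WC_idx = k % n
            else:
                BC_idx = k % n
            k += 1
        if k > n:                                    # first vertex reached again
            return mpp_pts
```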

EXAMPLE 11.2 : A numerical example showing the details of how the MPP algorithm works.
A simple example in which we can follow the algorithm step-by-step will help clarify the preceding con-
cepts. Consider the vertices in Fig. 11.8(c). In our image coordinate system, the top-left point of the grid
is at coordinates (0, 0). Assuming unit grid spacing, the first few (counterclockwise) vertices are:

V0 (1, 4) W ⏐ V1 (2, 3) B ⏐ V2 (3, 3) W ⏐ V3 (3, 2) B ⏐ V4 (4, 1) W ⏐ V5 (7, 1) W ⏐ V6 (8, 2) B ⏐ V7 (9, 2) B

where the triplets are separated by vertical lines, and the B vertices are mirrored, as required by the
algorithm.
The uppermost-leftmost vertex is always the first vertex of the MPP, so we start by letting VL and V0
be equal, VL = V0 = (1, 4), and initializing the other variables: WC = BC = VL = (1, 4 ) .
The next vertex is V1 = ( 2, 3) . In this case we have sgn (VL , WC , V1 ) = 0 and sgn (VL , BC , V1 ) = 0, so
condition (b) holds. Because V1 is a B (concave) vertex, we update the blue crawler: BC = V1 = ( 2, 3) . At
this stage, we have VL = (1, 4), WC = (1, 4), and BC = (2, 3).
Next, we look at V2 = ( 3, 3) . In this case, sgn (VL , WC , V2 ) = 0, and sgn (VL , BC , V2 ) = 1, so condition (b)
holds. Because V2 is W, we update the white crawler: WC = (3, 3).
The next vertex is V3 = ( 3, 2 ) . At this junction we have VL = (1, 4), WC = (3, 3), and BC = (2, 3). Then,
sgn (VL, WC, V3) = −2 and sgn (VL, BC, V3) = 0, so condition (b) holds again. Because V3 is B, we let BC = V3 = (3, 2) and look at the next vertex.
The next vertex is V4 = ( 4, 1) . We are working with VL = (1, 4), WC = (3, 3), and BC = (3, 2). The values
of sgn are sgn(VL , WC , V4 ) = −3 and sgn(VL , BC , V4 ) = 0. So, condition (b) holds yet again, and we let
WC = V4 = (4, 1) because V4 is a W vertex.


The next vertex is V5 = (7, 1). Using the values from the previous step we obtain sgn(VL , WC , V5 ) = 9,
so condition (a) is satisfied. Therefore, we let VL = WC = (4, 1) (this is V4 ) and reinitialize:
BC = WC = VL = (4, 1). Note that once we knew that sgn(VL , WC , V5 ) > 0 we did not bother to compute
the other sgn expression. Also, reinitialization means that we start fresh again by examining the next
vertex following the newly found MPP vertex. In this case, that next vertex is V5 , so we visit it again.
With V5 = ( 7, 1) , and using the new values of VL , WC , and BC , it follows that sgn (VL , WC , V5 ) = 0 and
sgn (VL , BC , V5 ) = 0, so condition (b) holds. Therefore, we let WC = V5 = ( 7, 1) because V5 is a W vertex.
The next vertex is V6 = ( 8, 2 ) and sgn (VL , WC , V6 ) = 3, so condition (a) holds. Thus, we let
VL = WC = ( 7, 1) and reinitialize the algorithm by setting WC = BC = VL .
Because the algorithm was reinitialized at V5 , the next vertex is V6 = (8, 2) again. Using the results
from the previous step gives us sgn(VL , WC , V6 ) = 0 and sgn(VL , BC , V6 ) = 0, so condition (b) holds this
time. Because V6 is B we let BC = V6 = (8, 2).
Summarizing, we have found three vertices of the MPP up to this point: V0 = (1, 4), V4 = (4, 1), and
V5 = (7, 1). Continuing as above with the remaining vertices results in the MPP vertices in Fig. 11.8(c)
(see Problem 11.9). The mirrored B vertices at (2, 3), (3, 2), and on the lower-right side at (13, 10), are on
the boundary of the MPP. However, they are collinear and thus are not considered vertices of the MPP.
Appropriately, the algorithm did not detect them as such.

EXAMPLE 11.3 : Applying the MPP algorithm.


Figure 11.9(a) is a 566 × 566 binary image of a maple leaf, and Fig. 11.9(b) is its 8-connected boundary.
The sequence in Figs. 11.9(c) through (h) shows MPP representations of this boundary using square
cellular complex cells of sizes 2, 4, 6, 8, 16, and 32, respectively (the vertices in each figure were con-
nected with straight lines to form a closed boundary). The leaf has two major features: a stem and three
main lobes. The stem begins to be lost for cell sizes greater than 4 × 4, as Fig. 11.9(e) shows. The three
main lobes are preserved reasonably well, even for a cell size of 16 × 16, as Fig. 11.9(g) shows. However,
we see in Fig. 11.9(h) that by the time the cell size is increased to 32 × 32, this distinctive feature has
been nearly lost.
The number of points in the original boundary [Fig. 11.9(b)] is 1900. The numbers of vertices in
Figs. 11.9(c) through (h) are 206, 127, 92, 66, 32, and 13, respectively. Figure 11.9(e), which has 127 ver-
tices, retained all the major features of the original boundary while achieving a data reduction of over
90%. So here we see a significant advantage of MPPs for representing a boundary. Another important
advantage is that MPPs perform boundary smoothing. As explained in the previous section, this is a
usual requirement when representing a boundary by a chain code.

SIGNATURES
A signature is a 1-D functional representation of a 2-D boundary and may be gener-
ated in various ways. One of the simplest is to plot the distance from the centroid
to the boundary as a function of angle, as illustrated in Fig. 11.10. The basic idea of
using signatures is to reduce the boundary representation to a 1-D function that
presumably is easier to describe than the original 2-D boundary.
Based on the assumptions of uniformity in scaling with respect to both axes, and
that sampling is taken at equal intervals of θ, changes in the size of a shape result
in changes in the amplitude values of the corresponding signature. One way to


a b c d
e f g h
FIGURE 11.9 (a) 566 × 566 binary image. (b) 8-connected boundary. (c) through (h), MPPs obtained using square cells
of sizes 2, 4, 6, 8, 16, and 32, respectively (the vertices were joined by straight-line segments for display). The number
of boundary points in (b) is 1900. The numbers of vertices in (c) through (h) are 206, 127, 92, 66, 32, and 13, respec-
tively. Images (b) through (h) are shown as negatives to make the boundaries easier to see.

normalize for this is to scale all functions so that they always span the same range of
values, e.g., [0, 1]. The main advantage of this method is simplicity, but it has the dis-
advantage that scaling of the entire function depends on only two values: the mini-
mum and maximum. If the shapes are noisy, this can be a source of significant error
from object to object. A more rugged (but also more computationally intensive)
approach is to divide each sample by the variance of the signature, assuming that
the variance is not zero—as in the case of Fig. 11.10(a)—or so small that it creates
computational difficulties. Using the variance yields a variable scaling factor that
is inversely proportional to changes in size and works much as automatic volume
control does. Whatever the method used, the central idea is to remove dependency
on size while preserving the fundamental shape of the waveforms.
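As a rough Python sketch of these ideas (ours, not code from the book), the signature r(θ) can be computed from the boundary coordinates and then scaled by its variance; the uniform angle grid and the nearest-angle sampling below are simplifying assumptions.

import numpy as np

def signature(boundary_xy, n_angles=360):
    pts = np.asarray(boundary_xy, dtype=float)      # (N, 2) boundary coordinates
    d = pts - pts.mean(axis=0)                      # offsets from the centroid
    r = np.hypot(d[:, 0], d[:, 1])                  # distance from the centroid
    theta = np.mod(np.arctan2(d[:, 1], d[:, 0]), 2 * np.pi)
    grid = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    # For each angle increment, take the boundary point with the closest angle.
    idx = np.argmin(np.abs(theta[None, :] - grid[:, None]), axis=1)
    return r[idx]

def normalize_by_variance(sig):
    # Variance-based scaling discussed above; assumes the variance is not zero
    # (it is zero for the constant signature of a circle) or near zero.
    return sig / np.var(sig)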
Distance versus angle is not the only way to generate a signature. For example,
another way is to traverse the boundary and, corresponding to each point on the
boundary, plot the angle between a line tangent to the boundary at that point and a
reference line. The resulting signature, although quite different from the r(θ) curves
in Fig. 11.10, carries information about basic shape characteristics. For instance,
horizontal segments in the curve correspond to straight lines along the boundary
because the tangent angle is constant there. A variation of this approach is to use
the so-called slope density function as a signature. This function is a histogram of


a b
FIGURE 11.10 Distance-versus-angle signatures. In (a), r(θ) is constant. In (b), the signature consists of repetitions of the pattern r(θ) = A sec θ for 0 ≤ θ ≤ π/4, and r(θ) = A csc θ for π/4 < θ ≤ π/2.

tangent-angle values. Because a histogram is a measure of the concentration of val-
ues, the slope density function responds strongly to sections of the boundary with
constant tangent angles (straight or nearly straight segments) and has deep valleys
in sections producing rapidly varying angles (corners or other sharp inflections).

EXAMPLE 11.4 : Signatures of two regions.


Figures 11.11(a) and (d) show two binary objects, and Figs. 11.11(b) and (e) are their boundaries. The
corresponding r(θ) signatures in Figs. 11.11(c) and (f) range from 0° to 360° in increments of 1°. The
number of prominent peaks in the signatures is sufficient to differentiate between the shapes of the two
objects.

SKELETONS, MEDIAL AXES, AND DISTANCE TRANSFORMS


Like boundaries, skeletons are related to the shape of a region. Skeletons can be
computed from a boundary by filling the area enclosed by the boundary with fore-
ground values, and treating the result as a binary region. In other words, a skeleton is
computed using the coordinates of points in the entire region, including its boundary.
The idea is to reduce a region to a tree or graph by computing its skeleton. As we
explained in Section 9.5 (see Fig. 9.25), the skeleton of a region is the set of points in
the region that are equidistant from two or more points on the border of the region.
The skeleton is obtained using one of two principal approaches: (1) by succes-
sively thinning the region (e.g., using morphological erosion) while preserving end
points and line connectivity (this is called topology-preserving thinning); or (2)
by computing the medial axis of the region via an efficient implementation of the
medial axis transform (MAT) proposed by Blum [1967]. (As is true of thinning, the
MAT is highly susceptible to boundary and internal region irregularities, so smoothing
and other preprocessing steps generally are required to obtain a clean binary image.)
We discussed thinning in Section 9.5. The MAT of a region R with border B is as
follows: For each point p in R, we find its closest neighbor in B. If p has more than
one such neighbor, it is said


a b c
d e f
FIGURE 11.11
(a) and (d) Two
binary regions,
(b) and (e) their
external
boundaries, and
(c) and (f) their
corresponding r(θ)
signatures. The
horizontal axes
in (c) and (f) cor-
respond to angles
from 0° to 360°, in
increments of 1°.

to belong to the medial axis of R. The concept of “closest” (and thus the resulting
MAT) depends on the definition of a distance metric (see Section 2.5). Figure 11.12
shows some examples using the Euclidean distance. If the Euclidean distance is used,
the resulting skeleton is the same as what would be obtained by using the maximal
disks from Section 9.5. The skeleton of a region is defined as its medial axis.
The MAT of a region has an intuitive interpretation based on the “prairie fire”
concept discussed in Section 11.3 (see Fig. 11.15). Consider an image region as a
prairie of uniform, dry grass, and suppose that a fire is lit simultaneously along all
the points on its border. All fire fronts will advance into the region at the same speed.
The MAT of the region is the set of points reached by more than one fire front at
the same time.
In general, the MAT comes considerably closer than thinning to producing skel-
etons that “make sense.” However, computing the MAT of a region requires cal-
culating the distance from every interior point to every point on the border of the
region—an impractical endeavor in most applications. Instead, the approach is to
obtain the skeleton equivalently from the distance transform, for which numerous
efficient algorithms exist.
The distance transform of a region of foreground pixels in a background of zeros
is the distance from every pixel to the nearest nonzero valued pixel. Figure 11.13(a)
shows a small binary image, and Fig. 11.13(b) is its distance transform. Observe that
every 1-valued pixel has a distance transform value of 0 because its closest nonzero
valued pixel is itself. For the purpose of finding skeletons equivalent to the MAT,
we are interested in the distance from the pixels of a region of foreground (white)


a b c
FIGURE 11.12
Medial axes
(dashed) of three
simple regions.

pixels to their nearest background (zero) pixels, which constitute the region bound-
ary. Thus, we compute the distance transform of the complement of the image, as
Figs. 11.13(c) and (d) illustrate. By comparing Figs. 11.13(d) and 11.12(a), we see
in the former that the MAT (skeleton) is equivalent to the ridge of the distance
transform [i.e., the ridge in the image in Fig. 11.13(d)]. This ridge is the set of local
maxima [shown bold in Fig. 11.13(d)]. Figures 11.13(e) and (f) show the same effect
on a larger (414 × 708) binary image.
Finding approaches for computing the distance transform efficiently has been a
topic of research for many years. Numerous approaches exist that can compute the
distance transform with linear time complexity, O(K ), for a binary image with K
pixels. For example, the algorithm by Maurer et al. [2003] not only can compute the
distance transform in O(K), it can compute it in O(K/P) using P processors.
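A hedged SciPy sketch of the distance-transform route (not the fast-marching implementation referenced in Example 11.5): the Euclidean distance transform of the region is computed, and the ridge is approximated by keeping foreground pixels that are local maxima of the distance values.

import numpy as np
from scipy import ndimage

def ridge_skeleton(region):
    region = np.asarray(region, dtype=bool)          # True = foreground
    # Distance from every foreground pixel to the nearest background pixel
    # (the distance transform of the complement, as in Fig. 11.13).
    dt = ndimage.distance_transform_edt(region)
    # Crude ridge detector: keep pixels not exceeded by any 8-neighbor.
    ridge = (dt == ndimage.maximum_filter(dt, size=3)) & region
    return ridge, dt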

a b
c d
e f
FIGURE 11.13 (a) A small image and (b) its distance transform. Note that all 1-valued pixels in (a) have corresponding 0's in (b). (c) A small image, and (d) the distance transform of its complement. (e) A larger image, and (f) the distance transform of its complement. The Euclidean distance was used throughout.

(a)                  (b)
0 0 0 0 0            1.41 1    1    1    1.41
0 1 1 1 0            1    0    0    0    1
0 1 1 1 0            1    0    0    0    1
0 0 0 0 0            1.41 1    1    1    1.41

(c)                      (d)
0 0 0 0 0 0 0 0 0        0 0 0 0 0 0 0 0 0
0 1 1 1 1 1 1 1 0        0 1 1 1 1 1 1 1 0
0 1 1 1 1 1 1 1 0        0 1 2 2 2 2 2 1 0
0 1 1 1 1 1 1 1 0        0 1 2 3 3 3 2 1 0
0 1 1 1 1 1 1 1 0        0 1 2 2 2 2 2 1 0
0 1 1 1 1 1 1 1 0        0 1 1 1 1 1 1 1 0
0 0 0 0 0 0 0 0 0        0 0 0 0 0 0 0 0 0


a b
c d
FIGURE 11.14
(a) Thresholded
image of blood
vessels.
(b) Skeleton
obtained by
thinning, shown
superimposed
on the image
(note the spurs).
(c) Result of 40
passes of spur
removal.
(d) Skeleton
obtained using the
distance
transform.

EXAMPLE 11.5 : Skeletons obtained using thinning and pruning vs. the distance transform.
Figure 11.14(a) shows a segmented image of blood vessels, and Fig. 11.14(b) shows the skeleton obtained
using morphological thinning. As we discussed in Chapter 9, thinning is characteristically accompanied
by spurs, which certainly is the case here. Figure 11.14(c) shows the result of forty passes of spur removal.
With the exception of the few small spurs visible on the bottom left of the image, pruning did a reason-
able job of cleaning up the skeleton. One drawback of thinning is the loss of potentially important
features. This was not the case here, except the pruned skeleton does not cover the full expanse of the
image. Figure 11.14(d) shows the skeleton obtained using distance transform computations based on fast
marching (see Lee et al. [2005] and Shi and Karl [2008]). The way the algorithm we used implements
branch generation handles ambiguities such as spurs automatically.
The result in Fig. 11.14(d) is slightly superior to the result in Fig. 11.14(c), but both skeletons certainly
capture the important features of the image in this case. A key advantage of the thinning approach
is simplicity of implementation, which can be important in dedicated applications. Overall, distance-
transform formulations tend to produce skeletons less prone to discontinuities, but overcoming the
computational burden of the distance transform results in implementations that are considerably more
complex than thinning.

11.3 BOUNDARY FEATURE DESCRIPTORS


We begin our discussion of feature descriptors by considering several fundamental
approaches for describing region boundaries.


12.1 BACKGROUND

Humans possess the most sophisticated pattern recognition capabilities in the known
biological world. By contrast, the capabilities of current recognition machines pale
in comparison with tasks humans perform routinely, from being able to interpret the
meaning of complex images, to our ability for generalizing knowledge stored in our
brains. But recognition machines play an important, sometimes even crucial role in
everyday life. Imagine what modern life would be like without machines that read
barcodes, process bank checks, inspect the quality of manufactured products, read
fingerprints, sort mail, and recognize speech.
In image pattern recognition, we think of a pattern as a spatial arrangement of
features. A pattern class is a set of patterns that share some common properties. Pat-
tern recognition by machine encompasses techniques for automatically assigning
patterns to their respective classes. That is, given a pattern or sets of patterns whose
class is unknown, the job of a pattern recognition system is to assign a class label to
each of its input patterns.
There are four main stages involved in recognition: (1) sensing, (2) preprocessing,
(3) feature extraction, and (4) classification. In terms of image processing, sensing is
concerned with generating signals in a spatial (2-D) or higher-dimensional format.
We covered numerous aspects of image sensing in Chapter 1. Preprocessing deals
with techniques for tasks such as noise reduction, enhancement, restoration, and
segmentation, as discussed in earlier chapters. You learned about feature extraction
in Chapter 11. Classification, the focus of this chapter, deals with using a set of fea-
tures as the basis for assigning class labels to unknown input image patterns.
In the following section, we will discuss three basic approaches used for image
pattern classification: (1) classification based on matching unknown patterns against
specified prototypes, (2) optimum statistical classifiers, and (3) neural networks.
One way to characterize the differences between these approaches is in the level
of “engineering” required to transform raw data into formats suitable for computer
processing. Ultimately, recognition performance is determined by the discriminative
power of the features used.
In classification based on prototypes, the objective is to make the features so
unique and easily detectable that classification itself becomes a simple task. A good
example of this is bank-check processors, which use stylized fonts to simplify
machine processing (we will discuss this application in Section 12.3).
In the second category, classification is cast in decision-theoretic, statistical terms,
and the classification approach is based on selecting parameters that can be shown
to yield optimum classification performance in a statistical sense. Here, emphasis is
placed on both the features used, and the design of the classifier. We will illustrate
this approach in Section 12.4 by deriving the Bayes pattern classifier, starting from
basic principles.
In the third category, classification is performed using neural networks. As you
will learn in Sections 12.5 and 12.6, neural networks can operate using engineered
features too, but they have the unique ability of being able to generate, on their own,
representations (features) suitable for recognition. These systems can accomplish
this using raw data, without the need for engineered features.


One characteristic shared by the preceding three approaches is that they are
based on parameters that must be either specified or learned from patterns that rep-
resent the recognition problem we want to solve. The patterns can be labeled, mean-
ing that we know the class of each pattern, or unlabeled, meaning that the data are
known to be patterns, but the class of each pattern is unknown. A classic example
of labeled data is the character recognition problem, in which a set of character
samples is collected and the identity of each character is recorded as a label from
the group 0 through 9 and a through z. An example of unlabeled data is when we are
seeking clusters in a data set, with the aim of utilizing the resulting cluster centers as
being prototypes of the pattern classes contained in the data.
When working with labeled data, a given data set generally is subdivided into
three subsets: a training set, a validation set, and a test set (a typical subdivision might
be 50% training, and 25% each for the validation and test sets). (Because the examples
in this chapter are intended to demonstrate basic principles and are not large scale,
we dispense with validation and subdivide the pattern data into training and test sets.)
The process by which a training set is used to generate classifier parameters is called
training. In this mode, a classifier is given the class label of each pattern, the objective
being to make adjustments in the parameters if the classifier makes a mistake in
identifying the class of the given pattern. At this point, we might be working with several
candidate designs. At the end of training, we use the validation set to compare the
various designs against a performance objective. Typically, several iterations of train-
ing/validation are required to establish the design that comes closest to meeting the
desired objective. Once a design has been selected, the final step is to determine how
it will perform “in the field.” For this, we use the test set, which consists of patterns
that the system has never “seen” before. If the training and validation sets are truly
representative of the data the system will encounter in practice, the results of train-
ing/validation should be close to the performance using the test set. If training/vali-
dation results are acceptable, but test results are not, we say that training/validation
“over fit” the system parameters to the available data, in which case further work on
the system architecture is required. Of course all this assumes that the given data are
truly representative of the problem we want to solve, and that the problem in fact
can be solved by available technology.
A system that is designed using training data is said to undergo supervised learn-
ing. If we are working with unlabeled data, the system learns the pattern classes
themselves while in an unsupervised learning mode. In this chapter, we deal only
with supervised learning. As you will see in this and the next chapter, supervised
learning covers a broad range of approaches, from applications in which a system
learns parameters of features whose form is fixed by a designer, to systems that uti-
Generally, we associate
lize deep learning and large sets of raw data sets to learn, on their own, the features
the concept of deep required for classification. These systems accomplish this task without a human
learning with large sets
of data. These ideas are
designer having to specify the features, a priori.
discussed in more detail After a brief discussion in the next section of how patterns are formed, and on
later in this section and
next.
the nature of pattern classes, we will discuss in Section 12.3 various approaches for
prototype-based classification. In Section 12.4, we will start from basic principles
and derive the equations of the Bayes classifier, an approach characterized by opti-
mum classification performance on an average basis. We will also discuss supervised
training of a Bayes classifier based on the assumption of multivariate Gaussian


distributions. Starting with Section 12.5, we will spend the rest of the chapter discuss-
ing neural networks. We will begin Section 12.5 with a brief introduction to percep-
trons and some historical facts about machine learning. Then, we will introduce the
concept of deep neural networks and derive the equations of backpropagation, the
method of choice for training deep neural nets. These networks are well-suited for
applications in which input patterns are vectors. In Section 12.6, we will introduce
deep convolutional neural networks, which currently are the preferred approach
when the system inputs are digital images. After deriving the backpropagation equa-
tions used for training convolutional nets, we will give several examples of appli-
cations involving classes of images of various complexities. In addition to working
directly with image inputs, deep convolutional nets are capable of learning, on their
own, image features suitable for classification. This is accomplished starting with raw
image data, as opposed to the other classification methods discussed in Sections 12.3
and 12.4, which rely on “engineered” features whose form, as noted earlier, is speci-
fied a priori by a human designer.

12.2 PATTERNS AND PATTERN CLASSES



In image pattern classification, the two principal pattern arrangements are quantita-
tive and structural. Quantitative patterns are arranged in the form of pattern vectors.
Structural patterns typically are composed of symbols, arranged in the form of strings,
trees, or, less frequently, as graphs. Most of the work in this chapter is based on pat-
tern vectors, but we will discuss structural patterns briefly at the end of this section,
and give an example at the end of Section 12.3.

PATTERN VECTORS
Pattern vectors are represented by lowercase letters, such as x, y, and z, and have
the form

x = [x1, x2, …, xn]T        (12-1)
where each component, xi , represents the ith feature descriptor, and n is the total
number of such descriptors. We can express a vector in the form of a column, as
in Eq. (12-1), or in the equivalent row form x = (x1, x2, …, xn)T, where T indicates
transposition. A pattern vector may be "viewed" as a point in n-dimensional Euclid-
ean space, and a pattern class may be interpreted as a "hypercloud" of points in this
pattern space. For the purpose of recognition, we like for our pattern classes to be
grouped tightly, and as far away from each other as possible.
Pattern vectors can be formed directly from image pixel intensities by vector-
izing the image using, for example, linear indexing, as in Fig. 12.1 (we discussed
linear indexing in Section 2.4; see Fig. 2.22). A more common approach is for
pattern elements to be features. An early example is the work of
Fisher [1936] who, close to a century ago, reported the use of what then was a new


a b
FIGURE 12.1 Using linear indexing to vectorize a grayscale image. (In the figure, the intensities of a 4 × 4 image are stacked into a 16-D column vector x.)

technique called discriminant analysis to recognize three types of iris flowers (Iris
setosa, virginica, and versicolor). Fisher described each flower using four features:
the length and width of the petals, and similarly for the sepals (sepals are the under-
growth beneath the petals; see Fig. 12.2). This leads to the 4-D vectors shown in the
figure. A set of these vectors, obtained for fifty samples of each flower species, consti-
tutes the three famous Fisher iris pattern class-
es. Had Fisher been working today, he probably would have added spectral colors
and shape features to his measurements, yielding vectors of higher dimensionality.
We will be working with the original iris data set later in this chapter.
A higher-level representation of patterns is based on feature descriptors of the
types you learned in Chapter 11. For instance, pattern vectors formed from descrip-
tors of boundary shape are well-suited for applications in controlled environments,
such as industrial inspection. Figure 12.3 illustrates the concept. Here, we are inter-
ested in classifying different types of noisy shapes, a sample of which is shown in
the figure. If we represent an object by its signature, we would obtain 1-D signals
of the form shown in Fig. 12.3(b). We can express a signature as a vector by sam-
pling its amplitude at increments of θ, then forming a vector by letting xi = r(θi),
for i = 0, 1, 2, …, n. Instead of using "raw" sampled signatures, a more common
approach is to compute some function, xi = g(r(θi)), of the signature samples and
use them to form vectors. You learned in Section 11.3 several approaches to do this,
such as statistical moments.
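For instance (a minimal sketch under our own assumptions about the choice of g and the number of components, not the book's code), the sampled signature can be reduced to a short vector of central moments:

import numpy as np

def moment_vector(signature_samples, n_components=4):
    s = np.asarray(signature_samples, dtype=float)
    mean = s.mean()
    # Components x_i are central moments of increasing order of the samples.
    return np.array([np.mean((s - mean) ** k) for k in range(2, 2 + n_components)])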

FIGURE 12.2 Petal and sepal width and length measurements (see arrows) performed on iris flowers for the purpose of data classification. The image shown is of the Iris virginica species. (Image courtesy of USDA.) The measurements form the pattern vector x = [x1, x2, x3, x4]T, with x1 = petal width, x2 = petal length, x3 = sepal width, and x4 = sepal length.


a b
FIGURE 12.3 (a) A noisy object boundary, and (b) its corresponding signature r(θ). The components of the pattern vector are xi = g(r(θi)).

Vectors can be formed also from features of both boundary and regions. For
example, the objects in Fig. 12.4 can be represented by 3-D vectors whose compo-
nents capture shape information related to both boundary and region properties
of single binary objects. Pattern vectors can be used also to represent properties of
image regions. For example, the elements of the 6-D vector in Fig. 12.5 are texture
measures based on the feature descriptors in Table 11.3. Figure 12.6 shows an exam-
ple in which pattern vector elements are features that are invariant to transforma-
tions, such as image rotation and scaling (see Section 11.4).
When working with sequences of registered images, we have the option of using
pattern vectors formed from corresponding pixels in those images (see Fig. 12.7).
Forming pattern vectors in this way implies that recognition will be based on infor-
mation extracted from the same spatial location across the images. Although this
may seem like a very limiting approach, it is ideally suited for applications such as
recognizing regions in multispectral images, as you will see in Section 12.4.
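A minimal NumPy sketch of this construction (band order and registration are assumed; this is our illustration, not the book's code): stacking the registered bands and reshaping yields one pattern vector per pixel location.

import numpy as np

def pixel_pattern_vectors(bands):
    # bands: sequence of registered 2-D images, one per spectral band.
    cube = np.stack(bands, axis=-1)              # (rows, cols, n_bands)
    return cube.reshape(-1, cube.shape[-1])      # one n_bands-D vector per pixel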
When working with entire images as units, we need the detail afforded by vectors
of much-higher dimensionality, such as those we discussed in Section 11.7 in connec-
tion with the SIFT algorithm. However, a more powerful approach when working
with entire images is to use deep convolutional neural networks. We will discuss
neural nets in detail in Sections 12.5 and 12.6.

STRUCTURAL PATTERNS
Pattern vectors are not suitable for applications in which objects are represented
by structural features, such as strings of symbols. Although they are used much less
than vectors in image processing applications, patterns containing structural descrip-
tions of objects are important in applications where shape is of interest. Figure 12.8
shows an example. The boundaries of the bottles were approximated by a polygon

a b c d
FIGURE 12.4 Pattern vectors whose components capture both boundary and regional characteristics: x = [x1, x2, x3]T, with x1 = compactness, x2 = circularity, and x3 = eccentricity.

FIGURE 12.5 An example of pattern vectors based on properties of subimages. See Table 11.3 for an explanation of the components of x = [x1, …, x6]T: x1 = max probability, x2 = correlation, x3 = contrast, x4 = uniformity, x5 = homogeneity, x6 = entropy.

FIGURE 12.6 Feature vectors with components that are invariant to transformations such as rotation, scaling, and translation: x = [x1, …, x7]T = [f1, …, f7]T, where the f's are the seven moment invariants.

FIGURE 12.7 Pattern (feature) vectors formed by concatenating corresponding pixels from a set of six registered multispectral images (spectral bands 1 through 6), giving x = [x1, x2, …, x6]T. (Original images courtesy of NASA.)


FIGURE 12.8 Symbol string generated from a polygonal approximation of the boundaries of medicine bottles. In the figure, θ denotes an interior angle, b denotes a line segment of specified length, and the symbol string is generated by traversing the boundary in the indicated direction of travel.

using the approach explained in Section 11.2. The boundary is subdivided into line
segments (denoted by b in the figure), and the interior angle, θ, is computed at each
intersection of two line segments. A string of sequential symbols is generated as the
boundary is traversed in the counterclockwise direction, as the figure shows. Strings
of this form are structural patterns, and the objective, as you will see in Section 12.3,
is to match a given string against stored string prototypes.
A tree is another structural representation, suitable for higher-level descriptions
of an entire image in terms of its component regions. Basically, most hierarchical
ordering schemes lead to tree structures. For example, Fig. 12.9 shows a satellite
image of a heavily built downtown area and surrounding residential areas. Let the
symbol $ represent the root of a tree. The (upside down) tree shown in the figure
was obtained using the structural relationship “composed of.” Thus, the root of the
tree represents the entire image. The next level indicates that the image is composed
of a downtown and residential areas. In turn, the residential areas are composed
of housing, highways, and shopping malls. The next level down in the tree further
describes the housing and highways. We can continue this type of subdivision until
we reach the limit of our ability to resolve different regions in the image.

12.3 PATTERN CLASSIFICATION BY PROTOTYPE MATCHING


Prototype matching involves comparing an unknown pattern against a set of pro-
totypes, and assigning to the unknown pattern the class of the prototype that is the
most “similar” to the unknown. Each prototype represents a unique pattern class,
but there may be more than one prototype for each class. What distinguishes one
matching method from another is the measure used to determine similarity.

MINIMUM-DISTANCE CLASSIFIER
One of the simplest and most widely used prototype matching methods is the
minimum-distance classifier (also referred to as the nearest-neighbor classifier) which,
as its name implies, computes a distance-based measure between an unknown pattern
vector and each of the class prototypes. It then assigns the unknown pattern to the
class of its closest prototype. The prototype


[Tree diagram: the image ($) is composed of downtown and residential areas, which are in turn composed of buildings, highways, housing, and shopping malls, subdivided further by properties such as density, structure size, wooded areas, and types of intersections.]
FIGURE 12.9 Tree representation of a satellite image showing a heavily built downtown area (Washington, D.C.) and
surrounding residential areas. (Original image courtesy of NASA.)

vectors of the minimum-distance classifier usually are the mean vectors of the vari-
ous pattern classes:
mj = (1/nj) ∑x∈cj x    j = 1, 2, …, Nc        (12-2)

where n j is the number of pattern vectors used to compute the jth mean vector,
c j is the jth pattern class, and Nc is the number of classes. If we use the Euclidean
distance to determine similarity, the minimum-distance classifier computes the dis-
tances
Dj(x) = ‖x − mj‖    j = 1, 2, …, Nc        (12-3)

where ‖a‖ = (aTa)^(1/2) is the Euclidean norm. The classifier then assigns an unknown
pattern x to class ci if Di (x) < Dj (x) for j = 1, 2, …, Nc , j ≠ i. Ties [i.e., Di (x) = Dj (x)]
are resolved arbitrarily.
It is not difficult to show (see Problem 12.2) that selecting the smallest distance is
equivalent to evaluating the functions

dj(x) = mjT x − (1/2) mjT mj    j = 1, 2, …, Nc        (12-4)


and assigning an unknown pattern x to the class whose prototype yielded the largest
value of d. That is, x is assigned to class ci , if

di (x) > d j (x) j = 1, 2, … , Nc ; j ≠ i (12-5)

When used for recognition, functions of this form are referred to as decision or dis-
criminant functions.
The decision boundary separating class ci from c j is given by the values of x for
which
di (x) = d j (x) (12-6)

or, equivalently, by values of x for which

di (x) − d j (x) = 0 (12-7)

The decision boundaries for a minimum-distance classifier follow directly from this
equation and Eq. (12-4):

dij(x) = di(x) − dj(x) = (mi − mj)T x − (1/2)(mi − mj)T(mi + mj) = 0        (12-8)
The boundary given by Eq. (12-8) is the perpendicular bisector of the line segment
joining mi and m j (see Problem 12.3). In 2-D (i.e., n = 2), the perpendicular bisector
is a line, for n = 3 it is a plane, and for n > 3 it is called a hyperplane.
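The following NumPy sketch (ours, not from the book) implements Eqs. (12-2), (12-4), and (12-5); it assumes the training patterns are the rows of X, with integer class labels 0, …, Nc − 1.

import numpy as np

def class_means(X, labels, num_classes):
    # Eq. (12-2): the prototype of each class is the mean of its training vectors.
    return np.array([X[labels == j].mean(axis=0) for j in range(num_classes)])

def classify(x, means):
    # Eq. (12-4): d_j(x) = m_j^T x - (1/2) m_j^T m_j; Eq. (12-5): pick the largest.
    d = means @ x - 0.5 * np.sum(means * means, axis=1)
    return int(np.argmax(d))

Applied to the two class means of Example 12.1, the two decision functions reduce to the expressions for d1(x) and d2(x) given there.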

EXAMPLE 12.1 : Illustration of the minimum-distance classifier for two classes in 2-D.
Figure 12.10 shows scatter plots of petal width and length values for the classes Iris versicolor and Iris
setosa. As mentioned in the previous section, pattern vectors in the iris database consists of four mea-
surements for each flower. We show only two here so that you can visualize the pattern classes and the
decision boundary between them. We will work with the complete database later in this chapter.
We denote the Iris versicolor and setosa data as classes c1 and c2 , respectively. The means of the two
classes are m1 = (4.3, 1.3)T and m2 = (1.5, 0.3)T. It then follows from Eq. (12-4) that

d1(x) = m1T x − (1/2) m1T m1 = 4.3 x1 + 1.3 x2 − 10.1

and

d2(x) = m2T x − (1/2) m2T m2 = 1.5 x1 + 0.3 x2 − 1.17

From Eq. (12-8), the equation of the boundary is

d12(x) = d1(x) − d2(x) = 2.8 x1 + 1.0 x2 − 8.9 = 0


FIGURE 12.10 Decision boundary of a minimum-distance classifier (based on two measurements) for the classes of Iris versicolor and Iris setosa. The dark dot and square are the means of the two classes. [Scatter plot of petal width (cm) versus petal length (cm), with the boundary 2.8x1 + 1.0x2 − 8.9 = 0.]

Figure 12.10 shows a plot of this boundary. Substituting any pattern vector from class c1 into this equa-
tion would yield d12 (x) > 0. Conversely, any pattern from class c2 would give d12 (x) < 0. Thus, given an
unknown pattern x belonging to one of these two classes, the sign of d12 (x) would be sufficient to deter-
mine the class to which that pattern belongs.

The minimum-distance classifier works well when the distance between means is
large compared to the spread or randomness of each class with respect to its mean.
In Section 12.4 we will show that the minimum-distance classifier yields optimum
performance (in terms of minimizing the average loss of misclassification) when the
distribution of each class about its mean is in the form of a spherical “hypercloud” in
n-dimensional pattern space.
As noted earlier, one of the keys to accurate recognition performance is to specify
features that are effective discriminators between classes. As a rule, the better the
features are at meeting this objective, the better the recognition performance will be.
In the case of the minimum-distance classifier this implies wide separation between
means and tight grouping of the classes.
Systems based on the American Bankers Association E-13B font character set are a classic
example of how highly engineered features can be used in conjunction with a simple
classifier to achieve superior results. In the mid-1940s, bank checks were processed
manually, which was a laborious, costly process prone to mistakes. As the volume
of check writing increased in the early 1950s, banks became keenly interested in
automating this task. In the middle 1950s, the E-13B font and the system that reads
it became the standard solution to the problem. As Fig. 12.11 shows, this font set con-
sists of 14 characters laid out on a 9 × 7 grid. The characters are stylized to maximize
the difference between them. The font was designed to be compact and readable by
humans, but the overriding purpose was that the characters should be readable by
machine, quickly, and with very high accuracy.


FIGURE 12.11
The American
Bankers
Association
E-13B font
character set and
corresponding
waveforms.

(The 14 characters comprise the ten digits plus four special symbols: Transit, Amount, On Us, and Dash.)

In addition to a stylized font design, the operation of the reading system is further
enhanced by printing each character using an ink that contains finely ground mag-
netic material. (Appropriately, recognition of magnetized characters is referred to as
Magnetic Ink Character Recognition, or MICR.) To improve character detectability in
a check being read, the ink is subjected to a magnetic field that accentuates each
character against the background. The stylized design further enhances character
detectability. The characters are scanned in a horizontal direction with a single-slit
reading head that is narrower but
taller than the characters. As a check passes through the head, the sensor produces a
1-D electrical signal (a signature) that is conditioned to be proportional to the rate
of increase or decrease of the character area under the head. For example, consider
the waveform of the number 0 in Fig. 12.11. As a check moves to the right past the
head, the character area seen by the sensor begins to increase, producing a positive
derivative (a positive rate of change). As the right leg of the character begins to pass
under the head, the character area seen by the sensor begins to decrease, produc-
ing a negative derivative. When the head is in the middle zone of the character, the
area remains nearly constant, producing a zero derivative. This waveform repeats
itself as the other leg of the character enters the head. The design of the font ensures
that the waveform of each character is distinct from all others. It also ensures that
the peaks and zeros of each waveform occur approximately on the vertical lines of
the background grid on which these waveforms are displayed, as the figure shows.
The E-13B font has the property that sampling the waveforms only at these (nine)


points yields enough information for their accurate classification. The effectiveness
of these highly engineered features is further refined by the magnetized ink, which
results in clean waveforms with almost no scatter.
Designing a minimum-distance classifier for this application is straightforward.
We simply store the sample values of each waveform at the vertical lines of the grid,
and let each set of the resulting samples be represented as a 9-D prototype vector,
m j , j = 1, 2,…, 14. When an unknown character is to be classified, the approach is
to scan it in the manner just described, express the grid samples of the waveform as
a 9-D vector, x, and identify its class by selecting the class of the prototype vector
that yields the highest value in Eq. (12-4). We do not even need a computer to do
this. Very high classification speeds can be achieved with analog circuits composed
of resistor banks (see Problem 12.4).
The most important lesson in this example is that a recognition problem often can
be made trivial if we can control the environment in which the patterns are gener-
ated. The development and implementation of the E-13B font reading system is a
striking example of this fact. On the other hand, this system would be inadequate if
we added the requirement that it has to recognize the textual content and signature
written on each check. For this, we need systems that are significantly more complex,
such as the convolutional neural networks we will discuss in Section 12.6.

USING CORRELATION FOR 2-D PROTOTYPE MATCHING


We introduced the basic idea of spatial correlation and convolution in Section 3.4,
and used these concepts extensively in Chapter 3 for spatial filtering. From Eq. (3-34),
we know that correlation of a kernel w with an image f ( x, y) is given by

(w ☆ f)(x, y) = ∑s ∑t w(s, t) f(x + s, y + t)        (12-9)

where the limits of summation are taken over the region shared by w and f . This
equation is evaluated for all values of the displacement variables x and y so all ele-
ments of w visit every pixel of f . As you know, correlation has its highest value(s)
in the region(s) where f and w are equal or nearly equal. In other words, Eq. (12-9)
finds locations where w matches a region of f . But this equation has the drawback
that the result is sensitive to changes in the amplitude of either function. In order
to normalize correlation to amplitude changes in one or both functions, we perform
matching using the correlation coefficient instead:

g(x, y) = ∑s∑t [w(s,t) − w̄][f(x+s, y+t) − f̄xy] / {∑s∑t [w(s,t) − w̄]² ∑s∑t [f(x+s, y+t) − f̄xy]²}^(1/2)        (12-10)

where the limits of summation are taken over the region shared by w and f, w̄ is the
average value of the kernel (computed only once), and f̄xy is the average value of f in
the region coincident with w. In image correlation work, w is often referred to as a
template (i.e., a prototype subimage) and correlation is referred to as template matching.

To be formal, we should refer to correlation (and the correlation coefficient) as cross-
correlation when the functions are different, and as autocorrelation when they are the
same. However, it is customary to use the generic term correlation and correlation
coefficient, except when the distinction is important (as in deriving equations, in which
it makes a difference which is being applied).


FIGURE 12.12 The mechanics of template matching. An m × n template w, centered at an arbitrary location (x, y), is moved so that it visits every pixel of image f; the padding around f is (m − 1)/2 rows and (n − 1)/2 columns deep.

It can be shown (see Problem 12.5) that g( x, y) has values in the range [ −1, 1] and is thus
normalized to changes in the amplitudes of w and f . The maximum value of g occurs
when the normalized w and the corresponding normalized region in f are identical.
This indicates maximum correlation (the best possible match). The minimum occurs
when the two normalized functions exhibit the least similarity in the sense of Eq. (12-10).
Figure 12.12 illustrates the mechanics of the procedure just described. The border
around image f is padding, as explained in Section 3.4. In template matching, values
of correlation when the center of the template is past the border of the image gener-
ally are of no interest, so the padding is limited to half the kernel width.
The template in Fig. 12.12 is of size m × n, and it is shown with its center at an
arbitrary location ( x, y). The value of the correlation coefficient at that point is com-
puted using Eq. (12-10). Then, the center of the template is incremented to an adja-
cent location and the procedure is repeated. Values of the correlation coefficient
g( x, y) are obtained by moving the center of the template (i.e., by incrementing x
and y) so the center of w visits every pixel in f . At the end of the procedure, we
look for the maximum in g( x, y) to find where the best match occurred. It is possible
to have multiple locations in g( x, y) with the same maximum value, indicating sev-
eral matches between w and f .
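A direct (and deliberately unoptimized) Python sketch of Eq. (12-10) and of the mechanics in Fig. 12.12 follows; it is ours, not the book's, and the zero padding of half the template size and the handling of zero-variance regions are our assumptions.

import numpy as np

def correlation_coefficient(f, w):
    f = np.asarray(f, dtype=float)
    w = np.asarray(w, dtype=float)
    m, n = w.shape
    fp = np.pad(f, ((m // 2, m // 2), (n // 2, n // 2)))   # pad half the kernel size
    wz = w - w.mean()                                      # w - w_bar, computed once
    g = np.zeros(f.shape)
    for x in range(f.shape[0]):
        for y in range(f.shape[1]):
            patch = fp[x:x + m, y:y + n]
            pz = patch - patch.mean()                      # f - f_bar_xy under the template
            denom = np.sqrt((wz ** 2).sum() * (pz ** 2).sum())
            g[x, y] = (wz * pz).sum() / denom if denom > 0 else 0.0
    return g                                               # values in [-1, 1]

The location of the best match is then given by np.unravel_index(np.argmax(g), g.shape).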

EXAMPLE 12.2 : Matching by correlation.


Figure 12.13(a) shows a 913 × 913 satellite image of 1992 Hurricane Andrew, in which the eye of the
storm is clearly visible. We want to use correlation to find the location of the best match in Fig. 12.13(a)
of the template in Fig. 12.13(b), which is a 31 × 31 subimage of the eye of the storm. Figure 12.13(c)
shows the result of computing the correlation coefficient in Eq. (12-10) for all values of x and y in
the original image. The size of this image was 943 × 943 pixels due to padding (see Fig. 12.12), but we
cropped it to the size of the original image for display. The intensity in this image is proportional to the
correlation values, and all negative correlations were clipped at 0 (black) to simplify the visual analysis
of the image. The area of highest correlation values appears as a small white region in this image. The
brightest point in this region matches with the center of the eye of the storm. Figure 12.13(d) shows as a


a b
c d
FIGURE 12.13
(a) 913 × 913
satellite image
of Hurricane
Andrew.
(b) 31 × 31
template of the
eye of the storm.
(c) Correlation
coefficient shown
as an image (note
the brightest
point, indicated
by an arrow).
(d) Location of
the best match
(identified by the
arrow). This point
is a single pixel,
but its size was
enlarged to make
it easier to see.
(Original image
courtesy of
NOAA.)

white dot the location of this maximum correlation value (in this case there was a unique match whose
maximum value was 1), which we see corresponds closely with the location of the eye in Fig. 12.13(a).

MATCHING SIFT FEATURES


We discussed the scale-invariant feature transform (SIFT) in Section 11.7. SIFT
computes a set of invariant features that can be used for matching between known
(prototype) and unknown images. The SIFT implementation in Section 11.7 yields
128-dimensional feature vectors for each local region in an image. SIFT performs
matching by looking for correspondences between sets of stored feature vector pro-
totypes and feature vectors computed for an unknown image. Because of the large
number of features involved, searching for exact matches is computationally inten-
sive. Instead, the approach is to use a best-bin-first method that can identify the near-
est neighbors with high probability using only a limited amount of computation (see
Lowe [1999], [2004]). The search is further simplified by looking for clusters of poten-
tial solutions using the generalized Hough transform proposed by Ballard [1981]. We

FIGURE 12.14
Circuit board
image of size
948 × 915 pixels,
and a subimage
of one of the
connectors. The
subimage is of size
212 × 128 pixels,
shown zoomed
on the right for
clarity. (Original
image courtesy of
Mr. Joseph E.
Pascente, Lixi,
Inc.)

know from the discussion in Section 10.2 that the Hough transform simplifies looking
for data patterns by utilizing bins that reduce the level of detail with which we look at
a data set. We already discussed the SIFT algorithm in Section 11.7. The focus in this
section is to further illustrate the capabilities of SIFT for prototype matching.
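A hedged OpenCV sketch of this kind of prototype matching (OpenCV's SIFT with Lowe's ratio test stands in for the best-bin-first search and Hough clustering described above, which are not used here):

import cv2

def sift_prototype_matches(image, prototype, ratio=0.75):
    # image, prototype: 8-bit grayscale arrays (the large image and the subimage).
    sift = cv2.SIFT_create()
    kp_img, des_img = sift.detectAndCompute(image, None)
    kp_pro, des_pro = sift.detectAndCompute(prototype, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    # Keep a match only if it is clearly better than the second-best candidate.
    good = [m for m, n in matcher.knnMatch(des_pro, des_img, k=2)
            if m.distance < ratio * n.distance]
    return kp_img, kp_pro, good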
Figure 12.14 shows the circuit board image we have used several times before.
The small rectangle enclosing the rightmost connector on the top of the large image
identifies an area from which an image of the connector was extracted. The small
image is shown zoomed for clarity. The sizes of the large and small images are shown
in the figure caption. Figure 12.15 shows the keypoints found by SIFT, as explained
in Section 11.7. They are visible as faint lines on both images. The zoomed view of
the subimage shows them a little clearer. It is important to note that the keypoints
for the image and subimage were found independently by SIFT. The large image
had 2714 keypoints, and the small image had 35.
Figure 12.16 shows the matches between keypoints found by SIFT. A total of 41
matches were found between the two images. Because there are only 35 keypoints

FIGURE 12.15
Keypoints found
by SIFT. The
large image has
2714 keypoints
(visible as faint
gray lines). The
subimage has 35
keypoints. This is
a separate image,
and SIFT found
its keypoints inde-
pendently of the
large image. The
zoomed section is
shown for clarity.


FIGURE 12.16
Matches found by
SIFT between the
large and small
images. A total of
41 matching pairs
were found. They
are shown Errors
connected by
straight lines.
Only three of the
matches were
“real” errors
(labeled “Errors”
in the figure).

in the small image, obviously at least six matches are either incorrect, or there are
multiple matches. Three of the errors are clearly visible as matches with connectors
in the middle of the large image. However, if you compare the shape of the connec-
tors in the middle of the large image, you can see that they are virtually identical to
parts of the connectors on the right. Therefore, these errors can be explained on that
basis. The other three extra matches are easier to explain. All connectors on the top
right of the circuit board are identical, and we are comparing one of them against
the rest. There is no way for a system to tell the difference between them. In fact, by
looking at the connecting lines, we can see that the matches are between the subim-
age and all five connectors. These in fact are correct matches between the subimage
and other connectors that are identical to it.

MATCHING STRUCTURAL PROTOTYPES


The techniques discussed up to this point deal with patterns quantitatively, and
largely ignore any structural relationships inherent in pattern shapes. The methods
discussed in this section seek to achieve pattern recognition by capitalizing precisely
on these types of relationships. In this section, we introduce two basic approaches
for the recognition of boundary shapes based on string representations, which are
the most practical approach in structural pattern recognition.

Matching Shape Numbers


A procedure similar in concept to the minimum-distance classifier introduced ear-
lier for pattern vectors can be formulated for comparing region boundaries that are


described by shape numbers. With reference to the discussion in Section 11.3, the
degree of similarity, k, between two region boundaries, is defined as the largest order
for which their shape numbers still coincide. For example, let a and b denote shape
numbers of closed boundaries represented by 4-directional chain codes. These two
shapes have a degree of similarity k if
sj(a) = sj(b)  for j = 4, 6, 8, …, k;  and
sj(a) ≠ sj(b)  for j = k + 2, k + 4, …        (12-11)

(Parameter j starts at 4 and is always even because we are working with 4-connectivity,
and we require that boundaries be closed.)

where s indicates shape number, and the subscript indicates shape order. The dis-
tance between two shapes a and b is defined as the inverse of their degree of simi-
larity:
D(a, b) = 1/k        (12-12)
This expression satisfies the following properties:

D(a, b) ≥ 0
D(a, b) = 0 if and only if a = b        (12-13)
D(a, c) ≤ max[D(a, b), D(b, c)]

Either k or D may be used to compare two shapes. If the degree of similarity is used,
the larger k is, the more similar the shapes are (note that k is infinite for identical
shapes). The reverse is true when Eq. (12-12) is used.
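A short Python sketch of Eqs. (12-11) and (12-12) follows; shape_number(code, order), which returns the shape number of a closed 4-directional chain code at a given even order, is a hypothetical helper standing in for the Section 11.3 machinery and is not defined here.

def degree_of_similarity(code_a, code_b, shape_number, max_order=60):
    # Largest (even) order k for which the shape numbers still coincide.
    # For identical shapes k is unbounded; here it is truncated at max_order.
    k = 0
    for order in range(4, max_order + 1, 2):
        if shape_number(code_a, order) != shape_number(code_b, order):
            break
        k = order
    return k

def shape_distance(code_a, code_b, shape_number, max_order=60):
    k = degree_of_similarity(code_a, code_b, shape_number, max_order)
    return float("inf") if k == 0 else 1.0 / k      # Eq. (12-12): D(a, b) = 1/k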

EXAMPLE 12.3 : Matching shape numbers.


Suppose we have a shape, f , and want to find its closest match in a set of five shape prototypes, denoted
by a, b, c, d, and e, as shown in Fig. 12.17(a). The search may be visualized with the aid of the similarity
tree in Fig. 12.17(b). The root of the tree corresponds to the lowest possible degree of similarity, which
is 4. Suppose shapes are identical up to degree 8, with the exception of shape a, whose degree of simi-
larity with respect to all other shapes is 6. Proceeding down the tree, we find that shape d has degree of
similarity 8 with respect to all others, and so on. Shapes f and c match uniquely, having a higher degree
of similarity than any other two shapes. Conversely, if a had been an unknown shape, all we could have
said using this method is that a was similar to the other five shapes with degree of similarity 6. The same
information can be summarized in the form of the similarity matrix in Fig. 12.17(c).

String Matching
Suppose two region boundaries, a and b, are coded into strings of symbols, denot-
ed as a1a2…an and b1b2…bm, respectively. Let α represent the number of matches
between the two strings, where a match occurs in the kth position if ak = bk. The
number of symbols that do not match is

β = max(|a|, |b|) − α        (12-14)


a
b c
FIGURE 12.17 (a) Shapes. (b) Similarity tree. (c) Similarity matrix. (Bribiesca and Guzman.)

Similarity tree (b), by degree:
4:  {a b c d e f}
6:  {a} {b c d e f}
8:  {a} {b c d e f}
10: {a} {d} {c f} {b e}
12: {a} {d} {c f} {b} {e}
14: {a} {d} {c} {f} {b} {e}

Similarity matrix (c):
      a   b   c   d   e   f
a         6   6   6   6   6
b             8   8  10   8
c                 8   8  12
d                     8   8
e                         8
f

where |arg| is the length (number of symbols) of the string in the argument. It can be
shown that β = 0 if and only if a and b are identical (see Problem 12.7).
An effective measure of similarity is the ratio

R = α/β = α/[max(|a|, |b|) − α]        (12-15)

We see that R is infinite for a perfect match and 0 when none of the corresponding
symbols in a and b match (α = 0 in this case). Because matching is done symbol by
symbol, the starting point on each boundary is important in terms of reducing the
amount of computation required to perform a match. Any method that normalizes
to, or near, the same starting point is helpful if it provides a computational advan-
tage over brute-force matching, which consists of starting at arbitrary points on each
string, then shifting one of the strings (with wraparound) and computing Eq. (12-15)
for each shift. The largest value of R gives the best match. (Refer to Section 11.2 for
examples of how the starting point of a curve can be normalized.)
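A brute-force Python sketch of Eqs. (12-14) and (12-15), with wraparound shifts of one string as described above (our illustration; starting-point normalization is not assumed):

def best_R(a, b):
    def R(a, b):
        alpha = sum(1 for x, y in zip(a, b) if x == y)        # position matches
        beta = max(len(a), len(b)) - alpha                    # Eq. (12-14)
        return float("inf") if beta == 0 else alpha / beta    # Eq. (12-15)
    # Shift b with wraparound and keep the largest R.
    return max(R(a, b[s:] + b[:s]) for s in range(len(b)))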

EXAMPLE 12.4 : String matching.


Figures 12.18(a) and (b) show sample boundaries from each of two object classes, which were approxi-
mated by a polygonal fit (see Section 11.2). Figures 12.18(c) and (d) show the polygonal approximations


a b
c d
e f
g
FIGURE 12.18 (a) and (b) sample boundaries of two different object classes; (c) and (d) their corresponding polygonal approximations; (e)–(g) tabulations of R. (Sze and Yang.)

(e) R between strings of class 1:
R     1.a   1.b   1.c   1.d   1.e   1.f
1.a
1.b  16.0
1.c   9.6  26.3
1.d   5.1   8.1  10.3
1.e   4.7   7.2  10.3  14.2
1.f   4.7   7.2  10.3   8.4  23.7

(f) R between strings of class 2:
R     2.a   2.b   2.c   2.d   2.e   2.f
2.a
2.b  33.5
2.c   4.8   5.8
2.d   3.6   4.2  19.3
2.e   2.8   3.3   9.2  18.3
2.f   2.6   3.0   7.7  13.5  27.0

(g) R between strings of class 1 and class 2:
R     1.a   1.b   1.c   1.d   1.e   1.f
2.a  1.24  1.50  1.32  1.47  1.55  1.48
2.b  1.18  1.43  1.32  1.47  1.55  1.48
2.c  1.02  1.18  1.19  1.32  1.39  1.48
2.d  1.02  1.18  1.19  1.32  1.29  1.40
2.e  0.93  1.07  1.08  1.19  1.24  1.25
2.f  0.89  1.02  1.02  1.24  1.22  1.18

corresponding to the boundaries in Figs. 12.18(a) and (b), respectively. Strings were formed from the
polygons by computing the interior angle, θ, between segments as each polygon was traversed clock-
wise. Angles were coded into one of eight possible symbols, corresponding to multiples of 45°; that is,
a1 : 0° < θ ≤ 45°; a2 : 45° < θ ≤ 90°; … ; a8 : 315° < θ ≤ 360°.
Figure 12.18(e) shows the results of computing the measure R for six samples of object 1 against
themselves. The entries are values of R and, for example, the notation 1.c refers to the third string from
object class 1. Figure 12.18(f) shows the results of comparing the strings of the second object class
against themselves. Finally, Fig. 12.18(g) shows the R values obtained by comparing strings of one class
against the other. These values of R are significantly smaller than any entry in the two preceding tabu-
lations. This indicates that the R measure achieved a high degree of discrimination between the two
classes of objects. For example, if the class of string 1.a had been unknown, the smallest value of R result-
ing from comparing this string against sample (prototype) strings of class 1 would have been 4.7 [see
Fig. 12.18(e)]. By contrast, the largest value in comparing it against strings of class 2 would have been
1.24 [see Fig. 12.18(g)]. This result would have led to the conclusion that string 1.a is a member of object
class 1. This approach to classification is analogous to the minimum-distance classifier introduced earlier.
