BAI151A Module 5 Textbook

9.1 PRELIMINARIES
Sets are shown as drawings of objects (e.g. squares and triangles) of arbitrary shape. A graphical image contains
sets that have been embedded into a background to form a rectangular array. When we intend for a drawing to
be interpreted as a digital image (or structuring element), we include a grid in illustrations that might otherwise
be ambiguous. Objects in all drawings are shaded, and the background is shown in white. When working with
actual binary images, we say that objects are foreground pixels. All other pixels are background.
FIGURE 9.1 Top row. Left: Objects represented as graphical sets. Center: Objects embedded in a background to form a graphical image. Right: Object and background digitized to form a digital image (note the grid). Second row: Example of a structuring element represented as a set, a graphical image, and finally as a digital SE.
The reflection of a set (structuring element) B about its origin, denoted B̂, is defined as

B̂ = {w | w = −b, for b ∈ B}    (9-1)

That is, if B is a set of points in 2-D, then B̂ is the set of points in B whose (x, y) coordinates have been replaced by (−x, −y). Reflection is the same operation we performed with kernels prior to spatial convolution, as explained in Section 3.4. Figure 9.2 shows several examples of digital sets (structuring elements) and their reflections. The dot denotes the origin of the SE. Note that reflection consists simply of rotating an SE by 180° about its origin, and that all elements, including the background and don't care elements, are rotated.
The translation of a set B by point z = (z1, z2), denoted (B)_z, is defined as

(B)_z = {c | c = b + z, for b ∈ B}    (9-2)

That is, if B is a set of pixels in 2-D, then (B)_z is the set of pixels in B whose (x, y) coordinates have been replaced by (x + z1, y + z2).
FIGURE 9.2 Structuring elements and their reflections about the origin (the ×'s are don't care elements, and the dots denote the origin). Reflection is rotation by 180° of an SE about its origin.
FIGURE 9.3 (a) A binary image, I, containing one object (set), A. (b) A structuring element, B. (c) Image resulting from a morphological operation (see text).
Translation is used to slide a structuring element over an image and, at each location, perform a set operation between the structuring element and the area of the image directly under it, as we explained in Fig. 3.28 for correlation and convolution. Both reflection and translation are defined with respect to the origin of B.
As an introduction to how morphological operations between images and struc-
turing elements are performed, consider Fig. 9.3, which shows a simple binary image, I,
consisting of an object (set) A, shown shaded, and a 3 × 3 SE whose elements are
all 1’s (foreground pixels). The background pixels (0’s) are shown in white. We are
interested in performing the following morphological operations: (1) form a new
image, of the same size as I, consisting only of background values initially, (2) trans-
late (slide) B over image I, and (3) at each increment of translation, if B is completely
contained in A, mark the location of the origin of B as a foreground pixel in the new
image; otherwise, leave it as a background point. Figure 9.3(c) is the result after the
origin of B has visited every element of I. We see that, when the origin of B is on
a border element of A, part of B ceases to be contained in A, thus eliminating that
location of the origin of B as a possible foreground point of the new image. The net
result is that the boundary of set A is eroded, as Fig. 9.3(c) shows. Because of the way
in which we defined the operation, the maximum excursion needed for B in I is when
the origin of B (which is at its center) is contained in A. With B being of size 3 × 3,
the narrowest background padding we needed was one pixel wide, as shown in Fig.
9.3(a). By using the smallest border needed for an operation, we keep the drawings smaller. In practice, we specify the width of padding based on the maximum dimensions of the structuring elements used, regardless of the operations being performed. The reason we generally specify the padding border to be of the same dimensions as B is that some morphological operations are defined for an entire structuring element, and cannot be interpreted with respect to the location of its origin.
When we use terminology such as "the structuring element B is contained in set A," we mean specifically that the foreground elements of B overlap only elements of A. This becomes an important issue when B also contains background and, possibly, don't care elements. Also, we use set A to denote all foreground pixels of I. Those foreground elements can be a single object, as in Fig. 9.3, or they can represent disjoint subsets of foreground elements, as in the first row of Fig. 9.1. We will discuss binary images and structuring elements from Sections 9.2 through 9.7. Then, in Section 9.8, we will extend the binary ideas to grayscale images and structuring elements.
EROSION
Morphological expressions are written in terms of structuring elements and a set, A, of foreground pixels, or in terms of structuring elements and an image, I, that contains A. We consider the former approach first. (Remember, set A can represent the union of multiple disjoint sets of foreground pixels, i.e., objects.) With A and B as sets in Z², the erosion of A by B, denoted A ⊖ B, is defined as

A ⊖ B = {z | (B)_z ⊆ A}    (9-3)

where A is a set of foreground pixels, B is a structuring element, and the z's are foreground values (1's). In words, this equation indicates that the erosion of A by B is the set of all points z such that B, translated by z, is contained in A. (Remember, displacement is defined with respect to the origin of B.) Equation (9-3) is the formulation that resulted in the foreground pixels of the image in Fig. 9.3(c).
As noted, we work with sets of foreground pixels embedded in a set of back-
ground pixels to form a complete image, I. Thus, inputs and outputs of our morpho-
logical procedures are images, not individual sets. We could make this fact explicit
by writing Eq. (9-3) as
I ⊖ B = {z | (B)_z ⊆ A and A ⊆ I} ∪ {A^c | A^c ⊆ I}    (9-4)

However, this notation is cumbersome, so we will continue to use the simpler form in Eq. (9-3), with the understanding that the inputs and outputs of our procedures are images.
Returning to our discussion of Eq. (9-3), because the statement that B has to be
contained in A is equivalent to B not sharing any common elements with the back-
ground (i.e., the set complement of A), we can express erosion equivalently as
A ⊖ B = {z | (B)_z ∩ A^c = ∅}    (9-5)
FIGURE 9.4 (a) Image I, consisting of a set (object) A and background. (b) Square SE, B (the dot is the origin). (c) Erosion of A by B (shown shaded in the resulting image). (d) Elongated SE. (e) Erosion of A by this element. (The erosion is a line.) The dotted border in (c) and (e) is the boundary of A, shown for reference.
DILATION
With A and B as sets in Z², the dilation of A by B, denoted A ⊕ B, is defined as

A ⊕ B = {z | (B̂)_z ∩ A ≠ ∅}    (9-6)
FIGURE 9.5 Using erosion to remove image components. (a) A 486 × 486 binary image of a wire-bond mask in which foreground pixels are shown in white. (b)–(d) Image eroded using square structuring elements of sizes 11 × 11, 15 × 15, and 45 × 45 elements, respectively, all valued 1.
This equation is based on reflecting B about its origin and translating the reflection
by z, as in erosion. The dilation of A by B then is the set of all displacements, z, such
that the foreground elements of B̂ overlap at least one element of A. (Remember,
z is the displacement of the origin of B̂.) Based on this interpretation, Eq. (9-6) can be written equivalently as

A ⊕ B = {z | [(B̂)_z ∩ A] ⊆ A}    (9-7)
Equations (9-6) and (9-7) are not the only definitions of dilation currently in use
(see Problems 9.14 and 9.15 for two different, yet equivalent, definitions). As with
erosion, the preceding definitions have the advantage of being more intuitive when
structuring element B is viewed as a convolution kernel. As noted earlier, the basic
process of flipping (rotating) B about its origin and then successively displacing it
so that it slides over set A is analogous to spatial convolution. However, keep in
mind that dilation is based on set operations and therefore is a nonlinear operation,
whereas convolution is a sum of products, which is a linear operation.
Unlike erosion, which is a shrinking or thinning operation, dilation “grows” or
“thickens” objects in a binary image. The manner and extent of this thickening is con-
trolled by the shape and size of the structuring element used. Figure 9.6(a) shows the
same object used in Fig. 9.4 (the background area is larger to accommodate the dila-
tion), and Fig. 9.6(b) shows a structuring element (in this case B̂ = B because the SE
is symmetric about its origin). The dashed line in Fig. 9.6(c) shows the boundary of
the original object for reference, and the solid line shows the limit beyond which any
further displacements of the origin of B̂ by z would cause the intersection of B̂ and
A to be empty. Therefore, all points on and inside this boundary constitute the dila-
tion of A by B. Figure 9.6(d) shows a structuring element designed to achieve more
dilation vertically than horizontally, and Fig. 9.6(e) shows the dilation achieved with
this element.
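Equation (9-6) can be sketched the same way as erosion: reflect B, translate it to every pixel, and mark the locations where the reflected SE overlaps at least one foreground pixel. As before, the function name and the test image are illustrative only:

import numpy as np

def dilate(I, B):
    """Binary dilation of I by B per Eq. (9-6): mark z wherever (B-hat)_z overlaps A."""
    Bh = B[::-1, ::-1]                   # reflection of B about its origin (180 deg rotation)
    m, n = Bh.shape
    pm, pn = m // 2, n // 2
    Ip = np.pad(I, ((pm, pm), (pn, pn)), constant_values=0)
    out = np.zeros_like(I)
    for r in range(I.shape[0]):
        for c in range(I.shape[1]):
            region = Ip[r:r + m, c:c + n]
            # Non-empty intersection of (B-hat)_z with A.
            out[r, c] = 1 if np.any(region[Bh == 1] == 1) else 0
    return out

if __name__ == "__main__":
    I = np.zeros((7, 7), dtype=np.uint8)
    I[3, 3] = 1                          # a single foreground pixel
    B = np.ones((3, 3), dtype=np.uint8)
    print(dilate(I, B))                  # the pixel grows into a 3x3 square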
FIGURE 9.6 (a) Image I, composed of set (object) A and background. (b) Square SE (the dot is the origin). (c) Dilation of A by B (shown shaded). (d) Elongated SE. (e) Dilation of A by this element. The dotted line in (c) and (e) is the boundary of A, shown for reference.
FIGURE 9.7 (a) Low-resolution text showing broken characters (see magnified view). (b) Structuring element: a 3 × 3 array of 1's. (c) Dilation of (a) by (b). Broken segments were joined.
DUALITY
Erosion and dilation are duals of each other with respect to set complementation
and reflection. That is,
(A ⊖ B)^c = A^c ⊕ B̂    (9-8)

and

(A ⊕ B)^c = A^c ⊖ B̂    (9-9)

To prove Eq. (9-8), we start with the definition of erosion in Eq. (9-3) and then use its equivalent form in Eq. (9-5):

(A ⊖ B)^c = {z | (B)_z ⊆ A}^c
          = {z | (B)_z ∩ A^c = ∅}^c

But the complement of the set of z's that satisfy (B)_z ∩ A^c = ∅ is the set of z's such that (B)_z ∩ A^c ≠ ∅. Therefore,

(A ⊖ B)^c = {z | (B)_z ∩ A^c ≠ ∅}
          = A^c ⊕ B̂

where the last step follows from the definition of dilation in Eq. (9-6) and its equivalent form in Eq. (9-7). This concludes the proof. A similar line of reasoning can be used to prove Eq. (9-9) (see Problem 9.16).
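The duality in Eq. (9-8) is easy to check numerically. The sketch below assumes the erode() and dilate() functions from the earlier sketches. Because those sketches pad with background values, the two sides can differ on the one-pixel frame next to the image border, so the comparison is restricted to the interior:

import numpy as np

# Assumes the erode() and dilate() sketches defined earlier in this section.
rng = np.random.default_rng(0)
I = (rng.random((12, 12)) > 0.5).astype(np.uint8)   # an arbitrary binary image
B = np.ones((3, 3), dtype=np.uint8)                  # symmetric SE, so B-hat = B

lhs = 1 - erode(I, B)                                # (A eroded by B), complemented
rhs = dilate(1 - I, B)                               # A complemented, then dilated by B-hat
print(np.array_equal(lhs[1:-1, 1:-1], rhs[1:-1, 1:-1]))   # True: Eq. (9-8) holds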
OPENING AND CLOSING
As you saw in the previous section, dilation expands the components of a set and erosion shrinks them. In this section, we discuss two other important morphological operations: opening and closing. Opening generally smoothes the contour of an object, breaks narrow isthmuses, and eliminates thin protrusions. Closing also tends to smooth sections of contours but, as opposed to opening, it generally fuses narrow breaks and long, thin gulfs, eliminates small holes, and fills gaps in the contour.
The opening of set A by structuring element B, denoted A ∘ B, is defined as

A ∘ B = (A ⊖ B) ⊕ B    (9-10)

Similarly, the closing of A by B, denoted A • B, is defined as

A • B = (A ⊕ B) ⊖ B    (9-11)

Opening has a simple geometric interpretation: think of rolling a "ball" (the structuring element) along the inside of the boundary of A. The opening is the union of all translations of B that fit entirely within A. That is,

A ∘ B = ∪ {(B)_z | (B)_z ⊆ A}    (9-12)

where ∪ denotes the union of the sets inside the braces. Closing has a similar interpretation, except that the ball is rolled on the outside of the boundary; the shape of the closing is determined by how far the ball can reach into the boundary.
FIGURE 9.8 (a) Image I, composed of set (object) A and background. (b) Structuring element, B. (c) Translations of B while being contained in A. (A is shown dark for clarity.) (d) Opening of A by B.
FIGURE 9.9 (a) Image I, composed of set (object) A and background. (b) Structuring element B. (c) Translations of B such that B does not overlap any part of A. (A is shown dark for clarity.) (d) Closing of A by B.

Analogously to Eq. (9-12), the closing can be expressed as

A • B = [∪ {(B)_z | (B)_z ∩ A = ∅}]^c    (9-13)
As with erosion and dilation, opening and closing are duals of each other with
respect to set complementation and reflection:
(A ∘ B)^c = (A^c • B̂)    (9-14)

and

(A • B)^c = (A^c ∘ B̂)    (9-15)
FIGURE 9.10 Morphological opening and closing. (a) Image I, composed of a set (object) A and background; a solid, circular structuring element is shown also. (The dot is the origin.) (b) Structuring element in various positions. (c)–(i) The morphological operations used to obtain the opening and closing: A ⊖ B, A ∘ B = (A ⊖ B) ⊕ B, A ⊕ B, and A • B = (A ⊕ B) ⊖ B.
The opening operation satisfies the following properties:
(a) A ∘ B is a subset of A.
(b) If C is a subset of D, then C ∘ B is a subset of D ∘ B.
(c) (A ∘ B) ∘ B = A ∘ B.
Similarly, the closing operation satisfies the following properties:
(a) A is a subset of A • B.
(b) If C is a subset of D, then C • B is a subset of D • B.
(c) (A • B) • B = A • B.
Note from condition (c) in both cases that multiple openings or closings of a set have no effect after the operation has been applied once.
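Equations (9-10) and (9-11), together with property (c), can be sketched directly from the erode()/dilate() functions given earlier. The test image below (ours, for illustration) has a thin protrusion that opening removes and a one-pixel hole that closing fills:

import numpy as np

# Opening and closing built from the erode()/dilate() sketches shown earlier.
def opening(I, B):
    return dilate(erode(I, B), B)        # Eq. (9-10): erosion followed by dilation

def closing(I, B):
    return erode(dilate(I, B), B)        # Eq. (9-11): dilation followed by erosion

if __name__ == "__main__":
    I = np.zeros((11, 11), dtype=np.uint8)
    I[2:9, 2:9] = 1                      # 7x7 square object
    I[5, 1:3] = 1                        # thin protrusion (removed by opening)
    I[5, 5] = 0                          # one-pixel hole (filled by closing)
    B = np.ones((3, 3), dtype=np.uint8)
    O = opening(I, B)
    C = closing(I, B)
    # Property (c): opening and closing are idempotent.
    print(np.array_equal(opening(O, B), O), np.array_equal(closing(C, B), C))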
THE HIT-OR-MISS TRANSFORM
The morphological hit-or-miss transform (HMT) is a basic tool for shape detection. Let I be a binary image composed of foreground (A) and background pixels. Unlike the morphological methods discussed thus far, the HMT utilizes two
FIGURE 9.11 (a) Noisy image. (b) Structuring element: a 3 × 3 array of 1's. (c) Eroded image. (d) Dilation of the erosion (opening of A). (e) Dilation of the opening. (f) Closing of the opening. (Original image courtesy of the National Institute of Standards and Technology.)
structuring elements: B1, for detecting shapes in the foreground, and B2, for detecting shapes in the background. The HMT of image I is defined as

I ⊛ B1,2 = {z | (B1)_z ⊆ A and (B2)_z ⊆ A^c}
         = (A ⊖ B1) ∩ (A^c ⊖ B2)    (9-16)

(With reference to the explanation of Eq. (9-4), we show the morphological HMT operation working directly on image I, to make it explicit that the structuring elements work on sets of foreground and background pixels simultaneously.) The second line follows from the definition of erosion in Eq. (9-3). In words, this equation says that the morphological HMT is the set of translations, z, of structuring elements B1 and B2 such that, simultaneously, B1 found a match in the foreground (i.e., B1 is contained in A) and B2 found a match in the background (i.e., B2 is contained in A^c). The word "simultaneous" implies that z is the same translation of both structuring elements. The word "miss" in the HMT arises from the fact that B2 finding a match in A^c is the same as B2 not finding (missing) a match in A.
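A minimal sketch of Eq. (9-16), reusing the erode() function from the erosion discussion (the structuring elements below are illustrative, not those of Fig. 9.12): B1 probes the foreground, B2 probes the background, and the HMT is the intersection of the two erosions.

import numpy as np

# Hit-or-miss transform per Eq. (9-16). Assumes the erode() sketch given earlier.
def hit_or_miss(I, B1, B2):
    A, Ac = I, 1 - I
    return erode(A, B1) & erode(Ac, B2)

if __name__ == "__main__":
    I = np.zeros((8, 8), dtype=np.uint8)
    I[2:5, 2:5] = 1                              # a 3x3 square object
    B1 = np.ones((3, 3), dtype=np.uint8)         # matches a solid 3x3 foreground patch
    B2 = np.zeros((5, 5), dtype=np.uint8)        # matches a background frame around it
    B2[0, :] = B2[-1, :] = B2[:, 0] = B2[:, -1] = 1
    print(np.argwhere(hit_or_miss(I, B1, B2)))   # [[3 3]]: the center of the square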
Figure 9.12 illustrates the concepts just introduced. Suppose that we want to find the location of the origin of object (set) D in image I. Here, A is the union of all object sets, so D is a subset of A. The need for two structuring elements, one capable of detecting the object itself and one capable of detecting its immediate background, is illustrated in the figure.
FIGURE 9.12 (a) Image consisting of a foreground (1's) equal to the union, A = C ∪ D ∪ E, of a set of objects, and a background of 0's. (b) Image I^c, with its foreground defined as A^c = C^c ∩ D^c ∩ E^c. (c) Structuring elements B1 and B2, designed to detect object D. (d) Erosion of A by B1. (e) Erosion of A^c by B2. (f) Intersection of (d) and (e), I ⊛ B1,2 = (A ⊖ B1) ∩ (A^c ⊖ B2), showing the location of the origin of D, as desired. The dots indicate the origins of their respective components. Each dot is a single pixel.
A single structuring element, B, containing both foreground and background elements (corresponding to B1 and B2, respectively) can be used instead. In that case, the HMT of I by B can be written as

I ⊛ B = {z | (B)_z ⊆ I}    (9-17)
The form is the same as Eq. (9-3), but now we test to see if ( B)z is a subset of image I,
which is composed of both foreground and background pixels. This formulation is
general, in the sense that B can be structured to detect any arrangement of pixels in
image I, as Figs. 9.13 and 9.14 will illustrate.
Figure 9.13 shows graphically the same solution as Fig. 9.12(f), but using the
single structuring element discussed in the previous paragraph. Figure 9.14 shows
several examples based on using Eq. (9-17). The first row shows the result of using
a small SE composed of both foreground (shaded) and background elements. This
SE is designed to detect one-pixel holes (i.e., one background pixel surrounded by a
connected border of foreground pixels) contained in image I. The SE in the second
row is capable of detecting the foreground corner pixel of the top, right corner of
the object in I. Using this SE in Eq. (9-17) yielded the image on the right. As you
can see, the correct pixel was identified. The last row of Fig. 9.14 is more interest-
ing, as it shows a structuring element composed of foreground, background, and
“don’t care” elements which, as mentioned earlier, we denote by ×’s. You can think
of the value of a don’t care element as always matching its corresponding pixel in
an image. In this example, when the SE is centered on the top, right corner pixel,
the don’t care elements in the top of the SE can be considered to be background,
and the don’t care elements on the bottom row as foreground, producing a correct
FIGURE 9.13 Same solution as in Fig. 9.12, but using Eq. (9-17) with a single structuring element.
match. When the SE is centered on the bottom, right corner pixel, the role of the
don’t care elements is reversed, again resulting in a correct match. The other border
pixels between the two corners were similarly detected by considering all don’t care
elements as foreground. Thus, using don’t care elements increases the flexibility of
structuring elements to perform multiple roles.
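With SciPy, Eq. (9-17) with a composite structuring element can be sketched by giving the required-foreground and required-background positions as two masks; positions marked in neither mask behave as don't care elements. The masks below reproduce the one-pixel-hole detector of the first row of Fig. 9.14 (the small image is our own example):

import numpy as np
from scipy import ndimage

# fg marks positions that must be 1 in the image, bg marks positions that must
# be 0; any position marked in neither mask acts as a don't care element.
fg = np.array([[1, 1, 1],
               [1, 0, 1],
               [1, 1, 1]], dtype=np.uint8)    # required foreground (the border)
bg = np.array([[0, 0, 0],
               [0, 1, 0],
               [0, 0, 0]], dtype=np.uint8)    # required background (the hole)

I = np.zeros((7, 7), dtype=np.uint8)
I[1:6, 1:6] = 1
I[3, 3] = 0                                   # a one-pixel hole

match = ndimage.binary_hit_or_miss(I, structure1=fg, structure2=bg)
print(np.argwhere(match))                     # [[3 3]]: the hole location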
With the preceding discussion as a foundation, we are now ready to consider some practical uses of morphology. When dealing with binary images, one of the principal applications of morphology is in extracting image components that are useful in the representation and description of shape.
FIGURE 9.14 Three examples of using a single structuring element and Eq. (9-17) to detect specific features. First row: detection of single-pixel holes. Second row: detection of an upper-right corner. Third row: detection of multiple features.
BOUNDARY EXTRACTION
The boundary of a set A of foreground pixels, denoted by β(A), can be obtained by first eroding A by a suitable structuring element B, and then performing the set difference between A and its erosion. That is,

β(A) = A − (A ⊖ B)    (9-18)
Figure 9.15 illustrates the mechanics of boundary extraction. It shows a simple binary
object, a structuring element B, and the result of using Eq. (9-18). The structuring
element in Fig. 9.15(b) is among the most frequently used, but it is not unique. For
example, using a 5 × 5 structuring element of 1’s would result in a boundary between
2 and 3 pixels thick. It is understood that the image in Fig. 9.15(a) was padded with
a border of background elements, and that the results were cropped back to the
original size after the morphological operations were completed.
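Equation (9-18) becomes a one-liner once an erosion routine is available. A sketch using SciPy's erosion (the test image is illustrative):

import numpy as np
from scipy import ndimage

# Boundary extraction per Eq. (9-18): beta(A) = A minus (A eroded by B).
A = np.zeros((9, 9), dtype=np.uint8)
A[2:7, 2:7] = 1

B = np.ones((3, 3), dtype=np.uint8)
eroded = ndimage.binary_erosion(A, structure=B).astype(np.uint8)
boundary = A - eroded                     # set difference A ∩ (A ⊖ B)^c
print(boundary)                           # a one-pixel-thick square outline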
HOLE FILLING
As mentioned in the discussion of Fig. 9.14, a hole may be defined as a background
region surrounded by a connected border of foreground pixels. In this section, we
develop an algorithm based on set dilation, complementation, and intersection for
FIGURE 9.15 (a) Set, A, of foreground pixels. (b) Structuring element, B. (c) A eroded by B. (d) Boundary of A, given by β(A) = A − (A ⊖ B).
FIGURE 9.16 (a) A binary image. (b) Result of using Eq. (9-18) with the structuring element in Fig. 9.15(b).
filling holes in an image. Let A denote a set whose elements are 8-connected bound-
aries, with each boundary enclosing a background region (i.e., a hole). Given a point
in each hole, the objective is to fill all the holes with foreground elements (1’s).
We begin by forming an array, X 0 , of 0’s (the same size as I, the image containing
A), except at locations in X 0 that correspond to pixels that are known to be holes,
which we set to 1. Then, the following procedure fills all the holes with 1’s:
(Remember, the dilation of image X by B is the dilation of the foreground elements of X by B.)

X_k = (X_{k−1} ⊕ B) ∩ I^c    k = 1, 2, 3, …    (9-19)

where B is the symmetric structuring element in Fig. 9.17(c). The algorithm terminates at iteration step k if X_k = X_{k−1}. Then, X_k contains all the filled holes. The set union of X_k and I contains all the filled holes and their boundaries.
The dilation in Eq. (9-19) would fill the entire area if left unchecked, but the
intersection at each step with I c limits the result to inside the region of interest. This is
our first example of how a morphological process can be conditioned to meet a desired
property. In the current application, the process is appropriately called conditional
dilation. The rest of Fig. 9.17 illustrates further the mechanics of Eq. (9-19). This exam-
ple only has one hole, but the concept applies to any finite number of holes, assuming
that a point inside each hole is given (we remove this requirement in Section 9.6).
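A sketch of the conditional dilation in Eq. (9-19). The function and variable names are ours; for the sketch, B defaults to a cross-shaped (plus-shaped) SE, standing in for the symmetric SE of Fig. 9.17(c), and `seeds` lists one known point inside each hole:

import numpy as np
from scipy import ndimage

def fill_holes(I, seeds, B=None):
    """Conditional dilation per Eq. (9-19), started from one seed per hole."""
    if B is None:
        B = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]], dtype=np.uint8)
    Ic = 1 - I
    X = np.zeros_like(I)
    for r, c in seeds:
        X[r, c] = 1
    while True:
        X_next = ndimage.binary_dilation(X, structure=B).astype(np.uint8) & Ic
        if np.array_equal(X_next, X):         # convergence: X_k == X_{k-1}
            break
        X = X_next
    return I | X                              # union of the filled holes with I

if __name__ == "__main__":
    I = np.zeros((9, 9), dtype=np.uint8)
    I[2:7, 2:7] = 1
    I[3:6, 3:6] = 0                           # a 3x3 hole bounded by foreground
    print(fill_holes(I, seeds=[(4, 4)]))      # the hole is filled with 1's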
FIGURE 9.17 Hole filling. (a) Set A (shown shaded) contained in image I. (b) Complement of I. (c) Structuring element B. Only the foreground elements are used in computations. (d) Initial point inside hole, set to 1. (e)–(h) Various steps of Eq. (9-19). (i) Final result [union of (a) and (h)].
FIGURE 9.18 (a) Binary image. The white dots inside the regions (shown enlarged for clarity) are the starting points for the hole-filling algorithm. (b) Result of filling all holes.
FIGURE 9.19 (a) Structuring element. (b) Image, I, containing a set A with one connected component. (c) Initial array containing a 1 in the region of the connected component. (d)–(g) Various steps in the iteration of Eq. (9-20).
CONNECTED COMPONENTS
Extraction of connected components from a binary image is central to many automated image-analysis applications. As in hole filling, we begin by forming an array, X_0, of 0's (the same size as I), except at one location known to lie in each connected component, which we set to 1 (foreground value). The objective is to start with X_0 and find all the connected components in I. The following iterative procedure accomplishes this:

X_k = (X_{k−1} ⊕ B) ∩ I    k = 1, 2, 3, …    (9-20)

where B is a 3 × 3 structuring element of 1's. The procedure terminates when X_k = X_{k−1}; X_k then contains the connected components marked in X_0.
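A sketch of Eq. (9-20) for a single seed point (names and the test image are illustrative; B defaults to a 3 × 3 SE of 1's, which corresponds to 8-connectivity):

import numpy as np
from scipy import ndimage

def extract_component(I, seed, B=None):
    """Iterate Eq. (9-20) from a single seed pixel until X_k stops changing.
    Returns the connected component of I that contains the seed."""
    B = np.ones((3, 3), dtype=np.uint8) if B is None else B
    X = np.zeros_like(I)
    X[seed] = 1
    while True:
        X_next = ndimage.binary_dilation(X, structure=B).astype(np.uint8) & I
        if np.array_equal(X_next, X):     # convergence: X_k == X_{k-1}
            break
        X = X_next
    return X

if __name__ == "__main__":
    I = np.zeros((8, 8), dtype=np.uint8)
    I[1:3, 1:3] = 1                       # component 1 (4 pixels)
    I[5:7, 4:7] = 1                       # component 2 (6 pixels)
    comp = extract_component(I, seed=(1, 1))
    print(comp.sum())                     # 4: only the seeded component is returned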
EXAMPLE 9.7 : Using connected components to detect foreign objects in packaged food.
Connected components are used frequently for automated inspection. Figure 9.20(a) shows an X-ray
image of a chicken breast that contains bone fragments. It is important to be able to detect such foreign
objects in processed foods before shipping. In this application, the density of the bones is such that their
nominal intensity values are significantly different from the background. This makes extraction of the
bones from the background a simple matter by using a single threshold (thresholding was introduced in Section 3.1, and we will discuss it in more detail in Section 10.3). The result is the binary image in Fig. 9.20(b).
The most significant feature in this figure is the fact that the points that remain after thresholding
are clustered into objects (bones), rather than being scattered. We can make sure that only objects of
FIGURE 9.20 (a) X-ray image of a chicken filet with bone fragments. (b) Thresholded image (shown as the negative for clarity). (c) Image eroded with a 5 × 5 SE of 1's. (d) Number of pixels in the connected components of (c). (Image (a) courtesy of NTB Elektronische Geraete GmbH, Diepholz, Germany, www.ntbxray.com.)

Connected component   No. of pixels in connected component
01                     11
02                      9
03                      9
04                     39
05                    133
06                      1
07                      1
08                    743
09                      7
10                     11
11                     11
12                      9
13                      9
14                    674
15                     85
“significant” size are contained in the binary image by eroding its foreground. In this example, we define
as significant any object that remains after erosion with a 5 × 5 SE of 1’s. Figure 9.20(c) shows the result
of erosion. The next step is to analyze the size of the objects that remain. We label (identify) these
objects by extracting the connected components in the image. The table in Fig. 9.20(d) lists the results
of the extraction. There are 15 connected components, with four of them being dominant in size. This is
enough evidence to conclude that significant, undesirable objects are contained in the original image. If
needed, further characterization (such as shape) is possible using the techniques discussed in Chapter 11.
CONVEX HULL
A set, S, of points in the Euclidean plane is said to be convex if and only if a straight
line segment joining any two points in S lies entirely within S. The convex hull, H,
of S is the smallest convex set containing S. The convex deficiency of S is defined as
the set difference H − S. Unlike the Euclidean plane, the digital image plane (see
Fig. 2.19) only allows points at discrete coordinates. Thus, the sets with which we
work are digital sets. The same concepts of convexity are applicable to digital sets,
but the definition of a convex digital set is slightly different. A digital set, A, is said
to be convex if and only if its Euclidean convex hull contains only digital points belonging to A. A simple morphological algorithm for approximating the convex hull follows. Let B^i, i = 1, 2, 3, 4, denote the four structuring elements in Fig. 9.21(a). The procedure consists of iteratively implementing the equation
X_k^i = (X_{k−1}^i ⊛ B^i) ∪ X_{k−1}^i    i = 1, 2, 3, 4 and k = 1, 2, 3, …    (9-21)

with X_0^i = I. When the procedure converges using the ith structuring element (i.e., when X_k^i = X_{k−1}^i), we let D^i = X_k^i. Then, the convex hull of A is the union of the four results:

C(A) = ∪_{i=1}^{4} D^i    (9-22)
Thus, the method consists of iteratively applying the hit-or-miss transform to I with
B1 until convergence, then letting D1 = X k1 , where k is the step at which convergence
occurred. The procedure is repeated with B2 (applied to I) until no further changes
occur, and so on. The union of the four resulting Di constitutes the convex hull of A.
The algorithm is initialized with k = 0 and X 0i = I every time that i (i.e., the structur-
ing element) changes.
Figure 9.21 illustrates the use of Eqs. (9-21) and (9-22). Figure 9.21(a) shows the
structuring elements used to extract the convex hull. The origin of each element is
at its center. As before, the × entries indicate “don’t care” elements. Recall that the
HMT is said to have found a match of structuring element Bi in a 3 × 3 region of
I, if all the elements of Bi find corresponding matches in that region. As noted ear-
lier, when computing a match, a “don’t care” element can be interpreted as always
matching the value of its corresponding element in the image. Note in Fig. 9.21(a)
that Bi is a clockwise rotation of Bi−1 by 90°.
Figure 9.21(b) shows a set A for which the convex hull is sought. As before, the
set is embedded in an array of background elements to form an image, I. Starting
with X_0^1 = I resulted in the set in Fig. 9.21(c) after five iterations of Eq. (9-21). Then, letting X_0^2 = I and again using Eq. (9-21) resulted in the set in Fig. 9.21(d) (con-
vergence was achieved in only two steps in this case). The next two results were
obtained in the same manner. Finally, forming the union of the sets in Figs. 9.21(c),
(d), (e), and (f) resulted in the convex hull in Fig. 9.21(g). The contribution of each
structuring element is highlighted in the composite set shown in Fig. 9.21(h).
One obvious shortcoming of the procedure just discussed is that the convex hull
can grow beyond the minimum dimensions required to guarantee convexity, thus
violating the definition of the convex hull. This, in fact, is what happened in this
case. One simple approach to reduce this growth is to place limits so that it does
not extend beyond the vertical and horizontal dimensions of set A. Imposing this
FIGURE 9.21 (a) Structuring elements B^1 through B^4 (the ×'s are don't care elements, and the origin of each element is at its center). (b) Set A, embedded in image I. (c)–(f) Results of convergence with the structuring elements shown in (a). (g) Convex hull, C(A). (h) Convex hull showing the contribution of each structuring element.
limitation on the example in Fig. 9.21 resulted in Fig. 9.22(a). Joining the boundary pixels of the reduced set with straight lines (remember, the pixels are the center points of the squares) shows that no set points lie outside these lines, indicating that the set is convex. By inspection, you can see that no points can be deleted from this set without losing convexity, so the reduced set is the convex hull of A.
Of course, the limits we used to produce Fig. 9.22 do not constitute a general
approach for obtaining the minimum convex set enclosing a set in question; it is
simply an easy-to-implement heuristic. The reason why the convex hull algorithm
did not yield a closer approximation of the actual convex hull is because of the
structuring elements used. The SEs in Fig. 9.21(a) “look” only in four orthogonal
directions. We could achieve greater accuracy by looking in additional directions,
such as the diagonals, for example. The price paid is increased algorithm complexity
and a higher computational load.
FIGURE 9.22 (a) Result of limiting growth of the convex hull algorithm. (b) Straight lines connecting the boundary points show that the new set is convex also.
THINNING
Thinning of a set A of foreground pixels by a structuring element B, denoted A ⊗ B, can be defined in terms of the hit-or-miss transform:

A ⊗ B = A − (A ⊛ B)
      = A ∩ (A ⊛ B)^c    (9-23)
where the second line follows from the definition of set difference given in Eq. (2-40).
A more useful expression for thinning A symmetrically is based on a sequence of
structuring elements:
{B} = {B1, B2, B3, …, Bn}    (9-24)

Thinning by this sequence of structuring elements is then defined as

A ⊗ {B} = ((…((A ⊗ B1) ⊗ B2) …) ⊗ Bn)    (9-25)
The process is to thin A by one pass with B1 , then thin the result with one pass of B2 ,
and so on, until A is thinned with one pass of Bn . The entire process is repeated until
no further changes occur after one complete pass through all structuring elements.
Each individual thinning pass is performed using Eq. (9-23).
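The following sketch implements Eqs. (9-23) and (9-25) with SciPy's hit-or-miss transform. For brevity it uses only the four 90° rotations of one commonly used thinning element, rather than the eight elements of Fig. 9.23(a); the hit mask lists positions that must be foreground, the miss mask positions that must be background, and everything else acts as a don't care element:

import numpy as np
from scipy import ndimage

# One commonly used thinning element and its 90-degree rotations (a simplified
# stand-in for the eight SEs of Fig. 9.23(a)).
hit0  = np.array([[0, 0, 0], [0, 1, 0], [1, 1, 1]], dtype=np.uint8)  # must be foreground
miss0 = np.array([[1, 1, 1], [0, 0, 0], [0, 0, 0]], dtype=np.uint8)  # must be background
pairs = [(np.rot90(hit0, k), np.rot90(miss0, k)) for k in range(4)]

def thin_once(A, pairs):
    # One complete pass of Eq. (9-25): thin sequentially with each element.
    for hit, miss in pairs:
        hm = ndimage.binary_hit_or_miss(A, structure1=hit, structure2=miss)
        A = A & ~hm                                   # Eq. (9-23): delete matched pixels
    return A

def thin(A):
    A = A.astype(bool)
    while True:
        A_next = thin_once(A, pairs)
        if np.array_equal(A_next, A):                 # converged: no further changes
            return A_next.astype(np.uint8)
        A = A_next

if __name__ == "__main__":
    A = np.zeros((9, 9), dtype=np.uint8)
    A[2:7, 1:8] = 1                                   # a thick horizontal bar
    print(thin(A))                                    # reduced to a roughly one-pixel-thick stroke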
(As before, we assume that the image containing A was padded to accommodate all excursions of B, and that the result was cropped. We show only A for simplicity.) Figure 9.23(a) shows a set of structuring elements used routinely for thinning (note that B i is equal to B i−1 rotated clockwise by 45°), and Fig. 9.23(b) shows a set A to be thinned, using the procedure just discussed. Figure 9.23(c) shows the result of thinning A with one pass of B1 to obtain A1. Figure 9.23(d) is the result of thinning A1 with B2, and Figs. 9.23(e) through (k) show the results of passes with the remaining structuring elements (there were no changes from A7 to A8 or from A9 to A11). Convergence was achieved after the second pass of B6. Figure 9.23(l) shows the thinned result. Finally, Fig. 9.23(m) shows the thinned set converted to m-connectivity (see Section 2.5 and Problem 9.29) to eliminate multiple paths.
THICKENING
Thickening is the morphological dual of thinning and is defined by the expression

A ⊙ B = A ∪ (A ⊛ B)    (9-26)
FIGURE 9.23 (a) Structuring elements B1 through B8 (the origin of each element is at its center). (b) Set A, contained in image I. (c) Result of thinning A with B1 (shaded). (d) Result of thinning A1 with B2. (e)–(i) Results of thinning with the next six SEs. (There was no change between A7 and A8.) (j)–(k) Result of using the first four elements again. (l) Result after convergence. (m) Result converted to m-connectivity.
As in thinning, thickening can be defined as a sequential operation:

A ⊙ {B} = ((…((A ⊙ B1) ⊙ B2) …) ⊙ Bn)    (9-27)
The structuring elements used for thickening have the same form as those shown
in Fig. 9.23(a), but with all 1’s and 0’s interchanged. However, a separate algorithm
for thickening is seldom used in practice. Instead, the usual procedure is to thin the
background of the set in question, then complement the result. In other words, to
thicken a set A we form Ac , thin Ac , and then complement the thinned set to obtain
the thickening of A. Figure 9.24 illustrates this procedure. As before, we show only
set A and image I, and not the padded version of I.
Depending on the structure of A, this procedure can result in disconnected points,
as Fig. 9.24(d) shows. Hence thickening by this method usually is followed by post-
processing to remove disconnected points. Note from Fig. 9.24(c) that the thinned
background forms a boundary for the thickening process. This useful feature is not
present in the direct implementation of thickening using Eq. (9-27), and it is one of
the principal reasons for using background thinning to accomplish thickening.
FIGURE 9.24 (a) Set A, contained in image I. (b) Complement of A. (c) Result of thinning the complement. (d) Thickened set obtained by complementing (c). (e) Final result, with no disconnected points.
SKELETONS
The skeleton of A can be expressed in terms of erosions and openings. That is, it can
be shown (Serra [1982]) that
S(A) = ∪_{k=0}^{K} S_k(A)    (9-28)

with

S_k(A) = (A ⊖ kB) − (A ⊖ kB) ∘ B    (9-29)

where B is a structuring element and (A ⊖ kB) denotes k successive erosions of A by B:

(A ⊖ kB) = ((…((A ⊖ B) ⊖ B) ⊖ …) ⊖ B    (9-30)

performed k times. K in Eq. (9-28) is the last iterative step before A erodes to an empty set. In other words,

K = max{k | (A ⊖ kB) ≠ ∅}    (9-31)

The formulation in Eqs. (9-28) and (9-29) indicates that S(A) can be obtained as the union of the skeleton subsets S_k(A), k = 0, 1, 2, …, K.
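A sketch of Eqs. (9-28) through (9-31), with the reconstruction of Eq. (9-32) used as a check (the function name and the test set are ours; B is a 3 × 3 SE of 1's):

import numpy as np
from scipy import ndimage

def skeleton(A, B=None):
    """Morphological skeleton via Eqs. (9-28)-(9-31). Returns S(A) and the list
    of skeleton subsets S_k(A), which are needed for reconstruction (Eq. 9-32)."""
    B = np.ones((3, 3), dtype=np.uint8) if B is None else B
    A = A.astype(bool)
    S, subsets = np.zeros_like(A), []
    eroded = A.copy()                      # A eroded k times (k = 0 initially)
    while eroded.any():                    # stop when A erodes to the empty set
        opened = ndimage.binary_opening(eroded, structure=B)
        Sk = eroded & ~opened              # Eq. (9-29)
        subsets.append(Sk)
        S |= Sk                            # Eq. (9-28): union of the subsets
        eroded = ndimage.binary_erosion(eroded, structure=B)
    return S.astype(np.uint8), subsets

if __name__ == "__main__":
    A = np.zeros((12, 12), dtype=np.uint8)
    A[2:10, 3:9] = 1
    S, subsets = skeleton(A)
    # Reconstruction per Eq. (9-32): dilate each S_k(A) k times and take the union.
    R = np.zeros_like(A, dtype=bool)
    for k, Sk in enumerate(subsets):
        D = Sk
        for _ in range(k):
            D = ndimage.binary_dilation(D, structure=np.ones((3, 3)))
        R |= D
    print(np.array_equal(R, A.astype(bool)))   # True: A is recovered exactly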
FIGURE 9.25 (a) Set A. (b) Various positions of maximum disks whose centers partially define the skeleton of A. (c) Another maximum disk, whose center defines a different segment of the skeleton of A. (d) Complete skeleton (dashed).

It can be shown (Serra [1982]) that A can be reconstructed from these subsets:

A = ∪_{k=0}^{K} (S_k(A) ⊕ kB)    (9-32)

where (S_k(A) ⊕ kB) denotes k successive dilations of S_k(A); that is,

(S_k(A) ⊕ kB) = ((…((S_k(A) ⊕ B) ⊕ B) ⊕ …) ⊕ B    (9-33)
FIGURE 9.26 Implementation of Eqs. (9-28) through (9-33). The original set is at the top left, and its morphological skeleton is at the bottom of the fourth column. The reconstructed set is at the bottom of the sixth column. The columns show, from left to right, A ⊖ kB, (A ⊖ kB) ∘ B, S_k(A), the running union of the S_k(A), S_k(A) ⊕ kB, and the running union of the S_k(A) ⊕ kB.

The entries in the fifth and sixth columns deal with reconstructing the original set from its skeleton subsets. The fifth column shows the dilations of S_k(A); that is, S_0(A), S_1(A) ⊕ B, and S_2(A) ⊕ 2B = (S_2(A) ⊕ B) ⊕ B. Finally, the last column shows reconstruction of set A which, according to Eq. (9-32), is the union of the dilated skeleton subsets shown in the fifth column.
PRUNING
Pruning methods are an essential complement to thinning and skeletonizing algo-
rithms, because these procedures tend to leave spurs (“parasitic” components) that
need to be “cleaned up” by postprocessing. We begin the discussion with a pruning
problem, then develop a solution based on the material introduced in the preceding
sections. Thus, we take this opportunity to illustrate how to solve a problem by com-
bining several of the morphological techniques discussed up to this point.
A common approach in the automated recognition of hand-printed characters is
to analyze the shape of the skeleton of a character. These skeletons often contain
spurs, caused during erosion by noise and non-uniformities in the character strokes.
Pruning a parasitic branch amounts to successively removing its end point. This can be done by thinning A with a sequence of structuring elements designed to detect only end points. That is, let

X_1 = A ⊗ {B}    (9-34)
where {B} denotes the structuring element sequence in Fig. 9.27(b) [see Eq. (9-24)
regarding structuring-element sequences]. The sequence of structuring elements
consists of two different structures, each of which is rotated 90° for a total of eight
elements. The × in Fig. 9.27(b) signifies a “don’t care” condition, as defined earlier.
(Note that each SE is a detector for an end point in a particular orientation.)
FIGURE 9.27 (a) Set A of foreground pixels (shaded). (b) SEs (B1 through B8) used for deleting end points. (c) Result of three cycles of thinning. (d) End points of (c). (e) Dilation of end points conditioned on (a). (f) Pruned image.
Applying Eq. (9-34) to A three times yielded the set X 1 in Fig. 9.27(c). The next
step is to “restore” the character to its original form, but with the parasitic branches
removed. This requires that we first form a set X 2 containing all end points in X 1
[Fig. 9.27(d)]:
X_2 = ∪_{k=1}^{8} (X_1 ⊛ B^k)    (9-35)
where the B k are the end-point detectors in Fig. 9.27(b). The next step is dilation
of the end points. Typically, the number of dilations is less than the number of end-
point removals to reduce the probability of “growing” back some of the spurs. In
this case, we know by inspection that no new spurs are created, so we dilate the end
points three times using A as a delimiter. This is the same number of thinning passes:
X_3 = (X_2 ⊕ H) ∩ A    (9-36)

where H is a 3 × 3 structuring element of 1's. The final pruned result is the union of X_1 and X_3:

X_4 = X_1 ∪ X_3    (9-37)
11.1 BACKGROUND
Therefore, area is an invariant feature descriptor with respect to the given family of
transformations. However, if we add the affine transformation scaling to the fam-
ily, descriptor area ceases to be invariant with respect to the extended family. The
descriptor is now covariant with respect to the family, because scaling the area of the
region by any factor scales the value of the descriptor by the same factor. Similarly,
the descriptor direction (of the principal axis of the region) is covariant because
rotating the region by any angle has the same effect on the value of the descriptor.
Most of the feature descriptors we use in this chapter are covariant in general, in
the sense that they may be invariant to some transformations of interest, but not to
others that may be equally as important. As you will see shortly, it is good practice to
normalize as many relevant invariances as possible out of covariances. For instance,
we can compensate for changes in direction of a region by computing its actual
direction and rotating the region so that its principal axis points in a predefined
direction. If we do this for every region detected in an image, rotation will cease to
be covariant.
Another major classification of features is local vs. global. You are likely to see
many different attempts to classify features as belonging to one of these two catego-
ries. What makes this difficult is that a feature may belong to both, depending on the
application. For example, consider the descriptor area again, and suppose that we
are applying it to the task of inspecting the degree to which bottles moving past an
imaging sensor on a production line are full of liquid. The sensor and its accompany-
ing software are capable of generating images of ten bottles at once, in which liquid
in each bottle appears as a bright region, and the rest of the image appears as dark
background. The area of a region in this fixed geometry is directly proportional to
the amount of liquid in a bottle and, if detected and measured reliably, area is the
only feature we need to solve the inspection problem. Each image has ten regions, so
we consider area to be a local feature, in the sense that it is applicable to individual
elements (regions) of an image. If the problem were to detect the total amount (area)
of liquid in an image, we would now consider area to be a global descriptor. But the
story does not end there. Suppose that the liquid inspection task is redefined so that
it calculates the entire amount of liquid per day passing by the imaging station. We
no longer care about the area of individual regions per se. Our units now are images.
If we know the total area in an image, and we know the number of images, calculat-
ing the total amount of liquid in a day is trivial. Now the area of an entire image is a
local feature, and the area of the total at the end of the day is global. Obviously, we
could redefine the task so that the area at the end of a day becomes a local feature
descriptor, and the area for all assembly lines becomes a global measure. And so on,
endlessly. In this chapter, we call a feature local if it applies to a member of a set,
and global if it applies to the entire set, where “member” and “set” are determined
by the application.
Features by themselves are seldom generated for human consumption, except in
applications such as interactive image processing, topics that are not in the main-
stream of this book. In fact, as you will see later, some feature extraction meth-
ods generate tens, hundreds, or even thousands of descriptor values that would
appear meaningless if examined visually. Instead, feature description typically is
used as a preprocessing step for higher-level tasks, such as image registration, object
recognition for automated inspection, searching for patterns (e.g., individual faces
and/or fingerprints) in image databases, and autonomous applications, such as robot
and vehicle navigation. For these applications, numerical features usually are “pack-
aged” in the form of a feature vector, (i.e., a 1 × n or n × 1 matrix) whose elements are
the descriptors. An RGB image is one of the simplest examples. As you know from
Chapter 6, each pixel of an RGB image can be expressed as a 3-D vector,

x = [x1  x2  x3]^T
in which x1 is the intensity value of the red image at a point, and the other com-
ponents are the intensity values of the green and blue images at the same point. If
color is used as a feature, then a region in an RGB image would be represented as
a set of feature vectors (points) in 3-D space. When n descriptors are used, feature
vectors become n-dimensional, and the space containing them is referred to as an
n-dimensional feature space. You may “visualize” a set of n-dimensional feature vec-
tors as a “hypercloud” of points in n-dimensional Euclidean space.
In this chapter, we group features into three principal categories: boundary,
region, and whole image features. This subdivision is not based on the applicabil-
ity of the methods we are about to discuss; rather, it is based on the fact that some
categories make more sense than others when considered in the context of what is
being described. For example, it is implied that when we refer to the “length of a
boundary” we are referring to the “length of the boundary of a region,” but it makes
no sense to refer to the “length” of an image. It will become clear that many of the
features we will be discussing are applicable to boundaries and regions, and some
apply to whole images as well.
The segmentation techniques discussed in the previous two chapters yield raw data
in the form of pixels along a boundary or pixels contained in a region. It is standard
practice to use schemes that compact the segmented data into representations that
facilitate the computation of descriptors. In this section, we discuss various bound-
ary preprocessing approaches suitable for this purpose.
BOUNDARY FOLLOWING (TRACING)
Several of the algorithms discussed later in this chapter require that the points in the boundary of a region be ordered in a clockwise or counterclockwise direction. The boundary-following algorithm given below assumes a binary image in which object and background points are labeled 1 and 0, respectively.

FIGURE 11.1 Illustration of the first few steps in the boundary-following algorithm. The point to be processed next is labeled in bold, black; the points yet to be processed are gray; and the points found by the algorithm are shaded. Squares without labels are considered background (0) values.
1. Let the starting point, b0, be the uppermost-leftmost point† in the image that is labeled 1. Denote by c0 the west neighbor of b0 [see Fig. 11.1(b)]. Clearly, c0 is always a background point. Examine the 8-neighbors of b0, starting at c0 and proceeding in a clockwise direction. (See Section 2.5 for the definition of 4-neighbors, 8-neighbors, and m-neighbors of a point.) Let b1 denote the first neighbor encountered whose value is 1, and let c1 be the (background) point immediately preceding b1 in the sequence. Store the location of b0 for use in Step 5.
2. Let b = b0 and c = c0 .
3. Let the 8-neighbors of b, starting at c and proceeding in a clockwise direction,
be denoted by n1 , n2 , … , n8 . Find the first neighbor labeled 1 and denote it by nk .
4. Let b = nk and c = nk –1 .
5. Repeat Steps 3 and 4 until b = b0 . The sequence of b points found when the
algorithm stops is the set of ordered boundary points.
Note that c in Step 4 is always a background point because nk is the first 1-valued
point found in the clockwise scan. This algorithm is referred to as the Moore bound-
ary tracing algorithm after Edward F. Moore, a pioneer in cellular automata theory.
Figure 11.1 illustrates the first few steps of the algorithm. It is easily verified (see
Problem 11.1) that continuing with this procedure will yield the correct boundary,
shown in Fig. 11.1(f), whose points are ordered in a clockwise sequence. The algo-
rithm works equally well with more complex boundaries, such as the boundary with
an attached branch in Fig. 11.2(a) or the self-intersecting boundary in Fig. 11.2(b).
Multiple boundaries [Fig. 11.2(c)] are handled by processing one boundary at a time.
If we start with a binary region instead of a boundary, the algorithm extracts the
outer boundary of the region. Typically, the resulting boundary will be one pixel
thick, but not always [see Problem 11.1(b)]. If the objective is to find the boundaries
of holes in a region (these are called the inner or interior boundaries of the region),
† As you will see later in this chapter and in Problem 11.8, the uppermost-leftmost point in a 1-valued boundary
has the important property that a polygonal approximation to the boundary has a convex vertex at that location.
Also, the left and north neighbors of the point are guaranteed to be background points. These properties make
it a good “standard” point at which to start boundary-following algorithms.
FIGURE 11.2 Examples of boundaries that can be processed by the boundary-following algorithm. (a) Closed boundary with a branch. (b) Self-intersecting boundary. (c) Multiple boundaries (processed one at a time).
a straightforward approach is to extract the holes (see Section 9.6) and treat them
as 1-valued regions on a background of 0’s. Applying the boundary-following algo-
rithm to these regions will yield the inner boundaries of the original region.
We could have stated the algorithm just as easily based on following a boundary
in the counterclockwise direction but you will find it easier to have just one algo-
rithm and then reverse the order of the result to obtain a sequence in the opposite
direction. We use both directions interchangeably (but consistently) in the following
sections to help you become familiar with both approaches.
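The numbered steps above translate almost line for line into code. The sketch below (function name and test image are ours) follows the clockwise convention of the algorithm and, like the text, stops when b returns to b0; it assumes a single object that does not touch the image border:

import numpy as np

def moore_boundary(I):
    """Moore boundary tracing of the single 1-valued object in binary image I,
    following the numbered steps above (clockwise scan of the 8-neighbors)."""
    # Clockwise 8-neighborhood offsets, starting from the west neighbor.
    offs = [(0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1)]
    # Step 1: uppermost-leftmost foreground point; its west neighbor is c0.
    b0 = tuple(min(zip(*np.nonzero(I))))
    c0 = (b0[0], b0[1] - 1)
    boundary, b, c = [b0], b0, c0
    while True:
        # Steps 3-4: scan the 8-neighbors of b clockwise, starting at c.
        start = offs.index((c[0] - b[0], c[1] - b[1]))
        for i in range(1, 9):
            off = offs[(start + i) % 8]
            nb = (b[0] + off[0], b[1] + off[1])
            if I[nb] == 1:
                prev_off = offs[(start + i - 1) % 8]
                c = (b[0] + prev_off[0], b[1] + prev_off[1])
                b = nb
                break
        if b == b0:                               # Step 5: stop when back at b0
            return boundary
        boundary.append(b)

if __name__ == "__main__":
    I = np.zeros((6, 6), dtype=np.uint8)
    I[1:4, 1:5] = 1                               # a 3x4 rectangular object
    print(moore_boundary(I))                      # clockwise list of boundary points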
CHAIN CODES
Chain codes are used to represent a boundary by a connected sequence of straight-
line segments of specified length and direction. We assume in this section that all
curves are closed, simple curves (i.e., curves that are closed and not self intersecting).
A chain code is generated by following a boundary in, say, a clockwise direction and assigning a direction number to the segment connecting every pair of consecutive pixels, using one of the numbering schemes in Fig. 11.3. A code formed in this way is referred to as a Freeman chain code. A chain code generated directly from pixel-level boundary points is usually long and is sensitive to small disturbances along the boundary, so it is common practice first to resample the boundary on a coarser grid and then code the resampled points, as Fig. 11.4 illustrates.
FIGURE 11.3 Direction numbers for (a) 4-directional chain code, and (b) 8-directional chain code.
For the same reason mentioned when discussing boundary tracing earlier
in this section, we chose the starting point in Fig. 11.4(c) as the uppermost-leftmost
point of the boundary, which gives the chain code 0766…1212. As you might suspect,
the spacing of the resampling grid is determined by the application in which the
chain code is used.
If the sampling grid used to obtain a connected digital curve is a uniform quad-
rilateral (see Fig. 2.19) all points of a Freeman code based on Fig. 11.3 are guaran-
teed to coincide with the points of the curve. The same is true if a digital curve is
subsampled using the same type of sampling grid, as in Fig. 11.4(b). This is because
the samples of curves produced using such grids have the same arrangement as in
Fig. 11.3, so all points are reachable as we traverse a curve from one point to the next
to generate the code.
The numerical value of a chain code depends on the starting point. However, the
code can be normalized with respect to the starting point by a straightforward pro-
cedure: We simply treat the chain code as a circular sequence of direction numbers
and redefine the starting point so that the resulting sequence of numbers forms an
integer of minimum magnitude. We can normalize also for rotation (in angles that
are integer multiples of the directions in Fig. 11.3) by using the first difference of the
chain code instead of the code itself. This difference is obtained by counting the num-
ber of direction changes (in a counterclockwise direction in Fig. 11.3) that separate
two adjacent elements of the code. If we treat the code as a circular sequence to nor-
malize it with respect to the starting point, then the first element of the difference is
computed by using the transition between the last and first components of the chain.
FIGURE 11.4 (a) Digital boundary with resampling grid superimposed. (b) Result of resampling. (c) 8-directional chain-coded boundary.
For instance, the first difference of the 4-directional chain code 10103322 is 3133030.
Size normalization can be achieved by altering the spacing of the resampling grid.
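The starting-point and rotation normalizations just described are easy to express in code. In the sketch below (function names are ours), the first difference is computed modulo the number of directions, and the wrap-around transition is included only when the code is treated as circular:

# First difference and starting-point normalization of a Freeman chain code.
# n_dirs is 4 for Fig. 11.3(a) and 8 for Fig. 11.3(b).
def first_difference(code, n_dirs=4, circular=False):
    pairs = zip(code, code[1:] + (code[:1] if circular else []))
    # Number of counterclockwise direction changes between adjacent elements.
    return [(b - a) % n_dirs for a, b in pairs]

def normalize_start(code):
    # Treat the code as circular and pick the rotation forming the smallest integer.
    rotations = [code[i:] + code[:i] for i in range(len(code))]
    return min(rotations)

if __name__ == "__main__":
    code = [1, 0, 1, 0, 3, 3, 2, 2]
    print(first_difference(code, n_dirs=4))   # [3, 1, 3, 3, 0, 3, 0], as in the text
    print(normalize_start(code))              # [0, 1, 0, 3, 3, 2, 2, 1]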
The normalizations just discussed are exact only if the boundaries themselves
are invariant to rotation (again, in angles that are integer multiples of the directions
in Fig. 11.3) and scale change, which seldom is the case in practice. For instance,
the same object digitized in two different orientations will have different bound-
ary shapes in general, with the degree of dissimilarity being proportional to image
resolution. This effect can be reduced by selecting chain elements that are long in
proportion to the distance between pixels in the digitized image, and/or by orienting
the resampling grid along the principal axes of the object to be coded, as discussed
in Section 11.3, or along its eigen axes, as discussed in Section 11.5.
As an example, the subsampled boundary in Fig. 11.5(f) can be represented by the 8-directional Freeman chain code

00006066666666444444242222202202
The starting point of the boundary is at coordinates (2, 5) in the subsampled grid (remember from
Fig. 2.19 that the origin of an image is at its top, left). This is the uppermost-leftmost point in Fig. 11.5(f).
The integer of minimum magnitude of the code happens in this case to be the same as the chain code:
00006066666666444444242222202202
The first difference of the code is

00062600000006000006260000620626
Using this code to represent the boundary results in a significant reduction in the amount of data
needed to store the boundary. In addition, working with code numbers offers a unified way to analyze
the shape of a boundary, as we discuss in Section 11.3. Finally, keep in mind that the subsampled bound-
ary can be recovered from any of the preceding codes.
FIGURE 11.5 (a) Noisy image of size 570 × 570 pixels. (b) Image smoothed with a 9 × 9 box kernel. (c) Smoothed
image, thresholded using Otsu’s method. (d) Longest outer boundary of (c). (e) Subsampled boundary (the points
are shown enlarged for clarity). (f) Connected points from (e).
Figure 11.6 illustrates how an SCC is generated. The first step is to select the
length of the line segment to use in generating the code [see Fig. 11.6(b)]. Next, a
starting point (the origin) is specified (for an open curve, the logical starting point is
one of its end points). As Fig. 11.6(c) shows, once the origin has been selected, one
end of a line segment is placed at the origin and the other end of the segment is set
to coincide with the curve. This point becomes the starting point of the next line seg-
ment, and we repeat this procedure until the starting point (or end point in the case
of an open curve) is reached. As the figure illustrates, you can think of this process as
a sequence of identical circles (with radius equal to the length of the line segment)
traversing the curve. The intersections of the circles and the curve determine the
nodes of the straight-line approximation to the curve.
Once the intersections of the circles are known, we determine the slope changes
between contiguous line segments. Positive and zero slope changes are normalized
to the open half interval [0, 1), while negative slope changes are normalized to the
open interval (−1, 0). Not allowing slope changes of ±1 eliminates the implementa-
tion issues that result from having to deal with the fact that such changes result in
the same line segment with opposite directions.
The sequence of slope changes is the chain that defines the SCC approximation
to the original curve. For example, the code for the curve in Fig. 11.6(e) is 0.12, 0.20,
0.21, 0.11, −0.11, −0.12, −0.21, −0.22, −0.24, −0.28, −0.28, −0.31, −0.30. The accu-
racy of the slope changes defined in Fig. 11.6(d) is 10^−2, resulting in an "alphabet" of 199 possible symbols (slope changes). The accuracy can be changed, of course. For instance, an accuracy of 10^−1 produces an alphabet of 19 symbols (see Problem 11.6).
Unlike a Freeman code, there is no guarantee that the last point of the coded curve
will coincide with the last point of the curve itself. However, shortening the line
FIGURE 11.6 (a) An open curve. (b) A straight-line segment. (c) Traversing the curve using circumferences to determine slope changes; the dot is the origin (starting point). (d) Range of slope changes in the open interval (−1, 1) (the arrow in the center of the chart indicates direction of travel). There can be ten subintervals between the slope numbers shown. (e) Resulting coded curve showing its corresponding numerical sequence of slope changes. (Courtesy of Professor Ernesto Bribiesca, IIMAS-UNAM, Mexico.)
length and/or increasing angle resolution often resolves the problem, because the
results of computations are rounded to the nearest integer (remember we work with
integer coordinates).
The inverse of an SCC is another chain of the same length, obtained by reversing
the order of the symbols and their signs. The mirror image of a chain is obtained by
starting at the origin and reversing the signs of the symbols. Finally, we point out
that the preceding discussion is directly applicable to closed curves. Curve following
would start at an arbitrary point (for example, the uppermost-leftmost point of the
curve) and proceed in a clockwise or counterclockwise direction, stopping when the
starting point is reached. We will illustrate a use of SCCs in Example 11.6.
BOUNDARY APPROXIMATIONS USING MINIMUM-PERIMETER POLYGONS
A digital boundary can be approximated with arbitrary accuracy by a polygon. The goal of a minimum-perimeter polygon (MPP) approximation is to capture the essence of the boundary shape with the fewest possible polygonal segments.
Foundation
An intuitive approach for computing MPPs is to enclose a boundary [see Fig. 11.7(a)]
by a set of concatenated cells, as in Fig. 11.7(b). Think of the boundary as a rubber
band contained in the gray cells in Fig. 11.7(b). As it is allowed to shrink, the rubber
band will be constrained by the vertices of the inner and outer walls of the region
of the gray cells. Ultimately, this shrinking produces the shape of a polygon of mini-
mum perimeter (with respect to this geometrical arrangement) that circumscribes
the region enclosed by the cell strip, as in Fig. 11.7(c). Note in this figure that all the
vertices of the MPP coincide with corners of either the inner or the outer wall.
The size of the cells determines the accuracy of the polygonal approximation.
In the limit, if the size of each (square) cell corresponds to a pixel in the boundary,
the maximum error in each cell between the boundary and the MPP approxima-
tion would be √2 d, where d is the minimum possible distance between pixels (i.e.,
the distance between pixels established by the resolution of the original sampled
boundary). This error can be reduced in half by forcing each cell in the polygonal
approximation to be centered on its corresponding pixel in the original boundary.
The objective is to use the largest possible cell size acceptable in a given application,
thus producing MPPs with the fewest number of vertices. Our objective in this sec-
tion is to formulate a procedure for finding these MPP vertices.
The cellular approach just described reduces the shape of the object enclosed
by the original boundary, to the area circumscribed by the gray walls in Fig. 11.7(b).
FIGURE 11.7 (a) An object boundary. (b) Boundary enclosed by cells (shaded). (c) Minimum-perimeter polygon obtained by allowing the boundary to shrink. The vertices of the polygon are created by the corners of the inner and outer walls of the gray region.
Figure 11.8(a) shows this shape in dark gray. Suppose that we traverse the boundary of the dark gray region in a counterclockwise direction. Every turn encountered in the traversal will be either a convex or a concave vertex (the angle of a vertex is defined as an interior angle of the boundary at that vertex). A convex vertex is the center point of a triplet of points that define an angle in the range 0° < θ < 180°; similarly, angles of a concave vertex are in the range 180° < θ < 360°. An angle of 180° defines a degenerate vertex (i.e., a segment of a straight line), which cannot be an MPP vertex. Convex and concave vertices are shown, respectively, as white and blue dots in Fig. 11.8(b). Note that these vertices are the vertices of the inner wall of the light-gray bounding region in Fig. 11.8(b), and that every concave (blue) vertex in the dark gray region has a corresponding concave "mirror" vertex in the light gray wall, located diagonally opposite the vertex. Figure 11.8(c) shows the mirrors of all the concave vertices, with the MPP from Fig. 11.7(c) superimposed for reference. We see that the vertices of the MPP coincide either with convex vertices in the inner wall (white dots) or with the mirrors of the concave vertices (blue dots) in the outer wall. Only convex vertices of the inner wall and concave vertices of the outer wall can be vertices of the MPP. Thus, our algorithm needs to focus attention only on those vertices.
MPP Algorithm
The set of cells enclosing a digital boundary [e.g., the gray cells in Fig. 11.7(b)] is
called a cellular complex. We assume the cellular complexes to be simply connected,
in the sense that the boundaries they enclose are not self-intersecting. Based on this
assumption, and letting white (W) denote convex vertices, and blue (B) denote mir-
rored concave vertices, we state the following observations:
a b c
FIGURE 11.8 (a) Region (dark gray) resulting from enclosing the original boundary by cells (see Fig. 11.7). (b) Convex
(white dots) and concave (blue dots) vertices obtained by following the boundary of the dark gray region in the
counterclockwise direction. (c) Concave vertices (blue dots) displaced to their diagonal mirror locations in the
outer wall of the bounding region; the convex vertices are not changed. The MPP (solid boundary) is superimposed
for reference.
1. The MPP bounded by a simply connected cellular complex is not self-intersecting.
2. Every convex vertex of the MPP is a W vertex, but not every W vertex of a boundary is a vertex of the MPP.
3. Every mirrored concave vertex of the MPP is a B vertex, but not every B vertex
of a boundary is a vertex of the MPP.
4. All B vertices are on or outside the MPP, and all W vertices are on or inside the
MPP.
5. The uppermost-leftmost vertex in a sequence of vertices contained in a cellular
complex is always a W vertex of the MPP (see Problem 11.8).
These assertions can be proved formally (Sklansky et al. [1972], Sloboda et al. [1998],
and Klette and Rosenfeld [2004]). However, their correctness is evident for our pur-
poses (see Fig. 11.8), so we do not dwell on the proofs here. Unlike the angles of the
vertices of the dark gray region in Fig. 11.8, the angles sustained by the vertices of
the MPP are not necessarily multiples of 90°.
In the discussion that follows, we will need to calculate the orientation of triplets
of points. Consider a triplet of points, ( a, b, c ) , and let the coordinates of these points
be a = (ax , ay ), b = (bx , by ), and c = (cx , cy ). If we arrange these points as the rows of
the matrix
$$A = \begin{bmatrix} a_x & a_y & 1 \\ b_x & b_y & 1 \\ c_x & c_y & 1 \end{bmatrix} \tag{11-1}$$
then the determinant of A,
$$\det(A) = (b_x - a_x)(c_y - a_y) - (c_x - a_x)(b_y - a_y) \tag{11-2}$$
is positive, negative, or zero depending on the orientation of the triplet. We use the notation
$$\operatorname{sgn}(a, b, c) \equiv \det(A) \tag{11-3}$$
so that sgn(a, b, c) > 0 for a counterclockwise sequence, sgn(a, b, c) < 0 for a clock-
wise sequence, and sgn(a, b, c) = 0 when the points are collinear. Geometrically,
sgn(a, b, c) > 0 indicates that point c lies on the positive side of pair (a, b) (i.e., c lies on
the positive side of the line passing through points a and b). Similarly, if sgn(a, b, c) < 0,
point c lies on the negative side of the line. Equations (11-2) and (11-3) give the same
result if the sequence (c, a, b) or (b, c, a) is used because the direction of travel in the
sequence is the same as for (a, b, c). However, the geometrical interpretation is differ-
ent. For example, sgn(c, a, b) > 0 indicates that point b lies on the positive side of the
line through points c and a.
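The following minimal Python sketch evaluates sgn(a, b, c) directly from the expanded determinant of Eqs. (11-1) and (11-2); the function name is our own, and the test value is taken from Example 11.2 below.

```python
def sgn(a, b, c):
    # Expanded determinant of the matrix A in Eq. (11-1) [see Eq. (11-2)]:
    # > 0 for a counterclockwise triplet, < 0 for clockwise, 0 for collinear points.
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])

# From Example 11.2: sgn(VL, WC, V3) with VL = (1, 4), WC = (3, 3), V3 = (3, 2).
print(sgn((1, 4), (3, 3), (3, 2)))   # -2
```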
To prepare the data for the MPP algorithm, we form a list of triplets consisting
of a vertex label (e.g., V0 , V1 , etc.); the coordinates of each vertex; and an additional
element denoting whether the vertex is W or B. It is important that the concave ver-
tices be mirrored, as in Fig. 11.8(c), that the vertices be in sequential order,† and that
the first vertex be the uppermost-leftmost vertex, which we know from property 5
is a W vertex of the MPP. Let V0 denote this vertex. We assume that the vertices are
arranged in the counterclockwise direction. The algorithm for finding MPPs uses
two “crawler” points: a white crawler (WC ) and a blue crawler (BC ). WC crawls along
the convex (W) vertices, and BC crawls along the concave (B) vertices. These two
crawler points, the last MPP vertex found, and the vertex being examined are all that
is necessary to implement the algorithm.
The algorithm starts by setting WC = BC = V0 (recall that V0 is an MPP-vertex).
Then, at any step in the algorithm, let VL denote the last MPP vertex found, and let
Vk denote the current vertex being examined. One of the following three conditions
can exist between VL , Vk , and the two crawler points:
†
Vertices of a boundary can be ordered by tracking the boundary using the boundary-following algorithm
discussed earlier.
(a) Vk is on the positive side of the line through the pair of points (VL , WC ), in which
case sgn (VL , WC , Vk ) > 0.
(b) Vk is on the negative side of the line through the pair (VL , WC ) or is collinear with
it; that is, sgn (VL , WC , Vk ) ≤ 0. Simultaneously, Vk lies on the positive side of the
line through (VL , BC ) or is collinear with it; that is, sgn (VL , BC , Vk ) ≥ 0.
(c) Vk is on the negative side of the line through the pair (VL , BC ) , in which case
sgn (VL , BC , Vk ) < 0.
If condition (a) holds, the next MPP vertex is WC , and we let VL = WC ; then we
reinitialize the algorithm by setting WC = BC = VL , and start with the next vertex
after the newly changed VL .
If condition (b) holds, Vk becomes a candidate MPP vertex. In this case, we set
WC = Vk if Vk is convex (i.e., it is a W vertex); otherwise we set BC = Vk . We then
continue with the next vertex in the list.
If condition (c) holds, the next MPP vertex is BC and we let VL = BC ; then we
reinitialize the algorithm by setting WC = BC = VL and start with the next vertex
after the newly changed VL .
The algorithm stops when it reaches the first vertex again, and thus has processed
all the vertices in the polygon. The VL vertices found by the algorithm are the ver-
tices of the MPP. Klette and Rosenfeld [2004] have proved that this algorithm finds
all the MPP vertices of a polygon enclosed by a simply connected cellular complex.
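To make the three conditions and the two crawler points concrete, here is a minimal Python sketch of the procedure just described. It assumes the vertex list has already been prepared as stated above (counterclockwise order, concave vertices mirrored, uppermost-leftmost W vertex first); the function names and the list representation are our own, and this is a sketch rather than a production implementation.

```python
def sgn(p, q, r):
    # Orientation of the triplet (p, q, r); equals det(A) of Eqs. (11-1)-(11-3).
    return (q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])

def mpp_vertices(vertices):
    # vertices: list of ((x, y), kind) in counterclockwise order, with kind 'W'
    # (convex) or 'B' (mirrored concave) and the uppermost-leftmost W vertex first.
    n = len(vertices)
    VL = vertices[0][0]          # last MPP vertex found (V0 is an MPP vertex)
    mpp = [VL]
    WC = BC = VL                 # white and blue crawler points
    iWC = iBC = 0                # positions of the crawlers in the vertex list
    k = 1
    while k <= n:                # index n wraps around to the first vertex
        Vk, kind = vertices[k % n]
        if sgn(VL, WC, Vk) > 0:              # condition (a): WC is the next MPP vertex
            VL, iL = WC, iWC
            mpp.append(VL)
            WC, BC, iWC, iBC = VL, VL, iL, iL
            k = iL + 1                       # restart with the vertex after the new VL
        elif sgn(VL, BC, Vk) >= 0:           # condition (b): Vk is a candidate vertex
            if kind == 'W':
                WC, iWC = Vk, k % n
            else:
                BC, iBC = Vk, k % n
            k += 1
        else:                                # condition (c): BC is the next MPP vertex
            VL, iL = BC, iBC
            mpp.append(VL)
            WC, BC, iWC, iBC = VL, VL, iL, iL
            k = iL + 1
    return mpp
```

Its first few iterations reproduce the steps traced in Example 11.2 below.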
EXAMPLE 11.2 : A numerical example showing the details of how the MPP algorithm works.
A simple example in which we can follow the algorithm step-by-step will help clarify the preceding con-
cepts. Consider the vertices in Fig. 11.8(c). In our image coordinate system, the top-left point of the grid
is at coordinates (0, 0). Assuming unit grid spacing, the first few (counterclockwise) vertices are:
V0 (1, 4) W | V1 (2, 3) B | V2 (3, 3) W | V3 (3, 2) B | V4 (4, 1) W | V5 (7, 1) W | V6 (8, 2) B | …
where the triplets are separated by vertical lines, and the B vertices are mirrored, as required by the
algorithm.
The uppermost-leftmost vertex is always the first vertex of the MPP, so we start by letting VL and V0
be equal, VL = V0 = (1, 4), and initializing the other variables: WC = BC = VL = (1, 4 ) .
The next vertex is V1 = ( 2, 3) . In this case we have sgn (VL , WC , V1 ) = 0 and sgn (VL , BC , V1 ) = 0, so
condition (b) holds. Because V1 is a B (concave) vertex, we update the blue crawler: BC = V1 = ( 2, 3) . At
this stage, we have VL = (1, 4), WC = (1, 4), and BC = (2, 3).
Next, we look at V2 = ( 3, 3) . In this case, sgn (VL , WC , V2 ) = 0, and sgn (VL , BC , V2 ) = 1, so condition (b)
holds. Because V2 is W, we update the white crawler: WC = (3, 3).
The next vertex is V3 = ( 3, 2 ) . At this junction we have VL = (1, 4), WC = (3, 3), and BC = (2, 3). Then,
sgn (VL , WC , V3 ) = −2 and sgn (VL , BC , V3 ) = 0, so condition (b) holds again. Because V3 is B, we let
BC = V3 = (3, 2) and look at the next vertex.
The next vertex is V4 = ( 4, 1) . We are working with VL = (1, 4), WC = (3, 3), and BC = (3, 2). The values
of sgn are sgn(VL , WC , V4 ) = −3 and sgn(VL , BC , V4 ) = 0. So, condition (b) holds yet again, and we let
WC = V4 = (4, 1) because V4 is a W vertex.
The next vertex is V5 = (7, 1). Using the values from the previous step we obtain sgn(VL , WC , V5 ) = 9,
so condition (a) is satisfied. Therefore, we let VL = WC = (4, 1) (this is V4 ) and reinitialize:
BC = WC = VL = (4, 1). Note that once we knew that sgn(VL , WC , V5 ) > 0 we did not bother to compute
the other sgn expression. Also, reinitialization means that we start fresh again by examining the next
vertex following the newly found MPP vertex. In this case, that next vertex is V5 , so we visit it again.
With V5 = ( 7, 1) , and using the new values of VL , WC , and BC , it follows that sgn (VL , WC , V5 ) = 0 and
sgn (VL , BC , V5 ) = 0, so condition (b) holds. Therefore, we let WC = V5 = ( 7, 1) because V5 is a W vertex.
The next vertex is V6 = ( 8, 2 ) and sgn (VL , WC , V6 ) = 3, so condition (a) holds. Thus, we let
VL = WC = ( 7, 1) and reinitialize the algorithm by setting WC = BC = VL .
Because the algorithm was reinitialized at V5 , the next vertex is V6 = (8, 2) again. Using the results
from the previous step gives us sgn(VL , WC , V6 ) = 0 and sgn(VL , BC , V6 ) = 0, so condition (b) holds this
time. Because V6 is B we let BC = V6 = (8, 2).
Summarizing, we have found three vertices of the MPP up to this point: V0 = (1, 4), V4 = (4, 1), and
V5 = (7, 1). Continuing as above with the remaining vertices results in the MPP vertices in Fig. 11.8(c)
(see Problem 11.9). The mirrored B vertices at (2, 3), (3, 2), and on the lower-right side at (13, 10), are on
the boundary of the MPP. However, they are collinear and thus are not considered vertices of the MPP.
Appropriately, the algorithm did not detect them as such.
SIGNATURES
A signature is a 1-D functional representation of a 2-D boundary and may be gener-
ated in various ways. One of the simplest is to plot the distance from the centroid
to the boundary as a function of angle, as illustrated in Fig. 11.10. The basic idea of
using signatures is to reduce the boundary representation to a 1-D function that
presumably is easier to describe than the original 2-D boundary.
Based on the assumptions of uniformity in scaling with respect to both axes, and
that sampling is taken at equal intervals of θ, changes in the size of a shape result
in changes in the amplitude values of the corresponding signature. One way to
a b c d
e f g h
FIGURE 11.9 (a) 566 × 566 binary image. (b) 8-connected boundary. (c) through (h), MPPs obtained using square cells
of sizes 2, 4, 6, 8, 16, and 32, respectively (the vertices were joined by straight-line segments for display). The number
of boundary points in (b) is 1900. The numbers of vertices in (c) through (h) are 206, 127, 92, 66, 32, and 13, respec-
tively. Images (b) through (h) are shown as negatives to make the boundaries easier to see.
normalize for this is to scale all functions so that they always span the same range of
values, e.g., [0, 1]. The main advantage of this method is simplicity, but it has the dis-
advantage that scaling of the entire function depends on only two values: the mini-
mum and maximum. If the shapes are noisy, this can be a source of significant error
from object to object. A more rugged (but also more computationally intensive)
approach is to divide each sample by the variance of the signature, assuming that
the variance is not zero—as in the case of Fig. 11.10(a)—or so small that it creates
computational difficulties. Using the variance yields a variable scaling factor that
is inversely proportional to changes in size and works much as automatic volume
control does. Whatever the method used, the central idea is to remove dependency
on size while preserving the fundamental shape of the waveforms.
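The sketch below, in Python, illustrates one way to generate such a distance-versus-angle signature and apply the variance-based normalization just described. The function name, the uniform angular resampling step, and the assumption that the boundary encloses its centroid are ours; treat it as a sketch rather than a definitive implementation.

```python
import numpy as np

def radial_signature(boundary, n_samples=360):
    # boundary: (N, 2) array of (x, y) points on the boundary of one object.
    boundary = np.asarray(boundary, dtype=float)
    centroid = boundary.mean(axis=0)
    d = boundary - centroid
    theta = np.mod(np.arctan2(d[:, 1], d[:, 0]), 2 * np.pi)   # angle of each point
    r = np.hypot(d[:, 0], d[:, 1])                            # distance to centroid
    order = np.argsort(theta)
    # Resample r(theta) at equal increments of theta (wrapping around 2*pi).
    theta_s = np.linspace(0, 2 * np.pi, n_samples, endpoint=False)
    r_s = np.interp(theta_s, theta[order], r[order], period=2 * np.pi)
    # Divide each sample by the variance of the signature (assumed nonzero),
    # as suggested above, to reduce the dependency on object size.
    return r_s / r_s.var()
```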
Distance versus angle is not the only way to generate a signature. For example,
another way is to traverse the boundary and, corresponding to each point on the
boundary, plot the angle between a line tangent to the boundary at that point and a
reference line. The resulting signature, although quite different from the r(θ) curves
in Fig. 11.10, carries information about basic shape characteristics. For instance,
horizontal segments in the curve correspond to straight lines along the boundary
because the tangent angle is constant there. A variation of this approach is to use
the so-called slope density function as a signature. This function is a histogram of tangent-angle values.
a b
FIGURE 11.10 Distance-versus-angle signatures. In (a), r(θ) is constant. In (b), the signature consists of repetitions of the pattern r(θ) = A sec θ for 0 ≤ θ ≤ π/4, and r(θ) = A csc θ for π/4 < θ ≤ π/2.
a b c
d e f
FIGURE 11.11
(a) and (d) Two
binary regions,
(b) and (e) their
external
boundaries, and
(c) and (f) their
corresponding r(u)
signatures. The
horizontal axes
in (c) and (f) cor-
respond to angles
from 0° to 360°, in
increments of 1°.
SKELETONS, MEDIAL AXES, AND DISTANCE TRANSFORMS
The medial axis transform (MAT) of a region R with border B is defined as follows: for every point p in R, we find its closest point in B. If p has more than one such closest point, it is said to belong to the medial axis of R. The concept of “closest” (and thus the resulting
MAT) depends on the definition of a distance metric (see Section 2.5). Figure 11.12
shows some examples using the Euclidean distance. If the Euclidean distance is used,
the resulting skeleton is the same as what would be obtained by using the maximum
disks from Section 9.5. The skeleton of a region is defined as its medial axis.
The MAT of a region has an intuitive interpretation based on the “prairie fire”
concept discussed in Section 11.3 (see Fig. 11.15). Consider an image region as a
prairie of uniform, dry grass, and suppose that a fire is lit simultaneously along all
the points on its border. All fire fronts will advance into the region at the same speed.
The MAT of the region is the set of points reached by more than one fire front at
the same time.
In general, the MAT comes considerably closer than thinning to producing skel-
etons that “make sense.” However, computing the MAT of a region requires cal-
culating the distance from every interior point to every point on the border of the
region—an impractical endeavor in most applications. Instead, the approach is to
obtain the skeleton equivalently from the distance transform, for which numerous
efficient algorithms exist.
The distance transform of a region of foreground pixels in a background of zeros
is the distance from every pixel to the nearest nonzero valued pixel. Figure 11.13(a)
shows a small binary image, and Fig. 11.13(b) is its distance transform. Observe that
every 1-valued pixel has a distance transform value of 0 because its closest nonzero
valued pixel is itself. For the purpose of finding skeletons equivalent to the MAT,
we are interested in the distance from the pixels of a region of foreground (white)
a b c
FIGURE 11.12
Medial axes
(dashed) of three
simple regions.
pixels to their nearest background (zero) pixels, which constitute the region bound-
ary. Thus, we compute the distance transform of the complement of the image, as
Figs. 11.13(c) and (d) illustrate. By comparing Figs. 11.13(d) and 11.12(a), we see
in the former that the MAT (skeleton) is equivalent to the ridge of the distance
transform [i.e., the ridge in the image in Fig. 11.13(d)]. This ridge is the set of local
maxima [shown bold in Fig. 11.13(d)]. Figures 11.13(e) and (f) show the same effect
on a larger (414 × 708) binary image.
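A minimal Python sketch of this idea follows, using SciPy's Euclidean distance transform. Note that scipy.ndimage.distance_transform_edt returns, for every nonzero pixel, its distance to the nearest zero pixel, which is exactly the quantity illustrated in Figs. 11.13(c) and (d); the rectangular test region and the local-maximum approximation of the ridge are our own simplifications.

```python
import numpy as np
from scipy import ndimage

# A simple foreground region (True = region pixel) on a background of zeros.
region = np.zeros((64, 64), dtype=bool)
region[16:48, 8:56] = True

# Distance from every region pixel to the nearest background (boundary) pixel.
dist = ndimage.distance_transform_edt(region)

# Approximate the MAT/skeleton as the ridge (set of local maxima) of the
# distance transform, restricted to the region.
ridge = (dist == ndimage.maximum_filter(dist, size=3)) & region
```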
Finding approaches for computing the distance transform efficiently has been a
topic of research for many years. Numerous approaches exist that can compute the
distance transform with linear time complexity, O(K ), for a binary image with K
pixels. For example, the algorithm by Maurer et al. [2003] not only can compute the
distance transform in O(K ), it can compute it in O(K/P) using P processors.
FIGURE 11.13 (a) A small image and (b) its distance transform. Note that all 1-valued pixels in (a) have corresponding 0’s in (b). (c) A small image, and (d) the distance transform of its complement. (e) A larger image, and (f) the distance transform of its complement. The Euclidean distance was used throughout.
(a)              (b)
0 0 0 0 0        1.41 1    1    1    1.41
0 1 1 1 0        1    0    0    0    1
0 1 1 1 0        1    0    0    0    1
0 0 0 0 0        1.41 1    1    1    1.41
(c)                      (d)
0 0 0 0 0 0 0 0 0        0 0 0 0 0 0 0 0 0
0 1 1 1 1 1 1 1 0        0 1 1 1 1 1 1 1 0
0 1 1 1 1 1 1 1 0        0 1 2 2 2 2 2 1 0
0 1 1 1 1 1 1 1 0        0 1 2 3 3 3 2 1 0
0 1 1 1 1 1 1 1 0        0 1 2 2 2 2 2 1 0
0 1 1 1 1 1 1 1 0        0 1 1 1 1 1 1 1 0
0 0 0 0 0 0 0 0 0        0 0 0 0 0 0 0 0 0
a b
c d
FIGURE 11.14
(a) Thresholded
image of blood
vessels.
(b) Skeleton
obtained by
thinning, shown
superimposed
on the image
(note the spurs).
(c) Result of 40
passes of spur
removal.
(d) Skeleton
obtained using the
distance
transform.
EXAMPLE 11.5 : Skeletons obtained using thinning and pruning vs. the distance transform.
Figure 11.14(a) shows a segmented image of blood vessels, and Fig. 11.14(b) shows the skeleton obtained
using morphological thinning. As we discussed in Chapter 9, thinning is characteristically accompanied
by spurs, which certainly is the case here. Figure 11.14(c) shows the result of forty passes of spur removal.
With the exception of the few small spurs visible on the bottom left of the image, pruning did a reason-
able job of cleaning up the skeleton. One drawback of thinning is the loss of potentially important
features. This was not the case here, except the pruned skeleton does not cover the full expanse of the
image. Figure 11.14(d) shows the skeleton obtained using distance transform computations based on fast
marching (see Lee et al. [2005] and Shi and Karl [2008]). The way the algorithm we used implements
branch generation handles ambiguities such as spurs automatically.
The result in Fig. 11.14(d) is slightly superior to the result in Fig. 11.14(c), but both skeletons certainly
capture the important features of the image in this case. A key advantage of the thinning approach
is simplicity of implementation, which can be important in dedicated applications. Overall, distance-
transform formulations tend to produce skeletons less prone to discontinuities, but overcoming the
computational burden of the distance transform results in implementations that are considerably more
complex than thinning.
12.1 BACKGROUND
Humans possess the most sophisticated pattern recognition capabilities in the known
biological world. By contrast, the capabilities of current recognition machines pale
in comparison with tasks humans perform routinely, from being able to interpret the
meaning of complex images, to our ability to generalize knowledge stored in our
brains. But recognition machines play an important, sometimes even crucial role in
everyday life. Imagine what modern life would be like without machines that read
barcodes, process bank checks, inspect the quality of manufactured products, read
fingerprints, sort mail, and recognize speech.
In image pattern recognition, we think of a pattern as a spatial arrangement of
features. A pattern class is a set of patterns that share some common properties. Pat-
tern recognition by machine encompasses techniques for automatically assigning
patterns to their respective classes. That is, given a pattern or sets of patterns whose
class is unknown, the job of a pattern recognition system is to assign a class label to
each of its input patterns.
There are four main stages involved in recognition: (1) sensing, (2) preprocessing,
(3) feature extraction, and (4) classification. In terms of image processing, sensing is
concerned with generating signals in a spatial (2-D) or higher-dimensional format.
We covered numerous aspects of image sensing in Chapter 1. Preprocessing deals
with techniques for tasks such as noise reduction, enhancement, restoration, and
segmentation, as discussed in earlier chapters. You learned about feature extraction
in Chapter 11. Classification, the focus of this chapter, deals with using a set of fea-
tures as the basis for assigning class labels to unknown input image patterns.
In the following section, we will discuss three basic approaches used for image
pattern classification: (1) classification based on matching unknown patterns against
specified prototypes, (2) optimum statistical classifiers, and (3) neural networks.
One way to characterize the differences between these approaches is in the level
of “engineering” required to transform raw data into formats suitable for computer
processing. Ultimately, recognition performance is determined by the discriminative
power of the features used.
In classification based on prototypes, the objective is to make the features so
unique and easily detectable that classification itself becomes a simple task. A good
example of this is bank-check processors, which use stylized fonts to simplify
machine processing (we will discuss this application in Section 12.3).
In the second category, classification is cast in decision-theoretic, statistical terms,
and the classification approach is based on selecting parameters that can be shown
to yield optimum classification performance in a statistical sense. Here, emphasis is
placed on both the features used, and the design of the classifier. We will illustrate
this approach in Section 12.4 by deriving the Bayes pattern classifier, starting from
basic principles.
In the third category, classification is performed using neural networks. As you
will learn in Sections 12.5 and 12.6, neural networks can operate using engineered
features too, but they have the unique ability of being able to generate, on their own,
representations (features) suitable for recognition. These systems can accomplish
this using raw data, without the need for engineered features.
One characteristic shared by the preceding three approaches is that they are
based on parameters that must be either specified or learned from patterns that rep-
resent the recognition problem we want to solve. The patterns can be labeled, mean-
ing that we know the class of each pattern, or unlabeled, meaning that the data are
known to be patterns, but the class of each pattern is unknown. A classic example
of labeled data is the character recognition problem, in which a set of character
samples is collected and the identity of each character is recorded as a label from
the group 0 through 9 and a through z. An example of unlabeled data is when we are
seeking clusters in a data set, with the aim of utilizing the resulting cluster centers as
prototypes of the pattern classes contained in the data.
When working with labeled data, a given data set generally is subdivided into three subsets: a training set, a validation set, and a test set (a typical subdivision might be 50% training, and 25% each for the validation and test sets). Because the examples in this chapter are intended to demonstrate basic principles and are not large scale, we dispense with validation and subdivide the pattern data into training and test sets. The process by which a training set is used to generate classifier parameters is called training. In this mode, a classifier is given the class label of each pattern, the objective being to make adjustments in the parameters if the classifier makes a mistake in identifying the class of the given pattern. At this point, we might be working with several
candidate designs. At the end of training, we use the validation set to compare the
various designs against a performance objective. Typically, several iterations of train-
ing/validation are required to establish the design that comes closest to meeting the
desired objective. Once a design has been selected, the final step is to determine how
it will perform “in the field.” For this, we use the test set, which consists of patterns
that the system has never “seen” before. If the training and validation sets are truly
representative of the data the system will encounter in practice, the results of train-
ing/validation should be close to the performance using the test set. If training/vali-
dation results are acceptable, but test results are not, we say that training/validation
“over fit” the system parameters to the available data, in which case further work on
the system architecture is required. Of course all this assumes that the given data are
truly representative of the problem we want to solve, and that the problem in fact
can be solved by available technology.
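As a small illustration of this subdivision, the Python sketch below shuffles a labeled data set and splits it 50/25/25 into training, validation, and test subsets. The function name, the use of NumPy arrays, and the fixed random seed are our own choices, not something prescribed by the text.

```python
import numpy as np

def split_50_25_25(patterns, labels, seed=0):
    # patterns: (N, n) array of pattern vectors; labels: (N,) array of class labels.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(patterns))
    n_train = len(idx) // 2
    n_val = len(idx) // 4
    train, val, test = np.split(idx, [n_train, n_train + n_val])
    return ((patterns[train], labels[train]),
            (patterns[val], labels[val]),
            (patterns[test], labels[test]))
```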
A system that is designed using training data is said to undergo supervised learn-
ing. If we are working with unlabeled data, the system learns the pattern classes
themselves while in an unsupervised learning mode. In this chapter, we deal only
with supervised learning. As you will see in this and the next chapter, supervised
learning covers a broad range of approaches, from applications in which a system
learns parameters of features whose form is fixed by a designer, to systems that uti-
lize deep learning and large sets of raw data to learn, on their own, the features required for classification. (Generally, we associate the concept of deep learning with large sets of data; these ideas are discussed in more detail later in this section and the next.) These systems accomplish this task without a human designer having to specify the features a priori.
After a brief discussion in the next section of how patterns are formed, and on the nature of pattern classes, we will discuss in Section 12.3 various approaches for
prototype-based classification. In Section 12.4, we will start from basic principles
and derive the equations of the Bayes classifier, an approach characterized by opti-
mum classification performance on an average basis. We will also discuss supervised
training of a Bayes classifier based on the assumption of multivariate Gaussian
distributions. Starting with Section 12.5, we will spend the rest of the chapter discuss-
ing neural networks. We will begin Section 12.5 with a brief introduction to percep-
trons and some historical facts about machine learning. Then, we will introduce the
concept of deep neural networks and derive the equations of backpropagation, the
method of choice for training deep neural nets. These networks are well-suited for
applications in which input patterns are vectors. In Section 12.6, we will introduce
deep convolutional neural networks, which currently are the preferred approach
when the system inputs are digital images. After deriving the backpropagation equa-
tions used for training convolutional nets, we will give several examples of appli-
cations involving classes of images of various complexities. In addition to working
directly with image inputs, deep convolutional nets are capable of learning, on their
own, image features suitable for classification. This is accomplished starting with raw
image data, as opposed to the other classification methods discussed in Sections 12.3
and 12.4, which rely on “engineered” features whose form, as noted earlier, is speci-
fied a priori by a human designer.
In image pattern classification, the two principal pattern arrangements are quantita-
tive and structural. Quantitative patterns are arranged in the form of pattern vectors.
Structural patterns typically are composed of symbols, arranged in the form of strings,
trees, or, less frequently, as graphs. Most of the work in this chapter is based on pat-
tern vectors, but we will discuss structural patterns briefly at the end of this section,
and give an example at the end of Section 12.3.
PATTERN VECTORS
Pattern vectors are represented by lowercase letters, such as x, y, and z, and have
the form
$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \tag{12-1}$$
where each component, xi , represents the ith feature descriptor, and n is the total
number of such descriptors. We can express a vector in the form of a column, as
in Eq. (12-1), or in the equivalent row form $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$, where T indicates transposition.
a b
FIGURE 12.1 Using linear indexing to vectorize a grayscale image.
In a classic 1936 paper, R. A. Fisher reported the use of a technique called discriminant analysis to recognize three types of iris flowers (Iris setosa, virginica, and versicolor). Fisher described each flower using four features: the length and width of the petals, and similarly for the sepals (see Fig. 12.2). (Sepals are the undergrowth beneath the petals.) This leads to the 4-D vectors shown in the figure. A set of these vectors, obtained for fifty samples of each flower type, constitutes the three famous Fisher iris pattern class-
es. Had Fisher been working today, he probably would have added spectral colors
and shape features to his measurements, yielding vectors of higher dimensionality.
We will be working with the original iris data set later in this chapter.
A higher-level representation of patterns is based on feature descriptors of the
types you learned in Chapter 11. For instance, pattern vectors formed from descrip-
tors of boundary shape are well-suited for applications in controlled environments,
such as industrial inspection. Figure 12.3 illustrates the concept. Here, we are inter-
ested in classifying different types of noisy shapes, a sample of which is shown in
the figure. If we represent an object by its signature, we would obtain 1-D signals
of the form shown in Fig. 12.3(b). We can express a signature as a vector by sam-
pling its amplitude at increments of θ, then forming a vector by letting xi = r(θi),
for i = 1, 2, …, n. Instead of using “raw” sampled signatures, a more common
approach is to compute some function, xi = g(r(θi)), of the signature samples and
use them to form vectors. You learned in Section 11.3 several approaches to do this,
such as statistical moments.
FIGURE 12.2 Petal and sepal width and length measurements (see arrows) performed on iris flowers for the purpose of data classification. The measurements form the pattern vector x = (x1, x2, x3, x4)^T, where x1 = petal width, x2 = petal length, x3 = sepal width, and x4 = sepal length. The image shown is of the Iris virginica type. (Image courtesy of USDA.)
a b
FIGURE 12.3 (a) A noisy object boundary, and (b) its corresponding signature, r(θ), plotted for 0 ≤ θ ≤ 2π. The resulting pattern vector is x = (g(r(θ1)), g(r(θ2)), …, g(r(θn)))^T.
Vectors can be formed also from features of both boundary and regions. For
example, the objects in Fig. 12.4 can be represented by 3-D vectors whose compo-
nents capture shape information related to both boundary and region properties
of single binary objects. Pattern vectors can be used also to represent properties of
image regions. For example, the elements of the 6-D vector in Fig. 12.5 are texture
measures based on the feature descriptors in Table 11.3. Figure 12.6 shows an exam-
ple in which pattern vector elements are features that are invariant to transforma-
tions, such as image rotation and scaling (see Section 11.4).
When working with sequences of registered images, we have the option of using
pattern vectors formed from corresponding pixels in those images (see Fig. 12.7).
Forming pattern vectors in this way implies that recognition will be based on infor-
mation extracted from the same spatial location across the images. Although this
may seem like a very limiting approach, it is ideally suited for applications such as
recognizing regions in multispectral images, as you will see in Section 12.4.
When working with entire images as units, we need the detail afforded by vectors
of much-higher dimensionality, such as those we discussed in Section 11.7 in connec-
tion with the SIFT algorithm. However, a more powerful approach when working
with entire images is to use deep convolutional neural networks. We will discuss
neural nets in detail in Sections 12.5 and 12.6.
STRUCTURAL PATTERNS
Pattern vectors are not suitable for applications in which objects are represented
by structural features, such as strings of symbols. Although they are used much less
than vectors in image processing applications, patterns containing structural descrip-
tions of objects are important in applications where shape is of interest. Figure 12.8
shows an example. The boundaries of the bottles were approximated by a polygon
a b c d
FIGURE 12.4 Pattern vectors whose components capture both boundary and regional characteristics: x = (x1, x2, x3)^T, where x1 = compactness, x2 = circularity, and x3 = eccentricity.
FIGURE 12.7 Pattern (feature) vectors formed by concatenating corresponding pixels from a set of registered images (spectral bands 1 through 6), giving x = (x1, x2, …, x6)^T. (Original images courtesy of NASA.)
FIGURE 12.8 Symbol string generated from a polygonal approximation of the boundaries of medicine bottles; θ denotes an interior angle and b a line segment of specified length.
using the approach explained in Section 11.2. The boundary is subdivided into line
segments (denoted by b in the figure), and the interior angle, u, is computed at each
intersection of two line segments. A string of sequential symbols is generated as the
boundary is traversed in the counterclockwise direction, as the figure shows. Strings
of this form are structural patterns, and the objective, as you will see in Section 12.3,
is to match a given string against stored string prototypes.
A tree is another structural representation, suitable for higher-level descriptions
of an entire image in terms of its component regions. Basically, most hierarchical
ordering schemes lead to tree structures. For example, Fig. 12.9 shows a satellite
image of a heavily built downtown area and surrounding residential areas. Let the
symbol $ represent the root of a tree. The (upside down) tree shown in the figure
was obtained using the structural relationship “composed of.” Thus, the root of the
tree represents the entire image. The next level indicates that the image is composed
of a downtown and residential areas. In turn, the residential areas are composed
of housing, highways, and shopping malls. The next level down in the tree further
describes the housing and highways. We can continue this type of subdivision until
we reach the limit of our ability to resolve different regions in the image.
MINIMUM-DISTANCE CLASSIFIER
One of the simplest and most widely used prototype matching methods is the minimum-distance classifier which, as its name implies, computes a distance-based measure between an unknown pattern vector and each of the class prototypes. (The minimum-distance classifier is also referred to as the nearest-neighbor classifier.) It
then assigns the unknown pattern to the class of its closest prototype. The prototype
FIGURE 12.9 Tree representation of a satellite image showing a heavily built downtown area (Washington, D.C.) and surrounding residential areas. (Original image courtesy of NASA.)
vectors of the minimum-distance classifier usually are the mean vectors of the vari-
ous pattern classes:
$$\mathbf{m}_j = \frac{1}{n_j} \sum_{\mathbf{x} \in c_j} \mathbf{x}, \qquad j = 1, 2, \ldots, N_c \tag{12-2}$$
where n j is the number of pattern vectors used to compute the jth mean vector,
c j is the jth pattern class, and Nc is the number of classes. If we use the Euclidean
distance to determine similarity, the minimum-distance classifier computes the dis-
tances
$$D_j(\mathbf{x}) = \| \mathbf{x} - \mathbf{m}_j \|, \qquad j = 1, 2, \ldots, N_c \tag{12-3}$$
where $\|\mathbf{a}\| = (\mathbf{a}^T \mathbf{a})^{1/2}$ is the Euclidean norm. The classifier then assigns an unknown
pattern x to class ci if Di (x) < Dj (x) for j = 1, 2, …, Nc , j ≠ i. Ties [i.e., Di (x) = Dj (x)]
are resolved arbitrarily.
It is not difficult to show (see Problem 12.2) that selecting the smallest distance is
equivalent to evaluating the functions
$$d_j(\mathbf{x}) = \mathbf{m}_j^T \mathbf{x} - \tfrac{1}{2}\mathbf{m}_j^T \mathbf{m}_j, \qquad j = 1, 2, \ldots, N_c \tag{12-4}$$
and assigning an unknown pattern x to the class whose prototype yielded the largest
value of d. That is, x is assigned to class ci , if
When used for recognition, functions of this form are referred to as decision or dis-
criminant functions.
The decision boundary separating class ci from c j is given by the values of x for
which
di (x) = d j (x) (12-6)
The decision boundaries for a minimum-distance classifier follow directly from this
equation and Eq. (12-4):
$$d_{ij}(\mathbf{x}) = d_i(\mathbf{x}) - d_j(\mathbf{x}) = 0 \tag{12-7}$$
which, after substituting Eq. (12-4), becomes
$$d_{ij}(\mathbf{x}) = (\mathbf{m}_i - \mathbf{m}_j)^T \mathbf{x} - \tfrac{1}{2}(\mathbf{m}_i - \mathbf{m}_j)^T(\mathbf{m}_i + \mathbf{m}_j) = 0 \tag{12-8}$$
EXAMPLE 12.1 : Illustration of the minimum-distance classifier for two classes in 2-D.
Figure 12.10 shows scatter plots of petal width and length values for the classes Iris versicolor and Iris
setosa. As mentioned in the previous section, pattern vectors in the iris database consist of four mea-
surements for each flower. We show only two here so that you can visualize the pattern classes and the
decision boundary between them. We will work with the complete database later in this chapter.
We denote the Iris versicolor and setosa data as classes c1 and c2 , respectively. The means of the two
classes are $\mathbf{m}_1 = (4.3, 1.3)^T$ and $\mathbf{m}_2 = (1.5, 0.3)^T$. It then follows from Eq. (12-4) that
$$d_1(\mathbf{x}) = \mathbf{m}_1^T \mathbf{x} - \tfrac{1}{2}\mathbf{m}_1^T \mathbf{m}_1 = 4.3 x_1 + 1.3 x_2 - 10.1$$
and
$$d_2(\mathbf{x}) = \mathbf{m}_2^T \mathbf{x} - \tfrac{1}{2}\mathbf{m}_2^T \mathbf{m}_2 = 1.5 x_1 + 0.3 x_2 - 1.17$$
From Eq. (12-8), the equation of the boundary is
$$d_{12}(\mathbf{x}) = d_1(\mathbf{x}) - d_2(\mathbf{x}) = 2.8 x_1 + 1.0 x_2 - 8.9 = 0$$
FIGURE 12.10 Decision boundary, 2.8x1 + 1.0x2 − 8.9 = 0, of a minimum-distance classifier (based on two measurements) for the classes Iris versicolor and Iris setosa. The horizontal axis is petal length (cm) and the vertical axis is petal width (cm).
Figure 12.10 shows a plot of this boundary. Substituting any pattern vector from class c1 into this equa-
tion would yield d12 (x) > 0. Conversely, any pattern from class c2 would give d12 (x) < 0. Thus, given an
unknown pattern x belonging to one of these two classes, the sign of d12 (x) would be sufficient to deter-
mine the class to which that pattern belongs.
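A minimal NumPy sketch of Eqs. (12-2) and (12-4) is shown below, using the two class means of this example (and assuming, as in Fig. 12.10, that x1 is petal length and x2 is petal width, both in cm). The function names and the dictionary representation of the prototypes are our own.

```python
import numpy as np

def train_means(X, y, classes):
    # Eq. (12-2): the prototype of each class is the mean of its training vectors.
    return {c: X[y == c].mean(axis=0) for c in classes}

def classify(x, means):
    # Eq. (12-4): assign x to the class whose decision function d_j(x) is largest.
    scores = {c: m @ x - 0.5 * (m @ m) for c, m in means.items()}
    return max(scores, key=scores.get)

# Class means from Example 12.1.
means = {'versicolor': np.array([4.3, 1.3]), 'setosa': np.array([1.5, 0.3])}
print(classify(np.array([4.0, 1.2]), means))   # versicolor
print(classify(np.array([1.4, 0.2]), means))   # setosa
```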
The minimum-distance classifier works well when the distance between means is
large compared to the spread or randomness of each class with respect to its mean.
In Section 12.4 we will show that the minimum-distance classifier yields optimum
performance (in terms of minimizing the average loss of misclassification) when the
distribution of each class about its mean is in the form of a spherical “hypercloud” in
n-dimensional pattern space.
As noted earlier, one of the keys to accurate recognition performance is to specify
features that are effective discriminators between classes. As a rule, the better the
features are at meeting this objective, the better the recognition performance will be.
In the case of the minimum-distance classifier this implies wide separation between
means and tight grouping of the classes.
Systems based on the Banker’s Association E-13B font character are a classic
example of how highly engineered features can be used in conjunction with a simple
classifier to achieve superior results. In the mid-1940s, bank checks were processed
manually, which was a laborious, costly process prone to mistakes. As the volume
of check writing increased in the early 1950s, banks became keenly interested in
automating this task. In the middle 1950s, the E-13B font and the system that reads
it became the standard solution to the problem. As Fig. 12.11 shows, this font set con-
sists of 14 characters laid out on a 9 × 7 grid. The characters are stylized to maximize
the difference between them. The font was designed to be compact and readable by
humans, but the overriding purpose was that the characters should be readable by
machine, quickly, and with very high accuracy.
FIGURE 12.11 The American Bankers Association E-13B font character set and corresponding waveforms. In addition to the ten digits, the set contains the four special symbols Transit, Amount, On Us, and Dash.
In addition to a stylized font design, the operation of the reading system is further
enhanced by printing each character using an ink that contains finely ground mag-
netic material. Appropriately, recognition of magnetized characters is referred to as Magnetic Ink Character Recognition (MICR). To improve character detectability in a check being read, the ink is subjected to a magnetic field that accentuates each character against the background. The stylized design further enhances character detectability. The characters are scanned in a horizontal direction with a single-slit reading head that is narrower but
taller than the characters. As a check passes through the head, the sensor produces a
1-D electrical signal (a signature) that is conditioned to be proportional to the rate
of increase or decrease of the character area under the head. For example, consider
the waveform of the number 0 in Fig. 12.11. As a check moves to the right past the
head, the character area seen by the sensor begins to increase, producing a positive
derivative (a positive rate of change). As the right leg of the character begins to pass
under the head, the character area seen by the sensor begins to decrease, produc-
ing a negative derivative. When the head is in the middle zone of the character, the
area remains nearly constant, producing a zero derivative. This waveform repeats
itself as the other leg of the character enters the head. The design of the font ensures
that the waveform of each character is distinct from all others. It also ensures that
the peaks and zeros of each waveform occur approximately on the vertical lines of
the background grid on which these waveforms are displayed, as the figure shows.
The E-13B font has the property that sampling the waveforms only at these (nine)
points yields enough information for their accurate classification. The effectiveness
of these highly engineered features is further refined by the magnetized ink, which
results in clean waveforms with almost no scatter.
Designing a minimum-distance classifier for this application is straightforward.
We simply store the sample values of each waveform at the vertical lines of the grid,
and let each set of the resulting samples be represented as a 9-D prototype vector,
m j , j = 1, 2,…, 14. When an unknown character is to be classified, the approach is
to scan it in the manner just described, express the grid samples of the waveform as
a 9-D vector, x, and identify its class by selecting the class of the prototype vector
that yields the highest value in Eq. (12-4). We do not even need a computer to do
this. Very high classification speeds can be achieved with analog circuits composed
of resistor banks (see Problem 12.4).
The most important lesson in this example is that a recognition problem often can
be made trivial if we can control the environment in which the patterns are gener-
ated. The development and implementation of the E13-B font reading system is a
striking example of this fact. On the other hand, this system would be inadequate if
we added the requirement that it has to recognize the textual content and signature
written on each check. For this, we need systems that are significantly more complex,
such as the convolutional neural networks we will discuss in Section 12.6.
MATCHING BY CORRELATION
The spatial correlation of a kernel, w, with an image, f(x, y), is given by
$$(w \star f)(x, y) = \sum_s \sum_t w(s, t)\, f(x + s, y + t) \tag{12-9}$$
where the limits of summation are taken over the region shared by w and f . This
equation is evaluated for all values of the displacement variables x and y so all ele-
ments of w visit every pixel of f . As you know, correlation has its highest value(s)
in the region(s) where f and w are equal or nearly equal. In other words, Eq. (12-9)
finds locations where w matches a region of f . But this equation has the drawback
that the result is sensitive to changes in the amplitude of either function. In order
to normalize correlation to amplitude changes in one or both functions, we perform matching using the correlation coefficient instead:
$$g(x, y) = \frac{\displaystyle\sum_s \sum_t \left[ w(s, t) - \bar{w} \right]\left[ f(x + s, y + t) - \bar{f}_{xy} \right]}{\left\{ \displaystyle\sum_s \sum_t \left[ w(s, t) - \bar{w} \right]^2 \sum_s \sum_t \left[ f(x + s, y + t) - \bar{f}_{xy} \right]^2 \right\}^{1/2}} \tag{12-10}$$
where the limits of summation are taken over the region shared by w and f, $\bar{w}$ is the average value of the kernel (computed only once), and $\bar{f}_{xy}$ is the average value of f in the region coincident with w. In image correlation work, w is often referred to as a template (i.e., a prototype subimage) and correlation is referred to as template matching.
To be formal, we should refer to correlation (and the correlation coefficient) as cross-correlation when the functions are different, and as autocorrelation when they are the same. However, it is customary to use the generic terms correlation and correlation coefficient, except when the distinction is important (as in deriving equations, in which it makes a difference which is being applied).
FIGURE 12.12 The mechanics of template matching. A template, w, of size m × n is shown with its center at an arbitrary location (x, y) in image f. The border around f is padding of width (m − 1)/2 and (n − 1)/2.
It can be shown (see Problem 12.5) that g( x, y) has values in the range [ −1, 1] and is thus
normalized to changes in the amplitudes of w and f . The maximum value of g occurs
when the normalized w and the corresponding normalized region in f are identical.
This indicates maximum correlation (the best possible match). The minimum occurs
when the two normalized functions exhibit the least similarity in the sense of Eq. (12-10).
Figure 12.12 illustrates the mechanics of the procedure just described. The border
around image f is padding, as explained in Section 3.4. In template matching, values
of correlation when the center of the template is past the border of the image gener-
ally are of no interest, so the padding is limited to half the kernel width.
The template in Fig. 12.12 is of size m × n, and it is shown with its center at an
arbitrary location ( x, y). The value of the correlation coefficient at that point is com-
puted using Eq. (12-10). Then, the center of the template is incremented to an adja-
cent location and the procedure is repeated. Values of the correlation coefficient
g( x, y) are obtained by moving the center of the template (i.e., by incrementing x
and y) so the center of w visits every pixel in f . At the end of the procedure, we
look for the maximum in g( x, y) to find where the best match occurred. It is possible
to have multiple locations in g( x, y) with the same maximum value, indicating sev-
eral matches between w and f .
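The following Python sketch evaluates Eq. (12-10) by brute force for every position at which the template fits entirely inside the (unpadded) image; the function name and the handling of a zero denominator are our own choices, and the double loop is written for clarity rather than speed.

```python
import numpy as np

def correlation_coefficient_map(f, w):
    # f: 2-D image array; w: 2-D template (prototype subimage).
    f = f.astype(float)
    w = w.astype(float)
    m, n = w.shape
    wc = w - w.mean()                      # zero-mean template (computed only once)
    w_energy = np.sum(wc ** 2)
    g = np.zeros((f.shape[0] - m + 1, f.shape[1] - n + 1))
    for i in range(g.shape[0]):
        for j in range(g.shape[1]):
            patch = f[i:i + m, j:j + n]
            pc = patch - patch.mean()      # zero-mean image region under the template
            denom = np.sqrt(w_energy * np.sum(pc ** 2))
            g[i, j] = np.sum(wc * pc) / denom if denom > 0 else 0.0
    return g                               # values lie in the range [-1, 1]

# The best match is the location of the maximum of g:
# i, j = np.unravel_index(np.argmax(g), g.shape)
```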
a b
c d
FIGURE 12.13
(a) 913 × 913
satellite image
of Hurricane
Andrew.
(b) 31 × 31
template of the
eye of the storm.
(c) Correlation
coefficient shown
as an image (note
the brightest
point, indicated
by an arrow).
(d) Location of
the best match
(identified by the
arrow). This point
is a single pixel,
but its size was
enlarged to make
it easier to see.
(Original image
courtesy of
NOAA.)
A white dot marks the location of the maximum correlation value (in this case there was a unique match whose maximum value was 1), which we see corresponds closely with the location of the eye in Fig. 12.13(a).
MATCHING SIFT FEATURES
We know from the discussion in Section 10.2 that the Hough transform simplifies looking
for data patterns by utilizing bins that reduce the level of detail with which we look at
a data set. We already discussed the SIFT algorithm in Section 11.7. The focus in this
section is to further illustrate the capabilities of SIFT for prototype matching.
Figure 12.14 shows the circuit board image we have used several times before.
The small rectangle enclosing the rightmost connector on the top of the large image
identifies an area from which an image of the connector was extracted. The small
image is shown zoomed for clarity. The sizes of the large and small images are shown
in the figure caption. Figure 12.15 shows the keypoints found by SIFT, as explained
in Section 11.7. They are visible as faint lines on both images. The zoomed view of
the subimage shows them a little clearer. It is important to note that the keypoints
for the image and subimage were found independently by SIFT. The large image
had 2714 keypoints, and the small image had 35.
Figure 12.16 shows the matches between keypoints found by SIFT. A total of 41
matches were found between the two images. Because there are only 35 keypoints
FIGURE 12.15
Keypoints found
by SIFT. The
large image has
2714 keypoints
(visible as faint
gray lines). The
subimage has 35
keypoints. This is
a separate image,
and SIFT found
its keypoints inde-
pendently of the
large image. The
zoomed section is
shown for clarity.
FIGURE 12.16
Matches found by
SIFT between the
large and small
images. A total of
41 matching pairs
were found. They
are shown Errors
connected by
straight lines.
Only three of the
matches were
“real” errors
(labeled “Errors”
in the figure).
in the small image, obviously at least six matches are either incorrect, or there are
multiple matches. Three of the errors are clearly visible as matches with connectors
in the middle of the large image. However, if you compare the shape of the connec-
tors in the middle of the large image, you can see that they are virtually identical to
parts of the connectors on the right. Therefore, these errors can be explained on that
basis. The other three extra matches are easier to explain. All connectors on the top
right of the circuit board are identical, and we are comparing one of them against
the rest. There is no way for a system to tell the difference between them. In fact, by
looking at the connecting lines, we can see that the matches are between the subim-
age and all five connectors. These in fact are correct matches between the subimage
and other connectors that are identical to it.
Matching Shape Numbers
A procedure analogous to the minimum-distance concept introduced earlier for pattern vectors can be formulated for the comparison of region boundaries that are described by shape numbers. With reference to the discussion in Section 11.3, the
degree of similarity, k, between two region boundaries, is defined as the largest order
for which their shape numbers still coincide. For example, let a and b denote shape
numbers of closed boundaries represented by 4-directional chain codes. These two
shapes have a degree of similarity k if
$$s_j(a) = s_j(b) \quad \text{for } j = 4, 6, 8, \ldots, k; \qquad \text{and} \qquad s_j(a) \neq s_j(b) \quad \text{for } j = k + 2, k + 4, \ldots \tag{12-11}$$
where s indicates shape number, and the subscript indicates shape order. (Parameter j starts at 4 and is always even because we are working with 4-connectivity, and we require that boundaries be closed.) The distance between two shapes a and b is defined as the inverse of their degree of similarity:
$$D(a, b) = \frac{1}{k} \tag{12-12}$$
This expression satisfies the following properties:
$$D(a, b) \geq 0$$
$$D(a, b) = 0 \;\text{ if and only if }\; a = b \tag{12-13}$$
$$D(a, c) \leq \max\left[ D(a, b),\, D(b, c) \right]$$
Either k or D may be used to compare two shapes. If the degree of similarity is used,
the larger k is, the more similar the shapes are (note that k is infinite for identical
shapes). The reverse is true when Eq. (12-12) is used.
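A minimal Python sketch of the degree of similarity k and the distance D(a, b) follows. It assumes each shape is represented by a dictionary that maps shape order (4, 6, 8, …) to the shape number of that order; the function names and that representation are our own.

```python
def degree_of_similarity(a, b):
    # a, b: dicts mapping order j (4, 6, 8, ...) to the shape number of that order.
    # Returns the largest order k for which the shape numbers still coincide.
    k, j = 0, 4
    while j in a and j in b and a[j] == b[j]:
        k = j
        j += 2
    return k

def shape_distance(a, b):
    # Eq. (12-12): D(a, b) = 1 / k.
    k = degree_of_similarity(a, b)
    return float('inf') if k == 0 else 1.0 / k
```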
String Matching
Suppose two region boundaries, a and b, are coded into strings of symbols, denot-
ed as $a_1 a_2 \ldots a_n$ and $b_1 b_2 \ldots b_m$, respectively. Let $\alpha$ represent the number of matches between the two strings, where a match occurs in the kth position if $a_k = b_k$. The number of symbols that do not match is
$$\beta = \max(|a|, |b|) - \alpha \tag{12-14}$$
a b c
FIGURE 12.17 (a) Shapes. (b) Similarity tree. (c) Similarity matrix. (Bribiesca and Guzman.)
Similarity tree (groupings of the shapes at each degree of similarity): degree 4: {a b c d e f}; 6: {a b c d e f}; 8: {a}, {b c d e f}; 10: {a}, {d}, {c f}, {b e}; 12: {a}, {d}, {c f}, {b}, {e}; 14: {a}, {d}, {c}, {f}, {b}, {e}.
Similarity matrix (degrees of similarity k between shapes a–f):
      b   c   d   e   f
a     6   6   6   6   6
b         8   8  10   8
c             8   8  12
d                 8   8
e                     8
where $|\,\text{arg}\,|$ is the length (number of symbols) of the string in the argument. It can be
shown that $\beta = 0$ if and only if a and b are identical (see Problem 12.7).
An effective measure of similarity is the ratio
$$R = \frac{\alpha}{\beta} = \frac{\alpha}{\max(|a|, |b|) - \alpha} \tag{12-15}$$
We see that R is infinite for a perfect match and 0 when none of the corresponding
symbols in a and b match ($\alpha = 0$ in this case). Because matching is done symbol by
symbol, the starting point on each boundary is important in terms of reducing the
amount of computation required to perform a match. Any method that normalizes
to, or near, the same starting point (refer to Section 11.2 for examples of how the starting point of a curve can be normalized) is helpful if it provides a computational advantage over brute-force matching, which consists of starting at arbitrary points on each
string, then shifting one of the strings (with wraparound) and computing Eq. (12-15)
for each shift. The largest value of R gives the best match.
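The brute-force procedure just described is easy to sketch in Python. The function names are our own, and the strings are assumed to be ordinary Python strings of boundary symbols.

```python
def string_similarity(a, b):
    # Eq. (12-15): R = alpha / beta, where alpha is the number of matching
    # positions and beta = max(|a|, |b|) - alpha.
    alpha = sum(1 for x, y in zip(a, b) if x == y)
    beta = max(len(a), len(b)) - alpha
    return float('inf') if beta == 0 else alpha / beta

def best_shift_similarity(a, b):
    # Brute-force matching: circularly shift b (with wraparound) and keep
    # the largest value of R over all shifts.
    return max(string_similarity(a, b[s:] + b[:s]) for s in range(len(b)))

print(best_shift_similarity("aabbc", "bcaab"))   # inf (exact match after shifting)
```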
a b
c d
e f
g
FIGURE 12.18 (a) and (b) sample boundaries of two different object classes; (c) and (d) their corresponding polygonal approximations; (e)–(g) tabulations of R. (Sze and Yang.)
(e) R values for samples of class 1 compared against themselves:
       1.a   1.b   1.c   1.d   1.e
1.b   16.0
1.c    9.6  26.3
1.d    5.1   8.1  10.3
1.e    4.7   7.2  10.3  14.2
1.f    4.7   7.2  10.3   8.4  23.7
(f) R values for samples of class 2 compared against themselves:
       2.a   2.b   2.c   2.d   2.e
2.b   33.5
2.c    4.8   5.8
2.d    3.6   4.2  19.3
2.e    2.8   3.3   9.2  18.3
2.f    2.6   3.0   7.7  13.5  27.0
Figures 12.18(c) and (d) show the polygonal approximations corresponding to the boundaries in Figs. 12.18(a) and (b), respectively. Strings were formed from the polygons by computing the interior angle, θ, between segments as each polygon was traversed clockwise. Angles were coded into one of eight possible symbols, corresponding to multiples of 45°; that is, a1: 0° < θ ≤ 45°; a2: 45° < θ ≤ 90°; …; a8: 315° < θ ≤ 360°.
Figure 12.18(e) shows the results of computing the measure R for six samples of object 1 against
themselves. The entries are values of R and, for example, the notation 1.c refers to the third string from
object class 1. Figure 12.18(f) shows the results of comparing the strings of the second object class
against themselves. Finally, Fig. 12.18(g) shows the R values obtained by comparing strings of one class
against the other. These values of R are significantly smaller than any entry in the two preceding tabu-
lations. This indicates that the R measure achieved a high degree of discrimination between the two
classes of objects. For example, if the class of string 1.a had been unknown, the smallest value of R result-
ing from comparing this string against sample (prototype) strings of class 1 would have been 4.7 [see
Fig. 12.18(e)]. By contrast, the largest value in comparing it against strings of class 2 would have been
1.24 [see Fig. 12.18(g)]. This result would have led to the conclusion that string 1.a is a member of object
class 1. This approach to classification is analogous to the minimum-distance classifier introduced earlier.