Computer and Robot Vision, Volume 1
r(K) = min_{x∈K} max_{y∈K} ‖x − y‖

Since ρ_M(A, A ⊕ K) ≤ min_{x∈K} max_{y∈K} ‖x − y‖ and min_{x∈K} max_{y∈K} ‖x − y‖ = r(K), we have ρ_M(A, A ⊕ K) ≤ r(K). Also, since A • K ⊇ A, ρ_M(A • K, A) = ρ(A • K, A). Since 0 ∈ K, A • K ⊆ A ⊕ K. Hence

ρ_M(A • K, A) = ρ(A • K, A) ≤ ρ[(A • K) ⊕ K, A] = ρ(A ⊕ K, A) ≤ r(K)
It immediately follows that the distance between the minimal and maximal reconstructions, which differ only by a dilation by K, is no greater than the size of the reconstruction structuring element; that is, ρ_M[(F ∩ S) • K, (F ∩ S) ⊕ K] ≤ r(K). When F = F ∘ K = F • K, (F ∩ S) • K ⊆ F ⊆ (F ∩ S) ⊕ K. Since the distance between the minimal and maximal reconstructions is no greater than r(K), it is unsurprising that the distance between F and either of the reconstructions is no greater than r(K). We can reason the following way:
It is readily verified that if A ⊆ B ⊆ C, then (1) ρ_M(A, B) ≤ ρ_M(A, C) and (2) ρ_M(B, C) ≤ ρ_M(A, C).
Now it immediately follows that if F = F ∘ K = F • K, then ρ_M[F, (F ∩ S) ⊕ K] ≤ r(K) and ρ_M[F, (F ∩ S) • K] ≤ r(K).
When the image F is open under K, the distance between F and its sampling F ∩ S can be no greater than r(K). Why? It is certainly the case that F ∩ S ⊆ F ⊆ (F ∩ S) ⊕ K. Hence ρ_M(F, F ∩ S) ≤ ρ_M[F ∩ S, (F ∩ S) ⊕ K] ≤ r(K).
If two sets are both open under the reconstruction structuring element K, then the distance between the sets must be no greater than the distance between their samplings plus the size of K.

Proposition 5.13: If A = A ∘ K and B = B ∘ K, then ρ_M(A, B) ≤ ρ_M(A ∩ S, B ∩ S) + r(K).
Proof:
CHAPTER 6
NEIGHBORHOOD OPERATORS

6.1 Introduction

Figure 6.1 A small and a large square neighborhood that might be used in a neighborhood operator.
centered around (r, c); the set of all neighboring pixel positions whose Euclidean distance to pixel (r, c) is less than or equal to ρ₀(r, c), the distance ρ₀ depending on (r, c); or it might consist of the set of all neighboring pixel positions whose Euclidean distance to pixel (r, c) is less than or equal to ρ₀ and whose row index is less than or equal to r, and so on.
Letting f designate the input image and g the output image, we can write the action of a general nonrecursive neighborhood operator φ as

g(r, c) = φ[f(r′, c′) : (r′, c′) ∈ N(r, c)]

the notation f(r′, c′) : (r′, c′) ∈ N(r, c) meaning a list of the values f(r′, c′) for each of the (r′, c′) ∈ N(r, c) in a preagreed-upon order that the notation itself suppresses.
One common nonrecursive neighborhood operator is the linear operator. The output value at pixel position (r, c) of a linear operator is given as a possibly position-dependent linear combination of all the input pixel values at positions in the neighborhood of (r, c).
The most common nonrecursive neighborhood operator is the kind that is employed uniformly on the image. Its action on identical neighborhood spatial configurations is the same regardless of where on the image the neighborhood is located.
This kind of neighborhood operator is called shift-invariant or position invariant.
The action for a shift-invariant nonrecursive neighborhood operator can be written as

g(r, c) = φ[f(r′, c′) : (r′, c′) ∈ N(r, c)]

where the neighborhood N itself satisfies a shift invariance:

(r′, c′) ∈ N(r, c) implies (r′ − u, c′ − v) ∈ N(r − u, c − v) for all (u, v)
Let u = r′ + r₀ and v = c′ + c₀. Then

g(r − r₀, c − c₀) = φ[f(u − r₀, v − c₀) : (u − r₀, v − c₀) ∈ N(r − r₀, c − c₀)]
                  = φ[f(u − r₀, v − c₀) : (u, v) ∈ N(r, c)]

Now φ[f(u − r₀, v − c₀) : (u, v) ∈ N(r, c)] is the result of the neighborhood operator acting on the translated input image, and g(r − r₀, c − c₀) is the translation of the result of the neighborhood operator acting on the given image. Their equality states that the result of a shift-invariant operator on a translated image is the same as translating the result of the shift-invariant operator on the given image.
Compositions of shift-invariant operators are also shift-invariant. More precisely, suppose g is produced from f by a shift-invariant operator φ₁ with neighborhood N₁ and h is produced from g by a shift-invariant operator φ₂. By the argument above,

g(r′ − r₀, c′ − c₀) = φ₁[f(r″ − r₀, c″ − c₀) : (r″, c″) ∈ N₁(r′, c′)]

Shifting f by (r₀, c₀) produces a g shifted by (r₀, c₀), and the g shifted by (r₀, c₀) produces an h shifted by (r₀, c₀). So shifting f by (r₀, c₀) and then applying φ₁ followed by φ₂ results in an h that has been shifted by (r₀, c₀).
An important kind of shift-invariant neighborhood operator is the linear shift-invariant operator. The output value at pixel position (r, c) of a linear shift-invariant
operator is given as a fixed linear combination of all input pixel values at positions in
the neighborhood of ( r , c). The neighborhood itself is shift invariant, and the weights
of the linear combination are the same from neighborhood to neighborhood. If we
designate by W the neighborhood of all pixel positions around (O,O), then the action
of the linear shift-invariant operator on an image f having domain F can be written as

g(r, c) = Σ_{(r′,c′) ∈ W, (r+r′,c+c′) ∈ F} w(r′, c′) f(r + r′, c + c′)
Figure 6.2 Common 3 × 3 masks used for noise cleaning. The mask in (a) is called the 3 × 3 box filter.

The weight function w is called the kernel or the mask of weights, W is the domain of w, and the operation itself is called the cross-correlation of f with w. We write g = f ⊗ w.
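A minimal sketch of this cross-correlation, assuming a NumPy image array and an odd-sized mask; the function name and the choice to compute output only where the mask fits inside the image are illustrative, not from the text:

```python
import numpy as np

def cross_correlate(f, w):
    """Cross-correlation of image f with mask w:
    g(r,c) = sum over (r',c') in W of w(r',c') * f(r+r', c+c').
    Output is computed only where the mask fits entirely inside f."""
    M, N = w.shape[0] // 2, w.shape[1] // 2          # mask half-sizes
    g = np.zeros_like(f, dtype=float)
    for r in range(M, f.shape[0] - M):
        for c in range(N, f.shape[1] - N):
            # window of f centered at (r,c), weighted by the mask
            g[r, c] = np.sum(w * f[r - M:r + M + 1, c - N:c + N + 1])
    return g

# Example: 3 x 3 box filter mask applied to a small image
f = np.arange(25, dtype=float).reshape(5, 5)
w = np.full((3, 3), 1.0 / 9.0)
print(cross_correlate(f, w))
```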
Figures 6.2 and 6.3 illustrate common linear shift-invariant operators for noise
cleaning. Figure 6.4 shows the indexing of the mask, the indexing of the image,
and the application of the mask to the image. Figure 6.5 shows the application of a
mask with weights to an image.
Figure 6.3 Common 5 × 5 masks used for noise cleaning. The mask in (a) comes from a repeated application of a 3 × 3 box filter; in (b) from the function 8 − (r² + c²); in (c) from (4 choose r+2)(4 choose c+2), where r and c range between −2 and +2 and (u choose v) designates the binomial coefficient u!/[(u − v)! v!].
Figure 6.4 Indexing of the mask (left) and of the image (right), and the computation of the cross-correlation for pixel (2,2) of the image.
Figure 6.5 Application of a mask of weights to pixel (2,2) of an image with extended domain for the cross-correlation operation.
Figure 6.6 A nonsymmetric L-shaped mask and the application of the convolution operator to an image using this mask. The effect is like applying a cross-correlation operator with the L-shaped mask flipped.
(f * w)(r, c) = Σ_{(r′,c′) ∈ W, (r−r′,c−c′) ∈ F} f(r − r′, c − c′) w(r′, c′)
The convolution has the same form as the cross-correlation, but it indexes differently
over the pixels of the image. If the mask is symmetric, convolution and correlation
are the same; otherwise the convolution operation flips the mask. Figure 6.6 illustrates the indexing of a nonsymmetric mask and an image and the operation of the
convolution operator. A more detailed discussion of linear shift-invariant operators
is given in Section 6.4.
Other examples of shift-invariant operators include the gray scale morphological operators of dilation and erosion, whose actions can be written as

d(r, c) = max_{(r′,c′) ∈ W} [f(r − r′, c − c′) + w(r′, c′)]

and

e(r, c) = min_{(r′,c′) ∈ W} [f(r + r′, c + c′) − w(r′, c′)]

respectively. Finally, for our last example we take the multiplicative maximum and minimum shift-invariant operators

m_max(r, c) = max_{(r′,c′) ∈ W} [f(r − r′, c − c′) w(r′, c′)]
m_min(r, c) = min_{(r′,c′) ∈ W} [f(r − r′, c − c′) w(r′, c′)]

defined by Ritter, Shrader-Frechette, and Wilson (1987), Ritter and Wilson (1987), and Wilson and Ritter (1987).
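A sketch of gray scale dilation and erosion under the definitions above; the clipping of the neighborhood at image borders is an assumption the text does not specify:

```python
import numpy as np

def gray_dilate(f, w):
    """d(r,c) = max over (r',c') in W of f(r-r', c-c') + w(r',c')."""
    M, N = w.shape[0] // 2, w.shape[1] // 2
    rows, cols = f.shape
    d = np.full(f.shape, -np.inf)
    for r in range(rows):
        for c in range(cols):
            for i in range(-M, M + 1):
                for j in range(-N, N + 1):
                    if 0 <= r - i < rows and 0 <= c - j < cols:
                        d[r, c] = max(d[r, c], f[r - i, c - j] + w[i + M, j + N])
    return d

def gray_erode(f, w):
    """e(r,c) = min over (r',c') in W of f(r+r', c+c') - w(r',c')."""
    M, N = w.shape[0] // 2, w.shape[1] // 2
    rows, cols = f.shape
    e = np.full(f.shape, np.inf)
    for r in range(rows):
        for c in range(cols):
            for i in range(-M, M + 1):
                for j in range(-N, N + 1):
                    if 0 <= r + i < rows and 0 <= c + j < cols:
                        e[r, c] = min(e[r, c], f[r + i, c + j] - w[i + M, j + N])
    return e
```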
Figure 6.7 Indexing of the pixels in (a) 4-connected and (b) 8-connected neighborhoods of x₀.
(1971) examines symbolic-related neighborhood operators that compute local properties, most of which we do not address here. The symbolic operators are useful for
image segmentation tasks as well as for the labeling of primitives involved in structural image analysis. The common form of the operators suggests the possibility of
a large-scale integration hardware implementation in the VLSI device technology.
The simple operators discussed here may use two basic neighborhoods: a
4-connected neighborhood and an 8-connected neighborhood. As illustrated in
Fig. 6.7, the 4-connected neighborhood around a pixel consists of the pixel and
its north, south, east, and west neighbors. The 8-connected neighborhood around a
pixel consists of all the pixels in a 3 x 3 window whose center is the given pixel.
Recall that recursive neighborhood operators are those for which a previously
generated output may be one of the inputs to the neighborhood. In this manner the
output memory is also part of the input. Nonrecursive neighborhood operators are
those using independent image memory for input and output. Previous outputs cannot influence current outputs. Rosenfeld and Pfaltz (1966) call recursive operators
sequential; nonrecursive operators, parallel.
The region-growing operator uses the primitive function h in the following way. For the operator in the 4-connected mode, let a₀ = x₀. Define aₙ = h(aₙ₋₁, xₙ), n = 1, …, 4. Then the output symbol y is defined by y = a₄. For the operator in the 8-connected mode, let a₀ = x₀. Define aₙ = h(aₙ₋₁, xₙ), n = 1, …, 8. Then the output symbol y is defined by y = a₈. Figure 6.8 shows the operation of the region-growing operator in the 8-connected mode.
The region-growing operator is related to the binary dilation morphological operator with a structuring element that is a 4- or 8-connected neighborhood. However, instead of working on the binary images on which the binary dilation operator operates, the region-growing operator works on labeled regions.

Figure 6.8 Operation of the region-growing operator in the 8-connected mode (one iteration).
The region-shrinking operator uses the primitive function h in the following way. For the operator in the 4-connected mode, let a₀ = x₀. Define aₙ = h(aₙ₋₁, xₙ), n = 1, …, 4. Then the output symbol y is defined by y = a₄. For the operator in the 8-connected mode, let a₀ = x₀. Define aₙ = h(aₙ₋₁, xₙ), n = 1, …, 8. Then the output symbol y is defined by y = a₈. Figure 6.9 illustrates this operation. Region shrinking is related to binary erosion, except that region shrinking operates on labeled regions instead of binary 1s.
A more sophisticated region-shrinking operator shrinks border pixels only if they are connected to enough pixels of unlike regions. In the 8-connected mode it sets aₙ = h(x₀, xₙ), n = 1, …, 8, and defines the output symbol y by

y = 0    if #{n | aₙ = g} > k
  = x₀   otherwise
As mentioned in the section on nearest neighbor sets, to obtain a region shrinking (or region growing) that is close to a Euclidean-distance region shrinking (or growing), the 4-neighborhood and the 8-neighborhood must be used alternately, approximating as closely as possible the ratio √2/1 (Rosenfeld and Pfaltz, 1968). A ratio of 4/3 can be obtained by the sequence 4:3 = <4,8,4,8,4,8,4>, and a ratio of 3/2 can be obtained by the sequence 3:2 = <4,8,4,8,4>. Alternating these two sequences will give a ratio of 7/5, which is just under √2. Using one 4:3 sequence followed by two 3:2 sequences gives a ratio of 10/7, just over √2. Alternating between <4:3, 3:2, 3:2> and <4:3, 3:2> gives a ratio of 17/12, which differs from √2 by less than 2.5 × 10⁻³, an approximation that should be good enough for most purposes.
The choice of 4-neighborhood or 8-neighborhood for the current iteration that best approximates the Euclidean distance can be determined dynamically. Let N4 be the number of uses of the 4-neighborhood so far and N8 be the number of uses of the 8-neighborhood so far. If |N4 − √2(N8 + 1)| < |(N4 + 1) − √2 N8|, then use the 8-neighborhood for the current iteration; otherwise use the 4-neighborhood.
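A sketch of this dynamic rule; the function name is hypothetical, and the rule itself is just the comparison given above:

```python
import math

def neighborhood_schedule(iterations):
    """Choose a 4- or 8-neighborhood for each iteration so that the
    cumulative ratio N4/N8 tracks sqrt(2)."""
    n4, n8, schedule = 0, 0, []
    for _ in range(iterations):
        # Compare how far each choice would leave the counts from the
        # ideal ratio N4 : N8 = sqrt(2) : 1.
        if abs(n4 - math.sqrt(2) * (n8 + 1)) < abs((n4 + 1) - math.sqrt(2) * n8):
            schedule.append(8)
            n8 += 1
        else:
            schedule.append(4)
            n4 += 1
    return schedule

print(neighborhood_schedule(12))   # [4, 8, 4, 8, 4, 4, 8, ...]
```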
h(b, c) = b   if c = b
        = g   if c ≠ b
Its purpose is to classify the way a pixel is connected to its like neighbors. As
shown in Fig. 6.10, there are six values of connectivity, five for border pixels and
one for interior pixels. The border pixels consist of isolated pixels, edge pixels,
connected pixels, branching pixels, and crossing pixels. The connectivity number
operator associates with each pixel a symbol called the connectivity number of the
pixel. The symbol, though a number, is really a label. It has no arithmetic number
properties. The number designates which of the six kinds of connectivity a pixel
has with its like neighbors.
Figure 6.10 Labeling of the '8' pixels of a binary image by connectivity number. Key: 0 isolated, 1 edge, 2 connecting, 3 branching, 4 crossing, 5 interior.
Figure 6.11 Corner neighborhood corresponding to each of the east, north, west, and south neighbors of the center pixel: (a) indexing of pixels in a 3 × 3 neighborhood; (b) corner of x₁; (c) corner of x₂; (d) corner of x₃; and (e) corner of x₄.
For 8-connectivity, a pixel is an interior pixel if its value and that of each of its 8-connected neighbors are the same. Otherwise the 8-connectivity of a pixel is given by the number of times a 4-connected neighbor has a different value and at least one pixel in the corresponding three-pixel neighborhood corner has the same value.
The connectivity operator requires two primitive functions: a function h that can determine whether a three-pixel corner neighborhood is connected in a particular way and a function f that basically counts the number of arguments having a particular value.
For 4-connectivity, the function h of four arguments is defined by

h(b, c, d, e) = q   if c = b and (d ≠ b or e ≠ b)
              = r   if c = b and (d = b and e = b)
              = s   if c ≠ b

The connectivity operator using 4-connectivity is then defined in the following way by letting

a₁ = h(x₀, x₁, x₆, x₂)
a₂ = h(x₀, x₂, x₇, x₃)
a₃ = h(x₀, x₃, x₈, x₄)
a₄ = h(x₀, x₄, x₅, x₁)

The output value y is 5 (interior) if a₁ = a₂ = a₃ = a₄ = r; otherwise y = f(a₁, a₂, a₃, a₄), the number of arguments having value q, so that, for example, y = 1 labels an edge pixel, y = 3 a branching pixel, and y = 4 a crossing pixel.

Figure 6.12 Computation of the Yokoi connectivity number using 4-connectivity.
that are the same as that of the center pixel as one travels around the 8-neighborhood of a pixel.
The Rutovitz connectivity number requires a three-argument primitive function h defined by

h(b, c, d) = 1   if (c = b and d ≠ b) or (c ≠ b and d = b)
           = 0   otherwise

Then set

y = Σ_{n=1}^{8} h(x₀, xₙ, xₙ₊₁)

where x₉ = x₁. The connected shrink operator uses a four-argument primitive function h defined by

h(b, c, d, e) = 1   if c ≠ b and (d = b or e = b)
              = 0   otherwise
Figure 6.13 (a) Input image; (b) connected shrink output image.
Using the indexing convention of Fig. 6.11, we define the connected shrink operator by letting

a₁ = h(x₀, x₁, x₆, x₂)
a₂ = h(x₀, x₂, x₇, x₃)
a₃ = h(x₀, x₃, x₈, x₄)
a₄ = h(x₀, x₄, x₅, x₁)

The output value y is then defined by

y = g    if exactly one of a₁, a₂, a₃, a₄ = 1
  = x₀   otherwise
during the iterative process so that it may be counted before it is made to disappear by the process. A three-dimensional extension to this nonrecursive algorithm can be found in Arcelli and Levialdi (1972). Lobregt, Verbeek, and Groen (1980) discuss a recursive operator for three-dimensional shrinking.
Figure 6.15 Result of one application of the thinning operator using the 4-connectivity deletability condition.
6.2.9 Distance Transformation Operator
In the 8-connected mode, the output symbol y of the first operator is defined by

y = g(x₁, x₂, x₃, x₄, x₀; 1)

In the 4-connected mode, the output symbol y is defined by

y = g(x₁, x₂, x₀; 1)

For the second operator, the primitive function is simply the minimum function; in the 8-connected mode it is applied over all eight neighbors, and in the 4-connected mode over all four, where

g(a₁, …, a_N; d) = min{a₁, …, a_N} + d

Figure 6.16 illustrates the operation of the recursive distance operator (Rosenfeld and Pfaltz, 1968).
It is possible to compute a distance transformation that gives distances closely approximating Euclidean distance. For the first left-right, top-down pass, the output y is given by

y = min{h(x₂, x₃, x₀; d₁), h(x₄, x₅, x₀; d₂)}

where d₁ is the increment for the edge-adjacent neighbors and d₂ the increment for the diagonal neighbors.
Figure 6.16 Operation of the recursive Rosenfeld and Pfaltz distance operator using 4-connectivity.
Verwer (1988) minimizes the maximum relative error and determines that d₁ = 0.9604 and d₂ = 1.3583. For an integer approximation of scaled Euclidean distance, Verwer recommends d₁ = 5 and d₂ = 7 or, for a better approximation, d₁ = 12 and d₂ = 17.
Danielsson (1980) propagates the pair of vertical and horizontal distances and is
able thereby to compute a distance transformation that is almost exactly Euclidean,
a small error occurring only in a few situations.
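A sketch of a two-pass chamfer distance transform using Verwer's integer recommendation d₁ = 5, d₂ = 7; the function, its input convention (0 marks the pixels from which distance is measured), and the border handling are illustrative:

```python
import numpy as np

def chamfer_distance(binary, d1=5, d2=7):
    """Two-pass chamfer distance transform with edge-neighbor weight d1
    and diagonal-neighbor weight d2. binary: 2-D array; zeros are the
    feature pixels (distance 0). Divide the result by d1 to obtain an
    approximation of Euclidean distance."""
    INF = 10 ** 9
    rows, cols = binary.shape
    dist = np.where(binary == 0, 0, INF).astype(np.int64)

    # Forward pass: left-to-right, top-to-bottom, previously scanned neighbors.
    for r in range(rows):
        for c in range(cols):
            for dr, dc, w in ((0, -1, d1), (-1, 0, d1), (-1, -1, d2), (-1, 1, d2)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    dist[r, c] = min(dist[r, c], dist[rr, cc] + w)

    # Backward pass: right-to-left, bottom-to-top.
    for r in range(rows - 1, -1, -1):
        for c in range(cols - 1, -1, -1):
            for dr, dc, w in ((0, 1, d1), (1, 0, d1), (1, 1, d2), (1, -1, d2)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    dist[r, c] = min(dist[r, c], dist[rr, cc] + w)
    return dist
```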
morphologically closed with a disk of radius ρ, then the given connected region will fuse with some other connected region.

To determine the radius of fusion for every binary-1 pixel, we begin by computing the connected components image for the given binary image I. In this way every binary-1 pixel can be labeled with a unique label corresponding to the connected region to which it belongs. The binary-0 pixels are given the background label.

Exactly as we do to determine the nearest neighbor sets, we perform a region growing to label every binary-0 pixel with the label of its nearest labeled pixel.
One iteration of a shrink operation on this image can label with a 0 all border
pixels (pixels that have a neighbor with a different label). We use this image as the
input to the distance transformation operator that labels nonzero every pixel with
its distance to the nearest border. Then we mask this distance transformation image
with the original image. Pixels having a nonbackground label on the original image
get labeled with the distances associated with their spatial position on the distance
transformation image. Then we give each pixel in a labeled region the minimum
distance a pixel in the region has received. Pixels that are not labeled retain the
background label.
The radius of fusion as defined here differs from the radius as defined by Rosenfeld and Pfaltz (1968). There the radius of fusion for a pixel is the smallest integer n such that after region-growing n iterations and region-shrinking n + 1 iterations, the pixel retains a nonbackground label. The difficulty with this definition for radius of fusion is that by using 4-neighborhoods, for example, it is possible for a pair or triangle of pixels never to fuse. The radius of fusion is therefore not defined. Defining it by some large number in such cases is artificial.
and their reachability regions are more complex than simple 3 × 3 operators precisely because a relative extremum can be a region of pixels larger than 3 × 3.

In this section we discuss operators for the identification of nonminima and nonmaxima, for the identification of relative minima and maxima, and for the identification of the region of pixels reachable by an ascending or descending path from relative extrema regions.
y = 0 (flat)          if a₄ = x₀ = b₄
  = 1 (nonmaximum)    if a₄ = x₀ < b₄
  = 2 (nonminimum)    if a₄ < x₀ = b₄
  = 3 (transition)    if a₄ < x₀ < b₄

where a₄ and b₄ are the running minimum and maximum of the neighborhood values.
Figure 6.18 Example of how a pixel can be its neighborhood maximum yet not be a part of any relative maximum. In its 8-neighborhood, the central 3 is a maximum, yet the flat of 3s it belongs to is a transition region.
In the 8-connected version the corresponding conditions use a₈ and b₈; in particular, the output is 3 (transition) if a₈ < x₀ < b₈.
Figure 6.19 Pixel designations for the recursive operators that require forward (left-right, top-bottom) and reverse scans of a numeric image and that recursively produce an output image: (a) and (c), input numeric images; (b) and (d), output images.
The relative maxima operator uses two primitive functions h and max. The
four-argument function h selects the maximum of its last two arguments if its second
argument is greater than or equal to its first argument. Otherwise it selects the third
argument. Hence
h(b, c, d, e) = max(d, e)   if c ≥ b
              = d           if c < b
The primitive function max selects the maximum of its arguments. In the 8-connected mode, the relative maxima operator lets

a₀ = l₀   and   aₙ = h(x₀, xₙ, aₙ₋₁, lₙ),   n = 1, 2, 3, and 4

The output value l is defined by l = a₄. In the 4-connected mode, the operator sets

a₀ = l₀   and   aₙ = h(x₀, xₙ, aₙ₋₁, lₙ),   n = 1 and 2

The output value l is defined by l = a₂. Figure 6.20 illustrates the application of the relative maxima operator.
The relative minima operator is defined much like the relative maxima operator, with the max function replaced by the min function and all inequalities changed. Hence, for the relative minima operator, h is defined by

h(b, c, d, e) = min(d, e)   if c ≤ b
              = d           if c > b
Table 6.1 Propagation table for the recursive relative extrema operator. The table gives the propagation label c for any pair a, b of neighboring pixels.
2. Either L_a is a minimum (1) and L_b is a maximum (2) or vice versa. In this case, since a region that is constant in value cannot simultaneously be a minimum and a maximum, we mark both pixels as transitions (3).
3. Either L_a or L_b is a transition (3). In this case, since a region that is constant in value and has one pixel marked as transition must be a transition region, we mark the nontransition-region pixel as a transition region, thereby propagating the transition label.
This propagation rule requires one four-argument primitive function h, defined by

h(a, b, x, y) = 3 (transition)    if x = y and [(a = 3) or (b = 3) or (a = 1 and b = 0) or (a = 0 and b = 1)]
              = 2 (flat)          if x = y and (a = 2 and b = 2)
              = 1 (nonminimum)    if x = y and [(a = 1 and b = 2) or (a = 2 and b = 1) or (a = 1 and b = 1)]
The operator uses the primitive function h in the 8-connected mode by letting a₀ = l₀ and defining aₙ = h(aₙ₋₁, lₙ, x₀, xₙ), n = 1, 2, 3, and 4. The output label l is defined by l = a₄.
The 4-connected mode sets a₀ = l₀ and defines aₙ = h(aₙ₋₁, lₙ, x₀, xₙ), n = 1 and 2. The output label l is defined by l = a₂.
The ascending reachability operator is defined just as the descending reachability operator is, except that the inequalities are changed. Hence, for the ascending reachability operator, the primitive function h is defined by

h(a, b, x, y) = b   if a = g and x > y
              = …   if a ≠ g, b ≠ g, a ≠ b, and x > y
w * v = v * w
(f * w) * v = f * (w * v)
(f + g) * w = f * w + g * w
(αf) * w = α(f * w) for any constant α
(w * v)~ = w̃ * ṽ, where f̃ designates the reflection of f: f̃(r, c) = f(−r, −c)
By definition,

[(f * w) * v](r, c) = Σ_{(a,b) ∈ V} (f * w)(r − a, c − b) v(a, b)
1. [(f + g) * w](r, c) = (f * w)(r, c) + (g * w)(r, c)
2. [(αf) * w](r, c) = α(f * w)(r, c) for any constant α
6. f ⊗ w = f * w̃

Proof: By definition,

(f ⊗ w)(r, c) = Σ_{(a,b) ∈ W, (r+a,c+b) ∈ F} f(r + a, c + b) w(a, b)

Making a change of variables, we let a′ = −a and b′ = −b. Then

(f ⊗ w)(r, c) = Σ_{(−a′,−b′) ∈ W, (r−a′,c−b′) ∈ F} f(r − a′, c − b′) w(−a′, −b′)
             = Σ_{(a′,b′) ∈ W̃, (r−a′,c−b′) ∈ F} f(r − a′, c − b′) w̃(a′, b′)
             = (f * w̃)(r, c)
Using the associative property for convolution and the relation f ⊗ w = f * w̃ between convolution and cross-correlation, we can determine whether cross-correlation is associative. It is not. However, it has a close analog. Consider

(f ⊗ w) ⊗ v = (f ⊗ w) * ṽ
            = (f * w̃) * ṽ
            = f * (w̃ * ṽ)
            = f * (w * v)~
            = f ⊗ (w * v)
            = f ⊗ (w ⊗ ṽ)
The result is just the kernel of the convolution, which suggests naming w the impulse response function or the point spread function.
6.4.2
Separability
When the weight matrix w of a linear shift-invariant operator has a product decomposition, considerable savings can be obtained in the computation of the resultant
image by performing the computation in a separable way. If the domain of w is
a (2M+ 1) x (2N + 1) neighborhood, then a straightforward computation would
require (2M + 1) x (2N + 1) multiplications and-additions for each output value
generated. When w has a product decomposition, it may be written as

w(r′, c′) = u(r′) v(c′)

as illustrated in Fig. 6.22. In this case we can structure the computation by computing first along the columns for each row of the image and then along the rows

Figure 6.22 A separable mask w that can be decomposed into the two masks u and v as shown.
for each column of the image resulting from the first computation:

g(r, c) = Σ_{(r′,c′) ∈ W, (r+r′,c+c′) ∈ F} f(r + r′, c + c′) w(r′, c′)
        = Σ_{r′=−M}^{M} u(r′) [ Σ_{c′=−N}^{N} v(c′) f(r + r′, c + c′) ]

This reduces the work to (2M + 1) + (2N + 1) multiplications and additions per output value. More generally, suppose the kernel k is a sum of separable kernels:

k(r, c) = Σ_{m=1}^{M} k_m(r, c)

Then the convolution of f with k has an implementation as the sum of M separable convolutions, since convolution is linear:

(f * k)(r, c) = Σ_{m=1}^{M} (f * k_m)(r, c)
Such a decomposition can be computed by singular-value decomposition (Rao and Mitra, 1971), subroutines for which are available in most libraries of linear algebra routines.
The singular-value decomposition of an I × J matrix K is given by

K = UΛV

where U is an I × I orthonormal matrix, Λ is an I × J matrix all of whose nonzero entries are on the diagonal, and V is a J × J orthonormal matrix. The diagonal entries of Λ are nonnegative, and we assume that they are arranged in order of decreasing value.
Examining the case I < J (the case I ≥ J can be examined in an exactly symmetric manner), we let the I columns of the matrix U be u₁, …, u_I, each column u_i being an I × 1 vector, and the J rows of V be v₁, …, v_J. Letting the rth component of u_i be u_i(r) and the cth component of v_j be v_j(c), we can express any kernel k defined on an I × J neighborhood as

k(r, c) = Σ_{i=1}^{I} λ_i u_i(r) v_i(c)

where λ_i is the ith diagonal entry of Λ.
Exercises

6.1. Suppose g(r, c) = φ[f(r + r′, c + c′) : (r′, c′) ∈ W]. Show that φ is a shift-invariant operator.
6.2. Suppose g(r, c) = φ[f(r − r′, c − c′) : (r′, c′) ∈ W]. Show that φ is a shift-invariant operator.
6.3. Programming neighborhood operators to execute efficiently on images is important. Many neighborhood operators can be expressed in the iterative form, where
CHAPTER 7
CONDITIONING AND LABELING

7.1 Introduction
Conditioning and labeling are among the most common uses of neighborhood operators. Conditioning is based on a model that suggests the observed image is composed of an informative pattern modified by uninteresting variations that typically add to or multiply the informative pattern. Conditioning estimates the informative pattern on the basis of the observed image. Thus conditioning suppresses noise, which can be thought of as random, unpatterned variations affecting all measurements. Conditioning can also perform background normalization by suppressing uninteresting systematic or patterned variations. Conditioning is typically applied uniformly and is usually context independent.
Labeling is based on a model that suggests the informative pattern has structure as a spatial arrangement of events, each spatial event being a set of connected pixels. Labeling determines in what kinds of spatial events each pixel participates. For example, if the interesting spatial events of the informative pattern are only those of high-valued and low-valued pixels, then the thresholding operation can be considered a labeling operation. Other kinds of labeling operations include edge detection, corner finding, and identification of pixels that participate in various shape primitives.
This chapter is organized by application. We discuss neighborhood operators
for the conditioning operations of noise cleaning and sharpening and for the labeling
operations of edge and line detection.
7.2 Noise Cleaning
Noise cleaning is very often one of the first operators applied to an image in a computer vision task. Noise cleaning uses neighborhood spatial coherence and neighborhood pixel value homogeneity as its basis.
of coherence and either replace the incoherent pixel value by something more spatially coherent by using some or all of the pixels in a neighborhood containing the
given pixel or they average or smooth the pixel value with others in an appropriate
neighborhood. Mastin (1985) reviews a number of approaches. Typical masks for
the 3 x 3 and 5 x 5 neighborhoods are shown in Figs. 6.2 and 6.3 of the preceding
chapter.
The operator that computes the equally weighted average is called the box filter operator. The operator is important because of the ease with which it can be made to execute quickly. Not only is the operator separable, but by implementing it as a recursive operator, any sized neighborhood box filter can be implemented by using just a constant five operations per pixel, independent of neighborhood size. The operations are two additions, two subtractions, and one division (McDonnell, 1981).

The box filter operator with a (2M + 1) × (2N + 1) neighborhood smoothing image f is defined by

k(r, c) = [1/((2M + 1)(2N + 1))] Σ_{r′=−M}^{M} Σ_{c′=−N}^{N} f(r + r′, c + c′)
The recursive calculation then proceeds in the following way. Suppose row r has just been processed and we are beginning to process row r + 1. Then the partial row sum function h(r + 1, c), defined by

h(r + 1, c) = Σ_{c′=−N}^{N} f(r + 1, c + c′)

is computed recursively by

h(r + 1, c) = h(r + 1, c − 1) + f(r + 1, c + N) − f(r + 1, c − N − 1)

This requires two operations per pixel. The partial row sums

h(r − M, c), h(r + 1 − M, c), …, h(r + 1 + M, c)

for the previous 2M + 1 rows as well as for the current row must be kept available. Then the partial column sum function g(r + 1, c), defined by

g(r + 1, c) = Σ_{r′=−M}^{M} h(r + 1 + r′, c)

for row r + 1, is recursively computed by

g(r + 1, c) = g(r, c) + h(r + 1 + M, c) − h(r − M, c)
Again this takes two more operations per pixel. Finally, the box filter output is calculated by

k(r, c) = g(r, c) / [(2M + 1)(2N + 1)]

at a cost of one division per pixel.
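A sketch of this recursive box filter; the clipping of windows at the image borders is an added convention, so border outputs are approximate while interior outputs match the full average:

```python
import numpy as np

def box_filter(f, M, N):
    """Box filter via running sums: a constant number of operations per
    pixel, independent of the neighborhood size."""
    rows, cols = f.shape
    # h[r, c] = sum of f[r, c-N .. c+N]   (partial row sums, clipped)
    h = np.zeros((rows, cols))
    for r in range(rows):
        s = f[r, 0:min(N + 1, cols)].sum()
        for c in range(cols):
            h[r, c] = s
            if c + N + 1 < cols:
                s += f[r, c + N + 1]      # one addition
            if c - N >= 0:
                s -= f[r, c - N]          # one subtraction
    # g[r, c] = sum of h[r-M .. r+M, c]   (partial column sums, clipped)
    g = np.zeros((rows, cols))
    s = h[0:min(M + 1, rows), :].sum(axis=0)
    for r in range(rows):
        g[r, :] = s
        if r + M + 1 < rows:
            s = s + h[r + M + 1, :]
        if r - M >= 0:
            s = s - h[r - M, :]
    return g / ((2 * M + 1) * (2 * N + 1))   # one division per pixel
```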
A common linear smoother is the Gaussian filter. Here the weight matrix w is given by

w(r, c) = (1/k) exp[−(r² + c²)/(2σ²)]

where the normalizing constant is

k = Σ_{(r,c) ∈ W} exp[−(r² + c²)/(2σ²)]

The neighborhood W must be big enough to allow the pixels on the periphery of the neighborhood to be a distance of two or three σ from the center. The two-dimensional filter can be written as a product of two one-dimensional Gaussians, which makes it possible to implement the Gaussian filter efficiently and separably, first convolving a one-dimensional Gaussian in the row direction and then convolving a one-dimensional Gaussian in the column direction. That is, if

w(r, c) = w₁(r) w₂(c)

then I * w = (I * w₁) * w₂.
The application of linear noise-cleaning filters has the effect of defocusing the
image. Edges become blurred. Narrow lines become attenuated. All other things
being equal, the larger the neighborhood of the filter, the greater the defocusing
action. The proper determination of neighborhood size and mask values depends on
the correct balance between the better job of noise removal that large neighborhood
operators can provide and the loss of detail that this entails. Determination of the
correct balance depends on having a model that relates the observed neighborhood
pixel values to the true underlying neighborhood pixel values and the effect the noise
has on the latter. We discuss this relationship from a statistical point of view.
f(c) = cᵀΣc + λ(cᵀ1 − 1)

Taking partial derivatives of f with respect to each component of c and using the fact that Σ is symmetric, we find that the vector of partial derivatives of f with respect to the components of c is given by

∂f/∂c = 2Σc + λ1

We set this equal to 0 and solve for c in terms of the Lagrange multiplier λ. Hence

c = −(λ/2) Σ⁻¹1
Substituting this value of c into the constraint cᵀ1 = 1, we may solve for λ:

cᵀ1 = (−λ/2) 1ᵀ(Σ⁻¹)ᵀ1

Since Σ = Σᵀ, we must also have Σ⁻¹ = (Σᵀ)⁻¹ = (Σ⁻¹)ᵀ. Hence cᵀ1 = (−λ/2) 1ᵀΣ⁻¹1 = 1, so that

λ = −2 / (1ᵀΣ⁻¹1)

Therefore

c = Σ⁻¹1 / (1ᵀΣ⁻¹1)
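As an illustration, these minimum-variance weights can be computed numerically; the α and β values below are hypothetical, chosen only to show the shape of the solution:

```python
import numpy as np

# Minimum-variance noise-cleaning weights: minimize c' Sigma c subject
# to c'1 = 1, giving c = Sigma^{-1} 1 / (1' Sigma^{-1} 1). Shown for
# the uncorrelated 3x3 model of Example 7.1 with assumed alpha, beta.
alpha, beta = 2.0, 4.0
variances = np.array([beta, alpha, beta,
                      alpha, 1.0, alpha,
                      beta, alpha, beta])   # row-major 3x3 pixel order
Sigma = np.diag(variances)                  # uncorrelated noise
ones = np.ones(9)
c = np.linalg.solve(Sigma, ones)
c /= ones @ c
print(c.reshape(3, 3))   # the center pixel gets the largest weight
```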
EXAMPLE 7.1
Consider the case when the neighborhood is 3 × 3 and the random noise is uncorrelated. To account for the reality that the greater the distance between a given pixel and the center pixel of the neighborhood, the greater its deviation from the expected value, we take the variance of the north, south, east, and west nearest neighbors of the center pixel to be α, α ≥ 1, times the variance of the center pixel, and the variance of the diagonal neighbors to be β, β ≥ α, times the variance of the center pixel. Figure 7.1(a) and (b) illustrates these conventions.
Carrying out the computation of c = Σ⁻¹1/(1ᵀΣ⁻¹1) for this covariance matrix, there results the mask shown in Fig. 7.1(c). Figure 7.2 shows particular 3 × 3 linear operators and their associated values of α and β.
EXAMPLE 7.2
We again consider a 3 × 3 neighborhood but assume that the noise is correlated in such a way that adjacent pixels have correlation ρ, pixels a distance n apart have correlation ρⁿ, n > 1, and all pixel values have the same variance.
The covariance matrix is then Σ = σ²P, where the (i, j) entry of the 9 × 9 matrix P is ρ raised to the distance between the ith and jth pixels of the 3 × 3 neighborhood, so that the entries of P range over 1, ρ, ρ², and ρ³. Carrying out the computation of Σ⁻¹, whose entries are built from 1, −ρ, ρ², 1 + ρ², and −ρ(1 + ρ²), and forming c = Σ⁻¹1/(1ᵀΣ⁻¹1) yields the weight mask. Figure 7.3 shows this result as well as some particular 3 × 3 linear operators with their associated values of ρ.
Figure 7.3 (a) General mask for the equal-variance correlated case; (b) mask for ρ = 0; (c) mask for ρ = −1/2; (d) mask for ρ = 1/2; (e) mask for ρ = 1; and (f) mask for ρ = −1.
To illustrate this, consider Fig. 7.5, which shows the upper half of a covariance
matrix associated with a 4 x 4 neighborhood. No two entries are identical. Table 7.1
lists all pairs of pixel locations that stand in the same translational relationship. When
the image second-order statistics are stationary, then the covariance matrix takes the
form shown in Fig. 7.6.
When the second-order statistics are stationary and isotropic, all pairs of pixels
with the same distance between them have the same covariance. In this case the
covariance can be written as a function

v[(i − m)² + (j − n)²]

when the distance involved is the Euclidean distance.
when the distance involved is the Euclidean distance. Table 7.2 lists all pairs of
pixel locations in the 4 x 4 neighborhood standing in the same Euclidean distance
relationship. The resulting isotropic covariance matrix is shown in Fig. 7.7.
For example, all the pixel location pairs under column c are related by three rows down and one column across.

Let S denote the symmetric 4 × 4 Toeplitz matrix

S = ( a  b  c  d
      b  a  b  c
      c  b  a  b
      d  c  b  a )
Then the isotropic covariance matrix of Fig. 7.7 can be written as the partitioned matrix

Σ = ( v₁S  v₂S  v₃S  v₄S
      v₂S  v₁S  v₂S  v₃S
      v₃S  v₂S  v₁S  v₂S
      v₄S  v₃S  v₂S  v₁S )

where v₁, v₂, v₃, and v₄ are scalars. The composite matrix Σ can then be expressed as a Kronecker product Σ = V ⊗ S. Empirical experiments with image data indicate that actual covariance matrices do not often deviate too far from this form. The inverse of a matrix that can be represented as a Kronecker product is the Kronecker product of the inverses. Hence Σ⁻¹ = V⁻¹ ⊗ S⁻¹.
For vectors u and v, the Kronecker product stacks scaled copies of v:

u ⊗ v = ( u₁v
          u₂v
          ⋮
          u_Jv )

and for conformable matrices and vectors

(A ⊗ B)(u ⊗ v) = (Au) ⊗ (Bv)

This makes the matrix Σ⁻¹ easy to calculate.
The resulting weight vector c = Σ⁻¹1/(1ᵀΣ⁻¹1) then has a Kronecker product form, which means that the weight mask can be written as a product of a row function times a column function, thereby permitting fast implementation as a separable computation.
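A quick numerical check of these Kronecker identities; the matrices below are random and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
S = np.eye(4) + 0.25 * rng.standard_normal((4, 4))
S = S @ S.T                                # symmetric positive definite block
V = np.eye(4) + 0.25 * rng.standard_normal((4, 4))
V = V @ V.T

Sigma = np.kron(V, S)                      # 16x16 composite covariance
inv1 = np.linalg.inv(Sigma)
inv2 = np.kron(np.linalg.inv(V), np.linalg.inv(S))
print(np.allclose(inv1, inv2))             # True: inverse of Kronecker product

u, v = rng.standard_normal(4), rng.standard_normal(4)
print(np.allclose(np.kron(V, S) @ np.kron(u, v),
                  np.kron(V @ u, S @ v)))  # True: (A x B)(u x v) = Au x Bv
```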
Afriat (1954), Andrews and Kane (1970), Andrews and Caspari (1971), and Haralick, Griswold, and Kattiyakulwanich (1975) give more details about the algebra associated with the separable implementation.
7.2.3 Outlier or Peak Noise
In outlier or peak noise, the value of a pixel is simply replaced by a random noise
value typically having little to do with the underlying pixel's value. This can happen when there is a transmission error, when there is a bad spot on a vidicon or CCD array, when external noise affects the analog-to-digital conversion process, or when something small and irrelevant occludes the object of interest. A neighborhood operator can be employed to detect and reduce such noise. The size of the
neighborhood must be larger than the size of the noise structure and smaller than
the size of the details to be preserved. The basis of the operator is to compare an
estimate of the representative neighborhood value with the observed value of the
neighborhood's center pixel. The center pixel value is an outlier if this difference
is too large.
Let x₁, …, x_N denote all the pixel values in the neighborhood with the exception of the center pixel value. Such a neighborhood is called the center-deleted neighborhood. Let y denote the center pixel value. Let μ̂ be the estimate of the representative value of the center-deleted neighborhood. We take μ̂ to be the value that minimizes the sum of the squared differences between μ̂ and x₁, …, x_N. That is, μ̂ minimizes Σ_{n=1}^{N} (xₙ − μ̂)². The minimizing μ̂ is quickly found to be the mean of the center-deleted neighborhood: μ̂ = (1/N) Σ_{n=1}^{N} xₙ. If y is reasonably close to μ̂, we can infer that y is not an outlier value.
The output value z of the outlier-removal neighborhood operator is defined by

z = y   if |y − μ̂| < θ
  = μ̂   otherwise          (7.5)
Using |y − μ̂| as the test statistic poses some problems. If θ is too small, the edges will be blurred. If θ is too large, the noise cleaning will not be good.

If y is statistically significantly different from μ̂, then we can infer that y is an outlier value. Statistical significance implies a comparison of y − μ̂ to the center-deleted neighborhood variance σ̂², which is defined by σ̂² = [1/(N − 1)] Σ_{n=1}^{N} (xₙ − μ̂)². This suggests using the test statistic

t = |y − μ̂| / σ̂

and rejecting the hypothesis of significant difference when t is small enough, small being smaller than a threshold θ. The output value with a contrast-dependent threshold
is then

z = y   if |y − μ̂|/σ̂ < θ
  = μ̂   otherwise

This operator replaces a pixel value with the neighborhood mean if the pixel's value is sufficiently far from the neighborhood mean.
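A sketch of the contrast-dependent-threshold outlier removal on 3 × 3 neighborhoods, per the test statistic t above; the function name and border handling are illustrative:

```python
import numpy as np

def remove_outliers(f, theta):
    """Replace the center value by the center-deleted-neighborhood mean
    when |y - mu| / sigma exceeds the threshold theta."""
    g = f.astype(float).copy()
    rows, cols = f.shape
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = f[r - 1:r + 2, c - 1:c + 2].astype(float)
            x = np.delete(window.ravel(), 4)      # drop the center pixel
            mu = x.mean()                          # center-deleted mean
            sigma = x.std(ddof=1)                  # center-deleted std. dev.
            y = float(f[r, c])
            if sigma > 0 and abs(y - mu) / sigma >= theta:
                g[r, c] = mu                       # y judged an outlier
    return g
```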
Instead of making the replacement occur completely or not at all, we can define a smooth replacement. The output value z is a convex combination of the input pixel value y and the neighborhood mean, where the weights of the convex combination depend on |y − μ̂|/σ̂:
7.2.4
k-Nearest Neighbor
Davis and Rosenfeld (1978) suggest a k-nearest neighbor averaging technique for noise cleaning. All the pixel values in the neighborhood are compared with the central pixel value. The k closest pixel values are determined. The k-nearest neighbor average is then the equally weighted average of the k nearest neighbors. For 3 × 3 neighborhoods Davis and Rosenfeld found k = 6 to perform better than other values of k. They also found that iterating the operator for three or four iterations produced better results than just one application.
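A sketch of the k-nearest neighbor average on 3 × 3 neighborhoods; including the center value itself (at distance zero) among the candidates is one possible reading of the description:

```python
import numpy as np

def knn_average(f, k=6):
    """Average each pixel with the k neighborhood values closest to the
    central value (k = 6 recommended for 3x3 neighborhoods)."""
    g = f.astype(float).copy()
    rows, cols = f.shape
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = f[r - 1:r + 2, c - 1:c + 2].astype(float).ravel()
            center = window[4]
            order = np.argsort(np.abs(window - center))
            g[r, c] = window[order[:k]].mean()
    return g
```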
The noise-cleaned output ẑ is given by a convex combination of the pixel neighborhood values:

ẑ = Σ_{n=1}^{N} wₙ xₙ          (7.10)
7.2.5 Median Operator

The most common order-statistic operator used for noise cleaning is the median. For K × K neighborhoods where K is odd, N = K × K will also be odd, and the median is given by

z_median = x₍(N+1)/2₎          (7.11)
Gallagher and Wise (1981) showed that repeatedly applying a median filtering operation to a one-dimensional signal must produce a result that is unchanged by further median filtering operations. The fixed-point result of a median filter is called the median root. Median roots consist only of constant-valued neighborhoods and sloped edges.
Narendra (1981) discusses an implementation of a separable median filter. For an N × N separable median filter, the image is first filtered with an N × 1 median filter and then a 1 × N median filter. This implementation reduces the computational complexity of the median filter. Narendra demonstrates that although the output of the separable median filter is not identical to the full two-dimensional median filter, the performance is sufficiently close.
Nodes and Gallagher (1983) show that it is almost always the case that repeated application of the separable two-dimensional median filter results in a two-dimensional median root.
Fitch, Coyle, and Gallagher (1985) suggest that the neighborhood size and
shape of a median filter be chosen so that the underlying noise-free image will be a
median root for this filtering operation with the given neighborhood size and shape.
The running median does a good job of estimating the true pixel values in
situations where the underlying neighborhood trend is flat or monotonic and the
noise distribution has fat tails, such as double-sided exponential or Laplacian noise.
Thus it does well in situations where the neighborhood is homogeneous or when
the central feature in the neighborhood is a step edge.
The running median is effective for removing impulsive noise. However, when
the neighborhood contains fine detail such as thin lines, they are distorted or lost.
Corners can be clipped. Also it sometimes produces regions of constant or nearly
constant values that are perceived as patches, streaks, or amorphous blotches. This
can give the image a mottled appearance. Such artifacts may suggest boundaries
that really do not exist. Bovik (1987) gives an analysis of this effect.
Brownrigg (1984) defines a weighted-median filter. The weights are given by a weight matrix w defined on domain W = {(i, j) | −N ≤ i ≤ N, −N ≤ j ≤ N} for an odd-sized (2N + 1) × (2N + 1) neighborhood. The sum L of the weights is required to be odd:

L = Σ_{i=−N}^{N} Σ_{j=−N}^{N} w(i, j)

Let x(i, j), (i, j) ∈ W, denote the value of the (i, j)th pixel in a neighborhood. The weighted median first forms a list y₁, …, y_L of values. For each (i, j) ∈ W, x(i, j) is put into the list w(i, j) times. The weighted median is then given by the median of the values in the list:

z_weighted median = y₍(L+1)/2₎          (7.12)
Brownrigg finds that the weights with the smallest sum that satisfy the requirements are given by the weight matrix whose center weight has value 3 and whose other weights have value 1. The operation of the weighted median is illustrated in Fig. 7.8.
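A sketch of the weighted median using Brownrigg's center-weight-3 matrix; the streak configuration mirrors the kind shown in Fig. 7.8:

```python
import numpy as np

def weighted_median(window, weights):
    """Each value x(i,j) enters the sort list w(i,j) times; the output
    is the median of the list. The weight sum must be odd."""
    values = np.repeat(window.ravel(), weights.ravel())
    return np.sort(values)[len(values) // 2]

# Brownrigg's smallest satisfying weights: center 3, others 1 (L = 11).
w = np.array([[1, 1, 1],
              [1, 3, 1],
              [1, 1, 1]])
window = np.array([[2, 9, 2],
                   [2, 9, 2],       # one-pixel-wide high-valued streak
                   [2, 9, 2]])
print(weighted_median(window, w))   # 2: the streak is eliminated
```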
The running-median operator can be put into a context-dependent threshold form (Scollar, Weidner, and Huang, 1984). Define the interquartile distance Q by

Q = x₍₃N/₄₎ − x₍N/₄₎

where truncation is used for those cases that are not evenly divisible. The output value z of the median with contrast-dependent threshold is given by

z = y          if |y − z_median| < cQ
  = z_median   otherwise          (7.13)

for a suitable constant c.
Arce and McLaughlin (1987), Nieminen and Neuvo (1988), and Nieminen, Heinonen,
and Neuvo (1987) discuss other variations of the running-median operator.
7.2.6 Trimmed-Mean Operator

Another common order-statistic operator that works well when the noise distribution has a fat tail is the trimmed-mean operator (Bednar and Watt, 1984). Here the first k and last k order statistics are not used, and an equally weighted average of the central N − 2k order statistics is used:

z_trimmed mean = [1/(N − 2k)] Σ_{n=k+1}^{N−k} x₍ₙ₎

When α = k/N, the result is called the α-trimmed mean.
Figure 7.8 How the weighted-median operator works. (a) Weight matrix; (b) and
(c) two configurations for which the one-pixel-wide, high-valued streak is to be
eliminated; (d) configuration for which the block of 2s is to be preserved. Using
the weight matrix (a), configuration (b), (c), and (d) produce the same list of
sorted values (e) whose median value is 2.
Lee and Kassam (1985) indicate that when the noise has a double-exponential distribution, a value for α around 0.4 gives the minimum variance among all α-trimmed mean estimators. Peterson, Lee, and Kassam (1988) show that the trimmed-mean filter can perform better than the linear box filter as well as the median filter in white noise suppression, while at the same time having midrange edge-preserving characteristics comparable to those of the median.
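A sketch of the α-trimmed mean; the clamping of k is an added safeguard, not from the text:

```python
import numpy as np

def alpha_trimmed_mean(window, alpha):
    """Drop the first k and last k order statistics, k = round(alpha*N),
    and average the rest. alpha = 0 gives the mean; as alpha approaches
    0.5 the result approaches the median."""
    x = np.sort(np.asarray(window, dtype=float).ravel())
    N = x.size
    k = int(round(alpha * N))
    k = min(k, (N - 1) // 2)          # keep at least one value
    return x[k:N - k].mean()

window = [3, 5, 4, 4, 100, 5, 3, 4, 5]    # one impulsive outlier
print(alpha_trimmed_mean(window, 0.0))    # ~14.8: the mean is pulled up
print(alpha_trimmed_mean(window, 0.4))    # 4.0: the median, outlier ignored
```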
7.2.7 Midrange

When the noise distribution has tails that are light and smooth, the midrange estimator can be more efficient than the mean (Rider, 1957; Arce and Fontana, 1988). It is defined by

z_midrange = ½[x₍₁₎ + x₍N₎]          (7.15)
Typically the density P(xₙ | μ) has a functional form that depends only on (μ − xₙ)². In this case we seek the estimator μ̂ that minimizes
If we take the prior P(μ) to be uniform and the loss function to be the win-and-lose-all loss function (infinite gain, that is, negative infinite loss, when μ̂ = μ, and zero loss when μ̂ ≠ μ), then the estimator μ̂ must maximize

Σ_{n=1}^{N} log p[(μ̂ − xₙ)²]
For small y the resulting weight is just smaller than 1, and for large y it approaches 0. Values far from the estimate are weighted less and less. A two-term Taylor series expansion gives the weight the form

w = [1 − u²]²   for u² < 1
  = 0           otherwise

where u = (xᵢ − μ̂)/(cS). Tukey usually takes S to be the sample median absolute deviation and the constant c to be 6 or 9. Thus xᵢ that are more than four standard deviations away from the estimate μ̂ will get weight zero (c = 9).
There are other kinds of loss functions than the win-and-lose-all kind. Another common one is the squared loss function L(μ̂, μ) = (μ̂ − μ)². Assuming again a uniform prior, we seek to find a μ̂ that minimizes the expected loss; the result is the posterior mean

μ̂ = [∫ μ Π_{n=1}^{N} p[(μ − xₙ)²] dμ] / [∫ Π_{n=1}^{N} p[(μ − xₙ)²] dμ]          (7.16)
We then have a weighted-average form, where w₀ = w_{N+1} = 0. Reflecting on the meaning of this equation, we see that as xₙ gets farther away from the center of the distribution, wₙ will become smaller.
the state is UP and f(n + 1) ≤ f(n), let k₀ be the smallest positive integer for which f(n + k₀) < f(n + k₀ + 1) and f(n + k₀) < f(n + k₀ − 1). This makes n + k₀ the location of the next relative minimum. If the state is DOWN and f(n + 1) > f(n), let j₀ be the smallest positive integer for which f(n + j₀) > f(n + j₀ + 1) and f(n + j₀) > f(n + j₀ − 1). This makes n + j₀ the location of the next relative maximum.

Now if the state is UP and if f(n + 1) ≤ f(n) and f(n) − f(n + k₀) < h, indicating that the relative minimum is a minor fluctuation, then the output value remains the same: g(n + 1) = g(n), and the state remains UP. However, if f(n + 1) ≤ f(n) and f(n) − f(n + k₀) ≥ h, indicating that the relative minimum is significant, then the output value follows the input, g(n + 1) = f(n + 1), and the state changes to DOWN.

Finally, if the state is DOWN and if f(n + 1) > f(n) and f(n + j₀) − f(n) < h, indicating that the relative maximum is a minor fluctuation, then the output value remains the same: g(n + 1) = g(n), and the state remains DOWN. But if f(n + 1) > f(n) and f(n + j₀) − f(n) ≥ h, indicating that the relative maximum is significant, then the output value follows the input, g(n + 1) = f(n + 1), and the state changes to UP. Tables 7.3 and 7.4 summarize the next state/output table for the state machine.

Ehrich suggests that the hysteresis smoothing be applied twice to the input image data: once in a left-to-right or forward order, and then in a right-to-left or reversed order. The output of the symmetric hysteresis smoother is the average of the two results.
7.2.9 Sigma Filter
The sigma filter (Lee, 1983b) is another way in which a pixel can be averaged with its neighbors that are close in value to it. Lee suggests looking at all the values in the neighborhood of the given pixel value y and averaging y only with those values that are within a two-sigma interval of y. If x₁, …, x_N are the neighboring values, then the estimate ŷ is determined by

ŷ = (1/#M) Σ_{n ∈ M} xₙ,   where M = {n | y − 2σ ≤ xₙ ≤ y + 2σ}

When the number of neighboring pixels in the two-sigma range is too small, Lee recommends taking ŷ as the average of all eight immediate neighbors. For a 7 × 7 neighborhood, Lee suggests that too small is less than four.
Experiments done by Lee (1983b) indicate that the performance of the sigma
filter is better than that of the gradient inverse weighted filter, the median, and the
selected-neighborhood average filter.
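A sketch of the sigma filter on 3 × 3 neighborhoods; Lee's experiments used larger windows such as 7 × 7, so the window size here is for brevity only:

```python
import numpy as np

def sigma_filter(f, sigma, min_count=4):
    """Average each pixel with the neighborhood values within 2*sigma
    of it; if too few qualify, fall back to the average of the eight
    immediate neighbors."""
    g = f.astype(float).copy()
    rows, cols = f.shape
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            y = float(f[r, c])
            window = f[r - 1:r + 2, c - 1:c + 2].astype(float).ravel()
            close = window[np.abs(window - y) <= 2 * sigma]
            if close.size >= min_count:
                g[r, c] = close.mean()
            else:
                g[r, c] = (window.sum() - y) / 8.0   # eight neighbors
    return g
```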
Table 7.3 Input to the state machine as a function of the input f and the previously computed output g.
The selected-neighborhood averaging approach takes the point of view that the pixels to be averaged should be ones that form a tight spatial configuration as well as ones that are homogeneous and related to the given pixel. To understand selected-neighborhood averaging, we must make a change in point of view from the central neighborhood around a given pixel to a collection of all neighborhoods that contain the given pixel. For example, there are nine 3 × 3 neighborhoods that contain a given pixel. Neighborhoods in the collection, however, are not required to be square. Some may be square, some rectangular, and some diagonal. Nagao and Matsuyama use pentagonal and hexagonal neighborhoods. Graham uses a three-pixel vertical or horizontal neighborhood. Newman and Dirilten suggest a five-pixel strip neighborhood oriented orthogonally to the gradient direction.
Each neighborhood that contains the given pixel has a mean and variance. One
of the neighborhoods that contains the given pixel will have the lowest variance.
The noise-filtered value is computed as the mean value from the lowest variance
neighborhood containing the given pixel.
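A sketch of selected-neighborhood averaging restricted, for simplicity, to the nine 3 × 3 neighborhoods containing each pixel; the pentagonal and hexagonal variants are omitted:

```python
import numpy as np

def selected_neighborhood_average(f):
    """For each pixel, examine the 3x3 neighborhoods that contain it
    and output the mean of the one with the lowest variance."""
    rows, cols = f.shape
    g = f.astype(float).copy()
    for r in range(rows):
        for c in range(cols):
            best_var, best_mean = None, None
            # (cr, cc) ranges over valid centers of 3x3 windows containing (r,c)
            for cr in range(max(1, r - 1), min(rows - 1, r + 2)):
                for cc in range(max(1, c - 1), min(cols - 1, c + 2)):
                    win = f[cr - 1:cr + 2, cc - 1:cc + 2].astype(float)
                    v = win.var()
                    if best_var is None or v < best_var:
                        best_var, best_mean = v, win.mean()
            if best_mean is not None:
                g[r, c] = best_mean
    return g
```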
Table 7.4 Next state and output of the state machine as a function of the current state (UP or DOWN) and the input.
To determine the minimizing α and β, we take partial derivatives of ε² with respect to α and β, set them to zero, and solve the resulting system of equations.
Setting ∂ε²/∂α = 0 and solving yields

Ŷ = μ_Y + [σ_Y² / (σ_Y² + σ²)](Z − μ_Y)

making the estimate Ŷ a convex combination of the observed pixel value Z and the mean μ_Y.

To use this estimate, σ_Y², μ_Y, and σ² all need to be known. In the case of stationary non-signal-dependent noise, the noise variance σ² can be regarded as constant over the whole image. It is reasonable to take it to be known. However, μ_Y and σ_Y² change over the image, and it is not reasonable to regard them as known. They may be estimated as the mean and variance of the central neighborhood around the given pixel. If x₁, …, x_N denote the values in the neighborhood, then

μ̂ = (1/N) Σ_{n=1}^{N} xₙ   and   σ̂² = (1/N) Σ_{n=1}^{N} (xₙ − μ̂)²

are the estimated neighborhood mean and variance. We take these to be the true mean and variance: μ_Y = μ̂ and σ_Y² = σ̂².
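A sketch of the additive-noise estimator on 3 × 3 neighborhoods. Subtracting the noise variance from the observed neighborhood variance to estimate σ_Y² is a common refinement and an assumption here, not the text's exact choice:

```python
import numpy as np

def mmse_additive(f, noise_var):
    """Yhat = mu + (var_Y / (var_Y + noise_var)) * (Z - mu), with mu
    and var_Y estimated from the local neighborhood."""
    rows, cols = f.shape
    g = f.astype(float).copy()
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            win = f[r - 1:r + 2, c - 1:c + 2].astype(float)
            mu = win.mean()
            var_y = max(win.var() - noise_var, 0.0)   # signal-variance estimate
            gain = var_y / (var_y + noise_var)
            g[r, c] = mu + gain * (f[r, c] - mu)
    return g
```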
Not all noise is additive. Under low light conditions, the noise has more of a
Poisson character, which is better modeled by multiplicative noise. Here the observed Z can be represented as
Taking the partial derivatives of ε² with respect to α and β and setting them to zero yields the minimizing α and β. As in the additive-noise case, the required values for μ_Y and σ_Y² are obtained as the neighborhood mean and variance. Lee (1980) makes a linearizing assumption, gets a result close to the one given here, and shows its application to radar imagery (Lee, 1981). Kuan et al. (1985) give the formulation expressed here.
7.3 Sharpening
The simplest neighborhood operator method for sharpening or crispening an image is to subtract from each pixel some fraction of the neighborhood mean and then scale the result. If x₁, …, x_N are all the values in the neighborhood, including the value y of the center pixel, and μ̂ = (1/N) Σ_{n=1}^{N} xₙ is the neighborhood mean, the sharpened output is

s = c(y − k μ̂)

for a scaling constant c and fraction k.
is unreliable, replace it with the neighborhood median. On the other hand, if the
value of the center pixel is close to the neighborhood median, then to bring out
or sharpen the spatial detail of which it is part, replace its value with the median
sharpened value. Thus we have a neighborhood operator defined by
Wallis (1976) suggested a form of unsharp masking that adjusts the neighborhood brightness and contrast toward given desired values. The Wallis neighborhood operator takes the form (this is the standard Wallis formulation; the book's exact display is not recoverable here)

g(r, c) = [f(r, c) − μ̂] (A σ_d)/(A σ̂ + σ_d) + [α μ_d + (1 − α) μ̂]

where μ̂ and σ̂ are the neighborhood mean and standard deviation, μ_d is the desired neighborhood mean, σ_d² is the desired neighborhood variance, A is a gain or contrast-expansion constant governing the degree to which the neighborhood variance is adjusted toward the desired variance, and α is a constant governing the degree to which the neighborhood mean is adjusted toward the desired mean. Reasonable values for A range from 3 to 25 and for α from 0 to 0.4. For this operator to be effective, the neighborhood length N must be an appreciable fraction of the image side length.

The Wallis operator can be put in the median mode as well by replacing μ̂ with z_median, σ̂ with the observed interquartile range Q, and σ_d with the desired interquartile range Q_d.
Let

z_min = min{f(r + r′, c + c′) : (r′, c′) ∈ W}

and

z_max = max{f(r + r′, c + c′) : (r′, c′) ∈ W}

Then the extremum-sharpened output is whichever of z_min and z_max is closer to the input value:

z = z_min   if f(r, c) − z_min < z_max − f(r, c)
  = z_max   otherwise
7.4 Edge Detection

What is an edge in a digital image? The first intuitive notion is that a digital edge is the boundary between two pixels that appears when their brightness values are significantly different. Here "significantly different" may depend on the distribution of brightness values around each pixel.

On an image we often point to a region and say it is brighter than its surrounding area. We might then say that an edge exists between each pair of neighboring pixels where one pixel is inside the brighter region and the other is outside. Such edges are referred to as step edges. Hence we use the word edge to refer to a place in the image where brightness value appears to jump. Jumps in brightness value are associated with bright values of first derivative and are the kinds of edges that Roberts (1965) originally detected.
One clear way to interpret jumps in value when referring to a discrete array of values is to assume that the array comes about by sampling a real-valued function f defined on the domain of the image that is a bounded and connected subset of the real plane R². The finite difference typically used in the numerical approximation of first-order derivatives is usually based on the assumption that the function f is linear. From this point of view, the jumps in value really must refer to points of high first derivative of f. Edge detection must then involve fitting a function to the sample values. Prewitt (1970), Hueckel (1973), Brooks (1978), Haralick (1980), Haralick and Watson (1981), Morgenthaler and Rosenfeld (1981), Zucker and Hummel (1979), and Morgenthaler (1981b) all use the surface fit concept in determining edges. Roberts (1965) and Sobel (1970) explain edge detectors in terms of fitting as well. In this section we review other approaches to edge detection. We begin with some of the basic gradient edge operators, which measure a quantity related to edge contrast or gradient as well as edge or gradient direction. We also discuss zero-crossing edge operators, the performance characterization of edge operators, and some line detectors.
7.4.1 Gradient Edge Operators
One of the early edge operators was employed by Roberts (1965). He used two 2 × 2 masks to calculate the gradient across the edge in two diagonal directions (Fig. 7.21). Letting r₁ be the value calculated from the first mask and r₂ the value calculated from the second mask, he obtained that the gradient magnitude is √(r₁² + r₂²).

Prewitt (1970) used two 3 × 3 masks oriented in the row and column directions (Fig. 7.22). Letting p₁ be the value calculated from the first mask and p₂ the value calculated from the second mask, she obtained that the gradient magnitude g is √(p₁² + p₂²) and the gradient direction θ, taken as a clockwise angle with respect to the column axis, is arctan(p₁/p₂).

Sobel (1970) used two 3 × 3 masks oriented in the row and column directions (Fig. 7.23). Letting s₁ be the value calculated from the first mask and s₂ the value calculated from the second mask, he obtained that the gradient magnitude g is √(s₁² + s₂²) and the gradient direction θ, taken as a clockwise angle with respect to the column axis, is arctan(s₁/s₂).
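A sketch of the Sobel gradient computation; the exact sign conventions of the masks in Fig. 7.23 are assumed:

```python
import numpy as np

def sobel_gradient(f):
    """Sobel row and column differences, gradient magnitude, and
    gradient direction (clockwise angle from the column axis)."""
    m1 = np.array([[-1, -2, -1],
                   [ 0,  0,  0],
                   [ 1,  2,  1]], dtype=float)    # row-difference mask
    m2 = m1.T                                      # column-difference mask
    rows, cols = f.shape
    mag = np.zeros((rows, cols))
    theta = np.zeros((rows, cols))
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            win = f[r - 1:r + 2, c - 1:c + 2].astype(float)
            s1 = np.sum(m1 * win)
            s2 = np.sum(m2 * win)
            mag[r, c] = np.hypot(s1, s2)           # sqrt(s1^2 + s2^2)
            theta[r, c] = np.degrees(np.arctan2(s1, s2))
    return mag, theta
```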
Suppose that the intensities in a neighborhood follow a simple linear relationship I(r, c) = αr + βc + γ (Fig. 7.29a) and that the masks used for the vertical and horizontal differences are those shown in Fig. 7.29(b). For such a linear gray level intensity surface, the gradient magnitude is √(α² + β²), and the gradient direction θ satisfies tan θ = α/β. The row and column differences computed by this edge operator are
Hence for this operator to compute gradient magnitude correctly, a and b must satisfy 2(2a + b) = 1. The gradient direction θ computed by the operator satisfies
Figure 7.28 Lineal gray level spatial patterns detected by the compass edge operator, which gives maximal response for edges in the indicated direction. The white boxes show pixels with low values, and the gray boxes show pixels with high values. Both edge contour directions (0° through 150°) and gradient directions (0° through 330°) are indicated.
When σ₁² = σ₂², a = 1/6, and since 2(2a + b) = 1, then b = 1/6. The result is a multiple of the Prewitt operator. When σ₁² = 2σ₂², a = 1/8, and since 2(2a + b) = 1, b = 1/4. The result is a multiple of the Sobel operator. Hence we see that the choice of the Prewitt, Sobel, or Frei and Chen masks for edge detection should not be by a flip of the coin but should be based on a noise model. In other words, selection of a particular operator in effect commits us to a particular noise model.
Figure 7.29 (a) Gray level pattern for a linear gray level intensity surface; (b) 3 × 3 masks to compute differences in the row and column directions.

To determine the properties of an edge operator for edge direction and contrast,
we must assume an appropriate edge model. The model we choose is the step edge.
We assume that a straight step edge passes directly through the center point of the
center pixel of the 3 x 3 neighborhood. All points on the bright side of the edge
have the same value H, and all points on the dark side of the edge boundary have
the same value L. Hence for pixels on or near the edge boundary, some of the
pixel's area will have high values and some low values. We assume a model in
which each pixel value is the convex combination of the high and low values, where
the coefficients of the convex combination are just the areas of high and low values
and where the areas of pixels are unit squares.
Using this edge model, we find from Fig. 7.30 that when the edge direction θ satisfies 0 ≤ tan θ ≤ 1,
Figure 7.30 Edge boundary passing through the center of the 3 × 3 neighborhood and the areas of each pixel on each side of the edge boundary.
344
:;
4
2'
4
{
346
Table 7.6
Legal consistent orientation pairs for edge pixels detected by a compass edge
operator.
Two of the common 3 × 3 masks employed to calculate the digital Laplacian are shown in Fig. 7.33. It is easy to verify that if I(r,c) = k₁ + k₂r + k₃c + k₄r² + k₅rc + k₆c², then the 3 × 3 values of I are as given in Fig. 7.34, and that each of the masks of Fig. 7.33 produces the correct value of the Laplacian of I, which is 2k₄ + 2k₆.
The general pattern for the computation of an isotropic digital Laplacian is shown in Fig. 7.35. If we multiply the weights of Fig. 7.35 with the values of Fig. 7.34 and then add, the result must be 2k₄ + 2k₆. This implies that 2a + b = 1. It is easy to see from the equation relating the k₁ term that e = −(4a + 4b).
Figure 7.32 (a) One-dimensional step edge; (b) its first derivative; (c) its second derivative.
The various 3 x 3 masks that correctly compute the digital Laplacian have
different performance characteristics under noise. Suppose that the values in a local
3 x 3 neighborhood can be modeled by
Figure 7.35 General pattern for a 3 × 3 mask computing a digital Laplacian. The constraints are that e = −(4a + 4b) and that 2a + b = 1.
where ξ(r,c) is independent noise having mean 0 and variance σ²(r,c). If the noise variance is constant, then the variance V of the digital Laplacian will be V = σ²[4a² + 4b² + (4a + 4b)²]. The values of a and b that minimize the variance of the Laplacian can then be determined easily: minimize σ²[4a² + 4b² + (4a + 4b)²] subject to 2a + b = 1.
Using the Lagrange multiplier solution technique, let ε² = σ²[4a² + 4b² + (4a + 4b)²] + λ(2a + b − 1). Setting each of the partial derivatives to zero and solving for a and b yields a = 2/3 and b = −1/3. Since e = −(4a + 4b), e = −4/3. The resulting mask is shown in Fig. 7.36. Of course different noise models will necessitate different weights for the digital Laplacian mask.
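A small numeric check (an illustration, not from the book) of the minimum-variance weights: substitute b = 1 − 2a into the variance expression and minimize over a on a grid.

```python
import numpy as np

a = np.linspace(-2.0, 2.0, 400001)
b = 1.0 - 2.0 * a                               # enforce 2a + b = 1
variance = 4*a**2 + 4*b**2 + (4*a + 4*b)**2     # V / sigma^2

i = np.argmin(variance)
print(a[i], b[i], -(4*a[i] + 4*b[i]))           # approx 2/3, -1/3, -4/3
```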
As we have seen with the previous edge operators, the differencing entailed by
taking first or second derivatives needs to be stabilized by some kind of smoothing
or averaging. Marr and Hildreth (1980) suggest using a Gaussian smoother. The
resulting operator is called the Laplacian of Gaussian zero-crossing edge detector.
Since convolution is an associative and commutative operation, smoothing an
image by convolving it with a Gaussian kernel and then taking its Laplacian by
convolving the result with a Laplacian kernel is exactly the same as taking the
Laplacian of the Gaussian kernel (LOG) and convolving the image with it. The
Laplacian of the Gaussian kernel is given by
∇²G(r,c) = [(r² + c² − 2σ²)/σ⁴] exp[−(r² + c²)/(2σ²)],
a center-surround kernel whose zero crossing falls on the circle of radius √2 σ. In actual practice, since only a zero crossing is being looked for, the Laplacian of the Gaussian kernel is multiplied by some constant, and the resulting values are quantized to integers, with some care being taken to do the quantization so that the sum of the positive entries equals the absolute value of the sum of the negative entries. One way of accomplishing this is to quantize the scaled kernel values over the window r, c = −N, . . . , N, with an additive constant A chosen to be just less than the largest value of A that would make LOG(N,N) = 0. Figure 7.37 shows a Laplacian of the Gaussian kernel suitable for an 11 × 11 mask (σ = 1.4).
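A minimal sketch of building such an integer-quantized LoG mask. As a simplification (an assumption, not the book's exact procedure with the constant A), the positive/negative balance is obtained here by nudging the center entry:

```python
import numpy as np

def log_kernel(size=11, sigma=1.4, scale=40.0):
    n = size // 2
    r, c = np.mgrid[-n:n + 1, -n:n + 1].astype(float)
    rho2 = r**2 + c**2
    log = (rho2 - 2.0 * sigma**2) / sigma**4 * np.exp(-rho2 / (2.0 * sigma**2))
    k = np.rint(scale * log).astype(int)   # quantize to integers
    k[n, n] -= k.sum()                     # balance positive and negative sums
    return k

kernel = log_kernel()
print(kernel.sum())    # 0: positive entries balance the negative entries
```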
From our discussion of variance of the 3 x 3 digital Laplacian, it is clear that the
masks of Fig. 7.33 for computing the digital Laplacian are not necessarily optimal.
To determine the optimal mask values, a noise model will have to be assumed.
If this noise model is independent, identically distributed noise, then the Gaussian
smoothing will introduce spatial correlation, which will then have to be appropriately
taken into account. It appears that this kind of minimum-variance optimization for
smoothed images has not been utilized, although it is not difficult to do.
Once the image is convolved with the Laplacian of the Gaussian kernel, the zero crossings can be detected in the following way: A pixel is declared to have a zero crossing if it is less than −t and one of its eight neighbors is greater than t, or if it is greater than t and one of its eight neighbors is less than −t, for some fixed threshold t. Figure 7.38 shows the image of Fig. 7.31 processed with a Laplacian of Gaussian kernel of σ = 1.4 and then with a zero-crossing detection having a threshold t = 1.
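A direct transcription, as a sketch, of the zero-crossing rule just stated:

```python
import numpy as np

def zero_crossings(log_image, t=1.0):
    """Mark pixels satisfying the +/- t eight-neighbor zero-crossing rule."""
    R, C = log_image.shape
    out = np.zeros((R, C), dtype=bool)
    for r in range(1, R - 1):
        for c in range(1, C - 1):
            nbrs = log_image[r - 1:r + 2, c - 1:c + 2]
            v = log_image[r, c]
            if v < -t and nbrs.max() > t:     # below -t with a neighbor above t
                out[r, c] = True
            elif v > t and nbrs.min() < -t:   # above t with a neighbor below -t
                out[r, c] = True
    return out
```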
even when the value of the directional derivative is high, the edge detection should
be suppressed. This edge operator has come to be known as the Canny operator.
Its various implementations differ in the details of the establishment of the gradient direction, the suppression of nonedgelike neighborhoods, and the directional
differencing.
Figure 7.39 Template masks for a compass line detector having four orientation directions (0°, 45°, 90°, and 135°).
Line Detection
A line segment on an image can be characterized as an elongated rectangular region
having a homogeneous gray level bounded on both its longer sides by homogeneous
regions of a different gray level. For a dark line the gray levels of the side regions have higher values than the center elongated region containing the dark line. For a bright line the gray levels of the side regions have lower values than the center elongated region containing the bright line. Different line segments may
differ in width. The width along the same line segment may also vary. A general
line detector should be able to detect lines in a range of widths.
One-pixel-wide lines can be detected by compass line detectors, as shown in
Fig. 7.39. Vanderburg suggests a semilinear line detector created by a step edge on either side of the line. Using the template masks shown in Fig. 7.40, he calculated the line strength s at a pixel position as
s = maxᵢ (aᵢ + bᵢ)
in a direction θ defined by θ = 45° · i₀, where i₀ is the index for which a_{i₀} + b_{i₀} = s. His qualitative experiments indicated that the semilinear line detector performs slightly better than the linear line detector of Fig. 7.39.
Figure 7.40 Template masks for a semilinear compass line detector having four orientations.
Figure 7.41 Template masks for a line detector capable of detecting lines of one to three pixels in width.
For lines that have a width greater than one pixel, the template masks of Figs. 7.39 and 7.40 are not satisfactory. For lines two pixels in width, the line detector of Fig. 7.39 produces half the response it would for one-pixel-wide lines. For lines three pixels or more in width, it produces no response. For lines two pixels in width, the semilinear detector of Fig. 7.40 fails when t > 0. And if t = 0, then it will produce many false detections. One possible way of handling a variety of line widths is to condition the image with a Gaussian smoothing, using a standard deviation equal to the largest-width line that the detector is required to detect. The smoothing has the effect of changing the gray level intensity profile across a wide line from constant to convex downward, which makes it possible for the simple line detectors of Figs. 7.39 and 7.40 to work.
Another way to handle wider lines is to use a greater sampling interval, as
suggested by Paton (1979). The template masks compare gray level values from
the center of the line to values at a distance of two to three pixels away from
the center line. Template masks for eight directions are shown in Fig. 7.41. As
long as the regions at the sides of the line are larger than two pixels in width, the
technique will work for lines of one to three pixels in width. Larger-width lines can
be accommodated by even longer interval spacings.
7.2. Generate and examine the appearance of the following noisy images obtained by independently distorting each pixel of an image of a real scene by the following methods:
a. Adding Gaussian noise with standard deviation from 1 to 21 by steps of 4.
b. Adding replacement noise with replacement fractions p = 0.001, 0.002, 0.005, 0.01, 0.02, 0.05. (Adding replacement noise means choosing a fraction p of pixels of the image at random and replacing their values with random values within the range of the image values.)
c. Distorting the image with multiplicative noise by multiplying each pixel value by a uniform random variable in the range [0.8, 1.2].
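A sketch of the three distortions in this exercise; the helper names and the fixed random seed are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian(image, sigma):
    return image + rng.normal(0.0, sigma, image.shape)

def add_replacement(image, p):
    out = image.copy()
    mask = rng.random(image.shape) < p            # fraction p of the pixels
    lo, hi = image.min(), image.max()
    out[mask] = rng.uniform(lo, hi, mask.sum())   # random in-range values
    return out

def add_multiplicative(image, lo=0.8, hi=1.2):
    return image * rng.uniform(lo, hi, image.shape)
```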
CHAPTER 8
THE FACET MODEL
8.1 Introduction
The facet model principle states that the image can be thought of as an underlying continuum or piecewise continuous gray level intensity surface. The observed
digital image is a noisy, discretized sampling of a distorted version of this surface.
Processing of the digital image for conditioning or labeling must first be defined
in terms of what the conditioning or labeling means with respect to the underlying
gray level intensity surface. To actually carry out the processing with the observed
digital image requires both a model that describes what the general form of the surface would be in the neighborhood of any pixel if there were no noise and a model of what any noise and distortion, such as defocusing or monotonic gray level transformation, does to the assumed form. On the basis of the general form, processing
proceeds by implicitly or explicitly estimating the free parameters of the general
form for each neighborhood and then calculating the appropriate conditioning or
labeling values on the basis of the definitions relative to the underlying gray level
intensity surface. Graham (1962) and Prewitt (1970) were the first to adopt this
point of view.
The commonly used general forms for the facet model include piecewise constant (flat facet model), piecewise linear (sloped facet model), piecewise quadratic,
and piecewise cubic. In the flat model, each ideal region in the image is constant in
gray level. In the sloped model, each ideal region has a gray level surface that is a
sloped plane. Similarly, in the quadratic and cubic models, regions have gray level
surfaces that are bivariate quadratic and cubic surfaces, respectively.
Given a noisy, defocused image and assuming one of these models, we must
first estimate both the parameters of the underlying surface for a given neighborhood and the variance of the noise. We can then use these estimates in a variety
of ways, including edge detection, line detection, corner detection, and noise filtering, to accomplish labeling and conditioning. In Section 8.2 we illustrate the use of the facet model principle on a simple labeling application.
Relative Maxima
To illustrate the facet model principle, we consider a simple labeling application, which is to detect and locate all relative maxima, to subpixel accuracy, from a one-dimensional observation sequence f₁, f₂, . . . , f_N taken on successive equally spaced points a unit distance apart. Relative maxima are defined to occur at points for which the first derivative is zero and the second derivative is negative.
To find the relative maxima, we can least-squares fit a quadratic function âm² + b̂m + ĉ, −k ≤ m ≤ k, to each group of 2k + 1 successive observations, taking the origin of each fit to be the position of the middle observation in each group of 2k + 1. Then we analytically determine whether the fitted quadratic has a relative maximum close enough to the origin.
The squared fitting error ε²ₙ for the nth group of 2k + 1 observations can be expressed by
ε²ₙ = Σ_{m=−k}^{k} (âm² + b̂m + ĉ − f_{n+m})².
Taking partial derivatives of ε²ₙ with respect to the free parameters â, b̂, and ĉ and setting them to zero results in the normal equations,
from which â, b̂, and ĉ can be solved. The fitted quadratic âm² + b̂m + ĉ has first derivative 2âm + b̂ and second derivative 2â. Hence it has a relative maximum at m₀ = −b̂/(2â) provided â < 0, and the center point of the group is labeled a relative maximum when the location m₀ is close enough to the origin (|m₀| ≤ 1/2).
To see how well this algorithm will perform under conditions of additive independent Gaussian noise, suppose that the observed fₙ can be modeled by fₙ = gₙ + ξₙ, where gₙ is the true underlying value of the nth point and satisfies g_{n+m} = am² + bm + c, and the ξₙ are independent Gaussian noise with mean 0 and variance σ². For k = 1, for example, â = (f_{n−1} − 2fₙ + f_{n+1})/2, so that
V[â] = ¼(σ² + 4σ² + σ²) = (3/2)σ².   (8.9)
Hence â and b̂ are uncorrelated, and since each is a normally distributed variate, they are statistically independent.
Having determined that â is normally distributed with mean a and known variance σ_â², we can ask questions relating to the probability of missing a peak. One such question has the form: What is the probability that the estimated â is greater than zero, given the value of a?
Prob(â > 0 | a) = ∫₀^∞ [1/(√(2π) σ_â)] exp[−(x − a)²/(2σ_â²)] dx.
Our answer indicates that when a is negative but close to zero (e.g., −σ_â < a < 0), the chance that the estimated â will be greater than zero is significant, and therefore a maximum could be missed.
In a similar manner, we can ask questions relating to the probability of declaring a false peak. One such question has the form: What is the probability that the estimated â is less than zero, given the value a? When a is positive but close to zero, the chance that the estimated â will be less than zero is significant, and therefore a false maximum could occur. To limit the probability of false maxima, this answer suggests that instead of only requiring â < 0, we can require that â < a₀ for a given negative a₀. By doing so, we can limit the probability that a false maximum can occur.
In general, for the relative maximum labeling procedure "label the pixel as having a relative maximum if â < a₀ and |−b̂/(2â)| ≤ 1/2," the knowledge that â and b̂ have known normal distributions can be used to determine or bound the misdetection rate and the false-detection rate performance characteristics.
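A minimal sketch of this labeling procedure, assuming a least-squares quadratic fit to each group of 2k + 1 samples; the threshold a₀ defaults to 0 (the plain â < 0 test):

```python
import numpy as np

def relative_maxima(f, k=1, a0=0.0):
    f = np.asarray(f, dtype=float)
    m = np.arange(-k, k + 1)
    # design matrix for the quadratic a*m^2 + b*m + c
    X = np.stack([m**2, m, np.ones_like(m)], axis=1).astype(float)
    peaks = []
    for n in range(k, len(f) - k):
        a, b, c = np.linalg.lstsq(X, f[n - k:n + k + 1], rcond=None)[0]
        if a < a0:
            m0 = -b / (2.0 * a)          # vertex of the fitted quadratic
            if abs(m0) <= 0.5:           # close enough to the group center
                peaks.append(n + m0)     # subpixel peak location
    return peaks

print(relative_maxima([0, 1, 4, 1, 0], k=1))   # one peak, at index 2.0
```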
g(r,c) = αr + βc + γ + η(r,c),   (8.15)
where η is a random variable indexed on R × C that represents noise. We will assume that η is noise having mean 0 and variance σ² and that the noise for any two pixels is independent.
The least-squares procedure determines an α̂, β̂, and γ̂ that minimize the sum of the squared differences between the fitted surface and the observed one:
ε² = Σ_{r∈R} Σ_{c∈C} [g(r,c) − (α̂r + β̂c + γ̂)]².
Without loss of generality, we choose our coordinate system R × C so that the center of the neighborhood R × C has coordinates (0,0). When the number of rows and columns is odd, the center pixel of the neighborhood has coordinates (0,0). When the number of rows and columns is even, there is no pixel in the center, but the point where the corners of the four central pixels meet has coordinates (0,0). In this case pixel centers will have coordinates of an integer plus a half.
The symmetry in the chosen coordinate system leads to
Σ_{r∈R} r = 0 and Σ_{c∈C} c = 0.
Hence, taking partial derivatives of ε² with respect to α̂, β̂, and γ̂ and setting them to zero, we obtain
α̂ = Σ_r Σ_c r g(r,c) / Σ_r Σ_c r²,
β̂ = Σ_r Σ_c c g(r,c) / Σ_r Σ_c c²,
γ̂ = Σ_r Σ_c g(r,c) / Σ_r Σ_c 1.   (8.19)
Replacing g(r,c) by αr + βc + γ + η(r,c) and simplifying the equations will allow us to see explicitly the dependence of α̂, β̂, and γ̂ on the noise. We obtain
α̂ = α + Σ_r Σ_c r η(r,c) / Σ_r Σ_c r²,
β̂ = β + Σ_r Σ_c c η(r,c) / Σ_r Σ_c c²,
γ̂ = γ + Σ_r Σ_c η(r,c) / Σ_r Σ_c 1.   (8.20)
From this it is apparent that α̂, β̂, and γ̂ are unbiased estimators for α, β, and γ, respectively, and have variances
V[α̂] = σ² / Σ_r Σ_c r²,  V[β̂] = σ² / Σ_r Σ_c c²,  V[γ̂] = σ² / Σ_r Σ_c 1.
Normally distributed noise implies that α̂, β̂, and γ̂ are normally distributed. The independence of the noise implies that α̂, β̂, and γ̂ are uncorrelated, and because they are normal, they are independent. Now the squared residual fitting error is
ε² = Σ_r Σ_c {(α̂r + β̂c + γ̂) − [αr + βc + γ + η(r,c)]}²
   = Σ_r Σ_c [(α̂ − α)r + (β̂ − β)c + (γ̂ − γ) − η(r,c)]².
Expanding the square and using the relations of Eq. (8.20) to substitute into the cross terms, we obtain after simplification
ε² = Σ_r Σ_c η²(r,c) − (α̂ − α)² Σ_r Σ_c r² − (β̂ − β)² Σ_r Σ_c c² − (γ̂ − γ)² Σ_r Σ_c 1.   (8.24)
Since the η(r,c) are independently distributed normal random variables with mean 0 and variance σ², the quantity Σ_r Σ_c η²(r,c)/σ² is distributed as a chi-squared variate with Σ_r Σ_c 1 degrees of freedom, and
[(α̂ − α)² Σ_r Σ_c r² + (β̂ − β)² Σ_r Σ_c c² + (γ̂ − γ)² Σ_r Σ_c 1] / σ²
is distributed as a chi-squared variate with three degrees of freedom. Therefore ε²/σ² is distributed as a chi-squared variate with Σ_r Σ_c 1 − 3 degrees of freedom.
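A sketch (ours) of the sloped facet fit of Eqs. (8.19)-(8.20); because the coordinate system is centered on the neighborhood, the three estimates decouple:

```python
import numpy as np

def sloped_facet_fit(block):
    """Fit g(r,c) ~ alpha*r + beta*c + gamma over a centered neighborhood."""
    R, C = block.shape
    r = np.arange(R) - (R - 1) / 2.0        # centered row coordinates
    c = np.arange(C) - (C - 1) / 2.0        # centered column coordinates
    rr, cc = np.meshgrid(r, c, indexing="ij")
    alpha = (rr * block).sum() / (rr**2).sum()
    beta = (cc * block).sum() / (cc**2).sum()
    gamma = block.mean()
    resid = block - (alpha * rr + beta * cc + gamma)
    eps2 = (resid**2).sum()                 # squared residual fitting error
    return alpha, beta, gamma, eps2

# a perfect sloped plane fits exactly
rr, cc = np.meshgrid(np.arange(5) - 2, np.arange(5) - 2, indexing="ij")
print(sloped_facet_fit(3.0 * rr - 1.0 * cc + 7.0))   # (3.0, -1.0, 7.0, ~0)
```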
differs from those of the neighborhood pixels. In order to measure the difference between a pixel and its neighbors, we need to estimate local gray level statistics in the neighborhood in terms of the sloped facet model and afterward compare those statistics with the current pixel gray level. We should note here that peak noise is judged not from the univariate marginal distribution of the gray level intensities in the neighborhood but from their spatial distribution. Figure 8.1 illustrates an example of the spatial distribution dependency of peak noise. In Fig. 8.1(a) and (b) the central pixel has the same gray level "5," and the neighborhood is composed of four values {1, 2, 3, 4}. However, the peakedness of gray level intensity is different between them. It is difficult to judge that the center pixel in part (b) is peak noise, whereas it is easier in part (a). This indicates that the gray level spatial statistics are important.
Let N be a set of neighborhood pixels that does not contain the center pixel. Note that the use of the deleted neighborhood makes this facet approach different from that used earlier. Let n be the number of pixels in the neighborhood N. By choosing this neighborhood, we may estimate the difference between the observed value of the center pixel's gray level intensity and the value estimated from the neighboring pixels according to the sloped facet model (Yasuoka and Haralick, 1983).
Figure 8.1 Spatial distribution dependency of peak noise.
Proceeding as before, we determine that the sloped facet model with the center-deleted neighborhood N = R × C − {(0,0)} is given by
g(r,c) = αr + βc + γ + η(r,c)  for (r,c) ∈ N,
and the least-squares estimates α̂, β̂, and γ̂ are given by the same normal equations as before, with the sums taken over N.
At the center pixel of the neighborhood, the fitted value is γ̂ and the observed value is g(0,0). Under the hypothesis that g(0,0) is not peak noise, g(0,0) − γ̂ has a Gaussian distribution with mean 0 and variance σ²(1 + 1/n). Hence
[g(0,0) − γ̂] / [σ √(1 + 1/n)]
has mean 0 and variance 1. We have already obtained that ε²/σ² has a chi-squared distribution with #N − 3 degrees of freedom. Hence the statistic
t = [g(0,0) − γ̂] / √[(1 + 1/n) ε²/(#N − 3)]
has a t-distribution with #N − 3 degrees of freedom. If t > T_{#N−3, p}, the hypothesis of the equality of g(0,0) and γ̂ is rejected, and the output value for the center pixel is given by γ̂. If t ≤ T_{#N−3, p}, the hypothesis of the equality of g(0,0) and γ̂ is not rejected, and the output value for the center pixel is given by g(0,0). A reasonable value for p is .05 to .01.
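A sketch of this peak-noise test, assuming the center-deleted fit is done by least squares; as a simplification we use a two-sided |t| comparison, and the caller supplies the t-distribution quantile:

```python
import numpy as np

def peak_noise_replace(block, t_quantile):
    """block: odd-sized neighborhood; returns the output value for its center."""
    R, C = block.shape
    rr, cc = np.meshgrid(np.arange(R) - R // 2, np.arange(C) - C // 2,
                         indexing="ij")
    mask = ~((rr == 0) & (cc == 0))            # deleted-center neighborhood
    X = np.stack([rr[mask], cc[mask], np.ones(mask.sum())], axis=1)
    y = block[mask].astype(float)
    coef, res, _, _ = np.linalg.lstsq(X, y, rcond=None)
    n = mask.sum()
    eps2 = float(res[0]) if res.size else 0.0  # squared residual fitting error
    gamma_hat = coef[2]                        # fitted value at the center
    sigma2_hat = eps2 / (n - 3)
    if sigma2_hat == 0.0:                      # perfect fit: keep the pixel
        return block[R // 2, C // 2]
    t = (block[R // 2, C // 2] - gamma_hat) / np.sqrt(sigma2_hat * (1 + 1.0 / n))
    return gamma_hat if abs(t) > t_quantile else block[R // 2, C // 2]
```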
regions. Morphologically, we would say that the facet domains are open under a K × K structuring element.
To make these ideas precise, let R and C be the row and column index sets for the spatial domain of an image. For any (r,c) ∈ R × C, let I(r,c) be the gray value of resolution cell (r,c), and let B(r,c) be the K × K block of resolution cells centered around resolution cell (r,c). Let π = {π₁, . . . , π_M} be a partition of the spatial domain R × C into its facets.
In the iterated sloped facet model, for every resolution cell (r,c) ∈ π_k, there exists a resolution cell (i,j) ∈ R × C such that (r,c) ∈ B(i,j), B(i,j) ⊆ π_k, and on B(i,j) the ideal image takes the sloped form I(r′,c′) = α_k r′ + β_k c′ + γ_k.
An observed image J differs from its corresponding ideal image I by the addition of random stationary noise having zero mean and covariance matrix proportional to a specified one:
J(r,c) = I(r,c) + η(r,c),
where
E[η(r,c)] = 0,
E[η(r,c) η(r′,c′)] = kσ(r − r′, c − c′).
The flat facet model of Tomita and Tsuji (1977) and Nagao and Matsuyama (1979) differs from the sloped facet model only in that the coefficients α_k and β_k are assumed to be zero; Nagao and Matsuyama use a more generalized shape constraint, which is also suitable here.
The iterations generate images satisfying the facet form. The procedure has been proved to converge (Haralick and Watson, 1981) and has the following important property: In a coordinated and parallel manner, the strong influence the weak in their neighborhoods, thereby causing the weak to become consistent with the strong.
The facet model suggests the following simple nonlinear iterative procedure to operate on the image until an image of the ideal form is produced. Each resolution cell is contained in K² different K × K blocks. The gray level distribution in each of these blocks can be fit by a polynomial model. One of the K² blocks has the smallest error of fit. Set the output gray value to be the gray value fitted by the block having the smallest error of fit. For the flat facet model, this amounts to computing the variance for each K × K block in which a pixel participates. The output gray value is then the mean value of the block having the smallest variance (Tomita and Tsuji, 1977; Nagao and Matsuyama, 1979).
For the sloped facet model, the procedure amounts to fitting a sloped plane to each of the blocks in which a given resolution cell participates and outputting the fitted gray value of the given resolution cell from the block having the lowest fitting error.
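A sketch of one pass of this iteration for the sloped model (border handling and the stopping rule are simplifications of ours):

```python
import numpy as np

def facet_iteration(image, K=3):
    """For each pixel, output the value fitted by its best-fitting K*K block."""
    R, C = image.shape
    out = image.astype(float).copy()
    rr, cc = np.meshgrid(np.arange(K) - (K - 1) / 2.0,
                         np.arange(K) - (K - 1) / 2.0, indexing="ij")
    for r in range(K - 1, R - K + 1):
        for c in range(K - 1, C - K + 1):
            best = None
            for dr in range(-(K - 1), 1):        # all K*K blocks containing (r,c)
                for dc in range(-(K - 1), 1):
                    block = image[r + dr:r + dr + K, c + dc:c + dc + K]
                    alpha = (rr * block).sum() / (rr**2).sum()
                    beta = (cc * block).sum() / (cc**2).sum()
                    gamma = block.mean()
                    fit = alpha * rr + beta * cc + gamma
                    err = ((block - fit)**2).sum()
                    if best is None or err < best[0]:
                        # value this block fits for pixel (r, c)
                        best = (err, fit[-dr, -dc])
            out[r, c] = best[1]
    return out
```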
The sloped facet model relaxation procedure examines each of the K² K × K blocks to which a pixel (r,c) belongs. For each block, a block error can be computed by summing the squared differences between the block's gray values and the values fitted by its sloped plane.
Figure 8.2 The 3 × 3 linear estimators of a pixel's gray level for the nine different 3 × 3 neighborhoods in which the pixel participates. If the pixel's position is (i,j) in the neighborhood, the estimate is ĝ(i,j). Each mask must be normalized by dividing by 18.
If the neighborhood straddles an edge, a high value of the estimated gradient will result. The gradient magnitude will be proportional to the gray level jump. On the other hand, if the region is entirely contained within a homogeneous area, then the true surface αr + βc + γ will have α = β = 0, and the fitted sloped facet model α̂r + β̂c + γ̂ will produce a value of √(α̂² + β̂²) that is near zero. Hence it is reasonable for edge detectors to use the estimated gradient magnitude √(α̂² + β̂²) as the basis for edge detection. Such edge detectors are called gradient-based edge detectors. There are other kinds of edge detectors, such as zero-crossing ones. A discussion of how the facet model can be used to determine zero crossings of second directional derivatives as edges can be found in Section 8.8.
The idea of fitting linear surfaces for edge detection is not new. Roberts (1965) employed an operator commonly called the Roberts gradient to determine edge strength in a 2 × 2 window in a blocks-world scene analysis problem. Prewitt (1970) used a quadratic fitting surface to estimate the parameters α and β in a 3 × 3 window for automatic leukocyte cell scan analysis. The resulting values for α̂ and β̂ are the same for the quadratic or linear fit. O'Gorman and Clowes (1976) discussed the general fitting idea. Mero and Vamos (1976) used the fitting idea to find lines on a binary image. Hueckel (1973) used the fitting idea with low-frequency polar-form Fourier basis functions on a circular disk in order to detect step edges.
The fact that the Roberts gradient arises from a linear fit over the 2 × 2 neighborhood is easy to see. Let
a  b
c  d
be the four gray levels in the 2 × 2 window, whose local spatial coordinates for the pixel centers are (−½, −½), (−½, ½), (½, −½), and (½, ½). The Roberts gradient is defined by √[(a − d)² + (b − c)²]. The least-squares fit for α and β is
α̂ = ½[(c + d) − (a + b)]  and  β̂ = ½[(b + d) − (a + c)].   (8.25)
The gradient, which is the slope in the steepest direction, has magnitude √(α̂² + β̂²). Since (a − d)² + (b − c)² = (α̂ + β̂)² + (α̂ − β̂)² = 2(α̂² + β̂²), the Roberts gradient is exactly √2 times the gradient magnitude of the fitted sloped surface.
The quick Roberts gradient is defined by |a − d| + |b − c|. Now since
a − d = −(α̂ + β̂)  and  b − c = −(α̂ − β̂),   (8.27)
there results
max{|α̂|, |β̂|} = ½(|α̂ + β̂| + |α̂ − β̂|) = ½(|a − d| + |b − c|).   (8.28)
Hence the quick Roberts gradient is related to the parameters of the fitted sloped surface.
Finally, the max Roberts gradient is also related to the parameters of the fitted sloped surface. The max Roberts gradient value is defined by max{|a − d|, |b − c|}. Since max{|u|, |v|} = ½(|u + v| + |u − v|), we have |α̂| + |β̂| = max{|α̂ + β̂|, |α̂ − β̂|} = max{|a − d|, |b − c|}, so the max Roberts gradient equals |α̂| + |β̂|, again a quantity determined by the parameters of the fitted sloped surface.
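A numeric check (an illustration, not from the book) of these 2 × 2 relations, with gray levels a, b in the top row and c, d in the bottom row:

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, c, d = rng.uniform(0, 255, 4)

alpha = 0.5 * ((c + d) - (a + b))    # row slope of the fitted plane
beta = 0.5 * ((b + d) - (a + c))     # column slope of the fitted plane

roberts = np.hypot(a - d, b - c)
quick = abs(a - d) + abs(b - c)
max_rob = max(abs(a - d), abs(b - c))

print(np.isclose(roberts, np.sqrt(2) * np.hypot(alpha, beta)))    # sqrt(2) factor
print(np.isclose(max(abs(alpha), abs(beta)), 0.5 * quick))        # Eq. (8.28)
print(np.isclose(abs(alpha) + abs(beta), max_rob))                # max Roberts
```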
significantly different from zero? To answer this question, we begin by noting that α̂ is a normally distributed variate with mean α and variance σ²/Σ_r Σ_c r², and β̂ is a normally distributed variate with mean β and variance σ²/Σ_r Σ_c c². Under the hypothesis of no edge (α = β = 0),
G = (α̂² Σ_r Σ_c r² + β̂² Σ_r Σ_c c²) / σ²
is distributed as a chi-squared variate with two degrees of freedom, so the test statistic is a multiple of the estimated squared gradient magnitude α̂² + β̂². For neighborhood sizes greater than 3 × 3, such an edge operator is a generalization of the Prewitt gradient edge operator. However, by knowing the conditional distribution given no edge, it becomes easier to choose a threshold. For example, suppose we want the edge detector to work with a controlled false-alarm rate. The false-alarm rate is the conditional probability that the edge detector classifies a pixel as an edge given that the pixel is not an edge. Suppose the false-alarm rate is to be held to 1%. Then since P(χ²₂ > 9.21) = .01, the threshold we use must be at least 9.21.
Notice that, other things being equal, as the neighborhood size gets bigger, a
fixed value of squared gradient becomes more statistically significant. Hence small
gradient magnitudes estimated in small neighborhoods may not be statistically significant, but small gradient magnitudes may be statistically significant in large neighborhoods. Also notice that, other things being equal, the greater the noise variance,
the greater the gradient magnitude must be in order to be statistically significant.
To use this technique, we must know the noise variance σ². Fortunately we can obtain a good estimate of σ². Each neighborhood's normalized squared residual error
ε² / (Σ_r Σ_c 1 − 3)
can constitute an estimator for σ². This estimator is available for each neighborhood of the image. Because there are usually thousands of pixels in the image, the average of ε²/(Σ_r Σ_c 1 − 3) taken over all the neighborhoods of the image is a very good and stable estimator of σ² if it can be assumed that the noise variance is the same in each neighborhood. If ε²ₙ represents the squared residual fitting error from the nth neighborhood and the image has N such neighborhoods, then we may use
σ̂² = (1/N) Σ_{n=1}^{N} ε²ₙ / (Σ_r Σ_c 1 − 3).
Under the hypothesis of no edge, G, being the ratio of two chi-squared statistics, would have an F distribution. But because the effective number of degrees of freedom of σ̂² is so high, despite the dependencies among the ε²ₙ, G has essentially a chi-squared distribution with two degrees of freedom. Thus if we wanted to detect edges and be assured that the false-alarm rate (the conditional probability of assigning a pixel as an edge given that it is not an edge) is less than p₀, we would use a threshold of θ₀, where P(χ²₂ ≥ θ₀) = p₀.
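A sketch of the whole test: pool the per-neighborhood residuals into σ̂², then threshold G. The 1% chi-squared quantile 9.21 quoted above is used as the default; the two-pass structure is ours:

```python
import numpy as np

def facet_edges(image, threshold=9.21, K=5):
    R, C = image.shape
    rr, cc = np.meshgrid(np.arange(K) - K // 2, np.arange(K) - K // 2,
                         indexing="ij")
    Sr2, Sc2 = (rr**2).sum(), (cc**2).sum()
    alphas, betas, errs, coords = [], [], [], []
    for r in range(R - K + 1):
        for c in range(C - K + 1):
            block = image[r:r + K, c:c + K].astype(float)
            a = (rr * block).sum() / Sr2
            b = (cc * block).sum() / Sc2
            g = block.mean()
            errs.append(((block - (a * rr + b * cc + g))**2).sum())
            alphas.append(a); betas.append(b)
            coords.append((r + K // 2, c + K // 2))
    sigma2_hat = np.mean(errs) / (K * K - 3)   # pooled noise variance estimate
    edges = np.zeros((R, C), dtype=bool)
    for (r, c), a, b in zip(coords, alphas, betas):
        G = (Sr2 * a**2 + Sc2 * b**2) / sigma2_hat
        edges[r, c] = G > threshold
    return edges
```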
Figure 8.3 (upper left) shows a controlled 100 × 100 image having a disk of diameter 63. The interior of this disk has a gray level of 200. The background of the disk has a gray level of 0. Independent Gaussian noise having mean 0 and standard deviation 40, 50, and 75 is added to the controlled image. The noisy images are shown in the other parts of Fig. 8.3.
Figure 8.3 (Upper left) Controlled disk (background having value 0, disk having value 200) and noisy disks: (upper right) noise standard deviation of 40; (lower left) noise standard deviation of 50; (lower right) noise standard deviation of 75.
A sloped facet model is fitted to each 5 × 5 neighborhood of each image, and its α̂, β̂, and ε² are computed. For the ideal image of Fig. 8.3 (upper left), the average
squared residual fitting error ε², the average being taken over all neighborhoods, is 302.33. This corresponds to a standard deviation of about 17.4, which is 8.7% of the dynamic range of the image.
Obviously in the noiseless image the fit will be perfect for all 5 x 5 neighborhoods that are entirely contained in the background or in the disk. The error must
come from neighborhoods that contain some pixels from the background and some
from the disk. In these neighborhoods the sloped fit is only an approximation. One
neighborhood having the worst fit is shown in Fig. 8.4. The sloped fit there has an
average squared residual error of 3746. The standard deviation of fitting error is
then 61.2, which represents 30.6% of the dynamic range.
For the noisy image of Fig. 8.3 (lower right), the standard deviation of the fitting error is σ = 77.3. This is just a little higher than the standard deviation of the noise because it takes into account the extent to which the data do not fit the model. In fact, assuming that the imperfection of the model and the noise are independent, we would expect to find a standard deviation of √(17.4² + 75²) = 77, which is close to the 77.3 measured.
Figure 8.5 (upper left) shows edges obtained when the statistic G computed on the ideal image of Fig. 8.3 is thresholded at the value 120. Since σ̂² = 302.33 and for a 5 × 5 neighborhood Σ Σ r² = Σ Σ c² = 50, this corresponds to selecting all neighborhoods having slopes greater than 26.94. The other parts of Fig. 8.5 show the edges obtained when a 5 × 5 sloped facet model is employed and when the statistic G computed from each neighborhood of the noisy image of Fig. 8.3 (lower right) is thresholded at 4, 8, and 11. Since σ̂² = 5975.3 for the noisy image, these thresholds correspond to proportionally larger slopes.
Figure 8.5 (Upper left) Edges obtained when the statistic G computed by using 5 × 5 neighborhoods on the ideal image of Fig. 8.3 is thresholded at the value 120; (upper right) edges obtained when the statistic G computed using 5 × 5 neighborhoods on the noisy image of Fig. 8.3 (lower right) is thresholded at the value 4; (lower left) and (lower right) thresholds of 8 and 11.
Figure 8.6 Two operating curves for the 5 × 5 sloped facet gradient edge detector. The higher one corresponds to a noisy disk with noise standard deviation of 35, and the lower one corresponds to a noisy disk with noise standard deviation of 50.
Under the hypothesis of no edge, the statistic G is essentially proportional to the squared gradient of the region normalized by σ̂², which is a random variable whose expected value is σ², the variance of the noise.
EXAMPLE 8.1
Consider the following 3 × 3 region:
Then α̂ = −1.17, β̂ = 2.67, and γ̂ = 5.00. The estimated gray level surface is given by α̂r + β̂c + γ̂. Sampling it at the center of each pixel produces
The difference between the estimated and the observed surfaces is the error, and it is
From this we can compute the squared error ε² = 11.19. The F statistic is then
Figure 8.7 Edges obtained under a 5 × 5 sloped facet model using the F statistic. (Upper left) Thresholded F statistic from the noiseless disk; (upper right) F statistic image of the noisy disk of Fig. 8.3 (lower right) thresholded at 2.32; (lower left) and (lower right) higher thresholds of about 5 and 7.
The Bayesian approach to deciding whether or not a pixel is an edge is to decide edge when
P(G | edge) P(edge) > P(G | nonedge) P(nonedge).   (8.35)
From the previous section, P(G | nonedge) is known to be the density function of a χ²₂ variate. But P(G | edge) is not known.
It is possible to infer P(G | edge) from the observed image data, since P(G), the density function of the histogram of observed gradient magnitude, is easily calculated.
Now
P(G) = P(G | edge) P(edge) + P(G | nonedge) P(nonedge).   (8.36)
Hence
P(G | edge) = [P(G) − P(G | nonedge) P(nonedge)] / P(edge).   (8.38)
This relation, together with the relation for P(G | edge), implies that at the decision threshold t the two sides of Eq. (8.35) balance, so t must satisfy
P(t) = 2 P(t | nonedge) P(nonedge).   (8.39)
Here P(t) is the observed density for the test statistic G evaluated at t, P(t | nonedge) is the value of the density function of a χ²₂ variate evaluated at t, and P(nonedge) is a user-specified prior probability of nonedge. For many images, values of .9 to .95 are reasonable.
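A sketch of solving Eq. (8.39) from data: estimate the observed density P(G) with a histogram and find the first point where it reaches 2 P(t | nonedge) P(nonedge), using the χ²₂ density 0.5·exp(−t/2). The histogram binning is an assumption of ours:

```python
import numpy as np

def bayesian_threshold(G_values, p_nonedge=0.95, bins=200):
    hist, edges = np.histogram(G_values, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    target = 2.0 * 0.5 * np.exp(-centers / 2.0) * p_nonedge
    crossing = np.nonzero(hist >= target)[0]   # edge term starts to dominate
    return centers[crossing[0]] if crossing.size else centers[-1]
```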
As just mentioned, the zero-crossing edge detector places edges not at locations of high gradient but at locations of spatial gradient maxima. More precisely, a pixel is marked as an edge pixel if in its immediate area there is a zero crossing of the second directional derivative taken in the direction of the gradient (Haralick, 1982) and if the slope of the zero crossing is negative. Thus this kind of edge detector will respond to weak but spatially peaked gradients.
The underlying functions from which the directional derivatives are computed are easy to represent as linear combinations of the polynomials in a polynomial basis set. A polynomial basis set that permits the independent estimation of each coefficient would be the easiest to use. Such a basis is the discrete orthogonal polynomials.
Each discrete orthogonal polynomial Pₙ has degree n and satisfies
Σ_{r∈R} P_k(r)(rⁿ + a_{n−1}rⁿ⁻¹ + · · · + a₁r + a₀) = 0,  k = 0, . . . , n − 1.   (8.41)
These equations are linear equations in the unknowns a₀, . . . , a_{n−1} and are easily solved by standard techniques.
The first few polynomial formulas are
P₀(r) = 1,  P₁(r) = r,  P₂(r) = r² − μ₂,  P₃(r) = r³ − (μ₄/μ₂)r,
where μ_k = Σ_{r∈R} r^k / Σ_{r∈R} 1.
The discrete orthogonal polynomials defined on symmetric sets can be recursively generated (Forsythe, 1957) by the relation
P_{n+1}(r) = r Pₙ(r) − βₙ P_{n−1}(r),  where  βₙ = Σ_{r∈R} Pₙ²(r) / Σ_{r∈R} P_{n−1}²(r),
with P₀(r) = 1 and P₁(r) = r.
For each r ∈ R, let a data value d(r) be observed. The exact fitting problem is to determine coefficients a₀, . . . , a_{N−1} such that
d(r) = Σ_{n=0}^{N−1} aₙ Pₙ(r).   (8.45)
The exact fitting coefficients and the least-squares coefficients are identical for m =
0, . . . ,K. Similar equations hold for the two-dimensional case. Readers who would
like more technical details on the relationships between discrete least-square function
fitting and orthogonal projection may consult Appendix B.
The equation for aₘ means that each fitting coefficient can be computed as a linear combination of the data values. For each index r in the index set, the data value d(r) is multiplied by the weight
Pₘ(r) / Σ_{r′∈R} Pₘ²(r′),
and the results are summed. In this manner the estimate for any derivative at any point may be obtained. Similarly, an estimate for any definite integral can be obtained.
The mathematics for weighted least squares is similar to the mathematics for
equal-weighted least squares. The only difference is that the norm associated with
the weighted least squares is a positive definite diagonal matrix instead of an identity
matrix.
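A sketch of the Forsythe recursion and of the fitting weights just described, on a symmetric index set; the helper names are ours:

```python
import numpy as np

def discrete_orthogonal_polys(index_set, degree):
    """Return [P0, P1, ..., P_degree] tabulated on the index set."""
    r = np.asarray(index_set, dtype=float)
    polys = [np.ones_like(r), r.copy()]
    for n in range(1, degree):
        beta = (polys[n]**2).sum() / (polys[n - 1]**2).sum()
        polys.append(r * polys[n] - beta * polys[n - 1])
    return polys[:degree + 1]

r = np.arange(-2, 3)                       # the symmetric set {-2, ..., 2}
P = discrete_orthogonal_polys(r, 3)
print(P[2])                                # r^2 - 2
print(P[3])                                # r^3 - 3.4 r
weights = [p / (p**2).sum() for p in P]    # kernels producing each a_m
```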
Figure 8.8 Kernels for estimating the coefficients k₁, . . . , k₁₀ of the bivariate cubic k₁ + k₂r + k₃c + k₄(r² − 2) + k₅rc + k₆(c² − 2) + k₇(r³ − 3.4r) + k₈(r² − 2)c + k₉r(c² − 2) + k₁₀(c³ − 3.4c).
The simplest way to think about directional derivatives is to cut the surface f(r,c) with a plane that is oriented in the desired direction and is orthogonal to the row-column plane. The intersection results in a curve. The derivative of the curve is the directional derivative. To cut the surface f(r,c) with a plane in direction α, we simply require that r = ρ sin α and c = ρ cos α, where ρ is the independent variable. This produces the curve f_α(ρ).
We denote the second directional derivative of f at the point (r,c) in the direction α by f″_α(r,c). It quickly follows that
f″_α = (∂²f/∂r²) sin²α + 2(∂²f/∂r∂c) sin α cos α + (∂²f/∂c²) cos²α.
Similarly,
f‴_α = (∂³f/∂r³) sin³α + 3(∂³f/∂r²∂c) sin²α cos α + 3(∂³f/∂r∂c²) sin α cos²α + (∂³f/∂c³) cos³α.   (8.53)
The kernels for directly estimating the coefficients k₁, . . . , k₁₀ in the expression given above for a 5 × 5 window are shown in Fig. 8.9.
Notice that because linear combinations of linear combinations are linear combinations, the kernel associated with any monomial of Fig. 8.9 can be determined from the kernels associated with the orthogonal basis of Fig. 8.8 by taking appropriate linear combinations. For example, for the monomial r, the kernel for k₂ in Fig. 8.8 involves r with a weight of 1, the kernel for k₇ involves r with a weight of −3.4, and the kernel for k₉ involves r with a weight of −2. Hence the kernel for the coefficient of the monomial r in Fig. 8.9 is the k₂ kernel of Fig. 8.8 minus 3.4 times the k₇ kernel minus 2 times the k₉ kernel.
It is well defined whenever k₂² + k₃² > 0. At any point (r,c), the second directional derivative in the direction α is given by Eq. (8.56); along the line in direction α through the origin it reduces to the linear function
f″_α(ρ) = Aρ + B,   (8.57)
where
A = 6[k₇ sin³α + k₈ sin²α cos α + k₉ sin α cos²α + k₁₀ cos³α]   (8.58)
Figure 8.9 Kernels for directly estimating the coefficients k₁, . . . , k₁₀ of the bivariate cubic k₁ + k₂r + k₃c + k₄r² + k₅rc + k₆c² + k₇r³ + k₈r²c + k₉rc² + k₁₀c³ for a 5 × 5 neighborhood.
and
B = 2[k₄ sin²α + k₅ sin α cos α + k₆ cos²α].   (8.59)
If for some ρ, |ρ| < ρ₀, where ρ₀ is slightly smaller than the length of the side of a pixel, f″_α(ρ) = 0, f‴_α(ρ) < 0, and f′_α(ρ) ≠ 0, we have discovered a negatively sloped zero crossing of the estimated second directional derivative taken in the estimated direction of the gradient. We mark the center pixel of the neighborhood as an edge pixel, and if required we make a note of the subpixel location of the zero crossing.
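A sketch of this test given the ten fitted coefficients; here k is a list with k[0] unused so the indices match k₁, . . . , k₁₀ of Fig. 8.9, and the direction convention (sin α, cos α) ∝ (k₂, k₃) follows the text:

```python
import numpy as np

def is_edge(k, rho0=0.9):
    sin_a, cos_a = k[2], k[3]              # gradient components at the origin
    norm = np.hypot(sin_a, cos_a)
    if norm == 0.0:                        # gradient direction undefined
        return False
    sin_a, cos_a = sin_a / norm, cos_a / norm
    A = 6.0 * (k[7] * sin_a**3 + k[8] * sin_a**2 * cos_a
               + k[9] * sin_a * cos_a**2 + k[10] * cos_a**3)
    B = 2.0 * (k[4] * sin_a**2 + k[5] * sin_a * cos_a + k[6] * cos_a**2)
    if A == 0.0:
        return False
    rho = -B / A                           # where f'' crosses zero
    return abs(rho) < rho0 and A < 0.0     # negatively sloped zero crossing
```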
If our ideal edge is the step edge, then we can refine these detection criteria by insisting that the cubic polynomial f_α(ρ) have coefficients that make f_α a suitable polynomial approximation of the step edge (Haralick, 1986). Now a step edge does not change in its essence if it is translated to the left or right or if it has a constant added to its height. Since the cubic polynomial is representing the step edge, we must determine what it is about the cubic polynomial that is its fundamental essence after an ordinate and abscissa translation.
To do this, we translate the cubic polynomial f_α(ρ) = c₀ + c₁ρ + c₂ρ² + c₃ρ³ so that its inflection point, ρ* = −c₂/(3c₃), is at the origin. Calling the new polynomial g and writing it in terms of an edge contrast C and a scale S, we have finally
g(ρ) = C(Sρ − S³ρ³).
In this form it is relatively easy to determine the character of the cubic. Differentiating, we have
g′(ρ) = CS(1 − 3S²ρ²).
The locations of the relative extrema depend only on S: they are located at ρ = ±1/(√3 S). The height difference between the relative extrema depends only on the contrast; their heights are ±2C/(3√3). Other characteristics of the cubic depend on both C and S. For example, the magnitude of the curvature at the extrema is 2√3 CS², and the derivative at the inflection point is CS.
Of interest to us is the relationship between an ideal perfect step edge and the representation it has in the least-squares approximating cubic whose essential parameters are contrast C and scale S. We take an ideal step edge centered in an odd neighborhood of size N to have (N − 1)/2 pixels with value −1, a center pixel with value 0, and (N − 1)/2 pixels with value +1. Using neighborhood sizes of 5 to 23, we find the values listed in Table 8.1 for contrast C and scale S of the least-squares approximating cubic.
The average contrast of the approximating cubic is 3.16257. The scale S(N) appears to be inversely related to N: S(N) ≈ S₀/N. The value of S₀ minimizing the relative error [S(N) − S₀/N]/S(N) over the table entries is S₀ = 1.793157.
Table 8.1 Contrast C and scale S of the fitted cubic for an ideal step edge as a function of neighborhood size N (columns: neighborhood size N, contrast C, scale S).
Knowing the values of C and S that the fit produces for ideal step edges having a contrast of 2 can help provide additional criteria for edge selection. For example, the contrast across an arbitrary step edge can be estimated by
Edge contrast = 2C / 3.16257.   (8.67)
If the edge contrast is too small, then the pixel is rejected as an edge pixel. In many kinds of images, too small means smaller than 5% of the image's true dynamic range. Interestingly enough, edge contrast C depends on the three coefficients c₁, c₂, and c₃ of the representing cubic. First-derivative magnitude at the origin, a value used by many edge gradient magnitude detection techniques, depends only on the coefficient c₁. First-derivative magnitude at the inflection point is precisely CS, a value that mixes together both scale and edge contrast.
The scale for the edge can be defined by
Edge scale = SN / 1.793157.
Ideal step edges, regardless of their contrast, will produce least-squares approximating cubic polynomials whose edge scale is very close to unity. Values of edge scale larger than 1 have the relative extrema of the representing cubic closer together than expected for an ideal step edge. Values of edge scale smaller than 1 have the relative extrema of the representing cubic farther away from each other than expected for an ideal step edge. Values of edge scale that are significantly different from unity may be indicative of a cubic representing a data value pattern very much different from a step edge. Candidate edge pixels with an edge scale very different from unity can be rejected as edge pixels.
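A sketch of recovering C and S from a fitted cubic and applying the two normalizations quoted above (3.16257 and 1.793157); the exact algebraic form of the edge-scale equation is our reconstruction:

```python
import numpy as np

def edge_contrast_and_scale(c, N):
    """c = (c0, c1, c2, c3) of f(rho); N is the neighborhood size."""
    c0, c1, c2, c3 = c
    rho_star = -c2 / (3.0 * c3)          # inflection point of the cubic
    g1 = c1 + 2.0 * c2 * rho_star + 3.0 * c3 * rho_star**2   # = C*S
    g3 = c3                              # = -C*S**3
    if g1 <= 0.0 or g3 >= 0.0:
        return None                      # not step-edge-like
    S = np.sqrt(-g3 / g1)                # from -g3/g1 = S**2
    C = g1 / S
    edge_contrast = 2.0 * C / 3.16257    # Eq. (8.67)
    edge_scale = S * N / 1.793157        # near 1 for ideal step edges
    return edge_contrast, edge_scale
```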
The determination of how far from unity is different enough requires an understanding of what sorts of nonedge situations yield cubics with a high enough contrast and with an inflection point close enough to the neighborhood center. We have found that such nonedge situations occur when a steplike jump occurs at the last point in the neighborhood. For example, suppose all the observed values are the same except the value at an endpoint. If N is the neighborhood size, then the inflection point of the approximating cubic will occur at ± a fixed distance from the center that depends on N, the plus sign corresponding to a different left endpoint and the minus sign corresponding to a different right endpoint. Hence for neighborhood sizes of N = 5, 7, 9, or 11, the inflection point occurs within a distance of 1 from the center point of the neighborhood. So provided the contrast is high enough, the situation would be classified as an edge if scale were ignored. For neighborhood sizes of N = 5, 7, 9, 11, and 13, however, the scale of the approximating cubic is 1.98, 1.81, 1.74, 1.71, and 1.68, respectively. This suggests that scales larger than 1 are significantly more different from unity scale than corresponding scales smaller than 1. In many images restricting edge scale to between .4 and 1.1 works well.
If one is interested in only straight-line edges, a further refinement is possible: We can insist that the curvature of the contour at the zero-crossing point be sufficiently small. Let (r₀, c₀) = (ρ₀ sin θ, ρ₀ cos θ) be the zero-crossing point; the curvature of the contour there can be expressed in terms of the fitted coefficients k₂, . . . , k₁₀ evaluated at (r₀, c₀), and candidate edge pixels whose contour curvature is too large can be rejected.
pixel's center. The bias occurs because the step edge does not match the polynomial
model.
Instead of computing directional derivatives at a point directly from the fitted
surface, as in the case of the standard cubic facet gradient operator mentioned earlier,
in this section we describe an operator that measures the integrated directional
derivative strength as the integral of first directional derivative taken over a square
area. The direction for the integrated derivative estimate that maximizes the integral
defines the estimate of gradient direction.
Edge direction estimate bias of the integrated directional derivative gradient
operator is sharply reduced as compared with the bias of the standard cubic facet,
Sobel, and Prewitt gradient operators. Noise sensitivity is comparable to that of the
Sobel and Prewitt operators and is much better than that of the standard cubic facet
operator.
Also, unlike the standard cubic facet, Sobel, and Prewitt operators, increasing
the neighborhood size decreases both estimate bias and noise sensitivity. For ramp edges the integrated operator is very nearly unbiased. The worst bias for the 7 × 7 operator is less than 0.09°; for the 5 × 5 operator, less than 0.26°.
Section 8.9.1 describes the mathematical analysis necessary to derive the new
gradient estimate. Section 8.9.2 provides a comparison of the integrated directional
derivative gradient operator against the standard cubic facet gradient, Prewitt, and
Sobel operators for step and ramp edges contaminated by zero-mean Gaussian noise.
The integrated directional derivative at direction θ is defined by
F_θ = [1/(LW)] ∫_{−L/2}^{L/2} ∫_{−W/2}^{W/2} f′_θ(ρ cos θ + ω sin θ, −ρ sin θ + ω cos θ) dω dρ,   (8.69)
and the gradient estimate is
G = F_{θMAX} u_{θMAX},   (8.70)
where F_{θMAX} = max_θ F_θ and u_{θMAX} is a unit vector in the direction that maximizes F_θ.
Using the bivariate cubic fit, we obtain that f′_θ(ρ cos θ + ω sin θ, −ρ sin θ + ω cos θ) reduces to
f′_θ(ρ cos θ + ω sin θ, −ρ sin θ + ω cos θ)
= [k₉ sin³θ + k₈ cos³θ + (3k₇ − 2k₉) sin θ cos²θ + (3k₁₀ − 2k₈) sin²θ cos θ] ρ²
+ 2[−k₈ sin³θ + (3k₇ − 2k₉) sin²θ cos θ + (2k₈ − 3k₁₀) sin θ cos²θ + k₉ cos³θ] ρω
+ [3k₇ sin³θ + 3k₁₀ cos³θ + 3k₉ sin θ cos²θ + 3k₈ sin²θ cos θ] ω²
+ [−k₅ sin²θ + 2(k₄ − k₆) sin θ cos θ + k₅ cos²θ] ρ
+ 2[k₄ sin²θ + k₅ sin θ cos θ + k₆ cos²θ] ω + k₂ sin θ + k₃ cos θ.   (8.71)
Carrying out the integration of Eq. (8.71) over the rectangle eliminates the terms odd in ρ or ω and yields F_θ as a trigonometric polynomial in θ (8.72). Setting dF_θ/dθ = 0 determines the maximizing direction θMAX through the tangent of its argument (8.73). Then F_{θMAX} can be expressed in terms of D₁ and D₂ (8.75), where D₁ and D₂ are the numerator and denominator of the argument of the tangent function in Eq. (8.73). In the remainder of our discussion we take the area of integration to be square: L = W.
if we assume a unit distance between two 4-neighbor pixels in the grid. A step edge passing through a pixel divides it into two parts having areas A₁ and A₂, with A₁ + A₂ = 1. If the corresponding gray level intensities on each side of the edge are I₁ and I₂, then the pixel is assigned a gray level intensity I according to the rule I = A₁I₁ + A₂I₂. The experiments use values for I₁ and I₂ equal to 100 and 200, respectively, which implies that the edge contrast is 100.
Ramp edges are generated by defocusing step edges with a 3 × 3 equally weighted averaging filter. Finally, both step and ramp edges are contaminated by adding zero-mean Gaussian noise with a given standard deviation.
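A sketch of this pixel model, using supersampling to approximate the areas A₁ and A₂ rather than exact geometry (the approximation and parameter names are ours):

```python
import numpy as np

def step_edge_image(size=32, theta_deg=22.5, offset=0.0,
                    I1=100.0, I2=200.0, sub=8):
    theta = np.radians(theta_deg)
    half = (size - 1) / 2.0
    # supersample each pixel on a sub x sub grid of points
    s = (np.arange(sub) + 0.5) / sub - 0.5
    dr, dc = np.meshgrid(s, s, indexing="ij")
    image = np.empty((size, size))
    for r in range(size):
        for c in range(size):
            rr, cc = (r - half) + dr, (c - half) + dc
            # signed distance to the edge line through the grid center
            side = rr * np.sin(theta) + cc * np.cos(theta) - offset
            A2 = (side > 0).mean()          # area fraction on the I2 side
            image[r, c] = (1.0 - A2) * I1 + A2 * I2
    return image
```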
For comparison purposes we use two performance measurements, edge direction estimate bias and edge direction estimate standard deviation. The latter measures noise sensitivity. The estimate bias is defined as the difference between the
estimate mean direction and the true edge direction. Combining the previous two
measurements by the root-mean-square error formula produces a single performance
measurement. The experiments show that the integrated directional derivative gradient operator achieves best performance in the root-mean-square error sense when
L = W = 1.8 for a 5 x 5 neighborhood size and L = W = 2.5 for a 7 x 7
neighborhood size for both step and ramp edges and for a variety of noise levels.
We compare the following gradient operators: 5 x 5 extended Sobel (Iannino
and Shapiro, 1979), 5 x 5 and 7 x 7 Prewitt, 5 x 5 and 7 x 7 standard cubic facet,
and 5 x 5 and 7 x 7 integrated directional derivative. Figure 8.10 shows the 5 x 5
row derivative masks associated with each of the operators, and Fig. 8.11 shows the
7 x 7 row derivative mask for the integrated directional derivative gradient operator.
Figure 8.11 Row derivative mask for integrated directional derivative gradient
operator for 7 x 7 neighborhood size.
The column derivative masks can be obtained from the row masks by a rotation of 90°.
For a step or ramp edge of a given orientation and noise standard deviation, each operator is applied to the grid's center 10,000 times, each time with a different noisy sample and a different edge displacement from the grid's center. Edge orientations vary from 0° to 90° and noise standard deviations from 0 to 100. Edge contrast is 100. The results can be seen in Figs. 8.12 and 8.13.
Figures 8.12 and 8.13 show estimate bias against true edge direction for step and ramp edges under zero-noise conditions for the standard cubic facet gradient operator and the integrated directional derivative gradient operator. Three observations can be made: (1) The integrated operator is clearly superior to the standard cubic facet gradient operator. (2) Under zero-noise conditions the 7 × 7 integrated directional derivative gradient operator has a worst bias of less than 0.09°. And (3) the 5 × 5 integrated directional derivative gradient operator has a worst bias of less than 0.26° on ramp edges. For comparison purposes, the 7 × 7 standard cubic facet gradient operator has a worst bias of about 1.2°, and the 5 × 5 standard cubic facet gradient operator has a worst bias of 0.5°. This improvement in worst bias remains when the edges are contaminated by additive independent zero-mean Gaussian noise. The estimate bias decreases for the integrated operator as the neighborhood size increases, whereas the opposite happens with the standard cubic facet gradient operator. Both operators perform better with ramp edges than with step edges.
Figure 8.12 Bias as a function of true edge direction for step edges under zero-noise conditions for four different edge operators.
Figure 8.13 Bias as a function of true edge direction for ramp edges under zero-noise conditions for four different edge operators.
Figure 8.14 shows estimate standard deviation against noise standard deviation for a fixed step edge with orientation of 22.5° and additive independent Gaussian noise. Again, the integrated operator is uniformly superior to the standard cubic facet gradient operator for both step and ramp edges.
Under zero-noise conditions, Fig. 8.15 shows estimate bias of the Sobel and Prewitt operators as a function of true edge direction for step edges. The 7 × 7 integrated operator has the smallest bias, followed by the 5 × 5 extended Sobel and the 5 × 5 and 7 × 7 Prewitt operators. Notice that for ramp edges the response of the integrated operator is nearly flat about zero; that is, the operator is nearly unbiased. For the 7 × 7 integrated operator, the worst bias is less than 0.09°, and for the 5 × 5
Figure 8.15 Estimate bias of the Sobel and Prewitt operators as a function of true edge direction for step edges under zero-noise conditions.
Figure 8.16 Estimate bias of the Sobel and Prewitt operators as a function of true edge direction for step edges. Noise standard deviation is 25. Edge contrast is 100.
integrated operator, less than 0.26°. For comparison purposes the worst bias in the 7 × 7 Prewitt operator is about 5° and in the 5 × 5 Prewitt operator about 4°.
Figures 8.16 and 8.17 show estimate bias as a function of true edge direction for
step and ramp edges when the noise standard deviation is equal to 25. The bias for
all the operators shown is nearly identical to the bias under zero-noise conditions.
It can be seen from the plots of estimate standard deviation that, as expected, the
7 x 7 operators are less sensitive to noise than the 5 x 5 operators.
Corner Detection
The detection of corners in images is extremely useful for computer vision tasks. For example, Huertas (1981) uses corners to detect buildings in aerial images. Nagel and Enkelmann (1982) use corner points to determine displacement vectors from a pair of consecutive images taken in time sequence. Some approaches to corner detection rely on prior segmentation of the image and subsequent analysis of region boundaries. Rutkowski and Rosenfeld (1978) provide a comparison of several corner detection techniques along those lines.
In this section we develop gray scale corner detectors, which detect corners by operating directly on the gray scale image. As Kitchen and Rosenfeld (1980) point out, the main advantage of such corner detectors is that their performance is not dependent on the success or failure of a prior segmentation step. Among
Figure 8.17 Estimate bias of the Sobel, Prewitt, and integrated directional derivative operators (5 × 5 and 7 × 7) as a function of true edge direction for ramp edges. Noise standard deviation is 25. Edge contrast is 100.
the earliest gray scale corner detectors is Beaudet's (1978) DET operator, which responds significantly near corner and saddle points. Kitchen and Rosenfeld report results using several operators that measure cornerness by the product of gradient magnitude and rate of change of gradient direction. Dreschler and Nagel (1981) investigate points lying between extrema of Gaussian curvature as suitable candidates for corner points.
Corner detection is simply described by using the facet model. In general, what we are usually inclined to call a corner occurs where two edge boundaries meet at a certain angle or where the direction of an edge boundary is changing very rapidly. We associate corners therefore with two conditions: the occurrence of an edge and significant changes in edge direction. We discussed edge detection in Sections 8.6 through 8.9. Here we will concentrate on change in edge direction, which is equivalent to change in gradient direction, since edge and gradient directions are orthogonal.
Corners occur at edge points where a significant change in gradient direction takes place. Now this change in gradient direction should ideally be measured as an incremental change along the edge boundary. We do not desire, however, to perform boundary following, since that would require a prior segmentation step. There are several ways to handle this situation based on the realization that, according to our model, the direction of an edge point (that is, the tangent to the edge boundary at that point) is orthogonal to the gradient vector at that same point. The simplest approach is to compute the incremental change in gradient direction along the tangent line to the edge at the point that is a corner candidate. The second approach is to evaluate the incremental change along the contour line that passes through the corner candidate.
Finally, we can compute the instantaneous rate of change in gradient direction in the direction of the tangent line. We will discuss each of these approaches.
The properties of those neighborhood points away from the neighborhood center
and possibly outside the pixel itself can be computed by two different methods:
(a) using the surface fit from the central neighborhood and (b) using the surface fit
from the neighborhood centered around the pixel closest to the given point.
Although the first method is computationally less expensive than the second
one, the latter may be more accurate.
Figure 8.18 Two points equidistant from the origin and lying on the line tangent to the edge boundary passing through it.
Figure 8.19 Two points equidistant from the origin and lying on the contour line passing through it.
centered at the corner candidate pixel. Let f_r(r,c) and f_c(r,c) denote the row and column partial derivatives of f. Consider the line passing through the origin in the direction α. An arbitrary point on this line is given by ρ(sin α, cos α), and the gradient direction at that point is given by
θ_g(ρ sin α, ρ cos α) = tan⁻¹ [f_r(ρ sin α, ρ cos α) / f_c(ρ sin α, ρ cos α)].
The rate of change of gradient direction in the direction α evaluated at the origin (ρ = 0) is then
θ′_g(0) = [k₃(2k₄ sin α + k₅ cos α) − k₂(k₅ sin α + 2k₆ cos α)] / (k₂² + k₃²).   (8.79)
We are interested in the value of θ′_g(0) when the direction α is orthogonal to the gradient direction at the origin (the edge direction). Since (k₂, k₃) is the gradient vector at the origin, (−k₃, k₂) is a vector orthogonal to it, and
sin α = −k₃ / √(k₂² + k₃²),  cos α = k₂ / √(k₂² + k₃²).
The test to decide whether the origin (0,0) is a corner point follows: We declare (0,0) to be a corner point if these two conditions are satisfied:
1. (0,0) is an edge point.
2. For a given threshold Ω, |θ′_g(0)| > Ω.
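A sketch of this corner test from Eq. (8.79), with the same k[1..10] indexing convention as before (k[0] unused); the edge decision itself is assumed to come from a separate edge detector:

```python
import numpy as np

def is_corner(k, Omega, is_edge_pixel):
    if not is_edge_pixel:
        return False
    norm = np.hypot(k[2], k[3])
    if norm == 0.0:                          # gradient undefined
        return False
    sin_a, cos_a = -k[3] / norm, k[2] / norm  # edge (tangent) direction
    num = k[3] * (2.0 * k[4] * sin_a + k[5] * cos_a) \
        - k[2] * (k[5] * sin_a + 2.0 * k[6] * cos_a)
    theta_prime = num / (k[2]**2 + k[3]**2)   # Eq. (8.79) at rho = 0
    return abs(theta_prime) > Omega
```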
Figure 8.20 Perfect and noisy artificially generated images: (a) original; (b) noisy; (c) aerial scene.
tested point. Surprisingly, the next best is the simplest one, which uses incremental change along the tangent line and properties from the same corner candidate central neighborhood for all the tested points on the tangent line.
Table 8.2
Performance of the facet-model corner detectors (gradient threshold = 20); P(AC/TC) is the probability of assigned corner given true corner, P(TC/AC) the probability of true corner given assigned corner, for corner-distance tolerances d = 1 and d = 0.

Method | P(AC/TC), d=1 | P(TC/AC), d=1 | Angle threshold, d=1 | P(AC/TC), d=0 | P(TC/AC), d=0 | Angle threshold, d=0
Incremental change along tangent line; central neighborhood; increment = 3.50 pixels | 0.97 | 0.99 | 47° | 0.278 | 0.25 | 55°
Incremental change along tangent line; nearest neighborhood; increment = 3.50 pixels | 0.97 | 0.97 | 67° | 0.111 | 0.108 | 80°
Incremental change along contour line; central neighborhood; increment = 3.50 pixels | 0.94 | 0.94 | 50° | 0.278 | 0.294 | 63°
Incremental change along contour line; nearest neighborhood; increment = 4 pixels | 0.97 | 0.99 | 76° | 0.361 | 0.361 | 94°
Instantaneous rate of change; dα = 1 | 0.94 | 0.96 | 16°/pixel | 0.075 | — | —
Kitchen and Rosenfeld investigated several techniques for gray level corner detection. Each one computed for every pixel in the image a measure of cornerness, and then corners were obtained by thresholding. Their best results were achieved by measuring cornerness by the product of gradient magnitude and instantaneous rate of change in gradient direction evaluated from a quadratic polynomial gray level surface fit.
Dreschler and Nagel detected corners by the following procedure. First, they computed for each pixel in the image the Gaussian curvature. This entailed doing a local quadratic polynomial fit for each pixel and computing the Hessian matrix. The Gaussian curvature is the product of the main curvatures (eigenvalues of the
Table 8.3
Performance of the best facet model-based, Kitchen-Rosenfeld, and Dreschler-Nagel corner detectors.

Detector | P(TC/AC), d=0 | P(TC/AC), d=1 | P(AC/TC), d=0 | P(AC/TC), d=1
Best facet-model corner detector | 0.97 | 0.97 | 0.361 | 0.361
Kitchen-Rosenfeld, no gradient threshold | 0.36 | 0.36 | 0.055 | 0.021
Kitchen-Rosenfeld, gradient threshold = 20 | 0.83 | 0.84 | 0.055 | 0.05
Dreschler-Nagel, gradient threshold = 20 | 0.33 | 0.35 | 0.055 | 0.059
Hessian matrix). Next, they found the locations of maximum and minimum Gaussian curvature. A pixel was declared to be a corner if the following conditions were satisfied:
1. The pixel's steepest slope was along the line that connected the location of maximum with the location of minimum Gaussian curvature. (This was done only for extrema lying within a given radius of the corner candidate pixel.)
2. The gray level intensity at the location of maximum Gaussian curvature was larger than the gray level intensity at the location of minimum Gaussian curvature.
3. The orientation of the main curvature that changed sign between the two extrema pointed in the direction of the associated extremum.
Figure 8.21 illustrates the results of applying the facet model-based, KitchenRosenfeld, and Dreschler-Nagel gray level comer detectors to the artificially generated noisy image. In all cases we used a cubic polynomial fitting on a 7 x 7 pixel
neighborhood. We slightly modified the Kitchen-Rosenfeld comer detector by considering only points whose gradient exceeded a given threshold. This resulted in
substantial improvement of the original Kitchen-Rosenfeld method. The DreschlerNagel comer detector proved to be the most sensitive to noise, and a gradient
threshold had to be used to improve its performance. Since all three methods use
the same cubic polynomial surface fit and the same 7 x 7 pixel neighborhood size,
the same gradient threshold of 20 was used in each to minimize the effects of the
noise. The search for Gaussian curvature extrema was done in a 5 x 5 neighborhood.
Table 8.3 shows the probability of correct corner assignment for each case. The best results according to this table are obtained by using the facet model-based corner detector.
Figure 8.21 Comparison of the corner assignments for the facet model-based, the Kitchen-Rosenfeld (with and without gradient threshold), and the Dreschler-Nagel corner detectors (clockwise from top left). Parameters are shown in Table 8.3 for d = 1.
Isotropic Derivative Magnitudes
A gradient edge can be understood as arising from a first-order isotropic derivative magnitude. It should then come as no surprise that higher-order features can arise from higher-order derivative magnitudes (Prewitt, 1970; Beaudet, 1978; and Haralick, 1981). In this section we determine those linear combinations of squared partial derivatives of two-dimensional functions that are invariant under rotation of the domain of the two-dimensional function. Let us first consider the simple linear function f(r, c) = k_1 + k_2 r + k_3 c. If we rotate the coordinate system by θ and call the resulting function g, we have in the new (r', c') coordinates
\[ r = r'\cos\theta - c'\sin\theta, \qquad c = r'\sin\theta + c'\cos\theta \]

and

\[ g(r', c') = f(r, c) \]

so that

\[ g(r', c') = k_1 + (k_2\cos\theta + k_3\sin\theta)\,r' + (-k_2\sin\theta + k_3\cos\theta)\,c' \]

Hence the sum of the squares of the first partials is the same constant k_2^2 + k_3^2, the squared gradient magnitude, for the original function or for the rotated function:

\[ \left(\frac{\partial g}{\partial r'}\right)^2 + \left(\frac{\partial g}{\partial c'}\right)^2 = k_2^2 + k_3^2 = \left(\frac{\partial f}{\partial r}\right)^2 + \left(\frac{\partial f}{\partial c}\right)^2 \]
In the remainder of this section we explicitly develop the direction isotropic
derivative magnitudes for the first and the second derivatives of an arbitrary function.
Then we state the theorem giving the formula for the isotropic derivative magnitude
of any order.
Proceeding as we did with the first-order case, upon writing the rotation equation with r' and c' as the independent variables, we have, by the chain rule,

\[ \frac{\partial f}{\partial r'} = \frac{\partial f}{\partial r}\frac{\partial r}{\partial r'} + \frac{\partial f}{\partial c}\frac{\partial c}{\partial r'} = \frac{\partial f}{\partial r}\cos\theta + \frac{\partial f}{\partial c}\sin\theta \]

\[ \frac{\partial f}{\partial c'} = \frac{\partial f}{\partial r}\frac{\partial r}{\partial c'} + \frac{\partial f}{\partial c}\frac{\partial c}{\partial c'} = -\frac{\partial f}{\partial r}\sin\theta + \frac{\partial f}{\partial c}\cos\theta \]

Then

\[ \frac{\partial^2 f}{\partial r'^2} = \frac{\partial^2 f}{\partial r^2}\cos^2\theta + 2\frac{\partial^2 f}{\partial r\,\partial c}\cos\theta\sin\theta + \frac{\partial^2 f}{\partial c^2}\sin^2\theta \]

\[ \frac{\partial^2 f}{\partial r'\,\partial c'} = -\frac{\partial^2 f}{\partial r^2}\cos\theta\sin\theta + \frac{\partial^2 f}{\partial r\,\partial c}(\cos^2\theta - \sin^2\theta) + \frac{\partial^2 f}{\partial c^2}\cos\theta\sin\theta \]

\[ \frac{\partial^2 f}{\partial c'^2} = \frac{\partial^2 f}{\partial r^2}\sin^2\theta - 2\frac{\partial^2 f}{\partial r\,\partial c}\cos\theta\sin\theta + \frac{\partial^2 f}{\partial c^2}\cos^2\theta \]

Looking for some constant \(\lambda\) that makes

\[ \left(\frac{\partial^2 f}{\partial r'^2}\right)^2 + \lambda\left(\frac{\partial^2 f}{\partial r'\,\partial c'}\right)^2 + \left(\frac{\partial^2 f}{\partial c'^2}\right)^2 = \left(\frac{\partial^2 f}{\partial r^2}\right)^2 + \lambda\left(\frac{\partial^2 f}{\partial r\,\partial c}\right)^2 + \left(\frac{\partial^2 f}{\partial c^2}\right)^2 \]

we discover that exactly one does exist, and its value is \(\lambda = 2\). Thus for each point in the unrotated coordinate system,

\[ \left(\frac{\partial^2 f}{\partial r^2}\right)^2 + 2\left(\frac{\partial^2 f}{\partial r\,\partial c}\right)^2 + \left(\frac{\partial^2 f}{\partial c^2}\right)^2 \]

has the same value as the corresponding sum in any rotated coordinate system.
Carrying out the same development for the third-order partial derivatives yields the rotationally invariant quantity

\[ \left(\frac{\partial^3 f}{\partial r^3}\right)^2 + 3\left(\frac{\partial^3 f}{\partial r^2\,\partial c}\right)^2 + 3\left(\frac{\partial^3 f}{\partial r\,\partial c^2}\right)^2 + \left(\frac{\partial^3 f}{\partial c^3}\right)^2 \]
It should be clear from this example that the binomial coefficients arise because of the commutativity of the partial differential operators.
Theorem 8.1: The sum of the squares of all partial derivatives of the same order is isotropic. Let \(f : E^N \to E\) be \(C^\infty\), \(P\) be an \(N \times N\) orthonormal matrix, and \(y = Px\), where the \((n,k)\)th entry of \(P\) is \(p_{nk}\). Let \(M\) be any positive integer. Then

\[ \sum_{n_1=1}^{N}\cdots\sum_{n_M=1}^{N}\left(\frac{\partial^M f}{\partial y_{n_1}\cdots\partial y_{n_M}}\right)^2 = \sum_{k_1=1}^{N}\cdots\sum_{k_M=1}^{N}\left(\frac{\partial^M f}{\partial x_{k_1}\cdots\partial x_{k_M}}\right)^2 \]
Proof:
Since \(y = Px\), \(\partial y_n/\partial x_k = p_{nk}\), so that

\[ \frac{\partial f}{\partial x_k} = \sum_{n=1}^{N} \frac{\partial f}{\partial y_n}\, p_{nk} \]

To make our notation simpler, we denote \(\partial f/\partial y_n\) by \(f_n(y)\) and, more generally, \(\partial^M f/\partial y_{j_1}\cdots\partial y_{j_M}\) by \(f_{j_1\cdots j_M}(y)\), and can therefore write

\[ \frac{\partial^M f}{\partial x_{k_1}\cdots\partial x_{k_M}} = \sum_{j_1=1}^{N}\cdots\sum_{j_M=1}^{N} f_{j_1\cdots j_M}(y)\, p_{j_1 k_1}\cdots p_{j_M k_M} \]

Then, squaring and summing over \(k_1,\ldots,k_M\) and using the orthonormality of \(P\), \(\sum_{k=1}^{N} p_{jk}\,p_{j'k} = \delta_{jj'}\), each of the \(M\) sums over \(k_m\) collapses the corresponding pair of indices. Therefore

\[ \sum_{k_1=1}^{N}\cdots\sum_{k_M=1}^{N}\left(\frac{\partial^M f}{\partial x_{k_1}\cdots\partial x_{k_M}}\right)^2 = \sum_{j_1=1}^{N}\cdots\sum_{j_M=1}^{N}\left[f_{j_1\cdots j_M}(y)\right]^2 \]

which is the required equality.
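A quick numerical check of the second-order case of the theorem can be made with a quadratic surface, whose Hessian transforms as PHPᵀ under the orthonormal change of variables; the matrices below are arbitrary illustrative values.

```python
import numpy as np

# Numerical check of the M = 2 case of Theorem 8.1: the sum of squares of
# the second partials (the squared Frobenius norm of the Hessian) is
# unchanged by an orthonormal change of coordinates y = Px.
def sum_sq_second_partials(H):
    return H[0, 0]**2 + 2 * H[0, 1]**2 + H[1, 1]**2

H = np.array([[2.0, -1.0], [-1.0, 3.0]])           # Hessian of some quadratic
theta = np.deg2rad(37.0)
P = np.array([[np.cos(theta),  np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])     # orthonormal P
H_rot = P @ H @ P.T                                 # Hessian in rotated frame
print(sum_sq_second_partials(H), sum_sq_second_partials(H_rot))  # equal
```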
sequence than those neighboring the sequence. Significantly higher or lower may
depend on the distribution of brightness values surrounding the sequence, as well
as on the length of the sequence.
Ridges and ravines may arise from dark or bright lines in an image or from gray level variations related to reflections or to three-dimensional surface structure. For elongated objects that have curved surfaces with a specular reflectance function, the locus of points on their surfaces having surface normals pointing in the direction of the camera generates pixels on a digital image that are ridges. Linearly narrow concavities on an object surface (such as cracks) are typically in shadow and generate pixels on a digital image that are ravines. Line and curve finding plays a universal role in object analysis. Therefore an important part of a computer vision algorithm lies in the detection of ridge and ravine pixels.
The facet model can be used to help accomplish ridge and ravine identification.
To use the facet model, we must first translate our notion of ridge and ravine to
the continuous-surface perspective. Here the concept of line translates in terms of
directional derivatives. If we picture ourselves walking by the shortest distance across
a ridge or ravine, we would walk in the direction having the greatest magnitude of
second directional derivative. The ridge peak or the ravine bottom would occur
when the first directional derivative has a zero crossing.
Thus to label pixels as ridges and ravines, we need to use the neighborhood
of a pixel to estimate a continuous surface whose directional derivatives we can
compute analytically. To do this, we can use a functional form consisting of a cubic
polynomial in the two variables row and column, just as we did for the facet edge
operator.
From this it follows that the second directional derivative in direction \(\alpha\) can be expressed by

\[ f''_\alpha = \frac{\partial^2 f}{\partial r^2}\sin^2\alpha + 2\frac{\partial^2 f}{\partial r\,\partial c}\sin\alpha\cos\alpha + \frac{\partial^2 f}{\partial c^2}\cos^2\alpha \]

Rearranging the expression for \(f''_\alpha\), we find that the second directional derivative can be expressed as a linear combination of two terms, the first being the Laplacian of f and not depending on \(\alpha\) and the second depending on \(\alpha\):

\[ f''_\alpha = \frac{1}{2}\left(\frac{\partial^2 f}{\partial r^2} + \frac{\partial^2 f}{\partial c^2}\right) + \frac{1}{2}\left[\left(\frac{\partial^2 f}{\partial c^2} - \frac{\partial^2 f}{\partial r^2}\right)\cos 2\alpha + 2\frac{\partial^2 f}{\partial r\,\partial c}\sin 2\alpha\right] \qquad (8.84) \]
Therefore, setting \(\partial f''_\alpha/\partial\alpha = 0\), the extremizing angles \(\alpha\) satisfy

\[ \sin 2\alpha = \pm\,2\frac{\partial^2 f}{\partial r\,\partial c}\Big/D \qquad\text{and}\qquad \cos 2\alpha = \pm\left(\frac{\partial^2 f}{\partial c^2} - \frac{\partial^2 f}{\partial r^2}\right)\Big/D \]

where

\[ D = \sqrt{\left(\frac{\partial^2 f}{\partial c^2} - \frac{\partial^2 f}{\partial r^2}\right)^2 + 4\left(\frac{\partial^2 f}{\partial r\,\partial c}\right)^2} \]

When the plus signs are taken,

\[ f''_\alpha = \frac{1}{2}\left(\frac{\partial^2 f}{\partial r^2} + \frac{\partial^2 f}{\partial c^2}\right) + \frac{D}{2} \]

indicating that the extremum is a relative minimum, and when the minus signs are taken,

\[ f''_\alpha = \frac{1}{2}\left(\frac{\partial^2 f}{\partial r^2} + \frac{\partial^2 f}{\partial c^2}\right) - \frac{D}{2} \]

indicating that the extremum is a relative maximum. Also, the direction \(\alpha\) that makes \(f''_\alpha\) an extremum is the direction in which the cross-section of the surface is examined.
Letting

\[ r = \rho\sin\alpha \qquad\text{and}\qquad c = \rho\cos\alpha \]

the cubic fit evaluated along the line through the origin in direction \(\alpha\) is a cubic polynomial in \(\rho\), \(f_\alpha(\rho) = k_1 + C\rho + B\rho^2 + A\rho^3\), where

\[ A = k_7\sin^3\alpha + k_8\sin^2\alpha\cos\alpha + k_9\sin\alpha\cos^2\alpha + k_{10}\cos^3\alpha \]
\[ B = k_4\sin^2\alpha + k_5\sin\alpha\cos\alpha + k_6\cos^2\alpha \]
\[ C = k_2\sin\alpha + k_3\cos\alpha \]

At those positions for \(\rho\) that make \(f'_\alpha(\rho) = 0\), the value of \(f''_\alpha(\rho)\) is proportional to the curvature of the surface. If \(B^2 - 3AC > 0\), the largest-magnitude root \(\rho_L\) can be found by

\[ \rho_L = \frac{-B - \operatorname{sign}(B)\sqrt{B^2 - 3AC}}{3A} \qquad (8.91) \]

and the other root by

\[ \rho_S = \frac{C}{3A\rho_L} \qquad (8.92) \]
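A small sketch of Eqs. (8.91) and (8.92) follows, assuming the direction-dependent coefficients A, B, and C have already been evaluated; the numerically stable form computes the larger-magnitude root directly and the other from the product of the roots.

```python
import math

# Locate the two zero crossings of f'(rho) = C + 2*B*rho + 3*A*rho^2
# following Eqs. (8.91) and (8.92): the larger-magnitude root directly,
# the other from the product of the roots, which avoids cancellation.
def cross_section_extrema(A, B, C):
    disc = B * B - 3.0 * A * C
    if A == 0.0 or disc <= 0.0:
        return None                     # no pair of distinct extrema
    rho_L = (-B - math.copysign(math.sqrt(disc), B)) / (3.0 * A)
    rho_S = C / (3.0 * A * rho_L)
    return rho_L, rho_S
```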
interval, i the location of the inflection point, and v the location of the relative minimum. If the relative minimum occurs to the left of the inflection point, the depth of the minimum is defined by

\[ \text{Depth} = \min\{f_p(b), f_p(i)\} - f_p(v) \]

If the relative minimum occurs to the right of the inflection point, then the depth is defined by

\[ \text{Depth} = \min\{f_p(a), f_p(i)\} - f_p(v) \]

The relative depth of the minimum is then defined by

\[ \text{Relative depth} = \text{depth}/\text{range} \]
Figure 8.23 (a) Original image; (b) first isotropic derivative; (c) second isotropic derivative; and (d) third isotropic derivative.
Table 8.4  For each neighborhood size and one-dimensional data pattern: the position of the extremum closest to the origin, the position of the inflection point, and the relative depth.
inflection point is not too far from the origin, indicating a relatively high-frequency ringing. Also, for all these cases the relative depth, which measures the significance of the ringing, is close to zero.
Compare this situation to the case in which the data clearly have an extremum near the origin, as in the one-dimensional patterns of Table 8.4. Notice that in these cases the inflection points tend to be much farther away from the extremum, indicating that the "ringing" behavior has a lower frequency. Also, the relative depth tends to be much larger in the case of a true extremum.
What all this means is that in using the fitted cubic to evaluate derivatives, we must understand and interpret the data through the "eyes" of a cubic polynomial. Not all extrema of the cubic are significant. A cubic extremum is significant in the sense that it reflects an extremum in the fitted data only if it passes the following tests:
1. The position of the extremum is sufficiently close to the origin.
2. The relative depth of the extremum is sufficiently large.
Invariance Requirement
A digital image can be obtained with a variety of sensing-camera gain settings. It can be visually enhanced by an appropriate adjustment of the camera's dynamic range. The gain-setting, or enhancing, point operator changes the image by some monotonically increasing function that is not necessarily linear. For example, nonlinear, enhancing point operators of this type include histogram normalization and equal probability quantization.
In visual perception, exactly the same visual interpretation and understanding of a pictured scene occurs whether the camera's gain setting is low or high and whether the image is enhanced or unenhanced. The only difference is that the enhanced image has more contrast, is nicer to look at, and can be understood more quickly by the eye.
The last fact is important because it suggests that many of the current low-level computer vision techniques, which are based on edges, cannot ever hope to have the robustness associated with human visual perception. They cannot have the robustness because they are inherently incapable of invariance under monotonic transformations. For example, edges based on zero crossings of second derivatives will change in position as the monotonic gray level transformation changes because
convexity of a gray level intensity surface is not preserved under such transformations. However, the topographic categories of peak, pit, ridge, valley, saddle, flat,
and hillside do have the required invariance.
Background
Marr (1976) argues that the first level of visual processing is the computation of a
rich description of gray level changes present in an image, and that all subsequent
computations are done in terms of this description, which he calls the primal sketch.
Gray level changes are usually associated with edges, and Marr's primal sketch
has, for each area of gray level change, a description that includes type, position,
orientation, and fuzziness of edge. Marr (1980) illustrates that from this information
it is sometimes possible to reconstruct the image to a reasonable degree.
The topographic primal sketch has richness and invariance and is very much
in the spirit of Marr's primal sketch and the thinking behind Ehrich's relational
trees (Ehrich and Foith 1978). Instead of concentrating on gray level changes as edges, as Marr does, or on one-dimensional extrema, as Ehrich and Foith do, we
concentrate on all types of two-dimensional gray level variations. We consider each
area on an image to be a spatial distribution of gray levels that constitutes a surface
or facet of gray level intensities having a specific surface shape. It is likely that, if
we could describe the shape of the gray level intensity surface for each pixel, then by
assembling all the shape fragments we could reconstruct, in a relative way, the entire
surface of the image's gray level intensity values. The shapes that have the invariance
property are peak, pit, ridge, ravine, saddle, flat, and hillside, with hillside having
noninvariant subcategories of slope, inflection, saddle hillside, convex hillside, and
concave hillside.
Knowing that a pixel's surface has the shape of a peak does not tell us precisely
where in the pixel the peak occurs; nor does it tell us the height of the peak or
the magnitude of the slope around the peak. The topographic labeling, however,
does satisfy Marr's (1976) primal sketch requirement in that it contains a symbolic
description of the gray level intensity changes. Furthermore, upon computing and
binding to each topographic label numerical descriptors such as gradient magnitude
and direction, as well as directions of the extrema of the second directional derivative
along with their values, we can obtain a reasonable absolute description of each
surface shape.
‖∇f‖ = gradient magnitude
ω⁽¹⁾ = unit vector in the direction in which the second directional derivative has the greatest magnitude
ω⁽²⁾ = unit vector orthogonal to ω⁽¹⁾
λ₁ = value of the second directional derivative in the direction of ω⁽¹⁾
λ₂ = value of the second directional derivative in the direction of ω⁽²⁾
∇f · ω⁽¹⁾ = value of the first directional derivative in the direction of ω⁽¹⁾
∇f · ω⁽²⁾ = value of the first directional derivative in the direction of ω⁽²⁾

Without loss of generality we assume |λ₁| ≥ |λ₂|. The directions ω⁽¹⁾ and ω⁽²⁾ are the eigenvectors of the 2 × 2 Hessian matrix H, and λ₁ and λ₂ are the corresponding eigenvalues. Thus

\[ H\omega^{(i)} = \lambda_i\,\omega^{(i)}, \qquad i = 1, 2 \]

Furthermore, the two directions represented by the eigenvectors are orthogonal to each other. Since H is a 2 x 2 symmetric matrix, calculation of the eigenvalues and eigenvectors can be done efficiently and accurately by using the method of Rutishauser (1971). We may obtain the values of the first directional derivative by simply taking the dot product of the gradient with the appropriate eigenvector:

\[ \nabla f \cdot \omega^{(i)} \]
Peak
A peak (knob) occurs where there is a local maximum in all directions. In other
words, we are on a peak if, no matter what direction we look in, we see no point
that is as high as the one we are on (Fig. 8.24). The curvature is downward in
all directions. At a peak the gradient is zero, and the second directional derivative
is negative in all directions. To test whether the second directional derivative is
negative in all directions, we simply examine the value of the second directional derivative in the directions that make it smallest and largest. A point is therefore classified as a peak if it satisfies the following conditions:

\[ \|\nabla f\| = 0, \quad \lambda_1 < 0, \quad \lambda_2 < 0 \]
Pit
A pit (sink, bowl) is identical to a peak except that it is a local minimum in all directions rather than a local maximum. At a pit the gradient is zero, and the second directional derivative is positive in all directions. A point is classified as a pit if it satisfies the following conditions:

\[ \|\nabla f\| = 0, \quad \lambda_1 > 0, \quad \lambda_2 > 0 \]
Ridge
Figure 8.25 Saddle surface with ridge and ravine lines and saddle hillsides.
A geometric way of thinking about the ridge definition is to realize that the condition ∇f · ω⁽¹⁾ = 0 means that the gradient direction (which is defined for nonzero gradients) is orthogonal to the direction ω⁽¹⁾ of extremized curvature.
Ravine
A ravine (valley) is identical to a ridge except that it is a local minimum (rather
than maximum) in one direction. As we walk along the ravine line, the points to the
right and left of us are higher than the one we are on (see Fig. 8.25). A point is
classified as a ravine if it satisfies any one of the following three sets of conditions:
Saddle
A saddle occurs where there is a local maximum in one direction and a local minimum in a perpendicular direction (see Fig. 8.25). A saddle must therefore have a positive curvature in one direction and a negative curvature in a perpendicular direction. At a saddle the gradient magnitude must be zero, and the extrema of the second directional derivative must have opposite signs. A point is classified as a saddle if it satisfies the following conditions:

\[ \|\nabla f\| = 0, \quad \lambda_1 \cdot \lambda_2 < 0 \]
Flat
A flat (plain) is a simple, horizontal surface (Fig. 8.26). It therefore must have
a zero gradient and no curvature. A point is classified as a flat if it satisfies the
following conditions:
\[ \|\nabla f\| = 0, \quad \lambda_1 = 0, \quad \lambda_2 = 0 \]
Given that these conditions are true, we may further classify a flat as a foot or
a shoulder. A foot occurs at the point where the flat just begins to turn up into a
hill. At this point the third directional derivative in the direction toward the hill is
nonzero, and the surface increases in this direction. The shoulder is an analogous
case and occurs where the flat is ending and turning down into a hill. At this point the
maximum magnitude of the third directional derivative is nonzero, and the surface
decreases in the direction toward the hill. If the third directional derivative is zero
in all directions, then we are in a flat, not near a hill. Thus a flat may be further
qualified as being a foot or a shoulder or not qualified at all.
Hillside
A hillside point is anything not covered by the previous categories. It has a nonzero gradient and no strict extrema in the directions of maximum and minimum second directional derivative. If the hill is simply a tilted flat (i.e., has a constant gradient), we call it a slope. If its curvature is positive (upward), we call it a convex hill. If its curvature is negative (downward), we call it a concave hill. If the curvature is up in one direction and down in a perpendicular direction, we call it a saddle hill. A saddle hill is illustrated in Fig. 8.25. If the curvature is 0 in all directions, the area is called a flat. The flat, the convex hill, and the concave hill are illustrated in Fig. 8.26.
A point on a hillside is an inflection point if it has a zero crossing of the second directional derivative taken in the direction of the gradient. When the slope of the second directional derivative is negative, the inflection point class is the same as the step edge defined in Section 8.8.
To determine whether a point is a hillside, we simply take the complement of
the disjunction of the conditions given for all the previous classes. Thus if there
is no curvature, then the gradient must be nonzero. If there is curvature, then the
point must not be a relative extremum. Therefore a point is classified as a hillside
if all three sets of the following conditions are true (→ represents the operation of logical implication):

\[ \lambda_1 = \lambda_2 = 0 \;\to\; \|\nabla f\| \ne 0 \]
and
\[ \lambda_1 \ne 0 \;\to\; \nabla f \cdot \omega^{(1)} \ne 0 \]
and
\[ \lambda_2 \ne 0 \;\to\; \nabla f \cdot \omega^{(2)} \ne 0 \]

Rewritten as a disjunction rather than a conjunction of clauses, a point is classified as a hillside if any one of the following four sets of conditions is true:

\[ \nabla f \cdot \omega^{(1)} \ne 0 \ \text{ and }\ \nabla f \cdot \omega^{(2)} \ne 0 \]
or
\[ \nabla f \cdot \omega^{(1)} \ne 0 \ \text{ and }\ \lambda_2 = 0 \]
or
\[ \nabla f \cdot \omega^{(2)} \ne 0 \ \text{ and }\ \lambda_1 = 0 \]
or
\[ \|\nabla f\| \ne 0 \ \text{ and }\ \lambda_1 = \lambda_2 = 0 \]
We can differentiate between classes of hillsides by the values of the second directional derivative. The distinction can be made as follows:

Slope if λ₁ = λ₂ = 0
Convex if λ₁ ≥ λ₂ ≥ 0, λ₁ ≠ 0
Concave if λ₁ ≤ λ₂ ≤ 0, λ₁ ≠ 0
Saddle if λ₁ · λ₂ < 0
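The conditions above translate directly into a decision procedure. The sketch below assumes the gradient vector and Hessian matrix have already been estimated at the pixel (for example, from the cubic facet fit) and uses a hypothetical tolerance eps in place of the exact zero tests, which noisy data cannot satisfy.

```python
import numpy as np

# Sketch of the topographic labeling rules given in the text. grad is the
# 2-vector gradient, H the 2x2 Hessian; eps is an illustrative tolerance.
def topographic_label(grad, H, eps=1e-6):
    lam, W = np.linalg.eigh(H)
    order = np.argsort(-np.abs(lam))       # order so |lam1| >= |lam2|
    lam1, lam2 = lam[order[0]], lam[order[1]]
    w1 = W[:, order[0]]
    if np.linalg.norm(grad) < eps:         # zero-gradient categories
        if lam1 < -eps and lam2 < -eps:
            return "peak"
        if lam1 > eps and lam2 > eps:
            return "pit"
        if lam1 * lam2 < -eps * eps:
            return "saddle"
        if abs(lam1) < eps and abs(lam2) < eps:
            return "flat"
        return "ridge" if lam1 < 0 else "ravine"
    if abs(grad @ w1) < eps and abs(lam1) > eps:
        return "ridge" if lam1 < 0 else "ravine"
    return "hillside"                      # subclassify by lam1, lam2 signs
```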
Table 8.5  Mathematical properties of topographic structures.

‖∇f‖   λ1   λ2   ∇f·ω⁽¹⁾   ∇f·ω⁽²⁾   Label
0      -    -    0         0         Peak
0      -    0    0         0         Ridge
0      -    +    0         0         Saddle
0      0    0    0         0         Flat
0      +    -    0         0         Saddle
0      +    0    0         0         Ravine
0      +    +    0         0         Pit
+      -    -    -,+       *         Hillside (concave)
+      -    -    0         *         Ridge
+      -    0    0         *         Ridge
+      -    0    -,+       *         Hillside (concave)
+      -    +    -,+       -,+       Hillside (saddle)
+      0    0    *         *         Hillside (slope)
+      +    -    -,+       -,+       Hillside (saddle)
+      +    0    -,+       *         Hillside (convex)
+      +    0    0         *         Ravine
+      +    +    0         *         Ravine
+      +    +    -,+       *         Hillside (convex)
+      *    *    0         0         Cannot occur

(Here -,+ denotes a value that may be either negative or positive, i.e., nonzero; * denotes a value that may be negative, zero, or positive.)
From the table one can see that our classification scheme is complete. All possible combinations of first and second directional derivatives have a corresponding
entry in the table. Each topographic category has a set of mathematical properties
that uniquely determines it.
(Note: Special attention is required for the degenerate case λ₁ = λ₂ ≠ 0, which implies that ω⁽¹⁾ and ω⁽²⁾ can be any two orthogonal directions. In this case there always exists an extreme direction ω that is orthogonal to ∇f, and thus the first directional derivative ∇f · ω is always zero in an extreme direction. To avoid spurious zero directional derivatives, we choose ω⁽¹⁾ and ω⁽²⁾ such that ∇f · ω⁽¹⁾ ≠ 0 and ∇f · ω⁽²⁾ ≠ 0, unless the gradient is zero.)
Let the original underlying gray level surface be f(r, c). Let w be a monotonically increasing gray level transformation, and let g(r, c) denote the transformed image: g(r, c) = w[f(r, c)]. It is directly derivable that for any direction β,

\[ g'_\beta(r,c) = w'[f(r,c)] \cdot f'_\beta(r,c) \]

from which we obtain that every directional derivative of g is the corresponding directional derivative of f rescaled by the positive factor w'[f(r, c)]. Let us fix a position (r, c). Since w is a monotonically increasing function, w' is positive. In particular, w' is not zero. Hence the direction β that maximizes g'_β also maximizes f'_β, thereby showing that the gradient directions are the same. The categories peak, pit, ridge, ravine, saddle, and flat all have in common the essential property that the first directional derivative is zero when taken in a direction that extremizes the second directional derivative. To see the invariance, let ω be an extremizing direction of the second directional derivative. Then for points (r, c) having a label (peak, pit, ridge, ravine, saddle, or flat), f'_ω(r, c) = 0 and ∂f''_ω(r, c)/∂ω = 0.
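The rescaling argument can be checked numerically: applying a monotonically increasing point transform to a synthetic surface leaves the finite-difference gradient direction essentially unchanged wherever the gradient is not near zero. The surface and the transform w below are arbitrary illustrative choices.

```python
import numpy as np

# Monotonic point transforms rescale every directional derivative by the
# positive factor w'(f), so the gradient direction is invariant. Checked
# here with finite differences on an arbitrary synthetic surface.
r, c = np.meshgrid(np.linspace(-2, 2, 101), np.linspace(-2, 2, 101),
                   indexing="ij")
f = np.sin(r) + 0.5 * c**2
g = np.tanh(2.0 * f) + 3.0                  # w(t) = tanh(2t) + 3, increasing
fr, fc = np.gradient(f)
gr, gc = np.gradient(g)
dot = fr * gr + fc * gc
norms = np.hypot(fr, fc) * np.hypot(gr, gc)
angle = np.arccos(np.clip(dot / (norms + 1e-30), -1.0, 1.0))
mask = np.hypot(fr, fc) > 1e-2              # ignore near-zero gradients
print(np.degrees(angle[mask]).max())        # small, up to finite differences
```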
or in fact any function of the form h(r² + c²). In the case of the cone, the gradient is proportional to (r, c), and the unnormalized eigenvectors corresponding to the eigenvalues

\[ \lambda(r,c) = (r^2 + c^2)^{-1/2} \quad\text{and}\quad 0 \]

are (-c, r) and (r, c), respectively. The eigenvector corresponding to the nonzero eigenvalue is orthogonal to the gradient direction. The entire surface of the inverted cone, except for the apex, is a ravine. Other, nonradially symmetric examples exist as well.
The identification of points that are really ridge or ravine continua can be made
as a postprocessing step. Points that are labeled ridge or ravine and have neighboring
points in a direction orthogonal to the gradient that are also labeled ridge or ravine
are ridge or ravine continua. These continua can be reclassified as hillsides if the
label of ridge or ravine does not make sense in the application.
strict, local, one-dimensional extremum has occurred (flat and hillside). By one-dimensional, we mean along a line (in a particular direction). A strict, local, one-dimensional extremum can be located by finding those points within a pixel's area where a zero crossing of the first directional derivative occurs.
So that we do not search the pixel's entire area for the zero crossing, we search only in the directions of extreme second directional derivative, ω⁽¹⁾ and ω⁽²⁾. Since these directions are well aligned with curvature properties, the chance of overlooking an important topographic structure is minimized, and more important, the computational cost is small.
When λ₁ = λ₂ ≠ 0, the directions ω⁽¹⁾ and ω⁽²⁾ are not uniquely defined. We handle this case by searching for a zero crossing in the direction given by H⁻¹∇f. This is the Newton direction, and it points directly toward the extremum of a quadratic surface.
For inflection-point location (first derivative extremum), we search along the gradient direction for a zero crossing of the second directional derivative. For one-dimensional extrema, there are four cases to consider: (1) no zero crossing, (2) one zero crossing, (3) two zero crossings, and (4) more than two zero crossings of the first directional derivative. The next four sections discuss these cases.
Table 8.6

‖∇f‖   λ1   λ2   Label
0      0    0    Flat
+      -    -    Concave hill
+      -    0    Concave hill
+      -    +    Saddle hill
+      0    0    Slope
+      +    -    Saddle hill
+      +    0    Convex hill
+      +    +    Convex hill
Table 8.7

‖∇f‖   λ1   λ2   Label
0      -    -    Peak
0      -    0    Ridge
0      -    +    Saddle
0      +    -    Saddle
0      +    0    Ravine
0      +    +    Pit
Table 8.8
LABELl
LABEL2
Resulting Label
Peak
Peak
Peak
Ridge
Pit
Ravine
Saddle
Ridge
Ravine
Saddle
Ravine
Saddle
Peak
Peak
Pit
Pit
Saddle
Ridge
Ridge
Ridge
Ravine
Ravine
443
Pit
Pit
Saddle
Ridge
Saddle
Saddle
Ravine
Saddle
is a local maximum in only one direction. Thus peak conveys more information
about the image surface. An analogous argument can be made if the labels are pit
and ravine. Similarly, a saddle gives us more information than a ridge or a ravine.
Thus a pixel is assigned "saddle" if its zero crossings have been labeled ridge and
saddle or ravine and saddle.
It is apparent from Table 8.8 that not all possible label combinations are accounted for. Some combinations, such as peak and pit, are omitted because of the
assumption that the underlying surface is smooth and sampled frequently enough
that a peak and a pit will not both occur within the same pixel's area. If such a case
does occur, our convention is to choose arbitrarily either LABEL1 or LABEL2 as
the resulting label for the pixel.
Case 4: More than Two Zero Crossings
If more than two zero crossings occur within a pixel's area, then in at least one
of the extrema directions there are two zero crossings. If this happens, we choose
the zero crossing closest to the pixel's center and ignore the other. If we ignore the
further zero crossings, then this case is identical to case 3. This situation has yet to
occur in our experiments.
Previous Work
Detection of topographic structures in a digital image is not a new idea. A wide
variety of techniques for detecting pits, peaks, ridges, ravines, and the like have
been described.
Peuker and Johnston (1972) characterize the surface shape by the sequence of positive and negative differences as successive surrounding points are compared with the central point. Peuker and Douglas (1975) describe several variations of this method for detecting one of the shapes from the set (pit, peak, pass, ridge, ravine, break, slope, flat). They start with the most frequent feature (slope) and proceed to the less frequent, thus making it an order-dependent algorithm.
Johnston and Rosenfeld (1975) attempt to find peaks by finding all points P such that no points in an n x n neighborhood surrounding P have greater elevation than P. Pits are found in an analogous manner. To find ridges, they identify points that are either east-west or north-south elevation maxima. This is done by using a "smoothed" array in which each point is given the highest elevation in a 2 x 2 square containing it. East-west and north-south maxima are also found on this array. Ravines are found in a similar manner.
Paton (1975) uses a six-term quadratic expansion in Legendre polynomials fitted to a small disk around each pixel. The most significant coefficients of the second-order polynomial yield a descriptive label chosen from the set (constant, ridge, valley, peak, bowl, saddle, ambiguous). He uses the continuous-fit formulation in setting up the surface fit equations as opposed to the discrete least-squares fit used in the facet model. The continuous fit is a more expensive computation than the discrete fit and results in a stepped approximation.
Grender's (1976) algorithm compares the gray level elevation of a central point
with surrounding elevations at a given distance around the perimeter of a circular
window. The radius of the window may be increased in successive passes through
the image. This topographic-labeling set consists of slope, ridge, valley, knob, sink,
and saddle.
Toriwaki and Fukumura (1978) take a totally different approach from all the others. They use two local features of gray level pictures, connectivity number and
coefficient of curvature, for classification of the pixel into peak, pit, ridge, ravine,
hillside, or pass. They then describe how to extract structural information from the
image once the labelings have been made. This structural information consists of
ridge lines, ravine lines, and the like.
Hsu, Mundy, and Beaudet (1978) use a quadratic surface approximation at every
point on the image surface. The principal axes of the quadratic approximation are
used as directions in which to segment the image. Lines emanating from the center
pixel in these directions thus provide natural boundaries of patches approximating
the surface. The authors then selectively generate the principal axes from some
critical points distributed over an image and interconnect them into a network to
get an approximation of the image data. In this network, which they call the web
representation, the axes divide the image into regions and show important features,
such as edges and peaks. They are then able to extract a set of primitive features
from the nodes of the network by mask matching. Global features, such as ridge
lines, are obtained by state transition rules.
Lee and Fu (1981) define a set of 3 x 3 templates that they convolve over the image to give each class, except the class they call plain, a figure of merit. Their set
of labels includes none, plain, slope, ridge, valley, foot, and shoulder. Thresholds
are used to determine into which class the pixel will fall. In their scheme a pixel may
satisfy the definition of zero, one, or more than one class. Ambiguity is resolved
by choosing the class with the highest figure of merit.
Exercises
8.1. Show that the unit vector in the direction of the gradient of f is

\[ \frac{1}{(f_r^2 + f_c^2)^{1/2}} \begin{pmatrix} f_r \\ f_c \end{pmatrix} \]

8.2. Using the previous result, show that the change in a unit vector in the direction of the gradient taken in a direction of angle θ (θ being clockwise with respect to the column axis) is given by

\[ \frac{1}{(f_r^2 + f_c^2)^{3/2}}\left[(\sin\theta \;\; \cos\theta)\begin{pmatrix} f_{rr} & f_{rc} \\ f_{rc} & f_{cc} \end{pmatrix}\begin{pmatrix} -f_c \\ f_r \end{pmatrix}\right]\begin{pmatrix} -f_c \\ f_r \end{pmatrix} \]

8.3. Show that the change in gradient direction taken in the direction of the gradient is given by

\[ \frac{1}{(f_r^2 + f_c^2)^{3/2}}\,(f_r \;\; f_c)\begin{pmatrix} f_{rr} & f_{rc} \\ f_{rc} & f_{cc} \end{pmatrix}\begin{pmatrix} -f_c \\ f_r \end{pmatrix} \]
CHAPTER
TEXTURE
Introduction
Texture is an important characteristic for the analysis of many types of images, from
multispectral scanner images obtained from aircraft or satellite platforms, which the
remote-sensing community analyzes, to microscopic images of cell cultures or tissue
samples, which the biomedical community analyzes, to the outdoor scenes and object
surfaces, which the machine vision community analyzes. Despite its importance and
ubiquity in image data, a formal approach or precise definition of texture does not
exist. The texture discrimination techniques are for the most part ad hoc. In this
chapter we discuss some of the extraction techniques and models that have been
used to measure textural properties and to compute three-dimensional shape from
texture.
Ehrich and Foith (1978) summarize the following issues in texture analysis:
1. Given a textured region, determine to which of a finite number of classes the
region belongs.
2. Given a textured region, determine a description or model for it.
3. Given an image having many textured areas, determine the boundaries between
the differently textured regions.
Issue 1 has to do with the pattern recognition task of textural feature extraction.
Issue 2 has to do with generative models of texture. Issue 3 has to do with using
what we know about issues 1 and 2 to perform a texture segmentation of an image.
In the remainder of this section we provide a brief historical elaboration of issues 1
and 2.
Early work in image texture analysis sought to discover useful features that had
some relationship to the fineness and coarseness, contrast, directionality, roughness,
and regularity of image texture. Tamura, Mori, and Yamawaki (1978) discuss the
relationship of such descriptive measures to human visual perception. Typically, an
image known to be texturally homogeneous was analyzed, and the problem was to
measure textural features by which the image could be classified. For example, with
microscopic imagery, analysts discriminated between eosinophils and large lymphocytes by using a textural feature for cytoplasm and a shape feature for cell nucleus
(Bacus and Gose, 1972). With aerial imagery, they discriminated between areas having natural vegetation and trees and areas having manmade objects, buildings, and
roads by using textural features (Haralick and Shanmugam, 1973). These statistical
textural-feature approaches included use of the autocorrelation function, the spectral power density function, edgeness per unit area, spatial gray level co-occurrence
probabilities, gray level run-length distributions, relative extrema distributions, and
mathematical morphology.
Later approaches to image texture analysis sought a deeper understanding of
what image texture is by the use of a generative image model. Given a generative
model and the values of its parameters, one can synthesize homogeneous image
texture samples associated with the model and the given value of its parameters.
This association provides a theoretical and visual means of understanding the texture. Image texture analysis then amounts to verification and estimation. First one
must verify that a given image texture sample is consistent with or fits the model.
Then one must estimate the values of the model parameters on the basis of the
observed sample. Autoregressive, moving-average, time-series models (extended
to two dimensions), Markov random fields, and mosaic models are examples of
model-based techniques.
The image texture we consider is nonfigurative and cellular. We think of this
kind of texture as an organized-area phenomenon. When it is decomposable, it
has two basic dimensions on which it may be described. The first dimension is
concerned with the gray level primitives or local properties constituting the image
texture, and the second dimension is concerned with the spatial organization of the
gray level primitives. The basic textural unit of some textural primitives in their
defining spatial relationships is sometimes called a texel, for texture element.
Gray level primitives are regions with gray level properties. The gray level
primitive can be described in terms such as the average level or the maximum and
minimum levels of its region. A region is a maximally connected set of pixels having
a given gray level property. The gray level region can be evaluated in terms of its
area and shape. The gray level primitive includes both its gray level and gray level
region properties.
An image texture is described by the number and types of its primitives and
their spatial organization or layout. The spatial organization may be random, may
have a pairwise dependence of one primitive on a neighboring primitive, or may
have a dependence of n primitives at a time. The dependence may be structural,
probabilistic, or functional (like a linear dependence).
Image texture can be qualitatively evaluated as having one or more of the
properties of fineness, coarseness, smoothness, granulation, randomness, lineation, or as being mottled, irregular, or hummocky. Each of these qualities translates into some property of the gray level primitives and the spatial interaction between them.
between them. Unfortunately, few investigators have attempted experiments to map
semantic meaning into precise properties of gray level primitives and their spatial
distribution.
When the gray level primitives are small in size and the spatial interaction
between gray level primitives is constrained to be very local, the resulting texture is
a microtexture. The simplest example of a microtexture occurs when independent
Gaussian noise is added to each pixel's value in a smooth gray level area (Fig. 9.1
a). As the Gaussian noise becomes more correlated, the texture becomes more of a
microtexture. Finally, when the gray level primitives begin to have their own distinct
shape and regular organization, the texture becomes a macrotexture. This illustrates
an important point. Texture cannot be analyzed without a frame of reference in which
a gray level primitive is stated or implied. For any textural surface there exists a
scale at which, when the surface is examined, it appears smooth and textureless.
Then as resolution increases, the surface appears as a fine texture and then a coarse
one, and for multiple-scale textural surfaces the cycle of smooth, fine, and coarse
may repeat.
Figure 9.1(a) shows a white-noise microtexture. Figure 9.1(b) is a box-filtered
defocusing of the image in Part (a). It illustrates the beginning organization of
some gray level primitives larger in size than the individual pixel. Parts (c) and (d)
are examples of textures in which the gray level primitives have become regions.
Parts (e) and (f) show textures in which the gray level primitives themselves have
their own identifiable shape properties. These constitute macrotextures.
To use the gray level and textural pattern elements objectively, we must explicitly define the concepts of gray level and texture. Doing so, we discover that
gray level and texture are not independent concepts. They bear an inextricable relationship to each other very much as do a particle and a wave. Whatever exists
has both particle and wave properties, and depending on the situation, either particle or wave properties may predominate. Similarly, in the image context both gray
level and texture are always there, although at times one property can dominate the
other, and we tend to speak of only gray level or only texture. Hence, when we explicitly define gray level and texture, we are defining not two concepts but one gray
level-texture concept.
The basic interrelationships in the gray level-texture concept are the following:
When a small-area patch of an image has little variation of gray level primitives,
the dominant property of that area is gray level. When a small-area patch has wide
variation of gray level primitives, the dominant property is texture. Crucial in this
distinction are the size of the small-area patch, the relative sizes and types of gray
level primitives, and the number and placement or arrangement of the distinguishable
primitives. As the number of distinguishable gray level primitives decreases, the
gray level properties will predominate. In fact, when the small-area patch is only
the size of a pixel, so that there is only one discrete feature, the only property
present is simple gray level. As the number of distinguishable gray level primitives
increases within the small-area patch, the texture property will predominate. When
the spatial pattern in the gray level primitives is random and the gray level variation
between primitives is wide, a fine texture results. As the spatial pattern becomes
more definite and the gray level regions involve more and more pixels, a coarser
texture results.
Figure 9.1 Four different kinds of texture: (a) white-noise texture; (b) colored-noise texture determined by box filtering the white-noise texture; (c) and (d) more macrotextures; (e) and (f) textures in which the primitives begin to have their own identifiable shape properties.
Figure 9.2 The set of all distance-1 horizontal neighboring resolution cells on a 4 x 4 image.
image levels, would be used to calculate a distance-1 horizontal spatial gray level
dependence matrix.
Formally, for angles quantized to 45° intervals, the unnormalized frequencies are defined by

\[ P(i,j;d,0^\circ) = \#\{((k,l),(m,n)) : k - m = 0,\ |l - n| = d,\ I(k,l) = i,\ I(m,n) = j\} \]

\[ P(i,j;d,45^\circ) = \#\{((k,l),(m,n)) : (k - m = d,\ l - n = -d)\ \text{or}\ (k - m = -d,\ l - n = d),\ I(k,l) = i,\ I(m,n) = j\} \]

where # denotes the number of elements in the set.
Note that these matrices are symmetric: P(i, j; d, a) = P(j, i; d, a). The distance metric ρ implicit in these equations can be explicitly defined by ρ[(k,l),(m,n)] = max{|k - m|, |l - n|}. Other distances, such as Euclidean, could be used as well.
Consider Fig. 9.3(a), which represents a 4 x 4 image with four gray levels, ranging from 0 to 3. Figure 9.3(b) shows the general form of any gray level spatial dependence matrix. For example, the element in the (2,1)th position of the distance-1 horizontal P_H matrix is the total number of times two gray levels of value 2 and 1 occurred horizontally adjacent to each other. To determine this number, we count the number of pairs of resolution cells in R_H such that the first resolution cell of the pair has gray level 2 and the second resolution cell of the pair has gray level 1. In Fig. 9.3(c) through (f) we calculate all four distance-1 gray level spatial dependence matrices.
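As a sketch of this computation, the following counts distance-1 horizontal co-occurrences; each neighboring pair is counted in both orders, so the matrix comes out symmetric. The example array is an illustrative 4 x 4 image with gray levels 0 through 3, standing in for the figure, which is not reproduced here.

```python
import numpy as np

# Count distance-1 horizontal gray level co-occurrences. Counting each
# neighboring pair in both orders makes the matrix symmetric.
def cooccurrence(image, levels, dr=0, dc=1):
    P = np.zeros((levels, levels), dtype=int)
    rows, cols = image.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                P[image[r, c], image[r2, c2]] += 1
                P[image[r2, c2], image[r, c]] += 1
    return P

img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])               # illustrative 4x4, levels 0..3
print(cooccurrence(img, 4))
```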
Using features calculated from the co-occurrence matrix (see Fig. 9.4), Haralick
and Bosley (1973) performed a number of identification experiments. On a set
of aerial imagery and eight terrain classes (old residential, new residential, lake,
swamp, marsh, urban, railroad yard, scrub or wooded), an 82% correct identification
was obtained. On a LANDSAT Monterey Bay, California, image, an 84% correct
identification was obtained with the use of 64 x 64 subimages and both spectral
and textural features on seven terrain classes: coastal forest, woodlands, annual
grasslands, urban areas, large irrigated fields, small irrigated fields, and water. On
a set of sandstone photo-micrographs, an 89% correct identification was obtained on
five sandstone classes: Dexter-L, Dexter-H, St. Peter, Upper Muddy, and Gaskel.
Figure 9.3 Spatial co-occurrence calculations (Haralick, Shanmugam, and Dinstein, 1973).
The wide class of images in which they found that spatial gray level dependence
carries much of the texture information is probably indicative of the power and
generality of this approach.
The many possible co-occurrence features times the number of distance-angle relationships for which the co-occurrence matrices can be computed lead to a potentially large number of dependent features. Zucker (1980) suggests using only the distance that maximizes a chi-square statistic of P. Tou and Chang (1977) discuss an eigenvector-based feature extraction approach to help reduce the dimension of feature space.
The power of the gray level co-occurrence approach is that it characterizes the spatial interrelationships of the gray levels in a textural pattern and can do so in a way that is invariant under monotonic gray level transformations. Its weakness is that it does not capture the shape aspects of the gray level primitives. Hence it is not likely to work well for textures composed of large-area primitives. Also, it cannot capture the spatial relationships between primitives that are regions larger than a pixel.
Julesz (1961) was the first to use co-occurrence statistics in visual human texture discrimination experiments. Darling and Joseph (1968) use statistics obtained from nearest neighbor gray level transition probability matrices to measure textures using spatial intensity dependence in satellite images taken of clouds. Deutsch and Belknap (1972) use a variant of co-occurrence matrices to describe image texture. Zobrist and Thompson (1985) use co-occurrence statistics in a Gestalt grouping experiment. Bartels and Wied (1975) and Bartels, Bahr, and Wied (1969) use one-dimensional co-occurrence statistics for the analysis of cervical cells. Rosenfeld and Troy (1970), Haralick (1971), and Haralick, Shanmugam, and Dinstein (1973) suggest the use of spatial co-occurrence for arbitrary distances and directions. Galloway (1975) uses gray level run-length statistics to measure texture. These statistics are computable from co-occurrence if one assumes that the image is generated by a
are computable from co-occurrence if one assumes that the image is generated by a
Figure 9.4 Eight of the common features computed from the co-occurrence probabilities:

Uniformity of energy: \(\sum_i \sum_j P_{ij}^2\)
Entropy: \(-\sum_i \sum_j P_{ij}\log P_{ij}\)
Maximum probability: \(\max_{i,j} P_{ij}\)
Contrast: \(\sum_i \sum_j |i - j|^k (P_{ij})^\lambda\)
Inverse difference moment: \(\sum_i \sum_{j,\,j\ne i} (P_{ij})^\lambda / |i - j|^k\)
Correlation: \(\sum_i \sum_j (i - \mu)(j - \mu)P_{ij} / \sigma^2\), where \(\mu = \sum_i i\,P_i\) and \(P_i = \sum_j P_{ij}\)
Homogeneity: \(\sum_i \sum_j P_{ij} / (1 + |i - j|)\)
Cluster tendency: \(\sum_i \sum_j (i + j - 2\mu)^k P_{ij}\)
Markov process. Chen and Pavlidis (1979) use the co-occurrence matrix in conjunction with a split-and-merge algorithm to segment an image at textural boundaries. Tou and Chang (1977) use statistics from the co-occurrence matrix, followed by a principal-components eigenvector dimensionality-reduction scheme to reduce the dimensionality of the classification problem.
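Several of the Fig. 9.4 features can be computed compactly from a normalized co-occurrence matrix; the sketch below shows contrast for the common choice k = 2, λ = 1, and the function and key names are illustrative.

```python
import numpy as np

# Compute a few of the Fig. 9.4 features from a co-occurrence matrix.
# P is normalized to probabilities; i and j are gray level index grids.
def cooccurrence_features(P):
    P = P / P.sum()
    i, j = np.indices(P.shape)
    levels = np.arange(P.shape[0])
    Pi = P.sum(axis=1)                          # marginal distribution
    mu = (levels * Pi).sum()
    var = ((levels - mu) ** 2 * Pi).sum()
    nz = P > 0
    return {
        "energy": (P ** 2).sum(),
        "entropy": -(P[nz] * np.log(P[nz])).sum(),
        "max_probability": P.max(),
        "contrast": (np.abs(i - j) ** 2 * P).sum(),     # k = 2, lambda = 1
        "homogeneity": (P / (1.0 + np.abs(i - j))).sum(),
        "correlation": ((i - mu) * (j - mu) * P).sum() / var if var > 0 else 0.0,
    }
```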
Statistics that Haralick, Shanmugam, and Dinstein (1973) compute from such co-occurrence matrices of equal probability quantized images (see also Conners and Harlow, 1978) are used to analyze textures in satellite images (Haralick and Shanmugam, 1974). An 89% classification accuracy is obtained. Additional applications of this technique include the analysis of microscopic images (Haralick and Shanmugam, 1973), pulmonary radiographs (Chien and Fu, 1974), and cervical cell, leukocyte, and lymph node tissue section images (Pressman, 1976).
Vickers and Modestino (1982) argue that using features of the co-occurrence matrix in a classification situation is surely suboptimal and that better results are obtained by using the co-occurrence matrix directly in a maximum-likelihood classifier. They report better than a 95% correct identification accuracy in distinguishing
between tree bark, calf leather, wool, beach sand, pigskin, plastic bubbles, herringbone weave, raffia, and wood grain textures.
Bacus and Gose (1972) use a gray level difference variant of the co-occurrence matrix to help distinguish between eosinophils and lymphocytes. They use the probability of a given contrast occurring in a given spatial relationship as a textural feature. This gray level difference probability can be defined in terms of the co-occurrence probabilities by

\[ p(d) = \sum_{\substack{i,j \\ |i-j| = d}} p(i,j) \]
The probability of a small contrast d for a coarse texture will be much higher than
for a fine texture. Bacus and Gose use statistics of the differences between a pixel on
a red image and a displaced pixel on a blue image. Rosenfeld, Wang, and Wu (1982)
also suggest using multispectral difference probabilities. Haralick and Shanmugam (1973) use multispectral co-occurrence probabilities.
Weszka, Dyer, and Rosenfeld (1976) use contrast, energy, entropy, and mean of p(d) as texture measures and report that they did about as well as the co-occurrence probabilities. Sun and Wee (1983) suggest a variant of the gray level difference distribution. They fix a distance d and a contrast c and determine the number P(g, n) of pixels each having gray level g and each having n neighbors that are within distance d and within contrast c. From P(g, n) they compute a variety of features, such as entropy and energy. They report an 85% classification accuracy in distinguishing between textures of three different geological terrain types on LANDSAT imagery. Wechsler and Kidode (1979) and Wechsler (1980) use the gray level difference probabilities to define a random-walk model for texture. See de Souza (1983) and Percus (1983) for some comments on the random-walk model.
Haralick (1975) illustrates a way to use co-occurrence matrices to generate
an image in which the value at each resolution cell is a measure of the texture in
the resolution cell's neighborhood. All these studies produce reasonable results on
different textures. Conners and Harlow (1976, 1980) conclude that this spatial gray
level dependence technique is more powerful than spatial frequency (power spectra),
gray level difference (gradient), and gray level run-length methods (Galloway, 1975)
of texture quantification.
Dyer, Hong, and Rosenfeld (1980) and Davis, Clearman, and Aggarwal (1981) compute co-occurrence features for local properties, such as edge strength maxima and edge direction relationships. They suggest computing gray level co-occurrence involving only those pixels near edges. Zucker and Kant (1981) also suggest using generalized co-occurrence statistics. Terzopoulos and Zucker (1982) report a 13% increase in accuracy when combining gray level co-occurrence features with edge co-occurrence features in the diagnosis of osteogenesis imperfecta from images of fibroblast cultures.
texture, we could lay expanding circles around it and locate the shortest distance
between it and every other kind of primitive. Our co-occurrence frequency would be
three-dimensional, two dimensions for primitive kind and one dimension for shortest
distance. This can be dimensionally reduced to two dimensions by considering only
the shortest distance between each pair of like primitives.
To define the concept of generalized co-occurrence, it is necessary first to decompose an image into its primitives. Let Q be the set of all primitives on the image. Then we measure primitive properties, such as mean gray level, variance of gray levels, region size, and region shape. Let T be the set of primitive properties and f be a function assigning to each primitive in Q a property of T. Finally, we need to specify a spatial relation between primitives, such as distance or adjacency. Let \(S \subseteq Q \times Q\) be the binary relation pairing all primitives that satisfy the spatial relation. The generalized co-occurrence matrix P is defined by

\[ P(t_1, t_2) = \frac{\#\{(q_1, q_2) \in S : f(q_1) = t_1 \text{ and } f(q_2) = t_2\}}{\#S} \]

P(t₁, t₂) is just the relative frequency with which two primitives occur with specified spatial relationships in the image, one primitive having property t₁ and the other having property t₂.
Zucker (1974) suggests that some textures may be characterized by the frequency distribution p(k) of the number of primitives related to any given primitive.
From one point of view, texture relates to the spatial size of the gray level primitives on an image. Gray level primitives of larger size are indicative of coarser
textures; gray level primitives of smaller size are indicative of finer textures. The
autocorrelation function is a feature that describes the size of the gray level primitives.
We explore the autocorrelation function with the help of a thought experiment.
Consider two image transparencies that are exact copies of each other. Overlay one
transparency on top of the other, and with a uniform source of light measure the
average light transmitted through the double transparency. Now translate one transparency relative to the other and measure only the average light transmitted through
the portion of the image where one transparency overlaps the ( x , y) translated positions. The two-dimensional autocorrelation function of the image transparency is
their average normalized with respect to the (0,O) translation.
Let I(u, v) denote the transmission of an image transparency at position (u, v). We assume that outside some bounded rectangular region 0 ≤ u ≤ L_u and 0 ≤ v ≤ L_v, the image transmission is zero. Let (x, y) denote the x-translation and y-translation, respectively. The autocorrelation function for the image transparency is formally defined by

\[ \rho(x,y) = \frac{\displaystyle\int\!\!\int I(u,v)\,I(u+x, v+y)\,du\,dv}{\displaystyle\int\!\!\int I^2(u,v)\,du\,dv} \]
If the gray level primitives on the image are relatively large, then the autocorrelation
will drop off slowly with distance. If the gray level primitives are small, then the
autocorrelation will drop off quickly with distance. To the extent that the gray level
primitives are spatially periodic, the autocorrelation function will drop off and rise
again in a periodic manner. The relationship between the autocorrelation function
and the power spectral density function is well known: They are Fourier transforms
of each other.
The gray level primitive in the autocorrelation model is the gray level. The
spatial organization is characterized by the correlation coefficient that is a measure
of the linear dependence one pixel has on another.
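Because the autocorrelation function and the power spectral density are Fourier transform pairs, the two-dimensional autocorrelation can be computed with two FFTs. The sketch below zero-pads to suppress periodic wraparound; the mean subtraction is a common practical choice rather than part of the definition above.

```python
import numpy as np

# Two-dimensional autocorrelation via the Wiener-Khinchin relation:
# inverse FFT of the power spectrum, zero-padded against wraparound,
# normalized by the (0, 0) term as in the definition above.
def autocorrelation(image):
    img = image.astype(float) - image.mean()
    rows, cols = img.shape
    padded = np.zeros((2 * rows, 2 * cols))
    padded[:rows, :cols] = img
    F = np.fft.fft2(padded)
    ac = np.fft.ifft2(np.abs(F) ** 2).real
    return ac / ac[0, 0]
```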
Horning and Smith (1973) do work similar to Gramenopoulos, but with aerial multispectral scanner imagery instead of LANDSAT imagery.
Kirvida and Johnson (1973) compare the fast Fourier, Hadamard, and Slant
transforms for textural features on LANDSAT imagery over Minnesota. They use
8 x 8 subimages and five categories: hardwoods, conifers, open, city, and water.
Using only spectral information, they obtain 74% accuracy. When they add textural
information, they increase their identification accuracy to 99%.
Maurer (1974a, 1974b) obtains encouraging results by classifying crops from
low-altitude color photography on the basis of a one-dimensional Fourier series taken
in a direction orthogonal to the rows.
Bajcsy (1973a, 1973b) and Bajcsy and Lieberman (1974, 1976) divide the image into square windows and use the two-dimensional power spectrum of each window. They express the power spectrum in a polar coordinate system of radius r versus angle φ, treating the power spectrum as two independent one-dimensional functions of r and φ. Directional textures tend to have peaks in the power spectrum as a function of φ. Bloblike textures tend to have peaks in the power spectrum as a function of r. The investigators show that texture gradients can be measured by locating the trends of relative maxima of r or φ as a function of the position of the window whose power spectrum is being taken. As the relative maxima along the radial direction tend to shift toward larger values, the image surface becomes more finely textured.
In general, features based on Fourier power spectra have been shown to perform
more poorly than features based on second-order gray level co-occurrence statistics
(Haralick, Shanmugam, and Dinstein, 1973) or those based on first-order statistics
of spatial gray level differences (Weszka, Dyer, and Rosenfeld, 1976; Conners and
Harlow, 1980). Presence of aperture effects has been hypothesized to account for part of the unfavorable performance by Fourier features compared with spatial-domain gray level statistics (Dyer and Rosenfeld, 1976), although experimental results indicate that this effect, if present, is minimal. However, D'Astous and Jernigan (1984) claim that the reason for the poorer performance is that earlier studies employing the Fourier transform features used summed spectral energies within band- or wedge-shaped regions in the power spectrum. They argue that additional discriminating information can be obtained from the power spectrum in terms of characteristics such as regularity, directionality, linearity, and coarseness. The degree of regularity can be measured by the relative strength of the highest non-DC peak in the power spectrum.
non-DC peak in the power spectrum. Other peak features include the Laplacian
at the peak, the number of adjacent neighbors of the peak containing at least 50%
of the energy in the peak, the distance between the peak and the origin, and the
polar angle of the peak. In the comparative experiment reported by D'Astous and
Jernigan, the peak features yielded uniformly greater interclass differences than the
co-occurrence features, and the co-occurrence features yielded uniformly greater
interclass distances than the summed Fourier energy features.
Pentland (1984) computes the discrete Fourier transform for each block of 8 x 8 pixels of an image and determines the power spectrum. He then uses a linear regression technique on the log of the power spectrum as a function of frequency to estimate the fractal dimension D. For gray level intensity surfaces of textured scenes that satisfy the fractal model (Mandelbrot, 1983), the power spectrum falls off as a power of the frequency, with an exponent linearly related to D. Pentland reports a classification accuracy of 84.4% on a texture mosaic using fractal dimensions computed in two orthogonal directions.
Transforms other than the Fourier can be used for texture analysis. Kirvida
(1976) compared the fast Fourier, Hadamard, and Slant Transforms for textural
features on aerial images of Minnesota. Five classes (i.e., hardwood trees, conifers,
open space, city, and water) were studied with the use of 8 x 8 subimages; a 74%
correct classification rate was obtained by using only spectral information. This rate
increased to 98.5% when textural information was also included in the analysis.
These researchers reported no significant difference in the classification accuracy as
a function of which transform was employed.
The simplest orthogonal transform that can be locally applied is the identity
transformation. Lowitz (1983, 1984) and Carlotto (1984) suggest using the local
histogram for textural feature extraction. Lowitz uses window sizes as large as
16 x 16; Carlotto, as large as 33 x 33.
The power of the spatial frequency approach to texture is the familiarity we have
with these concepts. One of the inherent problems, however, is in regard to gray level
calibration of the image. The procedures are not invariant under even a monotonic
transformation of gray level. To compensate for this, probability quantizing can
be employed. But the price paid for the invariance of the quantized images under
monotonic gray level transformations is the resulting loss of gray level precision in
the quantized image. Weszka, Dyer, and Rosenfeld (1976) compare the effectiveness
of some of these techniques for terrain classification. They conclude that spatial
frequency approaches perform significantly poorer than the other approaches.
Textural Energy
In the textural energy approach (Laws, 1980, 1985), the image is first convolved
with a variety of kernels. If I is the input image and g, ,. .. ,g, are the kernels, the
images J , = I * g,, n = 1,. . . ,N are computed. Then each convolved image is
processed with a nonlinear operator to determine the total textural energy in each
pixel's 7 x 7 neighborhood. The energy image corresponding to the nth kernel is
defined by
-
Associated with each pixel position (r,c), therefore, is a textural feature vector
[sl(r, c), . .. ,SN(~,C)]
.
The textural energy approach is very much in the spirit of the transform approach, but it uses smaller windows or neighborhood supports. The kernels Laws uses have supports of 3 x 3, 5 x 5, and 7 x 7 neighborhoods. The one-dimensional forms are illustrated in Fig. 9.5. The two-dimensional forms are generated from the one-dimensional forms by outer products. That is, if k₁ and k₂ are two one-dimensional forms, then k₁k₂ᵀ is a two-dimensional form.

Figure 9.5 One-dimensional textural energy kernels. L stands for level, E for edge, S for shape, W for wave, R for ripple, and O for oscillation.
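A sketch of one channel of this computation follows, using the standard L5 (level) and E5 (edge) one-dimensional forms; the absolute-value energy and the uniform 7 x 7 summation are common choices, though other nonlinear operators can be substituted.

```python
import numpy as np
from scipy.ndimage import convolve, uniform_filter

L5 = np.array([1.0, 4.0, 6.0, 4.0, 1.0])     # level
E5 = np.array([-1.0, -2.0, 0.0, 2.0, 1.0])   # edge
kernel = np.outer(L5, E5)                    # 5x5 two-dimensional form

# Convolve, then sum a nonlinear measure (absolute value here) over each
# pixel's 7x7 neighborhood to form the textural energy image S_n.
def texture_energy(image, kernel, size=7):
    J = convolve(image.astype(float), kernel)
    return uniform_filter(np.abs(J), size=size) * (size * size)
```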
Unser (1984) notes that one could use a discrete orthogonal transform, such
as the discrete sine or discrete cosine transform, applied locally to each pixel's
neighborhood instead of using the ad hoc linear operators of Laws. He indicates a
classification accuracy above 96% with the discrete sine transform in distinguishing
between textures of paper, grass, sand, and raffia. Ikonomopoulos and Unser (1984)
suggest local directional filters. Jernigan and D'Astous (1984) compute an FFT on
windows and then use the entropy in different-sized regions of the normalized power
spectrum for textural features.
Textural Edgeness
The autocorrelation function and digital transforms basically both reference texture to spatial frequency. Rosenfeld and Troy (1970) and Rosenfeld and Thurston (1971) conceive of texture not in terms of spatial frequency but in terms of edgeness per unit area. An edge passing through a pixel can be detected by comparing the values for local properties obtained in pairs of nonoverlapping neighborhoods bordering the pixel. To detect microedges, small neighborhoods can be used; to detect macroedges, large neighborhoods can be used (Fig. 9.6).
The local property that Rosenfeld and Thurston suggest is the quick Roberts
gradient (the sum of the absolute value of the differences between diagonally opposite
neighboring pixels). The Roberts gradient is an estimate of the image gradient. Thus
a measure of texture for any subimage can be obtained by computing the Roberts
gradient image for the subimage and from it determining the average value of the
gradient in the subimage.
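A sketch of this measure under the stated definition follows: the quick Roberts gradient summed over a subimage, averaged to give edgeness per unit area. The function name is illustrative.

```python
import numpy as np

# Quick Roberts gradient (sum of absolute diagonal differences), averaged
# over the subimage as a texture measure, per Rosenfeld and Thurston.
def roberts_edgeness(subimage):
    s = subimage.astype(float)
    g = (np.abs(s[:-1, :-1] - s[1:, 1:]) +
         np.abs(s[:-1, 1:] - s[1:, :-1]))
    return g.mean()
```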
Sutton and Hall (1972) extend Rosenfeld and Thurston's idea by making the gradient a function of the distance between the pixels. Thus for every distance d and subimage I defined over neighborhood N, they compute

\[ g(d) = \sum_{(r,c) \in N} |I(r,c) - I(r+d,c)| + |I(r,c) - I(r-d,c)| + |I(r,c) - I(r,c+d)| + |I(r,c) - I(r,c-d)| \]
The curve of g(d) is like the graph of the minus autocorrelation function translated
vertically.
Sutton and Hall apply this textural measure in a pulmonary disease identification experiment and obtain an identification accuracy in the 80 percentile range
for discriminating between normal and abnormal lungs when using a 128 x 128
subimage.
Triendl (1972) measures degree of edgeness by filtering the image with a 3 x 3
averaging filter and a 3 x 3 Laplacian filter. The two resulting filtered images are
then smoothed with an 11 x 11 smoothing filter. The two values of average level and
roughness obtained from the low- and high-frequency filtered image can be used as
textural features.
Hsu (1977) determines textural edgeness by computing gradientlike measures
for the gray levels in a neighborhood. If N denotes the set of resolution cells in
a neighborhood about a pixel, g_c is the gray level of the center pixel, ḡ is the
mean gray level in the neighborhood, and ρ is a metric, then Hsu suggests summing,
over the resolution cells of N, a distance ρ between each cell's gray level and the
center gray level g_c or the neighborhood mean ḡ.
Vector Dispersion
The vector dispersion estimate for texture was first applied by Harris and Barrett
(1978) in a cloud texture assessment study and is based on a theory developed by
Fisher (1953). In the vector dispersion technique, the image texture is divided into
mutually exclusive neighborhoods, and a sloped plane fit (see Chapter 8) to the
gray levels is performed for each neighborhood. That is, using a relative coordinate
system whose origin is the center of the neighborhood, one can model the gray
levels in the ith neighborhood with

g(r, c) = a_i r + b_i c + c_i

The unit normal vector to the plane fit for the ith neighborhood is given by

N_i = (−a_i, −b_i, 1)' / sqrt(a_i^2 + b_i^2 + 1)

According to Fisher (1953), the distribution of errors over the unit sphere is
proportional to e^{κ cos θ_i}, where cos θ_i is the cosine of the angle between N_i and
the mean direction of the fitted normals. The maximum likelihood estimate of the
dispersion parameter κ approximately satisfies

κ = (N − 1) / (N − R)

where N is the number of neighborhoods and R is the length of the resultant vector
Σ_i N_i. For smooth textures, the normalized resultant length R/N takes the value 1. For
uneven or rough textures, R/N takes a smaller value, and the estimated κ decreases
accordingly.
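The following sketch, assuming square nonoverlapping neighborhoods of side block (a parameter introduced here for illustration), fits the sloped plane by least squares, forms the unit normals, and returns the dispersion estimate κ = (N − 1)/(N − R):

    import numpy as np

    def vector_dispersion(image, block=8):
        rows, cols = image.shape
        # Relative coordinates centered on the block.
        r, c = np.mgrid[0:block, 0:block]
        r = r - r.mean()
        c = c - c.mean()
        A = np.column_stack([r.ravel(), c.ravel(), np.ones(block * block)])
        normals = []
        for i in range(0, rows - block + 1, block):
            for j in range(0, cols - block + 1, block):
                g = image[i:i+block, j:j+block].astype(float).ravel()
                a, b, _ = np.linalg.lstsq(A, g, rcond=None)[0]  # sloped plane fit
                n = np.array([-a, -b, 1.0])
                normals.append(n / np.linalg.norm(n))           # unit normal
        normals = np.array(normals)
        N = len(normals)
        R = np.linalg.norm(normals.sum(axis=0))  # resultant length; R -> N if smooth
        return (N - 1) / (N - R)                 # large kappa means smooth texture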
Relative Extrema Density

Rosenfeld and Troy (1970) suggest the number of extrema per unit area as a texture
measure. They suggest defining extrema in one dimension only along a horizontal
scan in the following way. In any row of pixels, a pixel i is a relative minimum if
its gray level g(i) satisfies

g(i) ≤ g(i + 1) and g(i) ≤ g(i − 1)     (9.1)

and a relative maximum if

g(i) ≥ g(i + 1) and g(i) ≥ g(i − 1)     (9.2)
Note that with this definition each pixel in the interior of any constant gray level run
of pixels is considered simultaneously a relative minimum and a relative maximum.
This is so even if the constant run is just a plateau on the way down to or on the
way up from a relative extremum.
The algorithm employed by Rosenfeld and Troy marks every pixel in each
row that satisfies Eq. (9.1) or (9.2). Then it centers a square window around each
pixel and counts the number of marked pixels. The texture image created this way
corresponds to a defocused marked image.
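A sketch of this marking-and-counting scheme, using a box filter to play the role of the square counting window (the window size is an illustrative parameter):

    import numpy as np
    from scipy.ndimage import uniform_filter

    def extrema_density_image(image, window=15):
        f = image.astype(float)
        left = np.roll(f, 1, axis=1)
        right = np.roll(f, -1, axis=1)
        is_min = (f <= left) & (f <= right)     # Eq. (9.1)
        is_max = (f >= left) & (f >= right)     # Eq. (9.2)
        marked = (is_min | is_max).astype(float)
        marked[:, [0, -1]] = 0                  # ignore row endpoints
        # Mean of marks times window area gives the count of marked pixels
        # in the square window (up to border effects).
        return uniform_filter(marked, size=window) * window * window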
Mitchell, Myers, and Boyne (1977) adopt the extrema idea of Rosenfeld and
Troy, except that they use true extrema and operate on a smoothed image to eliminate
extrema due to noise (Carlton and Mitchell, 1977; Ehrich and Foith, 1976, 1978a,
1978b).
One problem with simply counting all extrema in the same extrema plateau as
separate extrema is that extrema per unit area is then not sensitive to the difference
between a region having a few large plateaus of extrema and one having many
single-pixel extrema. The solution to this problem is to count an extrema plateau only once.
This can be achieved by locating some central pixel in the extrema plateau and
marking it as the extremum associated with the plateau. Another way of achieving
this is to associate a value of 1/N with every extremum in an N-pixel extrema plateau.
In the one-dimensional case, two properties can be associated with every extremum: its height and its width. The height of a maximum can be defined as the
difference between the value of the maximum and the highest adjacent minimum.
The height (depth) of a minimum can be defined as the difference between the value
of the minimum and the lowest adjacent maximum. The width of a maximum is the
distance between its two adjacent minima. The width of a minimum is the distance
between its two adjacent maxima.
Two-dimensional extrema are more complicated than one-dimensional extrema.
One way of finding extrema in the full two-dimensional sense is by the iterated
use of some recursive neighborhood operators propagating extrema values in an
appropriate way. Maximally connected areas of relative extrema may be areas of
single pixels or may be plateaus of many pixels. We can mark each pixel in a relative
extrema region of size N with the value h, indicating that it is part of a relative
extremum having height h, or mark it with the value h/N, indicating that it is part
of a relative extrema region of size N and value h. Alternatively, we can mark the most
centrally located pixel in the relative extrema region with the value h. Pixels not
marked can be given the value 0. Then for any specified window centered on a given
pixel, we can add up the values of all the pixels in the window. This sum divided
by the window size is the average height of extrema in the area. Alternatively we
could set h to 1, and the sum would be the number of relative extrema per unit area
to be associated with the given pixel.
Going beyond the simple counting of relative extrema, we can associate properties with each relative extremum. For example, given a relative maximum, we can
determine the set of all pixels reachable only by the given relative maximum, not
by any other, by monotonically decreasing paths. This set of reachable pixels is a
connected region and forms a mountain. Its border pixels may be relative minima
or saddle pixels.
The relative height of the mountain is the difference between its relative maximum and the highest of its exterior border pixels. Its size is the number of pixels
that constitute it. Its shape can be characterized by features such as elongation, circularity, and symmetric axis. Elongation can be defined as the ratio of the larger
to the smaller eigenvalue of the 2 x 2 second-moment matrix obtained from the (r, c)
coordinates of the border pixels (Bachi, 1973; Frolov, 1975). Circularity can be defined as the ratio of the standard deviation to the mean of the radii from the region's
center to its border (Haralick, 1975). The symmetric axis feature can be determined
by thinning the region down to its skeleton and counting the number of pixels in
the skeleton. For regions that are elongated, it may be important to measure the
direction of the elongation or the direction of the symmetric axis.
Osman and Saukar (1975) use the mean and variance of the height of mountains
or the depth of valleys as properties of primitives. Tsuji and Tomita (1973) use size.
Histograms and statistics of histograms of these primitive properties are all suitable
measures for textural primitive properties.
Mathematical Morphology
The morphological approach to the texture analysis of binary images was proposed
by Matheron (1975) and Serra and Verchery (1973). Mathematical morphology was
discussed in Chapter 5. Here we review the basic definitions to make this discussion
readable by itself. The approach requires the definition of a structuring element (i.e.,
a set of pixels constituting a specific shape, such as a line, a disk, or a square) and
the generation of binary images that result from the translation of the structuring
element through the image and the erosion of the image by the structuring element.
The textural features can be obtained from the new binary images by counting the
number of pixels having the value 1. This mathematical morphology approach of
Serra and Matheron is the basis of the Leitz texture analyser (TAS) (Mueller and
Herman, 1974; Mueller, 1974; Serra, 1974) and the Cyto computer (Sternberg,
1979). This quantitative method of microstructure analysis has found a broad
spectrum of applications in materials science and biology.
We begin by examining the texture of binary images. Let H, a subset of pixels,
be the structuring element. We define the translate of H by row-column coordinates
(r, c) as H(r, c), where

H(r, c) = {(i, j) | for some (r', c') ∈ H, i = r + r', j = c + c'}
The erosion of F by the structuring element H, written as F ⊖ H, is defined as

F ⊖ H = {(r, c) | H(r, c) ⊆ F}

The erosion of gray level image I by gray level structuring element H produces
a gray level image J defined by

J(r, c) = min_{(i,j) ∈ H} { I(r + i, c + j) − H(i, j) } = (I ⊖ H)(r, c)

The dilation of gray level image I by gray level structuring element H produces
a gray level image J defined by

J(r, c) = max_{(i,j) ∈ H} { I(r − i, c − j) + H(i, j) } = (I ⊕ H)(r, c)
The gray level opening is defined as a gray level erosion followed by a gray level
dilation. The gray level closing is defined as a gray level dilation followed by a gray
level erosion. Commonly used gray level structuring elements include rods, disks,
cones, paraboloids, and hemispheres.
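These gray level operations can be sketched directly with SciPy's grayscale morphology; H below is assumed to be a gray level structuring element sampled on a small array (e.g., a rod or a hemisphere):

    import numpy as np
    from scipy.ndimage import grey_erosion, grey_dilation

    def grey_open_close(I, H):
        # Opening: gray level erosion followed by gray level dilation.
        opened = grey_dilation(grey_erosion(I, structure=H), structure=H)
        # Closing: gray level dilation followed by gray level erosion.
        closed = grey_erosion(grey_dilation(I, structure=H), structure=H)
        return opened, closed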
Peleg et al. (1984) use gray level erosion and dilation to determine the fractal
surface area of the gray level intensity surface of a textural scene. They define the scale-k
volume of the blanket around a gray level intensity surface I by

V(k) = Σ_{(r,c)} [ (I ⊕_k H)(r, c) − (I ⊖_k H)(r, c) ]

where ⊕_k means a k-fold dilation with the structuring element, and ⊖_k means a k-fold
erosion with the structuring element. Peleg et al. defined the structuring element H
over the five-pixel cross neighborhood, taking the value 1 for the center pixel and
0 elsewhere. The fractal surface area A at scale k is then defined by

A(k) = [ V(k) − V(k − 1) ] / 2

Peleg et al. compare the similarity between textures by a weighted distance D
between their fractal signatures.
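A sketch of the blanket computation under these definitions; the flat five-pixel cross footprint used below is an assumption of this sketch standing in for the exact element weights:

    import numpy as np
    from scipy.ndimage import grey_dilation, grey_erosion

    def fractal_areas(I, scales):
        # Five-pixel cross support for the structuring element.
        cross = np.array([[0, 1, 0],
                          [1, 1, 1],
                          [0, 1, 0]], dtype=bool)
        upper = lower = I.astype(float)
        volumes = [0.0]                      # V(0) = 0: both blankets coincide with I
        for _ in range(max(scales)):
            upper = grey_dilation(upper, footprint=cross)   # k-fold dilation
            lower = grey_erosion(lower, footprint=cross)    # k-fold erosion
            volumes.append(float((upper - lower).sum()))    # V(k)
        return {k: (volumes[k] - volumes[k - 1]) / 2.0 for k in scales}  # A(k)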
Werman and Peleg (1984) give a fuzzy set generalization of the morphological operators. Meyer (1979) and Lipkin and Lipkin (1974) demonstrate the capability of
morphological textural parameters in biomedical image analysis. Theoretical properties of the erosion operator as well as those of other operators are presented by
Matheron (1963), Serra (1978), and Lantuejoul (1978). The importance of this
approach to texture analysis is that properties obtained by the application of operators in mathematical morphology can be related to physical three-dimensional shape
properties of the materials imaged.
Autoregression Models
The autoregression model provides a way to use linear estimates of a pixel's gray
level, given the gray levels in a neighborhood containing it, in order to characterize
texture. For coarse textures, the coefficients will all be similar. For fine textures,
the coefficients will vary widely.
The linear dependence that one pixel of an image has on another is well known
and can be illustrated by the autocorrelation function. This linear dependence is exploited by the autoregression model for texture, which was first used by McCormick
and Jayaramamurthy (1974). These researchers employ the Box and Jenkins (1970)
time-series seasonal-analysis method to estimate the parameters of a given texture.
They then use the estimated parameters and a given set of starting values to illustrate
that the synthesized texture is close in appearance to the given texture. Deguchi and
Morishita (1976), Tou, Kao, and Chang (1976), and Tou and Chang (1976) also
use similar techniques.
Figure 9.8 shows this texture synthesis model. Given a randomly generated
noise image and any sequence of K synthesized gray level values in a scan, the
next gray level value can be synthesized as a linear combination of the previously
synthesized values and of the noise values.
Figure 9.8 How, from a randomly generated noise image and a given starting
sequence a_1, ..., a_K representing the initial boundary conditions, all values in a
texture image can be synthesized by a one-dimensional autoregressive model.
Figure 9.9 How, from a randomly generated noise image and a given starting
sequence for the first-order D neighborhood in the image, all values in a texture
image can be synthesized by a two-dimensional autoregressive model.
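A one-dimensional sketch of this synthesis in raster order, assuming each new value depends linearly on the previous K synthesized values plus white noise; the coefficient vector, seed values, and noise scale are illustrative parameters:

    import numpy as np

    def synthesize_ar_texture(shape, coeffs, seed_values, noise_scale=1.0, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        K = len(coeffs)
        g = np.empty(shape[0] * shape[1])
        g[:K] = seed_values                        # initial boundary conditions
        noise = rng.normal(0.0, noise_scale, g.size)
        for t in range(K, g.size):
            # Linear combination of the K most recent values plus a noise term.
            g[t] = np.dot(coeffs, g[t - K:t][::-1]) + noise[t]
        return g.reshape(shape)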
Figure 9.10 How a gray level value for pixel (i, j) can be estimated by using
the gray level values in the neighborhood N(i, j) and the differences between the
actual values and the estimated values in the neighborhood.
Assuming a uniform prior distribution, we can decide that pixel (i, j) has texture
category k if

|a(i, j) − â_k(i, j)| ≤ |a(i, j) − â_l(i, j)| for every l, and |a(i, j) − â_k(i, j)| ≤ θ

If |a(i, j) − â_k(i, j)| > θ, then we can decide that pixel (i, j) is a boundary pixel.
the neighbors of (r, c). N(r, c) does not include (r, c). Because the field is stationary, (a, b) ∈ N(r, c) if and only if (a + i, b + j) ∈ N(r + i, c + j) for any (i, j).
Hence the spatial neighborhood configuration is the same all over the image. There
is an obvious difficulty with this condition holding at pixels near the image boundary. The usual way of handling the problem theoretically is to assume the image
is wrapped around a torus. In this case the canonical spatial neighborhood can be
given as N(0, 0). For any (r, c), N(r, c) is then a translation of N(0, 0).
The conditional independence assumption is that the conditional probability of
the pixel given all the remaining pixels in the image is equal to the conditional
probability of the pixel given just the pixels in its neighborhood. That is,

P[I(r, c) | I(a, b) : (a, b) ≠ (r, c)] = P[I(r, c) | I(a, b) : (a, b) ∈ N(r, c)]
Markov mesh models were first introduced into the pattern recognition community by Chow (1962) and then by Abend, Harley, and Kanal (1965). One important
issue is how to compute the joint probability function P[I(r,c ) : ( r ,c ) E R x C].
Hassner and Sklansky (1980) note that this can be done by identifying the conditional probability assumption with Gibbs ensembles, which are studied in statistical
mechanics. Woods (1972) shows that when the distributions are Gaussian, the discrete Gauss-Markov field can be written as an equation in which each pixel's value
is a linear combination of the values in its neighborhood plus a correlated noise
term. That is,

I(r, c) = Σ_{(i,j) ∈ N(0,0)} h(i, j) I(r − i, c − j) + u(r, c)

where the coefficients of the linear combination are given by the function h, and the
set {u(r, c) | (r, c) ∈ R x C} represents a jointly distributed set of possibly correlated
Gaussian random variables. This equation has a lot of similarity to the autoregressive
moving-average time-series models of Box and Jenkins (1970). There the relationship
would be expressed by

I(r, c) = Σ_{(i,j) ∈ N(0,0)} h(i, j) I(r − i, c − j) + Σ_{(i,j) ∈ K(0,0)} k(i, j) u(r − i, c − j)
where K(0, 0) represents a domain that contains only pixels occurring after (0, 0) in
the usual top-down raster scan order of an image. Hence each term in the summation
over I(r − i, c − j) contains only pixels occurring before pixel (r, c) in the raster scan
order. The first summation is called the autoregressive term, and the second term is
called the moving-average term. When K(0, 0) contains pixels occurring before and
after (0, 0) in the raster scan order, the model is called a simultaneous autoregressive
model.
To determine the coefficients h(i, j) for (i, j) ∈ N(0, 0), we can proceed by
least squares. Define

ε^2 = Σ_{(r,c) ∈ R x C} [ I(r, c) − Σ_{(i,j) ∈ N(0,0)} h(i, j) I(r − i, c − j) ]^2

Now take the partial derivatives of ε^2 with respect to h(m, n) and set these partial
derivatives to zero. There results the system of equations

Σ_{(i,j) ∈ N(0,0)} h(i, j) u(m − i, n − j) = u(m, n),  (m, n) ∈ N(0, 0)     (9.4)

where u(m, n) = Σ_{(r,c) ∈ R x C} I(r, c) I(r − m, c − n).
A related model for images with a small range of K gray levels takes the conditional
probability that a pixel has gray level k, given its neighborhood, to be binomial,
proportional to

θ^k (1 − θ)^{K−k}

where θ is a function of the linear combination a + Σ_{(i,j) ∈ N(0,0)} b(i, j) I(r − i, c − j).
The texture parameters here are a and b(i, j), for (i, j) ∈ N(0, 0).
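The least-squares estimate of Eq. (9.4) amounts to a linear regression of each pixel on its neighbors. A sketch, equivalent in effect to solving the normal equations, with the neighborhood given as a list of (i, j) offsets (the function name is illustrative):

    import numpy as np

    def estimate_gmrf_coefficients(I, offsets):
        I = I.astype(float)
        m = max(max(abs(di), abs(dj)) for di, dj in offsets)
        center = I[m:-m, m:-m].ravel()
        # One regression column per neighborhood offset: values I(r - i, c - j).
        X = np.column_stack([
            I[m - di:I.shape[0] - m - di, m - dj:I.shape[1] - m - dj].ravel()
            for di, dj in offsets
        ])
        h, *_ = np.linalg.lstsq(X, center, rcond=None)
        return dict(zip(offsets, h))

    # Example with a first-order neighborhood N(0,0):
    # coeffs = estimate_gmrf_coefficients(image, [(0, 1), (0, -1), (1, 0), (-1, 0)])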
It is apparent that the discrete Markov random field model is a generalization
of the time-series autoregressive moving-average models that were initially explored
for image texture analysis by McCormick and Jayaramamurthy (1974), Tou and
Chang (1976), Tou, Kao, and Chang (1976), and Deguchi and Morishita (1978).
Related papers include Delp, Kashyap, and Mitchell (1979), Tou (1980), Chen
(1980), Faugeras (1980), Therrien (1980), and Jau, Chin, and Weinman (1984).
Issues concerning the estimation of h from texture samples can be found in Kashyap
and Chellappa (1981). De Souza (1982) develops a chi-square test to discriminate
microtextures described by autoregressive models.
Pratt, Faugeras, and Gagalowicz (1978) and Faugeras and Pratt (1980) consider
only the autoregressive term with independent noise and rewrite the autoregressive
equation as

Σ_{(i,j)} h(i, j) I(r − i, c − j) = u(r, c)

Here {u(r, c) | (r, c) ∈ R x C} represents independent random variables, not necessarily Gaussian. The left-hand side represents a convolution that decorrelates the
image. Faugeras and Pratt characterize the texture by the mean, variance, skewness,
and kurtosis of the decorrelated image, which is obtained either by estimating h or by
using a given gradient- or Laplacian-like operator to perform the decorrelation.
Table 9.1 Expected values for the area A, perimeter length S, and number of sides N for
the line model, the occupancy model, and the Delaunay model.
Texture Segmentation
Most work in image texture analysis has been devoted to textural feature analysis
of an entire image. It is apparent, however, that an image is not necessarily homogeneously textured. An important image-processing operation, therefore, is the
segmentation of an image into regions, each of which is differently textured. The
constraint is that each region has a homogeneous texture, such as that arising from
a frontal view, and that each pair of adjacent regions is differently textured. Bajcsy
(1973a, 1973b) is one of the first researchers to do texture segmentations for outdoor scenes. Her algorithm merges small, nearly connected regions having similar
local texture or color descriptors. For texture descriptors she uses Fourier transform features. The descriptors for each region include an indication of whether the
texture is isotropic or directional, the size of the texture element, and the separation between texture elements. If the texture is considered directional, then the
description includes the orientation.
Chen and Pavlidis (1979) use the split-and-merge algorithm on the co-occurrence
matrix of the regions as the basis for merging. Let the four 2^{N−1} x 2^{N−1} windows in
a 2^N x 2^N window have C_NE, C_NW, C_SE, and C_SW for their respective co-occurrence
matrices. Then, with only little error, the co-occurrence matrix C of the 2^N x 2^N
window can be computed from the four quadrant co-occurrence matrices.
Experiments by Hong, Wu, and Rosenfeld (1980) indicate that the error of this
computation is minimal. The 2^N x 2^N window is declared to be uniformly textured
if, for the user-specified threshold T,

Σ_{(i,j)} [ max{C_NE(i, j), C_NW(i, j), C_SE(i, j), C_SW(i, j)} −
min{C_NE(i, j), C_NW(i, j), C_SE(i, j), C_SW(i, j)} ] < T
Using this criterion, Chen and Pavlidis begin the merging process using 16 x 16
windows. Any 16 x 16 window not merged is split into four 8 x 8 windows. The
splitting continues until the window size is 4 x 4. The gray levels of the images
are quantized to eight levels. Chen and Pavlidis (1983) use a similar split-and-merge
algorithm, with the correlation coefficients between vertically adjacent and
horizontally adjacent pixels as the feature vectors. Modestino, Fries, and Vickers
(1981) use a Poisson line process to partition the plane and assign gray levels to each
region by a Gauss-Markov model using adjacent regions. They develop a maximum-likelihood
estimator for the parameters of the process and show segmentation results
on artificially generated images having three different texture types.
Conners, Trivedi, and Harlow (1984) use six features from the co-occurrence
matrix to segment an aerial urban scene into nine classes: residential, commercial/industrial, mobile home, water, dry land, runway/taxiway, aircraft parking, multilane highway, and vehicle parking. Their work is important because it integrates the
splitting idea of Chen and Pavlidis into a classification setting. They initially segmented the image into regions. Any region whose likelihood ratio for its highest-likelihood class against any other class was too low was considered a boundary
region and split. Any region whose likelihood ratio for its highest-likelihood class
against each other class was high enough was considered to be uniformly textured
and assigned to the highest-likelihood class.
Kashyap and Khotanzad (1984) use a simultaneous autoregressive and circular
autoregressive model for each 3 x 3 neighborhood of an image. Here each neighborhood produces a feature vector associated with the model. The set of feature
vectors generated from the image is clustered, and each pixel is labeled with the
cluster label of the feature vector associated with its 3 x 3 neighborhood. Pixels associated with outlier feature vectors are given the cluster label of the majority of
their labeled neighbors. Therrien (1983) uses an autoregressive model for each textured region and superimposes a Markov random field to describe the transitions
of one region to another. He uses maximum likelihood and maximum a posteriori
estimation techniques to achieve a high-quality segmentation of aerial imagery.
Synthetic Texture Generation

We now give a brief guide to some of the representative papers in the literature on
synthetic texture generation. McCormick and
Jayaramamurthy (1974) use a time-series model for texture synthesis, as do Tou,
Kao, and Chang (1976). Yokoyama and Haralick (1978) use a structured-growth
model to synthesize a more complex image texture. Pratt, Faugeras, and Gagalowicz (1978) develop a set of techniques for generating textures with identical means,
variances, and autocorrelation functions but different higher-order moments. Gagalowicz (1981) gives a technique for generating binary texture fields with prescribed
second-order statistics. Chellappa and Kashyap (1981) describe a technique for the
generation of images having a given Gauss-Markov random field.
Yokoyama and Haralick (1979) describe a technique that uses a Markov chain
method. Schacter (1980a) uses a long-crested wave model. Mome, Schmitt, and
Massaloux (1981) use an interlaced vertical and horizontal Markov chain method to
generate a texture image. Garber and Sawchuk (1981) use a best-fit model instead
of the Nth-order transition probabilities to make a good simulation of texture without exceeding computer memory limits on storing Nth-order probability functions.
Schmitt et al. (1984) add vector quantization to the bidimensional Markov technique
of Mome, Schmitt, and Massaloux (1981) to improve the appearance of the texture
image. Gagalowicz (1984) describes a texture synthesis technique that produces textures as they would appear on perspective projection images of three-dimensional
surfaces. Ma and Gagalowicz (1984) describe a technique to synthesize artificial
textures in parallel from a compressed-data set while retaining good visual similarity to
natural textures.
Shape from Texture

Witkin derives equations for the slant and tilt angles of a planar surface under orthographic projection by measuring the distribution of tangent directions of zero-crossing contours,
assuming an isotropic distribution of edge directions for frontally viewed textures.
Witkin divided the tangent angle interval [0, π] into n equal intervals, the ith interval being [(i − 1)π/n, iπ/n], i = 1, ..., n, and measured the number k(i) of
tangent directions that fell in the ith interval. The slant angle s and the tilt angle t
of the observed surface were estimated to be that pair of values maximizing the a
posteriori probability of (s, t), given the observed k(i), i = 1, ..., n. Davis, Janos,
and Dunn (1983) indicated some mistakes in the Witkin paper and gave the joint a
posteriori probability of (s, t) as proportional to
sin(s) cos"(s)
r=,
[I - sin2(s)sin2(V)
-t)]
They also gave a modified version of the two-dimensional Newton method for determining the (s, t) achieving the maximization.
Other work that relates to surface orientation recovery from texture includes
that of Kender (1979), who described an aggregation Hough-related transform that
groups together edge directions associated with the same vanishing point, assuming parallel edges in frontally viewed textures. The edge direction is the gradient
direction. It is perpendicular to the direction along an edge boundary. In the case
of a line segment, it is perpendicular to the direction along the line segment. A
unit length edge direction E' = (E_x, E_y) at position P' = (P_x, P_y) has coordinates
T' = (T_x, T_y) in the transformed space, where

T = (E · P) E

All lines that go through the point P map to a circle whose center is P/2 and whose
radius is ||P||/2. Kender gives a second transform defined by

T = f^2 E / (E · P)

Here all lines that go through the point P are mapped to a line that goes through
the point f^2 P/||P||^2 in the orthogonal direction to P.
The way Kender makes use of this transform for the estimation of the normal to
a plane in three-dimensional space is as follows: Consider that the plane is densely
populated with texels that are just edge segments (edgels). Each has a position
and orientation. Each such edgel on the plane in three-dimensional space can be
represented as a set of points satisfying

(x', y', z')' = (x, y, z)' + k (α, β, γ)'

The point (x, y, z) can be thought of as the center position of the edgel on the three-dimensional plane, and (α, β, γ) are its direction cosines. From the perspective
geometry (see Chapter 13), the perspective projection (u, v) of any point (x, y, z)
satisfies

u = f x / z and v = f y / z

where f is the distance the image projection plane is in front of the center of
perspectivity. The perspective projection of the set of points on the edgel in the
plane in three-dimensional space therefore satisfies the image line equation

(yγ − zβ) u + (zα − xγ) v + (xβ − yα) f = 0

And if the edgels on the three-dimensional plane have a variety of different directions,
then the transform of all the pixels having the ith direction cosines (α_i, β_i, γ_i) will
densely populate a line, which we denote by L_i.
Notice now that these lines L_i must intersect if there is more than one variety
of directions to the three-dimensional edgels. Hence, around the intersection point
there must be a higher concentration or density of transformed points. If, by this
higher concentration, the intersection point (p_0, q_0) can be identified, then because
the normal to the plane must be perpendicular to the direction cosines of all lines
the plane contains, (p_0, q_0, −1) must be in the direction of the normal to the three-dimensional plane, since (p_0, q_0, −1) satisfies
Ax + By + Cz + D = 0     (9.5)

where A^2 + B^2 + C^2 = 1.
From the perspective geometry (see Chapter 13), for any three-dimensional point
(x, y, z), its perspective projection (u, v) satisfies

u = f x / z and v = f y / z

where f is the distance from the image projection plane to the center
of perspectivity. Hence

x = u z / f and y = v z / f     (9.6)

Substituting the expression of Eq. (9.6) into Eq. (9.5) and solving for z results in

z = −D f / (A u + B v + C f)

Let d be the distance between the center of perspectivity and the point determined by the intersection of the plane Ax + By + Cz + D = 0 with the ray
λ(u, v, f). This point is the three-dimensional point whose perspective projection
is (u, v). Then d^2 = x^2 + y^2 + z^2. After substituting Eq. (9.6), there results
d^2 = D^2 (u^2 + v^2 + f^2) / (A u + B v + C f)^2

The area A_I on the image of a primitive that subtends solid angle Ω at the center
of perspectivity is

A_I = Ω f sqrt(u^2 + v^2 + f^2)

The density γ of texture primitives in this area is then
We have, without loss of generality, the freedom to choose the sign of n so that
n'ξ is positive. Hence, after taking cube roots, we can rewrite Eq. (9.8) in the form

ξ_i' n = λ_0 b_i     (9.11)

where λ_0 is an unknown positive constant and b_i is computed from the measured
density of primitives around the ith image position.
The system of Eq. (9.11) is overconstrained. The normal n is not known and
must satisfy n'n = 1. The parameter λ_0 is not known. Let Φ be the matrix whose
ith row is ξ_i' and b the vector whose ith component is b_i. We will find the n that
maximizes the cosine of the angle between Φn and λ_0 b subject to the constraint
n'n = 1. An n maximizes the cosine of the angle between Φn and λ_0 b if and only
if the n maximizes (Φn)'b. Determining the n that maximizes (Φn)'b is easy. Let

r = (Φn)'b + η(n'n − 1)

Taking partial derivatives of r with respect to n and η results in

∂r/∂n = Φ'b + 2ηn
∂r/∂η = n'n − 1

Setting these partial derivatives to zero and solving yields

n = Φ'b / ||Φ'b||
Ohta, Maenobu, and Sakai (1981) assume that there are effective procedures
for finding texture primitives, that the texture primitives can be divided into types
distinguishable from one another on the perspective projection image, and that
the area of each instance of a textural primitive on the observed surface is the
same. They do not require that the density of texture primitives be constant on the
observed surface. They proceed by approximating the perspective projection with
an affine transformation and developing the required relationships from the affine
transformation. We will develop the analogous relationships in a more exact form
using the reasoning we have already given with solid angles.
Let A_P be the area of each of the primitives on the textural plane surface. It is
a constant. Let Ω be the solid angle formed by this area with respect to the center
of perspectivity. Then

Ω = A_P n'ξ / d^2

where ξ is the unit vector from the center of perspectivity toward the primitive.
Both Ω and ξ are functions of the position of the primitive on the perspective
projection image. Let A_I be the area of the perspective projection of the primitive.
Then, as we have already seen,

A_I = Ω f sqrt(u^2 + v^2 + f^2)     (9.13)
Consider the situation for two texture primitives having identical areas on the
textured plane Ax + By + Cz + D = 0. Let their perspective projection areas be
A_I1 and A_I2, respectively.
Next consider the geometry illustrated in Fig. 9.11 of the distances h_1 and h_2
from the points (u_1, v_1) and (u_2, v_2) to the line Au + Bv + Cf = 0 on the image
plane.
Draw the line between (u_1, v_1) and (u_2, v_2) and extend it until it intersects the
line Au + Bv + Cf = 0. Let f_1 be the distance from (u_1, v_1) along the extended line
to the line Au + Bv + Cf = 0, and let f_2 be the distance from (u_2, v_2) along the
extended line to the line Au + Bv + Cf = 0.
Figure 9.11 The similar triangle geometry generated between two perspective
projection points (u_1, v_1) and (u_2, v_2) and their distances to the line
Au + Bv + Cf = 0.
Let g be the distance between (u_1, v_1) and (u_2, v_2). Then f_1 = g + f_2. Substituting
this into Eqs. (9.17) and (9.18) results in Eq. (9.19).
Notice that the right-hand side of Eq. (9.19) contains all measurable quantities,
so that the distance f_2 can be determined. With f_2 determined, the intersection point
(u_3, v_3) is given by

(u_3, v_3) = (u_1, v_1) + (f_1 / g) [ (u_2, v_2) − (u_1, v_1) ]
Then from the measured position and area of each pair of texture primitives on
the perspective projection image, it is possible to determine one point on the line
Au + Bv + Cf = 0. The only assumption is that the areas of the texture primitives
on the textured plane Ax + By + Cz + D = 0 are identical. From N pairs of such
texture primitives, N points on the line Au + Bv + Cf = 0 can be determined.
Since f is given, a fit of the N points to a line will produce the unknown coefficients
n' = (A, B, C), which is the unit normal to the textured plane.
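A sketch of this final fitting step: each determined point (u_i, v_i) satisfies A u_i + B v_i + C f = 0, so (A, B, C) is the null direction of the stacked rows [u_i, v_i, f], recoverable by a total least-squares fit:

    import numpy as np

    def normal_from_image_line(points, f):
        # Each row [u, v, f] should be orthogonal to the plane normal (A, B, C).
        P = np.array([[u, v, f] for u, v in points], dtype=float)
        # Total least squares: right singular vector of the smallest singular value.
        _, _, Vt = np.linalg.svd(P)
        n = Vt[-1]
        return n / np.linalg.norm(n)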
Aloimonos and Swain (1988) use the same affine transformation approximation
to the perspective projection that Ohta et al. (1981) do. In terms of the geometry
we have been developing with solid angles, they derive the relationship between the area
A_I on the image of a texture primitive located at (u, v) and the area A_P on the
textured plane as

A_I = A_P (n'ξ) f sqrt(u^2 + v^2 + f^2) / d^2     (9.20)

where d, the distance from the center of perspectivity to the texture primitive on
the plane Ax + By + Cz + D = 0, must be considered a function of (u, v).
From our perspective, however, Aloimonos and Swain make the approximation that
f sqrt(u^2 + v^2 + f^2)/d^2 is a constant, which is reasonable when the depth of field of the
observed textured plane is narrow and u^2 + v^2 << f^2. In this case Eq. (9.20) becomes

A_I = λ n'ξ     (9.21)
where they call λ the "textural albedo," noticing the similarity between Eq. (9.21)
and the Lambertian reflectance equation (see Chapter 12).

Suppose the textural primitives are observed as all having identical area and
lying on the textured plane Ax + By + Cz + D = 0. Then by Eq. (9.21) we have,
for the ith primitive, A_I(u_i, v_i) = λ n'ξ_i. Hence the measurements again produce
an overconstrained linear system of the form of Eq. (9.11),

ξ_i' n = λ_0 b_i

where b_i = A_I(u_i, v_i) and the unit normal n can be recovered as before.
Blostein and Ahuja do not analytically solve for the (slant, tilt) angle. They do
a brute-force search over a set of fixed slant-tilt angles to find the (slant, tilt) angle
that best fits the observed areas. The motivation for the brute-force search is that
they accept the reality that it is difficult to determine which regions have the texture
of interest and which ones do not. By doing a global fit, they are able to identify
simultaneously the true texels from among the candidate regions and the slant and
tilt.
Blostein and Ahuja found the texel identification problem to be difficult and
gave the detailed procedure they found most satisfactory among the ones they tried.
They extracted disklike texture primitives by convolving the image with a Laplacian
of Gaussian kernel

∇²G(u, v) = [ (u^2 + v^2)/σ^4 − 2/σ^2 ] e^{−(u^2 + v^2)/2σ^2}

and with its derivative with respect to σ,

(∂/∂σ)∇²G(u, v) = [ (u^2 + v^2)^2/σ^7 − 6(u^2 + v^2)/σ^5 + 4/σ^3 ] e^{−(u^2 + v^2)/2σ^2}
For an image having a disk of diameter D and contrast C centered at (0, 0), that is,

I(x, y) = C if x^2 + y^2 ≤ D^2/4, and 0 elsewhere,

the responses of the filters ∇²G and (∂/∂σ)∇²G at the center of the disk were

(πCD^2/2σ^2) e^{−D^2/8σ^2} and (πCD^2/2) [ D^2/4σ^5 − 2/σ^3 ] e^{−D^2/8σ^2}

respectively.
From these expressions, it is apparent that by dividing one expression by the
other, it is possible to solve for both the diameter D and the contrast C of the disk.
This results in

D^2 = 4σ^3 (R_2/R_1) + 8σ^2 and C = (2σ^2 R_1 / (πD^2)) e^{D^2/8σ^2}

where R_1 denotes the ∇²G response and R_2 the (∂/∂σ)∇²G response at the disk center.
What Blostein and Ahuja did was to compute the convolutions ∇²G * I and
(∂/∂σ)∇²G * I at six σ values: sqrt(2), 2 sqrt(2), 3 sqrt(2), 4 sqrt(2), 5 sqrt(2), and 6 sqrt(2). They identified
the position of the local extrema in the ∇²G * I image. At each local extremum, they
used the values of (∂/∂σ)∇²G * I and ∇²G * I to compute the disk diameter D and contrast
C. If 2 sqrt(2) σ − 2 ≤ D ≤ 2 sqrt(2) σ + 2, a disk of diameter D was accepted to be centered
at the position of the extremum. Finally, they identified texture primitives as either
single disks that did not touch any other disk or the union of overlapping disks,
providing the union did not have concavities that were too sharp.
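A sketch in the spirit of this procedure, using SciPy's Laplacian-of-Gaussian filter and the relation D ≈ 2 sqrt(2) σ that holds at a response extremum over scale; the σ list, window size, and threshold are illustrative parameters:

    import numpy as np
    from scipy.ndimage import gaussian_laplace, maximum_filter

    def detect_disks(image, sigmas=(1.41, 2.83, 4.24), thresh=1.0):
        f = image.astype(float)
        disks = []
        for s in sigmas:
            r = np.abs(gaussian_laplace(f, sigma=s))
            # Local maxima of the response magnitude as candidate disk centers.
            peaks = (r == maximum_filter(r, size=5)) & (r > thresh)
            for i, j in zip(*np.nonzero(peaks)):
                disks.append((i, j, 2.0 * np.sqrt(2.0) * s))  # (row, col, diameter)
        return disks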
Summary

We have discussed the conceptual basis of texture in terms of its primitives and
their spatial relationships and have described several kinds of primitives and spatial relationships by which a texture can be characterized. We reviewed a number of
different textural feature extraction techniques, including co-occurrence, autocorrelation, digital transform, edgeness, relative extrema density, mathematical morphology, autoregressive Markov random fields, random mosaic models, and structural
approaches. Then we discussed texture segmentation and synthetic texture generation. Finally, we reviewed a variety of approaches to determining shape from texture.
These approaches assume that either the density or the size of the texture primitives is constant, and they permit the calculation of a local surface normal on the
basis of the texture density or size of the texture primitives at multiple locations on the
perspective projection image. Qualitatively, shape from texture can work; quantitatively, the techniques are generally not dependable. The degree of dependability
could actually be determined by an analytic calculation of the variance of the computed quantities as a function of the Poisson density parameter of the primitive-generating
process. However, this has not yet been done.
Exercises
9.1. Generate some synthetic 512 x 512 texture images by low-pass filtering a white-noise
image with the Gaussian kernel

G(r, c) = e^{−(r^2 + c^2)/2σ^2}

for σ = 1, 2, 3, 4, 5.
9.2. For each of the five images generated by Exercise 9.1, determine the entropy, energy,
and homogeneity features of the co-occurrence matrix for distance d = 1, 2, 3, 4, 5.
What can you say about the relationship between the feature value as a function of
σ and d?
9.3. For each of the five images generated in Exercise 9.1, determine the relationship
between the texture energy features of Laws and σ.
9.4. For each of the five images generated in Exercise 9.1, determine the relationship
between the textural edgeness features and σ.
9.5. For each of the five images generated in Exercise 9.1, determine the relationship
between the vector dispersion feature and σ.
9.6. For each of the five images generated by Exercise 9.1, use the Markov random field
model to estimate the coefficients of linear combinations h(i, j) from Eq. (9.4).
What can you infer about the relationship between σ and the h(i, j) coefficients?
9.7. Use the textured image of σ = 5 to texture a plane Ax + By + Cz + D = 0 in
three-dimensional space. Then compute the perspective projection of this textured
plane. Write a program to identify the position and size of the textural primitives
on the perspective projection image.
9.8. Use a method that assumes a constant density of texture primitives to estimate the
surface normal (A, B, C) of the textured plane from the image produced in Exercise 9.7.
9.9. Use a method that assumes a constant size of texture primitives to estimate the surface
normal (A, B, C) of the textured plane.
9.10. Repeat Exercise 9.8 for a number of trials with different texture images generated in
Exercises 9.1 and 9.7 with σ = 5. What is an appropriate way to estimate the
variance of the estimated unit normal vector? What is this variance as a function of
image size? Try images that are 64 x 64, 128 x 128, 256 x 256, and 512 x 512.
9.11. Repeat Exercise 9.9 for a number of trials with different texture images generated as
in Exercises 9.1 and 9.7 with σ = 5. What is an appropriate way to estimate the
variance of the estimated unit normal vectors? What is this variance as a function
of image size? Try images that are 64 x 64, 128 x 128, 256 x 256, and 512 x 512.
9.12. Devise an approach that uses both the constant density and the constant size of texture
primitives to estimate the surface normal (A, B, C) of the textured plane.
9.13. Write a program that uses the method of Exercise 9.12 to estimate the normal (A, B, C)
of a textured plane from the image produced in Exercise 9.7.
9.14. Repeat Exercise 9.13 for a number of trials with different texture images generated as
in Exercises 9.1 and 9.7 with σ = 5. What is the value for a suitable measure of
the variance of the estimated unit normal vector as a function of image size? Try
images that are 64 x 64, 128 x 128, 256 x 256, and 512 x 512.
Bibliography
Abend, K., T. J. Harley, and L. N. Kanal, "Classification of Binary Random Patterns,"
IEEE Transactions on Information Theory, Vol. IT-11, 1965, pp. 538-544.
Ahuja, N., T. Dubitzki, and A. Rosenfeld, "Some Experiments with Mosaic Models for
Images," IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-10, 1980, pp. 744-749.
Ahuja, N., and A. Rosenfeld, "Mosaic Models for Textures," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-3, 1981, pp. 1-11.
CHAPTER 10

IMAGE SEGMENTATION
10.1 Introduction
An image segmentation is the partition of an image into a set of nonoverlapping
regions whose union is the entire image. The purpose of image segmentation is
to decompose the image into parts that are meaningful with respect to a particular
application. For example, in two-dimensional part recognition, a segmentation might
be performed to separate the two-dimensional object from the background. Figure
10.1(a) shows a gray level image of an industrial part, and Figure 10.1(b) shows
its segmentation into object and background. In this figure the object is shown in
white and the background in black. In simple segmentations we will use gray levels
to illustrate the separate regions. In more complex segmentation examples where
there are many regions, we will use white lines on a black background to show the
separation of the image into its parts.
It is very difficult to tell a computer program what constitutes a "meaningful"
segmentation. Instead, general segmentation procedures tend to obey the following
rules.
1. Regions of an image segmentation should be uniform and homogeneous with
respect to some characteristic, such as gray level or texture.
2. Region interiors should be simple and without many small holes.
3. Adjacent regions of a segmentation should have significantly different values
with respect to the characteristic on which they are uniform.
4. Boundaries of each segment should be simple, not ragged, and must be spatially
accurate.

Figure 10.1 Gray level image of an industrial part and its segmentation into
object and background.
Figure 10.2 Set of points in a Euclidean measurement space that can be separated
into three clusters of points. Each cluster consists of points that are in some sense
close to one another.
This chapter describes the main ideas behind the major image segmentation
techniques and gives example results for a number of them. Additional image segmentation surveys can be found in Zucker (1976), Riseman and Arbib (1977),
Kanade (1980), and Fu and Mui (1981). Our point of view will be segmentation
with respect to the gray level characteristic. Segmentation on the basis of some other
characteristic, such as texture, can be achieved by first applying an operator that
transforms local texture to a texture feature value. Texture segmentation can then
be accomplished by applying segmentation with respect to the texture pattern value
characteristic exactly as if it were a gray level characteristic.
Measurement-Space-Guided Spatial Clustering
The technique of measurement-space-guided spatial clustering for image segmentation uses the measurement-space-clustering process to define a partition in measurement space. Then each pixel is assigned the label of the cell in the measurement-space partition to which it belongs. The image segments are defined as the connected
components of the pixels having the same label.
The segmentation process is, in general, an unsupervised clustering, since no
a priori knowledge about the number and type of regions present in the image is
available. The accuracy of the measurement-space-clustering image segmentation
process depends directly on how well the objects of interest on the image separate
into distinct measurement-space clusters. Typically the process works well in situations where there are a few kinds of distinct objects having widely different gray
level intensities (or gray level intensity vectors, for multiband images) and where
these objects appear on a nearly uniform background.
Clustering procedures that use the pixel as a unit and compare each pixel value
with every other pixel value can require excessively long computation times because
of the large number of pixels in an image. Iterative partition rearrangement schemes
have to go through the image data set many times and, if done without sampling, can
also take excessive computation time. Histogram mode seeking, because it requires
only one pass through the data, probably involves the least computation time of the
measurement-space-clustering technique, and it is the one we discuss here.
Histogram mode seeking is a measurement-space-clustering process in which
it is assumed that homogeneous objects on the image manifest themselves as the
clusters in measurement space. Image segmentation is accomplished by mapping the
clusters back to the image domain where the maximal connected components of the
mapped back clusters constitute the image segments. For images that are single-band
images, calculation of this histogram in an array is direct. The measurement-space
clustering can be accomplished by determining the valleys in this histogram and
declaring the clusters to be the intervals of values between valleys. A pixel whose
value is in the ith interval is labeled with index i , and the segment it belongs to is
one of the connected components of all pixels whose label is i. The thresholding
techniques discussed in Chapter 2 are examples of histogram mode seeking with
bimodal histograms.
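A sketch of histogram mode seeking for a single-band 8-bit image, assuming a simple smoothed-histogram valley detector (the smoothing width and function name are illustrative); the connected-components step that completes the segmentation would follow, for example with scipy.ndimage.label applied per label:

    import numpy as np

    def histogram_valley_labels(image, smooth=5):
        hist = np.bincount(image.ravel().astype(np.int64), minlength=256).astype(float)
        h = np.convolve(hist, np.ones(smooth) / smooth, mode="same")  # smooth histogram
        # Valleys: strictly lower than both neighbors.
        valleys = [v for v in range(1, 255) if h[v] < h[v - 1] and h[v] < h[v + 1]]
        edges = [0] + valleys + [256]        # clusters are intervals between valleys
        labels = np.zeros(image.shape, dtype=int)
        for i in range(len(edges) - 1):
            labels[(image >= edges[i]) & (image < edges[i + 1])] = i
        return labels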
Figure 10.3 illustrates an example image that is the right kind of image for the
measurement-space-clustering image segmentation process. It is an enlarged image
of a polished mineral ore section. The width of the field is about 1 mm. The ore is
from Ducktown, Tennessee, and shows subhedral-to-euhedral pyrite porphyroblasts
(white) in a matrix of pyrrhotite (gray). The black areas are holes. Figure 10.4
Figure 10.4 Histogram of the image in Fig. 10.3. The three nonoverlapping
modes correspond to the black holes, the pyrrhotite, and the pyrite.
shows the histogram of this image. The valleys are no trouble at all to find. The first
cluster is from the left end to the first valley. The second cluster is from the first
valley to the second valley. The third cluster is from the second valley to the right
end. Assigning to each pixel the cluster index of the cluster to which it belongs
and then assigning a unique gray level to each cluster label yields the segmentation
shown in Fig. 10.5. This is a virtually perfect (meaningful) segmentation.

Figure 10.6 shows an example image that is not ideal for measurement-space-clustering image segmentation. Figure 10.7 shows its histogram, which has three
modes and two valleys, and Fig. 10.8 shows the corresponding segmentation. Notice
the multiple-boundary area. It is apparent that the boundary between the grain and
Figure 10.9 An F-15 bulkhead. Images of portions of the bulkhead are used as
examples throughout this chapter.
the background is in fact shaded dark, and there are many such border regions that
show up as dark segments. In this case we do not desire the edge borders to be
separate regions, and although the segmentation procedure did exactly as it should
have done, the results are not what we desired. This illustrates that segmentation into
homogeneous regions is not necessarily a good solution to a segmentation problem.
The next example further illustrates the fallacies of measurement-space clustering. Figure 10.9 is a diagram of an F-15 bulkhead. Images of portions of the bulkhead, which were used as test data for an experimental robot guidance/inspection
system, will be used as examples throughout the rest of this chapter. Figure 10.10
illustrates an image of a section of the F-15 bulkhead. It is clear that the image
has distinct parts, such as webs and ribs. Figure 10.11 shows the histogram of this
image. It has two well-separated modes. The narrow one on the right with a long
left tail corresponds to specular reflection points. The main mode has three valleys
on its left side and two valleys on its right side. Defining the depth of a valley to
be the probability difference between the valley bottom and the lowest valley side
and eliminating the two shallowest valleys produces the segmentation shown in Fig.
10.12. The problem in the segmentation is apparent. Since the clustering was done
in measurement space, there was no requirement for good spatial continuation, and
the resulting boundaries are very noisy and busy. Separating the main mode into its
two most dominant submodes produces the segmentation of Fig. 10.13. Here the
boundary noise is less and the resulting regions are more satisfactory, but the detail
provided is much less.
Ohlander, Price, and Reddy (1978) refine the clustering idea in a recursive
way. They begin by defining a mask selecting all pixels on the image. Given any
mask, a histogram of the masked image is computed. Measurement-space clustering
enables the separation of one mode of the histogram set from another mode. Pixels
on the image are then identified with the cluster to which they belong. If there is
only one measurement-space cluster, then the mask is terminated. If there is more
than one cluster, then each connected component of all pixels with the same
cluster label is, in turn, used to generate a mask that is placed on a mask stack.
During successive iterations the next mask in the stack selects pixels in the histogram computation process. Clustering is repeated for each new mask until the stack
is empty.
Figure 10.14 illustrates this process, which we call recursive histogram-directed
spatial clustering. Figure 10.15 illustrates a recursive histogram-directed
spatial-clustering technique applied to the bulkhead image of Fig. 10.10. It produces a result with boundaries being somewhat busy and with many small regions
in areas of specular reflectance. Figure 10.16 illustrates the results of performing
a morphological opening with a 3 x 3 square structuring element on each region in
the segmentation of Fig. 10.15. Pixels that are removed by the opening are then
given values by a single region growing process. The tiny regions are removed in
this manner, but several important long, thin regions are also lost.
Figure 10.14 Recursive histogram-directed spatial clustering: the original mask
covers the entire image, and masks for new clusters are pushed onto a mask stack.

Figure 10.16 Result of performing a morphological opening with a 3 x 3 square
structuring element on the segmentation of Fig. 10.15 and then filling in the
removed pixels by a single region growing process.
For ordinary color images Ohta, Kanade, and Sakai (1980) suggest that histograms not be computed individually on the red, green, and blue (RGB) color
variables, but on a set of variables closer to what the Karhunen-Loeve (principal
components) transform would suggest. They suggest (R + G + B)/3, (R − B)/2,
and (2G − R − B)/4. Figure 10.17 illustrates a color image. Figure 10.18 shows
two segmentations of the color image: one by recursive histogram-directed spatial
clustering using the R, G, and B bands, and the second by the same method, but
using the transformed bands suggested by Ohta, Kanade, and Sakai.
Figure 10.18 Two segmentations of the color image of Fig. 10.17. The left
segmentation used the R, G, and B bands. The right segmentation was achieved by the same method
but using the transformed bands (R + G + B)/3, (R − B)/2, and (2G − R − B)/4
suggested by Ohta, Kanade, and Sakai (1980).
10.2.1 Thresholding
If the image contains a bright object against a dark background and the measurement
space is one-dimensional, measurement-space clustering amounts to determining a
threshold such that all pixels whose values are less than or equal to the threshold are
assigned to one cluster and the remaining pixels are assigned to the second cluster.
In the easiest cases a procedure to determine the threshold need only examine the
histogram and place the threshold in the valley between the two modes. Procedures
of this kind were discussed in Chapter 2. Unfortunately, it is not always the case
that the two modes are nicely separated by a valley. To handle this kind of situation,
a variety of techniques can be used to combine the spatial information on the image
with the gray level intensity information to help in threshold determination.
Chow and Kaneko (1972) suggest using a threshold that depends on the histogram
determined from the spatially local area around the pixel to which the threshold applies. Thus, for example, a neighborhood size of 33 x 33 or 65 x 65 can be
used to compute the local histogram. Chow and Kaneko avoided the local histogram
computation for each pixel's neighborhood by dividing the image into mutually
exclusive blocks, computing the histogram for each block, and determining an appropriate threshold for each histogram. This threshold value can be considered to
apply to the center pixel of each block. To obtain thresholds for the remaining pixels, they spatially interpolated the block center-pixel thresholds to obtain a spatially
adaptive threshold for each pixel.
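A sketch of this block-and-interpolate scheme; the per-block threshold below uses the block mean as a crude stand-in for a histogram-valley threshold, and the block size and function name are illustrative:

    import numpy as np
    from scipy.ndimage import zoom

    def adaptive_threshold(image, block=64):
        rows, cols = image.shape
        nr, nc = rows // block, cols // block
        t = np.empty((nr, nc))
        for i in range(nr):
            for j in range(nc):
                b = image[i*block:(i+1)*block, j*block:(j+1)*block]
                t[i, j] = b.mean()   # stand-in for a per-block valley threshold
        # Interpolate the block-center thresholds up to full resolution.
        t_full = zoom(t, (rows / nr, cols / nc), order=1)
        return image > t_full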
Weszka, Nagel, and Rosenfeld (1974) suggest determining a histogram for only
those pixels having a high Laplacian magnitude (see Chapter 6). They reason that
there will be a shoulder of the gray level intensity function at each side of the
boundary. The shoulder has high Laplacian magnitude. A histogram of all shoulder
pixels will be a histogram of all interior pixels just next to the interior border of the
region. It will not involve those pixels between regions that help make the histogram
valley shallow. It will also have a tendency to involve equal numbers of pixels from
the object and from the background. This makes the two histogram modes about
the same size. Thus the valley-seeking method for threshold selection has a chance
of working on the new histogram.
Weszka and Rosenfeld (1978) describe one method for segmenting white blobs
against a dark background by a threshold selection based on busyness. For any
threshold, busyness is the percentage of pixels having a neighbor whose thresholded
value is different from their own thresholded value. A good threshold is the point
near the histogram valley between the two peaks that minimizes the busyness.
Watanabe (1974) suggests choosing a threshold value that maximizes the sum of
gradients taken over all pixels whose gray level intensity equals the threshold value.
Kohler (1981) suggests a modification of the Watanabe idea. Instead of choosing
a threshold that maximizes the sum of gradient magnitudes taken over all pixels
whose gray level intensity equals the threshold value, Kohler suggests choosing the
threshold that detects more high-contrast edges and fewer low-contrast edges than
any other threshold.
Kohler defines the set E(T) of edges detected by a threshold T to be the set
of all pairs of neighboring pixels one of whose gray level intensities is less than or
equal to T and one of whose gray level intensities is greater than T:

E(T) = { [(i, j), (k, l)] such that
1. pixels (i, j) and (k, l) are neighbors, and
2. min{I(i, j), I(k, l)} ≤ T < max{I(i, j), I(k, l)} }     (10.1)
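A sketch of threshold selection under this criterion; the contrast credit min(hi − T, T − lo) used below is a stand-in for Kohler's exact contrast measure, and the exhaustive scan over T is for clarity rather than speed:

    import numpy as np

    def best_threshold(image):
        f = image.astype(float)
        # Horizontal and vertical neighboring pairs.
        pairs = [(f[:, :-1], f[:, 1:]), (f[:-1, :], f[1:, :])]
        scores = []
        for T in range(int(f.min()), int(f.max())):
            total, count = 0.0, 0
            for a, b in pairs:
                lo, hi = np.minimum(a, b), np.maximum(a, b)
                mask = (lo <= T) & (T < hi)          # pair belongs to E(T)
                total += np.minimum(hi - T, T - lo)[mask].sum()  # contrast credit
                count += mask.sum()
            scores.append(total / max(count, 1))     # average contrast of detected edges
        return int(f.min()) + int(np.argmax(scores))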
Figure 10.19 Diagram showing how the threshold of the Panda and Rosenfeld
technique depends on the gradient magnitude (gradient versus gray tone axes).
Figure 10.22 Pixels of the bulkhead image having large gradient magnitude.
525
would have to have 1006 = 1012locations. A large image might be 10,000pixels per
row by 10,000 rows. This constitutes only lo8 pixels, a sample too small to estimate
probabilities in a space of 1012values were it not for some constraints of reality: (1)
there is typically a high correlation between the band-to-band pixel values, and (2)
there is a large amount of spatial redundancy in image data. Both these factors create
a situation in which the 108 pixels can be expected to contain only between lo4 and
1V distinct 6-tuples. Based on this fact, the counting required for the histogram is
easily done by mapping the 6-tuples into array indexes. The programming technique
known as hashing, which is described in most data structures texts, can be used for
this purpose.
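A minimal sketch of this counting-by-hashing idea, assuming the six bands are stacked along the last axis of the array; a hash table (here Python's Counter) stores only the distinct 6-tuples that actually occur:

    from collections import Counter

    def multiband_histogram(image6):
        # image6 has shape (rows, cols, 6); count distinct 6-tuples by hashing
        # instead of allocating a 100**6 array.
        pixels = image6.reshape(-1, image6.shape[-1])
        return Counter(map(tuple, pixels.tolist()))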
Clustering using the multidimensional histogram is more difficult than univariate histogram clustering, since peaks fall in different places in the different histograms. Goldberg and Shlien (1977, 1978) threshold the multidimensional histogram to select all N-tuples situated on the most prominent modes. Then they
perform a measurement-space connected components on these N-tuples to collect
all the N-tuples in the tops of the most prominent modes. These measurement-space
connected sets form the cluster cores. The clusters are defined as the set of all
N-tuples closest to each cluster core.
An alternative possibility (Narendra and Goldberg, 1977) is to locate peaks in
the multidimensional measurement space and region-grow around them, constantly
descending from each peak. The region growing includes all successive neighboring
N-tuples whose probability is no higher than the N-tuple from which it is growing.
Adjacent mountains meet in their common valleys.
Rather than accomplish the clustering in the full measurement space, it is possible to work in multiple lower-order projection spaces and then reflect these clusters back to the full measurement space. Suppose, for example, that the clustering is done on a four-band image. If the clustering done in bands 1 and 2 yields
clusters c1, c2, c3 and the clustering done in bands 3 and 4 yields clusters c4 and
c5, then each possible 4-tuple from a pixel can be given a cluster label from the
set {(c1, c4), (c1, c5), (c2, c4), (c2, c5), (c3, c4), (c3, c5)}. A 4-tuple (x1, x2, x3, x4) gets
the cluster label (c2, c4) if (x1, x2) is in cluster c2 and (x3, x4) is in cluster c4.
Figure 10.25 Simple gray level image and graph resulting from defining "similar
enough" to be differing in gray level by less than 5 and using the 4-neighborhood
to determine connected components (see Chapter 2).
Single-linkage region-growing schemes do, however, have a problem with chaining, because it takes only one arc leaking
from one region to a neighboring one to cause the regions to merge.
As illustrated in Fig. 10.25, the simplest single-linkage scheme defines "similar
enough" by pixel difference. Two neighboring pixels are similar enough if the
absolute value of the difference between their gray level intensity values is small
enough. Bryant (1979) defines "similar enough" by normalizing the difference by a
quantity proportional to the root-mean-square value of neighboring pixel differences taken
over the entire image. For the image of Fig. 10.25, the normalization factor is 99.22.
The random variable that is the difference of two neighboring pixels normalized by
the factor 1/99.22 has a normal distribution with mean 0 and unit standard deviation.
A threshold can now be chosen in terms of the standard deviation instead of
as an absolute value. For pixels having vector values, the obvious generalization is
to use a vector norm of the pixel difference vector.
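A sketch of the simplest single-linkage scheme, joining 4-neighbors whose gray levels differ by less than a threshold and reporting connected components via union-find; the chaining behavior described above is inherent, since one linking arc merges two otherwise distinct regions:

    import numpy as np

    def single_linkage_segments(image, max_diff=5):
        rows, cols = image.shape
        parent = np.arange(rows * cols)

        def find(a):
            while parent[a] != a:
                parent[a] = parent[parent[a]]   # path halving
                a = parent[a]
            return a

        def union(a, b):
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra

        f = image.astype(int)
        for r in range(rows):
            for c in range(cols):
                if c + 1 < cols and abs(f[r, c] - f[r, c + 1]) < max_diff:
                    union(r * cols + c, r * cols + c + 1)
                if r + 1 < rows and abs(f[r, c] - f[r + 1, c]) < max_diff:
                    union(r * cols + c, (r + 1) * cols + c)
        labels = np.array([find(i) for i in range(rows * cols)])
        return labels.reshape(rows, cols)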
Haralick and Dinstein (1975), however, do report some success using this technique
on LANDSAT data. They perform a dilation of the edge pixels in order to close
gaps before performing the connected components operator. Perkins (1980) uses a
similar technique.
Haralick (1982, 1984) discusses a very sensitive zero crossing of the second
directional derivative edge operator. In this technique, each neighborhood is least-squares
fitted with a cubic polynomial in two variables. The first and second partial
derivatives are easily determined from the polynomial. The first partial derivatives
at the center pixel determine the gradient direction. With the direction fixed to be the
gradient direction, the second partials determine the second directional derivative. If
the gradient is high enough and if, in the gradient direction, the second directional
derivative has a negatively sloped zero crossing inside the pixel's area, then an edge
is declared in the neighborhood's center pixel. (This edge operator is described in
more detail in Chapter 8.)

Figure 10.27 shows the edges resulting from the second directional derivative
zero-crossing edge operator.
Figure 10.28 Segmentation resulting from the hybrid-linkage scheme in which any pair of neighboring pixels, neither of which is an edge pixel, can link together. The resulting segmentation consists of the connected components of the nonedge pixels, augmented by assigning edge pixels to their nearest connected component. This result was obtained from the edge image of Fig. 10.27.
Edges are declared between any pair of adjacent pixels when the t-statistic from their
neighborhoods is high enough. As N1 and N2 get large, 2 log t is asymptotically distributed as a chi-squared variate with two degrees of freedom.
If it can be assumed that the variances of the two regions are identical, then the statistic

$$F = \frac{(N_1+N_2-2)\,N_1 N_2\,(\bar X_1-\bar X_2)^2}{(N_1+N_2)\,(S_1^2+S_2^2)} \tag{10.8}$$

has an F distribution with 1 and N1 + N2 - 2 degrees of freedom under the hypothesis that the means of the regions are equal. For an F-value that is sufficiently large, the hypothesis can be rejected and an edge declared to exist between the regions.
Haralick (1980) suggests fitting a plane to the neighborhood around the pixel
and testing the hypothesis that the slope of the plane is zero. Edge pixels correspond to pixels between neighborhoods in which the zero-slope hypothesis has to
be rejected. To determine a roof or V-shaped edge, Haralick suggests fitting a plane
to the neighborhoods on either side of the pixel and testing the hypothesis that the
coefficients of fit, referenced to a common framework, are identical.
Another hybrid technique, first used by Levine and Leemet (1976), is based
on the Jarvis and Patrick (1973) shared-nearest-neighbor idea. Using any kind of
reasonable notion for similarity, each pixel examines its K x K neighborhood and
makes a list of the N pixels in the neighborhood most similar to it. Call this list the
similar-neighbor list, where we understand neighbor to be any pixel in the K x K
neighborhood. An arc joins any pair of immediately neighboring pixels if each pixel
is in the other's shared-neighbor list and if there are enough pixels common to their
shared-neighbor lists, that is, if the number of shared neighbors is high enough.
To make the shared-neighbor technique work well, each pixel can be associated
with a property vector consisting of its own gray level intensity and a suitable average
of the gray level intensities of pixels in its K × K neighborhood. For example, we can
have ( x ,a) and (y, b) denote the property vectors for two pixels if x is the gray level
intensity value and a is the average gray level intensity value in the neighborhood
of the first pixel, y is the gray level intensity value and b is the average gray level
intensity value in the neighborhood of the second pixel. Similarity can be established
by computing
$$S = w_1(x - y)^2 + w_2(x - b)^2 + w_3(y - a)^2 \tag{10.9}$$
where w1, w2, and w3 are nonnegative weights. Thus the quantity S takes into account the difference between the gray levels of the two pixels in question and the difference between the gray level of each pixel and the average gray level of the neighborhood of the other pixel. The weights w1, w2, and w3 can be learned from training data for a particular class of images. The pixels are called similar enough for small enough values of S.
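For concreteness, Eq. (10.9) can be written directly in Python; the default weights shown are placeholders for values to be learned from training data:

def similarity(x, a, y, b, w1=1.0, w2=1.0, w3=1.0):
    # (x, a): gray level and neighborhood-average gray level of the first pixel;
    # (y, b): the same quantities for the second pixel. Smaller S means more similar.
    return w1 * (x - y) ** 2 + w2 * (x - b) ** 2 + w3 * (y - a) ** 2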
Pong et al. (1984) suggest an approach to segmentation based on the facet model of images. The procedure starts with an initial segmentation of the image into small regions. The initial segmentations used by Pong group together pixels that have similar facet-fitting parameters (see Chapter 8), but any initial segmentation can be used. For each region of the initial segmentation, a property vector, which is a list of values of a set of predefined attributes, is computed. The attributes consist of such properties of a region as its area, its mean gray level, and its elongation. Each region with an associated property vector is considered a unit. In a series of iterations the property vector of a region is replaced by a property vector that is a function of its neighboring regions. (The function that worked best in Pong's experiments replaced the property vector of a region with the property vector of the best-fitting neighborhood of that region.) Then adjacent regions having similar final property vectors are merged. This gives a new segmentation, which can then be used as input to the algorithm. Thus a sequence of coarser and coarser segmentations is produced. Useful variations are to prohibit merging across strong edge boundaries or when the variance of the combined region becomes too large. Figures 10.29, 10.30, and 10.31 illustrate the results of the Pong approach on the image of Fig. 10.10 for one, two, and three iterations, respectively. Figure 10.32 illustrates the result of removing regions of size 25 or fewer pixels from the segmentation of Fig. 10.31.
Figure 10.32 Result of removing regions of size 25 or fewer pixels from the segmentation of Fig. 10.31.
Under the assumption that all the pixels in R and the test pixel y are independent and identically distributed normals, the statistic

$$t = \left[\frac{(N-1)N}{N+1}\,\frac{(y-\bar X)^2}{S^2}\right]^{1/2},$$

where N is the number of pixels in R, X̄ their mean, and S² their scatter, has a T with N - 1 degrees of freedom distribution. If t is small enough, y is added to region R and the mean and scatter are updated by using y. The new mean and scatter are given by

$$\bar X_{\text{new}} = \frac{N\bar X_{\text{old}} + y}{N+1}$$

and

$$S^2_{\text{new}} = S^2_{\text{old}} + (y - \bar X_{\text{new}})^2 + N(\bar X_{\text{new}} - \bar X_{\text{old}})^2 \tag{10.14}$$
If t is too high, the value y is not likely to have arisen from the population
of pixels in R. If y is different from a l l its neighboring regions, then it begins its
own region. A slightly stricter linking criterion can require that not only must y be
close enough to the mean of the neighboring regions, but a neighboring pixel in that
region must have a close-enough value to y. This combines a centroid-linkage and
a single-linkage criterion. The next section discusses a more powerful combination
technique, but first we want to develop the concept of "significantly high."
To give a precise meaning to the notion of too high a difference, we use an α-level statistical significance test. The fraction α represents the probability that a T-statistic with N - 1 degrees of freedom will exceed the value T(N-1)(α). If the observed T is larger than T(N-1)(α), then we declare the difference to be significant. If the pixel and the segment really come from the same population, the probability that the test provides an incorrect answer is α.
The significance level α is a user-provided parameter. The value of T(N-1)(α) is higher for small degrees of freedom and lower for larger degrees of freedom. Thus, with region scatters considered to be equal, the larger a region is, the closer a pixel's value has to be to the region's mean in order to merge into the region. This behavior tends to prevent an already large region from attracting to it many other additional pixels and tends to prevent the drift of the region mean as the region gets larger.
Note that all regions initially begin as one pixel in size. To avoid the problem of division by 0 (for S² is necessarily 0 for one-pixel regions as well as for regions having identically valued pixels), a small positive constant can be added to S². One
convenient way of determining the constant is to decide on a prior variance V > 0
and an initial segment size N. The initial scatter for a new one-pixel region is then
given by NV, and the new initial region size is given by N. This mechanism keeps
the degrees of freedom of the T-statistic high enough so that a significant difference
is not the huge difference required for a T-statistic with a small number of degrees of
freedom. To illustrate this method, consider the second image of the F-15 bulkhead
shown in Fig. 10.34. Figure 10.35 illustrates the resulting segmentation of this
bulkhead image for a 0.2% significance level test after all regions smaller than 25
pixels have been removed.
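The following Python sketch illustrates the acceptance test and the incremental mean and scatter updates, including the prior-scatter device just described. It assumes scipy for the t-distribution percentage point and is an illustration, not the authors' implementation:

import numpy as np
from scipy import stats

class Region:
    def __init__(self, y, N0, V):
        self.n = N0                  # effective initial size (prior)
        self.mean = float(y)
        self.scatter = N0 * V        # prior scatter NV avoids division by zero

    def t_value(self, y):
        # T-statistic with n-1 degrees of freedom for adding value y
        return np.sqrt((self.n - 1) * self.n / (self.n + 1.0)
                       * (y - self.mean) ** 2 / self.scatter)

    def add(self, y):
        old_mean = self.mean
        self.n += 1
        self.mean = old_mean + (y - old_mean) / self.n
        # scatter update as in Eq. (10.14)
        self.scatter += (y - self.mean) ** 2 + (self.n - 1) * (self.mean - old_mean) ** 2

def accept(region, y, alpha=0.002):
    # merge y into the region when the difference is not significant at level alpha
    return region.t_value(y) < stats.t.ppf(1 - alpha, region.n - 1)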
Pavlidis (1972) suggests a more general version of this idea. Given an initial
segmentation where the regions are approximated by some functional fit
guaranteed to have a small enough error, pairs of neighboring regions can be
merged if for each region the sum of the squares of the differences between the
fitted coefficients for this region and the corresponding averaged coefficients, averaged over both regions, is small enough. Pavlidis gets his initial segmentation by finding the best way to divide each row of the image into segments with a sufficiently good fit. He also describes a combinatorial tree search algorithm to accomplish the merging that guarantees the best result. Kettig and Landgrebe (1975) successively merge small image blocks using a statistical test. They avoid much of the problem of zero scatter by considering only cells containing a 2 × 2 block of pixels.
Gupta et al. (1973) suggest using a t-test based on the absolute value of the difference between the pixel and the nearest region as the measure of dissimilarity.
Kettig and Landgrebe (1975) discuss the multiband situation leading to the F-test
and report good success with LANDSAT data.
Nagy and Tolaba (1972) simply examine the absolute value of the difference between the pixel's value and the mean of a neighboring region formed already. If this distance is small
enough, the pixel is added to the region. If there is more than one region, then the
pixel is added to the region with the smallest distance.
The Levine and Shaheen (1981) scheme is similar. The difference is that Levine
and Shaheen attempt to keep regions more homogeneous and try to keep the region
scatter from becoming too high. They do this by requiring the differences to be
more significant before a merge takes place if the region scatter is high. For a
user-specified value θ, they define a test statistic t for the merge decision.
Hybrid-Linkage Combinations
The previous section mentioned the simple combination of centroid-linkage and single-linkage region growing. In this section we discuss more powerful hybrid-linkage segmentation combinations.
Spatial Clustering

It is possible to determine the image segments by simultaneously combining clustering in measurement space with a spatial region growing. We call such a technique spatial clustering. In essence, spatial-clustering schemes combine the histogram-mode-seeking technique with a region-growing or a spatial-linkage technique.
Haralick and Kelly (1969) suggest that segmentation be done by first locating, in turn, all the peaks in the measurement-space histogram, and then determining all pixel locations having a measurement on the peak. Next, beginning with a pixel
corresponding to the highest peak not yet processed, both spatial and measurementspace region growing are simultaneously performed in the following manner. Initially each segment is the pixel whose value is on the current peak. Consider for
possible inclusion into this segment a neighbor of this pixel (in general, the neighbors of the pixel we are growing from) if the neighbor's value (an N-tuple for an
N-band image) is close enough in measurement space to the pixel's value and if its
probability is not larger than the probability of the value of the pixel we are growing from. Matsumoto, Naka, and Yanamoto (1981) discuss a variation of this idea.
Milgram (1979) defines a segment for a single-band image to be any connected
component of pixels all of whose values lie in some interval I and whose border
has a higher coincidence with the border created by an edge operator than for any
other interval I. The technique has the advantage over the Haralick and Kelly technique in that it does not require the difficult measurement-space exploring done in
climbing down a mountain. But, it does have to try many different intervals for each
segment. Extending it to efficient computation in multiband images appears difficult.
However, Milgram does report good results from segmenting white blobs against a
black background. Milgram and Kahl (1979) discuss embedding this technique into
the Ohlander (1978) recursive control structure.
Minor and Sklansky (1981) make more active use of the gradient edge image
than Milgram but restrict themselves to the more constrained situation of small
convexlike segments. They begin with an edge image in which each edge pixel
contains the direction of the edge. The orientation is such that the higher-valued
gray level is to the right of the edge. Then each edge sends out for a limited distance
a message to nearby pixels and in a direction orthogonal to the edge direction. The
message indicates the sender's edge direction. Pixels that pick up these messages
from enough different directions must be interior to a segment.
The spoke filter of Minor and Sklansky counts the number of distinct directions appearing in each 3 x 3 neighborhood. If the count is high enough, Minor and
Sklansky mark the center pixel as belonging to an interior of a region. Then the
connected component of all marked pixels is obtained. The gradient-guided segmentation is completed by performing a region growing of the components. The region
growing must stop at the high-gradient pixels, thereby assuring that no undesired
boundary placements are made.
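A rough Python sketch of the spoke-filter counting step follows. Here dir_img is assumed to hold a quantized direction code (1 through 8) at pixels reached by an edge's message and 0 elsewhere, and the count threshold is an assumed parameter:

import numpy as np

def spoke_interior(dir_img, count_threshold=6):
    # Mark a pixel as region interior when its 3x3 neighborhood contains
    # enough distinct direction codes.
    H, W = dir_img.shape
    interior = np.zeros((H, W), dtype=bool)
    for r in range(1, H - 1):
        for c in range(1, W - 1):
            window = dir_img[r - 1:r + 2, c - 1:c + 2]
            directions = set(window.ravel()) - {0}   # distinct nonzero codes
            interior[r, c] = len(directions) >= count_threshold
    return interior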
Burt, Hong, and Rosenfeld (1981) describe a spatial-clustering scheme that is
a spatial pyramid constrained ISODATA kind of clustering. The bottom layer of the
pyramid is the original image. Each successive higher layer of the pyramid is an
image having half the number of pixels per row and half the number of rows of the
image below it. Figure 10.39 illustrates the pyramid structure. Initial links between
layers are established by linking each parent pixel to the spatially corresponding
4 x 4 block of child pixels. Each pair of adjacent parent pixels has eight child
pixels in common. Each child pixel is linked to a 2 x 2 block of parent pixels. The
iterations proceed by assigning to each parent pixel the average of its child pixels.
Then each child pixel compares its value with each of its parent's values and links
itself to its closest parent. Each parent's new value is the average of the children
to which it is linked, and so on. The iterations converge reasonably quickly for the
same reason the ISODATA iterations converge. If the top layer of the pyramid is
a 2 × 2 block of great-grandparents, then there are at most four segments that can result from the linking.
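One parent-child relinking iteration might be sketched in Python as follows; the candidate-parent indexing and border handling are simplifying assumptions rather than the exact scheme of Burt, Hong, and Rosenfeld, and both levels are assumed to be float arrays:

import numpy as np

def relink_once(child, parent):
    Hc, Wc = child.shape
    link = np.zeros((Hc, Wc, 2), dtype=int)       # chosen parent per child
    for r in range(Hc):
        for c in range(Wc):
            # candidate parents: a 2x2 block around (r//2, c//2)
            cands = [(pr, pc)
                     for pr in (r // 2, min(r // 2 + 1, parent.shape[0] - 1))
                     for pc in (c // 2, min(c // 2 + 1, parent.shape[1] - 1))]
            link[r, c] = min(cands, key=lambda p: abs(child[r, c] - parent[p]))
    # each parent's new value is the average of the children linked to it
    new_parent = parent.copy()
    for pr in range(parent.shape[0]):
        for pc in range(parent.shape[1]):
            mask = (link[..., 0] == pr) & (link[..., 1] == pc)
            if mask.any():
                new_parent[pr, pc] = child[mask].mean()
    return new_parent, link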
Split and Merge

A splitting method begins with the entire image as one segment. Then the method successively splits each of its current segments into quarters if the segment is not homogeneous enough, that is, if the difference between the largest and the smallest gray level is large. A merging method starts with an initial segmentation and successively merges regions that are similar enough.
Splitting algorithms were first suggested by Robertson (1973) and Klinger
(1973). Kettig and Landgrebe (1975) try to split all nonuniform 2 x 2 neighborhoods before beginning the region merging. Fukada (1980) suggests successively
splitting a region into quarters until the sample variance is small enough. Efficiency
of the split-and-merge method can be increased by arbitrarily partitioning the image into square regions of a user-selected size and then splitting these further if they
are not homogeneous.
Because segments are successively divided into quarters, the boundaries produced by the split technique tend to be squarish and slightly artificial. Sometimes
adjacent quarters coming from adjacent split segments need to be joined rather than
remain separate. Horowitz and Pavlidis (1976) suggest the split-and-merge strategy to take care of this problem. They begin with an initial segmentation achieved
by splitting into rectangular blocks of a prespecified size. The image is represented
by a segmentation tree, which is a quadtree data structure (a tree whose nonleaf nodes each have four children). The entire image is represented by the root node. The children of the root are the regions obtained by splitting the root into four equal pieces, and so on. A segmentation is represented by a cutset, a minimal set of nodes separating the root from all of the leaves. In the tree structure the merging process consists of removing four nodes from the cutset and replacing them with their parent. Splitting consists of removing a node from the cutset and replacing it with its four children. The two processes are mutually exclusive; all the merging operations are followed by all the splitting operations. The splitting and merging in the tree structure is followed by a final grouping procedure that can merge adjacent unrelated blocks found in the final cutset. Figure 10.40 illustrates the result of
a Horowitz and Pavlidis type split-and-merge segmentation of the bulkhead image.
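As an illustration of the idea (using the simple max-min homogeneity predicate rather than Horowitz and Pavlidis's error criterion), the following Python sketch recursively splits a square power-of-two image and then makes a final grouping pass that merges adjacent leaf blocks with similar means:

import numpy as np

def split(img, tol, r=0, c=0, size=None):
    # split into quarters until each block satisfies max - min <= tol
    if size is None:
        size = img.shape[0]                 # assumes square, power-of-two image
    block = img[r:r + size, c:c + size]
    if size == 1 or block.max() - block.min() <= tol:
        return [(r, c, size)]               # homogeneous leaf of the cutset
    h = size // 2
    return (split(img, tol, r, c, h) + split(img, tol, r, c + h, h) +
            split(img, tol, r + h, c, h) + split(img, tol, r + h, c + h, h))

def group(img, leaves, tol):
    # final grouping: union-find over edge-adjacent blocks with similar means
    parent = list(range(len(leaves)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    means = [img[r:r + s, c:c + s].mean() for (r, c, s) in leaves]
    for i, (r1, c1, s1) in enumerate(leaves):
        for j, (r2, c2, s2) in enumerate(leaves[:i]):
            share_row = r1 < r2 + s2 and r2 < r1 + s1   # row intervals overlap
            share_col = c1 < c2 + s2 and c2 < c1 + s1   # column intervals overlap
            adjacent = ((c1 + s1 == c2 or c2 + s2 == c1) and share_row or
                        (r1 + s1 == r2 or r2 + s2 == r1) and share_col)
            if adjacent and abs(means[i] - means[j]) <= tol:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(leaves))]        # region label per block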
Muerle and Allen (1968) suggest merging a pair of adjacent regions if a statistical
test determines that their gray level intensity distributions are similar enough. They
recommend the Kolmogorov-Smirnov test.
Chen and Pavlidis (1980) suggest using statistical tests for uniformity rather
than a simple examination of the difference between the largest and the smallest
gray level intensities in the region under consideration for splitting. The uniformity
test requires that there be no significant difference between the mean of the region
and the mean of each of its quarters. The Chen and Pavlidis tests assume that the variances are equal and known.
Let each quarter have K pixels, with X(ij) being the jth pixel in the ith quarter; let X̄(i) be the mean of the ith quarter; and let X̄(..) be the grand mean of all the pixels in the four quarters. Then in order for a region to be considered homogeneous, Chen and Pavlidis require that

$$|\bar X_i - \bar X_{\cdot\cdot}| < \theta, \qquad i = 1,\dots,4,$$

where θ is a given threshold parameter.
We give here the F-test for testing the hypothesis that the means and variances of the quarters are identical. This is the optimal test when the randomness can be modeled as arising from additive Gaussian-distributed variates. The value of the variance is not assumed known. Under the assumption that the regions are independent and identically distributed normals, the optimal test is given by the statistic F, which is defined by

$$F = \frac{K\sum_{i=1}^{4}(\bar X_i-\bar X_{\cdot\cdot})^2 \,/\, 3}{\sum_{i=1}^{4}\sum_{j=1}^{K}(X_{ij}-\bar X_i)^2 \,/\, (4K-4)},$$

which has an F distribution with 3 and 4(K - 1) degrees of freedom under the hypothesis of identical means.
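Under the assumptions stated above, this uniformity check can be coded as the standard one-way analysis-of-variance F-test; the sketch below uses scipy for the critical value and takes the four quarters as 1-D arrays of K pixel values each:

import numpy as np
from scipy import stats

def quarters_homogeneous(quarters, alpha=0.05):
    K = len(quarters[0])
    means = np.array([q.mean() for q in quarters])
    grand = np.concatenate(quarters).mean()
    between = K * np.sum((means - grand) ** 2) / 3.0                 # 3 d.o.f.
    within = sum(((q - q.mean()) ** 2).sum() for q in quarters) / (4 * K - 4)
    F = between / within
    # homogeneous when F does not exceed the alpha-level critical value
    return F < stats.f.ppf(1 - alpha, 3, 4 * K - 4)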
Rule-Based Segmentation
The rules behind each of the methods discussed so far are encoded in the procedures
of the method. Thus it is not easy to try different concepts without complete reprogramming. Nazif and Levine (1984) solve this problem with a rule-based expert
system for segmentation. The knowledge in the system is not application domain
specific, but includes general-purpose, scene-independent knowledge about images
and grouping criteria. Rule-based systems including higher-level knowledge are discussed in Chapter 19.
The Nazif and Levine system contains a set of processes (the initializer, the line analyzer, the region analyzer, the area analyzer, the focus of attention, and the scheduler) plus two associated memories, the short-term memory (STM) and
the long-term memory (LTM). The short-term memory holds the input image, the
segmentation data, and the output. The long-term memory contains the model representing the system knowledge about low-level segmentation and control strategies.
As described in detail in Chapter 19, a system process matches rules in the LTM
against the data stored in the STM. When a match occurs, the rule fires, and an
action, usually involving data modification, is performed.
The model stored in the LTM has three levels of rules. At level 1 are knowledge
rules that encode information about the properties of regions, lines, and areas in
the form of situation-action pairs. The specific actions include splitting a region;
merging two regions; adding, deleting, or extending a line; merging two lines; and
creating or modifying a focus-of-attention area. Knowledge rules are classified by
their actions. At level 2 are the control rules, which are divided into two categories:
focus-of-attention rules and inference rules. Focus-of-attention rules find the next
data entry to be considered: a region, a line, or an entire area. These rules control
the focus-of-attention strategy. The inference rules are metarules in that their actions
do not modify the data in the STM. Instead, they alter the matching order of different
knowledge rule sets. Thus they control which process will be activated next. At level
3, the highest rule level, are strategy rules that select the set of control rules that
executes the most appropriate control strategy for a given set of data.
The conditions of the rules in the rule base are made up of (1) a symbolic
qualifier depicting a logical operation to be performed on the data, (2) a symbol
denoting the data entry on which the condition is to be matched, (3) a feature of
this data entry, (4) an optional NOT qualifier, and (5) an optional DIFFERENCE
qualifier that applies the operation to differences in feature values. Table 10.1 shows
the different types of data entries allowed. Tables 10.2 to 10.4 show the different
kinds of features, and Tables 10.5 and 10.6 show the possible actions that can be
associated with a rule. Table 10.7 illustrates several rules from the system.
The Nazif and Levine approach to segmentation is useful because it is general
but allows more specific strategies to be incorporated without changing the code.
Other rule-based segmentation systems tend to use high-level-knowledge models of
the expected scene instead of general rules. The paper by McKeown discussed in
Chapter 19 takes this approach for aerial images of airport scenes.
Table 10.1 Allowable data entry types in the Nazif and Levine rule-based segmentation system.

Data Entry                                Symbol
Current region                            REG
Current line                              LINE
Current area                              AREA
Region ADJACENT to current region         REGA
Region to the LEFT of current line        REGL
Region to the RIGHT of current line       REGR
Line NEAR current line                    LINEN
Line in FRONT of current line             LINEF
Line BEHIND current line                  LINEB
Line PARALLEL TO current line             LINEP
Line INTERSECTING current region          LINEI

Table 10.2 Numerical descriptive features that can be associated with the condition part of a rule.
Numerical Descriptive Features

Feature 1              Feature 2             Feature 3
Variance 1             Variance 2            Variance 3
Intensity              Intensity variance    Gradient
Gradient variance      X-centroid            Y-centroid
Minimum X              Minimum Y             Maximum X
Maximum Y              Starting X            Starting Y
Ending X               Ending Y              Starting direction
Ending direction       Average direction     Length
Start-end distance     Size                  Perimeter
Histogram bimodality   Circularity           Aspect ratio
Uniformity 1           Uniformity 2          Uniformity 3
Region contrast 1      Region contrast 2     Region contrast 3
Line contrast 1        Line contrast 2       Line contrast 3
Line connectivity      Number of regions     Number of lines
Number of areas
Table 10.3 Numerical spatial features that can be associated with the condition part of a rule.

Numerical Spatial Features
Adjacency values
Line content between regions
Nearest point on line in FRONT
Nearest point of line BEHIND
Number of PARALLEL points
Adjacency of RIGHT region
Number of lines BEHIND
Number of regions to the LEFT
Table 10.4
Logical features that can be associated with the condition part of a rule.
Logical Features
Histogram is bimodal
Line is open
Line is loop
Line start is open
Area is smooth
Area is bounded
One region to the LEFT
Table 10.5 Area, region, and line analyzer actions that can be associated with a rule.

Area Analyzer Actions
Motion-Based Segmentation
In time-varying image analysis (Chapter 15) the data are a sequence of images
instead of a single image. One paradigm under which such a sequence can arise
is with a stationary camera viewing a scene containing moving objects. In each
frame of the sequence after the first frame, the moving objects appear in different
positions of the image from those in the previous frame. Thus the motion of the
objects creates a change in the images that can be used to help locate the moving
objects.
Jain, Martin, and Aggarwal (1979) used differencing operations to identify areas
containing moving objects. The images of the moving objects were obtained by
focusing the segmentation processes on these restricted areas. In this way motion
was used as a cue to the segmentation process. Thompson (1980) developed a
method for partitioning a scene into regions corresponding to surfaces with distinct
velocities. He first computed velocity estimates for each point of the scene (see
Chapter 15) and then performed the segmentation by a region-merging procedure
that combined regions based on similarities in both intensity and motion.
Table 10.6 Focus-of-attention and supervisor actions that can be associated with a rule.

Focus-of-Attention Actions
Region with highest adjacency
Region with lowest adjacency
Region with higher label
Region to the LEFT of line
Closest line in FRONT
Closest PARALLEL line
Longest line that is NEAR
Weakest line that is NEAR
Next scanned line
Defocus (focus on whole image)
Clear region list
Freeze area
Next smooth area
Next bounded area

Supervisor Actions
Initialize regions
Initialize lines
Generate areas
Match region rules
Match line rules
Match area rules
Match focus rules
Start
Stop
Jain (1984) handled the more complex problem of segmenting dynamic scenes
using a moving camera. He used the known location of the focus of expansion (see
Chapter 15) to transform the original frame sequence into another camera-centered
sequence. The ego-motion polar transform (EMP) works as follows. Suppose that A is a point in 3-space having coordinates (x, y, z), and the camera at time 0 is located at (x0, y0, z0). During the time interval between frames, the camera undergoes displacement (dx0, dy0, dz0), and the point A undergoes displacement (dx, dy, dz). When the projection plane is at z = 1, the focus of expansion is at (dx0/dz0, dy0/dz0). The projection A' of point A after the displacements is at (X, Y) in the image plane, where

$$X = \frac{(x+dx) - (x_0+dx_0)}{(z+dz) - (z_0+dz_0)}$$

and

$$Y = \frac{(y+dy) - (y_0+dy_0)}{(z+dz) - (z_0+dz_0)}$$
Table 10.7 Several rules from the system. Each rule has the form IF (a list of conditions) THEN (an action); the table includes, for example, a line-merging rule.
The point A' is converted into its polar coordinates (r, θ), with the focus of expansion being the origin in the image plane. The polar coordinates are given by

$$\theta = \tan^{-1}\!\left(\frac{Y - dy_0/dz_0}{X - dx_0/dz_0}\right)$$

and

$$r = \left[(X - dx_0/dz_0)^2 + (Y - dy_0/dz_0)^2\right]^{1/2}$$
In (r, θ) space the segmentation is simplified. Assume that the transformed picture is represented as a two-dimensional image having θ along the vertical axis and r along the horizontal axis. If the camera continues its motion in the same direction, then the focus of expansion remains the same, and θ remains constant. Thus the radial motion of the stationary point A' in the image plane due to the motion of the camera is converted to horizontal motion in (r, θ) space. If the camera has only a translational component to its motion, then all the regions that show only horizontal velocity in the (r, θ) space can be classified as due to stationary surfaces. The regions having
a vertical velocity component are due to nonstationary surfaces. The segmentation
algorithm first separates the stationary and nonstationary components on the basis of
their velocity components in (r, θ) space. The stationary components are then further
segmented into distinct surfaces by using the motion to assign relative depths to the
surfaces.
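A small Python sketch of the polar mapping about the focus of expansion follows; the coordinate conventions are assumptions for illustration:

import numpy as np

def emp(points, foe):
    # points: N x 2 array of (X, Y) image coordinates; foe: focus of expansion.
    # Purely radial (stationary-world) motion becomes horizontal motion
    # in the returned (r, theta) space.
    d = points - np.asarray(foe, dtype=float)
    r = np.hypot(d[:, 0], d[:, 1])
    theta = np.degrees(np.arctan2(d[:, 1], d[:, 0]))
    return np.column_stack((r, theta))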
Summary
We have briefly surveyed the place of segmentation in vision algorithms as well as
common techniques of measurement-space clustering, single linkage, hybrid linkage,
region growing, spatial clustering, and split and merge used in image segmentation.
The single-linkage region-growing schemes are the simplest and most prone to the
unwanted region merge errors. The hybrid and centroid region-growing schemes are
better in this regard. The split-and-merge technique is not as subject to the unwanted
region merge error. However, it suffers from large memory usage and excessively
blocky region boundaries. The measurement-space-guided spatial clustering tends
to avoid both the region merge errors and the blocky boundary problems because
of its primary reliance on measurement space. But the regions produced are not
smoothly bounded, and they often have holes, giving the effect of salt and pepper
noise. The spatial-clustering schemes may be better in this regard, but they have not been well enough tested. The hybrid-linkage schemes appear to offer the best compromise between having smooth boundaries and few unwanted region merges. When the data form a time sequence of images instead of a single image, motion-based segmentation techniques can be used. All the techniques can be made to be
more powerful if they are based on some kind of statistical test for equality of means
assuming that each region may have some small fraction of outliers and more flexible
if part of a rule-based system.
Not discussed as part of image segmentation is the fact that it might be appropriate for some segments to remain apart or to be merged not on the basis of the
gray level distributions but on the basis of the object sections they represent. The
use of this kind of semantic information in the image segmentation process is essential for the higher-level image understanding work. The work of McKeown, which
is discussed under knowledge-based systems in Chapter 19, describes a system that
uses domain-specific knowledge in this manner.
Exercises
10.1. Write a program to generate controlled images for the purpose of segmentation. One model for the generation of a controlled image is to establish a background gray-level value and then place nonconnecting or noninterfering shapes, such as disks and polygons, on the image, each having a given gray level. Next, additive Gaussian noise can be included with a given standard deviation; this noise can be correlated by preaveraging it with a Gaussian filter with a given standard deviation. Finally, outlier noise can be added by choosing a number of pixels to be affected by the outlier noise from a Poisson distribution, then choosing the location of the pixels to be affected by a uniform distribution over the spatial domain of the image, and then choosing the value of the affected pixels from a uniform distribution over the range of gray levels. Control parameters include contrast between shape and background, area of shape, kind of shape, standard deviation of noise (after any presmoothing), autocorrelation function of noise due to presmoothing, and Poisson density parameter.
10.2. Think about how a figure of merit for a segmentation process can be defined. For example, for the image generated in Exercise 10.1, a 100% correct segmentation can be defined as an image I_c in which each background pixel is labeled 0 and each different disk or polygon created on the synthetic image has all of its pixels labeled with the same label. Any algorithm-produced segmentation I_s can be represented as an image in which each pixel is given a label designating the segment to which it belongs. A figure of merit for the segmentation I_s with respect to the correct segmentation I_c can be created from the contingency table T defined by
CHAPTER

ARC EXTRACTION AND SEGMENTATION

Introduction
on the bottom, so that all rows can be treated alike. The algorithm is expressed in
high-level pseudocode for an NLINES by NPIXELS symbolic image S as follows:
procedure border;
for R:= 1 to NLINES do
begin
for C:= 1 to NPIXELS do
begin
LABEL: = S(R,C);
if new_region(LABEL) then add (CURRENT,LABEL);
NEIGHB: =neighbors(R,C ,LABEL);
T: = pixeltype(R,C ,NEIGHB);
if T == 'border'
then for each pixel N in NEIGHB do
begin
CHAINSET:=chainlist(LABEL);
NEWCHAIN:=true;
for each chain X in CHAINSET while NEWCHAIN do
if N==rear(X)
then begin add(X,(R,C)); NEWCHAIN:= false end
end for
if NEWCHAIN
then make_new_chain(CHAINSET,(R,C),LABEL)
end
end for
end
end for;
for each region REG in CURRENT
if complete(REG)
then begin connect-chains(REG); output(REG); free(REG) end
end for
end
end for
end border;
In this procedure, S is the name of the symbolic image; thus S(R,C) is the value
(LABEL) of the current pixel being scanned. If this is a new label, it is added to the
set CURRENT of current region labels. NEIGHB is the list of neighbors of pixel
(R,C) that have the label LABEL. The function pixeltype looks at the values of
(R,C) and its neighbors to decide whether (R,C) is a nonbackground border pixel.
If so, the procedure searches for a chain of the region with the label LABEL that
has a neighbor of (R,C) at its rear, and if it finds one, it appends (R,C) to the end of
the chain by the procedure add, whose first argument is a chain and whose second
argument is (R,C). If no neighbor of (R,C) is at the rear of a chain of this region,
then a new chain is created containing (R,C) as its only element by the procedure
make_new_chain, whose first argument is the set of chains to which a new chain
is being added. The new chain's sole element is the location (R,C), which is the
second argument of the procedure. Its third argument is the label LABEL to be
associated with the new chain.
After each row R is scanned, the chains of those current regions whose borders
are now complete are merged into a single border chain, which is output. The
hash table entries and list elements associated with those regions are then freed.
Figure 11.1 shows a symbolic image and its output from the border procedure.
Figure 11.2 Symbolic edge image containing a junction of three line segments at pixel (3,3) and a potential corner at pixel (5,3).
As in border tracking, efficient data structure manipulation is needed to manage
the information at each step of the procedure. The data structures used are very
similar to those used in the border algorithm. Instead of past, current, and future
regions, there are past, current, and future segments. Segments are lists of edge
points that represent straight or curved lines on the image. Current segments are
kept in internal memory and accessed by a hash table. Finished segments are written
out to a disk file and their space in the hash table freed. The main difference is the
detection of junction points and the segments entering them from above or the
left and the segments leaving them from below or the right. We will assume an
extended neighborhood operator called pixeltype that determines whether a pixel is an isolated point, the starting point of a new segment, an interior pixel of an old segment, an ending point of an old segment, a junction, or a corner. If the pixel is an interior or ending point of an old segment, the segment ID of the old segment is also returned. If the pixel is a junction or a corner point, then a list of segment IDs of incoming segments and a list of pixels representing outgoing segments are returned.
A procedure for tracking edges in a symbolic image is given below. Figure 11.3
gives the results of its application on the symbolic image of Fig. 11.2.
Segment ID Length List

Figure 11.3 Output of the edge_track procedure on the image of Fig. 11.2, assuming the point (5,3) is judged to be a corner point. If corner points are not used to terminate segments, then segment 3 would have length 5 and list [(3,3)(4,3)(5,3)(5,4)(5,5)].
procedure edge_track;
IDNEW := 0;
for R := 1 to NLINES do
  for C := 1 to NPIXELS do
  begin
    NAME := address(R,C);
    NEIGHB := neighbors(R,C);
    T := pixeltype(R,C,NEIGHB,ID,INLIST,OUTLIST);
    case
      T = isolated point:
        next;
      T = start point of new segment:
        begin
          IDNEW := IDNEW + 1;
          make_new_segment(IDNEW,NAME)
        end;
      T = interior point of old segment:
        add(ID,NAME);
      T = end point of old segment:
        begin
          add(ID,NAME);
          output(ID);
          free(ID)
        end;
      T = junction or corner point:
        for each ID in INLIST do
        begin
          add(ID,NAME);
          output(ID);
          free(ID)
        end;
        for each pixel in OUTLIST do
        begin
          IDNEW := IDNEW + 1;
          make_new_segment(IDNEW,NAME)
        end;
    end case
  end
  end for
end for
end edge_track;
The exact details of keeping track of segment IDs entering and leaving segments
at a junction have been suppressed. This part of the procedure can be very simple
and assume that every pixel adjacent to a junction pixel is part of a different segment.
In this case, if the segments are more than one pixel wide, the algorithm will detect
a large number of small segments that are really not new line segments at all. This
can be avoided by applying the connected shrink operator discussed in Chapter 6 to
the edge image. Another alternative would be to make the pixeltype operator even
smarter. It can look at a larger neighborhood and use heuristics to decide whether
this is just a thick part of the current segment or a new segment is starting. Often
the application will dictate what these heuristics should be.
$$\hat\theta = \begin{cases} \theta & \text{if } |\theta - y| \le |\theta^* - y| \\ \theta^* & \text{otherwise} \end{cases}$$

where

$$\theta^* = \begin{cases} \theta + 360^\circ & \text{if } \theta - y < 0 \\ \theta - 360^\circ & \text{otherwise} \end{cases}$$

If the group has N pixels, mean direction y, and a scatter of S², then the statistical closeness of θ to y can be measured by a t-statistic having N - 1 degrees of freedom,

$$t = \left[\frac{(N-1)N}{N+1}\,\frac{(\hat\theta - y)^2}{S^2}\right]^{1/2}.$$

The given pixel is then added to the group having the smallest t-value, provided that t < T(α, N-1), the α percentage point of the cumulative t-distribution with N - 1 degrees of freedom. If t < T(α, N-1), the mean and scatter of the group are updated:

$$y_{\text{new}} = \frac{N y_{\text{old}} + \hat\theta}{N+1}, \qquad
S^2_{\text{new}} = S^2_{\text{old}} + (\hat\theta - y_{\text{new}})^2 + N(y_{\text{new}} - y_{\text{old}})^2.$$
If there were two or more previously encountered labeled neighbors, then after the pixel is linked into its closest group, a test can be done to determine whether the two groups closest to the given pixel should be merged. This merging test is accomplished by another t-test. Suppose the group means are y1 and y2, the group scatters are S1² and S2², and the number of pixels in each group is N1 and N2, respectively. Define ŷ2 by

$$\hat y_2 = \begin{cases} y_2 & \text{if } |y_2 - y_1| \le |y_2^* - y_1| \\ y_2^* & \text{otherwise} \end{cases}
\qquad\text{where}\qquad
y_2^* = \begin{cases} y_2 + 360^\circ & \text{if } y_2 - y_1 < 0 \\ y_2 - 360^\circ & \text{otherwise} \end{cases}$$

The t-statistic having N1 + N2 - 2 degrees of freedom is defined by

$$t^2 = \frac{(N_1+N_2-2)\,N_1 N_2\,(y_1 - \hat y_2)^2}{(N_1+N_2)\,(S_1^2+S_2^2)}.$$

If t < T(α, N1+N2-2), the two groups are merged, creating a new group having N pixels, scatter S², and mean y, where

$$N = N_1 + N_2, \qquad y = \frac{N_1 y_1 + N_2 \hat y_2}{N}, \qquad
S^2 = S_1^2 + S_2^2 + N_1(y_1 - y)^2 + N_2(\hat y_2 - y)^2.$$
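A compact Python rendering of the wraparound adjustment used in both tests above:

def unwrap_toward(theta, mean):
    # Replace theta by theta +/- 360 degrees when the shifted value is closer
    # to the group mean, so the comparison sees the small angular difference
    # rather than the large one across the 0/360 cut.
    theta_star = theta + 360.0 if theta - mean < 0 else theta - 360.0
    return theta if abs(theta - mean) <= abs(theta_star - mean) else theta_star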
The border-tracking and edge-linking algorithms we have discussed produce extracted digital arcs. An extracted digital arc is a sequence of row-column pairs in which the spatial coordinates of successive row-column pairs are close together. In fact, most often we would expect successive row-column pairs of an extracted digital arc to be digital 4-neighbors or digital 8-neighbors. However, in the development that follows, we need only the assumption of spatial closeness and not the assumption of 4-neighboring or 8-neighboring.
Arc segmentation is a process that partitions an extracted digital arc sequence
into digital arc subsequences having the property that each digital arc subsequence
is a maximal sequence that can fit a straight or curved line of a given type. The
endpoints of the subsequences are called corner points or dominant points. The
basis for the partitioning process is the identification of all locations (a) that have
563
sufficiently high curvature (high change in tangent angle to change in arc length) and
(b) that are enclosed by subsequences that can fit different straight lines or curves of
a given type. Maximal subsequences that can fit a straight line are subsequences for
which some measure of curvature is uniformly small. Maximal subsequences that
can fit a curved-line segment are sequences for which some measure of curvature is
uniformly high. The principal problem that must be handled by any arc segmentation
technique is the determination of the appropriate region of support for any curvature
calculation as well as the handling of spatial quantization and image noise, which
perturbs point location, sometimes systematically.
Next we discuss a variety of techniques for segmenting digital arcs into simple
segments. Simple here means either an arc segment that is a straight-line segment or
one that is a curved-arc segment containing no straight-line segments. The techniques
range from iterative endpoint fitting and splitting to using tangent angle deflection,
prominence, or high curvature as the basis of the segmentation.
Figure 11.4 Geometry of the iterative endpoint fit and split. The pixel having the farthest distance to the line AC is the pixel B. The iterative endpoint fit and split segments the arc sequence at pixel B, creating two arc subsequences, each of which better fits a straight-line segment.
allowable error; open, an input list of the beginning and final indices for each of the segments (the endpoint_fit_and_split procedure will refine the positions determined by open); segmentlist, the list of beginning and final indices for each of the resulting segments; and sflag, which has a value of 0 if endpoint_fit_and_split does not refine the input segmentation and a value of 1 if it does. The function remove removes the first element from the list specified by its argument and returns the first element as its value. The procedure add adds the item specified by its second argument to the end of the list specified by its first argument. The procedure endpoint_line_fit inputs the arc sequence S, the beginning and final indices b and f defining the subsequence of S being fit, a variable emax in which to return the error of fit, and a variable k that marks the index of the point having the maximum distance to the line constructed between the points indexed by b and f.
procedure endpoint_fit_and_split(S, ε, open, segmentlist, sflag);
sflag := 0;
segmentlist := nil;
while open ≠ nil do
begin
  (b, f) := remove(open);
  endpoint_line_fit(S, b, f, emax, k);
  if emax > ε then
  begin
    sflag := 1;
    add(open, (b, k));
    add(open, (k+1, f))
  end
  else add(segmentlist, (b, f))
end
end endpoint_fit_and_split;

procedure endpoint_line_fit(S, b, f, emax, k);
d := sqrt((r_f - r_b)² + (c_f - c_b)²);
α := (c_f - c_b)/d;
β := (r_b - r_f)/d;
γ := (r_f c_b - r_b c_f)/d;
emax := 0;
for j := b to f do
begin
  e := |α r_j + β c_j + γ|;
  if e > emax then
  begin
    emax := e;
    k := j
  end
end
end for
end endpoint_line_fit;
Figure 11.5 Geometry of the exterior angle formed by two line segments.
The exterior angle at (r_n, c_n) for extended neighborhood size k can be measured through the k-cosine,

$$\cos\theta_n(k) = \frac{a_n(k)\cdot b_n(k)}{\|a_n(k)\|\,\|b_n(k)\|}$$

where $a_n(k) = (r_{n-k}-r_n,\ c_{n-k}-c_n)$ and $b_n(k) = (r_{n+k}-r_n,\ c_{n+k}-c_n)$.
Rosenfeld and Johnston (1973) suggest associating with (r_n, c_n) the largest k_n satisfying

$$\cos\theta_n(m) > \cos\theta_n(m-1) > \cdots > \cos\theta_n(k_n) \ge \cos\theta_n(k_n-1),$$

where m is the largest k to be considered. Rosenfeld and Johnston use m = N/10. Care must be used in selecting m, since it must be no larger than the smallest number of points in a line segment of the arc sequence if the computed value cos θ_n(k_n) is to have a proper geometric meaning.
Finally, we must recall that cos θ_n(k_n) measures the exterior angle at (r_n, c_n) only if (r_n, c_n) is a place where two line segments meet. To judge whether (r_n, c_n) is indeed a place where two line segments meet, we can use the fact that at a place where two line segments meet, cos θ_n(k_n) will be smaller (the angle will be larger) than the corresponding cosine values of the successor and predecessor positions. This motivates Rosenfeld and Johnston's criterion of deciding that (r_n, c_n) is a point at which two line segments meet if and only if

$$\cos\theta_n(k_n) \le \cos\theta_i(k_i)$$

for all i satisfying |n - i| ≤ k_n/2.
Davis (1977) discounts any local maximum in any extended neighborhood of size k for which there is some nearby point that has a significant exterior angle in some slightly smaller extended neighborhood. That is, (r_n, c_n) is taken to be the meeting place of two line segments only when its exterior-angle cosine is a sufficiently strong local extremum, where t is a threshold chosen so that points with exterior angles whose cosine is larger than t are considered to be part of straight-line segments, and s(k) represents the maximum expected uncertainty in the position of an angle as a function of k, the size of its extended neighborhood.
Freeman and Davis (1977) and Freeman (1978) measure the prominence of a corner for each point (r_n, c_n) in an arc sequence. A point is a prominent corner point to the extent that:

1. There is a large extended neighborhood preceding (r_n, c_n) that has small curvature;
2. There is a large extended neighborhood succeeding (r_n, c_n) that has small curvature;
3. The extended neighborhood around (r_n, c_n) has large curvature.

They define the "curvature" δ_nk as twice the mean of the two adjacent angular differences of line segments defined by an extended neighborhood size of k.
Shirai (1973) uses the following idea to associate an angle change with each interior point of an arc sequence. Let the digital arc sequence be S = ⟨(r_1,c_1),...,(r_N,c_N)⟩. Let a positive integer m > 1 be given to specify the arc neighborhood size. For each n, m+1 ≤ n ≤ N-m, the angle change δ_n at (r_n, c_n) is defined as the exterior angle between the line segment defined by the points (r_{n-m},c_{n-m}) and (r_n,c_n) and the line segment defined by the points (r_n,c_n) and (r_{n+m},c_{n+m}). This is illustrated in Fig. 11.5. Each digital arc subsequence is then a maximally long subsequence of S in which successive points of the subsequence are successive points of S and where δ_n < δ̄ for each point (r_n,c_n) of the subsequence or where δ_n > δ̄ for each point (r_n,c_n) of the subsequence.
A sequence S = ⟨(r_1,c_1),...,(r_N,c_N)⟩ contains N/m successive subsequences of m points each. Each pair of such successive subsequences has associated with it an angle change, and there are N/m - 1 such angle changes. This motivates defining the average exterior angle change δ̄ as the mean of these N/m - 1 angle changes.
Using Eqs. (11.2) and (11.3) in Eq. (11.4) permits a chord-to-arc distance to
be measured for any arc sequence.
To determine whether a digital arc subsequence is one that fits a straight or a curved line, the following classification scheme can be used. Let chord-to-arc distance thresholds d1 and d2, d1 < d2, be given, and let a central angle threshold θ* be given. Then any arc subsequence for which δ_n < δ̄ or δ_n > δ̄ for every point in the subsequence can be classified as a straight line if (1) d < d1 or (2) d < d2 and θ < θ*. Otherwise it is classified as a curved line (Shirai, 1975).
Instead of using endpoints to define line segments, Pavlidis (1973), Davis (1977), and Anderson and Bezdek (1984) use a least-squares fit to determine a line. Anderson and Bezdek incorporate a least-squares fit of a fixed number m of points to segment an arc in the following way. They look for the first point in the arc sequence for which a least-squares straight-line fit is sufficiently good. This fit constitutes the baseline. Then they locate the following point p that can begin an m-point least-squares fit whose angular orientation is sufficiently different from the orientation of the baseline fit. The next breakpoint is the point q that can be no more than m points following p for which the local tangential deflection is a relative extremum. If there is no local extremum, q is taken m points following p. The local tangential deflection of a point v is measured by the angular orientation difference of the m-point least-squares fits preceding and following v.
Anderson and Bezdek (1984) derive a computationally efficient means to determine the cosine of twice the tangential deflection angle arising from two least-squares line fits. For any digital arc sequence S = ⟨(r_1,c_1),...,(r_N,c_N)⟩, they define a normalized scatter matrix A. Letting A1 and A2 be the normalized scatter matrices for two arc subsequences and Δθ be the tangential deflection angle between the angular orientations of the least-squares fits to the arc subsequences, they derive an expression for cos 2Δθ directly in terms of the entries of A1 and A2.
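The deflection can be computed from second-order moments; the sketch below uses the standard moment formula for the orientation of a total-least-squares line and is in the spirit of, though not necessarily identical to, Anderson and Bezdek's normalized-scatter-matrix derivation (whose exact formula is lost in this copy):

import numpy as np

def fit_angle(pts):
    # pts: N x 2 array of (r, c); returns the orientation of the fitted line
    # from central moments via tan(2*phi) = 2*mu11 / (mu20 - mu02).
    d = pts - pts.mean(axis=0)
    mu20 = (d[:, 0] ** 2).sum()
    mu02 = (d[:, 1] ** 2).sum()
    mu11 = (d[:, 0] * d[:, 1]).sum()
    return 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)

def deflection_deg(pts1, pts2):
    # tangential deflection between the two fitted lines, wrapped to (-180, 180]
    dtheta = fit_angle(pts1) - fit_angle(pts2)
    return np.degrees(np.arctan2(np.sin(dtheta), np.cos(dtheta)))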
from P and Q to x , and keeping track, independently on each side, of those directions that point farthest away from the direction of the line on the opposite side.
We call these directions the boundary directions. Initially these boundary directions
actually point toward each other. As each successive point of the sequence is processed, one or both of these boundary directions change in a way that brings the
directions closer to being parallel. Eventually a point is reached where these boundary directions no longer point toward each other, but away from each other. The
first point at which this happens becomes the final point of the segment. The final
fitted point of the segment is then computed as the intersection point of two lines,
the first being the line passing through the beginning point in a direction of the average of the boundary directions and the second being the line passing through the
final breakpoint of the segment in a direction perpendicular to the average of those
directions that point farthest away from the directions of the line on the opposite
side. This is illustrated in Fig. 11.6.
The procedure segment_grow details the algorithm. Its input arguments are S = ⟨(r_1,c_1),...,(r_N,c_N)⟩, the digital arc sequence, and Δ, the specified bound. It produces segmentlist, which is a list of beginning and final indices for the points in each segment. It calls a function avg_tangent_direction, which determines the average forward-looking tangent direction at a point having index b of arc sequence S. It also calls a function linedir, which, given two points, determines the direction cosines of the line passing through the two points.
repeat
begin
  f := f + 1;
  d := dist((α1,β1) + (u1,v1), (α2,β2) + (u2,v2));
  (g1,h1) := linedir((α1,β1), (r_f,c_f));
  (g2,h2) := linedir((α2,β2), (r_f,c_f));
  if dist((α1,β1) + (g1,h1), (α2,β2) + (u2,v2)) > d
    then (u1,v1) := (g1,h1);
  if dist((α1,β1) + (u1,v1), (α2,β2) + (g2,h2)) > d
    then (u2,v2) := (g2,h2);
end
until f = N or dist((α1,β1) + (u1,v1), (α2,β2) + (u2,v2)) ≥ 2Δ;
push(segmentlist, (b,f));
third argument in the jth position of the list specified by its first argument; j is its
second argument.
procedure move_breakpoints(S, segmentlist, sflag);
sflag := 0;
flag := 1;
L := length(segmentlist);
while flag = 1 do
begin
  flag := 0;
  for j := 1 to L-1 by 2 do
    adjust(S, j, segmentlist, flag);
  end for
  sflag := or(sflag, flag);
  for j := 2 to L-1 by 2 do
    adjust(S, j, segmentlist, flag);
  end for
  sflag := or(sflag, flag);
end
end move_breakpoints;
procedure adjust(S, j, segmentlist, flag);
(b_j, f_j) := get_element(segmentlist, j);
(b_j+1, f_j+1) := get_element(segmentlist, j+1);
e_j := error_norm(S, b_j, f_j);
e_j+1 := error_norm(S, b_j+1, f_j+1);
if e_j > e_j+1 then
begin
  d_j := error_norm(S, b_j, f_j - 1);
  d_j+1 := error_norm(S, b_j+1 - 1, f_j+1);
  if max{d_j, d_j+1} < max{e_j, e_j+1} then
  begin
    f_j := f_j - 1;
    b_j+1 := b_j+1 - 1;
    flag := 1
  end
end
else
begin
  d_j := error_norm(S, b_j, f_j + 1);
  d_j+1 := error_norm(S, b_j+1 + 1, f_j+1);
  if max{d_j, d_j+1} < max{e_j, e_j+1} then
  begin
    f_j := f_j + 1;
    b_j+1 := b_j+1 + 1;
    flag := 1
  end
end;
put_element(segmentlist, j, (b_j, f_j));
put_element(segmentlist, j+1, (b_j+1, f_j+1));
end adjust;
The procedure adjust actually performs the trial shifting of breakpoints. Each call to adjust either moves a breakpoint in a way that reduces the error max{e_j, e_j+1} and leaves all the other errors the same, or does not move a breakpoint and therefore leaves max{e_j, e_j+1} as well as all other errors alone. Therefore after each iteration through the while loop, either some breakpoint has been moved, in which case max{e_1, ..., e_L} is reduced and the while iterates, or no breakpoints have been moved, in which case the while terminates. So as the iteration proceeds, the sequence whose terms are the maximum resulting error constitutes a nonincreasing sequence bounded below by 0. Therefore it must terminate. The termination, however, may be at a local minimum rather than at the global minimum.
      P_i := add(P_i, (r_i, c_i));
    end
  end for
end
until P^(q-1) = P^q;
P := P^(q-1);
(α, β, γ) := (α^(q-1), β^(q-1), γ^(q-1))
end isodata_line_fit;

function index_min_dist((r,c), (α, β, γ));
d := very_large_number;
for k := 1 to K do
begin
  dist := |α(k) r + β(k) c + γ(k)|;
  if dist < d then
  begin
    d := dist;
    index_min_dist := k
  end
end
end for
end index_min_dist;
The isodata segmentation as given in the procedure isodata_line_fit puts together
in the same cluster points from S that may be in two collinear segments. So a
final pass through the ordered points in S must be made to determine the proper
subsequences. If a point belongs to the same cluster as the previous point, then it
is assigned to the same subsequence. But, if it belongs to a different cluster, it is
assigned to a new subsequence.
11.5.7 Curvature
The geometric idea of curvature is simple. For a planar curve, the curvature at a
point on the curve is the limit, as arc length change goes to zero, of the change
in tangent angle divided by the change in arc length. Curvature has been long
established to play an important role in shape perception (Attneave, 1954, 1957).
Places of natural curve breaks are places of curvature maxima and minima. Places
where curvature passes through zero are places where the local shape changes from
convex to concave.
For this discussion we represent the curve parametrically as [r(t), c(t)], a < t < b. For any t, the arc length s(t) going from [r(a),c(a)] to [r(t),c(t)] is given by

$$s(t) = \int_a^t \left[\dot r(u)^2 + \dot c(u)^2\right]^{1/2} du$$

Hence

$$\frac{ds}{dt} = \left[\dot r(t)^2 + \dot c(t)^2\right]^{1/2}$$

The unit length tangent vector T at [r(t),c(t)], whose angle is measured clockwise from the column axis, is given by

$$T(t) = \frac{\bigl(\dot r(t),\,\dot c(t)\bigr)}{\left[\dot r(t)^2 + \dot c(t)^2\right]^{1/2}}$$
Figure 11.7 Geometry of the circular arc segment defined by two points (r_n,c_n) and (r_{n+1},c_{n+1}) and their tangent directions φ_n and φ_{n+1}, respectively.

For a circular arc of radius R centered at (a, b),

$$r_n = a + R\sin t_n, \qquad c_n = b + R\cos t_n,$$
$$r_{n+1} = a + R\sin t_{n+1}, \qquad c_{n+1} = b + R\cos t_{n+1},$$

and the tangent directions satisfy

$$\sin\phi_n = \cos t_n, \quad \cos\phi_n = -\sin t_n, \qquad
\sin\phi_{n+1} = \cos t_{n+1}, \quad \cos\phi_{n+1} = -\sin t_{n+1}.$$

From these relations, c_{n+1} = c_n + R(sin φ_{n+1} - sin φ_n), so that

$$R = \frac{c_{n+1} - c_n}{\sin\phi_{n+1} - \sin\phi_n} \tag{11.7}$$
Hough Transform
The Hough transform (Hough, 1962; Duda and Hart, 1972) is a method for detecting straight lines and curves on gray level images. The method is given the
family of curves being sought and produces the set of curves from that family that
appears on the image. Stockman and Agrawala (1977) were the first to realize that
the Hough transform is template matching. Rosenfeld (1969) describes an implementation that is almost always more efficient than the original Hough formulation.
Here we describe the Hough transform technique and show how to apply it to finding
straight-line segments and circular arcs in images. We will emphasize the O'Gorman
and Clowes (1976) formulation. We then present a Bayesian approach to the Hough
transform.
The equation y = mx + b for straight lines does not work for vertical lines. The equation d = x cos θ + y sin θ, where d is the perpendicular distance from the line to
the origin and θ is the angle the perpendicular makes with the x-axis, was suggested in Duda and Hart (1972) and used by O'Gorman and Clowes (1976). We will use this form of the equation but convert to row (r) and column (c) coordinates. Thus our equation becomes

$$d = c\cos\theta + r\sin\theta$$
where d is the perpendicular distance from the line to the origin of the image
(assumed to be at upper left), and θ is the angle this perpendicular makes with
the c (column) axis. Figure 11.8 illustrates the parameters of the line segment.
The accumulator A has subscripts that represent quantized values of d and θ. O'Gorman and Clowes quantized the values of d in increments of 3 and θ in increments of 10° in their experiments on gray level images of puppet objects. An accumulator array quantized in this fashion is illustrated in Fig. 11.9. The O'Gorman and
Clowes algorithm for filling the accumulator A and parallel list array PTLIST
can be stated as follows:
procedure accumulate;
A := 0;
PTLIST := NIL;
for R := 1 to NLINES do
for C := 1 to NPIXELS do
begin
DR := row-gradient(R,C) ;
DC := col-gradient(R,C) ;
GMAG := gradient(DR,DC);
if GMAG > gradient-threshold
then begin
THETA := atan2(DR,DC);
THETAQ := quantizeangle(THETA);
D := C*cos(THETAQ) + R*sin(THETAQ);
DQ := quantize-distance(D);
A(DQ,THETAQ) := A(DQ,THETAQ)+GMAG;
PTLIST(DQ,THETAQ) := append(PTLIST(DQ,THETAQ),(R,C))
end
end
end for
end for
end accumulate;
The algorithm is expressed in row-column space to be consistent with the other algorithms in the book. The functions row_gradient and col_gradient are neighborhood functions that estimate the row and column components of the gradient, and the function gradient combines the two to get the magnitude. The function atan2 is the standard scientific library function that returns the angle in the correct quadrant given the row and column components of the gradient. We assume here that atan2 returns a value between 0° and 359°. Many implementations return the angle in radians, which would have to be converted to degrees. The actions of the procedure are illustrated in Fig. 11.10. Notice that with a 3 × 3 gradient operator, the lines are two pixels wide. Notice also that counts appear in accumulators other than the two correct ones.
Procedure accumulate is O'Gorman and Clowes's version of the Hough method and pretty much follows the Hough theory. Once the accumulator and list arrays
are filled, though, there is no standard method for extracting the line segments.
O'Gorman and Clowes presented an ad hoc procedure that illustrates some of the
problems that come up in this phase of the line-segment extraction process. This
procedure can be expressed as follows:
procedure find_lines;
V := pick_greatest_bin(A, DQ, THETAQ);
while V > value_threshold do
begin
  list_of_points := reorder(PTLIST(DQ, THETAQ));
  for each point (R,C) in list_of_points do
  begin
    for each neighbor (R1,C1) of (R,C) do
    begin
      DQ1 := D(R1,C1);
      THETAQ1 := THETA(R1,C1);
      if (DQ1,THETAQ1) ≠ (DQ,THETAQ) and A(DQ1,THETAQ1) > 0
      then begin
        merge(PTLIST(DQ,THETAQ), PTLIST(DQ1,THETAQ1));
        set_to_zero(A, DQ1, THETAQ1)
      end
    end
  end
  end for
  create_segments(PTLIST(DQ,THETAQ));
  set_to_zero(A, DQ, THETAQ);
  V := pick_greatest_bin(A, DQ, THETAQ)
end
end while
end find_lines;

Figure 11.10 Results of the operation of procedure accumulate on a simple gray level image.
The function pick-greatest-bin returns the value in the largest accumulator bin while setting its last two parameters, DQ and THETAQ, to the quantized d and θ values for that bin. The reorder function orders the list of points in a bin by column coordinate for θ < 45° or θ > 135° and by row coordinate for 45° ≤ θ ≤ 135°. The arrays D and THETA are expected to hold the quantized D and THETA values for a pixel that were computed during the accumulation. Similarly, the array GRADIENT is expected to contain the computed gradient magnitude. These can be saved as intermediate images. The merge procedure merges the list of points from a neighbor of a pixel with the list of points for that pixel, keeping the spatial ordering. The set-to-zero procedure zeroes out an accumulator so that it will not be reused. Finally, the procedure create-segments goes through the final ordered set of points, searching for gaps longer than one pixel. It creates and saves a set of line segments terminating at gaps. O'Gorman and Clowes use a least-squares procedure to fit lists of points to line segments.
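The peak-extraction loop can also be sketched in runnable form. Rather than merging neighboring PTLIST bins as find-lines does, this simplified version just records each peak and zeroes a small accumulator neighborhood around it; the neighborhood size is an assumption of the sketch, not part of the original procedure.

import numpy as np

def pick_peaks(A, value_threshold, nbhd=1):
    """Simplified peak extraction from a Hough accumulator.

    Repeatedly picks the greatest bin, records it, and zeroes the bins
    in a (2*nbhd+1)^2 neighborhood so the same line is not reported twice.
    """
    A = A.copy()
    peaks = []
    while True:
        dq, thetaq = np.unravel_index(np.argmax(A), A.shape)
        if A[dq, thetaq] <= value_threshold:
            break
        peaks.append((dq, thetaq, A[dq, thetaq]))
        r0, r1 = max(dq - nbhd, 0), min(dq + nbhd + 1, A.shape[0])
        c0, c1 = max(thetaq - nbhd, 0), min(thetaq + nbhd + 1, A.shape[1])
        A[r0:r1, c0:c1] = 0.0          # set-to-zero over the neighborhood
    return peaks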
Finding Circles

The Hough transform technique can be extended to circles and other parametrized curves. Kimme et al. (1975) developed a program for finding circles in chest x-rays. Like O'Gorman and Clowes, they use a gradient technique to reduce the dimension of the parameter space to be searched. The standard equation of a circle has three parameters. If (r,c) lies on a circle, then the gradient at (r,c) points to the center of that circle, as shown in Fig. 11.11. So if a point (r,c) is given, a radius d is selected, and the direction of the vector from (r,c) to the center is computed, then the coordinates of the center can be found.

Figure 11.11 Direction of the gradient at the boundary points of a circle. The inward-pointing gradients are the ones that will accumulate evidence for the center of the circle.

The radius d, the row coordinate of the center r_c, and the column coordinate of the center c_c are the three parameters used to vote for circles in the Hough algorithm. With the gradient angle θ at (r,c) defined as before, circles are represented by the equations

r_c = r + d sin θ
c_c = c + d cos θ

so that each pixel with a strong enough gradient votes for one candidate center at each selected radius d.
The accumulation procedure for circles can easily be modified to take into account the gradient magnitude, like the procedure for line segments.
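A minimal sketch of this gradient-directed circle accumulation follows. It assumes the edge pixels and their gradient angles have already been computed (for instance by the accumulate machinery above) and that candidate radii are supplied; those interfaces are conveniences of the sketch, not Kimme et al.'s code.

import numpy as np

def circle_hough(edges, thetas, radii, shape):
    """Gradient-directed circle accumulation in the spirit of Kimme et al.

    edges  : list of (r, c) edge pixels
    thetas : gradient angle (radians) at each edge pixel, pointing
             toward the circle center per the convention above
    radii  : candidate radii d to try
    shape  : (nrows, ncols) of the image
    """
    nrows, ncols = shape
    A = np.zeros((len(radii), nrows, ncols))
    for (r, c), th in zip(edges, thetas):
        for k, d in enumerate(radii):
            rc = int(round(r + d * np.sin(th)))   # candidate center row
            cc = int(round(c + d * np.cos(th)))   # candidate center col
            if 0 <= rc < nrows and 0 <= cc < ncols:
                A[k, rc, cc] += 1
    return A  # local maxima of A are (radius, center) hypotheses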
Extensions

The Hough transform method can be extended to any curve with an analytic equation of the form f(x, a) = 0, where x denotes an image point and a is a vector of parameters. The procedure as expressed by Ballard and Brown (1982) is:

1. Initialize the accumulator array A(a) to zero.
2. For each edge pixel x, compute all a such that f(x, a) = 0 and set A(a) := A(a) + 1.
3. Local maxima in A correspond to curves of f in the image.

If there are m parameters in a, each having M discrete values, then the time complexity is O(M^(m-1)).
The Hough transform method has been further generalized to arbitrary shapes
specified by a sequence of boundary points (Ballard, 1981a). This is known as the
generalized Hough transform.
Variations

A number of hybrid techniques exist that use some of the principles of the Hough transform. The Burns line finder (Burns, Hanson, and Riseman, 1986) was developed to find straight lines in complex images of outdoor scenes. The Burns method can be summarized as follows:

1. Compute the gradient magnitude and direction at each pixel.
2. For points with high enough gradient magnitude, assign two labels representing two different quantizations of the gradient direction. (For example, for eight bins, if the first quantization is 0° to 44°, 45° to 89°, 90° to 134°, etc., then the second can be -22° to 22°, 23° to 67°, 68° to 112°, etc.) The result is two symbolic images.
3. Find the connected components of each symbolic image and compute the line length for each component.
4. Each pixel is a member of two components, one from each symbolic image. Each pixel votes for its longer component. Each component receives a count of the pixels that voted for it. The components (line segments) that receive the majority support are selected.
The Burns line finder takes advantage of two powerful algorithms: the Hough transform and the connected components algorithm. It attempts to get rid of the quantization problems of the standard Hough transform by letting each pixel vote for the component that best explains it.

The Bayesian Approach

Let I denote the binary edge image, and let E(θ, d) denote the set of pixels lying within a distance δ of the line with parameters (θ, d), where δ is some fixed number related to the resolution cell size. We refer to such a line segment as one with parameters (θ, d).

The Bayesian approach to the Hough transform computes for each quantized value of θ and d the conditional probability

P[a line segment exists with parameters (θ, d) | I(r, c) : (r, c) ∈ E(θ, d)].
By Bayes' rule this is

P[θ, d | I(r, c) : (r, c) ∈ E(θ, d)]
  = P[I(r, c) : (r, c) ∈ E(θ, d) | θ, d] P[a line segment exists with parameters (θ, d)]
    / P[I(r, c) : (r, c) ∈ E(θ, d)]    (11.8)

where we henceforth denote the conditional probability P[a line segment exists with parameters (θ, d) | I(r, c) : (r, c) ∈ E(θ, d)] by P[θ, d | I(r, c) : (r, c) ∈ E(θ, d)]. We assume that the observations {I(r, c) : (r, c) ∈ E(θ, d)} are independent, conditioned on the line parameters. That is,

P[I(r, c) : (r, c) ∈ E(θ, d) | θ, d] = ∏_(r,c)∈E(θ,d) P[I(r, c) | θ, d]    (11.9)

Then, writing the denominator of Eq. (11.8) in terms of the no-line distribution,

P[θ, d | I(r, c) : (r, c) ∈ E(θ, d)]
  = [ ∏_(r,c)∈E(θ,d) P[I(r, c) | θ, d] P(θ, d) / ∏_(r,c)∈E(θ,d) P[I(r, c) | no line] ]
    [ ∏_(r,c)∈E(θ,d) P[I(r, c) | no line] / P[I(r, c) : (r, c) ∈ E(θ, d)] ]    (11.11)

But the conditional distribution of the observations, given that there is no line, is, within a fairly good approximation, the unconditional distribution of the observations, because Prob(line) << Prob(no line). That is, the right bracketed term of Eq. (11.11) is approximately one. Hence Eq. (11.11) simplifies to

P[θ, d | I(r, c) : (r, c) ∈ E(θ, d)] ≈ ∏_(r,c)∈E(θ,d) P[I(r, c) | θ, d] P(θ, d) / ∏_(r,c)∈E(θ,d) P[I(r, c) | no line]    (11.13)
If we take

P[I(r, c) | θ, d] = { q(θ, d)      if I(r, c) = 1 and (r, c) ∈ E(θ, d)
                    { 1 − q(θ, d)  if I(r, c) = 0 and (r, c) ∈ E(θ, d)

and

P[I(r, c) | no line] = { w      if I(r, c) = 1
                       { 1 − w  if I(r, c) = 0

where q(θ, d) is a specified parameter function related to the edge operator employed, then the Hough transform H(θ, d), defined as the log of Eq. (11.13), takes the form

H(θ, d) = log P(θ, d) + Σ_(r,c)∈E(θ,d), I(r,c)=1 log q(θ, d) + Σ_(r,c)∈E(θ,d), I(r,c)=0 log [1 − q(θ, d)]
          − Σ_(r,c)∈E(θ,d), I(r,c)=1 log w − Σ_(r,c)∈E(θ,d), I(r,c)=0 log (1 − w)

        = log P(θ, d) + #{(r, c) ∈ E(θ, d) | I(r, c) = 1} log [q(θ, d)/w]
          + #{(r, c) ∈ E(θ, d) | I(r, c) = 0} log [(1 − q(θ, d))/(1 − w)]    (11.16)
To understand the relationship between Eq. (11.16) and the Hough transform as we have defined it, we rewrite Eq. (11.13) as

H(θ, d) = Σ_(r,c)∈E(θ,d) log ( P[I(r, c) | θ, d] / P[I(r, c) | no line] ) + log P(θ, d)    (11.17)

in which case we see that what should be summed is log likelihoods and not gradient strength. If a pixel (r, c) in E(θ, d) has an angle T(r, c) very much different from θ, then the log likelihood will be small. The closer T(r, c) is to θ, the larger the log likelihood will be. In the O'Gorman and Clowes form, Eq. (11.16), the pixels in the summation are restricted to only those in E(θ, d) for which T(r, c) = θ and for which I(r, c) = 1. In Eq. (11.17) we find that all pixels in E(θ, d) must be in the summation.
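Since Eq. (11.16) reduces to counting edge and nonedge pixels in E(θ, d), its evaluation is a one-liner. The sketch below assumes the counts, q, w, and the prior have already been computed for the bin.

import numpy as np

def bayes_hough_score(n_edge, n_nonedge, q, w, log_prior=0.0):
    """Log-posterior Hough score of Eq. (11.16) for one (theta, d) bin.

    n_edge    : #{(r,c) in E(theta,d) with I(r,c) = 1}
    n_nonedge : #{(r,c) in E(theta,d) with I(r,c) = 0}
    q         : P[I = 1 | line with parameters (theta, d)]
    w         : P[I = 1 | no line]
    log_prior : log P(theta, d)
    """
    return (log_prior
            + n_edge * np.log(q / w)
            + n_nonedge * np.log((1.0 - q) / (1.0 - w)))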
Line Fitting

Here we give a procedure for the least-squares fitting of a line to observed noisy values. We derive expressions for the estimated parameters of the fitted line and their variances, as well as a variance expression for the orientation of the line. Because we need to differentiate between the observed noisy values and the values before random-noise perturbation, we adopt a notational change in this and the following sections. We denote the noisy observed values by (r̂_n, ĉ_n) and the unknown values before noise perturbation by (r_n, c_n).

Suppose that noisy observations (r̂_n, ĉ_n) of points (r_n, c_n) that lie on a line αr_n + βc_n + γ = 0 are given. Our model for (r̂_n, ĉ_n) is

r̂_n = r_n + ξ_n
ĉ_n = c_n + η_n    (11.18)
where we assume that the random variables ξ_n and η_n are independent and identically distributed, having mean 0 and variance σ², and that they come from a distribution that is an even function. Hence

E[ξ_n] = E[η_n] = 0,  E[ξ_n η_m] = 0,  and  E[ξ_n ξ_m] = E[η_n η_m] = { σ²  if n = m
                                                                      { 0   otherwise

Dorff and Gurland (1961) use a similar model for the noise, but they use the representation c = mr + b instead of the more general representation αr + βc + γ = 0.

By way of notation we define the moments

μ_r = (1/N) Σ_{n=1}^N r_n,   μ_c = (1/N) Σ_{n=1}^N c_n,
μ_rr = (1/(N−1)) Σ_{n=1}^N (r_n − μ_r)²,   μ_rc = (1/(N−1)) Σ_{n=1}^N (r_n − μ_r)(c_n − μ_c),
μ_cc = (1/(N−1)) Σ_{n=1}^N (c_n − μ_c)²,

which directly relate to the unknown parameters α, β, and γ of the line on which the points (r_n, c_n) lie. It is easy to determine that

αμ_r + βμ_c + γ = 0   and   α²μ_rr + 2αβμ_rc + β²μ_cc = 0.
Now from the noisy observations (r̂_n, ĉ_n) we must estimate the parameters of the unknown line. To do this we employ the principle of minimizing the squared residuals under the constraint that α̂² + β̂² = 1. Using the Lagrange multiplier form, we define

ε² = Σ_{n=1}^N (α̂r̂_n + β̂ĉ_n + γ̂)² − λ̂(α̂² + β̂² − 1)

Upon taking the partial derivative of ε² with respect to γ̂ and setting the partial derivative to zero, we have

2 Σ_{n=1}^N (α̂r̂_n + β̂ĉ_n + γ̂) = 0

Letting

μ̂_r = (1/N) Σ_{n=1}^N r̂_n   and   μ̂_c = (1/N) Σ_{n=1}^N ĉ_n

we can obtain

γ̂ = −(α̂μ̂_r + β̂μ̂_c)

Hence

ε² = Σ_{n=1}^N [α̂(r̂_n − μ̂_r) + β̂(ĉ_n − μ̂_c)]² − λ̂(α̂² + β̂² − 1)

Letting

μ̂_rr = (1/(N−1)) Σ_{n=1}^N (r̂_n − μ̂_r)²,   μ̂_rc = (1/(N−1)) Σ_{n=1}^N (r̂_n − μ̂_r)(ĉ_n − μ̂_c),
μ̂_cc = (1/(N−1)) Σ_{n=1}^N (ĉ_n − μ̂_c)²,

and setting the partial derivatives of ε² with respect to α̂ and β̂ to zero yields the eigenproblem

( μ̂_rr  μ̂_rc ; μ̂_rc  μ̂_cc ) (α̂ ; β̂) = λ̂ (α̂ ; β̂),

where the Lagrange multiplier has been rescaled by N − 1. So the sought-after (α̂, β̂) is an eigenvector of the sample scatter matrix, and λ̂ satisfies

| μ̂_rr − λ̂   μ̂_rc ; μ̂_rc   μ̂_cc − λ̂ | = 0

Therefore

λ̂ = [ (μ̂_rr + μ̂_cc) ± sqrt( (μ̂_rr − μ̂_cc)² + 4μ̂_rc² ) ] / 2

The smaller eigenvalue corresponds to the minus sign. With λ̂ determined, the corresponding unit-length eigenvector can be determined:

(α̂ ; β̂) = [ 1 / sqrt( μ̂_rc² + (λ̂ − μ̂_rr)² ) ] ( μ̂_rc ; λ̂ − μ̂_rr )
To find the expected value and variance of α̂, we need a way by which the expected values and variances of μ̂_rc and λ̂ − μ̂_rr can be related to α̂. We expand α̂ around the point (μ_rc, 0, μ_rr) in a first-order expansion, using the fact that sign(μ̂_rc) = sign(μ_rc), which is true under our model, and

α = μ_rc / sqrt(μ_rc² + μ_rr²)   and   β = −μ_rr / sqrt(μ_rc² + μ_rr²).

There results

α̂ = α + [1/(μ_rr + μ_cc)] [ (μ̂_rc − μ_rc)(−β) + (λ̂ − μ̂_rr + μ_rr) α ]    (11.26)

Then to determine E[α̂] we simply take expectations on both sides of Eq. (11.26):

E[α̂] = α + [1/(μ_rr + μ_cc)] [ −E(μ̂_rc − μ_rc) β + E(λ̂ − μ̂_rr + μ_rr) α ]

and

V[α̂] = E[(α̂ − α)²] = [1/(μ_rr + μ_cc)²] E{ [ −(μ̂_rc − μ_rc)β + (λ̂ − μ̂_rr + μ_rr)α ]² }
Hence, carrying out the same expansion for β̂, we obtain

β̂ = β + [1/(μ_rr + μ_cc)] [ (μ̂_rc − μ_rc)(−α) + (λ̂ − μ̂_cc + μ_cc) β ]    (11.29)

from which, after some algebra,

E{ [ (μ̂_rc − μ_rc)(−α) + (λ̂ − μ̂_cc + μ_cc)β ]² } = 4α²β² σ²(μ_rr + μ_cc + σ²) / (N − 1)

Therefore

V[β̂] = 4α²β² σ²(μ_rr + μ_cc + σ²) / [ (N − 1)(μ_rr + μ_cc)² ]    (11.30)

From Eqs. (11.26) and (11.30) we can determine the covariance between α̂ and β̂.
Hence the covariance Cov(α̂, β̂) can be written in terms of the same moments. Recall that

γ̂ = −(α̂μ̂_r + β̂μ̂_c)

and

V[γ̂] = E[(γ̂ − γ)²] = E{ [ −α̂(μ_r + ξ̄) − β̂(μ_c + η̄) + αμ_r + βμ_c ]² }

where ξ̄ = (1/N) Σ_{n=1}^N ξ_n and η̄ = (1/N) Σ_{n=1}^N η_n, and ξ_n and η_n constitute the random noise as given in Eq. (11.18). After some simplification and rearrangement,

V[γ̂] = μ_r² V[α̂] + μ_c² V[β̂] + 2μ_r μ_c Cov(α̂, β̂) + (σ²/N) ( V[α̂] + V[β̂] + 1 )    (11.32)
Similarly, for the orientation θ̂ of the fitted line we can evaluate E[(θ̂ − θ)²]. Since for small angular deviations (cos θ̂ − cos θ)² + (sin θ̂ − sin θ)² ≈ (θ̂ − θ)², we have

E[(θ̂ − θ)²] ≈ E[(cos θ̂ − cos θ)²] + E[(sin θ̂ − sin θ)²] = E[(α̂ − α)²] + E[(β̂ − β)²]
            = σ²(μ_rr + μ_cc + σ²) / [ (N − 1)(μ_rr + μ_cc)² ]    (11.35)
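The whole estimator reduces to a 2 × 2 eigenvector problem, as the following NumPy sketch shows. Estimating σ² from the fit residuals with N − 2 degrees of freedom is a choice made for this sketch; the text treats σ² as a property of the noise model.

import numpy as np

def fit_line(r, c):
    """Least-squares line fit alpha*r + beta*c + gamma = 0.

    (alpha, beta) is the unit eigenvector of the sample scatter matrix
    belonging to its smaller eigenvalue; also returns the angle variance
    of Eq. (11.35).  Requires N >= 3 points.
    """
    r, c = np.asarray(r, float), np.asarray(c, float)
    N = len(r)
    mr, mc = r.mean(), c.mean()
    mu_rr = ((r - mr) ** 2).sum() / (N - 1)
    mu_rc = ((r - mr) * (c - mc)).sum() / (N - 1)
    mu_cc = ((c - mc) ** 2).sum() / (N - 1)
    S = np.array([[mu_rr, mu_rc], [mu_rc, mu_cc]])
    evals, evecs = np.linalg.eigh(S)       # eigenvalues in ascending order
    alpha, beta = evecs[:, 0]              # smaller-eigenvalue eigenvector
    gamma = -(alpha * mr + beta * mc)
    resid = alpha * r + beta * c + gamma
    sigma2 = (resid ** 2).sum() / (N - 2)  # residual-based noise estimate
    var_theta = (sigma2 * (mu_rr + mu_cc + sigma2)
                 / ((N - 1) * (mu_rr + mu_cc) ** 2))
    return (alpha, beta, gamma), var_theta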
Principal-Axis Curve Fit

Consider now fitting a curve of the form

f(r, c) = Σ_{k=1}^K a_k f_k(r, c) = 0,

where the functions f_k(r, c), k = 1, ..., K are given and the unknown parameters a_1, ..., a_K must be estimated. If the observed noisy points are (r̂_n, ĉ_n), n = 1, ..., N, a fit that minimizes Σ_{n=1}^N f(r_n, c_n)² can be determined by the principal-axis technique. The objective function to be minimized is

ε² = Σ_{n=1}^N f(r̂_n, ĉ_n)² = a'FF'a

where

F = ( f_1(r̂_1, ĉ_1)  f_1(r̂_2, ĉ_2)  ...  f_1(r̂_N, ĉ_N)
      ...
      f_K(r̂_1, ĉ_1)  f_K(r̂_2, ĉ_2)  ...  f_K(r̂_N, ĉ_N) )

and a = (a_1, ..., a_K)'. This means that a must be an eigenvector of FF'. To see which one, note that if a is a unit eigenvector of FF' with eigenvalue λ, then ε² = a'FF'a = λ; hence the eigenvector having the smallest eigenvalue minimizes ε² subject to a_1² + ... + a_K² = 1. However, as
we shall see in the next section, the objective function that the original technique or Bookstein's modified technique minimizes is not the most appropriate one. Indeed, experience with the principal-axis technique has shown that the fit it provides to conics is often too eccentric, because the technique, in effect, gives too much influence to outlier points.
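A sketch of the principal-axis technique: build F from the basis functions, then take the eigenvector of FF' with the smallest eigenvalue. The conic basis shown at the end is one common choice, not the only one.

import numpy as np

def principal_axis_fit(basis_fns, r, c):
    """Principal-axis curve fit: minimize a'FF'a subject to ||a|| = 1.

    basis_fns : list of functions f_k(r, c)
    Returns the coefficient vector a, the eigenvector of F F' with the
    smallest eigenvalue.
    """
    F = np.array([[fk(rn, cn) for rn, cn in zip(r, c)] for fk in basis_fns])
    evals, evecs = np.linalg.eigh(F @ F.T)  # symmetric, ascending order
    return evecs[:, 0]                      # smallest-eigenvalue eigenvector

# e.g., a general conic: f(r,c) = a1 r^2 + a2 rc + a3 c^2 + a4 r + a5 c + a6
conic_basis = [lambda r, c: r * r, lambda r, c: r * c, lambda r, c: c * c,
               lambda r, c: r, lambda r, c: c, lambda r, c: 1.0]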
Region-of-Support Determination

Teh and Chin (1989) argue that the determination of the region of support for the calculation of the straight-line fits that are associated with tangential angle deflection or curvature is of primary importance. Techniques that use a fixed or nonadaptive region of support easily fail on curves that have segments of widely varying lengths. If the region of support is too large, fine features will be smoothed out. If the region of support is too small, many corner points or dominant points can be produced. Davis (1977), for example, gives instances where this happens.

Teh and Chin suggest that the appropriate region of support for a corner point is the largest symmetrically placed subsequence for which the chord joining the subsequence endpoints has maximal length. To prevent every point of a digital circle from being assigned a corner point, they add a second, alternative condition requiring that the ratio of the perpendicular distance between the point and the chord joining the subsequence endpoints to the chord length be maximal. Formally, the region of support D_n for an observed point (r̂_n, ĉ_n) is given by

D_n = { (r̂_{n+j}, ĉ_{n+j}) : j = −k, ..., k },

where k is chosen by the above conditions using the chord length

l_nk = sqrt( (r̂_{n−k} − r̂_{n+k})² + (ĉ_{n−k} − ĉ_{n+k})² )

and the perpendicular distance d_nk from (r̂_n, ĉ_n) to that chord.

Teh and Chin (1988, 1989) give some experimental results demonstrating the efficacy of this way of determining the region of support, and they compare their results against some of the other techniques.
Another way to determine the region of support can be obtained from Eq. (11.35), the variance of the angle of the fitted line. At point (r̂_n, ĉ_n) of the arc sequence, let V(n−k, n) denote the variance of the angle of the line fitted to the points (r̂_{n−k}, ĉ_{n−k}), ..., (r̂_n, ĉ_n). Likewise the angle of the line fitted to the noisy observed points (r̂_{n+1}, ĉ_{n+1}), ..., (r̂_{n+1+k}, ĉ_{n+1+k}) has variance V(n+1, n+1+k). The total variance around the endpoints (r̂_n, ĉ_n) and (r̂_{n+1}, ĉ_{n+1}) is then V(n−k, n) + V(n+1, n+1+k). So long as the observed points (r̂_{n−k}, ĉ_{n−k}), ..., (r̂_n, ĉ_n) are noisy observations of points lying on the same line, and the observed points (r̂_{n+1}, ĉ_{n+1}), ..., (r̂_{n+1+k}, ĉ_{n+1+k}) are noisy observations of points lying on the same line, V(n−k, n) + V(n+1, n+1+k) will be decreasing as k increases. However, as soon as a point that is not a noisy observation of a point lying on the predecessor or successor line is included in either the predecessor sequence or the successor sequence, V(n−k, n) + V(n+1, n+1+k) increases. This motivates defining the region of support around the right endpoint (r̂_n, ĉ_n) by

{ n−k, ..., n | V(n−1, n) + V(n+1, n+2) ≥ ... ≥ V(n−k, n) + V(n+1, n+1+k)
                < V(n−k−1, n) + V(n+1, n+2+k) }

that is, k is the last index at which the total variance is still decreasing; the region of support around the left endpoint (r̂_{n+1}, ĉ_{n+1}) is defined by the mirror-image condition.
A corner point can then be detected as the index n minimizing the total variance v(n) = V(k_0, n) + V(n, N − k_0 + 1), where k_0 is an index offset from the leftmost endpoint of the left segment, and N is the index of the rightmost endpoint of the right segment. However, if the minimizing n is not within N/2 of the middle of the sequence, there will be a bias to the result. The bias can be removed by limiting the segment with the greater number of points to be no more than three times the number of points in the smaller segment.
EXAMPLE 11.1

Table 11.1 lists a sequence of 24 row-column pairs of points on two noisy line segments meeting at a corner point of about 105°. The digital curve defined by this sequence is shown in Fig. 11.12(a). Shown in Fig. 11.12(b) is the log of the variance v as a function of point index n, where v(n) = V(k_0, n) + V(n, N − k_0 + 1), k_0 = 4, and N = 24. Notice the sharp minimum that occurs at n = 14, the corner point.

Figure 11.12 (a) Twenty-four-point digital curve. (b) Log of the variance of the digital curve as a function of n, where v(n) = V(k_0, n) + V(n, N − k_0 + 1), k_0 = 4, and N = 24.
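The corner detector of the example can be sketched as follows. For simplicity this version fits the full prefix and suffix subsequences at each candidate n rather than windows anchored at fixed offsets from the ends; that simplification, and the residual-based σ² estimate, are assumptions of the sketch.

import numpy as np

def angle_variance(r, c):
    """Angle variance of the fitted line, per Eq. (11.35); sigma^2 is
    estimated from the residuals (a choice of this sketch)."""
    r, c = np.asarray(r, float), np.asarray(c, float)
    N = len(r)
    mr, mc = r.mean(), c.mean()
    S = np.cov(np.vstack([r, c]))          # 2x2 scatter matrix, ddof=1
    mu_rr, mu_cc = S[0, 0], S[1, 1]
    evals, evecs = np.linalg.eigh(S)
    alpha, beta = evecs[:, 0]
    resid = alpha * (r - mr) + beta * (c - mc)
    sigma2 = (resid ** 2).sum() / max(N - 2, 1)
    return sigma2 * (mu_rr + mu_cc + sigma2) / ((N - 1) * (mu_rr + mu_cc) ** 2)

def corner_index(r, c, k0=4):
    """Index n minimizing the total variance of the two line fits
    around n, as in Example 11.1 (0-based indices)."""
    N = len(r)
    best_n, best_v = None, np.inf
    for n in range(k0, N - k0):
        v = (angle_variance(r[: n + 1], c[: n + 1])
             + angle_variance(r[n:], c[n:]))
        if v < best_v:
            best_n, best_v = n, v
    return best_n, best_v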
Let the fit constraints be written in matrix form as Ap = ε, where ε is a fitting-error vector. Then, in the weighted least-squares sense, one way to formulate the problem is to find a vector p that minimizes the total weighted fitting error ε'Pε subject to the condition that ||p|| = 1, where the weight matrix P is a diagonal matrix.
Write P_k = W_k'W_k and let the singular-value decomposition of W_kA be

W_kA = USV'

where S = diag(s_1, s_2, s_3). In the singular-value decomposition both U and V are orthonormal matrices. Without loss of generality we may assume that s_1 ≥ s_2 ≥ s_3. Then the total weighted fitting error becomes

ε'P_kε = p'A'P_kAp = p'(USV')'(USV')p = p'VS²V'p

This error has minimum value s_3², obtained by taking p = v_3, the third column of V. For p = v_3 the weighted fitting error can be expressed as

ε'P_kε = s_3²
Now let U_2 be the N × (N − 2) matrix that consists of the columns 3 through N of the matrix U, that is, U_2 = (u_3, ..., u_N). Define the redundancy matrix R = (r_ij) = U_2U_2'. Then, letting e_i denote the ith residual, the diagonal weights of P can be chosen to suppress outliers by

w_i = { 1  if |e_i| ≤ c m̃
      { 0  otherwise

where c is a constant and m̃ is the median of |e_i|.
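In NumPy the weighted minimum-norm fit is a single SVD, assuming more points than parameters:

import numpy as np

def weighted_min_norm_fit(A, weights):
    """Find unit p minimizing ||W A p||^2, W = diag(sqrt(weights)).

    p is the right singular vector of W A belonging to the smallest
    singular value; the minimum error is the square of that value.
    Assumes A has at least as many rows as columns.
    """
    W = np.sqrt(np.asarray(weights, float))[:, None]
    U, s, Vt = np.linalg.svd(W * np.asarray(A, float), full_matrices=False)
    return Vt[-1], s[-1] ** 2              # p = v_3 in the text's notation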
Least-Squares Curve Fitting

Suppose that a set of row-column positions {(r̂_n, ĉ_n)}, n = 1, ..., N, has been determined by any of the arc extraction techniques. As in the case of fitting a line, the positions (r̂_n, ĉ_n), n = 1, ..., N are assumed to be noisy observations of points coming from some curve f(r, c, w) = 0 whose parameter vector w must be estimated. That is,

r̂_n = r_n + ξ_n,   ĉ_n = c_n + η_n,

where f(r_n, c_n, w) = 0. In this section we treat the problem of determining the free parameters w of the curve f that make the curve a best fit, in the least-squares sense, to the given positions (r̂_n, ĉ_n), n = 1, ..., N. The problem we must solve, therefore, is to determine the parameter vector w and points (r_n, c_n), n = 1, ..., N that lie on the curve and are closest to (r̂_n, ĉ_n), n = 1, ..., N, respectively.
The first problem to consider is the distance from a given point (r_0, c_0) to the curve f(r, c) = 0, that is, the point (r, c) on the curve closest to (r_0, c_0). Using the Lagrangian

L = (r − r_0)² + (c − c_0)² − 2λ f(r, c),

we take partial derivatives and set them to zero:

∂L/∂r = 2(r − r_0) − 2λ ∂f/∂r (r, c) = 0    (11.36)
∂L/∂c = 2(c − c_0) − 2λ ∂f/∂c (r, c) = 0    (11.37)
∂L/∂λ = −2 f(r, c) = 0    (11.38)

From Eqs. (11.36) and (11.37) we can write the matrix equation

(r ; c) = (r_0 ; c_0) + λ ∇f(r, c)    (11.39)

Expanding f in a first-order Taylor series about (r_0, c_0) gives

0 = f(r, c) = f(r_0, c_0) + (r − r_0) ∂f/∂r (r_0, c_0) + (c − c_0) ∂f/∂c (r_0, c_0)    (11.40)
An approximate solution to Eqs. (11.39) and (11.40) is easily obtained. Assuming that the unknown (r, c) is close enough to (r_0, c_0) that the partial derivatives of f do not change much around (r_0, c_0), we may take ∂f/∂r (r, c) = ∂f/∂r (r_0, c_0) and ∂f/∂c (r, c) = ∂f/∂c (r_0, c_0). In this case Eq. (11.39) becomes

(r ; c) = (r_0 ; c_0) + λ ∇f(r_0, c_0)    (11.41)

Multiplying the left- and right-hand sides of Eq. (11.41) by ∇f(r_0, c_0)', using Eq. (11.40), and solving for λ, we obtain

λ = −f(r_0, c_0) / ||∇f(r_0, c_0)||²    (11.42)

Substituting Eq. (11.42) into Eq. (11.41) and solving for (r, c) then results in

(r ; c) = (r_0 ; c_0) − f(r_0, c_0) ∇f(r_0, c_0) / ||∇f(r_0, c_0)||²    (11.43)

The distance d between (r, c) and (r_0, c_0) can then easily be computed from Eq. (11.43):

d² = f(r_0, c_0)² / ||∇f(r_0, c_0)||²    (11.44)

This relation (11.44) is exactly correct for any f that is a first-order polynomial.
From this approximate solution, Eq. (11.44), to the determination of the distance between a point and a curve, we can solve the original minimization problem, which is to determine the parameter vector w to minimize

Σ_{n=1}^N (r̂_n − r_n)² + (ĉ_n − c_n)²

subject to the constraint that f(r_n, c_n, w) = 0, n = 1, ..., N. With the use of Eq. (11.44), this problem then translates to finding the parameter vector w to minimize

ε²(w) = Σ_{n=1}^N f(r̂_n, ĉ_n, w)² / ||∇f(r̂_n, ĉ_n, w)||²    (11.45)

where ∇ is the gradient operator with respect to (r, c). The minimization can be carried out by iterated updates

w_{t+1} = w_t + Δw    (11.46)

Now to find the right Δw, consider the fact that the first-order change Δw'∇ε²(w_t) must be negative and smaller than ε²(w_t) in magnitude. This suggests that Δw should be in the negative gradient direction and should produce a change smaller than ε²(w_t). Hence we take

Δw = −β ε²(w_t) ∇ε²(w_t) / ( ||∇ε²(w_t)||² + α² )    (11.48)

The α² in the denominator assures us that as ||∇ε²(w_t)||² becomes smaller than α², ||Δw||² will also become small. The parameter β is some fraction, 0 ≤ β ≤ 1, that can be a constant or a function of the iteration index or a function of w_t + Δw, ε², and a tentative w_{t+1}. For example, β = 0.5(1 + 1/t) is one choice that for t = 1 produces β = 1.0 and for large t produces β = 0.5. The procedure curvefit details the iterative solution algorithm. Its input arguments are w_0, the initial guess, the number of iterations T, and the constants α and β:
procedure curvefit(w_0, T, α, β);
w_t := w_0;
for t := 1 to T do
  begin
  g := ∇ε²(w_t);
  Δw := −β ε²(w_t) g / (||g||² + α²);
  w_t := w_t + Δw * step(w_t, Δw, ε²)
  end
end for
end curvefit;
The number of iterations can be reduced somewhat by using a more sophisticated scheme for selecting the magnitude of Δw. Equation (11.48) can be used to select a trial size for Δw. Then the value of ε²(w_{t+1}) = ε²(w_t + Δw) can be compared with ε²(w_t). If ε²(w_{t+1}) < ε²(w_t), a successful steepest-descent step has been determined. Now a small search of increasing step sizes may be done to determine whether there is a larger step size that produces a smaller ε²(w_{t+1}). If so, we can use it. On the other hand, if ε²(w_{t+1}) > ε²(w_t), the trial step size is too large. A smaller step size can surely produce an ε²(w_{t+1}) that satisfies ε²(w_{t+1}) < ε²(w_t). So in this case we can do a small search on reduced step sizes to find a sufficiently small step. The following function illustrates this idea.
function step(w_t, Δw, ε²);
w_{t+1} := w_t + Δw;
e_p := ε²(w_{t+1});
c := 1;
step := 1;
if e_p < ε²(w_t) then k := 3
else k := 0.333;
for m := 1 to 5 do
  begin
  c := c * k;
  e := ε²(w_t + c Δw);
  if e < e_p
  then begin
       e_p := e;
       step := c
       end
  else break
  end
end for
end step;
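Putting Eqs. (11.45), (11.46), and (11.48) together with the step-size search gives the following sketch. The caller supplies f(p, w), returning the function value and its gradient with respect to (r, c); the forward-difference gradient of ε² with respect to w is a convenience of this sketch, not part of the text's algorithm.

import numpy as np

def curvefit(f, grad_f_w, w0, points, T=100, alpha=1e-6, beta=0.5):
    """Iterative curve fit minimizing the Eq. (11.45) error
    e2(w) = sum_n f(p_n, w)^2 / ||grad_rc f(p_n, w)||^2
    with the negative-gradient step of Eq. (11.48)."""
    def e2(w):
        total = 0.0
        for p in points:
            val, grc = f(p, w)
            total += val ** 2 / max(np.dot(grc, grc), 1e-12)
        return total

    w = np.asarray(w0, float)
    for t in range(T):
        e0 = e2(w)
        g = grad_f_w(e2, w)
        dw = -beta * e0 * g / (np.dot(g, g) + alpha ** 2)
        # crude step-size search in the spirit of function step
        best_c, best_e = 1.0, e2(w + dw)
        k = 3.0 if best_e < e0 else 1.0 / 3.0
        c = 1.0
        for _ in range(5):
            c *= k
            e = e2(w + c * dw)
            if e < best_e:
                best_c, best_e = c, e
            else:
                break
        if best_e < e0:                    # accept only improving steps
            w = w + best_c * dw
    return w

def num_grad(e2, w, h=1e-6):
    """Forward-difference gradient of e2 (a convenience assumption)."""
    g = np.zeros_like(w)
    e0 = e2(w)
    for i in range(len(w)):
        wp = w.copy(); wp[i] += h
        g[i] = (e2(wp) - e0) / h
    return g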
A second-order approach expands ε² about w_t:

ε²(w_t + Δw) ≈ ε²(w_t) + Δw'∇ε²(w_t) + (1/2) Δw'H Δw    (11.49)

where H = H(w_t), the Hessian of ε², is the matrix of second-order partial derivatives of ε² evaluated at w_t. To find the Δw that minimizes Eq. (11.49), take partial derivatives of Eq. (11.49) with respect to Δw, set the partial derivative to zero, and solve for Δw. This produces

Δw = −H⁻¹ ∇ε²(w_t)    (11.50)

A compromise between the gradient and Newton directions uses

Δw = −(H + λI)⁻¹ ∇ε²(w_t)    (11.51)

instead of Eq. (11.50). When λ gets large, the direction that Eq. (11.51) produces is the negative gradient direction, and when λ gets close to zero, the direction that Eq. (11.51) produces is the Newton direction. Since ε² is the sum of N terms, it would not be unreasonable to consider using λ = N.
When f is a second-order polynomial, the point-to-curve distance can be computed exactly. Let

∇f = ( ∂f/∂r ; ∂f/∂c )   and   F = ( ∂²f/∂r²    ∂²f/∂r∂c
                                     ∂²f/∂r∂c  ∂²f/∂c² ),

both evaluated at (r_0, c_0), so that ∇f(r, c) = ∇f(r_0, c_0) + F [(r ; c) − (r_0 ; c_0)]. Substituting this into Eq. (11.39) and solving for (r, c) gives

(r ; c) = (r_0 ; c_0) + λ (I − λF)⁻¹ ∇f(r_0, c_0)    (11.53)

Substituting Eq. (11.53) into the constraint f(r, c) = 0 produces an equation (11.55) for λ having two admissible roots, whose constant term is

C = f(r_0, c_0).

Once Eq. (11.55) is solved for λ, each of the two possible values can be substituted back into Eq. (11.53), and the squared distance d² to the curve is then determined by

d² = (r − r_0)² + (c − c_0)² = λ² ∇f(r_0, c_0)' (I − λF)⁻² ∇f(r_0, c_0)

The value of λ that produces the smaller value of d² is the root chosen, and the smaller value of d² is the desired squared distance.
As a concrete example, consider fitting a circle with center (a, b) and radius R, so that w = (a, b, R)' and

f(r̂_n, ĉ_n, a, b, R) = (r̂_n − a)² + (ĉ_n − b)² − R²,

for which ||∇f||² = 4[(r̂_n − a)² + (ĉ_n − b)²] and hence

ε² = (1/4) Σ_{n=1}^N [ (r̂_n − a)² + (ĉ_n − b)² − R² ]² / [ (r̂_n − a)² + (ĉ_n − b)² ]

The simple iteration algorithm is then given as shown below. The parameters r and c are vector arrays of the row and column coordinates of the N points to be fit. Parameters a, b, and R are the fitting parameters of the circle, which are computed by the procedure. The internal variable number-of-iterations will have to be around 200 or 300 for the iterative procedure to get close to the correct solution. This makes the procedure too slow to use in practice.
procedure circlefit1(r,c,N,α,β,a,b,R);
for t := 1 to number-of-iterations do
  begin
  ε² := 0;
  ∇ε² := 0;
  for n := 1 to N do
    begin
    d := (r(n) − a)² + (c(n) − b)²;
    k := 1./d;
    f := d − R²;
    ∇ε² := ∇ε² + ( kf(r(n) − a)(−1 + 0.5 kf),
                   kf(c(n) − b)(−1 + 0.5 kf),
                   −kfR )';
    ε² := ε² + f*f*k
    end
  end for
  ε² := 0.25 * ε²;
  Δw := −β ε² ∇ε² / (||∇ε²||² + α²);
  (a b R)' := (a b R)' + Δw * step((a b R)', Δw, ε²)
  end
end for
end circlefit1;
A better-behaved criterion minimizes the sum of squared distances to the circle itself:

ε² = Σ_{n=1}^N [ sqrt( (r̂_n − a)² + (ĉ_n − b)² ) − R ]²    (11.59)

from which the gradient ∇ε² with respect to (a, b, R) can be computed (Eq. 11.60). The iterations can then proceed by substituting Eq. (11.60) into Eq. (11.48) and using Eq. (11.48) in Eq. (11.46) to produce the next value of w = (a, b, R).
A faster and more tolerant iteration can be obtained by noticing that, from the ∂ε²/∂R component of Eq. (11.60), an analytically computed value for R can be determined by setting ∂ε²/∂R = 0:

R = (1/N) Σ_{n=1}^N sqrt( (r̂_n − a)² + (ĉ_n − b)² )    (11.61)

With R eliminated in this way, ε² becomes a function of (a, b) alone, and its gradient with respect to (a, b) (Eq. 11.62) can be computed accordingly. Iterations can proceed by substituting Eq. (11.62) into Eq. (11.48) and Eq. (11.48) into Eq. (11.46) to produce the next value of w = (a, b, R). Convergence is typically achieved after 20 iterations.
The procedure circlefit gives the pseudocode for these iterations. Its input is the row-column arrays r and c, each N long. It outputs the circle center (a,b) and radius R. It calls upon the function epserr, which, given the observed row-column points and a center estimate (a,b), determines the radius by Eq. (11.61) and then returns the error of Eq. (11.59). It also calls upon the procedure step, which functions like the function step of Section 11.10.1 with natural modifications to let it also output the radius and return the values of the updated center.
procedure circlefit(r,c,N,α,β,a,b,R);
a := 0;
b := 0;
for n := 1 to N do
  begin
  a := a + r(n);
  b := b + c(n)
  end
end for
a := a/N;
b := b/N;
ε² := epserr(r,c,N,a,b,radius);
for t := 1 to 20 do
  begin
  for n := 1 to N do
    d(n) := sqrt( (r(n) − a)² + (c(n) − b)² );
  drda := 0;
  drdb := 0;
  for i := 1 to N do
    begin
    drda := drda + (r(i) − a)/d(i);
    drdb := drdb + (c(i) − b)/d(i)
    end
  end for
  drda := −drda/N;
  drdb := −drdb/N;
  at := 0;
  bt := 0;
  for n := 1 to N do
    begin
    f := 1 − radius/d(n);
    at := at + f*(r(n) − a) + drda*(d(n) − radius);
    bt := bt + f*(c(n) − b) + drdb*(d(n) − radius)
    end
  end for
  ∇ε² := −2 * (at bt)';
  Δw := −β ε² ∇ε² / (||∇ε²||² + α²);
  (a b)' := (a b)' + Δw * step((a b)', Δw, ε²);
  ε² := epserr(r,c,N,a,b,radius)
  end
end for
R := radius
end circlefit;
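A compact version of circlefit in NumPy follows; the descent-only acceptance test stands in for the full step procedure, and the gradient uses the fact that the ∂R/∂a and ∂R/∂b terms sum to zero.

import numpy as np

def circlefit(r, c, iters=20, alpha=1e-6, beta=0.5):
    """Geometric circle fit: center by gradient steps on Eq. (11.59),
    radius recomputed analytically from Eq. (11.61) at each step."""
    r, c = np.asarray(r, float), np.asarray(c, float)
    a, b = r.mean(), c.mean()              # initial center: the centroid

    def e2(a, b):
        d = np.hypot(r - a, c - b)
        return ((d - d.mean()) ** 2).sum()            # Eq. (11.59)

    for _ in range(iters):
        d = np.hypot(r - a, c - b)
        R = d.mean()                                  # Eq. (11.61)
        f = 1.0 - R / d
        g = -2.0 * np.array([(f * (r - a)).sum(), (f * (c - b)).sum()])
        err = e2(a, b)
        step = -beta * err * g / (np.dot(g, g) + alpha ** 2)
        if e2(a + step[0], b + step[1]) < err:        # accept only descent
            a, b = a + step[0], b + step[1]
    return a, b, np.hypot(r - a, c - b).mean()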
EXAMPLE 11.2

Table 11.2 lists the points in a noisy 90° circular arc; r̂_n is the row number and ĉ_n is the column number.
Robinson (1961) and Landau (1987) note that the desired values of a, b, and R that minimize Eq. (11.59) are values of a, b, and R that make ∇ε² of Eq. (11.60) zero. Therefore the simultaneous equations to be solved for a, b, and R are Eq. (11.61) together with Eqs. (11.65) and (11.66). Landau solves the simultaneous Eqs. (11.61), (11.65), and (11.66) iteratively. The initial values a_0 and b_0 for a and b are the coordinate means given by Eqs. (11.56) and (11.57). Once a center approximation (a_t, b_t) is calculated at iteration t, the iterations proceed by updating the center and recomputing R from Eq. (11.61).
Hence a noniterative route is attractive. Chernov and Ososkov reduce the problem to finding the root of a single polynomial. Center the data so that Σ_n r̂_n = Σ_n ĉ_n = 0, and let a' = a − r̄ and b' = b − c̄ be the center coordinates in the centered frame. Then taking partial derivatives of ε² with respect to a', b', and R, setting these partial derivatives to zero, and using the centering results in

F a' + H b' − γ a' = P    (11.69)
H a' + G b' − γ b' = Q    (11.70)

together with a third equation (11.71) containing a bracketed term, where F, G, H, P, Q, and T are moments of the centered observations and

γ = R² − (a')² − (b')²

However, multiplying Eq. (11.69) by a' and Eq. (11.70) by b' and adding these together shows that the bracketed term in Eq. (11.71) is zero. Equation (11.71) then becomes

2P a' + 2Q b' + γ² = T    (11.72)

Now Eqs. (11.69) and (11.70) can be used to determine expressions for a' and b' in terms of γ, and these expressions for a' and b' can be substituted back into Eq. (11.72). After rearranging, there results the fourth-order equation in γ

γ⁴ + Aγ³ + Bγ² + Cγ + D = 0    (11.73)

where

A = −F − G
B = FG − T − H²
C = T(F + G) − 2(P² + Q²)
D = T(H² − FG) + 2(P²G + Q²F) − 4PQH
Chernov and Ososkov solve Eq. (11.73) for the desired root by the iterative Newton method, beginning with an initial estimate of γ.

An even simpler approach rewrites the circle equation (r − a)² + (c − b)² = R² in the linear form

2r̂_n a + 2ĉ_n b + q = r̂_n² + ĉ_n²,   n = 1, ..., N,    (11.74)

where q = R² − a² − b², which can be solved in the least-squares sense for (a, b, q). Then R² = a² + b² + q. The least-squares solution for Eq. (11.74) minimizes the sum of the squared errors between the squared radius and the squared distance between the observed data points and the circle center:

ε² = Σ_{n=1}^N [ (r̂_n − a)² + (ĉ_n − b)² − R² ]²
The technique should be used with care, since it is easy for numerical errors to influence the results unduly. For example, if Eq. (11.74) is solved by the normal equations,

(a ; b ; q) = (A'A)⁻¹ A' y,   y_n = r̂_n² + ĉ_n²,

the roundoff error in the computation of A'A can cause excessive inaccuracy. Therefore a singular-value decomposition technique must be used instead.

Finally, since R² = a² + b² + q, it is apparent that when the center (a, b) of a circle of radius 10 is in the bottom right corner of an image, a² + b² can easily be over 5 × 10⁵, and q must be a large negative number just 100 larger than −5 × 10⁵. Hence the computation of R² = a² + b² + q will involve the cancellation of the large negative number q against the large positive number a² + b², with the inherent loss of precision in the result.
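The linear formulation is three lines in NumPy; np.linalg.lstsq is backed by an SVD-based solver, so it sidesteps the normal-equations roundoff problem noted above.

import numpy as np

def circlefit_linear(r, c):
    """Algebraic circle fit: solve 2*r*a + 2*c*b + q = r^2 + c^2
    (Eq. 11.74) in the least-squares sense."""
    r, c = np.asarray(r, float), np.asarray(c, float)
    A = np.column_stack([2 * r, 2 * c, np.ones_like(r)])
    y = r ** 2 + c ** 2
    (a, b, q), *_ = np.linalg.lstsq(A, y, rcond=None)
    R = np.sqrt(q + a ** 2 + b ** 2)   # note the cancellation risk here
    return a, b, R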
Thomas and Chan (1989) also minimize ε² = Σ_{n=1}^N [(r̂_n − a)² + (ĉ_n − b)² − R²]². Setting the partial derivatives of ε² with respect to a, b, and R² to zero yields three equations, (11.75), (11.76), and (11.77), from which the a² and b² terms can be eliminated: multiply Eq. (11.75) by N and subtract from it Eq. (11.77) multiplied by Σ_n r̂_n; multiply Eq. (11.76) by N and subtract from it Eq. (11.77) multiplied by Σ_n ĉ_n. The resulting linear system, Eq. (11.78), can be solved for a and b, and the result substituted into Eq. (11.77) to obtain R².
To analyze the variances of the estimators, model the observations as

r̂_n = r_n + ξ_n,   ĉ_n = c_n + η_n,   (r_n − a)² + (c_n − b)² = R²,    (11.80)

where we assume that the random variables ξ_n and η_n are independent and identically distributed, having mean 0 and variance σ². Linearizing the least-squares equations around the true parameter values leads to a system of equations (11.82) relating the estimation errors Δa, Δb, and ΔR to the noise. By the system of equations (11.82) the covariance matrix for â, b̂, R̂ is identical to the covariance matrix of Δa, Δb, ΔR. By the equal-variance and no-correlation assumption of Eq. (11.80), we obtain that the covariance matrix for Δa, Δb, and ΔR is given by

E[ (Δa ; Δb ; ΔR)(Δa  Δb  ΔR) ] = σ² J₁⁻¹ J₂ J₂' J₁'⁻¹    (11.83)
                                 = σ² J₁⁻¹    (11.84)

where J₁ and J₂ are the Jacobians arising in the linearization.
Write

(r̂_n − a)² + (ĉ_n − b)² = R²(1 + Δ_n),

where

Δ_n = [ 2ξ_n(r_n − a) + ξ_n² + 2η_n(c_n − b) + η_n² ] / R²    (11.86)

Then by squaring the argument of the expectation and simplifying, we obtain

E[ε²] = R² Σ_{n=1}^N E[ (1 + Δ_n) − 2 sqrt(1 + Δ_n) + 1 ]

Assuming the variance of the noise is small compared with R², so that |Δ_n| << 1, we may expand sqrt(1 + Δ_n) in a Taylor series to obtain

sqrt(1 + Δ_n) ≈ 1 + Δ_n/2 − Δ_n²/8

Hence

E[ε²] ≈ (R²/4) Σ_{n=1}^N E[Δ_n²]

Now, using Eq. (11.86) and assuming that E[ξ_n³] = E[η_n³] = 0, we determine that E[Δ_n²] ≈ 4σ²/R², so that

E[ε²] ≈ N σ²,

which suggests σ̂² = ε²/N as an estimator for σ². Since in actual practice a, b, and R are estimated as those values that minimize ε², we must use

σ̂² = ε² / (N − 3),

three degrees of freedom having been used by the estimated parameters.

The analysis we have just done to determine the expected value of ε² is also useful for establishing the expected value of the estimator R̂, which is actually slightly biased. To make our analysis easier, we assume N is large enough so that the difference between â and a and between b̂ and b is negligible. In this case

R̂ = (1/N) Σ_{n=1}^N sqrt( (r̂_n − a)² + (ĉ_n − b)² ) = (R/N) Σ_{n=1}^N sqrt(1 + Δ_n)

Hence E[R̂] ≈ R + (R/2N) Σ_{n=1}^N E[Δ_n]. But from Eq. (11.86), E[Δ_n] = 2σ²/R², so that

E[R̂] = R + σ²/R.
Berman (1989) observes that the least-squares solutions â, b̂, and R̂ to the sum of squared errors of Eq. (11.59) are biased. As we have done, he shows that the asymptotic bias of R̂ is about σ²/R, where σ² is the Gaussian variance of the additive noise that perturbs the true (r_n, c_n) to produce the observed (r̂_n, ĉ_n). The biases of â and b̂ tend to be negligible in most cases. Other statistical analyses of the circle-fitting model can be found in Berman and Griffiths (1985), Berman (1983), Berman and Culpin (1986), and Anderson (1981).
Analogous gradient-descent machinery applies to fitting a conic f(r, c) = A(r − a)² + 2B(r − a)(c − b) + C(c − b)² − 1 = 0, for which

∂f/∂r = 2A(r − a) + 2B(c − b)   and   ∂f/∂c = 2B(r − a) + 2C(c − b).

Letting

g_n = A(r̂_n − a) + B(ĉ_n − b)   and   h_n = B(r̂_n − a) + C(ĉ_n − b),

the gradient of ε² with respect to the parameters (A, B, C, a, b) is a sum of per-point terms,

∇ε² = Σ_{n=1}^N d_n f_n ( (r̂_n − a)²,
                          2(r̂_n − a)(ĉ_n − b),
                          (ĉ_n − b)²,
                          −2g_n,
                          −2h_n )',

where f_n = f(r̂_n, ĉ_n) and d_n collects the weight 1/||∇f(r̂_n, ĉ_n)||² = 1/[4(g_n² + h_n²)] together with its own derivative terms, which involve −Ag_n − Bh_n, −Bg_n − Ch_n, and the products g_n(r̂_n − a), g_n(ĉ_n − b), h_n(r̂_n − a), h_n(ĉ_n − b).
To determine initial values for the parameters, we can proceed by using the principal-axis curve-fit procedure of Section 11.7.2. Sampson (1982) gives an iterative refinement method to find the conic parameters to minimize Eq. (11.92). Pavlidis (1983) has a good discussion of fitting curves with conic splines.
To guarantee that the fitted conic is an ellipse, the coefficient matrix (A B ; B C) can be replaced by a matrix product guaranteed to be positive semidefinite:

( A  B     ( d  0 ) ( d  e )
  B  C ) = ( e  f ) ( 0  f )

It is clear from this relation that for any values d, e, and f, the matrix (A B ; B C) is positive semidefinite. This means that we can set up the fitting problem with the free parameters d, e, and f. With this perspective we define the functions to be minimized accordingly.
Proffitt (1982) uses a discrete Fourier transform technique for fitting ellipses. Wang, Hanson, and Riseman (1988) give a short discussion on extracting ellipses from images.

A Bayesian variant of curve fitting needs the probability density of a data point given that it comes from the curve, which involves the gradient (∇f)' = (∂f/∂r, ∂f/∂c). Also, we need the probability density of observing a data point (r, c), given that it does not come from the curve being fit. In this case we will assume that (r, c) comes from a uniform distribution over an area A centered at (0, 0). Hence

P(r, c | not from curve) = 1/A.

Finally, we need the probability q that an observed data point comes from the curve of interest, and for each observation (r_n, c_n) we define a random variable y_n, where y_n = 1 if (r_n, c_n) comes from the curve and y_n = 0 if (r_n, c_n) does not come from the curve. Hence P(y_n = 1) = q and P(y_n = 0) = 1 − q.
Assuming that the data points are independent conditioned on w and that P(y_n | w) = P(y_n), we obtain an alternating estimation scheme in which each point is assigned an indicator

λ_n = { 1  if (r_n, c_n) is hypothesized to come from the curve
      { 0  otherwise

4. Having hypothesized which points come from the curve and which points do not come from the curve, determine w_{t+1} by maximizing the resulting conditional likelihood (11.96).

The solution to Eq. (11.95) can likewise be obtained as the solution to a linear programming problem.
Exercises

11.1. Show that if A is a 2 × 2 matrix, then for any λ,

(I − λA)⁻¹ = [ I − λ((trace A) I − A) ] / ( 1 − λ trace A + λ²|A| ).

11.2. Suppose x and y are angles expressed in degrees in the range 0° to 360°. Prove that the following procedure determines a y* that is equal to y modulo 360° and for which |x − y*| is as small as possible:

d := x − y;
if d > 0 then u := y + 360°
else u := y − 360°;
e := x − u;
if |d| < |e| then y* := y
else y* := u;

11.3. Show that, under the noise model of Eq. (11.18),

E[ (λ̂ − μ̂_rr + μ_rr)² ] = 4σ²μ_rr(μ_rr + μ_cc + σ²) / [ (N − 1)(μ_rr + μ_cc) ].
Appendix A

EXAMPLE A.4

Since μ_cc > μ_rr, we use Eq. (A.84) to obtain the clockwise orientation of the major axis of the ellipse with respect to the column axis.
APPENDIX B

For L = E^N, the standard inner product is <x, y> = x'y. The inner product with respect to a positive definite symmetric matrix A is <x, y> = x'Ay. An inner product <x, y> always leads to a norm on L defined by

||x|| = <x, x>^(1/2)    (B.1)
For L = E^N, where the norm is induced by the symmetric positive definite matrix A of the inner product, we will write

||x||_A = (x'Ax)^(1/2)    (B.2)
A norm has properties

1. ||x|| ≥ 0, with equality if and only if x = 0;
2. ||αx|| = |α| ||x|| for all x ∈ L, α ∈ E;
3. ||x + y|| ≤ ||x|| + ||y|| for all x, y ∈ L.

Not every norm arises from some inner product, but those that do have nicer properties.
Vectors u, v ∈ L are orthogonal if <u, v> = 0. A set of vectors {u_1, ..., u_k} is orthonormal if <u_i, u_j> = 1 when i = j and 0 otherwise. If {b_1, ..., b_N} is an orthonormal basis for L and x = Σ_{n=1}^N a_n b_n, then

<x, b_m> = < Σ_{n=1}^N a_n b_n, b_m > = a_m.
Since b_1, ..., b_r is a basis for M and b_{r+1}, ..., b_N is a basis for M⊥, by taking u = Σ_{n=1}^r a_n b_n and v = Σ_{n=r+1}^N a_n b_n, the representation x = u + v, where u ∈ M and v ∈ M⊥, and its uniqueness are shown.
Next we want to show that if x = u + v, where u ∈ M and v ∈ M⊥, the map x → u defines a linear operator P. Let

y = u_y + v_y,  where u_y ∈ M, v_y ∈ M⊥,
z = u_z + v_z,  where u_z ∈ M, v_z ∈ M⊥.    (B.7)

Consider

αy + βz = (αu_y + βu_z) + (αv_y + βv_z),    (B.8)

where αu_y + βu_z ∈ M and αv_y + βv_z ∈ M⊥. Then

P(αy + βz) = αu_y + βu_z = αPy + βPz.    (B.9)
P is idempotent:

PP = [B(B'AB)⁻¹B'A][B(B'AB)⁻¹B'A]
   = B(B'AB)⁻¹(B'AB)(B'AB)⁻¹B'A
   = B(B'AB)⁻¹B'A = P.

Also, P is symmetric with respect to the inner product:

<Px, y> = <x, Py>.    (B.14)

When the columns of B are orthonormal with respect to the A inner product, B'AB = I and

Px = BB'Ax.
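The projector and its two properties can be checked numerically; the random B and A below are test data for the check, nothing more.

import numpy as np

def projector(B, A):
    """Projection onto the column space of B, orthogonal with respect
    to the inner product <x, y> = x'Ay: P = B (B'AB)^{-1} B'A."""
    return B @ np.linalg.inv(B.T @ A @ B) @ B.T @ A

# quick check of idempotence (P^2 = P) and A-symmetry (AP = P'A)
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 2))
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)               # positive definite
P = projector(B, A)
assert np.allclose(P @ P, P)
assert np.allclose(A @ P, P.T @ A)        # <Px, y> = <x, Py>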
Index
economic consequence, 97
economic gain matrix, 96-97, 110-111
edge detection, 6, 337-351, 383
edge linking, 7
edgeness per unit area, 469
edges, 3
ego-motion polar transform, 546
eigenvalues, 643
eigenvectors, 643
ellipse, 639
endoskeleton, 231
entropy purity function, 133
equal-probability-of-ignorance, 110
equivalence table, 34
erosion, 161, 268
error of commission, 126
error estimation, 142
error of omission, 126
exclusive, 191
exoskeleton, 231
expand, 159
expected gain, 102
expected profit, 98
extensive operators, 161, 179, 191
extracting, 7
extrema per unit area, 471
extremal points, 62, 648-649
extremum sharpening, 336
symbolic image, 1
symmetric, 191
symmetric axis, 230
synthetic texture, 482
tangent line, 648
tangential angle deflection, 565
template matching, 8, 172, 578
texel identification problem, 492
textural edgeness, 469-470
textural energy, 467-469
textural plane, 489
textural primitives, 454, 491
textural surface, 486
texture analysis, 453-454
texture features, 61-62
texture primitives, 490
texture segmentation, 477, 481-482
thickening, 170
thinning, 170
thinning operator, 278
threshold decision, 136
thresholding, 13, 14, 518-523
top hat transformation, 181
top surface, 201
topographic primal sketch, 430-443
tracking, 559
trimmed-mean operator, 320
Tukey's biweight, 323
two-dimensional extrema, 472
type I error, 126
type II error, 126
umbra, 201-202
umbra homomorphism theorem, 205-208
uniform bounded-error approximation, 569
uniform error estimation, 625-627
units, 2
unsharp masking, 334
variogram, 480
vector dispersion, 470
vertical projection, 48, 49, 80
Wallis neighborhood operator, 336
weighted-median filter, 319
within-group variance, 20
Yokoi connectivity number, 272
zero-crossing edge detector, 392-403
zone of influence, 237