Image Processing Techniques For Machine Vision
For John Canny, an optimal edge detector needs to satisfy these three conditions:
1. The edge detector should respond only to edges, and should find all of them; no edges should be missed.
2. The distance between the edge pixels found by the edge detector and the actual edge should be as small as possible.
3. The edge detector should not identify multiple edge pixels where only a single edge exists.
Canny edge detection algorithm
1. Read the image I.
2. Create a 1D Gaussian mask G to convolve with I.
3. Create a 1D mask for the first derivative of the Gaussian in the x and y directions, G_x and G_y.
4. Convolve I with G along the rows to obtain I_x, and down the columns to obtain I_y.
5. Convolve I_x with G_x to obtain I_x', and I_y with G_y to obtain I_y'.
6. Find the magnitude of the result at each pixel (x, y):

M(x, y) = \sqrt{I_x'(x, y)^2 + I_y'(x, y)^2}
The Infinite Symmetric Exponential Filter uses a different optimization function to find the edges in an image. This function can be written as:

C_N = \frac{4 \int_0^\infty f^2(x)\,dx \int_0^\infty f'^2(x)\,dx}{f^4(0)}
The function C_N is minimized by an optimal smoothing filter for an edge detector. This filter is called the infinite symmetric exponential filter (ISEF):

f_{1D}(x) = \frac{p}{2} e^{-p|x|} \qquad f_{2D}(x, y) = a^2 e^{-p(|x| + |y|)}
The filter in the ISEF edge detector is implemented as a one-dimensional recursive filter. Assuming the 2D filter function is real and continuous, it can be rewritten as:

f[i, j] = \frac{1 - b}{1 + b}\, b^{|i| + |j|}
The use of recursive filtering speeds up the convolution. The value b can be entered by the user. Figure 4 illustrates different edge detection techniques applied to a test image (Figure 4 a).
Figure 4. Output image using different edge detector
techniques. a) Original Image. b) Gradient Edge Detector. c)
Kirsch Edge Detector. d) Sobel Edge Detector. e) Canny
Edge Detector. f) ISEF Edge Detector.
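The recursive formulation of the ISEF can be illustrated with a 1D sketch: one causal and one anti-causal first-order pass replace the full convolution with b^{|i|}, which is why the filter's cost is independent of b. The function name and the boundary handling below are assumptions, not Shen and Castan's exact code.

```python
import numpy as np

def isef_1d(x, b=0.9):
    """Recursive 1D ISEF smoothing with kernel proportional to b^|i|.

    The causal pass accumulates r[i] = x[i] + b*r[i-1], the anti-causal
    pass l[i] = x[i] + b*l[i+1]; their sum (minus the doubly counted
    center sample) equals convolution with b^|i|, and the factor
    (1-b)/(1+b) normalizes the kernel to unit sum.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = np.empty(n)
    l = np.empty(n)
    r[0] = x[0]
    for i in range(1, n):                 # causal (left-to-right) pass
        r[i] = x[i] + b * r[i - 1]
    l[-1] = x[-1]
    for i in range(n - 2, -1, -1):        # anti-causal (right-to-left) pass
        l[i] = x[i] + b * l[i + 1]
    return (1 - b) / (1 + b) * (r + l - x)
```

For a unit impulse the output decays geometrically by a factor b per sample on each side, and a constant signal passes through unchanged away from the borders, confirming the normalization.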
DIGITAL MORPHOLOGY
The concept of digital morphology is based on the fact that images consist of sets of picture elements, called pixels, that collect into groups having a two-dimensional structure called a shape. A group of mathematical operations can be applied to the set of pixels to enhance or highlight specific aspects of the shape so that they can be counted or recognized [8], [9], [10].
This part of image processing analysis deals with image filtering and geometric analysis of structuring elements. Erosion, the elimination of a set of pixels having a given pattern (the structuring element), and dilation, the addition of a given pattern to a small area, are the basic morphology operations. Binary morphological operations are defined on bilevel images.
In general, an operator is defined as a set of black pixels with a specific location for each of its pixels, given by the pixel's row and column indices. Mathematically, a pixel is thought of as a point in two-dimensional space.
The binary dilation of S by a set S_1 is:

S \oplus S_1 = \{ c \mid c = s + s_1,\ s \in S,\ s_1 \in S_1 \}
S represents the image being transformed, and S_1 is a second set of pixels with a particular shape that acts on the pixels of S to produce the expected result. This set S_1 is called the structuring element, and its shape defines the nature of the dilation. Figure 5 shows a dilation morphology operation on an image S using a structuring element S_1 [9].
Figure 5. Dilation of set S using a structuring element S_1.
It is important to notice that the pixel marked with an x is considered the origin of the image. The origin could be either a white or a black pixel. The structuring element is no more than a template moving over the image. The dilation can be considered to be the union of all translations specified by the structuring element, that is:
S \oplus S_1 = \bigcup_{s_1 \in S_1} (S)_{s_1}
And because it is a commutative operation, dilation can also
be considered the union of all translations of the structuring
element by all pixels in the image:
S \oplus S_1 = \bigcup_{s \in S} (S_1)_{s}
Figure 6 illustrates the dilation process using a 6×6 image and a 3×3 structuring element. The ASCII files are also presented to support the process.
Figure 6. Dilation process in a test image.
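The "union of translations" definition of dilation can be sketched directly. This is an illustrative implementation (the function name and the clipping at the image border are assumptions, not the authors' code):

```python
import numpy as np

def dilate(S, S1, origin):
    """Binary dilation of image S by structuring element S1.

    For every black pixel s in S, the structuring element is translated
    so that its origin sits on s, and its black pixels are OR-ed into
    the result.  `origin` is the (row, col) of S1's origin pixel.
    """
    S = np.asarray(S, dtype=bool)
    S1 = np.asarray(S1, dtype=bool)
    out = np.zeros_like(S)
    oy, ox = origin
    for sy, sx in zip(*np.nonzero(S)):          # each black pixel of S
        for ky, kx in zip(*np.nonzero(S1)):     # each black pixel of S1
            y, x = sy + ky - oy, sx + kx - ox   # translated position
            if 0 <= y < S.shape[0] and 0 <= x < S.shape[1]:
                out[y, x] = True
    return out.astype(int)
```

Running it on the 6×6 image and the cross-shaped structuring element of Figure 6 (origin at its center) reproduces the dilated result listed in the ASCII files.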
The binary erosion of S by a set S_1 is:

S \ominus S_1 = \{ c \mid (S_1)_c \subseteq S \}
The binary erosion [9] is the set of pixels c such that the structuring element S_1 translated by c corresponds to a set of black pixels in S. As a result, any pixels that do not
match the pattern defined by the black pixels in the structuring element will not belong to the result.

[Figure 5 panels: Object S; structuring element S_1; Object S dilated by S_1. The x marks the origin.]

ASCII (PBM) files for Figure 6 -- the input image, the structuring element, and the dilated result:

P1
#origin 0 0
6 6
0 0 0 0 0 0
0 1 0 0 1 0
0 0 1 1 0 0
0 0 1 1 0 0
0 1 0 0 1 0
0 0 0 0 0 0

P1
#origin 1 1
3 3
1 0 1
0 0 0
1 0 1

P1
#origin 0 0
6 6
1 0 1 1 0 1
0 1 1 1 1 0
1 1 1 1 1 1
1 1 1 1 1 1
0 1 1 1 1 0
1 0 1 1 0 1
Erosion and dilation can be associated in the following manner:

(S \ominus S_1)^c = S^c \oplus \hat{S}_1
This means that the complement of an erosion is the same as a dilation of the complement image by the reflected structuring element. Figure 7 illustrates the erosion process using a 10×10 image and a 3×3 structuring element. The ASCII files are also presented to support the process.
A combination of the simplest morphology operations, dilation and erosion, results in two very useful image processing operations, called opening and closing [8].
Opening: application of an erosion immediately followed by a dilation using the same structuring element. This binary operation tends to open small gaps between touching objects in an image. After an opening, objects are better isolated and can be counted or classified. A practical application of opening is removing noise, for instance after thresholding an image.
Closing: application of a dilation immediately followed by an erosion using the same structuring element. The closing operation closes or fills the gaps between objects. A closing can be used to smooth the outline of objects after digitization.
Figure 7. Erosion process in a test image.
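The opening and closing compositions just described can be sketched from first principles. The following naive NumPy version (function names assumed, origin defaulting to the element's center) follows the set definitions above rather than any optimized algorithm:

```python
import numpy as np

def _fits(S, S1, origin, cy, cx):
    """True if S1, translated so its origin lands on (cy, cx), lies inside S."""
    oy, ox = origin
    for ky, kx in zip(*np.nonzero(S1)):
        y, x = cy + ky - oy, cx + kx - ox
        if not (0 <= y < S.shape[0] and 0 <= x < S.shape[1] and S[y, x]):
            return False
    return True

def erode(S, S1, origin=(1, 1)):
    """Keep the pixels c where the translated structuring element fits in S."""
    S, S1 = np.asarray(S, bool), np.asarray(S1, bool)
    out = np.zeros_like(S)
    for cy in range(S.shape[0]):
        for cx in range(S.shape[1]):
            out[cy, cx] = _fits(S, S1, origin, cy, cx)
    return out

def dilate(S, S1, origin=(1, 1)):
    """Union of translations of S1 over the black pixels of S."""
    S, S1 = np.asarray(S, bool), np.asarray(S1, bool)
    out = np.zeros_like(S)
    oy, ox = origin
    for sy, sx in zip(*np.nonzero(S)):
        for ky, kx in zip(*np.nonzero(S1)):
            y, x = sy + ky - oy, sx + kx - ox
            if 0 <= y < S.shape[0] and 0 <= x < S.shape[1]:
                out[y, x] = True
    return out

def opening(S, S1, origin=(1, 1)):   # erosion, then dilation
    return dilate(erode(S, S1, origin), S1, origin)

def closing(S, S1, origin=(1, 1)):   # dilation, then erosion
    return erode(dilate(S, S1, origin), S1, origin)
```

Opening a 3×3 block plus one isolated noise pixel with a 3×3 structuring element recovers the block and drops the noise pixel, which is the noise-removal behavior described above.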
Figure 8 shows the results of applying an opening from depth 1 (one erosion followed by one dilation) to depth 5. It can be observed that the result of multiple openings is a negative image with smoothed edges.
Figure 8. The result of applying multiple openings to an image.
TEXTURE
The repetition of a pattern or patterns over a region is called
texture. This pattern may be repeated exactly, or as set or
small variations. Texture has a conflictive random aspect:
the size, shape, color, and orientation of the elements of the
pattern (textons) [9].
The main goal of identifying different textures in machine vision is to replace them with a unique grey level or color. In addition, there is another problem associated with texture: scaling. Equal textures at different scales may look different to an image-processing algorithm. For that reason it is unlikely that a single simple operation will allow the segmentation of textured regions, but some combination of binary operations may produce an acceptable output for a wide range of textures.
The simplest way to perform texture segmentation in grey-level images is to let the grey level associated with each pixel in a textured region be the average (mean) level over some relatively small area. This area is called a window, and its size can vary to capture different scales. The use of windows is very convenient in this case, since texture deals with regions instead of individual pixels [9].
The method can be stated as follows:
1. For each pixel in the image, replace it by the average of the levels seen in a region W×W pixels in size centered at that pixel.
2. Threshold the image into two regions using the new average levels. The exact location of the boundary between regions depends on the threshold method that is applied.
An improved method uses the standard deviation of the grey levels in a small region instead of the mean. The standard deviation carries information about how many pixels in that region belong to textons and how many belong to the background. The precision with which the boundary between regions is known is a function of the window size.
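The windowed mean/standard-deviation method can be sketched as follows. This is a naive illustration: the function names, the edge-replication border handling, and the midpoint threshold are assumptions, since step 2 above leaves the threshold method open.

```python
import numpy as np

def local_stat(img, w, stat="mean"):
    """Replace each pixel by the mean (or std) of a w-by-w window centered on it.

    Borders are handled by edge replication; w is assumed odd.
    """
    r = w // 2
    img = np.asarray(img, dtype=float)
    padded = np.pad(img, r, mode="edge")
    out = np.empty_like(img)
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            win = padded[y:y + w, x:x + w]      # window centered at (y, x)
            out[y, x] = win.std() if stat == "std" else win.mean()
    return out

def segment_texture(img, w=5, stat="std"):
    """Two-region segmentation: threshold the windowed statistic at its midpoint."""
    s = local_stat(img, w, stat)
    t = (s.min() + s.max()) / 2                 # a simple threshold choice
    return (s > t).astype(int)
```

On an image whose left half is flat and whose right half is a fine checkerboard, the mean-based statistic barely separates the regions, while the standard deviation does, which is the motivation for the improved method.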
ASCII (PBM) files for Figure 7 -- the input image, the structuring element, and the eroded result:

P1
10 10
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 1 1 1 1 1 0 0 0 0
0 1 1 1 1 1 0 0 0 0
0 1 1 1 1 1 1 1 0 0
0 1 1 1 1 1 1 1 0 0
0 1 1 1 1 1 1 1 0 0
0 0 0 1 1 1 1 1 0 0
0 0 0 1 1 1 1 1 0 0
0 0 0 0 0 0 0 0 0 0

P1
#origin 1 1
2 3
0 1
1 1
0 1

P1
10 10
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 1 1 1 0 0 0 0 0
0 0 1 1 1 0 0 0 0 0
0 0 1 1 1 1 1 0 0 0
0 0 0 0 1 1 1 0 0 0
0 0 0 0 1 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
A texture is a combination of a large number of textons. Isolating textons and treating them as individual objects is doable. Once this is done, it should be possible to locate the edges that result from the grey-level transition along the boundary of a texton.
By analyzing edge properties such as common directions, the distances over which the edge pixels repeat, or a measure of the local density, it is possible to characterize the texture.
The number of edge pixels in a window can be found after applying an edge detector to that window. The density is then calculated by dividing the number of edge pixels found by the area of the window. From here, useful information can be extracted: for instance, the edge direction, the mean x and y components of the gradient at the edges, and the relative number of pixels whose principal direction is x or y [7].
The combination of edge enhancement and co-occurrence is a smart solution (Dyer 1980; Davis 1981) that can be used in grey-level texture segmentation. The computation of the co-occurrence matrix of an edge-enhanced image gives better results than the traditional method.
Figure 9 illustrates this method applied to Figure 1 b). Notice the difference between the contrast of the co-occurrence matrix, Figure 9 b), and the contrast of the co-occurrence matrix of the edge image, Figure 9 d). In this case the Sobel edge detector was used. Figure 9 e) is the thresholded image. The entire upper region containing the sky is marked as black, except for the region with the sun. The edge-enhancement method works quite well on a real image like this one.
Figure 9. Combination of edge enhancement and co-occurrence. a) Original image. b) Contrast of the co-occurrence matrix, distance 1 horizontal. c) Edge-enhanced image (Sobel). d) Contrast of the co-occurrence matrix of the edge image. e) Thresholded image.
THINNING AND SKELETONIZATION
ALGORITHMS
Skeletonization was introduced to describe the global
properties of objects and to reduce the original image into a
more compact representation. The skeleton expresses the
structural connectivities of the main components of an
object and it has the width of one pixel in the discrete case.
These kinds of techniques have a wide range of applications; for example, skeletonization has been applied successfully to character recognition problems.
A basic method for skeletonization is thinning, an iterative technique that extracts the skeleton of an object. In every iteration, the edge pixels having at least one adjacent background point are deleted, but a pixel can be eroded only if its removal does not affect the topology of the object. The skeleton represents the shape of the object in a relatively small number of pixels [9].
Thinning works for objects consisting of lines (straight or curved). The method does not work for objects having shapes that enclose a large area. Thinning is most of the time an intermediate process that prepares the object for further analysis. The subsequent processes determine the properties of the skeleton. For the same object, a skeleton may work fine in one situation but not in all situations.
The first definition of a skeleton was given by Blum in 1967, who defined the medial axis function (MAF). The medial axis is basically defined as the set of points that are the centers of the largest circles that can be contained inside the shape or object. To represent a shape as a skeleton and still be able to reconstruct the original shape, a radius function can be associated with the skeleton points. The MAF of a shape is the locus of the centers of all maximal discs contained in the shape; a maximal disc is any circle, together with its interior, that is contained in the shape. The MAF is a reversible transform, which means it can be inverted to give back the original image [2], [8].
The MAF in its original implementation needs time and space and is very difficult to implement directly. For that reason the continuous transform is converted to a discrete one.
A good approximation of the MAF on a sampled grid is easily obtained by first computing the distance from each object pixel to the nearest boundary pixel, and then calculating the Laplacian of the distance image. Pixels having large values belong to the medial axis. The way the distance between the object pixels and the boundary is measured influences the final result (the skeleton). Figure 10 illustrates the thinning process using the medial axis transform.
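The distance-map stage of this approximation can be sketched with the classic two-pass sequential 4-distance transform. The function name is an assumption, and the sketch presumes the object does not touch the image border:

```python
import numpy as np

def distance_map_4(obj):
    """4-distance from every object pixel to the nearest background pixel.

    Two raster scans: the forward pass propagates distances from the top
    and left neighbors, the backward pass from the bottom and right.
    `obj` is a binary array with 1 marking object pixels.
    """
    obj = np.asarray(obj, dtype=int)
    h, w = obj.shape
    INF = h + w                              # larger than any 4-distance here
    d = np.where(obj == 1, INF, 0)
    for y in range(h):                       # forward pass
        for x in range(w):
            if d[y, x]:
                up = d[y - 1, x] if y else INF
                left = d[y, x - 1] if x else INF
                d[y, x] = min(d[y, x], up + 1, left + 1)
    for y in range(h - 1, -1, -1):           # backward pass
        for x in range(w - 1, -1, -1):
            if d[y, x]:
                down = d[y + 1, x] if y < h - 1 else INF
                right = d[y, x + 1] if x < w - 1 else INF
                d[y, x] = min(d[y, x], down + 1, right + 1)
    return d
```

Pixels with locally maximal distance values (for example, the center of a square object) are the candidates for the medial axis, as described above.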
Figure 10. Skeletonization process using the medial axis transform. a) Sampled Image. b) Distance Map for 8-distance. c) Skeleton for 8-distance. d) Distance Map for 4-distance. e) Skeleton for 4-distance.
Template-Based Mark-and-Delete Thinning Algorithms are very popular because of their reliability and effectiveness. This type of thinning process uses templates; a match of the template in the image deletes the center pixel. They are iterative algorithms that erode the outer layers of pixels until no more layers can be removed. The Stentiford Thinning Method is an example of this kind of algorithm [9], [14].
It uses a set of four 3×3 templates to scan the image. Figure 11 shows these four templates.
Figure 11. Templates to identify pixels to be eroded in the Stentiford Method. The empty white boxes correspond to places where the color of the pixel does not need to be checked.
The Stentiford Algorithm can be stated as follows [9]:
1. Find a pixel location (i, j) where the pixels in the image match those in template T1. With this template, all pixels along the top of the image are removed, moving from left to right and from top to bottom.
2. If the central pixel is not an endpoint and has connectivity number = 1, then mark this pixel for deletion.
Endpoint pixel: a pixel is considered an endpoint if it is connected to just one other pixel; that is, if a black pixel has only one black neighbor out of the eight possible neighbors.
Connectivity number: a measure of how many objects are connected with a particular pixel:
C_n = \sum_{k \in S} N_k - (N_k \cdot N_{k+1} \cdot N_{k+2})

where N_k is the color value of the eight neighbors of the pixel analyzed: N_0 is the center pixel, N_1 is the color value of the pixel to the right of the central pixel, and the rest are numbered in counterclockwise order around the center (indices wrap, so N_9 = N_1), and

S = \{1, 3, 5, 7\}
Figure 12 illustrates the connectivity number. Figure 12 a) represents connectivity number = 0. b) represents connectivity number = 1; the central pixel could be deleted without affecting the connectivity between left and right. c) represents connectivity number = 2; the deletion of the central pixel might disconnect both sides. d) represents connectivity number = 3, and e) represents connectivity number = 4.
Figure 12. Connectivity numbers.
3. Repeat steps 1 and 2 for all pixel locations matching T1.
4. Repeat steps 1-3 for the rest of the templates: T2, T3, and T4. T2 will match pixels on the left side of the object, moving from bottom to top and from left to right. T3 will select pixels along the bottom of the image and move from right to left and from bottom to top. T4 locates pixels on the right side of the object, moving from top to bottom and right to left.
5. Set to white the pixels marked for deletion.
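The connectivity number used in step 2 can be computed directly from the eight neighbor values. A small sketch (function name assumed), following the summation over S = {1, 3, 5, 7} given above:

```python
def connectivity_number(nbrs):
    """Connectivity number from the eight neighbor values N1..N8.

    `nbrs` lists N1..N8 (0 or 1), with N1 the pixel to the right of the
    center and the rest in counterclockwise order; indices wrap around,
    so N9 is N1 and N10 is N2.
    """
    N = {k + 1: v for k, v in enumerate(nbrs)}   # N[1]..N[8]
    N[9], N[10] = N[1], N[2]                     # wrap-around
    return sum(N[k] - N[k] * N[k + 1] * N[k + 2] for k in (1, 3, 5, 7))
```

An isolated pixel gives 0, a single black neighbor gives 1, and four alternating black neighbors give 4, matching panels a), b), and e) of Figure 12.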
Figure 13 shows some examples of the thinning process
using the Stentiford Algorithm.
Figure 13. Skeletonization Process using Stentiford
Algorithm.
Another type of skeletonization algorithm is the Zhang-Suen Thinning Algorithm [9], [14]. It is a parallel method, meaning that the new value of a pixel depends only on the values from the previous iteration. It is fast and simple to implement. The algorithm consists of two subiterations. In the first one, a pixel I(i, j) is deleted if the following conditions are satisfied:
1. Its connectivity number is one.
2. It has at least two black neighbors and not more than six.
3. At least one of I(i, j+1), I(i-1, j), and I(i, j-1) is white.
4. At least one of I(i-1, j), I(i+1, j), and I(i, j-1) is white.
In the second subiteration the conditions in steps 3 and 4 change:
1. Its connectivity number is one.
2. It has at least two black neighbors and not more than six.
3. At least one of I(i-1, j), I(i, j+1), and I(i+1, j) is white.
4. At least one of I(i, j+1), I(i+1, j), and I(i, j-1) is white.
At the end of each subiteration, the pixels satisfying these conditions are deleted. If at the end of either subiteration there are no pixels to be deleted, the algorithm stops. Figure 14 shows some examples of the thinning process using the Zhang-Suen Algorithm.
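The two subiterations can be sketched as follows. Note that this common formulation tests A(p), the number of white-to-black transitions around the pixel (playing the role of the connectivity number above), and B(p), the black-neighbor count; the function name and the border handling are assumptions:

```python
import numpy as np

def zhang_suen(img):
    """Zhang-Suen thinning of a binary image (1 = black/object pixel)."""
    img = np.asarray(img, dtype=int).copy()

    def neighbours(y, x):
        # P2..P9, clockwise starting from the pixel above the center
        return [img[y-1, x], img[y-1, x+1], img[y, x+1], img[y+1, x+1],
                img[y+1, x], img[y+1, x-1], img[y, x-1], img[y-1, x-1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):                  # the two subiterations
            to_delete = []
            for y in range(1, img.shape[0] - 1):
                for x in range(1, img.shape[1] - 1):
                    if not img[y, x]:
                        continue
                    P = neighbours(y, x)
                    B = sum(P)               # black-neighbor count
                    A = sum(P[i] == 0 and P[(i + 1) % 8] == 1
                            for i in range(8))   # 0 -> 1 transitions
                    if step == 0:
                        cond = P[0]*P[2]*P[4] == 0 and P[2]*P[4]*P[6] == 0
                    else:
                        cond = P[0]*P[2]*P[6] == 0 and P[0]*P[4]*P[6] == 0
                    if A == 1 and 2 <= B <= 6 and cond:
                        to_delete.append((y, x))
            for y, x in to_delete:           # parallel delete after the scan
                img[y, x] = 0
                changed = True
    return img
```

Because deletions are collected during the scan and applied afterwards, each subiteration depends only on the previous iteration's values, which is what makes the method parallel.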
Figure 14. Skeletonization Process using Zhang-Suen
Algorithm.
CONCLUSION
Image processing is the study of representation and
manipulation of pictorial information. Digital image
processing is performed on digital computers that
manipulate images as arrays or matrices of numbers.
The latest advancements in computer technology have opened the use of image processing analysis to fields that, because of their complexity, could not be included in the past. High computational speed, high video resolution, more efficient computer languages to process the data, and more efficient and reliable computer vision algorithms are some of the factors that have allowed fields such as medical diagnosis, industrial quality control, robotic vision, astronomy, and intelligent vehicle/highway systems to join the long list of applications that use computer vision analysis to achieve their goals.
More and more complex techniques have been developed to achieve new goals unthinkable in the past. Machine vision researchers have started using more efficient and faster mathematical approaches to solve more complicated problems. Convolution methods widely used in computer vision can be sped up by using Fourier Transforms and Fast Fourier Transforms.
The idea that an image can be decomposed into a sum of weighted sine and cosine components is very attractive. The functions that produce such decompositions are called basis functions; the Fourier and wavelet transforms are examples. A wavelet, like the Fourier Transform, has a frequency associated with it, but in addition a scale factor has to be considered. Only some basis functions produce a decomposition that has real importance in image processing.
Emulating the human ability to learn and the human brain's capacity to analyze, reason, and discern represents a difficult task for computers. The brain is actually a highly complex, nonlinear, and parallel computer. Its capacity to perform certain tasks such as image processing, pattern recognition, motor control, and perception is several times faster than the fastest available computer.
Studies to develop artificial tools based on mathematical models that simulate the brain's performance started 50 years ago. The Artificial Neural Network (ANN) is an example of this approach. The use of neural nets for symbol and pattern recognition is a good example of the application of this technique to machine vision.
Another advanced computer algorithm applied to computer vision analysis is the Genetic Algorithm. The idea of Genetic Algorithms is to simulate the way nature uses evolution, applying survival of the fittest to the different solutions in the population. The good solutions reproduce to form new and hopefully better solutions, while the bad solutions are removed.
The Genetic Algorithm is an optimization technique, very useful for finding the maximum or minimum of a given function or event. In machine vision, since images have two spatial dimensions, it is to be expected that an optimization technique involving images would take much longer than a simple 1D problem. In this case, techniques such as Genetic Algorithms become useful in speeding up the computational process.
REFERENCES
[1] Bolhouse, V., Fundamentals of Machine Vision,
Robotic Industries Assn, 1997.
[2] Davies, E., R., Machine Vision: Theory,
Algorithms, Practicalities (Signal Processing and
Its Applications Series), Academic Press, 1996.
[3] Freeman, H. Machine Vision. Algorithms,
Architectures, and Systems. Academic Press, Inc.,
1988.
[4] Freeman, H. Machine Vision for Three
Dimensional Scenes, Academic Press, Inc., 1990.
[5] Hussain, Z. Digital Image Processing. Practical
Applications of Parallel Processing Techniques.
Published by: Ellis Horwood Limited, 1991.
[6] Jain, R., Kasturi, R., Schunck, B. G., Machine Vision (McGraw-Hill Series in Computer Science), McGraw-Hill College Div., 1995.
[7] Myler, H. R., Fundamentals of Machine Vision, SPIE Press, 1999.
[8] Parker, J., R., Algorithms for Image Processing
and Computer Vision, Wiley Computer
Publishing, 1997.
[9] Parker, J., R., Practical Computer Vision using
C, Wiley Computer Publishing, 1994.
[10] Ritter, G., X., Wilson, J., N., Handbook of
Computer Vision Algorithms in Image Algebra,
CRC Press, 1996.
[11] Russ, J. C., The Image Processing Handbook.
CRC Press, 1992.
[12] Sanz, J. L., Advances in Machine Vision, Springer-Verlag, 1989.
[13] Shirai, Y., Three-Dimensional Computer Vision, Symbolic Computation, Springer-Verlag, 1987.
[14] Sonka, M., Hlavac, V., Boyle, R., Image Processing, Analysis, and Machine Vision, 2nd Edition, PWS Pub. Co., 1998.
[15] Zuech, N. Applying Machine Vision. John Wiley
& Sons, Inc. , 1988.