MINIMUM CROSS ENTROPY THRESHOLDING

C. H. LI and C. K. LEE
Department of Electronic Engineering, Hong Kong Polytechnic

Pattern Recognition Society. Printed in Great Britain. Pergamon Press Ltd.
(Received 13 February 1992; in revised form 12 August 1992; received for publication 23 September 1992)
Abstract--The threshold selection problem is solved by minimizing the cross entropy between the image and its segmented version. The cross entropy is formulated on a pixel-by-pixel basis between the two images, and a computationally attractive algorithm employing the histogram is developed. Without making a priori assumptions about the population distribution, this method provides an unbiased estimate of a binarized version of the image in an information theoretic sense.
This method actually attempts to bypass the estimation of the mean, variance and standard deviation of the two distributions in the histogram. This algorithm gives a better estimate of the threshold when the distributions are in fact normally distributed, and will not give a threshold when the distribution is a unimodal normal distribution. An improvement to the minimum error method is proposed by Cho et al.,(7) which corrects the biased estimation of variances due to truncation. As the principle in modelling is identical, we shall use reference (5) as the basis for discussion and comparison.

The above two algorithms are examples of thresholding algorithms which utilize the histogram as the only input data for the threshold selection. Another class of thresholding algorithms, also developed solely on the histogram, consists of the application of the maximum entropy principle to the problem of image segmentation. These approaches use the concept of entropy from information theory without explicit reference to the properties of an image as a two-dimensional distribution, and have significant restrictions in practice. The details will be discussed in Section 3.

2. PRINCIPLE OF MAXIMUM ENTROPY

The maximum entropy principle was originally proposed by Jaynes(8) for the inference of unknown probability distributions under constraints. The role of the constraints is to limit the solution set to include only those solutions that are consistent with the data. Since the inference problem is often underdetermined as a result of insufficient data, a number of feasible solutions may exist after applying all the constraints. The maximum entropy principle allows us to select the solution which gives the largest entropy. The original idea is that this gives the most unbiased estimate and allows maximum freedom within the limits of the constraints. Over the years, the maximum entropy principle has undergone wide theoretical debate and has been applied with great success to various areas of science and engineering. With the use of the concentration theorem and the study of multiplicities, it has been shown that distributions of higher entropy have higher multiplicities and are thus more likely to be observed.(9) Axiomatic formulations have also shown that the maximum entropy method is the uniquely correct method for inductive inference when new information is given in the form of expected values.(10) It has been extended to a general inference method that deals with the problem of reconstructing positive and additive distributions, including incoherent image intensity or power spectral density. It is widely applied in various fields and is remarkably successful. For example, in the field of spectral estimation the maximum entropy method provides better resolution than other traditional methods such as the maximum likelihood method.(11)

The cross entropy was proposed by Kullback(12) under the name of directed divergence. The cross entropy measures the information theoretic distance between two distributions P = {p_1, p_2, ..., p_N} and Q = {q_1, q_2, ..., q_N} by

D(Q, P) = \sum_{k=1}^{N} q_k \log_2 \frac{q_k}{p_k}.    (1)

This measure D(Q, P) is also studied by Renyi(13) as the information theoretical distance between the two distributions P and Q. Renyi also points out that the formula can be interpreted as the expectation of the change in information content when Q is used instead of P. The minimum cross entropy method can be seen as an extension of the maximum entropy method obtained by setting equal initial estimates for all p_i when no prior information is available.

3. MINIMUM CROSS ENTROPY SEGMENTATION

For an experiment such as dice throwing, each toss is a separate trial; the outcome of one trial does not affect the others, and entropy maximization leads to independent probabilities for the different trials. However, in time-series analysis or image modelling there is strong correlation in the data. In order to take account of the mutual information content, the modelling has to treat the entire time series or image as a single trial, with the combinatorial argument of the maximum entropy principle applied to the collection of many different realizations of the experiment.

Previous work applying the maximum entropy method to image segmentation does not make the above distinction: the pixel generation process is treated as a set of independent trials.(14-16) These methods use the normalized gray level histogram as the gray level probability distribution, based on random trials of individual pixels having a certain gray level, and measure the entropy of the pixel distributions. The maximum entropy thresholding method proposed by Kapur et al.(16) is the algorithm that is considered superior to the other entropy thresholding algorithms.(17) However, this maximum entropy method is still not well accepted and performs poorly at times, and the authors have produced various extensions of their work. Wong and Sahoo(18) used a combination of two measures, concerning the spatial information in the image and the entropy, as the criterion for selecting the threshold. While retaining the histogram entropy function of the maximum entropy thresholding method, Kapur introduces a set of additional heuristic principles to select the threshold.(17) Thus the formulation of a general histogram thresholding method using the entropy principle without additional criteria has not been successful.
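To make the directed divergence of equation (1) concrete, a minimal numeric sketch follows. This illustration is ours, not the authors'; Python with NumPy, the function name, and the example distributions are all assumptions.

import numpy as np

def cross_entropy(q, p):
    """Directed divergence D(Q, P) = sum_k q_k log2(q_k / p_k) of equation (1)."""
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = q > 0                      # terms with q_k = 0 contribute nothing
    return np.sum(q[mask] * np.log2(q[mask] / p[mask]))

# D(Q, P) vanishes only when the two distributions coincide.
p = np.array([0.25, 0.25, 0.25, 0.25])   # uniform prior estimate
q = np.array([0.40, 0.30, 0.20, 0.10])
print(cross_entropy(q, p))               # positive
print(cross_entropy(p, p))               # 0.0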
[Fig. 1. A sample image f(x, y) shown as a surface with the gray level on the z-axis. Fig. 2. The corresponding segmented image g(x, y).]
In the proposed scheme, the segmentation process is posed as one of reconstruction of the image distribution. Consider the image function f: N × N → G, where N is the set of natural integers and G = {1, ..., L} ⊂ N is the set of gray levels. The segmentation process is the construction of a function g: N × N → S, where S = {μ1, μ2} ∈ ℝ+ × ℝ+ and ℝ+ is the set of positive real numbers. Figures 1 and 2 show a sample f(x, y) and a segmented image g(x, y) in the coordinate space with the gray level displayed on the z-axis. The segmented image g(x, y) is constructed as follows:

g(x, y) = \begin{cases} \mu_1, & f(x, y) < t \\ \mu_2, & f(x, y) \ge t. \end{cases}    (2)

The segmented image g(x, y) is uniquely determined from f(x, y) by the specification of three unknown parameters t, μ1 and μ2. A criterion has to be constructed to enable us to find the optimal g, or equivalently the optimal set of parameters t, μ1 and μ2, such that g resembles f as closely as possible. That is,

\eta(g) = \eta(t, \mu_1, \mu_2).    (3)

The criterion function is generally some sort of distortion measure; for example, the mean square difference of g from f is a general measure that can be used. The minimum error method and Otsu's method both belong to this category. At the end of this section, we will show that Otsu's method minimizes the mean square distance between the image and its segmented version, while the proposed method minimizes the cross entropy. Instead of the mean square difference, the cross entropy is the preferred measure for positive and additive distributions.

The above problem can be considered as a classical maximum entropy inference problem with constraints. Thus a set of values G = {g_1, g_2, ..., g_N}, where N is the number of pixels in the image, has to be inferred from the observed image F = {f_1, f_2, ..., f_N} together with the use of suitable constraints. Here the distributions are obtained by linearizing the two-dimensional distributions in an identical way, so that g_i and f_i come from the same location in the image space, and G contains elements taking only two values, say μ1 and μ2, which are as yet unknown. Next, the intensity conservation constraint is applied. Since we want the reconstructed distribution to follow the data closely, the observed image intensity F constrains the values of μ1 and μ2 such that the total intensity of the reconstructed image is identical to that of the observed image in both categories. The constraints can be summarized as

(i)   g_i \in \{\mu_1, \mu_2\}
(ii)  \sum_{f_i < t} f_i = \sum_{f_i < t} \mu_1
(iii) \sum_{f_i \ge t} f_i = \sum_{f_i \ge t} \mu_2    (4)

which allows the determination of μ1 and μ2 by the following equations:

\mu_1 = \frac{1}{N_1} \sum_{f_i < t} f_i, \qquad \mu_2 = \frac{1}{N_2} \sum_{f_i \ge t} f_i    (5)

where N_1 and N_2 are the numbers of pixels in the two regions. Combining equations (1), (2) and (5), we get

\eta(t) = \sum_{f_i < t} f_i \log \frac{f_i}{\mu_1(t)} + \sum_{f_i \ge t} f_i \log \frac{f_i}{\mu_2(t)}.    (6)

The threshold is then selected by

t_0 = \arg \min_t \eta(t)    (7)

where t_0 is the required threshold. The above summation is carried out over the entire image; however, there are repeated calculations that can be grouped. Thus we arrive at the following formulae:

\mu_1(t) = \frac{\sum_{j=1}^{t-1} j h_j}{\sum_{j=1}^{t-1} h_j}, \qquad \mu_2(t) = \frac{\sum_{j=t}^{L} j h_j}{\sum_{j=t}^{L} h_j}    (8)

\eta(t) = \sum_{j=1}^{t-1} j h_j \log \frac{j}{\mu_1(t)} + \sum_{j=t}^{L} j h_j \log \frac{j}{\mu_2(t)}

where h_j is the histogram count of gray level j.

The form of the cross entropy in our model bears similarity to the image entropy derived by Skilling,(19) in which a set of four axioms has been employed to derive the following entropic function:

s(f, m) = \int \mathrm{d}x \left[ f(x) - m(x) - f(x) \log \frac{f(x)}{m(x)} \right]    (9)

where f(x) is the image intensity distribution and m(x) is the model for the image. In fact, if the total intensity conservation constraints are included, the two equations are identical up to a reversed sign, since the first two terms of the integral in equation (9) cancel out when integrated over all categories.

While the introduced method minimizes the cross entropy between the image and its segmented version, Otsu's method of minimizing the between-class variance can also be derived by the above approach, using the mean square distance as the metric between the two images and the same set of constraints as in equation (4). The criterion function in this case becomes

O(t) = \sum_{f_i < t} (f_i - \mu_1(t))^2 + \sum_{f_i \ge t} (f_i - \mu_2(t))^2.    (10)

Grouping the summation using the histogram, the criterion function becomes

O(t) = \sum_{j < t} h_j (j - \mu_1(t))^2 + \sum_{j \ge t} h_j (j - \mu_2(t))^2    (11)

which is the within-class variance as defined in reference (6). Minimizing the function defined in equation (11) is equivalent to maximizing Otsu's criterion, since the sum of the within-class variance and the between-class variance is a constant independent of the selected threshold.

The use of the cross entropy function is not limited to the thresholding application. It can be extended to cover other areas of image segmentation when combined with other constraints. For example, a region-based segmentation method, such as the split and merge method using the cross entropy as a criterion for merging regions, can be formulated with constraints on the labelling of the spatial coordinates.
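The grouped formulae in equation (8) translate directly into a scan over the candidate thresholds of the histogram. The following is a sketch of that scan; it is our illustration rather than the authors' code, and the function name, the NumPy dependency, and the convention that index 0 of the histogram array is unused are all assumptions.

import numpy as np

def mce_threshold(h):
    """Minimum cross entropy threshold, equations (7) and (8).

    h[j] is the count of gray level j; h[0] is unused so that the
    gray levels run over 1..L as in the paper.
    """
    h = np.asarray(h, dtype=float)
    L = len(h) - 1
    j = np.arange(len(h), dtype=float)
    best_t, best_eta = None, np.inf
    for t in range(2, L + 1):                # candidate thresholds
        lo_h, lo_j = h[1:t], j[1:t]          # gray levels 1..t-1
        hi_h, hi_j = h[t:], j[t:]            # gray levels t..L
        if lo_h.sum() == 0 or hi_h.sum() == 0:
            continue                         # one class is empty
        mu1 = (lo_j * lo_h).sum() / lo_h.sum()   # equation (8)
        mu2 = (hi_j * hi_h).sum() / hi_h.sum()
        eta = (lo_j * lo_h * np.log(lo_j / mu1)).sum() \
            + (hi_j * hi_h * np.log(hi_j / mu2)).sum()
        if eta < best_eta:
            best_t, best_eta = t, eta
    return best_t

For an 8-bit image one might build h as np.bincount(image.ravel().astype(int) + 1, minlength=257), shifting the gray levels to 1..256 so that level 0 does not fall outside G = {1, ..., L}; this shift is our workaround, not part of the paper.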
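For comparison, the within-class variance of equation (11), whose minimization is equivalent to Otsu's criterion as noted above, can be scanned in the same way. Again this is our sketch, continuing the import and histogram conventions of the previous block.

def otsu_threshold(h):
    """Threshold minimizing the within-class variance O(t) of equation (11)."""
    h = np.asarray(h, dtype=float)
    L = len(h) - 1
    j = np.arange(len(h), dtype=float)
    best_t, best_o = None, np.inf
    for t in range(2, L + 1):
        lo_h, lo_j = h[1:t], j[1:t]
        hi_h, hi_j = h[t:], j[t:]
        if lo_h.sum() == 0 or hi_h.sum() == 0:
            continue
        mu1 = (lo_j * lo_h).sum() / lo_h.sum()
        mu2 = (hi_j * hi_h).sum() / hi_h.sum()
        o = (lo_h * (lo_j - mu1) ** 2).sum() + (hi_h * (hi_j - mu2) ** 2).sum()
        if o < best_o:
            best_t, best_o = t, o
    return best_t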
4. IMPLEMENTATIONS AND DISCUSSIONS

For the purpose of comparison, we adopt the same approach as that taken by Kittler and Illingworth. Otsu's method, the minimum error method, and the minimum cross entropy method are implemented on the normal distributions discussed in reference (5). These histograms are shown in Figs 3-5. We also use the histogram of an actual image (Fig. 6) taken from an industrial setting. The thresholds selected by the four algorithms are summarized in Table 1.
[Fig. 3. Histogram (a). Fig. 4. Histogram (b). Fig. 5. Histogram (c). Fig. 6. Histogram (d), extracted from the strain measurement image.]
Table 1. Thresholds selected for the four trial histograms using Otsu's method, the minimum error method, the maximum entropy method, and the minimum cross entropy (MCE) method

             Otsu's    Minimum error    Maximum entropy    MCE
Histogram    method    method           method             method
(a)          97        59               130                83
(b)          96        82               118                88
(c)          101       64               165                93
(d)          41        None             58                 40
The results for the first three histograms, which are drawn from Gaussian distributions, are consistent with the fact that the minimum error method takes the normal distribution as its model and consequently outperforms both the minimum cross entropy method and Otsu's method. However, the minimum cross entropy method gives a threshold which is closer to the optimal threshold than both Otsu's method and the maximum entropy method.

For the strain measurement image in Fig. 7, the goal is to measure the change in the major and minor axes of the ellipse, which gives information about the strain of that part of a piece of metal. The first step in this process is the segmentation of the parallel lines and the ellipse from the background. The extracted histogram is shown in Fig. 6. The minimum error method does not give any threshold, as there is no internal minimum of its criterion function; the histogram is thus wrongly inferred to be unimodal, while the actual situation is that the two distributions overlap heavily with a significant amount of embedded noise. This shows that wrong assumptions may be disastrous. The minimum cross entropy method and Otsu's method successfully select a threshold which results in a segmented image with significantly less noise than thresholds selected otherwise.
To illustrate this, compare Fig. 8, which shows the segmented image with threshold value 40 as selected by our method, with Figs 9 and 10, which show the segmented images with thresholds slightly below and slightly above our estimated value. In Fig. 9, where a smaller threshold is used, there are more holes and openings in the lines and the ellipse. In Fig. 10, where a larger threshold is used, too much of the background is included. This illustrates that the selection of the correct threshold is critical in this application, as is generally true with highly overlapping distributions.

The thresholds selected by the maximum entropy method are very far from those selected by the other three algorithms. This could be explained by the existence of a bias in the algorithm toward splitting the histogram into equal halves.(17)

5. CONCLUSIONS

The application of the minimum cross entropy method to the segmentation problem is developed and applied with success to both asymmetric normal distributions and a real image. Our algorithm selects the threshold which minimizes the cross entropy between the thresholded image and the original image. Compared to Otsu's method, the use of the cross entropy as the distortion measure instead of the variance permits the selection of thresholds without a tendency toward bias when the two distributions have highly unequal populations and highly unequal variances. The minimum error method and its improved version provide a computationally efficient means of computing the threshold, provided the population is normally distributed. However, in situations where we have no knowledge of the population model, such assumptions may not be correct. Without making a priori assumptions about the inherent distributions, the minimum cross entropy algorithm gives the most unbiased estimate of the binarized version of the image.

REFERENCES

1. P. K. Sahoo, S. Soltani and A. K. C. Wong, A survey of thresholding techniques, Comput. Vision Graphics Image Process. 41, 233-260 (1988).
2. A. K. C. Wong and P. K. Sahoo, A gray-level threshold selection method based on maximum entropy principle, IEEE Trans. Syst. Man Cybern. SMC-19(4), 866-871 (July/August 1989).
3. J. Kittler and J. Illingworth, Threshold selection based on a simple image statistic, Comput. Vision Graphics Image Process. 30, 125-147 (1985).
4. R. Wilson and M. Spann, A new approach to clustering, Pattern Recognition 23, 1413-1425 (1990).
5. J. Kittler and J. Illingworth, Minimum error thresholding, Pattern Recognition 19, 41-47 (1986).
6. N. Otsu, A threshold selection method from gray-level histograms, IEEE Trans. Syst. Man Cybern. SMC-9(1), 62-66 (1979).
7. S. Cho, R. Haralick and S. Yi, Improvement of Kittler and Illingworth's minimum error thresholding, Pattern Recognition 22, 609-617 (1989).
8. E. T. Jaynes, Information theory and statistical mechanics, Phys. Rev. 106, 620-630 (1957).
9. E. T. Jaynes, On the rationale of maximum-entropy methods, Proc. IEEE 70(9) (September 1982).
10. J. E. Shore and R. W. Johnson, Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy, IEEE Trans. Inf. Theory IT-26, 26-37 (January 1980).
11. N. A. Malik and J. S. Lim, Properties of two-dimensional maximum entropy power spectral estimation, IEEE Trans. Acoust. Speech Signal Process. ASSP-30, 788-798 (October 1982).
12. S. Kullback, Information Theory and Statistics. Wiley, New York (1959).
13. A. Renyi, A Diary on Information Theory. Akademiai Kiado, Budapest (1984).
14. T. Pun, A new method for grey-level picture thresholding using the entropy of the histogram, Signal Process. 2, 223-237 (1980).
15. T. Pun, Entropic thresholding, a new approach, Comput. Graphics Image Process. 16, 210-239 (1981).
16. J. N. Kapur, P. K. Sahoo and A. K. C. Wong, A new method for grey-level picture thresholding using the entropy of the histogram, Comput. Vision Graphics Image Process. 29, 273-285 (1985).
17. J. N. Kapur, Maximum Entropy Models in Science and Engineering. Wiley Eastern, New Delhi (1989).
18. A. K. C. Wong and P. K. Sahoo, A gray-level threshold selection method based on maximum entropy principle, IEEE Trans. Syst. Man Cybern. SMC-19(4), 866-871 (July/August 1989).
19. J. Skilling, Classic maximum entropy, Maximum Entropy and Bayesian Methods, J. Skilling, ed., pp. 45-52. Kluwer, Dordrecht (1988).

APPENDIX

For the trial histograms (a), (b) and (c), the following normal mixture model is used:

h(g) = \frac{P_1}{\sqrt{2\pi}\,\sigma_1} \exp\left( -\frac{(g - \mu_1)^2}{2\sigma_1^2} \right) + \frac{P_2}{\sqrt{2\pi}\,\sigma_2} \exp\left( -\frac{(g - \mu_2)^2}{2\sigma_2^2} \right)

with parameters taken from reference (5):

(a) P_1 = 0.25, μ_1 = 38, σ_1 = 9; P_2 = 0.75, μ_2 = 121, σ_2 = 44;
(b) P_1 = 0.45, μ_1 = 47, σ_1 = 13; P_2 = 0.55, μ_2 = 144, σ_2 = 25;
(c) P_1 = 0.5, μ_1 = 50, σ_1 = 4; P_2 = 0.5, μ_2 = 150, σ_2 = 30.

Histogram (d) is obtained from the strain measurement image taken from an industrial setting; the histogram is shown in Fig. 6.
About the Author--CHUN-HUNG LI was born in Hong Kong in 1966. He received the B.Sc. degree in physics from the State University of New York at Stony Brook. He is currently an M.Phil. student at the Department of Electronic Engineering, Hong Kong Polytechnic. His research interests include computer vision and image processing.
About the Author--C. K. LEE received the B.Sc. and Ph.D. degrees in electronic engineering from the
University of London and UCNW, University of Wales, in 1977 and 1984, respectively. He worked in
the Hirst Research Center, GEC, in 1977. From 1981 to 1986, he was employed as Supervisory Engineer
in ASM Assembly Automation, Hong Kong, to develop pattern recognition systems for wire-bonder
machines. In 1986, he joined the Hong Kong Polytechnic. He is now a senior lecturer in the Department
of Electronic Engineering. His research interests include digital signal processing, computer vision systems,
and pattern recognition systems.