MINIMUM CROSS ENTROPY THRESHOLDING

C. H. LI and C. K. LEE
Department of Electronic Engineering, Hong Kong Polytechnic

Pattern Recognition Society. Printed in Great Britain. Pergamon Press Ltd.
(Received 13 February 1992; in revised form 12 August 1992; received for publication 23 September 1992)
Abstract--The threshold selection problem is solved by minimizing the cross entropy between the image and its segmented version. The cross entropy is formulated on a pixel-by-pixel basis between the two images, and a computationally attractive algorithm employing the histogram is developed. Without making a priori assumptions about the population distribution, this method provides an unbiased estimate of a binarized version of the image in an information theoretic sense.
This method actually attempts to bypass the estimation of the mean, variance and standard deviation of the two distributions in the histogram. This algorithm gives a better estimate of the threshold when the distributions are in fact normally distributed, and will not give a threshold when the distribution is a unimodal normal distribution. An improvement to the minimum error method is proposed by Cho et al.,(7) which corrects the biased estimation of variances due to truncation. As the principle in modelling is identical, we shall use reference (5) as the basis for discussion and comparison.

The above two algorithms are examples of thresholding algorithms which utilize the histogram as the only input data for the threshold selection. Another class of thresholding algorithms, also developed solely on the histogram, consists of the application of the maximum entropy principle to the problem of image segmentation. These approaches use the concept of entropy from information theory without explicit reference to the properties of an image as a two-dimensional distribution, and have significant restrictions in practice. The details will be discussed in Section 3.

2. PRINCIPLE OF MAXIMUM ENTROPY

The maximum entropy principle was originally proposed by Jaynes(8) for the inference of unknown probability distributions under constraints. The role of the constraints is to limit the solution set to include only those solutions that are consistent with the data. Since the inference problem is often underdetermined as a result of insufficient data, a number of feasible solutions may exist after applying all the constraints. The maximum entropy principle allows us to select the solution which gives the largest entropy. The original idea is that this gives the most unbiased estimate and allows maximum freedom within the limits of the constraints. Over the years, the maximum entropy principle has undergone wide theoretical debate and has been applied with great success to various areas of science and engineering. With the use of the concentration theorem and the study of multiplicities, it has been shown that distributions of higher entropy have higher multiplicities and are thus more likely to be observed.(9) Axiomatic formulations have also shown that the maximum entropy method is the uniquely correct method for inductive inference when new information is given in the form of expected values.(10) It has been extended to a general inference method that deals with the problem of reconstructing positive and additive distributions, including incoherent image intensity or power spectral density. It is widely applied in various fields and is remarkably successful. For example, in the field of spectral estimation the maximum entropy method provides better resolution than other traditional methods such as the maximum likelihood method.(11)

The cross entropy was proposed by Kullback(12) under the name of directed divergence. The cross entropy measures the information theoretic distance between two distributions P = {p_1, p_2, ..., p_N} and Q = {q_1, q_2, ..., q_N} by

D(Q, P) = \sum_{k=1}^{N} q_k \log_2 \frac{q_k}{p_k}.    (1)

This measure D(Q, P) is also studied by Renyi(13) as the information theoretical distance between the two distributions P and Q. Renyi also points out that the formula can be interpreted as the expectation of the change in information content when Q is used instead of P. The minimum cross entropy method can be seen as an extension of the maximum entropy method obtained by setting equal initial estimates for all p_i when no prior information is available.

3. MINIMUM CROSS ENTROPY SEGMENTATION

For an experiment such as dice throwing, each toss is a separate trial; the outcome of one trial does not affect the others, and entropy maximization leads to independent probabilities for the different trials. However, in time-series analysis or image modelling there is strong correlation in the data. In order to take account of the mutual information content, the modelling has to treat the entire time series or image as a single trial, with the combinatorial argument of the maximum entropy principle applied to the collection of many different realizations of the experiment.

Previous work applying the maximum entropy method to image segmentation does not make the above distinction: the pixel generation process is treated as a set of independent trials.(14-16) These methods use the normalized gray level histogram as the gray level probability distribution, based on random trials of individual pixels having a certain gray level, and measure the entropy of the pixel distributions. The maximum entropy thresholding method proposed by Kapur et al.(16) is the algorithm that is considered superior to the other entropy thresholding algorithms.(17) However, this maximum entropy method is still not well accepted and performs poorly at times, and the authors have produced various extensions of their work. Wong and Sahoo(18) used a combination of two measures, concerning the spatial information in the image and the entropy, as the criterion for selecting the threshold. While retaining the histogram entropy function of the maximum entropy thresholding method, Kapur introduces a set of additional heuristic principles to select the threshold.(17) Thus the formulation of a general histogram thresholding method using the entropy principle without additional criteria has not been successful.
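To make the directed divergence of equation (1) concrete, a minimal numeric sketch follows. This illustration is ours, not the authors'; Python with NumPy, the function name, and the example distributions are all assumptions.

import numpy as np

def cross_entropy(q, p):
    """Directed divergence D(Q, P) = sum_k q_k log2(q_k / p_k) of equation (1)."""
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = q > 0                      # terms with q_k = 0 contribute nothing
    return np.sum(q[mask] * np.log2(q[mask] / p[mask]))

# D(Q, P) vanishes only when the two distributions coincide.
p = np.array([0.25, 0.25, 0.25, 0.25])   # uniform prior estimate
q = np.array([0.40, 0.30, 0.20, 0.10])
print(cross_entropy(q, p))               # positive
print(cross_entropy(p, p))               # 0.0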
[Fig. 1. A sample image f(x, y) shown as a surface with the gray level on the z-axis. Fig. 2. The corresponding segmented image g(x, y).]
In the proposed scheme, the segmentation process is posed as one of reconstruction of the image distribution. Consider the image function f: N × N → G, where N is the set of natural integers and G = {1, ..., L} ⊂ N is the set of gray levels. The segmentation process is the construction of a function g: N × N → S, where S = {μ1, μ2} ∈ ℝ+ × ℝ+ and ℝ+ is the set of positive real numbers. Figures 1 and 2 show a sample f(x, y) and a segmented image g(x, y) in the coordinate space with the gray level displayed on the z-axis. The segmented image g(x, y) is constructed as follows:

g(x, y) = \begin{cases} \mu_1, & f(x, y) < t \\ \mu_2, & f(x, y) \ge t. \end{cases}    (2)

The segmented image g(x, y) is uniquely determined from f(x, y) by the specification of three unknown parameters t, μ1 and μ2. A criterion has to be constructed to enable us to find the optimal g, or equivalently the optimal set of parameters t, μ1 and μ2, such that g resembles f as closely as possible. That is,

\eta(g) = \eta(t, \mu_1, \mu_2).    (3)

The criterion function is generally some sort of distortion measure; for example, the mean square difference of g from f is a general measure that can be used. The minimum error method and Otsu's method both belong to this category. At the end of this section, we will show that Otsu's method minimizes the mean square distance between the image and its segmented version, while the proposed method minimizes the cross entropy. Instead of the mean square difference, the cross entropy is the preferred measure for positive and additive distributions.

The above problem can be considered as a classical maximum entropy inference problem with constraints. Thus a set of values G = {g_1, g_2, ..., g_N}, where N is the number of pixels in the image, has to be inferred from the observed image F = {f_1, f_2, ..., f_N} together with the use of suitable constraints. Here the distributions are obtained by linearizing the two-dimensional distributions in an identical way, so that g_i and f_i come from the same location in the image space, and G contains elements taking only two values, say μ1 and μ2, which are as yet unknown. Next, the intensity conservation constraint is applied. Since we want the reconstructed distribution to follow the data closely, the observed image intensity F constrains the values of μ1 and μ2 such that the total intensity of the reconstructed image is identical to that of the observed image in both categories. The constraints can be summarized as

(i)   g_i \in \{\mu_1, \mu_2\}
(ii)  \sum_{f_i < t} f_i = \sum_{f_i < t} \mu_1
(iii) \sum_{f_i \ge t} f_i = \sum_{f_i \ge t} \mu_2    (4)

which allows the determination of μ1 and μ2 by the following equations:

\mu_1 = \frac{1}{N_1} \sum_{f_i < t} f_i, \qquad \mu_2 = \frac{1}{N_2} \sum_{f_i \ge t} f_i    (5)

where N_1 and N_2 are the numbers of pixels in the two regions. Combining equations (1), (2) and (5), we get

\eta(t) = \sum_{f_i < t} f_i \log \frac{f_i}{\mu_1(t)} + \sum_{f_i \ge t} f_i \log \frac{f_i}{\mu_2(t)}.    (6)

The threshold is then selected by

t_0 = \arg \min_t \eta(t)    (7)

where t_0 is the required threshold. The above summation is carried out over the entire image; however, there are repeated calculations that can be grouped. Thus we arrive at the following formulae:

\mu_1(t) = \frac{\sum_{j=1}^{t-1} j h_j}{\sum_{j=1}^{t-1} h_j}, \qquad \mu_2(t) = \frac{\sum_{j=t}^{L} j h_j}{\sum_{j=t}^{L} h_j}    (8)

\eta(t) = \sum_{j=1}^{t-1} j h_j \log \frac{j}{\mu_1(t)} + \sum_{j=t}^{L} j h_j \log \frac{j}{\mu_2(t)}

where h_j is the histogram count of gray level j.

The form of the cross entropy in our model bears similarity to the image entropy derived by Skilling,(19) in which a set of four axioms has been employed to derive the following entropic function:

s(f, m) = \int \mathrm{d}x \left[ f(x) - m(x) - f(x) \log \frac{f(x)}{m(x)} \right]    (9)

where f(x) is the image intensity distribution and m(x) is the model for the image. In fact, if the total intensity conservation constraints are included, the two equations are identical up to a reversed sign, since the first two terms of the integral in equation (9) cancel out when integrated over all categories.

While the introduced method minimizes the cross entropy between the image and its segmented version, Otsu's method of minimizing the between-class variance can also be derived by the above approach, using the mean square distance as the metric between the two images and the same set of constraints as in equation (4). The criterion function in this case becomes

O(t) = \sum_{f_i < t} (f_i - \mu_1(t))^2 + \sum_{f_i \ge t} (f_i - \mu_2(t))^2.    (10)

Grouping the summation using the histogram, the criterion function becomes

O(t) = \sum_{j < t} h_j (j - \mu_1(t))^2 + \sum_{j \ge t} h_j (j - \mu_2(t))^2    (11)

which is the within-class variance as defined in reference (6). Minimizing the function defined in equation (11) is equivalent to maximizing Otsu's criterion, since the sum of the within-class variance and the between-class variance is a constant independent of the selected threshold.

The use of the cross entropy function is not limited to the thresholding application. It can be extended to cover other areas of image segmentation when combined with other constraints. For example, a region-based segmentation method, such as the split and merge method using the cross entropy as a criterion for merging regions, can be formulated with constraints on the labelling of the spatial coordinates.
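The grouped formulae in equation (8) translate directly into a scan over the candidate thresholds of the histogram. The following is a sketch of that scan; it is our illustration rather than the authors' code, and the function name, the NumPy dependency, and the convention that index 0 of the histogram array is unused are all assumptions.

import numpy as np

def mce_threshold(h):
    """Minimum cross entropy threshold, equations (7) and (8).

    h[j] is the count of gray level j; h[0] is unused so that the
    gray levels run over 1..L as in the paper.
    """
    h = np.asarray(h, dtype=float)
    L = len(h) - 1
    j = np.arange(len(h), dtype=float)
    best_t, best_eta = None, np.inf
    for t in range(2, L + 1):                # candidate thresholds
        lo_h, lo_j = h[1:t], j[1:t]          # gray levels 1..t-1
        hi_h, hi_j = h[t:], j[t:]            # gray levels t..L
        if lo_h.sum() == 0 or hi_h.sum() == 0:
            continue                         # one class is empty
        mu1 = (lo_j * lo_h).sum() / lo_h.sum()   # equation (8)
        mu2 = (hi_j * hi_h).sum() / hi_h.sum()
        eta = (lo_j * lo_h * np.log(lo_j / mu1)).sum() \
            + (hi_j * hi_h * np.log(hi_j / mu2)).sum()
        if eta < best_eta:
            best_t, best_eta = t, eta
    return best_t

For an 8-bit image one might build h as np.bincount(image.ravel().astype(int) + 1, minlength=257), shifting the gray levels to 1..256 so that level 0 does not fall outside G = {1, ..., L}; this shift is our workaround, not part of the paper.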
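For comparison, the within-class variance of equation (11), whose minimization is equivalent to Otsu's criterion as noted above, can be scanned in the same way. Again this is our sketch, continuing the import and histogram conventions of the previous block.

def otsu_threshold(h):
    """Threshold minimizing the within-class variance O(t) of equation (11)."""
    h = np.asarray(h, dtype=float)
    L = len(h) - 1
    j = np.arange(len(h), dtype=float)
    best_t, best_o = None, np.inf
    for t in range(2, L + 1):
        lo_h, lo_j = h[1:t], j[1:t]
        hi_h, hi_j = h[t:], j[t:]
        if lo_h.sum() == 0 or hi_h.sum() == 0:
            continue
        mu1 = (lo_j * lo_h).sum() / lo_h.sum()
        mu2 = (hi_j * hi_h).sum() / hi_h.sum()
        o = (lo_h * (lo_j - mu1) ** 2).sum() + (hi_h * (hi_j - mu2) ** 2).sum()
        if o < best_o:
            best_t, best_o = t, o
    return best_t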
4. IMPLEMENTATIONS AND DISCUSSIONS

For the purpose of comparison, we adopt the same approach as that taken by Kittler and Illingworth. Otsu's method, the minimum error method, and the minimum cross entropy method are implemented on the normal distributions discussed in reference (5). These histograms are shown in Figs 3-5. We also use the histogram of an actual image (Fig. 6) taken from an industrial setting. The thresholds selected by the four algorithms are summarized in Table 1.
[Fig. 3. Histogram (a). Fig. 4. Histogram (b). Fig. 5. Histogram (c). Fig. 6. Histogram (d), extracted from the strain measurement image.]
Table 1. Thresholds selected for the four trial histograms using Otsu's method, the minimum error method, the maximum entropy method, and the minimum cross entropy (MCE) method

             Otsu's    Minimum error    Maximum entropy    MCE
Histogram    method    method           method             method
(a)          97        59               130                83
(b)          96        82               118                88
(c)          101       64               165                93
(d)          41        None             58                 40
The results for the first three histograms, which are drawn from Gaussian distributions, are consistent with the fact that the minimum error method takes the normal distribution as its model and consequently outperforms both the minimum cross entropy method and Otsu's method. However, the minimum cross entropy method gives a threshold which is closer to the optimal threshold than both Otsu's method and the maximum entropy method.

For the strain measurement image in Fig. 7, the goal is to measure the change in the major and minor axes of the ellipse, which gives information about the strain of that part of a piece of metal. The first step in this process is the segmentation of the parallel lines and the ellipse from the background. The extracted histogram is shown in Fig. 6. The minimum error method does not give any threshold, as there is no internal minimum of its criterion function; the histogram is thus wrongly inferred to be unimodal, while the actual situation is that the two distributions overlap heavily with a significant amount of embedded noise. This shows that wrong assumptions may be disastrous. The minimum cross entropy method and Otsu's method successfully select a threshold which results in a segmented image with significantly less noise than thresholds selected otherwise.
To illustrate this, compare Fig. 8, which shows the segmented image with threshold value 40 as selected by our method, with Figs 9 and 10, which show the segmented images with thresholds slightly below and slightly above our estimated value. In Fig. 9, where a smaller threshold is used, there are more holes and openings in the lines and the ellipse. In Fig. 10, where a larger threshold is used, too much of the background is included. This illustrates that the selection of the correct threshold is critical in this application, as is generally true with highly overlapping distributions.

The thresholds selected by the maximum entropy method are very far from those selected by the other three algorithms. This could be explained by the existence of a bias in the algorithm toward splitting the histogram into equal halves.(17)

5. CONCLUSIONS

The application of the minimum cross entropy method to the segmentation problem is developed and applied with success to both asymmetric normal distributions and a real image. Our algorithm selects the threshold which minimizes the cross entropy between the thresholded image and the original image. Compared to Otsu's method, the use of the cross entropy as the distortion measure instead of the variance permits the selection of thresholds without a tendency toward bias when the two distributions have highly unequal populations and highly unequal variances. The minimum error method and its improved version provide a computationally efficient means of computing the threshold, provided the population is normally distributed. However, in situations where we have no knowledge of the population model, such assumptions may not be correct. Without making a priori assumptions about the inherent distributions, the minimum cross entropy algorithm gives the most unbiased estimate of the binarized version of the image.

REFERENCES

1. P. K. Sahoo, S. Soltani and A. K. C. Wong, A survey of thresholding techniques, Comput. Vision Graphics Image Process. 41, 233-260 (1988).
2. A. K. C. Wong and P. K. Sahoo, A gray-level threshold selection method based on maximum entropy principle, IEEE Trans. Syst. Man Cybern. SMC-19(4), 866-871 (July/August 1989).
3. J. Kittler and J. Illingworth, Threshold selection based on a simple image statistic, Comput. Vision Graphics Image Process. 30, 125-147 (1985).
4. R. Wilson and M. Spann, A new approach to clustering, Pattern Recognition 23, 1413-1425 (1990).
5. J. Kittler and J. Illingworth, Minimum error thresholding, Pattern Recognition 19, 41-47 (1986).
6. N. Otsu, A threshold selection method from gray-level histograms, IEEE Trans. Syst. Man Cybern. SMC-9(1), 62-66 (1979).
7. S. Cho, R. Haralick and S. Yi, Improvement of Kittler and Illingworth's minimum error thresholding, Pattern Recognition 22, 609-617 (1989).
8. E. T. Jaynes, Information theory and statistical mechanics, Phys. Rev. 106, 620-630 (1957).
9. E. T. Jaynes, On the rationale of maximum-entropy methods, Proc. IEEE 70(9) (September 1982).
10. J. E. Shore and R. W. Johnson, Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy, IEEE Trans. Inf. Theory IT-26, 26-37 (January 1980).
11. N. A. Malik and J. S. Lim, Properties of two-dimensional maximum entropy power spectral estimation, IEEE Trans. Acoust. Speech Signal Process. ASSP-30, 788-798 (October 1982).
12. S. Kullback, Information Theory and Statistics. Wiley, New York (1959).
13. A. Renyi, A Diary on Information Theory. Akademiai Kiado, Budapest (1984).
14. T. Pun, A new method for grey-level picture thresholding using the entropy of the histogram, Signal Process. 2, 223-237 (1980).
15. T. Pun, Entropic thresholding, a new approach, Comput. Graphics Image Process. 16, 210-239 (1981).
16. J. N. Kapur, P. K. Sahoo and A. K. C. Wong, A new method for grey-level picture thresholding using the entropy of the histogram, Comput. Vision Graphics Image Process. 29, 273-285 (1985).
17. J. N. Kapur, Maximum Entropy Models in Science and Engineering. Wiley Eastern, New Delhi (1989).
18. A. K. C. Wong and P. K. Sahoo, A gray-level threshold selection method based on maximum entropy principle, IEEE Trans. Syst. Man Cybern. SMC-19(4), 866-871 (July/August 1989).
19. J. Skilling, Classic maximum entropy, Maximum Entropy and Bayesian Methods, J. Skilling, ed., pp. 45-52. Kluwer, Dordrecht (1988).

APPENDIX

For the trial histograms (a), (b) and (c), the following normal mixture model is used:

h(g) = \frac{P_1}{\sqrt{2\pi}\,\sigma_1} \exp\left( -\frac{(g - \mu_1)^2}{2\sigma_1^2} \right) + \frac{P_2}{\sqrt{2\pi}\,\sigma_2} \exp\left( -\frac{(g - \mu_2)^2}{2\sigma_2^2} \right)

with parameters taken from reference (5):

(a) P_1 = 0.25, μ_1 = 38, σ_1 = 9; P_2 = 0.75, μ_2 = 121, σ_2 = 44;
(b) P_1 = 0.45, μ_1 = 47, σ_1 = 13; P_2 = 0.55, μ_2 = 144, σ_2 = 25;
(c) P_1 = 0.5, μ_1 = 50, σ_1 = 4; P_2 = 0.5, μ_2 = 150, σ_2 = 30.

Histogram (d) is obtained from the strain measurement image taken from an industrial setting; the histogram is shown in Fig. 6.
About the Author--CHUN-HUNG LI was born in Hong Kong in 1966. He received the B.Sc. degree in physics from the State University of New York at Stony Brook. He is currently an M.Phil. student at the Department of Electronic Engineering, Hong Kong Polytechnic. His research interests include computer vision and image processing.
About the Author--C. K. LEE received the B.Sc. and Ph.D. degrees in electronic engineering from the
University of London and UCNW, University of Wales, in 1977 and 1984, respectively. He worked in
the Hirst Research Center, GEC, in 1977. From 1981 to 1986, he was employed as Supervisory Engineer
in ASM Assembly Automation, Hong Kong, to develop pattern recognition systems for wire-bonder
machines. In 1986, he joined the Hong Kong Polytechnic. He is now a senior lecturer in the Department
of Electronic Engineering. His research interests include digital signal processing, computer vision systems,
and pattern recognition systems.