Content Based Image Retrieval
Abstract
The importance of effective techniques for searching and retrieving images from huge collections cannot be overemphasized. One approach to indexing and retrieving image data is to use manual text annotations, which can then be used to search images indirectly. However, there are several problems with this approach. First, it is very difficult to describe the contents of an image using only a few keywords. Second, the manual annotation process is subjective, ambiguous, and incomplete. These problems have created great demand for automatic and effective content-based image retrieval (CBIR) systems. Most CBIR systems use low-level image features such as color, texture, shape, and edges for image indexing and retrieval, because low-level features can be computed automatically. CBIR has emerged during the last several years as a powerful tool for efficiently retrieving images visually similar to a query image. The main idea is to represent each image as a feature vector and to measure the similarity between images by the distance between their corresponding feature vectors according to some metric. Finding the right features to represent images, as well as a similarity metric that groups visually similar images together, are important steps in the construction of any CBIR system.
1 INTRODUCTION
The field of image retrieval has been an active research area for several decades and has received more and more attention in recent years as a result of the dramatic and rapid increase in the volume of digital images. The development of the Internet has not only caused an explosive growth in the volume of digital images, but has also given people more ways to obtain those images.
Initially, there were two approaches to content-based image retrieval. The first, proposed by database researchers, is based on attribute representation, where image contents are defined as a set of attributes that are extracted manually and maintained within the framework of conventional database management systems. Queries are specified using these attributes. This obviously involves a high level of image abstraction. The second approach, presented by image-interpretation researchers, depends on an integrated feature-extraction / object-recognition subsystem to overcome the limitations of attribute-based retrieval. This subsystem automates the feature-extraction and object-recognition tasks that occur when an image is inserted into the database. Such automated approaches to object recognition are computationally expensive, difficult, and tend to be domain specific. There are two major categories of features. One is basic, concerned with extracting boundaries of the image; the other is logical, defining the image at various levels of detail. Regardless of which approach is used, retrieval in content-based image retrieval is done by color, texture, sketch, shape, volume, spatial constraints, browsing, objective attributes, subjective attributes, motion, text, and domain concepts.
1.1 The growth of digital imaging: The use of images in human communication is hardly new: our cave-dwelling ancestors painted pictures on the walls of their caves, and the use of maps and building plans to convey information almost certainly dates back to pre-Roman times.
2 Applications of CBIR
A wide range of possible applications for CBIR technology has been identified. Potentially fruitful areas include: crime prevention, the military, fashion and interior design, journalism and advertising, medical diagnosis, geographical information and remote sensing systems, and Web searching.
Crime prevention: Law enforcement agencies typically maintain large archives of visual
evidence, including past suspects' facial photographs (generally known as mug shots), fingerprints,
tyre treads and shoeprints. Whenever a serious crime is committed, they can compare evidence from
the scene of the crime for its similarity to records in their archives. Strictly speaking, this is an
example of identity rather than similarity matching, though since all such images vary naturally
over time, the distinction is of little practical significance. Of more relevance is the distinction
between systems designed for verifying the identity of a known individual (requiring matching
against only a single stored record), and those capable of searching an entire database to find the
closest matching records.
The military: Military applications of imaging technology are probably the best-developed,
though least publicized. Recognition of enemy aircraft from radar screens, identification of targets
from satellite photographs, and provision of guidance systems for cruise missiles are known
examples, though these almost certainly represent only the tip of the iceberg. Many of the
surveillance techniques used in crime prevention could also be relevant to the military field.
Fashion and interior design: Similarities can also be observed in the design process in other
fields, including fashion and interior design. Here the designer has to work within externally imposed constraints, such as choice of materials. The ability to search a collection of fabrics to find
a particular combination of colour or texture is increasingly being recognized as a useful aid to the
design process.
Journalism and advertising: Both newspapers and stock shot agencies maintain archives of
still photographs to illustrate articles or advertising copy. These archives can often be extremely
large (running into millions of images), and dauntingly expensive to maintain if detailed keyword
indexing is provided. Broadcasting corporations are faced with an even bigger problem, having to
deal with millions of hours of archive video footage, which are almost impossible to annotate
without some degree of automatic assistance.
Medical diagnosis: The increasing reliance of modern medicine on diagnostic techniques
such as radiology, histopathology, and computerised tomography has resulted in an explosion in the
number and importance of medical images now stored by most hospitals. While the prime
requirement for medical imaging systems is to be able to display images relating to a named patient,
there is increasing interest in the use of CBIR techniques to aid diagnosis by identifying similar past
cases.
Geographical information systems (GIS) and remote sensing: Although not strictly a
case of image retrieval, managers responsible for planning marketing and distribution in large
corporations need to be able to search by spatial attribute (e.g. to find the 10 retail outlets closest to
a given warehouse). And the military are not the only group interested in analysing satellite images.
Agriculturalists and physical geographers use such images extensively, both in research and for
more practical purposes, such as identifying areas where crops are diseased or lacking in nutrients
or alerting governments to farmers growing crops on land they have been paid to leave lying fallow.
Web searching: Cutting across many of the above application areas is the need for effective
location of both text and images on the Web, which has developed over the last five years into an
Figure 1) Transformation and quantization of a tiger image. (a) Original image. (b) Image produced by applying RGB-to-HSV color transformation and quantization.
3.2.3 Methods of Representation
The main method of representing colour information of images in CBIR systems is through colour histograms. A colour histogram is a type of bar graph, where each bar represents a particular colour of the colour space being used. We can compute a colour histogram of an image in the RGB or HSV colour space. The bars in a colour histogram are referred to as bins, and they represent the x-axis. The number of bins depends on the number of colours there are in an image. The y-axis denotes the number of pixels there are in each bin; in other words, how many pixels in an image are of a particular colour.
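As a concrete illustration, here is a minimal Python sketch of this idea (the function name, the NumPy dependency, and the 8-cells-per-channel quantization are our own illustrative choices, not part of any particular CBIR system):

```python
import numpy as np

def color_histogram(image, bins_per_channel=8):
    """Build a quantized RGB colour histogram.

    image: H x W x 3 uint8 array with values in [0, 255].
    Returns a flat vector of bins_per_channel**3 bins, normalized so
    entries sum to 1 (each bin is the fraction of pixels in that cell).
    """
    # Map each 8-bit channel value to one of `bins_per_channel` cells.
    q = (image.astype(np.uint32) * bins_per_channel) // 256
    # Combine the three channel indices into a single bin index.
    idx = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins_per_channel ** 3)
    return hist / hist.sum()
```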
3.2.4 Color Content Extraction
One of the most widely used methods for querying and retrieval by color content is the color histogram. Color histograms are used to represent the color distribution in an image: the color histogram approach counts the number of occurrences of each unique color in a sample image. Since an image is composed of pixels and each pixel has a color, the color histogram of an image can be computed easily by visiting every pixel once. Smith and Chang proposed color sets as an alternative to color histograms. Color sets are binary masks on color histograms: they record the presence of a color as 1, without considering its amount, and store 0 in the bins of absent colors. Color sets reduce the computational complexity of the distance between two images. Besides, by employing color sets, region-based color queries are possible to some extent. On the other hand, processing regions with more than two or three colors is quite complex. Another image content storage and indexing mechanism is the color correlogram. It involves an easy-to-compute method and includes not only the spatial correlation of color regions but also the global distribution of local spatial correlation of colors. In fact, a color correlogram is a table in which each row corresponds to a specific color pair of an image. The k-th entry in the row for color pair (i, j) is the probability of finding a pixel of color j at a distance k from a pixel of color i. The method resolves the drawbacks of purely local and purely global color indexing methods, since it includes local spatial color information as well as the global distribution of color information.
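The correlogram definition above can be sketched in Python as follows; this is a naive illustration under our own assumptions (quantized colour labels as input, a small fixed set of distances, and a cheap axis-aligned sampling of pixel pairs instead of the full Chebyshev-distance neighbourhood used in the literature):

```python
import numpy as np

def color_correlogram(labels, num_colors, distances=(1, 3, 5, 7)):
    """Naive colour correlogram over a quantized image.

    labels: H x W integer array of quantized colour indices.
    Returns corr[i, j, k] = P(a sampled pixel at offset distances[k]
    has colour j | the reference pixel has colour i).
    """
    h, w = labels.shape
    corr = np.zeros((num_colors, num_colors, len(distances)))
    counts = np.zeros((num_colors, len(distances)))
    for k, d in enumerate(distances):
        # Pair each pixel with its copy shifted by d horizontally and
        # vertically (a subset of all pixels at distance d, for brevity).
        for dy, dx in ((0, d), (d, 0)):
            a = labels[: h - dy, : w - dx].ravel()
            b = labels[dy:, dx:].ravel()
            np.add.at(corr, (a, b, np.full_like(a, k)), 1)
            np.add.at(counts, (a, np.full_like(a, k)), 1)
    # Normalize each (colour i, distance k) row into probabilities.
    return corr / np.maximum(counts[:, None, :], 1)
```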
3.2.5 Texture
Texture is that innate property of all surfaces that describes visual patterns, each having properties of homogeneity. It contains important information about the structural arrangement of a surface, such as that of clouds, leaves, bricks, or fabric. It also describes the relationship of the surface to the surrounding environment. In short, it is a feature that describes the distinctive physical composition of a surface. Texture properties include: (i) coarseness, (ii) contrast, (iii) directionality, (iv) line-likeness, (v) regularity, and (vi) roughness.
Figure: Example textures: (a) Clouds, (b) Bricks, (c) Rocks.
In the system, this technique is employed for similarity calculations, comparing texture vectors and color histograms between the database images and the query image.
6 IMPLEMENTATION DETAILS
6.1 Color Histogram calculation
We define a color histogram as a set of bins, where each bin denotes the probability of a pixel in the image being of a particular color. A color histogram H for a given image is defined as a vector:

H = {H[0], H[1], H[2], ..., H[i], ..., H[n]}
The intensity is given by

I = \frac{R + G + B}{3}

where the quantities R, G and B are the amounts of the red, green and blue components, normalized to the range [0, 1]. The intensity is therefore just the average of the red, green and blue components.

The saturation is given by

S = 1 - \frac{3\,\min(R, G, B)}{R + G + B} = 1 - \frac{\min(R, G, B)}{I}

where the \min(R, G, B) term really just indicates the amount of white present. If any of R, G or B is zero, there is no white and we have a pure colour. The hue is given by

H = \cos^{-1}\left[ \frac{\tfrac{1}{2}\left[(R - G) + (R - B)\right]}{\sqrt{(R - G)^2 + (R - B)(G - B)}} \right]
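A small Python sketch of this conversion (vectorized with NumPy; the epsilon guard and the convention of reflecting the hue when B > G are our own choices):

```python
import numpy as np

def rgb_to_hsi(image):
    """Convert an RGB image (uint8) to H, S, I using the formulas above.

    Returns hue in radians, and saturation and intensity in [0, 1].
    """
    rgb = image.astype(np.float64) / 255.0
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    eps = 1e-12                                 # guard against division by zero
    i = (r + g + b) / 3.0
    s = 1.0 - np.minimum.reduce([r, g, b]) / np.maximum(i, eps)
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + eps
    h = np.arccos(np.clip(num / den, -1.0, 1.0))
    h = np.where(b > g, 2.0 * np.pi - h, h)     # resolve the arccos ambiguity
    return h, s, i
```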
The EHD basically represents the distribution of 5 types of edges in each local area, called a sub-image. As shown in Fig. 5, the sub-images are defined by dividing the image space into 4×4 non-overlapping blocks. Thus, the image partition always yields 16 equal-sized sub-images regardless of the size of the original image. To characterize each sub-image, we then generate a histogram of its edge distribution. Edges in the sub-images are categorized into 5 types: vertical, horizontal, 45-degree diagonal, 135-degree diagonal, and non-directional edges (Fig. 6). Thus, the histogram for each sub-image represents the relative frequency of occurrence of the 5 types of edges in the corresponding sub-image. As a result, as shown in Fig. 1, each local histogram contains 5 bins, each corresponding to one of the 5 edge types. Since there are 16 sub-images in the image, a total of 5×16 = 80 histogram bins is required. Note that each of the 80 histogram bins has its own semantics in terms of location and edge type. The semantics of the histogram bins form the normative part of the MPEG-7 standard descriptor. Specifically, starting from the sub-image at (0,0) and ending at (3,3), the 16 sub-images are visited in raster-scan order and the corresponding local histogram bins are arranged accordingly. Within each sub-image, the edge types are arranged in the following order: vertical, horizontal, 45-degree diagonal, 135-degree diagonal, and non-directional. Table 1 summarizes the complete semantics of the EHD with 80 histogram bins. Of course, each histogram bin value should be normalized and quantized. For normalization, the number of edge occurrences for each bin is divided by the total number of image-blocks in the sub-image.
Histogram bins | Semantics
BinCounts[0]   | Vertical edge of sub-image at (0,0)
BinCounts[1]   | Horizontal edge of sub-image at (0,0)
BinCounts[2]   | 45-degree edge of sub-image at (0,0)
BinCounts[3]   | 135-degree edge of sub-image at (0,0)
BinCounts[4]   | Non-directional edge of sub-image at (0,0)
BinCounts[5]   | Vertical edge of sub-image at (0,1)
:              | :
BinCounts[74]  | Non-directional edge of sub-image at (3,2)
BinCounts[75]  | Vertical edge of sub-image at (3,3)
BinCounts[76]  | Horizontal edge of sub-image at (3,3)
BinCounts[77]  | 45-degree edge of sub-image at (3,3)
BinCounts[78]  | 135-degree edge of sub-image at (3,3)
BinCounts[79]  | Non-directional edge of sub-image at (3,3)

Table 1) Semantics of Local Edge bins
The image-block is the basic unit for extracting edge information. That is, for each image-block, we determine whether at least one edge is present and, when an edge exists, which of the 5 edge categories is predominant.
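A compact sketch of this per-block classification in Python. The 2×2 sub-block filter coefficients below follow commonly cited MPEG-7 EHD values, and the threshold is a free parameter; treat both as assumptions rather than normative constants:

```python
import numpy as np

# Edge filters applied to the four 2x2 sub-block means of an image-block,
# in bin order: vertical, horizontal, 45-degree, 135-degree, non-directional.
EDGE_FILTERS = np.array([
    [1.0, -1.0, 1.0, -1.0],                  # vertical
    [1.0, 1.0, -1.0, -1.0],                  # horizontal
    [np.sqrt(2), 0.0, 0.0, -np.sqrt(2)],     # 45-degree diagonal
    [0.0, np.sqrt(2), -np.sqrt(2), 0.0],     # 135-degree diagonal
    [2.0, -2.0, -2.0, 2.0],                  # non-directional
])

def classify_block(block, threshold=11.0):
    """Classify one grey-level image-block into an edge type 0..4,
    or return None when no filter response exceeds the threshold
    (the block is treated as monotone and contributes no edge)."""
    h2, w2 = block.shape[0] // 2, block.shape[1] // 2
    means = np.array([
        block[:h2, :w2].mean(), block[:h2, w2:].mean(),
        block[h2:, :w2].mean(), block[h2:, w2:].mean(),
    ])
    strengths = np.abs(EDGE_FILTERS @ means)
    k = int(np.argmax(strengths))
    return k if strengths[k] >= threshold else None
```

Counting the classifications of all image-blocks inside a sub-image and dividing by the number of blocks then yields that sub-image's five normalized bins.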
d^2(q, t) = \sum_{m=1}^{N} \left( h_q(m) - h_t(m) \right)^2 \qquad \ldots (6)
This metric is uniform in terms of the Euclidean distance between vectors in feature space, but the vectors are not normalized to unit length (in fact, they lie on a hyperplane if the histograms are normalized).
6.4.2 Cosine Distance
If we normalize all vectors to unit length, and look at the angle between them, we have cosine
distance, defined as:
d(q, t) = \cos^{-1}\left( \frac{\sum_{m=1}^{N} h_q(m)\, h_t(m)}{\|h_q\|_2\, \|h_t\|_2} \right)
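Both distance measures are straightforward to implement; a quick sketch in Python, assuming the histograms are equal-length nonnegative vectors (function names are our own):

```python
import numpy as np

def euclidean_dist_sq(hq, ht):
    """Squared Euclidean distance between two histograms, Eq. (6)."""
    d = np.asarray(hq, dtype=float) - np.asarray(ht, dtype=float)
    return float(np.dot(d, d))

def cosine_dist(hq, ht):
    """Angle between two histograms after normalization to unit length."""
    hq = np.asarray(hq, dtype=float)
    ht = np.asarray(ht, dtype=float)
    c = np.dot(hq, ht) / (np.linalg.norm(hq) * np.linalg.norm(ht))
    return float(np.arccos(np.clip(c, -1.0, 1.0)))
```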
Assume we are going to combine them as a weighted sum of all the distances, i.e. the distance for an image in the database is written as:
D(q, i) = \sum_{k=1}^{K} w_k\, d_k(q, i) \qquad \ldots (1)

and

\sum_{k=1}^{K} w_k = 1, \qquad w_k \ge 0, \quad k = 1, 2, \ldots, K \qquad \ldots (2)
Now we want to search for a weight vector w that satisfies Eq. (2) such that the resulting distance measure is as close as possible to our subjective criteria. There are two candidate approaches:

i) Assign a set of weights based on the perceptual judgment of the designer on some image set (training). The problem here is that this set of weights may perform poorly on a new dataset.

ii) Or, making no assumption about the subjective judgment of a user, choose the image that minimizes the maximum distance over all valid sets of weights as the best match (denoted Mini-Max hereafter). For every image i, searching for the maximum distance over the weight space turns out to be a linear program, and thus has a fast solution:

Maximize (1), subject to (2), where all the d_k are constants and w_k, k = 1, \ldots, K are unknown. The image with the minimum "max-distance" is declared the best match to the query image. For our 2-feature case, the max-distance

D(q, i) = w\, d_c(q, i) + (1 - w)\, d_e(q, i), \qquad 0 \le w \le 1

of every image i is a linear function of w over [0, 1]. Thus the maximum lies either at w = 0 or w = 1, and comparing d_c(q, i) and d_e(q, i) is sufficient. We then rank the maximum of d_c(q, i) and d_e(q, i) for all i, and take the n images with the least distance as our returned result.
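A minimal sketch of this Mini-Max ranking for the two-feature case (d_c for colour and d_e for edge/texture distances; all names here are our own):

```python
def minimax_rank(dc, de, n):
    """Rank database images by their worst-case combined distance.

    dc, de: sequences where dc[i], de[i] are the colour and edge
    distances of database image i from the query. Because D(q, i) is
    linear in w over [0, 1], its maximum is simply max(dc[i], de[i]).
    Returns the indices of the n best-matching images.
    """
    worst = [max(c, e) for c, e in zip(dc, de)]
    return sorted(range(len(worst)), key=worst.__getitem__)[:n]
```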
7 TESTING
7.1 Recall and Precision Evaluation:
Testing the effectiveness of content-based image retrieval is about testing how well the CBIR system can retrieve images similar to the query image, and how well the system avoids returning results that are not relevant to the query at all from the user's point of view. A sample query image must be selected from one of the image categories in the database. When the system is run and the result images are returned, the user needs to count how many images are returned and how many of the returned images are similar to the query image. Determining whether or not two images are similar is purely up to the user's perception. Human perception can easily recognise the similarity between two images, although in some cases different users can give different opinions. After images are retrieved, the system's effectiveness needs to be determined. To achieve this, two evaluation measures are used. The first measure is called Recall. It is a measure of the ability of a system to present all relevant items. The equation for calculating recall is given below:
\text{Recall} = \frac{\text{number of relevant items retrieved}}{\text{number of relevant items in collection}}
The second measure is called Precision. It is a measure of the ability of a system to present
only relevant items. The equation for calculating precision is given below.
\text{Precision} = \frac{\text{number of relevant items retrieved}}{\text{total number of items retrieved}}
In this case, the number of relevant items retrieved is the number of returned images that are similar to the query image. The number of relevant items in the collection is the number of images that are in the same category as the query image. The total number of items retrieved is the number of images returned by the system.
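For instance, a tiny helper that computes both measures (a sketch; the function name is our own), checked against the first row of Table 2:

```python
def recall_precision(relevant_retrieved, relevant_in_collection, total_retrieved):
    """Return (recall %, precision %) as defined above."""
    recall = 100.0 * relevant_retrieved / relevant_in_collection
    precision = 100.0 * relevant_retrieved / total_retrieved
    return recall, precision

# First row of Table 2: 29 of 46 relevant images retrieved, 45 returned.
print(recall_precision(29, 46, 45))   # -> (63.04..., 64.44...)
```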
Query | Total Relevant Images | Relevant Images Retrieved | Total Retrieved Images | Recall (%) | Precision (%)
1     | 46                    | 29                        | 45                     | 63.04      | 64.44
2     | 38                    | 30                        | 44                     | 78.94      | 68.00
3     | 18                    | 12                        | 20                     | 66.66      | 80.00
4     | 30                    | 25                        | 36                     | 83.33      | 69.44
5     | 32                    | 23                        | 35                     | 71.80      | 65.70

Table 2) Color Test cases
Query | Total Relevant Images | Relevant Images Retrieved | Total Retrieved Images | Recall (%) | Precision (%)
1     | 46                    | 19                        | 45                     | 41.30      | 42.22
2     | 38                    | 28                        | 44                     | 73.68      | 63.64
3     | 18                    | 11                        | 20                     | 61.11      | 55.00
4     | 30                    | 22                        | 36                     | 73.33      | 61.11
5     | 32                    | 19                        | 35                     | 59.37      | 54.28

Table 3) Texture Test cases
Query | Total Relevant Images | Relevant Images Retrieved | Total Retrieved Images | Recall (%) | Precision (%)
1     | 46                    | 35                        | 45                     | 76.08      | 77.77
2     | 38                    | 33                        | 44                     | 86.84      | 75.00
3     | 18                    | 15                        | 20                     | 83.33      | 75.00
4     | 30                    | 26                        | 36                     | 86.67      | 72.22
5     | 32                    | 27                        | 35                     | 84.38      | 77.14

Table 4) Color and Texture Test cases
8 CONCLUSION
The dramatic rise in the size of image databases has spurred the development of effective and efficient retrieval systems. The development of these systems started with retrieving images using textual annotations, but later introduced image retrieval based on content. This came to be known as content-based image retrieval (CBIR).
_________________________
Article received: 2008-12-18