MPEG7 Algorithms Implemented For A Multimedia Management System



Liana Stănescu, Dumitru Sorina, Cosmin Stoica Spahiu, Dumitru Dan Burdescu *

* University of Craiova, Faculty of Automation, Computers and Electronics, Bvd. Decebal, No. 107,
Romania (e-mail: {stanescu, burdescu, stoica.cosmin}@software.ucv.ro).

Abstract: The paper presents an original, dedicated, integrated software system for managing and
querying alphanumerical information and images. The system has a modular architecture based on a
relational database management server. It has been updated with new MPEG-7 algorithms for image
processing and retrieval. Studies of the implemented algorithms have shown that combining the
Color Layout, Dominant Color and Texture Edge Histogram descriptors improves retrieval
performance. The visual manner of building this type of query, specific to multimedia data, and
the modified Select command that is sent for execution to the MMDBMS give originality to the
software product.
Keywords: Image retrieval, image processing, MPEG7 descriptors.

1. INTRODUCTION

The large-scale use of digital data in recent years and the growth of the internet have made huge volumes of high-dimensional multimedia data available all around us. This kind of information is often mixed: the same document may contain different data types such as text, image, audio, speech, hypertext, graphics, and video, all interspersed with each other. The World Wide Web has played an important role in making the data easily accessible to users all over the world, even if they are in geographically distant locations.

All this information must be efficiently managed and retrieved in order to be useful. Different algorithms and systems have been implemented for this goal. The best-known set of algorithms is MPEG.

The main objective of the MPEG-7 visual standard is to provide standardized descriptions of streamed or stored images or video: standardized header bits (visual low-level descriptors) that help users or applications to identify, categorize, or filter images or video. These low-level descriptors can be used to compare, filter or browse images or video purely on the basis of non-textual visual descriptions of the content, or in combination with common text-based queries.

The paper presents the development of a multimedia database management system that combines the functionality of a classical relational system with algorithms for image processing.

A classical DBMS has to provide support for database management such as browsing, querying, inserting, updating, storing and deleting information. It has to ensure database integrity and security (Dimitrova, 1999). In addition, a multimedia database management system must have functions for metadata processing, special functions for image processing and feature extraction, and functions for storing all this information.

The paper has the following structure: Section 2 presents the related work, Section 3 presents the general architecture of the MMDBMS, Sections 4 and 5 present in detail the colour and texture descriptors implemented for the DBMS, and Section 6 presents the conclusions and future work.

2. RELATED WORK

Many content-based retrieval systems have been developed in recent years, some for commercial purposes and others as results of research projects. Some of the best-known systems are presented below.

QBIC project (Query By Image Content) was one of the first commercial content-based image retrieval systems. It was implemented by IBM and consists of a search engine that sorts database images according to several descriptors: colours, textures, shapes, sizes, and spatial position. The system implements functionality for query-by-example, user drawings, selected colour and texture patterns, and camera and object motion. Any image obtained as a query result can be used as a new query in order to improve the search (Flickner, M. et al., 1997; Niblack, C. W. et al., 1993; Royo, C.V., 2010).

Virage project (Bach, J. R. et al., 1996) is also a content-based image retrieval system, developed by Virage Inc. This system is similar to the QBIC project: it has functionality for visual queries based on colour, texture, and shapes. The main difference is that this system allows the users to adjust the weights of each characteristic taken into account (Royo, C.V., 2010).

RetrievalWare project is also a commercial retrieval engine, developed by Excalibur Technologies Corporation.
It was commercially launched in 1992. Its emphasis was on applying neural networks to image retrieval. Its more recent search engines use features such as colour, shape, texture, brightness, colour layout, and aspect ratio of the image (Excalibur Visual Retrievalware, 1998; Royo, C.V., 2010).

Photobook project is a retrieval system which allows the users to browse image databases using both text annotations added to images and content-based retrieval operations. This project was developed at the MIT Media Lab. It is composed of three sub-modules, which extract shape, texture, and face features. Human interaction is needed during the image annotation and retrieval loop (similar to the MIRROR system) (Pentland, A. et al., 1995; Royo, C.V., 2010).

ImgSeek project is a free, open-source photo collection manager and viewer which includes content-based retrieval functionality. The query is built by providing either a sketch painted by the user or an image file (Jacobs, C. E. et al., 1995; ImgSeek web; Royo, C.V., 2010).

MIRROR project. The acronym MIRROR comes from Multimedia Information Retrieval Reducing Information OveRload. It was developed by the University of Twente and it is one of the most complex systems (Wong et al., 2005; Royo, C.V., 2010). It was implemented to evaluate MPEG-7 visual descriptors and to design and develop new algorithms for image retrieval. It includes a web-based user interface for query-by-image-example retrieval. The system does not include the DBMS. Figure 1 shows a capture of the MIRROR interface. The MIRROR system has implemented the following colour descriptors from MPEG-7: Dominant Color, Color Layout, Scalable Color and Color Structure. The texture characteristic is also considered to be a very important feature in image retrieval. The algorithm used for its extraction is Texture Edge Histogram. Various matching tools are defined for different descriptors and description schemes.

Figure 1. A capture of MIRROR interface.

3. THE MULTIMEDIA DATABASE MANAGEMENT SYSTEM

The multimedia database management system that was developed is an original application that includes methods for extracting the colour and texture characteristics and for executing content-based retrieval queries. It uses both classical data types for storing standard information and a new data type defined specially for images, called IMAGE (Spahiu, C.S. et al., 2009a; Spahiu, C.S. et al., 2009b). This type is used to store the image file and the extracted metadata.

The multimedia database management system was developed by the Computers and Information Technology Department of the University of Craiova. The basic reason for designing this system was that most of the database servers on the market today do not offer any support or particular functionality for image management and processing. It is usually recommended to store the image files outside the database, which leaves them exposed to unauthorized changes or deletions.

The biggest advantage of this system is that the complexity of the client applications is much lower than in other similar systems, as the system includes all the necessary functionality for image processing. The clients
need only to call the APIs of the system.

The system is modularly designed, each functionality being managed by a different module. There are modules for colour characteristics extraction, texture extraction, similarity computation, SQL processing, etc.

If needed, new functionality can be added very easily by adding new modules to the system and connecting them to the main module.

4. MPEG DESCRIPTORS USED

One of the main objectives of the system is to include all algorithms needed to characterize images and to determine their similarity in a way that is close to human colour perception.

The initial version of the multimedia system used Gabor filters and histograms for image retrieval. The functionality was extended by adding new MPEG-7 algorithms for image processing and retrieval. The MPEG-7 standard is a collection of methods used to obtain different colour descriptors from images (MPEG-7).

After studying the advantages and disadvantages of each colour descriptor, the Colour Layout Descriptor (CLD) and the Dominant Colour Descriptor (DCD) were selected for implementation, in order to combine the advantages of both. The DCD makes possible an effective description of the dominant colours of an image, and the CLD is a compact, resolution-invariant representation that retains the spatial distribution of an image's colours.

4.1 Dominant Colour Descriptor (DCD)

The Dominant Colour Descriptor (DCD) is one of the descriptors used in MPEG-7. It is used to give a compact description of the salient colours in an image or image region. It gives the possibility to specify a small number of dominant colour values as well as their statistical properties: distribution or variance (Verdaguer, S.L., 2009).

The algorithm used to extract the dominant colours from an image needs a clustering method to cluster the pixel colour values.

Three stages are needed for the extraction process: a colour space change, a clustering method and a calculation of the percentage of each centroid (Verdaguer, S.L., 2009).

The first stage is recommended by the MPEG-7 standard, so that the clustering method is executed in a perceptually uniform colour space. The recommended colour space is CIE LUV. This step must be implemented because most of the images used in this project were defined in the RGB colour space.

The second step can begin once the input image is converted to the CIE LUV colour space. At this step a clustering algorithm is applied in order to extract the dominant colours of the image. Any clustering method can be applied; the one selected in this project is the K-means algorithm.

The K-means clustering algorithm is a method of cluster analysis which aims to divide n observations into k clusters. Each observation belongs to the cluster with the nearest mean (K-means (a); K-means (b)).

Figure 2 is an example of colour clustering using K-means for the DCD extraction of an input image. As the standard recommends the use of three or four dominant colours for the DCD, four colours have been selected in this particular example.

Figure 2. K-means: RGB clustering with 4 clusters.

It can be observed that only four dominant colours of the original image have been selected, corresponding to the four centroids of the K-means clusters.

The percentage of pixels in the image belonging to each of the clusters is calculated in the last step of the algorithm. The DCD results (four centroids and their percentages) obtained for the previous example are summarized in the following table:

Table 1. Example of DCD

Centroid L*   Centroid u*   Centroid v*   Percentage (fraction of pixels)
54.2495       70.0140       12.3690       0.3501
80.3203       24.7781       18.8156       0.1321
66.4478       49.2724       17.7287       0.2456
34.6152       36.4142       -15.7432      0.2722

These centroids and their corresponding percentages represent the DCD of the image.

The performance of the system using DCD was evaluated using different images from the database. In this case, the system is expected to return images with similar dominant colours, ordered by increasing distance. An example of the results returned is presented in Figure 3.

In this particular example it can be observed that the image with the smallest distance (distance 0) is the input image itself (it is found in the database) and the following ones are the other images with similar dominant colours. Furthermore, the same behaviour has been obtained using, as input to the system, images with different dominant colours, such as white, red and brown.

The obtained results were satisfactory. However, in this case the computing time for extracting the descriptors was very high:
- Analyzing the DCD of one picture: 20 seconds
- Input image DCD extraction and matching with 100 DCDs: 45 seconds (Verdaguer, S.L., 2009).

Figure 3. Results obtained using DCD.

4.2 Color layout descriptor (CLD)

The second descriptor implemented in the system is the Colour Layout Descriptor. It captures the spatial layout of the representative colours on a grid superimposed on a region or image. The representation of the image is based on coefficients of the Discrete Cosine Transform. It is a very compact and efficient descriptor, recommended for fast browsing and search applications. The descriptor can be used with good results both for still images and for video files (Verdaguer, S.L., 2009).

This descriptor is obtained by applying the discrete cosine transform (DCT) to a 2-D array of local representative colours in the Y, Cb or Cr colour space.

The CLD can be used for matching both images and video. It is one of the most precise and fastest colour descriptors in MPEG-7.

Four steps are needed for the descriptor extraction process: image partitioning, representative colour detection, DCT transformation and zigzag scanning. An additional step is needed to transform the colour space from RGB (if necessary) to YCbCr.

The algorithm divides the input picture (represented in the RGB colour space) into 64 blocks to guarantee invariance to the image resolution or scale (Fig. 4).

The second step starts after the image partitioning stage. It needs the selection of a single representative colour from each block. This selection can be performed using any appropriate method. The method recommended by the standard is to use the average of the pixel colours in a block as the corresponding representative colour. This method has a low complexity and the description accuracy is good.

The result of this selection is a tiny image icon of size 8x8. Figure 5 shows the process. The resolution of the original image has been maintained in the figure only in order to facilitate its representation.

Figure 5. Stage 2: Representative colour selection.

The colour space conversion from RGB to YCbCr is applied once the new icon is obtained. This colour space conversion can be done after any step. However, the MPEG-7 standard recommends performing the conversion at this point in order to reduce the computational load needed for this process (Verdaguer, S.L., 2009).

The following formulas are used to calculate the DCT of a 2-D array, where A is the M x N input array and B holds its DCT coefficients:

$$B_{pq} = \alpha_p \alpha_q \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} A_{mn} \cos\frac{\pi (2m+1) p}{2M} \cos\frac{\pi (2n+1) q}{2N}, \qquad 0 \le p \le M-1,\; 0 \le q \le N-1 \tag{1}$$

$$\alpha_p = \begin{cases} 1/\sqrt{M}, & p = 0 \\ \sqrt{2/M}, & 1 \le p \le M-1 \end{cases} \qquad \alpha_q = \begin{cases} 1/\sqrt{N}, & q = 0 \\ \sqrt{2/N}, & 1 \le q \le N-1 \end{cases} \tag{2}$$

The inputs and outputs for this step are presented in Table 2.

Table 2. Inputs and Outputs Stage 4

Input Stage 4                                 Output Stage 4
Tiny image icon [8x8] in YCbCr color space    3 [8x8] matrices of 64 coefficients (DCTY, DCTCb, DCTCr)
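Stages 1 to 4 described above (partitioning, block averaging, colour conversion and the DCT of formulas (1) and (2)) can be sketched as follows. This is only a minimal illustration, not the system's actual code: it assumes NumPy, an input image of at least 8x8 pixels, and the full-range BT.601 (JPEG-style) RGB to YCbCr conversion, which the paper does not specify. The function names `block_average_icon`, `rgb_to_ycbcr`, `dct2` and `cld_dct` are hypothetical.

```python
import numpy as np

def block_average_icon(rgb, grid=8):
    """Stages 1-2: split an H x W x 3 image into grid x grid blocks, average each block."""
    rgb = np.asarray(rgb, dtype=float)
    h, w, _ = rgb.shape
    row_edges = np.linspace(0, h, grid + 1).astype(int)
    col_edges = np.linspace(0, w, grid + 1).astype(int)
    icon = np.empty((grid, grid, 3))
    for i in range(grid):
        for j in range(grid):
            block = rgb[row_edges[i]:row_edges[i + 1], col_edges[j]:col_edges[j + 1]]
            icon[i, j] = block.reshape(-1, 3).mean(axis=0)
    return icon

def rgb_to_ycbcr(icon):
    """Stage 3: full-range BT.601 conversion (an assumed variant)."""
    r, g, b = icon[..., 0], icon[..., 1], icon[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def dct_basis(size):
    """Matrix C with C[p, m] = alpha_p * cos(pi * (2m + 1) * p / (2 * size))."""
    p = np.arange(size).reshape(-1, 1)
    m = np.arange(size).reshape(1, -1)
    alpha = np.full((size, 1), np.sqrt(2.0 / size))
    alpha[0, 0] = np.sqrt(1.0 / size)
    return alpha * np.cos(np.pi * (2 * m + 1) * p / (2 * size))

def dct2(a):
    """Stage 4: 2-D DCT of formula (1), written as C_M @ A @ C_N^T."""
    a = np.asarray(a, dtype=float)
    rows, cols = a.shape
    return dct_basis(rows) @ a @ dct_basis(cols).T

def cld_dct(rgb):
    """Return the three 8x8 coefficient matrices (DCTY, DCTCb, DCTCr)."""
    ycbcr = rgb_to_ycbcr(block_average_icon(rgb))
    return tuple(dct2(ycbcr[..., k]) for k in range(3))
```

Writing the double sum as two matrix products keeps the code close to formula (1) while avoiding four nested loops; for an 8x8 icon either form is cheap.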
Figure 4. Step 1: Partitioning.

A zigzag scanning is performed in the next step on these three sets of 64 DCT coefficients. The scanning scheme is depicted in Fig. 6. The purpose of this step is to group the low-frequency coefficients of the 8x8 matrix.
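The zigzag scanning step can be illustrated with a short sketch. It assumes that the scan of Fig. 6 is the conventional JPEG-style zigzag, which walks the anti-diagonals of the matrix starting from the top-left (lowest-frequency) coefficient; the figure itself defines the order actually used. The function names `zigzag_indices` and `zigzag_scan` are hypothetical, and NumPy is assumed.

```python
import numpy as np

def zigzag_indices(size=8):
    """Return the (row, col) pairs of a size x size matrix in zigzag order."""
    order = []
    for s in range(2 * size - 1):  # s = row + col identifies one anti-diagonal
        rows = range(max(0, s - size + 1), min(s, size - 1) + 1)
        diag = [(r, s - r) for r in rows]
        # Even anti-diagonals run from bottom-left to top-right, odd ones the other way.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order

def zigzag_scan(matrix):
    """Flatten a square coefficient matrix into a 1-D array in zigzag order."""
    m = np.asarray(matrix)
    return np.array([m[r, c] for r, c in zigzag_indices(m.shape[0])])
```

Applied to each of the three 8x8 DCT matrices, `zigzag_scan` yields a 64-element vector whose first entries are the low-frequency coefficients, so a shorter descriptor can be obtained by keeping only a prefix of each vector.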
Figure 6. Stage 5: Zigzag scanning

The CLD of the input image is represented by these three sets of coefficients.

The similarity between two CLDs obtained from two different images is computed as:

$$D = \sqrt{\sum_i w_{yi} (DY_i - DY_i')^2} + \sqrt{\sum_i w_{bi} (DCb_i - DCb_i')^2} + \sqrt{\sum_i w_{ri} (DCr_i - DCr_i')^2} \tag{3}$$

where {DY, DCb, DCr} and {DY', DCb', DCr'} represent the CLDs of the two images.

The subscript i represents the zigzag-scanning order of the coefficients. Furthermore, notice that it is possible to weight the coefficients (w) in order to adjust the performance of the matching process. These weights give some components of the descriptor more importance than others (Verdaguer, S.L., 2009).

5. TEXTURE DESCRIPTORS

The texture characteristic is a very important low-level descriptor in image retrieval, similar to the colour characteristic. The MPEG-7 standard uses three texture descriptors (MPEG-7 (a)-(c)):

- Texture browsing descriptor (TBD). It uses perceptual attributes such as directionality, regularity, and coarseness of a texture. This algorithm obtains results close to human eye perception. The TBD is a 5-dimensional vector expressed as [Regularity, Directionality1, Directionality2, Scale1, Scale2], where:
  - Regularity represents the degree of periodic structure of the texture. The larger the Regularity value is, the more regular the pattern is;
  - Directionalities are associated with the two dominant orientations of the texture;
  - Scales represent the two dominant scales of the texture.
  When this algorithm is used, the similarity is best computed using the Euclidean distance.
- Homogeneous texture descriptor (HTD). It provides a quantitative characterization of homogeneous texture regions for similarity retrieval. It characterizes the texture using the energy and energy deviation in a set of frequency channels (ISO/IEC JTC1/SC29/WG11, 2002). This characteristic is extracted using a Gabor filter bank which partitions the frequency space with an equal angle of 30° in the angular direction and with octave division in the radial direction. According to some previous results, the best numbers of angular and radial divisions are 6 and 5, resulting in 30 channels in total.
- Local edge histogram descriptor (EHD). This descriptor gives good results when the image region to be processed is not homogeneous. The algorithm divides an image into 4 x 4 non-overlapping sub-images. Each sub-image is divided into an application-specific number of image-blocks. Five types of edge information can be extracted from the image-blocks by edge detection operators. For each sub-image a local edge histogram with 5 bins is generated, giving a total of 80 histogram bins (16 sub-images multiplied by 5 bins) for the whole image (Xu, F. and Zhang, Y.J., 2005).

The performance of these algorithms was evaluated from the point of view of speed and retrieval quality.

Table 3 shows the average runtime needed by each of the three algorithms to process an image. The results recommend the HTD as the most efficient for texture extraction. Although it does not have the simplest implementation, it has the lowest complexity among the three descriptors. The most complex and time-consuming descriptor is the TBD.

Table 3. The average runtime needed for texture extraction

Texture descriptor      HTD   TBD   EHD
Average runtime (ms)    585   814   664

6. CONCLUSIONS AND FUTURE WORK

The paper presented the additional functionality of a multimedia database server using a set of low-level features, called visual descriptors, that have been defined by MPEG-7. The system can be used for managing medium-sized collections of images. A survey of similar implementations in other well-known content-based image retrieval applications was also made.

This MMDBMS is created for managing and querying medium-sized personal digital collections that contain both alphanumerical information and digital images. The software tool allows creating and deleting databases, creating and deleting tables in databases, updating data in tables and querying. The user can use several data types, such as integer, char, double and image.

This software can be extended in the following directions:
- Adding new traditional and multimedia data types (for example a video type or a DICOM type, because the main area where this multimedia DBMS is used is the medical domain, and the DICOM type would store the alphanumerical information and images contained in a standard DICOM file provided by a medical device).
- Studying and implementing indexing algorithms for data inserted in the tables.
- Adjusting the weights of the CLD and DCD in order to improve the obtained results.
- Combining the different results obtained by each visual descriptor; for this, we will study the statistical
distribution of the distance values obtained for each descriptor in order to normalize each distance before combining them.
- Using Relevance Feedback techniques in order to improve the results. These techniques involve the human factor in the search engine. Thus, once the retrieved images have been shown, the user has to decide which results are considered relevant. The retrieval system then takes advantage of this information given by the user and tries to improve its results by using all the features extracted from these relevant images.
- Developing a region-based search engine. For this we will need to incorporate shape descriptors, as well as the whole set of descriptors used in image retrieval.

REFERENCES

Bach, J. R., Fuller, C., and Gupta, A. (1996). Virage image search engine: an open framework for image management, Storage and Retrieval for Still Image and Video Databases IV 2670, no. 1, pp. 76-87.
Del Bimbo, A. (2001). Visual Information Retrieval, Morgan Kaufmann Publishers, San Francisco, USA.
Dimitrova, N. (1999). Multimedia Content Analysis and Indexing for Filtering and Retrieval Applications, Informing Science Special Issue on Multimedia Informing Technologies, vol. 2, no. 4, pp. 87-100.
Excalibur Visual Retrievalware, web page (1998). http://www.excalib.com/products/vrw/vrw.html.
Flickner, M., Sawhney, H., Niblack, W., Ashley, J., Huang, Q., Dom, B., Gorkani, M., Hafner, J., Lee, D., Petkovic, D., Steele, D., and Yanker, P. (1997). Query by image and video content: the QBIC system, Intelligent multimedia information retrieval, MIT Press, Cambridge, MA, USA, pp. 7-22.
ImgSeek, web page: http://www.imgseek.net/
ISO/IEC JTC1/SC29/WG11 (2002). MPEG-7 overview, V. 8, Doc. N4980.
Jacobs, C. E., Finkelstein, A., and Salesin, D. H. (1995). Fast multiresolution image querying, SIGGRAPH 95: Proceedings of the 22nd annual conference on Computer graphics and interactive techniques (New York, NY, USA), ACM, pp. 277-286.
K-means (a): http://people.revoledu.com/kardi/tutorial/kMean/NumericalExample.htm
K-means (b): http://www.mathworks.com/access/helpdesk/help/toolbox/stats/index.html?/access/helpdesk/help/toolbox/stats/kmeans.html&http://www.mathworks.com/cgibin/texis/webinator/search/?db=MSS&prox=page&rorder=750&rprox=750&rdfreq=500&rwfreq=500&rlead=250&sufs=0&order=r&is_summary_on=1&ResultCount=10&query=kmeans&submitButtonName=Search
Khoshafian, S., and Baker, A.B. (1996). Multimedia and Imaging Databases. Morgan Kaufmann Publishers, Inc., San Francisco, California.
Manjunath, B.S., Salembier, P., and Sikora, T. (2002). Introduction to MPEG-7: Multimedia Content Description Interface, 1st edition.
MPEG: http://www.mpeg.org/
MPEG-7 (a): http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm
MPEG-7 (b): http://www.m4if.org/resources.php#section40
MPEG-7 (c): http://www.multimedia-metadata.info/
Niblack, C. W., Barber, R., Equitz, W., Flickner, M. D., Glasman, E. H., Petkovic, D., Yanker, P., Faloutsos, C., and Taubin, G. (1993). QBIC project: querying images by content, using color, texture, and shape, Storage and Retrieval for Image and Video Databases, vol. 1908, SPIE, pp. 173-187.
Pentland, A., Picard, R. W., and Sclaroff, S. (1995). Photobook: Content-Based Manipulation of Image Databases.
Royo, C.V. (2010). Image-Based Query by Example Using MPEG-7 Visual Descriptors.
Smith, J.R. (1997). Integrated Spatial and Feature Image Systems: Retrieval, Compression and Analysis. Ph.D. thesis, Graduate School of Arts and Sciences, Columbia University.
Spahiu, C.S., Mihaescu, C., Stanescu, L., Burdescu, D.D., and Brezovan, M. (2009a). Database Kernel for Image Retrieval, The First International Conference on Advances in Multimedia (MMEDIA 2009), pp. 169-173.
Stoica Spahiu, C. (2009b). A Multimedia Database Server for information storage and querying, 2nd International Symposium on Multimedia Applications and Processing (MMAP'09), Vol. 4, pp. 517-522.
Verdaguer, S.L. (2009). Color Based Image Classification and Description.
Wong, K. M., Cheung, K. W., and Po, L. M. (2005). MIRROR: an interactive content based image retrieval system, IEEE International Symposium on Circuits and Systems, vol. 2, pp. 1541-1544.
Xu, F., and Zhang, Y.J. (2005). Evaluation and comparison of texture descriptors proposed in MPEG-7, November 2005, www.sciencedirect.com.