
Chapter 4

Color in Image and Video

 This chapter explores:
§ several issues in the use of color, since color is vitally
important in multimedia programs

 In this chapter we shall consider the following topics:


§ Color Science
§ Color Models in Images
§ Color Models in Video.

 4.1 Color Science
 4.2 Color Models in Images
 4.3 Color Models in Video

 Light and Spectra

 Light is an electromagnetic wave. Its color is characterized by


the wavelength content of the light.
(a) Laser light consists of a single wavelength: e.g., a ruby laser
produces a bright, scarlet-red beam.
(b) Most light sources produce contributions over many
wavelengths.
(c) However, humans cannot detect all light, just contributions
that fall in the "visible wavelengths".
(d) Short wavelengths produce a blue sensation, long wavelengths
produce a red one.

 Spectrophotometer: a device used to measure visible
light, by reflecting light from a diffraction grating (a ruled
surface) that spreads out the different wavelengths, much as a prism does.

 Figure 4.1 shows the phenomenon that white light


contains all the colors of a rainbow.

 Visible light is an electromagnetic wave in the range 400


nm to 700 nm (where nm stands for nanometer, 10^-9 meters).

Fig. 4.1: Sir Isaac Newton's experiments.
 Human Vision
 The eye works like a camera, with the lens focusing an image onto the
retina (upside-down and left-right reversed).

 The retina consists of an array of rods and three kinds of cones.


 The rods come into play when light levels are low and produce an image
in shades of gray ("all cats are gray at night!").

 For higher light levels, the cones each produce a signal. Because of
their differing pigments, the three kinds of cones are most
sensitive to red (R), green (G), and blue (B) light.
 Eye-Brain system
 It seems likely that the brain makes use of differences R-G, G-B, and B-
R, as well as combining all of R, G, and B into a high-light-level
achromatic channel.
 Spectral Sensitivity of the Eye
 The eye is most sensitive to light in the middle of the visible
spectrum.

 The sensitivity of our receptors is also a function of wavelength (Fig. 4.3).

 The Blue receptor sensitivity is not shown to scale because it is much
smaller than the curves for Red or Green – Blue is a late addition in evolution.

 Fig. 4.3 shows the overall sensitivity as a dashed line – this
important curve is called the luminous-efficiency function.
◦ It is usually denoted V(λ) and is formed as the sum of the response
curves for Red, Green, and Blue.


• The rod sensitivity curve looks like the luminous-efficiency


function V (λ) but is shifted to the red end of the spectrum.

• The achromatic channel produced by the cones is approximately
proportional to 2R + G + B/20.
Fig. 4.3: R, G, and B cone response curves and the luminous-efficiency curve V(λ),
plotted as relative response against wavelength (400–700 nm).
 Spectral Sensitivity of the Eye

 The eye has about 6 million cones, but the proportions of R,
G, and B cones are different.

 They are likely present in the ratios 40:20:1.

 So the achromatic channel produced by the cones is
something like 2R + G + B/20.

 Image Formation

 In most situations, we actually image light that is reflected


from a surface.

 Surfaces reflect different amounts of light at different


wavelengths, and dark surfaces reflect less energy than light
surfaces.

 The reflected light is then filtered by the eye's cone functions.

 See Fig. 4.5 on the next slide.

Fig. 4.5: Image formation model.
 Camera Systems

 Camera systems are made in a similar fashion; a good camera has


three signals produced at each pixel location (corresponding to a
retinal position).

 Analog signals are converted to digital, truncated to integers, and


stored. If the precision used is 8-bit, then the maximum value for
any of R, G, B is 255, and the minimum is 0.


1. Instead of red, green, and blue primaries, we need primaries
that amount to -red, -green, and -blue; i.e., we need to
subtract R, or G, or B.
2. These subtractive color primaries are Cyan (C ), Magenta
(M ) and Yellow (Y ) inks.
Fig. 4.15: RGB and CMY color cubes. The RGB cube runs from Black (0, 0, 0)
to White (1, 1, 1); the CMY cube runs from White (0, 0, 0) to Black (1, 1, 1).
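As a small illustration of this subtractive relationship (a sketch only, assuming R, G, B values normalized to the 0-1 range shown on the cubes), converting between the two cubes is just a subtraction from white:

def rgb_to_cmy(r, g, b):
    # Subtractive primaries are the complements of the additive ones:
    # cyan absorbs red, magenta absorbs green, yellow absorbs blue.
    return (1.0 - r, 1.0 - g, 1.0 - b)

def cmy_to_rgb(c, m, y):
    return (1.0 - c, 1.0 - m, 1.0 - y)

# Example: pure red in RGB corresponds to (0, 1, 1) in CMY.
print(rgb_to_cmy(1.0, 0.0, 0.0))   # -> (0.0, 1.0, 1.0)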


52 Li & Drew c Prentice Hall 2003

• Fig. 4.16 shows the color combinations that result from combining
the primary colors available in the two situations: additive color
and subtractive color.

Fig. 4.16: Additive and subtractive color. (a): RGB is used to
specify additive color. (b): CMY is used to specify subtractive color.

55 Li & Drew c Prentice Hall 2003


More Color Coordinate Schemes

• Beware: whether or not gamma correction has been applied is usually ignored.

• Schemes include:
a) HSL — Hue, Saturation and Lightness;

b) HSV — Hue, Saturation and Value;

c) HSI — Hue, Saturation and Intensity;

d) HCI — C=Chroma;

e) HVC — V=Value;

f) HSD — D=Darkness.
Chapter 5

Fundamental Concepts in
Video
5.1. Introduction
• Video is a series of images.
• When this series of images is displayed on screen at high speed
(e.g., 30 images per second), we perceive motion.
• These single images are called frames. The rate at which the
frames are projected is generally between 24 and 30 frames per
second (fps).
• The rate at which these images are presented is referred to as the
frame rate.
• This is fundamental to the way video is modeled in computers.
• Psychophysical studies have shown that a rate of about 30 frames
per second is good enough to simulate smooth motion.
– Old Charlie Chaplin movies were shot at 12 frames per second and are visibly jerky.
…cont’d
• Each screen-full of video is made up of thousands of pixels.
• A pixel is the smallest unit of an image. A pixel can display
only one color at a time.
• A standard-definition television frame has 720 columns of pixels (left to right)
and 486 rows of pixels (top to bottom), a total of 349,920
pixels (720 × 486) for a single frame.
• There are two types of video:
– Analog Video
– Digital Video
5.1.1.Analog Video
• Analog technology requires information representing images and sound to
be carried as a real-time, continuously varying electric signal between sources and
receivers.
• It is used throughout the television industry.
• For television, images and sound are converted into electric signals by
transducers.
• Distortion of images and noise are common problems for analog video.
• In an analogue video signal, each frame is represented by a fluctuating
voltage signal.
• This is known as an analogue waveform.
• One of the earliest formats for this was composite video.
• Analog formats are susceptible to loss due to transmission noise effects.
• Quality loss is also possible from one generation to another.
• This type of loss is like photocopying, in which a copy of a copy is never as
good as the original.
5.1.2. Digital Video
• Digital technology is based on images represented in the form
of bits.
• A digital video signal is actually a pattern of 1's and 0's that
represent the video image.
• With a digital video signal, there is no variation in the original
signal once it is captured on to computer disc
• Therefore, the image does not lose any of its original
sharpness and clarity.
• The image is an exact copy of the original.
• A computer is the most common form of digital technology.
…cont’d
• The limitations of analog video led to the birth of digital video.
• Digital video is just a digital representation of the analogue video signal.
• Unlike analogue video that degrades in quality from one generation to the
next, digital video does not degrade.
• Each generation of digital video is identical to the parent. Even though the
data is digital, virtually all digital formats are still stored on sequential
tapes.
• There are two significant advantages for using computers for digital video :
– The ability to randomly access the storage of video and
– compress the video stored.
• Computer-based digital video is defined as a series of individual images
and associated audio.
• These elements are stored in a format in which both elements (pixel and
sound sample) are represented as a series of binary digits (bits).
Advantages:
 Direct random access, good for nonlinear video editing
 No problem with repeated recording
 No need for blanking and sync pulse
 Almost all digital video uses component video
• Analog vs. Digital Video
• An analog video can be very similar to the original video copied, but
it is not identical.
• Digital copies will always be identical and will not lose their
sharpness and clarity over time.
• However, digital video is limited by the amount of RAM and storage
available, whereas this is not a factor with analog video.
• Digital technology allows for easy editing and enhancing of videos.
• Storage of the analog video tapes is much more cumbersome than
digital video CDs.
• Clearly, with new technology continuously emerging, this debate
will always be changing.
5.2. Displaying Video
• There are two ways of displaying video on screen:
– Progressive scan
– Interlaced scan
5.2.1. Interlaced Scanning
• It writes every second line of the picture during a scan, and
writes the other half during the next sweep.
• Doing that, we still need only 25/30 pictures per second.
• This idea of splitting up the image into two parts became
known as interlacing, and the split-up pictures became known as fields.
• Graphically, a field is basically a picture with every second
line missing (blank).
• Here is an image that shows interlacing so that you can better
imagine what happens.
…cont’d
5.2.2. Progressive Scanning
• PC CRT displays are fundamentally different from TV screens.
• Monitor writes a whole picture per scan.
• Progressive scan updates all the lines on the screen in each
pass, typically 60 times every second.
• This is known as progressive scanning. Today all PC screens
write a picture like this.

Comparison of computer and television display.

Computer:
• Scans 480 horizontal lines from top to bottom
• Scans each line progressively
• Scans a full frame at a rate of typically 66.67 Hz or higher
• Uses the RGB color model

Television:
• Scans 525 or 625 horizontal lines
• Scans lines using the interlacing system
• Scans at 25-30 Hz for a full frame
• Uses a limited color palette and restricted luminance (lightness or darkness)
5.3. Recording Video
• A CCD (Charge-Coupled Device) is a chip containing a series of tiny, light-
sensitive photosites.
• It forms the heart of all electronic and digital cameras.
• CCDs can be thought of as film for electronic cameras.
• CCDs consist of thousands or even millions of cells, each of which is light-
sensitive and capable of producing varying amounts of charge in response
to the amount of light it receives.
• A digital camera uses a lens which focuses the image onto a Charge-Coupled
Device (CCD), which then converts the image into electrical pulses.
• These pulses are then saved into memory.
• In short, Just as the film in a conventional camera records an image when
light hits it, the CCD records the image electronically.
• The photosites convert light into electrons.
• The electrons pass through an analog-to-digital converter, which produces a
file of encoded digital information in which bits represent the color and
tonal values of a subject.
• The performance of a CCD is often measured by its output resolution,
which in turn is a function of the number of photosites on the CCD's
surface.
5.4 Video Display Interfaces
• We now discuss the interfaces for video signal transmission
from some output devices (e.g., set-top box, video player,
video card, etc.) to a video display (e.g., TV, monitor,
projector, etc.).
• There have been a wide range of video display interfaces,
supporting video signals of different formats (analog or
digital, interlaced or progressive), different frame rates, and
different resolutions
• We start our discussion with
– analog interfaces, including Component Video, Composite
Video, and S-Video,
– and then digital interfaces, including DVI, HDMI, and
DisplayPort.

5.4.1 Analog Display Interfaces
• Analog video signals are often transmitted in one of three different
interfaces:
– Component video,
– Composite video, and
– S-video.
• Figure 5.7 shows the typical connectors for them

Fig. 5.7 Connectors for typical analog display interfaces. From left to right: Component
video, Composite video, S-video, and VGA

5.4.1 Analog Display Interfaces
• Component Video
• Higher end video systems, such as for studios, make use of three
separate video signals for the red, green, and blue image planes.
• This is referred to as component video.
• This kind of system has three wires (and connectors) connecting the
camera or other devices to a TV or monitor.

5.4.1 Analog Display Interfaces
• Composite Video
• When connecting to TVs or VCRs, composite video uses only
one wire (and hence one connector, such as a BNC connector
at each end of a coaxial cable or an RCA plug at each end of
an ordinary wire), and video color signals are mixed, not sent
separately.
• The audio signal is another addition to this one signal.

…cont’d
• S-Video
• As a compromise, S-video (separated video, or super-video,
e.g., in S-VHS) uses two wires: one for luminance and
another for a composite chrominance signal.
• The reason for placing luminance into its own part of the
signal is that black-and white information is most important
for visual perception.
• As noted in the previous chapter, humans are able to
differentiate spatial resolution in the grayscale ("black-and-white")
part much better than in the color part of RGB images.
• Therefore, the color information transmitted can be much less
accurate than the intensity information.
• We can see only fairly large blobs of color, so it makes sense
to send less color detail.
5.4.1 Analog Display Interfaces
• Video Graphics Array (VGA)
• The Video Graphics Array (VGA) is a video display interface that
was first introduced by IBM in 1987, along with its PS/2 personal
computers. It has since been widely used in the computer industry
with many variations, which are collectively referred to as VGA.
• The initial VGA resolution was 640×480 pixels.
• The VGA video signals are based on analog component RGBHV
(red, green, blue, horizontal sync, vertical sync).

5.4.2 Digital Display Interfaces
• Given the rise of digital video processing and the monitors that
directly accept digital video signals, there is a great demand toward
video display interfaces that transmit digital video signals.
• Such interfaces emerged in the 1980s (e.g., the Color Graphics Adapter
(CGA)).
• Today, the most widely used digital video interfaces include Digital
Visual Interface (DVI), High-Definition Multimedia Interface
(HDMI), and Display Port, as shown in Fig. 5.8.

Fig. 5.8 Connectors of different digital display interfaces. From left to right: DVI,
HDMI, DisplayPort

5.4.2 Digital Display Interfaces
• Digital Visual Interface (DVI)
• Digital Visual Interface (DVI) was developed by the Digital
Display Working Group (DDWG) for transferring digital video
signals, particularly from a computer’s video card to a monitor.
• It carries uncompressed digital video and can be configured to
support multiple modes, including DVI-D (digital only), DVI-A
(analog only), or DVI-I (digital and analog).
• The support for analog connections makes DVI backward
compatible with VGA (though an adapter is needed between the
two interfaces).
• The DVI allows a maximum 16:9 screen resolution of 1920×1080
pixels.

5.4.2 Digital Display Interfaces
• High-Definition Multimedia Interface (HDMI)
• HDMI is a newer digital audio/video interface developed to be
backward-compatible with DVI.
• HDMI, however, differs from DVI in the following aspects:
1. HDMI does not carry analog signal and hence is not compatible with
VGA.
2. DVI is limited to the RGB color range (0–255).
3. HDMI supports digital audio, in addition to digital video.
• The HDMI allows a maximum screen resolution of 2560×1600
pixels.

5.4.2 Digital Display Interfaces
• Display Port
• Display Port is a digital display interface. It is the first display
interface that uses packetized data transmission, like the Internet or
Ethernet
• Display Port can achieve a higher resolution with fewer pins than
the previous technologies.
• The use of data packets also allows Display Port to be extensible,
i.e., new features can be added over time without significant
changes to the physical interface itself.
• Display Port can be used to transmit audio and video
simultaneously, or either of them.
• Compared with HDMI, Display Port has slightly more bandwidth,
which also accommodates multiple streams of audio and video to
separate devices.

5.5. Video Broadcasting Standards/ TV
standards
• There are three different video broadcasting standards: PAL, NTSC,
and SECAM
PAL (Phase Alternate Line)
• PAL uses 625 horizontal lines at a field rate of 50 fields per second
(or 25 frames per second).
• Only 576 of these lines are used for picture information with the
remaining 49 lines used for sync or holding additional information
such as closed captioning. It is used in Australia, New Zealand,
United Kingdom, and Europe.
– Scans 625 lines per frame, 25 frames per second (40 msec/frame)
– Interlaced, each frame is divided into 2 fields, 312.5 lines/field
– For color representation, PAL uses YUV (YCbCr) color model
– In PAL, 5.5 MHz is allocated to Y, 1.8 MHz each to U and V
…cont’d
SECAM (Sequential Color with Memory)
• SECAM uses the same bandwidth as PAL but transmits the color
information sequentially.
• It is used in France, Eastern Europe, etc. SECAM (Systeme Electronique Pour
Couleur Avec Memoire) is very similar to PAL.
• It specifies the same number of scan lines and frames per second. It is the
broadcast standard for France, Russia, and parts of Africa and Eastern
Europe.
NTSC (National Television Standards Committee)
• NTSC is a black-and-white and color compatible 525-line system that
scans a nominal 30 interlaced television picture frames per second.
• Used in USA, Canada, and Japan.
• 525 scan lines per frame, 30 frames per second (or, to be exact, 29.97 fps,
33.37 msec/frame)
• Interlaced, each frame is divided into 2 fields, 262.5 lines/field
• 20 lines reserved for control information at the beginning of each field (Fig.
38)
• So a maximum of 485 lines of visible data
NTSC Video Scan Line
• Each line takes 63.5 microseconds to scan. Horizontal retrace
takes 10 microseconds (with 5 microseconds horizontal synch
pulse embedded), so the active line time is 53.5 microseconds.
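As a rough check of these numbers (assuming 525 total lines per frame and the exact 29.97 fps rate), the line time follows directly from the scan parameters:

time per line = 1 / (29.97 frames/s × 525 lines/frame) ≈ 1 / 15,734 lines/s ≈ 63.56 microseconds,

which is the 63.5 µs figure quoted above; subtracting the 10 µs of horizontal retrace leaves roughly 53.5 µs of active line time.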
• NTSC Video Color Representation/Compression
• For color representation, NTSC uses YIQ color model.
• Basic Compression Idea
• Eye is most sensitive to Y, next to I, next to Q.
• This is still Analog Compression:
• In NTSC,
– 4 MHz is allocated to Y,
– 1.5 MHz to I,
– 0.6 MHz to Q.
5.6. High-Definition TV
• The introduction of wide-screen movies brought the discovery that
viewers seated near the screen enjoyed a level of participation
(sensation of immersion) not experienced with conventional
movies.
• Apparently the exposure to a greater field of view, especially the
involvement of peripheral vision, contributes to the sense of "being
there."
• The main thrust of High-Definition TV (HDTV) is not to increase
the "definition" in each unit area, but rather to increase the visual
field, especially its width.
• First-generation HDTV was based on an analog technology
developed by Sony and NHK in Japan in the late 1970s.

5.7. High-Definition TV

• MUltiple sub-Nyquist Sampling Encoding (MUSE) was an


improved NHK HDTV with hybrid analog/digital technologies that
was put in use in the 1990s.
• It has 1,125 scan lines, interlaced (60 fields per second), and a 16:9
aspect ratio. (compare with NTSC 4:3 aspect ratio, see slide 8)
• In 1987, the FCC decided that HDTV standards must be compatible
with the existing NTSC standard and must be confined to the
existing Very High Frequency (VHF) and Ultra High Frequency
(UHF) bands.

5.8. Ultra High Definition TV (UHDTV)
• UHDTV is a new development—a new generation of HDTV!
• The standards were announced in 2012.
• The aspect ratio is 16:9.
• The supported frame rate has been gradually increased to 120 fps.
Four Factors of Digital Video
• With digital video, four factors have to be kept in mind.
• These are :
– Frame rate
– Spatial Resolution - "How big is the picture?".
– Color Resolution - refers to the number of colors displayed on the screen at one
time. Computers deal with color in an RGB (red-green-blue) format, while
video uses a variety of formats. One of the most common video formats is
called YUV.
– Image Quality

Chapter 6

Basics of Digital Audio


Sampling Audio
Analog Audio
…cont’d
• A wave has three characteristics:
• Amplitude - the intensity of the signal. This can be
determined by looking at the height of the signal. If amplitude
increases, the sound becomes louder. Amplitude measures
how high or low the voltage of the signal is at a given point in
time.
• Frequency - the number of times the wave cycle is repeated.
This can be determined by counting the number of cycles in a
given time interval. Frequency is related to the pitch of the
sound: increased frequency means higher pitch.
• Phase - related to the wave's appearance (its position in time relative to a reference point).
Analog to Digital Conversion

• Converting an analog audio signal to digital audio requires that the analog signal
is sampled. Sampling is the process of taking periodic measurements of the
continuous signal. Samples are taken at a regular time interval, i.e., every T
seconds; the number of samples taken per second (1/T) is called the sampling
frequency or sampling rate. Digitized audio is sampled audio: many times each
second, the analog signal is sampled. How often these samples are taken is
referred to as the sampling rate; the amount of information stored about each
sample is referred to as the sample size.
• Analog signal is represented by amplitude and frequency. Converting these
waves to digital information is referred to as digitizing. The challenge is to
convert the analog waves to numbers (digital information).
…cont’d
• In digital form, the measure of amplitude (the 7 point scale -
vertically) is represented with binary numbers (bottom of graph).
• The more numbers on the scale the better the quality of the
sample, but more bits will be needed to represent that sample.
• The graph below only shows 3 bits being used for each sample,
but in reality either 8 or 16 bits will be used to create all the
levels of amplitude on a scale. (Music CDs use 16 bits for each
sample.)
…cont’d
• In digital form, the measure of frequency is referred to as how often the
sample is taken. In the graph below the sample has been taken 7 times
(reading across). Sampling frequency is usually given in kilohertz (kHz).
• Hertz (Hz) = number of cycles per second
– 1 kHz = 1000 Hz; 1 MHz = 1000 kHz
• Music CDs use a sampling frequency of 44.1 kHz. A frequency of 22 kHz, for
example, would mean that the sample was taken less often.
• Sampling means measuring the value of the signal at given time intervals.
The samples are then quantized.
• Quantization is rounding the value of each sample to the nearest amplitude
number on the scale. For example, if the amplitude of a specific sample is 5.6,
it should be rounded either up to 6 or down to 5.
• Quantization is thus assigning a value (from a fixed set) to each sample. The
quantized values are then changed to binary patterns, and the binary patterns
are stored in the computer.
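A minimal sketch of this sample-and-quantize step in Python; the 440 Hz test tone and the 8 kHz rate are illustrative assumptions, while the 3-bit resolution mirrors the graph described above:

import math

def sample_and_quantize(signal, sample_rate, duration, bits):
    # Take one measurement every 1/sample_rate seconds and round it
    # to the nearest of the 2**bits available amplitude levels.
    levels = 2 ** bits
    samples = []
    for n in range(round(duration * sample_rate)):
        t = n / sample_rate
        value = signal(t)                         # amplitude assumed in [-1, 1]
        samples.append(round((value + 1) / 2 * (levels - 1)))
    return samples

# Example: a 440 Hz sine tone sampled at 8 kHz with 3-bit resolution.
tone = lambda t: math.sin(2 * math.pi * 440 * t)
print(sample_and_quantize(tone, 8000, 0.001, 3))  # eight quantized samples, each 0-7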
Sample Rate
• A sample is a single measurement of amplitude.
• The sample rate is the number of these measurements taken
every second.
• In order to accurately represent all of the frequencies in a
recording that fall within the range of human perception,
generally accepted as 20 Hz-20 kHz, we must choose a sample
rate high enough to represent all of these frequencies. At first
consideration, one might choose a sample rate of 20 kHz, since
this is identical to the highest frequency. This will not work,
however, because every cycle of a waveform has both a
positive and a negative amplitude, and it is the rate of alternation
between positive and negative amplitudes that determines
frequency. Therefore, we need at least two samples for every
cycle, resulting in a sample rate of at least 40 kHz.
Sampling Theorem
• Sampling frequency/rate is very important in order to
accurately reproduce a digital version of an analog waveform.
• Nyquist's Theorem: the sampling frequency for a signal
must be at least twice the highest frequency component in the
signal: sample rate ≥ 2 × highest frequency.
• When the sampling rate is lower than the Nyquist
rate, the condition is called undersampling.
• According to the sampling theorem, it is impossible to rebuild the
original signal when such a sampling rate is used.
Common Sampling Rates
• 8 kHz: used for telephone
• 11.025 kHz: speech audio
• 22.05 kHz: low-grade audio (WWW audio, AM radio)
• 44.1 kHz: CD-quality audio
Sample Resolution/Sample Size
• Each sample can only be measured to a certain degree of
accuracy.
• The accuracy is dependent on the number of bits used to
represent the amplitude, which is also known as the sample
resolution.
• How do we store each sample value (quantized value)?
– 8-bit value (0-255)
– 16-bit value (integer) (0-65535)
…cont’d
• The amount of memory required to store t seconds long sample is
as follows:
– If we use 8 bit resolution, mono recording =>memory = f*t*8*1
– If we use 8 bit resolution, stereo recording =>memory = f*t*8*2
– If we use 16 bit resolution, and mono recording =>memory = f*t*16*1
– If we use 16 bit resolution, and stereo recording =>memory =f* t*16*2
– where f is sampling frequency, and t is time duration in seconds
• Examples: Abebe sampled audio for 10 seconds. How much
storage space is required if
– a) 22.05 KHz sampling rate is used, and 8 bit resolution with mono
recording?
– b) 44.1 KHz sampling rate is used, and 8 bit resolution with mono
recording?
– c) 44.1 KHz sampling rate is used, 16 bit resolution with stereo recording?
– d) 11.025 KHz sampling rate, 16 bit resolution with stereo recording?
Solution
• a) m = 22050*8*10*1 = 1,764,000 bits = 220,500 bytes = 220.5 KB
• b) m = 44100*8*10*1 = 3,528,000 bits = 441,000 bytes = 441 KB
• c) m = 44100*16*10*2 = 14,112,000 bits = 1,764,000 bytes = 1764 KB
• d) m = 11025*16*10*2 = 3,528,000 bits = 441,000 bytes = 441 KB
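For reference, the same calculation can be wrapped in a small Python helper (a sketch; it multiplies sampling frequency, resolution, duration, and channel count, and converts with 1 KB = 1000 bytes, as the solution above does):

def audio_storage_kb(sample_rate_hz, bits_per_sample, seconds, channels):
    # memory (bits) = f * t * resolution * channels, as in the formulas above
    bits = sample_rate_hz * bits_per_sample * seconds * channels
    return bits / 8 / 1000            # bits -> bytes -> kilobytes (1 KB = 1000 bytes)

print(audio_storage_kb(22050, 8, 10, 1))    # a) 220.5 KB
print(audio_storage_kb(44100, 8, 10, 1))    # b) 441.0 KB
print(audio_storage_kb(44100, 16, 10, 2))   # c) 1764.0 KB
print(audio_storage_kb(11025, 16, 10, 2))   # d) 441.0 KB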
The End
Chapter 7
Data Compression
Introduction
Data compression is often referred to as coding, where coding is a very general term encompassing
any special representation of data which satisfies a given need.

Definition: Data compression is the process of encoding information using fewer bits so
that it takes less memory (storage) or bandwidth during transmission.
Two types of compression:
 Lossy data compression
 Lossless data compression

Lossless Data Compression: in lossless data compression, the original content of the data is not
lost/changed when it is compressed (encoded).

Examples:
RLE (Run Length Encoding)
Dictionary Based Coding
Arithmetic Coding

Lossy data compression: the original content of the data is lost to certain degree when compressed.
Part of the data that is not much important is discarded/lost. The loss factor determines whether there
is a loss of quality between the original image and the image after it has been compressed and played
back (decompressed). The more compression, the more likely that quality will be affected. Even if
the quality difference is not noticeable, these are considered lossy compression methods.

Examples
JPEG (Joint Photographic Experts Group)
MPEG (Moving Pictures Expert Group)
ADPCM

Information Theory

Information theory is defined to be the study of efficient coding and its consequences. It is the field
of study concerned with the storage and transmission of data. It is concerned with source coding
and channel coding.
Source coding: involves compression
Channel coding: how to transmit data, how to overcome noise, etc.

Data compression may be viewed as a branch of information theory in which the primary objective
is to minimize the amount of data to be transmitted.
Fig Information coding and transmission

Need for Compression


With more colors, higher resolution, and faster frame rates, you produce better quality video, but you
need more computer power and more storage space for your video. Doing some simple calculations
(see below), it can be shown that 24-bit color video at 640 by 480 resolution and 30 fps
requires an astonishing 26 megabytes of data per second! Not only does this surpass the capabilities
of many home computer systems, but it also overburdens existing storage systems.

640 horizontal resolution
× 480 vertical resolution
= 307,200 total pixels per frame
× 3 bytes per pixel
= 921,600 total bytes per frame
× 30 frames per second
= 27,648,000 total bytes per second
÷ 1,048,576 to convert to megabytes
= 26.36 megabytes per second!
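The same arithmetic in a short Python sketch (the resolution, 3 bytes per pixel, and 30 fps are the example values above; 1 MB is taken as 1,048,576 bytes):

def raw_video_rate_mb_per_s(width, height, bytes_per_pixel, fps):
    bytes_per_frame = width * height * bytes_per_pixel     # 921,600 bytes for 640x480x3
    bytes_per_second = bytes_per_frame * fps                # 27,648,000 bytes at 30 fps
    return bytes_per_second / 1048576                       # convert to megabytes

print(raw_video_rate_mb_per_s(640, 480, 3, 30))             # -> about 26.4 MB per second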

The calculation shows that the space required for video is excessive. For video, the way to reduce this
amount of data to a manageable level is to compromise on the quality of the video to some extent.
This is done by lossy compression, which discards some of the original data.

Compression Algorithms

Compression methods use mathematical algorithms to reduce (or compress) data by eliminating,
grouping and/or averaging similar data found in the signal. Although there are various
compression methods, including Motion JPEG, only MPEG-1 and MPEG-2 are internationally
recognized standards for the compression of moving pictures (video).

A simple characterization of data compression is that it involves transforming a string of characters


in some representation (such as ASCII) into a new string (of bits, for example) which contains the
same information but whose length is as small as possible. Data compression has important
application in the areas of data transmission and data storage.
The proliferation of computer communication networks is resulting in massive transfer of data over
communication links. Compressing data to be stored or transmitted reduces storage and/or
communication costs. When the amount of data to be transmitted is reduced, the effect is that of
increasing the capacity of the communication channel.

Lossless compression is a method of reducing the size of computer files without losing any
information. That means when you compress a file, it will take up less space, but when you
decompress it, it will still have exactly the same information. The idea is to get rid of any redundancy
in the information; this is exactly what is used in ZIP and GIF files. This differs from lossy
compression, such as in JPEG files, which loses some information that is not very noticeable. Why
use lossless compression?

You can use lossless compression whenever space is a concern but the information must stay the
same. An example is when sending text files over a modem or the Internet. If the files are smaller,
they will get there faster. However, the files received at the destination must be identical to those
sent. Modems use LZW compression automatically to speed up transfers.

There are several popular algorithms for lossless compression. There are also variations of most of
them, and each has many implementations. Here is a list of the families, their variations, and the file
types where they are implemented:

Family | Variations | Used in
Run-Length | none | –
Huffman | Adaptive Huffman, Shannon-Fano | MNP5, COMPACT, SQ
Arithmetic | none | –
LZ78 (Lempel-Ziv 1978) | LZW (Lempel-Ziv-Welch) | GIF, v.42bis, compress, ZIP
LZ77 (Lempel-Ziv 1977) | LZFG | ARJ, LHA
Table lossless coding algorithm families and variations

Variable Length Encoding


Claude Shannon and R.M. Fano created one of the first compression algorithms in the late 1940s. This algorithm
assigns a variable number of bits to letters/symbols.

Shannon-Fano Coding
Let us assume the source alphabet S={X1, X2, X3, …, Xn} and
associated probabilities P={P1, P2, P3, …, Pn}.
The steps to encode data using the Shannon-Fano coding algorithm are as follows:
Order the source letters into a sequence according to their probability of occurrence, in non-increasing
(i.e., decreasing) order.
ShannonFano(sequence s)
If s has two letters
Attach 0 to the codeword of one letter and 1 to the codeword of the other;
Else if s has more than two letters
Divide s into two subsequences S1, and S2 with the minimal difference between
probabilities of each subsequence;
extend the codeword for each letter in S1 by attaching 0, and by attaching 1 to each
codeword for letters in S2;
ShannonFano(S1);
ShannonFano(S2);

Example: Suppose the following source and with related probabilities


S={A,B,C,D,E}
P={0.35,0.17,0.17,0.16,0.15}
Message to be encoded="ABCDE"

The probability is already arranged in non-increasing order. First we divide the message into AB and
CDE. Why? This gives the smallest difference between the total probabilities of the two groups.
S1={A,B} P={0.35,0.17}=0.52
S2={C,D,E} P={0.17,0.17,0.16}=0.46
The difference is only 0.52-0.46=0.06. This is the smallest possible difference when we divide the
message.
Attach 0 to S1 and 1 to S2.
Subdivide S1 into sub groups.
S11={A} attach 0 to this
S12={B} attach 1 to this

Again subdivide S2 into subgroups considering the probability again.


S21={C} P={0.17}=0.17
S22={D,E} P={0.16,0.15}=0.31
Attach 0 to S21 and 1 to S22. Since S22 has more than one letter in it, we have to subdivide it.
S221={D} attach 0
S222={E} attach 1

Fig Shannon-Fano coding tree

The message is transmitted using the following code (by traversing the tree)
A=00 B=01
C=10 D=110
E=111

Instead of transmitting ABCDE, we transmit 000110110111.
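A compact Python sketch of this procedure (the recursive split chooses the division with the smallest probability difference; the function and variable names are ours for illustration, not from any standard library):

def shannon_fano(symbols):
    # symbols: list of (letter, probability) pairs, sorted in non-increasing order
    codes = {letter: "" for letter, _ in symbols}

    def split(seq):
        if len(seq) < 2:
            return
        total = sum(p for _, p in seq)
        run, best_i, best_diff = 0.0, 1, float("inf")
        for i in range(1, len(seq)):          # find the split point with the
            run += seq[i - 1][1]              # smallest probability difference
            diff = abs(run - (total - run))
            if diff < best_diff:
                best_diff, best_i = diff, i
        left, right = seq[:best_i], seq[best_i:]
        for letter, _ in left:
            codes[letter] += "0"
        for letter, _ in right:
            codes[letter] += "1"
        split(left)
        split(right)

    split(symbols)
    return codes

codes = shannon_fano([("A", 0.35), ("B", 0.17), ("C", 0.17), ("D", 0.16), ("E", 0.15)])
print(codes)                                   # A=00 B=01 C=10 D=110 E=111
print("".join(codes[ch] for ch in "ABCDE"))    # 000110110111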

Dictionary Encoding

Dictionary coding uses groups of symbols, words, and phrases with corresponding abbreviation. It
transmits the index of the symbol/word instead of the word itself. There are different variations of
dictionary based coding:
LZ77 (printed in 1977)
LZ78 (printed in 1978)
LZSS
LZW (Lempel-Ziv-Welch)

LZW Compression
LZW compression has its roots in the work of Jacob Ziv and Abraham Lempel. In 1977, they
published a paper on "sliding-window" compression, and followed it with another paper in 1978 on
"dictionary" based compression. These algorithms were named LZ77 and LZ78, respectively. Then
in 1984, Terry Welch made a modification to LZ78 which became very popular and was called
LZW.

The Concept
Many files, especially text files, have certain strings that repeat very often, for example " the ". With
the spaces, the string takes 5 bytes, or 40 bits to encode. But what if we were to add the whole string
to the list of characters? Then every time we came across " the ", we could send the code instead of
32,116,104,101,32. This would take fewer bits.

This is exactly the approach that LZW compression takes. It starts with a dictionary of all the single
characters, with indexes 0-255. It then starts to expand the dictionary as information gets sent through.
Then, redundant strings will be coded, and compression has occurred.

The Algorithm:

LZWEncoding()
Enter all letters to the dictionary;
Initialize string s to the first letter from the input;
While any input is left
read symbol c;
if s+c exists in the dictionary
s = s+c;
else
output codeword(s); //codeword for s
enter s+c to dictionary;
s =c;
end loop
output codeword(s);

Example: encode the following string "aababacbaacbaadaa"

The program reads one character at a time. If the code is in the dictionary, then it adds the character
to the current work string and waits for the next one. This occurs on the first character as well. If the
work string is not in the dictionary (such as when the second character comes across), it adds the
work string to the dictionary and sends over the wire the work string without the new character. It
then sets the work string to the new character.

Example:
Encode the message aababacbaacbaadaa using the above algorithm.

Encoding
Create dictionary of letters found in the message
Encoder Dictionary
Input Output Index Entry
1 a
2 b
3 c
4 d

S is initialized to the first letter of message a (s=a)


Read symbol to c, and the next symbol is a (c=a)
Check if s+c (s+c=aa) is found in the dictionary (the one created above in step 1). It is not found.
So add s+c(s+c=aa) to dictionary and output codeword for s(s=a). The code for a is 1 from the
dictionary.
Then initialize s to c (s=c=a).

Encoder Dictionary
Input(s+c) Output Index Entry
1 a
2 b
3 c
4 d
aa 1 5 aa

Read the next letter from message to c (c=b)


Check if s+c (ab) is found in the dictionary. It is not found. Then, add s+c (s+c=ab) into the dictionary
and output the code for s (s=a). The codeword is 1. Then initialize s to c (s=c=b).
Encoder Dictionary
Input(s+c) Output Index Entry
1 a
2 b
3 c
4 d
aa 1 5 aa
ab 1 6 ab

Read the next letter to c (c=a).


Check if s+c (s+c=ba) is found in the dictionary. It is not found. Then add s+c (s+c=ba) to the
dictionary. Then output the codeword for s (s=b). It is 2. Then initialize s to c (s=c=a).

Encoder Dictionary
Input(s+c) Output Index Entry
1 a
2 b
3 c
4 d
aa 1 5 aa
ab 1 6 ab
ba 2 7 ba

Read the next letter to c (c=b). Then check if s+c (s+c=ab) is found in the dictionary. It is there.
Then initialize s to s+c (s=s+c=ab).

Read again the next letter to c (c=a). Then check if s+c (s+c=aba) is found in the dictionary. It is not
there. Then add s+c (s+c=aba) to the dictionary and transmit the codeword for s (s=ab). The code is 6. Initialize s to c (s=c=a).

Encoder Dictionary
Input(s+c) Output Index Entry
1 a
2 b
3 c
4 d
aa 1 5 aa
ab 1 6 ab
ba 2 7 ba
aba 6 8 aba

Again read the next letter to c and continue the same way till the end of message. At last you will
have the following encoding table.
Encoder Dictionary
Input(s+c) Output Index Entry
1 a
2 b
3 c
4 d
aa 1 5 aa
ab 1 6 ab
ba 2 7 ba
aba 6 8 aba
ac 1 9 ac
cb 3 10 cb
baa 7 11 baa
acb 9 12 acb
baad 11 13 baad
da 4 14 da
aa 5 – – (end of message: the codeword for the final s is output and no new entry is added)
Table encoding string

Now, instead of the original message, you transmit the dictionary indexes: 1 1 2 6 1 3 7 9 11 4 5. The code for the
message is 112613791145.
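The walk-through above can be condensed into a short Python sketch (dictionary indexes start at 1 to match the tables; the function name is ours):

def lzw_encode(message, alphabet):
    # Start with one dictionary entry per source letter (indexes 1, 2, 3, ...).
    dictionary = {letter: i + 1 for i, letter in enumerate(alphabet)}
    output = []
    s = message[0]
    for c in message[1:]:
        if s + c in dictionary:
            s = s + c                          # keep growing the current string
        else:
            output.append(dictionary[s])       # emit the codeword for s
            dictionary[s + c] = len(dictionary) + 1
            s = c
    output.append(dictionary[s])               # flush the last string
    return output

print(lzw_encode("aababacbaacbaadaa", "abcd"))
# -> [1, 1, 2, 6, 1, 3, 7, 9, 11, 4, 5]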

Decompression
The algorithm:

LZWDecoding()
Enter all the source letters into the dictionary;
Read priorCodeword and output one symbol corresponding to it;
While codeword is still left
read Codeword;
PriorString = string (PriorCodeword);
If codeword is in the dictionary
Enter in dictionary PriorString + firstsymbol(string(codeword));
output string(codeword);
else
Enter in the dictionary priorString +firstsymbol(priorString);
Output priorString+firstsymbol(priorstring);
priorCodeword=codeword;
end loop

The nice thing is that the decompressor builds its own dictionary on its side that exactly matches the
compressor's dictionary, so that only the codes need to be sent.

Example:
Let us decode the message 112613791145 (the index sequence 1 1 2 6 1 3 7 9 11 4 5), which we previously encoded.
We will start with the following table.
Encoder Dictionary
Input(s+c) Output Index Entry
1 a
2 b
3 c
4 d

Read the first code. It is 1. Output the corresponding letter, a.


Encoder Dictionary
Input(s+c) Output Index Entry
1 a
2 b
3 c
4 d
1 a

Read the next code. It is 1 and it is found in the dictionary. So add aa to the dictionary and output a
again.
Encoder Dictionary
Input(s+c) Output Index Entry
1 a
2 b
3 c
4 d
1 a
1 a 5 aa

Read the next code which is 2. It is found in the dictionary. We add ab to dictionary and output b.
Encoder Dictionary
Input(s+c) Output Index Entry
1 a
2 b
3 c
4 d
1 a
1 a 5 aa
2 b 6 ab
Read the next code which is 6. It is found in the dictionary. Add ba to dictionary and output ab
Encoder Dictionary
Input(s+c) Output Index Entry
1 a
2 b
3 c
4 d
1 a
1 a 5 aa
2 b 6 ab
6 ab 7 ba

Read the next code. It is 1. 1 is found in the dictionary. Add aba to the dictionary and output a.
Encoder Dictionary
Input(s+c) Output Index Entry
1 a
2 b
3 c
4 d
1 a
1 a 5 aa
2 b 6 ab
6 ab 7 ba
1 a 8 aba

Read the next code. It is 3 and it is found in the dictionary. Add ac to dictionary and output c.
Continue like this till the end of code is reached. You will get the following table:

Encoder Dictionary
Input(s+c) Output Index Entry
1 a
2 b
3 c
4 d
1 a
1 a 5 aa
2 b 6 ab
6 ab 7 ba
1 a 8 aba
3 c 9 ac
7 ba 10 cb
9 ac 11 baa
11 baa 12 acb
4 d 13 baad
5 aa 14 da
The decoded message is aababacbaacbaadaa
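The matching decoder as a Python sketch (it rebuilds the same dictionary on the fly; the else branch handles the one subtle case of a codeword that has not been entered yet):

def lzw_decode(codewords, alphabet):
    dictionary = {i + 1: letter for i, letter in enumerate(alphabet)}
    prior = dictionary[codewords[0]]
    output = [prior]
    for code in codewords[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:                                   # codeword not in the dictionary yet:
            entry = prior + prior[0]            # it must be prior + its first symbol
        output.append(entry)
        dictionary[len(dictionary) + 1] = prior + entry[0]
        prior = entry
    return "".join(output)

print(lzw_decode([1, 1, 2, 6, 1, 3, 7, 9, 11, 4, 5], "abcd"))
# -> aababacbaacbaadaa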

Huffman Compression

When we encode characters in computers, we assign each an 8-bit code based on an ASCII chart. But
in most files, some characters appear more often than others. So wouldn't it make more sense to
assign shorter codes for characters that appear more often and longer codes for characters that appear
less often? D.A. Huffman published a paper in 1952 that improved the algorithm, and the
appropriately named Huffman coding soon superseded Shannon-Fano coding.

Huffman coding has the following properties:


 Codes for more probable characters are shorter than ones for less probable characters.
 Each code can be uniquely decoded

To accomplish this, Huffman coding creates what is called a Huffman tree, which is a binary tree.

First count the amount of times each character appears, and assign this as a weight/probability to
each character, or node. Add all the nodes to a list.
Then, repeat these steps until there is only one node left:
 Find the two nodes with the lowest weights.
 Create a parent node for these two nodes. Give this parent node a weight of the sum of the two
nodes.
 Remove the two nodes from the list, and add the parent node.
This way, the nodes with the highest weight will be near the top of the tree, and have shorter codes.

Algorithm to create the tree


Assume the source alphabet S={X1, X2, X3, …, Xn} and
associated probabilities P={P1, P2, P3, …, Pn}

Huffman()
For each letter create a tree with single root node and order all trees according to the
probability of letter of occurrence;
while more than one tree is left
take two trees t1, and t2 with the lowest probabilities p1, p2 and create a tree with
probability in its root equal to p1+p2 and with t1 and t2 as its subtrees;
associate 0 with each left branch and 1 with each right branch;
create a unique codeword for each letter by traversing the tree from the root to the leaf containing the
probability corresponding to this letter and putting all encountered 0s and 1s together;

Example: Suppose the following source and related probability


S={a,b,c,d,e}
P={0.15,0.16,0.17,0.17,0.35}
Message="abcde"
Fig Huffman tree

To read the codes from a Huffman tree, start from the root and add a 0 every time you go left to a
child, and add a 1 every time you go right. In this example, the most probable letter, e, ends up
nearest the root and gets the 1-bit code 1, while the less probable letters get 3-bit codes.

Notice that since all the characters are at the leaves of the tree, there is never a chance that one code
will be the prefix of another one (a situation such as a = 01 and b = 011 cannot occur). Hence, this
unique-prefix property assures that each code can be uniquely decoded.

The code for each letter is:


a=000 b=001
c=010 d=011
e=1

The original message will be encoded to:


abcde=0000010100111
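A minimal Huffman-coding sketch in Python using the standard-library heapq priority queue. The left/right choices made while merging are arbitrary, so the exact bit patterns may differ from the tree in the figure, but the code lengths (1 bit for e, 3 bits each for a, b, c, d) and the prefix property are the same:

import heapq

def huffman_codes(probabilities):
    # probabilities: dict mapping letter -> probability
    # Each heap item is (tree probability, tie-breaker, {letter: code so far}).
    heap = [(p, i, {letter: ""}) for i, (letter, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)       # two trees with the lowest probabilities
        p2, _, right = heapq.heappop(heap)
        merged = {letter: "0" + code for letter, code in left.items()}
        merged.update({letter: "1" + code for letter, code in right.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

codes = huffman_codes({"a": 0.15, "b": 0.16, "c": 0.17, "d": 0.17, "e": 0.35})
print(codes)                                    # e gets 1 bit, a-d get 3 bits each
print("".join(codes[ch] for ch in "abcde"))     # 13 bits, the same length as 0000010100111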

To decode a message coded by Huffman coding, a conversion table has to be known by the
receiver. Using this table, a tree can be constructed with the same paths as the tree used for coding.
The leaves store letters instead of probabilities, for efficiency purposes.

The decoder can then use the Huffman tree to decode the string by following the paths according to
the bits of the string and outputting a character every time it reaches a leaf.
Fig Huffman tree

The Algorithm

Move left if you get 0


Move right if you get 1
If you get letter (reach leaf node) output that letter.
Go back and start from root again with the remaining code.

Using this algorithm and the above decoding tree, let us decode the encoded message
0000010100111 at the destination.
0-move left
0-move left again
0-move left again, and we have reached leaf. Output the letter on the leaf node which is a.

Go back to root.
0-move left
0-move left
1-move right, and we have reached the leaf. Output letter on the leaf and it is b.

Go back to root.
0-move left
1-move right
0-move left, and we reach leaf. Output letter found on the leaf which is c.

Go back to root.
0-move left
1-move right
1-move right, and we reach leaf. Output letter on leaf which is d.

Go back to root.
1-move right, and we reach a leaf node. Output the letter on the node, which is e. Now we have
finished, i.e., no more code remains. Display the letters output as the message: abcde.

How can the encoder let the decoder know which particular coding tree has been used? Two ways:
i) Both agree on a particular Huffman tree and both use it for sending any message.
ii) The encoder constructs a Huffman tree afresh every time a new message is sent and sends the
conversion table along with the message. This is more versatile, but has the additional overhead of
sending the conversion table. For large data, however, the advantage outweighs this overhead.

It is also possible to create tree for pairs of letters. This improves performance.

Example: S={x, y, z}, P={0.1, 0.2, 0.7}
To get the probability of pairs (assuming independence), multiply the probabilities of the two letters:
xx=0.1*0.1=0.01
xy=0.1*0.2=0.02
xz=0.1*0.7=0.07
yx=0.2*0.1=0.02
yy=0.2*0.2=0.04
yz=0.2*0.7=0.14
zx=0.7*0.1=0.07
zz=0.7*0.7=0.49
zy=0.7*0.2=0.14
Using these probabilities, you can create Huffman tree of pairs the same way as we did previously.

Arithmetic Coding

The entire data set is represented by a single rational number, whose value is between 0 and 1. This
range is divided into sub-intervals each representing a certain symbol. The number of sub-intervals
is identical to the number of symbols in the current set of symbols and the size is proportional to
their probability of appearance. For each symbol in the original data a new interval division takes
place, on the basis of the last sub-interval.

Algorithm:

ArithmeticEncoding(message)
CurrentInterval=[0,1); //includes 0 but not 1
while the end of message is not reached
read letter Xi from message;
divide the CurrentInterval into SubIntervals IRCurrentInterval;
CurrentInterval=SubIntervali in IRCurrentInterval;
Output bits uniquely identifying CurrentInterval;

Assume the source alphabet s={X1, X2, X3, …, Xn} and associated probabilities
p={p1, p2, p3, …, pn}.

To calculate the sub-intervals of the current interval [L,R), use the following formula:
IR[L,R] = { [L, L+(R-L)*P1), [L+(R-L)*P1, L+(R-L)*P2), [L+(R-L)*P2, L+(R-L)*P3), …,
[L+(R-L)*Pn-1, L+(R-L)*Pn) }

where Pi = p1 + p2 + … + pi (the cumulative probability of the first i letters), and

[L,R) = the current interval for which the sub-intervals are calculated.

Cumulative probabilities are indicated using capital P and single probabilities are indicated using
small p.

Example:
Encode the message abbc# using arithmetic encoding.
s={a,b,c,#}
p={0.4,0.3,0.1,0.2}
At the beginning, CurrentInterval is set to [0,1). Let us calculate the subintervals of [0,1).

First let us get cumulative probability Pi


P1=0.4
P2=0.4+0.3=0.7
P3=0.4+0.3+0.1=0.8
P4=0.4+0.3+0.1+0.2=1

Next calculate subintervals of [0,1) using the formula given above.


IR[0,1]={[0,0+(1-0)*0.4),[0+(1-0)*0.4, 0+(1-0)*0.7), [0+(1-0)*0.7, 0+(1-0)*0.8),
[0+(1-0)*0.8, 0+(1-0)*1)}
IR[0,1]={[0,0.4),[0.4,0.7),[0.7,0.8),[0.8,1)}-- four subintervals

Now the question is, which one of the SubIntervals will be the CurrentInterval? To determine this,
read the first letter of the message. It is a. Look where a is found in the source alphabet: it is found at
the beginning. So the next CurrentInterval will be [0,0.4), which is likewise the first of the
SubIntervals.

Again let us calculate the SubIntervals of CurrentInterval [0,0.4). The cumulative probability does
not change i.e the same as previous.
IR[0,0.4]={[0,0+(0.4-0)*0.4),[ 0+(0.4-0)*0.4, 0+(0.4-0)*0.7),[ 0+(0.4-0)*0.7, 0+(0.4-0)*0.8),
[0+(0.4-0)*0.8, 0+(0.4-0)*1)}
IR[0,0.4]={[0,0.16),[0.16,0.28),[0.28,0.32),[0.32,0.4)}.
Which interval will be the next CurrentInterval? Read the next letter from message. It is b. B is
found in the second place in the source alphabet list. The next CurrentInterval will be the second
SubInterval i.e [0.16,0.28).

Continue like this until no letters are left in the message. You will get the following result:
IR[0.16,0.28]={[0.16,0.208),[0.208,0.244),[0.244,0.256),[0.256,0.28)}. Next
IR[0.208,0.244]={[0.208,0.2224),[0.2224,0.2332),[0.2332,0.2368),[0.2368,0.244)}. Next
IR[0.2332,0.2368]={[0.2332,0.23464),[0.23464,0.23572),[0.23572,0.23608),[0.23608, 0.2368)}.

We are done because no more letters remain in the message. The last letter read was #. It is the
fourth letter in the source alphabet, so take the fourth SubInterval as the CurrentInterval, i.e., [0.23608,
0.2368). Now any number within the last CurrentInterval can be sent as the message, so you can send
0.23608 as the encoded message, or any number between 0.23608 and 0.2368.
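The interval-narrowing loop above can be expressed directly as a floating-point Python sketch (which therefore runs into the precision limit discussed later; names and structure are ours):

def arithmetic_encode(message, alphabet, probs):
    # Cumulative probabilities: cum[i] = p1 + ... + pi, with cum[0] = 0.
    cum = [0.0]
    for p in probs:
        cum.append(cum[-1] + p)
    low, high = 0.0, 1.0
    for letter in message:
        i = alphabet.index(letter)
        width = high - low
        low, high = low + width * cum[i], low + width * cum[i + 1]
    return low, high        # any number in [low, high) identifies the message

print(arithmetic_encode("abbc#", "abc#", [0.4, 0.3, 0.1, 0.2]))
# -> (0.23608, 0.2368), up to floating-point rounding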

Diagrammatically, calculating the SubIntervals looks like this:


Fig sub interval and current interval

Decoding
Algorithm:

ArithmeticDecoding(codeword)
CurrentInterval=[0,1];
While (1)
Divide CurrentInterval into SubIntervals IRcurrentInterval;
Determine the SubIntervali of CurrentInterval to which the codeword belongs;
Output letter Xi corresponding to this SubInterval;
If end of file
Return;
CurrentInterval=SubIntervali in IRcurrentInterval;
End of while

Example:
Decode 0.23608 which we previously encoded.
To decode the source alphabet and related probability should be known by destination. Let us use the
above source and probability.
s={a,b,c,#}
p={0.4,0.3,0.1,0.2}

First set CurrentInterval to [0,1), and then calculate the SubIntervals for it. The formula to calculate the
SubIntervals is the same as for encoding. The cumulative probabilities are:
P1=0.4
P2=0.4+0.3=0.7
P3=0.4+0.3+0.1=0.8
P4=0.4+0.3+0.1+0.2=1
IR[0,1]={[0,0+[1-0]*0.4),[ 0+[1-0]*0.4, 0+[1-0]*0.7),[ 0+[1-0]*0.7, 0+[1-0]*0.8),
[ 0+[1-0]*0.8, 0+[1-0]*1)}
IR[0,1]={[0,0.4),[0.4,0.7),[0.7,0.8),[0.8,1)}. Now check in which SubInterval the encoded message
falls. It falls in the first SubInterval, i.e., [0,0.4). Output the first letter from the source alphabet. It is a. Set
CurrentInterval to [0,0.4).

IR[0,0.4]={[0,0+(0.4-0)*0.4),[ 0+(0.4-0)*0.4, 0+(0.4-0)*0.7),[ 0+(0.4-0)*0.7, 0+(0.4-0)*0.8),


[0+(0.4-0)*0.8, 0+(0.4-0)*1)}
IR[0,0.4]={[0,0.16),[0.16,0.28),[0.28,0.32),[0.32,0.4)}. Again check where 0.23608 falls. It falls in
the second SubInterval i.e [0.16,0.28]. Set CurrentInterval to this SubInterval. Output the second
letter from source alphabet. It is b.

IR[0.16,0.28]={[0.16,0.208),[0.208,0.244),[0.244,0.256),[0.256,0.28)}. 0.23608 falls in the second


SubInterval. Output the second letter from source alphabet. It is b.

IR[0.208,0.244]={[0.208,0.2224),[0.2224,0.2332),[0.2332,0.2368),[0.2368,0.244)}. 0.23608 falls in the third

SubInterval. Output the third letter from the source alphabet. It is c.

IR[0.2332,0.2368]={[0.2332,0.23464),[0.23464,0.23572),[0.23572,0.23608),[0.23608, 0.2368)}.
0.23608 falls in the fourth SubInterval. Output fourth letter which is #. Now end of the message has
been reached.
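And a matching decoder sketch in Python. It stops when the end-of-message symbol # is produced; the test value 0.2365 is chosen safely inside the final interval [0.23608, 0.2368), because the boundary value 0.23608 itself is fragile under floating-point rounding:

def arithmetic_decode(value, alphabet, probs, end_symbol="#"):
    cum = [0.0]
    for p in probs:
        cum.append(cum[-1] + p)
    low, high = 0.0, 1.0
    message = ""
    while True:
        width = high - low
        for i, letter in enumerate(alphabet):       # find the sub-interval
            sub_low = low + width * cum[i]          # that contains the value
            sub_high = low + width * cum[i + 1]
            if sub_low <= value < sub_high:
                message += letter
                low, high = sub_low, sub_high
                break
        if message.endswith(end_symbol):
            return message

print(arithmetic_decode(0.2365, "abc#", [0.4, 0.3, 0.1, 0.2]))   # -> abbc#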

Disadvantage: the arithmetic precision of the computer is soon exhausted, and hence a large message
cannot be encoded directly this way.

Implementation of Arithmetic Coding


To solve the above disadvantage, arithmetic coding is implemented as follows:

Algorithm:

OutputBits()
{
While(1)
If CurrentInterval is within [0,0.5)
Output 0 and bitcount 1s; //"and" here means concatenation
Bitcount=0;
Else if CurrentInterval is within [0.5,1)
Output 1 and bitcount 0s;
Bitcount=0;
Subtract 0.5 from left and right bounds of CurrentInterval;
Else if CurrentInterval is within [0.25,0.75)
Bitcount++;
Subtract 0.25 from both left and right bounds of CurrentInterval;
Else
Break;
Double left and right bounds of CurrentInterval;
}

FinishArithmeticCoding()
{
bitcount++;
if lowerbound of CurrentInterval <0.25
output 0 and bitcount 1s;
else
output 1 and bitcount 0s;
}

ArithmeticEncoding(message)
{
CurrentInterval=[0,1];
Bitcount=0;
While the end of message is not reached
{
read letter Xi from message;
divide CurrentInterval into SubInterval IRCurrentInterval;
CurrentInterval=SubIntervali in IRCurrentInterval;
OutputBits();
}
FinishArithmeticEncoding();
}

Example:
Encode the message abbc#.
s={a,b,c,#}
p={0.4,0.3,0.1,0.2}
CurrentInterval input output bitcount Subintervals
[0,1) a 0 [0,0.4) [0.4,0.7) [0.7,0.8) [0.8, 1)
[0,0.4) 0
[0,0.8) b [0,0.32) [0.32, 0.56) [0.56,0.64) [0.64,0.8)
[0.32,0.56) - 1
[0.14,0.62) b [0.14,0.332) [0.332,0.476) [0.476,0.524) [0.524,0.62)
[0.332,0.476) 01 0
[0.664,0.952) 1
[0.328,0.904) c [0.328,0.5584) [0.5584,0.7312) [.7312,.7888) [.7888,.904)
[.7312,.7888) 1
[.4624,.5776) - 1
[.4248,.6552) - 2
[.3496,.8104) # [.3496,.53392) [.53392,.67216) [.67216,.71824) [.71824,.8104)
[.71824,.8104) 100 0
[.43648,.6208) - 1
[.37296,.7416) - 2
[.24592,.9832) 0111

The final code will be 001111000111 (read from the output column of the table).
