A Lecture Series on
DATA COMPRESSION
JPEG and MPEG Standards
Abdou Youssef (Speaker)
Anastase Nakassis (MDVTG Manager)
1
Motivation for Standards
Why Standards?
{ Compatibility
{ Production cost reduction
{ Triggering growth and product development
Do JPEG/MPEG standards kill research?
{ Not necessarily
{ Those standards are very
exible regarding the encoder
design, leaving much room for improvement
{ The trends are toward even greater
exibility in future
generations of the standards
{ Another growth area is the development of systems which
integrate components that use the standards
2
Image/Video Compression Standards
(Outline)
JPEG
{ Baseline JPEG
{ Extended JPEG
{ Lossless JPEG (DPCM + Human/Arithmetic)
MPEG
{ History (H.261 and H.263)
{ MPEG1
{ MPEG2
{ Why not MPEG3
{ Future: MPEG4 (slated for 1998)
3
The JPEG \Toolkit"
JPEG provides a \toolkit" of techniques for compressing
continuous-tone, still, color and monochrome images
Baseline JPEG provides a DCT-based algorithm, and uses
run-length encoding and Human coding
Baseline JPEG operates only in sequential mode, and is re-
stricted to 8 bits/pixel input images
Extended JPEG oers several optional enhancements:
{ 12-bit/pixel input
{ Progressive transmission
{ Choice between Arithmetic and Human coding
{ Adaptive quantization
{ Tiling
{ Still picture interchange le format (SPIFF)
{ Selective renement
Applications of JPEG
{ Desktop publishing, color fax, photojournalism,
picture archiving and communications systems for med-
ical images, general image archiving systems, consumer
imaging, graphic arts, and others
4
The Baseline JPEG Algorithm
1. It operates on 8 8 blocks of the input image
2. Mean-normalization (subtract 128 from each pixel)
3. Transform: DCT-transform each block
4. Quantization
An 8 8 quantization matrix Q is user-provided
Each block is divided by Q (point by point)
The terms are then rounded to their nearest integers
Remark: Up to 4 quantization matrices per image are
allowed (for example, one for luminance, and for each of
the three color components)
5. Entropy-coding of the DC coecients (the top left coecient
of each quantized block) using DPCM+Human
Human-encode the DC residuals derived from the dif-
ference between each DC and the DC of the preceding
block
6. Entropy-coding of the AC (i.e., non-DC) coecients
Zigzag-order the quantized coecients of each block
Record for each nonzero coecient both its distance (called
run) to the preceding nonzero coecient in the zigzag se-
quence, and its value (called level)
Human code the [run,level] terms using one single Hu-
man table for all the AC's of the image
5
A Schematic Diagram of the JPEG Encoder
6
The Quantization Matrices
They are user-provided
They can be computed using the contrast sensitivity function
of the HVS
Their values are 8-bit integers
They provide control over the bitrate by scaling them by a
constant factor
Example
0 1
11
B
B 16 10 16 24 40 51 61 CC
B
12
B
B
B
B
12 14 19 26 58 60 55 CCCCC
13
B
B
B
B
14 16 24 40 57 69 56 CCCC
Q= 17
B
B
B
B
14 22 29 51 87 80 62 CCCC
:
22
B
B
B
B
18 37 56 68 109 103 77 CCCC
B
35
B
B
B 24 55 64 81 104 113 92 CCCC
B
64
B
B
B 49 78 87 103 121 120101 CCCC
@ A
72 92 95 98 112 100 103 99
7
The DC Human Table
The DC residuals are in the range [-2047,2047]
Divide this range into 12 categories corresponding to \one-
dimensional ripples"
Category k, 0 k 11, consists of 2k integers ranging from
,2k + 1 to ,2k, , and from 2k, to 2k , 1
1 1
Develop a Human code for the 12 categories, where every
codeword is at most 16 bits long
Encode each DC residual as a binary string hsm where
{ h is the codeword of the residual's category
{ s = sign of the residual; s = 0 if negative, 1 if positive
{ m = the (k , 1)-bit [(magnitude of the residual) ,2k, ] 1
8
Coding of the AC Terms
(The AC Human Table)
Divide the range of the AC's into 10 categories
Category k, 1 k 10, consists of 2k integers ranging from
,2k + 1 to ,2k, , and from 2k, to 2k , 1
1 1
Describe each [run; level] by 8 bits `NNNNSSSS'
{ NNNN = the value of run (i.e., the runlength)
{ SSSS = the category of level
{ If the runlength run > 15, then
run = 15p + r, 0 r 15, r = r r r r in binary
3 2 1 0
describe [run; level] by
111100001 111100002 ::: 1111000p r3r2r1r0SSSS
Add an end-of-block EOB symbol after the last nonzero
coecient in each block
The total number of symbols needed is 16*10+1+1=162
Build a Human table for those 162 symbols, where every
codeword is at most 16 bits long
Encode each AC quantized term as hsm where
{ h is the codeword of the AC's [run; category]
{ s = sign of the term; s = 0 if negative, 1 if positive
{ m = the (k , 1)-bit [(magnitude of the term) ,2k, ] 1
9
Examples
Take 0an 8 8 block of Lena 1
B
B 143
147 149 152 156 147 146 149 CC
B
B
B
B
B
151
146 143 154 148 144 153 132 CCCCC
B
B
B
B
147
143 145 149 144 145 128 133 CCCC
B=
B
B
B
B
152
145 145 144 146 134 130 137 CCCC
B
B
B
B
146
143 142 147 124 127 139 138 CCCC
B
B
B
B 139
145 139 127 126 135 139 141 CCCC
B
B
B
B 145
137 124 130 138 136 140 144 CCCC
@ A
144 124 136 134 137 139 142 145
Normalize
0 B 1
BB 15
19 21 24 28 19 18 21 CC
BB
18BB 23
BB
15 26 20 16 25 4 CCCCC
15 BB 19 17 21 16 17 0 5 CCCC
BB
17 BB 24 17 16 18 6 2 9 CCCC
NB = 15
BB
BB 18 14 19 ,4 ,1 11 10 CCCC
BB
17 BB 11
BB 11 ,1 ,2 7 11 13 CCCC
9 BB 17
B@ ,4 2 10 8 12 16 CCCC
16 ,4 8 6 9 11 14 17 A
10
Perform
0 DCT 1
:4
B
B 103 12:4 6:0 2:1 8:1 5:7 ,0:7 ,0:4 CC
B
:7
B
B
B 31 11:6 ,22:8 ,0:7 ,0:1 ,0:5 ,4:7 ,1:8 CCCCC
B
:0
B
B
B 11 ,21:2 ,3:7 8:8 2:3 0:1 ,0:2 5:6 CCCC
D=
B
:2
B
B
B
B
0 ,4:9 10:3 ,8:6 ,8:9 ,2:5 ,7:6 ,1:4 CCCC
:9
B
B
B
B
4 ,0:4 ,1:9 ,8:9 6:6 2:6 3:6 3:6 CCCC
:7
B
B
B
B 0 ,0:9 ,0:9 4:1 7:2 ,15:5 2:8 1:8 CCCC
, :1
B
B
B
B 3 ,1:0 ,2:5 ,5:5 ,5:2 ,6:7 10:7 ,1:1 CCCC
,2:9
@
,0:2 ,1:2 ,2:7 3:6 2:6 0:4 ,6:4 A
Quantize
0
(round(D:=Q)) 1
BB 6
1 0 0 0 0 00 CC
BB
1BB 4 ,2 0 0 0 00 CCCCC
,2 BB
BB 1
BB
0 0 0 0 00 CCCC
0 BB 0 0 0 0 0 00 CCCC
Dq = 0
BB
BB 0 0 0 0 0 00 CCCC
BB
0 BB 0
BB 0 0 0 0 00 CCCC
0 BB 0
B@ 0 0 0 0 00 CCCC
A
0 0 0 0 0 0 00
zigzag ordering: 6 1 4 1 1 0 -2 -2 allzeros
11
Example Human Table for Lena
12
Coding the Example Block
zigzag ordering: 6 1 4 1 1 0 0 -2 -2 allzeros
Human symbols: [run, category]
DC (0,1) (0,3) (0,1) (0,1) (2,2) (0,2) EOB
Code:
(DC di. code)/001/100100/001/001/111110010/0100/1010
Bitrate (assuming 8 bits dor the DC residual symbol):
40
64
= 0:625 bits/pixel
13
Decoding
1. Entropy-decode the bitstream back to the quantized blocks
2. Dequantize: multiply each block coecient by the corre-
sponding coecient of the quantization matrix
3. Apply the inverse DCT transform on each block
4. Denormalize: add 128 to each coecient
Example: Reconstructed block ^
B
0 1
B
B 143 147 153 157 157 154 149 145 CC
B
B
B
B
B
145 147 151 154 153 149 144 141 CCCCC
B
B
B
B
146 147 149 149 146 142 137 134 CCCC
B
B
B
B
146 146 145 142 139 135 132 130 CCCC
B
B
B
B
145 143 140 136 133 131 130 130 CCCC
B
B
B
B 141 138 134 131 130 131 134 135 CCCC
B
B
B
B 136 134 130 128 129 133 139 142 CCCC
@ A
133 131 127 126 129 135 142 147
14
Error
0 block 1
B
B 0 0 ,4 ,5 ,1 ,7 ,3 4 CC
B
B
B
B 6 , 1 , 8 0 ,5 ,5 9 ,9 CCCCC
B
B
B
B 1 ,4 ,4 0 ,2 3 ,9 ,1 CCCC
6 ,1 0 2 7 ,1 ,2
B
B
B
B
B
7 CCCC
B
B
B
B
1 0 2 11 ,9 ,4 9 8 CCCC
B
B
B
B ,2 7 5 ,4 ,4 4 5 6 CCCC
B
B
B
B 9 3 ,6 2 9 3 1 2 CCCC
@
11 ,7 9 8 8 4 0 ,2 A
MSE=5.2
15
Extended JPEG
Extended JPEG allows for several optional enhancements:
{ 12-bit/pixel input
{ Arithmetic coding is allowed as an alternative to Human
coding
{ Adaptive quantization: Allows 5-bit scale change to the
quantization matrix from one block to another
{ SPIFF: A le format that provides for the interchange
of compressed image les between dierent application
environments
{ Progressive transmission (PT)
Sequential PT
Spectral selection
Successive approximations
Hierarchical PT (pyramid encoding)
{ Tiling
{ Selective renement
16
Sequential Progressive Transmission
17
Hierarchical Progressive Transmission
Subsampling lters are left to the users to specify
Upsampling: bilinear interpolation
A B ,! A aB
c x b
CD C dD
a = A B; b = B D; c = A C ; d = C D; x = A
+
2
+
2
+
2
+
2
+B +C +D
4
18
MPEG
(History)
Standard Description
Activity
Internation Telecommunication Union (ITU);
H.261 designed for ISDN applications at p 64 kbits/sec,
Dec. 1990 (p = 1; 2; :::; 30)
ITU Recommendation;
H.263 video coding for low bitrate communication;
Feb. 1995 improved visual telephony applications
ISO Moving Picture Expert Group;
MPEG1: storage/retrieval of video+audio at about
1.5 Mbits/s
MPEG MPEG2: for generic applications at higher birates
Nov. 1994 (initially 10Mbits/s);
MPEG3: initially intended for 50 Mbits/s bitrate,
but MPEG2 was found to achieve the MPEG3 speed
goals, leading to the abandonment of MPEG3
19
Basic Concepts
Video a sequence of images called frames
Color
{ Three components: red (R), green (G), and blue (B)
{ For compatibility with non-colored media, the RGB model
was converted to an equivalent model | Y CbCr
Y is the luminance component, which was experimen-
tally determined to be
Y = 0:299R + 0:587G + 0:114B
Cb = B , Y
Cr = R , Y
{ Y is referred to as luma, and Cb & Cr as chroma
Every frame is really 3 images: one Y , one Cb and one Cr
8 bits/pixel for each of the three color components
Because human vision is less sensitive to color, the Cb and
Cr images are downsampled by 2 in each dimension (they
are quarter the size of Y images)
Much of MPEG processing is on the basis of a macroblock:
A 16 16 luminance block with the two 8 8 associated
chroma blocks
20
Modes of MPEG Compression
Intraframe compression
{ Exploits spatial redundancy only
{ Operates on single frames independetly of other frames
Interframe compression
{ Exploits both spatial and temporal redunadancies
{ Employs motion estimation (MS) without standardizing
any MS algorithm
{ Derives motion-compensated predictions of frames
{ Finally, it performs JPEG-like compression on the resid-
ual frames
21
Types of Frames in MPEG
MPEG has 3 types of frames: I, P, and B
I frames are stricly intra compressed as in JPEG. Their pur-
pose is to provide random access points to the video
P frames are motion-compensated forward-predictive-coded
frames; they are interframe compressed, and typically pro-
vide more compression than I frames
B frames are motion-compensated bidirectionally-predictive-
coded frames; they are interframe compressed, and typically
provide the most compression
The relative numbers of I, P and B frames are arbitrary
An I frame must occur at least once every 132 frames to
provide user-acceptible speed of random access to various
parts of a video
22
Interframe Compression of P and B Blocks
A motion-compensated prediction (i.e., approximation) f of
a P/B frame f is made
The residual f , f is then compressed in a JPEG-like style
Any macroblock of the original frame f may be strictly intra
compressed if its prediction is deemed to be poor
Issues to be addressed
{ Method of strict intra compression
{ Method of compressing residual macroblocks
{ Motion-compensated prediction
23
Flowchart of MPEG Compression
24
Motion-Estimation and Prediction
Motion estimation is performed on the basis of macroblocks,
using the 16 16 luminance blocks only
Motion is assumed to be uniform across all the pixels of a
macroblock
Remark: There is a tradeo in deciding the size of the basic
block for motion estimation
{ The block has to be suciently large to avoid \false hits"
{ The block has to be suciently small to avoid diverse
motions within one single block
{ The MPEG block size, 16 16, is a good compromise
25
Motion-Estimation and Prediction for P Frames
Consider a P-frame P
P will be predicted (i.e., approximated) from one single ref-
erence frame R
R is the most recent (decoded) I or P frame
For each macroblock MB of P , nd the closest matching
macroblock MB' in the reference frame R
If the MB-to-MB' match is satisfactory, then
{ treat MB' as the prediction (i.e., approximation) of MB
{ record the motion (i.e., displacement) vector between the
two macroblocks (allowing half-pixel accuracy)
{ Compute and compress the macroblock residual MB,MB'
(luma and chroma)
26
If the MB-to-MB' match is found to be unsatisfactory, then
the macroblock MB is strictly intra compressed as is done in
I frames
The motion vectors of all the macroblocks of P exhibit re-
dundancy due to similar (or sometimes identical) motion ex-
perienced by many neighboring macroblocks
This redundancy is exploited by coding the consecutive dif-
ferential values of motion vectors (i.e., DPCM)
Remark 1: MPEG does not standardize the decision mech-
anism for judging whether or or not a match between two
macroblocks is satisfactory
Remark 2: A typical decision mechanism involves computing
an error measure between the luminance of the two macro
blocks. The match is treated as satisfory if and only if the
error is below a certain threshold. Possible error measures
include mean-square error (MSE), mean absolute-dierence
(MB ,MB )
error (MAD), and variance
variance(MB ) .
0
27
Motion-Estimation and Prediction for B Frames
Consider a B-frame B
B will be predicted (i.e., approximated) from TWO reference
frames R1 and R2
R is the most recent (decoded) past I/P frame, and R is
1 2
the nearest (decoded) future I/P frame
For each macroblock MB of B , nd the closest matching
macroblock MB1 in the reference frame R1, and the closest
matching macroblock MB2 in R2
The predicted macroblock is PM = NINT ( MB + MB )
1 1 2 2
{ = 0:5 and = 0:5 if both matches are satisfactory
1 2
{ = 1 and = 0 if only the 1st match is satisfactory
1 2
{ = 0 and = 1 if only the 2nd match is satisfactory
1 2
{ = 0 and = 0 if neither match is satisfactory
1 2
28
Compute and compress the macroblock residual MB , PM
(luma and chroma)
If neither match is found to be satisfactory, then the mac-
roblock MB is strictly intra compressed as is done in I frames
Record the motion vector(s) between the MB and the other
one or two macroblocks (allowing half-pixel accuracy)
Again, the motion-vector redundancy is exploited by cod-
ing the consecutive dierential values of motion vectors (i.e.,
DPCM)
Remark: The prediction mode chosen is 2-bit coded and
passed on along with the macroblock header information
29
MPEG1 Constrained Parameters
horizontal picture size 768
vertical picture size 576
picture area 396 macroblocks at 25 frames/ses
picture area 330 macroblocks at 30 frames/sec
picture rate 30 frames/sec
bitrate 1.856 Mbits/sec (constant)
30
MPEG2
Higher data rates than MPEG1
MPEG2 allows for higher quality source images
{ 4:2:0 (chroma subsamples only horizontally)
{ 4:4:4 (no subsampling of chroma)
{ Note: 4:2:2 is what's supported by MPEG1
MPEG2 allows for ner quantization and for specifying separate
quantization table for luma and chroma
MPEG2 allows for ner adjustment of quantization scale fac-
tor, used in intra compression
MPEG2 allows for interlaced video
MPEG2 supports error concealment (of lost macroblocks)
MPEG2 supports scalable compression
{ SNR-scalability: by sending bands of DCT coecients
{ spatial-scalability: pixel resolution by down-/up-sampling
{ temporal-scalability: dierent frame rates by skipping frames
31
MPEG2 (Cont.)
MPEG2 has a Prole and Level structure
{ Proles are algorithmic elements included in MPEG2
{ Levels are upper bounds on parameter values
Proles (proles are backward-compatible)
1. Simple: No use of B frames
2. Main: what was described earlier, but no scalability
3. SNR-scalable
4. Spatially scalable
5. High: temporally scalable, higher-quality source data (4:2:0,
4:2:2 and possibly 4:4:4 in the future)
Levels
1. Low: 352 240 frame size
2. Main: 720 480 frame size
3. High 1440: 1440 1152 frame size
4. High: 1920 1080 frame size
32
Allowed Level-Prole Combinations
simple main SNR- spatially high
scalable scalable
high NO YES No No YES
high 1440 NO YES No YES YES
main YES YES YES NO YES
low NO YES YES NO NO
The Grand Alliance (for HDTV) follows the Main Prole
and High Level combination, for a maximum bitrate of 45
Mbits/sec
33
References
1. B. pennebaker and J. L. Mitchell, JPEG Still Image Data
Compression Standard, Van Nostrand reinhold, New York
1993.
2. ISO-11172-2: Generic Coding of moving pictures and associ-
ated audio (MPEG-1)
3. ISO-13818-2: Generic Coding of moving pictures and associ-
ated audio (MPEG-2)
34