MPEG, The MP3 Standard, and Audio Compression
Z Transform
• Assists in splitting a signal into frequency bands
• Discrete-time generalization of the Fourier transform: X(z) = sum over n of x[n] z^(-n)
• Important properties
  • Linearity
  • Convolution theorem
  • Delay theorem
• Can model all kinds of filter banks
• Represents frequency content: evaluating X(z) on the unit circle, z = e^(jw), gives the spectrum
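The three properties listed above can be checked numerically. The following is a minimal sketch, assuming finite sequences that start at n = 0; the helper names `z_transform` and `convolve` and the test point `z` are illustrative choices, not part of the slides.

```python
import cmath

def z_transform(x, z):
    """Evaluate X(z) = sum_n x[n] * z**(-n) for a finite sequence starting at n = 0."""
    return sum(xn * z ** (-n) for n, xn in enumerate(x))

def convolve(x, h):
    """Direct linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

x1 = [1.0, 2.0, 3.0]
x2 = [4.0, 5.0, 6.0]
h = [0.5, -0.5]
z = 1.1 * cmath.exp(0.3j)  # arbitrary test point in the complex plane

# Linearity: Z{2*x1 + 3*x2} = 2*X1(z) + 3*X2(z)
combined = [2 * a + 3 * b for a, b in zip(x1, x2)]
linearity_error = abs(z_transform(combined, z)
                      - (2 * z_transform(x1, z) + 3 * z_transform(x2, z)))

# Convolution theorem: Z{x1 * h} = X1(z) * H(z)
convolution_error = abs(z_transform(convolve(x1, h), z)
                        - z_transform(x1, z) * z_transform(h, z))

# Delay theorem: delaying by k samples multiplies X(z) by z**(-k)
k = 2
delayed = [0.0] * k + x1
delay_error = abs(z_transform(delayed, z) - z ** (-k) * z_transform(x1, z))

print(linearity_error, convolution_error, delay_error)
```

All three errors should be at the level of floating-point rounding. The convolution theorem is what makes filter banks convenient to analyze: filtering in time becomes multiplication in the z domain.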
[Figure: Layer III encoder block diagram. Blocks: Filter Bank (32 sub-bands, outputs labeled 0 to 31); MDCT; DFT (2 * 1024, Hann window); Psychoacoustic Model; Non-Uniform Midtread Quantizer with Rate/Distortion Loop (labeled 0 to 511); Huffman Coding; Coding of Side Information; Bitstream Formatting; output: Coded Audio Data.]
Time to Frequency Mapping
• Analysis filters split the signal into K frequency bands
• Each band is quantized to a limited number of bits
• Quantization noise is placed in the bands where it is barely audible
• The bands are sent to the decoder, where the sound is restored
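[Figure: time to frequency mapping diagram. Encoder: input x, analysis filters H_0 ... H_K, sub-band signals y_0 ... y_K. Decoder: sub-band signals y_0 ... y_K, synthesis filters G_0 ... G_K, output.]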
MPEG Time to Frequency Mapping
Analysis Filter:
$$h_k[n] = h[n] \cos\left[\left(k + \tfrac{1}{2}\right)(n - 16)\frac{\pi}{32}\right]$$
Synthesis Filter:
$$g_k[n] = 32\, h[n] \cos\left[\left(k + \tfrac{1}{2}\right)(n + 16)\frac{\pi}{32}\right]$$
$$k = 0, 1, \ldots, 31; \quad n = 0, 1, \ldots, 511$$
• Uses a filter bank of 32 bands; each band filter is 512 samples long and is built from one prototype filter h[n]
• The above equations allow for taking apart the signal (the h part of the time to frequency mapping diagram) and putting it back together (the g part of the time to frequency mapping diagram)
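The two filter equations can be turned into code directly. This is a minimal sketch: the slides do not give the prototype h[n], so `prototype_lowpass` below is a hypothetical stand-in (a Hann-windowed sinc), not the tabulated window of the MPEG standard. Only `analysis_filter` and `synthesis_filter` follow the formulas above.

```python
import cmath
import math

NUM_BANDS = 32     # k = 0, 1, ..., 31
FILTER_LEN = 512   # n = 0, 1, ..., 511

def prototype_lowpass(length=FILTER_LEN, bands=NUM_BANDS):
    """Hypothetical prototype h[n]: Hann-windowed sinc with cutoff pi / (2 * bands).

    Stand-in only; the real MPEG prototype is a tabulated window.
    """
    cutoff = math.pi / (2 * bands)
    center = (length - 1) / 2
    h = []
    for n in range(length):
        t = n - center
        ideal = cutoff / math.pi if t == 0 else math.sin(cutoff * t) / (math.pi * t)
        hann = 0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1))
        h.append(ideal * hann)
    return h

def analysis_filter(h, k):
    """h_k[n] = h[n] * cos((k + 1/2) * (n - 16) * pi / 32)."""
    return [h[n] * math.cos((k + 0.5) * (n - 16) * math.pi / 32)
            for n in range(len(h))]

def synthesis_filter(h, k):
    """g_k[n] = 32 * h[n] * cos((k + 1/2) * (n + 16) * pi / 32)."""
    return [32 * h[n] * math.cos((k + 0.5) * (n + 16) * math.pi / 32)
            for n in range(len(h))]

def gain(filt, omega):
    """Magnitude of the filter's frequency response at omega (radians/sample)."""
    return abs(sum(c * cmath.exp(-1j * omega * n) for n, c in enumerate(filt)))

h = prototype_lowpass()
band5 = analysis_filter(h, 5)
in_band = gain(band5, 5.5 * math.pi / 32)       # center of band 5
out_of_band = gain(band5, 20.5 * math.pi / 32)  # center of band 20
print(in_band, out_of_band)
```

The cosine term shifts the low-pass prototype up to the center of band k, at (k + 1/2) * pi / 32, so band 5 passes its own center frequency and strongly attenuates the center of band 20.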
PQMF & MDCT
• Both are methods of time to frequency mapping
• PQMF: Pseudo-Quadrature Mirror Filter
• MDCT: Modified Discrete Cosine Transform
• Mathematically they are closely related: both are cosine-modulated filter banks
• PQMF is described as a filter bank, using Z transforms to represent the amplitudes of the frequency bands
• MDCT performs a block transform on overlapping, windowed blocks to represent the amplitudes
• These amplitudes are then quantized
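The windowed block transform can be sketched as follows. This is an illustration under stated assumptions: a sine window, a tiny block size, and a 2/N scaling on the inverse are choices made here for a self-checking example; they are not taken from the slides, and a real encoder uses fast algorithms rather than these direct sums.

```python
import math

def sine_window(block_len):
    """Sine window; satisfies w[n]**2 + w[n + N]**2 = 1 for block_len = 2N."""
    return [math.sin(math.pi * (n + 0.5) / block_len) for n in range(block_len)]

def mdct(block, window):
    """Map a windowed block of 2N samples to N frequency amplitudes."""
    half = len(block) // 2
    shift = 0.5 + half / 2
    return [sum(window[n] * block[n]
                * math.cos(math.pi / half * (n + shift) * (k + 0.5))
                for n in range(2 * half))
            for k in range(half)]

def imdct(coeffs, window):
    """Map N amplitudes back to 2N windowed samples (still time-aliased)."""
    half = len(coeffs)
    shift = 0.5 + half / 2
    return [window[n] * (2.0 / half)
            * sum(coeffs[k] * math.cos(math.pi / half * (n + shift) * (k + 0.5))
                  for k in range(half))
            for n in range(2 * half)]

def analyze_and_synthesize(signal, half):
    """MDCT each 50%-overlapped block, invert, and overlap-add the results."""
    window = sine_window(2 * half)
    tail = half + (-len(signal)) % half          # pad so every sample is in two blocks
    padded = [0.0] * half + list(signal) + [0.0] * tail
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * half + 1, half):
        coeffs = mdct(padded[start:start + 2 * half], window)  # these get quantized
        for i, value in enumerate(imdct(coeffs, window)):
            out[start + i] += value
    return out[half:half + len(signal)]

signal = [math.sin(0.3 * i) + 0.1 * i for i in range(16)]
restored = analyze_and_synthesize(signal, half=4)
max_error = max(abs(a - b) for a, b in zip(signal, restored))
print(max_error)
```

Each block alone cannot be inverted, since 2N samples become only N amplitudes. The aliasing of neighboring blocks cancels in the overlap-add, so without quantization the signal comes back up to rounding error.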
Psychoacoustic Model
• Determines the masking threshold for each sub-band
• Uses auditory masking, a property of human hearing: a loud sound makes quieter sounds at nearby frequencies inaudible
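One way the masking threshold can drive the quantizer is sketched below. It assumes the common rule of thumb that each quantizer bit adds about 6.02 dB of signal-to-noise ratio. The function name and the example levels are hypothetical, and a real encoder's bit allocation is considerably more involved.

```python
import math

DB_PER_BIT = 6.02  # approximate SNR gained per quantizer bit (rule of thumb)

def bits_for_band(signal_db, mask_db):
    """Bits needed to push quantization noise below the masking threshold.

    signal_db: level of the sub-band signal in dB (hypothetical input)
    mask_db:   masking threshold for that sub-band in dB (hypothetical input)
    """
    signal_to_mask = signal_db - mask_db
    if signal_to_mask <= 0:
        return 0  # the band is fully masked, so it needs no bits
    return math.ceil(signal_to_mask / DB_PER_BIT)

print(bits_for_band(60.0, 40.0))  # signal 20 dB above the threshold
print(bits_for_band(30.0, 40.0))  # signal below the threshold
```

A band whose signal lies below its masking threshold gets no bits at all, which is where much of the compression comes from.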
Non-uniform Quantizer
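• Quantization is the step that turns continuous amplitudes into digital values (as in analog-to-digital conversion)
• Quantizer: maps amplitude values onto a finite number of bits
• Non-uniform: the step size changes according to the amplitude values
• Parts of the signal with lesser amplitude are coded with greater accuracy, which increases their signal-to-noise ratio (SNR)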
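A non-uniform midtread quantizer can be sketched with a power law. The exponent 3/4 used here is an assumption based on how Layer III quantization is commonly described; the function names and step size are illustrative, and the small rounding offset used by real encoders is left out.

```python
import math

def quantize(x, step):
    """Power-law midtread quantizer: compress |x| with exponent 3/4, then round."""
    magnitude = (abs(x) / step) ** 0.75
    index = math.floor(magnitude + 0.5)  # round to the nearest integer index
    return index if x >= 0 else -index

def dequantize(index, step):
    """Inverse mapping: expand the index with exponent 4/3."""
    magnitude = abs(index) ** (4.0 / 3.0) * step
    return magnitude if index >= 0 else -magnitude

step = 1.0
levels = [dequantize(i, step) for i in range(6)]
gaps = [b - a for a, b in zip(levels, levels[1:])]
print(levels)  # reconstruction levels
print(gaps)    # spacing between neighboring levels
```

Zero is itself a reconstruction level, which is what "midtread" means. The gaps between levels widen as the amplitude grows, so small amplitudes are represented more finely than large ones.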
Huffman coding
• For better data compression, variable-length Huffman codes are used to encode the quantized samples: frequent values get short codes
• The quantized MDCT coefficients (for long blocks) are arranged in order from lowest to highest frequency
• The whole range is divided into 3 sections, each coded with a different set of Huffman tables
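The idea of variable-length codes can be illustrated with a small Huffman coder. Note the assumption: this sketch builds a code from the symbol counts of its input, whereas the MP3 encoder selects among predefined Huffman tables. The function name and sample values are made up for illustration.

```python
import heapq
from collections import Counter

def build_huffman_code(symbols):
    """Return a dict mapping each symbol to a prefix-free bit string."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (count, tie_breaker, {symbol: code_so_far})
    heap = [(count, order, {sym: ""})
            for order, (sym, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie_breaker = len(heap)
    while len(heap) > 1:
        count_a, _, codes_a = heapq.heappop(heap)  # two least frequent groups
        count_b, _, codes_b = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in codes_a.items()}
        merged.update({sym: "1" + code for sym, code in codes_b.items()})
        heapq.heappush(heap, (count_a + count_b, tie_breaker, merged))
        tie_breaker += 1
    return heap[0][2]

# Hypothetical quantized samples: small values dominate, as after quantization
samples = [0, 0, 0, 0, 0, 0, 1, 1, -1, -1, 2, 0, 0, 1]
code = build_huffman_code(samples)
encoded = "".join(code[s] for s in samples)
print(code)
print(len(encoded), "bits versus", 2 * len(samples), "bits at a fixed 2 bits per sample")
```

Because zero is by far the most frequent value here, it receives the shortest code, and the encoded stream is shorter than a fixed-length encoding of the same samples.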
Bitstream Formatting
• Formats the encoded quantized samples into an encoded bitstream, the final form in which the compressed signal is transmitted
MPEG-4 and The Future?
• Incorporates both speech and music compression
• Largely an extension of MPEG-2 compression techniques, plus independent techniques geared specifically toward coding speech content (some coding for meaning)
• Hasn't really taken off yet; only time will tell
• MPEG-2 AAC (Advanced Audio Coding) is the audio format used for downloads from the Apple iTunes Store