MPEG, The MP3 Standard, and Audio Compression

1. The document discusses audio compression techniques used in MP3 files. It explains how MP3s use lossy compression to eliminate unnecessary frequencies from the audio signal in a way that is perceptually lossless to human hearing. 2. The compression process involves converting the audio from the time domain to the frequency domain using a filter bank or MDCT. It then applies psychoacoustic modeling and non-uniform quantization to reduce bits allocated to frequencies that are masked by louder sounds. 3. The quantized frequency coefficients are entropy encoded using Huffman coding and formatted into an MP3 bitstream for transmission or storage. While lossy, MP3 compression allows for high compression ratios with negligible perceived quality loss compared to
Copyright
© Attribution Non-Commercial (BY-NC)
MPEG, the MP3 Standard, and Audio Compression
Mark Kilgore and Jamie Wu
Mathematics of the Information Age
September 16, 2003
Audio Compression
• Basic Audio Coding.
• Why beneficial to compress?
• Lossless versus Lossy Compression.
• How are MP3s Compressed?
• What makes MP3 Compression Different?
• What other formats lie in our future?
Why Compress?
• Eliminate redundancy
• The most basic encoder/decoder is PCM (pulse-code modulation)
• PCM stores every time-domain sample, so even a plain sine wave is represented with a lot of redundancy
• Representing the sine wave by frequency rather than by time means storing only its frequency, amplitude, and phase
• Can reduce data without information loss
• Extends playing time, allows for miniaturization and greater equipment tolerance, reduces cost
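
The sine-wave point can be illustrated with a minimal sketch. It assumes a 44.1 kHz sampling rate and 16-bit samples (neither is stated on the slide), and the function name `sine_pcm` is made up for this example. One second of PCM takes tens of thousands of bytes, while the three parameters that generate it take a few dozen.

```python
import math

SAMPLE_RATE = 44100  # assumed CD-style sampling rate in Hz (not from the slides)

def sine_pcm(freq_hz, amplitude, phase, n_samples, rate=SAMPLE_RATE):
    """Return integer PCM samples of a single sine wave (time-domain view)."""
    return [
        round(amplitude * math.sin(2 * math.pi * freq_hz * n / rate + phase))
        for n in range(n_samples)
    ]

# Frequency-domain view: three numbers describe the whole tone.
params = (440.0, 20000.0, 0.0)  # frequency (Hz), amplitude, phase (radians)

pcm = sine_pcm(*params, n_samples=SAMPLE_RATE)      # one second of audio
pcm_bytes = 2 * len(pcm)                            # 16 bits = 2 bytes per sample
param_bytes = 3 * 8                                 # three 64-bit floats

# The decoder regenerates the samples from the parameters alone.
rebuilt = sine_pcm(*params, n_samples=SAMPLE_RATE)

print(pcm_bytes, "bytes as PCM versus", param_bytes, "bytes as parameters")
```

Real audio is not a single sine wave, so actual coders store many frequency components per block rather than one parameter triple.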
Lossless vs. Lossy (Perceptual)
• Lossless coding allows perfect reconstruction of a signal (theoretically)
• Lossy coding creates a more highly compressed signal by discarding components judged unnecessary, such as frequencies the listener cannot hear
• Ideally, lossy coding makes no perceptible difference in how the audio SOUNDS to a person
• MP3s are lossy, but aim to be perceptually lossless
MPEG
• Moving Picture Experts Group
• Aim to create standards relating to synchronized audio and video compression
• MPEG-1
• MPEG-2
MPEG-1 Block Diagrams
Topics Discussed in Detail After Diagrams
[Block diagram, Layers I and II. Blocks: Filter Bank (32 sub-bands, 0 to 31); DFT 512/1024 with Hann window; Psychoacoustic Model; Uniform Midtread Quantizer; Coding of Side Information; Bitstream Formatting; Coded Audio Data]
[Block diagram, Layer III. Blocks: DFT 2 * 1024 with Hann window; Filter Bank (32 sub-bands, 0 to 31); MDCT; Psychoacoustic Model; Non-Uniform Midtread Quantizer with Rate/Distortion Loop (0 to 511); Huffman Coding; Coding of Side Information; Bitstream Formatting; Coded Audio Data]
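Time to Frequency Mapping
• Filters split the signal into K frequency bands
• Each band is quantized to a limited number of bits
• Quantization noise is placed in bands where it is barely audible
• The result is sent to the decoder, where the sound is restored
[Figure: K-band filter bank. Encoder: input x passes through analysis filters H_0 ... H_K, each followed by a rate change by K, giving band signals y_0 ... y_K. Decoder: the band signals pass through a rate change by K and synthesis filters G_0 ... G_K to form the output x.]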
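
The encoder/decoder structure above can be sketched with the simplest possible case, a two-band bank built from sums and differences of neighbouring samples. This is only an illustration of the analysis, quantization, and synthesis steps: it is not the 32-band MPEG filter bank, and the function names and step sizes are chosen for this example.

```python
def analyze(x):
    """Encoder side: split x into a low band and a high band at half the rate."""
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return low, high

def quantize(band, step):
    """Map each band value to an integer index (a limited number of levels)."""
    return [round(v / step) for v in band]

def dequantize(indices, step):
    """Map integer indices back to amplitudes."""
    return [q * step for q in indices]

def synthesize(low, high):
    """Decoder side: recombine the two bands into a full-rate signal."""
    x = []
    for s, d in zip(low, high):
        x.extend([s + d, s - d])
    return x

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 3.0]
low, high = analyze(signal)

# Without quantization the signal comes back exactly.
exact = synthesize(low, high)

# With a fine step in the low band and a coarse step in the high band,
# the quantization noise is confined to the band given fewer bits.
coarse = synthesize(dequantize(quantize(low, 0.5), 0.5),
                    dequantize(quantize(high, 4.0), 4.0))
print(exact)
print(coarse)
```

The two bands together hold as many values as the input, so the split itself adds no data; the saving comes from quantizing each band differently.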
Z Transform
• Assists in splitting the signal into frequencies
• Generalization of the discrete-time Fourier transform
• Important Properties
  • Linearity
  • Convolution Theorem
  • Delay Theorem
• Can model all kinds of filter banks through it
• Gives a representation of frequency content
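
For reference, the standard textbook definition and the three properties listed above can be written out as follows. The symbols x[n], h[n], a, b, and m are notation chosen here, not taken from the slides.

$$
X(z) = \sum_{n=-\infty}^{\infty} x[n]\, z^{-n}
$$

$$
\begin{aligned}
\text{Linearity:} \quad & a\,x_1[n] + b\,x_2[n] \;\longleftrightarrow\; a\,X_1(z) + b\,X_2(z) \\
\text{Convolution:} \quad & (x * h)[n] \;\longleftrightarrow\; X(z)\,H(z) \\
\text{Delay:} \quad & x[n-m] \;\longleftrightarrow\; z^{-m}\,X(z)
\end{aligned}
$$

Evaluating X(z) on the unit circle, z = e^{j\omega}, gives the discrete-time Fourier transform, which is how the Z transform represents frequency content.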
MPEG Time to Frequency Mapping
Analysis Filter:
$$h_k[n] = h[n]\,\cos\left[\left(k + \tfrac{1}{2}\right)(n - 16)\,\frac{\pi}{32}\right]$$
Synthesis Filter:
$$g_k[n] = 32\,h[n]\,\cos\left[\left(k + \tfrac{1}{2}\right)(n + 16)\,\frac{\pi}{32}\right]$$
$$k = 0, 1, \ldots, 31; \quad n = 0, 1, \ldots, 511$$
• Uses a filter bank of 32 bands; each filter spans 512 samples
• The above equations allow for taking apart the signal (the h part of the time to frequency mapping diagram) and putting it back together (the g part of the time to frequency mapping diagram)
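
The two equations can be evaluated directly, as in the sketch below. One loud assumption: the slides do not give the lowpass prototype h[n], and the standard specifies it as a table, so `prototype` here is a hypothetical stand-in (a Hann-windowed sinc) used only to make the example runnable. The cosine modulation follows the analysis and synthesis formulas above; the helper names are made up for this example.

```python
import math

NUM_BANDS = 32     # k = 0 .. 31
FILTER_LEN = 512   # n = 0 .. 511

def prototype(n):
    """ASSUMPTION: stand-in lowpass h[n] (Hann-windowed sinc, cutoff pi/64).
    The real MPEG prototype is a tabulated window, not reproduced here."""
    t = n - (FILTER_LEN - 1) / 2          # never zero: centre falls between taps
    ideal = math.sin(math.pi * t / 64) / (math.pi * t)
    hann = 0.5 - 0.5 * math.cos(2 * math.pi * (n + 0.5) / FILTER_LEN)
    return ideal * hann

def analysis_filter(k):
    """h_k[n] = h[n] cos[(k + 1/2)(n - 16) pi / 32]"""
    return [prototype(n) * math.cos((k + 0.5) * (n - 16) * math.pi / 32)
            for n in range(FILTER_LEN)]

def synthesis_filter(k):
    """g_k[n] = 32 h[n] cos[(k + 1/2)(n + 16) pi / 32]"""
    return [32 * prototype(n) * math.cos((k + 0.5) * (n + 16) * math.pi / 32)
            for n in range(FILTER_LEN)]

def gain(taps, omega):
    """Magnitude of the filter's frequency response at radian frequency omega."""
    re = sum(t * math.cos(omega * n) for n, t in enumerate(taps))
    im = sum(t * math.sin(omega * n) for n, t in enumerate(taps))
    return math.hypot(re, im)

def band_centre(k):
    """Centre frequency of band k when 0..pi is split into 32 equal bands."""
    return (k + 0.5) * math.pi / NUM_BANDS

# Each analysis filter passes its own band and rejects bands further away.
h5 = analysis_filter(5)
print(gain(h5, band_centre(5)), gain(h5, band_centre(7)))
```

The modulation shifts one lowpass shape to 32 evenly spaced centre frequencies, which is why a single prototype is enough to define the whole bank.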
PQMF & MDCT
• Both are methods of time to frequency mapping
  • PQMF: Pseudo-Quadrature Mirror Filter
  • MDCT: Modified Discrete Cosine Transform
• Mathematically they are closely related: both can be written as cosine-modulated filter banks
• PQMF is described with Z transforms, as a bank of filters whose outputs represent the amplitudes of the frequencies
• MDCT performs a block transform using a window to represent the amplitudes
• These amplitudes are then quantized
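
A minimal sketch of the MDCT as a windowed block transform is given below. It assumes a sine window and 50% overlapping blocks, a common textbook choice, not necessarily the exact Layer III configuration. The function names are made up for this example. Each block of 2N samples yields only N coefficients, and overlap-adding the inverse transforms cancels the aliasing between neighbouring blocks.

```python
import math

def sine_window(two_n):
    """Sine window; satisfies w[n]^2 + w[n + N]^2 = 1 (assumed window choice)."""
    return [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]

def mdct(block, window):
    """2N windowed samples -> N coefficients."""
    two_n = len(block)
    n_half = two_n // 2
    return [
        sum(window[n] * block[n]
            * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(two_n))
        for k in range(n_half)
    ]

def imdct(coeffs, window):
    """N coefficients -> 2N windowed, time-aliased samples."""
    n_half = len(coeffs)
    return [
        window[n] * (2.0 / n_half)
        * sum(coeffs[k]
              * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
              for k in range(n_half))
        for n in range(2 * n_half)
    ]

def analyze_synthesize(signal, n_half):
    """Transform overlapping blocks and overlap-add the inverse transforms.
    len(signal) must be a multiple of n_half."""
    window = sine_window(2 * n_half)
    padded = [0.0] * n_half + list(signal) + [0.0] * n_half
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_half + 1, n_half):
        block = padded[start:start + 2 * n_half]
        for i, v in enumerate(imdct(mdct(block, window), window)):
            out[start + i] += v
    return out[n_half:n_half + len(signal)]

signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(32)]
rebuilt = analyze_synthesize(signal, 8)
print(max(abs(a - b) for a, b in zip(signal, rebuilt)))
```

In a real coder the coefficients returned by `mdct` would be quantized before the inverse step, so the reconstruction would no longer be exact.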
Psychoacoustic Model
• Determines the masking threshold for each sub-band
• Exploits auditory masking, a property of human hearing: a louder sound masks quieter sounds at nearby frequencies
Non-uniform Quantizer
• Like analog to digital conversion: continuous amplitudes become discrete levels
• Quantizer: maps amplitude values onto a finite number of bits
• Non-uniform: changes the step size according to the amplitude values
• Parts of the signal with lesser amplitude are coded with greater accuracy, which increases their signal to noise ratio (SNR)
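
The idea of an amplitude-dependent step size can be sketched with a power-law midtread quantizer. The 3/4 exponent is the one usually quoted for Layer III, but the slides do not state it, so treat it as an assumption; the `step` parameter and function names are made up for this example, and the real encoder adds scale factors and a rate/distortion loop that are omitted here.

```python
import math

def quantize(value, step):
    """Compress the magnitude with a 3/4 power law, then round to an integer.
    Midtread: an input of zero maps to index zero."""
    magnitude = round((abs(value) / step) ** 0.75)
    return int(math.copysign(magnitude, value))

def dequantize(index, step):
    """Expand the index back with the inverse 4/3 power law."""
    return math.copysign(abs(index) ** (4.0 / 3.0) * step, index)

def level_gap(index, step):
    """Distance between two neighbouring reconstruction levels."""
    return dequantize(index + 1, step) - dequantize(index, step)

# Reconstruction levels are close together near zero and spread out
# as the amplitude grows, so quiet values get finer resolution.
for q in (1, 10, 100):
    print(q, round(dequantize(q, 1.0), 2), round(level_gap(q, 1.0), 2))
```

A uniform quantizer would print the same gap for every index; here the gap widens with amplitude, which is the non-uniform behaviour described above.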
Huffman Coding
• For better data compression, variable-length Huffman codes are used to encode the quantized samples.
• Quantized MDCT coefficients (for long blocks) are arranged in order from lowest to highest frequency
• The whole range is divided into 3 sections, each coded with a different set of Huffman tables
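
How variable-length codes shorten a run of quantized samples can be sketched with a generic Huffman construction. This builds a code from the sample counts, which is the textbook algorithm; an MP3 encoder instead picks from predefined tables, so this is an illustration of the principle and not the Layer III procedure. The function names are made up for this example.

```python
import heapq
from collections import Counter

def build_huffman_code(symbols):
    """Return {symbol: bit string}; frequent symbols get shorter codes."""
    counts = Counter(symbols)
    if len(counts) == 1:                      # degenerate single-symbol input
        return {next(iter(counts)): "0"}
    # Heap entries: (count, tie-breaker, partial code table for that subtree).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, codes1 = heapq.heappop(heap)   # two least frequent subtrees
        c2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]

def encode(symbols, code):
    return "".join(code[s] for s in symbols)

def decode(bits, code):
    lookup = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:                 # prefix-free, so first match is right
            out.append(lookup[current])
            current = ""
    return out

# Quantized coefficients are mostly small, so zero dominates.
samples = [0, 0, 0, 0, 0, 0, 1, 1, -1, 0, 2, 0, 0, 1, -1, 0]
code = build_huffman_code(samples)
bits = encode(samples, code)
print(code, len(bits), "bits versus", 2 * len(samples), "bits at fixed length")
```

With four distinct values a fixed-length code needs 2 bits per sample; the Huffman code spends 1 bit on the dominant zero and more on the rare values.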
Bitstream Formatting
• Formats the encoded quantized samples into an encoded bitstream, the final form in which the compressed signal is transmitted.
MPEG-4 and The Future?
• Incorporates speech and music compression
• Largely an extension of MPEG-2 compression techniques, with independent techniques geared specifically at coding for speech content (some coding for meaning)
• Hasn't really taken off yet; only time will tell
• AAC (Advanced Audio Coding), first standardized in MPEG-2, is the audio format that is used if you download from the Apple iTunes store
