
MMC Unit III

By
M. C. Aralimarad
Introduction to Audio and Video Compression
• Key Characteristics:
• Unlike text and images, audio and video signals are continuously varying analog
signals.
• Digitization involves a continuous stream of digital values representing sampled
analog signals.
• Compression Differences:
• Algorithms differ for digitized audio/video compared to text/image data.
• Audio Compression (Section 4.2)
• Digitization Process:
• Performed using Pulse Code Modulation (PCM).
• Sampling rate: At least twice the maximum frequency (Nyquist rate).
• Sampling Examples:
• Speech signal: Max frequency = 10 kHz → Sampling rate = 20 kHz.
• Music: Max frequency = 20 kHz → Sampling rate = 40 kHz.
• Bit Requirements:
• Speech: Typically 12 bits per sample.
• General audio: 16 bits per sample.
• Stereo signal: Two channels are digitized.
• Resulting bit rates:
• Speech: 20 kHz × 12 bits = 240 kbps.
• Mono music: 40 kHz × 16 bits = 640 kbps.
• Stereo music: 2 × 640 kbps = 1.28 Mbps.
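A quick arithmetic check of these rates, sketched in Python (the figures follow directly from the sampling rates and bits per sample above):

```python
# PCM bit rate = sampling rate x bits per sample (x channels)
speech       = 20_000 * 12        # 240,000 bps  = 240 kbps
music_mono   = 40_000 * 16        # 640,000 bps  = 640 kbps
music_stereo = 2 * music_mono     # 1,280,000 bps = 1.28 Mbps
```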
• Compression Methods
• Challenges:
• High bit rates exceed available channel bandwidth.
• Solutions:
• Lower Sampling Rate:
• Reduces quality due to loss of high-frequency components.
• Compression Algorithms:
• Efficiently reduce data rates while preserving acceptable quality.
• Differential Pulse Code Modulation (DPCM) - Section 4.2.1
• Overview:
• Derived from PCM.
• Encodes differences between successive audio signal samples.
• Advantages:
• Reduces required bits per sample:
• Standard PCM for voice: 64 kbps.
• DPCM reduces it to 56 kbps.
• DPCM Encoder
• Components:
• Bandlimiting Filter: Limits input signal's frequency bandwidth.
• Analog-to-Digital Converter (ADC): Digitizes input samples.
• Subtractor & Register (R):
• Computes the difference between current and previous sample values.
• Adder: Updates the register for future operations.
• Parallel-to-Serial Converter: Outputs DPCM signal.
• Figure Reference: Fig. 4.1(a).
• DPCM Decoder
• Components:
• Serial-to-Parallel Converter: Reads DPCM data stream.
• Adder: Reconstructs the signal using stored values in register (R).
• Digital-to-Analog Converter (DAC): Converts reconstructed signal back to
analog.
• Low-Pass Filter: Smooths the signal for playback.
• Figure Reference: Fig. 4.1(a)
• Timing Diagram of DPCM
• Process Breakdown:
• R0 = current register value.
• DPCM = PCM − R0.
• R1 = R0 + DPCM.
• Timing Considerations:
• T0: Time for encoding PCM to DPCM.
• T1: Time for updating the register.
• Figure Reference: Fig. 4.1(b).
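A minimal Python sketch of the DPCM loop described above (previous-sample prediction with an unquantized residual; the sample values are illustrative):

```python
def dpcm_encode(samples):
    r = 0                         # register R holds the previous sample
    residuals = []
    for s in samples:
        residuals.append(s - r)   # DPCM = PCM - R0
        r = s                     # register update: R1 = R0 + DPCM
    return residuals

def dpcm_decode(residuals):
    r, samples = 0, []
    for d in residuals:
        r += d                    # reconstruct by accumulating differences
        samples.append(r)
    return samples

pcm = [10, 12, 15, 13, 14]
enc = dpcm_encode(pcm)            # [10, 2, 3, -2, 1]
assert dpcm_decode(enc) == pcm
```

Because successive samples are similar, each residual needs fewer bits than the original sample, which is where the 64 kbps to 56 kbps saving comes from.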
Predictive DPCM Signal Encoder and Decoder

• Figure 4.2: Third-order predictive DPCM signal schematic.


• Encoder:
• Components: Bandlimiting filter, ADC, subtractor, adder, predictor coefficients C1, C2, C3, registers R1, R2, R3, timing control.
• Process: Predicts the signal using previous values weighted by
coefficients.
• Decoder:
• Components: DAC, low-pass filter, adder, predictor coefficients,
registers, timing control.
• Process: Reconstructs the original signal by adding predictions to the
transmitted residuals.
• Input Signal : A simple audio signal is sampled at regular intervals:
Input samples: [10, 12, 15, 13, 14] (in arbitrary units)
• Predictive DPCM Encoder Process:
1. Bandlimiting Filter: Filters the input to remove unnecessary high-frequency components.
2. ADC (Analog-to-Digital Converter): Converts the filtered input signal to digital form.
3. Prediction:
1. The predictor estimates the current signal using the previous samples and coefficients.
2. Assume predictor coefficients C1=0.7,C2=0.2,C3=0.1 and initial register values R1=R2=R3=0
4. Subtraction (Error Computation): Compute the residual error:
Residual=Actual Signal−Predicted Value.
5. Example Steps:
1. For Input = 10: Predicted Value = C1×R1 + C2×R2 + C3×R3 = 0.
Residual = 10 − 0 = 10.
Update Registers: R1 = 10, R2 = 0, R3 = 0.
2. For Input = 12: Predicted Value = 0.7×10 + 0.2×0 + 0.1×0 = 7.
Residual = 12 − 7 = 5.
Update Registers: R1 = 12, R2 = 10, R3 = 0.
3. Continue similarly for the remaining samples.
6. Residual Signal Output: Encoded residual signal: [10, 5, 4.6, −0.9, 0.7].
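The same steps in runnable form (a sketch only; the coefficients and input samples are taken from the example above):

```python
C1, C2, C3 = 0.7, 0.2, 0.1       # predictor coefficients from the example

def predictive_dpcm_encode(samples):
    r1 = r2 = r3 = 0.0           # registers R1, R2, R3 start at zero
    residuals = []
    for s in samples:
        predicted = C1 * r1 + C2 * r2 + C3 * r3
        residuals.append(s - predicted)
        r1, r2, r3 = s, r1, r2   # shift registers: R1 holds the newest sample
    return residuals

print(predictive_dpcm_encode([10, 12, 15, 13, 14]))
# [10.0, 5.0, 4.6, -0.9, 0.7] (values rounded)
```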
Adaptive Differential Pulse Code Modulation (ADPCM)
• Figure 4.3: Subband encoder and decoder schematic.
• Uses two subbands:
• Lower subband (50 Hz – 3.5 kHz): Encoded at 48 kbps.
• Upper subband (3.5 kHz – 7 kHz): Encoded at 16 kbps.
• Process:
• Subbands are encoded separately using ADPCM.
• Multiplexed to form a 64 kbps signal.
• At the receiver, demultiplexer splits the signal for decoding.
• Advantages of ADPCM
• Reduces bit rates for audio transmission.
• Enables independent encoding of subbands.
• Operating bit rates: 64, 56, or 48 kbps.
• ITU-T Recommendation G.726: Defines 16–40 kbps using a 3.4 kHz bandwidth.
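The adaptation idea behind ADPCM, as a hedged sketch (the step-size rules and constants below are invented for illustration, not taken from G.722 or G.726):

```python
def adpcm_encode(samples, bits=4):
    """Quantize DPCM residuals with a step size that adapts to the signal."""
    step, r, codes = 1.0, 0.0, []
    levels = 2 ** (bits - 1)
    for s in samples:
        d = s - r                                  # residual vs. prediction
        code = max(-levels, min(levels - 1, int(round(d / step))))
        codes.append(code)
        r += code * step                           # track the decoder's value
        # adapt: widen the step after large codes, narrow it after small ones
        step = max(0.01, step * (1.5 if abs(code) > levels // 2 else 0.9))
    return codes
```

Because the decoder applies the same adaptation rule to the codes it receives, no step-size information needs to be transmitted.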
Adaptive Predictive Coding (APC)

• Principle: Predictor coefficients change adaptively.


• Process:
• Input signal divided into fixed-time segments.
• Optimum coefficients computed for each segment.
• Results in bandwidth reduction to 8 kbps with acceptable quality.
• How It Works:
• APC tries to predict the next segment of a signal based on the characteristics of
the current segment.
• The predictor coefficients are adaptive, meaning they change depending on
the audio's frequency components at a given moment.
• Example:
• Imagine a music recording with a repeating pattern like a drumbeat.
• Step 1: The algorithm divides the sound into small segments.
• Step 2: For each segment, APC analyzes the frequencies (e.g., low bass, high
cymbals).
• Step 3: It predicts what the next segment will sound like based on these
characteristics.
• Step 4: Only the differences between the actual and predicted sound
(residuals) are transmitted.
• Visualization for Students:
• Input: "Boom-tick, Boom-tick" (drumbeat).
• Prediction: "Boom-tick" is modeled based on the frequency patterns.
• Output: Transmit only changes, such as "Boom-tack" if a cymbal is added.
By reducing the amount of data transmitted, the bandwidth required is significantly reduced.
Linear Predictive Coding (LPC)

• Introduction to Linear Predictive Coding (LPC)


• Traditional Methods:
• PCM: Sends quantized speech waveform samples directly.
• DPCM: Sends quantized difference signals.
• LPC Approach:
• Analyzes audio waveform for perceptual features.
• Features are quantized and transmitted.
• Regenerated at the receiver using a synthesizer.
• Advantages:
• High compression rates.
• Low bit rates.
• Disadvantage:
• Sound can be synthetic.
• Key Perceptual Features in LPC
• Determines perception of speech:
• Pitch:
• Related to signal frequency.
• Most sensitive in the 2-5 kHz range.
• Period:
• Duration of the signal.
• Loudness:
• Determined by signal energy.
• Sound Origins (Vocal Tract Excitation Parameters):
• Voiced Sounds:
• Generated via vocal cords (e.g., "m", "v", "l").
• Unvoiced Sounds:
• Vocal cords are open (e.g., "f", "s").
• LPC Encoding Process
• Steps:
• Input speech waveform is sampled and quantized.
• Block of digitized samples (segments) analyzed for perceptual parameters.
• Encoder determines new model coefficients for each segment.
• Adaptive Model:
• Decoder synthesizes speech using current model coefficients.
• LPC Encoder/Decoder Output
• Encoder Output:
• String of frames (one per segment).
• Each frame includes:
• Pitch and loudness fields.
• Voiced/unvoiced notification.
• Computed model coefficients.
• Some LPC encoders (e.g., LPC-10) use up to 10 sets of past coefficients.
• LPC Applications and Performance
• Bit Rates:
• LPC-10: 2.4 kbps or lower (e.g., 1.2 kbps).
• Use Case:
• Primarily in military applications.
• Bandwidth efficiency is critical.
• Limitations:
• Sound quality at very low bit rates can be highly synthetic.
• LPC Signal Encoder/Decoder (Figure 4.4)
• Encoder:
• Digitizes the input speech signal into segments.
• Extracts perceptual parameters such as pitch, loudness, and voiced/unvoiced.
• Generates LPC model coefficients based on vocal tract analysis.
• Decoder:
• Synthesizes speech using coefficients and vocal tract modeling.
• Combines outputs from voiced and unvoiced synthesizers.
• How It Works:
• LPC models the human speech production process by focusing on perceptual features like
pitch, loudness, and voiced/unvoiced sounds.
• It uses these features to regenerate speech at the decoder, achieving compression by not
transmitting all details of the sound.
• Example:
• Imagine you're talking into a phone, saying, "Hello."
• Step 1: The LPC encoder analyzes the sound signal and breaks it into perceptual features:
• Pitch: Frequency of your voice (~200 Hz for a typical female voice).
• Loudness: Energy of the signal (how loud you’re speaking).
• Voiced/Unvoiced: "H" is unvoiced; "ello" is voiced.
• Step 2: These features are converted into a model of the vocal tract.
• Step 3: The decoder reconstructs the speech signal based on this model.
• Visualization for Students:
• Think of your voice as a musical instrument:
• The pitch is the key you play on a keyboard.
• The loudness is how hard you press the key.
• The voiced/unvoiced sounds determine whether it’s a wind instrument or a drum.
By sending only this “score” instead of the entire piece of music, the sound is compressed.
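A compact sketch of the decoder side: the frame fields listed earlier drive a simple synthesizer (a pulse train for voiced segments, noise for unvoiced, filtered by the model coefficients). Field names and parameter values are illustrative, not from any LPC standard:

```python
import random
from dataclasses import dataclass

@dataclass
class LPCFrame:
    pitch: int              # period of the pulse train, in samples
    loudness: float         # excitation gain (signal energy)
    voiced: bool            # voiced/unvoiced notification
    coefficients: list      # vocal-tract model coefficients (10 for LPC-10)

def lpc_synthesize(frame, n_samples):
    # Excitation: periodic pulses if voiced, white noise if unvoiced.
    if frame.voiced:
        excitation = [frame.loudness if i % frame.pitch == 0 else 0.0
                      for i in range(n_samples)]
    else:
        excitation = [frame.loudness * random.uniform(-1, 1)
                      for _ in range(n_samples)]
    # All-pole vocal-tract filter: s[n] = e[n] + sum_k a_k * s[n-k].
    out = []
    for n, e in enumerate(excitation):
        s = e
        for k, a in enumerate(frame.coefficients, start=1):
            if n - k >= 0:
                s += a * out[n - k]
        out.append(s)
    return out

frame = LPCFrame(pitch=50, loudness=1.0, voiced=True,
                 coefficients=[0.5, -0.2])
audio = lpc_synthesize(frame, 400)   # one synthesized segment
```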
Code-excited linear prediction (CELP) model

•LPC Decoders: Simplistic models of the vocal tract.


•CELP Models:
•Advanced version of LPC.
•Part of the enhanced excitation LPC model family.
•Designed for limited bandwidth applications.
•Ensures acceptable speech quality for multimedia use
• The CELP Encoding Process
• Key Features:
• Limited set of waveform templates used.
• Templates stored in a codebook (precomputed by encoder/decoder).
• Differential encoding of template samples.
• Codeword Selection:
• Encoder sends codeword to select best-matching template.
• Improves sound quality through continuity between sample sets.
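A toy illustration of codeword selection (the codebook contents and the squared-error match criterion are assumptions for this sketch):

```python
def best_codeword(residual, codebook):
    """Return the index of the template closest to the residual."""
    def error(template):
        return sum((r - t) ** 2 for r, t in zip(residual, template))
    return min(range(len(codebook)), key=lambda i: error(codebook[i]))

codebook = [[0, 0, 0, 0], [1, 0, -1, 0], [2, 1, 0, -1], [0, 1, 1, 0]]
index = best_codeword([1.9, 1.1, 0.2, -0.8], codebook)   # -> 2
```

Only the index (here, 2) is transmitted; the decoder holds the same codebook and looks the template up.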
• International Standards
• CELP-Based Standards: ITU-T Recommendations G.728, G.729, G.729(A), and G.723.1.
• Features:
• Low bit rates.
• High perceived quality.
• Coder Delays
• Types of Delay:
• Processing Delay: Time for encoder to analyze and decoder to reconstruct speech.
• Algorithmic Delay:
• Time to buffer/store sample blocks.
• May include "lookahead" for analyzing successive blocks.
• Impact:
• High delays unsuitable for real-time conversations.
• Acceptable in non-interactive applications.
• Other Coder Considerations
• Key Parameters:
• Delay: Lower delay needed for conversations.
• Complexity: Trade-off with speech quality.
• Comparison:
• PCM Coders: Minimal delay (~0.125 ms at 8 kHz sampling).
• CELP-Based Standards (Table 4.1)
• Overview: CELP standards are widely used for telephony and video
applications.
• Examples:
• G.728: 16 kbps, 0.625 ms delay, low-bit-rate telephony.
• G.729: 8 kbps, 25 ms delay, telephony in cellular/radio networks.
• G.723.1: 5.3/6.3 kbps, 67.5 ms delay, video and Internet telephony.
• Trade-off: Delay vs. complexity vs. perceived quality.
• How It Works:
• CELP (Code-Excited Linear Prediction) enhances LPC by introducing a codebook to
improve compression and quality.
• Instead of calculating everything from scratch, the algorithm uses predefined patterns
(from the codebook) to match the sound.
• Example:
• Imagine you're playing a game where you need to describe a color.
• Instead of describing the color in detail ("It’s light green, with a yellowish tint"), you
simply say, "Code 5" if Code 5 represents this shade of green in your dictionary.
• Similarly, CELP transmits the codebook index for the closest match, saving
bandwidth.
• Visualization for Students:
• Think of the codebook as a playlist:
• Instead of sending the entire song, you just share the song number from Spotify.
• In CELP, only the index (or difference) is transmitted, and the decoder reconstructs the sound from its copy of the codebook.
• Key Takeaways
1.Understand the Basics: Grasp PCM first, as it forms the foundation for all
other techniques.
2.Focus on Applications: Know which coding scheme suits which application.
3.Adaptiveness vs Complexity: Adaptive techniques (ADPCM, APC, CELP) offer
better efficiency but come with increased complexity.
4.Speech-Specific Techniques: LPC and CELP are specialized for speech, with
CELP being highly advanced.
Perceptual Coding

• Introduction to Perceptual Coding


• Purpose:
• Compression of general audio (e.g., digital TV broadcasts).
• Key Features:
• Based on psychoacoustic models.
• Exploits limitations of human hearing.
• Transmits only perceptible features.
• Comparison:
• LPC and CELP: Primarily for speech compression in telephony.
• Perceptual coding: General audio compression.
• Psychoacoustic Model Overview
• Function:
• Analyzes sampled audio segments.
• Identifies features perceptible to the human ear.
• Eliminates inaudible features to reduce transmitted data.
• Key Limitations of Human Hearing:
• Non-linear sensitivity: More sensitive to certain frequencies.
• Frequency masking: Strong signals mask nearby weaker signals.
• Temporal masking: Ear needs time to recover after a loud sound.
• Sensitivity of the Human Ear
• Dynamic Range: Ratio of loudest sound to quietest sound (~96 dB, roughly 6 dB per bit for 16-bit samples).
• Perception Threshold:
• Ear is most sensitive to frequencies between 2-5 kHz.
• Example:
• Signal A above threshold → Audible.
• Signal B below threshold → Inaudible.
• Visual Reference:
• Threshold vs. frequency graph.
Frequency Masking
•Definition: Sensitivity of the ear changes when multiple frequencies are present.
•Key Points:
•Loud signals distort ear’s sensitivity curve in their vicinity.
•Example: Signal B (loud) masks Signal A (quieter, nearby frequency).
•Critical Bandwidth:
•Range of frequencies affected by masking.
•Increases with frequency:
•Below 500 Hz → Roughly constant (~100 Hz).
•Above 500 Hz → Increases linearly in multiples of 100 Hz.
• Masking Curve Example
• Masking Curves for Signals (1 kHz, 4 kHz, 8 kHz):
• Masking effect broadens with increasing frequency.
• Practical Application:
• Determine frequency components of an audio signal.
• Identify masked (inaudible) frequencies to avoid transmitting them.
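How a coder might use the masking curve, as a sketch (the linear threshold fall-off and all constants below are stand-ins for the real psychoacoustic curves):

```python
def is_masked(component, masker, critical_bw=100.0, slope=0.3):
    """True if `component` (freq Hz, level dB) is inaudible next to `masker`."""
    freq, level = component
    m_freq, m_level = masker
    distance = abs(freq - m_freq)
    if distance > critical_bw:
        return False                    # outside the masker's critical band
    threshold = m_level - slope * distance
    return level < threshold            # below the raised threshold: drop it

tone_a = (1000, 60)                     # loud 1 kHz masker
tone_b = (1100, 20)                     # quiet nearby tone
print(is_masked(tone_b, tone_a))        # True: no need to transmit tone B
```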
• Temporal Masking
• Definition:
• After hearing a loud sound, ear takes time to detect quieter sounds.
• Effect:
• Loud sound → Decay envelope (~tens of milliseconds).
• Signals below decay envelope → Inaudible.
• Visual Reference:
• Loud sound decay vs. time graph.
• Advantages of Perceptual Coding
• Efficiency: Transmits only perceptible audio features.
• Applications:
• General audio compression.
• Digital TV, multimedia broadcasting.
• Challenges:
• Accurate modeling of psychoacoustic phenomena.
• Real-time processing requirements for audio encoding.
•Perceptual coding uses psychoacoustic models to efficiently compress audio.
•Relies on exploiting human hearing limitations like frequency and temporal masking.
•Widely used for high-quality audio in multimedia applications.
• Applications in MPEG Audio Coders
• Key Points:
• Perceptual Coding:
• Eliminates inaudible parts of the signal (frequency and temporal masking).
• Achieves high compression without perceptible quality loss.
• Standards: MP3, AAC, and other MPEG formats leverage these properties.
• Scenario:
• Imagine two audio tones:
• Tone A: Loud, 1 kHz.
• Tone B: Soft, 1.1 kHz.
• Without masking:
• Both tones are audible.
• With masking:
• Tone B becomes inaudible due to Tone A's masking effect.
• Temporal Masking:
• After Tone A stops, the ear requires ~50 ms to detect Tone B.
Motion Pictures Expert Group (MPEG)
•A technique that exploits the human auditory system's limitations to compress audio.
•Applications: Widely used in audio compression applications, including MPEG audio coders.
•Key Standards:
•ISO Recommendation 11172-3 (MPEG audio layers 1, 2, 3).
•Focus on multimedia applications like digital video and audio broadcasting.
MPEG Audio Compression - Overview
•Purpose: Efficient storage and transmission of audio while maintaining perceptual quality.
•Core Concepts:
• Dividing the audio signal into frequency subbands.
• Applying psychoacoustic models to allocate bits based on human perception.
•Processing Levels: Layers 1, 2, and 3 (increasing complexity and compression).
•Applications: CD-quality audio, professional sound, and multimedia applications.
• Basic Encoder Structure
• Components:
• PCM Sampling: Samples and quantizes the input signal.
• Analysis Filter Bank: Divides signal into subbands (e.g., 32 subbands, each 500 Hz wide).
• Psychoacoustic Model: Calculates signal-to-mask ratios (SMRs).
• Quantization: Allocates bits to audible components based on masking thresholds.
• Frame Format: Encodes data into a subband sample (SBS) format.
• Figure Reference: Figure 4.8(a) showing the encoder and decoder.
• Psychoacoustic Model
• Purpose: Models human auditory perception for efficient audio encoding.
• Key Functions:
• Applies frequency and temporal masking.
• Uses a Discrete Fourier Transform (DFT) for frequency analysis.
• Determines SMRs for quantization.
• Output: Allocates bits to frequency components with higher perceptual importance.
• Visual: Illustrative diagram of masking effects and SMRs.
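A greedy sketch of SMR-driven bit allocation (real MPEG encoders use standardized tables; the 6 dB-per-bit rule and the loop below are simplifying assumptions):

```python
def allocate_bits(smr, bit_budget, max_bits=15):
    """Give bits to the subbands whose signal exceeds its mask the most."""
    alloc = [0] * len(smr)
    need = list(smr)                    # remaining SMR per subband, in dB
    while bit_budget > 0:
        i = max(range(len(need)), key=lambda k: need[k])
        if need[i] <= 0 or alloc[i] >= max_bits:
            break                       # everything audible is covered
        alloc[i] += 1
        need[i] -= 6.0                  # each bit buys ~6 dB of SNR
        bit_budget -= 1
    return alloc

print(allocate_bits([30.0, 12.0, 0.0, -5.0], bit_budget=8))
# -> [5, 2, 0, 0]: masked subbands (SMR <= 0) get no bits at all
```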
• Subband Filtering and Scaling Factors
• Subband Filtering:
• Breaks audio into frequency subbands (e.g., 32 subbands of 500 Hz each at 32 kHz sampling).
• Each subband corresponds to a frequency component.
• Scaling Factors:
• Maximum amplitude of subband samples.
• Used in quantization to ensure effective bit allocation.
• Quantization: Scaling factors (6 bits) and subband samples (4 bits) encode data.
• Frame Format in Encoder
• Components:
• Header: Contains metadata like sampling frequency.
• Subband Sample (SBS) Format: Includes scaling factors and quantized subband samples.
• Ancillary Data: Optional field for additional information (e.g., surround sound).
• Frame Duration:
• Layer 1: 384 samples (~12 ms at 32 kHz).
• Layer 2: 1152 samples (~36 ms).
• Decoder Structure
• Simplified Design:
• Does not require a psychoacoustic model.
• Components:
• Dequantizer: Decodes magnitude of subband samples.
• Synthesis Filter Bank: Converts subband samples back to PCM.
• Output: Produces reconstructed analog signal.
• Advantages:
• Reduced complexity and cost for applications like broadcasting.
• Performance of MPEG Layers
• Layer 1: Basic mode, no temporal masking, moderate compression.
• Layer 2:
• Includes temporal masking, higher compression.
• Also referred to as MUSICAM.
• Layer 3: Advanced processing, highest compression and perceptual quality (basis of MP3).
• Applications:
• Layer 1: Broadcast communications.
• Layer 2: General multimedia.
• Layer 3: CD-quality audio, internet streaming.
• Table Reference:
• Include Table 4.2 summarizing performance and application domains.

• Perceptual coding efficiently compresses audio while maintaining quality.


• MPEG layers provide scalable solutions for various applications.
• Psychoacoustic models and subband filtering are crucial components.
• Future Implications: Basis for modern audio formats (e.g., MP3, AAC).
• ISO Recommendations and Application
• Key Points:
• ISO Recommendation 11172-3 defines three layers:
• Layer 1: Basic (no temporal masking).
• Layer 2: Adds masking for better compression.
• Layer 3: Maximum compression for limited bandwidth.

• Table 4.2 Reference:


• Provides application areas and performance summaries.
• Example for Students:
• Layer 3 (MP3): Perfect for mobile streaming due to low bandwidth requirements.
• Applications of Compression
• Key Points:
• Compression caters to different audio formats:
• Monophonic, dual monophonic, stereo, and joint stereo.
• Joint stereo encodes redundant information to save space.
• Student Activity Example:
• Compare audio quality of MP3 (Layer 3) vs WAV (uncompressed) to understand compression trade-offs.
Comparison of PCM, DPCM, ADPCM, APC, LPC, and CELP

Feature                  | PCM           | DPCM              | ADPCM        | APC          | LPC       | CELP
Compression              | None          | Basic             | Moderate     | High         | Very high | Very high
Adaptiveness             | No            | No                | Yes          | Yes          | No        | Yes
Application              | General audio | Basic compression | Speech/audio | Speech/audio | Speech    | Speech
Computational complexity | Low           | Low               | Moderate     | High         | Moderate  | Very high
Technique summaries (description and key takeaways):
• PCM (Pulse Code Modulation): Encodes analog signals into digital form by sampling at regular intervals and quantizing the values. Key takeaways: simple and widely used; requires high bandwidth due to its uncompressed nature.
• DPCM (Differential PCM): Encodes the difference between successive samples to reduce redundancy. Key takeaways: lower bandwidth compared to PCM; prone to cumulative errors.
• ADPCM (Adaptive Differential PCM): Improves DPCM by adapting quantization levels to the signal's dynamic range. Key takeaways: efficient for voice compression; balances quality and compression.
• Adaptive Predictive Coding (APC): Predicts future samples based on past samples and encodes the prediction error. Key takeaways: suitable for speech coding; requires complex prediction algorithms.
• Linear Predictive Coding (LPC): Models the vocal tract as filters and encodes parameters to represent speech. Key takeaways: efficient for speech synthesis and recognition; low bit rate.
• Code-Excited LPC (CELP): Combines LPC with a codebook of excitation signals to improve quality. Key takeaways: used in mobile and VoIP applications; balances bit rate and quality.
• Perceptual Coding: Removes inaudible parts of the signal based on psychoacoustic principles. Key takeaways: foundation for modern codecs (e.g., MP3); high compression with minimal perceptual loss.
• MPEG Audio Coders: Use subband filtering, psychoacoustic models, and perceptual coding for efficient audio compression. Key takeaways: widely used in multimedia and streaming; supports multiple layers with varying complexity.
• Dolby Audio Coders: Specialize in high-quality surround sound using adaptive transform coding. Key takeaways: used in cinema and home theaters; provides immersive audio experiences.
Dolby Audio Coders: Forward Adaptive Bit Allocation

• Key Features:
• Bits allocated dynamically for each subband.
• Bit allocation data included with encoded samples.
• Advantages: Psychoacoustic model needed only in the encoder.
• Disadvantages:
• Bit allocation information increases overhead in the bitstream.
• Reduced efficiency of the available bit rate.
• Example: Illustrated in Figure 4.9(a).
• Fixed Bit Allocation Strategy
• Description:
• Fixed bits assigned to subbands based on ear sensitivity.
• No need to transmit bit allocation data in the bitstream.
• Advantages: Simple and efficient.
• Example:
• Dolby AC-1 (Acoustic Coder) Standard
• 40 subbands at 32 ksps.
• Typical stereo bit rate: 512 kbps.
Dolby Audio Coders: Backward Adaptive Bit Allocation

• Key Features:
• Psychoacoustic model present in both encoder and decoder.
• Decoder computes bit allocation using subband samples.
• Advantages:
• Reduces overhead in the bitstream.
• Disadvantages:
• Dependency on the same psychoacoustic model in encoder and decoder.
• Encoder modification requires decoder updates.
• Example:
• Dolby AC-2 Standard
• Used in PC sound cards.
• Typical stereo bit rate: 256 kbps.
Backward Adaptive vs. Forward Adaptive

Feature                  | Forward Adaptive                             | Backward Adaptive
Where allocation happens | At the encoder                               | At the decoder
Bit allocation info      | Explicitly sent in the bitstream             | Computed by decoder from the spectral envelope
Advantages               | Encoder simplicity, no dependency on decoder | Reduced bitstream overhead, more efficient use of bits
Disadvantages            | Increased bitstream overhead                 | Requires psychoacoustic model in decoder
Hybrid Backward/Forward Adaptive Bit Allocation
• Description:
• Combines forward and backward adaptive methods.
• Key Components:
• PMB: Backward psychoacoustic model.
• PMF: Forward psychoacoustic model (complex, used only in the encoder).
• Process:
• PMF computes differences between forward and backward allocations.
• Differences improve quantization accuracy and are transmitted to the decoder.
• Advantages:
• Flexible encoder modifications.
• Improved audio quality.
• Example:
• Dolby AC-3 Standard
• Used in HDTV and ATV applications.
• Comparable to MPEG audio standards.
Block Design in Dolby AC-3

• Block Structure:
• Each block: 512 subband samples.
• Continuity: Last 256 samples repeated in the next block.
• PCM sampling rate: 32 ksps.
• Subband Details:
• Audio bandwidth: 15 kHz.
• Subband width: 62.5 Hz.
• Duration: Block: 16 ms (512 samples, with 256 new samples).
• Bit Rate: Typical stereo rate: 192 kbps.
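The 50% block overlap in sketch form (the numbers follow the slide: 512-sample blocks, 256 new samples each, 16 ms at 32 ksps):

```python
def ac3_blocks(samples, block=512, hop=256):
    """Yield 512-sample blocks whose last 256 samples repeat in the next."""
    for start in range(0, len(samples) - block + 1, hop):
        yield samples[start:start + block]

blocks = list(ac3_blocks(list(range(1024))))   # 3 overlapping blocks
```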
Comparison of Techniques
Feature              | Forward Adaptive | Fixed Allocation            | Backward Adaptive    | Hybrid Approach
Bitstream overhead   | High             | Low                         | Medium               | Medium
Psychoacoustic model | Encoder only     | None in decoder             | Both encoder/decoder | Encoder dominant
Flexibility          | Moderate         | Low                         | Low                  | High
Audio quality        | Good             | Limited by fixed allocation | Good                 | Excellent
Applications

• Dolby AC-1:
• FM radio and TV audio.
• Low complexity, high efficiency.
• Dolby AC-2:
• Hi-fi quality for PC sound cards.
• Used in professional audio compression.
• Dolby AC-3:
• Advanced Television (ATV) and HDTV.
• Comparable to MPEG audio standards.
Summary

• Bit Allocation Methods: Forward, backward, and hybrid strategies optimize


compression.
• Standards: Dolby AC-1, AC-2, and AC-3 provide scalable solutions for different
applications.
• Trade-offs: Efficiency vs. complexity vs. audio quality.
• Future Trends:
• Improved models and flexible bit allocation strategies for emerging audio applications.
Video Compression
• Applications of Video in Multimedia:
• Interpersonal Communication:
• Video telephony
• Videoconferencing
• Interactive Access: Stored video in various formats
• Entertainment:
• Digital television
• Movie/video-on-demand
• Video Quality Determinants
• Digitization Format:
• Defines the sampling rate for:
• Luminance (Y)
• Chrominance (Cb and Cr)
• Specifies their relative position within a frame
• Frame Refresh Rate: Impacts the perceived smoothness of motion
• Digitization Formats and Bit Rates
• Range of Bit Rates (Worst Case):
• SQCIF (Sub-Quarter Common Intermediate Format): ~10 Mbps
• Used for video telephony
• 4:2:0 Format: ~162 Mbps
• Used for digital television broadcasts
• Challenges in Video Applications
• High Bit Rate Requirements:
• Resulting from all digitization formats
• Example: 162 Mbps for 4:2:0 format
• Available Transmission Channel Bit Rates:
• Significantly lower than required by digitization formats
• Importance of Compression
• Compression is essential for all video applications due to the disparity
between:
• Bit rate requirements of video formats
• Transmission channel capabilities
• Diversity of Video Standards
• No Single Video Standard:
• Multiple standards exist
• Each standard is targeted at specific application domains
• Majority of Standards:
• Internationally recognized
• Designed to address diverse requirements
Introduction to Video Compression Principles

• Terminology:
• Video is referred to as moving pictures; terms "frame" and "picture" used
interchangeably.
• Standard usage in context: "frame."
• Compression via JPEG:
• Applying JPEG independently to each frame is called MJPEG (Moving JPEG).
• Compression ratios: 10:1 to 20:1.
• These ratios are insufficient for most video applications.
• Spatial and Temporal Redundancy
• Spatial Redundancy: Exists within individual frames.
• Temporal Redundancy: Exists between successive frames:
• Example:
• Minor movements like lips or eyes in video telephony.
• Larger movements like a person or vehicle in movies.

• Bandwidth Savings: Send only information about moving segments.


• Temporal Redundancy Exploitation
• Predictive Compression:
• Predict frame contents using preceding or succeeding frames.
• Only differences between actual and predicted frames are sent.
• Motion Estimation: Estimates movement between frames for accurate prediction.
• Motion Compensation: Sends additional data to correct prediction errors.
• Types of Compressed Frames
• Intracoded Frames (I-Frames):
• Encoded independently, using JPEG principles.
• Provide low compression.
• Predictive Frames (P-Frames):
• Encoded based on preceding I- or P-frames.
• Use motion estimation and compensation for higher compression.
• Bidirectional Frames (B-Frames):
• Predicted using both past and future frames.
• Provide the highest compression.
• Group of Pictures (GOP)
• I-Frames in GOP:
• Inserted regularly to recover from transmission errors.
• Interval between I-Frames denoted as GOP (N).
• Typical values: 3 to 12 frames.
• Prediction Span (M):
• Number of frames between a P-frame and its reference frame.
• Typical values: 1 to 3 frames.
• Motion Estimation Techniques
• Search Regions:
• Limited to neighbouring segments for efficiency.
• Applications:
• Video telephony: Small movements handled well.
• Movies: B-frames handle fast-moving objects and overlapping objects.
• Encoding and Decoding Process
• I-Frames: Decoded immediately upon receipt.
• P-Frames: Decoded using preceding I- or P-frame contents.
• B-Frames:
• Decoded using preceding and succeeding I- or P-frames.
• Require reordering of encoded frames for efficient decoding.
• Frame Reordering Example
• Uncoded Frame Sequence: IBBPBBPBBI...
• Reordered Frame Sequence: IPBBPBBIBBPBB...
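A small sketch of how this reordering falls out of the dependencies (each B-frame needs the following I- or P-frame reference before it can be decoded, so that reference is transmitted first):

```python
def reorder_for_transmission(display_order):
    out, pending_b = [], []
    for f in display_order:
        if f == 'B':
            pending_b.append(f)        # hold B-frames back...
        else:
            out.append(f)              # ...until their reference is sent
            out.extend(pending_b)
            pending_b = []
    return out + pending_b

print(''.join(reorder_for_transmission(list('IBBPBBPBBI'))))   # IPBBPBBIBB
```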
• Additional Frame Types
• PB-Frames:
• Encodes neighbouring P- and B-frames as a single frame.
• Increases frame rate without significantly increasing bit rate.
• D-Frames:
• Used in movie/video-on-demand applications for fast-forward/rewind.
• Encoded with only DC coefficients for high-speed decoding.
• D-Frames in Detail
• Usage: Inserted periodically in the video stream.
• Compression Principle:
• Encoded using DC coefficients of pixel blocks (mean values).
• Provides low-resolution sequences for fast operations.
Frame Prediction in Compression
•Frame Prediction:
•Encoded contents of P- and B-frames predicted by estimating motion between:
•The target frame (current frame being encoded).
•Preceding I- or P-frame.
•For B-frames, also the succeeding P- or I-frame.
•Motion estimation identifies macroblock movement between frames.
•Figure Reference: Steps involved in encoding a P-frame are illustrated in Figure 4.12.
• Macroblock Structure
• Definition of Macroblock:
• A 16x16 pixel block in the Y matrix (luminance).
• For 4:1:1 digitization:
• Cb and Cr matrices are 8x8 pixels each.
• DCT Blocks per Macroblock:
• 4 DCT blocks for luminance.
• 1 DCT block each for chrominance (Cb and Cr).
• Macroblock Address: Each macroblock is uniquely identified for encoding.
• Encoding a P-Frame
• Comparison Process:
• Pixel-by-pixel comparison between:
• Target macroblock in the target frame.
• Corresponding macroblock in the reference frame (preceding I or P frame).
• Match Scenarios:
• Close Match Found: Only macroblock address encoded.
• No Match Found:
• Search extended to a defined area around the macroblock in the reference frame.
• Search area typically includes multiple macroblocks.
• Motion Vector and Prediction Error
• Motion Vector:
• Indicates the (x, y) offset of the target macroblock relative to its matching location in the
reference frame.
• Can be at macroblock or single-pixel resolution.
• Prediction Error:
• Comprises three matrices (Y, Cb, Cr).
• Contains differences between the target macroblock and the matched block in the search area.
• Search Strategy for P-Frames
• Search Criteria: Match identified if the mean of absolute errors for pixel
differences is below a threshold.
• Encoding Results:
• Motion Vector: Encoded using Differential Encoding (DE) and Huffman Encoding.
• Difference Matrices: Encoded with:
• Discrete Cosine Transform (DCT).
• Quantization.
• Entropy encoding.
• No Match: Macroblock encoded independently like I-frame macroblocks.
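A minimal sketch of this block-matching search (frames as 2-D lists of luminance values; the search range and error threshold are illustrative):

```python
def mae(ref, tgt, ty, tx, oy, ox, size=16):
    """Mean absolute error between a target macroblock and an offset block."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(tgt[ty + y][tx + x] - ref[ty + oy + y][tx + ox + x])
    return total / (size * size)

def find_motion_vector(ref, tgt, ty, tx, search=8, threshold=4.0, size=16):
    best = None
    for oy in range(-search, search + 1):
        for ox in range(-search, search + 1):
            if not (0 <= ty + oy <= len(ref) - size and
                    0 <= tx + ox <= len(ref[0]) - size):
                continue               # offset would leave the frame
            err = mae(ref, tgt, ty, tx, oy, ox, size)
            if best is None or err < best[2]:
                best = (oy, ox, err)
    if best and best[2] < threshold:
        return best[0], best[1]        # motion vector: (y, x) offset
    return None                        # no match: intracode the macroblock
```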
• Encoding a B-Frame
• Reference Frames:
• Motion estimated using:
• Preceding I- or P-frame.
• Succeeding P- or I-frame.
• Computation Steps:
• Motion vector and difference matrices calculated using:
• Preceding reference frame.
• Succeeding reference frame.
• A third set of values computed using the mean of the two.
• Best match (lowest difference matrices) selected.
• Encoding:
• Motion vectors encoded to sub-pixel resolution (e.g., half-pixel resolution).
• Difference matrices encoded similarly to P-frames.
•Sub-Pixel Resolution:
•Enhances precision of motion estimation for B-frames.
•Unmatched Macroblocks:
•Encoded independently as in I-frames.
•Efficiency:
•B-frames provide better compression but require more computational resources.
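The three-way choice for B-frames, sketched with macroblocks as flat lists of pixel values (a simplification of the per-matrix Y/Cb/Cr computation):

```python
def best_b_prediction(target, prev_block, next_block):
    """Pick the reference (previous, succeeding, or mean) with least residual."""
    mean_block = [(p + n) / 2 for p, n in zip(prev_block, next_block)]
    candidates = {'previous': prev_block,
                  'succeeding': next_block,
                  'mean': mean_block}
    def energy(block):
        return sum((t - b) ** 2 for t, b in zip(target, block))
    choice = min(candidates, key=lambda k: energy(candidates[k]))
    residual = [t - b for t, b in zip(target, candidates[choice])]
    return choice, residual
```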
Issues in Encoding I-, P-, and B-Frames
• Overview of Encoding Process (Figure 4.14)
• Figure 4.14 Breakdown:
• (a) Encoding I-frames.
• (b) Encoding P-frames.
• (c) Encoding B-frames.
• (d) Example encoded bitstream format.
• I-Frame Encoding
• Procedure:
• Encoding follows the JPEG standard for each 8x8 pixel block.
• Steps involved:
• Forward Discrete Cosine Transform (DCT).
• Quantization.
• Entropy encoding.
• Macroblock Composition:
• Four 8x8 luminance (Y) blocks, plus one 8x8 block each for Cb and Cr.
• P-Frame Encoding (Figure 4.14b)
• Motion Estimation Unit:
• Determines match between the target macroblock and reference frame macroblock.
• Three possible scenarios:
• Exact Match: Encode only the macroblock address.
• Close Match: Encode motion vector and difference matrices.
• No Match: Encode macroblock as in an I-frame.
• Reference Frame Management:
• Reference frame (uncoded) is stored in a buffer.
• Reference frame updated using computed difference values decompressed by:
• Dequantization (DQ).
• Inverse DCT (IDCT).
• B-Frame Encoding (Figure 4.14c)
• Reference Frames:
• Both preceding and succeeding frames used.
• Difference values calculated from:
• Preceding reference frame.
• Succeeding reference frame.
• Encoded Bitstream Format (Figure 4.14d)
• Components of Format:
• Type Field: Identifies frame type (I-, P-, or B-frame).
• Macroblock Address: Location of macroblock in the frame.
• Quantization Value: Threshold for quantizing DCT coefficients.
• Motion Vector: Encoded vector, if present.
• Block Presence Indicator: Identifies which of the six 8x8 blocks are present.
• JPEG-Encoded DCT Coefficients: For present blocks.
• Variable Bit Rate:
• Encoder output varies with source video complexity.
• Decoding Process
• I-Frame Decoding: Same as decoding JPEG-encoded images.
• P-Frame Decoding:
• Retains preceding decoded I- or P-frame in a buffer.
• Macroblocks processed as:
• Uncoded Macroblocks: Use address to copy contents from the buffer.
• Fully Encoded Macroblocks: Decode directly.
• Motion Vector + Difference Matrices: Compute new values using reference frame matrices.
• B-Frame Decoding:
• Requires three buffers:
• Preceding I- or P-frame.
• Succeeding P- or I-frame.
• Frame being assembled.
• Performance Metrics
• Compression Ratios:
• I-Frames: Similar to JPEG, typically 10:1 to 20:1 depending on frame complexity.
• P-Frames: Higher, typically 20:1 to 30:1.
• B-Frames: Highest, typically 30:1 to 50:1.
• Factors Affecting Compression:
• Frame complexity.
Introduction to H.261 Video Compression

• Defined by ITU-T for video telephony and videoconferencing over ISDN.


• Known as p x 64, where p is 1 to 30 (multiples of 64 kbps).
• Digitization formats:
• CIF (Common Intermediate Format): Typically for videoconferencing.
• QCIF (Quarter CIF): Typically for video telephony.
• Compression operates on macroblocks (16 x 16 pixels):
• Horizontal resolution reduced to 352 pixels for 22 macroblocks.
• Spatial Resolutions
• CIF Resolution:
• Luminance (Y): 352 x 288.
• Chrominance (Cb, Cr): 176 x 144.
• QCIF Resolution:
• Luminance (Y): 176 x 144.
• Chrominance (Cb, Cr): 88 x 72.
• Progressive (non-interlaced) scanning with refresh rates:
• CIF: 30 fps.
• QCIF: 15 or 7.5 fps.
Frame Types and Encoding

• Uses I-frames and P-frames only:


• 1 I-frame followed by 3 P-frames.
• Macroblock encoding:
• Each macroblock: 4 blocks for Y, 1 for Cb, and 1 for Cr.
• Encoded using forward DCT, quantization, and entropy encoding.
• Encoded macroblock format (Figure 4.15(a)):
• Address: Identifies macroblock.
• Type: Intracoded (I-frame) or intercoded (P-frame).
• Quantization value: Threshold for DCT coefficients.
• Motion vector: Encoded vector (if present).
• Coded block pattern: Indicates which of the six 8 x 8 blocks are present.
Frame Format

• Encoded frame format (Figure 4.15(b)):


• Picture start code: Indicates start of a new frame.
• Temporal reference field: Time-stamp for synchronization with audio blocks.
• Picture type field: Specifies I-frame or P-frame.
• Group of Blocks (GOB)
• Group of macroblocks (GOB):
• Matrix: 11 x 3 macroblocks.
• CIF: 12 GOBs (2 x 6).
• QCIF: 3 GOBs (1 x 3).
• Unique start code at each GOB head (Figure 4.15(c)):
• Serves as a resynchronization marker.
• Allows error handling by locating the start of the next GOB.
• GOB dropping mechanism:
• Allows GOBs to be dropped to reduce the data rate when it exceeds the available bandwidth.
Variable Bit Rate and FIFO Buffer

• Variable bit rate produced by encoder:


• Must be converted to a constant bit rate for ISDN transmission.
• FIFO buffer scheme (Figure 4.16(a)):
• First-In-First-Out (FIFO) buffer used to stabilize bit rate.
• Monitored to adjust quantization dynamically:
• High threshold: Increase quantization threshold, reducing output rate.
• Low threshold: Decrease quantization threshold, increasing output
rate.
• Data structure preserved in FIFO.
• Quantization Control
• Compression rate adjustment via quantization:
• Higher quantization threshold → Lower accuracy → Lower bit rate.
• Lower quantization threshold → Higher accuracy → Higher bit rate.
• Operates at the GOB level:
• Adjust quantization for each GOB.
• Drop GOBs when thresholds are exceeded.
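A toy version of this feedback loop (the watermarks, step sizes, and buffer capacity are invented for the sketch; H.261 quantizer values run 1 to 31):

```python
class RateController:
    def __init__(self, capacity=40_000, high=0.8, low=0.2):
        self.capacity, self.high, self.low = capacity, high, low
        self.fill = 0
        self.quant = 8                             # current quantization threshold

    def after_gob(self, bits_in, bits_out):
        """Update buffer fill and adjust the quantizer for the next GOB."""
        self.fill = max(0, min(self.capacity, self.fill + bits_in - bits_out))
        occupancy = self.fill / self.capacity
        if occupancy > self.high:
            self.quant = min(31, self.quant + 2)   # coarser -> fewer bits
        elif occupancy < self.low:
            self.quant = max(1, self.quant - 1)    # finer -> more bits
        return self.quant
```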
• Handling Transmission Bandwidth
• Adaptive adjustments:
• Quantization thresholds are matched with dequantizers in the decoder.
• Complete frames may be dropped:
• Matches frame rate with available transmission bandwidth.
Introduction to H.263

• Definition: ITU-T video compression standard for applications like:


• Video telephony
• Videoconferencing
• Security surveillance
• Interactive games
• Real-Time Requirement: Encoder output transmitted over networks in real
time.
• Target Networks: Wireless and PSTN, which require compression to very low
bit rates (28.8 to 56 kbps).
• Comparison to H.261:
• At bit rates <64 kbps, H.261 offers poor picture quality due to high quantization
thresholds and low frame rates.
• H.263 introduces advanced coding options to improve quality.
Digitization Formats in H.263

• Mandatory Formats: QCIF and Sub-QCIF (S-QCIF).


• Frame Division:
• Macroblocks of 16×16 pixels.
• Horizontal resolution adjusted for an integral number of
macroblocks.
• Spatial Resolutions:
• QCIF: Y = 176×144, Cb = Cr = 88×72.
• S-QCIF: Y = 128×96, Cb = Cr = 64×48.
• Frame Rate: Progressive scanning with refresh rates of 15 or 7.5 fps.
• Decoder Requirement: Must support both formats; encoder may
support only one.
Frame Types in H.263

• Frame Types Used: I-frames, P-frames, and B-frames.


• PB-Frames:
• Combines a B-frame and succeeding P-frame into a single entity.
• Reduces encoding overheads, enabling higher frame rates.
• Decoding Process:
• P-frame reconstructed first.
• B-frame decoded bidirectionally using the reconstructed P-frame and preceding P-
frame.
• Unrestricted Motion Vectors
• Traditional Restriction:
• Motion vectors limited to a defined search area within the frame boundary.
• Macroblocks near frame edges often encoded as I-frames due to mismatches.
• Unrestricted Mode:
• Pixels outside the frame boundary replaced with edge pixels.
• Error Resilience in H.263
• Target Networks: High error probability in wireless and PSTNs.
• Error Characteristics:
• Burst errors affect groups of macroblocks (GOBs).
• Errors propagate across frames due to predictive encoding.
• Error Masking:
• Skip corrupted GOBs and resynchronize at the next GOB.
• Use error concealment (e.g., preceding frame’s GOB).
• Error Tracking
• Two-Way Communication: Decoder sends feedback to encoder.
• Error Detection Methods:
• Invalid motion vectors, codewords, or DCT coefficients.
• Operation:
• Negative acknowledgment (NAK) sent with GOB location.
• Encoder retransmits affected macroblocks as intracoded blocks.
• Figure Reference: Illustrates frame sequence and error handling (Figure 4.17).
• Independent Segment Decoding
• Objective: Prevent errors in one GOB from affecting neighboring GOBs.
• Method:
• Treat each GOB as an independent subvideo.
• Limit motion estimation and compensation to within GOB boundaries.
• Limitation: Reduced efficiency in vertical motion estimation.
• Figure Reference: Examples shown in Figures 4.18(a) and 4.18(b).
• Reference Picture Selection
• NAK Mode:
• Errors reported via NAK.
• Encoder selects unaffected frames for reference.
• Propagation delay determined by communication channel round-trip time.
• ACK Mode:
• Decoder acknowledges error-free frames.
• Only acknowledged frames used as references.
• Better performance with short round-trip delays.
• Figure Reference: Diagrammatic representation in Figures 4.19(a) and 4.19(b).
Introduction to MPEG

• Formation: MPEG was formed by ISO.


• Objective: Develop standards for multimedia applications involving video and sound.
• Outcome:
• Three standards for recording or transmitting integrated audio and video streams.
• Focus on compression and integration of audio and video.
• MPEG Standards Overview
• Each standard targets a specific application domain.
• Three primary standards:
• MPEG-1
• MPEG-2
• MPEG-4
• Key Aspects:
• Video resolution.
• Audio and video compression and integration.
• MPEG-1
• Defined In: ISO Recommendation 11172.
• Video Resolution: Source Intermediate Digitization Format (SIF) – up to 352 x
288 pixels.
• Application:
• Storage of VHS-quality audio and video on CD-ROM.
• Bit rates up to 1.5 Mbps (higher rates for faster access).
• MPEG-2 Overview
• Defined In: ISO Recommendation 13818.
• Applications:
• Recording and transmission of studio-quality audio and video.
• Four levels of video resolution.
• MPEG-2 Levels
1. Low Level:
1. Resolution: Up to 352 x 288 pixels (SIF format).
2. Quality: VHS-quality video.
3. Bit Rate: Up to 4 Mbps.
2. Main Level:
1. Resolution: Up to 720 x 576 pixels (4:2:0 format).
2. Quality: Studio-quality digital video.
3. Features: Multiple CD-quality audio channels.
4. Bit Rate: Up to 15 Mbps (or 20 Mbps with 4:2:2 format).
3. High 1440 Level:
1. Resolution: 1440 x 1152 pixels (4:2:0 format).
2. Application: HDTV.
3. Bit Rate: Up to 60 Mbps (or 80 Mbps with 4:2:2 format).
4. High Level:
1. Resolution: 1920 x 1152 pixels (4:2:0 format).
2. Application: Wide-screen HDTV.
3. Bit Rate: Up to 80 Mbps (or 100 Mbps with 4:2:2 format).
• MPEG-4
• Initial Focus: Low bit rate channels (4.8 to 64 kbps).
• Expanded Scope:
• Interactive multimedia applications over the Internet.
• Entertainment networks.
• MPEG-7 (Brief Mention)
• Purpose: Describes structure and features of multimedia content.
• Applications:
• Search engines use these descriptions to locate material with defined features.
• MPEG-3
• Focus: Initially intended for HDTV.
• Outcome:
• Work was incorporated into MPEG-2.
• Not developed separately.
• Structure of MPEG Standards
• Three Parts:
• Video
• Audio
• System
• Functions:
• Video and audio parts: Compression and bitstream formatting.
• System part: Integration and synchronization of audio and video streams.
• Scope of Discussion in the Book
• Focus on:
• MPEG-1.
• Main and high levels of MPEG-2.
• Selected features of MPEG-4.
• Introduction to MPEG-1
• Compression Technique:
• Similar to H.261 standard.
• Uses Source Intermediate Format (SIF).
• Macroblocks:
• Frame divided into 16 x 16 pixel macroblocks.
• Horizontal resolution reduced from 360 to 352 pixels for integral macroblocks.
• Spatial Resolutions
• NTSC:
• Y: 352 x 240 pixels
• Cb = Cr: 176 x 120 pixels
• PAL:
• Y: 352 x 288 pixels
• Cb = Cr: 176 x 144 pixels
• Progressive Scanning:
• Refresh rates: 30 Hz (NTSC), 25 Hz (PAL).
• Frame Types in MPEG-1
• Supported Frames:
• I-frames only.
• I- and P-frames.
• I-, P-, and B-frames (most common).
• Unsupported Frames: No D-frames in MPEG standards.
• Key Use of I-frames:
• Essential for random-access functions (e.g., VCR).
• Max random-access time: 0.5 seconds.
• Frame Sequences
• Example sequences for PAL and NTSC:
• PAL: IBBPBBPBBI…
• NTSC: IBBPBBPBBPBBI… (now commonly used for both).
• Compression Algorithm
• Based On: H.261 standard.
• Macroblock Composition:
• 16 x 16 pixels in Y-plane.
• 8 x 8 pixels in Cb and Cr planes.
• Differences from H.261
1. Time-stamps (Temporal References):
1. Enable faster decoder resynchronization.
2. Macroblocks grouped into slices.
3. A slice = 1 up to the maximum number of macroblocks per frame (typically 22).
2. Introduction of B-frames:
1. Increased interval between I- and P-frames.
2. Larger search window for moving objects.
3. Finer motion vector resolution.
• Compression Ratios
• I-frames: ~10:1
• P-frames: ~20:1
• B-frames: ~50:1
• Hierarchical Bitstream Structure
• Top Level: Sequence
• String of Groups of Pictures (GOPs).
• Each GOP = String of I, P, or B frames.
• Frames consist of slices → macroblocks → 8x8 pixel blocks.
• Decoder Requirement:
• Sequence Format
• Sequence Start Code: Indicates the start.
• Parameters for Sequence:
• Video Parameters: Screen size and aspect ratio.
• Bitstream Parameters: Bit rate and memory/frame buffer size.
• Quantization Parameters: Quantization tables for frame types.
• Encoded Video Stream: String of GOPs.
• Frame Details
• Picture Start Code: Indicates start of a frame.
• Fields:
• Type field (I, P, or B).
• Buffer parameters: Define buffer fill level for decoding.
• Encode parameters: Define motion vector resolution.
• Slices: Multiple slices per frame.
• Slice Details
• Slice Start Code: Marks start of each slice.
• Fields:
• Vertical position: Indicates scan line related to slice.
• Quantization parameter: Scaling factor for the slice.
• Macroblocks:
• Encoded like H.261.
• Slice = Equivalent to GOB in H.261.
• MPEG-2 Overview
• Four Levels:
• Low, Main, High 1440, High.
• Five Profiles:
• Simple, Main, Spatial Resolution, Quantization Accuracy, High.
• Two-Dimensional Framework:
• Levels and profiles form a table for standards activities.
• Enables compatibility and interworking between old and new equipment.
• Low Level: Compatible with MPEG-1.
• Focus: Main Profile @ Main Level (MP@ML) and High Levels for HDTV.
• MP@ML (Main Profile @ Main Level)
• Target Application: Digital television broadcasting.
• Scanning Method: Interlaced scanning.
• Resolution:
• 720 x 480 pixels @ 30 Hz (NTSC).
• 720 x 576 pixels @ 25 Hz (PAL).
• Output Bit Rate: 4 Mbps to 15 Mbps (dependent on broadcast channel bandwidth).
• Differences in Video Coding
• Interlaced Scanning:
• Frame = Two interlaced fields.
• Fields alternate lines (as shown in Figure 4.22(a)).
• DCT Encoding Modes:
• Field Mode: Based on lines in a field; used for high motion.
• Frame Mode: Based on lines in a complete frame; used for low motion.
• Choice of Mode: Depends on motion in video (e.g., live sports = field mode, studio program = frame
mode).
• Motion Estimation in P- and B-frames
• Modes:
• Field Mode:
• Motion vectors calculated for corresponding macroblocks in preceding (I or P) and succeeding (P or I) fields.
• Motion relates to time to scan one field.
• Frame Mode:
• Motion vectors calculated for odd and even fields relative to preceding/succeeding odd/even fields.
• Motion relates to time to scan a full frame (two fields).
• Mixed Mode:
• Both field and frame vectors computed; smallest mean values selected.
• HDTV Standards
• Three Standards:
• Advanced Television (ATV) – North America.
• Digital Video Broadcast (DVB) – Europe.
• Multiple Sub-Nyquist Sampling Encoding (MUSE) – Japan and Asia.
• Common Features:
• Define digitization format, compression schemes, and transmission methods.
• ITU-R HDTV specification for studio production and international exchange.
• ITU-R HDTV Specification
• Aspect Ratio: 16:9.
• Resolution:
• 1920 samples per line.
• 1152 lines per frame (1080 visible).
• Current Scanning: Interlaced.
• Future Scanning: Progressive with 4:2:2 format.
• ATV (Advanced Television)
• Alliance: Developed by Grand Alliance (GA).
• Specifications:
• ITU-R HDTV (16:9, 1920 x 1080).
• Lower resolution format: 1280 x 720 (16:9).
• Compression Standards:
• Video: MPEG-2 MP@HL.
• Audio: Dolby AC-3.
• DVB (Digital Video Broadcast)
• Aspect Ratio: 4:3.
• Resolution:
• 1440 samples per line.
• 1152 lines per frame (1080 visible).
• Relation to PAL: Twice the resolution of PAL (720 x 576).
• Compression Standards:
• Video: SSP@H1440 (Spatially Scalable Profile @ High 1440).
• Audio: MPEG Audio Layer 2.
• MUSE Standard
• Aspect Ratio: 16:9.
• Resolution: 1920 samples per line, 1035 lines per frame.
• Compression Algorithm: Similar to MPEG-2 MP@HL.
• Introduction to MPEG-4
• Application domain: Interactive multimedia applications over the Internet and entertainment networks.
• Features:
• Passive video access (start/stop/pause).
• Scene manipulation: Reposition, delete, or alter individual elements in a video.
• Applications: Video telephony and multimedia over low-bit-rate networks (e.g., wireless, PSTNs).
• Alternative to the H.263 standard.
• Scene Composition in MPEG-4
• Content-based functionalities:
• Scene defined as background + foreground audio-visual objects (AVOs).
• AVOs consist of audio and/or video objects.
• Example:
• Stationary car: Single video object.
• Talking person: Audio + video objects.
• Subobject definition:
• A face can be divided into head, eyes, and mouth for efficient encoding.
• Encoding:
• Background and AVOs encoded separately with timing info for synchronization.
• Object descriptor enables manipulation before decoding.
• Scene Description and Compression
• Scene descriptor: Defines relationships between AVOs (position, composition).
• Example:
• Frame split into Video Object Planes (VOPs).
• VOPs encoded based on shape, motion, texture.
• Encoding process:
• Separate compression for each VOP using minimal macroblocks.
• Bitstreams multiplexed with object and scene descriptors.
• Audio and Video Compression
• Audio compression:
• Algorithm choice depends on bit rate and quality.
• Examples:
• G.723.1 (CELP): Internet and wireless networks.
• Dolby AC-3/MPEG Layer 2: Interactive TV.
• Video compression:
• VOP segmentation into objects with similar properties.
• Reduced bit rate due to object-based encoding.
• Advanced coding for interactive VOP manipulation.
• Transmission Format
• Transport stream (TS):
• Multiplexed Packetized Elementary Streams (PESs).
• Compatible with various networks (Internet, PSTN, wireless, TV).
• Synchronization:
• Timing info in PES packets ensures accurate decoding.
• Scene and object descriptors carried in separate elementary streams.
• Composition and rendering:
• Decoded AVOs combined for frame output.
• Audio synchronized with video.
• Error Resilience Techniques
• Fixed-length video packets:
• Divides bitstreams into equal-sized packets, not macroblocks.
• Resynchronization markers inserted for error recovery.
• Reversible Variable-Length Coding (RVLC):
• VLCs decoded forward or backward.
• Minimizes bit loss due to errors.
• Ensures usable codewords from overlapping regions.
• Frame-level parameters:
• Replicated in headers for error correction.
• Fixed-Length Video Packets
• Overcomes GOB-based error limitations:
• GOB size varies with motion/activity.
• Fixed-length packets improve resilience.
• Packet structure:
• Motion boundary marker separates motion vectors and DCT info.

• Reversible VLCs (RVLCs)
• Definition:
• VLCs designed for forward and backward decoding.
• Encoding process:
• VLCs with constant Hamming weight.
• Fixed-length prefix and suffix added.
• Error handling:
• Forward and reverse scans locate errors.
• Usable codewords extracted from overlap regions.
• Key Observations
1.Evolution:
1. H.261 and H.263 are primarily for low-bit-rate video telephony and conferencing, while
the MPEG standards expanded to broader multimedia applications.
2. MPEG-4 introduced object-based coding and scalability, marking a shift towards
interactivity and versatility.
2.Compression Efficiency:
1. H.263 improved efficiency over H.261. MPEG-2 offered better quality for broadcasting,
and MPEG-4 refined it further for streaming and mobile applications.
3.Resilience and Scalability:
1. Error resilience improved progressively from H.261 to MPEG-4, making later standards
more robust for unreliable networks like mobile or streaming.
4.Applications:
1. H.261 and H.263 are mostly obsolete but foundational.
2. MPEG-2 remains dominant for broadcasting.
3. MPEG-4 is widely used in streaming platforms and mobile content today.
