Media Networks - Audio and Video


MEDIA NETWORKS

Audio and Video standards


• Digital media content is sent over an IP network
VIDEO
• Resolution
− Number of pixels per frame
• Framerate
− Number of images (frames) per second
• Aspect ratio
− Ratio between width and height (e.g. 16:9)
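The three properties are simple arithmetic on the frame size and rate. The Python sketch below reduces a resolution to its aspect ratio and computes how many pixels must be delivered per second. The function names are hypothetical and only for illustration.

```python
from math import gcd

def aspect_ratio(width: int, height: int) -> str:
    """Reduce width:height to the smallest integer ratio."""
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"

def pixels_per_second(width: int, height: int, framerate: int) -> int:
    """Resolution (pixels per frame) multiplied by framerate (frames per second)."""
    return width * height * framerate

print(aspect_ratio(1920, 1080))           # Full HD
print(aspect_ratio(640, 480))             # classic SD
print(pixels_per_second(1920, 1080, 25))  # pixels to deliver every second
```

Not every resolution reduces to a familiar ratio (1366x768, for example, gives 683:384), which is why displays are often described by their approximate ratio.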
CAPTURING VIDEO
• Optical CMOS or CCD sensor
• Red, green, blue
• Color information (derived from the RGB values) per pixel = chroma
• Brightness (luminance) per pixel = luma
• Stored digitally with a specified color depth
− describes the amount of information stored in each pixel for each color channel (R, G, B)
− increasing the bit depth also increases the number of colors that can be represented
• 8-bit RGB image: each pixel has 8 bits of data per color channel (R, G, B), so each channel has 2^8 = 256 possible values, giving 256^3 = 16 777 216 colors
• 10-bit RGB image: each color channel has 2^10 = 1024 values, giving 1024^3 = 1 073 741 824 colors
• 12-bit RGB image: 2^12 = 4096 values per channel, giving 4096^3 ≈ 68.7 billion colors
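The color counts above come from raising the number of values per channel to the power of the number of channels. A small Python check (the function name is hypothetical):

```python
def colors_for_bit_depth(bits_per_channel: int, channels: int = 3) -> int:
    """Distinct colors when every channel (R, G, B) uses `bits_per_channel` bits."""
    values_per_channel = 2 ** bits_per_channel
    return values_per_channel ** channels

for bits in (8, 10, 12):
    print(f"{bits}-bit: {colors_for_bit_depth(bits):,} colors")
```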
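CHROMA SUB-SAMPLING
• Sampling color (chroma) at a lower rate than luma
• a:b:c
− a: number of luma samples in row 1 (the width of the reference block, usually 4)
− b: number of different colors (chroma samples) in row 1
− c: number of different colors in row 2 compared to row 1 (0 means row 2 reuses the colors of row 1)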
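To see what sub-sampling saves, count the samples in a block that is a pixels wide and two rows high. It holds 2a luma samples and, for each of the two chroma planes (Cb and Cr), b + c chroma samples. The sketch below assumes 8 bits per sample and the usual Y'CbCr representation; the function name is hypothetical.

```python
def bits_per_pixel(a: int, b: int, c: int, bit_depth: int = 8) -> float:
    """Average bits per pixel for a:b:c chroma sub-sampling.

    A block of a x 2 pixels holds 2*a luma samples plus (b + c) samples
    in each of the two chroma planes (Cb and Cr).
    """
    luma_samples = 2 * a
    chroma_samples = 2 * (b + c)
    return bit_depth * (luma_samples + chroma_samples) / (2 * a)

for scheme in [(4, 4, 4), (4, 2, 2), (4, 2, 0), (4, 1, 1)]:
    print("{}:{}:{}".format(*scheme), bits_per_pixel(*scheme), "bits/pixel")
```

Under these assumptions 4:2:0 needs half the data of 4:4:4 before any compression is applied.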
AUDIO
• Pulse code modulation (PCM)
− Samples the signal at a fixed interval of 1/sample rate seconds (about 0.023 ms at 44.1 kHz) → sample rate
− Quantizes each sample to a number of bits → bit depth
− Every channel (stereo, 5.1) is sampled separately
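A minimal PCM sketch in Python, assuming the analog input can be modeled as a function of time returning values between -1.0 and 1.0 (the function name and the 1 kHz test tone are hypothetical). It samples every 1/sample_rate seconds and quantizes each sample to a signed integer of the chosen bit depth.

```python
import math

def pcm_encode(signal, sample_rate, bit_depth, duration):
    """Sample `signal` every 1/sample_rate seconds and quantize to signed ints.

    `signal` is a function of time (in seconds) returning values in [-1.0, 1.0].
    """
    max_level = 2 ** (bit_depth - 1) - 1           # e.g. 32767 for 16-bit
    n_samples = round(sample_rate * duration)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                         # sampling instant
        value = max(-1.0, min(1.0, signal(t)))      # clip to the valid range
        samples.append(round(value * max_level))    # quantization step
    return samples

# A 1 kHz tone at CD settings (44.1 kHz, 16 bit), 10 ms, one channel
tone = pcm_encode(lambda t: math.sin(2 * math.pi * 1000 * t), 44100, 16, 0.01)
print(len(tone), tone[:5])
```

For stereo or 5.1, the same function would simply be called once per channel.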
AUDIO SAMPLING QUALITY

Quality | Sample rate (kHz) | Bits per sample | Channels | Uncompressed data rate (kb/s) | Frequency band (kHz)
Telephone | 8 | 8 | Mono | 64 | 0,2-3,4
AM Radio | 11,025 | 8 | Mono | 88,2 | 0,1-5,5
FM Radio | 22,05 | 16 | Stereo | 705,6 | 0,02-11
CD | 44,1 | 16 | Stereo | 1411,2 | 0,005-20
DAT | 48 | 16 | Stereo | 1536 | 0,005-20
DVD Audio | 192 | 24 | 6 channels | 27 648 | 0-96
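Every uncompressed data rate in this table is sample rate × bits per sample × number of channels. A quick Python check (the function name is hypothetical):

```python
def pcm_data_rate_kbps(sample_rate_khz: float, bits_per_sample: int, channels: int) -> float:
    """Uncompressed PCM data rate in kb/s: sample rate x bit depth x channels."""
    return sample_rate_khz * bits_per_sample * channels

print(pcm_data_rate_kbps(8, 8, 1))      # telephone
print(pcm_data_rate_kbps(44.1, 16, 2))  # CD
print(pcm_data_rate_kbps(192, 24, 6))   # DVD Audio
```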
CONTAINERS
• Bundle and store all elements of a video into one package
• Contains one or more tracks and metadata
− Video tracks (without audio)
• Metadata: aspect ratio, duration, angle, track id, codec, …
• Encoded binary data
− Audio tracks (without video)
• Metadata: language, duration, volume, codec, …
• Encoded binary data
• Contains markers to synchronize the audio with the video track
− Text tracks
• Captions or subtitles
− Image tracks
• Thumbnails for fast forwarding
− Metadata
• Title, author, date, cover art, list of chapters
MP4 VIDEO CONTAINER
• An MP4 file consists of atoms (also called boxes)
− Each atom starts with a 4-byte size followed by a 4-byte type
− FTYP: always first, type of file and version of atoms
− MOOV: contains the information about the video and audio (codec, length, keyframes, …)
− MDAT: contains the encoded audio and video data
− WIDE: reserves 8 bytes so the MDAT header can be enlarged to 16 bytes (64-bit size), allowing an MDAT atom of up to 2^64 bytes ≈ 18 exabytes
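The size-plus-type layout makes it easy to walk the top-level atoms of a file. Below is a minimal Python sketch (function names are hypothetical) that also handles the two special size values of MP4/ISO boxes: 1, meaning a 64-bit size follows the type (the 16-byte header that WIDE makes room for), and 0, meaning the atom runs to the end of the file. The sample bytes are synthetic, not a playable file.

```python
import struct

def list_atoms(data: bytes):
    """Return (type, offset, size) for each top-level atom in `data`."""
    atoms = []
    offset = 0
    while offset + 8 <= len(data):
        size, kind = struct.unpack_from(">I4s", data, offset)  # big-endian size + type
        header = 8
        if size == 1:                          # 64-bit "largesize" follows the type
            if offset + 16 > len(data):
                break
            size = struct.unpack_from(">Q", data, offset + 8)[0]
            header = 16
        elif size == 0:                        # atom extends to the end of the file
            size = len(data) - offset
        if size < header:                      # corrupt or truncated atom: stop
            break
        atoms.append((kind.decode("latin-1"), offset, size))
        offset += size
    return atoms

def atom(kind: bytes, payload: bytes = b"") -> bytes:
    """Build one atom with a 32-bit size header (for the example only)."""
    return struct.pack(">I4s", 8 + len(payload), kind) + payload

sample = (atom(b"ftyp", b"isom\x00\x00\x02\x00") + atom(b"wide")
          + atom(b"mdat", bytes(16)) + atom(b"moov"))
for kind, offset, size in list_atoms(sample):
    print(kind, offset, size)
```

A real MOOV atom contains nested child atoms with the same header layout, so the same loop can be applied recursively to its payload.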
OGG VIDEO CONTAINER
• Ogg files are framed in pages
• Each page has its own header
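The Ogg page header has a fixed 27-byte part defined in RFC 3533 (capture pattern "OggS", version, header type flags, granule position, stream serial number, page sequence number, CRC checksum, number of segments), followed by a segment table with one length byte per segment. A minimal Python sketch that reads it; the function name and the synthetic page are hypothetical, and the CRC is not verified.

```python
import struct

# Fixed part of an Ogg page header (RFC 3533), little-endian, 27 bytes
OGG_HEADER = struct.Struct("<4sBBqIIIB")

def parse_ogg_page_header(data: bytes) -> dict:
    (capture, version, header_type, granule, serial,
     sequence, crc, n_segments) = OGG_HEADER.unpack_from(data, 0)
    if capture != b"OggS":
        raise ValueError("not an Ogg page (missing 'OggS' capture pattern)")
    # segment table: one length byte per segment, right after the fixed header
    table = data[OGG_HEADER.size:OGG_HEADER.size + n_segments]
    return {
        "version": version,
        "header_type": header_type,          # 0x02 = beginning of stream
        "granule_position": granule,
        "serial": serial,
        "sequence": sequence,
        "crc": crc,                          # not checked in this sketch
        "payload_size": sum(table),          # page body = sum of segment lengths
        "header_size": OGG_HEADER.size + n_segments,
    }

# Hypothetical first page: beginning-of-stream flag, one 30-byte segment
page = OGG_HEADER.pack(b"OggS", 0, 0x02, 0, 1234, 0, 0, 1) + bytes([30]) + bytes(30)
print(parse_ogg_page_header(page))
```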
FULL VIDEO FILE SIZES
• https://toolstud.io/video/filesize.php?imagewidth=1920&imageheight=1080&framerate=25&timeduration=60&timeunit=seconds
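The same kind of calculation can be done by hand. This Python sketch assumes 24 bits per pixel (8-bit RGB, no chroma sub-sampling), no audio and no container overhead, so the linked calculator may report different numbers depending on its own settings. The function name is hypothetical.

```python
def uncompressed_video_bytes(width, height, bits_per_pixel, framerate, seconds):
    """Raw video size in bytes: pixels per frame x bits per pixel x frames / 8."""
    bits = width * height * bits_per_pixel * framerate * seconds
    return bits // 8

size = uncompressed_video_bytes(1920, 1080, 24, 25, 60)
print(f"{size:,} bytes = {size / 1e9:.2f} GB")
```

One minute of raw 1080p at 25 fps is therefore several gigabytes; with 4:2:0 sub-sampling (12 bits per pixel) it halves, which is still far too much to stream without a codec.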
REDUCING REQUIRED BANDWIDTH
• Interlaced vs progressive video
− 1080p vs 1080i
− Interlaced: odd and even lines are transmitted separately, as alternating fields
• Needs half the bandwidth of progressive video with the same number of pictures per second
− Progressive: whole frames are transmitted
− Anti-aliasing is used to blur the visible lines of interlaced video
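Splitting a frame into fields and putting it back together can be sketched in a few lines of Python, with a frame modeled as a list of rows (function names are hypothetical). Each field carries only half the lines, which is where the bandwidth saving comes from.

```python
def split_fields(frame):
    """Split a frame (a list of rows) into its two interlaced fields.

    Counting lines from 1, the odd field holds lines 1, 3, 5, ...
    and the even field lines 2, 4, 6, ...; each field has half the lines.
    """
    return frame[0::2], frame[1::2]

def weave(odd_field, even_field):
    """Interleave two fields back into one frame (simple 'weave' deinterlacing).

    Assumes both fields have the same number of lines.
    """
    frame = []
    for odd_row, even_row in zip(odd_field, even_field):
        frame += [odd_row, even_row]
    return frame

frame = [f"line {n}" for n in range(1, 7)]
odd, even = split_fields(frame)
print(odd)
print(even)
print(weave(odd, even) == frame)
```

In real interlaced video the two fields are captured at different moments, so weaving the fields of a moving scene produces comb-like line artifacts; that is what the anti-aliasing mentioned above tries to hide.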


REDUCING REQUIRED BANDWIDTH
The “art” of encoding video and audio is, in many respects, a
“black art.” There are no standards other than file format and
everything else from data rate to audio is left to your best
judgment. The decisions you make in creating the files are
therefore “subjective” not “objective.” Before you can stream the
video and audio content you need to clearly understand how these
files are created and the decisions you will need to make to ensure
smooth playback. This process starts with a rather jarring
revelation: video is not video. The extensions used to identify video
and audio files are more like shoeboxes than anything else. The file
formats—MPEG4, WebM, and Ogg—are the names on the boxes
and inside the boxes are a video track and an audio track. The box
with the file format label is called a “container.”
CODEC
• A codec consists of
− Encoder → compression
− Decoder → decompression
• The algorithm to compress and decompress the data is described in a compression standard
• Video codec examples: H.264, H.265 (HEVC), H.266 (VVC), VP8, VP9, RV40, AV1
• Audio codec examples: MP3 (LAME encoder), AAC (Fraunhofer FDK AAC encoder), FLAC
• A codec is NOT a container
COMPRESSION
• Lossy compression
− Data is reduced to such an extent that the original information cannot be recovered when the video is decompressed
− As the video is encoded, more and more information is, essentially, thrown out
− Information that is not relevant to visual perception or hearing is thrown away
− Much smaller file size
− Reducing the size of a video file → more information lost → the quality of the video images decreases
− When you encode video, you should always keep a copy of the original files
• Lossless compression
− Class of algorithms that allow the exact original data to be reconstructed from the compressed data
− Loses no information and results in larger files
− Useful for the original video file or data
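The difference between the two classes can be demonstrated with Python's standard library. zlib (DEFLATE) is a general-purpose lossless compressor, not a media codec, and the "lossy" step here is a toy quantization that drops the 4 least significant bits of every byte; real lossy codecs discard information far more selectively.

```python
import zlib

original = bytes(range(256)) * 64            # 16 KiB of sample "media" data

# Lossless: zlib restores every byte exactly
packed = zlib.compress(original, level=9)
restored_lossless = zlib.decompress(packed)

# Lossy (toy): throw away the 4 least significant bits, then compress
quantized = bytes(b & 0xF0 for b in original)
restored_lossy = zlib.decompress(zlib.compress(quantized, level=9))

print(restored_lossless == original)         # exact copy
print(restored_lossy == original)            # the dropped bits are gone for good
print(max(a - b for a, b in zip(original, restored_lossy)))  # largest error per byte
```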
AUDIO COMPRESSION
• The resulting sampled data (uncompressed PCM) can be compressed using audio codecs
• Compression techniques
− cut out certain frequency ranges
− spend a lot less data on quiet sounds that are
• played simultaneously with a louder sound (simultaneous masking)
• close in time to louder sounds (temporal masking)
• alone, but below the minimum audibility threshold
COMMON AUDIO CODECS
• MPEG Audio Layer III (MP3)
− Common encoder: LAME (open source)
− Bit depth: 16 bit
− Sample rate: 44,1 kHz (16 to 48 kHz)
− Mono or stereo
− Bitrates: (64 kbps), 128 kbps, 192 kbps, 320 kbps
− Lossy data compression: encodes data using inexact approximations and partially discards data
− Large reduction in file size compared to uncompressed audio, which matters for both transmission and storage, while retaining a comparable level of sound quality
− Designed as a streamable format: segments of a transmission can be lost without affecting the ability to decode later segments
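The size reduction can be quantified against the uncompressed CD data rate from the sampling table (44.1 kHz × 16 bit × 2 channels = 1411.2 kb/s). A short Python calculation, assuming 1 MB = 1000 kB; the function name is hypothetical.

```python
def megabytes_per_minute(kbps: float) -> float:
    """Storage needed for one minute of audio at `kbps` kilobits per second."""
    kilobits = kbps * 60
    return kilobits / 8 / 1000

cd_rate = 44.1 * 16 * 2        # uncompressed CD audio in kb/s
for label, rate in [("CD (PCM)", cd_rate), ("MP3 320", 320), ("MP3 128", 128)]:
    print(f"{label}: {megabytes_per_minute(rate):.2f} MB/min, "
          f"ratio {cd_rate / rate:.1f}:1")
```

At 128 kbps, MP3 therefore stores about 11 times less data than the CD original.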
COMMON AUDIO CODECS
• Advanced Audio Coding (AAC)
− Standardized by ISO/IEC MPEG (developed by Fraunhofer, Dolby, AT&T, Sony and Nokia, among others); made popular by Apple (iTunes)
− Sample rates from 8 to 96 kHz
− Up to 48 channels
− Any bitrate
− Better sound quality than MP3 at the same bitrate
• Ogg Vorbis
− Fully open, non-proprietary, patent- and royalty-free, general-purpose format for lossy audio compression
− Fixed and variable bitrates from 16 to 128 kbps per channel
− 8 kHz to 48 kHz, 16+ bit
− In the same competitive class as MPEG-4 AAC, but with higher performance than MPEG-1/2 Audio Layer III
COMMON AUDIO CODECS
• Free Lossless Audio Codec (FLAC)
− Lossless compression
− Audio is typically reduced to between 50 and 70 percent of its original size
− FLAC is an open format with royalty-free licensing
• G.711 and G.722: telephony standards
− G.711
• 8-bit, 8 kHz, 64 kbps PCM
− G.722
• 48 kbps, 56 kbps and 64 kbps
• Better quality than G.711 (wideband audio, sampled at 16 kHz)
VIDEO COMPRESSION
• Intraframe compression
− Compresses each video frame as a single entity
• Interframe compression
− Compression across multiple frames
− Image-to-image prediction
IMAGE-TO-IMAGE PREDICTION
• I-frames: self-contained, no dependency outside of that image
• P-frames: forward predicted pictures, predicted from one (earlier) reference image
• B-frames: bidirectionally predicted from two reference images (typically one earlier and one later)
• Video without I-frames: https://www.youtube.com/watch?v=7-lGmErj3t4
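Because a B-frame depends on a later reference frame, the encoder has to send that reference first, so the decode order differs from the display order. The Python sketch below assumes a classic GOP in which each B-frame references the nearest I- or P-frame before and after it; H.264 and H.265 allow more flexible references. The function name is hypothetical.

```python
def decode_order(display_order: str):
    """Map a GOP given in display order (e.g. 'IBBPBBP') to decode order.

    Frames are labelled with their display index, e.g. 'B1'. Each B-frame is
    held back until the next I- or P-frame (its later reference) has been sent.
    """
    order, waiting_b = [], []
    for index, kind in enumerate(display_order):
        label = f"{kind}{index}"
        if kind == "B":
            waiting_b.append(label)          # needs a reference that comes later
        else:                                # I- or P-frame: a reference frame
            order.append(label)
            order.extend(waiting_b)          # its B-frames can now be decoded
            waiting_b.clear()
    # Trailing B-frames would reference the next GOP's I-frame; this sketch
    # simply appends them at the end.
    return order + waiting_b

print(decode_order("IBBPBBP"))
```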
BLOCK MOTION COMPENSATION
• Reference frame
• Divide the frame into macroblocks
• Move blocks using a motion vector
• Difference above a threshold (too big): reset the block, i.e. encode it without prediction
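A minimal block-matching sketch in Python: for one block of the current frame, an exhaustive search over a small window finds the displacement with the lowest sum of absolute differences (SAD) in the reference frame. The function names, the 8x8 synthetic frames and the `THRESHOLD` value are hypothetical; real encoders use faster search patterns and rate-distortion decisions instead of a fixed cut-off.

```python
def sad(ref, cur, bx, by, dx, dy, size):
    """Sum of absolute differences between the block at (bx, by) in `cur`
    and the block displaced by (dx, dy) in the reference frame `ref`."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(size)
        for x in range(size)
    )

def best_motion_vector(ref, cur, bx, by, size, search):
    """Exhaustive search in a +/- `search` pixel window; returns (dx, dy, cost)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # skip candidate blocks that fall outside the reference frame
            if not (0 <= by + dy <= height - size and 0 <= bx + dx <= width - size):
                continue
            cost = sad(ref, cur, bx, by, dx, dy, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best

THRESHOLD = 500  # hypothetical cut-off for "difference too big"

def code_block(ref, cur, bx, by, size=4, search=2):
    """Code one block as a motion vector, or 'reset' it (intra) if no good match."""
    dx, dy, cost = best_motion_vector(ref, cur, bx, by, size, search)
    if cost > THRESHOLD:
        return ("intra",)               # code the block on its own
    return ("inter", dx, dy, cost)      # send the motion vector (plus the residual)

# Synthetic 8x8 frames: a bright 2x2 square moves 1 pixel right and 1 pixel down
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
for y in (2, 3):
    for x in (2, 3):
        ref[y][x] = 255
        cur[y + 1][x + 1] = 255

print(code_block(ref, cur, bx=2, by=2))
```

The motion vector points from the current block back to its best match in the reference frame, which is why the square moving right and down yields (-1, -1).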
BLOCK DIFFERENCE BETWEEN H.264 AND H.265
VARIABLE AND CONSTANT BIT RATES
• CBR (Constant Bit Rate)
− Constant, predefined bit rate
− Image quality varies
− Quality remains relatively high when there is little motion
− Quality decreases significantly as motion increases
• VBR (Variable Bit Rate)
− Bit rate varies to keep a predefined level of image quality
− Desirable in video surveillance
− The network infrastructure (available bandwidth) needs a higher capacity to absorb the bit rate peaks
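The trade-off can be illustrated with a deliberately simple toy model in Python. It assumes that each frame has a "complexity" (1 for a static scene, 4 for a lot of motion) and that quality is proportional to bits divided by complexity; both the model and the numbers are hypothetical and only meant to mirror the bullets above.

```python
def cbr(complexity, bits_per_frame):
    """CBR: every frame gets the same bit budget, so quality follows the content."""
    return [(bits_per_frame, bits_per_frame / c) for c in complexity]

def vbr(complexity, target_quality):
    """VBR: every frame gets the bits it needs to reach the same quality."""
    return [(target_quality * c, target_quality) for c in complexity]

# Hypothetical per-frame complexity: 1 = static scene, 4 = lots of motion
scene = [1, 1, 4, 4, 1]
for name, frames in [("CBR", cbr(scene, 200)), ("VBR", vbr(scene, 100))]:
    bits = [b for b, _ in frames]
    quality = [q for _, q in frames]
    print(name, "bits:", bits, "peak:", max(bits), "quality:", quality)
```

With CBR the quality drops during the motion frames; with VBR the quality stays constant, but the network must carry a peak that is twice the CBR rate.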
COMMON VIDEO CODECS
• H.264
− MPEG-4 Part 10, MPEG-4 AVC or MPEG-4 Advanced Video Coding
− Industry standard for video compression
− Developed jointly by the international standards bodies ITU-T (International Telecommunication Union) and ISO/IEC (International Organization for Standardization / International Electrotechnical Commission)
− Profiles
• Baseline: use this with (older) iOS devices.
• Main: mostly a historic profile, used for standard definition (SD, 4:3 aspect ratio) TV broadcasts.
• High: use this for web, SD and HD (high definition) video publishing.
− A license fee applies if you distribute paid content
COMMON VIDEO CODECS
• H.265
− High Efficiency Video Coding (HEVC) or MPEG-H Part 2
− Up to twice the data compression at the same level of video quality
− Supports resolutions up to 8K (8192x4320), compared to 4K (4096x2160) for H.264
− H.265 encoding and decoding require much more processing power than H.264
• H.266
− Versatile Video Coding (VVC) or MPEG-I Part 3
− Finalized on 6 July 2020
− Can reduce file size by up to 50% compared to H.265 (HEVC) at similar quality
− Supports 4K, 8K and even 16K UHD, and 360° video
COMMON VIDEO CODECS
• VP8
− Open and royalty-free video compression format owned by Google and created by On2 Technologies
− Up to 2160p (4K) resolution
• VP9
− Open and royalty-free video coding format developed by Google
− Successor to VP8; competes mainly with MPEG's High Efficiency Video Coding (HEVC/H.265)
− Often combined with Opus audio in the WebM container
− Used by YouTube
• AV1
− Successor of VP9; open and royalty-free, developed by the Alliance for Open Media
WHICH CODEC TO USE
• Capturing audio and video
• Size (compression) <-> quality
− Storage
− Bandwidth (streaming)
− https://mattgadient.com/x264-vs-x265-vs-vp8-vs-vp9-examples/
• Compatibility (transcoding)
− https://developer.mozilla.org/en-US/docs/Web/HTML/Supported_media_formats
• Latency for encoding and decoding (VoIP, video conferencing)
HARDWARE CODEC
• Hardware encoder: dedicated processors that run a purpose-built algorithm to encode video and audio data
• High speed
• Less flexibility
• Not upgradable
• High cost
− Boxcaster, Teradek VidiU, Teradek Beam, NewTek TriCaster
• GPU hardware acceleration
SOFTWARE CODEC
• Programs that run on a general-purpose computing device
• Higher latency than hardware encoders
• Flexible
• Low cost (often free)
• FFmpeg, Adobe Flash Media Live Encoder™, Telestream Wirecast
FFMPEG
• Free, open-source, cross-platform command-line tool to convert audio and video
− Change container format, audio and video codecs, framerate, sample rate, …
− Apply filters such as rotate, colors, crop, scale, …
− Combine media files, e.g. picture-in-picture (PIP), horizontal stack, …
− Can take files, network streams and capture (grabbing) devices as input
− Writes to an arbitrary number of outputs, such as files and network streams
• Can be used together with nginx
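FFMPEG BASIC COMMANDS
• ffmpeg [global_options] {[input_file_options] -i input_url} ... {[output_file_options] output_url} ...
• ffmpeg -i video.mp4 video.avi
− Change container format (the streams are re-encoded with the default codecs of the new container; add -c copy to keep the original streams if the new container supports their codecs)
• ffmpeg -i video.mp4 -c:v libx265 video_converted.mp4
− Change video codec to H.265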
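FFMPEG OUTPUT OPTIONS
• -c:v (-codec:v, -vcodec): output video codec
− libx265, libx264, libvpx, libvpx-vp9, libtheora, copy
• -c:a (-codec:a, -acodec): output audio codec
− aac, libvorbis, libopus, copy
• -qscale:v, -qscale:a: quality level; the range depends on the codec (e.g. 0 to 10 for libtheora)
• -b:v, -b:a: bitrate
− e.g. 512k, 3M
• -pix_fmt: sets the pixel format
− e.g. uyvy422
• -ar, -r: audio sample rate, frame rate (-sample_rate and -framerate are input options of some capture devices and raw formats)
− e.g. 44100, 30
• -filter / -vf / -af: filters
− scale=320:240, format=gray, rotate=PI/2 (angle in radians) or transpose=1 (90° clockwise), crop=out_w:out_h:x:y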
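When ffmpeg is driven from a script, it is safer to build the command as an argument list than as one shell string. A minimal Python sketch using some of the output options above; the helper name and its parameters are hypothetical, and the command is only printed, not executed.

```python
import shlex

def build_ffmpeg_command(src, dst, vcodec=None, acodec=None,
                         video_bitrate=None, video_filters=None):
    """Assemble an ffmpeg argument list; options placed before `dst` apply to the output."""
    cmd = ["ffmpeg", "-i", src]
    if vcodec:
        cmd += ["-c:v", vcodec]
    if acodec:
        cmd += ["-c:a", acodec]
    if video_bitrate:
        cmd += ["-b:v", video_bitrate]
    if video_filters:
        cmd += ["-vf", ",".join(video_filters)]   # comma = filters in one chain
    cmd.append(dst)
    return cmd

cmd = build_ffmpeg_command("input.mp4", "output.mp4", vcodec="libx265",
                           acodec="copy", video_bitrate="3M",
                           video_filters=["scale=iw/2:-2", "transpose=1"])
print(shlex.join(cmd))
# To run it (requires ffmpeg on PATH): subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with filter strings that contain brackets or semicolons.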
FFMPEG FILTERS
• ffmpeg -i input.mp4 -vf scale=320:240 output.mp4
− Scale video to 320 by 240
• ffmpeg -i input.mp4 -vf scale=iw/2:-1 output.mp4
− iw is the input width; -1 computes the height that keeps the aspect ratio (use -2 if the encoder needs an even height)
• ffmpeg -i input.mp4 -vf transpose=1,scale=iw/2:-1 output.mp4
− Use multiple filters, separated by commas (transpose=1 rotates 90° clockwise)
FFMPEG COMPLEX FILTERS
• ffmpeg -i input1.mp4 -i input2.mp4 -filter_complex "[0:v][1:v]hstack=inputs=2[out]" -map "[out]" output.mp4

• ffmpeg -i input1.mp4 -i input2.mp4 -i input3.mp4 -filter_complex "[0:v][1:v]hstack=inputs=2[middle];[middle][2:v]vstack=inputs=2[out]" -map "[out]" output.mp4
− vstack requires all inputs to have the same width, so input3 must be as wide as input1 and input2 side by side
− [0][1][2]hstack=inputs=3[out] stack 3 inputs horizontally (the inputs need the same height)
− [0][1]vstack=inputs=2[out] stack 2 inputs vertically
− [0][1]overlay=x=10:y=10[out] overlay input 1 on top of input 0 at position (10, 10)
− [in]split[out0][out1] split a stream into two copies; you can apply different filters to out0 and out1
− [0]lutrgb=r=negval:g=negval:b=negval[out] negative image
− [0]lutrgb=r=val*2[out] increase red by a factor of 2
− [0]trim=start=10:end=30[out] cut the movie from 10 s to 30 s
− [0]trim=start=10:duration=20[out] cut the movie from 10 s for a duration of 20 s
− Follow trim with setpts=PTS-STARTPTS to make the cut start at 0 s
− [0:a]volume=volume=0.5[out] halve the audio volume
− vflip, hflip, rotate, …
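Filtergraphs for larger layouts quickly get long, so they are often generated. The Python sketch below builds a grid of rows × cols inputs by stacking each row with hstack and then stacking the rows with vstack; filterchains are separated by `;`. It assumes all inputs have the same size, so every row ends up equally wide, and the function name is hypothetical. ffmpeg also offers an `xstack` filter for arbitrary layouts.

```python
def grid_filtergraph(rows: int, cols: int) -> str:
    """Build a -filter_complex string that tiles rows*cols video inputs in a grid."""
    if rows < 2 or cols < 2:
        raise ValueError("hstack and vstack need at least 2 inputs")
    chains, row_labels = [], []
    for r in range(rows):
        inputs = "".join(f"[{r * cols + c}:v]" for c in range(cols))
        label = f"[row{r}]"
        chains.append(f"{inputs}hstack=inputs={cols}{label}")   # one row
        row_labels.append(label)
    chains.append(f"{''.join(row_labels)}vstack=inputs={rows}[out]")  # stack the rows
    return ";".join(chains)

print(grid_filtergraph(2, 2))
```

For a 2x2 grid the result would be used as `ffmpeg -i a.mp4 -i b.mp4 -i c.mp4 -i d.mp4 -filter_complex "<graph>" -map "[out]" grid.mp4`.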


MORE OPTIONS
• https://www.ostechnix.com/20-ffmpeg-commands-beginners/
• https://ffmpeg.org/ffmpeg-filters.html
• http://randombio.com/linuxsetup141.html
HTML PROGRESSIVE DOWNLOAD
• <video> and <audio> tags
• Progressive download, not video streaming
• The browser uses HTTP range requests to request parts of a large file
• The first part of the file is downloaded and played; the following parts are downloaded as fast as possible
• The downloaded file is stored locally
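On the server side, a range request is answered with status 206 (Partial Content) and a Content-Range header, or 416 (Range Not Satisfiable) if the range is invalid, as defined in the HTTP specification (RFC 9110). The Python sketch below handles only a single `bytes=` range (no multipart ranges); the function names are hypothetical.

```python
import re

def parse_range(header: str, file_size: int):
    """Return inclusive (start, end) for 'bytes=start-end', 'bytes=start-' or 'bytes=-N'.

    Returns None if the range is malformed or cannot be satisfied.
    """
    match = re.fullmatch(r"bytes=(\d*)-(\d*)", header.strip())
    if not match or match.group(1) == match.group(2) == "":
        return None
    first, last = match.groups()
    if first == "":                       # suffix range: the last N bytes
        length = int(last)
        if length == 0:
            return None
        return max(file_size - length, 0), file_size - 1
    start = int(first)
    end = min(int(last) if last else file_size - 1, file_size - 1)
    if start > end:
        return None
    return start, end

def partial_response(data: bytes, range_header: str):
    """Status, headers and body a server would send for a range request."""
    span = parse_range(range_header, len(data))
    if span is None:
        return 416, {"Content-Range": f"bytes */{len(data)}"}, b""
    start, end = span
    headers = {"Content-Range": f"bytes {start}-{end}/{len(data)}",
               "Content-Length": str(end - start + 1)}
    return 206, headers, data[start:end + 1]

movie = bytes(1000)                       # stand-in for a video file
status, headers, body = partial_response(movie, "bytes=0-99")
print(status, headers, len(body))
```

A browser playing a video typically starts with an open-ended range such as `bytes=0-` and issues further ranges when the user seeks.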
<VIDEO>
<video width="320" height="240" controls>
<source src="movie.mp4" type="video/mp4">
<source src="movie.ogg" type="video/ogg">
Your browser does not support the video tag.
</video>

Attribute | Value | Description
autoplay | autoplay | Specifies that the video will start playing as soon as it is ready
controls | controls | Specifies that video controls should be displayed (such as a play/pause button etc.)
height | pixels | Sets the height of the video player
loop | loop | Specifies that the video will start over again every time it is finished
muted | muted | Specifies that the audio output of the video should be muted
poster | URL | Specifies an image to be shown while the video is downloading, or until the user hits the play button
preload | auto, metadata, none | Specifies if and how the author thinks the video should be loaded when the page loads
src | URL | Specifies the URL of the video file
width | pixels | Sets the width of the video player

Format | MIME-type
MP4 | video/mp4
WebM | video/webm
Ogg | video/ogg
<TRACK>
<video width="320" height="240" controls>
<source src="jelly.mp4" type="video/mp4">
<source src="jelly.ogg" type="video/ogg">
<track src="subtitles_en.vtt" kind="subtitles" srclang="en" label="English">
<track src="subtitles_nl.vtt" kind="subtitles" srclang="nl" label="Nederlands">
</video>

Attribute | Value | Description
default | default | Specifies that the track is to be enabled if the user's preferences do not indicate that another track would be more appropriate
kind | captions, chapters, descriptions, metadata, subtitles | Specifies the kind of text track
label | text | Specifies the title of the text track
src | URL | Required. Specifies the URL of the track file
srclang | language_code | Specifies the language of the track text data (required if kind="subtitles")
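The files referenced by `src` (such as subtitles_en.vtt) are WebVTT text files: a `WEBVTT` header line, then cues made of a `start --> end` timing line in HH:MM:SS.mmm format, the cue text, and a blank line. A minimal Python generator; the function names and cue texts are hypothetical, and cue settings and styling are not covered.

```python
def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp HH:MM:SS.mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"

def to_webvtt(cues):
    """Build a WebVTT document from (start_seconds, end_seconds, text) tuples."""
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(text)
        lines.append("")                  # a blank line ends each cue
    return "\n".join(lines)

subtitles = to_webvtt([(1, 4, "Hello!"), (4.5, 7.25, "This is a jellyfish.")])
print(subtitles)
```

The resulting text would be saved with a .vtt extension and served with the MIME type text/vtt.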
<AUDIO>
<audio controls>
<source src="horse.ogg" type="audio/ogg">
<source src="horse.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

Attribute | Value | Description
autoplay | autoplay | Specifies that the audio will start playing as soon as it is ready
controls | controls | Specifies that audio controls should be displayed (such as a play/pause button etc.)
loop | loop | Specifies that the audio will start over again every time it is finished
muted | muted | Specifies that the audio output should be muted
preload | auto, metadata, none | Specifies if and how the author thinks the audio should be loaded when the page loads
src | URL | Specifies the URL of the audio file

Format | MIME-type
MP3 | audio/mpeg
OGG | audio/ogg
WAV | audio/wav
POSITION OF THE MOOV ATOM IN MP4 FILE
• When encoding, the information to put in the moov atom is only available at the end of encoding
• So the moov atom is written at the end of the file
• The player needs the moov atom first, before it knows how to play the encoded data
• Multiple range requests are needed to find the moov atom
PLACE MOOV ATOM IN THE FRONT
• The moov atom can be put at the beginning of the file
• The moov atom is then found with the first range request
• ffmpeg -i input.mp4 -movflags faststart -acodec copy -vcodec copy output.mp4
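Whether a file is already "fast start" can be checked by walking its top-level atoms and seeing whether moov or mdat comes first. A minimal Python sketch (function names hypothetical) on synthetic byte strings; it handles the 64-bit size value 1 and the run-to-end size value 0 of MP4 atoms.

```python
import struct

def moov_before_mdat(data: bytes):
    """True if 'moov' precedes 'mdat' (fast start), False if it follows, None if unknown."""
    offset = 0
    while offset + 8 <= len(data):
        size, kind = struct.unpack_from(">I4s", data, offset)
        if kind == b"moov":
            return True
        if kind == b"mdat":
            return False
        if size == 1:                          # 64-bit size follows the type
            if offset + 16 > len(data):
                return None
            size = struct.unpack_from(">Q", data, offset + 8)[0]
        elif size == 0:                        # atom runs to the end of the file
            return None
        if size < 8:                           # corrupt header: give up
            return None
        offset += size
    return None

def atom(kind: bytes, payload: bytes = b"") -> bytes:
    """Build one atom with a 32-bit size header (for the example only)."""
    return struct.pack(">I4s", 8 + len(payload), kind) + payload

ftyp = atom(b"ftyp", b"isom\x00\x00\x02\x00")
moov_at_end = ftyp + atom(b"mdat", bytes(32)) + atom(b"moov")
fast_start = ftyp + atom(b"moov") + atom(b"mdat", bytes(32))
print(moov_before_mdat(moov_at_end), moov_before_mdat(fast_start))
```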
SOURCES
• https://video.ibm.com/blog/streaming-video-tips/what-is-video-encoding-codecs-compression-techniques/
• Beginning HTML5 Media
• Wp videocompression
• https://developer.mozilla.org/en-US/docs/Web/HTML/Supported_media_formats
• https://ledgernote.com/blog/science/how-does-mp3-compression-work/
• https://www.w3schools.com
• https://www.ffmpeg.org/ffmpeg.html
• Multimedia networking (Ivan Vidal)
