Hardware-Assisted, Low-Cost Video Transcoding Solution in Wireless Networks


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TMC.2019.2898834, IEEE Transactions on Mobile Computing

IEEE TRANSACTIONS ON MOBILE COMPUTING 1

Hardware-assisted, Low-cost Video Transcoding Solution in Wireless Networks
Jongwon Yoon, Member, IEEE, and Suman Banerjee, Member, IEEE

Abstract—Wireless video streaming has become an extremely popular application in recent years. Internet video streaming to mobile devices, however, faces several challenges, e.g., unstable wireless connections, long latency, and high jitter. Bitrate adaptive streaming and video transcoding solutions are widely used to address the above-mentioned issues; however, these approaches still have several shortcomings, which hinder providing a satisfactory video streaming service to mobile users. We propose a hardware-assisted, real-time video transcoding solution implemented on a commercial off-the-shelf device, the Raspberry Pi. We employ a software-and-hardware coupled architecture in order to improve the performance and quality of video streaming and to enhance user satisfaction in wireless networks. Our video transcoding solution can be applied to both downlink and uplink streaming: for the downlink stream, it provides agile bitrate adaptation to sudden network dynamics and enhances video quality by running our transcoding solution at the wireless edge; for the uplink stream, it can be used to broadcast live streams in real-time. We present the design and implementation of our video transcoding system in both cases with practical scenarios. The evaluation results reveal that our transcoding solution enhances the performance of video streaming compared with other adaptive bitrate streaming schemes, and that it provides higher video quality without causing rebuffering or video stalls. We bridge the gap between the wireless channel capacity and the video quality while providing a better streaming experience to the end user.

Index Terms—Adaptive bitrate streaming, Video transcoding, Wireless networks

1 INTRODUCTION

With the rapid increase in mobile devices, the amount of traffic destined to mobile devices is growing exponentially, and online video streaming constitutes the majority of the traffic to mobile devices [1]. The keys to the success of internet video streaming are delivering high quality and providing interruption- and distortion-free, continuous playout with immediate start. In spite of the increasing demand for mobile video streaming, current wireless networks cannot keep up and provide satisfactory video streaming quality to mobile users for several reasons: the wireless link is dynamic due to link fluctuation and user mobility, it provides limited capacity, and wireless channels are prone to instability. These factors make it hard to ensure user satisfaction. The above-mentioned challenges and limitations have been addressed by proposals for better video delivery approaches, such as adaptive bitrate streaming [2], scalable video coding (SVC) [3], and progressive downloading. In the adaptive bitrate approach, the media server maintains multiple copies of the same video, encoded at different bitrates and quality levels. However, the wide variety of bitrates, codecs, and formats makes it difficult for the video service provider to pre-process numerous media contents in advance. It also imposes higher overhead on content providers and on the delivery infrastructure, such as media servers and storage. For instance, Netflix maintains 120 different versions of each video to deal with different screen sizes and bandwidth requirements. Moreover, the limited selection of pre-processed bitrates cannot provide agile adaptation to the end user, because channel instability frequently occurs with user mobility in wireless networks. In addition, it is hard to pre-process live media sources such as breaking news, sports events, and teleconferences in real-time. Other shortcomings of these schemes include the following: they are not dynamically configurable under sudden network changes, they require massive storage to host pre-configured video streams, and some of the content may be wasted if it is never watched.

Video transcoding has emerged as an alternative technology for optimizing video data, given the difficulties of pre-processing various formats of video streams and the challenges of fine-grained bitrate adaptation. Video transcoding encodes or converts the input stream to a pre-configured format (e.g., codec, resolution, bitrate) to overcome the diversity and compatibility issues presented by the multiple formats of video streams. It is commonly used for mobile device content adaptation, where a target device does not support the format, or has limited storage capacity and computational resources that mandate a reduced file size. However, video transcoding is a very expensive process requiring high computational power and resources; thus it is usually performed at the media server in an off-line manner. Several research works use video transcoding in order to provide video streaming service to a wide range of devices [4], [5]. They propose a cloud transcoder that offloads transcoding work to the cloud and makes reuse of transcoded videos possible by storing them in a cache. This approach requires storage and causes additional delay, as video traffic is redirected through the cloud system. It does not support real-time video streaming because it is based on off-line transcoding. In short, video transcoding is a very effective way to optimize video data; however, it is expensive and hard to apply to live streaming.

• J. Yoon is with the College of Computing, Hanyang University, Korea. E-mail: jongwon@hanyang.ac.kr. Corresponding author.
• S. Banerjee is with the Computer Sciences Dept., University of Wisconsin-Madison, USA. E-mail: suman@cs.wisc.edu.
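Concretely, the transcoding operation described above (converting an input stream to a pre-configured codec, resolution, and bitrate) maps onto a single encoder invocation. The sketch below builds an ffmpeg command line for such a pass; the file names and parameter values are illustrative assumptions, not taken from the paper.

```python
# Sketch: build an ffmpeg command that transcodes an input stream to a
# target codec, resolution, and bitrate (the core transcoding operation).
# File names and parameter values are illustrative, not from the paper.

def transcode_cmd(src, dst, codec="libx264", resolution="1280x720",
                  bitrate_kbps=3500):
    """Return an ffmpeg argument list for a single transcoding pass."""
    return [
        "ffmpeg", "-i", src,          # input stream
        "-c:v", codec,                # target video codec
        "-s", resolution,             # target resolution
        "-b:v", f"{bitrate_kbps}k",   # target video bitrate
        dst,
    ]

cmd = transcode_cmd("source.mp4", "out.ts")
print(" ".join(cmd))
```

Running the command itself is what makes transcoding expensive on general-purpose CPUs, which motivates the hardware-assisted approach developed in this paper.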

1536-1233 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Given that video transcoding is a compute-intensive workload and is not adequately served by a software-only solution, it is necessary to consider a hardware-assisted approach. Toward this, we introduce a cost-effective, real-time video transcoding solution, TransPi, that employs a software-and-hardware coupled architecture in order to improve the quality of the video stream and user satisfaction. Instead of running an expensive transcoding process in the media server, we utilize a low-cost commercial off-the-shelf (COTS) device, the Raspberry Pi [6], for video transcoding. The Raspberry Pi is a cheap, credit-card-sized computer that includes a graphics processing unit (GPU) and a hardware decoder and encoder able to accelerate the video transcoding process. TransPi can be easily applied to both downlink and uplink video streams. In detail, TransPi runs at the wireless edge (i.e., the wireless last hop, the AP), where it can adapt agilely to sudden network changes in order to provide a better streaming experience to the users. In addition, we can apply TransPi to the uplink video stream for real-time broadcasting with minimal delay. By running our transcoding solution at the wireless AP (for downlink) or at the media source (for uplink), it can provide agile adaptation to sudden network dynamics and quickly incorporate the client's network conditions, respectively.

To realize a real-time transcoding solution for live streaming, we partition the input stream into multiple small segments and apply our transcoding solution on-the-fly. The transcoded segments are thereby delivered to the client in real-time with minimal delay, providing a continuous transcoded video stream to the client. TransPi provides a seamless, flexible bitrate adaptation service to the user while taking the user's profile into account. In contrast to other transcoding work and previously existing systems, we propose a storage-less, on-the-fly, and seamlessly adaptive transcoding solution. Our evaluations show that TransPi provides a better streaming service to the user. The most important contribution of our system is the practical implementation of real-time video streaming optimization that can be applied to various streaming applications. Our contributions in this work are multi-fold:

• This work is a systematic research and engineering effort in realizing the design and implementation of a real-time video transcoding solution in practice.
• Our solution is applicable to practical streaming services in both uplink and downlink. We demonstrate its efficacy and performance improvements in real-world scenarios.
• Unlike other works, TransPi provides dedicated streaming to each user, instantaneously taking the user's profile into account and quickly adapting to network dynamics.
• TransPi provides seamless bitrate switches with fine time granularity; thus the end user is not interrupted during video playback.

The remainder of this paper is organized as follows. We introduce the background of video streaming solutions widely used today in Section 2. Section 3 motivates the transcoding solution while comparing its performance to conventional bitrate adaptive streaming. Section 4 describes the overview of TransPi, the details of our transcoding solution, and its implementation for the downlink stream. We evaluate the performance of TransPi in Section 5. Section 6 presents a broadcast application that uses TransPi to upload live streams with minimal delay. We discuss related and future work in Section 7. Section 8 concludes the paper.

Fig. 1. (a) Adaptive bitrate streaming (HLS); (b) Video transcoding example. Adaptive streaming converts the original video into multiple versions at various bitrates. In contrast, the video transcoding solution in TransPi generates a single stream while adaptively switching the video bitrate.

2 ADAPTIVE BITRATE STREAMING (ABR)

Traditionally, video streaming solutions utilize either a UDP-based (RTP/RTSP) or a TCP-based (RTMP) approach while optimizing video data to promote high efficiency. HTTP video streaming over TCP is widely used and accounts for a huge amount of network traffic due to the advantages of its established infrastructure, servers, and caching. Many challenges, however, arise in delivering video data over HTTP: various video formats are required to support diverse users, a stable connection between server and client must be maintained, and the latency has to be carefully controlled for streaming live media. Especially over highly variable wireless channels, it is critical to adapt quickly to the channel conditions in order to provide the user with a good watching experience. To overcome such challenges, many solutions utilize bitrate adaptive mechanisms, e.g., HLS [7], MPEG-DASH [8], Adobe Dynamic Streaming, and Microsoft Smooth Streaming; however, they are not fully functional across different platforms, which makes their usage very limited.

ABR. In adaptive bitrate streaming, the server maintains multiple profiles of the same video encoded at various bitrates and quality levels. Further, the video object is partitioned into multiple segments, each of which has a duration of a few seconds. A client player then requests the segments at different encoding bitrates, depending on the underlying network conditions and the adaptation algorithm.

HTTP Live Streaming (HLS). HLS is an HTTP-based media streaming protocol implemented by Apple and supported by all Apple and Android smart devices. HLS is an open standard and provides great flexibility for implementation. A schematic view of the HLS process is depicted in Fig. 1(a). The media encoder downloads the original video stream, encodes it into the H.264 video format, and generates an MPEG-2 Transport Stream (TS). The stream segmenter then takes the MPEG-2 TS and produces from it a series of equal-length files (TS segments) suitable for use in HLS. It also generates an index file (playlist) that contains a list of the media files. The client player reads the index file, requests the transcoded video data, and plays it. The client also switches between streams dynamically if the available bandwidth changes.

Fig. 2. (A) The video streaming is provided from a remote media server. (B) The video segments are directly downloaded from a media server at the wireless edge.
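The segmenter's index file described above is a plain-text M3U8 playlist. As a rough sketch (the segment names and the 10-second target duration are illustrative, not the paper's settings), it can be generated as follows:

```python
# Sketch: generate an HLS media playlist (M3U8) for a series of
# equal-length TS segments, as produced by the stream segmenter.
# Segment names and the 10-second duration are illustrative.

def make_playlist(segments, target_duration=10):
    """Return an M3U8 media playlist listing the given TS segments."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target_duration}",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for name in segments:
        lines.append(f"#EXTINF:{target_duration:.1f},")  # segment duration
        lines.append(name)
    lines.append("#EXT-X-ENDLIST")  # VOD playlist: no further segments
    return "\n".join(lines)

print(make_playlist(["seg0.ts", "seg1.ts", "seg2.ts"]))
```

The client player fetches this index first, then requests the listed segments one by one, which is what makes per-segment bitrate switching possible.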
Dynamic Adaptive Streaming over HTTP (DASH). DASH is the first adaptive bitrate HTTP-based streaming solution and uses HTTP web server infrastructure to deliver the content. Similarly to HLS, DASH works by breaking the overall stream into a sequence of small chunks. Before the client player (e.g., dash.js) begins downloading media segments, the client first requests the MPD, which contains the metadata for the various sub-streams that are available. As the stream is played, the client may select from a number of different alternate streams containing the same material encoded at a variety of bitrates, allowing the streaming session to adapt to the available network capacity.

TransPi. TransPi supports the two most widely used streaming formats, HLS and DASH; however, it does not employ their rate adaptation algorithms. In a nutshell, we use a video transcoding solution to generate video data whose bitrate spontaneously switches according to network variations (Fig. 1(b)). In our implementation, we use a hardware decoder and encoder to transcode the input video into a single-rate output stream based on the user's network connectivity. TransPi adapts the transcoded bitrate based on network changes (once every second). Unlike other adaptive streaming solutions, TransPi provides only a single transcoded stream, and thus it does not provide a list of various bitrates for adaptation.

3 INEFFICIENCY/LIMITATIONS OF ABR

We evaluate the performance of the adaptive algorithms employed in HLS and DASH. We then motivate the necessity of a video transcoding solution, in order to provide better streaming service to the user, by addressing several key questions.

3.1 Bitrate Selection and Bandwidth Utilization

Testbed. We created a network testbed to evaluate the performance of HLS and DASH. For our experiments, we first set up media servers that provide adaptive streaming service following the HLS standards. The media server hosts pre-processed video segments of the same video stream encoded at bitrates of 1.5, 3.5, 6.5, 10, 15 and 20 Mbps. We deployed one such media server in a remote location where the end-to-end ping latency between a client and the HLS-remote server is around 100 msec (see line A in Fig. 2).

The cloudlet system was introduced to support services that require low latency, such as real-time applications [9]. By providing the service from the vicinity of the end user, the cloudlet architecture has multifold benefits: (i) it reduces the network latency, (ii) it provides more agile service adapted to network changes, and (iii) it has a vantage point from which to monitor the status of the client. A cloudlet architecture would be beneficial for delay-sensitive applications, especially over wireless networks, since it could address the low-latency requirement and quickly incorporate network dynamics. To realize the cloudlet-based system, we deployed the same HLS server at the wireless edge, directly connected to the AP, as depicted in Fig. 2 (red dashed line B). In this setup, the ping latency to the HLS-edge server is around 4 msec.

The client player sends a request to the media server and adaptively downloads the video segments according to its wireless capacity and buffer level. In this set of experiments, we assume the last wireless hop is the bottleneck, wherein many devices connect to the same AP and compete for bandwidth. In addition, the performance of the wireless network is highly variable due to changing bandwidth, latency, interference, and so on. We used a COTS AP and injected network variations by controlling the downlink capacity between the AP and the client. While the client receives the video streaming service, we adjust the wireless capacity at 20-second intervals (e.g., 17, 12, 7, 4 and 17 Mbps) and trigger the client player to adapt the video bitrate. The client receives video segments via the AP according to HLS's adaptation algorithm, considering the client's channel condition and profile (e.g., buffer level, network capacity, available bitrates, etc.).

Similarly, we ran the same experiment with the DASH protocol for comparison. There are various adaptation algorithms for DASH, and the following three are the most recent and widely used: (i) DASH-BOLA determines the bitrate according to the client's buffer level, (ii) DASH-throughput selects the bitrate based on the latest throughput measurements, and (iii) DASH-dynamic is a hybrid model that utilizes both the client's buffer occupancy and instantaneous throughput information (we tested the above-mentioned three variations and found that the DASH-dynamic algorithm outperforms the others). We used DASH-dynamic [10] and publicly available DASH test streams [11] for the evaluation. In the DASH configuration, ten video bitrates are available (0.25, 0.5, 1, 2, 3, 5, 8, 10, 12 and 15 Mbps), and the ping latency from the client to the server is around 4 msec, similar to that of HLS-edge.

Bitrate. In Fig. 3, we present the selected video bitrates of the two adaptive bitrate streaming solutions, HLS and DASH-dynamic, along with the available network bandwidth between the AP and the client (grey area). We notice that both DASH-dynamic


Fig. 3. The adaptive streaming solutions under-utilize the available bandwidth by selecting lower video bitrates. (Selected bitrate in Mbps vs. time in seconds for DASH, HLS-edge, and HLS-remote.)

                Bandwidth utilization
HLS-remote            40.7%
DASH-dynamic          59.4%
HLS-edge              55.7%

TABLE 1
Both HLS and DASH-dynamic use less than 60% of the available bandwidth.

and HLS conservatively under-utilize the available bandwidth, and hence the user experiences a lower video quality (HLS-remote selects lower video bitrates compared to HLS-edge under the same network conditions). Such lower video quality could have been improved, given that the network bandwidth is much higher than the selected bitrates. This observation corroborates findings from prior research comparing the performance of several adaptive HTTP streaming players [12], [13]. One may argue that the client selects lower bitrates due to the unavailability of other bitrates; however, this is not the case. For instance, the client could have picked a video segment of 6.5, 10 or 15 Mbps bitrate during 80-100 seconds, while the available bandwidth was around 17 Mbps; however, 3.5 and 10 Mbps were the highest bitrates selected by HLS-remote and HLS-edge, respectively.

We notice that HLS and DASH-dynamic sometimes select bitrates higher than the available network capacity. Such aggressive selection causes the video to freeze for significant amounts of time during playback, resulting in poor video quality (we elaborate in the next subsection). After such a high selection, DASH-dynamic lowers the bitrate to the lowest level (0.25 Mbps at 71 seconds) even though it could adapt to higher bitrates. The main reason that both HLS and DASH-dynamic cannot select an appropriate bitrate lies in the behavior of their adaptation algorithms (inaccurate estimation, inability to detect network changes, decision priority, etc.). One interesting observation is that the selected bitrates are higher when the client receives video segments from the media server connected to the AP (HLS-edge) than when it receives them from the remote server (HLS-remote). Higher bitrates can be beneficial for users because they can provide a better streaming service. Note, however, that HLS-edge still under-utilizes the available bandwidth. Many network factors have an impact on the quality of the video streaming service. For example, high bandwidth and low network latency are required to provide satisfactory service (e.g., no stalls, no glitches) to users. In this sense, there are multifold advantages to deploying the video streaming service near the AP: (i) it can reduce the latency by providing service close to the user, and (ii) it is agile to network changes since it can instantaneously incorporate the client's feedback (wireless channel condition). For these reasons, the performance of video streaming over wireless could be improved when it is serviced from the vicinity of the AP.

Bandwidth. The bandwidth utilization can be inferred from the selected video bitrates and the time duration of the video segments. In other words, the area under the bitrate curves plotted in Fig. 3 corresponds to the bandwidth utilization of each adaptive scheme. We obtained the average bandwidth utilization of HLS-remote, HLS-edge, and DASH-dynamic by repeating multiple runs under the same conditions and summarize the results in Table 1. It can be seen that both HLS and DASH-dynamic barely use about 60% of the network capacity. Again, we confirm that the cloudlet (i.e., HLS-edge) improves HLS's performance by 37% in terms of bandwidth utilization. We can see that the selection of lower bitrates in turn leads to under-utilization of the given network capacity.

Thus, carefully selected video bitrates are indeed critical for improving bandwidth utilization and for providing higher streaming quality to the users. In addition, providing video streaming at the wireless edge (AP) can help improve the quality of video streaming.

3.2 Rebuffering and Stall

In this section, we focus on how bitrate selection affects video quality with respect to video freezes and rebuffering time. The client player utilizes a buffer for storing video frames ahead of time in order to prevent stalls due to unexpected network changes and dynamics. Besides the network capacity, the player's buffer occupancy is one of the factors for adaptively determining the bitrates of video segments when downloading. For instance, when the player's buffer level is above a threshold, the client chooses higher bitrates to provide better video quality, whereas when the buffer is low, the client selects lower bitrates to fill the buffer up again. In this sense, if bitrate decisions are not carefully managed based on buffer occupancy, they can lead to unpleasant results such as frequent stalls and long rebuffering times.

Buffer. We used the same HLS remote server and controlled the network capacity at the AP in the same manner as described in subsection 3.1. During the experiments, we measured the instantaneous buffer level at the client player and its bitrate decisions. Fig. 4 presents one particular example of HLS's buffer level and selected bitrate while streaming video to the client.1 We observed buffer depletion between 30 and 40 seconds, as marked by the circle. During this period, the client player stopped and rebuffered for about 10 seconds.

1. We have observed the same behavior in other experiments and present one of them for the sake of brevity. The duration of the test video is 600 seconds.
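The stall mechanism observed in these experiments can be captured by a toy model of the client buffer: fetching a segment of duration seg at a chosen bitrate over a link of a given capacity takes seg * bitrate / capacity seconds, during which playback drains the buffer in real time. All constants below are illustrative, not measured values from the testbed.

```python
# Toy model of client buffer dynamics. Downloading one segment takes
# seg * bitrate / capacity seconds; playback drains the buffer during
# the download, and the finished segment adds `seg` seconds of video.
# Selecting a bitrate above the link capacity drains the buffer on
# every segment; a stall occurs when the buffer empties.

def play(selected_bitrates, capacity_mbps, seg=4.0, buf=10.0):
    """Return (final buffer level in sec, number of stalls)."""
    stalls = 0
    for bitrate, cap in zip(selected_bitrates, capacity_mbps):
        download_time = seg * bitrate / cap  # time to fetch one segment
        buf -= download_time                 # playback drains the buffer
        if buf < 0:                          # buffer emptied: stall
            stalls += 1
            buf = 0.0
        buf += seg                           # segment adds playable video
    return buf, stalls

# Bitrate at or below capacity: the buffer never empties.
print(play([10, 10, 10], [12, 12, 12]))
# Bitrate above capacity (15 Mbps over a 12 Mbps link) drains the buffer.
print(play([15] * 12, [12] * 12))
```

In this simplified model, a 15 Mbps selection over a 12 Mbps link loses one second of buffer per 4-second segment, mirroring the depletion seen in Fig. 4 when HLS overshot the available bandwidth.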


Fig. 4. Inaccurate bitrate selection in HLS leads to stalls and rebuffering. The client player freezes during 30-40 seconds due to buffer depletion (marked by a red circle). (Buffer level in seconds and selected bitrate in Mbps vs. time; wrong selections marked.)

Fig. 5. DASH-dynamic keeps bitrates higher than the available bandwidth (at 40 and 60 seconds), which results in near buffer depletion. (Buffer level in seconds and bitrate in Mbps vs. time; wrong selections marked.)

                Number of stalls   Rebuffering duration
HLS-remote            1.1                9.7 sec
DASH-dynamic          0.9                8.4 sec
HLS-edge              0                  0 sec

TABLE 2
Average number of stalls and rebuffering duration for HLS and DASH-dynamic during 100-second playback.

The main reason the client's player stalled is that the client selected inappropriate (too high) bitrates. In particular, we can see that the buffer level decreases at 23 seconds, when the client aggressively switched its bitrate to 15 Mbps even though the available bandwidth was 12 Mbps. This bitrate switch depleted the buffer and eventually caused a video stall. In consequence, the client lowered the bitrate from 15 to 6.5 Mbps at 30 seconds, and thus filled up the buffer again. This late switching is due to inappropriate decisions by HLS's adaptive algorithm, an algorithm that is based on a mixture of buffer occupancy and inaccurate estimation. The client could have avoided the stall and rebuffering by keeping the bitrate at 10 Mbps, which is slightly lower than the available bandwidth. We observed the same behavior between 60 and 70 seconds; however, the client barely avoided buffer depletion. A sudden bandwidth change in the middle of downloading a video segment can trigger the client to switch to a different version (bitrate) of the segment. This also causes buffer depletion, because the partially downloaded segment cannot be used.

Similarly, we repeated the same experiment with the DASH-dynamic client and server. Fig. 5 shows the buffer occupancy and the selected bitrates. Recall that DASH-dynamic uses both throughput estimation and buffer information to determine the bitrate. In detail, DASH-dynamic initially selects and maintains the lowest bitrate until the buffer occupancy reaches a threshold (i.e., 30 seconds). Once DASH-dynamic has a sufficient buffer, it adjusts the bitrates based on throughput and buffer. Its inappropriate selections (at 40 and 60 seconds) cause near buffer depletion (at 67 seconds), and DASH-dynamic immediately switches to the lowest bitrate to fill the buffer. Again, when the buffer occupancy reaches the threshold (at 85 seconds), DASH-dynamic switches to higher bitrates. Like HLS, DASH-dynamic does not take full advantage of the available bandwidth.

Stall. Table 2 summarizes the average number of stalls and their duration obtained from multiple runs under the same configurations. We reached the same conclusion under different settings, e.g., different downlink capacities and segment bitrates. Frequent stalls and long rebuffering times immediately affect the quality of the streaming service and user engagement [14]. This result shows that the inaccurate bitrate adaptation used in HLS and DASH-dynamic does not work well in a highly variable network, and therefore it hinders users from achieving the QoS requirements of streaming video. Unlike DASH-dynamic and HLS-remote, HLS-edge does not introduce stalls or rebuffering, but it cannot efficiently utilize the bandwidth, as it selects low bitrates. Again, a better bitrate adaptation algorithm and carefully selected bitrates based on accurate bandwidth estimation and buffer occupancy could prevent video stalls, and hence enhance QoE.

This in turn motivates us to design a channel-aware bitrate decision that does not incur stalls or freezes. We also consider an edge-based system for the streaming service, given the benefits of the cloudlet in wireless networks.

3.3 Video Transcoding

The inefficiency of current bitrate adaptive solutions motivates us to design a new approach for streaming video. Several inferences from the experimental results are summarized as follows: (i) the bitrate adaptive algorithms implemented in HLS and DASH-dynamic are not efficient at adapting to network changes, given that the selected bitrates are not optimal, (ii) the number of video bitrates of pre-processed segments is not sufficient to handle the network dynamics, and (iii) bitrate adaptation does not work well in the wireless environment, especially where the last hop to a client is wireless.

Regarding the first inference, one of the reasons that bitrate decisions are not optimal is the inaccurate channel


4=<.2>?3&805@=.&

'()&
A.6<1&35-04.& &&&/0123456.0&&
!*B/'%& !71389.00:&;<%&
&&&R2>.02.>& )''&
!"#$%& '()&
*+)& +')& ,-.-. & )'' &
'()&
!*B$"%&
)19=.&/C&2.>D50E&
&&&&&FG&4H122.=3& IJ&7.4.<K<2L&1&4H122.=&0.M-.3>&126&>H.&-3.0&805@=.&
FJ&*.>.0N<2<2L&9<>01>.3&913.6&52&>H.&4=<.2>?3&<2O5&
PJ&/0123456<2L&50<L<21=&K<6.5&D<>H&>H.&6.>.0N<2.6&9<>01>.&
QJ&*.=<K.0<2L&>0123456.6&3>0.1N&>5&>H.&4=<.2>3&

Fig. 6. TransPi interacts with the client to determine the parameters (e.g., bitrate) for video transcoding and provides
seamless streaming service in real-time. TransPi switches the encoding bitrate according to changes in the network.

estimation and subsequent reactions based on the estimation. There are several approaches to improve the adaptation algorithm; however, they have focused on application-layer solutions (e.g., accurate channel estimation, better adaptation algorithms) which do not involve changes of or to the media source itself. Second, it is challenging to pre-process and store numerous bitrates of segments for providing fine-grained bitrate adaptation, because there are too many video codecs and platforms among diverse users, and pre-processing requires additional storage for the outputs. A limited choice of bitrates frequently causes under-utilization of network capacity, and this leads to lower quality of streaming service. It is also hard to pre-process live media sources (e.g., sports games, news, and teleconferences) in real-time, thus pre-processing is limited to certain purposes such as video on demand. Third, a cloudlet or wireless edge system would be beneficial for video streaming service, especially over wireless networks, since it could address the low-latency requirement and quickly incorporate the high variations of the wireless network. One may argue that the most recent wireless standards (e.g., 802.11ac) and APs can provide much higher bandwidth (e.g., up to 1 Gbps), and therefore the last-hop bottleneck no longer exists. However, the wireless environment is highly variable due to fading, interference, etc., and therefore cannot guarantee stable and continuous service to the end user. Specifically, the authors in [23] showed a large variation in throughput during a streaming session. For instance, network capacity varies widely, from 500 Kbps to 17 Mbps, when downloading video segments. To provide better streaming services, we need to address the issues of unstable networks, high fluctuations in throughput, and bandwidth sharing.
The above-mentioned inferences motivate us to design a channel-aware transcoding solution that directly modifies the media data instead of providing various versions and bitrates. We propose TransPi, a video transcoding system running at the wireless edge that enhances the user experience with video streaming. TransPi provides seamless streaming service to the user in a varying wireless environment, taking into account the user's profile. Considering the difficulty of storing multiple versions of the same video with various bitrates in a media server, we leverage a video transcoding scheme that generates a single stream whose bitrate is instantaneously determined. Instead of adapting within given bitrates of segments, TransPi transcodes the original source to the most appropriate bitrates and provides streaming service. It thereby eliminates inefficient bitrate adaptations for the client. TransPi takes a channel-aware approach for determining the bitrates of a transcoded stream (details in Section 4). Unlike HLS and DASH-dynamic, TransPi adaptively switches the video bitrates while receiving instantaneous client bandwidth feedback from the AP, and therefore it is more agile to network changes. Knowing the benefits of a cloudlet architecture for video streaming service, we deploy our transcoding system at the wireless edge, where it can monitor the client's profile from the nearest vantage point. In addition, having the video transcoding solution at the wireless edge provides more affordability to the end user and allows speedy deployment in existing systems. Since TransPi changes the media source directly, it does not require additional storage for hosting various bitrates of segments.

4 SYSTEM DESIGN

4.1 Overview
Our system consists of four components: the media source (DATN), the video transcoder on a Raspberry Pi (RPi), the bandwidth monitor running on the AP, and the client's video player, as depicted in Fig. 6. The basic operations of the system are as follows:
1. On the client's video player, the client selects a channel to watch. This triggers the transcoding process by sending a channel request to the video transcoder on the RPi.
2. The bandwidth monitor running on the AP periodically (every 100 msec) sends the client's information to the encoder on the RPi. This information is used for determining transcoding parameters instantaneously.
3. The video transcoder fetches the requested TV stream from the media source (DATN) and then decodes and encodes the requested stream. While encoding, the video transcoder adapts the video bitrates for transcoding based on the received client's information.


4. The transcoded output segments are consistently delivered to the client over TCP. The client then plays them in real-time.
We describe the details of each procedure and their implementations in the following subsections.
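The four basic operations above can be sketched as a control loop. This is a hypothetical illustration, not the paper's actual code; all class, method, and channel names are invented:

```python
# Hypothetical sketch of the four-step loop (illustrative names only):
# the AP reports the client's downlink bandwidth every 100 msec, and the
# transcoder re-targets its encoding bitrate before each segment is
# delivered over TCP.

class TranscoderSession:
    def __init__(self):
        self.channel = None
        self.target_bitrate = None  # bps, follows the AP's reports

    def on_channel_request(self, channel):
        # Step 1: a channel request from the client starts the session.
        self.channel = channel

    def on_bandwidth_report(self, downlink_bps):
        # Step 2: periodic (100 msec) client info from the AP.
        self.target_bitrate = downlink_bps

    def transcode_segment(self, segment):
        # Step 3: decode/encode the fetched segment at the target bitrate.
        return {"channel": self.channel, "data": segment,
                "bitrate": self.target_bitrate}

    def deliver(self, transcoded):
        # Step 4: hand the transcoded segment to the TCP delivery path.
        return ("tcp", transcoded)

session = TranscoderSession()
session.on_channel_request("CNN")
session.on_bandwidth_report(12_000_000)  # AP measured 12 Mbps downlink
out = session.deliver(session.transcode_segment(b"ts-data"))
```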

4.2 TransPi Implementation


4.2.1 Live Media Source, DATN
The University of Wisconsin-Madison campus network provides the IP-based Digital Academic Television Network (DATN) service in the area of campus dormitories and University housing, so that users are able to access TV streams with their mobile devices or computers [15]. DATN is a TV network that carries 27 channels (e.g., CNN, NBC, ABC, etc., running 24/7) operated by the University. However, due to the scarce wireless capacity and the high volume of demand, users may experience unsatisfactory video quality. To remedy this problem, we apply our transcoding solution on DATN channels to enhance the user experience. We make the incoming DATN TV streams take either the HLS or DASH format to support various methods in the TransPi implementation; however, other formats can also be incorporated.

4.2.2 Raspberry Pi
We use a Raspberry Pi (RPi) for transcoding live streams from DATN. The RPi is a low-cost, credit-card-sized computer that includes a CPU (900 MHz quad-core ARM Cortex-A7), a powerful GPU (VideoCore IV) and a hardware video decoder and encoder. Given this architecture, our design strategy is to execute light-weight operations on the CPU and run computationally intensive tasks on the GPU. The hardware decoder/encoder can support HD (1080p) video decoding/encoding with low power consumption (< 3.5 W). The RPi provides OpenMAX APIs [16] to access the video decoder and encoder for the video transcoding process. For instance, the hardware decoder and encoder can make the transcoding process more than 4 times faster with 18 times less CPU utilization compared to software-based video transcoding [17].² The transcoding process is accelerated by the hardware video decoder and encoder on the RPi, which can transcode both a 720p HD stream and an SD stream, or three SD streams, simultaneously. Therefore, a single RPi can provide independent transcoding service to multiple users (up to three) at the same time.

2. Note that different hardware implementations lead to different performance gains. We observed from our experiments that the performance gain would be much higher with fine-tuning and optimization.

There are two types of processes running on the RPi: the transcoding agent and the transcoding worker. To transcode multiple streams at the same time, 1∼3 transcoding workers (depending on the number of transcoding tasks) are executed on the computing resource. The transcoding agent, executed on the ARM processor, has two responsibilities: (i) it creates/terminates transcoding workers based on requested tasks and (ii) it arranges/schedules them. We build an RPi-center, which consists of multiple RPis and a transcoding manager, in order to increase scalability. Node.js Express [18] is used to build the transcoding manager. The transcoding agent reports the status of the transcoding workers, including the available resources, to the transcoding manager. We use the MQTT [19] last-will message to track the status in real-time. Note that, if multiple users share the same transcoded stream, a single RPi can support more than three users. In a typical home environment, one RPi is sufficient for providing transcoding service. To provide stable streaming services and support various streaming techniques (e.g., RTMP, HLS, MPEG-DASH) to the end user, we employ an RTMP module on the RPi. The RPi maintains a TCP connection to the client for stable streaming while minimizing delivery latency using RTMP (details in Section 4.3).

Fig. 7. VideoCore IV GPU performs computationally intensive video transcoding tasks. The ARM CPU is responsible for coordination and data passing.

4.2.3 Transcoding Worker
The transcoding worker is a process that transcodes an input stream. It only transcodes the video data and simply bypasses the audio data, because the size of the audio data (e.g., 126 kbps) is almost negligible compared to the video data. The VideoCore IV (GPU) primarily performs the video transcoding task. The ARM processor is responsible for executing the networking protocol stack and for demultiplexing and multiplexing the video streams (note that audio data is simply bypassed).
The RPi provides OpenMAX IL interfaces that allow application-layer software to access the hardware video decoder and encoder. Instead of calling the OpenMAX IL interfaces directly, we built an application using the GStreamer framework [20]. GStreamer is a widely used open-source multimedia framework for building media processing applications and has a well-designed plug-in (filter) system. Specifically, gst-omx is a GStreamer OpenMAX IL wrapper plugin that is used to access the hardware video decoder and encoder resources in our implementation [21]. Fig. 7 shows the software architecture of the transcoding worker in TransPi. Next, we describe the implementation details of the GStreamer pipeline.
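As one illustration of such a worker pipeline, a gst-launch-1.0 style description can be assembled as a string. This is a sketch under assumptions: the element chain uses the plugins named in the text (tsdemux, h264parse, the gst-omx omxh264dec/omxh264enc elements, mpegtsmux, tcpserversink), while the source URI and the property values are hypothetical and do not reproduce the paper's tuned configuration:

```python
# Illustrative only: a gst-launch-1.0 style pipeline description for a
# transcoding worker, built in Python. The source URI and properties are
# assumptions, not the paper's actual settings.

def build_pipeline(src_uri, bitrate_bps, port):
    return (
        f"souphttpsrc location={src_uri} ! tsdemux name=demux "
        # Video path: hardware decode, then hardware encode at the
        # channel-aware target bitrate.
        f"demux. ! h264parse ! omxh264dec ! omxh264enc target-bitrate={bitrate_bps} "
        "! h264parse ! mux. "
        # Audio path: bypassed (remuxed without re-encoding).
        "demux. ! aacparse ! mux. "
        f"mpegtsmux name=mux ! tcpserversink port={port}"
    )

desc = build_pipeline("http://datn.example/ch7.ts", 4_000_000, 5000)
```

In the actual implementation the equivalent pipeline is constructed programmatically, with the encoder bitrate updated on-the-fly rather than fixed at launch.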


Fig. 8. The architecture of the GStreamer pipeline. After demuxing, video data is transcoded through pipelines whereas audio data is bypassed.

4.2.4 GStreamer Implementation
We use GStreamer to develop our video transcoding application on the RPi (shown in Fig. 8).³ In a GStreamer application, plugins are connected to form a pipeline that processes media data. In our implementation, all plugins are executed on the ARM processor except the H.264 decoder and H.264 encoder plugins. The default behavior of a GStreamer plugin is summarized in three steps: (i) it reads data from the source pad, (ii) it processes the data, and (iii) it writes data to the sink pad. The GStreamer pipeline moves data and signals between connected source and sink pads. The plugins operate independently and process video data sequentially. For instance, the HLS demux plugin is used to manage the connection with the HLS media server and download video segments, while the TS demux plugin is used to separate video and audio data from the video stream. GStreamer plugins for the hardware decoder and encoder are then connected to transcode the video data. In general, both the H.264 decoder and encoder send and receive large amounts of data between the ARM processor's memory and the GPU, and such data exchange wastes CPU cycles. To address this issue, we have extensively modified the gst-omx implementation and have created hardware tunneling between the encoder and decoder to minimize the CPU load. Moreover, we have optimized the plugins to realize an on-the-fly transcoder. This hardware tunneling is important for real-time video transcoding because it enables transcoding multiple videos simultaneously (e.g., transcoding one HD and one SD video). After the transcoding process, the video data and audio data are muxed again to generate the TS stream, and the client then downloads that TS stream over the TCP connection.

3. Our GStreamer implementation and RPi buildroot are publicly available at [22].

4.2.5 Video Transcoding
The transcoding procedure is as follows. The transcoder periodically receives the client's profile (e.g., downlink bandwidth) from the AP for determining the parameters of video transcoding. The video transcoder running on the RPi first receives a TV channel request from the client and fetches the TV stream from the DATN. At this stage, the incoming TV stream is segmented (with a duration of several seconds). The transcoder then decodes each segment and encodes it with the selected transcoding parameters (e.g., bitrate, profile, level, IDR interval). While encoding, the video transcoder instantaneously adapts the bitrates (once every second) based on the received client's information. The output of the transcoding process is pipelined to the output queue in the transcoder; the transcoder then converts it to a continuous video stream before it is delivered to the client (we maintain an output queue size of 400 msec in order to minimize the delay to the client). The client receives the transcoded stream over the TCP connection from the transcoder and plays it in real-time.
The embedded video decoder and encoder in the RPi are highly efficient for real-time video transcoding. The reason TransPi is able to transcode live streams in real-time is that it transcodes the incoming video segments immediately and then concatenates the transcoded outputs into a single stream. The delay caused by the transcoding process is minimal; there is only about a 400 msec delay for live TV streaming. TransPi shows the feasibility of a low-latency, on-the-fly, real-time transcoding service with live TV channels. The transcoding complexity is almost negligible for the streaming path, since video transcoding is handled on a dedicated machine, the Raspberry Pi; therefore, it does not incur any overhead to the streaming service.

4.2.6 Channel-aware Bitrate Decision
Inaccurate channel estimation due to the significant channel variation in wireless networks leads to poor performance in adaptive streaming services [23]. In TransPi, the video transcoding parameters have a great impact on video quality. For instance, bitrate selection is critical for efficiently utilizing the available bandwidth and enhancing video quality. The AP passively monitors the wireless bandwidth of each client to determine the video encoding parameters. In our implementation, we use the client's downlink bandwidth to decide the bitrates of the transcoded outputs. The downlink bandwidth for each client is recorded every 100 msec and reported to the encoder on the RPi. We tested various values for the interval of the client's feedback and empirically set it to 100 msec because this provides fine-grained network information and does not cause any overhead on our system.
Based on the received client's profile, the video encoder runs the bitrate decision algorithm described in Algorithm 1 once every second. Note that TransPi has no restrictions on setting the adaptation interval. In general, the adaptation interval of other streaming services (e.g., DASH and HLS) depends on the segment size, which varies from 2 to 10 seconds. Given that wireless networks frequently induce sudden changes (i.e., throughput variation is high) and dynamics, a longer switching interval may not provide agile adaptation. In contrast, a shorter interval triggers frequent bitrate switches, which may discomfort the user. We have tested various parameters (results are omitted for the sake of brevity) and set the duration of each video chunk to one second. In consequence, the adaptation algorithm is executed once every second and


Algorithm 1 Bitrate decision for transcoding
1: Input: client's profile
2: Output: transcoding bitrate for the next video segment
3: bi: bitrate for client i, pi: client i's bandwidth
4: for i ∈ [1 : |I|] do
5:   if (pi > bi × (1 + α)) || (pi < bi × (1 − α)) then
6:     bi = pi, si = pi
7:   end if
8: end for
9: si: selected bitrate for client i
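Algorithm 1 can be rendered directly in Python (a sketch; the threshold α = 0.1 is the value used in the text, and the function name is illustrative):

```python
# Direct Python rendering of Algorithm 1: the transcoding bitrate follows
# the measured bandwidth p_i, but only changes when p_i drifts more than
# alpha (10%) away from the current bitrate b_i.

def decide_bitrates(bitrates, bandwidths, alpha=0.1):
    """bitrates[i] is b_i (current), bandwidths[i] is p_i (measured).
    Returns s_i, the selected bitrate for each client's next chunk."""
    selected = list(bitrates)
    for i, (b, p) in enumerate(zip(bitrates, bandwidths)):
        if p > b * (1 + alpha) or p < b * (1 - alpha):
            selected[i] = p  # b_i = p_i, s_i = p_i
    return selected
```

The dead band keeps small measurement jitter from triggering encoder reconfiguration: a 5% drift leaves the bitrate unchanged, while a 20% drift makes the bitrate follow the bandwidth.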
provides agile adaptation. We set α = 0.1 to switch the encoding bitrate when the network bandwidth changes by more than 10% compared to the previous configuration (we empirically set α to 0.1; a lower threshold triggers frequent adaptations, whereas a higher threshold cannot respond quickly to network changes). We have tried other parameters and bitrate selection algorithms; however, their gains are not significant considering the added complexity. Our goal is to enhance the user experience (e.g., achieve higher bitrates and eliminate stalls and rebuffering) by providing the most appropriate video bitrates according to the downlink capacity; thus Algorithm 1 is simple but effective for real-time video streaming.

Fig. 9. TransPi's channel-aware approach keeps the transcoding bitrates close to the client's downlink capacity, thus fully utilizing the bandwidth.

Time (sec)   Bandwidth   Avg. bitrate   Stdev.
0-20         17 Mbps     16.90 Mbps     0.19 Mbps
20-40        12 Mbps     11.83 Mbps     0.21 Mbps
40-60        7 Mbps      6.88 Mbps      0.19 Mbps
60-80        4 Mbps      3.96 Mbps      0.08 Mbps
80-100       17 Mbps     16.96 Mbps     0.08 Mbps
TABLE 3. The average video bitrate of TransPi is very close to the network capacity.

4.3 Client Player
We used multiple open-source video players (ffmpeg and
gstreamer) on clients for playback. The video player initially
creates a TCP connection (for reliability and performance reasons) to the video transcoder on the RPi and requests a TV stream. Moreover, streaming over TCP is highly scalable: it does not require a web server or additional software or plugins, and it does not require synchronization between the video streaming server and the client. Video quality is not degraded by network impairments (e.g., high delay and packet loss) because of TCP's reliable service. However, client-side rebuffering can still occur and cannot be handled by TCP, and therefore it can affect the user experience. This is why we need better bitrate adaptation or an equivalent solution to enhance the streaming quality. The transcoded video segments are consistently delivered to the client as a single, continuous stream, and the client plays them in real-time. Note that we only adopt the HLS and DASH formats for playing video, without their bitrate adaptation algorithms, since TransPi only provides a single stream at a time.

5 EVALUATION

5.1 Controlled Setting
Bitrate. To show how accurately TransPi adapts the transcoding bitrates to network changes, we present the video bitrates of the transcoded output and the network bandwidth (grey area) in Fig. 9. We can clearly see that the video bitrate of the transcoded output is very close to the downlink capacity, as TransPi incorporates the client's instantaneous channel feedback to determine the video bitrates for transcoding. We point out that the small variation of the video bitrates of the transcoded output is due to the behavior of the hardware video encoder in the RPi. This video encoder sets a range of video bitrates during the encoding process instead of a single bitrate value. For this reason, even if the measured bandwidth is stable (with small variation) and we set the video bitrates, the video encoder is not able to keep the output video bitrates at a single value. The variation of bitrate across I/P-frames in a group of pictures (GoP) inherently propagates to the bitrate of the transcoded stream. As a result, this introduces small variations in the output bitrates, as shown in Fig. 9. We present the average bitrate and standard deviation of the transcoded stream in Table 3. The average bitrate during each 20-second period (with respect to the network changes) is very close to that of the bandwidth, and the standard deviation remains relatively small. TransPi frequently switches the bitrates; however, this does not break (or interrupt) video playback nor affect the user experience, because the client player seamlessly plays a continuous video stream. Specifically, there are no sudden scene changes or freezes during playback.
Comparison with DASH-BBA. The buffer-based approach (DASH-BBA) [23] is one of the state-of-the-art streaming solutions, and it adapts the bitrate according to buffer occupancy. In order to compare the performance of TransPi with DASH-BBA, we have conducted experiments with the same network configuration. The selected bitrate and instantaneous buffer occupancy of DASH-BBA are shown in Fig. 10. Similar to HLS and DASH-dynamic, we confirm that DASH-BBA underutilizes the available bandwidth. For instance, during the first 20 seconds, DASH-BBA could switch to higher bitrates


Fig. 10. DASH-BBA frequently switches bitrate and underutilizes the network capacity.

Fig. 11. The transcoder keeps the size of the output queue at 400 msec to minimize the delay. The client's input buffer remains stable (30-50 Kbytes).
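The 400 msec output-queue policy shown in Fig. 11 can be sketched as follows. This is a hypothetical illustration: only the 400 msec target and the KB equivalents come from the text, while the class itself is invented. Occupancy is tracked in milliseconds of media, so the same 400 msec maps to different byte counts as the bitrate switches:

```python
# Sketch of the transcoder's output-queue bookkeeping (hypothetical
# class; the 400 msec target is the paper's empirical setting).

class OutputQueue:
    TARGET_MS = 400  # empirically chosen output-queue size

    def __init__(self):
        self.buffered_ms = 0

    def push(self, segment_ms):
        # Transcoded media enters the queue.
        self.buffered_ms += segment_ms

    def drain(self, elapsed_ms):
        # Media leaves toward the client in real-time.
        self.buffered_ms = max(0, self.buffered_ms - elapsed_ms)

    def bytes_at(self, bitrate_bps):
        # Byte equivalent of the current occupancy at a given bitrate.
        return int(self.buffered_ms / 1000 * bitrate_bps / 8)
```

At roughly 1.5 Mbps, 400 msec of media corresponds to 75 KB and 300 msec to about 56 KB, consistent with the 57 KB and 75 KB values in the text.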

(e.g., 12, 15, 17 Mbps); however, it selects lower bitrates (e.g., 8, 12 Mbps). After 20 seconds, the DASH-BBA buffer continues to decrease, and the bitrate abruptly drops to 200 Kbps (from 12 Mbps) in order to refill the buffer. We can also see that DASH-BBA frequently switches bitrate, whereas TransPi has a relatively stable bitrate. Frequent bitrate switches could negatively impact user satisfaction, which could reduce user engagement.
Bandwidth utilization and throughput. We evaluate the bandwidth utilization of TransPi and compare it to that of HLS and DASH-dynamic. TransPi fully utilizes the available capacity, and hence its bandwidth utilization is 99.2%. We can tell from Fig. 9 that TransPi's output video bitrate is very close to the downlink capacity. Compared to HLS-edge and DASH-dynamic, TransPi improves bandwidth utilization by 78% and 67%, respectively, while providing a much higher bitrate than the other schemes.
Table 4 summarizes the average throughput and bandwidth utilization of TransPi and DASH-BBA obtained from multiple runs. We confirm that TransPi outperforms DASH-BBA in terms of bitrate, throughput and bandwidth utilization. DASH-BBA accurately estimates the network capacity and adapts the bitrate accordingly; however, it still does not fully utilize the available bandwidth due to the limited number of encoded bitrates. Moreover, DASH-BBA unnecessarily selects the lowest bitrate to keep the buffer level high enough. The high bitrate of TransPi can lead to high video quality, assuming the same compression ratio of the video encoder.

Scheme      Avg. throughput   Bandwidth utilization
DASH-BBA    6.15 Mbps         53.9 %
TransPi     11.31 Mbps        99.2 %
TABLE 4. TransPi's throughput and bandwidth utilization are almost twice those of DASH-BBA.

Buffer occupancy. The size of the transcoder output buffer eventually introduces some delay to real-time streaming while playing a video on the client side. We maintain the output queue of the transcoder at 400 msec,⁴ which is not only enough for accommodating sudden changes in the wireless network but also sufficient for providing real-time streaming service to the clients with minimal delay (eventually TransPi adds a 400 msec delay). In Fig. 11, we plot both the transcoder's output queue and the client's input buffer during a 120-second playback (while controlling the bandwidth in the same manner). This shows that TransPi constantly keeps the output buffer level close to 400 msec, while the client's buffer occupancy is only slightly affected by the network changes. There are small variations (50-100 msec) in the transcoder's output buffer size; this is mainly due to the variation of bitrates across the GoP. We present the transcoder's queue size in seconds (in Fig. 11) to highlight the minimal delay caused by the transcoder (i.e., 400 msec); however, it can also be represented in KB of buffer size. For instance, 300 and 400 msec correspond to 57 KB and 75 KB, respectively. The transcoder's queue size is maintained at 67 KB on average during the 120-second playback, which is slightly higher than the client's buffer size.
The buffer size of the client's player also remains stable (at between 30-50 KB) because the incoming data rate of the video segments equals the playback bitrate. The player can also slightly adjust the playing speed according to the buffer size without causing any difference noticeable to the human eye; e.g., it increases (decreases) the speed when the buffer level is higher (lower) than the threshold. We can observe that bandwidth changes do not affect the client's buffer level. The client's buffer neither dries out nor overflows, because TransPi switches the video bitrates according to the downlink speed. Owing to this, we do not observe either stalls or rebuffering during playback. The buffer size of the transcoder is related to the periodicity of the bitrate decisions. TransPi makes a bitrate decision once every second, incorporating the client's channel information. One of the reasons for keeping a large buffer in current state-of-the-art video players is to prevent stalls and rebuffering due to network

4. We empirically set the output queue to 400 msec; a lower threshold could cause stalls and a higher value could introduce a longer delay; 400 msec strikes a good balance.


Fig. 12. TransPi adapts to higher bitrates whenever DASH-BBA or DASH-dynamic select lower bitrates.

Fig. 13. CDF of end-to-end delay.

changes; however, this is not necessary in TransPi because our channel-aware bitrate decision accounts for unexpected network variations immediately. Moreover, a larger buffer would increase the delay of the real-time streaming service; thus we keep the transcoder output queue low in TransPi.

5.2 Uncontrolled (Realistic) Setting
We evaluate the performance of TransPi in realistic scenarios where the network bandwidth is not controlled and is shared with other solutions. We created a local network that consists of a media server and three clients: TransPi, DASH-BBA and DASH-dynamic. Both DASH clients connect to the media server and pull the video segments according to their bitrate adaptation algorithm, whereas TransPi adapts the video bitrates on-the-fly. To overcome the wireless channel variations, we performed multiple experiments with the same configurations.
Bitrate. We present the selected bitrates for each scheme in Fig. 12. Similar to the controlled experiments, DASH-BBA frequently switches bitrate, and DASH-dynamic aggressively selects high bitrates (e.g., 20 Mbps at 16 and 100 seconds); this decision leads to buffer depletion and video stalls. As a result, DASH-dynamic is halted for re-buffering twice, for 5.6 and 4.3 seconds (during 150 seconds of playback). DASH-BBA and DASH-dynamic use both buffer information and network capacity to determine the bitrate, but the buffer has a higher priority than the bandwidth (sometimes the bandwidth may not be considered at all). Note that TransPi adapts to higher bitrates whenever DASH-BBA or DASH-dynamic select lower bitrates due to rebuffering. For example, TransPi selects ∼18 Mbps at 30 seconds and ∼22 Mbps at 105 seconds. We confirm that TransPi takes full advantage of the available bandwidth. Overall, we can see that the bitrate of TransPi is higher than the others.
Throughput. We repeated the competing scenario multiple times and measured the throughput of each scheme. The average throughputs of TransPi, DASH-BBA and DASH-dynamic are 19.9, 9.4 and 8.8 Mbps, respectively. TransPi utilizes 52% of the network capacity when multiple streaming services share bandwidth. We have omitted the buffer occupancy for each scheme because the results are very similar to those of the controlled experiments.
End-to-end delay. The performance of the streaming service is affected by the end-to-end delay. To quantify the end-to-end delay, we measured the RTT (the time difference between when the client sent the request and when the requested segment was downloaded) for each video segment. We present the CDF of the end-to-end delay for TransPi, DASH-dynamic and DASH-BBA in Fig. 13. The median delays for TransPi, DASH-BBA and DASH-dynamic are 216, 280 and 393 msec, respectively. We can see that TransPi has a small delay and less delay variation than the others. In summary, TransPi reduces the buffer size and delay while keeping the bitrates close to the available bandwidth.
Trace-driven emulation. We created a trace-driven emulator to evaluate the performance of TransPi in reproducible and fair scenarios. The emulation testbed consists of a media server, a network emulator and mobile clients, and is similar to that of Cellsim [24]. We use publicly available wireless traces: a 4G LTE dataset [25] and the FCC broadband dataset [26]. The 4G LTE dataset contains various mobility traces, such as car, tram, train and pedestrian. The FCC collects raw data that are periodically measured by ISPs in the United States. Our network emulator takes the traces and configures the network parameters accordingly (e.g., downlink bandwidth, RSSI, delay, etc.). The only difference with Cellsim is that our emulator configures the downlink network rather than the uplink network, because we consider the performance of the clients. For a fair comparison, we conducted the experiments one by one under the same network configurations while using various types of traces. Table 5 summarizes the results: average bitrate, end-to-end delay and re-buffering. The average bitrate for DASH-dynamic is higher than that of TransPi because DASH-dynamic aggressively selected bitrates higher than the network bandwidth. In consequence, DASH-dynamic experienced multiple stalls, resulting in rebuffering, while TransPi did not stall or rebuffer. These results clearly corroborate our findings in the controlled setting, thereby demonstrating TransPi's superiority in realistic scenarios as well.
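The trace replay step can be sketched as follows. This is a hypothetical helper, not the actual emulator: the real testbed follows Cellsim but shapes the downlink and also configures RSSI and delay, while this sketch only replays bandwidth rows against a caller-supplied shaper callback:

```python
# Minimal sketch of trace-driven emulation (hypothetical helper). Each
# trace row carries a timestamped downlink bandwidth, replayed in order
# against the bottleneck link.

def replay_trace(rows, apply_rate):
    """rows: iterable of (t_sec, bandwidth_kbps) pairs, sorted by time.
    apply_rate: callback that reconfigures the emulated downlink,
    e.g., by driving a tc/netem-style shaper."""
    applied = []
    for t_sec, kbps in rows:
        apply_rate(kbps)
        applied.append((t_sec, kbps))
    return applied

trace = [(0.0, 17000), (20.0, 12000), (40.0, 7000)]  # made-up rows
log = []
replay_trace(trace, log.append)
```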


Scheme                   DASH-BBA     DASH-dynamic   TransPi
Bitrate (Mbps)           LTE: 5.34    LTE: 7.43      LTE: 6.57
                         FCC: 3.99    FCC: 5.47      FCC: 5.15
Delay (msec)             LTE: 117.2   LTE: 137.3     LTE: 79.3
                         FCC: 146.3   FCC: 170.6     FCC: 87.6
Number of Stalls         LTE: 0.6     LTE: 3.45      LTE: 0
                         FCC: 0       FCC: 0.9       FCC: 0
Rebuffering Time (sec)   LTE: 6.38    LTE: 12.15     LTE: 0
                         FCC: 0       FCC: 2.36      FCC: 0

TABLE 5
TransPi outperforms DASH-BBA and DASH-dynamic in terms of average bitrate, delay and rebuffering.
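Stall counts and rebuffering times like those in Table 5 can be derived from a simple playback-buffer model. The sketch below is our own toy simplification (the function name, the 1-second step and the startup buffer are assumptions, not the paper's methodology): the buffer fills at the ratio of download bandwidth to video bitrate and drains in real time while playing.

```python
def simulate_rebuffering(bw_trace_mbps, bitrate_mbps, startup_buf_sec=2.0):
    """Count stalls and total rebuffering time for fixed-bitrate playback
    over a per-second bandwidth trace (toy model, 1-second steps)."""
    buf = startup_buf_sec        # seconds of video currently buffered
    stalls, rebuf_time = 0, 0.0
    stalled = False
    for bw in bw_trace_mbps:
        buf += bw / bitrate_mbps  # seconds of video downloaded this second
        if not stalled:
            buf -= 1.0            # one second of playback consumed
        if buf <= 0.0:
            buf = 0.0
            if not stalled:
                stalls += 1       # a new stall event begins
            stalled = True
            rebuf_time += 1.0     # this whole second is spent rebuffering
        elif stalled and buf >= 1.0:
            stalled = False       # resume once a second of video is buffered
    return stalls, rebuf_time
```

With such a model, a stream whose bitrate matches the available bandwidth never stalls, while an over-aggressive bitrate choice (as with DASH-dynamic above) accumulates stall events and rebuffering time.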
5.3 User Quality of Experience (QoE)
Next, we assess the performance of TransPi through subjective user studies. Video quality is an inherently subjective quantity. Quality of Service (QoS) is expressed by several network parameters (bandwidth, jitter, packet loss, delay, etc.); however, good QoS does not always guarantee a good user experience. For instance, even if the provided video quality is high, frequent stalls and long buffering times can lead to less user engagement and low satisfaction, because several metrics are interrelated. This eventually lowers the quality perceived by the users. Quality of Experience (QoE) has emerged to address this issue and expresses the user's satisfaction with a service. QoE is affected by both the users and the network system. The Mean Opinion Score (MOS) is the most widely used metric; its scale is the 5-point Absolute Category Rating (ACR) scale: 5-Excellent, 4-Good, 3-Fair, 2-Poor and 1-Bad [27].

We conducted subjective tests with 15 users in a single-stimulus manner (without reference) [27]. We used 6 video sequences and asked the participants to rate each video sequence individually on the ACR scale. In addition, participants indicated their preferences among HLS, DASH-dynamic and TransPi (without knowing which scheme was which). Table 6 shows the average and standard deviation of MOS for each scheme along with the users' preferences (note that we used 90 video sequences in total). We can clearly see that TransPi has higher QoE than the ABR schemes, and people prefer TransPi over HLS or DASH-dynamic. The main reason TransPi achieves higher QoE is that it provides seamless streaming service, whereas the two ABR schemes suffer from rebuffering, and the resulting stalls bother the participants and hurt their engagement. It is well known that rebuffering has a significant impact on user QoE. For instance, the authors in [28] have shown that the time spent rebuffering and the frequency of rebuffering events can significantly reduce QoE. In addition, video streaming service providers prefer to lower the quality of the delivered video rather than cause an interruption in its playback. Our QoE measurement results clearly corroborate this inference. Fig. 14 represents the distribution of MOS collected from the same set of experiments. It can be seen that TransPi received a large number of "Excellent" ratings, and the MOS difference is significant. We conclude that higher bitrate and less rebuffering result in higher QoE.

Fig. 14. The distribution of MOS. Participants clearly indicate that TransPi provides higher QoE.

Scheme                HLS     DASH-dynamic   TransPi
Avg. MOS              3.36    3.52           4.24
Std. MOS              0.83    0.88           0.65
User preference (%)   11.1    18.9           70.0

TABLE 6
TransPi is superior to others in terms of average MOS. Participants prefer TransPi over ABR schemes.

Several previous studies [14], [29] have shown that user engagement and QoE depend on many factors: video bitrate, rebuffering time, startup delay and bitrate switches. Our experimental results (both objective and subjective tests) confirm the above-mentioned inferences: TransPi provides higher bitrate and no rebuffering or stalls, and therefore its user QoE is much higher than the others'. The main purpose of utilizing video transcoding in this work is to provide better streaming service to the user when the wireless network cannot deliver the original video data on time. TransPi can compensate for some loss of video quality by eliminating stalls, buffering and startup time, thereby improving the user experience (note that the quality of the original video is clearly higher than that of the transcoded one, because transcoding downgrades the video in terms of bitrate, resolution, etc.).

5.4 Fairness and Providing Differentiated Service
Fairness. Significant unfairness between clients has been observed for bitrate adaptive streaming solutions when multiple clients share the network bandwidth [30]. Providing bandwidth fairness is important to guarantee QoS in multi-client deployments. TransPi determines the bitrates for transcoding based on each client's downlink capacity; the selected bitrates thus allocate the bandwidth among the clients in the network. This eventually ensures fair sharing of bandwidth across multiple clients. To evaluate the performance of TransPi under a competing scenario, we repeat similar experiments with two clients. In this setup, each client receives transcoded video streaming from a dedicated transcoder (RPi) and shares the downlink bandwidth while associating with the same AP. Each transcoder incorporates its client's network bandwidth to decide the output bitrates. To evaluate fairness under multi-client deployment, we first set the downlink bandwidth in a sym-


metric manner. Fig. 15 shows the selected bitrates of four clients while applying bandwidth changes configured to 40, 26, 15 and 40 Mbps for 25 seconds each (the shared bandwidth for each client is 10, 6.5, 3.5 and 10 Mbps during each time interval). We can clearly see that the selected bitrates for each client remain the same; the four clients equally share the available bandwidth (1/n, where n is the number of clients in the network). This ratio of bitrates is maintained even as the network's bandwidth changes. The sum of the selected bitrates equals the network capacity, so TransPi utilizes bandwidth efficiently (utilization is around 99%). In this experiment, we set the time interval to 25 seconds to characterize the network dynamics; however, even smaller intervals (e.g., 1 second) are feasible, because TransPi configures the transcoding bitrate every second. We omit that result for the sake of brevity.

Fig. 15. Regardless of the bandwidth changes in the network, the transcoding bitrates for each client remain the same. Four clients equally share the bandwidth.

Providing differentiated service. TransPi instantaneously determines the video bitrates for transcoding. Based on this characteristic, we can easily configure the downlink bandwidth for each client when multiple clients share the network. We can provide differentiated service (i.e., allocate different bandwidth) to each client by configuring a different bitrate for transcoding. In this experiment, we control the downlink bandwidth in an asymmetric manner: client 1 receives twice the downlink bandwidth of client 2. Similar to the fair-sharing experiment, we configure the network bandwidth to 15, 10, 5 and 10 Mbps for 30 seconds each. As shown in Fig. 16, the ratio of the selected bitrates for the two clients stays at 2:1, as configured, and the ratio remains the same while the network's bandwidth changes. In this set of experiments we have confirmed that both fair sharing and controlled (unequal) sharing are feasible in TransPi by adapting the bitrates for the clients in the network.

Fig. 16. Client1 receives twice as much downlink bandwidth as client2.

5.5 Energy Consumption
Compared with a dedicated video transcoding server, the RPi is a cost-effective device that is capable of executing video transcoding tasks in real time without degrading performance or video quality. To monitor and evaluate the energy efficiency of TransPi, we measured the power consumption of both the RPi and a dedicated transcoding server (Intel i7-6700 CPU with 16GB RAM). We executed the same video transcoding task (1080p HD) on both platforms while logging the instantaneous power consumption (running the same video transcoding solution on both machines). Fig. 17 shows one typical power measurement (we conducted multiple runs and obtained the same results). The desktop transcoding server consumes more than 35 times the power of the RPi while running the transcoding task. The Intel-based server finishes the same transcoding task much faster (6.1×) than the RPi because of its more powerful resources. Note that although video transcoding on the RPi takes much longer than on the Intel-based server, the RPi is still sufficient to transcode live streams without incurring additional delays. The areas under the curves in Fig. 17 (excluding idle states) represent the total energy consumption of each transcoder implementation. Table 7 summarizes the measurements for both platforms. TransPi is more energy efficient (i.e., 5.76× less energy) and much cheaper than transcoders built with general-purpose processors.

                     Intel server (i7)   TransPi (RPi)
Average power        92.94 W             2.65 W
Duration             85 sec              519 sec
Energy consumption   2.19 Wh             0.38 Wh

TABLE 7
Energy consumption comparison.

6 BROADCAST LIVE STREAMING VIDEO
We introduced the design and implementation of TransPi, which addresses the problem of channel variation for downlink video streaming. In this section, we present a broadcast application that uses video transcoding for real-time streaming uploads. Broadcasting live events (e.g., sports games, breaking news) is challenging because the wireless channel is highly variable and the uplink bandwidth is relatively small. Toward this, we apply


Fig. 17. TransPi significantly reduces power consumption compared to the dedicated transcoding server. (a) Desktop transcoding server (Intel i7 CPU); (b) TransPi (RPi).
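The totals in Table 7 follow directly from average power times duration; converting to watt-hours reproduces the tabulated values (a quick arithmetic cross-check; the helper name is ours):

```python
def energy_wh(avg_power_w, duration_sec):
    """Energy in watt-hours: average power (W) times duration (s) / 3600."""
    return avg_power_w * duration_sec / 3600.0

server_wh = energy_wh(92.94, 85)   # desktop transcoding server (Intel i7)
rpi_wh = energy_wh(2.65, 519)      # TransPi on the Raspberry Pi

# ~2.19 Wh vs ~0.38 Wh: the RPi uses far less energy despite running
# roughly 6.1x longer, consistent with the ~5.76x savings reported.
```

This also illustrates why the per-task energy gap (∼5.76×) is much smaller than the instantaneous power gap (∼35×): the RPi's lower power draw is partly offset by its longer runtime.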

Fig. 18. Drone broadcasts live aerial streams to multiple users while adapting video bitrates to minimize the delay.

Fig. 19. The client maintains a stable buffer level.

our video transcoding solution to the uplink case for broadcasting live streams. Next, we describe an implementation of TransPi on a drone for streaming real-time aerial images to the user.

6.1 Real-time Broadcasting using TransPi
Aerial vehicles such as aircraft, MedFlight helicopters, and delivery drones need connectivity for communication and data transmission. They rely on satellite or cellular networks; however, such networks tend to have considerable variability due to fading and interference, especially in outdoor wireless environments. Given that ensuring 100% guaranteed connectivity over a wireless medium is extremely difficult to achieve, it may be useful to provide reliable services in the application domain. We have explored a new application domain, aerial vehicles, with channel-aware video encoding for broadcasting a live streaming service. Broadcast live streaming has become a popular application in recent years due to the ease of access and the availability of numerous tools, cheap devices and network infrastructure (e.g., cloud). Given the popularity of mobile streaming services, there is continued growth in demand for high bandwidth in wireless mobile networks. The goal of this application is to provide live-streaming aerial coverage to multiple users while adapting to the varying wireless channel.

Our system consists of three components: a drone, an LTE cellular base-station, and a web server that hosts segments of the video stream, as depicted in Fig. 18. We used a DJI F550 [31] (a basic, user-configurable drone without a camera or wireless modules) as our aerial vehicle. Our drone is equipped with a Raspberry Pi for video encoding, an RPi camera module, and an LTE USB adaptor for wireless connectivity during flight. The drone broadcasts live aerial video taken from the RPi camera to various users in real time. In this configuration, the uplink bandwidth (i.e., drone to the media server via LTE) is considerably smaller than the downlink capacity (i.e., media server to the end users) and is a bottleneck. This bandwidth limitation hinders uploading the live stream in real time, and hence incurs delay for the end users. In the worst case, users may experience frequent stalls and buffering. Providing a seamless, real-time streaming service while avoiding stalls or freezes therefore requires optimizing the data upload. To circumvent the bandwidth limitation, unstable connection and variability of the wireless channel, we apply our video transcoding solution when uploading the recorded aerial video, as depicted in Fig. 18.

First, the transcoder in the RPi takes the video stream from the camera module and encodes it using the GStreamer framework as described in section 4.2. We have implemented a similar GStreamer pipeline to handle the video streams. The RPi on the drone instantaneously determines the video bitrates according


                 Youtube   MLB     Mixer   TransPi
Average (msec)   8.49      5.22    1.56    3.05
Median (msec)    0.001     0.001   0.001   0.001
75% (msec)       0.002     0.34    0.87    0.21
85% (msec)       1.20      7.44    3.04    5.12
95% (msec)       3.55      23.14   10.32   15.79
Max. (msec)      4625      889     36      87

TABLE 8
Summary of inter-packet gap.

to the connected cellular bandwidth, and encodes the video stream. It is challenging to provide real-time streaming service to the user without carefully transcoding the video to fit the wireless uplink capacity, because delayed uploading causes frequent stalls for buffering on the client player. Thus, selecting an appropriate video bitrate is critical for meeting the requirements of a real-time application. Note that the duration of each transcoded segment is one second, so our transcoding solution is able to cope with the channel variation. The transcoded outputs are then uploaded to the media server using the FTL protocol [32] to minimize the latency. The end users simply access the media server and play the real-time video taken by the drone's camera. Since our transcoding solution supports live transcoding in real time, the uploaded video stream is delayed by only a few seconds, so it can provide a live streaming service to the end user.

6.2 Evaluation
To evaluate the performance of TransPi applied to broadcasting live streaming video, we measured the inter-packet gap of the received video traffic. For comparison, we collected packet traces of various live streaming services, Youtube, MLB [33] (streaming live baseball games) and Mixer [34] (Microsoft's live streaming service), under the same network conditions. Unlike TransPi, the video bitrates available for these live streaming services are limited (e.g., 240p, 360p, 480p, 720p, 1080p, etc.).

Inter-packet Delay. Table 8 summarizes statistics of the inter-packet delay. We found that the average inter-packet delay for Youtube is the highest of the four, even though 95% of its inter-packet delays are less than 4 msec. This is because Youtube live streaming has a long latency, e.g., 30 sec or even higher (2 min); the traffic is already stored on the media server before being delivered to the client. This makes the client download Youtube live streams in a bursty manner: the client downloads for approximately 500 msec, then pauses for approximately 4 sec, and repeats. Such long latency and large inter-packet delays are not adequate for live streaming. In contrast, Mixer has both the lowest average and maximum delay, and is therefore able to provide sub-second latency. TransPi's inter-packet gap is close to that of Mixer, and the latency of TransPi is around 2 sec. We can confirm that TransPi provides bitrate adaptation without incurring high latency compared to other live streaming services.

Buffer. Fig. 19 presents the client buffer size of TransPi. We can see that TransPi maintains a stable buffer level of 200 to 300 msec except for the first 10 msec (the buffer size of Mixer is around 40-50 msec). Moreover, we did not experience any stall or buffer depletion during 10 min of playback. TransPi is able to provide stable and adaptive services for live event broadcasting with minimal delay.

7 RELATED WORK AND DISCUSSION
Adaptive streaming. Adaptive bitrate streaming (ABR) is a technique for streaming video over HTTP in which multiple versions of the source video are pre-encoded at different bitrates [8]. Several measurement studies [12], [35] presented the inefficient performance of ABR players, e.g., Smooth Streaming, Netflix, and Adobe HTTP Dynamic Streaming. They show that none of these players is good enough; they are either too conservative or too aggressive, and hence do not efficiently adapt to network variations. Moreover, all of the above players have relatively long response times under network congestion. Several works [13], [30] analyze bitrate adaptation algorithms with respect to efficiency, fairness and stability. Inaccurate bandwidth measurements, caused by the temporal overlap of downloading chunks, lead to under/over-estimation of the underlying bandwidth and result in incorrect bitrate selections. The authors in [30] provide guidelines on designing better scheduling and bitrate selection logic in the client player; however, such a design requires significant modifications to the client and hence hinders wide deployment. Given the many performance issues of widely used adaptive streaming solutions, the work in [36] proposed two rate adaptation algorithms based on serial and parallel fetching methods for DASH. The main difference from the above-mentioned works is that our system changes the media source itself through transcoding instead of devising better adaptation algorithms.

Transcoding at the edge. The ideas and principles of mobile edge computing (MEC) have been widely adopted in many applications because of its multi-fold advantages: reduced latency, easy access to the client's feedback, etc. The authors in [37] propose a transcoding system consisting of a client proxy (collecting the client's profile) and a video proxy (running transcoding near the video server) in the 3G network. [38] presents a transcoding gateway and its application to support heterogeneous multi-client environments in wireless networks. The authors in [39] propose an approach of edge-assisted, on-the-fly transcoding to improve QoE. The above-mentioned works leverage the advantages of MEC for video transcoding applications, taking into account user and network conditions. However, these transcoding solutions transcode the source video into pre-fixed bitrates that cannot be adapted to time-varying channels. TransPi, on the other hand, continuously adjusts the video bitrate according to the client's state and provides fine-grained bitrate adaptation even within a single video stream.

Wang et al. [40] propose an adaptive video transcoding framework based on edge computing. Both [39] and [40] execute the video transcoding at the edge, but their goal is different from TransPi's: they reduce the volume of transcoded streams traversing the core network before being delivered to end viewers, whereas TransPi improves the user experience and video quality while transcoding at the wireless edge. Traditional video transcoding schemes transcode videos into all possible bitrates in an offline manner. In contrast,


the authors in [41] proposed online transcoding policies that transcode video segments in a just-in-time fashion when bitrates are actually requested by the user. They focus on theoretical analysis of a predictive model to reduce the transcoding workload, and their evaluation is therefore limited to simulation. Similarly, TransPi performs video transcoding on the fly. Unlike [41], TransPi is implemented on a COTS device and demonstrates its efficacy and user satisfaction in real-world scenarios.

Albanese et al. [42] propose a video transcoding unit application that leverages the MEC architecture. They demonstrated the superiority of GPU-accelerated video transcoding over a software-only implementation, but they simply adopted COTS GPU hardware without further modification. The main difference is that we design and optimize the architecture of the GStreamer pipeline to improve transcoding performance on a COTS device and enhance the user experience. Our implementation is publicly available and easily adaptable to many applications that require real-time streaming.

In summary, TransPi has several unique features compared to other works: (i) TransPi employs a software and hardware coupled architecture on a cost-effective and powerful COTS device to improve video streaming quality and user satisfaction. (ii) Our transcoding solution can be used for both downlink and uplink video streams. It provides real-time transcoding service, and hence can handle live streaming with minimal delay. (iii) Unlike other transcoding systems, TransPi provides each user with a dedicated stream, instantaneously taking the user's profile into account to determine the transcoding bitrates. Thus, it quickly adapts to the user's network condition, as opposed to other adaptation solutions that only serve a limited choice of pre-processed bitrates. (iv) It provides seamless bitrate adaptation while transcoding video at fine time granularity, which ensures end users are not interrupted during playback. (v) TransPi does not require preprocessing or additional storage because it runs on the fly.

Deployment. Video transcoding is a computationally intensive task for a general-purpose processor, and hardware transcoding clearly outperforms software-based solutions. For instance, Hameed et al. [43] have shown that an ASIC implementation of an H.264 encoder can improve power efficiency up to 500 times compared to a software implementation running on a general-purpose processor. Similarly, we leverage the GPU in the RPi for our video transcoding solution; however, a single RPi can only transcode 4 HD streams simultaneously. Given the large number of video streams requested by multiple users, it is necessary to have a system that can handle many transcoding tasks at the same time. Toward this, we build an RPi-center (e.g., a rack of RPis) and a transcoding manager that schedules multiple RPis for transcoding. We parallelize the transcoding processes to serve multiple requests and simultaneously support hundreds of users. The RPi is cost-effective and powerful hardware, and hence an RPi-center is still much cheaper than maintaining dedicated, massive transcoding servers. With these low-cost and performance efficiencies, our system can be easily deployed in wireless networks. In addition, we are working to integrate TransPi into a GPU-embedded WiFi access point to accelerate the system deployment.

8 CONCLUSION
We present the design, implementation and evaluation of a real-time video transcoding solution in wireless networks. Our proposed system provides quick bitrate adaptation under unstable wireless conditions and efficiently utilizes available bandwidth. We show the feasibility of a real-time video transcoding solution on a cheap single-board computer, the RPi, and demonstrate its superiority over bitrate adaptive streaming approaches. TransPi is capable of transcoding live streams, on the fly, with minimal delay. We plan to improve its scalability in a practical setting (i.e., supporting hundreds of clients simultaneously and incorporating 100+ IPTV channels).

ACKNOWLEDGMENT
This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2018R1C1B6006436).

REFERENCES
[1] Cisco Visual Networking Index, "Global Mobile Data Traffic Forecast Update 2012-2017", http://www.cisco.com/.
[2] T. Stockhammer, "Dynamic Adaptive Streaming over HTTP: Standards and Design Principles", In Proceedings of ACM MMSys, 2011.
[3] H. Schwarz, D. Marpe and T. Wiegand, "Overview of the Scalable Video Coding Extension of the H.264/AVC Standard", In IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 9, pp. 1103-1120, Sep. 2007.
[4] Z. Li, Y. Huang, G. Liu, F. Wang and Z. Zhang. "Cloud Transcoder: Bridging the Format and Resolution Gap between Internet Videos and Mobile Devices", In Proceedings of ACM NOSSDAV, 2012.
[5] Z. Huang, C. Mei, L. E. Li and T. Woo. "CloudStream: Delivering High-quality Streaming Videos through a Cloud-based SVC Proxy", In Proceedings of IEEE Infocom, 2011.
[6] Raspberry Pi, http://www.raspberrypi.org/.
[7] HTTP Live Streaming (HLS), "IETF Internet-Drafts 2014", http://tools.ietf.org/html/.
[8] ISO/IEC 23009-1, "MPEG Dynamic Adaptive Streaming over HTTP (DASH)", http://dashif.org/mpeg-dash/.
[9] M. Satyanarayanan, P. Bahl, R. Caceres and N. Davies. "The Case for VM-Based Cloudlets in Mobile Computing", In IEEE Pervasive Computing, vol. 8, no. 4, pp. 14-23, Oct. 2009.
[10] DASH client 2.7.0, https://github.com/Dash-Industry-Forum/dash.js/
[11] DASH test stream, https://dash.akamaized.net/akamai/bbb_30fps/bbb_30fps.mpd
[12] S. Akhshabi, A. C. Begen and C. Dovrolis. "An Experimental Evaluation of Rate-Adaptation Algorithms in Adaptive Streaming over HTTP", In Proceedings of ACM MMSys, 2011.
[13] S. Akhshabi, L. Anantakrishnan, C. Dovrolis and A. C. Begen. "What Happens when HTTP Adaptive Streaming Players Compete for Bandwidth?", In Proceedings of ACM NOSSDAV, 2012.
[14] F. Dobrian, V. Sekar, A. Awan, I. Stoica, D. A. Joseph, A. Ganjam, J. Zhan and H. Zhang. "Understanding the Impact of Video Quality on User Engagement", In Proceedings of ACM Sigcomm, 2011.
[15] Digital Academic Television Network, https://it.wisc.edu/services/datn/
[16] OpenMAX, https://www.khronos.org/openmax/
[17] GPU CUDA acceleration, http://www.bdlot.com/resource/gpu-acceleration.htm/
[18] Node.js, https://nodejs.org/
[19] MQTT.js, https://github.com/mqttjs/
[20] GStreamer multimedia framework, http://gstreamer.freedesktop.org/
[21] OpenMAX IL wrapper plugin, https://github.com/pliu6/gst-omx/
[22] TransPi implementation on Raspberry Pi, https://github.com/mobile-systems-lab/buildroot/


[23] T. Y. Huang, R. Johari, N. McKeown, M. Trunnell and M. Watson. "A Buffer-Based Approach to Rate Adaptation: Evidence from a Large Video Streaming Service", In Proceedings of ACM Sigcomm, 2014.
[24] K. Winstein, A. Sivaraman and H. Balakrishnan. "Stochastic Forecasts Achieve High Throughput and Low Delay over Cellular Networks", In Proceedings of USENIX NSDI, 2013.
[25] D. Raca, J. J. Quinlan, A. H. Zahran and C. J. Sreenan. "Beyond Throughput: a 4G LTE Dataset with Channel and Context Metrics", In Proc. of ACM Multimedia Systems Conference, 2018. Trace is available at https://www.ucc.ie/en/misl/research/datasets/ivid_4g_lte_dataset/
[26] FCC Raw Data - Measuring Broadband America 2016. https://www.fcc.gov/reports-research/reports/measuring-broadband-america/raw-data-measuring-broadband-america-2016/
[27] ITU-T Rec. P.910. "Subjective Video Quality Assessment Methods for Multimedia Applications", https://www.itu.int/rec/T-REC-P.910/en/, 2008.
[28] R. Mok, E. Chan and R. Chang. "Measuring the Quality of Experience of HTTP Video Streaming", In Proc. of IFIP/IEEE International Symposium on Integrated Network Management, 2011.
[29] S. S. Krishnan and R. K. Sitaraman. "Video Stream Quality Impacts Viewer Behavior: Inferring Causality Using Quasi-Experimental Designs", In Proceedings of ACM IMC, 2012.
[30] J. Jiang, V. Sekar and H. Zhang. "Improving Fairness, Efficiency, and Stability in HTTP-based Adaptive Video Streaming with FESTIVE", In Proceedings of ACM CoNEXT, 2012.
[31] DJI F550 Hexacopter drone, https://www.dji.com/flame-wheel-arf/
[32] SDK for Beam's FTL Protocol, https://github.com/mixer/ftl-sdk/
[33] Major League Baseball, http://mlb.tv/
[34] Interactive live streaming platform, http://mixer.com/
[35] C. Muller, S. Lederer and C. Timmerer. "An Evaluation of Dynamic Adaptive Streaming over HTTP in Vehicular Environments", In Proceedings of ACM MoVid, 2012.
[36] C. Liu, L. Bouazizi, M. Hannuksela and M. Gabbouj. "Rate Adaptation for Dynamic Adaptive Streaming over HTTP in Content Distribution Network", In Image Communication, vol. 27, no. 4, pp. 288-311, Apr. 2012.
[37] A. Warabino, S. Ota, D. Morikawa, M. Ohashi, H. Nakamura, H. Iwashita and F. Watanabe. "Video Transcoding Proxy for 3G Wireless Mobile Internet Access", In IEEE Communications Magazine, vol. 38, no. 10, pp. 66-71, Oct. 2000.
[38] Z. Lei and N. D. Georganas. "Video Transcoding Gateway for Wireless Video Access", In Proc. of IEEE CCECE, 2003.
[39] S. Dutta, T. Taleb, P. A. Frangoudis and A. Ksentini. "On-the-Fly QoE-Aware Transcoding in the Mobile Edge", In Proc. of IEEE Globecom, 2016.
[40] D. Wang, Y. Peng, X. Ma, W. Ding, H. Jiang, F. Chen and J. Liu. "Adaptive Wireless Video Streaming based on Edge Computing: Opportunities and Approaches", In IEEE Transactions on Services Computing, Apr. 2018.
[41] D. K. Krishnappa, M. Zink and R. K. Sitaraman. "Optimizing the Video Transcoding Workflow in Content Delivery Networks", In Proc. of ACM MMSys, 2015.
[42] A. Albanese, P. S. Crosta, C. Meani and P. Paglierani. "GPU-accelerated Video Transcoding Unit for Multi-access Edge Computing Scenarios", In Proc. of The Sixteenth International Conference on Networks, 2017.
[43] R. Hameed, W. Qadeer, M. Wachs, O. Azizi, A. Solomatnikov, B. Lee, S. Richardson, C. Kozyrakis and M. Horowitz. "Understanding Sources of Inefficiency in General-Purpose Chips", In ACM SIGARCH Computer Architecture News, vol. 38, no. 3, pp. 37-47, 2010.

Suman Banerjee is a Professor in Computer Sciences at UW-Madison where he is the founding director of the WiNGS laboratory which broadly focuses on research in wireless and mobile networking systems. He received his undergraduate degree from IIT Kanpur in Computer Science and Engineering and was a gold medalist in his graduating class. He received his M.S. and Ph.D. degrees from the University of Maryland and his Ph.D. dissertation was the university's nomination for the ACM Doctoral Dissertation Award. While at Wisconsin, Prof. Banerjee received the CAREER award from the US National Science Foundation and the inaugural Rockstar Award from ACM SIGMOBILE for early career achievements and contributions in his field. Prof. Banerjee has authored more than 100 technical papers in leading journals and conferences in the field, including ACM/IEEE Transactions on Networking, ACM/IEEE Transactions on Mobile Computing, ACM Sigcomm, ACM MobiCom, IEEE Infocom, ACM MobiSys, ACM CoNEXT, ACM IMC, IEEE Dyspan, and more. It also includes various award papers from conferences such as ACM MobiCom, ACM CoNEXT, and IEEE Dyspan. Prof. Banerjee served as the chair of ACM SIGMOBILE between 2013 and 2017.

Jongwon Yoon received the B.S. degree in computer science from Korea University in 2007, and the M.S. and Ph.D. degrees from the University of Wisconsin-Madison in 2012 and 2014, respectively. He was a recipient of the Lawrence Landweber fellowship during his Ph.D. study. He is currently an assistant professor in the Department of Computer Science and Engineering at Hanyang University, Korea. He leads the Mobile Systems laboratory which broadly focuses
on research in wireless communication, mobile
networking, and embedded systems.

1536-1233 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.