An Architecture Methodology For Secure Video Conferencing
Reshma Lal, Pradeep M. Pappachan
Intel Labs, Intel Corporation
Hillsboro, OR USA
Abstract— This paper describes how to enhance the security of VoIP applications using hardware security features on computing platforms such as notebooks, tablets and smartphones. Specifically, we explain how to develop such applications using the protection offered by a processor-based security technology, which allows software developers to maintain control of security by creating trusted domains within applications. Using this processor security technology, sensitive code and data can be hosted without the risk of being observed or modified by malware present in other parts of the system. The resulting VoIP applications would better meet the strong security needs of the corporate and government sectors for real-time digital information sharing. We include results of a research project sponsored by the United States Department of Homeland Security and the United States Air Force Academyⁱ, in which the team studied ways to enhance the security of a video conferencing application and implemented an experimental Video Chat application using security technologies provided by the processor and media processing hardware.

Keywords: Security; VoIP; Video Conferencing; Intel® Software Guard Extensions; Trusted Execution Environment

I. INTRODUCTION

With the widespread availability of high network bandwidth and inexpensive hardware for capturing video and audio on client platforms, video chat, video conferencing and web conferencing applications have become increasingly popular for real-time information sharing. This creates an opportunity for malicious individuals to capture and distribute video conferencing streams without authorization, which could lead to the theft of valuable Intellectual Property (IP) or sensitive information in the enterprise and government sectors. Such intrusions intercept and steal the Audio/Video (AV) streams during a video conferencing session, either as they are transmitted over the network or while they are processed on the computing platform.

While today's secure VoIP solutions protect sensitive content during transmission using proven cryptographic methods such as Transport Layer Security (TLS) and the Secure Real-Time Transport Protocol (SRTP), threats have recently migrated inside the computing platform. From within, malware, including rootkits, can steal keys and data from an application while it is processing confidential content in cleartext. Sandboxing and restricting what is allowed to run on the platform offer some protection, but in the face of increasingly aggressive malware threats these solutions can prove ineffective. The risk of losing valuable corporate IP or sensitive government data is therefore only increasing.

Participant authentication, in which identity is verified by a secure backend server before a call session can be established, is also susceptible to attack. In this case an unauthorized participant can spoof a legitimate participant's identity and join the call session to listen in without detection.

Hardware security technologies that provide a secure execution environment to protect software code and data are becoming available on notebooks, tablets and phones. A new technology, Intel® Software Guard Extensions (Intel® SGX), provides a safe enclave in which software can operate, free from external observation or modification of its code and data [1]. Restructuring security-sensitive software such as VoIP applications to make use of this new feature can enable those applications to meet higher security goals.

We begin this paper by presenting an overview of video conferencing in Section II. In Section III, we present a security analysis of a video conferencing application, describing the assets, adversaries, threat model and limitations of existing solutions. Section IV starts with an overview of the Intel® SGX and Intel® PAVP technologies used to build a secure VoIP solution; this is followed by a description of a secure video conferencing architecture. Next, we describe an implementation of this secure architecture in Section V. The results of the research sponsored by the United States Department of Homeland Security and the United States Air Force Academyⁱ to build an experimental Secure Video Chat (SVC) solution, together with future work, are covered in Section VI. Finally, we present a summary and our conclusions in Section VII.

II. OVERVIEW OF VIDEO CONFERENCING

A video conferencing application allows two or more users to share video and audio information in real time. Examples of commercial applications in this category include Microsoft® Office Lync, Skype, and Cisco WebEx®. In this paper, we discuss the security objectives of this class of applications using the example of a two-party Video Chat application. However, the findings reported in this paper apply to a full-featured, multiparty video conferencing application, as well as to other real-time digital information sharing applications such as Instant Messaging (IM) and virtual Whiteboards.
B. Adversaries
We want to protect the assets from the following adversaries:
… SRTP keys from the application's memory, which can then be used to decrypt the network packets.
2. Malware can directly read the audio and video frames from the application's memory as they are being processed.
3. Recording or screen capture of AV streams during transit between the application and the I/O devices.
4. Static or runtime code modification that alters the flow of control to make the application behave differently than intended; for example, security can be turned off, call policies can be modified, and call log generation can be suppressed.
5. A trusted insider can use a camcorder to record off the screen and share the proprietary or classified information with outside collaborators.
6. Audio and video streams could be modified or replayed.
7. Client authentication can be compromised, allowing unauthorized parties to participate in and observe calls.

D. Limitations of Existing Solutions
Existing solutions for video conferencing address the network vulnerability of audio and video streams by protecting them with appropriate encryption technologies, though care must be taken to balance the strength of encryption with the efficiency of the process to meet the QoS requirements of real-time audio and video. Standard protocols such as SRTP provide secrecy, integrity and replay protection for the secure transmission of real-time audio and video streams. However, the protected audio and video streams must eventually be decrypted on the client platform before they can be rendered on audio and video output devices. This is where the application is susceptible to malware attacks (see Figure 2).

Figure 2: Areas of vulnerability in today's video chat application

Examples of weaknesses in today's solutions include:
1. AV streams inside the application's memory are processed in cleartext, providing privileged malware with the opportunity to steal or modify the content.
2. Keys used in secure protocols such as TLS and SRTP are available in cleartext inside the application's memory during encryption/decryption, providing malware with the opportunity to steal them during use.
3. Using stolen user credentials, an attacker can compromise the authentication system without detection.
4. Communication between the application and I/O devices (e.g., camera, microphone, speakers and display) is not protected against malware.
5. Unauthorized modification of application binaries or tampering with sensitive code at runtime can break the security model and go undetected.

IV. SECURE VIDEO CHAT ARCHITECTURE
In this section we describe our architecture for secure video chat, which addresses many of the weaknesses in today's solutions. We start with a review of the security technologies used in our approach.

A. Intel® SGX Overview
Intel® SGX technology is an extension to Intel® Architecture that provides software applications with a secure environment in which to operate on sensitive data, free from observation by all other software, including privileged software such as the OS and VMM [1]. This enables an application to maintain the secrecy and integrity of its code and data even when the platform on which it runs is infected with malware, including rootkits [3]. Using Intel® SGX, an application can create a safe enclave with the following attributes:
1. Protected Execution: During execution, data and code within the secure enclave cannot be observed or modified by any software outside the enclave.
2. Hardware-based attestation: Intel® SGX can generate hardware-based attestation of the software running inside an enclave. This enables an enclave to present strong proof of its identity and integrity to a remote entity. An enclave can obtain a "quote" from the processor, which includes information such as the enclave's identity, its measurement (a cryptographic hash of its code), hardware version information, and ISV identity. This attestation capability hardens client authentication to a service provider prior to the provisioning of application secrets, such as keys, to the client [2].
3. Secure Storage: Intel® SGX provides a mechanism for an enclave to seal secrets for persistent and secure storage outside the enclave. A sealed secret can only be unsealed inside the same enclave software that sealed it, and only on the same platform on which the enclave was running when the data was sealed.
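The attestation and sealing properties listed above can be made concrete with a small simulation. This is a toy model under stated assumptions, not the Intel® SGX SDK API. The measurement is modeled as a SHA-256 hash of the enclave code. The processor's quote signature is replaced by an HMAC under a hypothetical attestation key; real quotes are signed asymmetrically and checked through a verification service, so a real verifier never holds the signing key. The sealing key is modeled as an HMAC of the measurement under a hypothetical per-platform secret. All names (`make_quote`, `verify_quote`, `sealing_key`, and the secrets) are illustrative.

```python
import hashlib
import hmac
import json

# Toy model only. Real Intel SGX derives sealing keys inside the CPU and
# signs quotes asymmetrically; these hypothetical secrets stand in for that.
PLATFORM_SEAL_SECRET = b"platform-A-root-seal-secret"
ATTESTATION_KEY = b"platform-A-attestation-key"


def measure(enclave_code: bytes) -> str:
    """Enclave measurement, modeled as a SHA-256 hash of the enclave's code."""
    return hashlib.sha256(enclave_code).hexdigest()


def _sign(body: dict) -> str:
    # HMAC is a stand-in for the hardware-backed quote signature. Unlike real
    # SGX, the verifier in this toy must share ATTESTATION_KEY.
    payload = json.dumps(body, sort_keys=True).encode()
    return hmac.new(ATTESTATION_KEY, payload, hashlib.sha256).hexdigest()


def make_quote(enclave_code: bytes, isv_id: str, user_data: dict) -> dict:
    """Bind measurement, ISV identity, hardware version and user data together."""
    body = {
        "measurement": measure(enclave_code),
        "isv_id": isv_id,
        "hw_version": "toy-1",
        "user_data": user_data,  # e.g. user ID and platform ID
    }
    return {**body, "signature": _sign(body)}


def verify_quote(quote: dict, whitelist: set) -> bool:
    """Server side: accept only an unmodified quote from a whitelisted enclave."""
    body = {k: v for k, v in quote.items() if k != "signature"}
    signature_ok = hmac.compare_digest(_sign(body), quote["signature"])
    return signature_ok and quote["measurement"] in whitelist


def sealing_key(platform_secret: bytes, enclave_code: bytes) -> bytes:
    """Sealing key that depends on both the platform and the enclave identity."""
    return hmac.new(platform_secret, measure(enclave_code).encode(),
                    hashlib.sha256).digest()


enclave = b"svc-enclave v1"
tampered = b"svc-enclave v1 + injected hook"
whitelist = {measure(enclave)}

quote = make_quote(enclave, "svc-isv", {"user_id": "alice", "platform_id": "p1"})
forged = {**quote, "user_data": {"user_id": "mallory", "platform_id": "p1"}}
print(verify_quote(quote, whitelist))                                # True
print(verify_quote(make_quote(tampered, "svc-isv", {}), whitelist))  # False: unknown enclave
print(verify_quote(forged, whitelist))                               # False: altered user data

key = sealing_key(PLATFORM_SEAL_SECRET, enclave)
print(key == sealing_key(PLATFORM_SEAL_SECRET, enclave))   # True: same enclave, same platform
print(key == sealing_key(PLATFORM_SEAL_SECRET, tampered))  # False: different enclave
print(key == sealing_key(b"platform-B-secret", enclave))   # False: different platform
```

Because the toy sealing key mixes in both the platform secret and the measurement, a modified enclave or a different machine derives a different key and cannot unseal the data. This mirrors the "same enclave, same platform" rule stated above.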
B. Intel® Protected Audio Video Path (Intel® PAVP)
Intel® integrated graphics and audio hardware on an IA
platform implement Intel® PAVP technology, which is
designed to protect audio and video data in transit from an
application’s memory to the output devices. It accomplishes
this by enabling the application and Intel® PAVP capable
hardware to establish a shared symmetric key that is used to
encrypt audio and video before it leaves the application’s
memory. The encrypted data is secure against snooping as it
goes through the standard OS and driver software stack, and is
securely decrypted in the device’s memory just prior to
rendering.
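The key property of this path is that the OS and driver stack only ever carry ciphertext. A toy model can illustrate it under stated assumptions; it is not the PAVP protocol. The key is assumed to be already shared between the application and the PAVP-capable hardware, and the real key exchange is not modeled. A keystream built from HMAC-SHA256 in counter mode stands in for the hardware cipher, which this paper does not specify. `protect`, `unprotect` and `untrusted_driver_stack` are hypothetical names, and the construction is meant for illustration rather than deployment. Like the text above, the sketch addresses snooping (confidentiality) only.

```python
import hashlib
import hmac
import secrets

# Toy stand-in for the PAVP cipher, which the paper does not specify:
# a keystream from HMAC-SHA256 in counter mode, XORed with the frame.
# Confidentiality only, matching the "secure against snooping" property.
NONCE_LEN = 12


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"),
                        hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def protect(shared_key: bytes, frame: bytes) -> bytes:
    """Application side: encrypt a frame before it leaves application memory."""
    nonce = secrets.token_bytes(NONCE_LEN)   # fresh per frame; never reused
    return nonce + _xor(frame, _keystream(shared_key, nonce, len(frame)))


def unprotect(shared_key: bytes, blob: bytes) -> bytes:
    """Device side: decrypt in device memory just before rendering."""
    nonce, ct = blob[:NONCE_LEN], blob[NONCE_LEN:]
    return _xor(ct, _keystream(shared_key, nonce, len(ct)))


def untrusted_driver_stack(blob: bytes, observed: list) -> bytes:
    """OS and driver path: can copy and forward data, but sees only ciphertext."""
    observed.append(blob)
    return blob


# Assumption: this key was already established between the application and
# the PAVP-capable hardware; the real key exchange is not modeled here.
pavp_key = secrets.token_bytes(32)
frame = b"decoded PCM audio frame"
seen = []
delivered = untrusted_driver_stack(protect(pavp_key, frame), seen)
print(unprotect(pavp_key, delivered) == frame)                # True
print(frame in seen[0])                                       # False: stack never saw plaintext
print(protect(pavp_key, frame) != protect(pavp_key, frame))   # True: fresh nonce each time
```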
C. SVC Architecture
To improve the security of a video chat application, we start
by partitioning the SVC client application into trusted and
untrusted components (see Figure 3). The trusted components that handle sensitive data (e.g., the authentication and provisioning subsystem, the media processing system, and policy enforcement logic) are placed inside an Intel® SGX enclave. The untrusted parts of the application include the interface to the OS (thread management, network I/O) and the user interface; the untrusted components never handle sensitive data in the clear. While we illustrate our approach with a single enclave, the use of multiple enclaves may better suit cases where the trusted components are developed by different vendors and integrated into a solution. In a multi-enclave solution, the enclaves must authenticate each other using Intel® SGX's intra-platform attestation feature, creating trusted channels between them. The trusted client modules in this architecture are:
1. Provisioning: This module generates user and platform attestation data in the form of an enclave "quote" (see Section V) that is used to authenticate the client to the SVC server during an initial provisioning protocol. Other user data, such as the user ID and platform ID, may be included in the quote to bind the client's identity to the hardware-generated attestation data. Upon successful provisioning, authentication secrets are sealed to the enclave for use in subsequent secure communication with the SVC server.
2. Session Management: This module creates and manages secure sessions between the SVC client and server, which may be set up using a cryptographic protocol such as SIGMA [9]; the sealed secret client key is used to prove the client's authenticity.
3. Media Subsystem: This module contains an implementation of the media transport protocol (RTP/SRTP), media sessions, and codecs.
4. Secure I/O: This module manages the creation of secure channels between the enclave and Intel® PAVP-capable devices (audio and video) and performs Intel® PAVP encryption.
5. Policy Engine: This module is responsible for enforcing call policies such as call event logging.

Figure 3: Secure Video Chat Application: High Level Architecture

The key modules of the SVC application server are:
1. Authentication and Provisioning: This module authenticates the client by using a Quote Verification service to verify the attestation data generated by the client, and provisions the client with application secrets.
2. Session Management: This module manages the establishment of a secure session between the client and server after the client has authenticated itself to the server.
3. Location and Directory Services: This module provides a client location (e.g., IP address) service that clients use to locate each other prior to establishing a call session. The directory services enable the server to validate the registered users, and optionally the platforms, during the provisioning protocol.
4. Key Distribution: In our example, the server securely distributes the SRTP key to the call participants after successful authentication. Other secure key distribution methods, such as sending keys over a secure SIP session, may be used instead.

D. SVC Application Component Interactions
The interactions between the various components of the SVC application are shown in Figure 4. The application client engages with the server in a secure provisioning protocol and presents Intel® SGX-based attestation data, resulting in application secrets being provisioned to the client platform. During the call, the client establishes a secure media session with its peer and sends/receives A/V packets protected using SRTP over its network interface. Inside the secure enclave, the SRTP packets are decrypted, decoded and re-encrypted with Intel® PAVP keys (for the output devices). The encrypted media packets are sent to the media processing hardware through the standard audio/video software stack, and they are decrypted inside the hardware using the Intel® PAVP symmetric key before being rendered on a display or speaker device.
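The server modules listed above gate one another, and that gating can be sketched as a small state holder. Provisioning requires a verified quote and a directory entry. A session requires prior provisioning. The SRTP master key is released only when every call participant holds an authenticated session. The class below is a hypothetical model, not the SVC server code: quote verification is reduced to a `quote_ok` flag, public keys are plain strings, and the encrypted delivery of each key copy over the client's secure session is omitted. The 16-byte master key and 14-byte master salt match the default SRTP transform of RFC 3711 [4].

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class SVCServer:
    """Hypothetical model of the SVC server modules; not the authors' code."""
    directory: set                                   # registered (user, platform) pairs
    provisioned: dict = field(default_factory=dict)  # user -> client public key
    sessions: set = field(default_factory=set)       # users with a secure session
    locations: dict = field(default_factory=dict)    # user -> IP address

    def provision(self, user, platform, quote_ok, client_pubkey):
        # Authentication and Provisioning: the quote must verify (reduced to a
        # flag here) and the user/platform pair must be in the directory.
        if not quote_ok or (user, platform) not in self.directory:
            return False
        self.provisioned[user] = client_pubkey
        return True

    def open_session(self, user, ip):
        # Session Management: only provisioned clients may open a session.
        if user not in self.provisioned:
            return False
        self.sessions.add(user)
        self.locations[user] = ip        # Location service entry
        return True

    def locate(self, user):
        return self.locations.get(user)

    def distribute_srtp_key(self, participants):
        # Key Distribution: one SRTP master key and salt per call, released
        # only to participants holding an authenticated session.
        if not all(p in self.sessions for p in participants):
            raise PermissionError("unauthenticated participant in call")
        master = {"master_key": secrets.token_bytes(16),   # 128-bit default
                  "master_salt": secrets.token_bytes(14)}  # 112-bit default
        # In the SVC design, each copy travels over that client's secure session.
        return {p: master for p in participants}


server = SVCServer(directory={("alice", "p1"), ("bob", "p2")})
print(server.provision("alice", "p1", quote_ok=True, client_pubkey="pkA"))  # True
print(server.provision("bob", "p2", quote_ok=True, client_pubkey="pkB"))    # True
print(server.provision("eve", "p9", quote_ok=True, client_pubkey="pkE"))    # False
server.open_session("alice", "10.0.0.5")
server.open_session("bob", "10.0.0.7")
print(server.locate("bob"))                                                 # 10.0.0.7
keys = server.distribute_srtp_key(["alice", "bob"])
print(keys["alice"] is keys["bob"])                                         # True
try:
    server.distribute_srtp_key(["alice", "eve"])
except PermissionError as err:
    print(err)                                 # unauthenticated participant in call
```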
V. SVC IMPLEMENTATION
Our SVC implementation utilizes OPAL, an open-source, multi-platform library (implemented in C++) for video conferencing that supports multiple protocols and media formats for call management and media control, such as SIP for call signaling and media control subsystems for transmitting and receiving media streams using protocols such as RTP and SRTP. The implementation includes user authentication and client provisioning for end-to-end protection.

A. Client Authentication and Provisioning
When the SVC client connects to the server for the first time, it engages in an authentication and provisioning protocol to prove its trustworthiness to the server. The client sends the server a signed report of its measurement (a cryptographic hash of its code) and other user data, called an SGX quote, along with the public key of the client's RSA key pair. The server verifies the report, establishing that the client application is running on trusted Intel® SGX-enabled hardware. The server also validates that the user and platform are registered with the directory service (we used Microsoft® Active Directory®), and verifies that the measurement and identity of the enclave match those of a known enclave in a whitelist stored in its database. Once the platform and user are authenticated, the client seals its private key to its enclave and the server stores the corresponding client public key in its database. Subsequently, the client and server establish a secure ephemeral session, using each other's public keys to encrypt, and their own private keys to sign, the messages they exchange during the session key agreement protocol. All messages between the client and server (e.g., SRTP master key distribution) are encrypted, authenticated and replay-protected. In our approach, the server distributes the SRTP key to the clients after authentication and SIP session setup between the two clients, though it is also possible to establish a shared SRTP key during SIP session setup if the SIP session is protected using a protocol such as TLS [8].

… how the media subsystem of OPAL was partitioned into trusted (inside the secure enclave) and untrusted parts.

Figure 5: Partitioned Media Subsystem of an SVC Client

To understand why the media subsystem was partitioned as shown above, consider the example of an audio stream received by a client. The audio packets in this example are encrypted by the SRTP library in the sender's client, and must be decrypted on the receiver's client before they can be processed further in the clear. Therefore, the SRTP library (which performs encryption and decryption), the RTP session module (which packetizes and depacketizes the RTP packets) and the adaptive jitter buffer module (which compensates for variance in packet arrival times) must be hosted inside the secure enclave to protect the stream's confidentiality and integrity. After the audio stream is decrypted, it must be decoded by an appropriate audio codec and re-encrypted for secure audio output to the Intel® PAVP audio device. Hardware codecs may be used for performance efficiency; however, we used software audio and video codecs that were available as separate plugins. We integrated these into the enclave library because they handle audio and video streams in the clear. Though not shown in the figure, a similar secure communication and processing path must be implemented for the protected video stream received by the client. Outbound audio and video streams are handled analogously. The part of the media subsystem outside the enclave consists of the class abstractions for input and output audio/video devices, which in turn are connected to platform-specific audio and video libraries.
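The receive path just described can be sketched as a minimal RTP depacketizer feeding a reorder buffer. The header layout and the interarrival jitter estimate J = J + (|D| - J)/16 come from RFC 3550 [5]. Everything else is an assumption of this sketch rather than OPAL's actual classes: `depacketize`, `JitterBuffer` and its fixed `depth` are hypothetical, SRTP decryption is assumed to have already happened, and 16-bit sequence wraparound is ignored. A truly adaptive buffer would size its playout delay from the jitter estimate instead of using a fixed depth. In the SVC design, all of this code would run inside the enclave because it handles the decrypted stream.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpPacket:
    seq: int
    timestamp: int
    ssrc: int
    payload_type: int
    payload: bytes


def depacketize(data: bytes) -> RtpPacket:
    """Parse an RTP packet: RFC 3550 fixed header, CSRC list, extension, padding."""
    if len(data) < 12:
        raise ValueError("short RTP packet")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    offset = 12 + 4 * (b0 & 0x0F)               # skip CSRC identifiers
    if b0 & 0x10:                                # header extension present
        _profile, words = struct.unpack("!HH", data[offset:offset + 4])
        offset += 4 + 4 * words
    end = len(data) - (data[-1] if b0 & 0x20 else 0)   # strip padding
    return RtpPacket(seq, ts, ssrc, b1 & 0x7F, data[offset:end])


class JitterBuffer:
    """Fixed-depth reorder buffer (a simplification of an adaptive one).

    Ignores 16-bit sequence-number wraparound for brevity.
    """

    def __init__(self, depth: int = 3, clock_rate: int = 8000):
        self.depth = depth
        self.clock_rate = clock_rate
        self.pending = {}        # seq -> payload
        self.next_seq = None     # lowest sequence number still playable
        self.jitter = 0.0        # RFC 3550 interarrival jitter, timestamp units
        self._prev = None        # (arrival in timestamp units, RTP timestamp)

    def push(self, pkt: RtpPacket, arrival_s: float) -> list:
        """Insert a packet and return the payloads now ready for the decoder."""
        arrival = arrival_s * self.clock_rate
        if self._prev is not None:
            d = (arrival - self._prev[0]) - (pkt.timestamp - self._prev[1])
            self.jitter += (abs(d) - self.jitter) / 16
        self._prev = (arrival, pkt.timestamp)

        if self.next_seq is not None and pkt.seq < self.next_seq:
            return []                                  # too late: already played past it
        self.pending.setdefault(pkt.seq, pkt.payload)  # ignore duplicates
        ready = []
        while len(self.pending) > self.depth:
            seq = min(self.pending)
            ready.append(self.pending.pop(seq))
            self.next_seq = seq + 1
        return ready


def rtp(seq: int, ts: int, payload: bytes) -> bytes:
    # Minimal packet: V=2, no padding/extension/CSRC, payload type 0 (PCMU).
    return struct.pack("!BBHII", 0x80, 0, seq, ts, 0x1234) + payload


buf = JitterBuffer(depth=2)
played = []
# Packet 2 arrives after packet 3, and a duplicate of packet 2 arrives late.
for seq, t in [(1, 0.00), (3, 0.04), (2, 0.05), (4, 0.06), (2, 0.07), (5, 0.08)]:
    played += buf.push(depacketize(rtp(seq, seq * 160, b"frame%d" % seq)), t)
print(played)   # [b'frame1', b'frame2', b'frame3']
```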
… performs symmetric key exchange between the application and the output devices during client initialization (mediated by the Intel Manageability Engine), as well as encryption of the audio/video streams; and (ii) untrusted audio and graphics drivers, which relay the encrypted audio/video streams to the hardware for decryption and playback. Once the encrypted streams are received by the graphics and audio hardware and processed, they are re-encrypted and transmitted over a secure link to an HDCP device (speakers, headsets, and displays), where they are output.

Figure 6 shows how audio is processed and rendered securely in our solution; similar techniques apply to video processing as well.

Figure 6: Secure Audio Processing and Playback Using PAVP

VI. RESULTS AND FUTURE WORK
We successfully repartitioned an open-source video conferencing stack to use Intel® SGX and Intel® PAVP technologies, creating a secure video chat application that held up to security and penetration testing in a lab environment. We subjected our implementation to several attacks, including application memory and video frame buffer attacks by code running under high privilege (simulating malware). Below, we identify additional opportunities to extend our work and improve the security of VoIP solutions:
1. Server Security: SVC's security relies on the security of backend components that might include one or more application servers. These servers, at a minimum, perform authentication and distribution of initial secrets, and could be involved in other secure transactions as well, such as policy distribution and call log storage. In our current work, we focused on client security; future work is intended to focus on the use of Intel® SGX for server security, which may include interaction with various types of virtualization technologies, workload migration between servers, and secure cloud server provisioning techniques.
2. Trusted Input: SVC security can be improved if we can protect the transfer of data from input devices such as cameras and microphones to secure enclaves. Future work will focus on how these input devices can be augmented with security features such as authentication and encryption, in order to create trusted channels to secure enclaves.
3. Forensics: An authorized participant with malicious intent may record data using an external camera and distribute it. Currently there is no technology to prevent this type of attack. However, forensic techniques such as digital watermarking may be used to trace leaks of sensitive digital content.

VII. CONCLUSION
The proliferation of various methods of real-time digital communication has forced organizations to implement solutions that safeguard them in the face of aggressive malware threats on computing devices. The safeguards need to ensure data confidentiality and integrity, and should include robust mechanisms for authentication and policy enforcement related to the consumption and distribution of data. Protecting digital communication for VoIP applications needs to go beyond protecting data during network transmission and should include the endpoints of the communication channel: the computing devices. Intel® SGX is a new CPU feature on Intel platforms that provides an application with a secure execution environment for protecting sensitive code and data that are critical for application security. In this paper, we have described an architecture for secure video conferencing that uses Intel® SGX and Intel® PAVP technologies to provide end-to-end protection against a large number of threats, thereby significantly raising the robustness and security of the solution. The secure architectural pattern that partitions the system into trusted and untrusted parts based on an analysis of the assets and threats, as described here, can be applied to a broad class of VoIP applications such as video conferencing, phone calls, Instant Messaging (IM), texting, and virtual Whiteboards. Furthermore, we can leverage and reuse several components of the secure architecture, such as authentication, provisioning, session management and output protection, in other application domains such as Information Rights Management.

ACKNOWLEDGMENTS
The authors of this paper wish to acknowledge the contributions of the USAFA-SE research team and the support of the Intel® SGX research team in the design and development of the experimental Secure Video Chat applicationⁱ. We also wish to recognize the support of the United States Department of Homeland Security and the United States Air Force Academy for sponsoring this research.

Intel is a trademark of Intel Corporation in the U.S. and/or other countries.

REFERENCES
[1] F. McKeen, I. Alexandrovich, A. Berenzon, C. Rozas, H. Shafi, V. Shanbhogue and U. Savagaonkar, "Innovative Instructions and Software Model for Isolated Execution," in Hardware and Architectural Support for Security and Privacy (HASP), Tel Aviv, Israel, 2013.
[2] I. Anati, S. Gueron, S. P. Johnson and V. R. Scarlata, "Innovative Technology for CPU Based Attestation and Sealing," in Hardware and Architectural Support for Security and Privacy (HASP), Tel Aviv, Israel, 2013.
[3] M. Hoekstra, R. Lal, P. Pappachan, C. Rozas, V. Phegade and J. Cuvillo, "Using Innovative Instructions to Create Trustworthy Software Solutions," in Hardware and Architectural Support for Security and Privacy (HASP), Tel Aviv, Israel, 2013.
[4] D. McGrew, E. Carrara, M. Baugher, M. Naslund and K. Norrman, "RFC 3711: The Secure Real-time Transport Protocol (SRTP)," Technical report, Cisco Systems, Inc. and Ericsson Research, March 2004.
[5] "Real Time Transport Protocol," URL http://www.ietf.org/rfc/rfc3550.txt
[6] "OPAL: Open Phone Abstraction Library," URL http://www.opalvoip.org/wiki/
[7] "Mobility Capability Package Version 2.0," URL http://www.nsa.gov/ia/_files/Mobility_Capability_Pkg_Vers_2_1.pdf
[8] C. Jennings and N. Modadugu, "Session Initiation Protocol (SIP) over Datagram Transport Layer Security," URL http://tools.ietf.org/html/draft-jennings-sip-dtls-05
[9] H. Krawczyk, "SIGMA: The 'SIGn-and-MAc' Approach to Authenticated Diffie-Hellman and Its Use in the IKE-Protocols," in 23rd Annual International Cryptology Conference, Santa Barbara, California, 2003.

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

No computer system can provide absolute security under all conditions. Built-in security features available on select Intel® processors may require additional software, hardware, services and/or an Internet connection. Results may vary depending upon configuration. Consult your system manufacturer for more details.

Intel®, the Intel® Logo, Intel® Inside, Intel® Core™, Intel® Atom™, and Intel® Xeon® are trademarks of Intel Corporation in the U.S. and/or other countries. Other names and brands may be claimed as the property of others.

Copyright © 2013 Intel® Corporation
ⁱ The Department of Homeland Security (DHS) sponsors the Center of Innovation at the United States Air Force Academy, which conducts research for educational purposes. The United States Air Force Academy and DHS sponsored the production of this material under United States Air Force Academy agreement number FA7000-11-2-0001. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of The United States Air Force Academy or the U.S. Government.