
New in HDP 2.3: Enterprise Grade HDFS Data At Rest Encryption

By Balaji Ganesan on June 10th, 2015

Apache Hadoop has emerged as a critical data platform for delivering business insights hidden in big data. Because it is a relatively new technology, system administrators hold Hadoop to higher security standards. There are several reasons for this scrutiny:

• The external ecosystem of data repositories and operational systems that feed Hadoop deployments is highly dynamic and can introduce new security threats on a regular basis.

• A Hadoop deployment contains large volumes of diverse data stored over long periods of time. Any breach of this enterprise-wide data can be catastrophic.

• Hadoop enables users across multiple business units to access, refine, explore and enrich data using different methods, thereby raising the risk of a potential breach.

Security Pillars in Hortonworks Data Platform (HDP)


HDP is the only Hadoop platform offering comprehensive security and centralized administration of security policies across the entire stack. At Hortonworks we take a holistic view of enterprise security requirements and ensure that Hadoop can not only define but also apply a comprehensive security policy. HDP leverages Apache Ranger for centralized security administration, authorization and auditing; Kerberos and Apache Knox for authentication and perimeter security; and native and partner solutions for encrypting data over the wire and at rest.

Data at Rest Encryption – State of the Union
In addition to authentication and access control, data protection adds a robust layer of security by making data unreadable in transit over the network or at rest on disk.

Compliance regulations, such as HIPAA and PCI, stipulate that encryption is used to
protect sensitive patient information and credit card data. Federal agencies and
enterprises in compliance-driven industries, such as healthcare, financial services and telecom, leverage data at rest encryption as a core part of their data protection strategy. Encryption helps protect sensitive data in case of an external breach or unauthorized access by privileged users.

There are several encryption methods, varying in degrees of protection. Disk- or OS-level encryption is the most basic, protecting against stolen disks. Application-level encryption, on the other hand, provides a higher level of granularity and prevents rogue admin access; however, it adds a layer of complexity to the architecture.

Traditional Hadoop users have relied on disk encryption methods such as dm-crypt for data protection. Although OS-level encryption is transparent to Hadoop, it adds a performance overhead and does not prevent admin users from accessing sensitive data. Hadoop users are now looking to identify and encrypt only sensitive data, a requirement that calls for finer-grained encryption at the data level.

Certifying HDFS Encryption


The HDFS community worked together to build and introduce transparent data
encryption in HDFS. The goal was to encrypt specific HDFS files by writing them to
HDFS directories known as encryption zones (EZ). The solution is transparent to applications leveraging the HDFS file system, such as Apache Hive and Apache HBase.
In other words, there is no major code change required for existing applications
already running on top of HDFS. One big advantage of encryption in HDFS is that
even privileged users, such as the “hdfs” superuser, can be blocked from viewing
encrypted data.
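
As a concrete illustration of how an encryption zone is set up, here is a minimal sketch; it is not taken from the article, and the directory path, key name and cluster configuration are assumptions. The same steps can also be performed with the hadoop key and hdfs crypto command line tools.

```java
// Minimal sketch: creating an HDFS encryption zone with the HdfsAdmin API.
// Assumes fs.defaultFS points at the target HDFS cluster (e.g. via a
// core-site.xml on the classpath) and that a key named "patient-data-key"
// already exists in the configured KMS.
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsAdmin;

public class CreateEncryptionZone {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        URI nameNode = FileSystem.getDefaultUri(conf);

        // The directory that becomes an encryption zone must exist and be empty.
        FileSystem fs = FileSystem.get(conf);
        Path zone = new Path("/secure/patient-data");
        fs.mkdirs(zone);

        // Mark the directory as an encryption zone backed by the named key.
        // From here on, files written under it are encrypted and decrypted
        // transparently by the HDFS client.
        HdfsAdmin admin = new HdfsAdmin(nameNode, conf);
        admin.createEncryptionZone(zone, "patient-data-key");
    }
}
```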

As with any other Hadoop security initiative, we have adopted a phased approach to introducing this feature to customers running HDFS in production environments. After the technical preview announcement earlier this year, the Hortonworks team has worked with a select group of customers to gather use cases and perform extensive testing against those use cases. We have also devoted significant development effort to building secure key storage in Ranger by leveraging the open source Hadoop KMS. Ranger now provides centralized policy administration, key management and auditing for HDFS encryption.

We believe that HDFS encryption, backed by Ranger KMS, is now enterprise ready
for specific use cases. We will introduce support for these use cases as part of
the HDP 2.3 release.

HDFS encryption in HDP – Components and Scope

The HDFS encryption solution consists of three components (more details on the Apache website here):

• HDFS encryption/decryption enforcement: HDFS client-level encryption and decryption of files within an encryption zone.

• Key provider API: the API used by the HDFS client to interact with the KMS and retrieve keys (see the usage sketch after this list).

• Ranger KMS: the open source Hadoop KMS is a proxy that retrieves keys for a client. Working with the community, we have enhanced the Ranger GUI to securely store keys in a database and to centralize policy administration and auditing. (Please refer to the screenshots below.)
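
As a rough illustration of the key provider component, the sketch below (not from the article) creates an encryption key through the Hadoop KeyProvider API; the KMS address, port and key name are placeholders standing in for your own Ranger KMS endpoint.

```java
// Minimal sketch: creating a key through the Hadoop KeyProvider API so that
// an encryption zone can later reference it by name. The KMS URI below is a
// placeholder for a Ranger KMS endpoint.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.crypto.key.KeyProvider;
import org.apache.hadoop.crypto.key.KeyProviderFactory;

public class CreateKey {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the KMS proxy (here, an assumed Ranger KMS address).
        conf.set("hadoop.security.key.provider.path",
                 "kms://http@kms.example.com:9292/kms");

        // Resolve the configured provider (a KMS client in this setup).
        KeyProvider provider = KeyProviderFactory.getProviders(conf).get(0);

        // Create a 128-bit AES key that encryption zones can reference by name.
        KeyProvider.Options options = KeyProvider.options(conf)
                .setCipher("AES/CTR/NoPadding")
                .setBitLength(128);
        provider.createKey("patient-data-key", options);
        provider.flush();
    }
}
```

Once the key exists, an encryption zone can be created against it as in the earlier sketch.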

We have extensively tested HDFS data at rest encryption across the HDP stack and will provide a detailed set of best practices for using HDFS data at rest encryption across various use cases as part of the HDP 2.3 release. We are also working with key encryption partners so that they can integrate their own enterprise-ready KMS offerings with HDFS encryption. This offers a broader choice to customers looking to encrypt their data in Hadoop.

Summary
In summary, to encrypt sensitive data, protect against privileged access and go beyond OS-level encryption, enterprises can now use HDFS transparent encryption. Both HDFS encryption and Ranger KMS are open source, enterprise-ready, and satisfy compliance-sensitive requirements. As such, they facilitate Hadoop adoption among compliance-conscious enterprises.
