Data at Rest Encryption
Apache Hadoop has emerged as a critical data platform for delivering business insights
hidden in big data. Because it is a relatively new technology, system administrators hold
Hadoop to higher security standards. There are several reasons for this scrutiny. A
Hadoop deployment contains large volumes of diverse data stored over long periods of
time, and any breach of this enterprise-wide data can be catastrophic. Compliance
regulations, such as HIPAA and PCI, stipulate that encryption be used to protect
sensitive patient information and credit card data. Federal agencies and enterprises in
compliance-driven industries, such as healthcare, financial services and telecom, use
data at rest encryption as a core part of their data protection strategy. Encryption helps
protect sensitive data in the event of an external breach or unauthorized access by
privileged users.
Traditional Hadoop users have relied on disk encryption methods such as dm-crypt for
data protection. Although OS-level encryption is transparent to Hadoop, it adds
performance overhead and does not prevent admin users from accessing sensitive data.
Hadoop users now want to identify and encrypt only sensitive data, a requirement that
calls for finer-grained encryption at the data level.
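To make the contrast with disk-level encryption concrete, here is a minimal sketch of how a single sensitive directory could be turned into an HDFS encryption zone through Hadoop's public HdfsAdmin API. The path /data/sensitive and the key name sensitiveKey are only illustrative, and the zone key is assumed to have been created in the KMS beforehand.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsAdmin;

public class CreateEncryptionZone {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS and the key provider URI from the cluster
        // configuration (core-site.xml / hdfs-site.xml); requires HDFS superuser privileges.
        Configuration conf = new Configuration();
        HdfsAdmin dfsAdmin = new HdfsAdmin(URI.create(conf.get("fs.defaultFS")), conf);

        // Marks this (empty) directory as an encryption zone tied to the zone key
        // "sensitiveKey"; only data written under this path is encrypted, and the
        // rest of HDFS is left untouched.
        dfsAdmin.createEncryptionZone(new Path("/data/sensitive"), "sensitiveKey");
    }
}

The same step is usually performed from the command line with hdfs crypto -createZone. Either way, applications keep reading and writing files under the zone without code changes, because encryption and decryption happen transparently in the HDFS client.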
As with any other Hadoop security initiative, we have adopted a phased approach to
introducing this feature to customers running HDFS in production environments. After
the technical preview announcement earlier this year, the Hortonworks team worked
with a select group of customers to gather use cases and perform extensive testing
against them. We have also devoted significant development effort to building secure
key storage in Ranger by leveraging the open source Hadoop KMS. Ranger now
provides centralized policy administration, key management and auditing for HDFS
encryption.
We believe that HDFS encryption, backed by Ranger KMS, is now enterprise-ready
for specific use cases. We will introduce support for these use cases as part of
the HDP 2.3 release.
The HDFS encryption solution consists of 3 components (more details are available on
the Apache website).
Ranger KMS: The open source Hadoop KMS is a proxy that retrieves keys
for a client. Working with the community, we have enhanced the Ranger GUI to
store keys securely in a database and to centralize policy administration and
auditing. (Please refer to the screenshots below.)
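As a rough sketch of how clients reach keys through this proxy, the fragment below points Hadoop's KeyProvider API at a Ranger KMS endpoint and creates an encryption zone key. The host, port and key name are placeholders, and in a real cluster the provider URI would normally be set in core-site.xml rather than in code.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.crypto.key.KeyProvider;
import org.apache.hadoop.crypto.key.KeyProviderFactory;

public class CreateZoneKey {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder endpoint for the Ranger KMS proxy; HDFS and all clients that
        // share this setting resolve encryption keys through it.
        conf.set("hadoop.security.key.provider.path",
                 "kms://http@ranger-kms.example.com:9292/kms");

        // Resolve the configured provider and create a key for use as a zone key.
        KeyProvider provider = KeyProviderFactory.getProviders(conf).get(0);
        KeyProvider.Options options = new KeyProvider.Options(conf);
        options.setCipher("AES/CTR/NoPadding");
        options.setBitLength(128);
        provider.createKey("sensitiveKey", options);
        provider.flush();
    }
}

Keeping key management in a separate KMS service means the NameNode only ever handles encrypted data encryption keys, never the key material itself, which keeps key administration separate from data administration.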
We have extensively tested HDFS data at rest encryption across the HDP stack and
will provide a detailed set of best practices for using HDFS data at rest encryption
across various use cases as part of the HDP 2.3 release.
We are also working with encryption partners so that they can integrate their own
enterprise-ready KMS offerings with HDFS encryption, giving a broader choice to
customers looking to encrypt their data in Hadoop.
Summary
To encrypt sensitive data, protect it from privileged users and go beyond OS-level
encryption, enterprises can now use HDFS transparent encryption. Both HDFS
encryption and Ranger KMS are open source, enterprise-ready, and help satisfy
compliance requirements. As such, they facilitate Hadoop adoption among
compliance-conscious enterprises.