Kaelble S. Data Security For Dummies 2023
by Steve Kaelble
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
Data Security For Dummies®, Immuta Special Edition
Published by
John Wiley & Sons, Inc.
111 River St.
Hoboken, NJ 07030-5774
www.wiley.com
Copyright © 2023 by John Wiley & Sons, Inc., Hoboken, New Jersey
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any
form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise,
except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without
the prior written permission of the Publisher. Requests to the Publisher for permission should be
addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ
07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.
Trademarks: Wiley, For Dummies, the Dummies Man logo, The Dummies Way, Dummies.com,
Making Everything Easier, and related trade dress are trademarks or registered trademarks of John
Wiley & Sons, Inc. and/or its affiliates in the United States and other countries, and may not be
used without written permission. All other trademarks are the property of their respective owners.
John Wiley & Sons, Inc., is not associated with any product or vendor mentioned in this book.
For general information on our other products and services, or how to create a custom For
Dummies book for your business or organization, please contact our Business Development
Department in the U.S. at 877-409-4177, contact info@dummies.biz, or visit
www.wiley.com/go/custompub. For information about licensing the For Dummies brand for products or services,
contact BrandedRights&Licenses@Wiley.com.
Publisher’s Acknowledgments
Some of the people who helped bring this book to market include the
following:
Project Editor: Elizabeth Kuball
Acquisitions Editor: Ashley Coffey
Senior Client Account Manager: Matt Cox
Introduction
Data can be of astonishing value, but not in the traditional
sense of the word. You can’t hang data in a museum and
gaze at it in admiration. You can’t stash it in a vault, close
the door, and wait for its value to appreciate. Data has to be used
in order to unlock and tap into its value.
the present and future, and helps you understand how to implement
automated and dynamic access control as simply and quickly
as possible, even in the most complex cloud environments.
The Tip icon highlights information that will make your life
easier — at least when it comes to data security.
Any book about data is going to have a dark side. The Warning
icon points to issues you’ll want to avoid.
IN THIS CHAPTER
»» Making the cloud work for your business
Chapter 1
Making Data Work
for You
Data is increasingly the lifeblood of many businesses. The
various ways your enterprise maintains and uses its data
can be the key to growth and market disruption. And like
any lifeblood, data needs to circulate to work its magic.
This chapter explores just how much data use has exploded and
how the cloud is enabling digital transformation. It discusses how
data can generate return on investment (ROI) in the cloud, as well
as why data access must be balanced by security considerations.
However, nothing in life is ever that simple. For one thing, getting
lots of data into the hands of lots of people is complicated by
the fact that these users are accessing multiple cloud platforms
and have different access clearance levels. There are also many
different, confusing, and strict rules and regulations that apply
to data, and those rules and regulations vary by industry and
location (more on that in a bit).
It’s important to venture forth with a solid plan. If you “lift and
shift” data without fully understanding how it will be used, you’re
liable to set yourself up for more work with less to show for
it than you hoped. You may or may not achieve cost reductions,
and you may just wind up with more technology troubles
than ever.
You need to understand where your data lives, who needs it, and
how you can build a system that allows those users to access that
data — without being stopped in your tracks by scalability or
security issues.
That’s why so many enterprises are turning to the cloud to create
that success. According to Gartner, global end-user spending
on public cloud services was just over $400 billion in 2021; it was
predicted to approach half a trillion dollars in 2022 and keep on
climbing to nearly $600 billion in 2023.
That said, achieving compliance and data security has never been
easy. Organizations must fundamentally assess and align on their
risk appetite, or the level of risk that they are willing to accept in
pursuing their data goals. The more access users have to data, the
less inherent security there is, and vice versa. Risk appetite differs
for each organization but is essential to striking the right balance.
It’s common to see compliance laws as a major pain. They create
a lot of hoops to jump through and often threaten hefty fines for
those who don’t.
But don’t forget why data and privacy regulations exist — to protect
consumers, employees, and even your business. They promote best
practices, and the data management frameworks they encourage
will likely improve data’s effectiveness and long-term profitability.
»» Know your data. Your enterprise must fully grasp all the
data types you deal with regularly in order to understand the
data security laws and information security standards that
apply. Healthcare organizations deal with patient records,
while many businesses maintain customer credit card
information. Tools that perform sensitive data discovery
simplify this step by automatically identifying, tagging, and
classifying sensitive information.
»» Develop a plan. A data security compliance plan explicitly
details compliance requirements and outlines how to
maintain them. A third-party data security platform can help,
particularly if it provides attribute-based access control
(ABAC) and dynamic data masking to enforce data access
policies across all cloud platforms.
»» Perform regular assessments. Achieving compliance isn’t
something you do once and are done. Data needs change
over time, and so do regulations and data standards. Your
personnel roster changes, too, so you must be sure your
evolving team is always acting compliantly. That’s why
regular data and risk assessments are needed.
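The "know your data" step above mentions tools that discover and tag sensitive data automatically. Here is a minimal sketch of that idea, assuming a simple pattern-matching approach. The tag names, patterns, threshold, and sample columns are all illustrative assumptions; real discovery tools use far more robust detection than a few regular expressions.

```python
import re

# Hypothetical detection patterns, for illustration only.
PATTERNS = {
    "EMAIL": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "US_SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "CREDIT_CARD": re.compile(r"^\d{4}([ -]?\d{4}){3}$"),
}


def discover_tags(rows, threshold=0.8):
    """Tag each column whose sampled values mostly match a sensitive pattern."""
    tags = {}
    columns = rows[0].keys() if rows else []
    for column in columns:
        values = [str(row[column]) for row in rows if row[column] is not None]
        if not values:
            continue
        for tag, pattern in PATTERNS.items():
            matches = sum(1 for value in values if pattern.match(value))
            if matches / len(values) >= threshold:
                tags.setdefault(column, []).append(tag)
    return tags


sample = [
    {"name": "Ana", "contact": "ana@example.com", "card": "4111 1111 1111 1111"},
    {"name": "Raj", "contact": "raj@example.org", "card": "5500-0000-0000-0004"},
]
print(discover_tags(sample))
```

Tags produced this way could then feed the access policies described in the plan step, so rules are written against a tag such as EMAIL rather than against individual column names.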
On the topic of data compliance laws and what they are, it’s
important to point out what they are not: They are not a panacea
in this age of data perils. Don’t believe that just because you’re
compliant, you’re as secure as you need to be. Compliance laws
may not be nimble enough to keep up with the latest threats, and
they may not be specific enough to cover those threats that are
unique to your industry.
You may still have holes in your data access controls, even if
you’re fully compliant. And if one of those holes ends up allowing
a data breach, the results will be just as horrific, regardless of
whether you were compliant with all the rules and regulations.
Being able to say, “But we were compliant” won’t spare you from
bad press, lost trust, and lawsuits.
Indeed, it’s not easy to replicate the same access control functionality
in the cloud on the first try. The cloud environments
you’re exploring are more powerful but also more complex.
IN THIS CHAPTER
»» Defining sensitive data
Chapter 2
Understanding Data
Security
Chapter 1 talks about how data is a revolutionary tool for
business success, but also how it can bring powerful
enterprises to their knees. Either way, data security is an
essential part of digital transformation.
This chapter gets into more detail about data security. It discusses
what exactly sensitive data is and how the move to the cloud makes
its protection more complicated. It then outlines the various components
of data security and the tools used to protect data.
Personally identifiable information (PII) is the best-known form
of sensitive data. It might include credit card information,
usernames, and passwords. Similarly, protected health information
(PHI) refers to sensitive healthcare-related data, which could
include anything in a medical record, from diagnosis to billing
and insurance information. It can even include appointment
scheduling information, which could be used to identify a patient,
even if their name is not directly included.
In more recent years, data has become ubiquitous. There has been
an explosion in data users, with many more people accessing,
processing, and sharing data, both internally and externally.
The old way of doing things had policies embedded in databases
and tied to user roles. When there were fewer data users,
platforms, and sources, this worked well enough.
In the cloud, though, there are many different ways to access data.
There are also more business rules governing which employees
can access which data in which system. And alongside regulatory
considerations, organizations are implementing contractual
arrangements and data use agreements for sharing data with
third parties.
Multiply all these restrictions, and you’ll find that even the simplest
data query is impacted by a web of policies. How can you
efficiently and securely grant data access in that reality?
»» Data cataloging: If you can’t find the data you need — even
if it’s properly classified — it’s worthless. Data cataloging
involves creating an organized inventory of your data assets
so data consumers can locate, access, and leverage the data
they need.
»» Data retention: It’s important to retain each piece of data as
long as required by business or legal needs, or regulatory
requirements — and then get rid of it when no longer
needed. You also need to be able to handle deletion
requests as required by various privacy regulations. Data
retention policies spell out how all this is handled.
»» Data lineage: People are told to pay close attention to the
source of the information they read online. Similarly, data
security requires paying attention to the data’s source. Data
lineage tells users where data came from, why that source
was added to the project, and how the data has changed
over time, and it can help determine the various projects
associated with that data source.
»» Data quality: Data quality is measured by gauging such
things as how accurate, complete, timely, and consistent it is.
The quality of data varies, and data that’s deemed good
enough for some purposes is not well suited for others.
»» Data ownership: The data owner is the one who creates
data sources and sets the policy controls that apply to users.
As a steward of a particular set of data, the data owner has a
keen interest in ensuring its security.
»» General change management: In the world of data, change
is constant. A holistic data governance program should
include expertise in change management. Failing to keep up
with evolving threats or security requirements can carry a
hefty price tag.
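The data retention item in the list above can be sketched as a simple check. The categories, retention periods, and record fields here are made-up assumptions for illustration, not legal guidance.

```python
from datetime import date, timedelta

# Hypothetical retention periods in days, keyed by data category.
RETENTION_DAYS = {"invoice": 7 * 365, "web_log": 90}


def is_expired(category, created_on, today):
    """Return True when a record has outlived its retention period."""
    return today - created_on > timedelta(days=RETENTION_DAYS[category])


def records_to_delete(records, today, deletion_requests=frozenset()):
    """Select records that are expired or covered by a deletion request."""
    return [
        record["id"]
        for record in records
        if is_expired(record["category"], record["created_on"], today)
        or record["subject_id"] in deletion_requests
    ]


records = [
    {"id": 1, "category": "web_log", "created_on": date(2023, 1, 1), "subject_id": "u1"},
    {"id": 2, "category": "invoice", "created_on": date(2023, 1, 1), "subject_id": "u2"},
    {"id": 3, "category": "invoice", "created_on": date(2023, 5, 1), "subject_id": "u3"},
]
print(records_to_delete(records, today=date(2023, 6, 1), deletion_requests={"u3"}))
```

Whether a deletion request overrides a legal retention requirement depends on the regulation involved. This sketch deliberately ignores that question and simply flags both cases.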
So, who is involved in data security? The short answer is, everyone.
All players on your team have a role (and the change management
expertise mentioned earlier will help ensure they all
know the part they play).
Included in the “everyone” that we just talked about are data
scientists and data analysts. These are people who need instant
access to data without having to change any workflows or code —
and your data access platform should be able to deliver.
Data security techniques vary along this path, and they depend on
where the data resides and how it’s consumed. Some of the key
approaches include the following:
»» Data masking: This approach replaces sensitive information
in a data set with fake (but convincing) data. Static data
masking focuses on data at rest; it makes a copy of existing
data and scrubs sensitive information from that copy so it
can be shared without risking a data leak. This is preferable
for application development and training. Dynamic data
masking, on the other hand, applies masking techniques as
data moves in the data pipeline. This approach avoids data
copies, making it better for access control and compliance
management.
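»» Hashing: Hashing applies a function that transforms a string of
characters into a different, typically fixed-length value. Hash
values can index a hash table; in data security, a cryptographic
hash is also designed to be impractical to reverse.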
»» Key management: If you’ve got keys, you’re going to need a
keychain. Key management is a way to manage cryptographic
keys. It refers to how you generate them, use them,
exchange them, and store them.
»» Privacy-enhancing technologies (PETs): This refers to
various dynamic controls to address any privacy requirement
for sharing sensitive data, including PII, PHI, or
personal data. Such controls include k-anonymization and
differential privacy.
»» Tokenization: This refers to removing a sensitive data
element and replacing it with a nonsensitive equivalent,
called a token. That token refers back to the sensitive data
through a tokenization system.
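Three of the approaches in the list above (hashing, tokenization, and dynamic data masking) can be sketched in a few lines. This is a toy illustration under stated assumptions, not a production design. The `TokenVault` class, the `pii_cleared` attribute, the `ssn` field, and the salt value are hypothetical names invented for the example, and the masking here is a simple partial redaction rather than convincing fake data.

```python
import hashlib
import secrets


def hash_value(value, salt):
    """One-way transform: the same input and salt give the same digest."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


class TokenVault:
    """Toy tokenization system that maps random tokens back to originals."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        # Reuse the existing token so the same value always maps to one token.
        if value not in self._value_to_token:
            token = "tok_" + secrets.token_hex(8)
            self._value_to_token[value] = token
            self._token_to_value[token] = value
        return self._value_to_token[value]

    def detokenize(self, token):
        return self._token_to_value[token]


def mask_row(row, user_attributes):
    """Dynamic masking: redact at read time, leaving stored data untouched."""
    masked = dict(row)
    if "pii_cleared" not in user_attributes:
        masked["ssn"] = "***-**-" + row["ssn"][-4:]
    return masked


row = {"name": "Ana", "ssn": "123-45-6789"}
vault = TokenVault()
token = vault.tokenize(row["ssn"])
print(mask_row(row, user_attributes=set())["ssn"])
print(vault.detokenize(token) == row["ssn"])
print(len(hash_value(row["ssn"], salt="demo-salt")))
```

Note the difference in reversibility: the token can be mapped back to the original through the vault, while the hash can't be reversed directly. A plain salted hash of a short, predictable value such as a Social Security number can still be guessed by brute force, so a keyed hash is a common hardening step.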
IN THIS CHAPTER
»» Discovering the types of access control
Chapter 3
Enabling Access for
the Right Players
Data security depends heavily on creating and enforcing the data
policies that govern access. This chapter explores the most
common approaches to access control, outlines how concepts are
evolving, spotlights the troubles you may be having with legacy
approaches, and discusses how to build the right strategy going
forward.
Role-based access control
RBAC is about setting data policies based on users’ job roles. It’s
the kind of access control that was common when many of today’s
legacy compute and storage systems were born back in the 1990s.
In that era, it met people’s data security needs.
RBAC gained traction due to its relative simplicity. Roles were created
by system administrators and assigned as users came on board.
As long as a new user was taking a role that already existed and
didn’t ever change much, managing user access was practically
automatic, with no need to manually assign permissions.
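As a rough sketch of the RBAC model just described, access can be decided entirely by a lookup from user to role to permission. The role names, user names, and table names below are hypothetical.

```python
# Hypothetical roles and the tables each role may read.
ROLE_PERMISSIONS = {
    "finance_analyst": {"invoices", "budgets"},
    "hr_manager": {"employees", "payroll"},
}

# Users are assigned roles when they come on board.
USER_ROLES = {"dana": {"finance_analyst"}, "lee": {"hr_manager"}}


def can_read(user, table):
    """RBAC check: access is decided solely by the user's assigned roles."""
    return any(
        table in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )


print(can_read("dana", "invoices"))
print(can_read("dana", "payroll"))
```

The lookup is simple as long as the roles stay stable. Every new combination of job, location, or project, however, means another entry in these tables.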
ABAC policies take this situational context into account and
dynamically enforce data protection at runtime, making them far
more flexible and scalable than RBAC policies. Multidimensional controls
increase agility by zeroing in on each request’s specific
circumstances. With RBAC, however, data protection is implicitly
predetermined by a static policy.
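To make the runtime evaluation concrete, here is a minimal sketch of an attribute-based decision. The `Request` structure, the attribute names (department, clearance, sensitivity, network), and the rule itself are assumptions chosen for illustration; they don't represent any particular product's policy model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    user_attrs: dict   # attributes of the person asking, e.g. department
    data_attrs: dict   # attributes of the data, e.g. sensitivity level
    context: dict      # circumstances of the request, e.g. network


def abac_allows(request):
    """Evaluate one attribute-based rule at request time."""
    user, data, context = request.user_attrs, request.data_attrs, request.context
    same_department = user.get("department") == data.get("department")
    cleared = user.get("clearance", 0) >= data.get("sensitivity", 0)
    on_network = context.get("network") == "corporate"
    return same_department and cleared and on_network


request = Request(
    user_attrs={"department": "produce", "clearance": 2},
    data_attrs={"department": "produce", "sensitivity": 1},
    context={"network": "corporate"},
)
print(abac_allows(request))
```

The same rule covers every user and every data set that carries these attributes, so adding a department or a user doesn't require writing a new policy.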
With RBAC, each of the 2,040 users would require a unique role
that’s built based upon both user type (of which there are six
per store, one for each department) and store ID (of which there
are 340). This means at least 2,380 access policies are needed
to enable all users to query the report (2,040 users plus 340
store IDs).
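The arithmetic behind those figures can be checked directly. The store and department counts come from the text above; the single attribute-matching rule at the end is an assumed illustration of how an attribute-based alternative might look, not a figure taken from the example.

```python
stores = 340
departments_per_store = 6

# RBAC: one role per (department, store) pair, as described above.
rbac_roles = stores * departments_per_store
# The text counts one policy per role plus one per store ID.
rbac_policies = rbac_roles + stores

print(rbac_roles, rbac_policies)


# Assumed ABAC alternative: one rule comparing user and row attributes.
def abac_rule(user, row):
    """Allow access when the user's store and department match the row's."""
    return (
        user["store_id"] == row["store_id"]
        and user["department"] == row["department"]
    )
```

Under this assumption, opening store number 341 would add six more roles in the RBAC model but wouldn't change the attribute-based rule at all.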
PBAC VERSUS ABAC
We mention PBAC as a subset of ABAC. The P stands for purpose: the
purpose of access is layered into permission decisions. PBAC is
beneficial for applying regulation-based or contractual restrictions
to sensitive data access policies, and it taps into data masking tools
as reinforcement.
For instance, a member of a financial firm’s legal team may not typically
have access to a certain data set, but when working specifically on a
fraud case, that person is allowed access because the purpose is legitimate
and approved. Think about how this can work with regulations
such as the GDPR and HIPAA, which require data to be accessed only
for specific and approved purposes. In such situations, PBAC enables
granular access control that promotes utility without risking privacy.
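The legal-team example can be sketched as a purpose check. The data set name, the purpose labels, and the function below are hypothetical, and the sketch assumes that a purpose must be approved both for the user and for the data set.

```python
# Hypothetical approved purposes per data set.
APPROVED_PURPOSES = {"transactions": {"fraud_investigation"}}


def pbac_allows(user_purposes, dataset, stated_purpose):
    """Allow access only for a purpose approved for both user and data set."""
    return (
        stated_purpose in user_purposes
        and stated_purpose in APPROVED_PURPOSES.get(dataset, set())
    )


legal_team_member = {"fraud_investigation"}
print(pbac_allows(legal_team_member, "transactions", "fraud_investigation"))
print(pbac_allows(legal_team_member, "transactions", "marketing"))
```

The same person gets a different answer depending on why the data is requested, which is the point of layering purpose into the decision.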
Why is that? Consider that for every piece of data, there would be
policies dictating who could see it and what they could do. More data
and more users mean more policies. You could wind up with tens of
thousands of policies that must be manually written and managed.
The problem of policy burden can be especially troublesome in the
cloud. The term refers to building initial data policies, maintaining
those data policies, and updating them to conform to evolving
and sometimes conflicting regulations. For the amount of data
engineering time and resources these tasks require, “burden” is
probably putting it lightly.
To be sure, an access control strategy is essential to ensuring that
data doesn’t get into the wrong hands. But to operate at cloud
scale, as most forward-thinking and innovative businesses do
these days, it makes sense for controls to be based on the data
and data attributes, rather than static user roles.
The best way to achieve all these aims is through an ABAC model.
IN THIS CHAPTER
»» Implementing dynamic access controls
Chapter 4
Automating Data Access
It’s a reality in all sorts of business areas: Manual processes are
slow, consume lots of resources, and are error-prone.
Automation, on the other hand, can get the job done much
faster, reduce resource needs, and virtually eliminate mistakes.
With this approach, you’ll never derive the value you need from
your data, especially as data use continues to scale. As the environment
becomes more complex, confidence in your ability to
keep data safe is likely to diminish.
For a look at how this works in practice, consider Immuta’s
approach as an example. Because the data policy is separate from
the database, Immuta works with any tool — that takes care of
the first pillar. It also handles data wherever it is, in whatever
form, addressing the second pillar.
The data security platform sits between end users and the raw
data that they want to use, which means that no copies of data are
required. That takes care of the third important pillar. A plain-language
policy builder enables people who aren’t particularly
technical to get the job done, so that checks the box on the fourth
pillar. Finally, all data policies are in one place, in one format,
which takes care of that last pillar.
Ultimately, this pulls the whole policy creation process out of the
silos where it used to live. Where you once had privacy, security,
and compliance expertise living in one world and technical
expertise in another, now they all can work together to understand
how data is being controlled and accessed.
Speaking of compliance, centralizing policy management and
enforcement in one place means your audit logs and reports can
also come from that place. Audit logs are standardized, regardless
of the storage technology or project, making it much easier to
track activities and build reports whenever necessary.
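One way to picture a standardized audit log is a single record shape that every platform writes to. The field names and platform labels below are assumptions for illustration; they aren't the log format of any specific product.

```python
import json
from datetime import datetime, timezone


def audit_record(user, action, resource, platform, allowed, when=None):
    """Build one audit entry with the same fields for every platform."""
    when = when or datetime.now(timezone.utc)
    return {
        "timestamp": when.isoformat(),
        "user": user,
        "action": action,
        "resource": resource,
        "platform": platform,
        "allowed": allowed,
    }


fixed_time = datetime(2023, 1, 1, tzinfo=timezone.utc)
entries = [
    audit_record("dana", "SELECT", "sales.orders", "warehouse_a", True, fixed_time),
    audit_record("lee", "SELECT", "hr.payroll", "lake_b", False, fixed_time),
]

# Because every entry has the same shape, a report is a simple filter.
denied = [entry for entry in entries if not entry["allowed"]]
print(json.dumps(denied, indent=2))
```

With one shape for all entries, a compliance report such as "all denied requests this quarter" doesn't need per-platform parsing.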
Now imagine that you’ve instead tapped into a data security and
automated access control solution that works across multiple
platforms, such as Immuta. You’ll have all the security features
listed earlier across every single cloud provider — and you’ll get
there much more easily and efficiently.
In fact, dynamic policy enforcement across platforms is up to
90 percent more efficient than more passive and disjointed
approaches. One access control policy can do the work of more
than a hundred identity access management roles.
with both existing technologies and all modern architectures —
including such models as data lakehouse and data mesh.
IN THIS CHAPTER
»» Staying secure and compliant
»» Growing globally
Chapter 5
Ten (or So) Use Cases for
Data Access Control
All data-driven organizations, from global enterprises to
small start-ups, need efficient and secure data science
and analytics. They must be able to maximize their data’s
utility, while maintaining customer trust and staying in compliance
with all applicable laws. This chapter provides several examples
of how real-life companies have used centralized and
automated data access controls to achieve their goals.
modern data access platform with attribute-based access control
(ABAC), the company continued its forward momentum while
also gaining flexibility and simplicity. Dynamic access controls
allowed the company to reduce data access policies from 40 down
to 5, streamlining management efforts eightfold.
That’s another good reason to adopt a centralized approach to
access control. As your data architecture becomes more diverse
and decentralized, data protection laws become more complex
to manage from one geographic location to another. Consider,
for example, data localization requirements stipulating the
jurisdiction where various data must be stored or processed. Such
requirements could make access control a compliance nightmare.
Although the data team was small, it was able to manage and
apply policies consistently across platforms and in different jurisdictions.
Data users were able to access the data they needed without
running afoul of the rules. Among other benefits, ABAC could
make access decisions based on where any user was, and where
the data was.
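A location-aware decision of this kind might look like the sketch below. The region codes and the rule table are invented for the example and don't reflect the requirements of any actual law.

```python
# Hypothetical rules: for data resident in each region, the user
# locations from which access is permitted.
ALLOWED_USER_LOCATIONS = {
    "EU": {"EU"},
    "US": {"US", "EU"},
}


def location_allows(user_location, data_residency):
    """Permit access only when the user's location is allowed for the data."""
    return user_location in ALLOWED_USER_LOCATIONS.get(data_residency, set())


print(location_allows("US", "EU"))
print(location_allows("EU", "US"))
```

Because the rule reads both locations as attributes at request time, a small team can change one table instead of rewriting roles for each jurisdiction.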
Think about data mesh architecture, for example. Data is a
product, and each data source has a different product manager.
Responsibilities are clear enough and scaling constraints are
less of an issue than with data warehouses or data lakes, but it’s
essential that policies be separated from databases.
WILEY END USER LICENSE AGREEMENT
Go to www.wiley.com/go/eula to access Wiley’s ebook EULA.