Security and Privacy in Big Data
ISSN No:-2456-2165
Abstract:- Data that is large in size and volume, extremely complex to process, and generated mainly by businesses is collectively known as Big Data. Because of its complexity, this type of data cannot be stored in traditional database systems; it requires specifically designed tools such as Hadoop. Big Data may include sensitive data which, if left unsecured, can be exploited by cybercriminals. As Big Data keeps growing over time, it becomes more difficult to secure and protect. Every organization has to secure its Big Data and maintain its privacy to avoid the negative impact of the collected data falling into the hands of cybercriminals. Big Data security involves guarding the data, and the analytics processes performed on it, from cyber-attacks such as theft or any other kind of malicious attack. This research paper intends to explore the security and privacy concerns and challenges that arise while securing Big Data.

Keywords:- Big Data, Security, Privacy, Encryption, Cybercriminals.

I. INTRODUCTION

Big Data helps a business make better decisions. Data volumes keep growing, and because organizations receive continuous streams of data, it is essential that they manage this data properly in order to prevent any form of cyber-attack on it. The Research Data Alliance [1] states that “Big Data security is the processing of guarding data and analytics processes, both in the cloud and on-premise, from any number of factors that could compromise their confidentiality”. In a nutshell, the term ‘Big Data Security’ comprises the security measures applied to the collected data and the security tools used while performing data analytics on it. It is important to secure Big Data because the organization collecting it might face legal action even if it only unintentionally becomes the source of a leak of sensitive and confidential data, such as customers’ personal information or credit card numbers, to harmful outside parties. Organizations involved with big data need to comply with the rules of the General Data Protection Regulation (GDPR), which deals with basic data security measures. Hence, it is extremely important to overcome the challenges faced while securing big data.

Big Data privacy, in turn, is concerned with the following [2]: “The more data you collect, the more important it is to be transparent with your customers about what you are doing with their data, how you are storing it, and what steps you are taking to comply with regulations that govern privacy and data protection”. From this it can be clearly understood that the data collected by businesses has to be collected with the data owner’s consent. Consent alone, however, is not enough: how the data was collected, how the organization will use it, and how it will be stored should all be communicated to the data owner to maintain transparency. Proper management of data is required to prevent privacy breaches. However, while securing big data and preserving its privacy, the organization has to overcome many obstacles.

II. CHALLENGES IN SECURING BIG DATA

A. Distributed data
Distributed data can lead to security issues because each distributed portion of the data has to be secured separately. Today’s commonly used tool for big data processing and storage is Hadoop, which was designed without considering security. Because the design is free from security controls, the big data it holds is more vulnerable to malicious attacks. Distributing data helps in sharing the workload, but at the end of the day it makes the data more tedious to secure, since the security issues arising from every distributed portion have to be looked after.

B. Securing NoSQL Databases
Normally, to store data we prefer traditional database systems that allow us to store data as sets of records. Because of the scale and diversity of Big Data, however, such systems are not the solution for storing it. Here is where the NoSQL database design comes to Big Data’s rescue. In a document-oriented NoSQL database, for example, data is stored as JSON documents. Alex Bekker mentions in his blog [3]: “NoSQL databases are continuously being honed with new features. And just like we said in the beginning of this article, security is being mistreated and left in the background. It is universally hoped that the security of big data solutions will be provided externally. But rather often it is ignored even at that level”. It follows that security should not be overlooked while the features of NoSQL databases are being updated.

C. Endpoint Vulnerabilities
Security problems with big data often start at the point of entry [4]. The source of the data flowing into a company’s big data system could be compromised. Suppose a cybercriminal hacks into the data source: the attacker can easily manipulate the data, which then travels on to its destination for further data analytics. In this way, false and malicious data enters the system, and the results derived from this data will also be incorrect. Therefore, it is important to validate the authenticity of data at the endpoints, so that this vulnerability is overcome and business decisions remain precise and productive.
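One possible way to validate the authenticity of data at an endpoint is to attach a keyed message authentication code (HMAC) to every record at the source and to verify it on ingestion. The following is a minimal sketch in Python, not a method prescribed by this paper. It assumes that the data source and the ingestion endpoint share a secret key; the names SHARED_KEY, sign_record, is_authentic and filter_authentic are hypothetical and chosen only for illustration.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret (assumption); a real deployment would load it
# from a secrets manager rather than hard-coding it.
SHARED_KEY = b"example-key-not-for-production"


def sign_record(record: dict, key: bytes = SHARED_KEY) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding of the record."""
    # sort_keys makes the encoding independent of dictionary key order.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def is_authentic(record: dict, tag: str, key: bytes = SHARED_KEY) -> bool:
    """Recompute the tag at the ingestion endpoint and compare in constant time."""
    return hmac.compare_digest(sign_record(record, key), tag)


def filter_authentic(batch, key: bytes = SHARED_KEY):
    """Keep only the records whose tag verifies; batch yields (record, tag) pairs."""
    return [record for record, tag in batch if is_authentic(record, tag, key)]
```

In this sketch a record altered after signing no longer matches its tag and is dropped before it reaches the analytics stage. The check does not help if the attacker also obtains the key, so protecting the key at the source remains necessary.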
C. Data Discrimination
Data can be used unfairly when analytics is designed to support data discrimination. Suppose an algorithm is developed that treats users differently according to the age, gender, religion or caste recorded in their data. The outcome produced by such an algorithm will also be unfair. Organizations should always choose fairness when using data.
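As an illustration of how such unfairness might be detected, the sketch below compares the rate of favourable outcomes across the groups of one attribute. It is only one simple check among many and is not taken from this paper; the function names selection_rates and disparity_ratio, and the field names used in the example, are assumptions made for illustration. The record list is assumed to be non-empty.

```python
from collections import defaultdict


def selection_rates(records, group_key, outcome_key):
    """Return the share of favourable outcomes for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in records:
        group = row[group_key]
        totals[group] += 1
        if row[outcome_key]:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def disparity_ratio(rates):
    """Lowest selection rate divided by the highest; 1.0 means equal rates."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no group received a favourable outcome
    return min(rates.values()) / highest


# Hypothetical example data: decisions recorded per applicant.
decisions = [
    {"gender": "F", "approved": True},
    {"gender": "F", "approved": False},
    {"gender": "M", "approved": True},
    {"gender": "M", "approved": True},
]
rates = selection_rates(decisions, "gender", "approved")
ratio = disparity_ratio(rates)
```

A ratio well below 1.0 suggests that one group receives favourable outcomes much less often than another and that the algorithm deserves closer review. A ratio close to 1.0 does not by itself prove that the algorithm is fair.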