Big Data in Computer Cyber Security Systems
The Public Authority for Applied Education and Training, Secretarial and Office Administration Institute (Girls), Computer Department, Kuwait
quantification, and therein lies the difficulty in furnishing a definition.
• Oracle defines big data as the derivation of value from traditional relational-database-driven business decision making, augmented with new sources of unstructured data (such as blogs, social media, sensor networks, and images) which vary in size, structure, format, and other factors. Its definition focuses on infrastructure.
• Intel describes big data by quantifying the experiences of its business partners. It suggests that the organizations it surveyed deal extensively with unstructured data and place an emphasis on performing analytics over their data.
• Microsoft provides a remarkably succinct definition: "Big data is the term increasingly used to describe the process of applying serious computing power, the latest in machine learning and artificial intelligence, to seriously massive and often highly complex sets of information." That is, big data requires the application of significant computing power.
Notably, all definitions make at least one of the following assertions:
• Size: the volume of the datasets is a critical factor.
• Complexity: the structure, behavior, and permutations of the datasets are a critical factor.
• Technologies: the tools and techniques used to process sizable or complex datasets are a critical factor.

3. The rise of big data

The first and central feature of big data as a phenomenon is the unprecedented growth in the volume and variety of high-frequency digital structured and unstructured data, passively emitted by and collected about human populations' behaviors and beliefs. Each year since 2012, over 1.2 zettabytes (10^21 bytes) of data have been produced: enough to fill 80 billion 16 GB iPhones, which would circle the earth more than 100 times. The volume of this data keeps increasing, and just as a human population with a sudden outburst of fertility gets larger and younger, the proportion of digital data produced recently (the "new baby" data) keeps growing. It has been said many times that 90% of the world's data was created in just the last two years, although the assertion is almost impossible to source or corroborate. The second development is what has been called the "industrial revolution of data". Mike Horrigan at the US Bureau of Labor Statistics defined big data as "non-sampled data, characterized by the creation of databases from electronic sources whose primary purpose is something other than statistical inference".
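As a quick sanity check on the figures above, the iPhone comparison can be reproduced with back-of-envelope arithmetic (a sketch assuming decimal units, i.e. 1 ZB = 10^21 bytes and 16 GB = 16 x 10^9 bytes):

```python
# Back-of-envelope check of the "80 billion 16 GB iPhones" claim.
ZETTABYTE = 1e21                       # decimal zettabyte, in bytes
data_volume_bytes = 1.2 * ZETTABYTE    # ~1.2 ZB produced per year
iphone_capacity_bytes = 16e9           # one 16 GB iPhone

iphones_needed = data_volume_bytes / iphone_capacity_bytes
print(f"{iphones_needed:.1e} phones")  # ~7.5e10, i.e. ~75 billion
```

The result, roughly 75 billion phones, is consistent with the rounded figure of 80 billion quoted in the text.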
…that are critical to essential business processes. Data sources such as email, social media content, corporate documents, and web content may be helpful for adding context to traditional security data, but they are largely unstructured.
Next, a variety of analytics can be performed to extract security insights from these huge data sets. This requires more processing time, and the operation also needs to run asynchronously to the real-time analysis that traditional security intelligence specializes in. Once the analysis is complete, the insights have to be fed back to the real-time components to make the overall solution more effective over time.
Finally, a renewed emphasis needs to be placed on investigative analysis that could be branded as ad hoc before it is codified. Such analysis can capture the specifics of an organization and its business environment, which is very important if a security intelligence solution is to gain the contextual awareness needed to frustrate targeted attacks.

5. Advances in Big Data Analytics

Why must we rush to advanced analytics? First, change is rampant in business, as seen in the multiple "economies" of recent years; analytics helps us discover what changed and how we should react. Second, as we move slowly out of the decline and into the recovery, there are more business opportunities to be seized, and advanced analytics is the best way to identify new customer segments, find the best suppliers, associate products by similarity, understand sales seasonality, and so on. This need means that many organizations are adopting advanced analytics for the first time and are therefore unsure how to go about it. Even with prior experience in data warehousing, reporting, and online analytic processing (OLAP), they will discover that the business and technical requirements are different for advanced forms of analytics.
User organizations are implementing specific forms of analytics, particularly what is sometimes called advanced analytics: a collection of related techniques and tool types, usually including predictive analytics, data mining, statistical analysis, and complex SQL. We could also add data visualization, artificial intelligence, natural language processing, and database capabilities that support analytics (such as MapReduce, in-database analytics, in-memory databases, and columnar data stores).
It may be better to call advanced analytics "discovery analytics", because that is what users are trying to achieve (some people call it "exploratory analytics"). With big data analytics, the user is typically a business analyst who is trying to discover new facts that no one in the enterprise knew before. To do that, the analyst needs large volumes of data with great detail; this is often data that the enterprise has not yet tapped for analytics. For example, a business analyst might comb through several terabytes of detailed data left over from operational applications to get a view of recent customer behavior, and might mix that data with historic data from a data warehouse.
Discovery analytics against big data can be enabled by different types of analytic tools, including those based on SQL queries, data mining, statistical analysis, fact clustering, data visualization, natural language processing, text analytics, artificial intelligence, and so on. This is a long list of tool types, and users should know their analytic requirements before deciding which tool type is appropriate to their needs. Many of these techniques appeared in the 1990s and have been around for years; the difference today is that far more user organizations are actually using them, because most of these techniques adapt well to large, multi-terabyte data sets with minimal data preparation.
Today, where is big data for advanced analytics managed and operated on, and where would users prefer it to be?
• The EDW (enterprise data warehouse) is the most used and preferred platform for analytics: about two-thirds of users surveyed report using an EDW today. Some EDWs, originally designed for reporting, performance management, and OLAP, can also handle advanced analytics in terms of scalability and query performance; some cannot.
• New types of analytic platforms are emerging. A few users report using cloud-based analytic platforms today, and many more would prefer to.
• TDWI expects various types of clouds to become common platforms for analytics within a few years.
• Hadoop is so widely hyped that we discuss it separately below.

[Figure 3]
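The discovery-analytics workflow described above, a business analyst grouping detailed operational records to surface a previously unnoticed pattern, can be sketched in a few lines. This is a toy example with made-up data, not a production tool:

```python
from collections import Counter

# Toy "operational" records a business analyst might explore (made-up data).
purchases = [
    {"region": "north", "product": "firewall", "amount": 1200},
    {"region": "north", "product": "firewall", "amount": 900},
    {"region": "south", "product": "antivirus", "amount": 300},
    {"region": "north", "product": "antivirus", "amount": 250},
    {"region": "south", "product": "firewall", "amount": 1100},
]

# Discovery step: which (region, product) segment appears most often?
segments = Counter((p["region"], p["product"]) for p in purchases)
top_segment, count = segments.most_common(1)[0]
print(top_segment, count)  # ('north', 'firewall') appears twice
```

At enterprise scale the same grouping question would be pushed down into a warehouse or big data platform, but the analytic intent is identical.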
IJCSNS International Journal of Computer Science and Network Security, VOL.16 No.4, April 2016 59
6. Big Data Security Challenges and Risks

Big data presents a great opportunity for enterprises: by tapping into varieties and volumes of data, scientists, product managers, marketers, executives, and others can make better-informed plans and decisions, discover new chances for optimization, and deliver breakthrough innovations. Without the right security and encryption solutions, however, big data can become a really big problem.

[Figure 4]

Although the application of big data analytics to security problems holds significant promise, we have to mention some challenges:
6.1. First challenge, privacy: ensuring that data is used only for the purposes for which it was collected. Until recently, privacy relied largely on technological limitations on the ability to extract, analyze, and correlate potentially sensitive datasets; advances in big data analytics, however, have brought tools to extract and correlate exactly such data, making privacy violations much easier. Big data applications must therefore be developed without forgetting privacy principles and recommendations. Existing regulatory regimes (such as communications-commission rules for telecommunications companies, health data accountability rules, and the Federal Trade Commission's enforcement) have been broad in system coverage and largely open to interpretation. Large-scale collection and storage of data is also attractive to many parties: advertisers and marketers, governments (which may deem this data necessary for national security or law enforcement), and criminals (who would like to steal identities). That is why big data designers need to create suitable safeguards to prevent abuse of these big data stores. [6-1]
6.2. Second challenge, veracity (authenticity; the data provenance problem): it is difficult to be sure that every data item meets the trustworthiness that our analysis algorithms require to produce accurate results. We therefore need to reconsider the authenticity and integrity of the data used in our tools, and we can take advantage of adversarial machine learning and robust statistics to identify and mitigate the effects of maliciously inserted data. [6-2]
6.3. Third challenge, volume (storage): the amount of data created every day on the internet is on the order of exabytes, while the capacity of today's hard disks is in the range of terabytes; the gap is large and will grow. Traditional RDBMS tools are unable to store or process data at this scale, so databases that do not use traditional SQL-based queries are used instead. Compression can also be a good choice for reducing the footprint of data at rest and in memory.
6.4. Fourth challenge, analysis: because the data is generated by several types of online sites, it is huge and heterogeneous in structure, and analyzing it may consume a lot of time and resources. To defeat this, scaled-out architectures can process the data in a distributed fashion: the data is split into small pieces, the pieces are processed on the large number of computers available across the network, and the results are aggregated.
6.5. Fifth challenge, limitations of traditional encryption approaches: although there are many encryption offerings around, most of them address one specific aspect. For example, we can use the transparent data encryption capabilities of our database vendor, but what happens when that data gets exported to big data environments? What about all the other data sources and systems in play? We also have to know whether the vendor stores the keys with the data or not. Some vendors offer big data encryption capabilities that secure only specific big data nodes, not the original data sources fed into the big data environment or the analytics that come out of it; furthermore, the encryption in big data offerings often does not protect configuration information or log files.
6.6. Sixth challenge, reporting: when huge amounts of data are involved, traditional reports that display statistics as raw numbers are hard for human beings to interpret. To get over this, reports need to be represented in a form that can be recognized at a glance, for example through visualization.

7. What is cyber security?

Cyber security, also called information technology security, focuses on protecting computers, networks, programs, and data from unintended access, change, or destruction. By another definition, cyber security is the body of technologies, processes, and practices designed to protect networks, computers, and programs from attack, damage, or unauthorized access.
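Challenge 6.3 above suggests compressing data at rest; as a minimal sketch of that idea, Python's standard zlib module compresses a block of repetitive log-style records (purely illustrative data):

```python
import zlib

# Challenge 6.3 suggests compressing data at rest; zlib is one stdlib option.
record = b"src=10.0.0.5 dst=10.0.0.9 action=deny " * 1000  # repetitive log data

compressed = zlib.compress(record, level=9)
ratio = len(compressed) / len(record)
print(f"{len(record)} -> {len(compressed)} bytes (ratio {ratio:.3f})")

assert zlib.decompress(compressed) == record  # lossless round trip
```

Highly repetitive machine-generated data, such as logs and sensor feeds, typically compresses very well, which is exactly why compression is attractive for big data stores.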
This is where big data comes to its own rescue, through the use of large data sets which enable new generations of algorithms to identify risks, raise alerts, and suggest the right way to handle them. That requires thinking about information security as a core part of the metadata that is captured and governed around information.
Big data security brings a new generation of challenges and a new generation of risks; these require a new generation of cooperation and a new generation of solutions, so that information security is not left to a few people in the IT department.
Securing big data comes with special and exclusive challenges beyond its being a high-value target. Big data security challenges arise because of incremental differences from traditional environments, not fundamental ones. The differences between big data environments and traditional data environments include: the data collected, grouped, and analyzed for big data analysis; the infrastructure used to store and house big data; and the technologies applied to analyze structured and unstructured big data.

[Figure 7: The differences between big data environments and traditional data environments]

First, the data: each data source will likely have its own access restrictions and policies, making it hard to balance appropriate security for all data sources with the need to collect and extract meaning from the data. For example, a big data environment may include a dataset with proprietary research information, a dataset requiring regulatory compliance, and a separate dataset with personally identifiable information (PII). Protecting big data requires balancing analysis with security requirements on a case-by-case basis. Also, many repositories collect data at high volume and velocity from a number of different data sources, each of which may have its own data transfer workflows. Multiple repositories with these connections can increase the attack surface available to an adversary: a big data system receiving feeds from 20 different data sources may present an attacker with 20 viable vectors for attempting to gain access to a cluster.
Second, the infrastructure: another big data challenge is the distributed nature of big data environments. Compared with a single high-end database server, distributed environments are more difficult to secure and more vulnerable to attack. Because big data is distributed geographically, physical security controls need to be standardized across all accessible locations. Scientists in any organization might need access to information, so perimeter protection is both necessary and complicated: it must ensure access for legitimate users while protecting the system from possible attacks. The possibility that server configurations are at risk or unstable also increases.
Third, the technology: big data programming tools, including Hadoop and NoSQL databases, were not originally designed with security in mind. Hadoop, for example, originally did not authenticate services or users, and did not encrypt data transmitted between nodes in the environment, causing weaknesses in authentication and network security. NoSQL databases, meanwhile, lack some of the security features provided by traditional databases, such as role-based access control. The advantage of NoSQL is the flexibility to include new data types on the fly, but defining security policies for this new data is not straightforward with these technologies.
How can we use big data to stop cybercrime? Recent approaches include threat monitoring, automated information collection and intelligence sharing, smarter links between many security systems and layers (including physical protections), and a continuous, real-time loop of monitoring data and behavioral signals. Many cyber security experts think these are necessary and useful weapons in fighting the plague of data breaches. So-called "security analytics" would allow systems to adjust their risk profile automatically (i.e., go on high alert) once any system in the "threat intelligence network" detects a threat, be it malware, a rogue insider, or suspicious log activity.
Beyond big data to data-driven insights for cyber security: big data and analytics are among the most effective defenses against cyber intrusions. Better, faster, actionable security information reduces the critical time from detection to remediation, enabling cyber defense specialists to proactively defend and protect the network. Teradata, for example, delivers a single, automated ecosystem integrating information security, cyber security, network operations data, analytics, and reporting.
Is more cyber defense always better? Attackers continue developing powerful techniques to defeat what are considered highly effective cyber defenses. With the resources available to hackers today, they can easily maneuver around a defense-in-depth strategy to break into data systems. A common response to evolving attacks is to add more security tools, increase the sensitivity of the security tools already in place, or both. Unfortunately, as cyber attacks worsen and businesses respond with greater force, existing staff resources are taxed, yielding less effective security results.
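The "threat intelligence network" behavior described above, in which every connected system raises its risk profile once any one system detects a threat, can be sketched as a simple broadcast loop. The class and function names here are invented for illustration; no real product API is implied:

```python
# Toy sketch of the shared threat-intelligence loop: when any sensor
# reports an indicator, every connected system goes on high alert.
class SecuritySystem:
    def __init__(self, name):
        self.name = name
        self.alert_level = "normal"

    def on_threat(self, indicator):
        # React to a peer's detection by raising the local risk profile.
        self.alert_level = "high"
        self.last_indicator = indicator

network = [SecuritySystem("firewall"), SecuritySystem("ids"), SecuritySystem("siem")]

def share_indicator(indicator):
    """Broadcast a detected indicator to every system in the network."""
    for system in network:
        system.on_threat(indicator)

share_indicator("malware-hash-abc123")
print([s.alert_level for s in network])  # ['high', 'high', 'high']
```

Real deployments exchange indicators over standardized feeds rather than in-process calls, but the control loop is the same: detect once, alert everywhere.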
Cyber strategy: evolving cyber threats demand a new generation of cyber strategy. Hackers today have evolved from hobbyists into professionals. Well trained and well funded, they run the range from social activists and state-sponsored operators to criminal syndicate members. As a result, cybercrime is on the rise, costing $118 billion in business losses annually and climbing. Security professionals used to be confident that they could lock down and secure their networks to prevent cyber attacks; now the attitude is that cyber attacks are expected, and the burden is on IT teams to find out how to detect and remediate an attack before data is compromised. [7-3]

9. The Top Security Tools in the Fight Against Cybercrime

Before we talk about the tools and technologies, let us recognize the different types of data we are processing. There are three:
• Structured data: the data used in traditional RDBMSs, organized into well-defined structures with a schema that is checked during write and read operations, e.g. data in RDBMSs such as Oracle or MySQL Server.
• Unstructured data: it is important to understand that about 80% of the world's data is unstructured or semi-structured. This is data in its raw form that cannot be processed using an RDBMS, e.g. Facebook and Twitter data; it has no fixed structure and can take any form, such as web server logs, e-mail, or images. [9]
• Semi-structured data: data that is not strictly structured but has some structure, e.g. XML files.

[Figure 8: Kinds of data]

The challenges faced in processing big data are overcome using various techniques. The most popular include:
• Regression: predicting the value of a dependent variable by estimating the relationships among variables using statistical analysis.
• Nearest neighbor: values are predicted from the values of the records that are nearest to the record to be predicted.
• Clustering: grouping records that are similar by measuring the distance between them in an n-dimensional space, where n is the number of variables.
• Classification: identifying the category or class to which a value belongs, on the basis of previously categorized values. [9-1]
Several open source tools exist which help in taming big data. Some of the top tools are MongoDB (a cross-platform document-oriented database management system), Hadoop and MapReduce (explained below), Orange (a Python-based tool for processing and mining big data), Weka (a Java-based tool for processing large amounts of data, with algorithms used in data mining), and SAP HANA (a proprietary in-memory RDBMS capable of handling large amounts of data). [9-2] Let us take a closer look at some of these tools.

A. Hadoop: Hadoop is a software framework that can be installed on a commodity Linux cluster to enable large-scale distributed data analysis. No hardware modification is needed other than possible changes to meet the minimum recommended RAM, disk space, etc. per node. The initial version of Hadoop was created in 2004 by Doug Cutting (and named after his son's toy elephant), and Hadoop became a top-level Apache Software Foundation project in January 2008. It provides the robust, fault-tolerant Hadoop Distributed File System (HDFS), inspired by Google's file system, as well as a Java-based API that allows parallel processing across the nodes of the cluster using the MapReduce paradigm. Code written in other languages, such as Python and C, can be used through Hadoop Streaming, a utility which allows users to create and run jobs with any executable as the mapper and/or the reducer. Furthermore, Hadoop comes with Job and Task Trackers that keep track of the programs' execution across the nodes of the cluster. In short, Hadoop is a framework that allows distributed processing of large data sets across clusters of computers using simple programming models. There are four modules in Hadoop:
1. Hadoop Common: the common utilities that support the other Hadoop modules.
2. Hadoop Distributed File System (HDFS): a distributed file system that provides high-throughput access to application data.
3. Hadoop YARN: a framework for job scheduling and cluster resource management.
4. Hadoop MapReduce: a YARN-based system for parallel processing of large data sets.
There are some drawbacks to HDFS use: it handles continuous updates (write-many workloads) less well than a traditional relational database management system.
Why Hadoop? Hadoop is very useful, but it is not the only way of interacting with the data.
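Hadoop Streaming, mentioned above, runs any executable as the mapper and/or reducer, exchanging key-value lines over stdin/stdout. The classic word count can be sketched in that style; this is a local simulation of the map, shuffle, and reduce phases, not an actual Hadoop job:

```python
from itertools import groupby
from operator import itemgetter

# Word count in the MapReduce style used by Hadoop Streaming: the mapper
# emits (word, 1) pairs, the framework sorts them by key ("shuffle"),
# and the reducer sums each run of identical keys.

def mapper(lines):
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reducer(sorted_pairs):
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)

text = ["Big data security", "big data analytics"]
shuffled = sorted(mapper(text))   # stands in for Hadoop's shuffle/sort phase
counts = dict(reducer(shuffled))
print(counts)                     # {'analytics': 1, 'big': 2, 'data': 2, 'security': 1}
```

Under Hadoop Streaming the same mapper and reducer would be separate scripts reading stdin and writing stdout, with the framework handling distribution, sorting, and fault tolerance.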
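Returning briefly to the mining techniques listed in Section 9, nearest neighbor is simple enough to sketch in a few lines. The data and labels here are toy values chosen for illustration:

```python
from math import dist

# 1-nearest-neighbour prediction: label a new point with the label of the
# closest known point in n-dimensional space (Euclidean distance).
known = [
    ((0.9, 120.0), "benign"),
    ((0.2, 15.0), "benign"),
    ((0.95, 900.0), "attack"),
    ((0.8, 850.0), "attack"),
]

def predict(point):
    _, label = min(known, key=lambda kv: dist(kv[0], point))
    return label

print(predict((0.9, 880.0)))  # 'attack': closest known point is (0.95, 900.0)
```

Real deployments use k > 1 neighbours and normalize each dimension first, but the distance-based voting idea is the same.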
One reason Hadoop is the industry standard for handling big data is that it is very scalable: the more computing resources you throw at it, the better the performance and the higher the data processing capability you receive. Working with Hadoop at first might feel a little discouraging, and the barriers to use can appear high, especially because of the Java API, but adding resources is not hard: scaling a five-node cluster to a thousand-node cluster does not greatly increase the load on the administrator. You do not have to worry about the thorny issues associated with big data spanning multiple disks and multiple machines; Hadoop takes care of resilience, fault tolerance, and scalability. If a disk or node, or even a whole rack of nodes, goes down, your data is replicated across the cluster in a way that prevents any loss, and running jobs are likewise fault tolerant, restarting tasks when necessary to ensure that all the data is correctly processed. In addition to Hadoop itself, there are multiple open source projects built on top of Hadoop; the major ones are described in Table 1.

Table 1: Projects built on top of Hadoop
• Hive: Created for people with an SQL background; a data warehouse system developed at Facebook, used for ad hoc querying with an SQL-like query language and for more complex analysis. Users define tables and columns, and data is loaded and retrieved for reports and analysis. It is designed for batch processing rather than real-time queries, and it processes completely structured data, not unstructured data.
• Pig: A high-level data flow language (Pig Latin) and execution framework whose compiler produces sequences of MapReduce programs for execution within Hadoop. It is for batch processing and can be used for both structured and unstructured data. (See https://developer.yahoo.com/blogs/hadoop/comparing-pig-latin-sql-constructing-data-processing-pipelines-444.html)

B. MapReduce: Beyond Hadoop's MapReduce we can also use HBase, in-memory databases, analysis frameworks such as Spark and Shark, and graph databases such as InfiniteGraph and Titan. This is by no means a complete list; the point is that different use cases need different tools and different ways of interacting with big data. Hadoop can be used for almost anything (it is the hammer for which everything looks like a nail), but it is not necessarily the perfect tool for every use case. MapReduce highlights parallelism in data retrieval: jobs are parceled out, or "mapped", to a number of subsidiary nodes, with results handed back up ("reduced") to the originator of the tasks.

C. NoSQL: Platforms such as Cassandra, MongoDB, and others. These techniques directly address some of the limitations of traditional relational data stores when analysis of a body of data is the priority. They can be highly distributed systems, developed in many cases to deliver better performance in data management and retrieval at Internet scale, and they tend to have highly distributed, fault-tolerant architectures on commodity hardware. They are well suited to the analysis of an entire body of recorded data to discover patterns, trends, and anomalies, which makes them compelling candidates for handling large and diverse bodies of security-relevant data. Historically, businesses have stored data in relational databases with some normalization, and IT groups are content maintaining and querying in this model; with the new tidal wave of data that organizations are looking to store and exploit, however, NoSQL solutions are becoming much more essential because of their resilience and scalability. All of the NoSQL solutions fundamentally offer a storage system that, like other technologies, can sit at the top of this hierarchy.
There are two important considerations when analyzing large data sets, and Hadoop addresses both well:
• The ability to run processes across multiple nodes in the data center, with a file system that can present the results in a single view. Many companies have tried their own techniques and technologies to achieve this, but with Hadoop these two requirements are satisfied in a dependable, open-source way.
• The barriers to using Hadoop (understanding the API and architecture, setting it up, and using the ecosystem) have become considerably lower. Abstractions over MapReduce hide many of the issues that initially made Hadoop difficult to use, and it is becoming easier to start quickly, primarily because additional companies are creating tools and APIs with their own optimizations and stacks that let users focus more on business problems and less on the infrastructure.

[Figure 9]

Another advantage of NoSQL is its great tolerance and flexibility in embracing a wider variety of data and data structures than relational systems, which often require data to conform to a defined schema at (or before) ingestion.
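That schema-on-read flexibility can be made concrete with a toy sketch in plain Python (not any particular NoSQL API): heterogeneous "documents" are stored as-is, and structure is imposed only when a question is asked.

```python
# Schema-on-read, sketched in plain Python: heterogeneous documents are
# stored as-is, and structure is imposed only at query time.
events = [
    {"type": "login", "user": "alice", "ip": "10.0.0.5"},
    {"type": "alert", "severity": "high", "rule": "port-scan"},
    {"type": "login", "user": "bob"},            # missing fields are fine
]

# Impose structure at query time: count logins per user, tolerating gaps.
logins = {}
for doc in events:
    if doc.get("type") == "login":
        user = doc.get("user", "unknown")
        logins[user] = logins.get(user, 0) + 1

print(logins)  # {'alice': 1, 'bob': 1}
```

A relational store would have rejected the second and third records at ingestion for not matching a fixed schema; here they are kept and interpreted later.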
When structure is needed to define a dataset for analysis, NoSQL techniques allow more precise definitions later in the process. This further suggests how NoSQL techniques may enable security organizations to take in a wider variety of data, such as unstructured data from both internal and external sources, and binary content such as images or video (which may be managed via the application of structured or semi-structured metadata).

[Figure 11: Comparison between a traditional RDBMS table and a NoSQL database table]

10. Applications of Big Data Techniques in Learning

Learning originally took place in the classroom and was based on three models:

Table 2: Classroom learning models
• Behavioral: based on changes in student behavior to assess the learning outcome.
• Cognitive: relies on observing the active involvement of the teacher in learning.
• Constructivist: students learn on their own, from the knowledge available to them (learning in a network).

Recent learning methods depend greatly on online activities (online learning management systems) to improve the learning experience. Students have started using smartphones to access learning content, as the learning environment has become accessible anywhere through the internet. Data is available from student activities and also from the educational institutions that use applications to manage courses, classes, and students. This data is so huge that educational institutions have started using big data technologies to process it.
• Application evolution: the purpose of big data and big data analysis is to describe the datasets and analytical technologies used in large-scale complex programs, which need to be analyzed with advanced analytical methods. Data-driven applications have appeared over the past decades; for example, business intelligence became a prevailing technology for business applications as early as the 1990s, and network search engines based on massive data mining emerged in the early twenty-first century. Some potential and influential applications from different fields, and their data and analysis characteristics, are as follows:
• Evolution of commercial applications: past business data was mostly structured data, collected by companies from legacy systems and stored in RDBMSs. The analytical technologies used in such systems, prevailing in the 1990s, were fairly simple: reports, dashboards, special queries, search-based business intelligence, online transaction processing, interactive visualization, scorecards, predictive modeling, and data mining.
• Evolution of network applications: text analysis, data mining, and webpage analysis technologies have been applied to the mining of email contents and to building search engines. Nowadays most applications are web-based, whatever their application field and design goals. Network data accounts for a major percentage of global data volume, and the web has become a common platform for interconnected pages and all kinds of data (text, images, videos, and interactive content). Many advanced technologies for semi-structured or unstructured data emerged at the right moment: image analysis, for example, may extract useful information from pictures (e.g. face recognition), and multimedia analysis technologies can be applied to automated video systems for business, law enforcement, and military applications. Different users may publish daily and celebrity news and their social and political opinions, providing different applications with timely feedback.
• Evolution of scientific applications: scientific research in many fields generates massive data from high-throughput sensors and instruments, in areas such as genomics, oceanography, astrophysics, and environmental research. The U.S. National Science Foundation (NSF) has announced a big data research initiative to promote efforts to extract knowledge and insights from large and complex collections of digital data.

[Figure 12: Application evolution]
Conclusion:

Big data technologies are changing the whole world. Everything from the Internet of Things to the gathering of both more qualitative and more quantitative data will lead to better decision-making and insight, and by leveraging big data technologies effectively, organizations can be more efficient and more competitive. Privacy advocates and data organizers criticize the trajectory of big data as they watch the growing ubiquity of data collection and the increasingly aggressive uses of data enabled by powerful processors and unlimited storage, while researchers, businesses, and entrepreneurs point to concrete or anticipated innovations that may depend on the default collection of large data sets. The quick growth of the internet has also brought with it an exponential increase in the type and frequency of cyber attacks, and many well-known cyber security solutions are in place to counteract them.
The huge argument today is how privacy risks should be weighed against big data rewards, especially after the recent controversy over leaked documents revealing the massive scope of data collection and analysis. Big data creates a tremendous chance for the world economy, not only in the field of security but also in areas from marketing and credit risk analysis to medical research and urban planning. At the same time, the expected benefits of big data are tempered by concerns that advances in the data ecosystem will upend the power relationships between government, business, and individuals, and lead to racial or other profiling, discrimination, over-criminalization, and other restrictions of freedom.
Finally, it is very important to understand the security and privacy implications of big data implementations that support functions other than information security. Specifically, security executives should be aware of how big data increases the attack surface available to hackers and understand how to protect against linkability threats.

References:
[1] "Motivation for Big Data", Pro Apache Hadoop, Sameer Wadkar and Madhu Siddalingaiah, 08 September 2014. http://link.springer.com/chapter/10.1007/978-1-4302-4864-4_1#page-2
[2] "Undefined By Data: A Survey of Big Data Definitions", Jonathan Stuart Ward and Adam Barker, School of Computer Science, University of St Andrews, UK. http://arxiv.org/pdf/1309.5821v1.pdf
[3] "Big Data Analytics", Philip Russom, TDWI Research, Fourth Quarter 2011. http://www.tableau.com/sites/default/files/whitepapers/tdwi_bpreport_q411_big_data_analytics_tableau.pdf
[4] "Big Data Analytics for Security", Alvaro A. Cárdenas, Pratyusa K. Manadhata, and Sreeranga P. Rajan. http://www.infoq.com/articles/bigdata-analytics-for-security
[5] http://www.umuc.edu/cybersecurity/about/cybersecurity-basics.cfm
[6] http://whatis.techtarget.com/glossary/Security-Threats-and-Countermeasures
[7] Teradata, http://www.teradata.com/Cyber-Security and http://bigdata.teradata.com/US/Success-Stories/Innovations-and-Insights/
[7-1] http://whatis.techtarget.com/definition/cybersecurity
[8] "Securing Big Data - Part 1", Steve Jones, January 06, 2015.
[9] http://service-architecture.blogspot.com/2015/01/securing-big-data-part-2-understanding.html
[10] "Unstructured Data in a Big Data Environment". http://www.dummies.com/how-to/content/unstructured-data-in-a-big-data-environment.html
[11] http://ictactjournals.in/paper/IJSC_Paper_6_pp_1035_1049.pdf
[12] ICTACT Journal on Soft Computing: Special Issue on Soft Computing Models for Big Data, July 2015, Volume 05, Issue 04.
[13] "Application of Big Data in Education Data Mining and Learning Analytics: A Literature Review", Katrina Sin and Loganathan Muthu, Faculty of Education and Languages, Open University Malaysia. http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-11-12-1
[14] https://developer.yahoo.com/blogs/hadoop/comparing-pig-latin-sql-constructing-data-processing-pipelines-444.html
[15] M. Chen et al., "Big Data: Related Technologies, Challenges and Future Prospects", SpringerBriefs in Computer Science. http://link.springer.com/chapter/10.1007/978-3-319-06245-7_6#page-2