
Big Data Management Challenges

Article in International Journal of Advanced Trends in Computer Science and Engineering · February 2020
DOI: 10.30534/ijatcse/2020/102912020



ISSN 2278-3091
Volume 9, No. 1, January – February 2020
Sabrina Šuman et al., International Journal of Advanced Trends in Computer Science and Engineering, 9(1), January – February 2020, 717 – 723
Available Online at http://www.warse.org/IJATCSE/static/pdf/file/ijatcse102912020.pdf
https://doi.org/10.30534/ijatcse/2020/102912020

Big Data Management Challenges

Sabrina Šuman 1, Patrizia Poščić 2, Maja Gligora Marković 3


1 Polytechnic of Rijeka, Business Department, Croatia, ssuman@veleri.hr
2 University of Rijeka, Department of Informatics, Croatia, patrizia@inf.uniri.hr
3 University of Rijeka, Faculty of Medicine, Department of Medical Informatics, Croatia, majagm@medri.uniri.hr


ABSTRACT

The emergence of new data types in the big data era implies the need to analyse and exploit them to gain valuable insight. Traditional platforms cannot fully meet the analytical needs of a company if support for unstructured data types is needed. This paper gives an overview and synthesis of areas related to big data technologies, with a series of guidelines for adopting the appropriate software, storage structure, and efficient deployment for big data management. A broad data management context is presented through a conceptual model of business performance management in a modern data management era.

Key words: Big data, Big data management, Hadoop, Big data processing, Data lake

1. INTRODUCTION

The ICT field is extremely dynamic, with the frequent emergence of new technologies. In the last decade there have been major changes in data management with the emergence of new types and sources of data, summed up in the big data concept. Big data are complex, layered, large amounts of data, and big data technologies are usually considered in terms of the standard 3Vs: Volume, Variety (many different data types from different sources) and Velocity, with two additional Vs: Veracity (data quality) and Value (value for business). Big data can also be structured, but are primarily semi-structured and unstructured. It is primarily the size and diversity of the data that drives new analytics approaches [1] and imposes special ways of retrieving, transforming, preparing, storing and analysing data [2][3]. Utilizing big data technology has vast potential for improving both personal life and business competitiveness [4][5]. At the same time, the large volume and variety of data types make it difficult to extract the right values from data [6], and data management in big data is extremely complex. The value of big data lies in the way the information obtained through analytical processing is used: for example, to reduce costs, shorten business processes and query response times, develop new products faster and better, eliminate failures and errors, improve customer relationships, assess business risk, and make better-quality decisions. Problems related to big data are most commonly related to storage, processing, and management in general [7][8]. There are also issues related to data ownership, privacy and security (see [9] for security issues and algorithms), quality (a large amount of data from different sources should be readily available for analysis in a short time) and timeliness (larger amounts of data mean longer analyses, and streaming poses its own analytical issues) [10]. Big data also brings new storage structures (e.g. data lakes, distributed file systems, non-relational databases), the need for new competencies of existing experts, and new expert profiles (big data analyst, big data engineer, big data architect and so on) [3].

In order to maintain and increase a company's ability to use technology for successful decision-making, it is necessary to build an innovative management platform covering all the new data types, new methods of storage and processing, and the application of intelligent methods (knowledge and information retrieval, pattern recognition, machine learning, optimization methods, etc.) [11]. As Russom's research [12] shows, employees are aware of the changes and issues in big data: 82% of the 225 respondents believe that the data in their company is evolving in terms of diversity of structure, type and sources, and in the way it is managed and used in business (20% consider the change drastic, while 62% consider it moderate). In the same survey, one of the major issues highlighted was the incompatibility of big data types and structures with relational databases (68% of respondents). This requires a revision of the data management strategy and good information regarding the potentials and disadvantages of big data technology, tools, platforms and ways of implementation. The purpose of this paper is to provide arguments and guidelines for the adoption of an appropriate big data management strategy, the selection of software tools and storage structure, and efficient deployment, through an overview of the field of big data technologies. The aim of the paper is to provide a synthetic


overview of the area relevant to the establishment of a modern data architecture. After the introduction, a review of the concepts important for understanding the wider area of modern data architecture is given: big data technology from the aspect of data types, storage types, analytical processing, and overall big data management strategy. The concepts of warehouse modernization, the Hadoop ecosystem and the data lake are discussed in particular. In the results and discussion section, examples of modern data architecture are given, as well as a synthetic representation of the big data processing phases, with a description of each phase and examples of the tools used.

The motivation for this research stems from the current issues of managing new forms of data, and concerns all data management phases: from data sources, through cleansing, analysis and visualization, to storage. Companies are also faced with the problem of choosing the right solutions to manage their data, so a review of possible solutions with a series of guidelines can be beneficial.

2. DATA MANAGEMENT OVERVIEW IN BIG DATA ERA

In this section we review some traditional and modern elements of data management: data warehouses, the Hadoop framework, MapReduce, Spark and the data lake.

2.1. Data warehouses role in a big data era
Data warehouses were created out of the need to integrate the contents of different databases and other data sources over time, and to access these data effectively for analytical processes. A data warehouse is a centralized, cleaned and integrated organization of data from different sources. Today, the following questions can be asked: can data warehouses meet data management needs in big data? Is a data warehouse needed in the big data era? Are big data management solutions a replacement for data warehouses?

First of all, a big data solution is a technology that involves the storage and management of big data, while a data warehouse is an architecture. Today there are companies that have both a big data solution and a data warehouse, companies with no big data management solution but with a data warehouse, and companies without a data warehouse but with a big data management solution. Depending on the business activity and business model of the company, each of these scenarios may or may not be successful. As long as a company needs reliable, consolidated, relevant data for decision-making and management at all levels, it also needs a data warehouse [13]. Yet the traditional data warehouse provides the basis for reporting and analytics of structured data, and probably does not represent the most cost-effective way to store all kinds of data [14]. The data warehouse concept should evolve in the big data era, because it does not solve all the issues related to analytics and decision support.

One of the changes in business requirements is the need for analysing unstructured and semi-structured data, performing streaming analytics, network analytics, and handling other phenomena and needs related to big data. The data are generated in huge quantities, often incompatible with the relational model and lacking clear ownership, so it is difficult to manage and store them in a rigid structure. Often, many resources were spent on building a warehouse without thinking about the data and analytical needs, and many implementations were unsuccessful, mostly as a result of bad, rigid designs. More modern and agile design and implementation techniques show greater success, customer satisfaction, and greater return on investment, because they allow for flexible upgrades and changes that result from changes in business requirements.

In response to the need to analyse new types of data in the big data era, many innovative tools and techniques have been developed to store and process such data. The basic idea behind these tools and techniques is that each company accesses its data in a customized, personalized way that meets its own specifics and requirements. Thus, data warehousing needs to evolve so it can adapt to and coexist with other analytical solutions that include the use of new data types and new data sources. This does not mean that there will be no need to store and manage structured data in relational structures, but that companies will use different forms of storage, management, and data processing. Using and managing big data is a separate need and an additional business potential that cannot replace the need for warehouses and warehousing [13].

2.2. Hadoop framework
The Hadoop framework is becoming the standard technology for storing and processing large amounts of different data (big data). The Hadoop platform allows tasks to be performed on multiple computers (a cluster) and is optimized for processing large amounts of data. The Hadoop architecture consists of:
• Hadoop Common Package – contains the Java Archive Files (JARs) and scripts required to run and manage all Hadoop modules;
• MapReduce processing mechanism – a YARN-based system for parallel processing of large data sets;
• Hadoop Distributed File System (HDFS) – a distributed, scalable and portable file system which stores large amounts of data (GB and TB) on multiple computers;
• YARN – a central platform for managing operations, security and data across Hadoop clusters [15], [16].
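The HDFS idea of splitting a file into fixed-size blocks replicated across several nodes can be sketched with a small stdlib-only simulation. This is only an illustration: the block size, replication factor and node names below are toy values (real HDFS defaults to 128 MB blocks and 3 replicas, and the NameNode's placement policy also considers racks).

```python
# Toy illustration of HDFS-style block placement: a file is split into
# fixed-size blocks, and each block is replicated on several nodes.
BLOCK_SIZE = 8            # bytes; real HDFS defaults to 128 MB
REPLICATION = 2           # real HDFS defaults to 3
NODES = ["node1", "node2", "node3"]

def place_blocks(data: bytes):
    """Split data into blocks and assign each block to REPLICATION nodes."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    placement = {}
    for idx, block in enumerate(blocks):
        # round-robin placement; a real NameNode also considers rack topology
        chosen = [NODES[(idx + r) % len(NODES)] for r in range(REPLICATION)]
        placement[idx] = (block, chosen)
    return placement

placement = place_blocks(b"big data stored as replicated blocks")
print(len(placement))  # 36 bytes in 8-byte blocks -> 5 blocks
```

Losing one node leaves every block recoverable from its remaining replica, which is the property MapReduce relies on when it reruns failed tasks.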

Hadoop is a platform that offers a solution for many limitations of past technologies, such as storage constraints and large-volume data processing capabilities [17]. It supports multiple data formats – structured, semi-structured and unstructured. It is open source software with low implementation costs and a low learning curve. Using NoSQL solutions such as Apache Cassandra helps eliminate performance, cost and availability issues for big data applications. To manage such data, it is convenient to use Apache Hive, data warehouse software built on Hadoop, which allows reading, writing and managing large databases stored in distributed storage with SQL support [3].

The Hadoop ecosystem provides a variety of open source and commercial technologies for processing fast, interactive BI queries, for data retrieval and research, and for sophisticated analytical processes such as testing predictive models [18]. Today, these technologies enable users to run queries and analyses within the Hadoop cluster and/or cloud without having to move data to a data warehouse, data marts, or a stand-alone BI server [18]. A Hadoop cluster consists of many parallel machines where large data sets are stored and processed. Client computers send tasks to this "cloud" of computers and get results; storage and data processing take place within it. Different users can send computing tasks to Hadoop from individual clients (their own machines at locations remote from the Hadoop cluster). The linear scalability offered by Hadoop clusters, combined with flexible cloud storage scalability, can enable organizations to be agile and flexible in expanding computing power in response to immediate BI and analytic needs. Companies can also improve security management by using security procedures that are set on the sources, as preparation and processing take place where the data are located [18].

2.3. MapReduce
MapReduce is the programming framework for applications handling huge amounts of both structured and unstructured data stored in the Hadoop Distributed File System (HDFS). Each MapReduce job works in two phases: the map phase, which maps the input data to key/value pairs, and the reduce phase, which takes the key/value pairs and produces the desired output by applying its own algorithms [19]. MapReduce algorithms can be written in many languages (Java, C++, or Python) and their tasks are easy to run. MapReduce can handle petabytes of data from HDFS on one cluster at maximum processing speed. MapReduce takes care of failures and allows retrieval of a redundant copy of the data being processed. It moves the processing to the data in HDFS, not vice versa: processing tasks can run on the physical node where the data are found, which contributes significantly to Hadoop's processing speed [20].

2.4. Spark
Apache Spark is a distributed, clustered open source computing platform and a big data processing framework. It is a framework for fast processing of large amounts of data that can be used for practically all types of data processing (batch processing, interactive analysis, stream processing, machine learning and graph computing) [21]. It provides a fast general-purpose computing environment with sophisticated analytical capabilities that enable the development of analytic applications written in Java, Scala, Python or R [21]. Spark can run independently, on a Hadoop cluster, or in a Mesos environment, and at run time it can connect to HDFS, Amazon S3, Cassandra, and Apache HBase.

Apache Spark stands out for its speed, due to its ability to perform in-memory processing, while Hadoop MapReduce is more effective for disk-based processing applications. If linear processing of large data sets is required, the advantage should be given to Hadoop MapReduce, while Spark provides fast performance, real-time analytics, graph processing, machine learning and more. In many cases Spark can surpass Hadoop MapReduce, and it is fully compatible with the Hadoop ecosystem [22]. It is suitable for use by a data scientist or a statistician with little or no knowledge of cluster computing, usually through interactive shells similar to those of MATLAB or R. It is particularly suitable for interactive data mining of large data sets in clusters [23].

2.5. Data lake
The data lake is a central repository for all organization data, without a predefined data schema. The goal is to collect all the data before they are potentially lost. It provides data consolidation and a highly customizable analytical approach. The lake consists of a distributed, scalable file system (HDFS or Amazon S3) and one or more dedicated processing and query tools such as Apache Spark, Drill, Impala or Presto [24]. Upgrading an existing data warehouse with a data lake (creating a hybrid architecture) is a positive change for businesses. Advantages of augmenting the data warehouse include:
• Great savings in storage costs – scaling architectures (e.g. Hadoop, AWS S3) can store unprocessed data in any format at a much lower cost than data warehouses.
• Significantly accelerated processing – a flexible data lake architecture enables faster data loading and parallel processing, resulting in faster analytical insight.
• Maximized efficiency – spending less time on low-value business activities (ETL, for example) and making better use of resources for strategic goals of high business value.
• More valuable business insights, obtained faster and from a larger amount of data (lower storage costs allow more data to be stored, leading to more accurate trends, better forecasts, etc.) [25].

A comparison of the classical data warehouse and the data lake, based on research data from [25][26], is given in Table 1.
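The two phases described above can be sketched as a local, single-process simulation in Python. This is a word count, the classic MapReduce example; it is plain stdlib code, not an actual distributed Hadoop job, and the `defaultdict` grouping merely stands in for Hadoop's shuffle/sort step between the phases.

```python
from collections import defaultdict

# Local simulation of the two MapReduce phases for a word count.
# A real Hadoop job distributes these steps across cluster nodes.

def map_phase(lines):
    """Map: emit (key, value) pairs - here (word, 1) for every word."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce: group pairs by key and apply an aggregation (summing)."""
    grouped = defaultdict(int)
    for key, value in pairs:
        grouped[key] += value
    return dict(grouped)

lines = ["big data needs big storage", "data lakes store raw data"]
counts = reduce_phase(map_phase(lines))
print(counts["data"])  # 'data' appears 3 times across the two lines
```

Because each map call touches only its own input split and each reduce call only its own key group, both phases parallelize naturally, which is what lets Hadoop rerun just the failed tasks after a node failure.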

Table 1: Comparison of a classic data warehouse and a data lake.

Data warehouse | Criterion | Data lake
structured, processed; preparation requires IT department assistance, administrator permissions, etc. | data and data preparation | structured and unstructured, in its source (raw) form; provides self-service ad-hoc transformation without administrator permissions
the data are processed before being entered into the data warehouse | processing | the data are in the original format and are processed as needed
on large databases; very expensive for large amounts of data | storage | designed for a large amount of data at a low cost
fixed configuration | flexibility | flexible; reconfiguration possible as needed
mature and steady | security | in development
business professionals | for whom is it intended? | data scientists, data engineers
moderate scaling, but at high cost | scaling | large scaling at a low cost
efficiently utilizes storage and processing capabilities, but at high cost | cost / efficiency | efficiently utilizes storage and processing capabilities at a low cost
quality and security of data are easily managed | data management | requires an approach of creating metadata to raise quality, security and privacy
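The schema-on-read behaviour contrasted in Table 1 can be illustrated with a minimal stdlib-only sketch. The file names and record fields below are hypothetical, and a real lake would sit on HDFS or S3 rather than a local temp directory: the point is only that raw records are landed as-is, and structure is imposed at read time rather than at load time.

```python
import json
import tempfile
from pathlib import Path

# Minimal schema-on-read illustration: heterogeneous raw records are
# landed unchanged (data lake style); structure is applied when reading.
lake = Path(tempfile.mkdtemp())

raw_events = [
    {"user": "ana", "action": "login"},                   # no amount field
    {"user": "ivo", "action": "purchase", "amount": 40},
    {"user": "ana", "action": "purchase", "amount": 25},
]
for i, event in enumerate(raw_events):
    (lake / f"event_{i}.json").write_text(json.dumps(event))  # store raw

def read_purchases(lake_dir):
    """Apply structure at read time: keep only purchase events and
    default missing fields, instead of rejecting records at load time."""
    rows = []
    for path in sorted(lake_dir.glob("*.json")):
        record = json.loads(path.read_text())
        if record.get("action") == "purchase":
            rows.append((record["user"], record.get("amount", 0)))
    return rows

total = sum(amount for _, amount in read_purchases(lake))
print(total)  # 65
```

A warehouse would instead validate and conform every record against a fixed schema before loading, which is the "processed before being entered" row of Table 1.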

Russom's research [12] shows that out of 225 respondents, 90% know the concept of data lakes; 24% think that a Hadoop data lake has a really high impact on the success of the data management strategy in their company, 32% consider the impact moderate, while 44% of respondents have not considered the question at all. The users consider the following items as the ones that would benefit most from a data lake implementation (Table 2).

Table 2: Data lake's implementation positive impact [12]
Advanced analytics (data mining, machine learning, complex SQL)  49%
Data exploration and knowledge discovery  49%
Sources of big data for analytics  45%
Data warehouse extension  39%
Data retention for storage  36%
Reducing the cost of data storage  34%
A possibility of using the data also for nontechnical user types  24%
Accepting unstructured data  21%

As barriers to the implementation of Hadoop data lakes, users most often reported reasons related to data management, security, and a lack of knowledge and skills related to Hadoop and big data technology.

2.6. Other elements of a modern data architecture
Analytics / sandbox environments – This environment almost completely contradicts the easy-to-manage BI/DW environment, with its predictable workload supporting classical managerial reporting of the "what happened" type of business questions. It represents a research environment with very unpredictable load and usage patterns. In this environment, data scientists should have the freedom to experiment with new data types from different sources, new methods of data transformation, and new analytical models, in order to get new valuable insights from data and build predictive business models. It is loosely managed and allows data scientists to use whatever tools they prefer for research, analysis, and analytical modelling [27].

Data lab – Big data and advanced analytics require different technologies and approaches. An analysis may require data that are not available in the warehouse, and models can be CPU-intensive and create problems for other applications running at the same time. There may be conflicts between a warehouse administrator, who wants a carefully controlled environment, and analysts, especially data scientists, who want maximum flexibility. That is why data labs can be created, where users can make changes, unlike in the original storage tables. The data warehouse administrator creates lab owners for specific areas, such as marketing and sales; each lab owner identifies the people who have access to the data lab. The administrator, in collaboration with the users, determines how much workspace is allocated to each lab and sets its expiration date [27].

3. RESULTS AND DISCUSSION

Based on the changes and challenges related to modern data management reported in the previous section, a visualised and summarized overview and a more detailed insight into the modern data architecture is given in Figure 1. It starts with the general data management cycle (with some modern architecture elements added). Then the possible benefits from

introducing modern data platform elements, a number of possible strategies and implementation solutions are discussed. At the end of this chapter, the big data processing phases are described in detail, along with descriptions of the basic processes and of the software support needed for their realization.

3.1. Data management context in big data era
In order to show the wider context of data management in the big data era, a synthetic graphical overview of the performance management cycle of an enterprise is given. A conceptual model, i.e. a framework (Figure 1), has been developed that specifies the components, that is, the areas that participate in the efficient business of the company. The model is separated into three interconnected areas or subsystems: a subsystem representing internal or external data and/or the systems generating such data; a subsystem for "preparing" and storing data (cleaning, consolidating, structuring, aggregating, storing) to serve the company's analytical needs as efficiently as possible; and an analytic subsystem where the data are "exploited" for the implementation of previously defined goals. In the last subsystem there are a number of activities in which a whole spectrum of different tools is used to extract the potential of different types of data.

The need for continuity in running such performance management cycles is stressed, so that the company can adjust its goals and KPIs and select the tools that deliver the best results. The storage methods and the analytical tools and methods used are divided into traditional BI (storage in data warehouses and analytical processes mostly over structured data) and storage and analytical processes typical of big data (data lakes and sandboxes). For each analytical category there is a full range of techniques and tools [28] [29].

Figure 1: Business Performance Management Cycle

3.2. Synthesis of big data processing phases
Big data processing can be seen as six dependent phases. First, data are generated in different applications and systems (internal and external data in different formats and structures).

The second phase includes all the steps needed to acquire the generated data from different sources (web scraping, web crawling, APIs, ...).

The third phase involves cleaning and converting the data from multiple sources. Today, data management systems are expected to be able to process data both in real time (streaming data) and in batch-aggregate mode, and this dynamic must also be reflected in the processes of preparation, cleaning, and other data transformation actions.

The fourth phase is storage: usually, all data types are permanently stored in some type of file system or database, or local storage is combined with cloud storage services.

In the fifth phase, different tools, methods and techniques for analysing and using data are applied to obtain information important for supporting business activities and making decisions. While the data warehouse is used to process and analyse structured data, the Hadoop cluster is used for processing and transforming unstructured and semi-structured data into structured data. For analytical processes, the Hadoop ecosystem has multiple extensions for queries, data processing, storage in NoSQL databases (e.g. HBase), data warehouses (e.g. Hive) and advanced data mining and machine learning algorithms.

In the sixth phase, the information and results obtained from the analysis phase are visually presented, assigned, and distributed to their users [30]. All the phases are also synthesized visually (based on data from [31], [32] and [33]) in Figure 2.

Figure 2: Synthesis of the big data processing phases
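The six phases can also be tied together in a toy end-to-end sketch, stdlib-only, with in-memory stand-ins for the real tools (the sensor records and field names are invented for illustration; in practice phase 2 would be web scraping or an API call, phase 4 HDFS or a database, and phase 5 a warehouse or Hadoop job):

```python
import json
import statistics

# Toy walk through the six big data processing phases with in-memory
# stand-ins for the real acquisition, storage and analysis tools.

raw = ['{"sensor": "t1", "temp": "21.5"}',        # phase 1: generated data
       '{"sensor": "t1", "temp": "22.1"}',
       '{"sensor": "t2", "temp": "bad-value"}']

acquired = [json.loads(r) for r in raw]           # phase 2: acquisition

cleaned = []                                      # phase 3: cleaning/conversion
for rec in acquired:
    try:
        cleaned.append({"sensor": rec["sensor"], "temp": float(rec["temp"])})
    except ValueError:
        pass                                      # drop malformed readings

store = {}                                        # phase 4: storage stand-in
for rec in cleaned:
    store.setdefault(rec["sensor"], []).append(rec["temp"])

analysis = {s: statistics.mean(v)                 # phase 5: analysis
            for s, v in store.items()}

report = json.dumps(analysis)                     # phase 6: distribution
print(report)
```

The point of the sketch is the dependency between the phases: each consumes only the previous phase's output, which is what allows every stand-in to be swapped for the corresponding tool from Figure 2 without touching the rest of the pipeline.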

4. CONCLUSION

Traditional data management architectures cannot meet the current needs of companies for integrating and analysing the wide range of data types generated from a variety of sources. The modern data platform enables analytical processing of both historical and real-time data, for structured, semi-structured and unstructured data, stored in the cloud or locally. The new big data technologies are complementary to existing data management technologies and serve to manage, process, and analyse new types and forms of data that are not supported in standard BI/DW systems. Therefore, new data management platforms complement and optimize the existing ones. This paper presents the context of data management in the big data era and provides a review of the new technologies, concepts and platforms in a modern data management architecture.

Companies should realize the benefits and problems of using big data technology, but above all they should understand their own needs, so as to successfully implement those new technologies that help them achieve their business goals. Since much of big data is stored using the Hadoop Distributed File System (HDFS) and on distributed computing platforms that support Hadoop clusters, most companies need to be thoroughly informed about the technologies within the Hadoop ecosystem related to all the above-mentioned big data processing phases. As assistance during that process, an overview of some of the possible solutions is provided in Figure 2, giving a general and broad overview of big data technology. In order to create an optimal strategy and make an appropriate software selection, a skilled team is needed, with a spectrum of knowledge and competencies in data management, modern data architecture and big data technologies, together with business domain experts. The following research activities are directed toward identifying new employee profiles and their knowledge, competency and capability requirements. Subsequent research activities will aim at adjusting study programs in higher education in order to meet companies' demands for management and decision making in the big data and IoT era.

REFERENCES
1. I. A. Atoum and N. A. Al-Jarallah, Big data analytics for value-based care: Challenges and opportunities, Int. J. Adv. Trends Comput. Sci. Eng., vol. 8, no. 6, pp. 3012–3016, 2019. https://doi.org/10.30534/ijatcse/2019/55862019
2. I. P. Popchev and D. A. Orozova, Towards Big Data Analytics in the e-Learning Space, Cybern. Inf. Technol., vol. 19, no. 3, pp. 16–24, 2019.
3. J. Campos, P. Sharma, U. Gorostegui Gabiria, E. Jantunen, and D. Baglee, A big data analytical architecture for the asset management, Procedia CIRP, vol. 64, pp. 369–374, 2017. https://doi.org/10.1016/j.procir.2017.03.019
4. J. Morris, Top 10 categories for Big Data sources and mining technologies, 2012. [Online]. Available: https://www.zdnet.com/article/top-10-categories-for-big-data-sources-and-mining-technologies/.
5. V. Mayer-Schönberger and K. Cukier, Big Data: A Revolution That Will Transform How We Live, Work, and Think. New York: Houghton Mifflin Harcourt, 2013.
6. B. Butler, Cloud Chronicles, 2015. [Online]. Available: http://www.networkworld.com/article/2973963/big-data-business-intelligence/5-problems-with-big-data.html.
7. D. Ahamad, M. Akhtar, and S. A. Hameed, A review and analysis of big data and MapReduce, Int. J. Adv. Trends Comput. Sci. Eng., vol. 8, no. 1, pp. 1–3, 2019.
8. C. Ji, Y. Li, W. Qiu, U. Awada, and K. Li, Big data processing in cloud computing environments, in International Symposium on Pervasive Systems, Algorithms and Networks, 2012.


9. P. Amarendra Reddy and O. Ramesh, Security mechanisms leveraged to overcome the effects of big data characteristics, Int. J. Adv. Trends Comput. Sci. Eng., vol. 8, no. 2, pp. 312–316, 2019.
10. A. Al-Drees, R. Bin-Hezam, R. Al-Muwayshir, and W. Haddoush, Unified Retrieval Model of Big Data, in Advances in Big Data: Proceedings of the 2nd INNS Conference on Big Data, October 23–25, 2016.
11. D. Zhu, Y. Zhang, X. Wang, et al., Research on the methodology of technology innovation management with big data, Sci. Sci. Manag. S. T., vol. 4, pp. 172–180, 2013.
12. P. Russom, Data Lakes: Purposes, Practices, Patterns, and Platforms, 2017.
13. B. Inmon, Big Data Implementation vs. Data Warehousing, 2013. [Online]. Available: http://www.b-eye-network.com/view/17017.
14. C. Russell, Database Development with IBM Hybrid Data Architecture, 2017. [Online]. Available: http://www.ibm.com/developerworks/.
15. D. Marjanović, Hadoop i analitika u realnom vremenu [Hadoop and real-time analytics], 2017. [Online]. Available: http://www.datascience.rs/Hadoop-i-analitika-u-realno-vremenu/.
16. V. Dagade, M. Lagali, S. Avadhani, and P. Kalekar, Big Data Weather Analytics Using Hadoop, Int. J. Emerg. Technol. Comput. Sci. Electron., vol. 14, no. 2, 2015.
17. N. Garg, S. Singla, and S. Jangra, Challenges and Techniques for Testing of Big Data, Procedia Comput. Sci., vol. 85, pp. 940–948, 2016.
18. D. Stodder, New Strategies for Visual Big Data Analytics: How organizations can apply modern data platform technologies and practices to support analytics innovation, 2017.
19. S. Alapati, Expert Hadoop Administration: Managing, Tuning, and Securing Spark, YARN, and HDFS. Addison-Wesley, pp. 24–25, 2017.
20. MapReduce, 2019. [Online]. Available: https://www.ibm.com/analytics/hadoop/mapreduce.
21. S. Alapati, Expert Hadoop Administration: Managing, Tuning, and Securing Spark, YARN, and HDFS. Addison-Wesley, pp. 149–151, 2017.
22. A. Bekker, Spark vs. Hadoop MapReduce: Which big data framework to choose, 2017. [Online]. Available: https://www.scnsoft.com/blog/spark-vs-hadoop-mapreduce.
23. T. Oktay and A. Sayar, Analyzing Big Security Logs in Cluster with Apache Spark, in Advances in Big Data, Advances in Intelligent Systems and Computing, Angelov et al., Eds. 2016. https://doi.org/10.1007/978-3-319-47898-2_14
24. J. Caserta and E. Cordo, Data Warehousing in the Era of Big Data, 2016. [Online]. Available: http://www.dbta.com/BigDataQuarterly/Articles/Data-Warehousing-in-the-Era-of-Big-Data-108590.aspx.
25. B. Sharma, Architecting Data Lakes: Data Management Architectures for Advanced Business Use Cases. O'Reilly Media, 2018.
26. M. Knight, Data Warehouse vs. Data Lake Technology: Different Approaches to Managing Data, 2017. [Online]. Available: https://www.dataversity.net/data-warehouse-vs-data-lake-technology-different-approaches-managing-data/.
27. H. J. Watson, Data Lakes, Data Labs, and Sandboxes, Bus. Intell. J., vol. 20, no. 1, 2015.
28. S. Šuman and I. Pogarčić, Development of ERP and other large business systems in the context of new trends and technologies, in 27th DAAAM International Symposium on Intelligent Manufacturing and Automation, 2016, pp. 319–327.
29. S. Šuman, Sustavi poslovne inteligencije – teorija i riješeni primjeri [Business intelligence systems – theory and solved examples]. Rijeka: Veleučilište u Rijeci, 2017.
30. L. Heilig and S. Voß, Managing Cloud-Based Big Data Platforms: A Reference Architecture and Cost Perspective, in Big Data Management, F. P. García Márquez and B. Lev, Eds. Springer International Publishing AG, p. 29, 2017.
31. W. El Kaim, Big Data Architecture, 2016. [Online]. Available: https://www.slideshare.net/welkaim/big-data-architecture-part-2.
32. K. Singh, Top 10 Big Data Tools in 2019, 2019. [Online]. Available: https://dimensionless.in/top-10-big-data-tools-in-2019/.
33. G. Ginde, R. Aedula, S. Saha, A. Mathur, S. Roy Dey, S. Sampatrao G., and D. Sagar, Big Data Acquisition, Preparation, and Analysis Using Apache Software Foundation Tools, in Big Data Analytics: Tools and Technology for Effective Planning, A. K. Somani and G. C. Deka, Eds. Boca Raton: CRC Press Taylor & Francis Group, 2018. https://doi.org/10.1201/b21822-9