
Big - Data PPT Unit 1

Uploaded by aditya singh
Copyright © All Rights Reserved
Noida Institute of Engineering and Technology, Greater Noida

Introduction

Unit: 1

Introduction to Big Data


SOVERS SINGH BISHT
Assistant Professor
B-Tech 6th Sem IT DEPT
ONLINE (A & B) NIET

SOVERS SINGH BISHT (KCS-061), Unit 1, November 27, 2024
[Figure: Syllabus]


[Figure: Text Books]
Course Objective

The objectives of this course are to:

1. Provide an overview of the exciting, fast-growing field of big data analytics.

2. Understand the tools required to manage and analyze big data, such as Hadoop and MapReduce.

3. Understand the fundamental techniques and principles of the Hadoop Distributed File System (HDFS) and the Hadoop environment.

4. Identify the skills that will help students solve complex real-world problems using YARN, MongoDB, Scala and Spark.

5. Identify the importance of Hadoop ecosystem tools such as Pig, Hive and HBase.

Unit Objective

• Understand the significance of Big Data in industry.
• Understand the basic concepts of Big Data implementation and its ecosystem.
• Describe a formal definition of Big Data and its frameworks.
• Understand the concept of Big Data frameworks.
• Understand the challenges faced by Big Data frameworks.
• Describe the standards and requirements for implementing Big Data in a cloud environment.

Course Outcome

At the end of the semester, students will be able to:

Course Outcome (CO) | CO Description | Bloom's Taxonomy
CO1 | Demonstrate knowledge of Big Data Analytics concepts and their applications in business | K1, K2
CO2 | Demonstrate the functions and components of the MapReduce framework and HDFS | K1, K2
CO3 | Discuss data management concepts in a NoSQL environment | K6
CO4 | Explain the process of developing MapReduce-based distributed processing applications | K2, K5
CO5 | Explain the process of developing applications using HBase, Hive, Pig, etc. | K2, K5
[Figure: CO-PO and PSO Mapping, correlation matrix of CO with PO]
CONTENT
Introduction to Big Data:
1. Types of digital data
2. History of Big Data innovation
3. Introduction to Big Data platform
4. Drivers for Big Data
5. Big Data architecture and characteristics
6. 5 Vs of Big Data
7. Big Data technology components
8. Big Data importance and applications
9. Big Data features – security
10. Compliance auditing and protection
11. Big Data privacy and ethics
12. Big Data Analytics
13. Challenges of conventional systems
14. Intelligent data analysis
15. Nature of data
16. Analytic processes and tools
17. Analysis vs reporting
18. Modern data analytic tools
Prerequisite and Recap

Prerequisites:
• Linux operating system.

• Java.

• MySQL.

• Programming Languages (Python or Java)

Recap:
• Discussion about Big Data Environments.

Unit Objective

The objectives of Unit 1 are:

1. To provide an overview of the exciting, fast-growing field of big data analytics.

2. To impart preliminary knowledge of the Big Data domain and elaborate on the following topics:
• History of Hadoop
• Big Data platform
• Challenges in traditional systems
• Data analytics tools

History of Big Data innovation

Objective:
• In this topic we learn how big data came into existence and what industry need gave rise to Big Data. This shows the importance of the technology's innovation as an open-source framework.

Recap:
• Revision of database systems.

History of Big Data innovation

Batch processing:
• Batch processing is a technique in which an operating system collects programs and data together in a batch before processing starts. The operating system performs the following activities related to batch processing.
• The OS defines a job, which has a predefined sequence of commands, programs and data, as a single unit.
• The OS keeps a number of jobs in memory and executes them without any manual intervention.
• Jobs are processed in the order of submission, i.e., in first-come, first-served fashion.
• When a job completes its execution, its memory is released and the output for the job gets copied into an output spool for later printing or processing.
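The job-queue behaviour described above can be sketched as a small Python simulation. The names `Job` and `run_batch` are hypothetical and are not part of any operating system API. Jobs are queued in submission order, run without manual intervention, and their output is copied to a spool list.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Job:
    """A job: a predefined sequence of commands treated as a single unit."""
    name: str
    commands: List[Callable[[], str]]  # each command returns one line of output


def run_batch(jobs: List[Job]) -> List[Tuple[str, List[str]]]:
    """Run jobs first-come, first-served and copy each job's output to a spool."""
    queue = deque(jobs)   # jobs held in the order they were submitted
    spool = []            # output spool for later printing or processing
    while queue:
        job = queue.popleft()                      # oldest job runs next (FCFS)
        output = [cmd() for cmd in job.commands]   # no manual intervention
        spool.append((job.name, output))           # output copied to the spool
    return spool


if __name__ == "__main__":
    batch = [
        Job("payroll", [lambda: "compute salaries", lambda: "print pay slips"]),
        Job("billing", [lambda: "generate invoices"]),
    ]
    for name, lines in run_batch(batch):
        print(name, lines)
```

A real OS would also schedule I/O and free memory. This sketch only models the ordering and spooling behaviour.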

DISK DIAGRAM

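Data transfer rate > seek time

Streaming data at the disk's transfer rate, including transferring it to another disk, is much faster than seeking to look for data scattered on the same disk.

MapReduce suits write once, read many workloads.

A database suits continuous writes (frequent updates) to data.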
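A rough back-of-the-envelope comparison shows why streaming beats seeking. The figures used are illustrative assumptions, not measurements: a 10 ms average seek, a 100 MB/s transfer rate, and 1 GB of data stored as 1 KB records. Both functions are hypothetical helpers.

```python
def streaming_read_seconds(data_mb: float, transfer_mb_s: float, seek_ms: float) -> float:
    """One seek to the start of the data, then a sequential read at full transfer rate."""
    return seek_ms / 1000 + data_mb / transfer_mb_s


def random_read_seconds(num_records: int, record_kb: float,
                        transfer_mb_s: float, seek_ms: float) -> float:
    """Every record costs its own seek plus the transfer of just that record."""
    per_record = seek_ms / 1000 + (record_kb / 1024) / transfer_mb_s
    return num_records * per_record


if __name__ == "__main__":
    # Assumed disk and data figures (illustrative only).
    SEEK_MS, TRANSFER_MB_S = 10, 100
    data_mb = 1024                 # 1 GB
    records = data_mb * 1024       # 1 KB records -> 1,048,576 records
    print(f"sequential scan: {streaming_read_seconds(data_mb, TRANSFER_MB_S, SEEK_MS):.2f} s")
    print(f"seek per record: {random_read_seconds(records, 1, TRANSFER_MB_S, SEEK_MS):.2f} s")
```

With these assumed numbers, the sequential scan takes about 10 seconds. Seeking to every record takes about 10,500 seconds, which is nearly three hours. This is why whole-file scans, as in MapReduce, favour write-once, read-many data. A database that updates individual records pays the seek cost each time.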

Introduction to Big Data platform

Objective:
• This topic introduces Big Data as an open-source framework with its ecosystem, and shows how huge amounts of data are processed on cloud infrastructure.

Recap:
• Revision of cloud interfaces and high-performance computing.

[Figure: Introduction to Big Data platform]
Drivers for Big Data

Objective:
• This topic describes the basic drivers behind the innovation of Big Data frameworks, the business needs for Big Data, and the need for such frameworks in the current scenario.

Recap:
• Revision of the Google File System.

[Figure: Drivers for Big Data]
Drivers for Big Data

Business Drivers for Big Data


Responding to these drivers includes skill upgradation, technology and tool development, and a strategic redirection. Of course, this begins with a self-assessment of where the enterprise stands with respect to Big Data and analytics.

• The digitization of society.
• The plummeting of technology costs.
• Connectivity through cloud computing.
• Increased knowledge about data science.
• Social media applications.
• The upcoming Internet of Things (IoT).

Types of digital data

Objective:
• This topic deals with the different types of digital data that Big Data platforms must handle and how they occupy space in our day-to-day environment.

Recap:
• Revision of cloud infrastructure basics.

[Figure: Types of digital data]
Types of Big Data

Objective:
• This topic focuses on the different types of Big Data components and the data issues that must be managed with technology innovations.

Recap:
• Revision of data generation ethics and mechanisms.

[Figure: Types of Big Data]
[Figure: 5 Vs of Big Data]
Big Data technology components

Objective:
• This topic deals with the Big Data ecosystem and frameworks. It also focuses on how big data can be managed with open-source frameworks.

Recap:
• Revision of the data generation process.

[Figure: Big Data technology components]
Big Data importance and applications

Objective:
• The objective of this topic is to specify the application areas of Big Data and to list the importance of the Big Data environment against industry standards.

Recap:
• Revision of the need for Big Data in industry.

Big Data importance and applications

Main Technology Components of Big Data

1. Data Management
2. Data Mining
3. Hadoop
4. In-Memory Analytics
5. Predictive Analytics
6. Text Mining

Why is big data analytics important?
1. Reduced cost
2. Quick decision making
3. New products and features

[Figure: Applications of Big Data]
Challenges of conventional systems

Objective:
• This topic deals with the challenges that obstruct the implementation of Big Data and shows how these challenges can be met with effective solutions.

Recap:
• Revision of the architecture for implementing Big Data.

[Figure: Challenges of conventional systems]
Analytic processes and tools & Modern data analytic tools

Objective:
• This topic focuses on the tools used to manage Big Data and maintain its performance in industry. It also lists the programming languages and shows how a large data pool can be managed in cluster computing with the help of these tools.

Recap:
• Revision of Big Data interfaces.

Analytic processes and tools & Modern data analytic tools

• R Programming: R is a free, open-source programming language and software environment for statistical computing and graphics. Data miners use it to develop statistical software and perform data analysis, and it has become a highly popular tool for big data in recent years.
• R is one of the leading analytics tools in industry and is widely used for statistics and data modeling. It can easily manipulate data and present it in different ways. It has exceeded SAS in several respects, such as data capacity, performance and outcome. R compiles and runs on a wide variety of platforms, viz. UNIX, Windows and macOS. It has 11,556 packages (at the time of writing) and allows you to browse the packages by category. R also provides tools to automatically install packages as per user requirement, and it can be integrated well with Big Data platforms.

Analytic processes and tools & Modern data analytic tools

• Datawrapper: Datawrapper is an online data visualization tool for making interactive charts. You upload a data file in CSV, PDF or Excel format, or paste the data directly into the field. Datawrapper then generates a visualization such as a bar chart, line chart or map, which can be embedded into other websites. It is easy to use and produces visually effective charts.
• Content Grabber: Content Grabber is a web-crawling, data extraction tool suited to people with advanced programming skills. Businesses can use it to extract content and save it in a structured format. It also offers editing and debugging facilities, among many others, for later analysis.

Analytic processes and tools & Modern data analytic tools

• Tableau Public: Tableau is another popular big data tool. It is simple and very intuitive to use, and it communicates the insights of the data through data visualization. With Tableau, an analyst can check a hypothesis and explore the data before starting to work on it extensively.

• Tableau Public is free software that connects to any data source, be it a corporate data warehouse, Microsoft Excel or web-based data. It creates data visualizations, maps, dashboards, etc. with real-time updates presented on the web. These can also be shared through social media or with the client, and the files can be downloaded in different formats. To see the full power of Tableau, you need a good data source. Tableau's Big Data capabilities make it important for analyzing and visualizing data, and it is widely regarded as one of the leading data visualization tools on the market.
Analytic processes and tools & Modern data analytic tools

• Python: Python is an object-oriented scripting language that is easy to read, write and maintain, and is a free, open-source tool. It was developed by Guido van Rossum in the late 1980s and supports both functional and structured programming methods.

• Python is easy to learn, as it is very similar to JavaScript, Ruby and PHP. Python also has very good machine learning libraries, viz. scikit-learn, Theano, TensorFlow and Keras. Another important feature of Python is that it can work with data from many platforms, such as SQL Server, a MongoDB database or JSON. Python also handles text data very well.

Analytic processes and tools & Modern data analytic tools

SAS is a programming environment and language for data manipulation and a leader in analytics. Its development began in 1966 at North Carolina State University, and it was further developed by the SAS Institute in the 1980s and 1990s. SAS is easily accessible and manageable, and it can analyze data from any source. In 2011, SAS introduced a large set of products for customer intelligence and numerous SAS modules for web, social media and marketing analytics. These are widely used for profiling customers and prospects. SAS can also predict customer behaviors and manage and optimize communications.

Analytic processes and tools & Modern data analytic tools

• Apache Spark: Apache Spark was developed at the University of California, Berkeley's AMPLab in 2009. It is a fast, large-scale data processing engine that can run applications in Hadoop clusters up to 100 times faster in memory and 10 times faster on disk than MapReduce. Spark is designed with data science workloads in mind and makes them easier to carry out. Spark is also popular for building data pipelines and developing machine learning models. It includes a library, MLlib, which provides a growing set of machine learning algorithms for common data science techniques such as classification, regression, collaborative filtering and clustering.
• MS Excel: Excel is a basic, popular and widely used analytical tool in almost all industries. Whether you are an expert in SAS, R or Tableau, you will still need to use Excel. Excel becomes important when analytics is required on a client's internal data. It can summarize complex data using pivot tables, which help filter the data as per client requirement. Excel also offers advanced business analytics options with modelling capabilities, such as automatic relationship detection, creation of DAX measures and time grouping.

Analysis vs Reporting

• Reporting: The process of organizing data into informational summaries in order to monitor how different areas of a business are performing.
• Analysis: The process of exploring data and reports in order to extract meaningful insights, which can be used to better understand and improve business performance.

Reporting translates raw data into information.

Analysis transforms data and information into insights.

Reporting helps companies monitor their online business and be alerted when data falls outside expected ranges. The goal of analysis is to answer questions by interpreting the data at a deeper level and providing actionable recommendations.
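The reporting side of this contrast can be sketched in a few lines of Python. The sketch assumes hypothetical per-region daily sales figures and hypothetical expected ranges. It summarizes raw data into a report, then flags regions whose average falls outside the expected range.

```python
from statistics import mean
from typing import Dict, List, Tuple


def build_report(daily_sales: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Reporting: organize raw daily figures into a per-region summary."""
    return {
        region: {"total": sum(values), "average": mean(values)}
        for region, values in daily_sales.items()
    }


def out_of_range(report: Dict[str, Dict[str, float]],
                 expected: Dict[str, Tuple[float, float]]) -> List[str]:
    """Alerting: list regions whose average falls outside the expected (low, high) range."""
    flagged = []
    for region, summary in report.items():
        low, high = expected[region]
        if not low <= summary["average"] <= high:
            flagged.append(region)
    return flagged


if __name__ == "__main__":
    sales = {"north": [120, 130, 110], "south": [40, 35, 45]}   # hypothetical data
    expected = {"north": (100, 150), "south": (80, 120)}       # hypothetical ranges
    report = build_report(sales)
    print(report)
    print("alerts:", out_of_range(report, expected))
```

The alert only says *that* the south region is out of range. Working out *why* it is out of range, and what to do about it, is the analysis step.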

Analysis vs Reporting

Objective:
• This topic explains why analysis and reporting came into existence and how these mechanisms can be implemented with Big Data to enhance its processing capabilities.

Recap:
• Revision of Big Data frameworks.

Analysis vs Reporting

1. Purpose: Reporting has helped companies monitor their data since before the digital technology boom. Various organizations have depended on the information it brings to their business, as reporting extracts that information and makes it easier to understand.

Analysis interprets data at a deeper level. Reporting can link data across channels, provide comparisons and make information easier to understand. Think of dashboards, charts and graphs, which are reporting tools, not analysis reports. Analysis, by contrast, interprets this information and provides recommendations on actions.

2. Tasks: Reporting includes building, configuring, consolidating, organizing, formatting and summarizing. Analysis consists of questioning, examining, interpreting, comparing and confirming. With big data, predicting is possible as well.

Analysis vs Reporting

3. Outputs: Reporting has a push approach: it pushes information to users, and outputs come in the form of canned reports, dashboards and alerts.
Analysis has a pull approach: a data analyst draws on information to probe further and answer business questions. Outputs can take the form of ad hoc responses and analysis presentations.

4. Delivery: Because reporting involves repetitive tasks, often with truckloads of data, automation has been a lifesaver, especially now with big data. It is not surprising that data entry services are the first thing outsourced, since outsourcing companies are perceived as data reporting experts.
Analysis requires a more custom approach. Human minds do the superior reasoning and analytical thinking needed to extract insights, and technical skills provide efficient steps towards accomplishing a specific goal.

5. Value: The Path to Value illustrates how data is converted into value through reporting and analysis. Neither can achieve this without the other.

Big Data importance and applications
Big Data Importance
Big Data consists of large amounts of data that cannot be processed by traditional data storage or processing units. Many multinational companies use it to process data and run their business. The data flow would exceed 150 exabytes per day before replication.

There are five V's of Big Data that explain its characteristics.
5 V's of Big Data
Volume
Veracity
Variety
Value
Velocity

Use Case
• An e-commerce site XYZ (with 100 million users) wants to offer a $100 gift voucher to its top 10 customers who spent the most in the previous year. Moreover, it wants to find the buying trends of these customers so that the company can suggest more related items to them.
Issues
• A huge amount of unstructured data needs to be stored, processed and analyzed.
Solution
• Storage: To store this huge amount of data, Hadoop uses HDFS (Hadoop Distributed File System), which uses commodity hardware to form clusters and stores data in a distributed fashion. It works on the write once, read many times principle.
• Processing: The MapReduce paradigm is applied to the data distributed over the network to find the required output.
• Analyze: Pig and Hive can be used to analyze the data.
• Cost: Hadoop is open source, so software licensing cost is no longer an issue.
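The processing step of this use case can be illustrated with a single-process Python simulation of the MapReduce data flow (map, shuffle, reduce). It assumes each purchase record has already been parsed into a `(customer_id, amount)` pair, and the function names are hypothetical. In a real Hadoop job, the map and reduce tasks run in parallel on the nodes holding the HDFS blocks, and the framework itself performs the shuffle.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, float]  # (customer_id, purchase amount), an assumed layout


def map_phase(records: Iterable[Record]) -> Iterator[Record]:
    """Map: emit a (customer_id, amount) key-value pair for each purchase."""
    for customer_id, amount in records:
        yield customer_id, amount


def shuffle(pairs: Iterable[Record]) -> Dict[str, List[float]]:
    """Shuffle: group all values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[float]]) -> Dict[str, float]:
    """Reduce: sum each customer's purchases for the year."""
    return {key: sum(values) for key, values in groups.items()}


def top_customers(records: Iterable[Record], n: int = 10) -> List[Tuple[str, float]]:
    """Return the n customers with the highest total spend."""
    totals = reduce_phase(shuffle(map_phase(records)))
    return heapq.nlargest(n, totals.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    purchases = [("c1", 50), ("c2", 20), ("c1", 70), ("c3", 90)]  # toy data
    print(top_customers(purchases, n=2))
```

With the toy data, customer `c1` totals 120 and ranks first, followed by `c3` with 90.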

Big Data features – security

Features of Big Data

• Data Processing. Data processing features involve the collection and organization of raw data to produce meaning. ...
• Predictive Applications. ...
• Analytics. ...
• Reporting Features. ...
• Security Features. ...
• Technologies Support.

Big Data privacy and ethics

Objective:
• This topic deals with the privacy issues of data processed analytically with Big Data frameworks and how ethics can be maintained to secure our data. Observation shows that the cloud is not inherently secure, so we follow certain ethical principles to maintain privacy.

Recap:
• Revision of Big Data Analytics.

Big Data privacy and ethics

5 Principles for Big Data Ethics

• Big data analytics raises a number of ethical issues, especially as companies begin monetizing their data externally for purposes different from those for which the data was initially collected. The scale and ease with which analytics can be conducted today completely change the ethical framework. We can now do things that were impossible a few years ago, and existing ethical and legal frameworks cannot prescribe what we should do. While there is still no black-and-white answer, experts agree on a few principles:
• Private customer data and identity should remain private: Privacy does not mean secrecy, as private data might need to be audited based on legal requirements. However, private data obtained from a person with their consent should not be exposed for use by other businesses or individuals with any traces to their identity.
• Shared private information should be treated confidentially: Third-party companies share sensitive data (medical, financial or locational) and need to have restrictions on whether and how that information can be shared further.
Big Data privacy and ethics

• Customers should have a transparent view of how their data is being used or sold, and the ability to manage the flow of their private information across massive, third-party analytical systems.
• Big Data should not interfere with human will: Big data analytics can moderate and even determine who we are before we make up our own minds. Companies need to begin to think about the kinds of predictions and inferences that should be allowed and the ones that should not.
• Big data should not institutionalize unfair biases like racism or sexism: Machine learning algorithms can absorb unconscious biases in a population and amplify them via training samples.
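One technical measure that supports the first principle (keeping customer identity private when data is shared) is pseudonymization. The sketch below assumes a hypothetical record layout. It drops direct identifiers and replaces the customer ID with a keyed hash (HMAC-SHA256 from Python's standard library). It is an illustration, not a complete privacy solution: the remaining fields may still allow re-identification when combined with other data.

```python
import hashlib
import hmac
from typing import Dict

# Hypothetical set of fields treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonymize(customer_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256).

    The same ID always maps to the same token, so records can still be joined
    and counted, but a party without the key cannot easily link the token back
    to the person.
    """
    return hmac.new(secret_key, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()


def share_record(record: Dict[str, str], secret_key: bytes) -> Dict[str, str]:
    """Prepare a record for a third party: drop direct identifiers, pseudonymize the ID."""
    shared = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    shared["customer_id"] = pseudonymize(record["customer_id"], secret_key)
    return shared


if __name__ == "__main__":
    key = b"keep-this-secret"  # assumption: stored securely, never shared
    print(share_record({"customer_id": "42", "name": "A. Kumar",
                        "email": "a@example.com", "city": "Noida"}, key))
```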

Big Data Analytics

Objective:
• This topic focuses on how the data analytics process can be monitored and maintained within the Big Data framework. It helps us understand the fundamental issues and techniques of Big Data Analytics.

Recap:
• Revision of Data Analytics.

Big Data Analytics

Big Data Analytics


• Big data analytics is the often complex process of examining big data to uncover
information -- such as hidden patterns, correlations, market trends and
customer preferences -- that can help organizations make informed business
decisions.
• On a broad scale, data analytics technologies and techniques give organizations
a way to analyze data sets and gather new information. Business intelligence (BI)
queries answer basic questions about business operations and performance.
• Big data analytics is a form of advanced analytics, which involves complex applications with elements such as predictive models, statistical algorithms and what-if analysis powered by analytics systems.
• Why is big data analytics important?
• Organizations can use big data analytics systems and software to make data-
driven decisions that can improve business-related outcomes. The benefits may
include more effective marketing, new revenue opportunities, customer
personalization and improved operational efficiency. With an effective strategy,
these benefits can provide competitive advantages over rivals.
Big Data Analytics
How does big data analytics work?

• Data analysts, data scientists, predictive modelers, statisticians and other analytics
professionals collect, process, clean and analyze growing volumes of structured
transaction data as well as other forms of data not used by conventional BI and
analytics programs.
• Here is an overview of the four steps of the data preparation process:
• Data is collected. Data professionals collect data from a variety of sources. Often, it is a mix
of semi-structured and unstructured data. While each organization will use different
data streams, some common sources include:
• internet clickstream data.
• web server logs.
• cloud applications.
• mobile applications.
• social media content.
• text from customer emails and survey responses.
• mobile phone records and
• machine data captured by sensors connected to the internet of things (IoT).
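Web server logs are a typical semi-structured source from the list above. As a minimal sketch, assume the logs use the Apache HTTP Server "common" log format. A regular expression can then turn each line into structured fields ready for processing. Lines that do not match are returned as `None` so they can be handled during cleansing rather than crashing the job. The function name is hypothetical.

```python
import re
from typing import Dict, Optional, Union

# Common Log Format: host ident user [time] "method path protocol" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line: str) -> Optional[Dict[str, Union[str, int]]]:
    """Turn one semi-structured log line into a structured record, or None."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None  # malformed line: leave it for the cleansing step
    record: Dict[str, Union[str, int]] = dict(match.groupdict())
    record["status"] = int(match.group("status"))
    size = match.group("size")
    record["size"] = 0 if size == "-" else int(size)  # "-" means no body was sent
    return record


if __name__ == "__main__":
    sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
    print(parse_log_line(sample))
```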

Intelligent data analysis
[Figure: How smart meter Big Data is analysed]
Nature of data

Objective:
• This topic focuses on the nature and types of data. Because huge amounts of data must be managed and processed before they can be used for analytics, we discuss the processing mechanisms here.

Recap:
• Revision of types of digital data.

Nature of data
• Data is processed. After data is collected and stored in a data warehouse or data lake, data
professionals must organize, configure and partition the data properly for analytical
queries. Thorough data processing makes for higher performance from analytical queries.
• Data is cleansed for quality. Data professionals scrub the data using scripting tools or
enterprise software. They look for any errors or inconsistencies, such as duplications or
formatting mistakes, and organize and tidy up the data.
• The collected, processed and cleaned data is analyzed with analytics software. This
includes tools for:
• data mining, which sifts through data sets in search of patterns and relationships
• predictive analytics, which builds models to forecast customer behavior and other future
developments
• machine learning, which taps algorithms to analyze large data sets
• deep learning, which is a more advanced offshoot of machine learning
• text mining and statistical analysis software
• artificial intelligence (AI)
• mainstream business intelligence software
• data visualization tools
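The cleansing step described above (removing duplicates and formatting mistakes) can be sketched in a few lines of Python. The record layout (`email`, `city`) and the normalization rules are assumptions chosen for illustration. Normalizing before comparing is what lets the sketch catch duplicates that differ only in case or spacing.

```python
from typing import Dict, List


def cleanse(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Normalize formatting, then drop duplicate records (keeping the first)."""
    seen = set()
    cleaned = []
    for row in rows:
        email = row["email"].strip().lower()            # fix case and stray spaces
        city = " ".join(row["city"].split()).title()    # collapse spaces, fix case
        key = (email, city)
        if key in seen:                                 # duplicate once normalized
            continue
        seen.add(key)
        cleaned.append({"email": email, "city": city})
    return cleaned


if __name__ == "__main__":
    raw = [
        {"email": " A@X.com", "city": "greater  noida"},
        {"email": "a@x.com", "city": "Greater Noida"},   # same record, new format
        {"email": "b@y.com", "city": "DELHI"},
    ]
    print(cleanse(raw))
```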

Compliance auditing and protection

Objective:
• This topic focuses on compliance for auditing and protecting Big Data from third-party interfaces. It covers how data can be managed and secured with policies in the cloud.

Recap:
• Revision of Big Data architecture and open-source frameworks in the cloud.

Compliance auditing and protection

• The sheer size of Big Data brings with it a major security challenge. Proper security entails more than keeping the bad guys out; it also means backing up data and protecting it from corruption.
• Data access: data could be fully protected by eliminating all access to it, but that is not pragmatic, so we opt to control access instead.
• Data availability: this means controlling where the data is stored and how it is distributed. The more control you have, the better positioned you are to protect the data.
• Performance: encryption and other measures can improve security, but they carry a processing burden that can severely affect system performance.
• Liability: accessible data carries liability with it, such as the sensitivity of the data, the legal requirements connected to data privacy, and IP concerns.
• Adequate security becomes a strategic balancing act among the above concerns. With planning, logic and observation, security becomes manageable: data is protected effectively while access is allowed to authorized users and systems.
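Controlling access, rather than eliminating it, is often implemented as a role-based check that also leaves an audit trail. The role names and the role-to-category mapping below are hypothetical. A real deployment would rely on the access-control features of its own platform rather than an in-memory dictionary.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical mapping of roles to the data categories they may read.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "analyst": {"sales", "inventory"},
    "hr_manager": {"hr"},
    "auditor": {"financial", "sales", "hr"},
}


def can_read(role: str, category: str, audit_log: List[Tuple[str, str, bool]]) -> bool:
    """Allow access only if the role is granted the category; record every decision."""
    allowed = category in ROLE_PERMISSIONS.get(role, set())  # unknown roles get nothing
    audit_log.append((role, category, allowed))             # trail for compliance audits
    return allowed


if __name__ == "__main__":
    log: List[Tuple[str, str, bool]] = []
    print(can_read("analyst", "sales", log))   # granted category
    print(can_read("analyst", "hr", log))      # category not granted
    print(log)
```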
Compliance auditing and protection

• Pragmatic Steps to Securing Big Data:
  - First, get rid of data that are no longer needed. If the data cannot be
    destroyed, securely archive them and keep them offline.
  - A real challenge is deciding which data are needed, because value can be
    found in unexpected places. For example, activity logs represent a risk, but
    they can also be used to determine the scale, use, and efficiency of Big Data
    analytics.
  - There is no easy answer to this question; it becomes a case of choosing the
    lesser of two evils.
• Classifying Data:
  - Protecting data is much easier if the data are classified into categories;
    for example, internal email between colleagues is different from a financial
    report.
  - A simple classification could be: financial, HR, sales, inventory, and
    communications.
  - Once organizations better understand their data, they can segregate the
    information, which makes security measures such as encryption and monitoring
    easier to apply.
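The two steps above (retire data that are no longer needed, then classify what remains) can be sketched in Python. This is a minimal illustration under stated assumptions, not a prescribed method: the keyword rules, the per-category controls, the 365-day "no longer needed" threshold, and the names `Record`, `classify`, and `retention_action` are all hypothetical. Only the five category names come from the slide.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Category names come from the slide; the keyword rules are hypothetical.
CATEGORY_KEYWORDS = {
    "financial": ("invoice", "ledger", "revenue"),
    "hr": ("salary", "payroll", "appraisal"),
    "sales": ("order", "quote", "customer"),
    "inventory": ("stock", "sku", "warehouse"),
    "communications": ("email", "chat", "memo"),
}

# Hypothetical controls per category; anything unclassified is treated as sensitive.
CONTROLS = {
    "financial": {"encrypt": True, "monitor": True},
    "hr": {"encrypt": True, "monitor": True},
    "sales": {"encrypt": False, "monitor": True},
    "inventory": {"encrypt": False, "monitor": False},
    "communications": {"encrypt": True, "monitor": False},
    "unclassified": {"encrypt": True, "monitor": True},
}


@dataclass
class Record:
    description: str
    last_used: date
    can_destroy: bool  # hypothetical flag: may this data legally be destroyed?


def classify(record: Record) -> str:
    """Return the first category whose keywords appear in the description."""
    text = record.description.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(word in text for word in keywords):
            return category
    return "unclassified"


def retention_action(record: Record, today: date, max_age_days: int = 365) -> str:
    """Keep recently used data; otherwise delete it, or archive it offline if it cannot be destroyed."""
    if today - record.last_used <= timedelta(days=max_age_days):
        return "keep"
    return "delete" if record.can_destroy else "archive_offline"


if __name__ == "__main__":
    ledger = Record("Q3 revenue ledger", date(2024, 10, 1), can_destroy=False)
    category = classify(ledger)
    print(category, CONTROLS[category], retention_action(ledger, date(2024, 11, 27)))
    # financial {'encrypt': True, 'monitor': True} keep
```

Treating "not used for a year" as "no longer needed" is only a stand-in for the harder judgment the slide describes; data that cannot be destroyed fall through to offline archiving rather than deletion.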
• Protecting Big Data Analytics:
  - A real concern is that Big Data contains everything you don't want to see
    when you are trying to protect data, such as very unique sample sets.
  - Such uniqueness also means that you can't leverage time-saving backup and
    security technologies such as deduplication.
  - A significant issue is the large size and number of files involved in a Big
    Data analytics environment. The backup bandwidth and/or the backup appliance
    must be large, and the receiving devices must be able to ingest data at the
    rate at which it is delivered.
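The deduplication and backup-bandwidth points can be made concrete with a small Python sketch. It assumes simple fixed-size chunks hashed with SHA-256 (real backup appliances use their own chunking schemes) and decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bits/s); `dedup_ratio` and `backup_window_hours` are hypothetical helpers. The effective backup rate is taken as the slower of the network link and the receiving device's ingest rate, which is why both must keep pace.

```python
import hashlib


def dedup_ratio(chunks: list[bytes]) -> float:
    """Logical bytes divided by bytes actually stored after hash-based deduplication."""
    logical = sum(len(c) for c in chunks)
    # Identical chunks share one SHA-256 digest, so each is stored only once.
    unique = {hashlib.sha256(c).digest(): len(c) for c in chunks}
    stored = sum(unique.values())
    return logical / stored if stored else 1.0


def backup_window_hours(data_tb: float, link_gbps: float, ingest_gbps: float) -> float:
    """Hours to back up data_tb terabytes; the slower of link and receiver sets the pace."""
    effective_gbps = min(link_gbps, ingest_gbps)
    bits = data_tb * 1e12 * 8  # decimal terabytes -> bits
    return bits / (effective_gbps * 1e9) / 3600


if __name__ == "__main__":
    repetitive = [b"A" * 4096] * 10                          # ten identical 4 KiB chunks
    unique_samples = [bytes([i]) * 4096 for i in range(10)]  # ten distinct chunks
    print(dedup_ratio(repetitive))                     # 10.0
    print(dedup_ratio(unique_samples))                 # 1.0
    print(round(backup_window_hours(100, 10, 10), 1))  # 22.2
    print(round(backup_window_hours(100, 10, 1), 1))   # 222.2
```

Highly unique data gain nothing from deduplication (a ratio of 1.0), and a 100 TB backup over a 10 Gbps link takes roughly ten times longer if the receiving device can only ingest 1 Gbps.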
• Big Data and Compliance:
  - Compliance has a major effect on how Big Data is protected, stored, accessed,
    and archived.
  - Big Data is not easily handled by an RDBMS, which makes it harder to
    understand how compliance affects the data.
  - Big Data is transforming the storage and access paradigm into a new world of
    horizontally scaling, unstructured databases, which are better suited to
    solving old business problems with analytics.
  - New data types and methodologies are still expected to meet the legislative
    requirements imposed by compliance laws.
• Big Data and Compliance (continued):
  - Preventing compliance from becoming the next Big Data nightmare is going to
    be the job of security professionals.
  - Health care is a good example of the Big Data compliance challenge, with its
    different data types and the vast rate of data arriving from different
    devices.
  - NoSQL is evolving as the new data management approach for unstructured data.
    There is no need to federate multiple RDBMSs; instead, a single clustered
    NoSQL database is used, often deployed in the cloud.
  - Unfortunately, most data stores in the NoSQL world (e.g., Hadoop, Cassandra,
    and MongoDB) do not incorporate sufficient data security tools to provide
    what is needed.
  - Big Data changed a few things. For example, network security developers spent
    a great deal of time and money on perimeter-based security mechanisms (e.g.,
    firewalls), but these cannot prevent unauthorized access to data once a
    criminal or hacker has entered the network.
  - Lessons learned:
    - Control access by process, not job function
    - Secure the data at the data store level
    - Protect the cryptographic keys and store them separately from the data
    - Create trusted applications and stacks to protect data from rogue users
  - Once you begin to map and understand the data, opportunities will become
    evident for automating and monitoring security and compliance.
  - Of course, automation does not solve every problem; there are still basic
    rules to follow to enable security without derailing the value of Big Data:
    - Ensure that security does not impede performance or availability
    - Pick the right encryption scheme (file, document, column, etc.)
    - Ensure that the security solution can evolve with your changing
      requirements
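The first lesson, controlling access by process rather than by job function, can be sketched as a policy lookup keyed on the calling process, with each decision appended to an audit trail for compliance review. The process names, datasets, and the names `POLICY`, `Request`, `is_allowed`, `check_and_audit`, and `AUDIT_LOG` are hypothetical; a real deployment would rely on the access-control and audit features of its own platform. Note that `job_title` is carried in the request but deliberately never consulted.

```python
from dataclasses import dataclass

# Hypothetical policy: access is granted per (process, dataset) pair, not per job title.
POLICY: dict[tuple[str, str], set[str]] = {
    ("nightly-billing-job", "financial"): {"read"},
    ("fraud-model-training", "financial"): {"read"},
    ("hr-portal", "hr"): {"read", "write"},
}

# Hypothetical audit trail: (process, dataset, action, allowed) for every decision.
AUDIT_LOG: list[tuple[str, str, str, bool]] = []


@dataclass
class Request:
    process_id: str  # identity of the calling process or service
    job_title: str   # carried along but deliberately ignored by the check
    dataset: str
    action: str


def is_allowed(req: Request) -> bool:
    """Allow the action only if the policy grants it to this process for this dataset."""
    return req.action in POLICY.get((req.process_id, req.dataset), set())


def check_and_audit(req: Request) -> bool:
    """Make the access decision and record it for later compliance auditing."""
    allowed = is_allowed(req)
    AUDIT_LOG.append((req.process_id, req.dataset, req.action, allowed))
    return allowed


if __name__ == "__main__":
    print(check_and_audit(Request("nightly-billing-job", "intern", "financial", "read")))  # True
    print(check_and_audit(Request("adhoc-shell", "CFO", "financial", "read")))             # False
```

Here the billing job may read financial data even when an intern triggers it, while an ad hoc shell session is denied even for the CFO; both decisions land in `AUDIT_LOG`.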

• The Intellectual Property (IP) Challenge:
  - One of the biggest issues with Big Data is the concept of IP.
  - IP refers to creations of the human mind, such as inventions, literary and
    artistic works, and symbols, names, and images used in commerce.
  - Some basic rules are:
    - Understand what IP is and know what you have to protect
    - Prioritize protection
    - Label confidential information
    - Educate employees
    - Know your tools: use tools that can track where IP is stored
    - Use a holistic approach that covers internal risks as well as external
      ones
    - Use a counterintelligence mindset: think as if you were spying on your
      company, and ask how you would do it
  - The above guidelines can be applied to almost any information security
    paradigm geared toward protecting IP.
Big Data Importance and Applications

Objective:
• This unit focuses on the importance of Big Data in industry, the practical
problems where it can be applied, why it is needed in the current scenario, and
how many applications use it on a daily basis.

Recap:
• Revision of Big Data applications in industry.

Daily Quiz
1) What is Apache Hadoop? Why is Hadoop essential for every Big Data
application?
2) What are the main features and characteristics of Hadoop that make it the
most popular and powerful Big Data tool?
3) What are the core components of Apache Hadoop?
4) What are the configuration files in Hadoop?
5) What are the different modes in which we can configure/install Hadoop?
6) Explain how Hadoop cluster hardware planning and provisioning is done.
7) How do you create a user in Hadoop?
8) What are the major differences between Hadoop 2 and Hadoop 3?
9) What is a single-node cluster in Hadoop? For what purposes is Hadoop run on a
single-node cluster?
Faculty Video Links, YouTube & NPTEL Video Links and Online Courses Details

YouTube videos:

https://www.youtube.com/watch?v=rvJgArru8dI

https://www.youtube.com/watch?v=jmDV93UOngo

https://www.youtube.com/watch?v=bAyrObl7TYE

https://www.youtube.com/watch?v=zez2Tv-bcXY

https://www.youtube.com/watch?v=iANBytZ26MI

MCQs (Unit-wise/Weekly)
Weekly/Monthly/Unit-wise Assignment
Weekly Assignment
Assignment 1
Old Question Papers
Expected Questions for University Exam

• List the best practices of Big Data analytics.
• Write down the characteristics of Big Data applications.
• What are the characteristics of Big Data?
• What is Big Data? Describe the main features of Big Data in detail.
• Explain HDFS in detail, with a diagram.
• Why is Hadoop called a Big Data technology? Explain how it supports Big Data.
• How is the Google File System different from the Hadoop Distributed File
System? Explain the Google File System.
• Give two examples of Big Data case studies and indicate which V's are
satisfied by these case studies.
• Write any four applications of Business Intelligence in various sectors.
Summary

• This unit provides the fundamentals of the Big Data domain and its latest
trends in industry.
• In this unit we also gained knowledge of the different types of data and,
most importantly, the 5 V's of Big Data. We also went through the concept of
reporting vs. analysis as used in industry.
• This unit also introduces analytics tools such as Tableau, SAS, and R.
Thank You
