Bda L2


Big Data Analytics

Introduction to Big Data and Hadoop

Characteristics of Big Data

Characteristics of Big Data - Velocity

Velocity is the speed at which data is generated.

900 million photos are uploaded to Facebook every day.
500 million tweets are posted on Twitter every day.
0.4 million hours of video are uploaded to YouTube every day.
More than 3.5 billion searches are made on Google every day.
This growth is like a nuclear explosion of data.
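To put these daily totals into perspective, they can be converted into average per-second rates. The following is a minimal sketch that uses only the photo, tweet, and video figures quoted above; the names `SECONDS_PER_DAY`, `daily_totals`, and `per_second` are illustrative assumptions, and the figures themselves are taken from the slide without independent verification.

```python
# Convert the daily totals quoted on the slide into average per-second rates.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day

# Figures as quoted on the slide (illustrative, not independently verified).
daily_totals = {
    "Facebook photos": 900_000_000,
    "Tweets": 500_000_000,
    "YouTube video hours": 400_000,
}


def per_second(total_per_day: float) -> float:
    """Return the average rate per second for a given daily total."""
    return total_per_day / SECONDS_PER_DAY


for name, total in daily_totals.items():
    print(f"{name}: about {per_second(total):,.1f} per second")
```

On these figures, the photo uploads alone average roughly ten thousand per second, which is why a system must ingest and process data at the same time rather than one after the other.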


Big data technology helps a company absorb this explosion: it accepts incoming data and, at the same time, processes it fast enough to avoid bottlenecks.

In short, velocity covers:
The speed at which data is generated
Data generated in real time
Online and offline data
Data arriving in streams, batches, or bits

Characteristics of Big Data - Variety

Variety covers all structured and unstructured data, generated by people or by machines.
It is about classifying the incoming data into various categories.
Examples: tweets, text, voicemail, ECG readings, and audio and video recordings.
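Classifying incoming data into categories can be sketched in a few lines. The example below is only an illustration under simplifying assumptions: it categorizes items by file extension, and the mapping `CATEGORY_BY_EXTENSION`, the helper names, and the sample file names are all hypothetical. Real systems usually inspect the content or metadata as well.

```python
import os
from collections import defaultdict

# Hypothetical mapping from file extension to a data category.
CATEGORY_BY_EXTENSION = {
    ".txt": "text",
    ".json": "text",
    ".wav": "audio",
    ".mp3": "audio",
    ".mp4": "video",
    ".csv": "sensor reading",
}


def classify(filename: str) -> str:
    """Return the category for a file name, or 'unknown' if unrecognized."""
    extension = os.path.splitext(filename)[1].lower()
    return CATEGORY_BY_EXTENSION.get(extension, "unknown")


def group_by_category(filenames):
    """Group file names by their category, preserving input order."""
    groups = defaultdict(list)
    for name in filenames:
        groups[classify(name)].append(name)
    return dict(groups)


# Made-up sample names echoing the slide's examples.
incoming = ["tweet_001.json", "voicemail_17.wav", "ecg_patient4.csv", "clip.MP4", "notes"]
print(group_by_category(incoming))
```

Anything that does not match a known category falls into "unknown", so unrecognized input is kept visible instead of being silently dropped.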

Characteristics of Big Data - Veracity

Veracity is the degree of reliability that the data has to offer.
A major part of the data is unstructured and irrelevant for a specific purpose.
Big data systems need to filter such data out or translate it into a usable form, because reliable data is crucial to business development.
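Filtering out unreliable records can be sketched as follows. This is a minimal illustration, not a prescribed method: the `Reading` record, its `verified` flag, the plausible-value range, and the sample data are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    source: str
    value: float
    verified: bool  # hypothetical flag: did the record pass a source check?


def filter_reliable(readings, low, high):
    """Keep records that are verified and whose value lies in [low, high]."""
    return [r for r in readings if r.verified and low <= r.value <= high]


# Made-up sample data.
readings = [
    Reading("sensor-a", 72.0, True),
    Reading("sensor-b", 950.0, True),   # implausible value, filtered out
    Reading("sensor-c", 68.5, False),   # unverified source, filtered out
]

reliable = filter_reliable(readings, low=30.0, high=220.0)
print([r.source for r in reliable])
```

Only the records that pass both checks go on to analysis, so later results are not skewed by data that cannot be trusted.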

Characteristics of Big Data - Volume

Volume is a major issue.
It is not only the amount of data that we store or process.
It is the amount of valuable, reliable, and trustworthy data that needs to be stored, processed, and analyzed to find insights.

Traditional Versus Big Data Approach

Traditional data management and analytics store structured data in data marts and data warehouses.
They also handle large amounts of data, e.g. millions of credit card transactions.
Hundreds of these systems are distributed throughout the organization and its partners.
Each of these systems has its own data silo, and many of these silos contain information about customer experience.

Many of the data sources do not share the same definitions.
So, copying the data to a central location is not advisable.
Sampling also does not serve the purpose of extracting the required information.
The objective of big data is to build a view of the customer experience over a period of time from all the events that took place.
Implementing this with the traditional approach may take one year.

Big Data Approach

The solution is the big data approach, with tools like:
Hadoop cluster: meets the storage requirements of big data.
Apache Spark: capable of stream processing.
These tools can reduce the work from 2 years to less than 4 months.


Organizations where the workload is constant: traditional approach.
Organizations challenged by increasing data demands: scalable Hadoop infrastructure.
Hybrid systems: integrate Hadoop platforms with traditional databases, which is cost-effective.

Advantages of Big Data

Big data analytics uses simple models that can be applied to large volumes of data.
Researchers say that an algorithm gives a competitive edge when it produces better results on large amounts of data without compromising on performance.
Big data analytics also has sophisticated models developed for it.

Hadoop over RDBMS

Scalability: nodes can be added to scale the system with little administration.
Unlike a traditional RDBMS, no pre-processing is required before storing data.
Any unstructured data, such as text, images, and videos, can be stored.
There is no limit to how much data can be stored or for how long.
Protection against hardware failure: if a node fails, its work is redirected to other nodes.
Multiple copies of the data are stored automatically.
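The last two points, automatic copies and redirection after a node failure, can be illustrated with a toy simulation. This sketch is not the HDFS API and does not reflect how Hadoop places blocks internally; the class `ToyCluster` and all of its methods are hypothetical names invented for the example. It uses a replication factor of 3, which is also the documented HDFS default.

```python
# Toy model of replicated block storage. This is NOT the HDFS API;
# every name here is hypothetical and for illustration only.
class ToyCluster:
    def __init__(self, node_names, replication=3):
        self.nodes = {name: {} for name in node_names}  # node -> {block_id: data}
        self.alive = set(node_names)
        self.replication = replication

    def add_node(self, name):
        """Scale out by adding an empty node."""
        self.nodes[name] = {}
        self.alive.add(name)

    def put(self, block_id, data):
        """Store copies of a block on the least-loaded live nodes."""
        targets = sorted(self.alive, key=lambda n: (len(self.nodes[n]), n))
        for name in targets[: self.replication]:
            self.nodes[name][block_id] = data

    def fail(self, name):
        """Simulate a hardware failure on one node."""
        self.alive.discard(name)

    def get(self, block_id):
        """Read a block from any live node that still holds a copy."""
        for name in sorted(self.alive):
            if block_id in self.nodes[name]:
                return self.nodes[name][block_id]
        raise KeyError(f"no live replica of {block_id}")


cluster = ToyCluster(["node1", "node2", "node3", "node4"], replication=3)
cluster.put("block-0", b"review data")
cluster.fail("node1")            # one replica is lost
print(cluster.get("block-0"))    # the read is served by a surviving replica
```

Because three copies exist, losing one node does not lose the data: the read is simply served by another node. A production system would also re-create the missing copy on a healthy node, which this sketch leaves out.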

For analyzing big data, it is important to check that the systems being acquired provide the following:
Support for extracting data from different data sources and from different internal and external platforms.
Support for different data types, for example, text documents, existing databases, data streams, images, voice, and video.
A data integrator to combine information from different sources for analysis.
A simple interface to carry out analysis.
A facility to view results in different ways, as per the user's needs.

Answer???

Amazon has been collecting review data for a particular product. They have realized
that almost 90% of the reviews gave a 5/5 rating. However, of that 90%, they
found that 50% came from customers who did not have proof of purchase or
who did not post serious reviews about the product. Which of the following
is true about the review data collected in this situation?

High volume
High velocity
Low value
Low veracity
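As a hint, it can help to work out what share of all the reviews is affected. The short calculation below uses only the two percentages given in the question; the variable names are illustrative.

```python
# Shares taken from the question above.
five_star_share = 0.90      # share of all reviews that gave a 5/5 rating
questionable_within = 0.50  # share of those with no proof of purchase or not serious

# Share of ALL reviews that are questionable 5/5 ratings.
questionable_of_all = five_star_share * questionable_within
print(f"{questionable_of_all:.0%} of all reviews are questionable 5/5 ratings")
```

Nearly half of the collected reviews therefore cannot be relied on, whatever the total number of reviews or the speed at which they arrive.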