Bda L2
Introduction to Big Data and Hadoop
Characteristics of Big Data
Characteristics of Big Data - Velocity
Characteristics of Big Data - Velocity
Characteristics of Big Data - Veracity
Characteristics of Big Data - Volume
Traditional Versus Big Data Approach
Big Data Approach
Big Data Approach
Hadoop over RDBMS
Scalability – nodes can be added to scale the system with little administration.
Unlike a traditional RDBMS, Hadoop requires no pre-processing before data is stored.
Any unstructured data such as text, images and videos can be stored.
There is no fixed limit on how much data can be stored or for how long.
Protection against hardware failure – if a node fails, its work is
redirected to other nodes.
Multiple copies of the data are automatically stored.
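The last two points (redirecting work on node failure and keeping multiple copies) can be illustrated with a toy model. This is a minimal sketch, not Hadoop's actual API or placement policy: the function names, node names and block names are hypothetical, the round-robin placement is a simplification, and the replication factor of 3 is assumed here because it is the usual HDFS default.

```python
REPLICATION_FACTOR = 3  # assumption: matches the usual HDFS default


def place_blocks(blocks, nodes, replication=REPLICATION_FACTOR):
    """Assign each block to `replication` distinct nodes, round-robin (toy policy)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one copy on a live node."""
    return {
        block
        for block, holders in placement.items()
        if any(node not in failed_nodes for node in holders)
    }


# Hypothetical four-node cluster holding four blocks.
nodes = ["node1", "node2", "node3", "node4"]
blocks = ["blk_a", "blk_b", "blk_c", "blk_d"]
placement = place_blocks(blocks, nodes)

# Losing one node leaves every block readable from another replica.
print(sorted(readable_blocks(placement, failed_nodes={"node2"})))
```

In this toy setup a block becomes unreadable only when every node holding one of its copies has failed, which is why adding replicas improves protection against hardware failure.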
When acquiring systems for analyzing big data, it is important to check that
they provide the following:
Support for extracting data from different data sources and from different
internal and external platforms.
Support for different data types, for example, text documents, existing
databases, data streams, images, voice and video.
A data integrator to combine information from different sources for
analysis.
A simple interface to carry out analysis.
A facility to view results in different ways according to the user's needs.
Answer???
Amazon has been collecting review data for a particular product. They found that
almost 90% of the reviews gave a 5/5 rating. However, 50% of those reviews came
from customers who had no proof of purchase or who did not post serious reviews
about the product. Which of the following is true of the review data collected
in this situation?
High volume
High velocity
Low value
Low veracity
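The proportions in the question can be worked through numerically. The sketch below is only an illustration: it treats "almost 90%" as exactly 0.90, and the function and variable names are hypothetical.

```python
def questionable_share(five_star_share, unreliable_fraction):
    """Share of ALL reviews that are 5/5 ratings from unreliable reviewers."""
    return five_star_share * unreliable_fraction


five_star_share = 0.90      # assumption: "almost 90%" rounded to 0.90
unreliable_fraction = 0.50  # half of those lack proof of purchase or are not serious

share = questionable_share(five_star_share, unreliable_fraction)
print(f"{share:.0%} of all reviews are 5/5 ratings from unreliable reviewers")
```

Under these assumptions, close to half of the whole data set consists of ratings that cannot be trusted.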