Big - Data Unit-1

Brief introduction to Big Data, Characteristics of Big Data, Digital Data and its types, Big Data Analytics, Applications of Big Data.

Uploaded by

Tulshiram Kamble

Introduction to Big Data
Unit-1
Introduction
• Data exists everywhere.
• The amount of digital data is rising at a rapid rate, doubling every
few years and changing our lives.
• The quantity of data generated per second is very large.
• Real-time analysis of data streams is needed to manage such huge
volumes. Through proper analysis we can extract the essential data and
use it to predict network traffic, intrusion-related activity, weather,
and much more.
• As data grows rapidly, specific trends and patterns emerge in it, but
it is difficult to know where to look or how to find them.
• Years ago, organizations used only structured data, which is easy to
handle with an RDBMS, a tool to store, manage, process, and report on
such data. Today the nature of data has changed: huge amounts of data
are generated in various formats and at a very fast rate.
• This data is not simple structured data, so it is almost impossible
to store, manage, process, and report on it using traditional
relational databases.
• Big data is the solution to these problems of data storage and
manipulation.
Concept of Big data
• Big data refers to the tools, processes, and procedures that allow an
organization to create, manipulate, and manage huge data sets and
storage facilities.
• It refers to huge volumes of data that can't be processed effectively
with traditional applications and analysis techniques.
• It is not possible to store and aggregate the raw data in the memory
of a single computer for processing.
• So it requires efficient tools for data management and analysis.
• Analyzing big data can guide better decisions and strategic business
steps.
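The point about data not fitting in the memory of a single computer can be illustrated on a small scale. The following is a minimal sketch, not a big data tool: it assumes the data arrives as an iterable and computes a mean from per-chunk partial sums, so only one chunk is held in memory at a time. The function names and the chunk size are hypothetical choices for illustration.

```python
from typing import Iterable, Iterator


def read_in_chunks(values: Iterable[int], chunk_size: int) -> Iterator[list]:
    """Yield successive chunks so only one chunk is held in memory at a time."""
    chunk = []
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # emit the final, possibly shorter, chunk
        yield chunk


def chunked_mean(values: Iterable[int], chunk_size: int = 1000) -> float:
    """Compute a mean from per-chunk partial sums and counts."""
    total = 0
    count = 0
    for chunk in read_in_chunks(values, chunk_size):
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no data")
    return total / count


# range() is lazy, standing in for a dataset too large to materialise at once.
mean_value = chunked_mean(range(1, 1_000_001), chunk_size=10_000)
```

The same idea, combining partial results computed over pieces of the data, is what distributed tools apply across many machines rather than across chunks on one machine.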
Definition of Big data
• Big data analytics involves using advanced tools and techniques to
uncover patterns, correlations, and insights (understanding) from
these large datasets to inform decision-making and strategic planning.

• Big data refers to extremely large and complex datasets that cannot
be easily managed, processed, or analyzed using traditional data
processing tools.
Characteristics of Big Data
• The characteristics of Big Data are described by the 5 V's:
• Volume
• Veracity
• Variety
• Value
• Velocity
Volume
• The name Big Data itself relates to enormous size. Big Data consists
of vast volumes of data generated daily from many sources, such as
business processes, machines, social media platforms, networks,
human interactions, and many more.
• Volume is the quantity of data generated, stored, and operated on
within the system.
• For example, Facebook is reported to generate approximately a billion
messages, record about 4.5 billion clicks of the "Like" button, and
receive more than 350 million new posts each day. Big data technologies
can handle such large amounts of data.
Variety
• Big Data can be structured, semi-structured, or unstructured, and is
collected from many different sources. In the past, data was collected
only from databases and spreadsheets, but these days it arrives in a
wide array of forms such as PDFs, emails, audio, social media posts,
photos, videos, etc.
The data is categorized as below:
• Structured data: Structured data follows a defined schema with all
the required columns and is in tabular form. It is stored in a
relational database management system. OLTP (Online Transaction
Processing) systems are built to work with structured data, which is
stored in relations, i.e., tables.
• Semi-structured data: The schema is not strictly defined, e.g., JSON,
XML, CSV, TSV, and email.
• Unstructured data: Files with no predefined structure, such as
free-text files, log files, audio files, and image files. Some
organizations have a great deal of such data available but do not know
how to derive value from it because the data is raw.
Veracity
• Veracity means how reliable the data is. It involves ways to filter
or translate the data so that it can be handled and managed
efficiently. Big Data is also essential in business development.
• Veracity is the assurance of the quality or trustworthiness of the data.
It refers to inconsistencies and uncertainty in data.
• For example, Facebook posts with hashtags can be inconsistent and
hard to interpret reliably.
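Checking veracity usually starts with simple quality checks. The sketch below is an illustrative assumption rather than a standard procedure: the records, field names, and the plausible age range of 0 to 120 are all hypothetical. It counts missing values, implausible values, and duplicate identifiers.

```python
# Hypothetical records; field names are chosen only for illustration.
records = [
    {"id": 1, "age": 34, "email": "a@example.com"},
    {"id": 2, "age": None, "email": "b@example.com"},  # missing value
    {"id": 3, "age": 212, "email": "c@example.com"},   # implausible value
    {"id": 1, "age": 34, "email": "a@example.com"},    # duplicate id
]


def quality_report(rows):
    """Classify each row as duplicate, missing, out_of_range, or clean."""
    seen_ids = set()
    report = {"missing": 0, "out_of_range": 0, "duplicate": 0, "clean": 0}
    for row in rows:
        if row["id"] in seen_ids:
            report["duplicate"] += 1
            continue
        seen_ids.add(row["id"])
        if row["age"] is None:
            report["missing"] += 1
        elif not 0 <= row["age"] <= 120:  # assumed plausible range
            report["out_of_range"] += 1
        else:
            report["clean"] += 1
    return report


report = quality_report(records)
```

A report like this gives a rough measure of how much of a data set can be trusted before any analysis is run on it.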
Velocity
• Velocity plays an important role alongside the other characteristics.
It refers to the speed at which data is created in real time, and it
covers the speed of incoming data sets, their rate of change, and
bursts of activity.
• A primary aspect of Big Data is providing in-demand data rapidly.
• Big data velocity deals with the speed at which data flows from
sources like application logs, business processes, networks, social
media sites, sensors, mobile devices, etc.
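Velocity is often measured as the number of events arriving within a recent time window. The following is a minimal sketch under two assumptions: timestamps are plain numbers in seconds, and events arrive in time order. The class name and the 10-second window are hypothetical.

```python
from collections import deque


class SlidingWindowCounter:
    """Count events seen in the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record(self, timestamp):
        """Add an event and return how many events remain inside the window."""
        self._timestamps.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop events that fell out of the window (timestamps arrive in order).
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)


counter = SlidingWindowCounter(window_seconds=10)
counts = [counter.record(t) for t in [0, 1, 2, 11, 12, 30]]
```

The counter keeps only the events inside the window, so its memory use depends on the arrival rate rather than on the total number of events seen.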
Value
• Value refers to the ability to turn data into value. Big data must
have value.
• It covers the potential insights and benefits that can be derived
from analyzing the data.
• Data with no value is of no use to an organization unless it is
turned into something useful.

Volume + Velocity + Variety + Veracity = Value

Advantages of Big Data
• Improved Business Processes
• Fraud Detection
• Improved Customer Service
• Better Decision Making
• Increased Productivity
• Reduced Cost
• Increased Revenue
Disadvantages
• Cyber Security Risk
• Need for expertise
• Data quality
• Accuracy of results
• Technical Complexity
Digital Data
• Digital data is data that represents other forms of data using a
specific machine-language system that can be interpreted by various
technologies.

• Digital data is the term commonly used in computing for information
(data) transformed into binary form, such as digital audio or digital
photography.
Types of Big Data/Digital Data
Structured Data
• Structured data can be defined as the data that resides in a fixed field within a record.

• It is the type of data most familiar from our everyday lives, for example a birthday or an address.

• A certain schema ( structure) binds it, so all the data has the same set of properties. Structured
data is also called relational data. It is split into multiple tables to enhance the integrity
(veracity) of the data by creating a single record to depict (represent) an entity. Relationships
are enforced by the application of table constraints.

• The business value of structured data lies within how well an organization can utilize its existing
systems and processes for analysis purposes.
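The ideas of a fixed schema, multiple related tables, and constraints can be shown with Python's built-in sqlite3 module. This is a small sketch under stated assumptions: the table names, column names, and sample rows are hypothetical, and an in-memory SQLite database stands in for a full RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Every row in a table shares the same set of typed columns (the schema).
conn.execute("""
    CREATE TABLE customer (
        id       INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        birthday TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE address (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        city        TEXT NOT NULL
    )
""")

conn.execute("INSERT INTO customer VALUES (1, 'Asha', '1998-04-12')")
conn.execute("INSERT INTO address VALUES (1, 1, 'Pune')")

# A row that points at a non-existent customer violates the constraint.
try:
    conn.execute("INSERT INTO address VALUES (2, 99, 'Mumbai')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute(
    "SELECT c.name, a.city FROM customer c JOIN address a ON a.customer_id = c.id"
).fetchall()
```

Here the foreign key is the table constraint that enforces the relationship: an address cannot exist without the customer it belongs to.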
Sources of Structured data
Semi-Structured Data
• The data is not in relational format and is not neatly organized into rows and columns like a
spreadsheet. However, it has features such as key-value pairs that help distinguish the different
entities from one another.

• Since semi-structured data doesn’t need a structured query language, it is commonly called NoSQL data.

• A data serialization language is used to exchange semi-structured data across systems that may even have
varied underlying (basic) infrastructure.

• Semi-structured content is often used to store metadata about a business process but it can also include files
containing machine instructions for computer programs.

• This type of information typically comes from external sources such as social media platforms or other web-
based data feeds.
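JSON is one common data serialization language for semi-structured data. The sketch below uses hypothetical event records to show key-value pairs, records that differ in shape, and a round trip through text as it would happen when data is exchanged between systems.

```python
import json

# Two records describing similar entities, but with different sets of keys.
events = [
    {"user": "asha", "action": "post", "tags": ["bigdata", "unit1"]},
    {"user": "ravi", "action": "like", "target": {"post_id": 17}},
]

# Serialise to text for exchange, then parse it back on the receiving side.
payload = json.dumps(events)
received = json.loads(payload)

# Keys are discovered from the data itself rather than from a fixed schema.
all_keys = sorted({key for event in received for key in event})
post_ids = [e["target"]["post_id"] for e in received if "target" in e]
```

Because no table schema is imposed, code that reads such data has to check which keys are present, as the `"target" in e` test does.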
Sources of Semi-Structured data
Unstructured Data
• Unstructured data is the kind of data that doesn’t adhere to (follow) any definite
schema or set of rules. Its arrangement is unplanned and haphazard (disorganized).

• Photos, videos, text documents, and log files can be generally considered
unstructured data. Even though the metadata accompanying an image or a video
may be semi-structured, the actual data being dealt with is unstructured.

• Additionally, unstructured data is also known as “dark data” because it cannot be
analyzed without the proper software tools.
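As a small illustration of the tooling needed, raw log text can be given structure with a regular expression. This is only a sketch: the log format and the sample lines are hypothetical, and real logs vary widely.

```python
import re
from collections import Counter

# Hypothetical raw text: three log lines plus one free-form note.
log_text = """\
2024-01-05 10:02:11 ERROR payment failed for order 1042
2024-01-05 10:02:15 INFO user asha logged in
2024-01-05 10:03:40 ERROR payment failed for order 1043
free-form note: customer called twice about the refund
"""

# Assumed line layout: date, time, level, then the message.
LINE_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\w+) (.*)$")

parsed = []
unparsed = []
for line in log_text.splitlines():
    match = LINE_PATTERN.match(line)
    if match:
        date, time, level, message = match.groups()
        parsed.append({"date": date, "time": time, "level": level, "message": message})
    else:
        unparsed.append(line)  # text that does not fit the pattern stays raw

level_counts = Counter(entry["level"] for entry in parsed)
```

Lines that match become records that can be counted and filtered, while the free-form note stays unanalyzed until some other tool can interpret it.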
Sources of Un-Structured data
Big Data Analytics
• It is the process of collecting, organizing, and analyzing large sets
of data with various tools and techniques in order to discover unknown
patterns and other useful information.
• It is also used to find hidden correlations, meaningful trends, and
other insights for making data-driven decisions.
• It helps organizations better understand the information contained
within the data and identify the data that is most important to future
business decisions and predictions.
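Finding a correlation can be sketched with the Pearson correlation coefficient, one common measure of how strongly two quantities move together. The figures below are hypothetical and the function is a plain-Python illustration, not the method of any particular analytics tool.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily figures: ad spend, and orders received.
ad_spend = [10.0, 20.0, 30.0, 40.0, 50.0]
orders = [12.0, 25.0, 31.0, 47.0, 55.0]
r = pearson(ad_spend, orders)
```

A value of `r` close to 1 indicates that the two series rise together. A correlation alone does not show that one causes the other.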
Need of Big Data Analytics
• Data is generated in many different forms. Traditional analytics
solutions are often not feasible because of the cost of implementation
and the lack of skilled professionals.
• It helps to improve applications and services to provide better
outcomes.
• It helps to understand and fulfill customer needs and demands.
• It helps to uncover hidden patterns, unknown correlations, market
trends, etc., leading to more effective marketing, better customer
service, etc.
Benefits of Big data analytics
• Cost savings
• Saves time, enabling faster and better decisions
• Understanding customer needs
• Improved products and services
• Increased security
Applications of Big data
• Banking Sector
• Health Sector
• Media & entertainment Sector
• E-commerce
Big Data in Banking Sector
• Customer Insights and Personalization
Customer Segmentation and Personalized Services: Analyzing customer data to tailor products and
services to individual needs.
• Risk Management
Credit Scoring and Fraud Detection: Improving credit scoring accuracy and identifying potential
fraud through advanced data analytics.
• Operational Efficiency
Process Optimization and Predictive Maintenance: Streamlining (reform) processes and preventing
system failures with data-driven insights.
• Customer Experience
360-Degree View and Feedback Analysis: Understanding customer interactions across channels and
improving service quality by analyzing feedback.
• Investment and Wealth Management
Portfolio Management and Robo-Advisors: Optimizing investment
portfolios and providing automated investment advice using algorithms
and big data.
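Fraud detection, mentioned under risk management, often starts by flagging transactions that differ sharply from a customer's usual behaviour. The following is a deliberately simple sketch using a standard-deviation rule. The amounts and the threshold of 2.0 are hypothetical, and production systems use far richer models.

```python
import statistics


def flag_unusual(amounts, threshold=2.0):
    """Return amounts more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(amounts)
    stdev = statistics.pstdev(amounts)
    if stdev == 0:
        return []  # all amounts identical, nothing stands out
    return [a for a in amounts if abs(a - mean) / stdev > threshold]


# Hypothetical card transactions for one customer.
transactions = [42.0, 38.5, 45.0, 40.0, 39.5, 41.0, 43.5, 950.0]
suspicious = flag_unusual(transactions)
```

Flagged transactions would normally go on to further checks rather than being rejected outright, since an unusual amount is not proof of fraud.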
Big Data in Health Sector
• Patient Care and Outcomes
Personalized Medicine: Tailoring treatments based on individual data.
Predictive Analytics: Identifying at-risk patients for early intervention.
• Disease Tracking and Prevention
Epidemiology: Predicting and tracking disease outbreaks.
Public Health Surveillance (investigation): Monitoring health trends to
control diseases.
• Operational Efficiency
Resource Management: Optimizing healthcare resources and staffing
(employment).
Supply Chain Management: Ensuring availability of medical supplies.
• Clinical Research
Data-Driven Trials: Enhancing clinical trials with large datasets.
New Drug Development: Accelerating drug discovery using analytics.
• Healthcare Management
Cost Reduction: Identifying cost-saving opportunities.
Quality Improvement: Monitoring and improving care quality.
• Patient Engagement
Telemedicine: Enabling remote consultations and monitoring.
Patient Feedback: Using feedback to enhance healthcare services.
Big Data in Media & Entertainment Sector
• The media sector constantly generates data from various sources such
as research, sales, customer databases, log files, and so on.
• It is even possible to use an artist's views or likes to gauge their
popularity in digital media.
• Big data also helps with the following in the media & entertainment
sector:
• Predicting audience interests
• Providing insights into customer churn (the loss of customers over time)
• Optimized scheduling of media streams
• Content monetization
Big Data in E-Commerce Sector
• Amazon, Flipkart, Alibaba, and many more rely on big data to make the
right business decisions.
• Big data has grown in e-commerce and helps companies predict users'
interests and show customers relevant search results when they shop
online.
• It also helps companies find the market position of a particular
product relative to the competition and compare it with other online
stores.
• Online retailers use big data to provide a better shopping
experience, improve customer satisfaction, and generate more sales.
• Hadoop is a widely used technology that provides a scalable and
inexpensive platform for data processing.
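Hadoop is closely associated with the MapReduce programming model, in which work is split into a map step, a grouping (shuffle) step, and a reduce step. The sketch below imitates those three steps in plain Python on one machine. It does not use any Hadoop API, and the sample search queries are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical search queries; each could be mapped on a different machine.
documents = ["red shoes", "blue shoes", "red hat"]
mapped = chain.from_iterable(map_phase(doc) for doc in documents)
word_counts = reduce_phase(shuffle_phase(mapped))
```

Because each document is mapped independently and each key is reduced independently, both steps can be spread across many inexpensive machines, which is where the scalability comes from.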
