Big - Data Unit-1

Brief introduction to Big Data, characteristics of Big Data, digital data and its types, Big Data analytics, and applications of Big Data.

Uploaded by

Tulshiram Kamble
Copyright
© All Rights Reserved

Introduction to Big Data
Unit-1
Introduction
• Data exists everywhere.
• The amount of digital data is rising rapidly, doubling every few
years and changing our lives.
• The quantity of data generated every second is very large.
• Managing such huge data requires real-time analysis of data streams.
With proper analysis we can extract the essential data and use it to
predict network traffic, intrusion-related activity, weather, and
much more.
• As data grows rapidly, it contains specific trends and patterns, but
it is difficult to know where to look or how to find them.
• Years ago, organizations used only structured data, which an RDBMS
handles easily: it provides the tools to store, manage, process, and
report on this data. Today the nature of data has changed: huge
amounts of data are generated in various formats and at a very fast
rate.
• This data is not simple structured data, so it is almost impossible
to use traditional relational databases to store, manage, process, and
report on it.
• Big data is the solution to these problems of data storage and
manipulation.
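The real-time stream analysis mentioned above can be illustrated with a minimal sketch. It assumes a hypothetical list of packets-per-second readings and flags any reading that exceeds twice the average of the previous five readings. The function name, window size, and threshold are illustrative choices, not part of the original slides.

```python
from collections import deque

def rolling_mean_alerts(stream, window=5, threshold=2.0):
    """Yield (index, value) for readings that exceed `threshold` times
    the mean of the previous `window` readings."""
    recent = deque(maxlen=window)  # only the last `window` values stay in memory
    for i, value in enumerate(stream):
        if len(recent) == window:
            mean = sum(recent) / window
            if mean > 0 and value > threshold * mean:
                yield i, value
        recent.append(value)

# Hypothetical packets-per-second readings containing one burst.
traffic = [100, 110, 95, 105, 100, 98, 400, 102, 99]
alerts = list(rolling_mean_alerts(traffic))
print(alerts)
```

Because the generator keeps only a small window of recent values, it can process a stream of any length without storing the whole stream.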
Concept of Big data
• Big data refers to the tools, processes, and procedures that allow
an organization to create, manipulate, and manage very large data sets
and storage facilities.
• It refers to huge volumes of data that can't be processed effectively
with traditional applications and analysis techniques.
• It is not possible to store and aggregate the raw data in the memory
of a single computer for processing.
• It therefore requires efficient tools for data management and analysis.
• Analyzing big data can guide better decisions and strategic business
steps.
Definition of Big data
• Big data refers to extremely large and complex datasets that cannot
be easily managed, processed, or analyzed using traditional data
processing tools.

• Big data analytics involves using advanced tools and techniques to
uncover patterns, correlations, and insights (understanding) from
these large datasets to inform decision-making and strategic planning.
Characteristics of Big Data
• The characteristics of Big Data are described by the 5 V's:
• Volume
• Veracity
• Variety
• Value
• Velocity
Volume
• The name Big Data itself refers to enormous size. Big Data means
vast volumes of data generated daily from many sources, such
as business processes, machines, social media platforms, networks,
human interactions, and many more.
• Volume is the quantity of data generated, stored, and operated on
within the system.
• For example, Facebook generates approximately a billion messages,
records the "Like" button 4.5 billion times, and receives more than
350 million new posts each day. Big data technologies can handle such
large amounts of data.
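To get a feel for these volumes, the daily Facebook figures quoted above can be converted into per-second rates. This is only arithmetic on the numbers from the slide. The dictionary keys are illustrative labels.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Daily figures as quoted on the slide.
daily_counts = {
    "messages": 1_000_000_000,
    "likes": 4_500_000_000,
    "posts": 350_000_000,
}

# Average events per second for each activity.
per_second = {name: count / SECONDS_PER_DAY for name, count in daily_counts.items()}

for name, rate in per_second.items():
    print(f"{name}: about {rate:,.0f} per second")
```

Even the smallest figure works out to several thousand new posts every second, which illustrates why a single machine struggles at this scale.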
Variety
• Big Data can be structured, semi-structured, or unstructured, and it
is collected from many different sources. In the past, data was
collected only from databases and spreadsheets, but today it arrives
in an array of forms such as PDFs, emails, audio, social media
posts, photos, and videos.
The data is categorized as below:
• Structured data: Structured data has a defined schema with all the
required columns and is in tabular form. It is stored in a
relational database management system. OLTP (Online Transaction
Processing) systems are built to work with structured data stored in
relations, i.e., tables.
• Semi-structured data: The schema is not strictly defined, e.g.,
JSON, XML, CSV, TSV, and email.
• Unstructured data: Unstructured files such as log files, audio files,
and image files fall into this category. Some organizations have a
lot of data available but do not know how to derive value from it
because the data is raw.
Veracity
• Veracity means how reliable the data is. It covers the ways data can
be filtered or translated so that it can be handled and managed
efficiently. Big Data is also essential in business development.
• Veracity is the assurance of the quality or trustworthiness of the data.
It refers to the inconsistencies and uncertainty in data.
• For example, Facebook posts with hashtags can be inconsistent and
difficult to verify.
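One way to make veracity concrete is a simple data quality report. The sketch below counts missing values, implausible values, and exact duplicates. The record fields, the sample rows, and the 0 to 120 age range are all hypothetical assumptions chosen for illustration.

```python
# Hypothetical user records with typical quality problems.
records = [
    {"user": "a1", "age": 34, "country": "IN"},
    {"user": "a2", "age": None, "country": "IN"},  # missing age
    {"user": "a3", "age": -5, "country": "IN"},    # impossible age
    {"user": "a1", "age": 34, "country": "IN"},    # exact duplicate of the first record
]

def quality_report(rows):
    """Count duplicate, missing, and invalid records in `rows`."""
    report = {"missing": 0, "invalid": 0, "duplicate": 0}
    seen = set()
    for row in rows:
        key = (row["user"], row["age"], row["country"])
        if key in seen:
            report["duplicate"] += 1
            continue
        seen.add(key)
        if row["age"] is None:
            report["missing"] += 1
        elif not 0 <= row["age"] <= 120:  # assumed plausible age range
            report["invalid"] += 1
    return report

report = quality_report(records)
print(report)
```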
Velocity
• Velocity plays an important role compared to the other
characteristics. Velocity is the speed at which data is created in
real time. It covers the speed of incoming data sets, the rate of
change, and bursts of activity.
• A primary aspect of Big Data is to provide the data that is in
demand rapidly.
• Big data velocity deals with the speed at which data flows from
sources such as application logs, business processes, networks,
social media sites, sensors, and mobile devices.
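Velocity can be measured as events per unit of time. The sketch below assumes a hypothetical list of event timestamps (seconds since the start of capture) and computes the average rate and the busiest second, which corresponds to the "activity bursts" mentioned above.

```python
from collections import Counter

# Hypothetical event timestamps, in seconds since the start of capture.
timestamps = [0.1, 0.4, 0.9, 1.2, 2.0, 2.1, 2.2, 2.3, 2.8, 3.5]

# Bucket events into whole seconds.
events_per_second = Counter(int(t) for t in timestamps)

# Average rate over the observed whole-second buckets.
observed_seconds = int(max(timestamps)) + 1
average_rate = len(timestamps) / observed_seconds

# The busiest second and how many events it contained.
peak_second, peak_count = max(events_per_second.items(), key=lambda kv: kv[1])

print(average_rate, peak_second, peak_count)
```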
Value
• Value refers to the ability to turn data into something valuable.
Big data must have value.
• Value is the potential insights and benefits that can be derived
from analyzing the data.
• Data that has no value is of no use to an organization unless it is
turned into something useful.

Volume + Velocity + Variety + Veracity = Value


Advantages of Big Data
• Improved Business Processes
• Fraud Detection
• Improved Customer Service
• Better Decision Making
• Increase Productivity
• Reduce Cost
• Increase Revenue
Disadvantages
• Cyber Security Risk
• Need for expertise
• Data quality
• Accuracy of results
• Technical Complexity
Digital Data
• It is data that represents other forms of data using a specific machine
language system that can be interpreted by various technologies.

• Digital data is the term commonly used in computing for information
(data) transformed into binary form, such as digital audio and
digital photography.
Types of Big Data/Digital Data
Structured Data
• Structured data can be defined as the data that resides in a fixed field within a record.

• It is the type of data most familiar in our everyday lives, for example a birthday or an address.

• A certain schema (structure) binds it, so all the data has the same set of properties. Structured
data is also called relational data. It is split into multiple tables to enhance the integrity
(veracity) of the data by creating a single record to depict (represent) an entity. Relationships
are enforced by the application of table constraints.

• The business value of structured data lies within how well an organization can utilize its existing
systems and processes for analysis purposes.
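The ideas of a fixed schema, multiple tables, and relationships enforced by constraints can be sketched with Python's built-in sqlite3 module. The table names, column names, and sample rows below are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Each entity gets its own table with a fixed set of typed columns.
conn.execute("""
    CREATE TABLE customer (
        id       INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        birthday TEXT NOT NULL,
        address  TEXT
    )""")
conn.execute("""
    CREATE TABLE purchase (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        amount      REAL NOT NULL CHECK (amount > 0)
    )""")

conn.execute("INSERT INTO customer VALUES (1, 'Asha', '1990-04-12', 'Pune')")
conn.execute("INSERT INTO purchase VALUES (1, 1, 250.0)")

try:
    # Customer 99 does not exist, so the foreign key constraint rejects this row.
    conn.execute("INSERT INTO purchase VALUES (2, 99, 80.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute(
    "SELECT c.name, p.amount FROM customer c JOIN purchase p ON p.customer_id = c.id"
).fetchall()
print(rejected, rows)
```

The rejected insert shows how table constraints protect the integrity of the data: a purchase cannot refer to a customer that does not exist.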
Sources of Structured data
Semi-Structured Data
• The data is not in relational format and is not neatly organized into rows and columns like a
spreadsheet. However, features such as key-value pairs help in distinguishing the different
entities from each other.

• Since semi-structured data doesn’t need a structured query language, it is commonly called NoSQL data.

• A data serialization language is used to exchange semi-structured data across systems that may even have
varied underlying (basic) infrastructure.

• Semi-structured content is often used to store metadata about a business process but it can also include files
containing machine instructions for computer programs.

• This type of information typically comes from external sources such as social media platforms or other web-
based data feeds.
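JSON is one common data serialization format for semi-structured data. The sketch below, using hypothetical social media posts, shows records that share some keys but not others, serialized to text and read back as they might be on another system.

```python
import json

# Hypothetical posts: each record has key-value pairs, but not the same set of keys.
posts = [
    {"id": 1, "user": "asha", "text": "Hello", "tags": ["intro"]},
    {"id": 2, "user": "ravi", "text": "Big data!", "likes": 12},
]

payload = json.dumps(posts)     # serialize to a text string for exchange
received = json.loads(payload)  # parse it back on the receiving side

# The keys describe the data, so no fixed schema is required up front.
all_keys = sorted({key for post in received for key in post})
likes = [post.get("likes", 0) for post in received]  # missing keys need a default

print(all_keys, likes)
```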
Sources of Semi-Structured data
Unstructured Data
• Unstructured data is the kind of data that doesn’t adhere to (follow) any definite
schema or set of rules. Its arrangement is unplanned and haphazard (disorganized).

• Photos, videos, text documents, and log files can generally be considered
unstructured data. Even though the metadata accompanying an image or a video
may be semi-structured, the actual data being dealt with is unstructured.

• Additionally, unstructured data is also known as “dark data” because it cannot be
analyzed without the proper software tools.
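As a small illustration of the tooling unstructured data needs, the sketch below pulls hashtags and frequent words out of free text with regular expressions. The review text and the stopword list are hypothetical, and real text analytics would use far more capable tools.

```python
import re
from collections import Counter

# Hypothetical free-text customer review with no fixed fields.
review = ("Loved the new phone, battery life is great. "
          "Camera is great too, but delivery was slow. #phone #happy")

STOPWORDS = {"the", "is", "was", "but", "too", "new"}  # assumed, tiny stopword list

hashtags = re.findall(r"#(\w+)", review)
words = [w for w in re.findall(r"[a-z]+", review.lower()) if w not in STOPWORDS]
top_words = Counter(words).most_common(2)

print(hashtags, top_words)
```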
Sources of Un-Structured data
Big Data Analytics
• It is the process of collecting, organizing, and analyzing large sets
of data with various tools and techniques to discover unknown
patterns and other useful information.
• It is also used to find hidden correlations, meaningful trends, and
other insights for making data-driven decisions.
• It helps an organization better understand the information
contained within the data and identify the data that is most
important to the business's future decisions and predictions.
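Finding a correlation between two measurements is one of the simplest forms of analytics. The sketch below computes the Pearson correlation coefficient by hand for hypothetical advertising spend and sales figures. A value near 1 suggests the two rise together, and a value near -1 suggests they move in opposite directions.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical monthly figures.
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 41, 62, 79, 103]

r = pearson(ad_spend, sales)
print(round(r, 3))
```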
Need of Big Data Analytics
• Data is generated in many different forms. Traditional analytics
solutions are not feasible because of the cost of implementation and
the lack of professionals.
• It helps to improve applications and services to provide better
outcomes.
• It helps to understand and fulfill customer needs and demands.
• It helps to uncover hidden patterns, unknown correlations, market
trends, etc., which lead to more effective marketing, better customer
service, etc.
Benefits of Big data analytics
• Cost savings
• Time savings, enabling faster and better decisions
• Understanding of customer needs
• Improved products and services
• Increase Security
Applications of Big data
• Banking Sector
• Health Sector
• Media & entertainment Sector
• E-commerce
Big Data in Banking Sector
• Customer Insights and Personalization
Customer Segmentation and Personalized Services: Analyzing customer data to tailor products and
services to individual needs.
• Risk Management
Credit Scoring and Fraud Detection: Improving credit scoring accuracy and identifying potential
fraud through advanced data analytics.
• Operational Efficiency
Process Optimization and Predictive Maintenance: Streamlining (reform) processes and preventing
system failures with data-driven insights.
• Customer Experience
360-Degree View and Feedback Analysis: Understanding customer interactions across channels and
improving service quality by analyzing feedback.
• Investment and Wealth Management
Portfolio Management and Robo-Advisors: Optimizing investment
portfolios and providing automated investment advice using algorithms
and big data.
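Fraud detection often starts by flagging transactions that differ sharply from a customer's usual behavior. The sketch below is a deliberately simple z-score check on hypothetical transaction amounts. Real banking systems use far richer models, and the threshold of 2.0 is an assumption chosen for this small sample.

```python
from statistics import mean, stdev

def flag_outliers(amounts, z_threshold=2.0):
    """Return amounts lying more than `z_threshold` sample standard
    deviations away from the mean."""
    mu, sigma = mean(amounts), stdev(amounts)
    if sigma == 0:
        return []  # all amounts identical, nothing stands out
    return [a for a in amounts if abs(a - mu) / sigma > z_threshold]

# Hypothetical card transactions for one customer.
transactions = [120, 95, 130, 110, 105, 99, 125, 5000, 115, 108]
suspicious = flag_outliers(transactions)
print(suspicious)
```

A single extreme value also inflates the standard deviation, so on small samples a lower threshold or a more robust measure such as the median may be preferable.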
Big Data in Health Sector
• Patient Care and Outcomes
Personalized Medicine: Tailoring treatments based on individual data.
Predictive Analytics: Identifying at-risk patients for early intervention.
• Disease Tracking and Prevention
Epidemiology: Predicting and tracking disease outbreaks.
Public Health Surveillance (investigation): Monitoring health trends to
control diseases.
• Operational Efficiency
Resource Management: Optimizing healthcare resources and staffing
(employment).
Supply Chain Management: Ensuring availability of medical supplies.
• Clinical Research
Data-Driven Trials: Enhancing clinical trials with large datasets.
New Drug Development: Accelerating drug discovery using analytics.
• Healthcare Management
Cost Reduction: Identifying cost-saving opportunities.
Quality Improvement: Monitoring and improving care quality.
• Patient Engagement
Telemedicine: Enabling remote consultations and monitoring.
Patient Feedback: Using feedback to enhance healthcare services.
Big Data in Media & Entertainment Sector
• The media sector constantly generates data from various sources such
as research, sales, customer databases, log files, and so on.
• It is even possible to count the views or likes of an artist to gauge
popularity in the digital media sector.
• Big data also helps with other aspects of the media & entertainment
sector, such as:
• Predicting audience interests
• Providing insights into customer churn (the loss of customers over time)
• Optimized scheduling of media streams
• Content monetization
Big Data in E-Commerce Sector
• Amazon, Flipkart, Alibaba, and many more use big data to make the
right business decisions.
• Big data has grown in e-commerce and helps companies predict the
interests of users and offer their customers relevant search results
when they shop online.
• It also helps companies find the position of a particular product that
can grow against the competition, and compare it with other
online stores.
• Online retailers use big data to provide a better shopping experience,
improve customer satisfaction, and generate more sales.
• Hadoop serves as a leading technique that provides a scalable and
inexpensive platform for data processing.
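Hadoop processes data using the MapReduce programming model. The sketch below imitates the three phases (map, shuffle, reduce) in plain Python on hypothetical search queries. It does not use any Hadoop API and runs on one machine, so it only illustrates the idea that Hadoop distributes across a cluster.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)

# Hypothetical search queries from an online store.
searches = ["phone case", "phone charger", "laptop bag", "Phone case"]

mapped = [pair for line in searches for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because each line is mapped independently and each key is reduced independently, the work can be split across many machines, which is what makes the model scalable.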
