Big Data Analytics - Unit1


Big Data Analytics

Unit - 1
What is data?
Dictionary Definition:
The quantities, characters, or symbols on which
operations are performed by a computer, which may
be stored and transmitted in the form of electrical
signals and recorded on magnetic (audio tape), optical
(CD), or mechanical recording media (Phonographic
disc)
What is big data?
It is a collection of data that is huge in volume and yet
growing exponentially with time.

Its size and complexity are so great that no
traditional data management tool can store or
process it efficiently.

Big data is still data, but of enormous size.


Definition
Big data is high-volume, high-velocity and/or high-variety
information assets that demand cost-effective,
innovative forms of information processing for
enhanced insight and decision making.
- Gartner IT Glossary
Such data is of the order of terabytes (10^12 bytes), petabytes
(10^15 bytes) or zettabytes (10^21 bytes).
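To get a feel for these orders of magnitude, the following minimal sketch converts a raw byte count into the units named above. The helper name `humanize_bytes` is hypothetical, and the giga and exa entries are added here only to fill the gaps between the units on the slide.

```python
# Decimal (SI) byte units. TB, PB and ZB come from the slide;
# GB and EB are added here only to fill the gaps in the scale.
UNITS = [
    ("ZB", 10**21),
    ("EB", 10**18),
    ("PB", 10**15),
    ("TB", 10**12),
    ("GB", 10**9),
]

def humanize_bytes(n: int) -> str:
    """Return n bytes expressed in the largest unit that fits."""
    for name, size in UNITS:
        if n >= size:
            return f"{n / size:.2f} {name}"
    return f"{n} B"

print(humanize_bytes(10**12))        # 1.00 TB
print(humanize_bytes(365 * 10**12))  # 365.00 TB (one year at about 1 TB per day)
print(humanize_bytes(3 * 10**21))    # 3.00 ZB
```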
Explanation of Big Data definition
[Figure: Big Data Definition]
Why is Big Data important?
By analyzing data from any source, we
can find answers that:
Streamline resource management
Improve operational efficiencies
Optimize product development
Drive new revenue and growth opportunities
Enable smart decision making
Big data helps accomplish business-related
tasks
Determine the root causes of failures, issues and
defects in near-real time (industrial usage)
Spot anomalies faster and more accurately than
the human eye (healthcare usage)
Recalculate entire risk portfolios in minutes
(investment / finance sector)
Detect fraudulent behavior before it affects your
organization.
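As an illustration of automated anomaly spotting, here is a minimal sketch that flags readings lying far from the mean (a z-score test). The heart-rate values and the threshold of 2.0 are assumptions chosen for the example, not figures from the slides, and real systems typically use more robust methods.

```python
from statistics import mean, stdev

def find_anomalies(readings, threshold=2.0):
    """Return (index, value) pairs whose z-score exceeds the threshold."""
    mu = mean(readings)
    sigma = stdev(readings)
    if sigma == 0:
        return []  # all readings identical, nothing stands out
    return [
        (i, x) for i, x in enumerate(readings)
        if abs(x - mu) / sigma > threshold
    ]

# Hypothetical heart-rate readings with one obvious spike
heart_rates = [72, 75, 71, 74, 73, 70, 76, 140, 72, 74]
print(find_anomalies(heart_rates))  # [(7, 140)]
```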
Some examples of big data
The NYSE generates about one terabyte of new trade
data per day.
55 billion messages and 4.5 billion photos are
exchanged on WhatsApp every day.
300 hours of video are uploaded every minute.
Every minute, users send 31.25 million messages and
watch 2.77 million videos.
Around 40,000 search queries are googled each
second.
Types of Big Data
Structured
Semi-Structured
Unstructured
Share of big data by type:
Unstructured: 80%
Structured: 10%
Semi-structured: 10%
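The difference between the three types is easiest to see side by side. The sketch below uses made-up sample records (all names and values are hypothetical): a CSV table with a fixed schema, a JSON record with flexible nesting, and free text with no schema at all.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same schema
structured = "order_id,customer,amount\n1001,Asha,250.00\n1002,Ravi,99.50\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing keys, but fields and nesting can vary
semi = '{"order_id": 1003, "customer": {"name": "Meera", "tags": ["vip"]}, "note": "gift wrap"}'
record = json.loads(semi)

# Unstructured: free text, no schema; needs text mining or NLP to analyze
unstructured = "Loved the product, but delivery took far too long."

print(rows[0]["amount"])           # 250.00
print(record["customer"]["tags"])  # ['vip']
print(len(unstructured.split()))   # 9
```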
Un-structured Data
10 Characteristics of Big Data
The 10 V’s of Big Data
Volume: Data size
Velocity: Speed at which data is generated
Variety: Different types of data
Variability: Dynamic, evolving behavior in data science
Veracity: Confidence or trust in data; data accuracy
Validity: Data quality check
Vulnerability: Security concerns
Volatility: Data governance
Visualization: Data clustering, sunbursts
Value: Getting value out of data
Unstructured data for Analytics
Business Documents
Emails
Social Media
Customer feedback
Webpages
Open-ended survey responses
Images, Audio and Video

Importance of unstructured data analysis for businesses:


Improve the customer experience
Discover gaps in the market and innovate
Listen to your customers
How to Analyze unstructured Data
Choose the End Goal
Define a clear set of measurable goals.

Collect Relevant Data


Focus on the source of data

Clean Data
Reduce noise
Eliminate unwanted information

Implement Technology
NoSQL databases
Data visualization using Tableau or Google Data Studio
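The "Clean Data" step can be sketched as follows, assuming the raw input is customer feedback scraped from web pages. The function name `clean_text` and the tiny stopword list are hypothetical; production pipelines usually rely on a fuller stopword list and a proper tokenizer.

```python
import html
import re

# Hypothetical, deliberately tiny stopword list for illustration
STOPWORDS = {"the", "a", "an", "is", "was", "and", "to", "of", "it"}

def clean_text(raw: str) -> list[str]:
    """Reduce noise: decode entities, drop tags and URLs, remove stopwords."""
    text = html.unescape(raw)                  # "&amp;" becomes "&"
    text = re.sub(r"<[^>]+>", " ", text)       # drop HTML tags
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

raw = "<p>The delivery was LATE &amp; the box was damaged!! See http://example.com/img1</p>"
print(clean_text(raw))  # ['delivery', 'late', 'box', 'damaged', 'see']
```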
Unstructured Data Analytics
Dealing with unstructured data:
Data Mining
• Association rule mining
• Regression analysis
• Collaborative filtering
Text Mining
NLP
Noisy Text Analysis
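Association rule mining, the first technique listed under data mining, scores rules such as "customers who buy bread also buy milk" by their support and confidence. A minimal sketch over a hypothetical list of shopping baskets (the data and function names are illustrative only):

```python
# Hypothetical market-basket transactions
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

def count(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for basket in baskets if itemset <= basket)

def support(itemset, baskets):
    """Fraction of all baskets that contain the itemset."""
    return count(itemset, baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets with the antecedent, the fraction that also have the consequent."""
    return count(antecedent | consequent, baskets) / count(antecedent, baskets)

print(support({"bread", "milk"}, transactions))       # 0.6
print(confidence({"bread"}, {"milk"}, transactions))  # 0.75
```

Here the rule bread → milk holds in 3 of the 4 baskets that contain bread, so its confidence is 0.75.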
Unstructured Data Analytics Tools
MonkeyLearn – Used for text analytics.
• This tool makes it simple to clean, label and visualize customer
feedback.
Word clouds – Textual data visualization that lets anyone
see at a single glance which words have the highest frequency within
a given body of text.
Listen to the customer’s voice – open surveys and emails.
• Uses aspect-based sentiment analysis.

Amazon AWS
Microsoft Azure
IBM Cloud
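The word frequencies that drive a word cloud can be computed with the standard library alone. This is a minimal sketch on hypothetical feedback sentences; it does not use MonkeyLearn or any word-cloud package, it only shows the counting step that such tools visualize.

```python
from collections import Counter
import re

# Hypothetical customer feedback
feedback = [
    "Great battery life and great screen",
    "Battery drains fast",
    "Screen is great, battery is average",
]

# Lowercase everything and count each word across all feedback
words = re.findall(r"[a-z]+", " ".join(feedback).lower())
freq = Counter(words)

print(freq["great"])    # 3
print(freq["battery"])  # 3
print(freq["screen"])   # 2
```

In a word cloud, "great" and "battery" would be drawn largest because they occur most often.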
The Advantages of Deploying Big Data
Better Decision making

Cost Reduction

Newer Products and redevelopment of the old

Risk Analysis

Collection of Data
Industries using Big Data
• CA Technologies conducted a global study which found that the benefits of big
data clearly outweigh the obstacles to implementation.
• 84% of organizations plan to implement, or have already implemented,
a big data project.
• Acquisition has increased to 54%, and revenue has improved by 88%.

hiQ: It specializes in ‘people analytics’.


SumAll: Helps businesses optimize their social media campaigns with the
help of one single chart.
Splunk: Visual analytics
Alteryx: Combines structured and unstructured data from a number of
sources and stores it in one database. Spatial, predictive and statistical
analysis tasks are performed on this data.
How big data is used in different industries
Media and entertainment:
Companies like Hulu and Netflix work with big data to
analyze user tendencies, preferred content, trends in
consumption.
Many services like Spotify are introducing new
revenue models to increase profits.
Ads are targeted more strategically thanks to big data
analytics software.
Finance
Shift from manual trading to trading backed by
technology.
Technology-backed trading models analyze big data to make
accurate entry / exit trade decisions,
minimize risk using machine learning, and
gauge market sentiment using opinion mining.
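Opinion mining for market sentiment can be sketched with a simple word-list (lexicon) approach. The word lists, headlines and the function name `sentiment_score` are all hypothetical; real trading systems generally use trained language models rather than a fixed lexicon.

```python
import re

# Hypothetical sentiment lexicons for financial headlines
POSITIVE = {"beat", "growth", "surge", "upgrade", "strong"}
NEGATIVE = {"miss", "loss", "plunge", "downgrade", "weak"}

def sentiment_score(text: str) -> float:
    """Score in [-1, 1]: +1 all positive words, -1 all negative, 0 neutral."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

headlines = [
    "Earnings beat estimates on strong cloud growth",
    "Analysts downgrade stock after weak guidance",
    "Company schedules annual shareholder meeting",
]
for h in headlines:
    print(f"{sentiment_score(h):+.2f}  {h}")
```

The three headlines score +1.00, -1.00 and +0.00 respectively.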
Healthcare
With predictive analytics, big data can predict
negative health events that senior citizens might
experience in home care.
This reduced visits by 73% and 64% among
chronically ill patients.
Big data can identify disease trends based on
demographics, geographies, socio economics and
other factors
Education
Improve learning management. Tracking how much
time learners spend on tasks, tests, and exams helps to
customize curricula efficiently.
Improve students’ performance. Leveraging data
about learners’ performance helps educators develop
personalized learning paths.
Provide data-driven decision-making.
Predict learning outcomes.
Use big data to reduce dropout rates
Retail
Enhance Service Quality

Optimize Price

Manage Supply Chain

Identify Potential Risks

Forecast demand
Manufacturing
Quality Assurance

Supply Chain Optimization

Improving Throughput and Yield 

Less downtime

Greater Customer Service


Big Data Challenges
1. Lack of Knowledgeable Professionals
To run these large data tools, companies need skilled
data professionals (data scientists, data analysts and
data engineers).
• Solution: Adopt big data tools that can be used by professionals who are
not data science experts but have basic knowledge.
• This saves companies a lot of money.

2. Lack of Proper Understanding of Massive Data
Employees may not know how to store sensitive data.
• Solution: Data workshops and seminars must be held at
companies for everybody.
3. Data Growth Issues
One of the biggest challenges is storing the huge volume of data.
• Solution: Compression is used to reduce the size of stored
data.
• De-duplication removes duplicate and unwanted data.
• Data tiering stores data in different storage tiers (public cloud,
private cloud and flash storage).
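De-duplication and compression can be combined, as in this minimal sketch using only the Python standard library. The sensor records are hypothetical, and hashing each record with SHA-256 is just one possible way to detect exact duplicates.

```python
import hashlib
import zlib

# Hypothetical log records; every reading from sensor-7 is stored twice
records = [
    b"2024-01-01,sensor-7,temperature=21.5",
    b"2024-01-01,sensor-7,temperature=21.5",  # exact duplicate
    b"2024-01-01,sensor-8,temperature=19.0",
] * 100

def deduplicate(items):
    """Keep the first occurrence of each record, compared by SHA-256 digest."""
    seen = set()
    unique = []
    for item in items:
        digest = hashlib.sha256(item).digest()
        if digest not in seen:
            seen.add(digest)
            unique.append(item)
    return unique

unique = deduplicate(records)
raw = b"\n".join(records)
packed = zlib.compress(b"\n".join(unique))  # compress what remains

print(len(records), "->", len(unique), "records")
print(len(raw), "->", len(packed), "bytes")
```

Decompressing `packed` with `zlib.decompress` returns the de-duplicated records unchanged, so no information beyond the duplicates is lost.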
4. Confusion in Big Data Tool Selection
Companies are often unsure which tool to select for data
analysis and storage (HBase, Cassandra, etc.).
• Solution: Hire experts who know which tools to use.
5. Integrating Data from a Variety of Sources
Data in a corporation comes from various sources such as social
media pages, ERP applications, customer logs, etc.
• Solution: Data integration problems are solved by purchasing
proper tools.
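What an integration tool does at its core can be sketched as merging records from several sources on a shared key. The three sources, their field names and the `customer_id` key below are all assumptions made for the example; commercial tools add schema mapping, conflict resolution and scheduling on top of this idea.

```python
from collections import defaultdict

# Hypothetical extracts from three separate systems
erp_orders = [
    {"customer_id": 1, "order_total": 250.0},
    {"customer_id": 2, "order_total": 99.5},
]
social_mentions = [{"customer_id": 1, "handle": "@asha"}]
support_logs = [
    {"customer_id": 2, "tickets": 3},
    {"customer_id": 3, "tickets": 1},
]

def integrate(*sources):
    """Merge rows from all sources into one profile per customer_id."""
    merged = defaultdict(dict)
    for source in sources:
        for row in source:
            merged[row["customer_id"]].update(row)
    return dict(merged)

profiles = integrate(erp_orders, social_mentions, support_logs)
for customer_id, profile in sorted(profiles.items()):
    print(customer_id, profile)
```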
6. Securing Data
Companies can lose a lot of revenue due to a stolen record
or a data breach.
• Solution: Cyber-security professionals guard the data. Other
steps include encryption, identity and access control,
endpoint security and real-time security
monitoring.
Assignment Questions
1. What is Big Data?
2. Explain the types of data. Also briefly mention the sources
of each type of data along with examples.
3. Why is big data important? How does it help businesses?
Briefly describe its usage across various
domains (industry, retail, healthcare, manufacturing,
education, …).
4. Briefly describe the characteristics of big data.
5. Describe the types of analytics and mention the sources of
unstructured data used in analytics.
6. Mention some tools used in analytics.
7. Discuss the Big Data challenges briefly.
