Introduction to Data Analytics
Data Analytics, Data Science, and Machine Learning
Learning Objectives
By the end of this lesson, you will be able to:
● Define data science and machine learning
● Differentiate between data science, machine learning, and data analytics
Introduction to Data Science
Data Science
Data science is the study of data, which involves gathering, storing, analyzing, and
plotting data, to effectively extract useful information.
Aim: Gain meaningful insights from both structured and unstructured data.
Data science involves:
● Data cleansing
● Preparation and analysis
● Trend forecast
● Machine learning and data analytics
Types of Data Science
Data science includes:
● Data analytics
● Machine learning
● Data mining
Data Analytics
Data analytics is the process of examining and analyzing raw data sets to:
● Draw conclusions
● Derive insights
● Derive information from raw data sources
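As a minimal sketch of what "examining raw data to draw conclusions" can look like in practice, the snippet below summarizes a small list of numbers with Python's standard `statistics` module. The order values and the `summarize` helper are hypothetical and exist only for illustration.

```python
from statistics import mean, median

# Hypothetical raw data: order values for a small online shop.
raw_orders = [120.0, 85.5, 97.25, 430.0, 88.0, 102.5]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "max": max(values),
    }


summary = summarize(raw_orders)

# A mean well above the median suggests a few large orders skew the data.
skewed_by_large_orders = summary["mean"] > summary["median"]
print(summary, skewed_by_large_orders)
```

Comparing the mean with the median is one simple way to turn raw numbers into a conclusion: here a single large order pulls the average up.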
Machine Learning
● Learns from patterns in the past using a set of algorithms
● Predicts outcomes accurately
Data Mining
● Data mining is the process of analyzing data from
different perspectives.
● It summarizes data into useful information.
● It helps increase revenue and cut costs.
Data Science, Data Analytics, and Machine Learning
Data Science and Data Analytics
Data Scientist: Forecasts the future based on past patterns
Data Analyst: Extracts meaningful insights from various data sources
Machine Learning
Machine learning creates systems that can learn from the data.
It is the ability of machines to predict outcomes based on patterns in the past.
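To make "predicting outcomes based on patterns in the past" concrete, here is a minimal sketch that fits a straight line to past observations with ordinary least squares and uses it to predict a new value. The data, the `fit_line` and `predict` helpers, and the choice of a linear model are all assumptions made for illustration; real problems usually call for a dedicated library and a more careful model choice.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical past pattern: advertising spend vs. units sold.
past_spend = [1.0, 2.0, 3.0, 4.0]
past_sales = [12.0, 14.0, 16.0, 18.0]

model = fit_line(past_spend, past_sales)
forecast = predict(model, 5.0)
print(model, forecast)
```

The "learning" here is the estimation of the slope and intercept from past data; the "prediction" is applying them to an input the model has not seen.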
ML Engineer: Leverages various algorithms to train the machine
Data Science and Machine Learning
A data scientist:
● Extracts useful information from collected data sets
● Understands data from a business point of view
● Gathers data from various sources
● Provides accurate predictions to improve key business decisions
Understanding Data Science
A data scientist combines both domain and technology perspectives.
A data scientist:
● Works with data from video and social media sources
● Knows multiple analytical functions
● Has a sound knowledge of technologies such as Python, SAS, R, Scala, visualization libraries, SQL databases, and machine learning
Data Science: Process Flow
Why does car insurance cost less if you pay your bills on time?
Data scientists found that people who pay their bills promptly are less prone to accidents.
Step 1: Data acquisition
Data scientists work with existing data sets or gather them from various sources.
The most important part of the whole process is to have the correct data.
Step 2: Data wrangling
● Choose the right tools from Python, R, and SQL
● Derive a clean data set
● Apply pick-and-shovel algorithms
● Obtain meaningful data
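The wrangling step can be sketched in plain Python as follows. This is only an illustration under assumed conditions: the record layout (`customer`, `amount`), the `clean_records` helper, and the cleaning rules (trim whitespace, drop rows with missing or non-numeric fields, drop exact duplicates) are hypothetical, and real projects decide these rules per data set.

```python
def clean_records(rows):
    """Return cleaned rows: trimmed names, numeric amounts, no blanks or duplicates."""
    cleaned = []
    seen = set()
    for row in rows:
        name = (row.get("customer") or "").strip().title()
        amount_text = (row.get("amount") or "").strip()
        if not name or not amount_text:
            continue  # drop rows with a missing field
        try:
            amount = float(amount_text)
        except ValueError:
            continue  # drop rows whose amount is not numeric
        key = (name, amount)
        if key in seen:
            continue  # drop duplicates (assumes identical rows are re-entries)
        seen.add(key)
        cleaned.append({"customer": name, "amount": amount})
    return cleaned


# Hypothetical raw input with typical quality problems.
raw_rows = [
    {"customer": "  alice ", "amount": "120.50"},
    {"customer": "Bob", "amount": "n/a"},
    {"customer": "", "amount": "30"},
    {"customer": "Alice", "amount": "120.5"},
    {"customer": "carol", "amount": " 75 "},
]

clean_rows = clean_records(raw_rows)
print(clean_rows)
```

Treating identical name and amount pairs as duplicates is a simplification; a real data set would normally carry an order ID or timestamp to tell repeated purchases from repeated entries.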
Step 3: Machine learning
● Validate the model
● Perform necessary statistical analysis
● Apply machine learning or recursive analysis
● Run regression testing
● Compare results against other techniques or sources
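One common way to validate a model and compare it against another technique is to hold back part of the data, fit on the rest, and measure the error of each approach on the held-back part. The sketch below assumes a very simple model (a line through the origin) and a naive baseline (always predict the training average); the data and all function names are hypothetical.

```python
def fit_through_origin(xs, ys):
    """Least-squares slope k for the model y = k * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical observations; the last two are held out for validation.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.0, 15.9]
train_x, test_x = xs[:6], xs[6:]
train_y, test_y = ys[:6], ys[6:]

# Candidate model: fitted on the training portion only.
k = fit_through_origin(train_x, train_y)
model_predictions = [k * x for x in test_x]

# Baseline technique: always predict the average of the training outcomes.
baseline_value = sum(train_y) / len(train_y)
baseline_predictions = [baseline_value for _ in test_x]

model_mae = mean_absolute_error(test_y, model_predictions)
baseline_mae = mean_absolute_error(test_y, baseline_predictions)
print(model_mae, baseline_mae)
```

If the candidate model does not clearly beat the baseline on held-out data, that is a signal to revisit the data or the modeling approach before presenting results.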
Challenge of a Data Scientist
The most challenging part of being a data scientist is taking the results and presenting them to the stakeholders in an easily consumable manner.
Data Science and Business Strategy
Business owners used to measure their success based only on the profit and loss statement.
In the current era of technology, businesses leverage data science to predict efficiently what will work.
The process flow of a data-driven decision-making process:
1. Define business goals
2. Build a team of data scientists
3. Identify data sources and enable new sources of data capture
4. Design business dashboards to track goals
Data Scientist: Asset to the Business
A data scientist:
● Empowers management to make better decisions
● Identifies and refines the target audience
● Provides insights on various KPIs and parameters
● Identifies areas of improvement
● Enables strategic changes for better results
● Identifies opportunities
Companies Using Data Science
Successful Companies Using Data Science
A few successful companies that use data science:
Google Search Engine
Google uses data science to provide relevant search recommendations.
The influencing factors include:
● Query volume: unique and verifiable users
● Geographical locations
● Keyword or phrase matches on the web
● Scrubbing for inappropriate content
Facebook Tags
Facebook uses machine learning in every aspect, including:
● Scrolling the news feed
● Browsing images or videos
Facebook uses a clustering algorithm to:
● Find mutual friends
● Send friend suggestions
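The idea of finding mutual friends and turning them into suggestions can be illustrated with simple set operations. This is not Facebook's actual algorithm and it is not a full clustering method; it is a minimal mutual-friend counting sketch, and the friendship data and function names are hypothetical.

```python
def mutual_friends(graph, a, b):
    """Return the set of friends that users a and b have in common."""
    return graph[a] & graph[b]


def suggest_friends(graph, user):
    """Rank non-friends of `user` by number of mutual friends, highest first."""
    candidates = {}
    for other in graph:
        if other == user or other in graph[user]:
            continue  # skip the user and existing friends
        shared = len(mutual_friends(graph, user, other))
        if shared > 0:
            candidates[other] = shared
    # Sort by count (descending), then by name for a stable order.
    return sorted(candidates, key=lambda name: (-candidates[name], name))


# Hypothetical friendship graph: each user maps to a set of friends.
friends = {
    "ana": {"ben", "cy"},
    "ben": {"ana", "cy", "dee"},
    "cy": {"ana", "ben", "dee", "eli"},
    "dee": {"ben", "cy"},
    "eli": {"cy"},
}

suggestions = suggest_friends(friends, "ana")
print(suggestions)
```

Users who share many friends tend to belong to the same social group, which is the intuition that clustering approaches build on at a much larger scale.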
Alibaba
Alibaba’s Aliloan
Aliloan is an automated online system that provides flexible microloans to entrepreneurial
online vendors.
Aliloan:
● Analyzes trading records and evaluates risk
● Uses predictive models to analyze transaction records
● Collects data from e-commerce platforms
● Determines merchants’ creditworthiness
Travel Industry
Travel companies use datasets from social media, itineraries, predictive analytics, and location tracking to arrive at a 360-degree view.
The sensors from different modes of transport provide real-time data on various parameters to predict
and prevent problems.
Data science in the travel industry:
● Integrates historical data to ensure maximum yield
● Offers deals based on the user’s preferences or recommended local attractions
Predictive algorithms help drivers predict fuel needs, ETAs, and delays.
Retail
RFM analysis is a marketing technique that leverages data to determine the target customer.
RFM stands for Recency, Frequency, and Monetary value.
Retailers use data science to segment customers into RFM groups and target marketing and
promotions.
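As a rough sketch of RFM segmentation, the code below computes recency (days since the last purchase), frequency (number of purchases), and monetary value (total spend) per customer, then assigns a segment. The transactions, the reference date, the segment names, and the thresholds are all hypothetical; retailers choose their own cut-offs, often by ranking customers into quantiles.

```python
from datetime import date


def rfm_table(transactions, today):
    """Return {customer: (recency_days, frequency, monetary)}."""
    table = {}
    for customer, day, amount in transactions:
        days_ago = (today - day).days
        if customer not in table:
            table[customer] = (days_ago, 1, amount)
        else:
            recency, frequency, monetary = table[customer]
            table[customer] = (
                min(recency, days_ago),  # most recent purchase wins
                frequency + 1,
                monetary + amount,
            )
    return table


def segment(recency, frequency, monetary):
    """Assign a segment using hypothetical thresholds."""
    if recency <= 30 and frequency >= 3 and monetary >= 200:
        return "loyal"
    if recency > 90:
        return "lapsed"
    return "regular"


# Hypothetical purchase history: (customer, purchase date, amount).
transactions = [
    ("c1", date(2024, 5, 20), 80.0),
    ("c1", date(2024, 5, 28), 90.0),
    ("c1", date(2024, 6, 1), 60.0),
    ("c2", date(2024, 1, 15), 40.0),
    ("c3", date(2024, 5, 10), 25.0),
]
today = date(2024, 6, 10)

table = rfm_table(transactions, today)
segments = {customer: segment(*values) for customer, values in table.items()}
print(table, segments)
```

Each segment can then receive its own promotion, for example a loyalty reward for recent, frequent, high-spending customers and a win-back offer for lapsed ones.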
E-Commerce
Amazon is an e-commerce giant that leverages data science to the fullest extent.
Amazon prefers an everything-under-one-roof model.
E-commerce companies use data science to upsell through their websites.
Amazon’s “People who viewed that product also liked this” functionality uses sophisticated mining techniques and boosts business.
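A very simplified version of an "also viewed" feature counts how often two products appear in the same browsing session and recommends the most frequent companions. This is not Amazon's actual technique, which the slides do not describe; the sessions, product names, and helper functions below are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_co_views(sessions):
    """Count, for each product, how often other products share a session with it."""
    co_views = {}
    for session in sessions:
        # Use each product once per session and count every unordered pair.
        for a, b in combinations(sorted(set(session)), 2):
            co_views.setdefault(a, Counter())[b] += 1
            co_views.setdefault(b, Counter())[a] += 1
    return co_views


def also_viewed(co_views, product, top_n=2):
    """Return up to top_n products most often viewed together with `product`."""
    counts = co_views.get(product, Counter())
    # Sort by count (descending), then by name to break ties predictably.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical browsing sessions.
sessions = [
    ["laptop", "mouse", "laptop bag"],
    ["laptop", "mouse"],
    ["laptop", "keyboard"],
    ["mouse", "keyboard"],
]

co_views = build_co_views(sessions)
recommendations = also_viewed(co_views, "laptop")
print(recommendations)
```

Production systems refine this idea with far more data and signals, but co-occurrence counting shows why browsing history alone can drive upselling.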
Crime Agencies
Analytics keeps crime in check by:
● Using identified patterns to derive
prediction techniques
● Analyzing previous data to prevent future
burglaries
● Data mining can help identify patterns in crimes ranging from domestic violence to terrorism.
● Advanced analytics helps prevent crime by using information from social media.
Crime prevention agencies use data science in
deciding:
● Where to deploy police manpower?
● Who to search at a border crossing?
● Which intelligence to consider in
counter-terrorism activities?
Analytical Platforms across Industries
Analytical platforms across industries include algorithms, architectures, data storage platforms, and tools.
Machine learning algorithms
● Forecasting
● Regression
● Bayesian network
● Vector autoregression
Deep learning architectures
● Deep Belief Network (DBN)
● Convolutional Neural Network (CNN)
● Recurrent Neural Network (RNN)
Cloud storage platforms
● Amazon AWS
● Microsoft Azure
● Lambda
Analytics tools
● Spark
● Python
● R
● Apache Pig
Reporting tools
● Tableau
● Splunk
● Power BI
● Kibana
Key Takeaways
Data science is the study of data, which involves gathering,
storing, analyzing, and plotting data, to effectively extract
useful information.
Data science is an umbrella that contains data analytics,
data mining, and machine learning.
Data science is used by many successful companies such as
Google, Facebook, and Alibaba.
Analytical platforms across industries include algorithms,
architecture, data storage platforms, and tools.