BUSINESS ANALYTICS IN MARKETING
SYLLABUS
PROBLEM-SOLVING USING DATA
• Problem statement & goal setting
• Data analytics roadmap
• Analytics tools
DATA WRANGLING & ANALYSIS
• Data wrangling
• Data analysis
• RFM analysis exercise
VISUALIZATION & STORYTELLING
• Graphs, charts & dashboards
• Storytelling
• Common pitfalls
EXPERT SHARING
HOW TO SOLVE PROBLEMS USING DATA?
Step 1: Problem statement
Step 2: Data wrangling
Step 3: Data analysis
Step 4: Data visualization
Step 5: Communication
DATA WRANGLING & ANALYSIS
Step 2: DATA WRANGLING
• DATA DISCOVERY
• DATA STRUCTURING
• DATA CLEANING
• DATA ENRICHING
• DATA VALIDATING
• DATA PUBLISHING
DATA DISCOVERY
What does this dataset mean?
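A first pass at data discovery can be partly automated. The sketch below is a minimal illustration, assuming the data arrives as a CSV file with a header row and that an empty cell means "missing"; the function name `profile_csv` and the sample columns are hypothetical. For each column it reports how many values are missing and how many distinct values occur, which helps answer "what does this dataset mean?" before any cleaning starts.

```python
import csv
import io

def profile_csv(fileobj):
    """Summarize each CSV column: row count, missing values, distinct values.

    Hypothetical helper. Assumes the first row is a header and that an
    empty string (or a short row) means the value is missing.
    """
    reader = csv.DictReader(fileobj)
    rows = list(reader)
    profile = {}
    for column in reader.fieldnames or []:
        values = [row[column] for row in rows]
        present = [v for v in values if v is not None and v.strip() != ""]
        profile[column] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "examples": sorted(set(present))[:3],
        }
    return profile

# Hypothetical sample data, for illustration only.
sample = io.StringIO(
    "customer_id,city,total_spend\n"
    "C001,Hanoi,120.5\n"
    "C002,,80\n"
    "C003,Hanoi,\n"
)

for column, stats in profile_csv(sample).items():
    print(column, stats)
```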
DATA STRUCTURING
A dataset is a collection of values. Every value belongs to a variable and an observation.
• Each variable forms a column.
• Each observation forms a row.
• Each type of observational unit forms a table.
A variable contains all values that measure the same underlying attribute across units. An observation contains all values measured on the same unit across attributes.
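To make the three rules concrete, the sketch below reshapes a small "wide" table, where each month is a separate column, into the standard form where each variable (store, month, sales) is a column and each observation (one store in one month) is a row. The store names, months and figures are hypothetical, and `wide_to_long` is an illustrative helper written with the standard library only.

```python
def wide_to_long(rows, id_column, value_columns, variable_name, value_name):
    """Turn one row per unit (wide) into one row per unit and variable (long)."""
    tidy = []
    for row in rows:
        for column in value_columns:
            tidy.append({
                id_column: row[id_column],
                variable_name: column,   # the old column header becomes a value
                value_name: row[column],
            })
    return tidy

# Hypothetical wide table: one row per store, one column per month.
wide = [
    {"store": "A", "Jan": 100, "Feb": 120},
    {"store": "B", "Jan": 90, "Feb": 95},
]

tidy = wide_to_long(wide, "store", ["Jan", "Feb"], "month", "sales")
for row in tidy:
    print(row)
# {'store': 'A', 'month': 'Jan', 'sales': 100}
# {'store': 'A', 'month': 'Feb', 'sales': 120}
# {'store': 'B', 'month': 'Jan', 'sales': 90}
# {'store': 'B', 'month': 'Feb', 'sales': 95}
```

In the wide table, "month" is hidden in the column headers; after reshaping it is an explicit variable that can be filtered, grouped and plotted.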
PRACTICE: PROBLEM-SOLVING
GROUP DISCUSSION
Work in groups and discuss the provided case studies.
Present back: 15 minutes per group.
PRACTICE: CASE NO.1
Context
Amazon is a multinational technology company based in Seattle, Washington, United States. It is
one of the largest online retailers in the world, selling a wide variety of products, including
electronics, books, clothing, and household items. Amazon is among the top 5 most valuable
companies in terms of market capitalization (Jan 2023).
In 2021, only a third of Amazon’s new hires stayed with the company for more than 90
days before quitting, being fired, or getting laid off.
An investigation from the New York Times found that, among hourly employees,
Amazon’s turnover was approximately 150 percent annually.
Those numbers indicate that Amazon is having serious issues retaining employees.
Amazon estimated that its attrition rate costs it almost $8 billion a year across its
global consumer field operations team.
• Clark M. Leaked documents show just how fast employees are leaving Amazon [Internet]. The Verge. 2022 [cited 2023 Feb 19].
Available from: https://www.theverge.com/2022/10/17/23409920/amazon-third-hires-attrition-cost-workforce
• Villegas A, Beachy S. Inside Amazon’s Employment Machine [Internet]. The New York Times. 2021 [cited 2023 Feb 19].
Available from: https://www.nytimes.com/interactive/2021/06/15/us/amazon-workers.html
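The case figures rest on two common HR ratios. One widely used definition of annual turnover is separations during the year divided by average headcount, so 150% means the workforce was replaced about one and a half times in a year; a 90-day retention rate is the share of new hires still employed more than 90 days after starting. The sketch below computes both under those assumed definitions; the exact definitions behind the Amazon and New York Times figures are not given in the sources, and all numbers and function names here are hypothetical.

```python
from datetime import date

def annual_turnover_rate(separations, headcount_start, headcount_end):
    """Separations divided by average headcount (one common definition)."""
    average_headcount = (headcount_start + headcount_end) / 2
    return separations / average_headcount

def retention_rate(hires, days=90):
    """Share of hires who stayed more than `days` days.

    Each hire is (start_date, exit_date); exit_date is None if still employed.
    Hires who started fewer than `days` days ago should be excluded beforehand.
    """
    stayed = sum(
        1 for start, exit_date in hires
        if exit_date is None or (exit_date - start).days > days
    )
    return stayed / len(hires)

# Hypothetical figures, for illustration only.
print(annual_turnover_rate(separations=1500, headcount_start=1000, headcount_end=1000))
# 1.5 (i.e. 150%)

hires = [
    (date(2021, 1, 4), None),               # still employed
    (date(2021, 1, 4), date(2021, 2, 1)),   # left after 28 days
    (date(2021, 1, 4), date(2021, 3, 1)),   # left after 56 days
]
print(round(retention_rate(hires), 3))
# 0.333 (one in three stayed more than 90 days)
```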
PRACTICE: CASE NO.2
Context
Netflix is an American streaming service that provides a wide range of TV shows, movies,
documentaries, and other forms of entertainment to subscribers. It was founded in 1997
originally as a DVD-by-mail service before transitioning to a streaming service in 2007.
During the three-month period ending June 30, 2022, Netflix reported a loss of 970,000
subscribers, the largest quarterly loss in the company’s history.
Previously, in April, the company reported that it had lost 200,000 subscribers in the
first quarter of 2022, its first big loss in over a decade.
Netflix’s stock had declined by approximately 70% from the beginning of the year
to July 2022, and its market valuation had fallen from $300 billion to under $90
billion in less than a year.
• Forristal L. Netflix loses 970,000 subscribers, its largest quarterly loss ever [Internet]. TechCrunch. 2022 [cited 2023 Feb 19].
Available from: https://techcrunch.com/2022/07/19/netflix-loses-970000-subscribers-its-largest-quarterly-loss-ever/
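The case figures can be checked with simple arithmetic. The sketch below uses only numbers stated above: the two quarterly subscriber losses and the fall in market valuation from $300 billion to about $90 billion. The helper name `percent_change` is hypothetical; since the valuation fell to *under* $90 billion, the true decline is slightly larger than the computed value.

```python
def percent_change(old, new):
    """Relative change from old to new, as a percentage (negative = decline)."""
    return (new - old) / old * 100

# Subscriber losses for Q1 and Q2 2022, taken from the case text.
q1_loss = 200_000
q2_loss = 970_000
print(f"H1 2022 subscriber loss: {q1_loss + q2_loss:,}")
# H1 2022 subscriber loss: 1,170,000

# Market valuation in billions of USD, taken from the case text.
print(f"Valuation change: {percent_change(300, 90):.0f}%")
# Valuation change: -70%
```

The valuation drop of about 70% matches the reported stock decline over the same period.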
DATA STRUCTURE
DATA STRUCTURE
Transforming the given data into the standard tabular
format (variables, observations, values).
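Source data often arrives nested, for example one order record carrying a list of purchased items. The sketch below shows one way to transform such records into the standard tabular format, assuming a hypothetical order structure (all field names are invented for illustration): each output row is one observation, an item within an order, and every value sits in exactly one column.

```python
def flatten_orders(orders):
    """Turn nested order records into one row per (order, item) observation."""
    rows = []
    for order in orders:
        for item in order["items"]:
            rows.append({
                "order_id": order["order_id"],
                "customer_id": order["customer_id"],
                "product": item["product"],
                "quantity": item["quantity"],
            })
    return rows

# Hypothetical nested input, e.g. as exported from an app or API.
orders = [
    {"order_id": "O1", "customer_id": "C001",
     "items": [{"product": "book", "quantity": 2}, {"product": "pen", "quantity": 5}]},
    {"order_id": "O2", "customer_id": "C002",
     "items": [{"product": "book", "quantity": 1}]},
]

for row in flatten_orders(orders):
    print(row)
```

Strictly, order-level fields such as `customer_id` describe a different observational unit (the order), so under the "one table per observational unit" rule they would live in a separate orders table; they are repeated here only to keep the sketch short.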
EXTRA INFO: RELATIONAL DATABASE
A relational database is a collection of information that organizes data in predefined relationships, where data is stored in one or more tables (or "relations") of columns and rows.
Relationships are logical connections between different tables, established on the basis of interactions among these tables.
A database schema comprises all relationships and defines how data is organized within a relational database.
• What is a relational database (RDBMS)? [Internet]. Google. Google; [cited 2023 Feb 19]. Available from: https://cloud.google.com/learn/what-is-a-relational-database#
• Brazilian e-commerce public dataset by Olist [Internet]. Kaggle. Olist; 2021 [cited 2023 Feb 19]. Available from: https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce
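To see how a relationship links tables, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The two tables and their columns are hypothetical (they are not the Olist schema): `orders.customer_id` refers to `customers.customer_id`, and a JOIN follows that relationship to combine the tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on; with it on,
# an order pointing to a non-existent customer is rejected.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: two tables linked by customer_id.
conn.executescript("""
CREATE TABLE customers (
    customer_id TEXT PRIMARY KEY,
    city        TEXT
);
CREATE TABLE orders (
    order_id    TEXT PRIMARY KEY,
    customer_id TEXT NOT NULL REFERENCES customers(customer_id),
    amount      REAL
);
""")

conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [("C001", "Hanoi"), ("C002", "Da Nang")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [("O1", "C001", 120.5), ("O2", "C001", 30.0), ("O3", "C002", 80.0)])

# Follow the relationship: total order amount per city.
query = """
SELECT c.city, SUM(o.amount) AS total_amount
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.customer_id
GROUP BY c.city
ORDER BY c.city
"""
for city, total in conn.execute(query):
    print(city, total)
# Da Nang 80.0
# Hanoi 150.5
```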
WHAT ARE THE MOST COMMON DATA TYPES?
Step 2 DATA WRANGLING: DATA TYPES IN EXCEL
There are 4 main data types for data wrangling and analysis in Excel:
• TEXT: A, B, C; apple, Banana; Who?; “10”, “2.1”; “TRUE”; “” (empty text)
• NUMBER: 1, 2, 3; 1.2, 1.999; -1, -0.9; *date, *time, *duration
• BOOLEAN: TRUE, FALSE
• ERROR: #DIV/0!, #N/A, #NAME?, #NULL!, #NUM!, #REF!, #VALUE!, etc.
*Dates, times and durations are stored as numbers in Excel; only their display format differs.
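When spreadsheet data is loaded into code, it helps to check which of the four types each cell really holds, and in particular to catch numbers stored as text such as “10”. The sketch below is a rough Python analogue, not Excel's own logic: it assumes cells have already been read into Python values, that Excel error values arrive as their code strings (e.g. "#N/A"), and that dates and times arrive as `datetime` objects. The function names are hypothetical.

```python
from datetime import date, datetime, time, timedelta

EXCEL_ERRORS = {"#DIV/0!", "#N/A", "#NAME?", "#NULL!", "#NUM!", "#REF!", "#VALUE!"}

def excel_like_type(value):
    """Hypothetical helper: map a Python cell value to an Excel-style type."""
    if isinstance(value, bool):        # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, (int, float, date, time, timedelta)):
        return "NUMBER"                # dates, times and durations count as numbers
    if isinstance(value, str) and value in EXCEL_ERRORS:
        return "ERROR"                 # assumption: errors arrive as code strings
    return "TEXT"

def is_number_stored_as_text(value):
    """True for text such as "10" or "2.1" that would parse as a number."""
    if not isinstance(value, str):
        return False
    try:
        float(value)
    except ValueError:
        return False
    return True

cells = ["apple", "10", 10, 1.999, True, "TRUE", "#N/A", datetime(2023, 2, 19), ""]
for cell in cells:
    print(repr(cell), excel_like_type(cell), is_number_stored_as_text(cell))
```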
WARNING: In Excel, the data type (what a value is) and the data format (how we see it) of the same value can be very different:
TEXT, displayed as:
● Character: A, B, C
● Special character: !@#$%^&*()
● Text: apple, Banana, ORANGE
● Numbers as text: “0”, “1.1”
NUMBER, displayed as:
● Number: 0, 1.2, -1, -3.5
● Percentage: 12%, 1.5%, -3%
● Accounting; currency: (3), 4; 5000đ, $5.00
● Date; datetime: Feb 19th, 2023; 2023-02-19 17:00:00
● Duration: 3:20:00
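Dates, times and durations are the clearest example of the gap between type and format: in its default 1900 date system, Excel stores them as a number of days, and the format only changes how that number is displayed. The sketch below converts between the two views. It assumes the 1900 date system and dates from 1 March 1900 onward, for which counting days from 30 December 1899 gives the same serial numbers as Excel; the function names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: Excel's 1900 date system, valid for dates from 1900-03-01 onward.
EXCEL_EPOCH = datetime(1899, 12, 30)

def to_excel_serial(moment):
    """Datetime -> Excel serial number (whole days plus fraction of a day)."""
    return (moment - EXCEL_EPOCH) / timedelta(days=1)

def from_excel_serial(serial):
    """Excel serial number -> datetime, rounded to the nearest second."""
    return EXCEL_EPOCH + timedelta(seconds=round(serial * 86400))

def duration_to_excel(hours, minutes=0, seconds=0):
    """A duration such as 3:20:00 is stored as a fraction of one day."""
    return timedelta(hours=hours, minutes=minutes, seconds=seconds) / timedelta(days=1)

print(to_excel_serial(datetime(2023, 2, 19)))                  # 44976.0
serial = to_excel_serial(datetime(2023, 2, 19, 17))
print(round(serial, 6))                                        # 44976.708333
print(from_excel_serial(serial))                               # 2023-02-19 17:00:00
print(round(duration_to_excel(3, 20), 6))                      # 0.138889
```

So the cell shown as "2023-02-19 17:00:00" and the cell shown as "44976.708333" can hold exactly the same NUMBER; only the format differs.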
EXTRA INFO: DATA TYPES IN DATABASE
There are many data types in a structured database, and values are strictly validated against them. PostgreSQL has some data types similar to Excel's, including:
• TEXT: TEXT, CHAR, VARCHAR
• NUMBER: FLOAT, INTEGER
• TEMPORAL: DATE, TIME, TIMESTAMP, INTERVAL
• BOOLEAN: TRUE, FALSE
• NULL: marks a missing value in a column of any type
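A database rejects a value that does not fit a column's declared type instead of silently keeping it as text, as a spreadsheet cell might. The sketch below imitates that behaviour in Python for a hypothetical table schema: the column names are invented, the boolean rule is deliberately strict, and the Python types only loosely correspond to the PostgreSQL types listed above (`int` for INTEGER, `float` for FLOAT, `date` for DATE, `bool` for BOOLEAN, `str` for TEXT/VARCHAR). Empty input is treated as NULL.

```python
from datetime import date

def parse_bool(text):
    """Strict rule for this sketch: accept only TRUE/FALSE (any case)."""
    lowered = text.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    raise ValueError(f"not a boolean: {text!r}")

# Hypothetical schema: column name -> parser that raises ValueError on bad input.
SCHEMA = {
    "customer_id": str,
    "age": int,
    "total_spend": float,
    "signup_date": date.fromisoformat,   # expects YYYY-MM-DD
    "is_member": parse_bool,
}

def coerce_row(raw):
    """Convert raw text values to typed values; collect errors instead of guessing."""
    typed, errors = {}, []
    for column, parse in SCHEMA.items():
        text = raw.get(column, "")
        if text.strip() == "":
            typed[column] = None           # empty input is stored as NULL
            continue
        try:
            typed[column] = parse(text.strip())
        except ValueError as exc:
            errors.append(f"{column}: {exc}")
    return typed, errors

row, problems = coerce_row({"customer_id": "C001", "age": "twenty",
                            "total_spend": "120.5", "signup_date": "2023-02-19",
                            "is_member": "TRUE"})
print(row)
print(problems)
```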
DATA QUALITY
• Korolov M. 6 dimensions of data quality boost data performance [Internet]. TechTarget. 2022 [cited 2023 Feb 19].
Available from: https://www.techtarget.com/searchdatamanagement/tip/6-dimensions-of-data-quality-boost-data-performance
DATA CLEANING
DATA STRUCTURING
Create unique IDs for the observations.
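Two common ways to give every observation a unique ID are a sequential surrogate key and a key derived from the fields that identify the observation. The sketch below shows both for hypothetical survey responses; the `RESP-` prefix, the zero-padding width, the 12-character hash length and the choice of key fields are all assumptions for illustration.

```python
import hashlib

def add_sequential_ids(rows, prefix="RESP-", width=4):
    """Attach hypothetical IDs like RESP-0001, RESP-0002 in row order."""
    return [{"id": f"{prefix}{i:0{width}d}", **row} for i, row in enumerate(rows, start=1)]

def derived_id(row, key_fields):
    """Build a stable ID from identifying fields: same record, same ID."""
    key = "|".join(str(row[field]) for field in key_fields)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]

# Hypothetical survey responses.
responses = [
    {"email": "an@example.com", "survey_date": "2023-02-19", "score": 4},
    {"email": "binh@example.com", "survey_date": "2023-02-19", "score": 5},
]

for row in add_sequential_ids(responses):
    print(row["id"], derived_id(row, ["email", "survey_date"]))
```

Sequential IDs are simple but depend on row order, so they can change if the data is reloaded in a different order; derived IDs stay stable across reloads but are only unique if the chosen key fields are.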
DATA CLEANING / DATA ENRICHING
Missing values:
● Drop the observation altogether.
● Impute the values using the mean, median or max value (for continuous variables) or the most frequent value (for categorical variables), depending on the situation.
● Ignore the values of those variables only.
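The three options for missing values can be sketched in a few lines. Below, `None` marks a missing value; the data and helper names are hypothetical, and which option (and which statistic) fits depends on the variable, as the slide notes.

```python
from collections import Counter
from statistics import mean, median

def drop_missing(rows, column):
    """Option 1: drop observations whose value in `column` is missing."""
    return [row for row in rows if row[column] is not None]

def impute(rows, column, strategy):
    """Option 2: fill missing values with a statistic of the observed values."""
    observed = [row[column] for row in rows if row[column] is not None]
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    elif strategy == "max":
        fill = max(observed)
    elif strategy == "most_frequent":      # for categorical variables
        fill = Counter(observed).most_common(1)[0][0]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Return new dicts so the original rows are left unchanged.
    return [{**row, column: fill if row[column] is None else row[column]} for row in rows]

# Hypothetical customer data.
customers = [
    {"id": 1, "age": 25, "city": "Hanoi"},
    {"id": 2, "age": None, "city": "Hanoi"},
    {"id": 3, "age": 35, "city": None},
    {"id": 4, "age": 30, "city": "Hue"},
]

print(drop_missing(customers, "age"))
print(impute(customers, "age", "median"))
print(impute(customers, "city", "most_frequent"))
# Option 3 (ignore): leave None in place and skip it only when analysing that variable.
```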
DATA CLEANING / DATA ENRICHING
Invalid values:
● Drop the observation altogether.
● Correct the value using the most reasonable method.
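For invalid values, the choice is between correcting what can reasonably be recovered and dropping what cannot. A minimal sketch for a hypothetical email column: entries with stray spaces or capital letters are corrected, while entries that still do not look like an email address are dropped. The validity rule here (exactly one "@" with text on both sides and a dot in the domain) is a deliberately simple assumption, not a full email validator.

```python
def clean_email(raw):
    """Return a corrected email, or None if the value cannot be reasonably corrected."""
    if raw is None:
        return None
    candidate = raw.strip().lower()            # correction: trim spaces, normalise case
    local, at, domain = candidate.partition("@")
    if not at or not local or "@" in domain or "." not in domain:
        return None                            # still invalid after correction
    return candidate

def clean_emails(rows):
    """Correct fixable emails and drop observations whose email stays invalid."""
    cleaned = []
    for row in rows:
        email = clean_email(row["email"])
        if email is not None:
            cleaned.append({**row, "email": email})
    return cleaned

# Hypothetical contact list.
contacts = [
    {"id": 1, "email": "  An@Example.com "},
    {"id": 2, "email": "binh.example.com"},
    {"id": 3, "email": "chi@example.com"},
]
print(clean_emails(contacts))
```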
DATA CLEANING / DATA ENRICHING
Remove, impute or correct missing values and invalid values.
DATA VALIDATING
Validate and ensure the correct data types, data homogeneity and constraints, for example:
● Value inconsistency: values of the same variable with different lengths
● Data type inconsistency: text (“20”) vs. number (19)
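These checks can be expressed as small rules that report problems rather than fix them. The sketch below is illustrative only: the phone-number column, its expected length of 10 characters and the age range of 0 to 120 are hypothetical constraints chosen for the example, and each function returns the row positions that fail.

```python
def check_types(values, expected_type):
    """Data type consistency: every non-missing value has exactly the expected type."""
    return [i for i, v in enumerate(values)
            if v is not None and type(v) is not expected_type]

def check_lengths(values, expected_length):
    """Value consistency: every text value has the expected length."""
    return [i for i, v in enumerate(values)
            if isinstance(v, str) and len(v) != expected_length]

def check_range(values, low, high):
    """Constraint: every numeric value lies within [low, high]."""
    return [i for i, v in enumerate(values)
            if isinstance(v, (int, float)) and not isinstance(v, bool)
            and not low <= v <= high]

# Hypothetical columns.
ages = [19, "20", 35, -4, None]
phones = ["0912345678", "912345678", "0987654321"]

print("type problems at rows:", check_types(ages, int))        # [1]
print("length problems at rows:", check_lengths(phones, 10))   # [1]
print("range problems at rows:", check_range(ages, 0, 120))    # [3]
```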
DATA PUBLISHING
Store and manage the data in a suitable format and system, then deliver and distribute it to end users through platforms and tools.
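Publishing can be as simple as writing the cleaned table to a shared file in an agreed format, or as involved as loading it into a data warehouse behind a BI dashboard. A minimal sketch of the simple end, assuming CSV as the agreed format and a fixed column order; the function name and columns are hypothetical, and `io.StringIO` stands in for a real file so the example runs anywhere. Columns not meant for end users (here `note`) are left out.

```python
import csv
import io

def publish_csv(rows, columns, fileobj):
    """Write rows with a header in a fixed column order.

    Extra keys are ignored and missing values become empty cells.
    """
    writer = csv.DictWriter(fileobj, fieldnames=columns, extrasaction="ignore", restval="")
    writer.writeheader()
    writer.writerows(rows)

# Hypothetical cleaned data.
cleaned = [
    {"customer_id": "C001", "city": "Hanoi", "total_spend": 150.5, "note": "internal"},
    {"customer_id": "C002", "city": "Da Nang", "total_spend": 80.0},
]

# In practice: open("customers_clean.csv", "w", newline="", encoding="utf-8")
buffer = io.StringIO()
publish_csv(cleaned, ["customer_id", "city", "total_spend"], buffer)
print(buffer.getvalue())
```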
DATA STRUCTURING
Remove duplicates.
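Duplicates can be exact (every field identical) or defined by a key (for example the same order ID recorded twice with different amounts). A minimal sketch that keeps the first occurrence, either for whole rows or for a chosen key; the key fields and data are hypothetical, and deciding which copy to keep is a judgment call in practice.

```python
def remove_duplicates(rows, key_fields=None):
    """Keep the first row for each key; with no key_fields, compare entire rows."""
    seen = set()
    unique = []
    for row in rows:
        if key_fields is None:
            key = tuple(sorted(row.items()))   # exact duplicate: all fields equal
        else:
            key = tuple(row[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Hypothetical order records.
orders = [
    {"order_id": "O1", "customer_id": "C001", "amount": 120.5},
    {"order_id": "O1", "customer_id": "C001", "amount": 120.5},  # exact duplicate
    {"order_id": "O2", "customer_id": "C002", "amount": 80.0},
    {"order_id": "O2", "customer_id": "C002", "amount": 85.0},   # same ID, different amount
]

print(len(remove_duplicates(orders)))                 # 3
print(len(remove_duplicates(orders, ["order_id"])))   # 2
```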
MID-TERM BRIEFING
Instructions & dataset:
BA_S3(2024-2025)_Assignment_Mid-term
● Work in small teams (max. 4 members)
● Submit on eLearning by Mon, 16th Jun 2025
KAHOOT TIME!
LOG YOUR GEMS HERE
Among the 3 approaches to data culture, which one relies least on available data?
What does "relation" mean
in a relational database?
"There will probably be an economic crisis
this winter"
Which basic component(s) of time series data does this statement refer to?
What is NOT a suitable way
to deal with missing data?
The column "weight" in the dataset "us_teenager" has the following values:
[48, 49.5, 50, 50, 70, 180, 700]. At most, which data quality dimension(s)
does this dataset violate?