Target SQL Business Case
❖ Topic: SQL
❖ Duration: 1 week
____________________________________________________________________________
Why this case study?
From company’s perspective:
● Target is a globally renowned brand and a prominent retailer in the United States.
Target makes itself a preferred shopping destination by offering outstanding value,
inspiration, innovation and an exceptional guest experience that no other retailer can
deliver.
● This particular business case focuses on the operations of Target in Brazil and provides
insightful information about 100,000 orders placed between 2016 and 2018. The
dataset offers a comprehensive view of various dimensions including the order status,
price, payment and freight performance, customer location, product attributes, and
customer reviews.
● By analyzing this extensive dataset, it becomes possible to gain valuable insights into
Target's operations in Brazil. The information can shed light on various aspects of the
business, such as order processing, pricing strategies, payment and shipping efficiency,
customer demographics, product characteristics, and customer satisfaction levels.
From learner’s perspective:
● Solving this business case holds immense importance for aspiring data analysts and
scientists.
● SQL is the backbone of data analysis and management in the modern era. By working
through this case study, individuals gain hands-on experience and practical skills in
manipulating and querying databases, which are fundamental to extracting insights
from large volumes of data.
● Solving this particular case study will help in developing proficiency in constructing
complex SQL queries and understanding real-world data scenarios.
● Additionally, it will enhance one's ability to communicate with the stakeholders
involved in data-related projects and help the organization take better, data-driven
decisions.
____________________________________________________________________________
Dataset:
https://drive.google.com/drive/folders/1TGEc66YKbD443nslRi1bWgVd238gJCnb?usp=sharing
The data is available in 8 different csv files:
1. customers.csv
2. geolocation.csv
3. order_items.csv
4. payments.csv
5. reviews.csv
6. orders.csv
7. products.csv
8. sellers.csv
The column description for these csv files is given below.
The customers.csv contain following features:
Features Description
customer_id ID of the consumer who made the purchase
customer_unique_id Unique ID of the consumer
customer_zip_code_prefix Zip Code of consumer’s location
customer_city Name of the City from where order is made
customer_state State Code from where order is made (Eg. são paulo - SP)
The geolocations.csv contain following features:
Features Description
geolocation_zip_code_prefix First 5 digits of Zip Code
geolocation_lat Latitude
geolocation_lng Longitude
geolocation_city City
geolocation_state State
The sellers.csv contains following features:
Features Description
Unique ID of the seller registered
seller_id
seller_zip_code_prefix Zip Code of the seller’s location
seller_city Name of the City of the seller
seller_state State Code (Eg. são paulo - SP)
The order_items.csv contain following features:
Features Description
order_id A Unique ID of order made by the consumers
order_item_id A Unique ID given to each item ordered in
the order
product_id A Unique ID given to each product available
on the site
seller_id Unique ID of the seller registered in Target
shipping_limit_date The date before which shipping of the
ordered product must be completed
price Actual price of the products ordered
freight_value Price rate at which a product is delivered
from one point to another
The payments.csv contain following features:
Features Description
order_id A Unique ID of order made by the
consumers.
payment_sequential Sequences of the payments made in case of
EMI
payment_type Mode of payment used (Eg. Credit Card)
payment_installments Number of installments in case of EMI
purchase.
payment_value Total amount paid for the purchase order
The orders.csv contain following features:
Features Description
order_id A Unique ID of order made by the consumers
customer_id ID of the consumer who made the purchase
order_status Status of the order made i.e. delivered,
shipped, etc.
order_purchase_timestamp Timestamp of the purchase
order_delivered_carrier_date Delivery date at which carrier made the
delivery
order_delivered_customer_date Date at which customer got the product
order_estimated_delivery_date Estimated delivery date of the products
The reviews.csv contain following features:
Features Description
review_id ID of the review given on the product
ordered by the order id
order_id A Unique ID of order made by the consumers
review_score Review score given by the customer for each
order on a scale of 1-5
review_comment_title Title of the review
review_comment_message Review comments posted by the consumer
for each order
review_creation_date Timestamp of the review when it is created
review_answer_timestamp Timestamp of the review answered
The products.csv contain following features:
Features Description
product_id A Unique identifier for the proposed project.
product_category_name Name of the product category
product_name_lenght Length of the string which specifies the name
given to the products ordered
product_description_lenght Length of the description written for each
product ordered on the site.
product_photos_qty Number of photos of each product ordered
available on the shopping portal
product_weight_g Weight of the products ordered in grams
product_length_cm Length of the products ordered in
centimeters
product_height_cm Height of the products ordered in
centimeters
product_width_cm Width of the product ordered in centimeters
Dataset schema:
____________________________________________________________________________
How to get started?
To complete the case study, begin by downloading the CSV files from the provided link.
Afterward, proceed to upload them onto BigQuery for further analysis.
Assuming you have already established a project called "Ecommerce" in your SQL workspace,
the next step is to create a new dataset named "target."
Now, systematically upload each CSV file as a separate BigQuery table within the "target"
dataset.
Once all the files have been successfully uploaded, you can conveniently access them within
the query editor by referencing the dataset and table names.
For example,
● To access the "customers" table, you would use the reference `target.customers`.
● Similarly, for the "orders" table, you would use `target.orders`, and so on.
____________________________________________________________________________
What is expected?
Assuming you are a data analyst/ scientist at Target, you have been assigned the task of
analyzing the given dataset to extract valuable insights and provide actionable
recommendations.
Submission Process:
Once you’re done with the case study...
● Use a Word document to paste your SQL queries along with a screenshot of the first
10 rows from the output.
● List down any valuable insights that you find during the analysis and provide some
action items from the company’s perspective in order to improve the current situation.
● Convert your solutions doc into a PDF, and upload the same on the platform.
● Please note that after submitting once, you will not be allowed to edit your submission.
General Guidelines:
Evaluation will be kept lenient, so make sure you attempt this case study.
●
It is understandable that you might struggle with getting started on this or feel stuck at
●
some point.
In such case:
a. Read the question carefully and try to understand what exactly is being asked.
b. Brainstorm a little. If you’re getting an error, remember that Google is your best
friend.
c. You can watch the lecture recordings or go through your lecture notes once
again if you feel like you’re getting confused over some specific topics.
d. Discuss your problems with your peers. Make use of the Slack channel and
WhatsApp group.
e. Only if you think that there’s a major issue, you can reach out to your Instructor
via Slack or Email.
____________________________________________________________________________
What does ‘good’ look like?
I. Import the dataset and do usual exploratory analysis steps like checking the
structure & characteristics of the dataset.
A. Data type of all columns in the “customers” table.
Hint: We want you to display the data type of each column present in the
“customers” table.
B. Get the time range between which the orders were placed.
Hint: We want you to get the date & time when the first and last orders in our
dataset were placed.
C. Count the Cities & States of customers who ordered during the given period.
Hint: We want you to count the number of unique cities & states where orders
were placed by the customers during the given time period.
______________________________________________________________________________
II. In-depth Exploration:
A. Is there a growing trend in the no. of orders placed over the past years?
Hint: We want you to find out if no. of orders placed has increased gradually in
each month, over the past years.
B. Can we see some kind of monthly seasonality in terms of the no. of orders being
placed?
Hint: We want you to find out if the no. of orders placed are at peak during
certain months.
C. During what time of the day, do the Brazilian customers mostly place their
orders? (Dawn, Morning, Afternoon or Night)
● 0-6 hrs : Dawn
● 7-12 hrs : Mornings
● 13-18 hrs : Afternoon
● 19-23 hrs : Night
Hint: We want you to categorize the hours of a day into the given time brackets/
intervals and find out during which intervals the Brazilian customers usually
order the most.
______________________________________________________________________________
III. Evolution of E-commerce orders in the Brazil region:
A. Get the month on month no. of orders placed in each state.
Hint: We want you to get the no. of orders placed in each state, in each month
by our customers.
B. How are the customers distributed across all the states?
Hint: We want you to get the no. of unique customers present in each state.
______________________________________________________________________________
IV. Impact on Economy: Analyze the money movement by e-commerce by looking at
order prices, freight and others.
A. Get the % increase in the cost of orders from year 2017 to 2018 (include
months between Jan to Aug only).
Hint: You can use the payment_value column in the payments table to get the
cost of orders.
B. Calculate the Total & Average value of order price for each state.
Hint: We want you to fetch the total price and the average price of orders for
each state.
C. Calculate the Total & Average value of order freight for each state.
Hint: We want you to fetch the total freight value and the average freight value
of orders for each state.
_____________________________________________________________________________________
V. Analysis based on sales, freight and delivery time.
A. Find the no. of days taken to deliver each order from the order’s purchase date
as delivery time.
Also, calculate the difference (in days) between the estimated & actual delivery
date of an order.
Do this in a single query.
Hint: You can calculate the delivery time and the difference between the
estimated & actual delivery date using the given formula:
● time_to_deliver = order_delivered_customer_date -
order_purchase_timestamp
● diff_estimated_delivery = order_estimated_delivery_date -
order_delivered_customer_date
B. Find out the top 5 states with the highest & lowest average freight value.
Hint: We want you to find the top 5 & the bottom 5 states arranged in
increasing order of the average freight value.
C. Find out the top 5 states with the highest & lowest average delivery time.
Hint: We want you to find the top 5 & the bottom 5 states arranged in
increasing order of the average delivery time.
D. Find out the top 5 states where the order delivery is really fast as compared to
the estimated date of delivery.
You can use the difference between the averages of actual & estimated delivery
date to figure out how fast the delivery was for each state.
Hint: Include only the orders that are already delivered.
______________________________________________________________________________
VI. Analysis based on the payments:
A. Find the month on month no. of orders placed using different payment types.
Hint: We want you to count the no. of orders placed using different payment
methods in each month over the past years.
B. Find the no. of orders placed on the basis of the payment installments that have
been paid.
Hint: We want you to count the no. of orders placed based on the no. of
payment installments where at least one installment has been successfully
paid.
____________________________________________________________________________
FAQs
Q. Which platform am I supposed to use for writing queries?
You are advised to use BigQuery as the platform for solving this case study.
Q. I am having issues setting up BigQuery .
For any query related to BigQuery setup, you can refer to this doc - link
Q. How do I get data types of columns in BigQuery?
We can get the data type of a column using the BigQuery INFORMATION_SCHEMA.
Q. What do you mean by month on month in the questions?
Month on Month means for each month starting from 1 (Jan) to 12 (Dec) in natural order.
Q. Do I need to find the time difference or the date difference?
You have to find the difference between the given timestamps in days.