PYF_Project_LearnerNotebook_LowCode

The document outlines a data analysis project for FoodHub, a food aggregator app, focusing on the increasing demand for restaurant delivery services in New York. The analysis aims to answer key questions regarding customer orders, restaurant performance, and delivery metrics using a provided dataset. It includes detailed instructions for data manipulation, exploratory analysis, and visualization to enhance customer experience and improve business strategies.

Uploaded by

Aizaz Ali

Project Python Foundations: FoodHub Data Analysis

Context
The number of restaurants in New York is increasing day by day. Lots of students and busy professionals rely on those restaurants due to their hectic
lifestyles. Online food delivery service is a great option for them. It provides them with good food from their favorite restaurants. A food aggregator
company FoodHub offers access to multiple restaurants through a single smartphone app.

The app allows the restaurants to receive a direct online order from a customer. The app assigns a delivery person from the company to pick up the
order after it is confirmed by the restaurant. The delivery person then uses the map to reach the restaurant and waits for the food package. Once the
food package is handed over to the delivery person, he/she confirms the pick-up in the app and travels to the customer's location to deliver the food.
The delivery person confirms the drop-off in the app after delivering the food package to the customer. The customer can rate the order in the app. The
food aggregator earns money by collecting a fixed margin of the delivery order from the restaurants.

Objective
The food aggregator company has stored the data of the different orders made by the registered customers in their online portal. They want to analyze
the data to get a fair idea about the demand of different restaurants which will help them in enhancing their customer experience. Suppose you are
hired as a Data Scientist in this company and the Data Science team has shared some of the key questions that need to be answered. Perform the data
analysis to find answers to these questions that will help the company to improve the business.

Data Description
The dataset contains information related to each food order. The detailed data dictionary is given below.

Data Dictionary
order_id: Unique ID of the order
customer_id: ID of the customer who ordered the food
restaurant_name: Name of the restaurant
cuisine_type: Cuisine ordered by the customer
cost_of_the_order: Cost of the order
day_of_the_week: Indicates whether the order is placed on a weekday or weekend (The weekday is from Monday to Friday and the weekend is
Saturday and Sunday)
rating: Rating given by the customer out of 5
food_preparation_time: Time (in minutes) taken by the restaurant to prepare the food. This is calculated by taking the difference between the
timestamps of the restaurant's order confirmation and the delivery person's pick-up confirmation.
delivery_time: Time (in minutes) taken by the delivery person to deliver the food package. This is calculated by taking the difference between the
timestamps of the delivery person's pick-up confirmation and drop-off confirmation.

Please read the instructions carefully before starting the project.

This is a commented Jupyter IPython Notebook file in which all the instructions and tasks to be performed are mentioned. Read along carefully to
complete the project.

Blanks '___' are provided in the notebook and need to be filled with the appropriate code to get the correct result. Please replace each blank with the
right code snippet. Every '___' blank is accompanied by a comment that briefly describes what needs to be filled in.
Identify the task to be performed correctly, and only then proceed to write the required code.
Fill in the code wherever asked by commented lines like "# write your code here" or "# complete the code". Running incomplete code may throw
an error.
Please run the cells sequentially from the beginning to avoid unnecessary errors.
You can record the results/observations derived from the analysis here and use them to create your final presentation.

Let us start by importing the required libraries

In [ ]: # Installing the libraries with the specified version.

!pip install numpy==1.25.2 pandas==1.5.3 matplotlib==3.7.1 seaborn==0.13.1 -q --user

Note: After running the above cell, kindly restart the notebook kernel and run all cells sequentially from the start again.

In [ ]: # Import libraries for data manipulation

import numpy as np
import pandas as pd

# Import libraries for data visualization

import matplotlib.pyplot as plt
import seaborn as sns

Understanding the structure of the data

In [ ]: # uncomment and run the following lines for Google Colab

# from google.colab import drive
# drive.mount('/content/drive')

In [ ]: # Read the data

df = pd.read_csv('_______') ## Fill the blank to read the data

In [ ]: # Returns the first 5 rows

df.head()

Question 1: How many rows and columns are present in the data? [0.5 mark]

In [ ]: # Check the shape of the dataset

df._______ ## Fill in the blank
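
As a rough sketch of the read-and-inspect steps above, the snippet below loads a tiny made-up CSV from an in-memory string and checks its dimensions. The three sample rows and the names `sample_csv` and `sample_df` are assumptions for illustration only; the real FoodHub file and its path are not given here.

```python
import io
import pandas as pd

# Hypothetical three-row sample standing in for the real dataset file
sample_csv = io.StringIO(
    "order_id,customer_id,restaurant_name,cost_of_the_order\n"
    "1,101,Cafe A,12.50\n"
    "2,102,Cafe B,25.00\n"
    "3,101,Cafe A,8.75\n"
)

sample_df = pd.read_csv(sample_csv)

# .shape is an attribute (no parentheses): (number_of_rows, number_of_columns)
n_rows, n_cols = sample_df.shape
print(f"{n_rows} rows, {n_cols} columns")
```

With the real data, `pd.read_csv` would be given the file path instead of the in-memory buffer.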

Question 2: What are the datatypes of the different columns in the dataset? [0.5 mark]

In [ ]: df.info()

Question 3: Are there any missing values in the data? If yes, treat them using an appropriate method. [1
Mark]

In [ ]: # Checking for missing values in the data

df.'______' #Write the appropriate function to print the sum of null values for each column
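
One common way to count missing values per column is to chain `isnull()` with `sum()`. The sketch below uses a made-up two-column frame (the name `toy` and its values are assumptions, not FoodHub data).

```python
import pandas as pd

# Hypothetical toy frame: one genuinely missing cost, one 'Not given' rating
toy = pd.DataFrame({
    "cost_of_the_order": [12.5, None, 8.75],
    "rating": ["5", "Not given", "4"],
})

# isnull() marks missing cells as True; sum() counts the True values per column
null_counts = toy.isnull().sum()
print(null_counts)
```

Note that a placeholder string such as 'Not given' is an ordinary value to pandas, so it is not counted as missing by this check.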

Question 4: Check the statistical summary of the data. What is the minimum, average, and maximum time it
takes for food to be prepared once an order is placed? [2 marks]

In [ ]: # Get the summary statistics of the numerical data

df.'_______' ## Write the appropriate function to print the statistical summary of the data (Hint - you have seen this in the case studies before)
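
A minimal sketch of reading the minimum, average, and maximum off a statistical summary, assuming `describe()` is the function the hint refers to. The four preparation times in `toy` are invented for illustration.

```python
import pandas as pd

# Hypothetical preparation times in minutes
toy = pd.DataFrame({"food_preparation_time": [20, 25, 30, 35]})

# describe() summarises numeric columns; .T puts one column per row for easier reading
summary = toy.describe().T
print(summary[["min", "mean", "max"]])
```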

Question 5: How many orders are not rated? [1 mark]

In [ ]: df['_______'].value_counts() ## Complete the code
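
The sketch below shows how `value_counts()` can surface unrated orders, assuming they are stored under the label 'Not given' (the label that the promotional-offer cell later filters on). The five sample ratings are made up.

```python
import pandas as pd

# Hypothetical ratings column; unrated orders carry the string 'Not given'
toy = pd.DataFrame({"rating": ["5", "Not given", "4", "Not given", "3"]})

# Count how often each distinct rating value occurs
rating_counts = toy["rating"].value_counts()

# .get() returns 0 instead of raising if the label is absent
not_rated = rating_counts.get("Not given", 0)
print("Orders not rated:", not_rated)
```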

Exploratory Data Analysis (EDA)

Univariate Analysis

Question 6: Explore all the variables and provide observations on their distributions. (Generally, histograms,
boxplots, countplots, etc. are used for univariate exploration.) [9 marks]

Order ID

In [ ]: # check unique order ID

df['order_id'].nunique()

Customer ID

In [ ]: # check unique customer ID

df['customer_id'].'_____' ## Complete the code to find out number of unique Customer ID
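
For the unique-value checks in this section, `nunique()` returns how many distinct values a column holds, while `unique()` returns the values themselves. A small sketch with invented customer IDs:

```python
import pandas as pd

# Hypothetical toy data: customer 101 placed two orders
toy = pd.DataFrame({"customer_id": [101, 102, 101, 103]})

n_customers = toy["customer_id"].nunique()   # number of distinct values
distinct_ids = toy["customer_id"].unique()   # the distinct values themselves
print(n_customers, sorted(distinct_ids.tolist()))
```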

Restaurant name

In [ ]: # check unique Restaurant Name

df['restaurant_name'].'_____' ## Complete the code to find out number of unique Restaurant Name

Cuisine type

In [ ]: # Check unique cuisine type

df['cuisine_type'].'_______' ## Complete the code to find out number of unique cuisine type

In [ ]: plt.figure(figsize = (15,5))
sns.countplot(data = df, x = 'cuisine_type') ## Create a countplot for cuisine type.

Cost of the order

In [ ]: sns.histplot(data=df,x='cost_of_the_order') ## Histogram for the cost of order

plt.show()
sns.boxplot(data=df,x='cost_of_the_order') ## Boxplot for the cost of order
plt.show()

Day of the week

In [ ]: # Check the unique values

df['day_of_the_week'].'_______' ## Complete the code to check unique values for the 'day_of_the_week' column

In [ ]: sns.countplot(data = df, x = '______') ## Complete the code to plot a bar graph for 'day_of_the_week' column

Rating

In [ ]: # Check the unique values

df['rating'].'_______' ## Complete the code to check unique values for the 'rating' column

In [ ]: sns.countplot(data = df, x = '______') ## Complete the code to plot bar graph for 'rating' column

Food Preparation time

In [ ]: sns.histplot(data=df,x='_____') ## Complete the code to plot the histogram for the food preparation time
plt.show()
sns.boxplot(data=df,x='_____') ## Complete the code to plot the boxplot for the food preparation time
plt.show()

Delivery time

In [ ]: sns.histplot(data=df,x='_____') ## Complete the code to plot the histogram for the delivery time
plt.show()
sns.boxplot(data=df,x='_____') ## Complete the code to plot the boxplot for the delivery time
plt.show()

Question 7: Which are the top 5 restaurants in terms of the number of orders received? [1 mark]

In [ ]: # Get top 5 restaurants with highest number of orders

df['restaurant_name'].'_______' ## Complete the code
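
One possible way to rank restaurants by order count is `value_counts()` followed by `head(5)`, since `value_counts()` sorts in descending order by default. The restaurant names and counts below are invented.

```python
import pandas as pd

# Hypothetical orders: restaurant A has 6 orders, B has 5, ... F has 1
names = ["A"] * 6 + ["B"] * 5 + ["C"] * 4 + ["D"] * 3 + ["E"] * 2 + ["F"] * 1
toy = pd.DataFrame({"restaurant_name": names})

# Each row is one order, so counting rows per name gives orders per restaurant
top5 = toy["restaurant_name"].value_counts().head(5)
print(top5)
```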

Question 8: Which is the most popular cuisine on weekends? [1 mark]

In [ ]: # Get most popular cuisine on weekends

df_weekend = df[df['day_of_the_week'] == 'Weekend']
df_weekend['cuisine_type'].'_______' ## Complete the code to count the orders for each cuisine type on weekends
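
The filter-then-count pattern above can be sketched on made-up data as follows. The column names match the data dictionary; the rows and the variable names `toy`, `weekend`, and `top_cuisine` are assumptions.

```python
import pandas as pd

# Hypothetical orders across weekdays and weekends
toy = pd.DataFrame({
    "day_of_the_week": ["Weekend", "Weekend", "Weekend", "Weekday", "Weekday", "Weekday"],
    "cuisine_type": ["American", "American", "Japanese", "Italian", "Italian", "Italian"],
})

# Keep only weekend orders, then count orders per cuisine
weekend = toy[toy["day_of_the_week"] == "Weekend"]
cuisine_counts = weekend["cuisine_type"].value_counts()

# idxmax() returns the label with the highest count
top_cuisine = cuisine_counts.idxmax()
print(top_cuisine, cuisine_counts[top_cuisine])
```

Italian has the most orders overall in this toy data, yet it never appears on a weekend, which is why the filter must come before the count.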

Question 9: What percentage of the orders cost more than 20 dollars? [2 marks]

In [ ]: # Get orders that cost above 20 dollars

df_greater_than_20 = df[df['_______']>20] ## Write the appropriate column name to get the orders having cost above $20

# Calculate the number of total orders where the cost is above 20 dollars
print('The number of total orders that cost above 20 dollars is:', df_greater_than_20.shape[0])

# Calculate percentage of such orders in the dataset

percentage = (df_greater_than_20.shape[0] / df.shape[0]) * 100

print("Percentage of orders above 20 dollars:", round(percentage, 2), '%')

Question 10: What is the mean order delivery time? [1 mark]

In [ ]: # Get the mean delivery time

mean_del_time = df['delivery_time'].'_______' ## Write the appropriate function to obtain the mean delivery time

print('The mean delivery time for this dataset is', round(mean_del_time, 2), 'minutes')

Question 11: The company has decided to give 20% discount vouchers to the top 5 most frequent customers.
Find the IDs of these customers and the number of orders they placed. [1 mark]

In [ ]: # Get the counts of each customer_id

df['_______'].value_counts().head(5) ## Write the appropriate column name to get the top 5 most frequent customers

Multivariate Analysis

Question 12: Perform a multivariate analysis to explore relationships between the important variables in the
dataset. (It is a good idea to explore relations between numerical variables as well as relations between
numerical and categorical variables) [10 marks]

Cuisine vs Cost of the order

In [ ]: # Relationship between cost of the order and cuisine type

plt.figure(figsize=(15,7))
sns.boxplot(x = "cuisine_type", y = "cost_of_the_order", data = df, palette = 'PuBu', hue = "cuisine_type")
plt.xticks(rotation = 60)
plt.show()

Cuisine vs Food Preparation time

In [ ]: # Relationship between food preparation time and cuisine type

plt.figure(figsize=(15,7))
sns.boxplot('_______') ## Complete the code to visualize the relationship between food preparation time and cuisine type using boxplot
plt.xticks(rotation = 60)
plt.show()
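
As a numeric companion to the boxplots in this section, the quartiles that a boxplot draws can also be tabulated per group. This is a sketch on invented data, not part of the notebook's required steps.

```python
import pandas as pd

# Hypothetical preparation times (minutes) for two cuisines
toy = pd.DataFrame({
    "cuisine_type": ["Italian", "Italian", "Italian", "Thai", "Thai"],
    "food_preparation_time": [20, 30, 40, 25, 35],
})

# describe() per group yields count, mean, std, min, quartiles, and max
stats = toy.groupby("cuisine_type")["food_preparation_time"].describe()

# 25%, 50% (median), and 75% correspond to the box edges and centre line
print(stats[["25%", "50%", "75%"]])
```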

Day of the Week vs Delivery time

In [ ]: # Relationship between day of the week and delivery time

plt.figure(figsize=(15,7))
sns.boxplot('_______') ## Complete the code to visualize the relationship between day of the week and delivery time using boxplot
plt.show()

Run the below code and write your observations on the revenue generated by the restaurants.

In [ ]: df.groupby(['restaurant_name'])['cost_of_the_order'].sum().sort_values(ascending = False).head(14)

Rating vs Delivery time

In [ ]: # Relationship between rating and delivery time
plt.figure(figsize=(15, 7))
sns.pointplot(x = 'rating', y = 'delivery_time', data = df)
plt.show()

Rating vs Food preparation time

In [ ]: # Relationship between rating and food preparation time

plt.figure(figsize=(15, 7))
sns.pointplot('_______') ## Complete the code to visualize the relationship between rating and food preparation time using pointplot
plt.show()

Rating vs Cost of the order

In [ ]: # Relationship between rating and cost of the order

plt.figure(figsize=(15, 7))
sns.pointplot('_______') ## Complete the code to visualize the relationship between rating and cost of the order using pointplot
plt.show()

Correlation among variables

In [ ]: # Plot the heatmap

col_list = ['cost_of_the_order', 'food_preparation_time', 'delivery_time']
plt.figure(figsize=(15, 7))
sns.heatmap(df[col_list].corr(), annot=True, vmin=-1, vmax=1, fmt=".2f", cmap="Spectral")
plt.show()

Question 13: The company wants to provide a promotional offer in the advertisement of the restaurants. The
condition to get the offer is that the restaurants must have a rating count of more than 50 and the average
rating should be greater than 4. Find the restaurants fulfilling the criteria to get the promotional offer. [3
marks]

In [ ]: # Filter the rated restaurants

df_rated = df[df['rating'] != 'Not given'].copy()

# Convert rating column from object to integer

df_rated['rating'] = df_rated['rating'].astype('int')

# Create a dataframe that contains the restaurant names with their rating counts
df_rating_count = df_rated.groupby(['restaurant_name'])['rating'].count().sort_values(ascending = False).reset_index()
df_rating_count.head()

In [ ]: # Get the restaurant names that have rating count more than 50
rest_names = df_rating_count['______________']['restaurant_name'] ## Complete the code to get the restaurant names having rating count more than 50

# Filter to get the data of restaurants that have rating count more than 50
df_mean_4 = df_rated[df_rated['restaurant_name'].isin(rest_names)].copy()

# Group the restaurant names with their ratings and find the mean rating of each restaurant
df_mean_4_rating = df_mean_4.groupby(['_______'])['_______'].mean().sort_values(ascending = False).reset_index().dropna() ## Complete the code to find the mean rating

# filter for average rating greater than 4

df_avg_rating_greater_than_4 = df_mean_4_rating[df_mean_4_rating['_______'] > 4].sort_values(by='_______', ascending=False).reset_index(drop=True) ## Complete the code to find restaurants with rating > 4

df_avg_rating_greater_than_4
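
The two-condition filter above (rating count above 50 and mean rating above 4) can also be sketched in a more compact form with a single `agg` call. This is an alternative formulation on invented data, not the notebook's step-by-step method; the restaurants A, B, and C and their ratings are assumptions.

```python
import pandas as pd

# Hypothetical data: A has 60 ratings, B has 55, C has only 5 rated orders
toy = pd.DataFrame({
    "restaurant_name": ["A"] * 60 + ["B"] * 55 + ["C"] * 10,
    "rating": ["5"] * 40 + ["4"] * 20      # restaurant A
            + ["3"] * 30 + ["4"] * 25      # restaurant B
            + ["Not given"] * 5 + ["5"] * 5,  # restaurant C
})

# Drop unrated orders, then convert the remaining ratings to integers
rated = toy[toy["rating"] != "Not given"].copy()
rated["rating"] = rated["rating"].astype(int)

# Compute rating count and mean rating per restaurant in one pass
summary = rated.groupby("restaurant_name")["rating"].agg(["count", "mean"])

# Apply both promotional-offer conditions together
eligible = summary[(summary["count"] > 50) & (summary["mean"] > 4)]
print(eligible)
```

Here B fails on the mean rating and C fails on the rating count, so only A qualifies.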

Question 14: The company charges the restaurant 25% on the orders having cost greater than 20 dollars and
15% on the orders having cost greater than 5 dollars. Find the net revenue generated by the company across
all orders. [3 marks]

In [ ]: #function to determine the revenue

def compute_rev(x):
    if x > 20:
        return x*0.25
    elif x > 5:
        return x*0.15
    else:
        return x*0

df['Revenue'] = df['________'].apply(compute_rev) ## Write the appropriate column name to compute the revenue
df.head()

In [ ]: # get the total revenue and print it

total_rev = df['Revenue'].'_____' ## Write the appropriate function to get the total revenue
print('The net revenue is around', round(total_rev, 2), 'dollars')
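
To make the fee tiers concrete, here is a small worked example in plain Python that restates the `compute_rev` logic and applies it to five invented order costs, including the boundary values 20 and 5. The list `order_costs` is an assumption for illustration.

```python
def compute_rev(x):
    """Return the company's cut for an order costing x dollars."""
    if x > 20:
        return x * 0.25   # 25% on orders strictly above 20 dollars
    elif x > 5:
        return x * 0.15   # 15% on orders strictly above 5 dollars
    else:
        return 0.0        # nothing charged on orders of 5 dollars or less

# Hypothetical order costs, including both boundary values
order_costs = [25.0, 20.0, 10.0, 5.0, 4.0]

revenues = [compute_rev(cost) for cost in order_costs]
total_rev = sum(revenues)
print("Per-order revenue:", revenues)
print("Net revenue:", round(total_rev, 2))
```

Because the comparisons are strict, an order of exactly 20 dollars falls in the 15% tier and an order of exactly 5 dollars earns nothing.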

Question 15: The company wants to analyze the total time required to deliver the food. What percentage of
orders take more than 60 minutes to get delivered from the time the order is placed? (The food has to be
prepared and then delivered.)[2 marks]

In [ ]: # Calculate total delivery time and add a new column to the dataframe df to store the total delivery time
df['total_time'] = df['food_preparation_time'] + df['delivery_time']

## Write the code below to find the percentage of orders that have more than 60 minutes of total delivery time (see Question 9 for reference)
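
One way to get this percentage, sketched on invented timings, is to take the mean of a boolean mask, which equals the share of rows where the condition holds. This is an alternative to the row-count ratio used in Question 9, and gives the same result.

```python
import pandas as pd

# Hypothetical preparation and delivery times in minutes
toy = pd.DataFrame({
    "food_preparation_time": [20, 30, 35, 25],
    "delivery_time": [15, 35, 20, 40],
})

# Total time from order placement to drop-off
toy["total_time"] = toy["food_preparation_time"] + toy["delivery_time"]

# True counts as 1 and False as 0, so the mean is the fraction of orders over 60 minutes
pct_over_60 = (toy["total_time"] > 60).mean() * 100
print("Percentage of orders taking more than 60 minutes:", round(pct_over_60, 2), "%")
```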

Question 16: The company wants to analyze the delivery time of the orders on weekdays and weekends. How
does the mean delivery time vary during weekdays and weekends? [2 marks]

In [ ]: # Get the mean delivery time on weekdays and print it

print('The mean delivery time on weekdays is around',
      round(df[df['day_of_the_week'] == 'Weekday']['delivery_time'].mean()),
      'minutes')

## Write the code below to get the mean delivery time on weekends and print it
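
Instead of filtering weekdays and weekends separately, both means can be obtained at once with `groupby`. A minimal sketch on made-up delivery times:

```python
import pandas as pd

# Hypothetical delivery times in minutes
toy = pd.DataFrame({
    "day_of_the_week": ["Weekday", "Weekday", "Weekend", "Weekend"],
    "delivery_time": [30, 26, 22, 24],
})

# One mean per category of day_of_the_week
mean_by_day = toy.groupby("day_of_the_week")["delivery_time"].mean()
print(mean_by_day)
```

The result is a Series indexed by 'Weekday' and 'Weekend', so the two values can be compared directly.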

Conclusion and Recommendations

Question 17: What are your conclusions from the analysis? What recommendations would you like to share
to help improve the business? (You can use cuisine type and feedback ratings to drive your business
recommendations.) [6 marks]

Conclusions:

Recommendations:
