PYF_Project_LearnerNotebook_LowCode
Context
The number of restaurants in New York is increasing day by day. Lots of students and busy professionals rely on those restaurants due to their hectic
lifestyles. Online food delivery is a convenient option for them, bringing good food from their favorite restaurants. FoodHub, a food aggregator
company, offers access to multiple restaurants through a single smartphone app.
The app allows the restaurants to receive a direct online order from a customer. The app assigns a delivery person from the company to pick up the
order after it is confirmed by the restaurant. The delivery person then uses the map to reach the restaurant and waits for the food package. Once the
food package is handed over, the delivery person confirms the pick-up in the app and travels to the customer's location to deliver the food.
The delivery person confirms the drop-off in the app after delivering the food package to the customer. The customer can rate the order in the app. The
food aggregator earns money by collecting a fixed margin of the delivery order from the restaurants.
Objective
The food aggregator company has stored the data of the different orders made by the registered customers in their online portal. They want to analyze
the data to get a fair idea about the demand of different restaurants which will help them in enhancing their customer experience. Suppose you are
hired as a Data Scientist in this company and the Data Science team has shared some of the key questions that need to be answered. Perform the data
analysis to find answers to these questions that will help the company to improve the business.
Data Description
The data contains information about each food order. The detailed data dictionary is given below.
Data Dictionary
order_id: Unique ID of the order
customer_id: ID of the customer who ordered the food
restaurant_name: Name of the restaurant
cuisine_type: Cuisine ordered by the customer
cost_of_the_order: Cost of the order
day_of_the_week: Indicates whether the order is placed on a weekday or weekend (The weekday is from Monday to Friday and the weekend is
Saturday and Sunday)
rating: Rating given by the customer out of 5
food_preparation_time: Time (in minutes) taken by the restaurant to prepare the food. This is calculated by taking the difference between the
timestamps of the restaurant's order confirmation and the delivery person's pick-up confirmation.
delivery_time: Time (in minutes) taken by the delivery person to deliver the food package. This is calculated by taking the difference between the
timestamps of the delivery person's pick-up confirmation and drop-off information
Blanks '___' are provided in the notebook and need to be filled with appropriate code to get the correct result. Please replace each blank with the
right code snippet. Every '___' blank comes with a comment that briefly describes what needs to be filled in.
Identify the task to be performed correctly, and only then proceed to write the required code.
Fill in the code wherever asked by commented lines like "# write your code here" or "# complete the code". Running incomplete code may throw
an error.
Please run the code cells sequentially from the beginning to avoid unnecessary errors.
You can record the results/observations derived from the analysis here and use them to create your final presentation.
Note: After running the above cell, kindly restart the notebook kernel and run all cells sequentially from the start again.
Question 1: How many rows and columns are present in the data? [0.5 mark]
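As a minimal sketch of how row and column counts can be read from a DataFrame, the snippet below uses a small made-up table (`toy` is a hypothetical stand-in, not the FoodHub data). The same `shape` attribute is available on the notebook's `df`.

```python
import pandas as pd

# Hypothetical toy data standing in for the FoodHub orders; the values are made up.
toy = pd.DataFrame({
    "order_id": [1, 2, 3],
    "cost_of_the_order": [12.5, 30.0, 8.2],
})

# shape is a (rows, columns) tuple
n_rows, n_cols = toy.shape
print("Rows:", n_rows, "Columns:", n_cols)
```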
Question 2: What are the datatypes of the different columns in the dataset? [0.5 mark]
In [ ]: df.info()
Question 3: Are there any missing values in the data? If yes, treat them using an appropriate method. [1 mark]
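One possible way to count missing values per column is sketched below on made-up data. The 'Not given' placeholder string is an assumption used for illustration; `isnull()` only counts true nulls, so placeholder strings have to be counted separately if the data contains any.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the 'Not given' placeholder is an assumption for illustration.
toy = pd.DataFrame({
    "rating": ["5", "Not given", "4", None],
    "delivery_time": [20, 25, np.nan, 30],
})

# Count true nulls (None / NaN) in each column
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Placeholder strings are not nulls, so count them with an explicit comparison
placeholder_count = (toy["rating"] == "Not given").sum()
print("Placeholder ratings:", placeholder_count)
```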
Question 4: Check the statistical summary of the data. What is the minimum, average, and maximum time it
takes for food to be prepared once an order is placed? [2 marks]
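A statistical summary can be obtained with `describe()`. The sketch below runs it on a made-up `food_preparation_time` column (hypothetical values) and picks out the minimum, mean, and maximum.

```python
import pandas as pd

# Hypothetical preparation times in minutes; the values are made up.
toy = pd.DataFrame({"food_preparation_time": [20, 25, 30, 35]})

# describe() returns count, mean, std, min, quartiles, and max
summary = toy["food_preparation_time"].describe()

print("Minimum:", summary["min"])
print("Average:", summary["mean"])
print("Maximum:", summary["max"])
```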
Univariate Analysis
Question 6: Explore all the variables and provide observations on their distributions. (Generally, histograms,
boxplots, countplots, etc. are used for univariate exploration.) [9 marks]
Order ID
Customer ID
Restaurant name
In [ ]: plt.figure(figsize = (15,5))
sns.countplot(data = df, x = 'cuisine_type') ## Create a countplot for cuisine type.
In [ ]: sns.countplot(data = df, x = '______') ## Complete the code to plot a bar graph for 'day_of_the_week' column
Rating
In [ ]: sns.countplot(data = df, x = '______') ## Complete the code to plot bar graph for 'rating' column
In [ ]: sns.histplot(data=df,x='_____') ## Complete the code to plot the histogram for the cost of order
plt.show()
sns.boxplot(data=df,x='_____') ## Complete the code to plot the boxplot for the cost of order
plt.show()
Delivery time
In [ ]: sns.histplot(data=df,x='_____') ## Complete the code to plot the histogram for the delivery time
plt.show()
sns.boxplot(data=df,x='_____') ## Complete the code to plot the boxplot for the delivery time
plt.show()
Question 7: Which are the top 5 restaurants in terms of the number of orders received? [1 mark]
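A common way to rank categories by frequency is `value_counts()`, which sorts in descending order by default. The sketch below uses made-up restaurant names and takes the top 2 instead of the top 5 because the toy table is tiny.

```python
import pandas as pd

# Hypothetical toy data; restaurant names are placeholders.
toy = pd.DataFrame({
    "restaurant_name": ["A", "B", "A", "C", "A", "B"],
})

# value_counts() counts rows per restaurant, most frequent first
top_restaurants = toy["restaurant_name"].value_counts().head(2)
print(top_restaurants)
```

The same pattern applies to any column whose most frequent values are of interest.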
# Calculate the number of total orders where the cost is above 20 dollars
print('The number of total orders that cost above 20 dollars is:', df_greater_than_20.shape[0])
print('The mean delivery time for this dataset is', round(mean_del_time, 2), 'minutes')
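The two print statements above use `df_greater_than_20` and `mean_del_time`, which are not defined in this section. The sketch below shows one way they might be computed, using a boolean mask and `mean()` on made-up data (`toy` and its values are hypothetical).

```python
import pandas as pd

# Hypothetical toy data; the values are made up.
toy = pd.DataFrame({
    "cost_of_the_order": [12.5, 30.0, 8.2, 22.0],
    "delivery_time": [20, 25, 30, 25],
})

# Keep only the orders that cost more than 20 dollars
df_greater_than_20 = toy[toy["cost_of_the_order"] > 20]

# Share of such orders as a percentage of all orders
percentage = (df_greater_than_20.shape[0] / toy.shape[0]) * 100

# Mean delivery time across all orders
mean_del_time = toy["delivery_time"].mean()

print("Orders above 20 dollars:", df_greater_than_20.shape[0])
print("Percentage of orders above 20 dollars:", round(percentage, 2), "%")
print("Mean delivery time:", round(mean_del_time, 2), "minutes")
```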
Question 11: The company has decided to give 20% discount vouchers to the top 5 most frequent customers.
Find the IDs of these customers and the number of orders they placed. [1 mark]
Multivariate Analysis
Question 12: Perform a multivariate analysis to explore relationships between the important variables in the
dataset. (It is a good idea to explore relations between numerical variables as well as relations between
numerical and categorical variables) [10 marks]
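For relations between numerical variables, a pairwise correlation matrix is one reasonable starting point. The sketch below computes it on made-up values (hypothetical data). In the notebook the resulting matrix could also be passed to a heatmap for easier reading.

```python
import pandas as pd

# Hypothetical toy data; the values are made up.
toy = pd.DataFrame({
    "cost_of_the_order": [12.5, 30.0, 8.2, 22.0, 16.1],
    "food_preparation_time": [20, 35, 25, 30, 28],
    "delivery_time": [25, 18, 30, 22, 27],
})

numeric_cols = ["cost_of_the_order", "food_preparation_time", "delivery_time"]

# Pearson correlation between every pair of numerical columns
corr = toy[numeric_cols].corr()
print(corr.round(2))
```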
Run the below code and write your observations on the revenue generated by the restaurants.
In [ ]: df.groupby(['restaurant_name'])['cost_of_the_order'].sum().sort_values(ascending = False).head(14)
Question 13: The company wants to provide a promotional offer in the advertisement of the restaurants. The
condition to get the offer is that the restaurants must have a rating count of more than 50 and the average
rating should be greater than 4. Find the restaurants fulfilling the criteria to get the promotional offer. [3
marks]
# Create a dataframe that contains the restaurant names with their rating counts
df_rating_count = df_rated.groupby(['restaurant_name'])['rating'].count().sort_values(ascending = False).reset_index()
df_rating_count.head()
In [ ]: # Get the restaurant names that have rating count more than 50
rest_names = df_rating_count['______________']['restaurant_name'] ## Complete the code to get the restaurant names having rating count more than 50
# Filter to get the data of restaurants that have rating count more than 50
df_mean_4 = df_rated[df_rated['restaurant_name'].isin(rest_names)].copy()
# Group the restaurant names with their ratings and find the mean rating of each restaurant
df_mean_4_rating = df_mean_4.groupby(['_______'])['_______'].mean().sort_values(ascending = False).reset_index().dropna() ## Complete the code to find the mean rating
df_avg_rating_greater_than_4
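As an alternative to the step-by-step approach above, both conditions can be checked in one pass by aggregating the count and the mean together. This is a different technique from the notebook's own, sketched on made-up numeric ratings. The thresholds `MIN_COUNT` and `MIN_MEAN` are hypothetical and lowered to fit the toy data; the question itself uses 50 and 4.

```python
import pandas as pd

# Hypothetical toy data with numeric ratings; names and values are made up.
toy = pd.DataFrame({
    "restaurant_name": ["A", "A", "A", "B", "B", "B", "C"],
    "rating": [5, 5, 4, 3, 4, 3, 5],
})

# Thresholds lowered for the toy data (assumption)
MIN_COUNT = 2
MIN_MEAN = 4

# One row per restaurant with its rating count and mean rating
stats = toy.groupby("restaurant_name")["rating"].agg(["count", "mean"])

# Keep restaurants that satisfy both conditions
eligible = stats[(stats["count"] > MIN_COUNT) & (stats["mean"] > MIN_MEAN)]
print(eligible)
```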
Question 14: The company charges the restaurant 25% on the orders having cost greater than 20 dollars and
15% on the orders having cost greater than 5 dollars. Find the net revenue generated by the company across
all orders. [3 marks]
df['Revenue'] = df['________'].apply(compute_rev) ## Write the appropriate column name to compute the revenue
df.head()
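The `compute_rev` function applied above is not shown in this section. The sketch below is one plausible version under two stated assumptions: the tiers do not stack (an order above 20 dollars is charged 25% only), and orders of 5 dollars or less generate no revenue. The data is made up.

```python
import pandas as pd

def compute_rev(cost):
    """Return the company's revenue for one order (assumed tier rules)."""
    if cost > 20:
        return cost * 0.25   # 25% on orders above 20 dollars
    elif cost > 5:
        return cost * 0.15   # 15% on orders above 5 and up to 20 dollars
    else:
        return 0.0           # assumption: no charge at 5 dollars or less

# Hypothetical order costs; the values are made up.
toy = pd.DataFrame({"cost_of_the_order": [25.0, 10.0, 4.0]})

toy["Revenue"] = toy["cost_of_the_order"].apply(compute_rev)
total_revenue = toy["Revenue"].sum()
print("Net revenue:", round(total_revenue, 2), "dollars")
```

Checking the larger threshold first matters: if the `cost > 5` branch came first, every order above 20 dollars would fall into the 15% tier.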
Question 15: The company wants to analyze the total time required to deliver the food. What percentage of
orders take more than 60 minutes to get delivered from the time the order is placed? (The food has to be
prepared and then delivered.) [2 marks]
In [ ]: # Calculate total delivery time and add a new column to the dataframe df to store the total delivery time
df['total_time'] = df['food_preparation_time'] + df['delivery_time']
## Write the code below to find the percentage of orders that have more than 60 minutes of total delivery time (see Question 9 for reference)
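A percentage over a threshold can be computed by summing a boolean mask and dividing by the number of rows. The sketch below shows this on made-up preparation and delivery times (hypothetical values).

```python
import pandas as pd

# Hypothetical toy data in minutes; the values are made up.
toy = pd.DataFrame({
    "food_preparation_time": [20, 35, 30, 25],
    "delivery_time": [25, 30, 33, 20],
})

# Total time from order placement to drop-off
toy["total_time"] = toy["food_preparation_time"] + toy["delivery_time"]

# True for orders taking more than 60 minutes; sum() counts the True values
over_60 = (toy["total_time"] > 60).sum()
pct_over_60 = (over_60 / len(toy)) * 100
print("Orders over 60 minutes:", round(pct_over_60, 2), "%")
```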
Question 16: The company wants to analyze the delivery time of the orders on weekdays and weekends. How
does the mean delivery time vary during weekdays and weekends? [2 marks]
## Write the code below to get the mean delivery time on weekends and print it
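Instead of filtering weekdays and weekends separately, a `groupby` can return both means at once. This is a sketch of that alternative on made-up data, assuming the column holds the labels 'Weekday' and 'Weekend'.

```python
import pandas as pd

# Hypothetical toy data; the labels 'Weekday' and 'Weekend' are assumed values.
toy = pd.DataFrame({
    "day_of_the_week": ["Weekday", "Weekend", "Weekday", "Weekend"],
    "delivery_time": [28, 20, 30, 24],
})

# Mean delivery time for each day type
mean_by_day = toy.groupby("day_of_the_week")["delivery_time"].mean()
print(mean_by_day.round(2))

# Filtering gives the same number for a single day type
weekend_mean = toy[toy["day_of_the_week"] == "Weekend"]["delivery_time"].mean()
print("Mean delivery time on weekends:", round(weekend_mean, 2), "minutes")
```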
Question 17: What are your conclusions from the analysis? What recommendations would you like to share
to help improve the business? (You can use cuisine type and feedback ratings to drive your business
recommendations.) [6 marks]
Conclusions:
Recommendations: