Medical inventory optimization
Exploratory Data Analysis (SQL) by Prathmesh Patil
Software: MySQL Workbench
Business decisions based on the unclean ‘projectfinaldata’ table
1. Set the current database to "med_inventory".
USE med_inventory;
2. Displaying the table.
SELECT * FROM projectfinaldata LIMIT 20;
Output:
3. Calculating the first moment (measures of central tendency such as mean, median, mode) for
the dataset.
Mean:
SELECT
ROUND(AVG(Quantity), 2) AS mean_quantity,
ROUND(AVG(ReturnQuantity), 2) AS mean_return_quantity,
ROUND(AVG(Final_Cost), 2) AS mean_final_cost,
ROUND(AVG(Final_Sales), 2) AS mean_final_sales,
ROUND(AVG(RtnMRP), 2) AS mean_rtnmrp FROM projectfinaldata;
Output:
Median:
SELECT
ROUND(AVG(Final_Cost), 2) AS median_final_cost,
ROUND(AVG(Final_Sales), 2) AS median_final_sales,
ROUND(AVG(Quantity), 2) AS median_quantity,
ROUND(AVG(ReturnQuantity), 2) AS median_return_quantity,
ROUND(AVG(RtnMRP), 2) AS median_rtnmrp
FROM (
SELECT Final_Cost, Final_Sales, Quantity, ReturnQuantity, RtnMRP,
ROW_NUMBER() OVER (ORDER BY Final_Cost) AS row_num,
COUNT(*) OVER () AS total_rows FROM projectfinaldata
) AS subquery
WHERE row_num IN (FLOOR((total_rows + 1) / 2), CEILING((total_rows + 1) / 2));
Mode:
SELECT
mode_quantity.mode_value AS mode_quantity,
mode_return_quantity.mode_value AS mode_return_quantity,
mode_final_cost.mode_value AS mode_final_cost,
mode_final_sales.mode_value AS mode_final_sales,
mode_rtnmrp.mode_value AS mode_rtnmrp
FROM (
SELECT Quantity AS mode_value, COUNT(*) AS mode_count
FROM projectfinaldata
GROUP BY Quantity
ORDER BY COUNT(*) DESC
LIMIT 1
) AS mode_quantity,
SELECT ReturnQuantity AS mode_value, COUNT(*) AS mode_count
FROM projectfinaldata
GROUP BY ReturnQuantity
ORDER BY COUNT(*) DESC
LIMIT 1
) AS mode_return_quantity,
SELECT Final_Cost AS mode_value, COUNT(*) AS mode_count
FROM projectfinaldata
GROUP BY Final_Cost
ORDER BY COUNT(*) DESC
LIMIT 1
) AS mode_final_cost,
SELECT Final_Sales AS mode_value, COUNT(*) AS mode_count
FROM projectfinaldata
GROUP BY Final_Sales
ORDER BY COUNT(*) DESC
LIMIT 1
) AS mode_final_sales,
SELECT RtnMRP AS mode_value, COUNT(*) AS mode_count
FROM projectfinaldata
GROUP BY RtnMRP
ORDER BY COUNT(*) DESC
LIMIT 1
) AS mode_rtnmrp;
Output:
4. Calculating the second moment (measures of dispersion such as variance, standard deviation,
range) for the dataset.
Variance:
SELECT
ROUND(VARIANCE(Quantity), 2) AS variance_quantity,
ROUND(VARIANCE(ReturnQuantity), 2) AS variance_return_quantity,
ROUND(VARIANCE(Final_Cost), 2) AS variance_final_cost,
ROUND(VARIANCE(Final_Sales), 2) AS variance_final_sales,
ROUND(VARIANCE(RtnMRP), 2) AS variance_rtnmrp
FROM projectfinaldata;
Output:
Standard Deviation:
SELECT
ROUND(STDDEV(Quantity), 2) AS stddev_quantity,
ROUND(STDDEV(ReturnQuantity), 2) AS stddev_return_quantity,
ROUND(STDDEV(Final_Cost), 2) AS stddev_final_cost,
ROUND(STDDEV(Final_Sales), 2) AS stddev_final_sales,
ROUND(STDDEV(RtnMRP), 2) AS stddev_rtnmrp
FROM projectfinaldata;
Output:
Range:
SELECT
MAX(Quantity) - MIN(Quantity) AS range_quantity,
MAX(ReturnQuantity) - MIN(ReturnQuantity) AS range_return_quantity,
MAX(Final_Cost) - MIN(Final_Cost) AS range_final_cost,
MAX(Final_Sales) - MIN(Final_Sales) AS range_final_sales,
MAX(RtnMRP) - MIN(RtnMRP) AS range_rtnmrp
FROM projectfinaldata;
Output:
5. Calculating the third moment (skewness) for the dataset.
Skewness:
SELECT 'Quantity' AS column_name,
ROUND((SUM(POW(Quantity - (SELECT AVG(Quantity) FROM projectfinaldata), 3)) / (COUNT(*) *
POW(STDDEV(Quantity), 3))), 2) AS skewness_value
FROM projectfinaldata
UNION ALL
SELECT 'ReturnQuantity' AS column_name,
ROUND((SUM(POW(ReturnQuantity - (SELECT AVG(ReturnQuantity) FROM projectfinaldata), 3)) /
(COUNT(*) * POW(STDDEV(ReturnQuantity), 3))), 2) AS skewness_value
FROM projectfinaldata
UNION ALL
SELECT 'Final_Cost' AS column_name,
ROUND((SUM(POW(Final_Cost - (SELECT AVG(Final_Cost) FROM projectfinaldata), 3)) / (COUNT(*) *
POW(STDDEV(Final_Cost), 3))), 2) AS skewness_value
FROM projectfinaldata
UNION ALL
SELECT 'Final_Sales' AS column_name,
ROUND((SUM(POW(Final_Sales - (SELECT AVG(Final_Sales) FROM projectfinaldata), 3)) / (COUNT(*) *
POW(STDDEV(Final_Sales), 3))), 2) AS skewness_value
FROM projectfinaldata
UNION ALL
SELECT 'RtnMRP' AS column_name,
ROUND((SUM(POW(RtnMRP - (SELECT AVG(RtnMRP) FROM projectfinaldata), 3)) / (COUNT(*) *
POW(STDDEV(RtnMRP), 3))), 2) AS skewness_value
FROM projectfinaldata;
Output:
6. Calculating the fourth moment (kurtosis) for the dataset.
Kurtosis:
SELECT
ROUND((SUM(POWER(Quantity - avg_value, 4)) / (COUNT(Quantity) * POWER(STDDEV(Quantity),
4))), 2) AS kurtosis_quantity,
ROUND((SUM(POWER(ReturnQuantity - avg_value, 4)) / (COUNT(ReturnQuantity) *
POWER(STDDEV(ReturnQuantity), 4))), 2) AS kurtosis_return_quantity,
ROUND((SUM(POWER(Final_Cost - avg_value, 4)) / (COUNT(Final_Cost) *
POWER(STDDEV(Final_Cost), 4))), 2) AS kurtosis_final_cost,
ROUND((SUM(POWER(Final_Sales - avg_value, 4)) / (COUNT(Final_Sales) *
POWER(STDDEV(Final_Sales), 4))), 2) AS kurtosis_final_sales,
ROUND((SUM(POWER(RtnMRP - avg_value, 4)) / (COUNT(RtnMRP) * POWER(STDDEV(RtnMRP), 4))),
2) AS kurtosis_rtnmrp
FROM
(SELECT
AVG(Quantity) AS avg_value,
STDDEV(Quantity) AS stddev_value,
COUNT(Quantity) AS count_value
FROM projectfinaldata) AS subquery, projectfinaldata;
Output:
Business decisions based on ‘cleaned_table’ with pre-processed data
1. Calculating the first moment (measures of central tendency such as mean, median, mode) for
the dataset.
Mean:
SELECT
ROUND(AVG(Quantity), 2) AS mean_quantity,
ROUND(AVG(ReturnQuantity), 2) AS mean_return_quantity,
ROUND(AVG(Final_Cost), 2) AS mean_final_cost,
ROUND(AVG(Final_Sales), 2) AS mean_final_sales,
ROUND(AVG(RtnMRP), 2) AS mean_rtnmrp FROM cleaned_table;
Output:
Median:
SELECT
ROUND(AVG(Final_Cost), 2) AS median_final_cost,
ROUND(AVG(Final_Sales), 2) AS median_final_sales,
ROUND(AVG(Quantity), 2) AS median_quantity,
ROUND(AVG(ReturnQuantity), 2) AS median_return_quantity,
ROUND(AVG(RtnMRP), 2) AS median_rtnmrp
FROM (
SELECT Final_Cost, Final_Sales, Quantity, ReturnQuantity, RtnMRP,
ROW_NUMBER() OVER (ORDER BY Final_Cost) AS row_num,
COUNT(*) OVER () AS total_rows FROM cleaned_table
) AS subquery
WHERE row_num IN (FLOOR((total_rows + 1) / 2), CEILING((total_rows + 1) / 2));
Mode:
SELECT
mode_quantity.mode_value AS mode_quantity,
mode_return_quantity.mode_value AS mode_return_quantity,
mode_final_cost.mode_value AS mode_final_cost,
mode_final_sales.mode_value AS mode_final_sales,
mode_rtnmrp.mode_value AS mode_rtnmrp
FROM (
SELECT Quantity AS mode_value, COUNT(*) AS mode_count
FROM cleaned_table
GROUP BY Quantity
ORDER BY COUNT(*) DESC
LIMIT 1
) AS mode_quantity,
SELECT ReturnQuantity AS mode_value, COUNT(*) AS mode_count
FROM cleaned_table
GROUP BY ReturnQuantity
ORDER BY COUNT(*) DESC
LIMIT 1
) AS mode_return_quantity,
SELECT Final_Cost AS mode_value, COUNT(*) AS mode_count
FROM cleaned_table
GROUP BY Final_Cost
ORDER BY COUNT(*) DESC
LIMIT 1
) AS mode_final_cost,
SELECT Final_Sales AS mode_value, COUNT(*) AS mode_count
FROM cleaned_table
GROUP BY Final_Sales
ORDER BY COUNT(*) DESC
LIMIT 1
) AS mode_final_sales,
SELECT RtnMRP AS mode_value, COUNT(*) AS mode_count
FROM cleaned_table
GROUP BY RtnMRP
ORDER BY COUNT(*) DESC
LIMIT 1
) AS mode_rtnmrp;
Output:
2. Calculating the second moment (measures of dispersion such as variance, standard deviation,
range) for the dataset.
Variance:
SELECT
ROUND(VARIANCE(Quantity), 2) AS variance_quantity,
ROUND(VARIANCE(ReturnQuantity), 2) AS variance_return_quantity,
ROUND(VARIANCE(Final_Cost), 2) AS variance_final_cost,
ROUND(VARIANCE(Final_Sales), 2) AS variance_final_sales,
ROUND(VARIANCE(RtnMRP), 2) AS variance_rtnmrp
FROM cleaned_table;
Output:
Standard Deviation:
SELECT
ROUND(STDDEV(Quantity), 2) AS stddev_quantity,
ROUND(STDDEV(ReturnQuantity), 2) AS stddev_return_quantity,
ROUND(STDDEV(Final_Cost), 2) AS stddev_final_cost,
ROUND(STDDEV(Final_Sales), 2) AS stddev_final_sales,
ROUND(STDDEV(RtnMRP), 2) AS stddev_rtnmrp
FROM cleaned_table;
Output:
Range:
SELECT
MAX(Quantity) - MIN(Quantity) AS range_quantity,
MAX(ReturnQuantity) - MIN(ReturnQuantity) AS range_return_quantity,
MAX(Final_Cost) - MIN(Final_Cost) AS range_final_cost,
MAX(Final_Sales) - MIN(Final_Sales) AS range_final_sales,
MAX(RtnMRP) - MIN(RtnMRP) AS range_rtnmrp
FROM cleaned_table;
Output:
3. Calculating the third moment (skewness) for the dataset.
Skewness:
SELECT 'Quantity' AS column_name,
ROUND((SUM(POW(Quantity - (SELECT AVG(Quantity) FROM cleaned_table), 3)) / (COUNT(*) *
POW(STDDEV(Quantity), 3))), 2) AS skewness_value
FROM cleaned_table
UNION ALL
SELECT 'ReturnQuantity' AS column_name,
ROUND((SUM(POW(ReturnQuantity - (SELECT AVG(ReturnQuantity) FROM cleaned_table), 3)) /
(COUNT(*) * POW(STDDEV(ReturnQuantity), 3))), 2) AS skewness_value
FROM cleaned_table
UNION ALL
SELECT 'Final_Cost' AS column_name,
ROUND((SUM(POW(Final_Cost - (SELECT AVG(Final_Cost) FROM cleaned_table), 3)) / (COUNT(*) *
POW(STDDEV(Final_Cost), 3))), 2) AS skewness_value
FROM cleaned_table
UNION ALL
SELECT 'Final_Sales' AS column_name,
ROUND((SUM(POW(Final_Sales - (SELECT AVG(Final_Sales) FROM cleaned_table), 3)) / (COUNT(*) *
POW(STDDEV(Final_Sales), 3))), 2) AS skewness_value
FROM cleaned_table
UNION ALL
SELECT 'RtnMRP' AS column_name,
ROUND((SUM(POW(RtnMRP - (SELECT AVG(RtnMRP) FROM cleaned_table), 3)) / (COUNT(*) *
POW(STDDEV(RtnMRP), 3))), 2) AS skewness_value
FROM cleaned_table;
Output:
4. Calculating the fourth moment (kurtosis) for the dataset.
Kurtosis:
SELECT
ROUND((SUM(POWER(Quantity - avg_value, 4)) / (COUNT(Quantity) * POWER(STDDEV(Quantity),
4))), 2) AS kurtosis_quantity,
ROUND((SUM(POWER(ReturnQuantity - avg_value, 4)) / (COUNT(ReturnQuantity) *
POWER(STDDEV(ReturnQuantity), 4))), 2) AS kurtosis_return_quantity,
ROUND((SUM(POWER(Final_Cost - avg_value, 4)) / (COUNT(Final_Cost) *
POWER(STDDEV(Final_Cost), 4))), 2) AS kurtosis_final_cost,
ROUND((SUM(POWER(Final_Sales - avg_value, 4)) / (COUNT(Final_Sales) *
POWER(STDDEV(Final_Sales), 4))), 2) AS kurtosis_final_sales,
ROUND((SUM(POWER(RtnMRP - avg_value, 4)) / (COUNT(RtnMRP) * POWER(STDDEV(RtnMRP), 4))),
2) AS kurtosis_rtnmrp
FROM
(SELECT
AVG(Quantity) AS avg_value,
STDDEV(Quantity) AS stddev_value,
COUNT(Quantity) AS count_value
FROM projectfinaldata) AS subquery, cleaned_table;
Output:
Comparison table showing the business decisions results for the unclean and clean data:
Observation:
Overall, the results indicate that the unclean data exhibits higher mean, variance, standard deviation,
range, skewness, and kurtosis values compared to the clean data. This suggests greater
inconsistencies, variability, and potential outliers in the unclean data. Cleaning the data has resulted
in more stable and normalized distributions with reduced variability and potential biases, making it
more reliable for business decision-making.
Bounce Rate Analysis:
1. Finding the percentage of customers who bounced (returned a product with a final sale price of
0) out of the total number of customers.
SELECT ROUND((bounced_customers / total_customers) * 100, 2) AS bounce_rate
FROM
(SELECT COUNT(DISTINCT Patient_ID) AS total_customers
FROM cleaned_table
WHERE Typeofsales IN ('Sale', 'Return')) AS t1,
(SELECT COUNT(DISTINCT Patient_ID) AS bounced_customers
FROM cleaned_table
WHERE Typeofsales = 'Return' AND Final_Sales = 0) AS t2;
Output:
Insight:
We can understand that around 22.85% of customers in the ‘cleaned_table’ faced a situation where
they returned medicines with a Final_Sales value of 0. This means that a significant portion of
customers did not get the medicines they needed, which could lead to dissatisfaction. To improve
business success and increase revenue, it is important to reduce this bounce rate by ensuring
customers receive the medicines they require.
2. Finding the number of drugs in each subcategory that have been returned without making a sale
(Final_Sales = 0).
SELECT SubCat, COUNT(DISTINCT DrugName) AS num_returned_drugs
FROM cleaned_table
WHERE Typeofsales = 'Return' AND Final_Sales = 0
GROUP BY SubCat
ORDER BY num_returned_drugs DESC;
Output:
Insight:
We can observe that the subcategory "INJECTIONS" has the highest count of returned drug names
with 98 occurrences, followed by the subcategory "TABLETS & CAPSULES" with 63 occurrences,
indicating a potential issue with customer satisfaction, product quality, or other factors that lead to
returns for these two subcategories.
3. Finding the formulation with the highest return count within the "INJECTIONS" and "TABLETS &
CAPSULES" subcategories.
SELECT SubCat, Formulation, return_count
FROM (
SELECT SubCat, Formulation, COUNT(*) AS return_count,
ROW_NUMBER() OVER (PARTITION BY SubCat ORDER BY COUNT(*) DESC) AS rn
FROM cleaned_table
WHERE Typeofsales = 'Return' AND Final_Sales = 0 AND SubCat IN ('INJECTIONS', 'TABLETS &
CAPSULES')
GROUP BY SubCat, Formulation
) AS subquery
WHERE rn = 1;
Output:
Insight:
We can observe that within the "INJECTIONS" subcategory, the Formulation "Form1" has the highest
return count with 398 occurrences. Similarly, within the "TABLETS & CAPSULES" subcategory, the
Formulation "Form1" again has the highest return count with 77 occurrences.
4. Finding the count of occurrences of Formulation "Form1" for each Department (Dept) where the
SubCat is either "INJECTIONS" or "TABLETS & CAPSULES".
SELECT Dept, COUNT(*) AS form1_count
FROM cleaned_table
WHERE Formulation = 'Form1' AND SubCat IN ('INJECTIONS', 'TABLETS & CAPSULES')
GROUP BY Dept
ORDER BY form1_count DESC;
Output:
Insight: The Formulation "Form1" appears to be significantly more prevalent in Department1
compared to other departments which likely corresponds to the Pharmacy department, has a high
demand or usage of the "Form1" formulation. This could indicate that "Form1" is a commonly
prescribed or requested medication within this department.
5. Finding the count of occurrences of Typeofsales as 'Return' for each Department (Dept).
SELECT Dept, COUNT(*) AS return_count
FROM cleaned_table
WHERE Typeofsales = 'Return'
GROUP BY Dept;
Output:
Insight: Department1 has a relatively higher count of return occurrences compared to other
departments. This suggests that there may be more instances of customers returning products in
Department1.
6. Finding the count of occurrences of Typeofsales as 'Return' for each Specialisation within
Department1 and Formulation as 'Form1'
SELECT Specialisation, COUNT(*) AS return_count
FROM cleaned_table
WHERE Typeofsales = 'Return' AND Dept = 'Department1' AND Formulation = 'Form1'
GROUP BY Specialisation
ORDER BY return_count DESC;
Output:
Insight: We can observe that Specialisation4 and Specialisation7 are experiencing a relatively higher
number of returns compared to other Specialisations within Department1 and Formulation as
'Form1'.
Conclusion:
Based on the patterns and trends gained from the analysis of the dataset, the following conclusions
can be drawn:
1. Focus on Subcategories: The subcategories "INJECTIONS" and "TABLETS & CAPSULES" require
special attention due to their higher counts of returned drug names. The hospital should conduct a
thorough analysis of these subcategories to identify the underlying causes and take necessary steps
to address customer satisfaction, product quality, or other issues contributing to returns.
2. Evaluation of Formulation: The "Form1" formulation stands out with the highest return counts in
both the "INJECTIONS" and "TABLETS & CAPSULES" subcategories. It is essential to thoroughly
evaluate this formulation, considering factors such as product effectiveness, potential side effects,
and customer preferences. Improvements in the formulation or alternative options should be
explored to reduce returns.
3. Supplier Assessment: Assess the performance of suppliers providing products to Department1,
particularly those supplying the "Form1" formulation. Evaluate factors such as product quality,
reliability, and adherence to customer requirements. Consider alternative suppliers or negotiate with
existing ones to improve the quality and availability of these products.
4. Inventory Management: Ensure efficient inventory management for Department1, particularly for
products with the "Form1" formulation. Optimize stock levels, expiration dates, and replenishment
processes to minimize instances of expired or obsolete products. Proper inventory management can
help reduce returns and maintain a more cost-effective inventory.
5. Efficient Return Management: Department1, representing the Pharmacy department, experiences
a relatively higher count of return occurrences. The hospital should focus on implementing efficient
return management practices, such as streamlining return processes, improving inventory
management, and addressing any issues specific to Department1 that contribute to returns.
6. Specialisation-specific Strategies: Specialisation4 and Specialisation7 within Department1 and
associated with the "Form1" formulation exhibit a higher number of returns. The hospital should
collaborate with healthcare professionals in these specialisations to understand the reasons behind
returns, gather insights, and implement specialisation-specific strategies to reduce returns and
enhance customer satisfaction.
By taking these specific conclusions into consideration, the hospital can make informed business
decisions and implement targeted strategies to reduce bounce rate, improve customer satisfaction,
and achieve the economic success criteria.