|
| 1 | +# Box Plot |
| 2 | + |
| 3 | +A box plot is a graphical summary of the distribution of a dataset. It displays five summary statistics: the minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum. The box spans the interquartile range (IQR) between the first and third quartiles, while whiskers extend from the box to the smallest and largest values that are not outliers. Outliers, if present, are displayed as individual points beyond the whiskers. |
| 4 | + |
| 5 | +For example, imagine you have the exam scores of students from three classes. A box plot shows how the scores within each class are spread out. |
| 6 | + |
| 7 | +## Key Ranges in Data Distribution |
| 8 | + |
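| 9 | +The distribution is summarized by five key values: |
| 10 | +1. Minimum: the smallest data point that is not below the lower limit Q1-1.5*IQR |
| 11 | +2. 1st quartile (Q1): 25th percentile |
| 12 | +3. Median: 50th percentile |
| 13 | +4. 3rd quartile (Q3): 75th percentile |
| 14 | +5. Maximum: the largest data point that is not above the upper limit Q3+1.5*IQR (points outside either limit are treated as outliers) |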
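
The five values above can be computed directly with NumPy. The following is a minimal sketch, assuming NumPy's default (linear interpolation) percentile method and a made-up `scores` array; the variable names are illustrative and not part of any library.

```python
import numpy as np

# Hypothetical sample, chosen so that one value falls outside the limits
scores = np.array([52, 61, 64, 68, 70, 73, 75, 78, 82, 85, 120])

# Quartiles and interquartile range
q1, median, q3 = np.percentile(scores, [25, 50, 75])
iqr = q3 - q1

# Limits beyond which a point counts as an outlier
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Whisker ends are the most extreme data points inside the limits
inside = scores[(scores >= lower_limit) & (scores <= upper_limit)]
whisker_min, whisker_max = inside.min(), inside.max()
outliers = scores[(scores < lower_limit) | (scores > upper_limit)]

print("Q1, median, Q3:", q1, median, q3)
print("Limits:", lower_limit, upper_limit)
print("Whisker ends:", whisker_min, whisker_max)
print("Outliers:", outliers)
```

Note that the whisker ends are actual data points, while the limits are only thresholds and usually do not coincide with any observation.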
| 15 | + |
| 16 | +## Purpose of Box Plots |
| 17 | + |
| 18 | +A box plot of the data helps determine the following: |
| 19 | +1. The number of outliers in a dataset |
| 20 | +2. Whether the data is skewed (skewness is a measure of the asymmetry of the distribution) |
| 21 | +3. The range of the data |
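
These three quantities can also be read off numerically. The sketch below assumes `matplotlib.cbook.boxplot_stats`, the helper that returns the statistics a box plot is drawn from, and a made-up `values` list. The quartile-based (Bowley) skewness used here is one possible asymmetry measure chosen for illustration, not something the box plot itself reports.

```python
import numpy as np
from matplotlib import cbook

# Hypothetical right-skewed sample
values = [1, 2, 2, 3, 3, 4, 5, 7, 9, 12, 30]

# boxplot_stats returns one dict of statistics per dataset
stats = cbook.boxplot_stats(values)[0]

# 1. Number of outliers (points beyond the whiskers)
outlier_count = len(stats["fliers"])

# 2. Quartile (Bowley) skewness: positive means a longer upper half
quartile_skew = (stats["q3"] + stats["q1"] - 2 * stats["med"]) / stats["iqr"]

# 3. Range of the data, outliers included
data_range = np.max(values) - np.min(values)

print("Outliers:", outlier_count)
print("Quartile skewness:", quartile_skew)
print("Range:", data_range)
```

On a plot, the same skew shows up as a median line sitting closer to the bottom of the box than to the top.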
| 22 | + |
| 23 | +## Creating Box Plots using Matplotlib |
| 24 | + |
| 25 | +Box plots are created with the built-in function boxplot() of the pyplot module of matplotlib. |
| 26 | + |
| 27 | +Syntax (abridged): matplotlib.pyplot.boxplot(x, notch=None, vert=None, patch_artist=None, widths=None) |
| 28 | + |
| 29 | +1. x: The data to plot, given as an array or a sequence of arrays. |
| 30 | +2. notch: A Boolean value. If True, the box is drawn with a notch that marks the confidence interval around the median. |
| 31 | +3. vert: A Boolean value. If True, the boxes are drawn vertically; otherwise, they are drawn horizontally. |
| 32 | +4. positions: An array of numbers that defines the position of each box along the axis. |
| 33 | +5. widths: A single number or an array of numbers that defines the width of each box. |
| 34 | +6. patch_artist: An optional Boolean value. If True, the boxes are drawn as filled patches, so they can be given a face color. |
| 35 | +7. labels: A sequence of strings that defines the label for each dataset (one per box). Matplotlib 3.9 renamed this parameter to tick_labels. |
| 36 | +8. meanline: An optional Boolean value. If True (and showmeans is True), the mean is drawn as a line across the box. |
| 37 | +9. zorder: Sets the drawing order (z-order) of the box plot. |
| 38 | +10. bootstrap: An integer that specifies the number of bootstrap resamples used to compute the confidence interval around the median in a notched box plot. |
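
Several of these parameters can be combined in a single call. The sketch below is illustrative only: the data, colors, and group names are made up, and tick labels are set on the axes rather than through the `labels` argument, since that argument's name differs across Matplotlib versions.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: three groups with different centers and spreads
rng = np.random.default_rng(10)
groups = [rng.normal(100, 10, 200),
          rng.normal(80, 20, 200),
          rng.normal(60, 35, 200)]

fig, ax = plt.subplots(figsize=(10, 7))
bp = ax.boxplot(
    groups,
    notch=True,           # notch marks the confidence interval of the median
    bootstrap=1000,       # resamples used to estimate that interval
    positions=[1, 2, 3],  # where each box sits on the x-axis
    widths=0.5,           # same width for every box
    patch_artist=True,    # filled boxes, so a face color can be set
    showmeans=True,
    meanline=True,        # draw the mean as a line instead of a marker
)

# Color each box; this only works because patch_artist=True
for box, color in zip(bp["boxes"], ["lightblue", "lightgreen", "salmon"]):
    box.set_facecolor(color)

ax.set_xticks([1, 2, 3])
ax.set_xticklabels(["Group 1", "Group 2", "Group 3"])
ax.set_ylabel("Value")
plt.show()
```

The call returns a dictionary (`bp` here) whose keys such as `"boxes"`, `"medians"`, and `"fliers"` hold the drawn artists, which is what allows styling after the plot is created.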
| 39 | + |
| 40 | +## Implementation of Box Plot in Python |
| 41 | + |
| 42 | +### Import libraries |
| 43 | +import matplotlib.pyplot as plt |
| 44 | +import numpy as np |
| 45 | + |
| 46 | +### Creating dataset |
| 47 | +np.random.seed(10) |
| 48 | +data = np.random.normal(100, 20, 200) |
| 49 | +fig = plt.figure(figsize=(10, 7)) |
| 50 | + |
| 51 | +### Creating plot |
| 52 | +plt.boxplot(data) |
| 53 | + |
| 54 | +### Show plot |
| 55 | +plt.show() |
| 56 | + |
| 57 | +### Implementation of Multiple Box Plot in Python |
| 58 | +import matplotlib.pyplot as plt |
| 59 | +import numpy as np |
| 60 | +np.random.seed(10) |
| 61 | +dataSet1 = np.random.normal(100, 10, 220) |
| 62 | +dataSet2 = np.random.normal(80, 20, 200) |
| 63 | +dataSet3 = np.random.normal(60, 35, 220) |
| 64 | +dataSet4 = np.random.normal(50, 40, 200) |
| 65 | +dataSet = [dataSet1, dataSet2, dataSet3, dataSet4] |
| 66 | +figure = plt.figure(figsize=(10, 7)) |
| 67 | +ax = figure.add_axes([0, 0, 1, 1]) |
| 68 | +bp = ax.boxplot(dataSet) |
| 69 | +plt.show() |
| 70 | + |
| 71 | +### Implementation of Box Plot with Outliers (the plot shows the sales distribution for each product, and the outliers highlight months with exceptionally high or low sales) |
| 72 | +import matplotlib.pyplot as plt |
| 73 | +import numpy as np |
| 74 | + |
| 75 | +### Data for monthly sales |
| 76 | +product_A_sales = [100, 110, 95, 105, 115, 90, 120, 130, 80, 125, 150, 200] |
| 77 | +product_B_sales = [90, 105, 100, 98, 102, 105, 110, 95, 112, 88, 115, 250] |
| 78 | +product_C_sales = [80, 85, 90, 78, 82, 85, 88, 92, 75, 85, 200, 95] |
| 79 | + |
| 80 | +### Introducing outliers |
| 81 | +product_A_sales.extend([300, 80]) |
| 82 | +product_B_sales.extend([50, 300]) |
| 83 | +product_C_sales.extend([70, 250]) |
| 84 | + |
| 85 | +### Creating a box plot with outliers |
| 86 | +plt.boxplot([product_A_sales, product_B_sales, product_C_sales], sym='o') |
| 87 | +plt.title('Monthly Sales Performance by Product with Outliers') |
| 88 | +plt.xlabel('Products') |
| 89 | +plt.ylabel('Sales') |
| 90 | +plt.show() |
| 91 | + |
| 92 | +### Implementation of Grouped Box Plot (comparing the exam scores of students from three classes: A, B, and C) |
| 93 | +import matplotlib.pyplot as plt |
| 94 | +import numpy as np |
| 95 | +class_A_scores = [75, 80, 85, 90, 95] |
| 96 | +class_B_scores = [70, 75, 80, 85, 90] |
| 97 | +class_C_scores = [65, 70, 75, 80, 85] |
| 98 | + |
| 99 | +### Creating a grouped box plot |
| 100 | +plt.boxplot([class_A_scores, class_B_scores, class_C_scores], labels=['Class A', 'Class B', 'Class C']) |
| 101 | +plt.title('Exam Scores by Class') |
| 102 | +plt.xlabel('Classes') |
| 103 | +plt.ylabel('Scores') |
| 104 | +plt.show() |