Commit c54f612

Create matplotlib-box-plots.md
1 parent 50d2b65 commit c54f612

File tree

1 file changed

+104
-0
lines changed

@@ -0,0 +1,104 @@
1+
# Box Plot
2+
3+
A box plot summarizes the distribution of a dataset graphically. It displays five summary statistics: the minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum. The box spans the interquartile range (IQR) from Q1 to Q3, with a line at the median. The whiskers extend from the box to the most extreme data points that lie within 1.5 * IQR of the box; when there are no outliers, these points are the minimum and maximum. Outliers, if present, are drawn as individual points beyond the whiskers.
4+
5+
For example, imagine you have the exam scores of students from three classes. A box plot shows how the scores in each class are spread out and makes the classes easy to compare side by side.
6+
7+
## Key Ranges in Data Distribution
8+
9+
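A box plot is built from five key values, where IQR = Q3 - Q1. Data points outside the two whisker limits are treated as outliers.
10+
1. Lower whisker limit (often labeled "minimum"): Q1 - 1.5 * IQR
11+
2. 1st quartile (Q1): 25th percentile
12+
3. Median: 50th percentile
13+
4. 3rd quartile (Q3): 75th percentile
14+
5. Upper whisker limit (often labeled "maximum"): Q3 + 1.5 * IQR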
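
The key values listed above can be computed directly with NumPy. The sketch below is only an illustration: the sample array is made up, and the quartiles come from `np.percentile` with its default linear interpolation.

```python
import numpy as np

# Made-up sample with one obviously extreme value (an assumption for illustration)
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# Quartiles and interquartile range
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Whisker limits: points outside this interval count as outliers
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = data[(data < lower_limit) | (data > upper_limit)]

print("Q1, median, Q3:", q1, median, q3)
print("IQR:", iqr)
print("Whisker limits:", lower_limit, upper_limit)
print("Outliers:", outliers)
```

Note that the whiskers are drawn to the most extreme data points inside these limits, not to the limits themselves.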
15+
16+
## Purpose of Box Plots
17+
18+
A box plot helps determine the following:
19+
1. The number of outliers in a dataset
20+
2. Whether the data is skewed (skewness measures the asymmetry of the distribution)
21+
3. The range of the data
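
The three questions above can also be answered numerically, without drawing anything. The sketch below uses `matplotlib.cbook.boxplot_stats`, which returns the statistics a box plot is drawn from. The sample data is made up, and comparing the two halves of the box is only a rough heuristic for skew, not a formal skewness statistic.

```python
import numpy as np
from matplotlib import cbook

# Made-up, right-skewed sample (an assumption for illustration)
data = np.array([2, 3, 3, 4, 4, 5, 6, 8, 12, 40])

# boxplot_stats returns one dict of statistics per dataset
stats = cbook.boxplot_stats(data)[0]

# 1. Number of outliers (points beyond the whiskers)
n_outliers = len(stats["fliers"])

# 2. Rough skew check: compare the upper and lower halves of the box
upper_half = stats["q3"] - stats["med"]
lower_half = stats["med"] - stats["q1"]
skew_hint = "right-skewed" if upper_half > lower_half else "left-skewed or symmetric"

# 3. Range covered by the whiskers and by the full data
whisker_range = (stats["whislo"], stats["whishi"])
full_range = (data.min(), data.max())

print("Outliers:", n_outliers)
print("Skew hint:", skew_hint)
print("Whisker range:", whisker_range)
print("Full range:", full_range)
```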
22+
23+
## Creating Box Plots using Matplotlib
24+
25+
Box plots are created with the built-in boxplot() function of Matplotlib's pyplot module.
26+
27+
Syntax: matplotlib.pyplot.boxplot(x, notch=None, vert=None, patch_artist=None, widths=None)
28+
29+
1. x (the data): An array or a sequence of arrays to plot; each array produces one box.
30+
2. notch: A Boolean. If True, the box is drawn with a notch around the median; otherwise it is rectangular.
31+
3. vert: A Boolean. If True, the boxes are drawn vertically; otherwise they are drawn horizontally.
32+
4. positions: An array-like of numbers that sets the position of each box along the axis.
33+
5. widths: A number or an array-like of numbers that sets the width of each box.
34+
6. patch_artist: An optional Boolean. If True, the boxes are drawn as filled patches, which lets you set a fill color; otherwise they are drawn as lines.
35+
7. labels: A sequence of strings giving one label per dataset (per box), not per data point. Recent Matplotlib releases (3.9 and later) name this parameter tick_labels.
36+
8. meanline: An optional Boolean. If True, and showmeans is also True, the mean is drawn as a line across the box rather than as a point.
37+
9. zorder: A number that sets the drawing order (z-order) of the box plot relative to other artists.
38+
10. bootstrap: An optional integer giving the number of bootstrap resamples used to compute the confidence interval around the median in a notched box plot.
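
To see how several of these parameters combine, here is a minimal sketch. The data, colors, and positions are made up for illustration; the keyword arguments themselves (`notch`, `positions`, `widths`, `patch_artist`, `showmeans`, `meanline`, `bootstrap`) are existing `boxplot()` parameters.

```python
import matplotlib.pyplot as plt
import numpy as np

# Three made-up normally distributed samples (an assumption for illustration)
rng = np.random.default_rng(0)
samples = [rng.normal(loc, 10, 100) for loc in (50, 60, 70)]

fig, ax = plt.subplots(figsize=(8, 5))
bp = ax.boxplot(
    samples,
    notch=True,           # notch around the median
    bootstrap=1000,       # resamples used for the notch confidence interval
    positions=[1, 2, 4],  # leave a gap before the third box
    widths=0.5,           # same width for every box
    patch_artist=True,    # boxes become fillable patches
    showmeans=True,       # draw the mean ...
    meanline=True,        # ... as a line across the box
)

# With patch_artist=True each box can be given a fill color
for box, color in zip(bp["boxes"], ["lightblue", "lightgreen", "salmon"]):
    box.set_facecolor(color)

ax.set_title("Box plot with notches, custom positions, and fill colors")
```

Call `plt.show()` afterwards to display the figure. `boxplot()` returns a dictionary of the drawn artists (`boxes`, `medians`, `whiskers`, `caps`, `fliers`, `means`), which is what makes the per-box styling possible.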
39+
40+
## Implementation of Box Plot in Python
41+
42+
### Import libraries
43+
import matplotlib.pyplot as plt
44+
import numpy as np
45+
46+
### Creating dataset
47+
np.random.seed(10)
48+
data = np.random.normal(100, 20, 200)
49+
fig = plt.figure(figsize=(10, 7))
50+
51+
### Creating plot
52+
plt.boxplot(data)
53+
54+
### Show plot
55+
plt.show()
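
The values drawn in the plot can be read back from the dictionary that `boxplot()` returns. The sketch below regenerates the same random data as the example above and extracts the median, whisker ends, and outliers from the returned line artists; the variable names are chosen here for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Same data as the single box plot example
np.random.seed(10)
data = np.random.normal(100, 20, 200)

fig, ax = plt.subplots(figsize=(10, 7))
bp = ax.boxplot(data)

# The median is a horizontal line, so both y values are the median
median_value = bp["medians"][0].get_ydata()[0]

# Each whisker runs from the box edge to the whisker end
whisker_ends = [line.get_ydata()[1] for line in bp["whiskers"]]

# Outliers are stored as the y data of the "fliers" line
flier_values = bp["fliers"][0].get_ydata()

print("Median:", median_value)
print("Whisker ends:", whisker_ends)
print("Outliers:", flier_values)
```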
56+
57+
## Implementation of Multiple Box Plots in Python
58+
import matplotlib.pyplot as plt
59+
import numpy as np
60+
np.random.seed(10)
61+
dataSet1 = np.random.normal(100, 10, 220)
62+
dataSet2 = np.random.normal(80, 20, 200)
63+
dataSet3 = np.random.normal(60, 35, 220)
64+
dataSet4 = np.random.normal(50, 40, 200)
65+
dataSet = [dataSet1, dataSet2, dataSet3, dataSet4]
66+
figure = plt.figure(figsize=(10, 7))
67+
ax = figure.add_subplot(111)
68+
bp = ax.boxplot(dataSet)
69+
plt.show()
70+
71+
## Implementation of Box Plot with Outliers
This example shows the sales distribution for each product; the outliers highlight months with exceptionally high or low sales.
72+
import matplotlib.pyplot as plt
74+
75+
### Data for monthly sales
76+
product_A_sales = [100, 110, 95, 105, 115, 90, 120, 130, 80, 125, 150, 200]
77+
product_B_sales = [90, 105, 100, 98, 102, 105, 110, 95, 112, 88, 115, 250]
78+
product_C_sales = [80, 85, 90, 78, 82, 85, 88, 92, 75, 85, 200, 95]
79+
80+
### Introducing outliers
81+
product_A_sales.extend([300, 80])
82+
product_B_sales.extend([50, 300])
83+
product_C_sales.extend([70, 250])
84+
85+
### Creating a box plot with outliers
86+
plt.boxplot([product_A_sales, product_B_sales, product_C_sales], sym='o')
87+
plt.title('Monthly Sales Performance by Product with Outliers')
88+
plt.xlabel('Products')
89+
plt.ylabel('Sales')
90+
plt.show()
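
To find out which sales figures the plot treats as outliers, the flier values can be read from the artists returned by `boxplot()`. This is a minimal sketch using the same sales data as above (including the appended values); the `sales` dictionary and the `outliers` name are introduced here for illustration.

```python
import matplotlib.pyplot as plt

# Same monthly sales as above, with the extra values already appended
sales = {
    "Product A": [100, 110, 95, 105, 115, 90, 120, 130, 80, 125, 150, 200, 300, 80],
    "Product B": [90, 105, 100, 98, 102, 105, 110, 95, 112, 88, 115, 250, 50, 300],
    "Product C": [80, 85, 90, 78, 82, 85, 88, 92, 75, 85, 200, 95, 70, 250],
}

fig, ax = plt.subplots()
bp = ax.boxplot(list(sales.values()), sym="o")

# One "fliers" line per box; its y data holds that box's outliers
outliers = {
    name: sorted(float(value) for value in flier.get_ydata())
    for name, flier in zip(sales, bp["fliers"])
}

for name, values in outliers.items():
    print(name, values)
```

Only values beyond 1.5 * IQR from the box are reported, so a low value such as 80 for Product A is not flagged even though it was appended as an "outlier".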
91+
92+
## Implementation of Grouped Box Plot
This example compares the exam scores of students from three different classes (A, B, and C).
93+
import matplotlib.pyplot as plt
95+
class_A_scores = [75, 80, 85, 90, 95]
96+
class_B_scores = [70, 75, 80, 85, 90]
97+
class_C_scores = [65, 70, 75, 80, 85]
98+
99+
### Creating a grouped box plot
100+
plt.boxplot([class_A_scores, class_B_scores, class_C_scores], labels=['Class A', 'Class B', 'Class C'])
101+
plt.title('Exam Scores by Class')
102+
plt.xlabel('Classes')
103+
plt.ylabel('Scores')
104+
plt.show()
