Partition Values and Box-Plot
Example-1: Ten count tests made on a certain yarn have shown the following results:
22.8, 23.2, 22.9, 22.6, 23.4, 23.0, 23.1, 23.0, 22.9, 23.0.
Compute $P_{25}$ and $P_{50}$.
Solution: Here, $n = 10$.
Arranged data: 22.6, 22.8, 22.9, 22.9, 23.0, 23.0, 23.0, 23.1, 23.2, 23.4
Position:      1st,  2nd,  3rd,  4th,  5th,  6th,  7th,  8th,  9th,  10th
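$P_{25} = \left[25 \times \frac{10 + 1}{100}\right]^{\text{th}} \text{ value} = 2.75^{\text{th}} \text{ value}$
$= 2^{\text{nd}} \text{ value} + 0.75 \times (3^{\text{rd}} - 2^{\text{nd}})$
$= 22.8 + 0.75 \times (22.9 - 22.8) = 22.875$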
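$P_{50} = \left[50 \times \frac{10 + 1}{100}\right]^{\text{th}} \text{ value} = 5.5^{\text{th}} \text{ value}$
$= 5^{\text{th}} \text{ value} + 0.5 \times (6^{\text{th}} - 5^{\text{th}})$
$= 23.0 + 0.5 \times (23.0 - 23.0) = 23.0$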
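The position rule used in Example-1 can be checked with a short Python sketch. The function name `percentile_position_method` and the clamping of positions outside 1..n are assumptions made for illustration; the worked example itself only covers positions that fall inside the data.

```python
import math

def percentile_position_method(data, j):
    """j-th percentile as the j*(n+1)/100-th ordered value, with linear interpolation."""
    ordered = sorted(data)
    n = len(ordered)
    pos = j * (n + 1) / 100      # 1-based position; may be fractional
    # Assumption: positions outside 1..n fall back to the smallest/largest value.
    pos = min(max(pos, 1), n)
    k = math.floor(pos)          # whole part: the k-th ordered value
    frac = pos - k               # fractional part used for interpolation
    if k == n:
        return ordered[-1]
    lower, upper = ordered[k - 1], ordered[k]
    return lower + frac * (upper - lower)

counts = [22.8, 23.2, 22.9, 22.6, 23.4, 23.0, 23.1, 23.0, 22.9, 23.0]
p25 = percentile_position_method(counts, 25)
p50 = percentile_position_method(counts, 50)
print(round(p25, 3), round(p50, 3))   # 22.875 23.0
```

Other software may use a different position rule, so its percentiles can differ slightly from the values obtained here.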
B) Formula for computing the $j$th percentile from grouped data (a frequency distribution):
$P_j = L_0 + \frac{c}{f}\left(\frac{N}{100} \times j - p.cf\right)$
where
$L_0$ = lower limit of the $j$th percentile class,
$c$ = class interval (width) of the $j$th percentile class,
$f$ = frequency of the $j$th percentile class,
$p.cf$ = cumulative frequency of the class preceding the $j$th percentile class,
$N$ = total number of observations.
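One way to read this formula is as linear interpolation inside the percentile class, under the assumption that the $f$ observations of that class are spread evenly across its width $c$. The symbol $r$ below is introduced only for this illustration and denotes the target rank.

```latex
r = \frac{N}{100} \times j
\qquad\Longrightarrow\qquad
P_j = L_0 + c \times \frac{r - p.cf}{f}
```

Here $r - p.cf$ is how far the target rank reaches into the percentile class, and dividing by $f$ turns that into the fraction of the class width to add to $L_0$.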
Example-2: The following data are related to the linear density of yarn:
Linear density  13.00–13.25  13.25–13.50  13.50–13.75  13.75–14.00  14.00–14.25  14.25–14.50  14.50–14.75
No. of tests    8            12           20           25           22           10           4
Compute and explain $P_{50}$ and $P_{75}$.
Solution: First, we have to compute the cumulative frequencies (cf).
Linear density f cf
13.00–13.25 8 8
13.25–13.50 12 20
13.50–13.75 20 40
13.75–14.00 25 65
14.00–14.25 22 87
14.25–14.50 10 97
14.50–14.75 4 101
Total N=101
$P_{50}$: To locate the 50th percentile class, compute $\frac{N}{100} \times 50 = \frac{101}{100} \times 50 = 50.5$. The first cumulative frequency that is at least 50.5 is $cf = 65$.
So, 13.75–14.00 is the 50th percentile class.
$P_{50} = L_0 + \frac{c}{f}\left(\frac{N}{100} \times 50 - p.cf\right) = 13.75 + \frac{0.25}{25} \times (50.5 - 40) = 13.855$
This means that 50% of the 101 observations, that is 50% × 101 = 50.5 ≈ 51 observations, have a linear density of 13.855 or less.
Compute $P_{75}$ yourself.
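The grouped-data calculation can also be reproduced with a small Python sketch. The function name `grouped_percentile` and the representation of each class as a `(lower, upper, frequency)` tuple are assumptions chosen for illustration. The percentile class is taken as the first class whose cumulative frequency reaches the target rank.

```python
def grouped_percentile(classes, j):
    """j-th percentile from classes given as (lower_limit, upper_limit, frequency)."""
    total = sum(freq for _, _, freq in classes)   # N
    target = total * j / 100                      # N/100 * j
    cum_before = 0                                # p.cf of the current class
    for lower, upper, freq in classes:
        if cum_before + freq >= target:           # first class whose cf reaches the target
            width = upper - lower                 # c
            return lower + (width / freq) * (target - cum_before)
        cum_before += freq
    raise ValueError("j must lie between 0 and 100")

linear_density = [
    (13.00, 13.25, 8), (13.25, 13.50, 12), (13.50, 13.75, 20),
    (13.75, 14.00, 25), (14.00, 14.25, 22), (14.25, 14.50, 10),
    (14.50, 14.75, 4),
]
p50 = grouped_percentile(linear_density, 50)
print(round(p50, 3))   # 13.855
```

Calling the same function with `j = 75` is one way to check a hand-computed $P_{75}$.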
2) Quartiles are three values that divide a data set into four approximately equal parts. So, there are three quartiles,
normally denoted as:
i. Q1 (1st quartile)
ii. Q2 (2nd quartile)
iii. Q3 (3rd quartile)
Note that,
$Q_1 = P_{25}$
$Q_2 = P_{50} = \text{Median}$
$Q_3 = P_{75}$
However, we can determine the $j$th quartile by the following formula:
$Q_j = \left(j \times \frac{n + 1}{4}\right)^{\text{th}} \text{ value}; \quad j = 1, 2, 3.$
Example: Find $Q_1$, $Q_2$, and $Q_3$ of the following data (CPU times of 11 jobs, in seconds):
59, 139, 46, 37, 42, 30, 55, 56, 36, 82, 29.
Solution: Here, $n = 11$.
Sorted data:
29, 30, 36, 37, 42, 46, 55, 56, 59, 82, 139.
So,
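$Q_1 = P_{25} = \left(1 \times \frac{11 + 1}{4}\right)^{\text{th}} \text{ value} = 3^{\text{rd}} \text{ value} = 36$
$Q_2 = P_{50} = \left(2 \times \frac{11 + 1}{4}\right)^{\text{th}} \text{ value} = 6^{\text{th}} \text{ value} = 46$
$Q_3 = P_{75} = \left(3 \times \frac{11 + 1}{4}\right)^{\text{th}} \text{ value} = 9^{\text{th}} \text{ value} = 59$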
Inter-quartile Range:
The inter-quartile range is the difference between the third quartile (Q3 or 75th percentile) and the first
quartile (Q1 or 25th percentile). That is,
$IQR = Q_3 - Q_1$.
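For the CPU-time data of the quartile example, the quartiles and the inter-quartile range can be computed with the following Python sketch. The helper name `quartile` is hypothetical, and clamping positions outside 1..n to the extreme values is an assumption that the example does not need.

```python
import math

def quartile(data, j):
    """j-th quartile (j = 1, 2, 3) as the j*(n+1)/4-th ordered value."""
    ordered = sorted(data)
    n = len(ordered)
    pos = j * (n + 1) / 4        # 1-based position; may be fractional
    pos = min(max(pos, 1), n)    # assumption: clamp to the data range
    k = math.floor(pos)
    frac = pos - k
    if k == n:
        return ordered[-1]
    # Interpolate between the k-th and (k+1)-th ordered values.
    return ordered[k - 1] + frac * (ordered[k] - ordered[k - 1])

cpu_times = [59, 139, 46, 37, 42, 30, 55, 56, 36, 82, 29]
q1, q2, q3 = (quartile(cpu_times, j) for j in (1, 2, 3))
iqr = q3 - q1
print(q1, q2, q3, iqr)   # 36.0 46.0 59.0 23.0
```

For these data the inter-quartile range is 59 − 36 = 23 seconds, so the middle half of the jobs spans 23 seconds of CPU time.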
Question:
• In a class there are 40 students. Their mid-term marks were collected, and $P_{20}$ of the marks was found to be
23. What does $P_{20} = 23$ mean?
• Approximately how many students got at most 23 (23 or less)?
$P_{20} = 23$ means that 20% of the students (out of 40) got 23 or less.
So, the number of students who got 23 or less is approximately 40 × 20% = 8.
Parallel/comparative boxplots
Boxplots are often used to compare different populations or parts of the same population. For such a
comparison, samples of data are collected from each part, and their boxplots are drawn on the same scale
next to each other.
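A comparison of this kind can be prepared numerically before drawing. The sketch below assumes the usual construction in which each box plot is built from the minimum, $Q_1$, median, $Q_3$, and maximum (some conventions draw outliers separately). The two samples and all names in the code are hypothetical and are not taken from this handout.

```python
import math

def ordered_value(ordered, pos):
    """Value at a 1-based, possibly fractional, position in sorted data."""
    n = len(ordered)
    pos = min(max(pos, 1), n)    # assumption: clamp to the data range
    k = math.floor(pos)
    if k == n:
        return ordered[-1]
    return ordered[k - 1] + (pos - k) * (ordered[k] - ordered[k - 1])

def five_number_summary(data):
    """Minimum, Q1, median, Q3, maximum using the j*(n+1)/4 position rule."""
    ordered = sorted(data)
    n = len(ordered)
    q1, q2, q3 = (ordered_value(ordered, j * (n + 1) / 4) for j in (1, 2, 3))
    return ordered[0], q1, q2, q3, ordered[-1]

# Hypothetical samples from two parts of a population.
samples = {
    "sample X": [12, 15, 11, 14, 13, 16, 12],
    "sample Y": [18, 14, 21, 16, 25, 19, 15],
}
summaries = {name: five_number_summary(values) for name, values in samples.items()}
for name, (low, q1, med, q3, high) in summaries.items():
    print(f"{name}: min={low}, Q1={q1}, median={med}, Q3={q3}, max={high}, IQR={q3 - q1}")
```

Placing the two summaries on the same scale shows at a glance which sample sits higher (compare the medians) and which is more spread out (compare the IQRs and the ranges).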
An example: Tensile Strength [D. C. Montgomery, Applied Statistics and Probability for Engineers, page 353]
A manufacturer of paper is interested in improving the product’s tensile strength. Product engineering
believes that tensile strength is a function of the hardwood concentration in the pulp and that the range of
hardwood concentrations of practical interest is between 5 and 20%. A team of engineers responsible for
the study decides to investigate four levels of hardwood concentration: 5%, 10%, 15%, and 20%. They
decide to make up six test specimens at each concentration level by using a pilot plant. All 24 specimens
are tested on a laboratory tensile tester in random order. The data from this experiment are shown in Table
13.1.
It is important to graphically analyze the data from a designed experiment.
Figure 13.1 presents box plots of tensile strength at the four hardwood
concentration levels. This figure indicates that changing the hardwood
concentration has an effect on tensile strength; specifically, higher
hardwood concentrations produce higher observed tensile strength.
Furthermore, the distribution of tensile strength at a particular hardwood
level is reasonably symmetric, and the variability in tensile strength does
not change dramatically as the hardwood concentration changes.
Graphical interpretation of the data is always useful. Box plots show the
variability of the observations within a treatment (factor level) and the
variability between treatments.
Exercises
1) Find the third quartile and 60th percentile for the following data of daily wages of the temporary
workers and comment on them.
Daily wages (Rs) 40–49 50–59 60–69 70–79 80–89
No. of workers 15 20 30 45 25
2) Calculate the third decile, second percentile, and 70th percentile for the following data.
Fabric strength 45–49 50–54 55–59 60–64 65–69 70–74
No. of samples 5 10 15 20 10 5
3) Calculate the first quartile and the 70th percentile for the following data and comment on them.
CV% 3.0–3.5 3.5–4.0 4.0–4.5 4.5–5.0 5.0–5.5
No. of tests 4 10 25 16 10
4) Compute the median and the 38th percentile for the following data:
Weight in kg 10–19 20–29 30–39 40–49 50–59 60–69
No. of goods 6 10 16 14 8 4
5) Compute the first quartile, seventh decile, and 38th percentile for the following data of U% of a certain
yarn:
2.25 4.25 2.65 3.85 5.82 3.42 4.44 2.86 2.88 4.46
6) A textile company weaves a certain fabric on a large number of looms. The managers would like the
looms to be homogeneous so that their fabric is of uniform strength. It is suspected that there may be
significant variation in strength among looms. Consider the following data for 2 randomly selected
looms. Each observation is a determination of strength of the fabric in pounds per square inch.
Loom A  99 97 97 96 97 96 92 98
Loom B  94 95 90 92 93 94 90 92
i) Construct a parallel/comparative box-plot of fabric strength of loom A and B.
ii) Which loom produces better fabric in terms of strength? Which loom has lower variability?
Answer from the box-plot.