Chapter 3 Zica


CHAPTER 3

STATISTICS

3.1 Introduction

This Chapter provides the students with a general awareness and understanding of
the collection and presentation of numerical information, including frequency
distributions. At the end of the Chapter the student will have a basic ability in the
analysis and interpretation of statistical data.

3.2 Sources of Statistics

In many applications of statistics, businesses use internal data, that is, data arising
from bookkeeping practices, standard operating business procedures, or planned
experiments by research divisions within the company. Examples are profit and
loss statements, employee salary information, production data and economic
forecasts. Data sourced from outside the firm is called external data. External
data may be of two types: primary data and secondary data. Primary data are
obtained from the organization that originally collected them. An example is
the population data collected by and made available from the Central Statistical
Office (CSO) Zambia. Secondary data come from a source other than the one that
originally collected them. Users of secondary data may not understand the
background as clearly as the original investigator, and so may be
unaware of the limitations of the data at hand.

There are many excellent sources of published (Primary and secondary) data
compiled by the state, by business and economic associations, and by commercial
sources (periodicals). Some examples are:

CSO Journal, Bank of Zambia Journal, A – Z Business Journal etc.

3.3 Descriptive Statistics

When a survey or an experiment has produced a body of data, the data in their original
state will not generally convey much information about the characteristics of
interest. Typically, there will be too many observations to give an insight into the
nature of the data. It is necessary to organize and reduce the data into such
meaningful forms as graphs and charts or such numerical quantities as averages,
totals and percentages. The resulting statistical summaries of the data can be used
as a framework for data analysis and interpretation.

There are basically two methods of describing data: the graphical method and
the numerical method. This Chapter focuses on both of these methods.
Population

We use the word population to describe all possible measurements of the particular
characteristic under consideration. A population can be finite (small or large) or
infinite (in the sense that it is practically impossible to count its size). Examples are
the number of students in a class (small), the yearly output of a certain
type of soft drink (large), and the number of particles of sand in the world (infinite).

Sample

A sample is a part of a population in which the population characteristic is studied
so that inferences may be made from the sample about the entire
population.

Frequency

In any population two or more members may have the same value. For example,
the height (to the nearest cm) of several members of a school may be the same.
The number of members with the same value is known as the frequency and is
generally denoted by f.

3.3.1 Frequency Distributions

Any data not arranged in a given order is called raw data; once arranged in order, it
is called an array of data.

Example 1

The following data record the number of under-age children working in a
certain company

1 1 3 2 0 8 8 6 7 7 8
6 8 8 1 1 0 0 2 9 4 4

Construct an array and also a frequency distribution.

The data array, with the data arranged in increasing order, is as follows:

0 0 0 1 1 1 1 2 2 3 4 4 6 6 7 7 8 8 8 8 8 9

To construct a frequency distribution start with a tally chart.

Tally Chart

Data Value    Tally Marks    Total
0             III            3
1             IIII           4
2             II             2
3             I              1
4             II             2
6             II             2
7             II             2
8             IIIII          5
9             I              1

Frequency Distribution Table

No. of Children Under Age Working    Frequency (f)
0                                    3
1                                    4
2                                    2
3                                    1
4                                    2
6                                    2
7                                    2
8                                    5
9                                    1
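
The array and the frequency distribution of Example 1 can also be produced programmatically. The following is a minimal Python sketch, not part of the original manual; the variable names are illustrative, and `Counter` from the standard library stands in for the tally chart.

```python
from collections import Counter

# Raw data from Example 1
raw_data = [1, 1, 3, 2, 0, 8, 8, 6, 7, 7, 8,
            6, 8, 8, 1, 1, 0, 0, 2, 9, 4, 4]

# The array is simply the raw data arranged in increasing order
data_array = sorted(raw_data)

# Counter plays the role of the tally chart: value -> frequency (f)
frequency = Counter(data_array)

print("Value  Tally  f")
for value in sorted(frequency):
    f = frequency[value]
    print(f"{value:<6} {'I' * f:<6} {f}")
```

Each printed row corresponds to one row of the tally chart, and the frequencies add up to the 22 observations in the raw data.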

When the number of distinct data values in a set of raw data is large (20 or more,
say), a simple frequency distribution is not appropriate, since there will be too
much information, not easily assimilated. In this type of situation, a grouped
frequency distribution is used. An example of a grouped distribution is given
below.

Salary Scale (K'000,000)    No. of Workers
5 and under 10              5
10 and under 15             6
15 and under 20             8
20 and under 25             3

Table 3.1 Grouped Frequency Distribution for Salary Scale.

3.3.2 Cumulative Frequency Distributions

A cumulative frequency distribution describes the number of items that have
values either above or below a particular level. Cumulative frequency
distributions come in two different forms:

i) “less than” distributions


ii) “more than” distributions.

Example 2

From Table 3.1, construct

i) “less than” distribution,


ii) “more than” distribution

i)
Salary scale (K'000,000)   No. of workers   Salary scale (K'000,000)   No. of workers
5 and under 10             5                Less than 10               5
10 and under 15            6                Less than 15               11
15 and under 20            8                Less than 20               19
20 and under 25            3                Less than 25               22

Here, a set of item values is listed (normally the class "upper
boundaries"), with each one showing the number of items in the
distribution having values less than this item value.

ii) Here, a set of item values is listed (normally the class "lower boundaries")
with each one showing the number of items in the distribution having
values equal to or greater than this item value. See the table below.

Salary scale (K'000,000)   No. of workers   Salary scale (K'000,000)   No. of workers
5 and under 10             5                5 or more                  22
10 and under 15            6                10 or more                 17
15 and under 20            8                15 or more                 11
20 and under 25            3                20 or more                 3
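
Both cumulative distributions are running totals of the class frequencies in Table 3.1. The sketch below is an illustration only (the variable names are assumptions, not part of the manual): the "less than" totals are accumulated from the first class and paired with the class upper boundaries, while the "more than" totals are accumulated from the last class and paired with the class lower boundaries.

```python
from itertools import accumulate

# Table 3.1: class boundaries (K'000,000) and number of workers
lower_bounds = [5, 10, 15, 20]
upper_bounds = [10, 15, 20, 25]
frequencies = [5, 6, 8, 3]

# "Less than": running total from the first class onwards
less_than = list(accumulate(frequencies))

# "More than": running total from the last class backwards
more_than = list(accumulate(reversed(frequencies)))[::-1]

for ub, cf in zip(upper_bounds, less_than):
    print(f"less than {ub}: {cf}")
for lb, cf in zip(lower_bounds, more_than):
    print(f"{lb} or more: {cf}")
```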

3.3.3 Results Presentation

One of the most effective ways of presenting information, particularly numerical
information, is to construct a chart or a graph. The choice depends on the type of
data. A set of data is discrete if we only need to make a count, like the number of
customers entering a shop. A set of data is continuous if measurement is made on
a continuous scale, such as time, weight etc.

For discrete data, we use bar charts and pie charts, while for continuous data we
use a histogram.

However, a disadvantage of graphs is that values may not be read
accurately. Graphs are not meant to show quantitative details as tables do;
they are meant to show effects.

Example 3

The following information shows the total turnover of Mukulumpe plc, analyzed
by geographical segment.

Sales              20x3         20x4         20x5
                   K'Billion    K'Billion    K'Billion
West Africa        82           78           65
East Africa        41           31.2         22
Southern Africa    20.5         18           17
Central Africa     61.5         4.2          4.5
Total              205          131.4        108.5

Show this as a bar chart.

In a simple bar chart, the numbers observed (counts), whether by 'geographical
segment', 'year' or some other category, can be represented as vertical bars.
The height of each bar is drawn in proportion to the number (amount) against a
vertical ruler scale. Figure 3.1 shows the sales of Mukulumpe Plc.

[Figure: simple bar chart of sales (K'billion, scale 0 to 200) by year, 20x3 to 20x5.]

Figure 3.1 Sales of Mukulumpe Plc.

Component Bar Charts

These are used to show the breakdown of a total into components. The bars of the
simple bar chart are subdivided to show component parts. There are two kinds of
component bar charts.

i) Component bar chart (actuals).

In these charts the overall heights of the bars and the individual
component heights represent actual figures.

ii) Percentage component bar chart

In these charts the individual component lengths represent the percentage
each component forms of the overall bar total. Note that the series of such
bars will all be the same total height, i.e. 100 per cent.

Example 4

Construct

i) a component bar chart,

ii) a percentage component bar chart,

for the data in Example 3.

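i) Component Bar Chart

[Figure: component bar chart of sales (K'billion) by year, 20x3 to 20x5, with each bar subdivided into West Africa, East Africa, Southern Africa and Central Africa.]

ii) Percentage Component Bar Chart

[Figure: percentage component bar chart of sales (%) by year, 20x3 to 20x5, with each bar subdivided into West Africa, East Africa, Southern Africa and Central Africa.]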

Multiple Bar Charts

These are similar to component bar charts but here the components are shown side by
side. As this does not give an immediate impression of the change in totals, they should
be used where we want to demonstrate the change in size of the components.

Example 5

Show the above data as a multiple bar chart.

[Figure: multiple bar chart of sales (K'billion) by year, 20x3 to 20x5, with separate side-by-side bars for West Africa, East Africa, Southern Africa and Central Africa.]

Histogram

A histogram is a form of bar chart. It is appropriate where there is a need to show grouped data which is
continuous. There are no gaps between the bars. The area of each bar represents the
frequency of the class.

Example 6

The marks obtained by students in a NATech paper were as follows:

Percentage (%)    No. of Students
25 – 29           10
30 – 34           15
35 – 39           12
40 – 44           20
45 – 49           3

Show this as a histogram.

[Figure: histogram of number of students against marks for the classes 25 – 29, 30 – 34, 35 – 39, 40 – 44 and 45 – 49.]

Pie Charts

A pie chart is a circle or 'pie', divided radially into sectors which represent component
parts of the total. The 360° at the center of the circle are divided in proportion to the data,
thus giving sectors with areas proportional to the values of the component parts.

Pie charts can be used to show changes in components where the number of components
is too great for a bar chart, though a pie chart with more than seven or eight components
would become too crowded for ready interpretation.

Example 7

For the data in Example 3, construct a pie chart for the year 20x3.

[Figure: pie chart with sectors for West Africa, East Africa, Southern Africa and Central Africa.]

Figure 3.2. A Pie Chart showing turnover of Mukulumpe Plc.

Calculations

West Africa: $\frac{82}{82 + 41 + 20.5 + 61.5} \times 360^\circ = 144^\circ$

East Africa: $\frac{41}{82 + 41 + 20.5 + 61.5} \times 360^\circ = 72^\circ$

Southern Africa: $\frac{20.5}{82 + 41 + 20.5 + 61.5} \times 360^\circ = 36^\circ$

Central Africa: $\frac{61.5}{82 + 41 + 20.5 + 61.5} \times 360^\circ = 108^\circ$
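
The sector angles can be checked with a few lines of Python. This is a minimal sketch using the 20x3 figures from Example 3; the dictionary name `sales_20x3` is an illustrative choice.

```python
# 20x3 turnover by segment (K'billion), from Example 3
sales_20x3 = {
    "West Africa": 82,
    "East Africa": 41,
    "Southern Africa": 20.5,
    "Central Africa": 61.5,
}

total = sum(sales_20x3.values())

# Each sector takes the same share of 360 degrees as its share of the total
angles = {region: value / total * 360 for region, value in sales_20x3.items()}

for region, angle in angles.items():
    print(f"{region}: {angle:.0f} degrees")
```

The four angles add up to 360 degrees, which is a quick check that no segment has been left out.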

Exercise 1

1. Obtain a number of charts and graphs used to describe quantitative data. This is
the data produced by ordinal, interval or ratio scales. Sources include, for
example, newspaper cuttings, magazines or textbooks. Classify each as being
discrete or continuous data and state reasons why you consider them to be
informative or misleading.

2. The data below give the scores obtained in an aptitude test by a group of 40
applicants for a particular post in a company

8 9 9 10 11 9 10 8 9 11
12 9 12 6 8 9 8 10 9 8
12 8 9 11 9 12 7 11 9 8
9 8 10 9 8 10 9 8 9 10

Construct a frequency distribution from this information.

3. A survey of 55 retail outlets in the Kitwe area gave the following distribution of
mango prices.

Price (Kwacha/g) 250 100 185


Number of stores 2 23 6

Construct a bar chart for the given distribution.

4. From the sales ledger of a small company, the ages of a sample of 100 debts are shown
in the distribution below. Construct a histogram of this distribution.

Age of debt(days) 1-10 11-20 21-30 31-40 41-50 51-60


No. of accounts 24 28 22 16 6 4

5. The age distribution of a random sample of 500 people in Ndola is shown below.
Construct a histogram from this table.

Age (Years) Under 2 2- 5- 10- 30-


Number of people 98 107 170 75 50

6. Draw:

i) ‘less than’ distribution


ii) ‘greater than’ distribution

given the distribution of bonus payments made to 150 employees in a company
shown below.

Monthly bonus (Kwacha)    0-    10-    20-    30-    40-    50-
No. of employees          6     44     36     30     8      6

7. Draw a pie chart to illustrate the expenditure of a large company on a number of


advertising methods.

Method of Advertising Radio Newspaper Competitions


Others
Expenditure during 50 20
2003 (X ‘K1 000 000)

8. Use a bar chart to illustrate the number of workers employed in four factories as
tabulated below.

Factory A B C D
No. of employees 130 310 260 160

9. Draw a component bar chart of the data given below, for factories, X, Y, Z and
W.

No. of Employees
X Y Z W
Unskilled 30 40 50 40
Semi-skilled 50 110 100 110
Skilled 70 180 130 30

10. Draw a multiple bar chart to illustrate the performances of three
companies over a four year period.

Output (X K1,000,000)

              2000    2001    2002    2003
Company X     400     380     365     350
Company Y     285     340     355     340
Company Z     180     200     220     230

3.4 Measure of Central Tendency

This Section describes the most commonly used averages, the arithmetic mean,
median and mode.

3.4.1 Measure of central tendency for ungrouped data

The mean is the most used measure of location, with the median and the
mode being used for specific (special case) applications. The arithmetic
mean is the name given to the ‘simple average’ that most people calculate.

$$\text{Arithmetic mean} = \frac{\text{Total value of items}}{\text{Total number of items}}$$

It is easy to understand and a very effective way of communicating an
answer. It does not apply to categorical data and its interpretation can be
difficult when used with ordinal data, but it is often justified for practical
reasons. Mathematically it is very useful for further calculations. All the
data are included in its calculation. Its disadvantage is that it is easily
affected by very high or very low values and cannot be measured or
checked graphically. Furthermore, it may not correspond to any actual
value in the distribution itself.

Example 8

Consider the following prices of a packet of milk from 12 different retail outlets.

K280 K275 K290 K310


K185 K195 K200 K225
K175 K200 K190 K195

What is the mean price?

The mean of a set of values is their total divided by the number of items. In our example,
the mean is

$$\frac{280 + 275 + 290 + 310 + 185 + 195 + 200 + 225 + 175 + 200 + 190 + 195}{12} = \frac{2720}{12} \approx 226.67$$

We usually employ the symbol $\bar{x}$ (pronounced 'x bar') to represent the mean of a
sample. A general formula for the mean of a sample of n items is therefore

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$

The short form is

$$\bar{x} = \frac{\sum x}{n}$$

where $\sum$ is the Greek symbol for capital "S", standing for sum, and
$\sum x$ is simply translated as "add up all the values of x under consideration".

The mode is the value which appears more times than any other value in a given set. It is
quoted as a typical value of the variable. The mode can be of great assistance in
manufacturing and production, for example in the production of shoes, clothes, cars, etc. It is
not affected by very low or very high values and it is an actual value of the distribution.
However, it is not clearly defined when no two items have the same value, or when two or
more items share the same highest frequency.

Example 9

In Example 8, find the mode of the given data set.

The mode by definition, is the most ‘common’ number – the value which occurs most
often in the data set. There are two numbers which appear more times than any other
numbers hence, there are two modes K200 and K195.

Note that a distribution can have one mode, two modes (bimodal), three modes etc. The
mode is used to describe the size of shoes, clothes or the most popular make of a car,
television etc.
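
The bimodal result of Example 9 can be confirmed with Python's standard `statistics` module. This is a small sketch rather than part of the manual; `multimode` (available from Python 3.8) returns every value that shares the highest frequency.

```python
from statistics import multimode

# Prices (Kwacha) of a packet of milk from the 12 outlets in Example 8
prices = [280, 275, 290, 310,
          185, 195, 200, 225,
          175, 200, 190, 195]

# multimode returns all values tied for the highest frequency
modes = multimode(prices)
print(sorted(modes))
```

A list with two entries indicates a bimodal distribution; a list as long as the data itself would mean that no value repeats, so the mode is not clearly defined.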

The median

The median is the middle value of a data set arranged in order of size. It is not as
widely used as the mean or mode, but has particular applications.
An example is the IQ scale, with its average figure of 100. Also, in the real world
we must often deal with data, like salary distributions, where a relatively small number of
extreme values can distort the arithmetic mean; in such cases the median gives a more
typical value. It is easily obtained and not affected by high or low values. However, if the number of items
is small or the items are not evenly spread, the median loses a lot of its significance.

Example 10

Calculate the median for the following data:

310, 290, 280, 275, 225, 195, 200, 200, 190, 185, 175.

Recall that there are n items of data in our sample. The position of the median is therefore
the $\frac{n + 1}{2}$th from the smallest (or largest) when n is odd. Placing our data in decreasing
order, we have

310, 290, 280, 275, 225, 200, 200, 195, 190, 185, 175.

The position of the median is $\frac{11 + 1}{2} = 6$, hence the median is 200.

Example 11

In Example 10, suppose the number 280 is dropped. Find the median of the new
data set. Arranging the data in increasing order, we have 175, 185, 190, 195,
200, 200, 225, 275, 290, 310.

Here n = 10 is an even number. Hence the position $\frac{n + 1}{2}$ is not a whole number, and
so the median is taken as the average of the two middle values. The median is
the $\frac{10 + 1}{2} = 5.5$th item from the largest, which is the average of the 5th and 6th
largest values.

$$\text{Median} = \frac{200 + 200}{2} = 200$$
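
Examples 10 and 11 can be reproduced with the standard `statistics.median` function, which sorts the data and averages the two middle values when n is even. The sketch below is illustrative; the helper `median_position` is a hypothetical name for the (n + 1)/2 rule used in the text.

```python
from statistics import median

# Data of Example 10 (11 items, so n is odd)
example_10 = [310, 290, 280, 275, 225, 195, 200, 200, 190, 185, 175]

# Example 11: the same data with the value 280 dropped (n is even)
example_11 = [x for x in example_10 if x != 280]

def median_position(n):
    """Position of the median in the ordered data, (n + 1) / 2."""
    return (n + 1) / 2

print(median_position(len(example_10)), median(example_10))
print(median_position(len(example_11)), median(example_11))
```

A fractional position such as 5.5 signals that the median is the average of the 5th and 6th ordered values.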

3.4.2 Measure Of Central Tendency For Grouped Data

For a grouped frequency distribution, the mean, mode, and median cannot be
determined exactly and so must be estimated. This will be illustrated in the
following example.

Example 12

Given the distribution of ages in a certain firm as shown in the table below:
calculate

i) mean
ii) median
iii) mode

Age Number of Employees


15 to 19 3
20 to 24 15
25 to 29 30
30 to 34 45
35 to 39 8

i) In a frequency distribution, the mean is $\bar{x} = \frac{\sum fx}{\sum f}$, where x is the middle point of the
class interval.

We construct the following table for calculation of the mean

Age       No. of Employees (f)    Mid-class point (x)    fx
15-19     3                       17                     51
20-24     15                      22                     330
25-29     30                      27                     810
30-34     45                      32                     1440
35-39     8                       37                     296
Totals    101                                            2927

Here $\sum fx = 2927$ and $\sum f = 101$.

Therefore the mean age is

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{2927}{101} = 28.98$$

ii) We use the median formula given by

$$\text{Median} = L_m + \left(\frac{0.5N - F_{m-1}}{f_m}\right) C_m$$

where

$L_m$ = lower boundary of the median class interval

$F_{m-1}$ = cumulative frequency of the class below the median class interval

$f_m$ = actual frequency in the median class interval

$C_m$ = median class width

In our ongoing example, we need a column of cumulative frequency (F)

Age (years)    f     F
15-19          3     3
20-24          15    18
25-29          30    48
30-34          45    93
35-39          8     101

Calculate 0.5N = 0.5(101) = 50.5. This gives us the position of the median.
Therefore the median class interval is 30 to 34. This interval contains the 50.5th
observation. The median can now be estimated using the formula.

$L_m = 30$; $F_{m-1} = 48$; $f_m = 45$; $C_m = 4$

Thus,

$$\text{Median} = L_m + \left(\frac{0.5N - F_{m-1}}{f_m}\right) C_m = 30 + \left(\frac{50.5 - 48}{45}\right)(4) = 30.2222$$

i.e. median = 30.22 years (two decimal places).
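
Parts i) and ii) of Example 12 can be checked with the short Python sketch below. The function names are illustrative and not from the manual. As in the worked example, the stated class limits are used, so the median class 30 to 34 has lower boundary 30 and width 4; this is an assumption carried over from the text rather than a general rule.

```python
# Example 12: (lower limit, upper limit, frequency) for each age class
classes = [(15, 19, 3), (20, 24, 15), (25, 29, 30), (30, 34, 45), (35, 39, 8)]

def grouped_mean(classes):
    """Sum of f*x over sum of f, where x is the class mid-point."""
    total_f = sum(f for _, _, f in classes)
    total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return total_fx / total_f

def grouped_quantile(classes, fraction=0.5):
    """L + (fraction*N - F_below) / f * C; fraction=0.5 gives the median."""
    n = sum(f for _, _, f in classes)
    target = fraction * n
    cumulative = 0  # cumulative frequency of the classes below the current one
    for lo, hi, f in classes:
        if f and cumulative + f >= target:
            return lo + (target - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("fraction must lie between 0 and 1")

print(round(grouped_mean(classes), 2))
print(round(grouped_quantile(classes, 0.5), 2))
```

Passing a fraction of 0.25 or 0.75 to the same function applies the formula to the lower and upper quartiles.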

iii) An estimate of the mode for a grouped frequency distribution can be obtained
using the formula

$$\text{Mode} = L + \left(\frac{b - a}{2b - a - c}\right) G$$

where: L = the lower boundary of the modal class interval
G = modal class interval width
a = frequency of the class immediately below the modal class interval
b = frequency of the modal class interval
c = frequency of the class immediately above the modal class interval.

The modal class interval is 30 to 34; it has the highest frequency of 45.

Therefore, L = 30, G = 4, a = 30, b = 45, c = 8.

Thus,

$$\text{Mode} = L + \left(\frac{b - a}{2b - a - c}\right) G = 30 + \left(\frac{45 - 30}{2(45) - 30 - 8}\right)(4) = 31.1538$$

i.e. mode = 31.15 (two decimal places)
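
The mode formula translates directly into a one-line function. The following is a minimal sketch with illustrative parameter names; the arguments are the values L, G, a, b and c of the worked example.

```python
def grouped_mode(lower, width, below, modal, above):
    """Mode = L + (b - a) / (2b - a - c) * G for a grouped distribution."""
    return lower + (modal - below) / (2 * modal - below - above) * width

# Example 12: L = 30, G = 4, a = 30, b = 45, c = 8
mode = grouped_mode(lower=30, width=4, below=30, modal=45, above=8)
print(round(mode, 2))
```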

Example 13

From Example 12, find

i) the median
ii) the mode, graphically

i) The median

A percentage cumulative frequency curve is drawn and the value of the
variable that corresponds to the 50% point (i.e. half way along the
distribution) is read off and gives the median estimate. The method is
shown in the worked example.

Step 1

Age (years) No. of Employees


15-19 3
20-24 15
25-29 30
30-34 45
35-39 8

Table 1. Number of Employees

Step 2

Upper boundary F F%
19 3 3.0
24 18 17.8
29 48 47.5
34 93 92.1
39 101 100

Table 2.0 Cumulative frequency of Employees

[Figure: percentage cumulative frequency curve, with percentage number of employees (0 to 100) plotted against age upper boundary (19, 24, 29, 34, 39). Reading across from the 50% point gives the median estimate = 30.]

Fig 1 Cumulative Frequency Curve (ogive) of Example 12.

The points to remember are:

i) Form a cumulative (percentage) frequency distribution.

ii) Draw up a cumulative frequency curve by plotting class upper boundary, against
cumulative percentage frequency and join the points with a smooth curve.

iii) Read off the 50% point to give the median.

ii) We construct three histogram bars, representing the class with the highest
frequency and the ones on either side of it. We then draw two lines as
shown in Figure 1.0. The mode is the value of x corresponding to the
intersection of the lines.

[Figure 1.0: three histogram bars with two lines drawn; the mode x lies below the intersection of the lines.]

The histogram bars in Figure 2.0 represent the following three classes and
frequencies.

25 to 29    30
30 to 34    45
35 to 39    8

[Figure 2.0: histogram of number of employees against age (years) for the three classes. Mode estimate = 31.]

Weighted Averages

Another common problem arises where the means of a number of groups need to be
combined to form a grand mean. For example, suppose a company has three outlets and
their average sales are as follows: X, K900 000 per sale from 25 sales; Y, K112 000 per
sale from 40 sales; and Z, K100 000 per sale from 30 sales. Find the average value per
sale overall.

$$\text{Weighted mean} = \frac{\sum wx}{\sum w}$$

where w is the weight assigned to each average, here the number of sales.

For the data given above

$$\text{mean} = \frac{29\,980\,000}{95} = 315\,578.95$$

i.e. average value of all sales = K315 578.95
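
The weighted mean above can be verified as follows. This is a small sketch, with the outlet figures taken from the example and the variable names chosen for illustration; the number of sales at each outlet acts as the weight.

```python
# Outlet -> (average value per sale in Kwacha, number of sales)
outlets = {"X": (900_000, 25), "Y": (112_000, 40), "Z": (100_000, 30)}

# Sum of w*x: total sales value across all outlets
total_value = sum(average * sales for average, sales in outlets.values())

# Sum of w: total number of sales
total_sales = sum(sales for _, sales in outlets.values())

weighted_mean = total_value / total_sales
print(total_value, total_sales, round(weighted_mean, 2))
```

Simply averaging the three outlet means would ignore the different numbers of sales and give a different, misleading figure.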


Relationship Between Measures

The relative position of the mean, median and mode will tell us something about the
distribution of the data, as shown in the figure below.

[Figure: three frequency curves, labelled Negative skew, Symmetrical and Positive skew, each marking the positions of the mode, median and mean.]

If the distribution is perfectly symmetric, all three measures will coincide.

Skewed Distribution

The three measures will now spread out:

Mode - corresponds to the highest point of the curve

Mean - affected by extreme values, lies down the tail of the distribution

Median - divides the area under the curve in two, lies between the mean
and the mode.

Roughly: mode = mean – 3 (mean – median)

Exercise 2

1. Find the arithmetic mean of the following data sets.

a) 560, 520, 540, 720, 650, 470, 680, 600

b) 8.8, 9.3, 9.8, 7.9, 10.2, 8.5

c) 12.9, 13.4, 13.8, 14.3, 16.9, 17.1, 13.8,

d) 6, 25, -8, 14, -22, 33

e) -3, -4, -10, -18, -9

2. Find the mean of the following frequency distributions.

a)
x 20.5 12.5 35.5
f 8 10 14

b)
x 2 3 4 5 6
f 5 6 12 30 32

c)
x 35-40 -45 -50 -55
f 8 20 25 34

d)
x 0-9 10-19 20-29 30-39
f 2 5 20 25

e)
x 20-30 -40 -50 -60 -70 -80 -90
f 4 60 75 12 15 10 3

3. The mean salaries of 150, 200 and 250 men employed by three different firms are
K300 000, K250 000 and K450 000 per month respectively. Calculate the mean
salary per month of all the men.

4. The maize yields in a particular region over the past 10 years are (millions of
tons): 2.3, 1.5, 1.2, 1.6, 1.7, 2.8, 1.4, 1.2, 1.3, 1.8.

Estimate:

i) The average
ii) The median
iii) The mode.

5. Using the graphical method, estimate:

i) The median
ii) The mode of the distribution given below.

x 0 1 2 3 4
f 25 28 6 3 3

6. The total price of units ordered from a warehouse of a certain commodity is
shown in the distribution below.

Cost of units ordered per day (Kwacha) No. of days


0 and under 50 3
50 and under 100 8
100 and under 150 9
150 and under 200 17
200 and under 250 10
250 and under 300 9

a) Compute the mean

b) Using both the graphical and formula method, estimate:


i) the median
ii) the mode
7. Estimate:

a) The arithmetic mean,


b) The median, and
c) The mode for both of the following frequency distributions.

i)
x 0-2 2-4 4-6 6-8 8-10
f 0-2 2-4 4-6 6-8 8-10

ii)
x 10-15 15-20 20-30 30-50 50-60
f 5 12 14 4 2

8. A survey of workers in a particular industrial sector produced the following table.

Income (Weekly ‘000’) Number


Under K100 180
K100 but under K150 235
K150 but under K200 210
K200 but under K250 150
K250 and over 100

Compute the mean, median and mode.

9. The numbers of new orders received by a company over the past 30 working days
were recorded as follows:

4 0 2 1 2 3
5 3 1 1 4 5
5 6 3 2 6 4
4 0 4 3 3 2
5 3 2 4 5 6

Determine the mean, median and mode.

10. Which measure of central tendency would most effectively describe each of the following?

a) The weight of a person?


b) The most popular make of television set?
c) Earnings of part time workers in Zambia?
d) Cost of typical food item at a market?
e) Holiday destinations?
f) Learning days lost through class boycotts?

3.5 Measure of Dispersion

Having obtained a measure of location or position of a distribution, we need to
know how the data is spread about that point. Information about the spread can
be given by one or more measures of dispersion.

The Range

This is the simplest measure of dispersion available in statistical analysis. It uses
only the two extreme values. The range is defined as the difference between the
maximum and minimum values of a given data set.

Its advantage lies in its simplicity and its independence of the measure of position.
However, it is distorted by extreme values and tells us nothing about the values between the
maximum and minimum.

Example 13

Find the range for the given data set.

1, 3, 4, 10.

The range is 10 – 1 = 9

The Quartile Deviation.

The median divides the area under the frequency curve in two. The quartiles
divide the area in four.

[Figure: frequency curve with the lower quartile QL, the median and the upper quartile QU marked on the horizontal axis.]

The position of the lower quartile $Q_L$ is given by $\frac{n + 1}{4}$. That of the upper quartile
$Q_U$ is given by $\frac{3}{4}(n + 1)$.

The interquartile range is the distance between the quartiles, $Q_U - Q_L$, i.e. the range of
the middle 50% of the distribution.

The quartile deviation or semi-interquartile range is half of the interquartile range.

$$QD = \frac{1}{2}(Q_U - Q_L)$$

The advantage of the quartiles is that they are easy to understand and are not affected by
extreme values. However, they do not cover the whole of the distribution. They give no
indication of how the items are dispersed between $Q_L$ and $Q_U$.

Example 14

Calculate the first and third quartiles for the following data set:

44, 76, 49, 52, 52, 48, 51.

We first arrange the data set in ascending order

44, 48, 49, 51, 52, 52, 76.

$Q_1$ is the value of the $\frac{7 + 1}{4}$th = 2nd item, which is 48.

$Q_3$ is the value of the $\frac{3}{4}(7 + 1)$th = 6th item, which is 52.

Notice that if there had been, say, more items in the set, the values of (n+1)/4 and
3(n+1)/4 might not have been whole numbers, which would have necessitated some sort
of interpolation formula to obtain the quartile values. This is beyond the scope of this manual.

Example 15

Compute the interquartile range and the quartile deviation in Example 14.

Interquartile range = $Q_3 - Q_1 = 52 - 48 = 4$

The quartile deviation = $\frac{Q_3 - Q_1}{2} = \frac{4}{2} = 2$.
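
Examples 14 and 15 can be reproduced with the position rule (n + 1)/4 and 3(n + 1)/4. The sketch below is illustrative only and, like the text, handles just the case where both positions are whole numbers; the function name is an assumption.

```python
def quartiles_by_position(values):
    """Return (Q1, Q3) using the positions (n + 1)/4 and 3(n + 1)/4."""
    ordered = sorted(values)
    n = len(ordered)
    lower_pos = (n + 1) / 4
    upper_pos = 3 * (n + 1) / 4
    if not (lower_pos.is_integer() and upper_pos.is_integer()):
        # Interpolation would be needed here; it is outside this sketch
        raise ValueError("quartile positions are not whole numbers")
    # Positions count from 1, list indices count from 0
    return ordered[int(lower_pos) - 1], ordered[int(upper_pos) - 1]

data = [44, 76, 49, 52, 52, 48, 51]
q1, q3 = quartiles_by_position(data)
interquartile_range = q3 - q1
quartile_deviation = interquartile_range / 2
print(q1, q3, interquartile_range, quartile_deviation)
```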

Example 16

Compute the median and quartile deviation for the following distribution.

x f

3200 – 4000 2
4 000 – 4800 3
4800 – 5600 4
5600 – 6400 8
6400 – 7200 3

x f F(Cumulative frequency)

3200 – 4000 2 2
4 000 – 4800 3 5
4800 – 5600 4 9
5600 – 6400 8 17
6400 – 7200 3 20

Using the formula

$$\text{Median} = L_m + \frac{(0.5N - F_{m-1})}{f_m} C_m$$

$0.5N = 0.5(20) = 10$

$L_m = 5600$, $F_{m-1} = 9$, $f_m = 8$, $C_m = 800$

$$\text{Median} = 5600 + \frac{(10 - 9)}{8}(800) = 5700$$

For $Q_1$:

Position of first quartile = $\frac{1}{4}N = \frac{1}{4}(20) = 5$

In the same formula as for the median, we replace $L_m$ by $L_{Q_1}$, $0.5N$ by $\frac{1}{4}N$,
$F_{m-1}$ by $F_{Q_1 - 1}$ and $f_m$ by $f_{Q_1}$. The 5th item falls in the class 4000 – 4800.
Therefore, we have

$L_{Q_1} = 4000$, $F_{Q_1 - 1} = 2$, $f_{Q_1} = 3$, $C_{Q_1} = 800$

$$Q_1 = L_{Q_1} + \frac{(0.25N - F_{Q_1 - 1})}{f_{Q_1}} C_{Q_1} = 4000 + \frac{(5 - 2)}{3}(800)$$

Hence $Q_1 = 4800$

For $Q_3$:

Position of third quartile = $\frac{3N}{4} = \frac{3}{4}(20) = 15$

$L_{Q_3} = 5600$, $F_{Q_3 - 1} = 9$, $f_{Q_3} = 8$, $C_{Q_3} = 800$

$$Q_3 = L_{Q_3} + \frac{(0.75N - F_{Q_3 - 1})}{f_{Q_3}} C_{Q_3} = 5600 + \frac{(15 - 9)}{8}(800)$$

Hence $Q_3 = 6200$.

Therefore, the quartile deviation

$$= \frac{1}{2}(Q_3 - Q_1) = \frac{1}{2}(6200 - 4800) = 700$$

3.5.1 The Mean Deviation

This measure is an average of the deviations of all items from the arithmetic mean.
In considering the deviation of an item from the mean, only the size of the figure is
important; the sign is not taken into account, i.e. the modulus is taken. If this is
not done, then the sum of the deviations, i.e. $\sum (x - \bar{x})$, will equal zero.

The following formulas are used depending on the kind of data set given.

$$\text{Mean deviation} = \frac{\sum |x - \bar{x}|}{n}$$ for ungrouped data

$$\text{Mean deviation} = \frac{\sum f|x - \bar{x}|}{\sum f}$$ for grouped data.

Example 17

A greengrocer owns 10 shops in various parts of a certain town. The distances from the
wholesale fruit and vegetables market are 8, 13, 15, 20, 27, 33, 46, 59 , 65 and 72
kilometers.

a) Find the mean deviation of kilometers from the mean

b) Find also the mean deviation from the median

a)
x     $|x - \bar{x}|$
8     27.8
13    22.8
15    20.8
20    15.8
27    8.8
33    2.8
46    10.2
59    23.2
65    29.2
72    36.2

$\sum x = 358$, $\sum |x - \bar{x}| = 197.6$

$\bar{x} = 35.8$

$$\text{Mean deviation} = \frac{\sum |x - \bar{x}|}{n} = \frac{197.6}{10} = 19.76$$

b) Since n is even, the median is given by the average of the two middle
values.

$$\text{Median} = \frac{27 + 33}{2} = 30$$

x     $|x - \text{median}|$
8     22
13    17
15    15
20    10
27    3
33    3
46    16
59    29
65    35
72    42

$\sum x = 358$, $\sum |x - \text{median}| = 192$

$$\text{Mean deviation from the median} = \frac{192}{10} = 19.2$$
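
Both parts of Example 17 follow the same pattern: average the absolute deviations from a chosen centre. The sketch below is an illustration with assumed names; it applies the ungrouped mean deviation formula to the ten distances, first about the mean and then about the median.

```python
from statistics import mean, median

# Distances (km) of the 10 shops from the wholesale market, Example 17
distances = [8, 13, 15, 20, 27, 33, 46, 59, 65, 72]

def mean_absolute_deviation(values, centre):
    """Average of |x - centre| over all items; the sign is ignored."""
    return sum(abs(x - centre) for x in values) / len(values)

about_mean = mean_absolute_deviation(distances, mean(distances))
about_median = mean_absolute_deviation(distances, median(distances))

print(round(about_mean, 2), round(about_median, 2))
```

Without the absolute value, the deviations about the mean would cancel out and sum to zero, which is why the modulus is taken.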

Example 18

Given the following data, compute the mean deviation

Weekly Wage (K'000 000)    Number of Employees
31 and under 36            7
36 and under 41            9
41 and under 46            13
46 and under 51            19
51 and under 56            26

x       f     xf       $x - \bar{x}$    $f|x - \bar{x}|$
33.5    7     234.5    -13.24           92.68
38.5    9     346.5    -8.24            74.16
43.5    13    565.5    -3.24            42.12
48.5    19    921.5    1.76             33.44
53.5    26    1391     6.76             175.76
Total   74    3459                      418.16

$$\bar{x} = \frac{\sum xf}{\sum f} = \frac{3459}{74} = 46.74$$

$$\text{Mean deviation} = \frac{\sum f|x - \bar{x}|}{\sum f} = \frac{418.16}{74} = 5.65$$

3.5.2 The Standard Deviation

The standard deviation is the most widely used measure of dispersion, since it is
directly related to the mean. If you chose the mean as the most appropriate
measure of central location, then the standard deviation would be the natural
choice for a measure of dispersion.

The standard deviation measures the differences from the mean; a larger value
indicates large variation. The standard deviation is in the same units as the actual
observations. For example if the observations are in cm, even the standard
deviation will be in cm.

To calculate the standard deviation, we follow these steps.

1) Compute the mean $\bar{x}$

2) Calculate the differences from the mean $(x - \bar{x})$

3) Square the differences $(x - \bar{x})^2$

4) Sum the squared differences, i.e. $\sum (x - \bar{x})^2$

5) Take the average of the sum of the squared differences in (4) to find the
variance, i.e.

$$S^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$ for a sample and $$\sigma^2 = \frac{\sum (x - \bar{x})^2}{N}$$ for a population.

6) The square root of the variance gives the standard deviation

$$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$ for a sample and $$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{N}}$$ for a population.

Example 19

For the following sample of 7 observations, find the standard deviation.

4, 5, 10, 13, 9, 7 and 8

The calculations are shown in the table below.

x        $x - \bar{x}$    $(x - \bar{x})^2$
4        -4               16
5        -3               9
10       2                4
13       5                25
9        1                1
7        -1               1
8        0                0
Total 56 0                56

$\sum x = 56$, $n = 7$, therefore $\bar{x} = \frac{\sum x}{n} = \frac{56}{7} = 8$

$$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{56}{6}} = 3.055$$ (3 decimal places).

The weakness of the standard deviation lies in its calculation and interpretation, which are
more difficult than for other measures. Moreover, by squaring, it gives more than
proportional weight to extreme values.
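
The six steps for the sample standard deviation can be traced in Python using the data of Example 19. This is a minimal sketch with illustrative variable names; the final line compares the result with `statistics.stdev`, which also divides by n - 1.

```python
from math import sqrt
from statistics import mean, stdev

sample = [4, 5, 10, 13, 9, 7, 8]

x_bar = mean(sample)                      # step 1: the mean
differences = [x - x_bar for x in sample] # step 2: differences from the mean
squared = [d ** 2 for d in differences]   # step 3: squared differences
total = sum(squared)                      # step 4: sum of squared differences
variance = total / (len(sample) - 1)      # step 5: sample variance (n - 1)
s = sqrt(variance)                        # step 6: standard deviation

print(x_bar, total, round(s, 3))
print(abs(s - stdev(sample)) < 1e-12)
```

For a population rather than a sample, step 5 would divide by N instead of n - 1.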

Another use of the standard deviation considered in this manual is in measures of
relative standing.

1. Coefficient of Variation

The coefficient of variation expresses the standard deviation of a set of observations
as a percentage of the arithmetic mean.

$$C_v = \frac{S}{\bar{x}} \times 100$$

The higher the coefficient of variation, the more variability there is in the set of
observations.

2. Skewness

Skewness in a set of data relates to the shape of the histogram which could be
drawn from the data.

$$\text{Pearson coefficient of skewness } (sk) = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$$

Positively skewed if $sk > 0$
Negatively skewed if $sk < 0$
Symmetric distribution if $sk = 0$
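
Both measures are simple ratios of quantities already computed. The sketch below assumes the usual form of Pearson's coefficient, 3(mean - median) divided by the standard deviation, and uses the sample of Example 19; the second data set is hypothetical and included only to show a positive skew.

```python
from statistics import mean, median, stdev

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100

def pearson_skewness(values):
    """3 * (mean - median) / standard deviation."""
    return 3 * (mean(values) - median(values)) / stdev(values)

sample = [4, 5, 10, 13, 9, 7, 8]   # Example 19
right_skewed = [1, 2, 2, 3, 12]    # hypothetical data with one large value

print(round(coefficient_of_variation(sample), 2))
print(pearson_skewness(sample))
print(pearson_skewness(right_skewed) > 0)
```

For the sample of Example 19 the mean and median are both 8, so the coefficient of skewness is zero.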

Example 20

The distribution shown below is the output of the factories of Quality Clothing Plc, for
the month of July 2005.

Monthly Output (Men's Suits)    Number of Factories
25 and under 30                 15
30 and under 35                 30
35 and under 40                 30
40 and under 45                 20
45 and under 50                 10
50 and under 55                 15

Calculate the mean and standard deviation

Class Interval    f      x       xf       $x^2 f$
25 – 30           15     27.5    412.5    11343.75
30 – 35           30     32.5    975      31687.50
35 – 40           30     37.5    1125     42187.50
40 – 45           20     42.5    850      36125.00
45 – 50           10     47.5    475      22562.50
50 – 55           15     52.5    787.5    41343.75

$\sum f = 120$, $\sum xf = 4625$, $\sum x^2 f = 185250$

This is grouped data and we use the following formulas.

$$\text{mean} = \frac{\sum xf}{\sum f} = \frac{4625}{120} = 38.54$$ (two decimal places)

$$\text{Variance} = \frac{\sum x^2 f - \frac{(\sum xf)^2}{\sum f}}{\sum f - 1}$$

$$\text{Standard deviation} = \sqrt{\frac{\sum x^2 f - \frac{(\sum xf)^2}{\sum f}}{\sum f - 1}} = \sqrt{\frac{185250 - \frac{4625^2}{120}}{119}} = 7.67$$ (2 decimal places).

Exercise 3

1. Explain the meaning of standard deviation to someone who doesn’t know
anything about statistics.

2. The numbers of new orders received by a company over the past 30 working days
were recorded as follows:

4 0 2 1 2 3
5 3 1 1 4 5
5 6 3 2 6 4
4 0 4 3 3 2
5 3 2 4 5 6

Determine the range, quartile deviation and standard deviation.

3. For the following results of an I.Q. test, estimate:

a) The mean
b) The standard deviation
c) The interquartile range
d) The coefficient of variation.

Mark 65 85 90 95 99 100
No. of students 5 10 20 45 40 18
Mark 104 108 115 120 125
No. of students 20 19 15 8 3

4. Using the figures given below, calculate:

a) The range
b) The arithmetic mean
c) The median
d) The lower quartile
e) The upper quartile
f) The quartile deviation
g) Pearson’s coefficient of Skewness
h) The standard deviation

3 16 27 40 48 59
6 18 31 41 52 61
8 19 33 44 54 65
9 23 37 46 56 67
12

5. For the given frequency distribution, find

i) mean
ii) Mode
iv) Range
v) Standard deviation

x 3 4 5 6 7 8

f 1 3 4 8 5 6

6. The following data relates to the number of rooms per dwelling in Zambia for two
separate years.

Number of rooms    1    2    3    4    5    6    7    8 or more

Year 1 (%)         2    6   13   28   36   14    4    5
Year 2 (%)         3    5   10   24   31   24    6    5

For each year, calculate the mean, standard deviation and coefficient of variation.
Interpret the coefficient of variation based on the data at hand.

7. Explain the term ‘measure of dispersion’ and state briefly the advantages of using
the following measures of dispersion.

i) Range
ii) Quartile deviation
iii) Variance
iv) Standard Deviation

EXAMINATION QUESTIONS WITH ANSWERS

Multiple Choice Questions

1.1 What is the arithmetic mean of the following frequency distribution?

Interval 6.1 – 6.5 6.6 – 7.0 7.1 – 7.5 7.6 – 8.0 8.1 – 8.5
Frequency , f 3 16 32 20 9

A) 7.3 B) 7.4 C) 16 D) 32

(NATech, 1.2 Mathematics & Statistics, June 2003)

1.2 What is the quartile deviation of the following set of data?

3 6 8 9 10 12 16 18 19
23 27 20 32 35 40 42 44

A) 27 B) 22 C) 17 D) 11

(NATech, 1.2 Mathematics & Statistics, December 1998)

1.3 What is the variance of the following set of numbers? 4, 6, 8, 9, 13.

A) 2.4 B) 6.78 C) 9.2 D) 40


(NATech, 1.2 Mathematics & Statistics, June 2001)

1.4 A group of people have the following ages, 21, 32, 19, 24, 31, 27, 17, 21, 26 and
42. The median age of the group is

A) 31 years B) 21 years C) 25 years D) 26 years


(NATech, 1.2 Mathematics & Statistics, December 2004)

1.5 The number of books read by eleven members of the public last year were:

15, 30, 19, 32, 10, 7, 12, 20, 12, 24, 4

What is the quartile deviation of the number of books read?

A) 3 B) 8 C) 7 D) 6
(NATech, 1.2 Mathematics & Statistics, December 2003)

1.6 The mean wages of 50, 25 and 75 men employed by three different firms
are K40,000, K70,000 and K120,000 per week respectively. Calculate the mean wage
per week of all the men.

A) K76 000 B) K85 000 C) K31 000 D) K76 667


(NATech, 1.2 Mathematics & Statistics, Nov/Dec 2000)

1.7 What is the approximate mean value per order of the following distribution

Value (K’000) 100 150 200 250


No. of orders 165 190 105 92

A) K110 000 B) K161 000 C) K175 000 D) K180 000


(NATech, 1.2 Mathematics & Statistics, December 1999 (Rescheduled))

1.8 The number of books read by twelve members of the public last year were: 15,
30, 19, 32, 10, 7, 12, 20, 12, 24, 4 and 28.

What is the quartile deviation of the number of books read?

A) 3 B) 8 C) 7 D) 6
(NATech, 1.2 Mathematics & Statistics, June 2005)

1.9 A bar chart with three adjacent bars, then a gap and three more, and a further
three after a final gap is known as:

A) A simple bar chart. B) A component part bar chart

C) A multiple bar chart D) A percentage bar chart


(NATech, 1.2 Mathematics & Statistics, June 2005)

1.10 The eight accountants in the Standard Chartered Bank have the following years of
experience: 5, 8, 5, 19, 7 and 11. Find the median of these years of experience.

A) 8 B) 19 C) 9.5 D) 12.4

(NATech, 1.2 Mathematics & Statistics, December 2001)

SECTION B

QUESTION ONE

a) Find the first quartile $Q_1$, the second quartile $Q_2$, the third quartile $Q_3$ and
the quartile deviation QD of the following data.

18, 2, 5, 13, 4, 8, 11, 7

(NATech, 1.2 Mathematics & Statistics, June 2002)

b) A company trades in five distinct geographical markets. In the last financial year,
its turnover was:

Market               Turnover (K)

Congo DR             59.3
Congo Brazzaville    61.6
Tanzania             15.8
Kenya                10.3
Zambia                9.9
Total               156.9

Draw a pie chart using the above figures.

QUESTION TWO

a) An analyst is considering two categories of companies, X and Y, for possible
investment. One of her assistants has compiled the following information on the
price-earnings ratios of the shares of the companies in the two categories over the
past year.

Price-Earnings Ratio      Number of Category    Number of Category
                          X Companies           Y Companies
4.95 to under 8.95        3                     4
8.95 to under 12.95       5                     8
12.95 to under 16.95      7                     8
16.95 to under 20.95      6                     3
20.95 to under 24.95      3                     3
24.95 to under 28.95      1                     4

Required:

Compute the standard deviations of these two distributions.

b) A College receives the following number of complaints per week.

Complaints per week 0 1 2 3 4


Number of weeks 5 12 7 2 1

What is the median value?

c) After receiving complaints from trade union representatives concerning the
disparity between higher- and lower-paid workers in this company, the Personnel
Manager of the company asks for information on the current salary structure.

He is given the following data:

Basic Wage (K’000) Number of Employees


under 100 3
100 to under 200 6
200 to under 300 11
300 to under 400 15
400 to under 500 12
500 to under 600 7
over 800 6

Required:

Calculate a statistical measure of mean deviation using the data given above.
(NATech, 1.2 Mathematics & Statistics, June 2005)

QUESTION THREE

An analysis of access times to a computer disc system was made during the running of a
particular computer program, which utilized disc file handling facilities. The results for
the 140 access times were as follows:

Access time in milliseconds    Frequency


30 and less than 35 22
35 and less than 40 27
40 and less than 45 21
45 and less than 50 31
50 and less than 55 21
55 and less than 60 18

i) Determine the mean access time for this program


ii) Determine the standard deviation of the access time for this program
iii) Interpret for your superior, who is not familiar with grouped data, what the results
in parts (i) and (ii) mean.
(NATech, 1.2 Mathematics & Statistics, December 2001)

QUESTION FOUR

a) The times, measured to the nearest second, taken by 30 students to complete an
algebraic problem are given below.

47 53 46 68 72
48 41 49 58 45
43 45 48 44 43
61 43 46 48 57
54 63 42 65 44
51 38 46 42 47

i) Group these times into a frequency table using eight equal class intervals,
the first of which contains measured times in the range 35 to 39 seconds.

ii) Which is the modal class of your distribution?


(NATech, 1.2/B1 Mathematics & Statistics, December, 1999(Rescheduled))

QUESTION FIVE

a) During the 1999/2000 session a college ran 70 different classes, of which 44 were
‘English’, with a mean class size of 15.2, and 26 were ‘History’, with a mean class
size of 19.2. The frequency distributions of class size were as follows:

Size of Class        No. of English    No. of History
(No. of students)    Classes           Classes
1-6                  4                 0
7-12                 15                3
13-18                11                10
19-24                8                 8
25-30                5                 4
31-36                1                 1

No student belonged to more than one class.

i) Calculate the mean class size of the college.

Suppose now that no class of 12 students or fewer had been allowed to run.
Calculate what the mean class size for the college would have been if the
students in such classes:

ii) had been transferred to the other classes.

iii) had not been admitted to the college.


(NATech, 1.2 Mathematics & Statistics, Nov/Dec 2000)

QUESTION SIX

a) Consider the grouped frequency distribution below.

At Least Value less than Frequency


0 10 0
10 20 50
20 30 150
30 40 100

You are required to estimate the mode graphically.

b) The Director of a large company has decided to analyse the annual salaries that
are paid to staff. The frequency distribution of salaries that are currently being
paid is as follows:

Salary                Number of Staff
(million kwacha)
Under 10              16
10 to under 20        30
20 to under 30        34
30 to under 40        22
40 to under 50        10
50 to under 70         5
70 to under 90         3

Records from five years ago include the following statistics about salaries that
were paid.

Then:

Mean salary = K18.95m        Standard deviation = K106m
Median salary = K17.0m       Quartile deviation = K6.2m

You are required to help with the analysis by

i) Calculating the mean and standard deviation of current salaries.

ii) Interpreting the statistics that you have calculated.

(NATech, 1.2 Mathematics & Statistics, December 2003)

QUESTION SEVEN

a) The following is a frequency distribution of I.Qs of 100 children at a primary
school.

IQ Number of Children
50 - 59 1
60 – 69 2
70 – 79 8
80 – 89 18
90 – 99 23
100 - 109 21
110 – 119 15
120 – 129 9
130 - 139 3

Calculate:

i) The mean deviation

ii) The standard deviation of the IQ scores

b) Compare and comment on the values obtained for the two measures of dispersion
in (i) and (ii) above.
(NATech, 1.2 Mathematics & Statistics, December 2002)

c) The data in the following Table relates to the number of successful sales made
by the salesmen employed by a large microcomputer firm in a particular quarter.

No. of sales 0-4 5-9 10-14 15 – 19 20 – 24 25 - 29


No. of salesmen 1 14 23 21 15 6

Calculate:

i) The mean, and

ii) The standard deviation, of the number of sales.

(NATech, 1.2 Mathematics & Statistics, June 2003)

QUESTION EIGHT

a) Given the following data

Value Number of Orders


100 000 165
150 000 190
200 000 105
250 000 92

i) Find the mean, and


ii) The modal value per order

(NATech, 1.2 Mathematics & Statistics, December 1998)

b) The number of goals scored per game by a football player during 1997 – 1998
were as follows:

No. of goals, x 0 1 2 3 4 or more


No. of games, f 23 14 3 2 0

Calculate

i) The mean
ii) Variance, and
iii) Standard deviation of the number of goals per game.
(NATech, 1.2 Mathematics & Statistics, June 2001)

c) A sample of estimates of weekly sales for Product A is represented in the weekly
sales distribution below.

Weekly Sales (K’000) Number of Weeks


4000 – 5000        3
5000 – 6000        7
6000 – 7000        2
7000 – 8000        4
8000 – 9000        6
9000 – 10000      10
10000 – 11000      8
11000 – 12000      4
12000 – 13000      0
above 13000        8

Calculate:

i) Arithmetic mean
ii) Modal sales
iii) Standard Deviation
iv) Coefficient of Skewness and comment on the distribution
(NATech, 1.2 Mathematics & Statistics, December 2004)

