Presentationofdata PDF
1.1 INTRODUCTION
Once data has been collected, it has to be classified and organised in such
a way that it becomes easily readable and interpretable, that is, converted to
information. Before the calculation of descriptive statistics, it is sometimes a good
idea to present data as tables, charts, diagrams or graphs. Most people find
‘pictures’ much more helpful than ‘numbers’ in the sense that, in their opinion,
they present data more meaningfully.
1.2.1 Arrays
1. Minimum observation
2. Maximum observation
3. Number of observations, n
4. Mode
5. Median, if n is odd
Example
2 7 8 11 15
16 18 19 19 19
23 23 24 26 27
29 33 40 44 47
49 51 54 63 68
Table 1.2.1
1. Minimum = 2
2. Maximum = 68
3. Number of observations = 25
4. Mode = 19
5. Median = 24
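As a quick check, the five quantities read off the array can be recomputed from the data in Table 1.2.1. The following is only a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the original notes.

```python
from statistics import median, mode

# Observations from Table 1.2.1, already arranged in ascending order
data = [2, 7, 8, 11, 15, 16, 18, 19, 19, 19, 23, 23, 24,
        26, 27, 29, 33, 40, 44, 47, 49, 51, 54, 63, 68]

summary = {
    "minimum": data[0],       # first value of a sorted array
    "maximum": data[-1],      # last value of a sorted array
    "n": len(data),           # number of observations
    "mode": mode(data),       # most frequent value
    "median": median(data),   # middle value, since n is odd
}
print(summary)
```

Because the array is sorted, the minimum and maximum are simply its first and last entries, which is the practical advantage of arranging raw data this way.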
Example
Table 1.2.2
Example
                 COURSE
RESULT    BA    B Com    B Sc
Pass      37    25       33
Supp       5    10        4
Fail      11     8       27
Table 1.2.3
A line graph is usually meant for showing the frequencies for various
values of a variable. Successive points are joined by means of line segments so
that a glance at the graph is enough for the reader to understand the distribution of
the variable.
The simplest of line graphs is the single line graph, so called because it
displays information concerning one variable only, in terms of its frequencies.
Example
Table 1.3.1.1
[Figure: line graph for ages of students. x-axis: Age (19 to 24); y-axis: Number of students (0 to 160).]
Fig. 1.3.1.2
Table 1.3.2.1
[Figure: multiple line graph for age distribution at academic institutions, with one line each for UoM, DCDMBS and UTM. x-axis: Age (19 to 24); y-axis: Number of students (0 to 160).]
Fig. 1.3.2.2
The pie chart follows the principle that the angle of each of its sectors
should be proportional to the frequency of the class that it represents.
Merits
Limitations
1.4.1 Simple pie chart
Example
Using the same data from Table 1.2.3, but this time including the total
number of students enrolled for BA, B Com and B Sc, we shall now display the
distribution of students across these three courses in the population.
                 COURSE
RESULT    BA    B Com    B Sc
Pass      37    25       33
Supp       5    10        4
Fail      11     8       27
TOTAL     53    43       64
Table 1.4.1.1
[Figure: simple pie chart of enrolment by course. Sectors: BA 33%, B Com 27%, B Sc 40%.]
Fig. 1.4.1.2
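The proportionality principle can be made concrete by computing each sector's angle from the TOTAL row of Table 1.4.1.1. This is a small sketch only; the dictionary and variable names are assumptions chosen for illustration.

```python
# Total enrolment per course, from the TOTAL row of Table 1.4.1.1
totals = {"BA": 53, "B Com": 43, "B Sc": 64}
grand_total = sum(totals.values())

# Each sector's angle is proportional to its frequency (out of 360 degrees)
angles = {c: 360 * f / grand_total for c, f in totals.items()}
percentages = {c: round(100 * f / grand_total) for c, f in totals.items()}

for course in totals:
    print(f"{course}: {angles[course]:.1f} degrees, {percentages[course]}%")
```

The angles necessarily add up to 360 degrees, which is a useful check before drawing the chart.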
1.4.2 Enhanced pie chart
This is just an enhancement (as the name itself says) of a simple pie chart
in order to lay emphasis on a particular sector.
Example
Again, using the same data from Table 1.2.3, but this time including the
total number of students enrolled for BA, B Com and B Sc, we shall now display
the distribution of students across these three courses in the population.
                 COURSE
RESULT    BA    B Com    B Sc
Pass      37    25       33
Supp       5    10        4
Fail      11     8       27
TOTAL     53    43       64
Table 1.4.1.3
[Figure: enhanced pie chart of enrolment by course. Sectors: BA 33%, B Com 27%, B Sc 40%.]
Fig. 1.4.1.4
1.5 BAR CHARTS
The bar chart is one of the most common methods of presenting data in a
visual form. Its main purpose is to display quantities in the form of bars. A bar
chart consists of a set of bars whose heights are proportional to the frequencies
that they represent.
Note that the figure may be drawn horizontally or vertically. There are
different types of bar charts, depending on the number of variables and the type of
information to be displayed.
General merits
General limitations
Note Any additional merit or limitation for each type of bar chart will be
mentioned in its corresponding section.
The simple bar chart is used for the case of one variable only. In Table
1.5.1.1 below, our variable is age.
Example
Table 1.5.1.1
[Figure: simple bar chart for age distribution of students. x-axis: Age (19 to 24); y-axis: Number of students (0 to 160).]
Fig. 1.5.1.2
The multiple bar chart is an extension of the simple bar chart for cases where
the quantities of several variables are to be displayed. The bars representing the
quantities for the different variables are placed side by side for each
attribute.
Example
                 COURSE
RESULT    BA    B Com    B Sc
Pass      37    25       33
Supp       5    10        4
Fail      11     8       27
TOTAL     53    43       64
Table 1.5.2.1
[Figure: multiple bar chart showing the results for BA, B Com and B Sc, with bars for Pass, Supp and Fail side by side for each course. x-axis: Courses; y-axis: Results (0 to 40).]
Fig. 1.5.2.2
Merits
Limitations
1. The figure becomes very cumbersome when there are too many variables
and components.
2. Only absolute, not relative, values are available – it is much easier to
compare component percentages across variables.
In this type of bar chart, the components (quantities) of each variable are
piled on top of one another.
Example
                 COURSE
RESULT    BA    B Com    B Sc
Pass      37    25       33
Supp       5    10        4
Fail      11     8       27
TOTAL     53    43       64
Table 1.5.2.1
[Figure: component bar chart with Pass, Supp and Fail stacked on top of one another for each course. x-axis: Courses (BA, B Com, B Sc); y-axis: Results (0 to 70).]
Fig. 1.5.2.2
Merits
Limitations
1.5.4 Percentage (component) bar chart
Fig. 1.5.4 presents the same data as for the previous example.
[Figure: percentage component bar chart with Pass, Supp and Fail stacked to 100% for each course. x-axis: Courses (BA, B Com, B Sc); y-axis: Results (0% to 100%).]
Fig. 1.5.4
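The heights of the components in a percentage bar chart are obtained by dividing each component by its own bar's total. The sketch below does this for the results table used throughout Section 1.5; the function name `component_percentages` is a hypothetical helper introduced only for illustration.

```python
# Results per course, from the table used in the previous examples
results = {
    "BA":    {"Pass": 37, "Supp": 5,  "Fail": 11},
    "B Com": {"Pass": 25, "Supp": 10, "Fail": 8},
    "B Sc":  {"Pass": 33, "Supp": 4,  "Fail": 27},
}

def component_percentages(counts):
    """Convert one bar's component counts to percentages of that bar's total."""
    total = sum(counts.values())
    return {k: 100 * v / total for k, v in counts.items()}

percent = {course: component_percentages(c) for course, c in results.items()}
for course, p in percent.items():
    print(course, {k: round(v, 1) for k, v in p.items()})
```

Since every bar is rescaled to 100%, courses with different enrolments can be compared directly on their pass, supplementary and fail rates.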
Merits
Limitations
1.6 HISTOGRAMS
The histogram should be clearly distinguished from the bar chart. The
most striking physical difference between these two diagrams is that, unlike the
bar chart, there are no ‘gaps’ between successive rectangles of a histogram. A bar
chart is one-dimensional since only the length, and not the width, matters whereas
a histogram is two-dimensional since both length and width are important.
Example
Consider the set of data in Table 1.6.1.1, which represents the ages of
workers of a private company. The real limits and mid-class values have already
been computed.
Table 1.6.1.1
The data is presented on the histogram in Fig. 1.6.1.2.
[Figure: histogram of the ages of workers. x-axis: Age group of workers, with classes [20.5, 25.5), [25.5, 30.5), [30.5, 35.5), [35.5, 40.5), [40.5, 45.5), [45.5, 50.5), [50.5, 55.5), [55.5, 60.5); y-axis: Number of workers (frequency), 0 to 40.]
Fig. 1.6.1.2
When class intervals are unequal, a correction must be made. This consists
of finding the frequency density for each class, which is the ratio of the frequency
to the class interval. The frequency densities now become the actual heights of
the rectangles since the areas of the rectangles should be proportional to the
frequencies.
\[ \text{Frequency density} = \frac{\text{Frequency}}{\text{Class interval}} \]
Example
Temperature    Class interval    Frequency    Frequency density
[0 – 5)         5                 3           0.60
[5 – 10)        5                 6           1.20
[10 – 20)      10                10           1.00
[20 – 30)      10                15           1.50
[30 – 40)      10                10           1.00
[40 – 50)      10                 5           0.50
[50 – 70)      20                 5           0.25
Total                            54
Table 1.6.2.1
Note [20 – 30) means ‘from 20 to 30, including 20 but excluding 30’.
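The frequency densities in Table 1.6.2.1 can be reproduced by dividing each frequency by its class interval. The following is a minimal sketch; the tuple layout `(lower, upper, frequency)` is an assumption made for convenience.

```python
# (lower limit, upper limit, frequency) for each class in Table 1.6.2.1
classes = [(0, 5, 3), (5, 10, 6), (10, 20, 10), (20, 30, 15),
           (30, 40, 10), (40, 50, 5), (50, 70, 5)]

densities = []
for lower, upper, freq in classes:
    width = upper - lower             # class interval
    densities.append(freq / width)    # height of the rectangle

# Area of each rectangle (height x width) recovers the class frequency
areas = [d * (u - l) for d, (l, u, _) in zip(densities, classes)]
print(densities)
print(areas)
```

Multiplying each height back by its width returns the original frequencies, which is exactly the sense in which areas, rather than heights, represent frequencies in a histogram.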
[Figure: histogram with unequal class intervals. x-axis: Temperature (degrees Fahrenheit), 0 to 80; y-axis: Frequency density, 0 to 1.6.]
Fig. 1.6.2.2
1.7.1 Drawing a histogram first
Example
Temperature Frequency
[0 – 10) 2
[10 – 20) 7
[20 – 30) 11
[30 – 40) 17
[40 – 50) 9
[50 – 60) 3
[60 – 70) 1
Total 50
Table 1.7.1.1
[Figure: frequency polygon drawn over a histogram of the data in Table 1.7.1.1. x-axis ticks from -10 to 80; y-axis: Frequency (0 to 18).]
Fig. 1.7.1.2
1.7.2 Direct construction
The frequency polygon may also be directly drawn by finding the points
on the figure. The x-coordinate of each point is the mid-class value of the cell
whilst the y-coordinate is the frequency of the cell (or frequency density if class
intervals are unequal). Successive points are then linked by means of line
segments.
In that state, the polygon would be ‘hanging in the air’, that is, it would
not touch the x-axis. To close it, we determine its left (right) x-intercept
by subtracting (adding) the class interval of the first (last) class from (to)
the x-coordinate of the first (last) point.
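The direct construction can be sketched in a few lines. The example below uses the data of Table 1.7.1.1, where the class intervals are equal so that frequencies can serve directly as heights; the variable names are illustrative assumptions.

```python
# (lower limit, upper limit, frequency) for each class in Table 1.7.1.1
classes = [(0, 10, 2), (10, 20, 7), (20, 30, 11), (30, 40, 17),
           (40, 50, 9), (50, 60, 3), (60, 70, 1)]

# One point per class: (mid-class value, frequency)
points = [((lo + hi) / 2, f) for lo, hi, f in classes]

# Close the polygon on the x-axis: step one class interval to the left of
# the first point and one class interval to the right of the last point
first_width = classes[0][1] - classes[0][0]
last_width = classes[-1][1] - classes[-1][0]
polygon = ([(points[0][0] - first_width, 0)] + points
           + [(points[-1][0] + last_width, 0)])
print(polygon)
```

The two added end points lie on the x-axis at -5 and 75, which is why the horizontal axis of such a figure has to extend beyond the range of the data.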
Example
Using the data from Table 1.6.2.1, we have the following polygon:
[Figure: frequency polygon. x-axis: Age of students, with classes 20.5 – 25.5, 25.5 – 30.5, 30.5 – 35.5, 35.5 – 40.5, 40.5 – 45.5, 45.5 – 50.5, 50.5 – 55.5, 55.5 – 60.5; y-axis: Number of students (frequency), 0 to 45.]
Fig. 1.7.2
1.8 OGIVES
Definition 1
Definition 2
Note For the rest of this course, we will denote ‘cumulative frequency’ by CF.
Example
Table 1.8.1
Note Careful inspection of Table 1.8.1 reveals that the ‘less than’ CF of a class
is also the overall rank of the last observation in that class. This is a very
important finding since it will be of tremendous help to us when
calculating percentiles.
The points on a ‘less than’ CF ogive have upper real limits for x-
coordinates and ‘less than’ CF for y-coordinates. This is quite easy to remember:
‘less than’ CFs are defined according to upper real limits!
If we use the data from Table 1.8.1, the following ‘less than’ CF curve is
obtained.
[Figure: ‘less than’ CF ogive. x-axis: real limits from 20.5 to 60.5 in steps of 5; y-axis: ‘Less than’ CF (0 to 160).]
Fig. 1.8.2
Note The ‘less than’ CF ogive has an x-intercept equal to the lower real limit of
the first class.
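The points of a ‘less than’ ogive can be generated by accumulating the frequencies and pairing each running total with the upper limit of its class. Since Table 1.8.1 is not reproduced here, this sketch borrows the data of Table 1.7.1.1 instead; that substitution, and the variable names, are assumptions made purely for illustration.

```python
from itertools import accumulate

# Classes from Table 1.7.1.1 (used here because Table 1.8.1 is not reproduced)
upper_limits = [10, 20, 30, 40, 50, 60, 70]
frequencies = [2, 7, 11, 17, 9, 3, 1]

# Running total of frequencies, starting from the first class
less_than_cf = list(accumulate(frequencies))

# Ogive points: start on the x-axis at the lower limit of the first class
ogive = [(0, 0)] + list(zip(upper_limits, less_than_cf))
print(ogive)
```

The last ‘less than’ CF equals the total number of observations, so the curve always ends at the height n.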
1.8.3 ‘More than’ cumulative frequency ogive
The points on a ‘more than’ CF ogive have lower real limits for x-coordinates
and ‘more than’ CF for y-coordinates. Remember that ‘more than’
CFs are defined according to lower real limits!
Again, if we use the data from Table 1.8.1, the following ‘more than’ CF
curve is obtained.
[Figure: ‘more than’ CF ogive. x-axis: real limits from 20.5 to 60.5 in steps of 5; y-axis: ‘More than’ CF (0 to 160).]
Fig. 1.8.3
Note The ‘more than’ CF ogive has an x-intercept equal to the upper real limit of
the last class.
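A ‘more than’ CF counts the observations at or above each lower limit, so it starts at n and decreases to zero. As Table 1.8.1 is not reproduced here, the sketch below again borrows the data of Table 1.7.1.1; this substitution and the variable names are assumptions for illustration only.

```python
# Classes from Table 1.7.1.1 (used here because Table 1.8.1 is not reproduced)
lower_limits = [0, 10, 20, 30, 40, 50, 60]
frequencies = [2, 7, 11, 17, 9, 3, 1]
last_upper_limit = 70

more_than_cf = []
remaining = sum(frequencies)
for f in frequencies:
    more_than_cf.append(remaining)   # observations at or above this lower limit
    remaining -= f                   # drop this class before moving to the next

# Ogive points: finish on the x-axis at the upper limit of the last class
ogive = list(zip(lower_limits, more_than_cf)) + [(last_upper_limit, 0)]
print(ogive)
```

For any class boundary, the ‘less than’ and ‘more than’ CFs add up to n, which gives a simple consistency check between the two curves.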
1.9 STEM AND LEAF DIAGRAMS
Stem and leaf diagrams, or stemplots, are used to represent raw data, that
is, individual observations, without loss of information. The ‘leaves’ in the
diagram are actually the last digits of the values (observations) while the ‘stems’
are the remaining part of the values. For example, the value 117 would be split as
‘11’, the stem, and ‘7’, the leaf. By splitting all the values and distributing them
appropriately, we form a stemplot. The example in Section 1.9.1 would be a better
illustration of the above explanation.
Example
84 17 38 45 47
53 76 54 75 22
66 65 55 54 51
44 39 19 54 72
Table 1.9.1.1
In the first instance, the data is classified in the order that it appears on a
stemplot (see Fig. 1.9.1.2). The leaves are then arranged in ascending order (see
Fig. 1.9.1.3) – this is indeed a very practical way of arranging a set of data in
order if the number of observations is not very large.
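The splitting of values into stems and leaves can be sketched as follows for the data of Table 1.9.1.1. This is only an illustrative sketch for two-digit observations; the names `stems` and `leaf` are assumptions and not part of the original notes.

```python
from collections import defaultdict

# Observations from Table 1.9.1.1, in the order given
data = [84, 17, 38, 45, 47, 53, 76, 54, 75, 22,
        66, 65, 55, 54, 51, 44, 39, 19, 54, 72]

stems = defaultdict(list)
for value in data:
    stem, leaf = divmod(value, 10)   # last digit is the leaf, the rest is the stem
    stems[stem].append(leaf)

# Sorting the stems and then the leaves gives the ordered stemplot
for stem in sorted(stems):
    leaves = " ".join(str(leaf) for leaf in sorted(stems[stem]))
    print(f"{stem} | {leaves}")
```

Reading the sorted leaves row by row returns the complete data set in ascending order, so no information is lost.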
1.9.2 Back-to-back stemplots
Example
FRENCH 75 69 58 58 46 44 32 50 53 78
81 61 61 45 31 44 53 66 47 57
ENGLISH 52 58 68 77 38 85 43 44 56 65
65 79 44 71 84 72 63 69 72 79
Table 1.9.2.1
Fig. 1.9.2.2
From Fig. 1.9.2.2, we can deduce that pupils performed better in English
than in French (since they had higher marks in English given the negative
skewness of the distribution).
Merits
1.10 BOX AND WHISKERS DIAGRAMS
1. Minimum value
2. Lower quartile
3. Median
4. Upper quartile
5. Maximum value
Example
Using the data from Table 1.9.1.1 in Section 1.9.1, we have the following
five summary statistics:
Minimum 17
Lower quartile 40.25
Median 53.5
Upper quartile 65.75
Maximum 84
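These five values can be reproduced in Python. The quartiles quoted above are consistent with placing the p-th quantile at position (n + 1)p in the ordered data and interpolating linearly; that rule is inferred from the numbers rather than stated in the notes, and it corresponds to `method="exclusive"` in the standard `statistics.quantiles` function.

```python
from statistics import quantiles

# Observations from Table 1.9.1.1
data = [84, 17, 38, 45, 47, 53, 76, 54, 75, 22,
        66, 65, 55, 54, 51, 44, 39, 19, 54, 72]

# Quartiles by the (n + 1)p position rule (an assumption; see lead-in)
q1, q2, q3 = quantiles(data, n=4, method="exclusive")

five_number_summary = (min(data), q1, q2, q3, max(data))
print(five_number_summary)
```

Other quartile conventions exist and can give slightly different values for small samples, so the convention in use should always be stated.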
[Figure: boxplot of the data in Table 1.9.1.1, drawn on an axis from 0 to 100.]
Fig. 1.10.1
1.10.1 What information can be gathered from a boxplot?
Apart from the five descriptive statistics, we can deduce the following
about the distribution:
1. The range – the numerical difference between the maximum and the
minimum values.
2. The inter-quartile range – the difference between the upper and lower
quartiles. It measures the dispersion for the middle 50% of the distribution.
3. The skewness of the distribution – if the median is closer to the lower
(upper) quartile, the distribution is positively (negatively) skewed. If it is
exactly in the middle of those quartiles, the distribution is symmetrical.
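Applied to the five summary statistics of the example in Section 1.10, the three deductions above can be sketched as follows. The skewness rule coded here is the one stated in item 3; the variable names are illustrative assumptions.

```python
# Five summary statistics from the example in Section 1.10
minimum, q1, med, q3, maximum = 17, 40.25, 53.5, 65.75, 84

value_range = maximum - minimum   # spread of the whole distribution
iqr = q3 - q1                     # spread of the middle 50%

# Compare the distances from the median to each quartile
lower_gap = med - q1
upper_gap = q3 - med
if lower_gap < upper_gap:
    skew = "positive"      # median closer to the lower quartile
elif lower_gap > upper_gap:
    skew = "negative"      # median closer to the upper quartile
else:
    skew = "symmetrical"

print(value_range, iqr, skew)
```

Here the median lies 13.25 above the lower quartile and 12.25 below the upper quartile, so by this rule the distribution is slightly negatively skewed.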
Several boxplots may even be plotted on the same axes for comparison
purposes. We might wish to compare marks obtained by students in French and
English so as to study any similarities and differences between their performances
in these subjects.
[Figure: boxplots for French and English drawn on the same axis. x-axis: Number of marks (0 to 100).]
Fig. 1.10.3
1.11 SCATTER DIAGRAMS
Just imagine that we wish to know whether the length of a metal rod varies
with temperature. We may choose to record the length of the rod at various
temperatures. It is clear here that ‘temperature’ is the independent variable and
‘length’ is the dependent one. These data are kept in the form of a table in which
‘temperature’ and ‘length’ are labelled as X and Y respectively. We next plot the
corresponding pairs of readings in (x, y) form on a graph, the scatter diagram. Fig.
1.11.2 is an example of a scatterplot.
Example
Temperature (°C)   13    50    63    58    20    78    39    55    29    62
Length (cm)        5.10  5.68  5.85  5.74  5.25  5.98  5.59  5.73  5.46  5.81
Table 1.11.1
[Figure: scatter diagram of length against temperature. x-axis: Temperature (0 to 90); y-axis: Length (5 to 6.1).]
Fig. 1.11.2
A scatterplot enables us to check whether a relationship exists between two
variables by inspecting the pattern of points. It also reveals the nature of
the relationship, that is, whether it is linear or non-linear, through the
shape of the pattern. Note that a clear pattern shows association only; it does
not by itself establish that one variable causes the other. Scatter diagrams
are especially useful in regression and correlation analyses.
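As a numerical companion to the scatter diagram, the strength of a linear pattern can be summarised by Pearson's correlation coefficient. This coefficient is not defined in this section, so the sketch below is offered only as an illustration, using the rod data of Table 1.11.1 and hypothetical variable names.

```python
from math import sqrt

# Readings from Table 1.11.1
temperature = [13, 50, 63, 58, 20, 78, 39, 55, 29, 62]
length = [5.10, 5.68, 5.85, 5.74, 5.25, 5.98, 5.59, 5.73, 5.46, 5.81]

n = len(temperature)
mean_x = sum(temperature) / n
mean_y = sum(length) / n

# Sums of products and squares of deviations from the means
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(temperature, length))
sxx = sum((x - mean_x) ** 2 for x in temperature)
syy = sum((y - mean_y) ** 2 for y in length)

r = sxy / sqrt(sxx * syy)   # Pearson's correlation coefficient
print(round(r, 3))
```

A value of r close to +1 agrees with what the scatter diagram suggests visually: the points lie near an upward-sloping straight line.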
Example
The following data represent the annual sales of petrol in Iraq in millions
of dollars for the period 1985-96.
Year (19_) 85 86 87 88 89 90 91 92 93 94 95 96
Sales ($m) 600 840 420 720 640 860 420 740 670 900 430 760
Table 1.12.1
[Figure: time series of annual petrol sales. x-axis: Year (85 to 96); y-axis: Sales ($m), 0 to 1000.]
Fig. 1.12.2
A time series shows the trend, cycle and seasonality in the behaviour of a
variable. It is a very sophisticated means of forecasting the values of the variable
on the assumption that history repeats itself.
Example
Table 1.13.1 below refers to tax paid by people in various income groups
in a sample. Construct a Lorenz curve for the data and comment on it.
Table 1.13.1
The above table now should be altered in such a way that relative
cumulative frequencies may now be displayed for both variables, that is, ‘number
of people’ and ‘tax paid’. We must change the labels for the first column,
determine the cumulative frequencies and then convert these to percentages
(proportions) as shown in Table 1.13.2.
Table 1.13.2
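The conversion to relative cumulative frequencies can be sketched as below. Because the contents of Table 1.13.1 are not reproduced here, the figures used are hypothetical and serve only to show the arithmetic; the function name `relative_cumulative` is likewise an assumption.

```python
from itertools import accumulate

def relative_cumulative(values):
    """Return cumulative totals as proportions of the grand total."""
    total = sum(values)
    return [c / total for c in accumulate(values)]

# Hypothetical figures, for illustration only (NOT the data of Table 1.13.1)
people = [40, 30, 20, 10]      # number of people in each income group
tax_paid = [10, 20, 30, 40]    # tax paid by each income group

# Lorenz curve points, starting from the origin
x = [0.0] + relative_cumulative(people)
y = [0.0] + relative_cumulative(tax_paid)
lorenz_points = list(zip(x, y))
print(lorenz_points)
```

Both sets of proportions run from 0 to 1, so the resulting curve always starts at the origin and ends at the point (1, 1).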
Next, we plot one relative cumulative frequency against another. It does
not really matter which axis is to be used for which variable, the reason being that
we only wish to observe the departure of the Lorenz curve from the line of
uniform distribution. On the graph, this is simply the line y = x, which represents the
ideal situation where, for our example, the proportion of tax paid is equally
distributed among the various classes of income earners. This line is also to be
drawn on the same graph in order to make the ‘bulge’ of the Lorenz curve more
visible. This is clearly illustrated in Fig. 1.13.3 below.
[Figure: Lorenz curve plotted together with the line of uniform distribution. x-axis: Proportion of taxpayers (0 to 1); y-axis: Proportion of tax paid (0 to 1).]
Fig. 1.13.3
The further the curve is from the line of uniform distribution, the more
uneven is the distribution. It can be observed, for example, that approximately
36% of the population of taxpayers pays only 10% of the total tax. This shows a
considerable degree of unevenness in the population. In an ideal situation, 36% of
the population would have paid 36% of the total tax.
1.14 Z-CHARTS
The annual moving total is the sum of the values of the variable for the
12-month period up to the end of the month under consideration. A line for the
budget for the year to date may be added to a Z-chart, for comparison with the
cumulative sum of actual values.
Example
The sales figures for a company for 2002 and 2003 are as follows.
Table 1.14.1
Table 1.14.2 will now include the cumulative sales for 2003 and the
annual moving total. For the latter, the 12-month period is rolled forward one
month at a time, from Jan-Dec 2002 to Feb 2002-Jan 2003, then Mar 2002-Feb 2003
and so on until Jan-Dec 2003, with the total sales for each period calculated
and recorded.
Note Z-charts do not have to cover 12 months of a year. They could, for
example, also be drawn for four quarters of a year or seven days of a
week.
Month        2002 sales    2003 sales    Cumulative sales    Annual moving
             ($m)          ($m)          2003 ($m)           total ($m)
January      7             8              8                  91
February     7             8             16                  92
March        8             8             24                  92
April        7             9             33                  94
May          9             8             41                  93
June         8             8             49                  93
July         8             7             56                  92
August       7             8             64                  93
September    6             9             73                  96
October      7             6             79                  95
November     8             9             88                  96
December     8             9             97                  97
Table 1.14.2
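The last two columns of Table 1.14.2 can be recomputed from the monthly sales alone. The following is a minimal sketch with illustrative variable names: the cumulative column is a running sum of the 2003 figures, and each annual moving total is the sum of the latest 12 monthly figures.

```python
from itertools import accumulate

# Monthly sales ($m), January to December, from Table 1.14.2
sales_2002 = [7, 7, 8, 7, 9, 8, 8, 7, 6, 7, 8, 8]
sales_2003 = [8, 8, 8, 9, 8, 8, 7, 8, 9, 6, 9, 9]

# Cumulative sales for 2003: running total from January
cumulative = list(accumulate(sales_2003))

# Annual moving total: the 12 months ending at each month of 2003,
# e.g. for January 2003 this is February 2002 to January 2003
combined = sales_2002 + sales_2003
moving_total = [sum(combined[i + 1:i + 13]) for i in range(12)]

print(cumulative)
print(moving_total)
```

In December the cumulative total and the annual moving total coincide, which is where the two upper strokes of the ‘Z’ meet.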
Fig. 1.14.3
Interpretation of Z-charts
1. Monthly totals show the monthly results at a glance with any seasonal
variations.
2. Cumulative totals show the performance to date and can be easily compared
with planned and budgeted performance by superimposing the budget line.
3. Annual moving totals compare the current levels of performance with those
of the previous year. If the line is rising, then this year’s monthly results are
better than the results of the corresponding month last year. The opposite
applies if the line is falling. The annual moving total line indicates the long-
term trend of the variable, whether rising, falling or steady.
Note While the values of the annual moving total and the cumulative values are
plotted on month-end positions, the values for the current monthly figures
are plotted on mid-month positions. This is because monthly figures
represent achievement over a particular month whereas the annual moving
totals and the cumulative values represent achievement up to a particular
month end.