Quantitative Methods
Professor David Targett
Contents: Modules 1 to 15; Index.
Learning Objectives
This module gives an overview of statistics, introducing basic ideas and concepts at a general
level, before dealing with them in greater detail in later modules. The purpose is to provide a
gentle way into the subject for those without a statistical background, in response to the
cynical view that it is not possible for anyone to read a statistical text unless they have read it
before. For those with a statistical background the module will provide a broad framework
for studying the subject.
1.1 Introduction
The word statistics can refer to a collection of numbers or it can refer to the science of
studying collections of numbers. Under either definition the subject has received far more
than its share of abuse (lies, damned lies). A large part of the reason for this may well be
the failure of people to understand that statistics is like a language. Just as verbal languages
can be misused (for example, by politicians and journalists?) so the numerical language of
statistics can be misused (by politicians and journalists?). To blame statistics for this is as
sensible as blaming the English language when election promises are not kept.
One does not have to be skilled in statistics to misuse them deliberately (figures can lie
and liars can figure), but misuses often remain undetected because fewer people seem to
have the knowledge and confidence to handle numbers than have similar abilities with
words. Fewer people are numerate than are literate. What is needed to see through the
misuse of statistics, however, is common sense with the addition of only a small amount of
technical knowledge.
The difficulties are compounded by the unrealistic attitudes of those who do have statistical
knowledge. For instance, when a company's annual accounts report that the physical
stock level is 34,236,417 (or even 34,236,000), it conveys an aura of truth because the
figure is so precise. Had one accompanied the accountants who estimated the figure, one might
have concluded that the method by which the data were collected did not warrant such precision.
For market research to say that '9 out of 10 dogs prefer Bonzo dog food' is also misleading,
but in a far more overt fashion. The statement is utterly meaningless, as is seen by asking the
questions: Prefer it to what? Prefer it under what circumstances? 9 out of which 10
dogs?
Such examples and many, many others of greater or lesser subtlety have generated a poor
reputation for statistics which is frequently used as an excuse for remaining in ignorance of
it. Unfortunately, it is impossible to avoid statistics in business. Decisions are based on
information; information is often in numerical form. To make good decisions it is necessary
to organise and understand numbers. This is what statistics is about and this is why it is
important to have some knowledge of the subject.
Statistics can be split into two parts. The first part can be called descriptive statistics.
Broadly, this element handles the problem of sorting a large amount of collected data in
ways which enable its main features to be seen immediately. It is concerned with turning
numbers into real and useful information. Included here are simple ideas such as organising
and arranging data so that their patterns can be seen, summarising data so that they can be
handled more easily and communicating data to others. Also included is the now very
important area of handling computerised business statistics as provided by management
information systems and decision support systems.
The second part can be referred to broadly as inferential statistics. This element tackles the
problem of how the small amount of data that has been collected (called the sample) may
be analysed to infer general conclusions about the total amount of similar data that exist
uncollected in the world (called the population). For instance, opinion polls use inferential
statistics to make statements about the opinions of the whole electorate of a country, given
the results of perhaps just a few hundred interviews.
Both types of statistics are open to misuse. However, with a little knowledge and a great
deal of common sense, the errors can be spotted and the correct procedures seen. In this
module the basic concepts of statistics will be introduced. Later some abuses of statistics and
how to counter them will be discussed.
The first basic concept to look at is that of probability, which is fundamental to statistical
work. Statistics deals with approximations and best guesses because of the inaccuracy and
incompleteness of most of the data used. It is rare to make statements and draw conclusions
with certainty. Probability is a way of quantifying the strength of belief in the information
derived and the conclusions drawn.
1.2 Probability
All future events are uncertain to some degree. That the present government will still be in
power in the UK in a year's time (given that it is not an election year) is likely, but far from
certain; that a communist government will be in power in a year's time is highly unlikely, but
not impossible. Probability theory enables the difference in the uncertainty of events to be
made more precise by measuring their likelihood on a scale.
(a) A priori approach. When all the possible outcomes of a situation are known in advance
and are equally likely, probabilities can be calculated before the event takes place. For the
toss of a fair coin there are two equally likely outcomes, so:

P(Heads) = 0.5
P(Tails) = 0.5
(b) Relative frequency approach. When the event has been or can be repeated a large
number of times, its probability can be measured from the formula:

P(event) = (number of times the event occurred) / (total number of trials)

For example, to estimate the probability of rain on a given day in September in London,
look at the last 10 years' records (300 September days in all) to find that it rained on 57 days. Then:

P(rain on a September day) = 57/300 = 0.19
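The relative frequency approach can be sketched in code. This is a minimal illustration, not part of the module; the function name and the simulated coin data are our own assumptions.

```python
import random

def relative_frequency(outcomes, event):
    """P(event) ~ (number of times the event occurred) / (total number of trials)."""
    occurrences = sum(1 for o in outcomes if event(o))
    return occurrences / len(outcomes)

# The September rain example: it rained on 57 of the 300 observed days.
september_days = ["rain"] * 57 + ["dry"] * 243
p_rain = relative_frequency(september_days, lambda day: day == "rain")
print(p_rain)  # 0.19

# The same idea applied to a large number of simulated coin tosses.
random.seed(1)
tosses = [random.choice("HT") for _ in range(10_000)]
p_heads = relative_frequency(tosses, lambda t: t == "H")
```

The more trials there are, the closer the relative frequency tends to get to the underlying probability; here `p_heads` comes out close to 0.5.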
(c) Subjective approach. A certain group of statisticians (Bayesians) would argue that the
degree of belief that an individual has about a particular event may be expressed as a
probability. Bayesian statisticians argue that in certain circumstances a person's subjective
assessment of a probability can and should be used. The traditional view, held by classical
statisticians, is that only objective probability assessments are permissible. Specific
areas and techniques that use subjective probabilities will be described later. At this stage
it is important to know that probabilities can be assessed subjectively but that there is
discussion amongst statisticians as to the validity of doing so. As an example of the
subjective approach, let the event be the achievement of political unity in Europe by the year
2010 AD. There is no way that either of the first two approaches could be employed to
calculate this probability. However, an individual can express his own feelings on the
likelihood of this event by comparing it with an event of known probability, e.g. is it
more or less likely than obtaining a head on the spin of a coin? After a long process of
comparison and checking, the result might be:
P(political unity in Europe by 2010 AD) = 0.10
The process of accurately assessing a subjective probability is a field of study in its own
right and should not be regarded as pure guesswork.
The three methods of determining probabilities have been presented here as an introduction
and the approach has not been rigorous. Once probabilities have been calculated, by
whatever method, they are treated in exactly the same way.
Examples
1. What is the probability of throwing a six with one throw of a die? Use the a priori approach:
there are six possible outcomes (1, 2, 3, 4, 5 or 6 showing) and all outcomes are equally likely,
therefore:

P(throwing a 6) = 1/6
2. What is the probability of a second English Channel tunnel for road vehicles being completed
by 2025 AD?
The subjective approach is the only one possible, since logical thought alone cannot lead to an
answer and there are no past observations. My assessment is a small one, around 0.02.
3. How would you calculate the probability of obtaining a head on one spin of a biased coin?
The a priori approach may be possible if one had information on the aerodynamic behaviour
of the coin. A more realistic method would be to conduct several trial spins and count the
number of times a head appeared:

P(head) = (number of heads observed) / (number of trial spins)
4. What is the probability of drawing an ace in one cut of a pack of playing cards?
Use the a priori method. There are 52 possible outcomes (one for each card in the deck) and
the probability of picking any one card, say the ace of diamonds, must therefore be 1/52. There
are four aces in the deck, hence:

P(drawing an ace) = 4/52 = 1/13
[Raw data (fragment of the 100 collected values): 53, 66, 41, 71, 40, 110, 83, 106, 72, 20, 99, 92, 75]
Table 1.1 is an ordered array. They look neater now but it is still not possible to get a feel
for the data (the average, for example) as they stand. The next step is to classify the data and
then arrange the classes in order. Classifying means grouping the numbers in bands (e.g. 50–54)
to make them easier to handle. Each class has a frequency, which is the number of data
points that fall within that class. This is called a frequency table and is shown in Table 1.2.
This shows that seven data points were greater than or equal to 40 but less than 50, 12 were
greater than or equal to 50 but less than 60 and so on. There were 100 data points in all.
It is now much easier to get an overall conception of what the data mean. For example,
here most of the numbers are between 60 and 90 with extremes of 40 and 110. Of course, it
is likely that at some time there may be a need to perform detailed calculations with the
numbers to provide specific information, but at present the objective is merely to get a feel
for the data in the shortest possible time. Another arrangement with greater visual impact,
called a frequency histogram, will help meet this objective.
[Frequency histogram of the data: class frequencies 7, 12, 22, 27, 19, 10 and 3 for the classes 40–50 up to 100–110]

Each frequency can be converted into a probability by dividing by the total number of data points, e.g.:

P(40 ≤ x < 50) = 7/100 = 0.07
The frequency histogram can then be turned into a probability histogram by writing the
units of the vertical axis as probabilities (as calculated above) instead of frequencies. The
shape of the histogram would remain unaltered. Once the histogram is in the probability
form it is usually referred to as a distribution, in this case a discrete distribution. A variable is
discrete if it is limited in the values it can take. For example, when the data are restricted to
classes (as above) the variable is discrete. Also when a variable is restricted to whole numbers
only (an integer variable), it is discrete.
The probability histogram makes it easier to work out the probabilities associated with
amalgams of classes. For instance, if the probabilities of two of the classes are:

P(50 ≤ x < 60) = 0.12
P(60 ≤ x < 70) = 0.22

then:

P(50 ≤ x < 70) = 0.12 + 0.22 = 0.34
This is true whether working in probabilities or the frequencies from which they were
derived.
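In code, the conversion from frequencies to probabilities and the amalgamation of classes might be sketched as follows. This is our own illustration, not part of the module; the class boundaries and frequencies are those of the worked data above.

```python
# Frequency table from the worked example (100 data points in all).
freq = {(40, 50): 7, (50, 60): 12, (60, 70): 22, (70, 80): 27,
        (80, 90): 19, (90, 100): 10, (100, 110): 3}

total = sum(freq.values())                       # 100 data points
prob = {cls: f / total for cls, f in freq.items()}

# Amalgamating classes: P(50 <= x < 70) = P(50-60) + P(60-70).
p_50_70 = prob[(50, 60)] + prob[(60, 70)]
print(round(p_50_70, 2))  # 0.34

# The column probabilities of a discrete distribution sum to 1.
print(round(sum(prob.values()), 2))  # 1.0
```

The same additions could equally be done with the raw frequencies first and divided by the total at the end; the result is identical.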
Examples
From the data in Figure 1.3, what are the probabilities:
1. P(80 ≤ x < 100)?
2. P(x < 70)?
3. P(60 ≤ x < 100)?
Answers
1. P(80 ≤ x < 100) = P(80 ≤ x < 90) + P(90 ≤ x < 100) = 0.19 + 0.10 = 0.29
2. P(x < 70) = P(40 ≤ x < 50) + P(50 ≤ x < 60) + P(60 ≤ x < 70) = 0.07 + 0.12 + 0.22 = 0.41
3. P(60 ≤ x < 100) = 0.22 + 0.27 + 0.19 + 0.10 = 0.78
[Histograms of a continuous variable: the class 50 ≤ x < 60, with area (probability) 0.12, subdivided into two classes 50 ≤ x < 55 and 55 ≤ x < 60 with areas 0.05 and 0.07]
Using areas to measure probabilities, the column heights of the new classes are approxi-
mately the same as those of the original. The lower probabilities for the new classes are
reflected in the halving of the column widths, rather than changes in the heights. As the
subdivision process continues, there is no tendency for the distribution to become flatter. In
this way a continuous distribution can have a definite shape which can be interpreted in the
same way as the shape of a discrete distribution, but its probabilities are measured from
areas. Just as the column heights of a discrete distribution sum to 1 (because each observa-
tion certainly has some value), so the total area of a continuous distribution is 1.
The differences between discrete and continuous distributions are summarised in Table 1.3.
Example
Figure 1.6 The area under each part of the curve is shown. The total area is equal
to 1.0
Using the continuous distribution in Figure 1.6, what are the probabilities that a particular value of
the variable falls within the following ranges?
1. P(x < 60)?
2. P(x < 100)?
3. P(60 ≤ x < 110)?
4. P(x > 135)?
5. P(x > 110)?
Answers
1. P(x < 60) = 0.01
2. P(x < 100) = 0.01 + 0.49 = 0.50
3. P(60 ≤ x < 110) = 0.49 + 0.27 = 0.76
4. P(x > 135) = 0.02
5. P(x > 110) = 0.21 + 0.02 = 0.23
In practice, the problems with the use of continuous distributions are, first, that one can
never collect sufficient data, sufficiently accurately measured, to establish a continuous
distribution. Second, were this possible, the accurate measurement of areas under the
curve would be difficult. Their greatest practical use is where continuous distributions
appear as standard distributions, a topic discussed in the next section.
the theoretical one on which the mathematics were based. However, this disadvantage is
more than offset by the saving in data collection that the use of a standard distribution
brings about. Observed distributions often entail a great deal of data collection. Not only
must sufficient be collected for the distribution to take shape, but also data must be collected
individually for each and every situation.
In summary, using an observed distribution implies that data have been collected and
histograms formed; using a standard distribution implies that the situation in which data
are being generated resembles closely a theoretical situation for which a distribution has been
constructed mathematically.
The property of the normal distribution illustrated in Figure 1.9 is derived from the
underlying mathematics which are beyond the scope of this introduction. In any case, it is
more important to be able to use the normal distribution than to prove its properties
mathematically. The property applies whether the distribution is flat and wide or high and
narrow, provided only that it is normal. Given such a property, it is possible to calculate the
probabilities of events. The example below demonstrates how a standard distribution (in this
case the normal) can be used in statistical analysis.
Figure 1.8 Salaries: (a) hospital, high standard deviation; (b) school, low standard deviation

[Figure 1.9: for a normal distribution, approximately 68% of observations lie within one standard deviation of the mean, 95% within two standard deviations, and 99% within three standard deviations]
Example
A machine is set to produce steel components of a given length. A sample of 1000 components is
taken and their lengths measured. From the measurements the average and standard deviation of
all components produced are estimated to be 2.96cm and 0.025cm respectively. Within what
limits would 95 per cent of all components produced by the machine be expected to lie?
Take the following steps:
1. Assume that the lengths of all components produced follow a normal distribution. This is
reasonable since this situation is typical of the circumstances in which normal distributions
arise.
2. The parameters of the distribution are the mean (2.96 cm) and the standard deviation
(0.025 cm). The distribution of the lengths of the components will therefore be as in Figure 1.10.
3. For a normal distribution, 95 per cent of observations lie within two standard deviations of
the mean:

2.96 ± (2 × 0.025) = 2.96 ± 0.05

According to this estimate, 95 per cent of all production will lie between 2.91 cm and 3.01 cm.
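The calculation can be checked in a few lines of code. This is a sketch using the figures from the component example; the `norm_cdf` helper built on the standard error function is our own, not from the module.

```python
import math

mean, sd = 2.96, 0.025  # estimated from the sample of 1000 components

# About 95 per cent of a normal distribution lies within two
# standard deviations of the mean.
lower, upper = mean - 2 * sd, mean + 2 * sd
print(round(lower, 2), round(upper, 2))  # 2.91 3.01

def norm_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Check: the probability of a length falling between the two limits.
p = norm_cdf(upper, mean, sd) - norm_cdf(lower, mean, sd)
print(round(p, 3))  # 0.954
```

The exact probability within two standard deviations is about 95.4 per cent, which is why the text rounds it to 95 per cent.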
1.6.1 Definitions
Statistical expressions and the variables themselves may not have precise definitions. The
user may assume the producer of the data is working with a different definition than is the
case. By assuming a wrong definition, the user will draw a wrong conclusion. The statistical
expression 'average' is capable of many interpretations. A firm of accountants advertises in
its recruiting brochure that the average salary of qualified accountants in the firm is £44,200.
A prospective employee may conclude that financially the firm is attractive to work for. A
closer look shows that the accountants in the firm and their salaries are as follows:
3 partners £86,000
8 senior accountants £40,000
9 junior accountants £34,000
3 partners £50,000
8 senior accountants £37,000
9 junior accountants £32,400

The mean salary is now £36,880. Remuneration at this firm is suddenly not quite so
attractive.
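The distortion in the advertised figure can be seen by computing both the mean and the median of the brochure's salary list. This sketch is our own illustration; the median comparison is an addition, not from the text.

```python
# Salaries from the brochure example: 3 partners, 8 senior, 9 junior.
salaries = [86_000] * 3 + [40_000] * 8 + [34_000] * 9

# The advertised "average" is the mean, pulled up by the partners.
mean = sum(salaries) / len(salaries)
print(mean)  # 44200.0

# The median (middle value) is far less affected by the few large salaries.
ordered = sorted(salaries)
n = len(ordered)
median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2  # n is even
print(median)  # 40000.0
```

Most employees in fact earn £40,000 or less, so which 'average' is quoted matters a great deal.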
1.6.2 Graphics
Statistical pictures are intended to communicate data very rapidly. This speed means that
first impressions are important. If the first impression is wrong then it is unlikely to be
corrected.
Pictorial representations of data are many, but the most frequently used is probably the
graph. If the scale of a graph is concealed or not shown at all, the wrong conclusion can be
drawn. Figure 1.11 shows the sales figures for a company over the last three years. The
company would appear to have been successful.
[Figure 1.11: sales graph, 1994 to 1996, with no scale shown; the line appears to rise steeply]

1994 £11,250,000
1995 £11,400,000
1996 £11,650,000
A more informative graph showing the scale is given in Figure 1.12. Sales have hardly
increased at all. Allowing for inflation, they have probably decreased in real terms.
[Figure 1.12: the same sales data graphed in £ million on a scale starting at zero; the rise is barely visible]
Second, sample bias arises through the questions that elicit the data. Questions such as
'Do you go to church regularly?' will provide unreliable information. There may be a
tendency for people to exaggerate their attendance since, generally, it is regarded as a worthy
thing to do. The word 'regularly' also causes problems. Twice a year, at Christmas and
Easter, is regular. So is twice every Sunday. It would be difficult to draw any meaningful
conclusions from the question as posed. The question should be more explicit in defining
regularity.
Third, the sample information may be biased by the interviewer. For example, supermarket
interviews about buying habits may be conducted by a young male interviewer who
questions 50 shoppers. It would not be surprising if the resultant sample comprised a large
proportion of young attractive females.
The techniques of sampling which can overcome most of these problems will be described
later in the course.
1.6.4 Omissions
The statistics that are not given can be just as important as those that are. A television
advertiser boasts that nine out of ten dogs prefer Bonzo dog food. The viewer may conclude
that 90 per cent of all dogs prefer Bonzo to any other dog food. The conclusion might be
different if it were known that:
(a) The sample size was exactly ten.
(b) The dogs had a choice of Bonzo or the cheapest dog food on the market.
(c) The sample quoted was the twelfth sample used and the first in which as many as nine
dogs preferred Bonzo.
Logical errors are often made with probability. For example, suppose a questionnaire
about marketing methods is sent to a selection of companies. From the 200 replies, it
emerges that 48 of the respondents are not in the area of marketing. It also emerges that 30
are at junior levels within their companies. What is the probability that any particular
questionnaire was filled in by someone who was either not in marketing or at a junior
level (or both)? It is tempting to suppose that:
Probability = (48 + 30)/200 = 39%
This is almost certainly wrong because of double counting. Some of the 48 non-
marketers are also likely to be at a junior level. If 10 respondents were non-marketers and at a
junior level, then:
Probability = (48 + 30 − 10)/200 = 68/200 = 34%
Only in the rare case where none of those at a junior level were outside the marketing
area would the first calculation have been correct.
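The correction is the addition rule, P(A or B) = P(A) + P(B) − P(A and B), and can be checked with a quick sketch. The overlap of 10 is the assumption made in the text; the variable names are our own.

```python
n = 200          # questionnaires returned
non_marketing = 48
junior = 30
both = 10        # respondents who are non-marketers AND at a junior level

# Naive sum double-counts the 10 respondents in the overlap.
naive = (non_marketing + junior) / n
# Addition rule: subtract the overlap so each respondent counts once.
correct = (non_marketing + junior - both) / n

print(naive)    # 0.39
print(correct)  # 0.34
```

Only when the overlap is zero (no junior respondent outside marketing) do the two calculations agree.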
[Histogram: number of civil servants in each salary band, salary in £000s]
Review Questions
1.1 One of the reasons why probability is important in statistics is that if data being dealt with are in
the form of a sample, any conclusions drawn cannot be 100 per cent certain. True or false?
1.2 A randomly selected card drawn from a pack of cards was an ace. It was not returned to the pack.
What is the probability that a second card drawn will also be an ace?
A.
B.
C.
D.
E.
1.4 A coin is known to be unbiased, i.e. it is just as likely to come down heads as tails. It has just
been tossed eight times and each time the result has been heads. On the ninth throw, what is the
probability that the result will be tails?
A. Less than 1/2
B. 1/2
C. More than 1/2
D. 1
[Questions 1.5 to 1.7 refer to a histogram of daily sales, not reproduced here; class frequencies 25, 22, 17, 8 and 6]
1.5 On how many days were sales not less than £50,000?
A. 17
B. 55
C. 23
D. 48
1.6 What is the probability that on any day sales are £60,000 or more?
A.
B.
C.
D. 0
1.7 What is the sales level that was exceeded on 90 per cent of all days?
A. £20,000
B. £30,000
C. £40,000
D. £50,000
E. £60,000
1.9 A normal distribution has mean 60 and standard deviation 10. What percentage of readings will be
in the range 60–70?
A. 68%
B. 50%
C. 95%
D. 34%
E. 84%
1.10 A police checkpoint recorded the speeds of motorists over a one-week period. The speeds had a
normal distribution with a mean of 82 km/h and a standard deviation of 11 km/h. What speed was
exceeded by 97.5 per cent of motorists?
A. 49
B. 60
C. 71
D. 104
As a first step towards planning new facilities at one of its city centre ticket offices, an airline has
collected data on the length of time customers spend at a ticket desk (the service time). One
hundred customers were investigated and the time in minutes each one was at an enquiry desk
was measured. The data are shown below.
0.9 3.5 0.8 1.0 1.3 2.3 1.0 2.4 0.7 1.0
2.3 0.2 1.6 1.7 5.2 1.1 3.9 5.4 8.2 1.5
1.1 2.8 1.6 3.9 3.8 6.1 0.3 1.1 2.4 2.6
4.0 4.3 2.7 0.2 0.3 3.1 2.7 4.1 1.4 1.1
3.4 0.9 2.2 4.2 21.7 3.1 1.0 3.3 3.3 5.5
0.9 4.5 3.5 1.2 0.7 4.6 4.8 2.6 0.5 3.6
6.3 1.6 5.0 2.1 5.8 7.4 1.7 3.8 4.1 6.9
3.5 2.1 0.8 7.8 1.9 3.2 1.3 1.4 3.7 0.6
1.0 7.5 1.2 2.0 2.0 11.0 2.9 6.5 2.0 8.6
1.5 1.2 2.9 2.9 2.0 4.6 6.6 0.7 5.8 2.0
1. Classify the data in intervals one minute wide. Form a frequency histogram. What service time is
likely to be exceeded by only ten per cent of customers?
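One way to attack the exercise is sketched below. The code is our own, not part of the case; the data are the 100 service times listed above, and the 90th percentile is taken as the 90th value of the ordered array.

```python
from collections import Counter

times = [
    0.9, 3.5, 0.8, 1.0, 1.3, 2.3, 1.0, 2.4, 0.7, 1.0,
    2.3, 0.2, 1.6, 1.7, 5.2, 1.1, 3.9, 5.4, 8.2, 1.5,
    1.1, 2.8, 1.6, 3.9, 3.8, 6.1, 0.3, 1.1, 2.4, 2.6,
    4.0, 4.3, 2.7, 0.2, 0.3, 3.1, 2.7, 4.1, 1.4, 1.1,
    3.4, 0.9, 2.2, 4.2, 21.7, 3.1, 1.0, 3.3, 3.3, 5.5,
    0.9, 4.5, 3.5, 1.2, 0.7, 4.6, 4.8, 2.6, 0.5, 3.6,
    6.3, 1.6, 5.0, 2.1, 5.8, 7.4, 1.7, 3.8, 4.1, 6.9,
    3.5, 2.1, 0.8, 7.8, 1.9, 3.2, 1.3, 1.4, 3.7, 0.6,
    1.0, 7.5, 1.2, 2.0, 2.0, 11.0, 2.9, 6.5, 2.0, 8.6,
    1.5, 1.2, 2.9, 2.9, 2.0, 4.6, 6.6, 0.7, 5.8, 2.0,
]

# Frequency table: class [n, n+1) minutes for each whole minute n.
freq = Counter(int(t) for t in times)
for minute in sorted(freq):
    print(f"{minute}-{minute + 1} min: {freq[minute]}")

# Service time exceeded by only about ten per cent of customers:
# the 90th value of the ordered array (10 of the 100 values lie above it).
p90 = sorted(times)[89]
print("90 per cent of service times are below about", p90, "minutes")
```

Plotting the frequency table as a histogram shows a long right tail (one customer took 21.7 minutes), which is typical of service-time data.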
separately. For years, however, this antiquated practice has been little more than a ritual.
Supposedly, the system gives workers the opportunity to express their views, but the fact is, the
wages settlement in the first group invariably sets the pattern for all other groups within a
particular company. The Door Trim Line at JPC was the key group in last year's negotiations.
Being first in line, the settlement in Door Trim would set the pattern for JPC that year.
Annie Smith is forewoman for the Door Trim Line. There are many variations of door trim
and Annie's biggest job is to see that they get produced in the right mix. The work involved in
making the trim is about the same regardless of the particular variety. That is to say, it is a straight
piecework operation and the standard price is 72p per unit regardless of variety. The work itself,
while mainly of an assembly nature, is quite intricate and requires a degree of skill.
Last year's negotiations started with the usual complaint from the union about piece prices in
general. There was then, however, an unexpected move. Here is the union's demand for the
Door Trim Line according to the minutes of the meeting:
'We'll come straight to the point. 72p a unit is diabolical! A fair price is 80p.
The women average about 71 units/day. Therefore, the 8p more that we want amounts
to an average of £5.68 more per woman per day.
This is the smallest increase we've demanded recently and we will not accept less than
80p.'
(It was the long-standing practice in the plant to calculate output on an average daily basis.
Although each person's output is in fact tallied daily, the bonus is paid on daily output averaged
over the week. The idea is that this gives a person a better chance to recoup if she happens to
have one or two bad days.)
The union's strategy in this meeting was a surprise. In the past the first demand was purposely
out of line and neither side took it too seriously. This time their demand was in the same area as
the kind of offer that JPC's management was contemplating.
At their first meeting following the session with the union, JPC's management heard the
following points made by the accountant:
a. The union's figure of 71 units per day per person is correct. I checked it against the latest
Production Report. It works out like this:
Average weekly output for the year to date is 7100 units, thus average daily output is:
7100 ÷ 5 = 1420 units/day
The number of women directly employed on the line is 20, so that average daily output is:
1420 ÷ 20 = 71 units/day/woman
b. The increase demanded is 8p on the standard price of 72p, i.e.:
(8 ÷ 72) × 100 = 11.1 per cent
c. Direct labour at current rates is estimated at £26 million. Assuming an 11.1 per cent increase
across the board, which, of course, is what we have to anticipate, total annual direct labour
would increase by about £2.9 million:
£26 million × 0.111 ≈ £2.9 million
Prior to the negotiations management had thought that seven per cent would be a reasonable
offer, being approximately the rate at which productivity and inflation had been increasing in
recent years. Privately they had set ten per cent as the upper limit to their final offer. At this level
they felt some scheme should be introduced as an incentive to better productivity, although they
had not thought through the details of any such scheme.
As a result of the union's strategy, however, JPC's negotiating team decided not to hesitate any
longer. Working late, they put together their best package using the ten per cent criterion. The
main points of the plan were as follows:
a. Maintain the 72p per unit standard price but provide a bonus of 50p for each unit above a
daily average of 61 units/person.
b. Since the average output per day per person is 71, this implies that on average 10 bonus units
per person per day would be paid.
c. The projected weekly cost is then £5612:
(71 × £0.72) + (10 × £0.50) = £56.12 per woman per day
£56.12 × 5 × 20 = £5612
d. The current weekly cost is then £5112:
71 × £0.72 × 5 × 20 = £5112
e. This amounts to an average increase of £500 per week, slightly under the 10 per cent upper
limit:
(£500 ÷ £5112) × 100 = 9.78%
f. The plan offers the additional advantage that the average worker gets 10 bonus units
immediately, making the plan seem attractive.
g. Since the output does not vary much from week to week, and since the greatest improvement
should come from those who are currently below average, the largest portion of any increase
should come from units at the lower cost of 72p each. Those currently above average probably
cannot improve very much. To the extent that this occurs, of course, there is a tendency
to reduce the average cost below the 79p per unit that would result if no change at all occurs:
£56.12 ÷ 71 = £0.790, i.e. 79.0p per unit
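The plan's arithmetic can be checked in a few lines. This sketch uses the figures from the case (20 women, a 5-day week, a 72p standard price, a 50p bonus above 61 units/day); the variable names are our own.

```python
women, days = 20, 5
avg_output = 71        # units per day per woman
price = 0.72           # standard price, pounds per unit
bonus_rate = 0.50      # pounds per bonus unit
threshold = 61         # daily average above which bonus units are paid

bonus_units = avg_output - threshold                       # 10 units/day
new_daily = avg_output * price + bonus_units * bonus_rate  # per woman
new_weekly = new_daily * days * women
old_weekly = avg_output * price * days * women

increase = (new_weekly - old_weekly) / old_weekly * 100
print(round(new_weekly, 2), round(old_weekly, 2))  # 5612.0 5112.0
print(round(increase, 2))                          # 9.78

# Average cost per unit if output does not change at all, in pence:
print(round(new_daily / avg_output * 100, 1))      # 79.0
```

Note that the 9.78 per cent figure assumes output stays at exactly 71 units/day/woman; if output shifts, so does the true cost of the offer, which is the crux of the case's final question.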
At this point management had to decide whether they should play all their cards at once, or
whether they should stick to the original plan of a seven per cent offer. Two further issues had to
be considered:
a. How good were the rates?
b. Could a productivity increase as suggested by the 9.8 per cent offer plan really be anticipated?
Annie Smith, the forewoman, was called into the meeting and she gave the following
information:
a. A few workers could improve their own average a little, but the rates were too tight for any
significant movement in the daily outputs.
b. This didn't mean that everyone worked at the same level, but that individually they were all
close to their own maximum capabilities.
c. A number did average under 61 units per day. Of the few who could show a sustained
improvement, most would be in this less-than-61 category.
This settled it. JPC decided to go into the meeting with their best offer of 9.8 per cent. Next
day the offer was made. The union asked for time to consider it and the next meeting was set for
the following afternoon.
In the morning of the following day Annie Smith reported that her Production Performance
Report (see Table 1.4) was missing. She did not know who had taken it, but was pretty sure it was
the union steward.
The next meeting with the union lasted only a few minutes. A union official stated his
understanding of the offer and, after being assured that he had stated the details correctly, he
announced that the union approved the plan and intended to recommend its acceptance to its
membership. He also added that he expected this to serve as the basis for settlement in the other
units as usual and that the whole wage negotiations could probably be completed in record time.
And that was that. Or was it? Some doubts remained in the minds of JPC's negotiating team.
Why had the union been so quick to agree? Why had the Production Performance Report been
stolen? While they were still puzzling over these questions, Annie Smith phoned to say that the
Production Performance Report had been returned.
1. In the hope of satisfying their curiosity, the negotiating team asked Annie to bring the Report
down to the office. Had any mistakes been made?
Was JPC's offer really 9.8 per cent? If not, what was the true offer?