Yule 1911
http://hdl.handle.net/2027/mdp.39015033708259
Generated for Lawrence J Hubert (University of Illinois at Urbana-Champaign) on 2013-09-06 15:49 GMT / http://hdl.handle.net/2027/mdp.39015033708259
Public Domain in the United States, Google-digitized / http://www.hathitrust.org/access_use#pd-us-google
THEORY OF STATISTICS.
AN INTRODUCTION TO THE
Twenty-sixth Annual Issue.
AND IRELAND.
PAPERS read during the Session 1908-1909 before all the LEADING
Departments of Research :—
Anthropology.
and Architecture.
§ 10. Law.
§ 11. Literature.
§ 12. Psychology.
§ 13. Archaeology.
§ 14. Medicine.
Journal.
use for the progress of Science."—Lord Playfair, F.R.S., K.C.B., M.P., Past-President
"It goes almost without saying that a Handbook of this subject will be in time
one of the most generally useful works for the library or the desk."—The Times.
"British Societies are now well represented in the 'Year-Book of the Scientific
our great Scientific Centres, Museums, and Libraries throughout the Kingdom,
Scientific Work.
THEORY OF STATISTICS
BY
G. UDNY YULE,
LONDON:
1911.
PREFACE.
to render the book more suitable for the use of biologists and
and some of the more difficult parts of the subject have been
of some thirty lectures. For the rest, the chapters follow closely
the arrangement of the course, the three parts into which the
have also been added for the benefit, more especially, of the
for reading the greater part of the manuscript, and the proofs,
been of the greatest service, but also for much friendly help and
I can hardly hope that all errors in the text or in the mass
biguities, or obscurities.
G. U. Y.
December 1910.
CONTENTS.
INTRODUCTION.
1-3. The introduction of the terms " statistics," " statistical," into
usage
CHAPTER I.
CHAPTER II.
CONSISTENCE.
CHAPTER III.
ASSOCIATION.
CHAPTER IV.
PARTIAL ASSOCIATION.
CHAPTER V.
MANIFOLD CLASSIFICATION.
CHAPTER VI.
THE FREQUENCY-DISTRIBUTION.
CHAPTER VII.
AVERAGES.
CHAPTER VIII.
CHAPTER IX.
CORRELATION.
coefficient
CHAPTER X.
AND METHODS.
CHAPTER XI.
THE CORRELATION-COEFFICIENT.
CHAPTER XII.
PARTIAL CORRELATION.
CHAPTER XIII.
CHAPTER XIV.
CHAPTER XV.
NORMAL CURVE.
CHAPTER XVI.
NORMAL CORRELATION.
CHAPTER XVII.
given
Index
THEORY OF STATISTICS.
INTRODUCTION.
1-3. The introduction of the terms " statistics," " statistical," into the English
all derived, more or less indirectly, from the Latin status, in the
2. The first term is, however, of much earlier date than the two
"It is about forty years ago," says Zimmermann, "that that branch
of political knowledge, which has for its object the actual and
become a favourite study in Germany" (p. ii); and again (p. v),
3. Within the next few years the words were adopted by several
writers, notably by Sir John Sinclair, the editor and organiser of the
introduction.
2 Statistical Account, vol. xx., Appendix to "The History of the Origin and
appears to have been only half accomplished even after the founda-
first volume of the Journal, issued in 1838-9, are for the most
which were originally formed for the study of the state, on almost
numerical data concerning the state were still termed " statistical
and the same methods would not be applied. What, then, is this
common character?
In the first place, the methods available for eliminating the effect
following definitions:—
causes.
methods.
a marked extent" is necessary, since the term "statistics" is not
REFERENCES.
(1) John, V., Der Name Statistik; Weiss, Berne, 1883. A translation in
(2) Yule, G. U., "The Introduction of the Words 'Statistics,' ' Statistical,'
into the English Language," Jour. Roy. Stat. Soc., vol. lxviii., 1905,
p. 391.
(3) John, V., Geschichte der Statistik, 1te Teil, bis auf Quetelet; Enke,
Stuttgart, 1884. (All published; the author died in 1900. By far the
century.)
Economy are useful. For its importance as regards the English school
(6) Hull, C. H., The Economic Writings of Sir William Petty, together
CHAPTER I.
In the first place, the observer may note only the presence or
the dumb and speaking, or the insane and sane. The quantitative
of excess or defect, and stating the numbers of tall and short (or
the changes of a variable which can only possess two values, say
methods and principles developed for the case in which the observer
and most fundamental, and are best considered first. This and
third, and so on, every class being divided into two at each step.
classified into males and females; the members of each sex into
sane and insane; the insane males, sane males, insane females,
eight classes—tall with round green seeds, tall with round yellow
seeds, tall with wrinkled green seeds, tall with wrinkled yellow
class is divided into two sub-classes and no more, has been termed
notation for the classes formed, and for the numbers of observa-
attribute A.
classes so formed, viz. AB, Aβ, αB, αβ, include respectively the
blind and deaf, the blind but not-deaf, the deaf but not-blind, and
the neither blind nor deaf. If a third attribute be noted, e.g. in-
sanity, denoted say by C, the class ABC, includes those who are
at once deaf, blind, and insane, ABγ those who are deaf and blind
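(A)    denotes the number of A's, i.e. objects possessing attribute A
(α)    denotes the number of α's, i.e. objects not possessing attribute A
(AB)   denotes the number of AB's, i.e. objects possessing attributes A and B
(αB)   denotes the number of αB's, i.e. objects possessing B but not A
(ABC)  denotes the number of ABC's, i.e. objects possessing A, B, and C
(αBC)  denotes the number of αBC's, i.e. objects possessing B and C but not A
(αβC)  denotes the number of αβC's, i.e. objects possessing C but neither A nor B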
symbols given stand for the numbers of the blind, the not-blind,
the blind and deaf, the deaf but not blind, the blind, deaf, and in-
sane, the deaf and insane but not blind, and the insane but neither
If two classes are such that every attribute in the symbol for
in the symbol for the other, they may be termed contrary classes
order and its frequency as a frequency of the rth order. Thus AB,
orders respectively.
11. The classes of one and the same order fall into further
attributes A, B,C have been noted, the classes of the second order
cies the symbols for which are derived from any one positive
the second order, and the twelve classes of the second order which
same aggregate are kept together. Thus the frequencies for the
specified:—
Order 0.  N
Order 1.  (A)   (α)      (B)   (β)      (C)   (γ)
Order 2.  (AB)  (Aβ)  (αB)  (αβ)
          (AC)  (Aγ)  (αC)  (αγ)
          (BC)  (Bγ)  (βC)  (βγ)
Order 3.  (ABC)  (ABγ)  (AβC)  (Aβγ)
          (αBC)  (αBγ)  (αβC)  (αβγ)
(1)
statement.
A's to the number of A's that are B together with the number of
A's that are not B; and so on,—i.e. any class-frequency can always
(2)
enumerate more than the ultimate frequencies. All the others can
nutrition.
vations N.
(ABC) 57 (αBC) 78
(AβC) 86 (αβC) 65
total: N= 10,000.
classes, is—
2 × 2 × 2 × 2 . . . . = 2^n.
the data are completely given, but any other set containing the
total number of observations N, form one such set. They are alge-
2^n, as may be readily seen from the fact that if the Greek letters
are struck out of the symbols for the ultimate classes, they become
the symbols for the positive classes, with the exception of αβγ
is made up as follows :—
n(n − 1)
n(n − 1)(n − 2)
Example i., § 13. The latter gives directly the whole number of
observations and the totals of A's, B's, and C's. The former gives
the second-order frequencies (AB), (AC), and (BC), which are neces-
(αβ) = (α) − (αB)
(αβγ) = (αβ) − (αβC)
positive classes.
= 877 − 143 − 281 = 453
= 10,000 − 1086 − 286 + 135 − 453
= 10,135 − 1825 = 8310
and so on.
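The subtractions above can be checked mechanically. The sketch below is only an illustration: it assumes that 877, 143 and 281 stand for (A), (AC) and (ABγ), and that 1086, 286 and 135 stand for (B), (C) and (BC), which is what the arithmetic suggests. The function names are hypothetical.

```python
# Sketch: recover two ultimate class-frequencies by subtraction.
# The symbol attached to each figure is an assumption read off the arithmetic.

def a_beta_gamma(a, ac, ab_gamma):
    """(Aβγ) = (A) - (AC) - (ABγ): A's that are neither B nor C."""
    return a - ac - ab_gamma

def alpha_beta_gamma(n, b, c, bc, a_bg):
    """(αβγ) = N - (B) - (C) + (BC) - (Aβγ): objects with none of A, B, C."""
    return n - b - c + bc - a_bg

a_bg = a_beta_gamma(877, 143, 281)
abg = alpha_beta_gamma(10_000, 1086, 286, 135, a_bg)
print(a_bg, abg)
```

Both functions use nothing beyond the rule that any class-frequency is the sum of the two frequencies obtained by adding one further attribute, positive and negative.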
the mentally deranged and the returns of persons who are deaf
blind but not deranged; dumb and deranged but not blind;
blind and deranged but not dumb; blind, dumb, and deranged.
thus given are (A), (B), (C), (ABγ), (AβC), (αBC), (ABC) (cf.
Census of England and Wales, 1891, vol. iii., tables 15 and 16,
basis (cf. ref. 5), the symbol A denoting that the object or in-
has seen the whole list of attributes of which note has been
taken, and this list he must bear in mind. The statement that
this respect, been quite clear. The "Blind" includes those who
are " Blind and Dumb," or " Blind, Dumb, and Lunatic," and so
forth. But the heading "Blind and Dumb," in the table relating
the first table the headings are inclusive, in the second exclusive.
REFERENCES.
(The method used in these chapters is that of Jevons, with the notation
(2) Yule, G. U., "On the Association of Attributes in Statistics, etc.," Phil.
Statistics," Biometrika, vol. ii., 1903, p. 121. (The first three sections
of (4) are an abstract of (2) and (3). The remarks made as regards the
nection with the remarks made at the beginning of (3) and in this
Material has been cited from, and reference made to the notation used in—
(5) Warner, F., and others, "Report on the Scientific Study of the Mental and
(6) Warner, F., "Mental and Physical Conditions among Fifty Thousand
Children, etc.," Jour. Roy. Stat. Soc., vol. lix., 1896, p. 125.
EXERCISES.
1. (Figures from ref. (5).) The following are the numbers of boys observed
(ABC)     149      (αBC)     204
(ABγ)     738      (αBγ)   1,762
(AβC)     225      (αβC)     171
(Aβγ)   1,196      (αβγ)  21,842
2. (Figures from ref. (5).) The following are the frequencies of the
N      23,713      (AB)     587
(A)     1,618      (AC)     128
(B)     2,015      (BC)     335
(C)       770      (ABC)    156
3. (Figures from Census, England and Wales, 1891, vol. iii.) Convert the
census statement as below into a statement in terms of (a) the positive, (b)
derangement.
N 29,002,525 (ABγ) 82
4. (Cf. Mill's Logic, bk. iii., ch. xvii., and ref. (1).) Show that if A
then will B occur in a larger proportion of the cases where A is than where
5. (Cf. De Morgan, Formal Logic, p. 163, and ref. (1).) Most B's are A's,
most B's are C's: find the least number of A's that are C's, i.e. the lowest
6. Given that
show that
(ABC) = (αβγ),
800 cases for one measurement, in 700 cases for another, and in 660 cases for
both measurements, in how many cases will both measurements on the wife
show that
CHAPTER II.
CONSISTENCE.
attributes.
convenient.
the symbols—
(UB) = ,, blind
general relations (2), § 13, Chap. I., using U to denote the common
in the form—
case
that are B together with the number of A's that are not-B
(α) = N − (A),
= N − (A) − (B) + (AB).
(αβγ) = (γ) − (Aγ) − (Bγ) + (ABγ)
N 1000 (AB) 42
they cannot have been observed in one and the same universe.
= 1000 − 1307 + 275 − 25
= −57.
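The test of consistence applied here is that no ultimate frequency may be negative. A minimal sketch follows, assuming that 1307 is the sum (A) + (B) + (C), that 275 is the sum (AB) + (AC) + (BC), and that 25 is (ABC). The function name is hypothetical.

```python
# Sketch: the frequency of the wholly negative class from the positive classes.
# Reading 1307 and 275 as first- and second-order totals is an assumption.

def all_negative_frequency(n, first_order_sum, second_order_sum, abc):
    """(αβγ) = N - [(A)+(B)+(C)] + [(AB)+(AC)+(BC)] - (ABC)."""
    return n - first_order_sum + second_order_sum - abc

value = all_negative_frequency(1000, 1307, 275, 25)
consistent = value >= 0  # a negative class-frequency is impossible
print(value, consistent)
```

A negative result shows only that the figures cannot all be right; it does not say which of them is wrong.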
misprint.
the values of all the unstated frequencies, and so verify the fact
positive, all others must be so, being derived from the ultimate
they must all exceed zero. Apart from this, any one frequency of
less than zero. We will consider the cases of one, two, and
are N and (A). The ultimate frequencies are (A) and (α), where
(α) = N − (A).
(A) ≮ 0,   N − (A) ≮ 0
(a), (c), and (d) are obvious; (b) is perhaps a little less obvious,
really of a new form, but may be derived at once from (1) (a) and
limits as is given by (2). The conditions (a) and (b) give lower or
minor limits to the value of (AB); (c) and (d) give upper or
major limits. If either major limit be less than either minor limit
(A) and (B) can take such values that this may be the case.
Expressing the condition that the major limits must be not less
(A) ≮ 0     (B) ≮ 0
(A) ≯ N     (B) ≯ N
These are simply the conditions of the form (1). If, therefore,
(A) and (B) fulfil the conditions (1), the conditions (2) must be
possible. The conditions (1) and (2) therefore give all the con-
will be negative.
from (1) (a) and (1) (b) by specifying the universe in turn as
BC, Bγ, βC, and βγ. The two conditions holding in four universes
possible to fulfil if any one of the major limits (e)-(h) be less than
any one of the minor limits (a)-(d). The values on the right
There are four major and four minor limits, or sixteen compari-
find, only lead back to conditions of the form (2) for (AB), (AC),
to contrary frequencies ( (a) and (h), (b) and (g), (c) and (f), (d)
(4)
in any way from those given under (1) and (2). They are con-
each other, whilst the inequalities of the form (2) are only conditions
(AB) and (AC), the conditions (4) give limits for the third, viz.
ditions :—
Example i.—Given that (A) = (B) = (C) = ½N and 80 per cent.
of the A's are B, 75 per cent. of A's are C, find the limits to the
NN
(a) gives a negative limit and (d) a limit greater than unity;
2(BC)/N ≮ 0·55     2(BC)/N ≯ 0·95
—that is to say, not less than 55 per cent. nor more than 95 per
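The limits of 55 and 95 per cent. can be reproduced numerically. The sketch below assumes that conditions (4) are the four third-order inequalities written in the comments, and it works with proportions of N, so that N = 1. The function name `limits_for_bc` is hypothetical.

```python
from fractions import Fraction

# Sketch, assuming conditions (4) take the form:
#   (a)  (AB) + (AC) + (BC) >= (A) + (B) + (C) - N
#   (b)  (AB) + (AC) - (BC) <= (A)
#   (c)  (AB) - (AC) + (BC) <= (B)
#   (d) -(AB) + (AC) + (BC) <= (C)

def limits_for_bc(n, a, b, c, ab, ac):
    """Return the (lower, upper) limits that (a)-(d) place on (BC)."""
    lower = max(a + b + c - n - ab - ac,  # from (a)
                ab + ac - a)              # from (b)
    upper = min(b - ab + ac,              # from (c)
                c + ab - ac)              # from (d)
    return lower, upper

half = Fraction(1, 2)                     # (A) = (B) = (C) = N/2, with N = 1
ab = Fraction(80, 100) * half             # 80 per cent. of the A's are B
ac = Fraction(75, 100) * half             # 75 per cent. of the A's are C
low, high = limits_for_bc(1, half, half, half, ab, ac)
share_low, share_high = low / half, high / half  # as proportions of the B's
print(float(share_low), float(share_high))
```

Exact fractions are used so that the limits come out as 11/20 and 19/20 without rounding error.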
N 1000
<98.
that are A and of C's that are A, it cannot be inferred that any B's
are C.
The conditions (a) and (b) give the lower limit of (BC), which
is required. We find—
consider that, even if all the B's were A, and of the remaining
22 A's 14 were C's, there would still be 8 A's that were neither
B nor C.
14. The student should note the result of the last example, as it
values of N, (A), (B), (C), (AB), and (AC), it will often happen
that any value of (BC) not less than zero (or, more generally, not
less than either of the lower limits (2) (a) and (2) (b)) will satisfy
possible. The argument of the type "So many A's are B and
so many B's are C that we must expect some A's to be C" must
REFERENCES.
(1) Morgan, A. de, Formal Logic, 1847 (chapter viii., "On the Numerically
Definite Syllogism").
(2) Boole, G., Laws of Thought, 1854 (chapter xix., "Of Statistical Condi-
tions").
The above are the classical works with respect to the general theory
EXERCISES.
urban district of Bury, 817 per tho^li J H [C-7664J 1894). If, in the
years of age were returned as '' occupied 'Tti 'Ji^TT'' between 20 and 25
men.
4. (Material from ref. 5 of Chap. I.) The following are the proportions
per 10,000 of boys observed, with certain classes of defects amongst a number
dulness.
Show that some dull boys do not exhibit development defects, and state how
Show that some defectively developed girls are not dull, and state how many
6. Take the syllogism "All A's are B, all B's are C, therefore all A's are
C," express the premisses in terms of the notation of the preceding chapter,
and deduce the conclusion by the use of the general conditions of consistence
7. Do the same for the syllogism "All A's are B, no B's are C, therefore
8. Given that (A) = (B) = (C) = ½N, and that (AB)/N = (AC)/N = p, find
what must be the greatest or least values of p in order that we may infer
9. Show that if
NNN
and (AB)/N = (AC)/N = (BC)/N = y,
CHAPTER III.
ASSOCIATION.
(AB)/(B) = (Aβ)/(β)     (1)
(αB)/(B) = (αβ)/(β)
(AB)/(A) = (αB)/(α)
(Aβ)/(A) = (αβ)/(α)
{(B) − (AB)}/(B) = {(β) − (Aβ)}/(β)
(AB)/(B) = (Aβ)/(β) = {(AB) + (Aβ)}/{(B) + (β)} = (A)/N,
i.e. in words, "the proportion of A's amongst the B's is the same
(AB)/(B) = (A)/N     (a)
(AB)/(A) = (B)/N     (b)
(AB) = (A)(B)/N     (c)     (2)
(AB)/N = (A)/N · (B)/N     (d)
tion of B's.
The advantage of the forms (2) over the form (1) is that they
Example i.— If there are 144 A's and 384 B's in 1024 observa-
tions, how many AB's will there be, A and B being independent?
(AB) = 144 × 384 / 1024 = 54.
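As a check on the arithmetic, the independence value (A)(B)/N can be computed directly. The sketch below is a small illustration with a hypothetical function name. It also evaluates the same rule for the two negative attributes, using (α) = 1024 − 144 and (β) = 1024 − 384.

```python
# Sketch: the frequency to be expected when two attributes are independent.

def independence_frequency(a, b, n):
    """Return (A)(B)/N, the value of (AB) if A and B are independent."""
    return a * b / n

n = 1024
ab_expected = independence_frequency(144, 384, n)
alpha_beta_expected = independence_frequency(n - 144, n - 384, n)
print(ab_expected, alpha_beta_expected)
```

The same function serves for percentages if N is taken as 100, for instance 60 × 35 / 100 = 21.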
Example ii.—If the A's are 60 per cent., the B's 35 per cent., of
independent?
60 × 35 / 100 = 21,
and therefore there must be 21 per cent. (more or less closely, cf.
3. It follows from § 1 that if the relation (2) holds for any one
from (1)—
(AP)-(AB) + (APl-(A)
giving
w)-(T-
And again,
which gives
(αβ) = 880 × 640 / 1024 = 550.
(αB) = (B) − (AB).
(Aβ) = (A) − (AB).
Therefore—
(AB) (aB)
The equation (b) may be read "The ratio of A's to a's amongst
Clearly (AB)(αβ) > (αB)(Aβ),
Then if (AB) > (A)(B)/N,
(AB) < (A)(B)/N,
The student should notice that these words are not used
some A's are B's, but that the number of A's which are B's exceeds
it is not meant that no A's are B's, but that the number of A's
mere fact that some A's are B's, however great that proportion;
in mind.
N, (A), and (B) is either (A) or (B) (whichever is the less). When
of the cases, "All A's are B" or "All B's are A," or it might be
"No A's are B," or "no α's are β," or more narrowly to the
case when both these statements are true. The greater the
100 pairs may give such results as the following (taken from an
actual record) :—
„ „ „ tails . . .18
„ „ „ tails . . .29
the second, we have from the above (A) = 44, (B) = 53. Hence
44 x 53
the result of the first throw and the result of the second. But it
is fairly certain, from the nature of the case, that such association
years, nor exactly the same proportion of male births when the
unproved.
B. The first two, (a) and (b), follow at once from the definition
across and expanding (A) and N in the first case, (B) and N
student.
(AB)/(B) > (A)/N     (a)
(AB)/(A) > (B)/N     (b)
(AB)/(B) > (Aβ)/(β)     (c)
(AB)/(A) > (αB)/(α)     (d)
(^) 04)
(j8) < ^
W)^ 03)
(/)
(afl) (a)
(a5) (B)
(?)
(A)
(«) ^
(a/3) M
(«0L (/3)
(i)
(A)
(ft ^
(«) *
(a/3) (afl)
03) <*)
m.(AB)
(0
(»)
(a) M)
10. Two principles should decide this point: (1) of any two
comparisons, that is the better which brings out the more clearly
B's as compared with the B's (as in (c)), and not with the propor-
tion of the α's in those two universes (as in (f)); or with the
universe (a), and not with the proportion of α's amongst the
β's as compared with the whole universe (j). That is simply the
(A)/N = (AB)/(B) · (B)/N + (Aβ)/(β) · (β)/N.
(AB)/(B) = ·70
(Aβ)/(β) = ·40
(AB)/(B) = ·70     (A)/N = ·67
(A)/N = ·7 × ·9 + ·4 × ·1 = ·67
unless the value of (B)/N (or (A)/N in the second case) is known,
vi. below).
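The dependence of (A)/N on (B)/N can be made concrete. The sketch below treats (A)/N as the weighted mean of (AB)/(B) and (Aβ)/(β), with weights (B)/N and (β)/N. The function name is hypothetical, and the second call uses an equal split between B's and β's, chosen only for contrast.

```python
from fractions import Fraction

# Sketch: (A)/N = (AB)/(B) * (B)/N + (Aβ)/(β) * (β)/N.

def overall_proportion(p_in_b, p_in_beta, share_b):
    """Weighted mean of the proportions of A's amongst the B's and the β's."""
    return p_in_b * share_b + p_in_beta * (1 - share_b)

p_b, p_beta = Fraction(70, 100), Fraction(40, 100)
with_nine_tenths_b = overall_proportion(p_b, p_beta, Fraction(9, 10))
with_half_b = overall_proportion(p_b, p_beta, Fraction(1, 2))  # illustrative only
print(float(with_nine_tenths_b), float(with_half_b))
```

The same pair of proportions gives a different (A)/N for every value of (B)/N, which is why a comparison of (AB)/(B) with (A)/N alone cannot stand in for a comparison with (Aβ)/(β).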
There still remains the choice between (a) and (b), or between
(c) and (d). This must be decided with reference to the second
Females „ „ . . 16,848,000
and (aB)/(a), i.e. the proportion of males that died and the
(AB)/(A) = 285,618 / 15,773,000
(αB)/(α) = 265,967 / 16,848,000
„ „ Females . . 15·8 „
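The two death-rates can be recomputed from the figures quoted. The sketch below assumes that 15,773,000 and 16,848,000 are the numbers of males and females living, and that 285,618 and 265,967 are the corresponding numbers of deaths. The function name is hypothetical.

```python
# Sketch: comparison (d), i.e. (AB)/(A) against (αB)/(α), as rates per thousand.

def rate_per_thousand(deaths, population):
    """Deaths per thousand living."""
    return 1000 * deaths / population

male_rate = rate_per_thousand(285_618, 15_773_000)    # (AB)/(A)
female_rate = rate_per_thousand(265_967, 16_848_000)  # (αB)/(α)
print(round(male_rate, 1), round(female_rate, 1))
```

The comparison is of the form that asks how far death depends on sex, and not how far sex depends on death.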
point to be investigated.
A comparison of the form (4) (c) is again valid for testing the
with death-rates, and not with the sex-ratios of the living and
imbecility.
mutes by (B). One of the comparisons (a) or (b) may very well
be used in this case, seeing that (A)/N and (B)/N differ very
mutes = (AB)/(B)
population = (A)/N
Fathers with light eyes and sons with light eyes (AB) . 471
father is reckoned once for each son; e.g. a family in which the
father was light-eyed, two sons light-eyed and one not, would be
reckoned as giving two to the class AB and one to the class Aβ.
[N − (A)](B)
= (αB)₀ − δ.
∴ (AB) − (AB)₀ = (αB)₀ − (αB).
(Aβ) = (Aβ)₀ − δ.
(αβ) = (αβ)₀ + δ
then
If, now, A and B are positively associated, and (AB) = say 35,
35 - 27 = 30 - 22 = 18 - 10 = 33 - 25 = 8.
and 19 - 27 = 14 - 22 = 18 - 26 = 33 - 41 = - 8.
δ = (AB) − (AB)₀ = (AB) − (A)(B)/N
= (1/N){(AB)(αβ) − (αB)(Aβ)}.
δ = (1/100){35 × 30 − 25 × 10} = 8
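The two expressions for δ can be compared on the figures of the illustration. The sketch below takes the four second-order frequencies as (AB) = 35, (Aβ) = 10, (αB) = 25, (αβ) = 30, which are the values implied by the differences 35 − 27, 18 − 10, 33 − 25 and 30 − 22. The function names are hypothetical.

```python
# Sketch: δ computed two ways from the four second-order frequencies.

def delta_from_cross_products(ab, a_beta, alpha_b, alpha_beta):
    """δ = {(AB)(αβ) - (αB)(Aβ)} / N."""
    n = ab + a_beta + alpha_b + alpha_beta
    return (ab * alpha_beta - alpha_b * a_beta) / n

def delta_from_independence(ab, a_beta, alpha_b, alpha_beta):
    """δ = (AB) - (A)(B)/N."""
    n = ab + a_beta + alpha_b + alpha_beta
    a, b = ab + a_beta, ab + alpha_b
    return ab - a * b / n

positive = delta_from_cross_products(35, 10, 25, 30)
negative = delta_from_cross_products(19, 26, 41, 14)
print(positive, negative)
```

The second set of figures is the negatively associated case of the text, in which every frequency differs from its independence value by 8 in the opposite sense.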
37
„ „ smooth (Aβ)
„ „ smooth (αβ)
47
12
21
acter of fruit.
prickly fruits . . . . . ) r
13. While the methods used in the preceding pages suffice for
association:—
(1)                      (2)                      (3)
(AB)   0     (A)         (AB)  (Aβ)   (A)         (AB)   0     (A)
(αB)  (αβ)   (α)          0    (αβ)   (α)          0    (αβ)   (α)
(B)   (β)    N           (B)   (β)    N           (B)   (β)    N
In the first case all A's are B, and so (Aβ) = 0; in the second
all B's are A and so (αB) = 0; and in the third case we have (A) =
(B) = (AB), so that all A's are B and also all B's are A. The
(4)                      (5)                      (6)
 0    (Aβ)   (A)         (AB)  (Aβ)   (A)          0    (Aβ)   (A)
(αB)  (αβ)   (α)         (αB)   0     (α)         (αB)   0     (α)
(B)   (β)    N           (B)   (β)    N           (B)   (β)    N
+ 1 in the first three cases, - 1 in the second three, and shall also
expression—
Q = {(AB)(αβ) − (Aβ)(αB)} / {(AB)(αβ) + (Aβ)(αB)}
  = Nδ / {(AB)(αβ) + (Aβ)(αB)}
where δ is the symbol used in the two last sections for the
tions of sampling.
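The behaviour of the coefficient at its limits can be checked on small tables. The sketch below writes the coefficient as the difference of the cross-products divided by their sum, which is the form commonly denoted Q. The function name is hypothetical, and the tables used are the 35, 10, 25, 30 illustration, the independence values 54, 90, 330, 550 for 144 A's and 384 B's in 1024, and two made-up tables with a zero cell.

```python
from fractions import Fraction

# Sketch: coefficient of association from the four second-order frequencies.

def association_coefficient(ab, a_beta, alpha_b, alpha_beta):
    """{(AB)(αβ) - (Aβ)(αB)} / {(AB)(αβ) + (Aβ)(αB)}."""
    same = ab * alpha_beta       # cross-product of like signs
    unlike = a_beta * alpha_b    # cross-product of unlike signs
    return Fraction(same - unlike, same + unlike)

q_positive = association_coefficient(35, 10, 25, 30)
q_independent = association_coefficient(54, 90, 330, 550)
q_complete = association_coefficient(40, 0, 20, 40)     # (Aβ) = 0, made-up table
q_disassociated = association_coefficient(0, 40, 20, 0)  # (AB) = (αβ) = 0, made-up
print(q_positive, q_independent, q_complete, q_disassociated)
```

The coefficient is zero when the attributes are independent, +1 when one cross-product of unlike signs vanishes, and −1 when a cross-product of like signs vanishes, as the six limiting cases require.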
referred to the reference (1) at the end of this chapter; and for a
theory of variables, which has come into more general use, to ref.
sake of emphasis, that (cf. § 5) the mere fact of 80, 90, or 99 per
A's and B's, or concerning a universe that includes both α's and
in the whole population was only 1·5 per thousand; nor would
REFERENCES.
Trans. Roy. Soc., Series A, vol. cxciv., 1900, p. 257. (Deals fully
suggested.)
has since been largely used: only the advanced student will be able to
pendently.)
EXERCISES.
1. At the census of England and Wales in 1901 there were (to the nearest
1000) 15,729,000 males and 16,799,000 females; 3497 males were returned
childhood and sex. How many of each sex for the same total number would
(a) N = 5000, (A) = 2350, (B) = 3100, (AB) = 1600.
(AB) = 294, (α) = 570, (αB) = 380.
(αB) = 768, (Aβ) = 48, (αβ) = 144.
cf. ref. 1, p. 294.) The table below gives the numbers of plants of certain
species that were above or below the average height, stating separately those
Parentage Cross-fer-
tilised. Height—
Parentage Self-fer-
tilised. Height—
Species.
Above
Average.
Below
Average.
Above
Average.
Below
Average.
Ipomœa purpurea
Petunia violacea
Reseda lutea
Reseda odorata .
Lobelia fulgens.
63
61
25
39
17
10
16
16
17
18
13
11
25
12
55
64
21
30
22
4. (Figures from same source as Example vii. p. 34, but material differently
association between darkness of eye-colour in father and son from the following
Also tabulate for comparison the frequencies that would have been observed
had there been strict independence between eye colour of husband and eye
6. (Figures from the Census of England and Wales, 1891, vol. iii.: the data
number of males in successive age groups, together with the number of the
from childhood to old age, tabulating the proportions of insane amongst the
whole population and amongst the blind, and also the association coefficient.
Age-group:  5-   15-   25-   35-   45-   55-   65-   75 and upwards
N:     3,304,230   2,712,521   2,089,010   1,611,077   1,191,789   770,124   444,896   161,692
(A):   844   1,184   1,166   1,501   1,752   1,905   1,932   1,701
(B):   2,820   6,226   8,482   9,214   8,187   6,799   3,412   1,098
(AB):  17   19   19   31   32   34   22
7. Show that if
be two aggregates corresponding to the same values of (A), (B), (α), and (β),
8. Show that if
δ = (AB) − (AB)₀
PARTIAL ASSOCIATION.
(AB) > or < (A)(B)/N
this kind: it is argued that the relation between the two attributes
matter clearer:—
wholly due to the fact that most of the unvaccinated are drawn from
the chapter).
attribute in the father and its presence in the son; and also
and attack were drawn from one narrow section of the population
hygienic conditions.
tives winning elections when they spend more than their opponents
and when they spend less, we shall avoid the possible fallacy. If
those cases in which all the parents, say, possess the attribute, or
else all do not, and it is still sensible, then the association first
and C.
(2)
above apply in the general case where more than three attributes
have been noted, or where the relations of more than three have
(ABCD) > (ACD)(BCD)/(CD),
The Report from which the figures are drawn concludes that "the
form (4) (a) or (b) of Chap. III. (p. 31), or (2) (a) (b) above, may
pp. 31-2).
A and D for the whole universe, the B-universe and the β-
universe:—
789
and D is very high indeed both for the material as a whole (the
table 20, p. 216. The table only gives particulars for 78 large
Children.               Parents.                Grandparents.
A. Light-eyed.          B. Light-eyed.          C. Light-eyed.
α. Not light-eyed.      β. Not light-eyed.      γ. Not light-eyed.
45
11
13
34
11
40
(ABC)   1928      (αBC)   303
(ABγ)    596      (αBγ)   225
(AβC)    552      (αβC)   395
(Aβγ)    508      (αβγ)   501
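Every other class-frequency follows from these eight ultimate frequencies by addition. The sketch below is one hypothetical way of organising that addition: each class is keyed by three truth-values for A, B and C, and `freq` sums the cells that match whatever attributes are specified. It then forms the proportions of light-eyed children in the sub-universes of light-eyed parents.

```python
# Sketch: any class-frequency as a sum of ultimate class-frequencies.
# Keys are (A present, B present, C present); the layout is an assumption.

CELLS = {
    (True, True, True): 1928,    # (ABC)
    (False, True, True): 303,    # (αBC)
    (True, True, False): 596,    # (ABγ)
    (False, True, False): 225,   # (αBγ)
    (True, False, True): 552,    # (AβC)
    (False, False, True): 395,   # (αβC)
    (True, False, False): 508,   # (Aβγ)
    (False, False, False): 501,  # (αβγ)
}

def freq(a=None, b=None, c=None):
    """Sum the ultimate frequencies agreeing with the attributes specified."""
    wanted = (a, b, c)
    return sum(count for key, count in CELLS.items()
               if all(w is None or w == k for w, k in zip(wanted, key)))

n = freq()                                         # whole universe
p_bc = freq(a=True, b=True, c=True) / freq(b=True, c=True)     # (ABC)/(BC)
p_bg = freq(a=True, b=True, c=False) / freq(b=True, c=False)   # (ABγ)/(Bγ)
print(n, freq(c=True), round(p_bc, 3), round(p_bg, 3))
```

Comparing (ABC)/(BC) with (ABγ)/(Bγ) is the test of partial association between A and C in the universe of B's.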
parents J »"
grandparents . . . . J"'
tested: (1) where the parents are light-eyed, (2) where they are
parents 1
grandparents . . . . J '"
parents )<-(>)
)\JABy)
j (/37) .
grandparents . . . y>
only.
grandparents . . J
or B or both.
(ABC) = (AC)(BC)/(C)     (3)
(y) (y)
and we have
This proves the theorem; for the right-hand side will not be
Take the following case, for example. Suppose there have been
from some disease. Suppose, further, that the death-rate for males
(the case mortality) has been 30 per cent., for females 60 per cent.
here concerned, are death, treatment and male sex. The data show
that more males were treated than females, and more females
tively, the second positively, with the third. It follows that there
                              Males.   Females.   Total.
Treated and died                24        24        48
Treated and did not die         56        16        72
Not treated and died             6        36        42
Not treated and did not die     14        24        38
i.e. of the treated, only 48/120 = 40 per cent. died, while of those
not treated 42/80 = 52.5 per cent. died. If this result were stated
the different proportions of the two that were treated and to the
a fair return, either the results for the two sexes should be
would have for each line and for a mixed record of equal
numbers—
•25
17
J)
)I
I 25
17
))
J)
I 25
81
53
JI
children with . . J r r r
children without .
is, however, due solely to the association of both with male sex.
The student will see that if records for male-female and female-
and that if all four lines were combined there would be no illusory
association at all.
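The treatment example given earlier in this chapter can be checked numerically. The following is a minimal sketch in Python, not part of the original text. It uses the cell counts of the table of deaths by sex and treatment. The count of 6 deaths among untreated males is an inference (42 − 36), and the names `counts` and `rate` are illustrative only.

```python
# Cell counts from the treatment table: (sex, treatment) -> deaths and survivals.
# The 6 deaths among untreated males is inferred as 42 - 36 (an assumption).
counts = {
    ("male", "treated"): {"died": 24, "survived": 56},
    ("female", "treated"): {"died": 24, "survived": 16},
    ("male", "untreated"): {"died": 6, "survived": 14},
    ("female", "untreated"): {"died": 36, "survived": 24},
}


def rate(sex=None, treatment=None):
    """Death-rate in the sub-universe selected by sex and/or treatment."""
    cells = [
        cell
        for (s, t), cell in counts.items()
        if (sex is None or s == sex) and (treatment is None or t == treatment)
    ]
    died = sum(cell["died"] for cell in cells)
    total = sum(cell["died"] + cell["survived"] for cell in cells)
    return died / total


# Pooled record: treatment appears to lower the death-rate.
pooled_treated = rate(treatment="treated")      # 48/120
pooled_untreated = rate(treatment="untreated")  # 42/80

# Within each sex the death-rate is the same whether treated or not.
male_rates = (rate("male", "treated"), rate("male", "untreated"))
female_rates = (rate("female", "treated"), rate("female", "untreated"))

print(pooled_treated, pooled_untreated)
print(male_rates, female_rates)
```

The pooled rates differ (40 per cent. against 52.5 per cent.), while the rates within each sex do not, which is the illusory association the text describes.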
versâ; in such a case A and B (so far as the record goes) will both
are not well defined, one observer may be more generous than
before.
$$(AB) = \frac{(AC)(BC)}{(C)} + \frac{(A\gamma)(B\gamma)}{(\gamma)} + \delta_1 + \delta_2 \qquad (6)$$
Hence if the value of (AB) exceed the value given by the first
both. If, on the other hand, (AB) fall short of the value given by
(AB) be equal to the value of the first two terms, A and B must
The following are the death-rates per thousand per annum, and the
Proportion
Death-rate
per thousand
per thousand.
over 65 Years
of Age.
46
Farmers „ ,, . . 19.6
132
34
16
rates among occupied males from 15-65 and over 65 years of age
each of its separate age-groups (under 65, over 65), and see
The calculated rate for farmers largely exceeds the actual rate;
and glass working still more so (13.0 < 16.6); the actual low total
variations not only in the relative proportions of the old, but also
occupation depends not only on the mere proportions over and under
65, but on the relative numbers at every single year of age. The
one, the nature of the fallacy involved in assuming that crude death-
parent.
(C) = 3178
$$\frac{(AB)(BC)}{(B)} + \frac{(A\beta)(\beta C)}{(\beta)} = 1845.0 + 513.2 = 2358.2.$$
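The figure 2358.2 can be reproduced from the eight third-order frequencies of the eye-colour data (A children, B parents, C grandparents). The sketch below is in Python and is not the author's notation. It writes lower-case `a`, `b`, `g` for α, β, γ, and the dictionary and variable names are illustrative assumptions.

```python
# Third-order class-frequencies for eye-colour in three generations.
# Upper case = light-eyed; a, b, g stand for the contraries alpha, beta, gamma.
freq = {
    "ABC": 1928, "aBC": 303,
    "ABg": 596,  "aBg": 225,
    "AbC": 552,  "abC": 395,
    "Abg": 508,  "abg": 501,
}

# Second-order and first-order frequencies obtained by addition.
AB = freq["ABC"] + freq["ABg"]
BC = freq["ABC"] + freq["aBC"]
B = freq["ABC"] + freq["aBC"] + freq["ABg"] + freq["aBg"]
Ab = freq["AbC"] + freq["Abg"]
bC = freq["AbC"] + freq["abC"]
b = freq["AbC"] + freq["abC"] + freq["Abg"] + freq["abg"]

# Value (AC) would take if A and C were independent within both
# the universe of B's and the universe of beta's.
expected_AC = AB * BC / B + Ab * bC / b

# Value of (AC) actually observed.
actual_AC = freq["ABC"] + freq["AbC"]

print(round(expected_AC, 1), actual_AC)
```

The observed (AC) exceeds the value computed on the hypothesis of independence within the two sub-universes, so some partial association between A and C remains.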
(ABC).
tion of them all for any case in which n is greater than four
student will find that there are no less than 270, and for six
that was asked. In Example ii., again, the three total associations
(G), ... of order lower than the second, assigned values of the
of the second and higher orders is only $2^n - (n+1)$; therefore the
$2^n - (n+1)$
11
26
57
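The counts quoted here can be checked with a short Python sketch. Two formulas are assumed rather than taken from the surviving text: the number of algebraically independent associations is read as 2^n − (n + 1), and the total number of associations, total and partial, is taken as C(n, 2) · 3^(n−2) (choose the pair, then let each remaining attribute be specified as present, specified as absent, or left unspecified). The function names are illustrative.

```python
from math import comb


def total_associations(n: int) -> int:
    """All total and partial associations among n attributes (assumed formula)."""
    return comb(n, 2) * 3 ** (n - 2)


def independent_associations(n: int) -> int:
    """Class-frequencies left free once N and the n first-order totals are given."""
    return 2 ** n - (n + 1)


for n in range(2, 7):
    print(n, total_associations(n), independent_associations(n))
```

For n = 4, 5, 6 the second function gives 11, 26 and 57, and the first gives 270 for five attributes, in agreement with the figures in the text.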
12. Practically, however, the mere fact that they can be deduced
this is not the case. The relations that exist between the ratios
determined from those that are given without more or less lengthy
[<
-[WMfp],
differences for the frequencies (AB), (AC), and (BC). The four
Clearly, the relation is not of such a simple kind that the term on
13. The particular case in which all the $2^n - (n+1)$ given associa-
may term it, exists. Suppose, for instance, that we are given—
(ABXBC)-(AB)(AC)
(ABC'-—(B) (Ay'
(ABy^-(ABC^J^m
-W(B)(y) .My)(By)
*2 (?)'
form of the equation of independence, (2) (d), Chap. III. (p. 26).
14. It must be noted, however, that (8) is not a criterion for the
equation
NN"N
are given N, (A), and (B), and the last relation quoted holds
good, we know that similar relations must hold for (Aβ), (αB),
and (αβ). If N, (A), (B), and (C) be given, however, and the
attributes, while N, (A), (B), (C) are only four: the equation (8)
third order before the conclusion can be drawn that it holds good
student.
Quite generally, if N, (A), (B), (C), .... be given, the relation
N N - N " N y'
must be shown to hold good for $2^n - (n+1)$ of the nth order classes
only because
$2^n - (n+1) = 1$
N~ N " N'
relation (9) holds good; but it does not follow that if the relation
REFERENCES.
Trans. Roy. Soc., Series A, vol. cxciv., 1900, p. 257. (Deals fully
mixing of records.)
EXERCISES.
1. Take the following figures for girls corresponding to those for boys in
Example i., p. 45, and discuss them similarly, but not necessarily using
exactly the same comparisons, to see whether the conclusion that "the
connecting link between defects of body and mental dulness is the coincident
N      10,000        (AB)     248
(A)       682        (AD)     307
(B)       850        (BD)     363
(D)       689        (ABD)    128
2. (Material from Census of England and Wales, 1891, vol. iii.) The
following figures give the numbers of those suffering from single or combined
infirmities: (1) for all males, (2) for males of 55 years of age and over.
(1)
All Males.
(2)
Males 55-
(1)
All Males.
(2)
Males 55
14,053,000
1,377,000
[AB) 183
65
(A)
12,281
5,538
(AC) 51
(BC) 299
(ABC) 11
14
(B)
(C)
45,392
10,309
47
7,707
746
blindness and mental derangement, and the partial association between the
same two infirmities among deaf-mutes, (1) for males in general, (2) for those
of 55 years of age or over. Give a short verbal statement of the results, and
The death-rate from cancer for occupied males in general (over 15) is
The death-rates from cancer for occupied males under and over 45 respec-
tively are 0.13 and 2.25 respectively. Of the farmers 46.1 per cent. are over
45.
years of age and 93 per cent. under. The death-rates are 12 per thousand per
N        1000        (AB)     358
(A)       622        (AC)     471
(B)       558        (BC)     419
(C)       617
$(AB) - N/4 = \delta_1$
$(AC) - N/4 = \delta_2$
$(BC) - N/4 = \delta_3$
show that
»[(^0).fe«^a]-.[(^y)-fc^M]^.iy
so that the partial associations between A and B in the universes C and γ are
Conservative opposed one Liberal and there were no other candidates) 66 per
cent. of the winning candidates (according to the returns) spent more money
than their opponents. Given that 63 per cent of the winners were Con-
per cent. of the contests, find the percentages of elections won by Conservatives
(1) when they spent more and (2) when they spent less than their opponents,
and hence say whether you consider the above figures evidence of the influence
find the major and minor limits to y that enable one to infer positive associa-
as co-ordinates, and shading the limits within which y must lie in order to
permit of the above inference. Point out the peculiarities in the case of in-
11. Discuss similarly the more complex case (A)/N = x, (B)/N = 2x, (C)/N = 3x:
(AO)/N=y.
(BC)/N=y.
(BO)/N=y.
CHAPTER V.
MANIFOLD CLASSIFICATION.
fold and of the B's t-fold, the frequencies of the st classes of the
of the two characters, say A_m and B_n, i.e. the frequency of the
column and the nth row, the st compartments thus giving all
and the feet of columns give the first-order frequencies, i.e. the
numbers of A_m's and B_n's, and finally the grand total at the
and Wales are divided into those which are in (1) London, (2)
other urban districts, (3) rural districts, and the houses in each
                          Inhabited.   Uninhabited.   Building.   Total.
London                        571           40            5         616
Other urban districts        4064          285           45        4394
Rural districts              1625          124           12        1761
Total                        6260          449           62        6771
the eye-colours are classed under the three heads "blue," "grey or
                              Hair-colour.
Eye-colour.        Fair.   Brown.   Black.   Red.    Total.
Blue               1768     807      189      47     2811
Grey or Green       946    1387      746      53     3132
Brown               115     438      288      16      857
Total              2829    2632     1223     116     6800
read similarly to the last. Taking the first row, it tells us that
there were 2811 men with blue eyes noted, of whom 1768 had
fair hair, 807 brown hair, 189 black hair, and 47 red hair.
Similarly, from the first column, there were 2829 men with fair
hair, of whom 1768 had blue eyes, 946 grey or green eyes, and
115 brown eyes. The tables are a generalised form of the four-
between the A's and the B's, any such table may be treated on
association between any one or more of the A's and any one or
urban districts . . . )
rural districts . . . )
between the " uninhabitedness " of houses and the urban character
districts . . . . )
districts . . . . )
and columns 1, 2, and 4—the column for red being really mis-
last case.
B's; and if so, is this dependence very close, or the reverse? The
briefly and incidentally, for where there are only four classes of the
where the number is, say, twenty-five or more, and the need for any
based, are, however, quite simple and fundamental, and the mode of
values of m and n—
$$(A_mB_n) - (A_mB_n)_0 = \delta_{mn} \qquad (1)$$
add them together, for the sum of all the values of 8, some of
which are negative and others positive, must be zero in any case,
the sum of both the (AB)'s and the (AB)0's being equal to the
get rid of the signs, and this may be done in two simple ways: (1)
ences and then summing the squares. The first process is the
(AB)_0, and that the sum of all such ratios is, say, χ²; or, in
$$\chi^2 = \sum\left\{\frac{\delta^2}{(AB)_0}\right\} \qquad (3)$$
to the root, for the coefficient simply shows whether the two
coefficient.1 t
the same data classified in two different ways should be, at least
the case: if certain data be classified in, say, (1) 6 x 6-fold, (2)
V"
s
(6)
which (Am) = (Bm) for all values of m; and suppose, further, that
gency; the ratio χ²/N, which he denotes by φ², the mean square contingency;
and the sum of all the δ's of one sign only, on which a different coefficient can
value of C—
$$C = \sqrt{\frac{t-1}{t}}$$

For t = 2,  C cannot exceed 0.707
 "  t = 3,    "      "      0.816
 "  t = 4,    "      "      0.866
 "  t = 5,    "      "      0.894
 "  t = 6,    "      "      0.913
 "  t = 7,    "      "      0.926
 "  t = 8,    "      "      0.935
 "  t = 9,    "      "      0.943
 "  t = 10,   "      "      0.949
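The tabulated limits agree with the expression sqrt((t − 1)/t) for a t × t-fold classification. That expression is assumed here from the tabulated values and is not quoted from the legible text. A short Python sketch, with the illustrative name `max_contingency`, reproduces the column:

```python
from math import sqrt


def max_contingency(t: int) -> float:
    """Greatest value the coefficient C can reach in a t x t-fold table,
    assuming the limit sqrt((t - 1) / t)."""
    return sqrt((t - 1) / t)


limits = {t: round(max_contingency(t), 3) for t in range(2, 11)}
for t, limit in limits.items():
    print(f"t = {t:2d}   C cannot exceed {limit:.3f}")
```

The limit approaches unity only slowly as t increases, which is why coefficients from coarse and fine classifications of the same data are not directly comparable.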
time the classification must not be made too fine, or else the value
                          Hair colour.
Eye colour.        Fair.   Brown.   Black.   Red.
Blue               1169    1088      506     48.0
Grey or Green      1303    1212      563     53.4
Brown               357     332      154     14.6
crude for the purpose of calculating the coefficient, but will serve
(1768)²/1169  =  2673.9
(946)²/1303   =   686.8
(115)²/357    =    37.0
(807)²/1088   =   598.6
(1387)²/1212  =  1587.3
(438)²/332    =   577.8
(189)²/506    =    70.6
(746)²/563    =   988.5
(288)²/154    =   538.6
(47)²/48.0    =    46.0
(53)²/53.4    =    52.6
(16)²/14.6    =    17.5
      Total = S = 7875.2
              N = 6800
          S − N = 1075.2
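The arithmetic above may be repeated by machine. The Python sketch below is not the author's working. It forms the independence-values from the marginal totals without rounding them, so its result differs slightly from the 1075.2 obtained with rounded values. The last step assumes the usual form of the coefficient of mean square contingency, C = sqrt(χ²/(N + χ²)); the variable names are illustrative.

```python
from math import sqrt

# Observed frequencies: rows = eye-colour (blue, grey or green, brown),
# columns = hair-colour (fair, brown, black, red).
observed = [
    [1768, 807, 189, 47],
    [946, 1387, 746, 53],
    [115, 438, 288, 16],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# S = sum of (observed)^2 / (independence-value), with the independence-value
# (A_m)(B_n)/N left unrounded.
s = sum(
    observed[i][j] ** 2 / (row_totals[i] * col_totals[j] / n)
    for i in range(len(row_totals))
    for j in range(len(col_totals))
)

chi_squared = s - n
c = sqrt(chi_squared / (n + chi_squared))  # assumed form of the coefficient

print(n, round(s, 1), round(chi_squared, 1), round(c, 2))
```

Working from unrounded independence-values changes S only in the units place, and the coefficient is about 0.37 either way.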
say—
(AmB„) (Am+1Bn)
In both cases the first three ratios form descending series, but
the fourth ratio is greater than the second. The signs of the
++-
++-
the same way, exhibit just the same characteristic. But the
immediately after the first: if this be done, i.e. if "red " be placed
will be the same. The colours will then run fair, red, brown,
black, and this would seem to be the more natural order, consider-
tribution.
the same not only for every elementary tetrad of adjacent frequen-
(AmBn+q), (Am+pBn+q).
$$(A_mB_n)(A_{m+1}B_{n+1}) > (A_{m+1}B_n)(A_mB_{n+1}) \qquad (1)$$
and similarly,
$$(A_{m+1}B_n)(A_{m+2}B_{n+1}) > (A_{m+2}B_n)(A_{m+1}B_{n+1}) \qquad (2)$$
$$(A_mB_n)(A_{m+2}B_{n+1}) > (A_{m+2}B_n)(A_mB_{n+1}) \qquad (3)$$
For if
(AmBn) = (Am)(Bn)/N
From the work of the preceding section we may say that Table
(Data of Sir F. Galton, from Karl Pearson, Phil. Trans., A, vol. cxcv.
                    Father's Eye-colour.
             1.      2.      3.      4.     Total.
   1.       194      70      41      30      335
   2.        83     124      41      36      284
   3.        25      34      55      23      137
   4.        56      36      43     109      244
 Total      358     264     180     198     1000
                     Columns
 Row     1 and 2.   2 and 3.   3 and 4.
  1.      0.735      0.631      0.577
  2.      0.401      0.752      0.532
  3.      0.424      0.382      0.705
  4.      0.609      0.456      0.283
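The ratios of this table are, for each row, the frequency in the first column of the pair divided by the sum of the two adjacent frequencies, as may be verified from the eye-colour table. A minimal Python sketch follows. The function name `adjacent_ratios` and the ranking step are illustrative additions, not the author's procedure.

```python
# Eye-colour table: one inner list per row, columns = father's eye-colour 1 to 4.
table = [
    [194, 70, 41, 30],
    [83, 124, 41, 36],
    [25, 34, 55, 23],
    [56, 36, 43, 109],
]


def adjacent_ratios(rows):
    """For each row and each pair of adjacent columns, the share of the
    pair's frequency that falls in the first column of the pair."""
    return [
        [round(row[j] / (row[j] + row[j + 1]), 3) for j in range(len(row) - 1)]
        for row in rows
    ]


ratios = adjacent_ratios(table)

# Order of the rows (1-based) when each column of ratios is sorted descending.
orders = [
    tuple(sorted(range(1, len(ratios) + 1), key=lambda r: -ratios[r - 1][j]))
    for j in range(len(ratios[0]))
]

for row in ratios:
    print(row)
print(orders)
```

The three orders differ from one pair of columns to the next, which is the sense in which the table is not isotropic.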
The order in which the ratios run is different for each pair of
the column of totals; for in, say, the column An the frequencies
of variables.
being the same for all the sub-classes of any one class. Thus
A's and α's are both subdivided into B's and β's, A_1's, A_2's ....
A_s's into B_1's, B_2's .... B_t's, and so on. Clearly this is necessary
ment. The next order is "Defence of the Country," with the sub-
Clerical, (2) Legal, (3) Medical, (4) Teaching, (5) Literary and
e.g. by repetition.
REFERENCES.
Contingency.
(1) Pearson, Karl, "On the Theory of Contingency and its Relation to
tingency table and working out the coefficient: the same principle is
the country.)
Isotropy.
(4) Yule, G. U., "On a Property which holds good for all Groupings of a
Qualities," Proc. Roy. Soc, Series A, vol. lxxvii., 1906, p. 324. (On
(5) Yule, G. U., "On the Influence of Bias and of Personal Equation in
in contingency tables.)
of Cases wherein B exceeds (or falls short of) a given Intensity is recorded
for each Grade of A" Biometrika, vol. vii., 1909, p. 96. (Deals with a
each year of age. The table of such a type stands between the con-
measured quality.)
EXERCISES.
(1) (Data from Karl Pearson, " On the Inheritance of the Mental and Moral
Characters in Man," Jour, of the Anthrop. Inst., vol. xxxiii., and Biometrika,
contingency) for the two tables below, showing the resemblance between
brothers for athletic capacity and between sisters for temper. Show that
contingency should not as a rule be used for tables smaller than 5 x 5-fold:
these small tables are given to illustrate the method, while avoiding lengthy
arithmetic.)
A. Athletic Capacity.
                             First Brother.
                  Athletic.   Betwixt.   Non-athletic.   Total.
Athletic             906         20          140          1066
Betwixt               20         76            9           105
Non-athletic         140          9          370           519
Total               1066        105          519          1690
B. Temper.
                             First Sister.
                  Quick.   Good-natured.   Sullen.   Total.
Quick              198         177            77       452
Good-natured       177         996           165      1338
Sullen              77         165           120       362
Total              452        1338           362      2152
PART II.—THE THEORY OF VARIABLES.
CHAPTER VI.
THE FREQUENCY-DISTRIBUTION.
that can present more than one numerical value, that is, a varying
(cf. Chap. V.), avoids the crudity of the dichotomous form, since
for the decade 1881-90, have been classified to the nearest unit;
death-rate was over 12.5 but under 13.5, over 13.5 but under
following table.
[Table I.
per Annum for the Ten Years 1881-90. (Material from the Supplement
Mean Annual     Number of Districts          Mean Annual     Number of Districts
Death-rate.     with Death-rate between      Death-rate.     with Death-rate between
                Limits stated.                               Limits stated.
12.5-13.5              5                     23.5-24.5         [illegible]
13.5-14.5             16                     24.5-25.5         [illegible]
14.5-15.5             61                     25.5-26.5         [illegible]
15.5-16.5            112                     26.5-27.5         [illegible]
16.5-17.5            159                     27.5-28.5         [illegible]
17.5-18.5            104                     28.5-29.5         [illegible]
18.5-19.5             67                     29.5-30.5         [illegible]
19.5-20.5             42                     30.5-31.5         [illegible]
20.5-21.5             25                     31.5-32.5         [illegible]
21.5-22.5             18                     32.5-33.5         [illegible]
22.5-23.5         [illegible]
                                             Total                632
great majority of districts lie nearer the lower limit than the
Families, Dying at Different Ages. (Cited from Proc. Roy. Soc., vol. lxvii.
Age at Death,    Number of Women Dying       Age at Death,    Number of Women Dying
Years.           between said Years          Years.           between said Years
                 of Age.                                      of Age.
17.5-22.5              29                    62.5-67.5              73
22.5-27.5              87                    67.5-72.5              83
27.5-32.5              99                    72.5-77.5              77
32.5-37.5             109                    77.5-82.5              78
37.5-42.5              90                    82.5-87.5              59
42.5-47.5              87                    87.5-92.5              26
47.5-52.5              64                    92.5-97.5          [illegible]
52.5-57.5              54                    97.5-102.5         [illegible]
57.5-62.5              69
                                             Total                1095
attained in the fourth class (age at death 32-5 to 37-5), and then
Number of
like the preceding are formed in the following way :—(1) The
interval, is first fixed; one unit was chosen in the case of Tables
I. and III., five units in the case of Table II. (2) The position or
having been made, the complete scale of intervals is fixed, and the
these heads.
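The procedure just outlined (fix the magnitude of the class-interval, fix its position, then count the observations falling under each head) can be sketched in Python. The death-rates in `rates` are hypothetical values invented for illustration and are not taken from Table I. The function name `tally` is likewise an assumption, and a value falling exactly on a limit is assigned to the upper class, following the form "x and less than y".

```python
from collections import Counter
from math import floor


def tally(values, width, origin):
    """Count the values falling in each class [lower, lower + width),
    the first class starting at `origin`. Empty classes are omitted."""
    counts = Counter(floor((v - origin) / width) for v in values)
    return {
        (origin + k * width, origin + (k + 1) * width): counts[k]
        for k in sorted(counts)
    }


# Hypothetical death-rates per thousand, for illustration only.
rates = [12.9, 13.6, 14.2, 14.4, 15.1, 15.3, 15.4, 16.8]

# One unit as the class-interval, with limits at the half-units.
table = tally(rates, width=1.0, origin=12.5)
for (lower, upper), frequency in table.items():
    print(f"{lower}-{upper}: {frequency}")
```

Changing `width` or `origin` regroups the same observations, which is why the two choices have to be settled before counting begins.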
judgment.
The two conditions which guide the choice are these: (a) we
desire to be able to treat all the values assigned to any one class,
the first class of Table I. were exactly 13.0, the death rate of
every district in the second class 14.0, and so on; (b) for con-
Dividing the difference between these by, say, five and twenty, we
clustering round certain values, e.g. tens, or tens and fives. This
chiefly round tens, " 25 and under 35," " 35 and under 45," etc., the
and under 30," "30 and under 40," and so on. Where there is
each observation. These are then dealt out into packs according
Thus, in compiling Table I., some districts will have been noted
16.5, 17.5, or 18.5, any one of which might at first sight have
birth and death alone are given, the age at death is only calcul-
able to the nearest unit; if the actual day of birth and death be
Tables I. and II., but may be more briefly indicated by the mid-
13 5
14 16
15 61
16 112
etc. etc.
limits in the form "x and less than y." In the case of measure-
Stature in Inches.
Number of
Observations.
58 „ „ 59
59 „ „ 60
14
etc.
etc.
—the statement "57 and less than 58, "etc., being often abbreviated
to 57-, 58-, 59-, etc. (cf. Table VI., p. 88). The mode of grouping
only taken to the nearest quarter of an inch, the limits are 56§
that the table is not perspicuous. Thus, consider the first two
running the eye down the column headed "number of houses " it
"£60 and under £80," and "£100 and under £150." But these
To make the latter really comparable inter se, they must first be
                           Number          Frequency per
                           of Houses.      £10 Interval.
[label illegible]           306,408          306,408
£30 and under £40           182,972          182,972
£40   "    "  £50           105,407          105,407
£50   "    "  £60            63,096           63,096
£60   "    "  £80            71,436           35,718
£80   "    "  £100           32,365           16,182
£100  "    "  £150           41,336            8,267
£150  "    "  £300           26,732            1,782
£300  "    "  £500            6,198              310
£500  "    "  £1000           2,098               42
[label illegible]               644
Total                       838,692
and so on. This gives the mean frequencies per £10 interval
however, impossible in the case of the last class, for we are only
official publications.
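The reduction of unequal class-intervals to a common £10 basis, described above, is a simple division, and may be sketched in Python. The classes are those of the table of houses whose limits are legible in this copy; the first and last classes are left out because their limits are not legible. The names `classes` and `per_unit_interval` are illustrative.

```python
# (lower limit, upper limit, number of houses), limits in pounds of annual value.
classes = [
    (30, 40, 182_972),
    (40, 50, 105_407),
    (50, 60, 63_096),
    (60, 80, 71_436),
    (80, 100, 32_365),
    (100, 150, 41_336),
    (150, 300, 26_732),
    (300, 500, 6_198),
    (500, 1000, 2_098),
]


def per_unit_interval(rows, unit=10):
    """Mean frequency per `unit` of the variable within each class."""
    return [(lower, upper, freq * unit / (upper - lower)) for lower, upper, freq in rows]


reduced = per_unit_interval(classes)
for lower, upper, freq in reduced:
    print(f"{lower:>4} and under {upper:<5} {freq:>10.1f}")
```

After the reduction the frequencies fall steadily from class to class, whereas the crude column rises at "£60 and under £80" and again at "£100 and under £150" merely because those intervals are wider.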
after the fifth year of age, but it would still be desirable to give
the numbers of deaths in each year for the first five years, so as
to bring out the rapid rise to the maximum in the fourth year
of life.
as an example.
Head-breadth                        Head-breadth
in Inches.       Number.            in Inches.       Number.
5.5            [illegible]          6.2                142
5.6                12               6.3                 99
5.7                43               6.4                 37
5.8                80               6.5                 15
5.9               131               6.6                 12
6.0               236               6.7            [illegible]
6.1               185               6.8            [illegible]
                                    Total             1000
ticals by straight lines, the last points at each end being joined
down to the base at the centre of the next class-interval (fig. 1);
through the marks on the verticals (fig. 2), which now form the
Fig. 3.
the areas of the two little triangles shaded in the figure being
Fig. 4.
polygon is too small; and if, for this reason, the frequency-
to the maximum that a histogram is, on the whole, the better re-
almost imperative.
polygon and the histogram will approach more and more closely
significance (cf. Chap. XV. § 15, and § 18, Ex. iv.). The forms
distribution.
Place of Birth-
Height without
Total.
shoes, Inches.
England.
Scotland.
Wales.
Ireland.
57-
68-
.—
59-
13
1
14
60-
.39
41
61-
"70
83
62-
128
30
169
63-
320
19
48
394
64-
534'
47
83
15
669
65-
740 Art
109
108
33
990
66-
JSJJV
[Figure: frequency-distribution of stature. Base-line: stature in inches.]
English Sons (Karl Pearson, Biometrika, ii., 1903, p. 415); (2) for 1000
Stature in Inches.        (1)                (2)
Limits of Stature.        English Sons.      Cambridge Students.
59.5-60.5                     2.0
60.5-61.5                     1.5
61.5-62.5                     3.5                 4.0
62.5-63.5                    20.5                19.0
63.5-64.5                    38.5                24.5
64.5-65.5                    61.5                40.5
65.5-66.5                    89.5                84.5
66.5-67.5                   148.0               123.5
67.5-68.5                   173.5               139.0
68.5-69.5                   149.5               179.0
69.5-70.5                   128.0               138.5
70.5-71.5                   108.0               108.0
71.5-72.5                    63.0                53.5
72.5-73.5                    42.0                47.5
73.5-74.5                    29.0            [illegible]
[Figures: frequency-distributions of stature (Table VII.).]
districts (Table VIII. and fig. 10) is smoother and more like the
relief, and then tails off slowly to unions with 6, 7, and 8 per
cent. of pauperism.
Relief on the 1st January 1891. (Yule, Jour. Roy. Stat. Soc., vol. lix.,
1896, p. 347. q.v. for distributions for earlier years.) See Fig. 10.
Percentage of the Population     Number of Unions with given
in receipt of Relief.            Percentage in receipt of Relief.
0.75-1.25                              18
1.25-1.75                              48
1.75-2.25                              72
2.25-2.75                              89
2.75-3.25                             100
3.25-3.75                              90
3.75-4.25                              75
4.25-4.75                              60
4.75-5.25                              40
5.25-5.75                              21
5.75-6.25                              11
6.25-6.75                          [illegible]
6.75-7.25                          [illegible]
7.25-7.75                          [illegible]
7.75-8.25                          [illegible]
8.25-8.75                          [illegible]
Total                                 632
towards the lower end of the range. This is shown very well by
the data (Table IX. and fig. 11) collected by the same British
stature were cited in the last section. As in the case of the stature
diagram (fig. 6), the small error of £ lb. has been neglected, for
the sake of brevity, in lettering the base-line of fig. 11, the classes
and so on.
born in England, Ireland, Scotland, and Wales. (Loc. cit., Table VI.)
Weights were taken to the nearest pound, consequently the true Class-
Weight
in lbs.
Total.
England.
Scotland.
Wales.
Ireland.
90-
100-
110-
120-
130-
140-
150-
160-
170-
180-
190-
200-
210-
220-
230-
240-
250-
260-
270-
280-
26
34
133
22
63
10
152
338
23
390
694
68
42
867
1240
173
153
57
1623
1075
255
178
51
1559
881
275
134
96
Lee, and Moore, Phil. Trans., A, vol. cxcii. (1899), p. 303.) See Fig. 12.
                 Number of Mares with                  Number of Mares with
Fecundity.       Fecundity between       Fecundity.    Fecundity between
                 the Given Limits.                     the Given Limits.
1/30-3/30              2.0               17/30-19/30        315
3/30-5/30              7.5               19/30-21/30        337
5/30-7/30             11.5               21/30-23/30        293.5
7/30-9/30             21.5               23/30-25/30        204
9/30-11/30            55                 25/30-27/30        127
11/30-13/30          104.5               27/30-29/30         49
13/30-15/30          182                 29/30-1             19
15/30-17/30          271.5
                                         Total             2000.0
(Karl Pearson and A. Lee, Phil. Trans., A, vol. cxc. (1897), p. 428, q.v.
Number of Days
Number of Days
Height of
on which Height
Height of
on which Height
Barometer
was observed
Barometer
was observed
in Inches.
between the
in Inches.
between the
Given Limits.
Given Limits.
suggesting an ideal curve that meets the base (at one end) at a
finite angle, even a right angle, as in fig. 9 (b), are less frequent,
kind. The actual figures for this case are given in Table XII., and
deaths reaches a maximum for children aged "3 and under 4,"
Ages in England and Wales during the Ten Years 1891-1900. (Supple-
                     Number of Deaths
Age in Years.        between Given          Number
                     Limits of Age.         per Annum.
Under 1 year              4,186               4,186
1-                       10,491              10,491
2-                       11,218              11,218
3-                       12,390              12,390
4-                       11,194              11,194
5-                       23,348               4,670
10-                       4,092                 818
15-                       1,123                 225
20-                         585                 117
25-                         786                  79
35-                         512                  51
45-                         324                  32
55-                         260                  26
65-                         127                  13
75 and upwards               35
only, they would have run 49,479, 23,348, 4,092, and so on,
type; but if the maximum is not absolutely at the lower end of the
lower end of the range to enable the exact position of the maximum
interest. It will be seen from the table and fig. 16 that with the
the present type, the number of estates between zero and £100
in annual value being more than six times as great as the number
the first class suggests, however, that the greatest frequency does
those who had taken part in the Jacobite Rising of 1715. (Compiled from
Cosin's Names of the Roman Catholics, Nonjurors, and others who refused
to take the Oaths to his late Majesty King George, etc.; London, 1745.
Annual
Number of
Estates.
Annual
Number of
Estates.
Value in
Value in
£100.
£100.
0- 1
1726-5
280
140-5
87
46-5
42-5
29-5
25-5
18-5
21
11-5
17-18
1- 2
2- 3
20-21
21-22
22-23
23-24
3- 4
4- 5
5- 6
6- 7
7- 8
27-28
8- 9
9-10
31-32
1
Three Series of Ranunculus bulbosus. (H. de Vries, Ber. dtsch. bot. Ges.,
Number
of Petals.
Frequency.
Series A.
Series B.
Series C.
5
312
345
24
133
55
23
17
2
10
11
Total
337
380
222
instances occur here and there. Table XIV. and fig. 17 show
[Fig. 17: frequency-distributions of the number of petals in three series of Ranunculus bulbosus.]
at Breslau during the Ten Years 1876-85. (See ref. 2.) See Fig. 19.
Cloudiness.     Frequency.          Cloudiness.     Frequency.
0                  751              6                   21
1                  179              7                   71
2                  107              8                  194
3                   69              9                  117
4                   46              10                2089
5              [illegible]
                                    Total             3653
rare.
Table XVI. gives the distribution for an analogous case, viz. the
Percentage
Number of
Families.
Percentage
Number of
Families.
of
Deaf-mutes.
of
Deaf-mutes.
0-20
20-40
40-60
220
20-5
12
60-80
80-100
6-5
15
Total
273
of the children are deaf-mutes: at the other end of the range the
cases in which over 80 per cent. of the children are deaf-mutes are
between 60 and 80. The numbers are, however, too small to form
REFERENCES.
1896-7. See especially tome ii., livre iii., chap. i., "La courbe des
revenus."
The first three memoirs above are mathematical memoirs on the theory
our rough empirical types may be divided into several sub-types, the
§15.
EXERCISES.
to the inch and 4 inches of stature to the inch, what is the scale of observa-
If the scales are 100 observations per interval to the centimetre and 2 inches
square centimetre?
and 2 per cent. to the inch, what is the scale of observations to the
square inch?
If the scales are 10 observations per interval to the centimetre and 1 per
centimetre?
AVERAGES.
those comparisons with other series which are necessary for any
differ: (1) they may differ markedly in position, i.e. in the values
they may centre round the same value, but differ in the range of
Fig. 20.
percentage, its average is a percentage, and so on. But there are
average should be rigidly defined, and not left to the mere estimation
average. At the same time too great weight must not be attached
will rarely be quite the same, but one form of average may show
arithmetic mean, the median, and the mode, the first named being
these may be added the geometric mean and the harmonic mean,
$M = \frac{1}{N}\,\Sigma(X)$ . . . . (1)
mean fulfils the conditions laid down in (a) and (b) of § 4, for it
position. In the cases just cited, it will be noted that the mean
mean wage we need not know the wages of every hand, but only
we need not know the number in each family, but only the total.
all the values observed are added together and the total divided
treating all the values in each class as if they were identical with
may be written—
$M = \frac{1}{N}\,\Sigma(f.X)$ . . . . (2)
then
$M = A + \frac{1}{N}\,\Sigma(f.\xi)$ . . . . (3)
sixth class-interval from the top of the table, and a little nearer
values starting, of course, from zero opposite 3.5 per cent. Each
that is 0.21 per cent. Hence the mean is 3.5 - 0.21 = 3.29
per cent.
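The short method just described (deviations measured in class-intervals from an arbitrary value A, then a correction added to A) may be sketched as follows. The figures in the sketch are made up for illustration and are not those of the pauperism table; the function names are likewise hypothetical.

```python
def mean_from_arbitrary_origin(mid_values, frequencies, origin, interval):
    """Mean of a grouped distribution by the short method:
    M = A + interval * sum(f * xi) / N, with xi counted in class-intervals."""
    n = sum(frequencies)
    # Deviation of each mid-value from the arbitrary origin A, in class-interval units.
    steps = [round((x - origin) / interval) for x in mid_values]
    total = sum(f * s for f, s in zip(frequencies, steps))
    return origin + interval * total / n


def direct_mean(mid_values, frequencies):
    """Mean found the long way, by summing f * X and dividing by N."""
    return sum(f * x for f, x in zip(frequencies, mid_values)) / sum(frequencies)


# Illustrative (made-up) grouped data: mid-values and frequencies.
mids = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
freqs = [2, 5, 9, 12, 8, 3, 1]

short = mean_from_arbitrary_origin(mids, freqs, origin=2.5, interval=0.5)
print(short, direct_mean(mids, freqs))
```

Whatever value is chosen for the origin, the two results should agree; only the size of the intermediate products changes.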
(1)                        (2)          (3)                  (4)
Mid-values of the          Frequency    Deviation from       Product
Class-intervals            f.           Arbitrary Value A    f.ξ
(Percentage in receipt                  ξ.
of Relief).
  1.0                       18           -5                    90
  1.5                       48           -4                   192
  2.0                       72           -3                   216
  2.5                       89           -2                   178
  3.0                      100           -1                   100
                                                             -776
  3.5                       90            0                     0
  4.0                       75           +1                    75
  4.5                       60           +2                   120
  5.0                       40           +3                   120
  5.5                       21           +4                    84
6
tion from the figures of Table VI., Chap. VI. In this case the class-
of the intervals are 57 7/16, 58 7/16, etc., and not 57.5, 58.5, etc.
Mean Stature of Male Adults in the British Isles from the Figures of
(1)          (2)          (3)                  (4)
Height,      Frequency    Deviation from       Product
Inches.      f.           Arbitrary Value A    f.ξ
                          ξ.
57-             2          -10                   20
58-             4           -9                   36
59-            14           -8                  112
60-            41           -7                  287
61-            83           -6                  498
62-           169           -5                  845
63-           394           -4                 1576
64-           669           -3                 2007
65-           990           -2                 1980
66-          1223           -1
67-          1329
68-          1230
origin for the deviations: all the figures of col. (4) will be changed,
but the value ultimately obtained for the mean must be the
Fig. 21.—Showing the Arithmetic Mean M, the Median Mi, and the Mode Mo,
by verticals drawn through the corresponding points on the base, for the
errors caused by the assumption that all values within the same
course the class-interval is not large (Chap. VI. § 5). In the case
ever, the error is evidently of definite sign, for in all the intervals
frequency, i.e. the lower end of the range in the case of the illustrations
given in Chap. VI., and is not evenly distributed over the
for the mean. The student should test for himself the effect of
this form the mean lies, as in the present example, on the side of
Fig. 22.—Mean M, Median Mi, and Mode Mo, of the ideal moderately
asymmetrical distribution.
arithmetic mean, and at the same time illustrate the facility of its
algebraic treatment:—
(a) The sum of the deviations from the mean, taken with their
For example, we find from the data of Table VI., Chap. VI.,
Hence the mean stature of the 1087 men born in the two countries
equation
be noted that the approximate value for the mean obtained from
that all the values in any class are identical with the mid-value
$X = X_1 \pm X_2,$
$\Sigma(X) = \Sigma(X_1) \pm \Sigma(X_2).$
$M = M_1 \pm M_2$ . . (8)
if
the arithmetic mean of the errors M2. If, and only if, the
latter be zero, will the observed mean be identical with the true
most or central value of the variable when the values are ranged
variable the vertical through which divides the area of the curve
The median, like the mean, fulfils the conditions (b) and (c)
take the mean of the nth and (n + 1)th values as the median,
rule, any value such that greater and less values occur with equal
the half is 316. Looking down the table, we see that there are
227 districts with not more than 2.75 per cent. of the population
in receipt of relief, and 100 more with between 2.75 and 3.25
per cent. But only 89 are required to make up the total of 316;
$2.75 + \frac{89}{100} \times 0.5 = 3.195$ per cent.
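The interpolation for the median can be written out as a short routine. This is a minimal sketch, assuming (as the text does) that the values are spread evenly over the interval containing the median; the function name and argument names are hypothetical.

```python
def grouped_median(lower_limit, cum_below, class_freq, n, width):
    """Median of a grouped distribution by simple interpolation.

    lower_limit: lower limit of the class containing the median
    cum_below:   number of observations below that limit
    class_freq:  frequency of the class containing the median
    n:           total number of observations
    width:       class-interval
    """
    needed = n / 2 - cum_below  # observations still required within the class
    return lower_limit + needed / class_freq * width


# Figures quoted in the text: 632 districts, 227 not exceeding 2.75 per cent.,
# and 100 more between 2.75 and 3.25 per cent.
median = grouped_median(lower_limit=2.75, cum_below=227, class_freq=100, n=632, width=0.5)
print(round(median, 3))
```

With the figures quoted, 89 of the 100 districts in the interval are required, so the median falls 89 hundredths of the way across it.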
The mean being 3.29, the median is slightly less; its position
Difference . . = 703.5
703.5
= 67.47 inches.
2.25 is 138; not exceeding 2.75, 227; not exceeding 3.25, 327;
the side of the mean. As was shown in § 13, when several series
with the resultant mean, nor with any other simply assignable
be equally frequent.
the median, and he alone need be measured. (On the other hand
cannot be found from the total of the wages-bill, and the total
a final indefinite class, as in Table IV. (Chap. VI. § 10). (d) The
any other man whose height is only just greater than the median.
(In general the mean is the less affected.) The point is discussed
mean, median, and mode that appears to hold good with surprising
these three averages for a great many cases with which the
That is to say, the median lies one-third of the distance from the
mean towards the mode (compare figs. 21 and 22). For the
places of decimals,—
Mean 3.289
= 3.007,
for the final result, though three decimal places must be retained
the unions of England and Wales in the years 1850, 1860, 1870,
1881, and 1891 (the last being the illustration taken above),
Comparison of the Approximate and True Modes in the Case of Five Dis-
Relief) in the Unions of England and Wales. (Yule, Jour. Roy. Stat.
Year.    Mean.    Median.    Approximate    True Mode.
                             Mode.
1850     6.508    6.261      5.767          5.815
1860     5.195    5.000      4.610          4.657
1870     5.451    5.380      5.238          5.038
1881     3.676    3.523      3.217          3.240
1891     3.289    3.195      3.007          2.987
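The approximate modes tabulated above follow from the rule that the median lies one-third of the distance from the mean towards the mode, that is, mode = mean - 3(mean - median). A minimal sketch checking the rule against the tabulated means and medians (the function name is hypothetical):

```python
def approximate_mode(mean, median):
    """Approximate mode from the empirical relation
    mode = mean - 3 * (mean - median)."""
    return mean - 3 * (mean - median)


# (mean, median, approximate mode as tabulated) for each year.
pauperism = {
    1850: (6.508, 6.261, 5.767),
    1860: (5.195, 5.000, 4.610),
    1870: (5.451, 5.380, 5.238),
    1881: (3.676, 3.523, 3.217),
    1891: (3.289, 3.195, 3.007),
}

for year, (mean, median, tabulated) in pauperism.items():
    print(year, round(approximate_mode(mean, median), 3), tabulated)
```

The rule is empirical only, which is why the approximate and true modes in the table differ in the second or third decimal place.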
Comparison of the Approximate and True Modes in the Case of Five Dis-
Station.        Mean.     Median.    Approximate    True Mode.
                                     Mode.
Southampton     29.981    30.000     30.038         30.039
Londonderry     29.891    29.915     29.963
Carmarthen      29.952    29.974     30.018
Glasgow         29.886    29.900     29.946
Dundee          29.870    29.890     29.930
stable than the mean; but its use is undesirable in cases of discon-
definite reason for the choice of another form of average, and the
the mean is not the mode, and that its value is consequently mis-
noted, would apply with almost equal force to the median, for, as
mean.
$\log G = \frac{1}{N}\,\Sigma(\log X)$ . . . (11)
less than their arithmetic mean; the student will find a proof in
The geometric mean has never come into general use as a repre-
mean does not possess any simple and obvious properties which
series, there being N1 observations in the first, N2 in the second,
$N.\log G = N_1.\log G_1 + N_2.\log G_2 + \dots + N_r.\log G_r$ . (12)
For if
$X = X_1/X_2,$
$\log X = \log X_1 - \log X_2,$
$X = X_1.X_2.X_3 \dots X_r$
$G_1, G_2, \dots G_r$ of $X_1, X_2, \dots X_r$ by
the relation
$G = G_1.G_2.G_3 \dots G_r$ . . (14)
24. The use of the geometric mean finds its simplest application
save that the numbers recorded at the first census were P0 and at
[Figure: curves for Cumberland, Dorset, and Hereford plotted against census year, 1801 to 1901.]
geometric series, $P_0 r$ being the population a year after the first
census, $P_0 r^2$ two years after the first census, and so on, and
$P_n = P_0.r^n$ . . . . (15)
i.e. the geometric mean of the numbers given by the two censuses.
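A minimal sketch of this estimate, assuming a constant rate of increase between the two censuses as in equation (15). The census figures used are made up for illustration, and the function names are hypothetical.

```python
import math


def midpoint_population(p_first, p_last):
    """Population midway between two censuses: the geometric mean of the two counts."""
    return math.sqrt(p_first * p_last)


def population_at(p_first, p_last, n_years, t):
    """Population t years after the first census, taking P_t = P_0 * r**t,
    where r is the constant yearly ratio implied by the two counts."""
    r = (p_last / p_first) ** (1 / n_years)
    return p_first * r ** t


# Illustrative (made-up) counts at censuses ten years apart.
p0, p10 = 100_000, 121_000
print(midpoint_population(p0, p10))   # estimate for the middle of the period
print(population_at(p0, p10, 10, 5))  # the same estimate from the yearly ratio
```

The arithmetic mean of the two counts would give 110,500 here, somewhat above the geometric estimate, as must always be the case when the counts differ.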
the curves are frequently concave towards the base, and similar
case in which the rate of increase is not uniform over the entire
area—and almost any area can be analysed into parts which are not
$P_n + P'_n = P_0.r^n + P'_0.r'^n.$
refs. 14, 15 for a discussion of methods actually used for the con-
■ • *"o
. y ■' ■t y'"
. . X\
■ ■ x\
1 io' z io, ± io -1 io
form of average of the Y'b for any given year will afford an
indication of the general level of prices for that year, provided the
$\frac{G_{20}}{G_{10}} = \left(\frac{Y'_{20}}{Y'_{10}} . \frac{Y''_{20}}{Y''_{10}} . \frac{Y'''_{20}}{Y'''_{10}} \dots\right)^{1/N} = \left(\frac{X'_2}{X'_1} . \frac{X''_2}{X''_1} . \frac{X'''_2}{X'''_1} \dots\right)^{1/N} = \left(Y'_{21} . Y''_{21} . Y'''_{21} \dots\right)^{1/N}$ . (17)
From the first form of this equation we see that the ratio of the
identical with the geometric mean of the ratios for the index-
not hold for any other form of average: the ratio of the arithmetic
the ratios, nor is the ratio of the medians the median of the
the year first chosen as base (i.e. year 0), and is identical with the
base. Again, a similar property does not hold for any other form
for example, the ratio of the mean in year 2 to the mean in year
1 will vary with the year taken as base, and will differ more or
less from the arithmetic mean ratio of the prices in year 2 to the
the same grounds, but has never been at all generally employed.
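The property claimed for the geometric mean, and denied for the arithmetic mean, is easily checked numerically. The prices below are made up for illustration (three commodities in two years), and the function names are hypothetical; this is a sketch of the property, not of any published index-number.

```python
import math


def geometric_mean(values):
    """Geometric mean computed through logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def arithmetic_mean(values):
    return sum(values) / len(values)


# Illustrative (made-up) prices of three commodities in year 1 and year 2.
prices_year1 = [10.0, 40.0, 5.0]
prices_year2 = [12.0, 30.0, 10.0]
ratios = [b / a for a, b in zip(prices_year1, prices_year2)]

# Geometric mean: the ratio of the means equals the mean of the ratios.
gm_ratio_of_means = geometric_mean(prices_year2) / geometric_mean(prices_year1)
gm_mean_of_ratios = geometric_mean(ratios)

# Arithmetic mean: the two quantities differ in general.
am_ratio_of_means = arithmetic_mean(prices_year2) / arithmetic_mean(prices_year1)
am_mean_of_ratios = arithmetic_mean(ratios)

print(gm_ratio_of_means, gm_mean_of_ratios)
print(am_ratio_of_means, am_mean_of_ratios)
```

With these figures the two arithmetic results lie on opposite sides of unity, so that one would report a fall in prices and the other a rise.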
26. The general use of the geometric mean has been suggested
tude of the average; thus the length of a mouse varies less than
the stature of a man, and the height of a shrub less than that of
measured rather by their ratio to, than their difference from, the
tion made does not, however, appear to have been very widely
tested, and the reasons assigned have not sufficed to bring the
clearly does not hold where the (arithmetic) mode is greater than
$\frac{1}{H} = \frac{1}{N}\,\Sigma\left(\frac{1}{X}\right)$ . . . . (18)
Number in    Number of    f/X.
Litter.      Litters.
X.           f.
1              7          7.000
2             11          5.500
3             16          5.333
4             17          4.250
5             26          5.200
6             31          5.167
7             11          1.571
8              1          0.125
9              1          0.111
             121         34.257
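The harmonic mean of the litter sizes follows at once from the totals of this table, being N divided by the sum of f/X. A minimal sketch is given below; the litter sizes 1 to 9 and the two frequencies of 1 are inferred from the f/X column and the total of 121, and the function names are hypothetical.

```python
def harmonic_mean(values, frequencies):
    """Harmonic mean of a frequency-distribution: N / sum(f / X)."""
    n = sum(frequencies)
    return n / sum(f / x for x, f in zip(values, frequencies))


def arithmetic_mean(values, frequencies):
    """Arithmetic mean of the same distribution, for comparison."""
    return sum(f * x for x, f in zip(values, frequencies)) / sum(frequencies)


litter_sizes = [1, 2, 3, 4, 5, 6, 7, 8, 9]
litters = [7, 11, 16, 17, 26, 31, 11, 1, 1]

h = harmonic_mean(litter_sizes, litters)
m = arithmetic_mean(litter_sizes, litters)
print(round(h, 3), round(m, 3))
```

The harmonic mean comes out well below the arithmetic mean, the small litters weighing heavily in a sum of reciprocals.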
stated in the form "so much for a unit of money," and an average
and 20 ten to the shilling; then the mean number per shilling
if the prices had been quoted in the form usual for other com-
until 1907, given in the form of "Sers (2.057 lbs.) per rupee."
and Wages in India" for 1908 and later years the prices have
REFERENCES.
General.
and its Social Effects set forth; Stanford, London, 1863. Reprinted
(5) Jevons, W. Stanley, "On the Variation of Prices and the Value of
the Currency since 1782," Jour. Roy. Stat. Soc., vol. xxviii., 1865.
Value of Gold," Jour. Roy. Stat. Soc, vol. xlvi., 1883, p. 714. (Some
criticism of the reasons assigned by Jevons for the use of the geometric
mean.)
(7) Galton, Francis, "The Geometric Mean in Vital and Social Statistics,"
(8) McAlister, Donald, "The Law of the Geometric Mean," ibid., p. 367.
(The law of frequency to which the use of the geometric mean would
be appropriate.)
(10) Crawford, G. E., "An Elementary Proof that the Arithmetic Mean
The Mode.
mode, p. 345.)
Jour. Roy. Stat. Soc., vol. lix., 1896, p. 343. (The note deals with
Estimates of Population.
last Intercensal Period," Jour. Roy. Stat. Soc, vol. lxiv., 1901, p. 293.
Report of the Registrar-General for England and Wales (Cd. 2618, 1907),
p. cxvii.
Index-numbers.
are not considered in the present work. The student will find copious
Reports, 1887 (p. 247), 1888 (p. 181), 1889 (p. 133), and 1890 (p. 485).
EXERCISES.
1. Verify the following means and medians from the data of Table VI.
Chap. VI.
Mean      Median
67.31     67.35
68.55     68.48
66.62     66.56
67.78     67.69
In the calculation of the means use the same arbitrary origin as in Example
2. Find the mean weight of adult males in the United Kingdom from the
data in the last column of Table IX., Chap. VI. Also find the median weight,
3. Similarly, find the mean, median, and approximate value of the mode
4. Using a graphical method, find the median annual value of houses assessed
to inhabited house duty in the financial year 1885-6 from the data of Table
5. (Data from Sauerbeck, Jour. Roy. Stat. Soc, March 1909.) The figures
in columns 1 and 2 of the small table below show the index-numbers (or per-
centages) of prices of certain animal foods in the years 1898 and 1908, on
their average prices during the years 1867-77. In column 3 have been added
Find the average ratio of prices in 1908 to prices in 1898, taken as 100:—
Note that, by § 25, the last two methods must give the same result.
Commodity.
1.
2.
3.
1. Beef, prime
78
72
84
67
87
78
76
88
90
92
95
83
84
91
112-8
125-0
109-5
6. (Data from census of 1901.) The table below shows the population of
the rural sanitary districts of Essex, the urban sanitary districts (other than
the borough of West Ham), and the borough of West Ham, at the censuses
of 1891 and 1901. Estimate the total population of the county at a date
midway between the two censuses, (1) on the assumption that the percentage
rate of increase is constant for the county as a whole, (2) on the assumption
that the percentage rate of increase is constant in each group of districts and
Essex.
Population.
1891.
1901.
Rural districts
232,867
204,903
345,604
240,776
267,358
575,864
Total
783,374
1,083,998
7. (Data from Agricultural Statistics for 1905, Cd. 3061, 1906.) The
Britain in 1905, as compiled from the weekly returns of market prices for
Month.        First       Second
              Quality.    Quality.
              s.  d.      s.  d.
January       13  0       11  0
February      11  0        9  0
March          8  0        6  0
April          7  6        6  6
May            8  0        7  6
June           8  6        8  0
July           9  6        8  6
August        11  0       10  0
September     11  6       10  6
October       14  0       12  6
November      18  0       16  0
CHAPTER VIII.
serious purpose. There are seldom real upper and lower limits
over 280 lbs., the next heaviest being under 260 lbs. The
the observations for the most part closely clustered round the
the measure should possess all the properties laid down as desir-
$\sigma^2 = \frac{1}{N}\,\Sigma(x^2)$ . . . . (1)
$\xi = X - A.$
$s^2 = \frac{1}{N}\,\Sigma(\xi^2)$ . . . . . (2)
$M - A = d$ (3)
so that $\xi = x + d.$
Then $\xi^2 = x^2 + 2x.d + d^2.$
But the sum of the deviations from the mean is zero, therefore
$s^2 = \sigma^2 + d^2$ . . . . (4)
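Relation (4) may be verified numerically for any set of values and any arbitrary origin. The values below are made up for illustration, and the function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def mean_square_about(values, origin):
    """Mean of the squared deviations measured from the given origin."""
    return sum((v - origin) ** 2 for v in values) / len(values)


# Illustrative (made-up) observations and an arbitrary origin A.
values = [12.0, 15.0, 9.0, 20.0, 14.0]
A = 10.0

M = mean(values)
sigma_sq = mean_square_about(values, M)  # square of the standard deviation
s_sq = mean_square_about(values, A)      # mean square deviation about A
d = M - A

print(s_sq, sigma_sq + d * d)
```

Since d squared cannot be negative, the mean square deviation is least when the origin is the mean, which is the property stated in the text.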
are measured from the mean, i.e. the standard deviation is the least
the first moment (cf. Chap. VII. § 8): we shall not make use
MA
Fig. 25.
equal to the standard deviation (on the same scale in which the
will be very minute, since A will lie very nearly on the circle
are first of all totalled and the total divided by N to give the
unit: the signs are not entered, as they are not wanted, but the
negative 290, thus checking the value for the mean, viz. 15s.
11d. + 10/38.]
col. 4,—tables of squares are useful for such work if any of the
The sum of the squares is 16,018. Treating the value taken for
$\sigma = 20.5$d.
$s^2 = \frac{16018}{38} = 421.5263$
$\sigma = 20.529$d.
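The arithmetic of this example can be retraced in a few lines, using only the figures quoted in the text: 38 unions, a sum of squares of 16,018 about the rounded mean, and a difference of 10/38 of a penny between the rounded and the true mean. The variable names are hypothetical.

```python
import math

n = 38           # number of unions
sum_sq = 16018   # sum of squared deviations (pence) from the rounded mean
d = 10 / 38      # true mean minus rounded mean, in pence

s_sq = sum_sq / n                        # mean square deviation about the rounded mean
sigma_rough = math.sqrt(s_sq)            # treating the rounded mean as exact
sigma_exact = math.sqrt(s_sq - d * d)    # corrected by relation (4)

print(round(s_sq, 4), round(sigma_rough, 1), round(sigma_exact, 3))
```

The correction alters the result only in the third decimal place, which illustrates how little small errors in the mean affect the standard deviation.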
errors in the mean have little effect on the value found for the
1.                   2.               3.            4.
Union.               Earnings         Difference    (Difference)²
                     (Shillings       ξ (Pence).
                     and Pence).
                     s.  d.
 1. Glendale         20  9            58            3,364
 2. Wigton           20  3            52            2,704
 3. Garstang         19  8            45            2,025
 4. Belper           18  6            31              961
 5. Nantwich         17  8            21              441
 6. Atcham           17  6            19              361
 7. Driffield        17  1            14              196
 8. Uttoxeter        17  0            13              169
 9. Wetherby         17  0            13              169
10. Easingwold       16 11            12              144
11. Southwell        16  6             7               49
12. Hollingbourn     16  4             5               25
16 3
4
allowances for gifts in kind, such as coal, potatoes, cider, etc. The
that earnings would vary less than wages, as his earnings and not
,, „ ,, wages . . 26-0d.
have—
ii. below, cols. 1, 2, 3, and 4 are the same as those we have already
768, and so on. The work is therefore done very rapidly. The
remaining steps of the arithmetic are given below the table; the
per cent.
Chap. VI. (Cf. the work for the mean alone, p. 111.)
(1)
(2)
(3)
(4)
Product.
(5)
Product.
Percentage
in receipt
of Relief.
Frequency.
f.
from Value A.
Deviation
A-
A2-
18
48
72
89
100
-5
90
192
216
178
100
450
768
648
356
100
1-5
-4
-3
2-5
-2
-1
3-5
90
-776
75
+1
+2
+3
+4
75
75
240
360
336
275
180
49
64
4-5
60
120
and Wales since 1850. (From Yule, Jour. Roy. Stat. Soc., vol. lix.,
in receipt of Relief.
Year.
Arithmetic
Mean.
Standard
Deviation.
1850
1860
1870
1881
1891
6 51
5-20
5-15
3-68
3-29
2-50
2-07
2-02
1 3B
1-24
the statures of adult males in the British Isles, the work being
the mean in Example ii. of Chap. VII. The steps of the arith-
metic hardly call for further explanation, but it may be noted that
of the mean, that the treatment of all values within each class-
or three cases.
from the figures of Table VI, p. 88. (Cf. p. 112 for the calculation of
mean alone.)
(1)         (2)           (3)           (4)         (5)
Height.     Frequency.    Deviation     Product.    Product.
Inches.     f.            from          f.ξ         f.ξ²
                          Value A.
                          ξ.
57-            2          -10             20           200
58-            4           -9             36           324
59-           14           -8            112           896
60-           41           -7            287         2,009
61-           83           -6            498         2,988
62-          169           -5            845         4,225
63-          394           -4           1576         6,304
64-          669           -3           2007         6,021
65-          990           -2           1980         3,960
66-         1223
1329
in Example i., for instance, the actual range is a good deal less
later stage (Chap. XI.), but the work of § 3 has already served as
$N.M = N_1.M_1 + N_2.M_2,$
$M_1 - M = d_1$
$M_2 - M = d_2.$
the mean M are, by equation (4), $\sigma_1^2 + d_1^2$ and $\sigma_2^2 + d_2^2$ respec-
It is evident that the form of the relation (5) is quite general:
standard deviations $\sigma_1, \sigma_2, \dots \sigma_r$ and means diverging from the
$\frac{N(N+1)(2N+1)}{6}$
$\sigma^2 = \frac{N^2 - 1}{12}$ . . . . (10)
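The result (10) for the first N natural numbers can be checked by direct computation. A minimal sketch follows; the function name is hypothetical.

```python
def variance_of_first_n(n):
    """Square of the standard deviation of the numbers 1, 2, ..., n,
    found directly from the deviations about their mean."""
    values = range(1, n + 1)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / n


for n in (1, 2, 10, 37):
    print(n, variance_of_first_n(n), (n * n - 1) / 12)
```

For N = 10, for example, both columns give 8.25.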
student will have to take the statement on trust for the present,
squaring deviations and then taking the square root of the mean
this feeling after a little practice in the calculation and use of the
sion, unless there is some very definite reason for preferring another
of position. It may be added here that the student will meet with
have adopted the most recent (due to Pearson, ref. 2): many of
the earlier names are hardly adapted to general use, as they bear
(Airy), and "mean square error" have all been used in the same
sense. The square of the standard deviation, and also twice the
from the median, but the latter is the natural origin to use. Just
i.e. until it coincides with the mth value from the upper end of
$\Delta + (N - m)c - mc$
$= \Delta + (N - 2m)c.$
The new mean deviation is accordingly less than the old so long as
$m > \frac{1}{2}N.$
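The conclusion of this argument, that the mean deviation is least when measured from the median, can be illustrated by trying a range of origins in turn. The observations below are made up for illustration, and the function name is hypothetical.

```python
import statistics


def mean_deviation(values, origin):
    """Mean of the deviations from the origin, taken without regard to sign."""
    return sum(abs(v - origin) for v in values) / len(values)


# Illustrative (made-up) observations; an odd number, so the median is unique.
values = [3, 7, 8, 12, 30]
median = statistics.median(values)
mean = statistics.mean(values)

# Try every origin from 0.0 to 40.0 in steps of 0.1 and keep the best.
candidates = [k / 10 for k in range(0, 401)]
best = min(candidates, key=lambda c: mean_deviation(values, c))

print(best, median, mean_deviation(values, median), mean_deviation(values, mean))
```

The search settles on the median, and the mean deviation from the mean is the greater of the two, as the text leads one to expect.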
all origins within the range between the N/2th and the (N/2 + 1)th
deviation is lowest when the origin coincides with the (N + 1)/2th
15. The calculation of the mean deviation either from the mean
We have already found the mean (15s. 11d. to the nearest penny),
and the deviations from the mean are written down in column 3.
ations we find a total of 590. The mean deviation from the mean
replaces the mean as the origin from which deviations are measured.
The median is 15s. 6d. The deviations in pence run 63, 57, 50,
36, and so on; their sum is 570; and, accordingly, the mean
round 3-5 per cent. We have already found that the sum of
add N1.d to the sum found and subtract N2.d. In the present
therefore
and the sum of deviations from the mean is 1285 - 9.2 = 1275.8.
which the median (instead of the mean) lies should, for con-
(Chap. VII. § 15) 3.195 per cent. Hence 3.0 per cent. should be
from the median is 2.012 intervals, or again 1.01 per cent. The
value is really smaller than that of the mean deviation from the
work, and in such cases the use of the mean deviation may be
Isles, Example iii., the ratio found is 0.80. For a short series of
one-quarter of all the values observed are less than Q1 and three-
then Q3 is termed the upper quartile. The two quartiles and the
a symmetrical distribution
Mi-Q1 = Q3-Mi,
measure
graphical interpolation (cf. Chap. VII. §§15, 16). Thus for the
632/4 = 158
Difference = 20
20
interpolation.
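The same interpolation serves for the quartiles as for the median. The sketch below uses the frequencies of the first seven classes of the pauperism distribution as given in the tables of this and the preceding chapter, with 632 districts in all; the lower limit of 0.75 per cent. for the first class is an assumption inferred from the class-interval of 0.5 and the limits 2.25, 2.75, 3.25 quoted in the text. The function name is hypothetical.

```python
def grouped_quantile(first_lower_limit, width, frequencies, n, fraction):
    """Value below which the given fraction of n observations lies,
    by simple interpolation within the class containing it."""
    target = fraction * n
    cumulative = 0
    for i, f in enumerate(frequencies):
        if cumulative + f >= target:
            return first_lower_limit + (i + (target - cumulative) / f) * width
        cumulative += f
    raise ValueError("the target lies beyond the classes supplied")


# Frequencies of the classes with mid-values 1.0, 1.5, ..., 4.0 per cent.
freqs = [18, 48, 72, 89, 100, 90, 75]
N = 632

q1 = grouped_quantile(0.75, 0.5, freqs, N, 0.25)
mi = grouped_quantile(0.75, 0.5, freqs, N, 0.50)
q3 = grouped_quantile(0.75, 0.5, freqs, N, 0.75)
semi_interquartile = (q3 - q1) / 2

print(round(q1, 3), round(mi, 3), round(q3, 3), round(semi_interquartile, 3))
```

For the lower quartile the routine reproduces the steps of the text: 158 districts are wanted, 138 do not exceed 2.25 per cent., and the difference of 20 is taken from the 89 in the next interval.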
Fit-"-
rule, but the actual ratio, viz. 0p61, does not diverge greatly.
It follows from this ratio that a range of nine times the semi-
like the median, with great ease, and the quartiles may be found,
three men picked out for measurement who stand in the centre
Chap. VII. § 14. It has, however, been largely used in the past,
the average, but also deviations from the average, the geometric
$v = 100\,\frac{\sigma}{M},$
mean, the coefficient of variation (ref. 6), and has used it, for
measurement employed.
$\text{skewness} = \frac{\text{mean} - \text{mode}}{\text{standard deviation}}$ . . (12)
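Both relative measures can be illustrated from figures already given for the pauperism distribution of 1891: mean 3.289, approximate mode 3.007, and standard deviation 1.24 per cent. A minimal sketch, with hypothetical function names:

```python
def coefficient_of_variation(mean, standard_deviation):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * standard_deviation / mean


def skewness(mean, mode, standard_deviation):
    """Measure (12): (mean - mode) / standard deviation."""
    return (mean - mode) / standard_deviation


mean_1891 = 3.289
mode_1891 = 3.007   # approximate mode
sigma_1891 = 1.24

v = coefficient_of_variation(mean_1891, sigma_1891)
sk = skewness(mean_1891, mode_1891, sigma_1891)
print(round(v, 1), round(sk, 3))
```

Both results are pure numbers, independent of the unit of measurement, and the positive sign of the skewness shows the mean lying above the mode.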
from the formula, but, as a fact, the value does not exceed unity for
(cf. Chap. VII. §20). The measure (12) is much more sensitive
past in lieu of the methods dealt with in Chapters VI. and VII.,
the total frequency into ten equal parts, form a natural and
above it and 50 per cent. below, is the median: the two quartiles
lie between the second and third and the seventh and eighth
deciles respectively.
the same as for median and quartiles (Chap. VII. § 15, and above,
in receipt of relief
which the Pauperism on 1st January 1891 did not exceed any given per-
determination of Deciles.
The figures of the original table are added up step by step from
the top, so as to give the total frequency not exceeding the upper
figure, rises slowly at first when the frequencies are small, then
more rapidly as they increase, and finally turns over again and
becomes quite flat as the frequencies tail off to zero. The deciles
terminal ordinate into ten equal parts, and projecting the points
the fourth decile, the value of which is approximately 2.88 per cent.
parts (grades, as Sir Francis Galton has termed them), and erecting
responding percentile. This gives the curve of fig. 27, which was
[Fig. 27: ogive curve; horizontal axis, Grades, from 0 to 100.]
that the ogive curve does not bring out the asymmetry of the
boys are then "numbered up " in order, the number of each boy,
or his rank, serves as some sort of index to his capacity (cf. the
not quite the same as grade; if a boy is tenth, say, from the
REFERENCES.
General.
Standard Deviation.
Roy. Soc, Series A, vol. clxxxv., 1894, p. 71. (Introduction of the term
Mean Deviation.
(3) Laplace, Pierre Simon, Marquis de, Théorie analytique des probabili-
Law of Frequency of Error," Phil. Mag., vol. xlix (4th Series), 1875,
pp. 33-46.
measure of dispersion.)
Relative Dispersion.
Skewness.
of term, p. 370.)
We have given a direct method that seems the simplest and best for
EXERCISES.
1. Verify the following from the data of Table VI., Chap. VI., continuing
the work from the stage reached for Qu. 1, Chap. VII.
Stature in
                                          England.   Scotland.   Wales.   Ireland.
Standard deviation                          2.56       2.50       2.35     2.17
Mean deviation                              2.05       1.95       1.82     1.69
Quartile deviation                          1.78       1.56       1.46     1.35
Mean deviation/standard deviation           0.80       0.78       0.78     0.78
Quartile deviation/standard deviation       0.69       0.62       0.62     0.62
Lower quartile                             65.55      66.92      65.06    66.39
Upper quartile                             69.10      70.04      67.98    69.10
for the distribution of weights of adult males in the United Kingdom given in
Compare the ratios of the mean and quartile deviations to the standard
Find the value of the skewness (equation 12), using the approximate value
of the mode.
find the quartile values for houses assessed to inhabited house duty in 1885-6,
Find also the 9th decile (the value exceeded by 10 per cent. of the houses
only).
numbers 1 to 10.
5. (Data from Sauerbeck, Jour. Roy. Stat. Soc., March 1909.) The
1908 on their average prices in the years 1867-77 :—40, 43, 43, 46, 46, 46,
54, 56, 59, 62, 64, 64, 66, 66, 67, 67, 68, 68, 69, 69, 69, 71, 75, 75, 76, 76,
78, 80, 82, 82, 82, 82, 82, 83, 84, 86, 88, 90, 90, 91, 91, 92, 95, 102, 127.
Find the mean and standard deviation (1) without further grouping; (2)
grouping the numbers by fives (40-, 45-, 50-, etc.); (3) grouping by tens (40-,
median) as origin, on the assumption that the observations are evenly distributed
over each class-interval. Take the number of observations below the
interval containing the mean (or median) to be n1, in that interval n2, and
above it n3 ; and the distance of the mean (or median) from the arbitrary
origin to be d
Show that the values of the mean deviation (from the mean and from the
median respectively) for Example ii., found by the use of this formula, do not
differ from the values found by the simpler method of §§ 16 and 17 in the
Chap. VII.: the second form of the relation is given by G. Duncker (Die
that if deviations are small compared with the mean, so that $(x/M)^3$ may be
$G = M\left(1 - \frac{\sigma^2}{2M^2}\right),$
where G is the geometric mean, M the arithmetic mean, and σ the standard
9. (Scheibner, loc. cit., Qu. 8.) Similarly, show that if deviations are small
$H = M\left(1 - \frac{\sigma^2}{M^2}\right)$
CHAPTER IX.
CORRELATION.
1-3. The correlation table and its fonnation—4-5. The correlation surface—
6-7. The general problem—8-9. The line of means of rows and the
the coefficient.
peerage). Table V., the rate of discount and the ratio of reserves
the first variable for cases in which the second variable lies
within the limits stated on the left of the row. Similarly, every
for cases in which the value of the first variable lies within the
of the one set running vertically and the other horizontally, and
termed the type of the array. (Pearson, ref. 6.) The special
headed in the same way as the final table and entering a dot,
occur in the table, as, e.g., in Table III. In this case the statures
which the recorded stature of the father is 60.5 in. and that of
fractional frequencies.
distributions for all arrays of the one variable, to the same scale,
stereograms.
under a few simple types: the forms are too varied. The simplest
and the maximum does not lie in the centre of the distribution.
etc. The data of Table II. will serve as an example. The total
CORRELATION. 167
the top of the table (the mode being lower than the mean), and
negative for the rows at the foot, the more central rows being
upper end of the table in the compartment under the row and
column headed "30 -". The frequency falls off very rapidly
towards the lower ages, and slowly in the direction of old age.
any simple types. Tables V. and VI. are given simply as illus-
and if so, how closely, the two variables are related, and much
table, seeing that the measures of Chapters VII. and VIII. can be
In general they are not the same, and the relation between the
more important, and our attention will for the present be con-
variables, i.e. the scales at the head and side of the table, 01, 12,
parallel arrays are similar (Chap. V. § 13), and the means of arrays
must lie on the vertical and horizontal lines $M_1M$, $M_2M$, the
Fig. 32.
small circles denoting means of rows and the small crosses means
of columns. (In any actual case, of course, the means would not
would only fluctuate slightly to the one side and the other of the
two lines.)
each array, and the means of rows and of columns lie approximately
on one and the same curve, like the line RR of fig. 33.
either (1) that the means of arrays lie very approximately round
leaves too much room for guesswork, and different observers obtain
standard deviations.
10. Consider the simplest case in which the means of rows lie
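$$p = \frac{1}{N}\,\Sigma(xy) \qquad (1)$$
$$b_1 = \frac{p}{\sigma_y^2} \qquad (2)$$
$$b_2 = \frac{p}{\sigma_x^2} \qquad (3)$$
$$r = \frac{p}{\sigma_x \sigma_y} \qquad (4)$$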
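The constants of the linear case can be sketched numerically. The following is a minimal illustration, assuming that p denotes the mean product of deviations from the means and that the two regressions are p divided by the respective variances; the function name and the five pairs of values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from math import sqrt

def correlation_constants(xs, ys):
    """Return (p, b1, b2, r) for paired observations.

    p  : mean product of the deviations from the two means
    b1 : regression of x on y, p / variance of y
    b2 : regression of y on x, p / variance of x
    r  : correlation coefficient, p / (sd_x * sd_y)
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    p = sum(a * b for a, b in zip(dev_x, dev_y)) / n
    var_x = sum(a * a for a in dev_x) / n
    var_y = sum(b * b for b in dev_y) / n
    return p, p / var_y, p / var_x, p / sqrt(var_x * var_y)

# Hypothetical observations, not taken from any table in the text.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
p, b1, b2, r = correlation_constants(xs, ys)
```

With these definitions the product of the two regressions equals the square of r, which is why both regressions necessarily carry the sign of r.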
This is necessarily greater than the value (7); hence $\Sigma(x - b_1 y)^2$
is n, the deviation of the mean of the row from RR is d (fig. 35),
But the first of the two sums on the right is unaffected by the
That is to say, when $b_1$ is put equal to $r\,\sigma_x/\sigma_y$, the sum of the squares
and also $\Sigma(n.d^2)$ (fig. 35). Hence we may regard the equations (6)
least possible; or (b) equations for estimating the mean of the x's
associated with a given type of y (and the mean of the y's associated
Fig. 36. — Correlation between Age of Husband and Age of Wife in England
and Wales (Table II.): means of rows shown by circles and means of
natural sense, "lines of best fit" to the two actual lines of means.
and $b_2$ are zero (cf. § 8). The sign is the sign of the mean
[Figure: axis label "Father's stature"; r = +0.51.]
fig. 33 all lay on one and the same straight line. From these
Figs. 36, 37, 38 are drawn from the data of Tables II., III., and
IV., for which r has the values +0.91, +0.51, and +0.21 respectively.
$$b_1 = r\frac{\sigma_x}{\sigma_y}, \qquad b_2 = r\frac{\sigma_y}{\sigma_x}$$
are both necessarily of the same sign (the sign of r). Since r is
portion of Male Births per thousand of all births (England and Wales,
not greater than unity, but the other may be considerably greater
stature (Galton, refs. 2, 3). In this case the two standard deviations
are very nearly equal, so that both $b_1$ and $b_2$ are less than
unity, say (using the more recent data of Table III.) 0.50 and 0.52.
Hence the sons of fathers of deviation x from the mean of all fathers
have an average deviation of only 0.52x from the mean of all sons;
i.e. they step back or "regress" towards the general mean, and 0.52
$$x = b_1 y, \qquad y = b_2 x.$$
truly linear and the standard deviations of all parallel arrays are
of arrays."
Columns: 1. Union; 2. X, Estimated Average Earnings of Agricultural Labourers (Shillings and Pence per Week); 3. Y, Percentage of Population in receipt of Poor-law Relief; 4. x, Deviation of X from Mean (Pence); 5. y, Deviation of Y from Mean; 6. x^2; 7. y^2; 8. Products xy, Positive; 9. Products xy, Negative.

1. Union    | 2. X (s. d.) | 3. Y | 4. x | 5. y  | 6. x^2 | 7. y^2 | 8. xy (+) | 9. xy (-)
1. Glendale | 20 9         | 2.40 | +58  | -1.27 | 3364   | 1.6129 |           | 73.66
2. Wigton   | 20 3         | 2.29 | +52  | -1.38 | 2704   | 1.9044 |           | 71.76
3. Garstang | ..
from the mean are written down (columns 4 and 5): care must
17.53, and
$$r = -\frac{17.53}{20.5 \times 1.29} = -0.66.$$
the average earnings, so for the regressions we may alter the unit
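$$b_1 = r\frac{\sigma_x}{\sigma_y} = -0.87, \qquad b_2 = r\frac{\sigma_y}{\sigma_x} = -0.50.$$
$$x = -0.87\,y, \qquad y = -0.50\,x.$$
$$X = 19.13 - 0.87\,Y \qquad (a)$$
the units being 1s. for the earnings and 1 per cent. for the
respectively are
$$\sigma_x\sqrt{1 - r^2} = 15.4\text{d.} = 1.28\text{s.}$$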
this might mean that the giving of relief tends to depress wages.
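The arithmetic for the earnings and pauperism example can be retraced as a rough check. The sketch below assumes, from the printed figures, a mean product of -17.53, a standard deviation of 20.5 pence for earnings and 1.29 per cent. for pauperism, and it rounds r to two places before forming the regressions, as the printed values suggest was done; the variable names are hypothetical.

```python
from math import sqrt

# Figures as printed: mean product (pence x per cent.) and the two standard deviations.
mean_product = -17.53
sd_earnings_pence = 20.5
sd_pauperism = 1.29

# Correlation coefficient, rounded to two places as in the printed working.
r = round(mean_product / (sd_earnings_pence * sd_pauperism), 2)

# Regression of earnings on pauperism, converted from pence to shillings.
b1_shillings = r * sd_earnings_pence / sd_pauperism / 12.0

# Regression of pauperism on earnings, per shilling rather than per penny.
b2_per_shilling = r * sd_pauperism / sd_earnings_pence * 12.0

# Standard error made in estimating earnings from pauperism.
error_pence = sd_earnings_pence * sqrt(1.0 - r * r)
error_shillings = error_pence / 12.0

print(r, round(b1_shillings, 2), round(b2_per_shilling, 2), round(error_pence, 1))
```

The change of unit matters only for the regressions and the standard error; r itself is unaffected by it.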
obtained. Take scales along two axes at right angles (fig. 40)
lines, and then to check the work by seeing that they meet in the
determined from (b) by the points X = 12, Y = 5.64 and X = 21,
low earnings but very low pauperism, and Glendale and Wigton
with the highest earnings but a pauperism well above the lowest—
reduced to the value it would have with respect to the mean; (2)
the arithmetic.
$$\xi = x + \bar{\xi}, \qquad \eta = y + \bar{\eta}.$$
$$\Sigma(\xi\eta) = \Sigma(xy) + N\bar{\xi}\bar{\eta},$$
$$\Sigma(xy) = \Sigma(\xi\eta) - N\bar{\xi}\bar{\eta}.$$
$$p = p' - \bar{\xi}\bar{\eta}$$
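The reduction from an arbitrary origin to the mean can be checked on a small example. This is a sketch only: it assumes that the mean product about the mean equals the mean product about the arbitrary origin less the product of the two mean deviations from that origin, and the data, origins and function names are hypothetical.

```python
def mean_product_about_mean(xs, ys):
    """Mean product of deviations measured directly from the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def mean_product_via_arbitrary_origin(xs, ys, origin_x, origin_y):
    """Same quantity, computed from deviations about an arbitrary origin."""
    n = len(xs)
    xi = [x - origin_x for x in xs]      # deviations from the arbitrary origin
    eta = [y - origin_y for y in ys]
    xi_bar = sum(xi) / n                 # mean deviation, i.e. mean minus origin
    eta_bar = sum(eta) / n
    p_prime = sum(a * b for a, b in zip(xi, eta)) / n
    return p_prime - xi_bar * eta_bar    # correction for the choice of origin

# Hypothetical observations and origins.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
direct = mean_product_about_mean(xs, ys)
shortcut = mean_product_via_arbitrary_origin(xs, ys, origin_x=3.5, origin_y=2.0)
```

The signs of the two mean deviations must be kept, since their product may be negative and then increases the corrected value.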
In any case where the origin from which deviations have been
Wales; (2) Y, the ratio of the numbers of persons given relief "outdoors"
(in their own homes) to one "indoors" (in the workhouse).
The figures refer to a one-day count (1st August 1890, No. 36,
1890), and the table is one of a series that were drawn up with
The arbitrary origin for X was taken at the centre of the fourth
row, or 3.5. The following are the values found for the constants
this sign will be positive in the upper left-hand and lower right-
being grouped according to the value and sign of $\xi\eta$. Thus for
column 4 of Table VIII. (the row and column for which $\xi\eta = 0$),
common to the two; this grand total must clearly be equal to the
nary type. The numbers in heavy type are the Deviation-Products ($\xi\eta$).)
Row heading: Number relieved Outdoors to One Indoors. Column classes: 0-5, 5-10, 10-15, 15-20, 20-25, 25-30, 30-35, 35-40; final column: Total.
Entries by row, in extraction order:
0-1: 0.5, 9, 90, 10, 10, 17.5, 3, 12
1-2: 3.5, 13.0, 10.0, 14.0, 5.0; Total 45.5
2-3: 1.0, 4.5, 13.0, 13.5, 14.0, 2.0; Total 48.0
3-4: 1.0, 45, 0 ..
Column headings (1 to 6): Frequencies (Quadrants: Positive, Negative); Total; Products (Quadrants: Positive, Negative).
Entries, in extraction order: 1.5, 21.5, 20, +1.5, 20, 11.5, +8.5, 17, 12, +10, 30, 18, +17, 68, 175, +16.5, 99, 0.5, +1.5, 12, 1.5, +0.5, 4.5, 10, 0.5, +3.5, 35, 12, ..
x = 0.74y, or
X = 13.9 + 0.74Y,
and columns, and drawing a diagram like figs. 36, 37, and 38.
separated from it, and the daughter-frond when its first daughter-
made with the Zeiss-Abbe camera under a low power, the actual
The arbitrary origin for both X and Y was taken at 105 mm.
The following are the values found for the constants of the single
distributions:
the totals the frequency in the row and column for which $\xi\eta$ is
[Table IX. and Table IXa.: printed sideways, layout lost in extraction. Table IX. class intervals run in steps of 6 from 60-66 to 162-168. Table IXa. column headings: Frequencies (Quadrants); Total; Products (Quadrants).]
Y = 1.48 + 0.69X
points in working:—
(1) To give p and $\xi\eta$ their correct signs in finding the true
mean deviation-product p.
unit, in the value of $r = p/(\sigma_x \sigma_y)$, for these are the units in terms
(3) To use the proper units for the standard deviations (not
y and then work out the correlation coefficient for samples of,
REFERENCES.
Sir Francis Galton, in (2), (3), and (4), developed the practical method,
quency (cf. Chap. XVI.). The method used in the preceding chapter
(1) Bravais, A., "Analyse mathématique sur les probabilités des erreurs de
(5) Edgeworth, F. Y., "On Correlated Averages," Phil. Mag., 5th Series,
(7) Yule, G. U., "On the significance of Bravais' Formulæ for Regression,
etc., in the case of Skew Correlation," Proc. Roy. Soc., vol. lx., 1897,
p. 477.
(8) Yule, G. U., "On the Theory of Correlation," Jour. Roy. Stat. Soc.,
tion," Mem. and Proc. of the Manchester Lit. and Phil. Soc., vol. li.,
relating to several Quantities," Phil. Mag., 5th Series, vol. xxiv., 1887,
chapter, and based on the use of the median: the method involves
the use of trial and error to some extent. For some illustrations see
EXERCISES.
X. Y.
for so few observations: the figures are given solely as a short example on
2. The following figures show, for the districts of Example i., the ratios of
of relief in the workhouse. Find the correlations between the out-relief ratio
and (1) the estimated earnings of agricultural labourers; (2) the percentage
No. | Ratio | No. | Ratio | No. | Ratio
 1  |  6.40 | 14  | 7.50  | 27  | 2.97
 2  |  4.04 | 15  | 4.44  | 28  | 5.38
 3  |  7.90 | 16  | 8.34  | 29  | 3.24
 4  |  3.31 | 17  | 0.69  | 30  | 7.61
 5  |  7.85 | 18  | 9.89  | 31  | 5.87
 6  |  0.45 | 19  | 4.00  | 32  | 5.50
 7  | 10.00 | 20  | 6.02  | 33  | 3.58
 8  |  4.43 | 21  | 8.27  | 34  | 6.93
 9  |  4.78 | 22  | 1.58  | 35  | 6.02
10  |  4.73 | ..
I. Group together (1) two top rows, (2) three bottom rows, (3) two first
II. Regroup by ten-year intervals (15-, 25-, 35-, etc.) for both husband and
etc., for son. If a 3-inch grouping be used (58.5-61.5, etc., for both father and
son), the coefficient of mean square contingency is 0.465. [Both results cited
VI. For cols., group all up to 494.5 and all over 521.5, leaving central cols.
CHAPTER X.
METHODS.
coefficient.
consistent with others; and not only are care and judgment
and, if several are to be dealt with, they should afford the answers
general rules can be laid down, but the following are given as
aged in receipt of relief, was given in Chap. IX. (p. 183). The
tics of crime).
age, the actual figures given by one of the only two then existing
returns of the age of paupers being—2 per cent. under age 16,
1 per cent. over 16 but under 65, 20 per cent. over 65. (Return
36, 1890.)
"pauperism "?
VIII., Chap. IX.), as numbers are more important than cost from
of dealing with these two classes differ entirely from the methods
there does not seem to be any special reason for taking the one
return rather than the other, but the return for 1st January was
relief on 1st January 1871, 1881, and 1891 (the three census
years), less lunatics and vagrants, was therefore tabulated for each
The latter seems, as before, the simpler and more important ratio
for the present purpose, though some writers have preferred the
outdoor to total paupers, the figures are 94 per cent. and 91 per
cent. respectively, which are so close that they will probably fall
therefore tabulated for 1st January in the census years 1871, 1881,
1891.
room, i.e. "overcrowding"; (3) rateable value per head (Aged Poor—
at the census of 1891, and are not available for earlier years.
Some trial was made of rateable value per head, but with not
very satisfactory results. For any given year, and for a group of
pauperism, but changes in the two are not very highly correlated:
years of age was therefore worked out for every union and tabulated
cent. for those over 65, and only 1.2 per cent. for those under that
bers of each age and sex. (Cf. below, Chap. XI. pp. 219-21.)
tabulated for every union were then measured by working out the
the value in the earlier year as 100 in each case. The percentage
the conditions are and were very different for rural and for urban
0.3 but not more than 1 person per acre: Urban, more than 1 person
selves. The limit 0.3 for rural unions was suggested by the
Table VII., Chap. IX.): the average density of these was 0.25,
and 34 of the 38 were under 0.3. The lower limit of density for
between all the possible (6) pairs that can be formed from the four
variables.
of fertility in man. (Cf. Pearson and others, ref. 3.) One table,
from the memoir cited, was given as an example in the last chapter
(Table IV.).
the same way by excluding all cases in which, say, husband was
over 30 years of age or wife over 25 (or even less) at the time
afterwards.
the first generation is 5 (say the mother and her brothers and
for whom data are given, and who fulfils the conditions as to
turnips and other root crops, hay, etc.), and the weather. (Cf.
other hand, the area should not be too small; it should be large
the critical period will not be very well defined. If, therefore,
rainfalls were averaged for eight stations within the area, and
limit (about 42° Fahr.) there is very little growth, and the
The student should refer to the original for the full discussion
for the more rapid movements will often exhibit a fairly close
births in the same year); (2) the general mortality (deaths at all
ages per 1000 living) in England and Wales during the period
when the infantile mortality rose from one year to the next
the general mortality also rose, as a rule; and similarly, when the
were, in fact, only five or six exceptions to this rule during the
as the general mortality has been falling more or less steadily since
tion between annual values may, indeed, very well vanish, for the
causes as show marked changes between one year and the next, it
the process), and the correlation worked out between the figures of
columns 3 and 5.
1. Year | 2. Infantile Mortality per 1000 Births | 3. Increase or Decrease from Year before | 4. General Mortality per 1000 living | 5. Increase or Decrease from Year before
1838 | 159 | ..  | 22.4 | ..
1839 | 151 | -8  | 21.8 | -0.6
1840 | 154 | +3  | 22.9 | +1.1
1841 | 145 | -9  | 21.6 | -1.3
1842 | 152 | +7  | 21.7 | +0.1
1843 | 150 | -2  | 21.2 | -0.5
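The differencing process of the table can be sketched as follows. The six years of figures are those printed above; the function names are hypothetical, and a coefficient worked out on only five differences is an illustration of the method, not a result to be relied on.

```python
from math import sqrt

def first_differences(values):
    """Increase or decrease of each value from the value of the year before."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

def pearson_r(xs, ys):
    """Product-moment correlation of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Columns 2 and 4 of the table, 1838 to 1843.
infantile = [159, 151, 154, 145, 152, 150]
general = [22.4, 21.8, 22.9, 21.6, 21.7, 21.2]

# Columns 3 and 5 of the table.
infantile_diff = first_differences(infantile)
general_diff = first_differences(general)

# Correlation between the year-to-year movements of the two rates.
r_differences = pearson_r(infantile_diff, general_diff)
```

Working with differences removes the slow secular change in both series, so that only the year-to-year movements contribute to the coefficient.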
For the period to which the diagram refers, viz. 1838-1904, the
CORRELATION: PRACTICAL APPLICATIONS AND METHODS. 199
and under 1 year of age. (Cf. Exercises 7 and 8, Chap. XI., and
England and Wales; (2) the values of exports and imports per
rise from one year to the next when the other rises, and a fall
when the other falls. The movement of both variables is, however,
trend, and it is the "waves" in the two variables, the short-period
being as near as may be the same as the period of the "waves",
apart from the slower changes. The figures for foreign trade
however great they may be. The arithmetic may be carried out
in the form of the following table, and the correlation worked out
1. Year | 2. Marriage-rate (England and Wales) | 3. Nine Years' Average | 4. Difference | 5. Exports + Imports, £'s per head (U.K.) | 6. Nine Years' Average | 7. Difference
1855 | 16.2 | .. | .. |  9.36 | .. | ..
1856 | 16.7 | .. | .. | 11.14 | .. | ..
1857 | 16.5 | .. | .. | 11.85 | .. | ..
1858 | 16.0 | ..
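The table's procedure, subtracting a nine years' average from each annual value and correlating the differences, can be sketched in code. Since only the first rows of the table survive here, the two series below are synthetic (a straight-line trend plus a wave of nine years' period) and are labelled as such; the function names are hypothetical.

```python
from math import pi, sin, sqrt

def pearson_r(xs, ys):
    """Product-moment correlation of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def deviations_from_moving_average(values, window=9):
    """Value of each year less the average of the window centred on that year."""
    half = window // 2
    deviations = []
    for i in range(half, len(values) - half):
        centred_mean = sum(values[i - half:i + half + 1]) / window
        deviations.append(values[i] - centred_mean)
    return deviations

# Synthetic series: opposite trends, but a common wave of nine years' period.
years = range(36)
wave = [sin(2 * pi * t / 9) for t in years]
series_a = [16.0 - 0.05 * t + 0.5 * w for t, w in zip(years, wave)]
series_b = [9.0 + 0.30 * t + 1.0 * w for t, w in zip(years, wave)]

r_raw = pearson_r(series_a, series_b)
r_deviations = pearson_r(deviations_from_moving_average(series_a),
                         deviations_from_moving_average(series_b))
```

In this constructed case the raw correlation is negative because the trends oppose each other, while the deviations from the nine-year averages are almost perfectly correlated. The first and last four years drop out, as in the table, because no centred average exists for them.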
the exports and imports of the year before, or two years before,
possible to obtain the figures for exports and imports for periods
+ Imports per head) in England and Wales: the Curves show Deviations
from 9-year means. Data of R. H. Hooker, Jour. Roy. Stat. Soc., 1901.
similar to fig. 43, and the nature of the relationship between the
$$Y = A + B.\phi(X),$$
$$X(Y - B) = A,$$
$$XY = A + BX.$$
$$Z = A + BX.$$
$$Y = AB^X$$
we have
$$X^n Y = A$$
we have
we have
method of treatment.
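One of the reductions to linear form can be illustrated with a short sketch. It assumes the curve Y = A.B^X, for which log Y is a linear function of X, and fits the straight line by least squares; the data are generated from known constants (A = 2, B = 1.5) purely for illustration, and the function names are hypothetical.

```python
from math import exp, log

def fit_line(xs, ys):
    """Least-squares straight line ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_exponential(xs, ys):
    """Fit Y = A * B**X by fitting log Y = log A + X log B as a straight line."""
    intercept, slope = fit_line(xs, [log(y) for y in ys])
    return exp(intercept), exp(slope)

# Synthetic values lying exactly on Y = 2 * 1.5**X.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * 1.5 ** x for x in xs]
A, B = fit_exponential(xs, ys)
```

The least-squares condition is then applied to log Y and not to Y itself, so with real, scattered data the fitted curve is not the one that minimises the squared errors in Y.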
found in various ways, for the most part dependent either (1)
on the formulae for the two regressions $r\,\sigma_x/\sigma_y$ and $r\,\sigma_y/\sigma_x$, or (2) on
$$r = \sqrt{b_1 b_2}$$
coefficient.
(2) The means of one set of arrays only, say the rows, are
calculated, and also the two standard-deviations $\sigma_x$ and $\sigma_y$. The
the slope of the line to the vertical is $r\,\sigma_x/\sigma_y$, and hence r will be
deviations.
and the ratio of the dispersions of the two variables being required:
dispersion, will serve quite well for rough purposes in lieu of the
RR to the vertical is r.
deviations as units, and measuring the slope of the line, was the
and hence
2.56, 2.26, 2.11, 2.26, 2.55, 2.45, 2.24, 2.33, 2.23, 2.60
Mean 2.359
approximately
$$r = \sqrt{1 - \left(\frac{2.359}{2.75}\right)^2} = 0.514.$$
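The step from the mean standard deviation of the arrays to r can be retraced as below. This is a sketch resting on two assumptions: that the standard deviation of an array is the standard deviation of the whole distribution multiplied by the square root of (1 - r^2), and that the latter is 2.75, a value not legible in this copy but the one implied by the printed result 0.514.

```python
from math import sqrt

# Standard deviations of the ten arrays, as printed.
array_sds = [2.56, 2.26, 2.11, 2.26, 2.55, 2.45, 2.24, 2.33, 2.23, 2.60]
mean_array_sd = sum(array_sds) / len(array_sds)

# Assumed standard deviation of the whole distribution (not legible in the source).
total_sd = 2.75

# array sd = total sd * sqrt(1 - r^2), solved for r.
r = sqrt(1.0 - (mean_array_sd / total_sd) ** 2)
```

Because r enters only through its square, this route gives the magnitude of r but not its sign.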
case this would only lead to a very slightly different result, viz.
curve which sweeps through all the means of arrays, and the second
REFERENCES.
(1) Yule, G. U., "On the Correlation of total Pauperism with Proportion of
Out-relief," Economic Jour., vol. v., 1895, p. 603, and vol. vi., 1896,
p. 613.
(2) Yule, G. U., "An Investigation into the Causes of Changes in Pauperism
Roy. Stat. Soc., vol. lxii., 1899, p. 249. (Cf. Illustration i.)
(4) Hooker, R. H., "On the Correlation of the Marriage-rate with Trade,"
Jour. Roy. Stat. Soc., vol. lxiv., 1901, p. 485. (The method of
Illustration v.)
of Illustration iv.)
(6) Hooker, R. H., "The Correlation of the Weather and the Crops," ibid.,
(7) Norton, J. P., Statistical Studies in the New York Money Market;
logarithmic curve.)
de la société de statistique de Paris, 1905, pp. 255 and 306. (Uses the
(9) Yule, G. U., "On the Changes in the Marriage and Birth Rates in
England and Wales during the past Half Century, with an Inquiry as
to their probable Causes," Jour. Roy. Stat. Soc., vol. lxix., 1906, p. 83.
(The second part is useful for the fitting of curves in cases of non-linear
regression.)
(12) Pearson, Karl, On the General Theory of Skew Correlation and Non-
which measures the approach of the points given by every pair of values
of the two variables x and y to a curve of any form, in the same way
CHAPTER XI.
THE CORRELATION-COEFFICIENT.
ing to the correction of death-rates, etc., for varying sex and age-
arithmetic mean.
derive their importance largely from the fact that they fulfil this
^ = X.-I i Ji-rp
evidently
That is, if r be the correlation between $x_1$ and $x_2$, and $\sigma$, $\sigma_1$, $\sigma_2$
$+ \ldots + 2r_{23}.\sigma_2\sigma_3 + \ldots$
$r_{12}$ being the correlation between $X_1$ and $X_2$, $r_{23}$ the correlation
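The relation between the standard deviation of a sum and those of its parts can be verified numerically for two variables. The sketch assumes the form variance of the sum = variance of the first + variance of the second + 2r times the product of the two standard deviations; the data and function names are hypothetical.

```python
from math import sqrt

def variance(values):
    """Mean square deviation from the arithmetic mean (divisor N)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

def pearson_r(xs, ys):
    """Product-moment correlation of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    p = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    return p / sqrt(variance(xs) * variance(ys))

# Hypothetical paired observations and their sums.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
z = [a + b for a, b in zip(x1, x2)]

sd1, sd2 = sqrt(variance(x1)), sqrt(variance(x2))
r12 = pearson_r(x1, x2)

variance_of_sum = variance(z)
variance_from_parts = sd1 ** 2 + sd2 ** 2 + 2.0 * r12 * sd1 * sd2
```

For a difference in place of a sum the sign of the product term is reversed, and with r zero the two variances simply add.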
. Z=X+S.
THE USE OF THE CORRELATION-COEFFICIENT. 209
zero. Then, the arithmetic mean error being zero for all values
notice that the assumption made does not imply the complete in-
errors fluctuate more, for example, with large than with small
supposed.
and accordingly
we have
r t = Sfoy^S^y,,) = S(x1y2)2(a:2y1)
rrrr
rrrr
"''" (r*,-•r„*)'-'
N*\X
that $x_2/M_2$ is so small that powers higher than the second can
$$v_2 = \sigma_2/M_2,$$
$$I = \frac{M_1}{M_2}\left(1 - r\,v_1 v_2 + v_2^2\right) \qquad (9)$$
2 , 72 l Mi2*
s2 + I" = NMf
(1+fJV2f2+32j)
M2
or from (9)
given by
*w.-<£;-/l)$-',)
L3
<Xxf)~^
.MXM,
-'W+SX,+^+iy -"*-
MM
where, in the last step, a term of the order $v_3^4$ has again been
vl
Z2.X3 = X2, and the answer to the question whether the correlation
obtainable for the coefficient, which holds good even for the limiting
case when there are only two rows and two columns. It is
(AB)   (αB)   (B)
(Aβ)   (αβ)   (β)
(A)    (α)    N
Taking the centre of the table as arbitrary origin and the class-
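$$\bar{\xi} = \frac{1}{2N}\{(\alpha) - (A)\}$$
$$\sigma_1^2 = 0.25 - \bar{\xi}^2 = (A)(\alpha)/N^2$$
Finally,
Writing
$$(AB) - (A)(B)/N = \delta$$
this reduces to
$$\Sigma(xy) = \delta$$
Whence
$$r = \frac{N\delta}{\sqrt{(A)(\alpha)(B)(\beta)}} \qquad (11)$$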
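The coefficient for a table of two rows and two columns can be sketched directly from the four class-frequencies. The sketch assumes formula (11) in the form r = N.delta / sqrt((A)(alpha)(B)(beta)), with delta = (AB) - (A)(B)/N; the frequencies and the function name are hypothetical.

```python
from math import sqrt

def fourfold_r(n_AB, n_alphaB, n_Abeta, n_alphabeta):
    """Product-moment coefficient for a table with two rows and two columns."""
    n = n_AB + n_alphaB + n_Abeta + n_alphabeta
    n_A = n_AB + n_Abeta              # (A)
    n_alpha = n_alphaB + n_alphabeta  # (alpha)
    n_B = n_AB + n_alphaB             # (B)
    n_beta = n_Abeta + n_alphabeta    # (beta)
    delta = n_AB - n_A * n_B / n      # excess of (AB) over its independence value
    return n * delta / sqrt(n_A * n_alpha * n_B * n_beta)

# Hypothetical frequencies.
r_example = fourfold_r(30, 10, 20, 40)

# Limiting case: (alpha B) and (A beta) both vanish, so that r is unity.
r_perfect = fourfold_r(5, 0, 0, 7)
```

The numerator N.delta is the same quantity as the cross-product difference (AB)(alpha beta) - (A beta)(alpha B), so either form may be used.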
(AB) = (A) = (B). This is the only case in which both frequencies
(αB) and (Aβ) can vanish so that (AB) and (αβ) correspond to
one family with statures 5 ft. 9, 5 ft. 10, and 5 ft. 11, these are
„ 5 ft. 11 5 ft. 11 „
5 ft. 10 „ „ „ „ 5 ft. 10
which may be entered into the table. The entire table will be
i.e. once in combination with every other value, the means and
written
$$= x_1\{\Sigma(x) - x_1\} + x_2\{\Sigma(x) - x_2\} + x_3\{\Sigma(x) - x_3\} + \ldots$$
$$r = -\frac{N\sigma^2}{N(N-1)\sigma^2} = -\frac{1}{N-1} \qquad (13)$$
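The result for a table formed from all possible pairs can be checked by brute force. The sketch assumes that every value is entered once in combination with every other value, in both orders, and that (13) gives r = -1/(N - 1); the statures of 69, 70 and 71 in. correspond to the 5 ft. 9, 5 ft. 10 and 5 ft. 11 of the family mentioned in the text, and the function name is hypothetical.

```python
from itertools import permutations
from math import sqrt

def all_pairs_r(values):
    """Correlation between first and second members over all ordered pairs."""
    pairs = list(permutations(values, 2))   # each value paired with every other
    firsts = [a for a, _ in pairs]
    seconds = [b for _, b in pairs]
    n = len(pairs)
    mean_1 = sum(firsts) / n
    mean_2 = sum(seconds) / n
    sxy = sum((a - mean_1) * (b - mean_2) for a, b in pairs)
    sxx = sum((a - mean_1) ** 2 for a in firsts)
    syy = sum((b - mean_2) ** 2 for b in seconds)
    return sxy / sqrt(sxx * syy)

r_three = all_pairs_r([69.0, 70.0, 71.0])             # N = 3
r_five = all_pairs_r([60.0, 63.5, 66.0, 70.0, 71.5])  # N = 5, hypothetical values
```

The coefficient depends only on the number of values and not on the values themselves, which is the point of the formula.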
values $x_1$, $x_2$ only determine the two points ($x_1$, $x_2$) and ($x_2$, $x_1$),
association will arise between the first and second member of the
pair if all possible pairs are formed in a mixture of A's and α's.
the equation (13) still holds, even if the variables can only assume
XIV.
-My,
second record is identical with that in the first record, or the mean
record, or both.
$$\frac{N_1 M_1 + N_2 M_2}{N_1 + N_2}$$
give a correlation-coefficient
r = 2fey)
%(xy)
(»1 + «2)<W'
r1 n1 + n2 v'
not really belong to the same skeleton, and have been virtually
$$M = \Sigma(X)/N.$$
so that
$$M' = \Sigma(W.X)/\Sigma(W).$$
know that 23,930 qrs. were sold at A, only 26 qrs. at B, and 3933
27889 =29s-
average prices from year to year (cf. Chap. VII. § 25), it may
importance from some point of view; and much has been written
of simply as unity.
rate for simplicity as a fraction, and not as a rate per 1000 of the
population,
$$\text{Birth-rate} = \frac{\text{total births}}{\text{total population}}$$
i.e. the rate for the whole country is the mean of the rates in the
correlation between weights and variables, $\sigma_w$ and $\sigma_x$ the standard-
receipt of Relief.
January 1. | Arithmetic Mean of Rates in different Districts. | England and Wales as a whole.
1850       | 6.51 | 5.80
1860       | 5.20 | 4.26
1870       | 5.45 | 4.77
1881       | 3.68 | 3.12
1891       | 3.29 | 2.69
In this case the weighted mean is markedly the less, and the
must therefore be negative, the larger (on the whole urban) dis-
other hand, for the decade 1881-90 the average birth-rate for
mean of the rates for the different districts 30.34 only. The
$$r = \frac{2.69 - 3.29}{1.24} \times \frac{459}{564} = -0.39.$$
$$r = \frac{32.34 - 30.34}{4.08} \times \frac{459}{564} = +0.40.$$
of course, accidental.
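The connection between the weighted and unweighted means and the correlation of weights with variables can be sketched as follows. It assumes the relation M' - M = r.sd_w.sd_x / (mean weight), which is the form the printed arithmetic implies, and reads the printed factor 459/564 as the ratio of the mean weight to the standard deviation of the weights; that reading, the small data set and the function name are assumptions.

```python
from math import sqrt

def weighted_mean_identity(weights, values):
    """Return (M' - M, r * sd_w * sd_x / mean weight) for comparison."""
    n = len(values)
    mean_x = sum(values) / n
    mean_w = sum(weights) / n
    weighted_mean = sum(w * x for w, x in zip(weights, values)) / sum(weights)
    sd_w = sqrt(sum((w - mean_w) ** 2 for w in weights) / n)
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in values) / n)
    p = sum((w - mean_w) * (x - mean_x) for w, x in zip(weights, values)) / n
    r = p / (sd_w * sd_x)
    return weighted_mean - mean_x, r * sd_w * sd_x / mean_w

# Hypothetical districts: populations as weights, rates as values.
difference, from_correlation = weighted_mean_identity(
    [10.0, 40.0, 25.0, 5.0], [3.0, 2.0, 2.5, 4.0])

# The two coefficients of the text, from the printed figures.
mean_weight_over_sd_weight = 459 / 564
r_pauperism = (2.69 - 3.29) / 1.24 * mean_weight_over_sd_weight
r_birth_rate = (32.34 - 30.34) / 4.08 * mean_weight_over_sd_weight
```

A weighted mean below the unweighted mean therefore indicates a negative correlation between weight and variable, and one above it a positive correlation.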
age-groups 0-, 10-, 20-, etc., in which the fractions of the whole
population are $p_0$, $p_1$, $p_2$, etc., where $\Sigma(p) = 1$. Let the death-
Now D and Δ may differ either because the d's and δ's differ
really both districts are about equally healthy, and the death-
district and the second urban, for instance, there will be a larger
population, but with the weights, $\pi_1$ $\pi_2$ $\pi_3$ . . . . given by the
19. Difficulty may arise in practical cases from the fact that
tion, but only the crude rates D and the fractional populations
then given by
$$\frac{\Sigma(d.\pi)}{\Sigma(d.p)} = \frac{\Sigma(\delta.\pi)}{\Sigma(\delta.p)}.$$
This will hold good if, e.g., the death-rates in the standard
ratio in all age-classes, i.e. $\delta_1/d_1 = \delta_2/d_2 = \delta_3/d_3 =$ etc. This method
eye-colour as well.
formed by finding the value of the variable for which the sum
10g (r. ^^ ^ .
REFERENCES.
(1) Sheppard, W. F., "On the Calculation of the Average Square, Cube, etc.,
of a large number of Magnitudes," Jour. Roy. Stat. Soc., vol. lx., 1897,
p. 698.
(2) Sheppard, W. F., "On the Calculation of the most probable Values of
Divisions of a Scale," Proc. Lond. Math. Soc., vol. xxix. p. 353. (The
Sheppard's result, but the mode in which he deduces this and similar
(Formula (8).)
(Proof of formula (8), but on different lines to that given in the text,
(9) Pearson, Karl, "On a Form of Spurious Correlation which may arise
when Indices are used in the Measurement of Organs," Proc. Roy. Soc.,
Letter to Pt. II. (Cd. 7769, 1895; 8503, 1897; 2619, 1908).
Corrected Birth-rates," Jour. Roy. Stat. Soc., vol. lxix., 1906, p. 34.
(15) Yule, G. U., "On the Changes in the Marriage- and Birth-Rates in
England and Wales during the past half-century," etc., ibid., p. 88.
Miscellaneous.
ductory part of this memoir, some of which have been utilised in §§ 12-
EXERCISES.
(p. 139) and iii. (p. 141) of Chapter VIII. on applying Sheppard's correction
for grouping.
difference of less than 0.5 per cent. in the rough value of the standard-
deviation.
Decade.      | Mean.  | Standard-deviation.
1881-1890    | 508.1  | 12.80
1891-1900    | 508.4  | 10.37
Both decades | 508.25 | 11.65
district in the two decades is + 0.36, estimate (1) the true standard-deviation
tions of sampling, i.e. of the errors produced by such fluctuations in the observed
4. (Data from Pearson, ref. 9.) The coefficients of variation for breadth,
height, and length of certain skulls are 3.89, 3.50, and 3.24 per cent. respectively.
Find the "spurious correlation" between the breadth/length and
Proc. Roy. Soc., vol. lxii. p. 413.) From short series of measurements on
American Indians the mean coefficient of correlation found between father and
son, and father and daughter, for cephalic index, is 0.14; between mother and
son, and mother and daughter 0.33. Assuming these coefficients should be
the same if it were not for the looseness of family relations, find the proportion
uncorrelated.
uncorrelated.
of those under and over 1 year of age were uncorrelated. Note that —
births
population
infantile mortality is (loc. cit.) 9.6, and that of annual movements in mortality
other than infantile may be taken as sensibly the same as that of general
9. If the relation
holds for all values of x1, x2 and x3 (which are, in our usual notation,
a, b and c.
10. What is the effect on a weighted mean of errors in the weights or the
quantities weighted, such errors being uncorrelated with each other, with the
weights, or with the variables—(1) if the arithmetic mean values of the errors
are zero; (2) if the arithmetic mean values of the errors are not zero?
CHAPTER XII.
PARTIAL CORRELATION.
changes in the proportion of old; and the question might arise how
far the first correlation was due merely to a tendency to give out-
relief more freely to the old than the young, i.e. to a correlation
say, the bulk of a crop and the rainfall during a certain period, and
temperature during the same period; and the question might arise
ature if other things were equal, but failing as a rule to obtain this
of the method used in the case of two variables. The latter case
find a sensible positive value for any one coefficient such as b2,
for, so far as this may be done with a linear equation. For examples
tion that the means of all arrays were strictly collinear, and the
gression-equation
\[ x_1 = a_1 + b_{12} x_2 \qquad (a) \]
we write
best value (for the given value of bl2), and would continuously
of fig. 44. The best value of a1 for which s12 attained its
ness from the condition that if a′1, a″1 be two values close above
and below the best, the corresponding values of s12 are equal. Let
when δ is very small, the value of a1 is the best for the assigned
that is,
\[ \Sigma(x_1 - a_1 - b_{12} x_2) = 0, \]
\[ a_1 = 0 \]
\[ \Sigma x_2 (x_1 - b_{12} x_2) = 0 \]
\[ b_{12} = \frac{\Sigma(x_1 x_2)}{\Sigma(x_2^2)} \qquad (c) \]
this, and notably we have for σ1.2, the minimum value of s12,
\[ \sigma_{1.2}^2 = \sigma_1^2 (1 - r_{12}^2) \qquad (d) \]
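The two-variable results above can be checked numerically. The following is a minimal sketch, assuming a small set of made-up paired observations (the data and all variable names are illustrative and do not come from the text): it forms deviations from the means, computes b12 as the ratio of the product-sum to the sum of squares of x2, and compares the residual standard-deviation with σ1 √(1 - r12²).

```python
import math

# Hypothetical paired observations, used only to illustrate the algebra.
X1 = [2.0, 4.0, 5.0, 7.0, 9.0, 12.0]
X2 = [1.0, 2.0, 4.0, 5.0, 7.0, 8.0]


def deviations(values):
    """Return the deviations of the values from their arithmetic mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


x1, x2 = deviations(X1), deviations(X2)
n = len(x1)

# Regression of x1 on x2: b12 = sum(x1*x2) / sum(x2^2).
product_sum = sum(a * b for a, b in zip(x1, x2))
b12 = product_sum / sum(b * b for b in x2)

# Standard-deviations and correlation of order zero.
sigma1 = math.sqrt(sum(a * a for a in x1) / n)
sigma2 = math.sqrt(sum(b * b for b in x2) / n)
r12 = product_sum / (n * sigma1 * sigma2)

# Standard-deviation of the residuals x1 - b12*x2 (first order).
residuals = [a - b12 * b for a, b in zip(x1, x2)]
sigma1_2 = math.sqrt(sum(e * e for e in residuals) / n)

print(round(b12, 4), round(r12 * sigma1 / sigma2, 4))
print(round(sigma1_2, 4), round(sigma1 * math.sqrt(1 - r12 ** 2), 4))
```

Each printed pair should agree, whatever data are substituted, since both identities follow from the least-squares conditions alone.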
the x on the left (the dependent variable), and the second will
in the form
The regressions b12, b21, b13, b31, etc., in the case of two variables
difference between the actual value of x" and the value assigned
tion we have
deviations σ1, σ2, etc., being regarded as of order zero, the standard-
deviations σ1.2, σ2.1, etc. (cf. eqn. (d) of § 3) of the first order, and
so on.
ing the coefficients b12.34...n, etc., (n - 1) again for determining
the coefficients b21.34...n, etc., and so on: they are sometimes
termed the normal equations. If the student will follow the process
by which (5) was obtained, he will see that when the con-
value, x2 enters into the product-sum with x1.23...n; when the
\[ = \Sigma(x_{1.34 \ldots n}\, x_2). \]
Similarly,
Similarly again,
which are common to the two, and, conversely, the product-sum of any
the latter.
the subscripts of the one deviation occur among the secondary sub-
(')
That is
But this is the value that would have been obtained by taking a
x1.34...n and x2.34...n, and from (4) that we may write
"i3 4 . . . . n
if we had three variables only, x1, x2, and x3, the value of b12.3 or
r23 and the corresponding regressions b13 and b23; (2) working out
with the same values of x3. The method would not, however, be
much more lengthy than the method given below for expressing
For,
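\[ \sigma_{1.23 \ldots n}^2 = \sigma_{1.23 \ldots (n-1)}^2 \left(1 - b_{1n.23 \ldots (n-1)}\, b_{n1.23 \ldots (n-1)}\right) \]
\[ = \sigma_{1.23 \ldots (n-1)}^2 \left(1 - r_{1n.23 \ldots (n-1)}^2\right) \]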
\[ \sigma_{1.n}^2 = \sigma_1^2 (1 - r_{1n}^2) \]
It is clear from (9) that r1n.23...(n-1), like any correlation of order
any greater accuracy from x2 and x3 than from x2 alone, for the
Apart from the algebraic proof, it is obvious that the values must
is clearly indifferent in what order the latter are taken into account.
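\[ = \Sigma\left(x_{1.34 \ldots (n-1)}\, x_{2.34 \ldots (n-1)}\right) - b_{2n.34 \ldots (n-1)}\, \Sigma\left(x_{1.34 \ldots (n-1)}\, x_{n.34 \ldots (n-1)}\right). \]
Replacing b2n.34...(n-1) by bn2.34...(n-1) . σ²2.34...(n-1) / σ²n.34...(n-1),
we have
\[ b_{12.34 \ldots n}\, \sigma_{2.34 \ldots n}^2 = b_{12.34 \ldots (n-1)}\, \sigma_{2.34 \ldots (n-1)}^2 - b_{1n.34 \ldots (n-1)}\, b_{n2.34 \ldots (n-1)}\, \sigma_{2.34 \ldots (n-1)}^2, \]
\[ b_{12.n} = \frac{b_{12} - b_{1n}\, b_{n2}}{1 - b_{2n}\, b_{n2}} \]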
\[ x_{1.34 \ldots (n-1)} = b_{12.34 \ldots n}\, x_{2.34 \ldots (n-1)} + b_{1n.23 \ldots (n-1)}\, x_{n.34 \ldots (n-1)}, \]
xn.34...(n-1) being given. As any other secondary suffix might
\[ b_{12.34 \ldots n} = \frac{r_{12.34 \ldots (n-1)} - r_{1n.34 \ldots (n-1)}\, r_{2n.34 \ldots (n-1)}}{1 - r_{2n.34 \ldots (n-1)}^2} \cdot \frac{\sigma_{1.34 \ldots (n-1)}}{\sigma_{2.34 \ldots (n-1)}} \]
\[ r_{12.34 \ldots n} = \frac{r_{12.34 \ldots (n-1)} - r_{1n.34 \ldots (n-1)}\, r_{2n.34 \ldots (n-1)}}{\left(1 - r_{1n.34 \ldots (n-1)}^2\right)^{1/2} \left(1 - r_{2n.34 \ldots (n-1)}^2\right)^{1/2}} \qquad (12) \]
order higher than the first, for any one of the secondary suffixes
the same form as (12), and the value obtained for r12.34...n by
more variables does not involve any difference in the form of the
tion was worked out between (1) the average earnings of agri-
Question 2 of the same chapter are given (3) the ratios of the
determined are—
M1 = 15.9 shillings
M3 = 5.79
r23 = +0.60
"2-3-(l-r-f3)»(l-r2^
1.            2.               3.         4.           5.        6.        7.          8.      9.
r.            log √(1 - r²).   Product    Numerator.   log       log       Correlation of First Order.
                               Term.                   Num.      Denom.                log.    Value.
r12 = -0.66   1̄.87580          -0.0780    -0.5820      1̄.76492
r13 = -0.13   1̄.99629          -0.3960    +0.2660      1̄.42488
r23 = +0.60   1̄.90309          +0.0858    +0.5142
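The arithmetic of this table can be repeated without logarithms. The sketch below assumes only the three zero-order coefficients quoted above (r12 = -0.66, r13 = -0.13, r23 = +0.60) and the first-order form of the partial correlation, r12.3 = (r12 - r13.r23) / √((1 - r13²)(1 - r23²)); the function name is hypothetical.

```python
import math


def partial_first_order(r_ab, r_ac, r_bc):
    """Correlation between a and b with c held constant (first order)."""
    numerator = r_ab - r_ac * r_bc
    denominator = math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))
    return numerator / denominator


# Zero-order correlations quoted in the worked example.
r12, r13, r23 = -0.66, -0.13, 0.60

r12_3 = partial_first_order(r12, r13, r23)
r13_2 = partial_first_order(r13, r12, r23)
r23_1 = partial_first_order(r23, r12, r13)

print(round(r12_3, 2), round(r13_2, 2), round(r23_1, 2))
```

The numerators formed on the way (-0.5820, +0.2660, +0.5142) are those of column 4, which gives a convenient check on each product term before the division is carried out.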
to the form
log b21.3 = 1̄.64993, b21.3 = -0.45 : log b23.1 = 1̄.33917, b23.1 = +0.22
log b31.2 = 1̄.93024, b31.2 = +0.85 : log b32.1 = 0.33891, b32.1 = +2.18
The units are throughout one shilling for the earnings X1, 1
per cent. for the pauperism X2, and 1 for the out-relief ratio X3.
(X3), though very small (cf. Chap. IX. § 17), does not seem
actually occur.
to which the student should refer for details. The variables are
100) of—
follows:—
Table I.
Variable.   Mean.    Standard-     Pair.   Correlation-   log √(1 - r²).
                     deviation.            coefficient.
1           104.7    29.2          12      +0.52          1̄.93154
2            90.6    41.7          13      +0.41          1̄.96003
3           107.7     6.5          14      -0.14          1̄.99570
4           111.3    23.8          23      +0.49          1̄.94038
                                   24      +0.23          1̄.98820
                                   34      +0.25          1̄.98598
It is seen that the average changes are not great; the per-
9.4 per cent., and the percentage of old has increased by 7.7
per cent., at the same time as the population of the unions has
risen on the average by 11.3 per cent. At the same time the
Table II.
1.                  2.            3.            4.                    5.
Correlation-        Product       Numerator.    Correlation-          log √(1 - r²).
coefficient         Term of                     coefficient
(Zero Order).       Numerator.                  (First Order).
12    +0.52         +0.2009       +0.3191       12.3    +0.4013       1̄.96187
13    +0.41         +0.2548       +0.1552       13.2    +0.2084       1̄.99035
23    +0.49         +0.2132       +0.2768       23.1    +0.3553       1̄.97070

12    +0.52         -0.0322       +0.5522       12.4    +0.5731       1̄.91355
14    -0.14         +0.1196       -0.2596       14.2    -0.3123       1̄.97772
24    +0.23         -0.0728       +0.3028       24.1    +0.3580       1̄.97022

13    +0.41         -0.0350       +0.4450
14    -0.14         +0.1025
34    +0.25         -0.0574
with the same secondary suffix (Table III. col. 1), and these
lines, for instance, rl3ii in the second and seventh, and so on.
sufficient number of digits is not retained, and for this reason the
\[ b_{12.34} = r_{12.34}\, \frac{\sigma_{1.234}}{\sigma_{2.134}} \]
we find
= ^(1-^(1 -r?,4)'(l-r?,3,)*
percentage-ratios,
-0.383 „ „ population.
r12 = +0.52    r12.34 = +0.46
r13 = +0.41    r13.24 = +0.28
r14 = -0.14    r14.23 = -0.36
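The second-order coefficients just quoted can be reproduced by applying the first-order formula repeatedly, removing one secondary suffix at a time. The following is a minimal sketch, assuming the six zero-order correlations of Table I; the dictionary layout and the function name are illustrative choices, not part of the original working.

```python
import math

# Zero-order correlations between the four variables (Table I).
ZERO_ORDER = {
    frozenset((1, 2)): 0.52,
    frozenset((1, 3)): 0.41,
    frozenset((1, 4)): -0.14,
    frozenset((2, 3)): 0.49,
    frozenset((2, 4)): 0.23,
    frozenset((3, 4)): 0.25,
}


def partial(i, j, held=()):
    """Correlation between variables i and j with the variables in `held` constant.

    The last secondary suffix k is removed by the reduction formula
    r_ij.(rest k) = (r_ij.rest - r_ik.rest * r_jk.rest)
                    / sqrt((1 - r_ik.rest^2) * (1 - r_jk.rest^2)).
    """
    if not held:
        return ZERO_ORDER[frozenset((i, j))]
    rest, k = tuple(held[:-1]), held[-1]
    r_ij = partial(i, j, rest)
    r_ik = partial(i, k, rest)
    r_jk = partial(j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


r12_34 = partial(1, 2, (3, 4))
r13_24 = partial(1, 3, (2, 4))
r14_23 = partial(1, 4, (2, 3))

print(round(r12_34, 2), round(r13_24, 2), round(r14_23, 2))
```

The order in which the secondary suffixes are removed does not affect the result, so `partial(1, 2, (4, 3))` serves as an independent check on `partial(1, 2, (3, 4))`.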
[For the full discussion of the case cf. Jour. Roy. Stat. Soc.,
were taken along two axes at right angles, and every observed
from the data of Example i. Four pieces of wood are fixed together
like the bottom and three sides of a box. Supposing the open
upwards along the left-hand angle at the back of the "box," the
Fig. 45. — Model illustrating the Correlation between three Variables: (1)
(2) Out-relief ratio (numbers given relief in their homes to one in the
(data pp. 178 and 189). A, front view; B, view of model tilted till the
as a straight line.
back and bottom of the box, starting from zero at the left: finally,
the scale of earnings is drawn out towards the observer along the
angle between the left-hand side and the bottom, but as earnings
lower than 12s. do not occur, the scale may start from 12s. at the
relief ratio, 1 in. = 1 unit; earnings, 1 in. = 1s.; and the inside
"box." The earnings and out-relief ratio for some one union are
higher the lower the wages and the higher the out-relief, for the
highest points lie towards the back and right-hand side of the
opposite sides of the box (the holes facing each other), and threads
on the back sides and base: they represent, of course, the eleva-
16. If we write
For we have
and also
(«?-«!« y
but in the limiting case of two variables the two are identical.
\[ 1 - R_{1(23 \ldots n)}^2 \le 1 - r_{12}^2. \]
Hence R1(23...n) cannot be numerically less than r12. For the
cannot be numerically less than r12, r13, .... r1n, i.e. any one
That is,
\[ b_{12.34 \ldots (n-1)} = b_{12.34 \ldots n} + b_{1n.23 \ldots (n-1)}\, b_{n2.34 \ldots (n-1)}. \]
In this equation the coefficient on the left and the last on the
(n-1)
have
1 "ln.2 • "nl.2
by writing down equation (16) for b21.34...(n-1), and taking the
'12"
(l-rU!(l-»i,i)!
only briefly here. Writing (12) in its simplest form for r12.3,
that is,
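\[ \frac{(r_{12} - r_{13} r_{23})^2}{(1 - r_{13}^2)(1 - r_{23}^2)} < 1, \]
or
\[ r_{12}^2 + r_{13}^2 + r_{23}^2 - 2 r_{12} r_{13} r_{23} < 1 \qquad (18) \]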
if the three r's are consistent with each other. If we take r12, r13
(19)
and therefore, if r12.3 and r13.2 are given, r23.1 must lie between
the limits
a few special cases, for the three coefficients of zero order and
Value of            Value of            Limits of    Limits of
r12 or r12.3.       r13 or r13.2.       r23.         r23.1.
0                   0                   ±1           ±1
±1                  ±1                  +1           -1
±1                  ∓1                  -1           +1
±√0.5               ±√0.5               0, +1        0, -1
±√0.5               ∓√0.5               0, -1        0, +1
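The limits for the zero-order coefficient can be computed directly from the consistency condition r12² + r13² + r23² - 2.r12.r13.r23 ≤ 1, which, solved for r23, gives r12.r13 ± √((1 - r12²)(1 - r13²)). The sketch below assumes that form; the function names are illustrative.

```python
import math


def limits_r23(r12, r13):
    """Lower and upper limits of r23 consistent with given r12 and r13."""
    centre = r12 * r13
    # max(0.0, ...) guards against tiny negative values from rounding.
    half_width = math.sqrt(max(0.0, (1 - r12 ** 2) * (1 - r13 ** 2)))
    return centre - half_width, centre + half_width


def consistent(r12, r13, r23, tolerance=1e-12):
    """True if the three zero-order correlations can occur together."""
    return r12 ** 2 + r13 ** 2 + r23 ** 2 - 2 * r12 * r13 * r23 <= 1 + tolerance


root_half = math.sqrt(0.5)
print(limits_r23(0.0, 0.0))
print(limits_r23(1.0, 1.0))
print(limits_r23(root_half, root_half))
print(limits_r23(root_half, -root_half))
```

For the first-order coefficients the corresponding limits of r23.1 have the opposite sign in the product term, that is, -r12.3.r13.2 ± √((1 - r12.3²)(1 - r13.2²)), so the same function applies after changing the sign of one argument.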
order zero and value unity are only consistent if either one only,
not -1,-1, -1. On the other hand, the set of three coefficients
of the first order and value unity are only consistent if one only,
or all three, are negative: the only consistent sets are +1, +1,
be very high if even the sign of the third can be inferred; if the
third; the fact that 1 and 2, 1 and 3 are uncorrelated, pair and
such fallacies is the same as for the case of attributes, and was
\[ r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} \qquad (a) \]
and from the form of the corresponding expression for r12 in terms
\[ r_{12} = \frac{r_{12.3} + r_{13.2}\, r_{23.1}}{\sqrt{(1 - r_{13.2}^2)(1 - r_{23.1}^2)}} \qquad (b) \]
From the form of the numerator of (a) it is evident (1) that even
if r12 be zero, r12.3 will not be zero unless either r13 or r23, or
both, are zero. If r13 and r23 are of the same sign the partial
often is, of opposite sign to r12, and this may lead to still more
see that, conversely, r12 will not be zero even though r12.3 is zero,
and x2.3, and that we might determine the value of this partial
not be the same (or approximately the same) for all such tables,
the correlation between x1.3 and x2.3 for every value of x3 (cf.
of C's and the universe of γ's: that these two associations may
REFERENCES.
The preceding chapter is written from the standpoint of refs. 3 and 4, and
with the notation and method of ref. 5. The theory of correlation for several
variables was developed by Edgeworth and Pearson (refs. 1 and 2) from the
Theory.
(1) Edgeworth, F. Y., "On Correlated Averages," Phil. Mag., 5th Series,
(3) Yule, G. U., "On the Significance of Bravais' Formulae for Regression,
etc., in the case of Skew Correlation," Proc. Roy. Soc., vol. lx., 1897,
p. 477.
(4) Yule, G. U., "On the Theory of Correlation," Jour. Roy. Stat. Soc.,
(5) Yule, G. U., "On the Theory of Correlation for any number of Variables
Influence of Two Variables upon a Third," Jour. Roy. Stat. Soc., vol.
(7) Yule, G. U., "An Investigation into the Causes of Changes in Pauperism
in England, etc.," Jour. Roy. Stat. Soc., vol. lxii., 1899, p. 249.
(8) Hooker, R. H., "The Correlation of the Weather and the Crops," Jour.
EXERCISES.
found for
Find the partial correlations and the regression-equation for hay-crop on spring
X1 = deaths of infants under 1 year per 1000 births in same year (infantile
mortality).
(overcrowding).
Taking the figures below for 30 urban areas in England and Wales, find the
other factors.
M1 = 164    σ1 = 20.0     r12 = +0.49    r23 = +0.15
M2 = 158    σ2 = 74.9     r13 = +0.78    r24 = -0.37
M3 = 143    σ3 = 22.4     r14 = +0.20    r34 = +0.23
M4 = 205    σ4 = 130.0
3. If all the correlations of order zero are equal, say = r, what are the values
Under the same condition, what is the limiting value of r if all the equal
5. Write down from inspection the values of the partial correlations for the
three variables
Check the answer to Qu. 7, Chap. XI., by working out the partial
correlations.
6. If the relation
holds for all sets of values of x1, x2, and x3, what must the partial correlations
be?
Check the answer to Qu. 9, Chap. XI., by working out the partial
correlations.
PART III—THEORY OF SAMPLING.
CHAPTER XIII.
1. The problem of the present Part—2. The two chief divisions of the theory
and only 44 tails, but we cannot conclude that the coin is biassed:
two nations we find that the mean stature is slightly greater for
of sampling, and there are two chief sections of the theory corre-
two variables X and Y are recorded on each card, we can also form
variables are briefly treated, the greater part of the theory, owing
holds good, e.g., for the tossing of a coin or the throwing of a die:
the result of any one throw or toss does not affect, and is un-
It does not hold good, on the other hand, for the drawing of balls
circular disc, there is nothing which can make it tend to fall more
often on the one side than on the other; we may expect, there-
fore, that in any long series of throws the coin will fall with
cube, it will tend, in any long series of throws, to fall with each
saying that the chance of throwing heads (or tails) with a coin is
1/2, and the chance of throwing six (or any other face) with a die
event may give either no successes or one success, and will tend
to give the former qN, the latter pN, times in N trials. Take
an arithmetical example :—
qN    0    -     -
pN    1    pN    pN
N     -    pN    pN
\[ \sigma_1^2 = p - p^2 = pq. \]
\[ \sigma_n^2 = n.pq \qquad (1) \]
\[ s_n^2 = \sigma_n^2 / n^2 = pq/n \qquad (2) \]
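Formulae (1) and (2) are easily evaluated for any case of simple sampling. The sketch below assumes only that the standard-deviation of the number of successes in n events is √(npq), and that of the proportion of successes is √(pq/n); the function names are illustrative, and the case shown (twelve events, each with chance 1/2) is chosen as an example.

```python
import math


def sd_of_successes(n, p):
    """Standard-deviation of the number of successes in n independent events."""
    return math.sqrt(n * p * (1 - p))


def sd_of_proportion(n, p):
    """Standard-deviation of the proportion of successes in n independent events."""
    return math.sqrt(p * (1 - p) / n)


# Twelve events with chance of success 1/2 at each.
n, p = 12, 0.5
print(n * p, round(sd_of_successes(n, p), 3), round(sd_of_proportion(n, p), 4))
```

The second function is simply the first divided by n, which is why the standard-deviation of a proportion falls off as the square root of the number of events while that of the number of successes rises.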
trials, care should be taken to see that they are fairly true cubes,
and the marks not cut very deeply. Cheap dice are generally
very much out of truth, and if the marks are deeply cut the
Encycl. Brit., 10th edn., vol. xxviii. p. 282. Totals of the columns
1.732.
Successes.   Frequency.
 2             60
 3            198
 4            430
 5            731
 6            948
 7            847
 8            536
 9            257
10             71
11             11
12              -
Total        4096
Successes.
Frequency.
447
1145
deviation, 0.816.
Frequency-distribution observed:—
Successes.   Frequency.
0            179
1            298
2            141
3             30
Total        648
deviation includes either all, or the great bulk of, the observations,
been dealing, only one condition has been explicitly laid down as
is not the only nor the most fundamental condition which has
equation (2) 1
preceding work that our dice or our coins were the same set or
of throwing "heads" with the coins or, say, "six" with the dice
with dice loaded in one way and later on take a fresh set of dice
place during the period over which the observations are spread.
sampling.
only that we were using the same set of coins or dice throughout,
so that the chances p and q were the same at every trial, but
also that all the coins and dice in the set used were identically
similar, so that the chances p and q were the same for every coin
the character observed must not only be the same for every
rates, formulae (1) and (2) would not apply to the numbers of
samples were all of the same age and sex composition, and living
only contained persons of one sex and one age. For if each
period not being the same for the two sexes, nor for the young
and the old. The groups would not be homogeneous in the sense
of hair-colour.
doing so, and hence of dying from the disease. The same thing
holds good for certain classes of deaths from accident, e.g. railway
(b), and (c), all the samples and all the individual contributions to
if p is, say, 1/3, and not 1/2 owing to the black balls, for some
§ 4.)
all the necessary conditions are fulfilled, but this is not a necessary
10. In Table VI. of Chap. IX. (p. 163) was given a correlation-
of England and Wales during the decade 1881-90 and the pro-
portion of male births. The table below gives some similar figures,
based on the same data, for a few isolated groups of districts con-
numbers of births, are given at the foot of the table, and it will
be seen that the two agree, on the whole, with surprising closeness,
Chap. IX. are given in Qu. 7 at the end of this chapter, and show
Different Ratios of Male to Total Births during the Decade 1881-90, for
Groups of Districts with the Numbers of Births in the Decade lying between
Male Births
Number of Births
in Decade.
per Thousand
Total Births.
1500
to
3500
to
4500
to
10,000
to
15,000
to
30,000
to
50,000
to
2500.
4000.
5000.
15,000.
20,000.
50,000.
90,000.
466-67
1'
482- 3
492- 3
494- 5
496- 7
498- 9
500- 1
—
births per 1000 of all births, that is, 1000 times the values given
the proportions per 1000 for p and q in the formula. Thus for
-/508x492V 11.0
adopted.
therefore, as the means tend to the same values in all the groups,
and accordingly
\[ s^2 = \frac{pq}{N}\left(\frac{1}{n_1} + \frac{1}{n_2} + \ldots + \frac{1}{n_N}\right) = \frac{pq}{H} \qquad (3) \]
Percentage.
Frequency.
Percentage.
Frequency
40
40
14
4
43
17
50
16
20
57
22
60
25
10
67
-4
29
80
33
13
100
VII. p. 128, and the harmonic mean size of litter was found to be
(3£)'-»<*
of albinos amongst all the offspring together was 24.7 per cent.
pq =p -p2 =p approximately,
1894) there were 122 men killed by the kick of a horse, or, on an
average, there were 0.61 deaths from that cause in each army
\[ \sigma = (0.61)^{1/2} = 0.78. \]
Deaths.   Frequency.
0         109
1          65
2          22
3           3
4           1
\[ \sigma^2 = 0.6079 \]
\[ \sigma = 0.78 \]
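The agreement between the observed standard-deviation and the square root of the mean can be verified from the frequency-table. The sketch below assumes the frequencies 109, 65, 22, 3, 1 for 0 to 4 deaths; this reading of the damaged table is itself an assumption, adopted because it is the only set consistent with the stated total of 122 deaths, the mean of 0.61 and σ² = 0.6079.

```python
import math

# Deaths per army corps per annum -> number of corps-years (assumed reading).
frequency = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}

observations = sum(frequency.values())
total_deaths = sum(deaths * f for deaths, f in frequency.items())

mean = total_deaths / observations
second_moment = sum(deaths ** 2 * f for deaths, f in frequency.items()) / observations
variance = second_moment - mean ** 2

print(observations, total_deaths)
print(round(mean, 2), round(variance, 4), round(math.sqrt(variance), 2))
# When p is very small, pq is approximately p, so sigma is near sqrt(mean).
print(round(math.sqrt(mean), 2))
```

Both square roots round to 0.78, which is the closeness the text remarks upon.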
whence
sampling.
results for various special cases, to the use of the formulae for
conditions in the universe from which the sample has been drawn
= 110.9.
-xAxiXigl-^0-00226'
49152
and the difference observed bears the same ratio to the standard
1781. Can the divergence from the exact theoretical result have
standard error is
standard error, and may very well have arisen owing simply to
fluctuations of sampling.
and similarly the divergence from theory is only some 3/5 of the
would be the standard error for the divergence of (AB) from the
a priori value n/4, not the standard error for differences of (AB)
from (A)(B)/N, (A) and (B) being the numbers of heads thrown
been taken.
sampling, the proportion of A's being really the same in both cases,
\[ p_0 = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} \]
and the standard error of the difference ε12, the samples being
\[ \epsilon_{12}^2 = p_0 q_0 \left(\frac{1}{n_1} + \frac{1}{n_2}\right) \qquad (5) \]
(b) If, on the other hand, the proportions of A's are not the same
in the material from which the two samples are drawn, but p1 and
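\[ \epsilon_1^2 = p_1 q_1 / n_1, \qquad \epsilon_2^2 = p_2 q_2 / n_2 \]
and consequently
\[ \epsilon_{12}^2 = \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2} \qquad (6) \]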
sampling on taking fresh samples in the same way from the same
material.
Further, the student should note that the value of ε12 given by
true values of the difference for a given observed value, and hence,
unless p1 and p2 differ largely, and in that case either formula will
tion respectively:—
Height— Heig —
17 17 12 22
\[ \epsilon_{12} = \left\{\frac{29}{68} \times \frac{39}{68} \times \frac{2}{34}\right\}^{1/2} = 0.120. \]
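The two forms of the standard error of a difference can be compared on this example. The sketch below assumes two samples of 34 plants each, with 17 and 12 tall plants respectively (50 and 35 per cent.); these counts are a reading of the damaged figures above and should be treated as an assumption. The first function pools the samples, as is appropriate when testing whether the difference might arise from simple sampling alone; the second keeps the two proportions distinct.

```python
import math


def se_difference_pooled(a1, n1, a2, n2):
    """Standard error of p1 - p2 when both samples share one true proportion."""
    p0 = (a1 + a2) / (n1 + n2)
    return math.sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))


def se_difference_separate(p1, n1, p2, n2):
    """Standard error of p1 - p2 when the true proportions are p1 and p2."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


pooled = se_difference_pooled(17, 34, 12, 34)
separate = se_difference_separate(0.50, 34, 0.35, 34)
observed_difference = 17 / 34 - 12 / 34

print(round(pooled, 3), round(separate, 4))
print(round(observed_difference / pooled, 2))
```

The observed difference is little more than the pooled standard error, so on these figures alone it might well have arisen from the fluctuations of sampling.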
The student will notice, however, that all the other cases cited
the same sign, but rather more marked. Hence the difference
If 50 per cent. and 35 per cent. were the true proportions in the
\[ \epsilon_{12} = \left(\frac{50 \times 50}{34} + \frac{35 \times 65}{34}\right)^{1/2} = 11.85 \text{ per cent.} \]
vol. xxxvii., 1907.) The following are extracted from the tables
The actual difference is 3.0 per cent., or over 5 times this, and
0.56 per cent. With such large samples the difference could not,
alone.
the B's with the proportion of A's in the universe. The general
ni h= I ni
\ + n2 €0 V W1 + M2.
Therefore finally
r°1 = «,
three times this value of ε01, it may have arisen solely by the
of n1 observations.
As, in this case, both the subsamples have the same number
,-(rS^^y-0-060
of sampling.
43.5 per cent., difference 2.4 per cent. The standard error of
observations is therefore
901 = (43-5x56-5){1&s|^r3)'
The actual difference is over five times this (the ratio must, of
course, be the same as in Example iv.), and could not have occurred
REFERENCES.
The theory of sampling, for the cases dealt with in this chapter, is generally
Chapter XV., and the student will be unable to follow much of the literature
(1) Quetelet, A., Lettres .... sur la théorie des probabilités; Bruxelles,
1849). See especially letter xiv. and the table on p. 374 of the
(2) Westergaard, H., Die Grundzüge der Theorie der Statistik; Fischer,
Jena, 1890.
(3) Edgeworth, F. Y., Article on the "Law of Error" in the Tenth Edition
tion," Mem. and Proc. of the Manchester Lit. and Phil. Soc., vol. li.,
1907.
(5) Poisson, S. D., "Sur la proportion des naissances des filles et des
(7) Lexis, W., Abhandlungen zur Theorie der Bevölkerungs- und Moralstati-
reference.)
(9) Venn, John, The Logic of Chance, 3rd edn. ; Macmillan, London, 1888.
Probabilities," Jour. Roy. Stat. Soc., vols. lx., lxi., 1897-8 (especially
(12) Vigor, H. D., and G. U. Yule, "On the Sex-ratios of Births in the
Stat. Soc., vol. lxix., 1906, p. 576. (Use of the harmonic mean as in
111.)
(13) Poisson, S. D., Recherches sur la probabilité des jugements, etc. ; Paris,
Leipzig, 1898.
EXERCISES.
Compare the actual with the theoretical mean and standard-deviation for
as a "success."
Successes. Frequency.
Successes. Frequency.
1351
1
14
844
103
391
302
10
117
711
11
21
1231
12
1411
Total 6500
2. (Ref. 1.)
Balls were drawn from a bag containing equal numbers of black and white
balls, each ball being returned before drawing another. The records were then
grouped by counting the number of black balls in consecutive 2's, 3's, 4's, 5's,
etc. The following give the distributions so derived for grouping by 5's, 6's,
Successes.
(a) Grouping
(b) Grouping
(c) Grouping
by Fives.
by Sixes.
by Sevens.
30
17
125
65
34
277
166
104
224
192
151
136
166
148
5
4. The proportion of successes in the data of Qu. 1 is 0.5097. Find the stand-
ard-deviation of the proportion with the given number of throws, and state
whether you would regard the excess of successes as probably significant of bias
in the dice.
5. In the 4096 drawings on which Qu. 2 is based 2030 balls were black
Find n and p in this way from the data of Qu. 1 and Qu. 3.
7. Verify the following results for Table VI. of Chapter IX. p. 163, and
Row or Rows.        Mean.    Actual Standard-    Standard-deviation
                             deviation s.        of Sampling s0.
1                   508.2    11.60               11.18
2                   509.5     6.79                6.45
3                   510.0     5.28                5.00
4                   511.1     5.03                4.22
5                   510.2     3.67                3.73
6, 7                509.7     4.13                3.24
8, 9, 10, 11        508.7     3.10                2.69
12, 13, 14          508.4     2.55                2.25
15 and upwards      508.2     2.13                1.85
mean number in a litter was 4.735, and the expected proportion of albinos
50 per cent. Find the standard-deviation of simple sampling for the pro-
9. (Data from Report i., Evolution Committee of the Royal Society, p. 17.)
In breeding certain stocks 408 hairy and 126 glabrous plants were obtained.
10. (Data of Example viii. and Qu. 5, Chap. III.) Is the association in
CHAPTER XIV.
1. Warning as to the assumption that three times the standard error gives the
—2. Warning as to the use of the observed for the true value of p in
the formula for the standard error—3. The inverse standard error, or
effect of variation in p and q for the several universes from which the
samples are drawn—11-12. (b) Effect of variation in p and q from one
the limits are not, as a rule, strictly the same for positive and
greater than 0.5, greater than is possible for positive errors. The
tall plants is 29/68, or, say, 43 per cent. The standard error of
\[ \left(\frac{25 \times 75}{68}\right)^{1/2} = 5.25 \]
of the array at right angles to this, i.e. the array of p's associated
becomes very small, o> becomes sensibly equal to <rp, and therefore
cally in any way round 0.5, any observed value π greater than
difference.
it covers those fluctuations alone which exist when all the assumed
many such cases concerns quite a different point, viz. whether the
owing to the nature of the conditions under which the sample was
A's and α's, the characters not being well defined—a source of
error which we need not further discuss, but one which may lead
to serious results [cf. ref. 5 of Chap. V.]. (2) Owing to either A's
necessarily infer that the proportion of black balls in the bag was
approximately π, even though the standard error were small, and
to the law of simple sampling. For the black balls might be,
of inclusion in the sample is the same for A's and α's, far more
samples so dubious.
sampling does not evade the difficulty. Compulsion could not en-
of fruit were taken solely from the top layers of baskets exposed
that it will not tend to include, even in the long run, equal
but, on the other hand, no certainty that it does not exist. Thus
schools were selected, e.g. the volunteering of teachers for the work
ports, and the question were raised whether the sample was
for expecting definite bias in either case, but it may exist, and
errors in classifying the A's and α's. On the other hand, of course,
another universe, but π1 - π2 is considerably less than three times
that the true proportion for the given universes, p1 and p2, are
success for each die is p1, for the next f2 throws p2, for the next f3
throws p3, and so on, the chance of success varying from time to
time, just as the chance of death, even for individuals of the same
age and sex, varies from district to district. Suppose, now, that
the records of all these throws are pooled together. The mean
\[ M = \frac{n}{N}\left(f_1 p_1 + f_2 p_2 + f_3 p_3 + \ldots\right) = n.p_0 \]
deviations an amount
successes for the first set and the mean for all the sets together.
n n v v'
of the form
\[ s^2 = s_0^2 + \sigma_p^2 \]
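When n is large the relation takes the approximate form s² = s0² + σp², where s is the observed standard-deviation of the proportion, s0 the standard-deviation of simple sampling, and σp the standard-deviation of the chances themselves, so that σp can be estimated as √(s² - s0²). The sketch below assumes that form; the figures used are the per-thousand values of s and s0 for male births given in Qu. 7 of Chap. XIII. and are borrowed here purely as an illustration, not as the working of this section.

```python
import math


def sd_of_chances(s, s0):
    """Estimate sigma_p from s^2 = s0^2 + sigma_p^2.

    Returns None when s falls short of s0, in which case the
    relation gives no real value for sigma_p.
    """
    excess = s ** 2 - s0 ** 2
    if excess < 0:
        return None
    return math.sqrt(excess)


# (actual s, simple-sampling s0), per 1000 births; illustrative rows only.
rows = {"6, 7": (4.13, 3.24), "5": (3.67, 3.73)}

for label, (s, s0) in rows.items():
    print(label, sd_of_chances(s, s0))
```

A result of `None` marks a case in which the observed dispersion is no greater than simple sampling would give, so that there is no evidence of variation in the chances.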
from Puerperal Fever) per 1000 Births in the same Year, for the same
Groups of Districts as in the Table of Chap. XIII. § 10. Data from same
Deaths per
1000 Births.
5-
0-
2-
6-
3-
•o-
3-
5-
4.
-o-
4-
5-
.0-
5-
5-
0-
6-
5-
0-
5-
0-
5-
9-
0-
9-
5-
10
10
o-io-
10
5-
11-
Total
Mean
Standard-deviation
Theoretical standard-de-
does s fall short of s0. In the table on p. 279 are given some
σp = 0.8 in the deaths of women per thousand births, five out of
the eight values being very close to this average. The figures of
this case also bring out clearly one important consequence of (2),
22 x 978
21 x 106
condition that the chances p and q shall be the same for every
chances for m1 dice are p1 q1; for m2 dice, p2 q2, and so on,
in that the chances were the same for every die, at any one
throw, but varied from one throw to another: now they are con-
stant from throw to throw, but differ from one die to another as
= n.p0
the m1 dice for which the chances are p1 q1, together with the
are p2 q2, and so on: and these numbers of successes are all
independent. Hence
= Σ(mpq),
standard-deviation of p,
successes,
\[ s^2 = \frac{p_0 q_0}{n} - \frac{\sigma_p^2}{n} \qquad (4) \]
12. The effect of the chances varying for the individual dice or
calculated from the mean proportion p0, and the effect may
for half the events and unity for the remainder, p0 = q0 = ½, and
s = 0.408/√n, instead of 0.5/√n, the value of s if the chances are
$$s^2 = \frac{1}{n}\,(18 \times 982 - 900)$$
$$s = 130/\sqrt{n}$$
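As a rough numerical check on equation (4), the following sketch evaluates s for the two illustrations given in the text. The function name and its arguments are illustrative only, and n is taken as unity so that the result is the numerator of s = (...)/√n; the reading of 900 as the square of the standard-deviation of p is an assumption drawn from the form of the expression.

```python
import math

def sd_varying_chances(p0_q0, var_p, n=1):
    """Standard deviation s from equation (4): s^2 = p0*q0/n - var_p/n."""
    return math.sqrt((p0_q0 - var_p) / n)

# Chances spread uniformly between 0 and 1: p0 = q0 = 1/2, variance of p = 1/12.
uniform_case = sd_varying_chances(0.5 * 0.5, 1.0 / 12.0)

# Second illustration, working per thousand: p0*q0 = 18 * 982 and
# (assumed reading) the variance of p = 900.
per_mille_case = sd_varying_chances(18 * 982, 900)

print(round(uniform_case, 3))    # numerator of s, to compare with 0.408
print(round(per_mille_case, 1))  # numerator of s, to compare with 130
```

The second figure comes out a little under 130, which agrees with the rounded value printed in the text.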
however, that the two other conditions (a) and (b) are fulfilled,
the chances p and q being the same for every event at every trial,
expression
σ² = n.pq,
where, r12, r13, etc. are the correlations between the results of the
$$\sigma^2 = npq\,[1 + r(n-1)] \qquad (5)$$
$$s^2 = \frac{pq}{n}\,[1 + r(n-1)] \qquad (6)$$
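Equation (5) may be illustrated by a short computation. The sketch below is not part of the original text: the function names are hypothetical, and the value r = −1/(w − 1) for the correlation between any two draws made without replacement from a bag of w balls is taken as a standard result.

```python
import math

def sd_correlated(n, p, r):
    """Standard deviation of the number of successes, equation (5)."""
    q = 1.0 - p
    return math.sqrt(n * p * q * (1.0 + r * (n - 1)))

def without_replacement_factor(n, w):
    """Ratio of the standard deviation to sqrt(npq) when n balls are
    drawn without replacement from a bag of w balls."""
    r = -1.0 / (w - 1)   # assumed correlation between any two draws
    return math.sqrt(1.0 + r * (n - 1))

factor = without_replacement_factor(5, 10)
print(round(factor, 3))  # 0.745
```

With r = 0 the function `sd_correlated` reduces to the simple-sampling value √(npq); a small positive r and a large n increase it considerably.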
the case when r is negative: for if the chances are not the same
for every event at each trial, and the chance of success for some
one event is above the average, the mean chance of success for the
remainder must be below it. The cases (a), (b) and (c) are, how-
§§ 9-12.
ball or a black ball for the first, second, or nth ball of the sample:
will therefore tend in the long-run to give just the same form of
whence
$$\sigma^2 = n\,pq\left(1 - \frac{n-1}{w-1}\right) = n\,pq\,\frac{w-n}{w-1}$$
√(npq); for drawing 5 balls out of 10, 0.745 √(npq); in the case
cases) a very small value of r may easily lead to a very great increase
ever constant from one year to another, but the following will
(n-l)o-l
707:5 +0-00012.
560000 x105
chances are the same for each universe from which a sample is
which samples have been drawn (i.e. that the variations are
REFERENCES.
added—
(An expansion of one section of ref. 10 of Chap. XIII., dealing with the
first problem of our § 14, i.e. drawing samples from a bag containing
a limited number of white and black balls, from the standpoint of the
samples.)
EXERCISES.
2. For all the districts in England and Wales included in the same table
(Table VI., Chap. IX.) the standard-deviation of the proportion of male births
per 1000 of all births is 7.46 and the mean proportion of male births 509.2.
The harmonic mean number of births in a district is 5070. Find the significant
standard-deviation σp.
3. If for one half of n events the chance of success is p and the chance of
failure q, whilst for the other half the chance of success is q and the chance of
4. The following are the deaths from small-pox during the 20 years
1882     1317        1892      431
  83      957          93     1457
  84     2234          94      820
  85     2827          95      22S
  86      275          96      541
  87      506          97       25
  88     1026          98      253
  89       23          99      174
  90       16        1900       85
  91       49        1901      356
The death-rate from small-pox being very small, the rule of § 12, Chap.
Assuming that the excess of the actual standard-deviation over this can be
NORMAL CURVE.
of the normal curve and its use—17 The quartile deviation and the
all the events are completely independent, and the chances p and
q the same for each event and constant throughout the trials.
(homogeneous cubes).
results of this first event the results of a second. The two events
Number of Successes.     0        1                  2                     3                  4

One event                N.q      N.p
                         N.q²     N.pq + N.pq        N.p²
Two events               N.q²     2N.pq              N.p²
                         N.q³     N.pq² + 2N.pq²     2N.p²q + N.p²q        N.p³
Three events             N.q³     3N.pq²             3N.p²q                N.p³
                         N.q⁴     N.pq³ + 3N.pq³     3N.p²q² + 3N.p²q²     3N.p³q + N.p³q     N.p⁴
Four events              N.q⁴     4N.pq³             6N.p²q²               4N.p³q             N.p⁴
THE BINOMIAL DISTRIBUTION AND THE NORMAL CURVE. 289
(Nq)p with successes of the second event (cf. row 2 of the scheme
cases of one success and one failure, and Np2 cases of two successes,
combined with those of the first two in precisely the same way.
Of the Nq2 cases in which both the first two events failed, (Nq2)q
of the third event and (2Npq)p with success, and similarly for
the Np2 cases in which both the first two events succeeded. The
event, and it is evident that all the results are included under a
given
from 0.1 to 0.5. When p = 0.1, cases of two successes are the
Number of       p = 0.1    p = 0.2    p = 0.3    p = 0.4    p = 0.5
Successes.      q = 0.9    q = 0.8    q = 0.7    q = 0.6    q = 0.5

    0             1216        115         ..         ..         ..
    1             2702        576         68         ..         ..
    2             2852       1369        278         31         ..
    3             1901       2054        716        123         11
    4              898       2182       1304        350         46
    5              319       1746       1789        746        148
    6               89       1091       1916       1244        370
    7               20        545       1643       1659        739
B.
(Figures given
Number of     Frequency.     Number of     Frequency.     Number of     Frequency.
Successes.                   Successes.                   Successes.

    0             ..             8            1148            16            193
    1              3             9            1304            17            106
    2             16            10            1319            18             54
    3             59            11            1199            19             26
    4            159            12             988            20             12
    5            339            13             743            21             ..
    6            596            14             513            22             ..
    7            889            15             327            23             ..
rth term together with p times the (r-l)th term of the preceding
AB:BC::q:p, then
BQ=p.AP + q.CR.
Fig. 46.
q :p, viz. 3:1. Next, choosing a vertical scale, draw the binomial
taken = 4096, and the polygon is abcd, 0b = 3072, 1c = 1024. The
cally. Mark the points where ab, bc, cd respectively cut the
points where ab', b'c, etc., cut the intermediate verticals are
comes from the funnel and meets the wedge 1. This wedge is
to the right (of the observer). The wedges 2 and 3 are set so as
to the left and p2 to the right. The streams passing these wedges
are therefore in the ratio of q2 : 2qp : p2. The next row of wedges
as before, and the four streams that result will bear the propor-
tions q3 : 3q2p : 3qp2 : p3. The final set, at the heads of the
vertical strips, will give the streams proportions q4 : 4q3p : 6q2p2 :
as may be desired.
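The rule by which each row of wedges divides the streams, and by which each binomial polygon is derived from the preceding one, may be sketched as a short routine. The function names are illustrative; the figures N = 4096 and q : p = 3 : 1 are those of the graphical example in the text.

```python
def next_row(row, p):
    """Add one more event: the new r-th term is q times the old r-th term
    plus p times the old (r-1)-th term."""
    q = 1.0 - p
    new = [0.0] * (len(row) + 1)
    for r, term in enumerate(row):
        new[r] += q * term       # the event fails: successes unchanged
        new[r + 1] += p * term   # the event succeeds: one more success
    return new

def binomial_terms(total, n, p):
    """Terms of total * (q + p)^n, built up one event at a time."""
    row = [float(total)]
    for _ in range(n):
        row = next_row(row, p)
    return row

first = binomial_terms(4096, 1, 0.25)    # the ordinates 3072 and 1024
streams = binomial_terms(4096, 3, 0.25)  # proportions q3 : 3q2p : 3qp2 : p3
print(first, streams)
```

Each pass through `next_row` corresponds to one further row of wedges, so that any required number of events may be dealt with by repeating the step.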
q" 0 — —
unity, and the mean is therefore given by the sum of the terms
the terms in col. (4) less the square of the mean, that is,
$$\sigma^2 = np\{1 + (n-1)p\} - n^2p^2$$
$$= np - np^2 = npq.$$
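The results M = np and σ² = npq may be verified numerically by summing the terms of the series directly. The following is a minimal sketch with an illustrative function name; the case n = 20, p = 0.1 is chosen merely as an example.

```python
from math import comb

def binomial_mean_variance(n, p):
    """Mean and variance of the number of successes, found by summing
    over the terms of (q + p)^n with total frequency unity."""
    q = 1.0 - p
    terms = [comb(n, m) * p**m * q**(n - m) for m in range(n + 1)]
    mean = sum(m * t for m, t in enumerate(terms))
    second_moment = sum(m * m * t for m, t in enumerate(terms))
    return mean, second_moment - mean**2

mean, variance = binomial_mean_variance(20, 0.1)
print(mean, variance)  # to be compared with np = 2 and npq = 1.8
```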
A-cards drawn from a record containing A's and a's, or the number
the same ordinates, the curve being such that the area between
series for any values of p and q, but in the present work we will
mr\1+w+ L2 + i.2.3 + — }•
Wr
m \n - m
n − m > m + 1
or m < (n − 1)/2.
Suppose, for simplicity, that n is even, say equal to 2k; then the
12*
y*=N^Y-Sk-x ■•"(»)
and therefore
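$$\frac{y_x}{y_0} = \frac{k(k-1)(k-2)\cdots(k-x+1)}{(k+1)(k+2)(k+3)\cdots(k+x)} = \frac{\left(1-\frac{1}{k}\right)\left(1-\frac{2}{k}\right)\cdots\left(1-\frac{x-1}{k}\right)}{\left(1+\frac{1}{k}\right)\left(1+\frac{2}{k}\right)\cdots\left(1+\frac{x}{k}\right)} \qquad (3)$$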
series
$$\log_e(1+\delta) = \delta - \frac{\delta^2}{2} + \frac{\delta^3}{3} - \frac{\delta^4}{4} + \cdots$$
to every bracket in the fraction (3), and neglect all terms beyond
$$\log_e\frac{y_x}{y_0} = -\frac{2}{k}\,(1 + 2 + 3 + \cdots + \overline{x-1}) - \frac{x}{k} = -\frac{x(x-1)}{k} - \frac{x}{k} = -\frac{x^2}{k}$$
Therefore, finally,
$$y_x = y_0\,e^{-x^2/k} = y_0\,e^{-x^2/2\sigma^2} \qquad (4)$$
median, and mode therefore coincide, and the curve is, in fact,
that drawn in fig. 5 and taken as the ideal form of the symmetri-
of error.
as may be, the last two data are given by the standard-deviation
alteration in the area, for the values of yx are the same for the
$$N = \sqrt{2\pi}\;y_0\,\sigma$$
for positive and negative values of x. For the whole curve the
  x.        y.        Log y.         x.        y.        Log y.

  0      1.00000        ..           2.6     .03405     2̄.53209
 0.2      .98020     1̄.99131        2.8     .01984     2̄.29757
 0.4      .92312     1̄.96526        3.0     .01111     2̄.04567
 0.6      .83527     1̄.92183        3.2     .00598     3̄.77641
 0.8      .72615     1̄.86103        3.4     .00309     3̄.48978
 1.0      .60653     1̄.78285        3.6     .00153     3̄.18577
 1.2      .48675     1̄.68731        3.8     .00073        ..
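The entries of such a table of ordinates are readily recomputed. The sketch below assumes, as the tabulated figures suggest, that x is measured in terms of the standard-deviation and that the central ordinate is taken as unity; the function name is illustrative.

```python
import math

def ordinate(x):
    """Ordinate of the normal curve at x (in units of the
    standard-deviation), the central ordinate being unity."""
    return math.exp(-x * x / 2.0)

for x in (0.2, 0.4, 1.0, 2.6, 3.0):
    y = ordinate(x)
    print(f"{x:.1f}  {y:.5f}  {math.log10(y):.5f}")
```

The logarithms are here printed as ordinary negative numbers; a tabular entry with a barred characteristic, such as 1̄.99131, denotes −1 + 0.99131.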
approximation,
1 AT n"
have
$$y = \frac{N}{\sqrt{2\pi}\,\sigma}\,e^{-x^2/2\sigma^2} \qquad (6)$$
Another rule cited in Chap. VIII., viz. that the mean deviation
for the normal curve only. For this distribution the mean
the annexed table, the ordinates of the normal curve agree with
only one second-order term that has been neglected, viz. that due
Table showing (1) Ordinates of the Binomial Series 10,000 (1/2 + 1/2)^64 and
$$y = \frac{10{,}000}{4\sqrt{2\pi}}\,e^{-x^2/32}$$

   Term.        Binomial    Normal          Term.        Binomial    Normal
                Series.     Curve.                       Series.     Curve.

    32             993        997         24 and 40         136        135
 31 and 33         963        967         23 and 41          80         79
 30 and 34         878        880         22 and 42          ..         ..
 29 and 35         753        753         21 and 43          ..         ..
 28 and 36         606        605         20 and 44          ..         ..
 27 and 37         459        457         19 and 45          ..         ..
 26 and 38         326        324         18 and 46          ..         ..
 25 and 39         217        216         17 and 47          ..         ..
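The closeness of the agreement shown by the table may be checked by direct computation. The following is a sketch only, with illustrative names, assuming the binomial 10,000 (1/2 + 1/2)^64 and the normal curve of the same mean and standard-deviation (σ = 4).

```python
from math import comb, exp, pi, sqrt

N, n = 10_000, 64
sigma = sqrt(n * 0.5 * 0.5)   # standard-deviation sqrt(npq) = 4

def binomial_ordinate(term):
    """Frequency of the given number of successes in N * (1/2 + 1/2)^n."""
    return N * comb(n, term) / 2**n

def normal_ordinate(term):
    """Ordinate of the fitted normal curve at the same point."""
    x = term - n / 2
    return N / (sigma * sqrt(2 * pi)) * exp(-x * x / (2 * sigma**2))

for term in (32, 31, 30, 29, 28, 24, 23):
    print(term, round(binomial_ordinate(term)), round(normal_ordinate(term)))
```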
elements, each of which can take the values 0 and 1 (or other two
equal frequency.
Fig. 49.—The Distribution of Stature for Adult Males in the British Isles
figure, the frequency-polygon has not been drawn in, the tops of the
compared with √(npq), cf. § 3), that p and q are not quite the
same for all the events, that all the events are not quite inde-
ceeding further from this last idea, the deduction may be rendered
head, the vertebral column, and the legs, the thickness of the
curve was first deduced, and received its name of the curve of
XI. § 3). The normal curve is then most readily drawn by plot-
and marking over these points the ordinates given by the figures
table.
is given by the fact that the whole area of the polygon showing
8585 12.53
plotting.
in a normal curve, it divides the whole area into two parts, the
and tedious matter, can thus be done once for all, and a table
the end of this work (list of tables, pp. 353-4), the short table below
being given only for illustrative purposes. The table shows the
greater fraction of the area lying on one side of any given ordinate;
0.1σ from the mean, and 0.46017 on the other side. It will be
area on one side; some 68 per cent. of the area will therefore be
standard-deviation cuts off only 2.3 per cent., and therefore some
95.4 per cent. of the whole area lies within a range of ± 2σ. As
range of ± 3σ. This is the basis of our rough rule that a range
great bulk of the observations: the rule is founded on, and is only
Table showing the Greater Fraction of the Area of a Normal Curve to One

             Greater                       Greater
  x/σ.     Fraction of         x/σ.      Fraction of
              Area.                         Area.

   0         .50000            2.1         .98214
  0.1        .53983            2.2         .98610
  0.2        .57926            2.3         .98928
  0.3        .61791            2.4         .99180
  0.4        .65542            2.5         .99379
  0.5        .69146            2.6         .99534
  0.6        .72575            2.7         .99653
  0.7        .75804            2.8         .99744
  0.8        .78814            2.9         .99813
  0.9        .81594            3.0         .99865
  1.0        .84134            3.1         .99903
  1.1        .86433            3.2         .99931
  1.2        .88493            3.3         .99952
  1.3        .90320            3.4         .99966
  1.4          ..
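The figures of the area table, and the proportions of the area lying within ranges of ± σ, ± 2σ and ± 3σ, may be recomputed from the error function of the standard library. This is offered as a sketch only; the function names are not from the text.

```python
import math

def greater_fraction(x_over_sigma):
    """Fraction of the area of the normal curve lying on the larger side
    of an ordinate at a distance x/sigma (taken positive) from the mean."""
    return 0.5 * (1.0 + math.erf(x_over_sigma / math.sqrt(2.0)))

def fraction_within(k):
    """Fraction of the area within a range of +/- k standard-deviations."""
    return 2.0 * greater_fraction(k) - 1.0

for k in (1, 2, 3):
    print(k, round(100 * fraction_within(k), 1), "per cent.")
```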
normal. On the whole, the use of the "probable error" has little
It is true that the " probable error " has a simpler and more direct
Further, the best modern tables of the ordinates and area of the
standard error, not in terms of the probable error, and the mul-
and q are unequal; but this is not so for small samples, such as
times in 100 trials—or the odds are about 4.6 to 1 against its
occurring at any one trial. For a range of three times the probable
error the odds are about 22 to 1, and for a range of four times the
error and take ± 3 times the standard error as the critical range:
for this range the odds are about 370 to 1 against such a devia-
area table (p. 306), but a little caution must now be used, owing
occur, as given by the table, 187 times in 100,000 throws, or, say,
19 times in 10,000.
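The odds quoted in this section against deviations exceeding given multiples of the probable error or of the standard error may be recomputed as follows. The sketch assumes a strictly normal distribution and takes the probable error as 0.6745 times the standard error; the names are illustrative.

```python
import math

PROBABLE_ERROR = 0.6745  # probable error in units of the standard error

def odds_against(k_sigma):
    """Odds against a deviation, in either direction, exceeding
    k_sigma times the standard error."""
    outside = math.erfc(k_sigma / math.sqrt(2.0))  # chance of exceeding the range
    return (1.0 - outside) / outside

two_pe = odds_against(2 * PROBABLE_ERROR)    # about 4.6 to 1
three_pe = odds_against(3 * PROBABLE_ERROR)  # about 22 to 1
three_se = odds_against(3.0)                 # about 370 to 1
print(round(two_pe, 1), round(three_pe), round(three_se))
```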
mean?
statures of about 303 per thousand of the given population will lie
29.5 or more, 4.5 from the mean; that is, 0.636σ. A positive
0.6745 × 50 √(1/n) = 1,
nearest unit.
statures recorded in the group "62 in. and less than 63" is
ordinate in the centre of the interval, or, better, use the integral
is evident from the form of the curve, a little too small. The
frequency 201.5.
the given class is 0.023, the proportion falling into other classes
sampling.
in a hundred.
REFERENCES.
(1) Galton, Francis, Natural Inheritance; Macmillan & Co. London, 1889,
ref. 11.)
Frequency Curves.
For the early classical memoirs on the normal curve or law of error
more than cite a few of the more recent memoirs, of which 5, 6, and 11
in 5, 7, and 12.
(2) Charlier, C. V. L., "Researches into the Theory of Probability " (Com-
tical Formulæ," Jour. Roy. Stat. Soc., vol. lxi., 1898; vol. lxii., 1899;
(4) Edgeworth, F. Y., Article on the "Law of Error" in the Encyclopaedia
(5) Edgeworth, F. Y., "The Law of Error," Cambridge Phil. Trans., vol.
xx., 1904, pp. 36-65, 113-141 (and an appendix, pp. i-xiv, not
(10) Macalister, Donald, "The Law of the Geometric Mean," Proc. Roy.
with curves derived from the general binomial, and from a somewhat
ignoring the binomial and analogous distributions, cf. Chap. X., ref. 12.)
p. 169.
(13) Sheppard, W. F., "On the Application of the Theory of Error to Cases
(14) Yule, G. U., "On the Distribution of Deaths with Age when the Causes
Jour. Roy. Stat. Soc., vol. lxxiii., 1910, p. 26. (A binomial
distribution with negative index, and the related curve, i.e. a special case of
tical Formulæ," part ii., Jour. Roy. Stat. Soc., vol. lxii., 1899, p. 125.
Racial Differentiation," Phil. Mag., 6th Series, vol. i., 1901, p. 110.
Biometrika, vol. iv., 1905, p. 230. Also memoir under the same title
in the Transactions of the Reale Accademia dei Lincei, Rome, vol. vi.
1906. (The first is a short note, the second the full memoir.)
See also the memoir by Charlier, cited in (2), section vi. of that
(19) Pearson, Karl, "On the Criterion that a given System of Deviations
EXERCISES.
of the same number of observations) are so superposed that the rth term of
the one coincides with the (r + l)th term of the other, the distribution
[Note: it follows that if two normal distributions of the same area and
nearly normal. ]
4. Calculate the ordinates of the binomial 1024 (0.5 + 0.5)^10, and compare
Students (Chap. VII., Table VII.), and a normal curve of the same area,
7. Taking the mean stature for the British Isles as 67.46 in. (the
distribution of fig. 49), the mean for Cambridge students as 68.85 in., and the
based on 7125 seeds gave 25.32 per cent. of green seeds instead of the theoretical
proportion 25 per cent., the standard error being 0.51 per cent. In what per-
alone?
1000 seeds, might (a) 30 per cent. or more, (b) 35 per cent. or more, of green
index is under 75, mesocephalic when the same index lies between 75 and 80,
and brachycephalic when the index is over 80, find approximately (assuming
NORMAL CORRELATION.
1-3. Deduction of the general expression for the normal correlation surface
measurement: arrays taken at any angle across the surface are normal
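2. Consider first the case in which the two variables are com-
$$y_1 = y'_1\,e^{-x_1^2/2\sigma_1^2},\qquad y_2 = y'_2\,e^{-x_2^2/2\sigma_2^2} \qquad (1)$$
$$y_{12} = y'_{12}\,e^{-\frac{1}{2}\left(\frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2}\right)} \qquad (2)$$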
where
$$y'_{12} = \frac{y'_1\,y'_2}{N} = \frac{N}{2\pi\,\sigma_1\sigma_2} \qquad (3)$$
with major and minor axes parallel to the axes of x1 and x2 and
proportional to σ1 and σ2, the equations to the contour lines being
$$\frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2} = C \qquad (4)$$
that if
x1 and x2.1, as also x2 and x1.2, are uncorrelated. If they are not
(I41) ■■•<*>
-H-i
But
X\ 3C% XyCi
r''2
2 i 2 *" 11
— + —.
O^ CTi.2
have
, = N = N_ = iV
Axes of Measurement
the surface
Correlation Surface.
axes are, however, no longer parallel to the axes of x1 and x2, but
lated form of the contour lines for one case, RR and CC being
for which r is zero turned round through some angle, and since
x1 arrays and x2 arrays are normal, it follows that every section
and x2 must be normal for every angle through which the surface
must also be normal; (2) the correlation between any two linear
correlation.
To find the angle θ through which the surface has been turned,
from the position for which the correlation is zero to the position
for which the coefficient has some assigned value r, we must use
the x1-axis which will make it, if continued through 90°, coincide
(8)
$$\tan 2\theta = \frac{2\,r_{12}\,\sigma_1\sigma_2}{\sigma_1^2 - \sigma_2^2} \qquad (9)$$
(9) gives the angle that they make with the axes of measurement
,_N
'_ N
therefore
$$\Sigma_1\Sigma_2 = \sigma_1\sigma_2\,(1 - r_{12}^2)^{\frac{1}{2}} \qquad (11)$$
also if r is positive, the major axes of the ellipses lying along fi:
co-ordinates.
curve. We can now utilise Table III., Chap. IX., p. 160, showing
means of arrays deviate slightly here and there from the lines
appreciably linear.
2-56
2-60
2-11
2-26
2-55
2-26
2-24
2-45
2-23
2-33
irregularly round their mean value. The mean of the first five
first group, two are greater and three are less than the mean,
and the same is true of the second group. There does not seem
in any very satisfactory way: we can only say that they do not
III., Chap. IX., and forming the totals of such diagonals (running
0-25
78-75
81-25
3.25
67-5
6-25
59-25
42-25
9-75
30-75
17
29-25
34-5
19
41
10-75
46-25
60-5
4-25
67-5
3-5
85-75
1-75
87-25
78
0-25
94-25
Total 1078
the constants for the table given in Chap. IX., Question 3, p. 189,
and inserting σ1 = 2.72, σ2 = 2.75, r12 = 0.51, sin θ = cos θ = 1/√2
Chap. IX., along Diagonals running up from left to right, fitted with a
Normal Curve.
mately linear; (2) that, in the arrays which we have tested, the
their differences are only small, irregular and fluctuating; (3) that
test, viz. the form of the contour lines and the closeness of their
once, however, that no very close fit can be expected. Since the
square root (Chap. XIII. § 12), and this implies a standard error
the sons,
y'12 = 26.7
(22 \
gj *i *1*2 \
tan 2θ = −46.49,
whence 2θ = 91° 14′, θ = 45° 37′, the principal axes standing very
They should be set off on the diagram, not with a protractor, but
Σ1² + Σ2² = 14.961
Σ1 + Σ2 = 5.275
Σ1 − Σ2 = 1.447
ing nearly at 45° the first value is sensibly the same as that found
(3-36)2 + (l-91)J~'
the major and minor axes being 3.36 × c and 1.91 × c respectively.
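The arithmetic leading to the angle θ and to the two principal standard-deviations may be retraced as below. The sketch uses the relation tan 2θ = 2 r12 σ1σ2/(σ1² − σ2²), together with the facts that the sum of the squares of the principal standard-deviations is σ1² + σ2² and that their product is σ1σ2 √(1 − r12²); the function name is illustrative, and the data are the constants σ1 = 2.72, σ2 = 2.75, r12 = 0.51 used in the text.

```python
import math

def principal_axes(sigma1, sigma2, r):
    """Angle (in degrees) of the principal axes and the two principal
    standard-deviations of a normal correlation surface."""
    # atan2 keeps 2*theta in the proper quadrant when sigma1 < sigma2
    two_theta = math.atan2(2 * r * sigma1 * sigma2, sigma1**2 - sigma2**2)
    sum_sq = sigma1**2 + sigma2**2                     # Sigma1^2 + Sigma2^2
    product = sigma1 * sigma2 * math.sqrt(1 - r * r)   # Sigma1 * Sigma2
    plus = math.sqrt(sum_sq + 2 * product)             # Sigma1 + Sigma2
    minus = math.sqrt(sum_sq - 2 * product)            # Sigma1 - Sigma2
    return math.degrees(two_theta) / 2, (plus + minus) / 2, (plus - minus) / 2

theta, major, minor = principal_axes(2.72, 2.75, 0.51)
print(round(theta, 2), round(major, 2), round(minor, 2))
```

The angle is returned in decimal degrees, so that 45° 37′ appears as roughly 45.62.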
loge
of Table III., Chap. IX., and corresponding Contour Ellipses of the fitted
values for the major and minor axes of the ellipses :—semi-major
axes, 6.15, 4.70, 2.55: semi-minor axes, 3.50, 2.67, 1.45. The
ellipses drawn with these axes are shown in fig. 52, very much
contour lines for the same frequencies are shown by the irregular
column not being used. It will be seen that the fit of the two
the fit looks very poor to the eye, but if the ellipse be compared
carefully with the table, the figures suggest that here again we
standard error would bring the actual contour outside the ellipse.
and similar ellipses (ref. 2): the suggestion was confirmed when
these data, and the various shapes and other particulars of its
round values of the variables x1, x2, x′1, x′2, we have for the ratio
xl x2/'
( xi - *i)( *a - z?).
6"1.!T8.1
the exponent is of the same sign as r12. Hence the association for
this group of four frequencies is also of the same sign as r12, the
intervals are equal or unequal, large or small, and the sign of the
chosen.
normal, but at least the test for isotropy affords a rapid and
frequency.
regrouping the table in a much coarser form, say with four rows
and four columns: the table below exhibits such a grouping, the
                                Son's Stature (inches).

                   Under                                    69.5
                   65.5.     65.5-67.5.    67.5-69.5.     and over.     Total.

Under 66.5          97.5       74.25         34.75          10.5         217
66.5-68.5           76.5      108            85             52           321.5
68.5-70.5           33.25      64.75         95             84.5         277.5
70.5 and over       14.75      32.5          80.75         134           262

Total              222        279.5         295.5          281          1078
Row.                  Columns
          1 and 2.    2 and 3.    3 and 4.

 1         0.568       0.681       0.768
 2         0.415       0.560         ..
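The test for isotropy on the regrouped table may be sketched as follows. The frequencies are those of the four-by-four grouping, read so that the row and column totals agree with the totals printed; the function names are illustrative, and the ratio computed for each row is the frequency in one column divided by the sum of the frequencies in that column and the next, which reproduces the figures 0.568, 0.681, 0.768 given for the first row.

```python
# Regrouped frequencies: one row per row of the table, one entry per column.
table = [
    [97.5, 74.25, 34.75, 10.5],
    [76.5, 108.0, 85.0, 52.0],
    [33.25, 64.75, 95.0, 84.5],
    [14.75, 32.5, 80.75, 134.0],
]

def adjacent_ratios(rows):
    """For each row, the frequency in a column divided by the total of
    that column and the next."""
    return [[row[c] / (row[c] + row[c + 1]) for c in range(len(row) - 1)]
            for row in rows]

def same_sign_throughout(rows):
    """True when every adjacent group of four frequencies shows
    association of the same (positive) sign."""
    return all(
        rows[i][j] * rows[i + 1][j + 1] > rows[i][j + 1] * rows[i + 1][j]
        for i in range(len(rows) - 1)
        for j in range(len(rows[0]) - 1)
    )

ratios = adjacent_ratios(table)
isotropic = same_sign_throughout(table)
for row in ratios:
    print([round(value, 3) for value in row])
print(isotropic)
```

If the ratios in each column of the derived table fall steadily from one row to the next, the association has the same sign for every group of four adjacent frequencies.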
isotropic.
chapter),
$$y_{12 \ldots n} = y'_{12 \ldots n}\,e^{-\frac{1}{2}\phi}$$
where
(13)
(13) for the exponent. Clearly this might have been written in
arrived at precisely the same final form for the exponent whether
we started with the two deviations x1 and x2.1 or with x2 and x1.2.
Our assumption, then, that the deviations x1, x2.1, x3.12, etc. are
tions of any order and with any suffixes are normally distributed,
X
reducing x3.12 to the second order we shall find that the correla-
any two deviations xm.k and xn.k is normal correlation; (2) the correla-
xm.k and xn.k: in the normal case rmn.k is constant for all the sub-
case might be. Finally, we have to note that if, in the expression
(15) for φ, we assign fixed values, say h2, h3, etc., to all the
deviations except x1, and then throw φ into the form of a perfect
But this is a linear function of h2, h3, etc., therefore in the case of
^12.34 .. . . n, "to.
REFERENCES.
General.
(1) Bravais, A., "Analyse mathématique sur les probabilités des erreurs de
situation d'un point," Acad. des Sciences: Mémoires présentés par divers
(2) Galton, Francis, "Family Likeness in Stature," Proc. Roy. Soc., vol. xl.,
1886, p. 42.
(4) Dickson, J. D. Hamilton, Appendix to (2), Proc. Roy. Soc., vol. xl.,
1886, p. 63.
(5) Edgeworth, F. Y., "On Correlated Averages," Phil. Mag., 5th Series,
(7) Pearson, Karl, "On Lines and Planes of Closest Fit to Systems of Points
in Space," Phil. Mag., 6th Series, vol. ii., 1901, p. 559. (On the fitting
of " principal axes" and the corresponding planes in the case of more
(8) Pearson, Karl, "On the Influence of Natural Selection on the Variability
and Correlation of Organs," Phil. Trans. Roy. Soc., Series A, vol. cc,
(9) Pearson, Karl, and Alice Lee, "On the Generalised Probable Error in
(10) Yule, G. U., "On the Theory of Correlation," Jour. Roy. Stat. Soc.,
(11) Yule, G. U., "On the Theory of Correlation for any number of Variables
(12) Sheppard, W. F., "On the Application of the Theory of Error to Cases
1900, p. 23.
age of Cases wherein B exceeds (or falls short of) a given Intensity is
recorded for each grade of A," Biometrika, vol. vii., 1909, p. 96.
(17) Pearson, Karl, "On the Theory of Contingency and its Relation to
London, 1907.
Psychology, vol. ii., 1906, p. 89. (The suggestion of a " rank "method:
EXERCISES.
without assuming the normal distribution. (A proof will be found in ref. 10.)
2. Hence show that if the pairs of observed values of x1 and x2 are
represented by points on a plane, and a straight line drawn through the mean, the
sum of the squares of the distances of the points from this line is a minimum
zero, and with reference to other axes something, there must be some pair of
greatest without regard to sign. Show that these axes make an angle of 45°
with the principal axes, and that the maximum value of the correlation is—
tion table, taking the points of division between A and α, B and β, at the
'= cos (
,_2U£)\
N)
CHAPTER XVII.
the theory of sampling for the case of attributes and the frequency-
bag, note the value that it bears, draw another, and so on until
N such samples of n cards each, and then work out the mean,
were discussed very fully for the case of attributes (Chap. XIII.
§ 8), and we would refer the student to the discussion then given.
(b) We assume not only that we are drawing from the same
may be regarded quite strictly as drawn from the same record (or
take the first card from bundle number 1, the second card from
may not be the same for each individual card at each drawing.
tickets only, bearing the numbers 1 to 10, and we draw the card
than the mean of all cards drawn; if, on the other hand, we draw
the 10, the average of the following cards will be lower than the mean
number on the card taken at any one drawing and the card taken
the bag indefinitely large, we can, as already pointed out for the
of, practical cases the very question at issue is the nature of the
portion of X's above Xp, say p + δ, for the sample, we also proceed
Vv
Vvy n
Sheppard (Table IV., ref. 14, in Appendix I.), gives the values
directly, and these have been utilised for the following: the
area and ordinate tables for the normal curve given in Chapter
                        Value of yp
Median                   0.3989423
Deciles 4 and 6          0.3863425
   „    3 and 7          0.3476926
   „    2 and 8          0.2799619
   „    1 and 9          0.1754983
Quartiles                0.3177766
etc., and the values given in the second column for their probable
errors (Chap. XV. § 17), which the student may sometimes find
useful:—
Median                   1.25331      0.84535
Deciles 4 and 6          1.26804      0.85528
   „    3 and 7          1.31800      0.88897
   „    2 and 8          1.42877      0.96369
   „    1 and 9          1.70942      1.15298
Quartiles                1.36263      0.91908
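The two columns of the table may be recomputed on the assumption that the first column is the ratio √(pq)/yp, yp being the ordinate of the normal curve of unit area and unit standard-deviation at the percentile in question, and that the second is 0.6745 times the first. The sketch below uses the `NormalDist` class of the Python standard library; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def percentile_se_factor(p):
    """Multiple of sigma/sqrt(n) giving the standard error of the
    percentile below which a proportion p of the distribution lies."""
    std_normal = NormalDist()
    y_p = std_normal.pdf(std_normal.inv_cdf(p))  # ordinate at the percentile
    return sqrt(p * (1 - p)) / y_p

for label, p in (("Median", 0.5), ("Deciles 4 and 6", 0.4),
                 ("Deciles 1 and 9", 0.1), ("Quartiles", 0.25)):
    factor = percentile_se_factor(p)
    print(f"{label:16s} {factor:.5f} {0.6745 * factor:.5f}")
```

The last figure of the probable errors may differ by a unit from the tabulated values, according to the number of decimals retained in the factor 0.6745.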
that of the median, and the standard error of the first or ninth
having the same area, the standard error of the median reduces to
pound distribution.
(a)
-J
2* ^ . . . . (c)
= 1.
n o-1 + o-2
2 s/Tr(ri<r2
2 ijirp
or
Fig. 53.
have
f-
THE SIMPLER CASES OF SAMPLING FOR VARIABLES. 337
But this gives at once for the standard error expressed in terms
$$\sigma_{x_p} = \frac{\sqrt{n\,pq}}{f_p} \qquad (2)$$
the two different formulae (1) and (2), take the distribution of
for our present purpose, the frequency per interval at the median,
tion of normality.
Let us find the standard error of the first and ninth deciles
tion is normal, these standard errors are the same, and equal to
that the class-interval is, in this case, identical with the unit of
interval.
Example i.), the fact that the class-interval is not a unit must
is approximately 96, and this gives for the standard error of the
ful to note that the errors in two such percentiles are not
whole area of the frequency curve into three parts, the areas of
will tend to be spread over the two other sections of the curve
produce an error
2 pi l
fi pi
■Jm
normal distribution ) v
from first principles, applying the usual formula for the standard
§ 2, equation (1)).
the material from which it has been drawn, nor can it give any
the average stature for the United Kingdom: the sample is not
(As regards the theory of sampling for the median and per-
arithmetic mean.
each drawing the value recorded on the first, second, third ....
on each separate card will tend in the long run to be the same
$$\sigma_m = \frac{\sigma}{\sqrt{n}} \qquad (5)$$
(cf- § 4), and in general the standard errors of the two stand in
the two standard errors, viz. 1.26, assumes almost exactly the theo-
standard error of the mean is only 0.0493 per cent., which bears
(6)
from the same record: the one sample must have been drawn
$$\epsilon_{12}^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \qquad (7)$$
This is, indeed, the formula usually employed for testing the
dispersion.
universe, but instead of comparing the mean of the one with the
mean of the other we compare the mean m1 of the first with the
justified, for errors in the mean of the one sample are correlated
the lines of the similar problem in § 13, Chap. XIII., case III., we
4l = o* 7 \ s - - - • (8)
point here; but if the student will refer to § 13, Chap. XV., he will
see that the genesis of the normal curve in this case is in accord-
ance with what we then stated, viz. that the distribution tends to
correlated with each other, even a fairly large sample may con-
once from the fact that any linear function of normally distributed
§ 2 cease to apply.
(a) If we do not draw from the same record all the time, but
k1 samples from the first record, for which the standard-deviation
is σ2, and the mean differs by d2 from the mean of all the records
together, and so on. Then for the samples drawn from the first
record the standard error of the mean will be σ1/√n, but the
mean for all the records together: and so on for the samples
drawn from the other records. Hence, if σm be the standard error
$$N\,\sigma_m^2 = \Sigma\!\left(k\,\frac{\sigma^2}{n}\right) + \Sigma(k\,d^2).$$
But the standard-deviation σ0 for all the records together is given
by
XIV. The standard error of the mean, if our samples are drawn
for the different districts will not be σ0/√n, but will have some
always draw the first card from one part of that record, the
second card from another part, and so on, and these parts differ
For if, in large samples drawn from the subsidiary parts of the
record from which the several cards are taken, the standard-
deviations are o-v <r2, . . . . o-„, and the means differ by dv d2,
. . . . dn from the mean for a large sample from the entire record,
we have
Hence
The last equation again corresponds precisely with that given for
the same departure from the rules of simple sampling in the case
of attributes (Chap. XIV. § 11., eqn. 4). If, to vary our previous
by taking one man from each district for the first sample, one
man from each district for the second sample, and so on, the
district were all of precisely the same stature, the means of all the
the cards from which we were drawing samples had been arranged
with common-sense.
(c) Finally, suppose that, while our conditions (a) and (b) of § 2
another card, e.g. that if the first card drawn at any sampling
bears a high value, the next and following cards of the same
sample are likely to bear high values also. Under these circum-
$\sigma_m^{2} = \dfrac{\sigma^{2}}{n} \, [1 + r(n - 1)]$ . . . (11)
value of r the increase will be the greater, the greater the size of
distinct part of the record, the correlation between any two x's will
may arise for reasons quite different from those considered under
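The effect of correlation between the members of a sample on the standard error of the mean may be illustrated numerically. The following is only a minimal sketch in Python, assuming that equation (11) reads σm² = (σ²/n)[1 + r(n − 1)], where σ is the standard-deviation of the record, n the size of the sample, and r the correlation between any two members of the same sample. The function name and the figures used are hypothetical and are not taken from the text.

```python
import math


def se_mean_correlated(sigma: float, n: int, r: float) -> float:
    """Standard error of the mean of n equally correlated observations.

    Assumed reading of equation (11): sigma_m^2 = (sigma^2 / n) * [1 + r (n - 1)].
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    variance = (sigma ** 2 / n) * (1.0 + r * (n - 1))
    if variance < 0:
        raise ValueError("r is too negative for a sample of this size")
    return math.sqrt(variance)


# Hypothetical figures: sigma = 10, samples of n = 25.
simple = se_mean_correlated(10.0, 25, 0.0)      # simple sampling, r = 0
correlated = se_mean_correlated(10.0, 25, 0.1)  # slight positive correlation
print(simple, correlated)
```

With r = 0 the expression reduces to the familiar σ/√n, and for a given positive r the excess over that value grows with the size of the sample, as the factor 1 + r(n − 1) shows.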
briefly our reasons for not proceeding further with the discussion
or, in other words, the mean (deviation)² with respect to the mean.
with respect to the mean. Either, then, we must find this quantity
the limits that we have set ourselves. To deal with the standard
standard-deviation in a
normal distribution
This is the value always given: the use of a more general formula
which would entail the use of higher moments does not appear
cf. ref. 23. Equation (15) gives the standard error of a coefficient
and q are equal) and in the case of the mean of a normal distri-
fp in (2), and of σ in (4) and (5), i.e. the values that would be
known a priori. But this is only the case in dealing with the
obtained, e.g. in the case of the mean, by first working out the
any constant in the inverse sense, i.e. the standard error ceases
REFERENCES.
(1) Blakeman, J., and Karl Pearson, "On the Probable Error of the
191.
(2) Bowley, A. L., The Measurement of Groups and Series; C. & E. Layton,
London, 1903.
(6) Edgeworth, F. Y., "The Choice of Means," Phil. Mag., 5th Series,
Jour. Roy. Stat. Soc., vol. lxxi., 1908, pp. 381, 499, 651; and
(8) Elderton, W. Palin, "Tables for Testing the Goodness of Fit of Theory
(10) Heron, D., "An Abac to determine the Probable Errors of Correlation
(11) Heron, D., "On the Probable Error of a Partial Correlation Coefficient,"
lines, for the case of three variables, of the result given in (25).)
(12) Laplace, Pierre Simon, Marquis de, Théorie des probabilités, 2e edn.,
(14) Pearl, Raymond, "On certain Points concerning the Probable Error
Variation and Correlation," Phil. Trans. Roy. Soc., Series A, vol. cxci.,
1898, p. 229.
(16) Pearson, Karl, "On the Criterion that a given System of Deviations
(17) Pearson, Karl, and others (editorial), "On the Probable Errors of
t1 in of the frequency-distribution.)
(18) Pearson, Karl, "On the Curves which are most suitable for describing
v., 1906, p. 172.
181.
(20) Rhind, A., "Tables for Facilitating the Computation of Probable Errors
(21) Sheppard, W. F., "On the Application of the Theory of Error to Cases
(22) "Student," "On the Probable Error of a Mean," Biometrika, vol. vi.,
en 1r of the sample.)
Biometrika, vol. vi., 1908, p. 302. (The problem of the probable error
(24) "Student," "On the Distribution of Means of Samples which are not
(25) Yule, G. U., "On the Theory of Normal Correlation for any number of
Reference may also be made to the following, which deals for the
most part with the effects of errors other than errors of sampling:—
that of its Constituent Parts," Jour. Roy. Stat. Soc., vol. lx., 1897,
p. 855.
EXERCISES.
1. For the data in the last column of Table IX., Chap. VI. p. 95, find
2. For the same distribution, find the standard errors of the two quartiles
3. For the same distribution, find the standard error of the semi-inter-
quartile range.
standard error of the mean, and compare its magnitude with that of the
5. Work out the standard error of the standard-deviation for the distribu-
distribution normal."
coefficient, based on (1) 100, (2) 1000 observations, for values of r = 0, 0.2, 0.4,
161 ,
APPENDIX I.
A. CALCULATING TABLES.
rule, beyond the reach of the student. For a great deal of simple
Bauschinger and Peters (W. Engelmann, Leipzig, and Asher & Co.,
trigonometric functions).
tables are very useful. There are many of these, and four of
1000 x 1000.) M'Corquodale & Co., London ; price with thumb index,
(5) Peters, J., Neue Rechentafeln für Multiplikation und Division. (Gives
for all numbers up to 1000 at the foot of the page.) W. Ernst & Son,
Berlin; price 5s. ; English edition, Asher & Co., London, 6s.
(8) Elderton, W. P., "Tables for Testing the Goodness of Fit of Theory to
(11) Heron, D., "An Abac to determine the Probable Errors of Correlation
(12) Lee, Alice, "Tables of F(r, v) and H(r, v) Functions," British Associa-
(13) Rhind, A., "Tables for Facilitating the Computation of Probable Errors
vol. ii., 1903, p. 174. (Includes not merely table of areas of the normal
curve (to seven figures), but also a table of the ordinates to the same
degree of accuracy.)
PROBABILITY.
All the works mentioned in the following list, with others which
(1) Airy, Sir G. B., On the Algebraical and Numerical Theory of Errors of
1909.
1843.
(12) Galloway, T., Treatise on Probability (republished from the 7th edn.
(13) Gauss, C. F., Méthode des moindres carrés: Mémoires sur la combinaison
(14) Laplace, Pierre Simon, Marquis de, Essai philosophique sur les
some modifications.)
(16) Lexis, W., Abhandlungen zur Theorie der Bevölkerungs- und Moral-
(19) Quetelet, L. A. J., Lettres sur la théorie des probabilités, appliquée aux
Downes, 1849.)
(21) Venn, J., The Logic of Chance: an Essay on the Foundations and
(22) Westergaard, H., Die Grundzüge der Theorie der Statistik; Fischer,
Jena, 1890.
ANSWERS
CHAPTER I.
26,287
(AB)
(AC)
(BC)
(ABC)
SS7
374
353
149
(A)
(B)
(C)
2,308
2,853
749
(ABC)
(ABγ)
(AβC)
(Aβγ)
156
431
272
759
(αBC)
(αBγ)
179
1,249
(αβC)
163
(αβγ)
20,504
«. * • (AB) (A)
that is Vm > rV
(aB) (a)
5. (AB) + (BC) - (B), i.e., the sum of the excesses of (AB) and (BC) over (B)/2.
CHAPTER II.
4. 117.
5. 108.
(BC) ≮ (B) + (C) − N.
CHAPTER III.
male sex: if there had been no association between deaf-mutism and sex, there
3.
Parentage Crossed.
Self-fertilised
. 86 per cent.
25 per cent.
17 „
34 „
45
• 79 „
• 78 „
• 71 „
• 50 ,,
35
Petunia violacea .
Reseda lutea
Reseda odorata
Lobelia fulgens .
The association is much less for the species at the end than for those at the
cent.
cent.
If there had been no heredity, the frequencies to the nearest unit would
have been (AB)₀ 18, (Aβ)₀ 111, (αB)₀ 121, (αβ)₀ 750.
per cent.
per cent.
= 108.
In general population: 0.9, 2.3, 4.1, 5.7, 6.9, 7.5, 7.7, 6.8.
Amongst the blind: 2091, 16.0, 16.3, 20.7, 18.3, 17.8, 11.4, 5.3.
65—, and the negative association in the last age-group. The association
+0.20, −0.18.
CHAPTER IV.
(AD)/(A) = 45.0 „
(βD)/(β) = 3.6 „
(AβD)/(Aβ) = 41.2 „
(BD)/(B) = 42.7 „
(ABD)/(AB) = 5V6 „
(AD)/(D) = 44.6
(Aβ)/(β) = 4.7
(AβD)/(βD) = 54.9
(AB)/(B) = 29.2
(ABD)/(BD) = 35.3
The above give two legitimate comparisons. The general results are the same
as for the boys, i.e. a very small association between development-defects and
amongst those who do not exhibit nerve-signs is quite as high as for the girls
2. (1)
(2)
(1)
(2)
per
thousand.
per
thousand.
per
thousand.
per
thousand.
(B)/N 3.2
(AB)/(A) 14.9
7.5
117
(A)/N 0.9
(AB)/(B) 4.0
4.0
6.3
(BC)/(C) 38.8
(ABC)/(AC) 216
63.0
214
(AC)/(C) 6.6
(ABC)/(BC) 36.8
18.8
63.8
The above give the two simplest comparisons, either of which is sufficient to
show that there is a high association between blindness and mental derange-
the old, the association is, in fact, small for the general population, but well-
marked for deaf-mutes. This result stands in direct contrast with that of
Qu. 1, where the association between the two defects A and D was much
stated, no great reliance can be placed on the census data as to these infirmities.
were the same as for the population at large, the rate for all farmers 15—
would be 1.11. This is slightly less than the actual rate 1.20, but the excess
would not justify the statement that "farmers were peculiarly liable to cancer."
have neglected, e.g. amongst those over 45 there are more over 55 amongst
4. 15 per cent.
(AB) equal to
$\dfrac{471 \times 419}{617} + \dfrac{151 \times 139}{383} = 374.7$.
9. (1) 68.1 per cent. (2) 42.5 per cent. The fallacy discussed in § 2 is
now avoided, and there seems no reason for declining to consider this as evidence
y<&Zx-x*-\)
association from two negatives is possible unless x lies between the limits
>l(x + 6xi),
(3) y<H6z-to*-l)
>l(3x + 2x2),
CHAPTER V.
1. A, 0.68. B, 0.36.
CHAPTER VI.
CHAPTER VII.
2. Mean, 156.73 lb. Median, 154.67 lb. Mode (approx.) 150.6 lb. (Note
that the mean and the median should be taken to a place of decimals further
than is desired for the mode: the true mode, found by fitting a theoretical
is 0 653.)
4. £35.5 approximately.
5. (1) 116.0. (2) Means 77.4, 89.0, ratio 114.9. (3) Geometrical means 77.2,
. . . , note that the resulting series is also a binomial when a common factor
CHAPTER VIII.
2. Standard deviation 21.3 lb. Mean deviation 16.4 lb. Lower quartile
5. (1) M = 73.2, σ = 17.3. (2) M = 73.2, σ = 17.5. (3) M = 73.2, σ = 18.0.
(Note that while the mean is unaffected in the second place of decimals, the
intervals does not affect the sum of deviations, except for the interval in which
the mean or median lies: for that interval the sum is n2 (0.25 + d²), hence the
entire correction is
and is given its proper sign. Notice that the % and % of this question are
CHAPTER IX.
2. Using the subscripts 1 for earnings, 2 for pauperism, 3 for out-relief ratio,
CHAPTER XI.
1. 1.232 per cent. (against 1.240 per cent.): 2.556 in. against 2.572 in.
is too low, and the former consequently probably too high. Cf. Chap. XIV.
§10.)
4. 0.43.
5. 58 per cent.
6. ovVVK' + oVW + O-
8. 0.30.
10. (1) No effect at all. (2) If the mean value of the errors in variables is
d, and in the weights e, the value found for the weighted mean is—
ic(w + e)
If r is small, d is the important term, and hence errors in the quantities are
siderable, errors in the weights may be of consequence, but it does not seem
probable that the second term would become the most important in practical
CHAPTER XII.
4. -ri2.
CHAPTER XIII.
M = 2.97, σ = 1.26.
5. The standard deviation of the number drawn is 32, and the actual
standard-deviation does not, therefore, seem to indicate any real variation, but
10. The test can be applied either by the formulae of Case II. or Case III.
(a) (AB)/(B) = 69.1 per cent.: (Aβ)/(β) = 80.0 per cent. Difference 10.9
per cent. (A)/N = 71.1 per cent., and thence ε12 = 12.9 per cent. The actual
simple sampling.
(b) (AB)/(B) = 70.1 per cent.: (Aβ)/(β) = 64.3 per cent. Difference 5.8 per
cent. (A)/N = 67.6 per cent., and thence ε12 = 3.40 per cent. The actual
difference is 1.7 times this, and might, rather infrequently, occur as a fluctua-
CHAPTER XIV.
Row.
σp.
Group of Rows.
σp.
3.1
5, 6, and 7
1.8
2.1
8, 9, 10, and 11
1.6
1.7
1.2
2.7
15 and upwards
1.1
3. σ² = n.pq as if the chance of success were p in all cases (but the mean is
n/2 not p.n).
az=566,582. r = 0.000029.
CHAPTER XV.
(1)
(2)
(3)
7 792
12
8 495
66
9 220
220
10 66
495
11 12
792
12 1
924
Total, 4096
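The frequencies tabulated above can be checked mechanically. The following is a minimal sketch in Python, on the assumption (suggested by the total of 4096 = 2^12 and the terms 12, 66, 220, 495, 792, 924) that the answer gives the terms of the expansion 4096(½ + ½)^12; the variable names are hypothetical.

```python
from math import comb

# Assumed setting: 12 events, each with chance 1/2 of success,
# so the frequency of k successes in 4096 trials is C(12, k).
n = 12
frequencies = [comb(n, k) for k in range(n + 1)]
total = sum(frequencies)

for k, f in enumerate(frequencies):
    print(f"{k:2d} {f:4d}")
print("Total,", total)
```

The terms are symmetrical about the greatest term, 924, at 6 successes.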
459.4
5 116.4
1102.6
6 27.2
1212.8
7 4.7
808.6
8 .6
.1
363.9
Total, 4096.2
192
288
141
24
Total, 648
r<np+p: if np is an integer, r=np gives the greatest term and also the mean.
4.
Binomial.
Normal curve.
1.7
10
10.5
45
42.7
120
116.1
210
211.5
252
258.4
210
211.5
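The comparison of the binomial with the normal curve may be reproduced as follows. This is only a sketch in Python, assuming that the binomial is 1024(½ + ½)^10 and that the normal curve is fitted with the same total frequency, mean np = 5 and standard-deviation √(npq); the function and variable names are hypothetical.

```python
import math
from math import comb

n, p = 10, 0.5
N = 2 ** n                       # total frequency, 1024
mean = n * p                     # 5
sd = math.sqrt(n * p * (1 - p))  # sqrt(2.5)


def normal_ordinate(x: float) -> float:
    """Ordinate at x of the normal curve with total frequency N."""
    y0 = N / (sd * math.sqrt(2 * math.pi))
    return y0 * math.exp(-((x - mean) ** 2) / (2 * sd ** 2))


rows = [(k, comb(n, k), round(normal_ordinate(k), 1)) for k in range(n + 1)]
for k, binomial, normal in rows:
    print(f"{k:2d} {binomial:4d} {normal:6.1f}")
```

Under these assumptions the ordinates agree with the tabulated values 1.7, 10.5, 42.7, 116.1, 211.5 and 258.4 for 0 to 5 successes.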
4. In fig. 50, suppose every horizontal array to be given a slide to the right
until its mean lies on the vertical axis through the mean of the whole distribu-
tion: then suppose the ellipses to be squeezed in the direction of this vertical
axis until they become circles. The original quadrant has now become a
sector with an angle between one and two right angles, and the question is
CHAPTER XVII.
frequency 1472, standard error 0.26 lb.; upper Q, frequency 1116, standard
error 0.34 lb. 3. 0.18 lb. 4. 0.24 lb., 17 per cent. less than the standard
the standard error of the semi-interquartile range is 1.23 per cent. of that
range.
6.  r.     n = 100.   n = 1000.
    0.0    0.1        0.0316
    0.2    0.096      0.0304
    0.4    0.084      0.0266
    0.6    0.064      0.0202
    0.8    0.036      0.0114
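The figures of this table can be recomputed directly. The following is a minimal sketch in Python, assuming that the standard error of the correlation coefficient is taken as (1 − r²)/√n, a form which reproduces the tabulated values; the function name is hypothetical.

```python
import math


def se_correlation(r: float, n: int) -> float:
    """Standard error of a correlation coefficient, assumed (1 - r^2) / sqrt(n)."""
    return (1.0 - r * r) / math.sqrt(n)


r_values = [0.0, 0.2, 0.4, 0.6, 0.8]
table = [
    (r, round(se_correlation(r, 100), 3), round(se_correlation(r, 1000), 4))
    for r in r_values
]
for r, se_100, se_1000 in table:
    print(f"{r:.1f}  {se_100:.3f}  {se_1000:.4f}")
```

Increasing the number of observations tenfold reduces each standard error only in the ratio 1 : √10.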
INDEX.
[The references are to pages. The subject matter of the Exercises given at
the ends of the chapters has been indexed only when such exercises (or
the answers thereto) give the constants for statistical tables in the text,
the text are given first, followed by citations of the authors' papers or
chances), 261-262.
Staatswissenschaft, 2.
Earnings.
Observation, 355.
metic.
correlation, 315-316.
refs., 57.
Asymmetrical frequency-distributions,
quency-distributions.
Asymmetry in frequency-distribu-
Sampling, of attributes).
Mode.
318.
modes, 122.
chances, 269.
355.
de statistique, 365.
probabilités, 355.
310.
23.
195.
bilités, 355.
tistics, 355.
328.
Consistence of correlation-coefficients.
246-247.
349.
refs., 222.
generally), 221.
of correlation-coefficient for
refs., 221-222.
tation of frequency-distribution by
328-329.
of standard-deviation, 232-233, of
100.
probability, 355.
130.
table, 353.
197.
keitsrechnung, 355.
Deviation, standard.
166.
quency-distribution.
calculation of standard-deviation,
curve, 311.
interpreting correlations—"spuri-
correlations, 247-248.
(ref.), 222.
ref., 222.
350.
144.
of prices, 131.
illustrations: of death-rates in
another, 291-293.
Bills of Mortality, 6.
monic.
relation, 206.
(qu. 3) 155.
tical," 4.
frond, 185-187.
268.
84, 90.
of movements, 199-201.
istical," 4.
300.
schaften, 5.
205.
229-230.
of Political Economy, 6.
politique, 105.
partial.
partial.
(qu. 3) 189.
of attributes.
refs., 154.
tion, 117.
353.
Writings, 6.
bability, 356.
sampling, 285.
of dispersion, 133.
distributions, 117.
normal.
from, 37.
"statistics," 3.
tiles.
"statist," 1.
Frequency-distributions.
Skewness of frequency-distributions,
errors, 350.
Symmetrical frequency-distributions,
table, 164.
death-rates, 222.
356.
ability, 6.
tion, 12-13.
17, 18.
102-105.
diagram, 101.
persion, 154.
Earnings.
197.
Mode.
(qu. 2) 155.