Yule 1911


An introduction to the theory of statistics,

Yule, G. Udny (George Udny), 1871-1951.

London, C. Griffin and company, limited, 1911.

Generated for Lawrence J Hubert (University of Illinois at Urbana-Champaign) on 2013-09-06 15:49 GMT / http://hdl.handle.net/2027/mdp.39015033708259
Public Domain in the United States, Google-digitized / http://www.hathitrust.org/access_use#pd-us-google

Public Domain in the United States,


This work is deemed to be in the public domain in the

United States of America. It may not be in the public
domain in other countries. Copies are provided as a
preservation service. Particularly outside of the United
States, persons receiving copies should make appropriate
efforts to determine the copyright status of the work
in their country and use the work accordingly. It is possible
that heirs or the estate of the authors of individual portions
of the work, such as illustrations, assert copyrights over
these portions. Depending on the nature of subsequent
use that is made, additional rights may need to be obtained
independently of anything we can address. The digital
images and OCR of this work were produced by Google,
Inc. (indicated by a watermark on each page in the
PageTurner). Google requests that the images and OCR
not be re-hosted, redistributed or used commercially.
The images are provided for educational, scholarly,
non-commercial purposes.

Twenty-sixth Annual Issue.





Comprising (together with other Official Information) LISTS of the

PAPERS read during the Session 1908-1909 before all the LEADING

SOCIETIES throughout the Kingdom engaged in the following

Departments of Research :—

§ 1. Science Generally: i.e. Societies

occupying themselves with several

Branches of Science, or with

Science and Literature jointly.

§ 2. Mathematics and Physics.

§ 3. Chemistry and Photography.

§ 4. Geology, Geography, and Mineralogy.

§ 5. Biology, including Microscopy and


§ 6. Economic Science and Statistics.

§ 7. Mechanical Science, Engineering,

and Architecture.

§ 8. Naval and Military Science.

§ 9. Agriculture and Horticulture.

§ 10. Law.

§ 11. Literature.

§ 12. Psychology.

§ 13. Archaeology.

§ 14. Medicine.

"Fills a very real want."—Engineering.

"Indispensable to any one who may wish to keep himself

abreast of the scientific work of the day."—Edinburgh Medical

"The YEAR-BOOK OF SOCIETIES is a Record which ought to be of the greatest

use for the progress of Science."—Lord Playfair, F.R.S., K.C.B., M.P., Past-President

of the British Association.

"It goes almost without saying that a Handbook of this subject will be in time

one of the most generally useful works for the library or the desk."—The Times.

"British Societies are now well represented in the 'Year-Book of the Scientific

and Learned Societies of Great Britain and Ireland.'"—(Art. "Societies" in New

Edition of " Encyclopaedia Britannica," vol. xxii.)

Copies of the First Issue, giving an Account of the History,

Organization, and Conditions of Membership of the various

Societies, and forming the groundwork of the Series, may still

be had, price 7/6. Also Copies of the Issues following.

The YEAR-BOOK OF SOCIETIES forms a complete index to the scientific work

of the sessional year in the various Departments. It is used as a Handbook in all

our great Scientific Centres, Museums, and Libraries throughout the Kingdom,

and has become an indispensable book of reference to every one engaged in

Scientific Work.









With 53 Figures and Diagrams.





The following chapters are based on the courses of instruction

given during my tenure of the Newmarch Lectureship in Statistics

at University College, London, in the sessions 1902-1909. The

variety of illustrations and examples has, however, been increased

to render the book more suitable for the use of biologists and

others besides those interested in economic and vital statistics,

and some of the more difficult parts of the subject have been

treated in greater detail than was possible in a sessional course

of some thirty lectures. For the rest, the chapters follow closely

the arrangement of the course, the three parts into which the

volume is divided corresponding approximately to the work of

the three terms. To enable the student to proceed further with

the subject, fairly detailed lists of references to the original

memoirs have been given at the end of each chapter: exercises

have also been added for the benefit, more especially, of the

student who is working without the assistance of a teacher.

The volume represents an attempt to work out a systematic

introductory course on statistical methods—the methods available

for discussing, as distinct from collecting, statistical data—-suited

to those who possess only a limited knowledge of mathematics:

an acquaintance with algebra up to the binomial theorem,

together with such elements of co-ordinate geometry as are now

generally included therewith, is all that is assumed. I hope that

it may prove of some service to the students of the diverse

sciences in which statistical methods are now employed.

My most grateful thanks are due to Mr R. H. Hooker not only

for reading the greater part of the manuscript, and the proofs,

and for making many criticisms and suggestions which have

been of the greatest service, but also for much friendly help and

encouragement without which the preparation of the volume,

often delayed and interrupted by the pressure of other work,

might never have been completed: my debt to Mr Hooker is

indeed greater than can well be expressed in a formal preface.

My thanks are also due to Mr H. D. Vigor for some assistance

in checking the arithmetic, and my acknowledgments to Professor

Edgeworth for the example used in § 5 of Chap. XVII. to illustrate

the influence of the form of the frequency distribution on the

probable error of the median.

I can hardly hope that all errors in the text or in the mass

of arithmetic involved in examples and exercises have been

eliminated, and will feel indebted to any reader who directs

my attention to any such mistakes, or to any omissions, am-

biguities, or obscurities.

G. U. Y.

December 1910.



1-3. The introduction of the terms "statistics," "statistical," into

the English language—4-6. The change in meaning of these

terms during the nineteenth century—7-9. The present use

of the terms—10. Definitions of "statistics," "statistical

methods," "theory of statistics," in accordance with present

usage 1-8




1-2. Statistics of attributes and statistics of variables: fundamental

character of the former—3-5. Classification by dichotomy—

6-7. Notation for single attributes and for combinations—

8. The class-frequency—9. Positive and negative attributes,

contraries—10. The order of a class—11. The aggregate—

12. The arrangement of classes by order and aggregate—

13-14. Sufficiency of the tabulation of the ultimate class-

frequencies—15-17. Or, better, of the positive class-fre-

quencies—18. The class-frequencies chosen in the census

for tabulation of statistics of infirmities—19. Inclusive and

exclusive notations and terminologies 7-16



1-3. The field of observation or universe, and its specification by

symbols—4. Derivation of complex from simple relations by

specifying the universe — 5-6. Consistence — 7-10. Con-

ditions of consistence for one and for two attributes—

11-14. Conditions of consistence for three attributes . . 17-24




1-4. The criterion of independence — 5-10. The conception of

association, and testing for the same by the comparison

of percentages—11-12. Numerical equality of the differences

between the four second-order frequencies and their in-

dependence values—13. The coefficient of association—

14. Necessity for an investigation into the causation of an

attribute A being extended to include non-A's . . . 25-41



1-2. Uncertainty in interpretation of an observed association—3-5.

Source of the ambiguity: partial associations—6-8. Illusory

association due to the association of each of two attributes

with a third—9. Estimation of the partial associations from

the frequencies of the second order—10-12. The total

number of associations for a given number of attributes—

13-14. The case of complete independence . . . . 42-59



1. The general principle of a manifold classification—2-4. The

table of double entry or contingency table and its treatment

by fundamental methods—5-8. The coefficient of contingency—

9-10. Analysis of a contingency table by tetrads

—11-13. Isotropic and anisotropic distributions—14-15.

Homogeneity of the classifications dealt with in the pre-

ceding chapters: heterogeneous classifications . . . 60-74



1. Introductory—2. Necessity for classification of observations : the

frequency-distribution—3. Illustrations—4. Method of form-

ing the table—5. Magnitude of class-intervals—6. Position

of intervals—7. Process of classification—8. Treatment of

intermediate observations—9. Tabulation—10. Tables with

unequal intervals —11. Graphical representation of the

frequency-distribution—12. Ideal frequency-distributions—

13. The symmetrical distribution—14. The moderately

asymmetrical distribution—15. The extremely asymmetri-

cal or J-shaped distribution—16. The U-shaped distribution 75-105





1. Necessity for quantitative definition of the characters of a

frequency-distribution—2. Measures of position (averages)

and of dispersion—3. The dimensions of an average the

same as those of the variable—4. Desirable properties for

an average to possess—5. The commoner forms of average—

6-13. The arithmetic mean: its definition, calculation, and

simpler properties—14-18. The median: its definition,

calculation, and simpler properties—19-20. The mode: its

definition and relation to mean and median—21. Summary

comparison of the preceding forms of average—22-26. The

geometric mean: its definition, simpler properties, and the

cases in which it is specially applicable—27. The harmonic

mean: its definition and calculation 106-132



1. Inadequacy of the range as a measure of dispersion—

2-13. The standard deviation: its definition, calculation,

and properties—14-19. The mean deviation: its definition,

calculation, and properties—20-24. The quartile deviation

or semi-interquartile range—25. Measures of relative dis-

persion—26. Measures of asymmetry or skewness—27-30.

The method of grades or percentiles 133-156



1-3. The correlation table and its formation—4-5. The correlation

surface—6-7. The general problem—8-9. The line of means

of rows and the line of means of columns: their relative

positions in the case of independence and of varying degrees

of correlation—10-14. The correlation-coefficient and the

regressions —15-16. Numerical calculations. — 17. Certain

points to be remembered in calculating and using the

coefficient 157-190




1. Necessity for careful choice of variables before proceeding to

calculate r—2-8. Illustration i.: Causation of pauperism—

9-10. Illustration ii.: Inheritance of fertility—11-13.



Illustration iii. : The weather and the crops—14. Corre-

lation between the movements of two variables: (a)

Non-periodic movements: Illustration iv.: changes in

infantile and general mortality—15-17. (b) Quasi-periodic

movements: Illustration v.: the marriage-rate and foreign

trade—18. Elementary methods of dealing with cases of

non-linear regression—19. Certain rough methods of approxi-

mating to the correlation-coefficient 191-206




1. Introductory—2. Standard-deviation of a sum or difference—

3. Influence of grouping of observations on the standard-

deviation—4-5. Influence of errors of observation on the

standard-deviation—6-7. Influence of errors of observation

on the correlation-coefficient (Spearman's theorems) — 8.

Mean and standard-deviation of an index—9. Correlation

between indices—10. Correlation-coefficient for a two x two-

fold table—11. Correlation-coefficient for all possible pairs of

N values of a variable—12. Correlation due to heterogeneity

of material—13. Reduction of correlation due to mingling

of uncorrelated with correlated material —14-17. The

weighted mean—18-19. Application of weighting to the

correction of death-rates, etc., for varying sex and age-

distributions—20. The weighting of forms of average other

than the arithmetic mean 207-224



1-2. Introductory explanation—3. Direct deduction of the formulae

for two variables — 4. Special notation for the general

case: generalised regressions—5. Generalised correlations—

6. Generalised deviations and standard - deviations —

7-8. Theorems concerning the generalised product-sums —

9. Direct interpretation of the generalised regressions —

10-11. Reduction of the generalised standard-deviation—

12. Reduction of the generalised regression—13. Reduction

of the generalised correlation-coefficient—14. Arithmetical

work: Example i. ; Example ii.—15. Geometrical repre-

sentation of correlation between three variables by means of

a model—16. The coefficient of n-fold correlation—17. Ex-

pression of regressions and correlations of lower in terms of

those of higher order—18. Limiting inequalities between

the values of correlation-coefficients necessary for consist-

ence—19. Fallacies 225-249






1. The problem of the present Part—2. The two chief divisions of

the theory of sampling—3. Limitation of the discussion to

the case of simple sampling—4. Definition of the chance of

success or failure of a given event—5. Determination of the

mean and standard-deviation of the number of successes in

n events—6. The same for the proportion of successes in n

events: the standard-deviation of simple sampling as a

measure of unreliability or its reciprocal as a measure of

precision—7. Verification of the theoretical results by ex-

periment—8. More detailed discussion of the assumptions

on which the formula for the standard-deviation of simple

sampling is based—9-10. Biological cases to which the

theory is directly applicable—11. Standard-deviation of

simple sampling when the numbers of observations in the

samples vary—12. Approximate value of the standard-

deviation of simple sampling, and relation between mean

and standard-deviation, when the chance of success or

failure is very small—13. Use of the standard-deviation of

simple sampling, or standard error, for checking and con-

trolling the interpretation of statistical results .





1. Warning as to the assumption that three times the standard

error gives the range for the majority of fluctuations of

simple sampling of either sign—2. Warning as to the use

of the observed for the true value of p in the formula for

the standard error—3. The inverse standard error, or

standard error of the true proportion for a given observed

proportion: equivalence of the direct and inverse standard

errors when n is large—4-8. The importance of errors

other than fluctuations of "simple sampling" in practice:

unrepresentative or biassed samples—9-10. Effect of diver-

gences from the conditions of simple sampling: (a) effect

of variation in p and q for the several universes from which

the samples are drawn—11-12. (b) Effect of variation in

p and q from one sub-class to another within each universe—

13-14. (c) Effect of a correlation between the results of the

several events—15. Summary






1-2. Determination of the frequency-distribution for the number

of successes in n events: the binomial distribution—3.

Dependence of the form of the distribution on p, q, and n—

4-5. Graphical and mechanical methods of forming re-

presentations of the binomial distribution—6. Direct

calculation of the mean and the standard-deviation from

the distribution—7-8. Necessity of deducing, for use in

many practical cases, a continuous curve giving approximately,

for large values of n, the terms of the binomial

series—9. Deduction of the normal curve as a limit to the

symmetrical binomial—10-11. The value of the central

ordinate—12. Comparison with a binomial distribution for

a moderate value of n—13. Outline of the more general

conditions from which the curve can be deduced by advanced

methods—14. Fitting the curve to an actual series of

observations—15. Difficulty of a complete test of fit by

elementary methods—16. The table of areas of the normal

curve and its use—17. The quartile deviation and the

"probable error"—18. Illustrations of the application of

the normal curve and of the table of areas .... 287-312



1-3. Deduction of the general expression for the normal correlation

surface from the case of independence—4. Constancy of the

standard-deviations of parallel arrays and linearity of the

regression—5. The contour lines: a series of concentric and

similar ellipses—6. The normal surface for two correlated

variables regarded as a normal surface for uncorrelated vari-

ables rotated with respect to the axes of measurement:

arrays taken at any angle across the surface are normal

distributions with constant standard-deviation: distribution

of and correlation between linear functions of two normally

correlated variables are normal: principal axes—7. Standard-

deviations round the principal axes—8-11. Investigation of

Table III., Chapter IX., to test normality: linearity of

regression, constancy of standard-deviation of arrays,

normality of distribution obtained by diagonal addition,

contour lines—12-13. Isotropy of the normal distribution

for two variables—14. Outline of the principal properties of

the normal distribution for n variables .... 313-330






1-2. The problem of sampling for variables: the conditions

assumed—3. Standard error of a percentile—4. Special

values for the percentiles of a normal distribution—5.

Effect of the form of the distribution generally—6. Simplified

formula for the case of a grouped frequency-distribution—7.

Correlation between errors in two percentiles of the same

distribution—8. Standard error of the interquartile range

for the normal curve—9. Effect of removing the restrictions

of simple sampling, and limitations of interpretation—10.

Standard error of the arithmetic mean—11. Relative sta-

bility of mean and median in sampling—12. Standard error

of the difference between two means—13. The tendency to

normality of a distribution of means—14. Effect of removing

the restrictions of simple sampling—15. Statement of the

standard errors of standard-deviation, coefficient of variation,

correlation-coefficient, and regression—16. Restatement of

the limitations of interpretation if the sample be small . 331-351

Appendix I.—Tables for facilitating Statistical Work . . . 352-354

Appendix II.—Short List of Works on the Mathematical Theory

of Statistics, and the Theory of Probability . . . 355-356

Answers to, and Hints on the Solution of, the Exercises

given 357-364

Index 365-376


1-3. The introduction of the terms "statistics," "statistical," into the English

language—4-6. The change in meaning of these terms during the

nineteenth century—7-9. The present use of the terms—10. Defini-

tions of "statistics," "statistical methods," "theory of statistics," in

accordance with present usage.

1. The words "statist," "statistics," "statistical," appear to be

all derived, more or less indirectly, from the Latin status, in the

sense that it acquired in mediaeval Latin of a political state.

2. The first term is, however, of much earlier date than the two

others. The word "statist" occurs, for instance, in Hamlet

(1602),1 Cymbeline (1610 or 1611),2 and in Paradise Regained

(1671).3 "Statistics" and "statistical" seem to have been only

introduced into English in 1787, the earliest known uses of the

terms occurring in the preface to A Political Survey of the Present

State of Europe, by E. A. W. Zimmermann,4 issued in that year.

"It is about forty years ago," says Zimmermann, "that that branch

of political knowledge, which has for its object the actual and

relative power of the several modern states, the power arising

from their natural advantages, the industry and civilisation of

their inhabitants, and the wisdom of their governments, has been

formed, chiefly by German writers, into a separate science. . . .

By the more convenient form it has now received .... this

science, distinguished by the new-coined name of statistics, is

become a favourite study in Germany" (p. ii); and again (p. v),

"To the several articles contained in this work, some respectable

1 Act v., sc. 2. 2 Act ii., sc. 4. 3 Bk. iv.

4 Zimmermann's work appears to have been written in English, though he

was a German, Professor of Natural Philosophy at Brunswick.

statistical writers have added a view of the principal epochas of the

history of each country."

3. Within the next few years the words were adopted by several

writers, notably by Sir John Sinclair, the editor and organiser of the

first Statistical Account of Scotland,1 to whom, indeed, their intro-

duction has been frequently ascribed. In the circular letter to the

Clergy of the Church of Scotland issued in May 1790,2 he states

that in Germany "' Statistical Inquiries,' as they are called, have

been carried to a very great extent," and adds an explanatory

footnote to the phrase "Statistical Inquiries"—"or inquiries

respecting the population, the political circumstances, the pro-

ductions of a country, and other matters of state." In the

"History of the Origin and Progress" 3 of the work, he tells us,

"Many people were at first surprised at my using the new words,

Statistics and Statistical, as it was supposed that some term in our

own language might have expressed the same meaning. But in

the course of a very extensive tour, through the northern parts of

Europe, which I happened to take in 1786, I found that in

Germany they were engaged in a species of political enquiry,

to which they had given the name of Statistics ;4 .... as I

thought that a new word might attract more public attention,

I resolved on adopting it, and I hope that it is now completely

naturalised and incorporated with our language." This hope

was certainly justified, but the meaning of the word underwent

rapid development during the half century or so following its

introduction.


4. "Statistics" (statistik), as the term is used by German

writers of the eighteenth century, by Zimmermann and by Sir

John Sinclair, meant simply the exposition of the noteworthy

characteristics of a state, the mode of exposition being—almost

inevitably at that time—preponderantly verbal. The conciseness

and definite character of numerical data were recognised at a

comparatively early period—more particularly by English writers

—but trustworthy figures were scarce. After the commencement

of the nineteenth century, however, the growth of official data

was continuous, and numerical statements, accordingly, began

more and more to displace the verbal descriptions of earlier days.

"Statistics" thus insensibly acquired a narrower signification, viz.,

1 Twenty-one vols., 1791-99.

2 Statistical Account, vol. xx., Appendix to "The History of the Origin and

Progress . . . ." given at the end of the volume.

3 Loc. cit., p. xiii.

4 The Abriss der Staatswissenschaft der Europäischen Reiche (1749) of Gottfried

Achenwall, Professor of Politics at Göttingen, is the volume in which the word

"statistik" appears to be first employed, but the adjective "statisticus"

occurs at a somewhat earlier date in works written in Latin.


the exposition of the characteristics of a State by numerical

methods. It is difficult to say at what epoch the word came

definitely to bear this quantitative meaning, but the transition

appears to have been only half accomplished even after the founda-

tion of the Royal Statistical Society in 1835. The articles in the

first volume of the Journal, issued in 1838-9, are for the most

part of a numerical character, but the official definition has no

reference to method. "Statistics," we read, "may be said, in the

words of the prospectus of this Society, to be the ascertain-

ing and bringing together of those facts which are calculated to

illustrate the condition and prospects of society." 1 It is, however,

admitted that "the statist commonly prefers to employ figures

and tabular exhibitions."

5. Once, however, the first change of meaning was accomplished,

further changes followed. From the name of a science or art of

state-description by numerical methods, the word was transferred to

those series of figures with which it operated, as we speak of vital

statistics, poor-law statistics, and so forth. But similar data

occur in many connections; in meteorology, for instance, in anthro-

pology, etc. Such collections of numerical data were also termed

"statistics," and consequently, at the present day, the word is

held to cover a collection of numerical data, analogous to those

which were originally formed for the study of the state, on almost

any subject whatever. We not only read of rainfall "statistics,"

but of "statistics" showing the growth of an organisation for

recording rainfall.2 We find a chapter headed "Statistics" in a

book on psychology,3 and the author, writing of "statistics con-

cerning the mental characteristics of man," "statistics of children,

under the headings bright—average—dull."4 We are informed

that, in a book on Latin verse, the characteristics of the Virgilian

hexameter "are examined carefully with statistics." 5

6. The development in meaning of the adjective "statistical"

was naturally similar. The methods applied to the study of

numerical data concerning the state were still termed "statistical

methods," even when applied to data from other sources. Thus

we read of the inheritance of genius being treated "in a statistical

manner,"6 and we have now "a journal for the statistical

study of biological problems."7 Such phrases as "the statistical

1 Jour. Stat. Soc., vol. i. p. 1.

2 Symons' British Rainfall for 1899, p. 15.

3 E. W. Scripture, The New Psychology, 1897, chap. ii.

4 Op. cit. p. 18.

5 Athenæum, Oct. 3, 1903.

6 Francis Galton, Hereditary Genius (Macmillan, 1869), preface.

7 Biometrika, Cambridge Univ. Press, the first number issued in 1901.


investigation of the motion of molecules"1 have become part of

the ordinary language of physicists. We find a work entitled

"the principles of statistical mechanics,"2 and the Bakerian

lecture for 1909, by Sir J. Larmor, was on "the statistical and

thermodynamical relations of radiant energy."

7. It is unnecessary to multiply such instances to show that the

words "statistics," "statistical," no longer bear any necessary

reference to "matters of state." They are applied indifferently in

physics, biology, anthropology, and meteorology, as well as in the

social sciences. Diverse though these cases are, there must be

some community of character between them, or the same terms

and the same methods would not be applied. What, then, is this

common character?

8. Let us turn to social science, as the parent of the methods

termed "statistical," for a moment, and consider its characteristics

as compared, say, with physics or chemistry. One characteristic

stands out so markedly that attention has been repeatedly

directed to it by "statistical" writers as the source of the peculiar

difficulties of their science—the observer of social facts cannot ex-

periment, but must deal with circumstances as they occur, apart

from his control. Now the object of experiment is to replace the

complex systems of causation usually occurring in nature by

simple systems in which only one causal circumstance is permitted

to vary at a time. This simplification being impossible, the

observer has, in general, to deal with highly complicated cases of

multiple causation—cases in which a given result may be due to

any one of a number of alternative causes or to a number of

different causes acting conjointly.

9. A little consideration will show, however, that this is also

precisely the characteristic of the observations in other fields to

which statistical methods are applied. The meteorologist, for

example, is in almost precisely the same position as the student

of social science. He can experiment on minor points, but the

records of the barometer, thermometer, and rain gauge have to be

treated as they stand. With the biologist, matters are in some-

what better case. He can and does apply experimental methods

to a very large extent, but frequently cannot approximate

closely to the experimental ideal; the internal circumstances of

animals and plants too easily evade complete control. Hence a

large field (notably the study of variation and heredity) is left,

in which statistical methods have either to aid or to replace the

methods of experiment. The physicist and chemist, finally,

1 Clerk Maxwell, "Theory of Heat" (1871), and "On Boltzmann's

Theorem" (1878), Camb. Phil. Trans., vol. xii.

2 By J. Willard Gibbs (Macmillan, 1902).


stand at the other extremity of the scale. Theirs are the

sciences in which experiment has been brought to its greatest

perfection. But even so, statistical methods still find application.

In the first place, the methods available for eliminating the effect

of disturbing circumstances, though continually improved, are not,

and cannot be, absolutely perfect. The observer himself, as well

as the observing instrument, is a source of error; the effects of

changes of temperature, or of moisture, of pressure, draughts, vibra-

tion, cannot be completely eliminated. Further, in the problems

of molecular physics, referred to in the last sentences of § 6,

multiplicity of causes is of the essence of the case. The motion

of an atom or of a molecule in the middle of a swarm is dependent

on that of every other atom or molecule in the swarm.

10. In the light of this discussion, we may accordingly give the

following definitions:—

By statistics we mean quantitative data affected to a marked

extent by a multiplicity of causes.

By statistical methods we mean methods specially adapted to

the elucidation of quantitative data affected by a multiplicity of

causes.

By theory of statistics we mean the exposition of statistical

methods.


The insertion in the first definition of some such words as "to

a marked extent" is necessary, since the term "statistics" is not

usually applied to data, like those of the physicist, which are

affected only by a relatively small residuum of disturbing causes.

At the same time, "statistical methods" are applicable to all such

cases, whether the influence of many causes be large or not.


The History of the Words "Statistics," "Statistical."

(1) John, V., Der Name Statistik; Weiss, Berne, 1883. A translation in

Jour. Roy. Stat. Soc. for same year.

(2) Yule, G. U., "The Introduction of the Words 'Statistics,' 'Statistical,'

into the English Language," Jour. Roy. Stat. Soc., vol. lxviii., 1905,

p. 391.

The History of Statistics in General.

(3) John, V., Geschichte der Statistik, 1te Teil, bis auf Quetelet; Enke,

Stuttgart, 1884. (All published; the author died in 1900. By far the

best history of statistics down to the early years of the nineteenth

century.)


(4) Mohl, Robert von, Geschichte und Litteratur der Staatswissenschaften,

3 vols.; Enke, Erlangen, 1855-58. (For history of statistics see

principally latter half of vol. iii.)


(5) Gabaglio, Antonio, Teoria generale della statistica, 2 vols.; Hoepli,

Milano, 2nd edn., 1888. (Vol. i., Parte storica.)

Several works on theory of statistics include short histories, e.g.

H. Westergaard's Die Grundzüge der Theorie der Statistik (Fischer,

Jena, 1890), and P. A. Meitzen's Geschichte, Theorie und Technik der

Statistik (new edn., 1903; American translation by R. P. Falkner,

1891). There is no detailed history in English, but the article

"Statistics" in the Encyclopaedia Britannica (9th edn.) gives a sketch,

and the biographical articles in Palgrave's Dictionary of Political

Economy are useful. For its importance as regards the English school

of political arithmetic, reference may also be made to—

(6) Hull, C. H., The Economic Writings of Sir William Petty, together

with the Observations on the Bills of Mortality more probably by Captain

John Graunt, Cambridge University Press, 2 vols., 1899.

History of Theory of Statistics.

Somewhat slight information is given in the general works cited.

From the purely mathematical side the following is important:—

(7) Todhunter, I., A History of the Mathematical Theory of Probability

from the time of Pascal to that of Laplace; Macmillan, 1865.



1-2. Statistics of attributes and statistics of variables : fundamental character

of the former—3-5. Classification by dichotomy—6-7. Notation for

single attributes and for combinations—8. The class-frequency—9.

Positive and negative attributes, contraries—10. The order of a class—

11. The aggregate—12. The arrangement of classes by order and

aggregate—13-14. Sufficiency of the tabulation of the ultimate class-

frequencies—15-17. Or, better, of the positive class-frequencies—18.

The class-frequencies chosen in the census for tabulation of statistics

of infirmities—19. Inclusive and exclusive notations and terminologies.

1. The methods of statistics, as defined in the Introduction,

deal with quantitative data alone. The quantitative character

may, however, arise in two different ways.

In the first place, the observer may note only the presence or

absence of some attribute in a series of objects or individuals, and

count how many do or do not possess it. Thus, in a given

population, we may count the number of the blind and seeing,

the dumb and speaking, or the insane and sane. The quantitative

character, in such cases, arises solely in the counting.

In the second place, the observer may note or measure the

actual magnitude of some variable character for each of the

objects or individuals observed. He may record, for instance, the

ages of persons at death, the prices of different samples of a

commodity, the statures of men, the numbers of petals in flowers.

The observations in these cases are quantitative ab initio.

2. The methods applicable to the former kind of observations,

which may be termed statistics of attributes, are also applicable

to the latter, or statistics of variables. A record of statures of

men, for example, may be treated by simply counting all measure-

ments as tall that exceed a certain limit, neglecting the magnitude

of excess or defect, and stating the numbers of tall and short (or


more strictly not-tall) on the basis of this classification. Similarly,

the methods that are specially adapted to the treatment of

statistics of variables, making use of each value recorded, are

available to a greater extent than might at first sight seem possible

for dealing with statistics of attributes. For example, we may

treat the presence or absence of the attribute as corresponding to

the changes of a variable which can only possess two values, say

0 and 1. Or, we may assume that we have really to do with a

variable character which has been crudely classified, as suggested

above, and we may be able, by auxiliary hypotheses as to the

nature of this variable, to draw further conclusions. But the

methods and principles developed for the case in which the observer

only notes the presence or absence of attributes are the simplest

and most fundamental, and are best considered first. This and

the next three chapters (Chapters I.-IV.) are accordingly devoted

to the Theory of Attributes.
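
The two-way correspondence described in § 2 may be sketched in a few lines of Python. The statures and the limit of 68 inches used below are illustrative assumptions, not figures from the text. The sketch first reduces a record of a variable to a simple count of tall and not-tall, and then treats the attribute as a variable that can take only the values 0 and 1.

```python
# Hypothetical statures in inches; the values and the 68-inch limit are
# illustrative assumptions, not data from the text.
statures = [64.5, 66.0, 67.2, 68.1, 69.4, 70.0, 71.3, 65.8]
LIMIT = 68.0

# Variable -> attribute: code "tall" as 1 when the limit is exceeded, else 0,
# neglecting the magnitude of excess or defect.
tall = [1 if s > LIMIT else 0 for s in statures]

n_tall = sum(tall)                # number possessing the attribute
n_not_tall = len(tall) - n_tall   # number not possessing it

# Attribute -> variable: the mean of the 0/1 values is the proportion tall.
proportion_tall = n_tall / len(tall)

print(n_tall, n_not_tall, proportion_tall)  # 4 4 0.5
```

The mean of the 0 and 1 values is simply the proportion of individuals possessing the attribute, which is one reason methods designed for variables can be carried over to attributes.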

3. The objects or individuals that possess the attribute, and

those that do not possess it, may be said to be members of two

distinct classes, the observer classifying the objects or individuals

observed. In the simplest case, where attention is paid to one

attribute alone, only two mutually exclusive classes are formed.

If several attributes are noted, the process of classification may,

however, be continued indefinitely. Those that do and do not

possess the first attribute may be reclassified according as they do

or do not possess the second, the members of each of the sub-

classes so formed according as they do or do not possess the

third, and so on, every class being divided into two at each step.

Thus the members of the population of any district may be

classified into males and females; the members of each sex into

sane and insane; the insane males, sane males, insane females,

and sane females into blind and seeing. If we were dealing with

a number of peas (Pisum sativum) of different varieties, they

might be classified as tall or dwarf, with green seeds or yellow

seeds, with wrinkled seeds or round seeds, so that we would have

eight classes—tall with round green seeds, tall with round yellow

seeds, tall with wrinkled green seeds, tall with wrinkled yellow

seeds, and four similar classes of dwarf plants.
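
The repeated division into two described in § 3 can be illustrated by a short Python sketch. The wording of the classes follows the pea example above; the variable names are assumptions introduced only for this illustration.

```python
from itertools import product

# Each attribute noted is a dichotomy: (attribute present, attribute absent).
dichotomies = [("tall", "dwarf"), ("round", "wrinkled"), ("green", "yellow")]

# Every class is obtained by choosing one side of each dichotomy in turn,
# so the number of classes doubles with each attribute noted.
classes = [" ".join(choice) for choice in product(*dichotomies)]

for c in classes:
    print(c)
print(len(classes))  # 8
```

With three attributes the sketch lists the eight classes of the text, four of tall plants and four of dwarf plants.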

4. It may be noticed that the fact of classification does not

necessarily imply the existence of either a natural or a clearly

defined boundary between the two classes. The boundary may

be wholly arbitrary, e.g. where prices are classified as above or

below some special value, barometer readings as above or below

some particular height. The division may also be vague and

uncertain: sanity and insanity, sight and blindness, pass

into each other by such fine gradations that judgments may


differ as to the class in which a given individual should be

entered. The possibility of uncertainties of this kind should

always be borne in mind in considering statistics of attributes:

whatever the nature of the classification, however, natural or

artificial, definite or uncertain, the final judgment must be de-

cisive; any one object or individual must be held either to possess

the given attribute or not.

5. A classification of the simple kind considered, in which each

class is divided into two sub-classes and no more, has been termed

by logicians classification, or, to use the more strictly applicable

term, division by dichotomy (cutting in two). The classifica-

tions of most statistics are not dichotomous, for most usually a

class is divided into more than two sub-classes, but dichotomy is

the fundamental case. In Chapter V. the relation of dichotomy

to more elaborate (manifold, instead of twofold or dichotomous)

processes of classification, and the methods applicable to some

such cases, are dealt with briefly.

6. For theoretical purposes it is necessary to have some simple

notation for the classes formed, and for the numbers of observa-

tions assigned to each.

The capitals A, B, C, . . . will be used to denote the several

attributes. An object or individual possessing the attribute A

will be termed simply A. The class, all the members of which

possess the attribute A, will be termed the class A. It is convenient

to use single symbols also to denote the absence of the

attributes A, B, C, . . . We shall employ the Greek letters, α,

β, γ, . . . Thus if A represents the attribute blindness, α

represents sight, i.e. non-blindness; if B stands for deafness, β

stands for hearing. Generally "α" is equivalent to "non-A," or

an object or individual not possessing the attribute A; the class α

is equivalent to the class none of the members of which possess the

attribute A.

7. Combinations of attributes will be represented by juxta-

positions of letters. Thus if, as above, A represents blindness, B

deafness, AB represents the combination blindness and deafness.

If the presence and absence of these attributes be noted, the four

classes so formed, viz. AB, Aβ, αB, αβ, include respectively the

blind and deaf, the blind but not-deaf, the deaf but not-blind, and

the neither blind nor deaf. If a third attribute be noted, e.g. insanity,

denoted say by C, the class ABC includes those who are

at once deaf, blind, and insane, ABγ those who are deaf and blind

but not insane, and so on.

Any letter or combination of letters like A, AB, αB, ABγ, by

means of which we specify the characters of the members of a class,

may be termed a class symbol.


8. The number of observations assigned to any class is termed,

for brevity, the frequency of the class, or the class-frequency.

Class-frequencies will be denoted by enclosing the corresponding

class-symbols in brackets. Thus—

(A)   denotes number of A's,   i.e. objects possessing attribute A
(α)   denotes number of α's,   i.e. objects not possessing attribute A
(AB)  denotes number of AB's,  i.e. objects possessing attributes A and B
(αB)  denotes number of αB's,  i.e. objects possessing attribute B but not A
(ABC) denotes number of ABC's, i.e. objects possessing attributes A, B, and C
(αBC) denotes number of αBC's, i.e. objects possessing attributes B and C but not A
(αβC) denotes number of αβC's, i.e. objects possessing attribute C but neither A nor B

and so on for any number of attributes. If A represent, as in

the illustration above, blindness, B deafness, C insanity, the

symbols given stand for the numbers of the blind, the not-blind,

the blind and deaf, the deaf but not blind, the blind, deaf, and in-

sane, the deaf and insane but not blind, and the insane but neither

blind nor deaf, respectively.
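
As a minimal sketch of how such class-frequencies arise from the counting, the following Python fragment tallies a handful of records. The six records are invented for illustration, and the function name `frequency` is an assumption of this sketch; attributes not named in a call are left unspecified, so the counts are inclusive in the sense of the notation.

```python
# Hypothetical records: each individual is noted for A (blindness),
# B (deafness) and C (insanity). True = attribute present.
records = [
    {"A": True,  "B": True,  "C": False},
    {"A": True,  "B": False, "C": False},
    {"A": False, "B": True,  "C": True},
    {"A": False, "B": False, "C": True},
    {"A": False, "B": False, "C": False},
    {"A": True,  "B": True,  "C": True},
]

def frequency(records, **spec):
    """Count the records agreeing with every attribute named in spec.

    frequency(records, A=True, B=False) corresponds to (Aβ);
    with no attributes named it returns the whole number N.
    """
    return sum(all(r[k] == v for k, v in spec.items()) for r in records)

print(frequency(records))                           # N     -> 6
print(frequency(records, A=True))                   # (A)   -> 3
print(frequency(records, A=False))                  # (α)   -> 3
print(frequency(records, A=True, B=True))           # (AB)  -> 2
print(frequency(records, A=False, B=True))          # (αB)  -> 1
print(frequency(records, A=False, B=False, C=True)) # (αβC) -> 1
```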

9. The attributes denoted by capitals A, B, C, . . . may be

termed positive attributes, and their contraries denoted by Greek

letters negative attributes. If a class-symbol include only

capital letters, the class may be termed a positive class; if only

Greek letters, a negative class. Thus the classes A, AB, ABC

are positive classes; the classes α, αβ, αβγ, negative classes.

If two classes are such that every attribute in the symbol for

the one is the negative or contrary of the corresponding attribute

in the symbol for the other, they may be termed contrary classes,

and their frequencies contrary frequencies; e.g. AB and αβ, Aβ

and αB, AβC and αBγ, are pairs of contraries.

10. The classes obtained by noting say n attributes fall into

natural groups according to the numbers of attributes used to

specify the respective classes, and these natural groups should be

borne in mind in tabulating the class-frequencies. A class

specified by r attributes may be spoken of as a class of the rth

order and its frequency as a frequency of the rth order. Thus AB,

AC, BC are classes of the second order; (A), (Aβ), (αBC),

(ABγD), class-frequencies of the first, second, third, and fourth

orders respectively.
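
The terms of §§ 9 and 10 can be made concrete with a small Python sketch. It assumes, purely for convenience of typing, that a class-symbol is written as a string in which a capital letter marks an attribute present and the corresponding small Latin letter stands in for the Greek letter (so "AbC" stands for AβC). The function names are hypothetical.

```python
def order(symbol):
    """Order of a class: the number of attributes used to specify it."""
    return len(symbol)

def is_positive(symbol):
    """True if the symbol contains capital letters only, e.g. 'AB'."""
    return symbol.isupper()

def is_negative(symbol):
    """True if the symbol contains small (Greek stand-in) letters only, e.g. 'ab'."""
    return symbol.islower()

def contrary(symbol):
    """Contrary class: every attribute replaced by its negative, and vice versa."""
    return symbol.swapcase()

print(order("A"), order("Ab"), order("aBC"), order("ABcD"))  # 1 2 3 4
print(contrary("AB"), contrary("Ab"), contrary("AbC"))       # ab aB aBc
print(is_positive("ABC"), is_negative("abc"), is_positive("Ab"))  # True True False
```

A symbol such as "Ab" is neither positive nor negative in this sense, which matches the definitions above. The frequency of order zero, N, has no letters and is not handled by these helpers.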

11. The classes of one and the same order fall into further

groups according to the actual attributes specified. Thus if three

attributes A, B, C have been noted, the classes of the second order

may be specified by any one of the pairs of attributes AB, AC, or

BC (and their contraries). The series of classes or class-frequen-

cies the symbols for which are derived from any one positive

class by substituting Greek letters for one or more of the italic

capital letters in every possible way will be termed an aggregate.

Thus (AB) (Aβ) (αB) (αβ) form an aggregate of frequencies of



the second order, and the twelve classes of the second order which

can be formed where three attributes have been noted may be

grouped into three such aggregates.
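
The grouping into aggregates may be checked mechanically. The Python sketch below again assumes small Latin letters as stand-ins for the Greek letters (a for α, b for β, c for γ); the function name `aggregate` is an assumption of the sketch.

```python
from itertools import combinations, product

attributes = "ABC"  # three attributes noted

def aggregate(positive_class):
    """All class-symbols derived from a positive class by replacing its
    capitals with the corresponding small letters in every possible way
    (the positive class itself is included)."""
    return ["".join(s) for s in product(*[(x, x.lower()) for x in positive_class])]

# One aggregate for each pair of attributes: AB, AC, BC.
second_order = {"".join(pair): aggregate(pair) for pair in combinations(attributes, 2)}

for name, members in second_order.items():
    print(name, members)
# AB ['AB', 'Ab', 'aB', 'ab']
# AC ['AC', 'Ac', 'aC', 'ac']
# BC ['BC', 'Bc', 'bC', 'bc']

print(sum(len(m) for m in second_order.values()))  # 12
```

The twelve classes of the second order fall into three aggregates of four classes each, as stated in the text.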

12. Class-frequencies should, in tabulating, be arranged so that

frequencies of the same order and frequencies belonging to the

same aggregate are kept together. Thus the frequencies for the

case of three attributes should be grouped as given below; the

whole number of observations denoted by the letter N being

reckoned as a frequency of order zero, since no attributes are

specified:

Order 0.   N
Order 1.   (A)    (B)    (C)
           (α)    (β)    (γ)
Order 2.   (AB)   (AC)   (BC)
           (Aβ)   (Aγ)   (Bγ)
           (αB)   (αC)   (βC)
           (αβ)   (αγ)   (βγ)
Order 3.   (ABC)  (αBC)
           (ABγ)  (αBγ)
           (AβC)  (αβC)
           (Aβγ)  (αβγ)                                   (1)

13. In such a complete table for the case of three attributes,

twenty-seven distinct frequencies are given :—1 of order zero, 6

of the first order, 12 of the second, and 8 of the third. It

is, however, in no case necessary to give such a complete

statement.


The whole number of observations must clearly be equal to the

number of A's together with the number of a's, the number of

A's to the number of A's that are B together with the number of

A's that are not B; and so on,—i.e. any class-frequency can always

be expressed in terms of class-frequencies of higher order. Thus—

N = (A) + (α) = (B) + (β) = etc.
  = (AB) + (Aβ) + (αB) + (αβ) = etc.
(A) = (AB) + (Aβ) = (AC) + (Aγ) = etc.
(AB) = (ABC) + (ABγ) = etc.                               (2)


Hence, instead of enumerating all the frequencies as under (1),

no more need be given, for the case of three attributes, than

the eight frequencies of the third order. If four attributes had

been noted it would be sufficient to give the sixteen frequencies of

the fourth order.
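
The counts quoted in § 13 follow a simple rule, sketched below in Python as an aside: a class of order r is fixed by choosing which r of the n attributes specify it and then, for each of the r, whether it is present or absent. The function name is an assumption of this sketch.

```python
from math import comb

def frequencies_by_order(n):
    """Number of distinct class-frequencies of each order 0..n for n attributes."""
    # comb(n, r): which r attributes are specified; 2 ** r: present or absent for each.
    return [comb(n, r) * 2 ** r for r in range(n + 1)]

counts = frequencies_by_order(3)
print(counts, sum(counts))           # [1, 6, 12, 8] 27
print(frequencies_by_order(4)[-1])   # 16
```

For three attributes this reproduces the 1, 6, 12, and 8 frequencies of the complete table, and the last entry in each case is the number of frequencies of the highest order, eight for three attributes and sixteen for four.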

The classes specified by all the attributes noted in any case,

i.e. classes of the nth order in the case of n attributes, may be


termed the ultimate classes and their frequencies the ultimate

frequencies. Hence we may say that it is never necessary to

enumerate more than the ultimate frequencies. All the others can

be obtained from these by simple addition.

Example i.—(See reference 5 at the end of the chapter.)

A number of school children were examined for the presence

or absence of certain defects of which three chief descriptions

were noted, A development defects, B nerve signs, C low

nutrition.


Given the following ultimate frequencies, find the frequencies

of the positive classes, including the whole number of observations N.

(ABC)    57     (αBC)     78
(ABγ)   281     (αBγ)    670
(AβC)    86     (αβC)     65
(Aβγ)   453     (αβγ)   8310

The whole number of observations N is equal to the grand

total: N= 10,000.

The frequency of any first-order class, e.g. (A) is given by the

total of the four third-order frequencies, the class-symbols for

which contain the same letter—

(ABC) + (ABγ) + (AβC) + (Aβγ) = (A) = 877.

Similarly, the frequency of any second-order class, e.g. (AB), is

given by the total of the two third-order frequencies, the class-

symbols for which both contain the same pair of letters—

(ABC) + (ABγ) = (AB) = 338.

The complete results are—

N     10,000     (AB)    338
(A)      877     (AC)    143
(B)    1,086     (BC)    135
(C)      286     (ABC)    57
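
The additions of Example i. can be verified with a short Python sketch. The frequencies are those of the example; the use of small Latin letters for the Greek letters (c for γ, and so on) and the function name `positive_frequency` are assumptions of the sketch.

```python
# Ultimate frequencies of Example i. (small letters stand in for Greek ones).
ultimate = {
    "ABC": 57, "ABc": 281, "AbC": 86, "Abc": 453,
    "aBC": 78, "aBc": 670, "abC": 65, "abc": 8310,
}

def positive_frequency(symbol, ultimate):
    """Add the ultimate frequencies whose symbols contain every capital in `symbol`.

    The empty symbol "" places no restriction and so gives N.
    """
    return sum(f for cls, f in ultimate.items() if all(ch in cls for ch in symbol))

for symbol in ["", "A", "B", "C", "AB", "AC", "BC", "ABC"]:
    print(symbol or "N", positive_frequency(symbol, ultimate))
# N 10000, A 877, B 1086, C 286, AB 338, AC 143, BC 135, ABC 57
```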

14. The number of ultimate frequencies in the general case of

n attributes, or the number of classes in an aggregate of the nth

order, is given by considering that each letter of the class-symbol

may be written in two ways (A or α, B or β, C or γ), and that

either way of writing one letter may be combined with either

way of writing another. Hence the whole number of ways in

which the class-symbol may be written, i.e. the number of

classes, is—

2 × 2 × 2 × 2 . . . . = 2ⁿ.

The ultimate frequencies form one natural set in terms of which

the data are completely given, but any other set containing the

same number of algebraically independent frequencies, viz. 2ⁿ,

may be chosen instead.

15. The positive class-frequencies, including under this head the

total number of observations N, form one such set. They are algebraically

independent; no one positive class-frequency can be expressed

wholly in terms of the others. Their number is, moreover,

2ⁿ, as may be readily seen from the fact that if the Greek letters

are struck out of the symbols for the ultimate classes, they become

the symbols for the positive classes, with the exception of αβγ

.... for which N must be substituted. Otherwise the number

is made up as follows :—

Order 0. (The whole number of observations) . . . 1
Order 1. (The number of attributes noted) . . . n
Order 2. (The number of combinations of n things 2 together) . . . n(n − 1)/(1·2)
Order 3. (The number of combinations of n things 3 together) . . . n(n − 1)(n − 2)/(1·2·3)

and so on. But the series

1 + n + n(n − 1)/(1·2) + n(n − 1)(n − 2)/(1·2·3) + . . . .

is the binomial expansion of (1 + 1)ⁿ or 2ⁿ, therefore the total

number of positive classes is 2ⁿ.
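
The count of positive classes by order can be confirmed numerically. The following Python sketch uses the standard-library function `math.comb` for the number of combinations; the function name `positive_classes_by_order` is an assumption of the sketch.

```python
from math import comb

def positive_classes_by_order(n):
    """Number of positive classes of each order 0..n: combinations of n things r together."""
    return [comb(n, r) for r in range(n + 1)]

for n in range(1, 6):
    counts = positive_classes_by_order(n)
    # The terms of the binomial expansion of (1 + 1)^n add up to 2^n.
    assert sum(counts) == 2 ** n
    print(n, counts, sum(counts))
# e.g. for n = 3: [1, 3, 3, 1] 8
```

For three attributes the terms are 1, 3, 3, 1, corresponding to N, the three frequencies (A), (B), (C), the three frequencies (AB), (AC), (BC), and the single frequency (ABC).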

16. The set of positive class-frequencies is a most convenient

one for both theoretical and practical purposes.

Compare, for instance, the two forms of statement, in terms of

the ultimate and the positive classes respectively, as given in

Example i., § 13. The latter gives directly the whole number of

observations and the totals of A's, B's, and C's. The former gives

none of these fundamentally important figures without the perfor-

mance of more or less lengthy additions. Further, the latter gives

the second-order frequencies (AB), (AC), and (BC), which are neces-

sary for discussing the relations subsisting between A, B, and C, but

are only indirectly given by the frequencies of the ultimate classes.

17. The expression of any class-frequency in terms of the

positive frequencies is most easily obtained by a process of step-

by-step substitution; thus—

(αβ) = (α) − (αB)
     = N − (A) − (B) + (AB)                                          (3)
(αβγ) = (αβ) − (αβC)
      = N − (A) − (B) + (AB) − (αC) + (αBC)
      = N − (A) − (B) − (C) + (AB) + (AC) + (BC) − (ABC)              (4)


Arithmetical work, however, should be executed from first

principles, and not by quoting formulae like the above.

Example ii.—Check the work of Example i., § 13, by finding the

frequencies of the ultimate classes from the frequencies of the

positive classes.

(ABγ) = (AB) − (ABC) = 338 − 57 = 281
(Aβγ) = (Aγ) − (ABγ) = (A) − (AC) − (ABγ)
      = 877 − 143 − 281 = 453
(αβγ) = (βγ) − (Aβγ) = N − (B) − (C) + (BC) − (Aβγ)
      = 10,000 − 1086 − 286 + 135 − 453
      = 10,135 − 1825 = 8310

and so on.
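
The remaining ultimate frequencies can be checked in the same way. The Python sketch below is one possible mechanical form of the step-by-step substitution, not the procedure the text recommends for hand work: each absent attribute is expanded with alternating signs, as in equations (3) and (4). Small Latin letters stand in for the Greek letters, the empty symbol stands for N, and the function name is an assumption of the sketch.

```python
from itertools import combinations

# Positive class-frequencies of Example i.; "" stands for N.
positive = {"": 10000, "A": 877, "B": 1086, "C": 286,
            "AB": 338, "AC": 143, "BC": 135, "ABC": 57}

def ultimate_frequency(symbol, positive):
    """Frequency of a class such as 'ABc' (capital = present, small = absent)."""
    present = [ch for ch in symbol if ch.isupper()]
    absent = [ch.upper() for ch in symbol if ch.islower()]
    total = 0
    # Each subset of r absent attributes contributes a positive frequency
    # with sign (-1)^r, exactly as the terms of (3) and (4) alternate.
    for r in range(len(absent) + 1):
        for subset in combinations(absent, r):
            key = "".join(sorted(present + list(subset)))
            total += (-1) ** r * positive[key]
    return total

for symbol in ["ABC", "ABc", "AbC", "Abc", "aBC", "aBc", "abC", "abc"]:
    print(symbol, ultimate_frequency(symbol, positive))
# ABC 57, ABc 281, AbC 86, Abc 453, aBC 78, aBc 670, abC 65, abc 8310
```

The eight values agree with the ultimate frequencies given in Example i., and their total is 10,000.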

18. Examples of statistics of precisely the kind now under

consideration are afforded by the census returns, e.g., of 1891 or

1901, for England and Wales, of persons suffering from different

"infirmities," any individual who is deaf and dumb, blind or

mentally deranged (lunatic, imbecile, or idiot) being required to

be returned as such on the schedule. The classes chosen for

tabulation are, however, neither the positive nor the ultimate

classes, but the following (neglecting minor distinctions amongst

the mentally deranged and the returns of persons who are deaf

but not dumb):—Dumb, blind, mentally deranged; dumb and

blind but not deranged; dumb and deranged but not blind;

blind and deranged but not dumb; blind, dumb, and deranged.

If, in the symbolic notation, deaf-mutism be denoted by A, blind-

ness by B, and mental derangement by C, the class-frequencies

thus given are (A), (B), (C), (ABγ), (AβC), (αBC), (ABC) (cf.

Census of England and Wales, 1891, vol. iii., tables 15 and 16,

p. lvii. Census of 1901, Summary Tables, table xlix.). This set of

frequencies does not appear to possess any special advantages.

19. The symbols of our notation are, it should be remarked,

used in an inclusive sense, the symbol A, for example, signifying

an object or individual possessing the attribute A with or without

others. This seems to be the only natural use of the symbol,

but at least one notation has been constructed on an exclusive

basis (cf. ref. 5), the symbol A denoting that the object or in-

dividual possesses the attribute A, but not B or C or D, or what-

ever other attributes have been noted. An exclusive notation is

apt to be relatively cumbrous and also ambiguous, for the reader

cannot know what attributes a given symbol excludes until he

has seen the whole list of attributes of which note has been

taken, and this list he must bear in mind. The statement that

the symbol A is used exclusively cannot mean, obviously, that the

object referred to possesses only the attribute A and no others


whatever; it merely excludes the other attributes noted in the

particular investigation. Adjectives, as well as the symbols which

may represent them, are naturally used in an inclusive sense, and

care should therefore be taken, when classes are verbally described,

that the description is complete, and states what, if anything, is

excluded as well as what is included, in the same way as our

notation. The terminology of the English census has not, in

this respect, been quite clear. The "Blind" includes those who

are "Blind and Dumb," or "Blind, Dumb, and Lunatic," and so

forth. But the heading "Blind and Dumb," in the table relating

to "combined infirmities," is used in the sense "Blind and Dumb,

but not Lunatic or Imbecile," etc., and so on for the others. In

the first table the headings are inclusive, in the second exclusive.
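
The difference between the two readings is easily seen in figures. The Python sketch below reuses the ultimate frequencies of Example i., § 13, with small Latin letters standing in for the Greek letters; the variable names are assumptions of the sketch.

```python
# Ultimate frequencies of Example i. (small letters stand in for Greek ones).
ultimate = {
    "ABC": 57, "ABc": 281, "AbC": 86, "Abc": 453,
    "aBC": 78, "aBc": 670, "abC": 65, "abc": 8310,
}

# Inclusive sense: A with or without the other attributes, i.e. (A).
inclusive_A = sum(f for symbol, f in ultimate.items() if "A" in symbol)

# Exclusive sense: A but none of the other attributes noted, i.e. (Aβγ).
exclusive_A = ultimate["Abc"]

# "A and B" read inclusively is (AB); read exclusively it is (ABγ).
inclusive_AB = sum(f for symbol, f in ultimate.items() if "A" in symbol and "B" in symbol)
exclusive_AB = ultimate["ABc"]

print(inclusive_A, exclusive_A)    # 877 453
print(inclusive_AB, exclusive_AB)  # 338 281
```

The same heading thus yields quite different numbers under the two conventions, which is why a verbal description should state what is excluded as well as what is included.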


(1) Jevons, W. Stanley, "On a General System of Numerically Definite

Reasoning," Memoirs of the Manchester Lit. and Phil. Soc., 1870.

Reprinted in Pure Logic and other Minor Works; Macmillan, 1890.

(The method used in these chapters is that of Jevons, with the notation

slightly modified to that employed in the next three memoirs cited.)

(2) Yule, G. U., "On the Association of Attributes in Statistics, etc.," Phil.

Trans. Roy. Soc., Series A, vol. cxciv., 1900, p. 257.

(3) Yule, G. U., "On the Theory of Consistence of Logical Class-frequencies

and its Geometrical Representation," Phil. Trans. Roy. Soc., Series A,

vol. cxcvii., 1901, p. 91.

(4) Yule, G. U., "Notes on the Theory of Association of Attributes in

Statistics," Biometrika, vol. ii., 1903, p. 121. (The first three sections

of (4) are an abstract of (2) and (3). The remarks made as regards the

tabulation of class-frequencies at the end of (2) should be read in connection

with the remarks made at the beginning of (3) and in this

chapter: cf. footnote on p. 94 of (3).)

Material has been cited from, and reference made to the notation used in—

(5) Warner, F., and others, "Report on the Scientific Study of the Mental and

Physical Conditions of Childhood"; published by the Committee,

Parkes Museum, 1895.

(6) Warner, F., "Mental and Physical Conditions among Fifty Thousand

Children, etc.," Jour. Roy. Stat. Soc., vol. lix., 1896, p. 125.


1. (Figures from ref. (5).) The following are the numbers of boys observed

with certain classes of defects amongst a number of school-children. A

denotes development defects ; B, nerve signs; C, low nutrition.

















Find the frequencies of the positive classes.


2. (Figures from ref. (5).) The following are the frequencies of the

positive classes for the girls in the same investigation :—















Find the frequencies of the ultimate classes.

3. (Figures from Census, England and Wales, 1891, vol. iii.) Convert the

census statement as below into a statement in terms of (a) the positive, (b)

the ultimate class-frequencies. A = blindness, B = deaf-mutism, C = mental

derangement.


N 29,002,525 (ABy) 82

(A) 23,467 (AβC) 380

(B) 14,192 (aBC) 500

(C) 97,383 (ABC) 25

4. (Cf. Mill's Logic, bk. iii., ch. xvii., and ref. (1).) Show that if A

occurs in a larger proportion of the cases where B is than where B is not,

then will B occur in a larger proportion of the cases where A is than where

A is not: i.e. given (AB)/(B) > (Aβ)/(β), show that (AB)/(A) > (αB)/(α).

5. (Cf. De Morgan, Formal Logic, p. 163, and ref. (1).) Most B's are A's,

most B's are C's: find the least number of A's that are C's, i.e. the lowest

possible value of (AC).

6. Given that

(A) = (α) = (B) = (β) = ½N,

show that

(AB) = (αβ), (Aβ) = (αB).

7. (Cf. ref. (2), § 9, "Case of equality of contraries.") Given that

(A) = (α) = (B) = (β) = (C) = (γ) = ½N,

and also that

(ABC) = (αβγ),

show that

2(ABC) = (AB) + (AC) + (BC) − ½N.

8. Measurements are made on a thousand husbands and a thousand wives.
If the measurements of the husbands exceed the measurements of the wives in
800 cases for one measurement, in 700 cases for another, and in 660 cases for
both measurements, in how many cases will both measurements on the wife
exceed the measurements on the husband?



1-3. The field of observation or universe and its specification by symbols—

4. Derivation of complex from simple relations by specifying the

universe—5-6. Consistence—7-10. Conditions of consistence for one

and for two attributes—11-14. Conditions of consistence for three


1. Any statistical inquiry is necessarily confined to a certain

time, space, or material. An investigation on the prevalence of

insanity, for instance, may be limited to England, to England in

1901, to English males in 1901, or even to English males over 60

years of age in 1901, and so on.

For actual work on any given subject, no term is required to

denote the material to which the work is so confined: the limits


are specified, and that is sufficient. But for theoretical purposes

some term is almost essential to avoid circumlocution. The ex-

pression the universe of discourse, or simply the universe, used

in this sense by writers on logic, may be adopted as familiar and


2. The universe, like any class, may be considered as specified

by an enumeration of the attributes common to all its members,

e.g. to take the illustration of § 1, those implied by the predicates

English, male, over 60 years of age, living in 1901. It is not, in

general, necessary to introduce a special letter into the class-

symbols to denote the attributes common to all members of the

universe. We know that such attributes must exist, and the

common symbol can be understood.

In strictness, however, the symbol ought to be written: if, say,
U denote the combination of attributes English, male, over 60,
living in 1901, A insanity, B blindness, we should strictly use
the symbols:

(U)   = Number of English males over 60 living in 1901,
(UA)  = Number of insane English males over 60 living in 1901,
(UB)  = Number of blind English males over 60 living in 1901,
(UAB) = Number of blind and insane English males over 60 living in 1901,


instead of the simpler symbols N (A) (B) (AB). Similarly, the

general relations (2), § 13, Chap. I., using U to denote the common

attributes of all the members of the universe and (U) consequently

the total number of observations N, should in strictness be written

in the form—

(U) = (UA) + (Uα) = (UB) + (Uβ) = etc.
    = (UAB) + (UAβ) + (UαB) + (Uαβ) = etc.

(UA) = (UAB) + (UAβ) = (UAC) + (UAγ) = etc.

(UAB) = (UABC) + (UABγ) = etc.

3. Clearly, however, we might have used any other symbol

instead of U to denote the attributes common to all the members

of the universe, e.g. A or B or AB or ABC, writing in the latter


(ABC) = (ABCD) + (ABCδ)

and so on. Hence any attribute or combination of attributes

common to all the class-symbols in an equation may be regarded as

specifying the universe within which the equation holds good.

Thus the equation just written may be read in words: "The

number of objects or individuals in the universe ABC is equal to

the number of D's together with the number of not-D's within

the same universe." The equation

(AC) = (ABC) + (AβC)

may be read: "The number of A's is equal to the number of A's

that are B together with the number of A's that are not-B

within the universe C."

4. The more complex may be derived from the simpler relations

between class-frequencies very readily by the process of specifying

the universe. Thus starting from the simple equation

(α) = N − (A),

we have, by specifying the universe as β,

(αβ) = (β) − (Aβ)
     = N − (A) − (B) + (AB).

Specifying the universe, again, as γ, we have

(αβγ) = (γ) − (Aγ) − (Bγ) + (ABγ)
      = N − (A) − (B) − (C) + (AB) + (AC) + (BC) − (ABC).
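
The process of specifying the universe can be carried out mechanically. The sketch below is illustrative only: the function names (`complement`, `alpha_beta`, `alpha_beta_gamma`) and the sample figures are assumptions introduced here, not part of the text. Each step applies the single rule (α) = N − (A) inside a narrower universe, and the result is compared with the fully expanded expression.

```python
def complement(universe_total, part):
    """(alpha) = N - (A), read inside whatever universe the caller supplies."""
    return universe_total - part


def alpha_beta(N, A, B, AB):
    """(alpha beta), obtained by specifying the universe as beta."""
    beta = complement(N, B)        # (beta)   = N   - (B)
    A_beta = complement(A, AB)     # (A beta) = (A) - (AB)
    return complement(beta, A_beta)


def alpha_beta_gamma(N, A, B, C, AB, AC, BC, ABC):
    """(alpha beta gamma), obtained by specifying the universe as gamma first."""
    gamma = complement(N, C)         # (gamma)
    A_gamma = complement(A, AC)      # (A gamma)
    B_gamma = complement(B, BC)      # (B gamma)
    AB_gamma = complement(AB, ABC)   # (AB gamma)
    return alpha_beta(gamma, A_gamma, B_gamma, AB_gamma)


# Hypothetical figures, chosen only to exercise the functions.
N, A, B, C, AB, AC, BC, ABC = 100, 40, 30, 20, 10, 8, 6, 2
stepwise = alpha_beta_gamma(N, A, B, C, AB, AC, BC, ABC)
expanded = N - A - B - C + AB + AC + BC - ABC
print(stepwise, expanded)   # 32 32
```

The two numbers agree because the stepwise route is the same expansion, merely carried out one attribute at a time.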

5. Any class-frequencies which have been or might have been

observed within one and the same universe may be said to be


consistent with one another. They conform with one another,

and do not in any way conflict.

The conditions of consistence are some of them simple, but

others are by no means of an intuitive character. Suppose, for

instance, the data are given—

N    1000    (AB)    42
(A)   525    (AC)   147
(B)   312    (BC)    86
(C)   470    (ABC)   25

—there is nothing obviously wrong with the figures. Yet they

are certainly inconsistent. They might have been observed at

different times, in different places or on different material, but

they cannot have been observed in one and the same universe.

They imply, in fact, a negative value for (αβγ):

(αβγ) = 1000 − 525 − 312 − 470 + 42 + 147 + 86 − 25
      = 1000 − 1307 + 275 − 25
      = −57.

Clearly no class-frequency can be negative. If the figures,

consequently, are alleged to be the result of an actual inquiry in

a definite universe, there must have been some miscount or


6. Generally, then, we may say that any given class-frequencies

are inconsistent if they imply negative values for any of the

unstated frequencies. Otherwise they are consistent. To test the

consistence of any set of 2^n algebraically independent frequencies,

for the case of n attributes, we should accordingly calculate

the values of all the unstated frequencies, and so verify the fact

that they are positive. This procedure may, however, be limited

by a simple consideration. If the ultimate class-frequencies are

positive, all others must be so, being derived from the ultimate

frequencies by simple addition. Hence we need only calculate

the values of the ultimate class-frequencies in terms of those

given, and verify the fact that they exceed zero.
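
A minimal sketch of this test for three attributes follows. The function name `ultimate_frequencies`, the string keys, and the use of lower-case letters to stand for α, β, γ are conventions assumed here for illustration only. The function is applied to the figures of § 5.

```python
from itertools import combinations


def ultimate_frequencies(positive, attributes="ABC"):
    """Expand every ultimate class-frequency in terms of the positive ones.

    `positive` maps "" to N, "A" to (A), "AB" to (AB), and so on.
    In the result a lower-case letter marks a negated attribute,
    e.g. "aBc" stands for (alpha B gamma).
    """
    cells = {}
    for k in range(len(attributes) + 1):
        for present in combinations(attributes, k):
            absent = [x for x in attributes if x not in present]
            total = 0
            # Inclusion-exclusion over the attributes that are negated.
            for j in range(len(absent) + 1):
                for extra in combinations(absent, j):
                    key = "".join(sorted(present + extra))
                    total += (-1) ** j * positive[key]
            label = "".join(x if x in present else x.lower() for x in attributes)
            cells[label] = total
    return cells


# The data of § 5.
data = {"": 1000, "A": 525, "B": 312, "C": 470,
        "AB": 42, "AC": 147, "BC": 86, "ABC": 25}
cells = ultimate_frequencies(data)
negative = [name for name, value in cells.items() if value < 0]
print(negative, cells["abc"])   # ['abc'] -57
```

Only the ultimate frequencies are examined, in line with the remark that every other frequency is a sum of ultimate ones.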

7. As we saw in the last chapter, there are two sets of 2^n

algebraically independent frequencies of practical importance, viz.

(1) the ultimate, (2) the positive class-frequencies.

It follows from what we have just said that there is only one
condition of consistence for the ultimate frequencies, viz. that
they must all exceed zero. Apart from this, any one frequency of
the set may vary anywhere between 0 and ∞ without becoming

inconsistent with the others.

For the positive class-frequencies, the conditions may be


expressed symbolically by expanding the ultimate in terms of

the positive frequencies, and writing each such expansion not

less than zero. We will consider the cases of one, two, and

three attributes in turn.

8. If only one attribute be noted, say A, the positive frequencies
are N and (A). The ultimate frequencies are (A) and (α), where

(α) = N − (A).

The conditions of consistence are therefore simply

(A) ≮ 0        N − (A) ≮ 0

or, more conveniently expressed,

(a) (A) ≮ 0        (b) (A) ≯ N        . . . (1)

These conditions are obvious: the number of A's cannot be less

than zero, nor exceed the whole number of observations.


9. If two attributes be noted there are four ultimate frequencies
(AB), (Aβ), (αB), (αβ). The following conditions are given by
expanding each in terms of the frequencies of positive classes:

(a) (AB) ≮ 0                  or (AB) would be negative
(b) (AB) ≮ (A) + (B) − N      or (αβ) would be negative
(c) (AB) ≯ (A)                or (Aβ) would be negative
(d) (AB) ≯ (B)                or (αB) would be negative        . . . (2)

(a), (c), and (d) are obvious; (b) is perhaps a little less obvious,

and is occasionally forgotten. It is, however, of precisely the

same type as the other three. None of these conditions are

really of a new form, but may be derived at once from (1) (a) and

(1) (b) by specifying the universe as B or as β respectively. The

conditions (2) are therefore really covered by (1).

10. But a further point arises as regards such a system of

limits as is given by (2). The conditions (a) and (b) give lower or

minor limits to the value of (AB); (c) and (d) give upper or

major limits. If either major limit be less than either minor limit

the conditions are impossible, and it is necessary to see whether

(A) and (B) can take such values that this may be the case.

Expressing the condition that the major limits must be not less

than the minor, we have—

(A) ≮ 0        (B) ≮ 0
(A) ≯ N        (B) ≯ N

These are simply the conditions of the form (1). If, therefore,

(A) and (B) fulfil the conditions (1), the conditions (2) must be

possible. The conditions (1) and (2) therefore give all the con-

ditions of consistence for the case of two attributes, conditions of

an extremely simple and obvious kind.
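
The limits of (2) are easily tabulated. The following sketch uses an assumed helper name, `ab_limits`, and invented figures; it returns the minor and major limits of (AB) for given N, (A), and (B), after checking that (A) and (B) themselves satisfy (1).

```python
def ab_limits(N, A, B):
    """Return (lower, upper) limits for (AB) from conditions (2)."""
    if not (0 <= A <= N and 0 <= B <= N):
        raise ValueError("(A) and (B) must satisfy conditions (1)")
    lower = max(0, A + B - N)   # (2)(a) and (2)(b)
    upper = min(A, B)           # (2)(c) and (2)(d)
    return lower, upper


# Hypothetical figures: 1000 observations, 700 A's and 600 B's.
print(ab_limits(1000, 700, 600))   # (300, 600)
```

With these figures condition (b), the one said to be occasionally forgotten, supplies the effective lower limit.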

11. Now consider the case of three attributes. There are

eight ultimate frequencies. Expanding the ultimate in terms of

the positive frequencies, and expressing the condition that each

expansion is not less than zero, we have—

                                                              or the frequency given
                                                              below will be negative.

(a) (ABC) ≮ 0                                                  (ABC)
(b)       ≮ (AB) + (AC) − (A)                                  (Aβγ)
(c)       ≮ (AB) + (BC) − (B)                                  (αBγ)
(d)       ≮ (AC) + (BC) − (C)                                  (αβC)
(e)       ≯ (AB)                                               (ABγ)
(f)       ≯ (AC)                                               (AβC)
(g)       ≯ (BC)                                               (αBC)
(h)       ≯ (AB) + (AC) + (BC) − (A) − (B) − (C) + N           (αβγ)        . . . (3)


These, again, are not conditions of a new form. We leave it

as an exercise for the student to show that they may be derived

from (1) (a) and (1) (b) by specifying the universe in turn as

BC, Bγ, βC, and βγ. The two conditions holding in four universes

give the eight inequalities above.

12. As in the last case, however, these conditions will be im-

possible to fulfil if any one of the major limits (e)-(h) be less than

any one of the minor limits (a)-(d). The values on the right

must be such as to make no major limit less than a minor.

There are four major and four minor limits, or sixteen compari-

sons in all to be made. But twelve of these, the student will

find, only lead back to conditions of the form (2) for (AB), (AC),

and (BC) respectively. The four comparisons of expansions due

to contrary frequencies ( (a) and (h), (b) and (g), (c) and (f), (d)

and (e) ) alone lead to new conditions, viz.—

(a)   (AB) + (AC) + (BC) ≮ (A) + (B) + (C) − N
(b)   (AB) + (AC) − (BC) ≯ (A)
(c)   (AB) − (AC) + (BC) ≯ (B)
(d) − (AB) + (AC) + (BC) ≯ (C)        . . . (4)

13. These are conditions of a wholly new type, not derivable

in any way from those given under (1) and (2). They are con-

ditions for the consistence of the second-order frequencies with

each other, whilst the inequalities of the form (2) are only conditions

for the consistence of the second-order frequencies with those of

lower orders. Given any two of the second-order frequencies, e.g.


(AB) and (AC), the conditions (4) give limits for the third, viz.

(BC). They thus replace, for statistical purposes, the ordinary

rules of syllogistic inference. From data of the syllogistic form,

they would, of course, lead to the same conclusion, though in a

somewhat cumbrous fashion; one or two cases are suggested as

exercises for the student (Questions 6 and 7). The following

will serve as illustrations of the statistical uses of the con-

ditions :—

Example i.—Given that (A) = (B) = (C) = ½N, and 80 per cent.
of the A's are B, 75 per cent. of A's are C, find the limits to the
percentage of B's that are C. The data are:

2(AB)/N = 0.8        2(AC)/N = 0.75

and the conditions give:

(a) 2(BC)/N ≮ 1 − 0.8 − 0.75
(b) 2(BC)/N ≮ 0.8 + 0.75 − 1
(c) 2(BC)/N ≯ 1 − 0.8 + 0.75
(d) 2(BC)/N ≯ 1 + 0.8 − 0.75

(a) gives a negative limit and (d) a limit greater than unity;
hence they may be disregarded. From (b) and (c) we have:

2(BC)/N ≮ 0.55        2(BC)/N ≯ 0.95

—that is to say, not less than 55 per cent. nor more than 95 per

cent. of the B's can be C.
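
The arithmetic of Example i. can be checked with a short sketch. The function name `bc_limits` is an assumption made here, and N = 200 is a hypothetical scale chosen only so that every count is a whole number; the example itself fixes proportions, not totals. Each of the four conditions (4) is solved for (BC).

```python
def bc_limits(N, A, B, C, AB, AC):
    """Limits for (BC) implied by conditions (4), given (AB) and (AC)."""
    lower = max(A + B + C - N - AB - AC,   # from (4)(a)
                AB + AC - A)               # from (4)(b)
    upper = min(B - AB + AC,               # from (4)(c)
                C + AB - AC)               # from (4)(d)
    return lower, upper


# Example i, scaled to a hypothetical N = 200.
N, A, B, C = 200, 100, 100, 100
AB, AC = 80, 75          # 80 per cent. and 75 per cent. of the A's
lower, upper = bc_limits(N, A, B, C, AB, AC)
print(100 * lower / B, 100 * upper / B)   # 55.0 95.0
```

The function covers conditions (4) only; the simpler limits of the form (2) for (BC) still apply alongside them.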

Example ii.—If a report give the following frequencies as

actually observed, show that there must be a misprint or mistake

of some sort, and that possibly the misprint consists in the

dropping of a 1 before the 85 given as the frequency (BC).


N    1000
(A)   510    (AB)   189
(B)   490    (AC)   140
(C)   427    (BC)    85

From (4) (a) we have:

(BC) ≮ 510 + 490 + 427 − 1000 − 189 − 140
     ≮ 98.

But 85 < 98, therefore it cannot be the correct value of (BC).

If we read 185 for 85 all the conditions are fulfilled.
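
One way to confirm that "all the conditions are fulfilled" is to ask whether any value of (ABC) can satisfy the eight inequalities of § 11 at once: the data are consistent when the greatest minor limit does not exceed the least major limit. The sketch below takes that approach; the names `abc_range` and `is_consistent` are assumptions introduced for illustration.

```python
def abc_range(N, A, B, C, AB, AC, BC):
    """Greatest minor limit and least major limit of (ABC), from § 11."""
    lower = max(0, AB + AC - A, AB + BC - B, AC + BC - C)
    upper = min(AB, AC, BC, AB + AC + BC - A - B - C + N)
    return lower, upper


def is_consistent(N, A, B, C, AB, AC, BC):
    """True when some value of (ABC) satisfies all eight inequalities."""
    lower, upper = abc_range(N, A, B, C, AB, AC, BC)
    return lower <= upper


print(is_consistent(1000, 510, 490, 427, 189, 140, 85))    # False
print(is_consistent(1000, 510, 490, 427, 189, 140, 185))   # True
print(abc_range(1000, 510, 490, 427, 189, 140, 185))       # (0, 87)
```

With (BC) read as 185 the frequency (ABC) may lie anywhere from 0 to 87, so the corrected figures could have come from a single universe.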


Example iii.—In a certain set of 1000 observations (A) = 45,
(B) = 23, (C) = 14. Show that whatever the percentages of B's
that are A and of C's that are A, it cannot be inferred that any B's
are C.

The conditions (a) and (b) give the lower limit of (BC), which
is required. We find:

(a) (BC)/N ≮ 0.045 + 0.023 + 0.014 − 1 − (AB)/N − (AC)/N
(b) (BC)/N ≮ (AB)/N + (AC)/N − 0.045

The first limit is clearly negative. The second must also be
negative, since (AB)/N cannot exceed 0.023 nor (AC)/N 0.014.

Hence we cannot conclude that there is any limit to (BC) greater

than 0. This result is indeed immediately obvious when we

consider that, even if all the B's were A, and of the remaining

22 A's 14 were C's, there would still be 8 A's that were neither

B nor C.

14. The student should note the result of the last example, as it

illustrates the sort of result at which one may often arrive by

applying the conditions (4) to practical statistics. For given

values of N, (A), (B), (C), (AB), and (AC), it will often happen

that any value of (BC) not less than zero (or, more generally, not

less than either of the lower limits (2) (a) and (2) (b)) will satisfy

the conditions (4), and hence no true inference of a lower limit is

possible. The argument of the type "So many A's are B and

so many B's are C that we must expect some A's to be C" must

be used with caution.


(1) Morgan, A. de, Formal Logic, 1847 (chapter viii., "On the Numerically

Definite Syllogism").

(2) Boole, G., Laws of Thought, 1854 (chapter xix., "Of Statistical Conditions").


The above are the classical works with respect to the general theory

of numerical consistence. The student will find both difficult to follow


on account of their special notation, and, in the case of Boole's work,

the special method employed.

(3) Yule, G. U., "On the Theory of Consistence of Logical Class-frequencies

and its Geometrical Representation," Phil. Trans., A, vol. cxcvii.

(1901), p. 91. (Deals at length with the theory of consistence for

any number of attributes, using the notation of the present chapters.)



1. (For this and similar estimates cf. [...] Miss Collet on the
Statistics of Employment of Women [...] [C.-7664], 1894.) If, in the
urban district of Bury, 817 per thousand [...] between 20 and 25
years of age were returned as "occupied" [...] 263 per
thousand as married or widowed, what is the [...]
of the married or widowed that must have been occupied?

2. If, in a series of houses actually invaded by small-pox, [...] of the
inhabitants are attacked and 85 per cent. have been vaccinated, what is the
lowest percentage of the vaccinated that must have been attacked?

3. Given that 50 per cent. of the inmates of a workhouse are men, [...] per
cent. are "aged" (over 60), 80 per cent. non-able-bodied, 35 per cent. aged
men, 45 per cent. non-able-bodied men, and 42 per cent. non-able-bodied and
aged, find the greatest and least possible proportions of non-able-bodied
aged men.


4. (Material from ref. 5 of Chap. I.) The following are the proportions
per 10,000 of boys observed, with certain classes of defects amongst a number
of school-children. A = development defects, B = nerve signs, D = mental
dulness.

N   = 10,000    (D)  = 789
(A) =    877    (AB) = 338
(B) =  1,086    (BD) = [illegible]

Show that some dull boys do not exhibit development defects, and state how
many at least do not do so.

5. The following are the corresponding figures for girls:

N   = 10,000    (D)  = 689
(A) =    682    (AB) = 2[?]8
(B) =    850    (BD) = 36[?]

Show that some defectively developed girls are not dull, and state how many
at least must be so.

6. Take the syllogism "All A's are B, all B's are C, therefore all A's are
C," express the premisses in terms of the notation of the preceding chapter,
and deduce the conclusion by the use of the general conditions of consistence.

7. Do the same for the syllogism "All A's are B, no B's are C, therefore
no A's are C."

8. Given that (A) = (B) = (C) = ½N, and that (AB)/N = (AC)/N = p, find
what must be the greatest or least values of p in order that we may infer
that (BC)/N exceeds any given value, say q.

9. Show that if

(A)/N = x,  (B)/N = 2x,  (C)/N = 3x,

and

(AB)/N = (AC)/N = (BC)/N = y,

the value of neither x nor y can exceed ¼.



1-4. The criterion of independence.—5-10. The conception of association and

testing for the same by the comparison of percentages—11-12.

Numerical equality of the differences between the four second-order

frequencies and their independence values—13. The coefficient of

association—14. Necessity for an investigation into the causation of

an attribute A being extended to include non-A's.


1. If there is no sort of relationship, of any kind, between two

attributes A and B, we expect to find the same proportion of A's

amongst the B's as amongst the non-B's. We may anticipate,

for instance, the same proportion of abnormally wet seasons in

leap years as in ordinary years, the same proportion of male to


total births when the moon is waxing as when it is waning, the

same proportion of heads whether a coin be tossed with the right

hand or the left.

Two such unrelated attributes may be termed independent, and

we have accordingly as the criterion of independence for A and B—


(AB)/(B) = (Aβ)/(β)        . . . (1)

If this relation hold good, the corresponding relations

(αB)/(B) = (αβ)/(β)
(AB)/(A) = (αB)/(α)
(Aβ)/(A) = (αβ)/(α)

must also hold. For it follows at once from (1) that

[(B) − (AB)]/(B) = [(β) − (Aβ)]/(β),

that is

(αB)/(B) = (αβ)/(β),

and the other two identities may be similarly deduced.

2. The criterion may, however, be put into a somewhat

different and theoretically more convenient form. The equation

(1) expresses (AB) in terms of (B), (β), and a second-order frequency
(Aβ); eliminating this second-order frequency we have:

(AB)/(B) = [(AB) + (Aβ)]/[(B) + (β)] = (A)/N,

i.e. in words, "the proportion of A's amongst the B's is the same
as in the universe at large." The student should learn to recognise
this equation at sight in any of the forms:

(a) (AB)/(B) = (A)/N
(b) (AB)/(A) = (B)/N
(c) (AB) = (A)(B)/N
(d) (AB)/N = (A)/N · (B)/N        . . . (2)

The equation (d) gives the important fundamental rule: If the

attributes A and B are independent, the proportion of AB's in the

universe is equal to the proportion of A's multiplied by the propor-

tion of B's.

The advantage of the forms (2) over the form (1) is that they

give expressions for the second-order frequency in terms of the

frequencies of the first-order and the whole number of observa-

tions alone; the form (1) does not.

Example i.— If there are 144 A's and 384 B's in 1024 observa-

tions, how many AB's will there be, A and B being independent?

(144 × 384)/1024 = 54.

There will therefore be 54 AB's.

Example ii.—If the A's are 60 per cent., the B's 35 per cent., of

the whole number of observations, what must be the percentage

of AB's in order that we may conclude that A and B are
independent?

(60 × 35)/100 = 21,

and therefore there must be 21 per cent. (more or less closely, cf.
§§ 7, 8 below) of AB's in the universe to justify the conclusion
that A and B are independent.
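
Both examples apply form (c) of the criterion, (AB) = (A)(B)/N. A small sketch, with the helper name `independence_value` assumed here for illustration, reproduces the two results; exact fractions are used so that no rounding enters.

```python
from fractions import Fraction


def independence_value(A, B, N):
    """Value of (AB) under independence: (A)(B)/N, kept as an exact fraction."""
    return Fraction(A * B, N)


print(independence_value(144, 384, 1024))   # 54   (Example i)
print(independence_value(60, 35, 100))      # 21   (Example ii, in percentages)
```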

3. It follows from § 1 that if the relation (2) holds for any one

of the four second-order frequencies, e.g. (AB), similar relations
must hold for the remaining three. Thus we have directly
from (1):

(Aβ)/(β) = [(AB) + (Aβ)]/[(B) + (β)] = (A)/N,

giving

(Aβ) = (A)(β)/N.

And again,

(αB)/(B) = (αβ)/(β) = [(αB) + (αβ)]/[(B) + (β)] = (α)/N,

which gives

(αB) = (α)(B)/N        (αβ) = (α)(β)/N.

Example iii.—In Example i. above, what would be the number
of αβ's, A and B being independent?

(α) = 1024 − 144 = 880
(β) = 1024 − 384 = 640
∴ (αβ) = (880 × 640)/1024 = 550.



The theorem is an important one, and the result may be

deduced more directly from first principles, replacing (AB) by

its value (A)(B)/N in the expansions—


(Aβ) = (A) − (AB),

(αβ) = N − (A) − (B) + (AB).

This is left as an exercise for the student.
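
The substitution can at least be verified numerically. In the sketch below (the function name `cells_under_independence` and the lower-case keys standing for α and β are assumptions made here) (AB) is replaced by (A)(B)/N in the expansions, and the remaining three frequencies are compared with the products (A)(β)/N, (α)(B)/N, and (α)(β)/N, using the figures of Example i.

```python
from fractions import Fraction


def cells_under_independence(A, B, N):
    """All four second-order frequencies when (AB) = (A)(B)/N."""
    AB = Fraction(A * B, N)
    return {
        "AB": AB,
        "Ab": A - AB,            # (A beta)     = (A) - (AB)
        "aB": B - AB,            # (alpha B)    = (B) - (AB)
        "ab": N - A - B + AB,    # (alpha beta) = N - (A) - (B) + (AB)
    }


N, A, B = 1024, 144, 384
alpha, beta = N - A, N - B
cells = cells_under_independence(A, B, N)
print(cells["Ab"] == Fraction(A * beta, N))       # True
print(cells["aB"] == Fraction(alpha * B, N))      # True
print(cells["ab"] == Fraction(alpha * beta, N))   # True
print(cells["ab"])                                # 550
```

A numerical check is not a proof, but it shows what the algebraic exercise should produce.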

4. Finally, the criterion of independence may be expressed in

yet a third form, viz. in terms of the second-order frequencies

alone. If A and B are independent, it follows at once from

equation (2) and the work of the preceding section that

(AB)(αβ) = (A)(B)(α)(β)/N².

And evidently (αB)(Aβ) is equal to the same fraction. Therefore

(a) (AB)(αβ) = (αB)(Aβ)
(b) (AB)/(αB) = (Aβ)/(αβ)
(c) (AB)/(Aβ) = (αB)/(αβ)        . . . (3)

The equation (b) may be read "The ratio of A's to α's amongst
the B's is equal to the ratio of A's to α's amongst the β's," and
(c) similarly.

This form of criterion is a convenient one if all the four
second-order frequencies are given, enabling one to recognise
whether or not the two attributes are independent.

Example iv.—If the second-order frequencies have the following
values, are A and B independent or not?

(AB) = 110    (αB) = 90    (Aβ) = 290    (αβ) = 510.

Clearly (AB)(αβ) > (αB)(Aβ),

so A and B are not independent.
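
The comparison in Example iv. amounts to one subtraction. The sketch below, in which the function name `cross_product_difference` is an assumption, evaluates (AB)(αβ) − (αB)(Aβ) for the figures of Example iv. and, for contrast, for the independent frequencies derived from Example i.

```python
def cross_product_difference(AB, Ab, aB, ab):
    """(AB)(alpha beta) - (alpha B)(A beta); zero under independence."""
    return AB * ab - aB * Ab


print(cross_product_difference(AB=110, Ab=290, aB=90, ab=510))   # 30000
print(cross_product_difference(AB=54, Ab=90, aB=330, ab=550))    # 0
```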

5. Suppose now that A and B are not independent, but related
in some way or other, however complicated.

Then if

(AB) > (A)(B)/N,

A and B are said to be positively associated, or sometimes simply
associated. If, on the other hand,

(AB) < (A)(B)/N,

A and B are said to be negatively associated or, more briefly,
disassociated.

The student should notice that these words are not used
exactly in their ordinary senses, but in a technical sense. When
A and B are said to be associated, it is not meant merely that
some A's are B's, but that the number of A's which are B's exceeds
the number to be expected if A and B are independent. Similarly,
when A and B are said to be negatively associated or disassociated,
it is not meant that no A's are B's, but that the number of A's
which are B's falls short of the number to be expected if A and B


are independent. "Association" cannot be inferred from the

mere fact that some A's are B's, however great that proportion;

this principle is fundamental, and should be always borne

in mind.

6. The greatest possible value of (AB) for given values of

N, (A), and (B) is either (A) or (B) (whichever is the less). When

(AB) attains either of these values, A and B may be said to be

completely or perfectly associated. The lowest possible value of

(AB), on the other hand, is either zero or (A) + (B) - N (which-

ever is the greater). When (AB) falls to either of these values,

A and B may be said to be completely disassociated. Complete

association is generally understood to correspond to one or other

of the cases, "All A's are B" or "All B's are A," or it might be

more narrowly defined as corresponding only to the case when

both these statements were true. Complete disassociation may

be similarly taken as corresponding to one or other of the cases,
"No A's are B," or "no α's are β," or more narrowly to the
case when both these statements are true. The greater the
divergence of (AB) from the value (A)(B)/N towards the limiting
value in either direction, the greater, we may say, is the

intensity of association or of disassociation, so that we may speak

of attributes being more or less, highly or slightly associated. This

conception of degrees of association, degrees which may in fact be

measured by certain formulae (cf. § 13), is important.

7. When the association is very slight, i.e. where (AB) only

differs from (A)(B)/N by a few units or by a small proportion, it

may be that such association is not really significant of any

definite relationship. To give an illustration, suppose that a coin

is tossed a number of times, and the tosses noted in pairs; then

100 pairs may give such results as the following (taken from an
actual record):

First toss heads and second heads . . . 26
First toss heads and second tails . . . 18
First toss tails and second heads . . . 27
First toss tails and second tails . . . 29

If we use A to denote "heads" in the first toss, B "heads" in
the second, we have from the above (A) = 44, (B) = 53. Hence
(A)(B)/N = (44 × 53)/100 = 23.32, while actually (AB) is 26. Hence

there is a positive association, in the given record, between

the result of the first throw and the result of the second. But it

is fairly certain, from the nature of the case, that such association

cannot indicate any real connection between the results of the


two throws; it must therefore be due merely to such a complex

system of causes, impossible to analyse, as leads, for example, to

differences between small samples drawn from the same material.

The conclusion is confirmed by the fact that, of a number of such

records, some give a positive association (like the above), but

some a negative association.
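
The figures of the coin-tossing record can be run through the definition of § 5 directly. The sketch below is illustrative; the function name `association_sign` and the lower-case argument names standing for α and β are assumptions made here. It rebuilds (A), (B), and N from the four second-order frequencies and compares (AB) with its independence value.

```python
from fractions import Fraction


def association_sign(AB, Ab, aB, ab):
    """Compare (AB) with (A)(B)/N and name the direction of the association."""
    N = AB + Ab + aB + ab
    A = AB + Ab
    B = AB + aB
    expected = Fraction(A * B, N)
    if AB > expected:
        sign = "positive"
    elif AB < expected:
        sign = "negative"
    else:
        sign = "independent"
    return expected, sign


expected, sign = association_sign(AB=26, Ab=18, aB=27, ab=29)
print(float(expected), sign)   # 23.32 positive
```

The function reports only the direction of the divergence; whether a divergence of this size means anything is the separate question of sampling raised in the text.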

8. An event due, like the above occurrence of positive associa-

tion, to an extremely complex system of causes of the general nature

of which we are aware, but of the detailed operation of which we

are ignorant, is sometimes said to be due to chance, or better to

the chances or fluctuations of sampling.

A little consideration will suggest that such associations due to

the fluctuations of sampling must be met with in all classes of


statistics. To quote, for instance, from § 1, the two illustrations

there given of independent attributes, we know that in any

actual record we would not be likely to find exactly the same

proportion of abnormally wet seasons in leap years as in ordinary

years, nor exactly the same proportion of male births when the

moon is waxing as when it is waning. But so long as the diver-

gence from independence is not well-marked we must regard such

attributes as practically independent, or dependence as at least


The discussion of the question, how great the divergence must

be before we can consider it as "well-marked," must be postponed

to the chapters dealing with the theory of sampling. At present the

attention of the student can only be directed to the existence of

the difficulty, and to the serious risk of interpreting a "chance

association " as physically significant.

9. The definition of § 5 suggests that we are to test the

existence or the intensity of association between two attributes


by a comparison of the actual value of (AB) with its independence-

value (as it may be termed) (A)(B)/N. The procedure is from the

theoretical standpoint perhaps the most natural, but it is usual,

in practice, to adopt a method of comparing proportions, e.g. the

proportion of A's amongst the B's with the proportion in the

universe at large. Such proportions are usually expressed in the

form of percentages or proportions per thousand.

A large number of such comparisons are available for the

purpose, as indicated by the inequalities (4) below, which all

hold good for the case of positive association between A and

B. The first two, (a) and (b), follow at once from the definition

of § 5, (c) and (d) follow from (a) and (b), on multiplying

across and expanding (A) and N in the first case, (B) and N

in the second. The deduction of the remainder is left to the



(a) (AB)/(B) > (A)/N
(b) (AB)/(A) > (B)/N
(c) (AB)/(B) > (Aβ)/(β)
(d) (AB)/(A) > (αB)/(α)
(e)-(m) [eight further comparisons of the same kind involving the proportions of α's and β's, among them (f) (αB)/(B) < (αβ)/(β) and (j) (αβ)/(β) > (α)/N; the remainder are not legible in this copy]        . . . (4)


The question arises then, which is the best comparison to adopt?

10. Two principles should decide this point: (1) of any two

comparisons, that is the better which brings out the more clearly

the degree of association; (2) of any two comparisons, that is

the better which illustrates the more important aspect of the

problem under discussion.

The second condition will generally exclude all the comparisons

(e)-(m), for the capital letters will naturally be used to denote

the important aspect of the character. We will generally be

concerned, for instance, with the proportion of A's amongst the

B's as compared with the β's (as in (c)), and not with the proportion
of the α's in those two universes (as in (f)); or with the
proportion of A's amongst the B's as compared with the whole
universe (a), and not with the proportion of α's amongst the
β's as compared with the whole universe (j). That is simply the

natural method of using the notation. We may confine our

attention accordingly to the comparisons (a)-(d). Of these

four, (c) or (d) is generally to be preferred to (a) or (b), for the

reason that either of the latter may give a misleading impression

as to the intensity of the association. We have in fact—

(A)/N = (AB)/(B) · (B)/N + (Aβ)/(β) · (β)/N.

Hence if (B)/N be large compared with (β)/N, (A)/N will
approach the value (AB)/(B) and the association will appear
to be very small, even though (AB)/(B) and (Aβ)/(β) differ
considerably. Suppose, for example, in some given case, for a
considerable number of observations:

(AB)/(B) = 0.70
(Aβ)/(β) = 0.40

this would mean a considerable positive association between
A and B. But if it were only stated that

(AB)/(B) = 0.70        (A)/N = 0.67

the association would appear to be small. Yet the two statements
are equivalent if (B)/N = 0.9, for then we have

(A)/N = 0.7 × 0.9 + 0.4 × 0.1 = 0.67

The meaning of (a) or (6), in fact, cannot be fully realised

unless the value of (B)/JV (or (A)/N in the second case) is known,

and therefore (c) is to be preferred to (a), and (d) to (b). An

exception may, however, be made in cases where the proportion

of B's (or A's) in the universe is very small, so that (A)/N

approaches closely to (Afi)/((}) or (B)IN to (aB)/(a) (cf. Example

vi. below).
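The arithmetic behind this warning can be checked numerically. The following is only an illustrative sketch in Python; the function name `overall_proportion` is an assumption of the sketch and forms no part of the original notation.

```python
def overall_proportion(p_in_B, p_in_beta, share_B):
    """Return (A)/N as the weighted mean of (AB)/(B) and (Aβ)/(β).

    share_B is (B)/N, so the weight given to the β-universe is 1 - share_B.
    """
    return p_in_B * share_B + p_in_beta * (1.0 - share_B)


# Figures from the text: (AB)/(B) = 0.70, (Aβ)/(β) = 0.40, (B)/N = 0.9.
p_A = overall_proportion(0.70, 0.40, 0.9)

# Comparison (a) sets 0.70 against (A)/N; comparison (c) sets 0.70 against 0.40.
gap_a = 0.70 - p_A
gap_c = 0.70 - 0.40
print(f"(A)/N = {p_A:.2f}; gap in (a) = {gap_a:.2f}; gap in (c) = {gap_c:.2f}")
```

With these figures the gap shown by comparison (a) is about a tenth of that shown by comparison (c), although both describe the same data.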

There still remains the choice between (a) and (b), or between

(c) and (d). This must be decided with reference to the second

principle, i.e. with regard to the more important aspect of the

problem under discussion, the exact question to be answered,

or the hypothesis to be tested, as illustrated by the examples

below. Where no definite question has to be answered or

hypothesis tested both pairs of proportions may be tabulated,

as in Example vi. again.

Example v.—Association between sex and death. (Material

from 64th Annual Report Reg. General. [Cd. 1230] 1903.)

Males in England and Wales, 1901       15,773,000
Females in England and Wales, 1901     16,848,000
Of the males died                         285,618
Of the females died                       265,967

We may denote the number of males by (A), the number of deaths by (B); then the natural comparison is between (AB)/(A) and (αB)/(α), i.e. the proportion of males that died and the proportion of females. We find:

(AB)/(A) = 285,618/15,773,000 = 0.0181
(αB)/(α) = 265,967/16,848,000 = 0.0158

Therefore (AB)/(A) > (αB)/(α), and there is positive association between male-sex and death. It is usual to express proportions of deaths, births, marriages, etc., to the population as rates per thousand; so that the above figures would be written:

Death-rate among males      18.1 per thousand.
Death-rate among females    15.8 per thousand.

A comparison of the death-rate among males with the death-rate for the whole population would be equally valid, but it should be remembered that the latter depends on the sex-ratio as well as on the causes that determine the death-rates amongst males and females. The above figures give:

Death-rate among males              18.1 per thousand.
Death-rate for whole population     16.9 per thousand.

This brings out the difference between the death-rates of males and of the whole population, but is not so clear an indication of the difference between males and females, which is the point to be investigated.

A comparison of the form (4) (c) is again valid for testing the association, but the form is not desirable, illustrating very well the remarks of § 10 above. Statisticians are concerned with death-rates, and not with the sex-ratios of the living and the dead. The student should learn, however, to recognise such forms of statement as the following, as equivalent to the above:

Proportion of males amongst those that died in the year           518 per thousand.
Proportion of males amongst those that did not die in the year    483 per thousand.

Since (AB)/(B) > (Aβ)/(β), it follows, as before, that there is positive association between A and B.
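The three equivalent statements of Example v. can be recomputed from the four census figures. This is a minimal sketch in Python; the variable and function names are chosen for illustration only.

```python
# Data of Example v (England and Wales, 1901).
males, females = 15_773_000, 16_848_000
male_deaths, female_deaths = 285_618, 265_967

population = males + females
deaths = male_deaths + female_deaths


def per_thousand(part, whole):
    """Express part/whole as a rate per thousand."""
    return 1000 * part / whole


# Form (d): death-rates of males and of females.
rate_males = per_thousand(male_deaths, males)
rate_females = per_thousand(female_deaths, females)
# Form (b): death-rate of males set against that of the whole population.
rate_all = per_thousand(deaths, population)
# Form (c): proportion of males among the dead and among the survivors.
males_among_dead = per_thousand(male_deaths, deaths)
males_among_living = per_thousand(males - male_deaths, population - deaths)

print(round(rate_males, 1), round(rate_females, 1), round(rate_all, 1))
print(round(males_among_dead), round(males_among_living))
```

All three comparisons point the same way, as they must, but only the first isolates the difference between the sexes.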

Example vi.—Deaf-mutism and Imbecility. (Material from

Census of 1901. Summary Tables. [Cd. 1523.])


Total population of England and Wales         32,528,000
Number of the imbecile (or feeble-minded)         48,882
Number of deaf-mutes                              15,246
Number of imbecile deaf-mutes                        451

Required, to find whether deaf-mutism is associated with imbecility.

We may denote the number of the imbecile by (A), of deaf-mutes by (B). One of the comparisons (a) or (b) may very well be used in this case, seeing that (A)/N and (B)/N differ very little from (Aβ)/(β) and (αB)/(α) respectively. The question whether to give the preference to (a) or to (b) depends on the nature of the investigation we wish to make. If it is desired to exhibit the conditions among deaf-mutes, (a) may be used:

Proportion of imbeciles among deaf-mutes = (AB)/(B)         29.6 per thousand.
Proportion of imbeciles in the whole population = (A)/N      1.5 per thousand.

If, on the other hand, it is desired to exhibit the conditions amongst the imbecile, (b) will be preferable:

Proportion of deaf-mutes amongst the imbecile = (AB)/(A)     9.2 per thousand.
Proportion of deaf-mutes in the whole population = (B)/N     0.5 per thousand.

Either comparison exhibits very clearly the high degree of association between the attributes. It may be pointed out, however, that census data as to such infirmities are very untrustworthy.
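The four comparisons (a)-(d) can all be formed from N, (A), (B) and (AB). The sketch below, in Python, does so for the data of Example vi.; the function name `comparisons` and the dictionary layout are assumptions made for illustration.

```python
def comparisons(N, A, B, AB):
    """Return the left and right sides of comparisons (a)-(d)."""
    alpha, beta = N - A, N - B          # (α) and (β)
    A_beta, alpha_B = A - AB, B - AB    # (Aβ) and (αB)
    return {
        "a": (AB / B, A / N),
        "b": (AB / A, B / N),
        "c": (AB / B, A_beta / beta),
        "d": (AB / A, alpha_B / alpha),
    }


# Example vi: (A) the imbecile, (B) deaf-mutes.
result = comparisons(N=32_528_000, A=48_882, B=15_246, AB=451)
for label, (left, right) in result.items():
    # Print both sides as rates per thousand.
    print(label, round(1000 * left, 1), round(1000 * right, 1))
```

Because (A)/N and (B)/N are so small here, the right-hand sides of (a) and (c), and of (b) and (d), agree to one decimal place per thousand, which is why (a) or (b) may safely be used.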

Example vii.—Eye-colour of father and son (material due

to Sir Francis Galton, as given by Professor Karl Pearson, Phil.

Trans., A, vol. cxcv. (1900), p. 138; the classes 1, 2, and 3 of the

memoir treated as light).

Fathers with light eyes and sons with light eyes (AB)            471
Fathers with light eyes and sons with not light eyes (Aβ)        151
Fathers with not light eyes and sons with light eyes (αB)        148
Fathers with not light eyes and sons with not light eyes (αβ)    230

Required to find whether the colour of the son's eyes is associated with that of the father's. In cases of this kind the father is reckoned once for each son; e.g. a family in which the father was light-eyed, two sons light-eyed and one not, would be reckoned as giving two to the class AB and one to the class Aβ.

The best comparison here is:

Percentage of light-eyed amongst the sons of light-eyed fathers        76 per cent.
Percentage of light-eyed amongst the sons of not-light-eyed fathers    39 per cent.

But the following is equally valid:

Percentage of light-eyed amongst the fathers of light-eyed sons        76 per cent.
Percentage of light-eyed amongst the fathers of not-light-eyed sons    40 per cent.

The reason why the former comparison is preferable is that we usually wish to estimate the character of offspring from that of the parents, and define heredity in terms of the resemblance of offspring to parents. We do not, as a rule, want to make use of the power of estimating the character of parents from that of their offspring, nor do we define heredity in terms of the resemblance of parents to offspring. Both modes of statement, however, indicate equally clearly the tendency to resemblance between father and son.
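Both pairs of percentages follow directly from the four class-frequencies of Example vii. A short Python sketch, with variable names chosen for illustration only:

```python
# Class-frequencies of Example vii (A: father light-eyed, B: son light-eyed).
AB, A_beta, alpha_B, alpha_beta = 471, 151, 148, 230

# Sons classified by the father's eye-colour: (AB)/(A) against (αB)/(α).
sons_of_light = 100 * AB / (AB + A_beta)
sons_of_not_light = 100 * alpha_B / (alpha_B + alpha_beta)

# Fathers classified by the son's eye-colour: (AB)/(B) against (Aβ)/(β).
fathers_of_light = 100 * AB / (AB + alpha_B)
fathers_of_not_light = 100 * A_beta / (A_beta + alpha_beta)

print(round(sons_of_light), round(sons_of_not_light))
print(round(fathers_of_light), round(fathers_of_not_light))
```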

11. The values that the four second-order frequencies take in the case of independence, viz.

(A)(B)/N,   (α)(B)/N,   (A)(β)/N,   (α)(β)/N,

are of such great theoretical importance, and of so much use as reference-values for comparing with the actual values of the frequencies (AB), (αB), (Aβ) and (αβ), that it is often desirable to employ single symbols to denote them. We shall use the symbols:

(AB)0 = (A)(B)/N     (αB)0 = (α)(B)/N
(Aβ)0 = (A)(β)/N     (αβ)0 = (α)(β)/N

If δ denote the excess of (AB) over (AB)0, then we have:

(αB) = (B) − (AB) = (B) − (AB)0 − δ
     = (αB)0 − δ.
∴ (AB) − (AB)0 = (αB)0 − (αB).

Similarly it may be shown that:

(Aβ) = (Aβ)0 − δ
(αβ) = (αβ)0 + δ

Therefore, quite generally, we have:

(AB) − (AB)0 = (αβ) − (αβ)0 = (Aβ)0 − (Aβ) = (αB)0 − (αB).

Supposing, for example,

N = 100     (A) = 60     (B) = 45

then

(AB)0 = 27     (αB)0 = 18     (Aβ)0 = 33     (αβ)0 = 22.

If, now, A and B are positively associated, and (AB) = say 35, then (αB) = 45 − 35 = 10, (Aβ) = 60 − 35 = 25, (αβ) = 100 − 60 − 45 + 35 = 30, and we have:

35 − 27 = 30 − 22 = 18 − 10 = 33 − 25 = 8.

Similarly, if A and B be disassociated and (AB) = say 19, the student will find that:

(AB) = 19     (αB) = 26     (Aβ) = 41     (αβ) = 14

and 19 − 27 = 14 − 22 = 18 − 26 = 33 − 41 = −8.
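The worked figures of this section can be reproduced mechanically. The following Python sketch is illustrative; the function names `second_order` and `independence_values` are assumptions of the sketch.

```python
def second_order(N, A, B, AB):
    """Return (AB), (αB), (Aβ), (αβ) from N, (A), (B) and (AB)."""
    return AB, B - AB, A - AB, N - A - B + AB


def independence_values(N, A, B):
    """Return (AB)0, (αB)0, (Aβ)0, (αβ)0, the values under independence."""
    alpha, beta = N - A, N - B
    return A * B / N, alpha * B / N, A * beta / N, alpha * beta / N


expected = independence_values(100, 60, 45)

for AB in (35, 19):
    observed = second_order(100, 60, 45, AB)
    # Excess of each observed frequency over its independence value.
    excess = [o - e for o, e in zip(observed, expected)]
    print(AB, observed, excess)
```

In each case the four differences are equal in magnitude, positive for (AB) and (αβ) and negative for (αB) and (Aβ) when the association is positive, and reversed when it is negative.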

12. The value of this common difference δ may be expressed in a form that it is useful to note. We have by definition

δ = (AB) − (AB)0 = (AB) − (A)(B)/N.

Bring the terms on the right to a common denominator, and express all the frequencies of the numerator in terms of those of the second order; then we have:

δ = (1/N) {(AB)[(AB) + (αB) + (Aβ) + (αβ)] − [(AB) + (Aβ)][(AB) + (αB)]}
  = (1/N) {(AB)(αβ) − (αB)(Aβ)}.

That is to say, the common difference is equal to 1/Nth of the difference of the "cross products" (AB)(αβ) and (αB)(Aβ); e.g. taking the examples of § 11, we have

δ = (1/100) {35 × 30 − 25 × 10} = 8
and δ = (1/100) {19 × 14 − 26 × 41} = −8.

It is evident that the difference of the cross-products may be very large if N be large, although δ is really very small. In using the difference of the cross-products to test mentally the sign of the association in a case where all the four second-order frequencies are given, this should be remembered: the difference should be compared with N, or it will be liable to suggest a higher degree of association than actually exists.
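The agreement of the cross-product form with the definition of δ can be verified on the two examples of § 11. A minimal Python sketch, with function names chosen for illustration:

```python
def delta_from_definition(AB, A_beta, alpha_B, alpha_beta):
    """δ = (AB) − (A)(B)/N, with (A), (B), N built from the four classes."""
    N = AB + A_beta + alpha_B + alpha_beta
    A, B = AB + A_beta, AB + alpha_B
    return AB - A * B / N


def delta_from_cross_products(AB, A_beta, alpha_B, alpha_beta):
    """δ = {(AB)(αβ) − (αB)(Aβ)} / N."""
    N = AB + A_beta + alpha_B + alpha_beta
    return (AB * alpha_beta - alpha_B * A_beta) / N


# The two cases of § 11, given as (AB), (Aβ), (αB), (αβ).
for table in [(35, 25, 10, 30), (19, 41, 26, 14)]:
    print(delta_from_definition(*table), delta_from_cross_products(*table))
```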

Example viii.—The following data were observed for hybrids of



Datura (W. Bateson and Miss Saunders, Report to the Evolution

Committee of the Royal Society, 1902):—

Flowers violet, fruits prickly (AB)     47
Flowers violet, fruits smooth (Aβ)      12
Flowers white, fruits prickly (αB)      21
Flowers white, fruits smooth (αβ)        3

Investigate the association between colour of flower and character of fruit.

Since 3 × 47 = 141, 12 × 21 = 252, i.e. (AB)(αβ) < (αB)(Aβ), there is clearly a negative association; 252 − 141 = 111, and at first sight this considerable difference is apt to suggest a considerable association. But δ is only 111/83 = 1.3 in magnitude, so that in point of fact the association is small, so small that no stress can be laid on it as indicating anything but a fluctuation of sampling. Working out the percentages we have:

Percentage of violet-flowered plants with prickly fruits    80 per cent.
Percentage of white-flowered plants with prickly fruits     88 per cent.

13. While the methods used in the preceding pages suffice for most practical purposes, it is often very convenient to measure the intensities of association in different cases by means of some formula or "coefficient," so devised as to be zero when the attributes are independent, +1 when they are completely associated, and −1 when they are completely disassociated, in the sense of § 6. If we use the term "complete association" in the wider sense there defined, we have, grouping the frequencies in a small table in a way that is sometimes convenient, the three cases of complete association:

























In the first case all A's are B, and so (Aβ) = 0; in the second all B's are A, and so (αB) = 0; and in the third case we have (A) = (B) = (AB), so that all A's are B and also all B's are A. The three corresponding cases of complete disassociation are:


























It is required to devise some formula which shall give the value +1 in the first three cases, −1 in the second three, and shall also be zero where the attributes are independent. Many such formulae may be devised, but perhaps the simplest possible is the expression

Q = {(AB)(αβ) − (Aβ)(αB)} / {(AB)(αβ) + (Aβ)(αB)} = Nδ / {(AB)(αβ) + (Aβ)(αB)}

where δ is the symbol used in the two last sections for the difference (AB) − (AB)0. It is evident that Q is zero when the attributes are independent, for then δ is zero: it takes the value +1 when there is complete association, for then the second term in both numerator and denominator of the first form of the expression is zero: similarly it is −1 where there is complete disassociation, for then the first term in both numerator and denominator is zero. Q may accordingly be termed a coefficient of association. As illustrations of the values it will take in certain cases, the association between deaf-mutism and imbecility, on the basis of the English census figures (Example vi.) is +0.91; between light eye colour in father and in son (Example vii.) +0.66; between colour of flower and prickliness of fruit in Datura (Example viii.) −0.28, an association which, however, as already stated, is probably of no practical significance and due to mere fluctuations of sampling.
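The three illustrative values of Q can be recomputed as follows. This is a sketch in Python under stated assumptions: the function names are illustrative, and the Datura class-frequencies (47, 12, 21, 3) are inferred from the cross-products 3 × 47 and 12 × 21 and the total 83 quoted in Example viii.

```python
def coefficient_q(AB, A_beta, alpha_B, alpha_beta):
    """Q = {(AB)(αβ) − (Aβ)(αB)} / {(AB)(αβ) + (Aβ)(αB)}."""
    first = AB * alpha_beta
    second = A_beta * alpha_B
    return (first - second) / (first + second)


def classes_from_totals(N, A, B, AB):
    """Return (AB), (Aβ), (αB), (αβ) from N, (A), (B) and (AB)."""
    return AB, A - AB, B - AB, N - A - B + AB


q_vi = coefficient_q(*classes_from_totals(32_528_000, 48_882, 15_246, 451))
q_vii = coefficient_q(471, 151, 148, 230)
q_viii = coefficient_q(47, 12, 21, 3)   # frequencies inferred, see above

print(round(q_vi, 2), round(q_vii, 2), round(q_viii, 2))

# Limiting cases: (Aβ) = 0 gives +1, (AB) = 0 gives −1,
# and equal cross-products (independence) give 0.
print(coefficient_q(10, 0, 5, 20), coefficient_q(0, 10, 5, 20),
      coefficient_q(256, 48, 768, 144))
```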

The coefficient is only mentioned here to direct the attention

of the student to the possibility of forming such a measure of

association, a measure which serves a similar purpose in the case

of attributes to that served by certain other coefficients in the cases

of manifold classification (cf. Chap. V.) and of variables (cf.


Chap. IX., and the references to Chaps. X. and XVI.). For

further illustrations of the use of this coefficient the reader is

referred to the reference (1) at the end of this chapter; and for a

mode of deducing another coefficient, based on theorems in the

theory of variables, which has come into more general use, to ref.

(3). Reference should also be made to § 10 of Chap. XI.

14. In concluding this chapter, it may be well to repeat, for the sake of emphasis, that (cf. § 5) the mere fact of 80, 90, or 99 per cent. of A's being B implies nothing as to the association of A with B; in the absence of information, we can but assume that 80, 90, or 99 per cent. of α's may also be B. In order to apply the criterion of independence for two attributes A and B, it is necessary to have information concerning α's and β's as well as A's and B's, or concerning a universe that includes both α's and A's, β's and B's. Hence an investigation as to the causal relations of an attribute A must not be confined to A's, but must be extended to α's (unless, of course, the necessary information as to α's is already obtainable): no comparison is otherwise possible. It would be no use to obtain with great pains the result (cf. Example vi.), that 29.6 per thousand of deaf-mutes were imbecile unless we knew that the proportion of imbeciles in the whole population was only 1.5 per thousand; nor would it contribute anything to our knowledge of the heredity of deaf-mutism to find out the proportion of deaf-mutes amongst the offspring of deaf-mutes unless the proportions amongst the offspring of normal individuals were also investigated or known.


(1) Yule, G. U., "On the Association of Attributes in Statistics," Phil. Trans. Roy. Soc., Series A, vol. cxciv., 1900, p. 257. (Deals fully with the theory of association, including the association coefficient of § 13.)

(2) Yule, G. U., "Notes on the Theory of Association of Attributes in Statistics," Biometrika, vol. ii., 1903, p. 121. (Contains an abstract of the principal portions of (1) and other matter.)

(3) Pearson, Karl, "On the Correlation of Characters not Quantitatively Measurable," Phil. Trans. Roy. Soc., Series A, vol. cxcv., 1900, p. 1. (Deals with the problem of measurement of intensity of association from the standpoint of the theory of variables, giving a method which has since been largely used: only the advanced student will be able to follow the work.)

(4) Lipps, G. F., "Die Bestimmung der Abhängigkeit zwischen den Merkmalen eines Gegenstandes," Berichte d. math.-phys. Klasse d. kgl. Sächsischen Gesellschaft d. Wissenschaften; Leipzig, Feb. 1905. (Deals with the general theory of the dependence between two characters, however classified: the coefficient of association of § 13 is again suggested independently.)



1. At the census of England and Wales in 1901 there were (to the nearest

1000) 15,729,000 males and 16,799,000 females; 3497 males were returned

as deaf-mutes from childhood, and 3072 females.

State proportions exhibiting the association between deaf-mutism from

childhood and sex. How many of each sex for the same total number would

have been deaf-mutes if there had been no association?

2. Show, as briefly as possible, whether A and B are independent,

positively associated, or negatively associated in each of the following cases :—

(a) N = 5000      (A) = 2350     (B) = 3100     (AB) = 1600
(b) (A) = 490     (AB) = 294     (α) = 570      (αB) = 380
(c) (AB) = 256    (αB) = 768     (Aβ) = 48      (αβ) = 144

3. (Figures derived from Darwin's "Cross- and Self-fertilisation of Plants,"

cf. ref. 1, p. 294.) The table below gives the numbers of plants of certain

species that were above or below the average height, stating separately those

that were derived from cross-fertilised and from self-fertilised parentage.

Investigate the association between height and cross-fertilisation of parent-

age, and draw attention to any special points you notice.

[Table: numbers of plants above and below the average height, under the headings "Parentage Cross-fertilised. Height" and "Parentage Self-fertilised. Height," for the species Ipomœa purpurea, Petunia violacea, Reseda lutea, Reseda odorata and Lobelia fulgens. The frequencies are not recoverable here.]




















4. (Figures from same source as Example vii. p. 34, but material differently

grouped ; classes 7 and 8 of the memoir treated as "dark.") Investigate the

association between darkness of eye-colour in father and son from the following


Also tabulate for comparison the frequencies that would have been observed

had there been strict independence between eye colour of husband and eye

colour of wife, i.e. the values of (AB)0, etc., as in question 4.

6. (Figures from the Census of England and Wales, 1891, vol. iii.: the data

cannot be regarded as trustworthy.) The figures given below show the

number of males in successive age groups, together with the number of the

blind (A), of the mentally-deranged (B), and the blind mentally-deranged

(AB). Trace the association between blindness and mental derangement

from childhood to old age, tabulating the proportions of insane amongst the

whole population and amongst the blind, and also the association coefficient.

Give a short verbal statement of your results.

[Table: males in successive age groups, with the numbers of the blind (A), the mentally-deranged (B) and the blind mentally-deranged (AB). Only the fragment "75 and" of one age-group label survives; the frequencies are not recoverable here.]



























7. Show that if

(AB)1    (αB)1    (Aβ)1    (αβ)1
(AB)2    (αB)2    (Aβ)2    (αβ)2

be two aggregates corresponding to the same values of (A), (B), (α), and (β),

(AB)1 − (AB)2 = (αB)2 − (αB)1 = (Aβ)2 − (Aβ)1 = (αβ)1 − (αβ)2.

8. Show that if

δ = (AB) − (AB)0,

(AB)² + (αβ)² − (αB)² − (Aβ)² = [(A) − (α)][(B) − (β)] + 2Nδ.



1-2. Uncertainty in interpretation of an observed association—3-5. Source of

the ambiguity: partial associations—6-8. Illusory association due

to the association of each of two attributes with a third—9. Estima-

tion of the partial associations from the frequencies of the second

order—10-12. The total number of associations for a given number

of attributes—13-14. The case of complete independence.

1. If we find that in any given case

(AB) > or < (A)(B)/N

all that is known is that there is a relation of some sort or kind between A and B. The result by itself cannot tell us whether the relation is direct, whether possibly it is only due to "fluctuations of sampling" (cf. Chap. III. §§ 7-8), or whether it is of any other particular kind that we may happen to have in our minds at the moment. Any interpretation of the meaning of the association is necessarily hypothetical, and the number of possible alternative hypotheses is in general considerable.

2. The commonest of all forms of alternative hypothesis is of

this kind: it is argued that the relation between the two attributes

A and B is not direct, but due, in some way, to the association of

A with C and of B with C. An illustration or two will make the

matter clearer:—

(1) An association is observed between "vaccination" and

"exemption from attack by small-pox," i.e. more of the vaccinated

than of the unvaccinated are exempt from attack. It is argued

that this does not imply a protective effect of vaccination, but is

wholly due to the fact that most of the unvaccinated are drawn from

the lowest classes, living in very unhygienic conditions. Denoting

vaccination by A, exemption from attack by B, hygienic conditions by

C, the argument is that the observed association between A and B


is due to the associations of both with C.


(2) It is observed, at a general election, that a greater

proportion of the candidates who spent more money than their

opponents won their elections than of those who spent less. It

is argued that this does not mean an influence of expenditure on

the result of elections, but is due to the fact that Conservative

principles generally carried the day, and that the Conservatives

generally spent more than the Liberals. Denoting winning by A,

spending more than the opponent by B, and Conservative by C, the

argument is the same as the above (cf. Question 9 at the end of

the chapter).

(3) An association is observed between the presence of some

attribute in the father and its presence in the son; and also

between the presence of the attribute in the grandfather and its


presence in the grandson. Denoting the presence of the attribute

in son, father, and grandfather by A, B, and C, the question arises

whether the association between A and C may not be due solely

to the associations between A and B, B and C, respectively.

3. The ambiguity in such cases evidently arises from the fact

that the universe of observation, in each case, contains not

merely objects possessing the third attribute alone, or objects

not possessing it, but both.

If the universe were restricted to either class alone the given

ambiguity would not arise, though of course others might remain.

Thus, in the first illustration, if the statistics of vaccination

and attack were drawn from one narrow section of the population

living under approximately the same hygienic conditions, and an

association were still observed between vaccination and exemption

from attack, the supposed argument would be refuted. The fact

would prove that the association between vaccination and

exemption could not be wholly due to the association of both with


hygienic conditions.

Again, in the second illustration, if we confine our attention to

the "universe " of Conservatives (instead of dealing with candidates

of both parties together), and compare the percentages of Conserva-

tives winning elections when they spend more than their opponents

and when they spend less, we shall avoid the possible fallacy. If

the percentage is greater in the former case than in the latter, it

cannot be for the reasons suggested in § 2.

The biological case of the third illustration should be similarly

treated. If the association between A and C be observed for

those cases in which all the parents, say, possess the attribute, or

else all do not, and it is still sensible, then the association first

observed between A and C for the whole universe cannot have

been due solely to the observed associations between A and B, B

and C.

4. The associations observed between the attributes A and B in the universe of C's and the universe of γ's may be termed partial associations, to distinguish them from the total associations observed between A and B in the universe at large. In terms of the definition of § 5 of Chap. III., A and B will be said to be positively associated in the universe of C's (cf. § 4 of Chap. II.) when

(ABC) > (AC)(BC)/(C)          (1)

and negatively associated in the converse case.

As in the simpler case, the association is most simply tested by a comparison of percentages or proportions (§ 9, Chap. III.), although for some purposes the "coefficient of association" may be useful. Confining our attention to the more fundamental method, if A and B are positively associated within the universe of C's, we must have, to quote only the four most convenient comparisons (cf. (4) (a)-(d), Chap. III. p. 31),

(a) (ABC)/(BC) > (AC)/(C)        (b) (ABC)/(AC) > (BC)/(C)
(c) (ABC)/(BC) > (AβC)/(βC)      (d) (ABC)/(AC) > (αBC)/(αC)          (2)

These inequalities may easily be rewritten for any other case by making the proper substitutions in the symbols; thus to obtain the inequalities for testing the association between A and C in the universe of B's, B must be written for C, β for γ, and vice versa, throughout; it being remembered that the order of the letters in the class symbol is immaterial. The remarks of § 10, Chap. III., as to the choice of the comparison to be used, apply of course equally to the present case.

5. Though we shall confine ourselves in the present work to the detailed discussion of the case of three attributes, it should be noticed that precisely similar conceptions and formulae to the above apply in the general case where more than three attributes have been noted, or where the relations of more than three have to be taken into account. If, when it is observed that A and B are still associated within the universe of C's, it is argued that this is due to the association of both A and B with D, the argument may be tested by still further limiting the field of observation to the universe CD. If

(ABCD) > (ACD)(BCD)/(CD)

A and B are positively associated within the universe of CD's, and the association cannot be wholly ascribed to the presence and absence of D as suggested, nor to the presence and absence of C and D conjointly. If it be then argued that the presence and absence of E is the source of association, the process may be repeated as before, the association of A and B being tested for the universe CDE, and so on as far as practicable.

Partial associations thus form the basis of discussion for any

case, however complicated. The two following examples will

serve as illustrations for the case of three attributes.

Example i.—(Material from ref. 5 of Chap. I.)

The following are the proportions per 10,000 of boys observed

with certain classes of defects, amongst a number of school

children. (A) denotes the number with development defects, (B)

with nerve-signs, (D) the number of the "dull."

N       10,000     (AB)      338
(A)        877     (AD)      338
(B)      1,086     (BD)      455
(D)        789     (ABD)     153

The Report from which the figures are drawn concludes that "the

connecting link between defects of body and mental dulness is

the coincident defect of brain which may be known by observation

of abnormal nerve-signs." Discuss this conclusion.

The phrase "connecting link" is a little vague, but it may mean that the mental defects indicated by nerve-signs B may give rise to development-defects A, and also to mental-dulness D; A and D being thus common effects of the same cause B (or another attribute necessarily indicated by B), and not directly influencing each other. The case is thus similar to that of the first illustration of § 2 (liability to small-pox and to non-vaccination being held to be common effects of the same circumstances), and may be similarly treated by investigation of the partial associations between A and D for the universes B and β. As the ratios (A)/N, (B)/N, (D)/N are small, comparisons of the form (4) (a) or (b) of Chap. III. (p. 31), or (2) (a) (b) above, may very well be used (cf. the remarks in § 10 of the same chapter, pp. 31-2).

The following figures illustrate, then, the association between A and D for the whole universe, the B-universe and the β-universe:

For the entire material:
Proportion of the dull = (D)/N = 789/10,000 = 7.9 per cent.
Proportion of the defectively developed who were dull = (AD)/(A) = 338/877 = 38.5 per cent.

For those exhibiting nerve-signs:
Proportion of the dull = (BD)/(B) = 455/1,086 = 41.9 per cent.
Proportion of the defectively developed who were dull = (ABD)/(AB) = 153/338 = 45.3 per cent.

For those not exhibiting nerve-signs:
Proportion of the dull = (βD)/(β) = 334/8,914 = 3.7 per cent.
Proportion of the defectively developed who were dull = (AβD)/(Aβ) = 185/539 = 34.3 per cent.

The results are extremely striking; the association between A and D is very high indeed both for the material as a whole (the universe at large) and for those not exhibiting nerve-signs (the β-universe), but it is very small for those who do exhibit nerve-signs (the B-universe).

This result does not appear to be in accord with the conclusion of the Report, as we have interpreted it, for the association between A and D in the β-universe should in that case have been very low instead of very high.
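The total and partial comparisons of Example i. can be derived from the eight given frequencies, the β-universe figures being obtained by subtraction. The Python sketch below is illustrative only; the helper name `pct` and the variable names are assumptions of the sketch.

```python
# Frequencies of Example i (per 10,000 boys).
N, A, B, D = 10_000, 877, 1_086, 789
AB, AD, BD, ABD = 338, 338, 455, 153


def pct(part, whole):
    """Percentage to one decimal place."""
    return round(100 * part / whole, 1)


# Each pair is (proportion dull in the universe,
#               proportion dull among the defectively developed in it).
whole_universe = (pct(D, N), pct(AD, A))
b_universe = (pct(BD, B), pct(ABD, AB))
# β-universe frequencies follow by subtraction: (βD) = (D) − (BD), etc.
beta_universe = (pct(D - BD, N - B), pct(AD - ABD, A - AB))

print(whole_universe, b_universe, beta_universe)
```

The pair for the B-universe differs little, while the pairs for the whole universe and for the β-universe differ widely, which is the contrast the text draws.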

Example ii.—Eye-colour of grandparent, parent and child.

(Material from Sir Francis Galton's Natural Inheritance (1889),

table 20, p 216. The table only gives particulars for 78 large

families with not less than 6 brothers or sisters, so that the

material is hardly entirely representative, but serves as a good

illustration of the method.) The original data are treated as in

Example vii. of the last chapter (p. 34). Denoting a light-eyed

child by A, parent by B, grandparent by C, every possible line of

descent is taken into account. Thus, taking the following two

lines of the table,



        Children.                  Parents.                Grandparents.
     A.          α.             B.          β.             C.          γ.
Light-eyed. Not-light-eyed. Light-eyed. Not-light-eyed. Light-eyed. Not-light-eyed.
     4           5              1           1              1           3
     3           4              1           1              4           0







the first would give 4 × 1 × 1 = 4 to the class ABC, 4 × 1 × 3 = 12 to
the class ABγ, 4 to AβC, 12 to Aβγ, 5 to αBC, 15 to αBγ, 5 to
αβC, and 15 to αβγ; the second would give 3 × 1 × 4 = 12 to the
class ABC, 12 to AβC, 16 to αBC, 16 to αβC, and none to the
remainder. The class-frequencies so derived from the whole table are,

(ABC)   1928        (αBC)   303
(ABγ)    596        (αBγ)   225
(AβC)    552        (αβC)   395
(Aβγ)    508        (αβγ)   501

















The following comparisons indicate the association between

grandparents and parents, parents and children, and grand-

parents and grandchildren, respectively :—

Grandparents and Parents.

Proportion of light-eyed amongst the children of light-eyed grandparents = (BC)/(C) = 2231/3178 = 70.2 per cent.
Proportion of light-eyed amongst the children of not-light-eyed grandparents = (Bγ)/(γ) = 821/1830 = 44.9 per cent.

Parents and Children.

Proportion of light-eyed amongst the children of light-eyed parents = (AB)/(B) = 2524/3052 = 82.7 per cent.
Proportion of light-eyed amongst the children of not-light-eyed parents = (Aβ)/(β) = 1060/1956 = 54.2 per cent.

In both the above cases we are really dealing with the

association between parent and offspring, and consequently the

intensity of association is, as might be expected, approximately

the same; in the next case it is naturally lower:—

Grandparents and Grandchildren.

Proportion of light-eyed amongst the grandchildren of light-eyed grandparents = (AC)/(C) = 2480/3178 = 78.0 per cent.
Proportion of light-eyed amongst the grandchildren of not-light-eyed grandparents = (Aγ)/(γ) = 1104/1830 = 60.3 per cent.

We proceed now to test the partial associations between grandparents
and grandchildren, as distinct from the total associations

given above, in order to throw light on the real nature of the

inheritance. There are two such partial associations to be

tested: (1) where the parents are light-eyed, (2) where they are

not-light-eyed. The following are the comparisons :—

Grandparents and Grandchildren: Parents light-eyed.

Proportion of light-eyed amongst the grandchildren of light-eyed grandparents = (ABC)/(BC) = 1928/2231 = 86.4 per cent.
Proportion of light-eyed amongst the grandchildren of not-light-eyed grandparents = (ABγ)/(Bγ) = 596/821 = 72.6 per cent.

Grandparents and Grandchildren: Parents not-light-eyed.

Proportion of light-eyed amongst the grandchildren of light-eyed grandparents = (AβC)/(βC) = 552/947 = 58.3 per cent.
Proportion of light-eyed amongst the grandchildren of not-light-eyed grandparents = (Aβγ)/(βγ) = 508/1009 = 50.3 per cent.

In both cases the partial association is quite well-marked and

positive; the total association between grandparents and grand-

children cannot, then, be due wholly to the total associations

between grandparents and parents, parents and children, re-

spectively. There is an ancestral heredity, as it is termed, as

well as a parental heredity.
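The total and partial comparisons of Example ii. can be re-derived mechanically. The following is a minimal sketch, not part of the original text: the variable names are illustrative, the four frequencies with a light-eyed child are those quoted in the comparisons, and (βγ) = 1009 is assumed to be obtained as (β) − (βC) = 1956 − 947.

```python
# Class-frequencies quoted in Example ii (A: light-eyed child,
# B: light-eyed parent, C: light-eyed grandparent; lower-case b, g = "not").
ABC, ABg, AbC, Abg = 1928, 596, 552, 508   # (ABC), (ABγ), (AβC), (Aβγ)
BC, Bg, bC, bg = 2231, 821, 947, 1009      # (BC), (Bγ), (βC), (βγ)


def pct(numerator, denominator):
    """Proportion expressed as a percentage, to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Total association: grandchildren compared across all parents.
total = (pct(ABC + AbC, BC + bC), pct(ABg + Abg, Bg + bg))   # (AC)/(C), (Aγ)/(γ)

# Partial associations: the same comparison within each parental sub-universe.
partial_light_parents = (pct(ABC, BC), pct(ABg, Bg))         # universe of B's
partial_other_parents = (pct(AbC, bC), pct(Abg, bg))         # universe of β's

for label, (with_c, with_gamma) in [
    ("all parents", total),
    ("parents light-eyed", partial_light_parents),
    ("parents not-light-eyed", partial_other_parents),
]:
    print(f"{label}: {with_c} vs {with_gamma} per cent.")
```

In each pair the first percentage exceeds the second, which is the sense in which both partial associations are positive.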

We need not discuss the partial association between children and

parents, as it is comparatively of little consequence. It may be

noted, however, as regards the above results, that the most

important feature may be brought out by stating three ratios.


If A and B are positively associated, (AB)/(B) > (A)/N.

If A and C are positively associated in the universe of B's,

(ABC)/(BC) > (AB)/(B). Hence (A)/N, (AB)/(B), and (ABC)/(BC)
form an ascending series. Thus we have from the given data—

Proportion of light-eyed amongst children in general = (A)/N = 71.6 per cent.
Proportion of light-eyed amongst the children of light-eyed parents = (AB)/(B) = 82.7 per cent.
Proportion of light-eyed amongst the children of light-eyed parents and grandparents = (ABC)/(BC) = 86.4 per cent.

If the great-grandparents, etc., etc., were also known, the series

might be continued, giving (ABCD)/(BCD), (ABCDE)/(BCDE),
and so forth. The series would probably ascend continuously
though with smaller intervals, A and D being positively associated
in the universe of BC's, A and E in the universe of BCD's, etc.

6. The above examples will serve to illustrate the practical

application of partial associations to concrete cases. The general

nature of the fallacies involved in interpreting associations

between two attributes as if they were necessarily due to the

most obvious form of direct causation is more clearly exhibited

by the following theorem :—

If A and B are independent within the universe of C's and also

within the universe of γ's, they will nevertheless be associated

within the universe at large, unless C is independent of either A

or B or both.

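The two data give—

$$(ABC) = \frac{(AC)(BC)}{(C)}$$

$$(AB\gamma) = \frac{(A\gamma)(B\gamma)}{(\gamma)} = \frac{[(A)-(AC)]\,[(B)-(BC)]}{(\gamma)}$$

Adding them together we have—

$$(AB) = \frac{N(AC)(BC) - (A)(C)(BC) - (B)(C)(AC) + (A)(B)(C)}{(C)(\gamma)}$$

Write, as in § 11 of Chap. III. (p. 35)—

$$(AB)_0 = \frac{(A)(B)}{N}, \quad (AC)_0 = \frac{(A)(C)}{N}, \quad (BC)_0 = \frac{(B)(C)}{N},$$

subtract (AB)_0 from both sides of the above equation, simplify,
and we have

$$(AB) - (AB)_0 = \frac{N}{(C)(\gamma)}\,[(AC) - (AC)_0]\,[(BC) - (BC)_0] \qquad (4)$$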

This proves the theorem; for the right-hand side will not be
zero unless either (AC) = (AC)_0 or (BC) = (BC)_0.
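A numerical check may make relation (4) more concrete. The sketch below is illustrative only: the frequencies are hypothetical, chosen so that A and B are exactly independent within the universe of C's and within the universe of γ's, and the variable names are not from the text.

```python
from fractions import Fraction

# Hypothetical frequencies, chosen so that A and B are independent
# inside the universe of C's and inside the universe of γ's.
C, AC, BC = 100, 60, 50          # (C), (AC), (BC)
g, Ag, Bg = 100, 20, 10          # (γ), (Aγ), (Bγ)
ABC = Fraction(AC * BC, C)       # independence within C: (ABC) = (AC)(BC)/(C)
ABg = Fraction(Ag * Bg, g)       # independence within γ: (ABγ) = (Aγ)(Bγ)/(γ)

# Totals for the universe at large.
N, A, B, AB = C + g, AC + Ag, BC + Bg, ABC + ABg


def expected(x, y):
    """Frequency of a pair under independence in the whole universe."""
    return Fraction(x * y, N)


# Left- and right-hand sides of relation (4).
lhs = AB - expected(A, B)
rhs = Fraction(N, C * g) * (AC - expected(A, C)) * (BC - expected(B, C))
print(lhs, rhs)
```

Here both A and B are positively associated with C, so the two sides agree on a positive value: an association between A and B appears in the universe at large although none exists in either sub-universe.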

7. The result indicates that, while no degree of heterogeneity

in the universe can influence the association between A and B

if all other attributes are independent of either A or B or both,

an illusory or misleading association may arise in any case where

there exists in the given universe a third attribute C with which

both A and B are associated (positively or negatively). If both

associations are of the same sign, the resulting illusory association

between A and B will be positive; if of opposite sign, negative.

The three illustrations of § 2 are all of the first kind. In (1) it

is argued that the positive associations between vaccination and

hygienic conditions, exemption from attack and hygienic conditions,

give rise to an illusory positive association between vaccination

and exemption from attack. In (2) it is argued that the positive

associations between conservative and winning, conservative and

spending more, give rise to an illusory positive association between

winning and spending more. In (3) the question is raised whether

the positive association between grandparent and grandchild may

not be due solely to the positive associations between grandparent

and parent, parent and child.

Misleading associations of this kind may easily arise through


the mingling of records, e.g. respecting the two sexes, which a

careful worker would keep distinct.

Take the following case, for example. Suppose there have been

200 patients in a hospital, 100 males and 100 females, suffering

from some disease. Suppose, further, that the death-rate for males

(the case mortality) has been 30 per cent., for females 60 per cent.

A new treatment is tried on 80 per cent. of the males and 40 per

cent. of the females, and the results published without distinction

of sex. The three attributes, with the relations of which we are

here concerned, are death, treatment and male sex. The data show

that more males were treated than females, and more females

died than males; therefore the first attribute is associated nega-

tively, the second positively, with the third. It follows that there

will be an illusory negative association between the first two—

death and treatment. If the treatment were completely inefficient

we would, in fact, have the following results :—




                                   Males.   Females.   Total.
Treated and died . . . . . .         24        24        48
Treated and did not die  . .         56        16        72
Not treated and died . . . .          6        36        42
Not treated and did not die          14        24        38




i.e. of the treated, only 48/120 = 40 per cent. died, while of those

not treated 42/80 = 52.5 per cent. died. If this result were stated

without any reference to the fact of the mixture of the sexes, to

the different proportions of the two that were treated and to the

different death-rates under normal treatment, then some value in

the new treatment would appear to be suggested. To make

a fair return, either the results for the two sexes should be

stated separately, or the same proportion of the two sexes

must receive the experimental treatment. Further, care would

have to be taken in such a case to see that there was no

selection (perhaps unconscious) of the less severe cases for treat-

ment, thus introducing another source of fallacy (death positively

associated with severity, treatment negatively associated with

severity, giving rise to illusory negative association between

treatment and death).
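The hospital illustration can be reproduced directly from the stated percentages. The following sketch assumes, as the text does, that the treatment has no effect whatever, so that the death-rate within each sex is the same for treated and untreated; the data structure and names are illustrative.

```python
from fractions import Fraction

# Figures from the hospital illustration; the treatment is assumed to have no effect.
patients = {"male": 100, "female": 100}
death_rate = {"male": Fraction(30, 100), "female": Fraction(60, 100)}
treated_share = {"male": Fraction(80, 100), "female": Fraction(40, 100)}

table = {}  # (sex, treated?, died?) -> number of patients
for sex, n in patients.items():
    for treated in (True, False):
        group = n * (treated_share[sex] if treated else 1 - treated_share[sex])
        table[sex, treated, True] = group * death_rate[sex]
        table[sex, treated, False] = group * (1 - death_rate[sex])


def pooled_death_rate(treated):
    """Death-rate for treated (or untreated) patients with the sexes mixed."""
    died = sum(v for (s, t, d), v in table.items() if t == treated and d)
    total = sum(v for (s, t, d), v in table.items() if t == treated)
    return died / total


print(float(pooled_death_rate(True)), float(pooled_death_rate(False)))
```

The pooled rates differ although the rates within each sex are identical by construction, which is the illusory association the passage describes.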

A misleading association between the characters of parent and

offspring might similarly be created if the records for male-male

and female-female lines of descent were mixed. Thus suppose 50

per cent. of males and 10 per cent. of females exhibit some

attribute for which there is no association in either line, then we

would have for each line and for a mixed record of equal
numbers:—

                                                  Male line.     Female line.   Mixed record.
Parents with attribute and children with . . .   25 per cent.    1 per cent.    13 per cent.
Parents with attribute and children without  .   25 per cent.    9 per cent.    17 per cent.
Parents without attribute and children with  .   25 per cent.    9 per cent.    17 per cent.
Parents without attribute and children without   25 per cent.   81 per cent.    53 per cent.

Here 13/30 = 43 per cent. of the offspring of parents with the

attribute possess the attribute themselves, but only 17/70 = 24

per cent. of the offspring of parents without the attribute. The

association between attribute in parent and attribute in offspring

is, however, due solely to the association of both with male sex.

The student will see that if records for male-female and female-

male lines were mixed, the illusory association would be negative,

and that if all four lines were combined there would be no illusory

association at all.
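The three statements about mixed lines of descent can be verified with a short calculation. This is a sketch under the text's assumptions (50 per cent. of males and 10 per cent. of females show the attribute, no association within any single line, and equal numbers from each line); the function names are hypothetical.

```python
from fractions import Fraction


def line(p_parent, p_child):
    """Percentages of the four parent/child classes when there is no association."""
    return [100 * a * b
            for a in (p_parent, 1 - p_parent)
            for b in (p_child, 1 - p_child)]


def mix(*lines):
    """Mixed record formed from equal numbers of each line of descent."""
    return [sum(cells) / len(lines) for cells in zip(*lines)]


def child_rates(cells):
    """Share of children with the attribute: (parents with, parents without)."""
    ww, wo, ow, oo = cells
    return ww / (ww + wo), ow / (ow + oo)


m, f = Fraction(1, 2), Fraction(1, 10)   # 50 per cent. of males, 10 per cent. of females
same_sex = mix(line(m, m), line(f, f))
cross_sex = mix(line(m, f), line(f, m))
all_four = mix(line(m, m), line(f, f), line(m, f), line(f, m))

for name, cells in [("same-sex", same_sex), ("cross-sex", cross_sex), ("all four", all_four)]:
    print(name, [float(c) for c in cells], [float(r) for r in child_rates(cells)])
```

Mixing the two same-sex lines gives the positive illusory association, mixing the two cross-sex lines gives a negative one, and combining all four lines gives equal rates, that is, no association.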

8. Illusory associations may also arise in a different way

through the personality of the observer or observers. If the

observer's attention fluctuates, he may be more likely to notice

the presence of A when he notices the presence of B, and vice
versa; in such a case A and B (so far as the record goes) will both

be associated with the observer's attention C, and consequently

an illusory association will be created. Again, if the attributes

are not well defined, one observer may be more generous than

another in deciding when to record the presence of A and also

the presence of B, and even one observer may fluctuate in the

generosity of his marking. In this case the recording of A and

the recording of B will both be associated with the generosity

of the observer in recording their presence, C, and an illusory

association between A and B will consequently arise, as before.


9. It is important to notice that, though we cannot actually

determine the partial associations unless the third-order frequency

(ABC) is given, we can make some conjecture as to their sign

from the values of the second-order frequencies.

Suppose, for instance, that—

$$(ABC) = \frac{(AC)(BC)}{(C)} + \delta_1, \qquad (AB\gamma) = \frac{(A\gamma)(B\gamma)}{(\gamma)} + \delta_2 \qquad (5)$$

so that δ1 and δ2 are positive or negative according as A and B
are positively or negatively associated in the universes of C and
γ respectively. Then we have by addition—

$$(AB) = \frac{(AC)(BC)}{(C)} + \frac{(A\gamma)(B\gamma)}{(\gamma)} + \delta_1 + \delta_2 \qquad (6)$$

Hence if the value of (AB) exceed the value given by the first
two terms (i.e. if δ1 + δ2 be positive), A and B must be positively
associated either in the universe of C's, the universe of γ's, or
both. If, on the other hand, (AB) fall short of the value given by
the first two terms, A and B must be negatively associated in
the universe of C's, the universe of γ's, or both. Finally, if
(AB) be equal to the value of the first two terms, A and B must
be positively associated in the one partial universe and negatively
in the other, or else independent in both.

The expression (6) may often be used in the following form,

obtained by dividing through by, say, (B)—

$$\frac{(AB)}{(B)} = \frac{(AC)}{(C)}\cdot\frac{(BC)}{(B)} + \frac{(A\gamma)}{(\gamma)}\cdot\frac{(B\gamma)}{(B)} + \frac{\delta_1 + \delta_2}{(B)} \qquad (7)$$

In using this expression we make use solely of proportions or

percentages, and judge of the sign of the partial associations

between A and B accordingly. A concrete case, as in Example iii.

below, is perhaps clearer than the general formula.

Example iii.—(Figures compiled from Supplement to the Fifty-

fifth Annual Report of the Registrar-General [C. — 8503], 1897.)

The following are the death-rates per thousand per annum, and the

proportions over 65 years of age, of occupied males in general,

farmers, textile workers, and glass workers (over 15 years of age

in each case) during the decade 1891-1900 in England and Wales.



                                      Death-rate        Number per thousand
                                      per thousand.     over 65 Years of Age.
Occupied males over 15 . . . . .          15.8
Farmers, males over 15 . . . . .          19.6                 132
Textile workers, males over 15 .          15.9                  34
Glass workers, males over 15 . .          16.6                  16

Would farming, textile working, and glass working seem to be
relatively healthy or unhealthy occupations, given that the
death-rates among occupied males from 15-65 and over 65 years of age
are 11.5 and 102.3 per thousand respectively?

If A denote death, B the given occupation, C old age, we have


to apply the principle of equation (7). Calculate what would be

the death-rate for each occupation on the supposition that the

death-rates for occupied males in general (11.5, 102.3) apply to

each of its separate age-groups (under 65, over 65), and see

whether the total death-rate so calculated exceeds or falls short

of the actual death-rate. If it exceeds the actual rate, the

occupation must on the whole be healthy; if it falls short, un-

healthy. Thus we have the following calculated death-rates:—

Farmers . . . . .   11.5 × .868 + 102.3 × .132 = 23.5.
Textile workers .   11.5 × .966 + 102.3 × .034 = 14.6.
Glass workers . .   11.5 × .984 + 102.3 × .016 = 13.0.

The calculated rate for farmers largely exceeds the actual rate;

farming, then, must on the whole, as one would expect, be

a healthy occupation. The death-rate for either young farmers

or old farmers, or both, must be less than for occupied males in

general (the last is actually the case); the high death-rate

observed is due solely to the large proportion of the aged. Textile

working, on the other hand, appears to be unhealthy (14.6 < 15.9),
and glass working still more so (13.0 < 16.6); the actual low total

death-rates are due merely to low proportions of the aged.

It is evident that age-distributions vary so largely from one

occupation to another that total death-rates are liable to be very

misleading—so misleading, in fact, that they are not tabulated at all

by the Registrar-General; only death-rates for narrow limits of age

(5 or 10 year age-classes) are worked out. Similar fallacies are

liable to occur in comparisons of local death-rates, owing to

variations not only in the relative proportions of the old, but also

in the relative proportions of the two sexes.

It is hardly necessary to observe that as age is a variable quantity,

the above procedure for calculating comparative death-rates
is extremely rough. The death-rate of those engaged in any

occupation depends not only on the mere proportions over and under

65, but on the relative numbers at every single year of age. The

simpler procedure brings out, however, better than a more complex

one, the nature of the fallacy involved in assuming that crude death-

rates are measures of healthiness. [See also Chap. XI. §§ 17-19.]
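The calculation of Example iii. amounts to applying the general age-specific death-rates to each occupation's own age-distribution. A minimal sketch follows; the proportions over 65 (.132, .034, .016) are read off the worked calculation in the text, and the names and the "healthy"/"unhealthy" labels are illustrative.

```python
# Death-rates per thousand for all occupied males, from Example iii.
RATE_UNDER_65, RATE_OVER_65 = 11.5, 102.3

# occupation -> (actual death-rate per thousand, proportion over 65)
occupations = {
    "farmers": (19.6, 0.132),
    "textile workers": (15.9, 0.034),
    "glass workers": (16.6, 0.016),
}


def calculated_rate(share_over_65):
    """Death-rate the occupation would show if each age-group died at the general rate."""
    return RATE_UNDER_65 * (1 - share_over_65) + RATE_OVER_65 * share_over_65


results = {}
for name, (actual, share) in occupations.items():
    calc = round(calculated_rate(share), 1)
    # A calculated rate above the actual rate points to a relatively healthy occupation.
    verdict = "healthy" if calc > actual else "unhealthy"
    results[name] = (calc, verdict)
    print(f"{name}: calculated {calc}, actual {actual} -> relatively {verdict}")
```

As the text warns, two age-groups are a very rough standard; the sketch only illustrates the direction of the comparison.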

Example iv.—Eye-colour in grandparent, parent and child.

(The figures are those of Example ii.)

A, light-eyed child; B, light-eyed parent; C, light-eyed grandparent.

N   = 5008      (AB) = 2524
(A) = 3584      (AC) = 2480
(B) = 3052      (BC) = 2231
(C) = 3178

Given only the above data, investigate whether there is probably
a partial association between child and grandparent.

If there were no partial association we would have—

(AC) = (AB)(BC)/(B) + (Aβ)(βC)/(β)
     = 2524 × 2231/3052 + 1060 × 947/1956
     = 1845.0 + 513.2
     = 2358.2.

Actually (AC) = 2480; there must, then, be partial association
either in the B-universe, the β-universe, or both. In the absence
of any reason to the contrary, it would be natural to suppose there
is a partial association in both; i.e. that there is a partial
association with the grandparent whether the line of descent
passes through "light-eyed" or "not-light-eyed" parents, but this
could not be proved without a knowledge of the class-frequency
(ABC).
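The arithmetic of Example iv. can be checked as follows. This is a sketch using only the first- and second-order frequencies given in the example; the variable names are illustrative, and the quantity called `excess` is the analogue, for the pair A and C, of the sum δ1 + δ2 of § 9.

```python
# Second-order data of Example iv (no third-order frequency is used).
N, A, B, C = 5008, 3584, 3052, 3178
AB, AC, BC = 2524, 2480, 2231

# Frequencies in the universe of β's follow by subtraction.
beta = N - B          # (β)
A_beta = A - AB       # (Aβ)
beta_C = C - BC       # (βC)

# Value (AC) would take if A and C were independent within B and within β.
expected_AC = AB * BC / B + A_beta * beta_C / beta
excess = AC - expected_AC   # positive: partial association in B, in β, or in both

print(round(expected_AC, 1), round(excess, 1))
```

A positive excess fixes only the sign of the sum of the two partial departures; which sub-universe carries the association cannot be settled without the third-order frequency.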


10. The total possible number of associations to be derived from

n attributes grows so rapidly with the value of n that the evalua-

tion of them all for any case in which n is greater than four

becomes almost unmanageable. For three attributes there are 9

possible associations—three totals, three partials in positive

universes, and three partials in negative universes. For four

attributes, the number of possible associations rises to 54,

for there are 6 pairs to be formed from four attributes, and

we can find 9 associations for each pair (1 total, 4 partials

with the universe specified by one attribute, and 4 partials

with the universe specified by two). For five attributes the

student will find that there are no less than 270, and for six

attributes 1215 associations.
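The counts quoted in this section follow a simple pattern, sketched below. The closed form is an inference from the enumeration given for three and four attributes (each pair of attributes, with every remaining attribute either present, absent, or left unspecified); it is not stated in the text, and the function name is hypothetical.

```python
from math import comb


def count_associations(n):
    """Number of total and partial associations among n attributes.

    Each of the C(n, 2) pairs can be examined with every other attribute
    either present, absent, or left unspecified: 3 ** (n - 2) universes.
    """
    return comb(n, 2) * 3 ** (n - 2)


counts = {n: count_associations(n) for n in range(3, 7)}
print(counts)
```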

As suggested by Examples i. and ii. above, however, it is not

necessary in any actual case to investigate all the associations

that are theoretically possible; the nature of the problem indicates

those that are required.

In Example i., for instance, the total and partial associations

between A and D were alone investigated; the associations between

A and B, B and D were not essential for answering the question

that was asked. In Example ii., again, the three total associations

and the partial association between A and C were worked out,

but the partial associations between A and B, B and C were

omitted as unnecessary. Practical considerations of this kind will

always lessen the amount of necessary labour.


11. It might appear, at first sight, that theoretical considera-

tions would enable us to lessen it still further. As we saw in

Chapter I., all class-frequencies can be expressed in terms of those

of the positive classes, of which there are 2^n in the case of n
attributes. For given values of the n + 1 frequencies N, (A), (B),
(C), ... of order lower than the second, assigned values of the
positive class-frequencies of the second and higher orders must
therefore correspond to determinate values of all the possible
associations. But the number of these positive class-frequencies
of the second and higher orders is only 2^n − (n + 1); therefore the
number of algebraically independent associations that can be
derived from n attributes is only 2^n − (n + 1). For successive

values of n this gives—





Hence if we give data, in any form, that determine four
associations in the case of three attributes, eleven in the case of
four attributes, and so on, in addition to N and the class-frequencies
of the first order, we have done all that is theoretically necessary.
The remaining associations can be deduced.
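The number of algebraically independent associations, 2^n − (n + 1), can be tabulated for successive values of n. The sketch below is illustrative; the range n = 2 to 6 is an arbitrary choice, and the function name is hypothetical.

```python
def independent_associations(n):
    """Positive class-frequencies of the second and higher orders: 2**n - (n + 1)."""
    return 2 ** n - (n + 1)


table = {n: independent_associations(n) for n in range(2, 7)}
print(table)
```

The values for n = 3 and n = 4 are the four and eleven associations mentioned in the text.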

12. Practically, however, the mere fact that they can be deduced

is of little help unless such deduction can be effected simply,

indeed almost directly, by mere mental arithmetic almost, and

this is not the case. The relations that exist between the ratios

or differences, such as (AB) − (AB)_0, that indicate the associations

are, in fact, so complex that an unknown association cannot be

determined from those that are given without more or less lengthy

work; it is not possible to infer even its sign by any simple

process of inspection. We have, for instance, from (5), by the

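process used in obtaining (4) for the special case of § 6—

$$\Big[(AB\gamma) - \frac{(A\gamma)(B\gamma)}{(\gamma)}\Big] = [(AB) - (AB)_0] - \frac{N}{(C)(\gamma)}\,[(AC) - (AC)_0]\,[(BC) - (BC)_0] - \Big[(ABC) - \frac{(AC)(BC)}{(C)}\Big]$$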



which gives us the difference of (ABγ) from the value it would
have if A and B were independent in the universe of γ's in terms

of the difference of (ABC) from the value it would have if A and


B were independent in the universe of C's, and the corresponding

differences for the frequencies (AB), (AC), and (BC). The four

quantities in the brackets on the right represent, say, the four

known associations, the bracket on the left the unknown association.

Clearly, the relation is not of such a simple kind that the term on

the left can be, in general, mentally evaluated. Hence in con-

sidering the choice and number of associations to be actually

tabulated, regard must be had to practical considerations rather

than to theoretical relations.

13. The particular case in which all the 2^n − (n + 1) given associations
are zero is worth some special investigation.

It follows, in the first place, that all other possible associations

must be zero, i.e. that a state of complete independence, as we

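may term it, exists. Suppose, for instance, that we are given—

$$(AB) = \frac{(A)(B)}{N}, \quad (AC) = \frac{(A)(C)}{N}, \quad (BC) = \frac{(B)(C)}{N}, \quad (ABC) = \frac{(AC)(BC)}{(C)} = \frac{(A)(B)(C)}{N^2}.$$

Then it follows at once that we have also—

$$(ABC) = \frac{(AB)(BC)}{(B)} = \frac{(AB)(AC)}{(A)},$$

i.e. A and C are independent in the universe of B's, and B and C
in the universe of A's. Again,

$$(AB\gamma) = (AB) - (ABC) = \frac{(A)(B)}{N} - \frac{(A)(B)(C)}{N^2} = \frac{(A)(B)(\gamma)}{N^2} = \frac{(A\gamma)(B\gamma)}{(\gamma)}.$$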

Therefore A and B are independent in the universe of γ's.
Similarly, it may be shown that A and C are independent in the
universe of β's, B and C in the universe of α's.

In the next place it is evident from the above that relations of

the general form (to write the equation symmetrically)

$$\frac{(ABC)}{N} = \frac{(A)}{N}\cdot\frac{(B)}{N}\cdot\frac{(C)}{N} \qquad (8)$$

must hold for every class-frequency. This relation is the general

form of the equation of independence, (2) (d), Chap. III. (p. 26).

14. It must be noted, however, that (8) is not a criterion for the
complete independence of A, B, and C in the sense that the equation

$$\frac{(AB)}{N} = \frac{(A)}{N}\cdot\frac{(B)}{N}$$

is a criterion for the complete independence of A and B. If we
are given N, (A), and (B), and the last relation quoted holds
good, we know that similar relations must hold for (Aβ), (αB),
and (αβ). If N, (A), (B), and (C) be given, however, and the
equation (8) hold good, we can draw no conclusion without
further information; the data are insufficient. There are eight
algebraically independent class-frequencies in the case of three
attributes, while N, (A), (B), (C) are only four: the equation (8)
must therefore be shown to hold good for four frequencies of the
third order before the conclusion can be drawn that it holds good
for the remainder, i.e. that a state of complete independence
subsists. The direct verification of this result is left for the
student.

Quite generally, if N, (A), (B), (C), .... be given, the relation

$$\frac{(ABC\ldots)}{N} = \frac{(A)}{N}\cdot\frac{(B)}{N}\cdot\frac{(C)}{N}\cdots \qquad (9)$$

must be shown to hold good for 2^n − (n + 1) of the nth order classes
before it may be assumed to hold good for the remainder. It is
only because

$$2^n - (n + 1) = 1$$

when n = 2 that the relation

$$\frac{(AB)}{N} = \frac{(A)}{N}\cdot\frac{(B)}{N}$$

may be treated as a criterion for the independence of A and B.
If all the n (n > 2) attributes are completely independent, the
relation (9) holds good; but it does not follow that if the relation
(9) hold good they are all independent.
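A small counter-example shows why the product relation for a single third-order class is not a criterion of complete independence. The eight class-frequencies below are hypothetical, invented purely for illustration, and the helper names are not from the text.

```python
from fractions import Fraction

# Hypothetical ultimate class-frequencies, invented for illustration only.
# Keys: (A?, B?, C?) with True for the attribute and False for its contrary.
freq = {
    (True, True, True): 1,   (True, True, False): 2,
    (True, False, True): 0,  (True, False, False): 1,
    (False, True, True): 0,  (False, True, False): 1,
    (False, False, True): 3, (False, False, False): 0,
}
N = sum(freq.values())


def count(a=None, b=None, c=None):
    """Frequency of the class fixed by the given attributes (None = unspecified)."""
    return sum(v for (x, y, z), v in freq.items()
               if a in (None, x) and b in (None, y) and c in (None, z))


def product_rule(a, b, c):
    """Does the third-order class satisfy the product relation?"""
    lhs = Fraction(count(a, b, c), N)
    rhs = Fraction(count(a=a), N) * Fraction(count(b=b), N) * Fraction(count(c=c), N)
    return lhs == rhs


print(product_rule(True, True, True))    # class ABC
print(product_rule(True, True, False))   # class ABγ
print(Fraction(count(True, True), N), Fraction(count(True), N) * Fraction(count(b=True), N))
```

With these figures the relation holds for the class ABC alone, yet it fails for ABγ, and A and B are not independent even in the universe at large.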


(1) Yule, G. U., "On the Association of Attributes in Statistics," Phil.

Trans. Roy. Soc., Series A, vol. cxciv., 1900, p. 257. (Deals fully

with the theory of partial as well as of total association, with numerous

illustrations: a notation suggested for the partial coefficients.)

(2) Yule, G. U., "Notes on the Theory of Association of Attributes in

Statistics," Biometrika, vol. ii., 1903, p. 121. (Cf. especially §§ 4 and

5, on the theory of complete independence, and the fallacies due to

mixing of records.)


1. Take the following figures for girls corresponding to those for boys in

Example i., p. 45, and discuss them similarly, but not necessarily using

exactly the same comparisons, to see whether the conclusion that "the

connecting link between defects of body and mental dulness is the coincident

defect of brain which may be known by observation of abnormal nerve signs"

seems to hold good.

A, development defects. B, nerve signs. D, mental dulness.















2. (Material from Census of England and Wales, 1891, vol. iii.) The

following figures give the numbers of those suffering from single or combined

infirmities: (1) for all males, (2) for males of 55 years of age and over.

A, Blindness. B, Mental derangement. C, Deaf-mutism.


All Males.


Males 55-


All Males.


Males 55



(AB) 183





(AC) 51

(BC) 299

(ABC) 11









Tabulate proportions per thousand, exhibiting the total association between

blindness and mental derangement, and the partial association between the

same two infirmities among deaf-mutes, (1) for males in general, (2) for those

of 55 years of age or over. Give a short verbal statement of the results, and

contrast them with those of Question 1.

3. (Material from supplement to 55th Annual Report Reg.-Genl.)

The death-rate from cancer for occupied males in general (over 15) is

0.685 per thousand per annum, and for farmers 1.20.

The death-rates from cancer for occupied males under and over 45
are 0.13 and 2.25 respectively. Of the farmers 46.1 per cent. are over 45.


Would you say that farmers were peculiarly liable to cancer?

4. A population of males over 15 years of age consists of 7 per cent. over 65

years of age and 93 per cent. under. The death-rates are 12 per thousand per

A light-eye colour in husband, B in wife, 0 in son—













7. Show that if (ABC) = (αβγ), (αBC) = (Aβγ), and so on (the case of

"complete equality of contrary frequencies" of Question 7, Chap. I.), A, B,

and C are completely independent if A and B, A and C, B and C are inde-

pendent pair and pair.

8. If, in the same case of complete equality of contraries,

(AB) − N/4 = δ1
(AC) − N/4 = δ2
(BC) − N/4 = δ3

show that


so that the partial associations between A and B in the universes C and γ are

positive or negative according as

9. In the simple contests of a general election (contests in which one

Conservative opposed one Liberal and there were no other candidates) 66 per

cent. of the winning candidates (according to the returns) spent more money

than their opponents. Given that 63 per cent of the winners were Con-

servatives, and that the Conservative expenditure exceeded the Liberal in 80

per cent. of the contests, find the percentages of elections won by Conservatives

(1) when they spent more and (2) when they spent less than their opponents,

and hence say whether you consider the above figures evidence of the influence

of expenditure on election results or no. (Note that if the one candidate in a
contest be a Conservative-winner-who spends more than his opponent, the
other must necessarily be a Liberal-loser-who spends less, and so forth.

Hence the case is one of complete equality of contraries.)

10. Given that (A)/N = (B)/N = (C)/N = x, and that (AB)/N = (AC)/N = y,
find the major and minor limits to y that enable one to infer positive
association between B and C, i.e. (BC)/N > x².

Draw a diagram on squared paper to illustrate your answer, taking x and y

as co-ordinates, and shading the limits within which y must lie in order to

permit of the above inference. Point out the peculiarities in the case of in-

ferring a positive association from two negative associations.

11. Discuss similarly the more complex case (A)/N=x, (B)/N=2x, (C)/N=


(1) for inferring positive association between B and C given (AB)/N=


(2) for inferring positive association between A and C given (AB)/N=


(3) for inferring positive association between A and B given (AC)/N=



1. The general principle of a manifold classification—2-4. The table of

double-entry or contingency table and its treatment by fundamental

methods—5-8. The coefficient of contingency—9-10. Analysis of

a contingency table by tetrads—11-13. Isotropic and anisotropic

distributions—14-15. Homogeneity of the classifications dealt with

in this and the preceding chapters: heterogeneous classifications.

1. Classification by dichotomy is, as was briefly pointed out in

Chap. I. § 5, a simpler form of classification than usually occurs

in the tabulation of practical statistics. It may be regarded as

a special case of a more general form in which the individuals or

objects observed are first divided under, say, s heads, A1, A2, . . . .
As, each of the classes so obtained then subdivided under t heads,
B1, B2, . . . . Bt, each of these under u heads, C1, C2, . . . . Cu, and
so on, thus giving rise to s.t.u ultimate classes altogether.

2. The general theory of such a manifold as distinct from a
twofold or dichotomous classification, in the case of n attributes

or characters ABC . . . . N, would be extremely complex: in the

present chapter the discussion will be confined to the case of two

characters, A and B, only. If the classification of the A's be s-

fold and of the B's t-fold, the frequencies of the st classes of the

second order may be most simply given by forming a table with

s columns headed A1 to As and t rows headed B1 to Bt. The

number of the objects or individuals possessing any combination

of the two characters, say Am and Bn, i.e. the frequency of the

class AmBn, is entered in the compartment common to the mth

column and the nth row, the st compartments thus giving all

the second-order frequencies. The totals at the ends of rows

and the feet of columns give the first-order frequencies, i.e. the

numbers of Am's and Bn's, and finally the grand total at the

right-hand bottom corner gives the whole number of observations.

Tables I. and II. below will serve as illustrations of such tables

of double-entry or contingency tables, as they have been termed

by Professor Pearson (ref. 1).



3. In Table I. the division is 3 x 3-fold: the houses in England

and Wales are divided into those which are in (1) London, (2)

other urban districts, (3) rural districts, and the houses in each

of these divisions are again classified into (1) inhabited houses,

(2) uninhabited but completed houses, (3) houses that are

"building," i.e. in course of erection. Thus from the first row

we see that there were in London, in round numbers, 616,000

houses, of which 571,000 were inhabited, 40,000 uninhabited,

and 5000 in course of erection: from the first column, there

were 6,260,000 inhabited houses in England and Wales, of which

571,000 were in London, 4,064,000 in other urban districts, and

1,625,000 in rural districts.


Table I.—Houses in England and Wales. (Census of 1901.

Summary Table X.) (000's omitted.)

                                Inhabited.  Uninhabited.  Building.  Total.
Adm. County of London               571          40           5       616
Other urban districts              4064         285          45      4394
Rural districts                    1625         124          12      1761
Total for England and Wales        6260         449          62      6771

In Table II., on the other hand, the classification is 3 x 4-fold:

the eye-colours are classed under the three heads "blue," "grey or

green," and "brown," while the hair-colours are classed under

four heads, "fair," "brown," "black," and "red." The table is

read similarly to the last. Taking the first row, it tells us that

there were 2811 men with blue eyes noted, of whom 1768 had

fair hair, 807 brown hair, 189 black hair, and 47 red hair.

Similarly, from the first column, there were 2829 men with fair

hair, of whom 1768 had blue eyes, 946 grey or green eyes, and

115 brown eyes. The tables are a generalised form of the four-

fold (2 x 2-fold) tables in § 13, Chap. III.

Table II.—Hair- and Eye-Colours of 6800 Males in Baden.

(Ammon, Zur Anthropologie der Badener.)

                        Hair-colour.
Eye-colour.       Fair.   Brown.   Black.   Red.   Total.
Blue              1768     807      189      47    2811
Grey or Green      946    1387      746      53    3132
Brown              115     438      288      16     857
Total             2829    2632     1223     116    6800

4. For the purpose of discussing the nature of the relation

between the A's and the B's, any such table may be treated on

the principles of the preceding chapters by reducing it in different

ways to 2 x 2-fold form. It then becomes possible to trace the

association between any one or more of the A's and any one or

more of the B's, either in the universe at large or in universes

limited by the omission of one or more of the A's, of the B's, or

of both. Taking Table I., for example, trace the association

between the erection of houses and the urban character of a

district. Adding together the first two rows—i.e. pooling London

and the other urban districts together—and similarly adding the

first two columns, so as to make no distinction between inhabited

and uninhabited houses as long as they are completed, we find—

Proportion of all houses which are in course of erection in urban districts: 50/5010 = 10 per thousand.

Proportion of all houses which are in course of erection in rural districts: 12/1761 = 7 per thousand.

There is therefore, as might be expected, a distinct positive

association, a larger proportion of houses being in course of

erection in urban than in rural districts.

If, as another illustration, it be desired to trace the association

between the "uninhabitedness" of houses and the urban character

of the district, the procedure will be rather different. Rows 1

and 2 may be added together as before, but column 3 may be

omitted altogether, as the houses which are only in course of

erection do not enter into the question. We then have—

Proportion of all houses which are uninhabited in urban districts: 325/4960 = 66 per thousand.

Proportion of all houses which are uninhabited in rural districts: 124/1749 = 71 per thousand.

The association is therefore negative, the proportion of houses

uninhabited being greater in rural than in urban districts.


The eye- and hair-colour data of Table II. may be treated in a

precisely similar fashion. If, e.g., we desire to trace the associa-

tion between a lack of pigmentation in eyes and in hair, rows 1

and 2 may be pooled together as representing the least pigmenta-

tion of the eyes, and columns 2, 3, and 4 may be pooled together

as representing hair with a more or less marked degree of

pigmentation. We then have—

Proportion of light-eyed with fair hair: 2714/5943 = 46 per cent.

Proportion of brown-eyed with fair hair: 115/857 = 13 per cent.

The association is therefore well-marked. For comparison we

may trace the corresponding association between the most marked


degree of pigmentation in eyes and hair, i.e. brown eyes and

black hair. Here we must add together rows 1 and 2 as before,

and columns 1, 2, and 4—the column for red being really mis-

placed, as red represents a comparatively slight degree of pigmenta-

tion. The figures are—

Proportion of brown-eyed with black hair: 288/857 = 34 per cent.

Proportion of light-eyed with black hair: 935/5943 = 16 per cent.

The association is again positive and well-marked, but the

difference between the two percentages is rather less than in the

last case.
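
The reduction of a manifold table to 2 x 2-fold form described in this section can be sketched in a few lines of Python. This is an illustration only, not part of the original text: the function names `collapse` and `row_proportions` are hypothetical, and the frequencies of Table II. are taken as they can be recovered from the figures quoted in §§ 3, 4 and 10.

```python
# Table II., with frequencies as recovered from the figures quoted in the text
# (rows: blue, grey or green, brown eyes; columns: fair, brown, black, red hair).
TABLE_II = [
    [1768, 807, 189, 47],
    [946, 1387, 746, 53],
    [115, 438, 288, 16],
]


def collapse(table, row_group, col_group):
    """Pool the listed rows and columns against the rest into a 2 x 2 table.

    row_group and col_group are sets of 0-based indices forming the first
    class; every other row or column falls into the second class.
    """
    out = [[0, 0], [0, 0]]
    for i, row in enumerate(table):
        for j, freq in enumerate(row):
            r = 0 if i in row_group else 1
            c = 0 if j in col_group else 1
            out[r][c] += freq
    return out


def row_proportions(fourfold):
    """Proportion of each row of a 2 x 2 table that falls in its first column."""
    return [row[0] / (row[0] + row[1]) for row in fourfold]


# Light eyes (rows 0 and 1 pooled) against fair hair (column 0).
light_vs_fair = collapse(TABLE_II, {0, 1}, {0})
p_light, p_brown = row_proportions(light_vs_fair)
print(light_vs_fair)                                 # [[2714, 3229], [115, 742]]
print(round(100 * p_light), round(100 * p_brown))    # 46 13
```

A universe limited by the omission of a row or column, as in the "uninhabitedness" example, could be handled by deleting that row or column from the table before calling `collapse`.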

5. The mode of treatment adopted in the preceding section rests

on first principles, and, if fully carried out, it gives the most

detailed information possible with regard to the relations of the

two attributes. At the same time a distinct need is felt in

practical work for some more summary method—a method which

will enable a single and definite answer to be given to such a

question as—Are the A's on the whole distinctly dependent on the

B's; and if so, is this dependence very close, or the reverse? The

coefficient of association, which affords the answer to this question

in the case of a dichotomous classification, was only dealt with

briefly and incidentally, for where there are only four classes of the

second order to be considered the matter is not nearly so complex as

where the number is, say, twenty-five or more, and the need for any

summary coefficient is not so often nor so keenly felt; moreover, the

coefficient most widely used (Chap. III. ref. 3) is hardly susceptible

of elementary treatment. The ideas on which Professor Pearson's

general measure of dependence, the "coefficient of contingency," is

based, are, however, quite simple and fundamental, and the mode of

calculation is therefore given in full in the following section. The

advanced student should refer to the original memoir (ref. 1) for

the complete treatment of the theory of the coefficient, and of its

relation to the theory of variables.

6. Generalising slightly the notation of the preceding chapters,

let the frequency of Am's be denoted by (Am), the frequency of

Bn's by (Bn), and the frequency of objects or individuals possessing

both characters by (AmBn). Then, if the A's and B's be com-

pletely independent in the universe at large, we must have for all

values of m and n—

$$(A_mB_n) = \frac{(A_m)(B_n)}{N} = (A_mB_n)_0 \qquad (1)$$

If, however, A and B are not completely independent, (AmBn) and

(AmBn)0 will not be identical for all values of m and n. Let

the difference be given by

$$\delta_{mn} = (A_mB_n) - (A_mB_n)_0 \qquad (2)$$

A coefficient such as we are seeking may evidently be based in

some way on these values of δ. It will not do, however, simply to

add them together, for the sum of all the values of δ, some of

which are negative and others positive, must be zero in any case,

the sum of both the (AB)'s and the (AB)0's being equal to the

whole number of observations N. It is necessary, therefore, to

get rid of the signs, and this may be done in two simple ways: (1)

by neglecting them and forming the arithmetical instead of the

algebraical sum of the differences δ, or (2) by squaring the differ-

ences and then summing the squares. The first process is the

shorter, but the second the better, as it leads to a coefficient

easily treated by algebraical methods, which the first process

does not: as the student will see later, squaring is very

usefully and very frequently employed for the purpose of elimin-

ating algebraical signs. Suppose, then, that every δ is calculated,

and also the ratio of its square to the corresponding value of

(AB)0, and that the sum of all such ratios is, say, χ²; or, in

symbols, using Σ to denote "the sum of all quantities like":—

$$\chi^2 = \Sigma\left\{\frac{\delta_{mn}^2}{(A_mB_n)_0}\right\} \qquad (3)$$

Being the sum of a series of squares, χ² is necessarily positive,

and if A and B be independent it is zero, because every δ is zero.

If, then, we form a coefficient C given by the relation

$$C = \sqrt{\frac{\chi^2}{N + \chi^2}} \qquad (4)$$


this coefficient is zero if the characters A and B are completely

independent, and approaches more and more nearly towards

unity as χ² increases. In general, no sign should be attached

to the root, for the coefficient simply shows whether the two

characters are or are not independent, and nothing more, but in

some cases a conventional sign may be used. Thus in Table II.

slight pigmentation of eyes and of hair appear to go together,

and the contingency may be regarded as definitely positive. If

slight pigmentation of eyes had been associated with marked

pigmentation of hair, the contingency might have been regarded

as negative. C is Professor Pearson's mean square contingency

coefficient.1

7. The coefficient, in the simple form (4), has one disadvantage,

viz. that coefficients calculated on different systems of classi-

fication are not comparable with each other. It is clearly desir-

able for practical purposes that two coefficients calculated from

the same data classified in two different ways should be, at least

approximately, identical. With the present coefficient this is not

the case: if certain data be classified in, say, (1) 6 x 6-fold, (2)

3 x 3-fold form, the coefficient in the latter form tends to be the

least. The greatest possible value of the coefficient is, in fact,

only unity if the number of classes be infinitely great; for any

finite number of classes the limiting value of C is the smaller the

smaller the number of classes. This may be briefly illustrated as

follows. Replacing δmn in equation (3) by its value in terms of

(AmBn) and (AmBn)0 we have—

$$\chi^2 = \left\{\Sigma\frac{(A_mB_n)^2}{(A_mB_n)_0}\right\} - N \qquad (5)$$

and therefore, denoting the expression in brackets by S,

$$C = \sqrt{\frac{S - N}{S}} \qquad (6)$$

Now suppose we have to deal with a t x t-fold classification in

which (Am) = (Bm) for all values of m; and suppose, further, that

the association between Am and Bm is perfect, so that (AmBm) =

(Am) = (Bm) for all values of m, the remaining frequencies of the

second order being zero; all the frequency is then concentrated

in the diagonal compartments of the table, and each contributes

N to the sum S. The total value of S is accordingly tN, and the

value of C—

$$C = \sqrt{\frac{t - 1}{t}}$$

This is the greatest possible value of C for a symmetrical t x t-fold

classification, and therefore, in such a table, for—

t = 2, C cannot exceed 0.707
t = 3, C cannot exceed 0.816
t = 4, C cannot exceed 0.866
t = 5, C cannot exceed 0.894
t = 6, C cannot exceed 0.913
t = 7, C cannot exceed 0.926
t = 8, C cannot exceed 0.935
t = 9, C cannot exceed 0.943
t = 10, C cannot exceed 0.949

1 Professor Pearson (ref. 1) terms δ a sub-contingency; χ² the square contin-

gency; the ratio χ²/N, which he denotes by φ², the mean square contingency;

and the sum of all the δ's of one sign only, on which a different coefficient can

be based, the mean contingency.


It is as well, therefore, to restrict the use of the "coefficient of

contingency" to 5 x 5-fold or finer classifications. At the same


time the classification must not be made too fine, or else the value

of the coefficient is largely affected by casual irregularities of no

physical significance in the class-frequencies (cf. the remarks in

Chap. III. §§ 7-8).
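
The limiting values of C for a symmetrical t x t-fold table can be checked numerically. The sketch below is an illustration under stated assumptions, not the author's own procedure: it takes S as the sum of (AmBn)²/(AmBn)0 over all compartments, computes C as the square root of (S - N)/S, and compares the result for a table with all its frequency on the diagonal against the closed form, the square root of (t - 1)/t. The function names are hypothetical.

```python
import math


def max_contingency(t):
    """Closed-form upper limit of C for a symmetrical t x t-fold table."""
    return math.sqrt((t - 1) / t)


def coefficient(table):
    """Coefficient of mean square contingency computed from S and N."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # S is the sum of (observed)^2 / (independence value) over all compartments.
    s = sum(
        freq * freq * n / (row_totals[i] * col_totals[j])
        for i, row in enumerate(table)
        for j, freq in enumerate(row)
    )
    return math.sqrt((s - n) / s)


def diagonal_table(t, freq=10):
    """A t x t table with perfect association: all frequency on the diagonal."""
    return [[freq if i == j else 0 for j in range(t)] for i in range(t)]


for t in range(2, 11):
    direct = coefficient(diagonal_table(t))
    print(t, round(max_contingency(t), 3), round(direct, 3))
```

The two printed columns should agree for every t, which is the point of the argument in this section: no finite table can bring C up to unity.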

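Table III.—Independence-Values of the Frequencies for Table II.

                        Hair-colour.
Eye-colour.       Fair.   Brown.   Black.   Red.    Total.
Blue              1169    1088      506     48      2811
Grey or Green     1303    1212      563     53.4    3132
Brown              357     332      154     14.6     857
Total             2829    2632     1223    116      6800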

8. As the classification of Table II. is only 3 x 4-fold, it is rather

crude for the purpose of calculating the coefficient, but will serve

simply as an illustration of the form of the arithmetic. In Table

III. are given the values of the independence frequencies, 2829 x

2811/6800 = 1169 and so on. The value of χ² is more readily

calculated from equation (5) than from (3):—














Total = S = 7875.2

N = 6800

S - N = 1075.2

$$C = \sqrt{\frac{1075.2}{7875.2}} = \sqrt{0.1365} = 0.37$$

The squares in such work may conveniently be taken from

Barlow's Tables of Squares, Cubes, etc. (see list of tables on


p. 352), or logarithms may be used throughout—five-figure

logarithms are quite sufficient.
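
With a computer the arithmetic of this section needs neither tables of squares nor logarithms. The following is a minimal sketch, assuming the frequencies of Table II. as recovered from the figures quoted in the text; the function names are hypothetical. It forms the independence values (Am)(Bn)/N, the sum S of (AmBn)²/(AmBn)0, then χ² = S - N and C. Because the independence values are not rounded here, S may differ slightly from the hand-worked total above, while C still comes to about 0.37.

```python
import math

# Table II., frequencies as recovered from the figures quoted in the text
# (rows: blue, grey or green, brown eyes; columns: fair, brown, black, red hair).
TABLE_II = [
    [1768, 807, 189, 47],
    [946, 1387, 746, 53],
    [115, 438, 288, 16],
]


def independence_values(table):
    """Frequencies expected under complete independence: (Am)(Bn)/N."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [[r * c / n for c in col_totals] for r in row_totals]


def contingency_coefficient(table):
    """Return S, chi-squared (= S - N) and the coefficient of contingency C."""
    n = sum(sum(row) for row in table)
    expected = independence_values(table)
    s = sum(
        obs * obs / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    chi2 = s - n
    return s, chi2, math.sqrt(chi2 / (n + chi2))


s, chi2, c = contingency_coefficient(TABLE_II)
print(round(independence_values(TABLE_II)[0][0]))   # 1169
print(round(s, 1), round(chi2, 1), round(c, 2))
```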

9. While such a coefficient of contingency, in some form or

other, is a great convenience in many fields of work, its use

should not lead to a neglect of those details which a treatment by

the elementary methods of § 4 would have revealed. Whether

the coefficient be calculated or no, every table should always be

examined with care to see if it exhibit any apparently significant

peculiarities in the distribution of frequency, e.g. in the associa-

tions subsisting between Am and Bn in limited universes. A good

deal of caution must be used in order not to be misled by casual

irregularities due to paucity of observations in some compartments

of the table, but important points that would otherwise be over-

looked will often be revealed by such a detailed examination.

10. Suppose, for example, that any four adjacent frequencies,

$$\begin{matrix} (A_mB_n) & (A_{m+1}B_n) \\ (A_mB_{n+1}) & (A_{m+1}B_{n+1}) \end{matrix}$$

are extracted from the general contingency table. Considering

these as a table exhibiting the association between Am and Bn in

a universe limited to AmAm+1 BnBn+1 alone, the association is

positive, negative, or zero according as (AmBn)/(Am+1Bn) is greater

than, less than, or equal to the ratio (AmBn+1)/(Am+1Bn+1). The

whole of the contingency table can be analysed into a series of

elementary groups of four frequencies like the above, each one

overlapping its neighbours so that an rs-fold table contains

(r - 1)(s - 1) such "tetrads," and the associations in them all can

be very quickly determined by simply tabulating the ratios like

(AmBn)/(Am+1Bn), (AmBn+1)/(Am+1Bn+1), etc., or perhaps better,

the proportions (AmBn)/{(AmBn) + (Am+1Bn)}, etc., for every pair

of columns or of rows, as may be most convenient. Taking the

figures of Table II. as an illustration, and working from the

rows, the proportions run as follows :—

For rows 1 and 2.        For rows 2 and 3.

1768/2714   0.651        946/1061    0.892

807/2194    0.368        1387/1825   0.760

189/935     0.202        746/1034    0.721

47/100      0.470        53/69       0.768

In both cases the first three ratios form descending series, but

the fourth ratio is greater than the second. The signs of the

associations in the six tetrads are accordingly—

+   +   -

+   +   -

The negative sign in the two tetrads on the right is striking,

the more so as other tables for hair- and eye-colour, arranged in

the same way, exhibit just the same characteristic. But the

peculiarity will be removed at once if the fourth column be placed

immediately after the first: if this be done, i.e. if "red " be placed

between "fair" and "brown" instead of at the end of the colour-

series, the sign of the association in all the elementary tetrads

will be the same. The colours will then run fair, red, brown,

black, and this would seem to be the more natural order, consider-

ing the depth of the pigmentation.
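
The analysis by tetrads lends itself to a short program. The sketch below is illustrative only: the function name `tetrad_signs` is hypothetical, and the frequencies of Table II. are those recoverable from the proportions quoted above. Instead of comparing ratios it compares the cross-products (AmBn)(Am+1Bn+1) and (Am+1Bn)(AmBn+1), which gives the same sign without any division.

```python
# Table II., frequencies as recovered from the figures quoted in the text
# (rows: blue, grey or green, brown eyes; columns: fair, brown, black, red hair).
TABLE_II = [
    [1768, 807, 189, 47],
    [946, 1387, 746, 53],
    [115, 438, 288, 16],
]


def tetrad_signs(table):
    """Sign of the association in every tetrad of adjacent rows and columns."""
    signs = []
    for n in range(len(table) - 1):
        row_signs = []
        for m in range(len(table[0]) - 1):
            # Positive association when the diagonal product is the greater.
            diff = (table[n][m] * table[n + 1][m + 1]
                    - table[n][m + 1] * table[n + 1][m])
            row_signs.append("+" if diff > 0 else "-" if diff < 0 else "0")
        signs.append(row_signs)
    return signs


for row in tetrad_signs(TABLE_II):
    print(" ".join(row))
# + + -
# + + -
```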

11. A distribution of frequency of such a kind that the

association in every elementary tetrad is of the same sign

possesses several useful and interesting properties, as shown in

the following theorems. It will be termed an isotropic distribution.


(1) In an isotropic distribution the sign of the association is

the same not only for every elementary tetrad of adjacent frequen-

cies, but for every set of four frequencies in the compartments

common to two rows and two columns, e.g. (AmBn), (Am+pBn),

(AmBn+q), (Am+pBn+q).

For suppose that the sign of association in the elementary

tetrads is positive, so that—

$$(A_mB_n)(A_{m+1}B_{n+1}) > (A_{m+1}B_n)(A_mB_{n+1}) \qquad (1)$$

and similarly,

$$(A_{m+1}B_n)(A_{m+2}B_{n+1}) > (A_{m+2}B_n)(A_{m+1}B_{n+1}) \qquad (2)$$

Then multiplying up and cancelling we have

$$(A_mB_n)(A_{m+2}B_{n+1}) > (A_{m+2}B_n)(A_mB_{n+1}) \qquad (3)$$

That is to say, the association is still positive though the two

columns Am and Am+2 are no longer adjacent.

(2) An isotropic distribution remains isotropic in whatever way

it may be condensed by grouping together adjacent rows or columns.

Thus from (1) and (3) we have, adding—

$$(A_mB_n)\left[(A_{m+1}B_{n+1}) + (A_{m+2}B_{n+1})\right] > (A_mB_{n+1})\left[(A_{m+1}B_n) + (A_{m+2}B_n)\right],$$

that is to say, the sign of the elementary association is unaffected

by throwing the (m + l)th and (m + 2)th columns into one.

(3) As the extreme case of the preceding theorem, we may

suppose both rows and columns grouped and regrouped until

only a 2 x 2-fold table is left; we then have the theorem—

If an isotropic distribution be reduced to a fourfold distribution

in any way whatever, by addition of adjacent rows and columns,

the sign of the association in such fourfold table is the same as in

the elementary tetrads of the original table.

The case of complete independence is a special case of isotropy.

For if

(AmBn) = (Am)(Bn)/N

for all values of m and n, the association is evidently zero for

every tetrad. Therefore the distribution remains independent

in whatever way the table be grouped, or in whatever way the

universe be limited by the omission of rows or columns. The

expression "complete independence " is therefore justified.


From the work of the preceding section we may say that Table

II. is not isotropic as it stands, but may be regarded as a dis-

arrangement of an isotropic distribution. It is best to rearrange

such a table in isotropic order, as otherwise different reductions

to fourfold form may lead to associations of different sign, though

of course they need not necessarily do so.
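
The statements of this section can be tried on Table II. The sketch below is an illustration with hypothetical function names, using the frequencies recoverable from the text. It tests whether every elementary tetrad has the same sign, rearranges the hair-colours in the order fair, red, brown, black, and then confirms theorem (3) for the rearranged table by reducing it to fourfold form at every possible pair of dividing lines.

```python
# Table II., frequencies as recovered from the figures quoted in the text
# (rows: blue, grey or green, brown eyes; columns: fair, brown, black, red hair).
TABLE_II = [
    [1768, 807, 189, 47],
    [946, 1387, 746, 53],
    [115, 438, 288, 16],
]


def tetrad_differences(table):
    """Cross-product difference for every tetrad of adjacent rows and columns."""
    return [
        table[n][m] * table[n + 1][m + 1] - table[n][m + 1] * table[n + 1][m]
        for n in range(len(table) - 1)
        for m in range(len(table[0]) - 1)
    ]


def is_isotropic(table):
    """True when no two elementary tetrads have associations of opposite sign."""
    diffs = tetrad_differences(table)
    return all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)


def reorder_columns(table, order):
    """Rearrange the columns of a table into the given order of indices."""
    return [[row[j] for j in order] for row in table]


def fourfold(table, row_cut, col_cut):
    """Pool rows before row_cut against the rest, and likewise the columns."""
    cells = [[0, 0], [0, 0]]
    for i, row in enumerate(table):
        for j, freq in enumerate(row):
            cells[int(i >= row_cut)][int(j >= col_cut)] += freq
    return cells


# Hair-colours rearranged as fair, red, brown, black.
rearranged = reorder_columns(TABLE_II, [0, 3, 1, 2])
print(is_isotropic(TABLE_II), is_isotropic(rearranged))   # False True

# Every reduction to fourfold form by adjacent grouping keeps the positive sign.
reduction_signs = []
for row_cut in range(1, len(rearranged)):
    for col_cut in range(1, len(rearranged[0])):
        (a, b), (c, d) = fourfold(rearranged, row_cut, col_cut)
        reduction_signs.append(a * d - b * c > 0)
print(all(reduction_signs))   # True
```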

12. The following will serve as an illustration of a table that

is not isotropic, and cannot be rendered isotropic by any rearrange-

ment of the order of rows and columns.



Table IV.— Showing the Frequencies of Different Combinations of

Eye-colours in Father and Son.

(Data of Sir F. Galton, from Karl Pearson, Phil. Trans., A, vol. cxcv.

(1900), p. 138 ; classification condensed.)

1. Blue. 2. Blue-green, grey. 3. Dark grey, hazel. 4. Brown.

Father's Eye-colour.



































The following are the ratios of the frequency in column m to

the sum of the frequencies in columns m and m+ 1 :—


1 and 2.

2 and 3.

3 and 4













The order in which the ratios run is different for each pair of

columns, and it is accordingly impossible to make the table


the great majority of the tables, and accordingly its origin

demands explanation. Were such a table treated by the method

of the contingency coefficient, or a similar summary method,

alone, the peculiarity might not be remarked.

13. It may be noted, in concluding this part of the subject,

that in the case of complete independence the distribution of

frequency in every row is similar to the distribution in the row

of totals, and the distribution in every column similar to that in

the column of totals; for in, say, the column An the frequencies

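are given by the relations—

$$(A_nB_1) = \frac{(A_n)}{N}(B_1), \quad (A_nB_2) = \frac{(A_n)}{N}(B_2), \quad (A_nB_3) = \frac{(A_n)}{N}(B_3),$$

and so on. This property is of special importance in the theory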

of variables.
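
The property just stated can be verified exactly with rational arithmetic. In the sketch below, which is an illustration with hypothetical function names, a table of independence values is built from the marginal totals of Table II. (as recoverable from the text), and the distribution within every row and every column is compared with the corresponding distribution of totals.

```python
from fractions import Fraction


def independent_table(row_totals, col_totals):
    """Independence frequencies (row total)(column total)/N as exact fractions."""
    n = sum(row_totals)
    return [[Fraction(r * c, n) for c in col_totals] for r in row_totals]


def proportions(values):
    """Each value as an exact fraction of the total of the values."""
    total = sum(values)
    return [Fraction(v) / total for v in values]


# Marginal totals of Table II. (eye-colours and hair-colours respectively).
eye_totals = [2811, 3132, 857]
hair_totals = [2829, 2632, 1223, 116]

table = independent_table(eye_totals, hair_totals)

rows_match = all(proportions(row) == proportions(hair_totals) for row in table)
cols_match = all(
    proportions(list(col)) == proportions(eye_totals) for col in zip(*table)
)
print(rows_match, cols_match)   # True True
```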

14. The classifications both of this and of the preceding chapters


have one important characteristic in common, viz. that they

are, so to speak, "homogeneous"—the principle of division

being the same for all the sub-classes of any one class. Thus

A's and α's are both subdivided into B's and β's, A1's, A2's ....

As's into B1's, B2's .... Bt's, and so on. Clearly this is necessary

in order to render possible those comparisons on which the

discussions of associations and contingencies depend. If we

only know that amongst the A's there is a certain percentage

of B's, and amongst the α's a certain percentage of C's, there

are no data for any conclusion.

Many classifications are, however, essentially of a heterogeneous

character, e.g. biological classifications into orders, genera, and

species; the classifications of the causes of death in vital

statistics, and of occupations in the census. To take the last

case as an illustration, the first "order" in the list of occupations

is "General or Local Government of the Country," subdivided

under the headings (1) National Government, (2) Local Govern-


ment. The next order is "Defence of the Country," with the sub-

headings (1) Army, (2) Navy and Marines—not (1) National

and (2) Local Government again—the sub-heads are necessarily

distinct. Similarly, the third order is "Professional Occupations

and their Subordinate Services," with the fresh sub-heads (1)

Clerical, (2) Legal, (3) Medical, (4) Teaching, (5) Literary and

Scientific, (6) Engineers and Surveyors, (7) Art, Music, Drama,

(8) Exhibitions, Games, etc. The number of sub-heads under

each main heading is, in such a case, arbitrary and variable,

and different for each main heading; but so long as the

classification remains purely heterogeneous, however complex


it may become, there is no opportunity for any discussion

of causation within the limits of the matter so derived. It is

only when a homogeneous division is in some way introduced

that we can begin to speak of associations and contingencies.

15. This may be done in various ways according to the

nature of the case. Thus the relative frequencies of different

botanical families, genera, or species may be discussed in

connection with the topographical characters of their habitats—

desert, marsh, or moor—and we may observe statistical associa-

tions between given genera and situations of a given topographical

type. The causes of death may be classified according to sex,

or age, or occupation, and it then becomes possible to discuss

the association of a given cause of death with one or other


of the two sexes, with a given age-group, or with a given

occupation. Again, the classifications of deaths and of occupations

are repeated at successive intervals of time; and if they have

remained strictly the same, it is also possible to discuss the

association of a given occupation or a given cause of death with

the earlier or later year of observation—i.e. to see whether the

numbers of those engaged in the given occupation or succumbing

to the given cause of death have increased or decreased. But

in such circumstances the greatest care must be taken to see

that the necessary condition as to the identity of the classifications

at the two periods is fulfilled, and unfortunately it very

seldom is fulfilled. All practical schemes of classification are

subject to alteration and improvement from time to time, and

these alterations, however desirable in themselves, render a

certain number of comparisons impossible. Even where a

classification has remained verbally the same, it is not necessarily

really the same; thus, in the case of the causes of death,


improved methods of diagnosis may transfer many deaths from

one heading to another without any change in the incidence

of the disease, and so bring about a virtual change in the

classification. In any case, heterogeneous classification should

be regarded only as a partial process, incomplete until a

homogeneous division is introduced either directly or indirectly,

e.g. by repetition.



(1) Pearson, Karl, "On the Theory of Contingency and its Relation to

Association and Normal Correlation," Drapers' Company Research

Memoirs, Biometric Series i. ; Dulau & Co., London, 1904. (The

memoir in which the coefficient of contingency is proposed.)


(2) Lipps, G. F., "Die Bestimmung der Abhängigkeit zwischen den

Merkmalen eines Gegenstandes," Berichte der math.-phys. Klasse der

kgl. Sächsischen Gesellschaft der Wissenschaften; Leipzig, 1905. (A

general discussion of the problems of association and contingency.)

(3) Pearson, Karl, "On a Coefficient of Class Heterogeneity or Divergence,"

Biometrika, vol. v. p. 198, 1906. (An application of the contingency

coefficient to the measurement of heterogeneity, e.g. in different

districts of a country, by treating the observed frequencies of some

quality A1, A2 .... An in the different districts as rows of a con-

tingency table and working out the coefficient: the same principle is

also applicable to the comparison of a single district with the rest of

the country.)


(4) Yule, G. U., "On a Property which holds good for all Groupings of a

Normal Distribution of Frequency for Two Variables, with applications

to the Study of Contingency Tables for the Inheritance of Unmeasured

Qualities," Proc. Roy. Soc., Series A, vol. lxxvii., 1906, p. 324. (On

the property of isotropy and some applications.)

(5) Yule, G. U., "On the Influence of Bias and of Personal Equation in

Statistics of Ill-defined Qualities," Jour. of the Anthrop. Inst.,

vol. xxxvi., 1906, p. 325. (Includes an investigation as to the influence

of bias and of personal equation in creating divergences from isotropy

in contingency tables.)

Contingency Tables of two Rows only.

(6) Pearson, Karl, "On a New Method of Determining Correlation between

a Measured Character A and a Character B of which only the Percentage

of Cases wherein B exceeds (or falls short of) a given Intensity is recorded

for each Grade of A," Biometrika, vol. vii., 1909, p. 96. (Deals with a

measure of dependence for a common type of table, e.g. a table showing

the numbers of candidates who passed or failed at an examination, for


each year of age. The table of such a type stands between the con-

tingency tables for unmeasured characters and the correlation table

(Chap. IX.) for variables. Pearson's method is based on that adopted

for the correlation table, and assumes a normal distribution of fre-

quency (Chap. XV.) for B.)

(7) Pearson, Karl, "On a New Method of Determining Correlation, when

one Variable is given by Alternative and the other by Multiple

Categories," Biometrika, vol. vii., 1910, p. 248. (The similar

problem for the case in which the variable is replaced by an un-

measured quality.)


(1) (Data from Karl Pearson, " On the Inheritance of the Mental and Moral

Characters in Man," Jour. of the Anthrop. Inst., vol. xxxiii., and Biometrika,

vol. iii.) Find the coefficient of contingency (coefficient of mean square

contingency) for the two tables below, showing the resemblance between

brothers for athletic capacity and between sisters for temper. Show that

neither table is even remotely isotropic. (As stated in § 7, the coefficient of

contingency should not as a rule be used for tables smaller than 5 x 5-fold:

these small tables are given to illustrate the method, while avoiding lengthy



A. Athletic Capacity.

First Brother.








Non-athletic .
















B. Temper.

First Sister.

Quick .


Sullen .

























1. Introductory—2. Necessity for classification of observations: the frequency

distribution—3. Illustrations—4. Method of forming the table—5.

Magnitude of class-interval—6. Position of intervals—7. Process of

classification—8. Treatment of intermediate observations—9. Tabula-

tion—10. Tables with unequal intervals—11. Graphical representa-

tion of the frequency-distribution—12. Ideal frequency-distributions

— 13. The symmetrical distribution—14. The moderately asymmetri-

cal distribution—15. The extremely asymmetrical or J-shaped dis-

tribution—16. The U-shaped distribution.

1. The methods described in Chaps. I.-V. are applicable to all

observations, whether qualitative or quantitative; we have now


to proceed to the consideration of specialised processes, adapted

to the treatment of quantitative measurements, but not generally

available, except by the aid of more or less artificial hypotheses,

for the discussion of purely qualitative observations. Since

numerical measurement is applied only in the case of a quantity

that can present more than one numerical value, that is, a varying

quantity, or more shortly a variable, this section of the work may

be termed the theory of variables. As common examples of such

variables that are subject to statistical treatment may be cited

birth- or death-rates, prices, wages, barometer readings, rainfall

records, and measurements or enumerations (e.g. of glands, spines,

or petals) on animals or plants.

2. If some hundreds or thousands of values of a variable have

been noted merely in the arbitrary order in which they happened

to occur, the mind cannot properly grasp the significance of the

record: the observations must be ranked or classified in some

way before the characteristics of the series can be comprehended,


and those comparisons, on which arguments as to causation

depend, can be made with other series. The dichotomous classi-


fication, considered in Chaps. I.-IV., is too crude: if the values are

merely classified as A's or as α's according as they exceed or fall

short of some fixed value, a large part of the information given

by the original record is lost. A manifold classification, however

(cf. Chap. V.), avoids the crudity of the dichotomous form, since

the classes may be made as numerous as we please, and numerical

measurements lend themselves with peculiar readiness to a

manifold classification, for the class limits can be conveniently

and precisely defined by assigned values of the variable. For

convenience, the values of the variable chosen to define the

successive classes should be equidistant, so that the numbers of

observations in the different classes (the class-frequencies) may be

comparable. Thus for measurements of stature the interval


chosen for classifying (the class-interval, as it may be termed)

might be 1 inch, or 2 centimetres, the numbers of individuals

being counted whose statures fall within each successive inch, or

each successive 2 centimetres, of the scale; returns of birth- or

death-rates might be grouped to the nearest unit per thousand

of the population; returns of wages might be classified to the

nearest shilling, or, if desired to obtain a more condensed table,

by intervals of five shillings or ten shillings, and so on. When

the variation is discontinuous, as for example in enumerations

of numbers of children in families or of petals on flowers, the

unit is naturally taken as the class-interval unless the range of

variation is very great. The manner in which the observations

are distributed over the successive equal intervals of the scale is

spoken of as the frequency-distribution of the variable.

3. A few illustrations will make clearer the nature of such

frequency-distributions, and the service which they render in

summarising a long and complex record:


(a) Table I. In this illustration the mean annual death-rates,

expressed as proportions per thousand of the population per

annum, of the 632 registration districts of England and Wales,

for the decade 1881-90, have been classified to the nearest unit;

i.e. the numbers of districts have been counted in which the

death-rate was over 12.5 but under 13.5, over 13.5 but under

14.5, and so on. The frequency-distribution is shown by the

following table.



Table I.—Showing the Numbers of Registration Districts in England and

Wales with Different mean Death-rates per Thousand of the Population

per Annum for the Ten Years 1881-90. (Material from the Supplement

to the 55th Annual Report of the Registrar-General for England and

Wales [C. 7769] 1895.)

[Table I. Columns: Mean Annual Death-rate; Number of Districts with Death-rate between Limits stated. Only the class-limits 17.5-18.5, 18.5-19.5, 20.5-21.5, 25.5-26.5, and 27.5-28.5 survive; the frequencies are missing.]









Whilst a glance through the original returns fails to convey

any very definite impression, owing to the large and erratic

differences between the death-rates in successive districts, a brief

inspection of the above table brings out a number of important

points. Thus we see that the death-rates range, in round

numbers, from 13 to 33 per thousand per annum, but in the

great majority of districts lie nearer the lower limit than the

upper; that the death-rates in some 60 per cent. of the districts



Table II.—Showing the Numbers of Married Women, in certain Quaker

Families, Dying at Different Ages. (Cited from Proc. Roy. Soc., vol. lxvii.

(1900), p. 172. On the Correlation between Duration of Life and Number

of Offspring, by Miss M. Beeton, Karl Pearson, and G. U. Yule.)

[Table II. Columns: Age at Death, Years; Number of Women Dying at said Years of Age. Only the class-limits 27.5-32.5, 62.5-67.5, 67.5-72.5, 72.5-77.5, 77.5-82.5, 82.5-87.5, 87.5-92.5, and 92.5-97.5 survive, together with the unattached frequencies 83, 109, and 78.]








The distribution is somewhat more irregular than in the last

case; the commencement is abrupt; a maximum frequency is

attained in the fourth class (age at death 32.5 to 37.5), and then

there is a slow fall to the age-class 52.5-57.5. After this class

the frequency rises again and attains a secondary maximum in

the age-class 67.5-72.5.

(c) Table III. The numbers of stigmatic rays on a number

of Shirley poppies were counted. As the range of variation is

not great, the unit is taken as the class-interval. The frequency-

distribution is given by the following table.

Table III.—Showing the Frequencies of Seed Capsules on certain Shirley

Poppies, with Different Numbers of Stigmatic Rays. (Cited from

Biometrika, ii. p. 89, 1902.)

Number of

The numbers of rays range from 6 to 20,—12, 13, or 14 rays

being the most usual.

4. To expand slightly the brief description given in § 2, tables

like the preceding are formed in the following way: (1) The

magnitude of the class-interval, i.e. the number of units to each

interval, is first fixed; one unit was chosen in the case of Tables

I. and III., five units in the case of Table II. (2) The position or

origin of the intervals must then be determined, e.g. in Table I.

we must decide whether to take as intervals 12-13, 13-14, 14-15,

etc., or 12.5-13.5, 13.5-14.5, 14.5-15.5, etc. (3) This choice

having been made, the complete scale of intervals is fixed, and the

observations are classified accordingly. (4) The process of

classification being finished, a table is drawn up on the general


lines of Tables I.-III., showing the total numbers of observations

in each class-interval. Some remarks may be made on each of

these heads.
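The four steps just listed can be sketched in modern terms as a short routine. This is a minimal illustration only, assuming half-open intervals (lower limit included, upper limit excluded); the function name `frequency_table` and the sample death-rates are hypothetical and are not taken from Table I.

```python
import math
from collections import Counter

def frequency_table(values, width, origin):
    """Count observations in equal class-intervals of the given width.

    Interval k runs from origin + k*width (included) to
    origin + (k+1)*width (excluded).
    """
    counts = Counter(math.floor((v - origin) / width) for v in values)
    first, last = min(counts), max(counts)
    return [(origin + k * width, origin + (k + 1) * width, counts.get(k, 0))
            for k in range(first, last + 1)]

# Hypothetical death-rates per thousand (illustrative values only).
rates = [13.1, 13.4, 14.2, 14.9, 15.3, 15.6, 16.0, 17.2]

# Steps (1) and (2): unit interval, limits placed at the half-units.
table = frequency_table(rates, width=1.0, origin=12.5)

# Steps (3) and (4): the classified record, one row per class-interval.
for lower, upper, frequency in table:
    print(f"{lower:5.1f}-{upper:5.1f}  {frequency}")
```

Placing the origin at 12.5 rather than 12 makes the mid-values integers, which is the arrangement adopted in Table I.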

5. Magnitude of Class-Interval.—As already remarked, in cases

where the variation proceeds by discrete steps of considerable

magnitude as compared with the range of variation, there is very

little choice as regards the magnitude of the class-interval. The

unit will in general have to serve. But if the variation be

continuous, or at least take place by discrete steps which are

small in comparison with the whole range of variation, there is

no such natural class-interval, and its choice is a matter for judgment.


The two conditions which guide the choice are these: (a) we

desire to be able to treat all the values assigned to any one class,

without serious error, as if they were equal to the mid-value

of the class-interval, e.g. as if the death-rate of every district in

the first class of Table I. were exactly 13.0, the death-rate of

every district in the second class 14.0, and so on; (b) for convenience

and brevity we desire to make the interval as large as

possible, subject to the first condition. These conditions will

generally be fulfilled if the interval be so chosen that the whole

number of classes lies between 15 and 25. A number of classes

less than, say, ten leads in general to very appreciable inaccuracy,

and a number over, say, thirty makes a somewhat unwieldy

table. A preliminary inspection of the record should accordingly

be made and the highest and lowest values be picked out.

Dividing the difference between these by, say, five and twenty, we

have an approximate value for the interval. The actual value

should be the nearest integer or simple fraction.
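The rule of thumb in this section reduces to a little arithmetic, sketched below. The list of "simple" interval sizes is an assumption made for illustration (the text asks only for "the nearest integer or simple fraction"), and the function names are hypothetical.

```python
import math
from fractions import Fraction

# Assumed menu of simple interval sizes; not prescribed by the text.
SIMPLE_SIZES = [Fraction(1, 10), Fraction(1, 5), Fraction(1, 4), Fraction(1, 2),
                Fraction(1), Fraction(2), Fraction(5), Fraction(10)]

def suggest_interval(lowest, highest, target_classes=25):
    """Divide the range by about five and twenty, then take the nearest simple size."""
    rough = (highest - lowest) / target_classes
    return min(SIMPLE_SIZES, key=lambda size: abs(float(size) - rough))

def number_of_classes(lowest, highest, width):
    """Number of equal intervals of the given width needed to cover the range."""
    return math.ceil((highest - lowest) / float(width))

# Death-rates ranging, in round numbers, from 13 to 33 per thousand.
width = suggest_interval(13, 33)
classes = number_of_classes(13, 33, width)
print(width, classes)
```

For a range of 13 to 33 the rough value is 0.8, the nearest simple size is the unit, and the resulting 20 classes fall within the recommended 15 to 25.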

6. Position of Intervals.—The position or starting-point of the

intervals is, as a rule, more or less indifferent, but in general it

is fixed either so that the limits of intervals are integers, or, as in


Tables I. and II., so that the mid-values are integers. It may,

however, be chosen, for simplicity in classification, so that no

limit corresponds exactly to any recorded value (cf. § 8 below). In

some exceptional cases, moreover, the observations exhibit a marked

clustering round certain values, e.g. tens, or tens and fives. This

is generally the case, for instance, in age returns, owing to the

tendency to state a round number where the true age is unknown.

Under such circumstances, the values round which there is a

marked tendency to cluster should preferably be made mid-values

of intervals, in order to avoid sensible error in the assumption

that the mid-value is approximately representative of the values

in the class. Thus, in the case of ages, since the clustering is

chiefly round tens, "25 and under 35," "35 and under 45," etc., the

classification of the English census, is a better grouping than "20

and under 30," "30 and under 40," and so on. Where there is

any probability of a clustering of this kind occurring, it is as well

to subject the raw material to a close examination before finally

fixing the classification.
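The advantage of making the favoured round numbers mid-values can be checked numerically. The sketch below is illustrative only: the ages are hypothetical, chosen to cluster at 30 and 40, and the helper `mean_error_from_mid_values` is not from the text.

```python
def mean_error_from_mid_values(ages, origin, width=10):
    """Mean absolute error made by treating each age as its class mid-value."""
    total = 0.0
    for age in ages:
        # Lower limit of the class "lower and under lower + width" holding this age.
        lower = origin + ((age - origin) // width) * width
        total += abs(lower + width / 2 - age)
    return total / len(ages)

# Hypothetical age returns with marked clustering round the tens.
ages = [28, 30, 30, 30, 31, 39, 40, 40, 40, 42]

tens_as_mid_values = mean_error_from_mid_values(ages, origin=25)  # 25-35, 35-45
tens_as_limits = mean_error_from_mid_values(ages, origin=20)      # 20-30, 30-40
print(tens_as_mid_values, tens_as_limits)
```

With these figures the grouping "25 and under 35" gives a mean error of 0.6 of a year, against 4.4 years for "20 and under 30."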

7. Classification.—The scale of intervals having been fixed, the

observations may be classified. If the number of observations is

not large, it will be sufficient to mark the limits of successive

intervals in a column down the left-hand side of a sheet of paper,

and transfer the entries of the original record to this sheet by

marking a 1 on the line corresponding to any class for each entry

assigned thereto. It saves time in subsequent totalling if each

fifth entry in a class is marked by a diagonal across the preceding

four, or by leaving a space.

The disadvantage in this process is that it offers no facilities for

checking: if a repetition of the classification leads to a different

result, there is no means of tracing the error. If the number of


observations is at all considerable and accuracy is essential, it is

accordingly better to enter the values observed on cards, one to

each observation. These are then dealt out into packs according

to their classes, and the whole work checked by running through

the pack corresponding to each class, and verifying that no cards

have been wrongly sorted.

8. In some cases difficulties may arise in classifying, owing to

the occurrence of observed values corresponding to class-limits.

Thus, in compiling Table I., some districts will have been noted

with death-rates entered in the Registrar-General's returns as

16.5, 17.5, or 18.5, any one of which might at first sight have

been apparently assigned indifferently to either of two adjacent

classes. In such a case, however, where the original figures for

numbers of deaths and population are available, the difficulty may

be readily surmounted by working out the rate to another place


of decimals: if the rate stated to be 16.50 proves to be 16.502, it

will be sorted to the class 16.5-17.5; if 16.498, to the class

15.5-16.5. Death-rates that work out to half-units exactly do

not occur in this example, and so there is no real difficulty. In

the case of Table II., again, there is no difficulty: if the year of

birth and death alone are given, the age at death is only calcul-

able to the nearest unit; if the actual day of birth and death be

cited, half-years still cannot occur in the age at death, because

there is an odd number of days in the year. The difficulty may

always be avoided if it be borne in mind in fixing the limits

to class-intervals, these being carried to a further place of decimals,

or a smaller fraction, than the values in the original record. Thus

if statures are measured to the nearest centimetre, the class-intervals

may be taken as 150.5-151.5, 151.5-152.5, etc.; if to

the nearest eighth of an inch, the intervals may be 59 15/16-60 15/16,

60 15/16-61 15/16, and so on.

If the difficulty is not evaded in any of these ways, it is

usual to assign one-half of an intermediate observation to each

adjacent class, with the result that half-units occur in the

class-frequencies (cf. Tables VII., p. 90, X., p. 96, and XI.,

p. 96). The procedure is rough, but probably good enough for

practical purposes; it would be slightly better, but a good deal

more laborious, to assign the intermediate observations to the

adjacent classes in proportion to the numbers of other observations

falling into the two classes.
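The usual expedient of halving an intermediate observation may be sketched as follows. This is a minimal illustration under stated assumptions: the values are hypothetical, exact fractions are used so that equality with a class-limit can be tested safely, and no observation is supposed to fall on the two outermost limits.

```python
from collections import defaultdict
from fractions import Fraction

def classify_with_halves(values, limits):
    """Return one frequency per class; limits is an ascending list of class-limits.

    An observation equal to an interior limit adds one-half to each
    adjacent class. Values on the outermost limits are assumed absent.
    """
    freq = defaultdict(Fraction)
    last_class = len(limits) - 2
    for v in values:
        for k in range(len(limits) - 1):
            lower, upper = limits[k], limits[k + 1]
            if lower < v < upper:
                freq[k] += 1
            elif v == upper and k < last_class:
                freq[k] += Fraction(1, 2)
                freq[k + 1] += Fraction(1, 2)
    return [freq[k] for k in range(len(limits) - 1)]

# Hypothetical death-rates, three of which fall exactly on class-limits.
values = [Fraction(s) for s in ["15.8", "16.5", "16.5", "16.9", "17.5", "18.1"]]
limits = [Fraction(s) for s in ["15.5", "16.5", "17.5", "18.5"]]

frequencies = classify_with_halves(values, limits)
print([str(f) for f in frequencies])
```

The class-frequencies here come to 2, 5/2 and 3/2, and their sum is still the whole number of observations.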

9. Tabulation.—As regards the actual drafting of the final

table, there is little to be said, except that care should be taken

to express the class-limits clearly, and, if necessary, to state the

manner in which the difficulty of intermediate values has been

met or evaded. The class-limits are perhaps best given as in


Tables I. and II., but may be more briefly indicated by the mid-

values of the class-intervals. Thus Table I. might have been

given in the form—

Death-rate per 1000          Number of
per annum to the             Districts with
Nearest Unit.                said Death-rate.

      13                            5
      14                           16
      15                           61
      16                          112
     etc.                         etc.

A common mode of defining the class-intervals is to state the

limits in the form "x and less than y." In the case of measure-

ments of stature, for example, the table might run—


Stature in Inches.

Number of


57 and less than 58

58 „ „ 59

59 „ „ 60




the statement "57 and less than 58," etc., being often abbreviated

to 57-, 58-, 59-, etc. (cf. Table VI., p. 88). The mode of grouping

is, in effect, that described in the last paragraph as of service in

avoiding intermediate observations, but it should be noted that the

form of statement leaves the class-limits uncertain unless the degree

of accuracy of the measurements is also given. Thus, if measurements

were taken to the nearest eighth of an inch, the class-limits

are really 56 15/16-57 15/16, 57 15/16-58 15/16, etc.; if they were

only taken to the nearest quarter of an inch, the limits are 56 7/8-57 7/8,

57 7/8-58 7/8, etc. With such a form of tabulation a statement

as to the number of significant figures in the original

record is therefore essential. It is better, perhaps, to state the

true class-limits and avoid ambiguity.
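The true limits implied by the form "x and less than y" follow from the unit of measurement by a simple subtraction, sketched below. The function names are hypothetical; the only assumption is that each measurement is recorded to the nearest multiple of the stated unit.

```python
from fractions import Fraction

def true_limits(nominal_lower, nominal_upper, unit):
    """True class-limits for 'x and less than y' when values are recorded
    to the nearest `unit`: both nominal limits shift down by half a unit."""
    half = unit / 2
    return nominal_lower - half, nominal_upper - half

def mixed(q):
    """Format a positive fraction as a mixed number, e.g. 56 15/16."""
    whole = q.numerator // q.denominator
    rest = q - whole
    return f"{whole} {rest.numerator}/{rest.denominator}" if rest else str(whole)

eighths = true_limits(57, 58, Fraction(1, 8))
quarters = true_limits(57, 58, Fraction(1, 4))
print(" to ".join(mixed(q) for q in eighths))
print(" to ".join(mixed(q) for q in quarters))
```

For the class "57 and less than 58" this gives 56 15/16 to 57 15/16 with measurement to eighths, and 56 7/8 to 57 7/8 with measurement to quarters.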

10. The rule that class-intervals should be all equal is one

that is very frequently broken in official statistical publications,

principally in order to condense an otherwise unwieldy table,

thus not only saving space in printing but also considerable

expense in compilation, or possibly, in the case of confidential

figures, to avoid giving a class which would contain only one or


two observations, the identity of which might be guessed. It

would hardly be legitimate, for example, to give a return of

incomes relating to a limited district in such a form that the

income of the two or three wealthiest men in the district would

be clear to any intelligent reader with local knowledge. If the

intervals be made unequal, the application of many statistical

methods is rendered awkward, or even impossible, and the

relative values of the frequencies are at first sight misleading, so

that the table is not perspicuous. Thus, consider the first two

columns of Table IV., showing the numbers of dwelling-houses

of different annual values, assessed to inhabited house duty. On

running the eye down the column headed "number of houses " it

is at once caught by the two striking irregularities at the classes

"£60 and under £80," and "£100 and under £150." But these

have no real significance; they are merely due to changes from

a £10 to a £20, and then to a £50 interval. Moreover, the

intervals after £150 go on continuously increasing, but attention

is not directed thereto by any marked changes in the frequencies.

To make the latter really comparable inter se, they must first be


Table IV.—Showing the Annual Value and Number of Dwelling-houses in

Great Britain assessed to Inhabited House Duty in 1885-6. (Cited from

Jour. Roy. Stat. Soc., vol. l., 1887, p. 610.)

[Table IV. Columns: Annual Value in £'s; Number of Houses; frequency per £10 interval. Classes: £20 and under £30, £30 and under £40, £40 and under £50, £50 and under £60, £60 and under £80, £80 and under £100, £100 and under £150, £150 and under £300, £300 and under £500, £500 and under £1000, £1000 and upwards; Total number of houses. The frequencies are missing.]


reduced to a common interval as basis, e.g. £10, by dividing the

fifth and sixth numbers by 2, the seventh by 5, the eighth by 15,

and so on. This gives the mean frequencies per £10 interval

tabulated in the third column of Table IV. The reduction is,

however, impossible in the case of the last class, for we are only

told the number of houses of £1000 annual value and upwards:

the magnitude of the class is indefinite. Such an indefinite class

is in many respects a great inconvenience, and should always be

avoided in work not subject to the necessary limitations of

official publications.
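The reduction to a common interval is a division of each frequency by the number of £10 steps in its class. The sketch below uses the class-limits of Table IV., but the frequencies are hypothetical placeholders, and the function name is an assumption made for illustration.

```python
def per_common_interval(classes, basis=10):
    """Reduce frequencies to a mean frequency per `basis` units of the variable.

    classes is a list of (lower, upper, frequency) with definite limits;
    an open class such as '1000 and upwards' cannot be reduced.
    """
    return [(lower, upper, frequency * basis / (upper - lower))
            for lower, upper, frequency in classes]

# Class-limits as in Table IV.; the frequencies are invented for illustration.
houses = [(20, 30, 300), (30, 40, 200), (40, 50, 120), (50, 60, 80),
          (60, 80, 100), (80, 100, 60), (100, 150, 75), (150, 300, 90)]

reduced = per_common_interval(houses)
for lower, upper, density in reduced:
    print(f"{lower:4d} and under {upper:4d}: {density:6.1f} per 10")
```

In these invented figures the raw frequency rises from 80 to 100 on passing to the £20 interval, but the reduced values fall steadily, which is the kind of spurious irregularity the text describes.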

The general rule that intervals should be equal must not be

held to bar the analysis by smaller equal intervals of some

portion of the range over which the frequency varies very

rapidly. In Table XII., p. 98, for example, giving the numbers

of deaths from diphtheria at successive ages, a five-year interval

might be substituted with advantage for the irregular intervals

after the fifth year of age, but it would still be desirable to give

the numbers of deaths in each year for the first five years, so as

to bring out the rapid rise to the maximum in the fourth year

of life.

11. When the table has been completed, it is often convenient

to represent the frequency-distribution by means of a diagram

which conveys the general run of the observations to the eye



giving the distribution of head-breadths for 1000 men, will serve

as an example.

Table V.—Showing the Frequency-distribution of Head-breadths for Students

at Cambridge. Measurements taken to the nearest tenth of an inch.

(Cited from W. R. Macdonell, Biometrika, i., 1902, p. 220.)


[Table V. Columns: Head-breadth in Inches; Number of Men with said Head-breadth. Only the frequency 185 survives legibly; the remaining entries are missing.]




Taking a piece of squared paper ruled, say, in inches and tenths,

mark off along a horizontal base-line a scale representing class-

intervals; a half-inch to the class-interval would be suitable.

Then choose a vertical scale for the class-frequencies, say 50

observations per interval to the inch, and mark off, on the

verticals or ordinates through the points marked 5.5, 5.6, 5.7

. . . . at the centres of the class-intervals on the base-line, heights

representing on this scale the class-frequencies 3, 12, 43. . . .

The diagram may then be completed in one of two ways: (1)

as a frequency polygon, by joining up the marks on the ver-

ticals by straight lines, the last points at each end being joined

down to the base at the centre of the next class-interval (fig. 1);

or (2) as a column diagram or histogram (to use a term sug-

gested by Professor Pearson, ref. 1), short horizontals being drawn

through the marks on the verticals (fig. 2), which now form the

central axes of a series of rectangles representing the class-

frequencies. The student should note that in any such diagram,

of either form, a certain area represents a given number of

observations. On the scales suggested, 1 inch on the horizontal

represents 2 intervals, and 1 inch on the vertical represents 50

observations per interval: 1 square inch therefore represents

50 × 2 = 100 observations. The diagrams are, however, conventional:

the whole area of the figure is correct in either case,



[Horizontal axis: head-breadth in inches.]

Fig. 1.—Frequency-Polygon for Head-breadths of 1000 Cambridge

Students. (Table V.)

[Horizontal axis: head-breadth in inches.]

Fig. 2.—Histogram for the same data as Fig. 1.



interval is not the same, as suggested by the histogram. The

area shown by the frequency-polygon over any interval with an

ordinate y2 (fig. 3) is only correct if the tops of the three

Fig. 3.

successive ordinates y1, y2, y3 lie on a line, i.e. if y2 = ½(y1 + y3),

the areas of the two little triangles shaded in the figure being

equal. If y2 fall short of this value, the area shown by the

Fig. 4.

polygon is too great; if y2 exceed it, the area shown by the

polygon is too small; and if, for this reason, the frequency-

polygon tends to become very misleading at any part of the

range, it is better to use the histogram. In the mortality dis-


tribution of Table I., for instance, the frequency rises so sharply


to the maximum that a histogram is, on the whole, the better re-

presentation of the distribution of frequency, and in such a

distribution as that of Table IV. the use of the histogram is

almost imperative.
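The condition on the three ordinates can be verified by computing the area that the polygon assigns to one interval. The formula in the sketch is derived here from the geometry (two trapezia, each half an interval wide) and is not quoted from the text; the function names are hypothetical.

```python
def polygon_area_over_interval(y1, y2, y3, width=1.0):
    """Area under the frequency-polygon over the interval whose ordinate is y2.

    At each edge of the interval the polygon stands at the mean of two
    neighbouring ordinates, so the two trapezia together give
    width * (y1 + 6*y2 + y3) / 8.
    """
    return width * (y1 + 6 * y2 + y3) / 8

def histogram_area(y2, width=1.0):
    """Area of the histogram rectangle over the same interval."""
    return width * y2

# The first three class-frequencies of Table V.: y2 falls short of the mean
# of its neighbours, so the polygon shows too great an area.
rising = polygon_area_over_interval(3, 12, 43)

# Ordinates on a straight line: polygon and histogram agree.
straight = polygon_area_over_interval(10, 20, 30)

# A sharp peak: the polygon shows too small an area.
peak = polygon_area_over_interval(10, 40, 10)
print(rising, straight, peak)
```

The three cases give 14.75 against a true 12, 20 against 20, and 32.5 against 40, agreeing with the statement of the text.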

12. If the class-interval be made smaller and smaller, and at

the same time the number of observations be proportionately in-

creased, so that the class-frequencies may remain finite, the

polygon and the histogram will approach more and more closely

to a smooth curve. Such an ideal limit to the frequency-polygon

or histogram is termed a frequency-curve. In this ideal frequency-

curve the area between any two ordinates whatever is strictly

proportional to the number of observations falling between the

corresponding values of the variable. Thus the number of


observations falling between the values x1 and x2 of the variable

in fig. 4 will be proportional to the area of the shaded strip in the

figure; the number of observed values greater than x2 will

similarly be given by the area of the curve to the right of the

ordinate through x2, and so on. When, in any actual case, the

number of observations is considerable—say a thousand at least

—the run of the class-frequencies is generally sufficiently

smooth to give a good notion of the form of the ideal distri-

bution; with small numbers the frequencies may present all

kinds of irregularities, which, most probably, have very little

significance (cf. Chap. XV. § 15, and § 18, Ex. iv.). The forms

presented by smoothly running sets of numerous observations

present an almost endless variety, but amongst these we notice

a small number of comparatively simple types, from which many

at least of the more complex distributions may be conceived as

compounded. For elementary purposes it is sufficient to consider

these fundamental simple types as four in number, the symmetri-


cal distribution, the moderately asymmetrical distribution, the

extremely asymmetrical or J-shaped distribution, and the U-shaped distribution.


13. The symmetrical distribution, the class-frequencies decreas-

ing to zero symmetrically on either side of a central maximum.

Fig. 5 illustrates the ideal form of the distribution.

Being a special case of the more general type described under

the second heading, this form of distribution is comparatively rare

under any circumstances, and very exceptional indeed in economic

statistics. It occurs more frequently in the case of biometric, more

especially anthropometric, measurements, from which the following

illustrations are drawn, and is important in much theoretical work.

Table VI. shows the frequency-distribution of statures for adult

males in the British Isles, from data published by a British

Association Committee in 1883, the figures being given separately



Table VI.—Showing the Frequency-distributions of Statures for Adult

Males born in England, Ireland, Scotland, and Wales. Final Report of

the Anthropometric Committee to the British Association. (Report, 1883,

p. 256.) As Measurements are stated to have been taken to the nearest

1/8th of an Inch, the Class-Intervals are here presumably 56 15/16-57 15/16,

57 15/16-58 15/16, and so on (cf. § 9). See Fig. 6.

[Table VI. Columns: Height without shoes, Inches; Number of Men within said Limits of Height, by Place of Birth. The entries are missing.]








Fig. 5.—An ideal symmetrical Frequency-distribution.


[Horizontal axis: stature in inches, 58 to 80.]

Fig. 6. —Frequency-distribution of Stature for 8585 Adult Males born in

the British Isles. (Table VI.)



Table VII. gives two similar distributions from more recent

investigations, relating respectively to sons over 18 years of

age, with parents living, in Great Britain, and to students at

Cambridge. The polygons are shown in figs. 7 and 8. Both these

distributions are more irregular than that of fig. 6, but, roughly

speaking, they may all be held to be approximately symmetrical.

14. The moderately asymmetrical distribution, the class-fre-

quencies decreasing with markedly greater rapidity on one side of

the maximum than on the other, as in fig. 9 (a) or (b). This is

the most common of all smooth forms of frequency-distribution,

illustrations occurring in statistics from almost every source. The

distribution of death-rates in the registration districts of England


Table VII.—Showing the Frequency-distribution of Statures for (1) 1078

English Sons (Karl Pearson, Biometrika, ii., 1903, p. 415); (2) for 1000

Male Students at Cambridge (W. R. Macdonell, Biometrika, i., 1902,

p. 220). See Figs. 7 and 8.

[Table VII. Columns: Stature in Inches; Number of Men within said Limits of Stature, for English Sons and for Cambridge Students. Only the class-limits 59.5-60.5, 60.5-61.5, 61.5-62.5, 63.5-64.5, and 69.5-70.5 survive, together with the unattached frequencies 123.5 and 63.0.]


[Horizontal axis: stature in inches, 58 to 80.]

Fig. 7.—Frequency-distribution of Stature for 1078 "English Sons."

(Table VII.)


[Horizontal axis: stature in inches.]

Fig. 8 — Frequency-distribution of Stature for 1000 Cambridge

Students. (Table VII.)



and Wales, given in Table I., p. 77, is a somewhat rough example

of the type. The distribution of rates of pauperism in the same

Fig. 9.—Ideal distributions of the moderately asymmetrical form.

districts (Table VIII. and fig. 10) is smoother and more like the

type (a) of fig. 9. The frequency attains a maximum for







[Horizontal axis: percentage of the population in receipt of relief.]

Fig. 10.—Frequency-distribution of Pauperism (Percentage of the Population

in Receipt of Poor-law Relief) on 1st January 1891 in the Registration


Districts of England and Wales: 632 Districts. (Table VIII.)



districts with 2¾ to 3¼ per cent. of the population in receipt of

relief, and then tails off slowly to unions with 6, 7, and 8 per

cent. of pauperism.

Table VIII.—Showing the Number of Registration Districts in England and

Wales with Different Percentages of the Population in receipt of Poor-law

Relief on the 1st January 1891. (Yule, Jour. Roy. Stat. Soc., vol. lix.,

1896, p. 347. q.v. for distributions for earlier years.) See Fig. 10.

Percentage of

Number of

Unions with

given Percent-

age in receipt
Generated for Lawrence J Hubert (University of Illinois at Urbana-Champaign) on 2013-09-06 15:49 GMT / http://hdl.handle.net/2027/mdp.39015033708259

of Relief.

the Population

in receipt of .











5-25-5 75










\ 85\

V100 J









While the distribution of stature is in general symmetrical, that

of weight is asymmetrical or skew, the greater frequencies lying

towards the lower end of the range. This is shown very well by

the data (Table IX. and fig. 11) collected by the same British

Association Committee, from the Report of which the data as to

stature were cited in the last section. As in the case of the stature

diagram (fig. 6), the small error of ½ lb. has been neglected, for

the sake of brevity, in lettering the base-line of fig. 11, the classes

being treated as if they were 90 lb.-100 lb., 100 lb.-110 lb.,

and so on.

Table X. and fig. 12 give a biological illustration, viz. the

distribution of fecundity (ratio of yearling foals produced to

coverings) in mares. The student should notice the difficulty





Fig. 11.—Frequency-distribution of Weight for 7749 Adult Males in

the British Isles. (Table IX.) [Horizontal axis: weight in lbs.]

Fig. 12.—Frequency-distribution of Fecundity for Brood-mares:

2000 observations. (Table X.) [Horizontal axis: ratio of yearling foals produced to coverings.]



Table IX.—Showing the Frequency-distribution of Weights for Adult Males

born in England, Ireland, Scotland, and Wales. (Loc. cit., Table VI.)

Weights were taken to the nearest pound, consequently the true Class-Intervals

are 89.5-99.5, 99.5-109.5, etc. (§ 9).

Weight in lbs.    Number of Men within given Limits of Weight. Place of Birth.






















































Table X.—Showing the Frequency-distribution of Fecundity, i.e. the Ratio

of the Number of Yearling Foals produced to the Number of Coverings,

for Brood-mares (Race-horses) Covered Eight Times at Least. (Pearson,

Lee, and Moore, Phil. Trans., A, vol. cxcii. (1899), p. 303.) See Fig. 12.

Fecundity.    Number of Mares with Fecundity between the Given Limits.

1/30- 3/30



3/30- 5/30




5/30- 7/30




7/30- 9/30




















Table XI. —Showing the Frequency-distribution of Barometer Heights for

Daily Observations during the Thirteen Years 1878-1890 at Southampton.

(Karl Pearson and A. Lee, Phil. Trans., A, vol. cxc. (1897), p. 428, q.v.

for numerous other distributions.) See Fig. 13.

Height of Barometer in Inches.    Number of Days on which Height was observed between the Given Limits.




Fig. 13.—Frequency-distribution of Barometer Heights at

Southampton: 4748 observations. (Table XI.) [Horizontal axis: height in inches.]

Fig. 14.—Frequency-distribution of Deaths from Diphtheria at different Ages

in England and Wales, 1891-1900. (Table XII.) [Horizontal axis: years of age.]


the distribution, in such a way as to suggest that the ideal

curve is tangential to the base. Cases of greater asymmetry,

suggesting an ideal curve that meets the base (at one end) at a

finite angle, even a right angle, as in fig. 9 (b), are less frequent,

but occur occasionally. The distribution of deaths from diphtheria,

according to age, affords one such example of a more asymmetrical

kind. The actual figures for this case are given in Table XII., and

illustrated by fig. 14; and it will be seen that the frequency of

deaths reaches a maximum for children aged "3 and under 4,"

the number rising very rapidly to the maximum, and thence

falling so slowly that there is still an appreciable frequency for

persons over 60 or 70 years of age.


Table XII.—Showing the Numbers of Deaths from Diphtheria at Different

Ages in England and Wales during the Ten Years 1891-1900. (Supplement

to 65th Annual Report of the Registrar-General, 1891-1900, p. 3.)

See Fig. 14.

Number of

Age in Years.

Deaths between


per Annum.

Given Limits

of Age.

Under 1 year










































75 and upwards



only, they would have run 49,479, 23,348, 4,092, and so on,

thus suggesting a maximum number of deaths at the beginning

of life, i.e. a distribution of the present type. It is only the

analysis of the deaths in the earlier years of life by one-year

intervals which shows that the frequency reaches a true maximum

in the fourth year, and therefore the distribution is of the

moderately asymmetrical type. In practical cases no hard and

Fig. 15.—An ideal Distribution of the extreme Asymmetrical Form.

fast line can always be drawn between the moderately and

extremely asymmetrical types, any more than between the

moderately asymmetrical and the symmetrical type.

In economic statistics this form of distribution is particularly

characteristic of the distribution of wealth in the population at


large, as illustrated, e.g., by income tax and house valuation returns,

by returns of the size of agricultural holdings, and so on (cf. ref. 4).

The distributions may possibly be a very extreme case of the last

type; but if the maximum is not absolutely at the lower end of the


range, it is very close indeed thereto. Official returns do not

usually give the necessary analysis of the frequencies at the

lower end of the range to enable the exact position of the maximum

to be determined; and for this reason the data on which Table

XIII. is founded, though of course very unreliable, are of some

interest. It will be seen from the table and fig. 16 that with the

given classification the distribution appears clearly assignable to

the present type, the number of estates between zero and £100

in annual value being more than six times as great as the number

between £100 and £200 in annual value, and the frequency

continuously falling as the value increases. A close analysis of

the first class suggests, however, that the greatest frequency does

not occur actually at zero, but that there is a true maximum

frequency for estates of about £1 15 0 in annual value. The

distribution might therefore be more correctly assigned to the

second type, but the position of the greatest frequency indicates a

Table XIII.—Showing the Numbers and Annual Values of the Estates of

those who had taken part in the Jacobite Rising of 1715. (Compiled from

Cosin's Names of the Roman Catholics, Nonjurors, and others who refused

to take the Oaths to his late Majesty King George, etc.; London, 1745.

Figures of very doubtful absolute value. See a note in Southey's

Commonplace Book, vol. i. p. 573, quoted from the Memoirs of T. Hollis.)

See Fig. 16.


Number of



Number of


Value in

Value in



0- 1













1- 2

2- 3





3- 4

4- 5

5- 6

6- 7

7- 8


8- 9





degree of asymmetry that is high even compared with the

asymmetry of fig. 14: the distribution of numbers of deaths from




Fig. 16.—Frequency-distribution of the Annual Values of certain Estates

in England in 1715: 2476 Estates. (Table XIII.) [Horizontal axis: annual value in £100.]

diphtheria would more closely resemble the distribution of estate-

values if the maximum occurred in the fourth and fifth weeks

of life instead of in the fourth year. The figures of Table IV.,

p. 83, showing the annual value and number of dwelling-houses,



afford a good illustration of this form of distribution, but marred

by the unequal intervals so common in official returns.

Table XIV.—Showing the Frequencies of Different Numbers of Petals for

Three Series of Ranunculus bulbosus. (H. de Vries, Ber. dtsch. bot. Ges.,

Bd. xii., 1894, q.v. for details.) See Fig. 17.


Number of Petals.    Series A.    Series B.    Series C.
















The type is not very frequent in other classes of material, but

instances occur here and there. Table XIV. and fig. 17 show




Fig. 17.—Frequency-distributions of Numbers of Petals for Three Series of

Ranunculus bulbosus: A 337, B 380, C 222 observations. (Table XIV.)

distributions of this form for the petals of the buttercup, Ranun-

culus bulbosus.

16. The U-shaped distribution, exhibiting a maximum frequency



at the ends of the range and a minimum towards the centre.

The ideal form of the distribution is illustrated by fig. 18.

Fig. 18.—An ideal Distribution of the U-shaped Form.

This is a rare but interesting form of distribution, as it stands

in somewhat marked contrast to the preceding forms. Table XV.

and fig. 19 illustrate an example based on a considerable number

of observations, viz. the distribution of degrees of cloudiness, or

estimated percentage of the sky covered by cloud, at Breslau

Table XV.—Showing the Frequencies of Estimated Intensities of Cloudiness

at Breslau during the Ten Years 1876-85. (See ref. 2.) See Fig. 19.



















during the years 1876-85. A sky completely, or almost com-

pletely, overcast at the time of observation is the most common,

a practically clear sky comes next, and intermediates are more rare.


This form of distribution appears to be sometimes exhibited by

the percentages of offspring possessing a certain attribute when one

at least of the parents also possesses the attribute. The remarks




Fig. 19.—Frequency-distribution of Degrees of Cloudiness at Breslau,

1876-85: 3663 observations. (Table XV.)

of Sir Francis Galton in Natural inheritance suggest such a

form for the distribution of "consumptivity" amongst the off-

spring of consumptives, but the figures are not in a decisive shape.

Table XVI. gives the distribution for an analogous case, viz. the

Table XVI.—Showing the Percentages of Deaf-mutes among Children of

Parents one of whom at least was a Deaf-mute, for Marriages producing

Five Children or more. (Compiled from material in Marriages of the Deaf

in America, ed. E. A. Fay, Volta Bureau, Washington, 1898.)


Number of



Number of


















distribution of deaf-mutism amongst the offspring of parents one

of whom at least was a deaf-mute. In general less than one-fifth

of the children are deaf-mutes: at the other end of the range the

cases in which over 80 per cent. of the children are deaf-mutes are

nearly three times as many as those in which the percentage lies

between 60 and 80. The numbers are, however, too small to form

a very satisfactory illustration.


(1) Pearson, Karl, "Skew Variation in Homogeneous Material," Phil.

Trans. Roy. Soc., Series A, vol. clxxxvi. (1895), pp. 343-414.

(2) Pearson, Karl, "Cloudiness: Note on a Novel Case of Frequency,"

Proc. Roy. Soc., vol. lxii. (1897), p. 287.

(3) Pearson, Karl, "Supplement to a Memoir on Skew Variation," Phil.

Trans. Roy. Soc., Series A, vol. cxcvii. (1901), pp. 443-459.

(4) Pareto, Vilfredo, Cours d'économie politique; 2 vols., Lausanne,

1896-7. See especially tome ii., livre iii., chap. i., "La courbe des


The first three memoirs above are mathematical memoirs on the theory

of ideal frequency-curves, the first being the fundamental memoir, and

the second and third supplementary. The elementary student may,

however, refer to them with advantage, on account of the large collection

of frequency-distributions which is given, and from which some of the

illustrations in the preceding chapter have been cited. Without

attempting to follow the mathematics, he may also note that each of

our rough empirical types may be divided into several sub-types, the

theoretical division into types being made on different grounds.

The fourth work is cited on account of the author's discussion of the

distribution of wealth in a community, to which reference was made in



1. If the diagram fig. 6 is redrawn to scales of 300 observations per interval

to the inch and 4 inches of stature to the inch, what is the scale of observa-

tions to the square inch?

If the scales are 100 observations per interval to the centimetre and 2 inches

of stature to the centimetre, what is the scale of observations to the

square centimetre?

2. If fig. 10 is redrawn to scales of 25 observations per interval to the inch

and 2 per cent. to the inch, what is the scale of observations to the

square inch?

If the scales are 10 observations per interval to the centimetre and 1 per

cent. to the centimetre, what is the scale of observations to the square centimetre?


3. If a frequency-polygon be drawn to represent the data of Table I., what

number of observations will the polygon show between death-rates of

16.5 and 17.5 per thousand, instead of the true number 159?

4. If a frequency-polygon be drawn to represent the data of Table V.,

what number of observations will the polygon show between head-breadths

5.95 and 6.05, instead of the true number 236?



1. Necessity for quantitative definition of the characters of a frequency-

distribution—2. Measures of position (averages) and of dispersion—3.

The dimensions of an average the same as those of the variable—4.

Desirable properties for an average to possess—5. The commoner forms

of average—6-13. The arithmetic mean: its definition, calculation, and

simpler properties—14-18. The median: its definition, calculation, and

simpler properties—19-20. The mode: its definition and relation to

mean and median—21. Summary comparison of the preceding forms

of average—22-26. The geometric mean: its definition, simpler pro-

perties, and the cases in which it is specially applicable—27. The

harmonic mean: its definition and calculation.

1. In § 2 of the last chapter it was pointed out that a classification


of the observations in any long series is the first step necessary

to make the observations comprehensible, and to render possible

those comparisons with other series which are necessary for any

discussion of causation. Very little experience, however, would

show that classification alone is not an adequate method, seeing

that it only enables qualitative or verbal comparisons to be made.

The next step that it is desirable to take is the quantitative

definition of the characters of the frequency-distribution, so that

quantitative comparisons may be made between the corresponding

characters of two or more series. It might seem at first sight

that very difficult cases of comparison could arise in which, for

example, we had to contrast a symmetrical distribution with a "J-

shaped " distribution. As a matter of practice, however, we seldom

have to deal with such a case; distributions drawn from similar

material are, in general, of similar form. When we have to

compare the frequency-distributions of stature in two races of

man, of the death-rates in English registration districts in two


successive decades, of the numbers of petals in two races of the

same species of Ranunculus, we have only to compare with each

other two distributions of the same or nearly the same type.

2. Confining our attention, then, to this simple case, there are

two fundamental characteristics in which such distributions may



differ: (1) they may differ markedly in position, i.e. in the values

of the variable round which they centre, as in fig. 20, A, or (2)

they may centre round the same value, but differ in the range of

variation or dispersion, as it is termed, as in fig. 20, B. Of course

the distributions may differ in both characters at once, as in fig. 20,

C, but the two properties may be considered independently.

Measures of the first character, position, are generally known as

averages; measures of the second are termed measures of disper-

sion. In addition to these two principal and fundamental

characters, we may also take a third of some interest but of much

less importance, viz. the degree of asymmetry of the distribution.

Fig. 20.

The present chapter deals only with averages; measures of

dispersion are considered in Chapter VIII. and measures of

asymmetry are also briefly discussed at the end of that chapter.

3. In whatever way an average is defined, it may be as well to

note, it is merely a certain value of the variable, and is therefore

necessarily of the same dimensions as the variable: i.e. if the

variable be a length, its average is a length; if the variable be a

percentage, its average is a percentage, and so on. But there are

several different ways of approximately defining the position of a

frequency-distribution, that is, there are several different forms of

average, and the question therefore arises, By what criteria are we

to judge the relative merits of different forms? What are, in fact,

the desirable properties for an average to possess?

4. (a) In the first place, it almost goes without saying that an

average should be rigidly defined, and not left to the mere estimation

of the observer. An average that was merely estimated would

depend too largely on the observer as well as the data. (b) An

average should be based on all the observations made. If not,

it is not really a characteristic of the whole distribution. (c) It

is desirable that the average should possess some simple and

obvious properties to render its general nature readily comprehensible:

an average should not be of too abstract a mathematical

character. (d) It is, of course, desirable that an average should

be calculated with reasonable ease and rapidity. Other things

being equal, the easier calculated is the better of two forms of

average. At the same time too great weight must not be attached

to mere ease of calculation, to the neglect of other factors. (e)

It is desirable that the average should be as little affected as

may be possible by what we have termed fluctuations of sampling.

If different samples be drawn from the same material, however

carefully they may be taken, the averages of the different samples

will rarely be quite the same, but one form of average may show

much greater differences than another. Of the two forms, the

more stable is the better. The full discussion of this condition

must, however, be postponed to a later section of this work

(Chap. XVII.). (f) Finally, by far the most important desideratum

is this, that the measure chosen shall lend itself readily to

algebraical treatment. If, e.g., two or more series of observations

on similar material are given, the average of the combined series

should be readily expressed in terms of the averages of the

component series: if a variable may be expressed as the sum of

two or more others, the average of the whole should be readily


expressed in terms of the averages of its parts. A measure for

which simple relations of this kind cannot be readily determined

is likely to prove of somewhat limited application.
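The two relations named under (f) may be checked numerically. The following is a minimal sketch in Python, assuming small made-up series; the function name `combined_mean` and all the figures are illustrative assumptions and do not come from the text.

```python
def arithmetic_mean(values):
    """Quotient of the sum of the values by their number."""
    return sum(values) / len(values)


def combined_mean(means, counts):
    """Mean of the pooled series, given only the means and sizes of its parts."""
    total = sum(m * n for m, n in zip(means, counts))
    return total / sum(counts)


# Hypothetical component series (illustrative figures only).
series_a = [60.0, 62.0, 64.0]
series_b = [70.0, 72.0]

pooled = combined_mean(
    [arithmetic_mean(series_a), arithmetic_mean(series_b)],
    [len(series_a), len(series_b)],
)
direct = arithmetic_mean(series_a + series_b)

# A variable expressed as the sum of two others, value by value.
x = [1.0, 2.0, 6.0]
y = [3.0, 5.0, 1.0]
mean_of_sum = arithmetic_mean([a + b for a, b in zip(x, y)])
sum_of_means = arithmetic_mean(x) + arithmetic_mean(y)

print(pooled, direct)
print(mean_of_sum, sum_of_means)
```

For the arithmetic mean both pairs of figures agree, which is the kind of simple relation that condition (f) asks for.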

5. There are three forms of average in common use, the

arithmetic mean, the median, and the mode, the first named being

by far the most widely used in general statistical work. To

these may be added the geometric mean and the harmonic mean,

more rarely used, but of service in special cases. We will con-

sider these in the order named.

6. The arithmetic mean.—The arithmetic mean of a series of

values of a variable $X_1, X_2, X_3, \ldots X_N$, N in number, is the

quotient of the sum of the values by their number. That is to

say, if M be the arithmetic mean,

$$M = \frac{1}{N}(X_1 + X_2 + X_3 + \ldots + X_N),$$

or, to express it more briefly by using the symbol $\Sigma$ to denote

"the sum of all quantities like,"

$$M = \frac{1}{N}\Sigma(X) \qquad (1)$$

The word mean or average alone, without qualification, is very

generally used to denote this particular form of average: that

is to say, when anyone speaks of "the mean" or "the average"

of a series of observations, it may, as a rule, be assumed that the

arithmetic mean is meant. It is evident that the arithmetic

mean fulfils the conditions laid down in (a) and (b) of § 4, for it

is rigidly defined and based on all the observations made.

Further, it fulfils condition (c), for its general nature is readily

comprehensible. If the wages-bill for N workmen is £P, the

arithmetic mean wage, P/N pounds, is the amount that each


would receive if the whole sum available were divided equally

between them: conversely, if we are told that the mean wage

is £M, we know this means that the wages-bill is N.M pounds.

Similarly, if N families possess a total of C children, the mean

number of children per family is C/N—the number that each

family would possess if the children were shared uniformly.

Conversely, if the mean number of children per family is M, the

total number of children in N families is N.M. The arithmetic

mean expresses, in fact, a simple relation between the whole

and its parts.
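The relation between the whole and its parts may be put in a few lines of Python. This is only a sketch: the wages listed are invented for illustration, and the names `wages`, `mean_wage` and `wages_bill` are assumptions rather than anything given in the text.

```python
def arithmetic_mean(values):
    """M = (1/N) * sum of the N values."""
    return sum(values) / len(values)


# Hypothetical wages of N = 4 workmen, in pounds (illustrative only).
wages = [20.0, 25.0, 30.0, 45.0]

n_workmen = len(wages)
mean_wage = arithmetic_mean(wages)

# Conversely, the wages-bill is recovered as N.M without the individual values.
wages_bill = n_workmen * mean_wage

print(mean_wage, wages_bill, sum(wages))
```

The product N.M reproduces the total of the individual wages, so that either the total or the mean, together with N, determines the other.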

7. As regards simplicity of calculation, the mean takes a high

position. In the cases just cited, it will be noted that the mean

is actually determined without even the necessity of determining

or noting all the individual values of the variable: to get the

mean wage we need not know the wages of every hand, but only

the wages-bill; to get the mean number of children per family

we need not know the number in each family, but only the total.

If this total is not given, but we have to deal with a moderate

number of observations—so few (say 30 or 40) that it is hardly

worth while compiling the frequency-distribution—the arithmetic

mean is calculated directly as suggested by the definition, i.e.

all the values observed are added together and the total divided

by the number of observations. But if the number of observations

be large, this direct process becomes a little lengthy. It may

be shortened considerably by forming the frequency-table and

treating all the values in each class as if they were identical with

the mid-value of the class-interval, a process which in general

gives an approximation that is quite sufficiently exact for prac-

tical purposes if the class-interval has been taken moderately


small (cf. Chap. VI. § 5). In this process each class-frequency

is multiplied by the mid-value of the interval, the products added

together, and the total divided by the number of observations.

If f denote the frequency of any class, X the mid-value of the

corresponding class-interval, the value of the mean so obtained

may be written:

$$M = \frac{1}{N}\Sigma(f.X) \qquad (2)$$
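Equation (2) may be sketched in Python as follows. The frequency-table used is hypothetical (it is not one of the tables of Chapter VI.), and the function name `grouped_mean` is an assumption made for illustration.

```python
def grouped_mean(mid_values, frequencies):
    """Equation (2): M = (1/N) * sum of f.X, with X the class mid-values."""
    n = sum(frequencies)
    sum_fx = sum(f * x for f, x in zip(frequencies, mid_values))
    return sum_fx / n


# Hypothetical frequency-table: mid-values of the class-intervals and
# the corresponding class-frequencies (illustrative figures only).
mid_values = [1.0, 1.5, 2.0, 2.5]
frequencies = [2, 5, 2, 1]

mean = grouped_mean(mid_values, frequencies)
print(mean)
```

Every observation in a class is here treated as if it fell exactly at the mid-value of the class-interval, so the result is an approximation to the mean of the ungrouped values.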

8. But this procedure is still further abbreviated in practice

by the following artifices:—(1) The class-interval is treated

as the unit of measurement throughout the arithmetic; (2) the

difference between the mean and the mid-value of some arbi-

trarily chosen class-interval is computed instead of the absolute

value of the mean.


If A be the arbitrarily chosen value and

$$X = A + \xi \qquad (3)$$

then

$$\Sigma(f.X) = \Sigma(f.A) + \Sigma(f.\xi),$$

or, since A is a constant,

$$M = A + \frac{1}{N}\Sigma(f.\xi) \qquad (4)$$

The calculation of $\Sigma(f.X)$ is therefore replaced by the calculation

of $\Sigma(f.\xi)$. The advantage of this is that the class-frequencies

need only be multiplied by small integral numbers; for A

being the mid-value of a class-interval, and X the mid-value of

another, and the class-interval being treated as a unit, the $\xi$'s

must be a series of integers proceeding from zero at the arbitrary

origin A. To keep the values of $\xi$ as small as possible, A should

be chosen near the middle of the range.

It may be mentioned here that $\Sigma(\xi)$, or $\Sigma(f.\xi)$ for the grouped

distribution, is sometimes termed the first moment of the distribution

about the arbitrary origin A: we shall not, however, make

use of this term.

9. The process is illustrated by the following example, using

the frequency-distribution of Table VIII., Chap. VI. The

arbitrary origin A is taken at 3.5 per cent., the middle of the

sixth class-interval from the top of the table, and a little nearer

than the middle of the range to the estimated position of the

mean. The consequent values of $\xi$ are then written down as in

column (3) of the table, against the corresponding frequencies, the

values starting, of course, from zero opposite 3.5 per cent. Each

frequency f is then multiplied by its $\xi$ and the products entered

in another column (4). The positive and negative products are

totalled separately, giving totals -776 and +509 respectively,

whence $\Sigma(f.\xi) = -267$. Dividing this by N, viz. 632, we have

the difference of M from A in class-intervals, viz. 0.42 intervals,

that is 0.21 per cent. Hence the mean is 3.5 - 0.21 = 3.29

per cent.
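The closing arithmetic of this example may be retraced in Python. The sketch below uses only the figures quoted in the text (the two totals, N, the origin, and the class-interval of half a unit); the variable names are assumptions introduced for illustration.

```python
# Figures quoted in the text for the distribution of Table VIII.
negative_total = -776   # total of the negative products f.xi
positive_total = 509    # total of the positive products f.xi
n_districts = 632       # N, the number of observations
origin = 3.5            # arbitrary origin A, in per cent.
class_interval = 0.5    # one class-interval is half a unit (per cent.)

sum_f_xi = negative_total + positive_total
deviation_in_intervals = sum_f_xi / n_districts
deviation_in_units = deviation_in_intervals * class_interval
mean = origin + deviation_in_units

print(sum_f_xi)                          # -267
print(round(deviation_in_intervals, 2))  # -0.42
print(round(deviation_in_units, 2))      # -0.21
print(round(mean, 2))                    # 3.29
```

The sign of the quotient is carried through the whole calculation, so that the deviation is subtracted from A and not added to it.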

Calculation of the Mean: Example i.—Calculation of the Arithmetic

Mean of the Percentages of the Population in receipt of Relief, from the

Figures of Table VIII., Chap. VI., p. 93.






'of the




from Arbitrary


(Percentage in


Value A


receipt of























* **

















interval is half a unit, and accordingly the quotient 267/632 is

halved in order to obtain an answer in units. Care must also be

taken to give the right sign to the quotient.

10. As the process is an important one we give a second illustration

from the figures of Table VI., Chap. VI. In this case the class-interval

is a unit (1 inch), so the value of $M - A$ is given directly

by dividing $\Sigma(f.\xi)$ by N. The student must notice that, measures

having been made to the nearest eighth of an inch, the mid-values

of the intervals are 57 7/16, 58 7/16, etc., and not 57.5, 58.5, etc.

Calculation of the Mean: Example ii.—Calculation of the Arithmetic

Mean Stature of Male Adults in the British Isles from the Figures of

Chap. VI., Table VI., p. 88.








from Arbitrary




Value A













































It is evident that an absolute check on the arithmetic of any

such calculation may be effected by taking a different arbitrary

origin for the deviations: all the figures of col. (4) will be changed,

but the value ultimately obtained for the mean must be the

same. The student should note that a classification by unequal

intervals is, at best, a hindrance to this simple form of calculation,

and the use of an indefinite interval for the extremity of the

distribution renders the exact calculation of the mean impossible

(cf. Chap. VI. § 10).
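The check by change of origin may be sketched in Python as below. The frequency-table is hypothetical, equal class-intervals are assumed, and the function name `short_method_mean` is an assumption introduced for illustration only.

```python
def short_method_mean(mid_values, frequencies, origin_index):
    """Mean by the abbreviated method of equation (4).

    Deviations xi are counted in class-intervals from the class chosen
    as arbitrary origin. Assumes equal class-intervals throughout.
    """
    width = mid_values[1] - mid_values[0]
    origin = mid_values[origin_index]
    n = sum(frequencies)
    sum_f_xi = sum(f * (i - origin_index) for i, f in enumerate(frequencies))
    return origin + width * sum_f_xi / n


# Hypothetical table with a class-interval of 2 units (illustrative only).
mid_values = [10.0, 12.0, 14.0, 16.0, 18.0]
frequencies = [3, 8, 15, 9, 5]

mean_from_middle = short_method_mean(mid_values, frequencies, origin_index=2)
mean_from_first = short_method_mean(mid_values, frequencies, origin_index=0)
direct = sum(f * x for f, x in zip(frequencies, mid_values)) / sum(frequencies)

print(mean_from_middle, mean_from_first, direct)
```

The products f.xi differ entirely between the two origins, yet the mean obtained is the same and agrees with the direct calculation by equation (2).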

11. We return again below (§ 13) to the question of the


Fig. 21.—Showing the Arithmetic Mean M, the Median Mi, and the Mode Mo,

by verticals drawn through the corresponding points on the base, for the

distribution of pauperism of fig. 10, p. 9i

[Horizontal axis: percentage of the population in receipt of relief.]

errors caused by the assumption that all values within the same

interval may be treated as approximately the mid-value of the

interval. It is sufficient to say here that the error is in general

very small and of uncertain sign for a distribution of the

symmetrical or only moderately asymmetrical type, provided of

course the class-interval is not large (Chap. VI. § 5). In the case

of the "J-shaped" or extremely asymmetrical distribution, how-

ever, the error is evidently of definite sign, for in all the intervals

the frequency is piled up at the limit lying towards the greatest

frequency, i.e. the lower end of the range in the case of the illustrations

given in Chap. VI., and is not evenly distributed over the


interval. In distributions of such a type the intervals must be

made very small indeed to secure an approximately accurate value

for the mean. The student should test for himself the effect of

different groupings in two or three different cases, so as to get

some idea of the degree of inaccuracy to be expected.
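A trial of the kind suggested may be sketched as follows. The "J-shaped" series is an artificial one (evenly spaced quantiles of an exponential curve, chosen here purely as an assumption for illustration), and `grouped_mean` is a hypothetical helper that replaces every value by the mid-value of its class before averaging.

```python
import math


def grouped_mean(values, width):
    """Mean obtained when each value is replaced by the mid-value of its class [k*width, (k+1)*width)."""
    mids = [(math.floor(v / width) + 0.5) * width for v in values]
    return sum(mids) / len(mids)


# Artificial J-shaped series: frequency greatest at the lower end of the range.
n = 10_000
j_shaped = [-math.log(1 - (i + 0.5) / n) for i in range(n)]
true_mean = sum(j_shaped) / n

# Error of the grouped mean for a coarse, a medium and a fine class-interval.
errors = {w: grouped_mean(j_shaped, w) - true_mean for w in (2.0, 1.0, 0.25)}
for width, err in errors.items():
    print(width, err)
```

For a series of this form the error has a definite sign (the grouped mean is too large) and diminishes as the class-interval is narrowed.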

12. If a diagram has been drawn representing the frequency-

distribution, the position of the mean may conveniently be

indicated by a vertical through the corresponding point on the

base. Thus fig. 21 (a reproduction of fig. 10) shows the frequency-

polygon for our first illustration, and the vertical MM indicates

the mean. In any moderately asymmetrical distribution of

this form the mean lies, as in the present example, on the side of

the greatest frequency towards the longer "tail" of the distribution:

M in fig. 22 shows similarly the position of the mean in

an ideal distribution. In a symmetrical distribution the mean

coincides with the centre of symmetry. The student should mark

the position of the mean in the diagram of every frequency-distribution

that he draws, and so accustom himself to thinking of

the mean, not as an abstraction, but always in relation to the

frequency-distribution of the variable concerned.

[Fig. 22. Mean M, Median Mi, and Mode Mo, of the ideal moderately asymmetrical distribution.]

13. The following examples give important properties of the

arithmetic mean, and at the same time illustrate the facility of its

algebraic treatment:—

(a) The sum of the deviations from the mean, taken with their

proper signs, is zero.

This follows at once from equation (4): for if M and A are

identical, evidently $\Sigma(f\xi)$ must be zero.


(b) If a series of N observations of a variable X consist of, say,

two component series, the mean of the whole series can be

readily expressed in terms of the means of the two components.

For if we denote the values in the first series by $X_1$ and in the

second series by $X_2$,

$$\Sigma(X) = \Sigma(X_1) + \Sigma(X_2),$$

that is, if there be $N_1$ observations in the first series and $N_2$ in

the second, and the means of the two series be $M_1$, $M_2$ respectively,

$$N M = N_1 M_1 + N_2 M_2 \qquad (5)$$

For example, we find from the data of Table VI., Chap. VI.,

Mean stature of the 346 men born in Ireland = 67.78 in.

Mean stature of the 741 men born in Wales = 66.62 in.

Hence the mean stature of the 1087 men born in the two countries

is given by the equation

$$1087\,M = (346 \times 67.78) + (741 \times 66.62).$$

That is, M = 66.99 inches. It is evident that the form of the

relation (5) is quite general: if there are r series of observations

$X_1, X_2, \ldots, X_r$, the mean M of the whole series is related to

the means $M_1, M_2, \ldots, M_r$ of the component series by the equation

$$N M = N_1 M_1 + N_2 M_2 + \ldots + N_r M_r \qquad (6)$$

For the convenient checking of arithmetic, it is useful to note

that, if the same arbitrary origin A for the deviations $\xi$ be taken

in each case, we must have, denoting the component series by the

subscripts 1, 2, ... r as before,

$$\Sigma(f\xi) = \Sigma(f_1\xi_1) + \Sigma(f_2\xi_2) + \ldots + \Sigma(f_r\xi_r) \qquad (7)$$

The agreement of these totals accordingly checks the work.
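The relation between the mean of the whole and the means of the components is easily verified by machine. The following is a minimal sketch using the Ireland and Wales figures quoted above; the function name `combined_mean` is hypothetical.

```python
def combined_mean(counts, means):
    """Mean of the whole series from the sizes and means of its component series."""
    total = sum(counts)
    return sum(n * m for n, m in zip(counts, means)) / total


# 346 men born in Ireland (mean 67.78 in.) and 741 born in Wales (mean 66.62 in.).
m = combined_mean([346, 741], [67.78, 66.62])
print(round(m, 2))
```

The same function serves for any number of component series, since it only forms the sum of the products N.M and divides by the total number of observations.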

As an important corollary to the general relation (6), it may

be noted that the approximate value for the mean obtained from

any frequency distribution is the same whether we assume (1)


that all the values in any class are identical with the mid-value

of the class-interval, or (2) that the mean of the values in the

class is identical with the mid-value of the class-interval.

(c) The mean of all the sums or differences of corresponding

observations in two series (of equal numbers of observations) is

equal to the sum or difference of the means of the two series.

This follows almost at once. For if

$$X = X_1 \pm X_2,$$

$$\Sigma(X) = \Sigma(X_1) \pm \Sigma(X_2).$$

That is, if $M$, $M_1$, $M_2$ be the respective means,

$$M = M_1 \pm M_2 \qquad (8)$$

Evidently the form of this result is again quite general, so that if

$$X = X_1 \pm X_2 \pm \ldots \pm X_r,$$

$$M = M_1 \pm M_2 \pm \ldots \pm M_r \qquad (9)$$

As a useful illustration of equation (8), consider the case of

measurements of any kind that are subject (as indeed all

measures must be) to greater or less errors. The actual measurement

X in any such case is the algebraic sum of the true

measurement $X_1$ and an error $X_2$. The mean of the actual

measurements M is therefore the sum of the true mean $M_1$ and

the arithmetic mean of the errors $M_2$. If, and only if, the

latter be zero, will the observed mean be identical with the true

mean. Errors of grouping (§11) are a case in point.
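The case of measurements subject to error may be illustrated by a short sketch. All the figures below are hypothetical: one set of errors is chosen to have a mean of zero, the other a mean of one-half.

```python
from statistics import mean

true_values = [10.0, 12.0, 11.0, 13.0]        # hypothetical true measurements
balanced_errors = [0.5, -0.25, 0.25, -0.5]    # arithmetic mean of the errors is zero
biased_errors = [0.5, 0.25, 0.75, 0.5]        # arithmetic mean of the errors is 0.5

observed_balanced = [t + e for t, e in zip(true_values, balanced_errors)]
observed_biased = [t + e for t, e in zip(true_values, biased_errors)]

# The observed mean is the true mean plus the mean error in both cases.
print(mean(observed_balanced), mean(true_values) + mean(balanced_errors))
print(mean(observed_biased), mean(true_values) + mean(biased_errors))
```

Only in the first case does the observed mean coincide with the true mean.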

14. The median.—The median may be defined as the middle-

most or central value of the variable when the values are ranged

in order of magnitude, or as the value such that greater and

smaller values occur with equal frequency. In the case of a

frequency-curve, the median may be defined as that value of the

variable the vertical through which divides the area of the curve

into two equal parts, as the vertical through Mi in fig. 22.

The median, like the mean, fulfils the conditions (b) and (c)

of § 4, seeing that it is based on all the observations made, and

that it possesses the simple property of being the central or

middlemost value, so that its nature is obvious. But the defini-

tion does not necessarily lead in all cases to a determinate value.

If there be an odd number of different values of X observed, say

2n+1, the (n+1)th in order of magnitude is the only value

fulfilling the definition. But if there be an even number, say

2n different values, any value between the nth and (n+1)th

fulfils the conditions. In such a case it appears to be usual to

take the mean of the nth and (n+1)th values as the median,

but this is a convention supplementary to the definition. It

should also be noted that in the case of a discontinuous variable

the second form of the definition in general breaks down: if we

range the values in order there is always a middlemost value

(provided the number of observations be odd), but there is not, as a

rule, any value such that greater and less values occur with equal

frequency. Thus in Table III., § 3 of Chap. VI., we see that 45 per

cent. of the poppy capsules had 12 or fewer stigmatic rays, 55

per cent. had 13 or more; similarly 61 per cent. had 13 or fewer

rays, 39 per cent. had 14 or more. There is no number of rays


such that the frequencies in excess and defect are equal.

In the case of the buttercups of Table XIV. (Chap. VI. § 15)

there is no number of petals that even remotely fulfils the

required condition. An analogous difficulty may arise, it may

be remarked, even in the case of an odd number of observations

of a continuous variable if the number of observations be small

and several of the observed values identical. The median is

therefore a form of average of most uncertain meaning in cases

of strictly discontinuous variation, for it may be exceeded by

5, 10, 15, or 20 per cent. only of the observed values, instead of

by 50 per cent.: its use in such cases is to be deprecated, and

is perhaps best avoided in any case, whether the variation be

continuous or discontinuous, in which small series of observations


have to be dealt with.
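The points made above can be seen in a small sketch. The first two series are hypothetical. The third is a deliberate simplification of the poppy-capsule percentages, assumed here only for illustration: all capsules with 12 or fewer rays are entered as 12, and all with 14 or more as 14.

```python
from statistics import median

odd_series = [3, 1, 7, 5, 9]     # 2n+1 values: the middlemost is determinate
even_series = [3, 1, 7, 5]       # 2n values: the convention averages the two central values

print(median(odd_series), median(even_series))

# Simplified discontinuous variable: 45 per cent. at 12, 16 per cent. at 13, 39 per cent. at 14.
rays = [12] * 45 + [13] * 16 + [14] * 39
mid_rays = median(rays)
below = sum(1 for r in rays if r < mid_rays)
above = sum(1 for r in rays if r > mid_rays)
print(mid_rays, below, above)
```

The value returned for the rays is 13, yet 45 per cent. of the values fall below it and only 39 per cent. above, so that the second form of the definition is not fulfilled.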

15. When a table showing the frequency-distribution for a

long series of observations of a continuous variable is given, no

difficulty arises, as a sufficiently approximate value of the median

can be readily determined by simple interpolation on the hypo-

thesis that the values in each class are uniformly distributed

throughout the interval. Thus, taking the figures in our first

illustration of the method of calculating the mean, the total

number of observations (registration districts) is 632, of which

the half is 316. Looking down the table, we see that there are

227 districts with not more than 2.75 per cent. of the population

in receipt of relief, and 100 more with between 2.75 and 3.25

per cent. But only 89 are required to make up the total of 316;

hence the value of the median is taken as

$$2.75 + \frac{89}{100} \times \frac{1}{2} = 2.75 + 0.445 = 3.195 \text{ per cent.}$$

The mean being 3.29, the median is slightly less; its position

is indicated by Mi in fig. 21.

The value of the median stature of males may be similarly

calculated from the data of the second illustration. The work

may be indicated thus :—

Half the total number of observations (8585) = 4292.5

Total frequency under 66 15/16 inches = 3589

Difference = 703.5

Frequency in next interval = 1329

Therefore median = 66 15/16 + 703.5/1329

= 67.47 inches.


The difference between median and mean in this case is

therefore only about one-hundredth of an inch, the smallness

of the difference arising from the approximate symmetry of

the distribution. In an absolutely symmetrical distribution

it is evident that mean and median must coincide.
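The interpolation used in both illustrations may be put as a single rule. In the sketch below the function name and its parameters are hypothetical; the lower limit of 66 15/16 inches for the stature interval is read from a damaged passage of the scan and should be treated as an assumption.

```python
def interpolated_median(lower_limit, width, cum_below, class_freq, total):
    """Median by simple interpolation, the class being assumed uniformly filled."""
    needed = total / 2 - cum_below          # observations still required within the class
    return lower_limit + width * needed / class_freq


# Pauperism: 227 districts below 2.75, 100 in the interval 2.75 to 3.25, 632 in all.
pauperism = interpolated_median(2.75, 0.5, 227, 100, 632)

# Stature: 3589 men below 66 15/16 in., 1329 in the next inch, 8585 in all.
stature = interpolated_median(66 + 15 / 16, 1.0, 3589, 1329, 8585)

print(round(pauperism, 3), round(stature, 2))
```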

[Fig. 23. Determination of the median by graphical interpolation. Horizontal axis: percentage of the population in receipt of relief.]

16. Graphical interpolation may, if desired, be substituted

for arithmetical interpolation. Taking, again, the figures of

Example i., the number of districts with pauperism not exceeding

2.25 is 138; not exceeding 2.75, 227; not exceeding 3.25, 327;

and not exceeding 3.75, 417. Plot the numbers of districts

with pauperism not exceeding each value X to the corresponding

value of X on squared paper, to a good large scale, as in fig. 23,

and draw a smooth curve through the points thus obtained,

preferably with the aid of one of the "curves," splines, or flexible

curves sold by instrument-makers for the purpose. The point

in which the smooth curve so obtained cuts the horizontal line

corresponding to a total frequency N/2 = 316 gives the median.

In general the curve is so flat that the value obtained by this

graphical method does not differ appreciably from that calculated

arithmetically (the arithmetical process assuming that the

curve is a straight line between the points on either side of

the median); if the curvature is considerable, the graphical


value—assuming, of course, careful and accurate draughtsmanship

—is to be preferred to the arithmetical value, as it does not


involve the crude assumption that the frequency is uniformly

distributed over the interval in which the median lies.
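Where squared paper is not to hand, a smooth curve through the four plotted points may be imitated numerically. The sketch below substitutes a cubic (Lagrange) polynomial for the hand-drawn curve, which is an assumption of this illustration and not the draughtsman's method of the text; the helper names are hypothetical.

```python
def lagrange(xs, ys, x):
    """Value at x of the polynomial passing through the points (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def solve_increasing(f, target, lo, hi, tol=1e-9):
    """Bisection for f(x) = target, f being assumed to increase between lo and hi."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


limits = [2.25, 2.75, 3.25, 3.75]       # percentages of pauperism
cumulative = [138, 227, 327, 417]       # districts not exceeding each limit

median_curve = solve_increasing(lambda x: lagrange(limits, cumulative, x), 316, 2.75, 3.25)
median_linear = 2.75 + 0.5 * (316 - 227) / 100
print(round(median_curve, 3), round(median_linear, 3))
```

The two values agree closely here, the curve being nearly straight in the neighbourhood of the median.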

17. A comparison of the calculations for the mean and

for the median respectively will show that on the score of

brevity of calculation the median has a distinct advantage.

When, however, the ease of algebraical treatment of the two

forms of average is compared, the superiority lies wholly on

the side of the mean. As was shown in § 13, when several series

of observations are combined into a single series, the mean of

the resultant distribution can be simply expressed in terms

of the means of the components. The expression of the

median of the resultant distribution in terms of the medians

of the components is, however, not merely complex and difficult,


but impossible: the value of the resultant median depends on

the forms of the component distributions, and not on their

medians alone. If two symmetrical distributions of the same

form and with the same numbers of observations, but with

different medians, be combined, the resultant median must

evidently (from symmetry) coincide with the resultant mean, i.e.

lie halfway between the means of the components. But if the

two components be asymmetrical, or (whatever their form)

if the degrees of dispersion or numbers of observations in the

two series be different, the resultant median will not coincide

with the resultant mean, nor with any other simply assignable

value. It is impossible, therefore, to give any theorem for

medians analogous to equations (5) and (6) for means. It is

equally impossible to give any theorem analogous to equations

(8) and (9) of § 13. The median of the sum or difference of

pairs of corresponding observations in two series is not,

in general, equal to the sum or difference of the medians of


the two series; the median value of a measurement subject to

error is not necessarily identical with the true median, even

if the median error be zero, i.e. if positive and negative errors

be equally frequent.
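That the medians of the components do not fix the median of the whole may be shown by a pair of hypothetical cases. In the sketch below the two pairs of series have the same numbers of observations and the same medians (5 and 8), but differ in form.

```python
from statistics import mean, median

# First pair: symmetrical components.
a1, b1 = [1, 5, 9], [4, 8, 12]
# Second pair: same medians and numbers, but asymmetrical components.
a2, b2 = [4, 5, 30], [7, 8, 40]

for a, b in ((a1, b1), (a2, b2)):
    whole = a + b
    weighted = (len(a) * mean(a) + len(b) * mean(b)) / len(whole)
    print(median(a), median(b), median(whole), mean(whole), weighted)
```

The resultant medians differ (6.5 and 7.5), while in each case the resultant mean agrees with the value given by equation (5).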

18. These limitations render the applications of the median in

any work in which theoretical considerations are necessary com-

paratively circumscribed. On the other hand, the median may

have an advantage over the mean for special reasons. (a) It is

very readily calculated; a factor to which, however, as already

stated, too much weight ought not to be attached. (b) It is

readily obtained, without the necessity of measuring all the

objects to be observed, in any case in which they can be arranged

by eye in order of magnitude. If, for instance, a number of men

be ranked in order of stature, the stature of the middlemost is

the median, and he alone need be measured. (On the other hand

it is useless in the cases cited at the end of § 6; the median wage

cannot be found from the total of the wages-bill, and the total

of the wages-bill is not known when the median is given.) (c) It

is sometimes useful as a makeshift, when the observations are so

given that the calculation of the mean is impossible, owing, e.g., to

a final indefinite class, as in Table IV. (Chap. VI. § 10). (d) The

median may sometimes be preferable to the mean, owing to its

being less affected by abnormally large or small values of the

variable. The stature of a giant would have no more influence

on the median stature of a number of men than the stature of

any other man whose height is only just greater than the median.

If a number of men enjoy incomes closely clustering round a

median of £500 a year, the median will be no more affected by


the addition to the group of a man with the income of £50,000

than by the addition of a man with an income of £5000, or even

£600. If observations of any kind are liable to present occasional

greatly outlying values of this sort (whether real, or due to

errors or blunders), the median will be more stable and less

affected by fluctuations of sampling than the arithmetic mean.

(In general the mean is the less affected.) The point is discussed

more fully later (Chap. XVII.).
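The illustration of the incomes may be worked with figures. The seven incomes below are hypothetical, chosen to cluster round a median of £500.

```python
from statistics import mean, median

incomes = [480, 490, 495, 500, 505, 510, 520]   # hypothetical incomes, in pounds a year

with_600 = incomes + [600]
with_50000 = incomes + [50_000]

# The median is the same whichever man is added; the mean is not.
print(median(with_600), median(with_50000))
print(mean(with_600), mean(with_50000))
```

The addition of either man moves the median by the same small amount, whereas the mean is raised from about £500 to several thousands by the single large income.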

19. The Mode.—The mode is the value of the variable corre-

sponding to the maximum of the ideal frequency-curve which

gives the closest possible fit to the actual distribution.

It is evident that in an ideal symmetrical distribution mean,

median and mode coincide with the centre of symmetry. If,

however, the distribution be asymmetrical, as in fig. 22, the three

forms of average are distinct, Mo being the mode, Mi the median,

and M the mean. Clearly, the mode is an important form of

average in the cases of skew distributions, though the term is of


recent introduction (Pearson, ref. 11). It represents the value

which is most frequent or typical, the value which is in fact the

fashion (la mode). But a difficulty at once arises on attempting

to determine this value for such distributions as occur in practice.

It is no use giving merely the mid-value of the class-interval into

which the greatest frequency falls, for this is entirely dependent

on the choice of the scale of class-intervals. It is no use making

the class-intervals very small to avoid error on that account, for

the class-frequencies will then become small and the distribution

irregular. What we want to arrive at is the mid-value of the

interval for which the frequency would be a maximum, if the

intervals could be made indefinitely small and at the same time

the number of observations be so increased that the class-frequen-

cies should run smoothly. As the observations cannot, in a

practical case, be indefinitely increased, it is evident that some


process of smoothing out the irregularities that occur in the

actual distribution must be adopted, in order to ascertain the

approximate value of the mode. But there is only one smoothing

process that is really satisfactory, in so far as every observation

can be taken into account in the determination, and that is the

method of fitting an ideal frequency-curve of given equation to

the actual figures. The value of the variable corresponding to the

maximum of the fitted curve is then taken as the mode, in

accordance with our definition. Mo in fig. 21 is the value of the

mode so determined for the distribution of pauperism, the value

2.99 being, as it happens, very nearly coincident with the centre

of the interval in which the greatest frequency lies. The deter-

mination of the mode by this—the only strictly satisfactory—


method must, however, be left to the more advanced student.

20. At the same time there is an approximate relation between

mean, median, and mode that appears to hold good with surprising

closeness for moderately asymmetrical distributions, approaching

the ideal type of fig. 9, and it is one that should be borne in

mind as giving—roughly, at all events—the relative values of

these three averages for a great many cases with which the

student will have to deal. It is expressed by the equation—

Mode = Mean - 3(Mean - Median).

That is to say, the median lies one-third of the distance from the

mean towards the mode (compare figs. 21 and 22). For the

distribution of pauperism we have, taking the mean to three

places of decimals,—

Mean 3.289

Median 3.195

Difference 0.094

Hence approximate mode = 3.289 - 3 x 0.094

= 3.007,

or 3.01 to the second place of decimals, which is sufficient accuracy

for the final result, though three decimal places must be retained

for the calculation. The true mode, found by fitting an ideal

distribution, is 2.99. As further illustrations of the closeness

with which the relation may be expected to hold in different cases,

we give below the results for the distributions of pauperism in

the unions of England and Wales in the years 1850, 1860, 1870,

1881, and 1891 (the last being the illustration taken above),

and also the results for the distribution of barometer heights at



Southampton (Table XI., Chap. VI. § 14), and similar distribu-

tions at four other stations.
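The approximate relation is readily reduced to a one-line rule. The following sketch applies it to the pauperism figures; the function name `approximate_mode` is hypothetical.

```python
def approximate_mode(mean, median):
    """Mode = Mean - 3 (Mean - Median), for moderately asymmetrical distributions only."""
    return mean - 3 * (mean - median)


mode = approximate_mode(3.289, 3.195)
print(round(mode, 3))
```

The rule is empirical, and gives no more than a rough value where the distribution departs widely from the moderately asymmetrical type.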

Comparison of the Approximate and True Modes in the Case of Five Distributions of Pauperism (Percentages of the Population in receipt of Relief) in the Unions of England and Wales. (Yule, Jour. Roy. Stat. Soc., vol. lix., 1896.)

[Table body not recoverable from the scan. Surviving fragments: column heading "True Mode."; entries "5 451" and "6 261".]









Comparison of the Approximate and True Modes in the Case of Five Distributions of the Height of the Barometer for Daily Observations at the Stations named. (Distributions given by Karl Pearson and Alice Lee, Phil. Trans., A, vol. cxc. (1897), p. 423.)

[Table body not recoverable from the scan. Surviving fragments: column heading "True Mode."; station names Southampton, Glasgow, Dundee; entries "30 000" and "29 974".]









rather less affected than the median by errors of sampling. The

median is, it is true, somewhat more easily calculated from a given

frequency-distribution than is the mean; it is sometimes a useful

makeshift, and in a certain class of cases it is more and not less

stable than the mean; but its use is undesirable in cases of discon-

tinuous variation, its value may be indeterminate, and its algebraic

treatment is difficult and often impossible. The mode, finally,

is a form of average hardly suitable for elementary use, owing

to the difficulty of its determination, but at the same time it

represents an important value of the variable. The arithmetic

mean should invariably be employed unless there is some very

definite reason for the choice of another form of average, and the

elementary student will do very well if he limits himself to its

use. Objection is sometimes taken to the use of the mean in the

case of asymmetrical frequency-distributions, on the ground that

the mean is not the mode, and that its value is consequently misleading.

But no one in the least degree familiar with the

manifold forms taken by frequency-distributions would regard the

two as in general identical, and while the importance of the mode

is a good reason for stating its value in addition to that of the

mean, it cannot replace the latter. The objection, it may be

noted, would apply with almost equal force to the median, for, as

we have seen (§ 20), the difference between mode and median

is usually about two-thirds of the difference between mode and mean.


22. The Geometric Mean.— The geometric mean G of a series of

values $X_1, X_2, X_3, \ldots, X_n$ is defined by the relation

$$G = (X_1 X_2 X_3 \cdots X_n)^{1/n} \qquad (10)$$

The definition may also be expressed in terms of logarithms,

$$\log G = \frac{1}{n}\,\Sigma(\log X) \qquad (11)$$

that is to say, the logarithm of the geometric mean of a series of

values is the arithmetic mean of their logarithms.

The geometric mean of a given series of positive quantities, not all

equal, is always less than their arithmetic mean; the student will find

a proof in most text-books of algebra, and in ref. 10. The magnitude of

the difference depends largely on the amount of dispersion of the

variable in proportion to the magnitude of the mean (cf. Chap.

VIII., Question 8). The geometric mean is necessarily zero, it should be noticed, if

even a single value of X is zero, and it may become imaginary if

negative values occur. Excluding these cases, the value of the


geometric mean is always determinate and is rigidly defined. The

computation is a little long, owing to the necessity of taking

logarithms: it is hardly necessary to give an example, as the

method is simply that of finding the arithmetic mean of the

logarithms of X (instead of the values of X) in accordance with

equation (11). If there are many observations, a table should be

drawn up giving the frequency-distribution of log X, and the

mean should be calculated as in Examples i. and ii. of §§ 9 and 10.

The geometric mean has never come into general use as a repre-

sentative average, partly, no doubt, on account of its rather

troublesome computation, but principally on account of its some-

what abstract mathematical character (cf. § 4 (c) ): the geometric

mean does not possess any simple and obvious properties which

render its general nature readily comprehensible.
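Since no worked example is given, the computation by logarithms may be sketched briefly. The function below is a hypothetical helper following equation (11); the three values are chosen so that the result is exact.

```python
import math


def geometric_mean(values):
    """Antilogarithm of the arithmetic mean of the logarithms, as in equation (11)."""
    if any(v < 0 for v in values):
        raise ValueError("geometric mean may be imaginary when negative values occur")
    if any(v == 0 for v in values):
        return 0.0                      # a single zero value makes the product zero
    return math.exp(sum(math.log(v) for v in values) / len(values))


values = [2.0, 8.0, 4.0]                # product 64, cube root 4
g = geometric_mean(values)
a = sum(values) / len(values)
print(g, a)
```

The geometric mean falls below the arithmetic mean, as stated above for positive values that are not all equal.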

23. At the same time, as the following examples show, the

geometric mean possesses some important properties, and is readily treated

algebraically in certain cases.

(a) If the series of observations X consist of r component

series, there being $N_1$ observations in the first, $N_2$ in the second,

and so on, the geometric mean G of the whole series can be

readily expressed in terms of the geometric means $G_1$, $G_2$, etc., of

the component series. For evidently we have at once (as in § 13 (b))

$$N \log G = N_1 \log G_1 + N_2 \log G_2 + \ldots + N_r \log G_r \qquad (12)$$

(b) The geometric mean of the ratios of corresponding observa-

tions in two series is equal to the ratio of their geometric means.

For if

$$X = X_1 / X_2,$$

$$\log X = \log X_1 - \log X_2,$$

then, summing for all pairs of $X_1$'s and $X_2$'s,

$$G = G_1 / G_2 \qquad (13)$$

(c) Similarly, if a variable X is given as the product of any

number of others, i.e. if

$$X = X_1 X_2 X_3 \cdots X_r,$$

$X_1, X_2, \ldots, X_r$ denoting corresponding observations in r

different series, the geometric mean G of X is expressed in terms

of the geometric means $G_1, G_2, \ldots, G_r$ of $X_1, X_2, \ldots, X_r$ by

the relation

$$G = G_1 G_2 G_3 \cdots G_r \qquad (14)$$

That is to say, the geometric mean of the product is the product

of the geometric means.
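The three properties (a), (b) and (c) may be checked together on a pair of hypothetical series of corresponding observations. The helper `gm` is an assumption of this sketch and supposes all values positive.

```python
import math


def gm(values):
    """Geometric mean of positive values via the mean of their logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


x1 = [2.0, 4.0, 8.0]        # hypothetical first series
x2 = [1.0, 2.0, 16.0]       # hypothetical second series, paired with the first

ratios = [a / b for a, b in zip(x1, x2)]
products = [a * b for a, b in zip(x1, x2)]
whole = x1 + x2

# (a) N log G = N1 log G1 + N2 log G2
lhs = len(whole) * math.log(gm(whole))
rhs = len(x1) * math.log(gm(x1)) + len(x2) * math.log(gm(x2))

print(gm(ratios), gm(x1) / gm(x2))      # (b) ratio
print(gm(products), gm(x1) * gm(x2))    # (c) product
print(lhs, rhs)
```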



[Fig. 24. Showing the Populations of certain rural counties of England for each Census year from 1801 to 1901. Horizontal axis: census year; surviving curve labels: Cumberland, Dorset, Hereford.]

24. The use of the geometric mean finds its simplest application

in estimating the numbers of a population midway between two

epochs (say two census years) at which the population is known.

If nothing is known concerning the increase of the population

save that the numbers recorded at the first census were $P_0$ and at

the second census n years later $P_n$, the most reasonable assumption

to make is that the percentage increase in each year has

been the same, so that the populations in successive years form a

geometric series, $P_0 r$ being the population a year after the first

census, $P_0 r^2$ two years after the first census, and so on, and

$$P_n = P_0 r^n \qquad (15)$$

The population midway between the two censuses is therefore

$$P_{n/2} = P_0 r^{n/2} = (P_0 P_n)^{1/2} \qquad (16)$$


i.e. the geometric mean of the numbers given by the two censuses.

This result must, however, be used with discretion. The rate of

increase of population is not necessarily, or even usually, constant

over any considerable period of time: if it were so, a curve

representing the growth of population as in fig. 24 would be

continuously convex to the base, whether the population were

increasing or decreasing. In the diagram it will be seen that

the curves are frequently concave towards the base, and similar

results will often be found for districts in which the population is

not increasing very rapidly, and from which there is much

emigration. Further, the assumption is not self-consistent in any

case in which the rate of increase is not uniform over the entire

area—and almost any area can be analysed into parts which are not
similar in this respect. For if in one part of the area considered

the initial population is $P_0$ and the common ratio R, and in the

remainder of the area the initial population is $p_0$ and the common

ratio r, the population in year n is given by

$$P_0 R^n + p_0 r^n.$$
This does not represent a constant rate of increase unless R = r.

If then, for example, a constant percentage rate of increase be

assumed for England and Wales as a whole, it cannot be assumed

for the Counties: if it be assumed for the Counties, it cannot be

assumed for the country as a whole. The student is referred to

refs. 14, 15 for a discussion of methods actually used for the con-

sistent estimation of populations under such circumstances.
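Both the estimate of equation (16) and the want of self-consistency just noted can be shown with hypothetical figures. In the sketch below the census numbers, the two common ratios and the function name are all assumptions made for illustration.

```python
import math


def midway_population(p0, pn):
    """Population midway between two censuses, assuming a constant yearly ratio."""
    return math.sqrt(p0 * pn)


# Hypothetical censuses ten years apart.
p0, pn, n = 100_000, 121_000, 10
r = (pn / p0) ** (1 / n)
mid = midway_population(p0, pn)
print(mid, p0 * r ** (n / 2))

# Two parts of an area growing at different constant ratios (hypothetical).
part_a, ratio_a = 50_000, 1.02
part_b, ratio_b = 50_000, 1.00
totals = [part_a * ratio_a ** k + part_b * ratio_b ** k for k in range(3)]
first_ratio = totals[1] / totals[0]
second_ratio = totals[2] / totals[1]
print(first_ratio, second_ratio)
```

The yearly ratio for the whole area creeps upward from year to year, although each part increases at a constant rate.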

25. The property of the geometric mean illustrated by equation

(13) renders it, in some respects, a peculiarly convenient form of

average in dealing with ratios, i.e. "index-numbers," as they are

termed, of prices. Let

$$\begin{array}{l} X'_0,\ X''_0,\ X'''_0,\ \ldots,\ X^{(N)}_0 \\ X'_1,\ X''_1,\ X'''_1,\ \ldots,\ X^{(N)}_1 \\ X'_2,\ X''_2,\ X'''_2,\ \ldots,\ X^{(N)}_2 \\ \ldots \end{array}$$

denote the prices of N commodities in the years 0, 1, 2, ....

Further, let $Y'_{10} = X'_1 / X'_0$, and so on, so that

$$\begin{array}{l} Y'_{10},\ Y''_{10},\ Y'''_{10},\ \ldots,\ Y^{(N)}_{10} \\ Y'_{20},\ Y''_{20},\ Y'''_{20},\ \ldots,\ Y^{(N)}_{20} \\ \ldots \end{array}$$

represent the ratios of the prices of the several commodities in years

1, 2, ... to their prices in year 0. These ratios, in practice

multiplied by 100, are termed index-numbers of the prices of the

several commodities, on the year 0 as base. Evidently some


form of average of the Y's for any given year will afford an

indication of the general level of prices for that year, provided the

commodities chosen are sufficiently numerous and representative.

The question is, what form of average to choose. If the geometric

mean be chosen, and $G_{10}$, $G_{20}$ denote the geometric means of the

Y's for the years 1 and 2 respectively, we have

$$\frac{G_{20}}{G_{10}} = \left(\frac{Y'_{20}}{Y'_{10}} \cdot \frac{Y''_{20}}{Y''_{10}} \cdots \frac{Y^{(N)}_{20}}{Y^{(N)}_{10}}\right)^{1/N} = \left(\frac{X'_2}{X'_1} \cdot \frac{X''_2}{X''_1} \cdots \frac{X^{(N)}_2}{X^{(N)}_1}\right)^{1/N} = \left(Y'_{21}\, Y''_{21} \cdots Y^{(N)}_{21}\right)^{1/N} \qquad (17)$$

From the first form of this equation we see that the ratio of the

geometric mean index-number in year 2 to that in year 1 is

identical with the geometric mean of the ratios for the index-

numbers of the several commodities. A similar property does

not hold for any other form of average: the ratio of the arithmetic

mean index-numbers is not the same as the arithmetic mean of

the ratios, nor is the ratio of the medians the median of the

ratios. From the second and third forms of the equation it

appears further that the ratio of the geometric mean index-

number in year 2 to that in year 1 is independent of the prices in

the year first chosen as base (i.e. year 0), and is identical with the

geometric mean of the index-numbers for year 2, on year 1 as

base. Again, a similar property does not hold for any other form

of average. If arithmetic means of the index-numbers be taken,

for example, the ratio of the mean in year 2 to the mean in year

1 will vary with the year taken as base, and will differ more or

less from the arithmetic mean ratio of the prices in year 2 to the

prices of the same commodities in year 1 ; the same statement is

true if medians be used. The results given by the use of the

geometric mean possess, therefore, a certain consistency that is

not exhibited if other forms of average are employed. It was

used in a classical paper by Jevons (ref. 4), though not on quite

the same grounds, but has never been at all generally employed.
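The consistency property expressed by equation (17) can be checked numerically. The following Python sketch uses made-up prices for three commodities (the figures and the helper names are illustrative assumptions, not data from the text) and compares what happens with geometric and with arithmetic means of the index-numbers.

```python
import math

# Hypothetical prices of three commodities in years 0, 1 and 2
# (illustrative figures only; not taken from the text).
prices = {
    0: [10.0, 4.0, 25.0],
    1: [12.0, 5.0, 20.0],
    2: [15.0, 4.0, 30.0],
}

def index_numbers(year, base):
    """Ratios of the prices in `year` to the prices in `base`."""
    return [x / b for x, b in zip(prices[year], prices[base])]

def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

def arithmetic_mean(values):
    return sum(values) / len(values)

# Geometric means on year 0 as base, and their ratio.
g10 = geometric_mean(index_numbers(1, 0))
g20 = geometric_mean(index_numbers(2, 0))
geometric_ratio = g20 / g10

# Geometric mean of the index-numbers for year 2 on year 1 as base.
g21 = geometric_mean(index_numbers(2, 1))

# The same comparison carried out with arithmetic means.
arithmetic_ratio = (arithmetic_mean(index_numbers(2, 0))
                    / arithmetic_mean(index_numbers(1, 0)))
a21 = arithmetic_mean(index_numbers(2, 1))

print(f"geometric:  ratio on base 0 = {geometric_ratio:.4f}, direct on base 1 = {g21:.4f}")
print(f"arithmetic: ratio on base 0 = {arithmetic_ratio:.4f}, direct on base 1 = {a21:.4f}")
```

With these figures the two geometric results agree, while the two arithmetic results differ, which is the behaviour described above.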

26. The general use of the geometric mean has been suggested

on another ground, namely, that the magnitudes of deviations

appear, as a rule, to be dependent in some degree on the magni-

tude of the average; thus the length of a mouse varies less than

the stature of a man, and the height of a shrub less than that of

a tree. Hence, it is argued, variations in such cases should be

measured rather by their ratio to, than their difference from, the

average; and if this is done, the geometric mean is the natural

average to use. If deviations be measured in this way, a



deviation G/r will be regarded as the equivalent of a deviation r.G,

instead of a deviation -x as the equivalent of a deviation +x.

If a distribution take the simplest possible form when relative

deviations are regarded as equivalents, the frequency of deviations

between G/s and G/r will be equal to the frequency of deviations

between r.G and s.G. The frequency-curve will then be sym-

metrical round log G if plotted to log X as base, and if there be

a single mode, log G will be that mode—a logarithmic or geometric

mode, as it might be termed: G will not be the mode if the distri-

bution be plotted in the ordinary way to values of X as base.

The theory of such a distribution has been discussed by more than

one author (refs. 2, 8, 9). The general applicability of the assump-


tion made does not, however, appear to have been very widely

tested, and the reasons assigned have not sufficed to bring the

geometric mean into common use. It may be noted that, as the

geometric mean is always less than the arithmetic mean, the

fundamental assumption which would justify the use of the former

clearly does not hold where the (arithmetic) mode is greater than

the arithmetic mean, as in Tables X. and XI. of the last chapter.

27. The Harmonic Mean.-—The harmonic mean of a series of

quantities is the reciprocal of the arithmetic mean of their

reciprocals, that is, if H be the harmonic mean,

$$\frac{1}{H} = \frac{1}{N}\,\Sigma\!\left(\frac{1}{X}\right).$$



The following illustration, the result of which is required for an

example in a later chapter (Chap. XIII. § 11), will serve to show

the method of calculation.

The table gives the number of litters of mice, in certain


breeding experiments, with given numbers (X) in the litter. (Data

from A. D. Darbishire, Biometrika, iii. pp. 30, 31.)

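[Table: Number in Litter (X); Number of Litters (f); f/X. Of the entries, only the values 5.333 and 5.200 are legible in this copy.]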









Whence, 1/H = 0.2831, H = 3.532. The arithmetic mean is 4.587,

or more than a unit greater.

If the prices of a commodity at different places or times are

stated in the form "so much for a unit of money," and an average

price obtained by taking the arithmetic mean of the quantities

sold for a unit of money, the result is equivalent to the harmonic

mean of prices stated in the ordinary way. Thus retail prices of

eggs are usually quoted in England as "so many to the shilling."

Supposing we had 100 returns of retail prices of eggs, 50 returns

showing twelve eggs to the shilling, 30 fourteen to the shilling,

and 20 ten to the shilling; then the mean number per shilling

would be 12.2, equivalent to a price of 0.984d. per egg. But
if the prices had been quoted in the form usual for other commodities, we should have had 50 returns showing a price of 1d.
per egg, 30 showing a price of 0.857d., and 20 a price of 1.2d.:
arithmetic mean 0.997d., a slightly greater value than the harmonic mean of 0.984d. The official returns of prices in India were,
until 1907, given in the form of "Sers (2.057 lbs.) per rupee."

The average annual price of a commodity was based on half-

monthly prices stated in this form, and "index-numbers" were

calculated from such annual averages. In the issues of "Prices

and Wages in India" for 1908 and later years the prices have

been stated in terms of "rupees per maund (82.286 lbs.)." The

change, it will be seen, amounts to a replacement of the harmonic

by the arithmetic mean price.
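The egg-price illustration can be reproduced in a few lines. This is a minimal Python sketch using the figures given in the text (50, 30 and 20 returns at twelve, fourteen and ten eggs to the shilling, with twelve pence to the shilling); the variable names are illustrative.

```python
# Returns of retail egg prices from the text: (number of returns, eggs per shilling).
returns = [(50, 12), (30, 14), (20, 10)]
PENCE_PER_SHILLING = 12

n = sum(f for f, _ in returns)

# Arithmetic mean of the quantities sold for a unit of money.
mean_eggs_per_shilling = sum(f * eggs for f, eggs in returns) / n
price_from_mean_quantity = PENCE_PER_SHILLING / mean_eggs_per_shilling

# The same returns restated in the ordinary way, as pence per egg.
prices = [(f, PENCE_PER_SHILLING / eggs) for f, eggs in returns]
arithmetic_mean_price = sum(f * p for f, p in prices) / n
harmonic_mean_price = n / sum(f / p for f, p in prices)

print(f"mean eggs per shilling    : {mean_eggs_per_shilling:.1f}")
print(f"price from mean quantity  : {price_from_mean_quantity:.3f}d.")
print(f"harmonic mean of prices   : {harmonic_mean_price:.3f}d.")
print(f"arithmetic mean of prices : {arithmetic_mean_price:.3f}d.")
```

The price obtained from the mean quantity coincides with the harmonic mean of the prices, and both fall slightly below the arithmetic mean price.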

The harmonic mean of a series of quantities is always lower

than the geometric mean of the same quantities, and, a fortiori,

lower than the arithmetic mean, the amount of difference depend-

ing largely on the magnitude of the dispersion relatively to the

magnitude of the mean. (Cf. Question 9, Chap. VIII.)
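The ordering of the three means, and the way the differences grow with the relative dispersion, can be seen on two small made-up series. The sketch below relies on Python's standard `statistics` module (Python 3.8 or later for `geometric_mean`); the two series are hypothetical and chosen only to have nearly the same centre but very different spread.

```python
from statistics import geometric_mean, harmonic_mean, mean

# Two hypothetical series: one closely clustered, one widely dispersed.
tight = [3.9, 4.0, 4.1]
wide = [2.0, 4.0, 8.0]

def three_means(values):
    """Return (harmonic, geometric, arithmetic) means of the values."""
    return harmonic_mean(values), geometric_mean(values), mean(values)

for label, values in (("tight", tight), ("wide", wide)):
    h, g, a = three_means(values)
    print(f"{label:5s}: harmonic = {h:.4f}, geometric = {g:.4f}, arithmetic = {a:.4f}")

# Difference between arithmetic and harmonic mean, relative to the arithmetic mean.
tight_relative_gap = (mean(tight) - harmonic_mean(tight)) / mean(tight)
wide_relative_gap = (mean(wide) - harmonic_mean(wide)) / mean(wide)
```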




(1) Fechner, G. T., "Ueber den Ausgangswerth der kleinsten Abweichungssumme, dessen Bestimmung, Verwendung und Verallgemeinerung," Abh. d. kgl. sächsischen Gesellschaft d. Wissenschaften, vol.
xviii. (also numbered xi. of the Abh. d. math.-phys. Classe); Leipzig
(1878), p. 1. (The average defined as the origin from which the

dispersion, measured in one way or another, is a minimum: geometric

mean dealt with incidentally, pp. 13-16.)

(2) Fechner, G. T., Kollektivmasslehre, herausgegeben von G. F. Lipps;

Engelmann, Leipzig, 1897. (Posthumously published: deals with

frequency-distributions, their forms, averages, and measures of dis-

persion in general: includes much of the matter of (1).)

(3) Zizek, Franz, Die statistischen Mittelwerthe; Duncker und Humblot,

Leipzig, 1908. (Non-mathematical, but useful to the economic student

for references cited.)


The Geometric Mean.

(4) Jevons, W. Stanley, A Serious Fall in the Value of Gold ascertained

and its Social Effects set forth; Stanford, London, 1863. Reprinted

in Investigations in Currency and Finance; Macmillan, London, 1884.

(The geometric mean applied to the measurement of price changes.)

(5) Jevons, W. Stanley, "On the Variation of Prices and the Value of

the Currency since 1782," Jour. Roy. Stat. Soc., vol. xxviii., 1865.

Also reprinted in volume cited above.

(6) Edgeworth, F. Y., "On the Method of ascertaining a Change in the
Value of Gold," Jour. Roy. Stat. Soc., vol. xlvi., 1883, p. 714. (Some
criticism of the reasons assigned by Jevons for the use of the geometric
mean.)


(7) Galton, Francis, "The Geometric Mean in Vital and Social Statistics,"

Proc. Roy. Soc., vol. xxix., 1879, p. 365.

(8) McAlister, Donald, "The Law of the Geometric Mean," ibid., p. 367.

(The law of frequency to which the use of the geometric mean would

be appropriate.)

(9) Kapteyn, J. C., Skew Frequency-curves in Biology and Statistics;

Noordhoff, Groningen, and Wm. Dawson, London, 1903. (Contains,

amongst other forms, a generalisation of McAlister's law.)

(10) Crawford, G. E., "An Elementary Proof that the Arithmetic Mean

of any number of Positive Quantities is greater than the Geometric

Mean," Proc. Edin. Math. Soc., vol. xviii., 1899-1900.

See also refs. 1 and 2.

The Mode.

(11) Pearson, Karl, "Skew Variation in Homogeneous Material," Phil.

Trans. Roy. Soc., Series A, vol. clxxxvi., 1895, p. 343. (Definition of

mode, p. 345.)

(12) Yule, G. U., "Notes on the History of Pauperism in England and

Wales, etc. : Supplementary Note on the Determination of the Mode,"


Jour. Roy. Stat. Soc., vol. lix., 1896, p. 343. (The note deals with

elementary methods of approximately determining the mode : the one-

third rule and one other.)

(13) Pearson, Karl, "On the Modal Value of an Organ or Character,"

Biometrika, vol. i., 1902, p. 260. (A warning as to the inadequacy of

mere inspection for determining the mode.)

Estimates of Population.

(14) Waters, A. C., "A Method for estimating Mean Populations in the
last Intercensal Period," Jour. Roy. Stat. Soc., vol. lxiv., 1901, p. 293.

(15) Waters, A. C., Estimates of Population: Supplement to the 65th Annual
Report of the Registrar-General for England and Wales (Cd. 2618, 1907),

p. cxvii.


Index-numbers.

These were incidentally referred to in § 25. The general theory of

index-numbers and the different methods in which they may be formed

are not considered in the present work. The student will find copious

references to the literature in the following:—

(16) Edgeworth, F. Y., "Reports of the Committee appointed for the

purpose of investigating the best methods of ascertaining and measuring



Variations in the Value of the Monetary Standard," British Association

Reports, 1887 (p. 247), 1888 (p. 181), 1889 (p. 133), and 1890 (p. 485).

(17) Edgeworth, F. Y., Article "Index-numbers" in Palgrave's Dictionary

of Political Economy, vol. ii.; Macmillan, 1896.

(18) Fountain, H., "Memorandum on the Construction of Index-numbers

of Prices," in the Board of Trade Report on Wholesale and Retail

Prices in the United Kingdom, 1903.


1. Verify the following means and medians from the data of Table VI.

Chap. VI.

Stature in Inches for Adult Males in—

England. Scotland. Wales. Ireland.












In the calculation of the means use the same arbitrary origin as in Example

ii., and check your work by the method of § 13 (6).

2. Find the mean weight of adult males in the United Kingdom from the

data in the last column of Table IX., Chap. VI. Also find the median weight,

and hence the approximate mode, by the method of § 20.

3. Similarly, find the mean, median, and approximate value of the mode

for the distribution of fecundity in racehorses, Table X., Chap. VI.


4. Using a graphical method, find the median annual value of houses assessed

to inhabited house duty in the financial year 1885-6 from the data of Table

IV., Chap. VI.

5. (Data from Sauerbeck, Jour. Roy. Stat. Soc., March 1909.) The figures

in columns 1 and 2 of the small table below show the index-numbers (or per-

centages) of prices of certain animal foods in the years 1898 and 1908, on

their average prices during the years 1867-77. In column 3 have been added

the ratios of the index-numbers in 1908 to the index-numbers in 1898, the

latter being taken as 100.

Find the average ratio of prices in 1908 to prices in 1898, taken as 100:—

(1) From the arithmetic mean of the ratios in col. 3.

(2) From the ratio of the arithmetic means of cols. 1 and 2.

(3) From the ratio of the geometric means of cols. 1 and 2.

(4) From the geometric mean of the ratios in col. 3.

Note that, by § 25, the last two methods must give the same result.

Index-number of price in Ratio

1898. 1908. 08/98.





1. Beef, prime



















6. (Data from census of 1901.) The table below shows the population of

the rural sanitary districts of Essex, the urban sanitary districts (other than

the borough of West Ham), and the borough of West Ham, at the censuses

of 1891 and 1901. Estimate the total population of the county at a date

midway between the two censuses, (1) on the assumption that the percentage

rate of increase is constant for the county as a whole, (2) on the assumption

that the percentage rate of increase is constant in each group of districts and

the borough of West Ham.





Rural districts







West Ham ....

Other urban districts




7. (Data from Agricultural Statistics for 1905, Cd. 3061, 1906.) The

following statement shows the monthly average prices of eggs in Great

Britain in 1905, as compiled from the weekly returns of market prices for

first and second quality British eggs, per 120 :—






s. d.

s. d.


13 0

11 0


11 0

















August .

11 0

10 0


11 6

10 6

October .

14 0

12 6


18 0

16 0


1. Inadequacy of the range as a measure of dispersion—2-13. The standard

deviation; its definition, calculation, and properties—14-19. The

mean deviation : its definition, calculation, and properties—20-24. The

quartile deviation or semi-interquartile range—25. Measures of

relative dispersion—26. Measures of asymmetry or skewness—27-30.

The method of grades or percentiles.

1. The simplest possible measure of the dispersion of a series of

values of a variable is the actual range, i.e. the difference between

the greatest and least values observed. While this is frequently

quoted, it is as a rule the worst of all possible measures for any

serious purpose. There are seldom real upper and lower limits

to the possible values of the variable, very large or very small


values being only more or less infrequent: the range is therefore

subject to meaningless fluctuations of considerable magnitude

according as values of greater or less infrequency happen to

have been actually observed. Note, for instance, the figures of

Table IX., Chap. VI. p. 95, showing the frequency distributions of

weights of adult males in the several parts of the United King-

dom. In Wales, one individual was observed with a weight of

over 280 lbs., the next heaviest being under 260 lbs. The

addition of the one very exceptional individual has increased the

range by some 30 lbs., or about one-fifth. A measure subject to

erratic alterations by casual influences in this way is clearly not

of much use for comparative purposes. Moreover, the measure

takes no account of the form of the distribution within the limits

of the range; it might well happen that, of two distributions

covering precisely the same range of variation, the one showed

the observations for the most part closely clustered round the

average, while the other exhibited an almost even distribution of


frequency over the whole range. Clearly we should not regard

two such distributions as exhibiting the same dispersion, though

they exhibit the same range. Some sort of measure of dispersion

is therefore required, based, like the averages discussed in the last


chapter, on all the observations made, so that no single observation

can have an unduly preponderant effect on its magnitude; indeed,

the measure should possess all the properties laid down as desir-

able for an average in § 4 of Chap. VII. There are three such

measures in common use—the standard deviation, the mean

deviation, and the quartile deviation or semi-interquartile range,

of which the first is the most important.
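The point that two distributions may cover the same range and yet differ in dispersion is easily illustrated. In the Python sketch below both series are hypothetical; one is clustered round its average and the other spread almost evenly, and the standard deviation (computed with `statistics.pstdev`, which divides by N) is used as the measure based on all the observations.

```python
from statistics import pstdev

# Two hypothetical series covering precisely the same range, 0 to 10.
clustered = [0, 4, 5, 5, 5, 5, 6, 10]   # mostly close to the average
even = [0, 1, 3, 4, 6, 7, 9, 10]        # spread almost evenly over the range

def value_range(values):
    """Difference between the greatest and least values observed."""
    return max(values) - min(values)

for label, values in (("clustered", clustered), ("even", even)):
    print(f"{label:9s}: range = {value_range(values)}, "
          f"standard deviation = {pstdev(values):.2f}")
```

The range is 10 in both cases, but the standard deviation of the evenly spread series is appreciably the greater.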

2. The Standard Deviation.—The standard deviation is the

square root of the arithmetic mean of the squares of all deviations,

deviations being measured from the arithmetic mean of the

observations. If the standard deviation be denoted by $\sigma$, and a

deviation from the arithmetic mean by x, as in the last chapter,

then the standard deviation is given by the equation


$$\sigma^2 = \frac{1}{N}\,\Sigma(x^2) \qquad (1)$$

To square all the deviations may seem at first sight an artificial

procedure, but it must be remembered that it would be useless to

take the mere sum of the deviations, in order to obtain a measure

of dispersion, since this sum is necessarily zero if deviations be

taken from the mean. In order to obtain some quantity that

shall vary with the dispersion it is necessary to average the

deviations by a process that treats them as if they were all of the

same sign, and squaring is the simplest process for eliminating

signs which leads to results of algebraical convenience.

3. A quantity analogous to the standard deviation may be

defined in more general terms. Let A be any arbitrary value of

X, and let $\xi$ (as in Chap. VII. § 8) denote the deviation of X
from A; i.e. let
$$\xi = X - A.$$


Then we may define the root-mean-square deviation s from the

origin A by the equation


$$s^2 = \frac{1}{N}\,\Sigma(\xi^2) \qquad (2)$$

In terms of this definition the standard deviation is the root-

mean-square deviation from the mean. There is a very simple

relation between the standard deviation and the root-mean-square

deviation from any other origin. Let
$$M - A = d \qquad (3)$$
so that $\xi = x + d$.
Then $\xi^2 = x^2 + 2x.d + d^2$,
$$\Sigma(\xi^2) = \Sigma(x^2) + 2d.\Sigma(x) + N.d^2.$$
But the sum of the deviations from the mean is zero, therefore
the second term vanishes, and accordingly
$$s^2 = \sigma^2 + d^2 \qquad (4)$$


Hence the root-mean-square deviation is least when deviations

are measured from the mean, i.e. the standard deviation is the least

possible root-mean-square deviation.
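The relation $s^2 = \sigma^2 + d^2$ between the mean-square deviation from an arbitrary origin A and that from the mean can be verified directly. The observations and the trial origins in this Python sketch are hypothetical.

```python
# Hypothetical observations (illustrative only).
observations = [12.0, 15.0, 11.0, 18.0, 14.0, 20.0]
mean = sum(observations) / len(observations)

def mean_square_deviation(values, origin):
    """Arithmetic mean of the squared deviations of the values from `origin`."""
    return sum((v - origin) ** 2 for v in values) / len(values)

# Square of the standard deviation: deviations measured from the mean.
sigma_squared = mean_square_deviation(observations, mean)

for origin in (10.0, 14.0, 15.0, 17.5):
    d = mean - origin                                    # d = M - A
    s_squared = mean_square_deviation(observations, origin)
    print(f"A = {origin:5.1f}: s^2 = {s_squared:8.4f}, "
          f"sigma^2 + d^2 = {sigma_squared + d * d:8.4f}")
```

For every origin tried the two columns agree, and the least value of $s^2$ occurs when A coincides with the mean.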

$\Sigma(\xi^2)$, or $\Sigma(f.\xi^2)$ if we are dealing with a grouped distribution
and f is the frequency of $\xi$, is sometimes termed the second moment
of the distribution about A, just as $\Sigma(\xi)$ or $\Sigma(f.\xi)$ is termed
the first moment (cf. Chap. VII. § 8): we shall not make use
of the term in the present work. Generally, $\Sigma(f.\xi^n)$ is termed
the nth moment.

4. If $\sigma$ and d are the two sides of a right-angled triangle, s is


Fig. 25.

the hypotenuse. If, then, ME be the vertical through the

mean of a frequency-distribution (fig. 25), and MS be set off

equal to the standard deviation (on the same scale in which the

variable X is plotted along the base), SA will be the root-mean-

square deviation from the point A. This construction gives a

concrete idea of the way in which the root-mean-square deviation

depends on the origin from which deviations are measured. It

will be seen that for small values of d the difference of s from $\sigma$

will be very minute, since A will lie very nearly on the circle

drawn through M with centre S and radius SM: slight errors

in the mean due to approximations in calculation will not, there-

fore, appreciably affect the value of the standard deviation.

5. If we have to deal with relatively few, say thirty or forty,


ungrouped observations, the method of calculating the standard

deviation is perfectly straightforward. It is illustrated by the

figures given below for the estimated average earnings of


agricultural labourers in 38 rural unions. The values (earnings)

are first of all totalled and the total divided by N to give the

arithmetic mean M, viz. 15s. 11 10/38d., or 15s. 11d. to the nearest

penny. The earnings being estimates, it is not necessary to take

the average to any higher degree of accuracy. Having found

the mean, the difference of each observation from the mean is

next written down as in col. 3, one penny being taken as the

unit: the signs are not entered, as they are not wanted, but the

work should be checked by totalling the positive and negative

differences separately. [The positive total is 300 and the

negative 290, thus checking the value for the mean, viz. 15s.

11d. + 10/38.]

Finally, each difference is squared, and the squares entered in


col. 4,—tables of squares are useful for such work if any of the

differences to be squared are large (see list of Tables, p. 352).

The sum of the squares is 16,018. Treating the value taken for

the mean as sensibly accurate, we have—



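$$\sigma^2 = \frac{16{,}018}{38} = 421.53, \qquad \sigma = 20.53\text{d}.$$

If we wish to be more precise we can reduce to the true mean
by the use of equation (4), as follows:
$$s^2 = \frac{16{,}018}{38} = 421.5263$$
$$d = \frac{10}{38} = 0.2632; \qquad d^2 = 0.0693$$
Hence
$$\sigma^2 = s^2 - d^2 = 421.4570$$
$$\sigma = 20.529\text{d}.$$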

Evidently this reduction, in the given case, is unnecessary,

illustrating the fact mentioned at the end of § 4, that small

errors in the mean have little effect on the value found for the

standard deviation. The first value is correct within a very

small fraction of a penny.
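The arithmetic of this example can be retraced from the totals quoted in the text alone (38 unions, sum of squares 16,018, positive and negative totals of 300 and 290 pence). The following Python sketch does so; the variable names are illustrative.

```python
import math

# Totals quoted in the text for Example i. (unit: one penny).
n = 38                    # number of rural unions
sum_of_squares = 16018    # sum of squared differences from 15s. 11d.
positive_total = 300      # total of the positive differences
negative_total = 290      # total of the negative differences

# Root-mean-square deviation about the approximate mean, 15s. 11d.
s_squared = sum_of_squares / n
s = math.sqrt(s_squared)

# Difference between the true mean and the value taken for it.
d = (positive_total - negative_total) / n

# Reduction to the true mean by equation (4): sigma^2 = s^2 - d^2.
sigma_squared = s_squared - d ** 2
sigma = math.sqrt(sigma_squared)

print(f"s^2 = {s_squared:.4f}, s = {s:.3f}d.")
print(f"d = {d:.4f}, d^2 = {d ** 2:.4f}")
print(f"sigma^2 = {sigma_squared:.4f}, sigma = {sigma:.3f}d.")
```

The two values of the standard deviation differ only in the third decimal place, which bears out the remark that the reduction is unnecessary here.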



Calculation of the Standard Deviation: Example i. Calculation of
Mean and Standard Deviation for a Short Series of Observations ungrouped.
Estimated Average Weekly Earnings of Agricultural Labourers
in Thirty-eight Rural Unions, in 1892-3. (W. Little: Labour Commission; Report, vol. v., part i., 1894.)

    Union.                  Earnings (Shillings and Pence).
                            s.   d.
     1. Glendale            20    9
     2. Wigton              20    3
     3. Garstang            19    8
     4. Belper              18    6
     5. Nantwich            17    8
     6. Atcham              17    6
     7. Driffield           17    1
     8. Uttoxeter           17    0
     9. Wetherby            17    0
    10. Easingwold          16   11
    11. Southwell           16    6
    12. Hollingbourn        16    4
    13. Melton Mowbray      16    3

[Cols. 3 (difference $\xi$, in pence) and 4 (squares of the differences), and rows 14 to 38, are not legible in this copy.]


The figures dealt with in this illustration are estimates of the

weekly earnings of the agricultural labourers, i.e. they include

allowances for gifts in kind, such as coal, potatoes, cider, etc. The

estimated weekly money wages are, however, also given in the

same Report, and we are thus enabled to make an interesting
comparison of the dispersions of the two. It might be expected
that earnings would vary less than wages, as his earnings and not
the mere money wages he receives are the important matter to
the labourer, and as a fact we find

Standard deviation of weekly earnings . . 20.5d.
Standard deviation of weekly wages . . 26.0d.

The arithmetic mean wage is 13s. 5d.

6. If we have to deal with a grouped frequency-distribution,


the same artifices and approximations are used as in the calculation

of the mean (Chap. VII. §§ 8, 9, 10). The mid-value of one of

the class-intervals is chosen as the arbitrary origin A from which

to measure the deviations $\xi$, the class-interval is treated as a

unit throughout the arithmetic, and all the observations within

any one class-interval are treated as if they were identical with

the mid-value of the interval. If, as before, we denote the

frequency in any one interval by f, these f observations contribute $f.\xi^2$ to the sum of the squares of deviations and we have
$$s^2 = \frac{1}{N}\,\Sigma(f.\xi^2).$$
The standard deviation is then calculated from equation (4).
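The procedure for a grouped distribution can be sketched as follows. The frequency table in this Python example is hypothetical (it is not one of the tables of the text); the class-interval is taken as half a unit and the arbitrary origin A as the mid-value 2.5, and the short method is checked against a direct calculation from the mid-values.

```python
import math

# Hypothetical grouped distribution (not a table from the text).
class_interval = 0.5
mid_values = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
frequencies = [2, 5, 9, 12, 8, 3, 1]
origin = 2.5  # arbitrary origin A, the mid-value of one class-interval

n = sum(frequencies)

# Deviations xi from A, with the class-interval treated as the unit.
xi = [round((m - origin) / class_interval) for m in mid_values]

sum_f_xi = sum(f * x for f, x in zip(frequencies, xi))
sum_f_xi2 = sum(f * x * x for f, x in zip(frequencies, xi))

d = sum_f_xi / n                 # M - A, in class-intervals
s_squared = sum_f_xi2 / n        # mean-square deviation about A
sigma_in_intervals = math.sqrt(s_squared - d ** 2)   # equation (4)

# Final conversion from class-intervals to the natural unit.
mean = origin + d * class_interval
sigma = sigma_in_intervals * class_interval

# Direct calculation from the mid-values, as a check.
direct_mean = sum(f * m for f, m in zip(frequencies, mid_values)) / n
direct_sigma = math.sqrt(
    sum(f * (m - direct_mean) ** 2 for f, m in zip(frequencies, mid_values)) / n
)

print(f"mean = {mean:.4f}, sigma = {sigma:.4f}")
print(f"direct check: mean = {direct_mean:.4f}, sigma = {direct_sigma:.4f}")
```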

7. The whole of the work proceeds naturally as an extension of

that necessary for calculating the mean, and we accordingly use

the same illustrations as in the last chapter. Thus in Example

ii. below, cols. 1, 2, 3, and 4 are the same as those we have already

given in Example i. of Chap. VII. for the calculation of the mean.

Column 5 gives the figures necessary for calculating the standard


deviation, and is derived directly from col. 4 by multiplying the

figures of that column again by $\xi$. Thus 90 x 5 = 450, 192 x 4 =

768, and so on. The work is therefore done very rapidly. The

remaining steps of the arithmetic are given below the table; the

student must be careful to remember the final conversion, if

necessary, from the class-interval as unit to the natural unit

of measurement. In this case the value found is 2.48 class-

intervals, and the class-interval being half a unit, that is 1.24

per cent.


Calculation of the Standard Deviation: Example ii. Calculation of
the Standard Deviation of the Percentages of the Population in receipt of
Relief, in addition to the Mean, from the figures of Table VIII. of
Chap. VI. (Cf. the work for the mean alone, p. 111.)

[Table: (1) Percentage of the Population in receipt of Relief; (2) Frequency f; (3) Deviation $\xi$ from Value A; (4) $f.\xi$; (5) $f.\xi^2$. The entries are not legible in this copy.]





































Means and Standard Deviations of the Distributions of Pauperism (Percentage
of the Population in receipt of Poor-law Relief) in the Unions of England
and Wales since 1850. (From Yule, Jour. Roy. Stat. Soc., vol. lix.,
1896, figures slightly amended.)

[Table: Percentage of the Population in receipt of Relief. The entries are not legible in this copy.]


8. In the table given on p. 141 (Example iii.), the calculation of

the standard deviation is similarly shown for the distribution of

the statures of adult males in the British Isles, the work being

continued from the stage which it reached for the calculation of

the mean in Example ii. of Chap. VII. The steps of the arith-

metic hardly call for further explanation, but it may be noted that

the class-interval being a unit in this case, no conversion of

the standard deviation from class-intervals to units is required.

9. The student must remember, as in the case of the calculation

of the mean, that the treatment of all values within each class-

interval as if they were identical with the mid-value of the interval

is an approximation and no more (cf. Chap. VII. § 11), though,

for a distribution of the symmetrical or moderately asymmetrical

type with a class-interval not greater than one-twentieth or so

of the range, the approximation may be a very close one. But

while the value of the arithmetic mean may be either increased

or decreased by grouping, in the case of distributions which are

not more than slightly asymmetrical, the standard deviation of

such distributions always tends to be increased, and the increase

is the greater the cruder the grouping. We give an approximate

correction for this effect later (Chap. XI. § 3). The student is

recommended to test for himself the effect of grouping in two

or three cases.
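One way of testing the effect of grouping is sketched below in Python. The "observations" are synthetic: 2000 evenly spaced quantiles of a normal curve, generated with `statistics.NormalDist` purely for illustration, so the result is a single worked instance and not a general proof. Each value is replaced by the mid-value of its class-interval for three widths of interval, and the standard deviation recomputed.

```python
import math
from statistics import NormalDist, pstdev

# Synthetic, roughly symmetrical "observations" (an assumption made for
# illustration): 2000 evenly spaced quantiles of a standard normal curve.
n = 2000
raw = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]

def grouped_sd(values, width):
    """Standard deviation when each value is replaced by the mid-value
    of the class-interval of the given width in which it falls."""
    mids = [(math.floor(v / width) + 0.5) * width for v in values]
    return pstdev(mids)

sd_raw = pstdev(raw)
sd_by_width = {width: grouped_sd(raw, width) for width in (0.5, 1.0, 2.0)}

print(f"ungrouped       : {sd_raw:.4f}")
for width, sd in sd_by_width.items():
    print(f"class-interval {width}: {sd:.4f}")
```

In this instance the standard deviation rises with each coarsening of the grouping, in agreement with the tendency stated above.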

10. It is a useful empirical rule to remember that a range of

six times the standard deviation usually includes 99 per cent. or

more of all the observations in the case of distributions of the

symmetrical or moderately asymmetrical type. Thus in Example



Calculation of the Standard Deviation: Example iii. Calculation
of the Standard Deviation of Stature of Male Adults in the British Isles
from the figures of Table VI., p. 88. (Cf. p. 112 for the calculation of
mean alone.)

[Table: deviations measured from Value A. The entries are not legible in this copy.]








































give a more definite and concrete meaning to the standard

deviation, and also to check arithmetical work to some extent—

sufficiently, that is to say, to guard against very gross blunders.

It must not be expected to hold for short series of observations:

in Example i., for instance, the actual range is a good deal less

than six times the standard deviation.

11. The standard deviation is the measure of dispersion which

it is most easy to treat by algebraical methods, resembling in this

respect the arithmetic mean amongst measures of position. The

majority of illustrations of its treatment must be postponed to a

later stage (Chap. XI.), but the work of § 3 has already served as

one example, and we may take another by continuing the work of
§ 13 (6), Chap. VII. In that section it was shown that if a series
of observations of which the mean is M consist of two component
series, of which the means are $M_1$ and $M_2$ respectively,
$$N.M = N_1.M_1 + N_2.M_2,$$
$N_1$ and $N_2$ being the numbers of observations in the two component series, and $N = N_1 + N_2$ the number in the entire series.
Similarly, the standard deviation $\sigma$ of the whole series may be
expressed in terms of the standard deviations $\sigma_1$ and $\sigma_2$ of the
components and their respective means. Let
$$M_1 - M = d_1,$$
$$M_2 - M = d_2.$$
Then the mean-square deviations of the component series about
the mean M are, by equation (4), $\sigma_1^2 + d_1^2$ and $\sigma_2^2 + d_2^2$ respectively. Therefore, for the whole series,
$$N.\sigma^2 = N_1(\sigma_1^2 + d_1^2) + N_2(\sigma_2^2 + d_2^2) \qquad (5)$$
If the numbers of observations in the component series be equal
and the means be coincident, we have as a special case
$$\sigma^2 = \tfrac{1}{2}(\sigma_1^2 + \sigma_2^2) \qquad (6)$$


so that in this case the square of the standard deviation of the

whole series is the arithmetic mean of the squares of the standard

deviations of its components.

It is evident that the form of the relation (5) is quite general:
if a series of observations consists of r component series with
standard deviations $\sigma_1, \sigma_2, \ldots, \sigma_r$, and means diverging from the
general mean of the whole series by $d_1, d_2, \ldots, d_r$, the standard
deviation $\sigma$ of the whole series is given (using m to denote any
subscript) by the equation
$$N.\sigma^2 = \Sigma(N_m.\sigma_m^2) + \Sigma(N_m.d_m^2) \qquad (7)$$


Again, as in § 13 of Chap. VII., it is convenient to note, for the

checking of arithmetic, that if the same arbitrary origin be used

for the calculation of the standard deviations in a number of

component distributions we must have

$$\Sigma(f.\xi^2) = \Sigma(f_1.\xi^2) + \Sigma(f_2.\xi^2) + \ldots + \Sigma(f_r.\xi^2) \qquad (8)$$
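The rule for compounding standard deviations, $N.\sigma^2 = \Sigma(N_m.\sigma_m^2) + \Sigma(N_m.d_m^2)$, may be checked on any small example. In the Python sketch below the two component series are hypothetical; the standard deviation of the whole series is built up from the components and compared with the value found directly from the pooled observations.

```python
import math
from statistics import mean, pstdev

# Two hypothetical component series (illustrative only).
first = [4.0, 6.0, 7.0, 9.0, 9.0]
second = [10.0, 12.0, 15.0]
components = [first, second]

whole = first + second
n = len(whole)
general_mean = mean(whole)

# N.sigma^2 = sum of N_m.(sigma_m^2 + d_m^2) over the component series.
total = 0.0
for series in components:
    n_m = len(series)
    sigma_m = pstdev(series)             # standard deviation of the component
    d_m = mean(series) - general_mean    # divergence of its mean from the general mean
    total += n_m * (sigma_m ** 2 + d_m ** 2)

sigma_combined = math.sqrt(total / n)
sigma_direct = pstdev(whole)

print(f"from the components : {sigma_combined:.6f}")
print(f"from the whole      : {sigma_direct:.6f}")
```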

12. As another useful illustration, let us find the standard

deviation of the first N natural numbers. The mean in this case

is evidently (N+ 1)/2. Further, as is shown in any elementary

Algebra, the sum of the squares of the first N natural numbers is
$$\frac{N(N+1)(2N+1)}{6}.$$
The standard deviation $\sigma$ is therefore given by the equation
$$\sigma^2 = \tfrac{1}{6}(N+1)(2N+1) - \tfrac{1}{4}(N+1)^2,$$
that is,
$$\sigma^2 = \tfrac{1}{12}(N^2 - 1) \qquad (9)$$

This result is of service if the relative merit of, or the relative

intensity of some character in, the different individuals of a series

is recorded not by means of measurements, e.g. marks awarded on

some system of examination, but merely by means of their

respective positions when ranked in order as regards the character,

in the same way as boys are numbered in a class. With N

individuals there are always N ranks, as they are termed,

whatever the character, and the standard deviation is therefore

always that given by equation (9).
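Equation (9) is easily verified by direct calculation. The sketch below uses exact fractions so that the comparison involves no rounding; the function name is only illustrative.

```python
from fractions import Fraction

def rank_variance(n):
    """Square of the standard deviation of the ranks 1, 2, ..., n."""
    ranks = range(1, n + 1)
    mean = Fraction(sum(ranks), n)          # equals (n + 1)/2
    return sum((r - mean) ** 2 for r in ranks) / n

# Equation (9): sigma^2 = (N^2 - 1)/12
check = all(rank_variance(n) == Fraction(n * n - 1, 12) for n in range(1, 51))
```

For N = 10 the value is 99/12 = 8.25, whatever the character by which the ten individuals are ranked.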

Another useful result follows at once from equation (9), namely,

the standard deviation of a frequency-distribution in which all

values of X within a range $\pm l/2$ on either side of the mean are

equally frequent, values outside these limits not occurring, so that

the frequency-distribution may be represented by a rectangle. The

base l may be supposed divided into a very large number N of equal

elements, and the standard deviation reduces to that of the first N

natural numbers when N is made indefinitely large. The single

unit then becomes negligible compared with N, and consequently

$\sigma^2 = \dfrac{l^2}{12}$ . . . . (10)
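The passage to the limit can be illustrated numerically. This is a sketch under the assumption that the base l is cut into N elements each of width l/N, so that equation (9) applies with l/N as the unit; the function name and the value l = 3 are illustrative only.

```python
def rectangle_variance(l, n):
    """sigma^2 for n equally frequent values spaced l/n apart: (n^2 - 1)/12 units squared."""
    unit = l / n
    return (n * n - 1) / 12 * unit ** 2

l = 3.0
approximations = [rectangle_variance(l, n) for n in (2, 10, 100, 10**6)]
limit = l ** 2 / 12   # equation (10)
```

The successive values rise steadily towards `limit`, the shortfall being the fraction 1/N² of the whole.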

13. It will be seen from the preceding paragraphs that the

standard deviation possesses the majority at least of the properties

which are desirable in a measure of dispersion as in an average

(Chap. VII. § 4). It is rigidly defined; it is based on all the

observations made; it is calculated with reasonable ease; it lends

itself readily to algebraical treatment; and we may add, though the

student will have to take the statement on trust for the present,

that it is, as a rule, the measure least affected by fluctuations of


sampling. On the other hand, it may be said that its general

nature is not very readily comprehended, and that the process of

squaring deviations and then taking the square root of the mean

seems a little involved. The student will, however, soon surmount

this feeling after a little practice in the calculation and use of the

constant, and will realise, as he advances further, the advantages

that it possesses. Such root-mean-square quantities, it may be

added, frequently occur in other branches of science. The

standard deviation should always be used as the measure of disper-

sion, unless there is some very definite reason for preferring another

measure, just as the arithmetic mean should be used as the measure

of position. It may be added here that the student will meet with

the standard deviation under many different names, of which we

have adopted the most recent (due to Pearson, ref. 2): many of

the earlier names are hardly adapted to general use, as they bear

evidence of their derivation from the theory of errors of observation.

Thus the terms "mean error" (Gauss), "error of mean square"

(Airy), and "mean square error" have all been used in the same

sense. The square of the standard deviation, and also twice the

square, have been termed the "fluctuation" (Edgeworth): the

standard deviation multiplied by the square root of 2, the

"modulus" (Airy),—the student will see later the reason for

the adoption of the factor. The reciprocal of the modulus has

been termed the "precision" (Lexis).

14. The Mean Deviation.—The mean deviation of a series of

values of a variable is the arithmetic mean of their deviations

from some average, taken without regard to their sign. The

deviations may be measured either from the arithmetic mean or

from the median, but the latter is the natural origin to use. Just

as the root-mean-square deviation is least when deviations are

measured from the arithmetic mean, so the mean deviation is

least when deviations are measured from the median. For

suppose that, for some origin exceeded by m values out of N, the

mean deviation has a value $\Delta$. Let the origin be displaced by

an amount c until it is just exceeded by m - 1 of the values only,

i.e. until it coincides with the mth value from the upper end of

the series. By this displacement of the origin the sum of deviations

in excess of the origin is reduced by m.c, while the sum of

deviations in defect of the origin is increased by (N - m)c. The

new mean deviation is therefore

$\Delta + \dfrac{(N - m)c - mc}{N} = \Delta + \dfrac{1}{N}(N - 2m)c$.

The new mean deviation is accordingly less than the old so long as

$m > \tfrac{1}{2}N$.

That is to say, if N be even, the mean deviation is constant for

all origins within the range between the N/2th and the (N/2 + 1)th

observations, and this value is the least: if N be odd, the mean

deviation is lowest when the origin coincides with the (N + 1)/2th

observation. The mean deviation is therefore a minimum when

deviations are measured from the median or, if the latter be

indeterminate, from an origin within the range in which it lies.
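The property just proved may be seen on a small scale. The following sketch uses two short series invented for illustration, one with N odd and one with N even; the names are not from the text.

```python
from statistics import median

def mean_deviation(values, origin):
    """Arithmetic mean of the deviations from `origin`, taken without regard to sign."""
    return sum(abs(v - origin) for v in values) / len(values)

odd_series = [3, 7, 8, 12, 20]    # hypothetical, N = 5, median 8, mean 10
even_series = [3, 7, 8, 12]       # hypothetical, N = 4, median indeterminate within 7 to 8

md_about_median = mean_deviation(odd_series, median(odd_series))
md_about_mean = mean_deviation(odd_series, sum(odd_series) / len(odd_series))

# For N even the mean deviation is constant between the two middle observations.
within_range = [mean_deviation(even_series, o) for o in (7, 7.5, 8)]
outside_range = mean_deviation(even_series, 6)
```

With these figures the mean deviation about the median is 4.4 against 4.8 about the mean, and for the even series every origin from 7 to 8 gives 2.5, while an origin of 6 gives 3.0.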

15. The calculation of the mean deviation either from the mean

or from the median for a series of ungrouped observations is very

simple. Take the figures of Example i. (p. 137) as an illustration.

We have already found the mean (15s. 11d. to the nearest penny),

and the deviations from the mean are written down in column 3.

Adding up this column without respect to the sign of the devi-

ations we find a total of 590. The mean deviation from the mean

is therefore 590/38 = 15.53d. The mean deviation from the

median is calculated in precisely the same way, but the median

replaces the mean as the origin from which deviations are measured.

The median is 15s. 6d. The deviations in pence run 63, 57, 50,

36, and so on; their sum is 570; and, accordingly, the mean

deviation from the median is 15d. exactly.

16. In the case of a grouped frequency-distribution, the sum

of deviations should be calculated first from the centre of the

class-interval in which the mean (or median) lies, and then

reduced to the mean as origin. Thus in the case of Example ii.

the mean is 3.29 per cent. and lies in the class-interval centring

round 3.5 per cent. We have already found that the sum of

deviations in defect of 3.5 per cent. is 776, and of deviations in

excess 509: total (without regard to sign) 1285,—the unit of

measurement being, of course, as it is necessary to remember, the

class-interval. If the number of observations below the mean is

$N_1$ and above the mean $N_2$, and M - A = d, as before, we have to

add $N_1 d$ to the sum found and subtract $N_2 d$. In the present

case $N_1 = 327$ and $N_2 = 305$, while $d = -0.42$ class-intervals,

$d(N_1 - N_2) = -0.42 \times 22 = -9.2$,

and the sum of deviations from the mean is 1285 - 9.2 = 1275.8.

Hence the mean deviation from the mean is 1275.8/632 = 2.019

class-intervals, or 1.01 per cent.

17. The mean deviation from the median should be found in

precisely similar fashion, but the mid-value of the interval in

which the median (instead of the mean) lies should, for con-


venience, be taken as origin. Thus in Example ii. the median is

(Chap. VII. § 15) 3.195 per cent. Hence 3.0 per cent. should be

taken as the origin, $d = +0.39$ intervals, $N_1 = 327$, $N_2 = 305$. The

deviation-sum with 3.0 as origin is found to be 1263, and the

correction is $+0.39 \times 22 = +8.6$. Hence the mean deviation

from the median is 2.012 intervals, or again 1.01 per cent. The

value is really smaller than that of the mean deviation from the

arithmetic mean, but the difference is too slight to affect the

second place of decimals.
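The reduction of the deviation-sum from the centre of a class-interval to the mean or median as origin, as carried out in §§ 16 and 17, may be sketched as follows. The function name and argument names are illustrative; the figures are those quoted above for Example ii., with the class-interval (0.5 per cent.) as the unit.

```python
def shifted_deviation_sum(sum_about_origin, n_below, n_above, d):
    """Sum of deviations (without regard to sign) about the new origin A + d.

    Each of the n_below observations lies d further from the new origin and
    each of the n_above observations d nearer, so the sum changes by
    d * (n_below - n_above). All values are treated as mid-values of intervals.
    """
    return sum_about_origin + d * (n_below - n_above)

N = 632
interval = 0.5  # per cent.

# Section 16: origin 3.5 per cent., mean 3.29 per cent., d = -0.42 intervals.
md_mean = shifted_deviation_sum(1285, 327, 305, -0.42) / N

# Section 17: origin 3.0 per cent., median 3.195 per cent., d = +0.39 intervals.
md_median = shifted_deviation_sum(1263, 327, 305, 0.39) / N
```

Both results, multiplied by `interval`, round to 1.01 per cent., the value from the median being slightly the smaller.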

It should be noted that, as in the case of the standard deviation,

this method of calculation implies the assumption that all the

values of X within any one class-interval may be treated as if

they were the mid-value of that interval. This is, of course, an

approximation, but as a rule gives results of amply sufficient

accuracy for practice if the class-interval be kept reasonably small

(cf. again Chap. VI. § 5). We have left it as an exercise to the

student to find the correction to be applied if the values in each

interval are treated as if they were evenly distributed over the

interval, instead of concentrated at its centre (Question 7).

18. The mean deviation, it will be seen, can be calculated rather

more rapidly than the standard deviation, though in the case of a

grouped distribution the difference in ease of calculation is not

great. It is not, on the other hand, a convenient magnitude for

algebraical treatment; for example, the mean deviation of a dis-

tribution obtained by combining several others cannot in general

be expressed in terms of the mean deviations of the component

distributions, but depends upon their forms. As a rule, it is more

affected by fluctuations of sampling than is the standard deviation,

but may be less affected if large and erratic deviations lying

somewhat beyond the bulk of the distribution are liable to occur.

This may happen, for example, in some forms of experimental

work, and in such cases the use of the mean deviation may be

slightly preferable to that of the standard deviation.

19. It is a useful empirical rule for the student to remember

that for symmetrical or only moderately asymmetrical distributions,

approaching the ideal forms of figs. 5 and 9, the mean

deviation is usually very nearly four-fifths of the standard deviation.

Thus for the distribution of pauperism we have

$\dfrac{\text{mean deviation}}{\text{standard deviation}} = \dfrac{1.01}{1.24} = 0.81$.

In the case of the distribution of male statures in the British

Isles, Example iii., the ratio found is 0.80. For a short series of

observations like the wage statistics of Example i. a regular result

could hardly be expected: the actual ratio is 15.0/20.5 = 0.73.


We pointed out in § 10 that in distributions of the simple forms

referred to, a range of six times the standard deviation contains

over 99 per cent. of all the observations. If the mean deviation

be employed as the measure of dispersion, we must substitute a

range of 7½ times this measure (6 ÷ 0.8 = 7.5).
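The four-fifths rule can be examined on an ideal symmetrical distribution. The sketch below takes a symmetrical binomial distribution as a stand-in for the ideal form; that choice, and the comparison with the value √(2/π) which holds for the normal curve, are assumptions made here for illustration and are not part of the text.

```python
from math import comb, pi, sqrt

def binomial_ratio(n):
    """Mean deviation / standard deviation for a symmetrical binomial (p = 1/2, n trials)."""
    total = 2 ** n
    centre = n / 2
    md = sum(comb(n, k) * abs(k - centre) for k in range(n + 1)) / total
    sd = sqrt(sum(comb(n, k) * (k - centre) ** 2 for k in range(n + 1)) / total)
    return md / sd

ratio = binomial_ratio(100)
normal_value = sqrt(2 / pi)          # about 0.798 for the normal curve
equivalent_range = 6 / 0.8           # range in mean deviations equal to six standard deviations
```

The ratio so found lies close to 0.80, and `equivalent_range` gives the 7½ mean deviations stated above.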

20. The Quartile Deviation or Semi-interquartile Range.—If a

value Q1 of the variable be determined of such magnitude that

one-quarter of all the values observed are less than Q1 and three-

quarters greater, then Q1 is termed the lower quartile. Similarly,

if a value Q3 be determined such that three-quarters of all the

values observed are less than Q3 and one-quarter only greater,

then Q3 is termed the upper quartile. The two quartiles and the

median divide the observed values of the variable into four

classes of equal frequency. If Mi be the value of the median, in

a symmetrical distribution

$Mi - Q_1 = Q_3 - Mi$,

and the difference may be taken as a measure of dispersion. But

as no distribution is rigidly symmetrical, it is usual to take as the

measure

$Q = \dfrac{Q_3 - Q_1}{2}$,

and Q is termed the quartile deviation, or better, the semi-

interquartile range—it is not a measure of the deviation from

any particular average: the old name probable error should be

confined to the theory of sampling (Chap. XV. § 17).

21. In the case of a short series of ungrouped observations

the quartiles are determined, like the median, by inspection.

In the wage statistics of Example i., for instance, there are

38 observations, and 38/4 = 9.5. What is the lower quartile?

The student may be tempted to take it halfway between the

ninth and tenth observations from the bottom of the list;

but this would be wrong, for then there would be nine

observations only below the value chosen instead of 9.5. The

quartile must be taken as given by the tenth observation

itself, which may be regarded as divided by the quartile, and

falling half above it and half below. Therefore

Lower quartile $Q_1$ = 14s. 10d.

Upper quartile $Q_3$ = 16s. 11d.

and $Q = \dfrac{Q_3 - Q_1}{2} = 12.5$d.
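The rule of § 21 may be put in the following form. This is a sketch only: where N/4 is a whole number the value is taken halfway between two observations, by analogy with the median, and that extension is an assumption not stated in the paragraph above. The 38 values used are simply the numbers 1 to 38, standing in for the wage figures, which are not reproduced here.

```python
def quartile_by_inspection(sorted_values, q):
    """q = 1, 2 or 3 gives the lower quartile, median or upper quartile.

    If N.q/4 is fractional, the next observation above is taken, being
    regarded as divided by the quartile. If N.q/4 is whole (assumed case),
    the value halfway to the following observation is taken.
    """
    n = len(sorted_values)
    whole, remainder = divmod(n * q, 4)
    if remainder == 0:
        return (sorted_values[whole - 1] + sorted_values[whole]) / 2
    return sorted_values[whole]

stand_in = list(range(1, 39))                 # hypothetical series of 38 observations
lower = quartile_by_inspection(stand_in, 1)   # the 10th observation from the bottom
upper = quartile_by_inspection(stand_in, 3)   # the 29th from the bottom, i.e. the 10th from the top

# The wage figures quoted in the text, reduced to pence.
q1_pence = 14 * 12 + 10
q3_pence = 16 * 12 + 11
Q = (q3_pence - q1_pence) / 2
```

With the quartiles quoted in the text, `Q` comes out at 12.5d.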

22. In the case of a grouped distribution, the quartiles, like

the median, are determined by simple arithmetical or by


graphical interpolation (cf. Chap. VII. §§15, 16). Thus for the

distribution of pauperism, Example ii., we have


632/4 = 158
Total frequency under 2.25 per cent. = 138
Difference = 20
Frequency in interval 2.25 - 2.75 = 89

Whence $Q_1 = 2.25 + \tfrac{20}{89} \times 0.5 = 2.362$ per cent.

Similarly we find $Q_3 = 4.130$ per cent.

Hence $Q = \dfrac{Q_3 - Q_1}{2} = 0.884$ per cent.

It is left to the student to check the value by graphical interpolation.
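The arithmetical interpolation for the lower quartile can be written out as a short sketch. The function and argument names are illustrative; the figures are those quoted in the text for Example ii.

```python
def interpolate_in_class(lower_limit, width, frequency_below, class_frequency, target):
    """Value below which `target` observations lie, assuming the observations
    in the class are spread evenly over the class-interval."""
    return lower_limit + (target - frequency_below) / class_frequency * width

N = 632
q1 = interpolate_in_class(2.25, 0.5, 138, 89, N / 4)

# The upper quartile is quoted in the text, not recomputed here.
q3 = 4.130
Q = (q3 - round(q1, 3)) / 2
```

This gives 2.362 per cent. for the lower quartile and 0.884 per cent. for the semi-interquartile range.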


23. For distributions approaching the ideal forms of figs.

5 and 9, the semi-interquartile range is usually about two-thirds

of the standard deviation. Thus for Example ii. we find

$\dfrac{Q}{\sigma} = \dfrac{0.884}{1.24} = 0.71$.

The distribution of statures, Example iii., gives the ratio 0.68.

The short series of wage statistics in Example i. could not be

expected to give a result in very strict conformity with the

rule, but the actual ratio, viz. 0.61, does not diverge greatly.

It follows from this ratio that a range of nine times the semi-

interquartile range, approximately, is required to cover the same

proportion of the total frequency (99 per cent. or more) as a range

of six times the standard deviation.
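For comparison, the ratio may be computed for the normal curve. Taking the normal curve as the ideal form is an assumption made here for illustration; the text itself states only the empirical rule of two-thirds.

```python
from statistics import NormalDist

# For a normal curve with unit standard deviation the upper quartile lies at
# the point below which three-quarters of the frequency falls.
ratio = NormalDist().inv_cdf(0.75)      # semi-interquartile range / standard deviation

# Range, in semi-interquartile ranges, equal to six standard deviations.
equivalent_range = 6 / ratio
```

The ratio is about 0.6745, a little over two-thirds, and the equivalent range is accordingly a little under nine.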

24. Of the three measures of dispersion, the semi-interquartile

range has the most clear and simple meaning. It is calculated,

like the median, with great ease, and the quartiles may be found,

if necessary, by measuring two individuals only. If, e.g., the

dispersion as well as the average stature of a group of men

is required to be determined with the least possible expenditure

of time, they may be simply ranked in order of height, and the

three men picked out for measurement who stand in the centre

and one-quarter from either end of the rank. This measure of

dispersion may also be useful as a makeshift if the calculation

of the standard deviation has been rendered difficult or impossible

owing to the employment of an irregular classification of the

frequency or of an indefinite terminal class. Such uses are,

however, a little exceptional, and, generally speaking, the


semi-interquartile range as a measure of dispersion is not to be

recommended, unless simplicity of meaning is of primary im-

portance, owing to the lack of algebraical convenience which

it shares with the median. Further, it is obvious that the

quartile, like the median, may become indeterminate, and that

the use of this measure of dispersion is undesirable in cases of

discontinuous variation: the student should refer again to the

discussion of the similar disadvantage in the case of the median,

Chap. VII. § 14. It has, however, been largely used in the past,

particularly for anthropometric work.

25. Measures of Relative Dispersion.—As was pointed out in

Chapter VII. § 26, if relative size is regarded as influencing not only

the average, but also deviations from the average, the geometric

mean seems the natural form of average to use, and deviations

should be measured by their ratios to the geometric mean. As

already stated, however, this method of measuring deviations, with

its accompanying employment of the geometric mean, has never

come into general use. It is a much more simple matter to allow

for the influence of size by taking the ratio of the measure of

absolute dispersion (e.g. standard deviation, mean deviation, or

quartile deviation) to the average (mean or median) from which

the deviations were measured. Pearson has termed the quantity

$v = 100\,\dfrac{\sigma}{M}$,

i.e. the percentage ratio of the standard deviation to the arithmetic

mean, the coefficient of variation (ref. 6), and has used it, for

example, in comparing the relative variations of corresponding

organs or characters in the two sexes: the ratio of the quartile

deviation to the median has also been suggested (Verschaeffelt,

ref. 7). Such a measure of relative dispersion is evidently a mere

number, and its magnitude is independent of the units of

measurement employed.
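That the coefficient of variation is a mere number may be shown by a short sketch. The statures below are invented for illustration, and the standard deviation is taken with divisor N, as throughout this chapter.

```python
from statistics import fmean, pstdev

def coefficient_of_variation(values):
    """Percentage ratio of the standard deviation (divisor N) to the arithmetic mean."""
    return 100 * pstdev(values) / fmean(values)

inches = [64.0, 66.5, 67.0, 68.5, 70.0, 72.0]       # hypothetical statures
centimetres = [2.54 * x for x in inches]            # the same statures in another unit

v_inches = coefficient_of_variation(inches)
v_centimetres = coefficient_of_variation(centimetres)
```

The two values agree, the change of unit multiplying numerator and denominator alike.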

26. Measures of Asymmetry or Skewness.—If we have to compare

a series of distributions of varying degrees of asymmetry, or skew-

ness, as Pearson has termed it, some numerical measure of this

character is desirable. Such a measure of skewness should

obviously be independent of the units in which we measure the

variable—e.g. the skewness of the distribution of the weights of a

given set of men should not be dependent on our choice of the

pound, the stone, or the kilogramme as the unit of weight—and

the measure should accordingly be a mere number. Thus the

difference between the deviations of the two quartiles on either

side of the median indicates the existence of skewness, but to

measure the degree of skewness we should take the ratio of this


difference to some quantity of the same dimensions, e.g. the semi-

interquartile range. Our measure would then be, taking the

skewness to be positive if the longer tail of the distribution runs

in the direction of high values of X,

skewness $= \dfrac{(Q_3 - Mi) - (Mi - Q_1)}{Q} = \dfrac{Q_1 + Q_3 - 2Mi}{Q}$ . (11)

This would not be a bad measure if we were using the quartile

deviation as a measure of dispersion: its lowest value is zero,

when the distribution is symmetrical; and while its highest possible

value is 2, it would rarely in practice attain higher numerical

values than ± 1. A similar measure might be based on the mean

deviations in excess and in defect of the mean. There is, however,

only one generally recognised measure of skewness, and that is

Pearson's measure (ref. 8)—

skewness $= \dfrac{\text{mean} - \text{mode}}{\text{standard deviation}}$ . . (12)

This is evidently zero for a symmetrical distribution, in which

mode and mean coincide. No upper limit to the ratio is apparent

from the formula, but, as a fact, the value does not exceed unity for

frequency-distributions resembling generally the ideal distributions

of fig. 9. As the mode is a difficult form of average to determine

by elementary methods, it may be noted that the numerator of the

above fraction may, in the case of frequency-distributions of the

forms referred to, be replaced approximately by 3(mean - median),

(cf. Chap. VII. §20). The measure (12) is much more sensitive

than (11) for moderate degrees of asymmetry.
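The two measures may be compared on the constants already found for the distribution of pauperism (Example ii.). The sketch below uses the approximate numerator 3(mean - median) in place of mean - mode; the function names are illustrative, and the two resulting values are computed here from the constants quoted in this chapter rather than taken from the text.

```python
def quartile_skewness(q1, median, q3):
    """Measure (11): (Q1 + Q3 - 2Mi)/Q, with Q the semi-interquartile range."""
    q = (q3 - q1) / 2
    return (q1 + q3 - 2 * median) / q

def pearson_skewness_approx(mean, median, sd):
    """Measure (12) with mean - mode replaced approximately by 3(mean - median)."""
    return 3 * (mean - median) / sd

# Constants quoted in the text for Example ii. (per cent.).
sk_quartile = quartile_skewness(2.362, 3.195, 4.130)
sk_pearson = pearson_skewness_approx(3.29, 3.195, 1.24)

# Limiting cases of measure (11).
symmetrical = quartile_skewness(1.0, 2.0, 3.0)
extreme = quartile_skewness(1.0, 1.0, 3.0)
```

On these figures measure (11) gives about 0.12 and measure (12) about 0.23, which accords with the remark that (12) is the more sensitive; the limiting cases give 0 and 2 respectively.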

27. The Method of Percentiles.—We may conclude this chapter

by describing briefly a method that has been largely used in the

past in lieu of the methods dealt with in Chapters VI. and VII.,

and the preceding paragraphs of this chapter, for summarising

such statistics as we have been considering. If the values of the

variable (variates, as they are sometimes termed) be ranged in

order of magnitude, and a value P of the variable be determined

such that a percentage p of the total frequency lies below it and

100 - p above, then P is termed a percentile. If a series of per-

centiles be determined for short intervals, e.g. 5 per cent. or 10

per cent., they suffice by themselves to show the general form

of the distribution. This is Sir Francis Galton's method of

percentiles. The deciles, or values of the variable which divide

the total frequency into ten equal parts, form a natural and

convenient series of percentiles to use. The fifth decile, or value

of the variable which has 50 per cent. of the observed values



above it and 50 per cent. below, is the median: the two quartiles

lie between the second and third and the seventh and eighth

deciles respectively.

28. The deciles, like the median and quartiles, may be

determined either by arithmetical or by graphical interpolation,

excluding the cases in which, like the former constants, they

become indeterminate (cf. § 24). It is hardly necessary to give

an illustration of the former process, as the method is precisely

the same as for median and quartiles (Chap. VII. § 15, and above,

§ 22). Fig. 26 shows, of course on a very much reduced scale, the

curve used for obtaining the deciles by the graphical method in

the case of the distribution of pauperism (Example ii. above).

[Fig. 26. Curve showing the number of Districts of England and Wales in which the Pauperism on 1st January 1891 did not exceed any given percentage of the population (same data as Fig. 10, p. 92): graphical determination of Deciles. Horizontal axis: percentage of the population in receipt of relief.]

The figures of the original table are added up step by step from

the top, so as to give the total frequency not exceeding the upper

limit of each class-interval, and ordinates are then erected to a

horizontal base to represent on some scale these integrated

frequencies: a smooth curve is then drawn through the tops of

the ordinates so obtained. This curve, as will be seen from the

figure, rises slowly at first when the frequencies are small, then

more rapidly as they increase, and finally turns over again and

becomes quite flat as the frequencies tail off to zero. The deciles


may be readily obtained from such a curve by dividing the

terminal ordinate into ten equal parts, and projecting the points

so obtained horizontally across to the curve and then vertically

down to the base. The construction is indicated on the figure for

the fourth decile, the value of which is approximately 2.88 per cent.
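The construction can be imitated arithmetically. The following is a sketch only: it joins the tops of the ordinates by straight lines instead of a smooth curve, and the grouped table is invented for illustration, not the pauperism data.

```python
def percentile_from_groups(upper_limits, frequencies, lowest_limit, p):
    """Value not exceeded by p per cent. of the observations, found by linear
    interpolation on the integrated (cumulative) frequencies."""
    target = sum(frequencies) * p / 100
    cumulative = 0
    lower = lowest_limit
    for upper, f in zip(upper_limits, frequencies):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
        lower = upper
    return upper_limits[-1]

# Hypothetical distribution: class-intervals 0-1, 1-2, 2-3, 3-4.
upper_limits = [1.0, 2.0, 3.0, 4.0]
frequencies = [10, 20, 40, 30]

deciles = [percentile_from_groups(upper_limits, frequencies, 0.0, 10 * k)
           for k in range(1, 10)]
```

For this table the fourth decile is 2.25 and the fifth decile, or median, 2.5.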

29. The curve of fig. 26 may be drawn in a different way by

taking a horizontal base divided into ten or a hundred equal

parts (grades, as Sir Francis Galton has termed them), and erecting

at each point so obtained a vertical proportional to the cor-

responding percentile. This gives the curve of fig. 27, which was

obtained by merely redrafting fig. 26. The curve is of so-called

ogive form.

[Fig. 27. The curve of Fig. 26 redrawn so as to give the Pauperism corresponding to each grade: Galton's "Ogive."]

The ogive curve for the distribution of statures

(Example iii.) is shown for comparison in fig. 28. It will be noticed

that the ogive curve does not bring out the asymmetry of the

distribution of pauperism nearly so clearly as the frequency-

polygon, fig. 10, p. 92.

30. The method of percentiles has some advantages as a method

of representation, as the meaning of the various percentiles is so

simple and readily understood. An extension of the method to

the treatment of non-measurable characters has also become of

some importance. For example, the capacity of the different boys

in a class as regards some school subject cannot be directly

measured, but it may not be very difficult for the master to



arrange them in order of merit as regards this character: if the

boys are then "numbered up" in order, the number of each boy,

or his rank, serves as some sort of index to his capacity (cf. the

remarks in § 12. It should be noted that rank in this sense is

not quite the same as grade; if a boy is tenth, say, from the

bottom in a class of a hundred his grade is 9.5, but the method

is in principle the same with that of grades or percentiles).

The method of ranks, grades, or percentiles in such a case may

be a very serviceable auxiliary, though, of course, it is better if

possible to obtain a numerical measure.

[Fig. 28. Ogive Curve for Stature, same data as Fig. 6, p. 89: stature corresponding to each grade, for adult males in the British Isles.]

But if, in the case of a

measurable character, the percentiles are used not merely as

constants illustrative of certain aspects of the frequency-distribution,

but entirely to replace the table giving the frequency-distribution,

serious inconvenience may be caused, as the

application of other methods to the data is barred. Given the

table showing the frequency-distribution, the reader can calculate

not only the percentiles, but any form of average or measure of

dispersion that has yet been proposed, to a sufficiently high

degree of approximation. But given only the percentiles, or at

least so few of them as the nine deciles, he cannot pass back to

the frequency-distribution, and thence to other constants, with any

degree of accuracy. In all cases of published work, therefore,

the figures of the frequency-distribution should be given; they

are absolutely fundamental.




REFERENCES.

(1) Fechner, G. T., "Ueber den Ausgangswerth der kleinsten Abweichungssumme,

dessen Bestimmung, Verwendung und Verallgemeinerung,"

Abh. d. kgl. sächs. Ges. d. Wissenschaften, vol. xviii. (also numbered

vol. xi. of the Abh. d. math.-phys. Classe); Leipzig, 1878, p. 1.

Standard Deviation.

(2) Pearson, Karl, " Contributions to the Mathematical Theory of Evolution

(i. On the Dissection of Asymmetrical Frequency-curves)," Phil. Trans.

Roy. Soc., Series A, vol. clxxxv., 1894, p. 71. (Introduction of the term

"standard deviation," p. 80.)

Mean Deviation.

(3) Laplace, Pierre Simon, Marquis de, Théorie analytique des probabilités:

2me supplément, 1818. (Proof that the mean deviation is a

minimum when taken about the median.)

Method of Percentiles, including Quartiles, etc.

(4) Galton, Francis, "Statistics by Intercomparison, with Remarks on the

Law of Frequency of Error," Phil. Mag., vol. xlix (4th Series), 1875,

pp. 33-46.

(5) Galton, Francis, Natural Inheritance; Macmillan, 1889. (The method

of percentiles is used throughout, with the quartile deviation as the

measure of dispersion.)

Relative Dispersion.

(6) Pearson, Karl, "Regression, Heredity, and Panmixia," Phil. Trans.

Roy. Soc., Series A, vol. clxxxvii., 1896, p. 253. (Introduction of

"coefficient of variation," pp. 276-7.)

(7) Verschaeffelt, E., "Ueber graduelle Variabilität von pflanzlichen

Eigenschaften," Ber. deutsch. bot. Ges., Bd. xii., 1894, pp. 350-55.


Skewness.

(8) Pearson, Karl, "Skew Variation in Homogeneous Material," Phil.

Trans. Roy. Soc., Series A, vol. clxxxvi., 1895, p. 343. (Introduction

of term, p. 370.)

Calculation of Mean, Standard-deviation, or of the General

Moments of a Grouped Distribution.

We have given a direct method that seems the simplest and best for

the elementary student. A process of successive summation that has

some advantages can, however, be used instead. The student will

find a convenient description with illustrations in—

(9) Elderton, W. Palin, Frequency-curves and Correlation; C. & E.

Layton, London, 1906.




EXERCISES.

1. Verify the following from the data of Table VI., Chap. VI., continuing

the work from the stage reached for Qu. 1, Chap. VII.

[Table: Stature in Inches for Adult Males born in (column headings lost). Rows: Standard deviation; Mean deviation; Quartile deviation; Mean deviation / standard deviation; Quartile deviation / standard deviation; Lower quartile; Upper quartile. The only entries legible in this copy are 0.62 and 0.62; their row is not recoverable.]










2. (Continuing from Qu. 2, Chap. VII.) Find the standard deviation,

mean deviation, quartiles and quartile deviation (or semi-interquartile range)

for the distribution of weights of adult males in the United Kingdom given in

the last column of Table IX., Chap. VI.

Compare the ratios of the mean and quartile deviations to the standard

deviation with the ratios stated in §§ 19 and 23 to be usual.

Find the value of the skewness (equation 12), using the approximate value

of the mode.

3. Using, or extending if necessary, your diagram for Question 4, Chap. VII.,

find the quartile values for houses assessed to inhabited house duty in 1885-6,

from the data of Table IV., Chap. VI.

Find also the 9th decile (the value exceeded by 10 per cent. of the houses).


4. Verify equation (9) by direct calculation of the standard deviation of the

numbers 1 to 10.

5. (Data from Sauerbeck, Jour. Roy. Stat. Soc., March 1909.) The

following are the index-numbers (percentages) of prices of 45 commodities in

1908 on their average prices in the years 1867-77 :—40, 43, 43, 46, 46, 46,

54, 56, 59, 62, 64, 64, 66, 66, 67, 67, 68, 68, 69, 69, 69, 71, 75, 75, 76, 76,

78, 80, 82, 82, 82, 82, 82, 83, 84, 86, 88, 90, 90, 91, 91, 92, 95, 102, 127.

Find the mean and standard deviation (1) without further grouping; (2)

grouping the numbers by fives (40-, 45-, 50-, etc.); (3) grouping by tens (40-,

(or median), in a grouped frequency-distribution, is found to be S. Find the

correction to be applied to this sum, in order to reduce it to the mean (or

median) as origin, on the assumption that the observations are evenly distributed

over each class-interval. Take the number of observations below the

interval containing the mean (or median) to be $n_1$, in that interval $n_2$, and

above it $n_3$; and the distance of the mean (or median) from the arbitrary

origin to be d.

Show that the values of the mean deviation (from the mean and from the

median respectively) for Example ii., found by the use of this formula, do not

differ from the values found by the simpler method of §§ 16 and 17 in the

second place of decimals.

8. (W. Scheibner, "Ueber Mittelwerthe," Berichte der kgl. sächsischen

Gesellschaft d. Wissenschaften, 1873, p. 564, cited by Fechner, ref. 2 of

Chap. VII.: the second form of the relation is given by G. Duncker (Die

Methode der Variationsstatistik; Leipzig, 1899) as an empirical one.) Show

that if deviations are small compared with the mean, so that $(x/M)^3$ may be

neglected in comparison with x/M, we have approximately the relation

$G = M\left(1 - \dfrac{\sigma^2}{2M^2}\right)$,

where G is the geometric mean, M the arithmetic mean, and $\sigma$ the standard

deviation: and consequently to the same degree of approximation $M^2 - G^2 = \sigma^2$.

9. (Scheibner, loc. cit., Qu. 8.) Similarly, show that if deviations are small
compared with the mean, we have approximately

$$H = M\left(1 - \frac{\sigma^2}{M^2}\right)$$

H being the harmonic mean.




1-3. The correlation table and its formation. 4-5. The correlation surface.
6-7. The general problem. 8-9. The line of means of rows and the line of means
of columns: their relative positions in the case of independence and of varying
degrees of correlation. 10-14. The correlation coefficient and the regressions.
15-16. Numerical calculations. 17. Certain points to be remembered in
calculating and using the coefficient.

1. In chapters VI.-VIII. we considered the frequency-distribu-

tion of a single variable, and the more important constants

that may be calculated to describe certain characters of such

distributions. We have now to proceed to the case of two

variables, and the consideration of the relations between them.

2. If the corresponding values of two variables be noted

together, the methods of classification employed in the preceding

chapters may be applied to both, and a table of double entry or

contingency-table (Chap. V.) be formed, exhibiting the frequencies

of pairs of values lying within given class-intervals. Six such

tables are given below as illustrations for the following

variables:—Table I., two measurements on a shell (Pecten).

Table II., ages of husbands and wives in England and Wales in

1901. Table III., statures of fathers and their sons (British).

Table IV., fertility of mothers and their daughters (British

peerage). Table V., the rate of discount and the ratio of reserves

to deposits in American banks. Table VI., the proportion of

male to total births, and the total numbers of births, in the

registration districts of England and Wales.

Each row in such a table gives the frequency-distribution of

the first variable for cases in which the second variable lies

within the limits stated on the left of the row. Similarly, every

column gives the frequency-distribution of the second variable

for cases in which the value of the first variable lies within the

limits stated at the head of the column. As "columns" and

"rows" are distinguished only by the accidental circumstance


[Tables I.-VI., the six correlation tables described in § 2, stood here; their entries are not recoverable. Legible axis titles: Table I., (2) Dorso-ventral diameter, mm.; Table II., Ages of wives; Table IV., Number of mother's children; Table VI., Proportion of male births per 1000 births.]


of the one set running vertically and the other horizontally, and

the difference has no statistical significance, the word array

has been suggested as a convenient term to denote either a row

or a column. If the values of X in one array are associated

with values of Y between the limits Yn - δ and Yn + δ, Yn may be

termed the type of the array. (Pearson, ref. 6.) The special

kind of contingency tables with which we are now concerned

are called correlation tables, to distinguish them from tables

based on unmeasured qualities and so forth.

3. Nothing need be added to what was said in Chapter VI. as

regards the choice of magnitude and position of class-intervals.

When these have been fixed, the table is readily compiled by

taking a large sheet ruled with rows and columns properly

headed in the same way as the final table and entering a dot,

stroke, or small cross in the corresponding compartment for each

pair of recorded observations. If facility of checking be of

great importance, each pair of recorded values may be entered

on a separate card and these dealt into little packs on a board

ruled in squares, or into a divided tray; each pack can then be

run through to see that no card has been mis-sorted. The

difficulty as to the intermediate observations—values of the

variables corresponding to divisions between class-intervals—will

be met in the same way as before if the value of one variable

alone be intermediate, the unit of frequency being divided

between two adjacent compartments. If both values of the pair

be intermediates, the observation must be divided between four

adjacent compartments, and thus quarters as well as halves may

occur in the table, as, e.g., in Table III. In this case the statures

of fathers and sons were measured to the nearest quarter-

inch and subsequently grouped by 1-inch intervals: a pair in

which the recorded stature of the father is 60.5 in. and that of

the son 62.5 in. is accordingly entered as 0.25 to each of the

four compartments under the columns 59.5-60.5, 60.5-61.5, and

the rows 61.5-62.5, 62.5-63.5. Workers will generally form

their own methods for entering such fractional frequencies

during the process of compiling, but one convenient method is

to use a small x to denote a unit and a dot for a quarter; the

four dots should be placed in the position of the four points

of the x and joined when complete. It is best to choose the

limits of class-intervals, where possible, in such a way as to avoid

fractional frequencies.
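
The tallying rule of § 3, with its halves and quarters for intermediate observations, may be sketched in Python as follows. This is an illustration only and no part of the original text: the function names `cells_for` and `correlation_table` are hypothetical, classes are assumed to be of equal width, and measurements are assumed to be exactly representable (as quarter-inches are), so that the test for a value lying on a class boundary is reliable.

```python
from collections import defaultdict


def cells_for(value, lower, width):
    """Return [(class_index, share), ...] for a single measurement.

    Class k runs from lower + k*width to lower + (k+1)*width. A value lying
    exactly on the division between two classes is shared equally between
    them; any other value goes wholly to the class containing it.
    """
    position = (value - lower) / width
    index = int(position // 1)
    if position == index:  # an intermediate observation
        return [(index - 1, 0.5), (index, 0.5)]
    return [(index, 1.0)]


def correlation_table(pairs, x_lower, y_lower, width=1.0):
    """Tally (x, y) pairs into a dict keyed by (row, column)."""
    table = defaultdict(float)
    for x, y in pairs:
        for col, x_share in cells_for(x, x_lower, width):
            for row, y_share in cells_for(y, y_lower, width):
                # Both values intermediate: four cells of one quarter each.
                table[(row, col)] += x_share * y_share
    return dict(table)


# The pair quoted in the text: father 60.5 in., son 62.5 in.,
# with 1-inch classes 59.5-60.5, 60.5-61.5, ... for both variables.
table = correlation_table([(60.5, 62.5)], x_lower=59.5, y_lower=59.5)
for cell, frequency in sorted(table.items()):
    print(cell, frequency)
```

Columns 0 and 1 here stand for the classes 59.5-60.5 and 60.5-61.5, and rows 2 and 3 for 61.5-62.5 and 62.5-63.5, so the single pair contributes a quarter to each of the four compartments named in the text.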

4. The distribution of frequency for two variables may be

represented by a surface or solid in the same way as the frequency-

distribution of a single variable may be represented by a plane

figure. We may imagine the surface to be obtained by erecting


at the centre of every compartment of the correlation-table a

vertical of length proportionate to the frequency in that com-

partment, and joining up the tops of the verticals. If the

compartments were made smaller and smaller while the class-

frequencies remained finite, the irregular figure so obtained would

approximate more and more closely towards a continuous curved

surface—a frequency-surface—corresponding to the frequency-

curves for single variables of Chapter VI. The volume of the

frequency-solid over any area drawn on its base gives the

frequency of pairs of values falling within that area, just as the

area of the frequency-curve over any interval of the base-line gives

the frequency of observations within that interval. Models of

actual distributions may be constructed by drawing the frequency-

distributions for all arrays of the one variable, to the same scale,

on sheets of cardboard, and erecting the cards vertically on a

base-board at equal distances apart, or by marking out a base-

board, in squares corresponding to the compartments of the

correlation-table, and erecting on each square a rod of wood of

height proportionate to the frequency. Such solid representations

of frequency-distributions for two variables are sometimes termed stereograms.


5. It is impossible, however, to group the majority of

frequency-surfaces, in the same way as the frequency-curves,

under a few simple types: the forms are too varied. The simplest

ideal type is one in which every section of the surface is a sym-

metrical curve—the first type of Chap. VI. (fig. 5, p. 89). Like

the symmetrical distribution for the single variable, this is a very

rare form of distribution in economic statistics, but approximate

illustrations may be drawn from anthropometry. Fig. 29 shows

the ideal form of the surface, somewhat truncated, and fig.

30 the distribution of Table III., which approximates to the same

type,—the difference in steepness is, of course, merely a matter of

scale. The maximum frequency occurs in the centre of the

whole distribution, and the surface is symmetrical round the

vertical through the maximum, equal frequencies occurring at

equal distances from the mode on opposite sides. The next

simplest type of surface corresponds to the second type of

frequency-curve—the moderately asymmetrical. Most, if not all,

of the distributions of arrays are asymmetrical, and like the dis-

tribution of fig. 9, p. 92: the surface is consequently asymmetrical,

and the maximum does not lie in the centre of the distribution.

This form is fairly common, and illustrations might be drawn

from a variety of sources—economics, meteorology, anthropometry,

etc. The data of Table II. will serve as an example. The total

distributions and the distributions of the majority of the arrays




are asymmetrical, the skewness being positive for the rows at

the top of the table (the mode being lower than the mean), and

negative for the rows at the foot, the more central rows being

nearly symmetrical. The maximum frequency lies towards the

upper end of the table in the compartment under the row and

column headed "30 -". The frequency falls off very rapidly

towards the lower ages, and slowly in the direction of old age.

Outside these two forms, it seems impossible to delimit empirically

any simple types. Tables V. and VI. are given simply as illus-

trations of two very divergent forms. Fig. 31 gives a graphical

representation of the former by the method corresponding to the

histogram of Chapter VI., the frequency in each compartment

being represented by a square pillar. The distribution of

frequency is very characteristic, and quite different from that

of any of the Tables I., II., III., or IV.

6. It is clear that such tables may be treated by any of the

methods discussed in Chapter V., which are applicable to all

contingency-tables, however formed. The distribution may be

investigated in detail by such methods as those of § 4, or tested

for isotropy (§ 11), or the coefficient of contingency can be

calculated (§§ 5-8). In applying any of these methods, however,

it is desirable to use a coarser classification than is suited to the

methods to be presently discussed, and it is not necessary to

retain the constancy of the class-interval. The classification

should, on the contrary, be arranged simply with a view to avoiding

many scattered units or very small frequencies. A few examples

should be worked as exercises by the student (Question 3).

7. But the coefficient of contingency merely tells us whether,

and if so, how closely, the two variables are related, and much

more information than this can be obtained from the correlation-

table, seeing that the measures of Chapters VII. and VIII. can be

applied to the arrays as well as to the total distributions. If the

two variables are independent, the distributions of all parallel

arrays are similar (Chap. V. § 13); hence their averages and

dispersions, e.g. means and standard deviations, must be the same.

In general they are not the same, and the relation between the

mean or standard deviation of the array and its type requires

investigation. Of the two constants, the mean is, in general, the

more important, and our attention will for the present be con-

fined to it. The majority of the questions of practical statistics

relate solely to averages: the most important and fundamental

question is whether, on an average, high values of the one variable

show any tendency to be associated with high (or with low)

values of the other. If possible, we also desire to know how great a

divergence of the one variable from its average value is associated



with a unit divergence of the other, and to obtain some idea as to

the closeness with which this relation is usually fulfilled.

8. Suppose a diagram (fig. 32) to be drawn representing the values of means of arrays. Let OX, OY be the scales of the two variables, i.e. the scales at the head and side of the table, 01, 12, etc., being successive class-intervals. Let $M_1$ be the mean value of X, and $M_2$ the mean value of Y. If the two variables be absolutely independent, the distributions of frequency in all parallel arrays are similar (Chap. V. § 13), and the means of arrays must lie on the vertical and horizontal lines $M_1M$, $M_2M$, the small circles denoting means of rows and the small crosses means of columns. (In any actual case, of course, the means would not lie so regularly, but, if the independence were almost complete, would only fluctuate slightly to the one side and the other of the two lines.)

Fig. 32.

The cases with which the experimentalist, e.g. the chemist or

physicist, has to deal, where the observations are all crowded

closely round a single line, lie at the opposite extreme from

independence. The entries fall into a few compartments only of

each array, and the means of rows and of columns lie approximately

on one and the same curve, like the line RR of fig. 33.

The ordinary cases of statistics are intermediate between these

two extremes, the lines of means being neither at right angles as



in fig. 32, nor coincident as in fig. 33, but standing at an acute

angle with one another as RR (means of rows) and CC (means of

columns) in figs. 36-8. The complete problem of the statistician,

like that of the physicist, is to find formulas or equations which

will suffice to describe approximately these curves.

9. In the general case this may be a difficult problem, but, in

the first place, it often suffices, as already pointed out, to know

merely whether on an average high values of the one variable

show any tendency to be associated with high or with low values

of the other, a purpose which will be served very fairly by fitting a

straight line; and further, in a large number of cases, it is found

either (1) that the means of arrays lie very approximately round

straight lines, or (2) that they lie so irregularly (possibly owing

only to paucity of observations) that the real nature of the curve

is not clearly indicated, and a straight line will do almost as well

as any more elaborate curve. (Cf. figs. 36-38.) In such cases

—and they are relatively more frequent than might be supposed

—the fitting of straight lines to the means of arrays determines

all the most important characters of the distribution. We might

fit such lines by a simple graphical method, plotting the points

representing means of arrays on a diagram like those of figures

36-38, and "fitting" lines to them, say, by means of a stretched

black thread shifted about till it appeared to run as near as

might be to all the points. But such a method is hardly satis-

factory, more especially if the points are somewhat scattered; it

leaves too much room for guesswork, and different observers obtain

very different results. Some method is clearly required which

will enable the observer to determine equations to the two lines

for a given distribution, however irregularly the means may lie,

as simply and definitely as he can calculate the means and

standard deviations.
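
The means of arrays discussed in §§ 7-9 can be computed directly from a correlation table. The following is a minimal Python sketch, no part of the original text; the function name `array_means`, the class mid-values, and the small table are hypothetical, the table being constructed so that the two variables are exactly independent. Every array is assumed to contain at least one observation.

```python
def array_means(table, x_mids, y_mids):
    """Means of rows and of columns of a correlation table.

    table[i][j] is the frequency in row i (type y_mids[i]) and
    column j (type x_mids[j]).
    Returns (row_means, col_means): row_means[i] is the mean of X in row i,
    col_means[j] is the mean of Y in column j.
    """
    row_means = []
    for row in table:
        n = sum(row)
        row_means.append(sum(f * x for f, x in zip(row, x_mids)) / n)
    col_means = []
    for j in range(len(x_mids)):
        column = [row[j] for row in table]
        n = sum(column)
        col_means.append(sum(f * y for f, y in zip(column, y_mids)) / n)
    return row_means, col_means


# Hypothetical table in which the rows are proportional to one another,
# i.e. the distributions of all parallel arrays are similar.
x_mids = [1.0, 2.0, 3.0]
y_mids = [10.0, 20.0]
independent = [[2, 4, 2],
               [3, 6, 3]]
rows, cols = array_means(independent, x_mids, y_mids)
print(rows)  # the same mean of X in every row
print(cols)  # the same mean of Y in every column
```

With independence the means of rows all fall on one vertical line and the means of columns on one horizontal line, as in fig. 32; in a table showing correlation the two lists would vary with the type of the array.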

10. Consider the simplest case in which the means of rows lie exactly on a straight line RR (fig. 34). Let $M_2$ be the mean value of Y, and let RR cut $M_2x$, the horizontal through $M_2$, in M. Then it may be shown that the vertical through M must cut OX in $M_1$, the mean of X. For, let the slope of RR to the vertical, i.e. the tangent of the angle $M_1MR$ or ratio of kl to lM, be $b_1$, and let deviations from My, Mx be denoted by x and y. Then for any one row of type y in which the number of observations is n, $\Sigma(x) = n b_1 y$, and therefore for the whole table, since $\Sigma(ny) = 0$, $\Sigma(x) = b_1 \Sigma(ny) = 0$. $M_1$ must therefore be the mean of X, and M may accordingly be termed the mean of the whole distribution. Knowing that RR passes through M, it remains only to determine $b_1$. This may conveniently be done in terms of the mean product p of all pairs of associated deviations x and y, i.e.

$$p = \frac{1}{N}\Sigma(xy) \qquad (1)$$

For any one row we have

$$\Sigma(xy) = y\,\Sigma(x) = n b_1 y^2.$$

Therefore for the whole table $\Sigma(xy) = b_1 \Sigma(ny^2) = N b_1 \sigma_y^2$, or

$$b_1 = \frac{p}{\sigma_y^2} \qquad (2)$$

Similarly, if CC be the line on which lie the means of columns and $b_2$ its slope to the horizontal, rs/sM,

$$b_2 = \frac{p}{\sigma_x^2} \qquad (3)$$

These two equations (2) and (3) are usually written in a slightly different form. Let

$$r = \frac{p}{\sigma_x \sigma_y} \qquad (4)$$

Then

$$b_1 = r\frac{\sigma_x}{\sigma_y} \qquad b_2 = r\frac{\sigma_y}{\sigma_x} \qquad (5)$$

Or we may write the equations to RR and CC:

$$x = r\frac{\sigma_x}{\sigma_y}\,y \qquad y = r\frac{\sigma_y}{\sigma_x}\,x \qquad (6)$$

These equations may, of course, be expressed, if desired, in terms of the absolute values of the variables X and Y instead of the deviations x and y.
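
Equations (1) to (5) may be followed numerically with the short Python sketch below. It is offered as an illustration only: the function name `correlation_constants` and the five pairs of values are hypothetical, not data from the text, and standard deviations are taken with divisor N, as in the preceding chapters.

```python
from math import sqrt


def correlation_constants(xs, ys):
    """Return p, r, b1, b2 and the two standard deviations for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]          # deviations x
    dy = [y - mean_y for y in ys]          # deviations y
    sigma_x = sqrt(sum(d * d for d in dx) / n)
    sigma_y = sqrt(sum(d * d for d in dy) / n)
    p = sum(a * b for a, b in zip(dx, dy)) / n   # equation (1)
    return {
        "p": p,
        "r": p / (sigma_x * sigma_y),            # equation (4)
        "b1": p / sigma_y ** 2,                  # equation (2)
        "b2": p / sigma_x ** 2,                  # equation (3)
        "sigma_x": sigma_x,
        "sigma_y": sigma_y,
    }


# Hypothetical paired observations.
constants = correlation_constants([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
for name, value in constants.items():
    print(name, round(value, 4))
```

For these figures both standard deviations are equal, so that the two regressions coincide with r, in agreement with equation (5).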

11. The meaning of the above expressions when the means of rows and columns do not lie exactly on straight lines is very readily obtained. If the values of x and $b_1 y$ be noted for all pairs of associated deviations, we have for the sum of the squares of the differences, giving $b_1$ its value from (5),

$$\Sigma(x - b_1 y)^2 = N\sigma_x^2(1 - r^2) \qquad (7)$$

If $b_1$ be given any other value, say $(r + \delta)\sigma_x/\sigma_y$, then

$$\Sigma(x - b_1 y)^2 = N\sigma_x^2(1 - r^2 + \delta^2).$$

This is necessarily greater than the value (7); hence $\Sigma(x - b_1 y)^2$ has the lowest possible value when $b_1$ is put equal to $r\sigma_x/\sigma_y$. Further, for any one row in which the number of observations is n, the deviation of the mean of the row from RR is d (fig. 35), and the standard deviation is s, $\Sigma(x - b_1 y)^2 = n s^2 + n d^2$. Therefore for the whole table,

$$\Sigma(x - b_1 y)^2 = \Sigma(n s^2) + \Sigma(n d^2).$$

But the first of the two sums on the right is unaffected by the slope or position of RR, hence, the left-hand side being a minimum, the second sum on the right must be a minimum also. That is to say, when $b_1$ is put equal to $r\sigma_x/\sigma_y$, the sum of the squares of the distances of the row-means from RR, each multiplied by the corresponding frequency, is the lowest possible.

Similar theorems hold good, of course, with respect to the line CC. If $b_2$ be given the value $r\sigma_y/\sigma_x$, $\Sigma(y - b_2 x)^2$ is a minimum, and also $\Sigma(n e^2)$ (fig. 35). Hence we may regard the equations (6) as being, either (a) equations for estimating each individual x from its associated y (and y from its associated x) in such a way as to make the sum of the squares of the errors of estimate the least possible; or (b) equations for estimating the mean of the x's associated with a given type of y (and the mean of the y's associated with a given type of x) in such a way as to make the sum of the squares of the errors of estimate the least possible, when every mean is counted once for each observation on which it is based. The lines represented by the two equations are thus, in a certain natural sense, "lines of best fit" to the two actual lines of means.

Fig. 36. Correlation between Age of Husband and Age of Wife in England and Wales (Table II.): means of rows shown by circles and means of columns by crosses: r = +0.91.
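
The minimum property stated in § 11 may be checked by direct arithmetic. The Python sketch below is an illustration under stated assumptions: the five pairs of deviations are hypothetical (each set sums to zero), and the helper name `sum_sq_residuals` is not from the text. It compares the sum of squares of the differences with the right-hand side of equation (7), and then with the larger value obtained when the slope is altered by a quantity delta.

```python
from math import sqrt


def sum_sq_residuals(dx, dy, slope):
    """Sum of squares of (x - slope * y) over all pairs of deviations."""
    return sum((x - slope * y) ** 2 for x, y in zip(dx, dy))


# Hypothetical deviations from the means.
dx = [-2.0, -1.0, 0.0, 1.0, 2.0]
dy = [-1.0, -2.0, 1.0, 0.0, 2.0]
n = len(dx)
sigma_x = sqrt(sum(x * x for x in dx) / n)
sigma_y = sqrt(sum(y * y for y in dy) / n)
r = sum(x * y for x, y in zip(dx, dy)) / (n * sigma_x * sigma_y)

b1 = r * sigma_x / sigma_y                      # value from equation (5)
minimum = sum_sq_residuals(dx, dy, b1)
print(minimum, n * sigma_x ** 2 * (1 - r ** 2))  # the two sides of (7)

# Any other slope (r + delta) * sigma_x / sigma_y gives a greater sum.
for delta in (-0.3, 0.1, 0.5):
    slope = (r + delta) * sigma_x / sigma_y
    direct = sum_sq_residuals(dx, dy, slope)
    formula = n * sigma_x ** 2 * (1 - r ** 2 + delta ** 2)
    print(delta, direct, formula)
```

The two printed values on each line agree to within rounding, and every altered slope gives a sum exceeding the minimum by N times the variance of x times the square of delta.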

12. The constant r is of very great importance. It is evidently a pure number, and its magnitude is unaffected by the scales in which x and y are measured, for these scales will affect the numerator and denominator of (4) to the same extent. If the two variables are independent, r is zero, for $b_1$ and $b_2$ are zero (cf. § 8). The sign is the sign of the mean product p, and accordingly r is positive if large values of x are associated with large values of y, and conversely (as in Tables I.-IV.), negative if small values of x are associated with large values of y, and conversely (as in Table V.). The numerical value cannot exceed unity: the sum of the series of squares in equation (7) is zero when r = ±1, and would be negative for any numerically greater value, which is impossible for a sum of squares. If r = ±1, it follows that all the observed pairs of deviations are subject to the relation $x/y = \pm\sigma_x/\sigma_y$: this would be the case if the circles and crosses in such a diagram as fig. 33 all lay on one and the same straight line. From these properties r is termed the coefficient of correlation, and the expression (4), $r = p/(\sigma_x\sigma_y) = \Sigma(xy)/(N\sigma_x\sigma_y)$, should be remembered.

It should be noted that, while r is zero if the variables are independent, the converse is not necessarily true: the fact that r is zero only implies that the means of rows and columns lie scattered round two straight lines which do not exhibit any definite trend, to right or to left, upward or downward. Two variables for which r is zero are, however, conveniently spoken of as uncorrelated. Table VI. and fig. 39 will serve as an illustration of a case in which the variables are almost uncorrelated but by no means independent, r being very small (-0.014), but the coefficient of contingency C = 0.47.

Figs. 36, 37, 38 are drawn from the data of Tables II., III., and IV., for which r has the values +0.91, +0.51, and +0.21 respectively, the correlation being positive in each case. The student should study such tables and diagrams closely, and endeavour to accustom himself to estimating the value of r from the general appearance of the table.

Fig. 37. Correlation between Stature of Father and Stature of Son (Table III.): means of rows shown by circles and means of columns by crosses: r = +0.51.

Fig. 38. Correlation between number of a Mother's Children and number of her Daughter's Children (Table IV.): means of rows shown by circles and means of columns by crosses: r = +0.21.
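
That a zero value of r does not imply independence may be seen from a small constructed case. The Python sketch below is an illustration only, using hypothetical values in which y is exactly the square of x; the function name `correlation` is likewise an assumption of the sketch and not part of the text.

```python
from math import sqrt


def correlation(xs, ys):
    """Coefficient of correlation r = p / (sigma_x * sigma_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    p = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sigma_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return p / (sigma_x * sigma_y)


# Hypothetical values: y is completely determined by x, yet the positive
# and negative products of deviations cancel exactly.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]
r = correlation(xs, ys)
print(r)

# The mean of y for each value of x is far from constant, so the
# variables are certainly not independent.
print(dict(zip(xs, ys)))
```

Here the variables are uncorrelated in the sense of the text, although the means of the y-arrays follow a curve and not a level straight line.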

13. The two quantities

$$b_1 = r\frac{\sigma_x}{\sigma_y} \qquad b_2 = r\frac{\sigma_y}{\sigma_x}$$

are termed the coefficients of regression, or simply the regressions; $b_1$ being the regression of x on y, or deviation in x corresponding on the average to a unit change in the type of y, and $b_2$ being similarly the regression of y on x. Whilst the coefficient of correlation is always a pure number, the regressions are only pure numbers if the two variables have the same dimensions, as in Tables I.-IV.: their magnitudes depend on the ratio $\sigma_x/\sigma_y$, and consequently on the units in which x and y are measured. They are both necessarily of the same sign (the sign of r). Since r is not greater than unity, and the product $b_1 b_2$ is equal to $r^2$, one at least of the regressions must be not greater than unity, but the other may be considerably greater if the ratio $\sigma_x/\sigma_y$ or $\sigma_y/\sigma_x$ be great. The name regression arose from the term being first introduced in the case of inheritance of stature (Galton, refs. 2, 3). In this case the two standard deviations are very nearly equal, so that both $b_1$ and $b_2$ are less than unity, say (using the more recent data of Table III.) 0.50 and 0.52. Hence the sons of fathers of deviation x from the mean of all fathers have an average deviation of only 0.52x from the mean of all sons; i.e. they step back or "regress" towards the general mean, and 0.52 may be termed the "ratio of regression." In general, however, the idea of a "stepping back" or "regression" towards a more or less stationary mean is quite inapplicable (obviously so where the variables are different in kind, as in Tables V. and VI.), and the term "coefficient of regression" should be regarded simply as a convenient name for the coefficients $b_1$ and $b_2$. RR and CC are generally termed the "lines of regression," and equations (6) the "regression equations." The expressions "characteristic lines," "characteristic equations" (Yule, ref. 8) would perhaps be better. Where the actual means of arrays appear to be given, to a satisfactory degree of approximation, by straight lines, we may say that the regression is linear. It is not safe, however, to assume that such linearity extends beyond the limits of observation.

Fig. 39. Correlation between Population of a Registration District and Proportion of Male Births per thousand of all births (England and Wales, 1881-90, Table VI.): means of rows shown by circles and means of columns by crosses: r = -0.014.
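
The dependence of the regressions on the units of measurement, and the fact that their product is the square of r, may be illustrated by the Python sketch below. The function name `regressions` and the values of r and of the standard deviations are hypothetical; only the figures 0.50, 0.52 and 0.51 in the last lines are taken from the text.

```python
def regressions(r, sigma_x, sigma_y):
    """Regression of x on y (b1) and of y on x (b2)."""
    b1 = r * sigma_x / sigma_y
    b2 = r * sigma_y / sigma_x
    return b1, b2


r = 0.5

# Hypothetical case: x measured in a small unit, so that sigma_x is large.
b1, b2 = regressions(r, sigma_x=4.0, sigma_y=1.0)
print(b1, b2, b1 * b2)      # one regression exceeds unity

# The same data with x expressed in a unit four times as large.
b1_new, b2_new = regressions(r, sigma_x=1.0, sigma_y=1.0)
print(b1_new, b2_new, b1_new * b2_new)

# Regressions quoted in the text for Table III., compared with r = 0.51.
product = 0.50 * 0.52
print(product, 0.51 ** 2)
```

The change of unit alters both regressions but leaves their product, and therefore r, untouched.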

14. The two standard deviations

$$s_x = \sigma_x\sqrt{1 - r^2} \qquad s_y = \sigma_y\sqrt{1 - r^2}$$

are of considerable importance. It follows from (7) that $s_x$ is the standard deviation of $(x - b_1 y)$, and similarly $s_y$ is the standard deviation of $(y - b_2 x)$. Hence we may regard $s_x$ and $s_y$ as the standard errors (root mean square errors) made in estimating x from y and y from x by the respective characteristic relations

$$x = b_1 y \qquad y = b_2 x.$$

$s_x$ may also be regarded as a kind of average standard deviation of a row about RR, and $s_y$ as an average standard deviation of a column about CC. In an ideal case, where the regression is truly linear and the standard deviations of all parallel arrays are equal, a case to which the distribution of Table III. is a rough approximation, $s_x$ is the standard deviation of the x-array and $s_y$ the standard deviation of the y-array (cf. Chap. X. § 19 (3)). Hence $s_x$ and $s_y$ are sometimes termed the "standard deviations of arrays."
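
The interpretation of the two quantities of § 14 as root mean square errors of estimate may be verified on a small set of figures. The following Python sketch is illustrative only: the deviations are hypothetical and the helper name `rms` is an assumption of the sketch.

```python
from math import sqrt


def rms(values):
    """Root mean square of a list of numbers."""
    return sqrt(sum(v * v for v in values) / len(values))


# Hypothetical deviations from the means (each list sums to zero).
dx = [-2.0, -1.0, 0.0, 1.0, 2.0]
dy = [-1.0, -2.0, 1.0, 0.0, 2.0]
n = len(dx)
sigma_x, sigma_y = rms(dx), rms(dy)
r = sum(x * y for x, y in zip(dx, dy)) / (n * sigma_x * sigma_y)
b1 = r * sigma_x / sigma_y
b2 = r * sigma_y / sigma_x

# Standard errors from the formulae of the text.
s_x = sigma_x * sqrt(1 - r * r)
s_y = sigma_y * sqrt(1 - r * r)

# Root mean square errors actually made in estimating x from y and y from x.
error_x = rms([x - b1 * y for x, y in zip(dx, dy)])
error_y = rms([y - b2 * x for x, y in zip(dx, dy)])

print(s_x, error_x)
print(s_y, error_y)
```

In each line the value given by the formula and the value found by direct calculation agree to within rounding.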

15. Proceeding now to the arithmetical work, the only new

expression that has to be calculated in order to determine $r$, $b_1$, $b_2$,

$s_x$, and $s_y$ is the product sum $\Sigma(xy)$ or the mean product p. As in

the cases of means and standard deviations, the form of the

arithmetic is slightly different according as the observations are

few and ungrouped, or sufficient to justify the formation of a

correlation-table. In the first case, as in Example i. below, the

work is quite straightforward.

[Table VII. Theory of Correlation: Example i. Columns: union; estimated average earnings of agricultural labourers, in shillings and pence per week; percentage of population in receipt of relief; deviations x and y; squares of the deviations; products xy. Only fragments of the rows survive in this copy: 1. Glendale, 20s. 9d.; 2. Wigton, 20s. 3d.; 3. Garstang.]

Example i., Table VII. The variables are (1) X, the estimated

average weekly earnings of agricultural labourers in 38 English

Poor-law unions of an agricultural type (the data of Example i.,

Chap. VIII. p. 137). (2) Y, the percentage of the population

in receipt of Poor-law relief on the 1st January 1891 in each of the

same unions (B return). The means of each of the variables are

calculated in the ordinary way, and then the deviations x and y

from the mean are written down (columns 4 and 5): care must

be taken to give each deviation the correct sign. These deviations

are then squared (columns 6 and 7) and the standard deviations

found as before (Chap. VIII. p. 136). Finally, every x is

multiplied by the associated y and the product entered in column

8 or column 9 according to its sign. These columns are then

added up separately and the algebraic sum of the totals gives

$\Sigma(xy) = -666.04$: therefore the mean product $p = \Sigma(xy)/N = -17.53$, and

$$r = \frac{p}{\sigma_x \sigma_y} = \frac{-17.53}{20.5 \times 1.29} = -0.66.$$

There is therefore a well-marked relation exhibited by these data

between the earnings of agricultural labourers in a district and

the percentage of the population in receipt of Poor-law relief.

A penny is rather a small unit in which to measure deviations in

the average earnings, so for the regressions we may alter the unit

of x to a shilling, making $\sigma_x = 1.71$, and

$$b_1 = r\frac{\sigma_x}{\sigma_y} = -0.87, \qquad b_2 = r\frac{\sigma_y}{\sigma_x} = -0.50.$$

The regression equations are therefore, in terms of these units,

$$x = -0.87y, \qquad y = -0.50x.$$

For practical purposes it is more convenient to express the

equations in terms of the absolute values of the variables rather

than the deviations: therefore, replacing x by (X - 15.94) and y

by (Y - 3.67) and simplifying, we have

$$X = 19.13 - 0.87Y \qquad (a)$$

$$Y = 11.64 - 0.50X \qquad (b)$$

the units being 1s. for the earnings and 1 per cent. for the

pauperism. The standard errors made in using these equations

to estimate earnings from pauperism and pauperism from earnings

respectively are

$$\sigma_x\sqrt{1-r^2} = 15.4\text{d.} = 1.28\text{s.}$$

$$\sigma_y\sqrt{1-r^2} = 0.97 \text{ per cent.}$$
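
The arithmetic of Example i. can be retraced from the summary figures quoted in the text. The sketch below is a check, not the author's working sheet: the variable names are hypothetical, and rounding r and the regression coefficients to two decimals before the next step is an assumption made so that the printed values are reproduced.

```python
from math import sqrt

N = 38
sum_xy = -666.04                 # product sum (pence x per cent.)
sd_x_pence, sd_y = 20.5, 1.29    # standard deviations quoted in the text
mean_X, mean_Y = 15.94, 3.67     # means: shillings, per cent.

p = sum_xy / N                               # mean product
r = round(p / (sd_x_pence * sd_y), 2)        # rounded as printed (assumption)

sd_x = 1.71                                  # sigma_x expressed in shillings
b1 = round(r * sd_x / sd_y, 2)               # regression of x on y
b2 = round(r * sd_y / sd_x, 2)               # regression of y on x

const_a = mean_X - b1 * mean_Y               # equation (a): X = const_a + b1 * Y
const_b = mean_Y - b2 * mean_X               # equation (b): Y = const_b + b2 * X

k = sqrt(1 - r * r)
se_x_pence = sd_x_pence * k                  # standard error of X from Y, in pence
se_y = sd_y * k                              # standard error of Y from X, per cent.

print(round(p, 2), r, b1, b2)
print(round(const_a, 2), round(const_b, 2))
print(round(se_x_pence, 1), round(se_y, 2))
```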



The equation (b) tells us therefore that a rise of 2s. in earnings

in passing from one district to another means on the average a

fall of 1 in the percentage in receipt of relief. A natural con-

clusion would be that this means a direct effect of the higher

earnings in diminishing the necessity for relief, but such a

conclusion cannot be accepted offhand. Equation (a) indicates,

for instance, that every rise of a unit in the percentage relieved

corresponds to a fall of 0.87 shillings, or 10½d., in earnings:

this might mean that the giving of relief tends to depress wages.

Which is the correct interpretation of the facts? The above

regression equations alone cannot tell us this, and it is in the

discussion of such questions that most of the difficulties of statistical

arguments arise.

[Fig. 40. Correlation between Pauperism and Average Earnings of Agricultural Labourers for certain districts of England (data of Table VII.): RR, CC, lines of regression: r = -0.66. Horizontal axis: average weekly earnings of agricultural labourers, scale marked from 12 to 20.]

As a check on the whole of the arithmetical work, and to test

whether the correlation coefficient is unduly affected by a few out-

lying observations, or, perhaps, by the regression not being linear,

it is always as well to draw a diagram representing the results

obtained. Take scales along two axes at right angles (fig. 40)

representing the variables, and insert a dot (better, for clearness,

a small circle or a cross) at the point determined by each observed

pair of x and y. Complete the diagram by inserting the two lines


RR and CC given by the regression equations (a) and (b). In

doing this it is as well to determine a point at each end of both

lines, and then to check the work by seeing that they meet in the

mean of the whole distribution. Thus RR is determined from (a)

by the points Y = 0, X = 19.13 and Y = 6, X = 13.91: CC is

determined from (b) by the points X = 12, Y = 5.64 and X = 21,

Y = 1.14. Marking in these points, and drawing the lines, they

will be found to meet in the mean, X = 15.94, Y = 3.67. The

diagram gives a very clear idea of the distribution; clearly the

regression is as nearly linear as may be with so very scattered a

distribution, and there are no very exceptional observations. The

most exceptional districts are Brixworth and St Neots with rather

low earnings but very low pauperism, and Glendale and Wigton

with the highest earnings but a pauperism well above the lowest—

over 2 per cent.
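
The check on the diagram described above, that the two lines of regression meet in the mean, can also be made numerically. The following sketch uses equations (a) and (b) as printed; the function names are hypothetical. Because the coefficients are rounded to two decimals, the meeting point agrees with the mean only to about a hundredth.

```python
def line_a(Y):
    """Equation (a): earnings X (shillings) estimated from pauperism Y."""
    return 19.13 - 0.87 * Y

def line_b(X):
    """Equation (b): pauperism Y (per cent.) estimated from earnings X."""
    return 11.64 - 0.50 * X

# End points used for drawing RR and CC, as (X, Y) pairs.
rr_points = [(line_a(0), 0), (line_a(6), 6)]
cc_points = [(12, line_b(12)), (21, line_b(21))]

# Meeting point: substitute (b) into (a), X = 19.13 - 0.87 * (11.64 - 0.50 * X).
X_meet = (19.13 - 0.87 * 11.64) / (1 - 0.87 * 0.50)
Y_meet = line_b(X_meet)

print(rr_points, cc_points)
print(round(X_meet, 2), round(Y_meet, 2))
```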

16. When a classified correlation-table is to be dealt with, the

procedure is of precisely the same kind as was used in the calcula-

tion of a standard deviation, the same artifices being used to shorten

the work. That is to say, (1) the product-sum is calculated in the

first instance with respect to an arbitrary origin, and is afterwards

reduced to the value it would have with respect to the mean; (2)

the arbitrary origin is taken at the centre of a class-interval; (3)

the class-interval is treated as the unit of measurement throughout

the arithmetic.

Let deviations from the arbitrary origin be denoted by $\xi$, $\eta$, and

let $\bar\xi$, $\bar\eta$ be the co-ordinates of the mean. Then

$$\xi = x + \bar\xi, \qquad \eta = y + \bar\eta,$$

so that $\xi\eta = xy + \bar\eta\,x + \bar\xi\,y + \bar\xi\bar\eta$.

Therefore, summing, since the second and third sums on the

right vanish, being the sums of deviations from the mean,

$$\Sigma(\xi\eta) = \Sigma(xy) + N\bar\xi\bar\eta,$$

or bringing $\Sigma(xy)$ to the left,

$$\Sigma(xy) = \Sigma(\xi\eta) - N\bar\xi\bar\eta.$$

That is, in terms of mean-products, using $p'$ to denote the mean-product

for the arbitrary origin,

$$p = p' - \bar\xi\bar\eta.$$

In any case where the origin from which deviations have been

measured is not the mean, this correction must be used. It will

sometimes give a sensible correction even for work in the form of


Example i., and in that case, of course, the standard deviations

will also require reduction to the mean.
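
The reduction of the mean product from an arbitrary origin to the mean can be verified on a small example. The values below are hypothetical and the variable names are illustrative only; the sketch compares the corrected mean product with the one computed directly from deviations about the mean.

```python
# Hypothetical deviations measured from an arbitrary origin.
xi = [3.0, 5.0, 6.0, 8.0, 13.0]
eta = [4.0, 3.0, 2.5, 1.0, -0.5]
n = len(xi)

xi_bar = sum(xi) / n                      # co-ordinates of the mean
eta_bar = sum(eta) / n

# Mean product about the arbitrary origin, then the correction.
p_prime = sum(a * b for a, b in zip(xi, eta)) / n
p_corrected = p_prime - xi_bar * eta_bar

# Mean product computed directly from deviations about the mean.
p_direct = sum((a - xi_bar) * (b - eta_bar) for a, b in zip(xi, eta)) / n

print(p_prime, p_corrected, p_direct)
```

Note that the signs of both the mean product about the arbitrary origin and the product of the two mean deviations enter the correction.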

As the arithmetical process of calculating the correlation co-

efficient from a grouped table is of great importance, we give two

illustrations, the first economic, the second biological.

Example ii., Table VIII.—The two variables are (1) X, the

percentage of males over 65 years of age in receipt of Poor-law

relief in 235 unions of a mainly rural character in England and

Wales; (2) Y, the ratio of the numbers of persons given relief " out-

doors " (in their own homes) to one "indoors" (in the workhouse).

The figures refer to a one-day count (1st August 1890, No. 36,

1890), and the table is one of a series that were drawn up with

the view to discussing the influence of administrative methods on


pauperism. (Economic Journal, vol. vi., 1896, p. 613.)

The arbitrary origin for X was taken at the centre of the fourth

column, or at 17.5 per cent.; for Y at the centre of the fourth

row, or 3.5. The following are the values found for the constants

of the single distributions:

$\bar\xi = -0.1532$ intervals $= -0.77$ per cent., whence $M_x = 16.73$ per cent.

$\sigma_x = 1.29$ intervals $= 6.45$ per cent.

$\bar\eta = +0.36$ intervals or units, whence $M_y = 3.86$.

$\sigma_y = 2.98$ units.

To calculate $\Sigma(\xi\eta)$, the value of $\xi\eta$ is first written in every

compartment of the table against the corresponding frequency,

treating the class-interval as the unit: these are the figures in

heavy type in Table VIII. In making these entries the sign of

the product may be neglected, but it must be remembered that

this sign will be positive in the upper left-hand and lower right-hand

quadrants, negative in the two others. The frequencies are

then collected as shown in columns 2 and 3 of Table VIIIa.,

being grouped according to the value and sign of $\xi\eta$. Thus for

$\xi\eta = 1$, the total frequency in the positive quadrants is 13 + 8.5

= 21.5, in the negative 14 + 6 = 20: for $\xi\eta = 2$, 10 + 4.5 + 1 + 4.5

= 20 in the positive quadrants, 5 + 2 + 1 + 3.5 = 11.5 in the

negative, and so on. When columns 2 and 3 are completed, they

should first of all be checked to see that no frequency has been

dropped, which may be readily done by adding together the totals

of these two columns together with the frequency in row 4 and

column 4 of Table VIII. (the row and column for which $\xi\eta = 0$),

being careful not to count twice the frequency in the compartment

common to the two; this grand total must clearly be equal to the

total number of observations N, or 235 in the present case. The

algebraic sum of the frequencies in each line of columns 2 and 3 is



[Table VIII. Theory of Correlation: Example ii. Old-age Pauperism and Proportion of Out-relief. (The Frequencies are the figures printed in ordinary type. The numbers in heavy type are the Deviation-Products ($\xi\eta$).) Rows: numbers given relief outdoors to one indoors (0-1, 1-2, 2-3, 3-4, etc.); columns: percentage of males over 65 in receipt of relief. The entries are not recoverable in this copy.]

[Table VIIIa. Calculation of the Product Sum $\Sigma(\xi\eta)$. The entries are not recoverable in this copy.]


+ 0.34 × 6.45/2.98 = 0.74, and the regression equation accordingly

$x = 0.74y$, or

$$X = 13.9 + 0.74Y,$$

the standard error made in using the equation for estimating X

from Y being $\sigma_x\sqrt{1-r^2} = 6.07$.

This is the equation of greatest practical interest, telling us

that, as we pass from one district to another, a rise of 1 in the

ratio of the numbers relieved in their own homes to the numbers

relieved in the workhouse corresponds on an average to a rise of

0.74 in the percentage in receipt of relief. The result is such as

to create a presumption in favour of the view that the giving of

out-relief tends to increase the numbers relieved, and this can be

taken as a working hypothesis for further investigation.
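
The figures of Example ii. quoted in the text can be retraced in the same way. This is a hedged sketch: the variable names are hypothetical, the value r = 0.34 is the one appearing in the expression for the regression coefficient, and the mean of Y is taken as 3.86 (the arbitrary origin 3.5 plus 0.36).

```python
from math import sqrt

r = 0.34
sd_x, sd_y = 6.45, 2.98          # per cent., and units of the out-relief ratio
mean_X, mean_Y = 16.73, 3.86

b1 = r * sd_x / sd_y             # regression of X on Y
constant = mean_X - b1 * mean_Y  # X = constant + b1 * Y
se_x = sd_x * sqrt(1 - r * r)    # standard error in estimating X from Y

print(round(b1, 2), round(constant, 1), round(se_x, 2))
```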


The student should work out the second regression equation,

and check both by calculating the means of the principal rows

and columns, and drawing a diagram like figs. 36, 37, and 38.

Example iii., Table IX.—(Unpublished data; measurements by

G. U. Yule.) The two variables are (1) X, the length of a mother-

frond of Lemna minor; (2) Y, the length of the daughter-frond.

The mother-frond was measured when the daughter-frond

separated from it, and the daughter-frond when its first daughter-

frond separated. Measures were taken from camera drawings

made with the Zeiss-Abbe camera under a low power, the actual

magnification being 24 :1. The units of length in the tabulated

measurements are millimetres on the drawings.

The arbitrary origin for both X and Y was taken at 105 mm.

The following are the values found for the constants of the single

distributions:

$\bar\xi = -1.058$ intervals $= -6.3$ mm. $M_1 = 98.7$ mm. on drawing $= 4.11$ mm. actual.

$\sigma_x = 2.828$ intervals $= 17.0$ mm. on drawing $= 0.707$ mm. actual.

$\bar\eta = -0.203$ intervals $= -1.2$ mm. $M_2 = 103.8$ mm. on drawing $= 4.32$ mm. actual.

$\sigma_y = 3.084$ intervals $= 18.5$ mm. on drawing $= 0.771$ mm. actual.

The values of $\xi\eta$ are entered in every compartment of the

table as before, and the frequencies then collected, according to

the magnitude and sign of $\xi\eta$, in columns 2 and 3 of Table IXa.

The entries in these two columns are next checked by adding to

the totals the frequency in the row and column for which $\xi\eta$ is

zero, and seeing that it gives the total number of observations

(266). The numbers in column 4 are given by deducting the

entries in column 3 from those in column 2. The totals so

obtained are multiplied by $\xi\eta$ (column 1) and the products entered








[Table IX. and Table IXa. (Example iii.). The entries are not recoverable in this copy.]


The regression of daughter-frond on mother-frond is 0.69 (a

value which will not be altered by altering the units of measure-

ment for both mother- and daughter-fronds, as such an alteration

will affect both standard deviations equally). Hence the re-

gression equation giving the average actual length (in millimetres)

of daughter-fronds for mother-fronds of actual length X is

$$Y = 1.48 + 0.69X.$$
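
The constant term of this equation, and the remark that the regression is unaffected by a common change of unit, can both be checked from the constants quoted for Example iii. The sketch below is illustrative only; the variable names are hypothetical, and small differences arise because the printed constants are rounded.

```python
MAGNIFICATION = 24.0             # drawings are 24 times actual size

b2 = 0.69                        # regression of daughter-frond on mother-frond
mean_mother, mean_daughter = 4.11, 4.32      # actual millimetres

constant = mean_daughter - b2 * mean_mother  # Y = constant + b2 * X

# Ratio of the standard deviations on the drawings and in actual lengths:
# dividing both by the same factor leaves the ratio, and hence b2, unchanged.
ratio_drawing = 18.5 / 17.0
ratio_actual = (18.5 / MAGNIFICATION) / (17.0 / MAGNIFICATION)

print(round(constant, 2), ratio_drawing, ratio_actual)
```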

We again leave it to the student to work out the second

regression equation giving the average length of mother-fronds

for daughter-fronds of length Y, and to check the whole work

by a diagram showing the lines of regression and the means of

arrays for the central portion of the table.

17. The student should be careful to remember the following


points in working:—

(1) To give $p'$ and $\bar\xi\bar\eta$ their correct signs in finding the true

mean deviation-product p.

(2) To express $\sigma_x$ and $\sigma_y$ in terms of the class-interval as a

unit, in the value of $r = p/\sigma_x\sigma_y$, for these are the units in terms

of which p has been calculated.

(3) To use the proper units for the standard deviations (not

class-intervals in general) in calculating the coefficients of

regression: in forming the regression equation in terms of the

absolute values of the variables, for example, as above, the work

will be wrong unless means and standard deviations are ex-

pressed in the same units.
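
The three points above can be seen together in a short sketch of the grouped calculation. Everything in it is hypothetical: the small correlation table, the class-intervals of 5 and 1, and the variable names. The table is held as frequencies indexed by deviations, in class-intervals, from an arbitrary origin.

```python
from math import sqrt

# Hypothetical correlation table: {(xi, eta): frequency}, with xi and eta
# measured in class-intervals from an arbitrary origin.
table = {(-1, -1): 4, (-1, 0): 2, (0, -1): 3, (0, 0): 6,
         (0, 1): 3, (1, 0): 2, (1, 1): 5, (2, 1): 1}
width_x, width_y = 5.0, 1.0      # hypothetical class-intervals in proper units

N = sum(table.values())
xi_bar = sum(f * i for (i, j), f in table.items()) / N
eta_bar = sum(f * j for (i, j), f in table.items()) / N

# Standard deviations reduced to the mean, in class-interval units.
sd_x = sqrt(sum(f * i * i for (i, j), f in table.items()) / N - xi_bar ** 2)
sd_y = sqrt(sum(f * j * j for (i, j), f in table.items()) / N - eta_bar ** 2)

# Point (1): mean product about the arbitrary origin, corrected with due sign.
p_prime = sum(f * i * j for (i, j), f in table.items()) / N
p = p_prime - xi_bar * eta_bar

# Point (2): r uses standard deviations in the same class-interval units as p.
r = p / (sd_x * sd_y)

# Point (3): the regression coefficient needs the proper units.
b1 = r * (sd_x * width_x) / (sd_y * width_y)

print(N, round(r, 3), round(b1, 3))
```

The coefficient r is a pure number and is the same whichever units are used, whereas the regression coefficient changes with the ratio of the two class-intervals.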

Further, it must always be remembered that correlation

coefficients, like all other statistical measures, are subject to

fluctuations of sampling (cf. Chap. III. §§ 7, 8). If we write

on cards a series of pairs of strictly independent values of x and

y and then work out the correlation coefficient for samples of,

say, 40 or 50 cards taken at random, we are very unlikely ever

to find r = 0 absolutely, but will find a series of positive and

negative values centring round 0. No great stress can therefore

be laid on small, or even on moderately large, values of r as

indicating a true correlation if the numbers of observations be

small. For instance, if N = 36, a value of r = ±0.5 may be

merely a chance result (though a very infrequent one); if

N = 100, r = ±0.3 may similarly be a mere fluctuation of

sampling, though again an infrequent one. If N = 900, a value

of r = ±0.1 might occur as a fluctuation of sampling of the same

degree of infrequency. The student must therefore be careful in

interpreting his coefficients. (See Chap. XVII. § 15.)
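
The card experiment described above can be imitated by simulation. The sketch below is an illustration under stated assumptions: the independent values are drawn from a normal distribution (the text requires only independence), the seed and the number of trials are arbitrary, and the names are hypothetical.

```python
import random
from math import sqrt

def corr(xs, ys):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)

random.seed(1911)                # arbitrary seed, for repeatability only
N, trials = 36, 2000             # sample size and number of samples drawn

rs = []
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(N)]   # independent of ys
    ys = [random.gauss(0, 1) for _ in range(N)]
    rs.append(corr(xs, ys))

mean_r = sum(rs) / trials
share_large = sum(abs(r) >= 0.5 for r in rs) / trials

print(round(mean_r, 3), share_large, min(rs), max(rs))
```

The values of r scatter on both sides of zero, and values as large as 0.5 in magnitude should turn up only rarely with samples of 36.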

Finally, it should be borne in mind that any coefficient, e.g. the

coefficient of correlation or the coefficient of contingency, gives


only a part of the information afforded by the original data or

the correlation table. The correlation table itself, or the original

data if no correlation table has been compiled, should always be

given, unless considerations of space or of expense absolutely

preclude the adoption of such a course.


The theory of correlation was first developed on definite assumptions

as to the form of the distribution of frequency, the so-called "normal

distribution " (Chap. XVI.) being assumed. In (1) Bravais introduced

the product-sum, but not a single symbol for a coefficient of correlation.

Sir Francis Galton, in (2), (3), and (4), developed the practical method,

determining his coefficient (Galton's function, as it was termed at first)

graphically. Edgeworth developed the theoretical side further in (5),


and Pearson introduced the product-sum formula in (6)—both memoirs

being written on the assumption of a "normal" distribution of fre-

quency (cf. Chap. XVI.). The method used in the preceding chapter

is based on (7) and (8).

(1) Bravais, A., "Analyse mathématique sur les probabilités des erreurs de

situation d'un point," Acad. des Sciences: Mémoires présentés par divers

savants, IIe série, t. ix., 1846, p. 255.

(2) Galton, Francis, "Regression towards Mediocrity in Hereditary

Stature," Jour. Anthrop. Inst., vol. xv., 1886, p. 246.

(3) Galton, Francis, "Family Likeness in Stature," Proc. Roy. Soc.,

vol. xl., 1886, p. 42.

(4) Galton, Francis, "Correlations and their Measurement," Proc. Roy.

Soc., vol. xlv., 1888, p. 135.

(5) Edgeworth, F. Y., "On Correlated Averages," Phil. Mag., 5th Series,

vol. xxxiv., 1892, p. 190.

(6) Pearson, Karl, "Regression, Heredity, and Panmixia," Phil. Trans.

Roy. Soc., Series A, vol. clxxxvii., 1896, p. 253.

(7) Yule, G. U., "On the significance of Bravais' Formulae for Regression,

etc., in the case of Skew Correlation," Proc. Roy. Soc., vol. lx., 1897,

p. 477.

(8) Yule, G. U., "On the Theory of Correlation," Jour. Roy. Stat. Soc.,

vol. lx., 1897, p. 812.

(9) Darbishire, A. D., "Some Tables for illustrating Statistical Correlation,"

Mem. and Proc. of the Manchester Lit. and Phil. Soc., vol. li.,

1907. (Tables and diagrams illustrating the meaning of values of the

correlation coefficient from 0 to 1 by steps of a twelfth.)

Reference may also be made here to—

(10) Edgeworth, F. Y., "On a New Method of reducing Observations

relating to several Quantities," Phil. Mag., 5th Series, vol. xxiv., 1887,

p. 222, and vol. xxv., 1888, p. 184. (A method of treating correlated

variables differing entirely from that described in the preceding

chapter, and based on the use of the median: the method involves

the use of trial and error to some extent. For some illustrations see

F. Y. Edgeworth and A. L. Bowley, Jour. Roy. Stat. Soc., vol. lxv.,

1902, p. 341 et seq.)

References to memoirs on the theory of non-linear regression are given

at the end of Chapter X.




1. Find the correlation-coefficient and the equations of regression for the

following values of X and Y.

X. Y.

[As a matter of practice it is never worth calculating a correlation-coefficient

for so few observations: the figures are given solely as a short example on

which the student can test his knowledge of the work.]

2. The following figures show, for the districts of Example i., the ratios of

the numbers of paupers in receipt of outdoor relief to the numbers in receipt

of relief in the workhouse. Find the correlations between the out-relief ratio

and (1) the estimated earnings of agricultural labourers; (2) the percentage

of the population in receipt of relief.


In calculating the coefficient of contingency (coefficient of mean square

contingency) use the following groupings, so as to avoid small scattered fre-

quencies at the extremities of the tables and also excessive arithmetic:—

I. Group together (1) two top rows, (2) three bottom rows, (3) two first

columns, (4) four last columns, leaving centre of table as it stands.

II. Regroup by ten-year intervals (15-, 25-, 35-, etc.) for both husband and

wife, making the last group "65 and over."

III. Regroup by 2-inch intervals, 58.5-60.5, etc., for father, 59.5-61.5,

etc., for son. If a 3-inch grouping be used (58.5-61.5, etc., for both father and

son), the coefficient of mean square contingency is 0.465. [Both results cited

from Pearson, ref. 1 of Chap. V.]

IV. For cols., group 1 + 2, 3 + 4, . . . , 11 + 12, 13 and upwards. Rows,

0, 1 + 2, 3 + 4, . . . , 9 +10, 11 and upwards.

VI. For cols., group all up to 494.5 and all over 521.5, leaving central cols.

Rows singly up to 20: then 20-28, 28-44, 44-56, 56 upwards.



1. Necessity for careful choice of variables before proceeding to calculate r—

2-8. Illustration i.: Causation of pauperism — 9-10. Illustration

ii.: Inheritance of fertility—11-13. Illustration iii.: The weather

and the crops —14. Correlation between the movements of two

variables:—(a) Non-periodic movements: Illustration iv.: Changes

in infantile and general mortality—15-17. (b) Quasi-periodic move-

ments: Illustration v.: The marriage-rate and foreign trade—

18. Elementary methods of dealing with cases of non-linear regression

—19. Certain rough methods of approximating to the correlation-


1. The student—especially the student of economic statistics, to


whom this chapter is principally addressed—should be careful to

note that the coefficient of correlation, like an average or a

measure of dispersion, only exhibits in a summary and compre-

hensible form one particular aspect of the facts on which it is

based, and the real difficulties arise in the interpretation of the

coefficient when obtained. The value of the coefficient may be

consistent with some given hypothesis, but it may be equally

consistent with others; and not only are care and judgment

essential for the discussion of such possible hypotheses, but also

a thorough knowledge of the facts in all other possible aspects.

Further, care should be exercised from the commencement in the

selection of the variables between which the correlation shall be

determined. The variables should be defined in such a way as

to render the correlations as readily interpretable as possible,

and, if several are to be dealt with, they should afford the answers

to specific and definite questions. Unfortunately, the field of

choice is frequently very much limited, by deficiencies in the


available data and so forth, and consequently practical possibilities

as well as ideal requirements have to be taken into account. No

general rules can be laid down, but the following are given as

illustrations of the sort of points that have to be considered.


2. Illustration i.—It is required to throw some light on the

variations of pauperism in the unions (unions of parishes) of

England. (Cf. Yule, ref. 2.)

One table (Table VIII.) bearing on a part of this question, viz.

the influence of the giving of out-relief on the proportion of the

aged in receipt of relief, was given in Chap. IX. (p. 183). The

question was treated by correlating the percentage of the aged

relieved in different districts with the ratio of numbers relieved

outdoors to the numbers in the workhouse. Is such a method

the best possible?

On the whole, it would seem better to correlate changes in

pauperism with changes in various possible factors. If we say

that a high rate of pauperism in some district is due to lax


administration, we presumably mean that as administration

became lax, pauperism rose, or that if administration were more

strict, pauperism would decrease; if we say that the high pauper-

ism is due to the depressed condition of industry, we mean that

when industry recovers, pauperism will fall. When we say, in

fact, that any one variable is a factor of pauperism, we mean

that changes in that variable are accompanied by changes in the

percentage of the population in receipt of relief, either in the

same or the reverse direction. It will be better, therefore, to

deal with changes in pauperism and possible factors. The next

question is what factors to choose.

3. The possible factors may be grouped under three heads:—

(a) Administration.—Changes in the method or strictness of

administration of the law.

(b) Environment. — Changes in economic conditions (wages,

prices, employment), social conditions (residential or industrial

character of the district, density of population, nationality of


population), or moral conditions (as illustrated, e.g., by the statis-

tics of crime).

(c) Age Distribution.—the percentage of the population between

given age-limits in receipt of relief increases very rapidly with old

age, the actual figures given by one of the only two then existing

returns of the age of paupers being—2 per cent. under age 16,

1 per cent. over 16 but under 65, 20 per cent. over 65. (Return

36, 1890.)

It is practically impossible to deal with more than three factors,

one from each of the above groups, or four variables alto-

gether, including the pauperism itself. What shall we take, then,

as representative variables, and how shall we best measure

"pauperism "?

4. Pauperism.—The returns give (a) cost, (b) numbers relieved.

It seems better to deal with (b) (as in the illustration of Table

VIII., Chap. IX.), as numbers are more important than cost from

the standpoint of the moral effect of relief on the population.

The returns, however, generally include both lunatics and vagrants

in the totals of persons relieved; and as the administrative methods

of dealing with these two classes differ entirely from the methods

applicable to ordinary pauperism, it seems better to alter the

official total by excluding them. Returns are available giving

the numbers in receipt of relief on 1st January and 1st July;

there does not seem to be any special reason for taking the one

return rather than the other, but the return for 1st January was

actually used. The percentage of the population in receipt of

relief on 1st January 1871, 1881, and 1891 (the three census

years), less lunatics and vagrants, was therefore tabulated for each

union. (The investigation was carried out in 1898.)

5. Administration.—The most important point here, and one

that lends itself readily to statistical treatment, is the relative

proportion of indoor and outdoor relief (relief in the workhouse

and relief in the applicant's home). The first question is,

again, shall we measure this proportion by cost or by numbers?

The latter seems, as before, the simpler and more important ratio

for the present purpose, though some writers have preferred the

statement in terms of expenditure (e.g. Mr Charles Booth, Aged

Poor—Condition, 1894). If we decide on the statement in terms

of numbers, we still have the choice of expressing the proportion (1)

as the ratio of numbers given out-relief to numbers in the work-

house, or (2) as the percentage of numbers given out-relief on

the total number relieved. The former method was chosen,

partly on the simple ground that it had already been used in an

earlier investigation, partly on the ground that the use of the

ratio separates the higher proportions of out-relief more clearly


from each other, and these differences seem to have significance.

Thus a union with a ratio of 15 outdoor paupers to one indoor

seems to be materially different from one with a ratio of, say, 10

to 1 ; but if we take, instead of the ratios, the percentages of

outdoor to total paupers, the figures are 94 per cent. and 91 per

cent. respectively, which are so close that they will probably fall

into the same array. The ratio of numbers in receipt of outdoor

relief to the numbers in the workhouse, in every union, was

therefore tabulated for 1st January in the census years 1871, 1881,

and 1891.
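
The comparison between the two ways of stating the proportion of out-relief is a matter of simple arithmetic, sketched below. The function name is hypothetical; it converts a ratio of outdoor to indoor paupers into the percentage of outdoor on total paupers.

```python
def outdoor_percentage(ratio):
    """Percentage of outdoor paupers on the total, for a given
    ratio of outdoor paupers to one indoor pauper."""
    return 100 * ratio / (ratio + 1)

for ratio in (15, 10):
    print(ratio, round(outdoor_percentage(ratio)))
```

A ratio of 15 to 1 and one of 10 to 1 differ by half as much again, yet the corresponding percentages differ by only about three points, which is why the ratio separates the higher proportions more clearly.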


6. Environment.—This is the most difficult factor of all to deal

with. In Mr Booth's work the factors tabulated were (1) persons

per acre; (2) percentage of population living two or more to a

room, i.e. "overcrowding"; (3) rateable value per head (Aged Poor—

Condition). The data relating to overcrowding were first collected


at the census of 1891, and are not available for earlier years.

Some trial was made of rateable value per head, but with not

very satisfactory results. For any given year, and for a group of

unions of somewhat similar character, e.g. rural, the rateable value

per head appears to be highly (negatively) correlated with the

pauperism, but changes in the two are not very highly correlated:

probably the movements of assessments are sluggish and irregular,

especially in the case of falling assessments in rural unions, and

do not correspond at all accurately with the real changes in the

value of agricultural land. After some consideration, it was

decided to use a very simple index to the changing fortunes of a

district, viz. the movement of the population itself. If the

population of a district is increasing at a rate above the average,

this is prima facie evidence that its industries are prospering; if

the population is decreasing, or not increasing as fast as the

average, this strongly suggests that the industries are suffering

from a temporary lack of prosperity or permanent decay. The

population of every union was therefore tabulated for the censuses

of 1871, 1881, 1891.

7. Age Distribution.—As already stated, the figures that are

known clearly indicate a very rapid rise of the percentage relieved

after 65 years of age. The percentage of the population over 65

years of age was therefore worked out for every union and tabu-

lated from the same three censuses. This is not, of course,

at all a complete index to the composition of the population as

affecting the rate of pauperism, which is sensibly dependent on

the proportion of the two sexes, and the numbers of children as

well. As the percentage in receipt of relief was, however, 20 per

cent. for those over 65, and only 1.2 per cent. for those under that

age, it is evidently a most important index. (A more complete

method might have been used by correcting the observed rate of

pauperism to the basis of a standard population with given numbers

of each age and sex. Cf. below, Chap. XI. pp. 219-21.)

8. The changes in each of the four quantities that had been

tabulated for every union were then measured by working out the

ratios for the intercensal decades 1871-81 and 1881-91, taking

the value in the earlier year as 100 in each case. The percentage

ratios so obtained were taken as the four variables. Further, as

the conditions are and were very different for rural and for urban

unions, it seemed very desirable to separate the unions into groups

according to their character. But this cannot be done with any

exactness: the majority of unions are of a mixed character, con-

sisting, say, of a small town with a considerable extent of the

surrounding country. It might seem best to base the classification

on returns of occupations, e.g. the proportions of the population


engaged in agriculture, but the statistics of occupations are not

given in the census for individual unions. Finally, it was decided

to use a classification by density of population, the grouping used

being: Rural, 0.3 person per acre or less; Mixed, more than

0.3 but not more than 1 person per acre; Urban, more than 1 person

per acre. The metropolitan unions were also treated by themselves.

The limit 0.3 for rural unions was suggested by the

density of those agricultural unions the conditions in which

were investigated by the Labour Commission (the unions of

Table VII., Chap. IX.): the average density of these was 0.25,

and 34 of the 38 were under 0.3. The lower limit of density for

urban unions—1 per acre—was suggested by a grouping of Mr

Booth's (group xiv.): of course 1 person per acre is not a density

associated with an urban district in the ordinary sense of the

term, but a country district cannot reach this density unless it

include a small town or portion of a town, i.e. unless a large

proportion of its inhabitants live under urban conditions.
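The two preparatory steps of § 8 can be sketched in a few lines: expressing a later census value as a percentage ratio of the earlier one, and sorting a union into a density group with the limits quoted above. The function names and the sample figures are illustrative assumptions, not taken from the original memoir, and the separate treatment of the metropolitan unions is not modelled.

```python
def percentage_ratio(earlier, later):
    """Value in the later census year, taking the earlier year as 100."""
    return 100.0 * later / earlier


def density_group(population, acres):
    """Classify a union by persons per acre, using the limits quoted in the text."""
    density = population / acres
    if density <= 0.3:
        return "Rural"
    if density <= 1.0:
        return "Mixed"
    return "Urban"


# Hypothetical union: these figures are invented for illustration only.
pauperism_1871, pauperism_1881 = 5.0, 4.0   # per cent. of the population
ratio = percentage_ratio(pauperism_1871, pauperism_1881)
group = density_group(population=12_000, acres=60_000)   # 0.2 person per acre
print(ratio, group)
```

A ratio below 100 marks a fall over the decade and a ratio above 100 a rise, so the four variables are all on the same footing whatever their original units.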

The method by which the relations between four variables are

discussed is fully described in Chapter XII.: at the present stage

it can only be stated that the discussion is based on the correlations

between all the possible (6) pairs that can be formed from the four variables.


9. Illustration ii.—The subject of investigation is the inheritance

of fertility in man. (Cf. Pearson and others, ref. 3.) One table,

from the memoir cited, was given as an example in the last chapter

(Table IV.).

Fertility in man (i.e. the number of children born to a given pair)

is very largely influenced by the age of husband and wife at

marriage (especially the latter), and by the duration of marriage.

It is desired to find whether it is also influenced by the heritable

constitution of the parents, i.e. whether, allowance being made for

the effect of such disturbing causes as age and duration of marriage,

fertility is itself a heritable character.

The effect of duration of marriage may be largely eliminated

by excluding all marriages which have not lasted, say, 15 years

at least. This will rather heavily reduce the number of records

available, but will leave a sufficient number for discussion. It

would be desirable to eliminate the effect of late marriages in

the same way by excluding all cases in which, say, husband was

over 30 years of age or wife over 25 (or even less) at the time

of marriage. But, unfortunately, this is impossible; the age of

the wife—the most important factor—is only exceptionally given

in peerages, family histories, and similar works, from which the

data must be compiled. All marriages must therefore be

included, whatever the age of the parents at marriage, and the


effect of the varying age at marriage must be estimated


10. But the correlation between (1) number of children of a

woman and (2) number of children of her daughter will be further

affected according as we include in the record all her available

daughters or only one. Suppose, e.g., the number of children in

the first generation is 5 (say the mother and her brothers and

sisters), and that she has three daughters with 0, 2, and 4

children respectively: are we to enter all three pairs (5, 0),

(5, 2), (5, 4) in the correlation-table, or only one pair? If the

latter, which pair? For theoretical simplicity the second process

is distinctly the best (though it still further limits the available

data). If it be adopted, some regular rule will have to be made

for the selection of the daughter whose fertility shall be entered

in the table, so as to avoid bias: the first daughter married

for whom data are given, and who fulfils the conditions as to

duration of marriage, may, for instance, be taken in every case.

(For a much more detailed discussion of the problem, and the

allied problems regarding the inheritance of fertility in the horse,

the student is referred to the original.)

11. Illustration iii.—The subject for investigation is the

relation between the bulk of a crop (wheat and other cereals,

turnips and other root crops, hay, etc.), and the weather. (Cf.

Hooker, ref. 6.)

Produce-statistics for the more important crops of Great

Britain have been issued by the Board of Agriculture since

1885: the figures are based on estimates of the yield furnished

by official local estimators all over the country. Estimates are

published for separate counties and for groups of counties

(divisions). But the climatic conditions vary so much over the

United Kingdom that it is better to deal with a smaller area,

more homogeneous from the meteorological standpoint. On the

other hand, the area should not be too small; it should be large

enough to present a representative variety of soil. The group

of eastern counties, consisting of Lincoln, Hunts, Cambridge,

Norfolk, Suffolk, Essex, Bedford, and Hertford, was selected as

fulfilling these conditions. The group includes the county with

the largest acreage of each of the ten crops investigated, with

the single exception of permanent grass.

12. The produce of a crop is dependent on the weather of

a long preceding period, and it is naturally desired to find the

influence of the weather at all successive stages during this

period, and to determine, for each crop, which period of the

year is of most critical importance as regards weather. It must

be remembered, however, that the times of both sowing and


harvest are themselves very largely dependent on the weather,

and consequently, on an average of many years, the limits of

the critical period will not be very well defined. If, therefore,

we correlate the produce of the crop (X) with the characteristics

of the weather (Y) during successive intervals of the year, it

will be as well not to make these intervals too short. It was

accordingly decided to take successive groups of 8 weeks, over-

lapping each other by 4 weeks, i.e. weeks 1-8, 5-12, etc.

Correlation coefficients were thus obtained at 4-weeks intervals,

but based on 8 weeks' weather.
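The overlapping grouping described in § 12 may be sketched as follows. The 52-week year, the function names and the sample series are assumptions made for the illustration; the text itself gives only the window length (8 weeks) and the overlap (4 weeks).

```python
def overlapping_windows(n_weeks=52, length=8, step=4):
    """Return (first_week, last_week) pairs, 1-indexed and inclusive."""
    return [(start, start + length - 1)
            for start in range(1, n_weeks - length + 2, step)]


def window_means(weekly_values, windows):
    """Average a weekly series (week 1 first) over each window."""
    return [sum(weekly_values[first - 1:last]) / (last - first + 1)
            for first, last in windows]


windows = overlapping_windows()
print(windows[:3])        # [(1, 8), (5, 12), (9, 16)]

# Hypothetical weekly series, used only to show the averaging step.
weekly = list(range(1, 53))
means = window_means(weekly, windows)
```

Each window supplies one weather value per year, so that a separate correlation with the produce can be worked out for every window in turn.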

13. It remains to be decided what characteristics of the weather

are to be taken into account. The rainfall is clearly one factor

of great importance, temperature is another, and these two will

afford quite enough labour for a first investigation. The weekly

rainfalls were averaged for eight stations within the area, and

the average taken as the first characteristic of the weather.

Temperatures were taken from the records of the same stations.

The average temperatures, however, do not give quite the sort

of information that is required: at temperatures below a certain

limit (about 42° Fahr.) there is very little growth, and the

growth increases in rapidity as the temperature rises above this

point (within limits). It was therefore decided to utilise the

figures for "accumulated temperatures above 42° Fahr.," i.e.

the total number of day-degrees above 42° during each of the

8-weekly periods, as the second characteristic of the weather;

these "accumulated temperatures," moreover, show much larger

variations than mean temperatures.
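A minimal sketch of an "accumulated temperature" in day-degrees is given below. It assumes that one mean temperature is available for each day and that days below the base contribute nothing; the official tables used in the original investigation may have been computed on a different rule, and the function name is hypothetical.

```python
def accumulated_temperature(daily_means_f, base=42.0):
    """Sum of the excesses of daily mean temperature over `base`, in day-degrees."""
    return sum(max(0.0, t - base) for t in daily_means_f)


# Four invented days: two at or below the base, two above it.
print(accumulated_temperature([40.0, 42.0, 45.0, 50.5]))   # 11.5
```

Because cold days are counted as zero rather than as negative amounts, the total responds only to warmth that is available for growth, which is the property the text asks for.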

The student should refer to the original for the full dis-

cussion as to data. The method of treating the correlations

between three variables, based on the three possible correlations

between them, is described in Chapter XII.

14. Problems of a somewhat special kind arise when dealing

with the relations between simultaneous values of two variables

which have been observed during a considerable period of time,

for the more rapid movements will often exhibit a fairly close

consilience, while the slower changes show no similarity. The two

following examples will serve as illustrations of two methods which

are generally applicable to such cases.

Illustration iv.—Fig. 41 exhibits the movements of (1) the

infantile mortality (deaths of infants under 1 year of age per 1000

births in the same year); (2) the general mortality (deaths at all

ages per 1000 living) in England and Wales during the period

1838-1904. A very cursory inspection of the figure shows that

when the infantile mortality rose from one year to the next

the general mortality also rose, as a rule; and similarly, when the


infantile mortality fell, the general mortality also fell. There

were, in fact, only five or six exceptions to this rule during the

whole period under review. The correlation between the annual

values of the two mortalities would nevertheless not be very high,

as the general mortality has been falling more or less steadily since

1875 or thereabouts, while the infantile mortality attained almost

a record value in 1899. During a long period of time the correla-

tion between annual values may, indeed, very well vanish, for the

two mortalities are affected by causes which are to a large extent

different in the two cases. To exhibit, therefore, the closeness of

the relation between infantile and general mortality, for such

causes as show marked changes between one year and the next, it

will be best to proceed by correlating the annual changes, and not

the annual values. The work would be arranged in the following

form (only sufficient years being given to exhibit the principle of

the process), and the correlation worked out between the figures of

columns 3 and 5.







[Table: (1) Year; (2) Infantile mortality per 1000 births; (3) Increase or decrease from the year before; (4) General mortality per 1000 living; (5) Increase or decrease from the year before. The entries are not legible in this copy, apart from two changes, + 1.1 and + 0.1.]






For the period to which the diagram refers, viz. 1838-1904, the

and under 1 year of age. (Cf. Exercises 7 and 8, Chap. XI., and

for method ref. 5.)
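The process of Illustration iv., correlating the annual changes rather than the annual values, may be sketched as follows. The two series are invented (a common year-to-year fluctuation imposed on opposite trends) and are not the mortality figures of fig. 41; the helper names are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Product-sum correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def first_differences(series):
    """Increase (+) or decrease (-) of each year on the year before."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Invented data: the same wobble on a rising and on a falling trend.
wobble = [0, 3, -2, 4, -1, 2, -3, 1, -2, 3]
infantile = [150 + 2.0 * t + w for t, w in enumerate(wobble)]
general = [22 - 0.5 * t + 0.2 * w for t, w in enumerate(wobble)]

r_values = pearson(infantile, general)                    # negative here
r_changes = pearson(first_differences(infantile),
                    first_differences(general))           # close to +1
print(r_values, r_changes)
```

In this constructed case the opposite trends make the correlation of the annual values negative, while the correlation of the annual changes is practically unity, which is the contrast the text describes.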








Fig. 41.—Infantile and General Mortality in England and Wales, 1838-1904.

15. Illustration v.—The two curves of fig. 42 show (1) the

marriage-rate (persons married per 1000 of the population) for

England and Wales; (2) the values of exports and imports per


Fig. 42.—Marriage-rate and Foreign Trade, England and Wales, 1855-1904.

head of the population of the United Kingdom for every year

from 1855 to 1904. Inspection of the diagram suggests a similar

relation to that of the last example, the one variable showing a

rise from one year to the next when the other rises, and a fall

when the other falls. The movement of both variables is, how-


ever, of a much more regular kind than that of mortality,

resembling a series of "waves" superposed on a steady general

trend, and it is the " waves " in the two variables—the short-period

movements, not the slower trends—which are so clearly related.

16. It is not difficult, moreover, to separate the short-period

oscillations, more or less approximately, from the slower movement.

Suppose the marriage-rate for each year replaced by the average

of an odd number of years of which it is the centre, the number

being as near as may be the same as the period of the "waves",

e.g. nine years. If these short-period averages were plotted on

the diagram instead of the rates of the individual years, we should

evidently obtain a smoother curve which would clearly exhibit

the trend and be practically free from the conspicuous waves.

The excess or defect of each annual rate above or below the

trend, if plotted separately, would therefore give the "waves"

apart from the slower changes. The figures for foreign trade

may be treated in the same way as the marriage-rate, and we

can accordingly work out the correlation between the waves or

rapid fluctuations, undisturbed by the movements of longer period,

however great they may be. The arithmetic may be carried out

in the form of the following table, and the correlation worked out

in the ordinary way between the figures of columns 4 and 7.







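[Table of seven columns: for each year the marriage-rate and the exports + imports (£ per head), each with its nine-year mean and its deviation from that mean, the deviations standing in columns 4 and 7. The entries are not legible in this copy.]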
























are related to those of trade. For the period 1861-95 the

correlation between the two oscillations (Hooker, ref. 4) is 0.86.

The method may obviously be extended by correlating the devia-

tion of the marriage-rate in any one year with the deviation of

the exports and imports of the year before, or two years before,

instead of the same year; if a sufficient number of years be

taken, an estimate may be made, by interpolation, of the time-

difference that would make the correlation a maximum if it were

possible to obtain the figures for exports and imports for periods

other than calendar years. Thus Mr Hooker finds (ref. 4) that

on an average of the years 1861-95 the correlation would be a

maximum between the marriage-rate and the foreign trade of


Fig. 43.—Fluctuations in (1) Marriage-rate and (2) Foreign Trade (Exports

+ Imports per head) in England and Wales : the Curves show Deviations

from 9-year means. Data of R. H. Hooker, Jour. Roy. Stat. Soc., 1901.

about one-third of a year earlier. The method is an extremely

useful one and is obviously applicable to any similar case. The

student should refer to the paper by Mr Hooker, cited. Reference

may also be made to ref. 9, in which several diagrams are given

similar to fig. 43, and the nature of the relationship between the

marriage-rate and such factors as trade, unemployment, etc., is

discussed, it being suggested that the relation is even more

complex than appears from the above.
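The method of Illustration v. may be sketched as below: each series is replaced by its deviations from a centred nine-year mean, and the deviations are then correlated, either for the same year or with one series taken a year earlier. The two series are artificial (a nine-year wave on a straight-line trend), not Mr Hooker's data, and the function names are assumptions.

```python
from math import pi, sin, sqrt


def pearson(xs, ys):
    """Product-sum correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def deviations_from_moving_mean(series, window=9):
    """Deviation of each value from the mean of the `window` values centred on it."""
    half = window // 2
    return [series[i] - sum(series[i - half:i + half + 1]) / window
            for i in range(half, len(series) - half)]


def lagged_correlation(x_dev, y_dev, lag=0):
    """Correlate x of year t with y of year t - lag (lag >= 0)."""
    if lag:
        return pearson(x_dev[lag:], y_dev[:-lag])
    return pearson(x_dev, y_dev)


# Artificial series: a wave of nine years' period on a steady trend.
years = range(45)
marriage = [15 + 0.05 * t + 1.0 * sin(2 * pi * t / 9) for t in years]
trade = [10 + 0.30 * t + 2.0 * sin(2 * pi * t / 9) for t in years]

m_dev = deviations_from_moving_mean(marriage)
t_dev = deviations_from_moving_mean(trade)
r_same_year = lagged_correlation(m_dev, t_dev, lag=0)     # close to +1
r_year_before = lagged_correlation(m_dev, t_dev, lag=1)   # appreciably lower
print(r_same_year, r_year_before)
```

The first and last four years are lost, since no centred nine-year mean can be formed for them. Working out the coefficient for several lags and interpolating between them gives the time-difference of maximum correlation spoken of in the text.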

18. It was briefly mentioned in § 9 of the last chapter that

the treatment of cases when the regression was non-linear was,

in general, somewhat difficult. Such cases lie strictly outside

the scope of the present volume, but it may be pointed out

that if a relation between X and Y be suggested, either by

theory or by previous experience, it may be possible to throw

that relation into the form

$$Y = A + B\,\phi(X),$$

where A and B are the only unknown constants to be determined.

If a correlation-table be then drawn up between Y and $\phi(X)$

instead of Y and X, the regression will be approximately linear.

Thus in Table V. of the last chapter, if X be the rate of

discount and Y the percentage of reserves on deposits, a

diagram of the curves of regression, or curves on which the

means of arrays lie, suggests that the relation between X and Y

is approximately of the form

X(Y-B) = A,

A and B being constants; that is,

XY=A + BX.

Or, if we make XY a new variable, say Z,

Z=A + BX.

Hence, if we draw up a new correlation-table between X and Z

the regression will probably be much more closely linear.

If the relation between the variables be of the form

$$Y = A\,B^{X},$$

we have

$$\log Y = \log A + X \log B,$$

and hence the relation between log Y and X is linear. Similarly,

if the relation be of the form

$$X^{n}\,Y = A,$$

we have

$$\log Y = \log A - n \log X,$$

and so the relation between log Y and log X is linear. By

means of such artifices for obtaining correlation-tables in

which the regression is linear, it may be possible to do a good

deal in difficult cases whilst using elementary methods only.

The advanced student should refer to ref. 12 for a different

method of treatment.
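Two of the artifices of § 18 can be tried on invented figures: a hyperbolic relation X(Y − B) = A, made linear by taking Z = XY, and an exponential relation, made linear by taking log Y. The data below are exact by construction and the helper `fit_line` is a hypothetical name; with real observations the straight line would only hold approximately.

```python
from math import exp, log


def fit_line(xs, ys):
    """Least-squares intercept and slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


# Hyperbola X(Y - B) = A with A = 12, B = 5; then Z = XY = A + BX.
xs = [1, 2, 3, 4, 6]
ys = [5 + 12 / x for x in xs]
zs = [x * y for x, y in zip(xs, ys)]
a_hat, b_hat = fit_line(xs, zs)          # about 12 and 5

# Exponential curve Y = A * B**X with A = 3, B = 2; log Y is linear in X.
ws = [3 * 2 ** x for x in xs]
log_a, log_b = fit_line(xs, [log(w) for w in ws])
print(a_hat, b_hat, exp(log_a), exp(log_b))
```

The constants are recovered from the intercept and slope of the straight line, taking antilogarithms in the second case.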

19. The only strict method of calculating the correlation

coefficient is that described in Chapter IX. from the formula

$r = \dfrac{p}{\sigma_x \sigma_y}$. Approximations to this value may, however, be

found in various ways, for the most part dependent either (1)

on the formulae for the two regressions $r\,\sigma_x/\sigma_y$ and $r\,\sigma_y/\sigma_x$, or (2) on

the formulae for the standard deviations of the arrays $\sigma_x\sqrt{1-r^2}$

and $\sigma_y\sqrt{1-r^2}$. Such approximate methods are not recommended

for ordinary use, as they will lead to different results in different

hands, but a few may be given here, as being occasionally useful

for estimating the value of the correlation in cases where the

data are not given in such a shape as to permit of the proper

calculation of the coefficient.

(1) The means of rows and columns are plotted on a diagram,

and lines fitted to the points by eye, say by shifting about

a stretched black thread until it seems to run as near as may

be to all the points. If $b_1$, $b_2$ be the slopes of these two lines

to the vertical and the horizontal respectively,

$$r = \sqrt{b_1 b_2}.$$

Hence the value of r may be estimated from any such diagram

as figs. 36-40 in Chapter IX., in the absence of the original

table. Further, if a correlation-table be not grouped by

equal intervals, it may be difficult to calculate the product

sum, but it may still be possible to plot approximately a diagram

of the two lines of regression, and so determine roughly the

value of r. Similarly, if only the means of two rows and

two columns, or of one row and one column in addition to the

means of the two variables, are known, it will still be possible

to estimate the slopes of RR and CC, and hence the correlation


(2) The means of one set of arrays only, say the rows, are

calculated, and also the two standard-deviations $\sigma_x$ and $\sigma_y$. The

means are then plotted on a diagram, using the standard-deviation

of each variable as the unit of measurement, and a line fitted by

eye. The slope of this line to the vertical is r. If the standard

deviations be not used as the units of measurement in plotting,

the slope of the line to the vertical is $r\,\sigma_x/\sigma_y$, and hence r will be

obtained by dividing the slope by the ratio of the standard-deviations.


This method, or some variation of it, is often useful as a

makeshift when the data are too incomplete to permit of the

proper calculation of the correlation, only one line of regression

and the ratio of the dispersions of the two variables being required:

the ratio of the quartile deviations, or other simple measures of

dispersion, will serve quite well for rough purposes in lieu of the

ratio of standard-deviations. As a special case, we may note that


if the two dispersions are approximately the same, the slope of

RR to the vertical is r.

Plotting the medians of arrays on a diagram with the quartile

deviations as units, and measuring the slope of the line, was the

method of determining the correlation coefficient (" Galton's

function ") used by Sir Francis Galton, to whom the introduction

of such a coefficient is due. (Refs. 2-4 of Chap. IX. p. 188.)
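Methods (1) and (2) rest on two identities that can be verified on any small set of figures: the geometric mean of the two regression slopes is r, and a single slope divided by the ratio of the standard-deviations is also r. In the sketch below the slopes are computed exactly instead of being read off a diagram by eye, and the data and names are invented for the purpose.

```python
from math import sqrt

# Invented pairs of values (x, y).
xs = [1, 2, 3, 4, 5, 6]
ys = [4, 2, 8, 6, 12, 10]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)

r = sxy / sqrt(sxx * syy)      # product-sum value
b1 = sxy / syy                 # regression of x on y: slope to the vertical
b2 = sxy / sxx                 # regression of y on x: slope to the horizontal

r_from_two_slopes = sqrt(b1 * b2)           # method (1); sign taken from the slopes
r_from_one_slope = b1 / sqrt(sxx / syy)     # method (2): slope / (sd of x / sd of y)
print(r, r_from_two_slopes, r_from_one_slope)
```

All three figures agree here because the slopes are exact; with lines fitted by eye the agreement will be only as good as the fitting.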

(3) If $s_x$ be the standard-deviation of errors of estimate like

$x - b_1 y$, we have from Chap. IX. § 11

$$s_x^2 = \sigma_x^2\,(1 - r^2),$$

and hence

$$r = \sqrt{1 - \frac{s_x^2}{\sigma_x^2}}.$$

But if the dispersions of arrays do not differ largely, and the

regression is nearly linear, the value of $s_x$ may be estimated from

the average of the standard-deviations of a few rows, and r determined,

or rather estimated, accordingly. Thus in Table III.,

Chap. IX., the standard-deviations of the ten columns headed

62.5-63.5, 63.5-64.5, etc., are

[The ten values are not legible in this copy; the equation below uses their mean, 2.359.]

The standard-deviation of the stature of all sons is 2.75: hence

$$r = \sqrt{1 - \left(\frac{2.359}{2.75}\right)^{2}} = 0.514.$$

This is the same as the value found by the product-sum method

to the second decimal place. It would be better to take an

average by weighting the square of each standard-deviation with

the number of observations in the column, but in the present

case this would only lead to a very slightly different result, viz.

$\sigma = 2.362$, $r = 0.512$.
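The arithmetic of method (3) is easily checked from the figures quoted above: an average standard-deviation of arrays of 2.359 (or 2.362 when weighted) against a total standard-deviation of 2.75. The function name is an assumption made for the sketch.

```python
from math import sqrt


def r_from_array_sd(array_sd, total_sd):
    """Estimate r from the standard-deviation of arrays and that of the whole."""
    return sqrt(1.0 - (array_sd / total_sd) ** 2)


print(round(r_from_array_sd(2.359, 2.75), 3))   # 0.514
print(round(r_from_array_sd(2.362, 2.75), 3))   # 0.512
```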

The method is clearly inapplicable to such tables as V. and VI.

of Chap. IX., in which the means of successive arrays do not lie

closely round straight lines. In such cases it would always tend

to give a value for r markedly higher than that given by the

product-sum method. That method gives a value based on the

standard-deviation round the line of regression; the method used

here gives a value dependent on the standard-deviation round a

curve which sweeps through all the means of arrays, and the second

standard-deviation is necessarily less than the first. The method

thus leads to a generalised correlation-coefficient (Pearson's

correlation-ratio) measuring the approach towards a curvilinear

line of regression of any form. (Ref. 12.)


Illustrative Applications, principally to Economic Statistics,

and Practical Methods.

(1) Yule, G. U., "On the Correlation of total Pauperism with Proportion of

Out-relief," Economic Jour., vol. v., 1895, p. 603, and vol. vi., 1896,

p. 613.

(2) Yule, G. U., "An Investigation into the Causes of Changes in Pauperism

in England chiefly during the last two Intercensal Decades," Jour.

Roy. Stat. Soc., vol. lxii., 1899, p. 249. (Cf. Illustration i.)

(3) Pearson, Karl, Alice Lee, and L. Bramley Moore, "Genetic

(reproductive) Selection, Inheritance of Fertility in Man and of

Fecundity in thoroughbred Race-horses," Phil. Trans. Roy. Soc., Series

A, vol. cxcii., 1899, p. 257. (Cf. Illustration ii.)

(4) Hooker, R. H., "On the Correlation of the Marriage-rate with Trade,"

Jour. Roy. Stat. Soc., vol. lxiv., 1901, p. 485. (The method of

Illustration v.)

(5) Hooker, R. H., "On the Correlation of Successive Observations: illus-

trated by Corn-prices," ibid., vol. lxviii., 1905, p. 696. (The method

of Illustration iv.)

(6) Hooker, R. H., "The Correlation of the Weather and the Crops," ibid.,

vol. lxx., 1907, p. 1. (Cf. Illustration iii.)

(7) Norton, J. P., Statistical Studies in the New York Money Market;

Macmillan Co., New York, 1902. (Applications to financial statistics:

an instantaneous average method, analogous to that of illustration v., is

employed, but the instantaneous average is obtained by an interpolated

logarithmic curve.)

(8) March, L., "Comparaison numérique de courbes statistiques," Jour.

de la société de statistique de Paris, 1905, pp. 255 and 306. (Uses the

methods of illustrations iv. and v., but obtaining the instantaneous

average in the latter case by graphical interpolation.)

(9) Yule, G. U., "On the Changes in the Marriage and Birth Rates in

England and Wales during the past Half Century, with an Inquiry as

to their probable Causes," Jour. Roy. Stat. Soc., vol. lxix., 1906, p. 83.

(10) Heron, D., On the Relation of Fertility in Man to Social Status,

"Drapers' Co. Research Memoirs: Studies in National Deterioration,"

I. ; Dulau & Co., London, 1906.

Theory of Correlation in the case of Non-linear Regression.

(11) Pearson, Karl, "On the Systematic Fitting of Curves to Observations

and Measurements," Biometrika, vol. i. p. 265, and vol. ii. p. 1, 1902.

(The second part is useful for the fitting of curves in cases of non-linear

regression.)

(12) Pearson, Karl, On the General Theory of Skew Correlation and Non-linear

Regression, "Drapers' Co. Research Memoirs: Biometric Series,"

II.; Dulau & Co., London, 1905. (Suggests a "correlation ratio"

which measures the approach of the points given by every pair of values

of the two variables x and y to a curve of any form, in the same way

that the correlation-coefficient measures the closeness to a straight line,

by utilising the standard-deviation of arrays.)

(13) Pearson, Karl, "On a General Theory of the Method of False

Position," Phil. Mag., June 1903. (A method of curve fitting by

the use of trial solutions.)

(14) Blakeman, J., "On Tests for Linearity of Regression in Frequency-

distributions," Biometrika, vol. iv., 1905, p. 332.

Abbreviated Methods of Calculation.

See also references to Chapter XVI.

(15) Harris, J. Arthur, "A Short Method of Calculating the Coefficient

of Correlation in the case of Integral Variates," Biometrika, vol. vii.,

1909, p. 214. (Not an approximation, but a true short method.)



1. Introductory—2. Standard-deviation of a sum or difference—3. Influence

of grouping of observations on the standard-deviation—4-5. Influence

of errors of observation on the standard-deviation—6-7. Influence of

errors of observation on the correlation-coefficient (Spearman's

theorems)—8. Mean and standard-deviation of an index—9. Correlation

between indices—10. Correlation-coefficient for a two- × two-fold

table—11. Correlation-coefficient for all possible pairs of N values of a

variable—12. Correlation due to heterogeneity of material—13. Reduction

of correlation due to mingling of uncorrelated with correlated

material—14-17. The weighted mean—18-19. Application of weighting

to the correction of death-rates, etc., for varying sex and age-distributions—20.

The weighting of forms of average other than the

arithmetic mean.

1. It has already been pointed out that a statistical measure, if

it is to be widely useful, should lend itself readily to algebraical

treatment. The arithmetic mean and the standard-deviation

derive their importance largely from the fact that they fulfil this

requirement better than any other averages or measures of dis-

persion; and the following illustrations, while giving a number of

results that are of value in one branch or another of statistical

work, suffice to show that the correlation-coefficient can be treated

with the same facility. This might indeed be expected, seeing

that the coefficient is derived, like the mean and standard-devia-

tion, by a straightforward process of summation.

2. To find the Standard-deviation of the sum or difference Z of

corresponding values of two variables $X_1$ and $X_2$.

Let $z$, $x_1$, $x_2$ denote deviations of the several variables from

their arithmetic means. Then if

$$Z = X_1 \pm X_2,$$

we have

$$z = x_1 \pm x_2.$$

Squaring both sides of the equation and summing,

$$\Sigma(z^2) = \Sigma(x_1^2) + \Sigma(x_2^2) \pm 2\,\Sigma(x_1 x_2).$$

That is, if r be the correlation between $x_1$ and $x_2$, and $\sigma$, $\sigma_1$, $\sigma_2$

the respective standard-deviations,

$$\sigma^2 = \sigma_1^2 + \sigma_2^2 \pm 2r\,\sigma_1\sigma_2 \qquad (1)$$

If $x_1$ and $x_2$ are uncorrelated, we have the important special case

$$\sigma^2 = \sigma_1^2 + \sigma_2^2 \qquad (2)$$

The student should notice that in this case the standard-deviation

of the sum of corresponding values of the two variables

is the same as the standard-deviation of their difference.

The same process will evidently give the standard-deviation of a

linear function of any number of variables. For the sum of a

series of variables $X_1, X_2, \ldots, X_n$ we must have

$$\sigma^2 = \sigma_1^2 + \sigma_2^2 + \ldots + \sigma_n^2 + 2r_{12}\,\sigma_1\sigma_2 + 2r_{13}\,\sigma_1\sigma_3 + \ldots + 2r_{23}\,\sigma_2\sigma_3 + \ldots$$

$r_{12}$ being the correlation between $X_1$ and $X_2$, $r_{23}$ the correlation

between $X_2$ and $X_3$, and so on.
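Equation (1) can be verified numerically by comparing the standard-deviation of the sums (or differences) computed directly with the value given by the formula. The two short series are invented and the function names are assumptions; standard-deviations are taken about the mean with divisor N, as in the text.

```python
from math import sqrt


def sd(values):
    """Standard-deviation about the arithmetic mean (divisor N)."""
    m = sum(values) / len(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


def pearson(xs, ys):
    """Product-sum correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    p = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return p / (sd(xs) * sd(ys))


def sd_of_sum_or_difference(sd1, sd2, r, sign=+1):
    """Equation (1): sign = +1 for a sum, -1 for a difference."""
    return sqrt(sd1 ** 2 + sd2 ** 2 + sign * 2 * r * sd1 * sd2)


x1 = [2, 4, 4, 5, 7, 8]
x2 = [1, 3, 2, 6, 5, 7]
r = pearson(x1, x2)

direct_sum = sd([a + b for a, b in zip(x1, x2)])
direct_diff = sd([a - b for a, b in zip(x1, x2)])
formula_sum = sd_of_sum_or_difference(sd(x1), sd(x2), r, +1)
formula_diff = sd_of_sum_or_difference(sd(x1), sd(x2), r, -1)
print(direct_sum, formula_sum, direct_diff, formula_diff)
```

With r = 0 the two signs give the same result, which is the special case (2): standard-deviations of 3 and 4, for example, give 5 for the sum and for the difference alike.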

3. Influence of Grouping on the Standard-deviation.—The results

of § 2 may be applied to give an approximate correction for the

effects of grouping observations on the standard-deviation.

Instead of assigning to any observation its true value X, we assign

to it the value Z corresponding to the centre of the class-interval,

thereby making an error $\delta$, where

$$Z = X + \delta.$$

Now regarding the frequency-distribution, to a first approximation,

as built up of a series of rectangles, like the histogram, the

frequency being uniformly distributed over each interval, the

correlation between X and $\delta$ is zero, for the mean value of $\delta$ is

zero for every interval. Further, to the same degree of approximation,

the square of the standard-deviation of $\delta$ is $c^2/12$, where c is the

class-interval (Chap. VIII. § 12, eqn. (10)). Hence if $\sigma$ be the

standard-deviation of the grouped values Z, and $\sigma_1$ the

standard-deviation of the true value X, we have approximately

$$\sigma_1^2 = \sigma^2 - \frac{c^2}{12} \qquad (3)$$

This is a formula of correction (Sheppard's correction, refs. 1 to 4)

that is very frequently used.
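As a small sketch of how the correction of equation (3) might be applied in practice, the function below subtracts c²/12 from the square of the grouped standard-deviation. The function name and the numerical values in the usage lines are hypothetical.

```python
import math

def sheppard_corrected_sd(sd_grouped, class_interval):
    """Equation (3): remove c^2/12 from the grouped variance, then take the root."""
    corrected_variance = sd_grouped ** 2 - class_interval ** 2 / 12
    if corrected_variance < 0:
        raise ValueError("class-interval too wide for the correction to apply")
    return math.sqrt(corrected_variance)

# Hypothetical example: grouped sd of 2.57 units with a class-interval of 1 unit.
rough = 2.57
corrected = sheppard_corrected_sd(rough, 1.0)
print(rough, corrected)
```

The corrected value is always the smaller of the two, and the difference shrinks rapidly as the class-interval becomes small compared with the standard-deviation.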


4. Influence of Errors of Observation on the Standard-deviation.

—The results may be further applied to the theory of errors of

observation. Let us suppose that, if any value of X be observed

a large number of times, the arithmetic mean of the observations

is approximately the true value, the arithmetic mean error being

zero. Then, the arithmetic mean error being zero for all values

of X, the error, say δ, is uncorrelated with X. In this case if x1 be
an observed deviation from the arithmetic mean, x the true
deviation, we have from the preceding

$$\sigma_{x_1}^2 = \sigma_x^2 + \sigma_\delta^2 \qquad (4)$$

The effect of errors of observation is, consequently, to increase the
standard-deviation above its true value. The student should
notice that the assumption made does not imply the complete
independence of X and δ: he is quite at liberty to suppose that
errors fluctuate more, for example, with large than with small
values of X, as might very probably happen. In that case the
contingency-coefficient between X and δ would not be zero,
although the correlation-coefficient might still vanish as supposed.


5. If the observations be repeated so that we have in every case

two measures x1 and x2 of the same deviation x, it is possible to
obtain the true standard-deviation σx if the further assumption
is legitimate that the errors δ1 and δ2 are uncorrelated with each
other. On this assumption

$$\Sigma(x_1 x_2) = \Sigma(x + \delta_1)(x + \delta_2) = \Sigma(x^2)$$

and accordingly

$$\sigma_x^2 = \frac{\Sigma(x_1 x_2)}{N} \qquad (5)$$

(This formula is part of Spearman's formula for the correction of

the correlation-coefficient, cf. § 7.)

6. Influence of Errors of Observation on the Correlation-coefficient.

—Let x1, y1 be the observed deviations from the arithmetic means,
x, y the true deviations, and δ, ε the errors of observation. Of
the four quantities x, y, δ, ε we will suppose x and y alone to
be correlated. On this assumption

$$\Sigma(x_1 y_1) = \Sigma(xy) \qquad (6)$$

It follows at once that

$$\frac{r_{x_1 y_1}}{r_{xy}} = \frac{\sigma_x \sigma_y}{\sigma_{x_1} \sigma_{y_1}},$$
and consequently the observed correlation is less than the true

correlation. This difference, it should be noticed, no mere increase

in the number of observations can in any way lessen.

7. Spearman's Theorems.—If, however, the observations of both

x and y be repeated, as assumed in § 5, so that we have two

measures x1 and x2, y1 and y2 of every value of x and y, the true
value of the correlation can be obtained by the use of equations
(5) and (6), on assumptions similar to those made above. For
we have

$$r_{xy}^2 = \frac{\Sigma(x_1 y_1)\,\Sigma(x_2 y_2)}{\Sigma(x_1 x_2)\,\Sigma(y_1 y_2)} = \frac{\Sigma(x_1 y_2)\,\Sigma(x_2 y_1)}{\Sigma(x_1 x_2)\,\Sigma(y_1 y_2)} \qquad (7)$$

or, in terms of the observed correlations,

$$r_{xy}^2 = \frac{r_{x_1 y_1}\, r_{x_2 y_2}}{r_{x_1 x_2}\, r_{y_1 y_2}} = \frac{r_{x_1 y_2}\, r_{x_2 y_1}}{r_{x_1 x_2}\, r_{y_1 y_2}}.$$

Or, if we use all the four possible correlations between observed
values of x and observed values of y,

$$r_{xy}^4 = \frac{r_{x_1 y_1}\, r_{x_2 y_2}\, r_{x_1 y_2}\, r_{x_2 y_1}}{(r_{x_1 x_2}\, r_{y_1 y_2})^2} \qquad (8)$$

Equation (8) is the original form in which Spearman gave his

correction formula (refs. 5, 6). It will be seen to imply the

assumption that, of the six quantities x, y, δ1, δ2, ε1, ε2, x and y

alone are correlated. The correction given by the second part

of equation (7), also suggested by Spearman, seems, on the

whole, to be safer, for it eliminates the assumption that the errors

in x and in y, in the same series of observations, are uncorrelated.

An insufficient though partial test of the correctness of the

assumptions may be made by correlating x1 − x2 with y1 − y2: this

correlation should vanish. Evidently, however, it may vanish

from symmetry without thereby implying that all the correlations

of the errors are zero.
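The two corrections discussed in § 7 can be sketched as follows, with the observed correlations as inputs. This is an illustration under stated assumptions: the function names and the figures in the usage lines are hypothetical, the formulas are taken in their correlation form (the true correlation squared equals a product of cross-correlations divided by the product of the two repeat-measure correlations), and the positive root is taken, which presumes a positive true correlation.

```python
import math

def corrected_r_cross_only(r_x1y2, r_x2y1, r_x1x2, r_y1y2):
    """Second part of equation (7): uses only correlations between
    measures of x and y taken in different series of observations."""
    return math.sqrt((r_x1y2 * r_x2y1) / (r_x1x2 * r_y1y2))

def corrected_r_all_four(r_x1y1, r_x2y2, r_x1y2, r_x2y1, r_x1x2, r_y1y2):
    """Equation (8): uses all four observed x-y correlations."""
    numerator = r_x1y1 * r_x2y2 * r_x1y2 * r_x2y1
    return (numerator / (r_x1x2 * r_y1y2) ** 2) ** 0.25

# Hypothetical figures: every observed x-y correlation is 0.40, the two
# measures of x correlate 0.80 and the two measures of y correlate 0.50.
from_eq7 = corrected_r_cross_only(0.40, 0.40, 0.80, 0.50)
from_eq8 = corrected_r_all_four(0.40, 0.40, 0.40, 0.40, 0.80, 0.50)
print(from_eq7, from_eq8)
```

When the four observed x-y correlations are equal the two forms agree; they diverge when errors made in the same series of observations are themselves correlated, which is the situation in which the text prefers the cross-only form.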

8. Mean and Standard-deviation of an Index.—(Ref. 9.) The

means and standard-deviations of non-linear functions of two or


more variables can in general only be expressed in terms of the

means and standard-deviations of the original variables to a first

approximation, on the assumption that deviations are small

compared with the mean values of the variables. Thus let it be

required to find the mean and standard-deviation of a ratio or

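index Z = X1/X2, in terms of the constants for X1 and X2. Let I
be the mean of Z, M1 and M2 the means of X1 and X2. Then

$$I = \frac{1}{N}\Sigma\left(\frac{X_1}{X_2}\right) = \frac{1}{N}\Sigma\left(\frac{M_1 + x_1}{M_2 + x_2}\right) = \frac{M_1}{N M_2}\,\Sigma\left(1 + \frac{x_1}{M_1}\right)\left(1 + \frac{x_2}{M_2}\right)^{-1}.$$

Expand the second bracket by the binomial theorem, assuming
that x2/M2 is so small that powers higher than the second can
be neglected. Then to this approximation

$$I = \frac{M_1}{N M_2}\,\Sigma\left(1 + \frac{x_1}{M_1}\right)\left(1 - \frac{x_2}{M_2} + \frac{x_2^2}{M_2^2}\right) = \frac{M_1}{M_2}\left(1 - \frac{\Sigma(x_1 x_2)}{N M_1 M_2} + \frac{\Sigma(x_2^2)}{N M_2^2}\right).$$

That is, if r be the correlation between x1 and x2, and if v1 = σ1/M1,
v2 = σ2/M2,

$$I = \frac{M_1}{M_2}\left(1 - r v_1 v_2 + v_2^2\right) \qquad (9)$$

If s be the standard-deviation of Z we have

$$s^2 + I^2 = \frac{1}{N}\Sigma\left(\frac{X_1}{X_2}\right)^2 = \frac{M_1^2}{N M_2^2}\,\Sigma\left(1 + \frac{x_1}{M_1}\right)^2\left(1 + \frac{x_2}{M_2}\right)^{-2}.$$

Expanding the second bracket again by the binomial theorem,
and neglecting terms of all orders above the second,

$$s^2 + I^2 = \frac{M_1^2}{M_2^2}\left(1 + v_1^2 - 4 r v_1 v_2 + 3 v_2^2\right),$$

or from (9)

$$s^2 = \frac{M_1^2}{M_2^2}\left(v_1^2 - 2 r v_1 v_2 + v_2^2\right) \qquad (10)$$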
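The approximations of § 8 for the mean and standard-deviation of an index X1/X2 can be written as a short function. This is a sketch only: the function name and the figures are hypothetical, and the standard-deviation is written here with the factor M1/M2 (to the order of approximation used, the mean index I could be put in its place).

```python
import math

def index_mean_and_sd(m1, m2, sd1, sd2, r):
    """Second-order approximations for the ratio X1/X2.

    m1, m2   : means of X1 and X2
    sd1, sd2 : their standard-deviations
    r        : correlation between X1 and X2
    """
    v1, v2 = sd1 / m1, sd2 / m2  # coefficients of variation, as fractions
    mean_index = (m1 / m2) * (1 - r * v1 * v2 + v2 ** 2)
    sd_index = (m1 / m2) * math.sqrt(v1 ** 2 - 2 * r * v1 * v2 + v2 ** 2)
    return mean_index, sd_index

# Hypothetical figures: means 100 and 50, sds 5 and 2, correlation 0.5.
mean_index, sd_index = index_mean_and_sd(100.0, 50.0, 5.0, 2.0, 0.5)
print(mean_index, sd_index)
```

Note that the mean of the index differs from the ratio of the means, M1/M2, by a second-order term that vanishes only when r v1 = v2.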

9. Correlation between Indices.—(Ref. 9.) The following prob-

lem affords a further illustration of the use of the same method.

Required to find approximately the correlation between two ratios
Z1 = X1/X3, Z2 = X2/X3, X1, X2 and X3 being uncorrelated.

Let the means of the two ratios or indices be I1, I2 and the
standard-deviations s1, s2; these are given approximately by (9)
and (10) of the last section. The required correlation ρ will be
given by

$$N \rho s_1 s_2 = \Sigma\left(\frac{X_1}{X_3} - I_1\right)\left(\frac{X_2}{X_3} - I_2\right) = \Sigma\left(\frac{X_1 X_2}{X_3^2}\right) - N I_1 I_2$$

$$= \frac{M_1 M_2}{M_3^2}\,\Sigma\left(1 + \frac{x_1}{M_1}\right)\left(1 + \frac{x_2}{M_2}\right)\left(1 + \frac{x_3}{M_3}\right)^{-2} - N I_1 I_2.$$

Neglecting terms of higher order than the second as before and
remembering that all correlations are zero, we have

$$\rho s_1 s_2 = \frac{M_1 M_2}{M_3^2}\left(1 + 3 v_3^2\right) - I_1 I_2 = \frac{M_1 M_2}{M_3^2}\left\{\left(1 + 3 v_3^2\right) - \left(1 + v_3^2\right)^2\right\} = \frac{M_1 M_2}{M_3^2}\, v_3^2,$$

where, in the last step, a term of the order v3⁴ has again been
neglected. Substituting from (10) for s1 and s2, we have finally

$$\rho = \frac{v_3^2}{\sqrt{(v_1^2 + v_3^2)(v_2^2 + v_3^2)}} \qquad (11)$$

This value of ρ is obviously positive, being equal to 0.5 if
v1 = v2 = v3; and hence even if X1 and X2 are independent, the
indices formed by taking their ratios to a common denominator X3 will
be correlated. The value of ρ is termed by Professor Pearson the

"spurious correlation." Thus if measurements be taken, say, on

three bones of the human skeleton, and the measurements grouped


in threes absolutely at random, there will, nevertheless, be a

positive correlation, probably approaching 0.5, between the indices

formed by the ratios of two of the measurements to the third. To

give another illustration, if two individuals both observe the same

series of magnitudes quite independently, there may be little, if

any, correlation between their absolute errors. But if the errors

be expressed as percentages of the magnitude observed, there

may be considerable correlation. It does not follow of necessity

that the correlations between indices or ratios are misleading.

If the indices are uncorrelated, there will be a similar "spurious"

correlation between the absolute measurements Z1.X3 = X1 and

Z2.X3 = X2, and the answer to the question whether the correlation

between indices or that between absolute measures is misleading

depends on the further question whether the indices or the

absolute measures are the quantities directly determined by the

causes under investigation (cf. ref. 11).

The case considered, where X1, X2 and X3 are uncorrelated, is only

a special one; for the general discussion cf. ref. 9.
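The "spurious correlation" between two indices with a common denominator can be illustrated by simulation. The sketch below assumes the approximate result of § 9 in the form ρ = v3²/√((v1² + v3²)(v2² + v3²)), where v1, v2, v3 are the coefficients of variation expressed as fractions; the function names, the seed, and the normal distributions used to generate uncorrelated measurements are all assumptions made for illustration.

```python
import math
import random
import statistics

def spurious_correlation(v1, v2, v3):
    """Approximate correlation of X1/X3 with X2/X3 for uncorrelated X1, X2, X3."""
    return v3 ** 2 / math.sqrt((v1 ** 2 + v3 ** 2) * (v2 ** 2 + v3 ** 2))

def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(7)  # arbitrary seed, for repeatability only
n = 20000
# Three independent series, each with mean 100 and sd 5 (v = 0.05).
x1 = [random.gauss(100, 5) for _ in range(n)]
x2 = [random.gauss(100, 5) for _ in range(n)]
x3 = [random.gauss(100, 5) for _ in range(n)]

z1 = [a / c for a, c in zip(x1, x3)]
z2 = [b / c for b, c in zip(x2, x3)]

predicted = spurious_correlation(0.05, 0.05, 0.05)
simulated = pearson_r(z1, z2)
print(predicted, simulated)
```

Although the three series are generated independently, the two ratios come out correlated at close to the predicted one-half, purely because they share the divisor X3.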

10. The Correlation-coefficient for a two- x two-fold Table.—The

correlation-coefficient is in general only calculated for a table with

a considerable number of rows and columns, such as those given

in Chapter IX. In some cases, however, a theoretical value is

obtainable for the coefficient, which holds good even for the limiting

case when there are only two rows and two columns. It is

consequently of some interest to obtain the value for such a

coefficient in terms of the class-frequencies.



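Using the notation of Chapters I.-IV. the table is

    Values of              Values of first variable
    second variable        X1        X2        Total
    Y1                     (AB)      (αB)      (B)
    Y2                     (Aβ)      (αβ)      (β)
    Total                  (A)       (α)       N

Taking the centre of the table as arbitrary origin and the
class-interval, as usual, as the unit, the co-ordinates of the mean are

$$\xi = \frac{(\alpha) - (A)}{2N}, \qquad \eta = \frac{(\beta) - (B)}{2N}.$$

The standard-deviations σ1, σ2 are given by

$$\sigma_1^2 = 0.25 - \xi^2 = (A)(\alpha)/N^2, \qquad \sigma_2^2 = 0.25 - \eta^2 = (B)(\beta)/N^2.$$

Finally,

$$\Sigma(xy) = \tfrac{1}{4}\{(AB) + (\alpha\beta) - (A\beta) - (\alpha B)\} - N\xi\eta.$$

Writing (AB) − (A)(B)/N = δ
(as in Chap. III. §§ 11-12) and replacing ξ, η by their values,
this reduces to

$$\Sigma(xy) = \delta,$$

whence

$$r = \frac{N\delta}{\sqrt{(A)(\alpha)(B)(\beta)}} = \frac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{\sqrt{(A)(\alpha)(B)(\beta)}} \qquad (12)$$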

This value of r might be used as a coefficient of association, but,

unlike the association-coefficient of Chap. III. § 13, which is

unity if either (AB) = (A) or (AB) = (B), r only becomes unity if
(AB) = (A) = (B). This is the only case in which both frequencies
(αB) and (Aβ) can vanish so that (AB) and (αβ) correspond to
the frequencies of two points X1 Y1, X2 Y2 on a line.
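The behaviour just described can be tried on small tables. The sketch assumes the coefficient of § 10 in its product-moment form for a fourfold table, r = {(AB)(αβ) − (Aβ)(αB)}/√((A)(α)(B)(β)); the function name, argument names, and frequencies are hypothetical.

```python
import math

def fourfold_r(ab, a_beta, alpha_b, alpha_beta):
    """Correlation-coefficient for a two- by two-fold table.

    ab, a_beta, alpha_b, alpha_beta are the four cell frequencies
    (AB), (A beta), (alpha B), (alpha beta).
    """
    a = ab + a_beta              # (A)
    alpha = alpha_b + alpha_beta  # (alpha)
    b = ab + alpha_b             # (B)
    beta = a_beta + alpha_beta   # (beta)
    numerator = ab * alpha_beta - a_beta * alpha_b
    return numerator / math.sqrt(a * alpha * b * beta)

# (AB) = (A) = (B): both off-diagonal frequencies vanish.
complete = fourfold_r(30, 0, 0, 70)
# (AB) = (A) only: one off-diagonal frequency is still present.
one_sided = fourfold_r(30, 0, 20, 50)
# Cell frequencies proportional to the marginal totals: no association.
independent = fourfold_r(10, 20, 20, 40)
print(complete, one_sided, independent)
```

Only the first table gives unity; the second, in which (AB) = (A) but (αB) is not zero, falls well short of it.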

11. The Correlation-coefficient for all possible pairs of N values

of a Variable.—In certain cases a correlation-table is formed by

combining N observations in pairs in all possible ways. If, for

example, a table is being formed to illustrate, say, the correlation

between brothers for stature, and there are three brothers in


one family with statures 5 ft. 9, 5 ft. 10, and 5 ft. 11, these are

regarded as giving the six pairs

    5 ft. 9  with 5 ft. 10        5 ft. 10 with 5 ft. 9
    5 ft. 9  with 5 ft. 11        5 ft. 11 with 5 ft. 9
    5 ft. 10 with 5 ft. 11        5 ft. 11 with 5 ft. 10

which may be entered into the table. The entire table will be

formed from the aggregate of such subsidiary tables, each due to

one family. Let it be required to find the correlation-coefficient,

however, for a single subsidiary table, due to a family with N
members, the numbers of pairs being therefore N(N − 1).

As each observed value of the variable occurs N − 1 times,
i.e. once in combination with every other value, the means and
standard-deviations of the totals of the correlation-table are the
same as for the original N observations, say M and σ. If x1, x2,
x3 . . . . be the observed deviations, the product sum may be
written

$$\begin{aligned}
& x_1 x_2 + x_1 x_3 + x_1 x_4 + \dots \\
& {} + x_2 x_1 + x_2 x_3 + x_2 x_4 + \dots \\
& {} + x_3 x_1 + x_3 x_2 + x_3 x_4 + \dots \\
& {} + \dots \\
& = x_1\{\Sigma(x) - x_1\} + x_2\{\Sigma(x) - x_2\} + x_3\{\Sigma(x) - x_3\} + \dots \\
& = -\Sigma(x^2) = -N\sigma^2,
\end{aligned}$$

since Σ(x) = 0; whence, there being N(N − 1) pairs,

$$r = \frac{-N\sigma^2}{N(N-1)\sigma^2} = -\frac{1}{N-1} \qquad (13)$$

For N = 2, 3, 4 . . . . this gives the successive values of r = −1,
−1/2, −1/3 . . . . It is clear that the first value is right, for two
values x1, x2 only determine the two points (x1, x2) and (x2, x1),
and the slope of the line joining them is negative.
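The result r = −1/(N − 1) can be verified by forming every ordered pair explicitly. In the sketch below the three statures of the example are expressed in inches; the function name is hypothetical, and `itertools.permutations` is used simply as a convenient way to list each member with every other member.

```python
import math
from itertools import permutations

def all_pairs_correlation(values):
    """Correlation-coefficient of the table formed from all N(N-1) ordered pairs."""
    pairs = list(permutations(values, 2))  # each member paired with every other
    n = len(pairs)
    mean_first = sum(a for a, _ in pairs) / n
    mean_second = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_first) * (b - mean_second) for a, b in pairs)
    sxx = sum((a - mean_first) ** 2 for a, _ in pairs)
    syy = sum((b - mean_second) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)

statures_in_inches = [69, 70, 71]  # 5 ft. 9, 5 ft. 10 and 5 ft. 11
r_three = all_pairs_correlation(statures_in_inches)
r_four = all_pairs_correlation([1, 4, 6, 10])  # arbitrary family of four
print(r_three, r_four)
```

The value depends only on the number of members and not on the particular values, as the algebra indicates.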

The student should notice that a corresponding negative

association will arise between the first and second member of the

pair if all possible pairs are formed in a mixture of A's and α's.

Looking at the association, in fact, from the standpoint of § 10,

the equation (13) still holds, even if the variables can only assume

two values, e.g. 0 and 1. This result is utilised in § 14 of Chapter


12. Correlation due to Heterogeneity of Material.—The following

theorem offers some analogy with the theorem of Chap. IV.

§ 6 for attributes.—If X and Y are uncorrelated in each of two

records, they will nevertheless exhibit some correlation when the



two records are mingled, unless the mean value of X in the

second record is identical with that in the first record, or the mean

value of Y in the second record is identical with that in the first

record, or both.

This follows almost at once, for if M1, M2 are the mean values of
X in the two records, K1, K2 the mean values of Y, N1, N2 the
numbers of observations, and M, K the means when the two
records are mingled, the product-sum of deviations about M, K is

$$N_1(M_1 - M)(K_1 - K) + N_2(M_2 - M)(K_2 - K).$$

Evidently the first term can only be zero if M = M1 or K = K1.
But the first condition gives

$$M_1 = \frac{N_1 M_1 + N_2 M_2}{N_1 + N_2},$$

that is, M1 = M2.

Similarly, the second condition gives K1 = K2. Both the first
and second terms can, therefore, only vanish if M1 = M2 or
K1 = K2. Correlation may accordingly be created by the mingling

of two records in which X and Y vary round different means.

(For a more general form of the theorem cf. ref. 17.)

13. Reduction of Correlation due to mingling of uncorrelated

with correlated pairs.—Suppose that n1 observations of x and y

give a correlation-coefficient

$$r_1 = \frac{\Sigma(xy)}{n_1 \sigma_x \sigma_y}.$$

Now let n2 pairs be added to the material, the means and
standard-deviations of x and y being the same as in the first
series of observations, but the correlation zero. The value of
Σ(xy) will then be unaltered, and we will have

$$r_2 = \frac{\Sigma(xy)}{(n_1 + n_2)\,\sigma_x \sigma_y}.$$

Whence

$$\frac{r_2}{r_1} = \frac{n_1}{n_1 + n_2} \qquad (14)$$

Suppose, for example, that a number of bones of the human

skeleton have been disinterred during some excavations, and

a correlation r2 is observed between pairs of bones presumed

to come from the same skeleton, this correlation being rather

lower than might have been expected, and subject to some

uncertainty owing to doubts as to the allocation of certain

bones. If r1 is the value that would be expected from other

records, the difference might be accounted for on the hypothesis

that, in a proportion (r1 − r2)/r1 of all the pairs, the bones do

not really belong to the same skeleton, and have been virtually

paired at random. (For a more general form of the theorem cf.

again ref. 17.)
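The arithmetic of § 13 is simple enough to set out directly. The sketch assumes the relation r2/r1 = n1/(n1 + n2) stated there; the function names and the figures are hypothetical.

```python
def diluted_correlation(r1, n1, n2):
    """Correlation after n2 uncorrelated pairs are mingled with n1 pairs of correlation r1."""
    return r1 * n1 / (n1 + n2)

def proportion_paired_at_random(r1, r2):
    """Proportion (r1 - r2)/r1 of pairs that would account for the reduction from r1 to r2."""
    return (r1 - r2) / r1

# Hypothetical figures: 300 true pairs with r = 0.60, and 100 random pairings.
r2 = diluted_correlation(0.60, 300, 100)
share = proportion_paired_at_random(0.60, r2)
print(r2, share)
```

The recovered proportion is n2/(n1 + n2), here one quarter of all the pairs.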

14. The Weighted Mean.—The arithmetic mean M of a series

of values of a variable X was defined as the quotient of the sum

of those values by their number N, or

$$M = \Sigma(X)/N.$$


If, on the other hand, we multiply each several observed

value of X by some numerical coefficient or weight W, the

quotient of the sum of such products by the sum of the weights

is defined as a weighted mean of X, and may be denoted by M';

so that

$$M' = \Sigma(W.X)/\Sigma(W).$$

The distinction between "weighted" and " unweighted" means

is, it should be noted, very often formal rather than essential,

for the "weights" may be regarded as actual, estimated, or

virtual frequencies. The weighted mean then becomes simply

an arithmetic mean, in which some new quantity is regarded

as the unit. Thus if we are given the means M1, M2, M3 . . . .
Mr of r series of observations, but do not know the number
of observations in every series, we may form a general average
by taking the arithmetic mean of all the means, viz. Σ(M)/r,
treating the series as the unit. But if we know the number
of observations in every series it will be better to form the
weighted mean Σ(N.M)/Σ(N), weighting each mean in proportion

to the number of observations in the series on which it is based.

The second form of average would be quite correctly spoken

of as a weighted mean of the means of the several series: at

the same time it is simply the arithmetic mean of all the


series pooled together, i.e. the arithmetic mean obtained by

treating the observation and not the series as the unit.

(Chap. VII. § 13.)

15. To give an arithmetical illustration, if a commodity is sold

at different prices in different markets, it will be better to form

an average price, not by taking the arithmetic mean of the several

market prices, treating the market as the unit, but by weighting

each price in proportion to the quantity sold at that price, if

known, i.e. treating the unit of quantity as the unit of frequency.

Thus if wheat has been sold in market A at an average price of

29s. 1d. per quarter, in market B at an average price of 27s. 7d.,

and in market C at an average price of 28s. 4d., we may, if no

statement is made as to the quantities sold at these prices (as very


often happens in the case of statements as to market prices), take

the arithmetic mean (28s. 4d.) as the general average. But if we

know that 23,930 qrs. were sold at A, only 26 qrs. at B, and 3933

qrs. at C, it will be better to take the weighted mean

$$\frac{(29\text{s. }1\text{d.} \times 23{,}930) + (27\text{s. }7\text{d.} \times 26) + (28\text{s. }4\text{d.} \times 3933)}{27{,}889} = 29\text{s.}$$

to the nearest penny. This is appreciably higher than the

arithmetic mean price, which is lowered by the undue importance

attached to the small markets B and C.
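The wheat-price illustration can be reworked in pence (12 pence to the shilling). The prices and quantities are those given in the text; the variable and function names are introduced here for illustration.

```python
def to_pence(shillings, pence):
    """Convert a price in shillings and pence to pence."""
    return 12 * shillings + pence

# Average price per quarter and quantity sold (quarters) in each market.
prices = {"A": to_pence(29, 1), "B": to_pence(27, 7), "C": to_pence(28, 4)}
quantities = {"A": 23930, "B": 26, "C": 3933}

# Unweighted mean: the market is the unit.
simple_mean = sum(prices.values()) / len(prices)

# Weighted mean: the quarter of wheat is the unit.
total_quantity = sum(quantities.values())
weighted_mean = sum(prices[m] * quantities[m] for m in prices) / total_quantity

print(divmod(round(simple_mean), 12))    # shillings and pence, simple mean
print(divmod(round(weighted_mean), 12))  # shillings and pence, weighted mean
```

The weighted mean lies close to the price in market A, where nearly all of the wheat was sold.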

In the case of index-numbers for exhibiting the changes in

average prices from year to year (cf. Chap. VII. § 25), it may

make a sensible difference whether we take the simple arithmetic

mean of the index-numbers for different commodities in any one


year as representing the price-level in that year, or weight the

index-numbers for the several commodities according to their

importance from some point of view; and much has been written

as to the weights to be chosen. If, for example, our standpoint

be that of some average consumer, we may take as the weight for

each commodity the sum which he spends on that commodity in

an average year, so that the frequency of each commodity is

taken as the number of shillings or pounds spent thereon instead

of simply as unity.

Rates or ratios like the birth-, death-, or marriage-rates of a

country may be regarded as weighted means. For, treating the
rate for simplicity as a fraction, and not as a rate per 1000 of the
population,

$$\text{Birth-rate of whole country} = \frac{\text{total births}}{\text{total population}} = \frac{\Sigma(\text{birth-rate in each district} \times \text{population in that district})}{\Sigma(\text{population of each district})}$$

i.e. the rate for the whole country is the mean of the rates in the

different districts, weighting each in proportion to its population.

We use the weighted and unweighted means of such rates as

illustrations in §17 below.

16. It is evident that any weighted mean will in general differ

from the unweighted mean of the same quantities, and it is

required to find an expression for this difference. If r be the

correlation between weights and variables, σw and σx the
standard-deviations, and w̄ the mean weight, we have at once

$$\Sigma(W.X) = N(M.\bar{w} + r\sigma_w\sigma_x),$$

whence

$$M' = M + r\,\frac{\sigma_w \sigma_x}{\bar{w}} \qquad (15)$$



That is to say, if the weights and variables are positively correlated,

the weighted mean is the greater; if negatively, the less. In some

cases r is very small, and then weighting makes little difference,

but in others the difference is large and important, r having a

sensible value and σxσw/w̄ a large value.
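The relation between the weighted and unweighted means, M' = M + r σw σx / w̄, is an identity and may be checked on any set of figures. The values and weights below are hypothetical; standard-deviations are taken with divisor N.

```python
import statistics

x = [2.0, 3.0, 5.0, 4.0]      # hypothetical values of the variable X
w = [10.0, 40.0, 20.0, 30.0]  # hypothetical weights W

n = len(x)
m = statistics.fmean(x)        # unweighted mean M
w_bar = statistics.fmean(w)    # mean weight
sd_x = statistics.pstdev(x)
sd_w = statistics.pstdev(w)

# Correlation between weights and variables.
r = sum((a - m) * (b - w_bar) for a, b in zip(x, w)) / (n * sd_x * sd_w)

weighted_direct = sum(a * b for a, b in zip(x, w)) / sum(w)
weighted_from_r = m + r * sd_x * sd_w / w_bar

print(m, weighted_direct, weighted_from_r)
```

Here the weights are positively correlated with the values, so the weighted mean is the greater.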

17. The difference between weighted and unweighted means

of death-rates, birth-rates or other rates on the population in

different districts is, for instance, nearly always of importance.

Thus we have the following figures for rates of pauperism

(Jour. Stat. Soc., vol. lix. (1896), p. 349).

Percentages of the Population in receipt of Relief, January 1:
England and Wales as a whole, compared with the Arithmetic Mean
of Rates in different Districts. (The tabulated figures are lost
in this copy; only the fragments 2.69 and 3.68 survive.)


In this case the weighted mean is markedly the less, and the

correlation between the population of a district and its pauperism

must therefore be negative, the larger (on the whole urban) dis-

tricts having the lower percentage in receipt of relief. On the

other hand, for the decade 1881-90 the average birth-rate for

England and Wales was 32.34 per thousand, the arithmetic

mean of the rates for the different districts 30.34 only. The

weighted mean was therefore the greater, the birth-rate being

higher in the more populous (urban) districts, in which there is

a greater proportion of young married persons.

For the year 1891 the average population of a Poor-law district

was found to be roughly 45,900 and the standard-deviation σw
56,400 (populations ranging from under 2000 to over half a
million). The standard-deviation σx of the percentages of the
population in receipt of relief was 1.24. We have therefore,
for the correlation between pauperism and population,

$$r = \frac{2.69 - M}{1.24} \times \frac{459}{564} = -0.39,$$

M here standing for the arithmetic mean of the district rates for
that year.

For the birth-rate, on the other hand, assuming that σw/w̄
is approximately the same for the decade 1881-90 as in 1891,
we have, σx being 4.08,

$$r = \frac{32.34 - 30.34}{4.08} \times \frac{459}{564} = +0.40.$$

The closeness of the numerical values of r in the two cases is,

of course, accidental.

18. The principle of weighting finds one very important

application in the treatment of such rates as death-rates, which

are largely affected by the age and sex-composition of the popula-

tion. Neglecting, for simplicity, the question of sex, suppose the

numbers of deaths are noted in a certain district for, say, the


age-groups 0-, 10-, 20-, etc., in which the fractions of the whole

population are p0, p1, p2, etc., where Σ(p) = 1. Let the
death-rates for the corresponding age-groups be d0, d1, d2, etc. Then
the ordinary or crude death-rate for the district is

$$D = \Sigma(d.p) \qquad (16)$$

For some other district taken as a basis of comparison, perhaps

the country as a whole, the death-rates and fractions of the

population in the several age-groups may be δ1 δ2 δ3 . . . , π1 π2
π3 . . . , and the crude death-rate

$$\Delta = \Sigma(\delta.\pi) \qquad (17)$$

Now D and Δ may differ either because the d's and δ's differ
or because the p's and π's differ, or both. It may happen that

really both districts are about equally healthy, and the death-

rates approximately the same for all age-classes, but, owing to a

difference of weighting, the first average may be markedly higher

than the second, or vice versa. If the first district be a rural
district and the second urban, for instance, there will be a larger
proportion of the old in the former, and it may possibly have a
higher crude death-rate than the second, in spite of lower
death-rates in every class. The comparison of crude death-rates is

therefore liable to lead to erroneous conclusions. The difficulty

may be got over by averaging the age-class death-rates in the

district not with the weights p1 p2 p3 . . . . given by its own
population, but with the weights π1 π2 π3 . . . . given by the
population of the standard district. The corrected death-rate for
the district will then be

$$D' = \Sigma(d.\pi) \qquad (18)$$

and D' and Δ will be comparable as regards age-distribution.

There is obviously no difficulty in taking sex into account as well

as age if necessary. The death-rates must be noted for each sex

separately in every age-class and averaged with a system of

weights based on the standard population. The method is also

of importance for comparing death-rates in different classes of the

population, e.g. those engaged in given occupations, as well as in

different districts, and is used for both these purposes in the

Decennial Supplements to the Reports of the Registrar General

for England and Wales (ref. 13).

19. Difficulty may arise in practical cases from the fact that

the death-rates d1 d2 d3 . . . . are not known for the districts or
classes which it is desired to compare with the standard
population, but only the crude rates D and the fractional populations
of the age-classes p1 p2 p3 . . . . The difficulty may be partially
obviated (cf. Chap. IV. § 9, pp. 51-3) by forming what may be
termed a potential or standard death-rate Δ' for the class or
district, Δ' being given by

$$\Delta' = \Sigma(\delta.p) \qquad (19)$$

i.e. the rates of the standard population averaged with the

weights of the district population. It is the crude death-rate

that there would be in the district if the rate in every age-

class were the same as in the standard population. An

approximate corrected death-rate for the district or class is

then given by

$$D'' = D \times \frac{\Delta}{\Delta'} \qquad (20)$$

D'' is not necessarily, nor generally, the same as D'. It can
only be the same if

$$\frac{\Sigma(d.\pi)}{\Sigma(d.p)} = \frac{\Sigma(\delta.\pi)}{\Sigma(\delta.p)}.$$

This will hold good if, e.g., the death-rates in the standard
population and the district stand to one another in the same
ratio in all age-classes, i.e. δ1/d1 = δ2/d2 = δ3/d3 = etc. This method

of correction is used in the Annual Summaries of the Registrar

General for England and Wales.
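Both methods of correction can be followed on a small invented example. In the sketch below every figure is hypothetical: three age-classes, a standard population, and a district with an older population whose death-rate in every age-class is nine-tenths of the standard rate. The variable names mirror the symbols of §§ 18-19 (d, p for the district; δ, π for the standard population).

```python
def weighted_rate(rates, weights):
    """Sum of rate x population-fraction over the age-classes."""
    return sum(rate * weight for rate, weight in zip(rates, weights))

# Hypothetical standard population: age-class death-rates and fractions.
delta = [0.020, 0.005, 0.060]
pi = [0.30, 0.50, 0.20]

# Hypothetical district: every rate 10 per cent. lower, but more old people.
d = [0.9 * rate for rate in delta]
p = [0.20, 0.40, 0.40]

crude_district = weighted_rate(d, p)          # D, as in (16)
crude_standard = weighted_rate(delta, pi)     # Delta, as in (17)
corrected_direct = weighted_rate(d, pi)       # D', as in (18)
potential = weighted_rate(delta, p)           # Delta', as in (19)
corrected_indirect = crude_district * crude_standard / potential  # D'', as in (20)

print(crude_district, crude_standard)
print(corrected_direct, corrected_indirect)
```

The crude rate of the district exceeds that of the standard population although each of its age-class rates is lower; either corrected rate removes the anomaly. The two corrected rates agree here only because the district rates were made proportional to the standard rates, which is the condition stated in § 19.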

Both methods of correction—that of § 18 and that of the

present section—are of great and growing importance. They

are obviously applicable to other rates besides death-rates, e.g.

birth-rates (cf. refs. 14, 15). Further, they may readily be

extended into quite different fields. Thus it has been suggested

(ref. 16) that corrected average heights or corrected average weights


of the children in different schools might be obtained on the

basis of a standard school population of given age and sex

composition, or indeed of given composition as regards hair and

eye-colour as well.

20. In §§ 14-17 we have dealt only with the theory of

the weighted arithmetic mean, but it should be noted that

any form of average can be weighted. Thus a weighted median

can be formed by finding the value of the variable such that

the sum of the weights of lesser values is equal to the sum

of the weights of greater values. A weighted mode could be

formed by finding the value of the variable for which the sum

of the weights was greatest, allowing for the smoothing of

casual fluctuations. Similarly, a weighted geometric mean could


be calculated by weighting the logarithms of every value of the
variable before taking the arithmetic mean, i.e.

$$\log G_w = \frac{\Sigma(W.\log X)}{\Sigma(W)}.$$
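Two of the weighted averages mentioned in § 20 may be sketched as follows. The function names and figures are hypothetical. For the median, the sketch adopts one common convention for discrete data (the smallest value at which the accumulated weight reaches half the total), which is an assumption and not a rule laid down in the text.

```python
import math

def weighted_geometric_mean(values, weights):
    """Antilogarithm of the weighted arithmetic mean of the logarithms."""
    total = sum(weights)
    mean_log = sum(w * math.log(x) for x, w in zip(values, weights)) / total
    return math.exp(mean_log)

def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    half = sum(weights) / 2
    running = 0.0
    for x, w in sorted(zip(values, weights)):
        running += w
        if running >= half:
            return x

print(weighted_geometric_mean([1, 100], [1, 1]))
print(weighted_median([1, 2, 3], [1, 1, 1]))
print(weighted_median([1, 2, 3], [1, 1, 10]))
```

With equal weights both functions reduce to the ordinary geometric mean and median; a heavy weight on one value draws the weighted median towards it.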


REFERENCES.

Effect of Grouping Observations.

(1) Sheppard, W. F., "On the Calculation of the Average Square, Cube, etc., of a large number of Magnitudes," Jour. Roy. Stat. Soc., vol. lx., 1897, p. 698.

(2) Sheppard, W. F., "On the Calculation of the most probable Values of Frequency Constants for Data arranged according to Equidistant Divisions of a Scale," Proc. Lond. Math. Soc., vol. xxix. p. 353. (The result given in eqn. (3) for the correction of the standard-deviation is Sheppard's result, but the mode in which he deduces this and similar corrections is quite different.)

(3) Sheppard, W. F., "The Calculation of Moments of a Frequency-distribution," Biometrika, v., 1907, p. 450.

(4) Pearson, Karl, and others [editorial], "On an Elementary Proof of Sheppard's Formulæ for correcting Raw Moments, and on other allied points," Biometrika, vol. iii., 1904, p. 308.

Effect of Errors of Observation on the Correlation-coefficient.

(5) Spearman, C., "The Proof and Measurement of Association between Two Things," Amer. Jour. of Psychology, vol. xv., 1904, p. 88. (Formula (8).)

(6) Spearman, C., "Demonstration of Formulae for True Measurement of Correlation," Amer. Jour. of Psychology, vol. xviii., 1907, p. 161. (Proof of formula (8), but on different lines to that given in the text, which was communicated to Spearman in 1908, and published by Brown and by Spearman in (7) and (8).)

(7) Spearman, C., "Correlation calculated from Faulty Data," British Jour. of Psychology, vol. iii., 1910, p. 271.

(8) Brown, W., "Some Experimental Results in Correlation," Proceedings of the Sixth International Congress of Psychology, Geneva, August 1909.

Correlations between Indices, etc.

(9) Pearson, Karl, "On a Form of Spurious Correlation which may arise when Indices are used in the Measurement of Organs," Proc. Roy. Soc., vol. lx., 1897, p. 489. (§§ 8, 9.)

(10) Galton, Francis, "Note to the Memoir by Prof. Karl Pearson on Spurious Correlation," ibid., p. 498.

(11) Yule, G. U., "On the Interpretation of Correlations between Indices or Ratios," Jour. Roy. Stat. Soc., vol. lxxiii., 1910, p. 644.

The Weighted Mean.

(12) Pearson, Karl, "Note on Reproductive Selection," Proc. Roy. Soc., vol. lix., 1896, p. 301. (Eqn. (15).)

Correction of Death-rates, etc.

(13) Tatham, John, Supplement to the Fifty-fifth Annual Report of the Registrar-General for England and Wales: Introductory Letters to Pt. I. and Pt. II. Also Supplement to Sixty-fifth Report: Introductory Letter to Pt. II. (Cd. 7769, 1895; 8503, 1897; 2619, 1908).

(14) Newsholme, A., and T. H. C. Stevenson, "The Decline of Human Fertility in the United Kingdom and other Countries, as shown by Corrected Birth-rates," Jour. Roy. Stat. Soc., vol. lxix., 1906, p. 34.

(15) Yule, G. U., "On the Changes in the Marriage- and Birth-Rates in England and Wales during the past half-century," etc., ibid., p. 88.

(16) Heron, David, "The Influence of Defective Physique and Unfavourable Home Environment on the Intelligence of School Children," Eugenics Laboratory Memoirs, viii., Dulau & Co., London, 1910.

(17) Pearson, Karl, Alice Lee, and L. Bramley-Moore, "Genetic (reproductive) Selection: Inheritance of Fertility in Man and of Fecundity in Thoroughbred Race-horses," Phil. Trans. Roy. Soc., Series A, vol. cxcii., 1899, p. 257. (A number of theorems of general application are given in the introductory part of this memoir, some of which have been utilised in §§ 12-13 of the preceding chapter.)


EXERCISES.

1. Find the values obtained for the standard-deviations in Examples ii.

(p. 139) and iii. (p. 141) of Chapter VIII. on applying Sheppard's correction

for grouping.

2. Show that if a range of six times the standard-deviation covers at least

18 class-intervals (cf. Chap. VI. § 5), Sheppard's correction will make a

difference of less than 0.5 per cent. in the rough value of the standard-deviation.


3. (Data from the decennial supplements to the Annual Reports of the

Registrar-General for England and Wales.) The following particulars are



found for 36 small registration districts in which the number of births in a

decade ranged between 1500 and 2500:

[Table: Proportion of Male Births per 1000 of all Births, with rows for each decade and for both decades. The numerical entries are not recoverable.]

It is believed, however, that a great part of the observed standard-deviation

is due to mere "fluctuations of sampling" of no real significance.

Given that the correlation between the proportions of male births in a

district in the two decades is + 0.36, estimate (1) the true standard-deviation

freed from such fluctuations of sampling; (2) the standard-deviation of fluctua-

tions of sampling, i.e. of the errors produced by such fluctuations in the observed

proportions of male births.

4. (Data from Pearson, ref. 9.) The coefficients of variation for breadth,
height, and length of certain skulls are 3.89, 3.50, and 3.24 per cent. respectively.
Find the "spurious correlation" between the breadth/length and
height/length indices, absolute measures being combined at random so that
they are uncorrelated.


5. (Data from Boas, communicated to Pearson: cf. Fawcett and Pearson,
Proc. Roy. Soc., vol. lxii. p. 413.) From short series of measurements on
American Indians the mean coefficient of correlation found between father and
son, and father and daughter, for cephalic index, is 0.14; between mother and
son, and mother and daughter, 0.33. Assuming these coefficients should be
the same if it were not for the looseness of family relations, find the proportion
of children not due to the reputed father.

6. Find the correlation between X1 + X2 and X2 + X3; X1, X2 and X3 being
uncorrelated.

7. Find the correlation between X1 and aX1 + bX2, X1 and X2 being
uncorrelated.


8. (Referring to illustration iv., § 14, Chap. X.) Use the answer to

question 7 to estimate, very roughly, the correlation that would be found

between annual movements in infantile and general mortality if the mortality

of those under and over 1 year of age were uncorrelated. Note that —

$$\text{general mortality per 1000 of population} = \text{infantile mortality per 1000 births} \times \frac{\text{births}}{\text{population}} + \text{deaths over one year per 1000 of population},$$

and treat the ratio of births to population as if it were constant at a rough

average value, say 0.033. The standard-deviation of annual movements in

infantile mortality is (loc. cit.) 9.6, and that of annual movements in mortality

other than infantile may be taken as sensibly the same as that of general

mortality, or say 1 unit.

9. If the relation

a.x1 + b.x2 + c.x3 = 0

holds for all values of x1, x2 and x3 (which are, in our usual notation,
deviations from their respective arithmetic means), find the correlations
between x1, x2 and x3 in terms of their standard-deviations and the values of
a, b and c.

10. What is the effect on a weighted mean of errors in the weights or the

quantities weighted, such errors being uncorrelated with each other, with the

weights, or with the variables—(1) if the arithmetic mean values of the errors

are zero; (2) if the arithmetic mean values of the errors are not zero?


1-2. Introductory explanation—3. Direct deduction of the formulae for two

variables—4. Special notation for the general case: generalised

regressions—5. Generalised correlations—6. Generalised deviations and

standard-deviations—7-8. Theorems concerning the generalised pro-

duct-sums—9. Direct interpretation of the generalised regressions—

10-11. Reduction of the generalised standard-deviation—12. Reduc-

tion of the generalised regression—13. Reduction of the generalised

correlation-coefficient—14. Arithmetical work: Example i. : Example

ii.—15. Geometrical representation of correlation between three

variables by means of a model—16. The coefficient of n-fold correlation

—17. Expression of regressions and correlations of lower in terms of

those of higher order—18. Limiting inequalities between the values of


correlation-coefficients necessary for consistence—19. Fallacies.

1. In Chapters IX.-XI. the theory of the correlation-coefficient for

a single pair of variables has been developed and its applications

illustrated. But in the case of statistics of attributes we found

it necessary to proceed from the theory of simple association for

a single pair of attributes to the theory of association for several

attributes, in order to be able to deal with the complex causation

characteristic of statistics; and similarly the student will find it

impossible to advance very far in the discussion of many problems

in correlation without some knowledge of the theory of multiple

correlation, or correlation between several variables. In such a

problem as that of illustration i., Chap. X., for instance, it might

be found that changes in pauperism were highly correlated

(positively) with changes in the out-relief ratio, and also with

changes in the proportion of old; and the question might arise how

far the first correlation was due merely to a tendency to give out-

relief more freely to the old than the young, i.e. to a correlation

between changes in out-relief and changes in proportion of old.

The question could not at the present stage be answered by work-

ing out the correlation-coefficient between the last pair of variables,

for we have as yet no guide as to how far a correlation between


the variables 1 and 2 can be accounted for by correlations

between 1 and 3 and 2 and 3. Again, in the case of illustration iii.,

Chap. X., a marked positive correlation might be observed between,

say, the bulk of a crop and the rainfall during a certain period, and

practically no correlation between the crop and the accumulated

temperature during the same period; and the question might arise

whether the last result might not be due merely to a negative

correlation between rain and accumulated temperature, the crop

being favourably affected by an increase of accumulated temper-

ature if other things were equal, but failing as a rule to obtain this

benefit owing to the concomitant deficiency of rain. In the prob-

lem of inheritance in a population, the corresponding problem is

of great importance, as already indicated in Chapter IV. It is


essential for the discussion of possible hypotheses to know whether

an observed correlation between, say, grandson and grandparent

can or cannot be accounted for solely by observed correlations

between grandson and parent, parent and grandparent.

2. Problems of this type, in which it is necessary to consider

simultaneously the relations between at least three variables, and

possibly more, may be treated by a simple and natural extension

of the method used in the case of two variables. The latter case

was discussed by forming linear equations between the two

variables, assigning such values to the constants as to make the

sum of the squares of the errors of estimate as low as possible:

the more complicated case may be discussed by forming linear

equations between any one of the n variables involved, taking

each in turn, and the n — 1 others, again assigning such values to

the constants as to make the sum of the squares of the errors of

estimate a minimum. If the variables are X1, X2, X3, . . . . Xn,

the equation will be of the form

X1 = a + b2.X2 + b3.X3 + .... + bn.Xn.

If in such a generalised regression or characteristic equation we

find a sensible positive value for any one coefficient such as b2,

we know that there must be a positive correlation between X1

and X2 that cannot be accounted for by mere correlations of X1

and X2 with X3, X4, or Xn, for the effects of changes in these

variables are allowed for in the remaining terms on the right.

The magnitude of b2 gives, in fact, the mean change in X1

associated with a unit change in X2 when all the remaining

variables are kept constant. The correlation between X1 and

X2 indicated by b2 may be termed a partial correlation, as

corresponding with the partial association of Chapter IV., and it

is required to deduce from the values of the coefficients b, which

may be termed partial regressions, partial coefficients of correlation

giving the correlation between X1 and X2, or other pair of

variables when the remaining variables X3 .... Xn are kept

constant, or when changes in these variables are corrected or allowed

for, so far as this may be done with a linear equation. For examples

of such generalised regression-equations the student may turn to

the illustrations worked out below (pp. 235-243).
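The idea of § 2 can be tried numerically. The following is a minimal sketch, not taken from the text: the data are made up, and the helper names `solve` and `fit_linear` are hypothetical. It fits X1 = a + b2.X2 + b3.X3 by least squares, solving the resulting linear equations directly, and compares the partial coefficient b3 with the coefficient obtained when X3 is used alone.

```python
def solve(A, y):
    """Solve the square linear system A z = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def fit_linear(X1, others):
    """Least-squares constants [a, b2, b3, ...] for X1 = a + b2.X2 + b3.X3 + ..."""
    cols = [[1.0] * len(X1)] + list(others)
    A = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    y = [sum(u * v for u, v in zip(ci, X1)) for ci in cols]
    return solve(A, y)


# Made-up data: X1 is constructed exactly as 2 + 3.X2 - 1.X3,
# while X2 and X3 are themselves positively correlated.
X2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X3 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
X1 = [2 + 3 * u - v for u, v in zip(X2, X3)]

a, b2, b3 = fit_linear(X1, [X2, X3])   # generalised regression-equation
_, total_b3 = fit_linear(X1, [X3])     # regression of X1 on X3 alone

print(a, b2, b3)     # approximately 2, 3, -1
print(total_b3)      # positive, although b3 is negative
```

In this constructed case the regression of X1 on X3 alone is positive while the partial coefficient b3 is negative, which is the kind of reversal contemplated in the crop and temperature illustration of § 1.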

3. With this explanatory introduction, we may now proceed to

the algebraic theory of such generalised regression-equations and

of multiple correlation in general. It will first, however, be as

well to revert briefly to the case of two variables. In Chapter IX.,

to obtain the greatest possible simplicity of treatment, the value

of the coefficient $r = p/\sigma_1\sigma_2$ was deduced on the special assumption

that the means of all arrays were strictly collinear, and the

meaning of the coefficient in the more general case was sub-

sequently investigated. Such a process is not conveniently

applicable when a number of variables are to be taken into

account, and the problem has to be faced directly: i.e. it is required

to determine the coefficients and constant term, if any, in a

regression-equation, so as to make the sum of the squares of the

errors of estimate a minimum. We will take this problem first

for the case of two variables, introducing a notation that can be

conveniently adapted to more. Let us take the arithmetic

means of the variables as origins of measurement, and let $x_1$, $x_2$
denote deviations of the two variables from their respective
means. Then it is required to determine $a_1$ and $b_{12}$ in the
regression-equation

$$x_1 = a_1 + b_{12}\,x_2 \qquad (a)$$

so as to make $\Sigma(x_1 - a_1 - b_{12}\,x_2)^2$, for all associated pairs of
deviations $x_1$ and $x_2$, the least possible. Put more briefly, if
we write

$$N\,s_{1.2}^2 = \Sigma(x_1 - a_1 - b_{12}\,x_2)^2 \qquad (b)$$

so that $s_{1.2}$ is the root-mean-square value of the errors of estimate
in using regression-equation (a) (cf. Chap. IX. § 14), it is required
to make $s_{1.2}$ a minimum. Suppose any value whatever to be
assigned to $b_{12}$, and a series of values of $a_1$ to be tried, $s_{1.2}$ being
calculated for each. Evidently $s_{1.2}$ would be very large for
values of $a_1$ that erred greatly either in excess or defect of the
best value (for the given value of $b_{12}$), and would continuously
decrease as this best value was approached; the value of $s_{1.2}$ could
never become negative, though possibly, but exceptionally, zero.
If therefore the values of $s_{1.2}$ were plotted to the values of $a_1$ on
a diagram, a curve would be obtained more or less like that
of fig. 44. The best value of $a_1$, for which $s_{1.2}$ attained its
minimum value, say $\sigma_{1.2}$, could be approximately estimated from
such a diagram; but it can be calculated with much more exactness
from the condition that if $a_1'$, $a_1''$ be two values close above
and below the best, the corresponding values of $s_{1.2}$ are equal. Let
$a_1$ and $(a_1 + \delta)$ be two such values. Then if

$$\Sigma(x_1 - a_1 - b_{12}\,x_2)^2 = \Sigma(x_1 - a_1 - \delta - b_{12}\,x_2)^2$$

when $\delta$ is very small, the value of $a_1$ is the best for the assigned
value of $b_{12}$. But, evidently, the equation gives, neglecting
the term in $\delta^2$,

$$\Sigma(x_1 - a_1 - b_{12}\,x_2) = 0,$$

that is, since $\Sigma(x_1) = \Sigma(x_2) = 0$,

$$a_1 = 0,$$

whatever the value of $b_{12}$. This is the direct proof of the

result that no constant term need be introduced on the right

of a regression-equation when written in terms of deviations

from the arithmetic mean, or that the two lines of regression

must pass through the mean (Chap. IX. § 10). We may

therefore omit any constant term. If, now, $b_{12}$ is to be assigned
the best value, we must have, by similar reasoning, for slightly
differing values, $b_{12}$, $b_{12} + \delta$,

$$\Sigma(x_1 - b_{12}\,x_2)^2 = \Sigma(x_1 - [b_{12} + \delta]\,x_2)^2.$$

That is, again neglecting terms in $\delta^2$,

$$\Sigma\,x_2(x_1 - b_{12}\,x_2) = 0$$

or, breaking up the sum,

$$b_{12} = \frac{\Sigma(x_1 x_2)}{\Sigma(x_2^2)} \qquad (c)$$

which is the value found by the previous indirect method of
Chapter IX. From the fact that $b_{12}$ is determined so as to
make the value of $\Sigma(x_1 - b_{12}\,x_2)^2$ the least possible, the method
of determination is sometimes called the method of least squares.
Evidently all the remaining results of Chapter IX. follow from
this, and notably we have for $\sigma_{1.2}$, the minimum value of $s_{1.2}$,
the standard-deviation of errors of estimate

$$\sigma_{1.2}^2 = \sigma_1^2(1 - r_{12}^2) \qquad (d)$$
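The two-variable results of § 3 can be checked on a few numbers. The sketch below is only an illustration on made-up data, and the function name `s_squared` is a hypothetical helper. It computes the regression from the product-sum formula, confirms that any shift of the constant term or of the regression away from the best values increases the mean square error of estimate, and compares the minimum with the familiar expression in terms of the correlation.

```python
import math

X1 = [2.0, 4.0, 5.0, 4.0, 5.0]   # made-up observations
X2 = [1.0, 2.0, 3.0, 4.0, 5.0]
N = len(X1)
m1, m2 = sum(X1) / N, sum(X2) / N
x1 = [v - m1 for v in X1]        # deviations from the arithmetic means
x2 = [v - m2 for v in X2]


def s_squared(a1, b12):
    """Mean square error of estimate when x1 is estimated by a1 + b12.x2."""
    return sum((u - a1 - b12 * v) ** 2 for u, v in zip(x1, x2)) / N


# Best regression: product-sum of x1 and x2 divided by the sum of squares of x2.
b12 = sum(u * v for u, v in zip(x1, x2)) / sum(v * v for v in x2)

var1 = sum(u * u for u in x1) / N
var2 = sum(v * v for v in x2) / N
r12 = sum(u * v for u, v in zip(x1, x2)) / (N * math.sqrt(var1 * var2))

min_s2 = s_squared(0.0, b12)     # constant term zero, best regression

print(b12)                            # 0.6 for these data
print(min_s2, var1 * (1 - r12 ** 2))  # the two agree
print(s_squared(0.1, b12) > min_s2, s_squared(0.0, b12 + 0.1) > min_s2)
```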

4. Now apply the same method to the regression-equation

for n variables. Writing the equation in terms of deviations,

it follows from reasoning precisely similar to that given above

that no constant term need be entered on the right-hand

side. For the partial regression-coefficients (the coefficients of


the x's, on the right) a special notation will be used in order

that the exact position of each coefficient may be rendered quite

definite. The first subscript affixed to the letter b (which will

always be used to denote a regression) will be the subscript of

the x on the left (the dependent variable), and the second will

be the subscript of the x to which it is attached; these may

be called the primary subscripts. After the primary subscripts,

and separated from them by a point, are placed the subscripts

of all the remaining variables on the right-hand side as secondary

subscripts. The regression-equation will therefore be written

in the form

$$x_1 = b_{12.34\ldots n}\,x_2 + b_{13.24\ldots n}\,x_3 + \ldots + b_{1n.23\ldots(n-1)}\,x_n \qquad (1)$$

The order in which the secondary subscripts are written is,

it should be noted, quite indifferent, but the order of the

primary subscripts is material; e.g. $b_{12.3\ldots n}$ and $b_{21.3\ldots n}$

denote quite distinct coefficients, x1 being the dependent variable

in the first case and x2 in the second. A coefficient with p


secondary subscripts may be termed a regression of the pth order.

The regressions $b_{12}$, $b_{21}$, $b_{13}$, $b_{31}$, etc., in the case of two variables

may be regarded as of order zero, and may be termed total as

distinct from partial regressions.

5. In the case of two variables, the correlation-coefficient $r_{12}$
may be regarded as defined by the equation

$$r_{12} = (b_{12}\,b_{21})^{\frac12}.$$

We shall generalise this equation in the form

$$r_{12.34\ldots n} = (b_{12.34\ldots n}\,b_{21.34\ldots n})^{\frac12} \qquad (2)$$

This is at present a pure definition of a new symbol, and it
remains to be shown that $r_{12.34\ldots n}$ may really be regarded as,


and possesses all the properties of, a correlation-coefficient; the

name may, however, be applied to it, pending the proof. A

correlation-coefficient with p secondary subscripts will be termed
a correlation of order p. Evidently, in the case of a correlation-coefficient,
the order in which both primary and secondary
subscripts are written is indifferent, for the right-hand side of
equation (2) is unaltered by writing 2 for 1 and 1 for 2. The
correlations $r_{12}$, $r_{13}$, etc., may be regarded as of order zero, and

spoken of as total, as distinct from partial, correlations.

6. If the regressions $b_{12.34\ldots n}$, $b_{13.24\ldots n}$, etc., be assigned the
"best" values, as determined by the method of least squares, the
difference between the actual value of $x_1$ and the value assigned
by the right-hand side of the regression-equation (1), that is, the
error of estimate, will be denoted by $x_{1.23\ldots n}$; i.e. as a definition
we have

$$x_{1.23\ldots n} = x_1 - b_{12.34\ldots n}\,x_2 - b_{13.24\ldots n}\,x_3 - \ldots - b_{1n.23\ldots(n-1)}\,x_n \qquad (3)$$

where $x_1$, $x_2$, . . . . $x_n$ are assigned any one set of observed values.
Such an error (or residual, as it is sometimes called) denoted by a
symbol with p secondary suffixes, will be termed a deviation of the
pth order. Finally, we will define a generalised standard-deviation
$\sigma_{1.23\ldots n}$ by the equation

$$N\,\sigma_{1.23\ldots n}^2 = \Sigma(x_{1.23\ldots n}^2) \qquad (4)$$

N being, as usual, the number of observations. A standard-deviation
denoted by a symbol with p secondary suffixes will be
termed a standard-deviation of the pth order, the standard-deviations
$\sigma_1$, $\sigma_2$, etc., being regarded as of order zero, the standard-deviations
$\sigma_{1.2}$, $\sigma_{2.1}$, etc. (cf. eqn. (d) of § 3), of the first order, and
so on.

7. From the reasoning of § 3 it follows that the "least-square"

values of the partial regressions $b_{12.34\ldots n}$, etc., will be given by
equations of the form

$$\Sigma(x_1 - b_{12.34\ldots n}\,x_2 - \ldots - b_{1n.23\ldots(n-1)}\,x_n)^2 = \Sigma(x_1 - [b_{12.34\ldots n} + \delta]\,x_2 - \ldots - b_{1n.23\ldots(n-1)}\,x_n)^2,$$

$\delta$ being very small. That is, neglecting the term in $\delta^2$,

$$\Sigma\,x_2(x_1 - b_{12.34\ldots n}\,x_2 - \ldots - b_{1n.23\ldots(n-1)}\,x_n) = 0,$$

or, more briefly, in terms of the notation of equation (3),

$$\Sigma(x_2\,.\,x_{1.23\ldots n}) = 0 \qquad (5)$$

There are a large number of these equations, (n − 1) for determining
the coefficients $b_{12.34\ldots n}$, etc., (n − 1) again for determining
the coefficients $b_{21.34\ldots n}$, etc., and so on: they are sometimes
termed the normal equations. If the student will follow the process
by which (5) was obtained, he will see that when the condition
is expressed that $b_{12.34\ldots n}$ shall possess the "least-square"
value, $x_2$ enters into the product-sum with $x_{1.23\ldots n}$; when the
same condition is expressed for $b_{13.24\ldots n}$, $x_3$ enters into the
product-sum, and so on. Taking each regression in turn, in fact,
every x the suffix of which is included in the secondary suffixes
of $x_{1.23\ldots n}$ enters into the product-sum. The normal equations

of the form (5) are therefore equivalent to the theorem—

The product-sum of any deviation of order zero with any deviation

of higher order is zero, provided the subscript of the former occur

among the secondary subscripts of the latter.


8. But it follows from this that

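$$\begin{aligned}
\Sigma(x_{1.34\ldots n}\,.\,x_{2.34\ldots n}) &= \Sigma\,x_{1.34\ldots n}(x_2 - b_{23.4\ldots n}\,x_3 - \ldots - b_{2n.34\ldots(n-1)}\,x_n) \\
&= \Sigma(x_{1.34\ldots n}\,.\,x_2).
\end{aligned}$$

Similarly,

$$\Sigma(x_{1.34\ldots n}\,.\,x_{2.34\ldots n}) = \Sigma(x_1\,.\,x_{2.34\ldots n}).$$

Similarly again,

$$\Sigma(x_{1.34\ldots n}\,.\,x_{2.34\ldots(n-1)}) = \Sigma(x_{1.34\ldots n}\,.\,x_2),$$

and so on. Therefore, quite generally,

$$\begin{aligned}
\Sigma(x_{1.34\ldots n}\,.\,x_{2.34\ldots n}) &= \Sigma(x_{1.34\ldots(n-1)}\,.\,x_{2.34\ldots n}) \\
&= \Sigma(x_1\,.\,x_{2.34\ldots n}) \\
&= \Sigma(x_{1.34\ldots n}\,.\,x_{2.34\ldots(n-1)}) \\
&= \Sigma(x_{1.34\ldots n}\,.\,x_2) \qquad (6)
\end{aligned}$$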

Comparing all the equal product-sums that may be obtained

in this way, we see that the product-sum of any two deviations is

unaltered by omitting any or all of the secondary subscripts of either

which are common to the two, and, conversely, the product-sum of any

deviation of order p with a deviation of order p + q, the p subscripts


being the same in each case, is unaltered by adding to the secondary

subscripts of the former any or all of the q additional subscripts of

the latter.

It follows therefore from (5) that any product-sum is zero if all

the subscripts of the one deviation occur among the secondary subscripts
of the other. As the simplest case, we may note that $x_1$ is
uncorrelated with $x_{2.1}$, and $x_2$ uncorrelated with $x_{1.2}$.

The theorems of this and of the preceding paragraph are of

fundamental importance, and should be carefully remembered.
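The theorems of §§ 7 and 8 are easily verified numerically for deviations of the first order. The following sketch is an illustration only: the data are made up and the helper names `dev`, `psum` and `residual` are hypothetical. It forms the residuals of one variable on another and checks that the stated product-sums vanish or are equal.

```python
X1 = [3.0, 7.0, 7.0, 11.0, 12.0, 14.0]   # made-up observations
X2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X3 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]


def dev(v):
    """Deviations of a series from its arithmetic mean."""
    m = sum(v) / len(v)
    return [t - m for t in v]


def psum(u, v):
    """Product-sum of two series of deviations."""
    return sum(a * b for a, b in zip(u, v))


def residual(u, v):
    """First-order deviation u.v: u less its least-squares estimate from v."""
    b = psum(u, v) / psum(v, v)
    return [a - b * c for a, c in zip(u, v)]


x1, x2, x3 = dev(X1), dev(X2), dev(X3)
x1_3 = residual(x1, x3)   # deviation x1.3
x2_3 = residual(x2, x3)   # deviation x2.3
x2_1 = residual(x2, x1)   # deviation x2.1

# Zero product-sums: the subscript of the zero-order deviation occurs
# among the secondary subscripts of the other deviation.
print(psum(x3, x1_3), psum(x1, x2_1))

# Equal product-sums: the common secondary subscript 3 may be dropped from either.
print(psum(x1_3, x2_3), psum(x1, x2_3), psum(x1_3, x2))
```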



9. We have now from §§ 7 and 8—

$$\begin{aligned}
0 &= \Sigma(x_{2.34\ldots n}\,.\,x_{1.234\ldots n}) \\
&= \Sigma\,x_{2.34\ldots n}(x_1 - b_{12.34\ldots n}\,x_2 - \text{terms in } x_3 \text{ to } x_n) \\
&= \Sigma(x_1\,.\,x_{2.34\ldots n}) - b_{12.34\ldots n}\,\Sigma(x_2\,.\,x_{2.34\ldots n}) \\
&= \Sigma(x_{1.34\ldots n}\,.\,x_{2.34\ldots n}) - b_{12.34\ldots n}\,\Sigma(x_{2.34\ldots n}^2).
\end{aligned}$$

That is

$$b_{12.34\ldots n} = \frac{\Sigma(x_{1.34\ldots n}\,.\,x_{2.34\ldots n})}{\Sigma(x_{2.34\ldots n}^2)} \qquad (7)$$

But this is the value that would have been obtained by taking a
regression-equation of the form

$$x_{1.34\ldots n} = b_{12.34\ldots n}\,x_{2.34\ldots n}$$

and determining $b_{12.34\ldots n}$ by the method of least-squares, i.e.
$b_{12.34\ldots n}$ is the regression of $x_{1.34\ldots n}$ on $x_{2.34\ldots n}$. It follows
at once from (2) that $r_{12.34\ldots n}$ is the correlation between
$x_{1.34\ldots n}$ and $x_{2.34\ldots n}$, and from (4) that we may write

$$b_{12.34\ldots n} = r_{12.34\ldots n}\,\frac{\sigma_{1.34\ldots n}}{\sigma_{2.34\ldots n}} \qquad (8)$$

an equation identical with the familiar relation $b_{12} = r_{12}\,\sigma_1/\sigma_2$,
with the secondary suffixes 34 .... n added throughout.

To illustrate the meaning of the equation by the simplest case,
if we had three variables only, $x_1$, $x_2$, and $x_3$, the value of $b_{12.3}$ or
$r_{12.3}$ could be determined (1) by finding the correlations $r_{13}$ and
$r_{23}$ and the corresponding regressions $b_{13}$ and $b_{23}$; (2) working out
the residuals $x_1 - b_{13}\,x_3$ and $x_2 - b_{23}\,x_3$ for all associated deviations;

(3) working out the correlation between the residuals associated

with the same values of x3. The method would not, however, be

a practical one, as the arithmetic would be extremely lengthy,

much more lengthy than the method given below for expressing

a correlation of order p in terms of correlations of order p — 1.
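Although lengthy by hand, the three steps just described are trivial to carry out by machine. The sketch below is an illustration on made-up data with hypothetical helper names (`dev`, `psum`, `corr`, `residual`). It obtains the partial correlation between variables 1 and 2 as the correlation between the residuals on variable 3, and compares it with the value given by the three-variable formula in terms of the total correlations.

```python
import math

X1 = [3.0, 7.0, 7.0, 11.0, 12.0, 14.0]   # made-up observations
X2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X3 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]


def dev(v):
    """Deviations of a series from its arithmetic mean."""
    m = sum(v) / len(v)
    return [t - m for t in v]


def psum(u, v):
    """Product-sum of two series of deviations."""
    return sum(a * b for a, b in zip(u, v))


def corr(u, v):
    """Correlation-coefficient between two series of deviations."""
    return psum(u, v) / math.sqrt(psum(u, u) * psum(v, v))


def residual(u, v):
    """u less its least-squares estimate from v (steps 1 and 2 of the text)."""
    b = psum(u, v) / psum(v, v)
    return [a - b * c for a, c in zip(u, v)]


x1, x2, x3 = dev(X1), dev(X2), dev(X3)
x1_3, x2_3 = residual(x1, x3), residual(x2, x3)

# Step 3: correlation and regression between the residuals.
r12_3_resid = corr(x1_3, x2_3)
b12_3_resid = psum(x1_3, x2_3) / psum(x2_3, x2_3)

# The same partial correlation from the total correlations alone.
r12, r13, r23 = corr(x1, x2), corr(x1, x3), corr(x2, x3)
r12_3_formula = (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

print(r12_3_resid, r12_3_formula)   # the two values agree
print(b12_3_resid)
```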


10. Any standard-deviation of order p may be expressed in terms
of a standard-deviation of order p − 1 and a correlation of order p − 1.
For

$$\begin{aligned}
\Sigma(x_{1.23\ldots n}^2) &= \Sigma(x_{1.23\ldots(n-1)}\,.\,x_{1.23\ldots n}) \\
&= \Sigma\,x_{1.23\ldots(n-1)}(x_1 - b_{1n.23\ldots(n-1)}\,x_n - \text{terms in } x_2 \text{ to } x_{n-1}) \\
&= \Sigma(x_{1.23\ldots(n-1)}^2) - b_{1n.23\ldots(n-1)}\,\Sigma(x_{1.23\ldots(n-1)}\,.\,x_{n.23\ldots(n-1)})
\end{aligned}$$

or, dividing through by the number of observations,

$$\begin{aligned}
\sigma_{1.23\ldots n}^2 &= \sigma_{1.23\ldots(n-1)}^2(1 - b_{1n.23\ldots(n-1)}\,b_{n1.23\ldots(n-1)}) \\
&= \sigma_{1.23\ldots(n-1)}^2(1 - r_{1n.23\ldots(n-1)}^2) \qquad (9)
\end{aligned}$$

This is again the relation of the familiar form

$$\sigma_{1.n}^2 = \sigma_1^2(1 - r_{1n}^2)$$

with the secondary suffixes 23 . . . . (n − 1) added throughout.
It is clear from (9) that $r_{1n.23\ldots(n-1)}$, like any correlation of order
zero, cannot be numerically greater than unity. It also follows
at once that if we have been estimating $x_1$ from $x_2$, $x_3$, . . . . $x_{n-1}$,
$x_n$ will not increase the accuracy of estimate unless $r_{1n.23\ldots(n-1)}$
(not $r_{1n}$) differ from zero. This condition is somewhat interesting,
as it leads to rather unexpected results. For example, if $r_{12} = +0.8$,
$r_{13} = +0.4$, $r_{23} = +0.5$, it will not be possible to estimate $x_1$ with
any greater accuracy from $x_2$ and $x_3$ than from $x_2$ alone, for the
value of $r_{13.2}$ is zero (see below, § 13).
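The numerical example just given can be confirmed in a few lines. This is a sketch only; it uses the three-variable formula for a partial correlation of the first order (established in § 13) and expresses the squared standard-deviations of the errors of estimate as fractions of the squared standard-deviation of the first variable.

```python
import math

r12, r13, r23 = 0.8, 0.4, 0.5   # total correlations quoted in the text

# Partial correlation between variables 1 and 3, variable 2 being allowed for.
r13_2 = (r13 - r12 * r23) / math.sqrt((1 - r12 ** 2) * (1 - r23 ** 2))

# Squared standard-deviation of errors of estimate, as a fraction of sigma_1 squared.
from_x2_alone = 1 - r12 ** 2                       # estimating x1 from x2 only
from_x2_and_x3 = from_x2_alone * (1 - r13_2 ** 2)  # adding x3, by equation (9)

print(r13_2)                           # zero
print(from_x2_alone, from_x2_and_x3)   # identical: x3 adds nothing
```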

11. It should be noted that, in equation (9), any other subscript
can be eliminated in the same way as subscript n from the suffix of
$\sigma_{1.23\ldots n}$, so that a standard-deviation of order p can be expressed
in p ways in terms of standard-deviations of the next lower order.
This is useful as affording an independent check on arithmetic.
Further, $\sigma_{1.23\ldots(n-1)}$ can be expressed in the same way in terms
of $\sigma_{1.23\ldots(n-2)}$, and so on, so that we must have

$$\sigma_{1.23\ldots n}^2 = \sigma_1^2(1 - r_{12}^2)(1 - r_{13.2}^2)(1 - r_{14.23}^2)\ldots(1 - r_{1n.23\ldots(n-1)}^2) \qquad (10)$$

This is an extremely convenient expression for arithmetical use;

the arithmetic can again be subjected to an absolute check by

eliminating the subscripts in a different, say the inverse, order.

Apart from the algebraic proof, it is obvious that the values must

be identical; for if we are estimating one variable from n others, it

is clearly indifferent in what order the latter are taken into account.
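The check described here, eliminating the subscripts in two different orders, can be sketched for three variables as follows. The correlations and the standard-deviation are made-up values chosen only for illustration, and `first_order` is a hypothetical helper implementing the three-variable formula for a partial correlation.

```python
import math


def first_order(rab, rac, rbc):
    """Partial correlation between a and b, c being allowed for."""
    return (rab - rac * rbc) / math.sqrt((1 - rac ** 2) * (1 - rbc ** 2))


sigma1 = 2.0                      # made-up standard-deviation of x1
r12, r13, r23 = 0.6, 0.5, 0.3     # made-up, mutually consistent total correlations

r13_2 = first_order(r13, r12, r23)
r12_3 = first_order(r12, r13, r23)

# Standard-deviation of the second order, subscripts taken in the order 2, 3 ...
sigma1_23_a = sigma1 * math.sqrt((1 - r12 ** 2) * (1 - r13_2 ** 2))
# ... and in the inverse order 3, 2.
sigma1_23_b = sigma1 * math.sqrt((1 - r13 ** 2) * (1 - r12_3 ** 2))

print(sigma1_23_a, sigma1_23_b)   # the two routes give the same value
```

Any disagreement between the two routes beyond rounding would point to an arithmetical slip in one of the partial correlations.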

12. Any regression of order p may be expressed in terms of
regressions of order p − 1. For we have

$$\begin{aligned}
\Sigma(x_{1.34\ldots n}\,.\,x_{2.34\ldots n}) &= \Sigma(x_{1.34\ldots(n-1)}\,.\,x_{2.34\ldots n}) \\
&= \Sigma\,x_{1.34\ldots(n-1)}(x_2 - b_{2n.34\ldots(n-1)}\,x_n - \text{terms in } x_3 \text{ to } x_{n-1}) \\
&= \Sigma(x_{1.34\ldots(n-1)}\,.\,x_{2.34\ldots(n-1)}) - b_{2n.34\ldots(n-1)}\,\Sigma(x_{1.34\ldots(n-1)}\,.\,x_{n.34\ldots(n-1)}).
\end{aligned}$$

Replacing $b_{2n.34\ldots(n-1)}$ by $b_{n2.34\ldots(n-1)}\,\sigma_{2.34\ldots(n-1)}^2/\sigma_{n.34\ldots(n-1)}^2$,
we have

$$b_{12.34\ldots n}\,\sigma_{2.34\ldots n}^2 = b_{12.34\ldots(n-1)}\,\sigma_{2.34\ldots(n-1)}^2 - b_{1n.34\ldots(n-1)}\,b_{n2.34\ldots(n-1)}\,\sigma_{2.34\ldots(n-1)}^2,$$

or, from (9),

$$b_{12.34\ldots n} = \frac{b_{12.34\ldots(n-1)} - b_{1n.34\ldots(n-1)}\,b_{n2.34\ldots(n-1)}}{1 - b_{2n.34\ldots(n-1)}\,b_{n2.34\ldots(n-1)}} \qquad (11)$$

The student should note that this is an expression of the form

$$b_{12.n} = \frac{b_{12} - b_{1n}\,b_{n2}}{1 - b_{2n}\,b_{n2}}$$

with the subscripts 34 . . . . (n − 1) added throughout. The
coefficient $b_{12.34\ldots n}$ may therefore be regarded as determined
from a regression-equation of the form

$$x_{1.34\ldots(n-1)} = b_{12.34\ldots n}\,x_{2.34\ldots(n-1)} + b_{1n.23\ldots(n-1)}\,x_{n.34\ldots(n-1)},$$

i.e. it is the partial regression of $x_{1.34\ldots(n-1)}$ on $x_{2.34\ldots(n-1)}$,
$x_{n.34\ldots(n-1)}$ being given. As any other secondary suffix might
have been eliminated in lieu of n, we might also regard it as
the partial regression of $x_{1.45\ldots n}$ on $x_{2.45\ldots n}$, $x_{3.45\ldots n}$ being
given, and so on.
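For three variables the reduction of a regression to regressions of lower order can be checked directly. The sketch below is an illustration on made-up data with hypothetical helper names (`dev`, `psum`, `reg`). It computes the first-order regression of variable 1 on variable 2 from the four total regressions, and again as the regression between the residuals on variable 3.

```python
X1 = [3.0, 7.0, 7.0, 11.0, 12.0, 14.0]   # made-up observations
X2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X3 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]


def dev(v):
    """Deviations of a series from its arithmetic mean."""
    m = sum(v) / len(v)
    return [t - m for t in v]


def psum(u, v):
    """Product-sum of two series of deviations."""
    return sum(a * b for a, b in zip(u, v))


def reg(u, v):
    """Total regression of u on v."""
    return psum(u, v) / psum(v, v)


x1, x2, x3 = dev(X1), dev(X2), dev(X3)
b12, b13, b23, b32 = reg(x1, x2), reg(x1, x3), reg(x2, x3), reg(x3, x2)

# Reduction formula for three variables (n = 3).
b12_3_reduced = (b12 - b13 * b32) / (1 - b23 * b32)

# Direct value: regression of the residual x1.3 on the residual x2.3.
x1_3 = [a - b13 * c for a, c in zip(x1, x3)]
x2_3 = [a - b23 * c for a, c in zip(x2, x3)]
b12_3_direct = reg(x1_3, x2_3)

print(b12_3_reduced, b12_3_direct)   # the two values agree
```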

13. From equation (11) we may readily obtain a corresponding

equation for correlations. For (11) may be written

$$b_{12.34\ldots n} = \frac{r_{12.34\ldots(n-1)} - r_{1n.34\ldots(n-1)}\,r_{2n.34\ldots(n-1)}}{1 - r_{2n.34\ldots(n-1)}^2}\cdot\frac{\sigma_{1.34\ldots(n-1)}}{\sigma_{2.34\ldots(n-1)}}.$$

Hence, writing down the corresponding expression for $b_{21.34\ldots n}$
and taking the square root

$$r_{12.34\ldots n} = \frac{r_{12.34\ldots(n-1)} - r_{1n.34\ldots(n-1)}\,r_{2n.34\ldots(n-1)}}{(1 - r_{1n.34\ldots(n-1)}^2)^{\frac12}\,(1 - r_{2n.34\ldots(n-1)}^2)^{\frac12}} \qquad (12)$$

This is, similarly, the expression for three variables

$$r_{12.n} = \frac{r_{12} - r_{1n}\,r_{2n}}{(1 - r_{1n}^2)^{\frac12}\,(1 - r_{2n}^2)^{\frac12}}$$

with the secondary subscripts added throughout, and $r_{12.34\ldots n}$
can be assigned interpretations corresponding to those of $b_{12.34\ldots n}$
above. Evidently equation (12) permits of an absolute check on
the arithmetic in the calculation of all partial coefficients of an
order higher than the first, for any one of the secondary suffixes
of $r_{12.34\ldots n}$ can be eliminated so as to obtain another equation of
the same form as (12), and the value obtained for $r_{12.34\ldots n}$ by

inserting the values of the coefficients of lower order in the


expression on the right must be the same in each case.
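Repeated application of equation (12), together with the check by eliminating a different secondary suffix, might be sketched as below. The function `partial_r` and the dictionary layout are hypothetical conveniences, and the six total correlations are made-up values chosen to be mutually consistent.

```python
import math


def partial_r(r, i, j, given=()):
    """Partial correlation between i and j, the variables in `given` being allowed for.

    `r` maps frozenset({a, b}) to the total correlation between a and b.
    The last subscript in `given` is eliminated first, by equation (12).
    """
    if not given:
        return r[frozenset((i, j))]
    *rest, n = given
    rest = tuple(rest)
    rij = partial_r(r, i, j, rest)
    rin = partial_r(r, i, n, rest)
    rjn = partial_r(r, j, n, rest)
    return (rij - rin * rjn) / math.sqrt((1 - rin ** 2) * (1 - rjn ** 2))


# Made-up total correlations for four variables.
total = {(1, 2): 0.6, (1, 3): 0.5, (1, 4): 0.4,
         (2, 3): 0.3, (2, 4): 0.2, (3, 4): 0.1}
r = {frozenset(pair): value for pair, value in total.items()}

r12_3 = partial_r(r, 1, 2, (3,))      # first order
r12_34 = partial_r(r, 1, 2, (3, 4))   # second order, suffix 4 eliminated first
r12_43 = partial_r(r, 1, 2, (4, 3))   # second order, suffix 3 eliminated first

print(r12_3)
print(r12_34, r12_43)   # equal, as the check in the text requires
```

The recursion recomputes lower-order coefficients several times. For hand work the tabular arrangement of § 14 avoids this, and a program dealing with many variables would store the intermediate values.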

14. The equations now obtained provide all that is necessary

for the arithmetical solution of problems in multiple correlation.

The best mode of procedure on the whole, having calculated all

the correlations and standard-deviations of order zero, is (1) to

calculate the correlations of higher order by successive applications

of equation (12); (2) to calculate any required standard deviations

by equation (10); (3) to calculate any required regressions by

equation (8): the use of equation (11) for calculating the

regressions of successive orders directly from each other is com-

paratively clumsy. We will give two illustrations, the first for



three and the second for four variables. The introduction of

more variables does not involve any difference in the form of the

arithmetic, but rapidly increases the amount.

Example i.—The first illustration we shall take will be a

continuation of example i. of Chapter IX., in which the correla-

tion was worked out between (1) the average earnings of agri-

cultural labourers and (2) the percentage of the population in

receipt of Poor-law relief in a group of 38 rural districts. In

Question 2 of the same chapter are given (3) the ratios of the

numbers in receipt of outdoor relief to the numbers relieved in the

workhouse, in the same districts. Required to work out the partial

correlations, regressions, etc., for these three variables.


Using as our notation X1 = average earnings, X2 = percentage of

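population in receipt of relief, X3 = out-relief ratio, the first constants
determined are

$$\begin{array}{lll}
M_1 = 15.9 \text{ shillings} & \sigma_1 = 1.71 \text{ shillings} & r_{12} = -0.66 \\
M_2 = 3.67 \text{ per cent.} & \sigma_2 = 1.29 \text{ per cent.} & r_{13} = -0.13 \\
M_3 = 5.79 & \sigma_3 = 3.09 & r_{23} = +0.60
\end{array}$$

To obtain the partial correlations, equation (12) is used direct in
its simplest form

$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{(1 - r_{13}^2)^{\frac12}\,(1 - r_{23}^2)^{\frac12}}.$$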


The work is best done systematically and the results collected

in tabular form, especially if logarithms are used, as many of the

logarithms occur repeatedly. First it will be noted that the

logarithms of $(1 - r^2)^{\frac12}$ occur in all the denominators; these had,

accordingly, better be worked out at once and tabulated (col. 2 of

the table below). In col. 3 the product term of the numerator of



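[Table: working of the correlations of the first order. Legible column headings: log √(1 − r²); Correlation of First Order. Legible entries: r23 = +0.60; product terms −0.0780, −0.3960. The remaining entries are not recoverable.]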







of the denominators from those of the numerators we have the

logarithms of the correlation of the first-order. It is also as well

to calculate at once, for reference in the calculation of standard-deviations

of the second-order, the values of log $\sqrt{1 - r^2}$ for the

first-order coefficients (col. 9).

Having obtained the correlations we can now proceed to the

regressions. If we wish to find all the regression-equations, we

shall have six regressions to calculate from equations of the form

$$b_{12.3} = r_{12.3}\,\sigma_{1.3}/\sigma_{2.3}.$$

These will involve all the six standard-deviations of the first

order $\sigma_{1.2}$, $\sigma_{1.3}$, $\sigma_{2.1}$, $\sigma_{2.3}$, etc. But the standard-deviations of

the first-order are not in themselves of much interest, and the

standard-deviations of the second-order are so, as being the


standard-errors or root-mean-square errors of estimate made in

using the regression-equations of the second-order. We may

save needless arithmetic, therefore, by replacing the standard-

deviations of the first-order by those of the second, omitting the

former entirely, and transforming the above equation for $b_{12.3}$

to the form

$$b_{12.3} = r_{12.3} \cdot \frac{\sigma_{1.23}}{\sigma_{2.13}}.$$

This transformation is a useful one and should be noted by the

student. The values of each $\sigma$ may be calculated twice

independently by the formulae of the form

$$\sigma_{1.23} = \sigma_1 (1 - r_{12}^2)^{1/2} (1 - r_{13.2}^2)^{1/2}$$

so as to check the arithmetic; the work is rapidly done if the

values of log $\sqrt{1 - r^2}$ have been tabulated. The values found are

$\log \sigma_{1.23} = 0.06146$, $\sigma_{1.23} = 1.15$

$\log \sigma_{2.13} = \bar{1}.84584$, $\sigma_{2.13} = 0.70$

$\log \sigma_{3.12} = 0.34571$, $\sigma_{3.12} = 2.22$

From these and the logarithms of the r's we have

$\log b_{12.3} = 0.08116$, $b_{12.3} = -1.21$; $\log b_{13.2} = \bar{1}.36174$, $b_{13.2} = +0.23$

$\log b_{21.3} = \bar{1}.64993$, $b_{21.3} = -0.45$; $\log b_{23.1} = \bar{1}.33917$, $b_{23.1} = +0.22$

$\log b_{31.2} = \bar{1}.93024$, $b_{31.2} = +0.85$; $\log b_{32.1} = 0.33891$, $b_{32.1} = +2.18$

That is, the regression-equations are

(1) $x_1 = -1.21 x_2 + 0.23 x_3$

(2) $x_2 = -0.45 x_1 + 0.22 x_3$

(3) $x_3 = +0.85 x_1 + 2.18 x_2$

or, transferring the origins to zero,

(1) Earnings $X_1 = +18.4 - 1.21 X_2 + 0.23 X_3$

(2) Pauperism $X_2 = +9.55 - 0.45 X_1 + 0.22 X_3$

(3) Out-relief ratio $X_3 = -15.7 + 0.85 X_1 + 2.18 X_2$

The units are throughout one shilling for the earnings X1, 1

per cent. for the pauperism X2, and 1 for the out-relief ratio X3.
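The arithmetic of Example i. can be checked without logarithm tables. The following is a minimal sketch in Python, which is a modern convenience and no part of the original text; the helper name `partial_r` and the variable names are illustrative assumptions. The inputs are the rounded constants quoted above, so the last digit may differ slightly from values worked by five-figure logarithms.

```python
import math

# Zero-order constants of Example i. (1 = earnings, 2 = pauperism, 3 = out-relief ratio).
sigma1, sigma2 = 1.71, 1.29
r12, r13, r23 = -0.66, -0.13, 0.60


def partial_r(r_ab, r_ac, r_bc):
    """First-order partial correlation r_ab.c from three zero-order coefficients."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))


r12_3 = partial_r(r12, r13, r23)
r13_2 = partial_r(r13, r12, r23)
r23_1 = partial_r(r23, r12, r13)

# Second-order standard-deviations, e.g. sigma_1.23 = sigma_1 * sqrt(1 - r12^2) * sqrt(1 - r13.2^2).
s1_23 = sigma1 * math.sqrt(1 - r12**2) * math.sqrt(1 - r13_2**2)
s2_13 = sigma2 * math.sqrt(1 - r12**2) * math.sqrt(1 - r23_1**2)

# Regressions in the transformed form b_12.3 = r_12.3 * sigma_1.23 / sigma_2.13.
b12_3 = r12_3 * s1_23 / s2_13
b21_3 = r12_3 * s2_13 / s1_23

print(round(r12_3, 2), round(r13_2, 2), round(r23_1, 2))
print(round(s1_23, 2), round(s2_13, 2), round(b12_3, 2), round(b21_3, 2))
```

The same pattern, with the suffixes permuted, gives the remaining four regressions.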

The first and second regression-equations are those of most

practical importance. The argument has been advanced that

the giving of out-relief tends to lower earnings, and the total

coefficient ($r_{13} = -0.13$) between earnings (X1) and out-relief

(X3), though very small (cf. Chap. IX. § 17), does not seem

inconsistent with such a hypothesis. The partial correlation

coefficient ($r_{13.2} = +0.44$) and the regression-equation (1), however,

indicate that in unions with a given percentage of the

population in receipt of relief (X2) the earnings are highest where

the proportion of out-relief is highest; and this is, in so far,

against the hypothesis of a tendency to lower wages. It remains

possible, of course, that out-relief may adversely affect the

possibility of earning, e.g. by limiting the employment of the old. As

regards pauperism, the argument might be advanced that the

observed correlation ($r_{23} = +0.60$) between pauperism and

out-relief was in part due to the negative correlation ($r_{13} = -0.13$)

between earnings and out-relief. Such a hypothesis would have

little to support it in view of the smallness and doubtful

significance of $r_{13}$, and is definitely contradicted by the positive partial

correlation $r_{23.1} = +0.69$, and the second regression-equation. The

third regression-equation shows that the proportion of out-relief is

on the whole highest where earnings are highest and pauperism

greatest. It should be noticed, however, that a negative ratio is

clearly impossible, and consequently the relation cannot be strictly

linear; but the third equation gives possible (positive) average

ratios for all the combinations of pauperism and earnings that

actually occur.

Example ii.—(Four variables.) As an illustration of the form

of the work in the case of four variables, we will take a portion

of the data from another investigation into the causation of

pauperism, viz. that described in the first illustration of Chapter X.,

to which the student should refer for details. The variables are

the ratios of the values in 1891 to the values in 1881 (taken as

100) of—

1. The percentage of the population in receipt of relief,

2. The ratio of the numbers given outdoor relief to the numbers

relieved in the workhouse,

3. The percentage of the population over 65 years of age,



4. The population itself,

in the metropolitan group of 32 unions, and the fundamental

constants (means, standard-deviations and correlations) are as follows:

[Table I. Means, standard-deviations and correlations of the four variables. Surviving entries (correlations of order zero): +0.52, +0.41, +0.49, +0.23, +0.25.]


It is seen that the average changes are not great; the per-

centages of the population in receipt of relief have increased on

an average by 4.7 per cent., the out-relief ratio has dropped by

9.4 per cent., and the percentage of old has increased by 7.7

per cent., at the same time as the population of the unions has

risen on the average by 11.3 per cent. At the same time the

standard-deviations of the first, second, and fourth variables are

very large. As a matter of fact, while in one union the

pauperism decreased by nearly 50 per cent. and in others by

20 per cent., in some there were increases of 60, 80, and 90

per cent.; similarly, in the case of the out-relief, in several unions



Table II. (surviving entries only; the column of logarithms, log $\sqrt{1 - r^2}$, is lost)

Correlation      Product Term     Numerator     Correlation
(Zero Order)     of Numerator                   (First Order)
r12 = +0.52      +0.2009          +0.3191       r12.3 = +0.4013
r13 = +0.41      +0.2548          +0.1552       r13.2 = +0.2084
r23 = +0.49      +0.2132          +0.2768       r23.1 = +0.3553

Further surviving entries from the groups involving variable 4: +0.52, +0.23, +0.1196, +0.5522, +0.3028, +0.5731, +0.3580; +0.41, +0.25, +0.1025, +0.4450.

correlations of the first order (Table II. col. 4) are obtained.

The first-order coefficients are then regrouped in sets of three,

with the same secondary suffix (Table III. col. 1), and these

are treated precisely in the same way as the coefficients of order

zero. In this way, it will be seen, the value of each coefficient

of the second order is arrived at in two ways independently, and

so the arithmetic is checked: $r_{12.34}$ occurs in the first and fourth

lines, for instance, $r_{13.24}$ in the second and seventh, and so on.

Of course slight differences may occur in the last digit if a

sufficient number of digits is not retained, and for this reason the

intermediate work should be carried to a greater degree of

accuracy than is necessary in the final result; thus four places

of decimals were retained throughout in the intermediate work of

this example, and three in the final result. If he carries out an

independent calculation, the student may differ slightly from

the logarithms given in this and the following work, if more or

fewer figures are retained.
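The check described above, that each coefficient of the second order is reached by two independent routes, can be sketched as follows. The function names `zero` and `partial` and the dictionary layout are hypothetical, chosen only for illustration; the inputs are the zero-order correlations of Example ii. as they survive in the tables and the text.

```python
import math

# Zero-order correlations of Example ii., keyed by the pair of variable numbers.
r = {(1, 2): 0.52, (1, 3): 0.41, (1, 4): -0.14,
     (2, 3): 0.49, (2, 4): 0.23, (3, 4): 0.25}


def zero(a, b):
    """Zero-order correlation between variables a and b (order of suffixes immaterial)."""
    return r[(min(a, b), max(a, b))]


def partial(a, b, given):
    """Partial correlation of a and b, eliminating the variables listed in `given`.

    The last variable in `given` is eliminated at the final step, so the order
    of `given` selects one of the independent routes to the same coefficient.
    """
    if not given:
        return zero(a, b)
    *rest, c = given
    r_ab = partial(a, b, rest)
    r_ac = partial(a, c, rest)
    r_bc = partial(b, c, rest)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))


route_a = partial(1, 2, [3, 4])  # built from first-order coefficients with secondary suffix 3
route_b = partial(1, 2, [4, 3])  # built from first-order coefficients with secondary suffix 4
print(round(route_a, 4), round(route_b, 4))
print(round(partial(1, 4, [2, 3]), 2))
```

With exact floating-point arithmetic the two routes agree to many places; with four-figure hand work they may differ in the last digit, as the text warns.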

Having obtained the correlations, the regressions can be calculated

from the third-order standard-deviations by equations of the

form (as in the last example),

$$b_{12.34} = r_{12.34} \cdot \frac{\sigma_{1.234}}{\sigma_{2.134}},$$

so the standard-deviations of lower orders need not be evaluated.

Using equations of the form

$$\sigma_{1.234} = \sigma_1 (1 - r_{12}^2)^{1/2} (1 - r_{13.2}^2)^{1/2} (1 - r_{14.23}^2)^{1/2} = \sigma_1 (1 - r_{14}^2)^{1/2} (1 - r_{13.4}^2)^{1/2} (1 - r_{12.34}^2)^{1/2}$$

we find

$\log \sigma_{1.234} = 1.35740$, $\sigma_{1.234} = 22.8$

$\log \sigma_{2.134} = 1.50597$, $\sigma_{2.134} = 32.1$

$\log \sigma_{3.124} = 0.65773$, $\sigma_{3.124} = 4.55$

$\log \sigma_{4.123} = 1.32914$, $\sigma_{4.123} = 21.3$

All the twelve regressions of the second order can be readily

calculated, given these standard deviations and the correlations,

but we may confine ourselves to the equation giving the changes

in pauperism (X1) in terms of other variables as the most

important. It will be found to be

$$x_1 = 0.325 x_2 + 1.383 x_3 - 0.383 x_4,$$

or, transferring the origins and expressing the equation in terms of

the ratios,

$$X_1 = -31.1 + 0.325 X_2 + 1.383 X_3 - 0.383 X_4,$$

or, again, in terms of percentage-changes (ratio - 100):

Percentage change in pauperism

= +1.4 per cent.

+ 0.325 times the change in out-relief ratio

+ 1.383 times the change in proportion of old

- 0.383 times the change in population.

These results render the interpretation of the total coefficients,

which might be equally consistent with several hypotheses, more

clear and definite. The questions would arise, for instance,

whether the correlation of changes in pauperism with changes in

out-relief might not be due to correlation of the latter with the

other factors introduced, and whether the negative correlation with

changes in population might not be due solely to the correlation

of the latter with changes in the proportion of old. As a matter

of fact, the partial correlations of changes in pauperism with

changes in out-relief and in proportion of old are slightly less than

the total correlations, but the partial correlation with changes in

population is numerically greater, the figures being

$r_{12} = +0.52$

$r_{13} = +0.41$

$r_{14} = -0.14$, $r_{14.23} = -0.36$

So far, then, as we have taken the factors of the case into

account, there appears to be a true correlation between changes

in pauperism and changes in out-relief, proportion of old, and

population—the latter serving, of course, as some index to

changes in general prosperity. The relative influences of the

three factors are indicated by the regression-equation above.

[For the full discussion of the case cf. Jour. Roy. Stat. Soc.,

vol. lxii., 1899.]
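The two transformations of the regression-equation in Example ii., first to the original ratios and then to percentage-changes, amount to a little arithmetic on the means. The sketch below assumes the means implied by the average changes quoted in the text (104.7, 90.6, 107.7 and 111.3, with 1881 = 100); the names `means`, `b`, `a` and `c` are illustrative only.

```python
# Means of the four ratios (1881 = 100), as implied by the average changes quoted in the text.
means = {1: 104.7, 2: 90.6, 3: 107.7, 4: 111.3}
# Regression coefficients of x1 on x2, x3, x4 quoted in the text.
b = {2: 0.325, 3: 1.383, 4: -0.383}

# Transferring the origins: X1 = a + sum(b_k * X_k), with a = M1 - sum(b_k * M_k).
a = means[1] - sum(b[k] * means[k] for k in b)

# In terms of percentage-changes (ratio - 100) the constant becomes
# a + 100 * sum(b_k) - 100.
c = a + 100 * sum(b.values()) - 100

print(round(a, 1), round(c, 1))
```

The constants so obtained can be compared with the -31.1 and +1.4 of the two equations above.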

15. The correlation between pauperism and labourers' earnings

exhibited by the figures of Example i. was illustrated by a diagram

(fig. 40, p. 180), in which scales of "pauperism" and "earnings"

were taken along two axes at right angles, and every observed

pair of values was entered by marking the corresponding point

with a small circle: the diagram was completed by drawing in

the lines of regression. In precisely the same way the correlation

between three variables may be represented by a model showing the

distribution of points in space; for any set of observed values X1,

X2, X3 may be regarded as determining a point in space, just as

any pair of values X1 and X2 may be regarded as determining a

point in a plane. Fig. 45 is drawn from such a model, constructed

from the data of Example i. Four pieces of wood are fixed together



like the bottom and three sides of a box. Supposing the open

side to face the observer, a scale of pauperism is drawn vertically

upwards along the left-hand angle at the back of the "box," the

[Fig. 45. Model illustrating the Correlation between three Variables: (1) Pauperism (percentage of the population in receipt of Poor-law relief); (2) Out-relief ratio (numbers given relief in their homes to one in the workhouse); (3) Average Weekly Earnings of agricultural labourers (data pp. 178 and 189). A, front view; B, view of model tilted till the plane of regression for pauperism on the two remaining variables is seen as a straight line.]

scale starting from zero, as very small values of pauperism occur:

a scale of out-relief ratio is taken along the angle between the

back and bottom of the box, starting from zero at the left: finally,

the scale of earnings is drawn out towards the observer along the

angle between the left-hand side and the bottom, but as earnings

lower than 12s. do not occur, the scale may start from 12s. at the

corner. Suitable scales are: pauperism, 1 in. = 1 per cent.;

out-relief ratio, 1 in. = 1 unit; earnings, 1 in. = 1s.; and the inside

measures of the model may then be 17 in. x 10 in. x 8 in. high,

the dimensions of the model constructed. Given these three

scales, any set of observed values determine a point within the

"box." The earnings and out-relief ratio for some one union are

noted first, and the corresponding point marked on the baseboard;

a steel wire is then inserted vertically in the base at this point

and cut off at the height corresponding, on the scale chosen, to

the pauperism in the same union, being finally capped with a

small ball or knob to mark the "point" clearly. The model

shows very well the general tendency of the pauperism to be the

higher the lower the wages and the higher the out-relief, for the

highest points lie towards the back and right-hand side of the

model. If some representation of all three equations of regression

were to be inserted in the model, the result would be rather

confusing; so the most important equation, viz. the second, giving

the average rate of pauperism in terms of the other variables, may

be chosen. This equation represents a plane: the lines in which

it cuts the right- and left-hand sides of the "box" should be

marked, holes drilled at equal intervals on these lines on the

opposite sides of the box (the holes facing each other), and threads

stretched through these holes, thus outlining the plane as shown

in the figure. In the actual model the correlation-diagrams (like

fig. 40) corresponding to the three pairs of variables were drawn

on the back, sides and base: they represent, of course, the

elevations and plan of the points.

The student possessing some skill in handicraft would find it

worth while to make such a model for some case of interest to

himself, and to study on it thoroughly the nature of the plane of

regression, and the relations of the partial and total correlations.

16. If we write

$$\sigma_{1.23 \ldots n}^2 = \sigma_1^2 \left(1 - R_{1(23 \ldots n)}^2\right), \qquad (13)$$

it may be shown that $R_{1(23 \ldots n)}$ is the correlation between $x_1$ and the expression on the right-hand side of the regression-equation, say $e_{1.23 \ldots n}$, where

$$e_{1.23 \ldots n} = b_{12.34 \ldots n} x_2 + b_{13.24 \ldots n} x_3 + \ldots + b_{1n.23 \ldots (n-1)} x_n. \qquad (14)$$

For we have

$$\Sigma(x_1 e_{1.23 \ldots n}) = \Sigma\, x_1 (x_1 - x_{1.23 \ldots n}) = N(\sigma_1^2 - \sigma_{1.23 \ldots n}^2),$$

and also

$$\Sigma(e_{1.23 \ldots n}^2) = \Sigma (x_1 - x_{1.23 \ldots n})^2 = N(\sigma_1^2 - \sigma_{1.23 \ldots n}^2),$$

whence the correlation between $x_1$ and $e_{1.23 \ldots n}$ is

$$\frac{(\sigma_1^2 - \sigma_{1.23 \ldots n}^2)^{1/2}}{\sigma_1},$$

i.e. the value of $R_{1(23 \ldots n)}$ given by (13). The value of R is accordingly a useful datum as indicating how closely $x_1$ can be expressed in terms of a linear function of $x_2$, $x_3$ . . . . $x_n$, and the values of the regressions may be regarded as determined by the condition that R shall be a maximum. Its value is essentially positive as the product-sum $\Sigma(x_1 e_{1.23 \ldots n})$ is positive.

R may be termed a coefficient of (n-1)-fold (or double, triple, etc.) correlation; for n variables there are n such correlations, but in the limiting case of two variables the two are identical. The value may be readily calculated, either from $\sigma_{1.23 \ldots n}$ and $\sigma_1$, or directly from the equation

$$1 - R_{1(23 \ldots n)}^2 = (1 - r_{12}^2)(1 - r_{13.2}^2)(1 - r_{14.23}^2) \ldots (1 - r_{1n.23 \ldots (n-1)}^2). \qquad (15)$$

It is obvious from this equation that since every bracket on the right is not greater than unity,

$$1 - R_{1(23 \ldots n)}^2 \le 1 - r_{12}^2.$$

Hence $R_{1(23 \ldots n)}$ cannot be numerically less than $r_{12}$. For the same reason, rewriting (15) in every possible form, $R_{1(23 \ldots n)}$ cannot be numerically less than $r_{12}$, $r_{13}$, .... $r_{1n}$, i.e. any one of the possible constituent coefficients of order zero. Further, for similar reasons, $R_{1(23 \ldots n)}$ cannot be numerically less than any possible constituent coefficient of any higher order. That is to say, $R_{1(23 \ldots n)}$ is not numerically less than the greatest of all the possible constituent coefficients, and is usually, though not always, markedly greater. Thus in Example i., $R_{2(13)}$ (the coefficient of double correlation between pauperism on the one hand, out-relief and labourers' earnings on the other) is 0.839, and the numerically greatest of the possible constituent coefficients is $r_{12.3} = -0.73$. Again, in Example ii., $R_{1(234)}$ is 0.626, and the numerically greatest of the possible constituent coefficients is $r_{12.4} = +0.573$.

The student should notice that R is necessarily positive. Further, even if all the variables X1, X2, . . . . Xn were strictly uncorrelated in the original universe as a whole, we should expect $r_{12}$, $r_{13.2}$, $r_{14.23}$, etc., to exhibit values (whether positive or negative) differing from zero in a limited sample. Hence R will not tend, on an average of such samples, to be zero, but will fluctuate round some mean value. This mean value will be the greater the smaller the number of observations in the sample, and also the greater the number of variables. When only a small number of observations are available it is, accordingly, little use to deal with a large number of variables. As a limiting case, it is evident that if we deal with n variables and possess only n observations, all the partial correlations of the highest possible order will be unity.
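Two of the statements of this section lend themselves to a numerical check: that equation (15) gives the same R however it is written, and that with n variables and only n observations the partial correlation of highest order is unity. The sketch below uses the constants of Example i. for the first point and three made-up observations of three variables for the second; the helper names `pearson` and `partial_r` and the data `x1`, `x2`, `x3` are assumptions introduced purely for illustration.

```python
import math


def pearson(u, v):
    """Product-moment correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def partial_r(r_ab, r_ac, r_bc):
    """First-order partial correlation r_ab.c."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))


# Example i.: double correlation of pauperism (2) with earnings (1) and out-relief (3).
r12, r13, r23 = -0.66, -0.13, 0.60
r23_1 = partial_r(r23, r12, r13)
r12_3 = partial_r(r12, r13, r23)
# Equation (15) written in its two possible forms for R_2(13).
R_a = math.sqrt(1 - (1 - r12**2) * (1 - r23_1**2))
R_b = math.sqrt(1 - (1 - r23**2) * (1 - r12_3**2))
print(round(R_a, 3), round(R_b, 3))

# Limiting case: three variables observed only three times (arbitrary made-up values).
x1, x2, x3 = [1.0, 2.0, 4.0], [2.0, 1.0, 5.0], [3.0, 7.0, 1.0]
p = partial_r(pearson(x1, x2), pearson(x1, x3), pearson(x2, x3))
print(round(abs(p), 6))
```

Any other three non-degenerate observations would serve equally well for the limiting case; the particular numbers carry no significance.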

17. It is obvious that as equations (11) and (12) enable us to express regressions and correlations of higher orders in terms of those of lower orders, we must similarly be able to express the coefficients of lower in terms of those of higher orders. Such expressions are sometimes useful for theoretical work. Using the same method of expansion as in previous cases, we have

$$0 = \Sigma(x_{1.23 \ldots n} \cdot x_{2.34 \ldots (n-1)}) = \Sigma(x_1 \cdot x_{2.34 \ldots (n-1)}) - b_{12.34 \ldots n} \Sigma(x_2 \cdot x_{2.34 \ldots (n-1)}) - b_{1n.23 \ldots (n-1)} \Sigma(x_n \cdot x_{2.34 \ldots (n-1)}).$$

That is,

$$b_{12.34 \ldots (n-1)} = b_{12.34 \ldots n} + b_{1n.23 \ldots (n-1)} \cdot b_{n2.34 \ldots (n-1)}.$$

In this equation the coefficient on the left and the last on the right are of order n - 3, the other two of order n - 2. We therefore wish to eliminate the last coefficient on the right. Interchanging the suffixes 1 for n and n for 1, we have

$$b_{n2.34 \ldots (n-1)} = b_{n2.13 \ldots (n-1)} + b_{n1.23 \ldots (n-1)} \cdot b_{12.34 \ldots (n-1)}.$$

Substituting this value for $b_{n2.34 \ldots (n-1)}$ in the first equation we have

$$b_{12.34 \ldots (n-1)} = \frac{b_{12.34 \ldots n} + b_{1n.23 \ldots (n-1)} \cdot b_{n2.13 \ldots (n-1)}}{1 - b_{1n.23 \ldots (n-1)} \cdot b_{n1.23 \ldots (n-1)}}. \qquad (16)$$

This is the required equation for the regressions; it is the equation

$$b_{12} = \frac{b_{12.n} + b_{1n.2} \cdot b_{n2.1}}{1 - b_{1n.2} \cdot b_{n1.2}}$$

with secondary suffixes 34 .... (n-1) added throughout. The corresponding equation for the correlations is obtained at once by writing down equation (16) for $b_{21.34 \ldots (n-1)}$ and taking the square root of the product (cf. § 13); this gives

$$r_{12.34 \ldots (n-1)} = \frac{r_{12.34 \ldots n} + r_{1n.23 \ldots (n-1)} \cdot r_{2n.13 \ldots (n-1)}}{\left(1 - r_{1n.23 \ldots (n-1)}^2\right)^{1/2} \left(1 - r_{2n.13 \ldots (n-1)}^2\right)^{1/2}}, \qquad (17)$$

which is similarly the equation

$$r_{12} = \frac{r_{12.n} + r_{1n.2} \cdot r_{2n.1}}{(1 - r_{1n.2}^2)^{1/2} (1 - r_{2n.1}^2)^{1/2}}$$

with the secondary suffixes 34 .... (n-1) added throughout.
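Equation (17) in its simplest form may be verified numerically by first passing from the coefficients of order zero to those of the first order with equation (12), and then returning. The following sketch does so for the constants of Example i.; `partial_r` and `r12_back` are illustrative names, not notation from the text.

```python
import math


def partial_r(r_ab, r_ac, r_bc):
    """Equation (12) in its simplest form: r_ab.c from zero-order coefficients."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))


r12, r13, r23 = -0.66, -0.13, 0.60
r12_3 = partial_r(r12, r13, r23)
r13_2 = partial_r(r13, r12, r23)
r23_1 = partial_r(r23, r12, r13)

# Equation (17) in its simplest form: r12 recovered from the three first-order coefficients.
r12_back = (r12_3 + r13_2 * r23_1) / math.sqrt((1 - r13_2**2) * (1 - r23_1**2))

print(round(r12_back, 4))
```

Note the change of sign in the numerator as compared with equation (12): the product term is added, not subtracted.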

18. Equations (12) and (17) imply that certain limiting

inequalities must hold between the correlation-coefficients in

the expression on the right in each case in order that real

values (values between ±1) may be obtained for the correlation-

coefficient on the left. These inequalities correspond precisely

with those "conditions of consistence" between class-frequencies

with which we dealt in Chapter II., but we propose to treat them

only briefly here. Writing (12) in its simplest form for $r_{12.3}$, we must have $r_{12.3}^2 < 1$, or

$$\frac{(r_{12} - r_{13} r_{23})^2}{(1 - r_{13}^2)(1 - r_{23}^2)} < 1,$$

that is,

$$r_{12}^2 + r_{13}^2 + r_{23}^2 - 2 r_{12} r_{13} r_{23} < 1,$$

if the three r's are consistent with each other. If we take $r_{12}$, $r_{13}$ as known, this gives as limits for $r_{23}$

$$r_{12} r_{13} \pm \sqrt{1 - r_{12}^2 - r_{13}^2 + r_{12}^2 r_{13}^2}.$$

Similarly writing (17) in its simplest form for $r_{12}$ in terms of $r_{12.3}$, $r_{13.2}$, and $r_{23.1}$, we must have

$$r_{12.3}^2 + r_{13.2}^2 + r_{23.1}^2 + 2 r_{12.3} r_{13.2} r_{23.1} < 1,$$

and therefore, if $r_{12.3}$ and $r_{13.2}$ are given, $r_{23.1}$ must lie between the limits

$$- r_{12.3} r_{13.2} \pm \sqrt{1 - r_{12.3}^2 - r_{13.2}^2 + r_{12.3}^2 r_{13.2}^2}.$$

The following table gives the limits of the third coefficient in a few special cases, for the three coefficients of zero order and of the first order respectively:

Value of                          Limits of
r12 or r12.3    r13 or r13.2      r23         r23.1
0               0                 ±1          ±1
±1              ±1                +1          −1
±1              ∓1                −1          +1
±√0.5           ±√0.5             0, +1       0, −1
±√0.5           ∓√0.5             0, −1       0, +1

The student should notice that the set of three coefficients of order zero and value unity are only consistent if either one only, or all three, are positive, i.e. +1, +1, +1, or -1, -1, +1; but not -1, -1, -1. On the other hand, the set of three coefficients of the first order and value unity are only consistent if one only, or all three, are negative: the only consistent sets are +1, +1, -1 and -1, -1, -1. The values of the two given r's need to be very high if even the sign of the third can be inferred; if the two are equal, they must be at least equal to √0.5 or 0.707 . . . . Finally, it may be noted that no two values for the known coefficients ever permit an inference of the value zero for the third; the fact that 1 and 2, 1 and 3 are uncorrelated, pair and pair, permits no inference of any kind as to the correlation between 2 and 3, which may lie anywhere between +1 and -1.
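The limits given in this section are easily tabulated for any pair of known coefficients. The sketch below is one way of doing so; the function names `limits_r23` and `limits_r23_1` are hypothetical, and the quantity under the root is written in the factored form (1 - a²)(1 - b²), which is algebraically the same as the expanded expression in the text.

```python
import math


def limits_r23(r12, r13):
    """Limits of r23 consistent with given zero-order coefficients r12 and r13."""
    root = math.sqrt(max(0.0, (1 - r12**2) * (1 - r13**2)))
    return (r12 * r13 - root, r12 * r13 + root)


def limits_r23_1(r12_3, r13_2):
    """Limits of r23.1 consistent with given first-order coefficients r12.3 and r13.2."""
    root = math.sqrt(max(0.0, (1 - r12_3**2) * (1 - r13_2**2)))
    return (-r12_3 * r13_2 - root, -r12_3 * r13_2 + root)


s = math.sqrt(0.5)
print([round(v, 6) for v in limits_r23(0.0, 0.0)])
print([round(v, 6) for v in limits_r23(s, s)])
print([round(v, 6) for v in limits_r23_1(s, s)])
print([round(v, 6) for v in limits_r23(-0.66, -0.13)])  # the pair observed in Example i.
```

The last line illustrates how wide the limits remain for coefficients of ordinary magnitude: the observed pair of Example i. leaves the sign of the third coefficient quite undetermined.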

19. We do not think it necessary to add to this chapter a

detailed discussion of the nature of fallacies on which the theory

of multiple correlation throws much light. The general nature of

such fallacies is the same as for the case of attributes, and was

discussed fully in Chap. IV. §§ 1-8. It suffices to point out the principal sources of fallacy which are suggested at once by the form of the partial correlation

$$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}, \qquad (a)$$

and from the form of the corresponding expression for $r_{12}$ in terms of the partial coefficients

$$r_{12} = \frac{r_{12.3} + r_{13.2} r_{23.1}}{\sqrt{(1 - r_{13.2}^2)(1 - r_{23.1}^2)}}. \qquad (b)$$

From the form of the numerator of (a) it is evident (1) that even if $r_{12}$ be zero, $r_{12.3}$ will not be zero unless either $r_{13}$ or $r_{23}$, or both, are zero. If $r_{13}$ and $r_{23}$ are of the same sign the partial correlation will be negative; if of opposite sign, positive. Thus the quantity of a crop might appear to be unaffected, say, by the amount of rainfall during some period preceding harvest: this might be due merely to a correlation between rain and low temperature, the partial correlation between crop and rainfall being positive and important. We may thus easily misinterpret a coefficient of correlation which is zero. (2) $r_{12.3}$ may be, indeed often is, of opposite sign to $r_{12}$, and this may lead to still more serious errors of interpretation.

From the form of the numerator of (b), on the other hand, we see that, conversely, $r_{12}$ will not be zero even though $r_{12.3}$ is zero, unless either $r_{13.2}$ or $r_{23.1}$ is zero. This corresponds to the theorem of Chap. IV. § 6, and indicates a source of fallacies similar to those there discussed.
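A short worked case may make the first fallacy concrete. The figures are purely hypothetical, chosen for easy arithmetic and not drawn from any data in the text: suppose crop (1) and rainfall (2) show $r_{12} = 0$, while crop and temperature (3) give $r_{13} = +0.5$ and rainfall and temperature give $r_{23} = -0.5$. Then by (a)

$$r_{12.3} = \frac{0 - (+0.5)(-0.5)}{\sqrt{(1 - 0.25)(1 - 0.25)}} = \frac{0.25}{0.75} = +0.33 \ldots,$$

so that a zero total correlation is here compatible with a positive and by no means negligible partial correlation.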

20. We have seen (§ 9) that $r_{12.3}$ is the correlation between $x_{1.3}$

and $x_{2.3}$, and that we might determine the value of this partial

correlation by drawing up the actual correlation table for the two

residuals in question. Suppose, however, that instead of drawing

up a single table we drew up a series of tables for values of $x_{1.3}$

and $x_{2.3}$ associated with values of $x_3$ lying within successive

class-intervals of its range. In general the value of $r_{12.3}$ would

not be the same (or approximately the same) for all such tables,

but would exhibit some systematic change as the value of $x_3$

increased. Hence $r_{12.3}$ should be regarded, in general, as of the

nature of an average correlation: the cases in which it measures

the correlation between $x_{1.3}$ and $x_{2.3}$ for every value of $x_3$ (cf.

Chap. XVI.) are probably exceptional. The process for deter-

mining partial associations (cf. Chap. IV.) is, it will be remembered,

thorough and complete, as we always obtain the actual tables

exhibiting the association between, say, A and B in the universe

of C's and the universe of γ's: that these two associations may

differ materially, is illustrated by Example i. of Chap. IV.

(pp. 45-6). It might sometimes serve as a useful check on

partial-correlation work to reclassify the observations by the

fundamental methods of that chapter.


The preceding chapter is written from the standpoint of refs. 3 and 4, and

with the notation and method of ref. 5. The theory of correlation for several

variables was developed by Edgeworth and Pearson (refs. 1 and 2) from the

standpoint of the "normal" distribution of frequency (cf. Chap. XVI.).


(1) Edgeworth, F. Y., "On Correlated Averages," Phil. Mag., 5th Series, vol. xxxiv., 1892, p. 194.

(2) Pearson, Karl, "Regression, Heredity, and Panmixia," Phil. Trans. Roy. Soc., Series A, vol. clxxxvii., 1896, p. 253.

(3) Yule, G. U., "On the Significance of Bravais' Formulae for Regression, etc., in the case of Skew Correlation," Proc. Roy. Soc., vol. lx., 1897, p. 477.

(4) Yule, G. U., "On the Theory of Correlation," Jour. Roy. Stat. Soc., vol. lx., 1897, p. 812.

(5) Yule, G. U., "On the Theory of Correlation for any number of Variables treated by a New System of Notation," Proc. Roy. Soc., Series A, vol. lxxix., 1907, p. 182.

(6) Hooker, R. H., and G. U. Yule, "Note on Estimating the Relative Influence of Two Variables upon a Third," Jour. Roy. Stat. Soc., vol. lxix., 1906, p. 197.

Illustrative Applications of Economic Interest.

(7) Yule, G. U., "An Investigation into the Causes of Changes in Pauperism in England, etc.," Jour. Roy. Stat. Soc., vol. lxii., 1899, p. 249.

(8) Hooker, R. H., "The Correlation of the Weather and the Crops," Jour. Roy. Stat. Soc., vol. lxx., 1907, p. 1.


1. (Ref. 8.) The following means, standard-deviations, and correlations are found for

X1 = seed-hay crops in cwts. per acre,

X2 = spring rainfall in inches,

X3 = accumulated temperature above 42° F. in spring,

in a certain district of England during 20 years.

$M_1 = 28.02$, $\sigma_1 = 4.42$, $r_{12} = +0.80$

$M_2 = 4.91$, $\sigma_2 = 1.10$, $r_{13} = -0.40$

$M_3 = 594$, $\sigma_3 = 85$, $r_{23} = -0.56$

Find the partial correlations and the regression-equation for hay-crop on spring rainfall and accumulated temperature.

2. (The following figures must be taken as an illustration only: the data on which they were based do not refer to uniform times or areas.)

X1 = deaths of infants under 1 year per 1000 births in same year (infantile mortality).

X2 = proportion per thousand of married women occupied for gain.

X3 = death-rate of persons over 5 years of age per 10,000.

X4 = proportion per thousand of population living 2 or more to a room.

Taking the figures below for 30 urban areas in England and Wales, find the partial correlations and the regression-equation for infantile mortality on the other factors.

$M_1 = 164$, $\sigma_1 = 20.0$, $r_{12} = +0.49$, $r_{23} = +0.15$

$M_2 = 158$, $\sigma_2 = 74.9$, $r_{24} = -0.37$

$M_3 = 143$, $\sigma_3 = 22.4$, $r_{34} = +0.23$

$M_4 = 205$, $\sigma_4 = 130.0$

[The values of $r_{13}$ and $r_{14}$ are missing from this copy.]

3. If all the correlations of order zero are equal, say = r, what are the values of the partial correlations of successive orders?

Under the same condition, what is the limiting value of r if all the equal correlations are negative and n variables have been observed?

4. What is the correlation between $x_{1.2}$ and $x_{2.1}$?

5. Write down from inspection the values of the partial correlations for the three variables

X1, X2, and X3 = a.X1 + b.X2.

Check the answer to Qu. 7, Chap. XI., by working out the partial correlations.

6. If the relation

a.x1 + b.x2 + c.x3 = 0

holds for all sets of values of x1, x2, and x3, what must the partial correlations be?

Check the answer to Qu. 9, Chap. XI., by working out the partial correlations.


1. The problem of the present Part—2. The two chief divisions of the theory

of sampling—3. Limitation of the discussion to the case of simple

sampling—4. Definition of the chance of success or failure of a given

event—5. Determination of the mean and standard-deviation of the

number of successes in n events—6. The same for the proportion of

successes in n events: the standard-deviation of simple sampling as a

measure of unreliability, or its reciprocal as a measure of precision—7.

Verification of the theoretical results by experiment—8. More detailed

discussion of the assumptions on which the formula for the standard-

deviation of simple sampling is based—9-10. Biological cases to

which the theory is directly applicable—11. Standard-deviation of

simple sampling when the numbers of observations in the samples

vary—12. Approximate value of the standard-deviation of simple

sampling, and relation between mean and standard-deviation, when

the chance of success or failure is very small—13. Use of the standard-

deviation of simple sampling, or standard error, for checking and

controlling the interpretation of statistical results.

1. On several occasions in the preceding chapters it has been

pointed out that small differences between statistical measures like

percentages, averages, measures of dispersion and so forth cannot

in general be assumed to indicate the action of definite and assign-

able causes. Small differences may easily arise from indefinite

and highly complex causation such as determines the fluctuating

proportions of heads and tails in tossing a coin, of black balls in

drawing samples from a bag containing a mixture of black and

white balls, or of cards bearing measurements within some given

class-interval in drawing cards, say, from an anthropometric record.

In 100 throws of a coin, for example, we may have noted 56 heads

and only 44 tails, but we cannot conclude that the coin is biassed:

on repeating our throws we may get only 48 heads and 52 tails.

Similarly, if on measuring the statures of 1000 men in each of

two nations we find that the mean stature is slightly greater for



nation A than for nation B, we cannot necessarily conclude that

the real mean stature is greater in the case of nation A: possibly

if the observations were repeated on different samples of 1000

men the ratio might be reversed.
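The coin illustration can be given a definite figure. Assuming an unbiassed coin and independent throws (the case of simple sampling treated later in the chapter), the sketch below computes exactly the chance that 100 throws depart from an even division by as much as the 56 heads and 44 tails mentioned above; the variable names are illustrative only.

```python
from math import comb

n = 100  # number of throws

# Chance, for an unbiassed coin, of a result at least as far from 50 heads
# as 56 heads is (i.e. 44 or fewer heads, or 56 or more).
p_extreme = sum(comb(n, k) for k in range(n + 1) if abs(k - 50) >= 6) / 2**n

print(round(p_extreme, 3))
```

The chance comes out at rather more than one in four, which is why such a result gives no ground for concluding that the coin is biassed.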

2. The theory of such fluctuations may be termed the theory

of sampling, and there are two chief sections of the theory corre-

sponding to the theory of attributes and the theory of variables

respectively. In tossing a coin we only classify the results of the

tosses as heads or tails; in drawing balls from a mixture of black

and white balls, we only classify the balls drawn as black or as

white. These cases correspond to the theory of attributes, and

the general case may be represented as the drawing of a sample

from a universe containing both A's and a's, the number or


proportion of A's in successive samples being observed. If, on the

other hand, we put in a bag a number of cards bearing different

values of some variable X and draw sample batches of cards, we

can form averages and measures of dispersion for the successive

batches, and these averages and measures of dispersion will vary

slightly from one batch to another. If associated measures of

two variables X and Y are recorded on each card, we can also form

correlation-coefficients for the different batches, and these will vary

in a similar manner. These cases correspond to the theory of

variables, and it is the function of the theory of sampling for such

cases to inform us as to the fluctuations to be expected in the

averages, measures of dispersion, correlation-coefficients, etc., in

successive samples. In the present and the three following

chapters the theory of sampling is dealt with for the case of

attributes alone. The theory is of great importance and interest,

not only from its applications to the checking and control of

statistical results, but also from the theoretical forms of frequency-


distribution to which it leads. Finally, in Chapter XVII. one or

two of the more important cases of the theory of sampling for

variables are briefly treated, the greater part of the theory, owing

to its difficulty, lying somewhat outside the limits of this work.

3. The theory of sampling attains its greatest simplicity if

every observation contributed to the sample may be regarded as

independent of every other. This condition of independence

holds good, e.g., for the tossing of a coin or the throwing of a die:

the result of any one throw or toss does not affect, and is un-

affected by, the results of the preceding and following tosses.

It does not hold good, on the other hand, for the drawing of balls

from a bag: if a ball be drawn from a bag containing 3 black

and 3 white balls, the remainder may be either 2 black and 3

white, or 2 white and 3 black, according as the first ball was

black or white. The result of drawing a second ball is therefore


dependent on the result of drawing the first. The disturbance

can only be eliminated by drawing from a bag containing a

number of balls that is infinitely large compared with the

total number drawn, or by returning each ball to the bag before

drawing the next. In this chapter our attention will be confined

to the case of independent sampling, as in coin-tossing or dice-

throwing—the simplest cases of an artificial kind suitable for

theoretical study and experimental verification. For brevity, we

may refer to such cases of sampling as simple sampling: the

implied conditions are discussed more fully in § 8 below.
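The effect of the first drawing on the second may be put in figures. The following Python sketch is an illustration added here, not part of the original work; the function name `chance_second_black` and its arguments are assumptions chosen for the purpose. It gives the chance that the second ball drawn is black, with and without returning the first ball to the bag.

```python
from fractions import Fraction


def chance_second_black(black, white, first_was_black, replace):
    """Chance that the second ball drawn is black.

    black, white    -- numbers of balls originally in the bag
    first_was_black -- colour of the first ball drawn
    replace         -- True if the first ball is returned before the second draw
    """
    if replace:
        # The bag is restored, so the first drawing has no influence.
        return Fraction(black, black + white)
    remaining_black = black - 1 if first_was_black else black
    return Fraction(remaining_black, black + white - 1)


if __name__ == "__main__":
    # The bag of the text: 3 black and 3 white balls.
    print(chance_second_black(3, 3, first_was_black=True, replace=False))
    print(chance_second_black(3, 3, first_was_black=False, replace=False))
    print(chance_second_black(3, 3, first_was_black=True, replace=True))
    # A bag very large compared with the number drawn: dependence nearly vanishes.
    print(float(chance_second_black(3000, 3000, first_was_black=True, replace=False)))
```

With the small bag the two chances differ markedly according to the first result; with the large bag, or with replacement, they are sensibly equal to 1/2, which is the condition of independence required.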

4. If we may regard an ideal coin as a uniform, homogeneous

circular disc, there is nothing which can make it tend to fall more

often on the one side than on the other; we may expect, there-

fore, that in any long series of throws the coin will fall with

either face uppermost an approximately equal number of times,

or with, say, heads uppermost approximately half the times.

Similarly, if we may regard the ideal die as a perfect homogeneous

cube, it will tend, in any long series of throws, to fall with each

of its six faces uppermost an approximately equal number of

times, or with any given face uppermost one-sixth of the whole

number of times. These results are sometimes expressed by

saying that the chance of throwing heads (or tails) with a coin is

1/2, and the chance of throwing six (or any other face) with a die

is 1/6. To avoid speaking of such particular instances as coins

or dice, we shall in future, using terms which have become

conventional, refer to an event the chance of success of which is p

and the chance of failure q. Obviously p + q = 1.

5. Suppose we take N samples with n events in each. What

will be the values towards which the mean and standard-deviation

of the number of successes in a sample will tend? The mean is


given at once, for there are N.n events, of which approximately

pNn will be successes, and the mean number of successes in a

sample will therefore tend towards pn. As regards the

standard-deviation, consider first the single event (n = 1). The single

event may give either no successes or one success, and will tend

to give the former qN, the latter pN, times in N trials. Take

this frequency-distribution and work out the standard-deviation

of the number of successes for the single event, as in the case of

an arithmetical example :—

Frequency f.    Successes ξ.    fξ.    fξ².
    qN               0           -      -
    pN               1           pN     pN
    N                -           pN     pN

We have therefore M = p, and

$$\sigma_1^2 = p - p^2 = pq.$$

But the number of successes in a group of n such events is the

sum of successes for the single events of which it is composed,

and, all the events being independent, we have therefore, by the

usual rule for the standard-deviation of the sum of independent

variables (Chap. XI. § 2, equation (2)), $\sigma_n$ being the

standard-deviation of the number of successes in n events,

$$\sigma_n^2 = n \cdot pq \qquad (1)$$

This is an equation of fundamental importance in the theory of

sampling. The student should particularly bear in mind that

the standard-deviation of the number of successes, due to


fluctuations of simple sampling alone, in a group of n events

varies, not directly as n, but as the square root of n.
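The student who has no dice at hand may try the rule by artificial sampling. The following is a minimal Python sketch, added as an illustration and not part of the original work; the function names, the seed, and the sample sizes are assumptions. It draws N samples of n independent events each and compares the observed mean and standard-deviation of the number of successes with np and the square root of npq.

```python
import math
import random


def simulate_successes(p, n, num_samples, seed=0):
    """Number of successes in each of num_samples samples of n independent events."""
    rng = random.Random(seed)  # fixed seed so that the run can be repeated
    return [sum(1 for _ in range(n) if rng.random() < p)
            for _ in range(num_samples)]


def mean_and_sd(values):
    """Mean and standard-deviation, the latter with divisor N as in the text."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


def theoretical_mean_and_sd(p, n):
    """Values towards which mean and standard-deviation tend: np and sqrt(npq)."""
    return n * p, math.sqrt(n * p * (1 - p))


if __name__ == "__main__":
    # Each n is four times the last, so the theoretical
    # standard-deviation should double, not quadruple.
    for n in (12, 48, 192):
        observed = mean_and_sd(simulate_successes(0.5, n, 4096, seed=1))
        print(n, observed, theoretical_mean_and_sd(0.5, n))
```

The observed figures will not agree exactly with the theoretical ones, being themselves subject to fluctuations of sampling, but the doubling of the standard-deviation for each fourfold increase of n should be apparent.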

6. In lieu of recording the absolute number of successes in each

sample of n events, we might have recorded the proportion of

such successes, i.e. 1/nth of the number in each sample. As this

would amount to merely dividing all the figures of the original

record by n, the mean proportion of successes—or rather the value

towards which the mean tends to approach—must be p, and the

standard-deviation of the proportion of successes $s_n$ be given by

$$s_n^2 = \sigma_n^2 / n^2 = pq/n \qquad (2)$$

The standard-deviation of the proportion of successes in samples

of such independent events varies therefore inversely as the square

root of the number on which the proportion is calculated. Now

if we regard the observed proportion in any one sample as a

more or less unreliable determination of the true proportion in

a very large sample from the same material, the standard-devia-

tion of sampling may fairly be taken as a measure of the


unreliability of the determination—the greater the standard-

deviation, the greater the fluctuations of the observed proportion,

although the true proportion is the same throughout. The

reciprocal of the standard-deviation (1/s), on the other hand, may

be regarded as a measure of reliability, or, as it is sometimes

termed, precision, and consequently the reliability or precision of

an observed proportion varies as the square root of the number of

observations on which it is based. This is again a very important

rule with many practical applications, but the limitations of the

case to which it applies, and the exact conditions from which it

has been deduced, should be borne in mind. We return to this

point again below (§ 8 and Chap. XIV.).
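The rule of the square root may be seen at once from a few figures. The following Python sketch is an added illustration under the conditions of simple sampling only; the function names `sd_of_proportion` and `precision` are assumptions and do not occur in the original work.

```python
import math


def sd_of_proportion(p, n):
    """Standard-deviation of the proportion of successes, equation (2)."""
    return math.sqrt(p * (1 - p) / n)


def precision(p, n):
    """Reciprocal of the standard-deviation, taken as the measure of reliability."""
    return 1 / sd_of_proportion(p, n)


if __name__ == "__main__":
    # Quadrupling the number of observations halves the standard-deviation
    # and doubles the precision.
    for n in (100, 400, 1600):
        print(n, sd_of_proportion(0.5, n), precision(0.5, n))
```

To double the precision of an observed proportion, therefore, four times as many observations are required, and to treble it nine times as many.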

7. Experiments in coin tossing, dice throwing, and so forth

have been carried out by various persons in order to obtain ex-



perimental verification of these results. The following will serve

as illustrations, but the student is strongly recommended to

carry out a few series of such experiments personally, in order to

acquire confidence in the use of the theory. It may be as well

to remark that if ordinary commercial dice are to be used for the

trials, care should be taken to see that they are fairly true cubes,

and the marks not cut very deeply. Cheap dice are generally

very much out of truth, and if the marks are deeply cut the

balance of the die may be sensibly affected. A convenient mode

of throwing a number of dice, suggested, we believe, by the late

Professor Weldon, is to roll them down an inclined gutter of

corrugated paper, so that they roll across the corrugations.


(1) (W. F. R. Weldon, cited by Professor F. Y. Edgeworth,

Encycl. Brit., 10th edn., vol. xxviii. p. 282. Totals of the columns

in the table there given.)

Twelve dice were thrown 4096 times; a throw of 4, 5, or 6 points

reckoned a success, therefore p = q = 0.5. Theoretical mean M = 6;

theoretical value of the standard-deviation $\sigma_{12} = \sqrt{0.5 \times 0.5 \times 12} = 1.732$.

The following was the frequency-distribution observed :—





















Mean M = 6.139, standard-deviation $\sigma$ = 1.712. The proportion of

successes is 6.139/12 = 0.512 instead of 0.5.

(2) (W. F. R. Weldon, loc. cit., p. 289. Totals of columns of

the table given.)

Twelve dice were thrown 4096 times; only a throw of 6 was

counted a success, so p = 1/6, q = 5/6. Theoretical mean M = 2,

standard-deviation $\sigma = \sqrt{1/6 \times 5/6 \times 12} = 1.291$.

The following was the observed frequency-distribution :—





Mean M = 2.000, standard-deviation $\sigma$ = 1.296. Actual proportion

of successes 2.000/12 = 0.1667, agreeing with the theoretical value

to the fourth place of decimals. Of course such very close

agreement is accidental, and not to be always expected.

(3) (G. U. Yule.) The following may be taken as an illustra-

tion based on a smaller number of observations. Three dice were

thrown 648 times, and the numbers of 5's or 6's noted at

each throw. p = 1/3, q = 2/3. Theoretical mean 1.

Standard-deviation, 0.816.

Frequency-distribution observed:—










M = 1.034, $\sigma$ = 0.823. Actual proportion of successes 0.345.

For other illustrations, some of which are cited in the questions

at the end of this chapter, the student may be referred to the

list of references on p. 269. The student should notice that in

all the distributions given a range of six times the standard-

deviation includes either all, or the great bulk of, the observations,

as in most frequency-distributions of the same general form. We

shall make use of this rule below, § 13.
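The theoretical values quoted in the three illustrations may be recomputed from equation (1). The following Python sketch is an added illustration; the observed means and standard-deviations are those stated in the text, while the names `EXPERIMENTS` and `theoretical` are assumptions.

```python
import math

# (label, chance of success p, number of dice n, observed mean, observed s.d.)
# The observed figures are those quoted in the text above.
EXPERIMENTS = [
    ("Weldon: 4, 5 or 6 with 12 dice", 1 / 2, 12, 6.139, 1.712),
    ("Weldon: 6 with 12 dice", 1 / 6, 12, 2.000, 1.296),
    ("Yule: 5 or 6 with 3 dice", 1 / 3, 3, 1.034, 0.823),
]


def theoretical(p, n):
    """Theoretical mean np and standard-deviation sqrt(npq)."""
    return n * p, math.sqrt(n * p * (1 - p))


if __name__ == "__main__":
    for label, p, n, obs_mean, obs_sd in EXPERIMENTS:
        mean, sd = theoretical(p, n)
        print(f"{label}: theory M={mean:.3f}, s.d.={sd:.3f}; "
              f"observed M={obs_mean:.3f}, s.d.={obs_sd:.3f}")
```

In each case the observed standard-deviation lies within a few units in the second decimal place of the theoretical value.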


8. In deducing the formulae (1) and (2) for the standard-

deviations of simple sampling in the cases with which we have

been dealing, only one condition has been explicitly laid down as

necessary, viz. the independence of the several drawings, tossings,

or other events composing the sample. But in point of fact this

is not the only nor the most fundamental condition which has

been explicitly or implicitly assumed, and it is necessary to realise

all the conditions in order to grasp the limitations under which

alone the formulae arrived at will hold. Supposing, for example,

that we observe among groups of 1000 persons, at different times

or in different localities, various percentages of individuals

possessing certain characteristics —dark hair, or blindness, or

insanity, and so forth. Under what conditions should we

expect the observed percentages to obey the law of sampling

that we have found, and show a standard-deviation given by

equation (2)?

(a) In the first place we have tacitly assumed throughout the

preceding work that our dice or our coins were the same set or

identically similar throughout the experiment, so that the chance

of throwing "heads" with the coins or, say, "six" with the dice

was the same throughout: we did not commence an experiment

with dice loaded in one way and later on take a fresh set of dice

loaded in another way. Consequently if formula (2) is to hold

good in our practical case of sampling there must not be a

difference in any essential respect—i.e. in any character that can

affect the proportion observed—between the localities from which

the observations are drawn, nor, if the observations have been

made at different epochs, must any essential change have taken

place during the period over which the observations are spread.

Where the causation of the character observed is more or less

unknown, it may, of course, be difficult or impossible to say what


differences or changes are to be regarded as essential, but, where

we have more knowledge, the condition laid down enables us to

exclude certain cases at once from the possible applications of

formula (1) or (2). Thus it is obvious that the theory of simple

sampling cannot apply to the variations of the death-rate in

localities with populations of different age and sex compositions,

nor to death-rates in a mixture of healthy and unhealthy districts,

nor to death-rates in successive years during a period of con-

tinuously improving sanitation. In all such cases variations

due to definite causes are superposed on the fluctuations of sampling.


(b) In the second place, we have also tacitly assumed not

only that we were using the same set of coins or dice throughout,

so that the chances p and q were the same at every trial, but

also that all the coins and dice in the set used were identically

similar, so that the chances p and q were the same for every coin

or die. Consequently, if our formulae are to apply in the practical


case of sampling, the conditions that regulate the appearance of

the character observed must not only be the same for every

sample, but also for every individual in every sample. This is

again a very marked limitation. To revert to the case of death-

rates, formulae (1) and (2) would not apply to the numbers of

persons dying in a series of samples of 1000 persons, even if these

samples were all of the same age and sex composition, and living

under the same sanitary conditions, unless, further, each sample

only contained persons of one sex and one age. For if each

sample included persons of both sexes and different ages, the

condition would be broken, the chance of death during a given

period not being the same for the two sexes, nor for the young

and the old. The groups would not be homogeneous in the sense

required by the conditions from which our formulae have been

deduced. Similarly, if we were observing hair-colours, our formulae


would not apply if the samples were compounded by always

taking one person from district A, another from district B, and

so on, these districts not being similar as regards the distribution

of hair-colour.

The above conditions were only tacitly assumed in our previous

work, and consequently it has been necessary to emphasise them

specially. The third condition was explicitly stated: (c) The

individual "events," or appearances of the character observed,

must be completely independent of one another, like the throws

of a die, or sensibly so, like the drawings of balls from a bag

containing a number of balls that is very large compared with

the number drawn. Reverting to the illustration of a death-rate,

our formulae would not apply even if the sample populations


were composed of persons of one age and one sex, if we were

dealing, for example, with deaths from an infectious or contagious

disease. For if one person in a certain sample has contracted

the disease in question, he has increased the possibility of others

doing so, and hence of dying from the disease. The same thing

holds good for certain classes of deaths from accident, e.g. railway

accidents due to derailment, and explosions in mines: if such an

accident is fatal to one person it is probably fatal to others also,

and consequently the annual returns show large and more or

less erratic variations.

When we speak of simple sampling in the following pages, the

term is intended to imply the fulfilment of all the conditions (a),

(b), and (c), all the samples and all the individual contributions to

each sample being taken under precisely the same conditions,

and the individual "events" or appearances of the character being

quite independent. It may be as well expressly to note that we

need not make any assumption as to the conditions that determine


p unless we have to estimate $\sqrt{npq}$ a priori. If we draw a

sample and observe in it the actual proportion of, say, A's:

draw another sample under precisely the same conditions, and

observe the proportion of A's in the two samples together: add

to these a third sample, and so on, we will find that p approaches

—not continuously, but with some fluctuations—closer and closer

to some limiting value. It is this limiting value which is to be

used in our formulae—the value of p that would be observed in

a very large sample. The standard-deviation of the number of

sixes thrown with n dice, on this understanding, may be $\sqrt{npq}$,

even if the dice be out of truth or loaded so that p is no longer

1/6. Similarly, the standard-deviation of the number of black

balls in samples of n drawn from an infinitely large mixture of

black and white balls in equal proportions may be $\sqrt{npq}$ even


if p is, say, 1/3, and not 1/2 owing to the black balls, for some

reason, tending to slip through our fingers. (Cf. Chap. XIV.

§ 4.)
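The consequence of a breach of condition (a) may be shown by a numerical case. The following Python sketch is an added illustration, not part of the original work: the two chances 0.10 and 0.14 and the sample size of 100 are hypothetical figures, and the function names are assumptions. Half the samples are supposed drawn under one chance and half under the other, and the exact standard-deviation of the number of successes is compared with that of simple sampling for the mean chance 0.12.

```python
import math


def binomial_pmf(n, p):
    """Chances of 0, 1, ..., n successes in n independent events of chance p."""
    return [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]


def sd_from_pmf(pmf):
    """Standard-deviation of the number of successes for a given distribution."""
    mean = sum(k * w for k, w in enumerate(pmf))
    variance = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    return math.sqrt(variance)


def sd_when_chance_differs_between_samples(n, p_values):
    """Samples are taken in equal numbers under each of the chances in p_values."""
    pmfs = [binomial_pmf(n, p) for p in p_values]
    mixed = [sum(column) / len(pmfs) for column in zip(*pmfs)]
    return sd_from_pmf(mixed)


if __name__ == "__main__":
    n = 100
    print("simple sampling, p = 0.12:", sd_from_pmf(binomial_pmf(n, 0.12)))
    print("p = 0.10 and 0.14 by turns:",
          sd_when_chance_differs_between_samples(n, (0.10, 0.14)))
```

The second figure exceeds the first, the definite difference between the two sets of samples being superposed on the fluctuations of sampling. The sketch deals with condition (a) only; the effects of breaches of conditions (b) and (c) are not represented.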

9. It is evident that these conditions very much limit the

field of practical cases of an economic or sociological character

to which formulae (1) and (2) can apply without considerable

modification. The formulae appear, however, to hold to a high

degree of approximation in certain biological cases, notably in

the proportions of offspring of different types obtained on crossing

hybrids, and, with some limitations, to the proportions of the

two sexes at birth. It is possible, accordingly, that in these cases

all the necessary conditions are fulfilled, but this is not a necessary

inference from the mere applicability of the formulae (cf. Chap.


XIV. § 15). In the case of the sex-ratio at birth, it seems

doubtful whether the rule applies to the frequency of the sexes in

individual families of given numbers (ref. 9), but it does apply

fairly closely to the sex-ratios of births in different localities,

and still more closely to the ratios in one locality during

successive periods. That is to say, if we note the number of

males in a series of groups of n births each, the standard-deviation

of that number is approximately $\sqrt{npq}$, where p is the chance

of a male birth; or, otherwise, $\sqrt{pq/n}$ is the standard-deviation

of the proportion of male births. We are not able to assign an

a priori value to the chance p as in the case of dice-throwing,

but it is quite sufficiently accurate for practical purposes to use

the proportion of male births actually observed if that proportion

be based on a moderately large number of observations.

10. In Table VI. of Chap. IX. (p. 163) was given a correlation-table

between the total numbers of births in the registration-districts

of England and Wales during the decade 1881-90 and the pro-

portion of male births. The table below gives some similar figures,

based on the same data, for a few isolated groups of districts con-

taining not less than 30 to 40 districts each. In both tables the

drop in dispersion as we pass from the small to the large districts

is extremely striking. The actual standard-deviations, and the

standard-deviations of simple sampling corresponding to the mid-

numbers of births, are given at the foot of the table, and it will

be seen that the two agree, on the whole, with surprising closeness,

considering the small numbers of observations. The actual

standard-deviation is, however, the larger of the two in every case

but one. The corresponding standard-deviations for Table VI. of

Chap. IX. are given in Qu. 7 at the end of this chapter, and show

the same general agreement with the standard-deviations of simple

sampling; the actual standard-deviations are, however, again, as

a rule, slightly in excess of the theoretical values.



Table showing Frequencies of Registration Districts in England and Wales with

Different Ratios of Male to Total Births during the Decade 1881-90, for

Groups of Districts with the Numbers of Births in the Decade lying between

Certain Limits. [Data based on Decennial Supplement to Fifty-fifth Annual

Report of the Registrar-General for England and Wales.]

[Table: rows, Male Births per Thousand Total Births (surviving row labels 482-3, 492-3, 494-5, 496-7, 498-9, 500-1); columns, Number of Births in Decade.]

The student should note that in both cases the standard-devia-

tions given are standard-deviations of the proportion of male

births per 1000 of all births, that is, 1000 times the values given

by equation (2). These values are given by simply substituting

the proportions per 1000 for p and q in the formula. Thus for

the first column of Table I. the proportion of males is 508 per

1000 births, the mid-number of births 2000, and therefore—

$$s = \left( \frac{508 \times 492}{2000} \right)^{1/2} = 11.2$$
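The substitution of proportions per 1000 for p and q is equivalent to multiplying the value given by equation (2) by 1000, as the following Python sketch shows. It is an added illustration; the function name `sd_per_thousand` is an assumption.

```python
import math


def sd_per_thousand(rate_per_thousand, n):
    """Standard-deviation of a proportion expressed per 1000.

    Equation (2) is applied to the proportion p = rate/1000 and the
    result is multiplied by 1000.
    """
    p = rate_per_thousand / 1000
    return 1000 * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    # First column of the table: 508 males per 1000 births, mid-number 2000.
    print(sd_per_thousand(508, 2000))
    # The same value obtained by direct substitution of 508 and 492.
    print(math.sqrt(508 * 492 / 2000))
```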

11. In the above illustration the difficulty due to the wide

variation in the number of births n in different districts has been

surmounted by grouping these districts in limited class intervals,

and assuming that it would be sufficiently accurate for practical


purposes to treat all the districts in one class as if the sex-ratios

had been based on the mid-numbers of births. Given a sufficiently

large number of observations, such a process does well enough,

though it is not very good. But if the number of observations

does not exceed, perhaps, 50 or 60 altogether, grouping is

obviously out of the question, and some other procedure must be adopted.


Suppose, then, that a series of samples have been taken from

the same material, $f_1$ samples containing $n_1$ individuals or observations

each, $f_2$ containing $n_2$, $f_3$ containing $n_3$, and so on: What

would be the standard-deviation of the observed proportions in

these samples? Evidently the square of the standard-deviation

in the first group would be $pq/n_1$, in the second $pq/n_2$, and so on:

therefore, as the means tend to the same values in all the groups,

we must have for the whole series, N being the whole number of samples,

$$N s^2 = pq \left( \frac{f_1}{n_1} + \frac{f_2}{n_2} + \frac{f_3}{n_3} + \dots \right).$$

But if H be the harmonic mean of $n_1$, $n_2$, $n_3$, ...,

$$\frac{N}{H} = \frac{f_1}{n_1} + \frac{f_2}{n_2} + \frac{f_3}{n_3} + \dots ,$$

and accordingly

$$s^2 = \frac{pq}{H} \qquad (3)$$



That is to say, where the number of observations varies from one

sample to another, the harmonic mean number of observations in

a sample must be substituted for n in equation (2).

Thus the following percentages (taken to the nearest unit) of


albinos were obtained in 121 litters from hybrids of Japanese

waltzing mice by albinos, crossed inter se (A. D. Darbishire,

Biometrika, iii. p. 30):—-


























The distribution is very irregular owing to the small numbers in

the litters, and the standard-deviation is 23.09 per cent. The

numbers of litters of different sizes were given in § 27 of Chap.

VII. p. 128, and the harmonic mean size of litter was found to be

3.53. The expected proportion of albinos is 25 per cent., and

hence the standard-deviation of sampling is

$$\left( \frac{25 \times 75}{3.53} \right)^{1/2} = 23.05 \text{ per cent.,}$$

in very close agreement with the actual value. The proportion

of albinos amongst all the offspring together was 24.7 per cent.
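The substitution of the harmonic mean for n may be sketched as follows in Python. This is an added illustration, not part of the original work: the table of litter sizes in `hypothetical_litters` is invented for the purpose, since the actual numbers of litters of each size are given elsewhere (Chap. VII.), and the function names are assumptions. The last line uses only the figures stated in the text, the harmonic mean 3.53 and the expected proportion 25 per cent.

```python
import math


def harmonic_mean(sample_sizes):
    """Harmonic mean of the numbers of observations per sample.

    sample_sizes maps each size n to the number f of samples of that size.
    """
    total = sum(sample_sizes.values())
    return total / sum(f / n for n, f in sample_sizes.items())


def sd_of_proportion(p, h):
    """Standard-deviation of the proportion with the harmonic mean h in place of n."""
    return math.sqrt(p * (1 - p) / h)


if __name__ == "__main__":
    # Hypothetical figures: 3 litters of 2, 5 litters of 4, 2 litters of 8.
    hypothetical_litters = {2: 3, 4: 5, 8: 2}
    h = harmonic_mean(hypothetical_litters)
    print(h, 100 * sd_of_proportion(0.25, h))
    # Figures stated in the text for the waltzing-mice litters.
    print(100 * sd_of_proportion(0.25, 3.53))
```

It will be noticed that the harmonic mean of the hypothetical sizes is less than their arithmetic mean, the small samples, which fluctuate most, carrying the greater weight.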

12. If one of the two proportions p and q become very small,

equation (1) may be put into an approximate form that is very

useful. Suppose p to be the proportion that becomes very small,

so that we may neglect $p^2$ compared with p: then

$$pq = p - p^2 = p \text{ approximately,}$$

and consequently we have approximately

$$\sigma_n = \sqrt{np} = \sqrt{M} \qquad (4)$$

That is to say, if the proportion of successes be small, the

standard-deviation of the number of successes is the square root of

the mean number of successes. Hence we can find the standard-

deviation of sampling even though p be unknown, provided only

we know that it is small.

Thus (ref. 14) in 10 Prussian army corps in 20 years (1875-

1894) there were 122 men killed by the kick of a horse, or, on an

average, there were 0.61 deaths from that cause in each army

corps annually. From equation (4) we accordingly have for the

standard-deviation of simple sampling

$$\sigma = (0.61)^{1/2} = 0.78.$$

The frequency-distribution of the number of deaths per army

corps per annum was







$$\sigma^2 = 0.6079, \qquad \sigma = 0.78,$$

an almost exact agreement with the standard-deviation of simple sampling.
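The comparison may be sketched in Python as follows. This is an added illustration and rests on an assumption: the frequencies in `DEATHS` are the figures commonly quoted for these 200 corps-years and are not taken from the text as it stands here; they are adopted because they reproduce the total of 122 deaths, the mean 0.61, and the value 0.6079 stated above. The function names are likewise assumptions.

```python
import math

# ASSUMED frequencies (not given in the text as preserved):
# deaths per army corps per annum -> number of corps-years.
DEATHS = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}


def mean_and_variance(freq):
    """Mean and squared standard-deviation of a frequency-distribution."""
    total = sum(freq.values())
    mean = sum(x * f for x, f in freq.items()) / total
    variance = sum(f * (x - mean) ** 2 for x, f in freq.items()) / total
    return mean, variance


def sd_exact(n, p):
    """Equation (1): sqrt(npq)."""
    return math.sqrt(n * p * (1 - p))


def sd_small_p(n, p):
    """Equation (4): sqrt(np), valid when p is small."""
    return math.sqrt(n * p)


if __name__ == "__main__":
    mean, variance = mean_and_variance(DEATHS)
    print("mean:", mean, "actual s.d.:", math.sqrt(variance))
    print("s.d. of simple sampling, sqrt(M):", math.sqrt(mean))
    # How little is lost by the approximation when p is small
    # (n and p here are arbitrary illustrative values).
    print(sd_exact(1000, 0.001), sd_small_p(1000, 0.001))
```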


13. We may now turn from these verifications of the theoretical

results for various special cases, to the use of the formulae for

checking and controlling the interpretation of statistical results.

If we observe, in a statistical sample, a certain proportion of

objects or individuals possessing some given character—say A's—

this proportion differing more or less from the proportion which

for some reason we expected, the question always arises whether

the difference may be due to the fluctuations of simple sampling

only, or may be indicative of definite differences between the

conditions in the universe from which the sample has been drawn

and the assumed conditions on which we based our expectation.


Similarly, if we observe a different proportion in one sample from

that which we have observed in another, the question again arises

whether this difference may be due to fluctuations of simple

sampling alone, or whether it indicates a difference between the

conditions subsisting in the universes from which the two samples

were drawn: in the latter case the difference is often said to be

significant. These questions can be answered, though only more

or less roughly at present, by comparing the observed difference

with the standard-deviation of simple sampling. We know

roughly that the great bulk at least of the fluctuations of samp-

ling lie within a range of ± three times the standard-deviation;

and if an observed difference from a theoretical result greatly

exceeds these limits it cannot be ascribed to a fluctuation of

"simple sampling" as defined in § 8: it may therefore be significant.

The "standard-deviation of simple sampling" being the

basis of all such work, it is convenient to refer to it by a shorter

name. The observed proportions of A's in given samples being

regarded as differing by larger or smaller errors from the true

proportion in a very large sample from the same material, the


"standard-deviation of simple sampling" may be regarded as a

measure of the magnitude of such errors, and may be called ac-

cordingly the standard error.

Three principal cases of comparison may be distinguished.

Case I.—It is desired to know whether the deviation of a certain

observed number or proportion from an expected theoretical value

is possibly due to errors of sampling.

In this case the observed difference is to be compared with the

standard error of the theoretical number or proportion, for the

number of observations contained in the sample.

Example i.—In the first illustration of § 7, 25,145 throws of a 4,

5, or 6 were made in lieu of the 24,576 expected (out of 49,152

throws altogether). The excess is 569 throws. Is this excess


possibly due to mere fluctuations of sampling?

The standard error is

$$\sigma = \sqrt{\tfrac{1}{2} \times \tfrac{1}{2} \times 49152} = 110.9.$$

The deviation observed is 5.1 times the standard error, and,

practically speaking, could not occur as a fluctuation of simple

sampling. It may perhaps indicate a slight bias in the dice.

The problem might, of course, have been attacked equally well

from the standpoint of the proportion in lieu of the absolute

number of 4's, 5's, or 6's thrown. This proportion is 0.5116 instead

of the theoretical 0.5000, difference in excess 0.0116. The

standard error of the proportion is

$$s = \sqrt{\tfrac{1}{2} \times \tfrac{1}{2} / 49152} = 0.00226,$$

and the difference observed bears the same ratio to the standard

error as before, as of course it must.

Example ii.—(Data from the Second Report of the Evolution


Committee of the Royal Society, 1905, p. 72.)

Certain crosses of Pisum sativum gave 5321 yellow and 1804

green seeds. The expectation is 25 per cent. of green seeds, or

1781. Can the divergence from the exact theoretical result have

arisen owing to errors of sampling only?

The numerical difference from the expected result is 23. The

standard error is

$$\sigma = \sqrt{0.25 \times 0.75 \times 7125} = 36.6.$$

Hence the divergence from theory is only some 3/5 of the

standard error, and may very well have arisen owing simply to

fluctuations of sampling.

Working from the observed proportion of green seeds, viz. 0.2532

instead of the theoretical 0.25, we have

$$s = \sqrt{0.25 \times 0.75 / 7125} = 0.0051,$$

and similarly the divergence from theory is only some 3/5 of the

standard error, as before.
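The procedure of Case I may be sketched in Python as follows. This is an added illustration, not part of the original work; the function names are assumptions, and the figures are those of Examples i and ii. The deviation from expectation is expressed as a multiple of the standard error, working first from the absolute number and then from the proportion.

```python
import math


def ratio_for_number(observed, n, p):
    """Deviation of the observed number from np, in units of sqrt(npq)."""
    expected = n * p
    standard_error = math.sqrt(n * p * (1 - p))
    return (observed - expected) / standard_error


def ratio_for_proportion(observed, n, p):
    """Deviation of the observed proportion from p, in units of sqrt(pq/n)."""
    standard_error = math.sqrt(p * (1 - p) / n)
    return (observed / n - p) / standard_error


if __name__ == "__main__":
    # Example i: 25,145 throws of 4, 5, or 6 out of 49,152, expectation one-half.
    print(ratio_for_number(25145, 49152, 0.5),
          ratio_for_proportion(25145, 49152, 0.5))
    # Example ii: 1804 green seeds out of 7125, expectation one-quarter.
    print(ratio_for_number(1804, 7125, 0.25),
          ratio_for_proportion(1804, 7125, 0.25))
```

The two modes of working necessarily give the same ratio, since number and proportion differ only by the constant factor n. A ratio much above 3 would point to something more than fluctuations of simple sampling; a ratio below unity gives no ground for suspecting it.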

It should be noted that this method must not be used as a test

of association by comparing the difference of (AB) from (A)(B)/N

with a standard error calculated from the latter value as a

"theoretical number," for it is not a theoretical number given

a priori as in the above illustrations, and A and B are themselves

liable to errors of sampling. If we formed an association-table

between the results of tossing two coins N times, $\sigma = \sqrt{N \cdot \tfrac{1}{4} \cdot \tfrac{3}{4}}$

would be the standard error for the divergence of (AB) from the

a priori value N/4, not the standard error for differences of (AB)

from (A)(B)/N, (A) and (B) being the numbers of heads thrown

in the case of the first and the second coin respectively.

Case II.—Two samples from distinct materials or different

universes give proportions of A's p1 and p2, the numbers of
observations in the samples being n1 and n2 respectively. (a) Can
the difference between the two proportions have arisen merely as a
fluctuation of simple sampling, the two universes being really
similar as regards the proportion of A's therein? (b) If the
difference indicated were a real one, might it vanish, owing to
fluctuations of sampling, in other samples taken in precisely the
same way? This case corresponds to the testing of an association
which is indicated by a comparison of the proportion of A's amongst
B's and β's.

(a) We have no theoretical expectation in this case as to the

proportion of A's in the universe from which either sample has

been taken.
Let us find, however, whether the observed difference between p1
and p2 may not have arisen solely as a fluctuation of simple
sampling, the proportion of A's being really the same in both cases,
and given, let us say, by the (weighted) mean proportion in our
two samples together, i.e. by

$$p_0 = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$

(the best guide that we have).

Let $\epsilon_1$, $\epsilon_2$ be the standard errors in the two samples, then

$$\epsilon_1^2 = p_0 q_0 / n_1, \qquad \epsilon_2^2 = p_0 q_0 / n_2.$$

If the samples are simple samples in the sense of the previous
work, then the mean difference between p1 and p2 will be zero,
and the standard error of the difference $\epsilon_{12}$, the samples being
independent, will be given by

$$\epsilon_{12}^2 = p_0 q_0 \left(\frac{1}{n_1} + \frac{1}{n_2}\right) \qquad (5)$$

If the observed difference is less than some three times e12 it

may have arisen as a fluctuation of simple sampling only.

(b) If, on the other hand, the proportions of A's are not the same

in the material from which the two samples are drawn, but p1 and

p2 are the true values of the proportions, the standard errors of

sampling in the two cases are

$$\epsilon_1^2 = p_1 q_1 / n_1, \qquad \epsilon_2^2 = p_2 q_2 / n_2,$$

and consequently

$$\epsilon_{12}^2 = \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2} \qquad (6)$$

If the difference between p1 and p2 does not exceed some three
times this value of $\epsilon_{12}$, it may be obliterated by an error of simple
sampling on taking fresh samples in the same way from the same
material.


Further, the student should note that the value of e12 given by

equation (6) is frequently employed, in lieu of that given by

equation (5), for testing the significance of an observed difference.

The justification of this usage we indicate briefly later (Chap.

XIV, § 3). Here it is sufficient to state that, if n be large,

equation (6) gives approximately the standard-deviation of the

true values of the difference for a given observed value, and hence,

if the observed difference is greater or less than some three times

the value of e12 given by (6), it is hardly possible that the true

value of the difference can be zero. The difference between the

values of e12 given by (5) and (6) is indeed, as a rule, of more

theoretical than practical importance, for they do not differ largely

unless p1 and p2 differ largely, and in that case either formula will

place the difference outside the range of fluctuations of sampling.

Example iii.—The following data were given in Qu. 3 of Chap.

III. for plants of Lobelia fulgens obtained by cross- and self-fertilisa-

tion respectively:—

                               Height
Parentage.            Above Average.   Below Average.
Cross-fertilised.           17               17
Self-fertilised.            12               22

The figures indicate an association between tallness and cross-

fertilisation of parentage. Is this association significant of some

real difference, or may it have arisen solely as an "error of


sampling"? The proportion of plants above average height in the

two classes (cross- and self-fertilised) together is 29/68. The

standard-deviation of the differences due to simple sampling

between the proportions of "tall" plants in two samples of 34

observations each is therefore

$$\sqrt{\frac{29}{68} \times \frac{39}{68} \times \frac{2}{34}} = 0.120,$$

or 12.0 per cent. The actual proportions observed are 50 per

cent. and 35 per cent.—difference 15 per cent. As this difference

is only slightly in excess of the standard error of the difference,

for samples of 34 observations drawn from identical material, no


definite significance could be attached to it—if it stood alone.

The student will notice, however, that all the other cases cited

from Darwin in the question referred to show an association of

the same sign, but rather more marked. Hence the difference

observed may be a real one, or perhaps the real difference may be

greater and may be partially masked by a fluctuation of sampling.

If 50 per cent. and 35 per cent. were the true proportions in the

two classes, the standard error of the percentage difference would

be, by equation (6),

$$\epsilon_{12} = \left(\frac{50 \times 50}{34} + \frac{35 \times 65}{34}\right)^{1/2} = 11.9 \text{ per cent.},$$

and consequently the actual difference might not infrequently be

completely masked by fluctuations of sampling, so long as experi-

ments were only conducted on the same small scale.

Example iv.—(Data from J. Gray, Memoir on the Pigmentation

Survey of Scotland, Jour. of the Royal Anthropological Institute,
vol. xxxvii., 1907.) The following are extracted from the tables
relating to hair-colour of girls at Edinburgh and Glasgow:

               Of Medium       Total        Per cent.
               Hair-colour.    observed.    Medium.
Edinburgh         4,008          9,743        41.1
Glasgow          17,529         39,764        44.1

Can the difference observed in the percentage of girls of medium

hair-colour have arisen solely through fluctuations of sampling?

In the two towns together the percentage of girls with medium

hair-colour is 43.5 per cent. If this were the true percentage,

the standard error of sampling for the difference between per-

centages observed in samples of the above sizes would be—

$$\epsilon_{12} = (43.5 \times 56.5)^{1/2} \times \left(\frac{1}{9743} + \frac{1}{39764}\right)^{1/2} = 0.56 \text{ per cent.}$$

The actual difference is 3.0 per cent., or over 5 times this, and
could not have arisen through the chances of simple sampling.

If we assume that the difference is a real one and calculate the

standard error by equation (6), we arrive at the same value, viz.
0.56 per cent. With such large samples the difference could not,
accordingly, be obliterated by the fluctuations of simple sampling.
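
The two standard errors of Case II. can be compared numerically on the data of Examples iii. and iv. The sketch below is illustrative only: the function names are not from the text, and it works with the exact observed counts rather than the rounded percentages, so the last digit may differ slightly from the printed figures. `se_diff_pooled` follows equation (5), using the pooled proportion p0 in both samples; `se_diff_separate` follows equation (6), treating each observed proportion as the true one.

```python
import math

def se_diff_pooled(x1, n1, x2, n2):
    """Equation (5): both samples assumed to share the pooled proportion p0."""
    p0 = (x1 + x2) / (n1 + n2)
    return math.sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))

def se_diff_separate(x1, n1, x2, n2):
    """Equation (6): each sample keeps its own proportion."""
    p1, p2 = x1 / n1, x2 / n2
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# (successes in sample 1, size of sample 1, successes in sample 2, size of sample 2)
lobelia = (17, 34, 12, 34)            # Example iii: plants above average height
hair = (4008, 9743, 17529, 39764)     # Example iv: girls of medium hair-colour

for name, (x1, n1, x2, n2) in [("Example iii", lobelia), ("Example iv", hair)]:
    diff = abs(x1 / n1 - x2 / n2)
    e5 = se_diff_pooled(x1, n1, x2, n2)
    e6 = se_diff_separate(x1, n1, x2, n2)
    print(f"{name}: difference {diff:.4f}, eq. (5) {e5:.4f}, eq. (6) {e6:.4f}, "
          f"ratio to eq. (5) {diff / e5:.2f}")
```

With the small samples of Example iii. the difference is little more than its standard error; with the large samples of Example iv. it is more than five times as great, whichever formula is used.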


Case III.—Two samples are drawn from distinct material or

different universes, as in the last case, giving proportions of

A's p1 and p2, but in lieu of comparing the proportion p1 with

p2 it is compared with the proportion of A's in the two samples

together, viz. p0, where, as before,

$$p_0 = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}.$$

Required to find whether the difference between p1 and p0 can
have arisen as a fluctuation of simple sampling, p0 being the
true proportion of A's in both samples.

This case corresponds to the testing of an association which

is indicated by a comparison of the proportion of A's amongst

the B's with the proportion of A's in the universe. The general

treatment is similar to that of Case II., but the work is complicated

owing to the fact that errors in p1 and p0 are not independent.
If $\epsilon_{01}$ be the standard error of the difference between p1 and
p0, we have at once

$$\epsilon_{01}^2 = \epsilon_0^2 + \epsilon_1^2 - 2 r_{01}\, \epsilon_0 \epsilon_1,$$

r01 being the correlation between errors of simple sampling in
p1 and p0. But, from the above equation relating p0 to p1
and p2, writing it in terms of deviations in p0, p1, and p2,
multiplying by the deviation in p1 and summing, we have,
since errors in p1 and p2 are uncorrelated,

$$r_{01} = \frac{n_1}{n_1 + n_2} \cdot \frac{\epsilon_1}{\epsilon_0} = \sqrt{\frac{n_1}{n_1 + n_2}}.$$

Therefore finally

$$\epsilon_{01}^2 = p_0 q_0\, \frac{n_2}{n_1 (n_1 + n_2)}.$$

Unless the difference between p0 and p1 exceed, say, some
three times this value of $\epsilon_{01}$, it may have arisen solely by the
chances of simple sampling.

It will be observed that if n1 be very small compared with
n2, $\epsilon_{01}$ approaches, as it should, the standard error for a sample
of n1 observations.

We omit, in this case, the allied problem whether, if the

difference between p1 and p0 indicated by the samples were

real, it might be wiped out in other samples of the same size

by fluctuations of simple sampling alone. The solution is a

little complex as we no longer have $\epsilon_0^2 = p_0 q_0 / (n_1 + n_2)$.

Example v.—Taking the data of Example iii., suppose that

we compare the proportion of tall plants amongst the offspring

resulting from cross-fertilisations (viz. 50 per cent.) with the

proportion amongst all offspring (viz. 29/68, or 42.6 per cent.).
As, in this case, both the subsamples have the same number
of observations, n1 = n2 = 34, and

$$\epsilon_{01} = \left(\frac{29}{68} \times \frac{39}{68} \times \frac{1}{68}\right)^{1/2} = 0.060,$$

or 6 per cent. As in the working of Example iii., the observed

difference is only 1.25 times the standard error of the difference,

and consequently it may have arisen as a mere fluctuation

of sampling.

Example vi.—Taking now the figures of Example iv., suppose

that we had compared the proportion of girls of medium hair-

colour in Edinburgh with the proportion in Glasgow and

Edinburgh together. The former is 41.1 per cent., the latter
43.5 per cent., difference 2.4 per cent. The standard error of

the difference between the percentages observed in the sub-

sample of 9743 observations and the entire sample of 49,507

observations is therefore

$$\epsilon_{01} = (43.5 \times 56.5)^{1/2} \times \left(\frac{39764}{9743 \times 49507}\right)^{1/2} = 0.45 \text{ per cent.}$$

The actual difference is over five times this (the ratio must, of

course, be the same as in Example iv.), and could not have occurred

as a mere error of sampling.
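
The standard error of Case III., the square root of p0 q0 n2 / [n1 (n1 + n2)], can be checked against Examples v. and vi. in the same way. This is a sketch under the assumptions of the text (simple sampling, p0 the true proportion in both samples); the function name `se_sub_vs_pooled` is illustrative only. It also verifies numerically that the ratio of the difference to its standard error is the same whether a subsample is compared with the other subsample (Case II., equation (5)) or with the two samples together (Case III.).

```python
import math

def se_sub_vs_pooled(p0, n1, n2):
    """Standard error of p1 - p0, where p0 pools samples of n1 and n2 observations."""
    return math.sqrt(p0 * (1 - p0) * n2 / (n1 * (n1 + n2)))

# Example v: Lobelia data, n1 = n2 = 34, pooled proportion 29/68.
e_v = se_sub_vs_pooled(29 / 68, 34, 34)

# Example vi: Edinburgh (n1) against Edinburgh and Glasgow together.
x1, n1, x2, n2 = 4008, 9743, 17529, 39764
p1, p2 = x1 / n1, x2 / n2
p0 = (x1 + x2) / (n1 + n2)
e_vi = se_sub_vs_pooled(p0, n1, n2)

# Ratio of difference to standard error, by Case III and by Case II (equation 5).
ratio_case3 = (p0 - p1) / e_vi
ratio_case2 = (p2 - p1) / math.sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))

print(f"Example v:  s.e. = {e_v:.3f}")
print(f"Example vi: s.e. = {e_vi:.4f}, ratio = {ratio_case3:.2f}")
print(f"Case II ratio for the same data = {ratio_case2:.2f}")
```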


The theory of sampling, for the cases dealt with in this chapter, is generally

treated by first determining the frequency-distribution of the number of

successes in a sample. This frequency-distribution is not considered till

Chapter XV., and the student will be unable to follow much of the literature

until he has read that chapter.



Experimental results of dice throwing, coin tossing, etc.

(1) Quetelet, A., Lettres .... sur la théorie des probabilités; Bruxelles,

1846 (English translation by O. G. Downes; C. & E. Layton, London,

1849). See especially letter xiv. and the table on p. 374 of the

French, p. 255 of the English, edition.

(2) Westergaard, H., Die Grundzüge der Theorie der Statistik; Fischer,

Jena, 1890.

(3) Edgeworth, F. Y., Article on the "Law of Error" in the Tenth Edition

of the Encyclopaedia Britannica, vol. xxviii., 1902, p. 280.

(4) Darbishire, A. D., "Some Tables for illustrating Statistical Correlation,"

Mem. and Proc. of the Manchester Lit. and Phil. Soc., vol. li.,


General: and applications to sex-ratio of births.

(5) Poisson, S. D., "Sur la proportion des naissances des filles et des

garçons," Mémoires de l'Acad. des Sciences, vol. ix., 1829, p. 239.

(Principally theoretical: the statistical illustrations very slight.)

(6) Lexis, W., Zur Theorie der Massenerscheinungen in der menschlichen

Gesellschaft; Freiburg, 1877.

(7) Lexis, W., Abhandlungen zur Theorie der Bevölkerungs- und Moralstatistik;

Fischer, Jena, 1903. (Contains, with new matter, reprints of

some of Professor Lexis' earlier papers in a form convenient for


(8) Edgeworth, F. Y., "Methods of Statistics," Jour. Roy. Stat. Soc.,

jubilee volume, 1885, p. 181.

(9) Venn, John, The Logic of Chance, 3rd edn. ; Macmillan, London, 1888.

(Cf. the data regarding the distribution of sexes in families on p. 264,

to which reference was made in § 9.)

(10) Pearson, Karl, "Skew Variation in Homogeneous Material," Phil.

Trans. Roy. Soc., Series A, vol. clxxxvi., 1895, p. 343. (Sections 2 to

6 on the binomial distribution.)


(11) Edgeworth, F. Y., "Miscellaneous Applications of the Calculus of

Probabilities," Jour. Roy. Stat. Soc., vols. lx., lxi., 1897-8 (especially

part ii., vol. lxi. p. 119).

(12) Vigor, H. D., and G. U. Yule, "On the Sex-ratios of Births in the

Registration Districts of England and Wales, 1881-90," Jour. Roy.

Stat. Soc., vol. lxix., 1906, p. 576. (Use of the harmonic mean as in


As regards the sex-ratio, reference may also be made to papers in

vols. v. and vi. of Biometrika by Heron, Weldon, and Woods.

The law of small chances (§ 12).

(13) Poisson, S. D., Recherches sur la probabilité des jugements, etc.; Paris,

1837. (Pp. 205-7.)

(14) Bortkewitsch, L. von, Das Gesetz der kleinen Zahlen; Teubner,

Leipzig, 1898.

(15) Student, "On the Error of Counting with a Hæmacytometer," Biometrika,

vol. v. p. 351, 1907.

(16) Rutherford, E., and H. Geiger, with a note by H. Bateman,

"The probability variations in the distribution of α particles," Phil.

Mag., Series 6, vol. xx., 1910, p. 698. (The frequency of particles

emitted during a small interval of time follows the law of small

chances : the law deduced by Bateman in ignorance of previous work.)




1. (Ref. 4: total of columns of all the 13 tables given.)

Compare the actual with the theoretical mean and standard-deviation for

the following record of 6500 throws of 12 dice, 4, 5, or 6 being reckoned

as a "success."

Successes. Frequency.

Successes. Frequency.
















Total 6500

2. (Ref. 1.)

Balls were drawn from a bag containing equal numbers of black and white

balls, each ball being returned before drawing another. The records were then

grouped by counting the number of black balls in consecutive 2's, 3's, 4's, 5's,

etc. The following give the distributions so derived for grouping by 5's, 6's,

and 7's. Compare actual with theoretical means and standard-deviations.


(a) Grouping by Fives. (b) Grouping by Sixes. (c) Grouping by Sevens.

















4. The proportion of successes in the data of Qu. 1 is 0.5097. Find the

standard-deviation of the proportion with the given number of throws, and state

whether you would regard the excess of successes as probably significant of bias

in the dice.

5. In the 4096 drawings on which Qu. 2 is based 2030 balls were black

and 2066 white. Is this divergence probably significant of bias?

6. If a frequency-distribution such as those of Questions 1, 2, and 3 be given,

show how n and p, if unknown, may be approximately determined from the

mean and standard-deviation of the distribution.

Find n and p in this way from the data of Qu. 1 and Qu. 3.

7. Verify the following results for Table VI. of Chapter IX. p. 163, and

compare the results of the different grouping of the table on p. 259. In


calculating the actual standard-deviation, use Sheppard's correction for

grouping (p. 208).



[Table for Qu. 7, entries not recoverable. Surviving column headings: Row or Rows; deviation s; deviation of Sampling s0. Surviving row labels: 5; 6, 7; 8, 9, 10, 11; 12, 13, 14; 15 and upwards. One surviving entry: 509 5.]




8. In a case of mice-breeding (see reference given in § 11) the harmonic

mean number in a litter was 4.735, and the expected proportion of albinos

50 per cent. Find the standard-deviation of simple sampling for the pro-

portion of albinos in a litter, and state whether the actual standard-deviation

(21.63 per cent.) probably indicates any real variation, or not.

9. (Data from Report i., Evolution Committee of the Royal Society, p. 17.)

In breeding certain stocks 408 hairy and 126 glabrous plants were obtained.

If the expectation is one-fourth glabrous, is the divergence significant, or might

it have occurred as a fluctuation of sampling?

10. (Data of Example viii. and Qu. 5, Chap. III.) Is the association in



1. Warning as to the assumption that three times the standard error gives the
range for the majority of fluctuations of simple sampling of either sign.
2. Warning as to the use of the observed for the true value of p in
the formula for the standard error. 3. The inverse standard error, or
standard error of the true proportion for a given observed proportion:
equivalence of the direct and inverse standard errors when n is large.
4-8. The importance of errors other than fluctuations of "simple
sampling" in practice: unrepresentative or biassed samples. 9-10.
Effect of divergences from the conditions of simple sampling: (a)
effect of variation in p and q for the several universes from which the
samples are drawn. 11-12. (b) Effect of variation in p and q from one
sub-class to another within each universe. 13-14. (c) Effect of a
correlation between the results of the several events. 15. Summary.

1. There are two warnings as regards the methods adopted in

the examples in the concluding section of the last chapter

which the student should note, as they may become of importance

when the number of observations is small. In the first place, he

should remember that, while we have taken three times the

standard error as giving the limits within which the great
majority of errors of sampling of either sign are contained,
the limits are not, as a rule, strictly the same for positive and
for negative errors. As is evident from the examples of actual
distributions in § 7, Chap. XIII., the distribution of errors is not
strictly symmetrical unless p = q = 0.5. No theoretical rule as
to the limits can be given, but it appears from the examples
referred to and from the calculated distributions in Chap. XV.
§ 3, that a range of three times the standard error includes
the great majority of the deviations in the direction of the
longer "tail" of the distribution, while the same range on the
shorter side may extend beyond the limits of the distribution
altogether. If, therefore, p be less than 0.5, our assumed range
may be greater than is possible for negative errors, or if p be
greater than 0.5, greater than is possible for positive errors. The
assumption is not, however, likely as a rule to lead to a serious
mistake; as stated at the commencement of this paragraph, the
point is of importance only when n is small, for when n is large the
distribution tends to become sensibly symmetrical even for values
of p differing considerably from 0.5. (Cf. Chap. XV. for the
properties of the limiting form of distribution.)
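
The point may be illustrated numerically. The following sketch assumes that the number of successes follows the binomial distribution (treated formally in Chap. XV.); the function names and the particular values n = 20, n = 400 and p = 0.1 are chosen for illustration and are not taken from the text. For each case it gives the range of three times the standard error about the mean, and the total frequency of the distribution lying outside that range on either side.

```python
import math

def binomial_pmf(n, p, k):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def three_sigma_check(n, p):
    """Return (low, high, frequency below low, frequency above high)."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    low, high = mean - 3 * sd, mean + 3 * sd
    below = sum(binomial_pmf(n, p, k) for k in range(n + 1) if k < low)
    above = sum(binomial_pmf(n, p, k) for k in range(n + 1) if k > high)
    return low, high, below, above

small = three_sigma_check(20, 0.1)    # small n: markedly skew distribution
large = three_sigma_check(400, 0.1)   # large n: sensibly symmetrical

for label, (low, high, below, above) in [("n = 20", small), ("n = 400", large)]:
    print(f"{label}: range {low:.2f} to {high:.2f}, "
          f"frequency below {below:.5f}, above {above:.5f}")
```

For n = 20 the lower limit is negative, that is, beyond the limits of the distribution altogether, so that no negative error of that size is possible; for n = 400 the range lies wholly within the possible values and errors of both signs fall outside it.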

2. In the second place, the student should note that, where we
were unable to assign any a priori value to p, we have assumed
that it is sufficiently accurate to replace p in the formula for the
standard error by the proportion actually observed, say π.
Where n is large so that the standard error of p becomes small
relatively to the product pq the assumption is justifiable, and no
serious error is possible. If, however, n be small, the use of the
observed value π may lead to an under- or over-estimation of the
standard error which cannot be neglected. To get some rough
idea of the possible importance of such effects, the approximate
standard error ε may first be calculated as usual from the
observed proportion π, and then fresh values recalculated, replacing
π by π ± 3ε. It should be remembered that the maximum
value of the product pq is given by p = q = 0.5, and hence these
values, if within the limits of fluctuations of sampling, will give
one limiting value for the standard error. The procedure is by
no means exact, but may serve to give a useful warning.

Thus in Example iii. of Chap. XIII. the observed proportion of

tall plants is 29/68, or, say, 43 per cent. The standard error of

this proportion is 6 per cent., and a true proportion of 50 per

cent. is therefore well within the limits of fluctuations of sampling.

The maximum value of the standard error is therefore

$$\left(\frac{50 \times 50}{68}\right)^{1/2} = 6.06 \text{ per cent.}$$

On the other hand, the standard error is unlikely to be lower
than that based on a proportion of 43 - 18 = 25 per cent.,

$$\left(\frac{25 \times 75}{68}\right)^{1/2} = 5.25 \text{ per cent.}$$
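
The rough procedure of this section may be sketched as follows. The function names are illustrative only, and the sketch makes the assumptions described above: the standard error is first found from the observed percentage, the percentage is then replaced by the observed value plus and minus three times that standard error, and 50 per cent. is included as a candidate whenever it falls inside that range, since it gives the greatest possible value of pq. Exact figures are used rather than the rounded 43 and 18 per cent., so the lower limit comes out slightly below the 5.25 of the text.

```python
import math

def se_percent(pi_percent, n):
    """Standard error, in per cent., of a percentage observed in n cases."""
    return math.sqrt(pi_percent * (100 - pi_percent) / n)

def se_limits(pi_percent, n):
    """Rough lower and upper limits for the standard error (not an exact method)."""
    e = se_percent(pi_percent, n)
    low_p = max(pi_percent - 3 * e, 0.0)
    high_p = min(pi_percent + 3 * e, 100.0)
    candidates = [low_p, high_p]
    if low_p <= 50 <= high_p:
        candidates.append(50.0)   # pq is greatest at p = q = 0.5
    values = [se_percent(c, n) for c in candidates]
    return min(values), max(values)

# Example iii of Chap. XIII: 29 tall plants out of 68.
observed = 100 * 29 / 68
lower, upper = se_limits(observed, 68)
print(f"observed {observed:.1f} per cent., s.e. {se_percent(observed, 68):.2f}")
print(f"s.e. probably between {lower:.2f} and {upper:.2f} per cent.")
```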

3. The two difficulties mentioned in §§ 1 and 2 arise when n,
the number of cases in the sample, is small. The interpretation
of the value of the standard error is also more limited in this
case than when n is large. Suppose a large number of observations
to be made, by means of samples of n observations each, on
different masses of material, or in different universes, for each of
which the true value of p is known. On these data we could
form a correlation-table between the true proportion p in a given
universe and the observed proportion π in a sample of n observations
drawn therefrom. What we have found from the work of
the last chapter is that the standard-deviation of an array of π's
associated with a certain true value p, in this table, is $(pq/n)^{1/2}$;
but the question may be asked: What is the standard-deviation
of the array at right angles to this, i.e. the array of p's associated
with a certain observed proportion π? In other words, given an
observed proportion π, what is the standard-deviation of the true
proportions? This is the inverse of the problem with which we
have been dealing, and it is a much more difficult problem.

On general principles, however, we can see that if n be large,
the two standard-deviations will tend, on the average of all
values of p, to be nearly the same, while if n be small the
standard-deviation of the array of π's will tend to be appreciably the
greater of the two. For if π = p + δ, δ is uncorrelated with p,
and therefore if $\sigma_p$ be the standard-deviation of p in all the
universes from which samples are drawn, $\sigma_\pi$ the standard-deviation
of observed proportions in the samples, and $\sigma_\delta$ the
standard-deviation of the differences,

$$\sigma_\pi^2 = \sigma_p^2 + \sigma_\delta^2.$$

But $\sigma_\delta^2$ varies inversely as n. Hence if n become very large, $\sigma_\delta$
becomes very small, $\sigma_\pi$ becomes sensibly equal to $\sigma_p$, and therefore
the standard-deviations of the arrays, on an average, are also
sensibly equal. If n be large, therefore, $[\pi(1 - \pi)/n]^{1/2}$ may be
taken as giving, with sufficient exactness, the standard-deviation
of the true proportion p for a given observed proportion π. But
if n be small, $\sigma_\delta$ cannot be neglected in comparison with $\sigma_p$; $\sigma_\pi$ is
therefore appreciably greater than $\sigma_p$, and the standard-deviation
of the array of π's is, on an average of all arrays, correspondingly
greater than the standard-deviation of the array of p's (the statement
is not true for every pair of corresponding arrays, especially
for extreme values of p near 0 and 1). Further, it should be
noticed that, while the regression of π on p is unity (i.e. the
mean of the array of π's is identical with p, the type of the
array), the regression of p on π is less than unity. If we assume,
therefore, that a tabulation of all possible chances, observed
for every conceivable subject, would give a distribution of p
ranging uniformly between 0 and 1, or indeed grouped symmetrically
in any way round 0.5, any observed value π greater than
0.5 will probably correspond to a true value of p slightly lower
than π, and conversely. We have already referred to the use of
the inverse standard error in § 13 of Chap. XIII. (Case II., p. 265).
If we determine, for example, the standard error of the difference
between two observed proportions by equation (6) of that chapter,
this may be taken, provided n be large, as approximately the
standard-deviation of true differences for the given observed
difference.
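
The relation between the variances, and the statement that the regression of p on π is less than unity, can be made concrete under one particular assumption, namely that p is distributed uniformly between 0 and 1 over the universes sampled. That assumption, the function name `uniform_case`, and the sample sizes used are illustrative and go beyond the text. On this assumption the variance of p is 1/12 and the mean value of pq is 1/6, so that the variance of the sampling deviation δ is 1/(6n); since δ is uncorrelated with p, the regression of p on π is the variance of p divided by the variance of π.

```python
from fractions import Fraction

def uniform_case(n):
    """Variances and regression slope when p is uniform on [0, 1] (an assumption)."""
    var_p = Fraction(1, 12)            # variance of p, uniform between 0 and 1
    var_delta = Fraction(1, 6) / n     # mean of pq/n over the universes
    var_pi = var_p + var_delta         # delta is uncorrelated with p
    slope_p_on_pi = var_p / var_pi     # covariance of p and pi equals var_p
    return var_pi, var_delta, slope_p_on_pi

for n in (10, 100, 1000):
    var_pi, var_delta, slope = uniform_case(n)
    print(f"n = {n}: variance of pi = {var_pi}, of delta = {var_delta}, "
          f"regression of p on pi = {slope}")
```

The slope works out to n/(n + 2), which is always less than unity but approaches it as n becomes large, in agreement with the general argument.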


4. The use of standard errors must be exercised with care. It

is very necessary to remember the limited assumptions on which

the theory of simple sampling is based, and to bear in mind that

it covers those fluctuations alone which exist when all the assumed

conditions are fulfilled. The formulae obtained for the standard

errors of proportions and of their differences have no bearing

except on the one question, whether an observed divergence of a

certain proportion from a certain other proportion that might be

observed in a more extended series of observations, or that has


actually been observed in some other series, might or might not

be due to fluctuations of simple sampling alone. Their use is

thus quite restricted, for in many cases of practical sampling this

is not the principal question at issue. The principal question in

many such cases concerns quite a different point, viz. whether the

observed proportion π in the sample may not diverge from the
proportion p existing in the universe from which it was drawn,
owing to the nature of the conditions under which the sample was
taken, π tending to be definitely greater or definitely less than
p. Such divergence between π and p might arise in two distinct
ways: (1) owing to variations of classification in sorting the
A's and a's, the characters not being well defined, a source of
error which we need not further discuss, but one which may lead
to serious results [cf. ref. 5 of Chap. V.]; (2) owing to either A's
or a's tending to escape the attentions of the sampler. To give

an illustration from artificial chance, if on drawing samples from

a bag containing a very large number of black and white balls

the observed proportion of black balls was π, we could not

necessarily infer that the proportion of black balls in the bag was

approximately π, even though the standard error were small, and

we knew that the proportions in successive samples were subject

to the law of simple sampling. For the black balls might be,

say, much more highly polished than the white ones, so as to

tend to escape the fingers of the sampler, or they might be re-

presented by a number of lively black insects sheltering amongst

white stones: in neither case would the ratio of black balls to

white, or of insects to stones, be represented in their proper pro-

portions. Clearly, in any parallel case, inferences as to the

material from which the sample is drawn are of a very doubtful

and uncertain kind, and it is this uncertainty whether the chance

of inclusion in the sample is the same for A's and a's, far more

than the mere divergences between different samples drawn in


the same way, which renders many statistical results based on

samples so dubious.

5. Thus in collecting returns as to family income and expendi-

ture from working-class households, the families with lower

incomes are almost certain to be under-represented; they largely

"escape the sampler's fingers" from their simple lack of ability

to keep the necessary accounts. It is almost impossible to say,

however, to what extent they are under-represented, or to form

any estimate as to the possible error when two such samples

taken by different persons at different times, or in different places,

are compared. Again, if estimates as to crop-production were

formed on the basis of a limited number of voluntary returns,

the estimates would be likely to err in excess, as the persons who


made the returns would probably include an undue proportion

of the more intelligent farmers whose crops would tend to be

above average. Whilst voluntary returns are in this way liable

to lead to more or less unrepresentative samples, compulsory

sampling does not evade the difficulty. Compulsion could not en-

sure equally accurate and trustworthy returns from illiterate

and well-educated workmen, from intelligent and unintelligent

farmers. The following of some definite rule in drawing the

sample may also produce unrepresentative samples: if samples

of fruit were taken solely from the top layers of baskets exposed

for sale, the results might be unduly favourable; if from the

bottom layer, unduly unfavourable.

6. In such cases we can see that any sample, taken in the

way supposed, is likely to be definitely biassed, in the sense

that it will not tend to include, even in the long run, equal

proportions of the A's and a's in the original material. In other

cases there may be no obvious reason for presuming such bias,


but, on the other hand, no certainty that it does not exist. Thus

if we noted the hair-colours of the children in, say, one

school in ten in a large town, the question would arise whether

this method would tend to give an unbiassed sample of all the

children. No assured answer could be given: conjectures on

the matter would be based in part on the way in which the

schools were selected, e.g. the volunteering of teachers for the work

might in itself introduce an element of bias. Again, if say

10,000 herrings were measured as landed at various North Sea

ports, and the question were raised whether the sample was

likely to be an unbiassed sample of North Sea herrings, no

assured answer could be given. There may be no definite reason

for expecting definite bias in either case, but it may exist, and

no mere examination of the sample itself can give any informa-

tion as to whether it exists or no.


7. Such an examination may be of service, however, as

indicating one possible source of bias, viz. great heterogeneity in

the original material. If, for example, in the first illustration,

the hair-colours of the children differed largely in the different

schools —much more largely than would be accounted for by

fluctuations of simple sampling— it would be obvious that one

school would tend to give an unrepresentative sample, and

questionable therefore whether the five, ten or fifteen schools

observed might not also have given an unrepresentative sample.

Similarly, if the herrings in different catches varied largely, it

would, again, be difficult to get a representative sample for a

large area. But while the dissimilarity of subsamples would

then be evidence as to the difficulty of obtaining a representative


sample, the similarity of subsamples would, of course, be no

evidence that the sample was representative, for some very

different material which should have been represented might

have been missed or overlooked.

8. The student must therefore be very careful to remember

that even if some observed difference exceed the limits of fluctua-

tion in simple sampling, it does not follow that it exceeds the

limits of fluctuation due to what the practical man would regard—

and quite rightly regard—as the chances of sampling. Further,

he must remember that if the standard error be small, it by no

means follows that the result is necessarily trustworthy: the

smallness of the standard error only indicates that it is not

untrustworthy owing to the magnitude of fluctuations of simple

sampling. It may be quite untrustworthy for other reasons:

owing to bias in taking the sample, for instance, or owing to definite

errors in classifying the A's and a's. On the other hand, of course,

it should also be borne in mind that an observed proportion is not


necessarily incorrect, but merely to a greater or less extent

untrustworthy if the standard error be large. Similarly, if an

observed proportion $\pi_1$ in a sample drawn from one universe be

greater than an observed proportion $\pi_2$ in a sample drawn from

another universe, but $\pi_1 - \pi_2$ is considerably less than three times

the standard error of the difference, it does not, of course, follow

that the true proportions for the given universes, $p_1$ and $p_2$, are

most probably equal. On the contrary, $p_1$ most likely exceeds $p_2$;

the standard error only warns us that this conclusion is more or

less uncertain, and that possibly $p_2$ may even exceed $p_1$.

9. Let us now consider the effect, on the standard-deviation of

sampling, of divergences from the conditions of simple sampling

which were laid down in § 8 of Chap. XIII.

First suppose the condition (a) to break down, so that there is

some essential difference between the localities from which, or the


conditions under which, samples are drawn, or that some essential

change has taken place during the period of sampling. We may

represent such circumstances in a case of artificial chance by

supposing that for the first $f_1$ throws of n dice the chance of

success for each die is $p_1$, for the next $f_2$ throws $p_2$, for the next $f_3$

throws $p_3$, and so on, the chance of success varying from time to

time, just as the chance of death, even for individuals of the same

age and sex, varies from district to district. Suppose, now, that

the records of all these throws are pooled together. The mean

number of successes per throw of the n dice is given by

$$M = \frac{n}{N}(f_1 p_1 + f_2 p_2 + f_3 p_3 + \dots) = n.p_0,$$

where $N = \Sigma(f)$ is the whole number of throws and $p_0$ is the mean

value $\Sigma(fp)/N$ of the varying chance p. To find the standard-

deviation of the number of successes at each throw consider that

the first set of throws contributes to the sum of the squares of

deviations an amount

$$f_1[n p_1 q_1 + n^2 (p_1 - p_0)^2],$$

$n.p_1 q_1$ being the square of the standard-deviation for these throws,

and $n(p_1 - p_0)$ the difference between the mean number of

successes for the first set and the mean for all the sets together.

Hence the standard-deviation $\sigma$ of the whole distribution is given

by the sum of all quantities like the above, or

$$N.\sigma^2 = n\,\Sigma(fpq) + n^2\,\Sigma f(p - p_0)^2.$$

Let $\sigma_p$ be the standard-deviation of p, then the last sum is

$N.n^2\sigma_p^2$, and substituting $1 - p$ for q, we have

$$\sigma^2 = n p_0 - n p_0^2 - n\sigma_p^2 + n^2\sigma_p^2$$

$$= n p_0 q_0 + n(n-1)\sigma_p^2 \qquad (1)$$

This is the formula corresponding to equation (1) of Chap.

XIII.: if we deal with the standard-deviation of the proportion

of successes, instead of that of the absolute number, we have,

dividing through by $n^2$, the formula corresponding to equation

(2) of Chap. XIII., viz.—

$$s^2 = \frac{p_0 q_0}{n} + \frac{n-1}{n}\,\sigma_p^2 \qquad (2)$$
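Equation (1) can be checked numerically by building the pooled distribution directly. The following is a minimal sketch in Python; the number of dice, the frequencies f and the chances p are arbitrary illustrative values, not data from the text, and the function names are hypothetical.

```python
from math import comb

def pooled_variance(n, freqs, probs):
    """Exact variance of the number of successes per throw of n dice,
    when f_i of the throws are made with chance of success p_i."""
    N = sum(freqs)
    # Pooled frequency-distribution of 0, 1, ..., n successes.
    dist = [
        sum(f * comb(n, k) * p**k * (1 - p) ** (n - k) for f, p in zip(freqs, probs)) / N
        for k in range(n + 1)
    ]
    mean = sum(k * w for k, w in enumerate(dist))
    return sum((k - mean) ** 2 * w for k, w in enumerate(dist))

def equation_1(n, freqs, probs):
    """n*p0*q0 + n*(n-1)*sigma_p^2, as in equation (1)."""
    N = sum(freqs)
    p0 = sum(f * p for f, p in zip(freqs, probs)) / N
    var_p = sum(f * (p - p0) ** 2 for f, p in zip(freqs, probs)) / N
    return n * p0 * (1 - p0) + n * (n - 1) * var_p

# Illustrative values only (assumed for this sketch).
n, freqs, probs = 12, [100, 200, 100], [0.2, 0.3, 0.5]
direct = pooled_variance(n, freqs, probs)
by_formula = equation_1(n, freqs, probs)
print(direct, by_formula)
```

With these assumed values the simple-sampling term is 2.6325 and the term due to the varying chance is 1.5675, so the two routes should agree at 4.2 apart from rounding in the floating-point arithmetic.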

10. If n be large and $s_0$ be the standard-deviation calculated

from the mean proportion of successes $p_0$, equation (2) is sensibly

of the form

$$s^2 = s_0^2 + \sigma_p^2$$

Table showing Frequencies of Registration Districts in England and Wales

with Different Proportions of Deaths in Childbirth (including Deaths

from Puerperal Fever) per 1000 Births in the same Year, for the same

Groups of Districts as in the Table of Chap. XIII. § 10. Data from same

source. Decade 1881-90.

[Table body not recoverable from this copy. Surviving headings: "Deaths per 1000 Births"; "Standard-deviation".]

certain registration districts of England, in § 10 of Chap. XIII.

p. 259. It will be seen that in the first group of small districts

there appears to be a significant standard-deviation of some 6

units in the proportion of male births per thousand, but in the

more urban districts this falls to 1 or 2 units; in one case only

does $s$ fall short of $s_0$. In the table on p. 279 are given some

different data relating to the deaths of women in childbirth in the

same groups of districts, and in this case the effect of definite

causes is relatively larger, as one might expect. The values of

$\sqrt{s^2 - s_0^2}$ suggest an almost uniform significant standard-deviation

$\sigma_p = 0.8$ in the deaths of women per thousand births, five out of

the eight values being very close to this average. The figures of

this case also bring out clearly one important consequence of (2),

viz. that if we make n large s becomes sensibly equal to $\sigma_p$, while

if we make n small s becomes more nearly equal to $\sqrt{p_0 q_0/n}$. Hence

if we want to know the significant standard-deviation of the pro-

portion p—the measure of its fluctuation owing to definite causes

—n should be made as large as possible; if, on the other hand, we

want to obtain good illustrations of the theory of simple sampling

n should be made small. If n be very large the actual standard-

deviation may evidently become almost indefinitely large com-

pared with the standard-deviation of sampling. Thus during the

20 years 1855-74 the death-rate in England and Wales fluctuated

round a mean value of 22.2 per thousand with a standard-deviation

of 0.86. Taking the mean population as roughly 21 millions,

the standard-deviation of sampling is approximately

$$\sqrt{\frac{22 \times 978}{21 \times 10^6}} \approx 0.032.$$

This is only about one twenty-seventh of the actual value.
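The arithmetic of this death-rate illustration may be retraced as follows. This is only a sketch of the calculation in Python, using the rounded figures quoted in the text (22 per thousand, 21 millions, 0.86); the variable names are hypothetical.

```python
from math import sqrt

rate_per_thousand = 22.0      # rounded mean death-rate used in the text
population = 21e6             # rough mean population
observed_sd = 0.86            # observed standard-deviation, per thousand

# Standard-deviation of simple sampling, in deaths per thousand:
# sqrt(p*q/n) with p and q both expressed per thousand.
sampling_sd = sqrt(rate_per_thousand * (1000 - rate_per_thousand) / population)

# How many times the observed value exceeds the sampling value.
ratio = observed_sd / sampling_sd

# Significant standard-deviation from s^2 = s0^2 + sigma_p^2.
significant_sd = sqrt(observed_sd**2 - sampling_sd**2)

print(round(sampling_sd, 3), round(ratio, 1), round(significant_sd, 3))
```

Since n is so large here, the significant standard-deviation is practically indistinguishable from the observed one, which is the point of the paragraph.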


11. Now consider the effect of altering the second condition

of simple sampling, given in § 8 (b) of Chapter XIII., viz. the

condition that the chances p and q shall be the same for every

die or coin in the set, or the circumstances that regulate the

appearance of the character observed the same for every individual

or every sub-class in each of the universes from which samples

are drawn. Suppose that in the group of n dice thrown the

chances for m1 dice are p1 q1; for m2 dice, p2 q2, and so on,

the chances varying for different dice, but being constant

throughout the experiment. The case differs from the last, as

in that the chances were the same for every die, at any one

throw, but varied from one throw to another: now they are con-

stant from throw to throw, but differ from one die to another as

they would in any ordinary set of badly made dice. Required to

find the effect of these differing chances.


For the mean number of successes we evidently have

$$M = m_1 p_1 + m_2 p_2 + m_3 p_3 + \dots = n.p_0,$$

$p_0$ being the mean chance $\Sigma(mp)/n$. To find the standard-deviation

of the number of successes at each throw, it should be noted that

this may be regarded as made up of the number of successes in

the $m_1$ dice for which the chances are $p_1$, $q_1$, together with the

number of successes amongst the $m_2$ dice for which the chances

are $p_2$, $q_2$, and so on: and these numbers of successes are all

independent. Hence

$$\sigma^2 = m_1 p_1 q_1 + m_2 p_2 q_2 + m_3 p_3 q_3 + \dots = \Sigma(mpq).$$

Substituting $1 - p$ for q, as before, and using $\sigma_p$ to denote the

standard-deviation of p,

$$\sigma^2 = n.p_0 q_0 - n\sigma_p^2 \qquad (3)$$

or if s be, as before, the standard-deviation of the proportion of

successes,

$$s^2 = \frac{p_0 q_0}{n} - \frac{\sigma_p^2}{n} \qquad (4)$$


12. The effect of the chances varying for the individual dice or

other "events" is therefore to lower the standard-deviation, as

calculated from the mean proportion p0, and the effect may

conceivably be considerable. To take a limiting case, if p be zero

for half the events and unity for the remainder, $p_0 = q_0 = \tfrac{1}{2}$, and

$\sigma_p = \tfrac{1}{2}$, so that s is zero. To take another illustration, still somewhat

extreme, if the values of p are uniformly distributed over

the whole range between 0 and 1, $p_0 = q_0 = \tfrac{1}{2}$ as before but $\sigma_p^2 =$

1/12 = 0.0833 (Chap. VIII. § 12, p. 143). Hence $s^2 = 0.1667/n$,

$s = 0.408/\sqrt{n}$, instead of $0.5/\sqrt{n}$, the value of s if the chances are

$\tfrac{1}{2}$ in every case. In most practical cases, however, the effect will be

much less. Thus the standard-deviation of sampling for a death-

rate of, say, 18 per thousand in a population of uniform age and

one sex is $(18 \times 982)^{1/2}/\sqrt{n} = 133/\sqrt{n}$. In a population of the age

composition of that of England and Wales, however, the death-

rate is not, of course, uniform, but varies from a high value in

infancy (say 150 per thousand), through very low values (2 to 4

per thousand) in childhood to continuously increasing values in

old age; the standard-deviation of the rate within such a popula-

tion is roughly about 30 per thousand. But the effect of this


variation on the standard-deviation of simple sampling is quite

small, for, as calculated from equation (4),

$$s^2 = \frac{1}{n}(18 \times 982 - 900), \qquad s \approx \frac{129.5}{\sqrt{n}},$$

as compared with $133/\sqrt{n}$.
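The numerical cases of this section all follow from equation (4), and may be reproduced with a few lines of Python. The sketch below computes only the factor multiplying $1/\sqrt{n}$; the function name is hypothetical, and the death-rate figures are the round values given in the text.

```python
from math import sqrt

def s_times_root_n(p0, var_p):
    """sqrt(p0*q0 - sigma_p^2): the factor of 1/sqrt(n) in equation (4)."""
    return sqrt(p0 * (1 - p0) - var_p)

# p is 0 for half the events and 1 for the rest: sigma_p^2 = 1/4.
half_and_half = s_times_root_n(0.5, 0.25)

# p spread uniformly over the range 0 to 1: sigma_p^2 = 1/12.
uniform = s_times_root_n(0.5, 1 / 12)

# p equal to 1/2 for every event: simple sampling.
constant = s_times_root_n(0.5, 0.0)

# Death-rate of 18 per thousand, worked in units of deaths per thousand,
# with a standard-deviation of the rate within the population of about 30.
simple_rate = sqrt(18 * 982)
varying_rate = sqrt(18 * 982 - 30**2)

print(half_and_half, round(uniform, 3), constant)
print(round(simple_rate, 1), round(varying_rate, 1))
```

The last line shows how little the large variation of the death-rate with age affects the standard-deviation of sampling: the factor falls only from about 133 to about 129.5.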

13. We have finally to pass to the third condition (c) of § 8, Chap.

XIII., and to discuss the effect of a certain amount of dependence

between the several " events" in each sample. We shall suppose,

however, that the two other conditions (a) and (b) are fulfilled,

the chances p and q being the same for every event at every trial,

and constant throughout the experiment. The problem is again

most simply treated on the lines of § 5 of the last chapter. The


standard-deviation for each event is $(pq)^{1/2}$ as before, but the events

are no longer independent: instead, therefore, of the simple

expression

$$\sigma^2 = n.pq,$$

we must have (cf. Chap. XI. § 2)

$$\sigma^2 = n.pq + 2pq(r_{12} + r_{13} + \dots + r_{23} + \dots),$$

where, r12, r13, etc. are the correlations between the results of the

first and second, first and third events, and so on—correlations

for variables (number of successes) which can only take the

values 0 and 1, but may nevertheless, of course, be treated as

ordinary variables (cf. Chap. XI. § 10). There are $n(n-1)/2$

correlation-coefficients, and if, therefore, r is the arithmetic mean

of the correlations we may write

$$\sigma^2 = npq[1 + r(n-1)] \qquad (5)$$

The standard-deviation of simple sampling will therefore be

increased or diminished according as the average correlation

between the results of the single events is positive or negative,


and the effect may be considerable, as $\sigma$ may be reduced to zero

or increased to $n(pq)^{1/2}$. For the standard-deviation of the proportion

of successes in each sample we have the equation

$$s^2 = \frac{pq}{n}[1 + r(n-1)] \qquad (6)$$

It should be noted that, as the means and standard-deviations

for our variables are all identical, r is the correlation-coefficient

for a table formed by taking all possible pairs of results in the

n events of each sample.
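That only the arithmetic mean of the correlations enters equation (5) may be seen from a small numerical case. In the Python sketch below the six pair-correlations for n = 4 events are arbitrary illustrative values (assumed, not taken from any data), and the function names are hypothetical; the sketch checks the arithmetic only and does not construct events having these correlations.

```python
from itertools import combinations

def variance_from_pairs(n, p, pair_corr):
    """n*pq + 2*pq*(r12 + r13 + ...): the sum over all pairs of events."""
    pq = p * (1 - p)
    return n * pq + 2 * pq * sum(pair_corr[i, j] for i, j in combinations(range(n), 2))

def equation_5(n, p, r_mean):
    """npq[1 + r(n-1)], with r the mean of the n(n-1)/2 correlations."""
    return n * p * (1 - p) * (1 + r_mean * (n - 1))

n, p = 4, 0.3
# Illustrative correlations between the results of each pair of events.
pair_corr = {
    (0, 1): 0.10, (0, 2): 0.05, (0, 3): 0.00,
    (1, 2): 0.20, (1, 3): -0.05, (2, 3): 0.06,
}
r_mean = sum(pair_corr.values()) / len(pair_corr)

print(variance_from_pairs(n, p, pair_corr), equation_5(n, p, r_mean))
# Limiting values of r: -1/(n-1) reduces the variance to zero, +1 raises it to n^2*pq.
print(equation_5(n, p, -1 / (n - 1)), equation_5(n, p, 1.0))
```

The second line of output illustrates the limits mentioned above: the standard-deviation ranges from zero up to $n(pq)^{1/2}$.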


It should also be noted that the case when r is positive covers

the departure from the rules of simple sampling discussed in

§§ 9-10: for if we draw successive samples from different records,

this introduces the positive correlation at once, even although the

results of the events at each trial are quite independent of one

another. Similarly, the case discussed in §§ 11-12 is covered by

the case when r is negative: for if the chances are not the same

for every event at each trial, and the chance of success for some

one event is above the average, the mean chance of success for the

remainder must be below it. The cases (a), (b) and (c) are, how-

ever, best kept distinct, since a positive or negative correlation

may arise for reasons quite different from those discussed in

§§ 9-12.

14. As a simple illustration, consider the important case of

sampling from a limited universe, e.g. of drawing n balls in

succession from the whole number w in a bag containing pw white

balls and qw black balls. On repeating such drawings a large

number of times, we are evidently equally likely to get a white

ball or a black ball for the first, second, or nth ball of the sample:

the correlation-table formed from all possible pairs of every sample

will therefore tend in the long-run to give just the same form of

distribution as the correlation-table formed from all possible pairs

of the w balls in the bag. But from Chap. XI. § 11 we

know that the correlation-coefficient for this table is $-1/(w-1)$,

whence

$$\sigma^2 = n.pq\left[1 - \frac{n-1}{w-1}\right] = n.pq\,\frac{w-n}{w-1}.$$

If n = 1, we have the obviously correct result that $\sigma = (pq)^{1/2}$, as

in drawing from unlimited material: if, on the other hand, n = w,

$\sigma$ becomes zero as it should, and the formula is thus checked for

simple cases. For drawing 2 balls out of 4, $\sigma$ becomes 0.816

$(npq)^{1/2}$; for drawing 5 balls out of 10, 0.745 $(npq)^{1/2}$; in the case

of drawing half the balls out of a very large number, it approximates

to $(0.5\,npq)^{1/2}$, or 0.707 $(npq)^{1/2}$.
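The result for a limited universe can be verified by enumerating every possible sample. The following Python sketch does so for a bag of 10 balls of which 4 are white (an assumed example), and also reproduces the reducing factors quoted above; the function names are hypothetical.

```python
from math import comb, sqrt

def exact_variance(w, white, n):
    """Variance of the number of white balls when n balls are drawn,
    without replacement, from w balls of which `white` are white."""
    total = comb(w, n)
    lowest = max(0, n - (w - white))
    highest = min(n, white)
    probs = {
        k: comb(white, k) * comb(w - white, n - k) / total
        for k in range(lowest, highest + 1)
    }
    mean = sum(k * pr for k, pr in probs.items())
    return sum((k - mean) ** 2 * pr for k, pr in probs.items())

def limited_universe_variance(w, white, n):
    """n*pq*(w - n)/(w - 1), i.e. equation (5) with r = -1/(w - 1)."""
    p = white / w
    return n * p * (1 - p) * (w - n) / (w - 1)

def reducing_factor(w, n):
    """Ratio of sigma to the simple-sampling value (npq)^(1/2)."""
    return sqrt((w - n) / (w - 1))

print(exact_variance(10, 4, 5), limited_universe_variance(10, 4, 5))
print(round(reducing_factor(4, 2), 3), round(reducing_factor(10, 5), 3))
print(round(reducing_factor(2_000_000, 1_000_000), 3))
```

The last line takes a universe of two million balls as a stand-in for "a very large number"; the factor is then indistinguishable from 0.707 to three places.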

In the case of contagious or infectious diseases, or of certain

forms of accident that are apt, if fatal at all, to result in whole-

sale deaths, r is positive, and if n be large (as it usually is in such

cases) a very small value of r may easily lead to a very great increase

in the observed standard-deviation. It is difficult to give a really

good example from actual statistics, as the conditions are hardly

ever constant from one year to another, but the following will

serve to illustrate the point. During the twenty years 1887-1906

there were 2107 deaths from explosions of firedamp or coal-dust

in the coal-mines of the United Kingdom, or an average of 105

deaths per annum. From § 12 of Chap. XIII. it follows that this

should be the square of the standard-deviation of simple sampling,

or the standard-deviation itself approximately 10.3. But the

square of the actual standard-deviation is 7178, or its value 84.7,

the numbers of deaths ranging between 14 (in 1903) and 317

(in 1894). This large standard-deviation, to judge from the

figures, is partly, though not wholly, due to a general tendency to

decrease in the numbers of deaths from explosions in spite of a

large increase in the number of persons employed; but even if we

ignore this, the magnitude of the standard-deviation can be


accounted for by a very small value of the correlation r, expressive

of the fact that if an explosion is sufficiently serious to be fatal to

one individual, it will probably be fatal to others also. For if $\sigma_0$

denote the standard-deviation of simple sampling, $\sigma$ the standard-

deviation of sampling given by equation (5), we have

$$r = \frac{\sigma^2 - \sigma_0^2}{(n-1)\sigma_0^2}.$$

Whence, from the above data, taking the numbers of persons

employed underground at a rough average of 560,000,

$$r = \frac{7073}{560000 \times 105} = +0.00012.$$
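The estimate of r from the colliery figures is a single line of arithmetic, retraced here in Python as a sketch. All the numbers are those quoted in the text; the variable names are hypothetical, and n - 1 is used where the text, with n so large, takes n itself.

```python
from math import sqrt

deaths_total, years = 2107, 20
sigma0_sq = 105          # mean deaths per annum, rounded as in the text
sigma_sq = 7178          # square of the actual standard-deviation
n = 560_000              # rough number of persons employed underground

mean_deaths = deaths_total / years
# Equation (5) rearranged: sigma^2 = sigma0^2 [1 + r(n - 1)].
r = (sigma_sq - sigma0_sq) / ((n - 1) * sigma0_sq)

print(mean_deaths, round(sqrt(sigma0_sq), 1), round(sqrt(sigma_sq), 1))
print(round(r, 5))
```

A correlation of this order, barely more than one in ten thousand, is enough to raise the standard-deviation roughly eightfold because it is multiplied by n - 1.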

15. Summarising the preceding paragraphs, §§ 9-14, we see

that if the chances p and q differ for the various universes,

districts, years, materials, or whatever they may be from which

the samples are drawn, the standard-deviation observed will be

greater than the standard-deviation of simple sampling, as

calculated from the average values of the chances: if the average

chances are the same for each universe from which a sample is

drawn, but vary from individual to individual or from one sub-

class to another within the universe, the standard-deviation

observed will be less than the standard-deviation of simple

sampling as calculated from the mean values of the chances:

finally, if p and q are constant, but the events are no longer

independent, the observed standard-deviation will be greater or

less than the simplest theoretical value according as the corre-

lation between the results of the single events is positive or

negative. These conclusions further emphasise the need for

caution in the use of standard errors. If we find that the


standard-deviation in some case of sampling exceeds the standard-

deviation of simple sampling, two interpretations are possible:

either that p and q are different in the various universes from

which samples have been drawn (i.e. that the variations are

more or less definitely significant in the sense of § 13, Chap. XIII.),

or that the results of the events are positively correlated inter

se. If the actual standard-deviation fall short of the standard-

deviation of simple sampling, two interpretations are again

possible, either that the chances p and q vary for different

individuals or sub-classes in each universe, while approximately

constant from one universe to another, or that the results of

the events are negatively correlated inter se. Even if the

actual standard-deviation approaches closely to the standard-


deviation of simple sampling, it is only a conjectural and not

a necessary inference that all the conditions of " simple sampling"

as defined in § 8 of the last chapter are fulfilled. Possibly, for

example, there may be a positive correlation r between the

results of the different events, masked by a variation of the

chances p and q in sub-classes of each universe.

Sampling which fulfils the conditions laid down in § 8 of

Chap. XIII., simple sampling as we have called it, is generally

spoken of as random sampling. We have thought it better to

avoid this term, as the condition that the sampling shall be

random—haphazard—is not the only condition tacitly assumed.


Cf. generally the references to Chap. XIII., to which may be

added:—

(1) Pearson, Karl, "On certain Properties of the Hypergeometrical Series,

and on the fitting of such Series to Observation Polygons in the Theory of

Chance," Philosophical Magazine, 5th Series, vol. xlvii., 1899, p. 236.


(An expansion of one section of ref. 10 of Chap. XIII., dealing with the

first problem of our § 14, i.e. drawing samples from a bag containing

a limited number of white and black balls, from the standpoint of the

frequency-distribution of the number of white or black balls in the



1. Referring to Question 7 of Chap. XIII., work out the values of the

significant standard-deviation $\sigma_p$ (as in § 10) for each row or group of rows

there given, but taking row 5 with rows 6 and 7.

2. For all the districts in England and Wales included in the same table

(Table VI., Chap. IX.) the standard-deviation of the proportion of male births

per 1000 of all births is 7.46 and the mean proportion of male births 509.2.

The harmonic mean number of births in a district is 5070. Find the significant

standard-deviation $\sigma_p$.

3. If for one half of n events the chance of success is p and the chance of

failure q, whilst for the other half the chance of success is q and the chance of

failure p, what is the standard-deviation of the number of successes, the events

being all independent?

4. The following are the deaths from small-pox during the 20 years

1882-1901 in England and Wales: -









































The death-rate from small-pox being very small, the rule of § 12, Chap.

XIII., may be applied to estimate the standard-deviation of simple sampling.

Assuming that the excess of the actual standard-deviation over this can be

entirely accounted for by a correlation between the results of exposure to risk

of the individuals composing the population, estimate r. The mean population

during the period may be taken in round numbers as 29 millions.




1-2. Determination of the frequency-distribution for the number of successes

in n events: the binomial distribution—3. Dependence of the form

of the distribution on p, q and n—4-5. Graphical and mechanical

methods of forming representations of the binomial distribution—

6. Direct calculation of the mean and the standard-deviation from

the distribution—7-8. Necessity of deducing, for use in many

practical cases, a continuous curve giving approximately, for large

values of n, the terms of the binomial series—9. Deduction of the

normal curve as a limit to the symmetrical binomial—10-11. The

value of the central ordinate—12. Comparison with a binomial dis-

tribution for a moderate value of n—13. Outline of the more general


conditions from which the curve can be deduced by advanced methods—

14. Fitting the curve to an actual series of observations—15. Difficulty

of a complete test of fit by elementary methods—16. The table of areas

of the normal curve and its use—17. The quartile deviation and the

"probable error"—18. Illustrations of the application of the normal

curve and of the table of areas.

1. In Chapters XIII. and XIV. the standard-deviation of the

number of successes in n events was determined for the several

more important cases, and the applications of the results indicated.

For the simpler cases of artificial chance it is possible, however, to

go much further, and determine not merely the standard-deviation

but the entire frequency-distribution of the number of "successes."

This we propose to do for the case of "simple sampling," in which

all the events are completely independent, and the chances p and

q the same for each event and constant throughout the trials.

The case corresponds to the tossing of ideally perfect coins (homo-

geneous circular discs), or the throwing of ideally perfect dice


(homogeneous cubes).

Number of Successes.

               0       1                 2                      3                 4
One event      Nq      Np
               Nq^2    Npq + Npq         Np^2
Two events     Nq^2    2Npq              Np^2
               Nq^3    Npq^2 + 2Npq^2    2Np^2q + Np^2q         Np^3
Three events   Nq^3    3Npq^2            3Np^2q                 Np^3
               Nq^4    Npq^3 + 3Npq^3    3Np^2q^2 + 3Np^2q^2    3Np^3q + Np^3q    Np^4
Four events    Nq^4    4Npq^3            6Np^2q^2               4Np^3q            Np^4

2. If we deal with one event only, we expect in N trials, Nq

failures and Np successes. Suppose we now combine with the

results of this first event the results of a second. The two events

are quite independent, and therefore, according to the rule of

independence, of the Nq failures of the first event (Nq)q will be

associated (on an average) with failures of the second event, and

(Nq)p with successes of the second event (cf. row 2 of the scheme

on p. 288). Similarly of the Np successful first events, (Np)q will

be associated (on an average) with failures of the second event

and (Np)p with successes. In trials of two events we would

therefore expect approximately $Nq^2$ cases of no success, $2Npq$

cases of one success and one failure, and $Np^2$ cases of two successes,

as in row 3 of the scheme. The results of a third event may be

combined with those of the first two in precisely the same way.

Of the $Nq^2$ cases in which both the first two events failed, $(Nq^2)q$

will be associated (on an average) with failure of the third also,

$(Nq^2)p$ with success of the third. Of the $2Npq$ cases of one

success and one failure, $(2Npq)q$ will be associated with failure

of the third event and $(2Npq)p$ with success, and similarly for

the $Np^2$ cases in which both the first two events succeeded. The

result is that in N trials of three events we should expect $Nq^3$

cases of no success, $3Npq^2$ cases of one success, $3Np^2q$ cases of two

successes, and $Np^3$ cases of three successes, as in row 5 of the

scheme. The scheme is continued for the results of a fourth

event, and it is evident that all the results are included under a

very simple rule: the frequencies of 0, 1, 2 .... successes are

given

for one event by the binomial expansion of $N(q + p)$,

for two events by the binomial expansion of $N(q + p)^2$,

for three events by the binomial expansion of $N(q + p)^3$,

for four events by the binomial expansion of $N(q + p)^4$,

and so on. Quite generally, in fact:—the frequencies of 0, 1, 2 ....

successes in N trials of n events are given by the successive terms

in the binomial expansion of $N(q + p)^n$, viz.—

$$N\left\{q^n + n\,q^{n-1}p + \frac{n(n-1)}{1 \cdot 2}\,q^{n-2}p^2 + \frac{n(n-1)(n-2)}{1 \cdot 2 \cdot 3}\,q^{n-3}p^3 + \dots\right\}$$

This is the first theoretical expression that we have obtained for

the form of a frequency-distribution.
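The step-by-step scheme described above, in which each new event splits every existing frequency into a part q that fails and a part p that succeeds, can be sketched in a few lines of Python and compared with the terms of the binomial expansion. The values N = 4096 and p = 1/4 are chosen here only so that every frequency is a whole number; the function names are hypothetical.

```python
from math import comb

def combine_one_more_event(row, p):
    """Given the frequencies of 0, 1, 2, ... successes, combine one further event."""
    q = 1 - p
    new_row = [0.0] * (len(row) + 1)
    for successes, freq in enumerate(row):
        new_row[successes] += freq * q        # the further event fails
        new_row[successes + 1] += freq * p    # the further event succeeds
    return new_row

def scheme(N, p, n):
    """Frequencies of 0..n successes in N trials of n events, built one event at a time."""
    row = [float(N)]                          # no events yet: every trial shows 0 successes
    for _ in range(n):
        row = combine_one_more_event(row, p)
    return row

def binomial_terms(N, p, n):
    """Successive terms of N(q + p)^n."""
    q = 1 - p
    return [N * comb(n, k) * q ** (n - k) * p**k for k in range(n + 1)]

for n in range(1, 5):
    print(n, scheme(4096, 0.25, n), binomial_terms(4096, 0.25, n))
```

For two events, for example, both routes give 2304, 1536 and 256, that is $Nq^2$, $2Npq$ and $Np^2$.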

3. The general form of the distributions given by such

binomial series will have been evident from the experimental

examples given in Chapter XIII., i.e. they are distributions

of greater or less asymmetry, tailing off in either direction

from the mode. The distribution is, however, of so much

importance that it is worth while considering the form in

greater detail. This form evidently depends (1) on the values

of q and p, (2) on the value of the exponent n. If p and q

are equal, evidently the distribution must be symmetrical, for



p and q may be interchanged without altering the value of

any term, and consequently terms equidistant from either

end of the series are equal. If p and q are unequal, on the

other hand, the distribution is asymmetrical, and the more

asymmetrical, for the same value of n, the greater the inequality

of the chances. The following table shows the calculated

distributions for n = 20 and values of p, proceeding by 0.1,

from 0.1 to 0.5. When p = 0.1, cases of two successes are the

A.—Terms of the Binomial Series 10,000 $(q + p)^{20}$ for Values of p

from 0.1 to 0.5. (Figures given to the nearest unit.)

Number of Successes | p = 0.1, q = 0.9 | p = 0.2, q = 0.8 | p = 0.3, q = 0.7 | p = 0.4, q = 0.6 | p = 0.5, q = 0.5

[Table body not recoverable from this copy.]



































If p = q the effect of increasing n is to raise the mean and

increase the dispersion. If p is not equal to q, however, not

only does an increase in n raise the mean and increase the

dispersion, but it also lessens the asymmetry; the greater

n, for the same value of p and q, the less the asymmetry.

Thus if we compare the first distribution of the above table

with that given by n= 100, we have the following :—


B.—Terms of the Binomial Series 10,000 $(0.9 + 0.1)^{100}$. (Figures given

to the nearest unit.)

[Table body not recoverable from this copy.]
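Terms of the kind tabulated here follow directly from the binomial expansion, and a short Python sketch suffices to compute them for n = 20 and for n = 100. The measure of asymmetry used below (the third moment about the mean divided by the cube of the standard-deviation) is an assumption of this sketch and is not a measure defined in this section; the function names are hypothetical.

```python
from math import comb

def binomial_probs(n, p):
    """Chances of 0, 1, ..., n successes in n events."""
    q = 1 - p
    return [comb(n, k) * q ** (n - k) * p**k for k in range(n + 1)]

def terms_to_nearest_unit(N, n, p):
    """Terms of N(q + p)^n rounded to the nearest unit."""
    return [round(N * w) for w in binomial_probs(n, p)]

def asymmetry(probs):
    """Third moment about the mean over the cube of the standard-deviation."""
    mean = sum(k * w for k, w in enumerate(probs))
    var = sum((k - mean) ** 2 * w for k, w in enumerate(probs))
    third = sum((k - mean) ** 3 * w for k, w in enumerate(probs))
    return third / var**1.5

# Series of 10,000 (q + p)^20 for p = 0.1, 0.2, ..., 0.5.
for p in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(p, terms_to_nearest_unit(10_000, 20, p))

# Series of 10,000 (0.9 + 0.1)^100; only the first 25 terms are of sensible size.
print(terms_to_nearest_unit(10_000, 100, 0.1)[:25])

# Asymmetry for p = 0.1 with n = 20 and with n = 100.
skew_20 = asymmetry(binomial_probs(20, 0.1))
skew_100 = asymmetry(binomial_probs(100, 0.1))
print(round(skew_20, 3), round(skew_100, 3))
```

On this measure the asymmetry for p = 0.1 falls by more than half in passing from n = 20 to n = 100, while for p = 0.5 it vanishes for every n.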
















































binomial distributions. It will have been noted that any one

term—say the rth—in one series is obtained by taking q times the

rth term together with p times the (r−1)th term of the preceding

series. Now if AP, CR (figure 46) be two verticals, and a third,

BQ, be erected between them, cutting PR in Q, so that

AB:BC::q:p, then

BQ=p.AP + q.CR.

(This follows at once on joining AR and considering the two

segments into which BQ is divided.) Consider then some

binomial, say for the case $p = \tfrac{1}{4}$, $q = \tfrac{3}{4}$. Draw a series of verticals

(the heavy verticals of fig. 47) at any convenient distance apart

on a horizontal base line, and erect other verticals (the lighter

verticals) dividing the distance between them in the ratio of

q :p, viz. 3:1. Next, choosing a vertical scale, draw the binomial

polygon for the simplest case n = 1; in the diagram N has been

taken = 4096, and the polygon is abcd, 0b = 3072, 1c = 1024. The

polygons for higher values of n may now be constructed graphically.

Mark the points where ab, bc, cd respectively cut the

intermediate verticals and project them horizontally to the right

on to the thick verticals. This gives the polygon ab'c'd'e for

n = 2. For 0b' = q.0b, 1c' = p.0b + q.1c, and so on. Similarly, if the

points where ab', b'c', etc., cut the intermediate verticals are

projected horizontally on to the thick verticals, we have the

polygon ab"c"d"e"f for n = 3. The process may be continued


indefinitely, though it will be found difficult to maintain any

high degree of accuracy after the first few constructions.





5. The mechanical method of constructing the representation of

a binomial series is indicated diagrammatically by fig. 48. The



apparatus consists of a funnel opening into a space—say a i inch in

depth—between a sheet of glass and a back-board. This space is

broken up by successive rows of wedges like 1, 2 3, 4 5 6, etc., which

will divide up into streams any granular material such as shot or

mustard seed which is poured through the funnel when the

apparatus is held at a slope. At the foot these wedges are

replaced by vertical strips, in the spaces between which the

Fig. 48.—The Pearson-Galton Binomial Apparatus.

material can collect. Consider the stream of material that

comes from the funnel and meets the wedge 1. This wedge is

set so as to throw q parts of the stream to the left and p parts

to the right (of the observer). The wedges 2 and 3 are set so as

to divide the resultant streams in the same proportions. Thus

wedge 2 throws $q^2$ parts of the original material to the left and

$qp$ to the right, wedge 3 throws $pq$ parts of the original material

to the left and $p^2$ to the right. The streams passing these wedges

are therefore in the ratio of $q^2 : 2qp : p^2$. The next row of wedges

is again set so as to divide these streams in the same proportions

as before, and the four streams that result will bear the proportions

$q^3 : 3q^2p : 3qp^2 : p^3$. The final set, at the heads of the

vertical strips, will give the streams proportions $q^4 : 4q^3p : 6q^2p^2 : 4qp^3 : p^4$,

and these streams will accumulate between the strips

and give a representation of the binomial by a kind of histogram,

as shown. Of course as many rows of wedges may be provided

as may be desired.

This kind of apparatus was originally devised by Sir Francis

Galton (ref. 1) in a form that gives roughly the symmetrical

binomial, a stream of shot being allowed to fall through rows of

nails, and the resultant streams being collected in partitioned

spaces. The apparatus was generalised by Professor Pearson,

who used rows of wedges fixed to movable slides, so that they

could be adjusted to give any ratio of q : p. (Ref. 11.)

6. The values of the mean and standard-deviation of a binomial

distribution may be found from the terms of the series directly,

as well as by the method of Chap. XIII. (the calculation was

in fact given as an exercise in Question 8, Chap. VII., and

Question 6, Chap. VIII.). Arrange the terms under each other

as in col. 1 below, and treat the problem as if it were an arith-

metical example, taking the arbitrary origin at 0 successes: as

N is a factor all through, it may be omitted for convenience.

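$$
\begin{array}{cccc}
(1) & (2) & (3) & (4) \\
\text{Frequency } f & \text{Dev. } \xi & f\xi & f\xi^2 \\[4pt]
q^n & 0 & \text{--} & \text{--} \\
n\,q^{n-1}p & 1 & n\,q^{n-1}p & n\,q^{n-1}p \\
\dfrac{n(n-1)}{1\cdot 2}\,q^{n-2}p^2 & 2 & n(n-1)\,q^{n-2}p^2 & 2n(n-1)\,q^{n-2}p^2 \\
\dfrac{n(n-1)(n-2)}{1\cdot 2\cdot 3}\,q^{n-3}p^3 & 3 & \dfrac{n(n-1)(n-2)}{1\cdot 2}\,q^{n-3}p^3 & \dfrac{3n(n-1)(n-2)}{1\cdot 2}\,q^{n-3}p^3 \\
\vdots & \vdots & \vdots & \vdots
\end{array}
$$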

The sum of col. 1 is of course unity, i.e. we are treating N as


unity, and the mean is therefore given by the sum of the terms

in col. (3). But this sum is

$$np\Big\{q^{n-1} + (n-1)q^{n-2}p + \frac{(n-1)(n-2)}{1\cdot 2}q^{n-3}p^2 + \dots\Big\} = np(q+p)^{n-1} = np.$$

That is, the mean M is np, as by the method of Chap. XIII.


The square of the standard-deviation is given by the sum of

the terms in col. (4) less the square of the mean, that is,

$$\sigma^2 = np\Big\{q^{n-1} + 2(n-1)q^{n-2}p + \frac{3(n-1)(n-2)}{1\cdot 2}q^{n-3}p^2 + \dots\Big\} - n^2p^2.$$

But the series in the bracket is the binomial series $(q+p)^{n-1}$

with the successive terms multiplied by 1, 2, 3, . . . It therefore

gives the difference of the mean of the said binomial from -1,

and its sum is therefore $(n-1)p + 1$. Therefore

$$\sigma^2 = np\{(n-1)p + 1\} - n^2p^2 = np - np^2 = npq.$$
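The same arithmetic can be checked numerically for any particular n and p. The sketch below sums the terms directly, with the origin at 0 successes and N taken as unity as above; the function name `binomial_moments` and the trial values n = 20, p = 0.3 are illustrative assumptions.

```python
from math import comb


def binomial_moments(n, p):
    """Sum of the terms, mean and variance of (q + p)^n, found from the terms directly."""
    q = 1 - p
    f = [comb(n, m) * q ** (n - m) * p ** m for m in range(n + 1)]   # col. (1)
    mean = sum(m * fm for m, fm in enumerate(f))                     # sum of col. (3)
    second = sum(m * m * fm for m, fm in enumerate(f))               # sum of col. (4)
    return sum(f), mean, second - mean ** 2


total, mean, variance = binomial_moments(20, 0.3)
print(total, mean, variance)   # compare with 1, np = 6 and npq = 4.2
```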

7. The terms of the binomial series thus afford a means of

completely describing a certain class of frequency-distributions—

i.e. of giving not merely the mean and standard-deviation in


each case, but of describing the whole form of the distribution.

If N samples of n cards each be drawn from an indefinitely large

record of cards marked with A or a, the proportion of A-cards

in the record being p, then the successive terms of the series

$N(q+p)^n$ give the frequencies to be expected in the long run of

0, 1, 2, . . . A-cards in the sample, the actual frequencies only

deviating from these by errors which are themselves fluctuations

of sampling. The three constants N, p, n, therefore, determine

the average or smoothed form of the distribution to which actual

distributions will more or less closely approximate.

Considered, however, as a formula which may be generally

useful for describing frequency-distributions, the binomial series

suffers from a serious limitation, viz. that it only applies to a

strictly discontinuous distribution like that of the number of

A-cards drawn from a record containing A's and a's, or the number

of heads thrown in tossing a coin. The question arises whether

we can pass from this discontinuous formula to an equation


suitable for representing a continuous distribution of frequency.

8. Such an equation becomes, indeed, almost a necessity for

certain cases with which we have already dealt. Consider, for

example, the frequency-distribution of the number of male births

in batches of 10,000 births, the mean number being, say, 5100.

The distribution will be given by the terms of the series

$(0.49 + 0.51)^{10000}$ and the standard-deviation is, in round numbers,

50 births. The distribution will therefore extend to some 150

births or more on either side of the mean number, and in order

to obtain it we should have to calculate some 300 terms of a

binomial series with an exponent of 10,000! This would not

only be practically impossible without the use of certain methods

of approximation, but it would give the distribution in quite


unnecessary detail: as a matter of practice, we would not have

compiled a frequency-distribution by single male births, but

would certainly have grouped our observations, taking probably

10 births as the class-interval. We want, therefore, to replace the

binomial series by some continuous curve, having approximately

the same ordinates, the curve being such that the area between

any two ordinates $y_1$ and $y_2$ will give the frequency of observations

between the corresponding values of the variable $x_1$ and $x_2$.

9. It is possible to find such a continuous limit to the binomial

series for any values of p and q, but in the present work we will

confine ourselves to the simplest case in which p = q = 0.5, and the

binomial is symmetrical. The terms of the series are

$$N\left(\tfrac{1}{2}\right)^n\Big\{1 + n + \frac{n(n-1)}{1\cdot 2} + \frac{n(n-1)(n-2)}{1\cdot 2\cdot 3} + \dots\Big\}.$$

The frequency of m successes is

$$N\left(\tfrac{1}{2}\right)^n\frac{n!}{m!\,(n-m)!},$$

and the frequency of m + 1 successes is derived from this by

multiplying it by $(n-m)/(m+1)$. The latter frequency is

therefore greater than the former so long as

$$n - m > m + 1, \quad\text{or}\quad m < \frac{n-1}{2}.$$

Suppose, for simplicity, that n is even, say equal to 2k; then the

frequency of k successes is the greatest, and its value is

$$y_0 = N\left(\tfrac{1}{2}\right)^{2k}\frac{(2k)!}{k!\,k!} \qquad (1)$$
The polygon tails off symmetrically on either side of this greatest

ordinate. Consider the frequency of k + x successes; the value is

$$y_x = N\left(\tfrac{1}{2}\right)^{2k}\frac{(2k)!}{(k+x)!\,(k-x)!} \qquad (2)$$

and therefore

$$\frac{y_x}{y_0} = \frac{k(k-1)(k-2)\dots(k-x+1)}{(k+1)(k+2)(k+3)\dots(k+x)} = \frac{\left(1-\frac{1}{k}\right)\left(1-\frac{2}{k}\right)\dots\left(1-\frac{x-1}{k}\right)}{\left(1+\frac{1}{k}\right)\left(1+\frac{2}{k}\right)\dots\left(1+\frac{x}{k}\right)} \qquad (3)$$



Now let us approximate by assuming, as suggested in § 8, that

k is very large, and indeed large compared with x, so that $(x/k)^2$

may be neglected compared with (x/k). This assumption does

not involve any difficulty, for we need not consider values of x

much greater than three times the standard-deviation or $3\sqrt{k/2}$,

and the ratio of this to k is $3/\sqrt{2k}$, which is necessarily small if k

be large. On this assumption we may apply the logarithmic series

$$\log(1+\delta) = \delta - \frac{\delta^2}{2} + \frac{\delta^3}{3} - \frac{\delta^4}{4} + \dots$$

to every bracket in the fraction (3), and neglect all terms beyond

the first. To this degree of approximation,

$$\log\frac{y_x}{y_0} = -\frac{2}{k}\big(1 + 2 + 3 + \dots + (x-1)\big) - \frac{x}{k} = -\frac{x(x-1)}{k} - \frac{x}{k} = -\frac{x^2}{k}.$$

Therefore, finally,

$$y_x = y_0\,e^{-x^2/k} = y_0\,e^{-x^2/2\sigma^2} \qquad (4)$$

where, in the last expression, the constant k has been replaced by

the standard-deviation $\sigma$, for $\sigma^2 = k/2$.

The curve represented by this equation is symmetrical about

the point x = 0, which gives the greatest ordinate y = y0. Mean,

median, and mode therefore coincide, and the curve is, in fact,

that drawn in fig. 5 and taken as the ideal form of the symmetri-

cal frequency-distribution in Chap. VI. The curve is generally

known as the normal curve of errors or of frequency, or the law

of error.

10. A normal curve is evidently defined completely by giving

the values of $y_0$ and $\sigma$ and assigning the origin of x. If we

desire to make a normal curve fit some given distribution as near

as may be, the last two data are given by the standard-deviation

and the mean respectively; the value of y0 will be given by the

fact that the areas of the two distributions, or the numbers of

observations which these areas represent, must be the same.

This condition does not, however, lead in any simple and

elementary algebraic way to an expression for y0, though such

a value could be found arithmetically to any desired degree

of approximation. For it is evident that (1) any alteration in


y0 produces a proportionate alteration in the area of the curve,

e.g. doubling y0 doubles every ordinate yx and therefore doubles

the area: (2) any alteration in $\sigma$ produces a proportionate

alteration in the area, for the values of $y_x$ are the same for the

same values of $x/\sigma$, and therefore doubling $\sigma$ doubles the distance

of every ordinate from the mean, and consequently doubles the

area. The area of the curve, or the number of observations

represented, is therefore proportional to $y_0\sigma$, or we must have

$$N = a \times y_0\sigma,$$

where a is a numerical constant. The value of a may be found

approximately by taking $y_0$ and $\sigma$ both equal to unity, calculating

the values of the ordinates $y_x$ for equidistant values of x, and

taking the area, or number of observations N, as given by the

sum of the ordinates multiplied by the interval.
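The procedure just described is easily mechanised. The sketch below takes $y_0$ and $\sigma$ as unity, so that the ordinates are $e^{-x^2/2}$, and sums them at equal steps; the function name, the default step of 0.2 and the cut-off at x = ±8 are assumptions made for the illustration.

```python
from math import exp, pi, sqrt


def area_by_ordinates(step=0.2, half_range=8.0):
    """Approximate the area of y = exp(-x^2 / 2) as (sum of ordinates) x (interval)."""
    k = round(half_range / step)
    total = sum(exp(-((i * step) ** 2) / 2) for i in range(-k, k + 1))
    return total, total * step


total, area = area_by_ordinates()
print(total, area, sqrt(2 * pi))   # the last figure is the exact constant, for comparison
```

Hand-computed tables round each ordinate, so a sum obtained from such a table may differ from this one in the last decimal places.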

11. The table below gives the values of y for values of x

proceeding by fifths of a unit; the values are, of course, the same

for positive and negative values of x. For the whole curve the

sum of the ordinates will be found to be 12.53318, the interval

being 0.2 units; the area is therefore, approximately, 2.50664,

Ordinates of the Curve $y = e^{-x^2/2}$. (For references to more extended

tables, see list on pp. 353-4.)














































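Stirling (1730). If n be large, we have, to a high degree of approximation,

$$n! = \sqrt{2\pi n}\,\frac{n^n}{e^n}.$$

Applying Stirling's theorem to the factorials in equation (1) we have

$$y_0 = \frac{N}{\sqrt{\pi k}} = \frac{N}{\sqrt{2\pi}\,\sigma} \qquad (5)$$

The complete expression for the normal curve is therefore

$$y = \frac{N}{\sqrt{2\pi}\,\sigma}\,e^{-x^2/2\sigma^2} \qquad (6)$$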

The exponent may be written $x^2/c^2$ where $c = \sqrt{2}\,\sigma$, and this is

the origin of the use of $\sqrt{2} \times \sigma$ (the "modulus") as a measure

of dispersion, of $1/(\sqrt{2}\,\sigma)$ as a measure of "precision," and of $2\sigma^2$

as "the fluctuation" (cf. Chap. VIII. § 13). The use of the factor

2 or $\sqrt{2}$ becomes meaningless if the distribution be not normal.

Another rule cited in Chap. VIII., viz. that the mean deviation

is approximately 4/5 of the standard-deviation, is strictly true

for the normal curve only. For this distribution the mean

deviation = $\sigma\sqrt{2/\pi}$ = 0.79788 . . . $\sigma$: the proof cannot be given

within the limitations of the present work. The rule that a

range of 6 times the standard-deviation includes the great

majority of the observations and that the quartile deviation is

about 2/3 of the standard-deviation were also suggested by the

properties of this curve (see below §§ 16, 17).

12. In the proof of § 9 the assumption was made that k (the

half of the exponent of the binomial) was very large compared

with x (any deviation that had to be considered). In point

of fact, however, the normal curve gives the terms of the

symmetrical binomial surprisingly closely even for moderate

values of n. Thus if n = 64, k = 32, and the standard-deviation

is 4. Deviations x have therefore to be considered up to ±12


or more, which is over 1/3 of k. As will be seen, however, from

the annexed table, the ordinates of the normal curve agree with

those of the binomial to the nearest unit (in 10,000 observations)

up to x= ±15. The closeness of approximation is partly due

to the fact that, in applying the logarithmic series to the

fraction on the right of equation (3), the terms of the second

order in expansions of corresponding brackets in numerator and

denominator cancel each other: these terms, therefore, do not


accumulate, but only the terms of the third order. There is

only one second-order term that has been neglected, viz. that due

to the last bracket in the denominator. Even for much lower

values of n than that chosen for the illustration—e.g. 10 or 12

(cf. Qu. 4 at the end of this chapter)—the normal curve still

gives a very fair approximation.
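The comparison for n = 64 can be reproduced directly. The following sketch computes, for N = 10,000, the exact term of the symmetrical binomial at k + x successes and the corresponding ordinate of the normal curve with $\sigma = 4$; the function names are illustrative assumptions.

```python
from math import comb, exp, pi, sqrt

N, n = 10_000, 64
k, sigma = n // 2, sqrt(n) / 2            # k = 32, sigma = 4


def binomial_ordinate(x):
    """Exact frequency of k + x successes in N sets of n trials with p = q = 1/2."""
    return N * comb(n, k + x) / 2 ** n


def normal_ordinate(x):
    """Ordinate of the normal curve with the same N, mean and standard-deviation."""
    return N / (sqrt(2 * pi) * sigma) * exp(-x * x / (2 * sigma ** 2))


for x in range(0, 16):
    print(x, round(binomial_ordinate(x), 1), round(normal_ordinate(x), 1))
```

Changing `n` to 10 or 12 shows how the agreement behaves for the lower values mentioned above.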

Table showing (1) Ordinates of the Binomial Series $10{,}000\,(\tfrac{1}{2} + \tfrac{1}{2})^{64}$ and

(2) Corresponding Ordinates of the Normal Curve $y = \dfrac{10{,}000}{4\sqrt{2\pi}}\,e^{-x^2/32}$






























































Lotus, Pearl, American Naturalist, Nov. 1906). The question

arises, therefore, why, in such cases, the distribution should be

approximately normal, a form of distribution which we have only

shown to arise if the variable is the sum of a large number of

elements, each of which can take the values 0 and 1 (or other two

constant values), these values occurring independently, and with

equal frequency.

In the first place, it should be stated that the conditions of the

deduction given in § 9 were made a little unnecessarily restricted,








Fig. 49.—The Distribution of Stature for Adult Males in the British Isles

(fig. 6, p. 89), fitted with a Normal Curve: to avoid confusing the

figure, the frequency-polygon has not been drawn in, the tops of the

ordinates being shown by small circles. (Horizontal axis: stature in inches.)

with a view to securing simplicity of algebra. The deduction

may be generalised, whilst retaining the same type of proof, by

assuming that p and q are unequal (provided p - q be small

compared with $\sqrt{npq}$, cf. § 3), that p and q are not quite the

same for all the events, that all the events are not quite inde-

pendent, or that n is not large, but that some sort of continuous

variation is possible in the values of the elementary variables,

these being no longer restricted to 0 and 1, or two other discrete

values. (Cf. the deduction given by Pearson in ref. 11.) Pro-

ceeding further from this last idea, the deduction may be rendered


more general still, without introducing the conception of the

binomial at all, by founding the curve on more or less complex

cases of the theory of sampling for variables instead of for

attributes. If a variable is the sum (or, within limits, some

slightly more complicated function) of a large number of other

variables, then the distribution of the compound or resultant

variable is normal, provided that the elementary variables are

independent, or nearly so. The forms of the frequency-distribu-

tions of the elementary variables affect the final distribution less

and less as their number is increased: only if their number is

moderate, and the distributions all exhibit a comparatively high

degree of asymmetry of uniform sign, will the same sign of

asymmetry be sensibly evident in the distribution of the compound


variable. On this sort of hypothesis, the expectation of normality

in the case of stature may be based on the fact that it is a highly

compound character—depending on the sizes of the bones of the

head, the vertebral column, and the legs, the thickness of the

intervening cartilage, and the curvature of the spine—the elements

of which it is composed being at least to some extent independent,

i.e. by no means perfectly correlated with each other, and their

frequency-distributions exhibiting no very high degree of asym-

metry of one and the same sign. The comparative rarity of

normal distributions in economic statistics is probably due in part

to the fact that in most cases, while the entire causation is

certainly complex, relatively few causes have a largely predominant

influence (hence also the frequent occurrence of irregular

distributions in this field of work), and in part also to a high

degree of asymmetry in the distributions of the elements on which

the compound variable depends. Errors of observation may in

general be regarded as compounded of a number of elements, due


to various causes, and it was in this connection that the normal

curve was first deduced, and received its name of the curve of

errors, or law of error.
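The statement that the asymmetry of the elements matters less and less as their number increases can be illustrated numerically. The sketch below compounds a hypothetical skew element (values 0 to 3 with the frequencies shown, chosen only for illustration) with itself by exact convolution, and prints the skewness of the sum, measured here by the third moment about the mean divided by the cube of the standard-deviation.

```python
def convolve(a, b):
    """Distribution of the sum of two independent variables with distributions a and b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def skewness(dist):
    """Third moment about the mean divided by the cube of the standard-deviation."""
    mean = sum(i * f for i, f in enumerate(dist))
    var = sum((i - mean) ** 2 * f for i, f in enumerate(dist))
    third = sum((i - mean) ** 3 * f for i, f in enumerate(dist))
    return third / var ** 1.5


element = [0.6, 0.25, 0.1, 0.05]      # hypothetical asymmetrical element
total = element
for count in range(2, 33):
    total = convolve(total, element)   # total is now the sum of `count` elements
    if count in (2, 8, 32):
        print(count, round(skewness(total), 3))
```

For independent elements of identical form the skewness of the sum falls off as the inverse square root of their number, which is why only a moderate number of strongly asymmetrical elements leaves a sensible asymmetry in the result.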

14. If it be desired to compare some actual distribution

with the normal distribution, the two distributions should be

superposed on one diagram, as in fig. 49, though, of course, on

a much larger scale. When the mean and standard-deviation

of the actual distribution have been determined, y0 is given by

equation (5); the fit will probably be slightly closer if the

standard-deviation is adjusted by Sheppard's correction (Chap.

XI. § 3). The normal curve is then most readily drawn by plot-

ting a scale showing fifths of the standard-deviation along the

base line of the frequency diagram, taking the mean as origin,

and marking over these points the ordinates given by the figures

of the table on p. 299, multiplied in each case by y0. The curve


can be drawn freehand, or by aid of a curve ruler, through the

tops of the ordinates so determined. The logarithms of y in the

table on p. 299 are given to facilitate the multiplication. The only

point in which the student is likely to find any difficulty is

in the use of the scales: he must be careful to remember

that the standard-deviation must be expressed in terms of the

class-interval as a unit in order to obtain for y0 a number of

observations per interval comparable with the frequencies of his


The process may be varied by keeping the normal curve

drawn to one scale, and redrawing the actual distribution

so as to make the area, mean, and standard-deviation the

same. Thus suppose a diagram of a normal curve was printed

once for all to a scale, say, of $y_0$ = 5 inches, $\sigma$ = 1 inch, and

it were required to fit the distribution of stature to it.

Since the standard-deviation is 2.57 inches of stature, the

scale of stature is 1 inch = 2.57 inches of stature, or 0.389 inches

= 1 inch of stature; this scale must be drawn on the base of the

normal-curve diagram, being so placed that the mean falls

at 67.46. As regards the scale of frequency-per-interval, this

is given by the fact that the whole area of the polygon showing

the actual distribution must be equal to the area of the

normal curve, that is $5\sqrt{2\pi}$ = 12.53 square inches. If, therefore,

the scale required is n observations per interval to the inch,

we have, the number of observations being 8585,

$$\frac{8585}{n \times 2.57} = 12.53,$$

which gives n = 266.6.
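The scale arithmetic of the second method can be set out as a short calculation. This is a sketch only; the variable names are illustrative, and the printed curve is assumed, as in the text, to have a central ordinate of 5 inches and a standard-deviation of 1 inch on paper.

```python
from math import pi, sqrt

N, sigma_stature = 8585, 2.57          # observations; s.d. in inches of stature
y0_paper, sigma_paper = 5.0, 1.0       # dimensions of the printed curve, in inches

area = y0_paper * sigma_paper * sqrt(2 * pi)      # area of the printed curve, sq. in.
stature_scale = sigma_paper / sigma_stature       # paper inches per inch of stature
obs_per_inch = N * stature_scale / area           # observations per interval to the inch

print(round(area, 2), round(stature_scale, 3), round(obs_per_inch, 1))
```

The last figure differs slightly from the one obtained by hand because the area is not rounded to 12.53 before dividing.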

Though the second method saves curve drawing, the first,

on the whole, involves the least arithmetic and the simplest


15. Any plotting of a diagram, or the equivalent arithmetical

comparison of actual frequencies with those given by the

fitted normal distribution, affords, of course, in itself, only a

rough test, of a practical kind, of the normality of the given

distribution. The question whether all the observed differences

between actual and calculated frequencies, taken together,

may have arisen merely as fluctuations of sampling, so that the

actual distribution may be regarded as strictly normal, neglecting

such errors, is a question of a kind that cannot be answered in

an elementary work (cf. ref. 19). At present the student is in

a position to compare the divergences of actual from calculated

frequencies with fluctuations of sampling in the case of single

class-intervals, or single groups of class-intervals only. If the


expected theoretical frequency in a certain interval is f, the

standard error of sampling is $\sqrt{f(N-f)/N}$; and if the divergence

of the observed from the theoretical frequency exceed some

three times this standard error, the divergence is unlikely to

have occurred as a mere fluctuation of sampling.

It should be noted, however, that the ordinate of the normal

curve at the middle of an interval does not give accurately the

area of that interval, or the number of observations within it: it

would only do so if the curve were sensibly straight. To deal

strictly with problems as to fluctuations of sampling in the

frequencies of single intervals or groups of intervals, we require,

accordingly, some convenient means of obtaining the number of

observations, in a given normal distribution, lying between any


two values of the variable.

16. If an ordinate be erected at a distance $x/\sigma$ from the mean,

in a normal curve, it divides the whole area into two parts, the

ratio of which is evidently, from the mode of construction of the

curve, independent of the values of $y_0$ and of $\sigma$. The calculation

of these fractions of area for given values of $x/\sigma$, though a long

and tedious matter, can thus be done once for all, and a table

giving the results is useful for the purpose suggested in § 15 and

in many other ways. References to complete tables are cited at

the end of this work (list of tables, pp. 353-4), the short table below

being given only for illustrative purposes. The table shows the

greater fraction of the area lying on one side of any given ordinate;

e.g. 0.53983 of the whole area lies on one side of an ordinate at

0.1$\sigma$ from the mean, and 0.46017 on the other side. It will be

seen that an ordinate drawn at a distance from the mean equal to

the standard-deviation cuts off some 16 per cent. of the whole

area on one side; some 68 per cent. of the area will therefore be

contained between ordinates at ± $\sigma$. An ordinate at twice the

standard-deviation cuts off only 2.3 per cent., and therefore some

95.4 per cent. of the whole area lies within a range of ± 2$\sigma$. At

three times the standard-deviation the fraction of area cut off is

reduced to 135 parts in 100,000, leaving 99.7 per cent. within a

range of ± 3$\sigma$. This is the basis of our rough rule that a range

of 6 times the standard-deviation will in general include the

great bulk of the observations: the rule is founded on, and is only

strictly true for, the normal distribution. For other forms of

distribution it need not hold good, though experience suggests

that it more often holds than not. The binomial distribution,

especially if p and q be unequal, only becomes approximately normal

when n is large, and this limitation must be remembered in applying

the table given, or similar more complete tables, to cases in which

the distribution is strictly binomial.
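Where a table of areas is not at hand, the fractions can be computed from the error function, which Python provides as `math.erf`. The sketch below is one way of doing so; the function name `greater_fraction` is an illustrative assumption.

```python
from math import erf, sqrt


def greater_fraction(t):
    """Greater fraction of the area of a normal curve to one side of an ordinate at t = x/sigma."""
    return 0.5 * (1 + erf(abs(t) / sqrt(2)))


for t in (0.1, 1, 2, 3):
    tail = 1 - greater_fraction(t)           # fraction cut off beyond the ordinate
    print(t, round(greater_fraction(t), 5), round(tail, 5), round(1 - 2 * tail, 4))
```

The last column is the fraction of the area lying within a range of ± t times the standard-deviation.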



Table showing the Greater Fraction of the Area of a Normal Curve to One

Side of an Ordinate of Abscissa $x/\sigma$. (For references to more extended

tables, see list on pp. 353-4.)





























































unreliability of observed statistical results, and the term probable

error is given to this quantity. It should be noted that the word

"probable" is hardly used in its usual sense in this connection:

the probable error is merely a quantity such that we may expect

greater and less errors of simple sampling with about equal

frequency, provided always that the distribution of errors is

normal. On the whole, the use of the "probable error" has little

advantage compared with the standard error, and consequently little

stress is laid on it in the present work ; but the term is in constant

use, and the student must be familiar with it.

It is true that the "probable error" has a simpler and more direct

significance than the standard error, but this advantage is lost as

soon as we come to deal with multiples of the probable error.


Further, the best modern tables of the ordinates and area of the

normal curve are given in terms of the standard-deviation or

standard error, not in terms of the probable error, and the multiplication

of the former by 0.6745, to obtain the probable error,

is not justified unless the distribution is normal. For very large

samples the distribution is approximately normal, even though p

and q are unequal; but this is not so for small samples, such as

often occur in practice. In the case of small samples the use of

the "probable error" is consequently of doubtful value, while the

standard error retains its significance as a measure of dispersion.

The "probable error," it may be mentioned, is often stated after

an observed proportion with the ± sign before it; a percentage

given as 20.5 ± 2.3 signifying "20.5 per cent., with a probable

error of 2.3 per cent."

If an error or deviation in, say, a certain proportion p only just

exceed the probable error, it is as likely as not to occur in simple

sampling: if it exceed twice the probable error (in either direction),


it is likely to occur as a deviation of simple sampling about 18

times in 100 trials—or the odds are about 4.6 to 1 against its

occurring at any one trial. For a range of three times the probable

error the odds are about 22 to 1, and for a range of four times the

probable error 142 to 1. Until a deviation exceeds, then, 4 times

the probable error, we cannot feel any great confidence that it is

likely to be "significant." It is simpler to work with the standard

error and take ± 3 times the standard error as the critical range:

for this range the odds are about 370 to 1 against such a devia-

tion occurring in simple sampling at any one trial.
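The odds quoted can be recomputed from the normal curve. The sketch below assumes a normal distribution of errors and takes the probable error as 0.6745 of the standard error; the function name `odds_against` is an illustrative assumption.

```python
from math import erf, sqrt

PE = 0.6745   # probable error, in units of the standard error


def odds_against(multiple_of_sigma):
    """Odds against a deviation (in either direction) exceeding the given multiple of sigma."""
    p_outside = 1 - erf(multiple_of_sigma / sqrt(2))     # both tails together
    return (1 - p_outside) / p_outside


for m in (2, 3, 4):
    print(f"{m} x probable error: about {odds_against(m * PE):.1f} to 1")
print(f"3 x standard error: about {odds_against(3):.0f} to 1")
```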

18. The following are a few miscellaneous examples of the use

of the normal curve and the table of areas.

Example i.—A hundred coins are thrown a number of times.

How often approximately in 10,000 throws may (1) exactly 65

heads, (2) 65 heads or more, be expected?


The standard-deviation is $\sqrt{0.5 \times 0.5 \times 100}$ = 5. Taking the

distribution as normal, $y_0$ = 797.9.

The mean number of heads being 50, 65 - 50 = 3$\sigma$. The

frequency of a deviation of 3$\sigma$ is given at once by the table (p. 299)

as 797.9 x 0.0111 . . . . = 8.86, or nearly 9 throws in 10,000. A

throw of 65 heads will therefore be expected about 9 times.

The frequency of throws of 65 heads or more is given by the

area table (p. 306), but a little caution must now be used, owing

to the discontinuity of the distribution. A throw of 65 heads is

equivalent to a range of 64.5-65.5 on the continuous scale of the

normal curve, the division between 64 and 65 coming at 64.5.

64.5 - 50 = +2.9$\sigma$, and a deviation of +2.9$\sigma$ or more, will only

occur, as given by the table, 187 times in 100,000 throws, or, say,

19 times in 10,000.
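Both parts of Example i can be worked without tables. In this sketch the ordinate is taken from the equation of the normal curve and the area from the error function; the variable and function names are illustrative assumptions.

```python
from math import erf, exp, pi, sqrt

throws, coins, p = 10_000, 100, 0.5
mean = coins * p                                  # 50 heads
sigma = sqrt(coins * p * (1 - p))                 # 5 heads
y0 = throws / (sqrt(2 * pi) * sigma)              # central ordinate


def upper_tail(t):
    """Fraction of the normal area lying beyond a deviation of +t sigma."""
    return 0.5 * (1 - erf(t / sqrt(2)))


exactly_65 = y0 * exp(-((65 - mean) / sigma) ** 2 / 2)
at_least_65 = throws * upper_tail((64.5 - mean) / sigma)   # division between 64 and 65

print(round(y0, 1), round(exactly_65, 2), round(at_least_65, 1))
```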

Example ii.—Taking the data of the stature-distribution of fig.

49 (mean 67.46, standard-deviation 2.57 in.), what proportion of

all the individuals will be within a range of ± 1 inch of the

mean?

1 inch = 0.389$\sigma$. Simple interpolation in the table of p. 306

gives 0.65129 of the area below this deviation, or a more extended

table the more accurate value 0.65136. Within a range of

± 0.389$\sigma$ the fraction of the whole area is therefore 0.30272, or the

statures of about 303 per thousand of the given population will lie

within a range of ± 1 inch from the mean.
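The same proportion follows from the error function without interpolation. A minimal sketch, with an illustrative function name:

```python
from math import erf, sqrt

mean, sigma = 67.46, 2.57      # stature in inches


def fraction_within(half_width):
    """Fraction of a normal distribution lying within +/- half_width of the mean."""
    return erf(half_width / sigma / sqrt(2))


print(round(1 / sigma, 3), round(fraction_within(1.0), 4))
```

The result agrees with the tabular value to about three decimal places, the small difference arising because 1/2.57 is not rounded to 0.389 here.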

Example iii.—In a case of crossing a Mendelian recessive by a

heterozygote the expectation of recessive offspring is 50 per cent.

(1) How often would 30 recessives or more be expected amongst 50

offspring owing simply to fluctuations of sampling? (2) How many

offspring would have to be obtained in order to reduce the probable

error to 1 per cent.?

The standard error of the percentage of recessives for 50

observations is $50\sqrt{1/50}$ = 7.07. Thirty recessives in fifty is

a deviation of 10 per cent. from the mean, or, if we take thirty as representing

29.5 or more, 9 per cent. from the mean; that is, 1.273$\sigma$. A positive

deviation of this amount or more occurs about 102 times in 1000,

so that 30 recessives or more would be expected in about one-tenth

of the batches of 50 offspring. We have assumed

normality for rather a small value of n, but the result is sufficiently

accurate for practical purposes.

As regards the second part of the question we are to have

$$0.6745 \times 50\sqrt{1/n} = 1,$$

n being the number of offspring. This gives n=1137 to the

nearest unit.
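Both parts of the example can be checked numerically. In the sketch below the deviation of 4.5 offspring is measured against the standard error of the number of recessives, √(npq) = 3.54 (equivalently, 9 per cent. against 7.07 per cent.), and the normal value is compared with the exact binomial sum. The factor 0.6745 for the probable error is the usual normal-curve value; the variable names are illustrative.

```python
import math

n, p = 50, 0.5

# Part (1): exact binomial chance of 30 or more recessives in 50 offspring.
exact = sum(math.comb(n, k) for k in range(30, n + 1)) / 2**n

# Normal approximation, taking "30 or more" as "29.5 or more".
se_count = math.sqrt(n * p * (1 - p))          # standard error of the number
z = (29.5 - n * p) / se_count                  # deviation in units of sigma
approx = 0.5 * math.erfc(z / math.sqrt(2.0))   # area above that deviation

# Part (2): n such that 0.6745 * 50 * sqrt(1/n) = 1 per cent.
n_needed = round((0.6745 * 50.0 / 1.0) ** 2)

print(f"z = {z:.3f}, normal = {approx:.4f}, exact = {exact:.4f}, n = {n_needed}")
```

The two values agree closely, which supports the remark that the assumption of normality is adequate here even for n = 50.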

Example iv.—The diagram of fig. 49 shows that the number of

statures recorded in the group "62 in. and less than 63" is

markedly less than the theoretical value. Could such a difference

occur owing to fluctuations of simple sampling; and if so, how

often might it happen?

The actual frequency recorded is 169. To obtain the theoretical
frequency we may either take it as given roughly by the
ordinate in the centre of the interval, or, better, use the integral
table. Remembering that statures were only recorded to the
nearest 1/8 in., the true limits of the interval are 61 15/16 to 62 15/16, or
61.94 to 62.94, mid-value 62.44. This is a deviation from the
mean (67.46) of 5.02. Calculating the ordinate of the normal
curve directly we find the frequency 197.8. This is certainly, as
is evident from the form of the curve, a little too small. The
interval actually lies between deviations of 4.52 in. and 5.52
in., that is, 1.759σ and 2.148σ. The corresponding fractions of
area are 0.96071 and 0.98418, difference, or fraction of area
between the two ordinates, 0.02347. Multiplying this by the
whole number of observations (8585) we have the theoretical
frequency 201.5.

The difference of theoretical and observed frequencies is therefore
32.5. But the proportion of observations which should fall into
the given class is 0.023, the proportion falling into other classes
0.977, and the standard error of the class frequency is accordingly
√(0.023 × 0.977 × 8585) = 14.0. As the actual deviation is only
2.32 times this, it could certainly have occurred as a fluctuation of
sampling.

The question how often it might have occurred can only be
answered if we assume the distribution of fluctuations of sampling
to be approximately normal. It is true that p and q are very
unequal, but then n is very large (8585), so large that the
difference of the chances is fairly small compared with √(npq)
(about one-fifteenth). Hence we may take the distribution of
errors as roughly normal to a first approximation, though a
first approximation only. The tables give 0.990 of the area
below a deviation of 2.32σ, so we would expect an equal or
greater deficiency to occur about 10 times in 1000 trials, or once
in a hundred.
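The arithmetic of this example can be retraced as follows. This is only a sketch: it replaces the printed table by the error function, so the last figures differ slightly from those in the text, and the names used are illustrative.

```python
import math

def area_below(z):
    """Fraction of the normal curve lying below a deviation of z sigma."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

N, mean, sd = 8585, 67.46, 2.57      # constants of the stature distribution
lower, upper = 61.94, 62.94          # true limits of the class interval
observed = 169

# Fraction of the area between the two limits, and the theoretical frequency.
p = area_below((upper - mean) / sd) - area_below((lower - mean) / sd)
theoretical = N * p

# Standard error of the class frequency and the deficiency in units of it.
se = math.sqrt(N * p * (1.0 - p))
ratio = (theoretical - observed) / se

# Chance of an equal or greater deficiency, assuming normality.
tail = 0.5 * math.erfc(ratio / math.sqrt(2.0))

print(f"theoretical = {theoretical:.1f}, s.e. = {se:.1f}, "
      f"ratio = {ratio:.2f}, chance = {tail:.3f}")
```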


The Binomial Machine.

(1) Galton, Francis, Natural Inheritance; Macmillan & Co. London, 1889,

(Mechanical method of forming a binomial or normal distribution,

chap. v., p. 63; for Pearson's generalised machine, see below,

ref. 11.)

Frequency Curves.

For the early classical memoirs on the normal curve or law of error

by Laplace, Gauss, and others, see Todhunter's History (Introduction:

ref. 7). The literature of this subject is too extensive to enable us to do

more than cite a few of the more recent memoirs, of which 5, 6, and 11

are of fundamental importance. The student will find other citations

in 5, 7, and 12.

(2) Charlier, C. V. L., "Researches into the Theory of Probability " (Com-

munications from the Astronomical Observatory, Lund); Lund, 1906.

(3) Edgeworth, F. Y., "On the Representation of Statistics by Mathematical
Formulae," Jour. Roy. Stat. Soc., vol. lxi., 1898; vol. lxii., 1899;
and vol. lxiii., 1900.

(4) Edgeworth, F. Y., Article on the "Law of Error" in the Encyclopaedia
Britannica, 10th edn., vol. xxviii., 1902, p. 280.

(5) Edgeworth, F. Y., "The Law of Error," Cambridge Phil. Trans., vol.

xx., 1904, pp. 36-65, 113-141 (and an appendix, pp. i-xiv, not

printed in the Cambridge Phil. Trans.).

(6) Edgeworth, F. Y., "The Generalised Law of Error, or Law of Great

Numbers," Jour. Roy. Stat. Soc., vol. lxix., 1906, p. 497.

(7) Edgeworth, F. Y., "On the Representation of Statistical Frequency by

a Curve," Jour. Roy. Stat. Soc., vol. lxx., 1907, p. 102.

(8) Fechner, G. T., Kollektivmasslehre (herausgegeben von G. F. Lipps);

Engelmann, Leipzig, 1897.

(9) Kapteyn, J. C., Skew Frequency Curves in Biology and Statistics;

Noordhoff, Groningen; Wm. Dawson & Sons, London, 1903.

(10) Macalister, Donald, "The Law of the Geometric Mean," Proc. Roy.

Soc., vol. xxix., 1879, p. 367.

(11) Pearson, Karl, "Skew Variation in Homogeneous Material," Phil.
Trans. Roy. Soc., Series A, vol. clxxxvi., 1895, p. 343.
(For the generalised binomial machine, see § 1. The memoir deals
with curves derived from the general binomial, and from a somewhat
analogous series derived from the case of sampling from limited
material. Supplement to the memoir, ibid., vol. cxcvii., 1901, p. 443.
For a derivation of the same curves from a modified standpoint,
ignoring the binomial and analogous distributions, cf. chap. x., ref. 12.)

(12) Pearson, Karl, "Das Fehlergesetz und seine Verallgemeinerungen
durch Fechner und Pearson": A Rejoinder, Biometrika, vol. iv., 1905,
p. 169.

(13) Sheppard, W. F., "On the Application of the Theory of Error to Cases
of Normal Distribution and Normal Correlation," Phil. Trans. Roy.
Soc., Series A, vol. cxcii., 1898, p. 101. (Includes a geometrical
treatment of the normal curve.)

(14) Yule, G. U., "On the Distribution of Deaths with Age when the Causes
of Death act cumulatively, and similar Frequency-distributions,"
Jour. Roy. Stat. Soc., vol. lxxiii., 1910, p. 26. (A binomial distribution
with negative index, and the related curve, i.e. a special case of
one of Pearson's curves, ref. 11.)

The Resolution of a Distribution compounded of two Normal
Curves into its Components.

(15) Pearson, Karl, "Contributions to the Mathematical Theory of Evolution
(on the Dissection of Asymmetrical Frequency Curves)," Phil.
Trans. Roy. Soc., Series A, vol. clxxxv., 1894, p. 71.

(16) Edgeworth, F. Y., "On the Representation of Statistics by Mathematical
Formulae," part ii., Jour. Roy. Stat. Soc., vol. lxii., 1899, p. 125.

(17) Pearson, Karl, "On some Applications of the Theory of Chance to
Racial Differentiation," Phil. Mag., 6th Series, vol. i., 1901, p. 110.

(18) Helguero, Fernando de, "Per la risoluzione delle curve dimorfiche,"
Biometrika, vol. iv., 1905, p. 230. Also memoir under the same title
in the Transactions of the Reale Accademia dei Lincei, Rome, vol. vi.,
1906. (The first is a short note, the second the full memoir.)

See also the memoir by Charlier, cited in (2), section vi. of that

memoir dealing with the problem of dissection.

Testing the Fit of a Theoretical to an Observed Distribution.

(19) Pearson, Karl, "On the Criterion that a given System of Deviations

from the Probable, in the case of a Correlated System of Variables, is


such that it can be reasonably supposed to have arisen from random

sampling," Phil. Mag., July 1900, p. 157.


1. Calculate the theoretical distributions for the three experimental cases

(1), (2), and (3) cited in § 7 of Chapter XIII.

2. Show that if np be a whole number, the mean of the binomial coincides

with the greatest term.

3. Show that if two symmetrical binomial distributions of degree n (and

of the same number of observations) are so superposed that the rth term of

the one coincides with the (r + 1)th term of the other, the distribution

formed by adding superposed terms is a symmetrical binomial of degree n +1.

[Note: it follows that if two normal distributions of the same area and

standard-deviation are superposed so that the difference between the means is

small compared with the standard-deviation, the compound curve is very

nearly normal. ]

4. Calculate the ordinates of the binomial 1024 (0.5 + 0.5)^10, and compare

them with those of the normal curve.


5. Draw a diagram showing the distribution of statures of Cambridge

Students (Chap. VII., Table VII.), and a normal curve of the same area,

mean, and standard-deviation superposed thereon.

6. Compare the values of the semi-interquartile range for the stature

distributions of male adults in the United Kingdom and Cambridge Students,

(1) as found directly, (2) as calculated from the standard-deviation, on the

assumption that the distribution is normal.

7. Taking the mean stature for the British Isles as 67.46 in. (the
distribution of fig. 49), the mean for Cambridge students as 68.85 in., and the
common standard-deviation as 2.66 in., what percentage of Cambridge students
exceed the British mean in stature, assuming the distribution normal?

8. As stated in Chap. XIII., Example ii., certain crosses of Pisum sativum
based on 7125 seeds gave 25.32 per cent. of green seeds instead of the theoretical
proportion 25 per cent., the standard error being 0.51 per cent. In what
percentage of experiments based on the same number of seeds might an equal or
greater percentage be expected to occur owing to fluctuations of sampling?

9. In what proportion of similar experiments based on (1) 100 seeds, (2)
1000 seeds, might (a) 30 per cent. or more, (b) 35 per cent. or more, of green
seeds, be expected to occur, if ever?

10. In similar experiments, what number of seeds must be obtained to
make the "probable error" of the proportion 1 per cent.?

11. If skulls are classified as dolichocephalic when the length-breadth

index is under 75, mesocephalic when the same index lies between 75 and 80,

and brachycephalic when the index is over 80, find approximately (assuming

that the distribution is normal) the mean and standard-deviation of a series

in which 58 per cent. are stated to be dolichocephalic, 38 per cent. meso-

cephalic, and 4 per cent. brachycephalic.



1-3. Deduction of the general expression for the normal correlation surface

from the case of independence—4. Constancy of the standard-

deviations of parallel arrays and linearity of the regression—5. The

contour lines: a series of concentric and similar ellipses—6. The

normal surface for two correlated variables regarded as a normal

surface for uncorrelated variables rotated with respect to the axes of

measurement: arrays taken at any angle across the surface are normal

distributions with constant standard-deviation : distribution of and

correlation between linear functions of two normally correlated

variables are normal: principal axes—7. Standard-deviations round

the principal axes—8-11. Investigation of Table III., Chap. IX., to

test normality: linearity of regression, constancy of standard-deviation


of arrays, normality of distribution obtained by diagonal addition,

contour lines—12-13. Isotropy of the normal distribution for two

variables—14. Outline of the principal properties of the normal dis-

tribution for n variables.

1. The expression that we have obtained for the "normal" distribution
of a single variable may readily be made to yield a

corresponding expression for the distribution of frequency of pairs

of values of two variables. This normal distribution for two

variables, or "normal correlation surface," is of great historical

importance, as the earlier work on correlation is, almost with-

out exception, based on the assumption of such a distribution;

though when it was recognised that the properties of the correlation-coefficient
could be deduced, as in Chap. IX., without reference

to the form of the distribution of frequency, a knowledge of

this special type of frequency-surface ceased to be so essential.

But the generalised normal law is of importance in the theory of

sampling: it serves to describe very approximately certain actual


distributions (e.g. of measurements on man); and if it can be

assumed to hold good, some of the expressions in the theory of

correlation, notably the standard-deviations of arrays (and, if

more than two variables are involved, the partial correlation-

coefficients), can be assigned more simple and definite meanings

than in the general case. The student should, therefore, be

familiar with the more fundamental properties of the distribution.


2. Consider first the case in which the two variables are completely
independent. Let the distributions of frequency for the
two variables x1 and x2, singly, be

$$y_1 = y_1'\, e^{-x_1^2/2\sigma_1^2}, \qquad y_2 = y_2'\, e^{-x_2^2/2\sigma_2^2} \qquad (1)$$

Then, assuming independence, the frequency-distribution of pairs
of values must, by the rule of independence, be given by

$$y_{12} = y_{12}'\, e^{-\frac{1}{2}\left(\frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2}\right)} \qquad (2)$$

where

$$y_{12}' = \frac{y_1'\, y_2'}{N} = \frac{N}{2\pi\,\sigma_1\sigma_2} \qquad (3)$$

Equation (2) gives a normal correlation surface for one special
case, the correlation-coefficient being zero. If we put x2 = a constant,
we see that every section of the surface by a vertical plane
parallel to the x1 axis, i.e. the distribution of any array of x1's, is
a normal distribution, with the same mean and standard-deviation
as the total distribution of x1's, and a similar statement holds for
the array of x2's; these properties must hold good, of course, as
the two variables are assumed independent (cf. Chap. V. § 13).
The contour lines of the surface, that is to say, lines drawn on
the surface at a constant height, are a series of similar ellipses
with major and minor axes parallel to the axes of x1 and x2 and
proportional to σ1 and σ2, the equations to the contour lines being
of the general form

$$\frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2} = C \qquad (4)$$

Pairs of values of x1 and x2 related by an equation of this form
are, therefore, equally frequent.

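3. To pass from this special case of independence to the general
case of two correlated variables, remember (Chap. XII. § 8)
that if

$$x_{1.2} = x_1 - r_{12}\frac{\sigma_1}{\sigma_2}x_2, \qquad x_{2.1} = x_2 - r_{12}\frac{\sigma_2}{\sigma_1}x_1,$$

x1 and x2.1, as also x2 and x1.2, are uncorrelated. If they are not
merely uncorrelated but completely independent, and if the
distribution of each of the deviations singly be normal, we must have
for the frequency-distribution of pairs of deviations of x1 and x2.1

$$y_{12} = y_{12}'\, e^{-\frac{1}{2}\left(\frac{x_1^2}{\sigma_1^2} + \frac{x_{2.1}^2}{\sigma_{2.1}^2}\right)} \qquad (5)$$

But

$$\frac{x_1^2}{\sigma_1^2} + \frac{x_{2.1}^2}{\sigma_{2.1}^2}
= \frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2(1-r_{12}^2)} + \frac{r_{12}^2\,x_1^2}{\sigma_1^2(1-r_{12}^2)} - \frac{2r_{12}\,x_1x_2}{\sigma_1\sigma_2(1-r_{12}^2)}$$

$$= \frac{x_1^2}{\sigma_1^2(1-r_{12}^2)} + \frac{x_2^2}{\sigma_2^2(1-r_{12}^2)} - \frac{2r_{12}\,x_1x_2}{\sigma_1\sigma_2(1-r_{12}^2)}
= \frac{x_1^2}{\sigma_{1.2}^2} + \frac{x_2^2}{\sigma_{2.1}^2} - \frac{2r_{12}\,x_1x_2}{\sigma_{1.2}\,\sigma_{2.1}}$$

Evidently we would also have arrived at precisely the same
expression if we had taken the distribution of frequency for x2
and x1.2, and reduced the exponent

$$\frac{x_2^2}{\sigma_2^2} + \frac{x_{1.2}^2}{\sigma_{1.2}^2}.$$

We have, therefore, the general expression for the normal
correlation surface for two variables

$$y_{12} = y_{12}'\, e^{-\frac{1}{2}\left(\frac{x_1^2}{\sigma_{1.2}^2} + \frac{x_2^2}{\sigma_{2.1}^2} - \frac{2r_{12}\,x_1x_2}{\sigma_{1.2}\,\sigma_{2.1}}\right)} \qquad (6)$$

Further, since x1 and x2.1, x2 and x1.2, are independent, we must
have

$$y_{12}' = \frac{N}{2\pi\,\sigma_1\sigma_{2.1}} = \frac{N}{2\pi\,\sigma_2\sigma_{1.2}} = \frac{N}{2\pi\,\sigma_1\sigma_2(1-r_{12}^2)^{1/2}} \qquad (7)$$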
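The reduction of the exponent in equation (5) to the form of equation (6) can be verified numerically. The Python sketch below evaluates the frequency both ways for a few pairs of deviations. The constants N, σ1, σ2 and r are hypothetical values chosen only for illustration, and the function names are not taken from the text.

```python
import math

def surface_general(x1, x2, N, s1, s2, r):
    """Normal correlation surface in the form of equation (6)."""
    root = math.sqrt(1.0 - r * r)
    s12, s21 = s1 * root, s2 * root          # sigma_1.2 and sigma_2.1
    y0 = N / (2.0 * math.pi * s1 * s2 * root)   # central ordinate, equation (7)
    q = x1**2 / s12**2 + x2**2 / s21**2 - 2.0 * r * x1 * x2 / (s12 * s21)
    return y0 * math.exp(-0.5 * q)

def surface_independent_pair(x1, x2, N, s1, s2, r):
    """Same surface built from the independent pair x1 and x2.1, equation (5)."""
    s21 = s2 * math.sqrt(1.0 - r * r)
    x21 = x2 - r * (s2 / s1) * x1            # deviation x_2.1
    y0 = N / (2.0 * math.pi * s1 * s21)
    return y0 * math.exp(-0.5 * (x1**2 / s1**2 + x21**2 / s21**2))

# Hypothetical constants, for illustration only.
N, s1, s2, r = 1000.0, 2.0, 3.0, 0.4
points = [(0.0, 0.0), (1.0, -2.0), (-3.5, 0.7), (2.2, 4.1)]
pairs = [(surface_general(a, b, N, s1, s2, r),
          surface_independent_pair(a, b, N, s1, s2, r)) for a, b in points]
for g, i in pairs:
    print(f"{g:.6f}  {i:.6f}")
```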

4. If we assign to x2 some fixed value, say h2, we have the
distribution of the array of x1's of type h2,

$$y_{12} = y_{12}'\, e^{-\frac{1}{2}\left(\frac{x_1^2}{\sigma_{1.2}^2} + \frac{h_2^2}{\sigma_{2.1}^2} - \frac{2r_{12}\,x_1h_2}{\sigma_{1.2}\,\sigma_{2.1}}\right)}
= y_{12}'\, e^{-\frac{h_2^2}{2\sigma_2^2}}\; e^{-\frac{\left(x_1 - r_{12}\frac{\sigma_1}{\sigma_2}h_2\right)^2}{2\sigma_{1.2}^2}}$$

This is a normal distribution of standard-deviation σ1.2, with a
mean deviating by r12(σ1/σ2)h2 from the mean of the whole distribution
of x1's. As h2 represents any value whatever of x2, we see
(1) that the standard-deviations of all arrays of x1 are the same,
and equal to σ1.2: (2) that the regression of x1 on x2 is strictly
linear. Similarly, of course, if we assign to x1 any value h1, we
will find (1) that the standard-deviations of all arrays of x2 are
the same: (2) that the regression of x2 on x1 is strictly linear.
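As an illustration of this section, the sketch below takes a section of the surface at a fixed value h2 of x2 and compares its shape with a normal curve of standard-deviation σ1√(1 − r²) centred at r(σ1/σ2)h2. The constants are hypothetical and the function names are illustrative.

```python
import math

def normal_surface(x1, x2, N, s1, s2, r):
    """Normal correlation surface for deviations x1, x2 from the means."""
    root = math.sqrt(1.0 - r * r)
    s12, s21 = s1 * root, s2 * root
    y0 = N / (2.0 * math.pi * s1 * s2 * root)
    q = x1**2 / s12**2 + x2**2 / s21**2 - 2.0 * r * x1 * x2 / (s12 * s21)
    return y0 * math.exp(-0.5 * q)

# Hypothetical constants, for illustration only.
N, s1, s2, r = 1000.0, 2.0, 3.0, 0.4
h2 = 1.5                                   # fixed value assigned to x2

array_mean = r * (s1 / s2) * h2            # mean of the array of x1's
array_sd = s1 * math.sqrt(1.0 - r * r)     # standard-deviation of the array

def section_ratio(x1):
    """Ordinate of the section at x1, relative to its ordinate at the array mean."""
    return normal_surface(x1, h2, N, s1, s2, r) / normal_surface(array_mean, h2, N, s1, s2, r)

def normal_shape(x1):
    """Normal curve of unit central ordinate with the array mean and s.d."""
    return math.exp(-((x1 - array_mean) ** 2) / (2.0 * array_sd**2))

for x1 in (-3.0, -1.0, 0.0, 0.4, 2.0, 5.0):
    print(f"x1 = {x1:5.1f}  section = {section_ratio(x1):.6f}  normal = {normal_shape(x1):.6f}")
```

Because the array mean is proportional to h2 and the array standard-deviation does not involve h2 at all, the same comparison holds for every array.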

[Legend of fig. 50: axes of measurement; M, mean of the whole surface, which is also the summit of the surface; RR, CC, lines of means; contour lines and axes of the normal correlation surface.]

Fig. 50.—Principal Axes and Contour Lines of the normal


Correlation Surface.

5. The contour lines are, as in the case of independence, a

series of concentric and similar ellipses; the major and minor

axes are, however, no longer parallel to the axes of x1 and x2, but

make a certain angle with them. Fig. 50 illustrates the calcu-

lated form of the contour lines for one case, RR and CC being

the lines of regression. As each line of regression cuts every


array of x1 or of x2 in its mean, and as the distribution of every

array is symmetrical about its mean, RR must bisect every

horizontal chord and CC every vertical chord, as illustrated

by the two chords shown by dotted lines: it also follows that

RR cuts all the ellipses in the points of contact of the horizontal

tangents to the ellipses, and CC in the points of contact of

the vertical tangents. The surface or solid itself, somewhat

truncated, is shown in fig. 29, p. 166.

6. Since, as we see from fig. 50, a normal surface for two
correlated variables may be regarded merely as a certain surface
for which r is zero turned round through some angle, and since
for every angle through which it is turned the distributions of all
x1 arrays and x2 arrays are normal, it follows that every section
of a normal surface by a vertical plane is a normal curve, i.e. the
distributions of arrays taken at any angle across the surface are
normal. It also follows that, since the total distributions of x1
and x2 must be normal for every angle through which the surface
is turned, the distributions of totals given by slices or arrays
taken at any angle across a normal surface must be normal
distributions. But these would give the distributions of functions
like a.x1 ± b.x2, and consequently (1) the distribution of any
linear function of two normally distributed variables x1 and x2
must also be normal; (2) the correlation between any two linear
functions of two normally distributed variables must be normal
correlation.
To find the angle θ through which the surface has been turned,
from the position for which the correlation is zero to the position
for which the coefficient has some assigned value r, we must use
a little trigonometry. The major and minor axes of the ellipses
are sometimes termed the principal axes. If ξ1, ξ2 be the
co-ordinates referred to the principal axes (the ξ1-axis being the
x1 axis in its new position) we have for the relation between ξ1,
ξ2, x1, x2, the angle θ being taken as positive for a rotation of
the x1-axis which will make it, if continued through 90°, coincide
in direction and sense with the x2-axis,

$$\xi_1 = x_1\cos\theta + x_2\sin\theta, \qquad \xi_2 = x_2\cos\theta - x_1\sin\theta \qquad (8)$$

But, since ξ1, ξ2 are uncorrelated, Σ(ξ1ξ2) = 0. Hence, multiplying
together equations (8) and summing,

$$0 = (\sigma_2^2 - \sigma_1^2)\sin 2\theta + 2r_{12}\,\sigma_1\sigma_2\cos 2\theta$$

$$\tan 2\theta = \frac{2r_{12}\,\sigma_1\sigma_2}{\sigma_1^2 - \sigma_2^2} \qquad (9)$$

It should be noticed that if we define the principal axes of any
distribution for two variables as being a pair of axes at right
angles for which the variables ξ1, ξ2 are uncorrelated, equation
(9) gives the angle that they make with the axes of measurement
whether the distribution be normal or no.
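Equation (9) can be illustrated by a short computation. The sketch below uses `math.atan2` to pick the value of 2θ, an assumption made here so that the ξ1-axis falls along the greater standard-deviation; the text itself fixes only tan 2θ. The constants are hypothetical.

```python
import math

def principal_angle(s1, s2, r):
    """Angle theta (radians) satisfying tan 2*theta = 2 r s1 s2 / (s1^2 - s2^2)."""
    return 0.5 * math.atan2(2.0 * r * s1 * s2, s1**2 - s2**2)

def product_moment_after_rotation(s1, s2, r, theta):
    """Mean product of xi1 and xi2 after turning the axes through theta."""
    return (r * s1 * s2 * math.cos(2.0 * theta)
            + 0.5 * (s2**2 - s1**2) * math.sin(2.0 * theta))

# Hypothetical constants, for illustration only.
s1, s2, r = 3.0, 2.0, 0.6
theta = principal_angle(s1, s2, r)
residual = product_moment_after_rotation(s1, s2, r, theta)

print(f"theta = {math.degrees(theta):.2f} degrees, product-moment = {residual:.2e}")
```

The vanishing product-moment confirms that the rotated variables are uncorrelated, whatever the form of the distribution.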

7. The two standard-deviations, say Σ1 and Σ2, about the
principal axes are of some interest, for evidently from § 2 the
major and minor axes of the contour-ellipses are proportional
to these two standard-deviations. They may be most readily
determined as follows. Squaring the two transformation equations
(8), summing and adding, we have

$$\Sigma_1^2 + \Sigma_2^2 = \sigma_1^2 + \sigma_2^2 \qquad (10)$$

Referring the surface to the axes of measurement, we have for
the central ordinate by equation (7)

$$y_{12}' = \frac{N}{2\pi\,\sigma_1\sigma_2(1-r_{12}^2)^{1/2}}$$

Referring it to the principal axes, by equation (3)

$$y_{12}' = \frac{N}{2\pi\,\Sigma_1\Sigma_2}$$

But these two values of the central ordinate must be equal,
therefore

$$\Sigma_1\Sigma_2 = \sigma_1\sigma_2(1-r_{12}^2)^{1/2} \qquad (11)$$

(10) and (11) are a pair of simultaneous equations from which
Σ1 and Σ2 may be very simply obtained in any arithmetical case.
Care must, however, be taken to give the correct signs to the
square root in solving. Σ1 + Σ2 is necessarily positive, and Σ1 − Σ2
also if r is positive, the major axes of the ellipses lying along ξ1:
but if r be negative, Σ1 − Σ2 is also negative. It should be noted
that, while we have deduced (11) from a simple consideration
depending on the normality of the distribution, it is really of
general application (like equation 10), and may be obtained at
somewhat greater length from the equations for transforming
co-ordinates.
8. As stated in Chap. XV. § 13, the frequency-distribution

for any variable may be expected to be approximately normal

if that variable may be regarded as the sum (or, within limits,

some slightly more complex function) of a large number of other

variables, provided that these elementary component variables

are independent, or nearly so. Similarly, the correlation between

two variables may be expected to be approximately normal if


each of the two variables may be regarded as the sum, or some

slightly more complex function, of a large number of elementary

component variables, the intensity of correlation depending on

the proportion of the components common to the two variables.

Stature is a highly compound character of this kind, and we

have seen that, in one instance at least, the distribution of stature

for a number of adults is given approximately by the normal

curve. We can now utilise Table III., Chap. IX., p. 160, showing

the correlation between stature of father and son, to test, as far

as we can by elementary methods, whether the normal surface

will fit the distribution of the same character in pairs of indi-

viduals: we leave it to the student to test, as far as he can do so

by simple graphical methods, the approximate normality of the


total distributions for this table. The first important property

of the normal distribution is the linearity of the regression, and

this was illustrated in fig. 37, p. 174. It is evident that the

means of arrays deviate slightly here and there from the lines

of regression, but there is no marked and regular departure from

linearity—no suggestion of a smooth and sweeping curve.

Subject to some investigation as to the possibility of the devia-

tions that do occur arising as fluctuations of simple sampling,

when drawing samples from a record for which the regression

is strictly linear, we may conclude that the regression is

appreciably linear.

9. The second important property of the normal distribution

for two variables is the constancy of the standard-deviation for

all parallel arrays. We gave in Chap. X. p. 204 the standard-

deviations of ten of the columns of the present table, from the

column headed 62.5–63.5 onwards; these were:










the mean being 2.36. The standard-deviations again only fluctuate
irregularly round their mean value. The mean of the first five
is 2.34, of the second five 2.38, a difference of only 0.04: of the

first group, two are greater and three are less than the mean,

and the same is true of the second group. There does not seem

to be any indication of a general tendency for the standard-

deviation to increase or decrease as we pass from one end of the

table to the other. We are not yet in a position to test how

far the differences from the average standard-deviation might

arise in sampling from a record in which the distribution was


strictly normal, but, as a fact, a rough test suggests that they

might have done so.

10. Next we note that the distributions of all arrays of a

normal surface should themselves be normal. Owing, however,

to the small numbers of observations in any array, the distributions

of arrays are very irregular, and their normality cannot be tested

in any very satisfactory way: we can only say that they do not

exhibit any marked or regular asymmetry. But we can test the

allied property of a normal correlation-table, viz. that the totals

of arrays must give a normal distribution even if the arrays be

taken diagonally across the surface, and not parallel to either

axis of measurement (cf. § 6). From an ordinary correlation-

table we cannot find the totals of such diagonal arrays exactly,


but the totals of arrays at an angle of 45° will be given with

sufficient accuracy for our present purpose by the totals of lines

of diagonally adjacent compartments. Referring again to Table

III., Chap. IX., and forming the totals of such diagonals (running

up from left to right), we find, starting at the top left-hand

corner of the table, the following distribution :—




























Total 1078

The mean of this distribution is at 0.368 of an interval above the
centre of the interval with frequency 78: its standard-deviation
is 4.755 intervals, or, remembering that the interval is 1/√2 of
an inch, 3.362 inches. (This value may be checked directly from
the constants for the table given in Chap. IX., Question 3, p. 189,
for we have from the first of the transformation equations (8),

$$\sigma_\xi^2 = \sigma_1^2\cos^2\theta + \sigma_2^2\sin^2\theta + 2r_{12}\,\sigma_1\sigma_2\sin\theta\cos\theta,$$

and inserting σ1 = 2.72, σ2 = 2.75, r12 = 0.51, sin θ = cos θ = 1/√2,
find σξ = 3.361.) Drawing a diagram and fitting a normal
curve we have fig. 51; the distribution is rather irregular but the
fit is fair; certainly there is no marked asymmetry, and, so far as
the graphical test goes, the distribution may be regarded as
appreciably normal. One of the greatest divergences of the
actual distribution from the normal curve occurs in the almost
central interval with frequency 78: the difference between the
observed and calculated frequencies is here 12 units, but the
standard error is 9.1, so that it may well have occurred as a
fluctuation of simple sampling.
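The check on the standard-deviation of the diagonal totals is a one-line computation, sketched here in Python with the constants quoted in the text; the variable names are illustrative.

```python
import math

s1, s2, r = 2.72, 2.75, 0.51        # constants of the father-son table
theta = math.radians(45.0)          # diagonal arrays: sin = cos = 1/sqrt(2)

# Variance of xi = x1 cos(theta) + x2 sin(theta), from equations (8).
var_xi = (s1**2 * math.cos(theta) ** 2
          + s2**2 * math.sin(theta) ** 2
          + 2.0 * r * s1 * s2 * math.sin(theta) * math.cos(theta))
sd_inches = math.sqrt(var_xi)

# The diagonal class-interval is 1/sqrt(2) of an inch.
sd_intervals = sd_inches * math.sqrt(2.0)

print(f"s.d. = {sd_inches:.3f} in. = {sd_intervals:.3f} intervals")
```

The computed value agrees with the directly observed 4.755 intervals (3.362 in.) to within a few thousandths.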








Fig. 51.—Distribution of Frequency obtained by addition of Table III.,

Chap. IX., along Diagonals running up from left to right, fitted with a

Normal Curve.

11. So far, we have seen (1) that the regression is approxi-

mately linear; (2) that, in the arrays which we have tested, the

standard-deviations are approximately constant, or at least that

their differences are only small, irregular and fluctuating; (3) that

the distribution of totals for one set of diagonal arrays is approxi-

mately normal. These results suggest, though they cannot

completely prove, that the whole distribution of frequency may

be regarded as approximately normal, within the limits of fluctu-

ations of sampling. We may therefore apply a more searching

test, viz. the form of the contour lines and the closeness of their

fit to the contour-ellipses of the normal surface. We can see at

once, however, that no very close fit can be expected. Since the

frequencies in the compartments of the table are small, the

standard error of any frequency is given approximately by its


square root (Chap. XIII. § 12), and this implies a standard error

of about 5 units at the centre of the table, 3 units for a frequency

of 9, or 2 units for a frequency of 4: such fluctuations might

cause wide divergences in the corresponding contour lines.

Using the suffix 1 to denote the constants relating to the
distribution of stature for fathers, and 2 the same constants for
the sons,

N = 1078, M1 = 67.70, M2 = 68.66, σ1 = 2.72, σ2 = 2.75, r12 = 0.51.

Hence we have from equation (7)

y'12 = 26.7

and the complete expression for the fitted normal surface is

$$y_{12} = 26.7\, e^{-\frac{1}{2}\left(\frac{x_1^2}{5.47} + \frac{x_2^2}{5.60} - \frac{x_1x_2}{5.43}\right)}$$

The equation to any contour ellipse will be given by equating
the index of e to a constant, but it is very much easier to draw
the ellipses if we refer them to their principal axes. To do this
we must first determine θ, Σ1 and Σ2. From (9),

tan 2θ = −46.49,

whence 2θ = 91° 14', θ = 45° 37', the principal axes standing very
nearly at an angle of 45° with the axes of measurement,
owing to the two standard-deviations being very nearly equal.
They should be set off on the diagram, not with a protractor, but
by taking tan θ from the tables (1.022) and calculating points on
each axis on either side of the mean.
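The angle, and the points by which the major axis may be set off, can be computed as in the following sketch. The use of `math.atan2` to select the quadrant of 2θ and the distance of 5 inches along the axis are assumptions made for illustration.

```python
import math

M1, M2 = 67.70, 68.66               # mean statures of fathers and sons, inches
s1, s2, r = 2.72, 2.75, 0.51

tan_2theta = 2.0 * r * s1 * s2 / (s1**2 - s2**2)               # equation (9)
theta = 0.5 * math.atan2(2.0 * r * s1 * s2, s1**2 - s2**2)     # quadrant chosen by atan2
degrees = math.degrees(theta)
tan_theta = math.tan(theta)

# Two points on the major principal axis, 5 inches (assumed) either side of the mean.
t = 5.0
points = [(M1 + k * t * math.cos(theta), M2 + k * t * math.sin(theta)) for k in (-1, 1)]

print(f"tan 2theta = {tan_2theta:.2f}, theta = {degrees:.3f} deg, tan theta = {tan_theta:.3f}")
for px, py in points:
    print(f"father = {px:.2f} in., son = {py:.2f} in.")
```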

To obtain Σ1 and Σ2 we have from (10) and (11)

Σ1² + Σ2² = 14.961

2Σ1Σ2 = 12.868

Adding and subtracting these equations from each other and
taking the square root,

Σ1 + Σ2 = 5.275

Σ1 − Σ2 = 1.447

whence Σ1 = 3.36, Σ2 = 1.91; owing to the principal axes standing
nearly at 45° the first value is sensibly the same as that found
for σξ in § 10. The equations to the contour ellipses, referred to
the principal axes, may therefore be written in the form

$$\frac{\xi_1^2}{(3.36)^2} + \frac{\xi_2^2}{(1.91)^2} = c^2,$$

the major and minor axes being 3.36 × c and 1.91 × c respectively.
To find c for any assigned value of the frequency y we have

$$c^2 = \frac{2(\log y_{12}' - \log y_{12})}{\log e}$$
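The solution of equations (10) and (11), and the value of c for a given frequency, may be sketched in Python as follows. Natural logarithms are used, so that the division by log e disappears. The frequencies 5, 10 and 20 are those of the contours drawn in fig. 52, and the positive root is taken for Σ1 − Σ2 because r is positive.

```python
import math

N, s1, s2, r = 1078, 2.72, 2.75, 0.51

sum_sq = s1**2 + s2**2                                  # equation (10)
twice_prod = 2.0 * s1 * s2 * math.sqrt(1.0 - r * r)     # twice equation (11)

total = math.sqrt(sum_sq + twice_prod)     # Sigma1 + Sigma2
diff = math.sqrt(sum_sq - twice_prod)      # Sigma1 - Sigma2 (positive, r > 0)
S1, S2 = (total + diff) / 2.0, (total - diff) / 2.0

y0 = N / (2.0 * math.pi * S1 * S2)         # central ordinate, equation (3)

def contour_scale(y):
    """Value of c for the contour of frequency y: c^2 = 2 ln(y0 / y)."""
    return math.sqrt(2.0 * math.log(y0 / y))

print(f"Sigma1 = {S1:.2f}, Sigma2 = {S2:.2f}, central ordinate = {y0:.1f}")
for y in (5, 10, 20):
    c = contour_scale(y)
    print(f"y = {y:2d}: c = {c:.2f}, semi-major = {S1 * c:.2f}, semi-minor = {S2 * c:.2f}")
```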


Supposing that we desire to draw the three contour-ellipses for
y = 5, 10 and 20, we find c = 1.83, 1.40 and 0.76, or the following
values for the major and minor axes of the ellipses: semi-major
axes, 6.15, 4.70, 2.55: semi-minor axes, 3.50, 2.67, 1.45.

Fig. 52. Contour Lines for the Frequencies 5, 10 and 20 of the distribution
of Table III., Chap. IX., and corresponding Contour Ellipses of the fitted
Normal Surface. P1 P1, P2 P2, principal axes: M, mean. (Horizontal axis: Stature of Father, inches.)

The ellipses drawn with these axes are shown in fig. 52, very much

reduced, of course, from the original drawing, one of the squares

shown representing a square inch on the original. The actual

contour lines for the same frequencies are shown by the irregular

polygons superposed on the ellipses, the points on these polygons

having been obtained by simple graphical interpolation between

the frequencies in each row and each column—diagonal interpola-

tion between the frequencies in a row and the frequencies in a

column not being used. It will be seen that the fit of the two

lower contours is, on the whole, fair, especially considering the

high standard errors. In the case of the central contour, y = 20,

the fit looks very poor to the eye, but if the ellipse be compared

carefully with the table, the figures suggest that here again we

have only to deal with the effects of fluctuations of sampling.


For father's stature = 66 in., son's stature = 70 in., there is

a frequency of 18.75, and an increase in this much less than the

standard error would bring the actual contour outside the ellipse.

Again, for father's stature = 68 in., son's stature = 71 in., there

is a frequency of 19, and an increase of a single unit would give

a point on the actual contour below the ellipse. Taking the

results as a whole, the fit must be regarded as quite as good as

we could expect with such small frequencies. It is perhaps of

historical interest to note that Sir Francis Galton, working with-

out a knowledge of the theory of normal correlation, suggested

that the contour lines of a similar table for the inheritance of

stature seemed to be closely represented by a series of concentric

and similar ellipses (ref. 2): the suggestion was confirmed when

he handed the problem, in abstract terms, to a mathematician,

Mr J. D. Hamilton Dickson (ref. 4), asking him to investigate

"the Surface of Frequency of Error that would result from

these data, and the various shapes and other particulars of its

sections that were made by horizontal planes " (ref. 3, p. 102).

12. The normal distribution of frequency for two variables is

an isotropic distribution, to which all the theorems of Chap. V.

§§ 11-12 apply. For if we isolate the four compartments of the

correlation-table common to the rows and columns centring

round values of the variables $x_1$, $x_2$, $x_1'$, $x_2'$, we have for the ratio
of the cross-products (frequency of $x_1 x_2$ multiplied by frequency
of $x_1' x_2'$, divided by frequency of $x_1 x_2'$ multiplied by frequency of
$x_1' x_2$)

$$e^{\frac{r_{12}}{\sigma_{1.2}\,\sigma_{2.1}}\,(x_1' - x_1)(x_2' - x_2)}.$$


Assuming that $x_1' - x_1$ has been taken of the same sign as $x_2' - x_2$,
the exponent is of the same sign as $r_{12}$. Hence the association for
this group of four frequencies is also of the same sign as $r_{12}$, the

ratio of the cross-products being unity, or the association zero,

if r12 is zero. In a normal distribution, the association is therefore

of the same sign—the sign of r12—for every tetrad of frequencies

in the compartments common to two rows and two columns; that

is to say, the distribution is isotropic. It follows that every

grouping of a normal distribution is isotropic whether the class-

intervals are equal or unequal, large or small, and the sign of the

association for a normal distribution grouped down to 2- x 2-fold

form must always be the same whatever the axes of division chosen.


These theorems are of importance in the applications of the

theory of normal correlation to the treatment of qualitative


characters which are subjected to a manifold classification. The

contingency tables for such characters are sometimes regarded as

groupings of a normal distribution of frequency, and the coefficient

of correlation is determined on this hypothesis by a rather lengthy

procedure (ref. 14). Before applying this procedure it is well,

therefore, to see whether the distribution of frequency may be

regarded as approximately isotropic, or reducible to isotropic form

by some alteration in the order of rows and columns (Chap. V.

§§ 9-10). If only reducible to isotropic form by some rearrange-

ment, this rearrangement should be effected before grouping the

table to 2- x 2-fold form for the calculation of the correlation

coefficient by the process referred to. If the table is not reducible

to isotropic form by any rearrangement, the process of calculating

the coefficient of correlation on the assumption of normality is to

be avoided. Clearly, even if the table be isotropic it need not be

normal, but at least the test for isotropy affords a rapid and

simple means for excluding certain distributions which are not


even remotely normal. Table II. of Chap. V. might possibly be

regarded as a grouping of normally distributed frequency if re-

arranged as suggested in § 10 of the same chapter—it would be

worth the investigator's while to proceed further and compare

the actual distribution with a fitted normal distribution—but

Table IV. could not be regarded as normal, and could not be

rearranged so as to give a grouping of normally distributed frequency.


13. If the frequencies in a contingency-table be not large, and

also if the contingency or correlation be small, the influence

of casual irregularities due to fluctuations of sampling may

render it difficult to say whether the distribution may be regarded

as essentially isotropic or no. In such cases some further con-

densation of the table by grouping together adjacent rows and

columns, or some process of "smoothing" by averaging the



frequencies in adjacent compartments, may be of service. The

correlation-table for stature in father and son (Table III., Chap.

IX.), for instance, is obviously not strictly isotropic as it stands:

we have seen, however, that it appears to be normal, within the

limits of fluctuations of sampling, and it should consequently be

isotropic within such limits. We can apply a rough test by

regrouping the table in a much coarser form, say with four rows

and four columns: the table below exhibits such a grouping, the

limits of rows and of columns having been so fixed as to include

not less than 200 observations in each array.

Table I. (condensed from Table III. of Chapter IX.).

Father's Stature (inches) across the columns; Son's Stature down the rows, running from "Under 66.5" to "70.5 and over". The frequencies of the table have not survived in this copy.







Taking the ratio of the frequency in col. 1 to the sum of the

frequencies in cols. 1 and 2 for each successive row, and so on for

the other pairs of columns, we find the following series of ratios:

Table II. Ratio of Frequency in Column m to Frequency in Column m + Frequency in Column (m + 1) in Table I.

Columns compared: 1 and 2; 2 and 3; 3 and 4.






isotropic. The student should form one or two other condensations

of the original table to 3- x 3- or 4- x 4-fold form: he will probably

find them either isotropic, or diverging so slightly from isotropy

that an alteration of the frequencies, well within the margin of

possible fluctuations of sampling, will render the distribution isotropic.
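The rough test described in this section, forming for each pair of adjacent columns the ratio of the first frequency to the sum of the two and seeing whether the ratios run steadily in one direction down the rows, can be sketched as follows. The two small tables are invented purely for illustration and are not the stature data; the function names are likewise hypothetical, and all frequencies are assumed to be positive.

```python
def adjacent_ratios(table):
    """For each pair of adjacent columns (m, m+1), return the ratio
    f[m] / (f[m] + f[m+1]) for every row, from the top row downward."""
    n_cols = len(table[0])
    return [
        [row[m] / (row[m] + row[m + 1]) for row in table]
        for m in range(n_cols - 1)
    ]


def is_isotropic(table):
    """True if every series of ratios is monotone in the same direction,
    i.e. every tetrad of adjacent cells has association of the same sign."""
    series = adjacent_ratios(table)
    falling = all(a >= b for s in series for a, b in zip(s, s[1:]))
    rising = all(a <= b for s in series for a, b in zip(s, s[1:]))
    return falling or rising


# Hypothetical 3 x 3 tables (rows and columns in order of the variables).
regular = [[40, 30, 10],
           [30, 40, 30],
           [10, 30, 40]]
irregular = [[40, 10, 30],
             [30, 40, 30],
             [10, 30, 40]]

print(is_isotropic(regular))    # ratios fall steadily down every pair of columns
print(is_isotropic(irregular))  # the second pair of columns breaks the order
```

Testing adjacent rows and adjacent columns is sufficient here, since the ratio of cross-products for any wider tetrad is a product of the ratios for the adjacent tetrads it contains.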


14. Before concluding this chapter we may note briefly some

of the principal properties of the normal distribution of frequency

for any number of variables, referring the student for proofs to

the original memoirs. Denoting the frequency of the combination
of deviations $x_1, x_2, x_3, \ldots, x_n$ by $y_{12 \ldots n}$, we must have
in the notation of Chapter XII., if the uncorrelated deviations $x_1$,
$x_{2.1}$, $x_{3.12}$, etc. be completely independent (cf. § 3 of the present
chapter),

$$y_{12 \ldots n} = y'_{12 \ldots n}\, e^{-\frac{1}{2}\phi(x_1, x_2, \ldots, x_n)} \qquad (12)$$

where

$$\phi = \frac{x_1^2}{\sigma_1^2} + \frac{x_{2.1}^2}{\sigma_{2.1}^2} + \frac{x_{3.12}^2}{\sigma_{3.12}^2} + \cdots + \frac{x_{n.12 \ldots (n-1)}^2}{\sigma_{n.12 \ldots (n-1)}^2} \qquad (13)$$

and

$$y'_{12 \ldots n} = \frac{N}{(2\pi)^{n/2}\, \sigma_1\, \sigma_{2.1}\, \sigma_{3.12} \cdots \sigma_{n.12 \ldots (n-1)}} \qquad (14)$$

The expression (13) for the exponent $\phi$ may be reduced to a

general form corresponding to that given for two variables, viz.—

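$$\phi = \frac{x_1^2}{\sigma_{1.23 \ldots n}^2} + \frac{x_2^2}{\sigma_{2.13 \ldots n}^2} + \cdots + \frac{x_n^2}{\sigma_{n.12 \ldots (n-1)}^2} - 2\,r_{12.3 \ldots n}\,\frac{x_1 x_2}{\sigma_{1.23 \ldots n}\,\sigma_{2.13 \ldots n}} - \cdots - 2\,r_{(n-1)n.12 \ldots (n-2)}\,\frac{x_{n-1}\, x_n}{\sigma_{(n-1).12 \ldots (n-2)n}\,\sigma_{n.12 \ldots (n-1)}} \qquad (15)$$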

Several important results may be deduced directly from the form

(13) for the exponent. Clearly this might have been written in

a great variety of ways, commencing with any deviation of the


first order, allotting any primary subscript to the second deviation

(except the subscript of the first), and so on, just as in § 3 we

arrived at precisely the same final form for the exponent whether

we started with the two deviations $x_1$ and $x_{2.1}$ or with $x_2$ and $x_{1.2}$.
Our assumption, then, that the deviations $x_1$, $x_{2.1}$, $x_{3.12}$, etc. are

normally distributed amounts to the assumption that all devia-

tions of any order and with any suffixes are normally distributed,

i.e. in the general normal distribution for n variables every array

of every order is a normal distribution. It will also follow, generalising
the deduction of § 6, that any linear function of $x_1, x_2, \ldots, x_n$
is normally distributed. Further, if in (13) any fixed


values be assigned to $x_{3.12}$ and all the following deviations, the
correlation between $x_1$ and $x_2$, on expanding $x_{2.1}$, is, as we have
seen, normal correlation. Similarly, if any fixed values be
assigned to $x_1$, to $x_{4.123}$, and all the following deviations, on
reducing $x_{3.12}$ to the second order we shall find that the correlation
between $x_{2.1}$ and $x_{3.1}$ is normal correlation, the correlation
coefficient being $r_{23.1}$, and so on. That is to say, using $k$ to
denote any group of secondary suffixes, (1) the correlation between
any two deviations $x_{m.k}$ and $x_{n.k}$ is normal correlation; (2) the correlation
between the said deviations is $r_{mn.k}$ whatever the particular
fixed values assigned to the remaining deviations. The latter

conclusion, it will be seen, renders the meaning of partial

correlation coefficients much more definite in the case of normal


correlation than in the general case. In the general case $r_{mn.k}$
represents merely the average correlation, so to speak, between
$x_{m.k}$ and $x_{n.k}$: in the normal case $r_{mn.k}$ is constant for all the sub-groups
corresponding to particular assigned values of the other
variables. Thus in the case of three variables which are normally
correlated, if we assign any given value to $x_3$, the correlation
between the associated values of $x_1$ and $x_2$ is $r_{12.3}$: in the general
case $r_{12.3}$, if actually worked out for the various sub-groups

corresponding, say, to increasing values of x3, would probably

exhibit some continuous change, increasing or decreasing as the

case might be. Finally, we have to note that if, in the expression

(15) for $\phi$, we assign fixed values, say $h_2$, $h_3$, etc., to all the
deviations except $x_1$, and then throw $\phi$ into the form of a perfect
square (as in § 4 for the case of two variables), we obtain a normal
distribution for $x_1$ in which the mean is displaced by

$$r_{12.34 \ldots n}\,\frac{\sigma_{1.23 \ldots n}}{\sigma_{2.13 \ldots n}}\,h_2 + r_{13.24 \ldots n}\,\frac{\sigma_{1.23 \ldots n}}{\sigma_{3.12 \ldots n}}\,h_3 + \cdots + r_{1n.23 \ldots (n-1)}\,\frac{\sigma_{1.23 \ldots n}}{\sigma_{n.12 \ldots (n-1)}}\,h_n.$$

But this is a linear function of $h_2$, $h_3$, etc., therefore in the case of
normal correlation the regression of any one variable on any or all
of the others is strictly linear. The expressions
$r_{12.34 \ldots n}\,\sigma_{1.23 \ldots n}/\sigma_{2.13 \ldots n}$, etc. are of course the partial regressions
$b_{12.34 \ldots n}$, etc.
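For three variables the relation between the partial correlation, the standard-deviations of the second order and the partial regression can be checked numerically. The sketch below assumes the usual three-variable formulae for partial correlations and for standard-deviations of higher order; the standard-deviations and correlations used are invented for illustration, and the function names are hypothetical.

```python
import math


def partial_r(r_ab, r_ac, r_bc):
    """Partial correlation r_ab.c from the three total correlations."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def sd_second_order(sd_a, r_ab, r_ac_b):
    """sigma_a.bc = sigma_a * sqrt((1 - r_ab^2) * (1 - r_ac.b^2))."""
    return sd_a * math.sqrt((1 - r_ab ** 2) * (1 - r_ac_b ** 2))


# Hypothetical standard-deviations and total correlations.
sd1, sd2 = 2.0, 3.0
r12, r13, r23 = 0.6, 0.5, 0.4

r12_3 = partial_r(r12, r13, r23)
r13_2 = partial_r(r13, r12, r23)
r23_1 = partial_r(r23, r12, r13)

sd1_23 = sd_second_order(sd1, r12, r13_2)
sd2_13 = sd_second_order(sd2, r12, r23_1)

# Partial regression of x1 on x2 in the form r12.3 * sigma_1.23 / sigma_2.13 ...
b12_3 = r12_3 * sd1_23 / sd2_13
# ... and the same coefficient written directly in terms of total correlations.
b12_3_direct = (sd1 / sd2) * (r12 - r13 * r23) / (1 - r23 ** 2)

print(round(r12_3, 4), round(b12_3, 4), round(b12_3_direct, 4))
```

The two forms of the coefficient agree, and with $x_3$ fixed at any value $h_3$ the displacement of the mean of $x_1$ is this coefficient multiplied by the assigned value of $x_2$, plus the corresponding term in $h_3$.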



(1) Bravais, A., "Analyse mathématique sur les probabilités des erreurs de
situation d'un point," Acad. des Sciences: Mémoires présentés par divers
savants, IIe série, ix., 1846, p. 255.
(2) Galton, Francis, "Family Likeness in Stature," Proc. Roy. Soc., vol. xl.,
1886, p. 42.

(3) Galton, Francis, Natural Inheritance; Macmillan & Co., 1889.


(4) Dickson, J. D. Hamilton, Appendix to (2), Proc. Roy. Soc., vol. xl.,

1886, p. 63.

(5) Edgeworth, F. Y., "On Correlated Averages," Phil. Mag., 5th Series,

vol. xxxiv., 1892, p. 190.

(6) Pearson, Karl, "Regression, Heredity, and Panmixia," Phil. Trans.

Roy. Soc., Series A, vol. clxxxvii., 1896, p. 253.

(7) Pearson, Karl, "On Lines and Planes of Closest Fit to Systems of Points

in Space," Phil. Mag., 6th Series, vol. ii., 1901, p. 559. (On the fitting

of " principal axes" and the corresponding planes in the case of more

than two variables.)

(8) Pearson, Karl, "On the Influence of Natural Selection on the Variability

and Correlation of Organs," Phil. Trans. Roy. Soc., Series A, vol. cc.,

1902, p. 1. (Based on the assumption of normal correlation.)


(9) Pearson, Karl, and Alice Lee, "On the Generalised Probable Error in

Multiple Normal Correlation," Biometrika, vol. vi., 1908, p. 59.

(10) Yule, G. U., "On the Theory of Correlation," Jour. Roy. Stat. Soc.,

vol. lx., 1897, p. 812.

(11) Yule, G. U., "On the Theory of Correlation for any number of Variables

treated by a New System of Notation," Proc. Roy. Soc., Series A, vol.

lxxix., 1907, p. 182.

(12) Sheppard, W. F., "On the Application of the Theory of Error to Cases

of Normal Distribution and Normal Correlation," Phil. Trans. Roy.

Soc., Series A, vol. cxcii., 1898, p. 101.

(13) Sheppard, W. F., "On the Calculation of the Double-integral

expressing Normal Correlation," Cambridge Phil. Trans., vol. xix.,

1900, p. 23.

Applications to the Theory of Attributes, etc.

(14) Pearson, Karl, "On the Correlation of Characters not Quantitatively

Measurable," Phil. Trans. Roy. Soc., Series A, vol. cxcv., 1900, p. 1.

(15) Pearson, Karl, " On a New Method of Determining Correlation between


a Measured Character A and a Character B, of which only the Percent-

age of Cases wherein B exceeds (or falls short of) a given Intensity is

recorded for each grade of A," Biometrika, vol. vii., 1909, p. 96.

(16) Pearson, Karl, "On a New Method of Determining Correlation, when

one Variable is given by Alternative and the other by Multiple

Categories," Biometrika, vol. vii., 1910, p. 248.

See also the memoir (12) by Sheppard.

Various Methods and their Relation to Normal Correlation.

(17) Pearson, Karl, "On the Theory of Contingency and its Relation to

Association and Normal Correlation," Drapers' Company Research

Memoirs, Biometric Series I. ; Dulau & Co., London, 1904.

(18) Pearson, Karl, "On Further Methods of Determining Correlation,"

Drapers' Company Research Memoirs, Biometric Series IV. (Methods

based on correlation of ranks: difference methods.) Dulau & Co.,

London, 1907.

(19) Spearman, C., "A Footrule for Measuring Correlation," Brit. Jour. of
Psychology, vol. ii., 1906, p. 89. (The suggestion of a "rank" method:

see Pearson's criticism and improved formula in (18) and Spearman's

reply on some points in (20).)

(20) Spearman, C., "Correlation calculated from Faulty Data," Brit. Jour.

of Psychology, vol. iii., 1910, p. 271.

(21) Thorndike, E. L., "Empirical Studies in the Theory of Measurement,"

Archives of Psychology (New York), 1907.



1. Deduce equation (11) from the equations for transformation of co-ordinates

without assuming the normal distribution. (A proof will be found in ref. 10.)

2. Hence show that if the pairs of observed values of $x_1$ and $x_2$ are represented
by points on a plane, and a straight line drawn through the mean, the
sum of the squares of the distances of the points from this line is a minimum

if the line is the major principal axis.

3. The coefficient of correlation with reference to the principal axes being

zero, and with reference to other axes something, there must be some pair of

axes at right angles for which the correlation is a maximum, i. e. is numerically

greatest without regard to sign. Show that these axes make an angle of 45°

with the principal axes, and that the maximum value of the correlation is

$$\frac{\Sigma_1^2 - \Sigma_2^2}{\Sigma_1^2 + \Sigma_2^2}.$$

4. (Sheppard, ref. 12.) A fourfold table is formed from a normal correlation
table, taking the points of division between A and α, B and β, at the
medians, so that (A) = (α) = (B) = (β) = N/2. Show that

$$r = \cos\left(1 - \frac{2(AB)}{N}\right)\pi.$$





1-2. The problem of sampling for variables; the conditions assumed—

3. Standard error of a percentile—4. Special values for the percentiles

of a normal distribution—5. Effect of the form of the distribution

generally—6. Simplified formula for the case of a grouped frequency-

distribution—7. Correlation between errors in two percentiles of the

same distribution—8. Standard error of the interquartile range

for the normal curve—9. Effect of removing the restrictions of simple

sampling, and limitations of interpretation—10. Standard error of

the arithmetic mean—11. Relative stability of mean and median in

sampling—12. Standard error of the difference between two means—

13. The tendency to normality of a distribution of means—14. Effect


of removing the restrictions of simple sampling—15. Statement of the

standard errors of standard deviation, coefficient of variation, corre-

lation coefficient and regression—16. Restatement of the limitations

of interpretation if the sample be small.

1. In Chapters XIII.-XVI. we have been concerned solely with

the theory of sampling for the case of attributes and the frequency-

distributions appropriate to that case. We now proceed to

consider some of the simpler theorems for the case of variables

(cf. Chap. XIII. § 2). Suppose that we have a bag containing a

practically infinite number of tickets or cards bearing the recorded

values of some variable X, and that we draw a ticket from this

bag, note the value that it bears, draw another, and so on until

we have drawn n cards (a number small compared with the whole

number in the bag). Let us continue this process until we have

N such samples of n cards each, and then work out the mean,

standard-deviation, median, etc., for each of the samples. No one

of these measures will prove to be absolutely the same for every


sample, and our problem is to determine the standard-deviation

that each such measure will exhibit.

2. In solving this problem, we must be careful to define

precisely the conditions which are assumed to subsist, so as to

realise the limitations of any solution obtained. These conditions


were discussed very fully for the case of attributes (Chap. XIII.

§ 8), and we would refer the student to the discussion then given.

Here it is sufficient to state the assumptions briefly, using the

letters (a), (b) and (c) to denote the corresponding assumptions

indicated by the same letters in the section cited.

(a) We assume that we are drawing from precisely the same

record throughout the experiment, so that the chance of drawing

a card with any given value of X, or a value within any assigned

limits, is the same at each sampling.

(b) We assume not only that we are drawing from the same

record throughout, but that each of our cards at each drawing

may be regarded quite strictly as drawn from the same record (or

from identically similar records): e.g. if our card-record is con-


tained in a series of bundles, we must not make it a practice to

take the first card from bundle number 1, the second card from

bundle number 2, and so on, or else the chance of drawing a

card with a given value of X, or a value within assigned limits,

may not be the same for each individual card at each drawing.

(c) We assume that the drawing of each card is entirely

independent of that of every other, so that the value of X recorded

on card 1, at each drawing, is uncorrelated with the value of X

recorded on card 2, 3, 4, and so on. It is for this reason that we

spoke of the record, in § 1, as containing a practically infinite

number of cards, for otherwise the successive drawings at each

sampling would not be independent: if the bag contain ten

tickets only, bearing the numbers 1 to 10, and we draw the card

bearing 1, the average of the following cards drawn will be higher

than the mean of all cards drawn; if, on the other hand, we draw

the 10, the average of the following cards will be lower than the mean

of all cards—i.e. there will be a negative correlation between the


number on the card taken at any one drawing and the card taken

at any other drawing. Without making the number of cards in

the bag indefinitely large, we can, as already pointed out for the

case of attributes (Chap. XIII. § 3), eliminate this correlation by

replacing each card before drawing the next.
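The negative correlation described under (c) can be verified by direct enumeration for the bag of ten tickets. The sketch below lists every equally likely ordered pair of first and second draws, with and without replacement, and computes the correlation between the two numbers; the function name is illustrative only.

```python
import math
from itertools import permutations, product


def correlation_between_draws(values, with_replacement):
    """Exact correlation between the numbers on the first and second tickets,
    taken over all equally likely ordered pairs of draws."""
    values = list(values)
    if with_replacement:
        pairs = list(product(values, repeat=2))
    else:
        pairs = list(permutations(values, 2))
    n = len(pairs)
    mean_first = sum(a for a, _ in pairs) / n
    mean_second = sum(b for _, b in pairs) / n
    cov = sum((a - mean_first) * (b - mean_second) for a, b in pairs) / n
    var_first = sum((a - mean_first) ** 2 for a, _ in pairs) / n
    var_second = sum((b - mean_second) ** 2 for _, b in pairs) / n
    return cov / math.sqrt(var_first * var_second)


tickets = range(1, 11)  # the ten tickets numbered 1 to 10
print(correlation_between_draws(tickets, with_replacement=False))
print(correlation_between_draws(tickets, with_replacement=True))
```

Without replacement the correlation comes out at -1/9, that is -1/(N - 1) for a bag of N tickets, so that it vanishes as the number of tickets is made indefinitely large; with replacement it is zero whatever the size of the bag.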

Sampling conducted under these conditions we shall, as before,

speak of as simple sampling. We do not, it should be noticed,

make the further assumption that the sample is unbiassed, i.e.

that the chance of inclusion in the sample is independent of the

value of X recorded on the card (cf. the last paragraph in § 8,

Chap. XIII., and the discussion in §§ 4-8, Chap. XIV.). This

assumption is unnecessary. If it be true, the interpretation of

our results becomes simpler and more straightforward, for we

can substitute for such phrases as "the standard-deviation of X

in a very large sample," "the form of the frequency-distribution


in a very large sample," the phrases "the standard-deviation of

X in the original record" "the form of the frequency-distribution

in the original record": but in very many, perhaps the majority

of, practical cases the very question at issue is the nature of the

relation between the distribution of the sample and the distribu-

tion of the record from which it is drawn. As has already been

emphasised in the passages to which reference is made above, no

examination of samples drawn under the same conditions can

give any evidence on this head.

3. Standard Error of a Percentile.—Let us consider first the

fluctuations of sampling for a given percentile, as the problem is

intimately related to that of Chaps. XIII.-XIV.

Let $X_p$ be a value of X such that pN of the values of X in
an indefinitely large sample drawn under the same conditions lie
above it and qN below it.

If we note the proportions of observations above $X_p$ in samples
of n drawn from the record, we know that these observed values
will tend to centre round p as mean, with a standard-deviation
$\sqrt{pq/n}$. If now at each drawing, as well as observing the proportion
of X's above $X_p$, say $p + \delta$, for the sample, we also proceed
to note the adjustment $\epsilon$ required in $X_p$ to make the proportion
of observations above $X_p + \epsilon$ in the sample p, the standard-deviation
of $\epsilon$ will bear to the standard-deviation of $\delta$ the same
ratio that $\epsilon$ on an average bears to $\delta$. But this ratio is quite
simply determinable if the number of observations in the sample
is sufficiently large to justify us in assuming that $\delta$ is small, so
small that we may regard the element of the frequency curve
(for a very large sample) over which $X_p + \epsilon$ ranges as approximately
a rectangle. If this assumption be made, and we denote the
standard-deviation of X in a very large sample by $\sigma$, and the
ordinate of the frequency curve at $X_p$ when drawn with unit area
and unit standard-deviation by $y_p$,

$$\epsilon = \frac{\sigma}{y_p}\,\delta.$$

Therefore for the standard-deviation of $\epsilon$ or of the percentile
corresponding to a proportion p we have

$$\sigma_{X_p} = \frac{\sigma}{y_p}\sqrt{\frac{pq}{n}}. \qquad (1)$$

4. If the frequency-distribution for the very large sample be a

normal curve, the values of yp for the principal percentiles may be

taken from the published tables. A table calculated by Mr

Sheppard (Table IV., ref. 14, in Appendix I.), gives the values


directly, and these have been utilised for the following: the

student can estimate the values roughly by a combined use of the

area and ordinate tables for the normal curve given in Chapter

XV., remembering to divide the ordinates given in that table by

$\sqrt{2\pi}$ so as to make the area unity:

Value of yp

Median .

Deciles 4 and 6

„ 3 and 7

2 and 8

„ 1 and 9








Inserting these values of $y_p$ in equation (1), we have the

following values for the standard errors of the median, deciles,

etc., and the values given in the second column for their probable

errors (Chap. XV. § 17), which the student may sometimes find useful.


                     Standard error is                  Probable error is
                     $\sigma/\sqrt{n}$ multiplied by    $\sigma/\sqrt{n}$ multiplied by

Median               1.25331
Deciles 4 and 6      1.26804
   "    3 and 7      1.31800
   "    2 and 8      1.42877
   "    1 and 9      1.70942
Quartiles            1.36263


It will be seen that the influence of fluctuations of sampling on

the several percentiles increases as we depart from the median:

the standard error of the quartiles is nearly one-tenth greater than

that of the median, and the standard error of the first or ninth

deciles more than one-third greater.
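The multipliers of $\sigma/\sqrt{n}$ for the normal curve follow directly from equation (1), being $\sqrt{pq}/y_p$. A minimal sketch, assuming Python's standard-library `statistics.NormalDist` in place of the printed tables of areas and ordinates; the function name `se_multiplier` is illustrative.

```python
import math
from statistics import NormalDist


def se_multiplier(p):
    """Factor by which sigma/sqrt(n) is multiplied to give the standard error
    of the percentile with a proportion p of the observations above it,
    on the assumption of a normal distribution."""
    std_normal = NormalDist()            # unit area, unit standard-deviation
    z = std_normal.inv_cdf(1 - p)        # abscissa of the percentile
    y_p = std_normal.pdf(z)              # ordinate at the percentile
    return math.sqrt(p * (1 - p)) / y_p


for label, p in [("Median", 0.5), ("Deciles 4 and 6", 0.4),
                 ("Deciles 3 and 7", 0.3), ("Deciles 2 and 8", 0.2),
                 ("Deciles 1 and 9", 0.1), ("Quartiles", 0.25)]:
    print(f"{label:16s} {se_multiplier(p):.5f}")
```

By the symmetry of the normal curve the factor is the same for p and for 1 - p, which is why the deciles are taken in pairs.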

5. Consider further the influence of the form of the frequency-

distribution on the standard error of the median, as this is an

important form of average. For a distribution with a given

number of observations and a given standard-deviation the

standard error varies inversely as yp. Hence for a distribution in

which yp is small, for example a U-shaped distribution like that

of fig. 18 or fig. 19, the standard error of the median will be

relatively high, and it will, in so far, be an undesirable form of

average to employ. On the other hand, in the case of a distribu-

tion which has a high peak in the centre, so as to exhibit a value

of yp large compared with the standard-deviation, the standard

error of the median will be relatively low. We can create such a


"peaked" distribution by superposing a normal curve with a

small standard-deviation on a normal curve with the same mean

and a relatively large standard-deviation. To give some idea of

the reduction in the standard error of the median that may be

effected by a moderate change in the form of the distribution, let

us find for what ratio of the standard-deviations of two such curves,

having the same area, the standard error of the median reduces to

$\sigma/\sqrt{n}$, where $\sigma$ is of course the standard-deviation of the compound
distribution.

Let $\sigma_1$, $\sigma_2$ be the standard-deviations of the two distributions,
and let there be n/2 observations in each. Then

$$\sigma^2 = \tfrac{1}{2}\left(\sigma_1^2 + \sigma_2^2\right). \qquad (a)$$

On the other hand, the value of $y_p$ is

$$y_p = \frac{\sigma\,(\sigma_1 + \sigma_2)}{2\sqrt{2\pi}\,\sigma_1 \sigma_2}. \qquad (b)$$

Hence the standard error of the median is

$$\sqrt{\frac{2\pi}{n}}\,\frac{\sigma_1 \sigma_2}{\sigma_1 + \sigma_2}. \qquad (c)$$

(c) is equal to $\sigma/\sqrt{n}$ if

$$\frac{(\sigma_1 + \sigma_2)\sqrt{\sigma_1^2 + \sigma_2^2}}{2\sqrt{\pi}\,\sigma_1 \sigma_2} = 1.$$

Writing $\sigma_2/\sigma_1 = \rho$, that is if

$$\frac{(1 + \rho)\sqrt{1 + \rho^2}}{2\sqrt{\pi}\,\rho} = 1,$$

or

$$\rho^4 + 2\rho^3 + (2 - 4\pi)\rho^2 + 2\rho + 1 = 0.$$

This equation may be reduced to a quadratic and solved by
taking $\rho + 1/\rho$ as a new variable. The roots found give $\rho$ = 2.2360
.... or 0.4472 ...., the one root being merely the reciprocal of
the other. The standard error of the median will therefore be
$\sigma/\sqrt{n}$, in such a compound distribution, if the standard-deviation


of the one normal curve is, in round numbers, about 2¼ times
that of the other. If the ratio be greater, the standard error
of the median will be less than $\sigma/\sqrt{n}$. The distribution


for which the standard error of the median is exactly equal to

$\sigma/\sqrt{n}$ is shown in fig. 53: it will be seen that it is by no means

a very striking form of distribution; at a hasty glance it might

almost be taken as normal. In the case of distributions of a form

more or less similar to that shown, it is evident that we cannot

at all safely estimate by eye alone the relative standard error of

the median as compared with $\sigma/\sqrt{n}$.
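The solution of the quartic may be followed numerically. Dividing through by the square of the ratio and taking u as the ratio plus its reciprocal reduces the equation to u² + 2u - 4π = 0; the sketch below solves this, recovers the ratio, and checks that the standard error of the median then equals σ/√n. The variable names and the helper `median_se_ratio` are illustrative only.

```python
import math

# u = rho + 1/rho satisfies u^2 + 2u - 4*pi = 0; take the positive root.
u = -1 + math.sqrt(1 + 4 * math.pi)

# rho then satisfies rho^2 - u*rho + 1 = 0; the two roots are reciprocals.
rho = (u + math.sqrt(u * u - 4)) / 2


def median_se_ratio(ratio):
    """Standard error of the median divided by sigma/sqrt(n) for two normal
    curves of equal area, common mean, and standard-deviations 1 and `ratio`."""
    sigma_1, sigma_2 = 1.0, ratio
    sigma = math.sqrt((sigma_1 ** 2 + sigma_2 ** 2) / 2)
    se_times_root_n = math.sqrt(2 * math.pi) * sigma_1 * sigma_2 / (sigma_1 + sigma_2)
    return se_times_root_n / sigma


print(round(rho, 4), round(1 / rho, 4))
print(median_se_ratio(rho))   # unity at the critical ratio
print(median_se_ratio(1.0))   # the single normal curve: 1.2533...
print(median_se_ratio(3.0))   # a larger ratio: less than unity
```

With the ratio equal to unity the compound curve is simply normal and the factor 1.2533 for the median is recovered, which serves as a check on the expressions used.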

Fig. 53.

6. In the case of a grouped frequency-distribution, if the
number of observations is sufficient to make the class-frequencies
run fairly smoothly, i.e. to enable us to regard the distribution
as nearly that of a very large sample, the standard error of any
percentile can be calculated very readily indeed, for we can
eliminate $\sigma$ from equation (1). Let $f_p$ be the frequency-per-class-interval
at the given percentile: simple interpolation will

give us the value with quite sufficient accuracy for practical

purposes, and if the figures run irregularly they may be smoothed.

Let $\sigma$ be the value of the standard-deviation expressed in class-intervals,
and let n be the number of observations as before.
Then since $y_p$ is the ordinate of the frequency-distribution when
drawn with unit standard-deviation and unit area, we must have

$$y_p = \frac{\sigma}{n}\, f_p.$$

But this gives at once for the standard error expressed in terms
of the class-interval as unit

$$\sigma_{X_p} = \frac{\sqrt{npq}}{f_p}. \qquad (2)$$

As an example in which we can compare the results given by

the two different formulae (1) and (2), take the distribution of

stature used as an illustration in Chaps. VII. and VIII. and in

§§ 13, 14 of Chap. XV. The number of observations is 8585,

and the standard-deviation 2-57 in., the distribution being

approximately normal: o-/v/w = 0'027737, and, multiplying by the

factor 1-253 .... given in the table in § 4, this gives 0-0348

as the standard error of the median, on the assumption of

normality of the distribution. Using the direct method of

equation (2), we find the median to be 6747 (Chap. VII. § 15),

Generated for Lawrence J Hubert (University of Illinois at Urbana-Champaign) on 2013-09-06 15:50 GMT / http://hdl.handle.net/2027/mdp.39015033708259

which is very nearly at the centre of the interval with a

frequency 1329. Taking this as being, with sufficient accuracy

for our present purpose, the frequency per interval at the median,

the standard error is

$$\frac{\sqrt{8585 \times \frac{1}{2} \times \frac{1}{2}}}{1329} = 0.0349.$$

As we should expect, the value is practically the same as that

obtained from the value of the standard-deviation on the assump-

tion of normality.
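
The comparison of the two methods for the median can be checked numerically. The following is a minimal sketch in Python, not part of the original text; the function name `se_percentile_direct` is illustrative, and the tabulated factor 1.253 .... is assumed here to be $\sqrt{\pi/2}$, the value of $\sqrt{pq}/y_p$ at the median of a normal curve.

```python
import math

def se_percentile_direct(n, p, f_p):
    """Equation (2): standard error of a percentile, in class-intervals.

    n   -- number of observations
    p   -- proportion of the area on one side of the percentile (q = 1 - p)
    f_p -- frequency per class-interval at the percentile
    """
    return math.sqrt(n * p * (1.0 - p)) / f_p

n, sigma = 8585, 2.57  # stature data quoted in the text (inches)

# Normal-curve method: sigma/sqrt(n) times the factor for the median.
median_factor = math.sqrt(math.pi / 2.0)  # assumed origin of the 1.253....
se_median_normal = median_factor * sigma / math.sqrt(n)

# Direct method, equation (2), with 1329 observations per interval at the median.
se_median_direct = se_percentile_direct(n, 0.5, 1329)

print(f"normal assumption: {se_median_normal:.4f}")
print(f"direct method:     {se_median_direct:.4f}")
```

The two results differ only in the fourth decimal place, as the text remarks.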

Let us find the standard error of the first and ninth deciles

as another illustration. On the assumption that the distribu-

tion is normal, these standard errors are the same, and equal to

0.027737 × 1.70942 = 0.0474. Using the direct method, we

find by simple interpolation the approximate frequencies per

interval at the first and ninth deciles respectively to be 590 and

570, giving standard errors of 0.0471 and 0.0488, mean 0.0479,

slightly in excess of that found on the assumption that the fre-

quency is given by the normal curve. The student should notice

that the class-interval is, in this case, identical with the unit of

measurement, and consequently the answer given by equation (2)

does not require to be multiplied by the magnitude of the

interval.


In the case of the distribution of pauperism (Chap. VII.,

Example i.), the fact that the class-interval is not a unit must

be remembered. The frequency at the median (3.195 per cent.)

is approximately 96, and this gives for the standard error of the

median by (2) (the number of observations being 632) 0.1309

intervals, that is 0.0655 per cent.
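
The decile and pauperism figures of this section can be reproduced in the same way. This is a sketch under stated assumptions: the helper names are hypothetical, the normal-curve factor is computed as $\sqrt{pq}/y_p$ with Python's `statistics.NormalDist`, and the class-interval for the pauperism data is taken as 0.5 per cent, which is inferred from the statement that 0.1309 intervals equal 0.0655 per cent.

```python
import math
from statistics import NormalDist

def se_percentile_direct(n, p, f_p, interval=1.0):
    """Equation (2), multiplied by the class-interval to give original units."""
    return interval * math.sqrt(n * p * (1.0 - p)) / f_p

def normal_percentile_factor(p):
    """sqrt(pq)/y_p for a normal curve of unit standard-deviation and area."""
    z = NormalDist().inv_cdf(p)
    return math.sqrt(p * (1.0 - p)) / NormalDist().pdf(z)

n = 8585
unit = 2.57 / math.sqrt(n)  # sigma / sqrt(n) for the stature data

decile_normal = normal_percentile_factor(0.1) * unit      # normal assumption
decile_first = se_percentile_direct(n, 0.1, 590)          # direct, first decile
decile_ninth = se_percentile_direct(n, 0.9, 570)          # direct, ninth decile

# Pauperism: 632 observations, frequency 96 at the median,
# class-interval assumed to be 0.5 per cent.
pauperism_median = se_percentile_direct(632, 0.5, 96, interval=0.5)

for name, value in [("decile, normal", decile_normal),
                    ("first decile", decile_first),
                    ("ninth decile", decile_ninth),
                    ("pauperism median", pauperism_median)]:
    print(f"{name:18s} {value:.4f}")
```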

7. In finding the standard error of the difference between two


percentiles in the same distribution, the student must be care-

ful to note that the errors in two such percentiles are not

independent. Consider the two percentiles, for which the values

of $p$ and $q$ are $p_1$, $q_1$ and $p_2$, $q_2$ respectively, the first-named being the

lower of the two percentiles. These two percentiles divide the

whole area of the frequency curve into three parts, the areas of

which are proportional to $q_1$, $1 - q_1 - p_2$, and $p_2$. Further, since

the errors in the first percentile are directly proportional to the

errors in $q_1$, and the errors in the second percentile are directly

proportional but of opposite sign to the errors in $p_2$, the correlation

between errors in the two percentiles will be the same as

the correlation between errors in $q_1$ and $p_2$ but of opposite sign.

But if there be a deficiency of observations below the lower

percentile, producing an error $\delta_1$ in $q_1$, the missing observations

will tend to be spread over the two other sections of the curve

in proportion to their respective areas, and will therefore tend to

produce an error

$$\delta_2 = -\frac{p_2}{p_1}\,\delta_1$$

in $p_2$. If then $r$ be the correlation between errors in $q_1$ and $p_2$,

$\epsilon_1$ and $\epsilon_2$ their respective standard errors, we have

$$r\,\frac{\epsilon_2}{\epsilon_1} = -\frac{p_2}{p_1}.$$

Or, inserting the values of the standard errors,

$$r = -\sqrt{\frac{q_1 p_2}{q_2 p_1}}.$$


The correlation between the percentiles is the same in magnitude

but opposite in sign: it is obviously positive, and consequently

$$\text{correlation between errors in two percentiles} = \sqrt{\frac{q_1 p_2}{q_2 p_1}} \qquad (3)$$

If the two percentiles approach very close together, $q_1$ and $q_2$,

$p_1$ and $p_2$ become sensibly equal to one another, and the correlation

becomes unity, as we should expect.
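
The behaviour of this correlation is easily tabulated. The sketch below is illustrative only; the function name is hypothetical, and $q$ is taken, as in the paragraph above, to be the proportion of the area below a percentile and $p = 1 - q$ the proportion above it.

```python
import math

def percentile_error_correlation(q_lower, q_upper):
    """Correlation between sampling errors in two percentiles.

    q_lower, q_upper -- proportions of the area lying below the lower and
    the upper percentile respectively (0 < q_lower <= q_upper < 1).
    """
    p_lower, p_upper = 1.0 - q_lower, 1.0 - q_upper
    return math.sqrt((q_lower * p_upper) / (q_upper * p_lower))

# First and ninth deciles: q1 = 0.1, p1 = 0.9, q2 = 0.9, p2 = 0.1.
r_deciles = percentile_error_correlation(0.1, 0.9)

# Two percentiles very close together: the correlation approaches unity.
r_close = percentile_error_correlation(0.5000, 0.5001)

print(f"deciles: {r_deciles:.4f}, neighbouring percentiles: {r_close:.4f}")
```

For the two extreme deciles the correlation is only 1/9, while for neighbouring percentiles it is sensibly unity.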

8. Let us apply the above value of the correlation between

percentiles to find the standard error of the semi-interquartile

range for the normal curve. Inserting $q_1 = p_2 = \frac{1}{4}$, $q_2 = p_1 = \frac{3}{4}$, we

find $r = \frac{1}{3}$. Hence the standard error of the interquartile range

is, applying the ordinary formula for the standard-deviation of a

difference, $2/\sqrt{3}$ times the standard error of either quartile, or

the standard error of the semi-interquartile range $1/\sqrt{3}$ times

the standard error of a quartile. Taking the value of the

standard error of a quartile from the table in § 4, we have, finally,

$$\text{standard error of the semi-interquartile range in a normal distribution} = 0.78672\,\frac{\sigma}{\sqrt{n}} \qquad (4)$$

Of course the standard-deviation of the inter-quartile, or semi-

interquartile, range can readily be worked out in any particular

case, using equation (2) and the value of the correlation

given above: it is best to work out such standard errors

from first principles, applying the usual formula for the standard

deviation of the difference of two correlated variables (Chap. XI.

§ 2, equation (1)).
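
The constant in equation (4) may be recovered from first principles along the lines just described. This is a minimal sketch, assuming the quartile factor of § 4 is $\sqrt{pq}/y_p$ evaluated on the normal curve; the variable names are illustrative.

```python
import math
from statistics import NormalDist

normal = NormalDist()  # unit standard-deviation, unit area

# Standard error of a quartile, in units of sigma/sqrt(n).
z_quartile = normal.inv_cdf(0.75)
quartile_factor = math.sqrt(0.25 * 0.75) / normal.pdf(z_quartile)

# Correlation between errors in the two quartiles, equation (3).
r = math.sqrt((0.25 * 0.25) / (0.75 * 0.75))  # = 1/3

# Standard-deviation of a difference of two correlated variables
# having equal standard errors e: sqrt(e^2 + e^2 - 2 r e^2).
interquartile_factor = quartile_factor * math.sqrt(2.0 - 2.0 * r)
semi_interquartile_factor = interquartile_factor / 2.0

print(f"quartile factor:           {quartile_factor:.5f}")
print(f"semi-interquartile factor: {semi_interquartile_factor:.5f}")
```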

9. If there is any failure of the conditions of simple sampling,

the formulae of the preceding sections cease, of course, to hold

good. We need not, however, enter again into a discussion of

the effect of removing the several restrictions, for the effect on

the standard error of p was considered in detail in §§ 9-14 of

Chap. XIV., and the standard error of any percentile is directly

proportional to the standard error of $p$ (cf. § 3). Further, the

student may be reminded that the standard error of any per-

centile measures solely the fluctuations that may be expected in

that percentile owing to the errors of simple sampling alone: it

has no bearing, therefore, save on the one question, whether an

observed divergence of the percentile, from a certain value that

might be expected to be yielded by a more extended series of

observations or that had actually been observed in some other

series, might or might not be due to fluctuations of simple

sampling alone. It cannot and does not give any indication of

the possibility of the sample being biassed or unrepresentative of

the material from which it has been drawn, nor can it give any

indication of the magnitude or influence of definite errors of

observation—errors which may conceivably be of greater im-

portance than errors of sampling. In the case of the distribution

of statures, for instance, the standard error almost certainly gives

quite a misleading idea as to the accuracy attained in determining

the average stature for the United Kingdom: the sample is not

representative, the several parts of the kingdom not contributing

in their true proportions. The student should refer again to the

discussion of these points in §§ 4-8 of Chap. XIV. Finally, we

may note that the standard error of a percentile cannot be

evaluated unless the number of observations is fairly large—large

enough to determine $f_p$ (eqn. 2) with reasonable accuracy, or


to test whether we may treat the distribution as approximately

normal (cf. also § 16 below).

(As regards the theory of sampling for the median and per-

centiles generally, cf. ref. 12, Laplace, Supplement II. (standard

error of the median), Edgeworth, refs. 4, 5, 6, and Sheppard, ref.

21: the preceding sections have been based on the work of

Edgeworth and Sheppard.)

10. Standard Error of the Arithmetic Mean.—Let us now pass

to a fresh problem, and determine the standard error of the

arithmetic mean.

This is very readily obtained. Suppose we note separately at

each drawing the value recorded on the first, second, third ....

and nth card of our sample. The standard-deviation of the values

on each separate card will tend in the long run to be the same

and identical with the standard-deviation $\sigma$ of $x$ in an indefinitely

large sample, drawn under the same conditions. Further, the

value recorded on each card is (as we assume) uncorrelated with

that on every other. The standard-deviation of the sum of the

values recorded on the $n$ cards is therefore $\sqrt{n}\,\sigma$, and the

standard-deviation of the mean of the sample is consequently

1/nth of this; or,

$$\sigma_m = \frac{\sigma}{\sqrt{n}} \qquad (5)$$



This is a most important and frequently cited formula, and the

student should note that it has been obtained without any

reference to the size of the sample or to the form of the frequency-

distribution. It is therefore of perfectly general application, if

$\sigma$ be known. We can verify it against our formula for the

standard-deviation of sampling in the case of attributes. The

standard-deviation of the number of successes in a sample of m

observations is $\sqrt{mpq}$: the standard-deviation of the total

number of successes in $n$ samples of $m$ observations each is therefore

$\sqrt{nmpq}$: dividing by $n$ we have the standard-deviation of

the mean number of successes in the $n$ samples, viz. $\sqrt{mpq}/\sqrt{n}$,

agreeing with equation (5).
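
That equation (5) holds whatever the form of the distribution and however small the sample can be illustrated by exhaustive enumeration. The sketch below is not from the original text: the three-valued record and its frequencies are invented for the purpose, and every possible ordered sample of three cards is enumerated with its probability.

```python
import itertools
import math

# A hypothetical, decidedly skew record: values and their relative frequencies.
values = [0.0, 1.0, 5.0]
probs = [0.5, 0.3, 0.2]
n = 3  # cards per sample

mean = sum(v * w for v, w in zip(values, probs))
sigma = math.sqrt(sum(w * (v - mean) ** 2 for v, w in zip(values, probs)))

# Enumerate every ordered sample of n independent drawings.
variance_of_mean = 0.0
for draw in itertools.product(range(len(values)), repeat=n):
    probability = math.prod(probs[i] for i in draw)
    sample_mean = sum(values[i] for i in draw) / n
    variance_of_mean += probability * (sample_mean - mean) ** 2

se_exact = math.sqrt(variance_of_mean)
se_formula = sigma / math.sqrt(n)  # equation (5)

print(f"by enumeration: {se_exact:.6f}, by equation (5): {se_formula:.6f}")
```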

11. For a normal curve the standard error of the mean is to

the standard error of the median approximately as 100 to 125

(cf. § 4), and in general the standard errors of the two stand in

a somewhat similar ratio for a distribution not differing largely

from the normal form. For the distribution of statures used as

an illustration in § 6 the standard error of the median was found

to be 0.0349: the standard error of the mean is only 0.0277.

The distribution being very approximately normal, the ratio of


the two standard errors, viz. 1.26, assumes almost exactly the theoretical

magnitude. In the case of the asymmetrical distribution of

rates of pauperism, also used as an illustration in § 6, the standard

error of the median was found to be 0.0655 per cent. The

standard error of the mean is only 0.0493 per cent., which bears

to the standard error of the median a ratio of 1 to 1.33. As

such cases as these seem on the whole to be the more common

and typical, we stated in Chap. VII. § 18 that the mean is in

general less affected than the median by errors of sampling. At

the same time we also indicated the exceptional cases in which

the median might be the more stable—cases in which the mean

might, for example, be affected considerably by small groups of

widely outlying observations, or in which the frequency-distri-

bution assumed a form resembling fig. 53, but even more

exaggerated as regards the height of the central "peak" and the

relative length of the "tails." Such distributions are not un-

common in some economic statistics, and they might be expected

to characterise some forms of experimental error. If, in these

cases, the greater stability of the median is sufficiently marked

to outweigh its disadvantages in other respects, the median

may be the better form of average to use. Fig. 53 represents

a distribution in which the standard errors of the mean and of the

median are the same. Further, in some experimental cases it is

conceivable that the median may be less affected by definite

experimental errors, the average of which does not tend to be

zero, than is the mean,—this is, of course, a point quite distinct

from that of errors of sampling.

12. If two quite independent samples of $n_1$ and $n_2$ observations

respectively be drawn from a record, evidently $\epsilon_{12}$, the standard

error of the difference of their means is given by

$$\epsilon_{12}^2 = \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right) \qquad (6)$$


If an observed difference exceed three times the value of $\epsilon_{12}$

given by this formula it can hardly be ascribed to fluctuations

of sampling. If, in a practical case, the value of $\sigma$ is not known

a priori, we must substitute an observed value, and it would seem

natural to take as this value the standard-deviation in the two

samples thrown together. If, however, the standard-deviations

of the two samples themselves differ more than can be accounted

for on the basis of fluctuations of sampling alone (see below, § 15),

we evidently cannot assume that both samples have been drawn

from the same record: the one sample must have been drawn

from a record or a universe exhibiting a greater standard-deviation


than the other. If two samples be drawn quite independently

from different universes, indefinitely large samples from which

exhibit the standard-deviations $\sigma_1$ and $\sigma_2$, the standard error of

the difference of their means will be given by

$$\epsilon_{12}^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \qquad (7)$$

This is, indeed, the formula usually employed for testing the

significance of the difference between two means in any case:

seeing that the standard error of the mean depends on the

standard-deviation only, and not on the mean, of the distribution,

we can inquire whether the two universes from which samples

have been drawn differ in mean apart from any difference in

dispersion.


If two quite independent samples be drawn from the same

universe, but instead of comparing the mean of the one with the

mean of the other we compare the mean $m_1$ of the first with the

mean $m_0$ of both samples together, the use of (6) or (7) is not

justified, for errors in the mean of the one sample are correlated

with errors in the mean of the two together. Following precisely

the lines of the similar problem in § 13, Chap. XIII., case III., we

find that this correlation is $\sqrt{n_1/(n_1 + n_2)}$, and hence

$$\epsilon_{01}^2 = \sigma^2\,\frac{n_2}{n_1(n_1 + n_2)} \qquad (8)$$

(For a complete treatment of this problem in the case of samples

drawn from two different universes cf. ref. 19.)
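
The three formulae of this section may be set side by side as follows. This is a sketch only: the function names and the numerical values are hypothetical, and equation (8) is checked against the ordinary formula for the difference of two correlated variables, using the correlation $\sqrt{n_1/(n_1 + n_2)}$ stated above.

```python
import math

def se_diff_same_universe(sigma, n1, n2):
    """Equation (6): both samples drawn from one record."""
    return sigma * math.sqrt(1.0 / n1 + 1.0 / n2)

def se_diff_two_universes(sigma1, n1, sigma2, n2):
    """Equation (7): samples from universes with different dispersions."""
    return math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

def se_part_versus_whole(sigma, n1, n2):
    """Equation (8): mean of the first sample against the mean of both."""
    return sigma * math.sqrt(n2 / (n1 * (n1 + n2)))

# Illustrative figures (not from the text).
sigma, n1, n2 = 2.0, 100, 300

# Check (8) from the standard errors of the two means and their correlation.
e1 = sigma / math.sqrt(n1)
e0 = sigma / math.sqrt(n1 + n2)
r = math.sqrt(n1 / (n1 + n2))
check = math.sqrt(e1**2 + e0**2 - 2.0 * r * e1 * e0)

print(f"(6): {se_diff_same_universe(sigma, n1, n2):.4f}")
print(f"(8): {se_part_versus_whole(sigma, n1, n2):.4f}  check: {check:.4f}")
```

An observed difference would then be compared with three times the appropriate value, as the text recommends.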

13. The distribution of means of samples drawn under the

conditions of simple sampling will always be more symmetrical

than the distribution of the original record, and the symmetry

will be the greater the greater the number of observations in the

sample. Further, the distribution of means (and therefore also of

the differences between means) tends to become not merely sym-

metrical but normal. We can only illustrate, not prove, the

point here; but if the student will refer to § 13, Chap. XV., he will

see that the genesis of the normal curve in this case is in accord-

ance with what we then stated, viz. that the distribution tends to

be normal whenever the variable may be regarded as the sum

(or some slightly more complex function) of a number of other

variables. In the present instance this condition is strictly ful-

filled. The mean of the sample of n observations is the sum of

the values in the sample each divided by n, and we should expect

the distribution to be the more nearly normal the larger n. As

an illustration of the approach to symmetry even for small values


of n, we may take the following case. If the student will turn to

the calculated binomials, given as illustrations of the forms of

binomial distributions in Chap. XV. § 3, he will find there the

distribution of the number of successes for twenty events when

$q = 0.9$, $p = 0.1$: the distribution is extremely skew, starting at

zero, rising to high frequencies for 1 and 2 successes, and thence

tailing off to 20 cases of 7 successes in 10,000 throws, 4 cases of 8

successes and 1 case of 9 successes. But now find the distribu-

tion for the mean number of successes in groups of five throws,

under the same conditions. This will be equivalent to finding

the distribution of the number of successes for 100 such events,

and then dividing the observed number of successes by five—the

last process making no difference to the form of the distribution,

but only to its scale. But the distribution of the number of

successes for 100 events when $q = 0.9$, $p = 0.1$, is also given in

Chap. XV. § 3, and it will be seen that, while it is appreciably

asymmetrical, the divergence from symmetry is comparatively

small: the distribution has gained very greatly in symmetry

though only five observations have been taken to the sample.
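
The gain in symmetry can be put in figures by computing the two binomials directly. The sketch below is an illustration under one assumption not made in the text: asymmetry is measured by the moment coefficient of skewness (third moment about the mean divided by the cube of the standard-deviation), which is unaffected by the change of scale involved in dividing by five.

```python
import math

def binomial_pmf(n, p):
    """Exact probabilities of 0, 1, ..., n successes in n independent events."""
    return [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def moment_skewness(pmf):
    """Third central moment divided by the 3/2 power of the second."""
    mean = sum(k * w for k, w in enumerate(pmf))
    m2 = sum(w * (k - mean) ** 2 for k, w in enumerate(pmf))
    m3 = sum(w * (k - mean) ** 3 for k, w in enumerate(pmf))
    return m3 / m2**1.5

pmf20 = binomial_pmf(20, 0.1)    # single throws of twenty events
pmf100 = binomial_pmf(100, 0.1)  # totals for groups of five such throws

# Expected cases of 7, 8 and 9 successes in 10,000 throws of twenty events.
tail_per_10000 = [round(10000 * pmf20[k]) for k in (7, 8, 9)]

skew20 = moment_skewness(pmf20)
skew100 = moment_skewness(pmf100)

print("cases of 7, 8, 9 successes per 10,000:", tail_per_10000)
print(f"skewness, 20 events: {skew20:.3f}; 100 events: {skew100:.3f}")
```

On this measure the asymmetry for groups of five throws is less than half of that for single throws.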

We may therefore reasonably assume, if our sample is large,

that the distribution of means is approximately a normal dis-

tribution, and we may calculate, on that assumption, the fre-

quency with which any given deviation from a theoretical value

or a value observed in some other series, in an observed mean, will

arise from fluctuations of simple sampling alone.

The warning is necessary, however, that the approach to

normality is only rapid if the condition that the several drawings

for each sample shall be independent is strictly fulfilled. If the

observations are not independent, but are to some extent positively

correlated with each other, even a fairly large sample may con-

tinue to reflect any asymmetry existing in the original distribution

(cf. ref. 24 and the record of sampling there cited).

If the original distribution be normal, the distribution of

means, even of small samples, is strictly normal. This follows at

once from the fact that any linear function of normally distributed

variables is itself normally distributed (Chap. XVI. § 6). The

distribution will not in general, however, be normal if the

deviation of the mean of each sample is expressed in terms of the

standard-deviation of that sample (cf. ref. 22).

14. Let us consider briefly the effect on the standard error of

the mean if the conditions of simple sampling as laid down in

§ 2 cease to apply.

(a) If we do not draw from the same record all the time, but

first draw a series of samples from one record, then another

series from another record with a somewhat different mean and


standard-deviation, and so on, or if we draw the successive

samples from essentially different parts of the same record, the

standard error will be greatly increased. For suppose we draw

$k_1$ samples from the first record, for which the standard-deviation

(in an indefinitely large sample) is $\sigma_1$, and the mean differs by

$d_1$ from the mean of all the records together (as ascertained by

large samples in numbers proportionate to those now taken); $k_2$

samples from the second record, for which the standard-deviation

is $\sigma_2$, and the mean differs by $d_2$ from the mean of all the records

together, and so on. Then for the samples drawn from the first

record the standard error of the mean will be $\sigma_1/\sqrt{n}$, but the

distribution will centre round a value differing by $d_1$ from the

mean for all the records together: and so on for the samples

drawn from the other records. Hence, if $\sigma_m$ be the standard error

of the mean, $N$ the total number of samples,

$$N\sigma_m^2 = \Sigma\left(k\,\frac{\sigma^2}{n}\right) + \Sigma(k\,d^2).$$

But the standard-deviation $\sigma_0$ for all the records together is given by

$$N\sigma_0^2 = \Sigma(k\,\sigma^2) + \Sigma(k\,d^2).$$

Hence, writing $\Sigma(k\,d^2) = N s_m^2$,

$$\sigma_m^2 = \frac{\sigma_0^2}{n} + \frac{n-1}{n}\,s_m^2 \qquad (9)$$

This equation corresponds precisely to equation (2) of § 9, Chap.

XIV. The standard error of the mean, if our samples are drawn

from different records or from essentially different parts of the

entire record, may be increased indefinitely as compared with the

value it would have in the case of simple sampling. If, for

example, we take the statures of samples of n men in a number

of different districts of England, and the standard-deviation of all

the statures observed is $\sigma_0$, the standard-deviation of the means

for the different districts will not be $\sigma_0/\sqrt{n}$, but will have some

greater value, dependent on the real variation in mean stature

from district to district.
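
A small numerical case may make the effect of case (a) plainer. The sketch below uses invented figures for two records; the function name is hypothetical. It evaluates the square of the standard error of the mean directly from the separate records, then by equation (9), and compares both with the value for simple sampling.

```python
def variance_of_mean_mixed_records(k, sigma, d, n):
    """Case (a): k[r] samples of n are drawn from record r.

    sigma[r] -- standard-deviation of record r
    d[r]     -- deviation of its mean from the mean of all records together
    Returns (direct value, value by equation (9), simple-sampling value).
    """
    N = sum(k)
    direct = sum(kr * (sr**2 / n + dr**2) for kr, sr, dr in zip(k, sigma, d)) / N
    sigma0_sq = sum(kr * (sr**2 + dr**2) for kr, sr, dr in zip(k, sigma, d)) / N
    s_m_sq = sum(kr * dr**2 for kr, dr in zip(k, d)) / N
    by_eq9 = sigma0_sq / n + (n - 1) / n * s_m_sq
    return direct, by_eq9, sigma0_sq / n

# Invented figures: 30 samples from one record, 70 from another, 25 cards each.
# The deviations are chosen so that 30 * 0.7 + 70 * (-0.3) = 0.
direct, by_eq9, simple = variance_of_mean_mixed_records(
    k=[30, 70], sigma=[2.0, 3.0], d=[0.7, -0.3], n=25)

print(f"direct: {direct:.4f}  equation (9): {by_eq9:.4f}  simple sampling: {simple:.4f}")
```

With these figures the squared standard error is 0.51 against 0.3084 for simple sampling.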

(b) If we are drawing from the same record throughout, but

always draw the first card from one part of that record, the

second card from another part, and so on, and these parts differ

more or less, the standard error of the mean will be decreased.

For if, in large samples drawn from the subsidiary parts of the

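record from which the several cards are taken, the standard-deviations

are $\sigma_1$, $\sigma_2$, .... $\sigma_n$, and the means differ by $d_1$, $d_2$,

.... $d_n$ from the mean for a large sample from the entire record,

we have

$$\sigma_m^2 = \frac{1}{n^2}\,\Sigma(\sigma^2) = \frac{1}{n^2}\left[n\sigma_0^2 - \Sigma(d^2)\right]$$

or, writing $\Sigma(d^2) = n\,s_m^2$,

$$\sigma_m^2 = \frac{\sigma_0^2}{n} - \frac{s_m^2}{n} \qquad (10)$$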

The last equation again corresponds precisely with that given for

the same departure from the rules of simple sampling in the case

of attributes (Chap. XIV. § 11., eqn. 4). If, to vary our previous

illustration, we had measured the statures of men in each of $n$

different districts, and then proceeded to form a set of samples

by taking one man from each district for the first sample, one

man from each district for the second sample, and so on, the

standard-deviation of the means of the samples so formed would

be appreciably less than the standard error of simple sampling

$\sigma_0/\sqrt{n}$. As a limiting case, it is evident that if the men in each

district were all of precisely the same stature, the means of all the

samples so compounded would be identical: in such a case, in fact,

$\sigma_0 = s_m$, and consequently $\sigma_m = 0$. To give another illustration, if

the cards from which we were drawing samples had been arranged

in order of the magnitude of X recorded on each, we would get

a much more stable sample by drawing one card from each

successive nth part of the record than by taking the sample

according to our previous rules—e.g. shaking them up in a bag

and taking out cards blindfold, or using some equivalent process.

The result is perhaps of some practical interest. It shows that,

if we are actually taking samples from a large area, different

districts of which exhibit markedly different means for the

variable under consideration, and are limited to a sample of $n$

observations; if we break up the whole area into $n$ sub-districts,

each as homogeneous as possible, and take a contribution to the

sample from each, we will obtain a more stable mean by this

orderly procedure than will be given, for the same number of

observations, by any process of selecting the districts from which

samples shall be taken by chance. There may, however, be a

greater risk of biassed error. The conclusions seem in accord

with common-sense.
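
The gain from the orderly procedure of case (b) can be shown on invented figures. In the sketch below (names and numbers are hypothetical) one card is taken from each of four parts of the record, and the squared standard error of the mean is found directly, by equation (10), and for simple sampling from the whole record.

```python
def variance_of_mean_one_per_part(sigma, d):
    """Case (b): one card from each of n parts of the record.

    sigma[r] -- standard-deviation within part r
    d[r]     -- deviation of the mean of part r from the general mean
    Returns (direct value, value by equation (10), simple-sampling value).
    """
    n = len(sigma)
    direct = sum(s**2 for s in sigma) / n**2
    s_m_sq = sum(x**2 for x in d) / n
    sigma0_sq = sum(s**2 for s in sigma) / n + s_m_sq
    by_eq10 = sigma0_sq / n - s_m_sq / n
    return direct, by_eq10, sigma0_sq / n

# Four parts whose means differ widely (deviations sum to zero).
direct, by_eq10, simple = variance_of_mean_one_per_part(
    sigma=[1.0, 1.5, 2.0, 2.5], d=[-3.0, -1.0, 1.0, 3.0])

print(f"direct: {direct:.5f}  equation (10): {by_eq10:.5f}  simple sampling: {simple:.5f}")
```

Here the squared standard error falls from 2.09375 under simple sampling to 0.84375.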

(c) Finally, suppose that, while our conditions (a) and (b) of § 2

hold good, the magnitude of the variable recorded on one card

drawn is no longer independent of the magnitude recorded on


another card, e.g. that if the first card drawn at any sampling

bears a high value, the next and following cards of the same

sample are likely to bear high values also. Under these circumstances,

if $r_{12}$ denote the correlation between the values on the

first and second cards, and so on,

$$\sigma_m^2 = \frac{\sigma^2}{n} + \frac{2\sigma^2}{n^2}\left(r_{12} + r_{13} + \dots + r_{23} + \dots\right).$$

There are $n(n-1)/2$ correlations; and if, therefore, $r$ is the

arithmetic mean of them all, we may write

$$\sigma_m^2 = \frac{\sigma^2}{n}\left[1 + r(n-1)\right] \qquad (11)$$

As the means and standard-deviations of $x_1$, $x_2$, .... $x_n$ are all

identical, r may more simply be regarded as the correlation

coefficient for a table formed by taking all possible pairs of the

n values in every sample. If this correlation be positive, the

standard error of the mean will be increased, and for a given

value of r the increase will be the greater, the greater the size of

the samples. If r be negative, on the other hand, the standard

error will be diminished. Equation (11) corresponds precisely to

equation (6), § 13, of Chap. XIV.
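
Equation (11) may be compared with a direct summation over the table of correlations. The following sketch is illustrative: the function names and the correlation table are invented, and the check that $1 + r(n-1)$ cannot be negative is an added safeguard, not something stated in the text.

```python
import math

def se_mean_correlated(sigma, n, r):
    """Equation (11): r is the mean of the n(n-1)/2 correlations."""
    factor = 1.0 + r * (n - 1)
    if factor < 0:
        raise ValueError("mean correlation cannot be less than -1/(n-1)")
    return sigma / math.sqrt(n) * math.sqrt(factor)

def se_mean_from_table(sigma, corr):
    """Direct value: sum every entry of the n-by-n correlation table."""
    n = len(corr)
    total = sum(corr[i][j] for i in range(n) for j in range(n))
    return sigma * math.sqrt(total) / n

# Invented correlations between the first, second and third cards.
corr = [[1.0, 0.2, 0.4],
        [0.2, 1.0, 0.0],
        [0.4, 0.0, 1.0]]
mean_r = (0.2 + 0.4 + 0.0) / 3

direct = se_mean_from_table(2.0, corr)
by_eq11 = se_mean_correlated(2.0, 3, mean_r)

# Effect of sample size for a fixed small positive correlation.
inflation = se_mean_correlated(1.0, 100, 0.1) / se_mean_correlated(1.0, 100, 0.0)

print(f"direct: {direct:.4f}  equation (11): {by_eq11:.4f}")
print(f"inflation for n = 100, r = 0.1: {inflation:.3f}")
```

Even so small a mean correlation as 0.1 more than trebles the standard error of the mean when the sample numbers 100.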

As was pointed out in that chapter, the case when r is positive

covers the case discussed under (a): for if we draw successive

samples from different records, such a positive correlation is at

once introduced, although the drawings of the several cards at

each sampling are quite independent of one another. Similarly,

the case discussed under (b) is covered by the case of negative

correlation, for if each card is always drawn from a separate and

distinct part of the record, the correlation between any two x's will

on the average be negative: if some one card be always drawn

from a part of the record containing low values of the variable,

the others must on an average be drawn from parts containing

relatively high values. It is as well, however, to keep the cases

(a), (b), and (c) distinct, since a positive or negative correlation

may arise for reasons quite different from those considered under

(a) and (b).

15. With this discussion of the standard error of the arithmetic

mean we must bring the present work to a close. To indicate

briefly our reasons for not proceeding further with the discussion

of standard errors, we must remind the student that in order to

express the standard error of the mean we require to know, in

addition to the mean itself, the standard-deviation about the mean,

or, in other words, the mean (deviation)² with respect to the mean.

Similarly, to express the standard error of the standard-deviation

we require to know, in the general case, the mean (deviation)⁴

with respect to the mean. Either, then, we must find this quantity

for the given distribution—and this would entail entering on a

field of work which hitherto we have intentionally avoided—or we

must, if that be possible, assume the distribution to be of such a

form that we can express the mean (deviation)⁴ in terms of the

mean (deviation)². This can be done, as a fact, for the normal

distribution, but the proof would again take us rather beyond

the limits that we have set ourselves. To deal with the standard

error of the correlation coefficient would take us still further

afield, and the proof would be laborious and difficult, if not

impossible, without the use of the differential and integral cal-

culus. We must content ourselves, therefore, with a simple

statement of the standard errors of the three most important

constants, standard-deviation, correlation coefficient, and regres-

sion. [The fundamental memoirs are refs. 15, 17, 21.]

Standard-deviation.—If the distribution be normal.

$$\text{standard error of the standard-deviation in a normal distribution} = \frac{\sigma}{\sqrt{2n}} \qquad (12)$$

This is generally given as the standard error in all cases: it is,

however, by no means exact: the general expression is

$$\text{standard error of the standard-deviation in a distribution of any form} = \sqrt{\frac{\mu_4 - \mu_2^2}{4n\mu_2}} \qquad (13)$$

where $\mu_4$ is the mean (deviation)⁴ (deviations being, of course,

measured from the mean) and $\mu_2$ the mean (deviation)² or the

square of the standard-deviation: n is assumed sufficiently large

to make the errors in the standard-deviation small compared with

that quantity itself. Equation (13) may in some cases give

values considerably greater—twice as great or more—than (12).

(Cf. ref. 14.) If, however, the distribution be normal, equation

(12) gives the standard error not merely of standard-deviations of

order zero, to use the terminology of Chap. XII., but of standard-

deviations of any order (ref. 25). It will be noticed, on reference

to equation (4) above, § 8, that the standard error of the standard-deviation

is less than that of the semi-interquartile range for a normal

distribution, both absolutely and relatively to the quantity concerned.

For a normal distribution, again, we have—

$$\text{standard error of the coefficient of variation } v = \frac{v}{\sqrt{2n}}\left\{1 + 2\left(\frac{v}{100}\right)^2\right\}^{\frac{1}{2}} \qquad (14)$$


The expression in the bracket is usually very nearly unity, for

a normal distribution, and in that case may be neglected.

Correlation coefficient.—If the distribution be normal,

$$\text{standard error of the correlation coefficient for a normal distribution} = \frac{1 - r^2}{\sqrt{n}} \qquad (15)$$

This is the value always given: the use of a more general formula

which would entail the use of higher moments does not appear

to have been attempted. As regards the case of small samples,

cf. ref. 23. Equation (15) gives the standard error of a coefficient

of any order, total or partial (ref. 25).

Coefficient of regression.—If the distribution be normal,

$$\text{standard error of the coefficient of regression } b_{12} \text{ for a normal distribution} = \frac{\sigma_1\sqrt{1 - r_{12}^2}}{\sigma_2\sqrt{n}} = \frac{\sigma_{1.2}}{\sigma_2\sqrt{n}} \qquad (16)$$

This formula again applies to a regression coefficient of any order,

total or partial: i.e. in terms of our general notation, k denoting

any collection of secondary subscripts,

$$\text{standard error of } b_{12.k} \text{ for a normal distribution} = \frac{\sigma_{1.2k}}{\sigma_{2.k}\sqrt{n}}.$$

To convert any standard error to the probable error multiply by

the constant 0.674489 ....
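
The standard errors stated in this section may be collected as a few short functions. This is a sketch rather than a definitive implementation: the function names are hypothetical, the formulae are those of equations (12) to (16) as given above, and the probable-error constant is computed as the upper quartile of the unit normal curve, which is assumed to be the origin of the figure 0.674489 ....

```python
import math
from statistics import NormalDist

def se_sd_normal(sigma, n):
    """Equation (12): standard-deviation, normal distribution."""
    return sigma / math.sqrt(2 * n)

def se_sd_general(mu2, mu4, n):
    """Equation (13): standard-deviation, distribution of any form."""
    return math.sqrt((mu4 - mu2**2) / (4 * n * mu2))

def se_coefficient_of_variation(v, n):
    """Equation (14): v is expressed as a percentage."""
    return v / math.sqrt(2 * n) * math.sqrt(1 + 2 * (v / 100) ** 2)

def se_correlation(r, n):
    """Equation (15): correlation coefficient, normal distribution."""
    return (1 - r**2) / math.sqrt(n)

def se_regression(sigma1, sigma2, r, n):
    """Equation (16): regression coefficient b_12, normal distribution."""
    return sigma1 * math.sqrt(1 - r**2) / (sigma2 * math.sqrt(n))

# Factor converting a standard error into a probable error.
PROBABLE_ERROR_FACTOR = NormalDist().inv_cdf(0.75)

# For a normal distribution mu4 = 3 * mu2**2, and (13) reduces to (12).
sigma, n = 2.57, 8585
general = se_sd_general(sigma**2, 3 * sigma**4, n)
normal = se_sd_normal(sigma, n)
print(f"(12): {normal:.5f}  (13) with normal moments: {general:.5f}")
print(f"probable error of the standard-deviation: {PROBABLE_ERROR_FACTOR * normal:.5f}")
```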

16. We need hardly restate once more the warnings given in

Chap. XIV., and repeated in § 9 above, that a standard error can

give no evidence as to the biassed or representative character of

a sample, nor as to the magnitude of errors of observation, but

we may, in conclusion, again emphasise the warnings given

in §§ 1-3, Chap. XIV., as to the use of standard errors when

the number of observations in the sample is small.

In the first place, if the sample be small, we cannot in general

assume that the distribution of errors is approximately normal:

it would only be normal in the case of the median (for which p

and q are equal) and in the case of the mean of a normal distri-

bution. Consequently, if n be small, the rule that a range of

three times the standard error includes the majority of the

fluctuations of simple sampling of either sign does not strictly

apply, and the "probable error" becomes of doubtful significance.

Secondly, it will be noted that the values of $\sigma$ and $y_p$ in (1), of

$f_p$ in (2), and of $\sigma$ in (4) and (5), i.e. the values that would be

given for these constants by an indefinitely large sample drawn


under the same conditions, or the values that they possess in

the original record if the sample is unbiassed, are assumed to be

known a priori. But this is only the case in dealing with the

problems of artificial chance: in practical cases we have to use

the values given us by the sample itself. If this sample is based

on a considerable number of observations the procedure is safe

enough, but if it be only a small sample we may possibly mis-

estimate the standard error to a serious extent. Following the

procedure suggested in Chap. XIV., some rough idea as to the

possible extent of under-estimation or over-estimation may be

obtained, e.g. in the case of the mean, by first working out the

standard error of $\sigma$ on the assumption that the values for the

necessary moments are correct, and then replacing $\sigma$ in the

expression for the standard error of the mean by $\sigma$ ± three times

its standard error so obtained.
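
The rough procedure just described may be sketched as follows. The sketch assumes, for simplicity, that the standard error of $\sigma$ is given by equation (12), i.e. that the distribution is normal; the function name and the small sample of 20 are hypothetical, the standard-deviation 2.57 being borrowed from the stature illustration.

```python
import math

def se_mean_rough_range(sigma_observed, n):
    """Lower, central and upper values for the standard error of the mean,
    replacing sigma by sigma -/+ three times its own standard error."""
    se_sigma = sigma_observed / math.sqrt(2 * n)  # equation (12), normal case
    lower_sigma = max(sigma_observed - 3 * se_sigma, 0.0)
    upper_sigma = sigma_observed + 3 * se_sigma
    root_n = math.sqrt(n)
    return lower_sigma / root_n, sigma_observed / root_n, upper_sigma / root_n

for n in (20, 8585):
    low, mid, high = se_mean_rough_range(2.57, n)
    print(f"n = {n:5d}: {low:.4f} .. {mid:.4f} .. {high:.4f}")
```

For the small sample the possible misestimation is of the same order as the standard error itself; for the large sample it is a matter of a few per cent.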

Finally, it will be remembered that unless the number of

observations is large, we cannot interpret the standard error of

any constant in the inverse sense, i.e. the standard error ceases

to measure with reasonable accuracy the standard-deviation of

true values of the constant round the observed value (Chap.

XIV. § 3). If the sample be large, the direct and inverse

standard errors are approximately the same.


(1) Blakeman, J., and Karl Pearson, "On the Probable Error of the

Coefficient of Mean Square Contingency," Biometrika, vol. v., 1906, p.


(2) Bowley, A. L., The Measurement of Groups and Series; C. & E. Layton,

London, 1903.

(3) Bowley, A. L., Address to Section F of the British Association, 1906.

(4) Edgeworth, F. Y., "Observations and Statistics: An Essay on the

Theory of Errors of Observation and the First Principles of Statistics,"

Cambridge Phil. Trans., vol. xiv., 1885, p. 139.

(5) Edgeworth, F. Y., "Problems in Probabilities," Phil. Mag., 5th Series,

vol. xxii., 1886, p. 371.

(6) Edgeworth, F. Y., "The Choice of Means," Phil. Mag., 5th Series,

vol. xxiv., 1887, p. 268.

(7) Edgeworth, F. Y., "On the Probable Errors of Frequency Constants,"

Jour. Roy. Stat. Soc., vol. lxxi., 1908, pp. 381, 499, 651; and

Addendum, vol. lxxii., 1909, p. 81.

(8) Elderton, W. Palin, "Tables for Testing the Goodness of Fit of Theory

to Observation," Biometrika, vol. i., 1902, p. 155.

(9) Gibson, Winifred, "Tables for Facilitating the Computation of

Probable Errors," Biometrika, vol. iv., 1906, p. 385.

(10) Heron, D., "An Abac to determine the Probable Errors of Correlation

Coefficients," Biometrika, vol. vii., 1910, p. 411. (A diagram giving

the probable error for any number of observations up to 1000.)


(11) Heron, D., "On the Probable Error of a Partial Correlation Coefficient,"

Biometrika, vol. vii., 1910, p. 411. (A proof, on ordinary algebraic

lines, for the case of three variables, of the result given in (25).)

(12) Laplace, Pierre Simon, Marquis de, Théorie des probabilités, 2e edn.,

1814. (With four supplements.)

(13) Pearl, Raymond, "The Calculation of the Probable Errors of Certain

Constants of the Normal Curve," Biometrika, vol. v., 1906, p. 190.

(14) Pearl, Raymond, "On certain Points concerning the Probable Error

of the Standard-deviation," Biometrika, vol. vi., 1908, p. 112. (On

the amount of divergence, in certain cases, from the probable error

σ/√(2n) in the case of a normal distribution.)

(15) Pearson, Karl, and L. N. G. Filon, "On the Probable Errors of

Frequency Constants, and on the Influence of Random Selection on

Variation and Correlation," Phil. Trans. Roy. Soc., Series A, vol. cxci.,

1898, p. 229.

(16) Pearson, Karl, "On the Criterion that a given System of Deviations

from the Probable in the Case of a Correlated System of Variables is

such that it can be reasonably supposed to have arisen from Random

Sampling," Phil. Mag., 5th series, vol. l., 1900, p. 157.

(17) Pearson, Karl, and others (editorial), "On the Probable Errors of

Frequency Constants," Biometrika, vol. ii., 1903, p. 273. (Useful for

the general formulae given, based on the general case without respect to

the form of the frequency-distribution.)

(18) Pearson, Karl, "On the Curves which are most suitable for describing

the Frequency of Random Samples of a Population," Biometrika, vol.

v., 1906, p. 172.

(19) Pearson, Karl, "Note on the Significant or Non-significant Character

of a Sub-sample drawn from a Sample," Biometrika, vol. v., 1906, p.


(20) Rhind, A., "Tables for Facilitating the Computation of Probable Errors

of the Chief Constants of Skew Frequency-distributions," Biometrika,

vol. vii., 1909-10, p. 127 and p. 386.

(21) Sheppard, W. F., "On the Application of the Theory of Error to Cases

of Normal Distribution and Normal Correlation," Phil. Trans. Roy.

Soc., Series A, vol. cxcii., 1898, p. 101.

(22) "Student," "On the Probable Error of a Mean," Biometrika, vol. vi.,

1908, p. 1. (The standard error of the mean in terms of the standard

error of the sample.)

(23) "Student," "On the Probable Error of a Correlation Coefficient,"

Biometrika, vol. vi., 1908, p. 302. (The problem of the probable error

with small samples.)

(24) "Student," "On the Distribution of Means of Samples which are not

drawn at Random," Biometrika, vol. vii., 1909, p. 210.

(25) Yule, G. U., "On the Theory of Normal Correlation for any number of

Variables treated by a New System of Notation," Proc. Roy. Soc.,

Series A, vol. lxxix., 1907, p. 182. (See pp. 192-3 at end.)

Reference may also be made to the following, which deals for the

most part with the effects of errors other than errors of sampling:—

(26) Bowley, A. L., "Relations between the Accuracy of an Average and

that of its Constituent Parts," Jour. Roy. Stat. Soc., vol. lx., 1897,

p. 855.


1. For the data in the last column of Table IX., Chap. VI. p. 95, find

the standard error of the median (154.7 lbs.).

2. For the same distribution, find the standard errors of the two quartiles

(142.5 lbs., 168.4 lbs.).

3. For the same distribution, find the standard error of the semi-interquartile

range.

4. The standard-deviation of the same distribution is 21.3 lbs. Find the

standard error of the mean, and compare its magnitude with that of the

standard error of the median (Qn. 1).

5. Work out the standard error of the standard-deviation for the distribution

of statures used as an illustration in § 6. (Standard-deviation 2.57 in.:

8585 observations.) Compare the ratio of standard error of standard-deviation

to the standard-deviation, with the ratio of the standard error of

the semi-interquartile range to the semi-interquartile range, assuming the

distribution normal.

6. Calculate a small table giving the standard errors of the correlation

coefficient, based on (1) 100, (2) 1000 observations, for values of r = 0, 0.2, 0.4,

0.6, 0.8, assuming the distribution normal.



For heavy arithmetical work an arithmometer is, of course,

invaluable; but, owing to their cost, arithmetic machines are, as a

rule, beyond the reach of the student. For a great deal of simple

work, especially work not intended for publication, the student

will find a slide-rule exceedingly useful: particulars and prices

will be found in any instrument maker's catalogue. A plain

25-cm. rule will serve for most ordinary purposes, or if greater

accuracy is desired, a 50-cm. rule, a Fuller spiral rule, or one of the

Hannyngton-pattern rules (Aston & Mander, London), in which

the scale is broken up into a number of parallel segments, may be

preferred. For greater exactness in multiplying or dividing,


logarithms are almost essential: five-figure tables suffice if answers

are only desired true to five digits; if greater accuracy is needed,

seven-figure tables must be used. It is hardly necessary to cite

special editions of tables of logarithms here, but attention may

perhaps be directed to the recently issued eight-figure tables of

Bauschinger and Peters (W. Engelmann, Leipzig, and Asher & Co.,

London, 1910; vol. i. containing logarithms of all numbers from

1 to 200,000, price 18s. 6d. net; vol. ii. to contain logs. of

trigonometric functions).

If it is desired to avoid logarithms, extended multiplication

tables are very useful. There are many of these, and four of

different forms are cited below. Zimmermann's tables are inex-

pensive and recommended for the elementary student, Cotsworth's,

Crelle's, or Peters' tables for more advanced work. Barlow's tables

are invaluable for calculating standard-deviations of ungrouped

observations and similar work.

(1) Barlow's Tables of Squares, Cubes, Square-roots, Cube-roots, and Reciprocals

of all Integer Numbers up to 10,000; E. & F. N. Spon,

London and New York; stereotype edition, price 6s.



(2) Cotsworth, M. B., The Direct Calculator, Series 0. (Product table to

1000 x 1000.) M'Corquodale & Co., London ; price with thumb index,

25s.; without index, 21s.

(3) Crelle, A. L., Rechentafeln. (Multiplication table giving all products up

to 1000 x 1000.) Can be obtained with explanatory introduction in

German or in English. G. Reimer, Berlin; price 15s.

(4) Elderton, W. P. "Tables of Powers of Natural Numbers, and of the

Sums of Powers of the Natural Numbers from 1 to 100" (gives

powers up to seventh), Biometrika, vol. ii. p. 474.

(5) Peters, J., Neue Rechentafeln für Multiplikation und Division. (Gives

products up to 100 x 10,000: more convenient than Crelle for forming

four-figure products. Introduction in English, French or German.)

G. Reimer, Berlin; price 15s.

(6) Zimmermann, H., Rechentafel, nebst Sammlung häufig gebrauchter

Zahlenwerthe. (Products of all numbers up to 100 x 1000: subsidiary

tables of squares, cubes, square-roots, cube-roots and reciprocals, etc.

for all numbers up to 1000 at the foot of the page.) W. Ernst & Son,

Berlin; price 5s. ; English edition, Asher & Co., London, 6s.


Several tables of service will be found in the works cited in

Appendix II., e.g., a table of Gamma Functions in Elderton's

book (10) and a table of six-figure logarithms of the factorials

of all numbers from 1 to 1100 in De Morgan's treatise (9). The

tables cited below from Biometrika are to be included with others

in a volume entitled Tables for the Use of Statisticians and

Biometricians, now in the press, to be issued for the Biometric

Laboratory of University College, London, by Messrs Dulau & Co.

(7) Davenport, C. B., Statistical Methods, with especial reference to

Biological Variation ; New York, John Wiley; London, Chapman &

Hall; second edition, 1904. (Tables of area and ordinates of the


normal curve, gamma functions, probable errors of the coefficient of

correlation, powers, logarithms, etc.)

(8) Elderton, W. P., "Tables for Testing the Goodness of Fit of Theory to

Observation," Biometrika, vol. i., 1902, p. 155.

(9) Everitt, P. F., "Tables of the Tetrachoric Functions for Fourfold

Correlation Tables," Biometrika, vol. vii., 1910, p. 437. (Tables for

facilitating the calculation of the correlation coefficient of a fourfold

table by Pearson's method on the assumption that it is a grouping

of a normally distributed table; cf. ref. 14 of Chap. XVI.)

(10) Gibson, Winifred, "Tables for Facilitating the Computation of Prob-

able Errors," Biometrika, vol. iv., 1906, p. 385.

(11) Heron, D., "An Abac to determine the Probable Errors of Correlation

Coefficients," Biometrika, vol. vii., 1910, p. 411. (A diagram giving

the probable error for any number of observations up to 1000.)

(12) Lee, Alice, "Tables of F(r, v) and H(r, v) Functions," British Association

Report, 1899. (Functions occurring in connection with Professor

Pearson's frequency curves.)

(13) Rhind, A., "Tables for Facilitating the Computation of Probable Errors

of the Chief Constants of Skew Frequency-distributions," Biometrika,

vol. vii., 1909-10, p. 127 and p. 386.


(14) Sheppard, W. F., "New Tables of the Probability Integral," Biometrika,

vol. ii., 1903, p. 174. (Includes not merely table of areas of the normal

curve (to seven figures), but also a table of the ordinates to the same

degree of accuracy.)

(15) Sheppard, W. F., "Table of Deviates of the Normal Curve" (with

introductory article on Grades and Deviates by Sir Francis Galton),

Biometrika, vol. v., 1907, p. 404. (A table giving the deviation of

the normal curve, in terms of the standard-deviation as unit, for the

ordinates which divide the area into a thousand equal parts.)





The student may find the following short list of service, as

supplementing the lists of references given at the ends of the

several chapters, the latter containing, as a rule, original memoirs

only. The economic student who wishes to know more of the

practical side of statistics may be referred to Mr A. L. Bowley's

"Elements" (5 below), to An Elementary Manual of Statistics

(Macdonald & Evans, London, 1910), by the same writer (useful

as a general guide to English statistics), and to M. Jacques

Bertillon's Cours élémentaire de statistique (Société d'éditions

scientifiques, 1895: international in scope). Dr A. Newsholme's

Vital Statistics (Swan Sonnenschein, 3rd edn., 1899) will also be

of service to students of that subject.

All the works mentioned in the following list, with others which

it has not been thought necessary to include, are in the library

of the Royal Statistical Society.

(1) Airy, Sir G. B., On the Algebraical and Numerical Theory of Errors of

Observations; 1st edn., 1861; 3rd edn., 1879.

(2) Bernoulli, J., Ars conjectandi, opus posthumum: Accedit tractatus de

seriebus infinitis, et epistola gallice scripta de ludo pilae reticularis,

1713. (A German translation in Ostwald's Klassiker der exakten

Wissenschaften, Nos. 107, 108.)

(3) Bertrand, J. L. F., Calcul des probabilités; Gauthier-Villars, Paris, 1889.

(4) Borel, E., Éléments de la théorie des probabilités; Hermann, Paris,


(5) Bowley, A. L., Elements of Statistics; P. S. King, London; 1st edn.,

1901 ; 3rd edn., 1907.

(6) Bruns, H., Wahrscheinlichkeitsrechnung und Kollektivmasslehre;

Teubner, Leipzig, 1906.

(7) Cournot, A. A., Exposition de la théorie des chances et des probabilités,


(8) Czuber, E., Wahrscheinlichkeitsrechnung und ihre Anwendung auf

Fehlerausgleichung, Statistik und Lebensversicherung; Teubner,

Leipzig, 2nd edn., vol. i., 1908-10.


(9) De Morgan, A., Treatise on the Theory of Probabilities (extracted from

the Encyclopædia Metropolitana), 1837.

(10) Elderton, W. P., Frequency Curves and Correlation; C. & E. Layton,

London, 1906. (Deals with Professor Pearson's frequency curves and

correlation, with illustrations chiefly of actuarial interest.)

(11) Fechner, G. T., Kollektivmasslehre (posthumously published; edited

by G. F. Lipps); Engelmann, Leipzig, 1897.

(12) Galloway, T., Treatise on Probability (republished from the 7th edn.

of the Encyclopædia Britannica), 1839.

(13) Gauss, C. F., Méthode des moindres carrés: Mémoires sur la combinaison

des observations, traduits par J. Bertrand, 1855.

(14) Laplace, Pierre Simon, Marquis de, Essai philosophique sur les

probabilités, 1814. (The introduction to 15, separately printed with

some modifications.)

(15) Laplace, Pierre Simon, Marquis de, Théorie analytique des probabilités;

2nd edn., 1814, with supplements 1 to 4.

(16) Lexis, W., Abhandlungen zur Theorie der Bevölkerungs- und Moralstatistik;

Fischer, Jena, 1903.

(17) Poincaré, H., Calcul des probabilités; Gauthier-Villars, Paris, 1896.

(18) Poisson, S. D., Recherches sur la probabilité des jugements en matière

criminelle et en matière civile, précédées des règles générales du calcul

des probabilités, 1837. (German translation by C. H. Schnuse, 1841.)

(19) Quetelet, L. A. J., Lettres sur la théorie des probabilités, appliquée aux

sciences morales et politiques, 1846. (English translation by O. G.

Downes, 1849.)

(20) Thorndike, E. L., An Introduction to the Theory of Mental and Social

Measurements, Science Press, New York, 1904.

(21) Venn, J., The Logic of Chance: an Essay on the Foundations and

Province of the Theory of Probability, with especial reference to its

Logical Bearings and its Application to Moral and Social Science and to

Statistics; 3rd edn., Macmillan, London, 1888.

(22) Westergaard, H., Die Grundzüge der Theorie der Statistik; Fischer,

Jena, 1890.



































3. The frequencies not given in the question itself are—

(a) (AB) 107 (AC) 405 (BC) 525.

(b) (Aβγ) 22,980 (αBγ) 13,585 (αβC) 96,478 (αβγ) 28,868,495.

(AB)/(Aβ) > (B)/(β), whence (AB)/[(AB) + (Aβ)] > (B)/[(B) + (β)],

that is (AB)/(B) > (A)/N, that is (AB)/[(B) − (AB)] > (A)/[N − (A)],

that is (AB)/(αB) > (A)/(α).

5. (AB) + (BC) - (B), i.e., the sum of the excesses of (AB) and (BC) over (B)/2.

8. 160. Take A = husband exceeding wife in first measurement, B =

husband exceeding wife in second measurement, and find (αβ).


1. 80/263 or 304 per thousand.

2. 55/85 or 65 per cent.

3. 32 per cent. and 30 per cent.

4. 117.

5. 108.

8- ?>i (l-2?),.p<ti(l + 2?). i.e.,p must lie between 0 and i (l-2g) or

between J (1 + 2q) and J.

9. As a hint, remember the condition that—

(BC) ≮ (B) + (C) − N.




1. Deaf-mutes from childhood per million among males 222; among

females 183; there is therefore positive association between deaf-mutism and

male sex: if there had been no association between deaf-mutism and sex, there

would have been 3176 male and 3393 female deaf-mutes.

2. (a) positive association, since (AB)₀ = 1457.

(b) negative association, since 294/490 = 3/5, 380/570 = 2/3.

(c) independence, since 256/768 = 1/3, 48/144 = 1/3.
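The comparisons in (b) and (c) may be checked with exact fractions. The sketch below assumes that the first ratio in each case is the proportion of A's amongst the B's and the second the proportion amongst the not-B's; that reading, and the function name, are assumptions made for illustration.

```python
from fractions import Fraction


def association_sign(ab, b, a_beta, beta):
    """Compare (AB)/(B) with (A beta)/(beta) using exact fractions."""
    among_b = Fraction(ab, b)
    among_not_b = Fraction(a_beta, beta)
    if among_b > among_not_b:
        return "positive"
    if among_b < among_not_b:
        return "negative"
    return "independent"


print(association_sign(294, 490, 380, 570))  # compares 3/5 with 2/3
print(association_sign(256, 768, 48, 144))   # compares 1/3 with 1/3
```

Exact fractions avoid any doubt as to whether two proportions that agree to several decimal places are really equal.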


Percentage of Plants above the Average Height.

                      Parentage      Parentage
                      Crossed.       Self-fertilised.
Ipomœa purpurea   .   86 per cent.   25 per cent.
Petunia violacea  .   79   „         17   „
Reseda lutea      .   78   „         34   „
Reseda odorata    .   71   „         …
Lobelia fulgens   .   50   „         …

The association is much less for the species at the end than for those at the

beginning of the list.


4. Percentage of dark-eyed amongst the sons of dark-eyed fathers 39 per


Percentage of dark-eyed amongst the sons of not dark-eyed fathers 10 per


If there had been no heredity, the frequencies to the nearest unit would

have been (AB)₀ 18, (Aβ)₀ 111, (αB)₀ 121, (αβ)₀ 750.

5. Percentage of light-eyed amongst the wives of light-eyed husbands 59

per cent.

Percentage of light-eyed amongst the wives of not light-eyed husbands 53

per cent.

If there had been no association: (AB)₀ = 298, (Aβ)₀ = 225, (αB)₀ = 143, (αβ)₀

= 108.

6. The following are the proportions of the insane per thousand in

successive age groups :—

In general population: 0.9, 2.3, 4.1, 5.7, 6.9, 7.5, 7.7, 6.8.

Amongst the blind: 2091, 16.0, 16.3, 20.7, 18.3, 17.8, 11.4, 5.3.

Note the diminishing association, which is especially clear in the age-group

65—, and the negative association in the last age-group. The association

coefficient gives the values below, which decrease continuously :—

Association coefficient: +0.92, +0.75, +0.61, +0.57, +0.46, +0.41,

+0.20, −0.18.


(D)/N = 6.9 per cent.

(AD)/(A) = 45.0 „

(βD)/(β) = 3.6 „

(AβD)/(Aβ) = 41.2 „

(BD)/(B) = 42.7 „

(ABD)/(AB) = 51.6 „

(A)/N = 6.8 per cent.

(AD)/(D) = 44.6 „

(Aβ)/(β) = 4.7 „

(AβD)/(βD) = 54.9 „

(AB)/(B) = 29.2 „

The above give two legitimate comparisons. The general results are the same

as for the boys, i.e. a very small association between development-defects and

dulness amongst those exhibiting nerve-signs, as compared with those who do



not exhibit nerve-signs, or with the girls in general. As the association

amongst those who do not exhibit nerve-signs is quite as high as for the girls

in general, the "conclusion " quoted does not seem valid.

2. (1)








(B)/N 3.2

(AB)/(A) 14.9


(A)/N 0.9

(AB)/(B) 4.0


(BC)/(C) 38.8

(ABC)/(AC) 216


(AC)/(C) 6.6

(ABC)/(BC) 36.8



The above give the two simplest comparisons, either of which is sufficient to

show that there is a high association between blindness and mental derange-

ment amongst the deaf-mutes as well as in the general population; amongst

the old, the association is, in fact, small for the general population, but well-

marked for deaf-mutes. This result stands in direct contrast with that of

Qu. 1, where the association between the two defects A and D was much

smaller in the defective universe B than in the universe at large. As previously

stated, no great reliance can be placed on the census data as to these infirmities.

3. If the cancer death-rates for farmers over 45 and under 45 respectively

were the same as for the population at large, the rate for all farmers 15—

would be 1.11. This is slightly less than the actual rate 1.20, but the excess

would not justify the statement that "farmers were peculiarly liable to cancer."

It is, in point of fact, due to the further differences of age-distribution that we

have neglected, e.g. amongst those over 45 there are more over 55 amongst

farmers than amongst the general population, and so on.

4. 15 per cent.

6. If A and B were independent in both C and γ universes, we would have

(AB) equal to

471x419 • 151xl39_OT;,„


Actually (AB) only = 358. Therefore A and B must be disassociated in one or

both partial universes.

9. (1) 68.1 per cent. (2) 42.5 per cent. The fallacy discussed in § 2 is

now avoided, and there seems no reason for declining to consider this as evidence

of the effect of expenditure on election results.

10. The limits to y are—


subject to the conditions y ≯ x, y ≮ 0, y ≮ 2x − 1. No inference of a positive

association from two negatives is possible unless x lies between the limits

.382 . . . , .618 . . . .

11. The limits to y are:—

<1) 'y< \(6x-Sx*-\)

>l(x + 6xi),

subject to conditions 2/«£0, <£ix - 1, .Jp-x.

An inference is only possible from positive associations of AB and A C if aCf>

i ; an inference is only possible from two negative associations if x lies between

•211 .... and -274. . . . Note that x cannot exceed J.


No inference is possible from positive associations of AB and BC.

An inference is only possible from negative associations if x lie between

.183 . . . . and .215 . . . . Note that x cannot exceed J.

(3) y<H6z-to*-l)

>l(3x + 2x2),

subject to the conditions j/«£0, «£5a: - 1, ^>0.

As in (2), no inference is possible from positive associations of AC and BC;

an inference is possible from negative associations if x lie between .177 . . . .

and .224 . . . . Note that x cannot exceed J.


1. A, 0.68. B, 0.36.


1. 1200; 200. 2. 100; 20. 3. 146.25. 4. 216.5.


2. Mean, 156.73 lb. Median, 154.67 lb. Mode (approx.) 150.6 lb. (Note

that the mean and the median should be taken to a place of decimals further

than is desired for the mode: the true mode, found by fitting a theoretical

frequency curve, is 151.1 lb.)

3. Mean, 0.6330. Median, 0.6391. Mode (approx.), 0.651. (True mode

is 0.653.)

4. £35.5 approximately.

5. (1) 116.0. (2) Means 77.4, 89.0, ratio 114.9. (3) Geometrical means 77.2,

88.9, ratio 115.2. (4) 115.2.

6. (1) 921,507. (2) 916,963.

7. 1st qual. 10s. 6|d. 2nd qual. 9s. 2|d.

8. np. If the terms of the given binomial series are multiplied by 0, 1, 2, 3

. . . , note that the resulting series is also a binomial when a common factor

is removed. [The full proof is given in Chapter XV. § 6.]


2. Standard deviation 21.3 lb. Mean deviation 16.4 lb. Lower quartile

142.5, upper quartile 168.4; whence Q = 12.95. Ratios: m.d./s.d. = 0.77,

Q/s.d. = 0.61. Skewness, 0.29.
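The arithmetic of this answer may be verified as below. The sketch assumes that the skewness is measured by (mean − mode)/standard deviation, and that the mean (156.73 lb.) and approximate mode (150.6 lb.) are those found for the same distribution in the exercises on averages; the function name is illustrative only.

```python
def dispersion_summary(sd, mean_dev, q1, q3, mean, mode):
    """Ratios of the measures of dispersion, and a measure of skewness."""
    q = (q3 - q1) / 2  # semi-interquartile range
    return {
        "Q": q,
        "md_over_sd": mean_dev / sd,
        "Q_over_sd": q / sd,
        # Assumed definition of skewness: (mean - mode) / standard deviation.
        "skewness": (mean - mode) / sd,
    }


summary = dispersion_summary(sd=21.3, mean_dev=16.4, q1=142.5, q3=168.4,
                             mean=156.73, mode=150.6)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```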

3. Approximately lower quartile = £26.1, upper quartile = £54.6, ninth

5. (1) M = 73.2, σ = 17.3. (2) M = 73.2, σ = 17.5. (3) M = 73.2, σ = 18.0.

(Note that while the mean is unaffected in the second place of decimals, the

standard deviation is the higher the coarser the grouping.)

6. √(npq). The proof is given in Chapter XV. § 6.

7. The assumption that observations are evenly distributed over the


intervals does not affect the sum of deviations, except for the interval in which

the mean or median lies: for that interval the sum is n2(0.25 + d²), hence the

entire correction is

d(n1 − n3) + n2(0.25 + d²).

In this expression d is, of course, expressed as a fraction of the class-interval,

and is given its proper sign. Notice that the n1 and n2 of this question are

not the same as the N1 and N2 of § 16.


1. σx = 1.414, σy = 2.280, r = +0.81. X = 0.5Y + 0.5. Y = 1.3X + 1.1.

2. Using the subscripts 1 for earnings, 2 for pauperism, 3 for out-relief ratio,

M3 = 5.79, σ3 = 3.09: r13 = −0.13, r23 = +0.60.


1. 1.232 per cent. (against 1.240 per cent.): 2.556 in. against 2.572 in.

2. The corrected standard-deviation is 0.9954 of the rough value.

3. Estimated true standard-deviation 6.91: standard-deviation of fluctuations

of sampling 9.38. (The latter, which can be independently calculated,

is too low, and the former consequently probably too high. Cf. Chap. XIV.


4. 0.43.

5. 58 per cent.

6. ovVVK' + oVW + O-

8. 0.30.

The others may be written down from symmetry.

10. (1) No effect at all. (2) If the mean value of the errors in variables is

d, and in the weights e, the value found for the weighted mean is—

The true value + d − r.σx.σw.e / [w̄(w̄ + e)].

If r is small, d is the important term, and hence errors in the quantities are

usually of more importance than errors in the weights. If r become considerable,

errors in the weights may be of consequence, but it does not seem

probable that the second term would become the most important in practical


1. r12.3 = +0.759, r13.2 = +0.097, r23.1 = −0.436.

σ1.23 = 2.64, σ2.13 = 0.594, σ3.12 = 70.1.

X1 = 9.31 + 3.37 X2 + 0.00364 X3.


2. r12.34 = +0.680, r13.24 = +0.803, r14.23 = +0.397.

r23.14 = −0.438, r24.13 = −0.553, r34.12 = −0.149.

σ1.234 = 9.17, σ2.134 = 49.2, σ3.124 = 12.5, σ4.123 = 105.4.

X1 = 53 + 0.127 X2 + 0.587 X3 + 0.0345 X4.

3. The correlation of the pth order is r/(l+pr). Hence if r be negative, the

correlation of order n - 2 cannot be numerically greater than unity and r

cannot exceed (numerically) 1/(n − 1).
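The result r/(1 + pr) may be verified by repeated application of the ordinary formula for a partial correlation, r12.3 = (r12 − r13.r23)/√{(1 − r13²)(1 − r23²)}, which reduces to (r − r²)/(1 − r²) when the three correlations are equal. The following is a minimal sketch under that assumption of equal correlations; the function name is hypothetical.

```python
from fractions import Fraction


def equal_partial(r, p):
    """Partial correlation of order p when all zero-order correlations equal r."""
    value = Fraction(r)
    for _ in range(p):
        # With all correlations of the preceding order equal to `value`,
        # the next order is (value - value^2) / (1 - value^2).
        value = (value - value * value) / (1 - value * value)
    return value


r = Fraction(1, 4)
for p in range(4):
    # Agreement with the closed form r / (1 + p r).
    assert equal_partial(r, p) == r / (1 + p * r)

# With n = 4 variables and r = -1/(n - 1), the partial of order n - 2 reaches -1.
print(equal_partial(Fraction(-1, 3), 2))
```

Exact fractions are used so that the agreement with r/(1 + pr), and the limiting value −1, are obtained without rounding.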

4. −r12.

5. r12.3 = −1, r13.2 = r23.1 = +1.

6. r12.3 = r13.2 = r23.1 = −1.


1. Theo. M = 6, σ = 1.732: Actual M = 6.116, σ = 1.732.

(a) Theo. M = 2.5, σ = 1.118: Actual M = 2.48, σ = 1.14.

(b) „ M = 3, σ = 1.225: „ M = 2.97, σ = 1.26.

(c) „ M = 3.5, σ = 1.323: „ M = 3.47, σ = 1.40.

3. Theo. M = 50, σ = 5: Actual M = 50.11, σ = 5.23.

4. The standard deviation of the proportion is 0.00179, and the actual

divergence is 5.4 times this, and therefore almost certainly significant.

5. The standard deviation of the number drawn is 32, and the actual

difference from expectation 18. There is no significance.

6. p = 1 − σ²/M, n = M/p: p = 0.510, n = 12.0: p = 0.454, n = 110.4.

8. Standard deviation of simple sampling 23-0 per cent. The actual

standard-deviation does not, therefore, seem to indicate any real variation, but

only fluctuations of sampling.

9. Difference from expectation 7.5: standard error 10.0. The difference

might therefore occur frequently as a fluctuation of sampling.

10. The test can be applied either by the formulae of Case II. or Case III.

Case II. is taken as the simplest.

(a) (AB)/(B) = 69.1 per cent.: (Aβ)/(β) = 80.0 per cent. Difference 10.9

per cent. (A)/N = 71.1 per cent. and thence ε12 = 12.9 per cent. The actual

difference is less than this, and would frequently occur as a fluctuation of

simple sampling.

(b) (AB)/(B) = 70.1 per cent.: (Aβ)/(β) = 64.3 per cent. Difference 5.8 per

cent. (A)/N = 67.6 per cent., and thence ε12 = 3.40 per cent. The actual

difference is 1.7 times this, and might, rather infrequently, occur as a fluctuation

of simple sampling.




Group of Rows.



5, 6, and 7



8, 9, 10, and 11



12, 13, and 14



15 and upwards


σp is given in units per 1000 births, as s and s0.

2. s0 = 7.02, and σp = 2.5 units.

3. σ² = npq as if the chance of success were p in all cases (but the mean is


4. Mean number of deaths per annum = σ0² = 680,

σ² = 566,582. r = 0.000029.





7 792


8 495


9 220


10 66


11 12


12 1


Total, 4096


5 116.4


6 27.2


7 4.7


8 0.6


363 9

Total, 4096-2





Total, 648

2. The frequency of r successes is greater than that of r − 1 so long as

r < np + p: if np is an integer, r = np gives the greatest term and also the mean.
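The condition r < np + p may be checked directly on the terms of a binomial series. The sketch below is illustrative only; the values of n and p are hypothetical, and the function names are not from the text.

```python
from math import comb


def binomial_terms(n, p):
    """Terms of the binomial series (q + p)^n, for r = 0, 1, ..., n successes."""
    return [comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(n + 1)]


def greatest_term(n, p):
    """Number of successes r at which the binomial term is greatest."""
    terms = binomial_terms(n, p)
    return max(range(n + 1), key=lambda r: terms[r])


for n, p in ((12, 0.5), (10, 0.3)):
    terms = binomial_terms(n, p)
    # Values of r for which the term exceeds the one before it.
    rising = [r for r in range(1, n + 1) if terms[r] > terms[r - 1]]
    print(n, p, "terms rise up to r =", max(rising),
          "; greatest term at r =", greatest_term(n, p))
```

In both of the cases taken np is an integer (6 and 3 respectively), and the greatest term falls at r = np, in agreement with the statement above.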

3. This follows at once from a consideration of the Galton-Pearson apparatus.



Normal curve.














4. In fig. 50, suppose every horizontal array to be given a slide to the right

until its mean lies on the vertical axis through the mean of the whole distribu-

tion: then suppose the ellipses to be squeezed in the direction of this vertical

axis until they become circles. The original quadrant has now become a

sector with an angle between one and two right angles, and the question is

solved on determining its magnitude.


1. Estimated frequency 1512, standard error 0.29 lb. 2. Lower Q,

frequency 1472, standard error 0.26 lb.; upper Q, frequency 1116, standard

error 0.34 lb. 3. 0.18 lb. 4. 0.24 lb., 17 per cent. less than the standard

error of the median. 5. 0.0196 in. or 0.76 per cent. of the standard-deviation:

the standard error of the semi-interquartile range is 1.23 per cent. of that

6.   r.   n = 100.   n = 1000.
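The entries for Question 6 follow from the expression (1 − r²)/√n for the standard error of the correlation coefficient in the case of normal correlation. A minimal sketch for computing them is given below; the function name is illustrative, and the rounding of the printed figures is a choice made here, not that of the text.

```python
import math


def standard_error_of_r(r, n):
    """Normal-theory standard error of a correlation coefficient: (1 - r^2) / sqrt(n)."""
    return (1 - r * r) / math.sqrt(n)


print("   r    n = 100   n = 1000")
for r in (0.0, 0.2, 0.4, 0.6, 0.8):
    print(f"{r:4.1f}   {standard_error_of_r(r, 100):.4f}    "
          f"{standard_error_of_r(r, 1000):.4f}")
```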















[The references are to pages. The subject matter of the Exercises given at

the ends of the chapters has been indexed only when such exercises (or

the answers thereto) give the constants for statistical tables in the text,

or theoretical results of general interest; in all such cases the number of

the question cited is given. In the case of authors' names, citations in

the text are given first, followed by citations of the authors' papers or

books in the lists of references. ]

Accident, deaths from (law of small

chances), 261-262.

Achenwall, Gottfried, Abriss der

Staatswissenschaft, 2.

Ages, at death of certain women

(table), 78; of husband and wife


(correlation), 159; diagram, 173;

constants (qu. 3), 189.

Aggregate, of classes, 10-11.

Agricultural labourers' earnings. See


Airy, Sir G. B., use of terms "error

of mean square" and "modulus,"

144. Refs., Theory of Errors of

Observation, 355.

Ammon, O., hair and eye-colour data

cited from, 61.

Annual value of dwelling-houses

(table), 83; of estates in 1715,

table 100, diagram, 101.

Arithmetic mean. See Mean, arithmetic.


Array, def., 164; standard-deviation


of, 177, 204, 232-233, in normal

correlation, 315-316.

Association, generally, 25-59; def.,

28; degrees of, 29-30; testing by

comparison of percentages, 30-35;

constancy of difference from in-

dependence values for the second-

order frequencies, 35-36; co-

efficient of, 37-38; illusory or

misleading, 48-51; total possible

number of, for n attributes, 54-56;

case of complete independence,

56-57 ; use of ordinary correlation-

coefficient as measure of association,

212-213 ; Pearson's coefficient based

on normal correlation (refs.), 39,

329 ; refs., 15, 39, 329.

Association, partial, generally, 42-59;

the problem, 42-43 ; total and par-

tial, def., 44; arithmetical treat-

ment, 44-48 ; testing, in ignorance

of third-order frequencies, 51-54;

refs., 57.

examples: deaths and sex, 32-

33 ; deaths and occupation, 52-53;

deaf-mutism and imbecility, 33-34;

eye-colour of father and son, 34-35;

eye-colour of grandparent, parent,

and offspring, 46-48, 53-54; colour

and prickliness of Datura fruits, 36-

37; defects in school-children, 45-46.


90-102 ; relative positions of mean,

median and mode in, 121-122,

diagrams, 113-114. See also Fre-


Asymmetry in frequency-distribu-

tions, measures of, 107, 149-50.

Attributes, theory of, generally, 1-59;



aggregate of classes, 10-11; ulti-

mate classes, 12; positive classes,

13-14; consistence of class-fre-

quencies, 17-24 (see Consistence);

association of, 25-59 (see Association);

sampling of, 250-330 (see

Sampling, of attributes).

Averages, generally, 106-32; def.,

107; desirable properties of, 107-

108; forms of, 108; average in

sense of arithmetic mean, 109;

refs., 129-130. See Mean, Median,



Axes, principal, in correlation, 317-


Barlow, P., tables of squares, etc.,

67. Refs., 352.

Barometer heights, table, 96; dia-

gram, 97; means, medians, and

modes, 122.

Bateman, H., refs., law of small

chances, 269.

Bateson, W., data cited from, 87.

Beeton, Miss M., data cited from, 78.

Bernoulli, J., refs., Ars Conjectandi,


Bertillon, J., ref., Cours élémentaire

de statistique, 365.

Bertrand, J. L. F., refs., Calcul des


probabilités, 355.

Bias in sampling, 257-258, 275-277,

332, 339, 348.

Binomial series, 287-296 ; genesis of

in sampling of attributes, 287-289;

calculated series for different values

of p and n, 290, 291 ; experimental

illustrations of, 254-255, (qu. 1

and qu. 2) 270; graphic method

of forming a representation of series,

291-293; mechanical method of

forming a representation of series,

293-295, refs., 309; direct deter-

mination of mean and standard-

deviation, 295-296; deduction of

normal curve from, 297-298; refs.,


Blakeman, J., refs., tests for linearity

of regression, 206; probable error

of contingency coefficient, 349.

Boole, G., refs., Laws of Thought,


Booth, Charles, on pauperism, 193,


Borel, E., refs., Théorie des

probabilités, 355.

Bortkewitsch, L. von., refs., law of

small chances, 269.

Bowley, A. L., refs., effect of errors

on an average, 350; on sampling,

349; Measurement of Groups and

Series, 349; Elements of Statistics,

355; Elementary Manual of Sta-

tistics, 355.

Bravais, A., refs., correlation, 188,


British Association, data cited from,

stature, 88 ; weight, 95, see Stature,

Weight; Reports on index-num-



Cloudiness at Breslau, frequency dis-

tribution, 103; diagram, 104.

Coefficient, of association, 37-38; of

contingency, 64-67; of variation,

149, standard error, 347; of cor-

relation, see Correlation.

Consistence, of class-frequencies for

attributes, generally, 17-24; def.,

18-19 ; conditions, for one or two

attributes, 20; for three attributes,

21-22; refs., 23.

Consistence of correlation-coefficients.


Contingency tables, def., 60; treat-

ment of, by elementary methods,

61-63; isotropy, 68-71, 324-327.

coefficient of, 64-67; applica-

tion to correlation tables, 167, (qu.

3) 189; standard error of (refs.),


Contrary classes and frequencies (for

attributes), 10 ; case of equality of

contrary frequencies (qu. 6, 7, 8),

16 ; (qu. 8), 24; (qu. 7, 8, 9), 59.

Correction of death-rates, etc., for

age and sex-distribution, 219-221;

refs., 222.

of standard-deviation for group-

ing of observations, 208; refs.


(including correction of moments

generally), 221.

of correlation-coefficient for

errors of observation, 209-210;

refs., 221-222.

Correlation, generally, 157-249; con-

struction of tables, 164; represen-

tation of frequency-distribution by

surface, 165-167; treatment of

table by coefficient of contingency,

167; correlation-coefficient, 170-

174, def. 174, direct deduction

227-229; regressions, 175-177,

def. 175; standard-deviations of

arrays, 177, 204; calculation of

coefficient for ungrouped data, 177-

181, for a grouped table, 181-188;

between movements of two variables,

197-201; elementary methods for

cases of non-linear regression, 201-

202; rough methods for estimating

coefficient, 202-205; correlation-

ratio, 205 ; effect of errors of ob-

servation on the coefficient, 209-

210; correlation between indices,

211-212; coefficient for a fourfold

table, direct, 212-213, on assump-

tion of normal correlation (Pearson's

coefficient) (refs.), 39, 329; for all

possible pairs of N values, 213-

214; correlation due to hetero-

geneity of material, 214-215 ; effect

of adding uncorrelated pairs to a

given table, 215-216 ; application

to theory of weighted mean, 216-

218; correlation in theory of sam-

pling, 267, 282-285, 338, 345-

346 ; standard error of coefficient,

348. Refs., 188, 205-206, 221-222.



Movements of infantile and

general mortality, 197-199.

Movements of marriage-rate and

foreign trade, 199-201.

Correlation, normal, 313-330: deduc-

tion of expression for two variables,

313-315; constancy of standard-deviation

of arrays and linearity

of regression, 315-316; contour

lines, 316-317 ; normality of linear

functions of two normally distri-

buted variables, 317; principal

axes, 317-318; testing for normality

of correlation table for stature,

318-324; isotropy of normal cor-

relation table, 324-327; outline

of theory for any number of

variables, 327-328; coefficient for

a normal distribution grouped to

fourfold form round medians

(Sheppard's theorem), (qu. 4) 330;

applications to theory of qualitative

observations (refs.), 329. Refs.,


partial, 225-249; the pro-

blem, partial regressions and cor-

relations, 225-227; notation and

definitions, 229-230 ; normal equa-

tions, fundamental theorems on


product-sums, 230-231; signifi-

cance of generalised regressions

and correlations, 232; reduction

of standard-deviation, 232-233, of

regression 233-234, of correlation

234; arithmetical treatment, 234-

241; representation by a model,

241-243; coefficient of n-fold

correlation, 243-245; expression of

correlations and regressions in terms

of those of higher order, 245-246;

consistence of coefficients, 246-7;

fallacies, 247-248 ; limitations in

interpretation of the partial correla-

tion coefficient, partial association

and partial correlation, 248; par-

tial correlation in case of normal

distribution of frequency, 327-328.

Refs., 248-249, 328-329.

ratio, 205 ; refs., 206.

Cosin, values of estates in 1715,


Cotsworth, M. B., refs., multiplica-

tion table, 353.

Cournot, A. A., refs., theory of

probability, 355.

Crawford, G. E., refs., proof that

arithmetic mean exceeds geometric,


Crelle, A. L., refs., multiplication

table, 353.

Crops and weather, correlation, 196-


Czuber, E., refs., Wahrscheinlich-

keitsrechnung, 355.

Darbishire, A. D., data cited from,

128, 261. Refs., illustrations of

correlation, 188, 269.

Darwin, Charles, data cited from,



Deviation, root-mean-square. See

Deviation, standard.

standard, 134-144; def. 134;

relation to root-mean-square devi-

ation from any origin, 134-135;

is the least possible root-mean-

square deviation, 135; little affected

by small errors in the mean, 135;

calculation for ungrouped data,

135-137, for a grouped distribu-

tion, 138-141 ; influence of group-

ing, 140, 208 ; range of six times


the s.d. contains the bulk of the

observations, 140-142, 305 ; of a

series compounded of others, 142-

143; of N consecutive natural

numbers, 143; of a rectangle, 143;

of arrays in theory of correlation,

177, 204, 315-316; of generalised

deviations (arrays), 230, 232-233;

other names for, 144; of a sum or

difference, 207-208 ; effect of errors

of observation on, 209 ; of an index,

210-211; of binomial series, 295-

296. For standard-deviations of

sampling, see Error, standard.

De Vries, H., data cited from, 102.

Dice, records of throwing, 254-255,

(qu. 1, 2, 3) 270; testing for


significance of divergence from

theory, 263; refs., 269.

Dickson, J. D. Hamilton, normal

correlation surface, 324. Refs.,

normal correlation, 329.

Diphtheria, ages at death from, table,

98; diagram, 97.

Discounts and reserves in American

banks, table, 162; diagram, facing


Dispersion, measures of, 107, 133-

156; unsuitability of range as

a measure, 133; relative, 149;

refs., 154. See Deviation, mean;

Deviation, standard; Quartiles.

Distribution of Frequency. See Fre-


Earnings of agricultural labourers:

calculation of standard-deviation,

135-137; mean deviation, 145;

quartiles, 147; correlation with

pauperism and out-relief, 178-181,

constants, (qu. 2) 189, 235; dia-

gram, 180; by partial correlation,

235-243 ; diagram of model, 242.

Edgeworth, F. Y., terms for measures

of dispersion, 144 ; dice-throwings

(Weldon), 254; probable error of

median, etc., 340. Refs., Index-

numbers, 130-131; correlation,

188, 248, 329; law of error

(normal law), 269, 310; theory of

sampling, probable errors, etc.,

269, 349; dissection of normal

curve, 311.

Elderton, W. Palin, refs., calculation

of moments, 154 ; table of powers,

353; tables for testing fit, 349,

353; Frequency Curves and Cor-



Falkner, R. P., refs., translation of

Meitzen's Theorie der Statistik, 6.

Fallacies, in interpreting associations

—theorem on, 48-49, illustrations,

49-50 ; owing to changes of classi-

fication, actual or virtual, 72; in

interpreting correlations—"spuri-

ous" correlation between indices,

211-212; correlation due to heterogeneity

of material, 214-215; difference

of sign of total and partial

correlations, 247-248.

Fay, E. A., data cited from Mar-

riages of the Deaf in America, 104.

Fechner, G. T., refs., frequency-dis-

tributions, averages, measures of

dispersion, etc., 129, 154; Kol-

lektivmasslehre, 129, 310, 356.

Fecundity of brood-mares, table, 96;

diagram, 94; mean, median, and

mode, (qu. 3) 131; inheritance

(ref.), 222.

Fertility of mother and daughter,

correlation, 161, 195-196; dia-

gram, 175 ; constants, (qu. 3) 189;

ref., 222.

Filon, L. N. G., ref., probable errors,


Fit of a theoretical to an actual fre-


quency-distribution, testing (ref.),

311 ; tables for, 353.

Fluctuation, measure of dispersion,


Fountain, H., ref., index-numbers

of prices, 131.

Frequency of a class, 10, 76.

Frequency-curve, def., 87; ideal forms

of, 87-105; normal curve (q.v.),

297-309; refs., 105, 310.

Frequency-distributions, 76; forma-

tion of, 79-83; graphic represen-

tation of, 83-87; ideal forms-

symmetrical, 87-90, moderately

asymmetrical, 90-98, extremely

asymmetrical (J-shaped), 98-102,

U-shaped, 102-105 ; binomial series,

287-296; hypergeometrical series

(ref.), 285; normal curve, 297-

309; theoretical forms, refs., 285,

310. See Binomial series; Normal

curve; Correlation, normal.

illustrations: of death-rates in

England and Wales, 77 ; of ages at

death of certain women, 78; of

stigmatic rays on poppies, 78; of

annual values of dwelling-houses

in Great Britain, 83; of head-

breadths of Cambridge students,

84; of statures of males in the

U.K., 88, 90; of pauperism in

different districts of England and

Wales, 93 ; of weights of males in

the U.K., 95; of fecundity of

brood-mares, 96; of barometer

heights at Southampton, 96; of

ages at death from diphtheria, 98;

of annual values of estates, 100;

of petals in Ranunculus bulbosus,



centiles, 118, 151-152; of repre-

senting correlation between two

variables, 180-181; of estimating

correlation coefficient, 203-204 ; of

forming one binomial polygon from

another, 291-293.

Graunt, John, Observations on the

Bills of Mortality, 6.

Gray, John, data cited from, 266.

Grouping of observations to form

frequency - distribution, choice of

class-interval, 79-80 ; influence on


mean, 113-114, 115, 116 ; influence

on standard-deviation, 140, 208.

Hair-colour: and eye-colour, ex-

ample of contingency, 61, 63, 66-

68; theory of sampling applied to

certain data, 266-267, 268.

Harmonic mean. See Mean, harmonic.


Harris, J. A., refs., short method

of calculating coefficient of cor-

relation, 206.

Head-breadths of Cambridge students,

table, 84 ; diagram, 85.

Helguero, F. de, refs., dissecting

compound normal curve, 311.

Heron, D., refs., relation between

fertility and social status, 205;


defective physique and intelli-

gence, application of correction

for age-distribution, etc., 222;

abac giving probable errors of

correlation coefficient, 349, 353;

probable error of a partial correla-

tion coefficient, 350.

Histogram, construction of, 84.

Hollis, T., cited re Cosin's Names of

the Roman Catholics, etc., 100.

Hooker, R. H., correlation between

weather and crops, 196; between

movements of two variables, 201.

Refs., correlation between move-

ments of two variables, 205;

weather and crops, 205, 249;

theory of partial correlation, 248.

Houses, inhabited and uninhabited,

in rural and urban districts, 61-

62; annual value of, table, 83;

median, (qu. 4) 131; quartiles,

(qu. 3) 155.

Hull, C. H., ref., The Economic

Writings of Sir William Petty,

together with the Observations on

the Bills of Mortality more probably

by Captain John Graunt, 6.

Husbands and wives, correlation be-

tween ages, table, 159; diagram,

173; constants, (qu. 3) 189.

Hypergeometrical Series, ref., 285.

Illusory associations, 48-51.

Imbecility, association with deaf-

mutism, 33-34, 38.

Inclusive and exclusive notations for

statistics of attributes, 14-15.

Independence, criterion of, for attri-

butes, 25-28 ; case of complete, for

attributes, 56-57 ; form of contin-



Laplace, Pierre Simon, Marquis de,

probable error of median, 340.

Refs., normal curve, 310; mean

deviation least about the median,

154; Théorie analytique des probabilités,

154, 350, 356.

Larmor, Sir J., use of word "statis-

tical," 4.

Lee, Alice, data cited from, 96, 122,

160, 161. Refs., inheritance of

fertility and fecundity, 222.

Lemna Minor, correlation between


lengths of mother- and daughter-

frond, 185-187.

Lexis, W., use of term "precision,"

144. Refs., Theorie der Massen-

erscheinungen, 269; Abhandlungen

zur Theorie der Bevölkerungs- und

Moral-statistik, 269, 356.

Lipps, G. F., refs., measures of

dependence (association, correlation,

contingency, etc.), 39; Fechner's

Kollektivmasslehre, 129, 356.

Little, W., data as to agricultural

labourers' earnings cited from, 137.

Lobelia, application of theory of

sampling to certain data, 265-266,


Logarithmic increase of population,


125-126 ; logarithmic mode, 128.

Macalister, Sir Donald, ref., law

of geometric mean, 130, 310.

Macdonell, W. R., data cited from,

84, 90.

Marriage-rate and trade, correlation

of movements, 199-201.

Maxwell, Clerk, use of word "stat-

istical," 4.

Mean, arithmetic, generally, 108-116;

def., 108-109; nature of, 109; calculation

of, for a grouped distribution,

109-113; influence of grouping,

113-114, 115, 116; position

relatively to mode and median, 121-

122, diagrams, 113, 114; sum of

deviations from, is zero, 114; of

series compounded of others, 115; of

sum or difference, 115-116; com-

parison with median, 119; sum-

mary comparison with median and

mode, mean is the best for all

general purposes, 122-123; weight-

ing of, 216-221 ; of binomial series,

295 ; standard error of, 340-346.

deviation. See Deviation, mean.

Mean, error, 144.

geometric, 108; generally,

123-128; def., 123; calculation,

124; less than arithmetic mean,

123; difference from arithmetic

mean in terms of dispersion, (qu. 8)

156; of series compounded of

others, 124; of series of ratios or

products, 124 ; in estimating inter-

censal populations, 125-126; con-

venience for index-numbers, 126-

127 ; use on ground that deviations

vary with absolute magnitude, 127-



Mode, 108 ; generally, 120-123; def.,

120; approximate determination,

from mean and median, 121-122;

diagrams showing position re-

latively to mean and median 113,

114; logarithmic or geometric mode,

128 ; weighting of, 221; refs., 130.

Modulus, as measure of dispersion,

144; origin from normal curve,


Mohl, Robert von, refs., Geschichte

und Literatur der Staatswissenschaften,

5.

Moment, first, def., 110 ; second and

general, def., 135; calculation of

moments (ref.), 154.

Moore, L. Bramley, data cited from,

96, 161. Ref., inheritance of fer-

tility and fecundity, 222.

Mortality. See Death-rates.

Movements, correlation of, in two

variables, methods, 197-201 ; refs.,


Negative classes and attributes, 10.

Newsholme, A., refs., birth-rates, cor-

rection for age-distribution, etc.,

222; Vital Statistics, 355.

Normal curve of errors: deduction

from binomial series, 297-298;


value of central ordinate, 300;

table of ordinates, 299; mean

deviation and modulus, 300;

comparison with binomial series

for moderate value of n, 300-301;

outline of more general methods

of deduction, 301-303; fitting to

a given distribution, 303-304 ; the

table of areas, 306, and its use,

305-306; quartile deviation and

probable error, 306-307 ; numerical

examples of use of tables, 307-309;

normality in fluctuations of sampling

of the mean, 342-343. Refs.,

general, 310; dissection of compound

curve, 310-311; tables,

353-354. For normal correlation,

see Correlation, normal.

Norton, J. P., data cited from, 162.

Ref., Statistical Studies in the New

York Money Market, 205.

Order, of a class, 10 ; of generalised

correlations, regressions, devia-

tions, and standard-deviations,


Palgrave, Sir R. H. I., Dictionary

of Political Economy, 6.

Pareto, V., refs., Cours d'économie

politique, 105.

Partial association. See Association,


Partial correlation. See Correlation,


Pauperism, in England and Wales,

table 93 ; diagrams, 92, 113; cal-

culation of mean, 111 ; of median,

117, 118; means, medians, and

modes for other years, 122 ; stand-

ard-deviation, 139-140; mean



diameters of shell, 158 ; constants,

(qu. 3) 189.

Percentage, standard error of, 252-

253; when numbers in samples

vary, 260-261. See also Sampling,

of attributes.

Percentiles, 150-153; def., 150 ; de-

termination, 151-152; advantages

and disadvantages, 152-153; use

for unmeasured characters, 152-

153, refs., 329; standard errors

of, 333-337; correlation between


errors of sampling in, 337-338;

refs., 154.

Petals of Ranunculus bulbosus, frequency

of, 102; unsuitability of

median in case of such a distribu-

tion, 117.

Peters, J., refs., multiplication table,


Petty, Sir W., refs., Economic

Writings, 6.

Poincaré, H., refs., Calcul des probabilités,

356.

Poisson, S. D., refs., sex-ratio, 269;

Recherches sur la probabilité des

jugements, 209, 356.

Poppies, stigmatic rays on, frequency,

78; unsuitability of median in


such a distribution, 116.

Population, estimation of between

censuses, 125-126 ; refs., 130.

Positive classes and attributes, def.,

10; number of positive classes, 13;

sufficiency of for tabulation, 13;

expression of other frequencies, in

terms of, 13-14.

Precision, 144, 253, 300.

Prices, index-numbers of, 126; use of

geometric mean, 126; of harmonic

mean, 129 ; refs., 130-131.

Principal axes, in correlation, 317-

318 ; ref., 329.

Quartile deviation. See Quartiles.

Quartiles, quartile deviation and semi-

interquartile range, 134; generally,

147-149; defs., 147; determina-

tion, 147-148; ratio of q.d. to

standard-deviation, 148, 306; ad-

vantages of q.d. as a measure of

dispersion, 148-149; difference between

deviations of quartiles from

median as measure of skewness,

150; ratio of q.d. to median as

measure of relative dispersion, 149;

q.d. of normal curve, 306; stan-

dard errors, 333-337, 337-339.

Quetelet, L. A. J., refs., Lettres sur

la théorie des probabilités, 269, 356.

Random sampling, in sense of simple

sampling, 285.

Range, unsuitability of, as a measure

of dispersion, 133.

Ranks, 143, 153 ; methods of corre-

lation based on (refs.), 329.

Ranunculus, frequency of petals, 102;

unsuitability of median for such

distributions, 117.


measure of untrustworthiness, 275-

277 ; effect of removing conditions

of simple sampling, 277-285; sam-

pling from limited material, 283;

binomial distribution, 287-296;

normal curve, 297-309; normal

correlation, 313-330. See also

Binomial series; Hypergeometrical

series; Normal curve; Correlation,


Sampling, of variables, conditions

assumed in simple sampling, 331-


333; standard errors of percentiles

(median and quartiles), 333-337;

dependence of standard error of

median on the form of the distribu-

tion, 334-336; of difference between

two percentiles, 337-339 ; of arith-

metic mean, 340-346; of difference

between two means, 341-342; nor-

mality of distribution of mean,

342-343; effect of removing con-

ditions of simple sampling on

standard error of mean, 343-346;

standard error of standard-devia-

tion and coefficient of variation,

347; of coefficients of correlation

and regression, 348.

Saunders, Miss E. R., data cited


from, 37.

Scheibner, W., difference between

arithmetic and geometric, arith-

metic and harmonic means, (qu. 8

and qu. 9) 156.

Scripture, E. W., use of word

"statistics," 3.

Semi-interquartile range. See Quartiles.


Sex-ratio of births: correlation with

total births, 163, 175; diagram,

176; constants, (qu. 3) 189;

application of the theory of samp-

ling to, 258-260, (qu. 7) 271, (qu.

1, 2) 285, refs., 269; standard

error of ratio male to female births,

(qu. 11) 271

Shakespeare, W., use of word

"statist," 1.

Sheppard, W. F., correction of the

standard-deviation for grouping,

208, 303; theorem on correlation

of a normal distribution grouped

round medians, (qu. 4) 330;

normal curve tables, 333; standard

errors of percentiles, 340. Refs.,

calculation and correction of

moments, 221; normal curve and

correlation, theory of sampling,

310, 329, 350; tables of normal

function and its integral, 354.

Significant differences, 262.

Sinclair, Sir John, use of words

"statistics," "statistical," 2.

Skew or asymmetrical frequency-

distributions, 90-102. See also


Skewness of frequency-distributions,

107; measures of, 149-150.



Stigmatic rays on poppies, frequency,

78; unsuitability of median for

such distributions, 116.

Stirling, James, expression for fac-

torials of large numbers, 300.

"Student" (pseudonym), refs., law

of small chances, 269; probable

errors, 350.

Symmetrical frequency-distributions,

87-90. See also Frequency-dis-

tributions; Normal curve.

Symons, G. J., use of word "statistics"

in British Rainfall, 3.

Tabulation, of statistics of attri-

butes, 11-15, 37; of a frequency-

distribution, 81; of a correlation

table, 164.

Tatham, John, refs., correction of

death-rates, 222.

Thorndike, E. L., refs., methods of

measuring correlation, 329 ; Theory

of Mental and Social Measurements,


Todhunter, I., refs., History of

the Mathematical Theory of Prob-

ability, 6.

Type of array, def., 164.

Ultimate classes and frequencies,

def., 12 ; sufficiency of, for tabula-


tion, 12-13.

Universe, def., 17; specification of,

17, 18.

U-shaped frequency distributions,


Value, annual, of dwelling-houses,

table, 83; median, (qu. 4) 131;

quartiles, (qu. 3) 155.

of estates, in 1715, table, 100;

diagram, 101.

Variables, theory of, generally, 75-

249 ; def., 7, 75.

Variates, def., 150.

Variation, coefficient of, 149; stan-

dard error of, 347.

Venn, John, refs., Logic of Chance,

sex-ratio, 269, 356.

Verschaeffelt, E., relative dispersion,

149. Refs., measure of relative dispersion,

154.

Vigor, H. D., data cited from, 163.

Refs., sex-ratio, 269.

Wages of agricultural labourers, see


Warner, F., refs., study of defects in

school-children, notation for statistics

of attributes, 15.

Waters, A. C., refs., estimating intercensal

populations, 130.

Weather and crops, correlation, 196-


Weighted Mean, see Mean, weighted;

also Mean, geometric; Median;


Weights of males in U.K., table, 95;

diagram, 94; mean, median, and

mode, (qu. 2) 131; standard-devia-

tion, mean deviation and quartiles,

(qu. 2) 155.