Mathematical Statistics: (Communicated by Prof. H. at The Meeting of October 31, 1959)
NICOLAAS H. KUIPER
1. Introduction
H. KLOMP, Professor of zoology at Wageningen, suggested the following
problem to me:
Let $M = (\varphi_1, \dots, \varphi_m)$ be a sequence of angles ($0 \le \varphi_i < 2\pi$) representing
compass directions in which $m$ birds, or groups of birds, have been seen
flying on migration at a given place on earth at a given time, for example
on a given day. $M$ is assumed to be a sequence of $m$ independent observations
concerning an unknown random direction $\varphi$. Let $M' = (\varphi'_1, \dots, \varphi'_{m'})$
be analogous with respect to $\varphi'$, concerning a second place-time.
1) The symbol $\cong$ means "having the same cumulative distribution function".
Compare [5].
[...] orientation of the direction $\varphi$ will be defined and given by $E(\varphi)$, and an
estimate is $\frac{1}{m} \sum_{i=1}^{m} \varphi_i$. A measure for the degree of the orientation is
$\{E(\varphi - E\varphi)^2\}^{-1}$,
which can also be estimated in the usual way. For m sufficiently large,
confidence intervals based on normal approximations can be given, and
results on different place-times can be compared.
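A minimal sketch of the two estimates in Python, treating the angles as ordinary real numbers as the passage does. The function name `orientation_estimates` and the sample values are hypothetical, and reading "the usual way" as the unbiased sample variance (divisor $m - 1$) is an assumption.

```python
def orientation_estimates(angles):
    """Return (estimate of E(phi), estimate of the degree of orientation).

    angles: observed directions in radians, treated as plain real numbers.
    The degree of orientation is the reciprocal of the sample variance
    (divisor m - 1, an assumption about "the usual way").
    """
    m = len(angles)
    mean = sum(angles) / m
    variance = sum((a - mean) ** 2 for a in angles) / (m - 1)
    return mean, 1.0 / variance


sample = [1.0, 1.2, 0.9, 1.1, 0.8]  # hypothetical observations, in radians
mean_direction, degree = orientation_estimates(sample)
```

A tightly concentrated sample gives a small variance and hence a large degree of orientation.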
If one has to compare several place-times it is clarifying to represent
the corresponding confidence intervals for $E(\varphi)$ and $[E(\varphi - E\varphi)^2]^{-1}$ in a
plane with these numbers as polar coordinates.
WESTENBERG [6] applied related methods also under weaker conditions,
in which he indicated a choice for the direction which shall have coordinate
$\varphi = 0$. Personally we hesitate to go as far as he does.
and we may test the nullhypothesis that $z$ has uniform distribution on the
unit circle, with the statistic 1):
For any distribution of $z$ on the unit circle the second moment of any
linear function $f(z)$ with gradient 1, that is a function of the kind
$f(z) = x \cos\alpha + y \sin\alpha$, $\alpha$ constant, is immediately seen to be $\le 1$.
Hence also the variance of any such function is $\le 1$. This has as a
consequence that the linear transformation $a$ has the property that for
any vector $z$, length $az \le$ length $z$.
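The bound on the second moment can be written out in one line once the point on the unit circle is expressed through its angle. In the sketch below, $\theta$ (the angle of $z$) is notation introduced here for illustration.

```latex
z = (x, y) = (\cos\theta, \sin\theta), \qquad
f(z) = x\cos\alpha + y\sin\alpha = \cos(\theta - \alpha), \\
E\, f(z)^2 = E \cos^2(\theta - \alpha) \le 1, \qquad
\operatorname{var} f(z) = E\, f(z)^2 - \bigl(E\, f(z)\bigr)^2 \le E\, f(z)^2 \le 1 .
```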
Again we assume m sufficiently large so that we could approximate z
with a normal random vector. If we use, under these assumptions, a
circular confidence region with confidence level $\alpha$ for $E(z)$, obtained by
putting
$(\bar{z} - E(z))^2 \cong [\ldots]\, \chi^2$ (which is not true)
1) An example of an alternative against which this test does not hold, is the
case that the probability mass is equally divided over the vertices of a regular polygon.
We assume that $x_1, \dots, x_n$ are independent random variables isomorous
with a random variable $x$ with values in $[0, 1)$, and we want to give a
test for the nullhypothesis
(2.1)  $H_0: x \cong y$.
We obtained in [5]:
(2.2)  $D_W^b(x) - D_W(x)$ is constant.
Consider $\sup_x D_W^b(x)$ and $\inf_x D_W^b(x)$. For $b = 0$ these values are denoted
in the literature by $D_n^+$ and $-D_n^-$. KOLMOGOROV uses such statistics and
also $\max(D_n^+, D_n^-)$ in order to test $H_0$ in view of one-sided or two-sided
alternatives. (In the general case of real random variables, cumulating
starts at $-\infty$.)
Analogously one might suggest $\sup_x D_W^b(x)$ and/or $\inf_x D_W^b(x)$ for tests
concerning distributions on the circle of reals mod 1. However, the values
of these statistics depend on the value $b$ of $x$ at which we start cumulating!
From (2.2) follows that
$V_W^b = \sup_x D_W^b(x) - \inf_x D_W^b(x)$
is a function of the sequence $W = (x_1, \dots, x_n)$, $0 \le x_i < 1$, which is independent
of $b$, and so we may just as well substitute $b = 0$ (and omit $b$ in the
notation):
(2.3)  $V_W = \sup_x D_W(x) - \inf_x D_W(x)$.
Instead of $V_W$ we occasionally write $V_n$.
From a random set $W$ we obtain the random variable
$V_W = V_n = \sup_x D_W(x) - \inf_x D_W(x) = D_n^+ + D_n^-$,
which can be used on the circle by passing to the reals mod 1, and which,
assuming the nullhypothesis, is independent of the c.d. function on this
circle and independent of the point at which we start cumulating.
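The independence of the starting point can be checked numerically. The sketch below assumes that $D_W(x)$ is the empirical c.d. function of $W$ minus the hypothesized uniform c.d. function $x$ on $[0, 1)$ (the definition is not restated in this passage); the names `kuiper_statistic` and `rotate` and the sample are hypothetical.

```python
def kuiper_statistic(xs):
    """V_n = D_n^+ + D_n^- for points xs in [0, 1), against the uniform law."""
    n = len(xs)
    s = sorted(xs)
    # D_n^+: largest excess of the empirical c.d. function over x
    d_plus = max((i + 1) / n - x for i, x in enumerate(s))
    # D_n^-: largest excess of x over the empirical c.d. function
    d_minus = max(x - i / n for i, x in enumerate(s))
    return d_plus + d_minus


def rotate(xs, b):
    """Start cumulating at b instead of 0, taking the points mod 1."""
    return [(x - b) % 1.0 for x in xs]


points = [0.1, 0.4, 0.7]  # hypothetical sample
v_start_0 = kuiper_statistic(points)
v_start_b = kuiper_statistic(rotate(points, 0.5))
```

Here $D_n^+$ and $D_n^-$ each change under the rotation (from 0.3 and 0.1 to about 0.133 and 0.267), while their sum stays at 0.4.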
If $b$ is a constant then we will use the symbol $b$ also for the random
variable which takes the value $b$ with probability one. If $z_n$ is a
sequence of random variables and the c.d. functions of $z_n$ converge for each
value to that of a random variable $z$, then we say that the limit* of $z_n$
for $n \to \infty$ is isomorous with $z$: 1)
$\lim^*_{n \to \infty} z_n \cong z$.
1) This limit* should not be confused with the customary limit of a random
infinite sequence.
Consequently we suggest the statistic $V_W = V_n$ in (2.3) for testing whether $n$
given points are independent values of a given theoretical random point on
the circle.
3. An asymptotic formula for the c.d. function of $V_n$
An asymptotic formula for the distribution function of $V_n$ can be
obtained from a result of D. DARLING [2]. DARLING has, assuming the
nullhypothesis,
(3.1)  $P(\text{for all } x: -a < \sqrt{n}\, D_n(x) < b) = \Phi_n(a, b)$, $a \ge 0$, $b \ge 0$,
where
$\Phi_n(a, b) = \Phi(a, b) + \frac{1}{6\sqrt{n}} \left( \frac{\partial}{\partial a} + \frac{\partial}{\partial b} \right) \Phi(a, b) + O\left(\frac{1}{n}\right)$.
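The following computation is due to DARLING and the author. The density
of the random vector
$(\sqrt{n}\, D_n^-, \sqrt{n}\, D_n^+) = (a, b) = (-\inf_x \sqrt{n}\, D_W(x), \sup_x \sqrt{n}\, D_W(x))$
is $\partial^2 \Phi_n(a, b) / \partial a\, \partial b$.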
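Hence
(3.2)  $P(\sqrt{n}\, V_n \le c) = P(b - (-a) < c) = P(a + b < c)$
$= \int_{b=0}^{c} \left\{ \int_{a=0}^{c-b} \frac{\partial^2 \Phi_n}{\partial a\, \partial b}\, da \right\} db$
$= \int_{b=0}^{c} \left[ \frac{\partial}{\partial b} \Phi_n(a, b) \right]_{a=c-b} db$
$= A(c) + \frac{B(c)}{\sqrt{n}} + O\left(\frac{1}{n}\right)$.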
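As
$\left[ \frac{\partial}{\partial b} \Phi(a, b) \right]_{a=c-b} = \sum_{j=-\infty}^{\infty} \left\{ -4j^2 c\, e^{-2j^2c^2} + 4(j-1)(jc-b)\, e^{-2(jc-b)^2} \right\}$
we have
$A(c) = \int_{b=0}^{c} \left[ \frac{\partial}{\partial b} \Phi(a, b) \right]_{a=c-b} db = \sum_{j=-\infty}^{\infty} \left\{ -4j^2c^2 e^{-2j^2c^2} + (j-1) \left[ e^{-2(jc-b)^2} \right]_{b=0}^{b=c} \right\}$
$= \sum_{j=-\infty}^{\infty} \left\{ -4j^2c^2 e^{-2j^2c^2} + (j-1)\, e^{-2(j-1)^2c^2} - (j-1)\, e^{-2j^2c^2} \right\}$
$= \sum_{j=-\infty}^{\infty} e^{-2j^2c^2} \left\{ -4j^2c^2 + j - (j-1) \right\}$
$= \sum_{j=-\infty}^{\infty} (1 - 4j^2c^2)\, e^{-2j^2c^2}$
$= 1 - \sum_{j=1}^{\infty} 2(4j^2c^2 - 1)\, e^{-2j^2c^2}$.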
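The limiting function $A(c)$ is straightforward to evaluate numerically. The following sketch sums the series directly; the function name `limit_cdf` and the truncation at 50 terms are choices made here for illustration, and the $B(c)/\sqrt{n}$ correction is not included.

```python
import math


def limit_cdf(c, terms=50):
    """A(c) = 1 - sum_{j>=1} 2 (4 j^2 c^2 - 1) exp(-2 j^2 c^2), for c > 0.

    The truncation is adequate for moderate c; very small c would need
    more terms because the series then converges slowly.
    """
    if c <= 0:
        return 0.0
    tail = sum(
        2.0 * (4.0 * j * j * c * c - 1.0) * math.exp(-2.0 * j * j * c * c)
        for j in range(1, terms + 1)
    )
    return 1.0 - tail


# Upper tail 1 - A(c) at three values of c
upper_tails = [1.0 - limit_cdf(c) for c in (1.6196, 1.7473, 2.0010)]
```

For these three values of $c$ the upper tail comes out close to .10, .05 and .01 respectively, in agreement with the last column of table 2.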
From (3.1) (3.2) one also obtains
Hence
(3.4)
The formula (3.3) was compared with the result concerning 200 independent
samples of 10 numbers between 0 and 1, in three decimal
places, obtained from a table of random numbers. The c.fr. function of
the 200 values of $V_{10}$ so obtained is given in figure 1. It is seen to be
in good agreement with the values according to the formula given in
table 1. (The cumulation in the figure goes from right to left.)
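An experiment of this kind can be repeated with a pseudo-random generator in place of a table of random numbers. The sketch below is not the original computation: it assumes $V_n$ is computed from the empirical c.d. function minus $x$, uses full floating-point numbers rather than three decimals, and compares against the $n = 10$, $\alpha = .10$ entry of table 2 instead of formula (3.3). The names `simulate_v` and `exceed_fraction` and the seed are hypothetical.

```python
import math
import random


def kuiper_statistic(xs):
    """V_n = D_n^+ + D_n^- for points in [0, 1), against the uniform law."""
    n = len(xs)
    s = sorted(xs)
    d_plus = max((i + 1) / n - x for i, x in enumerate(s))
    d_minus = max(x - i / n for i, x in enumerate(s))
    return d_plus + d_minus


def simulate_v(n, samples, seed=0):
    """Draw `samples` independent uniform samples of size n; return their V_n."""
    rng = random.Random(seed)
    return [kuiper_statistic([rng.random() for _ in range(n)])
            for _ in range(samples)]


def exceed_fraction(values, n, c):
    """Fraction of simulated values with sqrt(n) * V_n > c."""
    return sum(math.sqrt(n) * v > c for v in values) / len(values)


values = simulate_v(n=10, samples=200)
fraction = exceed_fraction(values, n=10, c=1.4877)  # table 2: n = 10, alpha = .10
```

With only 200 samples the observed fraction fluctuates by a few percentage points around the nominal level, so a larger number of samples is needed for a sharper comparison.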
TABLE 1
$c$        $u = c/\sqrt{10}$        $v = P(V_n > c/\sqrt{10})$
Fig. 1
We remark
$\lim^*_{m \to \infty} V_{n,m} \cong V_n$
and recall
$\lim^*_{n \to \infty} V_n \cong 0$.
We suggest the statistic $V_{n,m}$ for testing whether $n$ given independent
values of an unknown random point $x$ and $m$ given independent values of
an unknown random point $y$ can come from the same continuous distribution
$x \cong y$ on the circle.
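A short sketch of how such a two-sample statistic might be computed. The definition of $V_{n,m}$ does not survive in this passage; the code assumes, by analogy with (2.3), that it is the range (supremum minus infimum) of the difference of the two empirical c.d. functions on $[0, 1)$. The function name `two_sample_kuiper` and the data are hypothetical.

```python
def two_sample_kuiper(xs, ys):
    """Range of F_n - G_m, the difference of the two empirical c.d. functions.

    Assumed definition of V_{n,m}; points are taken in [0, 1).
    """
    n, m = len(xs), len(ys)
    # Each x raises the difference by 1/n, each y lowers it by 1/m.
    jumps = sorted([(x, 1.0 / n) for x in xs] + [(y, -1.0 / m) for y in ys])
    diff = highest = lowest = 0.0  # the difference is 0 before the first point
    i = 0
    while i < len(jumps):
        point = jumps[i][0]
        # Apply every jump located at this point before recording the value.
        while i < len(jumps) and jumps[i][0] == point:
            diff += jumps[i][1]
            i += 1
        highest = max(highest, diff)
        lowest = min(lowest, diff)
    return highest - lowest


xs = [0.1, 0.2, 0.3]         # hypothetical first sample
ys = [0.6, 0.7, 0.8, 0.9]    # hypothetical second sample
v_nm = two_sample_kuiper(xs, ys)
v_nm_rotated = two_sample_kuiper([(x + 0.25) % 1.0 for x in xs],
                                 [(y + 0.25) % 1.0 for y in ys])
```

In this example the two samples are completely separated on the circle, so the statistic takes its largest possible value 1, and rotating both samples by the same amount leaves it unchanged.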
In order to apply the test one needs the c.d. function of $V_{n,m}$, which,
however, we have not yet determined in general. For the case $m = n$ we
can obtain this c.d. function as follows from a formula of GNEDENKO [3],
recently improved by KEMPERMAN [5]: § 4, formula (9).
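with
$(-1)^r g_r(a, b, n) = (-1)^r g_r = \sum_{k=-\infty}^{\infty} \left\{ H^*_{2r}\left( \frac{2kc}{\sqrt{2n}} \right) - H^*_{2r}\left( \frac{2a + 2kc}{\sqrt{2n}} \right) \right\}$,
$H^*_{2r}(x) = \left( \frac{d}{dx} \right)^{2r} e^{-x^2/2}$, $c = a + b$.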
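Let
(4.2)  $\bar{P}_n(a, b) = P_n(a\sqrt{n}, b\sqrt{n}) = P\left( -\frac{a}{\sqrt{n}} < D_{n,n}(x) < \frac{b}{\sqrt{n}} \text{ for all } x \right)$
and
$g_r(a\sqrt{n}, b\sqrt{n}, n) = h_r(a, b, n) = h_r$.
Then
$(-1)^r h_r = \sum_{k=-\infty}^{\infty} \left\{ H^*_{2r}(\sqrt{2}\, kc) - H^*_{2r}(\sqrt{2}\, (a + kc)) \right\}$
and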
(4.3)
i=-00
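Hence
$P(\sqrt{n}\, V_{n,n} \le c) = P(a + b \le c) = \int_{b=0}^{c} \left\{ \int_{a=0}^{c-b} \frac{\partial^2 \bar{P}_n(a, b)}{\partial a\, \partial b}\, da \right\} db$
(4.4)  $= \int_{b=0}^{c} \left[ \frac{\partial}{\partial b} \bar{P}_n(a, b) \right]_{a=c-b} db$.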
The computations are as in § 3. One obtains:
Some critical regions concerning $V_n$ and $V_{n,n}$ are given in tables 2 and 3.
Conclusions concerning $V_{n,m}$ for $n < m$ can occasionally be obtained from
these tables and the fact that if $n < m$,
$P(\sqrt{n}\, V_n > c) < P(\sqrt{n}\, V_{n,m} > c) < P(\sqrt{n}\, V_{n,n} > c)$.
TABLE 2
Critical regions for the $V_n$-test. $P(\sqrt{n}\, V_n > c) = \alpha$

$\alpha$ \ $n$     10        20        30        40        100       $\infty$
.10            1.4877    1.5322    1.5503    1.5608    1.5839    1.6196
.05            1.6066    1.6564    1.6760    1.6869    1.7110    1.7473
.01            1.8391    1.9027    1.9153    1.9375    1.9637    2.0010
TABLE 3
Critical regions for the $V_{n,n}$-test. $P(\sqrt{n}\, V_{n,n} > c) = \alpha$

$\alpha$ \ $n$     10        20        30        40        100       $\infty$
.10            2.2429    2.2663    2.2743    2.2783    2.2855    2.2905
.05            2.4041    2.4376    2.4488    2.4543    2.4643    2.4710
.01            2.6125    2.6988    2.7352    2.7556    2.7974    2.8298
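The tables can be used mechanically as a lookup. The sketch below copies the tabulated values into dictionaries and compares $\sqrt{n}$ times an observed statistic with the critical value. The function name `rejects` and the observed values are hypothetical, and reading the sixth, unlabeled column as the limit $n \to \infty$ is an assumption.

```python
import math

INF = math.inf  # key for the sixth column, assumed to be the limit n -> infinity

# Critical values c with P(sqrt(n) * V_n > c) = alpha, copied from table 2.
TABLE_2 = {
    0.10: {10: 1.4877, 20: 1.5322, 30: 1.5503, 40: 1.5608, 100: 1.5839, INF: 1.6196},
    0.05: {10: 1.6066, 20: 1.6564, 30: 1.6760, 40: 1.6869, 100: 1.7110, INF: 1.7473},
    0.01: {10: 1.8391, 20: 1.9027, 30: 1.9153, 40: 1.9375, 100: 1.9637, INF: 2.0010},
}

# Critical values c with P(sqrt(n) * V_{n,n} > c) = alpha, copied from table 3.
TABLE_3 = {
    0.10: {10: 2.2429, 20: 2.2663, 30: 2.2743, 40: 2.2783, 100: 2.2855, INF: 2.2905},
    0.05: {10: 2.4041, 20: 2.4376, 30: 2.4488, 40: 2.4543, 100: 2.4643, INF: 2.4710},
    0.01: {10: 2.6125, 20: 2.6988, 30: 2.7352, 40: 2.7556, 100: 2.7974, INF: 2.8298},
}


def rejects(v, n, alpha, table):
    """True when sqrt(n) * v exceeds the tabulated critical value (finite n only)."""
    return math.sqrt(n) * v > table[alpha][n]


decision_a = rejects(0.55, 10, 0.05, TABLE_2)  # hypothetical observed V_10 = 0.55
decision_b = rejects(0.45, 10, 0.05, TABLE_2)  # hypothetical observed V_10 = 0.45
ratios = [TABLE_3[a][INF] / TABLE_2[a][INF] for a in (0.10, 0.05, 0.01)]
```

In the last column the entries of table 3 are, to the printed accuracy, $\sqrt{2}$ times those of table 2, which gives a quick consistency check on the transcribed values.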
Landbouwhogeschool Wageningen
REFERENCES
1. BIRNBAUM, Z. W. and R. PYKE, On some distributions related to the statistic
$D_n^+$. Ann. of Math. Stat. 29, 179-187 (1958).
2. DARLING, D. A., To appear in the Bulletin of the Amer. Math. Soc.
3. GNEDENKO, B. V., Some results on the maximal deviation between two empirical
distributions. Dokl. Akad. Nauk SSSR 82, 661-663 (1952).
4. KEMPERMAN, J. H. B., Some exact formulae for the Kolmogorov-Smirnov
distributions. Indag. Math. XIX, 535-540 = Proc. Amsterdam (1957).
5. ----, Asymptotic expansions for the Smirnov test and the range of cumulative
sums. Ann. of Math. Stat. 30, 448-463 (1959).
6. KUIPER, N. H., Annals of Math. Stat. 30, 251-252 (1959).
7. ----, On the random cumulative frequency function. This Proc. 32
Amsterdam (1959).
8. WESTENBERG, J., The median and interquartile range test applied to frequency
distributions plotted on a circular axis. Indag. Math. XII, 378-381
= Proc. Amsterdam (1950).