Mathematical Statistics: (Communicated by Prof. H. at The Meeting of October 31, 1959)


MATHEMATICAL STATISTICS

TESTS CONCERNING RANDOM POINTS ON A CIRCLE


BY

NICOLAAS H. KUIPER

(Communicated by Prof. H. FREUDENTHAL at the meeting of October 31, 1959)

§ 1. Introduction
H. KLOMP, Professor of zoology at Wageningen, suggested the following
problem to me:
Let $M = (\varphi_1, \ldots, \varphi_m)$ be a sequence of angles
($0 \le \varphi_i < 2\pi$) representing compass-directions in which m birds,
or groups of birds, have been seen flying on migration at a given place on
earth at a given time, for example on a given day. M is assumed to be a
sequence of m independent observations concerning an unknown random
direction $\varphi$. Let $M' = (\varphi'_1, \ldots, \varphi'_{m'})$ be
analogous with respect to $\varphi'$, concerning a second place-time.

1. How to test the null hypothesis $\varphi \simeq \varphi'$? 1)
2. How to define and estimate the expectation of a random direction?
   How to test whether these expectations coincide for $\varphi$ and
   $\varphi'$? Observe that in general the value $E(\varphi)$ is not
   significant in this respect.
3. How to define and estimate a degree of the birds' preference for a
   particular compass-direction, the so-called degree of the orientation
   of the birds?
4. Given M, how to test the null hypothesis that the birds have no
   preference in direction at all, or that they fly according to a given
   (theoretical) random direction $\varphi$?

In this paper we deal with these problems. Different circumstances
may call for different approaches. We first consider in this § some
obvious methods. In the other §§ we study a non-parametric test.

1) The symbol $\simeq$ means "having the same cumulative distribution
function". Compare [5].

In a first approach we assume that a strong inclination of the birds is
present, so that after a suitable choice of the "coordinate" $\varphi$, that
is of the direction to be called $\varphi = 0$, it can be assumed that the
greater part of the probability-mass is concentrated in a relatively small
$\varphi$-interval far away from $\varphi = 0$ and from $\varphi = 2\pi$.
This $\varphi$-interval is then considered as part of
$-\infty < \varphi < \infty$. Standard methods can be applied now. The
expectation of the direction $\varphi$ will be defined and given by
$E(\varphi)$, and an estimate is $\frac{1}{m}\sum_{i=1}^{m}\varphi_i$.
A measure for the degree of the orientation is
$$\{E(\varphi - E\varphi)^2\}^{-1}$$
which can also be estimated in the usual way. For m sufficiently large,
confidence intervals based on normal approximations can be given, and
results on different place-times can be compared.
If one has to compare several place-times it is clarifying to represent
the corresponding confidence intervals for $E(\varphi)$ and
$[E(\varphi - E\varphi)^2]^{-1}$ in a plane with these numbers as polar
coordinates.
WESTENBERG [6] applied related methods also under weaker conditions,
in which he indicated a choice for the direction which shall have coordinate
$\varphi = 0$. Personally we hesitate to go as far as he does.

A second approach consists in dividing the interval $0 \le \varphi < 2\pi$
into sub-intervals, and applying the chi-square goodness-of-fit method. This
may be a good method in practice. It was so in the birds-case mentioned
above, because the directions were grouped from the beginning in 32
intervals.
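
The second approach can be sketched as follows. This is only a minimal Python illustration, not the computation applied to the bird data: the function name `chi_square_uniform`, the choice of equal sectors and the simulated samples are assumptions made for the example. Under the null hypothesis of no preferred direction the statistic would be compared with a chi-square table with k - 1 degrees of freedom.

```python
import math
import random

def chi_square_uniform(angles, k=32):
    """Chi-square statistic for uniformity of directions grouped in k equal sectors."""
    counts = [0] * k
    for phi in angles:
        # Reduce the angle to [0, 2*pi) and locate its sector.
        frac = (phi % (2 * math.pi)) / (2 * math.pi)
        counts[min(int(frac * k), k - 1)] += 1
    expected = len(angles) / k
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, k - 1, counts

# Hypothetical data: one sample without preference, one strongly oriented.
rng = random.Random(0)
uniform = [rng.uniform(0, 2 * math.pi) for _ in range(640)]
oriented = [rng.gauss(math.pi, 0.5) for _ in range(640)]

stat_u, df, _ = chi_square_uniform(uniform)
stat_o, _, _ = chi_square_uniform(oriented)
print("degrees of freedom:", df)
print("statistic, no preference:", round(stat_u, 1))
print("statistic, oriented:", round(stat_o, 1))
```

The grouping makes the result depend on where the sector boundaries are placed, which is one reason to look for a statistic that does not depend on such a choice.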

In a third approach one considers the random euclidean unit-vector
$z = (x, y) = (\cos\varphi, \sin\varphi)$ as a two-dimensional random vector.
The "expectation of the direction" can be defined as the unit vector
$E(z)/\sqrt{(E(z))^2}$ in case $E(z) \ne 0$. If $E(z) = 0$ the preferences
for different directions cancel. If $E(z)$ has length one, then
$P(z = E(z)) = 1$ so that the degree of orientation is maximal.
A measure for the degree of orientation is $(E(z))^2$, which obeys
$0 \le (E(z))^2 \le 1$. The vector $E(z)$ then comprises both interesting
parameters as its direction and length.
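
A small Python sketch may make the third approach concrete. It replaces $E(z)$ by the sample mean of the unit vectors, which is a plug-in estimate assumed here for illustration; the function name `mean_resultant` and the example angles are hypothetical.

```python
import math

def mean_resultant(angles):
    """Sample mean of the unit vectors (cos phi, sin phi).

    Returns the mean vector, its direction in [0, 2*pi) (None if the
    vector is numerically zero) and its squared length, which estimates
    the degree of orientation (E z)^2.
    """
    m = len(angles)
    x_bar = sum(math.cos(p) for p in angles) / m
    y_bar = sum(math.sin(p) for p in angles) / m
    length_sq = x_bar ** 2 + y_bar ** 2
    if length_sq > 1e-24:
        direction = math.atan2(y_bar, x_bar) % (2 * math.pi)
    else:
        direction = None
    return (x_bar, y_bar), direction, length_sq

# Two directions on either side of the coordinate origin phi = 0.
angles = [0.2, 2 * math.pi - 0.1]
_, direction, length_sq = mean_resultant(angles)
print("arithmetic mean of the angles:", sum(angles) / len(angles))
print("direction of the mean vector:", direction)
print("squared length:", length_sq)

# Four directions that cancel: the mean vector is zero.
print(mean_resultant([0, math.pi / 2, math.pi, 3 * math.pi / 2]))
```

The first example shows why the arithmetic mean of the angles is not meaningful in general: it points almost opposite to the two observed directions, whereas the direction of the mean vector lies between them.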
If $\chi_2$ is the circle-symmetric standard two-dimensional normal random
vector with density $(2\pi)^{-1}\exp\{-\tfrac{1}{2}(x^2+y^2)\}$ and a is such
an automorphism of the two-dimensional vector space that any linear function
with argument z has the first and second moments in common with the same
linear function with argument $E(z) + a\chi_2$, then if $z_1, \ldots, z_m$
are m independent replicates of z and if m is sufficiently large, the random
vector $\bar z = \frac{1}{m}\sum_{i=1}^{m} z_i$ can, according to the central
limit theorem, be approximated by the normal random vector:
$$E(z) + \frac{1}{\sqrt{m}}\,a\chi_2.$$

In particular if z has uniform distribution on the unit circle, one finds
that $E(z) = 0$, a is a scalar, and
$$a^2 = E(\cos^2\varphi) = \int_0^{2\pi}\cos^2\varphi\,\frac{d\varphi}{2\pi} = \frac{1}{2}.$$
Hence in this case $\bar z$ is approximated by the random vector
$$\frac{1}{\sqrt{2m}}\,\chi_2,$$
and we may test the null hypothesis that z has uniform distribution on the
unit circle, with the statistic 1) and approximation:
$$\bar x^2 + \bar y^2 = (\bar z)^2 \simeq \frac{1}{2m}\,\chi_2^2.$$
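
As an illustration of this test, the following Python sketch computes $2m(\bar z)^2$ and the corresponding tail probability. It uses the fact that a chi-square variable with two degrees of freedom exceeds t with probability $e^{-t/2}$; the function name `uniformity_test` and the example samples are assumptions made for the sketch.

```python
import math

def uniformity_test(angles):
    """Return (2 m z_bar^2, approximate tail probability under uniformity)."""
    m = len(angles)
    x_bar = sum(math.cos(p) for p in angles) / m
    y_bar = sum(math.sin(p) for p in angles) / m
    stat = 2 * m * (x_bar ** 2 + y_bar ** 2)
    # For chi-square with 2 degrees of freedom: P(chi2 > t) = exp(-t / 2).
    return stat, math.exp(-stat / 2)

# A sample concentrated around the direction 1.0 (radians).
concentrated = [1.0 + 0.01 * i for i in range(-25, 25)]
print(uniformity_test(concentrated))

# Mass equally divided over the vertices of a regular pentagon:
# the mean vector vanishes, so the test cannot detect this alternative.
polygon = [2 * math.pi * k / 5 for k in range(5)] * 20
print(uniformity_test(polygon))
```

The second example reproduces the kind of alternative against which the test has no power: the sample is far from uniform, yet the statistic is practically zero.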

For any distribution of z on the unit circle the second moment of any
linear function f(z) with gradient of length 1, that is a function of the kind
$f(z) = x\cos\alpha + y\sin\alpha$, $\alpha$ constant, is immediately seen to
be $\le 1$. Hence also the variance of any such function is $\le 1$. This has
as a consequence that the linear transformation a has the property that for
any vector z, length $az \le$ length $z$.
Again we assume m sufficiently large so that we could approximate $\bar z$
with a normal random vector. If we use, under these assumptions, a
circular confidence region with confidence level $\alpha$ for $E(z)$,
obtained by putting
$$(\bar z - E(z))^2 \simeq \frac{1}{m}\,\chi_2^2 \quad \text{(which is not true)}$$
then the true confidence level is certainly at most $\alpha$.
However, this method is very inefficient in case $(E(z))^2$ is close to 1,
that is in case the degree of orientation is large.

§ 2. The non-parametric statistic $V_n$

The main aim of this paper is a fourth approach to the problem. This
approach, which could be called non-parametric, is connected with the
Kolmogorov and Kolmogorov-Smirnov tests. These tests are modified so
that they can be applied to random points on a circle instead of on a line.
Instead of $\varphi$ in § 1, we now use the random variable
$$x = \frac{1}{2\pi}\,\varphi, \quad 0 \le x < 1.$$
In the sequel we use the notations of KUIPER [5]. The residue class
modulo 1 of a real number x or a set of numbers W is denoted by x or W
respectively. The set of residue classes mod 1, with coordinates x,
$0 \le x < 1$, is the circle to be considered. The cumulative frequency
(c.fr.) function of a given set W of points $x_1, \ldots, x_n$,
$0 \le x_i < 1$, and the cumulative distribution (c.d.) function of a random
element y, $0 \le y < 1$, starting cumulating at the value b and jumping from
x = 1 to x = 0, are $F_W^b(x)$ and $F^b(x)$, whereas
$$D_W^b(x) = F_W^b(x) - F^b(x), \quad D_W^0(x) = D_W(x).$$

1) An example of an alternative against which this test does not hold, is the
case that the probability mass is equally divided over the vertices of a regular polygon.
We assume that $x_1, \ldots, x_n$ are independent random variables isomorous
with a random variable x with values in [0, 1). And we want to give a
test for the null hypothesis
$$H_0:\ x \simeq y. \tag{2.1}$$
We obtained in [5]:
$$D_W^b(x) - D_W(x) \text{ is constant.} \tag{2.2}$$
Consider $\sup_x D_W^b(x)$ and $\inf_x D_W^b(x)$. For b = 0 these values are
denoted in the literature by $D_n^+$ and $-D_n^-$. KOLMOGOROV uses such
statistics and also $\max(D_n^+, D_n^-)$ in order to test $H_0$ in view of
one-sided or two-sided alternatives. (In the general case of real random
variables, cumulating starts at $-\infty$.)
Analogously one might suggest $\sup_x D_W^b(x)$ and/or $\inf_x D_W^b(x)$ for
tests concerning distributions on the circle of reals mod 1. However, the
values of these statistics depend on the value b of x at which we start
cumulating! From (2.2) follows that
$$V_W^b = \sup_x D_W^b(x) - \inf_x D_W^b(x)$$
is a function of the sequence $W = (x_1, \ldots, x_n)$, $0 \le x_i < 1$,
which is independent of b, and so we may substitute just as well b = 0 (and
omit b in the notation).
$$V_W = \sup_x D_W(x) - \inf_x D_W(x). \tag{2.3}$$
Instead of $V_W$ we occasionally write $V_n$.
From a random set W we obtain the random variable:
$$V_W = V_n = \sup_x D_W(x) - \inf_x D_W(x) = D_n^+ + D_n^-$$
which can be used on the circle by passing to the reals mod 1, and which,
assuming the null hypothesis, is independent of the c.d. function on this
circle and independent of the point at which we start cumulating.
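
The independence of the starting point can be checked numerically. The sketch below is an illustration only: the function name `kuiper_statistic`, the default uniform null distribution and the five sample points are assumptions. It computes $D_n^+$, $D_n^-$ and their sum for a sample and for the same sample after cumulation is started at another point of the circle.

```python
def kuiper_statistic(xs, cdf=lambda t: t):
    """Return (D_plus, D_minus, V) for points xs in [0, 1).

    cdf is the hypothesised c.d. function, cumulated from 0;
    the default is the uniform distribution on the circle.
    """
    n = len(xs)
    u = sorted(cdf(x) for x in xs)
    # sup of (c.fr. function - c.d. function) is reached just after a point
    d_plus = max((i + 1) / n - u[i] for i in range(n))
    # sup of (c.d. function - c.fr. function) is reached just before a point
    d_minus = max(u[i] - i / n for i in range(n))
    return d_plus, d_minus, d_plus + d_minus

sample = [0.05, 0.10, 0.12, 0.30, 0.80]
# Start cumulating at another point: same points, coordinates shifted mod 1.
shifted = [(x + 0.25) % 1.0 for x in sample]

print(kuiper_statistic(sample))
print(kuiper_statistic(shifted))
```

For these points the one-sided values change with the shift (0.50 and 0.05 against 0.45 and 0.10), while their sum stays at 0.55.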
If b is a constant then we will use the symbol b also for the random
variable which has probability one of taking the value b. If $z_n$ is a
sequence of random variables and the c.d.-functions of $z_n$ converge for
each value to that of a random variable z, then we say that the limit* of
$z_n$ for $n \to \infty$ is isomorous with z: 1)
$$\operatorname*{lim^{*}}_{n\to\infty} z_n \simeq z.$$
Now, as it is well known that
$$\operatorname*{lim^{*}}_{n\to\infty} D_n^+ \simeq \operatorname*{lim^{*}}_{n\to\infty} D_n^- \simeq 0,$$
also
$$\operatorname*{lim^{*}}_{n\to\infty} V_n \simeq 0.$$

1) This limit* should not be confused with the customary limit of a random
infinite sequence.
Consequently we suggest the statistic $V_W = V_n$ in (2.3) for testing
whether n given points are independent values of a given theoretical random
point on the circle.
§ 3. An asymptotic formula for the c.d. function of $V_n$

An asymptotic formula for the distribution-function of $V_n$ can be
obtained from a result of D. DARLING [2]. DARLING has, assuming the
null hypothesis,
$$P(\text{for all } x:\ -a < \sqrt{n}\,D_n(x) < b) = \Phi_n(a, b), \quad a \ge 0,\ b \ge 0, \tag{3.1}$$
where
$$\Phi_n(a, b) = \Phi(a, b) + \frac{1}{6\sqrt{n}}\left(\frac{\partial}{\partial a} + \frac{\partial}{\partial b}\right)\Phi(a, b) + o\!\left(\frac{1}{n}\right)$$
and
$$\Phi(a, b) = \sum_{j=-\infty}^{\infty}\left\{e^{-2j^2(a+b)^2} - e^{-2(ja+(j-1)b)^2}\right\}.$$
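The following computation is due to DARLING and the author. The density
of the random vector
$$(\sqrt{n}\,D_n^-,\ \sqrt{n}\,D_n^+) = (a, b) = \left(-\inf_x \sqrt{n}\,D_W(x),\ \sup_x \sqrt{n}\,D_W(x)\right)$$
is
$$\frac{\partial^2 \Phi_n(a, b)}{\partial a\,\partial b}.$$
Hence
$$\begin{aligned}
P(\sqrt{n}\,V_n \le c) &= P(b - (-a) \le c) = P(a + b \le c)\\
&= \int_{b=0}^{c}\left\{\int_{a=0}^{c-b}\frac{\partial^2 \Phi_n(a, b)}{\partial a\,\partial b}\,da\right\}db\\
&= \int_{b=0}^{c}\left[\frac{\partial}{\partial b}\Phi_n(a, b)\right]_{a=c-b}db\\
&= A(c) + \frac{B(c)}{\sqrt{n}} + o\!\left(\frac{1}{n}\right).
\end{aligned}\tag{3.2}$$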
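As
$$\left[\frac{\partial}{\partial b}\Phi(a, b)\right]_{a=c-b} = \sum_{j=-\infty}^{\infty}\left\{-4j^2c\,e^{-2j^2c^2} + 4(j-1)(jc-b)\,e^{-2(jc-b)^2}\right\}$$
we have
$$\begin{aligned}
A(c) &= \int_{b=0}^{c}\left[\frac{\partial}{\partial b}\Phi(a, b)\right]_{a=c-b}db
= \sum_{j=-\infty}^{\infty}\left\{-4j^2c^2e^{-2j^2c^2} + (j-1)\left[e^{-2(jc-b)^2}\right]_{b=0}^{b=c}\right\}\\
&= \sum_{j=-\infty}^{\infty}\left\{-4j^2c^2e^{-2j^2c^2} + (j-1)e^{-2(j-1)^2c^2} - (j-1)e^{-2j^2c^2}\right\}\\
&= \sum_{j=-\infty}^{\infty} e^{-2j^2c^2}\left\{-4j^2c^2 + j - (j-1)\right\}\\
&= \sum_{j=-\infty}^{\infty}(1 - 4j^2c^2)\,e^{-2j^2c^2}\\
&= 1 - \sum_{j=1}^{\infty} 2(4j^2c^2 - 1)\,e^{-2j^2c^2}.
\end{aligned}$$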
From (3.1) and (3.2) one also obtains
$$B(c) = \frac{2}{6}\,\frac{d}{dc}A(c) = \frac{8}{3}\,c\sum_{j=1}^{\infty} j^2(4j^2c^2 - 3)\,e^{-2j^2c^2}.$$

Hence
$$P\{\sqrt{n}\,V_n \le c\} = 1 - \sum_{j=1}^{\infty} 2(4j^2c^2 - 1)\,e^{-2j^2c^2} + \frac{8c}{3\sqrt{n}}\sum_{j=1}^{\infty} j^2(4j^2c^2 - 3)\,e^{-2j^2c^2} + o\!\left(\frac{1}{n}\right). \tag{3.3}$$
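
Formula (3.3) is easy to evaluate numerically. The following Python sketch does so with the remainder term dropped; the function name `kuiper_tail` and the truncation of the series after 50 terms are assumptions made for the illustration.

```python
import math

def kuiper_tail(c, n, terms=50):
    """Approximate P(sqrt(n) V_n > c) from (3.3), without the remainder term."""
    a_sum = sum(2 * (4 * j * j * c * c - 1) * math.exp(-2 * j * j * c * c)
                for j in range(1, terms + 1))
    b_sum = sum(j * j * (4 * j * j * c * c - 3) * math.exp(-2 * j * j * c * c)
                for j in range(1, terms + 1))
    # 1 - P(sqrt(n) V_n <= c): the A-part minus the B-part divided by sqrt(n)
    return a_sum - (8 * c / 3) * b_sum / math.sqrt(n)

for c in (1.0, 1.5, 1.9):
    print(c, round(c / math.sqrt(10), 3), round(kuiper_tail(c, 10), 4))
```

For n = 10 the values at c = 1.0, 1.5 and 1.9 come out close to 0.693, 0.093 and 0.0063. The series converges very quickly for c of this size, so far fewer than 50 terms would suffice.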

For c > ~ a reasonable approximation is obtained from first terms of


the series as follows :

(3.4)

The formula (3.3) was compared with the result concerning 200 independent
samples of 10 numbers between 0 and 1, in three decimal places, obtained
from a table of random numbers. The c.fr. function of the 200 values of
$V_{10}$ so obtained is given in figure 1. It is seen to be in good
agreement with the values according to the formula given in table 1.
(The cumulation in the figure goes from right to left.)

TABLE 1

  c      $u = c/\sqrt{10}$    $v = P(V_n > c/\sqrt{10})$
 1.0         0.316               0.693
 1.1          .348                .528
 1.2          .379                .377
 1.3          .411                .252
 1.4          .443                .158
 1.5          .474                .093
 1.6          .506                .052
 1.7          .536                .027
 1.8          .569                .0135
 1.9          .600                .0063
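
The comparison described above can be imitated with pseudo-random numbers in place of a printed table of random numbers. The Python sketch below is such an imitation, not a reproduction of the original 200 samples: the seed, the helper names `kuiper_v` and `tail_formula`, and the truncation to three decimals are assumptions.

```python
import math
import random

def kuiper_v(xs):
    """V_n for points in [0, 1) under the uniform null hypothesis."""
    n = len(xs)
    u = sorted(xs)
    d_plus = max((i + 1) / n - u[i] for i in range(n))
    d_minus = max(u[i] - i / n for i in range(n))
    return d_plus + d_minus

def tail_formula(c, n, terms=20):
    """P(sqrt(n) V_n > c) according to (3.3), remainder term dropped."""
    a_sum = sum(2 * (4 * j * j * c * c - 1) * math.exp(-2 * j * j * c * c)
                for j in range(1, terms + 1))
    b_sum = sum(j * j * (4 * j * j * c * c - 3) * math.exp(-2 * j * j * c * c)
                for j in range(1, terms + 1))
    return a_sum - (8 * c / 3) * b_sum / math.sqrt(n)

rng = random.Random(1959)  # arbitrary seed, chosen only for repeatability
# 200 samples of 10 numbers with three decimal places
values = [kuiper_v([int(rng.random() * 1000) / 1000 for _ in range(10)])
          for _ in range(200)]

for c in (1.0, 1.3, 1.6):
    u = c / math.sqrt(10)
    empirical = sum(v > u for v in values) / len(values)
    print(c, round(u, 3), empirical, round(tail_formula(c, 10), 3))
```

With only 200 samples the empirical frequencies fluctuate by a few hundredths around the true probabilities, so only rough agreement with the formula can be expected.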

§ 4. The statistic $V_{n,m}$

Analogously one may consider two random c.fr. functions $F_n^b(x)$ and
$F_m^b(x)$ concerning independent samples of size n and m of two unknown
random variables x and y on the circle of reals mod 1, starting the
cumulation at b. We want to test the null hypothesis $x \simeq y$ in view of
"values" $F_n^b(x)$ and $F_m^b(x)$ that the random c.fr. functions $F_n^b(x)$
and $F_m^b(x)$ have taken.
Let
$$D_{n,m}^b(x) = F_n^b(x) - F_m^b(x).$$
Then
$$\sup_x D_{n,m}^b(x) - \inf_x D_{n,m}^b(x)$$
is independent of the point b, and so we may substitute just as well b = 0
(and omit b in the notation).
Let
$$V_{n,m} = \sup_x D_{n,m}(x) - \inf_x D_{n,m}(x) = D_{n,m}^+ + D_{n,m}^-.$$
The statistic $V_{n,m}$ is independent of the point at which we start
cumulating; it is also independent of the c.d. function of x ($\simeq y$).
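
A minimal Python sketch of the two-sample statistic follows. The function name `kuiper_two_sample` and the example points are assumptions made for the illustration. The difference of the two c.fr. functions is a step function, so it suffices to evaluate it at the observed points, together with the value 0 at the start of the cumulation.

```python
import bisect

def kuiper_two_sample(xs, ys):
    """V_{n,m} for two samples of points in [0, 1), cumulating from 0."""
    n, m = len(xs), len(ys)
    sx, sy = sorted(xs), sorted(ys)
    diffs = [0.0]  # both c.fr. functions are 0 before the first point
    for t in sorted(set(xs) | set(ys)):
        f_n = bisect.bisect_right(sx, t) / n  # fraction of xs <= t
        f_m = bisect.bisect_right(sy, t) / m  # fraction of ys <= t
        diffs.append(f_n - f_m)
    return max(diffs) - min(diffs)

xs = [0.1, 0.2, 0.3, 0.9]
ys = [0.15, 0.5, 0.6]
print(kuiper_two_sample(xs, ys))

# Start cumulating elsewhere on the circle: the value does not change.
shift = 0.45
print(kuiper_two_sample([(x + shift) % 1.0 for x in xs],
                        [(y + shift) % 1.0 for y in ys]))
```

Both calls give 2/3 for these points, in line with the independence of the starting point stated above.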

Fig. 1. The c.fr. function of the 200 observed values of $V_{10}$.
We remark
$$\operatorname*{lim^{*}}_{m\to\infty} V_{n,m} \simeq V_n$$
and recall
$$\operatorname*{lim^{*}}_{n\to\infty} V_n \simeq 0.$$
We suggest the statistic $V_{n,m}$ for testing whether n given independent
values of an unknown random point x and m given independent values of
an unknown random point y can come from the same continuous distribution
$x \simeq y$ on the circle.

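In order to apply the test one needs the c.d. function of $V_{n,m}$ which,
however, we did not yet determine in general. For the case m = n we
can obtain this c.d. function as follows from a formula of GNEDENKO [3],
recently improved by KEMPERMAN [5]: § 4, formula (9).
$$\begin{aligned}
P_n(a, b) &= P\left(\text{for all } x:\ -\frac{a}{n} < D_{n,n}(x) < \frac{b}{n}\right)\\
&= g_0 + \frac{3g_0 - g_2}{24n} + \frac{\frac{9}{2}g_0 - 3g_2 - \frac{16}{5}g_3 + \frac{1}{2}g_4}{(24n)^2} + O(n^{-3})
\end{aligned}\tag{4.1}$$
with
$$(-1)^r g_r(a, b, n) = (-1)^r g_r = \sum_{k=-\infty}^{\infty}\left\{H^*_{2r}\!\left(\frac{2kc}{\sqrt{2n}}\right) - H^*_{2r}\!\left(\frac{2a + 2kc}{\sqrt{2n}}\right)\right\},$$
$$H^*_{2r}(x) = \left(\frac{d}{dx}\right)^{2r} e^{-\frac{1}{2}x^2}, \quad c = a + b.$$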
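We will use different formulas obtained as follows:
Let
$$\bar P_n(a, b) = P_n(a\sqrt{n},\ b\sqrt{n}) = P\left(-\frac{a}{\sqrt{n}} < D_{n,n}(x) < \frac{b}{\sqrt{n}} \text{ for all } x\right) \tag{4.2}$$
and
$$g_r(a\sqrt{n},\ b\sqrt{n},\ n) = h_r(a, b, n) = h_r.$$
Then
$$(-1)^r h_r = \sum_{k=-\infty}^{\infty}\left\{H^*_{2r}(\sqrt{2}\,kc) - H^*_{2r}(\sqrt{2}\,(a + kc))\right\}$$
and

(4.3)

For our purpose it will be sufficient to use
$$\bar P_n(a, b) = h_0 + (3h_0 - h_2)/(24n) + O(n^{-2}) \tag{4.4}$$
with
$$h_0 = \sum_{j=-\infty}^{\infty}\left(e^{-j^2c^2} - e^{-(a+jc)^2}\right), \quad c = a + b,$$
and $h_2$ given by the formula for $h_r$ above with r = 2.

The density of the random vector
$$(a, b) = \left(-\inf_x \sqrt{n}\,D_{n,n}(x),\ \sup_x \sqrt{n}\,D_{n,n}(x)\right)$$
is therefore
$$\frac{\partial^2 \bar P_n(a, b)}{\partial a\,\partial b}.$$
Hence
$$P(\sqrt{n}\,V_{n,n} \le c) = P(a + b \le c) = \int_{b=0}^{c}\left\{\int_{a=0}^{c-b}\frac{\partial^2 \bar P_n(a, b)}{\partial a\,\partial b}\,da\right\}db = \int_{b=0}^{c}\left[\frac{\partial}{\partial b}\bar P_n(a, b)\right]_{a=c-b}db.$$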
The computations are as in § 3. One obtains:
$$P(\sqrt{n}\,V_{n,n} \le c) = 1 - \sum_{j=1}^{\infty} 2(2j^2c^2 - 1)\,e^{-j^2c^2} + \frac{1}{6n}\left(1 + \sum_{j=1}^{\infty} j^2c^2(2j^2c^2 - 7)\,e^{-j^2c^2}\right) + O(n^{-2}).$$

Some critical regions concerning $V_n$ and $V_{n,n}$ are given in tables 2
and 3. Conclusions concerning $V_{n,m}$ for n < m can occasionally be
obtained from these tables and the fact that if n < m,
$$P(\sqrt{n}\,V_n > c) < P(\sqrt{n}\,V_{n,m} > c) < P(\sqrt{n}\,V_{n,n} > c).$$
TABLE 2
Critical regions for the $V_n$-test. $P(\sqrt{n}\,V_n > c) = \alpha$

 $\alpha$ \ n      10       20       30       40      100    $\infty$
   .10          1.4877   1.5322   1.5503   1.5608   1.5839   1.6196
   .05          1.6066   1.6564   1.6760   1.6869   1.7110   1.7473
   .01          1.8391   1.9027   1.9153   1.9375   1.9637   2.0010
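
Critical values of this kind can be obtained by solving $P(\sqrt{n}\,V_n > c) = \alpha$ for c in formula (3.3). The Python sketch below does this by bisection. It is an illustration under assumptions: the function names, the search interval [1, 3] and the neglect of the remainder term are choices made here, and nothing is claimed about how the printed table was actually computed.

```python
import math

def kuiper_tail(c, n, terms=50):
    """P(sqrt(n) V_n > c) from (3.3), without the remainder term."""
    a_sum = sum(2 * (4 * j * j * c * c - 1) * math.exp(-2 * j * j * c * c)
                for j in range(1, terms + 1))
    b_sum = sum(j * j * (4 * j * j * c * c - 3) * math.exp(-2 * j * j * c * c)
                for j in range(1, terms + 1))
    return a_sum - (8 * c / 3) * b_sum / math.sqrt(n)

def critical_value(alpha, n, lo=1.0, hi=3.0):
    """Solve kuiper_tail(c, n) = alpha for c by bisection on [lo, hi]."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if kuiper_tail(mid, n) > alpha:
            lo = mid  # tail still too large: move to the right
        else:
            hi = mid
    return (lo + hi) / 2

for alpha in (0.10, 0.05, 0.01):
    # math.inf gives the limiting case, where the 1/sqrt(n) term vanishes
    row = [critical_value(alpha, n) for n in (10, 100, math.inf)]
    print(alpha, [round(c, 4) for c in row])
```

For $\alpha$ = .10 this gives values close to 1.4877 for n = 10 and 1.6196 in the limit.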

TABLE 3
Critical regions for the $V_{n,n}$-test. $P(\sqrt{n}\,V_{n,n} > c) = \alpha$

 $\alpha$ \ n      10       20       30       40      100    $\infty$
   .10          2.2429   2.2663   2.2743   2.2783   2.2855   2.2905
   .05          2.4041   2.4376   2.4488   2.4543   2.4643   2.4710
   .01          2.6125   2.6988   2.7352   2.7556   2.7974   2.8298
Landbouwhogeschool Wageningen

REFERENCES
1. BIRNBAUM, Z. W. and R. PYKE, On some distributions related to the statistic
   $D_n^+$. Ann. of Math. Stat. 29, 179-187 (1958).
2. DARLING, D. A., To appear in the Bulletin of the Amer. Math. Soc.
3. GNEDENKO, B. V., Some results on the maximal deviation between two empirical
   distributions. Dokl. Akad. Nauk SSSR, 82, 661-663 (1952).
4. KEMPERMAN, J. H. B., Some exact formulae for the Kolmogorov-Smirnov
   distributions. Indag. Math. XIX, 535-540 = Proc. Amsterdam (1957).
5. KEMPERMAN, J. H. B., Asymptotic expansions for the Smirnov test and the
   range of cumulative sums. Ann. of Math. Stat. 30, 448-463 (1959).
6. KUIPER, N. H., Annals of Math. Stat. 30, 251-252 (1959).
7. KUIPER, N. H., On the random cumulative frequency function. This Proc. 32
   Amsterdam (1959).
8. WESTENBERG, J., The median and interquartile range test applied to frequency
   distributions plotted on a circular axis. Indag. Math. XII, 378-381
   = Proc. Amsterdam (1950).
