SUMMARY
To define a likelihood we have to specify the form of distribution of the observations, but
to define a quasi-likelihood function we need only specify a relation between the mean and
variance of the observations and the quasi-likelihood can then be used for estimation. For
a one-parameter exponential family the log likelihood is the same as the quasi-likelihood
and it follows that assuming a one-parameter exponential family is the weakest sort of
distributional assumption that can be made. The Gauss-Newton method for calculating
nonlinear least squares estimates generalizes easily to deal with maximum quasi-likelihood
estimates, and a rearrangement of this produces a generalization of the method described
by Nelder & Wedderburn (1972).
Some key words: Estimation; Exponential families; Gauss-Newton method; Generalized linear models;
Maximum likelihood; Quasi-likelihood.
1. INTRODUCTION
This paper is mainly concerned with fitting regression models, linear or nonlinear, in
which the variance of each observation is specified to be either equal to, or proportional to,
some function of its expectation. If the form of distribution of the observations were
specified, the method of maximum likelihood would give estimates of the parameters in
the model. For instance, if it is specified that the observations have normally distributed errors with constant variance, then the method of maximum likelihood reduces to the method of least squares. Least squares also provides expressions for the variances and covariances of the estimates, exact for linear models and approximate for nonlinear ones, and these estimates and the expressions for their errors remain valid even if the observations are not normally distributed but merely have a fixed variance; thus, with linear models and a given error variance, the variance of least squares estimates is not affected by the distribution of the errors, and the same holds approximately for nonlinear models.
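This robustness claim can be checked numerically. The sketch below is my own illustration with made-up data: it compares the Monte Carlo variance of a least squares slope under normal errors and under uniform errors of the same variance, and both agree with the theoretical value $\sigma^2/\sum(x - \bar{x})^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.arange(10.0)
sxx = ((x - x.mean()) ** 2).sum()  # theory: var(slope) = sigma^2 / sxx

def slope_variance(draw_errors, reps=20000):
    # least squares slope of y = 2 + 3x + e, replicated many times
    e = draw_errors((reps, x.size))
    y = 2.0 + 3.0 * x + e
    b = ((x - x.mean()) * (y - y.mean(axis=1, keepdims=True))).sum(axis=1) / sxx
    return b.var()

v_normal = slope_variance(lambda shape: rng.normal(0.0, 1.0, shape))
v_uniform = slope_variance(lambda shape: rng.uniform(-3**0.5, 3**0.5, shape))  # variance 1 too
print(v_normal, v_uniform)  # both close to 1/sxx, whatever the error distribution
```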
A more general situation is considered in this paper, namely the situation when there is a
given relation between the variance and mean of the observations, possibly with an unknown
constant of proportionality. A similar problem was considered from a Bayesian viewpoint by
Hartigan (1969). We define a quasi-likelihood function, which can be used for estimation in
the same way as a likelihood function. With constant variance this again leads to least
squares estimation. When other mean-variance relationships are specified, the quasi-
likelihood sometimes turns out to be a recognizable likelihood function; for instance, for a
constant coefficient of variation the quasi-likelihood function is the same as the likelihood
obtained by treating the observations as if they had a gamma distribution.
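The gamma case can be checked directly. Treating the quasi-likelihood as the function whose $\mu$-derivative is $(z-\mu)/V(\mu)$, the choice $V(\mu) = \mu^2$ (constant coefficient of variation) integrates in closed form to $-z/\mu - \log\mu$, which, up to a function of $z$ alone, is the gamma log likelihood with unit shape. A small numerical sketch, with arbitrary values of $z$ and $\mu$:

```python
import math

def V(mu):
    # assumed mean-variance relation: constant coefficient of variation
    return mu ** 2

def quasi_loglik(z, mu):
    # closed form of the integral of (z - u)/V(u) du for V(u) = u**2,
    # up to an additive function of z: -z/mu - log(mu)
    return -z / mu - math.log(mu)

def gamma_loglik(z, mu):
    # log density of a gamma distribution with shape 1 and mean mu,
    # dropping terms that depend on z alone
    return -math.log(mu) - z / mu

z, mu, h = 3.0, 2.0, 1e-6
print(quasi_loglik(z, mu) == gamma_loglik(z, mu))   # True: identical functions of mu
deriv = (quasi_loglik(z, mu + h) - quasi_loglik(z, mu - h)) / (2 * h)
print(abs(deriv - (z - mu) / V(mu)) < 1e-6)         # True: derivative is (z - mu)/V(mu)
```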
R. W. M. WEDDERBURN
Then computational methods are discussed. The well-known Gauss-Newton method for
calculation of nonlinear least squares estimates is generalized to provide a method of
calculating maximum quasi-likelihood estimates.
When there exists a function of the means that is linear in the parameters, a rearrangement
of the calculations in the generalized Gauss-Newton method gives a procedure identical to
that described by Nelder & Wedderburn (1972). This method produces maximum likelihood estimates when the observations come from a one-parameter exponential family.

2. DEFINITION OF QUASI-LIKELIHOOD

Suppose that each observation $z_i$ has expectation $\mu_i$ and variance $V(\mu_i)$, where $V$ is a known function. The quasi-likelihood $K(z_i, \mu_i)$ is defined by
$$\frac{\partial K(z_i, \mu_i)}{\partial \mu_i} = \frac{z_i - \mu_i}{V(\mu_i)},$$
or equivalently
$$K(z_i, \mu_i) = \int^{\mu_i} \frac{z_i - \mu'}{V(\mu')}\, d\mu' + \text{function of } z_i.$$
From now on, when convenient, the subscript $i$ will be dropped so that $z$ and $\mu$ will refer to an observation and its expectation, respectively. Also, the notation $S(.)$ will be used to
denote summation over the observations, so that S(z) is the sum of the observations. We shall
find that K has many properties in common with a log likelihood function. In fact, we find
that K is the log likelihood function of the distribution if z comes from a one-parameter
exponential family, as will be shown in § 4.
3. PROPERTIES OF QUASI-LIKELIHOODS
It is now shown that the function K has properties similar to those of log likelihoods.
THEOREM 1. Let $z$ and $K$ be defined as in § 2, and suppose that $\mu$ is expressed as a function of parameters $\beta_1, \ldots, \beta_m$. Then $K$ has the following properties:
$$\text{(i)}\ E\!\left(\frac{\partial K}{\partial \mu}\right) = 0, \qquad \text{(ii)}\ E\!\left(\frac{\partial K}{\partial \beta_i}\right) = 0,$$
$$\text{(iii)}\ E\!\left\{\left(\frac{\partial K}{\partial \mu}\right)^{\!2}\right\} = -E\!\left(\frac{\partial^2 K}{\partial \mu^2}\right) = \frac{1}{V(\mu)},$$
$$\text{(iv)}\ E\!\left(\frac{\partial K}{\partial \beta_i}\frac{\partial K}{\partial \beta_j}\right) = -E\!\left(\frac{\partial^2 K}{\partial \beta_i\,\partial \beta_j}\right).$$
Quasi-likelihood and generalized linear models
Proof. First, (i) follows immediately from the definition of $K$. Then (ii) follows on noting that $\partial K/\partial \beta_i = (\partial K/\partial \mu)(\partial \mu/\partial \beta_i)$, and (iii) is a special case of (iv).
To prove (iv), we note that
$$E\!\left(\frac{\partial K}{\partial \beta_i}\frac{\partial K}{\partial \beta_j}\right) = E\!\left\{\left(\frac{\partial K}{\partial \mu}\right)^{\!2}\right\}\frac{\partial \mu}{\partial \beta_i}\frac{\partial \mu}{\partial \beta_j} = \frac{E\{(z-\mu)^2\}}{V(\mu)^2}\,\frac{\partial \mu}{\partial \beta_i}\frac{\partial \mu}{\partial \beta_j} = \frac{1}{V(\mu)}\frac{\partial \mu}{\partial \beta_i}\frac{\partial \mu}{\partial \beta_j},$$
since $V(\mu) = \mathrm{var}(z)$. Also we have
$$-E\!\left(\frac{\partial^2 K}{\partial \beta_i\,\partial \beta_j}\right) = \frac{1}{V(\mu)}\frac{\partial \mu}{\partial \beta_i}\frac{\partial \mu}{\partial \beta_j},$$
which establishes (iv).
COROLLARY. If $L$ denotes the log likelihood of $z$ regarded as a function of $\mu$, then $E(\partial^2 K/\partial \mu^2) \geq E(\partial^2 L/\partial \mu^2)$.

Proof. From the theorem just proved, the above statement is equivalent to
$$\mathrm{var}(z) \geq -1\big/E\!\left(\frac{\partial^2 L}{\partial \mu^2}\right),$$
a result which follows immediately from the Cramér-Rao inequality (Kendall & Stuart, 1973, § 7.14).
THEOREM 2. The log likelihood $L$ of an observation $z$, regarded as a function of $\mu$, satisfies
$$\frac{\partial L}{\partial \mu} = \frac{z - \mu}{V(\mu)}, \qquad (3)$$
where $\mu = E(z)$ and $V(\mu) = \mathrm{var}(z)$, if and only if the density of $z$ with respect to some measure can be written in the form $\exp\{z\theta - g(\theta)\}$, where $\theta$ is some function of $\mu$.
Proof. If $\partial L/\partial \mu$ has the form (3), then integrating with respect to $\mu$ and defining
$$\theta = \int \frac{d\mu}{V(\mu)},$$
we have the required result. Conversely, suppose that for some measure $m$ on the real line the distribution of $z$ is given by $\exp\{z\theta - g(\theta)\}\,dm(z)$. Then $\int e^{z\theta}\,dm(z) = e^{g(\theta)}$, and so the moment generating function $M(t)$ of $z$ is
$$M(t) = \int e^{tz}\, e^{z\theta - g(\theta)}\,dm(z) = e^{g(\theta + t) - g(\theta)}.$$
It follows that $g(\theta + t) - g(\theta)$, regarded as a function of $t$, is the cumulant generating function of $z$; hence $E(z) = g'(\theta) = \mu$ and $\mathrm{var}(z) = g''(\theta) = d\mu/d\theta = V(\mu)$, giving (3).
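As a concrete check (my own illustration, not from the text): the Poisson distribution has density $\exp\{z\theta - g(\theta)\}$ with $\theta = \log\mu$ and $g(\theta) = e^{\theta}$, with respect to the measure placing mass $1/z!$ on $z = 0, 1, 2, \ldots$, and its log likelihood satisfies (3) with $V(\mu) = \mu$:

```python
import math

z, mu, h = 4.0, 2.5, 1e-6

def L(m):
    # Poisson log likelihood in terms of the mean, up to a function of z:
    # z*theta - g(theta) with theta = log(m), g(theta) = exp(theta)
    return z * math.log(m) - m

deriv = (L(mu + h) - L(mu - h)) / (2 * h)
print(abs(deriv - (z - mu) / mu) < 1e-6)  # True: dL/dmu = (z - mu)/V(mu) with V(mu) = mu
```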
Then Theorem 2 shows that this inequality becomes an equality for a one-parameter exponential family. Thus for a given mean-variance relationship, a one-parameter exponential family minimizes the information $-E(\partial^2 L/\partial \mu^2)$, provided that an exponential family exists for that relationship.
It seems reasonable to regard $-E(\partial^2 K/\partial \mu^2)$, which is equal to $1/\mathrm{var}(z)$, as a measure of the information $z$ gives concerning $\mu$ when only the mean-variance relationship is known, and to regard $E\{\partial^2(K - L)/\partial \mu^2\}$, which is always nonnegative, as the additional information provided by knowing the distribution of $z$. From this point of view, assuming a one-parameter exponential family for $z$ is equivalent to making no assumption other than the mean-variance relationship.
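This decomposition can be illustrated with a distribution that is not an exponential family. The example is mine: for a lognormal observation with $\log z \sim N(\nu, s^2)$ and $s^2$ fixed, the mean is $\mu = e^{\nu + s^2/2}$, the variance is $\mu^2(e^{s^2} - 1)$, and the Fisher information about $\mu$ works out to $1/(\mu^2 s^2)$, which exceeds $1/\mathrm{var}(z)$:

```python
import math

mu, s2 = 2.0, 0.5  # arbitrary mean and log-scale variance

info_full = 1.0 / (mu ** 2 * s2)                     # -E(d^2 L / d mu^2) for the lognormal
info_quasi = 1.0 / (mu ** 2 * (math.exp(s2) - 1.0))  # 1/var(z) = -E(d^2 K / d mu^2)

print(info_full > info_quasi)  # True: knowing the distribution adds information
print(info_full - info_quasi)  # the additional information E{d^2(K - L)/d mu^2}
```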
Let $u$ denote the vector of derivatives $u_i = \partial K/\partial \beta_i$ for a single observation, and let $H$ be the matrix with elements $\partial^2 S(K)/\partial \beta_i\,\partial \beta_j$; then, if the observations are independent, $S(u)$ has mean 0 and dispersion $D = -E(H)$. Now let $\hat{\beta}$ be the maximum quasi-likelihood estimate of $\beta$, obtained by setting $S(u)$ equal to its expectation, 0. To first order in $\hat{\beta} - \beta$ we have $S(u) \simeq -H(\hat{\beta} - \beta)$, whence $\hat{\beta} - \beta \simeq -H^{-1}S(u)$.
Approximating to $H$ by its expectation, $-D$, we have
$$\hat{\beta} - \beta \simeq D^{-1}S(u).$$
Now $D^{-1}S(u)$ has dispersion $D^{-1}$; hence we have deduced, rather informally, the following result.

THEOREM 3. Maximum quasi-likelihood estimates have approximate dispersion matrix $D^{-1} = \{-E(H)\}^{-1}$.
Then if $\hat{\beta}$ denotes the current estimates and $\hat{\beta}^* = \hat{\beta} + \delta\hat{\beta}$ the corrected ones, and if $T$ is the matrix with elements $t_{ij} = \partial \mu_i/\partial \beta_j$, we have
$$\delta\hat{\beta} = (T'V^{-1}T)^{-1}\,T'V^{-1}(z - \mu),$$
where $V$ denotes the diagonal matrix with entries $V(\mu_i)$ and all quantities are evaluated at the current estimates.
Nelder & Wedderburn showed that this technique could be used to obtain maximum likelihood estimates when there was a linearizing transformation of the mean, $f(\mu)$, and the distribution of $z$ could be expressed in the form
$$\pi(z;\, \theta, \phi) = \exp[\alpha(\phi)\{z\theta - g(\theta)\} + h(z, \phi)],$$
where $\theta$ is a function of $\mu$ and $\phi$ is a nuisance parameter. For fixed $\phi$ this gives a one-parameter exponential family, so that the likelihood is the same as the quasi-likelihood. Also, by a simple extension of the argument used in Theorem 1, we have $\mathrm{var}(z) = g''(\theta)/\alpha(\phi)$. Hence
the mean-variance relationship is of the form given in (3), and the result of Nelder &
Wedderburn is a special case of Theorem 5.
A good starting approximation in this process is usually given by setting $\mu = z$ and calculating $w$ from (5) and $y$ as $f(z)$, but some modification may be needed when $f$ has singularities at the ends of the range of possible $z$.
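The iteration can be sketched in code. The example is my own, not the paper's: a log-linear mean $\mu_i = \exp(x_i'\beta)$ with assumed variance function $V(\mu) = \mu$, synthetic data, the update $\delta\hat{\beta} = (T'V^{-1}T)^{-1}T'V^{-1}(z - \mu)$ with $t_{ij} = \partial\mu_i/\partial\beta_j$, and a starting value that sets $\mu = z$ guarded against the singularity of $\log z$ at zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative): mean mu_i = exp(x_i' beta), variance V(mu) = mu.
X = np.column_stack([np.ones(50), rng.uniform(0.0, 1.0, 50)])
beta_true = np.array([0.5, 1.5])
z = rng.poisson(np.exp(X @ beta_true)).astype(float)

# Starting approximation: mu = z, modified where log(z) would be singular.
y0 = np.log(np.maximum(z, 0.5))
beta = np.linalg.lstsq(X, y0, rcond=None)[0]

for _ in range(25):
    mu = np.exp(X @ beta)
    T = mu[:, None] * X                # t_ij = d mu_i / d beta_j
    W = 1.0 / mu                       # diagonal of V^{-1}, since V(mu) = mu
    D = T.T @ (W[:, None] * T)         # T' V^{-1} T, the expected information
    score = T.T @ (W * (z - mu))       # T' V^{-1} (z - mu), i.e. S(u)
    step = np.linalg.solve(D, score)
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:
        break

mu = np.exp(X @ beta)
score = (mu[:, None] * X).T @ ((z - mu) / mu)  # quasi-score at the solution
print(beta)  # maximum quasi-likelihood estimate, near beta_true
```

At convergence the quasi-score is zero, and by Theorem 3 the inverse of the final matrix `D` estimates the dispersion of the estimates.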
8. EXAMPLE
J. F. Jenkyn, in an unpublished Aberystwyth Ph.D. thesis, discussed the data of Table 1, which gives estimates of the percentage leaf area of barley infected with Rhynchosporium secalis, or leaf blotch, for 10 different varieties grown at 9 different sites in a variety trial in 1965.
Table 1. Estimated percentage leaf area infected, with site means (last column) and variety means over all 9 sites (last row); rows for sites 1-5 not shown

Variety:    1      2      3      4      5      6      7      8      9     10    Mean
Site 6     1.00   5.00   5.00   5.00   5.00   5.00  10.00   5.00  50.00  75.00  16.60
Site 7     5.00   0.10   5.00   5.00  50.00  10.00  50.00  25.00  50.00  75.00  27.51
Site 8     5.00  10.00   5.00   5.00  25.00  75.00  50.00  75.00  75.00  75.00  40.00
Site 9    17.50  25.00  42.50  50.00  37.50  95.00  62.50  95.00  95.00  95.00  61.50
Mean       4.20   4.77   7.34   9.57  14.00  21.76  24.17  34.81  37.22  49.33
An attempt was made to analyze the data using a logistic transformation. To do this, zero had to be replaced by some suitably small value; since the value 0.01% occurs in the data we could hardly replace zero by something greater than this, unless 0.01% were to be increased too. It was found that some of these small values in the data gave large negative residuals which had a serious distorting effect on the analysis, and only when these values were ignored was it possible to obtain a satisfactory analysis.
The logistic transformation appeared to be about right for stabilizing variance and
producing additivity except for the undesirable effect of the small observations. This led to
a different formulation of the model which was the same to a first order of approximation,
but which avoided the problems caused by applying a logistic transformation to small
observations.
Let $p_{ij}$ denote the proportion of leaf area infected in the $i$th variety at the $j$th site. Let $P_{ij} = E(p_{ij})$ and $Q_{ij} = 1 - P_{ij}$. The model is stated as
$$\mathrm{logit}\, P_{ij} = T_{ij} = m + a_i + b_j, \qquad \mathrm{var}(p_{ij}) = \sigma^2 P_{ij}^2 Q_{ij}^2.$$
When the method of § 6 is applied, the weighting variate is equal to 1, since with $\mathrm{var}(p_{ij}) = \sigma^2 P_{ij}^2 Q_{ij}^2$ the weight $(\partial P_{ij}/\partial T_{ij})^2/\mathrm{var}(p_{ij}) = (P_{ij}Q_{ij})^2/(\sigma^2 P_{ij}^2 Q_{ij}^2)$ is constant. This is useful because it means that the iterative analysis remains orthogonal. The modified $y$ variate, $f(\mu)$, takes the form of a 'working logit', namely
$$y_{ij} = \hat{T}_{ij} + \frac{p_{ij} - \hat{P}_{ij}}{\hat{P}_{ij}\hat{Q}_{ij}}.$$
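That the weight is constant can be verified numerically; this is a sketch with arbitrary $P$ values, taking $\sigma^2 = 1$. With $\mathrm{var}(p) = P^2Q^2$ and $dP/dT = PQ$ on the logit scale, the weight $(dP/dT)^2/\mathrm{var}(p)$ is 1 whatever $P$ is:

```python
# Weight for the quasi-likelihood iteration: (dP/dT)^2 / var(p),
# with logit link (dP/dT = P*Q) and var(p) = P^2 * Q^2 (sigma^2 = 1).
weights = []
for P in (0.05, 0.3, 0.9):
    Q = 1.0 - P
    weights.append((P * Q) ** 2 / (P ** 2 * Q ** 2))
print([round(w, 12) for w in weights])  # [1.0, 1.0, 1.0]
```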
Table 2. Mean of $\mathrm{logit}\, \hat{P}_{ij}$ for each variety

Variety:     1      2      3      4      5      6      7      8      9     10
Mean     -4.05  -4.51  -3.96  -3.09  -2.69  -2.71  -1.71  -0.78  -0.91  -0.16

(Standard error ±0.331.)
The mean values of $T_{ij}$ for each variety are shown, with their standard errors, in Table 2.
Clearly there are differences between varieties; there seem to be 3 highly resistant varieties
and 3 less so, while the remaining 4 are much more susceptible.
Starting with the final set of working logits, the technique of Finlay & Wilkinson (1963)
was applied noniteratively, but there was no sign of any interaction; nor was there any when
a single degree of freedom for nonadditivity (Tukey, 1949) was isolated.
It seems, then, that a simple summary of the data has been achieved which makes it easy to see what conclusions can be drawn. The simpler method of working with $\mathrm{logit}\, p_{ij}$ might have worked better if the variance had not been so large; part of the trouble is that with such a large variance the approximations
$$E(\mathrm{logit}\, p_{ij}) \simeq \mathrm{logit}\, P_{ij}, \qquad \mathrm{var}(\mathrm{logit}\, p_{ij}) \simeq \mathrm{var}(p_{ij})/(P_{ij}^2 Q_{ij}^2)$$
break down.
9. CONCLUSIONS
It may be difficult to decide what distribution one's observations follow, but the form of
the mean-variance relationship is often much easier to postulate; this is what makes quasi-
likelihoods useful. It has been seen how maximum quasi-likelihood estimation produced a
satisfactory analysis of rather difficult data, and how these estimates can be computed.
Some procedures used in the past are best understood in terms of quasi-likelihoods. For instance, in probit analysis, when the variance of the observations is found to be greater than that predicted by the binomial distribution, it is common to accept the maximum likelihood estimates regardless, while estimating the degree of heterogeneity as in Chapter 4 of Finney (1971). If the variance is still proportional to the binomial variance, then this procedure can be justified in terms of quasi-likelihoods. Also, Fisher (1949), finding that in some data the variance was proportional to the mean, treated them effectively as if they had a Poisson distribution, even though the measurement involved was a continuous one. Thus quasi-likelihoods improve understanding of some past procedures, as well as providing new ones.
The author wishes to thank the director of the National Institute of Agricultural Botany
and Dr J. F. Jenkyn for their permission to use the data, Mr M. J. R. Healy whose comments
on an earlier version of the paper improved the presentation and Mr R. W. Payne for
running the calculations on the GENSTAT statistical program developed at Rothamsted.
REFERENCES
FINLAY, K. W. & WILKINSON, G. N. (1963). The analysis of adaptation in a plant breeding programme.
Aust. J. Agric. Res. 14, 742-54.
FINNEY, D. J. (1971). Probit Analysis, 3rd edition. Cambridge University Press.
FISHER, R. A. (1949). A biological assay of tuberculins. Biometrics 5, 300-16.
HARTIGAN, J. A. (1969). Linear Bayesian models. J.R. Statist. Soc. B 31, 446-54.
KENDALL, M. G. & STUART, A. (1973). The Advanced Theory of Statistics, Vol. II, 3rd edition. London: Griffin.
LEHMANN, E. L. (1959). Testing Statistical Hypotheses. New York: Wiley.
NELDER, J. A. & WEDDERBURN, R. W. M. (1972). Generalized linear models. J.R. Statist. Soc. A 135,
370-84.
TUKEY, J. W. (1949). One degree of freedom for non-additivity. Biometrics 5, 232-42.