Least Squares Adjustment
Least Squares Adjustment
Least Squares Adjustment
Preface
This note primarily describes the mathematics of least squares regression analysis as it is often used in geodesy including land surveying and satellite positioning applications. In these fields regression is often termed
adjustment1 . The note also contains a couple of typical land surveying and satellite positioning application
examples. In these application areas we are typically interested in the parameters in the model typically 2or 3-D positions and not in predictive modelling which is often the main concern in other regression analysis
applications.
Adjustment is often used to obtain estimates of relevant parameters in an over-determined system of equations
which may arise from deliberately carrying out more measurements than actually needed to determine the set
of desired parameters. An example may be the determination of a geographical position based on information
from a number of Global Navigation Satellite System (GNSS) satellites also known as space vehicles (SV).
It takes at least four SVs to determine the position (and the clock error) of a GNSS receiver. Often more than
four SVs are used and we use adjustment to obtain a better estimate of the geographical position (and the
clock error) and to obtain estimates of the uncertainty with which the position is determined.
Regression analysis is used in many other fields of application both in the natural, the technical and the social
sciences. Examples may be curve fitting, calibration, establishing relationships between different variables
in an experiment or in a survey, etc. Regression analysis is probably one the most used statistical techniques
around.
Dr. Anna B. O. Jensen provided insight and data for the Global Positioning System (GPS) example.
Matlab code and sections that are considered as either traditional land surveying material or as advanced
material are typeset with smaller fonts.
Comments in general or on for example unavoidable typos, shortcomings and errors are most welcome.
1
in Danish udjvning
Contents
Preface
Contents
1.1
1.2
1.3
1.1.1
Parameter Estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.1.2
1.1.3
11
1.1.4
13
1.1.5
QR Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
13
1.1.6
Cholesky Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
13
14
1.2.1
Parameter Estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
15
1.2.2
Weight Assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
15
1.2.3
18
1.2.4
WLS as OLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
21
21
2.2
22
23
2.1.1
Parameter Estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
23
2.1.2
Iterative Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
24
2.1.3
24
2.1.4
Confidence Ellipsoids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
25
2.1.5
25
2.1.6
26
42
2.2.1
42
2.2.2
Newtons Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
42
2.2.3
43
2.2.4
44
3 Final Comments
45
Literature
46
Index
46
3.5
2.5
1.5
0.5
10
15
20
25
Time [days]
30
35
40
45
50
(1)
(2)
(3)
(4)
Rewrite to get
e1 = y1 1 x1
e2 = y2 1 x2
..
.
en = yn 1 xn .
(5)
(6)
(7)
(8)
In order to find the best line through (the origo and) the point cloud {xi yi }ni=1 by means of the least squares
principle write
=
n
n
1X
1X
e2i =
(yi 1 xi )2
2 i=1
2 i=1
(9)
4
and find the derivative of with respect to the slope 1
n
n
X
X
d
= (yi 1 xi )(xi ) = (1 x2i xi yi ).
d1 i=1
i=1
(10)
Setting the derivative equal to zero and denoting the solution 1 we get
1
n
X
x2i =
i=1
n
X
xi yi
(11)
i=1
xi yi
1 = P 2 .
xi
(12)
n
d2 X
=
x2 > 0
d12 i=1 i
(13)
Since
for non-trivial cases 1 gives a minimum for . This 1 gives the best straight line through the origo and the
point cloud, best in the sense that it minimizes (half) the sum of the squared residuals measured along the
y-axis, i.e., perpendicular to the x-axis. In other words: the xi s are considered as uncertainty- or error-free
constants, all the uncertainty or error is associated with the yi s.
Lets look at another situation where we want to predict one (response) variable y as an affine function of one
(predictor) variable x. We have n joint observations of x and y and write the model where the parameter 0
is the intercept of the line with the y-axis and the parameter 1 is the slope of the line as
y1 = 0 + 1 x1 + e1
y2 = 0 + 1 x2 + e2
..
.
yn = 0 + 1 xn + en .
(14)
(15)
(16)
(17)
Rewrite to get
e1 = y1 (0 + 1 x1 )
e2 = y2 (0 + 1 x2 )
..
.
en = yn (0 + 1 xn ).
(18)
(19)
(20)
(21)
In order to find the best line through the point cloud {xi yi }ni=1 (and this time not necessarily through the
origo) by means of the least squares principle write
=
n
n
1X
1X
e2i =
(yi (0 + 1 xi ))2
2 i=1
2 i=1
(22)
and find the partial derivatives of with respect to the intercept 0 and the slope 1
n
n
n
X
X
X
=
(yi (0 + 1 xi ))(1) =
yi + n0 + 1
xi
0
i=1
i=1
i=1
(23)
n
n
n
n
X
X
X
X
=
(yi (0 + 1 xi ))(xi ) =
xi y i + 0
xi + 1
x2i .
1
i=1
i=1
i=1
i=1
(24)
Setting the partial derivatives equal to zero and denoting the solutions 0 and 1 we get (omitting the summation indices for clarity)
0 =
P 2P
P P
xi y i xi xi y i
P
P
n x2i ( xi )2
P
P P
(25)
n xi yi xi yi
1 =
.
P
P
n x2i ( xi )2
(26)
P
P
P
P
We see that 1 xi + n0 = yi or y = 0 + 1 x where x = xi /n is the mean value of x and y = yi /n
is the mean value of y. Another way of writing this is
0 = y 1 x
P
(xi x)(yi y)
xy
1 =
=
.
P
2
(xi x)
x2
(27)
(28)
where
xy = (xi x)(yi y)/(n 1) is the covariance between x and y, and
x2 =
the variance of x. Also in this case 0 and 1 give a minimum for .
yi (0 , 1 , . . . , p0 ; x1 , . . . , xp0 ) + ei
yi (; x) + ei
yi () + ei
(0 + ) 1 xi1 + + p0 xip0 + ei , i = 1, . . . , n
(29)
(30)
(31)
(32)
where = [0 1 . . . p0 ]T , x = [x1 . . . xp0 ]T , and ei is the difference between the data and the model for
observation i with expectation value E{ei } = 0. ei is termed the residual or the error. The last equation above
is written with the constant or the intercept 0 in parenthesis since we may want to include 0 in the model or
we may not want to, see also Examples 3-5. Write all n equations in matrix form
y1
y2
..
.
yn
1 x11
1 x21
..
..
.
.
1 xn1
x1p0
x2p0
..
..
.
.
xnp0
0
1
..
.
p0
e1
e2
..
.
(33)
en
or
y = X + e
where
y is n 1,
(34)
6
X is n p, p = p0 + 1 if an intercept 0 is estimated, p = p0 if not,
is p 1, and
e is n 1 with expectation E{e} = 0.
If we dont want to include 0 in the model, 0 is omitted from and so is the first column of ones in X.
Equations 33 and 34 are termed the observation equations2 . The columns in X must be linearly independent,
i.e., X is full column rank. Here we study the situation where the system of equations is over-determined,
i.e., we have more observations than parameters, n > p. f = n p is termed the number of degrees of
freedom3 .
The model is linear in the parameters but not necessarily linear in y and xj (for instance y could be replaced
by ln y or 1/y, or xj could be replaced by xj , extra columns with products xk xl called interactions could be
added to X or similarly). Transformations of y have implications for the nature of the residual.
Finding an optimal given a set of observed data (the ys and the xj s) and an objective function (or a cost or
a merit function, see below) is referred to as regression analysis in statistics. The elements of the vector are
also called the regression coefficients. In some application sciences such as geodesy including land surveying
regression analysis is termed adjustment4 .
All uncertainty (or error) is associated with y, the xj s are considered as constants which may be reasonable
or not depending on (the genesis of) the data to be analyzed.
(35)
(36)
(37)
= X T y + X T X.
(38)
When the columns of X are linearly independent the second order derivative 2 / T = X T X is positive
definite. Therefore we have a minimum for . Note that the p p X T X is symmetric, (X T X)T = X T X.
OLS (pronounced theta-hat) by setting / = 0 to obtain the
We find the OLS estimate for termed
5
normal equations
OLS = X T y.
XT X
2
in Danish observationsligningerne
in Danish antal frihedsgrader or antal overbestemmelser
4
in Danish udjvning
5
in Danish normalligningerne
3
(39)
Parameter Estimates
If the symmetric matrix X T X is well behaved, i.e., it is full rank (equal to p) corresponding to linearly
independent columns in X a formal solution is
OLS = (X T X)1 X T y.
(40)
For reasons of numerical stability especially in situations with nearly linear dependencies between the columns
of X (causing slight alterations to the observed values in X to lead to substantial changes in the estimated ;
this problem is known as multicollinearity) the system of normal equations should not be solved by inverting
X T X but rather by means of SVD, QR or Cholesky decomposition, see Sections 1.1.4, 1.1.5 and 1.1.6.
If we apply Equation 40 to the simple regression problem in Equations 14-17 of course we get the same
solution as in Equations 25 and 26.
When we apply regression analysis in other application areas we are often interested in predicting the response
In
variable based on new data not used in the estimation of the parameters or the regression coefficients .
and not on this predictive modelling.
land surveying and GNSS applications we are typically interested in
OLS can be found in one go because eT e is quadratic in ; unlike in the nonlinear case
(In the linear case
dealt with in Section 2 we dont need an initial value for and an iterative procedure.)
(pronounced y-hat) is
The estimate for y termed y
OLS = X(X T X)1 X T y = Hy
= X
y
(41)
. In geodesy
where H = X(X T X)1 X T is the so-called hat matrix since it transforms or projects y into y
6
(and land surveying) these equations are termed the fundamental equations . H is a projection matrix: it
is symmetric, H = H T , and idempotent, HH = H. We also have HX = X and that the trace of H,
trH = tr(X(X T X)1 X T ) = tr(X T X(X T X)1 ) = trI p = p.
(pronounced e-hat) is
The estimate of the error term e (also known as the residual) termed e
=yy
= y Hy = (I H)y.
e
(42)
E{(X T X)1 X T y}
(X T X)1 X T E{y}
(X T X)1 X T E{X + e}
,
(43)
(44)
(45)
(46)
Example 3 (from Strang and Borre, 1997, p. 306) Between four points A, B, C and D situated on a straight
line we have measured all pairwise distances AB, BC, CD, AC, AD and BD. The six measurements are
6
in Danish fundamentalligningerne
y = [3.17 1.12 2.25 4.31 6.51 3.36]T m. We wish to determine the distances 1 = AB, 2 = BC and
3 = CD by means of linear least squares adjustment. We have n = 6, p = 3 and f = 3. The six observation
equations are
y1
y2
y3
y4
y5
y6
=
=
=
=
=
=
1 + e1
2 + e2
3 + e3
1 + 2 + e4
1 + 2 + 3 + e5
2 + 3 + e6 .
(47)
(48)
(49)
(50)
(51)
(52)
3.17
1.12
2.25
4.31
6.51
3.36
1
0
0
1
1
0
0
1
0
1
1
1
0
0
1
0
1
1
e1
e2
e3
e4
e5
e6
(53)
= X T y; units are m)
The normal equations are (this is X T X
3 2 1
1
13.99
2 4 2 2 = 15.30 .
1 2 3
3
12.12
(54)
H=
1/2 1/4
0
1/4 1/4 1/4
1/4
1/2 1/4
1/4
0
1/4
0 1/4
1/2 1/4 1/4
1/4
.
1/4
1/4 1/4
1/2 1/4
0
1/4
0
1/4
1/4 1/2
1/4
1/4
1/4
1/4
0 1/4
1/2
(55)
3.17
1.12
2.25
4.31
6.51
3.36
1
1
1
1
1
1
1
0
0
1
1
0
0
1
0
1
1
1
0
0
1
0
1
1
0
1
2
3
e1
e2
e3
e4
e5
e6
(56)
6
3
4
3
3
3
2
1
4
2
4
2
3
1
2
3
0
1
2
3
20.72
13.99
15.30
12.12
(57)
H=
3/4
0
1/4
1/4
0 1/4
0
3/4
0
1/4 1/4
1/4
1/4
0
3/4 1/4
0
1/4
.
1/4
1/4 1/4
1/2
1/4
0
0 1/4
0
1/4
3/4
1/4
1/4
1/4
1/4
0
1/4
1/2
1.1.2
(58)
[end of example]
OLS , y
and e
are
Dispersion or variance-covariance matrices for y,
D{y} = 2 I
OLS } = D{(X T X)1 X T y}
D{
= (X T X)1 X T D{y}X(X T X)1
= 2 (X T X)1
OLS }
D{
y } = D{X
OLS }X T
= XD{
= 2 H, V{
yi } = 2 Hii
D{
e} = D{(I H)y}
= (I H)D{y}(I H)T
= 2 (I H) = D{y} D{
y }, V{
ei } = 2 (1 Hii ).
(59)
(60)
(61)
(62)
(63)
(64)
(65)
(66)
(67)
(68)
The ith diagonal element of H, Hii , is called the leverage7 for observation i. We see that a high leverage gives
a high variance for yi indicating that observation i may have a high influence on the regression compared to
other observations. This again indicates that observation i may be an outlier, see also Section 1.1.3 on residual
and influence analysis.
For the sum of squared errors (SSE, also called RSS for the residual sum of squares) we get
T e
= y T (I H)y
e
(69)
2 = e
(70)
3 2 1
2 4 2
1 2 3
7
in Danish potentialet
1/2 1/4
0
1/2 1/4
= 1/4
.
0 1/4
1/2
(71)
10
and dispersion 2 (X T X)1 . Assuming that i = ci where ci is a constant it can be shown that the ratio
zi =
i ci
(72)
follows a t distribution with n p degrees of freedom. This can be used to test whether i ci is significantly different from 0. If
for example zi with ci = 0 has a small absolute value then i is not significantly different from 0 and xi should be removed from
the model.
Example 5 (continuing Example 4) The t-test statistics zi with ci = 0 in the case with no intercept are [266.3 94.31 187.8]T
which are all very large compared to 95% or 99% percentiles in a two-sided t-test with three degrees of freedom, 3.182 and 5.841
respectively. The probabilities of finding larger values of |zi | are [0.0000 0.0000 0.0000]T . Hence all parameter estimates are
significantly different from zero. The t-test statistics zi with ci = 0 in the case with an intercept are [0.8485 206.6 72.83 145.5]T ;
all but the first value are very large compared to 95% and 99% percentiles in a two-sided t-test with two degrees of freedom,
4.303 and 9.925 respectively. The probabilities of finding larger values of |zi | are [0.4855 0.0000 0.0002 0.0000]T . Therefore the
estimate of 0 is insignificant (i.e., it is not significantly different from zero) and the intercept corresponding to an imprecise zero
mark of the distance measuring device used should not be included in the model.
[end of example]
Often a measure of variance reduction termed the coefficient of determination R2 and a version that adjusts
2
for the number of parameters Radj
are defined in the statistical literature:
SST0 = y T y (if no intercept 0 is estimated)
)T (y y
) (if an intercept 0 is estimated)
SST1 = (y y
T
e
SSE = e
R2 = 1 SSE/SSTi
2
Radj
= 1 (1 R2 )(n i)/(n p) where i is 0 or 1 as indicated by SSTi .
2
2
Both R2 and Radj
lie in the interval [0,1]. For a good model with a good fit to the data both R2 and Radj
should be close to 1.
Matlab code for Examples 3 to 5
% (C) Copyright 2003
% Allan Aasbjerg Nielsen
% aa@imm.dtu.dk, www.imm.dtu.dk/aa
% model without intercept
y = [3.17 1.12 2.25 4.31 6.51 3.36];
X = [1 0 0; 0 1 0; 0 0 1; 1 1 0; 1 1 1; 0 1 1];
[n,p] = size(X);
f = n-p;
thetah = X\y;
yh = X*thetah;
eh = y-yh;
s2 = eh*eh/f;
s = sqrt(s2);
iXX = inv(X*X);
Dthetah = s2.*iXX;
stdthetah = sqrt(diag(Dthetah));
t = thetah./stdthetah;
pt = betainc(f./(f+t.2),0.5*f,0.5);
H = X*iXX*X;
Hii = diag(H);
% model with intercept
X = [ones(n,1) X];
[n,p] = size(X);
f = n-p;
11
thetah = X\y;
yh = X*thetah;
eh = y-yh;
s2 = eh*eh/f;
s = sqrt(s2);
iXX = inv(X*X);
Dthetah = s2.*iXX;
stdthetah = sqrt(diag(Dthetah));
t = thetah./stdthetah;
pt = betainc(f./(f+t.2),0.5*f,0.5);
H = X*iXX*X;
Hii = diag(H);
The Matlab backslash operator \ or mldivide, left matrix divide, in this case with X non-square computes the QR factorization (see Section 1.1.5) of X and finds the least squares solution by back-substitution.
Probabilities in the t distribution are calculated by means of the incomplete beta function evaluated in Matlab by the betainc
function.
1.1.3
Residual analysis is performed to check the model and to find possible outliers or gross errors in the data.
against y
and e
against the columns in X (the explanatory variables
Often inspection of listings or plots of e
or the regressors) are useful. No systematic tendencies should be observable in these listings or plots.
Standardized residuals
ei
e0i =
1 Hii
(73)
which have unit variance (see Equation 68) are often used.
Studentized or jackknifed residuals (regression omitting observation i to obtain a prediction for the omitted
2
observation y(i) and an estimate of the corresponding error variance
(i)
)
yi y(i)
ei = q
(74)
V{yi y(i) }
are also often used. We dont have to redo the adjustment each time an observation is left out since it can be
shown that
,v
u
u n p e0 2
i
0
ei = ei t
.
(75)
np1
For the sum of the diagonal elements Hii of the hat matrix we have trH = ni=1 Hii = p which means that
ii = p/n. Therefore an alarm for very influential observations which may be outliers
the average value H
could be set if Hii > 2p/n (or maybe if Hii > 3p/n). As mentioned above Hii is termed the leverage for
observation i. None of the observations in Example 3 have high leverages.
Another often used measure of influence of the individual observations is called Cooks distance also known
as Cooks D. Cooks D for observation i measures the distance between the vector of estimated parameters
with and without observation i (often skipping the intercept 0 if estimated). Other influence statistics exist.
Example 6 In this example two data sets are simulated. The first data set contains 100 observations with
one outlier. This outlier is detected by means of its residual, the leverage of the outlier is low since the
observation does not influence the regression line, see Figure 2. In the top-left panel the dashed line is from
12
a regression with an insignificant intercept and the solid line is from a regression without the intercept. The
outlier has a huge residual, see the bottom-left panel. The mean leverage is p/n = 0.01. Only a few leverages
are greater then 0.02, see the top-right panel. No leverages are greater then 0.03.
The second data set contains four observations with one outlier, see Figure 2 bottom-right panel. This outlier
(observation 4 with coordinates (100,10)) is detected by means of its leverage, the residual of the outlier
is low, see Table 1. The mean leverage is p/n = 0.5. The leverage of the outlier is by far the greatest,
H44 ' 2p/n.
[end of example]
Leverage, H
ii
12
0.03
10
0.025
0.02
6
0.015
4
0.01
0.005
0
2
0.2
0.4
0.6
0.8
Residuals
20
40
60
80
100
12
12
10
10
6
6
4
4
0
2
0.2
0.4
0.6
0.8
20
40
60
80
100
Figure 2: Simulated examples with 1) one outlier detected by the residual (top-left and bottom-left) and 2)
one outlier (observation 4) detected by the leverage (bottom-right).
Table 1: Residuals and leverages for simulated example with one outlier (observation 4) detected by the
leverage.
Obs
x y Residual Leverage
1
1 1 0.9119
0.3402
2
2 2
0.0062
0.3333
3
3 3
0.9244
0.3266
4 100 10 0.0187
0.9998
13
(76)
where V is n p, is p p diagonal with the singular values of X on the diagonal, and U is p p with U T U = U U T =
V T V = I p . This leads to the following solution to the normal equations
OLS
XT X
T T
T
(V U ) (V U ) OLS
OLS
U V T V U T
2 T
U U OLS
OLS
U T
= XT y
= (V U T )T y
(77)
(78)
U V T y
U V T y
V Ty
(79)
(80)
(81)
=
=
=
and therefore
1.1.5
OLS = U 1 V T y.
(82)
X = QR,
(83)
QR Decomposition
An alternative factorization of X is
= XT y
= (QR)T y
(84)
(85)
R T QT y
QT y.
(86)
(87)
=
=
1.1.6
Cholesky Decomposition
(88)
XT y
(89)
X y.
(90)
A Trick to Obtain
T e
with the Cholesky Decomposition X T X = CC T , C is p p lower triangular
e
OLS =
CC T
T
C(C OLS ) =
XT y
XT y
(91)
(92)
XT X
XT y
C
T =
C
.
(X T y)T y T y
(93)
With
=
C
C
zT
0
s
T =
and C
CT
0T
z
s
(94)
14
we get
C
T =
C
CC T
zT C T
Cz
z T z + s2
(95)
We see that
s2
yT y zT z
yT y
y y
T
(96)
T CC T
OLS
OLS
T
T
OLS X y
OLS
yT X
T
T
1
y y
y T y y X(X X)
y T y y T Hy
y T (I H)y
.
T e
e
=
=
=
=
(97)
(98)
(99)
(100)
(101)
(102)
(103)
X y
is
Hence, after Cholesky decomposition of the expanded matrix, the lower right element of C
T
(skipping s in the last row) is C OLS , hence OLS can be found by back-substitution.
p
T
T e
. The last column in C
e
P =
p1 0
0 p2
.. ..
. .
0 0
..
.
0
0
..
.
(104)
pn
We get
= 1/2(y X)T P (y X)
= 1/2(y T P y y T P X T X T P y + T X T P X)
= 1/2(y T P y 2 T X T P y + T X T P X).
(105)
(106)
(107)
= X T P y + X T P X.
(108)
When the columns of X are linearly independent the second order derivative 2 / T = X T P X is
positive definite. Therefore we have a minimum for . Note that X T P X is symmetric, (X T P X)T =
X T P X.
W LS (pronounced theta-hat) by setting / = 0 to obtain the
We find the WLS estimate for termed
normal equations
W LS = X T P y
XT P X
W LS = c with N = X T P X and c = X T P y.
or N
8
(109)
15
Parameter Estimates
If the symmetric matrix N = X T P X is well behaved, i.e., it is full rank (equal to p) corresponding to
linearly independent columns in X a formal solution is
W LS = (X T P X)1 X T P y = N 1 c.
(110)
For reasons of numerical stability especially in situations with nearly linear dependencies between the columns
of X (causing slight alterations to the observed values in X to lead to substantial changes in the estimated ;
this problem is known as multicollinearity) the system of normal equations should not be solved by inverting
X T P X but rather by means of SVD, QR or Cholesky decomposition, see Sections 1.1.4, 1.1.5 and 1.1.6.
When we apply regression analysis in other application areas we are often interested in predicting the response
In
variable based on new data not used in the estimation of the parameters or the regression coefficients .
land surveying and GNSS applications we are typically interested in and not on this predictive modelling.
W LS can be found in one go because eT P e is quadratic in ; unlike in the nonlinear case
(In the linear case
dealt with in Section 2 we dont need an initial value for and an iterative procedure.)
(pronounced y-hat) is
The estimate for y termed y
W LS = X(X T P X)1 X T P y = Hy = XN 1 c
= X
y
(111)
. In geodesy
where H = X(X T P X)1 X T P is the so-called hat matrix since it transforms y into y
(and land surveying) these equations are termed the fundamental equations. In WLS regression H is not
symmetric, H 6= H T . H is idempotent HH = H. We also have HX = X and that the trace of H,
trH = tr(X(X T P X)1 X T P ) = tr(X T P X(X T P X)1 ) = trI p = p. Also P H = H T P = H T P H
which is symmetric.
(pronounced e-hat) is
The estimate of the error term e (also known as the residual) termed e
=yy
= y Hy = (I H)y.
e
(112)
E{(X T P X)1 X T P y}
(X T P X)1 X T P E{y}
(X T P X)1 X T P E{X + e}
,
(113)
(114)
(115)
(116)
1.2.2
Weight Assignment
In general we assign weights to observations so that the weight of an observation is proportional to the inverse
2
.
expected (prior) variance of that observation, pi 1/i,prior
16
In traditional land surveying and GNSS we deal with observations of distances, directions and heights. In
P
WLS we minimize half the weighted sum of squared residuals = 1/2 ni=1 pi e2i . For this sum to make sense
all terms must have the same unit. This can be obtained by demanding that pi e2i has no unit. This means that
pi has units of 1/e2i or 1/yi2 . If we consider the weight definition 02 = p1 12 = = pi i2 = = pn n2
2
we see that 02 has no unit. Choosing pi = 1/i,prior
we obtain that 0 = 1 if measurements are carried out
with the expected (prior) variances (and the regression model is correct). i,prior depends on the quality of the
instruments applied and how measurements are performed. Below formulas for weights are given, see Jacobi
(1977).
Distance Measurements
Here we use
pi =
s2G
n
+ a2 s2a
(117)
where
n is the number of observations,
sG is the combined expected standard deviation of the distance measuring instrument itself and on
centering of the device,
sa is the expected distance dependent standard deviation of the distance measuring instrument, and
a is the distance between the two points in question.
Directional Measurements
Here we use
pi =
n
2
n asc2
+ s2t
(118)
where
n is the number of observations,
sc is the expected standard deviation on centering of the device, and
st is the expected standard deviation of one observed direction.
Levelling or Height Measurements Here we traditionally choose weights pi equal to the number of measurements divided by the distance between the points in question measured in units of km, i.e., a weight of 1
is assigned to one measured height difference if that height difference is measured over a distance of 1 km.
2
this choice of weights does not ensure 0 = 1. In this case the units for
Since here in general pi 6= 1/i,prior
the weights are not those of the inverse prior variances so 0 is not unit-free, and also this tradition makes it
impossible to carry out adjustment of combined height, direction and distance observations.
In conclusion we see that the weights for distances and directions change if the distance a between points
change. The weights chosen for height measurements are generally not equal to the inverse of the expected
(prior) variance of the observations. Therefore they do not lead to 0 = 1. Both distance and directional
measurements lead to nonlinear least squares problems, see Section 2.
17
Figure 3: From four points Q, A, B and C we measure all possible pairwise height differences (from MrskMller and Frederiksen, 1984).
Example 7 (from Mrsk-Mller and Frederiksen, 1984, p. 74) From four points Q, A, B and C we have
measured all possible pairwise height differences, see Figure 3. All measurements are carried out twice. Q has
a known height KQ = 34.294 m which is considered as fixed. We wish to determine the heights in points A,
B and C by means of weighted least squares adjustment. These heights are called 1 , 2 and 3 respectively.
The mean of the two height measurements are (with the distance di between points in parentheses)
from Q to A 0.905 m (0.300 km),
from A to B 1.675 m (0.450 km),
from C to B 8.445 m (0.350 km),
from C to Q 5.864 m (0.300 km),
from Q to B 2.578 m (0.500 km), and
from C to A 6.765 m (0.450 km).
The weight for each observation is pi = 2/di , see immediately above, resulting in (units are in km1 )
P =
6.6667
0
0
0
0
0
0
4.4444
0
0
0
0
0
0
5.7143
0
0
0
0
0
0
6.6667
0
0
0
0
0
0
4.0000
0
0
0
0
0
0
4.4444
(119)
=
=
=
=
=
=
1 KQ + e1
2 1 + e2
2 3 + e3
KQ 3 + e4
2 KQ + e5
1 3 + e6 .
(120)
(121)
(122)
(123)
(124)
(125)
18
In matrix form we get (units are m)
0.905
1.675
8.445
5.864
2.578
6.765
1
1
0
0
0
1
0
0
34.294
1
0
0.000
1 1
0.000
2 +
0 1
34.294
3
34.294
1
0
0 1
0.000
e1
e2
e3
e4
e5
e6
(126)
or (with a slight misuse of notation since we reuse the i s and the ei s; this is y = X + e; units are mm)
35, 199
1, 675
8, 445
28, 430
36, 872
6, 765
1
1
0
0
0
1
e1
0
0
1
0
e2
1
e3
1 1
2 +
0 1
e4
3
e5
1
0
e6
0 1
(127)
(128)
H=
0.0335
0.2288 0.5452
0.2595
0.2260
0.1953
0.2830
0.2670 0.3228 0.3069
0.4101 0.0159
0.3312 0.2367 0.2511
0.3169 0.0143
0.4320
(129)
=
=
=
=
02 P 1
02 (X T P X)1 = 02 N 1
02 XN 1 X T
02 (P 1 XN 1 X T ) = D{y} D{
y }.
(130)
(131)
(132)
(133)
For the weighted sum of squared errors (SSE, also called RSS for the residual sum of squares) we get
T P e
=
e
=
=
=
y T (I H)T P (I H)y
y T (I H)T (P P H)y
y T (P P H H T P + H T P H)y
y T P (I H)y
(134)
(135)
(136)
(137)
19
02 = e
(138)
(with well chosen pi , see Section 1.2.2) follows a 2 distribution with n p degrees of freedom which has expectation n p.
T P e
/(n p) has expectation 1 and its square root is approximately 1.
Therefore e
T P e
= (n p)s20 follows a 2 distribution with n p degrees
What if s0 is larger than 1? How much larger than 1 is too large? e
of freedom. If the probability of finding (n p)s20 larger than the observed value is much smaller than the traditionally used 0.05
(5%) or 0.01 (1%), then s0 is too large.
The square roots of the diagonal elements of the dispersion matrices in Equations 130, 131, 132 and 133
are the standard errors of the quantities in question. For example, the standard error of i denoted
i is the
T
2
1
square root of the ith diagonal element of 0 (X P X) .
= [1.1941 0.7605 1.6879 0.2543 1.5664
Example 8 (continuing Example 7) The estimated residuals
are e
q
T P e
/3 mm/km1/2 = 4.7448 mm/km1/2 . The inverse
2.5516]T mm. Therefore the RMSE or
0 = s0 = e
of X T P X is
(139)
i ci
(140)
follows a t distribution with n p degrees of freedom. This can be used to test whether i ci is significantly different from 0. If
for example zi with ci = 0 has a small absolute value then i is not significantly different from 0 and xi should be removed from
the model.
Example 9 (continuing Example 8) The t-test statistics zi with ci = 0 are [25, 135 24, 270 20, 558]T which are all extremely
large compared to 95% or 99% percentiles in a two-sided t-test with three degrees of freedom, 3.182 and 5.841 respectively. To
double precision the probabilities of finding larger values of |zi | are [0 0 0]T . All parameter estimates are significantly different
from zero.
[end of example]
Matlab code
for Examples 7 to 9
20
[n p] = size(X);
%number of degrees of freedom
f = n-p;
dist = [0.30 0.45 0.35 0.30 0.50 0.45];
P = diag(2./dist); % units [km(-1)]
%P = 0.1*P; % This gives a better s0
%OLS
%P = eye(size(X,1));
y = [.905 1.675 8.445 5.864 2.578 6.765];
%units are mm
y = 1000*y;
Kq = 1000*Kq;
cst = Kq.*[1 0 0 -1 1 0];
y = y+cst;
%OLS by "\" operator: mldivide
%thetahat = X*X\(X*y)
N = X*P;
c = N*y;
N = N*X;
%WLS
thetahat = N\c;
yhat = X*thetahat;
ehat = y-yhat;
yhat = yhat-cst;
%MSE
SSE = ehat*P*ehat;
s02 = SSE/f;
%RMSE
s0 = sqrt(s02);
%Variance/covariance matrix of the observations, y
Dy = s02.*inv(P);
%Standard deviations
stdy = sqrt(diag(Dy));
%Variance/covariance matrix of the adjusted elements, thetahat
Ninv = inv(N);
Dthetahat = s02.*Ninv;
%Standard deviations
stdthetahat = sqrt(diag(Dthetahat));
%Variance/covariance matrix of the adjusted observations, yhat
Dyhat = s02.*X*Ninv*X;
%Standard deviations
stdyhat = sqrt(diag(Dyhat));
%Variance/covariance matrix of the adjusted residuals, ehat
Dehat = Dy-Dyhat;
%Standard deviations
stdehat = sqrt(diag(Dehat));
%Correlations between adjusted elements, thetahat
aux = diag(1./stdthetahat);
corthetahat = aux*Dthetahat*aux;
% tests
% t-values and probabilities of finding larger |t|
% pt should be smaller than, say, (5% or) 1%
t = thetahat./stdthetahat;
pt = betainc(f./(f+t.2),0.5*f,0.5);
% probability of finding larger s02
% should be greater than, say, 5% (or 1%)
pchi2 = 1-gammainc(0.5*SSE,0.5*f);
Probabilities in the 2 distribution are calculated by means of the incomplete gamma function evaluated in Matlab by the gammainc
function.
A Trick to Obtain
T P e
with the Cholesky Decomposition X T P X = CC T , C p p lower triangular
e
W LS
CC T
T
C(C W LS )
= XT P y
= XT P y
(141)
(142)
21
T
XT P X
XT P y
CC =
.
(X T P y)T y T P y
With
=
C
C
zT
0
s
T =
and C
CT
0T
z
s
(143)
(144)
we get
C
T =
C
CC T
zT C T
Cz
z T z + s2
(145)
We see that
s2
=
=
=
=
=
=
=
yT P y zT z
yT P y
T
y Py
T
(146)
T
T
W LS CC W LS
T
T
W LS X P y
W LS
yT P X
T
T
1
(147)
(148)
y Py
y T P y y P X(X P X) X T P y
y T P (I X(X T P X)1 X T P )y
T P e
.
e
is
Hence, after Cholesky decomposition of the expanded matrix, the lower right element of C
T
(149)
(150)
(151)
(152)
p
T
T P e
. The last column in C
e
Ty
.
X
(153)
(154)
(155)
(156)
(157)
(158)
(159)
If the symmetric matrix X T 1 X is well behaved, i.e., it is full rank (equal to p) corresponding to linearly
independent columns in X a formal solution is
GLS = (X T 1 X)1 X T 1 y.
(160)
22
The GLS problem can be turned into an OLS problem by means of the Cholesky decomposition of = CC T
(or of 1 )
GLS
X T 1 X
GLS
X T C T C 1 X
GLS
(C 1 X)T (C 1 X)
GLS
TX
= X T 1 y
= X T C T C 1 y
= (C 1 X)T (C 1 y)
Ty
,
= X
(161)
(162)
(163)
(164)
= C 1 X and y by y
= C 1 y.
i.e., replace X by X
(165)
In the traditional land surveying notation of Mrsk-Mller and Frederiksen (1984) we have (yi `i , fi
Fi , j xj , and ei vi )
`i = Fi (x1 , . . . , xp ) + vi , i = 1, . . . , n.
(166)
(Mrsk-Mller and Frederiksen (1984) use vi ; whether we use +vi or vi is irrelevant for LS methods.)
Several methods are available to solve this problem, see Sections 2.2.1, 2.2.2, 2.2.3 and 2.2.4. Here we use a
linearization method.
If we have one parameter x only we get (we omit the observation index i)
` = F (x) + v.
(167)
In geodesy and land surveying the parameters are often called elements. We perform a Taylor expansion of
F around a chosen initial value x9
` = F (x ) + F 0 (x )(x x ) +
1 00
1
F (x )(x x )2 + F 000 (x )(x x )3 + + v
2!
3!
(168)
and retain up till the first order term only (i.e., we linearize F near x to approximate v 2 to a quadratic near
x ; a single prime 0 denotes the first order derivative, two primes 00 denote the second order derivative etc.)
` ' F (x ) + F 0 (x )(x x ) + v.
(169)
(170)
and from a Taylor expansion we retain the first order terms only
`'
F (x1 , . . . , xp )
F
F
(xp xp ) + v
(x
1 x1 ) + +
x1 x1 =x
xp xp =x
(171)
` ' F (x ) + [F (x )]T (x x ) + v
(172)
or
23
`1
`2
..
.
`n
F1 (x)
F2 (x)
..
.
Fn (x)
v1
v2
..
.
(173)
vn
or
` = F (x) + v
(174)
` ' F (x ) + A(x x ) + v
(175)
and get
where the n p derivative matrix A is
"
A=
F
F
F
=
x
x1
xp
F1
x
.1
= ..
Fn
x1
...
F1
xp
Fn
xp
..
.
(176)
with all Aij = Fi /xj evaluated at xj = xj . Therefore we get (here we use = instead of the correct ')
k = A + v
(177)
(179)
For reasons of numerical stability especially in situations with nearly linear dependencies between the columns
this probof A (causing slight alterations to the values in A to lead to substantial changes in the estimated ;
lem is known as multicollinearity) the system of normal equations should not be solved by inverting AT P A
but rather by means of SVD, QR or Cholesky decomposition, see Sections 1.1.4, 1.1.5 and 1.1.6.
When we apply regression analysis in other application areas we are often interested in predicting the response
. In
variable based on new data not used in the estimation of the parameters or the regression coefficients x
and not on this predictive modelling.
land surveying and GNSS applications we are typically interested in x
24
2.1.2 Iterative Solution
elements in become small, or based on a consideration in terms of the sum of weighted squared residuals
T P (k A)
T P v
= (k A)
v
T AT P k +
T AT P A
= k T P k k T P A
T
(180)
(181)
A Pk +
A P A(A P A) A P k
= k P k k P A
T AT P k +
T AT P k
= k T P k k T P A
= k T P k k T P A
T
T AT P k
= kT P k
Tc
= kT P k
T
= k Pk c N
(182)
(183)
(184)
(185)
(186)
(187)
c.
Hence
kT P k
cT N 1 c
=
1
+
1.
T P v
T P v
v
v
(188)
Therefore we iterate until the ratio of the two quadratic forms on the right hand side is small compared to 1.
The method described here is identical to the Gauss-Newton method sketched in Section 2.2.3 with A as the Jacobian.
02 (P 1 AN 1 AT )
(189)
(190)
(191)
02 N 1
=
D{`} D{`}
Q` Q`.
(192)
For the weighted sum of squared errors (SSE, also called RSS for the residual sum of squares) we get
T P v
= `T P (I AN 1 AT P )` = `T P (I H)`
v
(193)
02 = v
(194)
The square roots of the diagonal elements of the dispersion matrices in Equations 189, 190, 191 and 192
are the standard errors of the quantities in question. For example, the standard error of xi denoted
xi is the
square root of the ith diagonal element of 02 (AT P A)1 .
25
x
in Section 1.2.3, and 2) on influence and leverage in Section 1.1.3,
The remarks on 1) the distribution and significance of
are valid here also.
T
, and ` and v
are orthogonal (with respect to P ): AT P v
= 0 and ` P v
= 0. Geometrically this
A and v
means that our analysis finds the orthogonal projection (with respect to P ) ` of ` onto the plane spanned by
the linearly independent columns of A. This gives the shortest distance between ` and ` in the norm defined
by P .
2.1.4
Confidence Ellipsoids
(195)
q
q
q
(196)
(197)
(198)
(199)
T 1
y (V V ) y =
y T V 1 V T y =
1/2 T
T
(
V y) (1/2 V T y) =
(200)
(201)
a distribution
with
p
degrees
of
freedom,
(x
x
)
Q
(x
x
)
(p),
and
the
semi
axes
of
a,
say,
95%
confidence
ellipsoid
x
would be q i where q is the 95% fractile of the 2 (p) distribution and i are the eigenvalues of Qx .
T P v
/(n p) which means that (n p)
02 2 (n p). Also, (x
02 = v
02 unknown In this case we estimate 02 as
T
1
)T (A P A)(x x
) 2 (p). This means that
)T Qx (x x
)/02 = (x x
x
)T (AT P A)(x x
)/p
(x x
F (p, n p)
2
(202)
(since the independent numerator and denominator above follow 2 (p)/p and 2 (n p)/(n p) distributions, respectively. As
n goes to infinity the left hand side multiplied by p approaches a 2 (p)
distribution so the above case with 02 known serves as
a limiting case.) The semi axes of a, say, 95% confidence ellipsoid are q p i where q is the 95% fractile of the F (p, n p)
distribution, p is the number of parameters and i are the eigenvalues of Qx . If a subset
of m < p parameters are studied the
semi axes of a, say, 95% confidence ellipsoid of the appropriate submatrix of Qx are q m i where q is the 95% fractile of the
F (m, n p) distribution, m is the number of parameters and i are the eigenvalues of that submatrix, see also Examples 10 and
11 with Matlab code.
(203)
With g = f (
x) we get (again we use = instead of the correct ')
D{f } = 02 g T (AT P A)1 g,
see also Example 10 with Matlab code.
(204)
26
2.1.6 The Derivative Matrix
The elements of the derivative matrix A, Aij = Fi /xj , can be evaluated analytically or numerically.
Analytical partial derivatives
the height in point B)
(205)
(206)
(207)
Equation 205 is obviously linear. If we do levelling only and dont combine with distance or directional
observations we can do linear adjustment and we dont need the iterative procedure and the initial values for
the elements. There are very few other geodesy, land surveying and GNSS related problems which can be
solved by linear adjustment.
Analytical partial derivatives for 3-D distance observations are (remember that d( u)/du = 1/(2 u) and
use the chain rule for differentiation)
q
F =
(xB xA )2 + (yB yA )2 + (zB zA )2
= dAB
F
1
=
2(xB xA )(1)
xA
2dAB
xB xA
=
dAB
F
F
=
xB
xA
(208)
(209)
(210)
(211)
(212)
F =
(xB xA )2 + (yB yA )2
= aAB
1
F
=
2(xB xA )(1)
xA
2aAB
xB x A
=
aAB
F
F
=
xB
xA
(213)
(214)
(215)
(216)
(217)
F = arctan
F
xA
10
in Danish kredsdrejningselement
(218)
(219)
27
yB yA
a2AB
F
xA
1
1
(1)
yB yA 2
1 + ( xB xA ) xB xA
xB xA
a2AB
F
yA
=
F
=
xB
F
=
yA
=
F
=
yB
F
= 1.
rA
Numerical partial derivatives
(220)
(221)
(222)
(223)
(224)
(225)
can be calculated as
F (x1 , x2 , . . . , xp )
F (x1 + , x2 , . . . , xp ) F (x1 , x2 , . . . , xp )
'
x1
F (x1 , x2 , . . . , xp )
F (x1 , x2 + , . . . , xp ) F (x1 , x2 , . . . , xp )
'
x2
..
.
(226)
(227)
(228)
(229)
both with appropriately small. Generally, one should be careful with numerical derivatives. There are two
sources of error in the above equations, roundoff error that has to do with exact representation in the computer,
and truncation error having to do with the magnitude of . In relation to Global Navigation Satellite System
(GNSS) distance observations we are dealing with F s with values larger than 20,000,000 meters (this is the
approximate nadir distance from the GNSS space vehicles to the surface of the earth). In this connection a
of 1 meter is small compared to F , it has an exact representation in the computer, and we dont have to do
the division by (since it equals one). Note that when we use numerical partial derivatives we need p + 1
function evaluations (2p for the symmetrized form) for each iteration rather than one.
Example 10 (from Mrsk-Mller and Frederiksen, 1984, p. 86) This is a traditional land surveying example. From point 103
with unknown (2-D) coordinates we measure horizontal directions and distances to four points 016, 020, 015 and 013 (no distance
is measured to point 020), see Figure 4. We wish to determine the coordinates of point 103 and the orientation unknown by means
of nonlinear weighted least squares adjustment. The number of parameters is p = 3.
Points 016, 020, 015 and 013 are considered as error free fix points. Their coordinates are
Point
016
020
015
013
x [m]
3725.10
3465.74
3155.96
3130.55
y [m]
3980.17
4268.33 .
4050.70
3452.06
We measure four horizontal directions and three distances so we have seven observations, n = 7. Therefore we have f = 7 3 = 4
degrees of freedom. We determine the (2-D) coordinates [x y]T of point 103 and the the orientation unknown, r so [x1 x2 x3 ]T =
28
Figure 4: From point 103 with unknown coordinates we measure horizontal directions and distances (no distance is measured to
point 020) to four points 016, 020, 015 and 013 (from Mrsk-Mller and Frederiksen, 1984; lefthand coordinate system).
[x y r]T . The observation equations are (assuming that arctan gives radians and we want gon, = 200/ gon)
3980.17 y
r + v1
3725.10 x
4268.33 y
arctan
r + v2
3465.74 x
4050.70 y
arctan
r + v3
3155.96 x
3452.06 y
arctan
r + v4
3130.55 x
p
(3725.10 x)2 + (3980.17 y)2 + v5
p
(3155.96 x)2 + (4050.70 y)2 + v6
p
(3130.55 x)2 + (3452.06 y)2 + v7 .
`1
= arctan
(230)
`2
(231)
`3
`4
`5
`6
`7
(232)
(233)
(234)
(235)
(236)
To
point
016
020
015
013
Horizontal
direction [gon]
0.000
30.013
56.555
142.445
Horizontal
distance [m]
706.260
614.208
132.745
where the directional observations are means of two measurements. As the initial value [x y ]T for the coordinates [x y]T of
point 103 we choose the mean values for the coordinates of the four fix points. As the initial value r for the direction unknown r
we choose zero. First order Taylor expansions of the observation equations near the initial values give (assuming that arctan gives
radians and we want gon; units for the first four equations are gon, for the last three units are m)
`1
arctan
3980.17 y
3980.17 y
3725.10 x
r +
x
y r + v 1
2
3725.10 x
a1
a21
(237)
29
4268.33 y
3980.17 y
3725.10 x
r
+
y r + v 2
x
3465.74 x
a22
a22
3980.17 y
3725.10 x
4050.70 y
arctan
r
+
y r + v 3
x
2
3155.96 x
a3
a23
3725.10 x
3452.06 y
3980.17 y
x
y r + v 4
arctan
r +
2
3130.55 x
a4
a24
3725.10 x
3980.17 y
a1
x
y + v 5
a1
a1
3155.96 x
4050.70 y
a3
x
y + v 6
a3
a3
3130.55 x
3452.06 y
a4
x
y + v 7
a4
a4
`2
= arctan
(238)
`3
(239)
`4
`5
`6
`7
(240)
(241)
(242)
(243)
a2
a3
a4
p
(3725.10 x )2 + (3980.17 y )2
p
(3565.74 x )2 + (4268.33 y )2
p
(3155.96 x )2 + (4050.70 y )2
p
(3130.55 x )2 + (3452.06 y )2 .
(244)
(245)
(246)
(247)
= A;
as above units for the first four equations are gon, for the last three units are m)
In matrix form we get (k
0.000 arctan
30.013 arctan
56.555 arctan
142.445 arctan
3980.17y
3725.10x
4268.33y
3465.74x
4050.70y
3155.96x
3452.06y
3130.55x
+r
+ r
+ r
+ r
706.260 a1
614.208 a3
132.745 a4
3980.17y
a2
1
3980.17y
a2
2
3980.17y
a2
3
3980.17y
a2
4
3725.10x
a1
3155.96x
a3
3130.55x
a4
3725.10x
a2
1
3725.10x
a2
2
3725.10x
a2
3
3725.10x
a2
4
3980.17y
a1
4050.70y
a3
3452.06y
a4
y .
(248)
The starting weight matrix is (for directions: n = 2, sc = 0.002m, and st = 0.0015gon; for distances: n = 1, sG = 0.005m, and
sa = 0.005m/1000m = 0.000005), see Section 1.2.2 (units for the first four weights are gon2 , for the last three units are m2 )
0.7992 0
0
0
0
0
0
0
0.7925 0
0
0
0
0
0
0.7127
0
0
0
0
0
0
0
0.8472
0
0
0
P =
(249)
0
0
0
0.03545
0
0
0
0
0
0
0.03780 0
0
0
0
0
0
0
0.03094
and after eleven iterations with the Matlab code below we end with (again, units for the first four weights are gon2 , for the last
three units are m2 )
0.8639 0
0
0
0
0
0
0
0.8714 0
0
0
0
0
0
0.8562 0
0
0
0
.
0
0
0.4890 0
0
0
P = 0
(250)
0
0
0
0.02669
0
0
0
0
0
0
0.02904 0
0
0
0
0
0
0
0.03931
After the eleven iterations we get [x y r]T = [3, 263.155m 3, 445.925m 54.612gon]T with standard deviations [4.14mm 2.49mm
0.641mgon]T . The diagonal elements of the hat matrix H are [0.3629 0.3181 0.3014 0.7511 0.3322 0.2010 0.7332] and p/n =
= [0.2352mm 0.9301mm 0.9171mm
3/7 = 0.4286 so no observations have high leverages. The estimated residuals are v
0.3638mm 5.2262mgon 6.2309mgon 2.3408mgon]T . The resulting RMSE is s0 = 0.9563. The probability of finding a larger
T P v
is 0.4542 so s0 is suitably small.
value for RSS = v
As an example on application of Equation 204 we calculate the distance between fix point 020 and point 103 and the standard
deviation of the distance. From the Matlab code below we get the distance 846.989 m with a standard deviation of 2.66 mm.
30
The plots made in the code below allow us to study the iteration pattern of the Gauss-Newton method applied. The last plot
produced, see Figure 5, shows the four fix points as triangles, the initial coordinates for point 103 as a plus, and the iterated
solutions as circles marked with iteration number. The final solution is marked by both a plus and a circle. We see that since there
are eleven iterations the last 3-4 iterations overlap in the plot.
6
4.4
x 10
020
4.2
015
016
103 start
3.8
1
3.6
013
3.4
103 stop
3.2
2.8
2.6
3
2.8
3.2
3.4
3.6
3.8
4
6
x 10
Figure 5: Development of x and y coordinates of point 103 over iterations with first seven iterations annotated; righthand
coordinate system.
A 95% confidence ellipsoid for [x y r]T with semi axes 18.47, 11.05 and 2.41 ( p 6.591 i where p = 3 is the number of
parameters, 6.591 is the 95% fractile in the F (3, 4) distribution, and i are the eigenvalues of Qx = 02 (AT P A)1 ) is shown in
Figure 6. Since the ellipsoid in the Matlab code in the notation of Section 2.1.4 in page 25 is generated in the z-space we rotate by
V to get to y-space.
[end of example]
Matlab code
for Example 10
initial values for elements: x- and y-coordinates for point 103 [m], and
the direction unknown [gon]
= [3263.150 3445.920 54.6122];
play with initial values to check robustness of method
= [0 0 -200];
= [0 0 -100];
= [0 0 100];
r [mgon]
31
2
0
2
15
10
10
0
10
15
10
y [mm]
x [mm]
10
y [mm]
5
0
5
10
15
10
0
x [mm]
10
15
x = [0 0 200];
x = [0 0 40000];
x = [0 0
0];
x = [100000 100000 0];
x = [mean(xx) mean(yy) 0];
%x = [mean(xx) 3452.06 0]; % approx. same y as 013
p = size(x,1);
% desired units: mm and mgon
xx = 1000*xx;
yy = 1000*yy;
l = 1000*l;
x = 1000*x;
cst = 1000*cst;
%number of degrees of freedom
f = n-p;
x0 = x;
sc
st
sG
sa
%a
= 0.002*1000;%[mm]
= 0.0015*1000;%[mgon]
= 0.005*1000;%[mm]
= 0.000005;%[m/m], no unit
[mm]
idx = [];
e2 = [];
dc = [];
X = [];
for iter = 1:50 % iter --------------------------------------------------------% output from atan2 is in radian, convert to gon
F1 = cst.*atan2(yy-x(2),xx-x(1))-x(3);
a = (x(1)-xx).2+(x(2)-yy).2;
F2 = sqrt(a);
F = [F1; F2([1 3:end])]; % skip distance from 103 to 020
% weight matrix
%a [mm]
32
P = diag([2./(2*(cst*sc).2./a+st2); 1./(sG2+a([1 3:end])*sa.2)]);
diag(P)
k = l-F; % l is \ell (not one)
A1 = [];
A2 = [];
if strcmp(partial,analytical)
% A is matrix of analytical partial derivatives
error(not implemented yet);
else
% A is matrix of numerical partial derivatives
%directions
dF = (cst.*atan2(yy- x(2)
,xx-(x(1)+eps))- x(3)
-F1)/eps;
A1 = [A1 dF];
dF = (cst.*atan2(yy-(x(2)+eps),xx- x(1)
) - x(3)
-F1)/eps;
A1 = [A1 dF];
dF = (cst.*atan2(yy- x(2)
,xx- x(1)
) -(x(3)+eps)-F1)/eps;
A1 = [A1 dF];
%distances
dF = (sqrt((x(1)+eps-xx).2+(x(2)
-yy).2)-F2)/eps;
A2 = [A2 dF];
dF = (sqrt((x(1)
-xx).2+(x(2)+eps-yy).2)-F2)/eps;
A2 = [A2 dF];
dF = (sqrt((x(1)
-xx).2+(x(2)
-yy).2)-F2)/eps;
A2 = [A2 dF];
A2 = A2([1 3:4],:);% skip derivatives of distance from 103 to 020
A = [A1; A2];
end
N = A*P;
c = N*k;
N = N*A;
%WLS
deltahat = N\c;
khat = A*deltahat;
vhat = k-khat;
e2 = [e2 vhat*P*vhat];
dc = [dc deltahat*c];
%update for iterations
x = x+deltahat;
X = [X x];
idx = [idx iter];
% stop iterations
itertst = (k*P*k)/e2(end);
if itertst < 1.000001
break;
end
end % iter ------------------------------------------------------------------%x-x0
% number of iterations
iter
%MSE
s02 = e2(end)/f;
%RMSE
s0 = sqrt(s02)
%Variance/covariance matrix of the observations, l
Dl = s02.*inv(P);
%Standard deviations
stdl = sqrt(diag(Dl))
%Variance/covariance matrix of the adjusted elements, xhat
Ninv = inv(N);
Dxhat = s02.*Ninv;
%Standard deviations
stdxhat = sqrt(diag(Dxhat))
%Variance/covariance matrix of the adjusted observations, lhat
Dlhat = s02.*A*Ninv*A;
%Standard deviations
stdlhat = sqrt(diag(Dlhat))
33
34
%semiaxes = sqrt(diag(dDxhat));
% 95% fractile for 2 dfs is 5.991 = 2.4482
% 99% fractile for 2 dfs is 9.210 = 3.0352
%
df
F(3,df).95 F(3,df).99
%
1
215.71
5403.1
%
2
19.164
99.166
%
3
9.277
29.456
%
4
6.591
16.694
%
5
5.409
12.060
%
10
3.708
6.552
% 100
2.696
3.984
% inf
2.605
3.781
% chi2 approximation, 95% fractile
semiaxes = sqrt(7.815*diag(dDxhat))
figure
ellipsoidrot(0,0,0,semiaxes(1),semiaxes(2),semiaxes(3),vDxhat);
axis equal
xlabel(x [mm]); ylabel(y [mm]); zlabel(r [mgon]);
title(95% confidence ellipsoid, \chi2 approx.)
% F approximation, 95% fractile. NB the fractile depends on df
semiaxes = sqrt(3*6.591*diag(dDxhat))
figure
ellipsoidrot(0,0,0,semiaxes(1),semiaxes(2),semiaxes(3),vDxhat);
axis equal
xlabel(x [mm]); ylabel(y [mm]); zlabel(r [mgon]);
title(95% confidence ellipsoid, F approx.)
view(37.5,15)
print -depsc2 confxyr.eps
%clear
%close all
function [v,d] = eigsort(a)
[v1,d1] = eig(a);
d2 = diag(d1);
[dum,idx] = sort(d2);
v = v1(:,idx);
d = diag(d2(idx));
function [xx,yy,zz] = ellipsoidrot(xc,yc,zc,xr,yr,zr,Q,n)
%ELLIPSOID Generate ellipsoid.
%
% [X,Y,Z] = ELLIPSOID(XC,YC,ZC,XR,YR,ZR,Q,N) generates three
% (N+1)-by-(N+1) matrices so that SURF(X,Y,Z) produces a rotated
% ellipsoid with center (XC,YC,ZC) and radii XR, YR, ZR.
%
% [X,Y,Z] = ELLIPSOID(XC,YC,ZC,XR,YR,ZR,Q) uses N = 20.
%
% ELLIPSOID(...) and ELLIPSOID(...,N) with no output arguments
% graph the ellipsoid as a SURFACE and do not return anything.
%
% The ellipsoidal data is generated using the equation after rotation with
% orthogonal matrix Q:
%
% (X-XC)2
(Y-YC)2
(Z-ZC)2
% -------- + -------- + -------- = 1
%
XR2
YR2
ZR2
%
% See also SPHERE, CYLINDER.
%
%
%
%
error(nargchk(7,8,nargin));
if nargin == 7
n = 20;
end
[x,y,z] = sphere(n);
x = xr*x;
y = yr*y;
z = zr*z;
xvec = Q*[reshape(x,1,(n+1)2); reshape(y,1,(n+1)2); reshape(z,1,(n+1)2)];
x = reshape(xvec(1,:),n+1,n+1)+xc;
y = reshape(xvec(2,:),n+1,n+1)+yc;
z = reshape(xvec(3,:),n+1,n+1)+zc;
35
if(nargout == 0)
surf(x,y,z)
%
surfl(x,y,z)
%
surfc(x,y,z)
axis equal
%shading interp
%colormap gray
else
xx = x;
yy = y;
zz = z;
end
Example 11 In this example we have data on the positions of Navstar Global Positioning System (GPS)
space vehicles (SV) 1, 4, 7, 13, 20, 24 and 25 and pseudoranges from our position to the SVs. We want
to determine the (3-D) coordinates [X Y Z]T of our position and the clock error in our GPS receiver, cdT ,
[x1 x2 x3 x4 ]T = [X Y Z cdT ]T , so the number of parameters is p = 4. The positions of and pseudoranges
(`) to the SVs given in a data file from the GPS receiver are
SV
1
4
7
13
20
24
25
X [m]
Y [m]
Z [m]
16,577,402.072
5,640,460.750 20,151,933.185
11,793,840.229 10,611,621.371 21,372,809.480
20,141,014.004 17,040,472.264 2,512,131.115
22,622,494.101 4,288,365.463 13,137,555.567
12,867,750.433 15,820,032.908 16,952,442.746
3,189,257.131 17,447,568.373 20,051,400.790
7,437,756.358 13,957,664.984 21,692,377.935
` [m]
20,432,524.0
21,434,024.4
24,556,171.0
.
21,315,100.2
21,255,217.0
24,441,547.2
23,768,678.3
The true position is (that of the no longer existing GPS station at Landmalervej in Hjortekr) [X Y Z]T =
[3, 507, 884.948 780, 492.718 5, 251, 780.403]T m. We have seven observations, n = 7. Therefore we have
f = n p = 7 4 = 3 degrees of freedom. The observation equations are (in m)
q
`1 =
`2 =
`3 =
`4 =
`5 =
`6 =
`7 =
q
q
q
q
q
(251)
(255)
As the initial values [X Y Z cdT ]T we choose [0 0 0 0]T , center of the Earth, no clock error. First order
Taylor expansions of the observation equations near the initial values give (in m)
`1 = d1 + cdT
(258)
5640460.750 Y
20151933.185 Z
16577402.072 X
X
Y
Z + cdT + v1
d1
d1
d1
`2 = d2 + cdT
(259)
11793840.229 X
10611621.371 Y
21372809.480 Z
X
Y
Z + cdT + v2
d2
d2
d2
`3 = d3 + cdT
(260)
20141014.004 X
17040472.264 Y
2512131.115 Z
X
Y
Z + cdT + v3
d3
d3
d3
36
`4 = d4 + cdT
(261)
22622494.101 X
4288365.463 Y
13137555.567 Z
X
Y
Z + cdT + v4
d4
d4
d4
`5 = d5 + cdT
(262)
12867750.433 X
15820032.908 Y
16952442.746 Z
X
Y
Z + cdT + v5
d5
d5
d5
`6 = d6 + cdT
(263)
3189257.131 X
17447568.373 Y
20051400.790 Z
X
Y
Z + cdT + v6
d6
d6
d6
`7 = d7 + cdT
(264)
7437756.358 X
13957664.984 Y
21692377.935 Z
X
Y
Z + cdT + v7
d7
d7
d7
where (in m)
q
d1 =
d2 =
d3 =
d4 =
d5 =
d6 =
d7 =
(265)
(266)
(267)
(268)
(269)
(270)
(271)
q
q
q
q
q
q
= A;
as above units are m)
In matrix form we get (k
20432524.0 d1 cdT
21434024.4 d2 cdT
24556171.0 d3 cdT
21315100.2 d4 cdT
21255217.0 d5 cdT
24441547.2 d6 cdT
23768678.3 d7 cdT
5640460.750Y
16577402.072X
d1
d1
10611621.371Y
11793840.229X
d2
d2
17040472.264Y
20141014.004X
d3
d3
4288365.463Y
22622494.101X
d4
d4
12867750.433X
15820032.908Y
d5
d5
3189257.131X
17447568.373Y
d6
d
6
7437756.358X
13957664.984Y
d7
d7
20151933.185Z
d1
21372809.480Z
d2
2512131.115Z
d3
13137555.567Z
d4
16952442.746Z
d5
20051400.790Z
d6
21692377.935Z
d7
1
1
1
1
1
1
1
cdT
(272)
.
After five iterations with the Matlab code below (with all observations weighted equally, pi = 1/(10m)2 )
we get [X Y Z cdT ]T = [3, 507, 889.1 780, 490.0 5, 251, 783.8 25, 511.1]T m with standard deviations
[6.42 5.31 11.69 7.86]T m. 25, 511.1 m corresponds to a clock error of 0.085 ms. The difference between the
true position and the solution found is [4.18 2.70 3.35]T m, all well within one standard deviation. The
corresponding distance is 6.00 m. Figure 7 shows the four parameters over the iterations including the starting
guess. The diagonal elements of the hat matrix H are [0.4144 0.5200 0.8572 0.3528 0.4900 0.6437 0.7218]
=
and p/n = 4/7 = 0.5714 so no observations have high leverages. The estimated residuals are v
T
2
2
[5.80 5.10 0.74 5.03 3.20 5.56 5.17] m. With prior variances of 10 m , the resulting RMSE is
s0 = 0.7149.
T P v
is 0.6747 so s0 is suitably small. With prior variances of 52 m2 instead
The probability of finding a larger value for RSS = v
T P v
is 0.1054 so also in this situation s0 is
of 102 m2 , s0 is 1.4297 and the probability of finding a larger value for RSS = v
2
2
2
2
suitably small. With prior variances of 3 m instead of 10 m , s0 is 2.3828 and the probability of finding a larger value for
T P v
is 0.0007 so in this situation s0 is too large.
RSS = v
95% confidence ellipsoids for [X Y Z]T in an earth-centered-earth-fixed (ECEF) coordinate system and in a local EastingNorthing-Up (ENU) coordinate system are shown in Figure 8. The semi axes in both the ECEF and the ENU systems are 64.92,
37
x 10
10
x 10
15
4
5
10
5
2
0
x 10
2
0
x 10
30.76 and 23.96 (this is m 9.277 i where m = 3 is the number of parameters, 9.277 is the 95% fractile in the F (3, 3) distribution, and i are the eigenvalues of QXY Z , the upper-left 3 3 submatrix of Qx = 02 (AT P A)1 ); units are metres. The rotation
to the local ENU system is performed by means of the orthogonal matrix (F T F = F F T = I)
sin
cos
0
(273)
F T = sin cos sin sin cos
cos cos
cos sin sin
where is the latitude and is the longitude. The variance-covariance matrix of the position estimates in the ENU coordinate
system is QEN U = F T QXY Z F . Since
QXY Z a =
F T QXY Z F F T a =
QEN U (F T a) =
a
F T a
(F T a)
(274)
(275)
(276)
we see that QXY Z and QEN U have the same eigenvalues and their eigenvectors are related as indicated. Since the ellipsoid in the
Matlab code in the notation of Section 2.1.4 in page 25 is generated in the z-space we rotate by V to get to y-space.
Dilution of Precision, DOP Satellite positioning works best when there is a good angular separation between the space vehicles.
A measure of this separation is termed the dilution of precision, DOP. Low values of the DOP correspond to a good angular
separation, high values to a bad angular separation, i.e., a high degree of clustering of the SVs. There are several versions of DOP.
From Equation 190 the dispersion of the parameters is
Qx
= D{
xW LS }
02 (AT P A)1 .
(277)
This matrix has contributions from our prior expectations to the precision of the measurements (P ), the actual precision of the
measurements (02 ) and the geometry of the problem (A). Lets look at the geometry alone and define the symmetric matrix
2
qX
qXY
qXZ
qXcdT
qXY
qY2
qY Z
qY cdT
2
2
2 + q2 + q2 ,
qX
Y
Z
(279)
2
qcdT
= qcdT
(280)
q
2 + q2 + q2 + q2
qX
Y
Z
cdT
(281)
which is the square root of the trace of QDOP . It is easily seen that GDOP 2 = P DOP 2 + T DOP 2 .
In practice PDOP values less than 2 are considered excellent, between 2 and 4 good, up to 6 acceptable. PDOP values greater than
around 6 are considered suspect.
38
After rotation from the ECEF to the ENU coordinate system which transforms the upper-left 3 3 submatrix QXY Z of Qx into
QEN U , we can define
2
qE
qEN qEU
2
2
qN U ,
QDOP,EN U = QEN U /(02 prior
(282)
) = qEN qN
2
qEU qN U qU
the horizontal DOP
q
2 + q2
qE
N
(283)
2 =q .
qU
U
(284)
HDOP =
and the vertical DOP
q
V DOP =
60
60
30
40
40
20
20
10
0
N [m]
U [m]
20
Z [m]
[end of example]
20
20
10
40
40
20
60
60
30
20
20
0
20
20
Y [m]
20
X [m]
0
20
N [m]
20
20
20
0
E [m]
20
E [m]
Figure 8: 95% ellipsoid for [X Y Z]T in ECEF (left) and ENU (middle) coordinate systems with projection on EN-plane (right).
Matlab code
for Example 11
Code for functions eigsort and ellipsoidrot are listed under Example 10.
% (C) Copyright 2004
% Allan Aasbjerg Nielsen
% aa@imm.dtu.dk, www.imm.dtu.dk/aa
format short g
% use analytical partial derivatives
partial = analytical;
%partial = n;
% speed of light, [m/s]
%clight = 300000000;
clight = 299792458;
% length of C/A code, [m]
%L = 300000;
% true position (Landmaalervej, Hjortekaer)
xtrue = [3507884.948 780492.718 5251780.403 0];
% positions of satellites 1, 4, 7, 13, 20, 24 and 25 in ECEF coordinate system, [m]
xxyyzz = [16577402.072
5640460.750 20151933.185;
11793840.229 -10611621.371 21372809.480;
20141014.004 -17040472.264 2512131.115;
22622494.101 -4288365.463 13137555.567;
12867750.433 15820032.908 16952442.746;
-3189257.131 -17447568.373 20051400.790;
-7437756.358 13957664.984 21692377.935];
pseudorange = [20432524.0 21434024.4 24556171.0 21315100.2 21255217.0 ...
39
40
s0 = sqrt(s02); %RMSE
Qdop = inv(A*P*A);
Qx = s02.*Qdop;
Qdop = Qdop/sprior2;
PDOP = sqrt(trace(Qdop(1:3,1:3)));
% must be in local Easting-Northing-Up coordinates
%HDOP = sqrt(trace(Qdop(1:2,1:2)));
% must be in local Easting-Northing-Up coordinates
%VDOP = sqrt(Qdop(3,3));
TDOP = sqrt(Qdop(4,4));
GDOP = sqrt(trace(Qdop));
% Dispersion etc of elements
%Qx = s02.*inv(A*P*A);
sigmas = sqrt(diag(Qx));
sigma = diag(sigmas);
isigma = inv(sigma);
% correlations between estimates
Rx = isigma*Qx*isigma;
% Standardised residuals
%Qv = s02.*(inv(P)-A*inv(A*P*A)*A);
Qv = s02.*inv(P)-A*Qx*A;
sigmares = sqrt(diag(Qv));
stdres = vhat./sigmares;
disp(----------------------------------------------------------)
disp(estimated parameters/elements [m])
x
disp(estimated clock error [s])
x(4)/clight
disp(number of iterations)
iter
disp(standard errors of elements [m])
sigmas
%tval = x./sigmas
disp(s0)
s0
disp(PDOP)
PDOP
%stdres
disp(difference between estimated elements and initial guess)
deltaori = x-x0
disp(difference between true values and estimated elements)
deltaori = xtrue-x
disp(----------------------------------------------------------)
% t-values and probabilities of finding larger |t|
% pt should be smaller than, say, (5% or) 1%
t = x./sigmas;
pt = betainc(f./(f+t.2),0.5*f,0.5);
% probabilitiy of finding larger s02
% should be greater than, say, 5% (or 1%)
pchi2 = 1-gammainc(0.5*SSE,0.5*f);
% semi-axes in confidence ellipsoid for position estimates
% 95% fractile for 3 dfs is 7.815 = 2.7962
% 99% fractile for 3 dfs is 11.342 = 3.3682
[vQx dQx] = eigsort(Qx(1:3,1:3));
semiaxes = sqrt(diag(dQx));
% 95% fractile for 2 dfs is 5.991 = 2.4482
% 99% fractile for 2 dfs is 9.210 = 3.0352
%
df
F(3,df).95 F(3,df).99
%
1
215.71
5403.1
%
2
19.164
99.166
%
3
9.277
29.456
%
4
6.591
16.694
%
5
5.409
12.060
%
10
3.708
6.552
% 100
2.696
3.984
% inf
2.605
3.781
% chi2 approximation, 95% fractile
figure
ellipsoidrot(0,0,0,semiaxes(1)*sqrt(7.815),semiaxes(2)*sqrt(7.815),...
semiaxes(3)*sqrt(7.815),vQx);
axis equal
xlabel(X [m]); ylabel(Y [m]); zlabel(Z [m]);
a = 6378137;
f = 1/298.257223563;
lambda = atan2(y,x);
ex2 = (2-f)*f/((1-f)2);
c = a*sqrt(1+ex2);
phi = atan(z/((sqrt(x2+y2)*(1-(2-f))*f)));
h = 0.1; oldh = 0;
while abs(h-oldh) > 1.e-12
41
42
oldh = h;
N = c/sqrt(1+ex2*cos(phi)2);
phi = atan(z/((sqrt(x2+y2)*(1-(2-f)*f*N/(N+h)))));
h = sqrt(x2+y2)/cos(phi)-N;
end
phi1 = phi*180/pi;
b = zeros(1,3);
b(1) = fix(phi1);
b(2) = fix(rem(phi1,b(1))*60);
b(3) = (phi1-b(1)-b(2)/60)*3600;
bb = [b(1) b(2) b(3)];
lambda1 = lambda*180/pi;
l = zeros(1,3);
l(1) = fix(lambda1);
l(2) = fix(rem(lambda1,l(1))*60);
l(3) = (lambda1-l(1)-l(2)/60)*3600;
ll = [l(1) l(2) l(3)];
1X
pi [yi fi ()]2 .
2 i=1
(285)
(286)
From an initial value (an educated guess) we can update by taking a step in the direction in which ke2 k decreases most rapidly,
namely in the direction of the negative gradient
new = old ke2 ( old )k
(287)
where > 0 determines the step size. This is done iteratively until convergence.
(288)
where H = 2 ke2 ()k/ T is the second order derivative of ke2 ()k also known as the Hessian matrix not to be confused
with the hat matrix in Equations 41, 111 and 193. The gradient of the above expansion is
ke2 ()k ' ke2 ( 0 )k + H( 0 )[ 0 ].
(289)
At the minimum ke2 ()k = 0 and therefore we can find that minimum by updating
2
new = old H 1
old ke ( old )k
(290)
until convergence.
From Equation 286 the elements of the Hessian H are
n
fi () fi ()
2 fi ()
2 ke2 k X
=
pi
[yi fi ()]
.
Hkl =
k l
k
l
k l
i=1
(291)
43
We see that the Hessian is symmetric. The second term in Hkl depends on the sum of the residuals between model and data,
which is supposedly small both since our model is assumed to be good and since its terms can have opposite signs. It is therefore
customary to omit this term. If the Hessian is positive definite we have a local minimizer and if its negative definite we have a local
maximizer (if its indefinite, i.e., it has both positive and negative eigenvalues, we have a saddle point). H is sometimes termed the
curvature matrix.
2.2.3
(292)
where J is the so-called Jacobian matrix containing the partial derivatives of e (like A containing the partial derivatives of F in
Equation 176). In the WLS case this leads to
1 T
1
1
e ()P e() ' eT ( 0 )P e()0 + [ 0 ]T J T ( 0 )P e( 0 ) + [ 0 ]T J T ( 0 )P J ( 0 )[ 0 ].
2
2
2
(293)
(294)
(295)
This is equivalent to Equation 178 so the linearization method described in Section 2.1 is actually the Gauss-Newton method with
A as the Jacobian.
It can be shown that if the function to be minimized is twice continuously differentiable in a neighbourhood around the solution
, if J () over iterations is nonsingular, and if the initial solution 0 , is close enough to , then the Gauss-Newton method
in Section 2.1 and h
converges. It can also be shown that the convergence is quadratic, i.e., the length of the increment vector (
in the Matlab function below) decreases quadratically over iterations.
Below is an example of a Matlab function implementation of the unweighted version of the Gauss-Newton algorithm. Note, that the
code to solve the positioning problem is a while loop the body of which is three (or four) statements only. This is easily extended
with the weighting and the statistics part given in the previous example. Note also, that in the call to function gaussnewton we
need to call the function fJ with the at symbol (@) to create a Matlab function handle.
Matlab code
for Example 11
%
%
%
%
- final solution
% Modified after
% L. Elden, L. Wittmeyer-Koch and H.B. Nielsen (2004).
44
45
method is slow because the gradient vanishes at the minimum. In the Levenberg-Marquardt method we modify Equation 295 to
[J T ( old )P J ( old ) + I]( new old ) = J T ( old )P e( old )
(296)
where 0 is termed the damping factor. The Levenberg-Marquardt method is a hybrid of the gradient method far from the
minimum and the Gauss-Newton method near the minimum: if is large we step in the direction of the steepest descent, if = 0
we have the Gauss-Newton method.
Also Newtons method may cause the new to wander off further from the minimum than the old since the Hessian may be
indefinite or even negative definite (this is not the case for J T P J ). In a Levenberg-Marquardt-like extension to Newtons method
we could modify Equation 290 to
new = old (H old + I)1 ke2 ( old )k.
(297)
3 Final Comments
In geodesy (and land surveying and GNSS) applications of regression analysis we are often interested in
the estimates of the regression coefficients also known as the parameters or the elements which are often
2- or 3-D geographical positions, and their estimation accuracies. In many other application areas we are
(also) interested in the ability of the model to predict values of the response variable from new values of the
explanatory variables not used to build the model.
Unlike the Gauss-Newton method both the gradient method and Newtons method are general and not restricted to least squares problems, i.e., the functions to be optimized are not restricted to the form eT e or
eT P e. Many other methods than the ones described and sketched here both general and least squares methods such as quasi-Newton methods, conjugate gradients and simplex search methods exist.
Solving the problem of finding a global optimum in general is very difficult. The methods described and
sketched here (and many others) find a minimum that depends on the set of initial values chosen for the
parameters to estimate. This minimum may be local. It is often wise to use several sets of initial values to
check the robustness of the solution offered by the method chosen.
46
Literature
P.R. Bevington (1969). Data Reduction and Error Analysis for the Physical Sciences. McGraw-Hill.
K. Borre (1990). Landmaling. Institut for Samfundsudvikling og Planlgning, Aalborg. In Danish.
K. Borre (1992). Mindste Kvadraters Princip Anvendt i Landmalingen. Aalborg. In Danish.
M. Canty, A.A. Nielsen and M. Schmidt (2004). Automatic radiometric normalization of multitemporal satellite imagery. Remote
Sensing of Environment 41(1), 4-19.
P. Cederholm (2000). Udjvning. Aalborg Universitet. In Danish.
R.D. Cook and S. Weisberg (1982). Residuals and Infuence in Regression. Chapman & Hall.
K. Conradsen (1984). En Introduktion til Statistik, vol. 1A-2B. Informatik og Matematisk Modellering, Danmarks Tekniske Universitet. In Danish.
K. Dueholm, M. Laurentzius and A.B.O. Jensen (2005). GPS. 3rd Edition. Nyt Teknisk Forlag. In Danish.
L. Elden, L. Wittmeyer-Koch and H.B. Nielsen (2004). Introduction to Numerical Computation - analysis and MATLAB illustrations. Studentlitteratur.
N. Gershenfeld (1999). The Nature of Mathematical Modeling. Cambridge University Press.
G.H. Golub and C.F. van Loan (1983). Matrix Computations. Johns Hopkins University Press.
P.S. Hansen, M.P. Bendse and H.B. Nielsen (1987). Liner Algebra - Datamatorienteret. Informatik og Matematisk Modellering,
Matematisk Institut, Danmarks Tekniske Universitet. In Danish.
T. Hastie, R. Tibshirani and J. Friedman (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
Springer.
O. Jacobi (1977). Landmaling 2. del. Hovedpunktsnet. Den private Ingenirfond, Danmarks Tekniske Universitet. In Danish.
A.B.O. Jensen (2002). Numerical Weather Predictions for Network RTK. Publication Series 4, volume 10. National Survey and
Cadastre, Denmark.
N. Kousgaard (1986). Anvendt Regressionsanalyse for Samfundsvidenskaberne. Akademisk Forlag. In Danish.
K. Madsen, H.B. Nielsen and O. Tingleff (1999). Methods for Non-Linear Least Squares Problems. Informatics and Mathematical
Modelling, Technical University of Denmark.
P. McCullagh and J. Nelder (1989). Generalized Linear Models. Chapman & Hall. London, U.K.
E.M. Mikhail, J.S. Bethel and J.C. McGlone (2001). Introduction to Modern Photogrammetry. John Wiley and Sons.
E. Mrsk-Mller and P. Frederiksen (1984). Landmaling: Elementudjvning. Den private Ingenirfond, Danmarks Tekniske
Universitet. In Danish.
A.A. Nielsen (2001). Spectral mixture analysis: linear and semi-parametric full and partial unmixing in multi- and hyperspectral
image data. International Journal of Computer Vision 42(1-2), 17-37 and Journal of Mathematical Imaging and Vision 15(1-2),
17-37.
W.H. Press, S.A. Teukolsky, W.T. Vetterling and B.P. Flannery (1992). Numerical Recipes in C: The Art of Scientific Computing.
Second Edition. Cambridge University Press.
J.A. Rice (1995). Mathematical Statistics and Data Analysis. Second Edition. Duxbury Press.
G. Strang (1980). Linear Algebra and its Applications. Second Edition. Academic Press.
G. Strang and K. Borre (1997). Linear Algebra, Geodesy, and GPS. Wellesley-Cambridge Press.
P. Thyregod (1998). En Introduktion til Statistik, vol. 3A-3D. Informatik og Matematisk Modellering, Danmarks Tekniske Universitet. In Danish.
W.N. Venables and B.D. Ripley (1999). Modern Applied Statistics with S-PLUS. Third Edition. Springer.
Index
F distribution, 25
R2 , 10
2 distribution, 19, 20, 24
0 , 14, 16
s0 , 19, 24
t distribution, 10, 11, 19
t-test, two-sided, 10, 19
adjustment, 6
chain rule, 26
Cholesky, 7, 1315, 20, 21, 23
coefficient of determination, 10
confidence
ellipsoid, 25, 30, 36
Cooks distance, 11
coordinate system
ECEF, 36
ENU, 36
damping factor, 45
decomposition
Cholesky, 13
QR, 13
singular value, 13
degrees of freedom, 6
derivative matrix, 23, 26
dilution of precision, 37
dispersion matrix, 9, 18, 24
distribution
F , 25
2 , 19, 20, 24
t, 10, 11, 19
normal, 10, 19
DOP, 37
ECEF coordinate system, 36
ENU coordinate system, 36
error
ellipsoid, 25
gross, 11
or residual, 5, 6
estimator
central, 7, 15
unbiased, 7, 15
fundamental equations, 7, 15, 23
Gauss-Newton method, 24, 30, 43, 45
Global Navigation Satellite System, 26, 27
Global Positioning System, 35
GNSS, 26, 27
GPS, 35
gradient method, 42
linearization, 22, 23
Matlab command mldivide, 11
minimum
global, 45
local, 45
MSE, 9, 19, 24
multicollinearity, 7, 15, 23
multiple regression, 5
Navstar, 35
Newtons method, 42, 45
normal distribution, 10, 19
normal equations, 6, 14, 21
objective function, 6, 14
observation equations, 6, 23
optimum
global, 45
local, 45
orientation unknown, 26, 27
outlier, 11, 12
partial derivatives
analytical, 26
numerical, 27
precision
dilution of, 37
pseudorange, 35
QR, 7, 11, 13, 15, 23
regression, 6
multiple, 5
simple, 3
regressors, 5
residual
jackknifed, 11
or error, 5
standardized, 11
studentized, 11
RMSE, 9, 19, 24
RSS, 9, 18, 24
significance, 10, 19, 25
simple regression, 3
space vehicle, 27, 35
SSE, 9, 18, 24
standard deviation of unit weight, 14
steepest descent method, 42
SVD, 7, 13, 15, 23
Taylor expansion, 22, 25, 43
uncertainty, 6
hat matrix, 7, 15
Hessian, 42, 43, 45
idempotent, 7, 15
iid, 10
influence, 9, 11, 25
initial value, 22, 26, 42, 45
iterations, 24
iterative solution, 24
variable
dependent, 5
explanatory, 5
independent, 5
predictor, 5
response, 5
variance-covariance matrix, 9, 18, 24
weights, 15
Jacobian, 24, 43
least squares
general (GLS), 21
nonlinear (NLS), 22
ordinary (OLS), 6
weighted (WLS), 14
WLS as OLS, 21
levelling, 16
Levenberg-Marquardt method, 44
leverage, 9, 11, 12, 18, 25, 29, 36