Solving Least Squares Problems
SIAM's Classics in Applied Mathematics series consists of books that were previously
allowed to go out of print. These books are republished by SIAM as a professional
service because they continue to be important resources for mathematical scientists.
Editor-in-Chief
Robert E. O'Malley, Jr., University of Washington
Editorial Board
Richard A. Brualdi, University of Wisconsin-Madison
Herbert B. Keller, California Institute of Technology
Andrzej Z. Manitius, George Mason University
Ingram Olkin, Stanford University
Stanley Richardson, University of Edinburgh
Ferdinand Verhulst, Mathematisch Instituut, University of Utrecht
Classics in Applied Mathematics
C. C. Lin and L. A. Segel, Mathematics Applied to Deterministic Problems in the
Natural Sciences
Johan G. F. Belinfante and Bernard Kolman, A Survey of Lie Groups and Lie Algebras
with Applications and Computational Methods
James M. Ortega, Numerical Analysis: A Second Course
Anthony V. Fiacco and Garth P. McCormick, Nonlinear Programming: Sequential
Unconstrained Minimization Techniques
F. H. Clarke, Optimization and Nonsmooth Analysis
George F. Carrier and Carl E. Pearson, Ordinary Differential Equations
Leo Breiman, Probability
R. Bellman and G. M. Wing, An Introduction to Invariant Imbedding
Abraham Berman and Robert J. Plemmons, Nonnegative Matrices in the Mathematical Sciences
Olvi L. Mangasarian, Nonlinear Programming
*Carl Friedrich Gauss, Theory of the Combination of Observations Least Subject
to Errors: Part One, Part Two, Supplement. Translated by G. W. Stewart
Richard Bellman, Introduction to Matrix Analysis
U. M. Ascher, R. M. M. Mattheij, and R. D. Russell, Numerical Solution of Boundary
Value Problems for Ordinary Differential Equations
K. E. Brenan, S. L. Campbell, and L. R. Petzold, Numerical Solution of Initial-Value
Problems in Differential-Algebraic Equations
Charles L. Lawson and Richard J. Hanson, Solving Least Squares Problems
J. E. Dennis, Jr. and Robert B. Schnabel, Numerical Methods for Unconstrained
Optimization and Nonlinear Equations
Richard E. Barlow and Frank Proschan, Mathematical Theory of Reliability
Cornelius Lanczos, Linear Differential Operators
Richard Bellman, Introduction to Matrix Analysis, Second Edition
Beresford N. Parlett, The Symmetric Eigenvalue Problem
*First time in print.
Charles L. Lawson
San Clemente, California
Richard J. Hanson
Rice University
Houston, Texas
SIAM
Society for Industrial and Applied Mathematics
Philadelphia
Copyright © 1995 by the Society for Industrial and Applied Mathematics. This SIAM
edition is an unabridged, revised republication of the work first published by Prentice-Hall,
Inc., Englewood Cliffs, New Jersey, 1974.
All rights reserved. Printed in the United States of America. No part of this book may be
reproduced, stored, or transmitted in any manner without the written permission of the
publisher. For information, write to the Society for Industrial and Applied Mathematics,
3600 University City Science Center, Philadelphia, PA 19104-2688.
Lawson, Charles L.
Solving least squares problems / Charles L. Lawson, Richard J.
Hanson.
p. cm. - (Classics in applied mathematics; 15)
"This SIAM edition is an unabridged, revised republication of the
work first published by Prentice-Hall, Inc., Englewood Cliffs, New
Jersey, 1974"--T.p. verso.
Includes bibliographical references and index.
ISBN 0-89871-356-0 (pbk.)
1. Least squares-Data processing. I. Hanson, Richard J., 1938-
II. Title. III. Series.
QA275.L38 1995
511'.42--dc20 95-35178
PREFACE
in 1966. Since that time the authors have worked very closely together in the
adaptation of the stable mathematical methods to practical computational
problems at JPL. This book was written as part of this collaborative effort.
We wish to express our appreciation to the late Professor G. E. Forsythe
for extending to us the opportunity of writing this book. We thank the management of the Jet Propulsion Laboratory, California Institute of Technology,
for their encouragement and support. We thank Drs. F. T. Krogh and D.
Carta for reading the manuscript and providing constructive comments. A
number of improvements and simplifications of mathematical proofs were
due to Dr. Krogh.
Our thanks go to Mrs. Roberta Cerami for her careful and cheerful
typing of the manuscript in all of its stages.
CHARLES L. LAWSON
RICHARD J. HANSON
1 INTRODUCTION
This book is intended to be used as both a text and a reference for persons
who are investigating the solutions of linear least squares problems. Such
least squares problems often occur as a component part of some larger com-
putational problem. As an example, the determination of the orbit of a space-
craft is often considered mathematically as a multipoint boundary value
problem in the field of ordinary differential equations, but the computation
of the orbital parameters usually amounts to a nonlinear least squares esti-
mation problem using various linearization schemes.
More generally, almost any problem with sufficient data to overdetermine
a solution calls for some type of approximation method. Most frequently
least squares is the approximation criterion chosen.
There are a number of auxiliary requirements that often arise in least squares computations and that merit identification for algorithmic development. Examples:
A problem may require certain equality or inequality relations between
variables. A problem may involve very large volumes of data so that the
allocation of computer storage is of major concern.
Many times the purpose of least squares computation is not merely to
find some set of numbers that "solve" the problem, but rather the investigator
wishes to obtain additional quantitative information describing the relation-
ship of the solution parameters to the data. In particular, there may be a
family of different solution parameters that satisfy the stated conditions
almost equally well and the investigator may wish to identify this indeter-
minacy and make a choice in this family to meet some additional condi-
tions.
This book presents numerical methods for obtaining least squares solu-
tions keeping the points above in mind. These methods have been used suc-
cessfully by many engineers and scientists, together with the authors, in the
NASA unmanned space program.
The least squares problem that we are considering here is known by
different names by those in different scientific disciplines. For example,
mathematicians may regard the (least squares) problem as finding the closest
point in a given subspace to a given point in a function space. Numerical
analysts have also tended to use this framework, which tends to ignore data
errors. It follows that the opportunity to exploit the often present arbitrari-
ness of the solution is disregarded.
Statisticians introduce probability distributions into their conception
of the problem and use such terms as regression analysis to describe this
area. Engineers reach this problem by studying such topics as parameter
estimation, filtering, or process identification.
The salient point is this: When these problems (formulated in any of
these contexts) reach the stage of numerical computation, they contain the
same central problem, namely, a sequence of linear least squares systems.
This basic linear least squares problem can be stated as follows:
PROBLEM LS
Given a real m x n matrix A of rank k ≤ min(m, n), and given a real m-vector b, find a real n-vector x̂ minimizing the euclidean length of Ax - b.
(The reader can refer to Appendix A for the definitions of any unfamiliar
linear algebraic terms.) We shall use the symbolism Ax ≅ b to denote
Problem LS.
This problem can also be posed and studied for the case of complex A
and b. The complex case arises much less frequently in practice than the real
case and the theory and computational methods for the real case generalize
quite directly to the complex case.
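As a concrete illustration (an addition to the text, with made-up data), Problem LS can be solved numerically in a few lines; NumPy's least squares routine stands in here for the algorithms developed in later chapters.

```python
# Sketch: Problem LS, minimize the euclidean length of Ax - b.
# A and b are illustrative values only (m = 3, n = 2, rank 2: Case 2a).
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
b = np.array([1.0, 1.0, 0.0])

x, _, rank, sing_vals = np.linalg.lstsq(A, b, rcond=None)
print("solution:", x)
print("residual length:", np.linalg.norm(A @ x - b))
```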
In addition to the statement of Problem LS (in a given context) there
is an additional condition: The numerical data that constitute A and b have
only a limited number of accurate digits and after those digits the data are
completely uncertain and hence arbitrary! In practice this is the usual state
of affairs. This is due, in part, to the limited accuracy of measurements or
observations. It is important that explicit note be taken of this situation and
that it be used to advantage to obtain an appropriate approximate solution
to the problem. We shall discuss methods for accomplishing this, particularly
in Chapters 25 and 26.
Consider, as an important example, the case in which the linear least
squares problem arises from a nonlinear one and the solution vector x̂ is
to be used as a correction to be added to a current nominal solution of the
nonlinear problem. The linearized problem will be a useful approximation
Fig. 1.1 The six cases of Problem LS depending on the relative sizes of m, n, and Rank (A).
to the nonlinear one only in some limited neighborhood. If there are differ-
ent vectors that give satisfactorily small residuals in the linear problem, one
may prefer the one of smallest length to increase the likelihood of staying
within the neighborhood of validity of the linear problem.
The following three points are fundamental to our approach to least
squares computation.
1. Since the data defining Problem LS are uncertain, we can change
these data to suit our needs within this uncertainty.
2. We shall use orthogonal elimination matrices directly on the linear
system of Problem LS.
3. Practicality, for computer implementation, is stressed throughout.
We shall briefly comment on the first two of these points. Our goal in
changing the problem, as stated in point 1, will be to avoid the situation above
where a "small" change in the data produces "large" changes in the solution.
Orthogonal transformation matrices, mentioned in point 2, have a natural
place in least squares computations because they leave the euclidean length
of a vector invariant. Furthermore, it is desirable to use them because of their
stability in computation with regard to propagating data errors or uncer-
tainty.
We have made no assumption regarding the relative sizes of m and n.
In our subsequent discussion of Problem LS it will be convenient to identify
the six cases illustrated in Fig. 1.1.
The main interest of this book will be in Case 2a with special attention
to the situation where data uncertainty leads to Case 2b. Algorithms and
discussions applicable to all six cases will be given, however.
2 ANALYSIS OF THE LEAST SQUARES PROBLEM
where
(a) H is an m x m orthogonal matrix.
(b) R is an m x n matrix of the form
where y₂ is arbitrary.
(2) Any such x̂ gives rise to the same residual vector r satisfying
Proof: Replacing A by the right side of Eq. (2.4) and using Eq. (2.2)
yields
for all x.
The right member of Eq. (2.13) has the minimum value ||g₂||² when
where y2 is arbitrary.
Then with x̂ defined by Eq. (2.8) one obtains
EXERCISE
3 ORTHOGONAL DECOMPOSITION BY CERTAIN ELEMENTARY ORTHOGONAL TRANSFORMATIONS
with
and
Proof: Define
and
with
such that
is orthogonal and Q₂Q₁A has zeros below the diagonal in the first two columns.
Continuing in this way, a product of at most n orthogonal transforma-
tions can be constructed that will transform A to upper triangular form.
These remarks can be formalized to provide a finite induction proof of Theo-
rem (3.11).
The algorithmic details of the construction of these transformations will
be given in Chapter 10. This decomposition of a matrix as a product of an
orthogonal matrix and an upper triangular matrix is called a QR decomposi-
tion. It plays an important role in a number of computational algorithms
of linear algebra [Francis (1961); Golub and Businger (1965)].
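As a quick illustration (a sketch, not part of the original text), a QR decomposition can be computed with a library routine and its defining properties checked directly:

```python
# Sketch: verify A = QR with Q having orthonormal columns and R upper
# triangular; the test matrix is random, for illustration only.
import numpy as np

A = np.random.default_rng(0).standard_normal((5, 3))
Q, R = np.linalg.qr(A)                      # "reduced" QR: Q is 5x3, R is 3x3
print(np.allclose(Q.T @ Q, np.eye(3)))      # orthonormal columns
print(np.allclose(Q @ R, A))                # reconstruction
print(np.allclose(R, np.triu(R)))           # upper triangular
```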
For Cases 1a and 2a of Fig. 1.1, in which Rank (A) = n, Theorem (3.11) establishes the existence of an orthogonal decomposition of A. Thus from Theorem (3.11) we can write
where the matrices Qᵀ, R, and Iₙ in this representation have the properties required of the matrices H, R, and Kᵀ of an orthogonal decomposition for a matrix A, as given in Theorem (2.3).
In the case that the rank of A is m, Case 3a of Fig. 1.1, Theorem (3.11)
allows us to write
so that
Here Iₘ, Rᵀ, and Qᵀ in this representation have the properties required of the matrices H, R, and Kᵀ for an orthogonal decomposition of a matrix of rank m, as given in Theorem (2.3).
In Cases 1b, 2b, and 3b of Fig. 1.1, the matrix R obtained in Theorem (3.11) is not necessarily in the form required for an orthogonal decomposition.
We proceed to discuss additional transformations that will obtain the
orthogonal decomposition for these cases.
(3.15) THEOREM
Let A be an m x n matrix whose rank k satisfies k < n ≤ m. There
is an m x m orthogonal matrix Q and an n x n permutation matrix
P such that
The submatrix [R : T] in the right member of Eq. (3.16) can be further
transformed to the compact form required of the matrix R in Theorem (2.3).
This transformation is described by the following lemma:
(3.17) LEMMA
Let [R: T] be a k x n matrix where R is of rank k. There is an n x n
orthogonal matrix W such that
(3.19) THEOREM
Let A be an m x n matrix of rank k. Then there is an m x m ortho-
gonal matrix H and an n x n orthogonal matrix K such that
where
EXERCISES
(4.3) LEMMA
Let A be an n x n matrix of rank n. Then there exist an n x n orthogonal matrix U, an n x n orthogonal matrix V, and an n x n diagonal matrix S such that A = USVᵀ.
Proof of Lemma (4.3): The positive definite symmetric matrix AᵀA has
an eigenvalue-eigenvector decomposition
and
so that V is orthogonal.
From Eq. (4.8) and the fact that V is orthogonal,
where U, S, and V have the properties stated in Theorem (4.1). This completes the proof of Theorem (4.1).
Notice that the singular values of the matrix A are uniquely determined
even though the orthogonal matrices U and V of Eq. (4.19) are not unique.
and
EXERCISES
The singular values of a matrix are very stable with respect to changes
in the elements of the matrix. Perturbations of the elements of a matrix
produce perturbations of the same, or smaller, magnitude in the singular
values. The purpose of this chapter is to present Theorems (5.7) and (5.10),
which provide precise statements about this stability, and Theorem (5.12),
which gives bounds on the perturbation of singular values due to removing
a column, or row, of a matrix.
These theorems are direct consequences of corresponding theorems about
the stability of eigenvalues of a symmetric matrix. We first quote the three
relevant eigenvalue theorems.
(5.1) THEOREM
Let B, A, and E be n x n symmetric matrices with B - A = E. Denote their respective eigenvalues by βᵢ, αᵢ, and εᵢ, i = 1, ..., n, each set labeled in nonincreasing order. Then
αᵢ + εₙ ≤ βᵢ ≤ αᵢ + ε₁, i = 1, ..., n.
(5.3) THEOREM
Let A be an n x n symmetric matrix with eigenvalues α₁ ≥ α₂ ≥ ··· ≥ αₙ. Let k be an integer, 1 ≤ k ≤ n. Let B be the (n - 1) x (n - 1) symmetric matrix obtained by deleting the kth row and column from A. Then the ordered eigenvalues βᵢ of B interlace with those of A as follows:
α₁ ≥ β₁ ≥ α₂ ≥ β₂ ≥ ··· ≥ βₙ₋₁ ≥ αₙ.
where
and
where
and
Clearly, analogous results hold for the case of m < n. For later reference
we state the following theorem based on the discussion above:
(5.6) THEOREM
Let A be an m x n matrix and k = min(m, n). Let C be the (m + n) x (m + n) symmetric matrix defined by Eq. (5.4). If the singular values of A are s₁, ..., sₖ, then the eigenvalues of C are s₁, ..., sₖ, -s₁, ..., -sₖ, and zero repeated |m - n| times.
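Theorem (5.6) is easy to verify numerically; the following sketch (an addition, with random data and the block arrangement C = [[0, A], [Aᵀ, 0]] assumed for Eq. (5.4)) compares the eigenvalues of C with the singular values of A:

```python
# Sketch: eigenvalues of C = [[0, A], [A^T, 0]] versus singular values of A.
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 3
A = rng.standard_normal((m, n))
C = np.block([[np.zeros((m, m)), A],
              [A.T, np.zeros((n, n))]])

s = np.linalg.svd(A, compute_uv=False)
expected = np.sort(np.concatenate([s, -s, np.zeros(abs(m - n))]))
print(np.allclose(np.sort(np.linalg.eigvalsh(C)), expected))   # True
```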
We may now state the following three theorems regarding the perturba-
tion of singular values.
(5.7) THEOREM
Let B, A, and E be m x n matrices with B - A = E. Denote their respective singular values by βᵢ, αᵢ, and εᵢ, i = 1, ..., k; k = min(m, n), each set labeled in nonincreasing order. Then
(5.10) THEOREM
With hypotheses as in Theorem (5.7), there follows the inequality
Σᵢ₌₁ᵏ (βᵢ - αᵢ)² ≤ ||E||_F².
Proof: Introducing Eq. (5.9) and using Theorems (5.6) and (5.2), we obtain
CASE 1 m > n
CASE 2 m < n
Proof: The results in Eq. (5.13) and (5.14) follow directly from application of Theorem (5.3) to the symmetric matrices AᵀA and BᵀB. In Case 1 the eigenvalues of AᵀA and BᵀB are αᵢ², i = 1, ..., n, and βᵢ², i = 1, ..., n - 1, respectively. In Case 2 the eigenvalues of AᵀA are αᵢ², i = 1, ..., m, and zero repeated n - m times, while the eigenvalues of BᵀB are βᵢ², i = 1, ..., m, and zero repeated n - 1 - m times.
EXERCISES
(5.17) Define κ(A) to be the ratio of the largest singular value of A to the smallest nonzero singular value of A. (This "condition number" of A will be used in subsequent chapters.) Show that if Rank(A) = n for an m x n matrix A and if B is an m x r matrix obtained by deleting (n - r) columns from A, then κ(B) ≤ κ(A).
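A numerical illustration of Exercise (5.17), added here as a sketch with made-up data:

```python
# Sketch: deleting columns of a full column rank matrix cannot increase
# kappa = (largest singular value) / (smallest nonzero singular value).
import numpy as np

def kappa(M):
    s = np.linalg.svd(M, compute_uv=False)
    s = s[s > 0]
    return s.max() / s.min()

A = np.random.default_rng(2).standard_normal((8, 5))   # rank 5 almost surely
B = A[:, [0, 2, 4]]                                    # delete two columns
print(kappa(B) <= kappa(A))                            # True
```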
6 BOUNDS FOR THE CONDITION NUMBER
OF A TRIANGULAR MATRIX
Using Eq. (6.4) we find 1 as a lower bound for κ, which tells us nothing at all. For this matrix a more realistic upper bound for sₙ can be obtained by noting the effect of R on the vector y with components
we obtain
Table 6.1 lists the computed values of the last diagonal element rₙₙ and the smallest singular value sₙ, as well as the ratio sₙ/rₙₙ, for n = 20 through 26. Observe that in these cases rₙₙ provides a good estimate of the size of sₙ.
It is nevertheless true that there exist n x n matrices with unit column norms which could be produced by Algorithm HFTI and whose smallest singular value is smaller than the smallest diagonal element by a factor of approximately 2^(1-n).
The following example of such a matrix is due to Kahan (1966b), pp. 791-792. Define the n x n matrix R by
For general n, the matrix R is upper triangular, the column vectors are all of
unit euclidean length, and the inequalities
we have
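The display defining R is lost in this copy; the sketch below builds a Kahan-type matrix of this kind under assumed parameter choices (the diagonal scaling and off-diagonal constant are the standard ones for Kahan's example), checks the unit column lengths, and compares sₙ with the last diagonal element:

```python
# Sketch: a Kahan-type upper triangular matrix with unit column lengths
# whose smallest singular value is far below its smallest diagonal element.
import numpy as np

n, theta = 20, 1.2
s, c = np.sin(theta), np.cos(theta)              # s**2 + c**2 = 1
R = np.triu(-c * np.ones((n, n)), 1) + np.eye(n) # 1 on diagonal, -c above
R = np.diag(s ** np.arange(n)) @ R               # scale row i by s**(i-1)

print(np.allclose(np.linalg.norm(R, axis=0), 1.0))   # unit column norms
sn = np.linalg.svd(R, compute_uv=False)[-1]
print(sn, abs(R[-1, -1]))                        # s_n is much smaller than |r_nn|
```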
The following theorem shows that this case represents nearly the minimum value attainable by sₙ/|rₙₙ|.
(6.13) THEOREM [Stated without proof in Faddeev et al. (1968)]
Let A be an m x n matrix of rank n. Assume all column vectors of A have unit euclidean length. Let R be an upper triangular matrix produced by Householder triangularization of A with interchange of columns as in Algorithm HFTI (14.9). Then sₙ, the smallest singular value of A, is related to rₙₙ, the last diagonal element of R, by
and
Since
the assertions in Eq. (6.14) and (6.15) are equivalent to the assertions
and
The assertion in Eq. (6.19) follows from the fact that rₙₙ⁻¹ is an element of R⁻¹.
To establish Eq. (6.20), we shall develop upper bounds on the magnitudes of all elements of R⁻¹ and then compute the Frobenius norm of the bounding matrix M.
Define T = R⁻¹. It will be convenient to group the elements of T into diagonals parallel to the main diagonal. Introduce bounding parameters for the diagonals as follows:
Then
Since |rᵢ,ᵢ₊₁| ≤ |rᵢᵢ|, we may use Eq. (6.21) and (6.23) to obtain
and define
For example, if n = 4,
From Eq. (6.21) and (6.25) to (6.27), it follows that the elements of R⁻¹ (= T) are individually bounded in magnitude by the corresponding elements of M. Thus
To compute the Frobenius norm of M, we note that for j ≥ 2 the sum of squares of the elements of column j is
This expression is also correct for j = 1, assuming the sum in the middle member is defined as zero.
Then
The inequalities in Eq. (6.28) and (6.30) together establish the inequality
in Eq. (6.20) and complete the proof of Theorem (6.13).
The following more general theorem has also been stated by Faddeev,
Kublanovskaya, and Faddeeva (1968).
(6.31) THEOREM
Let A and R be matrices as given in Theorem (6.13). The singular values s₁ ≥ s₂ ≥ ··· ≥ sₙ of A are related to the diagonal elements rᵢᵢ of R by the inequalities
which with Eq. (6.33) establishes the lower bound for sᵢ stated in Eq. (6.32).
Define Wᵢ to be the principal submatrix of R of order n + 1 - i consisting of the intersections of rows and columns i through n. Note that the leading diagonal element of Wᵢ is rᵢᵢ. Using the properties in Eq. (6.16) and Exercise (6.37), one obtains
Since the largest singular value of Wᵢ equals ||Wᵢ||, Eq. (6.35) and (6.36) may be combined to obtain the upper bound for sᵢ in Eq. (6.32), completing the proof of Theorem (6.31).
EXERCISE
(6.37) Let A be an m x n matrix with column vectors aⱼ. Then ||A|| ≤ n^(1/2) maxⱼ ||aⱼ||.
7 THE PSEUDOINVERSE
(7.3) THEOREM
Let Aₘₓₙ = HRKᵀ as in Theorem (2.3). Define
This definition along with Theorems (7.1) and (7.3) immediately allow us
to write the minimum length solution to Problem LS as
The following two cases are worthy of special note. For a square nonsingular matrix B, the pseudoinverse of B is the inverse of B: B⁺ = B⁻¹.
For an m x n matrix,
If one has the singular value decomposition of Eq. (4.2), as given in Theorem (4.1), then A⁺ = VS⁺Uᵀ, where S⁺ is obtained from Sᵀ by replacing each nonzero singular value by its reciprocal.
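The following sketch (an addition, with made-up data) forms A⁺ from the singular value decomposition in this way and checks it against a library pseudoinverse:

```python
# Sketch: A+ = V S+ U^T, where S+ replaces each singular value above a
# tolerance by its reciprocal and the rest by zero.
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))  # rank <= 3
U, s, Vt = np.linalg.svd(A, full_matrices=False)

tol = max(A.shape) * np.finfo(float).eps * s[0]
s_plus = np.zeros_like(s)
s_plus[s > tol] = 1.0 / s[s > tol]
A_plus = Vt.T @ np.diag(s_plus) @ U.T

print(np.allclose(A_plus, np.linalg.pinv(A)))   # agrees with the library
x = A_plus @ rng.standard_normal(5)             # minimum length LS solution
```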
EXERCISES
These matrices have many useful properties derivable directly from the
Penrose conditions [Theorem (7.9)]. Also see Exercise (7.21) and the sum-
mary of standard properties of projection matrices given in Appendix A.
The matrices defined in Eq. (8.1) to (8.4) will be used throughout this
chapter without further reference.
(8.5) THEOREM
Using definitions of Eq. (8.1) and (8.2), the matrix G satisfies
where
The bounds in Eq. (8.10) to (8.12) follow from the facts that ||I - Q|| ≤ 1 and ||I - P|| ≤ 1, completing the proof of Theorem (8.5).
We remark that for the case of real numbers α and β (and in fact for square nonsingular matrices) one has the algebraic identity β⁻¹ - α⁻¹ = -β⁻¹(β - α)α⁻¹, which suggests that Eq. (8.10) is a reasonable form to expect in a bound for ||G||. The additional terms G₂ and G₃ are specifically associated with the
cases of nonsquare matrices or square singular matrices in the following
sense: The matrix G₂ can be nonzero only if Rank(A) < m, and G₃ can be nonzero only if Rank(A) < n, since Q = Iₘ if Rank(A) = m and P = Iₙ if Rank(A) = n.
We next wish to replace ||Ã⁺|| in the right sides of Eq. (8.10), (8.11), and (8.12) by its bound in terms of ||A⁺|| and ||E||. Such a bound is available only under the assumptions in Eq. (8.16) and (8.17) of the following theorem.
(8.15) THEOREM
Assume
and
and
which implies that Rank(A + E) ≥ k. With the inequality of Eq. (8.16)
this establishes Eq. (8.18). The inequality of Eq. (8.20) can be written as
which is equivalent to the inequality of Eq. (8.19). This completes the proof
of Theorem (8.15).
The conditions of Eq. (8.16) and (8.17) are necessary. It is easy to verify that ||Ã⁺|| may be unbounded if either of these conditions is not satisfied. As long as we shall be imposing Eq. (8.16) and (8.17) to obtain Eq. (8.19), we can take advantage of the condition in Eq. (8.18) to prove that ||E||·||Ã⁺||² can be replaced by ||E||·||Ã⁺||·||A⁺|| in the bound for ||G₂|| given in Eq. (8.11). This is accomplished by Theorems (8.21) and (8.22).
(8.21) THEOREM
If Rank(Ã) = Rank(A), then
and
Then from Eq. (8.3) and (8.4) and the assumption that A and Ã have the same rank, say, k,
and
Then
and thus
Therefore
one obtains
Thus ||W₁₂|| = ||W₂₁||, which completes the proof of Theorem (8.21).
(8.22) THEOREM
If Rank(Ã) = Rank(A), then the matrix G₂ defined in Eq. (8.8) satisfies
Then since
it follows that ||G₂|| satisfies Eq. (8.23), which completes the proof of Theorem (8.22).
where
Proof: The conclusion that Rank(Ã) = Rank(A) is obtained from Theorem (8.15). Using the bound for ||Ã⁺|| obtained in that theorem, one obtains Eq. (8.25), (8.26), and (8.27) from Eq. (8.10), (8.23), and (8.12), respectively.
Thus the bound of Eq. (8.28) would follow immediately with c = 3. For
practical purposes this would be a satisfactory result. The special result in
Eq. (8.31) is also immediate since in this case ||G₂|| = ||G₃|| = 0. Of course
this result for a square nonsingular matrix is well known and can be proved
more directly [e.g., see Wilkinson (1965a), pp. 189-190].
The special results of Eq. (8.29) and (8.30) are established as follows. Let
x be a unit m-vector. Define
Then
and
Let
Therefore
Then
Equations (8.1) to (8.9) and the theorems of this chapter may be used to
prove that with appropriate hypotheses on the rank of A the elements of A+
are differentiable functions of the elements of A.
Examples of some specific differentiation statements and formulas are
given in Exercises (8.33) and (9.22)-(9.24). Note that the formulas given in these exercises generalize immediately to the case in which t is a k-dimensional variable with components t₁, ..., tₖ. One simply replaces d/dt in these formulas by ∂/∂tᵢ for i = 1, ..., k.
Differentiation of the pseudoinverse has been used by Fletcher and Lill
(1970) and by Perez and Scolnik (1972) in algorithms for constrained mini-
mization problems. See Golub and Pereyra (1973) and Krogh (1974) for an
application of differentiation of the pseudoinverse to nonlinear least squares
problems in which some of the parameters occur linearly.
EXERCISE
and
The definitions in Eq. (9.1) to (9.6) of course apply only when the respective
denominators are nonzero. The quantity κ of Eq. (9.5) is called the condition number of A.
(9.7) THEOREM
Let x be the minimum length solution to the least squares problem Ax ≅ b with residual vector r = b - Ax. Assume ||E||·||A⁺|| < 1 and Rank(Ã) ≤ Rank(A), and let x + dx be the minimum length solution to the least squares problem
Then
and
Proof: The conclusion in Eq. (9.8) follows from Theorem (8.15). The
vectors x and x + dx satisfy
and
Thus
Note that r = (I - Q)r = (I - Q)b, which with Eq. (8.8) implies G₂b = G₂r. Then using Eq. (8.6) to (8.9) gives
Using the bound for ||Ã⁺|| from Theorem (8.15) and the bound for ||G₂|| from Eq. (8.26), the result in Eq. (9.9) is obtained.
Dividing the inequality in Eq. (9.9) by ||x|| and using the definitions in Eq. (9.1) to (9.6) gives the inequality in Eq. (9.10). This completes the proof of
Theorem (9.7).
Observe that when n = k = Rank(A), the matrix G₃ of Eq. (8.9) is zero, which leads to the fourth term in the right member of the inequalities in Eq. (9.9) and (9.10) being zero. Similarly, when m = k = Rank(A), the matrix G₂ of Eq. (8.8), and consequently the third term in the right member of the inequalities in Eq. (9.9) and (9.10), is zero.
Furthermore, if either n = k or m = k, then the rank of Ã clearly cannot exceed the rank of A. Thus the hypothesis Rank(Ã) ≤ Rank(A) that was used in Theorem (9.7) is automatically satisfied when either n = k or m = k.
These observations provide proofs for the following three theorems.
and
and
and
EXERCISES
where r = b - Ax.
(9.23) Further show that dx/dt is the solution of the square nonsingular
system
and
The fact that the matrix Q defined in Eq. (10.11) has the desired properties
is established by the following three lemmas.
(10.12) LEMMA
The m-vector u and the scalar b defined in Eq. (10.5) to (10.10) satisfy
(10.14) LEMMA
The matrix Q of Eq. (10.11) is orthogonal.
Proof: The verification that QᵀQ = Iₘ follows directly using Eq. (10.11) and (10.13).
(10.15) LEMMA
Let y = Qv. Then
On a machine that does not use normalized base 2 arithmetic, additional care
should be taken to avoid loss of accuracy in computing c and s. For example,
Cody (1971) has given the following reformulation of the expression in Eq.
(10.23) for use with normalized base 16 arithmetic.
Step Description
8 If v₂ ≠ 0, go to Step 10.
9 Set c := 1, s := 0, r := 0, and go to Step 16.
10 Set w := v₁/v₂.
11 Set q := (1 + w²)^(1/2).
12 Set s := 1/q.
13 If v₂ < 0, set s := -s.
14 Set c := ws.
15 Set r := |v₂|q.
16 Comment: The transformation has been constructed.
(10.26) ALGORITHM G2(c, s, z₁, z₂)
Step Description
1 Set w := z₁c + z₂s.
2 Set z₂ := -z₁s + z₂c.
3 Set z₁ := w.
4 Comment: The transformation has been applied.
The Fortran subroutines G1 and G2 given in Appendix C implement Algorithms G1 and G2, respectively.
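For readers without the Fortran at hand, here is a Python rendering of these two algorithms as a sketch. Steps 1-7 of G1 (the |v₁| > |v₂| branch, not reproduced above) are reconstructed symmetrically from Steps 8-15 and are therefore an assumption:

```python
# Sketch of Algorithms G1 and G2: G1 builds c, s, r so that the rotation
# [[c, s], [-s, c]] maps (v1, v2) to (r, 0); G2 applies such a rotation.
# Working with the quotient w avoids squaring large numbers (overflow guard).
import math

def g1(v1, v2):
    if abs(v1) > abs(v2):              # Steps 1-7 (reconstructed branch)
        w = v2 / v1
        q = math.sqrt(1.0 + w * w)
        c = 1.0 / q
        if v1 < 0.0:
            c = -c
        s = w * c
        r = abs(v1) * q
    elif v2 != 0.0:                    # Steps 10-15
        w = v1 / v2
        q = math.sqrt(1.0 + w * w)
        s = 1.0 / q
        if v2 < 0.0:
            s = -s
        c = w * s
        r = abs(v2) * q
    else:                              # Step 9
        c, s, r = 1.0, 0.0, 0.0
    return c, s, r

def g2(c, s, z1, z2):
    return c * z1 + s * z2, -s * z1 + c * z2

c, s, r = g1(3.0, 4.0)
print(g2(c, s, 3.0, 4.0))              # approximately (5.0, 0.0)
```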
Many variations of algorithmic and programming details are possible in
implementing Householder or Givens transformations. Tradeoffs are possible
involving execution time, accuracy, resistance to underflow or overflow,
storage requirements, complexity of code, modularity of code, taking advan-
tage of sparsity of nonzero elements, programming language, portability, etc.
Two examples of such variations will be described.
Our discussion of the Householder transformation has been based on the
representation given in Eq. (10.1). The Householder matrix can also be
expressed as
The representations in Eq. (10.1) and (10.27) are related by the substitutions g = b⁻¹u₁u = s⁻¹u and h = u₁⁻¹u.
The form of Eq. (10.27) is mainly of interest for small m, in which case
the need to store two vectors g and h instead of one vector u is of no con-
and
We wish to replace D and B in storage with new matrices D̄ (2 x 2) and B̄ (2 x n) with
CASE I
We may set D̄ = D and B̄ = B.
CASE II
Define
Note that b̄₁₁ = b₁₁(1 + t) and b̄₂₁ = 0.
CASE III
Define
Rather than saving the quantity b1 that appears in Eq. (11.2), we shall
recompute it when needed based on Eq. (10.10).
Introducing subscripts we have
Each sⱼ is the jth diagonal entry of the R matrix and will be stored as such. The quantities uⱼ⁽ʲ⁾ are stored in an auxiliary array of locations named hⱼ, j = 1, ..., n.
The computational algorithm will construct the decomposition of Theo-
rem (3.11). This algorithm will be known as HFT(m, n, A, h). The input to HFT consists of the integers m and n and the m x n matrix A. The output consists of the nonzero portion of the upper triangular matrix R stored in the upper triangular portion of the array named A, the scalars uⱼ⁽ʲ⁾ stored in the jth entry hⱼ of the array named h, and the remaining nonzero portions
of the vectors u⁽ʲ⁾ stored as columns in the subdiagonal part of the jth column of the array named A.
(11.4) ALGORITHM HFT(m, n, A, h)
Step Description
1 For j := 1, ..., n, execute Algorithm H1(j, j + 1, m, a₁ⱼ, hⱼ, a₁,ⱼ₊₁, n - j) (see Algorithm 10.22).
2 Comment: The (forward) triangularization is computed.
2 Comment: The (forward) triangularization is computed.
In Step 1 of the algorithm above we are adopting the convention (consistent with Fortran usage) that the jth column of A can be referred to by the name of its first component, and the submatrix of A composed of columns j + 1 through n can be referred to by the name of the first entry of column j + 1.
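The following Python sketch (an addition to the text) performs the forward triangularization that HFT describes, using explicit reflections in place of the H1/H2 storage conventions; the sign choice for the pivot follows the construction of Chapter 10:

```python
# Sketch: Householder forward triangularization, Q_n ... Q_1 A = R.
import numpy as np

def hft(A):
    A = A.astype(float)                            # work on a copy
    m, n = A.shape
    for j in range(n):
        v = A[j:, j]
        s = -np.copysign(np.linalg.norm(v), v[0])  # s = -sign(v_p)*||v||
        u = v.copy()
        u[0] -= s                                  # u_p := v_p - s
        b = s * u[0]                               # b := s*u_p  (b <= 0)
        if b != 0.0:
            A[j:, j:] += np.outer(u, u @ A[j:, j:]) / b   # apply I + u u^T / b
    return np.triu(A)

A = np.arange(12, dtype=float).reshape(4, 3) + np.eye(4, 3)
R = hft(A)
print(np.allclose(np.linalg.qr(A)[1] ** 2, R[:3] ** 2))   # same R up to row signs
```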
In Cases 1a and 2a of Fig. 1.1 the n x n upper triangular matrix R₁₁ (see Eq. 3.21) is nonsingular. Thus, in these two cases, the solution x̂ of Problem LS can be obtained by computing
partitioning g as
and solving
and
Use of Eq. (11.8) and (11.9) obviates the need to save or regenerate the data
matrix [A: b] for the purpose of computing residuals.
The following algorithm, HS1, will accomplish the computation of Eq.
(11.5) to (11.7). The input to this algorithm will consist of the integers m and
n, the arrays named A and h as they are output by the Algorithm HFT (11.4)
and the array named b that holds the right-side m-vector b of Problem LS.
The output of the algorithm consists of the n-vector x replacing the first
n entries of the array named b and, if m > n, the (m — n)-vector g2 will
replace entries n+1 through m of the array named b.
(11.10) ALGORITHM HS1(m, n, A, h, b)
Step Description
1 For j := 1, ..., n, execute Algorithm H2(j, j + 1, m, a₁ⱼ, hⱼ, b, 1).
2 Comment: In Steps 3 and 4 we shall compute the solution to the triangular system R₁₁x = g₁ of Eq. (11.7).
3 Set bₙ := bₙ/aₙₙ.
4 If n ≤ 1, go to Step 6.
5 For i := n - 1, n - 2, ..., 1, set bᵢ := (bᵢ - Σⱼ₌ᵢ₊₁ⁿ aᵢⱼbⱼ)/aᵢᵢ.
6 Comment: The solution of Problem LS has been computed for Cases 1a and 2a.
The case where b represents an m x l matrix can be easily handled by changing Step 1 to execute Algorithm H2(j, j + 1, m, a₁ⱼ, hⱼ, b, l), j = 1, ..., n, and by changing Steps 3 and 5 to deal with b as an (m x l)-array.
To compute the residual norm ρ [see Eq. (11.9)], one could add to Algorithm HS1
Step 7 Set ρ := (bₙ₊₁² + ··· + bₘ²)^(1/2)
To compute the residual vector [see Eq. (11.8)], one could add to Algorithm HS1
Step 8 For i := 1, ..., n, set bᵢ := 0
Step 9 For j := n, n - 1, ..., 1, execute Algorithm H2(j, j + 1, m, a₁ⱼ, hⱼ, b, 1)
Note that if Steps 8 and 9 are used, the solution vector x must first be
moved to storage locations distinct from the array b if it is not to be over-
written. Following Step 9 the residual vector r = b — Ax occupies the storage
array called b.
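A sketch of the same computation using a library QR in place of the stored Householder data: the solution comes from R₁₁x = g₁ and the residual norm directly from g₂, as in Eq. (11.7) and (11.9):

```python
# Sketch: solve Problem LS and read off the residual norm rho = ||g2||.
import numpy as np

rng = np.random.default_rng(4)
m, n = 6, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

Q, R = np.linalg.qr(A, mode='complete')     # Q is m x m
g = Q.T @ b
x = np.linalg.solve(R[:n, :n], g[:n])       # back substitution on R11
rho = np.linalg.norm(g[n:])                 # residual norm from g2
print(np.isclose(rho, np.linalg.norm(A @ x - b)))   # True
```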
Algorithm HS1 is based on the assumption that the matrix A of Problem LS is of rank n. There is no test made in the algorithm to check for this.
In practice it is important to know whether a change in the matrix A of the order of the data uncertainty could produce a matrix of rank less than n.
One computational approach to this problem, involving the use of column
interchanges as a first step, will be discussed in Chapter 14. Another
EXERCISES
(11.11) Derive a forward triangularization algorithm using Givens transfor-
mations. Count the number of adds and multiplies separately. How
is this count changed if the two-multiply, two-add transformation is
used?
(11.12) Determine the number of multiplications required to solve Problem
LS using Algorithms HFT and HS1. Compare this count with that
obtained for the Givens transformations of Exercise (11.11).
12 COMPUTATION OF THE COVARIANCE MATRIX OF THE SOLUTION PARAMETERS
In order to be able to replace rᵢⱼ by tᵢⱼ in computer storage when tᵢⱼ is computed, and to reduce the number of division operations, these formulas will be used in the following form:
Step Description
13 Remark: This completes the computation of the elements of the upper triangle of (AᵀA)⁻¹ for the case of Eq. (12.3) where no permutations were used in computing R. Alternatively, in the case of Eq. (12.4) Steps 14-23 must be executed to accomplish the final premultiplication and postmultiplication by P and Pᵀ, respectively, as indicated in Eq. (12.10). In this case we assume an array of integers pᵢ, i = 1, ..., n, is available recording the fact that the ith permutation performed during the triangularization of A was an interchange of columns i and pᵢ.
14 For i := n, n - 1, ..., 1, do through Step 22.
15 If pᵢ = i, go to Step 22.
16 Set k := pᵢ. Interchange the contents of aᵢᵢ and aₖₖ. If i = 1, go to Step 18.
17 For j := 1, ..., i - 1, interchange the contents of aⱼᵢ and aⱼₖ.
18 If k - i = 1, go to Step 20.
19 For l := i + 1, ..., k - 1, interchange the contents of aᵢₗ and aₗₖ.
20 If k = n, go to Step 22.
21 For l := k + 1, ..., n, interchange the contents of aᵢₗ and aₖₗ.
22 Continue.
23 Remark: The computation of the unscaled covariance matrix is completed.
Examples of Fortran implementations of this algorithm with and without
column permutations are, respectively, given in the Programs PROG1 and
PROG2 of Appendix C.
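A sketch of the quantity these programs compute (illustrative data, not the book's Fortran): the unscaled covariance matrix C = (AᵀA)⁻¹ obtained from the triangular factor as C = R⁻¹R⁻ᵀ:

```python
# Sketch: unscaled covariance matrix from the QR factor R of A.
import numpy as np

A = np.random.default_rng(5).standard_normal((7, 4))
_, R = np.linalg.qr(A)                 # A^T A = R^T R
Rinv = np.linalg.inv(R)
C = Rinv @ Rinv.T                      # C = (A^T A)^{-1}
print(np.allclose(C, np.linalg.inv(A.T @ A)))   # True
```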
If the singular value decomposition is used, as in Eq. (12.5), the unscaled covariance matrix C satisfies Eq. (12.11). Thus the individual elements of C are given by
or
or
respectively.
Thus one would evaluate Eq. (12.17) by first solving for Y in the equation
followed by
is near-singular.
A weakness of this type of analysis is the fact that only dependencies
between pairs of variables are easily detected. For example, there may be a
set of three variables that have a mutual near dependence, whereas no two
of the variables are nearly dependent. Such a set of three variables could be
associated with a 3 x 3 principal submatrix of E such as
or
Consider now the Problem LS, Ax ≅ b, for the case m < n, Rank(A) = m (Case 3a, Fig. 1.1).
A solution algorithm is described briefly in the following steps:
Each sⱼ is the jth diagonal entry of the R matrix and will be stored as such.
This algorithm will be known as HBT(m, n, A, g). The input to Algorithm HBT consists of the integers m and n and the m x n matrix A of rank m. The output consists of the nonzero portion of the lower triangular matrix R stored in the lower triangular portion of the array named A. The scalars uⱼ⁽ʲ⁾ are stored in an auxiliary array named g, in locations gⱼ, j = 1, ..., m. The remaining nonzero portions of the vectors u⁽ʲ⁾ are stored in the upper diagonal part of the jth row of the array named A.
(13.8) ALGORITHM HBT(m, n, A, g)
Step Description
1 For j := 1, ..., m, execute Algorithm H1(j, j + 1, n, aⱼ₁, gⱼ, aⱼ₊₁,₁, m - j) (see Algorithm 10.22).
2 Comment: The (backward) triangularization is computed.
2 Comment: The (backward) triangularization is computed.
In Step 1, it should be noted that aⱼ₁ denotes the start of a row vector in storage, while aⱼ₊₁,₁ denotes the start of m - j rows to be operated on by the transformations.
The following algorithm will accomplish the computation of Eq. (13.2)
to (13.4). The input of this algorithm will consist of the integers m and n,
the arrays named A and g as they are output by Algorithm HBT (13.8).
The solution will be returned in an array named x. The array b is replaced in storage by the vector y₁ of Eq. (13.2).
(13.9) ALGORITHM HS2(m, n, A, g, b, x)
Step Description
1 Set b₁ := b₁/a₁₁.
2 For i := 2, ..., m, set bᵢ := (bᵢ - Σⱼ₌₁ⁱ⁻¹ aᵢⱼbⱼ)/aᵢᵢ.
3 Comment: At this point the vector y₂ must be determined. If the minimal length solution is desired, put y₂ := 0.
4 Set xᵢ := bᵢ, i = 1, ..., m, and set xₘ₊ᵢ equal to the ith component of y₂, i = 1, ..., n - m.
5 For j := m, m - 1, ..., 1, execute Algorithm H2(j, j + 1, n, aⱼ₁, gⱼ, x, 1).
6 Comment: The array named x now contains a particular solution of the system Ax = b. If the vector y₂ of Step 4 is zero, the array x contains the solution of minimal length.
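A sketch of the result HS2 delivers in the minimal length case (y₂ := 0), written with library routines and made-up data; for a full row rank A the minimal length solution can be expressed as Aᵀ(AAᵀ)⁻¹b, which equals A⁺b:

```python
# Sketch: minimal length solution of an underdetermined full rank system.
import numpy as np

rng = np.random.default_rng(6)
m, n = 3, 6
A = rng.standard_normal((m, n))            # rank m almost surely
b = rng.standard_normal(m)

x = A.T @ np.linalg.solve(A @ A.T, b)      # minimal length solution
print(np.allclose(A @ x, b))               # solves the system
print(np.allclose(x, np.linalg.pinv(A) @ b))   # equals A+ b
```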
14 COMPUTING THE SOLUTION FOR PROBLEM LS WITH POSSIBLY DEFICIENT PSEUDORANK
other factors, such as the details of the computational algorithm, the value
of tolerance parameters used in the computation, and the effects of machine
round-off errors.
Algorithm HFTI applies in particular to the rank-deficient problems iden-
tified as Cases 1b, 2b, and 3b in Fig. 1.1. Strictly speaking, however, it is the
pseudorank rather than the rank of A that will determine the course of the
computation.
Using Theorems (2.3) and (3.19), the mathematical relations on which
Algorithm HFTI is based are the following:
The algorithm will determine the orthogonal matrix Q and the permutation matrix P so that R is upper triangular and R₁₁ is nonsingular.
The permutation matrix P arises implicitly as a result of the column in-
terchange strategy used in the algorithm. The column interchange strategy is
intimately related to the problem of determining the pseudorank k. An es-
sential requirement is that the submatrix R11 be nonsingular. Generally,
unless other criteria are suggested by special requirements of the application, one would prefer to have R₁₁ reasonably well-conditioned and ||R₂₂|| reasonably small.
One example of an exceptional case would be the case in which an n x n matrix A is known to have zero as a distinct eigenvalue. Then one would probably wish to set k = n - 1 rather than have the algorithm determine k.
Another exceptional case is the weighted approach to Problem LSE
(Chapter 22) where it is reasonable to permit R11 to be very ill-conditioned.
The column interchange strategy used in Algorithm HFTI is as follows.
For the construction of the jth Householder transformation, we consider columns j through n and select the one, say, column λ, whose sum of squares of components in rows j through m is greatest. The contents of columns j and λ are then interchanged.
by the matrix
Note that
and
Further, with the stated procedures for column interchange and pseudorank
determination, one has
where
The input to Algorithm HFTI will consist of the data A and b stored in the arrays called A and b, respectively, the integers m and n, and the nonnegative absolute tolerance parameter τ. The orthogonal matrix Q of Eq. (14.1) is a product of μ Householder transformations Qᵢ. The data that define these matrices occupy the lower triangular part of the array A on output plus μ additional stores in an array named h.
The array h is also used to store the squares of the lengths of columns of
certain submatrices generated during the computation. These numbers are
used in the selection of columns to be interchanged. This information can be
(and is) overwritten by the pivot scalars for the matrices Qᵢ.
The permutation matrix P is constructed as a product of transposition matrices, P = (1, p₁) ··· (μ, p_μ). Here (i, j) denotes the permutation matrix obtained by interchanging columns i and j of Iₙ. The integers pᵢ will be recorded in an array p. The orthogonal matrix K of Eq. (14.3) is a product of k Householder transformations Kᵢ. The data that define these matrices occupy the rectangular portion of the array A consisting of the first k rows of the last n - k columns plus k additional stores in an array g.
Figure 14.1 illustrates the output storage configuration for this decomposition when m = 6, n = 5, and k = Rank(A) = 3. If one wishes to retain complete information about the original matrix A, an extra array consisting of the diagonal terms of the matrix R₁₁ of Eq. (14.1) must be recorded. These would be needed to compute the matrix Q of that same equation. Since Q is applied immediately to the vector b in Algorithm HFTI, this additional storage is not needed.
Fig. 14.1 Output storage configuration for Algorithm HFTI (m = 6, n = 5, k = 3), including the column-interchange record.
Step Description
15 Comment: Here k < n. Next, determine the orthogonal transformations Kᵢ whose product constitutes K of Eq. (14.3).
16 For i := k, k - 1, ..., 1, execute Algorithm H1(i, k + 1, n, aᵢ₁, gᵢ, a₁₁, i - 1). (The parameters aᵢ₁ and a₁₁ each identify the first element of a row vector that is to be referenced.)
17 Set xₖ := bₖ/aₖₖ. If k ≤ 1, go to Step 19.
18 For i := k - 1, k - 2, ..., 1, set xᵢ := (bᵢ - Σⱼ₌ᵢ₊₁ᵏ aᵢⱼxⱼ)/aᵢᵢ [see Eq. (14.4)].
19 If k = n, go to Step 22.
20 Define the (n - k)-vector y₂ of Eq. (14.5) and store its components in xᵢ, i := k + 1, ..., n. In particular, set y₂ := 0 if the minimal length solution is desired.
21 For i := 1, ..., k, execute Algorithm H2(i, k + 1, n, aᵢ₁, gᵢ, x, 1). [See Eq. (14.6). Here aᵢ₁ identifies the first element of a row vector.]
22 For j := μ, μ - 1, ..., 1, do Step 23.
23 If pⱼ ≠ j, interchange the contents of xⱼ and xₚⱼ [see Eq. (14.6)].
A Fortran subroutine HFTI, implementing Algorithm HFTI, is given in
Appendix C. In the subroutine the algorithm has been generalized to handle
multiple right-side vectors. The subroutine also handles the (unlikely) case
of the pseudorank k being zero by setting the solution vector x to zero.
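The sketch below (an addition, not the subroutine itself) imitates the pseudorank logic of HFTI with SciPy's pivoted QR: k is chosen from a tolerance on the diagonal of R, a truncated triangular system is solved, and the interchanges are undone. It produces the basic solution rather than applying the right-side transformations K used for the minimal length solution when k < n:

```python
# Sketch: pseudorank determination and truncated solution via pivoted QR.
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(7)
A = rng.standard_normal((8, 5))
A[:, 4] = A[:, 0] + 1e-12 * rng.standard_normal(8)   # nearly dependent column
b = rng.standard_normal(8)

Q, R, piv = qr(A, pivoting=True)          # A[:, piv] = Q R
tau = 1e-8 * abs(R[0, 0])                 # absolute tolerance parameter
k = int(np.sum(np.abs(np.diag(R)) > tau)) # pseudorank
g = Q.T @ b
x = np.zeros(A.shape[1])
x[piv[:k]] = np.linalg.solve(R[:k, :k], g[:k])   # undo the interchanges
print("pseudorank:", k, " residual:", np.linalg.norm(A @ x - b))
```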
15 ANALYSIS OF COMPUTING ERRORS
FOR HOUSEHOLDER TRANSFORMATIONS
appropriate scaling can assure that numbers satisfying | x | < L can safely
be replaced by zero without affecting the accuracy of the results in the sense
of vector norms.
The number η characterizes the relative precision of the normalized floating point arithmetic. Following Wilkinson we use the notation
to indicate that z̄ is the machine number produced by the computer when its addition operation is applied to the machine numbers x and y. If z = x + y, then even assuming L ≤ |z| < U, it is frequently true that z̄ - z ≠ 0. For convenience this difference will be called round-off error regardless of whether the particular computer uses some type of rounding or simply truncation.
The number η is the smallest positive number such that whenever the true result z of one of the five operations, add, subtract, multiply, divide, or square root, is zero or satisfies L ≤ |z| < U, there exist numbers εᵢ bounded in magnitude by η satisfying the corresponding one of the following five equations.
The details of the derivation of these bounds are omitted since they are
somewhat lengthy and are very similar to the derivation of bounds given on
pp. 152-156 of Wilkinson (1965a). Note, however, that the bounds given
here are different from those in Wilkinson (1965a) primarily because we are assuming that all arithmetic is done with precision η, whereas it is assumed there that some operations are done with a precision of η². We defer discussion of such mixed precision arithmetic to Chapter 17.
sion of such mixed precision arithmetic to Chapter 17.
We next consider the application of a Householder transformation to an
m x n matrix C using Algorithm H2 or the optional capability of Algorithm H1 (10.22). Mathematically we wish to compute
where
Then
Equation (15.39) and the bound in Eq. (15.40) can be interpreted as showing that the computed matrix Aₖ₊₁ is the exact result of an orthogonal transformation of a matrix A + Hₖ where ||Hₖ||_F is small relative to ||A||_F.
Note that the result expressed by Eq. (15.39) and (15.40) is independent of the fact that the matrices Qᵢ were computed for the purpose of zeroing certain elements of Aᵢ. Equations (15.39) and (15.40) remain true for cases in which A (and thus Aₖ₊₁) is replaced by a matrix or vector not having any special relation to the matrices Qᵢ.
Furthermore if the computed matrices of Eq. (15.34) are used in the op-
posite order, say,
where
The orthogonal matrices Qᵢ in Eq. (15.42) are the same as those in Eq. (15.39).
This latter observation will be needed in the proofs of Theorems (16.18),
(16.36), (17.19), and (17.23).
Finally we shall need bounds on the error in solving a triangular system of equations. Let R be a k x k triangular matrix and c be a k-vector. Then the computed solution x̄ of
(15.44) Rx = c
will satisfy
For the derivation of this result, see Forsythe and Moler (1967), pp. 103-105.
In the course of proving Eq. (15.46) in the place cited, it is shown that
EXERCISE
with
and
Using Eq. (15.39) and (15.40) with A there replaced by the matrix [A : b] of Eq. (16.5), it follows that there exist an orthogonal matrix Q, a matrix G, and a vector f such that the computed quantities R, c, and d satisfy
with
and
From Eq. (15.45) and (15.46) the computed solution x will satisfy
with
To relate this computed vector x to the original least squares problem, define
the augmented matrix
Define
Then x is the least squares solution of Eq. (16.2). Using ||E|| ≤ ||G|| + ||S|| along with Eq. (16.8) to (16.10) establishes the bounds in Eq. (16.3) and (16.4). This completes the proof of Theorem (16.1).
(16.11) THEOREM (The square nonsingular problem)
Let A be an n x n matrix of pseudorank n and b be an n-vector. If x̄ is the solution of Ax = b computed using Algorithms HFT (11.4) and HS1 (11.10) or HFTI (14.9) with computer arithmetic of relative precision η, then x̄ is the exact solution of a problem
with
and
This theorem is simply a special case of Theorem (16.1) for the case n = m, so the bounds in Eq. (16.13) and (16.14) follow from the bounds in Eq. (16.3) and (16.4). We have stated Theorem (16.11) as a separate theorem because there is often independent interest in the square nonsingular case.
It is of interest to compare Theorem (16.11) with an analogous error
analysis theorem for Gaussian elimination algorithms with pivoting [Forsythe
and Moler (1967)].
In this case one has the inequality
Here the symbol ||·||_∞ denotes the max-row-sum norm defined for any m x n matrix B as
with
and
In place of Eq. (16.22) the computed lower triangular matrix R will satisfy
Furthermore from Eq. (15.45) and (15.46) it follows that in place of Eq.
(16.23) the computed vector y will satisfy
with
The orthogonal matrix Q in Eq. (16.29) is the same as the matrix Q in Eq.
(16.25) as noted in the remarks associated with Eq. (15.41) to (15.43).
Rewriting Eq. (16.27) as
then
Define
and
and
which establish the inequalities in Eq. (16.20) and (16.21), completing the proof of Theorem (16.18).
with
Then
and
where
and
In place of Eq. (16.42) the computed matrices R₁₁, R₁₂, and S will satisfy
where, using the results of Chapter 15, Qₖ is an exactly orthogonal matrix and
and
Here a is defined by Eq. (15.37), μ = min(m, n) as in Chapter 14, and R̄₂₂ denotes the computed matrix corresponding to R₂₂ of Eq. (14.1).
with
In Eq. (14.5) we shall consider only the case y₂ = 0. In place of Eq. (14.6)
with
The error coefficient in Eq. (16.57) is determined in the same way as that
in Eq. (16.51).
Define
where
Substituting Eq. (16.61) into Eq. (16.60) and augmenting the system with
additional rows shows that x is the minimal length least squares solution of
Thus
or, using Eq. (16.49) and (16.63), and using k ≤ n to simplify the final expression,
With Eq. (16.53) and (16.59) this completes the proof of Theorem (16.36).
17 ANALYSIS OF COMPUTING ERRORS FOR THE PROBLEM LS USING MIXED PRECISION ARITHMETIC
Then the bound in Eq. (15.40) on the error associated with multiplication
by k Householder transformations becomes
The error bound in Eq. (15.46) is already small relative to the error bounds from other sources in the complete solution of Problem LS, and thus the additional reduction of the overall error bounds due to the use of ω-precision arithmetic in solving Eq. (15.44) is very slight. For this reason we shall assume that η-precision rather than ω-precision arithmetic is used in solving Eq. (15.44).
With this mixed precision version of the algorithms, Theorems (16.1),
(16.11), (16.18) and (16.36) are replaced by the following four theorems.
Proofs are omitted due to the close analogy with those of Chapter 16.
(17.11) THEOREM (The full rank overdetermined least squares problem)
Let A be an m x n matrix of pseudorank n and b be an m-vector.
where
and
with
and
(17.19) THEOREM (The minimum length solution of the full rank underdeter-
mined problem)
Let A be an m x n matrix of pseudorank m and b be an m-vector. If x̄ is the solution of Ax = b computed using the Algorithms HBT (13.8) and HS2 (13.9) with mixed precision arithmetic of precisions η and ω ≤ η² as described above, then x̄ is close to the exact solution of a perturbed problem in the following sense: There exist a matrix E and a vector e such that x̄ is the minimum length solution of
with
and
with
Note that because of our assumptions on the a priori ordering of the columns of A, the quantity sₖ satisfies
Using the general fact that |eᵢⱼ| ≤ ||E||_F, one can obtain the inequality
from Eq. (17.13). Comparing the inequalities in Eq. (17.34) and (17.36), we note that the inequality in Eq. (17.34) will be most useful in those cases in which γᵢ is substantially smaller than ||A||_F. Analogous remarks hold regarding Eq. (17.14) and (17.35).
Recall that γᵢ denotes the magnitude of the largest element occurring in row i of any of the matrices A^(1), ..., A^(μ+1). Thus for γᵢ to be small (relative to ||A||_F) requires that the elements of row i of the original matrix A^(1) = A must be small and that elements in row i of the successively transformed matrices A^(k) must not grow substantially in magnitude.
Powell and Reid have given an example showing that if no restriction is
placed on the ordering of the rows of A, the growth of initially small elements
in some rows can be very large. They recommend the following row-inter-
change strategy: Suppose that a pivot column has been selected and moved
to the kth column of the storage array in preparation for the kth Householder
transformation. Determine a row index l such that
Powell and Reid report that using this combined column- and row-inter-
change strategy they have obtained satisfactory solutions to certain dispa-
rately row-weighted problems for which results were unsatisfactory when
rows of small elements were permitted to be used as pivotal rows.
18 COMPUTATION OF THE SINGULAR VALUE DECOMPOSITION AND SOLUTION OF PROBLEM LS
Section 1. INTRODUCTION
Included are additional details relating to the use of the singular value decom-
position in the analysis and solution of Problem LS.
For notational convenience and because it is the case in which one is most likely to apply singular value analysis, we shall assume that m ≥ n throughout this chapter. The case of m < n can be converted to this case by adjoining n - m rows of zeros to the matrix A or, in the case of Problem
LS, to the augmented data matrix [A : b]. This adjoining of zero rows can be
handled implicitly in a computer program so that it does not actually require
the storing of these zero rows or arithmetic operations involving these zero
elements. Such an implementation of singular value decomposition with
analysis and solution of Problem LS is provided by the set of Fortran sub-
routines SVA, SVDRS, and QRBD in Appendix C.
(a) Qₖ is orthogonal
(b) Rₖ is upper triangular
(c) σₖ is the kth shift parameter
Before stating the method of computing the shift parameters σₖ, notation will be introduced for the elements of the matrices Aₖ. It can be verified [see Exercise (18.46)] that A₁ being tridiagonal implies that all the matrices Aₖ, k = 2, 3, ..., will be tridiagonal also. We shall write the diagonal terms of each tridiagonal matrix Aₖ as aᵢ⁽ᵏ⁾, i = 1, ..., n, and the superdiagonal and subdiagonal terms as bᵢ⁽ᵏ⁾, i = 2, ..., n, for reference in this chapter and in Appendix B.
where
where Q and P are orthogonal and S is diagonal. Then the singular value decomposition of A is
We now treat the computation represented by Eq. (18.8). First note that if any eᵢ is zero, the matrix B separates into blocks that can be treated independently. Next we will show that if any qᵢ is zero, this permits the application of certain transformations which also produce a partition of the matrix.
Suppose that qₖ = 0, with qᵢ ≠ 0 for i ≠ k, and eⱼ ≠ 0, j = k + 1, ..., n. Premultiplying B by (n - k) Givens rotations Tᵢ, we will have
where all diagonal and superdiagonal elements of B₂′ are nonzero. Before we discuss B₂′ further, note that B₁′ has at least one zero singular value since its lower right corner element [position (k, k)] is zero.
When the algorithm later returns to operate on B₁′, this fact can be used to eliminate eₖ with the following sequence of rotations:
Here Rᵢ operates on columns i and k to zero position (i, k). For i > 1, the application of this rotation creates a nonzero entry in position (i - 1, k).
We return to consideration of B₂′ and for convenience revert to the symbol B to denote this bidiagonal matrix, all of whose diagonal and superdiagonal elements are nonzero. The symbol n continues to denote the order of B.
The singular value decomposition of B [Eq. (18.8)] will be obtained by an
iterative procedure of the form
where Uₖ and Vₖ are orthogonal and Bₖ is upper bidiagonal for all k. The choice of Uₖ and Vₖ is such that the matrix
is upper triangular.
The orthogonal matrix Uₖ is determined so that the product matrix
is upper bidiagonal.
The computational details differ from what would be suggested by the formulas above. In particular the matrix BₖᵀBₖ is never formed and the shift by σₖ is accomplished implicitly.
To simplify the notation we shall drop the subscripts indicating the itera-
tion stage, k.
Using the notation of (18.7), the lower right 2 x 2 submatrix of BᵀB is
Since we seek the root of Eq. (18.13) that is closest to qₙ² + eₙ², it is convenient to make the substitution
Then s satisfies
and
Then y satisfies
where
Thus, using Eq. (18.14) and (18.17), the root λ of Eq. (18.13) closest to qₙ² + eₙ² is
and
(18.25) the first column of Vᵀ(BᵀB - σI) is zero below the first element,
then Vᵀ(BᵀB - σI) is upper triangular.
Using this theorem the matrices V and U of Eq. (18.11) and (18.12) will
be computed as products of Givens rotations,
and
In the present algorithm one has Y in the factored form BᵀB where B is bidiagonal. One wishes to produce Ȳ in a factored form, Ȳ = B̄ᵀB̄. Thus B̄ must be of the form B̄ = UᵀBV where V is the same orthogonal matrix as in Eq. (18.28) and U is also orthogonal.
Turning now to the determination of the rotations R_i and T_i of Eq. (18.26)
and (18.27), note that the first column of (BᵀB − sI) is
with s given by Eq. (18.22). The first rotation R_1 is determined by the require-
ment that the second element in the first column of R_1ᵀ(BᵀB − sI) must be
zero.
The remaining rotations are determined in the order T_1, R_2, T_2, ...,
R_{n−1}, T_{n−1} and applied in the order indicated by the parentheses in the fol-
lowing expression:
Fig. 18.1 The chasing pattern for one QR sweep for the case of
n = 6.
where P is given by Eq. (18.33), the rotations T_i are all the premultiplying
Givens rotations that arise preceding and during the QR sweeps [see Eq.
(18.10) and (18.30)], and the matrices Q_i are the Householder transforma-
tions of Eq. (18.6).
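To make the chasing pattern of Fig. 18.1 concrete, the following is a minimal NumPy sketch (ours, not the book's Fortran subroutine QRBD) of one implicitly shifted QR sweep on an upper bidiagonal matrix. The shift is assumed to be computed as in Eq. (18.22) and supplied by the caller; the names qr_sweep and givens_to_zero are illustrative.

    import numpy as np

    def givens_to_zero(a, b):
        # Rotation coefficients with c*a + s*b = r and -s*a + c*b = 0.
        r = np.hypot(a, b)
        return (1.0, 0.0) if r == 0.0 else (a / r, b / r)

    def qr_sweep(B, shift):
        # One sweep B := U^T B V; B remains upper bidiagonal (n >= 2).
        B = B.copy()
        n = B.shape[0]
        # First rotation: columns 0 and 1, chosen from the first column of
        # B^T B - (shift)*I, i.e., from the pair (q_1^2 - s, q_1*e_2).
        c, s = givens_to_zero(B[0, 0]**2 - shift, B[0, 0] * B[0, 1])
        B[:, 0:2] = B[:, 0:2] @ np.array([[c, -s], [s, c]])
        for k in range(n - 1):
            # Row rotation T: zeros the subdiagonal bulge at B[k+1, k].
            c, s = givens_to_zero(B[k, k], B[k + 1, k])
            B[k:k + 2, :] = np.array([[c, s], [-s, c]]) @ B[k:k + 2, :]
            if k < n - 2:
                # Column rotation R: zeros the bulge at B[k, k+2].
                c, s = givens_to_zero(B[k, k + 1], B[k, k + 2])
                B[:, k + 1:k + 3] = B[:, k + 1:k + 3] @ np.array([[c, -s], [s, c]])
        return B

Repeated sweeps with suitable shifts drive the superdiagonal elements toward zero, which is the termination test described below.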
where
and
also satisfies
where g^(1) and g^(2) are the subvectors of g defined by Eq. (18.38).
The columns of V associated with small singular values may be inter-
preted as indicating near linear dependencies among the columns of A. This
has been noted in Chapter 12. Further remarks on the practical use of the
singular values and the quantities pk are given in Chapter 25, Section 6.
First, if mn is large and m > n, we could arrange that the data matrix
[A : b] be first reduced to an (n + 1) × (n + 1) upper triangular matrix by sequential
Householder processing (see Chapter 27). The matrix A is reduced to bidi-
agonal form B as indicated in Eq. (18.6). The nonzero elements of B replace
the corresponding elements of A in the storage array called A. The
transformations Q_i of Eq. (18.6) may be applied to b as they are formed and
need not be saved. The vector resulting from this transformation replaces b
in the storage array called b. The transformations H_i of Eq. (18.6) must be
saved in the storage space that becomes available in the upper triangle of the
A array plus an n-array, called h, say.
After the bidiagonalization the nonzero elements of B are copied into the
two n-arrays q and e, occupying locations q_i, i = 1, ..., n, and e_i, i = 2, ...,
n, with location e_1 available for use as working storage by the QR algorithm.
The computation of the matrix V of Eq. (18.36) is initiated by explicitly
forming the product (H_2 ··· H_{n−1}) required in Eq. (18.34). This computation
can be organized (and is so organized in the Fortran subroutine SVDRS of
Appendix C) so that the resulting product matrix occupies the first n rows of
the storage array A, and no auxiliary arrays of storage are required.
The QR algorithm is applied to B, as represented by the data stored in
arrays q and e. As each rotation R_i is produced during the QR algorithm it
is multiplied into the partial product matrix stored in the A array for the
purpose of forming V [see Eq. (18.34)]. Similarly, each rotation T_i is multi-
plied into the vector stored in the array b in order to form the vector Uᵀb
[see Eq. (18.35) and (18.38)].
At the termination of the QR iterations the numbers stored in the loca-
tions e_i, i = 2, ..., n, will be small. The numbers stored in q_i, i = 1, ..., n,
must next be set to be nonnegative and sorted. Application of these sign
changes and permutations to the storage array A completes the computation
of the matrix V [see Eq. (18.34)]. Application of the permutations to the stor-
age array b completes the computation of g = Uᵀb [see Eq. (18.35) and
(18.38)].
If desired the candidate solutions of Eq. (18.41) and (18.42) can be com-
puted and stored as column vectors in the upper n × n submatrix of the
array A, overwriting the matrix V.
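In dense form, and assuming Eq. (18.41) and (18.42) define the candidate solutions as the partial sums x^(k) = Σ_{j≤k} (g_j/s_j)v_j, they can be computed directly from an SVD; a sketch (an illustration, not the packed-storage computation just described):

    import numpy as np

    def svd_candidates(A, b):
        # Columns 0, ..., n-1 of the result hold x^(1), ..., x^(n).
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        g = U.T @ b                                        # g = U^T b
        coef = np.divide(g, s, out=np.zeros_like(g), where=s > 0)
        return np.cumsum(Vt.T * coef, axis=1)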
The Fortran subroutine SVDRS (Appendix C) contains the additional
feature of initially scanning the matrix A for any entirely zero columns. If l
such columns are found, the columns are permuted so that the first n − l
columns are nonzero. Such a matrix has at least l of the computed singular
values exactly zero.
This feature was introduced into the subroutine so that a user can delete
a variable from a problem by the simple process of zeroing the corresponding
column of the matrix A. Without this procedure some of the computed sin-
gular values that should be zero would have values of approximately η‖A‖.
In many circumstances this would be satisfactory, but in some contexts it is
desirable to produce the exact zero singular values.
The subroutine SVDRS also tests the rows of A, and if necessary permutes
the rows of the augmented matrix [A : b] so that if A contains l nonzero rows
the first min (l, n) rows of the permuted matrix are nonzero. This is only
significant when l < n, in which case A has at least n − l zero singular values
and this process assures that at least n − l of the computed singular values
will be exactly zero.
When the singular value decomposition is computed for the purpose of
reaching a better understanding of a particular least squares problem, it
is desirable to have a program that prints various quantities derived from
the singular value decomposition in a convenient format. Such a Fortran
subroutine, SVA, is described and included in Appendix C. An example of
the use of SVA is given in Chapter 26.
EXERCISES
(18.46) Prove that if A_1 is symmetric and tridiagonal, then so are the A_k,
k = 2, ..., of Eq. (18.3).
(18.47) Prove that the special QR algorithm [Eq. (18.11), (18.12), and
(18.22)] converges in one iteration for 2 × 2 bidiagonal matrices.
19 OTHER METHODS FOR
LEAST SQUARES PROBLEMS
Section 1. Normal Equations with Cholesky Decomposition
Section 2. Modified Gram-Schmidt Orthogonalization
Table 19.1  (An operation is a multiply or divide plus an add.)

    Method                                                 Asymptotic Number
                                                           of Operations
    Householder triangularization                          mn² − n³/3
    Singular value analysis:
      Direct application to A                              2mn² + s(n)†
      Householder reduction of A to triangular R
        plus singular value analysis of R                  mn² + 5n³/3 + s(n)†
    Form normal equations                                  mn²/2
    Cholesky solution of normal equations                  n³/6
    Gauss-Jordan solution of normal equations
      (for stepwise regression)                            n³/3
    Eigenvalue analysis of normal equations                4n³/3 + s(n)†
    Gram-Schmidt (either classical or modified)            mn²

†The term s(n) accounts for the iterative phase of the singular value or eigenvalue com-
putation. Assuming convergence of the QR algorithm in about 2n sweeps and implementa-
tion as in the Fortran subroutine QRBD (Appendix C), s(n) would be approximately 4n³.
which is called the system of normal equations for the problem. Here
and
Equation (19.1) arises directly from the condition that the residual vector
b — Ax for the least squares solution must be orthogonal to the column
space of A. This condition is expressed by the equation
which is seen to be equivalent to Eq. (19.1) using Eq. (19.2) and (19.3).
While P has the same rank as A and thus can be singular, it is nevertheless
true that Eq. (19.1) is always consistent. To verify this note that, from Eq.
(19.3), the vector d is in the row space of A but from Eq. (19.2), the row space
of A is identical with the column space of P.
The system (19.1) could be solved by any method that purports to solve
a square consistent system of linear equations. From Eq. (19.2), however,
the matrix P is symmetric and nonnegative definite, which permits the use of
Cholesky elimination with its excellent properties of economy and stability
[Wilkinson (1965a), pp. 231-232].
The Cholesky method, which we will now describe, is based on the fact
that there exists an n × n upper triangular (real) matrix U such that
and thus the solution x of Eq. (19.1) can be obtained by solving the two tri-
angular systems
and
and
Note that P and d of Eq. (19.8) denote the same matrix and vector as in Eq.
(19.2) and (19.3) and ω satisfies
with
where U and y of Eq. (19.10) satisfy Eq. (19.5) and (19.6). Furthermore it is
easily verified that p of Eq. (19.10) satisfies
Solving for u_ij in the equation involving p_ij leads to the following equa-
tions, which constitute the Cholesky (also called square root or Banachiewicz)
factorization algorithm:
In Eq. (19.12) and (19.14) the summation term is taken to be zero when
i = 1.
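A minimal NumPy sketch of these recurrences (ours), including the always admissible choice u_tj = 0 for the rank-deficient case discussed next:

    import numpy as np

    def cholesky_upper(P):
        # Upper triangular U with U^T U = P, P symmetric nonnegative definite.
        n = P.shape[0]
        U = np.zeros((n, n))
        for i in range(n):
            v = P[i, i] - U[:i, i] @ U[:i, i]   # summation term; zero for i = 0
            if v <= 0.0:
                continue                        # rank-deficient: leave row i zero
            U[i, i] = np.sqrt(v)
            for j in range(i + 1, n):
                U[i, j] = (P[i, j] - U[:i, i] @ U[:i, j]) / U[i, i]
        return U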
Theoretically, if Rank (A) = n, then v_i > 0, i = 1, ..., n. Equations
(19.12) to (19.14) then define unique values for each u_ij.
If Rank (A) = k < n, however, there will be a first value of i, say i = t,
such that v_t = 0. Then for i = t the numerator on the right side of Eq. (19.14)
must also be zero for j = t + 1, ..., n, since solutions exist for all the equa-
tions in (19.11). In this case there is some freedom in the assignment of values
to u_tj, j = t + 1, ..., n. One set of values that is always admissible is u_tj = 0,
j = t + 1, ..., n [see Exercise (19.37)].
It follows that the theoretical algorithm of Eq. (19.12) to (19.14) provides
a solution for Eq. (19.5) regardless of the rank of A if it is modified so that
Eq. (19.14) is replaced by
where d and ω² are defined by Eq. (19.8). Then with the change of variables
x = Vp,
Problem LS is equivalent to the problem
Loss of information can occur in forming the normal equations and using
the Cholesky decomposition. This can be illustrated by con-
sidering the following example.
Define the 3 × 2 matrix
Suppose the value of ε is such that it is significant to the problem, ε > 100η,
say, but ε² < η, so that when computing with relative precision η we have
1 + ε ≠ 1 but 3 + ε² is computed as 3. Thus instead of computing
the matrix R that satisfies RᵀR = AᵀA, the Cholesky method [Eq. (19.12) to
(19.14)] yields
plus random errors of the order of magnitude 3η and of arbitrary sign. The
correct result at this point would of course be
Thus using arithmetic of relative precision η, no significant digits are ob-
tained in the element r_22. Consequently the matrix R is not distinguishable
from a singular matrix. This difficulty is not a defect of the Cholesky decom-
position algorithm but rather reflects the fact that the column vectors of AᵀA
are so nearly parallel (linearly dependent) that the fact that they are not
parallel cannot be established using arithmetic of relative precision η.
Contrast this situation with the use of Householder transformations to
triangularize A directly without forming ATA. Using Eq. (10.5) to (10.11) one
would compute
Then to premultiply the second column, a_2, of A by I + b⁻¹uuᵀ one computes
The important point to note is that the second and third components of
ā_2, which are of order of magnitude ε, were computed as the difference of
quantities of order of magnitude unity. Thus, using arithmetic of relative
precision η, these components are not lost in the round-off error.
The final step of Householder triangularization would be to replace the
second component of ā_2 by the negative euclidean norm of the second and
third components, and this step involves no numerical difficulties.
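The entries of the 3 × 2 example do not reproduce legibly in this copy, so the following demonstration uses an assumed matrix of the same character (two nearly parallel columns differing at order ε). It shows the normal-equations matrix collapsing to exact rank one while orthogonal triangularization applied directly to A retains the order-ε information:

    import numpy as np

    eps = 1e-10                          # eps > 100*eta, but eps**2 < eta ~ 1e-16
    A = np.array([[1.0, 1.0],
                  [1.0, 1.0 + eps],
                  [1.0, 1.0 - eps]])

    P = A.T @ A                          # the entry 3 + 2*eps**2 rounds to 3
    print(np.linalg.matrix_rank(P))      # 1: the information about eps is lost
    R = np.linalg.qr(A, mode='r')
    print(abs(R[1, 1]))                  # ~1.4e-10: r_22 remains nonzero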
where
where
In fact, if one poses the problem of choosing the numbers α_k to minimize the
norm of the vector defined by expression (19.24), it is easily verified that the
minimal length vector is given by Eq. (19.26). Thus the vector a_j of Eq. (19.22)
and the vector a_j^(i) of Eq. (19.25) are related by the inequality ||a_j^(i)|| ≤ ||a_j||.
The superior numerical stability of this modified algorithm derives from this
fact.
The amount of computation and storage required is not increased by this
modification since the vectors a_j^(i) can be computed recursively rather than
using Eq. (19.26) explicitly. Furthermore, the vector a_j^(i+1) can replace the
vector a_j^(i) in storage.
This algorithm, which has become known as the modified Gram-Schmidt
(MGS) algorithm [Rice (1966)], is described by the following equations:
To use MGS in the solution of Problem LS one can form the augmented
matrix
where the matrix R is upper triangular with unit diagonal elements. The
strictly upper triangular elements of R are given by Eq. (19.30). The vectors
q_j given by Eq. (19.28) constitute the column vectors of the m × (n + 1)
matrix Q. One also obtains the (n + 1) × (n + 1) diagonal matrix D with
diagonal elements d_i, i = 1, ..., n + 1, given by Eq. (19.29).
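Since Eq. (19.27) to (19.30) do not reproduce legibly in this copy, the following sketch implements the common MGS variant in which Q is given unit columns (absorbing the diagonal matrix D into R):

    import numpy as np

    def mgs(Ab):
        # Modified Gram-Schmidt on [A : b]; assumes full column rank.
        Q = np.array(Ab, dtype=float)
        k = Q.shape[1]
        R = np.zeros((k, k))
        for i in range(k):
            R[i, i] = np.linalg.norm(Q[:, i])
            Q[:, i] /= R[i, i]
            for j in range(i + 1, k):
                R[i, j] = Q[:, i] @ Q[:, j]
                Q[:, j] -= R[i, j] * Q[:, i]   # a_j^(i+1) replaces a_j^(i)
        return Q, R

With Q, R = mgs(np.column_stack([A, b])), the least squares solution of Ax ≅ b satisfies R[:n, :n] x = R[:n, n], and the minimal residual norm is |R[n, n]|.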
For the purpose of mathematical discussion leading to the solution of
Problem LS, let Q be an m × m orthogonal matrix satisfying
Then
and
Therefore the minimum value of the quantity ||Ax − b|| is |d_{n+1}| and this
value is attained by the vector x̄, which satisfies the triangular system
For a given precision of arithmetic MGS has about the same numerical
stability as Householder triangularization. The error analysis given in Björck
(1967a) obtains the result that the solution to Ax ≅ b computed using MGS
with mixed precision (η, η²) is the exact solution to a perturbed problem
(A + E)x ≅ b + f with
and
EXERCISES
(19.38) [Peters and Wilkinson (1970)] Assume Rank (A_{m×n}) = n and m ≥ n.
By Gaussian elimination (employing partial pivoting) the decomposi-
tion A = PLR can be obtained where L_{m×n} is lower triangular with
unit diagonal elements, R_{n×n} is upper triangular, and P is a per-
mutation matrix accounting for the row interchanges.
(a) Count operations needed to compute this decomposition.
(b) Problem LS can be solved by solving LᵀLy = LᵀPᵀb by the
Cholesky method and then solving Rx = y. Count the oper-
ations needed to form LᵀL and to compute its Cholesky
factorization.
(c) Show that the total operation count of this method is the same
as for the Householder method (see Table 19.1). (We suggest
determining only the coefficients of mn² and n³ in these opera-
tion counts.)
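A sketch of the method of this exercise, with SciPy's LU factorization standing in for the elimination described:

    import numpy as np
    import scipy.linalg as sla

    def peters_wilkinson(A, b):
        # A = PLR; then L^T L y = L^T P^T b by Cholesky; then Rx = y.
        P, L, R = sla.lu(A)                # L is m x n, unit lower trapezoidal
        U = sla.cholesky(L.T @ L)          # U^T U = L^T L; L is well conditioned
        z = sla.solve_triangular(U.T, L.T @ (P.T @ b), lower=True)
        y = sla.solve_triangular(U, z)
        return sla.solve_triangular(R, y)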
(19.39) Let A be an m × n matrix of rank n. Let A = QR be the decom-
position of A obtained by the (exact) application of the modified
Gram-Schmidt algorithm to A. Assume Q and R are adjusted so
that the columns of Q have unit euclidean length. Consider the
(exact) application of the algorithm HFT (11.4) to the (m + n) × n
matrix formed by placing an n × n zero block above A, and let B
(n × n) and C (m × n) denote the data that replace this block and A in
storage after execution of HFT. Show that B is the same as R
to within signs of rows and C is the same as Q to within signs
and normalization of columns.
(19.40) (Cholesky Method Without Square Roots) In place of Eq. (19.5)
consider the decomposition WᵀDW = P, where W is upper trian-
gular with unit diagonal elements and D is diagonal. Derive
formulas analogous to Eq. (19.12) to (19.14) for computing the
diagonal elements of D and the strictly upper triangular elements
of W. Show that using this decomposition of P [Eq. (19.8)] prob-
lem (19.1) can be solved without computing any square roots.
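A sketch of the factorization asked for (names ours):

    import numpy as np

    def ldlt(P):
        # W^T D W = P with W unit upper triangular, D diagonal; no square roots.
        n = P.shape[0]
        W = np.eye(n)
        d = np.zeros(n)
        for i in range(n):
            d[i] = P[i, i] - np.sum(d[:i] * W[:i, i]**2)
            for j in range(i + 1, n):
                W[i, j] = (P[i, j] - np.sum(d[:i] * W[:i, i] * W[:i, j])) / d[i]
        return W, d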
20 LINEAR LEAST SQUARES WITH
LINEAR EQUALITY CONSTRAINTS USING A BASIS OF THE NULL SPACE
Clearly Problem LSE has a solution if and only if Eq. (20.2) is consistent.
We shall assume the consistency of Eq. (20.2) throughout Chapters 20, 21,
22, and 23. In the usual practical cases of this problem one would have
n > m_1 = k_1, which would assure that Eq. (20.2) is consistent and has more
than one solution.
It will subsequently be shown that, if a solution for Problem LSE exists,
it is unique if and only if the augmented matrix is of rank n. In the case
of nonuniqueness there is a unique solution of minimal length.
Clearly Problem LSE could be generalized to the case in which Eq. (20.2)
is inconsistent but is interpreted in a least squares sense. We have not seen this
situation arise in practice; however, our discussion of Problem LSE would
apply to that case with little modification.
The three algorithms to be given for Problem LSE are each compact in
the sense that no two-dimensional arrays of computer storage are needed
beyond that needed to store the problem data. Each of the three methods can
be interpreted as having three stages:
The first method [Hanson and Lawson (1969)], which will be described in
this chapter, makes use of an orthogonal basis for the null space of the matrix
of the constraint equations. This method uses both postmultiplication and
premultiplication of the given matrix by orthogonal transformation
matrices.
If the problem does not have a unique solution, this method has the
property that the (unique) minimum length solution of the derived uncon-
strained problem gives rise to the (unique) minimum length solution of the
original constrained problem. Thus this first method is suggested for use
if it is expected that the problem may have deficient pseudorank and if stabi-
lization methods (see Chapter 25) based on the notion of limiting the length
of the solution vector are to be used.
This method is amenable to numerically stable updating techniques for
solving a sequence of LSE problems in which the matrix C is successively
changed by adding or deleting rows. The method is used in this way by Stoer
(1971) in his algorithm for Problem LSI.
Furthermore, this first method is of theoretical interest. In Theorem
(20.31) this method is used to show the existence of an unconstrained least
squares problem which is equivalent to Problem LSE in the sense of having
the same set of solutions for any given right-side vectors d and f. This permits
the application to Problem LSE of certain computational procedures, such
as singular value analysis.
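The structure of this first method can be sketched as follows, assuming for simplicity that C has full row rank m_1 (the chapter's development handles general rank; the helper names are ours):

    import numpy as np
    import scipy.linalg as sla

    def lse_nullspace(C, d, E, f):
        # Minimize ||Ex - f|| subject to Cx = d, assuming Rank(C) = m1.
        m1, n = C.shape
        Q, R = np.linalg.qr(C.T, mode='complete')   # C^T = Q [R; 0]
        K1, K2 = Q[:, :m1], Q[:, m1:]               # K2 spans the null space of C
        x1 = K1 @ sla.solve_triangular(R[:m1, :].T, d, lower=True)
        y2 = np.linalg.lstsq(E @ K2, f - E @ x1, rcond=None)[0]
        return x1 + K2 @ y2

Because x1 is orthogonal to the columns of K2 and lstsq returns the minimal length y2, the result is the minimum length solution of the constrained problem.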
If
then from Theorem (2.3), Eq. (2.8), it follows that x may be represented as
where
where
or equivalently
and thus
or equivalently
From Eq. (7.5) the unique minimal length solution vector for this problem
is given by
138 LINEAR EQUALITY CONSTRAINTS USING A BASIS OF NULL SPACE CHAP. 20
Thus using Eq. (20.7) a solution vector for Problem LSE is given by
It follows that x is the unique minimum length solution vector for Problem
LSE.
It remains to relate the uniqueness of the solution of Problem LSE to
k — Rank . Clearly if k < n there exists an H-vector w ^ 0 satisfying
whose rank is also n. Having only n columns, all columns must be indepen-
dent. Since C#2 = 0, it follows that £tf2 must be of rank n — kv. Thus $ of
Eq. (20.13) is the unique vector minimizing Eq. (20.12), and consequently x
of Eq. (20.14) is the unique solution of Problem LSE.
This completes the proof of Theorem (20.9).
Step Description
4 For j := 2, ..., m_1, set
(20.27) d = (0.1376)
(20.28) f = (0.6593, 0.9666)ᵀ
if we define
and
But Eq. (20.30) implies that x̃ is the unique minimum length solution to
the problem of minimizing
where
and
Other relations that follow directly from these and will be used in the
proof are
and
which is symmetric.
and
According to Theorem (20.31) this matrix has the property that for an arbi-
trary one-dimensional vector, d, and two-dimensional vector, f, the solution
of the least squares problem
and
and
There are a variety of ways in which these steps can be accomplished. For
example, following Björck and Golub (1967), suppose a QR decomposition
of C_1 is computed so that
where Q_1 is orthogonal and C̃_1 is upper triangular. Then Eq. (21.4) and (21.5)
can be written as
and
Compute
and
(using Householder's method, for example) for a "small" but nonzero value of ε.
Then the solution x(ε) of the least squares problem (22.1) is "close" [in a sense
made precise by inequality (22.38) and Eq. (22.37)] to the solution x̄ of Problem
LSE.
This general approach is of practical interest since some existing linear
least squares subroutines or programs can effectively solve Problem LSE by
means of solving the system of Eq. (22.1).
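A minimal sketch of the idea (with an SVD-based solver standing in for the Householder method with row interchanges discussed below):

    import numpy as np

    def lse_by_weighting(C, d, E, f, eps=1e-8):
        # Problem (22.1): rows of C first, the downweighted rows eps*E below.
        A = np.vstack([C, eps * E])
        b = np.concatenate([d, eps * f])
        return np.linalg.lstsq(A, b, rcond=None)[0]

The objective is ||Cx − d||² + ε²||Ex − f||², so as ε → 0 the constraint equations dominate.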
An apparent practical drawback of this idea is the fact that the matrix of
problem (22.1) becomes arbitrarily poorly conditioned [assuming Rank (C)
< n] as the "downweighting parameter" ε is made smaller. This observation
certainly does limit the practicality of the idea if problem (22.1) is solved by
the method of normal equations [see Eq. (19.1)]. Thus the normal equation
and unless the matrices involved have very special structure, the computer
representation of the matrix CᵀC + ε²EᵀE will be indistinguishable from
the matrix CᵀC (and Cᵀd + ε²Eᵀf will be indistinguishable from Cᵀd) when
ε is sufficiently small.
Nevertheless, Powell and Reid (1968a and 1968b) have found experi-
mentally that a satisfactory solution to a disparately weighted problem such
as Eq. (22.1) can be computed by Householder transformations if care is
taken to introduce appropriate row interchanges. The reader is referred to
Chapter 17, in which the principal theoretical results of Powell and Reid
(1968a) are presented in the form of three theorems.
In Powell and Reid (1968a) the interchange rules preceding the construc-
tion of the kth Householder transformation are as follows. First, do the usual
column interchange as in Algorithm HFTI (14.9), moving the column whose
euclidean length from row k through m is greatest into column k. Next scan
the elements in column k from row k through m. If the element of largest
magnitude is in row l, interchange rows l and k.
This row interchange rule would generally be somewhat overcautious for
use in problem (22.1). The main point underlying the analysis of Powell and
Reid (1968a) is that a very damaging loss of numerical information occurs if
the pivot element v_p [using the notation of Eq. (10.5) to (10.11)] is significant
for the problem but of insignificant magnitude relative to some elements v_i,
i > p. By "significant for the problem" we mean that the solution would be
significantly changed if v_p were replaced by zero. But if |v_p| is sufficiently
small relative to some of the numbers |v_i|, i > p, then v_p will be computa-
tionally indistinguishable from zero in the calculation of s and u_p by Eq. (10.5)
and (10.7), respectively.
Suppose the nonzero elements of C and E in problem (22.1) are of the
same order of magnitude so that any large disparity in sizes of nonzero ele-
ments of the coefficient matrix of Eq. (22.1) is due to the small multiplier ε.
Then the disastrous loss of numerical information mentioned above would
generally occur only if the pivot element v_p [of Eq. (10.5)] were chosen from
a row of εE (say, v_p = εe_ij) while some row of C (say, row l) has not yet been
used as a pivot row and contains an element (say, c_lj) such that |c_lj| > ε|e_ij|.
This situation can be avoided by keeping the rows in the order indicated in
Eq. (22.1) so that all rows of C are used as pivot rows before any row of εE
is used. Here we have used the symbols εE and C to denote those matrices or
their successors in the algorithm.
We now turn to an analysis of convergence of the solution vector of
problem (22.1) to the solution vector of Problem LSE as ε → 0. First consider
the special case of Problem LSE in which the matrix C is diagonal and E is
an identity matrix. This case is easily understood, and it will be shown
subsequently that the analysis of more general cases is reducible to this special
case.
Define
and
Together the following two lemmas show that the solution of the down-
weighted least squares problem (22.1) converges to the solution of Problem
LSE, as ε → 0, for the special case of C and E as given in Eq. (22.2) and
(22.3).
(22.4) LEMMA
Suppose that C and E are given by Eq. (22.2) and (22.3). Then the solu-
tion of Problem LSE, using the notation of Eq. (20.2) and (20.3), is
given by
The length of Ex̄ − f is
(22.7) LEMMA
Let C and E be as in Eq. (22.2) and (22.3). The unique least squares
solution of problem (22.1) is given by
Lemma (22.7) can be easily verified by forming and solving the normal
equations (CᵀC + ε²EᵀE)x = (Cᵀd + ε²Eᵀf). With the hypotheses present
on C and E this system has a diagonal coefficient matrix from which Eq. (22.8)
easily follows.
For convenience in expressing the difference between x(ε) and x̄ define
the vector-valued function h(ε) by
and
then
Using Eq. (22.9) and (22.11) the vector norm of the difference between
x(ε) and x̄ is bounded as follows:
Thus in order to have ||x(ε) − x̄|| ≤ η||x̄|| it suffices to choose |ε| ≤ ε_0,
where
Note that Eq. (22.18) does not apply if ||f − x̄|| = 0. In this case, however,
Eq. (22.1) is consistent and has the same solution for all ε ≠ 0.
We now consider Problem LSE for the more general case in which C is
m_1 × n of rank k_1 ≤ m_1 ≤ n, E is m_2 × n, and the matrix [Cᵀ : Eᵀ]ᵀ is of rank
n. It is further assumed that the constraint equations Cx = d are consistent.
By appropriate changes of variables we shall reduce this problem to the
special one previously considered. First note that for |ε| < 1 the least squares
problem (22.1) is exactly equivalent to the least squares problem
where
where
Let
be a singular value decomposition of CR⁻¹ [see Theorem (4.1)]. Since C is
of rank k_1, CR⁻¹ is also, and we have
Then with Pᵀy = z the least squares problem of Eq. (22.24) is equivalent
to the least squares problem
Since Cx = d is consistent,
Thus problem (22.1) has been transformed to problem (22.26), where the
coefficient matrix consists of a zero matrix and two diagonal matrices, one of
which is a scalar multiple of the identity matrix.
From Lemmas (22.4) and (22.7) it follows that as ρ tends to zero the solu-
tion z(ρ) of problem (22.26) converges to the (unique) solution z̄ of the prob-
lem
and
where R and K are defined by Eq. (22.22) and (22.25), respectively, Problem
LSE (20.1) is converted to problem (22.28).
Thus using Eq. (22.29) and (22.30) we obtain
In order to apply the bounds in Eq. (22.11) using the quantities appearing
in problem (22.26), define
and
where g̃_i and d̃_i are the ith components of the vectors of Eq. (22.35) and
(22.36), respectively.
For x̄ ≠ 0, one obtains the relative precision ||x(ρ) − x̄|| ≤ η||x̄|| by
choosing 0 < ρ ≤ ρ_0 where
Then defining
not generally compute in solving problem (22.1). Note that there is no positive
lower bound on permissible values of |ε| imposed by the mathematical analysis
of the problem or the numerical stability of the Householder method as
analyzed in Powell and Reid (1968a). The only practical lower limit is set by
the computer's floating point underflow value, L, referred to in Chapter 15.
For example, consider a machine with η = 10⁻⁸ and L = 10⁻³⁸. Suppose
all nonzero data in the matrices C and E and the vectors d and f are approxi-
mately of unit magnitude.
In practice one would probably need to have ε < η^(1/2) = 10⁻⁴ and ε > L
= 10⁻³⁸. With this wide range available one might, for example, decide to
set ε ≈ 10⁻¹⁶.
As a numerical illustration of this weighted approach to solving Problem
LSE, consider the data given in Eq. (20.25) to (20.28). We now formulate this
as the weighted least squares problem
    γ        δ(γ)
     1       3.7 × 10⁻²
     2       3.7 × 10⁻⁴
     3       3.6 × 10⁻⁶
     4       6.3 × 10⁻⁸
     5       2.6 × 10⁻⁷
     6       9.1 × 10⁻⁸
     7       9.1 × 10⁻⁸
     8       1.2 × 10⁻⁷
     9       4.5 × 10⁻⁸
    10       5.8 × 10⁻⁸
     ⋮
    36       8.7 × 10⁻⁸
    37       3.6 × 10⁻⁹
    38       ε too small; numbers multiplied by ε underflow to zero.
EXERCISE
Show that the product matrix B has zeros below the first ele-
ment in the first column and thus that an appropriately constructed
sequence of m_1 matrices like G can be used to zero all elements
below the diagonal in the first m_1 columns of the coefficient matrix
of problem (22.1). Show that this sequence of operations has the
same effect as Eq. (21.11) to (21.14).
23 LINEAR LEAST SQUARES WITH
LINEAR INEQUALITY CONSTRAINTS
Section 1. Introduction
Section 2. Characterization of a Solution
Section 3. Problem NNLS
Section 4. Problem LDP
Section 5. Converting Problem LSI to Problem LDP
Section 6. Problem LSI with Equality Constraints
Section 7. An Example of Constrained Curve Fitting
Section 1. INTRODUCTION
The following important special cases of Problem LSI will also be treated in
detail:
(23.2) PROBLEM NNLS (Nonnegative Least Squares)
Minimize ||Ex − f|| subject to x ≥ 0.
(23.3) PROBLEM LDP (Least Distance Programming)
Minimize ||x|| subject to Gx ≥ h.
Conditions characterizing a solution for Problem LSI are the subject of
the Kuhn-Tucker theorem. This theorem is stated and discussed in Section
2 of this chapter.
In Section 3 Problem NNLS is treated. A solution algorithm, also called
NNLS, is presented. This algorithm is fundamental for the subsequent
algorithms to be discussed in this chapter. A Fortran implementation of
Algorithm NNLS is given in Appendix C as subroutine NNLS.
In Section 4 it is shown that the availability of an algorithm for Problem
NNLS makes possible an elegantly simple algorithm for Problem LDP.
Besides stating Algorithm LDP in Section 4, a Fortran implementation, sub-
routine LDP, is given in Appendix C.
The problem of determining whether or not a set of linear inequalities
Gx ≥ h is consistent, and if consistent finding some feasible vector, arises in
various contexts. Algorithm LDP can of course be used for this purpose.
The fact that no assumptions need be made regarding the rank of G or the
relative row and column dimensions of G may make this method particularly
useful for some feasibility problems.
In Section 5 the general problem LSI, having full column rank, is treated
by transforming it to Problem LDP. Problem LSI with equality constraints
is treated in Section 6.
Finally in Section 7 a numerical example of curve fitting with inequality
constraints is presented as an application of these methods for handling
constrained least squares problems. A Fortran program, PROG6, which
carries out this example, is given in Appendix C.
The following theorem characterizes the solution vector for Problem LSI:
(23.4) THEOREM (Kuhn-Tucker Conditions for Problem LSI)
An n-vector x̄ is a solution for Problem LSI (23.1) if and only if there
exists an m-vector ȳ and a partitioning of the integers 1 through m
into subsets ℰ and 𝒮 such that
where
This theorem may be interpreted as follows. Let g_i denote the ith row
vector of the matrix G. The ith constraint, g_iᵀx ≥ h_i, defines a feasible half-
space, {x : g_iᵀx ≥ h_i}. The vector g_i is orthogonal (normal) to the bounding
hyperplane of this halfspace and is directed into the feasible halfspace. The
point x̄ is interior to the halfspaces indexed in 𝒮 (𝒮 for slack) and on the
boundary of the halfspaces indexed in ℰ (ℰ for equality).
The vector
and 𝒵 will be defined and modified in the course of execution of the algorithm.
Variables indexed in the set 𝒵 will be held at the value zero. Variables indexed
in the set 𝒫 will be free to take values different from zero. If such a variable
takes a nonpositive value, the algorithm will either move the variable to
a positive value or else set the variable to zero and move its index from set 𝒫
to set 𝒵.
On termination x will be the solution vector and w will be the dual vector.
(23.10) ALGORITHM NNLS(E, m_2, n, f, x, w, z, 𝒫, 𝒵)
Step Description
1 Set 𝒫 := NULL, 𝒵 := {1, 2, ..., n}, and x := 0.
2 Compute the n-vector w := Eᵀ(f − Ex).
3 If the set 𝒵 is empty or if w_j ≤ 0 for all j ∈ 𝒵, go to
Step 12.
4 Find an index t ∈ 𝒵 such that w_t = max {w_j : j ∈ 𝒵}.
5 Move the index t from set 𝒵 to set 𝒫.
6 Let E_𝒫 denote the m_2 × n matrix defined by
Column j of
and
with
and
From Eq. (23.21) it follows that the nth component x̄_n of the solution
vector x̄ is the least squares solution of the reduced problem
has a strictly smaller value each time Step 2 is reached and thus that at Step
2 the vector x and its associated set 𝒫 = {i : x_i > 0} are distinct from all
previous instances of x and 𝒫 at Step 2. Since 𝒫 is a subset of the set {1, 2, ...,
n} and there are only a finite number of such subsets, Loop A must terminate
after a finite number of iterations. In a set of small test cases it was observed
that Loop A typically required about n/2 iterations.
The least squares problem being solved at Step 6 differs from the problem
previously solved at Step 6 either due to the addition of one more column of E
into the problem at Step 5 or the deletion of one or more columns of E at
Step 11. Updating techniques can be used to compute the QR decomposition
for the new problem based upon retaining the QR decomposition of the
previous problem. Three updating methods are described in Chapter 24.
The third of these methods has been used in the Fortran subroutine NNLS
(Appendix C).
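For orientation, a simplified dense sketch of the algorithm (ours), with a fresh least squares solve at Step 6 in place of the QR updating and with edge cases around exact zeros omitted:

    import numpy as np

    def nnls_sketch(E, f, tol=1e-10, max_iter=500):
        # Minimize ||Ex - f|| subject to x >= 0.
        n = E.shape[1]
        P, x = [], np.zeros(n)
        for _ in range(max_iter):
            w = E.T @ (f - E @ x)                   # dual vector (Step 2)
            Z = [j for j in range(n) if j not in P]
            if not Z or max(w[j] for j in Z) <= tol:
                break                               # Kuhn-Tucker test (Step 3)
            P.append(max(Z, key=lambda j: w[j]))    # Steps 4 and 5
            while True:
                z = np.zeros(n)                     # Step 6: LS on the set P
                z[P] = np.linalg.lstsq(E[:, P], f, rcond=None)[0]
                if min(z[j] for j in P) > 0.0:
                    x = z
                    break
                neg = [j for j in P if z[j] <= 0.0]
                alpha = min(x[j] / (x[j] - z[j]) for j in neg)
                x = x + alpha * (z - x)             # Steps 8-10: back to feasibility
                P = [j for j in P if x[j] > tol]    # Step 11
        return x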
due to round-off error) should be treated as being zero by moving its index
from set 𝒫 to set 𝒵.
The sign tests on z_i, i ∈ 𝒫, at Steps 7 and 8 do not appear to be critical.
The consequences of a possible misclassification here do not seem to be
damaging.
A Fortran subroutine NNLS implementing Algorithm NNLS and using
these ideas for enhancing the numerical reliability appears in Appendix C.
From the Kuhn-Tucker conditions [Theorem (23.4)] for this Problem NNLS
there exist disjoint index sets ℰ and 𝒮 such that
Consider the case in which ||r|| > 0 at Step 3. From Eq. (23.32) this
implies that r_{n+1} < 0, so division by r_{n+1} at Step 5 is valid. Using Eq. (23.31)
and (23.32) and the equations of Steps 2 and 5, we establish the feasibility
of x̂ as follows:
Therefore,
From Eq. (23.31) and (23.33) it follows that the rows of the system of
inequalities of Eq. (23.34) indexed in set ℰ are satisfied with equality. The
gradient vector of the objective function ½||x||² of Problem LDP is simply x.
The Kuhn-Tucker conditions for x̂ to minimize ½||x||² subject to Gx ≥ h
require that the gradient vector x̂ must be representable as a nonnegative
linear combination of the rows of G that are associated with equality condi-
tions in Eq. (23.34), i.e., the rows of G indexed in set ℰ.
From Steps 2 and 5 and Eq. (23.32) we have
Noting the sign conditions on u given in Eq. (23.30) completes the proof that
x̂ is a solution of Problem LDP.
It is clearly the unique solution vector since, if x̃ is a different solution
vector, then ||x̃|| = ||x̂|| and the vector x̄ = ½(x̃ + x̂) would be a feasible
vector having a strictly smaller norm than x̂, which contradicts the fact that
x̂ is a feasible vector of minimum norm.
Now consider the case of ||r|| = 0 at Step 3. We must show that the
inequalities Gx ≥ h are inconsistent. Assume the contrary, i.e., that there
exists a vector x satisfying Gx ≥ h. Define
Then
This last expression cannot be zero, however, because q > 0 and u ≥ 0.
From this contradiction we conclude that the condition ||r|| = 0 implies
the inconsistency of the system Gx ≥ h. This completes the mathematical
verification of Algorithm LDP.
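A sketch of Algorithm LDP as just verified, using scipy.optimize.nnls for the NNLS subproblem:

    import numpy as np
    from scipy.optimize import nnls

    def ldp(G, h):
        # Minimize ||x|| subject to Gx >= h via a single Problem NNLS.
        m, n = G.shape
        E = np.vstack([G.T, h.reshape(1, m)])   # the (n+1) x m matrix [G^T; h^T]
        f = np.zeros(n + 1)
        f[n] = 1.0                              # f = e_{n+1}
        u, _ = nnls(E, f)
        r = E @ u - f                           # residual vector
        if np.linalg.norm(r) <= 1e-12:          # the ||r|| = 0 case
            raise ValueError("the inequalities Gx >= h are inconsistent")
        return -r[:n] / r[n]                    # x_j = -r_j / r_{n+1}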
where
we may write
After solving problem (23.44) for y_2 the solution x̂ can be computed using
Eq. (23.42).
If the method of Chapter 21 is used, one would compute Q_1, C̃_1, C̃_2, d̃, Ẽ_1,
Ẽ_2, and f̃ using Eq. (21.11) to (21.14) and additionally solve for the matrix
G̃_1 in the upper triangular system
    t       w
    0.25    0.5
    0.50    0.6
    0.50    0.7
    0.80    1.2
which fits these data in a least squares sense subject to the constraints
where
where
and
Fig. 23.2 Graph of solution line for the sample problem (23.48) to
(23.51).
The given data points (t_i, w_i), i = 1, ..., 4, and the fitted line, f(t) =
0.621t + 0.379, are shown in Fig. 23.2. Note that the third constraint, f(1)
≤ 1, is active in limiting how well the fitted line approximates the data points.
The numerical values shown in describing this example were computed
using a UNIVAC 1108 computer. Executing the same Fortran code on
an IBM 360/67 resulted in opposite signs in the intermediate quantities v, f_1,
G, and z. This is a consequence of the fact that the signs of columns of the
matrix V in a singular value decomposition are not uniquely determined.
The difference in the number of iterations required to compute the singular
value decomposition of the matrix E on the two computers having different
word lengths resulted in a different assignment of signs in the matrices U
and V.
EXERCISES
(23.55) Prove that if a Problem LSE has a unique solution without inequality
constraints, then it has a unique solution with inequality constraints.
(23.56) Show that the problem of minimizing a quadratic function f(x)
= ½xᵀBx + aᵀx, for positive definite B, can be transformed to the
problem of minimizing ½||w||² by letting w = Fx − g for an appro-
priate choice of the nonsingular matrix F and vector g.
(23.57) If the function f of Exercise (23.56) is to be minimized subject to the
constraints Cx = d and Gx ≥ h, what are the corresponding con-
straints for the problem of minimizing ½||w||²?
24 MODIFYING A QR DECOMPOSITION TO
ADD OR REMOVE COLUMN VECTORS
is given by
METHOD 1
We have Q stored as an explicit m × m orthogonal matrix and R stored
as a k × k upper triangle.
Adjoining a Vector
Let a_{k+1} be a vector linearly independent of the columns of A_k. Form the
augmented matrix
using Q of Eq. (24.2). Introduce a Householder matrix Q_{k+1} so that the vector
where
and
Removing a Vector
If the matrix A_k of Eq. (24.1) is modified by the removal of the column
vector a_j, forming the matrix
we see that
where each r_i is zero in entries i + 1, ..., m. [See Golub and Saunders (1969)
and Stoer (1971).] Define
where the G_i are Givens rotation matrices chosen so that the matrix
METHOD 2
Instead of storing Q explicitly as in Method 1 we now assume that Q is
given as a product of k Householder transformations,
where each
is stored by retaining only the nonzero entries of the vectors u_i plus one ad-
ditional scalar associated with each vector. The matrix R is again stored as a
k × k upper triangle.
Adjoining a Vector
If the set is enlarged to the linearly independent set {a_1, ..., a_k, a_{k+1}},
compute the product b_{k+1} = Q_k ··· Q_1 a_{k+1}. Compute a Householder
transformation Q_{k+1} so that Q_{k+1}b_{k+1} = r_{k+1} is zero in entries k + 2, ..., m.
It follows that
METHOD 3
In this method the entire contents of a storage array, W_{m×(n+1)}, which
initially contains the data [A_{m×n} : b_{m×1}], is modified each time a new column
vector is adjoined to or deleted from the basis. The column vectors of the
upper triangular matrix R_k of Eq. (24.2) will occupy some set of k columns of
the array W. No record will be kept of the matrix Q of Eq. (24.2).
Let 𝒫_k = {p_1, p_2, ..., p_k} be a subset of {1, 2, ..., n} identifying the
columns of A that have been introduced into the basis. The order of the
integers p_i in the set 𝒫_k is significant. Let A_k denote the m × k matrix consist-
ing of those column vectors of A indexed in 𝒫_k and ordered in the sequence
p_1, p_2, ..., p_k. Let Q denote the orthogonal matrix satisfying Eq. (24.2).
Then W will contain the m × (n + 1) matrix Q[A : b]. This implies that the
jth column vector of R_k will be stored in column number p_j of W. In this
method it may be found convenient to store explicit zeros for elements that
are zeroed by Householder or Givens transformations.
Adjoining a Vector
Removing a Vector
Assume again that 𝒫_k identifies the current basis. Let 1 ≤ j ≤ k and as-
sume column number p_j is to be removed from the basis. For i = j + 1, j +
2, ..., k form the Givens rotation matrix G_i which operates on rows i − 1
and i and zeros the (i, p_i) element of W. Premultiply G_i times the entire
array W.
Form the new index set 𝒫_{k−1} by setting p_i := p_i for i = 1, 2, ..., j − 1
and setting p_i := p_{i+1} for i = j, j + 1, ..., k − 1. This method requires only
the m × (n + 1) storage array W which initially contains the data [A : b]. If
a copy of the initial data will be needed at later stages of the computation,
then it must be saved in additional storage.
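A dense sketch of the removal step, operating on an explicit triangular factor rather than the packed array W (names ours):

    import numpy as np

    def remove_column(R, j):
        # Given the k x k triangular factor R of A = QR, return the
        # (k-1) x (k-1) triangular factor of A with column j deleted.
        H = np.delete(np.array(R, dtype=float), j, axis=1)
        k = H.shape[1]
        for i in range(j, k):                       # chase out the subdiagonal
            a, b = H[i, i], H[i + 1, i]
            r = np.hypot(a, b)
            if r > 0.0:
                G = np.array([[a / r, b / r], [-b / r, a / r]])
                H[i:i + 2, :] = G @ H[i:i + 2, :]   # zeros H[i+1, i]
        return H[:k, :]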
25
PRACTICAL ANALYSIS OF
LEAST SQUARES PROBLEMS
The problem originator should also have information about the un-
certainty of the data constituting A and b. Frequently he may have a priori
information about the solution of his matrix problem. This may involve some
knowledge as to what would constitute reasonable values for some or all of
the solution components. It might also include information such as a require-
ment that some or all solution components must be nonnegative.
In very general terms there are two major considerations in the design of a
computational approach to the problem and it is important to keep these
points separate.
Here we have replaced ||A||_F of Eq. (16.3) by its upper bound n^(1/2)||A||.
Ignoring the terms O(η²) in Eq. (16.3) and (16.4), we infer from Theorem
(16.1) that if η is chosen small enough so that
and
and
and
With this choice of η and of ω ≤ η² we infer from Theorem (17.11) that the
computed solution is the exact solution of a perturbed problem with per-
turbations smaller than the a priori uncertainty as described by Eq. (25.2)
and (25.3).
Recall that the multipliers of ||A||η and ||b||η in Eq. (25.4), (25.5), (25.8),
and (25.9) were derived by considering worst-case propagation of computa-
tional errors. In practice, replacing these multipliers by their square roots
will generally give more realistic estimates of the norms of the virtual per-
turbations due to the computational errors.
We now turn to point 2. It will be convenient to formalize the notion of
an "acceptably small" residual vector. Suppose we are willing to accept resid-
ual vectors with norms as large as some number p. Then we may define the
acceptable solution set as
It is important to note that the definition of the set X depends upon the three
tolerance parameters, φ, ψ, and ρ, which should be chosen on the basis of
actual knowledge about the problem and its data.
Our purpose in writing Eq. (25.12) is not so much to establish this particu-
lar definition of an acceptable solution set as it is to give some degree of
concreteness to the general idea of an acceptable solution set and to identify
some of the quantities that determine the "size" of this set.
To assess the "size" of the set X we first observe that if Rank (A) < nt or
if (p and K (K denotes the condition number of A) are so large that Kj|| A\\>
1, thtnA + £ will be singular for some \\E\\ < j and the set JIT is unbounded.
On the other hand, if Rank (A) — n andkj||A||< 1, then the per-
turbation bound (9.13) or (9.14) may be used to obtain useful information
regarding the diameter of the set X. The presence of the parameter p in the
definition of the set X leads to some further possible increase in its size.
If this set X is "large" in the sense that it contains vectors which are
significantly different from each other, then some selection must be made of
a particular solution vector from X. This selection process can be viewed
very broadly as including any steps that the problem originator might take
to change the problem to one having a smaller solution set, generally con-
tained in X.
The criterion that one uses to reduce the size of the set X depends upon
the application. A situation that occurs very commonly is one in which the
problem Ax ≅ b arises as a local linearization of a nonlinear least squares
problem. This was mentioned in Chapter 1. In this case one is likely to prefer
the use of the x ∈ X having least norm in order to reduce the likelihood of
leaving the region in which b − Ax is a good approximation to the nonlinear
problem.
Although there are many different motivations (statistical, mathematical,
numerical, heuristic, etc.) that have been proposed for different specific pro-
cedures for changing a given least squares problem, most of these procedures
consist of performing one or more of the following four operations, not
necessarily in the order listed.
1. Left-multiply A and b by an m x m matrix G.
2. Right-multiply A by an n × n matrix H with the corresponding change
of variables x = Hx̃ or x = Hx̃ + ξ.
3. Append additional rows to A and additional elements to b.
4. Assign fixed values (often zero) to some components of the solution
vector. This may be done either with respect to the original set of variables
or a transformed set of variables.
We shall expand on each of these four items in Sections 2, 3, 4, and 5,
respectively.
Finally in Section 6 we describe singular value analysis. By singular value
analysis we mean the computation of a number of quantities derived from
the singular value decomposition of the matrix A and the interpretation of
these quantities as an aid in understanding the indeterminacies of the prob-
lem Ax ≅ b and in selecting a useful solution vector. The problem Ax ≅ b to
which singular value analysis is applied may of course be a problem resulting
from preliminary application of various of the operations described in Sec-
tions 2 to 5.
This operation changes the norm by which the size of the residual vector
is assessed. Thus instead of seeking x to minimize (b − Ax)ᵀ(b − Ax), one
changes the problem to that of minimizing (Gb − GAx)ᵀ(Gb − GAx), which
and
or equivalently
Note that with this scaling all components of the modified vector b̃ defined
by the problem
where
The matrix H is n × l, with l ≤ n. The vector x̃ is l-dimensional and the ma-
trix Ã is m × l. If H is n × n diagonal, this transformation may be interpreted
as a column scaling operation applied to A.
If H is n × n nonsingular, this transformation does not change the prob-
lem mathematically. Thus the set of vectors satisfying x = Hx̃ + ξ, where
x̃ minimizes ||b̃ − Ãx̃||, is the same as the set of vectors x minimizing
||b − Ax||.
However, unless H is orthogonal, the condition number of Ã will generally
be different from that of A. Therefore, if one is using an algorithm, such as
HFTI (14.9) or singular value analysis [Eq. (18.36) to (18.45)], that makes a
determination of the pseudorank of A, the algorithm may make a different
determination using Ã.
Furthermore, if the pseudorank is determined to have a value k < n and
one proceeds to compute the minimal length solution of the rank k problem,
then the use of the transformation matrix H alters the norm by which the
"size" of the solution vector is measured. This will, in general, lead to a
different vector being selected as the "minimum length" solution. Thus in
problem (25.16) the minimal length solution is a solution that minimizes ||x||,
whereas using the transformed problem in Eq. (25.17) the minimal length
solution is a solution x̃ of Eq. (25.17) that minimizes ||x̃||. This amounts to
minimizing ||H⁻¹(x − ξ)|| rather than minimizing ||x||.
As to criteria for selecting H we first note that the use of the spectral norm
of the perturbation matrix, E, in Eq. (25.2) is a realistic mathematical model
only if the absolute uncertainties in different components of A are all of
about the same magnitude. Thus, if one has estimates of the uncertainty of
individual elements of A, the matrix H can be selected as a column scaling
matrix to balance the size of the uncertainties in the different columns of A.
A somewhat similar idea can be based on a priori knowledge of the solu-
tion vector x. Suppose it is known that the solution vector should be close to
a known vector ξ (the a priori expected value of x). Suppose further that one
has an a priori estimate, σ_j, of the uncertainty of ξ_j as an estimate of x_j. One
can take H to be the n × n diagonal matrix with diagonal components
where a_j denotes the jth column vector of A. It has been proved by van der
Sluis (1969) that with H defined by Eq. (25.24), using euclidean column
norms, Cond (AH) does not exceed the minimal condition number obtainable
by column scaling by more than a factor of n^(1/2).
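A one-line sketch of this column scaling:

    import numpy as np

    def column_scale(A):
        # H = Diag(1 / ||a_j||), euclidean column norms; assumes no zero columns.
        h = 1.0 / np.linalg.norm(A, axis=0)
        return A * h, h

One solves with the scaled matrix AH and recovers x = Hx̃, i.e., x = h * x_tilde.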
Improving the condition number has the advantage that perturbation
bounds such as Eq. (9.10) will give less pessimistic results. It also may lead
to a determination of pseudorank k = n, whereas without the improvement
of the condition number by scaling a pseudorank of k < n might have been
determined leading to unnecessary and inappropriate complications in ob-
taining a solution.
A transformation matrix H can be selected so that A or a submatrix of A
has an especially convenient form, such as triangular or diagonal. Thus if
one has computed the singular value decomposition, A = USVᵀ, then left-
multiplying [A : b] by Uᵀ and right-multiplying A by H = V transforms A to
the diagonal matrix S. Operationally one would not usually do this pre- and
and set
where (GᵀG)⁻¹ is the a priori covariance matrix of the uncertainty in the data
vector b and (FᵀF)⁻¹ is the a priori covariance matrix of the uncertainty in
the a priori expected value ξ of the solution vector x. In practice, however,
these a priori covariance matrices, particularly (FᵀF)⁻¹, may not be known
with much certainty and one may wish to explore the changes that different
relative weighting produces in the solution vector and in the residual vector.
For this purpose introduce a nonnegative scalar weighting parameter λ
into problem (25.30) and consider the new problem
where
and
or equivalently
where
and
The translation by ξ and scaling by F⁻¹ used in Eq. (25.35) have been
discussed in Section 3 of this chapter. Their purpose is to produce the new
variable y that is better scaled for meaningful assessment of the "size" of the
solution vector.
Write a singular value decomposition of A:
Recall that S = Diag {s_1, ..., s_n}. If Rank (A) = k < n, then s_i = 0 for
i > k. Introduce the orthogonal change of variables
where
where
and
with
For λ = 0 problem (25.39) has a diagonal matrix of rank k and the solu-
tion vector p(0) is given by
For λ > 0 we use Eq. (25.41) to (25.44) and obtain the solution compo-
nents
Furthermore we have
compromise between the size of the solution vector p(λ) and the size of the
residual norm ω(λ).
The set of solutions of problem (25.31) given by this technique has an
important optimality property expressed by the following theorem.
(25.49) THEOREM [Morrison (1960); Marquardt (1963)]
For a fixed nonnegative value of λ, say λ̂, let ŷ be the solution
vector for problem (25.36), and let ω̂ = ||b − Aŷ||. Then ω̂ is the
minimum value of ||b − Ay|| for all vectors y satisfying ||y|| ≤ ||ŷ||.
Proof: Assume the contrary. Then there exists a vector ȳ with
||ȳ|| ≤ ||ŷ|| satisfying ||b − Aȳ|| < ||b − Aŷ||. Therefore, ||b − Aȳ||² +
λ̂²||ȳ||² < ||b − Aŷ||² + λ̂²||ŷ||², which contradicts the assumption that ŷ is
the least squares solution of problem (25.36) and thus minimizes ||b − Ay||²
+ λ̂²||y||².
A simple tabular or graphic presentation of the quantities ||p(λ)|| and ω(λ)
given by Eq. (25.47) and (25.48) can be very useful in studying a particular
least squares problem. Further detail can be obtained by tabular or graphic
presentation of the individual solution components [Eq. (25.46), (25.38), and
(25.35)] as functions of λ. An example of this type of analysis is given in
Chapter 26.
One may wish to solve Eq. (25.47) or (25.48) to find a value of λ which
produces a prespecified solution norm or residual norm. If Newton's method
is used for this purpose the following expressions will be useful:
and
If one solves problem (25.36) directly, using any algorithm that determines
the norm, say ρ(λ), of the residual vector, then it should be noted that ρ(λ) sat-
isfies
Thus if one wishes to obtain the quantity ω(λ) this can be done by computing
||y(λ)||² and then solving Eq. (25.50) for ω(λ).
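From the singular value decomposition these norms are available in closed form for every λ > 0 at negligible cost; a sketch (ours), for the generic damped problem min ||Ax − b||² + λ²||x||²:

    import numpy as np

    def lm_norms(A, b, lam):
        # Returns ||p(lam)|| and w(lam) for lam > 0.
        U, s, Vt = np.linalg.svd(A)
        g = U.T @ b
        g1, g2 = g[:len(s)], g[len(s):]
        p = s * g1 / (s**2 + lam**2)            # transformed solution components
        w = np.sqrt(np.sum((lam**2 * g1 / (s**2 + lam**2))**2) + g2 @ g2)
        return np.linalg.norm(p), w

Tabulating these two numbers over a range of λ produces the trade-off curve discussed above.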
is changed to
where p_j is given by Eq. (25.57). The candidate solution vector p^(k) is the
pseudoinverse solution (i.e., the minimal length solution) of problem (25.55)
under the assumption that the singular values s_j for j > k are regarded as
being zero.
From the candidate solution vectors p^(k) one obtains candidate solution
vectors x^(k) for the problem Ax ≅ b as
Fig. 26.1 Output from the subroutine SVA for the sample problem
of Chapter 26.
that for any vector y the coordinate pair (||y||, ||b − Ay||) lies on or above
the curve.
For more detailed information Eq. (25.46), (25.38), and (25.35) can be
used to compute and plot values of the individual solution components as
functions of λ. Figures 26.3 and 26.4 show this information for the present
example. Graphs of this type are extensively discussed in Hoerl and Kennard
(1970b).
Figure 26.5 provides a comparison of solution norms and residual norms
of the five candidate solutions obtained by singular value analysis with the
corresponding data for the continuum of Levenberg-Marquardt solutions.
We have also applied the subroutine HFTI (see Appendix C) to this
example. The magnitudes of the diagonal elements of the triangular matrix
Fig. 26.2 Residual norm versus solution norm for a range of values
of the Levenberg-Marquardt stabilization parameter λ.
Fig. 26.4 Same data as Fig. 26.3 with expanded vertical scale.
R to which the matrix A is transformed are 0.52, 0.71 × 10⁻¹, 0.91 × 10⁻²,
0.14 × 10⁻⁴, and 0.20 × 10⁻⁶. It is interesting to note that these numbers
agree with the respective singular values of A to within a factor of two. From
Theorem (6.31) it is known that the singular values, s_i, and the diagonal
elements r_ii, must satisfy
Table 26.2 SOLUTION AND RESIDUAL NORMS USING HFTI ON THE SAMPLE
PROBLEM

    k    ||z^(k)||     ||b − Az^(k)||
    1      0.99719       0.216865
    2      2.24495       0.039281
    3      4.58680       0.000139
    4      4.92951       0.000139
    5    220.89008       0.000138
Note that the data of Table 26.2 are quite similar to the corresponding
data (columns headed "YNORM" and "RNORM") given in Fig. 26.1 for
the candidate solutions obtained by singular value analysis.
As still another way of analyzing this problem, solutions were computed
using each of the 31 possible nonnull subsets of the five columns of the
matrix A. Solution norms and residual norms for each of these 31 solutions
are listed in Table 26.3 and are plotted in Fig. 26.6.
Table 26.3

    Columns used     Solution norm     Residual norm
    1                  2.46            0.40
    2                  1.92            0.22
    3                  2.42            0.07
    4                  2.09            0.19
    5                  2.30            0.19
    1,2                5.09            0.039
    1,3                2.72            0.052
    1,4                4.53            0.023
    1,5                5.07            0.001
    2,3                3.03            0.053
    2,4               20.27            0.030
    2,5               17.06            0.128
    3,4                3.07            0.056
    3,5                2.97            0.058
    4,5               17.05            0.175
    1,2,3             22.1             0.00018
    1,2,4             10.8             0.00018
    1,2,5              5.0             0.00014
    1,3,4              8.1             0.00015
    1,3,5              5.0             0.00014
    1,4,5              4.9             0.00014
    2,3,4             13.5             0.00020
    2,3,5              7.6             0.00014
    2,4,5             24.0             0.00028
    3,4,5             17.3             0.00017
    1,2,3,4           10.3             0.00014
    1,2,3,5            5.0             0.00014
    1,2,4,5            5.0             0.00014
    1,3,4,5            5.0             0.00014
    2,3,4,5            9.0             0.00014
    1,2,3,4,5        220.9             0.00014
27 MODIFYING A QR DECOMPOSITION
TO ADD OR REMOVE ROW VECTORS
WITH APPLICATION TO SEQUENTIAL
PROCESSING OF PROBLEMS
HAVING A LARGE OR BANDED
COEFFICIENT MATRIX
Section 1. Sequential Accumulation
Section 2. Sequential Accumulation of Banded Matrices
Section 3. An Example: Line Splines
Section 4. Data Fitting Using Cubic Spline Functions
Section 5. Removing Rows of Data
Both Sections 1 and 2 are applicable to the sequential estimation problem since
the solution vector or its associated covariance matrix can readily be com-
puted at each step of the sequential matrix accumulation if required. In
Section 3 an application of these ideas is presented using the example of curve
fitting using line splines. In Section 4 a procedure is described for least squares
data fitting by cubic splines having equally spaced breakpoints. This provides
a further example of the applicability of the banded sequential accumulation
method of Section 2.
Methods of deleting data will be described in Section 5.
has the same solution set and the same residual norm as the problem
The significant point permitting a saving of storage is the fact that for each i,
the matrix [R_i : d_i] can be constructed and stored in the storage space previ-
ously occupied by the augmented matrix
Thus the maximum number of rows of storage required is max_i [m_i + min (n + 1, m̄_{i−1})].
For notational convenience let [R_0 : d_0] denote a (null) matrix having no
rows. At the beginning of the ith stage of the algorithm one has the m̄_{i−1} ×
(n + 1) matrix [R_{i−1} : d_{i−1}] from the (i − 1)st stage and the new m_i ×
(n + 1) data matrix [A_i : b_i]. Let m̄_i = m̄_{i−1} + m_i. Form the m̄_i × (n + 1)
augmented matrix of Eq. (27.4) and reduce this matrix to triangular form
by Householder triangularization as follows:
is equivalent to the original problem in Eq. (27.1) in that it has the same set
of solution vectors and the same minimal residual norm. Furthermore the
matrix R has the same set of singular values as the matrix A of Eq. (27.1).
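A dense sketch of the accumulation loop, with np.linalg.qr standing in for the Householder triangularization of Eq. (27.5):

    import numpy as np

    def seq_accumulate(blocks, n):
        # blocks yields (A_i, b_i); accumulates the triangle [R : d] of Eq. (27.6).
        Rd = np.empty((0, n + 1))               # [R_0 : d_0] has no rows
        for A_i, b_i in blocks:
            W = np.vstack([Rd, np.column_stack([A_i, b_i])])
            Rd = np.linalg.qr(W, mode='r')      # keep only the triangular part
        R, d = Rd[:n, :n], Rd[:n, n]
        e = abs(Rd[n, n]) if Rd.shape[0] > n else 0.0   # minimal residual norm
        return R, d, e

If R is nonsingular the solution is obtained from Rx = d, and only about max_i (m_i + n + 1) rows of working storage are ever needed.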
We shall now describe a computing algorithm that formalizes this pro-
cedure. To this end set
and
It was noted in Table 19.1 that if quadratic and lower-order terms in m and
n are neglected, the number of operations required to triangularize an m × n
matrix (m > n) using Householder transformations is approximately v(2a +
2m). If the Householder processing is done sequentially involving q stages as
in Algorithm SEQHT (27.10), then the operation count is increased to ap-
proximately v(2a + 2m)(m + q)/m. If the entering blocks of data each con-
tain k rows (kq = m), then the operation count can be written as v(2a + 2m)
(k + 1)/k.
Thus sequential Householder accumulation increases in cost as the block
size is decreased. In the extreme case of a block size of k = 1, the operation
count is approximately doubled relative to the count for nonsequential
Householder processing.
For small block sizes it may be more economical to replace the House-
holder method used at Step 5 of Algorithm SEQHT (27.10) with one of the
methods based on 2 × 2 rotations or reflections. The operation counts for
these methods are independent of the block size. The operation count for
triangularization using the Givens method [Algorithms G1 (10.25) and G2
(10.26)] is v(2a + 4m). The 2 × 2 Householder transformation expressed
as in Eq. (10.27) has an operation count of v(3a + 3m).
Gentleman's modification (see Chapter 10) of the Givens method reduces
the count to v(2a + 2m). Thus this method is competitive with the standard
Householder method for nonsequential processing and requires fewer arith-
metic operations than the Householder method for sequential processing.
Actual comparative performance of computer programs based on any of
the methods will depend strongly on coding details.
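The organization of the sequential accumulation can be indicated compactly. The sketch below (Python, with dense numpy calls standing in for the book's Householder codes) retains only the triangle [R : d] between stages; each incoming block [A_t : b_t] is appended and the result retriangularized, which is the pattern of Algorithm SEQHT (27.10).

    import numpy as np

    def seq_accumulate(blocks):
        n1 = blocks[0].shape[1]                   # n + 1 columns
        Rd = np.empty((0, n1))                    # [R_0 : d_0] has no rows
        for block in blocks:                      # block = [A_t : b_t]
            stacked = np.vstack([Rd, block])      # augmented matrix, Eq. (27.4)
            Rd = np.linalg.qr(stacked, mode='r')  # keep only [R_t : d_t]
        return Rd

    rng = np.random.default_rng(1)
    blocks = [rng.standard_normal((4, 6)) for _ in range(5)]  # q = 5, n = 5
    Rd = seq_accumulate(blocks)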
Following an application of Algorithm SEQHT (27.10), if the upper trian-
gular matrix R of Eq. (27.6) is nonsingular, we may compute the solution, x,
by solving Rx = d.
In many applications one needs the unscaled covariance matrix [see Eq.
(12.1)] C = (R^T R)^{-1}.
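Continuing the sketch above, both quantities are available from the retained triangle alone, with no further reference to the original data; a triangular solver stands in for back substitution.

    import numpy as np
    from scipy.linalg import solve_triangular

    def solve_and_covariance(Rd):
        n = Rd.shape[1] - 1
        R, d = Rd[:n, :n], Rd[:n, n]
        x = solve_triangular(R, d)               # solve R x = d
        Rinv = solve_triangular(R, np.eye(n))    # R^{-1}
        C = Rinv @ Rinv.T                        # C = (R^T R)^{-1}
        return x, C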
The working array G will be partitioned by the algorithm into three sub-
arrays G1, G2, and G3 as follows:

(27.15)  G1 = rows 1 through i_p - 1 of G
(27.16)  G2 = rows i_p through i_r - 1 of G
(27.17)  G3 = rows i_r through i_r + m_t - 1 of G

The integers i_p and i_r are determined by the algorithm and their values change
in the course of the processing. These integers are limited by the inequalities
1 ≤ i_p ≤ i_r ≤ n + 2.
At the various stages of the algorithm the (nb + 1)st column of the G
array contains either the vector b_t [e.g., see the left side of Eq. (27.4)] or the
processed vector d_t [e.g., see the right side of Eq. (27.5)].
For 1 ≤ j ≤ nb the identification of matrix elements with storage locations
is as follows:

(27.18)  In G1: storage location (i, j) contains matrix element (i, i + j - 1)
(27.19)  In G2: storage location (i, j) contains matrix element (i, i_p + j - 1)
(27.20)  In G3: storage location (i, j) contains matrix element (i, j_t + j - 1)
To indicate the primary idea of a compact storage algorithm for the
banded least squares problem consider Figs. 27.1 and 27.2. If Algorithm (27.10)
were applied to the block diagonal problem with nb = 4, then at the step in
which the data block [C_t : b_t] is introduced the working array might appear as
in Fig. 27.1. Taking advantage of the limited bandwidth these data can be
packed in storage as in Fig. 27.2.
In this illustration the inequalities i_p ≤ j_t ≤ i_r are satisfied. For further
diagrammatic illustration of this algorithm the reader may wish to look
ahead to Section 3 and Figs. 27.4 and 27.5.
A detailed statement of the algorithm follows. In describing the algorithm
we shall use the notation G(i, j) to denote the (i, j) element of the array G
and the notation G(i1 : i2, j1 : j2) to denote the subarray of G consisting of
all elements G(i, j) with i1 ≤ i ≤ i2 and j1 ≤ j ≤ j2.
(27.21) ALGORITHM BSEQHT (Banded Sequential Householder Trian-
gularization)

Step   Description
1      Set i_r := 1 and i_p := 1.
2      For t := 1, ..., q, do Steps 3-24.
3      Remark: At this point the data [C_t : b_t] and the integers
       m_t and j_t must be made available to the algorithm. It is as-
       sumed that m_t > 0 for all t and j_q ≥ j_{q-1} ≥ ... ≥ j_1 ≥ 1.
4      Set G(i_r : i_r + m_t - 1, 1 : nb + 1) := [C_t : b_t]. [See Eq. (27.14)
       for the definition of C_t. Note that the portion of the G array
       into which data are being inserted is the subarray G3 of
       Eq. (27.17).]
5      If j_t = i_p, go to Step 18.
6      Remark: Here the monotonicity assumption on j_t assures
       that j_t > i_p.
7      If j_t ≤ i_r, go to Step 12.
Fig. 27.2  The same data as in Fig. 27.1 but packed into the storage
array, G.
8      Remark: Here j_t exceeds i_r. This indicates rank deficiency
       since the diagonal element in position (i_r, i_r) of the trian-
       gular matrix R [see Eq. (27.6)] will be zero. Nevertheless,
       the triangularization can be completed. Some methods for
       solving this rank-deficient least squares problem will be
       discussed in the text following this algorithm.
9      Move the contents of G(i_r : i_r + m_t - 1, 1 : nb + 1) into
       G(j_t : j_t + m_t - 1, 1 : nb + 1).
10     Zero the subarray G(i_r : j_t - 1, 1 : nb + 1).
11     Set i_r := j_t.
12     Set μ := min(nb - 1, i_r - i_p - 1); if μ = 0, go to Step 17.
13     For l := 1, ..., μ, do Steps 14-16.
14     Set k := min(l, j_t - i_p).
15     For i := l + 1, ..., nb, set G(i_p + l, i - k) := G(i_p + l, i).
16     For i := 1, ..., k, set G(i_p + l, nb + 1 - i) := 0.
17     Set i_p := j_t. [Note that this redefines the partition line
       between subarrays G1 and G2 of Eqs. (27.15) and (27.16).]
18     Remark: Steps 19 and 20 apply Householder triangulariza-
       tion to rows i_p through i_r + m_t - 1. As was discussed
       following the statement of Algorithm SEQHT (27.10),
       some reduction of execution time may be attainable by re-
       placing the general Householder method by one of the
       methods based on 2 × 2 rotations or reflections.
19     Set μ := i_r + m_t - i_p. Set k := min(nb + 1, μ).
20     For i := 1, ..., k, execute Algorithm H1(i, max(i + 1,
       i_r - i_p + 1), μ, G(i_p, i), h, G(i_p, i + 1), nb + 1 - i).
21     Set i_r := i_p + k. [Note that this redefines the partition line
       between subarrays G2 and G3 of Eqs. (27.16) and (27.17).]
22     If k < nb + 1, go to Step 24.
23     For j := 1, ..., nb, set G(i_r - 1, j) := 0.
24     Continue.
25     Remark: The main loop is finished. The triangular matrix
       of Eq. (27.6) is stored in the subarrays G1 and G2 [see Eqs.
       (27.15) and (27.16)] according to the storage mapping of
       Eqs. (27.18) and (27.19).
       If the main diagonal elements of the matrix R of Eq.
       (27.6) are all nonzero, the solution of problem (27.7) can
       be computed by back substitution using Steps 26-31. Note
       that these diagonal elements of the matrix R are used as
       divisors at Step 31. Discussion of some techniques for
       handling the alternative (singular) case is given in the
       text following this algorithm.
26     For i := 1, ..., n, set X(i) := G(i, nb + 1).
27     For i := n, n - 1, ..., 1, do Steps 28-31.
28     Set s := 0 and l := max(0, i - i_p).
29     If i = n, go to Step 31.
30     For j := 2, ..., min(n + 1 - i, nb), set s := s +
       G(i, j + l) × X(i - 1 + j).
31     Set X(i) := [X(i) - s]/G(i, l + 1).
32     Remark: The solution vector x is now stored in the array
       X. If, as would usually be the case, the full data matrix
       [A : b] [see Eqs. (27.2) and (27.3)] has more than n rows,
       the scalar quantity e of Eq. (27.6) will be stored in location
       G(n + 1, nb + 1). The magnitude of e is the norm of the
       residual associated with the solution vector x.
The banded structure of A does not generally imply that the unscaled
covariance matrix C of Eq. (12.1) will have any band-limited structure. How-
ever, its columns, c_j, j = 1, ..., n, can be computed, one at a time, without
requiring any additional storage. Specifically the vector c_j can be computed
by solving the two banded triangular systems R^T y = e_j and R c_j = y.
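In code the column-at-a-time computation is two triangular solves per column; the dense Python sketch below shows the idea, while BNDSOL's MODE 2 and MODE 3 entries (Appendix C) perform the same pair of solves exploiting the bandwidth.

    import numpy as np
    from scipy.linalg import solve_triangular

    def covariance_column(R, j):
        e_j = np.zeros(R.shape[0])
        e_j[j] = 1.0
        y = solve_triangular(R, e_j, trans='T')  # R^T y = e_j
        return solve_triangular(R, y)            # R c_j = y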
in which the matrix Ā has bandwidth nb. The matrix [Ā : b̄] can then be pro-
cessed by Algorithm BSEQHT. If the matrix F is nonsingular, then Ā is of
full rank. If the matrix F is nonsingular and λ is sufficiently large, then Ā will
be of full pseudorank so that Algorithm BSEQHT can be completed, including
the computation of a solution.
If one wishes to investigate the effect of different values of λ in problem
(25.31), one could first process only the data [A : b] of problem (25.31), re-
ducing this matrix to a banded triangular array [R : d].
The latter problem can be subjected to row interchanges so that the coef-
ficient matrix has bandwidth nb, and this transformed problem is then in a
form such that its solution can be computed using Algorithm BSEQHT.
This technique of interleaving rows of the matrix [λF : λd] with the rows
of the previously computed triangular matrix [R : d], to preserve a limited
bandwidth, can also be used as a method for introducing new data equations
into a previously solved problem. This avoids the need for triangularization
using the entire original data set for the expanded problem.
In the case where F = I_n and d = 0, the choice of the parameter λ can be
facilitated by computing the singular values and transformed right side of the
least squares problem of Eq. (27.7). Thus with a singular value decomposition
the vector g of Eq. (27.25)
can be computed without additional arrays of storage other than the array
that contains the banded matrix R. The essential two ideas are: post- and
premultiply R by finite sequences of Givens rotation matrices J_i and T_i so
that the matrix B = T_v ... T_1 R J_1 ... J_v is bidiagonal. The product T_v ... T_1 d
replaces d in storage. Compute the SVD of the matrix B using Algorithm
QRBD (18.31). The premultiplying rotations are applied to d in storage,
ultimately producing the vector g of Eq. (27.25). A value of λ can then be
determined to satisfy a prespecified residual norm or solution vector norm by
using Eq. (25.48) or (25.47), respectively.
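For the case F = I_n and d = 0 the trade-off analysis reduces to elementary formulas; the sketch below uses the standard expressions for the damped problem min ||Ax - b||^2 + λ^2||x||^2, which play the roles of Eqs. (25.47) and (25.48). The names here are ours: s holds the singular values, g = U^T b, and g_perp_sq = ||b||^2 - ||g||^2.

    import numpy as np

    def lm_norms(s, g, g_perp_sq, lam):
        d = s**2 + lam**2
        xnorm = np.linalg.norm(s * g / d)                        # ||x(lam)||
        rnorm = np.sqrt(np.sum((lam**2 * g / d)**2) + g_perp_sq)
        return xnorm, rnorm

Scanning lam over a grid then locates the value meeting a prescribed residual norm or solution norm.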
To fix some of the ideas that were presented in Sections 1 and 2, consider
the following data-fitting (or data-compression) problem. Suppose that we
have m data pairs {(t_i, y_i)} whose abscissas occur in an interval of t such that
a ≤ t_i ≤ b, i = 1, ..., m. It is desired that these data be fit by a function
f(t) whose representation reduces the storage needed to represent the data.
Probably the simplest continuous function (of some generality) that can be
(least squares) fit to these data is the piecewise linear continuous function
defined as follows:
(27.26) Partition the interval [a, b] into n - 1 subintervals with breakpoints
t^(l) satisfying a = t^(1) < t^(2) < ... < t^(n) = b.
For
define
and
The least squares problem for the x_j then has the form shown in Fig. 27.4.
Note that the coefficient matrix is banded with bandwidth nb = 2.
Fig. 27.4
Fig. 27.5
Stage (1)  Initially i_p = i_r = 1. The first block of data, consisting of the
nontrivial data in the first three rows of Fig. 27.4, is introduced. Set j_1 = 1,
m_1 = 3.
Stage (2)  This block is triangularized using Householder transforma-
tions. Set i_r = 4.
Stage (3)  Introduce the second block of data from Fig. 27.4. Set j_2 = 2,
m_2 = 3.
Stage (4)  Left shift of the second row exclusive of the last column, which
represents the right-side vector. Set i_p = j_2 (= 2).
Stage (5)  Triangularization of rows 2-6. Set i_r = 5.
Stage (6)  Introduce the third block of data from Fig. 27.4. Set j_3 = 3,
m_3 = 4.
Stage (7)  Left shift of row 3 exclusive of the last column. Set i_p = j_3 (= 3).
Stage (8)  Triangularization of rows 3-8. Set i_r = 6.
The data appearing at stage (8) represent the least squares problem shown
in diagram (9). This problem can now be solved by back substitution.
As an illustration of the extent of storage that can be saved with the use
of the band-matrix processing of Section 2, consider an example of line-
spline fitting with m = 1000 data points using 100 intervals. Thus the least
squares problem will have n = 101 unknowns. Let us suppose further that
the row dimension of each block we shall process does not exceed 10. Then
the maximum size of the working array does not have to exceed [n + 1 +
max(m_t)] × 3 = (101 + 1 + 10) × 3 = 336. The use of a less specialized
sequential accumulation algorithm such as that of Section 1 of this chapter
would require a working array of dimension at least [n + 1 + max(m_t)] ×
(n + 1) = (101 + 1 + 10) × 102 = 11,424. If all rows were brought in at
once, the working array would have to have dimensions m × (n + 1) = 1000
× 102 = 102,000.
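For completeness, here is a sketch of the construction of the line-spline coefficient matrix itself (Python, with helper names of our own choosing); each data row has exactly two adjacent nonzero entries, the bandwidth-2 structure exploited above.

    import numpy as np

    def line_spline_matrix(t, a, b, n):
        tb = np.linspace(a, b, n)                 # breakpoints t(1), ..., t(n)
        h = tb[1] - tb[0]
        j = np.clip(((t - a) // h).astype(int), 0, n - 2)   # interval index
        u = (t - tb[j]) / h                       # local coordinate in [0, 1]
        A = np.zeros((len(t), n))
        rows = np.arange(len(t))
        A[rows, j] = 1.0 - u
        A[rows, j + 1] = u
        return A

    t = np.linspace(0.0, 1.0, 20)
    A = line_spline_matrix(t, 0.0, 1.0, 6)
    x, *_ = np.linalg.lstsq(A, np.sin(2 * np.pi * t), rcond=None)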
continuous together with its first and second derivatives throughout the
interval [b_1, b_n].
It can be shown that the set S constitutes an (n + 2)-dimensional linear
space of functions. Thus any set of n + 2 linearly independent members of
S, say {q_j : j = 1, ..., n + 2}, is a basis for S. This implies that each f ∈ S
has a unique representation in the form
Using such a basis, the problem of finding a member of S that best fits a
set of data {(x_i, y_i) : x_i ∈ [b_1, b_n]; i = 1, ..., m} in a least squares sense takes
the form
where
and
Let I_1 denote the closed interval [b_1, b_2] and let I_k denote the half-open in-
terval (b_k, b_{k+1}] for k = 2, ..., n - 1.
In the interval I_k only four of the functions q_j have nonzero values. These
four functions are defined for x ∈ I_k by
x      y       x      y
2      2.2     14     3.8
4      4.0     16     5.1
6      5.0     18     6.1
8      4.6     20     6.3
10     2.8     22     5.0
12     2.7     24     2.0
where r_i is the residual at the ith data point. The value of RMS for each case
is given in Table 27.2. Note that RMS is not a monotone function of NBP.
Table 27.2 RMS AS A FUNCTION OF NBP
NBP 5 6 7 8 9 10
RMS 0.254 0.085 0.134 0.091 0.007 0.0
Plots of each of the six curve fits are given in Fig. 27.6.
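Readers wishing to repeat this experiment can do so with any least squares spline routine; the sketch below uses SciPy's, and since its basis and knot conventions differ in detail from the basis used here, the computed RMS values need not reproduce Table 27.2 exactly.

    import numpy as np
    from scipy.interpolate import LSQUnivariateSpline

    x = np.arange(2, 26, 2, dtype=float)          # the data of this section
    y = np.array([2.2, 4.0, 5.0, 4.6, 2.8, 2.7,
                  3.8, 5.1, 6.1, 6.3, 5.0, 2.0])

    for nbp in range(5, 11):                      # NBP = 5, ..., 10
        knots = np.linspace(x[0], x[-1], nbp)[1:-1]   # interior breakpoints
        spl = LSQUnivariateSpline(x, y, knots, k=3)
        rms = np.sqrt(np.mean((spl(x) - y) ** 2))
        print(nbp, round(rms, 3))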
Three methods will be described for removing a row of data from a least
squares problem, Ax ≅ b. For notational convenience let
Suppose C has m rows and n columns and is of rank k. Suppose further that
a Cholesky factor R for C has been computed. Here R is a k x n upper trian-
gular matrix satisfying
The first two methods to be described appear in Gill, Golub, Murray, and
Saunders (1972).
Since ᾱ is the only nonzero element in its column and the column is of unit
euclidean length, it follows that ᾱ = ±1. Furthermore, since the rows also
have unit euclidean length, u must be zero. This in turn implies that w^T = ᾱv^T
= ±v^T. Thus Eq. (27.39) can be written as
Defining
Thus Q̃ and R̃ satisfy the conditions required of Q and R in Eqs. (27.35) and
(27.36).
The question of whether the rank of R̃ is k or k - 1 depends upon whether
ᾱ in Eq. (27.38) is nonzero or zero. By assumption the k diagonal elements
of R are each nonzero. If ᾱ is nonzero, it is easily verified by noting the
structure and effect of the Givens transformation matrices used in passing
from Eq. (27.38) to Eq. (27.40) that the k diagonal elements of R̃ will also be
nonzero.
Suppose on the other hand that ᾱ is zero. Then ||p|| = 1, which assures
that some components of p are nonzero. Let p_l denote the last nonzero com-
ponent of p; i.e., p_l ≠ 0, and if l < k, then p_i = 0 for l < i ≤ k. In the order
in which the matrices G_{i,k+1} are applied in transforming Eq. (27.38) to Eq.
(27.40), the matrix G_{l,k+1} will be the first transformation matrix that is not
simply an identity matrix. This matrix G_{l,k+1} will be a (signed) permutation
matrix that interchanges rows l and k + 1, possibly changing the sign of one
of these rows. In particular, its effect on R will be to replace row l with a row
of zero elements. Subsequent transformations will not alter this row. There-
fore, row l of R̃ in Eq. (27.40) will be zero.
Thus Rank(R̃) will be less than k. The rank of R̃ cannot be less than k - 1
since Rank(R̃) = Rank(C̃) ≥ Rank(C) - 1 = k - 1.
or equivalently
as a consequence of the unit euclidean length of the column vector (p^T, ᾱ, 0)^T
in Eq. (27.38). The sign of ᾱ is arbitrary and so can be taken to be nonnega-
tive, as is implied by Eq. (27.46).
Having computed p and ᾱ one can compute R̃ by using k Givens transfor-
mations as described following Eq. (27.38) in Method 1. This part of the com-
putation is expressed by the equation
This method has been discussed by Golub (1969), Chambers (1971), and
Gentleman (1972a and 1972b). Following the practice of these authors we
shall restrict our discussion to the case in which Rank(C) = Rank(C̃) = k
= n. Properties of the more general case will be developed in the Exercises.
Letting i denote the imaginary unit (i^2 = -1), Eq. (27.36) can be written
as
By formal use of the Givens equations (3.5), (3.8), and (3.9), matrices
F^(l) can be defined that will triangularize as follows:
Here the matrix F^(l) differs from the identity matrix only at the intersec-
tions of rows l and k + 1 with columns l and k + 1. In these four positions
F^(l) contains the submatrix
where
It can be shown that our assumption that both C and C̃ are of full rank n
implies that the numbers d^(l) defined by Eq. (27.53) are positive.
The multiplication by F^(l) indicated in Eq. (27.50) can be expressed entirely
in real arithmetic as follows:
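The following Python sketch illustrates a real-arithmetic row-removal update of this type for the full-rank case. The quantities c and s formed at each step correspond to the secant/tangent quantities discussed in Exercise (27.60) below; the variable names are ours.

    import numpy as np

    def downdate(R, w):
        # Return Rnew with Rnew^T Rnew = R^T R - w w^T (full-rank case).
        R, w = R.copy(), w.astype(float).copy()
        for l in range(R.shape[0]):
            d = np.sqrt(R[l, l]**2 - w[l]**2)   # fails if removal loses rank
            c, s = d / R[l, l], w[l] / R[l, l]
            R[l, l] = d
            R[l, l+1:] = (R[l, l+1:] - s * w[l+1:]) / c
            w[l+1:] = c * w[l+1:] - s * R[l, l+1:]
        return R

    rng = np.random.default_rng(2)
    C = rng.standard_normal((9, 4))
    R = np.linalg.qr(C, mode='r')
    Rnew = downdate(R, C[0])                    # remove the first row of C
    print(np.allclose(Rnew.T @ Rnew, C[1:].T @ C[1:]))   # True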
EXERCISES
(27.60) [Chambers (1971)]
(a) Show that the numbers s^(l) and t^(l) of Eqs. (27.55) and (27.56)
can be interpreted as being the secant and tangent, respectively,
of an angle θ^(l).
(b) Show that these angles θ^(l) are the same angles that appear in
the Givens transformations G^(l) (c^(l) = cos θ^(l), s^(l) = sin θ^(l))
that would occur in the process of adjoining the row v^T to R̃
to produce R, i.e., in the operations represented by
(c) With θ^(l), c^(l), and s^(l) defined as above, show that as an alterna-
tive to Eq. (27.58), v^(l) could be computed as
In this appendix we list the essential facts of linear algebra that are used
in this book. No effort is made to present a logically complete set of con-
cepts. Our intention is to introduce briefly just those concepts that are directly
connected with the development of the book's material.
For a real number x define
Two vectors are orthogonal to each other if their inner product is zero.
The euclidean length or euclidean norm or l_2 norm of a vector v, denoted
by ||v||, is defined as ||v|| = (v^T v)^{1/2} = (Σ_i v_i^2)^{1/2}.
positive homogeneity,
We shall also have occasion to use the Frobenius norm (also called the
Schur or euclidean matrix norm) of a matrix A, denoted by ||A||, and defined
as ||A|| = (Σ_i Σ_j a_ij^2)^{1/2}.
We shall let the numeral 0 denote the zero vector or the zero matrix with
the distinction and dimension to be determined by the context.
A set of vectors v_1, ..., v_k is linearly dependent if there exist scalars a_1,
..., a_k, not all zero, such that a_1 v_1 + ... + a_k v_k = 0.
The elements of the matrices (A_k - s_k I_n) and R_k are bounded in magnitude
by 2||A|| for all k = 1, 2, ....
We must examine the basic operations of the QR method with shifts to
establish properties of certain intermediate quantities and ultimately to esti-
mate the magnitude of certain of the off-diagonal terms of the matrices Ak.
Denote the diagonal terms of the shifted matrix (A_k - s_k I_n) by
By the choice rule for s_k [see Eq. (18.4)] it follows that the eigenvalue of the
and
where each J^(k)_{i-1,i} is a rotation in the plane determined by the (i - 1)st and ith
coordinate axes. The scalars c^(k)_i and s^(k)_i of the rotation matrices, as given in
Lemma (3.4), define the rotation by means of the identity
Here each P^(k)_i is defined as in Eq. (3.5) with c and s appropriately super- and
subscripted.
Following premultiplication of (A_k - s_k I_n) by the first i - 2 of the rota-
tion matrices we shall have
From Eqs. (B.10) and (B.24) and the fact that the sequence is
bounded [Lemma (B.2)], we obtain
From Eqs. (B.25), (B.26), and (B.29) we have b^(k)_{n-1} → 0, but since the
sequence {b^(k)_n} is bounded, this implies , contradicting the as-
sumption that L > 0. This completes the proof of Lemma (B.22).
(B.30) LEMMA
The sequence , k = 1, 2, ..., contains arbitrarily small terms.
or
It follows that for any ε > 0 there exists an integer k̄ depending upon ε such
that
and
differ at most by ε. Thus with a possible reindexing of the λ_i we have the
inequalities
we have
Now using Eq. (B.41) on the second and third terms of the right side of
inequality (B.42), observing that |ξ - λ_m| ≥ d, followed by use of
Eq. (B.4) with b^(k)_n = ε on the fourth term of the right side of inequality
(B.42), we have
EXERCISE
Introduction
HFTI
This set of problems can also be written as the matrix least squares problem:
SVA
The subroutine performs a singular value analysis of the least squares problem
where
The subroutine SVA uses the subroutine SVDRS to compute the singu-
lar value decomposition
with components
where v_j denotes the jth column vector of V. Reverting to the original vari-
ables we have the corresponding candidate solutions
The quantities
are computed. For k ≤ n the quantity ρ_k^2 is the sum of squares of residuals
associated with the kth candidate solution; i.e., ρ_k^2 = ||b - Ax^(k)||^2.
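In outline, the analysis computes the candidate family as follows (a dense Python sketch standing in for SVDRS and SVA, assuming m ≥ n and no zero singular values):

    import numpy as np

    def sva(A, b):
        U, s, Vt = np.linalg.svd(A, full_matrices=True)
        g = U.T @ b
        n = A.shape[1]
        x = np.zeros(n)
        for k in range(1, n + 1):
            x = x + (g[k-1] / s[k-1]) * Vt[k-1]   # add the kth SVD term
            rho = np.linalg.norm(b - A @ x)       # rho_k = ||b - A x^(k)||
            print(k, np.linalg.norm(x), rho)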
It is possible that the m × (n + 1) data matrix [A : b] provided to this
subroutine may be a compressed representation of a least squares problem
involving more than m rows. For example, [A : b] may be the triangular
matrix resulting from sequential Householder triangularization of a large set
of data as described in Chapter 27. The user provides an integer MDATA
that specifies the number of rows of data in the original problem. Of course,
if [A : b] is the original data, then MDATA and M should be set to the same
value. The number MDATA is used in computing
Under appropriate statistical hypotheses on the data [A : b], the number
σ_k can be interpreted as an unbiased estimate of the standard deviation of
errors in the data vector b.
Adapting Eqs. (25.47) and (25.48) to the notation of this Subroutine User's
Guide, we compute the following quantities, which may be used for a Leven-
berg-Marquardt analysis:
Usage
B( )
The array B( ) initially contains the M-vector b of the least
squares problem Ax ≅ b. On output the M-vector g is stored
in the array B( ).
SING( )
On return the singular values of the scaled matrix A are stored
in descending order in SING(i), i = 1, ..., min(M, N). If M
< N, SING(M+1) through SING(N) will be set to zero.
KPVEC( )
Option array controlling report generation. If KPVEC(1) = 0,
default settings will be used, producing the full report, sending
it to the standard system output unit, formatted with a max-
imum line length of 79. If KPVEC(1) = 1, the contents of
KPVEC(i), i = 2, ..., 4, set options for the report as follows:
KPVEC(2)
The decimal representation of KPVEC(2) must be at most 6
digits, each being 0 or 1. The decimal digits will be interpreted
as independent on/off flags for the 6 possible blocks of the report
that were described above. Examples: 111111, which is the
default value, selects all blocks; 101010 selects the 1st, 3rd,
and 5th blocks; 0 suppresses the whole report.
KPVEC(3)
Define UNIT = KPVEC(3). If UNIT = -1, which is the
default value, output will be written to the "*" output unit,
i.e., the standard system output unit. If UNIT > 0, UNIT will
Example of Usage
SVDRS
USER'S GUIDE TO SVDRS: SINGULAR VALUE
DECOMPOSITION OF PROBLEM LS
Purpose
Given an M × N matrix A and an M × NB matrix B, this subroutine
computes the singular values of A and also computes auxiliary quantities
useful in analyzing and solving the matrix least squares problem AX ≅
B. Denoting the singular value decomposition of A by A = USV^T, this
subroutine computes S, V, and G = U^T B.
To complete the solution of the least squares problem AX ≅ B, the
user must first decide which small singular values are to be treated as zero.
Let S^+ denote the matrix obtained by transposing S and reciprocating
the significant singular values and setting the others to zero. Then the
solution matrix X can be obtained by computing P = S^+ G and X = VP.
Either M ≥ N or M < N is permitted. Note that if B = I, then X is the
pseudoinverse of A.
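The completion step can be indicated briefly; the following is a numpy sketch of the same recipe (not the subroutine itself), with the truncation tolerance chosen by the user.

    import numpy as np

    def complete_solution(A, B, tol):
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        G = U.T @ B                                # the G = U^T B of SVDRS
        splus = np.array([1.0 / si if si > tol else 0.0 for si in s])
        return Vt.T @ (splus[:, None] * G)         # X = V (S+ G)

    A = np.array([[1.0, 0.0], [0.0, 1e-9], [1.0, 1.0]])
    X = complete_solution(A, np.eye(3), tol=1e-6)  # treats 1e-9 as zero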
Input
The matrices A and B and their dimensioning parameters are input.
Output
The matrices V, G = U^T B, and the diagonal elements of S are output in
the arrays A( , ), B( , ), and S( ), respectively.
Usage
S( )
On conclusion the first N locations contain the ordered singular
values of the matrix: S(1) ≥ S(2) ≥ ... ≥ S(N) ≥ 0.
WORK( )
This is working space of size 2 x N.
Error Message
In subroutine QRBD the off-diagonal of the bidiagonal matrix with
smallest nonzero magnitude is set to the value zero whenever the iteration
counter reaches 10 * N. The iteration counter is then reset. The results
are therefore still likely to be an accurate SVD. This is reported
to SVDRS with the output parameter IPASS = 2. Subroutine SVDRS
then prints the message
FULL ACCURACY NOT ATTAINED IN BIDIAGONAL SVD
Example of Usage
See PROG3 and the subroutine SVA for examples of usage of this
subroutine.
QRBD
Usage
DIMENSION D(n1), E(n1), V(MDV,n1), C(MDC,m1)
CALL QRBD (IPASS, D, E, N, V, MDV, NRV, C, MDC, NC)
The dimension parameters must satisfy n1 ≥ N, m1 ≥ NC, MDV ≥
NRV, MDC ≥ N.
The subroutine parameters are defined as follows:
IPASS   This integer flag is returned with either of the values
1 or 2.
IPASS = 1 signifies that convergence of the QR
BNDACC, BNDSOL
Usage of BNDACC
DIMENSION G(MDG,n1)
The dimensioning parameters must satisfy MDG ≥ m [see Eq. (27.9) for
the definition of m] and n1 ≥ NB + 1.
The user must set IP = 1 and IR = 1 before the first call to BNDACC.
Usage of BNDSOL
This subroutine performs three distinct functions selected by the first
parameter MODE, which may have the values 1, 2, or 3.
The statement CALL BNDSOL (1,...) may be executed after one or
more calls to BNDACC to compute a solution for the least squares problem
whose data have been introduced and triangularized by the calls to BNDACC.
The problem being solved is represented by Eq. (27.11). This statement also
computes the number |e| of Eq. (27.6), which has the interpretation given in
Eq. (27.12). The number |e| will be stored in RNORM.
The computation performed by CALL BNDSOL (1,...) does not alter
the contents of the array G( , ) or the value of IR. Thus, after executing
CALL BNDSOL (1,...), more calls can be made to BNDACC to introduce
additional data.
The statement CALL BNDSOL (2,...) may be used to solve R^T y = d
and the statement CALL BNDSOL (3,...) may be used to solve Rc = w.
These entries do not modify the contents of the array G( , ) or the variable
IR. The variable RNORM is set to zero by each of these latter two state-
ments.
The primary purpose of these latter two statements is to facilitate compu-
tation of columns of the covariance matrix C = (R^T R)^{-1} as described in
Chapter 27. To compute c_j, the jth column of C, one would set d = e_j, the
jth column of the identity matrix. After solving R^T y = d, set w = y and
solve Rc_j = w. This type of usage is illustrated in the Fortran program
PROG5.
The appropriate specification statement is
DIMENSION G(MDG,n1), X(n2)
The dimensioning parameters must satisfy MDG ≥ m [see Eq. (27.9)
for the definition of m], n1 ≥ NB + 1, and n2 ≥ N. The CALL statement is
CALL BNDSOL (MODE, G, MDG, NB, IP, IR, X, N, RNORM)
The subroutine parameters are defined as follows:
MODE Set by the user to the value 1, 2, or 3 to select
the desired function as described above.
G(,), MDG, NB, IP, IR These parameters must contain values as they
were defined upon the return from a preceding
call to BNDACC.
X( ) On input, with MODE = 2 or 3, this array
must contain the N-dimensional right-side vec-
tor of the system to be solved. On output with
MODE = 1, 2, or 3 this array will contain
the N-dimensional solution vector of the appro-
priate system that has been solved.
N Set by user to specify the dimensionality of the
desired solution vector. This causes the sub-
routine BNDSOL to use only the leading N x
N submatrix of the triangular matrix currently
represented in columns 1 through NB of the
array G( ,) and (if MODE = 1) only the first
N components of the right-side vector currently
represented in column NB + 1 of the array
G(,).
If any of the first N diagonal elements are zero,
this subroutine executes the Fortran statement
STOP after printing an appropriate message.
See the text following Algorithm (27.24) for a
Example of Usage
See PROG5 for an example of the usage of BNDACC and BNDSOL.
LDP
Purpose
This subroutine computes the solution of the following constrained least
squares problem:
Problem LDP: Minimize ||x|| subject to Gx ≥ h. Here G is an M × N
matrix and h is an M-vector.
No initial feasible vector, x_0, is required. Thus the subroutine LDP can be
used to obtain a solution to an arbitrary set of linear inequalities, Gx ≥ h. If
these inequalities have no solution the user is notified by means of a flag
returned by the subroutine.
Method
This subroutine implements Algorithm LDP (23.27), which reduces
Problem LDP (23.3) to a related Problem NNLS (23.2). Problem NNLS is
solved with subroutine NNLS and a solution to Problem LDP is then ob-
tained or it is noted that no solution exists.
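The reduction is short enough to state in full. The following Python transcription of Algorithm LDP (23.27) uses SciPy's nnls in the role of the NNLS subroutine; names are ours.

    import numpy as np
    from scipy.optimize import nnls

    def ldp(G, h):
        m, n = G.shape
        E = np.vstack([G.T, h])                  # (n+1) x m matrix
        f = np.zeros(n + 1)
        f[n] = 1.0
        u, _ = nnls(E, f)                        # Problem NNLS
        r = E @ u - f
        if np.linalg.norm(r) == 0.0:             # in practice, test a tolerance
            raise ValueError("inequalities G x >= h are incompatible")
        return -r[:n] / r[n]

    G = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]])
    h = np.array([-3.0, -3.0, 1.0])              # x1 <= 3, x2 <= 3, x1 + x2 >= 1
    print(ldp(G, h))                             # [0.5, 0.5]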
Usage
DIMENSION G(MDG,n1), H(m1), X(n1), W(n2)
INTEGER INDEX(m1)
CALL LDP (G, MDG, M, N, H, X, XNORM, W, INDEX, MODE)
The dimensioning parameters must satisfy MDG ≥ M, n1 ≥ N, m1 ≥ M,
and n2 ≥ (M + 2)*(N + 1) + 2*M.
The subroutine parameters are defined as follows:
G( , ), MDG, M, N   The array G( , ) has first dimensioning parameter
MDG and contains the M × N matrix G of Problem
LDP. Either M ≥ N or M < N is permitted and
there is no restriction on the rank of G. The contents
of G( , ) are not modified.
H( ) The array H( ) contains the M-vector h of Problem
LDP. The contents of H( ) are not modified.
X( )                The contents of the array X( ) do not need to be
defined initially. On normal termination (MODE = 1)
X( ) contains the solution vector x. On an error
termination (MODE ≥ 2) the contents of X( ) are
set to zero.
XNORM               On normal termination (MODE = 1) XNORM con-
tains ||x||. On an error termination (MODE ≥ 2)
the value of XNORM is set to zero.
W( )                This array is used as temporary working storage.
INDEX( )            This array is used as temporary type INTEGER
working storage.
MODE                This flag is set by the subroutine to indicate the
status of the computation on completion. Its value
indicates the reason for terminating:
1 = successful.
2 = bad dimension. The condition N ≤ 0 was noted
in subroutine LDP.
3 = maximum number (3*M) of iterations
exceeded in subroutine NNLS.
4 = inequality constraints Gx ≥ h are incompatible.
Example 1
See program PROG6 for one example of the usage of this subroutine.
Example 2
Suppose a 5 × 15 matrix G and a 5-dimensional vector h are given. The
following code will solve Problem LDP for this data:
NNLS
H12
Purpose
This subroutine implements Algorithms H1 and H2 (10.22). Given an m-
vector v and integers lp and l1, this subroutine computes an m-vector u and
a number s such that the m × m (Householder) symmetric orthogonal matrix
Q = I + (uu^T)/(su_lp) satisfies Qv = w, where w_i = v_i for i < lp, |w_lp| =
(v_lp^2 + Σ_{i=l1}^m v_i^2)^{1/2}, w_i = v_i for lp < i < l1, and w_i = 0 for
l1 ≤ i ≤ m. Additionally this matrix Q may be multiplied times a given set of
NCV m-vectors.
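For illustration, the transformation can be constructed explicitly as follows (a numpy sketch with 0-based indices and names of our own; H12 itself never forms Q, applying it instead through u and the stored scalar UP):

    import numpy as np

    def householder(v, lp, l1):
        # Build Q with Qv zero in components l1..m-1, v[lp] replaced by s,
        # and all other components unchanged.  Assumes a nontrivial case.
        m = len(v)
        s = np.sqrt(v[lp]**2 + np.sum(v[l1:]**2))
        if v[lp] > 0:
            s = -s                       # sign chosen to avoid cancellation
        u = np.zeros(m)
        u[lp] = v[lp] - s
        u[l1:] = v[l1:]
        return np.eye(m) + np.outer(u, u) / (s * u[lp]), s

    Q, s = householder(np.array([3.0, 1.0, 2.0, 2.0]), 0, 2)
    print(Q @ np.array([3.0, 1.0, 2.0, 2.0]))   # [s, 1, 0, 0], |s| = sqrt(17)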
Usage
CALL H12 (MODE, LPIVOT, L1, M, U, IUE, UP, C, ICE, ICV, NCV)
The parameters are defined as follows:
G1, G2
Purpose
Given the numbers x1 and x2, the subroutine G1 computes data that
defines the 2 × 2 orthogonal rotation matrix G such that G(x1, x2)^T =
(r, 0)^T, where r = (x1^2 + x2^2)^{1/2}.
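In its simplest form the computation is the following (a Python sketch; the actual subroutine also guards against overflow and underflow in forming r):

    import numpy as np

    def g1(x1, x2):
        r = np.hypot(x1, x2)
        if r == 0.0:
            return 1.0, 0.0, 0.0                 # conventional choice for x = 0
        return x1 / r, x2 / r, r                 # c, s, r

    c, s, r = g1(3.0, 4.0)
    print(c * 3.0 + s * 4.0, -s * 3.0 + c * 4.0) # 5.0  0.0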
MFEOUT
USER'S GUIDE TO MFEOUT: MATRIX OUTPUT
SUBROUTINE
Purpose
Usage
A(,)
Array containing an M x N matrix to be output.
MDA
Dimension of first subscript of the array A(,).
M,N
NAMES( )
NAMES(i) contains a character string label to be output in as-
sociation with row i of the matrix. If NAMES(1) contains only
blank characters, it will be assumed that no names have been
provided, and the subroutine will not access the NAMES( )
array beyond the first element and will output blank labels for
all rows.
MODE
MODE = 1 provides headings appropriate for the V matrix of
the singular value decomposition and uses a numeric format of
4pf14.0.
MODE = 2 provides headings appropriate for the array of
candidate solutions resulting from singular value analysis and
uses a numeric format of g14.6.
UNIT
Selects the output unit. If UNIT > 0, then UNIT is the
output unit number. If UNIT = -1, output will go to the "*"
unit, i.e., the standard system output unit.
WIDTH
Selects the width of output lines. Each output line from this
subroutine will have at most max(26, min(124, WIDTH)) char-
acters plus one additional leading character for Fortran "car-
riage control," which will always be a blank.
GEN
Purpose
This FUNCTION is used by PROG1, PROG2, and PROG3 to generate
data for test cases. By basing the generation method on small integers it is
intended that the same test cases can be generated on virtually all computers.
Method
This FUNCTION generates two sequences of integers:
and
The sequence {I_k} has period 10, while the sequence {J_l} has period 332.
On the kth call after initialization, GEN produces the REAL output value
The next member of the sequence {J_l} is produced only when ANOISE > 0.
No claim is made that this sequence has any particular pseudorandom
properties.
Usage
This FUNCTION must be initialized by a statement of the form
DIFF
Purpose
This FUNCTION is used by the subroutines HFTI, QRBD, LDP, and
NNLS to make the test "If ((x + h) - x) = 0," which is used in place of
the test "If (|h| > η|x|)," where η is the relative precision of the machine
floating point arithmetic.
Method
In the intended usage of this FUNCTION the intermediate sum z = x
+ h is computed in the calling program using η-precision arithmetic. Then
the difference d = z - x is computed by this FUNCTION using η-precision
arithmetic.
Usage
This FUNCTION can be used for the test described in Purpose with
an arithmetic IF statement of the form
The statement numbered 10 corresponds to the condition |h| > η|x|. The
statement numbered 20 corresponds to the condition |h| ≤ η|x|.
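The effect of the test is easy to observe in any floating point arithmetic; for example, in Python double precision:

    def negligible(h, x):
        # True exactly when x + h rounds back to x, i.e. |h| is below
        # roundoff relative to x -- the test made by DIFF.
        return (x + h) - x == 0.0

    print(negligible(1e-20, 1.0))   # True:  1.0 + 1e-20 rounds to 1.0
    print(negligible(1e-10, 1.0))   # False: the sum is representably larger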
BVLS
USER'S GUIDE TO BVLS: BOUNDED VARIABLES LEAST
SQUARES
Purpose
This Fortran 90 subroutine finds the least squares solution to an M × N
system of linear equations Ax ≅ b, subject to the constraints
Method
Problem BVLS is solved using an algorithm of C. Lawson and R. Han-
son. It is a generalization of the algorithm NNLS described in Chapter
23. In the following descriptions we refer to sets F, P, and Z. Set F con-
sists of variables fixed throughout the solution process due to their lower
and upper bounds being equal. All other variables are in sets P or Z.
The membership of sets P and Z may change during the solution process.
Variables in Set P are determined to minimize the objective function
subject to the variables in Set Z being held (temporarily) at fixed values.
A variable in Set Z may be at one of its bounds or at zero.
Usage
interface
subroutine bvls(A, B, BND, X, RNORM,&
NSETP, W, INDEX, IERR)
real(kind(1e0)) A(:,:), B(:), BND(:,:), X(:), RNORM, W(:)
integer NSETP, INDEX(:), IERR
end subroutine
end interface
A(:,:) [INTENT(inout)]
On entry, A(:,:) contains the M × N matrix A. On return,
A(:,:) contains the product matrix QA, where Q is an M × M
orthogonal matrix generated implicitly by this subroutine. The
values M and N are the respective numbers of rows and columns
in A(:,:). Thus M = size(A,1) and N = size(A,2). Required
are M > 0 and N > 0.
B(:) [INTENT(inout)]
On entry, B(:) contains the M-vector b. On return, B(:)
contains Qb.
BND(1:2,:) [INTENT(in)]
The lower bound α_i for x_i must be given in BND(1,i) and the
upper bound β_i in BND(2,i). Required are α_i ≤ β_i. To indi-
cate that x_i has no lower bound, set BND(1,i) = -HUGE(1.0e0).
To indicate that x_i has no upper bound, set BND(2,i) =
HUGE(1.0e0).
X(:) [INTENT(out)]
On entry, X(:) need not be initialized. On a normal return,
X(:) will contain the solution N-vector.
RNORM [INTENT(out)]
On a normal return this is the Euclidean norm of the residual
vector.
NSETP [INTENT(out)]
W(:) [INTENT(out)]
On return, W(:) will contain the dual vector w. This is the
negative gradient vector for the objective function at the
final solution point. For j ∈ F, W(j) may have an arbitrary
value. For j ∈ P, W(j) = 0. For j ∈ Z, W(j) is ≤ 0, ≥ 0, or 0,
depending on whether X(j) is at its lower bound, at its upper
bound, or at zero with zero not being a bound.
INDEX(:) [INTENT(out)]
An INTEGER array of length at least N. On exit the value of
NSETP and the contents of this array define the sets P, Z, and
F. The indices in INDEX(1:NSETP) identify P. Letting nf
denote the number of pairs of lower and upper bounds that are
identical, the indices in INDEX(NSETP+1:N-nf) identify
Z, and the indices in INDEX(N-nf+1:N) identify F.
IERR [INTENT(out)]
Indicates the status on return.
= 0  Solution completed OK.
= 1  M ≤ 0 or N ≤ 0.
= 2  Some array has an inconsistent size.
Required are size(B) ≥ M, size(BND,1) = 2, size(BND,2)
≥ N, size(X) ≥ N, size(W) ≥ N, and size(INDEX) ≥ N.
= 3  Input bounds are inconsistent. Required are BND(1,i) ≤
BND(2,i), i = 1, ..., N.
= 4  Exceeds the maximum number of iterations, ITMAX = 3*N.
This last condition may be caused by the problem being very ill-
conditioned. It is possible, but uncommon, that the algorithm
needs more iterations to complete the computation. The itera-
tion counter, ITER, is increased by one each time a variable is
moved from Set Z to Set P.
Remark
It is not required that A be of full column rank. In particular it is
allowed to have M < N. In such cases there may be infinitely many
vectors x that produce the (unique) minimum value of the residual vector
length. The one returned by this subroutine does not have any particular
distinguishing properties. It does not necessarily have the least possible
number of nonzero components nor the minimum euclidean norm of all
solutions.
Functional Description
D
DEVELOPMENTS
FROM 1974 TO 1995

Introduction
(1990b), Paige (1984,1986), Paige and Saunders (1981), Park (1994), Stew-
art (1982), Sun (1983), and Van Loan (1976, 1985). Generalizations of the
QR decomposition are treated in Paige (1990).
new algorithmic ideas for rank estimation and introduced the term rank-
revealing decomposition, which has been further treated by a number of
authors, e.g., Barlow and Vemulapati (1992b), Bischof and Hansen (1991,
1992), Chan and Hansen (1990, 1992, 1994), Chandrasekaran and Ipsen
(1994), Fierro and Hansen (1993), Higham (1987), Mathias (1995), Park
and Eldén (1995a), Shroff and Bischof (1992), and Van Huffel and Zha
(1993).
Deriving formulas for computation of the covariance matrix for use with
the solution method of Chapter 21 is only slightly more involved. Using
the quantities defined in Eqs. (21.4)-(21.7), the resulting formulas are:
Any value can be assigned to y_2. However, y_2 = 0 gives the minimum value
for ||x||. With y_2 = 0, compute y_1, x_1, and x_2, respectively, from
Poor (1992), Quak, et al. (1993), Reichel (1991), Sevy (1995), Springer
(1986), Stewart (1992), Wahba (1990), Williams and Kalogiratou (1993),
Wood (1994), and Xu, et al. (1994). Data assimilation in the fields of me-
teorology and oceanography gives rise to very large and complex nonlinear
least squares problems, e.g., see Navon, et al. (1992), Zou, et al. (1993),
and Zou and Navon (1994).
References
Adams, G. E., Bojanczyk, A. W., and Luk, F. T. (1994) "Computing the
PSVD of Two 2 × 2 Triangular Matrices," SIAM J. Matrix Anal. Appl.,
15, No. 4, 366-382.
Elhay, S., Golub, G. H., and Kautsky, J. (1991) "Updating and Down-
dating of Orthogonal Polynomials with Data Fitting Applications," SIAM
J. Matrix Anal. Appl., 12, No. 2, 327-353.
Elliott, G. H. (1993) "Least Squares Data Fitting Using Shape Preserv-
ing Piecewise Approximations," Numer. Algorithms, 5, No. 1-4, 365-371.
Fausett, D. W. and Fulton, C. T. (1994) "Large Least Squares Problems
Involving Kronecker Products," SIAM J. Matrix Anal. Appl., 15, No. 1,
219-227.
Fernando, K. V. and Hammarling, S. J. (1987) "A Product Induced
Singular Value Decomposition (ΠSVD) for Two Matrices and Balanced Re-
alisation," in Linear Algebra in Signals, Systems, and Control, Eds. Datta,
B. N., Johnson, C. R., Kaashoek, M. A., Plemmons, R. J., and Sontag, E.
D., SIAM Publ., Philadelphia, PA., 128-140.
Fierro, R. D. and Bunch, J. R. (1994) "Colinearity and Total Least
Squares," SIAM J. Matrix Anal. Appl., 15, No. 4, 1167-1181.
Fierro, R. D., Golub, G. H., Hansen, P. C., and O'Leary, D. P. (1993)
"Regularization by Truncated Total Least Squares", Report UNIC-93-14,
(20 pages); SIAM J. Sci. Stat. Comput., [To appear]
Fierro, R. D. and Hansen, P. C. (1993) "Accuracy of TSVD Solutions
Computed from Rank-Revealing Decompositions," revised version of Re-
port UNIC-93-05, (15 pages); submitted to Numer. Math.
Gander, W., Golub, G. H., and Strebel, R. (1994) "Least Squares Fit-
ting of Circles and Ellipses," BIT, 34, No. 4, 558-578.
George, A., and Heath, M. T. (1980) "Solution of Sparse Linear Least
Squares Problems Using Givens Rotations," Linear Algebra Appl., 34, 69-
83.
George, A., Heath, M. T., and Ng, E. (1984) "Solution of Sparse Under-
determined Systems of Linear Equations," SIAM J. Sci. Statist. Comput.,
5, No. 4, 988-997.
George, A. and Liu, J. W-H (1981) Computer Solution of Large Sparse
Positive Definite Systems, Prentice-Hall, Englewood Cliffs, 324 pp.
Gilbert, J. and Schreiber, R. (1992) "Highly Parallel Sparse Cholesky
Factorization" SIAM J. Sci. Statist. Comput., 13, 1151-1172.
Gill, P. E., Hammarling, S. J., Murray, W., Saunders, M. A., and
Wright, M. H. (1986) "User's Guide for LSSOL (Version 1.0): A Fortran
Package for Constrained Linear Least-Squares and Convex Quadratic Pro-
gramming," Systems Optimization Laboratory Tech. Rpt. SOL 86-1, 38
pp.
Gill, P. E., Murray, W., Saunders, M. A., and Wright, M. H. (1984)
"User's Guide for QPSOL (Version 3.2): A Fortran Package for Quadratic
Programming," Systems Optimization Laboratory Tech. Rpt. SOL 84-6,
37 pp.
Moonen, M., Van Dooren, P., and Vandewalle, J. (1992) "A Singular
Value Decomposition Updating Algorithm for Subspace Tracking," SIAM
J. Matrix Anal. Appl., 13, No. 4, 1015-1038.
More, J. J. (1977) "The Levenberg-Marquardt Algorithm: Implementa-
tion and Theory," Numerical Analysis, Proceedings, Biennial Conference,
Dundee 1977, ed. G. A. Watson, Springer-Verlag, Berlin, Heidelberg, New
York, 105-116.
More, J. J., Garbow, B. S., and Hillstrom, K. E. (1980) "User Guide for
MINPACK-1," Argonne National Laboratory Report ANL-80-74, Argonne,
IL.
More, J. J. and Wright, S. J. (1993) Optimization Software Guide, Fron-
tiers in Applied Math., 14, SIAM Publ., Philadelphia, PA., 154 pp.
Navon, I. M., Zou, X., Derber, J., and Sela, J. (1992) "Variational Data
Assimilation with the N.M.C. Spectral Model. Part 1: Adiabatic Model
Tests," Monthly Weather Review, 120, No. 7, 1433-1446.
Ng, E. and Peyton, B. W. (1993) "A Supernodal Cholesky Factoriza-
tion Algorithm for Shared-Memory Multiprocessors," SIAM J. Sci. Statist.
Comput., 14, No. 4, 761-769.
Ng, M. K. (1993) "Fast Iterative Methods for Solving Toeplitz-Plus-
Hankel Least Squares," Electron. Trans. Numer. Anal., 2, 154-170.
Ng, M. K. and Chan, R. H. (1994) "Fast Iterative Methods for Least
Squares Estimations," Numer. Algorithms, 6, Nos. 3-4, 353-378.
Olszanskyi, S. J., Lebak, J. M., and Bojanczyk, A. W. (1994) "Rank-
k Modification Methods for Recursive Least Squares Problems," Numer.
Algorithms, 7, Nos. 2-4, 325-354.
Østerby, O. and Zlatev, Z. (1983) Direct Methods for Sparse Matrices,
Lecture Notes in Computer Science, 157, Springer-Verlag, Berlin, 127 pp.
Paige, C. C. (1979a) "Computer Solution and Perturbation Analysis of
Generalized Least Squares Problems," Math. Comput., 33, 171-184.
Paige, C. C. (1979b) "Fast Numerically Stable Computations for Gen-
eralized Least Squares Problems," SIAM J. Numer. Anal., 16, 165-171.
Paige, C. C. (1984) "A Note on a Result of Sun Ji-Guang: Sensitivity
of the CS and CSV Decompositions," SIAM J. Numer. Anal., 21, 186-191.
Paige, C. C. (1986) "Computing the Generalized Singular Value De-
composition," SIAM J. Sci. Statist. Comput, 7, 1126-1146.
Paige, C. C. (1990) "Some Aspects of Generalized QR Factorizations,"
Reliable Numerical Computation, Ed. Cox, M. G. and Hammarling, S.,
Oxford University Press, Oxford, UK, 73-91.
Paige, C. C. and Saunders, M. A. (1977) "Least Squares Estimation
of Discrete Linear Dynamic Systems Using Orthogonal Transformations,"
SIAM J. Numer. Anal., 14, No. 2, 180-193.
(16.1) Error analysis for the full-rank overdetermined problem, 90
(16.11) Error analysis for the square nonsingular problem, 92
(16.18) Error analysis for the full-rank underdetermined problem, 93
(16.36) Error analysis for the rank-deficient problem, 95
(17.11) Error analysis for mixed precision solution of the full-rank overdetermined problem, 101
(17.15) Error analysis for mixed precision solution of the square nonsingular problem, 102
(17.19) Error analysis for mixed precision solution of the full-rank underdetermined problem, 102
(17.23) Error analysis for mixed precision solution of the rank-deficient problem, 103
(17.29) Error analysis of Householder triangularization with rows of disparate sizes, 104
(17.32) Error analysis of Householder solution of full-rank Problem LS with rows of disparate sizes, 105
(17.37) Growth of elements in Householder triangularization using row and column interchanges, 106
(18.5) Global quadratic convergence of the shifted QR algorithm, 109, 240-247
(18.23) Algebraic rationale for the implicit-shift form of the QR algorithm, 113
(20.9) Solution of Problem LSE, least squares with equality constraints, 136
(20.31) An unconstrained Problem LS having the same solution as Problem LSE, 141
(23.4) Kuhn-Tucker conditions for Problem LSI, 159
(25.49) Optimal property of Levenberg-Marquardt solution, 193
Toeplitz matrices, 295
Total least squares, 286
Transpose, 233
Triangular matrix, 236, 287-288
Tridiagonal matrix, 236

U
Unconstrained least squares problem having the same solution set as an equality constrained least squares problem, 141
Unscaled covariance matrix, 67
Updating a QR decomposition, 164, 174-179, 207-232, 293-294
Use of the V-matrix of the singular value decomposition, 73

V
van der Sluis, A., 187, 294
Vector, 233
  space, 235
Verhey, C. T., 248

W
Wampler, R. H., 132
Wedin, P. A., 41, 294
Weighted least squares, 184
Wertz, H. J., 248
Wielandt, H. W., 24
Wielandt-Hoffman theorem, 23
Wilf, H. S., 132, 195
Wilkinson, J. H., 24, 36, 41, 47, 60, 83, 84, 86, 93, 100, 107, 109, 114, 123, 133, 240

Y
Young, G., 26