
Solving Least Squares Problems
SIAM's Classics in Applied Mathematics series consists of books that were previously
allowed to go out of print. These books are republished by SIAM as a professional
service because they continue to be important resources for mathematical scientists.
Editor-in-Chief
Robert E. O'Malley, Jr., University of Washington
Editorial Board
Richard A. Brualdi, University of Wisconsin-Madison
Herbert B. Keller, California Institute of Technology
Andrzej Z. Manitius, George Mason University
Ingram Olkin, Stanford University
Stanley Richardson, University of Edinburgh
Ferdinand Verhulst, Mathematisch Instituut, University of Utrecht
Classics in Applied Mathematics
C. C. Lin and L. A. Segel, Mathematics Applied to Deterministic Problems in the
Natural Sciences
Johan G. F. Belinfante and Bernard Kolman, A Survey of Lie Groups and Lie Algebras
with Applications and Computational Methods
James M. Ortega, Numerical Analysis: A Second Course
Anthony V. Fiacco and Garth P. McCormick, Nonlinear Programming: Sequential
Unconstrained Minimization Techniques
F. H. Clarke, Optimization and Nonsmooth Analysis
George F. Carrier and Carl E. Pearson, Ordinary Differential Equations
Leo Breiman, Probability
R. Bellman and G. M. Wing, An Introduction to Invariant Imbedding
Abraham Berman and Robert J. Plemmons, Nonnegative Matrices in the Mathemat-
ical Sciences
Olvi L. Mangasarian, Nonlinear Programming
*Carl Friedrich Gauss, Theory of the Combination of Observations Least Subject
to Errors: Part One, Part Two, Supplement. Translated by G. W. Stewart
Richard Bellman, Introduction to Matrix Analysis
U. M. Ascher, R. M. M. Mattheij, and R. D. Russell, Numerical Solution of Boundary
Value Problems for Ordinary Differential Equations
K. E. Brenan, S. L. Campbell, and L. R. Petzold, Numerical Solution of Initial-Value
Problems in Differential-Algebraic Equations
Charles L. Lawson and Richard J. Hanson, Solving Least Squares Problems
J. E. Dennis, Jr. and Robert B. Schnabel, Numerical Methods for Unconstrained
Optimization and Nonlinear Equations
Richard E. Barlow and Frank Proschan, Mathematical Theory of Reliability
Cornelius Lanczos, Linear Differential Operators
Richard Bellman, Introduction to Matrix Analysis, Second Edition
Beresford N. Parlett, The Symmetric Eigenvalue Problem
*First time in print.
Classics in Applied Mathematics (continued)

Richard Haberman, Mathematical Models: Mechanical Vibrations, Population


Dynamics, and Traffic Flow
Peter W. M. John, Statistical Design and Analysis of Experiments
Tamer Basar and Geert Jan Olsder, Dynamic Noncooperative Game Theory, Second
Edition
Emanuel Parzen, Stochastic Processes
Petar Kokotovic, Hassan K. Khalil, and John O'Reilly, Singular Perturbation Methods
in Control: Analysis and Design
Jean Dickinson Gibbons, Ingram Olkin, and Milton Sobel, Selecting and Ordering
Populations: A New Statistical Methodology
James A. Murdock, Perturbations: Theory and Methods
Ivar Ekeland and Roger Temam, Convex Analysis and Variational Problems
Ivar Stakgold, Boundary Value Problems of Mathematical Physics, Volumes I and II
J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in
Several Variables
David Kinderlehrer and Guido Stampacchia, An Introduction to Variational
Inequalities and Their Applications
F. Natterer, The Mathematics of Computerized Tomography
Avinash C. Kak and Malcolm Slaney, Principles of Computerized Tomographic Imaging
R. Wong, Asymptotic Approximations of Integrals
O. Axelsson and V. A. Barker, Finite Element Solution of Boundary Value Problems:
Theory and Computation
David R. Brillinger, Time Series: Data Analysis and Theory
Joel N. Franklin, Methods of Mathematical Economics: Linear and Nonlinear
Programming, Fixed-Point Theorems
Philip Hartman, Ordinary Differential Equations, Second Edition
Michael D. Intriligator, Mathematical Optimization and Economic Theory
Philippe G. Ciarlet, The Finite Element Method for Elliptic Problems
Jane K. Cullum and Ralph A. Willoughby, Lanczos Algorithms for Large Symmetric
Eigenvalue Computations, Vol. I: Theory
M. Vidyasagar, Nonlinear Systems Analysis, Second Edition
Robert Mattheij and Jaap Molenaar, Ordinary Differential Equations in Theory and
Practice
Shanti S. Gupta and S. Panchapakesan, Multiple Decision Procedures: Theory and
Methodology of Selecting and Ranking Populations
Eugene L. Allgower and Kurt Georg, Introduction to Numerical Continuation Methods
Heinz-Otto Kreiss and Jens Lorenz, Initial-Boundary Value Problems and the Navier-
Stokes Equations
Solving Least
Squares Problems

Charles L. Lawson
San Clemente, California

Richard J. Hanson
Rice University
Houston, Texas

siam.
Society for Industrial and Applied Mathematics
Philadelphia
Copyright © 1995 by the Society for Industrial and Applied Mathematics. This SIAM
edition is an unabridged, revised republication of the work first published by Prentice-Hall,
Inc., Englewood Cliffs, New Jersey, 1974.

10 9 8 7 6 5

All rights reserved. Printed in the United States of America. No part of this book may be
reproduced, stored, or transmitted in any manner without the written permission of the
publisher. For information, write to the Society for Industrial and Applied Mathematics,
3600 University City Science Center, Philadelphia, PA 19104-2688.

Library of Congress Cataloging-in-Publication Data

Lawson, Charles L.
Solving least squares problems / Charles L. Lawson, Richard J.
Hanson.
p. cm. - (Classics in applied mathematics; 15)
"This SIAM edition is an unabridged, revised republication of the
work first published by Prentice-Hall, Inc., Englewood Cliffs, New
Jersey, 1974"--T.p. verso.
Includes bibliographical references and index.
ISBN 0-89871-356-0 (pbk.)
1. Least squares-Data processing. I. Hanson, Richard J., 1938-
II. Title. III. Series.
QA275.L38 1995
511'.42--dc20 95-35178

SIAM is a registered trademark.


Contents

Preface to the Classics Edition ix

Preface xi

Chapter 1   Introduction 1
Chapter 2   Analysis of the Least Squares Problem 5
Chapter 3   Orthogonal Decomposition by Certain Elementary
            Orthogonal Transformations 9
Chapter 4   Orthogonal Decomposition by Singular
            Value Decomposition 18
Chapter 5   Perturbation Theorems for Singular Values 23
Chapter 6   Bounds for the Condition Number of a Triangular Matrix 28
Chapter 7   The Pseudoinverse 36
Chapter 8   Perturbation Bounds for the Pseudoinverse 41
Chapter 9   Perturbation Bounds for the Solution of Problem LS 49
Chapter 10  Numerical Computations Using Elementary
            Orthogonal Transformations 53
Chapter 11  Computing the Solution for the Overdetermined
            or Exactly Determined Full Rank Problem 63
Chapter 12  Computation of the Covariance Matrix
            of the Solution Parameters 67
Chapter 13  Computing the Solution for the Underdetermined
            Full Rank Problem 74
Chapter 14  Computing the Solution for Problem LS
            with Possibly Deficient Pseudorank 77
Chapter 15  Analysis of Computing Errors
            for Householder Transformations 83
Chapter 16  Analysis of Computing Errors for the Problem LS 90
Chapter 17  Analysis of Computing Errors for the Problem LS
            Using Mixed Precision Arithmetic 100
Chapter 18  Computation of the Singular Value Decomposition
            and the Solution of Problem LS 107
            Section 1  Introduction 107
            Section 2  The Symmetric Matrix QR Algorithm 108
            Section 3  Computing the Singular Value Decomposition 110
            Section 4  Solving Problem LS Using SVD 117
            Section 5  Organization of a Computer Program for SVD 118
Chapter 19  Other Methods for Least Squares Problems 121
            Section 1  Normal Equations with Cholesky Decomposition 122
            Section 2  Modified Gram-Schmidt Orthogonalization 129
Chapter 20  Linear Least Squares with Linear Equality Constraints
            Using a Basis of the Null Space 134
Chapter 21  Linear Least Squares with Linear Equality Constraints
            by Direct Elimination 144
Chapter 22  Linear Least Squares with Linear Equality Constraints
            by Weighting 148
Chapter 23  Linear Least Squares with Linear Inequality Constraints 158
            Section 1  Introduction 158
            Section 2  Characterization of a Solution 159
            Section 3  Problem NNLS 160
            Section 4  Problem LDP 165
            Section 5  Converting Problem LSI to Problem LDP 167
            Section 6  Problem LSI with Equality Constraints 168
            Section 7  An Example of Constrained Curve Fitting 169
Chapter 24  Modifying a QR Decomposition to Add or Remove Column
            Vectors 174
Chapter 25  Practical Analysis of Least Squares Problems 180
            Section 1  General Considerations 180
            Section 2  Left Multiplication of A and b by a Matrix G 183
            Section 3  Right Multiplication of A by a Matrix H
                       and Change of Variables x = Hx̃ + ξ 185
            Section 4  Append Additional Rows to [A:b] 188
            Section 5  Deleting Variables 194
            Section 6  Singular Value Analysis 196
Chapter 26  Examples of Some Methods of Analyzing
            a Least Squares Problem 199
Chapter 27  Modifying a QR Decomposition to Add or Remove Row
            Vectors with Application to Sequential Processing
            of Problems Having a Large or Banded Coefficient Matrix 207
            Section 1  Sequential Accumulation 208
            Section 2  Sequential Accumulation of Banded Matrices 212
            Section 3  An Example: Line Splines 219
            Section 4  Data Fitting Using Cubic Spline Functions 222
            Section 5  Removing Rows of Data 225
Appendix A  Basic Linear Algebra Including Projections 233
Appendix B  Proof of Global Quadratic Convergence
            of the QR Algorithm 240
Appendix C  Description and Use of FORTRAN Codes
            for Solving Problem LS 248
Appendix D  Developments from 1974 to 1995 284
Bibliography 312
Index 327
Preface to the Classics Edition
An accessible text for the study of numerical methods for solving least
squares problems remains an essential part of a scientific software foundation.
Feedback that we have received from practicing engineers and scientists, as well
as from educators and students in numerical analysis, indicates that this book
has served this purpose. We were pleased when SIAM decided to republish the
book in their Classics in Applied Mathematics series.
The main body of the book remains unchanged from the original book that
was published by Prentice-Hall in 1974, with the exception of corrections to
known errata. Appendix C has been edited to reflect changes in the associated
software package and the software distribution method. A new Appendix D
has been added, giving a brief survey of the many new developments in topics
treated in the book during the period 1974-1995. Appendix D is organized
into sections corresponding to the chapters of the main body of the book and
includes a bibliography listing about 230 publications from 1974 to 1995.
The 1974 book was accompanied by a set of Fortran 66 subroutines that
implemented about ten of the major algorithms described in the book and six
sample main programs that illustrated the use of the subroutines. The codes
were listed in full in the book and distributed first by IMSL, Inc. (now Visual
Numerics, Inc.) of Houston, Texas, and later by C Abaci, Inc. This software
was tested originally on computers of the major vendors of the time and has
been tested further on a wide range of personal computers, workstations, and
supercomputers.
The software from the original book has been upgraded to conform to the
Fortran 77 standard to accompany this SIAM edition. A new subroutine,
BVLS, has been added for solution of the Bounded Variables Least Squares
problem, which is a problem that was not specifically discussed in the original
book. Subroutine BVLS is written in Fortran 90.
For this edition, codes are not listed in the book but rather are being made
available from NETLIB via the Internet. We plan a future augmentation of
the software package in NETLIB with versions in the ANSI C and Fortran 90
languages.
We again wish to thank Gene Golub and the late George Forsythe for sug-
gesting to us in about 1969-1970 the writing of the original edition and for giving
us encouragement throughout the process. Gene again gave significant impetus
to the idea of the current SIAM republication. We are grateful to our colleagues
around the world who examined a draft of Appendix D that we made available
via the Internet and contributed improvements and bibliographic citations, also
via the Internet. We thank Vickie Kearn of SIAM for her cheerful patience in
working out the details of publishing the new edition. Finally we express grati-
tude to our friends and colleagues Fred Krogh and W. Van Snyder for the many
personal favors they provided and for adding constructive comments about our
writing and codes.

PREFACE

This book brings together a body of information on solving least squares


problems whose practical development has taken place mainly during the past
decade. This information is valuable to the scientist, engineer, or student who
must analyze and solve systems of linear algebraic equations. These systems
may be overdetermined, underdetermined, or exactly determined and may or
may not be consistent. They may also include both linear equality and ine-
quality constraints.
Practitioners in specific fields have developed techniques and nomen-
clature for the least squares problems of their own discipline. The material
presented in this book can unify this divergence of methods.
Essentially all real problems are nonlinear. Many of the methods of
reaching an understanding of nonlinear problems or computing using non-
linear models involve the local replacement of the nonlinear problem by a
linear one. Specifically, various methods of analyzing and solving the non-
linear least squares problem involve solving a sequence of linear least squares
problems. One essential requirement of these methods is the capability of
computing solutions to (possibly poorly determined) linear least squares
problems which are reasonable in the context of the nonlinear problem.
For the reader whose immediate concern is with a particular application
we suggest first reading Chapter 25. Following this the reader may find that
some of the Fortran programs and subroutines of Appendix C have direct
applicability to his problem.
There have been many contributors to theoretical and practical develop-
ment of least squares techniques since the early work of Gauss. We particu-
larly cite Professor G. H. Golub who, through broad involvement in both
theoretical and practical aspects of the problem, has contributed many signi-
ficant ideas and algorithms relating to the least squares problem.
The first author introduced the second author to the least squares problem

in 1966. Since that time the authors have worked very closely together in the
adaptation of the stable mathematical methods to practical computational
problems at JPL. This book was written as part of this collaborative effort.
We wish to express our appreciation to the late Professor G. E. Forsythe
for extending to us the opportunity of writing this book. We thank the mana-
gement of the Jet Propulsion Laboratory, California Institute of Technology,
for their encouragement and support. We thank Drs. F. T. Krogh and D.
Carta for reading the manuscript and providing constructive comments. A
number of improvements and simplifications of mathematical proofs were
due to Dr. Krogh.
Our thanks go to Mrs. Roberta Cerami for her careful and cheerful
typing of the manuscript in all of its stages.
CHARLES L. LAWSON
RICHARD J. HANSON
1 INTRODUCTION

This book is intended to be used as both a text and a reference for persons
who are investigating the solutions of linear least squares problems. Such
least squares problems often occur as a component part of some larger com-
putational problem. As an example, the determination of the orbit of a space-
craft is often considered mathematically as a multipoint boundary value
problem in the field of ordinary differential equations, but the computation
of the orbital parameters usually amounts to a nonlinear least squares esti-
mation problem using various linearization schemes.
More generally, almost any problem with sufficient data to overdetermine
a solution calls for some type of approximation method. Most frequently
least squares is the approximation criterion chosen.
There are a number of auxiliary requirements which often arise in least
squares computations which merit identification for algorithmic develop-
ment. Examples:
A problem may require certain equality or inequality relations between
variables. A problem may involve very large volumes of data so that the
allocation of computer storage is of major concern.
Many times the purpose of least squares computation is not merely to
find some set of numbers that "solve" the problem, but rather the investigator
wishes to obtain additional quantitative information describing the relation-
ship of the solution parameters to the data. In particular, there may be a
family of different solution parameters that satisfy the stated conditions
almost equally well and the investigator may wish to identify this indeter-
minacy and make a choice in this family to meet some additional condi-
tions.
This book presents numerical methods for obtaining least squares solu-
tions keeping the points above in mind. These methods have been used suc-
cessfully by many engineers and scientists, together with the authors, in the
NASA unmanned space program.
The least squares problem that we are considering here is known by
different names by those in different scientific disciplines. For example,
mathematicians may regard the (least squares) problem as finding the closest
point in a given subspace to a given point in a function space. Numerical
analysts have also tended to use this framework, which tends to ignore data
errors. It follows that the opportunity to exploit the often present arbitrari-
ness of the solution is disregarded.
Statisticians introduce probability distributions into their conception
of the problem and use such terms as regression analysis to describe this
area. Engineers reach this problem by studying such topics as parameter
estimation, filtering, or process identification.
The salient point is this: When these problems (formulated in any of
these contexts) reach the stage of numerical computation, they contain the
same central problem, namely, a sequence of linear least squares systems.
This basic linear least squares problem can be stated as follows:
PROBLEM LS
Given a real m x n matrix A of rank k ≤ min (m, n), and given a real
m-vector b, find a real n-vector x̂ minimizing the euclidean length of Ax − b.

(The reader can refer to Appendix A for the definitions of any unfamiliar
linear algebraic terms.) We shall use the symbolism Ax ≅ b to denote
Problem LS.
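As a small illustration (not part of the original text), the following sketch solves a made-up overdetermined instance of Problem LS with a standard library routine; the matrix A and vector b are invented for the example.

    import numpy as np

    # A made-up overdetermined instance of Problem LS: m = 4 equations, n = 2 unknowns.
    A = np.array([[1.0, 1.0],
                  [1.0, 2.0],
                  [1.0, 3.0],
                  [1.0, 4.0]])
    b = np.array([6.0, 5.0, 7.0, 10.0])

    # numpy.linalg.lstsq returns a minimizer of the euclidean length of Ax - b
    # (the minimum length minimizer when A is rank deficient).
    x, _, rank, sing_vals = np.linalg.lstsq(A, b, rcond=None)
    print("x =", x)
    print("||Ax - b|| =", np.linalg.norm(A @ x - b))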
This problem can also be posed and studied for the case of complex A
and b. The complex case arises much less frequently in practice than the real
case and the theory and computational methods for the real case generalize
quite directly to the complex case.
In addition to the statement of Problem LS (in a given context) there
is an additional condition: The numerical data that constitute A and b have
only a limited number of accurate digits and after those digits the data are
completely uncertain and hence arbitrary! In practice this is the usual state
of affairs. This is due, in part, to the limited accuracy of measurements or
observations. It is important that explicit note be taken of this situation and
that it be used to advantage to obtain an appropriate approximate solution
to the problem. We shall discuss methods for accomplishing this, particularly
in Chapters 25 and 26.
Consider, as an important example, the case in which the linear least
squares problem arises from a nonlinear one and the solution vector x̂ is
to be used as a correction to be added to a current nominal solution of the
nonlinear problem. The linearized problem will be a useful approximation
to the nonlinear one only in some limited neighborhood. If there are differ-
ent vectors that give satisfactorily small residuals in the linear problem, one
may prefer the one of smallest length to increase the likelihood of staying
within the neighborhood of validity of the linear problem.

[Fig. 1.1  The six cases of Problem LS depending on the relative sizes of m, n, and Rank (A).]
The following three points are fundamental to our approach to least
squares computation.
1. Since the data defining Problem LS are uncertain, we can change
these data to suit our needs within this uncertainty.
2. We shall use orthogonal elimination matrices directly on the linear
system of Problem LS.
3. Practicality, for computer implementation, is stressed throughout.

We shall briefly comment on the first two of these points. Our goal in
changing the problem, as stated in point 1, will be to avoid the situation above
where a "small" change in the data produces "large" changes in the solution.
Orthogonal transformation matrices, mentioned in point 2, have a natural
place in least squares computations because they leave the euclidean length
of a vector invariant. Furthermore, it is desirable to use them because of their
stability in computation with regard to propagating data errors or uncer-
tainty.
We have made no assumption regarding the relative sizes of m and n.
In our subsequent discussion of Problem LS it will be convenient to identify
the six cases illustrated in Fig. 1.1.
The main interest of this book will be in Case 2a with special attention
to the situation where data uncertainty leads to Case 2b. Algorithms and
discussions applicable to all six cases will be given, however.
2 ANALYSIS OF THE
LEAST SQUARES PROBLEM

The central point of this chapter will be an analysis of Problem LS based


on a certain decomposition of an m x n matrix A in the form HRKT where
H and K are orthogonal matrices. The definition of an orthogonal matrix
and other linear algebraic concepts are summarized in Appendix A.
Our interest in this general type of decomposition A = HRKT is moti-
vated by the practicality and usefulness of certain specific computable decom-
positions of this type that are introduced in Chapters 3 and 4.
An important property of orthogonal matrices is the preservation of
euclidean length under multiplication. Thus for any m-vector y and any
m x m orthogonal matrix Q,

(2.1)    ||Qy|| = ||y||

In the context of Problem LS, the least squares problem of minimizing
the euclidean length of Ax − b, we have

(2.2)    ||Q(Ax − b)|| = ||Ax − b||
for any m x m orthogonal matrix Q and any n-vector x.


The use of orthogonal transformations allows us to express the solution
of Problem LS in the following way:
(2.3) THEOREM
Suppose that A is an m x n matrix of rank k and that

(2.4)    A = HRKT

where
(a) H is an m x m orthogonal matrix.
(b) R is an m x n matrix of the form

         R = [ R11  0 ]
             [  0   0 ]

(c) R11 is a k x k matrix of rank k.


(d) K is an n x n orthogonal matrix.
Define the vector

and introduce the new variable

Define ŷ1 to be the unique solution of

(1) Then all solutions to the problem of minimizing || Ax — b|| are of


the form

where y2 is arbitrary.
(2) Any such x̂ gives rise to the same residual vector r satisfying

(3) The norm of r satisfies

(4) The unique solution of minimum length is



Proof: Replacing A by the right side of Eq. (2.4) and using Eq. (2.2)
yields

Using Eq. (2.5) to (2.8) we see that

for all x.
The right member of Eq. (2.13) has the minimum value ||g2||² when

Equation (2.14) has a unique solution ŷ1 since R11 is of rank k.


The most general solution for y is given by

where y2 is arbitrary.
Then with x̂ defined by Eq. (2.8) one obtains

which establishes Eq. (2.9).


Clearly the minimum length vector y, of the form given in Eq. (2.15),
is that one with y2 = 0. From Eq. (2.8) it follows that the solution x of mini-
mum euclidean length is given by

This completes the proof of Theorem (2.3).

Any decomposition of an m x n matrix A = HRKT, as in Eq. (2.4),


will be called an orthogonal decomposition of A.
In case k = n or k = m, quantities having dimensions involving (n — k)
or (m — k) are respectively absent. In particular, the solution to Problem
LS is unique in case k = n.
Notice that the solution of minimum length, the set of all solutions,
and the minimum value, for the problem of minimizing \\Ax — b\\, are all
unique; they do not depend on the particular orthogonal decomposition.
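A brief numerical sketch of the construction in Theorem (2.3) may help; it is not from the original text, the data are invented, and the full-column-rank Case 2a is assumed, so that H can be taken from a QR decomposition and K = I.

    import numpy as np

    # Invented data: m = 5, n = 3, full column rank, so k = n and K = I (Case 2a).
    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 3))
    b = rng.standard_normal(5)

    # Orthogonal decomposition A = H R K^T with H = Q (m x m) and K = I:
    Q, R = np.linalg.qr(A, mode="complete")

    g = Q.T @ b              # g = H^T b
    g1, g2 = g[:3], g[3:]    # split after k = 3 components
    y1 = np.linalg.solve(R[:3, :3], g1)   # solve R11 y1 = g1
    x = y1                   # x = K (y1, 0), and K = I, k = n here

    # The residual length equals ||g2||, independent of which minimizer is chosen.
    print(np.linalg.norm(A @ x - b), np.linalg.norm(g2))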

EXERCISE

(2.17) Suppose Amxn, m ≥ n, is of rank n with Q1R1 = A = Q2R2. If the
Qi are m x n with orthonormal column vectors and the Ri are n x n
upper triangular, then there exists a diagonal matrix D with diagonal
elements plus or minus one such that Q1D = Q2 and DR1 = R2.
3 ORTHOGONAL DECOMPOSITION
BY CERTAIN ELEMENTARY
ORTHOGONAL TRANSFORMATIONS

In the previous chapter it was shown that Problem LS could be solved


assuming an orthogonal decomposition of the m x n matrix A. In this chap-
ter we shall establish the existence of such a decomposition for any matrix
A using explicit orthogonal transformations.
Two elementary orthogonal transformations that will be useful are pre-
sented in the following two lemmas.
(3.1) LEMMA
Given an m-vector v (which is not the zero vector), there exists an orthog-
onal matrix Q such that

with

and

where v1 is the first component of v.



Proof: Define

and

The proof is completed by verifying directly that the matrix Q of Eq.


(3.3) is symmetric and orthogonal and that the remaining statement of Lem-
ma (3.1) is satisfied.

The transformation defined in Eq. (3.3) was used by A. S. Householder


in discussing certain eigenvalue problems [Householder (1958)]. For this
reason the matrix Q is called a Householder transformation matrix.
This transformation can be viewed geometrically as a reflection in the
(m − 1)-dimensional subspace, S, orthogonal to the vector u. By this it is
meant that Qu = −u and Qs = s for all s ∈ S.
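A short sketch of one standard way to form the reflection of Lemma (3.1) follows; it is illustrative only, and the particular sign choice (taken to avoid cancellation) is an assumption rather than a quotation of the equations omitted from this extraction.

    import numpy as np

    def householder(v):
        # Return a symmetric orthogonal Q with Qv = (s, 0, ..., 0)^T and |s| = ||v||.
        v = np.asarray(v, dtype=float)
        s = -np.linalg.norm(v) if v[0] >= 0 else np.linalg.norm(v)
        u = v.copy()
        u[0] -= s                                   # u = v - s e1
        Q = np.eye(len(v)) - 2.0 * np.outer(u, u) / (u @ u)
        return Q, s

    v = np.array([3.0, 4.0, 0.0, 2.0])
    Q, s = householder(v)
    print(Q @ v)                                    # approximately (s, 0, 0, 0)
    print(np.allclose(Q @ Q.T, np.eye(4)), np.isclose(abs(s), np.linalg.norm(v)))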
In the special case that one wishes to transform only one element of the
vector v to zero, the following Givens transformation [Givens (1954)] is often
used. Since this transformation only modifies two components of a vector,
it suffices to discuss its operation on a 2-vector.
(3.4) LEMMA
Given a 2-vector v = (v1, v2)T (with either v1 ≠ 0 or v2 ≠ 0), there
is a 2 x 2 orthogonal matrix.

with

such that

Proof: Simply put

and

Then the matrix G of Eq. (3.5) can be easily shown to be orthogonal


and satisfy Eq. (3.7), completing the proof of Lemma (3.4).

We also remark that we may choose G to be symmetric as well as orthogo-


nal by defining

with c and s as in Eq. (3.8) and (3.9).
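The following sketch constructs a Givens rotation in the form discussed above; the formulas c = v1/r and s = v2/r are the usual choice and are assumed here, since Eq. (3.5) through (3.9) are not reproduced in this extraction.

    import numpy as np

    def givens(v1, v2):
        # Return c, s with c^2 + s^2 = 1 such that
        # [[c, s], [-s, c]] applied to (v1, v2) gives (r, 0), r = sqrt(v1^2 + v2^2).
        r = np.hypot(v1, v2)
        if r == 0.0:
            return 1.0, 0.0
        return v1 / r, v2 / r

    c, s = givens(3.0, 4.0)
    G = np.array([[c, s], [-s, c]])
    print(G @ np.array([3.0, 4.0]))   # [5, 0]
    print(G @ G.T)                     # identity, so G is orthogonal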


(3.11) THEOREM
Let A be an m x n matrix. There is an m X m orthogonal matrix
Q such that QA = R is zero below the main diagonal.

Let an m x m orthogonal matrix Q1, be chosen as in Lemma (3.1) so


that the first column of Q1A is zeroed in components 2 through m. Next
choose an (m — 1) x (m — 1) orthogonal matrix P2, which when applied
to the (m — 1) vector consisting of components 2 through m of the second
column of Q1A results in zeros in components 3 through m. The transforma-
tion matrix

is orthogonal and Q2Q1A has zeros below the diagonal in both the first two
columns.
Continuing in this way, a product of at most n orthogonal transforma-
tions can be constructed that will transform A to upper triangular form.
These remarks can be formalized to provide a finite induction proof of Theo-
rem (3.11).
The algorithmic details of the construction of these transformations will
be given in Chapter 10. This decomposition of a matrix as a product of an
orthogonal matrix and an upper triangular matrix is called a QR decomposi-
tion. It plays an important role in a number of computational algorithms
of linear algebra [Francis (1961); Golub and Businger (1965)].
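A compact sketch of the construction used in the proof of Theorem (3.11) is given below; it is an illustration in the spirit of the text, not the algorithm of Chapter 10, and the sign convention is an assumption.

    import numpy as np

    def householder_qr(A):
        # Triangularize A by successive Householder reflections: returns Q (m x m)
        # orthogonal and R = Q A with zeros below the main diagonal.
        A = np.array(A, dtype=float)
        m, n = A.shape
        Q = np.eye(m)
        R = A.copy()
        for j in range(min(n, m - 1)):
            x = R[j:, j]
            s = -np.linalg.norm(x) if x[0] >= 0 else np.linalg.norm(x)
            u = x.copy()
            u[0] -= s
            if u @ u == 0.0:
                continue                   # column already has zeros below the diagonal
            Qj = np.eye(m)
            Qj[j:, j:] -= 2.0 * np.outer(u, u) / (u @ u)
            R = Qj @ R
            Q = Qj @ Q
        return Q, R

    A = np.array([[4.0, 1.0], [3.0, 2.0], [0.0, 5.0]])
    Q, R = householder_qr(A)
    print(np.allclose(Q @ A, R), np.allclose(np.triu(R), R), np.allclose(Q @ Q.T, np.eye(3)))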
For Cases 1a and 2a of Fig. 1.1, in which Rank (A) = n, Theorem (3.11)
establishes the existence of an orthogonal decomposition of A. Thus from
Theorem (3.11) we can write

where the matrices QT, R, and In in this representation have the properties
required of the matrices H, R, and KT of an orthogonal decomposition for
a matrix A, as given in Theorem (2.3).

In the case that the rank of A is m, Case 3a of Fig. 1.1, Theorem (3.11)
allows us to write

so that

Here Im, the transposed triangular factor, and Q in this representation have the properties required of
the matrices H, R, and KT for an orthogonal decomposition of a matrix of
rank m, as given in Theorem (2.3).
In Cases 1b, 2b, and 3b of Fig. 1.1, the matrix R obtained in Theorem
(3.11) is not necessarily in the form required for an orthogonal decomposi-
tion.
We proceed to discuss additional transformations that will obtain the
orthogonal decomposition for these cases.
(3.15) THEOREM
Let A be an m x n matrix whose rank k satisfies k < n ≤ m. There
is an m x m orthogonal matrix Q and an n x n permutation matrix
P such that

Here R is a k X k upper triangular matrix of rank k.

Proof: Let the permutation matrix P be selected so that the first


k columns of the matrix AP are linearly independent. By application of
Theorem (3.11) there is an m x m orthogonal matrix such that QAP is upper
triangular. Since the first k columns of AP are linearly independent, the
same will be true of the first k columns of QAP.
All the elements in rows k + 1 through m of columns k + 1 through
n of QAP will be zero. Otherwise the rank of QAP would be greater than k,
which would contradict the fact that A is of rank k. Therefore QAP has
the form indicated in the right member of Eq. (3.16). This completes the
proof of Theorem (3.15).

The submatrix [R : T] in the right member of Eq. (3.16) can be further
transformed to the compact form required of the matrix R in Theorem (2.3).
This transformation is described by the following lemma:

(3.17) LEMMA
Let [R: T] be a k x n matrix where R is of rank k. There is an n x n
orthogonal matrix W such that

Here R is a lower triangular matrix of rank k.

Lemma (3.17) follows from Theorem (3.15) by identification of the


present n, k, [R : T], and W, respectively, with m, n, AT, and QT of Theorem
(3.15).
Lemma (3.17) can be used together with Theorem (3.15) to prove the
following theorem:

(3.19) THEOREM
Let A be an m x n matrix of rank k. Then there is an m x m ortho-
gonal matrix H and an n x n orthogonal matrix K such that

(3.20)    A = HRKT

where

(3.21)    R = [ R11  0 ]
              [  0   0 ]

Here R11 is a k x k nonsingular triangular matrix.

Note that by an appropriate choice of H and K of Eq. (3.20) we can


arrange that R11 of Eq. (3.21) be either upper triangular or lower triangular.
In summary, it has been shown that there are constructive procedures
for producing orthogonal decompositions of the form A = HRKT in each
of the six cases displayed in Fig. 1.1. In all the cases the rank k submatrix
R11 of Theorem (2.3) is obtained in triangular form. Thus it is particularly
easy to compute the solution of Eq. (2.7).
As already noted, in Cases 1a and 2a, the matrix KT of the decomposition
can be taken to be the identity matrix In. Likewise, in Case 3a, the matrix
H can be taken to be the identity matrix Im.
To illustrate these orthogonal decompositions we give a numerical exam-
ple for each of the six cases of Fig. 1.1.
(3.22) CASE 1a Square, nonsingular

(3.23) CASE 2a Overdetermined, full rank

(3.24) CASE 3a Underdetermined, full rank



(3.25) CASE 1b Square, singular

(3.26) CASE 2b Overdetermined, rank deficient



(3.27) CASE 3b Underdetermined, rank deficient



EXERCISES

(3.28) Find an eigenvalue-eigenvector decomposition of the Householder


matrix H = I − 2wwT, ||w|| = 1.
(3.29) Find an eigenvalue-eigenvector decomposition of the Givens reflec-
tion matrix of Eq. (3.10).
(3.30) Show that G of Eq. (3.10) is a Householder transformation matrix.
4
ORTHOGONAL DECOMPOSITION
BY SINGULAR VALUE DECOMPOSITION

In this chapter we shall present another practically useful orthogonal


decomposition of the m x n matrix A. In the previous chapter the matrix A
was expressed as a product HRKT where R was a certain rectangular matrix
whose nonzero elements belonged to a nonsingular triangular submatrix. We
shall show here that this nonsingular submatrix of R can be further simplified
so that it constitutes a nonsingular diagonal matrix. This decomposition is
particularly useful in analyzing the effect of data errors as they influence
solutions to Problem LS.
This decomposition is closely related to the eigenvalue-eigenvector
decomposition of the symmetric nonnegative definite matrices ATA and AAT.
The standard facts concerning the eigenvalue-eigenvector decomposition of
symmetric nonnegative definite matrices are summarized in Appendix A.
(4.1) THEOREM (Singular Value Decomposition)
Let A be an m x n matrix of rank k. Then there is an m x m orthogonal
matrix U, an n x n orthogonal matrix V, and an m x n diagonal matrix
S such that

(4.2)    A = USVT
Here the diagonal entries of S can be arranged to be nonincreasing; all


these entries are nonnegative and exactly k of them are strictly positive.

The diagonal entries of S are called the singular values of A. It will be


convenient to give the proof of Theorem (4.1) first for the special case in
which m = n = Rank (A). The more general statement follows easily from
this special case.

(4.3) LEMMA
Let A be an n x n matrix of rank n. Then there exists an n x n
orthogonal matrix U, an n x n orthogonal matrix V, and an n x n
diagonal matrix S such that

The successive diagonal entries of S are positive and nonincreasing.

Proof of Lemma (4.3): The positive definite symmetric matrix ATA has
an eigenvalue-eigenvector decomposition

(4.5)    ATA = VDVT
where the n x n matrix V is orthogonal and the matrix D is diagonal with


positive nonincreasing diagonal entries.
Define the diagonal matrix S to be the n x n matrix whose diagonal
terms are the positive square roots of the respective diagonal entries of D.
Thus

and

Define the n x n matrix

From Eq. (4.5), (4.7), and (4.8) and the orthogonality of V,

so that U is orthogonal.
From Eq. (4.8) and the fact that V is orthogonal,

This completes the proof of Lemma (4.3).
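The construction in this proof can be illustrated numerically; the sketch below uses an invented nonsingular matrix and library eigenvalue routines (in practice a dedicated SVD routine is preferred, since forming A^T A can lose accuracy).

    import numpy as np

    A = np.array([[2.0, 1.0, 0.0],
                  [0.0, 3.0, 1.0],
                  [1.0, 0.0, 2.0]])      # invented nonsingular matrix

    d, V = np.linalg.eigh(A.T @ A)        # A^T A = V D V^T, eigenvalues ascending
    order = np.argsort(d)[::-1]           # reorder so diagonal entries are nonincreasing
    d, V = d[order], V[:, order]

    S = np.diag(np.sqrt(d))               # S = D^(1/2), positive nonincreasing diagonal
    U = A @ V @ np.linalg.inv(S)          # the matrix defined in the proof, U = A V S^(-1)

    print(np.allclose(U.T @ U, np.eye(3)))    # U is orthogonal
    print(np.allclose(A, U @ S @ V.T))        # A = U S V^T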



Proof of Theorem (4.1): Let

where H, R, and KT have the properties noted in Theorem (3.19).


Since the k x k matrix R11 of Eq. (3.21) is nonsingular, Lemma (4.3)
shows that we may write

Here U and V are k x k orthogonal matrices, while S is a nonsingular diago-


nal matrix with positive and nonincreasing diagonal elements.
It follows from Eq. (4.12) that the matrix R of Eq. (3.21) may be written as

where U is the m x m orthogonal matrix

V is the n x n orthogonal matrix

and S is the m x n diagonal matrix

Then defining U and V as

we see from Eqs. (4.11)-(4.18) that

where U, S, and V have the properties stated in Theorem (4.1). This completes
the proof of Theorem (4.1).

Notice that the singular values of the matrix A are uniquely determined
even though the orthogonal matrices U and V of Eq. (4.19) are not unique.

Let σ be a singular value of A with multiplicity l. Thus among the ordered


singular values there is an index i such that

and

The l-dimensional subspace T of n-space spanned by the column vectors vj,
j = i, i + 1, ..., i + l − 1, of V is uniquely determined by A but the specific
set of orthogonal basis vectors of T constituting columns i, i + 1, ...,
i + l − 1 of V is not uniquely determined.
More specifically, let k = min (m, n) and let Q be a k x k orthogonal
matrix of the form

where P is l x l and orthogonal. If A = USVT is a singular value decomposi-
tion of A and si = si+1 = ··· = si+l−1, then ŨSṼT with

tion of A and st = s1+1 = • • • = s1+1-1, then USVT with

is also a singular value decomposition of A.


The following numerical example gives the singular value decomposition
of the matrix A introduced previously as Example (3.23) for Case 2a.

EXERCISES

(4.21) If A is n x n and symmetric with distinct singular values, then


(a) A has distinct (real) eigenvalues.
(b) A singular value decomposition of A can be used to obtain
an eigenvalue-eigenvector decomposition of A and conversely.
What if the singular values are not distinct?
(4.22) If s1 is the largest singular value of A, then ||A|| = s1.
(4.23) If R is an n x n nonsingular matrix and sn is its smallest singular
value, then ||R⁻¹|| = 1/sn.
(4.24) Let s1 and sn, respectively, denote the largest and smallest singular
values of a matrix Amxn of rank n. Show that sn||x|| ≤ ||Ax|| ≤
s1||x|| for all n-vectors x.
(4.25) Let s1,..., sk be the nonzero singular values of A. Then

(4.26) Let A = USVT be a singular value decomposition of A. Show that


the column vectors of U are eigenvectors of the symmetric matrix
AAT.
5 PERTURBATION THEOREMS
FOR SINGULAR VALUES

The singular values of a matrix are very stable with respect to changes
in the elements of the matrix. Perturbations of the elements of a matrix
produce perturbations of the same, or smaller, magnitude in the singular
values. The purpose of this chapter is to present Theorems (5.7) and (5.10),
which provide precise statements about this stability, and Theorem (5.12),
which gives bounds on the perturbation of singular values due to removing
a column, or row, of a matrix.
These theorems are direct consequences of corresponding theorems about
the stability of eigenvalues of a symmetric matrix. We first quote the three
relevant eigenvalue theorems.
(5.1) THEOREM
Let B, A, and E be n x n symmetric matrices with B — A = E.
Denote their respective eigenvalues by βi, αi, and εi, i = 1, ..., n, each
set labeled in nonincreasing order. Then

A weaker conclusion that is often useful because it requires less detailed


information about E is

(5.2) THEOREM (Wielandt-Hoffman)


With the same hypotheses as in Theorem (5.1),


(5.3) THEOREM
Let A be an n x n symmetric matrix with eigenvalues α1 ≥ α2 ≥ ···
≥ αn. Let k be an integer, 1 ≤ k ≤ n. Let B be the (n − 1) x (n − 1)
symmetric matrix obtained by deleting the kth row and column from A.
Then the ordered eigenvalues βi of B interlace with those of A as follows:

The reader is referred to Wilkinson (1965a), pp. 99-109, for discussion


and proofs of these three theorems as well as the Courant-Fischer Minmax
Theorem on which Theorems (5.1) and (5.3) depend. Theorem (5.2) is due to
Hoffman and Wielandt (1953) and is further treated in Wilkinson (1970).
To derive singular value perturbation theorems from these eigenvalue
perturbation theorems, we shall make use of the relationship between the
singular value decomposition of a matrix A and the eigenvalue-eigenvector
decomposition of the symmetric matrix

(5.4)    C = [ 0   A ]
             [ AT  0 ]
If A is square with a singular value decomposition A = USVT, it is easily


verified that C has an eigenvalue-eigenvector decomposition

where

and

If Amxn, m ≥ n, has a singular value decomposition

then C, as defined in Eq. (5.4), has the eigenvalue-eigenvector decomposition



where

and

Clearly, analogous results hold for the case of m < n. For later reference
we state the following theorem based on the discussion above:
(5.6) THEOREM
Let A be an m x n matrix and k = min (m, n). Let C be the (m + n) x
(m + n) symmetric matrix defined by Eq. (5.4). If the singular values of
A are s1, ..., sk, then the eigenvalues of C are s1, ..., sk, −s1, ...,
−sk, and zero repeated |m − n| times.
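A numerical check of Theorem (5.6), on invented data, may make the correspondence concrete.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 2))       # invented 4 x 2 matrix, so |m - n| = 2
    m, n = A.shape

    C = np.block([[np.zeros((m, m)), A],
                  [A.T, np.zeros((n, n))]])

    print(np.linalg.svd(A, compute_uv=False))      # singular values of A
    print(np.sort(np.linalg.eigvalsh(C))[::-1])    # eigenvalues of C: +s, two zeros, -s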

We may now state the following three theorems regarding the perturba-
tion of singular values.
(5.7) THEOREM
Let B, A, and E be m x n matrices with B − A = E. Denote their
respective singular values by βi, αi, and εi, i = 1, ..., k; k = min
(m, n), each set labeled in nonincreasing order. Then

Proof: Introduce the three symmetric matrices

Then

The eigenvalues of these matrices are related to the singular values of B, A,


and E as stated in Theorem (5.6). Applying Theorem (5.1) to the matrices
B, A, and E, we obtain the result in Eq. (5.8).
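The stability asserted by Theorem (5.7) is easy to observe numerically; the sketch below, on invented data, compares the change in each singular value with the spectral norm of the perturbation.

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((5, 3))           # invented data
    E = 1e-3 * rng.standard_normal((5, 3))    # a small perturbation
    B = A + E

    sa = np.linalg.svd(A, compute_uv=False)
    sb = np.linalg.svd(B, compute_uv=False)
    print(np.max(np.abs(sb - sa)), "<=", np.linalg.norm(E, 2))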

(5.10) THEOREM
With hypotheses as in Theorem (5.7), there follows the inequality

Proof: Introducing Eq. (5.9) and using Theorems (5.6) and (5.2), we
obtain

which is equivalent to Eq. (5.11).


(5.12) THEOREM
Let A be an m x n matrix. Let k be an integer, 1 ≤ k ≤ n. Let B
be the m x (n — 1) matrix resulting from the deletion of column k
from A. Then the ordered singular values βi of B interlace with those
αi of A as follows:

CASE 1 m > n

CASE 2 m < n

Proof: The results in Eq. (5.13) and (5.14) follow directly from appli-
cation of Theorem (5.3) to the symmetric matrices à = ATA and B̃ = BTB.
In Case 1 the eigenvalues of à and B̃ are αi², i = 1, ..., n, and βi², i =
1, ..., n − 1, respectively. In Case 2 the eigenvalues of à are αi², i = 1, ...,
m, and zero repeated n − m times, while the eigenvalues of B̃ are βi², i =
1, ..., m, and zero repeated n − 1 − m times.

EXERCISES

(5.15) Problem [Eckart and Young (1936)]: Given an m x n matrix A, of


rank k, and a nonnegative integer r < k, find an m x n matrix B of
rank r that minimizes ||B — A ||F.
Solution: Let A = USVT be a singular value decomposition of A
with ordered singular values s1 ≥ s2 ≥ ··· ≥ sk > 0. Let S̃ be
constructed from S by replacing sr+1 through sk by zeros. Show
that B = US̃VT solves the stated problem, and give an expression for
||B − A||F in terms of the singular values of A.
Remark: The proof is straightforward using Theorem (5.10). The
proof given by Eckart and Young contains the essential ideas needed
to prove Theorem (5.10).
(5.16) Solve the problem of Exercise (5.15) with || · ||F replaced by || · ||.

(5.17) Define κ(A) to be the ratio of the largest singular value of A to the
smallest nonzero singular value of A. (This "condition number" of
A will be used in subsequent chapters.) Show that if Rank (Amxn) = n
and if Bmxr is a matrix obtained by deleting (n − r) columns from
A, then κ(B) ≤ κ(A).
6 BOUNDS FOR THE CONDITION NUMBER
OF A TRIANGULAR MATRIX

In the discussion of the practical solution of least squares problems in


Chapter 25, the need will arise to estimate the largest and smallest nonzero
singular values of a matrix A since the ratio of these two quantities (the con-
dition number of A) has an interpretation of being an error magnification
factor. This interpretation of the condition number will be presented in
Chapter 9.
The most direct approach is to compute the singular values of A (see
Chapters 4 and 18 and Appendix C). One may, however, wish to obtain
bounds on the condition number without computing the singular values.
In this chapter we present some theorems and examples relating to this
problem.
Most of the algorithms which are described in this book produce as
an intermediate step a nonsingular triangular matrix, say, R, which has
the same nonzero singular values as the original matrix A. In general, a non-
singular triangular matrix is a better starting point for the bounding of singu-
lar values than is a full matrix. Therefore we shall deal only with bounds for
the singular values of a nonsingular triangular matrix R.
Denote the ordered singular values of the n x n nonsingular triangular
matrix R by s1 ≥ s2 ≥ ··· ≥ sn > 0. As noted in Exercise (4.22), s1 = ||R||.
Therefore, one easily computable lower bound for s1 is

Furthermore since R is triangular, the reciprocals of the diagonal elements


of R are elements of R⁻¹. Therefore


From Exercise (4.23), sn⁻¹ = ||R⁻¹||, and thus

Therefore, a lower bound ρ for the condition number κ = s1/sn is avail-
able as

This lower bound ρ, while of some practical use, cannot in general be
regarded as a reliable estimate of κ. In fact, κ can be substantially larger
than ρ.
As our first example [Kahan (1966b), p. 790], let the upper triangular
n x n matrix R be defined by

For example, for n = 4 we have

Using Eq. (6.4) we find ρ = 1 as a lower bound for κ, which tells us nothing
at all. For this matrix a more realistic upper bound for sn can be obtained by
noting the effect of R on the vector y with components

It is easily verified that Ry = z where

Then, using the inequalities



we obtain

The inequality of Eq. (6.9) may also be interpreted as showing that R is


close to a singular matrix in the sense that there exists a matrix E such that
||E|| < 2²⁻ⁿ and R − E is singular. It is easily verified that such a matrix is
given by E = zyT/(yTy).
To illustrate this example for the case of n = 4, we note that y =
(1, 1/2, 1/4, 1/4)T, z = (0, 0, 0, 1/4)T, s4 ≤ 1/4, and κ ≥ 4. Furthermore, subtracting
the matrix

from the matrix R of Eq. (6.6) gives a singular matrix.


This example illustrates the possible danger of assuming a matrix is well
conditioned even when the ρ of Eq. (6.4) is small and the matrix is "innocent
looking."
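The behavior just described can be reproduced numerically. In the sketch below the matrix of Eq. (6.5) is taken to have unit diagonal and −1 in every position above the diagonal, and ρ is taken to be the ratio of the largest to the smallest diagonal magnitude; both readings are assumptions consistent with the surrounding discussion, since Eq. (6.4) through (6.6) are not reproduced in this extraction.

    import numpy as np

    def example_matrix(n):
        # Unit diagonal, -1 strictly above the diagonal (the assumed form of Eq. (6.5)).
        return np.eye(n) - np.triu(np.ones((n, n)), k=1)

    for n in (4, 8, 16, 24):
        R = example_matrix(n)
        rho = np.max(np.abs(np.diag(R))) / np.min(np.abs(np.diag(R)))   # equals 1 here
        s = np.linalg.svd(R, compute_uv=False)
        kappa = s[0] / s[-1]
        print(f"n={n:2d}  rho={rho:.0f}  kappa={kappa:.3e}  2^(n-2)={2.0**(n-2):.3e}")

The true condition number grows roughly like 2^(n-2), as the n = 4 case (κ ≥ 4) already suggests, while ρ stays at 1.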
Recall that we are primarily interested in triangular matrices that arise
from algorithms for solving systems of linear equations. The matrix defined
by Eq. (6.5) has been discussed in the literature because it is invariant under
Gaussian elimination with either full or partial pivoting. It is not invariant,
however, under elimination by Householder transformations using column
interchanges (such as the Algorithm HFTI, which will be described in Chap-
ter 14).
As a numerical experiment in this connection we applied Algorithm
HFTI to the matrix R of Eq. (6.5). Let R̃ denote the triangular matrix
resulting from this operation. We also computed the singular values of R̃.
The computing was done on a UNIVAC 1108 with mixed precision (see
Chapter 17) characterized by single and double precision arithmetic, respec-
tively, of

Table 6.1 lists the computed values of the last diagonal element r̃nn and
the smallest singular value sn as well as the ratio sn/r̃nn for n = 20 through 26.
Observe that in these cases r̃nn provides a good estimate of the size of sn.
It is nevertheless true that there exist n x n matrices with unit column
norms which could be produced by Algorithm HFTI and whose smallest
singular value is smaller than the smallest diagonal element by a factor of
approximately 2¹⁻ⁿ.

Table 6.1  LAST DIAGONAL ELEMENT OF R̃ AND LAST SINGULAR VALUE OF R̃

     n     r̃nn × 10⁸     sn × 10⁸     sn/r̃nn
    20       330.           286.        0.867
    21       165.           143.        0.867
    22        82.5           71.6       0.868
    23        40.5           35.8       0.884
    24        20.5           17.9       0.873
    25         8.96           8.95      0.999
    26         4.59           4.52      0.985

The following example of such a matrix is due to Kahan (1966b), pp. 791-
792. Define the n x n matrix R by

where s and c are positive numbers satisfying s² + c² = 1. For example, if
n = 4, we have

For general n, the matrix R is upper triangular, the column vectors are all of
unit euclidean length, and the inequalities

are satisfied. As established in Chapter 14, these inequalities imply that R


is a matrix that could result from applying the Algorithm HFTI to some
matrix A.
Let T = R⁻¹. The elements of T are given by

For example, if n = 4, we have

Let tn denote the last column vector of R⁻¹. Then

As s → 0⁺ and c = (1 − s²)¹/² → 1⁻, the product ||tn||²rnn² approaches
(4ⁿ⁻¹ + 2)/3 from below. Thus there is some value of s, depending on n, for
which

Then, using Exercise (4.24),

we have

The following theorem shows that this case represents nearly the minimum
value attainable by sn/|rnn|.
(6.13) THEOREM [Stated without proof in Faddeev, et al. (1968)]
Let A be an m x n matrix of rank n. Assume all column vectors of A
have unit euclidean length. Let R be an upper triangular matrix pro-
duced by Householder triangularization of A with interchange of
columns as in Algorithm HFTI (14.9). Then sn, the smallest singular
value of A, is related to rnn, the last diagonal element of R, by

and

Proof: From the way in which R is constructed, its column vectors


have unit euclidean length and it has the same singular values as A. Further-
more, as a result of the use of column interchanges, the following inequalities
will hold:

From Eq. (6.16) it follows that

Since

the assertions in Eq. (6.14) and (6.15) are equivalent to the assertions

and

The assertion in Eq. (6.19) follows from the fact that rnn⁻¹ is an element of
R⁻¹.
To establish Eq. (6.20), we shall develop upper bounds on the magnitudes
of all elements of R⁻¹ and then compute the Frobenius norm of the bound-
ing matrix M.
Define T = R⁻¹. It will be convenient to group the elements of T into
diagonals parallel to the main diagonal. Introduce bounding parameters
for the diagonals as follows:

Then

The elements in the kth superdiagonal of T are expressible in terms of ele-


ments of the main diagonal and preceding superdiagonals of T and elements
of R as follows:

Since |ri,i+k| ≤ |rii|, we may use Eq. (6.21) and (6.23) to obtain

It is then easily verified by induction that

Define an n x n upper triangular matrix M by

and define

For example, if n = 4,

From Eq. (6.21) and (6.25) to (6.27), it follows that the elements of R⁻¹
(= T) are individually bounded in magnitude by the corresponding elements
of M. Thus

To compute the Frobenius norm of M, we note that for j ≥ 2 the sum of
squares of the elements of column j is

This expression is also correct for j = 1, assuming the sum in the middle
member is defined as zero.
Then

The inequalities in Eq. (6.28) and (6.30) together establish the inequality
in Eq. (6.20) and complete the proof of Theorem (6.13).
The following more general theorem has also been stated by Faddeev,
Kublanovskaya, and Faddeeva (1968).
(6.31) THEOREM
Let A and R be matrices as given in Theorem (6.13). The singular
values s1 ≥ s2 ≥ ··· ≥ sn of A are related to the diagonal elements
rii of R by the inequalities

Proof: Define Rj to be the leading principal submatrix of R of order


j. Denote the ith singular value of Rj by si(j). Using Theorem (5.12) we have
the inequalities

Theorem (6.13) may be applied to Ri to obtain

which with Eq. (6.33) establishes the lower bound for si stated in Eq. (6.32).
Define Wj to be the principal submatrix of R of order n + 1 − j consist-
ing of the intersections of rows and columns j through n. Note that the
leading diagonal element of Wj is rjj. Using the properties in Eq. (6.16) and
Exercise (6.37), one obtains

Denote the ith singular value of Wj by wi(j). Then from Theorem (5.12)


we may write

Since w1(j) = ||Wj||, Eq. (6.35) and (6.36) may be combined to obtain the
upper bound for sj in Eq. (6.32), completing the proof of Theorem (6.31).

EXERCISE
(6.37) Let A be an m x n matrix with column vectors aj. Then ||A|| ≤
n¹/² maxj ||aj||.
7 THE PSEUDOINVERSE

If A is an n x n nonsingular matrix, the solution of the problem Ax = b


can be written as x = A⁻¹b where A⁻¹ is the (unique) inverse matrix for A.
The inverse matrix is a very useful mathematical concept even though effici-
ent and reliable contemporary methods [Forsythe and Moler (1967); Wilkin-
son (1965a)] for computing the solution of Ax = b do not involve explicit
computation of A⁻¹.
For Problem LS there arises the question as to whether there exists some
n x m matrix Z, uniquely determined by A, such that the (unique) minimum
length solution of Problem LS is given by x = Zb. This is indeed the case
and this matrix Z is called the pseudoinverse of A. As noted above for the
inverse, the pseudoinverse is a useful mathematical concept but one would
not usually compute the pseudoinverse explicitly in the process of computing
a solution of Problem LS.
The following two theorems lead to a constructive definition of the pseu-
doinverse of an m x n matrix A.
(7.1) THEOREM
Let A be an m x n matrix of rank k with an orthogonal decomposition
A = HRKT as in the hypotheses of Theorem (2.3). Then the unique
minimum length solution of Problem LS is given by

Proof: The conclusion, Eq. (7.2), is an alternative way of writing Eq.


(2.5) to (2.7) and Eq. (2.11).

(7.3) THEOREM
Let Amxn = HRKT as in Theorem (7.1). Define

Then Z is uniquely defined by A; it does not depend on the particular


orthogonal decomposition of A.

Proof: For each j, 1 ≤ j ≤ m, the jth column zj of Z can be written as
zj = Zej, where ej is the jth column of the identity matrix Im. From Theorem
(7.1), zj is the unique minimum length solution of the least squares problem
Axj ≅ ej. This completes the proof of Theorem (7.3).

In view of Theorems (7.1) and (7.3) we make the following definition.


(7.4) DEFINITION
For a general m x n matrix A, the pseudoinverse of A, denoted by
A+, is the n x m matrix whose jth column zj is the unique minimum
length solution of the least squares problem

where ej is the jth column of the identity matrix Im.

This definition along with Theorems (7.1) and (7.3) immediately allow us
to write the minimum length solution to Problem LS as

The following two cases are worthy of special note. For a square nonsin-
gular matrix B, the pseudoinverse of B is the inverse of B:

(7.6)    B⁺ = B⁻¹

For an m x n matrix,

         R = [ R11  0 ]
             [  0   0 ]

with R11 a k x k nonsingular matrix, the pseudoinverse of R is the n x m
matrix

(7.7)    R⁺ = [ R11⁻¹  0 ]
              [  0     0 ]

The pseudoinverse of an m x n matrix A can be characterized in other


ways. Each of the following two theorems provides an alternative character-
ization of the pseudoinverse.
(7.8) THEOREM
If A = HRKT is any orthogonal decomposition of A, as given in
Theorem (2.3), then A⁺ = KR⁺HT where R⁺ is given in Eq. (7.7).
(7.9) THEOREM [The Penrose Conditions; Penrose (1955)]
The n x m matrix A+ is the unique matrix X that satisfies the following
four conditions:
(a) AXA = A
(b) XAX = X
(c) (AX)T = AX
(d) (XA)T = XA

Proofs of these two theorems are left as exercises.
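The Penrose conditions are easy to verify numerically for a specific matrix; the sketch below uses an invented rank-deficient matrix and the library pseudoinverse.

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])      # invented; third column = first + second, so rank 2
    X = np.linalg.pinv(A)

    print(np.allclose(A @ X @ A, A))         # (a) AXA = A
    print(np.allclose(X @ A @ X, X))         # (b) XAX = X
    print(np.allclose((A @ X).T, A @ X))     # (c) AX is symmetric
    print(np.allclose((X @ A).T, X @ A))     # (d) XA is symmetric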


The explicit representation of A+ given in Theorem (7.8) is particularly
useful for computation. For example, if one has the orthogonal decom-
position A of Eq. (3.20), with R11 of Eq. (3.21) nonsingular and triangular,
then

If one has the singular value decomposition of Eq. (4.2), as given in Theo-
rem (4.1), then

         A⁺ = VS⁺UT

where S⁺ is the n x m diagonal matrix whose nonzero entries are the recip-
rocals of the corresponding nonzero entries of S.

As mentioned at the beginning of this chapter, there is usually no need


to construct the pseudoinverse of a matrix A explicitly. For example, if the
purpose of the computation is the solution of Problem LS, then it is more
economical in computer time and storage to compute the solution with the
following three steps [which are essentially repeated from the proof of Theo-
rem (2.3)]

(7.12)    Compute g = HTb and partition it as g = (g1, g2), g1 having k components.
(7.13)    Solve R11y1 = g1 for y1.
(7.14)    Compute x = K(y1, 0).
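As an illustration of these steps (not from the original text), the sketch below uses the singular value decomposition as the orthogonal decomposition, so that H = U, K = V, and R11 is the diagonal matrix of nonzero singular values; the data and the rank tolerance are invented.

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])      # invented rank-deficient matrix
    b = np.array([1.0, 2.0, 0.0, 1.0])

    U, s, Vt = np.linalg.svd(A)
    k = int(np.sum(s > 1e-12 * s[0]))     # numerical rank (pseudorank)

    g = U.T @ b                            # step (7.12): g = H^T b
    y1 = g[:k] / s[:k]                     # step (7.13): solve R11 y1 = g1 (diagonal here)
    x = Vt[:k].T @ y1                      # step (7.14): x = K (y1, 0)

    print(x)
    print(np.allclose(x, np.linalg.pinv(A) @ b))   # agrees with the minimum length solution A+ b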



EXERCISES

(7.15) Verify Eq. (7.6) and (7.7).


(7.16) Prove Theorem 7.8.
(7.17) [Penrose (1955)] Prove Theorem 7.9.
(7.18) Prove (A+)T = (AT)+ and (A+)+ = A.
(7.19) Prove: If Q_{m×n} has orthonormal columns or orthonormal rows, then
Q⁺ = Qᵀ.
If Q_{m×n} is of rank n and satisfies Q⁺ = Qᵀ, then Q has orthonormal columns.
(7.20) [Penrose (1955)] If U and V have orthonormal columns, then (UAVᵀ)⁺ = VA⁺Uᵀ.
(7.21) (a) Let A = USVᵀ be a singular value decomposition of A. Write
singular value decompositions for the four matrices P₁ = A⁺A,
P₂ = I − A⁺A, P₃ = AA⁺, and P₄ = I − AA⁺ in terms of the
matrices U, S, and V.
(b) Deduce from part (a) that the matrices P_i are symmetric and idempotent, hence projection matrices.
(c) Let T_i denote the subspace associated with the projection matrix
P_i; that is, T_i = {x : P_i x = x}. Identify the relationship of each
subspace T_i to the row or column spaces of A.
(7.22) Prove that the most general solution of Ax = b is x = A⁺b + (I − A⁺A)y, where y is arbitrary.
(7.23) [Greville (1966)] For nonsingular matrices one has the identity (AB)⁻¹
= B⁻¹A⁻¹. The analogous equation (AB)⁺ = B⁺A⁺ is not satisfied
by all matrices A_{m×k} and B_{k×n}.
(a) Exhibit two matrices A and B with m = n = 1 and k = 2 such
that (AB)⁺ ≠ B⁺A⁺.
(b) Prove that matrices A_{m×k} and B_{k×n} satisfy (AB)⁺ = B⁺A⁺ if and
only if the range space of B is an invariant space of AᵀA and the
range space of Aᵀ (row space of A) is an invariant space of BBᵀ.
(7.24) If Rank (A_{m×n}) = n, then A⁺ = (AᵀA)⁻¹Aᵀ.
If Rank (A_{m×n}) = m, then A⁺ = Aᵀ(AAᵀ)⁻¹.
(7.25) [Penrose (1955); Graybill, et al. (1966)] Given A, the equations XAAᵀ
= Aᵀ and AᵀAY = Aᵀ are each consistent. If matrices X and Y
satisfy these two equations, respectively, then XA and AY are projection matrices (symmetric and idempotent) and A⁺ = XAY. Note the
simplification of this proposition in case A is symmetric.

(7.26) [Graybill, et al. (1966)] The pseudoinverse of a rectangular matrix A


can be defined in terms of the pseudoinverse of the symmetric matrix
AᵀA or AAᵀ, whichever is more convenient, by the formulas A⁺ =
(AᵀA)⁺Aᵀ or A⁺ = Aᵀ(AAᵀ)⁺, respectively.
(7.27) [Penrose (1955)] If A is normal (i.e., satisfies AᵀA = AAᵀ), then A⁺A
= AA⁺ and (Aⁿ)⁺ = (A⁺)ⁿ.
(7.28) [Penrose (1955)] If A = Σ A_i with A_iA_jᵀ = 0 and A_iᵀA_j = 0 whenever
i ≠ j, then A⁺ = Σ A_i⁺.
8 PERTURBATION BOUNDS
FOR THE PSEUDOINVERSE

Our objective here and in Chapter 9 is to study the relationship of per-


turbation of the data of Problem LS to perturbation of the solution of the
problem. In this chapter we develop the perturbation theorems for the pseudo-
inverse. These theorems are then used in Chapter 9 to study perturbation of
the Problem LS.
In practice the consideration of such perturbations can arise due to the
limited precision with which observable phenomena can be quantified. It is
also possible to analyze the effects of round-off errors in the solution proce-
dure as though their effects were due to perturbed data. This analysis for
algorithms using Householder transformations will be described in Chapters
I5, 16, and 17.
Results relating to perturbation of the pseudoinverse or the solution of
Problem LS have been given by a number of authors. The treatment of this
problem by Wedin (1969) seems most appropriate for our present purposes in
terms of generality and the convenient form of the final results. For earlier
treatments of this perturbation problem or special cases of the problem, see
Golub and Wilkinson (1966), Bjorck (1967a and 1967b), Pereyra (1968),
Stewart (1969), and Hanson and Lawson (1969).
Let A and E be m × n matrices and define the perturbed matrix

    Ã = A + E

and the residual matrix

    G = Ã⁺ − A⁺.

We wish to determine the dependence of G on E and in particular to obtain bounds for ||G|| in terms of ||A⁺|| and ||E||.

It will be convenient to introduce the four projection matrices:

    P = A⁺A,    Q = AA⁺,    P̃ = Ã⁺Ã,    Q̃ = ÃÃ⁺.

These matrices have many useful properties derivable directly from the
Penrose conditions [Theorem (7.9)]. Also see Exercise (7.21) and the summary of standard properties of projection matrices given in Appendix A.
The matrices defined in Eq. (8.1) to (8.4) will be used throughout this
chapter without further reference.
(8.5) THEOREM
Using definitions of Eq. (8.1) and (8.2), the matrix G satisfies

where

These matrices are bounded as follows:

Proof: Write G as a sum of eight matrices as follows:

Using the properties

Eq. (8.13) reduces to



To expose the first-order dependence of G on E, write

The bounds in Eq. (8.10) to (8.12) follow from the facts that ||I − Q|| ≤ 1
and ||I − P̃|| ≤ 1, completing the proof of Theorem (8.5).

We remark that for the case of real numbers α and β (and in fact for square
nonsingular matrices) one has the algebraic identity

which suggests that Eq. (8.10) is a reasonable form to expect in a bound for
||G||. The additional terms G₂ and G₃ are specifically associated with the
cases of nonsquare matrices or square singular matrices in the following
sense: The matrix G₂ can be nonzero only if Rank (A) < m and G₃ can be
nonzero only if Rank (A) < n, since Q = I_m if Rank (A) = m and P = I_n if
Rank (A) = n.
We next wish to replace ||Ã⁺|| in the right sides of Eq. (8.10), (8.11), and
(8.12) by its bound in terms of ||A⁺|| and ||E||. Such a bound is available
only under the assumptions in Eq. (8.16) and (8.17) of the following theorem.
(8.15) THEOREM
Assume

and

Let s_k denote the smallest nonzero singular value of A and ε = ||E||.


Then

and

Proof: The inequality of Eq. (8.17) can be written as ε/s_k < 1 or,

equivalently, s_k − ε > 0. Let s̃_k denote the kth singular value of Ã = A + E.

From Theorem (5.7)

which implies that Rank (A + E) ≥ k. With the inequality of Eq. (8.16)
this establishes Eq. (8.18). The inequality of Eq. (8.20) can be written as

which is equivalent to the inequality of Eq. (8.19). This completes the proof
of Theorem (8.15).
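The bound of Theorem (8.15) is easy to check numerically. The following sketch (Python/numpy, spectral norms; an illustration only, not part of the original text) perturbs a full rank matrix by an E with ||E|| smaller than the smallest singular value and compares ||Ã⁺|| with 1/(s_k − ||E||):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 4))                 # rank 4 with probability 1
    sk = np.linalg.svd(A, compute_uv=False).min()   # smallest singular value
    E = rng.standard_normal((6, 4))
    E *= (0.5 * sk) / np.linalg.norm(E, 2)          # force ||E|| = s_k / 2 < s_k
    At = A + E                                      # the perturbed matrix
    lhs = np.linalg.norm(np.linalg.pinv(At), 2)     # ||A~+||
    rhs = 1.0 / (sk - np.linalg.norm(E, 2))         # bound implied by Theorem (8.15)
    print(lhs, rhs)
    assert lhs <= rhs + 1e-10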
The conditions of Eq. (8.16) and (8.17) are necessary. It is easy to verify
that ||Ã⁺|| may be unbounded if either of these conditions is not satisfied.
As long as we shall be imposing Eq. (8.16) and (8.17) to obtain Eq. (8.19), we
can take advantage of the condition in Eq. (8.18) to prove that ||E||·||A⁺||²
can be replaced by ||E||·||A⁺||·||Ã⁺|| in the bound for ||G₂|| given in Eq.
(8.11). This is accomplished by Theorems (8.21) and (8.22).
(8.21) THEOREM
If Rank (Ã) = Rank (A), then

Proof: This proof is due to F. T. Krogh. Write singular value


decompositions

and

Then from Eq. (8.3) and (8.4) and the assumption that A and Ã have the
same rank, say, k,

and

Define the m x m orthogonal matrix W with submatrices Wij by



Then

Similarly, one can verify that

It remains to be shown that ||W₁₂|| = ||W₂₁||. Let x be any (m − k)-vector


and define

Then, using the orthogonality of W,

and thus

Therefore

where s_{m−k} is the smallest singular value of W₂₂.


Similarly, from

one obtains

Thus ||W₁₂|| = ||W₂₁||, which completes the proof of Theorem (8.21).
(8.22) THEOREM
If Rank (Ã) = Rank (A), then the matrix G₂ defined in Eq. (8.8) satisfies

Proof: Using Rank (Ã) = Rank (A), Theorem (8.21) permits us to


write

Then since

it follows that || G2 || satisfies Eq. (8.23), which completes the proof of Theo-
rem (8.22).

We are now in a position to establish the following theorem, which may


be regarded as a more useful specialization of Theorem (8.5). By making
stronger assumptions, bounds are obtained that do not include ||Ã⁺||.
(8.24) THEOREM
Use definitions of G_i, i = 1, 2, 3, from Eq. (8.7) to (8.9) and, in addition, assume ||E||·||A⁺|| < 1 and Rank (Ã) ≤ Rank (A). Then Rank
(Ã) = Rank (A) and

where

Proof: The conclusion that Rank (Ã) = Rank (A) is obtained from
Theorem (8.15). Using the bound for ||Ã⁺|| obtained in that theorem, one
obtains Eq. (8.25), (8.26), and (8.27) from Eq. (8.10), (8.23), and (8.12), respectively.
Thus the bound of Eq. (8.28) would follow immediately with c = 3. For
practical purposes this would be a satisfactory result. The special result in
Eq. (8.31) is also immediate since in this case || G2 || = ||G3 ||= 0. Of course

this result for a square nonsingular matrix is well known and can be proved
more directly [e.g., see Wilkinson (1965a), pp. 189-190].
The special results of Eq. (8.29) and (8.30) are established as follows. Let
x be a unit m-vector. Define

Then

and there exists a number p such that

and

Let

so that from the inequalities in Eq. (8.25) to (8.27)

Because of the rightmost factors A+ and (I — Q) in Eq. (8.7) to (8.9), we


have

Because of the leftmost factors Ã⁺ and (I − P̃) in Eq. (8.7) to (8.9), the


vector y3 is orthogonal to y1 and y2. Thus

Therefore

Then

which proves Eq. (8.29).


For the conclusion in Eq. (8.30) we have either Rank (A) = n < m, in
which case P = I_n so that G₃ = 0, or else Rank (A) = m < n, in which case
Q = I_m so that G₂ = 0. Thus either y₂ or y₃ is zero in Eq. (8.32), leading to

which establishes Eq. (8.30), completing the proof of Theorem (8.24).

Equations (8.1) to (8.9) and the theorems of this chapter may be used to
prove that with appropriate hypotheses on the rank of A the elements of A+
are differentiable functions of the elements of A.
Examples of some specific differentiation statements and formulas are
given in Exercises (8.33) and (9.22)-(9.24). Note that the formulas given in
these exercises generalize immediately to the case in which t is a k-dimensional variable with components t₁, ..., t_k. One simply replaces d/dt in these
formulas by ∂/∂t_i for i = 1, ..., k.
Differentiation of the pseudoinverse has been used by Fletcher and Lill
(1970) and by Perez and Scolnik (1972) in algorithms for constrained mini-
mization problems. See Golub and Pereyra (1973) and Krogh (1974) for an
application of differentiation of the pseudoinverse to nonlinear least squares
problems in which some of the parameters occur linearly.

EXERCISE

(8.33) [Hanson and Lawson (1969); Pavel-Parvu and Korganoff (1969)]


Let A be an m × n matrix with m > n whose elements are differentiable functions of a real variable t. Suppose that, for t = 0, A is of
rank n. Show that there is a real neighborhood of zero in which A⁺
is a differentiable function of t and the derivative of A⁺ is given by
9 PERTURBATION BOUNDS FOR
THE SOLUTION OF PROBLEM LS

In this chapter the theorems of the preceding chapter will be applied to


study the effect of perturbations of A and b upon the minimum length solu-
tion x of Ax = b. We shall continue to use the definitions given in Eq. (8.1)
to (8.4). Theorem (9.7) will be established without restriction on the relative
sizes of m, n and k = Rank (A). Following the proof of Theorem (9.7) the
bounds obtained will be restated for the three special cases of m = n = k,
m > n = k, and n > m = k.
For convenience in stating results in terms of relative perturbations we
define the relative perturbations

and the quantities

and


The definitions in Eq. (9.1) to (9.6) of course apply only when the respective
denominators are nonzero. The quantity κ of Eq. (9.5) is called the condition
number of A.
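The role of the condition number is easy to see in a small experiment. In the following sketch (Python/numpy, an illustration only; κ is taken here to be the ratio s₁/s_k of the extreme singular values of A), a matrix with κ = 10⁴ is built directly from its singular value decomposition, and a perturbation of A of relative size about 10⁻⁸ produces a much larger relative change in the solution:

    import numpy as np

    rng = np.random.default_rng(1)
    U, _ = np.linalg.qr(rng.standard_normal((20, 3)))
    V, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    A = U @ np.diag([1.0, 1e-2, 1e-4]) @ V.T        # kappa = s1/s3 = 1e4
    b = rng.standard_normal(20)
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    E = rng.standard_normal(A.shape)
    E *= 1e-8 * np.linalg.norm(A, 2) / np.linalg.norm(E, 2)   # ||E||/||A|| = 1e-8
    dx = np.linalg.lstsq(A + E, b, rcond=None)[0] - x
    print(np.linalg.norm(dx) / np.linalg.norm(x))   # roughly kappa * 1e-8 or larger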
(9.7) THEOREM
Let x be the minimum length solution to the least squares problem Ax =
b with residual vector r = b − Ax. Assume ||E||·||A⁺|| < 1 and Rank
(Ã) ≤ Rank (A), and let x + dx be the minimum length solution to the
least squares problem

Then

and

Proof: The conclusion in Eq. (9.8) follows from Theorem (8.15). The
vectors x and x + dx satisfy

and

Thus

Note that r = (I − Q)r = (I − Q)b, which with Eq. (8.8) implies G₂b =
G₂r. Then using Eq. (8.6) to (8.9) gives

Using the bound for ||Ã⁺|| from Theorem (8.15) and the bound for ||G₂||
from Eq. (8.26), the result in Eq. (9.9) is obtained.
Dividing the inequality in Eq. (9.9) by ||x||, using the definitions in Eq.
(9.1) to (9.6) gives the inequality in Eq. (9.10). This completes the proof of
Theorem (9.7).

Observe that when n = k = Rank (A), the matrix G₃ of Eq. (8.9) is zero,
which leads to the fourth term in the right member of the inequalities in Eq.
(9.9) and (9.10) being zero. Similarly when m = k = Rank (A), the matrix
G₂ of Eq. (8.8) and consequently the third term in the right member of the
inequalities in Eq. (9.9) and (9.10) are zero.
Furthermore if either n = k or m = k, then the rank of Ã clearly cannot
exceed the rank of A. Thus the hypothesis Rank (Ã) ≤ Rank (A) that was
used in Theorem (9.7) is automatically satisfied when either n = k or m = k.
These observations provide proofs for the following three theorems.

and

and

and

Alternatively, by an argument similar to that used to establish Eq. (8.30),



it can be shown that

The following exercises illustrate some differentiation formulas that can


be derived from the results of this chapter. See the remarks preceding Exer-
cise (8.33) for an indication of generalizations and applications of these for-
mulas.

EXERCISES

(9.22) Let A be an m × n matrix with m > n whose elements are differentiable functions of a real variable t. Let x be a vector function of t
defined by the condition that Ax = b for all t in a neighborhood U
in which A is differentiable and Rank (A) = n. Show that for t ∈ U,
dx/dt exists and is the solution of the least squares problem

where r = b − Ax.
(9.23) Further show that dx/dt is the solution of the square nonsingular
system

(9.24) If A = QTR where Q is n x m with orthonormal rows and R is n x


n and nonsingular (a decomposition of A obtained by Householder
transformations, for example), show that dx/dt satisfies
10
NUMERICAL COMPUTATIONS
USING ELEMENTARY
ORTHOGONAL TRANSFORMATIONS

We now turn to describing computational algorithms for effecting an


orthogonal decomposition of an m × n matrix A, as given in Theorem (3.19),
and calculating a solution to Problem LS.
Because of the variety of applications using Householder transformations
or Givens transformations, we have found it convenient and unifying to
regard each of these individual transformations as computational entities.
We shall describe each of these in detail. These modules will then be used in
describing other, more complicated, computational procedures.
The computation of the Householder transformation can be broken into
two parts: (1) the construction of the transformation and (2) its application
to other vectors.
The m x m Householder orthogonal transformation can be represented
in the form

    (10.1)    Q = I_m + b⁻¹uuᵀ

where u is an m-vector satisfying ||u|| ≠ 0 and b = −||u||²/2.


In Chapter 3 it was shown how for a given vector v a vector u could be
determined so that Eq. (3.2) was satisfied. In practice, occasions arise in which
one wishes to determine Q to satisfy

The computation of such a Q could be described as a row permutation follow-


ed by an ordinary Householder transformation as described in Lemma (3.1)
followed by a restoring row permutation. We find it more convenient, how-
ever, to view Eq. (10.2) as defining the purpose of our basic computational
module. The effect of the matrix Q in transforming v to y can be described by
means of three nonnegative integer parameters, p, l, and m, as follows:

1. If p > 1, components 1 through p − 1 are to be unchanged.
2. Component p is permitted to change. This is called the pivot element.
3. If p < l − 1, components p + 1 through l − 1 are to be left unchanged.
4. If l ≤ m, components l through m are to be zeroed.

Note that we must have

and

The relation l > m is permitted in the Fortran implementation of this com-


putational procedure (subroutine H12 in Appendix C) and is interpreted to
mean that Q is to be an identity matrix.
The computational steps necessary to produce an m x m orthogonal
matrix Q that satisfies the conditions imposed by the integer parameters p,

l, and m can be stated as follows:

The fact that the matrix Q defined in Eq. (10.11) has the desired properties
is established by the following three lemmas.
(10.12) LEMMA
The m-vector u and the scalar b defined in Eq. (10.5) to (10.10) satisfy

Proof: This follows from the algebraic identities

(10.14) LEMMA
The matrix Q of Eq. (10.11) is orthogonal.
Proof: The verification that QᵀQ = I_m follows directly using Eq.
(10.11) and (10.13).
(10.15) LEMMA
Let y = Qv. Then

Proof: If v = 0, the lemma is obviously true. For v ≠ 0, the easily
verified fact that uᵀv = −b, together with Eq. (10.6) to (10.9), shows that

satisfies Eq. (10.16) to (10.19).

In actual computation it is common to produce the nonzero components


of the vectors u and y in the storage previously occupied by v. The exception
to this is the pth entry in the storage array; a choice must be made as to
whether up or yp will occupy it. We shall let yp occupy it and store up in
an additional location; the quantity b of Eq. (10.10) can be computed from
the identity indicated there whenever it is needed.
After constructing the transformation one will generally wish to apply
the transformation to a set of m-vectors, say, cj, j = 1,..., v. Thus it is
desired to compute

Using the definition of Q given by Eq. (10.11), this computation is accom-


plished as follows:

These computations will now be restated in an algorithmic form suitable


for implementation as computer code. We shall define Algorithm H1(p, l, m,
v, h, C, ν) for constructing and optionally applying a Householder transformation and Algorithm H2(p, l, m, v, h, C, ν) for optionally applying a previously constructed Householder transformation.
The input to Algorithm H1 consists of the integers p, l, m, and ν; the m-vector v; and, if ν > 0, an array C containing the m-vectors c_j, j = 1, ..., ν.
The storage array C may either be an m × ν array containing the vectors
c_j as column vectors or a ν × m array containing the vectors c_j as row vectors.
These two possible storage modes will not be distinguished in describing
Algorithms H1 and H2. However, in references to Algorithm H1 or H2 that
occur elsewhere in this book we shall regard the column storage as normal
and make special note of those cases in which the set of vectors c_j upon
which Algorithm H1 or H2 is to operate are stored as row vectors in the
storage array C.
Algorithm H1 computes the vector u, the number b, the vector y = Qv,
and, if ν > 0, the vectors t_j = Qc_j, j = 1, ..., ν. The output of Algorithm
H1 consists of the pth component of u stored in the location named h, the
components l through m of u stored in these positions of the storage array
named v, components 1 through l − 1 of y stored in these locations of the
storage array named v, and, if ν > 0, the vectors t_j, j = 1, ..., ν, stored in
the storage array named C.
In Algorithm H2 the quantities p, l, and m have the same meaning as for
Algorithm H1. The vector v and the quantity h must contain values computed
by a previous execution of Algorithm H1. These quantities define a transformation matrix Q. If ν > 0, the array C must contain a set of m-vectors c_j,
j = 1, ..., ν, on input and Algorithm H2 will replace these with the vectors
c̃_j = Qc_j, j = 1, ..., ν.
(10.22) ALGORITHMS H1(p, l, m, v, h, C, ν) [use Steps 1-11] and H2(p, l, m,
v, h, C, ν) [use Steps 5-11]
Step Description
1 Set s := (v_p² + Σ_{i=l}^{m} v_i²)^{1/2}.
2 If v_p > 0, set s := −s.
3 Set h := v_p − s, v_p := s.
4 Comment: The construction of the transformation is complete. At Step 5 the application of the transformation to
the vectors c_j begins.
5 Set b := v_p·h.
6 If b = 0 or ν = 0, go to Step 11.
7 For j := 1, ..., ν, do Steps 8-10.
8 Set s := (c_pj·h + Σ_{i=l}^{m} c_ij·v_i)/b.
9 Set c_pj := c_pj + s·h.
10 For i := l, ..., m, set c_ij := c_ij + s·v_i.
11 Comment:
(a) Algorithm H1 or H2 is completed.
(b) In Step 1 the computation of the square root of sum of
squares can be made resistant to underflow by computing it based on the identity (w₁² + ··· + w_m²)^{1/2} =
t[(w₁/t)² + ··· + (w_m/t)²]^{1/2}, with t = max {|w_i|, i = 1,
..., m}.
The Fortran subroutine H12 given in Appendix C implements Algorithms
H1 and H2.
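A compact Python transcription of these two algorithms (0-based indexing, dense vectors, no in-place storage economies; the variable names p, l, v, h, b follow the text, but the code itself is only a sketch and not the Fortran of Appendix C) may help in reading the steps above:

    import numpy as np

    def h1(p, l, v):
        """Construct the transformation: Steps 1-3; returns (v, h)."""
        s = np.sqrt(v[p]**2 + np.sum(v[l:]**2))
        if v[p] > 0:
            s = -s
        h = v[p] - s            # the pivot component u_p of u
        v = v.copy()
        v[p] = s                # y_p = s; positions l..m-1 still hold u
        return v, h

    def h2(p, l, v, h, c):
        """Apply Q = I + b^{-1} u u^T to the vector c: Steps 5-10."""
        b = v[p] * h            # b = s*h = -||u||^2 / 2, cf. Eq. (10.10)
        c = c.copy()
        if b == 0.0:
            return c
        s = (c[p] * h + np.dot(c[l:], v[l:])) / b
        c[p] += s * h
        c[l:] += s * v[l:]
        return c

    w = np.array([3.0, 1.0, 5.0, 2.0])
    v, h = h1(0, 1, w)                 # zero components 1..3 of w
    print(h2(0, 1, v, h, w))           # about [-6.245, 0, 0, 0]; norm preserved

Here h1 performs the construction and h2 the application; b is recomputed from v_p·h each time the transformation is applied, exactly as stated in the text.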

The other general elementary orthogonal transformation we shall discuss


is the Givens rotation. The formulas for construction of the Givens rotation
were given in Eq. (3.5) to (3.9). The formulas for c and s given in Eq. (3.8)
and (3.9) can be reformulated to avoid unnecessary underflow or overflow.
This is accomplished by computing r = (x² + y²)^{1/2} as

On a machine that does not use normalized base 2 arithmetic, additional care
should be taken to avoid loss of accuracy in computing c and s. For example,
Cody (1971) has given the following reformulation of the expression in Eq.
(10.23) for use with normalized base 16 arithmetic.

Algorithms G1(v₁, v₂, c, s, r) and G2(c, s, z₁, z₂) will be described for,
respectively, constructing and applying a Givens rotation. The expression in
Eq. (10.23), appropriate for base 2 arithmetic, will be used. The input to
Algorithm G1 consists of the components v₁ and v₂ of the 2-vector v. The
output consists of the scalars c and s, which define the matrix G of Eq. (3.5),
together with the square root of sum of squares of v₁ and v₂, which is stored
in the storage location named r. The storage location r can be identical with
the storage location for v₁ or v₂, although it is most often convenient to identify
it with the storage location for v₁. The input to Algorithm G2 consists of the
scalars c and s defining the matrix G of Eq. (3.5) and the components z₁ and
z₂ of the 2-vector z. The output consists of the components d₁ and d₂ of the
vector d = Gz stored in the locations called z₁ and z₂.
(10.25) ALGORITHM G1(v₁, v₂, c, s, r)
Step Description
1 If |v₁| ≤ |v₂|, go to Step 8.
2 Set w := v₂/v₁.
3 Set q := (1 + w²)^{1/2}.
4 Set c := 1/q.
5 If v₁ < 0, set c := −c.
6 Set s := wc.
7 Set r := |v₁|q and go to Step 16.
8 If v₂ ≠ 0, go to Step 10.
9 Set c := 1, s := 0, r := 0, and go to Step 16.
10 Set w := v₁/v₂.
11 Set q := (1 + w²)^{1/2}.
12 Set s := 1/q.
13 If v₂ < 0, set s := −s.
14 Set c := ws.
15 Set r := |v₂|q.
16 Comment: The transformation has been constructed.
(10.26) ALGORITHM G2(c, s, z₁, z₂)
Step Description
1 Set w := z₁c + z₂s.
2 Set z₂ := −z₁s + z₂c.
3 Set z₁ := w.
4 Comment: The transformation has been applied.
The Fortran subroutines G1 and G2 given in Appendix C implement
Algorithms Gl and G2, respectively.
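The same two algorithms in Python (an illustration only; the Fortran of Appendix C remains the reference implementation):

    import numpy as np

    def g1(v1, v2):
        """Return (c, s, r) with c*v1 + s*v2 = r and -s*v1 + c*v2 = 0."""
        if abs(v1) > abs(v2):
            w = v2 / v1
            q = np.sqrt(1.0 + w * w)
            c = 1.0 / q
            if v1 < 0.0:
                c = -c
            return c, w * c, abs(v1) * q
        if v2 == 0.0:                 # whole vector is zero
            return 1.0, 0.0, 0.0
        w = v1 / v2
        q = np.sqrt(1.0 + w * w)
        s = 1.0 / q
        if v2 < 0.0:
            s = -s
        return w * s, s, abs(v2) * q

    def g2(c, s, z1, z2):
        """Apply the rotation: returns (c*z1 + s*z2, -s*z1 + c*z2)."""
        return c * z1 + s * z2, -s * z1 + c * z2

    c, s, r = g1(3.0, 4.0)
    print(g2(c, s, 3.0, 4.0))          # (5.0, 0.0) up to round-off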
Many variations of algorithmic and programming details are possible in
implementing Householder or Givens transformations. Tradeoffs are possible
involving execution time, accuracy, resistance to underflow or overflow,
storage requirements, complexity of code, modularity of code, taking advan-
tage of sparsity of nonzero elements, programming language, portability, etc.
Two examples of such variations will be described.
Our discussion of the Householder transformation has been based on the
representation given in Eq. (10.1). The Householder matrix can also be
expressed as

    (10.27)    Q = I_m + ghᵀ.

The representations in Eq. (10.1) and (10.27) are related by the substitutions
g = b⁻¹u₁u = s⁻¹u and h = u₁⁻¹u.
The form of Eq. (10.27) is mainly of interest for small m, in which case
the need to store two vectors g and h instead of one vector u is of no consequence and the saving of two multiplications each time a product c̃ = Qc
is computed is of some relative significance. This form is used with m = 3 in
Martin, Peters, and Wilkinson [pp. 359-371 of Wilkinson and Reinsch (1971)]
and with m = 2 and m = 3 in Moler and Stewart (1973).
Specifically with m = 2 the computation of c — Qc using the expression
in Eq. (10.1) [see Eq. (10.20) and (10.21)] requires five multiplications (or four
multiplications and one division) and three additions, while the use of Eq.
(10.27) requires only three multiplications and three additions. Thus Eq.
(10.27) requires fewer operations than Eq. (10.1) and is, in fact, competitive
with the Givens transformation, which, as implemented by Algorithm G2
(10.26), requires four multiplications and two additions to compute c = Gc.
The actual comparative performance of computer code based on Eq.
(10.27) versus code based on the conventional Givens transformation
[Algorithms Gl (10.25) and G2 (10.26)] will very likely depend on details of
the code. For example, we tested two such codes that used single precision
arithmetic (27-bit fraction part) on the UNIVAC 1108 computer. These were
used to zero the (2, 1)-element of 100 different single precision 2 × 11
matrices. For each of the 1100 2-vectors transformed, the relative error was
computed as ρ = ||v′ − v″||/||v′||, where v′ is the transformed vector computed by one of the two single precision codes being tested and v″ is the
transformed vector computed by a double precision code. For the Householder code based on Eq. (10.27), the root-mean-square value of ρ was 1.72
× 2⁻²⁷ and the maximum value of ρ was 8.70 × 2⁻²⁷. For the Givens code,
the corresponding figures were 0.88 × 2⁻²⁷ and 3.17 × 2⁻²⁷.
Methods of reducing the operation count for the Givens transformation
have been reported by Gentleman (1972a and 1972b). These methods require
that the matrix A to be operated upon and the transformed matrix Ã = GA
each be maintained in storage in a factored form.
For ease of description of one of the methods given by Gentleman, consider the case of a 2 × n matrix A in which the (2, 1)-element is to be zeroed
by left multiplication by a Givens rotation. Thus

where ã₂₁ = 0. Here the Givens matrix G would be constructed as



In the technique to be described, instead of having the matrix A available
in storage, one has matrices D_{2×2} and B_{2×n} such that

and

We wish to replace D and B in storage with new matrices D̃_{2×2} and B̃_{2×n}, with

such that the matrix Ã of Eq. (10.28) is representable as

    Ã = D̃^{1/2}B̃.

We distinguish three cases:

CASE I
We may set D̃ = D and B̃ = B.

CASE II
Define

Note that b̃₁₁ = b₁₁(1 + t) and b̃₂₁ = 0.

CASE III
Define

Note that b̃₁₁ = b₂₁(1 + t) and b̃₂₁ = 0.


It is easily verified that the matrices D̃ and B̃ defined by this process
satisfy Eq. (10.34). The saving of arithmetic in applying the transformation
comes from the presence of the two unit elements in the matrix H. This permits the matrix multiplication B̃ = HB to be done with 2n additions and 2n
multiplications in contrast to the 2n additions and 4n multiplications required for the matrix multiplication Ã = GA. Furthermore the
square root operation usually required to construct the Givens transformation has been eliminated.
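A sketch of this kind of update is given below (Python/numpy; it follows the standard square-root-free Givens construction and therefore may differ in detail from the case analysis above). The rows of B carry the data, the entries of D carry the scale factors, and D̃^{1/2}B̃ reproduces the Givens-rotated rows of D^{1/2}B; the square roots appear only in the verification at the end, not in the update itself:

    import numpy as np

    def fast_givens_rows(d1, d2, b1, b2):
        """Rows b1, b2 with scale factors d1, d2; returns updated copies."""
        b1, b2 = b1.copy(), b2.copy()
        if d2 * b2[0]**2 == 0.0:                       # Case I: nothing to do
            return d1, d2, b1, b2
        if d1 * b1[0]**2 >= d2 * b2[0]**2:             # Case II
            t1, t2 = b2[0] / b1[0], (d2 * b2[0]) / (d1 * b1[0])
            t = t1 * t2
            nb1 = b1 + t2 * b2
            nb2 = b2 - t1 * b1                         # (2,1)-entry becomes 0
            return d1 / (1 + t), d2 / (1 + t), nb1, nb2
        t1, t2 = b1[0] / b2[0], (d1 * b1[0]) / (d2 * b2[0])   # Case III
        t = t1 * t2
        nb1 = b2 + t2 * b1
        nb2 = t1 * b2 - b1                             # (2,1)-entry becomes 0
        return d2 / (1 + t), d1 / (1 + t), nb1, nb2

    d1, d2 = 1.0, 1.0
    B = np.array([[3.0, 1.0], [4.0, 2.0]])
    nd1, nd2, nb1, nb2 = fast_givens_rows(d1, d2, B[0], B[1])
    # D~^{1/2} B~ equals the Givens-rotated rows of D^{1/2} B:
    A_new = np.vstack([np.sqrt(nd1) * nb1, np.sqrt(nd2) * nb2])
    print(A_new)                                       # first column is [5, 0]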
In the case of triangularizing a general m × n matrix A by a sequence
of Givens rotations, one could begin with D₁ = I_{m×m} and B₁ = A and produce a sequence of matrices {D_k} and {B_k}, k = 1, 2, ..., by this procedure.
In general the size of elements in D_k will decrease as k increases, but the rate
is limited by the fact that 1/2 ≤ (1 + t)⁻¹ ≤ 1. There is a corresponding but
slower general increase in the size of the elements in B_k since the euclidean
norm of each column vector of the product matrix D_k^{1/2}B_k is invariant with
respect to k.
To reduce the likelihood of overflow of elements of B or underflow of
elements of D, any code implementing this version of Givens rotations should
contain some procedure to monitor the size of the numbers d_i, rescaling d_i
and the ith row of B whenever d_i becomes smaller than some tolerance. For
example, one might set τ = 2⁻²⁴, ρ = τ⁻¹, and β = τ^{1/2}. When a number d_i
is to be operated upon, it can first be compared with τ. If d_i < τ, replace d_i
by ρd_i and replace b_ij by βb_ij, j = 1, ..., n. With this choice
of τ the multiplications by ρ and β will be exact operations on many computers having base 2, base 8, or base 16 floating point arithmetic.
COMPUTING THE SOLUTION FOR

11 THE OVERDETERMINED OR EXACTLY


DETERMINED FULL RANK PROBLEM

In Theorem (3.11) we saw that for an m x n matrix A there existed an m


x m orthogonal matrix Q such that QA = R is zero below the main diago-
nal. In this chapter we shall describe the numerical computation of the
matrices Q and R using Algorithms H1 and H2 of Chapter 10.
The matrix Q is a product of Householder transformations

where each Qj has the form

Rather than saving the quantity b_j that appears in Eq. (11.2), we shall
recompute it when needed based on Eq. (10.10).
Introducing subscripts we have

Each s_j is the jth diagonal entry of the R matrix and will be stored as
such. The quantities u_j^(j) are stored in an auxiliary array of locations named
h_j, j = 1, ..., n.
The computational algorithm will construct the decomposition of Theorem (3.11). This algorithm will be known as HFT(m, n, A, h). The input to
HFT consists of the integers m and n and the m × n matrix A. The output
consists of the nonzero portion of the upper triangular matrix R stored in
the upper triangular portion of the array named A, the scalars u_j^(j) stored in
the jth entry h_j of the array named h, and the remaining nonzero portions
of the vectors u^(j) stored as columns in the subdiagonal part of the jth column
of the array named A.
of the array named A.
(11.4) ALGORITHM HFT(m, n, A, h)
Step Description
1 For j := 1, ..., n, execute Algorithm H1(j, j + 1, m, a_1j,
h_j, a_1,j+1, n − j) (see Algorithm 10.22).
2 Comment: The (forward) triangularization is computed.
2 Comment: The (forward) triangularization is computed.
In Step 1 of the algorithm above we are adopting the convention (consistent with Fortran usage) that the jth column of A can be referred to by
the name of its first component, and the submatrix of A composed of columns
j + 1 through n can be referred to by the name of the first entry of column
j + 1.
In Cases 1a and 2a of Fig. 1.1 the n × n upper triangular matrix R₁₁ (see
Eq. 3.21) is nonsingular. Thus, in these two cases, the solution x̂ of Problem
LS can be obtained by computing

    (11.5)    g := Qb,

partitioning g as

    (11.6)    g = [g₁ᵀ : g₂ᵀ]ᵀ,  g₁ an n-vector,

and solving

    (11.7)    R₁₁x̂ = g₁

for the solution vector x̂.
Using Eq. (2.9) and (2.10), the residual vector r̂ = b − Ax̂ and its norm
can be computed as

    (11.8)    r̂ = Qᵀ[0ᵀ : g₂ᵀ]ᵀ

and

    (11.9)    ||r̂|| = ||g₂||.

Use of Eq. (11.8) and (11.9) obviates the need to save or regenerate the data
matrix [A : b] for the purpose of computing residuals.
The following algorithm, HS1, will accomplish the computation of Eq.
(11.5) to (11.7). The input to this algorithm will consist of the integers m and
n, the arrays named A and h as they are output by the Algorithm HFT (11.4)
and the array named b that holds the right-side m-vector b of Problem LS.

The output of the algorithm consists of the n-vector x replacing the first
n entries of the array named b and, if m > n, the (m — n)-vector g2 will
replace entries n+1 through m of the array named b.
(11.10) ALGORITHM HS1(m, n, A, h, b)
Step Description
1 For j := 1, ..., n, execute Algorithm H2(j, j + 1, m, a_1j,
h_j, b, 1).
2 Comment: In Steps 3-5 we shall compute the solution
to the triangular system R₁₁x̂ = g₁ of Eq. (11.7).
3 Set b_n := b_n/a_nn.
4 If n = 1, go to Step 6.
5 For i := n − 1, n − 2, ..., 1, set b_i := (b_i − Σ_{j=i+1}^{n} a_ij·b_j)/a_ii.
6 Comment: The solution of Problem LS has been computed
for Cases 1a and 2a.
The case where b represents an m × l matrix can be easily handled by
changing Step 1 to execute Algorithm H2(j, j + 1, m, a_1j, h_j, b, l), j = 1, ...,
n, and by changing Steps 3 and 5 to deal with b as an (m × l)-array.
To compute the residual norm ρ [see Eq. (11.9)], one could add to
Algorithm HS1
Step 7 Set ρ := (b²_{n+1} + ··· + b²_m)^{1/2}.

To compute the residual vector [see Eq. (11.8)], one could add to
Algorithm HS1
Step 8 For i := 1, ..., n, set b_i := 0.
Step 9 For j := n, n − 1, ..., 1, execute Algorithm H2(j, j + 1,
m, a_1j, h_j, b, 1).
Note that if Steps 8 and 9 are used, the solution vector x must first be
moved to storage locations distinct from the array b if it is not to be over-
written. Following Step 9 the residual vector r = b — Ax occupies the storage
array called b.
Algorithm HS1 is based on the assumption that the matrix A of Problem
LS is of rank n. There is no test made in the algorithm to check for this.
In practice it is important to know whether a change in the matrix A of
the order of the data uncertainty could produce a matrix of rank less than n.
One computational approach to this problem, involving the use of column
interchanges as a first step, will be discussed in Chapter 14. Another

approach, which provides more detailed information about the matrix,


involves the computation of the singular value decomposition. This will be
treated in Chapter 18.
Frequently, when a least squares problem is solved, there is also interest
in computing the covariance matrix for the solution parameters. This topic
will be treated in Chapter 12.
A Fortran implementation of Algorithms HFT and HS1 of this chapter
and the covariance matrix computation COV of Chapter 12 is given in
PROG1 in Appendix C. This program constructs and solves a sequence of
sample problems.
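The same computation can be expressed very compactly in terms of a library QR factorization (itself built from Householder transformations). The following Python/numpy sketch (an illustration only; it does not use the storage economies of HFT and HS1) carries out the analogues of Eq. (11.5) to (11.9):

    import numpy as np

    rng = np.random.default_rng(2)
    m, n = 8, 3
    A = rng.standard_normal((m, n))          # full rank n (Case 1a)
    b = rng.standard_normal(m)

    Q, R = np.linalg.qr(A, mode='complete')  # A = Q R with R zero below row n
    g = Q.T @ b                              # g = Q^T b, cf. Eq. (11.5)
    x = np.linalg.solve(R[:n, :n], g[:n])    # R11 x = g1, Eq. (11.7)
    rho = np.linalg.norm(g[n:])              # ||r|| = ||g2||, Eq. (11.9)
    print(np.allclose(rho, np.linalg.norm(b - A @ x)))   # True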

EXERCISES
(11.11) Derive a forward triangularization algorithm using Givens transfor-
mations. Count the number of adds and multiplies separately. How
is this count changed if the two-multiply, two-add transformation is
used?
(11.12) Determine the number of multiplications required to solve Problem
LS using Algorithms HFT and HS1. Compare this count with that
obtained for the Givens transformations of Exercise (11.11).
COMPUTATION OF

12 THE COVARIANCE MATRIX


OF THE SOLUTION PARAMETERS

The symmetric positive definite matrix

    (12.1)    C = (AᵀA)⁻¹,

or a scalar multiple of it, say, σ²C, has a statistical interpretation, under
appropriate hypotheses, of being an estimate of the covariance matrix for
the solution vector of Problem LS [e.g., see Plackett (1960)]. We shall refer
to C as the unscaled covariance matrix.
In this chapter some algorithms for computing C will be presented. We
shall also identify some commonly occurring situations in which the explicit
computation of C can be avoided by reconsidering the role of C in the later
analysis.
We shall not discuss the derivation or interpretation of the scalar factor σ²
but simply note that one expression commonly used to compute σ² is

    (12.2)    σ² = ||b − Ax̂||²/(m − n),

where x̂ is the least squares solution of Ax = b and m and n are, respectively,
the number of rows and columns of the matrix A.
We shall discuss algorithms for computing C that make use of decom-
positions of A. These decompositions occur in the course of three different
methods for solving Problem LS that are treated in detail in this book. These
three decompositions may be summarized as follows:

The decompositions of Eq. (12.3) to (12.5) are, respectively, those obtained by Algorithms HFT (11.4) and HFTI (14.9) and the singular value
decomposition of Chapter 18. The matrix Q of Eq. (12.3) is n × n and
orthogonal. The matrices Q and P of Eq. (12.4) are both orthogonal, P
being moreover a permutation matrix. The matrices U and V of Eq. (12.5) are, respectively, m × m and n × n, and they are orthogonal.
It is easy to see that in the cases of Eq. (12.3) to (12.5) we have

so that if A is of rank n, inversion of these equations, respectively, yields

We first consider the details of the computation required by Eq. (12.9)
and (12.10). There are three basic steps.
1. Invert the upper triangular matrix R or R̄ onto itself in storage.
2. Form the upper triangular part of the symmetric matrix R⁻¹(R⁻¹)ᵀ or
R̄⁻¹(R̄⁻¹)ᵀ. This can replace R⁻¹ or R̄⁻¹ in storage.
3. Repermute the rows and columns of the matrix R̄⁻¹(R̄⁻¹)ᵀ. At each
step the matrix is symmetric so only the upper triangular part of it need be
saved.
To derive formulas for the inversion of a triangular matrix R, denote
the elements of R⁻¹ by t_ij. Then from the identity R⁻¹R = I and the fact
that both R and R⁻¹ are upper triangular, one obtains the equations

    Σ_{l=i}^{j} t_il·r_lj = δ_ij,    1 ≤ i ≤ j ≤ n.

Solving for the t_ij gives

    t_ii = 1/r_ii,    t_ij = −(Σ_{l=i}^{j−1} t_il·r_lj)/r_jj,    i < j.

In order to be able to replace r_ij by t_ij in computer storage when t_ij is computed and to reduce the number of division operations, these formulas will
be used in the following form:

This set of formulas requires (n³/6) + O(n²) operations, where an operation is
a multiply plus an add or a divide plus an add. The postmultiplication of R⁻¹
by its transpose indicated in Eq. (12.9) also requires (n³/6) + O(n²) operations. Thus Algorithm COV requires (n³/3) + O(n²) operations to compute
the elements of the upper triangle of (AᵀA)⁻¹ from the elements of R.
For Algorithm COV the matrix R or R̄ of the right member of Eq. (12.6)
or (12.7), respectively, must occupy the upper triangular part of an array
named A on input. On output the upper triangular part of the matrix (AᵀA)⁻¹
occupies the upper triangular part of the array named A. The array named p
will hold the information [generated by Algorithm HFTI (14.9), for example]
describing the effect of the matrix P of Eq. (12.10).
(12.12) ALGORITHM COV(A, n, p)
Step Description
1 For i := 1, ..., n, set a_ii := 1/a_ii.
2 If n = 1, go to Step 8.
3 For i := 1, ..., n − 1, do through Step 7.
4 For j := i + 1, ..., n, do through Step 7.
5 Set s := 0.
6 For l := i, ..., j − 1, set s := s + a_il·a_lj.
7 Set a_ij := −a_jj·s.
8 For i := 1, ..., n, do through Step 12.
9 For j := i, ..., n, do through Step 12.
10 Set s := 0.
11 For l := j, ..., n, set s := s + a_il·a_jl.
12 Set a_ij := s.
13 Remark: This completes the computation of the elements
of the upper triangle of (AᵀA)⁻¹ for the case of Eq. (12.3)
where no permutations were used in computing R. Alternatively, in the case of Eq. (12.4) Steps 14-23 must be
executed to accomplish the final premultiplication and
postmultiplication by P and Pᵀ, respectively, as indicated
in Eq. (12.10). In this case we assume an array of integers
p_i, i = 1, ..., n, is available recording the fact that the
ith permutation performed during the triangularization of
A was an interchange of columns i and p_i.
14 For i := n, n − 1, ..., 1, do through Step 22.
15 If p_i = i, go to Step 22.
16 Set k := p_i. Interchange the contents of a_ii and a_kk. If
i = 1, go to Step 18.
17 For l := 1, ..., i − 1, interchange the contents of a_li
and a_lk.
18 If k − i = 1, go to Step 20.
19 For l := i + 1, ..., k − 1, interchange the contents of
a_il and a_lk.
20 If k = n, go to Step 22.
21 For l := k + 1, ..., n, interchange the contents of a_il
and a_kl.
22 Continue.
23 Remark: The computation of the unscaled covariance
matrix is completed.
Examples of Fortran implementations of this algorithm with and without
column permutations are, respectively, given in the Programs PROG1 and
PROG2 of Appendix C.
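For reference, the quantity produced by Algorithm COV in the unpermuted case can be checked in a few lines (Python/numpy; this illustration forms C densely rather than working in place on the upper triangle):

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((10, 4))
    R = np.linalg.qr(A, mode='r')            # upper triangular factor of A
    Rinv = np.linalg.inv(R)
    C = Rinv @ Rinv.T                        # C = R^{-1} (R^{-1})^T, cf. Eq. (12.9)
    print(np.allclose(C, np.linalg.inv(A.T @ A)))   # True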
If the singular value decomposition is used, as in Eq. (12.5), the unscaled
covariance matrix C satisfies Eq. (12.11). Thus the individual elements of C
are given by

    c_ij = Σ_{l=1}^{n} v_il·v_jl/s_l².

If the matrix V has been obtained explicitly in an (n × n)-array of storage,
then VS⁻¹ can replace V in storage. Next the upper triangular part of C
can replace the upper triangle of VS⁻¹ in storage. This requires the use of
an auxiliary array of length n.

Remarks on Some Alternatives to Computing C

We wish to identify three situations in which the unscaled covariance


matrix C is commonly computed.

(12.14) • The matrix C or C'1 may occur as an intermediate quantity in


a statistical formula.
(12.15) • The matrix C may be printed (or otherwise displayed) to be
inspected and interpreted by the problem originator.
(12.16) • Some subset of the elements of the matrix C may be used inter-
nally as control parameters for some automated process.

In each of these cases we shall indicate an alternative approach that may


be either more informative or more economical.
First consider Case (12.14). Commonly, formulas involving C or C⁻¹
contain subexpressions of the form

or

The most efficient, and numerically stable, approach to computing such
expressions is generally found by regarding C or C⁻¹ in its factored form,

or

respectively.
Thus one would evaluate Eq. (12.17) by first solving for Y in the equation

and then computing

Similarly in the case of Eq. (12.18) one would compute

followed by

The important general principle illustrated by these observations is the
fact that a factorization of C or of C⁻¹ is available as shown in Eq. (12.19)
and (12.20), respectively, and can be used to develop computational procedures that are more economical and more stable than those involving the
explicit computation of C or C⁻¹.
These remarks can be easily modified to include the case where
the matrices C or C⁻¹ involve the permutation matrix P as in Eq. (12.7) and
(12.10), respectively.
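As a concrete sketch of this principle (Python/numpy; the subexpression is assumed here to have the form W C Wᵀ, which is one common instance), one can work entirely with the triangular factor R:

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((12, 5))
    W = rng.standard_normal((3, 5))
    R = np.linalg.qr(A, mode='r')

    Y = np.linalg.solve(R.T, W.T).T          # solve Y R = W, i.e. R^T Y^T = W^T
    WCWt = Y @ Y.T                           # equals W (A^T A)^{-1} W^T
    print(np.allclose(WCWt, W @ np.linalg.inv(A.T @ A) @ W.T))   # True

Neither C nor any inverse is formed explicitly; only one triangular solve is needed.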
In Case (12.15) the problem originator is frequently investigating correla-
tions (or near dependencies) between various pairs of components in the
solution vector x. In this context it is common to produce an auxiliary
matrix,

    E = D⁻¹CD⁻¹,

where D is a diagonal matrix whose ith diagonal element is given by d_ii
= c_ii^{1/2}. Then e_ii = 1, i = 1, ..., n, and |e_ij| ≤ 1, i = 1, ..., n; j = 1, ..., n.
The occurrence of an element e_ij, i ≠ j, close to 1 or −1, say, e_ij = 0.95,
for example, would be taken as an indication that the ith and jth components
of the solution vector x are highly correlated. Algebraically this corresponds
to the fact that the 2 x 2 principal submatrix

is near-singular.
A weakness of this type of analysis is the fact that only dependencies
between pairs of variables are easily detected. For example, there may be a
set of three variables that have a mutual near dependence, whereas no two
of the variables are nearly dependent. Such a set of three variables could be
associated with a 3 × 3 principal submatrix of E such as

        1      −0.49   −0.49
      −0.49      1     −0.49
      −0.49   −0.49      1

Here no off-diagonal element is close to 1 or −1 and thus no principal 2 × 2
submatrix is near-singular, but the 3 × 3 submatrix is near-singular. This
matrix becomes singular if the off-diagonal elements are changed from −0.49
to −0.50.
Dependencies involving three or more variables are very difficult to
detect by visual inspection of the matrix C or E. Such dependencies are
revealed, however, by the matrix V of the singular value decomposition A =
USVᵀ (or equivalently of the eigenvalue decomposition AᵀA = VS²Vᵀ).
Thus if s_j is a singular value of A that is small relative to the largest singular value s₁, then the corresponding columns v_j and u_j of V and U, respectively, satisfy

    Av_j = s_ju_j

or

    Av_j ≅ 0.

The fact that s_j/s₁ is small may be taken as a criterion of near-singularity
of A, while the vector v_j identifies a particular linear combination of columns
of A that are nearly dependent. The visual inspection of columns of V associated with small singular values has been found to be a very useful technique
in the analysis of ill-conditioned least squares problems.
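A small made-up example (Python/numpy, an illustration only) shows the diagnostic in action: three columns with a mutual near dependence produce a small ratio s₃/s₁, and the corresponding column of V names the dependent combination:

    import numpy as np

    rng = np.random.default_rng(5)
    m = 50
    a1, a2 = rng.standard_normal(m), rng.standard_normal(m)
    a3 = -(a1 + a2) + 0.05 * rng.standard_normal(m)   # a1 + a2 + a3 nearly zero
    A = np.column_stack([a1, a2, a3])

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    print(s[-1] / s[0])              # small: A is nearly rank deficient
    print(np.round(Vt[-1], 2))       # close to +-(1, 1, 1)/sqrt(3)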
In Case (12.16) the point to be observed is that one does not need to
compute all the elements of C if only some subset of its elements are required.
For example, once R-l is computed, then individual elements of C may be
computed independently by formulas derivable directly from Eq. (12.9).
Typical examples might involve a need for only certain diagonal elements of
C or only a certain principal submatrix of C.
COMPUTING THE SOLUTION

13 FOR THE UNDERDETERMINED


FULL RANK PROBLEM

Consider now the Problem LS, Ax = b, for the case m < n, Rank (A)
= m (Case 3a, Fig. 1.1).
A solution algorithm is described briefly in the following steps:

The algorithm will compute the orthogonal matrix Q of Eq. (13.1) so


that the matrix R is lower triangular and nonsingular. The existence of such
matrices was established by the transposed form of Theorem (3.11).
The m-vector y₁ of Eq. (13.2) is uniquely determined. With any (n − m)-vector y₂ of Eq. (13.3), the vector x of Eq. (13.4) will satisfy Ax = b. The
minimal length solution is attained by setting y₂ = 0 in accordance with
Theorem (2.3).
There are circumstances when this underdetermined problem Ax = b
arises as a part of a larger optimization problem that leads to y₂ being nonzero. In particular, the problem of minimizing ||Ex − f|| subject to the
underdetermined equality constraints Ax = b is of this type. This problem,
which we call LSE, will be treated in Chapters 20 to 22.
The matrix Q of Eq. (13.1) is constructed as a product of the form


where each Qj has the form

The quantity b_j that appears in Eq. (13.6) is recomputed when needed
based on Eq. (10.10). Introducing subscripts, we have

Each s_j is the jth diagonal entry of the R matrix and will be stored as such.
This algorithm will be known as HBT(m, n, A, g). The input to Algorithm
HBT consists of the integers m and n and the m × n matrix A of rank m.
The output consists of the nonzero portion of the lower triangular matrix R
stored in the lower triangular portion of the array named A. The u_j^(j) are
stored in an auxiliary array named g_j, j = 1, ..., m. The remaining nonzero
portions of the vectors u^(j) are stored in the upper diagonal part of the jth
row of the array named A.
(13.8) ALGORITHM HBT(m, n, A, g)
Step Description
1 For j := 1, ..., m, execute Algorithm H1(j, j + 1, n, a_j1,
g_j, a_j+1,1, m − j) (see Algorithm 10.22).
2 Comment: The (backward) triangularization is computed.
In Step 1, it should be noted that a_j1 denotes the start of a row vector in
storage, while a_j+1,1 denotes the start of m − j rows to be operated on by
the transformations.
The following algorithm will accomplish the computation of Eq. (13.2)
to (13.4). The input of this algorithm will consist of the integers m and n,
the arrays named A and g as they are output by Algorithm HBT (13.8).
The solution will be returned in an array named x. The array b is replaced in
storage by the vector yl of Eq. (13.2).
(13.9) ALGORITHM HS2(m, n, A, g, b, x)
Step Description
1 Set b₁ := b₁/a₁₁.
2 For i := 2, ..., m, set b_i := (b_i − Σ_{j=1}^{i−1} a_ij·b_j)/a_ii.
3 Comment: At this point the vector y₂ must be determined.
If the minimal length solution is desired, put y₂ := 0.
4 Set x_i := b_i for i = 1, ..., m, and x_{m+i} := (y₂)_i for
i = 1, ..., n − m [see Eq. (13.3)].
5 For j := m, m − 1, ..., 1, execute Algorithm H2(j, j + 1,
n, a_j1, g_j, x, 1).
6 Comment: The array named x now contains a particular
solution of the system Ax = b. If the vector y₂ of Step 4 is
zero, the array x contains the solution of minimal length.
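The same computation in terms of a library QR factorization of Aᵀ (Python/numpy; an illustration only, without the HBT/HS2 storage conventions) reads:

    import numpy as np

    rng = np.random.default_rng(6)
    m, n = 3, 6
    A = rng.standard_normal((m, n))          # rank m (Case 3a)
    b = rng.standard_normal(m)

    Q, R = np.linalg.qr(A.T)                 # A^T = Q R, so A = R^T Q^T
    y1 = np.linalg.solve(R.T, b)             # lower triangular system R^T y1 = b
    x = Q @ y1                               # taking y2 = 0
    print(np.allclose(A @ x, b), np.allclose(x, np.linalg.pinv(A) @ b))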
COMPUTING THE SOLUTION FOR

14 PROBLEM LS WITH POSSIBLY


DEFICIENT PSEUDORANK

The algorithms of Chapters 11 to 13 for full rank problems are intended


for use in cases in which the matrix A is assumed to be sufficiently well con-
ditioned so that there is no possibility that either data uncertainties or virtual
perturbations of A that arise due to round-off error during the computation
could replace A by a rank-deficient matrix. Bounds for these latter perturba-
tions will be treated in Chapters 15 to 17.
There are other cases, however, when one would like to have the algo-
rithm make a determination of how close the given matrix is to a rank-deficient
matrix. If it is determined that the given matrix is so close to a rank-deficient
matrix that changes in the data of the order of magnitude of the data un-
certainty could convert the matrix to one of deficient rank, then the algorithm
should take some appropriate action.
The algorithm should at least inform the user when this near-rank-defi-
cient situation is detected. In addition, one may wish to have a solution vec-
tor computed by some procedure that avoids the arbitrary instability that
can occur when a very ill-conditioned problem is treated as being of full rank.
One technique for stabilizing such a problem is to replace A by a nearby
rank-deficient matrix, say, A, and then compute the minimal length solution
to the problem Ax = b. This replacement most commonly occurs implicitly
as part of a computational algorithm. Algorithm HFTI, to be described in
this chapter, is of this type. An example illustrating the use of Algorithm
HFTI as well as some other stabilization methods is given in Chapter 26. In
Chapter 25 other methods of stabilization are developed.
We define the pseudorank & of a matrix A to be the rank of the matrix A
that replaces A as a result of a specific computational algorithm. Note that
pseudorank is not a unique property of the matrix A but also depends on

other factors, such as the details of the computational algorithm, the value
of tolerance parameters used in the computation, and the effects of machine
round-off errors.
Algorithm HFTI applies in particular to the rank-deficient problems iden-
tified as Cases 1b, 2b, and 3b in Fig. 1.1. Strictly speaking, however, it is the
pseudorank rather than the rank of A that will determine the course of the
computation.
Using Theorems (2.3) and (3.19), the mathematical relations on which
Algorithm HFTI is based are the following:

The algorithm will determine the orthogonal matrix Q and the permuta-
tion matrix P so that R is upper triangular and R11 is nonsingular.
The permutation matrix P arises implicitly as a result of the column in-
terchange strategy used in the algorithm. The column interchange strategy is
intimately related to the problem of determining the pseudorank k. An es-
sential requirement is that the submatrix R11 be nonsingular. Generally,
unless other criteria are suggested by special requirements of the application,
one would prefer to have R11 reasonably well-conditioned and ||R22 ||rea-
sonably small.
One example of an exceptional case would be the case in which A_{n×n} is
known to have zero as a distinct eigenvalue. Then one would probably wish
to set k = n − 1 rather than have the algorithm determine k.
Another exceptional case is the weighted approach to Problem LSE
(Chapter 22) where it is reasonable to permit R11 to be very ill-conditioned.
The column interchange strategy used in Algorithm HFTI is as follows.
For the construction of the jth Householder transformation, we consider
columns j through n and select that one, say, column λ, whose sum of squares
of components in rows j through m is greatest. The contents of columns j
and λ are then interchanged and the jth Householder transformation is constructed to zero the elements stored in a_ij, i = j + 1, ..., m.
Some saving of execution time can be achieved by updating the needed
sums of squares. This is possible due to the orthogonality of the Householder
transformation. The details of this appear in Steps 3-10 of Algorithm HFTI.
As a result of this interchange strategy the diagonal elements of R will be
nonincreasing in magnitude. In fact, they will satisfy the inequality in Eq.
(6.16). In this connection, see Theorems (6.13) and (6.31).
In Algorithm HFTI and in its implementation as the Fortran subroutine
HFTI in Appendix C, the pseudorank k is determined as the largest index j
such that |r_jj| > τ, where τ is a user-supplied nonnegative absolute tolerance
parameter.
The column-interchange strategy and the appropriate selection of τ clearly
depend on the initial scaling of the matrix. This topic is discussed in Chapter
25.
Choosing k to be less than min (m, n) amounts to replacing the given
matrix

by the matrix

Note that

and

Further, with the stated procedures for column interchange and pseudorank
determination, one has

where

The orthogonal matrix K of Eq. (14.3) is chosen so that W is k × k nonsingular and upper triangular. The k-vector y₁ is computed as the unique
solution of Eq. (14.4). The (n − k)-vector y₂ can be given an arbitrary value
but the value zero gives the minimal length solution.
The final solution x is given by Eq. (14.6). The norm of the residual can
be computed using the right side of Eq. (14.7).
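A rough functional equivalent of these relations (Python with scipy's column-pivoted QR; it determines the pseudorank from the diagonal of R and then forms the minimal length solution with a dense pseudoinverse, so it is a sketch and not the in-place Algorithm HFTI) is:

    import numpy as np
    from scipy.linalg import qr

    def hfti_like(A, b, tau):
        m, n = A.shape
        Q, R, piv = qr(A, pivoting=True)     # A P = Q R, |diag(R)| nonincreasing
        k = int(np.sum(np.abs(np.diag(R)) > tau))      # pseudorank
        if k == 0:
            return np.zeros(n), 0
        g1 = (Q.T @ b)[:k]
        y = np.linalg.pinv(R[:k, :]) @ g1    # minimal length y with [R11 R12] y = g1
        x = np.zeros(n)
        x[piv] = y                           # undo the column interchanges
        return x, k

    A = np.array([[1.0, 1.0, 2.0],
                  [1.0, 1.0, 2.0],
                  [0.0, 1.0, 1.0]])          # rank 2
    x, k = hfti_like(A, np.array([1.0, 1.0, 1.0]), 1e-8)
    print(k, x)                              # pseudorank 2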

The input to Algorithm HFTI will consist of the data A and b stored in
the arrays called A and b, respectively, the integers m and n, and the nonnegative absolute tolerance parameter τ. The orthogonal matrix Q of Eq.
(14.1) is a product of μ Householder transformations Q_j. The data that
define these matrices occupy the lower triangular part of the array A on output plus μ additional stores in an array named h.
The array h is also used to store the squares of the lengths of columns of
certain submatrices generated during the computation. These numbers are
used in the selection of columns to be interchanged. This information can be
(and is) overwritten by the pivot scalars for the matrices Q_j.
The permutation matrix P is constructed as a product of transposition
matrices, P = (1, p₁) ··· (μ, p_μ). Here (i, j) denotes the permutation matrix
obtained by interchanging columns i and j of I_n. The integers p_j will be recorded in an array p. The orthogonal matrix K of Eq. (14.3) is a product of
k Householder transformations K_i. The data that define these matrices occupy the rectangular portion of the array A consisting of the first k rows of
the last n − k columns plus k additional stores in an array g.
Figure 14.1 illustrates the output storage configuration for this decomposition when m = 6, n = 5, and k = Rank (A) = 3. If one wishes to
retain complete information about the original matrix A, an extra array
consisting of the diagonal terms of the matrix R₁₁ of Eq. (14.1) must be
recorded. These would be needed to compute the matrix Q of that same
equation. Since Q is applied immediately to the vector b in Algorithm HFTI
this additional storage is not needed.

Fig. 14.1  Output storage configuration (m = 6, n = 5, k = 3). Legend: stores that initially held A, after processing, hold W and most of the information for constructing the Householder orthogonal decomposition; pivot scalars for the Householder transformations Q_j; pivot scalars for the Householder transformations K_i; the interchange record; elements of R₂₂ that are ignored.

At the conclusion of the algorithm the solution vector x is stored in the


storage array called x and the vector c is stored in the array called b. The
algorithm is organized so that the names b and x could identify the same
storage arrays. This avoids the need for an extra storage array for x. In this
case the length of the b-array must be max (m, n).
Steps 1-13 of Algorithm HFTI accomplish forward triangularization
with column interchanges. This is essentially the algorithm given in Golub
and Businger (1965). The additional steps constitute the extension for rank-
deficient cases as given in Hanson and Lawson (1969).
The coding details of Steps 3-10 are influenced by the code given in
Bjorck and Golub (1967). The test at Step 6 is essentially equivalent to "If
h_λ > 10³η h̄." The form used in Step 6 avoids the explicit use of the machine-dependent relative precision parameter η.
(14.9) ALGORITHM HFTI(A, m, n, b, τ, x, k, h, g, p)
Step Description
1 Set μ := min (m, n).
2 For j := 1, ..., μ, do Steps 3-12.
3 If j = 1, go to Step 7.
4 For l := j, ..., n, set h_l := h_l − a²_{j−1,l}.
5 Determine λ such that h_λ := max {h_l : j ≤ l ≤ n}.
6 If (h̄ + 10⁻³h_λ) > h̄, go to Step 9.
7 For l := j, ..., n, set h_l := Σ_{i=j}^{m} a²_{il}.
8 Determine λ such that h_λ := max {h_l : j ≤ l ≤ n}. Set
h̄ := h_λ.
9 Set p_j := λ. If p_j = j, go to Step 11.
10 Interchange columns j and λ of A and set h_λ := h_j.
11 Execute Algorithm H1(j, j + 1, m, a_1j, h_j, a_1,j+1, n − j).
12 Execute Algorithm H2(j, j + 1, m, a_1j, h_j, b, 1).
13 Comment: The pseudorank k must now be determined. Note
that the diagonal elements of R (stored in a_11 through a_μμ)
are nonincreasing in magnitude. For example, the Fortran
subroutine HFTI in Appendix C chooses k as the largest
index j such that |a_jj| > τ. If all |a_jj| ≤ τ, the pseudorank
k is set to zero, the solution vector x is set to zero, and the
algorithm is terminated.
14 If k = n, go to Step 17.
15 Comment: Here k < n. Next, determine the orthogonal
transformations K_i whose product constitutes K of Eq. (14.3).
16 For i := k, k − 1, ..., 1, execute Algorithm H1(i, k + 1,
n, a_i1, g_i, a_11, i − 1). (The parameters a_i1 and a_11 each identify the first element of a row vector that is to be referenced.)
17 Set x_k := b_k/a_kk. If k = 1, go to Step 19.
18 For i := k − 1, k − 2, ..., 1, set x_i := (b_i − Σ_{j=i+1}^{k} a_ij·x_j)/a_ii
[see Eq. (14.4)].
19 If k = n, go to Step 22.
20 Define the (n − k)-vector y₂ of Eq. (14.5) and store its components in x_i, i := k + 1, ..., n. In particular, set y₂ := 0
if the minimal length solution is desired.
21 For i := 1, ..., k, execute Algorithm H2(i, k + 1, n, a_i1,
g_i, x, 1). [See Eq. (14.6). Here a_i1 identifies the first element
of a row vector.]
22 For j := μ, μ − 1, ..., 1, do Step 23.
23 If p_j ≠ j, interchange the contents of x_j and x_{p_j} [see Eq.
(14.6)].
A Fortran subroutine HFTI, implementing Algorithm HFTI, is given in
Appendix C. In the subroutine the algorithm has been generalized to handle
multiple right-side vectors. The subroutine also handles the (unlikely) case
of the pseudorank k being zero by setting the solution vector x to zero.
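As an informal illustration of the same computational pattern (and not a transcription of the Fortran code of Appendix C), the following Python sketch solves a possibly rank-deficient least squares problem using a column-pivoted Householder QR factorization from SciPy. The tolerance parameter `tau` plays the role of the pseudorank parameter of Step 13; the minimal-length solution of the resulting trapezoidal system is obtained here with a library least squares call rather than with the K transformations of Eq. (14.3).

```python
# Illustrative sketch of an HFTI-style computation (not the Appendix C code).
import numpy as np
from scipy.linalg import qr

def hfti_like(A, b, tau):
    m, n = A.shape
    Q, R, piv = qr(A, mode="economic", pivoting=True)   # A[:, piv] = Q R
    diag = np.abs(np.diag(R))
    k = int(np.sum(diag > tau))          # pseudorank: count of |r_jj| > tau
    x = np.zeros(n)
    if k > 0:
        c = Q.T @ b                      # transformed right side
        # Minimal-length solution of the k x n trapezoidal system R[:k, :] y = c[:k].
        y, *_ = np.linalg.lstsq(R[:k, :], c[:k], rcond=None)
        x[piv] = y                       # undo the column interchanges
    return x, k

# Small example with an exactly rank-2 matrix.
A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 0., 1.]])
b = np.array([1., 2., 3.])
x, k = hfti_like(A, b, tau=1e-8)
print(k, np.linalg.norm(A @ x - b))      # pseudorank 2, small residual
```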
15 ANALYSIS OF COMPUTING ERRORS
FOR HOUSEHOLDER TRANSFORMATIONS

Throughout the preceding chapters we have treated primarily the mathe-


matical aspects of various least squares problems. In Chapter 9 we studied
the effect on the solution of uncertainty in the problem data. In practice,
computer arithmetic will introduce errors into the computation. The re-
markable conclusion of floating point error analyses [e.g., see Wilkinson
(1965a)] is that in many classes of linear algebraic problems the computed
solution of a problem is the exact solution to a problem that differs from the
given problem by a relatively small perturbation of the given data.
Of great significance is the fact that these virtual perturbations of the data
due to round-off are frequently smaller than the inherent uncertainty in the
data so that, from a practical point of view, errors due to round-off can be
ignored.
The purpose of Chapters 15,16, and 17 is to present results of this type for
the least squares algorithms given in Chapters 11,13, and 14.
The set of real numbers that are exactly representable as normalized
floating point numbers in a given computer will be called machine numbers.
For our purposes the floating point arithmetic capability of a given computer
can be characterized by three positive numbers L, U, and η. The number
L is the smallest positive number such that both L and −L are machine
numbers. The number U is the largest positive number such that both U and
−U are machine numbers.
In the analysis to follow we shall assume that all nonzero numbers x
involved in a computation satisfy L ≤ |x| ≤ U. In the practical design of
reliable general-purpose programs implementing these algorithms the condition
|x| ≤ U can be achieved by appropriate scaling. It may not be feasible
to guarantee simultaneously that all nonzero numbers satisfy |x| ≥ L, but
appropriate scaling can assure that numbers satisfying |x| < L can safely
be replaced by zero without affecting the accuracy of the results in the sense
of vector norms.
The number η characterizes the relative precision of the normalized floating
point arithmetic. Following Wilkinson we use the notation

to indicate that z̄ is the machine number produced by the computer when its
addition operation is applied to the machine numbers x and y. If z = x + y,
then even assuming L ≤ |z| ≤ U, it is frequently true that z̄ − z ≠ 0. For
convenience this difference will be called round-off error regardless of whether
the particular computer uses some type of rounding or simply truncation.
The number η is the smallest positive number such that whenever the true
result z of one of the five operations, add, subtract, multiply, divide, or square
root, is zero or satisfies L ≤ |z| ≤ U, there exist numbers E_i bounded in
magnitude by η satisfying the corresponding one of the following five equations.

(When x and y have the same signs, E_1 = E_2 and E_3 = E_4.)

(When x and y have opposite signs, E_5 = E_6 and E_7 = E_8.)
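As a small numerical illustration of the parameter η (not part of the original discussion), the sketch below evaluates the relative round-off error of a single addition in IEEE double precision, using exact rational arithmetic as the reference; the observed error is bounded by η as the model above requires.

```python
# Illustration only: the relative error of one computed sum is bounded by eta.
from fractions import Fraction
import numpy as np

eta = np.finfo(np.float64).eps           # relative precision, about 2.2e-16
x, y = 0.1, 0.2                          # machine numbers (already rounded)
z_bar = x + y                            # computed result of the addition
z_exact = Fraction(x) + Fraction(y)      # exact sum of the two machine numbers
rel_err = abs(Fraction(z_bar) - z_exact) / z_exact
print(float(eta), float(rel_err), float(rel_err) <= eta)
```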

The general approach to analyzing algorithms of linear algebra using
assumptions of this type is well illustrated in the literature [e.g., Wilkinson
(1960) and (1965a); Forsythe and Moler (1967); Björck (1967a and 1967b)].
Different techniques have been used to cope with the nuisance of keeping
track of second- and higher-order terms in η that inevitably arise. Since the
main useful information to be obtained is the coefficient of η in the bounds,
we propose to determine only that coefficient in our analyses.
Due to grouping η² terms into a single O(η²) term, the "bounds" obtained
in Chapters 15, 16, and 17 are not fully computable. For values of η corresponding
to practical computers, however, the discrepancy from this source
is negligible. Since maximum accumulation of errors is assumed throughout
the analysis, the bounds obtained tend to be severe overestimates. The main
practical purpose of these bounds is to give an indication of the dependence
of the errors on the parameters m, n, and η and to establish the excellent
numerical stability of the algorithms of Chapters 11, 13, and 14.
We first analyze the round-off error in Algorithm H1 (10.22). This algorithm
constructs a single m × m Householder transformation that will
zero all but the first element of a given m-vector v. Including scaling to avoid
loss of accuracy from underflow, the algorithm is mathematically defined by

Using an overbar to denote computed machine numbers, the quantities
actually computed using this algorithm may be written as

The computed quantities defined in Eq. (15.14) to (15.19) are related to
the true quantities of Eq. (15.7) to (15.12) as follows:

The details of the derivation of these bounds are omitted since they are
somewhat lengthy and are very similar to the derivation of bounds given on
pp. 152-156 of Wilkinson (1965a). Note, however, that the bounds given
here are different from those in Wilkinson (1965a) primarily because we are
assuming that all arithmetic is done with precision η, whereas it is assumed
there that some operations are done with a precision of η². We defer discussion
of such mixed precision arithmetic to Chapter 17.
We next consider the application of a Householder transformation to an
m × n matrix C using Algorithm H2 or the optional capability of Algorithm
H1 (10.22). Mathematically we wish to compute

In actual computation, however, we only have Q available in place of Q so


the computed result will be

The matrix B will satisfy

Using an analysis similar to that on pp. 157-160 of Wilkinson (1965a), it


can be shown that

Thus using Eq. (15.25), (15.28), and (15.29) gives

We must consider the error associated with the application of k successive
Householder transformations beginning with an m × n matrix A. The true
mathematical operation desired is

where the Householder matrix Q_i is constructed to produce zeros in positions
i + 1 through m of column i of Q_i A_i. This sequence of operations is basic to
the various algorithms we are considering.
The computed quantities will be

where Q̄_i is the computed approximation to the true matrix Q_i that would
have produced zeros in elements i + 1 through m of column i of the product
matrix Q_i A_i.
Define the error matrix

From Eq. (15.30) we have the bound

where

It follows by direct substitution and the orthogonality of the matrices Ql


that

It is useful to define the error matrix

Then

and from Eq. (15.38)



Equation (15.39) and the bound in Eq. (15.40) can be interpreted as showing
that the computed matrix A_{k+1} is the exact result of an orthogonal transformation
of a matrix A + H_k where ‖H_k‖_F is small relative to ‖A‖_F.
Note that the result expressed by Eq. (15.39) and (15.40) is independent
of the fact that the matrices Q_i were computed for the purpose of zeroing
certain elements of A_i. Equations (15.39) and (15.40) remain true for cases in
which A (and thus A_{k+1}) is replaced by a matrix or vector not having any
special relation to the matrices Q_i.
Furthermore if the computed matrices of Eq. (15.34) are used in the opposite
order, say,

where X is an arbitrary matrix, then the computed matrix Y satisfies

where

The orthogonal matrices Q_i in Eq. (15.42) are the same as those in Eq. (15.39).
This latter observation will be needed in the proofs of Theorems (16.18),
(16.36), (17.19), and (17.23).
Finally we shall need bounds on the error in solving a triangular system
of equations. Let R be a k × k triangular matrix and c be a k-vector. Then
the computed solution x̄ of

will satisfy

where S is a triangular matrix of the same configuration as R and satisfies

For the derivation of this result, see Forsythe and Moler (1967), pp. 103-105.
In the course of proving Eq. (15.46) in the place cited, it is shown that

so that for reasonable values of η we are assured that nonsingularity of R
implies nonsingularity of R + S.
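The following small sketch (an illustration, not the text's analysis) implements back substitution for an upper triangular system and checks a posteriori that the residual of the computed solution is small relative to ‖R‖ ‖x̄‖, which is what the backward-error statement (R + S)x̄ = c with ‖S‖ of order η‖R‖ predicts. The test matrix and tolerances are illustrative.

```python
# Back substitution for R x = c, with an a-posteriori backward-error check.
import numpy as np

def back_substitute(R, c):
    k = len(c)
    x = np.zeros(k)
    for i in range(k - 1, -1, -1):
        x[i] = (c[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

rng = np.random.default_rng(0)
k = 50
R = np.triu(rng.standard_normal((k, k))) + 5.0 * np.eye(k)   # well conditioned
c = rng.standard_normal(k)
x = back_substitute(R, c)
eta = np.finfo(float).eps
# Residual small relative to ||R|| ||x||, consistent with (R + S) x = c, ||S|| = O(eta)||R||.
print(np.linalg.norm(R @ x - c) / (np.linalg.norm(R) * np.linalg.norm(x)) / eta)
```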

EXERCISE

(15.48) Let G denote a Givens matrix computed by Algorithm G1 (10.25)
using exact arithmetic, and let Ḡ denote the corresponding matrix
computed using arithmetic of finite precision η. Let z be an arbitrary
machine representable 2-vector. Derive a bound of the form

and determine a value for the constant c.


16
ANALYSIS OF COMPUTING ERRORS
FOR THE PROBLEM LS

In this chapter the error analyses of Chapter 15 will be used to obtain
bounds for the computational errors of the algorithms of Chapters 11, 13,
and 14. These algorithms provide solution procedures for the six cases of
Problem LS defined in Fig. 1.1.
It is assumed in this chapter that all arithmetic is done with the same
precision η. This leads to error bounds involving n² and mn. Smaller bounds,
involving only the first power of n and independent of m, are obtainable if
higher precision arithmetic is used at certain critical points in these computations.
The theorems of this chapter will be restated for the case of such
mixed precision in Chapter 17.
(16.1) THEOREM (The full rank overdetermined least squares problem)
Let A be an m × n matrix of pseudorank n (see Chapter 14 for the definition
of pseudorank) and b be an m-vector. If x̄ is the solution of Ax ≅ b
computed using Algorithms HFT (11.4) and HS1 (11.10) or HFTI
(14.9) with computer arithmetic of relative precision η, then x̄ is the
exact least squares solution of a problem

with

and


Proof: The mathematical operations required may be written as

Using Eq. (15.39) and (15.40) with A there replaced by the matrix [A: b]
of Eq. (16.5), it follows that there exist an orthogonal matrix Q, a matrix (7,
and a vector f such that the computed quantities R, c, and d satisfy

with

and

In place of Eq. (16.6) to be solved for x, the computer is presented with


the problem

From Eq. (15.45) and (15.46) the computed solution x will satisfy

with

To relate this computed vector x to the original least squares problem, define
the augmented matrix

Left multiplying this matrix by the matrix QT where Q is the orthogonal


matrix of Eq. (16.7) gives

Define

Then x is the least squares solution of Eq. (16.2). Using |E| < ||G|| + ||S||
along with Eq. (16.8) to (16.10) establishes the bounds in Eq. (16.3) and
(16.4). This completes the proof of Theorem (16.1).
(16.11) THEOREM (The square nonsingular problem)
Let A be an n × n matrix of pseudorank n and b be an n-vector.
If x̄ is the solution of Ax = b computed using Algorithms HFT (11.4)
and HS1 (11.10) or HFTI (14.9) with computer arithmetic of relative
precision η, then x̄ is the exact solution of a problem

with

and

This theorem is simply a special case of Theorem (16.1) for the case n = m,
so the bounds in Eq. (16.13) and (16.14) follow from the bounds in Eq. (16.3)
and (16.4). We have stated Theorem (16.11) as a separate theorem because
there is often independent interest in the square nonsingular case.
It is of interest to compare Theorem (16.11) with an analogous error
analysis theorem for Gaussian elimination algorithms with pivoting [Forsythe
and Moler (1967)].
In this case one has the inequality

Here the symbol ‖·‖_∞ denotes the max-row-sum norm defined for any
m × n matrix B as

The quantity ρ_n is a measure of the relative growth of elements in the course
of Gaussian elimination (loc. cit., p. 102).
If partial pivoting is used, one has the bound

If complete pivoting is used, one has the bound

The usual implementations of Gaussian elimination employ the partial


pivoting strategy. Thus the bound in Eq. (16.15) for the norm of the matrix E
grows exponentially with n.
With the use of the Householder algorithms such as Algorithm HFTI
(14.9), the situation is much more satisfactory and reassuring. In this case the
inequality in Eq. (16.13) does not involve a growth term ρ_n at all. Wilkinson
(1965a) has also remarked on this but points out that roughly twice the number
of operations must be performed with the Householder algorithm as
with Gaussian elimination with partial pivoting. Furthermore the quantity ρ_n
can easily be computed in the course of executing a Gaussian elimination
algorithm and Wilkinson has stated that in practice ρ_n rarely exceeds 4 or 8
or 16.
Analogous remarks can be made about Theorem (17.15) where extended
precision is used at certain points of the Householder algorithm.
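For the square nonsingular case the statement "the computed solution is the exact solution of a nearby problem" can be made concrete a posteriori. The sketch below (not from the text) uses the standard rank-one construction E = r x̄ᵀ/‖x̄‖² with r = b − Ax̄, for which (A + E)x̄ = b exactly; a library LU-based solver stands in for the Householder solution.

```python
# Exhibit an explicit perturbation E with (A + E) x_bar = b for a computed x_bar.
import numpy as np

rng = np.random.default_rng(1)
n = 100
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
x_bar = np.linalg.solve(A, b)                 # stand-in for the Householder solution
r = b - A @ x_bar                              # residual of the computed solution
E = np.outer(r, x_bar) / (x_bar @ x_bar)       # rank-one backward perturbation
print(np.linalg.norm((A + E) @ x_bar - b))     # essentially zero
print(np.linalg.norm(E) / np.linalg.norm(A))   # of the order of the precision eta
```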
In Theorems (16.1) and (16.11) it was shown that the computed solution
is the solution of a perturbed problem. Theorems (16.18) and (16.36), in
contrast, show that the computed solution is close to the minimum length
solution of a perturbed problem. This more complicated result is due to the
fact that Theorems (16.18) and (16.36) concern problems whose solution
would be nonunique without imposition of the minimum length solution
condition.
(16.18) THEOREM (The minimum length solution of the full rank underdetermined
problem)
Let A be an m × n matrix of pseudorank m and b be an m-vector. If
x̄ is the solution of Ax = b computed using Algorithms HBT (13.8)
and HS2 (13.9) using computer arithmetic of relative precision η,
then x̄ is close to the exact solution of a perturbed problem in the
following sense: There exist a matrix E and a vector x̃ such that x̃
is the minimum length solution of

with

and

Proof: A brief mathematical statement of the algorithm of Chapter


13 is

In place of Eq. (16.22) the computed lower triangular matrix R will satisfy

where Q is orthogonal and, using the inequality in Eq. (15.40),

Furthermore from Eq. (15.45) and (15.46) it follows that in place of Eq.
(16.23) the computed vector y will satisfy

with

In place of Eq. (16.24) the computed vector x will be defined by

where from Eq. (15.41) to (15.43) one obtains

The orthogonal matrix Q in Eq. (16.29) is the same as the matrix Q in Eq.
(16.25) as noted in the remarks associated with Eq. (15.41) to (15.43).
Rewriting Eq. (16.27) as

then

and using Eq. (16.25) and (16.29) there follows

Define

and

From the inequality in Eq. (15.47) we may assume that R + S of Eq.


(16.32) is nonsingular and thus that the row space of A + E == [R + S: 0]QT
is the same as the space spanned by the first m columns of Q.
From Eq. (16.33) to (16.35), X satisfies Eq. (16.19), while again using Eq.
(16.35), x lies in the space spanned by the first m columns of Q. Thus X is
the unique minimum length solution of Eq. (16.19).
The matrix E and the vector (x — x) satisfy

and

which establish the inequalities in Eq. (16.20) and (16.21), completing the
proof of the Theorem (16.18).

(16.36) THEOREM (The rank-deficient Problem LS)

Let A be an m × n matrix and b be an m-vector. Let x̄ be the solution
of Ax ≅ b computed using Algorithm HFTI (14.9) with computer
arithmetic of relative precision η. Let k be the pseudorank determined
by the algorithm and let R̄22 be the computed matrix corresponding to
R22 of Eq. (14.1). Then x̄ is close to the exact solution to a perturbed
problem in the following sense: There exist a matrix E and vectors f
and x̃ such that x̃ is the minimal length least squares solution of

with

Proof: Refer to Eq. (14.1) to (14.6) for the mathematical description


of the algorithm.
Recall that the matrix Q of Eq. (14.1) is the product of n Householder
transformations, Q = Qn • • • g,. Note that the matrices R11 and R12 of Eq.
(14.1) and the vector cl of Eq. (14.2) are fully determined by just the first k
of these transformations. Let

Then

and

where

and

In place of Eq. (16.42) the computed matrices R11 R12, and S will satisfy

where, using the results of Chapter 15, Qk is an exactly orthogonal matrix and

and

Here a, is defined by Eq. (15.37), n — min (m, n) as in Chapter 14, and R22
denotes the computed matrix corresponding to R22 of Eq. (14.1).

From Eq. (16.46) we may also write

with

In place of Eq. (14.3) the computed W satisfies

where R is an orthogonal matrix.


The bound for ||M||r is somewhat different than is given by Eq. (15.40)
since each of the k Householder transformations whose product constitutes
K has only n — k + 1 columns that differ from columns of the identity
matrix. Therefore, using a, as defined in Eq. (15.37),

In place of Eq. (16.43) the computed vectors c, and d satisfy

where Q is the same matrix as in Eq. (16.46) and

which will establish Eq. (16.39) of Theorem (16.36).


Referring to Eq. (14.4) the computed vector y1 will satisfy

with (W + Z) nonsingular and

In Eq. (14.5) we shall consider only the case y₂ = 0. In place of Eq. (14.6)
the computed x̄ satisfies

with

The error coefficient in Eq. (16.57) is determined in the same way as that
in Eq. (16.51).
Define

Then from Eq. (16.56) to (16.58)

which establishes Eq. (16.40) of the theorem.


It follows from Eq. (16.54) and (16.58), by arguments similar to those
used in proving Theorem (16.18), that X is the minimal length solution of

From Eq. (16.50)

where

and from Eq. (16.51) and (16.55)

Substituting Eq. (16.61) into Eq. (16.60) and augmenting the system with
additional rows shows that x is the minimal length least squares solution of

Left-multiplying Eq. (16.64) by QT and using Eq. (16.46) shows that X


is the unique minimum length solution of the least squares problem, Eq.
(16.37), with

Thus

or using Eq. (16.49) and (16.63), and using k <n to simplify the final expres-
sion,

With Eq. (16.53) and (16.59) this completes the proof of Theorem (16.36).
17
ANALYSIS OF COMPUTING ERRORS
FOR THE PROBLEM LS USING
MIXED PRECISION ARITHMETIC

In our analysis of computing errors we have assumed that all arithmetic
was performed with the same precision characterized by the parameter η.
If a particular computer has the capability of efficiently executing arithmetic
in higher precision (say, with precision parameter ω < η) and has efficient
machine instructions for converting numbers from one precision to the other,
then it is possible to use this extended precision at selected points in the computation
to improve the computational accuracy with very small extra cost
in execution time and essentially no extra storage requirements.
Generally, unless one decides to convert entirely from η-precision to
ω-precision, one will limit the use of ω-precision arithmetic to those parts of
the algorithm that will require only a small fixed number of ω-precision
storage locations, the number of such locations being independent of the
problem dimensions m and n.
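A small numerical illustration of the effect being exploited (not one of the book's algorithms): the data below are single precision (η about 1.2·10⁻⁷) and a sum of squares is accumulated either in single precision or in double precision (ω about 2.2·10⁻¹⁶, roughly η²). Only one extra storage location of the higher precision is needed for the accumulator.

```python
# eta- versus omega-precision accumulation of an inner product.
import numpy as np

rng = np.random.default_rng(2)
v = rng.standard_normal(100_000).astype(np.float32)
exact = np.dot(v.astype(np.float64), v.astype(np.float64))   # reference value

s_eta = np.float32(0.0)
for t in v:                          # eta-precision accumulation
    s_eta += t * t

s_omega = np.float64(0.0)
for t in v:                          # omega-precision accumulation
    s_omega += np.float64(t) * np.float64(t)

print(abs(s_eta - exact) / exact)    # error grows with the number of terms
print(abs(s_omega - exact) / exact)  # error stays near omega
```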
Wilkinson has published extensive analyses [e.g., Wilkinson (1965a)] of
computing errors for specific mixed precision versions of algorithms in
linear algebra assuming ω = η². We shall use similar methods of analysis to
derive error bounds for specific mixed precision versions of the algorithms
analyzed in the previous chapter.
For the computation of a Householder transformation we propose to
modify Eq. (15.13) to (15.19) as follows. Equation (15.14) will be computed
using ω-precision arithmetic throughout, with a final conversion to represent
the result as an η-precision number. Then the error expressions of Eq. (15.20) to
(15.25) become


In place of Eq. (15.27) the matrix

will be computed using ω-precision arithmetic with a final conversion of the
elements of B to η-precision form. Then in place of the error bounds in Eq.
(15.29) and (15.30) we have, respectively,

and, using Eq. (17.5),

Then the bound in Eq. (15.40) on the error associated with multiplication
by k Householder transformations becomes

Finally, ω-precision arithmetic could be used in solving the triangular
system of Eq. (15.44) in which case the bound in Eq. (15.46) would become

The error bound in Eq. (15.46) is already small relative to the error bounds
from other sources in the complete solution of Problem LS and thus the additional
reduction of the overall error bounds due to the use of ω-precision
arithmetic in solving Eq. (15.44) is very slight. For this reason we shall assume
that η-precision rather than ω-precision arithmetic is used in solving Eq.
(15.44).
With this mixed precision version of the algorithms, Theorems (16.1),
(16.11), (16.18), and (16.36) are replaced by the following four theorems.
Proofs are omitted due to the close analogy with those of Chapter 16.
(17.11) THEOREM (The full rank overdetermined least squares problem)
Let A be an m × n matrix of pseudorank n and b be an m-vector.
If x̄ is the solution of Ax ≅ b computed using Algorithms HFT (11.4)
and HS1 (11.10) or HFTI (14.9), with mixed precision arithmetic of
precisions η and ω ≤ η² as described above, then x̄ is the exact least
squares solution of a problem

where

and

(17.15) THEOREM (The square nonsingular problem)

Let A be an n × n matrix of pseudorank n and b be an n-vector. If x̄
is the solution of Ax = b computed using Algorithms HFT (11.4) and
HS1 (11.10) or HFTI (14.9) with mixed precision arithmetic of precisions
η and ω ≤ η² as described above, then x̄ is the exact solution
of a problem

with

and

(17.19) THEOREM (The minimum length solution of the full rank underdetermined
problem)
Let A be an m × n matrix of pseudorank m and b be an m-vector.
If x̄ is the solution of Ax = b computed using the Algorithms HBT
(13.8) and HS2 (13.9) with mixed precision arithmetic of precisions η
and ω ≤ η² as described above, then x̄ is close to the exact solution of
a perturbed problem in the following sense: There exist a matrix E
and a vector x̃ such that x̃ is the minimum length solution of

with

and

(17.23) THEOREM (The rank-deficient Problem LS)

Let A be an m × n matrix and b be an m-vector. Let x̄ be the solution
of Ax ≅ b computed using Algorithm HFTI (14.9) with mixed precision
arithmetic of precisions η and ω ≤ η² as described above. Let k
be the pseudorank determined by the algorithm and let R̄22 be the
computed matrix corresponding to R22 of Eq. (14.1). Then x̄ is close
to the exact solution to a perturbed problem in the following sense:
There exist a matrix E and vectors f and x̃ such that x̃ is the minimal
length least squares solution of

with
A different type of round-off error analysis of the Householder transformation
algorithm as applied to the problem Ax ≅ b, where A is m × n with
m ≥ n and Rank(A) = n, has been carried out by Powell and Reid (1968a and 1968b). This analysis
was motivated by consideration of the disparately row-weighted least
squares problem which arises in the approach to the equality-constrained
least squares problem described in Chapter 22.
In Chapter 22 we shall wish to solve a least squares problem in which
some rows of the data matrix [A: b] have the property that all elements of the
row are very small in magnitude relative to the elements of some other rows
but the relative accuracy of these small elements is nevertheless essential to
the problem. By this it is meant that perturbations of these elements that are
significant relative to the size of the elements will cause significant changes in
the solution vector.
Clearly, Theorem (17.11) does not assure that the Householder algorithm
will produce a satisfactory solution to such a problem since Eq. (17.13) and
(17.14) allow perturbations to all elements of A and b which, although "small"
relative to the largest elements of A and b, respectively, may be significantly
large relative to the small elements.
Indeed Powell and Reid found experimentally that the performance of
the Householder algorithm [such as Algorithm HFTI (14.9)] on disparately
row-weighted problems depended critically on the ordering of the rows of the
matrix [A : b]. They observed that if a row of small elements was used as
the pivotal row while some of the following rows contained large elements in
the pivotal column, then in Eq. (10.7) the quantity v_p would be small relative
to s. In the extreme case of |v_p|/|s| < η the quantity v_p makes no contribution
in Eq. (10.7). Furthermore in such cases it may also be true that the pth
component of c_j will be much smaller in magnitude than the pth component
of t_j u in Eq. (10.21), so that the pth component of c_j makes no contribution in
Eq. (10.21). Thus if the elements of the pivotal row are very small relative to
the elements of some of the following rows, the effect can be essentially the
same as replacing the data of the pivotal row with zeros. If the solution is
sensitive to this loss of data, then this situation must be avoided if possible.
Powell and Reid observed satisfactory results in such problems after
adding a row-interchange strategy, which we shall describe preceding Theo-
rem (17.37).
We shall state three theorems given in Powell and Reid (1968a) that pro-
vide insight into the performance of Householder transformations in dis-
parately row-weighted problems.
For notational convenience, assume the columns of A are ordered a priori
so that no column interchanges occur during the execution of Algorithm
HFTI. Let [A^(1) : b^(1)] = [A : b] and let [A^(i+1) : b^(i+1)] denote the computed
matrix resulting from application of the ith Householder transformation to
[A^(i) : b^(i)] for i = 1, ..., n.
Define

where a_ij^(k) is the (i, j)-element of A^(k).

Let Q^(k) denote the exact Householder orthogonal matrix determined by
the condition that elements in positions (i, k), i = k + 1, ..., m, in the product
matrix Q^(k) A^(k) must be zero.
(17.29) THEOREM [Powell and Reid (1968a)]
Let A be an m × n matrix of pseudorank n. Let A be reduced to
an upper triangular m × n matrix A^(n+1) by Algorithm HFTI (14.9)
using mixed precision arithmetic of precisions η and ω ≤ η². Then

where the elements e_ij of E satisfy
For the statement of Theorem (17.32) we need the following additional


definitions to account for the fact that the right-side vector b cannot partici-
pate in the column interchanges:

Note that because of our assumptions on the a priori ordering of the columns
of A, the quantity sk satisfies

(17.32) THEOREM [Powell and Reid (1968a)]

Let A be an m × n matrix of pseudorank n and let b be an m-vector.
Let x̄ be the solution of Ax ≅ b computed using Algorithm HFTI
(14.9) with mixed precision arithmetic of precisions η and ω ≤ η².
Then x̄ is the exact least squares solution of a problem

where the elements e_ij of E satisfy

and the elements f_i of f satisfy
Using the general fact that |e_ij| ≤ ‖E‖_F, one can obtain the inequality

from Eq. (17.13). Comparing the inequalities in Eq. (17.34) and (17.36) we note
that the inequality in Eq. (17.34) will be most useful in those cases in which γ_i is
substantially smaller than ‖A‖_F. Analogous remarks hold regarding Eq.
(17.14) and (17.35).
Recall that γ_i denotes the magnitude of the largest element occurring in
row i of any of the matrices A^(1), ..., A^(n+1). Thus for γ_i to be small (relative
to ‖A‖_F) requires that the elements of row i of the original matrix A^(1) = A
must be small and that elements in row i of the successive transformed
matrices A^(k) must not grow substantially in magnitude.
Powell and Reid have given an example showing that if no restriction is
placed on the ordering of the rows of A, the growth of initially small elements
in some rows can be very large. They recommend the following row-interchange
strategy: Suppose that a pivot column has been selected and moved
to the kth column of the storage array in preparation for the kth Householder
transformation. Determine a row index l such that

Interchange rows l and k. The kth Householder transformation is then
constructed and applied as usual. This process is executed for k = 1, ..., n.
(17.37) THEOREM [Powell and Reid (1968a)]
With γ_i defined by Eq. (17.28) and using the column- and row-interchange
strategy just described, the quantities γ_i satisfy
Powell and Reid report that using this combined column- and row-inter-
change strategy they have obtained satisfactory solutions to certain dispa-
rately row-weighted problems for which results were unsatisfactory when
rows of small elements were permitted to be used as pivotal rows.
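The sketch below illustrates the combined column- and row-interchange strategy just described: at step k the column of largest remaining squared length is moved into position k, and then the row whose entry in the pivot column is largest in magnitude is brought to the pivotal position before the Householder transformation is formed. It is an illustration only, not the code used by Powell and Reid or the Appendix C routines.

```python
# Householder triangularization with column pivoting and Powell-Reid-style row pivoting.
import numpy as np

def householder_col_row_pivot(A):
    A = A.astype(float).copy()
    m, n = A.shape
    for k in range(min(m, n)):
        # column interchange: largest squared length among the remaining columns
        j = k + int(np.argmax(np.sum(A[k:, k:] ** 2, axis=0)))
        A[:, [k, j]] = A[:, [j, k]]
        # row interchange: largest |a_ik| among rows k..m-1 becomes the pivotal row
        i = k + int(np.argmax(np.abs(A[k:, k])))
        A[[k, i], :] = A[[i, k], :]
        # Householder transformation zeroing A[k+1:, k]
        v = A[k:, k].copy()
        v[0] += np.copysign(np.linalg.norm(v), v[0])
        beta = v @ v
        if beta > 0.0:
            A[k:, :] -= np.outer(2.0 * v / beta, v @ A[k:, :])
    return A

A = np.array([[1e-6, 1e-6, 1e-6],
              [1e-6, 1e-6, 1e-6],
              [1.0,  2.0,  0.0],
              [0.0,  1.0,  1.0]])
print(np.round(householder_col_row_pivot(A), 8))   # upper triangular up to roundoff
```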
18
COMPUTATION OF THE SINGULAR
VALUE DECOMPOSITION AND THE
SOLUTION OF PROBLEM LS

Section 1. Introduction
Section 2. The Symmetric Matrix QR Algorithm
Section 3. Computing the Singular Value Decomposition
Section 4. Solving Problem LS Using SVD
Section 5. Organization of a Computer Program for SVD

Section 1. INTRODUCTION

A review of the singular value decomposition of a matrix was given in


Golub and Kahan (1965). Included was a bibliography dealing with applica-
tions and algorithms, and some work toward a new algorithm. This work
was essentially completed in Businger and Golub (1967). An improved ver-
sion of the algorithm is given by Golub and Reinsch in Wilkinson and
Reinsch (1971). This algorithm is a special adaptation of the QR algorithm
[due to Francis (1961)] for computing the eigenvalues and eigenvectors of a
symmetric matrix.
In this chapter we present the symmetric matrix QR algorithm and the
Golub and Reinsch algorithm (slightly enhanced) for computing the singular
value decomposition.

Included are additional details relating to the use of the singular value decom-
position in the analysis and solution of Problem LS.
For notational convenience and because it is the case in which one is
most likely to apply singular value analysis, we shall assume that m>n
throughout this chapter. The case of m<n can be converted to this case by
adjoining n — m rows of zeros to the matrix A or, in the case of Problem
LS, to the augmented data matrix [A : b]. This adjoining of zero rows can be
handled implicitly in a computer program so that it does not actually require
the storing of these zero rows or arithmetic operations involving these zero
elements. Such an implementation of singular value decomposition with
analysis and solution of Problem LS is provided by the set of Fortran sub-
routines SVA, SVDRS, and QRBD in Appendix C.

Section 2. THE SYMMETRIC MATRIX QR ALGORITHM

The symmetric matrix QR algorithm of Francis, incorporating origin


shifts, is one of the most satisfactory in all numerical analysis. Its properties
and rate of convergence are particularly well understood. The following dis-
cussion and related theorems define this algorithm and form the basis for its
convergence proof.
An n x n symmetric matrix A can initially be transformed to tridiagonal
form by the use of (n — 2) Householder or [(n — l)(n — 2)/2] Givens ortho-
gonal similarity transformations. Thus we may assume, without loss of gen-
erality, that the symmetric matrix A whose eigenvalues are to be computed
is tridiagonal.
The QR algorithm with origin shifts for computing the eigenvalue decom-
position of an n X n symmetric tridiagonal matrix A can be written in the
form

where for k = 1, 2, 3, ...,

(a) Q_k is orthogonal
(b) R_k is upper triangular
(c) σ_k is the kth shift parameter

Before stating the method of computing the shift parameters σ_k, notation
will be introduced for the elements of the matrices A_k. It can be verified [see
Exercise (18.46)] that A_1 being tridiagonal implies that all the matrices A_k,
k = 2, 3, ..., will be tridiagonal also. We shall write the diagonal terms of
each tridiagonal matrix A_k as a_i^(k), i = 1, ..., n, and the superdiagonal and
subdiagonal terms as b_i^(k), i = 2, ..., n, for reference in this chapter and in
Appendix B.

The shift parameters σ_k may now be defined as follows:

(18.4) Each σ_k is the eigenvalue of the lower right 2 × 2 submatrix of A_k
which is closest to a_n^(k).

From Eq. (18.1) to (18.3) we have

so that all A_k have the same eigenvalues. We denote them as λ_1, ..., λ_n.


If for any k some of the b_i^(k) are zero, the matrix A_k may be split into
independent submatrices in which all superdiagonal and subdiagonal elements
are nonzero. Thus we need only give our attention to the problem for which
all b_i^(k) are nonzero.
Furthermore, as is stated in Lemma (B.1) of Appendix B, all subdiagonal
elements being nonzero implies that all eigenvalues are distinct. Thus we need
only concern ourselves with computing eigenvalue-eigenvector decompositions
of symmetric tridiagonal matrices whose eigenvalues are distinct.
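Before stating the convergence theorem, a naive explicit sketch of one step of Eq. (18.1)-(18.4) may help fix ideas: factor A_k − σ_k I = Q_k R_k and form A_{k+1} = R_k Q_k + σ_k I, with σ_k the eigenvalue of the trailing 2 × 2 submatrix closest to the last diagonal entry. Production codes perform the step implicitly; this explicit version is for illustration only.

```python
# One shifted QR step on a symmetric tridiagonal matrix (explicit, illustrative form).
import numpy as np

def shifted_qr_step(A):
    n = A.shape[0]
    t = A[n - 2:, n - 2:]                              # trailing 2 x 2 submatrix
    eigs = np.linalg.eigvalsh(t)
    sigma = eigs[np.argmin(np.abs(eigs - A[-1, -1]))]  # shift per (18.4)
    Q, R = np.linalg.qr(A - sigma * np.eye(n))
    return R @ Q + sigma * np.eye(n)

n = 6
A = np.diag(np.arange(1.0, n + 1)) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
for _ in range(4):
    A = shifted_qr_step(A)
    print(abs(A[-1, -2]))        # last off-diagonal entry converges rapidly to zero
```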
The convergence of the QR algorithm is characterized by the following
theorem of Wilkinson (1968a and 1968b).
(18.5) THEOREM (Global Quadratic Convergence of the Shifted QR Al-
gorithm)
Let A = A_1 be an n × n symmetric tridiagonal matrix with nonzero
subdiagonal terms. Let matrices A_k orthogonally similar to A be
generated with the QR algorithm using origin shifts as described in
Eq. (18.1) to (18.3) and (18.4). Then

(a) Each A_k is tridiagonal and symmetric.

(b) The entry b_n^(k) of A_k at the intersection of the nth row and (n − 1)st
column tends to zero as k → ∞.
(c) The convergence of b_n^(k) to zero is ultimately quadratic: There exists
ε > 0, depending on A, such that for all k

The proof of this theorem is given in Appendix B.


In practice the algorithm appears to be cubically convergent. However,
quadratic convergence is the best that has been proved. Further remarks on
this point are given near the end of Appendix B.

Section 3. COMPUTING THE SINGULAR VALUE


DECOMPOSITION

We now consider the construction of the singular value decomposition
of an m × n matrix A. We assume m ≥ n. See Section 1 of this chapter for
remarks on the case m < n.
The singular value decomposition (SVD) will be computed in two stages.
In the first stage A is transformed to an upper bidiagonal matrix by a
sequence of at most 2n — 1 Householder transformations:

where

The transformation matrix Q_i is selected to zero elements in rows i + 1
through m of column i, whereas the matrix H_i is selected to zero elements in
columns i + 1 through n of row i − 1.
Note that H_n is simply an identity matrix. It is included in Eq. (18.6) for
notational convenience. Furthermore, Q_n is an identity matrix if m = n but
is generally nontrivial if m > n.
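A minimal sketch of the bidiagonalization of Eq. (18.6) is given below: left Householder transformations zero each column below the diagonal and right transformations zero each row to the right of the superdiagonal. It is illustrative only; the subroutine SVDRS of Appendix C organizes the same computation with compact storage.

```python
# Householder bidiagonalization sketch for an m x n matrix, m >= n.
import numpy as np

def house(x):
    """Return v, beta with (I - beta v v^T) x = -+ ||x|| e_1."""
    v = x.astype(float).copy()
    v[0] += np.copysign(np.linalg.norm(x), x[0])
    nv = v @ v
    return v, (2.0 / nv if nv > 0.0 else 0.0)

def bidiagonalize(A):
    B = A.astype(float).copy()
    m, n = B.shape
    for i in range(n):
        v, beta = house(B[i:, i])                       # left transformation Q_i
        B[i:, i:] -= beta * np.outer(v, v @ B[i:, i:])
        if i < n - 2:
            v, beta = house(B[i, i + 1:])               # right transformation H_{i+1}
            B[i:, i + 1:] -= beta * np.outer(B[i:, i + 1:] @ v, v)
    return B

A = np.random.default_rng(3).standard_normal((6, 4))
B = bidiagonalize(A)
print(np.round(B, 8))    # only the diagonal and first superdiagonal remain nonzero
print(np.allclose(np.linalg.svd(A, compute_uv=False),
                  np.linalg.svd(B, compute_uv=False)))  # singular values preserved
```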
The second stage of the SVD computation is the application of a specially
adapted QR algorithm to compute the singular value decomposition of B,

where 0 and P are orthogonal and S is diagonal. Then the singular value
decomposition of A is

We now treat the computation represented by Eq. (18.8). First note that
if any e_i is zero, the matrix B separates into blocks that can be treated
independently. Next we will show that if any q_i is zero, this permits the application
of certain transformations which also produce a partition of the matrix.
Suppose that q_k = 0, with q_j ≠ 0 and e_j ≠ 0, j = k + 1, ..., n. Premultiplying
B by (n − k) Givens rotations T_j we will have

with e′_j ≠ 0, j = k + 2, ..., n, and q′_j ≠ 0, j = k + 1, ..., n.


The rotation Tj is constructed to zero the entry at the intersection of row
k and column j,j = k + 1,...,«, and is applied to rows k and j,j = k + 1,
..., n.
The matrix B' of Eq. (18.10) is of the form

where all diagonal and superdiagonal elements of B'2 are nonzero. Before we
discuss B'2 further, note that B'1 has at least one zero singular value since its
lower right corner element [position (k, k)] is zero.
When the algorithm later returns to operate on B′_1 this fact can be used
to eliminate e_k, with the following sequence of rotations:

Here R_i operates on columns i and k to zero position (i, k). For i > 1, the
application of this rotation creates a nonzero entry in position (i − 1, k).
We return to consideration of B′_2 and for convenience revert to the symbol
B to denote this bidiagonal matrix, all of whose diagonal and superdiagonal
elements are nonzero. The symbol n continues to denote the order of B.
The singular value decomposition of B [Eq. (18.8)] will be obtained by an
iterative procedure of the form

where U_k and V_k are orthogonal and B_k is upper bidiagonal for all k. The
choice of U_k and V_k is such that the matrix

exists and is diagonal.

Note that the diagonal elements of the matrix S which results directly
from this iterative procedure are not generally positive or ordered. These
conditions are obtained by appropriate postprocessing steps.
This iterative procedure is the QR algorithm of Francis as organized for
the singular value problem by Golub and Reinsch (1970). One step of the
algorithm proceeds as follows. Given B_k, the algorithm determines the eigenvalues
λ_1 and λ_2 of the lower right 2 × 2 submatrix of B_kᵀB_k and sets the
shift parameter σ_k equal to the λ_i nearest the value of the lower right element
of B_kᵀB_k.
The orthogonal matrix Vk is determined so that the product

is upper triangular.
The orthogonal matrix Uk is determined so that the product matrix

is upper bidiagonal.
The computational details differ from what would be suggested by the
formulas above. In particular the matrix B_kᵀB_k is never formed and the shift
by σ_k is accomplished implicitly.
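Before turning to the implicit formulation, the shift choice just described can be sketched directly: σ is the eigenvalue of the lower right 2 × 2 submatrix of BᵀB that is closest to its lower right element. The direct version below is only meant to show what is being computed; the careful formulas developed next avoid forming BᵀB and avoid cancellation. The example data are illustrative.

```python
# Direct computation of the SVD shift from the trailing 2 x 2 block of B^T B.
import numpy as np

def svd_shift(q, e):
    """q holds q_1..q_n (diagonal of B); e holds e_2..e_n (superdiagonal)."""
    t = np.array([[q[-2] ** 2 + e[-2] ** 2, q[-2] * e[-1]],
                  [q[-2] * e[-1],           q[-1] ** 2 + e[-1] ** 2]])
    eigs = np.linalg.eigvalsh(t)
    return eigs[np.argmin(np.abs(eigs - t[1, 1]))]   # eigenvalue nearest lower right element

q = np.array([4.0, 3.0, 2.0, 1.0])
e = np.array([0.5, 0.4, 0.3])            # e_2, e_3, e_4
B = np.diag(q) + np.diag(e, 1)
sigma = svd_shift(q, e)
print(sigma, np.linalg.svd(B, compute_uv=False).min() ** 2)   # shift approximates sigma_min^2
```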
To simplify the notation we shall drop the subscripts indicating the itera-
tion stage, k.
Using the notation of (18.7) the lower right 2 × 2 submatrix of BᵀB is

Its characteristic equation is

Since we seek the root of Eq. (18.13) that is closest to q_n² + e_n², it is convenient
to make the substitution

Then s satisfies

To aid in solving Eq. (18.15), define

and

Then y satisfies

The root y of Eq. (18.18) having smallest magnitude is given by

where

Thus, using Eq. (18.14) and (18.17), the root λ of Eq. (18.13) closest to q_n² +
e_n² is

and the shift parameter is defined as

Next we must determine V so that Vᵀ(BᵀB − σI) is upper triangular [see
Eq. (18.11)]. Note that since BᵀB is tridiagonal, it will follow that both
Vᵀ(BᵀB − σI)V and Vᵀ(BᵀB)V are tridiagonal [see Exercise (18.46)].
There is a partial converse to these assertions which leads to the algorithm
actually used to compute V.
(18.23) THEOREM [Paraphrased from Francis (1961)]
If BᵀB is tridiagonal with all subdiagonal elements nonzero, V is
orthogonal, σ is an arbitrary scalar,

and
(18.25) the first column of Vᵀ(BᵀB − σI) is zero below the first element,
then Vᵀ(BᵀB − σI) is upper triangular.

Using this theorem the matrices V and U of Eq. (18.11) and (18.12) will
be computed as products of Givens rotations,

and

where R_i operates on columns i and i + 1 of B and T_i operates on rows i and
i + 1 of B.
The first rotation R_1 is determined to satisfy condition (18.25) of Theorem
(18.23). The remaining rotations R_i and T_i are determined to satisfy condition
(18.24) without disturbing condition (18.25).
Note that it is in the use of the matrices T_i that this algorithm departs
from the standard implicit QR algorithm for a symmetric matrix [Martin
and Wilkinson in Wilkinson and Reinsch (1971)]. In the symmetric matrix
problem one has a tridiagonal symmetric matrix, say Y, and wishes to produce

In the present algorithm one has Y in the factored form BᵀB where B is bidiagonal.
One wishes to produce Ȳ in a factored form, Ȳ = B̄ᵀB̄. Thus B̄ must
be of the form B̄ = UᵀBV where V is the same orthogonal matrix as in Eq.
(18.28) and U is also orthogonal.
Turning now to the determination of the rotations R_i and T_i of Eq. (18.26)
and (18.27), note that the first column of (BᵀB − σI) is

with σ given by Eq. (18.22). The first rotation R_1 is determined by the requirement
that the second element in the first column of R_1ᵀ(BᵀB − σI) must be
zero.
The remaining rotations are determined in the order T_1, R_2, T_2, ...,
R_{n−1}, T_{n−1} and applied in the order indicated by the parentheses in the following
expression:

The rotation T_i, i = 1, ..., n − 1, operates upon rows i and i + 1 and
zeros the element in position (i + 1, i). The rotation R_i, i = 1, ..., n − 1,
operates on columns i and i + 1 and, for i > 1, zeros the element in position
(i − 1, i + 1).
This is an instance of an algorithmic process sometimes called chasing.
The pattern of appearance and disappearance of elements outside the bidiagonal
positions is illustrated for the case of n = 6 by Fig. 18.1.

Fig. 18.1 The chasing pattern for one QR sweep for the case of
n = 6.

From the preceding discussion it is clear that an SVD algorithm for a
general m × n matrix can be based upon an algorithm for the more specialized
problem of the singular value decomposition of a nonsingular bidiagonal
matrix B with elements identified as in Eq. (18.7).
The algorithm QRBD will be stated for performing one sweep of this
fundamental part of the SVD computation. This algorithm first tests the off-diagonal
elements e_i against a given tolerance parameter ε. If |e_i| ≤ ε for any
i, 2 ≤ i ≤ n, the largest index i for which |e_i| ≤ ε is stored in the location
named l, and the algorithm terminates. Otherwise the algorithm sets l := 1 to
indicate that |e_i| > ε for 2 ≤ i ≤ n. The algorithm proceeds to compute the
shift parameter [see Eq. (18.22)]. Next the algorithm determines and applies
the rotations R_i and T_i, i = 1, ..., n − 1 [see Eq. (18.30)]. The algorithm
terminates with the elements of the transformed matrix B̄ = UᵀBV replacing
those of B in storage.
For some purposes, such as solving Problem LS, it is necessary to multiply
certain other vectors or matrices by the matrices Uᵀ and V. Algorithm QRBD
has provision for the user to supply a k × n matrix W = {w_ij} and an n × p
matrix G = {g_ij} which the algorithm will replace by the k × n product matrix
WV and the n × p product matrix UᵀG, respectively.

(18.31) ALGORITHM QRBD(q, e, n, ε, l, W, k, G, p)

Step	Action
1	Comment: Steps 2 and 3 scan for small off-diagonal elements.
2	For i := n, n − 1, ..., 2, do Step 3.
3	If |e_i| ≤ ε, set l := i and go to Step 14.
4	Set l := 1 and compute σ by means of Eq. (18.16) to
	(18.22).
5	Set i := 2.
6	Set e_1 := q_1 − σ/q_1 and z := e_2 [see Eq. (18.29)].
7	Execute Algorithm G1(e_{i−1}, z, c, s, e_{i−1}).
8	Execute Algorithm G2(c, s, q_{i−1}, e_i) and for j := 1, ..., k
	execute Algorithm G2(c, s, w_{j,i−1}, w_{j,i}).
9	Set z := s·q_i and q_i := c·q_i.
10	Execute Algorithm G1(q_{i−1}, z, c, s, q_{i−1}).
11	Execute Algorithm G2(c, s, e_i, q_i) and for j := 1, ..., p
	execute Algorithm G2(c, s, g_{i−1,j}, g_{i,j}).
12	If i = n, go to Step 14.
13	Set z := s·e_{i+1}, e_{i+1} := c·e_{i+1}, and i := i + 1 and go to
	Step 7.
14	Comment: If l = 1, one full QR sweep on the bidiagonal
	matrix has been performed. If l > 1, the element e_l is
	small, so the matrix can be split at this point.
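Algorithm QRBD relies on the two Givens operations of Chapter 10: constructing a rotation that maps a pair (v₁, v₂) to (r, 0), and applying a rotation to a pair of stored values. The minimal sketches below follow the spirit of Algorithms G1 and G2; the exact argument conventions of the book's routines are not reproduced here.

```python
# Minimal Givens construct/apply operations in the spirit of Algorithms G1 and G2.
import math

def givens_construct(v1, v2):
    """Return c, s, r so that the rotation maps (v1, v2) to (r, 0)."""
    r = math.hypot(v1, v2)
    if r == 0.0:
        return 1.0, 0.0, 0.0
    return v1 / r, v2 / r, r

def givens_apply(c, s, z1, z2):
    """Apply the rotation with cosine c and sine s to the pair (z1, z2)."""
    return c * z1 + s * z2, -s * z1 + c * z2

c, s, r = givens_construct(3.0, 4.0)
print(r, givens_apply(c, s, 3.0, 4.0))     # 5.0 (5.0, 0.0)
```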
By repeated application of Algorithm QRBD to a nonsingular upper
bidiagonal matrix B, one constructs a sequence of bidiagonal matrices B_k
with diagonal terms q_1^(k), ..., q_n^(k) and superdiagonal terms e_2^(k), ..., e_n^(k).
From Theorem (18.5) we know that the product e_n^(k) q_n^(k) converges quadratically
to zero as k → ∞. The assumption that B is nonsingular implies that
q_n^(k) is bounded away from zero. It follows that e_n^(k) converges quadratically
to zero. In practice the iterations will be stopped when |e_n^(k)| ≤ ε.
Having accepted |e_n^(k)| as being sufficiently small, one next constructs the
SVD of one or more bidiagonal matrices of order n − 1 or lower; this process
is continued as above with n replaced by n − 1 until n = 1. With no more
than n − 1 repetitions of the procedures above, the decomposition B = USVᵀ
is produced.
With the precision parameter η in the range of about 10⁻¹⁶ to 10⁻⁸ and
ε = η‖B‖, experience shows that generally a total of about 2n executions of
Algorithm QRBD are required to reduce all superdiagonal elements to less
than ε in magnitude.
The diagonal elements of S are not generally nonnegative or nonincreas-
ing. To remedy this, choose a diagonal matrix D whose diagonal terms are
+1 or —1 with the signs chosen so that

has nonnegative diagonal terms. Next choose a permutation matrix P such


that the diagonal terms of

are nonincreasing. It is easily seen that

is a singular value decomposition of B with the diagonal elements of S non-


negative and nonincreasing.
The matrix V of Eq. (18.9) will be produced as a product of the form

where the matrices H_i are the Householder transformations of Eq. (18.6),
the matrices R_i are all the postmultiplying Givens rotations produced and
used preceding and during the QR sweeps [see Eq. (18.30) and the text following
Eq. (18.10)], and the matrices D and P are given by Eq. (18.32) and (18.33),
respectively.
Similarly the matrix Uᵀ of Eq. (18.9) is generated as

where P is given by Eq. (18.33), the rotations T_i are all the premultiplying
Givens rotations that arise preceding and during the QR sweeps [see Eq.
(18.10) and (18.30)], and the matrices Q_i are the Householder transformations
of Eq. (18.6).

Section 4. SOLVING PROBLEM LS USING SVD

For application to Problem LS the singular value decomposition,

can be used to replace the problem Ax ~ b by the equivalent problem



where

and

One may wish to consider a sequence of candidate solutions x^(k) defined by

where v_j is the jth column vector of V.

Note that it is convenient to compute the vectors x^(k) by the formulas

The residual norm ρ_k associated with the candidate solution x^(k),
defined by

also satisfies

These numbers can be computed in the following manner:

where g^(1) and g^(2) are the subvectors of g defined by Eq. (18.38).
The columns of V associated with small singular values may be interpreted
as indicating near linear dependencies among the columns of A. This
has been noted in Chapter 12. Further remarks on the practical use of the
singular values and the quantities ρ_k are given in Chapter 25, Section 6.
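A small sketch of the candidate solutions x^(k) and their residual norms ρ_k follows (the names here are illustrative, not those of the Fortran subroutines): each x^(k) uses only the k largest singular values, and the ρ_k are computed from the transformed right side g = Uᵀb; the two expressions for the residual agree.

```python
# Candidate solutions and residual norms from the SVD of [A : b] data.
import numpy as np

rng = np.random.default_rng(4)
m, n = 8, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
U, s, Vt = np.linalg.svd(A, full_matrices=True)
g = U.T @ b
x = np.zeros(n)
for k in range(1, n + 1):
    x = x + (g[k - 1] / s[k - 1]) * Vt[k - 1]   # x^(k) = x^(k-1) + (g_k / s_k) v_k
    rho = np.linalg.norm(g[k:])                 # residual norm from the transformed data
    print(k, rho, np.linalg.norm(A @ x - b))    # the two values agree
```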

Section 5. ORGANIZATION OF A COMPUTER
PROGRAM FOR SVD

We shall describe a way in which these various quantities can be computed


with economical use of storage. This description assumes m > n. See Section
1 of this chapter for remarks on the case m < n.

First, if mn is large and m > n, we could arrange that the data matrix
[A : b] be first reduced to an (n + 1) × (n + 1) upper triangular matrix by sequential
Householder processing (see Chapter 27). The matrix A is reduced to bidiagonal
form as indicated in Eq. (18.6). The nonzero elements of B replace
the corresponding elements of A in the storage array called A. The
transformations Q_i of Eq. (18.6) may be applied to b as they are formed and
need not be saved. The vector resulting from this transformation replaces b
in the storage array called b. The transformations H_i of Eq. (18.6) must be
saved in the storage space that becomes available in the upper triangle of the
A array plus an n-array, called h, say.
After the bidiagonalization the nonzero elements of B are copied into the
two n-arrays q and e, occupying locations q_i, i = 1, ..., n, and e_i, i = 2, ...,
n, with location e_1 available for use as working storage by the QR algorithm.
The computation of the matrix V of Eq. (18.36) is initiated by explicitly
forming the product (H_2 ⋯ H_n) required in Eq. (18.34). This computation
can be organized (and is so organized in the Fortran subroutine SVDRS of
Appendix C) so that the resulting product matrix occupies the first n rows of
the storage array A, and no auxiliary arrays of storage are required.
The QR algorithm is applied to B, as represented by the data stored in
arrays q and e. As each rotation R_i is produced during the QR algorithm it
is multiplied into the partial product matrix stored in the A array for the
purpose of forming V [see Eq. (18.34)]. Similarly, each rotation T_i is multiplied
times the vector stored in the array b in order to form the vector Uᵀb
[see Eq. (18.35) and (18.38)].
At the termination of the QR iterations the numbers stored in the locations
e_i, i = 2, ..., n, will be small. The numbers stored in q_i, i = 1, ..., n,
must next be set to be nonnegative and sorted. Application of these sign
changes and permutations to the storage array A completes the computation
of the matrix V [see Eq. (18.34)]. Application of the permutations to the storage
array b completes the computation of g = Uᵀb [see Eq. (18.35) and
(18.38)].
If desired the candidate solutions of Eq. (18.41) and (18.42) can be computed
and stored as column vectors in the upper n × n submatrix of the
array A, overwriting the matrix V.
The Fortran subroutine SVDRS (Appendix C) contains the additional
feature of initially scanning the matrix A for any entirely zero columns. If l
such columns are found, the columns are permuted so that the first n − l
columns are nonzero. Such a matrix has at least l of the computed singular
values exactly zero.
This feature was introduced into the subroutine so that a user can delete
a variable from a problem by the simple process of zeroing the corresponding
column of the matrix A. Without this procedure some of the computed
singular values that should be zero would have values of approximately η‖A‖.
In many circumstances this would be satisfactory, but in some contexts it is
desirable to produce the exact zero singular values.
The subroutine SVDRS also tests the rows of A, and if necessary permutes
the rows of the augmented matrix [A : b] so that if A contains l nonzero rows
the first min(l, n) rows of the permuted matrix are nonzero. This is only
significant when l < n, in which case A has at least n − l zero singular values
and this process assures that at least n − l of the computed singular values
will be exactly zero.
When the singular value decomposition is computed for the purpose of
reaching a better understanding of a particular least squares problem, it
is desirable to have a program that prints various quantities derived from
the singular value decomposition in a convenient format. Such a Fortran
subroutine, SVA, is described and included in Appendix C. An example of
the use of SVA is given in Chapter 26.

EXERCISES

(18.46) Prove that if A_1 is symmetric and tridiagonal, then so are the A_k,
k = 2, ..., of Eq. (18.3).
(18.47) Prove that the special QR algorithm [Eq. (18.11), (18.12), and
(18.22)] converges in one iteration for 2 x 2 bidiagonal matrices.
19 OTHER METHODS FOR
LEAST SQUARES PROBLEMS
Section 1. Normal Equations with Cholesky Decomposition
Section 2. Modified Gram-Schmidt Orthogonalization

As a computational approach to least squares problems we have stressed


the use of Householder orthogonal transformations. This approach has the
numerical stability characteristic of orthogonal transformations (Chapters
15 to 17) along with ready adaptability to special requirements such as se-
quential accumulation of data (see Chapter 27). We have also suggested the
use of singular value analysis as a technique for reaching a better understand-
ing of a poorly conditioned problem (Chapters 18 and 25).
In this chapter we shall discuss some of the other computational ap-
proaches to the least squares problem. For convenience Table 19.1 gives the
high-order terms in the count of operations for the solution of the least
squares problem Ax ≅ b, where A is m × n with m ≥ n, using various methods.
One class of methods to be discussed is based on the mathematical equivalence
of the least squares problem Ax ≅ b and the system of linear equations
(AᵀA)x = Aᵀb. This system is referred to as the normal equations for the
problem. Methods based on forming and solving the normal equations typically
require only about half as many operations as the Householder algorithm,
but these operations must be performed using precision η² to
satisfactorily encompass the same class of problems as can be treated using
the Householder algorithm with precision η.
Most special processes, such as sequential accumulation of data (see
Chapter 27), for example, can be organized to require about the same number
of units of storage in either a normal equations algorithm or a Householder
algorithm. Note, however, that if the units of storage are taken to be computer
words of the same length for both methods, the Householder algorithm
will successfully process a larger class of problems than the normal equations
algorithm.
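The precision remark above can be illustrated numerically: on an ill-conditioned full rank problem, a solution obtained through orthogonal transformations remains accurate in double precision while forming and solving the normal equations in the same precision loses roughly twice as many digits. The test matrix below and the use of a library SVD-based solver (standing in for the Householder route) are illustrative choices, not taken from the text.

```python
# Orthogonal-transformation route versus normal equations on an ill-conditioned problem.
import numpy as np

eps = 1e-7
A = np.vstack([np.ones((1, 3)), eps * np.eye(3)])     # Lauchli-type matrix, cond ~ 1/eps
x_true = np.array([1.0, 2.0, 3.0])
b = A @ x_true

x_orth, *_ = np.linalg.lstsq(A, b, rcond=None)        # orthogonal-decomposition route
P = A.T @ A                                           # normal equations route
x_ne = np.linalg.solve(P, A.T @ b)

print(np.linalg.norm(x_orth - x_true) / np.linalg.norm(x_true))  # near full accuracy
print(np.linalg.norm(x_ne - x_true) / np.linalg.norm(x_true))    # markedly worse
```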


The second method to be discussed in this chapter is modified Gram-


Schmidt orthogonalization (MGS). The numerical properties of this method
are very similar to those of the Householder algorithm. The factored re-
presentation of the Q matrix which is used in the Householder algorithm
occupies about n2/2 fewer storage locations than the explicit representation
used in MGS. Largely because of this feature, we have found the Householder
algorithm more conveniently adaptable than MGS to various special ap-
plications.
Table 19.1 OPERATION COUNTS FOR VARIOUS LEAST SQUARES COMPUTATIONAL METHODS
(asymptotic number of operations, where an operation is a multiply or divide plus an add)

Householder triangularization                                     mn² − n³/3
Singular value analysis, direct application to A                  2mn² + s(n)†
Householder reduction of A to triangular R
  plus singular value analysis of R                               mn² + 5n³/3 + s(n)†
Form normal equations                                             mn²/2
Cholesky solution of normal equations                             n³/6
Gauss-Jordan solution of normal equations
  (for stepwise regression)                                       n³/3
Eigenvalue analysis of normal equations                           4n³/3 + s(n)†
Gram-Schmidt (either classical or modified)                       mn²

†The term s(n) accounts for the iterative phase of the singular value or eigenvalue computation.
Assuming convergence of the QR algorithm in about 2n sweeps and implementation as in the
Fortran subroutine QRBD (Appendix C), s(n) would be approximately 4n³.

As a specific example of this compact storage feature of the Householder
transformation, note that one can compute the solution x̄ of Ax ≅ b and the
residual vector r = b − Ax̄ using a total of m(n + 1) + 2n + m storage
locations. Either the use of normal equations or the MGS algorithm requires
an additional n(n + 1)/2 storage locations. Furthermore, as mentioned before,
the assessment of storage requirements must take into consideration the
fact that the normal equations method will require η²-precision storage locations
to handle the class of problems that can be handled by Householder or
MGS using η-precision storage locations.

Section 1. NORMAL EQUATIONS WITH CHOLESKY


DECOMPOSITION

The method most commonly discussed in books giving brief treatment of
least squares computation is the use of normal equations. Given the problem
Ax ≅ b, where A is m × n, one premultiplies by Aᵀ, obtaining the system

which is called the system of normal equations for the problem. Here

and

Equation (19.1) arises directly from the condition that the residual vector
b — Ax for the least squares solution must be orthogonal to the column
space of A. This condition is expressed by the equation

which is seen to be equivalent to Eq. (19.1) using Eq. (19.2) and (19.3).
While P has the same rank as A and thus can be singular, it is nevertheless
true that Eq. (19.1) is always consistent. To verify this note that, from Eq.
(19.3), the vector d is in the row space of A but from Eq. (19.2), the row space
of A is identical with the column space of P.
The system (19.1) could be solved by any method that purports to solve
a square consistent system of linear equations. From Eq. (19.2), however,
the matrix P is symmetric and nonnegative definite, which permits the use of
Cholesky elimination with its excellent properties of economy and stability
[Wilkinson (1965a), pp. 231-232].
The Cholesky method, which we will now describe, is based on the fact
that there exists an n × n upper triangular (real) matrix U such that

(19.5)    UᵀU = P

and thus the solution x of Eq. (19.1) can be obtained by solving the two tri-
angular systems

(19.6)    Uᵀy = d

and

(19.7)    Ux = y.
Alternatively, the process of solving Eq. (19.6) for y can be accomplished
as a part of the Cholesky decomposition of an appropriate augmented ma-
trix. For this purpose define

          Ā = [A : b]

and

(19.8)    P̄ = ĀᵀĀ = [ P   d ]
                     [ dᵀ  ω ].

Note that P and d of Eq. (19.8) denote the same matrix and vector as in Eq.
(19.2) and (19.3), and ω satisfies ω = bᵀb.
The upper triangular Cholesky factorization of P̄ is then of the form

(19.9)    ŪᵀŪ = P̄

with

(19.10)   Ū = [ U   y ]
              [ 0   ρ ]

where U and y of Eq. (19.10) satisfy Eq. (19.5) and (19.6). Furthermore it is
easily verified that ρ of Eq. (19.10) satisfies

          ρ = ||b − Ax̃||

where x̃ is any vector minimizing ||b − Ax||.


For convenience the details of computing the Cholesky decomposition
will be described using the notation of Eq. (19.5). Obviously the algorithm
also applies directly to Eq. (19.9).
From Eq. (19.5) we obtain the equations

(19.11)   p_ij = Σ_{k=1}^{i} u_ki u_kj,    i ≤ j;  i, j = 1, ..., n.

Solving for u_ij in the equation involving p_ij leads to the following equa-
tions, which constitute the Cholesky (also called square root or Banachiewicz)
factorization algorithm:

(19.12)   v_i = p_ii − Σ_{k=1}^{i−1} u_ki²,    i = 1, ..., n

(19.13)   u_ii = v_i^{1/2}

(19.14)   u_ij = (p_ij − Σ_{k=1}^{i−1} u_ki u_kj)/u_ii,    j = i + 1, ..., n.
In Eq. (19.12) and (19.14) the summation term is taken to be zero when
i = 1.
Theoretically, if Rank(A) = n, then v_i > 0, i = 1, ..., n. Equations
(19.12) to (19.14) then define unique values for each u_ij.
If Rank(A) = k < n, however, there will be a first value of i, say i = t,
such that v_t = 0. Then for i = t the numerator in the right side of Eq. (19.14)
must also be zero for j = t + 1, ..., n, since solutions exist for all the equa-
tions in (19.11). In this case there is some freedom in the assignment of values
to u_tj, j = t + 1, ..., n. One set of values that is always admissible is u_tj =
0, j = t + 1, ..., n [see Exercise (19.37)].
It follows that the theoretical algorithm of Eq. (19.12) to (19.14) provides
a solution for Eq. (19.5) regardless of the rank of A if it is modified so that
Eq. (19.14) is replaced by

(19.15)   u_ij = 0,    j = i + 1, ..., n,

for any value of i for which u_ii = 0.
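The factorization defined by Eq. (19.12) to (19.15) is compact enough to state in a few lines of code. The following sketch (Python with NumPy; ours, not the Fortran of Appendix C, and the function name and the small test matrix are our own) assumes P is symmetric and nonnegative definite and simply leaves row i of U equal to zero whenever the pivot quantity v_i fails to be positive, as suggested for Eq. (19.15).

    import numpy as np

    def cholesky_upper(P, tol=0.0):
        # Upper triangular U with U'U = P, following Eq. (19.12)-(19.15).
        # P is assumed symmetric and nonnegative definite; when v_i is not
        # positive (rank deficiency or round-off) row i of U is left zero.
        n = P.shape[0]
        U = np.zeros((n, n))
        for i in range(n):
            v = P[i, i] - U[:i, i] @ U[:i, i]                    # Eq. (19.12)
            if v > tol:
                U[i, i] = np.sqrt(v)                             # Eq. (19.13)
                U[i, i+1:] = (P[i, i+1:]
                              - U[:i, i] @ U[:i, i+1:]) / U[i, i]  # Eq. (19.14)
            # else: row i of U stays zero                        # Eq. (19.15)
        return U

    A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
    P = A.T @ A                       # normal-equation matrix of Eq. (19.2)
    U = cholesky_upper(P)
    print(np.allclose(U.T @ U, P))    # True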


Note that the upper triangular matrix R of the Householder decomposi-
tion QA = [R; 0] satisfies RᵀR = P. If Rank(A) = n, the solution of Eq. (19.5)
is unique to within the signs of rows of U [see Exercise (2.17)]. Thus if Rank(A)
= n, the matrix R of the Householder decomposition is identical with
the Cholesky matrix U to within signs of rows.
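This relationship is easy to verify numerically; in the sketch below (NumPy, ours) the triangular factor R of a random matrix A agrees, up to the signs of its rows, with the Cholesky factor U of P = AᵀA.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((7, 4))

    R = np.linalg.qr(A, mode="r")        # triangular factor of the Householder (QR) decomposition
    U = np.linalg.cholesky(A.T @ A).T    # upper triangular Cholesky factor of P = A'A

    print(np.allclose(R.T @ R, A.T @ A))       # True: R'R = P
    print(np.allclose(np.abs(R), np.abs(U)))   # True: R equals U up to row signs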
The Cholesky decomposition can also be computed in the form

(19.16)   LLᵀ = P

where L is an n × n lower triangular matrix. The formulas for the elements
of L are

          l_jj = (p_jj − Σ_{k=1}^{j−1} l_jk²)^{1/2},
          l_ij = (p_ij − Σ_{k=1}^{j−1} l_ik l_jk)/l_jj,    i = j + 1, ..., n.
In actual computation there arises the possibility that the value of v_i
computed using Eq. (19.12) for the Cholesky decomposition of the (theoret-
ically) nonnegative definite matrix P of Eq. (19.2) may be negative due to
round-off error. Such errors can arise cumulatively in the computation as-
sociated with Eq. (19.2), (19.12), (19.13), and (19.14).
If a computed v_i is negative for some i, one method of continuing the
computation is to set v_i = 0, u_ii = 0, and apply Eq. (19.15). A Fortran sub-
routine using essentially this idea is given by Healy (1968).
Another approach is to perform and record a symmetric interchange of
rows and columns to maximize the value of v_i at the ith step of the algorithm.
This has the effect of postponing the occurrence of nonpositive values of v_i,
if any, until all positive values have been processed. When all remaining
values of v_i are nonpositive, then the corresponding remaining rows of U can
be set to zero.

The implementation of such an interchange strategy requires some reor-
dering of the operations expressed in Eq. (19.12) to (19.14) so that partially
computed values of v_i will be available when needed to make the choice of
pivotal element. When interchanges are used on an augmented matrix of the
form occurring in Eq. (19.8), the last column and row must not take part in
the interchanges.
A Cholesky decomposition for P̄ of Eq. (19.8) gives rise to an upper
triangular matrix Ū of Eq. (19.10) having the same relationship to Problem
LS as does the triangular matrix obtained by Householder triangularization
of [A : b]. This can be a useful observation in case one already has data in the
form of normal equations and then wishes to obtain a singular value analysis
of the problem.
Ignoring the effects of round-off error, the same information obtainable
from singular value analysis of A (see Chapter 18 and Chapter 25, Section 5)
can be obtained from an eigenvalue analysis of the matrix P of Eq. (19.2).
Thus if the eigenvalue-eigenvector decomposition of P is

          P = VS²Vᵀ

with eigenvalues s₁² ≥ s₂² ≥ ⋯ ≥ s_n², one can compute

where d and ω are defined by Eq. (19.8). Then with the change of variables
x = Vp
Problem LS is equivalent to the problem

This least squares problem is equivalent to problem (18.37) obtained from
singular value analysis, since the diagonal matrix S and the n-vector g⁽¹⁾ of
Eq. (19.19) are the same as in Eq. (18.37), and γ of Eq. (19.19) and g⁽²⁾ of Eq.
(18.37) satisfy γ = ||g⁽²⁾||.
For a given arithmetic precision η, however, the quantities S, g⁽¹⁾, and
γ = ||g⁽²⁾|| will be determined more accurately by singular value analysis
of [A : b], or of the triangular matrix R = Q[A : b] obtained by Householder
triangularization, than by an eigenvalue analysis of [A : b]ᵀ[A : b].
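The correspondence between the two analyses can also be checked directly: the eigenvalues of P = AᵀA are the squares of the singular values of A (a small NumPy check, ours).

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((6, 3))

    s = np.linalg.svd(A, compute_uv=False)     # singular values of A, descending
    lam = np.linalg.eigvalsh(A.T @ A)[::-1]    # eigenvalues of P, reordered to descending

    print(np.allclose(lam, s**2))              # True: the eigenvalues of P are s_i^2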
As mentioned in the introduction of this chapter, using arithmetic of a
fixed relative precision η, a larger class of problems can be solved using
Householder transformations than can be solved by forming normal equa-
tions and using the Cholesky decomposition. This can be illustrated by con-
sidering the following example.
Define the 3 × 2 matrix

Suppose the value of ε is such that it is significant to the problem, ε > 100η,
say, but ε² < η, so that when computing with relative precision η we have
1 + ε ≠ 1 but 3 + ε² is computed as 3. Thus instead of computing

we shall compute the matrix

plus some random errors of order of magnitude 3η.


Further computation of the upper triangular matrix

that satisfies RᵀR = AᵀA yields, using the Cholesky method [Eq. (19.12) to
(19.14)],

This final equation is computed as

plus random errors of the order of magnitude 3η and of arbitrary sign. The
correct result at this point would of course be

Thus using arithmetic of relative precision η, no significant digits are ob-
tained in the element r₂₂. Consequently the matrix R is not distinguishable
from a singular matrix. This difficulty is not a defect of the Cholesky decom-
position algorithm but rather reflects the fact that the column vectors of AᵀA
are so nearly parallel (linearly dependent) that the fact that they are not
parallel cannot be established using arithmetic of relative precision η.
Contrast this situation with the use of Householder transformations to
triangularize A directly without forming AᵀA. Using Eq. (10.5) to (10.11) one
would compute

Then to premultiply the second column, a₂, of A by I + b⁻¹uuᵀ one computes

and

The important point to note is that the second and third components of
ã₂, which are of order of magnitude ε, were computed as the difference of
quantities of order of magnitude unity. Thus, using η-precision arithmetic,
these components are not lost in the round-off error.
The final step of Householder triangularization would be to replace the
second component of ã₂ by the negative euclidean norm of the second and
third components, and this step involves no numerical difficulties.

Collecting results we obtain

plus absolute errors of the order of magnitude 3η.


In concluding this example we note that, using η-precision arithmetic,
applying Householder transformations directly to A produces a triangular
matrix that is clearly nonsingular, whereas the formation of normal equa-
tions followed by Cholesky decomposition does not.
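The phenomenon is easy to reproduce. The sketch below (NumPy, ours) uses a Läuchli-type 3 × 2 matrix in the spirit of this example, not necessarily the matrix of the text, with IEEE single precision playing the role of η-precision arithmetic: the computed AᵀA is indistinguishable from a singular matrix, while the triangular factor obtained from A itself retains the small pivot.

    import numpy as np

    eps = np.float32(1e-4)           # significant: eps > 100*eta, yet eps**2 < eta in single precision
    A = np.array([[1, 1],
                  [eps, 0],
                  [0, eps]], dtype=np.float32)

    P = A.T @ A                      # normal-equation matrix formed in single precision
    print(P)                         # [[1. 1.] [1. 1.]]: indistinguishable from a singular matrix

    R = np.linalg.qr(A, mode="r")    # Householder triangularization of A itself
    print(R[1, 1])                   # approximately eps*sqrt(2): the small element survives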

Section 2. MODIFIED GRAM-SCHMIDT ORTHOGONALIZATION

Gram-Schmidt orthogonalization is a classical mathematical technique
for solving the following problem: Given a linearly independent set of vec-
tors {a₁, ..., aₙ}, produce a mutually orthogonal set of vectors {q₁, ..., qₙ}
such that for k = 1, ..., n, the set {q₁, ..., q_k} spans the same k-dimen-
sional subspace as the set {a₁, ..., a_k}. The classical set of mathematical for-
mulas expressing q_j in terms of a_j and the previously determined vectors
q₁, ..., q_{j−1} appears as follows:

(19.20)   q₁ = a₁

(19.21)   q_j = a_j − Σ_{i=1}^{j−1} r_ij q_i,    j = 2, ..., n,

where

(19.22)   r_ij = (a_jᵀq_i)/(q_iᵀq_i).

To convert to matrix notation, define A to be the matrix with columns
a_j, Q to be the matrix with columns q_j, and R to be the upper triangular
matrix with unit diagonal elements whose strictly upper triangular elements
are given by Eq. (19.22). Then Eq. (19.20) and (19.21) can be written as

(19.23)   A = QR.
It is apparent that Gram-Schmidt orthogonalization can be regarded as being
another method for decomposing a matrix into the product of a matrix with
orthogonal columns and a triangular matrix.
Experimental evidence [Rice (1966)] indicated that Eq. (19.20) to (19.22)
have significantly less numerical stability than a certain mathematically
equivalent variant. The stability of this modified Gram-Schmidt method was
established by Björck (1967a), who obtained the expressions given below as
Eq. (19.35) and (19.36).
Note that the value of the inner product (a_j, q_i) will not change if a_j is
replaced by any vector of the form

since q_kᵀq_i = 0 for k ≠ i. In particular, it is recommended that Eq. (19.22)
be replaced by

where

In fact, if one poses the problem of choosing the numbers α_k to minimize the
norm of the vector defined by expression (19.24), it is easily verified that the
minimal length vector is given by Eq. (19.26). Thus the vector a_j of Eq. (19.22)
and the vector a_j⁽ⁱ⁾ of Eq. (19.25) are related by the inequality ||a_j⁽ⁱ⁾|| ≤ ||a_j||.
The superior numerical stability of this modified algorithm derives from this
fact.
The amount of computation and storage required is not increased by this
modification since the vectors a_j⁽ⁱ⁾ can be computed recursively rather than
using Eq. (19.26) explicitly. Furthermore, the vector a_j⁽ⁱ⁺¹⁾ can replace the
vector a_j⁽ⁱ⁾ in storage.
This algorithm, which has become known as the modified Gram-Schmidt
(MGS) algorithm [Rice (1966)], is described by the following equations:

To use MGS in the solution of Problem LS one can form the augmented
matrix

and apply MGS to the m × (n + 1) matrix Ā to obtain

where the matrix R is upper triangular with unit diagonal elements. The
strictly upper triangular elements of R are given by Eq. (19.30). The vectors
q_i given by Eq. (19.28) constitute the column vectors of the m × (n + 1)
matrix Q. One also obtains the (n + 1) × (n + 1) diagonal matrix D with
diagonal elements d_i, i = 1, ..., n + 1, given by Eq. (19.29).
For the purpose of mathematical discussion leading to the solution of
Problem LS, let Q be an m × m orthogonal matrix satisfying

Then

where D and R have been partitioned as

and

Then for arbitrary x we have

Therefore the minimum value of the quantity ||Ax − b|| is |d_{n+1}|, and this
value is attained by the vector x̂ which satisfies the triangular system

For a given precision of arithmetic MGS has about the same numerical
stability as Householder triangularization. The error analysis given in Björck
(1967a) obtains the result that the solution to Ax ≅ b computed using MGS
with mixed precision (η, ω) is the exact solution of a perturbed problem
(A + E)x ≅ b + f with

and

Published tests of computer programs [e.g., Wampler (1969)] have found
MGS and Householder programs to be of essentially equivalent accuracy.
The number of arithmetic operations required is somewhat larger for
MGS than for Householder (see Table 19.1) because in MGS one is always
dealing with column vectors of length m, whereas in Householder triangu-
larization one deals with successively shorter columns. This fact also means
that programs for MGS usually require more storage than Householder
programs, since it is not as convenient in MGS to produce the R matrix in
the storage initially occupied by the matrix A. With some extra work of mov-
ing quantities in storage, however, MGS can be organized to produce R in
the storage initially occupied by A if one is not saving the q_j vectors.
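For readers who want to experiment, the following sketch (NumPy, ours) implements the column-oriented MGS recursion in its normalized form and applies it to the augmented matrix [A : b] as described above. The book's own formulation keeps the q_j unnormalized and carries the diagonal matrix D separately; the two forms differ only by a diagonal scaling.

    import numpy as np

    def mgs_qr(A):
        # Modified Gram-Schmidt: each remaining column is orthogonalized
        # against q_i as soon as q_i is formed (normalized variant; ours).
        A = np.array(A, dtype=float, copy=True)
        m, n = A.shape
        Q = np.zeros((m, n))
        R = np.zeros((n, n))
        for i in range(n):
            R[i, i] = np.linalg.norm(A[:, i])
            Q[:, i] = A[:, i] / R[i, i]
            R[i, i+1:] = Q[:, i] @ A[:, i+1:]
            A[:, i+1:] -= np.outer(Q[:, i], R[i, i+1:])
        return Q, R

    # Least squares via MGS applied to the augmented matrix [A : b]:
    rng = np.random.default_rng(2)
    A, b = rng.standard_normal((8, 3)), rng.standard_normal(8)
    Q, R = mgs_qr(np.column_stack([A, b]))
    x = np.linalg.solve(R[:3, :3], R[:3, 3])    # triangular system for the solution
    print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))    # True
    print(abs(R[3, 3]), np.linalg.norm(A @ x - b))                 # both equal the residual norm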
The MGS method can be organized to accumulate rows or blocks of rows
of [A : b] sequentially to handle cases in which the product mn is very large
and m > n. This possibility rests on the fact that the (n + 1) × (n + 1)
matrix D^{1/2}R [see Eq. (19.34)] defines the same least squares problem as the
m × (n + 1) matrix [A : b]. This fact can be used to develop a sequential
MGS algorithm in the same way as the corresponding property of the
Householder triangularization is used in Chapter 27 to develop a sequential
Householder method.
Specialized adaptations of Gram-Schmidt orthogonalization include the
conjugate gradient method of solving a positive definite set of linear equations
[Hestenes and Stiefel (1952); Beckman, appearing in pp. 62-72 of Ralston
and Wilf (1960); Kammerer and Nashed (1972)], which has some attractive
features for large sparse problems [Reid (1970), (1971a), and (1971b)], and
the method of Forsythe (1957) for polynomial curve fitting using orthogo-
nalized polynomials.

EXERCISES

(19.37) Let R be an n × n upper triangular matrix with some diagonal
elements r_ii = 0. Show that there exists an n × n upper triangular
matrix U such that UᵀU = RᵀR and u_ij = 0, j = i, ..., n, for all
values of i for which r_ii = 0. Hint: Construct U in the form U =
QR, where Q is the product of a finite sequence of appropriately
chosen Givens rotation matrices.

(19.38) [Peters and Wilkinson (1970)] Assume Rank(A_{m×n}) = n and m > n.
By Gaussian elimination (using partial pivoting) the decomposi-
tion A = PLR can be obtained, where L_{m×n} is lower triangular with
unit diagonal elements, R_{n×n} is upper triangular, and P is a per-
mutation matrix accounting for the row interchanges.
(a) Count the operations needed to compute this decomposition.
(b) Problem LS can be solved by solving LᵀLy = LᵀPᵀb by the
Cholesky method and then solving Rx = y. Count the oper-
ations needed to form LᵀL and to compute its Cholesky
factorization.
(c) Show that the total operation count of this method is the same
as for the Householder method (see Table 19.1). (We suggest
determining only the coefficients of mn² and n³ in these opera-
tion counts.)
(19.39) Let A be an m × n matrix of rank n. Let A = Q̄R̄ be the decom-
position of A obtained by the (exact) application of the modified
Gram-Schmidt algorithm to A. Assume Q̄ and R̄ are adjusted so
that the columns of Q̄ have unit euclidean length. Consider the
(exact) application of the algorithm HFT (11.4) to the (m + n) × n
matrix formed by placing an n × n zero block above A, and let B
(the upper n × n block) and C (the lower m × n block) denote the
data that replace this matrix in storage after execution of HFT.
Show that B is the same as R̄ to within signs of rows and C is the
same as Q̄ to within signs and normalization of columns.
(19.40) (Cholesky Method Without Square Roots) In place of Eq. (19.5)
consider the decomposition WᵀDW = P, where W is upper trian-
gular with unit diagonal elements and D is diagonal. Derive
formulas analogous to Eq. (19.12) to (19.14) for computing the
diagonal elements of D and the strictly upper triangular elements
of W. Show that using this decomposition of P̄ [Eq. (19.8)] prob-
lem (19.1) can be solved without computing any square roots.
20   LINEAR LEAST SQUARES WITH LINEAR EQUALITY CONSTRAINTS
     USING A BASIS OF THE NULL SPACE

In this chapter we begin the consideration of least squares problems in


which the variables are required to satisfy specified linear equality or inequal-
ity constraints. Such problems arise in a variety of applications. For example,
in fitting curves to data, equality constraints may arise from the need to inter-
polate some data or from a requirement for adjacent fitted curves to match
with continuity of the curves and possibly of some derivatives. Inequality
constraints may arise from requirements such as positivity, monotonicity,
and convexity.
We shall refer to the linear equality constrained least squares problem as
Problem LSE. The problem with linear inequality, as well as possibly linear
equality constraints, will be called Problem LSI. Three distinct algorithms
for solving Problem LSE will be given here and in Chapters 21 and 22,
respectively. Problem LSI will be treated in Chapter 23.
To establish our notation we state Problem LSE as follows:

(20.1) PROBLEM LSE (Least Squares with Equality Constraints)
Given an m₁ × n matrix C of rank k₁, an m₁-vector d, an m₂ × n
matrix E, and an m₂-vector f, among all n-vectors x that satisfy

(20.2)    Cx = d

find one that minimizes

(20.3)    ||Ex − f||.

Clearly Problem LSE has a solution if and only if Eq. (20.2) is consistent.
We shall assume the consistency of Eq. (20.2) throughout Chapters 20, 21,
22, and 23. In the usual practical cases of this problem one would have
n > m₁ = k₁, which would assure that Eq. (20.2) is consistent and has more
than one solution.
It will subsequently be shown that, if a solution for Problem LSE exists,
it is unique if and only if the (m₁ + m₂) × n augmented matrix formed from
the rows of C and E is of rank n. In the case of nonuniqueness there is a
unique solution of minimal length.
Clearly Problem LSE could be generalized to the case in which Eq. (20.2)
is inconsistent but is interpreted in a least squares sense. We have not seen this
situation arise in practice; however, our discussion of Problem LSE would
apply to that case with little modification.
The three algorithms to be given for Problem LSE are each compact in
the sense that no two-dimensional arrays of computer storage are needed
beyond that needed to store the problem data. Each of the three methods can
be interpreted as having three stages:

1. Derive a lower-dimensional unconstrained least squares problem from


the given problem.
2. Solve the derived problem.
3. Transform the solution of the derived problem to obtain the solution
of the original constrained problem.

The first method [Hanson and Lawson (1969)], which will be described in
this chapter, makes use of an orthogonal basis for the null space of the matrix
of the constraint equations. This method uses both postmultiplication and
premultiplication of the given matrix by orthogonal transformation
matrices.
If the problem does not have a unique solution, this method has the
property that the (unique) minimum length solution of the derived uncon-
strained problem gives rise to the (unique) minimum length solution of the
original constrained problem. Thus this first method is suggested for use
if it is expected that the problem may have deficient pseudorank and if stabi-
lization methods (see Chapter 25) based on the notion of limiting the length
of the solution vector are to be used.
This method is amenable to numerically stable updating techniques for
solving a sequence of LSE problems in which the matrix C is successively
changed by adding or deleting rows. The method is used in this way by Stoer
(1971) in his algorithm for Problem LSI.
Furthermore, this first method is of theoretical interest. In Theorem
(20.31) this method is used to show the existence of an unconstrained least
squares problem which is equivalent to Problem LSE in the sense of having
the same set of solutions for any given right-side vectors d and f. This permits
the application to Problem LSE of certain computational procedures, such
as singular value analysis.

The second method, presented in Chapter 21, uses direct elimination by


premultiplication using both orthogonal and nonorthogonal transformation
matrices.
Either of the first two methods is suitable for use in removing equality
constraints (if any) as a first step in solving Problem LSI. Also, either method
is adaptable to sequential data processing in which new rows of data are
adjoined to either the constraint system [C: d] or the least squares system
[E: f] after the original data have already been triangularized.
The third method for Problem LSE, presented in Chapter 22, illuminates
some significant properties of the Householder transformation applied to
a least squares problem with disparately weighted rows. From a practical
point of view the main value of this third method is the fact that it provides
a way of solving Problem LSE in the event that one has access to a subroutine
implementing the Householder algorithm for the unconstrained least squares
problem but one does not have and does not choose to produce code imple-
menting one of the specific algorithms for Problem LSE.
We turn now to our first method for Problem LSE. This method makes
use of an explicit parametric representation of the members of the linear flat

If

is any orthogonal decomposition (see Chapter 2 for definition) of C, and K is
partitioned as

then from Theorem (2.3), Eq. (2.8), it follows that X may be represented as

where

and y₂ ranges over the space of all vectors of dimension n − k₁.


(20.9) THEOREM
Assuming Eq. (20.2) is consistent, Problem LSE has a unique minimal
length solution given by

where

or equivalently

where K₂ is defined by Eq. (20.6).
The vector x̂ is the unique solution vector for Problem LSE if and
only if Eq. (20.2) is consistent and the rank of the augmented matrix
formed from the rows of C and E is n.
Proof: To establish the equivalence of Eq. (20.10) and (20.11) note
first that

and thus

The fact that

can be verified by directly checking the four Penrose conditions [Theorem
(7.9)] and recalling that

Thus Eq. (20.10) and (20.11) are equivalent.
Using Eq. (20.7) it is seen that the problem of minimizing the expression
of Eq. (20.3) over all x ∈ X is equivalent to finding an (n − k₁)-vector y₂ that
minimizes

or equivalently

From Eq. (7.5) the unique minimal length solution vector for this problem
is given by

Thus using Eq. (20.7) a solution vector for Problem LSE is given by

which is equivalent to Eq. (20.11) and thus to Eq. (20.10) also.
Since the column vectors of K₂ are mutually orthogonal, and also orthog-
onal to x̄, the norm of any vector x ∈ X satisfies

If ỹ₂ ≠ ŷ₂ also minimizes Eq. (20.12), then ||ỹ₂|| > ||ŷ₂||. Thus x̃ = x̄ +
K₂ỹ₂ is a solution of Problem LSE but

It follows that x̂ is the unique minimum length solution vector for Problem
LSE.
It remains to relate the uniqueness of the solution of Problem LSE to
k = Rank of the augmented matrix of C and E. Clearly if k < n there exists
an n-vector w ≠ 0 satisfying

and if x̂ is a solution of Problem LSE, so is x̂ + w.
On the other hand, if k = n, consider the (m₁ + m₂) × n matrix

whose rank is also n. Having only n columns, all columns must be indepen-
dent. Since CK₂ = 0, it follows that EK₂ must be of rank n − k₁. Thus ŷ₂ of
Eq. (20.13) is the unique vector minimizing Eq. (20.12), and consequently x̂
of Eq. (20.14) is the unique solution of Problem LSE.
This completes the proof of Theorem (20.9).

A computational procedure can be based on Eq. (20.11). We shall assume
k₁ = m₁ since this is the usual case in practice. Then the matrix H of Eq.
(20.5) can be taken as the identity matrix.
Let K be an n × n orthogonal matrix that, when postmultiplied times C,
transforms C to a lower triangular matrix. Postmultiply C and E by K; thus

where C̃₁ is m₁ × m₁ lower triangular and nonsingular.
Solve the lower triangular system

for the m₁-vector ỹ₁. Compute

Solve the least squares problem

for the (n − m₁)-vector ỹ₂. Finally, compute the solution vector,

The actual computational steps are described by means of the following
algorithm. Initially the matrices and vectors C, d, E, and f are stored in arrays
of the same names. The array g is of length m₁. The arrays h, u, and p are each
of length n − m₁. The solution of minimal length is returned in the storage
array x, of length n. The parameter τ is a user-provided tolerance parameter
needed at Step 6.

(20.24) ALGORITHM LSE(C, d, E, f, m₁, m₂, n, g, h, u, p, x, τ)

Step   Description
1      For i := 1, ..., m₁, execute Algorithm H1(i, i + 1, n, c_i1,
       g_i, c_{i+1,1}, m₁ − i). [See Eq. (20.19). Here and in Step 2
       Algorithms H1 and H2 operate on rows of the arrays C
       and E.]
2      For i := 1, ..., m₁, execute Algorithm H2(i, i + 1, n, c_i1,
       g_i, e_11, m₂). [See Eq. (20.19). If the matrices C and E were
       stored in a single (m₁ + m₂) × n array, say W, then Steps
       1 and 2 could be combined as: For i := 1, ..., m₁ execute
       Algorithm H1(i, i + 1, n, w_i1, g_i, w_{i+1,1}, m₁ + m₂ − i).]
3      Set x₁ := d₁/c₁₁. If m₁ = 1, go to Step 5.
4      For i := 2, ..., m₁, set

       x_i := (d_i − Σ_{j=1}^{i−1} c_ij x_j)/c_ii.

       [See Eq. (20.20).]
5      For i := 1, ..., m₂, set

       f_i := f_i − Σ_{j=1}^{m₁} e_ij x_j.

       [See Eq. (20.21).]
6      Execute Algorithm HFTI(e_{1,m₁+1}, m₂, n − m₁, f, τ, x_{m₁+1},
       k, h, u, p) of Eq. (14.9). [The solution ỹ₂ of Eq. (20.22)
       has been computed and stored in locations x_j, j := m₁ + 1,
       ..., n.]
7      For i := m₁, m₁ − 1, ..., 1, execute Algorithm H2(i,
       i + 1, n, c_i1, g_i, x, 1). [See Eq. (20.23). Here Algorithm H2
       refers to rows of the array C and applies transformations to
       the singly subscripted array x.]
8      Comment: The array named x now contains the minimal
       length solution of Problem LSE.
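A reader who only wants the result of Algorithm LSE, rather than its storage-efficient organization, can mirror Eq. (20.19) to (20.23) with library decompositions. The sketch below (NumPy, ours) obtains the matrix K from a QR factorization of Cᵀ instead of applying Algorithms H1 and H2 in place; the matrix E and vector f in the small test are placeholders of ours, not the data of Eq. (20.26) and (20.28).

    import numpy as np

    def lse_nullspace(C, d, E, f):
        # Minimize ||Ex - f|| subject to Cx = d using an orthogonal basis K2
        # of the null space of C (a sketch of the method of this chapter,
        # assuming k1 = m1; ours, not Algorithm LSE itself).
        m1 = C.shape[0]
        K, T = np.linalg.qr(C.T, mode="complete")   # C K = [lower triangular : 0]
        K1, K2 = K[:, :m1], K[:, m1:]
        y1 = np.linalg.solve(T[:m1].T, d)           # cf. Eq. (20.20)
        f1 = f - (E @ K1) @ y1                      # cf. Eq. (20.21)
        y2 = np.linalg.lstsq(E @ K2, f1, rcond=None)[0]   # cf. Eq. (20.22)
        return K1 @ y1 + K2 @ y2                    # cf. Eq. (20.23)

    C = np.array([[0.4087, 0.1593]])                # Eq. (20.25)
    d = np.array([0.1376])                          # Eq. (20.27)
    E = np.eye(2)                                   # placeholder data, ours
    f = np.array([1.0, 2.0])
    x = lse_nullspace(C, d, E, f)
    print(C @ x - d)                                # ~ 0: the constraint is satisfied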

As an example of Problem LSE, consider the minimization of ||Ex − f||
subject to Cx = d where

(20.25)    C = [0.4087   0.1593]
^^. _ T0.4302 0.35161
(20^6) JT-^ 03JMJ

(20.27)    d = 0.1376

(20.28)    f = [0.6593]
               [0.9666]

The Householder orthogonal matrix, K, which triangularizes C from
the right is

Of course this matrix would not usually be computed explicitly in executing


Algorithm LSE (20.24). Using Eq. (20.19) to (20.23) we compute

We turn now to some theoretical consequences of Theorem (20.9). Equa-
tion (20.10) can be rewritten as

if we define

and

But Eq. (20.30) implies that x̂ is the unique minimum length solution to
the problem of minimizing

where A is the pseudoinverse of the matrix A⁺ defined in Eq. (20.30).


(20.31) THEOREM
The pseudoinverse of the matrix A⁺ defined in Eq. (20.30) is

where

and Z and K₂ are defined as in Theorem (20.9).



Proof: Recall that

and

Other relations that follow directly from these and will be used in the
proof are

and

Equation (20.34) establishes the equality of the different expressions for Ẽ in
Eq. (20.33).
Next define

and verify the four Penrose equations [Theorem (7.9)] as follows:

which is symmetric.

which is symmetric.

and

This completes the proof of Theorem (20.31).

As a numerical illustration of Theorem (20.31) we evaluate Eq. (20.32)
and (20.33) for the data (20.25) to (20.28). From Eq. (20.33) we have

and from Eq. (20.32)

According to Theorem (20.31) this matrix has the property that for an arbi-
trary one-dimensional vector d and two-dimensional vector f, the solution
of the least squares problem

is also the solution of the following Problem LSE: Minimize ||Ex − f||
subject to Cx = d, where C and E are defined by Eq. (20.25) and (20.26).
It should be noted, however, that the residual norms for these two least
squares problems are not in general the same.
21   LINEAR LEAST SQUARES WITH LINEAR EQUALITY CONSTRAINTS
     BY DIRECT ELIMINATION

In this chapter Problem LSE, which was introduced in the preceding
chapter, will be treated using a method of direct elimination by premultiplica-
tion. It will be assumed that the rank k₁ of C is m₁ and that the rank of the
matrix obtained by stacking C above E is n. From Theorem (20.9) this assures
that Problem LSE has a unique solution for any right-side vectors d and f.
It will be further assumed that the columns are ordered so that the
first m₁ columns of C are linearly independent. Column interchanges to
achieve such an ordering would be a necessary part of any computational pro-
cedure of the type to be discussed in this chapter.
Introduce the partitioning

and

The constraint equation Cx = d can be solved for x₁ and will give

Substituting this expression for x₁ in ||Ex − f|| gives



as the expression to be minimized to determine x₂.
Conceptually this leads to the following solution procedure: Compute

and

Solve the least squares problem

and finally compute

There are a variety of ways in which these steps can be accomplished. For
example, following Björck and Golub (1967), suppose a QR decomposition
of C₁ is computed so that

where Q₁ is orthogonal and C̃₁ is upper triangular. Then Eq. (21.4) and (21.5)
can be written as

and

The resulting algorithm may be described as follows. Compute House-
holder transformations to triangularize C₁ and also apply these transfor-
mations to C₂ and d:

Compute the m₂ × m₁ matrix Ẽ₁ as the solution of the triangular system

Compute

and

Compute Householder transformations to triangularize £t and also apply


these transformations to /:

Finally compute the solution vector [ ?! I by solving the triangular system

This algorithm requires no two-dimensional arrays of computer storage
other than those required to store the initial data, since quantities written
with a tilde can overwrite the corresponding quantities written without a tilde,
and quantities written with a circumflex can overwrite the corresponding
quantities written with a tilde.
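The same steps can be sketched with dense library operations (NumPy, ours); the in-place Householder organization, the overwriting just described, and the column interchanges are all omitted, and it is assumed that the first m₁ columns of C are already linearly independent.

    import numpy as np

    def lse_elimination(C, d, E, f):
        # Minimize ||Ex - f|| subject to Cx = d by direct elimination,
        # in the spirit of Eq. (21.11)-(21.16) (a sketch, ours).
        m1 = C.shape[0]
        Q1, C1t = np.linalg.qr(C[:, :m1])              # triangularize C1, cf. Eq. (21.11)
        C2t, dt = Q1.T @ C[:, m1:], Q1.T @ d
        E1t = np.linalg.solve(C1t.T, E[:, :m1].T).T    # E1t C1t = E1, cf. Eq. (21.12)
        E2h = E[:, m1:] - E1t @ C2t                    # cf. Eq. (21.13)
        fh = f - E1t @ dt                              # cf. Eq. (21.14)
        x2 = np.linalg.lstsq(E2h, fh, rcond=None)[0]   # cf. Eq. (21.15)
        x1 = np.linalg.solve(C1t, dt - C2t @ x2)       # back-substitution, cf. Eq. (21.16)
        return np.concatenate([x1, x2])

    C = np.array([[0.4087, 0.1593]]); d = np.array([0.1376])   # Eq. (20.25), (20.27)
    E = np.eye(2); f = np.array([1.0, 2.0])                    # placeholder data, ours
    print(C @ lse_elimination(C, d, E, f) - d)                 # ~ 0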
The coding of this algorithm can also be kept remarkably compact, as is
illustrated by the ALGOL procedure decompose given in Björck and Golub
(1967). Suppose the data matrix (C stacked above E) is stored in an m × n
array, A, with m = m₁ + m₂, and the data vector (d stacked above f) is stored
in an m-array, b. Steps (21.11) and (21.15) can be accomplished by the same
code, essentially Steps 2-12 of Algorithm HFTI (14.9). When executing Eq.
(21.11) arithmetic operations are performed only on the first m₁ rows of
[A : b], but any interchanges of columns that are required must be done on
the full m-dimensional columns of A.
Steps (21.12) to (21.14), which may be interpreted as Gaussian elimina-
tion, are accomplished by the operations

As a numerical example of this procedure, let us again consider Problem
LSE with the data given in Eq. (20.25) to (20.28). For these data the matrix
Q₁ of Eq. (21.11) can be taken to be the 1 × 1 identity matrix. Using Eq.
(21.11) to (21.16) we compute
22   LINEAR LEAST SQUARES WITH LINEAR EQUALITY CONSTRAINTS
     BY WEIGHTING

An observation that many statisticians and engineers have made is the
following. Suppose one has a linear least squares problem where one would
like some of the equations to be exactly satisfied. This can be accomplished
approximately by weighting these constraining equations heavily and solving
the resulting least squares system. Equivalently, those equations which one
does not necessarily want to be exactly satisfied can be downweighted and
the resulting least squares system can then be solved.
In this chapter we shall analyze this formal computational procedure for
solving Problem LSE (20.1). Basically the idea is simple. Compute the solution
of the least squares problem

(22.1)    [ C  ] x ≅ [ d  ]
          [ εE ]     [ εf ]

(using Householder's method, for example) for a "small" but nonzero value of ε.
Then the solution x̂(ε) of the least squares problem (22.1) is "close" [in a sense
made precise by inequality (22.38) and Eq. (22.37)] to the solution x̂ of Problem
LSE.
This general approach is of practical interest since some existing linear
least squares subroutines or programs can effectively solve Problem LSE by
means of solving the system of Eq. (22.1).
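In its simplest form the idea is a single call to an ordinary least squares solver applied to the stacked, scaled system. The sketch below (NumPy, ours) uses a library SVD-based solver in place of the Householder routine of Appendix C, and the small test uses the C and d of Eq. (20.25) and (20.27) with placeholder E and f of our own.

    import numpy as np

    def lse_by_weighting(C, d, E, f, eps=1e-8):
        # Approximate solution of Problem LSE by downweighting the rows of E,
        # i.e. by solving the stacked system of Eq. (22.1) (a sketch, ours).
        A = np.vstack([C, eps * E])           # rows of C first; see the pivoting
        b = np.concatenate([d, eps * f])      # remarks later in this chapter
        return np.linalg.lstsq(A, b, rcond=None)[0]

    C = np.array([[0.4087, 0.1593]]); d = np.array([0.1376])
    E = np.eye(2); f = np.array([1.0, 2.0])   # placeholder data, ours
    x = lse_by_weighting(C, d, E, f)
    print(C @ x - d)                          # ~ 0: the constraint is nearly satisfied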
An apparent practical drawback of this idea is the fact that the matrix of
problem (22.1) becomes arbitrarily poorly conditioned [assuming Rank(C)
< n] as the "downweighting parameter" ε is made smaller. This observation
certainly does limit the practicality of the idea if problem (22.1) is solved by
the method of normal equations [see Eq. (19.1)]. Thus the normal equation
for problem (22.1) is

          (CᵀC + ε²EᵀE)x = Cᵀd + ε²Eᵀf

and unless the matrices involved have very special structure, the computer
representation of the matrix CᵀC + ε²EᵀE will be indistinguishable from
the matrix CᵀC (and Cᵀd + ε²Eᵀf will be indistinguishable from Cᵀd) when
ε is sufficiently small.
Nevertheless, Powell and Reid (1968a and 1968b) have found experi-
mentally that a satisfactory solution to a disparately weighted problem such
as Eq. (22.1) can be computed by Householder transformations if care is
taken to introduce appropriate row interchanges. The reader is referred to
Chapter 17, in which the principal theoretical results of Powell and Reid
(1968a) are presented in the form of three theorems.
In Powell and Reid (1968a) the interchange rules preceding the construc-
tion of the kth Householder transformation are as follows. First, do the usual
column interchange as in Algorithm HFTI (14.9), moving the column whose
euclidean length from row k through m is greatest into column k. Next scan
the elements in column k from row k through m. If the element of largest
magnitude is in row l, interchange rows l and k.
This row interchange rule would generally be somewhat overcautious for
use in problem (22.1). The main point underlying the analysis of Powell and
Reid (1968a) is that a very damaging loss of numerical information occurs if
the pivot element v_p [using the notation of Eq. (10.5) to (10.11)] is significant
for the problem but of insignificant magnitude relative to some elements v_i,
i > p. By "significant for the problem" we mean that the solution would be
significantly changed if v_p were replaced by zero. But if |v_p| is sufficiently
small relative to some of the numbers |v_i|, i > p, then v_p will be computa-
tionally indistinguishable from zero in the calculation of s and u_p by Eq. (10.5)
and (10.7), respectively.
Suppose the nonzero elements of C and E in problem (22.1) are of the
same order of magnitude so that any large disparity in sizes of nonzero ele-
ments of the coefficient matrix of Eq. (22.1) is due to the small multiplier ε.
Then the disastrous loss of numerical information mentioned above would
generally occur only if the pivot element v_p [of Eq. (10.5)] were chosen from
a row of εE (say, v_p = εe_ij) while some row of C (say, row l) has not yet been
used as a pivot row and contains an element (say, c_lj) such that |c_lj| > ε|e_ij|.
This situation can be avoided by keeping the rows in the order indicated in
Eq. (22.1) so that all rows of C are used as pivot rows before any row of εE
is used. Here we have used the symbols εE and C to denote those matrices or
their successors in the algorithm.
We now turn to an analysis of convergence of the solution vector of
problem (22.1) to the solution vector of Problem LSE as ε → 0. First consider

the special case of Problem LSE in which the matrix C is diagonal and E is
an identity matrix. This case is easily understood, and it will be shown
subsequently that the analysis of more general cases is reducible to this special
case.
Define

and

Together the following two lemmas show that the solution of the down-
weighted least squares problem (22.1) converges to the solution of Problem
LSE, as ε → 0, for the special case of C and E as given in Eq. (22.2) and
(22.3).

(22.4) LEMMA
Suppose that C and E are given by Eq. (22.2) and (22.3). Then the solu-
tion of Problem LSE, using the notation of Eq. (20.2) and (20.3), is
given by

The length of Ex̂ − f is

(22.7) LEMMA
Let C and E be as in Eq. (22.2) and (22.3). The unique least squares
solution of problem (22.1) is given by

Lemma (22.7) can be easily verified by forming and solving the normal
equations (CᵀC + ε²EᵀE)x = Cᵀd + ε²Eᵀf. With the hypotheses present

on C and E this system has a diagonal coefficient matrix from which Eq. (22.8)
easily follows.
For convenience in expressing the difference between x̂(ε) and x̂, define
the vector-valued function h(ε) by

Thus, using Eq. (22.5) and (22.8),

and

The residual vector for problem (22.1) is

Introduce the weighted euclidean norm

then

From Eq. (22.10) we see that

while from Eq. (22.14) we see that


For practical computation one is interested in the question of how small
ε must be to assure that x̂(ε) and x̂ are indistinguishable when their compo-
nents are represented as computer words with a relative precision of η. If all
components of the vector x̂ are nonzero, this condition is achieved by requir-
ing that

This will be satisfied for all |ε| ≤ ε₀, where

Using Eq. (22.9) and (22.11), the norm of the difference between
x̂(ε) and x̂ is bounded as follows:

Thus in order to have ||x̂(ε) − x̂|| ≤ η||x̂|| it suffices to choose |ε| ≤ ε₀,
where

Note that Eq. (22.18) does not apply if ||f − x̂|| = 0. In this case, however,
Eq. (22.1) is consistent and has the same solution for all ε ≠ 0.
We now consider Problem LSE for the more general case in which C is
m₁ × n of rank k₁ ≤ m₁ < n, E is m₂ × n, and the matrix obtained by stacking
C above E is of rank n. It is further assumed that the constraint equations
Cx = d are consistent.
By appropriate changes of variables we shall reduce this problem to the
special one previously considered. First note that for |ε| < 1 the least squares
problem (22.1) is exactly equivalent to the least squares problem

where

Next we introduce a uniform rescaling of problem (22.19), multiplying all
coefficients and right-side components by (1 − ε²)^{−1/2}, to obtain the problem

where

Introduce a QR decomposition of Ẽ:

where Q is (m₁ + m₂) × (m₁ + m₂) and orthogonal and R is n × n and
nonsingular. With

we have the equivalent least squares problem

Let

be a singular value decomposition of CR⁻¹ [see Theorem (4.1)]. Since C is
of rank k₁, CR⁻¹ is also, and we have

where s₁ ≥ ⋯ ≥ s_{k₁} > 0.
Partition g into two segments,

Then with Vᵀy = z the least squares problem of Eq. (22.24) is equivalent
to the least squares problem
Since Cx = d is consistent,

Thus problem (22.1) has been transformed to problem (22.26), where the
coefficient matrix consists of a zero matrix and two diagonal matrices, one of
which is a scalar multiple of the identity matrix.
From Lemmas (22.4) and (22.7) it follows that as ρ tends to zero the solu-
tion ẑ(ρ) of problem (22.26) converges to the (unique) solution ẑ of the prob-
lem

On the other hand, by means of the substitutions

and

where R and V are defined by Eq. (22.22) and (22.25), respectively, Problem
LSE (20.1) is converted to problem (22.28).
Thus using Eq. (22.29) and (22.30) we obtain

as the (unique) solution of problem (22.1) with ρ = ε(1 − ε²)^{−1/2}, and

as the (unique) solution of Problem LSE (20.1).
To express the difference between x̂(ρ) and x̂, note that ẑ(ρ) and ẑ satisfy

with the vector-valued function h given by Eq. (22.10) with appropriate
changes of notation, and thus

In order to apply the bounds in Eq. (22.11) using the quantities appearing
in problem (22.26), define

and

Then, using Eq. (22.11) and (22.34),

where g'_i and d'_i are the ith components of the vectors of Eq. (22.35) and
(22.36), respectively.
For x̂ ≠ 0, one obtains the relative precision ||x̂(ρ) − x̂|| ≤ η||x̂|| by
choosing 0 < ρ ≤ ρ₀, where

Then defining

the solution x̂(ε) of Eq. (22.1) satisfies

for ε satisfying 0 < ε ≤ ε₀.


It should be emphasized that the artifact of considering the various
extended problems (22.19), (22.21), (22.24), (22.26), and (22.28) and the asso-
ciated coordinate changes is only for the purpose of proving that the solution
of problem (22.1) converges to the solution of Problem LSE (20.1) as ε tends
to zero. One should not lose sight of the fact that in any actual computational
procedure based on these ideas, one can simply solve problem (22.1) directly,
say, by Householder transformations.
Although the value of ε₀ given by Eq. (22.37) is an upper bound for |ε| in
order to achieve full relative precision η, it involves quantities that one would

not generally compute in solving problem (22.1). Note that there is no positive
lower bound on permissible values of |ε| imposed by the mathematical analysis
of the problem or the numerical stability of the Householder method as
analyzed in Powell and Reid (1968a). The only practical lower limit is set by
the computer's floating point underflow value, L, referred to in Chapter 15.
For example, consider a machine with η = 10⁻⁸ and L = 10⁻³⁸. Suppose
all nonzero data in the matrices C and E and the vectors d and f are approxi-
mately of unit magnitude.
In practice one would probably need to have ε ≤ η^{1/2} = 10⁻⁴ and ε > L
= 10⁻³⁸. With this wide range available one might, for example, decide to
set ε ≈ 10⁻¹⁶.
As a numerical illustration of this weighted approach to solving Problem
LSE, consider the data given in Eq. (20.25) to (20.28). We now formulate this
as the weighted least squares problem

Problem (22.39) was solved on a UNIVAC 1108 computer using sub-
routine HFTI (Appendix C) with mixed precision (single precision η ≈ 10⁻⁸
and double precision ω ≈ 10⁻¹⁸) for 40 values of ε (ε = 10⁻ʳ, r = 1, ..., 40).
Let x⁽ʳ⁾ denote the solution obtained when ε = 10⁻ʳ.
Table 22.1

    r       γ⁽ʳ⁾
    1       3.7 × 10⁻²
    2       3.7 × 10⁻⁴
    3       3.6 × 10⁻⁶
    4       6.3 × 10⁻⁸
    5       2.6 × 10⁻⁷
    6       9.1 × 10⁻⁸
    7       9.1 × 10⁻⁸
    8       1.2 × 10⁻⁷
    9       4.5 × 10⁻⁸
    10      5.8 × 10⁻⁸
    ...
    36      8.7 × 10⁻⁸
    37      3.6 × 10⁻⁹
    38-40   ε so small that numbers multiplied by ε underflow to zero

A "true** solution vector x was computed by the method of Chapter 20


using full double precision (10-") arithmetic. To 12 figures this vector was

The relative error of each solution XM was computed as

A selection of values of &M is given in Table 22.1.


From Table 22.1 we see that any s
value of € from 10~4 to 10~*7 gave
reasonable single precision (if == 10~ -') agreement with the "true" answer, St.

EXERCISE

(22.40) In Eq. (22.1) the matrix C is m₁ × n, E is m₂ × n, and m = m₁
+ m₂. Let H₁ denote the first Householder matrix that would be
constructed when triangularizing the coefficient matrix of Eq.
(22.1). Define

Show that H̄ = lim_{ε→0} J⁻¹H₁J exists and give formulas for computing
vectors u₁ and u₂ of dimensions m₁ and m₂, respectively, and a scalar
b (as functions of the first column of the coefficient matrix of Eq.
(22.1)) such that

Show that the product matrix B has zeros below the first ele-
ment in the first column and thus that an appropriately constructed
sequence of m₁ matrices like H̄ can be used to zero all elements
below the diagonal in the first m₁ columns. Show that this
sequence of operations has the same effect as Eq. (21.11) to (21.14).
23   LINEAR LEAST SQUARES WITH LINEAR INEQUALITY CONSTRAINTS

Section 1. Introduction
Section 2. Characterization of a Solution
Section 3. Problem NNLS
Section 4. Problem LDP
Section 5. Converting Problem LSI to Problem LDP
Section 6. Problem LSI with Equality Constraints
Section 7. An Example of Constrained Curve Fitting
Section 1. INTRODUCTION

There are many applications in applied mathematics, physics, statistics,


mathematical programming, economics, control theory, social science, and
other fields where the usual least squares problem must be reformulated by
the introduction of certain inequality constraints. These constraints constitute
additional information about a problem.
We shall concern ourselves with linear inequality constraints only. A vari-
ety of methods have been presented in the literature. We mention particu-
larly papers that have given serious attention to the numerical stability of
their methods: Golub and Saunders (1970), Gill and Murray (1970), and
Stoer (1971).
The ability to consider least squares problems with linear inequality con-
straints allows us, in particular, to have such constraints on the solution as
nonnegativity, or that each variable is to have independent upper and lower
bounds, or that the sum of all the variables cannot exceed a specified value
or that a fitted curve is to be monotone or convex.
Let E be an m₂ × n matrix, f an m₂-vector, G an m × n matrix, and h
an m-vector. The least squares problem with linear inequality constraints will
be stated as follows:
(23.1) PROBLEM LSI
Minimize ||Ex − f|| subject to Gx ≥ h.


The following important special cases of Problem LSI will also be treated in
detail:
(23.2) PROBLEM NNLS (Nonnegative Least Squares)
Minimize ||Ex − f|| subject to x ≥ 0.
(23.3) PROBLEM LDP (Least Distance Programming)
Minimize ||x|| subject to Gx ≥ h.
Conditions characterizing a solution for Problem LSI are the subject of
the Kuhn-Tucker theorem. This theorem is stated and discussed in Section
2 of this chapter.
In Section 3 Problem NNLS is treated. A solution algorithm, also called
NNLS, is presented. This algorithm is fundamental for the subsequent
algorithms to be discussed in this chapter. A Fortran implementation of
Algorithm NNLS is given in Appendix C as subroutine NNLS.
In Section 4 it is shown that the availability of an algorithm for Problem
NNLS makes possible an elegantly simple algorithm for Problem LDP.
Besides stating Algorithm LDP in Section 4, a Fortran implementation, sub-
routine LDP, is given in Appendix C.
The problem of determining whether or not a set of linear inequalities
Gx ≥ h is consistent, and if consistent finding some feasible vector, arises in
various contexts. Algorithm LDP can of course be used for this purpose.
The fact that no assumptions need be made regarding the rank of G or the
relative row and column dimensions of G may make this method particularly
useful for some feasibility problems.
In Section 5 the general problem LSI, having full column rank, is treated
by transforming it to Problem LDP. Problem LSI with equality constraints
is treated in Section 6.
Finally in Section 7 a numerical example of curve fitting with inequality
constraints is presented as an application of these methods for handling
constrained least squares problems. A Fortran program, PROG6, which
carries out this example, is given in Appendix C.

Section 2. CHARACTERIZATION OF A SOLUTION

The following theorem characterizes the solution vector for Problem LSI:
(23.4) THEOREM (Kuhn-Tucker Conditions for Problem LSI)
An n-vector x̂ is a solution for Problem LSI (23.1) if and only if there
exists an m-vector ŷ and a partitioning of the integers 1 through m
into subsets ℰ and 𝒮 such that

where

This theorem may be interpreted as follows. Let g_iᵀ denote the ith row
vector of the matrix G. The ith constraint, g_iᵀx ≥ h_i, defines a feasible half-
space, {x : g_iᵀx ≥ h_i}. The vector g_i is orthogonal (normal) to the bounding
hyperplane of this halfspace and is directed into the feasible halfspace. The
point x̂ is interior to the halfspaces indexed in 𝒮 (𝒮 for slack) and on the
boundary of the halfspaces indexed in ℰ (ℰ for equality).
The vector

is the gradient vector of φ(x) = ½||Ex − f||² at x = x̂. Since ŷ_i = 0 for
i ∉ ℰ, Eq. (23.5) can be written as

which states that the negative gradient vector of φ at x̂ is expressible as a non-
negative (ŷ_i ≥ 0) linear combination of outward-pointing normals (−g_i) to
the constraint hyperplanes on which x̂ lies (i ∈ ℰ). Geometrically this means
that the negative gradient vector −p lies in the convex cone based at the point
x̂ and generated by the outward-pointing normals −g_i.
Any perturbation u of x̂ such that x̂ + u remains feasible must satisfy
uᵀg_i ≥ 0 for all i ∈ ℰ. Multiplying both sides of Eq. (23.9) by such a vector
u and using the fact that the ŷ_i ≥ 0, it follows that u also satisfies uᵀp ≥ 0.
From the identity φ(x̂ + u) = φ(x̂) + uᵀp + ||Eu||²/2, it follows that no
feasible perturbation of x̂ can reduce the value of φ.
The vector ŷ (or the negative of this vector) which occurs in the Kuhn-
Tucker theorem is sometimes called the dual vector for the problem. Further
discussion of this theorem, including its proof, will be found in the literature
on constrained optimization [see, e.g., Fiacco and McCormick (1968), p. 256].

Section 3. PROBLEM NNLS

Problem NNLS is defined by statement (23.2). We shall state Algorithm
NNLS for solving Problem NNLS. The finite convergence of this algorithm
will be proved.
We are initially given the m₂ × n matrix E, the integers m₂ and n, and
the m₂-vector f. The n-vectors w and z provide working space. Index sets P
and Z will be defined and modified in the course of execution of the algorithm.
Variables indexed in the set Z will be held at the value zero. Variables indexed
in the set P will be free to take values different from zero. If such a variable
takes a nonpositive value, the algorithm will either move the variable to
a positive value or else set the variable to zero and move its index from set P
to set Z.
On termination x will be the solution vector and w will be the dual vector.
(23.10) ALGORITHM NNLS(E, m₂, n, f, x, w, z, P, Z)

Step   Description
1      Set P := NULL, Z := {1, 2, ..., n}, and x := 0.
2      Compute the n-vector w := Eᵀ(f − Ex).
3      If the set Z is empty or if w_j ≤ 0 for all j ∈ Z, go to
       Step 12.
4      Find an index t ∈ Z such that w_t = max {w_j : j ∈ Z}.
5      Move the index t from set Z to set P.
6      Let E_P denote the m₂ × n matrix defined by

       Column j of E_P := column j of E if j ∈ P, and := 0 if j ∈ Z.

       Compute the n-vector z as a solution of the least squares
       problem E_P z ≅ f. Note that only the components z_j, j ∈ P,
       are determined by this problem. Define z_j := 0 for j ∈ Z.
7      If z_j > 0 for all j ∈ P, set x := z and go to Step 2.
8      Find an index q ∈ P such that x_q/(x_q − z_q) = min
       {x_j/(x_j − z_j) : z_j ≤ 0, j ∈ P}.
9      Set α := x_q/(x_q − z_q).
10     Set x := x + α(z − x).
11     Move from set P to set Z all indices j ∈ P for which x_j =
       0. Go to Step 6.
12     Comment: The computation is completed.

On termination the solution vector x satisfies

and is a solution vector for the least squares problem



The dual vector w satisfies

and

Equations (23.11), (23.12), (23.14), (23.15), and (23.16) constitute the
Kuhn-Tucker conditions [see Theorem (23.4)] characterizing a solution vec-
tor x for Problem NNLS. Equation (23.13) is a consequence of Eq. (23.12),
(23.14), and (23.16).
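Algorithm NNLS (23.10) can be transcribed almost line for line; the following sketch (NumPy; ours, not the Fortran subroutine NNLS of Appendix C) returns the solution x and the dual vector w, using a small tolerance in the sign tests along the lines discussed later in this section.

    import numpy as np

    def nnls(E, f, tol=1e-12, max_iter=None):
        # Direct transcription of Algorithm NNLS (23.10); a sketch, ours.
        m, n = E.shape
        P = np.zeros(n, dtype=bool)                   # True = index in set P, False = in set Z
        x = np.zeros(n)
        max_iter = max_iter or 3 * n
        for _ in range(max_iter):
            w = E.T @ (f - E @ x)                     # Step 2
            if not (~P).any() or (w[~P] <= tol).all():
                break                                 # Step 3: termination test
            t = np.flatnonzero(~P)[np.argmax(w[~P])]  # Step 4
            P[t] = True                               # Step 5
            while True:
                z = np.zeros(n)                       # Step 6: LS using only the columns in P
                z[P] = np.linalg.lstsq(E[:, P], f, rcond=None)[0]
                if (z[P] > 0).all():                  # Step 7
                    x = z
                    break
                mask = P & (z <= 0)                   # Steps 8-9: largest feasible step
                alpha = np.min(x[mask] / (x[mask] - z[mask]))
                x = x + alpha * (z - x)               # Step 10
                P &= x > tol                          # Step 11: drop indices with x_j = 0
        return x, E.T @ (f - E @ x)

    rng = np.random.default_rng(3)
    E, f = rng.standard_normal((10, 5)), rng.standard_normal(10)
    x, w = nnls(E, f)
    print((x >= 0).all(), w.max() < 1e-8)             # feasible; Kuhn-Tucker: w_j <= 0 (to rounding)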
Before discussing the convergence of Algorithm NNLS it will be conven-
ient to establish the following lemma:

(23.17) LEMMA
Let A be an m × n matrix of rank n and let b be an m-vector satisfying

with

If x̂ is the least squares solution of Ax ≅ b, then

where x̂_n denotes the nth component of x̂.

Proof: Let Q be an m × m orthogonal matrix that zeros the sub-
diagonal elements in the first n − 1 columns of A; thus

where R is upper triangular and nonsingular.
Since Q is orthogonal the conditions (23.18) imply

and

Since R is nonsingular, Eq. (23.22) implies that this vector is zero. Thus Eq.
(23.23) reduces to

From Eq. (23.21) it follows that the nth component x̂_n of the solution
vector x̂ is the least squares solution of the reduced problem

Since the pseudoinverse of the column vector t is tᵀ/(tᵀt), the solution of
problem (23.25) can be immediately written as

which completes the proof of Lemma (23.17).


Algorithm NNLS may be regarded as consisting of a main loop, Loop A,
and an inner loop. Loop B. Loop B consists of Steps 6-11 and has a single
entry point at Step 6 and a single exit point at Step 7.
Loop A consists of Steps 2-5 and Loop B. Loop A begins at Step 2 and
exits from Step 3.
At Step 2 of Loop A the set <P identifies the components of the current
vector x that are positive. Hie components of x indexed in Z are zero at this
point
In Loop A the index / selected at Step 4 selects a coefficient not presently
in set that will be positive (by Lemma (23.17)] if introduced into the solu-
tion. This coefficient is brought into the tentative solution vector z at Step 6
in Loop B. If all other components of z indexed in set <P remain positive, then
at Step 7 the algorithm sets x := z and returns to the beginning of Loop A.
In this process set 9 is augmented and set Z is diminished by the transfer of
the index t.
In many examples this sequence of events simply repeats with the addition
of one more positive coefficient on each iteration of Loop A until the termina-
tion test at Step 3 is eventually satisfied.
However, if some coefficient indexed in set 0 becomes zero or negative in
the vector z at Step 6, then Step 7 causes the algorithm to remain in Loop B
performing a move that replaces x by x + a(z — x),Q<a<,l, where a
is chosen as large as possible subject to keeping the new x nonnegative. Loop
B is repeated until it eventually exits successfully at Step 7.

The finiteness of Loop B can be proved by showing that all operations
within Loop B are well defined, that at least one more index, the index called
q at that point, is removed from set P each time Step 11 is executed, and
that z_t is always positive [Lemma (23.17) applies here]. Thus exit from Loop
B at Step 7 must occur after not more than n − 1 iterations within Loop B,
where n denotes the number of indices in set P when Loop B was entered.
In practice Loop B usually exits immediately on reaching Step 7 and does not
reach Steps 8-11 at all.
Finiteness of Loop A can be proved by showing that the residual norm
function

has a strictly smaller value each time Step 2 is reached and thus that at Step
2 the vector x and its associated set P = {i : x_i > 0} are distinct from all
previous instances of x and P at Step 2. Since P is a subset of the set {1, 2, ...,
n} and there are only a finite number of such subsets, Loop A must terminate
after a finite number of iterations. In a set of small test cases it was observed
that Loop A typically required about n/2 iterations.

Updating the QR Decomposition of E

The least squares problem being solved at Step 6 differs from the problem
previously solved at Step 6 either due to the addition of one more column of E
into the problem at Step 5 or the deletion of one or more columns of E at
Step 11. Updating techniques can be used to compute the QR decomposition
for the new problem based upon retaining the QR decomposition of the
previous problem. Three updating methods are described in Chapter 24.
The third of these methods has been used in the Fortran subroutine NNLS
(Appendix C).

Coping with Finite Precision Arithmetic

When Step 6 is executed immediately after Step 5 the component z_t com-
puted during Step 6 will theoretically be positive. If z_t is not positive, as can
happen due to round-off error, the algorithm may attempt to divide by zero
at Step 8 or may incorrectly compute α = 0 at Step 9.
This can be avoided by testing z_t following Step 6 whenever Step 6 has
been entered directly from Step 5. If z_t ≤ 0 at this point, it can be interpreted
to mean that the number w_t, computed at Step 2 and tested at Steps 3 and 4,
should be taken to be zero rather than positive. Thus one can set w_t := 0 and
loop back to Step 2. This will result either in termination at Step 3 or the
assignment of a new value to t at Step 4.
At Step 11 any x_j whose computed value is negative (which can only be
due to round-off error) should be treated as being zero by moving its index
from set P to set Z.
The sign tests on z_j, j ∈ P, at Steps 7 and 8 do not appear to be critical.
The consequences of a possible misclassification here do not seem to be
damaging.
A Fortran subroutine NNLS implementing Algorithm NNLS and using
these ideas for enhancing the numerical reliability appears in Appendix C.

Section 4. PROBLEM LDP

The solution vector for Problem LDP (23.3) can be obtained by an appropriate normalization of the residual vector in a related Problem NNLS (23.2). This method of solving Problem LDP and its verification was brought to the authors' attention by Alan Cline.
We are given the m × n matrix G, the integers m and n, and the m-vector h. If the inequalities Gx ≥ h are compatible, then the algorithm will set the logical variable flag = TRUE and compute the vector x̂ of minimal norm satisfying these inequalities. If the inequalities are incompatible, the algorithm will set flag = FALSE and no value will be assigned to x̂. Arrays of working space needed by this algorithm are not explicitly indicated in the parameter list.
(23.27) ALGORITHM LDP(G, m, n, h, x̂, flag)

Step  Description
1     Define the (n + 1) × m matrix E whose first n rows are Gᵀ and whose last row is hᵀ, and the (n + 1)-vector f := (0, . . . , 0, 1)ᵀ. Use Algorithm NNLS to compute an m-vector û solving Problem NNLS: Minimize ||Eu − f|| subject to u ≥ 0.
2     Compute the (n + 1)-vector r := Eû − f.
3     If ||r|| = 0, set flag := FALSE and go to Step 6.
4     Set flag := TRUE.
5     For j := 1, . . . , n, compute x̂_j := −r_j / r_{n+1}.
6     The computation is completed.
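As a concrete illustration of Algorithm LDP (23.27), the following sketch (our own illustration, not the book's Fortran; it leans on an existing NNLS solver for Step 1) follows the six steps directly: it forms the (n + 1) × m matrix E from Gᵀ and hᵀ, takes f as the last column of the identity, solves the NNLS problem, and recovers x̂ from the residual vector r.

    import numpy as np
    from scipy.optimize import nnls

    def ldp(G, h, tol=1e-12):
        """Minimize ||x|| subject to G x >= h, following Algorithm LDP (23.27)."""
        m, n = G.shape
        E = np.vstack([G.T, h.reshape(1, m)])   # Step 1: E is (n+1) x m
        f = np.zeros(n + 1)
        f[n] = 1.0                               # f = (0, ..., 0, 1)^T
        u, _ = nnls(E, f)                        # min ||E u - f||, u >= 0
        r = E @ u - f                            # Step 2
        if np.linalg.norm(r) <= tol:             # Step 3: inequalities incompatible
            return None
        return -r[:n] / r[n]                     # Steps 4-5: x_j = -r_j / r_{n+1}

    # Minimal-norm point satisfying x1 >= 1 and x1 + x2 >= 2 (expected: about [1, 1]).
    G = np.array([[1.0, 0.0], [1.0, 1.0]])
    h = np.array([1.0, 2.0])
    print(ldp(G, h))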
Proof of Validity of Algorithm LDP

First consider the Problem NNLS, which is solved in Step 1 of Algorithm LDP. The gradient vector for the objective function ½||Eu − f||² at the solution point û is Eᵀ(Eû − f).

From the Kuhn-Tucker conditions [Theorem (23.4)] for this Problem NNLS there exist disjoint index sets Ɛ and S such that

and

Using Eq. (23.28) to (23.31) we obtain

Consider the case in which ||r|| > 0 at Step 3. From Eq. (23.32) this implies that r_{n+1} < 0, so division by r_{n+1} at Step 5 is valid. Using Eq. (23.31) and (23.32) and the equations of Steps 2 and 5, we establish the feasibility of x̂ as follows:

Therefore,

From Eq. (23.31) and (23.33) it follows that the rows of the system of inequalities of Eq. (23.34) indexed in set S are satisfied with equality. The gradient vector for the objective function ½||x||² of Problem LDP is simply x. The Kuhn-Tucker conditions for x̂ to minimize ½||x||² subject to Gx ≥ h require that the gradient vector x̂ must be representable as a nonnegative linear combination of the rows of G that are associated with equality conditions in Eq. (23.34), i.e., the rows of G indexed in set S.
From Steps 2 and 5 and Eq. (23.32) we have

Noting the sign conditions on û given in Eq. (23.30) completes the proof that x̂ is a solution of Problem LDP.
It is clearly the unique solution vector since, if x̃ were a different solution vector, then ||x̃|| = ||x̂|| and the vector x̄ = ½(x̂ + x̃) would be a feasible vector having a strictly smaller norm than x̂, which contradicts the fact that x̂ is a feasible vector of minimum norm.
Now consider the case of ||r|| = 0 at Step 3. We must show that the inequalities Gx ≥ h are inconsistent. Assume the contrary, i.e., that there exists a vector x satisfying Gx ≥ h. Define

Then

This last expression cannot be zero, however, because q > 0 and û ≥ 0. From this contradiction we conclude that the condition ||r|| = 0 implies the inconsistency of the system Gx ≥ h. This completes the mathematical verification of Algorithm LDP.

Section 5. CONVERTING PROBLEM LSI TO PROBLEM LDP

Consider Problem LSI (23.1) with the m₂ × n matrix E being of rank n. In various ways as described in Chapters 2 to 4 one can obtain an orthogonal decomposition of the matrix E:

where Q is m₂ × m₂ orthogonal, K is n × n orthogonal, and R is n × n nonsingular. Furthermore, the matrix R can be obtained in triangular or diagonal form.
Introduce the orthogonal change of variables

The objective function to be minimized in Problem LSI can then be written as



where

With a further change of variables,

we may write

The original problem LSI of minimizing ||f − Ex|| subject to Gx ≥ h is thus equivalent, except for the additive constant ||f₂||² in the objective function, to the following Problem LDP:

If a vector ẑ is computed as a solution of this Problem LDP, then a solution vector x̂ for the original Problem LSI can be computed from Eq. (23.39) and (23.36). The squared residual vector norm for the original problem can be computed from Eq. (23.40).
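To make the reduction concrete, the following sketch (our own illustration, assuming E has full rank n and using a plain QR factorization of E, so that the orthogonal matrix K of the text is simply the identity) converts an LSI problem into an LDP problem, solves it with an LDP routine such as the one sketched in Section 4, and maps the solution back to the original variables.

    import numpy as np

    def lsi_via_ldp(E, f, G, h, ldp_solver):
        """Solve min ||f - E x|| subject to G x >= h, assuming Rank(E) = n."""
        m2, n = E.shape
        Q, R = np.linalg.qr(E)                 # E = Q R, R n x n nonsingular
        f1 = Q.T @ f                           # first n components of the rotated f
        Rinv = np.linalg.solve(R, np.eye(n))
        # With z = R x - f1 the objective becomes ||z|| plus a constant.
        Gt = G @ Rinv
        ht = h - Gt @ f1
        z = ldp_solver(Gt, ht)                 # min ||z|| subject to Gt z >= ht
        if z is None:
            return None                        # constraints incompatible
        return Rinv @ (z + f1)                 # x = R^{-1}(z + f1)

The dropped additive constant is the part of f orthogonal to the column space of E, corresponding to ||f₂||² in the text.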

Section 6. PROBLEM LSI WITH EQUALITY CONSTRAINTS

Consider Problem LSI (23.1) with the addition of a system of equality constraints, say C_{m₁×n} x = d, with Rank(C) = m₁ ≤ n and Rank([Cᵀ : Eᵀ]) = n. The equality constraint equations can be eliminated initially with a corresponding reduction in the number of independent variables. Either the method of Chapter 20 or that of Chapter 21 is suitable for this purpose.
Using the method of Chapter 20, introduce the orthogonal change of
variables

where K triangularizes C from the right:



Then y₁ is determined as the solution of the lower triangular system C₁y₁ = d, and y₂ is the solution of the following Problem LSI:

After solving problem (23.44) for y₂ the solution x̂ can be computed using Eq. (23.42).
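As a sketch of this elimination (our own illustration of the Chapter 20 approach, assuming C has full row rank m₁; the orthogonal matrix K is obtained here from a QR decomposition of Cᵀ so that CK = [C₁ : 0] with C₁ lower triangular):

    import numpy as np

    def eliminate_equality_constraints(E, f, G, h, C, d):
        """Reduce  min ||f - E x||,  C x = d,  G x >= h  to a smaller Problem LSI in y2."""
        m1, n = C.shape
        K, Rt = np.linalg.qr(C.T, mode='complete')   # C^T = K [R; 0]  =>  C K = [R^T : 0]
        C1 = Rt[:m1, :m1].T                          # m1 x m1 lower triangular
        y1 = np.linalg.solve(C1, d)                  # equality constraints determine y1
        EK, GK = E @ K, G @ K
        E1, E2 = EK[:, :m1], EK[:, m1:]
        G1, G2 = GK[:, :m1], GK[:, m1:]
        # Reduced Problem LSI:  min ||(f - E1 y1) - E2 y2||  s.t.  G2 y2 >= h - G1 y1.
        return E2, f - E1 @ y1, G2, h - G1 @ y1, K, y1

After the reduced Problem LSI has been solved for y₂ (for example, via the LDP conversion of Section 5), the solution of the original problem is x = K [y₁ ; y₂].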
If the method of Chapter 21 is used, one would compute Q₁, C₁, C₂, d̃, E₁, E₂, and f̃ using Eq. (21.11) to (21.14) and additionally solve for the matrix Ẽ₂ in the upper triangular system

Then x₂ is the solution of the following Problem LSI:

and x₁ would be computed by solving the upper triangular system

Section 7. AN EXAMPLE OF CONSTRAINED CURVE FITTING

As an example illustrating a number of the techniques that have been described in this chapter we consider a problem of fitting a straight line to a set of data points where the line must satisfy certain constraints. PROG6, a Fortran main program that performs the computation for this example, is given in Appendix C.
Let the data be given as follows:

t w
0.25 0.5
0.50 0.6
0.50 0.7
0.80 1.2

We wish to find a line of the form

which fits these data in a least squares sense subject to the constraints

This problem can be written as Problem LSI:

where

We compute an orthogonal decomposition of the matrix E in order to convert the Problem LSI to a Problem LDP as described in Section 5 of this chapter. Either a QR or a singular value decomposition of E could be used. We shall illustrate the use of a singular value decomposition.

Introduce the change of variables

We then wish to solve the following Problem LDP:

where

and

A graphical interpretation of this Problem LDP is given in Fig. 23.1. Each row of the augmented matrix [G : h] defines one boundary line of the feasible region.

Fig. 23.1  Graphical interpretation of the sample Problem LDP (23.54).

The solution point ẑ is the point of minimum euclidean norm within the feasible region. This point, as computed by subroutine LDP, is

Then using Eq. (23.53) we finally compute

The residual vector for the solution vector x̂ is

Fig. 23.2  Graph of solution line for the sample problem (23.48)-(23.51).

and the residual norm is

The given data points (t_i, w_i), i = 1, . . . , 4, and the fitted line, f(t) = 0.621t + 0.379, are shown in Fig. 23.2. Note that the third constraint, f(1) ≤ 1, is active in limiting how well the fitted line approximates the data points.
The numerical values shown in describing this example were computed using a UNIVAC 1108 computer. Executing the same Fortran code on an IBM 360/67 resulted in opposite signs in the intermediate quantities V, f₁, G, and ẑ. This is a consequence of the fact that the signs of columns of the matrix V in a singular value decomposition are not uniquely determined. The difference in the number of iterations required to compute the singular value decomposition of the matrix E on the two computers, which have different word lengths, resulted in a different assignment of signs in the matrices U and V.

EXERCISES
(23.55) Prove that if a Problem LSE has a unique solution without inequality
constraints, then it has a unique solution with inequality constraints.
(23.56) Show that the problem of minimizing a quadratic function f(x) = ½xᵀBx + aᵀx, for positive definite B, can be transformed to the problem of minimizing ½||w||² by letting w = Fx − g for an appropriate choice of the nonsingular matrix F and vector g.
(23.57) If the function f of Ex. (23.56) is to be minimized subject to the constraints Cx = d and Gx ≥ h, what are the corresponding constraints for the problem of minimizing ½||w||²?
24
MODIFYING A QR DECOMPOSITION TO ADD OR REMOVE COLUMN VECTORS

The key to a successful algorithm for the constrained Problem LSI of


Chapter 23 is the ability to compute solutions to a sequence of least squares
problems in an efficient manner. In Algorithm NNLS (23.10) it can be seen
that the coefficient matrix has the property that one linearly independent
column vector is added or removed at each step.
Thus we discuss the following problem: Let k > 0. Let a₁, . . . , a_k be a set of m-vectors. Consider the matrix

Introduce a QR decomposition for Ak

Here R_k is k × k upper triangular and nonsingular, while Q is orthogonal. Once Q and R_k have been computed, the least squares solution of the system

is given by

which requires very little additional work.


We shall consider the problem of computing a QR decomposition for a matrix obtained by deleting a column vector from A_k or adjoining a new column vector to A_k. We wish to take advantage of the prior availability of a QR decomposition of A_k.
We shall discuss three useful methods for updating the QR decomposi-
tion. In view of Eq. (24.4) the updating of Q and R effectively provides an
updating of Ak.

METHOD 1
We have Q stored as an explicit m × m orthogonal matrix and R stored as a k × k upper triangle.

Adjoining a Vector

Let a_{k+1} be a vector linearly independent of the columns of A_k. Form the augmented matrix

Compute the product

using Q of Eq. (24.2). Introduce a Householder matrix Q_{k+1} so that the vector

has zero components in positions k + 2, . . . , m. Then a QR decomposition of A_{k+1} is given by

where

and

Removing a Vector
If the matrix A_k of Eq. (24.1) is modified by the removal of the column vector a_j, forming the matrix

we see that

where each r_i is zero in entries i + 1, . . . , m. [See Golub and Saunders (1969) and Stoer (1971).] Define

where the G_i are Givens rotation matrices chosen so that the matrix

is upper triangular. The matrix G_i operates on rows i and i + 1 and produces a zero in position (i + 1, i) of the product matrix (G_i · · · G_j QA_{k−1}).
Method 1 requires m² computer storage locations for Q and l(l + 1)/2 locations for R, where l ≤ m is the largest number of column vectors to be used simultaneously.
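The column-deletion step of Method 1 can be illustrated in a few lines. The sketch below (our own NumPy illustration, not the book's code, and using the "reduced" form A = QR with Q having orthonormal columns) deletes column j, then sweeps Givens rotations over adjacent row pairs to restore the triangular form, updating Q accordingly.

    import numpy as np

    def delete_column(Q, R, j):
        """A = Q @ R with Q (m x p) orthonormal columns and R (p x p) upper triangular.
        Return factors of A with column j removed (0-based)."""
        R = np.delete(R, j, axis=1)             # columns j, j+1, ... now have one subdiagonal entry
        Q = Q.copy()
        p = R.shape[0]
        for i in range(j, p - 1):               # zero the (i+1, i) element with a Givens rotation
            a, b = R[i, i], R[i + 1, i]
            r = np.hypot(a, b)
            c, s = (1.0, 0.0) if r == 0.0 else (a / r, b / r)
            Grot = np.array([[c, s], [-s, c]])
            R[i:i + 2, :] = Grot @ R[i:i + 2, :]
            Q[:, i:i + 2] = Q[:, i:i + 2] @ Grot.T
        return Q[:, :p - 1], R[:p - 1, :]       # the last row of R is now zero

    A = np.random.rand(7, 4)
    Q, R = np.linalg.qr(A)
    Q1, R1 = delete_column(Q, R, 1)
    print(np.allclose(Q1 @ R1, np.delete(A, 1, axis=1)))   # should print True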

METHOD 2
Instead of storing Q explicitly as in Method 1 we now assume that Q is given as a product of k Householder transformations,

where each

is stored by retaining only the nonzero entries of its Householder vector plus one additional scalar associated with each vector. The matrix R is again stored as a k × k upper triangle.

Adjoining a Vector

If the set is enlarged to the linearly independent set {a₁, . . . , a_k, a_{k+1}}, compute the product b_{k+1} = Q_k · · · Q₁ a_{k+1}. Compute a Householder transformation Q_{k+1} so that Q_{k+1}b_{k+1} = r_{k+1} is zero in entries k + 2, . . . , m. It follows that

is a QR decomposition for the enlarged set.


Removing a Vector
Again define the matrix A_{k−1} by Eq. (24.6). The data defining the QR decomposition of A_k is of the form

Denote the QR decomposition of A_{k−1} similarly. The new triangular factor will contain the same submatrix R₁₁ appearing in Eq. (24.13) as its leading (j − 1) × (j − 1) submatrix. Thus only the last k − j columns of the new factor need to be determined.
One approach is to replace

in storage by [a_{j+1}, . . . , a_k]. Then compute

Next premultiply S by new Householder transformations so that

with the resulting matrix upper triangular.


The QR decomposition of A_{k−1} is then represented as

Alternatively, the matrix S of Eq. (24.14) could be produced by the formula

where R₁₂ and R₂₂ are defined in Eq. (24.13).


Method 2 requires (m + 1)l storage locations to retain the data defining Q and R. Here l ≤ m is the maximum number of columns being used simultaneously. The version of Method 2 using Eq. (24.14) requires an additional m × n storage array to retain [a₁, . . . , a_n] so that any of these vectors will be available when needed in Eq. (24.14).
The second version of Method 2, using Eq. (24.15), obviates the need to retain copies of the vectors a_j, a_{j+1}, . . . , a_k in storage. Thus all the data representing the QR decomposition of a matrix A_k plus all the vectors a_i not presently being regarded as columns of A_k can be stored in one m × n storage array plus about 3n additional locations for bookkeeping purposes.

METHOD 3
In this method the entire contents of a storage array, W_{m×(n+1)}, which initially contains the data [A_{m×n} : b_{m×1}], is modified each time a new column vector is adjoined to or deleted from the basis. The column vectors of the upper triangular matrix R_k of Eq. (24.2) will occupy some set of k columns of the array W. No record will be kept of the matrix Q of Eq. (24.2).
Let P_k = {p₁, p₂, . . . , p_k} be a subset of {1, 2, . . . , n} identifying the columns of A that have been introduced into the basis. The order of the integers p_i in the set P_k is significant. Let A_k denote the m × k matrix consisting of those column vectors of A indexed in P_k and ordered in the sequence p₁, p₂, . . . , p_k. Let Q denote the orthogonal matrix satisfying Eq. (24.2). Then W will contain the m × (n + 1) matrix Q[A : b]. This implies that the jth column vector of R_k will be stored in column number p_j of W. In this method it may be found convenient to store explicit zeros for elements that are zeroed by Householder or Givens transformations.
Adjoining a Vector

Let p_{k+1} denote the index of a column of A to be adjoined to the basis. Form a new index set P_{k+1} consisting of p₁, . . . , p_k from P_k plus p_{k+1}. Form the Householder transformation H that transforms column number p_{k+1} of W to have zeros below row k + 1. Premultiply H times the entire array W.

Removing a Vector

Assume again that P_k identifies the current basis. Let 1 ≤ j ≤ k and assume column number p_j is to be removed from the basis. For i = j + 1, j + 2, . . . , k form the Givens rotation matrix G_i which operates on rows i − 1 and i and zeros the (i, p_i) element of W. Premultiply G_i times the entire array W.
Form the new index set P_{k−1} by setting p_i := p_i for i = 1, 2, . . . , j − 1 and setting p_i := p_{i+1} for i = j, j + 1, . . . , k − 1. This method requires only the m × (n + 1) storage array W, which initially contains the data [A : b]. If a copy of the initial data will be needed at later stages of the computation, then it must be saved in additional storage.
25
PRACTICAL ANALYSIS OF
LEAST SQUARES PROBLEMS

Section 1. General Considerations
Section 2. Left Multiplication of A and b by a Matrix G
Section 3. Right Multiplication of A by a Matrix H and Change of Variables
Section 4. Append Additional Rows to [A : b]
Section 5. Deleting Variables
Section 6. Singular Value Analysis

Section 1. GENERAL CONSIDERATIONS

In this chapter we discuss the strategy of planning and interpreting the practical solution of a least squares problem. We consider the case of m > n, as this is the central problem under consideration in this book.
The person who originates a least squares problem must acquire data and develop or select a mathematical model and frequently a statistical model also. At some point in his work he reaches the stage at which he has a matrix A, and a vector b, and he desires to find a vector x that minimizes the euclidean norm of the residual vector

The problem originator should also have information about the un-
certainty of the data constituting A and b. Frequently he may have a priori
information about the solution of his matrix problem. This may involve some
knowledge as to what would constitute reasonable values for some or all of
the solution components. It might also include information such as a require-
ment that some or all solution components must be nonnegative.
In very general terms there are two major considerations in the design of a
computational approach to the problem and it is important to keep these
points separate.

1. Computational error can be kept down to the point where it is negligible compared with uncertainty in the solution caused by uncertainty in the initial data.
2. The combined effect of the a priori uncertainty in A and b and the con-
dition number of A may produce a situation in which there are many signif-
icantly different vectors x that reduce the norm of r to an acceptably small
value. We shall discuss techniques for detecting this situation and some
methods for controlling the selection of a particular solution from this set
of candidate solutions.
Expanding on point 1, we assume that the information regarding the uncertainty in A and b can be expressed by the statement that there are known numbers φ and γ such that any matrix of the form A + E with

and any vector of the form b + db with

would be acceptable to the problem originator as substitutes for the specific data A and b.
For comparison with these inherent error parameters φ and γ we may use Eq. (16.3) and (16.4) to define the following two functions of η:

Here we have replaced ||A||_F of Eq. (16.3) by its upper bound n^{1/2}||A||.
Ignoring the terms O(η²) in Eq. (16.3) and (16.4), we infer from Theorem (16.1) that if η is chosen small enough so that

and

then the solution computed using arithmetic of uniform precision η is the exact solution of some problem whose initial data differ from the given data by amounts satisfying Eq. (25.2) and (25.3).
Similarly, to select an η and ω < η² for mixed precision computation one can use Eq. (17.13) and (17.14) to define

and

and choose η so that

and

With this choice of η and ω < η² we infer from Theorem (17.11) that the computed solution is the exact solution of a perturbed problem with perturbations smaller than the a priori uncertainty as described by Eq. (25.2) and (25.3).
Recall that the multipliers of ||A||η and ||b||η in Eq. (25.4), (25.5), (25.8), and (25.9) were derived by considering worst-case propagation of computational errors. In practice, replacing these multipliers by their square roots will generally give more realistic estimates of the norms of the virtual perturbations due to the computational errors.
We now turn to point 2. It will be convenient to formalize the notion of an "acceptably small" residual vector. Suppose we are willing to accept residual vectors with norms as large as some number ρ. Then we may define the acceptable solution set as

It is important to note that the definition of the set X depends upon the three tolerance parameters, φ, γ, and ρ, which should be chosen on the basis of actual knowledge about the problem and its data.
Our purpose in writing Eq. (25.12) is not so much to establish this particu-
lar definition of an acceptable solution set as it is to give some degree of
concreteness to the general idea of an acceptable solution set and to identify
some of the quantities that determine the "size" of this set.
To assess the "size" of the set X we first observe that if Rank (A) < nt or
if (p and K (K denotes the condition number of A) are so large that Kj|| A\\>
1, thtnA + £ will be singular for some \\E\\ < j and the set JIT is unbounded.
On the other hand, if Rank (A) — n andkj||A||< 1, then the per-
turbation bound (9.13) or (9.14) may be used to obtain useful information
regarding the diameter of the set X. The presence of the parameter p in the
definition of the set X leads to some further possible increase in its size.
If this set X is "large"" in the sense that it contains vectors which are
significantly different from each other, then some selection must be made of

a particular solution vector from X. This selection process can be viewed very broadly as including any steps that the problem originator might take to change the problem to one having a smaller solution set, generally contained in X.
The criterion that one uses to reduce the size of the set X depends upon
the application. A situation that occurs very commonly is one in which the
problem A x = b arises as a local linearization of a nonlinear least squares
problem. This was mentioned in Chapter 1. In this case one is likely to prefer
the use of the x ∈ X having least norm in order to reduce the likelihood of
leaving the region in which b — Ax is a good approximation to the nonlinear
problem.
Although there are many different motivations (statistical, mathematical,
numerical, heuristic, etc.) that have been proposed for different specific pro-
cedures for changing a given least squares problem, most of these procedures
consist of performing one or more of the following four operations, not
necessarily in the order listed.
1. Left-multiply A and b by an m x m matrix G.
2. Right-multiply A by an n × n matrix H with the corresponding change of variables x = Hx̃ or x = Hx̃ + ξ.
3. Append additional rows to A and additional elements to b.
4. Assign fixed values (often zero) to some components of the solution
vector. This may be done either with respect to the original set of variables
or a transformed set of variables.
We shall expand on each of these four items in Sections 2,3,4, and 5,
respectively.
Finally in Section 6 we describe singular value analysis. By singular value analysis we mean the computation of a number of quantities derived from the singular value decomposition of the matrix A and the interpretation of these quantities as an aid in understanding the indeterminacies of the problem Ax ≅ b and in selecting a useful solution vector. The problem Ax ≅ b to which singular value analysis is applied may of course be a problem resulting from preliminary application of various of the operations described in Sections 2 to 5.

Section 2. LEFT MULTIPLICATION OF A AND b BY A MATRIX G

This operation changes the norm by which the size of the residual vector is assessed. Thus instead of seeking x to minimize (b − Ax)ᵀ(b − Ax), one changes the problem to that of minimizing (Gb − GAx)ᵀ(Gb − GAx), which

also can be written as (b − Ax)ᵀ(GᵀG)(b − Ax), or as (b − Ax)ᵀW(b − Ax), where W = GᵀG.
A commonly used special case is that in which the matrix G is diagonal. Then the matrix W is diagonal. In this case left-multiplication of the augmented matrix [A : b] by G can be interpreted as being a row-scaling operation in which row i of [A : b] is multiplied by the number g_ii.
Define

and

Then if G is diagonal the quantity to be minimized is

Loosely speaking, assigning a relatively large weight |g_ii| (or equivalently w_ii) to the ith equation will tend to cause the resulting residual component |r_i| to be smaller. Thus if some components of the data vector b are known with more absolute accuracy than others, one may wish to introduce relatively larger weights, |g_ii| or w_ii, in the ith row of [A : b].
This procedure, in which G is diagonal, is generally referred to as weighted least squares. As a systematic scheme for assigning the weights suppose that one can associate with each component, b_i, of the data vector b a positive number, σ_i, indicating the approximate size of the uncertainty in b_i. If one has the appropriate statistical information about b it would be common to take σ_i to be the standard deviation of the uncertainty in b_i. Then it would be usual to define weights by

or equivalently

Note that with this scaling all components of the modified vector b̃ defined by

have unit standard deviation.


More generally, if one has sufficient statistical information about the uncertainty in b to place numerical values on the correlation between the errors in different components of b, one may express this information as an m × m positive definite symmetric covariance matrix C. [See Plackett (1960) for a detailed discussion of these statistical concepts.] In this case one can compute the Cholesky factorization of C as described in Eq. (19.5) and (19.12) to (19.14), obtaining a lower triangular matrix F that satisfies

Then the weighting matrix G can be defined by

Operationally it is not necessary to compute the explicit inverse of F.


Rather one would compute the weighted matrix [A : b] by solving the equations

This process is straightforward because F is triangular.


If G is derived from an a priori covariance matrix for the errors in b as
described above, then the covariance matrix for the errors in the transformed
vector B is the identity matrix.
Whether on the basis of statistical arguments or otherwise, it is desirable
to use a matrix G such that the uncertainty is of about the same size in all
components of the transformed vector B — Gb. This has the effect of making
the euctidean norm a reasonable measure of the "size" of the error vector db,
as for instance in Eq. (25.3).
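Both forms of row weighting are easy to carry out directly. The sketch below (our own illustration, not the book's code) shows the diagonal case built from per-component standard deviations, and the correlated case in which the weighted data are obtained by solving triangular systems with the Cholesky factor F of the covariance matrix C, so that G = F⁻¹ is never formed explicitly.

    import numpy as np
    from scipy.linalg import cholesky, solve_triangular

    A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    b = np.array([1.0, 2.0, 2.0])

    # Case 1: independent errors with standard deviations sigma_i, so G = diag(1/sigma_i).
    sigma = np.array([0.1, 0.2, 0.5])
    Aw, bw = A / sigma[:, None], b / sigma
    x1, *_ = np.linalg.lstsq(Aw, bw, rcond=None)

    # Case 2: correlated errors with covariance C = F F^T, F lower triangular.
    C = np.array([[0.01, 0.002, 0.0],
                  [0.002, 0.04, 0.01],
                  [0.0, 0.01, 0.25]])
    F = cholesky(C, lower=True)
    Ab = solve_triangular(F, np.column_stack([A, b]), lower=True)   # solves F [Aw : bw] = [A : b]
    x2, *_ = np.linalg.lstsq(Ab[:, :-1], Ab[:, -1], rcond=None)
    print(x1, x2)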

Section 3. RIGHT MULTIPLICATION OF A BY A MATRIX H AND CHANGE OF VARIABLES x = Hx̃ + ξ

Here one replaces the problem

by the problem

where

The matrix H is n × l, with l ≤ n. The vector x̃ is l-dimensional and the matrix Ã is m × l. If H is n × n diagonal, this transformation may be interpreted as a column scaling operation applied to A.
If H is n × n nonsingular, this transformation does not change the problem mathematically. Thus the set of vectors satisfying x = Hx̃ + ξ, where x̃ minimizes ||b̃ − Ãx̃||, is the same as the set of vectors x minimizing ||b − Ax||.
However, unless H is orthogonal, the condition number of Ã will generally be different from that of A. Therefore, if one is using an algorithm, such as HFTI (14.9) or singular value analysis [Eq. (18.36) to (18.45)], that makes a determination of the pseudorank of A, the algorithm may make a different determination using Ã.
Furthermore, if the pseudorank is determined to have a value k < n and one proceeds to compute the minimal length solution of the rank k problem, then the use of the transformation matrix H alters the norm by which the "size" of the solution vector is measured. This will, in general, lead to a different vector being selected as the "minimum length" solution. Thus in problem (25.16) the minimal length solution is a solution that minimizes ||x||, whereas using the transformed problem in Eq. (25.17) the minimal length solution is a solution x̃ of Eq. (25.17) that minimizes ||x̃||. This amounts to minimizing ||H⁻¹(x − ξ)|| rather than minimizing ||x||.
As to criteria for selecting H we first note that the use of the spectral norm
of the perturbation matrix, E, in Eq. (25.2) is a realistic mathematical model
only if the absolute uncertainties in different components of A are all of
about the same magnitude. Thus, if one has estimates of the uncertainty of
individual elements of A, the matrix H can be selected as a column scaling
matrix to balance the size of the uncertainties in the different columns of A.
A somewhat similar idea can be based on a priori knowledge of the solution vector x. Suppose it is known that the solution vector should be close to a known vector ξ (the a priori expected value of x). Suppose further that one has an a priori estimate, σ_j, of the uncertainty of ξ_j as an estimate of x_j. One can take H to be the n × n diagonal matrix with diagonal components

Then the transformed variables

have unit a priori uncertainty and zero a priori expected values.


More generally, if one has sufficient a priori statistical information to define an n × n positive definite a priori covariance matrix C describing the uncertainty of ξ, then H can be computed as the upper triangular Cholesky factor of C [see Eq. (19.16) to (19.18)]:

Then the a priori covariance of the transformed variable vector x̃ of Eq. (25.20) is the n × n identity matrix, and the a priori expected value of x̃ is the zero vector.
The situation in which the separate components of x̃ have approximately equal uncertainty is a desirable scaling of the variables if the problem is judged to have pseudorank k < n and a minimal length solution is to be computed.
Since the condition number of Ã of Eq. (25.18) will, in general, be different from that of A, one may decide to choose H with the objective of making the condition number of Ã small. If A is nonsingular, there exists a matrix H such that Cond(AH) = 1. Such a matrix is given by H = R⁻¹, where R is the triangular matrix resulting from Householder triangularization of A. One is not likely to have this matrix R available a priori. It is of interest to note, however, that if the data that led to the determination of the a priori covariance matrix, C, of Eq. (25.23) were approximately the same as the present data [A : b], then the matrix H of Eq. (25.23) will be approximately equal to R⁻¹. Then Cond(AH) is likely to be fairly small.
If no estimate of C is available, there remains the possibility of using a
diagonal matrix H as a column scaling matrix to balance the euclidean
norms (or any other selected norms) of the columns of A. This is accom-
plished by setting

where a_j denotes the jth column vector of A. It has been proved by van der Sluis (1969) that with H defined by Eq. (25.24), using euclidean column norms, Cond(AH) does not exceed the minimal condition number obtainable by column scaling by more than a factor of n^{1/2}.
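A minimal sketch of this column-scaling choice (our own illustration): H is diagonal with h_jj the reciprocal of the euclidean norm of column j, so every column of AH has unit norm, and a solution of the scaled problem is mapped back by x = Hx̃.

    import numpy as np

    def column_scaled_lstsq(A, b):
        """Solve min ||b - A x|| after scaling the columns of A to unit euclidean norm."""
        norms = np.linalg.norm(A, axis=0)      # assumes no zero columns
        H = np.diag(1.0 / norms)
        xs, *_ = np.linalg.lstsq(A @ H, b, rcond=None)
        return H @ xs                          # map back to the original variables

    A = np.array([[1.0, 1000.0], [2.0, 1500.0], [3.0, 500.0]])
    b = np.array([3.0, 5.0, 4.0])
    print(column_scaled_lstsq(A, b))
    print(np.linalg.cond(A), np.linalg.cond(A / np.linalg.norm(A, axis=0)))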
Improving the condition number has the advantage that perturbation
bounds such as Eq. (9.10) will give less pessimistic results. It also may lead
to a determination of pseudorank k = n, whereas without the improvement
of the condition number by scaling a pseudorank of k < n might have been
determined leading to unnecessary and inappropriate complications in ob-
taining a solution.
A transformation matrix H can be selected so that A or a submatrix of A has an especially convenient form, such as triangular or diagonal. Thus if one has computed the singular value decomposition, A = USVᵀ, then left-multiplying [A : b] by Uᵀ and right-multiplying A by H = V transforms A to the diagonal matrix S. Operationally one would not usually do this pre- and postmultiplication explicitly as it would be done along with the computation of the singular value decomposition as described in Chapter 18. Systematic use of the singular value decomposition will be discussed in Section 6 of this chapter.

Section 4. APPEND ADDITIONAL ROWS TO [A : b]

Here we discuss replacing the problem Ax ≅ b by the problem

where F is an l × n matrix and d is an l-dimensional vector.
Suppose one prefers that the solution vector x should be close to a known vector ξ. Setting F = I_n and d = ξ in (25.25) expresses this preference. In particular if d = ξ = 0 this expresses a preference for ||x|| to be small.
The intensity of this preference can be indicated by a scaling parameter, say σ, incorporated into the definition of F and d as

The number σ can be regarded as an estimate of the size of the uncertainty in ξ. Thus setting σ small causes the solution to be closer to ξ.
A further refinement of this idea is to assign separate numbers σ_i, i = 1, . . . , n, where σ_i is an estimate of the size of the uncertainty of the ith component of ξ. Thus one would define d_i = ξ_i/σ_i for i = 1, . . . , n and let F be a diagonal matrix with diagonal elements f_ii = σ_i⁻¹.
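In matrix terms this amounts simply to appending the rows of F and the components of d below [A : b] and solving the enlarged least squares problem. A minimal sketch (our own illustration, with hypothetical values for the a priori vector ξ and the uncertainty estimates σ_i):

    import numpy as np

    A = np.random.rand(15, 3)
    b = np.random.rand(15)
    xi = np.array([1.0, 0.0, -1.0])        # a priori expected solution
    sig = np.array([0.5, 2.0, 0.1])        # estimated uncertainties of the components of xi

    F = np.diag(1.0 / sig)                 # f_ii = 1 / sigma_i
    d = xi / sig                           # d_i = xi_i / sigma_i

    A_aug = np.vstack([A, F])
    b_aug = np.concatenate([b, d])
    x, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    print(x)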
Finally, if one has sufficient a priori statistical information about the expected value, ξ, of the solution, x, to define a symmetric positive definite n × n covariance matrix, K, for (x − ξ), then it is reasonable to set

where L is the lower triangular Cholesky factor of K [see Eq. (19.12) to (19.14)]

and set

Note that the case of d ≠ 0 can be reduced to the case of d = 0 by a simple translation if the system Fw = d is consistent. To see this let w be a solution of Fw = d and make the change of variables x = w + x̃ in Eq. (25.25), which yields the transformed problem

It is important to make this change of variables if one intends to apply a minimal length solution method since the preference for ||x̃|| to be small is consistent with the conditions Fx̃ = 0.
We consider now the question of the relative weighting of the two sets of conditions Ax ≅ b and Fx ≅ d in Eq. (25.25). From a formal statistical point of view one might say that the appropriate relative weighting is established by setting up the problem in the form

where (GᵀG)⁻¹ is the a priori covariance matrix of the uncertainty in the data vector b and (FᵀF)⁻¹ is the a priori covariance matrix of the uncertainty in the a priori expected value ξ of the solution vector x. In practice, however, these a priori covariance matrices, particularly (FᵀF)⁻¹, may not be known with much certainty and one may wish to explore the changes that different relative weightings produce in the solution vector and in the residual vector.
For this purpose introduce a nonnegative scalar weighting parameter λ into problem (25.30) and consider the new problem

where

and

For the convenience of readers who may be more accustomed to other ways of motivating and stating the least squares problem involving a priori covariance matrices we note that problem (25.31) is the problem of finding a vector x to minimize the quadratic form

or equivalently

The idea of using a relative weighting parameter λ in this context was discussed by Levenberg (1944). Further elaboration and applications of the technique are given by Hoerl (1959, 1962, and 1964), Morrison (1960), Marquardt (1963 and 1970), and Hoerl and Kennard (1970a and 1970b). As a result of the 1963 paper and a computer program contributed to the computer information exchange organization SHARE by Marquardt, the use of this idea in problems arising from the linearization of nonlinear least squares is often called Marquardt's method. The study of problem (25.31) as a function of λ has also been called ridge regression or damped least squares.
To analyze the dependence of the solution vector and the residual vector in problem (25.31) upon the parameter λ we first introduce the change of variables

and obtain the transformed problem

where

and

The translation by ξ and scaling by F⁻¹ used in Eq. (25.35) have been discussed in Section 3 of this chapter. Their purpose is to produce the new variable y that is better scaled for meaningful assessment of the "size" of the solution vector.
Write a singular value decomposition of A:

Recall that S = Diag{s₁, . . . , s_n}. If Rank(A) = k ≤ n, then s_i = 0 for i > k. Introduce the orthogonal change of variables

and left-multiply Eq. (25.36) by the orthogonal matrix



obtaining the new least squares problem

where

Finally, for λ > 0, the submatrix λI_n in problem (25.39) can be eliminated by means of left-multiplication by appropriate Givens rotation matrices. Thus to eliminate the ith diagonal element of λI_n, left-multiply Eq. (25.39) by a matrix that differs from an (m + n)th-order identity matrix only in the following four positions:

After this elimination operation for i = 1, . . . , n the resulting equivalent problem is

where

and

with

For λ = 0 problem (25.39) has a diagonal matrix of rank k and the solution vector p(0) is given by

For λ > 0 we use Eq. (25.41) to (25.44) and obtain the solution components

Furthermore we have

Note that Eq. (25.46) and (25.47) remain valid for λ = 0.


The vector p(λ), which is a solution for problem (25.39), can be transformed by the linear equations (25.38) and (25.35) to obtain vectors y(λ) and x(λ), which are solutions for problems (25.36) and (25.31), respectively.
We are primarily interested in the problem Ax ≅ b and have introduced problem (25.31) only as a technical device for generating solutions to a particular family of related problems. Thus formulas will also be derived for the quantity ω(λ) = ||b − Ax(λ)||. With λ > 0 and p(λ) defined by Eq. (25.46) we obtain

Note that as λ increases, ||p(λ)|| decreases and ω(λ) increases. Thus the problem originator has the opportunity of selecting λ to obtain some acceptable compromise between the size of the solution vector p(λ) and the size of the residual norm ω(λ).
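The trade-off between ||p(λ)|| and ω(λ) is easy to tabulate from a single singular value decomposition. The sketch below (our own illustration; it uses the standard SVD form of the damped least squares solution rather than the book's equation numbers, and assumes the untranslated, unscaled problem so that the roles of ξ and F⁻¹ are trivial) lists the solution norm and residual norm for a range of λ, with λ² acting as the ridge parameter.

    import numpy as np

    def damped_solution_norms(A, b, lambdas):
        """For each lam, solve min ||b - A x||^2 + lam^2 ||x||^2 and report norms."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        g = U.T @ b
        out = []
        for lam in lambdas:
            p = s * g / (s**2 + lam**2)     # coefficients in the V basis (assumes full rank if lam = 0)
            x = Vt.T @ p
            out.append((lam, np.linalg.norm(x), np.linalg.norm(b - A @ x)))
        return out

    A = np.random.rand(15, 5)
    b = np.random.rand(15)
    for lam, xnorm, rnorm in damped_solution_norms(A, b, [0.0, 0.01, 0.1, 1.0]):
        print(f"lambda={lam:7.3f}   ||x||={xnorm:9.4f}   ||b - Ax||={rnorm:9.4f}")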
The set of solutions of problem (25.31) given by this technique has an
important optimality property expressed by the following theorem.
(25.49) THEOREM [Morrison (1960); Marquardt (1963)]
For a fixed nonnegative value of λ, say λ̄, let ȳ be the solution vector for problem (25.36), and let ω̄ = ||b − Aȳ||. Then ω̄ is the minimum value of ||b − Ay|| for all vectors y satisfying ||y|| ≤ ||ȳ||.
Proof: Assume the contrary. Then there exists a vector ỹ with ||ỹ|| ≤ ||ȳ|| satisfying ||b − Aỹ|| < ||b − Aȳ||. Therefore, ||b − Aỹ||² + λ̄²||ỹ||² < ||b − Aȳ||² + λ̄²||ȳ||², which contradicts the assumption that ȳ is the least squares solution of problem (25.36) and thus minimizes ||b − Ay||² + λ̄²||y||².
A simple tabular or graphic presentation of the quantities ||p(λ)|| and ω(λ) given by Eq. (25.47) and (25.48) can be very useful in studying a particular least squares problem. Further detail can be obtained by tabular or graphic presentation of the individual solution components [Eq. (25.46), (25.38), and (25.35)] as functions of λ. An example of this type of analysis is given in Chapter 26.
One may wish to solve Eq. (25.47) or (25.48) to find a value of λ which produces a prespecified solution norm or residual norm. If Newton's method is used for this purpose the following expressions will be useful:

and

If one solves problem (25.36) directly, using any algorithm that determines the norm, say ρ(λ), of the residual vector, then it should be noted that ρ(λ) satisfies

Thus if one wishes to obtain the quantity ω(λ) this can be done by computing ||y(λ)||² and then solving Eq. (25.50) for ω(λ).

Section 5. DELETING VARIABLES

Deleting a variable from a problem is equivalent to fixing the value of that variable at zero. If one variable, say x_n, is deleted, then the problem

is changed to

where Ā is the m × (n − 1) matrix that consists of the first n − 1 column vectors of A and x̄ is an (n − 1)-dimensional vector.
Assuming m > n, the separation theorem (5.12) implies that the singular values of Ā interlace with those of A, from which it follows that Cond(Ā) ≤ Cond(A). By repeated application of this process it follows that removing
any proper subset of the variables leads to a matrix whose condition number
is no larger than that of the original matrix.
Clearly the minimum residual obtainable in problem (25.52) is not smaller
than the minimum residual attainable in problem (25.51).
Thus like some of the techniques discussed previously, removing variables
is a device that reduces, or at least does not increase, the condition number
of a matrix at the expense of increasing, or at least not decreasing, the norm
of the residual vector.
Although we feel it is useful for the purpose of comparing variable
deletion or variable augmentation with other stabilization methods to note
the properties stated above, it is nevertheless true that these properties are
not usually taken as the motivating concepts for this type of procedure. More
commonly one uses variable deletion or augmentation because one wishes
to find the smallest number of parameters that can be used in the solution
and still give an acceptably small residual norm.
In some cases there is a natural ordering of the variables, for instance,
when the variables are coefficients of a polynomial in one variable. Then it is
a straightforward matter to obtain a sequence of solutions, first using only
the first variable, next the first and second variable, and so forth. In fact, in

the special case of fitting data by a polynomial or by truncated Fourier series there are quite specialized algorithms available [e.g., Forsythe (1957)].
If there is not a natural ordering of the variables, then the following problem of subset selection is sometimes considered. For each value of k = 1, 2, . . . , n, find the set J_k consisting of k indices such that the residual norm ρ_k obtained by solving for only the k variables x_i, i ∈ J_k, is as small as can be attained by any set of k variables. Usually one would also introduce a linear independence tolerance τ and limit consideration to sets of k variables such that the associated matrix satisfies some conditioning test (such as size of pivot element) involving τ.
The sequence {ρ_k} is obviously nonincreasing with increasing k and one might introduce a number ρ̄ with the meaning that the computational procedure is to stop if a value of k is reached such that ρ_k ≤ ρ̄. Alternatively, one might stop when ρ_k − ρ_{k+1} is smaller than some tolerance, possibly depending upon k. This latter type of termination test arises from statistical considerations involving the "F" distribution [e.g., see Plackett (1960)].
This problem, as stated, appears to require for each value of k an exhaustive procedure for solving for all combinations of k variables. This exhaustive approach becomes prohibitively expensive if n is large.
Methods have been devised that exploit the partial ordering (of set inclusion) among the subsets to determine the optimal subsets J_k without explicitly testing every subset. See LaMotte and Hocking (1970) and Garside
(1971) and references cited there for an exposition of these ideas. Sugges-
tions for basing subset selection on ridge regression (see Section 4, this
chapter) are given in Hoerl and Kennard (1970b).
An alternative course of action is to compromise by solving the following more restricted problem instead. Solve the problem as stated above for k = 1. Let J̃₁ be the solution set J₁. After the set J̃_k has been determined, consider only candidate sets of the form J̃_k ∪ {j} for j ∉ J̃_k in searching for a preferred set of k + 1 variables that will constitute the set J̃_{k+1}. Let us denote the sequence of sets formed in this way by J̃_k, k = 1, . . . , n, and the associated residual norms by ρ̃_k. Note that J̃_k = J_k and ρ̃_k = ρ_k for k = 1 and k = n, but for 1 < k < n, J̃_k is generally not the same as J_k, and ρ̃_k ≥ ρ_k.
This type of algorithm is referred to as stepwise regression [Efroymson, pp. 191-203 in Ralston and Wilf (1960)]. The computation can be organized so that the selection of the new variable to be introduced at stage k is only slightly more involved than the pivot selection algorithm normally used in solving a system of linear equations.
This idea may be elaborated further to consider at each stage the possibility of removing variables whose contribution to making the residual small has become insignificant due to the effect of variables subsequently introduced into the solution (Efroymson, loc. cit.).
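A naive version of the restricted (greedy) search just described takes only a few lines. The sketch below (our own illustration; it ignores the linear-independence tolerance and the statistical stopping rules and simply runs through all n stages) builds the nested sets J̃_k by adding at each stage the single column that most reduces the residual norm.

    import numpy as np

    def greedy_stepwise(A, b):
        """Nested index sets and residual norms of simple forward stepwise regression."""
        m, n = A.shape
        selected, history = [], []
        for _ in range(n):
            best = None
            for j in (j for j in range(n) if j not in selected):
                cols = selected + [j]
                x, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)
                rnorm = np.linalg.norm(b - A[:, cols] @ x)
                if best is None or rnorm < best[0]:
                    best = (rnorm, j)
            selected.append(best[1])
            history.append((list(selected), best[0]))
        return history

    A = np.random.rand(20, 4)
    b = np.random.rand(20)
    for cols, rnorm in greedy_stepwise(A, b):
        print(cols, f"{rnorm:.4f}")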

The primary mathematical complication of stepwise regression is the fact that the amount that one coefficient, say x_j, contributes to reducing the residual norm depends in general upon what other set of coefficients are being included in the solution at the same time. This drawback can be circumvented by performing a linear change of variables to obtain a new set of variables whose individual effects upon the residual vector are mutually independent. In statistical terminology the new set of variables is uncorrelated. In algebraic terms this amounts to replacing the A matrix by a new matrix Ã = AC such that the columns of Ã are orthogonal. There are a variety of distinct matrices, C, that would accomplish such a transformation and in general different sets of uncorrelated variables would be produced by different such matrices. One such transformation matrix that has some additional desirable properties is C = V, where V is the matrix arising in a singular value decomposition, A = USVᵀ.
In particular, if the transformation matrix C is required to be orthogonal, then for AC to have orthogonal columns it follows that C must be the matrix V of some singular value decomposition of A.
Furthermore the numbers t_i that occur as the square roots of the diagonal elements of the diagonal matrix [(AC)ᵀ(AC)]⁻¹ have a statistical interpretation as being proportional to the standard deviations of the new coefficients. The choice of C = V causes these numbers t_i to be reciprocals of the singular values of A. This choice minimizes the smallest of the standard deviations, and also minimizes each of the other standard deviations subject to having selected the preceding variables.
The use of the singular value decomposition of A will be discussed further
in the next section.

Section 6. SINGULAR VALUE ANALYSIS

Suppose a singular value decomposition (see Chapters 4 and 18) is computed for the matrix A.

One can then compute

and consider the least squares problem



where p is related to x by the orthogonal linear transformation

Problem (25.55) is equivalent to the problem Ax ≅ b in the sense discussed in Chapter 2 for general orthogonal transformations of least squares problems.
Since S is diagonal (S = Diag{s₁, . . . , s_n}) the effect of each component of p upon the residual norm is immediately obvious. Introducing a component p_j with the value

reduces the sum of squares of residuals by the amount


Assume the singular values are ordered so that s_k ≥ s_{k+1}, k = 1, . . . , n − 1. It is then natural to consider "candidate" solutions for problem (25.55) of the form

where p_j is given by Eq. (25.57). The candidate solution vector p^(k) is the pseudoinverse solution (i.e., the minimal length solution) of problem (25.55) under the assumption that the singular values s_j for j > k are regarded as being zero.
From the candidate solution vectors p^(k) one obtains candidate solution vectors x^(k) for the problem Ax ≅ b as

where v^(j) denotes the jth column vector of V. Note that

hence ||x^(k)|| is a nondecreasing function of k. The squared residual norm associated with x^(k) is given by

Inspection of the columns of the matrix V associated with small singular values is a very effective technique for identifying the sets of columns of A that are nearly linearly dependent [see Eq. (12.23) and (12.24)].
It has been our experience that computing and displaying the matrix V, the quantities s_k, s_k⁻¹, p_k, x^(k), and ||x^(k)|| for k = 1, . . . , n, and g_k, g_k², ρ_k², and ρ_k for k = 0, 1, . . . , n, is extremely helpful in analyzing difficult practical least squares problems. For certain statistical interpretations [see Eq. (12.2)] the quantities

are also of interest.
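The quantities used in such a display are all simple functions of the decomposition. The sketch below (our own illustration, patterned after this section rather than after the Fortran subroutine SVA of Appendix C, and assuming the singular values are positive) tabulates, for k = 1, . . . , n, the singular value s_k, the coefficient p_k = g_k/s_k, the solution norm ||x^(k)||, and the residual norm of the candidate solution x^(k).

    import numpy as np

    def singular_value_analysis(A, b):
        U, s, Vt = np.linalg.svd(A, full_matrices=True)
        g = U.T @ b
        n = A.shape[1]
        x = np.zeros(n)
        rows = []
        for k in range(1, n + 1):
            p_k = g[k - 1] / s[k - 1]
            x = x + p_k * Vt[k - 1, :]          # x^(k) = x^(k-1) + (g_k / s_k) v_k
            rho_k = np.linalg.norm(g[k:])       # residual norm associated with x^(k)
            rows.append((k, s[k - 1], p_k, np.linalg.norm(x), rho_k))
        return rows

    A = np.random.rand(15, 5)
    b = np.random.rand(15)
    for k, sk, pk, xnorm, rho in singular_value_analysis(A, b):
        print(f"k={k}  s_k={sk:10.4e}  p_k={pk:10.4e}  ||x(k)||={xnorm:8.4f}  rho_k={rho:8.4f}")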


Suppose the matrix A is ill-conditioned; then some of the later singular values are significantly smaller than the earlier ones. In such a case some of the later p_j values may be undesirably large. Typically one hopes to locate an index k such that all coefficients p_j for j ≤ k are acceptably small, all singular values s_j for j ≤ k are acceptably large, and the residual norm ρ_k is acceptably small. If such an index k exists, then one can take the candidate vector x^(k) as an acceptable solution vector.
This technique has been used successfully in a variety of applications. For
example, see Hanson (1971) for applications to the numerical solution of
Fredholm integral equations of the first kind.
Once the singular values and the vector g have been computed it is also a simple matter to compute the numbers ||p(λ)|| and ω(λ) of the Levenberg-Marquardt stabilization method [see Eq. (25.47) and (25.48)] for a range of values of λ. These quantities are of interest due to the optimality property
stated in Theorem (25.49).
An example in which all the quantities mentioned here are computed and
interpreted for a particular set of data is given in the following chapter.
26
EXAMPLES OF SOME METHODS OF ANALYZING A LEAST SQUARES PROBLEM

Consider the least squares problem Ax ≅ b where the data matrix [A_{15×5} : b_{15×1}] is given by Table 26.1. (These data are given also as the data element, DATA4, in Appendix C.) We shall assume there is uncertainty of the order of 0.5 × 10⁻⁴ in the elements of A and 0.5 × 10⁻⁴ in the components of b.
First consider a singular value analysis of this problem as described in
Section 6 of Chapter 25. The computation and printing of the quantities used
in singular value analysis is accomplished by the Fortran subroutine SVA
(Appendix C). The Fortran main program PROG4 (Appendix C) applies the
subroutine SVA to the particular data of the present example. Executing
PROG4 on a UNIVAC 1108 computer resulted in the output reproduced in
Fig. 26.1.
The subroutine SVA in general performs a change of variables x = Dy and treats the problem (AD)y ≅ b. In this example we have set D = I so the label y in Fig. 26.1 is synonymous with x for this example.
Recall that the vector g (output column headed "G COEF") is computed as g = Uᵀb, where U is orthogonal. Since the fourth and fifth components of g are smaller than the assumed uncertainty in b, we are willing to treat these two components of g as zero. This leads us to regard the third candidate solution, x^(3) (output column headed "SOLN 3"), as the most satisfactory solution.
Alternatively one could compare the numbers s_k [see Eq. (25.62)] in the column headed "N.S.R.C.S.S." (meaning Normalized Square Root of Cumulative Sum of Squares) with the assumed uncertainty (0.5 × 10⁻⁴) in b. Note that s₂ (= 1.1107 × 10⁻²) is significantly larger than this assumed uncertainty while s₃ (= 4.0548 × 10⁻⁵) is slightly smaller than this uncertainty. This

Table 26.1  THE DATA MATRIX [A : b]


-.13405547 -.20162827 -.16930778 -.18971990 -.17387234 -.4361
-.10379475 -.15766336 -.13346256 -.14848550 -.13597690 -.3437
-.08779597 -.12883867 -.10683007 -.12011796 -.10932972 -.2657
.02058554 .00335331 -.01641270 .00078606 .00271659 -.0392
-.03248093 -.01876799 .00410639 -.01405894 -.01384391 .0193
.05967662 .06667714 .04352153 .05740438 .05024962 .0747
.06712457 .07352437 .04489770 .06471862 .05876455 .0935
.08687186 .09368296 .05672327 .08141043 .07302320 .1079
.02149662 .06222662 .07213486 .06200069 .05570931 .1930
.06687407 .10344506 .09153849 .09508223 .08393667 .2058
.15879069 .18088339 .11540692 .16160727 .14796479 .2606
.17642887 .20361830 .13057860 .18385729 .17005549 .3142
.11414080 .17259611 .14816471 .16007466 .14374096 .3529
.07846038 .14669563 .14365800 .14003842 .12571177 .3615
.10803175 .16994623 .14971519 .15885312 .14301547 .3647

could be taken as a reason to select x^(3) as the preferred candidate solution.
If the assumed uncertainty in b is given the statistical interpretation of
being the standard deviation of the errors in b, then multiplication of this
quantity (O.5 x 10~4) times each reciprocal singular value (column headed
"RECIP. S.V.") gives the standard deviation of the corresponding component
of the vector p (column headed "P COEF"). Thus the first three components
of p exceed their respective standard deviations in magnitude while the last
two components are smaller in magnitude than their standard deviations.
This could be taken as a reason to prefer the(k)candidate solution x(3).
As still another method for choice of an x , one would probably reject
the candidate solutions x(l) and x<2> because their associated residual norms
are too large (column headed "RNORM"). One would probably reject x">
because || x(3) || is too large (column headed "YNORM"). The choice between
x<3> and x(4) is less decisive. The vector x(3) would probably be preferred to
x(4) since || x(3) || <||x(4)|| and the residual norm associated with x(4) is only
slightly smaller than that associated with x(3>.
While these four methods of choosing a preferred x(k) have led us to the
same selection, x(3), this would not always be the case for other sets of data.
The user must decide which of these criteria (possibly others included) are
most meaningful for his application.
It is of interest to compare some other methods of obtaining stabilized
solutions to this same problem. The Levenberg-Marquardt analysis (Chap-
ter 25, Section 4) provides a continuum of candidate solutions. Using the
Levenberg-Marquardt data shown in Fig. 26.1 one can produce the graph,
Fig. 26.2, of RNORM versus YNORM. Following Theorem (25.49) this
curve constitutes a boundary line in the (YNORM-RNORM)-plane such

Fig. 26.1  Output from the subroutine SVA for the sample problem of Chapter 26.

that for any vector y the coordinate pair (||y||, ||b − Ay||) lies on or above the curve.
For more detailed information Eq. (25.46), (25.38), and (25.35) can be used to compute and plot values of the individual solution components as functions of λ. Figures 26.3 and 26.4 show this information for the present
example. Graphs of this type are extensively discussed in Hoerl and Kennard
(1970b).
Figure 26.5 provides a comparison of solution norms and residual norms
of the five candidate solutions obtained by singular value analysis with the
corresponding data for the continuum of Levenberg-Marquardt solutions.
We have also applied the subroutine, HFTI (see Appendix C) to this
example. The magnitudes of the diagonal elements of the triangular matrix

Fig. 26.2  Residual norm versus solution norm for a range of values of the Levenberg-Marquardt stabilization parameter λ.

Fig. 26.3  Solution coefficients and residual norm as a function of λ.



Fig. 26.4 Same data as Fig. 26.3 with expanded vertical scale.

R to which the matrix A is transformed are 0.52, 0.71 × 10⁻¹, 0.91 × 10⁻², 0.14 × 10⁻⁴, and 0.20 × 10⁻⁶. It is interesting to note that these numbers agree with the respective singular values of A to within a factor of two. From Theorem (6.31) it is known that the singular values, s_i, and the diagonal elements of R must satisfy

By setting the pseudorank tolerance parameter successively to the values τ = 0.29, 0.040, 0.0046, 0.0000073, and 0.0, the subroutine HFTI was used to produce five different solutions, say z^(k), associated with pseudoranks of k = 1, 2, 3, 4, and 5. The solution norms and residual norms for these five vectors are shown in Table 26.2.

Fig. 26.5  Candidate solutions x^(k) from singular value analysis compared with the continuum of Levenberg-Marquardt solutions.

Table 26.2  SOLUTION AND RESIDUAL NORMS USING HFTI ON THE SAMPLE PROBLEM

k ||z(k)|| ||b-Az(k)||

1 0.99719 0.216865
2 2.24495 0.039281
3 4.58680 0.000139
4 4.92951 0.000139
5 220.89008 0.000138

Note that the data of Table 26.2 are quite similar to the corresponding
data (columns headed "YNORM" and "RNORM") given in Fig. 26.1 for
the candidate solutions obtained by singular value analysis.
As still another way of analyzing this problem, solutions were computed
using each of the 31 possible nonnull subsets of the five columns of the
matrix A. Solution norms and residual norms for each of these 31 solutions
are listed in Table 26.3 and are plotted in Fig. 26.6.

Fig. 26.6  Solutions using subsets of the columns of A.


We shall use the notation w(i, j, . . .) to denote the solution obtained using columns i, j, . . . .
From Fig. 26.6 we see that the simplest form of stepwise regression would select solution w(3) as the preferred solution among those involving only one column.
It would next consider solutions w(1,3), w(2,3), w(3,4), and w(3,5). It would select w(1,3) as the one of these four solutions providing the smallest residual norm. Note that this solution has a residual norm larger by a factor of 85 than that corresponding to w(1,5). This latter solution clearly gives the smallest residual norm obtainable using any set of two columns.
At the next stage simple stepwise regression would select w(1,3,5), which is quite similar in solution norm and residual norm to the previously mentioned x^(3) and z^(3).
206 SOME METHODS OP ANALYZING A LEAST SQUARES PROBLEM CHAP. 26

The progress of a stepwise regression algorithm beyond this stage would


depend critically on details of the particular algorithm due to the very ill-
conditioned problems that would be encountered in considering sets of four
or five columns.
Table 26.3 SOLUTION AND RESIDUAL NORMS USING SUBSETS OF THE COLUMNS

Columns Used II "II ||b-aw||

1 2.46 0.40
2 1.92 0.22
3 2.42 0.07
4 2.09 0.19
5 2.30 0.19
1.2 5.09 0.039
1,3 2.72 0.052
1,4 4.53 0.023
1,5 5.07 0.001
2,3 3.03 0.053
2,4 20.27 0.030
2,5 17.06 0.128
3,4 3.07 0.056
3,5 2.97 0.058
4,5 17.05 0.175
,2,3 22.1 0.00018
,2.4 10.8 0.00018
,2.5 5.0 0.00014
,3.4 8.1 0.00015
,3.5 5.0 0.00014
,4,5 4.9 0.00014
2,3,4 13.5 0.00020
2,3.5 7.6 0.00014
2,4,5 24.0 0.00028
3,4,5 17.3 0.00017
1,2.3.4 10.3 0.00014
1,2,3,5 5.0 0.00014
1,2,4,5 5.0 0.00014
1,3,4,5 5.0 0.00014
2,3,4,5 9.0 0.00014
1,2,3,4,5 220.9 0.00014
MODIFYING A QR DECOMPOSITION
TO ADD OR REMOVE ROW VECTORS
WITH APPLICATION TO SEQUENTIAL
PROCESSING OP PROBLEMS
HAVING A LARGE OR BANDED
27 COEFFICIENT MATRIX
Section 1. Sequential Accumulation
Section 2. Sequential Accumulation of Banded Matrices
Section 3. An Example: Line Splines
Section 4. Data Fitting Using Cubic Spline Functions
Sections. Removing Rowe of Data

In this chapter we present adaptations of orthogonal transformations to


the sequential processing of data for Problem LS. Such methods provide a
means of conserving computer storage for certain types of problems having
voluminous data.
These methods also have important application to problems in which one
wishes to obtain a sequence of solutions for a data set to which data are
being sequentially added or deleted. A requirement for this type of computa-
tion arises in the class of problems called sequential estimation, filtering, or
process identification. The augmentation methods presented in this chapter
have excellent properties of numerical stability. This is in contrast to many
published approaches to this problem that are based on the notion of updat-
ing the inverse of a matrix.
The problem of deleting data can be an inherently unstable operation
depending upon the relationship of the rows to be deleted to the whole data
set. Some methods will be presented that attempt to avoid introducing any
additional unnecessary numerical instability.
Write the least squares problem in the usual way:

In Section 1 we treat the problem in which the number of rows of A is


large compared with the number of columns. In Section 2 attention is devoted
to the case in which A has a "banded" structure. The methods presented in
207
208 MODIFYING A QR DECOMPOSITION CHAP. 27

both Sections 1 and 2 are applicable to the sequential estimation problem since
the solution vector or its associated covariance matrix can readily be com-
puted at each step of the sequential matrix accumulation if required. In
Section 3 an application of these ideas is presented using the example of curve
fitting using line splines. In Section 4 a procedure is described for least squares
data fitting by cubic splines having equally spaced breakpoints. This provides
a further example of the applicability of the banded sequential accumulation
method of Section 2.
Methods of deleting data will be described in Section 5.

Section 1. SEQUENTIAL ACCUMULATION

In this section we shall describe an algorithm for transforming the matrix


[A: b] to upper triangular form without requiring that the entire matrix
[A:b]be in computer storage at one time. A formal description of this pro-
cedure will be given as Algorithm SEQHT (27.10).
We first establish the notation to be used and give an informal description
of the process. Write the matrix A and the vector b in partitioned form:

where each A1 is m1 x n and each b1 is a vector of length m1.We have m =


m1 + • • • + mo, of course. The integers mt can be as small as 1, which per-
mits the greatest economy of storage. Remarks on the dependence of the total
number of arithmetic operations on the size of m, are given on page 211.
The algorithm will construct a sequence of triangular matrices [r1: d1],
i = 1,..., q, with the property that the least squares problem

has the same solution set and the same residual norm as the problem
CHAP. 27 MODIFYING A QR DECOMPOSITION 209

The significant point permitting a saving of storage is the fact that for each t,
the matrix[r1:d1]can be constructed and stored in the storage space previ-
ously occupied by the augmented matrix

Thus the maximum number of rows of storage required is maxj [mf + min
For notational convenience let [Re: de] denote a (null) matrix having no
rows. At the beginning of the ith stagee of the algorithm one has themt-1xx
(a +1) matrix [Rl-l :dt-i] from the (i — l)st stage and the new m, x
(n + 1) data matrix [At: &,]. Let m, = mt-1 + m,. Form the mt X (n + 1)
augmented matrix of Eq. (27.4) and reduce this matrix to triangular form
by Householder triangularization as follows:

where mt, = min (n + 1, m,}.


This completes the ith stage of the algorithm. For future reference, let
the result at the qth stage of the algorithm be denoted by

It is easily verified that there exists an orthogonal matrix Q such that

Therefore, the least squares problem


210 MODIFYING A QR DECOMPOSITION CHAP. 27

is equivalent to the original problem in Eq. (27.1) in that it has the same set
of solution vectors and the same minimal residual norm. Furthermore the
matrix R has the same set of singular values as the matrix A of Eq. (27.1).
We shall now describe a computing algorithm that formalizes this pro-
cedure. To this end set

and

The processing will take place in a computer storage array W having at


least m rows and n + 1 columns. We shall use the notation W(it: i2, ji :j2)
to denote the subarray of W consisting of rows i1 through i2 of columns y,
through y2. We use the notation W(iJ) to denote the (ij) element of the array
W. The symbol p identifies a single storage location.
(27.10) ALGORITHM SEQHT (Sequential Householder Triangularization)
Step Description
1 Set / := 0.
2 For / := 1,..., q, do Steps 3-6.
3 Setp:=/ + m,.
4 Set W(l+ 1:/», 1:» + 1) := \A,:bt] [see Eq. (27.2) and
(27.3)].
5 For /:=!,..., min (n + ltp — 1), execute Algorithm
H1 (i, max (i + 1, / + 1), P, W(l, 0, * W(1, i + 1),
»-/+!).
6 Set l:= min (n+1,p).
7 Remark: The matrix [A: b] has been reduced to upper
triangular form as given in Eq. (27.6).
A sequence of Fortran statements implementing Step 5 of this algorithm
is given as an example in the user's guide for the subroutine H12 in Appen-
dix C.
Notice, in Step 4, that only the submatrix [At: bt] needs to be intro-
duced into computer stores at one time. All these data can be processed in
the ft x (» + 1) working array W. By more complicated programming one
could further reduce the storage required by exploiting the fact that each ma-
trix [R,: dj] is upper triangular.
CHAP. 27 MODIFYING A QR DECOMPOSITION 211

For the purpose of discussing operation counts let a denote an addition


or subtraction and m denote a multiplication or division. Define

It was noted in Table 19.1 that if quadratic and lower-order terms in m and
n are neglected, the number of operations required to triangularize an m x it
matrix (m > n) using Householder transformations is approximately v(2a +
2m). If the Householder processing is done sequentially involving q stages as
in Algorithm SEQHT (27.10), then the operation count is increased to ap-
proximately v(2a + 2/iXm + 9)/m. If the entering blocks of data each con-
tain k rows (kg = m), then the operation count can be written as v(2a + 2m)
(k+1)/k.
Thus sequential Householder accumulation increases in cost as the block
size is decreased. In the extreme case of a block size of A: = 1, the operation
count is approximately doubled relative to the count for nonsequential
Householder processing.
For small block sizes it may be more economical to replace the House-
holder method used at Step S of Algorithm SEQHT (27.10) with one of the
methods based on 2 x 2 rotations or reflections. The operation counts for
these methods are independent of the block size. The operation count for
triangularization using the Givens method [Algorithms Gl (10.25) and G2
(10.26)] is v(2a + 4ft). The 2x2 Householder transformation expressed
as in Eq. (10.27) has an operation count of v(3a + 3m).
Gentleman's modification (see Chapter 10) of the Givens method reduces
the count to v(2a + 2m). Thus this method is competitive with the standard
Householder method for nonsequential processing and requires fewer arith-
metic operations than the Householder method for sequential processing.
Actual comparative performance of computer programs based on any of
the methods will depend strongly on coding details.
Following an application of Algorithm SEQHT (27.10), if the upper trian-
gular matrix R of Eq. (27.6) is nonsingular, we may compute the solution, X,
by solving

The number e in Eq. (27.6) will satisfy

In many applications one needs the unsealed covariance matrix [see Eq.
(12.1)3
212 MODIFYING A QR DECOMPOSITION CHAP. 27

Computation of R~1 can replace R in storage and then computation of


(the upper triangular pan of) R~1(R-1)t can replace R-1 in storage. Thus the
matrix C of Eq. (27.13) can be computed with essentially no additional
storage. See Chapter 12 for a more detailed discussion of this computation.
Furthermore, the quantity a2 of Eq. (12.2) can be computed as

where e is defined by Eq. (27.6).


For definiteness, Algorithm SEQHT was presented as a loop for f = 1,
..., q. In an actual application using this sequential processing it is more
likely that Steps 3-6 of the algorithm would be implemented as a subroutine.
The number q and the total data set represented by the matrix [A: b] need
not be known initially. The calling program can use the current triangular
matrix at any stage at which R is nonsingular to solve for a solution based
on the data that have been accumulated up to that point. By using an
additional set of n(n + l)/2 storage locations, the calling program can also
compute the upper triangular part of the unsealed covariance matrix Ct =
Rt-1(Rt-1)t at any stage at which Rt is nonsingular. Thus Steps 3-6 of Algo-
rithm SEQHT provide the algorithmic core around which programs for
sequential estimation can be constructed.

Section 2. SEQUENTIAL ACCUMULATION


OF BANDED MATRICES

In some problems, possibly after preliminary interchange of rows or


columns, the data matrix [A: b] of Eq. (27.2) and (27.3) has a banded struc-
ture in the following sense. There exists an integer n><,n and a nondecreas-
ing set of integersj1,...,.j,such that all nonzero elements in the submatrix
At occur in columns jt through jt + n — 1.
Thus At is of the form

We shall refer to nb as the bandwidth of A.


It can be easily verified that all nonzero elements in the ith row of the up-
per triangular matrix R of Eq. (27.6) will occur in column / through i + n—
1. Furthermore, rows 1 through j, — 1 of R will not be affected by the pro-
cessing of the submatrices At,...,Ag in Algorithm (27.10).
These observations allow us to modify Algorithm (27.10) by managing
the storage in a working array G that needs only nb + 1 columns. Specifically,
let G denote a m x (nb+ 1) array of storage with m satisfying Eq. (27.9).
CHAP. 27 MODIFYING A QR DECOMPOSITION 213

The working array G will be partitioned by the algorithm into three sub-
arrays g1, g2, and G3 as follows:
(27.15) GI = rows 1 through tp, - 1 of G
(27.16) =G2rowsipthrough ir - 1 of G
(27.17) G, = rows J, through /, + m, — 1 of G
The integers i, andirare determined by the algorithm and their values chang
in the course of the processing. These integers are limited by the inequalities
l<ip<ir<n + 2.
At the various stages of the algorithm the (nb + l)st column of the G
array contains either the vector Bt [e.g., see the left side of Eq. (27.4)] or the
processed vector dt [e.g., see the right side of Eq. (27.5)].
For 1 < j < n the identification of matrix elements with storage locations
is as follows:
(27.18) In GI : storage location (i,j) contains matrix element
(/,/+./-1)
(27.19) In Gl: storage location (i,j) contains matrix element
(i,i ,+ j-i)
(27.20) In G1: storage location (i,j) contains matrix element
(i,je+j-1)
To indicate the primary idea of a compact storage algorithm for the
banded least squares problem consider Fig. 27.1 and 27.2. If Algorithm (27.10)
were applied to the block diagonal problem with nb = 4, then at the step in
which the data block \Cn bt] is introduced the working array might appear as
in Fig. 27.1. Taking advantage of the limited bandwidth these data can be
packed in storage as in Fig. 27.2.
In this illustration the inequalities i, <j, ^ t, are satisfied. For furthe
diagrammatic illustration of this algorithm the reader may wish to look
ahead to Section 3 and Fig. 27.4 and 27.5.
A detailed statement of the algorithm follows. In describing the algorithm
we shall use the notation G(i,j) to denote the (ij) element of the array G
and the notation G(il: i2,j1, :jt) to denote the subarray of G consisting of
all elements G(i,j) with i1 < i < i2, and j1 < j< J2.
(27.21) ALGORITHM BSEQHT (Banded Sequential Householder Trian-
gularization)
Step Description
1 Set ir := land/, := 1.
2 For t := 1,...,q, do Steps 3-24.
214 DECOMPOSITION
MODIFIYIGN A QR DECOMPOSITION CHAP. 27

Fig. 27.1 Introducing the data C, and br

Step Description
3 Remarks: At this point the data [C,: bt] and the integers
mt and jt must be made available to the algorithm. It is as-
sumed that m, > 0 for all t and jq > jq-1 > • • • >jt > 1.
4 Set G(ir:ir + mt - 1, 1:nb +): =[Ct:bt][SeeEq.(27.14)
for a definition of C,. Note that the portion of the G array
into which data are being inserted is the subarray (7, of
Eq. (27.17).]
5 If jt, — ip go to Step 18.
6 Remark: Here the monotonicity assumption on j, assures
that jt > ip.
7 If y, ^ in go to Step 12.
CHAP. 27 MODIFTOK} A QR DECOMPOSITION 215

Fig. 27.2 The samedataas in Fig. 27.1 but packed into the storage
array, G.

Step Description
8 Remark: Here jt exceeds /,. This indicates rank 'deficiency
since the diagonal element in position (ir, ir) of the trian-
gular matrix R [see Eq. (27.6)] will be zero. Nevertheless,
the triangularization can be completed. Some methods for
solving this rank-deficient least squares problem will be
discussed in the text following this algorithm.
9 Move the contents of G(ir: /r + m, — 1,1:nb+ 1) into
G(jt :Jt + >m1- 1,1:nb+ !).
10 Zero the subarray G(i,: jt - 1,1:»»+ 1).
11 Set/,:-/,.
12 Set m := min (nb - 1, / , - / , — 1); if /i — 0, go to Step
17.
13 For l := 1 , . . . , m, do Steps 14-16.
14 Set k := min (/, jt - /,).
216 MODIFYING A QR DECOMPOSITION CHAP. 27

Step Description
15 For i : = / + l nb„ set (?(ip, + /, i - k) := G(ip + 1,
i).
16 For!:« 1 , . . . , * , set G(if + /,nb+ 1 - 0 := 0.
17 Set ip :=jt,. [Note that this redefines the partition line
between subarrays G1 and G2 of Eq. (27.15) and (27.16).]
18 Remark: Steps 19 and 20 apply Householder triangulariza-
tion to rows if through ir + mt— 1. As was discussed
following the statement of Algorithm SEQHT (27.10),
some reduction of execution time may be attainable by re-
placing the general Householder method by one of the
methods based on 2 x 2 rotations or reflections.
19 Set m :— if + m, — /,. Set k := min (nb+ 1, m).
20 For i := 1 £, execute Algorithm HI (/, max (i + 1,
I, - i, + 1), mt <?(/„ i), p, <?(/„ i -f 1),nb + 1 - /).
21 Set ir, := ip + k [Note that this redefines the partition line
between subarrays Gt and G3 of Eq. (27.16) and (27.17).]
22 If H < n> + 1, go to Step 24.
23 For j :- 1....,nb, set G(i, - 1,j) := 0.
24 Continue.
25 Remark: The main loop is finished. The triangular matrix
of Eq. (27.6) is stored in the subarrays (7, and Ga [see Eq.
(27.15) and (27.16)] according to the storage mapping of
Eq. (27.18) and (27.19).
If the main diagonal elements of the matrix R of Eq.
(27.6) are all nonzero, the solution of problem (27.7) can
be computed by back substitution using Steps 26-31. Note
that these diagonal elements of the matrix R are used as
divisors at Step 31. Discussion of some techniques for
handling the alternative (singular) case are given in the
text following this algorithm.
26 For i := 1 , . . . , n , set X(i) := G(i, nb + 1).
27 For i: n,n — 1 , . . . , 1 do Steps 28-31.
28 Set j := 0 and / :— max (0, i — /„).
29 If i = n, go to Step 31.
30 For j:= 2 , . . . , min(n + 1 — i,nb), set s := s +
G(i,j+1)xX(i-1+j).
31 Set X(i): = [X(i) - s]/G(it I + 1).
CHAP. 27 MODIFYING A QR DECOMPOSITION 217

Step Description
32 Remark: The solution vector x is now stored in the array
X. If, as would usually be the case, the full data matrix
[A: b] [see Eq. (27.2) and (27.3)] has more than n rows,
the scalar quantity, e, of Eq. (27.6) will be stored in location
G(n + 1, nt + 1). The magnitude of e is the norm cf the
residual associated with the solution vector, x.
The banded structure of A does not generally imply that the unsealed
covariance matrix C of Eq. (12.1) will have any band-limited structure. How-
ever, its columns, cjtj= 1 it, can be computed, one at a time, without
requiring any additional storage. Specifically the vector c, can be computed
by solving the two banded triangular systems

where et is the jth column of the n x n identity matrix.


The computational steps necessary to solve Eq. (27.22) or more generally
the problem Rty = z for arbitrary z are given in Algorithm (27.24). Here it is
assumed that the matrix R occupies the array G in the storage arrangement
resulting from the execution of Algorithm (27.21). The right-side vector, for
example et for Eq. (27.22), must be placed in the array X. Execution of Al-
gorithm (27.24) replaces the contents of the array X by the solution vector,
for example, Wj of Eq. (27.22).
(27.24) ALGORITHM Solving JP> = z
Step Description
1 Fory := 1 , . . . , n, do Steps 2-6.
2 Set j := 0.
3 Ify = 1, go to Step 6.
4 Set/i :=max(l,/-ii,+ l). Set i2 :<=*/-1.
5 Fori:=i l ,...,i2, set s:=s + X(t) x G(i,j-i+
max(0,i-ip)).
6 Set X(J) : = ( X ( j ) - j)/0(/, 1 + max(0,j-ip)).
To solve Eq. (27.23) for c} after solving Eq. (27.22) for wj one can simply
execute Steps 27-31 of Algorithm (27.21).
Furthermore, the quantity ?* of Eq. (12.2) may be computed as
218 MODIFYING A QR DECOMPOSITION CHAP. 27

where e is defined in Eq. (27.6). The quantity e is computed by the Algorithm


BSEQHT as noted at Step 32.
Fortran subroutines BNDACC and BNDSOL are included in Appendix
C as an implementation of Algorithm BSEQHT (27.21) and Algorithm
(27.24). Use of these subroutines is illustrated by the Fortran program
PROGS of Appendix C, which fits a cubic spline curve to discrete data and
computes the covariance matrix of the coefficients using the technique of
Eq. (27.22) and (27.23).
The remarks at the end of Section 1 regarding possible implementations
of Algorithm SEQHT (27.10) for sequential estimation apply with some
obvious modification to Algorithm BSEQHT also. The significant difference
is the fact that in Algorithm BSEQHT the number of columns having nonzero
entries is typically increasing as additional blocks of data are introduced.
Thus the number of components of the solution vector that can be determined
is typically less at earlier stages of the process than it is later.
At Steps 8 and 25 mention was made of the possibility that the problem
might be rank-deficient. We have found that the Levenberg idea (see Chapter
25, Section 4) provides a very satisfactory approach to this situation without
increasing the bandwidth of the problem. We refer to the notation of Eq.
(25.30) to (25.34).
Assume the matrix A. of Eq. (25.31) is banded with bandwidth nb. For
our present consideration assume that F is also banded with bandwidth not
exceeding nb. It is common in practice for the F matrix to be diagonal. By
appropriate row interchanges the matrix

of Eq. (25.31) can be converted to a new matrix

in which the matrix A has bandwidth nb. The matrix [ A : b ] can then be pro-
cessed by Algorithm BSEQHT. If the matrix F is nonsingular, then A is of
full rank. If the matrix Fis nonsingular and A is sufficiently large, then A will
be of full pseudorank so that Algorithm BSEQHT can be completed including
the computation of a solution.
If one wishes to investigate the effect of different values of A in problem
(25.31), one could first process only the data [A: b] of problem (25.31), re-
ducing this matrix to a banded triangular array

Assume that this matrix T is saved, possibly in the computer's secondary


storage if necessary.
CHAP. 27 MODIFYING A QR DECOMPOSITION 219

Then to solve problem (25.31) for a particular Value of A, note that an


equivalent problem is

The latter problem can be subjected to row interchanges so that the coef-
ficient matrix has bandwidthnb,and this transformed problem is then in the
form such that its solution can be computed using Algorithm BSEQHT.
This technique of interleaving rows of the matrix [AF: ld] with the rows
of the previously computed triangular matrix [R: d], to preserve a limited
bandwidth, can also be used as a method for introducing new data equations
into a previously solved problem. This avoids the need for triangularization
using the entire original data set for the expanded problem.
In the case where F = In and d = 0, the choice of parameter A can be
facilitated by computing the singular values and transformed right side of the
least squares problem of Eq. (27.7). Thus with a singular value decomposition

the vector

and the matrix

can be computed without additional arrays of storage other than the array
that contains the banded matrix R. The essential two ideas are: post- and
premultiply R by a finite sequence of Givens rotation matrices Jt, and Tt so
that the matrix B = Tv--t1,RJ1• • • /, is bidiagonal. The product Tv • • • r,J
replaces <7 in storage. Compute the SVD of the matrix B using Algorithm
QRBD (18.31). The premultiplying rotations are applied to d in storage,
ultimately producing the vector g of Eq. (27.25). A value of A can then be
determined to satisfy a prespecified residual norm or solution vector norm by
using Eq. (25.48) or (25.47), respectively.

Section 3. AN EXAMPLE: LINE SPLINES

To fix some of the ideas that were presented in Sections 1 and 2, consider
the following data-fitting (or data-compression) problem. Suppose that we
have m data pairs {(t„ y,)} whose abscissas occur in an interval of t such that
a<.tt<.b, i = I,... ,m.It is desired that these data be fit by a function
/(/) whose representation reduces the storage needed to represent the data.
Probably the simplest continuous function (of some generality) that can be
220 MODIFYING A QR DECOMPOSITION CHAP. 27

(least squares) fit to these data is the piecewise linear continuous function
defined as follows:
(27.26) Partition the interval [a, b] into n — 1 subintervals with breakpoints
t(l) satisfying a = t(1) < t(2) < - - • < t(n) = b.
For

define

and

The n parameters xt, i — 1,...,n, of Eq. (27.29) are to be determined as


variables of a linear least squares problem.
As an example, let us take m = 10 and n = 4 with breakpoints evenly
spaced. A schematic graph for such a problem is given in Fig. 27.3.

Fig. 27.3 A schematic example of a line spline with m - 10, n


4, and data points (rt, yt) indicated by dots.
CHAP. 27 MODIFYING A QR DECOMPOSITION 221

The least squares problem for the x, then has the form shown in Fig. 27.4.
Note that the coefficient matrix is banded with bandwidth nb = 2.

Fig. 27.4

The progress of the band-limited algorithm BSEQHT (27.21) for this


example is diagramed in Fig. 27.5.

Fig. 27.5
222 MODIFYING A QR DECOMPOSITION CHAP. 27

Stage (1) Initially ip = ir, = 1. The first block of data, consisting of the
nontrivial data in the first three rows of Fig. 27.4, is introduced. Set j1 = 1,
MI == 3.
Stage (2) This block is triangularized using Householder transforma-
tions. Set ir, = 4.
Stage (3) Introduce second block of data from Fig. 27.4. Set J2 = 2,
M2 = 3.
Stage (4) Left shift of second row exclusive of the last column, which
represents the right-side vector. Set ip = j2 (= 2).
Stage (5) Triangularization of rows 2-6. Set ir = 5.
Stage (6) Introduce third block of data from Fig. 27.4. Set J3 = 3,
m3 = 4.
Stage (7) Left shift of row 3 exclusive of last column. Set ip = ji (= 3).
Stage (8) Triangularization of rows 3-8. Set i, = 6.

The data appearing at stage (8) represent the least squares problem shown
in diagram (9). This problem can now be solved by back substitution.
As an illustration of the extent of storage that can be saved with the use
of this band-matrix processing of Section 2, consider an example of line-
spline fitting with m — 1000 data points using 100 intervals. Thus the least
squares problem will have n — 101 unknowns. Let us suppose further that
the row dimension of each block we shall process does not exceed 10. Then
the maximum size of the working array does not have to exceed [n + 1 +
max (m,)] x 3 = (101 + 1 + 10) x 3 = 336. The use of a less specialized
sequential accumulation algorithm such as that of Section 1 of this chapter
would require a working array of dimension at least [n + 1 + max (m,)] x
(it + 1) = (101 + 1 + 10) x 102 = 11,424. If all rows were brought in at
once, the working array would have to have dimensions m x (n + 1) = 1000
x 101 - 101,000.

Section 4. DATA FITTING USING CUBIC


SPLINE FUNCTIONS

As a further example of the use of sequential processing for banded least


squares problems we shall discuss the problem of data fitting using cubic
spline functions with uniformly spaced breakpoints. The Fortran program
PROGS given in Appendix C implements the approach described in this
section.
Let numbers bi<bt< • • • < bm be given. Let S denote the set of all
cubic spline functions defined on the interval [bi,bn] and having internal
breakpoints (often called knots) b 2 , ...,bn-1.A function/is a member of S
if and only if fis a cubic polynomial on each subinterval[bk,,bk+1]and is
CHAP. 27 MODIFYING A QR DECOMPOSITION 223

continuous together with its first and second derivatives throughout the
interval [b1,bn].
It can be shown that the set S constitutes an (n + 2)-dimensional linear
space of functions. Thus any set of n + 2 linearly independent members of
5, say [jq :j — 1 , . . . , n + 2), is a basis for 5. This implies that each f e S
has a unique representation in the form

Using such a basis, the problem of finding a member of 5 that best fits a
set of data {(x1 y,): x, e [bt, bn]; t = 1 , . . . , m] in a least squares sense takes
the form

where

Here c is the (n + 2>vector with components c, and y is the m-vector with


components yt.
There exist bases for S with the property that if the data are ordered so
that x1 < X2 <; • • • <, xm the matrix A will be band-limited with a band-
width of four. Methods for computing such basis functions for unequally
spaced breakpoint sets are described in Carasso and Laurent (1968), de Boor
(1971 and 1972), and Cox (1971). Here we restrict the discussion to uniformly
spaced breakpoint sets in order to avoid complexities not essential to the
illustration of banded sequential processing.
To define such a basis for a uniformly spaced breakpoint set let

Define the two cubic polynomials

and

Let 11 denote the closed interval [b1, b2] and let 1k denote the half-open in-
terval (bk, bk+,] for k — 2 , . . . , n — 1.
In the interval /* only four of the functions qt have nonzero values. These
four functions are defined for x e 1k by
224 MODIFYING A QR DECOMPOSIIION CHAP. 27

The Fortran program PROQ6 in Appendix C solves a sample data-fitting


problem using this set of basis functions and sequential Householder accu-
mulation of the resulting band-limited matrix. The data for this curve fitting
example are given in Table 27.1.
Table 27.l DATA FOR CUBIC SPLINE-FITTING EXAMPLE

* y x y
2 2.2 14 3.8
4 4.0 16 5.1
6 5.0 18 6.1
8 4.6 20 6.3
10 2.8 22 5.0
12 2.7 24 2.0

A parameter NBP is set to the values 5,6,7,8,9, and 10, respectively, to


specify the number of breakpoints for each of the six cases. In terms of the
parameter NBP the breakpoints for a case are defined as

The number of coefficients to be determined is NC = NBP + 2. Note


that in the sixth case the number of coefficients is equal to the number of data
points. In this sixth case the fitted curve interpolates the data.
Define

where rt is the residual at the ith data point. The value of RMS for each case
is given in Table 27.2. Note that RMS is not a monotone function of NBP.
Table 27.2 RMS AS A FUNCTION OF NBP

NBP 5 6 7 8 9 10
RMS 0.254 0.085 0.134 0.091 0.007 0.0

Plots of each of the six curve fits are given in Fig. 27.6.
CHAP. 27 MODIFYING A QR DECOMPOSITION 225

Fig. 27.6 Spline curves computed by PROG5 as fits to the data of


Table 27.1. Triangles along the baseline of each graph indicate
breakpoint abcissas.

Section 6. REMOVING ROWS OF DATA

Three methods will be described for removing a row of data from a least
squares problem, A x = b . For notational convenience let
226 MODIFYING A QR DECOMPOSITION CHAP. 27

Suppose C has m rows and n columns and is of rank k. Suppose further that
a Cholesky factor R for C has been computed. Here R is a k x n upper trian-
gular matrix satisfying

for some m x m orthogonal matrix Q. The matrix R also satisfies

It will be convenient to make the further assumption that the k diagonal


elements of R are all nonzero. This will necessarily be true if k = n. If k < n,
this can always be achieved by appropriate column interchanges and re-
triangularization if necessary.
Let vt denote a row of C that is to be removed. Without loss of generality
we shall assume that tf is the last row of C. Let C denote the submatrix con-
sisting of the remaining rows of C.

Note that the rank of C may be either k or k — 1. The practical applications


of data removal most commonly involve only the case in which Rank (C) =
Rank(c) = n; however, we shall consider the general case of Rank (C) — 1
< Rank.(c) <, Rank (C) = k < n as long as it does not require a signifi-
cant amount of additional discussion.
We wish to find a Cholesky factor R for c. Thus R will be an upper trian-
gular k x n matrix having the same rank as C and satisfying

for some (m — 1) x (m — 1) orthogonal matrix Q. Such a matrix R will also


satisfy

The first two methods to be described appear in Gill, Golub, Murray, and
Saundcrs (1972).
ODIFYING MODIFYING A QR DECOMPOSTION 227

Row Removal Method 1

In this method it is assumed that the matrix Q of Eq. (27.32) is available


as well as the matrix R. Partition Q and rewrite Eq. (27.32) as

Left-multiply both sides of Eq. (27.37) by an orthogonal matrix that trans-


forms the vector q to zero, modifies a, and leaves the first k rows of Q un-
changed. This can be accomplished, for instance, by a single Householder
transformation or by (m — k — 1) Givens transformations. Equation (27.37)
is thereby transformed to

Next left-multiply both members of Eq. (27.38) by a sequence of & Givens


transformations that successively transform each component of p to zero
while modifying a. Let Gij denote a Givens transformation matrix operating
on rows / and/ First left-multiply byGk,k+1,next by Gk-1,k+i, and so forth
with the final transformation matrix beingG1,k+1.This sequence of operations
assures that R is transformed to an upper triangular matrix, say, R.
By these transformations Eq. (27.38) is transformed to

Since & is the only nonzero element in its column and the column is of unit
euctidean length it follows that a = ±1. Furthermore since the rows also
have unit euclidean length u must be zero. This in turn implies that wt = uvt
= +vt Thus Eq. (27.39) can be written as

Defining
228 MODIFYING A QR DECOMPOSITION CHAP. 27

it follows that Q is an (m — 1) x (m — 1) orthogonal matrix, R is a k x n


upper triangular matrix and

Thus Q and R satisfy the conditions required of Q and H in Eq. (27.35) and
(27.36).
The question of whether the rank of Risk or k—I depends upon whether
& in Eq. (27.38) is nonzero or zero. By assumption the k diagonal elements
of It are each nonzero. If & is nonzero, it is easily verified by noting the
structure and effect of the Givens transformation matrices used in passing
from Eq. (27.38) to Eq. (27.40) that the k diagonal elements of R will also be
nonzero.
Suppose on the other hand that & is zero. Then \\p\\ = 1, which assures
that some components of p are nonzero. Letp1denote the last nonzero com-
ponent of p, i.e., pi = 0, and if 1 = k, then pt = 0 for 1 < i < k. In the order
in which the matrices Gij are applied in transforming Eq. (27.38) to Eq.
(27.40), the matrix C1,k+1 will be the first transformation matrix that is not
simply an identity matrix. This matrix Gl>k+l will be a (signed) permutation
matrix that interchanges rows / and k + 1, possibly changing the sign of one
of these rows. In particular, its effect on R will be to replace row / with a row
of zero elements. Subsequent transformations will not alter this row. There-
fore, row l of R in Eq. (27.40) will be zero.
Thus Rank (R) will be less than k. The rank of R cannot be less than k— I
since Rank (R) = Rank (C) > Rank (C) - 1 = k - 1.

Row Removal Method 2

In a problem in which m is very large or in which data are being accumu-


lated sequentially one would probably not retain the matrix Q of Eq. (27.32).
In this case Method 1 cannot be used directly. Note, however, that the only
information from the matrix Q that contributes to the determination of .R in
Method 1 is the vector p of Eq. (27.37) and (27.38) and the number & of Eq.
(27.38). We shall see that these quantities p and & can be computed from R
and v.
Rewrite Eq. (27.32) as

Then using the partitions defined in Eq. (27.37) it follows that


CHAP. 27 MODIFYING A QR DECOMPOSITION 229

or equivatently

If R is of rank n, Eq. (27.45) constitutes a nonsingular triangular system that


can be solved for p.
If the rank k of R is less than n, the system in Eq. (27.45) is still consistent
and has a unique k-dimensional solution vector p. By assumption the first k
rows of Rt constitute a k x k triangular submatrix with nonzero diagonal
elements. This submatrix along with the first k components of v defines a
system of equations that can be directly solved for p.
Having determined p the number & can be computed as

as a consequence of the unit euclidean length of the column vector (ptt a, 0)T
in Eq. (27.38). The sign of & is arbitrary and so can be taken to be nonnega-
tive, as is implied by Eq. (27.46).
Having computed p and a one can compute R by using k Givens transfor-
mations as described following Eq. (27.38) in Method 1. This part of the com-
putation is expressed by the equation

It is to be expected that Method 2 will be a somewhat less accurate com-


putational procedure than Method 1 because of the necessity of computing p
and a from Eq. (27.45) and (27.46). The accuracy with which p is determined
will be affected by the condition number of R, which, of course, is the same
as the condition number of C.
This accuracy limitation appears to be inherent, however, in any method
of updating the Cholesky factor R to reflect data removal that does not make
use of additional stored matrices, such as Q or C. For a given precision of
computer arithmetic, updating methods that operate directly on the matrix
(ATA)-1 may be expected to be significantly less reliable than Method 2.
Row Removal Method 3

The thud method to be described is of interest because it requires fewer


operations than Method 2 and because it is algorithmically very similar to
one of the methods for adding data. This permits the design of fairly compact
code that can handle either addition or removal of data. The comparative
numerical reliability of Methods 2 and 3 has not been studied.
230 MODIFYING A QR DECOMPOSITION CHAP. 27

This method has been discussed by Golub (1969), Chambers (1971), and
Gentleman (1972a and 1972b). Following the practice of these authors we
shall restrict our discussion to the case in which Rank (C) = Rank (C) = k
= n. Properties of the more general case will be developed in the Exercises.
Letting i denote the imaginary unit (i2 = — 1), Eq. (27.36) can be written
as

By formal use of the Givens equations (3.5), (3.8), and (3.9). matrices
(l)
F can be defined that will triangularize as follows:

Here the matrix F(i) differs from the identity matrix only at the intersec-
tions of rows / and k +1 with columns / and k + 1. In these four positions
F(i) contains the submatrix

where

It can be shown that our assumption that both C and & are of full rank n
implies that the numbers d (l) defined by Eq. (27.53) are positive.
The multiplication by F(i) indicated in Eq. (27.50) can be expressed entirely
in real arithmetic as follows:
CHAP. 27 MODIFYING A QR DECOMPOSITION 231

The method of defining F(i) assures thatr(l)il= p(l) andv(l)l= 0. Fur-


thermore, it can be easily verified that V(k) = 0, that each matrixR(l)is upper
triangular, that Fa>TFa) = I, and that

Thus R satisfies the conditions required of A; i.e., R is an upper triangular


Chotesky factor for the matrix C of Eq. (27.34).
In practice the inherent potential instability of the data-removal problem
will be reflected in loss of digits in the subtraction by which d(l) is computed
and in the fact that the multipliers s(l) and T(l) in Eq. (27.57) and (27.58) can
be large.
Gentleman (1972a and 1972b) points out that modifications of the Givens
method that maintain a separate array of squared row scale factors [see Eq.
(10.32)] are particularly adaptable to Row Removal Method 3. Simply by
permitting a squared row scale factor to be negative one can represent a row
such as ii? of Eq. (27.48) that has an imaginary scale factor. Thus by per-
mitting negative as well as positive values of d2 in Eq. (10.31) and d2 in
Eq. (10.33) the method given in Eq. (10.31) to (10.44) will handle data
removal as well as data augmentation. Note that when d2 is negative the
quantity t of Eq. (10.35) is no longer bounded in the interval [0,1]; hence the
problem of avoiding overflow or underflow of D and B becomes more
involved.

EXERCISES
(27.60) [Chambers (1971)]
(a) Show that the numbers s(l) and T(l) of Eq. (27.55) and (27.56)
can be interpreted as being the secant and tangent, respectively,
of an angle 0(l).
(b) Show that these angles0(l)are the same angles that appear in
the Givens transformations G(l) (c(a) = coso(l), s(l) — sin0(l))
which would occur in the process of adjoining the row if to R
to produce R, i.e., in the operations represented by
232 MODIFYING A QR DECOMPOSITION CHAP. 27

(c) With O(l), c(l), and s(l) defined as above, show that as an alterna-
tive to Eq. (27.58)v(l)icould be computed as

when c(t) = l/o(l) and t(l) = t(l)/s(l).


(27.61) Show that Row Removal Method 3 can theoretically be extended to
the general case of Rank (C) - 1 < Rank(c) < Rank(C) = k <
n using the following observations. (Continue to assume, as at the
beginning of Section 5, that the k diagonal elements of R are non-
zero.)
(a) If S(l) is nonpositive for some / = 1,..., kt let h be the index
of the first such value of s(l), i.e., S(w) < 0 and if h & 1, d(l) > 0
for 1 < l< h. Show that d(l) = 0.
(b) With h as above it follows thatv(k-1)k=Er(k-1)kkwhere e = +1 or
e = —1. In this case show that it must also be true that v(k-1)j
=er(k-1)kj for j= h+1, ..., n.
(c) With A as above, show that it is valid to terminate the algorithm
by defining the final matrix R to be the matrix R(k-1) with row
h set to zero.
APPENDIX

A BASIC UNEAR ALGEBRA


INCLUDING PROJECTIONS

In this appendix we list the essential facts of linear algebra that are used
in this book. No effort is made to present a logically complete set of con-
cepts. Our intention is to introduce briefly just those concepts that are directly
connected with the development of the book's material.
For a real number x define

An n-vector x is an ordered it-tuple of (real) numbers, xlt..., XM.


An m x if matrix A is a rectangular array of (real) numbers having m
rows and n columns. The element at the intersection of row / and column./ is
identified as atj. We shall often denote this m x n matrix by Amxn
The transpose of an m x n matrix, A, denoted AT, is the n x m matrix
whose component at the intersection of row / and column j is aji.
The matrix product of an m x it matrix A and an l x k matrix B, written
AB, is defined only if / = n, in which case C = AB is the m x A; matrix with
components

Frequently it is convenient to regard an it-vector as an it x 1 matrix. In par-


ticular, the product of a matrix and a vector can be defined in this way.
It is also often convenient to regard an m x n matrix as consisting of
m it-vectors, as its rows, or n m-vectors as its columns.
The timer product (also called scalar product or dot product) of two n-
233
234 BASIC LINEAR ALGEBRA INCLUDING PROJECTIONS AFP. A

dimensional vectors u and v is defined by

Two vectors are orthogonal to each other if their inner product is zero.
The euclidean length or euclidean norm or l2 norm of a vector v, denoted
by || v 11, is defined as

This norm satisfies the triangle inequality,

positive homogeneity,

where a is a number and u is a vector, and positive definiteness,

These three properties characterize the abstract concept of a norm.


The spectral norm of a matrix A, denoted by \\A ||, is defined as

We shall also have occasion to use the Frobenioits norm (also called the
Schur or euclidean matrix norm) of a matrix A, denoted by || A||, and defined
as

The spectral and Frobenious norms satisfy

where A is an m x n matrix and k — min (m, it).


Both of these matrix norms satisfy the three properties of an abstract
norm plus the multiplicative inequalities
AFP. A BASIC LINEAR ALGEBRA INCLUDING PROJECTIONS 236

We shall let the numeral 0 denote the zero vector or the zero matrix with
the distinction and dimension to be determined by the context.
A set of vectors v1,..., vk is linearly dependent if there exist scalars a1,
...,ak, not all zero, such that

Conversely if condition (A.4) holds only with a, = • • • = a* = 0, the vec-


tors are linearly independent.
The set of all n-dimensional vectors is an n-dimensional vector space. Note
that if u and v are members of this vector space, then so is u + v and av
where a is any scalar. These two conditions of closure under vector addition
and scalar-vector multiplication characterize the abstract definition of a
vector space; however, we shall be concerned exclusively with the particular
finite-dimensional vector space whose members are it-tuples of real numbers.
If a subset T of a vector space S is closed under vector addition and scalar-
vector multiplication, then T is called a subspace. There is a maximal number
of vectors in a subspace T that can be linearly independent. This number, m,
is the dimension of the subspace T, A maximal set of linearly independent
vectors in a subspace T is a basis for T. Every subspace T of dimension m > 1
has a basis. In fact, given any set of k < m linearly independent vectors in an
m-dimensional subspace T, there exist m — k additional vectors in T such
that these m vectors together constitute a basis for T. If the vectors MI, . . . ,
um constitute a basis for rand v e Tt then there exists a unique set of scalars
Of such that v —Emt=1a1m1.
The span of a set of vectors v 1 . . . , vk is the set of all linear combina-
tions of these vectors, i.e., the set of all vectors of the form u =Ekt=1a1v1,for
arbitrary scalars a,. The span of a set of k vectors is a subspace of dimension
m<k.
There are certain subspaces that arise naturally in connection with ma-
trices. Thus with an m x n matrix A we associate the range or column space,
which is the span of its column vectors; the null space or kernel, which is the
set{x:Ax = 0); and the row space, which is the span of the row vectors.
Note that the row space of A is the range of AT.
It is often convenient to note that the range of a product matrix, say, A =
UVW, is a subspace of the range of the leftmost factor in the product, here
U. Similarly the row space of A is a subspace of the row space of the right-
most factor, here W.
The row and column spaces of a matrix A have the same dimension. This
number is called the rank of the matrix, denoted by Rank (A). A matrix AmXH
is rank deficient if Rank (A) < min (m, n) and of full rank if Rank (A) = min
(m, n). A square matrix Amxn is nonsingular if Rank (A) = n and singular if
Rank (A) < n.
236 BASIC LINEAR ALGEBRA INCLUDING PROJECTIONS APP. A

A vector v is orthogonal to a subspace T if v is orthogonal to every vector


in T. It suffices for v to be orthogonal to every vector in a basis of T. A sub-
space T is orthogonal to a subspace U if t is orthogonal to u for every t e T
and u e U. lf two subspaces T and U are orthogonal, then the direct sum of
T and U, written V = T U, is the subspace consisting of all vectors {v: v
= t + u,t e T,u e U}. The dimension of V is the sum of the dimensions
of T and U.
If T and U are mutually orthogonal subspaces of an it-dimensional vector
space S and S = T ® U, then T and U are called orthogonal complements of
each other. This is written as T = CM and U = T . For any subspaceT,T
exists. If T is a subspace of S and j e 5, then there exist unique vectors t €
T and u e T such that s = t+u. For such vectors one has the Pythagorean
Condition ||s||2 = ||t||2 + ||u||2
A linear flat is a translated subspace; i.e., if T is a subspace of S and s E
5, then the set L = {v: v = 5 + t, t € T} is a linear flat. The dimension of a
linear flat is the dimension of the unique associated subspace T. A hyperplane
H in an n-dimensional vector space S is an (n — l)-dimensional linear flat.
If His a hyperplane and h0 e H, then T— {t: t = h — h0t h e H] is an
(n — l)-dimensional subspace and T- is one-dimensional. If u e T, then
uTh takes the same value, say, d, for all h E. H. Thus given the vector u and
the scalar d, the hyperplane H can be characterized as the set {x: utx = d}. It
is from this type of characterization that hyperplanes arise in practical com-
putation.
A halfspace is the set of vectors lying on one side of a hyperplane, i.e., a
set of the form {x: UTX > d}. A polyhedron is the intersection of a finite num-
ber of halfspaces, i.e., a set of the form {x:ut1x>dt,i=1,...,m] or equi-
valently (x: Ux;> d} where U is an m x n matrix with rows uti, d is an m
vector with components dlt and the inequality is interpreted termwise.
The elements att of a matrix Amxm are called the diagonal elements of A.
This set of elements, {att},is called the mam diagonal of A. If all other elements
of A are zero, 4 is a diagonal matrix. If aij = 0 for |i — j| > 1,4 is ^(diago-
nal.
If a,, = 0 for j < i and/ > i + 1, A is upper bidiagonal. If a,, = 0 for
j < i, A is upper triangular. If atj = 0 for j < i — I, A is upper Hessenberg.
If Ar is upper bidiagonal, triangular, or Hessenberg, then A is lower bidi-
agonal, triangular, or Hessenberg, respectively.
A square n x it diagonal matrix with all diagonal elements equal to unity
is an identity matrix and is denoted by /„ or by /.
If BA = I, then B is a left inverse of A. A. matrixAmxnhas a left inverse if
and only if Rank (A) — n< m. The left inverse is unique if and only if
Rank (A) = it = m. By transposition one similarly defines a right inverse.
If A is square and nonsingular, there exists a unique matrix, denoted A-1,
that is the unique left inverse and the unique right inverse of A and is called
the inverse of A.
APP. A BASIC LINEAR ALGEBRA INCLUDING PROJECTIONS 237

A generalization of the concept of an inverse matrix is the pseudoinverse,


denoted A+, that is uniquely defined for any matrix AmKm and agrees with the
inverse when A is square nonsingular. The row and column spaces of A+
are the same as those of AT. The pseudoinverse is intimately related to the
least squares problem and is defined and discussed in Chapter 7.
A square matrix Q is orthogonal if QTQ = L It follows from the unique-
ness of the inverse matrix that QQT = / also.
A set of vectors is orthonormal if the vectors are mutually orthogonal and
of unit euclidean length. Clearly the set of column vectors of an orthogonal
matrix is orthonormal and so is the set of row vectors.
If C«x», m > it, has orthonormal columns, then \\Q\\ = 1, ||Q||f = n1/2,
||QA|| — ||QA|| and||QA||f = \\A \\r- IfQmxnm < n, has orthonormal rows,
then ||Q\\ = 1, ||Q||r = m1/2\ \\AQ\\ = ||A||, and \\AQ\|f=\\A\\r.. Note
that if Qmxn is orthogonal, it satisfies both these sets of conditions.
Some particular orthogonal matrices that are computationally useful are
the Givens rotation matrix,

the Givens reflection matrix,

and the Householder reflection matrix,

for an arbitrary nonzero vector ti.


A permutation matrix is a square matrix whose columns are some permu-
tation of those of the identity matrix. A permutation matrix is orthogonal.
A square matrix A is symmetric if AT — A. A symmetric matrix has an
eigenvalue-eigenvector decomposition of the form

where Q is orthogonal and E is (real) diagonal. The diagonal elements of E


are the eigenvalues of A and the column vectors of Q are the eigenvectors of
A. The jth column vector of Q, say, qj, is associated with the jth eigenvalue,
ejj, and satisfies the equation Aq, = etflt. The matrix A — eijI is singular.
The eigenvalues of a symmetric n x n matrix A are unique. If a number A
occurs m times among the n eigenvalues of A, then the m-dimensional sub-
space spanned by the m eigenvectors (columns of 0 associated with A is
238 BASIC LINEAR ALGEBRA INCLUDING PROJECTIONS AIT. A

uniquely determined and is called the eigenspace of A associated with the


eigenvalue L
A symmetric matrix is positive definite if all its eigenvalues are positive.
A positive definite matrix P is also characterized by the fact that xTPx > 0
for all x = 0.
A symmetric matrix S is nonnegative definite if its eigenvalues are non-
negative. Such a matrix satisfies xtSx ;> 0 for all x=0.
For any m x n matrix A the matrix S = ATA is symmetric and nonnega-
tive definite. It is positive definite if A is of rank n.
An invariant space of a square matrix A is a subspace T such that x e T
implies Ax e T. If S is a symmetric matrix, then every invariant space of S
is the span of some set of eigenvectors of S, and conversely the span of any
set of eigenvectors of S is an invariant space of S.
A symmetric matrix P is a projection matrix if all its eigenvalues are either
unity or zero. A matrix P is idempotent if P2 = P [equivalently P(I — P) —
0]. A matrix P is a projection matrix if and only if P is symmetric and
idempotent.
Let Pmxnbe a projection matrix with k unit eigenvalues. Let T denote the
k-dimensional eigenspace of P associated with the unit eigenvalues. The
subspace T is the (unique) subspace associated with the projection matrix P
and P is the (unique) projection matrix associated with the subspace T. The
subspace T is both the row space and column space of its associated projec-
tion matrix P and is characterized by

If Pis a projection matrix with associated subspace T and |jPx|| = ||x||,


then Px — x and x e T. Furthermore ||Px|| < \\x\\ for x e T and thus
||P|| =1 unless P=0.
For any matrix Amxn the matrix A*A is the n x n projection matrix as-
sociated with the row space of A, (I — A*A) is the projection matrix as-
sociated with the null space of A, and AA+ is the m x m projection matrix
associated with the range (column space) of A.
Given any n-vector x the unique representation of x as x= t + u where
t e T and u e T is given by t = Px and u = (I — P)x. The vector t is the
nearest vector to x in T in the sense that ||x — t|| = min{||x —1>||: v e T}.
The vector t is called the projection of x onto T.
For a discussion of projection operators including proofs of most of the
assertions above see pages 43-44 of Halmos (1957).
An arbitrary matrix AmKn has a singular value decomposition A =
UmxmSmxmVTmxn where U and V are orthogonal and S is diagonal with non-
negative diagonal elements. The diagonal elements of S (su, i — 1,..., k,
where k = min [m, n]) are the singular values of A. This set of numbers is
APP. A BASIC LINEAR ALGEBRA INCLUDING PROJECTIONS 239

uniquely determined by A. The number of nonzero singular values is the


rank of A. Furthermore

which provides a simple proof of the two rightmost inequalities (A.3).


Let A = USVt be a singular value decomposition of Amxn. Then eigen-
value decompositions of ATA and AAT are given by ATA — V(StS)Vr and
AAT^U(SSr)VT.
The singular value decomposition provides information useful in the
practical analysis of linear algebraic problems and is discussed further in
Chapters 4,5,18,25, and 26.
APPENDIX

B PROOF OF GLOBAL QUADRATIC


CONVERGENCE OF THE QR ALGORITHM

The purpose of this appendix is to give a proof of Theorem (18.5). The


proof given is a more detailed version of that given in Wilkinson (1968a and
1968b). Conclusion (a) of Theorem (18.5) is left as Exercise (18.46).
The content of Lemma (B.I) will be found, with proof, in Wilkinson
(1965a).
(B.I) LEMMA
Let A be an n X n symmetric tridiagonal matrix with diagonal terms
a1 ..., a. and super- and subdiagonal terms b 2 ,..., bB where each
bt is nonzero. Then the eigenvalues of A are distinct.
We also need the following simple lemma whose proof is left as Exercise
(B.52). Use will be made of notation introduced in the text and equations of
Chapter 18 from the beginning through Eq. (18.4).
(B.2) LEMMA
The elements of the matrices Ak and the shift parameters ak are bounded
in magnitude by ||A||, and the elements of the matricesand
k
la,)

k—
(Ak-skin
s

) and
Rk are bounded in magnitude by 2 \\A\\, for all k= 1,2,
We must examine the basic operations of the QR method with shifts to
establish properties of certain intermediate quantities and ultimately to esti-
mate the magnitude of certain of the off-diagonal terms of the matrices Ak.
Denote the diagonal terms of the shifted matrix (Ak —skIn)by

By the choice rule for sk [see Eq. (18.4)] it follows that the eigenvalue of the
240
AFP. B GLOBAL QUADRATIC CONVERGENCE OF THE QR ALGORITHM 241

lower right 2x2 submatrix of (Ak —skin,)which is closest to ai*> is zero.


Thus

and

The orthogonal matrix Qk is determined in the form

where eachJ(k)t-t,lis a rotation in the plane determined by the (i — l)st and ith
coordinate axes. The scalarsc(k)iands(k)tof the rotation matrices, as given in
Lemma (3.4), define the rotation by means of the identity

Here each P\k) is defined as in Eq. (3.5) with c and s appropriately super- and
subscripted.
Following premultiplication of (Ak — skIn) by the first i — 2 of the rota-
tion matrices we shall have

Premultiplying both sides of Eq. (B.7) byJ(k)t-t,tshows, by means of Eq. (3.7)


to (3.9), that the following recurrence equations hold.
242 GLOBAL QUADRATIC CONVERGENCE OF THE QR ALGORITHM APP. B

The similarity transformation is completed by forming

In this process the new off-diagonal elements are computed as

We wish to analyze the convergence ofb(k+1)as k —» oo.


In the Eq. (B.17) to (B.19) we suppress the superscript (k) but write the
superscript (k + 1). Beginning with Eq. (B.I6) with j = it we have

Eliminating pn-1 between Eq. (B.ll) and (B.16) gives

Then using Eq. (B.17) and (B.18) we obtain


APP. B GLOBAL QUADRATIC CONVERGENCE OF THE QR ALGORITHM 243

From Eq. (B.4) expressed as 1 and Eq. (B.I7) we have

From Eq. (B.I9) we see that

so that converges to a limit L > 0 as


(B.22) LEMMA The limit L is zero.
Proof: Assume L > 0. Passing to the limit, as k —> oo, in both
members of Eq. (B.I9) yields

From Eq. (B.10) and (B.24) and the fact that the sequence is
bounded [Lemma (B.2)], we obtain

Thus from Eq. (B.13) and (B.14),

Since Eq. (B.26) impliesc(k)n-1— 0, it follows from Eq. (B.28) that

From Eq. (B.25), (B.26), and (B.29) we haveB(k)n-1—>0, but since the
sequence{b(k)n}is bounded, this implies , contradicting the as-
sumption that L > 0. This completes the proof of Lemma (B.22).
(B.30) LEMMA
The sequence k =1,2,..., contains arbitrarily small terms.

Proof: Let T > 0. By Lemma (B.2) the sequences {|B(k)n'|}_and


are bounded. Thus by Lemma (B.22) there exists an integer k for
which either
244 GLOBAL QUADRATIC CONVERGENCE OF THE QR ALGORITHM APP. B

or

If Eq. (B.31) holds, then with Eq. (B.20) we have

It follows that for any t > 0 there exists an integer £ depending upon t such
that

This completes the proof of Lemma (B.30).


(B.33) LEMMA
If $\tau > 0$ is sufficiently small and if $\bar{k}$ is such that $|b_n^{(\bar{k})}| < \tau$, then
$|b_n^{(k)}| < \tau$ for all $k \geq \bar{k}$.
Proof: Define $\delta = \min\{|\lambda_i - \lambda_j| : i \neq j\}$. Note that $\delta > 0$ since
the eigenvalues $\lambda_i$ of A are distinct.
Define

Let $\tau_0 > 0$ be a number small enough so that

and

Choose $\tau$, $0 < \tau < \tau_0$, and let $\bar{k}$ be an integer such that

For notational convenience we introduce

If $b_n^{(\bar{k})} = 0$, the algorithm has converged. Otherwise, without loss of generality,
we may assume $\epsilon > 0$.

Since $a_n^{(k)} a_{n-1}^{(k)} = \epsilon^2$, we can write

Let $\mu_1, \ldots, \mu_{n-1}$ be the roots of the $(n-1) \times (n-1)$ symmetric matrix B,


and $\lambda_1', \ldots, \lambda_n'$ be the roots of the shifted matrix $A_k - s_k I_n$. By Theorem
(5.1), the ordered eigenvalues of $A_k - s_k I_n$ and those of

differ at most by $\epsilon$. Thus with a possible reindexing of the $\lambda_i'$ we have the
inequalities

From the identity

we have

Now using Eq. (B.41) on the second and third terms of the right side of
inequality (B.42), observing that $|\xi - \lambda_m| \geq \delta$, followed by use
of Eq. (B.4) with $b_n^{(k)} = \epsilon$ on the fourth term of the right side of inequality
(B.42), we have

At the next-to-last step in the forward triangularization of $A_k - s_k I_n$ the


configuration in the lower right 2x2 submatrix is seen by Eq. (B.3), (B.7),

(B.14), and (B.39) to be

Note that $x_{n-1}^{(k)}$ of Eq. (B.44) is a diagonal element of the
$(n-1) \times (n-1)$ upper triangular matrix resulting from premultiplication of the
matrix B of Eq. (B.40) by $(n-2)$ rotation matrices. Thus by Eq. (6.3)
and (B.43)

Completing the similarity transformation we have, from Eq. (B.3), (B.9),


(B.13), and (B.17) with

Now from Eq. (B.11) and the inequality (B.45)

Finally, using Eq. (B.46) and the inequality (B.47) we have

The inequality in Eq. (B.48) shows that if is bounded away from


zero, then the convergence ultimately is cubic. However, all we know general-
ly from Eq. (B.4) is that , and therefore

or, recalling Eq. (B.34) and (B.39),

From conditions (B.36) to (B.38) we have $f(b_n^{(k)}) < 1$ and thus



By induction it follows that inequalities (B.50) and (B.51), with $\bar{k}$ replaced by
$\bar{k} + l$, hold for all $l = 0, 1, \ldots$. This completes the proof of Lemma (B.33).
Lemma (B.33) implies that $b_n^{(l)} \to 0$ as $l \to \infty$, which is conclusion (b)
of Theorem (18.5). Since $b_n^{(l)} \to 0$ and the inequality in Eq. (B.50) holds for
all sufficiently large k, the quadratic convergence asserted in the final con-
clusion (c) of Theorem (18.5) is also established.
We conclude this appendix with these remarks:
1. The shifts sk do not need to be explicitly subtracted from each of the
diagonal terms of Ak when forming Ak+1. Discussion of this as it applies to
computation of the singular value decomposition is given in Chapter 18.
2. In practice the iteration procedure for each eigenvalue is terminated
when the terms $b_n^{(k)}$ are "zero to working accuracy." This can be defined in a
variety of ways. One criterion for this is given in Chapter 18 as it concerns the
numerical aspects of the singular value decomposition.
3. The proof we have given assumes that $n \geq 3$. For n = 2 it is easy to
see that if

then performing one shifted QR transformation as given in Eq. (18.1) to


(18.4) gives

with $b_2^{(2)} = 0$. Thus $A_2$ is diagonal and the eigenvalues of A have been com-


puted.
4. More generally, if an actual eigenvalue $\lambda$ of A instead of $s_k$ of Eq. (18.4)
were used as a shift, then the matrix $A_{k+1}$ would break up after the next QR
sweep. In fact, $b_n^{(k+1)} = 0$ and $a_n^{(k+1)} = \lambda$. To verify this note that since $A_k - \lambda I_n$
is rank deficient, at least one of the diagonal terms $p_i^{(k)}$ of the triangular
matrix $R_k$ must be zero. From Eq. (B.8), $p_i^{(k)} > 0$ for $i = 1, \ldots, n-1$ and
therefore it must be $p_n^{(k)}$ that is zero. Upon completion of the similarity trans-
formation and translation of Eq. (B.15), while using Eq. (B.17), we see that
$b_n^{(k+1)} = s_n^{(k)} p_n^{(k)} = 0$, and therefore $a_n^{(k+1)} = \lambda$.

EXERCISE

(B.52) Prove Lemma (B.2).


APPENDIX C

DESCRIPTION AND USE OF FORTRAN CODES FOR SOLVING PROBLEM LS

Introduction

Throughout this book, attention has focused primarily on the mathematical


and algorithmic descriptions of various methods for obtaining solutions
to Problem LS (Chapter 1) or its variants such as Problem LSE (20.1)
and Problem LSI (23.1). These algorithms can be implemented in any of
several well-known computer source languages such as Fortran, C, or Ada.
Accomplishing this can be expensive and time consuming. For this reason
we have prepared Fortran source codes for some of the algorithms. All but
one of the codes are written to conform to the Fortran 77 standard. The
exception is BVLS, which conforms to the Fortran 90 standard.
These codes can be obtained from the NETLIB site on the Internet.
NETLIB has become the preeminent mechanism in the known universe
for collecting and distributing public-domain mathematical software. For
background information on NETLIB see Dongarra and Grosse (1987), Don-
garra, Rowan, and Wade (1995), and Grosse (1995) in Appendix D. The
current alternate Internet addresses for NETLIB for general ftp and e-mail
access are netlib.org and netlib@research.att.com. A mirrored ad-
dress in Norway is netlib.no. The URL for World Wide Web access to
NETLIB is http://www.netlib.org.
The 1974 versions of these programs (except PROG7 and BVLS) were
compiled and executed successfully on the UNIVAC 1108 and IBM 360/75
computer systems at the Jet Propulsion Laboratory, on the IBM 360/67 at
Washington State University, on the CDC 6500 at Purdue University, on
the CDC 6600 and 7600 at the National Center for Atmospheric Research,
and on the CDC 7600 at Aerospace Corporation. We wish to express our
appreciation to Drs. T. J. Aird, Alan Cline, and H. J. Wertz and Mr. C.
T. Verhey for their valuable cooperation in making these runs on CDC
computers.

Summary of subprograms implemented


The subprograms implemented are

• HFTI. Implements Algorithm HFTI of Chapter 14. Calls H12 and


DIFF.

• SVA. Implements singular value analysis and Levenberg-Marquardt


analysis as described in Chapters 18 and 25. Produces printed output
of quantities of interest. Calls SVDRS and MFEOUT.
• SVDRS. Computes the singular value decomposition as described in
Chapter 18. Called by SVA. Calls H12 and QRBD.
• QRBD. Computes the singular value decomposition of a bidiagonal
matrix as described in Chapter 18. Called by SVDRS. Calls DIFF
and Gl.
• BNDACC and BNDSOL. Implements the band-limited sequential
least squares algorithm of Chapter 27, Section 2. Calls H12.
• LDP. Solves the least distance programming problem as described
in Chapter 23. Calls NNLS and DIFF.
• NNLS. Computes a least squares solution, subject to all variables
being nonnegative, as described in Chapter 23. Called by LDP. Calls
H12, DIFF, and Gl.
• H12. Constructs and applies a Householder transformation as de-
scribed in Chapter 10. Called by HFTI, SVDRS, NNLS, and
BNDACC.
• G1 and G2. Constructs and applies a Givens rotation as described
in Chapter 10. Gl is called by QRBD and NNLS. The code repre-
senting Algorithm G2 appears inline in QRBD and NNLS.
• MFEOUT. Prints a two-dimensional array in a choice of two for-
mats. Called by SVA.
• GEN. Generates a sequence of numbers for use in constructing test
data. Used by PROG1, PROG2, and PROG3.
• DIFF. Computes the difference between two floating point argu-
ments. Called by HFTI, QRBD, LDP, and NNLS to support a
method of comparing computed quantities against the precision limit
of the host computer.
• BVLS. A Fortran 90 subroutine that computes a least squares solu-
tion, subject to all variables having upper and lower bounds.

Summary of main programs implemented


Seven main programs are implemented to demonstrate the various algo-
rithms and the usage of the subprograms.
• PROG1. Demonstrates Algorithms HFT, HS1, and COV of Chap-
ters 11 and 12. Calls H12 and GEN.
• PROG2. Demonstrates Algorithms HFTI and COV of Chapters
14 and 12, respectively. Calls HFTI and GEN.
• PROG3. Demonstrates the singular value decomposition algorithm
of Chapter 18. Calls SVDRS and GEN.
• PROG4. Demonstrates singular value analysis including computa-
tion of Levenberg-Marquardt solution norms and residual norms as
described in Chapters 18, 25, and 26. Calls SVA and reads test data
from the file DATA4. The file DATA4 contains the data listed in
Table 26.1, and the computation produces the results shown in Figure
26.1.
• PROG5. Demonstrates the band-limited sequential accumulation
algorithm of Chapter 27, Sections 2 and 4. The algorithm is used
to fit a cubic spline (with uniformly spaced breakpoints) to a table
of data. This example is discussed in Chapter 27, Sections 2 and 4.
Calls BNDACC and BNDSOL.
• PROG6. Computes the constrained line-fitting problem given as an
example in Section 7 of Chapter 23. The program illustrates a typical
usage of the subroutine LDP, which in turn uses NNLS. PROG6
also calls SVDRS.
• PROG7. A Fortran 90 program that demonstrates the use of BVLS.
Discussion of PROG1, PROG2, and PROG3
The program PROG1 is an illustration of the simplest method of solv-
ing a least squares problem described in this book. It implements the
Algorithms HFT, HS1, and COV of Chapters 11 and 12. This code may
give unacceptable answers if the matrix is nearly rank-deficient, so beware:
no checking for nearly rank-deficient matrices is made in PROG1.
Methods coping with problems having these nearly rank-deficient ma-
trices are illustrated by PROG2 and PROG3, which demonstrate, re-
spectively, Algorithms HFTI (Chapter 14) and SVD (Chapter 18). The
reader who is solving least squares problems with near rank deficiencies
(highly correlated variables) would be advised to use code based on one

of these more general programs. The program PROG2 also implements


Algorithm COV (Chapter 12) in the cases that are of full pseudorank.
The programs PROG2 and PROG3 each use an absolute tolerance
parameter τ for pseudorank determination. In PROG2, τ is compared
with diagonal elements of the triangular matrix resulting from Householder
triangularization, while in PROG3, τ is compared with the singular values.
Each of the three main programs, PROG1, PROG2, and PROG3,
uses the FUNCTION subprogram GEN to generate the same set of
36 test cases. For the first 18 cases the data consist of a sequence of
integers between —500 and 500, with a period of 10. This short period
causes certain of the matrices to be mathematically rank-deficient. All the
generated matrices have norms of approximately 500.
For the first 18 cases PROG2 sets the absolute tolerance τ = 0.5.
In PROG3 a relative tolerance ρ is set to 10^-3. Then for each case the
absolute tolerance is computed as τ = ρ·s1, where s1 is the largest singular
value of the test matrix A. Recall that s1 = ||A||.
The second set of 18 cases are the same as the first 18 except that
"noise," simulating data uncertainty, is added to all data values. The
relative magnitude of this "noise" is approximately ν = 10^-4. For these
cases PROG2 sets τ = 10νa, where a is preset to 500 to represent the
approximate norm of the test matrices. The program PROG3 sets ρ = 10ν
and then, for each test matrix A, computes τ = ρ||A||.
This method of setting τ, particularly in the second 18 cases, is intended
to stress the idea of choosing τ on the basis of a priori knowledge about
the size of data uncertainty.
If the reader executes these programs, he or she will find that in the
pseudorank-deficient cases the solutions computed by PROG2 and
PROG3 are similar and of order of magnitude unity, while those com-
puted by PROG1 contain very large numbers that would probably be
unacceptable in practical applications. For example, for the test case
with m = 7, n = 6, and a noise level of zero, the values computed by
PROG1, PROG2, and PROG3 for the first component of the solution
vector on a computer with precision of about 10^-8 were 0.237e8, 0.1458,
and 0.1458, respectively. With a relative noise level of 10^-4, the values
computed were -0.4544e4, 0.1470, and 0.1456. In these cases PROG1
reported a pseudorank of 6, whereas PROG2 and PROG3 reported a
pseudorank of 4.

Machine and Problem Dependent Tolerances


In order to avoid storing the machine precision constant η in various sub-
programs, tests of the general form "If (|h| > η|x|)" have been replaced by

the test "If((x + h) — x= 0)." To obtain the intended result, it is essential


that the sum x + h be truncated (or rounded) to n-precision before com-
puting the difference (a: + h) — x. If the expression (x+ h) — x were written
directly in Fortran code, there is the possibility that the intended result
would not be obtained due to an optimizing compiler or the use of an ex-
tended length accumulator. To circumvent these hazards a FUNCTION
subprogram, DIFF, is used so that the test is coded as "If(DIFF(x + h,x)
= 0)." In newer languages, such as Fortran 90, ANSI C, and Ada, access to
a machine precision parameter is provided by the language, so techniques
such as this are not needed.
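A minimal sketch of such a function (an illustration consistent with the description above, not a listing of the distributed code) is

C     Returning X - Y from a separately compiled FUNCTION forces the
C     argument X + H to be rounded to working precision before the
C     subtraction, defeating extended-precision accumulators.
      REAL FUNCTION DIFF(X, Y)
      REAL X, Y
      DIFF = X - Y
      RETURN
      END

so that a test such as IF (DIFF(X+H, X) .EQ. 0.0E0) behaves as intended.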
The call statement for HFTI includes a parameter, TAU, which is
used as an absolute tolerance in determining the pseudorank of the matrix.
It is intended that the user set the value of this parameter to express
information about the accuracy of the data rather than to express the
machine's arithmetic precision.

Conversion of Fortran 77 Codes Between REAL and


DOUBLE PRECISION
All of the variables in these codes are declared. All constants are intro-
duced in PARAMETER statements. All intrinsic functions are used by
their generic names. As a consequence it should be possible to change
these Fortran 77 codes between REAL and DOUBLE PRECISION just by
making changes in the declaration part of the codes.
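As an illustration (this fragment is not taken from the distributed codes), converting a routine written in this style requires editing only its declarations:

C     REAL version of a declaration section; the DOUBLE PRECISION
C     version replaces only the two lines that follow with the
C     commented alternatives.
      REAL X, H, ZERO, ONE
      PARAMETER (ZERO = 0.0E0, ONE = 1.0E0)
C     DOUBLE PRECISION X, H, ZERO, ONE
C     PARAMETER (ZERO = 0.0D0, ONE = 1.0D0)
      H = ONE + ONE
      X = SQRT(ABS(H - ZERO))

The executable statements are unchanged because the intrinsic functions are referenced by their generic names.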

The Algorithm BVLS


The NNLS algorithm first appeared in the 1974 edition of this book. A
number of readers of the book have informed us that they have used NNLS
successfully in a variety of applications. NNLS has also been incorporated
in a number of commercial software packages.
Since 1974 we developed an algorithm and code for the Bounded Vari-
ables Least Squares problem, BVLS. This is a generalization of the NNLS
algorithm in which the constraints on the variables are αi ≤ xi ≤ βi rather
than the constraints xi ≥ 0 of NNLS. The structure of the BVLS algorithm
is essentially the same as NNLS, with additional details to deal with the
two-sided inequalities.

User's Guides for the Subprograms



HFTI

USER'S GUIDE TO HFTI: SOLUTION OF THE LEAST SQUARES


PROBLEM BY HOUSEHOLDER TRANSFORMATIONS

Subroutines Called H12, DIFF


Purpose
This subroutine solves a linear least squares problem or a set of linear
least squares problems having the same matrix but different right-side vectors.
The problem data consists of an M x N matrix A, an M x NB matrix B,
and an absolute tolerance parameter t. The N B column vectors of B represent
right-side vectors bj for NB distinct linear least squares problems.

This set of problems can also be written as the matrix least squares problem:

where X is the N x NB matrix having column vectors xj.


Note that if B is the M x M identity matrix, then X will be the pseudo-
inverse of A.
Method
This subroutine first transforms the augmented matrix [A: B] to a matrix
[R: C] using premultiplying Householder transformations with column inter-
changes. All subdiagonal elements in the matrix R are zero and its diagonal
elements satisfy |r_ii| ≥ |r_{i+1,i+1}|, i = 1, ..., l−1, where l = min{M, N}.
The subroutine will set the pseudorank KRANK equal to the number of
diagonal elements of R exceeding τ in magnitude. Minimal length solution
vectors Xj, J = 1,..., NB, will be computed for the problems defined by
the first KRANK rows of [R: C].
If the relative uncertainty in the data matrix B is ρ, it is suggested that τ
be set approximately equal to ρ||A||.
For further algorithmic details, see Algorithm HFTI in Chapter 14.
Usage
DIMENSION A(MDA, n1), {B(MDB, n2) or B(m1)}, RNORM(n2),
H(n1), G(n1)
INTEGER IP(n1)
CALL HFTI (A, MDA, M, N, B, MDB, NB, TAU, KRANK, RNORM,
H, G, IP)
The dimensioning parameters must satisfy MDA ≥ M, n1 ≥ N, MDB
≥ max{M, N}, m1 ≥ max{M, N}, and n2 ≥ NB.
The subroutine parameters are defined as follows:

A(,), MDA, M, N   The array A(,) initially contains the M x N matrix
A of the least squares problem AX ≅ B. The first
dimensioning parameter of the array A(,) is MDA,
which must satisfy MDA ≥ M. Either M ≥ N or
M < N is permitted. There is no restriction on the
rank of A. The contents of the array A(,) will be
modified by the subroutine. See Fig. 14.1 for an exam-
ple illustrating the final contents of A(,).
B( ), MDB, NB   If NB = 0 the subroutine will make no references
to the array B( ). If NB > 0 the array B( ) must
initially contain the M x NB matrix B of the least
squares problem AX ≅ B and on return the array
B( ) will contain the N x NB solution matrix X.
If NB ≥ 2 the array B( ) must be doubly sub-
scripted with first dimensioning parameter MDB ≥
max {M, N}. If NB = 1 the array B( ) may be
either doubly or singly subscripted. In the latter case
the value of MDB is arbitrary but some Fortran
compilers require that MDB be assigned a valid
integer value, say MDB = 1.
TAU Absolute tolerance parameter provided by user for
pseudorank determination.
KRANK Set by the subroutine to indicate the pseudorank
of A.
RNORM( )   On exit, RNORM(J) will contain the euclidean
norm of the residual vector for the problem defined
by the Jth column vector of the array B(,) for J =
1, ..., NB.
H( ), G( )   Arrays of working space. See Fig. 14.1 for an example
illustrating the final contents of these arrays.
IP( ) Array in which the subroutine records indices describ-
ing the permutation of column vectors. See Fig. 14.1
for an example illustrating the final contents of this
array.
Example of Usage
See PROG2 for an example of the usage of this subroutine.
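The following fragment (added here for illustration, not part of PROG2; the data and the value TAU = 1.0E-4 are assumptions) shows a typical single right-side call:

C     Solve one 8 x 4 least squares problem A*x ~ b with HFTI.
C     The tolerance TAU is an assumed value reflecting data accuracy.
      INTEGER M, N, KRANK, IP(4)
      REAL A(8,4), B(8), RNORM(1), H(4), G(4), TAU
      M = 8
      N = 4
C     ... fill A(1:M,1:N) and B(1:M) with the problem data ...
      TAU = 1.0E-4
      CALL HFTI (A, 8, M, N, B, 8, 1, TAU, KRANK, RNORM, H, G, IP)
C     On return B(1:N) holds the minimal length solution, RNORM(1)
C     the residual norm, and KRANK the pseudorank.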

SVA

USER'S GUIDE TO SVA: SINGULAR VALUE ANALYSIS

Subroutines Called SVDRS, MFEOUT


Purpose
The subroutine SVA uses subroutine SVDRS to obtain the singular
value decomposition of a column-scaled matrix Ã = AD and the associated
transformation of the right-side vector b of a least squares problem Ax ≅ b.
The subroutine SVA prints quantities derived from this decomposition to
provide the user with information useful to the understanding of the given
least squares problem.
Method
The user provides an m x n matrix A and an m-vector b, defining a least
squares problem Ax ≅ b. The user selects one of three options regarding
column scaling described below in the definition of the parameter ISCALE.
This selection defines an n x n diagonal matrix D.
Introducing the change of variables

the subroutine performs a singular value analysis of the least squares problem

where

The subroutine SVA uses the subroutine SVDRS to compute the singu-
lar value decomposition

and the transformed right-side vector

Denote the (ordered) singular values of Ã, i.e., the diagonal elements of S,
by s1, ..., sm̃. Let ñ denote the index of the last nonzero singular value and
define m̃ = min(m, n).
Compute the vector

with components

The kth candidate solution vector y(k) is defined as

where Vj denotes the jth column vector of V. Reverting to the original vari-
ables we have the corresponding candidate solutions

The quantities

are computed. For k ≤ ñ the quantity ρk² is the sum of squares of residuals
associated with the kth candidate solution; i.e., ρk² = ||b − Ax(k)||².
It is possible that the m x (n + 1) data matrix [A: b] provided to this
subroutine may be a compressed representation of a least squares problem
involving more than m rows. For example, [A: b] may be the triangular
matrix resulting from sequential Householder triangularization of a large set
of data as described in Chapter 27. The user provides an integer MDATA
that specifies the number of rows of data in the original problem. Of course,
if [A : b] is the original data, then MDATA and M should be set to the same
value. The number MDATA is used in computing

Under appropriate statistical hypotheses on the data [A : b], the number
σk can be interpreted as an unbiased estimate of the standard deviation of
errors in the data vector b.
Adapting Eq. (25.47) and (25.48) to the notation of this Subroutine User's
Guide, we compute the following quantities, which may be used for a Leven-
berg-Marquardt analysis:

This subroutine has the capability of printing output information that


is organized into six blocks. The user has the option, via the argument
KPVEC, to select individually which of these blocks are to be printed.
The contents of the blocks are as follows:
1. The quantities M,N, MDATA, identification of the scaling option
used, and the diagonal elements of the scaling matrix D.
2. The matrix V, multiplied by 10^4 to facilitate scanning for large and
small elements.
3. Singular values and related quantities: sk for k = 1, ..., m̃; ρk, sk^-1,
gk, gk², for k = 1, ..., ñ; and ρk² and σk for k = 0, ..., ñ.
4. Solution norm versus residual norm at different singular value cutoffs:
k, ||y(k)||, ρk, log10 ||y(k)||, and log10 ρk, for k = 1, ..., ñ.
5. Levenberg-Marquardt analysis: λ, v(λ), w(λ), log10 λ, log10 v(λ), and
log10 w(λ) for a sequence of 21 values of λ ranging from 10·s1 to sñ/10 in
increments that are uniformly spaced in log10 λ.
6. Candidate solutions at different singular value cutoffs: x(k) for k =
1, ..., ñ.

Usage

INTEGER MDA, M, N, MDATA, KPVEC(4), ISCALE
REAL A(MDA,n1), B(m1), SING(n1), D(n1), WORK(n2)
CHARACTER NAMES(n1)*(lennam)
CALL SVA (A, MDA, M, N, MDATA, B, SING, KPVEC, NAMES,
ISCALE, D, WORK)
The dimensioning parameters must satisfy MDA ≥ max(M, N),
n1 ≥ N, m1 ≥ M, n2 ≥ 2×N, lennam ≥ 1.
The subroutine parameters are defined as follows:
A(,),MDA,M,N

The array A(,) initially contains the M x N matrix A of the
least squares problem Ax ≅ b. Either M ≥ N or M < N is
permitted. The first dimensioning parameter of the array A(,)
is MDA, which must satisfy MDA ≥ max(M, N). On output
the kth candidate solution x(k) will be stored in the first N
locations of the kth column of A(,) for k = 1, ..., min(M, N).
MDATA
The number of rows in the original least squares problem, which
may have had more rows than [A : b]. MDATA is used only
in computing the numbers σk, k = 0, ..., ñ.

B( )
The array B( ) initially contains the M-vector b of the least
squares problem Ax ≅ b. On output the M-vector g is stored
in the array B( ).
SING( )
On return the singular values of the scaled matrix Ã are stored
in descending order in SING(i), i = 1, ..., min(M, N). If M
< N, SING(M+1) through SING(N) will be set to zero.
KPVEC( )
Option array controlling report generation. If KPVEC(1) = 0,
default settings will be used, producing the full report, sending
it to the standard system output unit, formatted with a max-
imum line length of 79. If KPVEC(1) = 1, the contents of
KPVEC(i), i = 2, ..., 4, set options for the report as follows:
KPVEC(2)
The decimal representation of KPVEC(2) must be at most 6
digits, each being 0 or 1. The decimal digits will be interpreted
as independent on/off flags for the 6 possible blocks of the report
that were described above. Examples: 111111, which is the
default value, selects all blocks. 101010 selects the 1st, 3rd,
and 5th blocks. 0 suppresses the whole report.
KPVEC(3)
Define UNIT = KPVEC(3). If UNIT = −1, which is the
default value, output will be written to the "*" output unit,
i.e., the standard system output unit. If UNIT > 0, UNIT will

be used as the output unit number. The calling program is


responsible for opening and/or closing the selected output unit
if the host system requires these actions.
KPVEC(4)
Determines the width of blocks 2, 3, and 6 of the output re-
port. Define WIDTH = KPVEC(4). The default value is 79.
Each output line will have a leading blank for Fortran "carriage
control" with line widths as follows: Output blocks 1, 4, and
5 always have 63, 66, and 66 character positions, respectively.
Output blocks 2 and 6 will generally have at most WIDTH char-
acter positions. One output line will contain a row number, a
name from NAMES( ), and from one to eight floating-point
numbers. The space allocated for a name will be that needed
for the longest name in NAMES( ), which may be less than
the declared length of the elements of NAMES( ). The line
length will only exceed WIDTH if this is necessary to accom-
modate a row number, a name, and one floating-point number.
Output block 3 will have 69 character positions if WIDTH <
95 and otherwise will have 95 character positions.
NAMES( )
NAMES(J), for j = 1, ..., N, may contain a name for the jth
component of the solution vector. If NAMES(1) contains only
blank characters, it will be assumed that no names have been
provided, and this subroutine will not access the NAMES( )
array beyond the first element.
ISCALE
Set by user to 1,2, or 3 to select the column scaling option.
1. The subroutine will use identity scaling and ignore the D( )
array.
2. The subroutine will scale nonzero columns of A to have
unit Euclidean length, and will store reciprocal lengths of the
original nonzero columns in D(). If column j of A has only
zero elements, D(j) will be set to one.
3. User supplies column scaling factors in D( ). The subroutine
will multiply column j of A by D(j) and remove the scaling
from the solution at the end.
D( )

Usage of D( ) depends on ISCALE as described above. When


used, its length must be at least N.
WORK( )
Scratch work space of length at least 2xN.

Example of Usage

See PROG4 for an example of the usage of this subroutine.


Output printed by this subroutine is illustrated in Fig. 26.1.
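The following fragment (added here for illustration; the problem size and scaling choice are assumptions, not part of PROG4) shows a typical call using the default report options:

C     Singular value analysis of an assumed 15 x 5 problem with
C     identity column scaling (ISCALE = 1) and the default report.
      INTEGER KPVEC(4)
      REAL A(15,5), B(15), SING(5), D(5), WORK(10)
      CHARACTER NAMES(5)*8
C     ... fill A(1:15,1:5) and B(1:15) with the problem data ...
      NAMES(1) = ' '
      KPVEC(1) = 0
      CALL SVA (A, 15, 15, 5, 15, B, SING, KPVEC, NAMES, 1, D, WORK)
C     MDATA = 15 here because [A:b] is the original data.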

SVDRS
USER'S GUIDE TO SVDRS: SINGULAR VALUE
DECOMPOSITION OF PROBLEM LS

Subroutines Called H12, QRBD

Purpose
Given an M x N matrix A and an M x NB matrix B, this subroutine
computes the singular values of A and also computes auxiliary quantities
useful in analyzing and solving the matrix least squares problem AX ≅
B. Denoting the singular value decomposition of A by A = USVᵀ, this
subroutine computes S, V, and G = UᵀB.
To complete the solution of the least squares problem AX = B, the
user must first decide which small singular values are to be treated as zero.
Let S+ denote the matrix obtained by transposing S and reciprocating
the significant singular values and setting the others to zero. Then the
solution matrix X can be obtained by computing P = S+G and X = VP.
Either M > N or M < N is permitted. Note that if B = I, then X is the
pseudoinverse of A.
Input
The matrices A and B and their dimensioning parameters are input.
Output
The matrices V, G = UᵀB, and the diagonal elements of S are output in
the arrays A(,), B(,), and S(), respectively.

Usage

INTEGER MDA, M, N, MDB, NB
REAL A(MDA,n1), {B(MDB,n2) or B(m1)}, S(n3), WORK(n4)
CALL SVDRS (A, MDA, M, N, B, MDB, NB, S, WORK)
The dimensioning parameters must satisfy MDA ≥ max(M, N),
n1 ≥ N, MDB ≥ M, n2 ≥ NB, m1 ≥ M, n3 ≥ N, and
n4 ≥ 2 × N.
The subroutine parameters are defined as follows:
A(,),MDA,M,N
The array A(,) is a doubly subscripted array with first dimen-
sioning parameter equal to MDA. The array A(,) initially
contains the M x N matrix A with A(i,j) := a_ij. On output
the array A(,) contains the N x N matrix V with A(i,j) :=
v_ij. Either M ≥ N or M < N is permitted.
B( ), MDB, NB
The value NB denotes the number of column vectors in the ma-
trix B. If NB = 0, B( ) will not be referenced by this subrou-
tine. If NB ≥ 2, the array B(,) should be doubly subscripted
with first dimensioning parameter equal to MDB. If NB = 1,
then B( ) will be used as a singly subscripted array of length
M. In this latter case the value of MDB is arbitrary but for
successful functioning of some Fortran compilers it must be set
to some acceptable integer value, say, MDB = 1. The contents
of the array B(,) are initially
B(i,j) := b_ij, i = 1, ..., M, j = 1, ..., NB,
or B(i) := b_i, i = 1, ..., M.
At the conclusion B(i,j) := g_ij, i = 1, ..., M, j = 1, ..., NB,
or B(i) := g_i, i = 1, ..., M.

S( )
On conclusion the first N locations contain the ordered singular
values of the matrix: S(1) ≥ S(2) ≥ ··· ≥ S(N) ≥ 0.
WORK( )
This is working space of size 2 x N.

Error Message
In subroutine QRBD the off-diagonal of the bidiagonal matrix with
smallest nonzero magnitude is set to the value zero whenever the iteration
counter reaches 10 * N. The iteration counter is then reset. The results
are therefore still likely to result in an accurate SVD. This is reported
to SVDRS with the output parameter IPASS = 2. Subroutine SVDRS
then prints the message
FULL ACCURACY NOT ATTAINED IN BIDIAGONAL SVD
Example of Usage
See PROG3 and the subroutine SVA for examples of usage of this
subroutine.
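As a further illustration (added here; the dimensions and the cutoff TAU are assumptions), the candidate solution described under Purpose can be formed from the SVDRS output as follows:

C     Solve an assumed 10 x 4 problem A*x ~ b, treating singular
C     values at or below an assumed tolerance TAU as zero.
      REAL A(10,4), B(10), S(4), WORK(8), P(4), X(4), TAU
      INTEGER I, J
C     ... fill A(1:10,1:4) and B(1:10) with the problem data ...
      CALL SVDRS (A, 10, 10, 4, B, 10, 1, S, WORK)
C     A(,) now holds V, B(1:10) holds g = (U**T)*b, and S holds the
C     singular values.  Form p = S+ * g and then x = V*p.
      TAU = 1.0E-5
      DO 10 J = 1, 4
         P(J) = 0.0E0
         IF (S(J) .GT. TAU) P(J) = B(J)/S(J)
   10 CONTINUE
      DO 30 I = 1, 4
         X(I) = 0.0E0
         DO 20 J = 1, 4
            X(I) = X(I) + A(I,J)*P(J)
   20    CONTINUE
   30 CONTINUE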

QRBD

USER'S GUIDE TO QRBD: SINGULAR VALUE


DECOMPOSITION OF A BIDIAGONAL MATRIX

Subroutines Called G1, G2, and DIFF


Purpose
Compute the singular value decomposition of an N x N bidiagonal
matrix

This subroutine implements the special QR iteration described in Chapter 18.


The output of the subroutine consists of the singular values of B (the diagonal
elements of the matrix S) and the two matrix products W and UᵀC, where
V and C are given matrices of dimensions NRV x N and N x NC, respec-
tively.

Usage
DIMENSION D(n1), E(n1), V(MDV, n1), C(MDC, m1)
CALL QRBD (IPASS, D, E, N, V, MDV, NRV, C, MDC, NC)
The dimensioning parameters must satisfy n1 ≥ N, m1 ≥ NC, MDV ≥
NRV, MDC ≥ N.
The subroutine parameters are defined as follows:
IPASS   This integer flag is returned with either of the values
1 or 2.
IPASS = 1 signifies that convergence of the QR
iteration was attained.
IPASS = 2 denotes a loss of accuracy in the sin-
gular values. Convergence was not attained after
10 * N QR sweeps. This resulted in setting the
smallest nonzero offdiagonal to zero. From Theo-
rem (5.7) this may perturb the singular values by a
value larger than the usual value, η||A||.
D( )   The array D( ) initially contains the diagonal ele-
ments of the matrix B.
D(i) := b_ii, i = 1, ..., N.
On return the array D( ) contains the N (nonnega-
tive) singular values of B in nonincreasing order.
E( )   The array E( ) initially contains the superdiagonal
elements of the matrix B.
E(1) := arbitrary.
E(i) := b_{i-1,i}, i = 2, ..., N.
The contents of the array E( ) are modified by the
subroutine.
N Order of the matrix B.
V(,), MDV, NRV   If NRV ≤ 0, the parameters V(,) and MDV will
not be used. If NRV ≥ 1, then MDV must satisfy
MDV > NRV. The first dimensioning parameter of
V ( , ) must be equal to MDV. The array V ( , )
initially contains an NRV x N matrix V and on
return this matrix will be replaced by the NRV x N
product matrix W.
C(,), MDC, NC   If NC ≤ 0, the parameters C(,) and MDC will not
be used.
If NC ≥ 2, then MDC must satisfy MDC ≥ N.
The first dimensioning parameter of the array C ( , )
must be equal to MDC.

If NC = 1, the parameter MDC can be assigned


any value, say, MDC = 1. In this case the array
C(,) may be either singly or doubly subscripted.
When NC ≥ 1, the array C(,) must initially con-
tain an N x NC matrix C. On output this matrix
will be replaced by the N x NC product matrix UTC.
Example of Usage
This subroutine is used by the subroutine SVDRS.
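For illustration (added here; the matrix entries are assumptions), the singular values of a small bidiagonal matrix can be computed with no V or C products requested:

C     Singular values of the 3 x 3 bidiagonal matrix with diagonal
C     2, 3, 4 and superdiagonal 1, 1; NRV = 0 and NC = 0 so the
C     V and C arguments are dummies.
      REAL D(3), E(3), V(1,1), C(1,1)
      INTEGER IPASS
      D(1) = 2.0E0
      D(2) = 3.0E0
      D(3) = 4.0E0
      E(1) = 0.0E0
      E(2) = 1.0E0
      E(3) = 1.0E0
      CALL QRBD (IPASS, D, E, 3, V, 1, 0, C, 1, 0)
C     D(1:3) now holds the singular values in nonincreasing order;
C     IPASS = 1 indicates normal convergence.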

BNDACC, BNDSOL

USER'S GUIDE TO BNDACC AND BNDSOL: SEQUENTIAL


PROCESSING OF BANDED LEAST SQUARES PROBLEMS

Subroutine Called H12


Purpose
These subroutines implement the algorithms given in Section 2 of Chapter
27 for solving Ax ≅ b where the matrix A has bandwidth NB. The four
principal parts of these algorithms are obtained by the following CALL
statements:
CALL BNDACC(...) Introduce new rows of data.
CALL BNDSOL(1,...) Compute solution vector and norm of residual
vector.
CALL BNDSOL(2,...)   Given a vector d, solve Rᵀy = d for y [see Eq.
(27.22)].
CALL BNDSOL(3,...)   Given a vector w, solve Rc = w for c [see Eq.
(27.23)].
The dots indicate additional parameters that will be specified in the following
paragraphs.

Usage of BNDACC
DIMENSION G(MDG,n1)
The dimensioning parameters must satisfy MDG ≥ m [see Eq. (27.9) for
the definition of m] and n1 ≥ NB + 1.
The user must set IP = 1 and IR = 1 before the first call to BNDACC.

The subroutine BNDACC is to be called once for each block of data
[Ct : bt] to be introduced into the problem. See Eq. (27.14) and Fig. 27.2. For
each block of data the calling program must assign values to MT and JT and
copy the [MT x (NB + 1)]-array of data [Ct : bt] into rows IR through
IR + MT − 1 of the working array G(,) and then execute the statement
CALL BNDACC (G, MDG, NB, IP, IR, MT, JT)

The subroutine parameters are defined as follows:


G(,) The working array. See Eq. (27.15) to (27.17) and adjacent text.
MDG Set by user to indicate the number of rows in the array G(, ).
NB Set by user to indicate the bandwidth of the data matrix A. See
text following Eq. (27.14).
IP Must be set to the value 1 before the first call to BNDACC. Its
value is subsequently controlled by BNDACC.
IR Index of first row of G ( , ) into which user is to place new data.
The variable IR is initialized by the user to the value 1 and sub-
sequently updated by BNDACC. It is not to be modified by the
user.
MT Set by user to indicate the number of new rows of data being
introduced by the current call to BNDACC. See Eq. (27.14).
JT Set by user to indicate the column of the submatrix At that is
identified with the first column of the array Ct. This parameter
JT has the same meaning as jt of Eq. (27.14).

Usage of BNDSOL
This subroutine performs three distinct functions selected by the first
parameter MODE, which may have the values 1, 2, or 3.
The statement CALL BNDSOL (1,...) may be executed after one or
more calls to BNDACC to compute a solution for the least squares problem
whose data has been introduced and triangularized by the calls to BNDACC.
The problem being solved is represented by Eq. (27.11). This statement also
computes the number |e| of Eq. (27.6), which has the interpretation given in
Eq. (27.12). The number |e| will be stored in RNORM.
The computation performed by CALL BNDSOL (1,...) does not alter
the contents of the array G ( , ) or the value of IR. Thus, after executing
CALL BNDSOL(1, ...), more calls can be made to BNDACC to introduce
additional data.
The statement CALL BNDSOL (2,...) may be used to solve Rᵀy = d
and the statement CALL BNDSOL (3,...) may be used to solve Rc = w.
These entries do not modify the contents of the array G(,) or the variable

IR. The variable RNORM is set to zero by each of these latter two state-
ments.
The primary purpose of these latter two statements is to facilitate compu-
tation of columns of the covariance matrix C = (RᵀR)⁻¹ as described in
Chapter 27. To compute cj, the jth column of C, one would set d = ej, the
jth column of the identity matrix. After solving Rᵀy = d, set w = y and
solve Rcj = w. This type of usage is illustrated in the Fortran program
PROG5.
The appropriate specification statement is
DIMENSION G(MDG,n1), X(n2)
The dimensioning parameters must satisfy MDG ≥ m [see Eq. (27.9)
for the definition of m], n1 ≥ NB + 1, and n2 ≥ N. The CALL statement is
CALL BNDSOL(MODE, G, MDG, NB, IP, IR, X, N, RNORM)
The subroutine parameters are defined as follows:
MODE Set by the user to the value 1, 2, or 3 to select
the desired function as described above.
G(,), MDG, NB, IP, IR These parameters must contain values as they
were defined upon the return from a preceding
call to BNDACC.
X( ) On input, with MODE = 2 or 3, this array
must contain the N-dimensional right-side vec-
tor of the system to be solved. On output with
MODE = 1, 2, or 3 this array will contain
the N-dimensional solution vector of the appro-
priate system that has been solved.
N Set by user to specify the dimensionality of the
desired solution vector. This causes the sub-
routine BNDSOL to use only the leading N x
N submatrix of the triangular matrix currently
represented in columns 1 through NB of the
array G( ,) and (if MODE = 1) only the first
N components of the right-side vector currently
represented in column NB + 1 of the array
G(,).
If any of the first N diagonal elements are zero,
this subroutine executes the Fortran statement
STOP after printing an appropriate message.
See the text following Algorithm (27.24) for a

discussion of some steps one might take to


obtain a usable solution in this situation.
RNORM   If MODE = 1, RNORM is set by the sub-
routine to the value |e| defined by Eq. (27.6).
This number is computed as

If MODE = 2 or 3, RNORM is set to zero.

Example of Usage
See PROG5 for an example of the usage of BNDACC and BNDSOL
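The covariance computation described above can be sketched as follows (an illustration added here, not an excerpt from PROG5; the dimensions, bandwidth, and column index are assumptions):

C     Compute column J of C = (R**T * R)**(-1) after the data have
C     been accumulated and triangularized by earlier calls to BNDACC
C     with NB = 3; G(,), IP, and IR are assumed to hold the values
C     left by those calls.
      REAL G(100,4), D(50), RNORM
      INTEGER IP, IR, I, J, N
      N = 50
      J = 7
      DO 10 I = 1, N
         D(I) = 0.0E0
   10 CONTINUE
      D(J) = 1.0E0
C     Solve R**T * y = e_j, then R * c_j = y; c_j overwrites D( ).
      CALL BNDSOL (2, G, 100, 3, IP, IR, D, N, RNORM)
      CALL BNDSOL (3, G, 100, 3, IP, IR, D, N, RNORM)
C     D(1:N) now contains the Jth column of the covariance matrix.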

LDP

USER'S GUIDE TO LDP: LEAST DISTANCE PROGRAMMING

Subroutines Called NNLS, DIFF

Purpose
This subroutine computes the solution of the following constrained least
squares problem:
Problem LDP: Minimize ||x|| subject to Gx ≥ h. Here G is an M x N
matrix and h is an M-vector.

No initial feasible vector, x0, is required. Thus the subroutine LDP can be
used to obtain a solution to an arbitrary set of linear inequalities, Gx ≥ h. If
these inequalities have no solution the user is notified by means of a flag
returned by the subroutine.
Method
This subroutine implements Algorithm LDP (23.27), which reduces
Problem LDP (23.3) to a related Problem NNLS (23.2). Problem NNLS is
solved with subroutine NNLS and a solution to Problem LDP is then ob-
tained or it is noted that no solution exists.

Usage
DIMENSION G(MDG,n1), H(m1), X(n1), W(n2)
INTEGER INDEX(m1)
CALL LDP (G, MDG, M, N, H, X, XNORM, W, INDEX, MODE)
The dimensioning parameters must satisfy MDG ≥ M, n1 ≥ N, m1 ≥ M,
and n2 ≥ (M + 2)*(N + 1) + 2*M.
The subroutine parameters are defined as follows:
G(,), MDG, M, N   The array G(,) has first dimensioning parameter
MDG and contains the M x N matrix G of Problem
LDP. Either M > N or M < N is permitted and
there is no restriction on the rank of G. The contents
of G ( , ) are not modified.
H( ) The array H( ) contains the M-vector h of Problem
LDP. The contents of H( ) are not modified.
X( ) The contents of the array X( ) do not need to be
defined initially. On normal termination (MODE=1)
X( ) contains the solution vector X. On an error
termination (MODE ≥ 2) the contents of X( ) are
set to zero.
XNORM On normal termination (MODE = 1) XNORM con-
tains ||x||. On an error termination (MODE ≥ 2)
the value of XNORM is set to zero.
W( ) This array is used as temporary working storage.
INDEX( ) This array is used as temporary type INTEGER
working storage.
MODE This flag is set by the subroutine to indicate the
status of the computation on completion. Its value
indicates the reason for terminating:
1 = successful.
2 = bad dimension. The condition N ≤ 0 was noted
in subroutine LDP.
3 = maximum number (3*M) of iterations
exceeded in subroutine NNLS.
4 = inequality constraints Gx ≥ h are incompatible.
Example 1
See program PROG6 for one example of the usage of this subroutine.
Example 2
Suppose a 5 x 15 matrix G and a 5-dimensional vector h are given. The
following code will solve Problem LDP for this data:

DIMENSION G(5,15), H(5), X(15), W(122)
INTEGER INDEX(5)
DO 10 I = 1, 5
H(I) = hᵢ
DO 10 J = 1, 15
10 G(I,J) = gᵢⱼ
CALL LDP (G, 5, 5, 15, H, X, XNORM, W, INDEX, MODE)
GO TO (20, 30, 40, 50), MODE
20 [Successful return. The solution vector x̂ is in X( ).]
30 [Bad dimension. N ≤ 0 was noted.]
40 [Excessive iterations (more than 3*M) in subroutine NNLS.]
50 [The inequalities Gx ≥ h are incompatible.]

NNLS

USER'S GUIDE TO NNLS: NONNEGATIVE LINEAR


LEAST SQUARES

Subroutines Called H12, G1, G2, DIFF


Purpose
This subroutine computes a solution vector, X, for the following con-
strained least squares problem:
Problem NNLS: Solve Ax ≅ b subject to x ≥ 0.
Here A is a given M x N matrix and b is a given M-vector. This problem
always has a solution but it is nonunique if the rank of A is less than N.
Method
Problem NNLS is solved using Algorithm NNLS (23.10). It can be proved
that this algorithm converges in a finite number of iterations. The technique
for adding and removing vectors in the QR decomposition of A is that
described in Method 3 of Chapter 24.
Usage
DIMENSION A(MDA,n1), B(m1), X(n1), W(n1), Z(m1)
INTEGER INDEX(n1)
CALL NNLS (A, MDA, M, N, B, X, RNORM, W, Z, INDEX, MODE)

The dimensioning parameters must satisfy MDA ≥ M, n1 ≥ N, m1 ≥ M.


The subroutine parameters are defined as follows:
A(,), MDA, M, N The array A ( , ) has first dimensioning parameter
MDA and initially contains the M x N matrix A.
Either M > N or M < N is permissible. There is no
restriction on the rank of A. On termination A ( , )
contains A = QA, where Q is an M x M orthogonal
matrix implicitly generated by this subroutine.
B( ) The array B( ) initially contains the M-vector, b. On
termination B( ) contains Qb.
X( )   The initial content of the array X( ) is arbitrary. On
termination with MODE = 1, X( ) contains a solu-
tion vector, x ≥ 0.
RNORM   On termination RNORM contains the euclidean
norm of the final residual vector, RNORM :=
||b − Ax||.
W( ), Z( )   The initial contents of these two arrays are arbitrary.
On termination W( ) will contain the N-dimensional
dual vector w = Aᵀ(b − Ax). The array Z( ) is work-
ing space.
INDEX( )   The initial content of this array is arbitrary. The array
INDEX( ) is integer working space.
MODE This flag is set by the subroutine to indicate the status
of the computation on completion. Its value indicates
the reason for terminating:
1 = successful.
2 = bad dimensions. One of the conditions M ≤ 0
or N ≤ 0 has occurred.
3 = maximum number (3*N) of iterations has been
exceeded.
Error Message
The subroutine will write the error message
NNLS QUITTING ON ITERATION COUNT
and exit with a feasible (approximate) solution, x, if the iteration count
exceeds 3*N.
Example of Usage
Solve the least squares problem Ax ≅ b subject to x ≥ 0 where A is a 20
x 10 matrix and b is a 20-dimensional vector.

DIMENSION A(20,10), B(20), X(10), W(10), Z(20), INDEX(10)

DO 10 I = 1, 20
B(I) = bᵢ
DO 10 J = 1, 10
10 A(I,J) = aᵢⱼ
CALL NNLS (A, 20, 20, 10, B, X, RNORM, W, Z, INDEX, MODE)

H12

USER'S GUIDE TO H12: CONSTRUCTION AND APPLICATION


OF A HOUSEHOLDER TRANSFORMATION

Purpose
This subroutine implements Algorithms H1 and H2 (10.22). Given an m-
vector v and integers lp and l1, this subroutine computes an m-vector u and
a number s such that the m x m (Householder) symmetric orthogonal matrix
Q = I + (uuᵀ)/(s·u_lp) satisfies Qv = w, where w_i = v_i for i < lp, |w_lp| =
(v_lp² + Σ_{i=l1..m} v_i²)^(1/2), w_i = v_i for lp < i < l1, and w_i = 0 for l1 ≤ i ≤ m. Op-
tionally this matrix Q may be multiplied times a given set of NCV m-vectors.
Usage
CALL H12 (MODE, LPIVOT, L1, M, U, IUE, UP, C, ICE, ICV, NCV)
The parameters are defined as follows:

MODE   Set by the user to the value 1 or 2. If MODE = 1,
the subroutine executes Algorithm H1 (10.22),
which computes a Householder transformation
and, if NCV > 0, multiplies it times the set of
NCV m-vectors stored in the array C. If MODE =
2, the subroutine executes Algorithm H2 (10.22),
which assumes that a Householder transformation
has already been defined by a previous call with
MODE = 1 (Algorithm H1) and, if NCV > 0,
multiplies it times the set of NCV m-vectors stored
in the array C.
LPIVOT, L1, M   If these integers satisfy
1 ≤ LPIVOT < L1 ≤ M

then they constitute the quantities lp, l1, and m


defined above under Purpose. If these inequalities
are not all satisfied, then the subroutine returns
without doing any computation. This implicitly
effects an identity transformation.
U( ), IUE, UP   The array U( ) contains M elements with a positive
storage increment of IUE between elements. If
MODE = 1, the array U( ) must initially contain
the m-vector v stored as U(1 + (j − 1)*IUE) =
v_j, j = 1, ..., m. The subroutine will compute s
and u_lp (see Purpose), storing s in place of v_lp and
storing u_lp in UP. The other elements of U( ) remain
unchanged; however, the elements v_j for l1 ≤ j ≤
m will be regarded on exit as constituting elements
u_j, since u_j = v_j for l1 ≤ j ≤ m. If MODE = 2,
the contents of U( ) and UP must contain the
results produced by a previous call to the subrou-
tine with MODE = 1.
Example 1: If v is stored in a singly subscripted
Fortran array called W with W(i) := v_i, then the
parameters "U, IUE" should be written as "W, 1".
Example 2: If v is stored as the Jth column of
a doubly subscripted Fortran array called A with
A(i,J) := v_i, then the parameters "U, IUE" should
be written as "A(1,J), 1".
Example 3: If v is stored as the Ith row of a
doubly subscripted Fortran array called A with
A(I,j) := v_j and A is dimensioned A(50,40),
then the parameters "U, IUE" should be written
as "A(I,1), 50".
C( ), ICE, ICV, NCV   If NCV ≤ 0, no action is taken involving the
parameters C( ), ICE, and ICV. If NCV > 0,
the array C( ) must initially contain a set of NCV
m-vectors stored with a positive storage increment
of ICE between elements of a vector and a storage
increment of ICV between vectors. Thus if z_ij
denotes the ith element of the jth vector, then C(1 +
(i − 1)*ICE + (j − 1)*ICV) = z_ij. On output,
C( ) contains the set of NCV m-vectors resulting
from multiplying the given vectors by Q. The fol-
lowing examples illustrate the two most typical
ways in which the parameters C, ICE, ICV, and
NCV would be used.

Example 1: Compute E = QF where F is a 50
x 30 matrix stored in a Fortran array F dimen-
sioned as F(60,40). Here M = 50 and the param-
eters "C, ICE, ICV, NCV" should be written as
"F, 1, 60, 30".
Example 2: Compute G = FQ where F is a 20
x 60 matrix stored in a Fortran array F dimen-
sioned as F(30,70). Here M = 60 and the param-
eters "C, ICE, ICV, NCV" should be written as
"F, 30, 1, 20".
Usage Examples
A. Reduce an n x n real matrix A to upper Hessenberg form using House-
holder transformations. This is often used as a first step in computing
eigenvalues of an unsymmetric matrix.

DIMENSION A(50,50), UP(50)

IF (N.LE.2) GO TO 20
DO 10 I = 3, N
CALL H12 (1, I-1, I, N, A(1,I-2), 1, UP(I-2), A(1,I-1), 1, 50, N-I+2)
10 CALL H12 (2, I-1, I, N, A(1,I-2), 1, UP(I-2), A, 50, 1, N)
20 CONTINUE
Note that the matrix H occupies the upper triangular and first subdiago-
nal part of the array named A on output. The data defining Q occupies
the remaining part of the array A and the array UP.
B. Suppose that we have the least squares problem A x = b with the special
form

where R is upper triangular. Use Householder transformations to reduce


this system to upper triangular form.

The augmented matrix [A : b] occupies the (N + K + 1) x (N + 1)
array named A. This is an example of block by block sequential accu-
mulation as described in Algorithm SEQHT (27.10).
DIMENSION A(50,30)
NP1 = N + 1
DO 10 J = 1, NP1
10 CALL H12 (1, J, N+2, N+K+1, A(1,J), 1, T, A(1,J+1), 1, 50, N+1-J)
Notice that this code does not explicitly zero the lower triangular part of
the A-array in storage.

G1, G2

USER'S GUIDE TO G1 AND G2: CONSTRUCTION AND


APPLICATION OF ROTATION MATRICES

Purpose
Given the numbers x1 and x2, the subroutine G1 computes data that
defines the 2 x 2 orthogonal rotation matrix G such that

The subroutine G2 performs the matrix-vector product

See Algorithms G1 (10.25) and G2 (10.26) for further details.


Input
(a) The numbers x1 and x2 are input to G1.
(b) The numbers c, s, z1, and z2 are input to G2.
Output
(a) The numbers c, s, and r are output by G1.
(b) The rotated vector is output by G2 following a previous execution

of G1 that defined the appropriate c and s of the G matrix. The output


quantities z1 and z2 replace the input quantities z1 and z2 in storage.
Usage
CALL G1(X1,X2,C,S,R)
CALL G2(C,S,Z1,Z2)
Usage Example
Suppose that we have the least squares problem Ax ≅ b with the special
form

where R is upper triangular.


Use rotation matrices to reduce this system to upper triangular form and
reduce the right side so that only its first n + 1 components are nonzero.
The augmented matrix [A : b] occupies the (N + 2) x (N + 1) array named
A.
This is an example of a row by row "sequential accumulator." See the
remarks following Algorithm SEQHT (27.10).
NP1 = N + 1
DO 10 I = 1, NP1
CALL G1 (A(I,I), A(N+2,I), C, S, A(I,I))
A(N+2,I) = 0.
IF (I.GT.N) GO TO 20
DO 10 J = I, N
10 CALL G2 (C, S, A(I,J+1), A(N+2,J+1))
20 CONTINUE

MFEOUT
USER'S GUIDE TO MFEOUT: MATRIX OUTPUT
SUBROUTINE
Purpose

This subroutine outputs a formatted representation of an M


x N matrix with one of two built-in titles, row and column
numbering, and row labels. It is designed to be called by the

singular value analysis subroutine SVA to print the V-matrix


and the matrix of candidate solution vectors.
The matrix will be printed by blocks of columns. Each block
will contain all M rows of the matrix and as many columns as
will fit within the character width specified by WIDTH. A line
of output will contain a Fortran "carriage control" character, a
row number, a row label taken from NAMES( ), and elements
of the matrix occupying 14 character positions each.

Usage

INTEGER MDA, M, N, MODE, UNIT, WIDTH


REAL A(MDA,n1)
CHARACTER NAMES(n1) * (lennam)
CALL MFEOUT(A, MDA, M, N, NAMES, MODE, UNIT,
WIDTH)
The dimensioning parameters must satisfy MDA ≥ M, n1 ≥
N, and lennam ≥ 1.
The subroutine parameters are defined as follows:

A(,)
Array containing an M x N matrix to be output.
MDA
Dimension of first subscript of the array A(,).
M,N

Number of rows and columns, respectively, in the matrix to be


output.

NAMES( )
NAMES(i) contains a character string label to be output in as-
sociation with row i of the matrix. If NAMES(1) contains only
blank characters, it will be assumed that no names have been
provided, and the subroutine will not access the NAMES( )
array beyond the first element and will output blank labels for
all rows.

MODE
MODE = 1 provides headings appropriate for the V matrix of
the singular value decomposition and uses a numeric format of
4pf14.0.
MODE = 2 provides headings appropriate for the array of
candidate solutions resulting from singular value analysis and
uses a numeric format of g14.6.
UNIT
Selects the output unit. If UNIT > 0, then UNIT is the
output unit number. If UNIT = −1, output will go to the "*"
unit, i.e., the standard system output unit.
WIDTH
Selects the width of output lines. Each output line from this
subroutine will have at most max(26, min(124, WIDTH)) char-
acters plus one additional leading character for Fortran "car-
riage control," which will always be a blank.
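For illustration (added here; SVA normally makes this call itself, and the problem size is an assumption), a matrix of candidate solutions could be printed to the standard output with the default line width as follows:

C     Print an assumed 5 x 3 matrix of candidate solutions with
C     blank row labels, candidate-solution headings (MODE = 2),
C     standard output (UNIT = -1), and 79-character lines.
      REAL A(5,3)
      CHARACTER NAMES(5)*8
C     ... fill A(1:5,1:3) ...
      NAMES(1) = ' '
      CALL MFEOUT (A, 5, 5, 3, NAMES, 2, -1, 79)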

GEN

USER'S GUIDE TO GEN: DATA GENERATION FUNCTION

Purpose
This FUNCTION is used by PROG1, PROG2, and PROG3 to generate
data for test cases. By basing the generation method on small integers it is
intended that the same test cases can be generated on virtually all computers.
Method
This FUNCTION generates two sequences of integers:

and

The sequence {Ik} has period 10, while the sequence {Jk} has period 332.
On the kth call after initialization, GEN produces the REAL output value

The next member of the sequence {Jk} is produced only when ANOISE > 0.
No claim is made that this sequence has any particular pseudorandom
properties.
Usage
This FUNCTION must be initialized by a statement of the form

where A < 0. In this case the value assigned to X is zero.


The next number in the sequence defined above is produced by a statement
of the form

where the input parameter ANOISE is nonnegative.
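The statement forms themselves are not reproduced in this copy; the following lines (an assumption based on the description above, with illustrative variable names) indicate the intended pattern:

C     Assumed usage pattern: a negative argument initializes GEN and
C     returns zero; subsequent calls with ANOISE .GE. 0 return the
C     next member of the sequence.
      DUMMY = GEN(-1.0E0)
      X = GEN(ANOISE)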

DIFF

USER'S GUIDE TO DIFF

Purpose
This FUNCTION is used by the subroutines HFTI, QRBD, LDP, and
NNLS to make the test "If ((x + h) − x = 0)", which is used in place of
the test "If (|h| > η|x|)", where η is the relative precision of the machine
floating point arithmetic.
Method
In the intended usage of this FUNCTION the intermediate sum z = x
+ h is computed in the calling program using η-precision arithmetic. Then
the difference d = z − x is computed by this FUNCTION using η-precision
arithmetic.
Usage
This FUNCTION can be used for the test described in Purpose with
an arithmetic IF statement of the form

The statement numbered 10 corresponds to the condition |h| > η|x|. The
statement numbered 20 corresponds to the condition |h| ≤ η|x|.
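The original statement form is not reproduced in this copy; an arithmetic IF consistent with the description above (the variable names are illustrative) would be

      IF (DIFF(X+H, X)) 10, 20, 10

which branches to statement 10 when DIFF(X+H, X) is nonzero and to statement 20 when it is zero.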
BVLS
USER'S GUIDE TO BVLS: BOUNDED VARIABLES LEAST
SQUARES
Purpose
This Fortran 90 subroutine finds the least squares solution to an M x N
system of linear equations Ax ≅ b, subject to the constraints

It is permitted to have M > N or M < N. This problem always has a


solution, but it is generally nonunique if the rank of A is less than N.

Method
Problem BVLS is solved using an algorithm of C. Lawson and R. Han-
son. It is a generalization of the algorithm NNLS described in Chapter
23. In the following descriptions we refer to sets F, P, and Z. Set F con-
sists of variables fixed throughout the solution process due to their lower
and upper bounds being equal. All other variables are in sets P or Z.
The membership of sets P and Z may change during the solution process.
Variables in Set P are determined to minimize the objective FUNCTION
subject to the variables in Set Z being held (temporarily) at fixed values.
A variable in Set Z may be at one of its bounds or at zero.

Usage

CALL BVLS(A, B, BND, X, RNORM, NSETP, W, INDEX, IERR)

The subroutine parameters are defined with the following in-


terface:

interface
subroutine bvls(A, B, BND, X, RNORM,&
NSETP, W, INDEX, IERR)
real(kind(1.0e0)) A(:,:), B(:), BND(:,:), X(:), RNORM, W(:)
integer NSETP, INDEX(:), IERR


end subroutine
end interface

A(:,:) [INTENT(inout)]
On entry, A(:,:) contains the M x N matrix A. On return,
A(:,:) contains the product matrix QA, where Q is an M x M
orthogonal matrix generated implicitly by this subroutine. The
values M and N are the respective number of rows and columns
in A(:,:). Thus M = size(A,1) and N = size(A,2). Required
are M > 0 and N > 0.

B(:) [INTENT(inout)]
On entry, B(:) contains the M-vector, b. On return, B(:)
contains Qb.

BND(1:2,:) [INTENT(in)]
The lower bound αi for xi must be given in BND(1,i) and the
upper bound βi in BND(2,i). Required are αi ≤ βi. To indi-
cate that xi has no lower bound, set BND(1,i) = -HUGE(1.0e0).
To indicate that xi has no upper bound, set BND(2,i) =
HUGE(1.0e0).

X(:) [INTENT(out)]
On entry, X(:) need not be initialized. On a normal return,
X(:) will contain the solution N-vector.

RNORM [INTENT(out)]
On a normal return this is the Euclidean norm of the residual
vector.

NSETP [INTENT(out)]

Indicates the number of components of the solution vector,


X(:), that are in Set P. NSETP and INDEX(:) together
define Sets P, Z, and F.

W(:) [INTENT(out)]
On return, W(:) will contain the dual vector w. This is the
negative gradient vector for the objective FUNCTION at the
final solution point. For j ∈ F, W(j) may have an arbitrary
value. For j ∈ P, W(j) = 0. For j ∈ Z, W(j) is ≤ 0, ≥ 0, or 0,
depending on whether X(j) is at its lower bound, at its upper
bound, or at zero with zero not being a bound.

INDEX(:) [INTENT(out)]
An INTEGER array of length at least N. On exit the value of
NSETP and the contents of this array define the sets P, Z, and
F. The indices in INDEX(1: NSETP) identify P. Letting nf
denote the number of pairs of lower and upper bounds that are
identical, the indices in INDEX(NSETP+1: N-nf) identify
Z, and the indices in INDEX(N-nf+1: N) identify F.

IERR [INTENT(out)]
Indicates the status on return.
= 0  Solution completed OK.
= 1  M ≤ 0 or N ≤ 0.
= 2  One array has inconsistent size.
Required are size(B) ≥ M, size(BND,1) = 2, size(BND,2)
≥ N, size(X) ≥ N, size(W) ≥ N, and size(INDEX) ≥ N.
= 3  Input bounds are inconsistent. Required are BND(1,i) ≤
BND(2,i), i = 1, ..., N.
= 4  Exceeds maximum number of iterations, ITMAX = 3*N.
This last condition may be caused by the problem being very ill-
conditioned. It is possible, but uncommon, that the algorithm
needs more iterations to complete the computation. The itera-
tion counter, ITER, is increased by one each time a variable is
moved from Set Z to Set P.

Remark
It is not required that A be of full column rank. In particular it is
allowed to have M < N. In such cases there may be infinitely many
vectors x that produce the (unique) minimum value of the residual vector
length. The one returned by this subroutine does not have any particular
distinguishing properties. It does not necessarily have the least possible
number of nonzero components nor the minimum Euclidean norm of all
solutions.
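As an illustration of the calling sequence, the following driver is a sketch
only; it is not part of the distributed code. It fills A and b with arbitrary
data, constrains x1 to [0, 1], leaves x2 unbounded, fixes x3 = 2, and calls
BVLS through the interface given above.

    program bvls_example
       implicit none
       integer, parameter :: m = 4, n = 3
       real(kind(1.0e0)) :: A(m,n), B(m), BND(2,n), X(n), RNORM, W(n)
       integer :: NSETP, INDEX(n), IERR
       interface
          subroutine bvls(A, B, BND, X, RNORM, NSETP, W, INDEX, IERR)
             real(kind(1.0e0)) A(:,:), B(:), BND(:,:), X(:), RNORM, W(:)
             integer NSETP, INDEX(:), IERR
          end subroutine
       end interface
       call random_number(A)                              ! arbitrary test matrix
       call random_number(B)                              ! arbitrary right-hand side
       BND(1,1) = 0.0e0;        BND(2,1) = 1.0e0          ! 0 <= x1 <= 1
       BND(1,2) = -HUGE(1.0e0); BND(2,2) = HUGE(1.0e0)    ! x2 unbounded
       BND(1,3) = 2.0e0;        BND(2,3) = 2.0e0          ! x3 fixed at 2 (Set F)
       call bvls(A, B, BND, X, RNORM, NSETP, W, INDEX, IERR)
       if (IERR == 0) then
          print *, 'X =', X, '   RNORM =', RNORM
       else
          print *, 'BVLS failed, IERR =', IERR
       end if
    end program bvls_example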

Functional Description

The algorithm used is a generalization of Algorithm NNLS, given in
Chapter 23, for the solution of the least squares problem with nonnegativity
constraints on the variables.
Given A and b, for any vector x we may define the residual vector,
r = b - Ax, and the dual vector w = A^T r. The vector w is half the
negative gradient vector of the objective function ||r||^2. Thus, for a vector
x that does not give the minimum of this objective function, w points
in the downhill direction. In particular, for the unconstrained problem, a
vector x is the solution if and only if its dual vector w is zero.
In the bounded variables problem the solution is characterized by the
signs of components of w being such that for each i, either wi = 0 or a
perturbation of xi in the direction wi is blocked by one of the given bounds
for xi. Thus x is a solution to the bounded variables problem if and only
if for each i one of the following is true:

    wi = 0,    or    xi = αi and wi ≤ 0,    or    xi = βi and wi ≥ 0.

Every variable is first set to a value as close to zero as possible, subject
to satisfying its bounds. During execution of the algorithm the variables
are classified into three disjoint sets, called F, P, and Z. Set F consists of
the variables that are permanently fixed at a constant value due to their
upper and lower bounds being equal.
Initially all variables not in Set F are placed in Set Z. As the computa-
tion proceeds, the variables not in Set F will move between Sets Z and P.
At each stage of the procedure Set Z will consist of variables that are tem-
porarily being held at fixed values, which may be a lower or upper bound or
zero. Set P will consist of variables whose values are determined by solving
the least squares problem subject to the variables in Sets F and Z having

fixed values. The augmented matrix [A : b] will be successively modified by
the application of orthogonal transformations from the left determined so
that the columns associated with Set P form an upper triangular matrix.
A variable will not be moved into Set P if its associated column in
A is nearly linearly dependent on the columns already in Set P. This
test uses the machine precision obtained from the elemental Fortran 90
function EPSILON(1.0e0). It is possible for this rejection of a variable
to cause the algorithm not to achieve the minimum value for the objective
function. The criteria and methods for moving variables between
Sets Z and P are similar to those described for algorithm NNLS but with
changes to handle two-sided bounds.
On return, NSETP is the number of variables in Set P, and the indices
of these variables are given in INDEX(i), i = 1,..., NSETP. These
variables will generally not be at constraint boundaries, whereas the other
variables, INDEX(i), i = NSETP+1, ..., N, will be at their lower or
upper bounds or (exceptionally) at the value zero.
APPENDIX D

DEVELOPMENTS FROM 1974 TO 1995

Introduction

In this appendix we remark on developments since 1974 directly related to
the topics treated in the main body of the book, plus a few topics that were
not previously discussed. The remarks are grouped by the book chapters
and appendices to which they relate and by the names of the new topics.
Most of the citations in this appendix are to the references at the end of
this appendix. In the few that refer to the original bibliography the year
is prefixed with "b-."

Books and Monographs


The monograph Least Squares Methods by Bjorck (1990) thoroughly de-
scribes methods of least squares computation. A significantly expanded
revision of this monograph is projected for publication as a book in 1995:
Numerical Methods for Least Squares Problems. Ake Bjorck maintains
an extensive bibliography on least squares computation in the directory
pub/references at the Internet address math.liu.se. Information there is
available by anonymous ftp.
Matrix Computations by Golub and Van Loan (1989) gives a compre-
hensive overview of contemporary methods of computational linear algebra,
with special attention to block-oriented algorithms that perform efficiently
on computers that have a high-speed cache memory between the processor
and the lower-speed main memory.
The LAPACK library of linear algebra software, described in the LAPACK
Users' Guide by Anderson, et al. (1995), provides Fortran implementations of algo-
rithms for linear least squares, singular value decomposition, linear equa-
tions, and eigensystem problems for dense matrices. The authors of LA-
PACK developed new block-oriented algorithms and new accuracy analyses.
Theory of the Combination of Observations Least Subject to Errors by
Stewart (1995) is of historic and scholarly significance, being a translation
from Latin and German into English of classic papers on least squares and
related topics by Carl Friedrich Gauss.
Other books and monographs that have substantial coverage of topics
that have applications to least squares computation are Nonlinear Parame-
ter Estimation by Bard (1974), Factorization Methods for Discrete Sequen-
tial Estimation by Bierman (1977), Large Sparse Numerical Optimization
by Coleman (1984), Handbook for Matrix Computations by Coleman and
Van Loan (1988), Numerical Linear Algebra and Applications by Datta


(1995), Numerical Methods for Unconstrained Optimization and Nonlinear
Equations by Dennis and Schnabel (1983), Direct Methods for Sparse Ma-
trices by Duff, et al. (1987), Practical Optimization by Gill, et al. (1981),
Optimization Software Guide by More and Wright (1993), Direct Methods
for Sparse Matrices by 0sterby and Zlatev (1983), The Symmetric Eigen-
value Problem by Parlett (1980), Linear Algebra and Its Applications by
Strong (1988), The Total Least Squares Problem—Computational Aspects
and Analysis by Van Huffel and Vandewalle (1991), and Spline Methods for
Observational Data by Wahba (1990).

Chapters 1 and 2. Statement of least squares problems


In the usual least squares problem, Ax ≅ b, one regards b as containing
observational error and A as not having observational error. Problems in
which there is observational error in A also arise. Such problems have been
discussed in the statistical literature under the names orthogonal regression
and errors in variables. The name total least-squares was introduced for
this problem class in Golub and Van Loan (1980), where systematic solu-
tion methods using the singular value decomposition are described. This
problem class is treated at length in the book by Van Huffel and Vandewalle
(1991). Additional research on the total least squares problem is reported
in Boggs, et al. (1987), Arun (1992), De Moor (1993b), Fierro and Bunch
(1994), Fierro, et al. (1993), Reilly, et al. (1993), Rosen, et al. (1996), Ten
Vregelaar (1995), Van Huffel (1992), and Wei (1992c).

Chapters 3 and 4. Orthogonal decompositions


Chapters 3 and 4 present, respectively, the orthogonal decomposition A =
HRK^T and the singular value decomposition A = USV^T. Here H, K, U,
and V are orthogonal, R is triangular, and S is diagonal. The HRK^T de-
composition includes the very important QR decomposition as the special
case of K = I.
Two other orthogonal decompositions that have been studied are the CS
and GSVD (generalized SVD) decompositions. The relation of the GSVD
to the SVD is analogous to the relation between the eigensystem problem
for a single matrix Ax - xλ = 0 and the eigensystem problem for a matrix
pencil Ax - Bxλ = 0. The CS decomposition has been used mainly as an
aid in studying and computing the GSVD.
The GSVD was introduced and studied in Van Loan's thesis (1973).
Exposition and references on the CS and GSVD decompositions appear
in Golub and Van Loan (1989) and Bjorck (1990, 1995). The GSVD is
further treated in Bai and Demmel (1991), Bai and Zha (1993), Hansen

(1990b), Paige (1984,1986), Paige and Saunders (1981), Park (1994), Stew-
art (1982), Sun (1983), and Van Loan (1976, 1985). Generalizations of the
QR decomposition are treated in Paige (1990).

Chapters 5, 8, 9, 15, 16, and 17. Perturbation bounds
and computational error analyses for QR, Cholesky, and
singular value decompositions and the solution of the
least squares problem
Results given in Chapter 5 show that small absolute perturbations to the
elements of a matrix, A, can give rise to only small absolute perturbations
of its singular values. This allows for the possibility that small perturba-
tions of a matrix may give rise to large relative perturbations of the small
singular values. Studies have identified conditions under which small rela-
tive perturbations of small singular values can be guaranteed. In LAPACK,
algorithms that obtain good relative accuracy for small singular values are
used when the structure of the matrix permits this. See Demmel and Ka-
han (1990) and other references in Anderson, et al. (1995). Perturbation
properties of singular values are also studied in Vaccaro (1994).
Perturbation analyses for the QR decomposition are the subject of Sun
(1995) and Zha (1993). The latter paper gives component-wise bounds.
Error analysis for an algorithm for the equality constrained least squares
problem is given in Barlow (1988). Perturbation and error analyses for the
Cholesky and LDLT factorizations are given in Sun (1992). Perturbation
bounds for the solution of the least squares problem are given in Ding and
Huang (1994), Stewart (1977), and Wei (1990).

Chapter 6. Bounds for the condition number of a tri-
angular matrix
Computing the singular values of a matrix generally provides the most
reliable way to compute its condition number. There has been a continuing
search for less expensive ways of estimating the condition number or rank
of a matrix that still have a reasonable level of reliability. It is also desirable
that such methods integrate conveniently with the main problem-solving
algorithm, which may involve updating, sparse matrix techniques, and so
on.
The methods discussed in Chapter 6, i.e., inspection of diagonal ele-
ments after Householder triangulation using column interchanges, are in-
expensive but not as reliable as one might wish. A method of condition
estimation introduced in LINPACK [Dongarra, et al. (1979)] brought in-
creased attention to this class of algorithms. Chan (1987) contributed

new algorithmic ideas for rank estimation and introduced the term rank-
revealing decomposition, which has been further treated by a number of
authors, e.g., Barlow and Vemulapati (1992b), Bischof and Hansen (1991,
1992), Chan and Hansen (1990, 1992, 1994), Chandrasekaran and Ipsen
(1994), Fierro and Hansen (1993), Higham (1987), Mathias (1995), Park
and Elden (1995a), Shroff and Bischof (1992), and Van Huffel and Zha
(1993).

Chapter 10. Algorithms for application of orthogonal
transformations
Reorganization of the Householder transformation into a block form is
treated in Dietrich (1976), Dongarra, et al. (1986), Kaufman (1987),
Schreiber and Parlett (1988), and Schreiber and Van Loan (1989). The
WY algorithm is summarized in Golub and Van Loan (1989).
An alternative realization of the 2-multiply Givens algorithm that is
intended to have advantages for implementation on a systolic array is given
in Barlow and Ipsen (1988). Another formulation of the 2-multiply Givens
algorithm, which reduces the number of inner loop operations in some
compiler/processor combinations, is given in Anda and Park (1994).

Chapter 12. Computation of the covariance matrix with
constraints
The covariance matrix computation described in Chapter 12 can easily be
adapted to the case of Problem LSE (Least Squares with Equality Con-
straints) which is treated in Chapters 20-22. The LSE problem is stated
in Eqs. (20.1)-(20.3) and one solution method for the case of Rank (C) =
m1 is presented in Eqs. (20.19)-(20.24).
Define

where E2 is defined by Eq. (20.19). S is the unscaled covariance matrix
for y2, which is determined from Eq. (20.22).
Since y1 of Eq. (20.20) is known with certainty, the unscaled covariance
matrix for y is

and the scaled covariance matrix for x is



If the least squares problem of Eq. (20.22) is solved using a QR factor-
ization of E2, one will have

where Q is orthogonal m2 x m2 and R is upper triangular (n - m1) x (n - m1).
Then S can be computed as

Deriving formulas for computation of the covariance matrix for use with
the solution method of Chapter 21 is only slightly more involved. Using
the quantities defined in Eqs. (21.4)-(21.7), the resulting formulas are:

where a is defined as above. Since the expression defining H occurs as
a subexpression in Eq. (21.4), H can be obtained as a byproduct of the
solution algorithm.

Chapter 13. The underdetermined full rank problem


The problem Ax = b where A is an m x n matrix with m < n and
Rank (A) = m will have an (n - m)-dimensional linear flat of solutions.
In Chapter 13 we gave a method for finding the unique minimum length
solution making use of orthogonal transformations of A from the right.
Paige (1979a, 1979b) treats the situation in which one wants only some
portion of the x-vector to have minimum length. See also the heading
"Generalized Least Squares" in Golub and Van Loan (1989).
For k < m, let x1 denote the first k and x2 denote the last n — k
components of x. Let A be partitioned similarly: A = [A1 : A2]. Suppose
one wants a solution of Ax = b with ||x2|| minimized.
Apply orthogonal transformations to the partitioned matrix [A1 : A2 :b]
from the left to transform A1 to upper triangular form:

The dimensions of R1, S1, and S2 are k x k, k x (n - k), and (m - k) x
(n - k), respectively. Assume

Compute an orthogonal matrix K that triangularizes S2 from the right:

The dimension of T2 is (m — k) x (m — k). Introduce the change of variables

The matrices K1 and K2 have dimensions (n - k) x (m - k) and (n - k) x
(n - m), and the vectors y1 and y2 have m - k and n - m components,
respectively.
Writing

the problem becomes

Any value can be assigned to y2. However, y2 = 0 gives the minimum value
for ||x2||. With y2 = 0, compute y1, x1, and x2, respectively, from
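(The displayed formulas in this passage did not reproduce. The following is
a sketch of the computation being described; the symbols c1 and c2, which
denote the correspondingly partitioned transformed right-hand side, are
introduced here only for illustration.)

$$
Q\,[A_1 : A_2 : b] \;=\;
\begin{bmatrix} R_1 & S_1 & c_1 \\ 0 & S_2 & c_2 \end{bmatrix},
\qquad
S_2 K = [\,T_2 : 0\,],
\qquad
x_2 = K \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = K_1 y_1 + K_2 y_2 .
$$

With $y_2 = 0$: solve $T_2\,y_1 = c_2$, set $x_2 = K_1 y_1$, and then solve
$R_1 x_1 = c_1 - S_1 x_2$.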

In the referenced papers, Paige applied this formulation to the classical
weighted least squares problem. Suppose the observation vector b in the
system Az = b is affected by errors assumed to have mean zero and
variance-covariance matrix σ²C. Here C is symmetric, nonnegative def-
inite, with Cholesky factorization C = LL^T.
If C is positive definite and thus L is nonsingular, the appropriate
weighted least squares problem is that of finding z to minimize
||L^{-1}(Az - b)||. Writing r = L^{-1}(Az - b), the problem becomes one of
finding z and r to minimize ||r|| subject to Az - Lr = b. This is an
equivalent statement when L is nonsingular and remains well defined even
when L is singular. This system can be analyzed as above if one identifies
A with A1, -L with A2, z with x1, and r with x2.
Sparse underdetermined linear systems are treated in George, et al.
(1984).

Chapter 18. The SVD algorithm


Besides the GSVD, mentioned above in connection with Chapters 3 and
4, a number of other variants on the SVD have been defined and studied.
See Adams, et al. (1994), De Moor and Van Dooren (1992), Fernando and
Hammarling (1987), Watson (1992), and Zha (1991, 1992).
In Section 5 of Chapter 18 it is mentioned that to compute the singular
values and the V matrix of the SVD of an m x n matrix A when mn is
large and m » n, computer storage and run-time can be reduced by first
transforming A to an n x n triangular matrix R by sequential processing
as in Chapter 27 and then applying the SVD or SVA algorithm to R.
Operation counts for this approach are included in Table 19.1 in Chapter
19.
Chan (1982) incorporates a preprocessing phase of Householder trian-
gulation into an SVD code so it will automatically be used when m is
suitably larger than n. Bau (1994) elaborates this idea, showing that when
1 < m/n < 2, the operation count is minimized by executing bidiago-
nalization, as in the usual SVD algorithm, until the aspect ratio of the
nonbidiagonalized part of the matrix is approximately 2, applying a QR
factorization to this remaining part, and finally bidiagonalizing the trian-
gular factor. These approaches improve on the execution time savings of
the approach given in Chapter 18, but do not achieve the saving of storage
that may be essential when the product mn is very large and m » n.
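The justification for this preprocessing can be stated briefly (a sketch, not
a quotation from Chapter 18): if orthogonal transformations from the left
reduce A to an n x n upper triangular matrix R, then

$$
A = Q \begin{bmatrix} R \\ 0 \end{bmatrix}
\quad\Longrightarrow\quad
A^T A = R^T R ,
$$

so, writing the SVD of the small matrix as $R = \bar U S V^T$, the singular
values S and the right singular vector matrix V of A are (up to the usual
nonuniqueness) exactly those of R, and they are obtained without ever
forming or storing the m x m factor Q.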
Additional direct SVD algorithms are presented and analyzed in Chan-
drasekaran and Ipsen (1995) and Demmel and Kahan (1990). For iterative
SVD algorithms see Golub, et al. (1981) and Vogel and Wade (1994). A
parallel SVD algorithm is given in Jessup and Sorensen (1994).

Chapter 19. Other methods for least squares problems


The method of seminormal equations is treated in Bjorck (1987, 1990) and
summarized under the heading "Normal Equations Versus QR" in Golub
and Van Loan (1989). Orthogonality properties of the modified Gram-
Schmidt algorithm are studied in Bjorck and Paige (1992).
In a surface-fitting application using tensor product splines, Grosse
(1980) shows that the expected instability from using normal equations
is, in fact, not encountered.

Chapters 20, 21, and 22. Least squares with equality
constraints
The problem of least squares with equality constraints (LSE) is of interest in
its own right as well as its being an essential component of most algorithms

for more general constrained least squares and constrained optimization
problems. Problem LSE is treated in Anda and Park (to appear), Barlow
and Handy (1988), Barlow and Vemulapati (1992a), Elden (1980, 1982),
and Wei (1992a).

Chapter 23. Least squares with inequality constraints


The NNLS algorithm first appeared in the 1974 edition of this book. A
number of readers of the book have informed us that they have used NNLS
successfully in a variety of applications. NNLS has also been incorporated
in a number of commercial software packages.
Since 1974 we developed a BVLS algorithm and code for the Bounded
Variables Least Squares problem. This is a generalization of the NNLS
Algorithm in which the constraints on the variables are ai ≤ xi ≤ bi
rather than the constraints xi ≥ 0 of NNLS. The structure of the BVLS
Algorithm is essentially the same as NNLS with additional details to deal
with the two-sided inequalities.
At each stage of the solution process in NNLS the variables are classified
into two classes, which we call Set Z and Set P. Variables in Set P are
determined to minimize the objective function subject to the variables in
Set Z being held (temporarily) at fixed values. In NNLS each xi is initially
set to zero and classified as being in Set Z.
In BVLS there is a third class, Set F, consisting of variables that are
fixed throughout the process due to their lower and upper bounds being
identical. If the bounds for an xi are relatively large in magnitude and
of opposite signs, it could introduce unnecessary cancellation errors to ini-
tialize xi at one of its bounds and possibly have it end up at an interior
value of much smaller magnitude. To avoid this potential source of error
we initialize each xi to the value closest to zero that is within its bounds.
If an xi is not in Set F, it is initially classified as being in Set Z. As a
consequence, variables in Set Z are not necessarily at their bounds. There
is no harm in this, but the algorithm must take it into consideration.
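As a concrete illustration of this initialization rule (a sketch only, not taken
from the BVLS source), each starting value is simply the point of the interval
[ai, bi] nearest zero:

    program init_rule
       implicit none
       integer, parameter :: n = 3
       real(kind(1.0e0)) :: bnd(2,n), x(n)
       integer :: i
       bnd(:,1) = (/  1.0e0,  4.0e0 /)   ! both bounds positive
       bnd(:,2) = (/ -5.0e0,  5.0e0 /)   ! zero is feasible
       bnd(:,3) = (/ -3.0e0, -1.0e0 /)   ! both bounds negative
       do i = 1, n
          ! value in [bnd(1,i), bnd(2,i)] closest to zero
          x(i) = min(max(0.0e0, bnd(1,i)), bnd(2,i))
       end do
       print *, x    ! expected: 1.0  0.0  -1.0
    end program init_rule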
A hierarchy of more general constrained minimization problems can be
defined by adding general linear inequality or equality constraints on the
variables or nonlinear constraints. Furthermore the objective function can
be generalized from a sum of squares of linear functions of the variables to
a general quadratic function of the variables or to a general differentiable
nonlinear function of the variables.
These more general constrained minimization problems are treated in
Gill, et al. (1981). For additional research on constrained least squares,
see Cline (1975), Dax (1991), Guler, et al. (1995), Gulliksson (1994,1995),
Gulliksson and Wedin (1992), Hanson (1986), Hanson and Haskell (1981,

1982), James (1992), Stoer (1992), and Wei (1992b).

Chapters 25 and 26. Practical analysis and regulariza-
tion in applications
In an ill-conditioned least squares problem there will typically be a set of
significantly different candidate solution vectors that reduce the objective
function to an acceptably small value. The term regularization is used to
describe a variety of methods that can be used to pick a solution from this
set that is "reasonable" for the intended application.
Three methods for comparing candidate solutions for an ill-conditioned
problem are discussed in Chapter 26. The curve in Figure 26.2 has since
been called an L-curve by P. C. Hansen. He and others have made a number
of studies of these and related methods of regularization.
The method of cross validation for choosing a regularization parameter
in a possibly ill-conditioned least squares problem is presented in Golub,
Heath, and Wahba (1979). This topic has led to a considerable amount of
subsequent research.
Regularization, including L-curve, cross validation, and other approaches,
is treated in Elden (1977, 1984a, 1984b, 1984c, 1990), Fierro, et al. (1993),
Hanke and Hansen (1993), Hansen (1987, 1989a, 1989b, 1990a, 1990b,
1990c, 1992b, 1994), Hansen and O'Leary (1993), Hansen, et al.(1992),
and Zha and Hansen (1990).

Chapter 27. Updating and downdating of rows, sequen-
tial estimation, and consideration of sparsity and struc-
ture
Introducing or removing the effect of a row in a least squares problem is
called row updating or row downdating, respectively. Updating and down-
dating of both rows and columns are at the heart of algorithms for con-
strained optimization, e.g., see Gill, et al. (1981).
Row updating is the key operation in sequential estimation algorithms
such as are described in Chapter 27. Such algorithms can be used to
accumulate the effect of observations into the solution process one at a time
or in batches. One motivation for sequential algorithms is the reduction
of computer memory capacity needed, while in other cases the purpose
is to be able to obtain solutions at many intermediate time points as the
observations are being acquired. This latter type of application is generally
treated in the engineering literature with the terminology of filtering.
The algorithm for sequential solution of a banded least squares prob-
lem given in Chapter 27 can be regarded as an example of an algorithm

designed to take advantage of a particular case of sparsity and structure.


Much attention has been given to algorithms for large sparse, and pos-
sibly structured, least squares problems. This includes both direct and
iterative methods. Algorithms for the solution of large sparse positive defi-
nite systems are often motivated by their application to partial differential
equations but also have potential application to least squares problems.
For research on updating and downdating see Bartels and Kaufman
(1989), Bendtsen, et al. (1995), Bischof, et al. (1993), Bjorck, et al. (1994),
Elden and Park (1994a, 1994b, 1995), Elhay, et al. (1991), Jones and
Plassman (1995a, 1995b), Moonen, et al. (1992), Olszanskyi, et al. (1994),
Park and Elden (1995a), Park and Van Huffel (1995), Shroff and Bischof
(1992), Stewart (1992), Xu, et al. (1994), and Yoo and Park (1995).
A fundamental paper on the solution of large sparse linear least squares
problems is George and Heath (1980). Methods for a variety of types
of large, sparse, and in some cases structured least squares problems are
treated in Bjorck (1984), Coleman (1984), Dax (1993), Golub, et al. (1981),
Heath (1982), Matstoms (1992, 1994), Ng and Chan (1994), Osterby and
Zlatev (1983), Paige and Saunders (1982), Ray (1995), and Van Der Sluis
and Van Der Vorst (1990). Methods for large sparse linear least squares
problems are surveyed in Heath (1984). Sparse underdetermined linear
systems are treated in George, et al. (1984).
The point of view and terminology of filtering as an approach to sequen-
tial estimation problems springs primarily from Kalman (b-1960). Sequen-
tial estimation is treated from the point of view of statistical regression by
Duncan and Horn (1972) and as a structured sparse least squares problem,
to which orthogonal transformations can be effectively applied, by Paige
and Saunders (1977). Bierman (1977) presents sequential estimation from
the filtering point of view and makes effective use of orthogonal transfor-
mations for numerical stability.
Methods for Cholesky factorization of large sparse positive definite sys-
tems are given in Duff, Erisman, and Reid (1987), George and Liu (1981),
Gilbert and Schreiber (1992), Joubert and Oppe (1994), Ng and Peyton
(1993), and Rothberg and Gupta (1994).

Nonlinear least squares


The topic of nonlinear least squares is not explicitly treated in the body
of this book. This problem is treated in Bard (1974), Bjorck (1990, 1995),
Dennis, et al. (1981), Dennis and Schnabel (1983), Donaldson and Schn-
abel (1987), Gill, et al. (1981), Hanson and Krogh (1992), Heinkenschloss
(1993), Huschens (1994), More, et al. (1980), Schnabel and Frank (1984),
and Wedin and Lindstrom (1988).

A subproblem that typically occurs in trust region algorithms for non-
linear least squares is the solution of a linear least squares problem subject
to a norm constraint. This subproblem is essentially the same as the reg-
ularization problem mentioned above in Chapters 25 and 26, and methods
for one of these problems are generally also applicable to the other. Papers
addressing these problems are Golub and von Matt (1991), Heinkenschloss
(1994), and More (1977).
It is not uncommon for some variables in a nonlinear least squares
problem to occur linearly. Special treatment can be given to such problems.
Such problems are called separable nonlinear least squares problems, and
the solution approach involving differentiation of a pseudoinverse matrix is
called the variable projection method. Such algorithms are presented and
analyzed in Golub and Pereyra (b-1973), Kaufman (1975), Kaufman and
Sylvester (1992), Kaufman, et al. (1994), Krogh (1974), and Lawton and
Sylvestre (1971).

Least squares algorithms involving Toeplitz and Hankel
matrices
Fast least squares algorithms have been developed for problems involving
Toeplitz and Hankel matrices. These problems arise in image analysis. See
Bojanczyk, et al. (1986), Chan, et al. (1993, 1994a, 1994b), Cybenko
(1987), Park and Elden (1995b), and Ng (1993).
There are signal processing algorithms that involve centrohermitian cor-
relation matrices. These have relations to Toeplitz matrices that allow the
development of sparse transformations that can be used to reduce computa-
tions required for most applications. See Linebarger, et al. (1994).

Applications of least squares


The following papers deal with the use of least squares computations in a
variety of applications, including signal processing, maritime and aerospace
navigation, curve and surface fitting (using spline, radial, and Haar basis
functions), solution of integral equations, Markov chains, tomography, inte-
ger data, and so on: Arun (1992), Brezinski and Matos (1993), Dax (1992,
1994), Demetriou (1995), De Moor (1993a), Dowling, et al. (1994), Elhay,
et al. (1991), Elliott (1993), Fausett and Fulton (1994), Gander, et al.
(1994), Golub and Meyer (1986), Goodman, et al. (1995), Grosse (1980),
Hansen (1992a), Hansen and Christiansen (1985), Hansen, et al. (1992),
Hanson and Phillips (1975, 1978), Hanson and Norris (1981), Helmke and
Shayman (1995), Kaufman (1993), Kaufman and Neumaier (1994), Lawson
(1977, 1984), Linebarger, et al. (1994), Liu (1994), Mason, et al. (1993),

Poor (1992), Quak, et al. (1993), Reichel (1991), Sevy (1995), Springer
(1986), Stewart (1992), Wahba (1990), Williams and Kalogiratou (1993),
Wood (1994), and Xu, et al. (1994). Data assimilation in the fields of me-
teorology and oceanography gives rise to very large and complex nonlinear
least squares problems, e.g., see Navon, et al. (1992), Zou, et al. (1993),
and Zou and Navon (1994).

Appendix A. Basic Linear Algebra Including Projec-
tions
For more detailed exposition of the fundamentals of linear algebra, see
textbooks on the subject, e.g., Strang (1988).

Appendix B. Proof of Global Quadratic Convergence of
the QR Algorithm
The convergence proof presented in Appendix B is due to Wilkinson (b-
1968a, b-1968b). This proof, which is based on analysis of the history of
individual scalar components as the algorithm proceeds, may be unsatisfy-
ing to readers who would like more insight into the broader linear algebra
concepts underlying the algorithm. Such readers should see the conver-
gence proof given in Parlett (1980).

Appendix C. Description and Use of Fortran Codes for
Solving Problem LS
A set of computer procedures implementing the principal algorithms pre-
sented in the book has been developed. An additional subprogram, called
BVLS for the Bounded Variables Least Squares problem, has been added
since the 1974 publication. This set of software is described in Appendix
C. This software is available via the Internet from the NETLIB site, which
is also discussed in Appendix C.
The NETLIB site contains a very valuable collection of mathematical
software, including a number of packages relating to least squares com-
putation. LINPACK [Dongarra, et al. (1979)] and LAPACK [Anderson,
et al. (1995)] cover the basic problem types of linear algebra, including
linear least squares problems. The block-oriented algorithms of LAPACK
are primarily realized with Level-3 BLAS [Dongarra, et al. (1990)]. Also
used are Level-1 and Level-2 BLAS [Lawson, et al. (1979); Dongarra, et
al. (1988)]. The Level-1 BLAS package includes subroutines for both the
standard (4-multiply) Givens transformation and the "fast" (2-multiply)
Givens transformation as described in Chapter 10. The implementation of

the 2-multiply Givens transformation incorporates code to take care of the
overflow/underflow problems of that algorithm.
As a follow-on to LAPACK, the software packages ScaLAPACK, PBLAS,
and BLACS were developed with the goal of achieving a practical level
of portability in the solution of fundamental linear algebra problems on
distributed-memory multiprocessor systems. The first "official" release of
these packages was in March 1995. See Choi, et al. (1995). All of the
software mentioned in this and the preceding paragraph is available from
NETLIB.
The nonlinear least squares problem is represented in NETLIB by soft-
ware described in Dennis, Gay, and Welsch (1981), Hanson and Krogh
(1992), and More, et al. (1980).
The Fortran 77 code QR27 [Matstoms (1992)] for sparse QR is in
NETLIB but requires the code MA27 from the Harwell library, which is
not in the public domain.
Also in NETLIB are the package of MATLAB procedures for apply-
ing regularization methods to least squares problems described in Hansen
(1994) and the MATLAB sparse QR procedure SQR of Matstoms (1994).
The integer linear equations and least squares package by Springer
(1986) is in the TOMS (Association for Computing Machinery Transac-
tions on Mathematical Software) portion of NETLIB. A software package,
CUTE, for testing optimization software [Bongartz, et al. (1995)] will be
in the TOMS portion of NETLIB.
A comprehensive guide to optimization software, including quadratic
programming and nonlinear least squares, is given in More and Wright
(1993).
The optimization packages LSSOL, for the linearly constrained linear
least squares and linear and quadratic programming problems, and NPSOL,
for the nonlinear programming problem, are described respectively in Gill,
et al. (1986, 1984) and are available for a fee from Stanford University.
The software package LANCELOT for large nonlinear optimization
problems is described in Conn, et al. (1992). For availability of the soft-
ware, inquire of pht@math.fundp.ac.be.
A library of C codes for linear algebra, including least squares, is de-
scribed in Stewart (1994).

References
Adams, G. E., Bojanczyk, A. W., and Luk, F. T. (1994) "Computing the
PSVD of Two 2x2 Triangular Matrices," SIAM J. Matrix Anal. Appl.,
15, No. 4, 366-382.

Anda, A. A. and Park, H. (1994) "Fast Plane Rotations with Dynamic
Scaling," SIAM J. Matrix Anal. Appl., 15, No. 1, 162-174.
Anda, A. A. and Park, H. (To appear) "Self-Scaling Fast Rotations for
Stiff Least Squares Problems," Linear Algebra AppL
Anderson, E., Bai, Z., Bischof, C., Demmel, J., Dongarra, J., Du Croz,
J., Greenbaum, A., Hammarling, S., McKenney, A., Ostrouchov, S., and
Sorensen, D. (1995) LAPACK Users' Guide, Second Edition, SIAM Publ.,
Philadelphia, PA.
Arun, K. S. (1992) "A Unitarily Constrained Total Least-Squares Prob-
lem in Signal-Processing," SIAM J. Matrix Anal. AppL, 13, No. 3, 729-
745.
Bai, Z. and Demmel, J. W. (1991) "Computing the Generalized Singular
Value Decomposition," Report UCB/CSD 91/645, Computer Science Div.,
Univ. of Calif., Berkeley.
Bai, Z. and Zha, H. (1993) "A New Preprocessing Algorithm for the
Computation of the Generalized Singular Value Decomposition," SIAM J.
Sci. Statist. Comput., 14, No. 4, 1007-1012.
Bard, Y. (1974) Nonlinear Parameter Estimation, Academic Press, San
Diego, 341 pp.
Barlow, J. L. (1988) "Error Analysis and Implementation Aspects of
Deferred Correction for Equality Constrained Least Squares Problems,"
SIAM J. Numer. Anal., 25, 1340-1358.
Barlow, J. L. and Handy, S. L. (1988) "The Direct Solution of Weighted
and Equality Constrained Least Squares Problems," SIAM J. Sci. Statist.
Comput., 9, 704-716.
Barlow, J. L. and Ipsen, I. C. F. (1988) "Scaled Givens' Rotations for
the Solution of Linear Least Squares Problems on Systolic Arrays," SIAM
J. Sci. Statist. Comput, 8, 716-733.
Barlow, J. L. and Vemulapati, U. B. (1992a) "A Note on Deferred
Correction for Equality Constrained Least Squares Problems," SIAM J.
Matrix Anal. AppL, 29, No. 1, 249-256.
Barlow, J. L. and Vemulapati, U. B. (1992b) "Rank Detection Methods
for Sparse Matrices," SIAM J. Matrix Anal. AppL, 13, No. 4, 1279-1297.
Bartels, R. and Kaufman, L. C. (1989) "Cholesky Factor Updating
Techniques for Rank-Two Matrix Modifications," SIAM J. Matrix Anal.
AppL, 10, No. 4, 557-592.
Bau, D. (1994) "Faster SVD for Matrices with Small m/n," TR94-14U,
Dept. of Computer Science, Cornell Univ.
Bendtsen, C., Hansen, P. C., Madsen, K., Nielsen, H. B., and Pinar, M.
(1995) "Implementation of QR Updating and Downdating on a Massively
Parallel Computer," Parallel Comput., 21, No. 1, 49-61.
Bierman, G. J. (1977) Factorization Methods for Discrete Sequential Es-

timation, Mathematics in Science and Engineering, 128, Academic Press,
Orlando, FL.
Bischof, C. H., Pan, C.-T., and Tang, P. T. P. (1993) "A Cholesky Up-
and Downdating Algorithm for Systolic and SIMD Architectures," SIAM
J. Sci. Comput., 14, No. 3, 670-676.
Bischof, C. H. and Hansen, P. C. (1991) "Structure Preserving and
Rank-Revealing QR-Factorizations," SIAM J. Sci. Statist. Comput., 12,
1332-1350.
Bischof, C. H. and Hansen, P. C. (1992) "A Block Algorithm for Com-
puting Rank-Revealing QR-Factorizations," Numer. Algorithms, 2, 371-
392.
Bjorck, A. (1984) "A General Updating Algorithm for Constrained Lin-
ear Least Squares Problems," SIAM J. Sci. Statist. Comput., 5, No. 2,
394-402.
Bjorck, A. (1987) "Stability Analysis of the Method of Seminormal
Equations," Linear Algebra. Appl, 88/89, 31-48.
Bjorck, A. (1990) "Least Squares Methods," in Handbook of Numerical
Analysis, Vol. 1, ed. P. G. Ciarlet and J. L. Lions, Elsevier-North Holland,
465-652. Second printing, 1992.
Bjorck, A. (1995) Numerical Methods for Least Squares Problems, SIAM
Publ., Philadelphia, PA. (To appear).
Bjorck, A. and Paige, C. C. (1992) "Loss and Recapture of Orthogo-
nality in the Modified Gram-Schmidt Algorithm," SIAM J. Matrix Anal.
Appl, 13, No. 1, 176-190.
Bjorck, A., Park, H., and Elden, L. (1994) "Accurate Downdating of Least
Squares Solutions," SIAM J. Matrix Anal. Appl., 15, No. 2, 549-568.
Boggs, P. T., Byrd, R. H., and Schnabel, R. B. (1987) "A Stable and
Efficient Algorithm for Nonlinear Orthogonal Distance Regression," SIAM
J. Sci. Statist. Comput, 8, 1052-1078.
Bongartz, I., Conn, A. R., Gould, N., and Toint, P. L. (1995) "CUTE:
Constrained and Unconstrained Testing Environment," ACM Trans. Math.
Software, 21, No. 1, 123-160.
Bojanczyk, A. W., Brent, R. P., and de Hoog, F. R. (1986) "QR fac-
torization of Toeplitz matrices," Numer. Math., 49, 81-94.
Brezinski, C. and Matos, A. C. (1993) "Least-squares Orthogonal Poly-
nomials," J. Comput. Appl. Math., 46, No. 1-2, 229-239.
Chan, R. H., Nagy, J. G., and Plemmons, R. J. (1993) "FFT-Based Pre-
conditioners for Toeplitz-Block Least Squares Problems," SIAM J. Numer.
Anal., 30, No.6, 1740-1768.
Chan, R. H., Nagy, J. G., and Plemmons, R. J. (1994a) "Circulant
Preconditioned Toeplitz Least Squares Iterations," SIAM J. Matrix Anal.
Appl, 15, No. 1, 80-97.

Chan, R. H., Nagy, J. G., and Plemmons, R. J. (1994b) "Displacement
Preconditioner for Toeplitz Least Squares Iterations," Electron. Trans.
Numer. Anal., 2, 44-56.
Chan, T. F. (1982) "An Improved Algorithm for Computing the Sin-
gular Value Decomposition," ACM Trans. Math. Software, 8, 72-83. Also
"Algorithm 581," 84-88.
Chan, T. F. (1987) "Rank-Revealing QR Factorizations," Linear Alge-
bra Appl, 88/89, 67-82.
Chan, T. F. and Hansen, P. C. (1990) "Computing Truncated SVD
Least Squares Solutions by Rank Revealing QR-Factorizations," SIAM J.
Sci. Statist Comput., 11, 519-530.
Chan, T. F. and Hansen, P. C. (1992) "Some Applications of the Rank
Revealing QR Factorization," SIAM J. Sci. Statist. Comput., 13, 727-741.
Chan, T. F. and Hansen, P. C. (1994) "Low-Rank Revealing QR Fac-
torizations," Numerical Linear Algebra with Applications, 1, Issue 1, 33-44.
Chandrasekaran, S. and Ipsen, I. C. F. (1994) "On Rank-Revealing
Factorisations," SIAM J. Matrix Anal. Appl, 15, No. 2, 592-622.
Chandrasekaran, S. and Ipsen, I. C. F. (1995) "Analysis of a QR Algo-
rithm for Computing Singular Values," SIAM J. Matrix Anal. Appl, 16,
520-535.
Choi, J., Demmel, J., Dhillon, I., Dongarra, J., Ostrouchov, S., Petitet,
A., Stanley, K., Walker, D., and Whaley (1995), ScaLAPACK, PBLAS,
and BLACS, Univ. of Tenn., Knoxville, Univ. of Calif., Berkeley, and Oak
Ridge National Laboratory. scalapack@cs.utk.edu.
Cline, Alan (1975) "The Transformations of a Quadratic Programming
Problem into Solvable Form," ICASE Report 75-4, NASA Langley Research
Center, Hampton, VA.
Coleman, T. F. (1984) Large Sparse Numerical Optimization, Lecture
Notes in Computer Science, 165, ed. Goos and Hartmanis, Springer-
Verlag, Berlin, 105 pp.
Coleman, T. F. and Van Loan, C. F. (1988) Handbook for Matrix Com-
putations, SIAM Publ., Philadelphia, PA.
Conn, A. R., Gould, N. I. M., and Toint, Ph. L. (1992) LANCELOT: A
Fortran Package for Large-Scale Nonlinear Optimization, Springer Series in
Computational Mathematics, 17, Springer-Verlag, Heidelberg, New York.
Cybenko, G. (1987) "Fast Toeplitz Orthogonalization Using Inner Prod-
ucts," SIAM J. Sci. Stat. Comput., 8, 734-740.
Datta, B. N. (1995) Numerical Linear Algebra and Applications, Brooks/
Cole Publ. Co., Pacific Grove, CA.
Dax, A. (1991) "On Computational Aspects of Bounded Linear Least-
Squares Problems," ACM Trans. Math. Software, 17, No. 1, 64-73.

Dax, A. (1992) "On Regularized Least Norm Problems," SIAM J. Op-


tim., 2, No. 4, 602-618.
Dax, A. (1993) "On Row Relaxation Methods for Large Constrained
Least Squares Problems," SIAM J. Sci. Statist Comput, 14, No. 3,
570-584.
Dax, A. (1994) "A Row Relaxation Method for Large lp Least Norm
Problems," Numerical Linear Algebra with Applications, 1, Issue 3, 247-
263.
Demetriou, I. C. (1995) "Algorithm: L2CXFT: A Fortran Subroutine
for Least Squares Data Fitting with Nonnegative Second Divided Differ-
ences," ACM Trans. Math. Software, 21, No. 1, 98-110.
De Moor, B. (1993a) "The Singular Value Decomposition and Long and
Short Spaces of Noisy Matrices," IEEE Trans. Signal Processing, 41, No.
9, 2826-2838.
De Moor, B. (1993b) "Structured Total Least Squares and L2 Approx-
imation Problems," Linear Algebra Appl, 188/189, 163-206.
De Moor, B. and Van Dooren, P. (1992) "Generalizations of the Singular
Value and QR Decompositions," SIAM J. Matrix Anal. Appl, 13, No. 4,
993-1014.
Demmel, J. W. and Kahan, W. (1990) "Accurate Singular Values of
Bidiagonal Matrices," SIAM J. Sci. Statist. Comput., 11, 873-912.
Dennis, J. E. Jr., Gay, D. M., and Welsch, R. E. (1981) "An Adaptive
Nonlinear Least-Squares Algorithm," ACM Trans. Math. Software, 7, No.
3, 348-368. Also "Algorithm 573, NL2SOL," 369-383.
Dennis, J. E. Jr. and Schnabel, R. B. (1983) Numerical Methods for
Unconstrained Optimization and Nonlinear Equations, Prentice-Hall, En-
glewood Cliffs, NJ, 378 pp.
Dietrich, G. (1976) "A New Formulation of the Hypermatrix Householder-
QR Decomposition," Computer Methods in Applied Mechanical Engineer-
ing, 9, 273-289.
Ding, J. and Huang, L. J. (1994) "On the Perturbation of the Least
Squares Solutions in Hilbert Spaces," Linear Algebra Appl., 212/213, 487-
500.
Donaldson, J. R. and Schnabel, R. B. (1987) "Computational Experi-
ence with Confidence Regions and Confidence Intervals for Nonlinear Least
Squares," Technometrics, 29, 67-82.
Dongarra, J. J., DuCroz, J., Hammarling, S., and Hanson, R. (1988),
"An Extended Set of Fortran Basic Linear Algebra Subprograms," ACM
Trans. Math. Software, 14, 1-32.
Dongarra, J. J., DuCroz, J., Duff, I., and Hammarling, S. (1990), "A
Set of Level-3 Basic Linear Algebra Subprograms," ACM Trans. Math.
Software, 16, 1-17.

Dongarra, J. J. and Grosse, E. H. (1987) "The Distribution of Mathe-
matical Software Using Electronic Mail," Comm. of the ACM, 30, 403-407.
Dongarra, J. J., Kaufman, L. C., and Hammarling, S. (1986) "Squeez-
ing the Most Out of Eigenvalue Solvers on High-performance Computers,"
Linear Algebra Appl., 77, 113-136.
Dongarra, J. J., Rowan, T., and Wade, R. (1995) "Software Distribution
using XNETLIB," ACM Trans. Math. Software, 21, No. 1, 79-88.
Dongarra, J. J., Moler, C. B., Bunch, J. R., and Stewart, G. W. (1979)
LINPACK Users' Guide, SIAM Publ., Philadelphia, PA.
Dowling, E. M., Ammann, L. P., and De Groat, R. D. (1994) "A TQR-
Iteration Based Adaptive SVD for Real-Time Angle and Frequency Track-
ing," IEEE Trans. Signal Processing, 42, No. 4, 914-926.
Duff, I. S., Erisman, A. M., and Reid, J. K. (1987) Direct Methods for
Sparse Matrices, Oxford University Press.
Duncan, D. B. and Horn, S. D. (1972) "Linear Dynamic Recursive Es-
timation from the Viewpoint of Regression Analysis," J. Amer. Statist.
Assoc., 67, 815-821.
Elden, L. (1977) "Algorithms for the Regularization of Hi-Conditioned
Least Squares Problems", BIT, 17, 134-145.
Elden, L. (1980) "Perturbation Theory for the Least Squares Problem
with Equality Constraints," SIAM J. Numer. Anal., 17, 338-350.
Elden, L. (1982) "A Weighted Pseudoinverse, Generalized Singular Val-
ues, and Constrained Least Squares Problems," BIT, 22, 487-502.
Elden, L. (1984a) "An Algorithm for the Regularization of 111-Conditioned,
Banded Least Squares Problems," SIAM J. Sci. Statist. Comput, 5, 237-
254.
Elden, L. (1984b) "An Efficient Algorithm for the Regularization of
Ill-Conditioned Least Squares Problems with Triangular Toeplitz Matrix,"
SIAM J. Sci. Stat Comput, 5, 229-236.
Elden, L. (1984c) "A Note on the Computation of the Generalized
Cross-Validation Function for Ill-Conditioned Least Squares Problems,"
BIT, 24, 467-472.
Elden, L. (1990) "Algorithms for the Computation of Functionals De-
fined on the Solution of a Discrete Ill-Posed Problem," BIT, 30, 466-483.
Elden, L. and Park, H. (1994a) "Block Downdating of Least Squares
Solutions," SIAM J. Matrix Anal. Appl., 15, No. 3,1018-1034.
Elden, L. and Park, H. (1994b) "Perturbation Analysis for Block Down-
dating of a Cholesky Decomposition," Numer. Math., 68, 457-467.
Elden, L. and Park, H. (1995) "Perturbation and Error Analyses for
Block Downdating of a Cholesky Decomposition," Report LiTH-Mat-R-11,
Department of Mathematics, Linkoping University.

Elhay, S., Golub, G. H., and Kautsky, J. (1991) "Updating and Down-
dating of Orthogonal Polynomials with Data Fitting Applications," SIAM
J. Matrix Anal. Appl., 12, No. 2, 327-353.
Elliott, G. H. (1993) "Least Squares Data Fitting Using Shape Preserv-
ing Piecewise Approximations," Numer. Algorithms, 5, No. 1-4, 365-371.
Fausett, D. W. and Fulton, C. T. (1994) "Large Least Squares Problems
Involving Kronecker Products," SIAM J. Matrix Anal. Appl., 15, No. 1,
219-227.
Fernando, K. V. and Hammarling, S. J. (1987) "A Product Induced
Singular Value Decomposition (IISVD) for Two Matrices and Balanced Re-
alisation," in Linear Algebra in Signals, Systems, and Control, Eds. Datta,
B. N., Johnson, C. R., Kaashoek, M. A., Plemmons, R. J., and Sontag, E.
D., SIAM Publ., Philadelphia, PA., 128-140.
Fierro, R. D. and Bunch, J. R. (1994) "Colinearity and Total Least
Squares," SIAM J. Matrix Anal. Appl., 15, No. 4, 1167-1181.
Fierro, R. D., Golub, G. H., Hansen, P. C., and O'Leary, D. P. (1993)
"Regularization by Truncated Total Least Squares", Report UNIC-93-14,
(20 pages); SIAM J. Sci. Stat. Comput., [To appear]
Fierro, R. D. and Hansen, P. C. (1993) "Accuracy of TSVD Solutions
Computed from Rank-Revealing Decompositions," revised version of Re-
port UNIC-93-05, (15 pages); submitted to Numer. Math.
Gander, W., Golub, G. H., and Strebel, R. (1994) "Least Squares Fit-
ting of Circles and Ellipses," BIT, 34, No. 4, 558-578.
George, A., and Heath, M. T. (1980) "Solution of Sparse Linear Least
Squares Problems Using Givens Rotations," Linear Algebra Appl., 34, 69-
83.
George, A., Heath, M. T., and Ng, E. (1984) "Solution of Sparse Under-
determined Systems of Linear Equations," SIAM J. Sci. Statist. Comput.,
5, No. 4, 988-997.
George, A. and Liu, J. W-H (1981) Computer Solution of Large Sparse
Positive Definite Systems, Prentice-Hall, Englewood Cliffs, 324 pp.
Gilbert, J. and Schreiber, R. (1992) "Highly Parallel Sparse Cholesky
Factorization" SIAM J. Sci. Statist. Comput., 13, 1151-1172.
Gill, P. E., Hammarling, S. J., Murray, W., Saunders, M. A., and
Wright, M. H. (1986) "User's Guide for LSSOL (Version 1.0): A Fortran
Package for Constrained Linear Least-Squares and Convex Quadratic Pro-
gramming," Systems Optimization Laboratory Tech. Rpt. SOL 86-1, 38
pp.
Gill, P. E., Murray, W., Saunders, M. A., and Wright, M. H. (1984)
"User's Guide for QPSOL (Version 3.2): A Fortran Package for Quadratic
Programming,'' Systems Optimization Laboratory Tech. Rpt. SOL 84-6,
37 pp.

Gill, P. E., Murray, W., and Wright, M. H. (1981) Practical Optimiza-
tion, Academic Press, London, 401 pp. Sixth printing, 1987.
Golub, G. H., Heath, M., and Wahba, G. (1979) "Generalized Cross-
Validation as a Method for Choosing a Good Ridge Parameter," Techno-
metrics, 21, 215-223.
Golub, G. H., Luk, F. T., and Overton, M. (1981) "A Block Lanczos
Method for Computing the Singular Values and Corresponding Singular
Vectors of a Matrix.," ACM Trans. Math. Software, 7, 149-169.
Golub, G. H. and Meyer, C. D. (1986) "Using the QR Factorization and
Group Inversion to Compute, Differentiate, and Estimate the Sensitivity of
Stationary Probabilities for Markov Chains," SIAM J. Algebraic and Discrete
Methods, 7, 273-281.
Golub, G. H. and Van Loan, C. F. (1980) "An Analysis of the Total
Least Squares Problem," SIAM J. Numer. Anal., 17, 883-893.
Golub, G. H. and Van Loan, C. F. (1989) Matrix Computations, Second
Edition, Johns Hopkins University Press, Baltimore, 642 pp.
Golub, G. H. and von Matt, U. (1991) "Quadratically Constrained Least
Squares and Quadratic Problems," Numer. Math., 59, 561-580.
Goodman, T. N. T., Micchelli, C. A., Rodriguez, G. and Seatzu, S.
(1995) "On the Cholesky Factorization of the Gram Matrix of Locally Sup-
ported Functions.," BIT, 35, No. 2, 233-257.
Grosse, E. H. (1980) "Tensor Spline Approximation," Linear Algebra
Appl, 34, 29-41.
Grosse, E. H. (1995) "Repository Mirroring," ACM Trans. Math. Soft-
ware, 21, No. 1, 89-97.
Guler, O., Hoffman, A. J., and Rothblum, U. G. (1995) "Approxima-
tions to Solutions to Systems of Linear Inequalities," SIAM J. Matrix Anal.
Appl, 16, No. 2, 688-696.
Gulliksson, M. (1994) "Iterative Refinement for Constrained and Weighted
Linear Least Squares," BIT, 34, No. 2, 239-253.
Gulliksson, M. (1995) "Backward Error Analysis for the Constrained
and Weighted Linear Least Squares Problem When Using the Weighted
QR Factorization," SIAM J. Matrix Anal. Appi, 16, No. 2, 675-687.
Gulliksson, M. and Wedin, P. A. (1992) "Modifying the QR Decomposi-
tion to Constrained and Weighted Linear Least Squares," SIAM J. Matrix
Anal. Appi, 13, No. 4,1298-1313.
Hanke, M. and Hansen, P. C. (1993) "Regularization Methods for Large-
Scale Problems," Surveys Math. Indust. 3, 253-315.
Hansen, P. C. (1987) "The Truncated SVD as a Method for Regular-
ization," BIT, 27, 354-553.
Hansen, P. C. (1989a) "Regularization, GSVD and Truncated GSVD,"
BIT 29, 491-504.

Hansen, P. C. (1989b) "Perturbation Bounds for Discrete Tikhonov


Regularization," Inverse Problems, 5, L41-L44.
Hansen, P. C. (1990a) "Truncated SVD Solutions to Discrete Ill-posed
Problems with Ill-determined Numerical Rank," SIAM J. Sci. Statist.
Comput, 11, 503-518.
Hansen, P. C. (1990b) "Relations Between SVD and GSVD of Discrete
Regularization Problems in Standard and General Form," Linear Algebra
Appl., 141, 165-176.
Hansen, P. C. (1990c) "The Discrete Picard Condition for Discrete Ill-
posed Problems," BIT, 30, 658-672.
Hansen, P. C. (1992a) "Numerical Tools for Analysis and Solution of
Fredholm Integral Equations of the First Kind," Inverse Problems, 8,
849-872.
Hansen, P. C. (1992b) "Analysis of Discrete Ill-posed Problems by
Means of the L-curve," SIAM Rev., 34, 561-580.
Hansen, P. C. (1994) "Regularization Tools: A Matlab Package for
Analysis and Solution of Discrete Ill-posed Problems," Numer. Algorithms,
6, 1-35. Longer version: UNIC-92-03, 1993, Technical University of Den-
mark, Lyngby, Denmark, 110 pp.
Hansen, P. C. and Christiansen, S. (1985) "An SVD Analysis of Lin-
ear Algebraic Equations Derived from First Kind Integral Equations," J.
Comput. Appl Math., 12-13, 341-357.
Hansen, P. C. and O'Leary, D. P. (1993) "The Use of the L-Curve in
the Regularization of Discrete Ill-Posed Problems," SIAM J. Sci. Comput.,
14, 1487-1503.
Hansen, P. C., Sekii, T., and Shibahashi, H. (1992) "The Modified
Truncated-SVD Method for Regularization in General Form," SIAM J.
Sci. Statist. Comput, 13, 1142-1150.
Hanson, R. J. (1986) "Linear Least Squares with Bounds and Linear
Constraints," SIAM J. Sci. Statist. Comput, 7, No. 3, 826-834.
Hanson, R. J. and Haskell, K. H. (1981) "An Algorithm for the Linear
Least Squares Problems with Equality and Inequality Constraints," Math.
Programming, 21, 98-118.
Hanson, R. J. and Haskell, K. H. (1982) "Algorithm 587: Two Algo-
rithms for the Linearly Constrained Least Squares Problem," ACM Trans.
Math. Software, 8, No. 3, 323-333.
Hanson, R. J. and Krogh, F. T. (1992) "A Quadratic-Tensor Model
Algorithm for Nonlinear Least-Squares Problems with Linear Constraints,"
ACM Trans. Math. Software, 18, No. 2, 115-133.
Hanson, R. J. and Norris, M. J. (1981) "Analysis of Measurements
Based on the Singular Value Decomposition," SIAM J. Sci. Statist. Com-
put., 2, No. 3, 363-373.

Hanson, R. J. and Phillips, J. L. (1975) "An Adaptive Numerical Method
for Solving Linear Fredholm Integral Equations of the First Kind," Numer.
Math., 24, 291-307.
Hanson, R. J. and Phillips, J. L. (1978) "Numerical Solution of Two-
Dimensional Integral Equations Using Linear Elements," SIAM J. Numer.
Anal., 15, No. 1, 113-121.
Heath, M. T. (1982) "Some Extensions of an Algorithm for Sparse Lin-
ear Least Squares Problems," SIAM J. Sci. Statist. Comput, 3, No. 2,
223-237.
Heath, M. T. (1984) "Numerical Methods for Large Sparse Linear Least
Squares Problems," SIAM J. Sci. Statist. Comput., 5, No. 3, 497-513.
Heinkenschloss, M. (1993) "Mesh Independence for Nonlinear Least
Squares Problems with Norm Constraints," SIAM J. Optim., 3, No. 1,
81-117.
Heinkenschloss, M. (1994) "On the Solution of a Two Ball Trust Region
Subproblem," Math. Programming, 64, No. 3, 249-276.
Helmke, U. and Shayman, M. A. (1995) "Critical Points of Matrix Least
Squares Distance Functions," Linear Algebra Appi, 215, 1-20.
Higham, N. J. (1987) "A Survey of Condition Number Estimation for
Triangular Matrices," SIAM Rev., 29, 575-596.
Huschens, J. (1994) "On the Use of Product Structure in Secant Meth-
ods for Nonlinear Least Squares Problems," SIAM J. Optim., 4, No. 1,
108-129.
James, D. (1992) "Implicit Nullspace Iterative Methods for Constrained
Least-Squares Problems," SIAM J. Matrix Anal. Appl, 13, No. 3, 962-
978.
Jessup, E. R. and Sorensen, D. C. (1994) "A Parallel Algorithm for
Computing the Singular Value Decomposition of a Matrix," SIAM J. Ma-
trix Anal. Appl, 15, No. 2, 530-548.
Jones, M. T. and Plassmann, P. E. (1995a) "An Improved Incomplete
Cholesky Factorization," ACM Trans. Math. Software, 21, No. 1, 5-17.
Jones, M. T. and Plassmann, P. E. (1995b) "Algorithm 740: Fortran
Subroutines to Compute Improved Incomplete Cholesky Factorizations,"
ACM Trans. Math. Software, 21, No. 1, 18-19.
Joubert, W. and Oppe, T. (1994) "Improved SSOR and Incomplete
Cholesky Solution of Linear Equations on Shared Memory and Distributed
Memory Parallel Computers," Numerical Linear Algebra with Applications,
1, Issue 3, 287-311.
Kaufman, L. C. (1975) "A Variable Projection Method for Solving Sep-
arable Nonlinear Least Squares Problems," BIT, 15, 49-57.
Kaufman, L. C. (1987) "The Generalized Householder Transformation
and Sparse Matrices," Linear Algebra Appi, 90, 221-235.

Kaufman, L. C. (1993) "Maximum Likelihood, Least Squares, and Pe-


nalized Least Squares for PET (Positron Emission Tomography)," IEEE
Trans. Medical Imaging, 12, No.2, 200-214.
Kaufman, L. C. and Neumaier, A. (1994) "Image Reconstruction Through
Regularization by Envelope Guided Conjugate Gradients", Bell Laborato-
ries Comput. Sci. Report 940819-14, AT&T Bell Laboratories, Murray
Hill, New Jersey, 32 pp.
Kaufman, L. and Sylvester, G. (1992) "Separable Nonlinear Least-
Squares with Multiple Right-hand Sides," SIAM J. Matrix Anal. Appl,
13, No. 1, 68-89.
Kaufman, L. C., Sylvester, G., and Wright, M. H. (1994) "Structured
Linear Least-Squares Problems in System Identification and Separable Non-
linear Data Fitting," SIAM J. Optim., 4, 847-871.
Krogh, F. T. (1974) "Efficient Implementation of a Variable Projection
Algorithm for Nonlinear Least Squares Problems," Comm. of the ACM,
17, No. 3, 167-169. Errata in same volume, No. 10, p. 591.
Lawson, C. L. (1977) "Software for C1 Surface Interpolation," Mathe-
matical Software III, ed. Rice, J. R., Academic Press, 161-194.
Lawson, C. L. (1984) "C1 Surface Interpolation for Scattered Data on
a Sphere," The Rocky Mountain J. Math., 14, No. 1, 177-202.
Lawson, C. L., Hanson, R., Kincaid, D., and Krogh, F. (1979) "Ba-
sic Linear Algebra Subprograms for Fortran Usage," ACM Trans. Math.
Software, 5, 308-325.
Lawton, W. H. and Sylvestre, E. A. (1971) "Elimination of Linear Pa-
rameters in Nonlinear Regression," Technometrics, 13, No. 3, 461-467.
Linebarger, D., DeGroat, R., and Dowling, E. (1994) "Efficient Direction-
Finding Methods Employing Forward/Backward Averaging," IEEE Trans.
Signal Processing, 42, No. 8, 2136-2145.
Liu, J. (1994) "A Sensitivity Analysis for Least-Squares Ill-Posed Prob-
lems Using the Haar Basis," SIAM J. Numer. Anal., 31, No. 5,1486-1496.
Mason, J. C., Rodriguez, G., and Seatzu, S. (1993) "Orthogonal Splines
Based on B-splines—with Applications to Least Squares, Smoothing and
Regularization problems," Numer. Algorithms, 5, 25-40.
Mathias, R. (1995) "Condition Estimation for Matrix Functions via the
Schur Decomposition," SIAM J. Matrix Anal. Appl., 16, No. 2, 565-578.
Matstoms, P. (1992) "Subroutine QR27—Solution of Sparse Linear
Least Squares Problems," Department of Mathematics, University of Linkop-
ing, S-581 83 Linkoping, Sweden.
Matstoms, P. (1994) "Sparse QR Factorization in MATLAB," ACM
Trans. Math. Software, 20, No. 1, 136-159.
Matstoms, P. (1995) Sparse QR factorization with applications to linear
least squares problems, Thesis, University of Linkoping, Sweden.
Moonen, M., Van Dooren, P., and Vandewalle, J. (1992) "A Singular
Value Decomposition Updating Algorithm for Subspace Tracking," SIAM
J. Matrix Anal. Appl., 13, No. 4, 1015-1038.
More, J. J. (1977) "The Levenberg-Marquardt Algorithm: Implementa-
tion and Theory," Numerical Analysis, Proceedings, Biennial Conference,
Dundee 1977, ed. G. A. Watson, Springer-Verlag, Berlin, Heidelberg, New
York, 105-116.
More, J. J., Garbow, B. S., and Hillstrom, K. E. (1980) "User Guide for
MINPACK-1," Argonne National Laboratory Report ANL-80-74, Argonne,
IL.
More, J. J. and Wright, S. J. (1993) Optimization Software Guide, Fron-
tiers in Applied Math., 14, SIAM Publ., Philadelphia, PA., 154 pp.
Navon, I. M., Zou, X., Derber, J., and Sela, J. (1992) "Variational Data
Assimilation with the N.M.C. Spectral Model. Part 1: Adiabatic Model
Tests," Monthly Weather Review, 120, No. 7, 1433-1446.
Ng, E. and Peyton, B. W. (1993) "A Supernodal Cholesky Factoriza-
tion Algorithm for Shared-Memory Multiprocessors," SIAM J. Sci. Statist.
Comput, 14, No. 4, 761-769.
Ng, M. K. (1993) "Fast Iterative Methods for Solving Toeplitz-Plus-
Hankel Least Squares," Electron. Trans. Numer. Anal., 2, 154-170.
Ng, M. K. and Chan, R. H. (1994) "Fast Iterative Methods for Least
Squares Estimations," Numer. Algorithms, 6, Nos. 3-4, 353-378.
Olszanskyi, S. J., Lebak, J. M., and Bojanczyk, A. W. (1994) "Rank-
k Modification Methods for Recursive Least Squares Problems," Numer.
Algorithms, 7, Nos. 2-4, 325-354.
Østerby, O. and Zlatev, Z. (1983) Direct Methods for Sparse Matrices,
Lecture Notes in Computer Science, 157, Springer-Verlag, Berlin, 127 pp.
Paige, C. C. (1979a) "Computer Solution and Perturbation Analysis of
Generalized Least Squares Problems," Math. Comput., 33, 171-184.
Paige, C. C. (1979b) "Fast Numerically Stable Computations for Gen-
eralized Least Squares Problems," SIAM J. Numer. Anal., 16, 165-171.
Paige, C. C. (1984) "A Note on a Result of Sun Ji-Guang: Sensitivity
of the CS and CSV Decompositions," SIAM J. Numer. Anal., 21,186-191.
Paige, C. C. (1986) "Computing the Generalized Singular Value De-
composition," SIAM J. Sci. Statist. Comput, 7, 1126-1146.
Paige, C. C. (1990) "Some Aspects of Generalized QR Factorizations,"
Reliable Numerical Computation, Ed. Cox, M. G. and Hammarling, S.,
Oxford University Press, Oxford, UK, 73-91.
Paige, C. C. and Saunders, M. A. (1977) "Least Squares Estimation
of Discrete Linear Dynamic Systems Using Orthogonal Transformations,"
SIAM J. Numer. Anal., 14, No. 2, 180-193.
Paige, C. C. and Saunders, M. A. (1981) "Toward a Generalized Singu-
lar Value Decomposition," SIAM J. Numer. Anal., 18, 398-405.
Paige, C. C. and Saunders, M. A. (1982) "LSQR: An Algorithm for
Sparse Linear Equations and Sparse Least Squares," ACM Trans. Math.
Software, 8, 43-71.
Park, H. (1994) "ESPRIT Direction-of-Arrival Estimation in the Pres-
ence of Spatially Correlated Noise," SIAM J. Matrix Anal. Appl., 15, No.
1, 185-193.
Park, H. and Elden, L. (1995a) "Downdating the Rank-Revealing URV
Decomposition," SIAM J. Matrix Anal. Appl., 16, 138-155.
Park, H. and Elden, L. (1995b) "Stability Analysis and Fast Algorithms
for Triangularization of Toeplitz Matrices," Tech. Report, LiTH-MAT-
R-95-16, Department of Mathematics, Linkoping University, Linkoping,
Sweden.
Park, H. and Van Huffel, S. (1995) "Two-way Bidiagonalization Scheme
for Downdating the Singular Value Decomposition," Linear Algebra Appl,
220, 1-17.
Parlett, B. N. (1980) The Symmetric Eigenvalue Problem, Prentice-
Hall, Inc., Englewood Cliffs, NJ.
Poor, W. (1992) "Statistical Estimation of Navigational Errors," IEEE
Trans. Aerospace Electron. Systems, 28, No. 2, 428-438.
Quak, E., Sivakumar, N., and Ward, J. D. (1993) "Least Squares Ap-
proximation by Radial Functions," SIAM J. Math. Anal., 24, No. 4,
1043-1066.
Ray, R. D. (1995) "Algorithm: Least Squares Solution of a Linear Bor-
dered, Block-diagonal System of Equations," ACM Trans. Math. Software,
21, No. 1, 2-25.
Reichel, L. (1991) "Fast QR Decomposition of Vandermonde-like Matri-
ces and Polynomial Least Squares Approximation," SIAM J. Matrix Anal.
Appl, 12, 552-564.
Reilly, P. M., Reilly, H. V., and Keeler, S. E. (1993) "Algorithm AS-286:
Parameter Estimation in the Errors-in-Variables Model," Appl. Statistics
J. of the Royal Stat. Soc., 42, No. 4, 693-701.
Rosen, J. B., Park, H., and Glick, J. (1996) "Total Least Norm Formula-
tion and Solution for Structured Problems," SIAM J. Matrix Anal. Appl.,
17, No. 1, [To appear].
Rothberg, E. and Gupta, A. (1994) "An Efficient Block-Oriented Ap-
proach to Parallel Sparse Cholesky Factorization," SIAM J. Sci. Statist.
Comput., 15, No. 6, 1413-1439.
Schnabel, R. B. and Frank, P. D. (1984) "Tensor Methods for Nonlinear
Equations," SIAM J. Numer. Anal., 21, No. 5, 815-843.
Schreiber, R. and Parlett, B. (1988) "Block Reflectors: Theory and Com-
putation," SIAM J. Numer. Anal., 25, No. 1, 189-205.
Schreiber, R. and Van Loan, C. F. (1989) "A Storage Efficient WY
Representation for Products of Householder Transformations," SIAM J.
Sci. Statist. Comput., 10, 53-57.
Sevy, J. C. (1995) "Lagrange and Least-squares Polynomials as Limits of
Linear Combinations of Iterates of Bernstein and Durrmeyer Polynomials,"
J. Approx. Theory, 80, No. 2, 267-271.
Shroff, G. M. and Bischof, C. (1992), "Adaptive Condition Estimation
for Rank-one Updates of QR Factorizations," SIAM J. Matrix Anal. Appl.,
13, No. 4, 1264-1278.
Springer, J. (1986) "Exact Solution of General Integer Systems of Lin-
ear Equations," ACM Trans. Math. Software, 12, No. 1, 51-61. Also
"Algorithm 641," same jour. 12, No. 2, p. 149.
Stewart, D. (1994) "Meschach 1.2—A Library in C of Linear Algebra
Codes, Including Least-Squares," Centre for Mathematics and its Appli-
cations, School of Mathematical Sciences, Australian National University,
Canberra, ACT 0200, Australia, E-mail: david.stewart@anu.edu.au.
Stewart, G. W. (1977) "On the Perturbation of Pseudo-Inverses, Pro-
jections and Linear Least Squares Problems," SIAM Rev., 19, 634-662.
Stewart, G. W. (1982) "Computing the CS-Decomposition of a Parti-
tioned Orthonormal Matrix," Numer. Math., 40, 297-306.
Stewart, G. W. (1992) "An Updating Algorithm for Subspace Track-
ing," IEEE Trans. Signal Processing, 40, No. 6, 1535-1541.
Stewart, G. W. (1995) Theory of the Combination of Observations Least
Subject to Errors - Part One, Part Two, Supplement, SIAM Classics in
Applied Mathematics, 11, SIAM Publ, Philadelphia, PA.
Stoer, J. (1992) "A Dual Algorithm for Solving Degenerate Linearly
Constrained Linear Least Squares Problem," J. Numer. Linear Algebra
Appl, 1, No. 2, 103-132.
Strang, G. (1988) Linear Algebra and Its Applications, 3rd Edition,
Academic Press, New York.
Sun, Ji-G. (1983) "Perturbation Analysis for the Generalized Singular
Value Problem," SIAM J. Numer. Anal., 20, 611-625.
Sun, J. (1992) "Rounding Errors and Perturbation Bounds for the
Cholesky and LDLT Factorizations," Linear Algebra Appl., 173, 77-97.
Sun, Ji-G. (1995) "On Perturbation Bounds for the QR Factorization,"
Linear Algebra Appl., 215, 95-112.
Ten Vregelaar, J. M. (1995) "On Computing Objective Function and
Gradient in the Context of Least Squares Fitting a Dynamic Errors-In-
Variables Model," SIAM J. Sci. Statist. Comput., 16, No. 3, 738-753.
Vaccaro, R. J. (1994) "A Second-Order Perturbation Expansion for the
SVD," SIAM J. Matrix Anal. Appl., 15, No. 2, 661-671.
Van Der Sluis, A. and Van Der Vorst, H. (1990) "SIRT- and CG-Type
Methods for the Iterative Solution of Sparse Least Squares Problems,"
Linear Algebra Appl, 130, 257-302.
Van Huffel, S. (1992) "On the Significance of Nongeneric Total Least-
Squares Problems," SIAM J. Matrix Anal. Appl, 13, No. 1, 20-35.
Van Huffel, S. and Vandewalle, J. (1991) The Total Least Squares Prob-
lem — Computational Aspects and Analysis, Frontiers in Applied Math.,
9, SIAM Publ., Philadelphia, PA., 300 pp.
Van Huffel, S. and Zha, H. (1993) "An Efficient Total Least Squares
Algorithm Based on a Rank-Revealing Two-sided Orthogonal Decomposi-
tion," Numer. Algorithms, 4, Issues 1 and 2, 101-133.
Van Loan, C. F. (1973) Generalized Singular Values with Algorithms
and Applications, Ph.D. thesis, University of Michigan, Ann Arbor.
Van Loan, C. F. (1976) "Generalizing the Singular Value Decomposi-
tion," SIAM J. Numer. Anal., 13, 76-83.
Van Loan, C. F. (1985) "Computing the CS and the Generalized Sin-
gular Value Decomposition," Numer. Math., 46, 479-491.
Vogel, C. R. and Wade, J. G. (1994) "Iterative SVD-Based Methods for
Ill-Posed Problems," SIAM J. Sci. Statist. Comput., 15, No. 3, 736-754.
Wahba, G. (1990) Spline Models for Observational Data, SIAM Publ.,
Philadelphia, PA.
Watson, G. A. (1992) "Computing the Structured Singular Value,"
SIAM J. Matrix Anal. Appl, 13, No. 4, 1054-1066.
Wedin, P.-A. and Lindstrom, P. (1988) "Methods and Software for Non-
linear Least Squares Problems," Rep. UMINF-133.87, University of Umea,
S-901 87, Umea, Sweden.
Wei, M. S. (1990) "Perturbation of Least Squares Problem," Linear
Algebra Appl, 141, 177-182.
Wei, M. S. (1992a) "Perturbation Theory for the Rank-Deficient Equal-
ity Constrained Least Squares Problem," SIAM J. Numer. Anal, 29, No.
5, 1462-1481.
Wei, M. S. (1992b) "Algebraic Properties of the Rank-Deficient Con-
strained and Weighted Least Squares Problems," Linear Algebra Appl,
161, 27-43.
Wei, M. S. (1992c) "The Analysis for the Total Least-Squares Problem
with More Than One Solution," SIAM J. Matrix Anal. Appl., 13, No. 3,
746-763.
Williams, J. and Kalogiratou, Z. (1993) "Least Squares and Chebyshev
Fitting for Parameter Estimation in ODEs," Adv. Comput. Math., 1,
Issues 3 and 4, 357-366.
Wood, S. N. (1994) "Monotone Smoothing Splines Fitted by Cross Val-
idation," SIAM J. Sci. Statist. Comput., 15, No. 5, 1126-1133.
Xu, G. G., Zha, H. Y., Golub, G. H., and Kailath, T. (1994) "Fast Al-
gorithms for Updating Signal Subspaces," IEEE Trans. Circuits Systems,
41, No. 8, 537-549.
Yoo, K. and Park, H. (1995) "Accurate Downdating of a QR Decom-
position," BIT, 35, No. 4.
Zha, H. Y. (1991) "The Restricted Singular Value Decomposition of
Matrix Triplets," SIAM J. Matrix Anal. Appl, 12, 172-194.
Zha, H. Y. (1992) "A Numerical Algorithm for Computing Restricted
Singular Value Decomposition of Matrix Triplets," Linear Algebra Appl.,
168, 1-25.
Zha, H. Y. (1993) "A Componentwise Perturbation Analysis of the QR
Decomposition," SIAM J. Matrix Anal. Appl., 14, No. 4, 1124-1131.
Zha, H. and Hansen, P. C. (1990) "Regularization and the General
Gauss-Markov Linear Model," Math. Comput., 55, 613-624.
Zou, X. and Navon, I. M. (1994) Environmental Modeling, Volume II,
Computer Methods and Software for Simulating Environmental Pollution
and its Adverse Effects, P. Zannetti, Ed., Computational Mechanics Publ.,
Southampton and Boston, 350 pp, 277-325.
Zou, X., Navon, I. M., Berger, M., Phua, M. K., Schlick, T., and Le
Dimet, F. X. (1993) "Numerical Experience with Limited-Memory, Quasi-
Newton Methods for Large-Scale Unconstrained Nonlinear Minimization,"
SIAM J. Optim., 3, No. 3, 582-608.
BIBLIOGRAPHY

ALBERT, ARTHUR (1972) Regression and the Moore-Penrose Pseudoinverse.
Academic Press, New York and London, 180 pp.
ANSI Subcommittee X3J3 (1971) "Clarification of Fortran Standards-
Second Report," Comm. ACM, 14, No. 10, 628-642.
ASA Committee X3 (1964) "Fortran vs. Basic Fortran," Comm. ACM, 7,
No. 10, 591-625.
AUTONNE, L. (1915) "Sur les Matrices Hypohermitiennes et sur les Matrices
Unitaires," Ann. Univ. Lyon, (2), 38, 1-77.
BARD, YONATHAN (1970) "Comparison of Gradient Methods for the Solution
of Nonlinear Parameter Estimation Problems," SIAM J. Numer. Anal., 7,
No. 1, 157-186.
BARTELS, R. H., GOLUB, G. H., AND SAUNDERS, M. A. (1970) "Numerical
Techniques in Mathematical Programming," in Nonlinear Programming.
Academic Press, New York, 123-176.
BARTELS, RICHARD H. (1971) "A Stabilization of the Simplex Method,"
Numer. Math., 16, 414-434.
BAUER, F. L. (1963) "Optimal Scaling of Matrices and the Importance of the
Minimal Condition," Proc. IFIP Congress 1962, North-Holland, 198-201.
BAUER, F. L., see Wilkinson, J. H.
BEN-ISRAEL, ADI (1965) "A Modified Newton-Raphson Method for the
Solution of Systems of Equations," Israel J. Math., 3, 94-98.
BEN-ISRAEL, ADI (1966) "On Error Bounds for Generalized Inverses,"
SIAM J. Numer. Anal., 3, No. 4, 585-592.
BENNETT, JOHN M. (1965) "Triangular Factors of Modified Matrices,"
Numer. Math., 7, 217-221.
BJORCK, AKE (1967a) "Solving Linear Least Squares Problems by Gram-
Schmidt Orthogonalization," BIT, 7, 1-21.
BJORCK, AKE (1967b) "Iterative Refinement of Linear Least Squares Solutions
I," BIT, 7, 257-278.
BJORCK, AKE AND GOLUB, G. H. (1967) "Iterative Refinement of Linear Least
Squares Solutions by Householder Transformation," BIT, 7, 322-337.
BJORCK, AKE AND GOLUB, G. H. (1973) "Numerical Methods for Computing
Angles between Linear Subspaces," Math. Comp., 27, 579-594.
BOGGS, D. H. (1972) "A Partial-Step Algorithm for the Nonlinear Estimation
Problem," AIAA Journal, 10, No. 5, 675-679.
BOULLION, T. AND ODELL, P. (1971) Generalized Inverse Matrices. Wiley
(Interscience), New York.
BROWN, KENNETH M. (1971) "Computer Solution of Nonlinear Algebraic
Equations and Nonlinear Optimization Problems," Proc. Share, XXXVII,
New York, 12 pp.
BUCHANAN, J. E. AND THOMAS, D. H. (1968) "On Least-Squares Fitting of
Two-Dimensional Data with a Special Structure," SIAM J. Numer. Anal.,
5, No. 2, 252-257.
BUSINGER, P. A. (1967) "Matrix Scaling with Respect to the Maximum-
Norm, the Sum-Norm, and the Euclidean Norm," Thesis TNN71, The Uni-
versity of Texas, Austin, 119 pp.
BUSINGER, P. A. (1970a) "MIDAS—Solution of Linear Algebraic Equations,"
Bell Telephone Laboratories, Numerical Mathematics Computer Programs 3,
Issue 1. Murray Hill, NJ.
BUSINGER, P. A. (1970b) "Updating a Singular Value Decomposition," BIT,
10, 376-385.
BUSINGER, P. A., see Golub, G. H.
BUSINGER, P. A. AND GOLUB, G. H. (1967) "An Algol Procedure for Com-
puting the Singular Value Decomposition," Stanford Univ. Report No. CS-73,
Stanford, Calif.
BUSINGER, P. A. AND GOLUB, G. H. (1969) "Singular Value Decomposition
of a Complex Matrix," Comm. ACM, 12, No. 10, 564-565.
CARASSO, C. AND LAURENT, P. J. (1968) "On the Numerical Construction and
the Practical Use of Interpolating Spline-Functions," Proc. IFIP Congress
1968, North-Holland Pub. Co.,
CHAMBERS, JOHN M. (1971) "Regression Updating," J. Amer. Statist. Assn.,
66, 744-748.
CHENEY, E. W. AND GOLDSTEIN, A. A. (1966) "Mean-Square Approximation
by Generalized Rational Functions," Boeing Scientific Research Lab. Math.
Note, No. 465, p. 17, Seattle.
CODY, W. J. (1971) "Software for the Elementary Functions," Mathematical
Software, ed. by J. R. Rice. Academic Press, New York, 171-186.
COWDEN, DUDLEY J. (1958) "A Procedure for Computing Regression
Coefficients," J. Amer. Statist. Assn., 53, 144-150.
Cox, M. G. (1971) "The Numerical Evaluation of B-Splines," Report No.
NPL-DNAC-4, National Physical Lab., Teddington, Middlesex, England,
34pp.
CRUISE, D. R. (1971) "Optimal Regression Models for Multivariate Analysis
(Factor Analysis)," Naval Weapons Center Report NWC TP 5103, China
Lake, Calif., 60 pp.
DAVIDON, WILLIAM C. (1968) "Variance Algorithm for Minimization," Opti-
mization (Sympos. Keele, 1968), Academic Press, London, 13-20.
DAVIS, CHANDLER AND KAHAN, W. M. (1970) "The Rotation of Eigenvectors
by a Perturbation, III," SIAM J. Numer. Anal., 7, No. 1, 1-46.
DE BOOR, CARL (1971) "Subroutine Package for Calculating with B-Splines,"
Report No. LA-4728-MS, Los Alamos Scientific Lab, 12 pp.
DE BOOR, CARL (1972) "On Calculating with B-Splines," J. Approximation
Theory, 6, No. 1, 50-62.
DE BOOR, CARL AND RICE, JOHN R. (1968a) "Least Squares Cubic Spline
Approximation I—Fixed Knots," Purdue Univ., CSD TR 20, Lafayette,
Ind., 30 pp.
DE BOOR, CARL AND RICE, JOHN R. (1968b) "Least Squares Cubic Spline
Approximation II—Variable Knots," Purdue Univ., CSD TR 21, Lafayette,
Ind., 28 pp.
DRAPER, N. R. AND SMITH, H. (1966) Applied Regression Analysis. John
Wiley & Sons, Inc., New York, 407 pp.
DYER, P., see Hanson, R. J.
DYER, P. AND MCREYNOLDS, S. R. (1969) "The Extension of Square-Root
Filtering to Include Process Noise," J. Opt.: Theory and Applications, 3,
92-105.
EBERLEIN, P. J. (1970) "Solution to the Complex Eigenproblem by a Norm
Reducing Jacobi Type Method," Numer. Math., 14, 232-245.
ECKART, C. AND YOUNG, G. (1936) "The Approximation of One Matrix by
Another of Lower Rank," Psychometrika, 1, 211-218.
FADDEEV, D. K., KUBLANOVSKAYA, V. N., AND FADDEEVA, V. N. (1968)
"Sur les Systemes Lineaires Algebriques de Matrices Rectangulaires et
Mal-Conditionnees," Programmation en Mathematiques Numeriques,
Editions Centre Nat. Recherche Sci., Paris, VII, 161-170.
FADDEEVA, V. N., see Faddeev, D. K.
FIACCO, A. V. AND MCCORMICK, G. P. (1968) Nonlinear Programming:
Sequential Unconstrained Minimization Techniques. John Wiley & Sons, Inc.,
New York.
FLETCHER, R. (1968) "Generalized Inverse Methods for the Best Least
Squares Solution of Systems of Non-Linear Equations," Comput. J., 10,
392-399.
FLETCHER, R. (1971) "A Modified Marquardt Subroutine for Nonlinear Least
Squares," Report No. R-6799, Atomic Energy Research Estab., Harwell,
Berkshire, England, 24 pp.
FLETCHER, R. AND LILL, S. A. (1970) "A Class of Methods for Non-Linear
Programming, II, Computational Experience," in Nonlinear Programming,
ed. by J. B. Rosen, O. L. Mangasarian, and K. Ritter. Academic Press,
New York, 67-92.
FORSYTHE, GEORGE E. (1957) "Generation and Use of Orthogonal Poly-
nomials for Data-Fitting with a Digital Computer," J. Soc. Indust. Appl.
Math., 5, 74-88.
FORSYTHE, GEORGE E. (1967) "Today's Methods of Linear Algebra," SIAM
Rev., 9, 489-515.
FORSYTHE, GEORGE E. (1970) "Pitfalls in Computation, or Why a Math Book
Isn't Enough," Amer. Math. Monthly, 77, 931-956.
FORSYTHE, GEORGE E. AND MOLER, CLEVE B. (1967) Computer Solution of
Linear Algebraic Systems. Prentice-Hall, Inc., Englewood Cliffs, N.J., 148 pp.
FOX, L. (1965) An Introduction to Numerical Linear Algebra. Oxford Univ.
Press, New York, 327 pp.
FRANCIS, J. G. F. (1961-1962) "The QR Transformation," Parts I and II,
Comput. J., 4, 265-271, 332-345.
FRANKLIN, J. N. (1968) Matrix Theory. Prentice-Hall, Inc., Englewood
Cliffs, N.J., 292 pp.
FRANKLIN, J. N. (1970) "Well-Posed Stochastic Extensions of Ill-Posed Linear
Problems," J. Math. Anal. Appl., 31, 682-716.
GALE, DAVID (1960) Theory of Linear Economic Models, McGraw-Hill,
330pp.
GALE, DAVID (1969) "How to Solve Linear Inequalities," Amer. Math.
Monthly, 76, 589-599.
GARSIDE, M. J. (1971) "Some Computational Procedures for the Best Subset
Problem," Appl. Statist. (J. Roy. Statist. Soc., Ser. C), 20, 8-15 and 111-115.
GASTINEL, NOEL (1971) Linear Numerical Analysis. Academic Press, New
York, 350 pp.
GENTLEMAN, W. M. (1972a) "Least Squares Computations by Givens
Transformations without Square Roots," Univ. of Waterloo Report CSRR-
2062, Waterloo, Ontario, Canada, 17 pp.
GENTLEMAN, W. M. (1972b) "Basic Procedures for Large, Sparse or Weighted
Linear Least Squares Problems," Univ. of Waterloo Report CSRR-2068,
Waterloo, Ontario, Canada, 14 pp.
GILL, P. E. AND MURRAY, W. (1970) "A Numerically Stable Form of the
Simplex Algorithm," National Physical Lab. Math. Tech. Report No. 87,
Teddington, Middlesex, England, 43 pp.
GILL, P. E., GOLUB, G. H., MURRAY, W., AND SAUNDERS, M. A. (1972)
"Methods for Modifying Matrix Factorizations," Stanford Univ. Report No.
CS-322, Stanford, Calif., 60 pp.
GIVENS, W. (1954) "Numerical Computation of the Characteristic Values of a
Real Symmetric Matrix," Oak Ridge National Laboratory Report ORNL-1574,
Oak Ridge, Tenn.
GLASKO, V. B., see Tihonov, A. N.
GOLDSTEIN, A. A., see Cheney, E. W.
GOLUB, G. H. (1968) "Least Squares, Singular Values and Matrix Approxi-
mations," Aplikace Matematiky, 13, 44-51.
GOLUB, G. H. (1969) "Matrix Decompositions and Statistical Calculations,"
in Statistical Calculations, ed. by R. C. Milton and J. A. Nelder. Academic
Press, New York, 365-397.
GOLUB, G. H., see Bartels, R. H.
GOLUB, G. H., see Bjorck, Ake.
GOLUB, G. H., see Businger, P. A.
GOLUB, G. H., see Gill, P. E.
GOLUB, G. H. AND BUSINGER, P. A. (1965) "Linear Least Squares Solutions
by Householder Transformations," Numer. Math., 7, H. B. Series Linear
Algebra, 269-276.
GOLUB, G. H. AND KAHAN, W. (1965) "Calculating the Singular Values and
Pseudoinverse of a Matrix," SIAM J. Numer. Anal., 2, No. 3, 205-224.
GOLUB, G. H. AND PEREYRA, V. (1973) "The Differentiation of Pseudoinverses
and Nonlinear Least Squares Problems Whose Variables Separate," SIAM
J. Numer. Anal., 10,413-432.
GOLUB, G. H. AND REINSCH, C. (1971) "Singular Value Decomposition and
Least Squares Solutions" in Handbook for Automatic Computation, II, Linear
Algebra [see Wilkinson and Reinsch (1971)].
GOLUB, G. H. AND SAUNDERS, MICHAEL A. (1970) "Linear Least Squares and
Quadratic Programming" in Integer and Nonlinear Programming, II, J.
Abadie (ed.), North Holland Pub. Co., Amsterdam, 229-256.
GOLUB, G. H. AND SMITH, L. B. (1971) "Chebyshev Approximation of Con-
tinuous Functions by a Chebyshev System of Functions," Comm. ACM, 14,
No. 11, 737-746.
GOLUB, G. H. AND STYAN, G. P. H. (to appear) "Numerical Computations
for Univariate Linear Models," J. Statist. Computation and Simulation.
GOLUB, G. H. AND UNDERWOOD, RICHARD (1970) "Stationary Values of the
Ratio of Quadratic Forms Subject to Linear Constraints," Z. Angew. Math,
Phys., 21, 318-326.
GOLUB, G. H. AND WILKINSON, J. H. (1966) "Note on the Iterative Refine-
ment of Least Squares Solution," Numer. Math., 9, 139-148.
GORDON, WILLIAM R., see Marcus, Marvin.
GORDONOVA, V. I., translated by Linda Kaufman (1970) "Estimates of the
Roundoff Error in the Solution of a System of Conditional Equations,"
Stanford Univ. Report No. CS-164, Stanford University, Stanford, Calif.
GRAYBILL, F. A., MEYER, C. D., AND PAINTER, R. J. (1966) "Note on the
Computation of the Generalized Inverse of a Matrix," SIAM Rev., 8, 522-
524.
GREVILLE, T. N. E. (1959) "The Pseudo-Inverse of a Rectangular or Singular
Matrix and its Application to the Solution of Systems of Linear Equations,"
SIAM Rev., 1, 38-43.
GREVILLE, T. N. E. (1960) "Some Applications of the Pseudoinverse of a
Matrix," SIAM Rev., 2, No. 1, 15-22.
GREVILLE, T. N. E. (1966) "Note on the Generalized Inverse of a Matrix
Product," SIAM Rev., 8, 518-521.
HALMOS, PAUL R. (1957) Introduction to Hilbert Space, 2nd ed. Chelsea Pub.
Co., New York, 114 pp.
HALMOS, PAUL R. (1958) Finite Dimensional Vector Spaces, 2nd ed. D. Van
Nostrand Co., Inc., Princeton, NJ., 199 pp.
HANSON, R. J. (1970) "Computing Quadratic Programming Problems: Linear
Inequality and Equality Constraints," JPL Sec. 314 Tech. Mem. No. 240,
Jet Propulsion Laboratory, California Institute of Technology, Pasadena,
Calif.
HANSON, R. J. (1971) "A Numerical Method for Solving Fredholm Integral
Equations of the First Kind Using Singular Values," SIAM J. Numer. Anal.,
8, No. 3, 616-622.
HANSON, R. J. (1972) "Integral Equations of Immunology," Comm. ACM,
15, No. 10, 883-890.
HANSON, R. J. AND DYER, P. (1971) "A Computational Algorithm for Sequen-
tial Estimation," Comput. J., 14, No. 3,285-290.
HANSON, R. J. AND LAWSON, C. L. (1969) "Extensions and Applications of
the Householder Algorithm for Solving Linear Least Squares Problems,"
Math. Comp., 23, No. 108,787-812.
HEALY, M. J. R. (1968) "Triangular Decomposition of a Symmetric Matrix,"
Appl. Statist. (J. Roy. Statist. Soc., Ser. C), 17, 195-197.
HEMMERLE, W. J. (1967) Statistical Computations on a Digital Computer.
Blaisdell Pub. Co., New York, 230 pp.
HERBOLD, ROBERT J., see Hilsenrath, Joseph.
HESTENES, M. R. (1958) "Inversion of Matrices by Biorthogonalization and
Related Results," J. Soc. Indust. Appl. Math., 6, 51-90.
HESTENES, M. R. AND STIEFEL, E. (1952) "Methods of Conjugate Gradients
for Solving Linear Systems," Nat. Bur. of Standards J. Res., 49, 409-436.
HILSENRATH, JOSEPH, ZIEGLER, GUY G., MESSINA, CARLA G., WALSH, PHILLIP
J., AND HERBOLD, ROBERT J. (1966, Revised 1968) "OMNITAB, a Computer
Program for Statistical and Numerical Analysis," Nat. Bur. of Standards,
Handbook 101, 275 pp.
HOCKING, R. R., see La Motte, L. R.
HOERL, A. E. (1959) "Optimum Solution of Many Variable Equations,"
Chemical Engineering Progress, 55, No. 11, 69-78.
HOERL, A. E. (1962) "Application of Ridge Analysis to Regression Problems,"
Chemical Engineering Progress, 58, No. 3, 54-59.
HOERL, A. E. (1964) "Ridge Analysis," Chemical Engineering Progress, 60,
No. 50, 67-78.
HOERL, A. E. AND KENNARD, R. W. (1970a) "Ridge Regression: Biased
Estimation for Nonorthogonal Problems," Technometrics, 12, 55-67.
HOERL, A. E. AND KENNARD, R. W. (1970b) "Ridge Regression: Applications
to Nonorthogonal Problems," Technometrics, 12, 69-82.
HOFFMAN, A. J. AND WIELANDT, H. W. (1953) "The Variation of the Spectrum
of a Normal Matrix," Duke Math. J., 20, 37-39.
HOUSEHOLDER, A. S. (1958) "Unitary Triangularization of a Nonsymmetric
Matrix," J. ACM, 5, 339-342.
HOUSEHOLDER, A. S. (1964) The Theory of Matrices in Numerical Analysis.
Blaisdell Pub. Co., New York, 257 pp.
HOUSEHOLDER, A. S. (1972) "KWIC Index for Numerical Algebra," Report
No. ORNL-4778, Oak Ridge National Laboratory, 538 pp.
JACOBI, C. O. J. (1846) "Uber ein Leichtes Verfahren die in der Theorie der
Sacularstorungen Vorkommenden Gleichungen Numerisch Aufzulosen,"
Crelle's J., 30, 297-306.
JENNINGS, L. S. AND OSBORNE, M. R. (1970) "Applications of Orthogonal
Matrix Transformations to the Solution of Systems of Linear and Nonlinear
Equations," Australian National Univ. Computer Centre Tech. Rept., No. 37,
Canberra, Australia, 45 pp.
JENNRICH, R. I. AND SAMPSON, P. I. (1971) "Remark AS-R3. A Remark on
Algorithm AS-10," Appl. Statist. (J. Roy. Statist. Soc., Ser. C), 20,117-118.
KAHAN, W. (1966a) "When to Neglect Off-Diagonal Elements of Symmetric
Tri-Diagonal Matrices," Stanford Univ. Report No. CS-42, Stanford, Calif.
KAHAN, W. (1966b) "Numerical Linear Algebra," Canadian Math. Bull.,
9, No. 6, 757-801.
KAHAN, W. M., see Davis, Chandler.
KAHAN, W. M., see Golub, G. H.
KALMAN, R. E. (1960) "A New Approach to Linear Filtering and Prediction
Problems," ASME Trans., J. Basic Eng., 82D, 35-45.
KAMMERER, W. J. AND NASHED, M. Z. (1972) "On the Convergence of the
Conjugate Gradient Method for Singular Linear Operator Equations,"
SIAM J. Numer. Anal., 9, 165-181.
KENNARD, R. W., see Hoerl, A. E.
KORGANOFF, ANDRE, see Pavel-Parvu, Monica.
KORGANOFF, ANDRE AND PAVEL-PARVU, MONICA (1967) Elements de Theorie
des Matrices Carrees et Rectangles en Analyse Numerique. Dunod, Paris,
441 pp.
KROGH, F. T. (1974) "Efficient Implementation of a Variable Projection
Algorithm for Nonlinear Least Squares Problems," Comm. ACM, 17 (to
appear).
KUBLANOVSKAYA, V. N., see Faddeev, D. K.
LA MOTTE, L. R. AND HOCKING, R. R. (1970) "Computational Efficiency in
the Selection of Regression Variables," Technometrics, 12, 83-93.
LAURENT, P. J., see Carasso, C.
LAWSON, C. L. (1961) "Contributions to the Theory of Linear Least Maximum
Approximation," Thesis, UCLA, Los Angeles, Calif., 99 pp.
LAWSON, C. L. (1971) "Applications of Singular Value Analysis," Mathe-
matical Software, ed. by John Rice. Academic Press, New York, 347-356.
LAWSON, C. L., see Hanson, R. J.
LERINGE, ORJAN AND WEDIN, PER-AKE (1970) "A Comparison between Dif-
ferent Methods to Compute a Vector x which Minimizes ||Ax - b||2 when
Gx = h," Lund University, Lund, Sweden, 21 pp.
LEVENBERG, K. (1944) "A Method for the Solution of Certain Non-Linear
Problems in Least Squares," Quart. Appl. Math., 2, 164-168.
LILL, S. A., see Fletcher, R.
LONGLEY, JAMES W. (1967) "An Appraisal of Least Squares Programs for the
Electronic Computer from the Point of View of the User," J. Amer. Statist.
Assn., 62, 819-841.
LYNN, M. S. AND TIMLAKE, W. P. (1967) "The Use of Multiple Deflations in
the Numerical Solution of Singular Systems of Equations with Applications
to Potential Theory," IBM Houston Scientific Center, 37.017, Houston,
Texas.
MARCUS, MARVIN AND GORDON, WILLIAM R. (1972) "An Analysis of Equality
in Certain Matrix Inequalities, II," SIAM J. Numer. Anal., 9, 130-136.
MARQUARDT, DONALD W. (1963) "An Algorithm for Least-Squares Estima-
tion of Nonlinear Parameters," J. Soc. Indust. Appl. Math., 11, No. 2, 431-
441.
MARQUARDT, DONALD W. (1970) "Generalized Inverses, Ridge Regression,
Biased Linear Estimation, and Nonlinear Estimation," Technometrics, 12,
591-612.
MASON, J. C. (1969) "Orthogonal Polynomial Approximation Methods in
Numerical Analysis," Univ. of Toronto Tech. Report, No. 11, Toronto,
Ontario, Canada, 50 pp.
MCCORMICK, G. P., see Fiacco, A. V.
MCCRAITH, D. L., see Mitchell, W. C.
McREYNOLDS, S. R., see Dyer, P.
MESSINA, CARLA G., see Hilsenrath, Joseph.
MEYER, CARL D., JR. (1972) "The Moore-Penrose Inverse of a Bordered
Matrix," Lin. Alg. and Its Appl., 5, 375-382.
MEYER, C. D., see Graybill, F. A.
MITCHELL, W. C. AND MCCRAITH, D. L. (1969) "Heuristic Analysis of
Numerical Variants of the Gram-Schmidt Orthonormalization Process,"
Stanford Univ. Report No. CS-122, Stanford, Calif.
MITRA, S. K., see Rao, C. R.
MOLER, CLEVE B. (1967) "Iterative Refinement in Floating Point," J. ACM,
14, No. 2, 316-321.
MOLER, CLEVE B. AND STEWART, G. W. (1973) "An Algorithm for General-
ized Matrix Eigenvalue Problems," SIAM J. Numer. Anal., 10, No. 2,
241-256.
MOLER, CLEVE B., see Forsythe, George E.
MOORE, E. H. (1920) "On the Reciprocal of the General Algebraic Matrix,"
Bulletin, AMS, 26, 394-395.
MOORE, E. H. (1935) "General Analysis, Part 1," Mem. Amer. Philos. Soc., 1,
1-231.
MORRISON, DAVID D. (1960) "Methods for Nonlinear Least Squares Problems
and Convergence Proofs," Proc. of Seminar on Tracking Programs and Orbit
Determination, Jet Propulsion Lab., Pasadena, Calif., Cochairmen: Jack
Lorell and F. Yagi, 1-9.
MORRISON, W. J., see Sherman, Jack.
MURRAY, W. (1971) "An Algorithm to Find a Local Minimum of an Indefi-
nite Quadratic Program" (Presented at the VII Symposium on Mathematical
Programming, The Hague, 1970) Report No. NPL-DNAC-1, National
Physical Laboratory, Teddington, Middlesex, England, 31 pp.
MURRAY, W., see Gill, P. E.
NEWMAN, MORRIS AND TODD, JOHN (1958) "The Evaluation of Matrix
Inversion Programs," J. Soc. Indust. Appl Math., 6, No. 4, 466-476.
ODELL, P., see Boullion, T.
ORTEGA, JAMES M. AND RHEINBOLDT, WERNER C. (1970) Iterative Solution
of Nonlinear Equations in Several Variables. Academic Press, New York,
572 pp.
OSBORNE, E. E. (1961) "On Least Squares Solutions of Linear Equations,"
J. ACM, 8, 628-636.
OSBORNE, E. E. (1965) "Smallest Least Squares Solutions of Linear Equa-
tions," SIAM J. Numer. Anal., 2, No. 2, 300-307.
OSBORNE, M. R., see Jennings, L. S.
PAIGE, C. C. (1972) "An Error Analysis of a Method for Solving Matrix
Equations," Stanford Univ. Report No. CS-297, Stanford, Calif.
PAINTER, R. J., see Graybill, F. A.
PARLETT, B. N. (1967) "The LU and QR Transformations," Chap. 5 of
Mathematical Methods for Digital Computers, II. John Wiley & Sons, Inc.,
New York.
PAVEL-PARVU, MONICA, see Korganoff, Andre.
PAVEL-PARVU, MONICA AND KORGANOFF, ANDRE (1969) "Iteration Functions
for Solving Polynomial Matrix Equations," Constructive Aspects of the Funda-
mental Theorem of Algebra, ed. B. Dejon and P. Henrici. Wiley-Inter-
science, New York, 225-280.
PENROSE, R. (1955) "A Generalized Inverse for Matrices," Proc. Cambridge
Phil. Soc., 51, 406-413.
PEREYRA, V. (1968) "Stabilizing Linear Least Squares Problems," 68, Proc.
IFIP Congress 1968.
PEREYRA, V. (1969b) "Stability of General Systems of Linear Equations,"
Aequationes Mathematicae, 2, Fasc. 2/3.
PEREYRA, V., see Golub, G. H.
PEREZ, A. AND SCOLNIK, H. D. (to appear) "Derivatives of Pseudoinverses
and Constrained Non-Linear Regression Problems," Numer. Math.
PETERS, G., see Martin, R. S.
PETERS, G. AND WILKINSON, J. H. (1970) "The Least Squares Problem and
Pseudoinverses," Comput. J., 13,309-316.
PHILLIPS, D. L. (1962) "A Technique for the Numerical Solution of Certain
Integral Equations of the First Kind," J. ACM, 9, 84-97.
PLACKETT, R. L. (1960) Principles of Regression Analysis. Oxford Univ.
Press, New York, 173 pp.
POWELL, M. J. D. (1968) "A Fortran Subroutine for Solving Systems of
Non-Linear Algebraic Equations," Report No. R-5947, Atomic Energy
Research Estab., Harwell, Berkshire, England.
POWELL, M. J. D. (1970) "A Survey of Numerical Methods for Unconstrained
Optimization," SIAM Rev., 12, 79-97.
POWELL, M. J. D. AND REID, J. K. (1968a) "On Applying Householder's
Method to Linear Least Squares Problems," Report No. T-P-332, Atomic
Energy Research Estab., Harwell, Berkshire, England, 20 pp.
POWELL, M. J. D. AND REID, J. K. (1968b) "On Applying Householder's
Method to Linear Least Squares Problems," Proc. IFIP Congress, 1968.
PRINGLE, R. M. AND RAYNER, A. A. (1970) "Expressions for Generalized
Inverses of a Bordered Matrix with Application to the Theory of Constrained
Linear Models," SIAM Rev., 12, 107-115.
PYLE, L. DUANE (1967) "A Generalized Inverse (Epsilon)-Algorithm for
Constructing Intersection Projection Matrices, with Applications," Numer.
Math., 10, 86-102.
RALSTON, A. AND WILF, H. S. (1960) Mathematical Methods for Digital
Computers. John Wiley & Sons, Inc., New York, 293 pp.
RALSTON, A. AND WILF, H. S. (1967) Mathematical Methods for Digital
Computers, II. John Wiley & Sons, Inc., New York, 287 pp.
RAO, C. R. AND MITRA, S. K. (1971) Generalized Inverse of Matrices and its
Application. John Wiley & Sons, Inc., New York, 240 pp.
RAYNER, A. A., see Pringle, R. M.
REID, J. K. (1970) "A Fortran Subroutine for the Solution of Large Sets of
Linear Equations by Conjugate Gradients," Report No. 6545, Atomic Energy
Research Estab., Harwell, Berkshire, England, 5 pp.
REID, J. K. (1971a) "The Use of Conjugate Gradients for Systems of Linear
Equations Possessing Property A," Report No. T-P-445, Atomic Energy
Research Estab., Harwell, Berkshire, England, 9 pp. Also appeared in SIAM
J. Numer. Anal., 9, 325-332.
REID, J. K., see Powell, M. J. D.
REID, J. K. (ed.) (1971b) Large Sparse Sets of Linear Equations. Academic
Press, New York, 284 pp.
REINSCH, C., see Wilkinson, J. H.
RHEINBOLDT, WERNER C., see Ortega, James M.
RICE, JOHN R. (1966) "Experiments on Gram-Schmidt Orthogonalization,"
Math. Comp., 20, 325-328.
RICE, JOHN R. (1969) The Approximation of Functions, 2—Advanced Topics.
Addison Wesley Pub. Co., Reading, Mass., 334 pp.
RICE, JOHN R. (1971a) "Running Orthogonalization," J. Approximation
Theory, 4, 332-338.
RICE, JOHN R., see De Boor, Carl.
RICE, JOHN R. (ed.) (1971b) Mathematical Software. Academic Press, New
York, 515 pp.
RICE, JOHN R. AND WHITE, JOHN S. (1964) "Norms for Smoothing and
Estimation," SIAM Rev., 6, No. 3, 243-256.
ROSEN, EDWARD M. (1970) "The Instrument Spreading Correction in GPC
III. The General Shape Function Using Singular Value Decomposition with
a Nonlinear Calibration Curve," Monsanto Co., St. Louis, Mo.
ROSEN, J. B. (1960) "The Gradient Projection Method for Nonlinear Pro-
gramming, Part I, Linear Constraints," J. Soc. Indust. Appl. Math., 8, No. 1,
181-217.
RUTISHAUSER, H. (1968) "Once Again the Least Squares Problem," J. Lin. Alg.
and Appl., 1, 479-488.
SAMPSON, P. I., see Jennrich, R. I.
SAUNDERS, M. A. (1972a) "Large-Scale Linear Programming Using the
Cholesky Factorization," Stanford Univ. Report CS-252, Stanford, Calif.,
64 pp.
SAUNDERS, M. A. (1972b) "Product Form of the Cholesky Factorization for
Large-Scale Linear Programming," Stanford Univ. Report No. CS-301,
Stanford, Calif., 38 pp.
SAUNDERS, M. A., see Bartels, R. H.
SAUNDERS, M. A., see Gill, P. E.
SAUNDERS, M. A., see Golub, G. H.
SCHUR, I. (1909) "Uber die Charakteristischen Wurzeln einer Linearen
Substitution mit einer Anwendung auf die Theorie der Integralgleichungen,"
Math. Ann., 66, 488-510.
SCOLNIK, H. D., see Perez, A.
SHERMAN, JACK AND MORRISON, W. J. (1949) "Adjustment of an Inverse
Matrix Corresponding to the Changes in the Elements of a Given Column
or a Given Row of the Original Matrix," Ann. Math. Statist., 20, 621.
SHERMAN, JACK AND MORRISON, W. J. (1950) "Adjustment of an Inverse
Matrix Corresponding to a Change in one Element of a Given Matrix,"
Ann. Math. Statist., 21,124.
SMITH, GERALD L. (1967) "On the Theory and Methods of Statistical Infer-
ence," NASA Tech. Rept. TR R-251, Washington, D.C., 32 pp.
SMITH, H., see Draper, N. R.
SMITH, L. B., see Golub, G. H.
STEWART, G. W. (1969) "On the Continuity of the Generalized Inverse,"
SIAM J. Appl. Math., 17, No. 1, 33-45.
STEWART, G. W. (1970) "Incorporating Origin Shifts into the QR Algorithm
for Symmetric Tridiagonal Matrices," Comm. ACM, 13, No. 6, 365-367,
369-371.
STEWART, G. W. (1973) Introduction to Matrix Computations. Academic
Press, New York, 442 pp.
STEWART, G. W., see Moler, C. B.
STIEFEL, E., see Hestenes, M. R.
STOER, JOSEPH (1971) "On the Numerical Solution of Constrained Least
Squares Problems," SIAM J. Numer. Anal, 8, No. 2, 382-411.
STRAND, OTTO NEAL, see Westwater, Ed R.
STRAND, OTTO NEAL AND WESTWATER, ED R. (1968a) "Statistical Estimation
of the Numerical Solution of a Fredholm Integral Equation of the First
Kind," J. ACM, 15, No. 1,100-114.
STRAND, OTTO NEAL AND WESTWATER, ED R. (1968b) "Minimum-RMS
Estimation of the Numerical Solution of a Fredholm Integral Equation of the
First Kind," SIAM J. Numer. Anal., 5, No. 2, 287-295.
STYAN, G. P. H., see Golub, G. H.
SWERLING, P. (1958) "A Proposed Stagewise Differential Correction Pro-
cedure for Satellite Tracking and Prediction," Rand Corp. Report P-1292,
Santa Monica, Calif. [also in J. Astronaut. Sci., 6 (1959)].
THOMAS, D. H., see Buchanan, J. E.
TIHONOV, A. N. AND GLASKO, V. B. (1964) "An Approximate Solution of
Fredholm Integral Equations of the First Kind," Zhurnal Vychislitel'noi
Matematiki i Matematicheskoi Fiziki, 4, 564-571.
TIMLAKE, W. P., see Lynn, M. S.
TODD, JOHN, see Newman, Morris.
TORNHEIM, LEONARD (1961) "Stepwise Procedures Using Both Directions,"
Proc. 16th Nat. Meeting of ACM, 12A4.1-12A4.4.
TUCKER, A. W. (1970) "Least Distance Programming," Proc. of the Princeton
Sympos. on Math. Prog., Princeton Univ. Press, Princeton, N.J., 583-588.
TURING, A. M. (1948) "Rounding off Errors in Matrix Processes," Quart.
J. Mech., 1, 287-308.
TWOMEY, S. (1963) "On the Numerical Solution of Fredholm Integral Equa-
tions of the First Kind by Inversion of the Linear System Produced by
Quadrature," J. ACM, 10, 97-101.
UNDERWOOD, RICHARD, see Golub, G. H.
VAN DER SLUIS, A. (1969) "Condition Numbers and Equilibration of
Matrices," Numer. Math., 14,14-23.
VAN DER SLUIS, A. (1970) "Stability of Solutions of Linear Algebraic Sys-
tems," Numer. Math., 14, 246-251.
VARAH, J. M. (1969) "Computing Invariant Subspaces of a General Matrix
When the Eigensystem is Poorly Conditioned," Math. Res. Center Report 962,
University of Wisconsin, Madison, Wis., 22 pp.
WALSH, PHILLIP J., see Hilsenrath, Joseph.
WAMPLER, ROY H. (1969) "An Evaluation of Linear Least Squares Computer
Programs," Nat. Bur. of Standards J. Res., Ser. B Math. Sci, 73B, 59-90.
WEDIN, PER-AKE (1969) "On Pseudoinverses of Perturbed Matrices," Lund
University, Lund, Sweden, 56 pp. Also appeared with revisions and extensions
as "Perturbation Theory for Pseudo-Inverses," BIT, 13,217-232 (1973).
WEDIN, PER-AKE (1972) "Perturbation Bounds in Connection with Singular
Value Decomposition," BIT, 12, 99-111.
WEDIN, PER-AKE (1973) "On the Almost Rank Deficient Case of the Least
Squares Problem," BIT, 13, 344-354.
WEDIN, PER-AKE, see Leringe, Orjan.
WESTWATER, ED R., see Strand, Otto Neal.
WESTWATER, ED R. AND STRAND, OTTO NEAL (1968) "Statistical Information
Content of Radiation Measurements Used in Indirect Sensing," J. Atmo-
spheric Sciences, 25, No. 5, 750-758.
WHITE, JOHN S., see Rice, John R.
WIELANDT, H. W., see Hoffman, A. J.
WILF, H. S., see Ralston, A.
WILKINSON, J. H. (1960) "Error Analysis of Floating-Point Computation,"
Numer. Math., 2, 319-340.
WILKINSON, J. H. (1962) "Householder's Method for Symmetric Matrices,"
Numer. Math., 4, 354-361.
WILKINSON, J. H. (1963) Rounding Errors in Algebraic Processes. Prentice-
Hall, Inc., Englewood Cliffs, N.J., 161 pp.
WILKINSON, J. H. (1965a) The Algebraic Eigenvalue Problem. Clarendon Press,
Oxford, 662 pp.
WILKINSON, J. H. (1965b) "Convergence of the LR, QR and Related
Algorithms," Comput. J., 8, No. 1,77-84.
WILKINSON, J. H. (1968a) "Global Convergence of QR Algorithm," Proc.
IFIP Congress 1968. North-Holland Pub. Co., Amsterdam.
WILKINSON, J. H. (1968b) "Global Convergence of Tridiagonal QR Algorithm
with Origin Shifts," J. Lin. Alg. and Appl., 1, 409-420.
WILKINSON, J. H. (1970) "Elementary Proof of the Wielandt-Hoffman
Theorem and its Generalization," Stanford Univ. Report No. CS-150, Stan-
ford, Calif.
WILKINSON, J. H. AND REINSCH, C. (ed. by F. L. Bauer, et al.) (1971) Hand-
book for Automatic Computation, II, Linear Algebra. Springer-Verlag,
New York, 439 pp.
WILKINSON, J. H., see Golub, G. H.
WILKINSON, J. H., see Peters, G.
WILLOUGHBY, RALPH A. (ed.) (1968) "Sparse Matrix Proceedings," Thomas
J. Watson Research Center Report RA1 (11707), White Plains, N.Y., 184 pp.
WOLFE, P. (1965) "The Composite Simplex Algorithm," SIAM Rev., 7,
42-54.
YOUNG, G., see Eckart, C.
ZIEGLER, GUY G., see Hilsenrath, Joseph.
INDEX

Acceptable solution set, 182 HFT (11.4) Householder for-
Aird, T. J., 248 ward triangularization,
ALGOL procedure decompose, 64, 68, 92,102,133, 250
146 HFTI (14.9) House-
Algorithm: holder forward trian-
BSEQHT (27.21) Banded gularization and solu-
sequential Householder tion using column inter-
triangularization, 213 changes, 30-31, 68-69,
BVLS Bounded vari- 77, 81, 92-93, 95, 102-
able least squares, 252, 105, 146, 186, 248, 252
292 HS1 (11.10)
Cholesky factorization Finish Householder so-
(19.12)-(19.14), 123 lution for Cases la and
Cholesky factorization 2a, 65, 92, 102, 250
(19.17)-(19.18), 125 HS2 (13.9) Finish
COV (12.12) Unscaled co- Householder solution of
variance matrix, 69, under-
250, 251 (see also 217- determined system, 75,
218) 93, 102
Gl (10.25) Construct Givens LDP (23.27) Least distance
rotation, 58 programming, 165
G2 (10.26) Apply Givens ro- LSE (20.24) Least squares
tation, 59 with equal-
H1 and H2 (10.22) Construc- ity constraints, 139; for
tion and application of alternative methods see
Householder 144-147 and 148-157
transformation, 57, 85, LSI Least
86 squares with inequality
HBT (13.8) Householder constraints, 167-169,
backward triangulariza- NNLS (23.10) Nonnegative
tion, 75, 93, 102 least squares, 161, 165,
174, 292 Bounds for singular values, 28-35,
QR Eigenvalues and eigen- 203
vectors of a symmetric Businger, P. A., 11, 81, 107
matrix, 108- 109 BVLS, 249, 252, 279, 292
QRBD (18.31) Singu-
lar value decomposition C
of a bidiagonal matrix, Candidate solutions, 197, 200,
204
116
SEQHT (27.10) Sequential Carasso, C., 223
Householder triangular- Case:
ization, 210 la, 3, 11, 13, 63-66, 92-93,
Solving R^T y = z following
banded sequential accu- Ib, 3, 12, 15, 77-82, 95-99,
mulation (27.24), 217 103
SVA Singular value analysis, 2a, 3, 11, 14, 21, 63-66, 90-
92, 101-102
117-118, 120, 196-198
SVD Singular value decom- 2b, 3, 12, 15, 77-82, 95-99,
103
position, 110, 251, 291
3a, 3, 12, 14, 74-76, 93-95,
Alternatives to computing the co-
102
variance matrix, 71
3b, 3, 12, 16, 77-82, 95-99,
Approximation by a matrix of
103
lower rank, 26
Chambers, J. M., 230, 231
Avoiding machine dependent tol- Cholesky:
erance parameters, 251 decomposition (see
Cholesky, factorization)
B factor, 226
Banachiewicz method, 124 factorization, 185-188, 287,
Banded matrix, 212, 221 294
Bandwidth, 212, 223 method without square
Basic linear algebra, 233-239 roots, 133
Basic linear algebra solution of normal equations,
subprograms, 296 122-123
Basis, 235 Cline, A. K., 165, 248
Beckman, F. S., 132 Cody, W. J., 58
Bidiagonal matrix, 110, 116, 236 Column:
Bjorck, A., 41, 81, 84, 130, 131, interchanges, 78, 104, 144,
145, 146, 285 149
BLAS, 296 scaling, 186
Block-oriented algorithms, 285, space, 39, 235
288, 296, 297 Condition number, 27, 28, 50,
BNDACC, 218, 249, 264 186, 194, 287
BNDSOL, 218, 249, 264 Conjugate gradient method, 132
Convergence proof for the sym- Eckart, C., 26
metric matrix QR algo- Efroymson, M. A., 195
rithm, 240 Eigenspace of a symmetric ma-
Conversion to double precision, trix, 238
252 Eigenvalue analysis of normal
Courant-Fischer minmax theo- equations, 122, 126
rem, 24 Eigenvalue-eigenvector decompo-
Covariance matrix, 67, 185-186, sition of a symmetric
189, 208, 211, 217, 288- matrix, 237, 239
289 Eigenvalues of a symmetric ma-
alternatives to computing, trix, 237
71 Eigenvectors of a symmetric ma-
Cox, M. G., 223 trix, 237
Cubic spline fitting, 222 Equality constraints (see Prob-
Cubic spline function, 223 lem LSE)
Error analysis:
D for η precision arithmetic,
Damped least squares, 190 83-89, 90-99
DATA4, 199, 250 for mixed precision arith-
de Boor, C., 223 metic, 100-103
Deleting variables, 194, 293 using row as well as column
Detecting linear dependencies, interchanges, 103-106
72, 287, 288 Errors in variables, 286
Diagonal: Euclidean:
elements, 236 length 1, 5, 234
main, 236 matrix norm, 234
matrix, 236
norm, 234
DIFF, 249, 252, 278
Differentiation:
of the pseudoinverse, 48 F
of the solution of Problem Faddeev, D. K., 32, 35
LS, 52 Faddeeva, V. N., 32, 35
Dimension: Feasibility of linear inequalities,
of a linear flat, 236 159
of a vector space or subspace, Fiacco, A. V., 160
235 Filtering, 207, 293, 294
Direct sum, 236 Flat, linear, 236
Dot product, 233 Fletcher, R., 48
Double precision, 252 Floating point arithmetic preci-
Downdating (see Updating a QR sion parameter η, 84
decomposition) Forsythe, G. E., 36, 84, 88, 92,
132, 195
E Fortran code, 248, 296, 297
for solving problem LS, 296- Greville, T. N. E., 39
297
Francis, J. G. F., 11, 107, 108, H
112, 113 H12, 54, 57, 210, 249, 271
Frobenius norm, 234 Halfspace, 236
Full rank, 235, 289-290 Halmos, P. R., 238
Hankel matrices, 295
G Hanson, R. J., 41, 48, 81, 135, 198
Gl, 59, 249, 274 Healy, M. J. R., 125
G2, 59, 249, 274 Hessenberg matrix, 236, 273
Garside, M. J., 195 Hestenes, M. R., 132
Gaussian elimination, 30, 92-93, HFTI, 79, 82, 156, 201, 203, 204,
133, 146 253
Gauss-Jordan solution of normal Higher precision arithmetic pa-
equations, 122, 195 rameter w, 100
GEN, 249, 277 Hocking, R. R., 195
Generalized SVD, 286, 291 Hoerl, A. E., 190, 195, 201
Gentleman, W. M., 60, 211, 230, Hoffman, A. J., 24
231 Householder:
Gill, P. E., 158, 226 reflection matrix, 237
Givens: transformation in block
reflection matrix, 237 form, 288
rotation matrix, 237 transformation using fewer
transformation using fewer multiplications, 59
multiplications, 60 Householder, A. S., 10
Givens, J. W., 10 Householder transformation ma-
Givens transformation matrix, trix, 10, 53
10, 58 eigenvalues and eigenvectors
eigenvalues and eigenvectors of, 17
of, 17 Householder triangularization,
relation to Householder ma- 122
trix, 17 relation to Gram-Schmidt
Golub, G. H., 11, 41, 48, 81, 107, orthogonalization, 133
112, 145, 146, 158, 176, Hyperplane, 160, 236
226, 230, 285
Gram-Schmidt orthogonaliza- I
tion, 122, 129 (see also Idempotent matrix, 238
Modified Identity matrix, 236
Gram-Schmidt Inequality constraints (see Prob-
orthogonalization) lem LSI)
relation to Householder tri- Inner product, 233
angularization, 133 Integral equations, 198
Graybill, F. A., 39, 40 Invariant space, 39, 238
Inverse matrix, 36, 236 (10.12) Properties of param-
left, 236 eters of a Householder
right, 236 transformation, 55
of a triangular matrix, 68-69 (10.14) Orthogonality of the
Householder transfor-
K mation matrix, 55
Kahan, W. M., 29, 31, 107, 287 (10.15) Prop-
Kennard, R. W., 190, 195, 201 erties of a Householder
Kernel, 235 transformation, 55
Korganoff, A., 48 (22.4) Solution of a special
Krogh, F. T., 44, 48 case of Problem LSE
Kublanovskaya, V. N., 32, 35 having diagonal matri-
Kuhn-Tucker conditions, 159, ces, 150
162, 166 (22.7) Solution of a special
weighted least squares
L
problem having diago-
l2 norm, 234
nal matrices, 150
La Motte, L. R., 195
(23.17) Positivity of the nth
LAPACK Users' Guide, 285
component of a special
Laurent, P. J., 223
least squares problem,
Lawson, C. L., 41, 48, 81, 135
162
LDP, 159, 171, 249, 250, 267
Least squares: (B.I) Sufficient conditions
algorithms, 295 for distinct eigenvalues,
applications, 295-296 240
with equality constraints, (B.2) Boundedness of quan-
291 tities in the QR algo-
with inequality constraints, rithm, 240
291 (B.22) Lemma relating to
nonlinear, 294-295 convergence of the QR
other methods, 291 algorithm, 243
statement of problem, 286 (B.30) Lemma relating to
Left inverse matrix, 236 convergence of the QR
Lemma: algorithm, 243
(3.1) Householder transfor- (B.33) Lemma relating to
mation, 9 convergence of the QR
(3.4) Givens transformation, algorithm, 244
10 Levenberg-Marquardt:
(3.17) Transposed QR de- analysis, 201
composition, 13 stabilization, 190, 198, 218
(4.3) Singular value decom- (see also Regularization)
position for a square Levenberg, K., 190
nonsingular matrix, 19 Lill, S. A., 48
Linear: Frobenius, 234
dependence, 235 l2, 234
flat, 236 Schur, 234
independence, 235 spectral, 234
Line splines, 220 Normal equations, 121, 122, 123,
149
M Null space, 235
Machine numbers, 83 Numerical example:
Marquardt, D. W., 190, 193 bounding singular values of
Marquardt's method, 190 a triangular matrix, 29,
Martin, R. S., 60, 114 31
Matrix, 233 constrained curve fitting,
Matrix product, 233 169
McCormick, G. P., 160 cubic spline fitting, 222
Meyer, C. D., 39, 40
HFTI, 203
MFEOUT, 249, 275
Levenberg-Marquardt analy-
MGS (see Mod-
sis, 201
ified Gram-Schmidt or-
thogonalization) loss of information in form-
Minimal condition number, 187 ing normal equations,
Minimum length solution, 186 127
Mixed precision arithmetic with orthogonal
precisions η and ω, 100 decomposition for Case
Modified Gram-Schmidt orthog- la, 13
onalization, 122, 129, orthogonal
130 decomposition for Case
Moler, C. B., 36, 60, 84, 88, 92 1b, 15
Morrison, D. D., 190, 193 orthogonal
Murray, W., 158, 226, 286, 292 decomposition for Case
2a, 14
N orthogonal
Newton's method, 193 decomposition for Case
NNLS, 159, 164, 165, 249, 250, 2b, 15
269, 292 orthogonal
Nonlinear least squares, 294, 295 decomposition for Case
Nonnegative definite matrix, 123, 3a, 14
238 orthogonal
Nonnegativity constraints (see decomposition for Case
Problem NNLS) 3b, 16
Nonsingular matrix, 235 Problem LSE, 140, 147, 156
Norm, 234 PROG1, PROG2, and
euclidean, 234 PROG3, 250-251
euclidean matrix, 234 singular value analysis, 199
singular value decomposi- BVLS Bounded variables
tion, 21 least squares, 252, 292
stepwise regression, 205 LDP (23.2) Least distance
subset selection, 204 programming, 159, 165,
167, 170
O LS Least squares, 2, 5, 286,
Operation counts, 59, 60, 66, 122, 296-297
164, 211 LSE (20.1) Least squares
Orthogonal, 237 with equality
complement, 236 constraints, 74, 78, 134,
decomposition, 5, 7, 9, 11, 144, 148, 288-289, 291
12, 18, 36, 38, 286-287 LSI (23.1) Least
matrix, 5, 237 squares with inequality
regression, 286 constraints, 158, 167,
to a subspace, 236 170, 174
transformation, 288 NNLS (23.2) Nonnegative
vectors, 234 least squares, 159, 160,
165
P Process identification, 207
Painter, R. J., 39, 40 PROG1, 66, 69, 250-251
Pavel-Parvu, M., 48 PROG2, 69, 250-251
Penrose, R., 38, 39, 40 PROG3, 250-251
Penrose conditions, 38, 142 PROG4, 199, 250-251
Pereyra, V., 41, 48, 295 PROGS, 218, 222, 224, 225,
Perez, A., 48 250-251
Permutation matrix, 68, 78, 237 PROG6, 159, 169, 250-251
Projection matrix, 39, 42, 238
Perturbation bounds, 287
associated with a subspace,
for eigenvalues of a symmet-
238
ric matrix, 23
Projection of a vector onto a sub-
for the pseudoinverse matrix,
space, 238
41-48 Pseudoinverse matrix, 36, 37,
for singular values, 23
237
for solutions of Problem LS, differentiation of, 48
49-52 of a full-rank matrix, 39
Peters, G., 60, 133 of a nonsingular matrix, 37
Plackett, R. L., 185, 195 of an orthogonal matrix, 39
Polyhedron, 236 Penrose conditions, 38
Polynomial curve fitting, 195 of a product matrix, 39
Positive definite matrix, 173, 238 Pseudorank, 77, 186, 187
Powell, M. J. D., 103, 104, 105, Pythagorean condition, 236
106, 149, 156
Problem: Q
QR algorithm, 107, 108 Sequential estimation, 207, 293,
convergence proof for a sym- 294
metric matrix, 240 Sequential processing, 207, 293-
QRBD, 122, 249, 262 294
QR decomposition, 8, 11, 287 Sgn, 233
SHARE, 190
R Singular matrix, 235
Ralston, A., 132, 195 Singular value analysis, 117-118,
Range space, 235 122, 186, 196-198, 199,
Rank, 123, 235 255
deficient, 235 Singular value decomposition,
full, 235, 289-290 18, 68, 107, 188, 196,
Rank revealing decomposition, 219, 238, 287, 291
287, 288 numerical example, 21
Reflection matrix, 237 Singular values, 18, 238
Regularization, 293 Space, 235
Reid, J. K., 103, 104, 105, 106, column, 235
132, 149, 156 invariant, 238
Reinsch, C., 60, 107, 112, 114 null, 235
Removing rows of data, 225-232,
range, 235
293
row, 235
Rice, J. R., 129, 130
Span, 235
Ridge regression, 190
Spectral norm, 234
Right inverse matrix, 236
Rotation matrix, 237 Spline curves and surfaces, 222-
Round-off error, 84 225, 286, 295
Row: Square root method, 124
downdating, 293 Standard deviation, 184,186, 200
interchanges, 103-106, 149 Stepwise regression, 122, 195,
scaling, 184 205-206
space, 39, 235 Stewart, G. W., 41, 60, 285
updating, 293 Stiefel, E., 132
Stoer, J., 135, 158, 176, 293
S Subset selection, 195, 204-206
Saunders, M. A., 158, 176, 226 Subspace, 235
Scalar product, 233 associated with a projection
Schur norm, 234 matrix, 238
Scolnik, H. D., 48 Summary:
Sequential accumulation, 121, of main programs, 250
132, 208-212, 272, 274, of subprograms, 248-249
293-294 SVA, 108, 120, 199, 249, 250,
of banded matrices, 212-219, 255
221 SVDRS, 108, 119, 120, 249, 250,
260 special triangular ma-
Symmetric matrix, 123, 237 trix, 32
(6.31) Bounds for all singular
T values of a special trian-
Theorem: gular matrix, 35
(2.3) Analysis of Problem LS (7.1) Solution of Problem
using an orthogonal de- LS using any orthogonal
composition, 5 decomposition, 36
(3.11) QR decomposition, 11 (7.3) Uniqueness of pseu-
(3.15) QR decomposition for doinverse matrix, 37
rank deficient matrix, (7.8) Construction of the
12 pseudoinverse using an
(3.19) Construction of an orthogonal decomposi-
orthogonal decomposi- tion, 38
tion, 13 (7.9) The Penrose conditions
(4.1) Singular value decom- for the pseudoinverse,
position, 18 38
(5.1) Bound on maximum (8.5) Perturbation bounds
perturbation of eigen- for the pseudoinverse,
values of a symmetric 42
matrix, 23 (8.15) Upper bound for norm
(5.2) Bound on euclidean of perturbed pseudoin-
norm of perturbations verse, 43
of eigenvalues of a sym- (8.21) On the norms of cer-
metric matrix, 23 tain products of projec-
(5.3) Interlacing eigenvalues tion matrices, 44
of symmetric matrices, (8.22) Improved bound for
24 the norm of G2, 45
(5.6) Special matrices having (8.24) Perturbation bounds
related eigenvalues and for the pseudoinverse,
singular values, 25 46
(5.7) Bound on the max- (9.7) Perturbation bounds
imum perturbation of for the solution of Prob-
singular values, 25 lem LS, 50
(5.10) Bound on the eu- (9.12) Specialization of The-
clidean norm of pertur- orem (9.7) for Case 2a,
bations of singular val- 51
ues, 25 (9.15) Specialization of The-
(5.12) Interlacing singular orem (9.7) for Case la,
values, 26 51
(6.13) Bounds for the small- (9.18) Specialization of The-
est singular value of a orem (9.7) for Case 3a,
51 column interchanges,
(16.1) Error analysis for 106
the full-rank overdeter- (18.5) Global quadratic con-
mined problem, 90 vergence
(16.11) Error analysis for of the shifted QR algo-
the square nonsingular rithm, 109, 240-247
problem, 92 (18.23) Algebraic
(16.18) Error analysis for rational for the implicit-
the full-rank underde- shift form of the QR- al-
termined problem, 93 gorithm, 113
(16.36) Error analysis for the (20.9) Solution of Problem
rank-deficient problem, LSE, least squares with
95 equality constraints,
(17.11) Error analysis for 136
mixed precision solution (20.31) An unconstrained
of the full- rank overde- Problem LS having the
termined problem, 101 same solution as Prob-
(17.15) Error analysis for lem LSE, 141
mixed precision solution (23.4) Kuhn-Tucker condi-
of the square nonsingu- tions for Problem LSI,
lar problem, 102 159
(17.19) Error analysis for (25.49) Optimal property of
mixed precision solution Levenberg-Marquardt
of the full- rank un- solution, 193
derdetermined problem, Toeplitz matrices, 295
102 Total least squares, 286
Transpose, 233
(17.23) Error analysis for
Triangular matrix, 236, 287-288
mixed precision solution
Tridiagonal matrix, 236
of the rank- deficient
problem, 103 U
(17.29) Error analysis of Unconstrained least
Householder triangular- squares problem having
ization with rows of dis- the same solution set as
parate sizes, 104 an equality constrained
(17.32) Error analysis of least squares problem,
Householder solution of 141
full-rank Problem LS Unscaled covariance matrix, 67
with rows of disparate Updating a QR decomposition,
sizes, 105 164, 174-179, 207-232,
(17.37) Growth of elements 293-294
in Householder triangu- Use of the V-matrix of the singu-
larization using row and lar value decomposition,
73

V
van der Sluis, A., 187, 294
Vector, 233
space, 235
Verhey, C. T., 248
W
Wampler, R. H., 132
Wedin, P. A., 41, 294
Weighted least squares, 184
Wertz, H. J., 248
Wielandt, H. W., 24
Wielandt-Hoffman theorem, 23
Wilf, H. S., 132, 195
Wilkinson, J. H., 24, 36, 41, 47,
60, 83, 84, 86, 93, 100,
107, 109, 114, 123, 133,
240

Y
Young, G., 26
