A Tutorial Overview of Vector and Matrix Norms


File: NormOvrv Tutorial Overview of Vector and Matrix Norms Version dated January 30, 2013 11:18 am

A Tutorial Overview of
Vector and Matrix Norms

Prepared for the Numerical Analysis Seminar at U.C. Berkeley, 2010-2013 ,


and for lectures 8 & 15 Oct. 2010 at the Univ. of Texas @ Arlington

by W. Kahan, Prof. Emeritus


Mathematics Dept., & Computer Science Dept.
University of California @ Berkeley

This has been posted at


<www.eecs.berkeley.edu/~wkahan/MathH110/NormOvrv.pdf>


Abstract
Intended for new graduate students whose experience as undergraduates
may have prepared them inadequately to apply norms to numerical error-
analyses and to proofs of convergence, this tutorial surveys norms for
finite-dimensional real spaces in a way that may ease a transition to the
infinite-dimensional spaces of Functional Analysis. Among the topics
covered are some more useful than is implied by their absence from most
curricula. The notation is mostly standard but interpreted in ways not
always taught to undergraduates, so attendees may prepare for the tutorial
by reading just a few of my lecture notes for Math. H110 posted at
<eecs.berkeley.edu/~wkahan/MathH110/2dspaces.pdf> and
<.../pts.pdf> in that order, and afterwards <.../geo.pdf> and
<.../geos.pdf> skimmed lightly.
This tutorial omits proofs; almost all can be found in
<.../NORMlite.pdf>, <.../GIlite.pdf>, and a few other places cited.

This tutorial’s pages have been posted at <.../NormOvrv.pdf> .


Contents
Abstract Page 2
What are Norms for? Two Examples … 5-7
Part I: Vector Norms
The Trouble with Norms …, too many Unit Balls 9
Choosing a Norm 12-3
Dual Spaces 15-7
Changing a Basis 18
Real Inner-Product Spaces 19
Auerbach’s Parallelepiped Theorem 21
Fritz John’s Ellipsoid Theorem 22
Part II: Matrix Norms
Overloaded Notation 24
What must we know to choose an apt norm? 25
Mere Matrix Norms vs. Operator Norms 26-8
Maximized Ratios of Familiar Norms 29
Choosing a Norm 30
When is a Preassigned Matrix Norm Also an Operator Norm? 31
Orthogonally Invariant Matrix Norms 32
Dual Norms for Dual Matrix Spaces, and Norms for Bilinear Forms 33-4
Part III: Matrix Norms and Matrix Inverses
Condensed Review of Parts I & II 35
Sensitivities of Inverses to (Infinitesimal) Perturbations 36
Operator Norms of Inverses 37-8
… continued …


… Contents continued …
Equilibration and Preconditioning 39-40
Iterative Refinement 41
Diagonal Dominance and Schur Complements 42-3
Fredholm’s Alternatives 44
Generalized Inverses and Pseudo-Inverses 45
How Big must they be? Their Use can be Dangerous! 46-8
A Generality 49
Part IV: Matrix Norms and Eigenvalues
Matrix Norms exceed Eigenvalues 51
A Computed Eigenvector’s Error is an Angle 52
Clustered Eigenvalues’ Sensitivities to Perturbations 53
Gershgorin’s Circles enclose Eigenvalues; Extreme Singular Values 54-5
Eigenvalues’ Sensitivities 56-7
Perron-Frobenius Theory of Nonnegative Matrices, and Optimal Diagonal Equilibration 58-9
Part V: Matrix Norms and Real Symmetric Matrices’ Eigenvalues
Stationary Points and Values, Real Quadratic Forms, Congruence, Diagonalization 61-63
Stationary Points and Values, and Generalized Symmetric Eigenproblems 64
Simultaneous Diagonalization by Congruence 65-6
Real Symmetric Eigenproblem; Courant-Fischer Minimax 67
Absolute and Relative Perturbations 68
Partial Eigensystems and Spectral Gaps 69-73
Spectral Gaps, Invariant Subspaces, and Angles 74
Miscellany 75
Citations and Further Reading; Epilogue 76-9


What are Norms for?


They provide vector spaces and their linear operators with measures of
size, length and distance
only a little more general than what we already use routinely in everyday life.
“A little more general” ⇒ more widely applicable than our most familiar notions
but still often conforming to our intuitions about them.

Examples …

1• A “Backward” Error-Analysis of a computed approximation Y to ƒ(X) :


Y + ∆Y = ƒ(X + ∆X) ; are ∆Y and ∆X negligible?
Depends upon what we can infer about ||∆Y|| and ||∆X|| . vector norms

2• Convergence Analysis of an Iteration towards a Fixed-Point z :


xn+1 := ƒ(xn) = ƒ[n+1](x0) → ? → z = ƒ(z) ?
Depends upon what we can infer about derivative … ||ƒ `(z)|| . matrix norm

•••


Example 1: A “Backward” Error-Analysis


A program F(X) is intended to compute vector-valued ƒ(x) for vector inputs x .
Actually, computed Y := F(X) only approximates y := ƒ(x) . HOW WELL ?

We deem program F to be “Backward Stable” numerically just if we have proved that


||F(X) – ƒ(X)|| is at worst slightly bigger than ||ƒ(X + ∆X) – ƒ(X)|| can be for some
unknown roundoff-induced perturbation ∆X whose ||∆X|| is at worst slightly bigger than
negligible compared with ||X|| for all inputs X in a region big enough to be useful.

[Figure: actual input data X and its roundoff-induced perturbation X + ∆X in the DATA space are mapped by ƒ to ƒ(X) and ƒ(X + ∆X) in the RESULTS space; useful computed results F(X) lie inside a small circle around ƒ(X + ∆X) . ( e.g. Matrix Inversion, but inappropriate for log, acos, … ) Our chosen norm may exaggerate uncertainties in actual input data and computed results whose correlations the norm disregards.]

If F is “backward stable” but computed F(X) is very wrong, do we blame the victim ƒ for “ill condition” at X ?

Error-Analyses tend to excessive pessimism partly because they allow for unlikely conspiracies
among rounding errors, and partly because the chosen norms are often not the most suitable.


Example 2: Convergence Analysis of an Iteration


Given a smooth map ƒ(x) of a vector-space to itself, and a starting vector x0 , let
xn+1 := ƒ(xn) = ƒ(ƒ(xn–1)) = … = ƒ[n+1](x0) for n = 0, 1, 2, 3, … in turn.
Does xn → z fast enough from every x0 near enough to a Fixed-Point z = ƒ(z) ?
Yes if and only if a z exists and every | eigenvalue of ƒ `(z) | is sufficiently less than 1 .

But we don’t know z yet, much less the eigenvalues of the derivative ƒ `(z) . Jacobian

Instead we explore conditions upon ƒ easier to test. For instance, maybe ƒ is a …

Contractive Map: ||ƒ(y) – ƒ(x)||/||y – x|| < λ < 1 whenever distinct x and y
lie in some sufficiently large region X .
Then either ||xn – z|| ≤ λn·||x0 – z|| → 0 so xn → z = ƒ(z) uniquely in X ,
or ultimately xn escapes from X , which is too small to hold a fixed-point.

And test the Contractive hypothesis: Is ||xn+1 – xn||/||xn – xn–1|| < 1 ? Until roundoff …

THE CATCH: All this makes sense only for an appropriately chosen norm ||…|| .
That is the trouble with norms: There are so many of them, apt choice may be hard.
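For readers who want to experiment, here is a minimal numerical sketch of that contractive test (my own illustration, not part of these notes) in Python with NumPy; the affine map f and the choice of the ∞-norm are assumptions made just for this example:

    import numpy as np

    def f(x):
        # a contractive affine map: ||A||_inf = 0.5 < 1
        A = np.array([[0.4, 0.1],
                      [0.2, 0.3]])
        b = np.array([1.0, -1.0])
        return A @ x + b

    x_prev = np.array([10.0, 10.0])
    x = f(x_prev)
    for n in range(25):
        x_next = f(x)
        # the contractive test: ratio of successive differences stays below 1
        ratio = np.linalg.norm(x_next - x, np.inf) / np.linalg.norm(x - x_prev, np.inf)
        print(n, ratio)
        x_prev, x = x, x_next

The printed ratios hover near ||A||∞ = 0.5 , consistent with the Contractive hypothesis.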


Part I

Vector Norms


The trouble with norms is that there are so many of them.


To be a vector norm, ||…|| need satisfy only three requirements …
• Positivity: ∞ > ||x|| > 0 for every vector x except ||o|| = 0 .
• Homogeneity: ||λ·x|| = |λ|·||x|| for every scalar λ . Let’s keep λ real.

• Triangle Inequality: ||x + y|| ≤ ||x|| + ||y|| for all x and y . Equality need not imply parallelism.

If ||x|| is a norm, so is |||x||| := ||L–1·x|| for any fixed invertible linear operator L .
If ||x|| and |||x||| are norms, so are max{||x||, |||x|||} , √(||x||2 + |||x|||2) , ||x|| + |||x||| , … .

The Unit Ball of a norm ||x|| is the region B := { x : ||x|| ≤ 1 } . This B turns out to
be closed, bounded, centrally symmetric ( B = –B ) and convex with o strictly inside.
“Convex” means that, if x and y lie in B , so does ζ·x + (1– ζ)·y for 0 ≤ ζ ≤ 1 .
Line segment joining “points” x and y

Conversely, any region B closed, bounded, centrally symmetric and convex with o
strictly inside is the Unit Ball of the norm ||x|| := inf ( |ξ| for which x/ξ lies in B ) .

[Figure: a convex unit ball with boundary ∂B centered at o ; points x and y are marked, with ||x|| ≈ 2/3 .]


The trouble with norms is that there are so many of them.


How is an appropriate one to be chosen?
Scalar value ||x|| has to be computable at a tolerable cost. Computable from what?
Computable from scalar components of a representation x of (abstract?) vector x .

An applied mathematician’s first challenge is to choose suitable parameters


( variables, coordinates, basis vectors, … )
to represent his problem’s entities ( on paper, on a blackboard, in his computer, … ).

Let B := [b1, b2, b3, …, bn] be a Basis for the (abstract?) space of vectors x , so each
vector x = B·x = ∑j bj·ξj is represented by a column-vector x = [ξ1; ξ2; ξ3; …; ξn] in
MATLAB’s notation. Thus, basis B is an invertible linear map from a space of columns x
to the (abstract?) space of vectors x = B·x ; each x has its own column x = B–1·x .

Then ||x|| will be computed as ||x|| := ||x|| from the components ξj of column x .

If a basis B accepted first is unsatisfactory, a new basis B̄ := B·C–1 can be chosen; here
C is an invertible matrix. And the new representative of x = B·x = B̄·x̄ is then column
x̄ = C·x , whence the new formula to compute ||x|| from the new column becomes ||x|| = ||C–1·x̄|| .
Don’t Memorize! Re-derive!


Having chosen (or accepted) a basis in which vector x is represented by its column
x = [ξ1; ξ2; ξ3; …; ξn] , we can choose a formula by which to compute ||x|| := ||x|| from
the components ξj . Here are three familiar formulas and their Unit Balls for n = 3 :

• ||x||∞ := max |ξj| . Ball is a solid Cube. Postal box

• ||x||2 := √(∑ |ξj|2) . Ball is a solid Sphere. Euclidean length!

• ||x||1 := ∑ |ξj| . Ball is a solid Octahedron Taxicab norm

These norms are unchanged by permutation of the elements of x ; these norms treat each
element the same as any other. What if some elements need closer scrutiny than others?
Let W be an invertible diagonal matrix; |||x||| := ||W–1·x|| is a norm too, maybe better.
Let C be an invertible matrix of a change of basis; |||x||| := ||C–1·x|| is a norm too.
A change of variables often induces a change of norm. Or is it vice-versa ?
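As a quick illustration (mine, not part of the original notes) of the three formulas and of a weighted norm ||W–1·x|| , here is a small Python/NumPy sketch; the vector x and the diagonal weights in W are made-up data:

    import numpy as np

    x = np.array([3.0, -4.0, 0.5])
    print(np.linalg.norm(x, np.inf))   # ||x||_inf = max |xi|          -> 4.0
    print(np.linalg.norm(x, 2))        # ||x||_2   = Euclidean length  -> 5.0249...
    print(np.linalg.norm(x, 1))        # ||x||_1   = sum of magnitudes -> 7.5

    # if the last component deserves closer scrutiny, rescale it:
    W = np.diag([1.0, 1.0, 0.01])
    print(np.linalg.norm(np.linalg.solve(W, x), np.inf))   # ||W^{-1} x||_inf -> 50.0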


A change of variables often induces a change of norm. Or is it vice-versa ?

How should a norm be chosen for error-analysis?


Ideally, an appropriate norm would be what I call Equitable, i.e., so chosen that …
All errors of the same norm have roughly equal (in)significance.
This ideal is often not achievable before a computation’s results have been inspected,
and not always achievable afterwards, but always worth attempting.

Example: The control of electric power distribution networks involves voltages of widely
diverse magnitudes, ranging from those on cross-country transmission lines to those in
radio receivers of the microwave transmissions needed to control power generation.
Locations Voltage Ranges Better Units
1 Cross-Country Transmission Towers 250,000 - 1,000,000 Megavolts
2 Substation to Transformers on Telephone Poles 2,000 - 10,000 Kilovolts
3 In-house wall sockets to appliances 110 - 240 Volts
4 Computer power supplies to transistors 2 - 12 Volts
5 Transistor circuits’ on-off variations 0.001 - 0.01 Millivolts
6 Inputs to radio receivers’ antennas 0.000,001 - 0.000,01 Microvolts
Changes by 0.001 Volt are negligible in locations 1 - 4 , devastating in the last two.
Changes by 0.001 Unit are negligible in all locations.


A change of variables often induces a change of norm. Or is it vice-versa ?

How should a norm be chosen for convergence analysis?


Ideally, the norm should be so chosen that a Stationary Iteration …
xn+1 := ƒ(xn) → z = ƒ(z) converges monotonically in norm.
This ideal is often not achievable, but always worth attempting.
Local Convergence: If 1 > λ > | each eigenvalue of ƒ `(z) | , norms ||…|| exist such that
||xn+1 – z|| ≤ λ·||xn – z|| ≤ λn+1·||x0 – z|| → 0 for every x0 close enough to z .
In principle such a norm can be constructed by changing to a new basis obtained from the
eigendecomposition of ƒ `(z) , but this merely raises hopes for something more practical.
Global Convergence: Rather than by shrinkage of a norm, convergence must be proved
(if true) by shrinkage of a Lyapunov Function whose Level-lines/surfaces form a nested
family shrinking onto z , and whose shapes need not be centrally symmetric and convex.

Global Convergence of a Non-Stationary Iteration xn+1 := ƒn(xn) → z = ƒm(z) for all m


can rarely be proved using norms; rare exceptions are mostly matrix computations like
Conjugate Gradients. Other global convergence proofs need some other monotonicity.
E.g.: §6 of <eecs.berkeley.edu/~wkahan/Math128/GnSymEig.pdf> uses a monotonic determinant, not a norm at all.


And now for something entirely different:

Why are vector spaces like potato chips?


Why are vector spaces like potato chips?


Because you cannot have just one.
Each space of vectors x comes with its Dual-Space of Linear Functionals wT :
• Scalar Product wT·x is a scalar, real for real spaces. Complex: w*·x or wH·x

• wT acts linearly upon vectors x and y : wT·(λ·x + µ·y) = λ·wT·x + µ·wT·y .


• x acts linearly upon vT and wT : (λ·vT + µ·wT)·x = λ·vT·x + µ·wT·x .
So the linear functionals wT form a vector space Dual or Conjugate to the space of
vectors x . Each space is dual to the other, and they have the same finite dimension.
But among infinite dimensional spaces, many are properly contained within their dual’s dual.

Since x need not be a column, wT need not be a row, much less the “Transpose” of a
vector w . Except for Euclidean and other Inner-Product spaces, there is no necessary
relation between wT and w , just as Miss Carla and Master Carlo need not be related.
Compare Contravariant and Covariant Tensors.

Distinctions between dual spaces are obscured by other notations for wT·x like w'·x, w•x, <x, w>,
<x|w>, (x, w), … devised originally for Euclidean and other Inner-Product spaces, each one
Isomorphic to its dual. Many mathematicians expect context to disambiguate those notations.


E.g.: The space dual to x ’s contains the derivative µ`(x) of any scalar function µ(x)
because scalar dµ(x) = µ`(x)·dx .

The Dual-Space’s Norm


The natural dual norm for functional wT is ||wT|| := max |wT·x|/||x|| over x ≠ o .

From that follows ||x|| = max |wT·x|/||wT|| over wT ≠ oT . Non-trivial proof in general

Examples of dual norms for rows and columns: … in MATLAB’s notation …


Say column x = [ξ1; ξ2; ξ3; …; ξn] , and row wT = [ω1, ω2, ω3, …, ωn] .
• Dual to ||x||∞ := max |ξj| is ||wT||∞ = ∑ |ωj|
• Dual to ||x||2 := √(∑ |ξj|2) is ||wT||2 = √(∑ |ωj|2)
• Dual to ||x||1 := ∑ |ξj| is ||wT||1 = max |ωj|

Hölder’s Inequality: |wT·x| ≤ ||wT||·||x|| for all wT and x . Cf. Cauchy’s for ||…||2
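Here is a small numerical check (my own, not from the notes) of those dual pairings and of Hölder's Inequality in Python/NumPy; the row w and column x are random test data:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(5)
    w = rng.standard_normal(5)

    # (norm put on x, the corresponding dual norm put on w)
    pairs = [(np.inf, 1), (2, 2), (1, np.inf)]
    for p, q in pairs:
        lhs = abs(w @ x)                                   # |w^T x|
        rhs = np.linalg.norm(w, q) * np.linalg.norm(x, p)  # ||w^T|| * ||x||
        print(lhs <= rhs + 1e-12, lhs, rhs)                # Hölder's Inequality holds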


Vectors and Functionals Dual with respect to the Norm


z and uT are called “Dual w.r.t. the norm” when uT·z = ||uT||·||z|| and ||uT|| = ||z|| ≠ 0 .
This duality relation is also called “Polarity”. Its z and uT determine each other
nonlinearly and perhaps nonuniquely in general, but linearly and uniquely only in
Euclidean and in Inner-Product spaces to be described imminently. Examples …
• Grad µ(x) is a vector in x ’s space dual to derivative µ`(x) in the dual-space.
From Hölder’s Inequality, µ(x) changes fastest in the direction of Grad µ(x) , and ||Grad µ(x)|| = ||µ`(x)|| .

• Row(s) wT = [ω1, ω2, ω3, …, ωn] dual to column x = [ξ1; ξ2; ξ3; …; ξn] w.r.t. ||x|| :

» For ||x||∞ : If |ξj| < ||x||∞ then ωj := 0


else let sign(ωj) := sign(ξj) or 0 , and ∑ |ωj| := ||x||∞ .
This dual wT is unique just when only one |ξj| = ||x||∞ .

» For ||x||2 : wT := xT (its transpose !) uniquely. (Complex conjugate transpose for complex spaces)

» For ||x||1 : If ξj = 0 then choose any ωj with |ωj| ≤ ||x||1


else ωj := ||x||1·sign(ξj) .
This dual wT is unique just when every ξj ≠ 0 .


Changing a vector space’s Basis changes the Dual-space’s Basis:


Column x that represents vector x = B·x in the basis B := [b1, b2, b3, …, bn] has its
counterpart in row wT that represents functional wT = wT·B–1 in the same basis. This
notation preserves the scalar product wT·x = wT·x as a matrix product in a natural way.

Just as the “columns” bj of B are basis vectors for the space of vectors x , the “rows”
of B–1 are basis vectors ejT (functionals) for the dual-space of linear functionals wT .
“Evaluation” functional ejT extracts x ’s component ξj = ejT·x of column x = B–1·x .

Changing basis to B̄ := B·C–1 changes column x representing vector x = B·x = B̄·x̄ to


x̄ = C·x , and changes row wT representing wT = wT·B–1 = w̄T·B̄–1 to w̄T = wT·C–1 .

• When can “ …T ” be construed as an operator instead of merely a suffix?


When does row w̄T = wT·C–1 match the transpose (w̄)T = wT·CT of column w̄ = C·w ?
They match only if CT = C–1 is an Orthogonal matrix. That is what happens when
B and B̄ are Orthonormal bases for a Euclidean space, which is the prototypical
instance of an Inner-Product Space ...


Real Inner-Product Spaces


A real Inner-Product Space possesses a symmetric scalar product x•y = y•x linear in
each of x and y separately, and with x•x > 0 except o•o = 0 . Therefore the space is
normed: ||x|| := √x•x . But its Unit Ball is generally an Ellipsoid. ( For complex spaces, x•y is the complex conjugate of y•x .)

An inner-product space has a natural linear map between it and its dual space. Vector
x = B·x represented by column x maps to linear functional x• = xT·M·B–1 represented
by row xT·M for some symmetric positive definite matrix M = MT , the space’s Metric.
M comes from the basis vectors in B := [b1, b2, b3, …, bn] thus: Mij = bi•bj . Similarly
w• = wTB–1 represented by row wT maps to w = B·M–1·w . Now representative
columns x = B–1·x and y = B–1·y figure in the formula to obtain y•x = yT·M·x ; and
||x|| = √(xT·M·x) but ||w•|| = √(wT·M–1·w) . Now x and x• are duals w.r.t. ||…|| .

A Euclidean space’s metric M is the Identity matrix I , which simplifies everything.

Changing basis B to B̄ := B·C–1 changes metric M to M̄ = (C–1)T·M·C–1 . Therefore


Every real inner-product space is a Euclidean space
disguised perhaps by a non-orthonormal basis B .
B changes to an orthonormal basis B̄ := B·C–1 when CT·C = M . ... Gram-Schmidt, Cholesky
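A small sketch of the last two statements (mine, not part of the notes) in Python/NumPy: with a made-up metric M , the norm √(xT·M·x) agrees with the ordinary Euclidean norm of C·x once a Cholesky factor C with CT·C = M is used to change the basis.

    import numpy as np

    M = np.array([[2.0, 1.0],
                  [1.0, 3.0]])          # a symmetric positive definite metric
    x = np.array([1.0, -2.0])

    norm_M = np.sqrt(x @ M @ x)         # ||x|| in the inner-product space
    C = np.linalg.cholesky(M).T         # upper-triangular C with C^T C = M
    norm_2 = np.linalg.norm(C @ x, 2)   # plain 2-norm after the change of basis
    print(norm_M, norm_2)               # both equal sqrt(10) here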


What distinguishes an Inner-Product or Euclidean space


from all other Normed vector spaces?

The Parallelogram Law: ||x + y||2 + ||x – y||2 ≡ 2||x||2 + 2||y||2 . … Jordan-von Neumann Th’m

And then x•y ≡ ( ||x + y||2 – ||x – y||2 )/4 .


For a proof see <www.eecs.berkeley.edu/~wkahan/MathH110/QF.pdf>

• Every Inner-Product space is Isomorphic to its Dual-space.

But every other normed n-space is NOT Isomorphic to its dual space
except possibly if dimension n ≤ 2 .

||…||∞ ↔ ||…||1

Generally, flat spots on the surface of a normed space’s unit ball match up with vertices of the dual space’s unit ball, and vice-versa.


The Derivative of a Vector Norm ( Say dimension n > 1 .)


It is determined uniquely only where the norm’s unit ball is smooth (no vertex nor edge),
as is an inner-product space’s ellipsoidal unit ball, everywhere but at o .
Then d||z|| = uT·dz/||z|| in which uT is the linear functional dual to z w.r.t. ||…|| .
More generally, d||z|| is an extremum of uT·dz/||z|| over all the linear functionals uT dual to z w.r.t. ||…|| .

The foregoing formula helps to provide a short proof of the most useful special case of …

Auerbach’s Theorem: Any n-space’s unit ball B can be circumscribed by at least one
Parallelepiped P whose 2n faces touch B at their midpoints.

One such P is a circumscribing parallelepiped of minimum volume. And if a new basis


P is chosen from just the vectors that join o to the midpoints of P ’s faces, then the
column x := P–1·x that represents n-dimensional vector x in this new basis P has
||x||1/n ≤ ||x||∞ ≤ ||x|| ≤ ||x||1 ≤ n·||x||∞ for every x .
• Therefore any n-dimensional norm ||…|| can be approximated within a factor n±1/2 by
either ||…||∞ or ||…||1 , each of which costs at most n operations to compute after the
change of basis (whose one-time cost may be of order n3).


An Ellipsoid is the level surface of a Positive Definite Quadratic Form; see pp. 62-63.

Fritz John’s Ellipsoid Theorem: (1948)


Any n-dimensional normed space’s unit ball B can be circumscribed
by an ellipsoid E tightly enough that √n·B ⊇ E ⊇ B .

One such E is a circumscribing ellipsoid of minimum volume. And if a new basis E is


chosen from the principal axes of E then the column x := E–1·x that represents n-
dimensional vector x in this new basis E has ||x||/√n ≤ ||x||2 ≤ ||x|| for every x .

• Therefore any n-dimensional norm ||…|| can be approximated within a factor n±1/4
by ||…||2 at a cost of order n after the change of basis.
Often a change of basis involving nothing more onerous than choices of suitable units
(recall the voltages’ example on p. 12) allows one of the cheap norms ||…||∞, ||…||2 or
||…||1 to provide adequate error-estimates.

Fritz John’s Ellipsoid Theorem has vastly many other useful implications some of which
are mentioned in my posted lecture notes <…/MathH110/NORMlite.pdf>
However, if dimension n is too big or infinite, different norms may defy comparison.


Part II

Matrix Norms


All the linear operators L mapping one given normed space of vectors x to another
constitute a third vector space and therefore can be subjected to any of a vast horde of
norms. We prefer that each such Matrix Norm satisfy these four requirements:
• Positivity: ∞ > ||L|| > 0 for every L except ||O|| = 0 .
• Homogeneity: ||λ·L|| = |λ|·||L|| for every scalar λ . Let’s keep λ real.

• Triangle Inequality: ||L + K|| ≤ ||L|| + ||K|| for all L and K .


• Compatibility: ||L·x|| ≤ ||L||·||x|| for all L and x . Subordination

Overloaded Notation
Note that the last requirement involves three norms all written “||…||” . This notation is
heavily overloaded, obliging readers to disambiguate norms by close attention to the
linguistic type and context of each norm’s argument, just as we learned to distinguish
||wT|| from ||x|| . A Compatible Matrix norm is compatible with one or two vector norms
and consequently with their dual norms: Compatibility implies ||wT·L|| ≤ ||wT||·||L|| too.

This Compatibility requirement is trifling because it can always be met by scaling up the
norm offered for linear maps L : If µ := ( max ||L·x|| over ||L|| = ||x|| = 1 ) > 1 then
replace ||L|| by a scaled-up norm |||L||| := ||L||·µ to make it compatible.
(Then if identity I maps a space to itself but with a different norm, ||I|| can have any positive value! )


The trouble with norms is that there are so many of them.


How is an appropriate one to be chosen? … to be computed?
Scalar value ||L|| has to be computable at a tolerable cost. Computable from what?
Computable from scalar components of a representation L of (abstract?) operator L .
Let B be a basis for vectors x in the domain of L , and E a basis for vectors y in the
target-space containing the range of L . Then matrix L := E–1·L·B represents L , and
||L|| must be computed from its elements Lij . These change when bases are changed.

More generally, most of a second college course in linear algebra concerns the question
“What can be inferred about a linear operator L from its matrix L given
only the natures of L ’s domain- and target-spaces but not their bases?”
If only the spaces’ dimensions are known, only Rank(L) := Rank(L) can be inferred.
If L maps an otherwise undistinguished space to itself, only L ’s Jordan Normal Form.
If L maps one Euclidean space to another, only the Singular Values of L .
If L is a Symmetric map between a space and its dual space, only L ’s Inertia.
see Sylvester’s Inertia Theorem on p. 63

If too little is known about the domain- and target-spaces of linear operator L , not much
about it can be inferred (numerically) from arbitrary matrix representations L of L .


Here are three Matrix norms for m-by-n matrices L that take account only of its
elements Lij , not of the operator L that L represents, so these norms’ theory is shallow:

• ||L||∑ := ∑i ∑j |Lij| extends from ||x||1


• ||L||F := √( ∑i ∑j |Lij|2 ) … the Frobenius norm, the earliest matrix norm.
• ||L||µ := n·maxi,j |Lij| extends from ||x||∞

• Compatibilities: ||L||∑ & ||x||1 , ||L||F & ||x||2 , ||L||µ & ||x||∞ , among others.
Each of these norms possesses separately a property called Multiplicative Dominance :
||K·L|| ≤ ||K||·||L|| whenever matrices K and L can be multiplied.
This property implies Compatibility with inherited n-by-1 vector norms. Consequently
Multiplicative Dominance is so useful that we shall require it of all Matrix norms.

And we shall get it from Operator Norms defined thus:

• ||L|| := ( max ||L·x|| over ||x|| = 1 ) = ( max ||wT·L|| over ||wT|| = 1 )


= ( max | wT·L·x | over ||wT|| = ||x|| = 1 ) with dual ||…T|| and ||…||
Danger : Here five distinguishable norms are all written the same way: “||…||” .

Operator Norms are also called “Sup Norms” and “LUB-norms” (Least Upper Bound).


Operator Norms
• ||L|| := ( max ||L·x|| over ||x|| = 1 ) = ( max ||wT·L|| over ||wT|| = 1 )
= ( max | wT·L·x | over ||wT|| = ||x|| = 1 ) with ||…T|| dual to ||L·…||

Later we shall see how to test whether a given Matrix norm ||L|| is an Operator Norm too.
For now observe that Operator Norms possess Multiplicative Dominance in this way:
||K·L|| ≤ ||K||·||L|| if Domain(K) ⊇ Range(L) and both have the same vector norm.

And for linear maps L of a space to itself each Operator Norm satisfies ||L|| ≥ |λ| for
every eigenvalue λ of L ; and || I || = 1 .

A few Operator Norms can be computed at modest cost from the matrix L representing
L in suitably chosen bases for its domain- and target-spaces. Because there are many
uses for different norms, they must be distinguished by subscripts that clutter the notation:
Let ||L||αβ := ( max ||L·x||α/||x||β over x ≠ o ) = ( max ||wT·L||β/||wT||α over wT ≠ oT )
except we abbreviate ||L||αα to ||L||α . Most subscripts α and β will be 1, 2 or ∞ .

Actually we compute ||L||αβ from elements Lij of L ’s matrix L := E–1·L·B , just as


we got ||x||β from elements ξi of x := B–1·x , and ||wT||α from ωj of wT = wT·E .


Formulas for Five Familiar Operator Norms of Matrices


Here L is an m-by-n matrix of elements Lij :

• ||L||∞ = maxi ∑j |Lij| = ||LT||1 , the biggest Row-Sum of Magnitudes

• ||L||1 = maxj ∑i |Lij| = ||LT||∞ , the biggest Column-Sum of Magnitudes

• ||L||∞1 = maxi maxj |Lij| = ||LT||∞1 = ||L||µ/n = ||LT||µ/m , the biggest Magnitude

• ||L||∞2 = maxi √( ∑j |Lij|2 ) , the biggest Euclidean Row-Length

• ||L||2 = ( the biggest Singular Value of L ) = ||LT||2 = √( biggest Eigenvalue of LT·L )

Also ||L||2 = the biggest Eigenvalue of [ O LT ; L O ] from the Singular Value Decomposition.
And ||L||2 ≤ √(||L||1·||L||∞) . Do you see why?

||L||12 , ||L||21 , ||L||1∞ and ||L||2∞ are little used because their computations cost too much.
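The cheap formulas above are easy to check against library routines; here is a sketch (mine, not from the notes) in Python/NumPy with an arbitrary test matrix L :

    import numpy as np

    L = np.array([[1.0, -2.0,  3.0],
                  [0.5,  4.0, -1.0]])

    print(np.abs(L).sum(axis=1).max(), np.linalg.norm(L, np.inf))  # biggest row-sum of magnitudes
    print(np.abs(L).sum(axis=0).max(), np.linalg.norm(L, 1))       # biggest column-sum of magnitudes
    print(np.abs(L).max())                                         # ||L||_{inf 1}, the biggest magnitude
    print(np.sqrt((L**2).sum(axis=1)).max())                       # ||L||_{inf 2}, biggest Euclidean row-length
    s = np.linalg.svd(L, compute_uv=False)
    print(s[0], np.linalg.norm(L, 2))                              # the biggest singular value
    print(s[0] <= np.sqrt(np.linalg.norm(L, 1) * np.linalg.norm(L, np.inf)))  # the last bound above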

If dimensions are not too big, then, just as any vector norm ||…|| can be approximated by
||…||1, ||…||2 or ||…||∞ perhaps after a change of basis, so can any Operator Norm …


Maximized Ratios of Norms


Set µαβ := maxx ≠ o ||x||α/||x||β = maxwT ≠ oT ||wT||β/||wT||α when α ≠ β .
All such ratios µαβ are finite for finite dimensions n , but may grow with n . For
instance, µ∞1 = µ∞2 = µ21 = 1 ; µ12 = µ2∞ = √n ; µ1∞ = n . Generally µαβ·µβα > 1 .

There are analogous ratio-maxima for m-by-n Matrix norms; for instance
µF∑ := maxL ≠ O ||L||F/||L||∑ = 1 ; µ∑F = √(m·n) ; µµF = µµ∑ = n ; µFµ = √(m/n) ; µ∑µ = m .

Operator Norms inherit their ratio-maxima from their associated vector norms:
Recalling that ||L||αβ := maxx ≠ o ||L·x||α/||x||β , set µαβγδ := maxL ≠ O ||L||αβ/||L||γδ .
Then it so happens that µαβγδ = µαγ·µδβ .
Caution: If L is not square, then ratios µ… must take different dimensions into account.

These ratio-maxima µ… reveal how well or badly one norm can approximate another.

The trouble with norms is that there are so many of them.


How is an appropriate one to be chosen?


How should a Matrix norm be chosen for error-analysis?


Ideally, an appropriate norm would be what I call Equitable, i.e., so chosen that …
All errors of the same norm have roughly equal (in)significance.
This ideal is often not achievable before a computation’s results have been inspected,
and not always achievable afterwards, but always worth attempting.

For example, given two diagonal matrices Vt and Vd with wildly diverse magnitudes on
their diagonals, but such that the nonzero elements of Vt·K·Vd span a modest range of
magnitudes (such a K is called a “Graded” matrix), then variations ∆K in K may be
gauged more equitably by norm |||∆K||| := ||Vt·∆K·Vd|| than by a familiar ||∆K|| .
If ||…|| is an Operator Norm, so is |||…||| , but derived from vector norms after basis
changes tantamount to scaling the vectors: |||x||| := ||Vd–1·x|| for x in the domain-space;
|||y||| := ||Vt·y|| for y in the target-space of ∆K whose |||∆K||| = maxx≠o |||∆K·x|||/|||x||| .

Another example: Compute a norm [[∆K]] := ||∆K̄|| from the elements ∆K̄ij := ∆Kij/Ωij
scaled by a given array of positive Weights 1/Ωij . When ||…|| = ||…||∞1 , the biggest
magnitude, [[…]] is called an “Elementwise” norm, especially if every Ωij := |Kij| > 0 .


When is a preassigned Matrix norm also an Operator Norm ?


Operator Norms have properties that some mere Matrix norms lack. For instance, the
norm of a rank-1 matrix or operator is ||x·wT|| = ||x||·||wT|| for every Operator Norm but
not for every compatible Matrix norm.

To simplify the exploration that follows, suppose a given Matrix norm [[…]] for square
n-by-n matrices possesses Positivity, Homogeneity and the Triangle Inequality but, for
lack of defined vector norms, perhaps not Compatibility.

Instead we assume that [[…]] possesses Multiplicative Dominance: [[K·L]] ≤ [[K]]·[[L]] ,

which can always be realized by scaling [[…]] if necessary. Then choose a fixed n-row
rT ≠ oT and define a (perhaps new) vector norm to be ||x|| := [[x·rT]] , the same for both
the domain- and target-spaces of the n-by-n matrices for which [[…]] was given. Now
this vector norm induces an Operator Norm ||L|| := maxx ≠ o ||L·x||/||x|| . We find that this
||L|| ≤ [[L]] for all n-by-n matrices L . Thus, there is a sense in which …
Operator Norms are minimal among Multiplicatively Dominant norms.
And ||L|| = [[L]] for all L only when the given [[…]] was already an Operator Norm.

E.g. [[L]] = ||L||F ⇒ ||L|| = ||L||2 ; [[L]] = ||L||µ ⇒ ||L|| = ||L||∞ ; [[L]] = ||L||∑ ⇒ ||L|| = ||L||1 .


Orthogonally Invariant Matrix Norms


These are for m-by-n matrices L that represent operators L that map one Euclidean
space to another. These norms satisfy ||L|| = ||Q·L·P|| for all Orthogonal matrices
QT = Q–1 and PT = P–1 . (Analogous Unitarily Invariant norms are for complex unitary
spaces, and especially for Hilbert spaces whose dimensions are infinite.)
These norms ||L|| depend solely upon the Singular Values of L , which are the biggest
min{m, n} eigenvalues of [ O LT ; L O ] (these come in ± pairs). In fact, every such norm is
||L|| = ||v|| in which column vector v is a column of the singular values of L and ||…|| can
be any vector norm.
Almost all useful orthogonally invariant matrix norms are Symmetrical Cross-Norms :
• “Symmetrical” means the order of the singular values in v does not matter to ||v|| .
• “Cross-Norms” means that the norm of a rank-1 matrix is ||x·wT|| = ||x||2·||wT||2 .
Among these the most common by far are the Multiplicatively Dominant …
• ||L||2 = ||v||∞ = the biggest singular value of L , the natural Operator Norm.
• ||L||F = ||v||2 = √(Trace(LT·L)) = √(∑(squared singular values)) , the Frobenius norm.
||C·L||F ≤ ||C||2·||L||F ≤ ||C||F·||L||F
• ||L||N = ||v||1 = ∑ (singular values) , the “Nuclear” norm, used for some optimizations.
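A short sketch (mine, not part of the notes) reading these three norms off the singular values and checking orthogonal invariance, in Python/NumPy with random test data:

    import numpy as np

    rng = np.random.default_rng(1)
    L = rng.standard_normal((4, 3))
    v = np.linalg.svd(L, compute_uv=False)            # the singular values

    print(v.max(),               np.linalg.norm(L, 2))      # ||L||_2 = ||v||_inf
    print(np.sqrt((v**2).sum()), np.linalg.norm(L, 'fro'))  # ||L||_F = ||v||_2
    print(v.sum(),               np.linalg.norm(L, 'nuc'))  # ||L||_N = ||v||_1

    Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # an orthogonal Q
    P, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # an orthogonal P
    print(np.linalg.norm(Q @ L @ P, 'fro'), np.linalg.norm(L, 'fro'))   # unchanged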


Dual Norms for Dual Matrix Spaces


The vector space of all real m-by-n matrices L has its Dual-space consisting of n-by-m
real matrices KT with the scalar product ∑i ∑j Kij·Lij = Trace(KT·L) = Trace(L·KT) .

Given a Matrix norm ||L|| for m-by-n matrices L , the natural norm for their dual-
space of n-by-m matrices KT is ||KT|| := maxL≠O Trace(KT·L)/||L|| .

This places a severe strain upon our subscripting notation. For example …
Dual to the norm ||L||∞ = maxi ∑j |Lij| is ||KT||(∞) := ∑i maxj |Kij| ≥ ||KT||∞ = ||K||1 .

The strain is less severe for our orthogonally invariant norms; for example …

Dual to the norm ||L||F = √(Trace(LT·L)) is ||KT||F = ||K||F .

Dual to ||L||2 = max(singular value(L)) is ||KT||N = ∑(singular values(KT)) = ||K||N .


This last duality explains partially why the Nuclear norm figures in some optimizations
that can be reformulated advantageously in a dual-space. In certain optimizations, the K
that minimizes ||K||N subject to appropriate constraints also minimizes Rank(K) .


Do not confuse Dual Norms for Dual Matrix Spaces with …


Norms for Linear Maps £ from a Real Vector Space to its Dual:
Scalar £·x·y is linear in x and y separately. … (conjugate-linear in y if space is complex)
£·x·y is called a Bilinear Form , also written £(x, y) .
Matrix L representing £ depends upon Basis B = [b1, b2, …, bn] : Lij = £·bj·bi so
£·x·y = yT·L·x for columns x := B–1·x and y := B–1·y . … ( … = y'·L·x if complex)

Change of Basis from B to B̄ := B·C–1 changes L to L̄ := (C–1)T·L·C–1 ; then …


L̄ and L are called Congruent. See also pp. 62-3 for more about Congruence.

Operator Norms: ||£|| := max |£·x·y| over ||x|| = ||y|| = 1 .


||L||αα := max |yT·L·x| over ||x||α = ||y||α = 1 . ||L||αα = ||LT||αα
Examples:
||L||11 = ||L||∞1 = maxi maxj |Lij| .
||L||22 = ||L||2 = biggest singular value of L .
||L||∞∞ = ||L||1∞ ≤ ∑i ∑j |Lij| and ||L||∞∞ ≥ max{ ||L||1, ||L||∞ } .

£ is called Symmetric when £·x·y ≡ £·y·x = yT·L·x ≡ xT·L·y , whereupon LT = L .


See also p. 62.


Part III:
Matrix Norms and Matrix Inverses

Condensed Review of Parts I & II :


1. Try to avoid choosing a norm ||…|| badly;
take care to choose appropriate Bases, Coordinates, Variables.
Ideally, perturbations with the same norm would be about equally (in)significant.
The choice of ||...||1 , ||...||2 or ||...||∞ matters most when dimensions are big.

2. Among Matrix norms, the Operator Norms ||L|| := maxx≠o ||L⋅x||/||x|| have
the most useful properties ... multiplicative ..., minimality ... .


Matrix Norms and Matrix Inverses


Sensitivity of Inverses to (Infinitesimal) Perturbations:
Changing L to L + δL changes L–1 to L–1 + δ(L–1) ;
δ(L–1) = –L–1⋅δL⋅L–1 so ||δ(L–1)|| ≤ ||δL||⋅||L–1||2 , and equality is achievable.

Condition Number (for Inversion): κ(L) := ||L–1||·||L|| , an amplification factor;


||δ(L–1)||/||L–1|| ≤ κ(L)⋅||δL||/||L|| and equality is achievable.
Also ||δ(L–1)||/||L–1|| ≥ (1/κ(L))·||δL||/||L|| , because κ(L–1) = κ(L) .
Perhaps κ(L) would be better regarded as a Distortion factor.

Sensitivity to Perturbations in Data {L, c} of Solutions x of equation L⋅x = c :


||δx||/||x|| ≤ κ(L)⋅( ||δL||/||L|| + ||δc||/||c|| ) , and equality is achievable.

If & when it succeeds,


Backward Error-Analysis maps rounding errors to induced perturbations in data.
⇒ As matrix inversion's rounding errors propagate they get amplified by κ(L) .
Ill-Condition means a HUGE amplification factor κ(L) . cf. p. 6.
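To see the amplification factor at work, here is a small sketch (mine, not from the notes) in Python/NumPy; the nearly singular matrix and its tiny perturbation are made-up data:

    import numpy as np

    L = np.array([[1.0, 1.0],
                  [1.0, 1.0001]])        # nearly singular, so badly conditioned
    Linv = np.linalg.inv(L)
    kappa = np.linalg.norm(L, 2) * np.linalg.norm(Linv, 2)

    dL = 1e-8 * np.array([[0.0, 0.0],
                          [0.0, 1.0]])   # a tiny perturbation
    dLinv = np.linalg.inv(L + dL) - Linv

    rel_in  = np.linalg.norm(dL, 2) / np.linalg.norm(L, 2)
    rel_out = np.linalg.norm(dLinv, 2) / np.linalg.norm(Linv, 2)
    print(kappa)                      # about 4e4
    print(rel_in, rel_out)            # output change is roughly kappa times input change
    print(kappa * rel_in)             # the first-order bound above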


Operator Norms of Inverses


Theorem: For any Operator Norm ||...|| , … known to Banach ?

||L–1|| = 1/( min ||∆L|| over Rank(L – ∆L) < Rank(L) ) .

Condition Number (for Inversion): κ(L) := ||L–1||·||L|| amplifies perturbations' effects

[Figure: in the Space of n-by-n Matrices or Linear Operators, the cone of matrices of Rank ≤ n-1 (with Rank = n-2 deeper inside) passes close to an Ill-Conditioned matrix L ; the nearest rank-deficient matrix is L – ∆L , and the angle A at O between L and the cone satisfies A = arcsin( 1/κ(L) ) . An Ill-Conditioned matrix L is extremely close to Singular Matrices.]


[Figure: the same cone of rank-deficient matrices in the Space of n-by-n Matrices or Linear Operators, with angle A = arcsin( 1/κ(L) ) at O , now asking how L–1 behaves as L approaches a Singular matrix, and where the inverses of a little ball of matrices around L lie: ||L–1 – (Rank-1 matrix)||/||L–1|| → 0 , and a length marked 2/(Radius of the ball) appears in the picture. This is the most likely situation, necessary for eigenvector computation.]


Equilibration invokes Diagonal Scaling to help compute the solution x


of an equation “ C⋅x = b ” more nearly as accurately as the data deserve, though
limited by the available arithmetic's precision: Choose apt diagonals Vt & Vd ;
replace C by C̄ := Vt·C·Vd , b by b̄ := Vt·b ; solve C̄·x̄ = b̄ ; get x := Vd·x̄ .
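A minimal sketch of this recipe (mine, not from the notes) in Python/NumPy; the matrix C , right-hand side b and the particular power-of-2 diagonals Vt , Vd are all made up for illustration, and how such diagonals ought to be chosen is the question taken up next:

    import numpy as np

    C = np.array([[1.0e6, 2.0e3],
                  [3.0e3, 5.0  ]])
    b = np.array([2.0e6, 7.0e3])

    Vt = np.diag([2.0**-10, 1.0])        # row scaling by powers of the radix 2
    Vd = np.diag([2.0**-10, 1.0])        # column scaling likewise

    C_bar = Vt @ C @ Vd                  # scaled matrix with entries of modest range
    b_bar = Vt @ b
    x_bar = np.linalg.solve(C_bar, b_bar)
    x = Vd @ x_bar                       # undo the column scaling
    print(x, np.linalg.solve(C, b))      # agrees with the unscaled solution: [4, -1000]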

How are “apt diagonals” Vt & Vd to be chosen?


• To avoid introducing extraneous roundoff, restrict diagonals' elements to
powers of the arithmetic's radix (2 for Binary, 10 for Decimal).
• If the uncertainties in the elements of the data {C, b} are known, diagonals
should ideally be chosen to make some common norm, say ||…||∞ ,
Equitable for scaled data: i.e., every ∆C with the same ||∆C||∞ is very
roughly equally (in)consequential or (in)significant or … . This ideal
may be unattainable, especially before solution x has been estimated.
• If the uncertainties in the elements of the data {C, b} are unknown, diagonals
should ideally be chosen to roughly minimize some common condition
number, say κp(C) = ||C||p·||C–1||p for p in {1, 2, ∞} , of scaled data.
This ideal usually costs too much; e.g. see p. 59. There is an exception:


Equilibration continues … minimize condition number … one exception:


A. van der Sluis' Theorem (1969):
Suppose H is an N-by-N Positive-definite symmetric or Hermitian matrix. See p. 63
Then κ2( (Diag(H))–1/2·H·(Diag(H))–1/2 ) ≤ N·( min over diagonal V of κ2(V·H·V) ) .

• Other equilibration schemes abound, all somewhat mysterious. For instance:


Compute diagonals Vt & Vd to turn every row- and column-sum of magnitudes of
C̄ := Vt·C·Vd into 1 , making |C̄| Doubly Stochastic. This computation is iterative; usually it
converges fast but at times appallingly slowly, especially when equilibration is most needed.

Gaussian Elimination is affected by equilibration only through its effect upon the order of
pivot selection. This effect may be thwarted if column exchanges are disallowed, and
then computed results can be undeservedly grossly inaccurate, remediable only by
Iterative Refinement. See striking examples posted on my <…/Math128/FailMode.pdf>

Preconditioning resembles equilibration's attempt to reduce κ(Vt·C·Vd) a


lot but allows non-diagonal choices of Vt & Vd , and rarely computes Vt·C·Vd
explicitly. Preconditioning is an essential step in the fast solution of discretized
continuum problems by iterative methods like Conjugate Gradients.


Iterative Refinement
attenuates ill effects of Ill-Condition or BIG dimensions or ... ?
Given F and c , let G stand for operations performed to solve “ F⋅z = c ” for z ;
e.g., triangular factorization of F , etc., or Conjugate Gradient iteration, ... .
Computation yields instead x := G⋅c ≈ z ... roundoff accumulates, iteration stops, ...

Let Residual r := c – F⋅x , computed as accurately as is affordable. Then


reuse (if saved) operations in G to get y := x + G⋅r ≈ z more closely than x ≈ z .

If it works, why does it work ? y – z = ( I – G⋅F )⋅(x – z) + more roundoff,


and if G ≈ F–1 roughly then we expect ||I – G⋅F|| << 1 , so ||y – z|| << ||x – z|| .

It might not work if ...


F is too Ill-Conditioned, within too few rounding errors of singular, or
G is too inaccurate, or cf. <www.eecs.berkeley.edu/~wkahan/Math128/FailMode.pdf>
Residual r is too inaccurate, drowned perhaps in its own rounding errors
for lack of extra-precise accumulation of r .
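Here is a bare-bones sketch of one refinement step (my own, not from the notes), in Python using NumPy and SciPy's LU routines to play the role of the saved operations G ; F and c are random test data:

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    rng = np.random.default_rng(2)
    F = rng.standard_normal((50, 50)) + 50 * np.eye(50)   # comfortably nonsingular
    c = rng.standard_normal(50)

    lu, piv = lu_factor(F)               # the saved operations "G"
    x = lu_solve((lu, piv), c)           # first computed solution
    r = c - F @ x                        # residual; ideally accumulated extra-precisely
    y = x + lu_solve((lu, piv), r)       # one step of iterative refinement

    print(np.linalg.norm(c - F @ x), np.linalg.norm(c - F @ y))   # residuals before and after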


Diagonal Dominance
Some matrices are obviously invertible.
Suppose L maps a space linearly to itself, and ||L|| < 1 . Then (I – L)–1 exists
because Liouville's series (I – L)–1 = I + L + L2 + L3 + L4 + … converges.

Square matrix B is said to have Rows Dominated by its Diagonal when


every |bii| > ∑j≠i |bij| ; then B–1 exists. ( L := I – Diag(B)–1·B has ||L||∞ < 1 )

Square matrix B is said to have Columns Dominated by its Diagonal when


every |bjj| > ∑i≠j |bij| ; then B–1 exists. ( L := I – B·Diag(B)–1 has ||L||1 < 1 )

Often B is invertible though “ = ” replaces almost all “ > ” signs, but not always.
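A tiny sketch (mine, not from the notes) checking row dominance and the bound ||I – Diag(B)–1·B||∞ < 1 for a made-up matrix B , in Python/NumPy:

    import numpy as np

    B = np.array([[4.0, -1.0,  2.0],
                  [1.0,  5.0, -2.0],
                  [0.5,  1.0,  3.0]])

    diag = np.abs(np.diag(B))
    offdiag = np.abs(B).sum(axis=1) - diag
    print(np.all(diag > offdiag))                 # rows dominated by the diagonal

    L = np.eye(3) - B / np.diag(B)[:, None]       # L := I - Diag(B)^{-1} B
    print(np.linalg.norm(L, np.inf))              # < 1 , so B^{-1} exists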

Gaussian Elimination generally needs Pivotal Exchanges to ensure numerical


stability, but they turn out to be unnecessary for matrices dominated by their
diagonals and also for symmetric Positive definite matrices. … see p. 63.


Schur Complements
S := Z – E·C–1·D is the Schur Complement of C in B := [ C D ; E Z ] .
Without pivotal exchanges, Gaussian Elimination reduces B through
successive Schur Complements of B's leading principal submatrices, like C ,
in the course of Triangular (LU) Factorization: B → U = [ UC UD ; O S ] .

• All Schur Complements in Diagonally Dominant matrices B are also


Diagonally Dominant, and do not grow much in norm.

• All Schur Complements in Symmetric Positive Definite matrices B are also


Symmetric Positive Definite, and do not grow at all in norm.

What matters most is “… do not grow …” lest elements of Z be corrupted.


The choice of norm implied in “grow” can affect “corrupted” drastically;
cf. pp. 3-6 of <…/Math128/FailMode.pdf>


When does equation “ F·x = y ” have at least one solution x ?


And if it exists, when is solution x unique ?

Ivar Fredholm's Alternatives:


Valid also in many an infinite-dimensional space; no determinants!

• At least one solution x exists if & only if


w'·y = 0 whenever w'·F = o'
as w' runs through the vector space dual to F's target space.
• If it exists, solution x is unique if & only if F·z ≠ o whenever z ≠ o
as z runs through F's domain.

Proof: The canonical form of F under Equivalence is [ I O ; O O ] = R–1·F·C in
which dimension(I) = rank(F) . (Some O's may be empty.) Now switch the given
equation “ F·x = y ” to “ R–1·F·C·(C–1·x) = (R–1·y) ” in canonical form; etc.


Generalized Inverses and Pseudo-Inverses


Suppose matrix F is non-invertible because it is not square and/or Rank(F) is
less than both dimensions. At least one Generalized Inverse G always exists
such that if equation “ F·x = y ” has solution(s) x then x := G·y is a solution.

The Generalized Inverses of F are the solutions G of “ F·G·F = F ” .

Example: The Least-Squares problem “ Choose x to minimize ||F·x – g||2 ”


always has at least one solution x and, if more than one, then the
one that also minimizes ||x||2 is x := F†·g in which F† is F's
Moore-Penrose Pseudo-Inverse , a Generalized Inverse of F .

“ F·F†·F = F , F†·F·F† = F† , (F†·F)T = F†·F , (F·F†)T = F·F† ” characterize F†


but F† is best computed from the Singular Value Decomposition of F .

F and F† are Reciprocal, each a Generalized Inverse of the other ;


but non-reciprocal Generalized Inverses of F may exist too. ...
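A short sketch (mine, not from the notes) in Python/NumPy computing the Moore-Penrose Pseudo-Inverse from the SVD and checking it against the built-in least-squares solver; F and g are random test data with full column rank:

    import numpy as np

    rng = np.random.default_rng(3)
    F = rng.standard_normal((6, 3))
    g = rng.standard_normal(6)

    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    F_pinv = Vt.T @ np.diag(1.0 / s) @ U.T          # pseudo-inverse from the SVD
    print(np.allclose(F_pinv, np.linalg.pinv(F)))

    x = F_pinv @ g                                  # minimizes ||F x - g||_2
    x_lstsq = np.linalg.lstsq(F, g, rcond=None)[0]
    print(np.allclose(x, x_lstsq))

    # two of the four characterizing identities:
    print(np.allclose(F @ F_pinv @ F, F), np.allclose(F_pinv @ F @ F_pinv, F_pinv))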


How Big Must Generalized Inverses Be ?


Most are arbitrarily big; when some Z ≠ O satisfies either F·Z = O or Z·F = O ,
then any Generalized Inverse G of F yields infinitely many others: G ± λ⋅Z .

Generalized Inverses of some matrices F are all HUGE :


Theorem: Every Generalized Inverse G of F has cf. picture on p. 37
||G|| ≥ 1/( min ||∆F|| over Rank(F – ∆F) < Rank(F) ) .
Equality can occur for Pseudo-Inverses gauged by the l2 Operator Norm ||...||2 :
||F†||2 = 1/( min ||∆F||2 over Rank(F – ∆F) < Rank(F) )
= 1/(the least nonzero singular value of F ) .

For Operator Norms ||...|| generally, the best that can be said appears to be ...
Theorem: F has at least one Generalized Inverse G with
||G|| ≤ √(Rank(F))/( min ||∆F|| over Rank(F – ∆F) < Rank(F) ) .
I hope this theorem has a short proof; mine is much too long.


Use of Generalized Inverses can be Dangerous !


E.g.: Pseudo-Inverse F† = (FT⋅F)–1⋅FT or FT⋅(F⋅FT)–1 , whichever inverse exists,
unless neither inverse exists, in which case use ... . SVD ?

Limit Formula: F† = limα→0+ (FT⋅F + α⋅I)–1⋅FT ?

Over-determined x : [Figure: a tall thin F alongside the column g .]
Choose x with minimum ||x||2 to minimize ||F⋅x – g||2 .

Solution: x = F†⋅g unless columns of F are too nearly linearly dependent.

Remedies for HUGE and HYPERSENSITIVE F† :


QR factorization, Doubled precision, Better basis for x orthogonal polynomials
cf. pp. 15-16 of <…/HilbMats.pdf>

A Bad Idea: Tychonoff Regularization: x := (FT⋅F + α⋅I)–1⋅FT⋅g for a small α .


If good values for Regularization Parameter α exist they depend upon NOISE.
Better Idea: Choose basis for x so that ||∆x||2 is appropriate, then compute
SVD(F) to diagonalize “ F⋅x ≈ g ” and delete elements below noise levels.


Limit Formula: F† = limα→0+ FT·(F⋅FT + α⋅I)–1 ?

Under-determined x : [Figure: a short wide F alongside the column g .]
Choose x with minimum ||x||2 to satisfy F⋅x ≈ g .

Solution: x = F†⋅g unless rows of F are too nearly linearly dependent.

Remedies for HUGE and HYPERSENSITIVE F† :


Doubled precision? Discard redundant rows ? Change basis in Range(F) ?
A Bad Idea: Tychonoff Regularization: x := FT⋅(F⋅FT + α⋅I)–1⋅g for a small α .
If good values for Regularization Parameter α exist they depend upon NOISE.
Better Idea: Choose basis for g so that ||∆F||2 is Equitable; then compute
SVD(F) to diagonalize “ F⋅x ≈ g ” and delete elements below noise levels.
Alternatively, seek x with the fewest “very nonzero” components.

High noise levels can leave no solution x determined unambiguously. Then …


Sometimes NO RESULT is better than a BAD RESULT
when NO GOOD RESULT exists.


A Generality
Often a computational task amounts to solving “ F(z) = o ” for z given F .
Errors and uncertainties due to roundoff etc. make F(z + δz) = δF instead.
Consequent uncertainty or error in z is δz ≈ F `(z)–1⋅δF .
Uncertainty in data that specifies F is amplified in z by as much as ||F `(z)–1|| .
HUGE amplification ||F `(z)–1|| can occur only if DATA puts F CLOSE to a
SINGULARITY :
• A Pole (infinite value) of F `–1 .
• Confluence (collapse of dimension; multiplicity of mapping F , of zeros z ).
• Exponentially-growing functions of variables unobviously near ∞ .
Common! See also my web page's <…/WrongR.pdf>

Changes of variables/parameters/bases, perhaps nonlinear, can alter closeness


of F to a singularity, or change the norm that measures closeness, sometimes so
drastically as to change amplification of error by many orders of magnitude.
Good choices are worth the thought devoted to them.


Part IV:

Matrix Norms and Eigenvalues


Matrix Norms exceed Eigenvalues


Eigenvalue λ of L has L⋅v = λ⋅v for some Eigenvector v ≠ o ; therefore
|λ| ≤ ||L|| for every compatible ||...|| ,
including every Operator Norm.
Eigenvalue λ of L also has w'⋅L = λ⋅w' for some Left Eigenvector w' ≠ o' in
the space of linear functionals w' Dual to Domain(L) = Target-Space(L) . The
spaces, like λ , may be complex regardless of whether L is real.

Eigenvalues λ are the zeros of L's Characteristic Polynomial det(λ⋅I − L) .


Computing the Characteristic Polynomial explicitly is usually a numerically bad way to determine eigenvalues.

Eigenvalues λ are Continuous Functions of L ;


but Perhaps Not Differentiable,
unless λ is a Simple eigenvalue where …
det(λ⋅I − L) = 0 ≠ d det(λ⋅I − L)/dλ .


A Computed Eigenvector's Error is an Angle:


Let x be a computed approximation to a desired eigenvector v ;
then the error is not ||x – v|| but the difference between two Subspaces ,
one spanned by scalar multiples of x , the other ... of v .
The relevant error is the (unsigned) Angle ∠(x, v) between the two subspaces,
usually in complex spaces where x' is the complex conjugate transpose of x .

How to compute ∠(x, v) := arccos( |x'⋅v|/(||x||2⋅||v||2) ) :


NOT FROM THIS FORMULA ! Why not?
First choose a basis for which ||∆x||2 is an Equitable measure of perturbations.
Then replace x by x/||x||2 and v by v/||v||2 , so now √(x'·x) = ||x||2 = ||v||2 = 1 .
Compute ∠(x, v) := 2⋅arcsin( ||x⋅(x'⋅v/|x'⋅v|) – v||2/2 ) . Presubstitute 1 for (0/|0|) .

A generalization works for Angles between higher-dimensional Subspaces; see p. 74.


|Accurate| for all angles. For Signed Angle between two vectors see p. 15 of my <.../Cross.pdf>
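Here is that recipe as a small Python/NumPy function (my own sketch, not part of the notes); the nearly parallel test vectors show why the arccos formula above is to be avoided:

    import numpy as np

    def angle(x, v):
        x = x / np.linalg.norm(x)
        v = v / np.linalg.norm(v)
        s = np.vdot(x, v)                       # x'v , conjugated for complex data
        phase = s / abs(s) if s != 0 else 1.0   # presubstitute 1 for 0/|0|
        return 2 * np.arcsin(np.linalg.norm(x * phase - v) / 2)

    v = np.array([1.0, 0.0])
    x = np.array([1.0, 1e-9])                   # nearly parallel to v
    print(angle(x, v))                          # about 1e-9
    print(np.arccos(abs(np.vdot(x, v)) / (np.linalg.norm(x) * np.linalg.norm(v))))  # loses all its digits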


(Unobviously) Clustered Eigenvalues of a Matrix


not Real Symmetric nor Hermitian nor Orthogonal nor Unitary nor “Normal”
can be hypersensitive to perturbations
Example: L := [ 0 10 0 0 0 0 ; 0 0 10 0 0 0 ; 0 0 0 10 0 0 ; 0 0 0 0 10 0 ; 0 0 0 0 0 10 ; ξ 0 0 0 0 0 ] ;
det(λ⋅I − L) = λ⁶ − 10⁵⋅ξ .

6 Eigenvalues λ jump from 0 at ξ = 0 to |λ| = 1 at ξ = 1/10⁵ . Clustered?

Eigenvectors of a Matrix can be hypersensitive to perturbations,


especially if it is close to a matrix with Repeated Eigenvalues.
Example: 6 eigenvectors of L at tiny ξ ≠ 0 collapse to one at ξ = 0 .
Whenever a matrix's Jordan Normal Form is not merely diagonal, it is a
Discontinuous Function of the matrix's elements, and thus very hard to compute.
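
A quick numerical look at this example (a sketch, assuming NumPy):

  import numpy as np

  def L_of(xi, n=6, s=10.0):
      L = np.diag(np.full(n-1, s), k=1)    # 10's on the superdiagonal
      L[-1, 0] = xi                        # ξ in the bottom-left corner
      return L

  # every eigenvalue has |λ| = (10⁵·ξ)^(1/6), so tiny changes in ξ move them a lot:
  print(np.abs(np.linalg.eigvals(L_of(0.0))))     # all 0 (numerically)
  print(np.abs(np.linalg.eigvals(L_of(1e-5))))    # all equal to 1
  print(np.abs(np.linalg.eigvals(L_of(1e-17))))   # all equal to 0.01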

Prof. W. Kahan SUBJECT TO CHANGE: Do you have the latest version? Page 53 / 79

Gershgorin's Circles enclose Eigenvalues


Given a square matrix L , its eigenvalues λ can be located within the union of a
family of circular disks in the complex plane: … Row version. A Column version exists too.
Disk Di is centered at Lii with radius Σj≠i |Lij| = Σ | ith row's off-diagonals | .
Gershgorin's Circles Theorem :
Every eigenvalue λ of L lies in the union of all the disks Di .
Each disjoint component of that union contains
as many eigenvalues λ as it contains disks.

[Figure omitted: disjoint groups of Gershgorin disks containing 3 , 2 and 2 eigenvalues respectively.]

Proof: If no disk Di contains η it cannot be an eigenvalue since ηI − L is Diagonally
Dominant (p. 42). To count eigenvalues use continuity as disks expand from their centers.

This theorem is useful mainly for matrices with Prominent if not Dominant diagonals.
Sometimes replacing L by V⁻¹⋅L⋅V for a suitable matrix V , perhaps diagonal, perhaps
consisting of approximate eigenvectors, can reduce the sizes of at least some of the disks.
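
A sketch of the row version in NumPy (the function name and the example matrix are mine):

  import numpy as np

  def gershgorin_disks(L):
      # row version: disk i is centered at L[i,i] with radius = sum of |off-diagonals| in row i
      centers = np.diag(L)
      radii = np.sum(np.abs(L), axis=1) - np.abs(centers)
      return centers, radii

  L = np.array([[ 4.0,  0.5, 0.2],
                [ 0.1, -3.0, 0.4],
                [ 0.3,  0.2, 1.0]])
  centers, radii = gershgorin_disks(L)
  for lam in np.linalg.eigvals(L):        # every eigenvalue lies in the union of the disks
      assert any(abs(lam - c) <= r + 1e-12 for c, r in zip(centers, radii))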

Prof. W. Kahan SUBJECT TO CHANGE: Do you have the latest version? Page 54 / 79

Diagonal Prominence (rather than Dominance) also admits cheaper estimates of


Extreme Singular Values:
(Biggest Singular Value of L) = maxx≠o ||L·x||2/||x||2 = ||L||2 = ||L'||2 .
(Least Singular Value of square L) = minx≠o ||L·x||2/||x||2 = 1/||L⁻¹||2 .
These extreme singular values cost work O(min{m, n}³) to compute closely for m-by-n matrices L .

Cheaper estimates of ||L||2 :


(Biggest Euclidean row-length) = ||L||∞2 ≤ ||L||2 ≤ √m·||L||∞2 , and
√( ||L||1·||L||∞ )/⁴√(m·n) ≤ ||L||2 ≤ √( ||L||1·||L||∞ ) .
The blue “ ≤ ” inequalities come closer as the diagonal of L becomes more prominent.
(The 2nd-last “ ≤ ” is tight “ = ” when L is a Hadamard Matrix; cf. MATLAB's hadamard.m .)

Cheaper underestimates of the least (nth) singular value of an n-by-n matrix L :


If possible, first permute its rows and columns to make its diagonal prominent. Then
scale its rows or columns by factors of magnitude 1 to make the diagonal positive. Now
(nth Singular Value of L) ≥ mink { Lkk − ∑j≠k |Ljk + Lkj|/2 } ; hope it's positive.
Cf. Chas. R. Johnson's “A Gersgorin-type Lower Bound for the Smallest Singular Value” pp. 1-7, Lin. Alg. & Appl. 112 (1989)

Appending rows and/or columns to n-by-n L cannot decrease its nth singular value.
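
These estimates are cheap to try out; here is a sketch (NumPy assumed; the matrix is a made-up example with a prominent positive diagonal):

  import numpy as np

  rng = np.random.default_rng(1)
  m = n = 6
  L = rng.standard_normal((m, n)) + 5.0*np.eye(n)           # a prominent (positive) diagonal

  two  = np.linalg.norm(L, 2)                                # ||L||2 , costs an SVD
  row2 = max(np.linalg.norm(L, axis=1))                      # biggest Euclidean row-length
  one, inf_ = np.linalg.norm(L, 1), np.linalg.norm(L, np.inf)
  assert row2 <= two <= np.sqrt(m)*row2
  assert np.sqrt(one*inf_)/(m*n)**0.25 <= two <= np.sqrt(one*inf_)

  # underestimate of the least singular value (the diagonal is already positive here):
  lower = min(L[k, k] - 0.5*sum(abs(L[j, k] + L[k, j]) for j in range(n) if j != k)
              for k in range(n))
  assert lower <= min(np.linalg.svd(L, compute_uv=False))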

Prof. W. Kahan SUBJECT TO CHANGE: Do you have the latest version? Page 55 / 79

Sensitivity of Eigenvalues to Tiny Perturbations


Say λ is an eigenvalue of L with eigenvectors x and w' ,
perhaps all complex: L·x = λ·x and w'·L = w'·λ . Then
dλ/dζ = w'·(dL/dζ)·x/(w'·x)
UNLESS λ is a multiple eigenvalue, in which case dλ/dζ
may be multi-valued or infinite.
[Two small sketches omitted: λ plotted against ζ , branching or turning vertically at a multiple eigenvalue.]
When λ is a simple eigenvalue, κ := ||w'||·||x||/|w'·x| is its
Condition Number.
κ can be arbitrarily big unless L stays special, say “Normal”
(i.e. L'·L = L·L' … Real Symmetric, Hermitian, Unitary, Orthogonal … κ = 1 )
“Stays”? If L is Hermitian but not dL , κ can be ≈ 1 + 2·log(dimension)/π .
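
A sketch of computing κ for each eigenvalue; it relies on scipy.linalg.eig returning left eigenvectors when asked, and the example matrix is mine:

  import numpy as np
  from scipy.linalg import eig

  L = np.array([[1.0, 1000.0],
                [0.0,    2.0]])                # far from normal
  lams, wl, xr = eig(L, left=True, right=True)
  for i, lam in enumerate(lams):
      w, x = wl[:, i], xr[:, i]                # left and right eigenvectors for lam
      kappa = np.linalg.norm(w)*np.linalg.norm(x)/abs(np.vdot(w, x))
      print(lam, kappa)                        # both eigenvalues here have kappa ≈ 1000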

Prof. W. Kahan SUBJECT TO CHANGE: Do you have the latest version? Page 56 / 79

Sensitivity of Eigenvalues to Perturbations, cont'd


Suppose matrix L is diagonalizable by similarity (as almost
all square matrices are) and let Λ = X⁻¹·L·X be the diagonal
matrix of eigenvalues of L ; a matrix of eigencolumns is X .
How much can Λ NOT change if L changes to L + ∆L ?

Now (probably non-diagonal) Λ + ∆Λ := X⁻¹·(L + ∆L)·X , so


||∆Λ||1 = ||X⁻¹·∆L·X||1 ≤ ||X⁻¹||1·||∆L||1·||X||1 = κ1(X)·||∆L||1 .
Here κ1(X) is a (perhaps excessive) condition number for inversion of
X . Now Gershgorin's Circles Theorem applied to Λ + ∆Λ implies …
Bauer-Fike Theorem:
No eigenvalue of L + ∆L can differ from an eigenvalue
of L by more than min{ κ1(X)·||∆L||1 , κ∞(X)·||∆L||∞ } .
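
A sketch checking the Bauer-Fike bound numerically (NumPy; the matrices are arbitrary examples):

  import numpy as np

  rng = np.random.default_rng(2)
  L  = rng.standard_normal((6, 6))
  dL = 1e-6*rng.standard_normal((6, 6))

  lams, X = np.linalg.eig(L)                             # L = X·diag(lams)·X⁻¹
  kappa1 = np.linalg.norm(X, 1)*np.linalg.norm(np.linalg.inv(X), 1)
  bound  = kappa1*np.linalg.norm(dL, 1)                  # κ1(X)·||∆L||1

  for mu in np.linalg.eigvals(L + dL):
      # each perturbed eigenvalue lies within the bound of some eigenvalue of L
      assert min(abs(mu - lams)) <= bound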

Prof. W. Kahan SUBJECT TO CHANGE: Do you have the latest version? Page 57 / 79

The Perron-Frobenius Theory of Nonnegative Matrices


Let P be any square matrix with elements all positive; we shall write “ P > O ” .
Then P has a positive eigenvector, its “Perron Vector” p > o , and a positive
eigenvalue ρ > 0 with P·p = ρ·p ; and P 's “Perron Root” ρ is a simple
eigenvalue strictly bigger than the magnitude of every other eigenvalue of P .
Proof: The simplex S := {s: s ≥ o & ||s||1 = 1} is closed, convex and bounded, and mapped into
itself continuously by function π(s) := P·s/||P·s||1 . Brouwer's Fixed-Point Theorem provides a fixed-
point p = π(p) strictly inside S , and ρ = ||P·p||1 . Let V := Diag(p) and B := V⁻¹·P·V ; then B
and P have the same eigenvalues, and ρ = ||B||∞ is the Perron Root of B with Perron eigenvector
b = [1; 1; 1; …; 1] . Jacobi's formula: d Det(λI − B)/dλ = Trace(Adj(λI − B)) > 0 @ λ = ρ because
principal submatrices of ρI − B are diagonally dominant, so ρ is a simple eigenvalue. Except ρ ,
every other eigenvalue § of B has |§| < ||B||∞ = ρ , strictly < when the eigenvector is considered.

Let C be any square matrix with |C| ≤ P elementwise. Then no eigenvalue of


C can exceed in magnitude the Perron Root ρ of P .

If, instead of P > O , we have merely P ≥ O , then Perron Root ρ need not be simple; it and/or
elements of eigenvector p may vanish, and other eigenvalues of P may match ρ in magnitude.
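
A sketch of the map π(s) := P·s/||P·s||1 from the proof, iterated numerically as a power iteration (NumPy; the example P and the function name perron are mine):

  import numpy as np

  def perron(P, iters=200):
      # power iteration for a square matrix with all positive elements;
      # returns the Perron root rho and Perron vector p with p > o and ||p||1 = 1
      s = np.full(P.shape[0], 1.0/P.shape[0])    # start strictly inside the simplex S
      for _ in range(iters):
          s = P @ s
          s = s/np.sum(s)                        # pi(s) := P·s/||P·s||1 , entries stay positive
      return np.sum(P @ s), s                    # rho = ||P·p||1 at the fixed point

  P = np.array([[4.0, 1.0, 2.0],
                [1.0, 3.0, 1.0],
                [2.0, 2.0, 5.0]])
  rho, p = perron(P)
  assert abs(rho - max(abs(np.linalg.eigvals(P)))) < 1e-10   # rho is the strictly biggest |eigenvalue|
  assert np.allclose(P @ p, rho*p)                           # and P·p = rho·p with p > o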

Prof. W. Kahan SUBJECT TO CHANGE: Do you have the latest version? Page 58 / 79

Application of Perron-Frobenius theory to


Optimal Diagonal Equilibration
(See the bottom of p. 39.)

Suppose we seek diagonal matrices Λ and V to minimize the condition number


κ∞(Λ⁻¹·C·V⁻¹) := ||(Λ⁻¹·C·V⁻¹)⁻¹||∞·||(Λ⁻¹·C·V⁻¹)||∞ .
Theorem:
min( κ∞(Λ⁻¹·C·V⁻¹) over all diagonal Λ and V ) = Perron Root of |C⁻¹|·|C|
and the minimum is achieved (at some computational cost) by setting
Λ := Diag( Perron Vector of |C|·|C⁻¹| ) and
V := Diag( Perron Vector of |C⁻¹|·|C| ) .
… due to F.L. Bauer [1963] “Optimally Scaled Matrices”, pp. 73-87
of Numerische Mathematik 5.
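
A small numerical illustration of the theorem's value (a sketch with a made-up C ; it checks only that κ∞ before scaling dwarfs the Perron root, and that random two-sided diagonal scalings never beat that root):

  import numpy as np

  def kappa_inf(A):
      return np.linalg.norm(np.linalg.inv(A), np.inf)*np.linalg.norm(A, np.inf)

  C = np.array([[1.0, 2000.0], [0.001, 3.0]])         # badly scaled example (mine)
  rho = max(abs(np.linalg.eigvals(np.abs(np.linalg.inv(C)) @ np.abs(C))))   # Perron root of |C⁻¹|·|C|
  print(kappa_inf(C), rho)                            # ≈ 4.0e6 unscaled, versus attainable minimum rho ≈ 9.9

  rng = np.random.default_rng(6)
  best = min(kappa_inf(np.diag(np.exp(rng.uniform(-8, 8, 2))) @ C
                       @ np.diag(np.exp(rng.uniform(-8, 8, 2))))
             for _ in range(10000))
  print(best)                                         # random diagonal scalings never beat rho
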
From the late 1950s Fritz Bauer and his students in Mainz and Munich were major contributors to
the applications of norms to numerical analysis. Bauer also contributed to the design of Algol 60,
an early programming language more humane than most. Later he became “Mr. Computing” to the
Bavarian government. Recently he authored a fascinating text on Cryptology (Springer). As of this
writing he is still active at age 90, having barely survived 1942 - 1945 in the Wehrmacht.

Prof. W. Kahan SUBJECT TO CHANGE: Do you have the latest version? Page 59 / 79

Part V:

Matrix Norms and


Real Symmetric Matrices' Eigenvalues

The best source about eigensystems of real symmetric and Hermitian matrices
is B.N. Parlett's book The Symmetric Eigenvalue Problem
[1998] 426 pp., SIAM, Philadelphia

Prof. W. Kahan SUBJECT TO CHANGE: Do you have the latest version? Page 60 / 79

Real Symmetric Matrices' Eigenvalues are all …

Stationary Values of Quotients of Real Quadratic Forms


• The Stationary Values (also called Critical Values) of a function ƒ(x)
are the values it takes where its derivative ƒ′(x) vanishes.
• The Stationary Points (also called Critical Points) of a function ƒ(x)
are the arguments x at which its derivative ƒ′(x) vanishes.
Instances are maxima and minima, but these are far from the only instances.
e.g., ƒ(ξ) := 3ξ⁵ − 5ξ³ takes all real values on the real ξ-axis,
takes a locally maximum value ƒ(−1) = +2 ,
takes a locally minimum value ƒ(+1) = −2 ,
and takes another stationary value ƒ(0) = 0 neither maximum nor minimum.
[Plot omitted: the graph of ƒ(ξ) = 3ξ⁵ − 5ξ³ for −1.5 ≤ ξ ≤ 1.5 , showing the local maximum at ξ = −1 , the local minimum at ξ = +1 , and the horizontal inflexion at ξ = 0 .]

Prof. W. Kahan SUBJECT TO CHANGE: Do you have the latest version? Page 61 / 79

Real Symmetric Matrices' Eigenvalues are all …


Stationary Values of Quotients of Real Quadratic Forms
• A Real Quadratic Form Φ(x) := x'·H·x , where column x represents x
in some basis, and matrix H = H' represents a symmetric bilinear operator:
H·x·y = H·y·x = y'·H·x . For a real vector space, y' and H' are transposes,
and H is the matrix of a linear map H from x 's vector space to its dual.
Abstractly, “ Φ(x+y) + Φ(x−y) ≡ 2Φ(x) + 2Φ(y) ” characterizes quadratic forms Φ . cf. p. 20.
Example: 2nd derivative in Taylor Series for scalar µ(z + x) = µ(z) + µ′(z)·x + µ″(z)·x·x/2 + …

(Complex spaces, for which y' and H' are complex conjugate transposes, and H = H' is Hermitian
instead of real symmetric, and H·x·y is complex conjugate to H·y·x , will not be treated here. Besides,
the treatment of complex spaces would differ only slightly from real.)

Typically Φ(x) is some kind of Energy: Kinetic Energy if x stands for velocities or
momenta, Elastic Energy if for infinitesimal displacements from equilibrium, etc.

How does a change of basis from B to a new basis B·C⁻¹ affect H ? cf. pp. 18-19
The column x changes to the new column C·x , the row w' to w'·C⁻¹ , and H to the new
matrix C'⁻¹·H·C⁻¹ , whence Φ(x) keeps its value: (C·x)'·(C'⁻¹·H·C⁻¹)·(C·x) = x'·H·x .
( Φ(x) need not be defined upon the vector space dual to x 's .)
The relation between H and C'⁻¹·H·C⁻¹ is called a Congruence.

Prof. W. Kahan SUBJECT TO CHANGE: Do you have the latest version? Page 62 / 79

Diagonalization of a Real Quadratic Form by Congruence


Let real quadratic form Φ(x) := x'·H·x in some chosen basis, no matter which.
Lagrange showed that Infinitely Many Congruences C'⁻¹·H·C⁻¹ = V have a diagonal V .
Let n+ count positive diagonal elements of V , n0 … zero … , n– … negative … .
Sylvester's Inertia Theorem: Every diagonal V congruent to H has
the same Inertia(H) := {n+, n0, n–} .
Some authors use the word “Signature” instead of “Inertia”, a word chosen by Sylvester.

Geometrically, n+ = max{ dimension of every Subspace on which Φ(x) > 0 for x ≠ o } ;
n– = max{ … Φ(x) < 0 for x ≠ o } ; n0 = dimension(x) − n+ − n– .
Inertia distinguishes shapes of Ellipsoids vs. Hyperboloids of one or two sheets.

Nomenclature for Φ and Symmetric H


Names                                          n+ positives   n0 zeros   n– negatives
Nonnegative Definite:  Positive Definite           ≥ 1            0            0
                       Positive Semidefinite       ≥ 1           ≥ 1           0
Indefinite                                         ≥ 1                        ≥ 1
Degenerate or Singular                                           ≥ 1
Nonpositive Definite:  Negative Semidefinite        0            ≥ 1          ≥ 1
                       Negative Definite            0             0           ≥ 1

Prof. W. Kahan SUBJECT TO CHANGE: Do you have the latest version? Page 63 / 79

Stationary Values of Quotients of Real Quadratic Forms


Given two real quadratic forms Φ(x) := x'·H·x and Ψ(x) := x'·M·x , we seek all
Stationary Values ρ of the Rayleigh Quotient ρ(x) := Φ(x)/Ψ(x)
as x runs through all nonzero vectors in a real space.

Each stationary value of ρ(x) and its stationary point x turn out to satisfy
H·x = ρ(x)·M·x and M·x ≠ o .
Hence this ρ and x are solutions of a Symmetric Generalized Eigenproblem.
If M⁻¹ exists, each such ρ is an eigenvalue of B := H·M⁻¹ . However, not all
eigenvalues ß of B need be stationary values of ρ(x) , not even if all are real.
e.g., H := [ 2 0 ; 0 0 ] , M := [ 0 1 ; 1 0 ] , x := [ ξ ; η ] , ρ(x) = ξ²/(ξη) ; B = H·M⁻¹ = [ 0 2 ; 0 0 ] , ß = 0 . No stationary ρ .

e.g., H := [ 2 0 ; 0 –2 ] , M := [ 0 1 ; 1 0 ] , x := [ ξ ; η ] , ρ(x) = ξ/η – η/ξ ; B = H·M⁻¹ = [ 0 2 ; –2 0 ] , ß = ±2ı . No stationary ρ .

Every real square matrix B = H·M⁻¹ for some symmetric H = H' and M = M' .
Therefore, without further restrictions, Symmetric Generalized Eigenproblems and
unrestricted nonsymmetric eigenproblems fall prey to the same pathologies; cf. pp. 51-56.
Consequently we shall impose further restrictions upon Φ and Ψ in what follows. …

Prof. W. Kahan SUBJECT TO CHANGE: Do you have the latest version? Page 64 / 79

Simultaneous Diagonalization by Congruence


A powerful motive to find all the stationary points x of the Rayleigh Quotient
ρ(x) := x'·H·x/x'·M·x is that, IF they are linearly independent and numerous
enough to constitute the columns of an invertible matrix C⁻¹ , they provide a
new coordinate system (basis) that transforms H and M into diagonal matrices
C'⁻¹·H·C⁻¹ and C'⁻¹·M·C⁻¹ simultaneously by the same congruence.
The eigenvalues (stationary values) ρ are the elementwise quotients of the diagonal of C'⁻¹·H·C⁻¹ by the diagonal of C'⁻¹·M·C⁻¹ .
They are often identified with squares of resonant frequencies of vibration.
The columns of C⁻¹ are often identified with “Natural Modes” of vibration.

What conditions suffice for simultaneous diagonalization by congruence?


• John Milnor’s Criterion: If x = o whenever x'·H·x = x'·M·x = 0 , and if the
dimension of vectors x is n ≠ 2 , then some linear combination of
real symmetric H and M is positive definite. If n = 2 ? cf. 2d. e.g. on p. 64.
(The case n = 2 is unexceptional when x is complex and H and M Hermitian.)

• If some linear combination of real symmetric H and M is positive definite,


H and M can be diagonalized simultaneously by a congruence C⁻¹ …

Prof. W. Kahan SUBJECT TO CHANGE: Do you have the latest version? Page 65 / 79

Suppose some linear combination, say η·H + µ·M , of real symmetric H and M
is positive definite, so some congruence diagonalizes H and M simultaneously.
How then may such a congruence C⁻¹ be found?
Choose a second linear combination, say α·H – β·M , independent of the first
so α·µ + β·η ≠ 0 . A good choice would have ||α·H – β·M|| << || |α·H| + |β·M| ||
by virtue of substantial cancellation, but this may not be feasible. Thus a given
Symmetric Generalized Eigenproblem “ H·x = ρ·M·x ” is converted into a
Definite Symmetric Generalized Eigenproblem “ Ĥ·y = ρ̂·M̂·y ” in which
Ĥ := α·H – β·M = Ĥ' and M̂ := η·H + µ·M = M̂' is positive definite.
Their eigenvalues are related: ρ̂ = (α·ρ – β)/(η·ρ + µ) ; ρ = (β + µ·ρ̂)/(α – η·ρ̂) .
A way to compute them and a desired congruence: Cholesky factorize M̂ = U'·U
and compute an eigendecomposition of W := U'⁻¹·Ĥ·U⁻¹ = W' = Q·Ω·Q' with
an orthogonal Q' = Q⁻¹ and diagonal Ω of eigenvalues ρ̂ . Now C⁻¹ := U⁻¹·Q .
Thus does MATLAB’s eig . Since such eigenproblems can be pathological, their error-analysis isn’t
yet tidy enough for a succinct and memorable overview. See instead Ren-Cang Li’s §15.4 in L.
Hogben’s Handbook … cited under Further Reading, and my …/Math128/GnSymEig.pdf> . No
comparable numerical scheme is known to find an Indefinite Symmetric Generalized Eigenproblem’s
congruence when it exists. What follows concerns problems that are easier and better understood. …
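
A sketch of that recipe in NumPy, assuming the pencil is already definite, so the H and M below play the roles of Ĥ and M̂ above:

  import numpy as np

  H = np.array([[2.0, 1.0], [1.0, -1.0]])       # H = H'
  M = np.array([[4.0, 1.0], [1.0,  3.0]])       # M = M' , positive definite

  U = np.linalg.cholesky(M).T                   # M = U'·U , U upper-triangular
  Uinv = np.linalg.inv(U)
  W = Uinv.T @ H @ Uinv                         # W = U'⁻¹·H·U⁻¹ = W'
  rhos, Q = np.linalg.eigh(W)                   # W = Q·Ω·Q' with orthogonal Q
  Cinv = Uinv @ Q                               # the congruence C⁻¹ := U⁻¹·Q

  assert np.allclose(Cinv.T @ H @ Cinv, np.diag(rhos))   # diagonalizes H ...
  assert np.allclose(Cinv.T @ M @ Cinv, np.eye(2))       # ... and M simultaneously
  assert np.allclose(np.sort(np.linalg.eigvals(np.linalg.inv(M) @ H).real), rhos)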

Prof. W. Kahan SUBJECT TO CHANGE: Do you have the latest version? Page 66 / 79

The (ordinary) Real Symmetric Eigenproblem


… is the special case M = I of the Definite Symmetric Generalized Eigenproblem:
Given a real symmetric H = H' , we seek its eigenvalues θ and eigenvectors q
satisfying H·q = θ·q . All of them constitute a diagonal Θ and an orthogonal
Q = Q'–1 satisfying Q'·H·Q = Θ . The eigenvalues can be ordered in two ways:
• Ascending: θ1 ≤ θ2 ≤ θ3 ≤ … ≤ θn . • Descending: θ1 ≥ θ2 ≥ θ3 ≥ … ≥ θn .
These two orderings are relevant to the identification of eigenvalues as stationary
values of the Rayleigh Quotient ρ(x) := x'·H·x/x'·x via the …
Courant-Fischer Minimax Principle:
Ascending order, θk = min{ max{ ρ(x) : nonzero x in S } : subspaces S of dimension k } .
Descending order, θk = max{ min{ ρ(x) : nonzero x in S } : subspaces S of dimension k } .

Let a perturbation ∆H = ∆H' that changes eigenvalues of H + ∆H to Θ + ∆Θ


have Inertia(∆H) = { π := n+ , ζ := n0 , ν := n– } . (See p. 63.) Then ordered …
Ascending, θk + ∆θk ≤ θk+π for 1 ≤ k ≤ n–π ; θm + ∆θm ≥ θm–ν for ν < m ≤ n .
Descending, θi + ∆θi ≥ θi+ν for 1 ≤ i ≤ n–ν ; θj + ∆θj ≤ θj–π for π < j ≤ n .
… useful mainly when π and/or ν is small like 0, 1 or 2 .

Prof. W. Kahan SUBJECT TO CHANGE: Do you have the latest version? Page 67 / 79

Absolute and Relative Perturbations


Let perturbation ∆H = ∆H' change eigenvalues of H + ∆H to Θ + ∆Θ with the
same ordering as Θ . Then ||∆Θ|| ≤ ||∆H|| for every Orthogonally Invariant
norm ||…|| . (See p. 32.) The norms usually chosen are ||…||2 and ||…||F .

The foregoing inequality ||∆Θ|| ≤ ||∆H|| is satisfactory when H is the matrix of a


symmetric linear operator from a Euclidean space to itself. More generally if H
is the matrix of a symmetric linear operator from a normed vector space to its
dual, no orthogonally invariant norm need be Equitable; different perturbations
with the same norm ||∆H|| may differ utterly in significance and effect.
Graded matrices; cf. p. 30. Better coordinates; cf. p. 12. Other norms; cf. Li & Mathias [1999]

Ostrowski’s Refinement of Sylvester’s Inertia Theorem: (See p. 63)


If the eigenvalues Θ of H change to eigenvalues Θ̂ of Ĥ := C'⁻¹·H·C⁻¹ with
the same ordering, Inertia(Θ̂) = Inertia(Θ) and every θj ≠ 0 in Θ has
1/||C'·C||2 ≤ θ̂j/θj ≤ ||(C'·C)⁻¹||2 .
Typically this is applied with C close to I , so Ĥ is Relatively close to H .
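
A quick numerical check of Ostrowski's bounds (a sketch; H and C are arbitrary examples):

  import numpy as np

  rng = np.random.default_rng(5)
  n = 5
  H = rng.standard_normal((n, n));  H = (H + H.T)/2
  C = np.eye(n) + 0.1*rng.standard_normal((n, n))            # C close to I

  Cinv = np.linalg.inv(C)
  theta     = np.linalg.eigvalsh(H)                          # ascending order
  theta_hat = np.linalg.eigvalsh(Cinv.T @ H @ Cinv)          # same ordering, same inertia

  lo = 1.0/np.linalg.norm(C.T @ C, 2)
  hi = np.linalg.norm(np.linalg.inv(C.T @ C), 2)
  ratios = theta_hat/theta                                   # no θj is 0 here (almost surely)
  assert np.all((lo <= ratios) & (ratios <= hi))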

Prof. W. Kahan SUBJECT TO CHANGE: Do you have the latest version? Page 68 / 79

Partial Eigensystems and Spectral Gaps


The Spectrum of n-by-n real symmetric H = H' is the Multiset of its n eigenvalues θj
(some of which may be repeated) which we assume to be ordered Descending, say. Let
E (H) denote this so ordered spectrum { θ1 ≥ θ2 ≥ θ3 ≥ … ≥ θn } of H .

Especially when dimension n is big, occasions arise to compute only those eigenvalues
of H in some interval separated from the rest of E (H) by a Gap or two. To this end,
suppose n-by-m matrix F has m < n linearly independent columns approximating
(perhaps poorly) m eigenvectors of H , and suppose m-by-m real symmetric matrix M
(not necessarily diagonal) has a spectrum E (M) = {µ1 ≥ µ2 ≥ µ3 ≥ … ≥ µm} thought to
approximate part of E (H) . Let Residual R := H·F – F·M and let β := ||F⁺||2·||R||2 .

[Block picture omitted: the computable residual R := H·F − F·M drawn with rectangular blocks for H , F , F and M .]
For ||F⁺||2 see p. 44.

Theorem: Among n eigenvalues θj of H there are m each of which lies in a different


(though perhaps overlapping) interval µi – β ≤ θ ≤ µi + β for i = 1, 2, 3, …, m .
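
A numerical illustration of the theorem (a sketch; H , F and the choice M := F⁺·H·F below are made up for the demonstration, and F⁺ denotes the pseudo-inverse):

  import numpy as np

  rng = np.random.default_rng(3)
  n, m = 8, 3
  H = rng.standard_normal((n, n));  H = (H + H.T)/2            # real symmetric
  theta, X = np.linalg.eigh(H)                                 # its eigenvalues θj and eigenvectors

  F = X[:, :m] + 0.05*rng.standard_normal((n, m))              # rough estimates of m eigenvectors
  M = np.linalg.lstsq(F, H @ F, rcond=None)[0]                 # one plausible M , namely F⁺·H·F
  M = (M + M.T)/2                                              # symmetrized, as the theorem requires
  R = H @ F - F @ M                                            # residual
  beta = np.linalg.norm(np.linalg.pinv(F), 2)*np.linalg.norm(R, 2)    # β := ||F⁺||2·||R||2

  for mu in np.linalg.eigvalsh(M):
      assert min(abs(mu - theta)) <= beta                      # each µi lies within β of some θj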

Eigenvectors of H are orthogonal. If their estimates in F are too far from orthonormal,
||F⁺||2 may be excessively big. A remedy for this is Reorthogonalization: …

Prof. W. Kahan SUBJECT TO CHANGE: Do you have the latest version? Page 69 / 79

Reorthogonalization:
One way replaces F by Q from the QR factorization F = Q·U with upper-triangular U
and Q'·Q = I . Other ways, one sometimes faster, one closer to F , are explored in
<www.eecs.berkeley.edu/~wkahan/Math128/NearestQ.pdf> .

After a (closely) orthonormal n-by-m matrix Q replaces F , the new residual becomes
R := H·Q – Q·M , and β := ||R||2 . Then, as before, the m eigenvalues µi of E (M)
estimate m of the n eigenvalues θj in E (H) thus:
Among n eigenvalues θj in E (H) there are m each of which lies in a different
(possibly overlapping) interval wherein | µi – θ | ≤ β for i = 1, 2, 3, …, m .

Now R can have its norm β := ||R||2 minimized by the choice M := Q'·H·Q . Then those
m eigenvalues θj fall into much narrower intervals when β is much tinier than Spectral
Gaps between the rest of E (H) and those m or their estimates µi . To describe these
Spectral Gaps we perform a (notional, not necessarily computed) change of coordinates:

Let n-by-n orthogonal [Q, Q̂] be obtained from any n–m orthonormal columns Q̂
orthogonal to Q , so n-by-n [Q, Q̂]'·[Q, Q̂] = I . This [Q, Q̂] provides a new ortho-
normal coordinate system in which the linear operator formerly represented by H is now
represented by [Q, Q̂]'·H·[Q, Q̂] . Of course, spectrum E ([Q, Q̂]'·H·[Q, Q̂]) = E (H) .

Prof. W. Kahan SUBJECT TO CHANGE: Do you have the latest version? Page 70 / 79

To define Spectral Gaps, let …
[Q, Q̂]'·H·[Q, Q̂] =: [ M B ; B' W ] and set Y := [ M O ; O' W ] . Now [ O ; B' ] represents R := H·Q – Q·M in
the new coordinates; β := ||R||2 = ||B||2 . Usually only M = Q'·H·Q and R are computed.
Let names for the relevant ordered spectra be …
E (H) = E ([Q, Q̂]'·H·[Q, Q̂]) = { θ1 ≥ θ2 ≥ θ3 ≥ … ≥ θn } which we wish to estimate.
E (M) = { µ1 ≥ µ2 ≥ µ3 ≥ … ≥ µm } which are our m computed estimates.
E (W) = { ω1 ≥ ω2 ≥ ω3 ≥ … ≥ ωn-m } for which rough estimates will be needed.
E (Y) = { η1 ≥ η2 ≥ η3 ≥ … ≥ ηn } = E (M) ∪ E (W) as multi-sets. |θi –ηi| ≤ β .

For i = 1, 2, 3, …, n define the Gaps γi between spectra E (M) and E (W) thus:
If ηi ∈ E (M) then γi := minj |ηi – ωj| else if ηi ∈ E (W) then γi := minj |ηi – µj| .
Let Gap γ := mini γi . Usually E (M) and E (W) are disjoint, and then γ > 0 .
Then Chi-Kwong Li & Ren-Cang Li [2005] proved VERY GENERALLY that every
|θi – ηi| ≤ β²/( γi/2 + √( β² + γi²/4 ) ) ≤ β²/( γ/2 + √( β² + γ²/4 ) ) ≤ min{ β , β²/γ } .
When β << γ these inequalities estimate that part of E (H) approximated by E (M) far
more tightly than β because somehow the rest of E (H) is known to be farther away.
SOMEHOW ?

Prof. W. Kahan SUBJECT TO CHANGE: Do you have the latest version? Page 71 / 79

Error-bounds on three previous pages bound Absolute errors θi –ηi . What about
Relative Errors log(θi/ηi) ?
Again, n-by-m matrix Q has orthonormal columns, M := Q'·H·Q , R := H·Q – Q·M .
Now, provided ||R·M⁻¹||2 < ψ < 1 , the m eigenvalues µi in E (M) estimate m of the
n eigenvalues θj in E (H) thus: cf. p. 70
Among n eigenvalues θj in E (H) there are m each of which lies in a different
(possibly overlapping) interval wherein | log(θ/µi) | < ψ for i = 1, 2, 3, …, m .
These bounds are advantageous only when ||R·M⁻¹|| is a lot smaller than ||R||·||M⁻¹|| .

Deflation replaces H = H' = [ M B ; B' W ] by Y = [ M O ; O' W ] when ||B||2 is deemed small enough.
Its motivation is to replace a big eigenproblem by two smaller ones, maybe not both much smaller.
Replacing E (H) by E (Y) incurs Absolute errors bounded by ||B||2 , or by ||B||2²/γ if
the gap γ is known to be big enough. Relative errors are bounded by ψ < 1 whenever
both ||M⁻¹·B||2 < ψ and ||B·W⁻¹||2 < ψ .
Only in special situations can these Relative error-bounds be worth what they cost to compute.

Prof. W. Kahan SUBJECT TO CHANGE: Do you have the latest version? Page 72 / 79

Relative Error Bounds for Deflated Singular Value Problems


Let triangular S := [ D E ; O' F ] and Z := [ D O ; O' F ] have singular value multisets respectively
S (S) = { σ1 ≥ σ2 ≥ … ≥ σn } and S (Z) = { ζ1 ≥ ζ2 ≥ … ≥ ζn } = S (D) ∪ S (F)
wherein S (D) = { δ1 ≥ δ2 ≥ … ≥ δm } and S (F) = { φ1 ≥ φ2 ≥ … ≥ φn–m } .

Z comes from S via Deflation. Every Absolute Error |σj – ζj| ≤ ||E|| .
What about Relative Errors log(σj/ζj) ?
If either ||D⁻¹·E|| < 2ψ < 1 or ||E·F⁻¹|| < 2ψ < 1 then every | log(σj/ζj)| < ψ .
It persists with 0/0 := 1 even if some ζj = 0 so long as either ||D⁻¹·E|| or ||E·F⁻¹|| exists.

Example:
Let n-by-n S := bidiag[ s s … s s e ; 1 1 … 1 1 f ] = [ D e ; o' f ] , bidiagonal with
diagonal (1, 1, …, 1, f) and superdiagonal (s, s, …, s, e) , so the pair (s above 1) is
missing from only the first and last columns; and s > f >> 1 > e > 0 .
σ1 ≈ s + 1 >> σn ≈ (s² – 1)/√(s²ⁿ – n·s² + n – 1) for s > 3 and n > 3 .
Deleting e causes relative error < ψ if e < 2ψ·f though this e can exceed σn hugely.

For relative error-bounds improved substantially by a knowledge of gaps like γ , see Kahan [2012"].

Prof. W. Kahan SUBJECT TO CHANGE: Do you have the latest version? Page 73 / 79

Spectral Gaps, Invariant Subspaces, and Angles


Although the n eigenvectors of n-by-n H = H' can be chosen to be orthonormal, those
belonging to eigenvalues repeated or too tightly clustered are partially indeterminate, at
least numerically. Instead the Invariant Subspace spanned by the cluster’s eigenvectors
is determined accurately whenever the cluster is separated from the rest of the spectrum by
a sufficiently wide gap. The Angles between a subspace and an approximation to it are
described best in Davis & Kahan [1969]; the best way to compute them is on p. 7 of
<www.eecs.berkeley.edu/~wkahan/Math128/NearestQ.pdf> .

Let the columns of n-by-m Q be an orthonormal (Q'·Q = I) basis for an approximation


to the invariant subspace belonging to a tight cluster of m eigenvalues of H . As before,
[Q, Q̂]'·H·[Q, Q̂] = [ M B ; B' W ] ; Y := [ M O ; O' W ] ; R := H·Q – Q·M = Q̂·B' so β := ||R||2 = ||B||2 .
E (H) = E ([Q, Q̂]'·H·[Q, Q̂]) = { θ1 ≥ θ2 ≥ θ3 ≥ … ≥ θn } . Recall M := Q'·H·Q .
E (M) = { µ1 ≥ µ2 ≥ µ3 ≥ … ≥ µm } approximates a tight cluster in E (H) .
E (W) = { ω1 ≥ ω2 ≥ ω3 ≥ … ≥ ωn-m } approximates the rest of E (H) .
Gap γ := mini γi = mini,j |µi – ωj| . This may be of little use unless γ >> 2β .
Let ∠ be the biggest angle between Range(Q) and the cluster's invariant subspace;
then ∠ ≤ arcsin(2β/γ)/2 if γ > 2β .
When E (W) is all on just one side of E (M) , then ∠ ≤ arctan(2β/γ)/2 ≤ β/γ .
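
A sketch of measuring that angle via the singular values of Q'·V , where V is an orthonormal basis for the true invariant subspace; this simple cosine-based formula is adequate unless the angle is tiny (the notes point to a better way on p. 7 of NearestQ.pdf):

  import numpy as np

  def biggest_angle(Q, V):
      # biggest principal angle (radians) between Range(Q) and Range(V);
      # Q and V must have orthonormal columns; cosines of the principal
      # angles are the singular values of Q'·V
      c = np.linalg.svd(Q.T @ V, compute_uv=False)
      return float(np.arccos(min(c.min(), 1.0)))

  # tiny demonstration: an invariant subspace of a symmetric H versus a perturbed copy of it
  rng = np.random.default_rng(4)
  H = rng.standard_normal((6, 6));  H = (H + H.T)/2
  V = np.linalg.eigh(H)[1][:, :2]                              # 2 exact eigenvectors of H
  Q = np.linalg.qr(V + 0.01*rng.standard_normal((6, 2)))[0]    # perturbed, re-orthonormalized
  print(biggest_angle(Q, V))                                   # a small angle, comparable to the perturbation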

Prof. W. Kahan SUBJECT TO CHANGE: Do you have the latest version? Page 74 / 79

Miscellany:
• Elaborations of Perturbation Theory: Ren-Cang Li’s §15 in Hogben [2007]
• Compute Eigenvalues to High Relative Accuracy: Z. Drmac’s §46 in Hogben
… of Graded Real Symmetric Matrices, Positive Definite Matrices (only ?)
… of Specially Structured Matrices: see his citations of P. Koev et al.
Another such example: pp. 12-13 of <…/HilbMats.pdf>

• Computational Methods and Perturbation Theories of Singular Values of L :
… same as for the eigensystem of [ O Lᵀ ; L O ] , amended by some simplifications;
see R. Mathias' §17 in Hogben [2007]

Regions in the Complex Plane associated with L acting on a complex space:


• ε-Pseudo-Spectrum of L : { ζ for which ||(ζ·I – L)⁻¹|| > 1/ε } . Usually ||…||2
… includes spectrum of L ; see M. Embree’s §16 in Hogben [2007]
• Field of Values of, or Numerical Range of L : { w'·L·x/w'·x } as w' and x
run through all nonzero pairs Dual (p. 17) with respect to the space’s norm.
… includes spectrum of L . Convex set for ||…||2 ; see C.K. Li's §18 in Hogben

Prof. W. Kahan SUBJECT TO CHANGE: Do you have the latest version? Page 75 / 79

Citations and Further Reading


A huge encyclopedic survey of facts citing the literature for their proofs is Handbook of
Linear Algebra ed. by Leslie Hogben [2007] 1504 pp., Chapman & Hall/CRC.

For a treatment of finite-dimensional linear algebra that prepares the diligent reader for
infinite dimensions, try Finite-Dimensional Linear Analysis, a Systematic Presentation
in Problem Form by I.M. Glazman & Ju.I. Ljubic, translated and edited by G.P. Barker
& G. Kuerti [1974] MIT Press. This huge text’s exposition consists of about 1500
problems with hints but none accompanied by a full solution. You must do it yourself.

Normed Linear Spaces 3rd ed. by M.M. Day [1973], Springer-Verlag, is a short (211
pages) brutally compressed overview of the situation in infinite-dimensional spaces. The
last chapter is a nine-page reader’s guide to the literature up to 1972.

For more about unitarily invariant Cross-Norms and Symmetric Gauge Functions
applicable to linear operators upon Hilbert spaces, see ch. V of Norm Ideals of
Completely Continuous Operators by R. Schatten [1960], Springer-Verlag.

B.N. Parlett's book The Symmetric Eigenvalue Problem [1998] 426 pp., SIAM,
Philadelphia, is the best source about the properties and computations of eigensystems of
real symmetric and Hermitian matrices.
Citations & Further Reading continues …

Prof. W. Kahan SUBJECT TO CHANGE: Do you have the latest version? Page 76 / 79

… Citations & Further Reading continued …

N.J. Higham [2002] Accuracy & Stability of Numerical Algorithms 2nd ed., ~700 pp.,
SIAM, Philadelphia, is the best text treating error-analysis of roundoff.
Often better than bounds upon norms of errors are elementwise bounds mentioned herein
at the bottom of p. 30. For more about them see Higham’s “A survey of componentwise
perturbation theory in numerical linear algebra” pp. 49-77 in …
W. Gautschi (ed.) Mathematics of Computation 1943-1993, A half century of
computational mathematics; Proc. of the Math. of Comp. 50th Anniv. Symposium,
9 - 13 Aug. 1993 in Vancouver B.C., American Math Soc.

Chi-Kwong Li & Roy Mathias [1999] “The Lidskii-Mirsky-Wielandt Theorem –


additive and multiplicative versions” pp. 377-413 of Numerische Mathematik 81, is a
superb survey with elegant proofs of matrix norms' relations with Hermitian matrices.

Chandler Davis & W.M. Kahan [1969] “Some New Bounds on Perturbations of
Subspaces” pp. 863-868 in Bulletin Amer. Math. Soc. 75 #4. This describes bounds upon
angles between subspaces in a readable way, far more so than the most often cited …
Davis & Kahan [1970] “The Rotation of Eigenvectors by a Perturbation. III” pp. 1-46 in
SIAM J. Numer. Anal 7 #1. Used herein on p. 74.
Citations & Further Reading continues …

Prof. W. Kahan SUBJECT TO CHANGE: Do you have the latest version? Page 77 / 79

… Citations & Further Reading continued …

Chi-Kwong Li & Ren-Cang Li [2005] “A note on eigenvalues of perturbed Hermitian


matrices” pp. 183-190 in Linear Algebra and its Applications 395 , cited herein on p. 71.

“Deflations Preserving Relative Accuracy” by W. Kahan [2012"] was posted recently at


<www.eecs.berkeley.edu/~wkahan/4June12.pdf> and fully at <…/ma221/Deflate.pdf> .

Mentioned at the bottom of p. 56 is the possibly heightened sensitivity of the eigenvalues


of an Hermitian matrix to non-Hermitian perturbations. For more about this see …
“Spectra of Operators with Fixed Imaginary Parts” by Andrzej Pokrzywa [1981], pp.
359-364 in Proc. Amer. Math. Soc. 81 #3, and …
“Arbitrary Perturbations of Hermitian Matrices” by Arnold Schönhage [1979], pp. 143-9
in Linear Algebra and its Applications 24

Prof. W. Kahan SUBJECT TO CHANGE: Do you have the latest version? Page 78 / 79

Epilogue
I learned what little I know about norms etc. over half a century ago, and later turned in
a different direction. Consult appropriate Math. Dept. professors for this century’s
understandings of normed and more general metric spaces.

As an error-analyst, I have chosen mostly applications to error-analyses to illustrate how a


norm can be used. Scattered among them are attempts to awaken an awareness of how
important, despite its difficulty, is choosing an appropriate thing to gauge with a norm.
Choosing an Equitable norm (pp. 12, 30, 39, 48, 52, 68) raised that issue. However, …
Many a situation cannot be comprehended in a single number.
These situations abound in Scientific and Engineering computations, and in almost all
human endeavors; see a few surprising military examples in …
J.G. Roche & B.D. Watts [1991] “Choosing Analytic Measures” pp. 165-209 in J. Strategic Studies
14 #2. Disregard their mathematically naive and irrelevant uses of “linear” and “chaos”.

Still, when a decision is needed, it often amounts to distilling one number out of many:
Pass a test? Accept a candidate? Choose a purchase? Launch an enterprise? …
Such unavoidable decisions must occasionally be mistaken or, less often, very lucky.

Prof. W. Kahan SUBJECT TO CHANGE: Do you have the latest version? Page 79 / 79
