Convexity properties of the condition number ∗
Carlos Beltrán †
Jean-Pierre Dedieu ‡
Gregorio Malajovich §
Mike Shub ¶
June 30, 2009
Abstract
We define in the space of $n\times m$ matrices of rank $n$, $n\le m$, the condition Riemannian structure as follows: for a given matrix $A$, the tangent space at $A$ is equipped with the Hermitian inner product obtained by multiplying the usual Frobenius inner product by the inverse of the square of the smallest singular value of $A$, denoted $\sigma_n(A)$. When this smallest singular value has multiplicity 1, the function $A\mapsto\log(\sigma_n(A)^{-2})$ is a convex function with respect to the condition Riemannian structure; that is, $t\mapsto\log(\sigma_n(A(t))^{-2})$ is convex, in the usual sense, for any geodesic $A(t)$. In a more abstract setting, a function $\alpha$ defined on a Riemannian manifold $(\mathcal M,\langle\cdot,\cdot\rangle)$ is said to be self-convex when $\log\alpha(\gamma(t))$ is convex for any geodesic $\gamma$ in $(\mathcal M,\alpha\langle\cdot,\cdot\rangle)$. Necessary and sufficient conditions for self-convexity are given when $\alpha$ is $C^2$. When $\alpha(x)=d(x,\mathcal N)^{-2}$, where $d(x,\mathcal N)$ is the distance from $x$ to a $C^2$ submanifold $\mathcal N\subset\mathbb R^j$, we prove that $\alpha$ is self-convex when restricted to the largest open set of points $x$ for which there is a unique closest point in $\mathcal N$ to $x$. We also show, using this more general notion, that the square of the condition number $\|A\|_F/\sigma_n(A)$ is self-convex in projective space and on the solution variety.

∗ Mathematics Subject Classification (MSC2000): 65F35 (Primary), 15A12 (Secondary).

† C. Beltrán, Departamento de Matemáticas, Estadística y Computación, Universidad de Cantabria, Santander, España (beltranc@gmail.com). CB was supported by MTM200762799 and by a Spanish postdoctoral grant.

‡ J.-P. Dedieu, Institut de Mathématiques, Université Paul Sabatier, 31062 Toulouse cedex 09, France (jean-pierre.dedieu@math.univ-toulouse.fr). J.-P. Dedieu was supported by the ANR Gecko.

§ G. Malajovich, Departamento de Matemática Aplicada, Universidade Federal do Rio de Janeiro, Caixa Postal 68530, CEP 21945-970, Rio de Janeiro, RJ, Brazil (gregorio@ufrj.br). He was partially supported by CNPq grants 303565/2007-1 and 470031/2007-7, by FAPERJ (Fundação Carlos Chagas de Amparo à Pesquisa do Estado do Rio de Janeiro) and by the Brazil-France agreement of cooperation in Mathematics.

¶ M. Shub, Department of Mathematics, University of Toronto, Toronto, Ontario, Canada M5S 2E4 (shub.michael@gmail.com). CB and MS were supported by an NSERC Discovery Grant.
1 Introduction
Let two integers $1\le n\le m$ be given and let us consider the space of matrices $\mathbb K^{n\times m}$, $\mathbb K=\mathbb R$ or $\mathbb C$, equipped with the Frobenius Hermitian product
$$\langle M,N\rangle_F=\operatorname{trace}(N^*M)=\sum_{i,j}m_{ij}\overline{n_{ij}}.$$
Given an absolutely continuous path $A(t)$, $a\le t\le b$, its length is given by the integral
$$L=\int_a^b\left\|\frac{dA(t)}{dt}\right\|_F dt,$$
and the shortest path connecting $A(a)$ to $A(b)$ is the segment connecting them. Consider now the problem of connecting these two matrices with the shortest possible path while staying, as much as possible, away from the set of “singular matrices”, that is, the matrices with non-maximal rank.
The singular values of a matrix $A\in\mathbb K^{n\times m}$ are denoted in non-increasing order:
$$\sigma_1(A)\ge\dots\ge\sigma_{n-1}(A)\ge\sigma_n(A)\ge0.$$
We denote by $GL_{n,m}$ the space of matrices $A\in\mathbb K^{n\times m}$ with maximal rank: $\operatorname{rank}A=n$, that is, $\sigma_n(A)>0$, so that the set of singular matrices is
$$\mathcal N=\mathbb K^{n\times m}\setminus GL_{n,m}=\{A\in\mathbb K^{n\times m}:\sigma_n(A)=0\}.$$
Since the smallest singular value of a matrix is equal to its distance from the set of singular matrices,
$$\sigma_n(A)=d_F(A,\mathcal N)=\min_{S\in\mathcal N}\|A-S\|_F,$$
given an absolutely continuous path $A(t)$, $a\le t\le b$, we define its “condition length” by the integral
$$L_\kappa=\int_a^b\left\|\frac{dA(t)}{dt}\right\|_F\sigma_n(A(t))^{-1}dt.$$
A good compromise between length and distance to $\mathcal N$ is obtained by minimizing $L_\kappa$. We call “minimizing condition geodesic” an absolutely continuous path, parametrized by arc length, which minimizes $L_\kappa$ in the set of absolutely continuous paths with given endpoints, and we call condition distance $d_\kappa(A,B)$ between two matrices the length $L_\kappa$ of a minimizing condition geodesic with endpoints $A$ and $B$, if any.
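To make the definition of $L_\kappa$ concrete, here is a minimal numerical sketch restricted to real $2\times2$ matrices. It assumes a smooth path, approximates $dA/dt$ by central differences and the integral by the midpoint rule, and computes $\sigma_2$ from the closed form for the singular values of a $2\times2$ matrix; the helper names `sigma_min_2x2`, `frobenius_norm` and `condition_length` are hypothetical, introduced only for this illustration.

```python
import math


def sigma_min_2x2(A):
    """Smallest singular value of a real 2x2 matrix A = [[a, b], [c, d]].

    sigma_1^2 is the larger eigenvalue of A A^T (closed form for 2x2), and
    sigma_2 = |det A| / sigma_1, which avoids cancellation when sigma_2 is tiny.
    """
    (a, b), (c, d) = A
    s = a * a + b * b + c * c + d * d          # ||A||_F^2 = sigma_1^2 + sigma_2^2
    det = a * d - b * c                         # det A = +/- sigma_1 * sigma_2
    if s == 0.0:
        return 0.0
    sigma1_sq = 0.5 * (s + math.sqrt(max(s * s - 4.0 * det * det, 0.0)))
    return abs(det) / math.sqrt(sigma1_sq)


def frobenius_norm(A):
    return math.sqrt(sum(x * x for row in A for x in row))


def condition_length(path, a, b, steps=2000, h=1e-6):
    """Midpoint-rule approximation of L_kappa = int_a^b ||A'(t)||_F / sigma_2(A(t)) dt.

    `path` maps t to a 2x2 matrix (list of rows); A'(t) is a central difference.
    """
    dt = (b - a) / steps
    total = 0.0
    for i in range(steps):
        t = a + (i + 0.5) * dt
        plus, minus = path(t + h), path(t - h)
        velocity = [[(p - m) / (2 * h) for p, m in zip(rp, rm)] for rp, rm in zip(plus, minus)]
        total += frobenius_norm(velocity) / sigma_min_2x2(path(t)) * dt
    return total


# A(t) = t * I on [1, 2]: ||A'(t)||_F = sqrt(2) and sigma_2(A(t)) = t,
# so the exact condition length is sqrt(2) * log(2).
def scaled_identity(t):
    return [[t, 0.0], [0.0, t]]


L = condition_length(scaled_identity, 1.0, 2.0)
print(L, math.sqrt(2) * math.log(2))   # the two values agree to roughly 1e-7
```

The test path is chosen because its condition length is known in closed form. Any other smooth path of invertible $2\times2$ matrices can be passed as `path`.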
In this paper our objective is to investigate the properties of the smallest singular value $\sigma_n(A(t))$ along a condition geodesic. Our main result says that the map $t\mapsto\log(\sigma_n(A(t))^{-1})$ is convex. Thus $\log\sigma_n(A(t))$ is concave, so $\sigma_n(A(t))$ is log-concave and, in particular, its minimum value along the path is reached at one of the endpoints.
Note that a similar property holds in hyperbolic geometry, where instead of $\mathbb K^{n\times m}$ we take $\mathbb R^{n-1}\times\,]0,\infty[$, instead of $\mathcal N$ we have $\mathbb R^{n-1}\times\{0\}$, and the length of a path $a(t)=(a_1(t),\dots,a_n(t))$ is defined by the integral
$$\int\left\|\frac{da(t)}{dt}\right\|a_n(t)^{-1}dt.$$
Geodesics in that case are arcs of circles centered at $\mathbb R^{n-1}\times\{0\}$ or segments of vertical lines, and $\log(a_n(t)^{-1})$ is convex along such paths.
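The hyperbolic statement can be checked numerically in the upper half-plane ($n=2$). This sketch assumes the standard unit-speed parametrization $x(t)=c+r\tanh t$, $y(t)=r/\cosh t$ of the geodesic on the semicircle of radius $r$ centred at $(c,0)$. That parametrization is a textbook fact of hyperbolic geometry, not something taken from this paper, and the values of $c$ and $r$ are arbitrary. Along this geodesic $\log(1/y(t))=\log\cosh t-\log r$. The code checks that the hyperbolic speed is 1 and that the second differences of $\log(1/y(t))$ are positive.

```python
import math

# Hypothetical parameters of the semicircle: centre (C, 0), radius R.
C, R = 0.5, 2.0


def geodesic(t):
    """Unit-speed hyperbolic geodesic on the semicircle of radius R centred at (C, 0)."""
    return C + R * math.tanh(t), R / math.cosh(t)


def hyperbolic_speed(t, h=1e-6):
    """||da/dt|| / a_2(t), the integrand of the hyperbolic length, by central differences."""
    (x_plus, y_plus), (x_minus, y_minus) = geodesic(t + h), geodesic(t - h)
    return math.hypot(x_plus - x_minus, y_plus - y_minus) / (2 * h) / geodesic(t)[1]


def log_inverse_height(t):
    """log(a_2(t)^{-1}); analytically equal to log cosh t - log R."""
    return -math.log(geodesic(t)[1])


step = 0.05
ts = [-3.0 + step * k for k in range(121)]
speeds = [hyperbolic_speed(t) for t in ts]
second_diffs = [
    (log_inverse_height(t + step) - 2 * log_inverse_height(t) + log_inverse_height(t - step)) / step**2
    for t in ts
]
print(min(speeds), max(speeds))   # both very close to 1
print(min(second_diffs) > 0)      # True: log(1/a_2) is convex along the geodesic
```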
The approach used here to prove our theorems is heavily based on Riemannian geometry. We define on $GL_{n,m}$ the following Riemannian structure:
$$\langle M,N\rangle_{\kappa,A}=\sigma_n(A)^{-2}\,\mathrm{Re}\,\langle M,N\rangle_F,$$
where $M,N\in\mathbb K^{n\times m}$ and $A\in GL_{n,m}$. The minimizing condition geodesics defined previously are clearly geodesics in $GL_{n,m}$ for this Riemannian structure, so that we may use the toolbox of Riemannian geometry. In fact things are not so simple: the smallest singular value $\sigma_n(A)$ is a locally Lipschitz map in $GL_{n,m}$, and it is smooth on the open subset
$$GL^{>}_{n,m}=\{A\in GL_{n,m}:\sigma_{n-1}(A)>\sigma_n(A)\},$$
that is, when the smallest singular value of $A$ is simple. On the open subset $GL^{>}_{n,m}$ the metric $\langle\cdot,\cdot\rangle_\kappa$ defines a smooth Riemannian structure, and we call “condition geodesics” the geodesics related to this structure. Such a path is not necessarily a minimizing geodesic. Our first main theorem establishes a remarkable property of the condition Riemannian structure:
Theorem 1. $\sigma_n^{-2}$ is logarithmically convex on $GL^{>}_{n,m}$, i.e., for any geodesic curve $\gamma(t)$ in $GL^{>}_{n,m}$ for the condition metric, the map $t\mapsto\log(\sigma_n^{-2}(\gamma(t)))$ is convex.
Problem 1. The condition Riemannian structure $\langle\cdot,\cdot\rangle_\kappa$ is defined in $GL_{n,m}$, where it is only locally Lipschitz. Let us define condition geodesics in $GL_{n,m}$ as the extremals of the condition length $L_\kappa$ (see for example [3], Chapter 4, Theorem 4.4.3, for the definition of such extremals in the Lipschitz case). Is Theorem 1 still true for $GL_{n,m}$? All the examples we have studied confirm that convexity holds, even if $\sigma_n^{-1}(\gamma(t))$ fails to be $C^1$. See Boito-Dedieu [2]. We intend to address this issue in a future paper.
In a second step we extend these results to other spaces of matrices: the sphere $S_r(GL^{>}_{n,m})$ of radius $r$ in $GL^{>}_{n,m}$ in Corollary 6, and the projective space $\mathbb P(GL^{>}_{n,m})$ in Corollary 7. We also consider the case of the solution variety of the homogeneous equation $M\zeta=0$, that is, the set of pairs
$$\{(M,\zeta)\in\mathbb K^{n\times(n+1)}\times\mathbb K^{n+1}:M\zeta=0\}.$$
Now our function $\alpha$ is the square of the condition number studied by Demmel in [4]. This is done in the affine context in Theorem 3 and in the projective context in Corollary 8.
Since $\sigma_n(A)$ is equal to the distance from $A$ to the set of singular matrices, a natural question is whether our main result remains valid for the inverse of the distance to other sets, or for more general functions.
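Definition 1. Let $(\mathcal M,\langle\cdot,\cdot\rangle)$ be a Riemannian manifold and let $\alpha:\mathcal M\to\mathbb R$ be a function of class $C^2$ with positive values. Let $\mathcal M_\kappa$ be the manifold $\mathcal M$ with the new metric
$$\langle\cdot,\cdot\rangle_{\kappa,x}=\alpha(x)\langle\cdot,\cdot\rangle_x,$$
called the condition Riemannian structure. We say that $\alpha$ is self-convex when $\log\alpha(\gamma(t))$ is convex for any geodesic $\gamma$ in $\mathcal M_\kappa$.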
For example, with $\mathcal M=\{x=(x_1,\dots,x_n)\in\mathbb R^n:x_n>0\}$ equipped with the usual metric, $\alpha(x)=x_n^{-2}$ is self-convex. The space $\mathcal M_\kappa$ is the Poincaré (half-space) model of hyperbolic space.
In the following theorem we prove self-convexity for the distance function to a $C^2$ submanifold without boundary $\mathcal N\subset\mathbb R^j$. Let us denote
$$\rho(x)=d(x,\mathcal N)=\min_{y\in\mathcal N}\|x-y\|\quad\text{and}\quad\alpha(x)=\frac{1}{\rho(x)^2}.$$
Let $\mathcal U$ be the largest open set in $\mathbb R^j$ such that, for any $x\in\mathcal U$, there is a unique closest point in $\mathcal N$ to $x$. When $\mathcal U$ is equipped with the new metric $\alpha(x)\langle\cdot,\cdot\rangle$ we have:
Theorem 2. The function $\alpha:\mathcal U\setminus\mathcal N\to\mathbb R$ is self-convex.
Theorem 2 is then extended to the projective case. Let $\mathcal N$ be a $C^2$ submanifold without boundary of $\mathbb P(\mathbb R^j)$. Let us denote by $d_R$ the Riemannian distance in projective space (points in the projective space are lines through the origin, and the distance $d_R$ between two lines is the angle they make). Let us denote $d_P=\sin d_R$ (this is also a distance), define $\alpha(x)=d_P(x,\mathcal N)^{-2}$, and let $\mathcal U$ be the largest open subset of $\mathbb P(\mathbb R^j)$ such that, for $x\in\mathcal U$, there is a unique closest point in $\mathcal N$ to $x$ for the distance $d_P$. Then
Corollary 1. The map $\alpha:\mathcal U\setminus\mathcal N\to\mathbb R$ is self-convex.
The extension of Theorem 1 and Theorem 2 to other types of sets or functions is not obvious. In Example 1 we prove that $\alpha(A)=\sigma_1(A)^{-2}+\dots+\sigma_n(A)^{-2}$ is not self-convex in $GL_{n,m}$. In Example 2 we take $\mathcal N$ the unit circle in $\mathbb R^2$ and $\mathcal U$ the unit disk, so that $\mathcal U$ contains a point (the center) which has many closest points in $\mathcal N$. In that case the corresponding function $\alpha:\mathcal U\setminus\mathcal N\to\mathbb R$ is self-convex, but it fails to be smooth at the center of the disk. In Example 3 we provide an example of a submanifold $\mathcal N\subset\mathbb R^2$ such that the function $\alpha(x)=d(x,\mathcal N)^{-2}$ defined on $\mathbb R^2\setminus\mathcal N$ is not self-convex.
Our interest in considering the condition metric in the space of matrices comes from recent papers by Shub [8] and Beltrán-Shub [1], where these authors use the condition length along a path in certain solution varieties to estimate the step size of continuation methods that follow these paths. They give bounds on the number of steps required in terms of the condition length of the path. If geodesics in the condition metric are followed, the known bounds on polynomial system solving are vastly improved. To understand the properties of these geodesics we have begun in this paper with linear systems, where we can investigate their properties more deeply. We find self-convexity in the context of this paper remarkable. We do not know whether similar issues arise naturally in linear algebra, even for solving systems of linear equations. Similar issues do clearly arise when studying continuation methods for the eigenvalue problem.
2 Self-convexity
Let us first recall some basic definitions about convexity on Riemannian manifolds. A good reference on this subject is Udrişte [9].
Definition 2. We say that a function $f:\mathcal M\to\mathbb R$ is convex whenever
$$f(\gamma_{xy}(t))\le(1-t)f(x)+tf(y)$$
for every $x,y\in\mathcal M$, for every geodesic arc $\gamma_{xy}$ joining $x$ and $y$ (parametrized on $[0,1]$), and $0\le t\le1$.
The convexity of $f$ in $\mathcal M$ is equivalent to the convexity, in the usual sense, of $f\circ\gamma_{xy}$ on $[0,1]$ for every $x,y\in\mathcal M$ and every geodesic $\gamma_{xy}$ joining $x$ and $y$, or also to the convexity of $f\circ\gamma$ for every geodesic $\gamma$ ([9] Chap. 3, Th. 2.2).
Thus, we see that
Lemma 1. Self-convexity of a function $\alpha:\mathcal M\to\mathbb R$ is equivalent to the convexity of $\log\circ\alpha$ in the condition Riemannian manifold $\mathcal M_\kappa$.
When $f$ is a function of class $C^2$ in the Riemannian manifold $\mathcal M$, we define its second derivative $D^2f(x)$ as the second covariant derivative. It is a symmetric bilinear form on $T_x\mathcal M$. Note ([9, Chapter 1]) that if $x\in\mathcal M$ and $\dot x\in T_x\mathcal M$, and if $\gamma(t)$ is a geodesic in $\mathcal M$ with $\gamma(0)=x$ and $\frac{d}{dt}\gamma(0)=\dot x$, then
$$D^2f(x)(\dot x,\dot x)=\frac{d^2}{dt^2}(f\circ\gamma)(0).$$
This second derivative depends on the Riemannian connection on $\mathcal M$. Since $\mathcal M$ is equipped with two different metrics, $\langle\cdot,\cdot\rangle$ and $\langle\cdot,\cdot\rangle_\kappa$, we have to distinguish between the corresponding second derivatives; they are denoted by $D^2f(x)$ and $D_\kappa^2f(x)$ respectively. No such distinction is necessary for the first derivative $Df(x)$.
Convexity on a Riemannian manifold is characterized by the following result (see [9] Chap. 3, Th. 6.2):
Proposition 1. A function $f:\mathcal M\to\mathbb R$ of class $C^2$ is convex if and only if $D^2f(x)$ is positive semidefinite for every $x\in\mathcal M$.
We use this proposition to obtain a characterization of self-convexity: $\alpha$ is self-convex if and only if the second derivative $D_\kappa^2(\log\circ\alpha)(x)$ is positive semidefinite for any $x\in\mathcal M_\kappa$. We get
Proposition 2. For a function $\alpha:\mathcal M\to\mathbb R$ of class $C^2$ with positive values, self-convexity is equivalent to
$$2\alpha(x)D^2\alpha(x)(\dot x,\dot x)+\|D\alpha(x)\|_x^2\|\dot x\|_x^2-4(D\alpha(x)\dot x)^2\ge0$$
for any $x\in\mathcal M$ and for any vector $\dot x\in T_x\mathcal M$, the tangent space at $x$.
Proof. Let $x\in\mathcal M$ be given. Let $\varphi:\mathbb R^m\to\mathcal M$ be a coordinate system such that $\varphi(0)=x$, with first fundamental form $g_{ij}(0)=\delta_{ij}$ (Kronecker's delta) and Christoffel symbols $\Gamma^i_{jk}(0)=0$, and let
$$A=\alpha\circ\varphi,$$
so that $\alpha(x)=A(0)$. Those coordinates are called “normal” or “geodesic”. Note that this implies
$$\frac{\partial g_{ij}}{\partial z_k}(0)=0$$
for all $i,j,k$. We denote by $g_{\kappa,ij}$ and $\Gamma^i_{\kappa,jk}$ respectively the first fundamental form and the Christoffel symbols for $\varphi$ in $\mathcal M_\kappa$. Let us compute them. Note that $g_{\kappa,ij}(z)=g_{ij}(z)A(z)$ and
$$\frac{\partial g_{\kappa,ij}}{\partial z_k}(0)=Dg_{\kappa,ij}(0)(e_k)=D(g_{ij}A)(0)(e_k)=g_{ij}(0)DA(0)(e_k)+A(0)Dg_{ij}(0)(e_k)=\delta_{ij}\frac{\partial A}{\partial z_k}(0).$$
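Moreover, since $g_\kappa(0)=A(0)I$ has inverse $A(0)^{-1}I$,
$$\Gamma^i_{\kappa,jk}(0)=\frac{1}{2A(0)}\left(\frac{\partial g_{\kappa,ik}}{\partial z_j}(0)+\frac{\partial g_{\kappa,ij}}{\partial z_k}(0)-\frac{\partial g_{\kappa,jk}}{\partial z_i}(0)\right)=\frac{1}{2A(0)}\left(\delta_{ij}\frac{\partial A}{\partial z_k}(0)+\delta_{ik}\frac{\partial A}{\partial z_j}(0)-\delta_{jk}\frac{\partial A}{\partial z_i}(0)\right).$$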
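That is,
$$\begin{aligned}\Gamma^i_{\kappa,ik}&=\Gamma^i_{\kappa,ki}=\frac{1}{2A(0)}\frac{\partial A}{\partial z_k}(0)&&\text{for all } i,k,\\ \Gamma^i_{\kappa,jj}&=\frac{-1}{2A(0)}\frac{\partial A}{\partial z_i}(0)&&j\ne i,\\ \Gamma^i_{\kappa,jk}&=0&&\text{otherwise.}\end{aligned}$$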
The second derivative of the composition of two maps $\mathcal M\xrightarrow{f}\mathbb R\xrightarrow{\psi}\mathbb R$ is given by the identity (see [9] Chap. 1.3, Hessian)
$$D^2(\psi\circ f)(x)=D\psi(f(x))D^2f(x)+\psi''(f(x))\,Df(x)\otimes Df(x),$$
where $Df(x)\otimes Df(x)$ is the bilinear form on $T_x\mathcal M$ defined by
$$(Df(x)\otimes Df(x))(u,v)=Df(x)(u)\,Df(x)(v).$$
This gives, in our context, that is, when $f=\alpha$ and $\psi=\log$,
$$D_\kappa^2(\log\circ\alpha)(x)=\frac{1}{\alpha(x)}D_\kappa^2\alpha(x)-\frac{1}{\alpha(x)^2}D\alpha(x)\otimes D\alpha(x).$$
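According to Proposition 1, our objective is now to give a necessary and sufficient condition for $D_\kappa^2(\log\circ\alpha)(x)$ to be positive semidefinite for each $x\in\mathcal M$. In our system of local coordinates the components of $D^2\alpha(x)$ are (see [9] Chap. 1.3)
$$A_{jk}=\frac{\partial^2A}{\partial z_j\partial z_k}-\sum_i\Gamma^i_{jk}\frac{\partial A}{\partial z_i}=\frac{\partial^2A}{\partial z_j\partial z_k},$$
while the components of $D_\kappa^2\alpha(x)$ are
$$A_{\kappa,jk}=\frac{\partial^2A}{\partial z_j\partial z_k}-\sum_i\Gamma^i_{\kappa,jk}\frac{\partial A}{\partial z_i}.$$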
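If we replace the Christoffel symbols in this last sum by the values previously computed we obtain (all quantities being evaluated at $z=0$), when $j=k$,
$$\sum_i\Gamma^i_{\kappa,jj}\frac{\partial A}{\partial z_i}=\Gamma^j_{\kappa,jj}\frac{\partial A}{\partial z_j}+\sum_{i\ne j}\Gamma^i_{\kappa,jj}\frac{\partial A}{\partial z_i}=\frac{1}{2A}\left(\frac{\partial A}{\partial z_j}\right)^2-\frac{1}{2A}\sum_{i\ne j}\left(\frac{\partial A}{\partial z_i}\right)^2=\frac1A\left(\frac{\partial A}{\partial z_j}\right)^2-\frac{1}{2A}\sum_i\left(\frac{\partial A}{\partial z_i}\right)^2,$$
while, when $j\ne k$,
$$\sum_i\Gamma^i_{\kappa,jk}\frac{\partial A}{\partial z_i}=\Gamma^j_{\kappa,jk}\frac{\partial A}{\partial z_j}+\Gamma^k_{\kappa,jk}\frac{\partial A}{\partial z_k}=\frac{1}{2A}\frac{\partial A}{\partial z_k}\frac{\partial A}{\partial z_j}+\frac{1}{2A}\frac{\partial A}{\partial z_j}\frac{\partial A}{\partial z_k}=\frac1A\frac{\partial A}{\partial z_j}\frac{\partial A}{\partial z_k}.$$
Both cases are subsumed in the identity
$$\sum_i\Gamma^i_{\kappa,jk}\frac{\partial A}{\partial z_i}=\frac1A\frac{\partial A}{\partial z_j}\frac{\partial A}{\partial z_k}-\frac{\delta_{jk}}{2A}\sum_i\left(\frac{\partial A}{\partial z_i}\right)^2.$$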
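Putting together all these identities gives the following expression for the components of $D_\kappa^2(\log\circ\alpha)(x)$:
$$D_\kappa^2(\log\circ\alpha)(x)_{jk}=\frac1A\left(\frac{\partial^2A}{\partial z_j\partial z_k}-\frac1A\frac{\partial A}{\partial z_j}\frac{\partial A}{\partial z_k}+\frac{\delta_{jk}}{2A}\sum_i\left(\frac{\partial A}{\partial z_i}\right)^2\right)-\frac1{A^2}\frac{\partial A}{\partial z_j}\frac{\partial A}{\partial z_k}=\frac{1}{2A^2}\left(2A\frac{\partial^2A}{\partial z_j\partial z_k}+\delta_{jk}\sum_i\left(\frac{\partial A}{\partial z_i}\right)^2-4\frac{\partial A}{\partial z_j}\frac{\partial A}{\partial z_k}\right).$$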
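Thus, $D_\kappa^2(\log\circ\alpha)(x)$ is positive semidefinite if and only if (recall that, in normal coordinates at $x$, the components of $D^2\alpha(x)$ are $\partial^2A/\partial z_j\partial z_k$ and $\|D\alpha(x)\|_x^2=\sum_i(\partial A/\partial z_i)^2$) the bilinear form
$$2\alpha(x)D^2\alpha(x)+\|D\alpha(x)\|_x^2\langle\cdot,\cdot\rangle_x-4D\alpha(x)\otimes D\alpha(x)$$
is positive semidefinite, that is, when
$$2\alpha(x)D^2\alpha(x)(\dot x,\dot x)+\|D\alpha(x)\|_x^2\|\dot x\|_x^2-4(D\alpha(x)\dot x)^2\ge0$$
for any $x\in\mathcal M$ and for any vector $\dot x\in T_x\mathcal M$. This finishes the proof.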
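Proposition 2 can be tested on the example that follows Definition 1. The sketch below assumes the half-plane $\{(x_1,x_2):x_2>0\}$ with the Euclidean metric and the family $\alpha_p(x)=x_2^{-p}$; the exponent $p$ is introduced here only for illustration. Since $D\alpha_p=-p\,x_2^{-p-1}e_2$ and $D^2\alpha_p=p(p+1)x_2^{-p-2}e_2\otimes e_2$, the left-hand side of Proposition 2 equals
$$p\,x_2^{-2p-2}\big(p\|\dot x\|^2+(2-2p)\dot x_2^2\big).$$
This expression is nonnegative for every $\dot x$ when $0<p\le2$ and negative for vertical $\dot x$ when $p>2$. The code evaluates the criterion directly from those derivatives.

```python
import math


def prop2_lhs(p, x2, v):
    """Left-hand side of Proposition 2 for alpha(x) = x2**(-p) on {x2 > 0}, Euclidean metric.

    v = (v1, v2) plays the role of the tangent vector xdot.
    """
    alpha = x2 ** (-p)
    grad = (0.0, -p * x2 ** (-p - 1))               # D alpha(x) as a vector
    d2 = p * (p + 1) * x2 ** (-p - 2) * v[1] ** 2   # D^2 alpha(x)(v, v): only the (2,2) entry is nonzero
    d1 = grad[0] * v[0] + grad[1] * v[1]            # D alpha(x) v
    return 2 * alpha * d2 + (grad[0] ** 2 + grad[1] ** 2) * (v[0] ** 2 + v[1] ** 2) - 4 * d1 ** 2


def normalized_lhs(p, x2, v):
    """The same quantity times x2**(2p + 2), i.e. p * (p * |v|^2 + (2 - 2p) * v2^2)."""
    return prop2_lhs(p, x2, v) * x2 ** (2 * p + 2)


directions = [(math.cos(k * math.pi / 36), math.sin(k * math.pi / 36)) for k in range(72)]
heights = [0.1, 0.5, 1.0, 3.0]
worst_p2 = min(normalized_lhs(2, y, v) for y in heights for v in directions)
vertical_p3 = prop2_lhs(3, 1.0, (0.0, 1.0))
print(worst_p2 >= -1e-9, vertical_p3)   # True -3.0
```

The case $p=2$ is exactly the borderline exponent. The criterion holds with equality for vertical directions, which is consistent with the hyperbolic geodesics being vertical lines there.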
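An easy consequence of Proposition 2 is the following (see also Example 3).
Corollary 2. When a function $\alpha:\mathcal M\to\mathbb R$ of class $C^2$ is self-convex, any critical point of $\alpha$ has a positive semidefinite second derivative $D^2\alpha(x)$. Indeed, at such a point $D\alpha(x)=0$ and the inequality of Proposition 2 reduces to $2\alpha(x)D^2\alpha(x)(\dot x,\dot x)\ge0$. Such a function cannot have a strict local maximum or a non-degenerate saddle.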
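Proposition 3. The following condition is equivalent for a $C^2$ function $\alpha=1/\rho^2:\mathcal M\to\mathbb R$ to be self-convex on $\mathcal M$: for every $x\in\mathcal M$ and $\dot x\in T_x\mathcal M$,
$$\|\dot x\|^2\|D\rho(x)\|^2-(D\rho(x)\dot x)^2-\rho(x)D^2\rho(x)(\dot x,\dot x)\ge0,$$
or, what is the same,
$$2\|\dot x\|^2\|D\rho(x)\|^2\ge D^2\rho^2(x)(\dot x,\dot x).$$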
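Proof. Note that
$$D\alpha(x)\dot x=\frac{-2}{\rho(x)^3}D\rho(x)\dot x,\qquad D^2\alpha(x)(\dot x,\dot x)=\frac{6}{\rho(x)^4}(D\rho(x)\dot x)^2-\frac{2}{\rho(x)^3}D^2\rho(x)(\dot x,\dot x).$$
Hence, the necessary and sufficient condition of Proposition 2 reads
$$\frac{4\|\dot x\|^2\|D\rho(x)\|^2}{\rho(x)^6}-\frac{16}{\rho(x)^6}(D\rho(x)\dot x)^2+\frac{12}{\rho(x)^6}(D\rho(x)\dot x)^2-\frac{4}{\rho(x)^5}D^2\rho(x)(\dot x,\dot x)\ge0,$$
and the first form of the proposition follows after multiplying by $\rho(x)^6/4$. The second form follows from the identity $D^2\rho^2(x)(\dot x,\dot x)=2(D\rho(x)\dot x)^2+2\rho(x)D^2\rho(x)(\dot x,\dot x)$.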
Corollary 3. Each of the following conditions is sufficient for a function $\alpha=1/\rho^2:\mathcal M\to\mathbb R$ to be self-convex at $x\in\mathcal M$: for every $\dot x\in T_x\mathcal M$,
$$D^2\rho(x)(\dot x,\dot x)\le0,$$
or
$$\|D^2\rho^2(x)\|\le2\|D\rho(x)\|^2.$$
In the following proposition we obtain a weaker condition on $\alpha$, which characterizes convexity of $\alpha$ in $\mathcal M_\kappa$ instead of self-convexity.
Proposition 4. $\alpha$ is convex in $\mathcal M_\kappa$ if and only if
$$2\alpha(x)D^2\alpha(x)(\dot x,\dot x)+\|D\alpha(x)\|_x^2\|\dot x\|_x^2-2(D\alpha(x)\dot x)^2\ge0$$
for any $x\in\mathcal M$ and any vector $\dot x\in T_x\mathcal M$.
Proof. We follow the lines of the proof of Proposition 2 with $\psi$ equal to the identity map instead of $\psi=\log$.
3 Some general formulas for matrices
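Proposition 5. Let $A=(\Sigma,0)\in GL^{>}_{n,m}$, where $\Sigma=\mathrm{diag}(\sigma_1\ge\dots\ge\sigma_{n-1}>\sigma_n)\in\mathbb K^{n\times n}$. The map $\sigma_n:GL^{>}_{n,m}\to\mathbb R$ is a smooth map and, for every $U=(u_{ij})\in\mathbb K^{n\times m}$,
$$D\sigma_n(A)U=\mathrm{Re}(u_{nn}),\qquad D^2\sigma_n^2(A)(U,U)=2\sum_{j=1}^m|u_{nj}|^2-2\sum_{k=1}^{n-1}\frac{|u_{kn}\sigma_n+\overline{u_{nk}}\,\sigma_k|^2}{\sigma_k^2-\sigma_n^2}.$$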
Proof. Since $\sigma_n^2$ is an eigenvalue of $AA^*$ with multiplicity 1, the implicit function theorem proves the existence of smooth functions $\sigma_n^2(B)\in\mathbb R$ and $u(B)\in\mathbb K^n$, defined in an open neighborhood of $A$ and satisfying
$$BB^*u(B)=\sigma_n^2(B)u(B),\quad\|u(B)\|^2=1,\quad u(A)=e_n=(0,\dots,0,1)^T\in\mathbb K^n,\quad\sigma_n^2(A)=\sigma_n^2.$$
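Differentiating these equations at $B$ gives, for any $U\in\mathbb K^{n\times m}$,
$$(UB^*+BU^*)u(B)+BB^*\dot u(B)=(\sigma_n^2)'u(B)+\sigma_n^2(B)\dot u(B),\qquad u(B)^*\dot u(B)=0,$$
with $\dot u(B)=Du(B)U$ and $(\sigma_n^2)'=D\sigma_n^2(B)U$. Pre-multiplying the first equation by $u(B)^*$ gives
$$u(B)^*(UB^*+BU^*)u(B)+u(B)^*BB^*\dot u(B)=(\sigma_n^2)'u(B)^*u(B)+\sigma_n^2(B)u(B)^*\dot u(B),$$
and, since $u(B)^*BB^*=\sigma_n^2(B)u(B)^*$, $u(B)^*\dot u(B)=0$ and $u(B)^*u(B)=1$, we obtain
$$D\sigma_n^2(B)U=(\sigma_n^2)'=2\,\mathrm{Re}(u(B)^*UB^*u(B))\quad\text{and}\quad D\sigma_n(B)U=\frac{\mathrm{Re}(u(B)^*UB^*u(B))}{\sigma_n(B)}.$$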
The derivative of the eigenvector is now easy to compute:
$$Du(B)U=\dot u(B)=(\sigma_n^2(B)I_n-BB^*)^\dagger\big(UB^*+BU^*-(\sigma_n^2)'I_n\big)u(B),$$
where $(\sigma_n^2(B)I_n-BB^*)^\dagger$ denotes the generalized inverse (or Moore-Penrose inverse) of $\sigma_n^2(B)I_n-BB^*$.
The second derivative of $\sigma_n^2$ at $B$ is given by
$$\begin{aligned}D^2\sigma_n^2(B)(U,U)&=2\,\mathrm{Re}\big(\dot u(B)^*UB^*u(B)+u(B)^*UU^*u(B)+u(B)^*UB^*\dot u(B)\big)\\&=2\,\mathrm{Re}\big(u(B)^*UU^*u(B)+u(B)^*(UB^*+BU^*)\dot u(B)\big)\\&=2\,\mathrm{Re}\big(u(B)^*UU^*u(B)+u(B)^*(UB^*+BU^*)(\sigma_n^2(B)I_n-BB^*)^\dagger(UB^*+BU^*-(\sigma_n^2)'I_n)u(B)\big).\end{aligned}$$
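Using $u(A)=e_n$ and $\sigma_n(A)=\sigma_n$ we get
$$D\sigma_n^2(A)U=2\,\mathrm{Re}(UA^*)_{nn}=2\sigma_n\mathrm{Re}(u_{nn}),\qquad D\sigma_n(A)U=\mathrm{Re}(u_{nn}),$$
and, since $(\sigma_n^2I_n-AA^*)^\dagger=\mathrm{diag}\big((\sigma_n^2-\sigma_1^2)^{-1},\dots,(\sigma_n^2-\sigma_{n-1}^2)^{-1},0\big)$ and the term $(\sigma_n^2)'I_n$ does not contribute off the diagonal, the second derivative is given by
$$\begin{aligned}D^2\sigma_n^2(A)(U,U)&=2\,\mathrm{Re}\Big((UU^*)_{nn}+\sum_{k=1}^{n-1}(UA^*+AU^*)_{nk}(\sigma_n^2-\sigma_k^2)^{-1}(UA^*+AU^*)_{kn}\Big)\\&=2\Big((UU^*)_{nn}+\sum_{k=1}^{n-1}\frac{|(UA^*+AU^*)_{kn}|^2}{\sigma_n^2-\sigma_k^2}\Big)\\&=2\sum_{j=1}^m|u_{nj}|^2-2\sum_{k=1}^{n-1}\frac{|u_{kn}\sigma_n+\overline{u_{nk}}\,\sigma_k|^2}{\sigma_k^2-\sigma_n^2},\end{aligned}$$
because $UA^*+AU^*$ is Hermitian and $(UA^*+AU^*)_{kn}=u_{kn}\sigma_n+\sigma_k\overline{u_{nk}}$.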
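Both formulas of Proposition 5 can be checked against finite differences. The sketch below is limited to the real case with $n=2$, $m=3$, where the conjugation plays no role and $\sigma_2(B)^2$ is the smaller eigenvalue of the symmetric $2\times2$ matrix $BB^T$, available in closed form. The helper names and the particular $A$ and $U$ are illustrative choices.

```python
import math


def sigma2_squared(B):
    """sigma_2(B)^2 for a real 2 x m matrix B: the smaller eigenvalue of the 2x2 matrix B B^T."""
    r0, r1 = B
    a = sum(x * x for x in r0)
    d = sum(x * x for x in r1)
    b = sum(x * y for x, y in zip(r0, r1))
    return 0.5 * (a + d) - math.sqrt(0.25 * (a - d) ** 2 + b * b)


def prop5(s1, s2, U):
    """Proposition 5 with n = 2 (real case) at A = (diag(s1, s2), 0).

    Returns (D sigma_n(A) U, D^2 sigma_n^2(A)(U, U)); Python row 1 is row n = 2.
    """
    first = U[1][1]                                   # Re(u_nn)
    row_n = sum(u * u for u in U[1])                  # sum_j |u_nj|^2
    cross = (U[0][1] * s2 + U[1][0] * s1) ** 2        # k = 1 term: |u_kn s_n + u_nk s_k|^2
    return first, 2 * row_n - 2 * cross / (s1 ** 2 - s2 ** 2)


def finite_differences(A, U, h=1e-4):
    """Central differences of t -> sigma_2(A + tU) (first) and sigma_2(A + tU)^2 (second)."""
    def f(t):
        return sigma2_squared([[a + t * u for a, u in zip(ra, ru)] for ra, ru in zip(A, U)])
    first = (math.sqrt(f(h)) - math.sqrt(f(-h))) / (2 * h)
    second = (f(h) - 2 * f(0.0) + f(-h)) / h ** 2
    return first, second


s1, s2 = 3.0, 1.0
A = [[s1, 0.0, 0.0], [0.0, s2, 0.0]]        # A = (Sigma, 0) with n = 2, m = 3
U = [[0.3, -0.7, 0.5], [0.2, 0.4, -0.6]]    # an arbitrary direction
formula = prop5(s1, s2, U)
numeric = finite_differences(A, U)
print(formula, numeric)   # both pairs are approximately (0.4, 1.1175)
```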
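Corollary 4. Let $A=(\Sigma,0)\in GL^{>}_{n,m}$, where $\Sigma=\mathrm{diag}(\sigma_1\ge\dots\ge\sigma_{n-1}>\sigma_n>0)\in\mathbb K^{n\times n}$. Let us define $\rho(A)=\sigma_n(A)/\|A\|_F$. Then, for any $U\in\mathbb K^{n\times m}$ such that $\mathrm{Re}\,\langle A,U\rangle_F=0$, we have
$$D\rho(A)U=\frac{\mathrm{Re}(u_{nn})}{\|A\|_F},\qquad D^2\rho^2(A)(U,U)=\frac{2}{\|A\|_F^2}\left(\sum_{j=1}^m|u_{nj}|^2-\sum_{k=1}^{n-1}\frac{|u_{kn}\sigma_n+\overline{u_{nk}}\,\sigma_k|^2}{\sigma_k^2-\sigma_n^2}-\frac{\|U\|_F^2}{\|A\|_F^2}\sigma_n^2\right).$$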
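Proof. Note that
$$D\rho(A)U=\frac{D\sigma_n(A)U\,\|A\|_F-\sigma_n(A)\dfrac{2\,\mathrm{Re}\langle A,U\rangle_F}{2\|A\|_F}}{\|A\|_F^2}=\frac{D\sigma_n(A)U}{\|A\|_F},$$
and the first assertion of the corollary follows from Proposition 5. For the second one, note that $h=h_1/h_2$ (for real-valued $C^2$ functions $h,h_1,h_2$ with $h_2(0)\ne0$) implies
$$D^2h=\frac{h_2^2D^2h_1-h_1h_2D^2h_2-2h_2Dh_1Dh_2+2h_1(Dh_2)^2}{h_2^3}.$$
Now, $\rho^2(A)=\sigma_n^2(A)/\|A\|_F^2$, $D(\|A\|_F^2)U=2\,\mathrm{Re}\langle A,U\rangle_F=0$, $D^2(\|A\|_F^2)(U,U)=2\|U\|_F^2$, and $D^2\sigma_n^2(A)(U,U)$ is known from Proposition 5. Since $Dh_2=0$ here, the quotient formula reduces to $D^2\rho^2(A)(U,U)=D^2\sigma_n^2(A)(U,U)/\|A\|_F^2-2\sigma_n^2\|U\|_F^2/\|A\|_F^4$, and the formula for $D^2\rho^2(A)$ follows after some elementary calculations.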
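In the same spirit, here is a sketch of a finite-difference check of Corollary 4. It again assumes real $2\times3$ matrices with $A=(\mathrm{diag}(\sigma_1,\sigma_2),0)$, and it uses a direction $U$ chosen for illustration so that $\langle A,U\rangle_F=0$. The function whose derivatives are compared is $t\mapsto\rho^2(A+tU)=\sigma_2^2(A+tU)/\|A+tU\|_F^2$.

```python
import math


def sigma2_squared(B):
    """sigma_2(B)^2 for a real 2 x m matrix B: the smaller eigenvalue of B B^T."""
    r0, r1 = B
    a = sum(x * x for x in r0)
    d = sum(x * x for x in r1)
    b = sum(x * y for x, y in zip(r0, r1))
    return 0.5 * (a + d) - math.sqrt(0.25 * (a - d) ** 2 + b * b)


def rho_squared(B):
    """rho(B)^2 = sigma_2(B)^2 / ||B||_F^2."""
    return sigma2_squared(B) / sum(x * x for row in B for x in row)


s1, s2 = 3.0, 1.0
A = [[s1, 0.0, 0.0], [0.0, s2, 0.0]]
U = [[0.1, -0.7, 0.5], [0.2, -0.3, -0.6]]     # <A, U>_F = 3 * 0.1 + 1 * (-0.3) = 0
norm_a_sq = s1 ** 2 + s2 ** 2
norm_u_sq = sum(x * x for row in U for x in row)

# Corollary 4 with n = 2: the sum over k has the single term k = 1.
d_rho = U[1][1] / math.sqrt(norm_a_sq)
cross = (U[0][1] * s2 + U[1][0] * s1) ** 2 / (s1 ** 2 - s2 ** 2)
d2_rho2 = 2 / norm_a_sq * (sum(u * u for u in U[1]) - cross - norm_u_sq / norm_a_sq * s2 ** 2)


def line(t):
    """The straight line A + tU."""
    return [[a + t * u for a, u in zip(ra, ru)] for ra, ru in zip(A, U)]


h = 1e-4
fd1 = (math.sqrt(rho_squared(line(h))) - math.sqrt(rho_squared(line(-h)))) / (2 * h)
fd2 = (rho_squared(line(h)) - 2 * rho_squared(line(0.0)) + rho_squared(line(-h))) / h ** 2
print(d_rho, fd1)
print(d2_rho2, fd2)   # both approximately 0.07295
```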
4 The affine linear case
We consider here the Riemannian manifold $\mathcal M=GL^{>}_{n,m}$ equipped with the usual Frobenius Hermitian product. Let $\alpha:GL^{>}_{n,m}\to\mathbb R$ be defined as $\alpha(A)=1/\sigma_n(A)^2$.
Corollary 5. The function $\alpha$ is self-convex in $GL^{>}_{n,m}$.
Proof. From Proposition 3, it suffices to see that
$$2\|U\|_F^2\|D\sigma_n(A)\|_F^2\ge D^2\sigma_n^2(A)(U,U).$$
Since unitary transformations are isometries in $GL^{>}_{n,m}$ with respect to the condition metric, we may suppose, via a singular value decomposition, that $A=(\Sigma,0)\in GL^{>}_{n,m}$, where $\Sigma=\mathrm{diag}(\sigma_1\ge\dots\ge\sigma_{n-1}>\sigma_n)\in\mathbb K^{n\times n}$. Now the inequality to verify is obvious from Proposition 5, as $\|D\sigma_n(A)\|_F=1$ and, since $\sigma_k>\sigma_n$ for $k<n$, every term of the sum over $k$ in Proposition 5 is nonnegative, so that
$$D^2\sigma_n^2(A)(U,U)\le2\sum_{j=1}^m|u_{nj}|^2\le2\|U\|_F^2.$$
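The key inequality of this proof can also be probed without first reducing $A$ to $(\Sigma,0)$. The following sketch is an illustration only and is restricted to real $2\times3$ matrices, where $\sigma_2^2$ has a closed form. It draws random $A$ and $U$, estimates $D^2\sigma_2^2(A)(U,U)$ by central differences, and records the margin $2\|U\|_F^2-D^2\sigma_2^2(A)(U,U)$, recalling that $\|D\sigma_n(A)\|=1$. The seed, sample size and thresholds are arbitrary. Samples where $\sigma_1$ and $\sigma_2$ nearly coincide are skipped, since $\sigma_2$ is only smooth on $GL^{>}_{2,3}$.

```python
import math
import random


def sigma_squares(B):
    """(sigma_1^2, sigma_2^2) of a real 2 x m matrix B: the eigenvalues of the 2x2 matrix B B^T."""
    r0, r1 = B
    a = sum(x * x for x in r0)
    d = sum(x * x for x in r1)
    b = sum(x * y for x, y in zip(r0, r1))
    mid, rad = 0.5 * (a + d), math.sqrt(0.25 * (a - d) ** 2 + b * b)
    return mid + rad, mid - rad


def second_difference(A, U, h=1e-5):
    """Central-difference estimate of D^2 sigma_2^2(A)(U, U) along the line A + tU."""
    def f(t):
        return sigma_squares([[a + t * u for a, u in zip(ra, ru)] for ra, ru in zip(A, U)])[1]
    return (f(h) - 2 * f(0.0) + f(-h)) / h ** 2


rng = random.Random(0)          # arbitrary seed
checked, worst_margin = 0, math.inf
for _ in range(2000):
    A = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(2)]
    U = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(2)]
    s1_sq, s2_sq = sigma_squares(A)
    if s1_sq - s2_sq < 1.0 or s2_sq < 0.1:   # stay well inside GL^>_{2,3}
        continue
    norm_u_sq = sum(x * x for row in U for x in row)
    worst_margin = min(worst_margin, 2 * norm_u_sq - second_difference(A, U))
    checked += 1
print(checked, worst_margin)   # worst_margin should be >= 0 up to finite-difference error
```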
Corollary 6. Let $r>0$. The function $\alpha$ is self-convex in the sphere $S_r(GL^{>}_{n,m})$ of radius $r$ in $GL^{>}_{n,m}$.
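Proof. It is enough to prove that any geodesic in $(S_r(GL^{>}_{n,m}),\alpha)$ is also a geodesic in $(GL^{>}_{n,m},\alpha)$. Indeed, suppose that $A$ and $B$ are matrices in $S_r(GL^{>}_{n,m})$ and that the minimal geodesic in $(GL^{>}_{n,m},\alpha)$ between $A$ and $B$ is $X(t)$, $a\le t\le b$. Then we claim that
$$L_\kappa\left(\frac{rX(t)}{\|X(t)\|_F}\right)\le L_\kappa(X(t)).$$
Indeed, for any $t$, writing $\dot X=dX(t)/dt$,
$$\frac{d}{dt}\frac{rX(t)}{\|X(t)\|_F}=\frac{r\dot X}{\|X(t)\|_F}-\frac{rX(t)\,\mathrm{Re}\langle X(t),\dot X\rangle_F}{\|X(t)\|_F^3},$$
so that
$$\left\|\frac{d}{dt}\frac{rX(t)}{\|X(t)\|_F}\right\|_F^2=\frac{r^2\|\dot X\|_F^2}{\|X(t)\|_F^2}+\frac{r^2\,\mathrm{Re}\langle X(t),\dot X\rangle_F^2}{\|X(t)\|_F^4}-\frac{2r^2\,\mathrm{Re}\langle X(t),\dot X\rangle_F^2}{\|X(t)\|_F^4}=\frac{r^2\|\dot X\|_F^2}{\|X(t)\|_F^2}-\frac{r^2\,\mathrm{Re}\langle X(t),\dot X\rangle_F^2}{\|X(t)\|_F^4}.$$
Hence, since $\sigma_n(rX(t)/\|X(t)\|_F)=r\sigma_n(X(t))/\|X(t)\|_F$,
$$\left\|\frac{d}{dt}\frac{rX(t)}{\|X(t)\|_F}\right\|_\kappa=\frac{\|X(t)\|_F\,\sigma_n^{-1}(X(t))}{r}\left(\frac{r^2\|\dot X\|_F^2}{\|X(t)\|_F^2}-\frac{r^2\,\mathrm{Re}\langle X(t),\dot X\rangle_F^2}{\|X(t)\|_F^4}\right)^{1/2}\le\sigma_n^{-1}(X(t))\|\dot X\|_F=\|\dot X\|_\kappa,$$
with equality if and only if $\mathrm{Re}\langle X(t),\dot X\rangle_F=0$. Therefore $X(t)$ can only be a minimizing geodesic if $\|X(t)\|_F$ is constant, that is, if it belongs to $S_r(GL^{>}_{n,m})$. Since all geodesics are locally minimizing geodesics, the geodesics of $S_r(GL^{>}_{n,m})$ are geodesics of $GL^{>}_{n,m}$, and Corollary 6 follows from Corollary 5.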
The following gives an example of a smooth and non-self-convex function in $GL_{n,m}$.
Example 1. For $n\ge3$, the function $\alpha(A)=\sigma_1(A)^{-2}+\dots+\sigma_n(A)^{-2}$ is not self-convex in $GL_{n,m}$.
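Proof. For simplicity we consider the case of real square matrices. We have
$$\alpha(A)=\|A^{-1}\|_F^2,$$
$$D\alpha(A)\dot A=-2\langle A^{-1},A^{-1}\dot AA^{-1}\rangle_F=-2\langle A^{-T}A^{-1}A^{-T},\dot A\rangle_F,$$
$$\|D\alpha(A)\|_F^2=4\|A^{-T}A^{-1}A^{-T}\|_F^2,$$
$$D^2\alpha(A)(\dot A,\dot A)=2\|A^{-1}\dot AA^{-1}\|_F^2+4\langle A^{-1},A^{-1}\dot AA^{-1}\dot AA^{-1}\rangle_F.$$
According to Proposition 4, the convexity of $\alpha$ in the condition metric of $GL_n$, which is implied by self-convexity (the exponential of a convex function is convex), is equivalent to
$$2\|A^{-1}\|_F^2\left(2\|A^{-1}\dot AA^{-1}\|_F^2+4\langle A^{-1},A^{-1}\dot AA^{-1}\dot AA^{-1}\rangle_F\right)+4\|\dot A\|_F^2\|A^{-T}A^{-1}A^{-T}\|_F^2-8\langle A^{-1},A^{-1}\dot AA^{-1}\rangle_F^2\ge0$$
for all $A$ and $\dot A$. This inequality is not satisfied when
$$A=\begin{pmatrix}1&0&0\\0&1&0\\0&0&2\end{pmatrix}\quad\text{and}\quad\dot A=\begin{pmatrix}0&1&0\\-1&0&0\\0&0&0\end{pmatrix}$$
(the left-hand side then equals $-15/8$).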
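The failure of the inequality for this pair can be confirmed with a few lines of arithmetic. The sketch below uses plain Python, stores $3\times3$ matrices as nested lists, and chooses its helper names for illustration. It evaluates the left-hand side of the Proposition 4 inequality for $A=\mathrm{diag}(1,1,2)$ and the skew-symmetric $\dot A$ above, using $A^{-1}=\mathrm{diag}(1,1,1/2)$. Here $\langle A^{-1},A^{-1}\dot AA^{-1}\rangle_F=0$, so the Proposition 2 expression (coefficient $-4$ instead of $-2$ in front of $(D\alpha(A)\dot A)^2$) takes the same value.

```python
def matmul(X, Y):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(X):
    return [list(col) for col in zip(*X)]


def inner(X, Y):
    """Real Frobenius inner product <X, Y>_F = trace(Y^T X)."""
    return sum(x * y for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


A_inv = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.5]]   # inverse of A = diag(1, 1, 2)
A_dot = [[0.0, 1.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

P = matmul(matmul(A_inv, A_dot), A_inv)                        # A^{-1} Adot A^{-1}
G = matmul(matmul(transpose(A_inv), A_inv), transpose(A_inv))  # A^{-T} A^{-1} A^{-T}

alpha = inner(A_inv, A_inv)                                    # ||A^{-1}||_F^2
d_alpha = -2 * inner(A_inv, P)                                 # D alpha(A) Adot
grad_norm_sq = 4 * inner(G, G)                                 # ||D alpha(A)||_F^2
d2_alpha = 2 * inner(P, P) + 4 * inner(A_inv, matmul(matmul(P, A_dot), A_inv))
adot_norm_sq = inner(A_dot, A_dot)

lhs_prop4 = 2 * alpha * d2_alpha + grad_norm_sq * adot_norm_sq - 2 * d_alpha ** 2
lhs_prop2 = 2 * alpha * d2_alpha + grad_norm_sq * adot_norm_sq - 4 * d_alpha ** 2
print(lhs_prop4, lhs_prop2)   # -1.875 -1.875
```

For this pair one can also verify by hand that $\alpha(A+t\dot A)=2/(1+t^2)+1/4$. Its second derivative at $t=0$ is $-4$, in agreement with `d2_alpha`.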
5 The homogeneous linear case
5.1 The complex projective space.
The material of this subsection is mainly taken from Gallot-Hulin-Lafontaine [6], Sect. 2.A.5.
Let $V$ be a Hermitian space of complex dimension $\dim_{\mathbb C}V=d+1$. We denote by $\mathbb P(V)$ the corresponding projective space, that is, the quotient of $V\setminus\{0\}$ by the group $\mathbb C^*$ of dilations of $V$; $\mathbb P(V)$ is equipped with its usual smooth manifold structure with complex dimension $\dim\mathbb P(V)=d$. We denote by $p$ the canonical surjection.
Let $V$ be considered as a real vector space of dimension $\dim_{\mathbb R}V=2d+2$ equipped with the scalar product $\mathrm{Re}\,\langle\cdot,\cdot\rangle_V$. The sphere $S(V)$ is a submanifold of $V$ of real dimension $2d+1$. Equipped with the induced metric, this sphere becomes a Riemannian manifold and, as usual, we identify the tangent space at $z\in S(V)$ with
$$T_zS(V)=\{u\in V:\mathrm{Re}\,\langle u,z\rangle_V=0\}.$$
The projective space $\mathbb P(V)$ can also be seen as the quotient $S(V)/S^1$ of the unit sphere in $V$ by the unit circle in $\mathbb C$ for the action given by $(\lambda,z)\in S^1\times S(V)\mapsto\lambda z\in S(V)$. The canonical map is denoted by
$$p_V:S(V)\to\mathbb P(V);$$
$p_V$ is the restriction of $p$ to $S(V)$.
The horizontal space at $z\in S(V)$ related to $p_V$ is defined as the (real) orthogonal complement of $\ker Dp_V(z)$ in $T_zS(V)$. This horizontal space is denoted by $H_z$. Since $V$ is decomposed in the (real) orthogonal sum
$$V=\mathbb Rz\oplus\mathbb Riz\oplus z^\perp$$
and since $\ker Dp_V(z)=\mathbb Riz$ (the tangent space at $z$ to the circle $S^1z$), we get
$$H_z=z^\perp=\{u\in V:\langle u,z\rangle=0\}.$$
There exists on $\mathbb P(V)$ a unique Riemannian metric such that $p_V$ is a Riemannian submersion, that is, $p_V$ is a smooth submersion and, for any $z\in S(V)$, $Dp_V(z)$ is an isometry between $H_z$ and $T_{p(z)}\mathbb P(V)$. Thus, for this Riemannian structure, one has
$$\langle Dp_V(z)u,Dp_V(z)v\rangle_{T_{p(z)}\mathbb P(V)}=\mathrm{Re}\,\langle u,v\rangle_V$$
for any $z\in S(V)$ and $u,v\in H_z$.
Proposition 6. Let $z\in S(V)$ be given.
1. A chart at $p(z)\in\mathbb P(V)$ is defined by $\varphi_z:H_z\to\mathbb P(V)$, $\varphi_z(u)=p(z+u)$.
2. Its derivative at 0 is the restriction of $Dp(z)$ to $H_z$: $D\varphi_z(0)=Dp(z):H_z\to T_{p(z)}\mathbb P(V)$, which is an isometry.
3. For any smooth mapping $\psi:\mathbb P(V)\to\mathbb R$, and for any $v\in H_z$, we have
$$D\psi(p(z))(Dp(z)v)=D(\psi\circ\varphi_z)(0)v$$
and
$$D^2\psi(p(z))(Dp(z)v,Dp(z)v)=D^2(\psi\circ\varphi_z)(0)(v,v).$$
Proof. 1 and 2 are easy. We have $D(\psi\circ\varphi_z)(0)=D\psi(p(z))D\varphi_z(0)$, which gives 3 since $D\varphi_z(0)v=Dp(z)v$ for any $v\in H_z$. For the second derivative, recall that $D^2\psi(p(z))(Dp(z)v,Dp(z)v)=(\psi\circ\tilde\gamma)''(0)$, where $\tilde\gamma$ is a geodesic curve in $\mathbb P(V)$ such that $\tilde\gamma(0)=p(z)$, $\tilde\gamma'(0)=Dp(z)v$. Now, consider the horizontal $p_V$-lift $\gamma$ of $\tilde\gamma$ to $S(V)$ with base point $z$. Note that $\gamma(0)=z$, $\gamma'(0)=v$. Hence,
$$(\psi\circ\tilde\gamma)''(0)=(\psi\circ p\circ\gamma)''(0)=D^2(\psi\circ p)(z)(v,v)+D\psi(p(z))Dp(z)\gamma''(0).$$
Since $\gamma$ is a geodesic of $S(V)$, $\gamma''(0)$ is orthogonal to $T_zS(V)$, that is, $\gamma''(0)\in\mathbb Rz$; as $p$ is constant on the line through $z$, $Dp(z)z=0$ and hence $Dp(z)\gamma''(0)=0$. Finally,
$$D^2(\psi\circ p)(z)(v,v)=(\psi\circ p(z+tv))''(0)=(\psi\circ\varphi_z(tv))''(0)=D^2(\psi\circ\varphi_z)(0)(v,v),$$
and the assertion on the second derivative follows.
The following result will be helpful.
Proposition 7. Let $\mathcal M_1,\mathcal M_2$ be Riemannian manifolds and $\alpha_2:\mathcal M_2\to\,]0,\infty[$ be of class $C^2$. Let $\pi:\mathcal M_1\to\mathcal M_2$ be a Riemannian submersion. Let $\mathcal U_2\subseteq\mathcal M_2$ be an open set and assume that $\alpha_1=\alpha_2\circ\pi$ is self-convex in $\mathcal U_1=\pi^{-1}(\mathcal U_2)$. Then $\alpha_2$ is self-convex in $\mathcal U_2$.
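Proof. Let $\mathcal M_{\kappa,1}$ be $\mathcal M_1$ endowed with the condition metric given by $\alpha_1$, and let $\mathcal M_{\kappa,2}$ be $\mathcal M_2$ endowed with the condition metric given by $\alpha_2$. Then $\pi:\mathcal M_{\kappa,1}\to\mathcal M_{\kappa,2}$ is also a Riemannian submersion, because at corresponding points both metrics are multiplied by the same factor $\alpha_1(x)=\alpha_2(\pi(x))$. Now, let $\gamma_2:[a,b]\to\mathcal U_2\subseteq\mathcal M_{\kappa,2}$ be a geodesic, and let $\gamma_1\subseteq\mathcal M_{\kappa,1}$ be its horizontal lift by $\pi$. Then $\gamma_1$ is a geodesic in $\mathcal U_1\subseteq\mathcal M_{\kappa,1}$ (see [6, Cor. 2.109]) and hence $\log\alpha_1(\gamma_1(t))$ is a convex function of $t$. Now,
$$\log(\alpha_2(\gamma_2(t)))=\log(\alpha_2\circ\pi(\gamma_1(t)))=\log(\alpha_1(\gamma_1(t)))$$
is convex, as wanted.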
Corollary 7. The function $\alpha_2:\mathbb P(GL^{>}_{n,m})\to\mathbb R$, $\alpha_2(A)=\|A\|_F^2\sigma_n(A)^{-2}$, is self-convex in $\mathbb P(GL^{>}_{n,m})$.
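Proof. Note that $p:S(GL^{>}_{n,m})\to\mathbb P(GL^{>}_{n,m})$ is a Riemannian submersion (here $S(\cdot)$ denotes the unit sphere) and that $\alpha_2\circ p=\alpha$ on the unit sphere, where $\alpha$ is as in Corollary 6 with $r=1$. The corollary follows from Proposition 7.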
5.2 The solution variety.
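Let us denote by $p_1$ and $p_2$ the canonical maps
$$S_1\xrightarrow{p_1}\mathbb P(\mathbb K^{n\times(n+1)})\quad\text{and}\quad S_2\xrightarrow{p_2}\mathbb P(\mathbb K^{n+1})=\mathbb P_n(\mathbb K),$$
where $S_1$ is the unit sphere in $\mathbb K^{n\times(n+1)}$ and $S_2$ is the unit sphere in $\mathbb K^{n+1}$.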
Consider the affine solution variety
$$\hat W^{>}=\{(M,\zeta)\in S_1\times S_2:M\in GL^{>}_{n,n+1}\text{ and }M\zeta=0\}.$$
It is a Riemannian manifold equipped with the metric induced by the product metric on $\mathbb K^{n\times(n+1)}\times\mathbb K^{n+1}$. The tangent space to $\hat W^{>}$ is given by
$$T_{(M,\zeta)}\hat W^{>}=\{(\dot M,\dot\zeta)\in T_MS_1\times T_\zeta S_2:\dot M\zeta+M\dot\zeta=0\}.$$
The projective solution variety considered here is
$$W^{>}=\{(p_1(M),p_2(\zeta))\in\mathbb P(\mathbb K^{n\times(n+1)})\times\mathbb P_n(\mathbb K):M\in GL^{>}_{n,n+1}\text{ and }M\zeta=0\},$$
which is also a Riemannian manifold, equipped with the metric induced by the product metric on $\mathbb P(\mathbb K^{n\times(n+1)})\times\mathbb P_n(\mathbb K)$.
Let us denote by $\pi_1$ the restriction to $\hat W^{>}$ of the first projection $S_1\times S_2\to S_1$, and let $R:\hat W^{>}\to\mathbb R$, $R=\sigma_n\circ\pi_1$. We have
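Lemma 2. Let $w=(M,\zeta)\in\hat W^{>}$ and let $\gamma(t)=(M(t),\zeta(t))$ be a geodesic in $\hat W^{>}$, $\gamma(0)=w$. Then
$$D\sigma_n(\pi_1(w))(\pi_1\circ\gamma)''(0)=-\|M'(0)\|_F^2\,\sigma_n(M)\le0,$$
with strict inequality whenever $M'(0)\ne0$.
Proof. Our problem is invariant under unitary changes of coordinates. Hence, using a singular value decomposition, we can assume that $M=(\Sigma,0)\in GL^{>}_{n,n+1}$, where $\Sigma=\mathrm{diag}(\sigma_1\ge\dots\ge\sigma_{n-1}>\sigma_n)\in\mathbb K^{n\times n}$, and $\zeta=e_{n+1}=(0,\dots,0,1)^T\in S_2$. As $\gamma=(M(t),\zeta(t))$ is a geodesic of $\hat W^{>}\subseteq\mathbb K^{n\times(n+1)}\times\mathbb K^{n+1}$, $\gamma''(0)$ is orthogonal to $T_w\hat W^{>}$, which contains all the pairs of the form $((A,0),0)$ where $A$ is an $n\times n$ matrix with $\mathrm{Re}\langle\Sigma,A\rangle=0$. Hence $M''(0)$ has the form
$$M''(0)=(a\Sigma,\ast)$$
for some real number $a\in\mathbb R$. Finally, $M(t)$ is contained in the sphere, so $\|M(t)\|_F=1$ and, since $\|\Sigma\|_F=1$,
$$0=(\|M(t)\|_F^2)''(0)=2\|M'(0)\|_F^2+2\,\mathrm{Re}\langle M(0),M''(0)\rangle=2\|M'(0)\|_F^2+2a,$$
so that $a=-\|M'(0)\|_F^2$ and $(M''(0))_{nn}=-\|M'(0)\|_F^2\sigma_n$. From Proposition 5,
$$D\sigma_n(\pi_1(w))(\pi_1\circ\gamma)''(0)=\mathrm{Re}((\pi_1\circ\gamma)''(0))_{nn}=\mathrm{Re}(M''(0))_{nn}=-\|M'(0)\|_F^2\sigma_n,$$
which is negative whenever $M'(0)\ne0$.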
Theorem 3. The map $\alpha:\hat W^{>}\to\mathbb R$ given by $\alpha(M,\zeta)=\sigma_n(M)^{-2}$ is self-convex.
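Proof. Using unitary invariance we can take $M=(\Sigma,0)\in GL^{>}_{n,n+1}$, where $\Sigma=\mathrm{diag}(\sigma_1\ge\dots\ge\sigma_{n-1}>\sigma_n)\in\mathbb K^{n\times n}$, and $\zeta=e_{n+1}=(0,\dots,0,1)^T\in S_2$; note that $\|\Sigma\|_F=\|M\|_F=1$. According to Proposition 3 we have to prove that
$$2\|\dot w\|_w^2\|DR(w)\|^2\ge D^2R^2(w)(\dot w,\dot w)$$
for every $w\in\hat W^{>}$ and $\dot w=(\dot M,\dot\zeta)\in T_w\hat W^{>}$. From Proposition 5 we have
$$DR(w)\dot w=D\sigma_n(\pi_1(w))(D\pi_1(w)\dot w)=\mathrm{Re}(\dot M)_{nn}.$$
Let $E_{nn}\in\mathbb K^{n\times(n+1)}$ be the matrix whose only nonzero entry is a 1 in position $(n,n)$. The gradient of $R$ at $w$ is the orthogonal projection of $(E_{nn},0)$ onto $T_w\hat W^{>}$, namely $(E_{nn}-\sigma_nM,0)$ (this pair is tangent, and the difference $(\sigma_nM,0)$ is orthogonal to $T_w\hat W^{>}$), so that
$$\|DR(w)\|^2=\|E_{nn}-\sigma_nM\|_F^2=1-\sigma_n^2.$$
On the other hand, let $\gamma(t)=(M(t),\zeta(t))$ be a geodesic in $\hat W^{>}$ with $\gamma(0)=w$ and $\gamma'(0)=\dot w$. As shown in the proof of Lemma 2, $\mathrm{Re}(M''(0))_{nn}=-\|\dot M\|_F^2\sigma_n$, so that
$$D^2R^2(w)(\dot w,\dot w)=(\sigma_n^2\circ\pi_1\circ\gamma)''(0)=D^2\sigma_n^2(M)(\dot M,\dot M)+2\sigma_nD\sigma_n(M)M''(0)=D^2\sigma_n^2(M)(\dot M,\dot M)-2\sigma_n^2\|\dot M\|_F^2.$$
By Proposition 5 (with $m=n+1$), $D^2\sigma_n^2(M)(\dot M,\dot M)\le2\sum_j|\dot m_{nj}|^2\le2\|\dot M\|_F^2$, because every term of the sum over $k$ there is nonnegative. Hence
$$D^2R^2(w)(\dot w,\dot w)\le2(1-\sigma_n^2)\|\dot M\|_F^2\le2(1-\sigma_n^2)\|\dot w\|_w^2=2\|\dot w\|_w^2\|DR(w)\|^2,$$
which is the required inequality.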
Corollary 8. The map $\alpha_2:W^{>}\to\mathbb R$ given by $\alpha_2(M,\zeta)=\|M\|_F^2/\sigma_n^2(M)$ is self-convex.
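Proof. Consider the Riemannian submersion
$$p_1\times p_2:S_1\times S_2\to\mathbb P(\mathbb K^{n\times(n+1)})\times\mathbb P_n(\mathbb K),\qquad p_1\times p_2(M,\zeta)=(p_1(M),p_2(\zeta)).$$
Note that $T_{(M,\zeta)}\hat W^{>}$ contains the kernel of the derivative $D(p_1\times p_2)(M,\zeta)$. Thus, the restriction $p_1\times p_2:\hat W^{>}\to W^{>}$ is also a Riemannian submersion. Since $\alpha_2\circ(p_1\times p_2)=\alpha$ on $\hat W^{>}$ (where $\|M\|_F=1$), the corollary follows by combining Proposition 7 and Theorem 3.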
6 Self-convexity of the distance from a submanifold of $\mathbb R^j$
Let $\mathcal N$ be a $C^k$ submanifold without boundary, $\mathcal N\subset\mathbb R^j$, $k\ge2$. Let us denote by
$$\rho(x)=d(x,\mathcal N)=\inf_{y\in\mathcal N}\|x-y\|$$
the distance from $x\in\mathbb R^j$ to $\mathcal N$ (here $d(x,y)=\|x-y\|$ denotes the Euclidean distance). Let $\mathcal U$ be the largest open set in $\mathbb R^j$ such that, for any $x\in\mathcal U$, there is a unique closest point in $\mathcal N$ to $x$. This point is denoted by $K(x)$, so that we have a map
$$K:\mathcal U\to\mathcal N,\qquad\rho(x)=d(x,K(x)).$$
Classical properties of $\rho$ and $K$ are given in the following proposition (see also Foote [5], Li and Nirenberg [7]).
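Proposition 8.
1. $\rho$ is defined and 1-Lipschitz on $\mathbb R^j$,
2. for any $x\in\mathcal U$, $x-K(x)$ is a vector normal to $\mathcal N$ at $K(x)$, i.e., $x-K(x)\in(T_{K(x)}\mathcal N)^\perp$,
3. $K$ is $C^{k-1}$ on $\mathcal U$,
4. $\rho^2$ is $C^k$ on $\mathcal U$, $D\rho^2(x)\dot x=2\langle x-K(x),\dot x\rangle$ and $D^2\rho^2(x)(\dot x,\dot x)=2\|\dot x\|^2-2\langle DK(x)\dot x,\dot x\rangle$,
5. $\rho$ is $C^k$ on $\mathcal U\setminus\mathcal N$,
6. $\langle DK(x)\dot x,\dot x\rangle\ge0$ for every $x\in\mathcal U$ and $\dot x\in\mathbb R^j$.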
Proof.
1. For any $x$ and $y$ one has $\rho(x)=d(x,K(x))\le d(x,K(y))\le d(x,y)+d(y,K(y))=d(x,y)+\rho(y)$. Since $x$ and $y$ play a symmetric role, we get $|\rho(x)-\rho(y)|\le d(x,y)$.
2. This is the classical first-order optimality condition in optimization.
3. This classical result may be derived from the inverse function theorem applied to the canonical map defined on the normal bundle to $\mathcal N$,
$$\mathrm{can}:N\mathcal N\to\mathbb R^j,\qquad\mathrm{can}(y,n)=y+n,$$
for every $y\in\mathcal N$ and $n\in N_y\mathcal N=(T_y\mathcal N)^\perp$. The normal bundle is a $C^{k-1}$ manifold, the canonical map is a $C^{k-1}$ diffeomorphism when restricted to the set $\{(y,n):y+tn\in\mathcal U,\ \forall\,0\le t\le1\}$, and $K(x)$ is easily obtained from $\mathrm{can}^{-1}$.
4. The derivative of $\rho^2$ is equal to $D\rho^2(x)\dot x=2\langle x-K(x),\dot x-DK(x)\dot x\rangle=2\langle x-K(x),\dot x\rangle$ because $DK(x)\dot x\in T_{K(x)}\mathcal N$ and $x-K(x)\in(T_{K(x)}\mathcal N)^\perp$. Thus $\nabla\rho^2(x)=2(x-K(x))$ is $C^{k-1}$ on $\mathcal U$, so that $\rho^2$ is $C^k$. The formula for $D^2\rho^2$ follows.
5. Obvious.
6. Let $x(t)$ be a curve in $\mathcal U$ with $x(0)=x$. Let us denote $\frac{dx(t)}{dt}=\dot x(t)$, $\frac{d^2x(t)}{dt^2}=\ddot x(t)$, $y(t)=K(x(t))$, $\frac{dy(t)}{dt}=\dot y(t)$ and $\frac{d^2y(t)}{dt^2}=\ddot y(t)$. From the first-order optimality condition we get
$$\langle x(t)-y(t),\dot y(t)\rangle=0,$$
whose derivative at $t=0$ is
$$\langle\dot x-\dot y,\dot y\rangle+\langle x-y,\ddot y\rangle=0.$$
Thus
$$\langle DK(x)\dot x,\dot x\rangle=\langle\dot y,\dot x\rangle=\langle\dot y,\dot y\rangle-\langle x-y,\ddot y\rangle.$$
This last quantity is equal to $\frac12\frac{d^2}{dt^2}\|x-y(t)\|^2\big|_{t=0}$. It is nonnegative by the second-order optimality condition.
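Proof of Theorem 2 and Corollary 1. We are now able to prove our second main theorem. Let us denote $\alpha(x)=1/\rho(x)^2$. We shall prove that $\alpha$ is self-convex on $\mathcal U\setminus\mathcal N$. From Proposition 3 it suffices to prove that, for every $\dot x\in\mathbb R^j$,
$$2\|\dot x\|^2\|D\rho(x)\|^2\ge D^2\rho^2(x)(\dot x,\dot x)$$
or, according to Proposition 8.4 and $\|D\rho(x)\|=1$ (indeed $D\rho(x)\dot x=\langle x-K(x),\dot x\rangle/\rho(x)$ and $\|x-K(x)\|=\rho(x)$), that
$$2\|\dot x\|^2\ge2\|\dot x\|^2-2\langle DK(x)\dot x,\dot x\rangle.$$
This is obvious from Proposition 8.6.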
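Now we prove Corollary 1. Let $S_1(\mathbb R^j)$ be the sphere of radius 1 in $\mathbb R^j$ and let $p_{\mathbb R^j}:\mathbb R^j\setminus\{0\}\to\mathbb P(\mathbb R^j)$ denote the canonical projection. Note that the preimage of $\mathcal N$ by $p_{\mathbb R^j}$ satisfies
$$d(y,p_{\mathbb R^j}^{-1}(\mathcal N))=d_P(p_{\mathbb R^j}(y),\mathcal N)\,\|y\|.$$
As in the proof of Corollary 6, the mapping $1/\rho(x)^2$, with $\rho(x)=d(x,p_{\mathbb R^j}^{-1}(\mathcal N))$, is self-convex in the set $S_1(\mathbb R^j)\cap p_{\mathbb R^j}^{-1}(\mathcal U)$. Now apply Proposition 7 to the Riemannian submersion $p_{\mathbb R^j}$ to conclude the corollary.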
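Proposition 8 and the proof above can be illustrated on a simple curved example. The following sketch assumes $\mathcal N$ is the unit circle in $\mathbb R^2$, the same set used in Example 2 below, a choice made here for illustration. In that case $\mathcal U$ is the plane minus the origin, $K(x)=x/\|x\|$ and $\rho(x)=|1-\|x\||$. By a hand computation for this example (not taken from the text),
$$\langle DK(x)\dot x,\dot x\rangle=\big(\|\dot x\|^2-\langle x/\|x\|,\dot x\rangle^2\big)/\|x\|.$$
The code compares $D^2\rho^2(x)(\dot x,\dot x)=2\|\dot x\|^2-2\langle DK(x)\dot x,\dot x\rangle$ (Proposition 8.4) with a finite difference. It also checks $\langle DK(x)\dot x,\dot x\rangle\ge0$ (Proposition 8.6), which is exactly the inequality used in the proof of Theorem 2.

```python
import math
import random


def rho_squared(x):
    """Squared Euclidean distance from x in R^2 to the unit circle N."""
    return (1.0 - math.hypot(x[0], x[1])) ** 2


def dk_quadratic_form(x, v):
    """<DK(x) v, v> for the closest-point map K(x) = x / ||x|| of the unit circle."""
    r = math.hypot(x[0], x[1])
    radial = (x[0] * v[0] + x[1] * v[1]) / r
    return (v[0] ** 2 + v[1] ** 2 - radial ** 2) / r


def second_difference(x, v, h=1e-4):
    """Central-difference estimate of D^2 rho^2(x)(v, v)."""
    def f(t):
        return rho_squared((x[0] + t * v[0], x[1] + t * v[1]))
    return (f(h) - 2 * f(0.0) + f(-h)) / h ** 2


rng = random.Random(1)   # arbitrary seed
max_error, min_dk = 0.0, math.inf
for _ in range(500):
    r, theta = rng.uniform(0.2, 3.0), rng.uniform(0.0, 2 * math.pi)
    if abs(r - 1.0) < 0.05:      # skip points on or very close to N
        continue
    x = (r * math.cos(theta), r * math.sin(theta))
    v = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
    predicted = 2 * (v[0] ** 2 + v[1] ** 2) - 2 * dk_quadratic_form(x, v)   # Proposition 8.4
    max_error = max(max_error, abs(predicted - second_difference(x, v)))
    min_dk = min(min_dk, dk_quadratic_form(x, v))                          # Proposition 8.6
print(max_error, min_dk)   # a small discrepancy and a nonnegative minimum
```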
Two examples.
Example 2. Take $\mathcal U$ the unit disk in $\mathbb R^2$ and $\mathcal N$ the unit circle. The corresponding function is given by
$$\alpha(x)=d(x,\mathcal N)^{-2}=1/(1-\|x\|)^2.$$
According to Theorem 2, the map $\log\alpha(x)$ is convex along the condition geodesics in
$$\mathcal U\setminus\{(0,0)\}=\{x\in\mathbb R^2:0<\|x\|<1\}.$$
This property also holds in $\mathcal U$: a geodesic through the origin is a diameter, parametrized by arc length as $x(t)=(-1+e^t)(\cos\theta,\sin\theta)$ when $-\infty<t\le0$, and $x(t)=(1-e^{-t})(\cos\theta,\sin\theta)$ when $0\le t<\infty$, for some $\theta$. In that case
$$\log\alpha(x(t))=2|t|,$$
which is convex.
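The computation along the diameter is easy to reproduce. This minimal sketch takes $\theta=0$ and the parametrization given above. It checks that the curve has unit speed for the condition metric $\alpha(x)\langle\cdot,\cdot\rangle$, so that $t$ is indeed arc length, and that $\log\alpha(x(t))$ agrees with $2|t|$. The grid and the step `h` are arbitrary. The curve is only $C^1$ at $t=0$, which is why a small `h` is used.

```python
import math


def x_of_t(t):
    """Arc-length parametrization (theta = 0) of the diameter used in Example 2."""
    return -1.0 + math.exp(t) if t <= 0 else 1.0 - math.exp(-t)


def alpha(x):
    """alpha(x) = d(x, unit circle)^(-2) = 1 / (1 - |x|)^2 on the open unit disk."""
    return 1.0 / (1.0 - abs(x)) ** 2


def condition_speed(t, h=1e-7):
    """|dx/dt| * sqrt(alpha(x(t))): speed in the condition metric alpha(x) <.,.>."""
    dx = (x_of_t(t + h) - x_of_t(t - h)) / (2 * h)
    return abs(dx) * math.sqrt(alpha(x_of_t(t)))


ts = [-4.0 + 0.1 * k for k in range(81)]
speed_error = max(abs(condition_speed(t) - 1.0) for t in ts)
log_error = max(abs(math.log(alpha(x_of_t(t))) - 2 * abs(t)) for t in ts)
print(speed_error, log_error)   # both tiny (rounding and finite-difference error only)
```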
Example 3. Take $\mathcal N\subset\mathbb R^2$ equal to the union of the two points $(-1,0)$ and $(1,0)$. In that case
$$\alpha(x)^{-1}=d(x,\mathcal N)^2=\min\big((1+x_1)^2+x_2^2,\ (1-x_1)^2+x_2^2\big).$$
It may be shown that, for any $0<a\le1/10$, the straight line segment is the only minimizing geodesic joining the points $(0,-a)$ and $(0,a)$. Since $\log\alpha(0,t)=-\log(1+t^2)$ has a maximum at $t=0$, the map $t\mapsto\log\alpha(0,t)$, $-a\le t\le a$, cannot be convex. Here $\{0\}\times\mathbb R$ is the locus of points in $\mathbb R^2$ equidistant from the two points of $\mathcal N$, which is the set we avoid in Theorem 2.
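The obstruction in Example 3 is visible directly on the vertical axis. This sketch assumes $\mathcal N=\{(-1,0),(1,0)\}$ as above. It evaluates $\log\alpha(0,t)=-\log(1+t^2)$ on a symmetric grid over $[-a,a]$ with $a=1/10$, matching the bound quoted in the text; the grid step is an arbitrary choice. It shows that the maximum sits at the midpoint $t=0$ and that the second difference there is negative (close to $-2$), so the restriction to the segment is not convex.

```python
import math


def log_alpha(x1, x2):
    """log alpha(x) = -log d(x, N)^2 for N = {(-1, 0), (1, 0)}."""
    d2 = min((1 + x1) ** 2 + x2 ** 2, (1 - x1) ** 2 + x2 ** 2)
    return -math.log(d2)


a, step = 0.1, 0.01
ts = [-a + step * k for k in range(21)]          # ts[10] is the midpoint t = 0
values = [log_alpha(0.0, t) for t in ts]
peak = values.index(max(values))
second_diff = (values[11] - 2 * values[10] + values[9]) / step ** 2
print(peak, second_diff)   # 10 and a value close to -2: log alpha(0, t) is not convex
```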
References
[1] Beltrán C., and M. Shub, Complexity of Bézout’s Theorem VII:
Distances Estimates in the Condition Metric. Foundations of Computational Mathematics, 9 (2009) 179-195.
[2] Boito P., and J.-P. Dedieu, The condition metric in the
space of full rank rectangular matrices. http://www.math.univtoulouse.fr/ dedieu/Boito-Dedieu-future.pdf
[3] Clarke F. H., Optimization and Nonsmooth Analysis. Les Publications CRM (1989) ISBN 2-921120-01-1.
[4] Demmel J. W., The probability that a Numerical Problem is Difficult.
Mathematics of Computation, 50 (1988) 449-480.
[5] Foote R., Regularity of the distance function, Proceedings of the AMS,
92 (1984) pp 153-155.
[6] Gallot S., D. Hulin and J. Lafontaine, Riemannian Geometry,
Springer (2004) ISBN 9780387524016.
[7] Li Y. and L. Nirenberg, Regularity of the distance function to the
boundary, Rendiconti Accad. Naz. delle Sc. 123 (2005) pp 257-264.
[8] Shub M., Complexity of Bézout’s Theorem VI: Geodesics in the Condition Metric. Foundations of Computational Mathematics, 9 (2009)
171-178.
[9] Udriste, C., Convex Functions and Optimization Methods on Riemannian Manifolds, Kluwer (1994) ISBN 0-7923-3002-1.