
Convexity Properties of the Condition Number

2010, SIAM Journal on Matrix Analysis and Applications


Convexity properties of the condition number.*

Carlos Beltrán†, Jean-Pierre Dedieu‡, Gregorio Malajovich§, Mike Shub¶

June 30, 2009

Abstract. We define in the space of $n \times m$ matrices of rank $n$, $n \le m$, the condition Riemannian structure as follows: for a given matrix $A$ the tangent space at $A$ is equipped with the Hermitian inner product obtained by multiplying the usual Frobenius inner product by the inverse of the square of the smallest singular value of $A$, denoted $\sigma_n(A)$. When this smallest singular value has multiplicity 1, the function $A \mapsto \log(\sigma_n(A)^{-2})$ is a convex function with respect to the condition Riemannian structure; that is, $t \mapsto \log(\sigma_n(A(t))^{-2})$ is convex, in the usual sense, for any geodesic $A(t)$. In a more abstract setting, a function $\alpha$ defined on a Riemannian manifold $(\mathcal M, \langle\cdot,\cdot\rangle)$ is said to be self-convex when $\log \alpha(\gamma(t))$ is convex for any geodesic in $(\mathcal M, \alpha\langle\cdot,\cdot\rangle)$. Necessary and sufficient conditions for self-convexity are given when $\alpha$ is $C^2$. When $\alpha(x) = d(x,\mathcal N)^{-2}$, where $d(x,\mathcal N)$ is the distance from $x$ to a $C^2$ submanifold $\mathcal N \subset \mathbb R^j$, we prove that $\alpha$ is self-convex when restricted to the largest open set of points $x$ with a unique closest point in $\mathcal N$. We also show, using this more general notion, that the square of the condition number $\|A\|_F/\sigma_n(A)$ is self-convex in projective space and in the solution variety.

* Mathematics Subject Classification (MSC2000): 65F35 (Primary), 15A12 (Secondary).
† C. Beltrán, Departamento de Matemáticas, Estadística y Computación, Universidad de Cantabria, Santander, España (beltranc@gmail.com). CB was supported by MTM200762799 and by a Spanish postdoctoral grant.
‡ J.-P. Dedieu, Institut de Mathématiques, Université Paul Sabatier, 31062 Toulouse cedex 09, France (jean-pierre.dedieu@math.univ-toulouse.fr). J.-P. Dedieu was supported by the ANR Gecko.
§ G. Malajovich, Departamento de Matemática Aplicada, Universidade Federal do Rio de Janeiro, Caixa Postal 68530, CEP 21945-970, Rio de Janeiro, RJ, Brazil (gregorio@ufrj.br). He was partially supported by CNPq grants 303565/2007-1 and 470031/2007-7, by FAPERJ (Fundação Carlos Chagas de Amparo à Pesquisa do Estado do Rio de Janeiro) and by the Brazil-France agreement of cooperation in Mathematics.
¶ M. Shub, Department of Mathematics, University of Toronto, Toronto, Ontario, Canada M5S 2E4 (shub.michael@gmail.com). CB and MS were supported by an NSERC Discovery Grant.

1 Introduction

Let two integers $1 \le n \le m$ be given and consider the space of matrices $\mathbb K^{n\times m}$, $\mathbb K = \mathbb R$ or $\mathbb C$, equipped with the Frobenius Hermitian product
$$\langle M, N\rangle_F = \operatorname{trace}(N^*M) = \sum_{i,j} m_{ij}\overline{n_{ij}}.$$
Given an absolutely continuous path $A(t)$, $a \le t \le b$, its length is given by the integral
$$L = \int_a^b \left\|\frac{dA(t)}{dt}\right\|_F dt,$$
and the shortest path connecting $A(a)$ to $A(b)$ is the segment connecting them. Consider now the problem of connecting these two matrices with the shortest possible path while staying, as much as possible, away from the set of "singular matrices", that is, the matrices with non-maximal rank. The singular values of a matrix $A \in \mathbb K^{n\times m}$ are denoted in non-increasing order:
$$\sigma_1(A) \ge \ldots \ge \sigma_{n-1}(A) \ge \sigma_n(A) \ge 0.$$
We denote by $GL_{n,m}$ the space of matrices $A \in \mathbb K^{n\times m}$ with maximal rank: $\operatorname{rank} A = n$, that is, $\sigma_n(A) > 0$, so that the set of singular matrices is
$$\mathcal N = \mathbb K^{n\times m} \setminus GL_{n,m} = \{A \in \mathbb K^{n\times m} : \sigma_n(A) = 0\}.$$
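The set $\mathcal N$ and the quantity $\sigma_n$ are easy to explore numerically. The following minimal sketch is our own illustration, not part of the paper; it assumes Python with NumPy. It computes $\sigma_n(A)$ and a nearest rank-deficient matrix via the truncated singular value decomposition (Eckart-Young), anticipating the fact recalled below that $\sigma_n(A)$ equals the Frobenius distance from $A$ to $\mathcal N$.

```python
import numpy as np

def sigma_min(A):
    """Smallest singular value sigma_n(A) of an n x m matrix A, n <= m."""
    return np.linalg.svd(A, compute_uv=False)[-1]

def nearest_singular(A):
    """A closest rank-deficient matrix in Frobenius norm (Eckart-Young):
    drop the smallest singular value in the SVD of A."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    s[-1] = 0.0
    return U @ np.diag(s) @ Vh

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))          # a generic full-rank 3 x 5 matrix
S = nearest_singular(A)
# sigma_n(A) coincides with the Frobenius distance from A to the set N:
print(sigma_min(A), np.linalg.norm(A - S, "fro"))
```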
Since the smallest singular value of a matrix is equal to the distance from the set of singular matrices,
$$\sigma_n(A) = d_F(A, \mathcal N) = \min_{S\in\mathcal N}\|A - S\|_F,$$
given an absolutely continuous path $A(t)$, $a \le t \le b$, we define its "condition length" by the integral
$$L_\kappa = \int_a^b \left\|\frac{dA(t)}{dt}\right\|_F \sigma_n(A(t))^{-1} dt.$$
A good compromise between length and distance to $\mathcal N$ is obtained by minimizing $L_\kappa$. We call a "minimizing condition geodesic" an absolutely continuous path, parametrized by arc length, which minimizes $L_\kappa$ in the set of absolutely continuous paths with given endpoints, and we call the condition distance $d_\kappa(A,B)$ between two matrices the length $L_\kappa$ of a minimizing condition geodesic with endpoints $A$ and $B$, if any.

In this paper our objective is to investigate the properties of the smallest singular value $\sigma_n(A(t))$ along a condition geodesic. Our main result says that the map $\log(\sigma_n(A(t))^{-1})$ is convex. Thus $\log\sigma_n(A(t))$ is concave, and the minimum value of $\sigma_n$ along the path is reached at one of the endpoints. Note that a similar property holds in the case of hyperbolic geometry, where instead of $\mathbb K^{n\times m}$ we take $\mathbb R^{n-1}\times[0,\infty)$, instead of $\mathcal N$ we have $\mathbb R^{n-1}\times\{0\}$, and where the length of a path $a(t)=(a_1(t),\ldots,a_n(t))$ is defined by the integral
$$\int \left\|\frac{da(t)}{dt}\right\| a_n(t)^{-1} dt.$$
Geodesics in that case are arcs of circles centered at $\mathbb R^{n-1}\times\{0\}$ or segments of vertical lines, and $\log(a_n(t)^{-1})$ is convex along such paths.

The approach used here to prove our theorems is heavily based on Riemannian geometry. We define on $GL_{n,m}$ the following Riemannian structure:
$$\langle M, N\rangle_{\kappa,A} = \sigma_n(A)^{-2}\operatorname{Re}\langle M, N\rangle_F,$$
where $M, N \in \mathbb K^{n\times m}$ and $A \in GL_{n,m}$. The minimizing condition geodesics defined previously are clearly geodesics in $GL_{n,m}$ for this Riemannian structure, so that we may use the toolbox of Riemannian geometry. In fact things are not so simple: the smallest singular value $\sigma_n(A)$ is a locally Lipschitz map on $GL_{n,m}$, and it is smooth on the open subset
$$GL^>_{n,m} = \{A \in GL_{n,m} : \sigma_{n-1}(A) > \sigma_n(A)\},$$
that is, where the smallest singular value of $A$ is simple. On the open subset $GL^>_{n,m}$ the metric $\langle\cdot,\cdot\rangle_\kappa$ defines a smooth Riemannian structure, and we call "condition geodesics" the geodesics related to this structure. Such a path is not necessarily a minimizing geodesic. Our first main theorem establishes a remarkable property of the condition Riemannian structure:

Theorem 1. $\sigma_n^{-2}$ is logarithmically convex on $GL^>_{n,m}$, i.e., for any geodesic curve $\gamma(t)$ in $GL^>_{n,m}$ for the condition metric, the map $\log(\sigma_n^{-2}(\gamma(t)))$ is convex.

Problem 1. The condition Riemannian structure $\langle\cdot,\cdot\rangle_\kappa$ is defined on $GL_{n,m}$, where it is only locally Lipschitz. Let us define condition geodesics in $GL_{n,m}$ as the extremals of the condition length $L_\kappa$ (see for example [3, Chapter 4, Theorem 4.4.3] for the definition of such extremals in the Lipschitz case). Is Theorem 1 still true for $GL_{n,m}$? All the examples we have studied confirm that convexity holds, even when $\sigma_n(\gamma(t))$ fails to be $C^1$. See Boito-Dedieu [2]. We intend to address this issue in a future paper.

In a second step we extend these results to other spaces of matrices: the sphere $S_r(GL^>_{n,m})$ of radius $r$ in $GL^>_{n,m}$ in Corollary 6, and the projective space $\mathbb P(GL^>_{n,m})$ in Corollary 7. We also consider the case of the solution variety of the homogeneous equation $M\zeta = 0$, that is, the set of pairs
$$\{(M,\zeta)\in\mathbb K^{n\times(n+1)}\times\mathbb K^{n+1} : M\zeta = 0\}.$$
Now our function $\alpha$ is the square of the condition number studied by Demmel in [4]. This is done in the affine context in Theorem 3 and in the projective context in Corollary 8.
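To make the definition of the condition length concrete, here is a minimal numerical sketch; it is our own illustration, not part of the paper, and it assumes Python with NumPy and a simple Riemann-sum discretization of the integral. The sample path (a straight segment) and the helper names are ours.

```python
import numpy as np

def sigma_min(A):
    return np.linalg.svd(A, compute_uv=False)[-1]

def condition_length(path, ts):
    """Riemann-sum approximation of L_kappa = int ||A'(t)||_F / sigma_n(A(t)) dt
    for a path sampled as path[k] = A(ts[k])."""
    L = 0.0
    for k in range(len(ts) - 1):
        dt = ts[k + 1] - ts[k]
        dA = (path[k + 1] - path[k]) / dt        # finite-difference velocity
        mid = 0.5 * (path[k] + path[k + 1])      # midpoint evaluation of sigma_n
        L += np.linalg.norm(dA, "fro") / sigma_min(mid) * dt
    return L

# Example: condition length of the straight segment between two full-rank matrices.
rng = np.random.default_rng(1)
A, B = rng.standard_normal((2, 2, 3))
ts = np.linspace(0.0, 1.0, 200)
segment = [(1 - t) * A + t * B for t in ts]
print(condition_length(segment, ts))
```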
Since $\sigma_n(A)$ is equal to the distance from $A$ to the set of singular matrices, a natural question is whether our main result remains valid for the inverse of the distance to certain other sets, or for more general functions.

Definition 1. Let $(\mathcal M,\langle\cdot,\cdot\rangle)$ be a Riemannian manifold and let $\alpha:\mathcal M\to\mathbb R$ be a function of class $C^2$ with positive values. Let $\mathcal M_\kappa$ be the manifold $\mathcal M$ with the new metric $\langle\cdot,\cdot\rangle_{\kappa,x}=\alpha(x)\langle\cdot,\cdot\rangle_x$, called the condition Riemannian structure. We say that $\alpha$ is self-convex when $\log\alpha(\gamma(t))$ is convex for any geodesic $\gamma$ in $\mathcal M_\kappa$.

For example, with $\mathcal M=\{x=(x_1,\ldots,x_n)\in\mathbb R^n:x_n>0\}$ equipped with the usual metric, $\alpha(x)=x_n^{-2}$ is self-convex. The space $\mathcal M_\kappa$ is the Poincaré model of hyperbolic space.

In the following theorem we prove self-convexity for the distance function to a $C^2$ submanifold without boundary $\mathcal N\subset\mathbb R^j$. Let us denote
$$\rho(x)=d(x,\mathcal N)=\min_{y\in\mathcal N}\|x-y\|\qquad\text{and}\qquad\alpha(x)=\frac{1}{\rho(x)^2}.$$
Let $\mathcal U$ be the largest open set in $\mathbb R^j$ such that, for any $x\in\mathcal U$, there is a unique closest point in $\mathcal N$ to $x$. When $\mathcal U$ is equipped with the new metric $\alpha(x)\langle\cdot,\cdot\rangle$ we have:

Theorem 2. The function $\alpha:\mathcal U\setminus\mathcal N\to\mathbb R$ is self-convex.

Theorem 2 is then extended to the projective case. Let $\mathcal N$ be a $C^2$ submanifold without boundary of $\mathbb P(\mathbb R^j)$. Let us denote by $d_R$ the Riemannian distance in projective space (points in projective space are lines through the origin, and the distance $d_R$ between two lines is the angle they make). Let us denote $d_P=\sin d_R$ (this is also a distance), define $\alpha(x)=d_P(x,\mathcal N)^{-2}$, and let $\mathcal U$ be the largest open subset of $\mathbb P(\mathbb R^j)$ such that for every $x\in\mathcal U$ there is a unique closest point in $\mathcal N$ to $x$ for the distance $d_P$. Then:

Corollary 1. The map $\alpha:\mathcal U\setminus\mathcal N\to\mathbb R$ is self-convex.

The extension of Theorem 1 and Theorem 2 to other types of sets or functions is not obvious. In Example 1 we prove that $\alpha(A)=\sigma_1(A)^{-2}+\cdots+\sigma_n(A)^{-2}$ is not self-convex in $GL_{n,m}$. In Example 2 we take $\mathcal N\subset\mathbb R^2$ the unit circle and $\mathcal U$ the unit disk, so that $\mathcal U$ contains a point (the center) which has many closest points in $\mathcal N$. In that case the corresponding function $\alpha:\mathcal U\setminus\mathcal N\to\mathbb R$ is self-convex, but it fails to be smooth at the center of the disk. In Example 3 we provide an example of a submanifold $\mathcal N\subset\mathbb R^2$ such that the function $\alpha(x)=d(x,\mathcal N)^{-2}$ defined on $\mathbb R^2\setminus\mathcal N$ is not self-convex.

Our interest in considering the condition metric in the space of matrices comes from recent papers by Shub [8] and Beltrán-Shub [1], where these authors use the condition length along a path in certain solution varieties to estimate the step size for continuation methods following these paths. They give bounds on the number of steps required in terms of the condition length of the path. If geodesics in the condition metric are followed, the known bounds on polynomial system solving are vastly improved. To understand the properties of these geodesics we have begun, in this paper, with linear systems, where we can investigate their properties more deeply. We find self-convexity in the context of this paper remarkable. We do not know if similar issues may naturally arise in linear algebra, even for solving systems of linear equations. Similar issues do clearly arise when studying continuation methods for the eigenvalue problem.

2 Self-convexity

Let us first recall some basic definitions about convexity on Riemannian manifolds. A good reference on this subject is Udrişte [9].

Definition 2. We say that a function $f:\mathcal M\to\mathbb R$ is convex whenever
$$f(\gamma_{xy}(t))\le(1-t)f(x)+tf(y)$$
for every $x,y\in\mathcal M$, every geodesic arc $\gamma_{xy}$ joining $x$ and $y$, and $0\le t\le 1$.
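As a quick sanity check of Definitions 1 and 2 on the half-plane example given after Definition 1, one can sample a condition-metric geodesic and test the convexity of $\log\alpha$ numerically. The sketch below is our own illustration, not part of the paper; it assumes Python with NumPy, and it uses the standard unit-speed parametrization $x_n(s)=r/\cosh s$ of a hyperbolic geodesic supported on a semicircle of radius $r$ centered on the boundary $\{x_n=0\}$.

```python
import numpy as np

# Upper half-plane M = {x in R^n : x_n > 0}, alpha(x) = x_n**(-2).
# A unit-speed condition-metric geodesic on a semicircle of radius r
# centered on {x_n = 0} has last coordinate x_n(s) = r / cosh(s).
r = 3.0
s = np.linspace(-4.0, 4.0, 401)
xn = r / np.cosh(s)
f = np.log(xn ** (-2))                    # log(alpha) along the geodesic

# Discrete convexity test: all second differences should be nonnegative.
second_diff = f[:-2] - 2.0 * f[1:-1] + f[2:]
print(second_diff.min() >= -1e-12)        # expected: True
```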
The convexity of $f$ on $\mathcal M$ is equivalent to the convexity, in the usual sense, of $f\circ\gamma_{xy}$ on $[0,1]$ for every $x,y\in\mathcal M$ and every geodesic $\gamma_{xy}$ joining $x$ and $y$, or, equivalently, to the convexity of $f\circ\gamma$ for every geodesic $\gamma$ ([9, Chap. 3, Th. 2.2]). Thus, we see that:

Lemma 1. Self-convexity of a function $\alpha:\mathcal M\to\mathbb R$ is equivalent to the convexity of $\log\circ\alpha$ in the condition Riemannian manifold $\mathcal M_\kappa$.

When $f$ is a function of class $C^2$ on the Riemannian manifold $\mathcal M$, we define its second derivative $D^2f(x)$ as the second covariant derivative. It is a symmetric bilinear form on $T_x\mathcal M$. Note ([9, Chapter 1]) that if $x\in\mathcal M$ and $\dot x\in T_x\mathcal M$, and if $\gamma(t)$ is a geodesic in $\mathcal M$ with $\gamma(0)=x$, $\frac{d}{dt}\gamma(0)=\dot x$, then
$$D^2f(x)(\dot x,\dot x)=\frac{d^2}{dt^2}(f\circ\gamma)(0).$$
This second derivative depends on the Riemannian connection on $\mathcal M$. Since $\mathcal M$ is equipped with two different metrics, $\langle\cdot,\cdot\rangle$ and $\langle\cdot,\cdot\rangle_\kappa$, we have to distinguish between the corresponding second derivatives; they are denoted by $D^2f(x)$ and $D^2_\kappa f(x)$ respectively. No such distinction is necessary for the first derivative $Df(x)$.

Convexity on a Riemannian manifold is characterized as follows (see [9, Chap. 3, Th. 6.2]):

Proposition 1. A function $f:\mathcal M\to\mathbb R$ of class $C^2$ is convex if and only if $D^2f(x)$ is positive semidefinite for every $x\in\mathcal M$.

We use this proposition to obtain a characterization of self-convexity: $\alpha$ is self-convex if and only if the second derivative $D^2_\kappa(\log\circ\alpha)(x)$ is positive semidefinite for any $x\in\mathcal M_\kappa$. We get:

Proposition 2. For a function $\alpha:\mathcal M\to\mathbb R$ of class $C^2$ with positive values, self-convexity is equivalent to
$$2\alpha(x)D^2\alpha(x)(\dot x,\dot x)+\|D\alpha(x)\|_x^2\|\dot x\|_x^2-4(D\alpha(x)\dot x)^2\ge 0$$
for any $x\in\mathcal M$ and any vector $\dot x\in T_x\mathcal M$, the tangent space at $x$.

Proof. Let $x\in\mathcal M$ be given. Let $\varphi:\mathbb R^m\to\mathcal M$ be a coordinate system such that $\varphi(0)=x$, with first fundamental form $g_{ij}(0)=\delta_{ij}$ (Kronecker's delta) and Christoffel symbols $\Gamma^i_{jk}(0)=0$, and let $A=\alpha\circ\varphi$, so that $\alpha(x)=A(0)$. Such coordinates are called "normal" or "geodesic". Note that this implies
$$\frac{\partial g_{ij}}{\partial z_k}(0)=0\qquad\text{for all }i,j,k.$$
We denote by $g_{\kappa,ij}$ and $\Gamma^i_{\kappa,jk}$ respectively the first fundamental form and the Christoffel symbols for $\varphi$ in $\mathcal M_\kappa$. Let us compute them. Note that $g_{\kappa,ij}(z)=g_{ij}(z)A(z)$, so
$$\frac{\partial g_{\kappa,ij}}{\partial z_k}(0)=D(g_{ij}A)(0)(e_k)=g_{ij}(0)DA(0)(e_k)+A(0)Dg_{ij}(0)(e_k)=\delta_{ij}\frac{\partial A}{\partial z_k}(0).$$
Moreover,
$$\Gamma^i_{\kappa,jk}=\frac{1}{2A(0)}\left(\frac{\partial g_{\kappa,ik}}{\partial z_j}(0)+\frac{\partial g_{\kappa,ij}}{\partial z_k}(0)-\frac{\partial g_{\kappa,jk}}{\partial z_i}(0)\right)=\frac{1}{2A(0)}\left(\delta_{ij}\frac{\partial A}{\partial z_k}(0)+\delta_{ik}\frac{\partial A}{\partial z_j}(0)-\delta_{jk}\frac{\partial A}{\partial z_i}(0)\right).$$
That is,
$$\Gamma^i_{\kappa,ik}=\Gamma^i_{\kappa,ki}=\frac{1}{2A(0)}\frac{\partial A}{\partial z_k}(0)\ \text{for all }i,k,\qquad \Gamma^i_{\kappa,jj}=\frac{-1}{2A(0)}\frac{\partial A}{\partial z_i}(0)\ \text{for }j\ne i,\qquad \Gamma^i_{\kappa,jk}=0\ \text{otherwise}.$$
The second derivative of the composition of two maps $\mathcal M\xrightarrow{\,f\,}\mathbb R\xrightarrow{\,\psi\,}\mathbb R$ is given by the identity (see [9, Chap. 1.3, Hessian])
$$D^2(\psi\circ f)(x)=D\psi(f(x))D^2f(x)+\psi''(f(x))\,Df(x)\otimes Df(x),$$
where $Df(x)\otimes Df(x)$ is the bilinear form on $T_x\mathcal M$ given by $(Df(x)\otimes Df(x))(u,v)=Df(x)(u)\,Df(x)(v)$. In our context, that is when $f=\alpha$ and $\psi=\log$, this gives
$$D^2_\kappa(\log\circ\alpha)(x)=\frac{1}{\alpha(x)}D^2_\kappa\alpha(x)-\frac{1}{\alpha(x)^2}D\alpha(x)\otimes D\alpha(x).$$
According to Proposition 1, our objective is now to give a necessary and sufficient condition for $D^2_\kappa(\log\circ\alpha)(x)$ to be positive semidefinite for each $x\in\mathcal M$. In our system of local coordinates the components of $D^2\alpha(x)$ are (see [9, Chap. 1.3])
$$A_{jk}=\frac{\partial^2A}{\partial z_j\partial z_k}-\sum_i\Gamma^i_{jk}\frac{\partial A}{\partial z_i}=\frac{\partial^2A}{\partial z_j\partial z_k},$$
while the components of $D^2_\kappa\alpha(x)$ are
$$A_{\kappa,jk}=\frac{\partial^2A}{\partial z_j\partial z_k}-\sum_i\Gamma^i_{\kappa,jk}\frac{\partial A}{\partial z_i}.$$
If we replace the Christoffel symbols in this last sum by the values previously computed, we obtain, when $j=k$,
$$\sum_i\Gamma^i_{\kappa,jj}\frac{\partial A}{\partial z_i}=\Gamma^j_{\kappa,jj}\frac{\partial A}{\partial z_j}+\sum_{i\ne j}\Gamma^i_{\kappa,jj}\frac{\partial A}{\partial z_i}=\frac{1}{2A}\left(\frac{\partial A}{\partial z_j}\right)^2-\frac{1}{2A}\sum_{i\ne j}\left(\frac{\partial A}{\partial z_i}\right)^2=\frac{1}{A}\left(\frac{\partial A}{\partial z_j}\right)^2-\frac{1}{2A}\sum_i\left(\frac{\partial A}{\partial z_i}\right)^2,$$
while, when $j\ne k$,
$$\sum_i\Gamma^i_{\kappa,jk}\frac{\partial A}{\partial z_i}=\Gamma^j_{\kappa,jk}\frac{\partial A}{\partial z_j}+\Gamma^k_{\kappa,jk}\frac{\partial A}{\partial z_k}=\frac{1}{2A}\frac{\partial A}{\partial z_k}\frac{\partial A}{\partial z_j}+\frac{1}{2A}\frac{\partial A}{\partial z_j}\frac{\partial A}{\partial z_k}=\frac{1}{A}\frac{\partial A}{\partial z_j}\frac{\partial A}{\partial z_k}.$$
Both cases are subsumed in the identity
$$\sum_i\Gamma^i_{\kappa,jk}\frac{\partial A}{\partial z_i}=\frac{1}{A}\frac{\partial A}{\partial z_j}\frac{\partial A}{\partial z_k}-\frac{\delta_{jk}}{2A}\sum_i\left(\frac{\partial A}{\partial z_i}\right)^2.$$
Putting together all these identities gives the following expression for the components of $D^2_\kappa(\log\circ\alpha)(x)$:
$$D^2_\kappa(\log\circ\alpha)(x)_{jk}=\frac{1}{A}\left(\frac{\partial^2A}{\partial z_j\partial z_k}-\frac{1}{A}\frac{\partial A}{\partial z_j}\frac{\partial A}{\partial z_k}+\frac{\delta_{jk}}{2A}\sum_i\left(\frac{\partial A}{\partial z_i}\right)^2\right)-\frac{1}{A^2}\frac{\partial A}{\partial z_j}\frac{\partial A}{\partial z_k}$$
$$=\frac{1}{2A^2}\left(2A\frac{\partial^2A}{\partial z_j\partial z_k}+\delta_{jk}\sum_i\left(\frac{\partial A}{\partial z_i}\right)^2-4\frac{\partial A}{\partial z_j}\frac{\partial A}{\partial z_k}\right).$$
Thus $D^2_\kappa(\log\circ\alpha)(x)\ge 0$ if and only if
$$2\alpha(x)D^2\alpha(x)+\|D\alpha(x)\|_x^2\langle\cdot,\cdot\rangle_x-4\,D\alpha(x)\otimes D\alpha(x)$$
is positive semidefinite, that is, when
$$2\alpha(x)D^2\alpha(x)(\dot x,\dot x)+\|D\alpha(x)\|_x^2\|\dot x\|_x^2-4(D\alpha(x)\dot x)^2\ge 0$$
for any $x\in\mathcal M$ and any vector $\dot x\in T_x\mathcal M$. This finishes the proof.

An easy consequence of Proposition 2 is the following. See also Example 3.

Corollary 2. When a function $\alpha:\mathcal M\to\mathbb R$ of class $C^2$ is self-convex, any critical point of $\alpha$ has a positive semidefinite second derivative $D^2\alpha(x)$. Such a function cannot have a strict local maximum or a non-degenerate saddle.

Proposition 3. The following condition is equivalent to the self-convexity on $\mathcal M$ of a $C^2$ function $\alpha=1/\rho^2:\mathcal M\to\mathbb R$: for every $x\in\mathcal M$ and $\dot x\in T_x\mathcal M$,
$$\|\dot x\|^2\|D\rho(x)\|^2-(D\rho(x)\dot x)^2-\rho(x)D^2\rho(x)(\dot x,\dot x)\ge 0,$$
or, what is the same,
$$2\|\dot x\|^2\|D\rho(x)\|^2\ge D^2\rho^2(x)(\dot x,\dot x).$$

Proof. Note that
$$D\alpha(x)\dot x=-\frac{2}{\rho(x)^3}D\rho(x)\dot x,\qquad D^2\alpha(x)(\dot x,\dot x)=\frac{6}{\rho(x)^4}(D\rho(x)\dot x)^2-\frac{2}{\rho(x)^3}D^2\rho(x)(\dot x,\dot x).$$
Hence the necessary and sufficient condition of Proposition 2 reads
$$\frac{4}{\rho(x)^6}\|\dot x\|^2\|D\rho(x)\|^2-\frac{16}{\rho(x)^6}(D\rho(x)\dot x)^2+\frac{12}{\rho(x)^6}(D\rho(x)\dot x)^2-\frac{4}{\rho(x)^5}D^2\rho(x)(\dot x,\dot x)\ge 0,$$
and the proposition follows.

Corollary 3. Each of the following conditions is sufficient for a function $\alpha=1/\rho^2:\mathcal M\to\mathbb R$ to be self-convex at $x\in\mathcal M$: for every $\dot x\in T_x\mathcal M$,
$$D^2\rho(x)(\dot x,\dot x)\le 0,\qquad\text{or}\qquad\|D^2\rho^2(x)\|\le 2\|D\rho(x)\|^2.$$
In the following proposition we obtain a weaker condition on $\alpha$, which yields convexity in $\mathcal M_\kappa$ instead of self-convexity.

Proposition 4. $\alpha$ is convex in $\mathcal M_\kappa$ if and only if
$$2\alpha(x)D^2\alpha(x)(\dot x,\dot x)+\|D\alpha(x)\|_x^2\|\dot x\|_x^2-2(D\alpha(x)\dot x)^2\ge 0$$
for any $x\in\mathcal M$ and any vector $\dot x\in T_x\mathcal M$.

Proof. We follow the lines of the proof of Proposition 2, with $\psi$ equal to the identity map instead of $\psi=\log$.

3 Some general formulas for matrices

Proposition 5. Let $A=(\Sigma,0)\in GL^>_{n,m}$, where $\Sigma=\operatorname{diag}(\sigma_1\ge\cdots\ge\sigma_{n-1}>\sigma_n)\in\mathbb K^{n\times n}$. The map $\sigma_n:GL^>_{n,m}\to\mathbb R$ is a smooth map and, for every $U\in\mathbb K^{n\times m}$,
$$D\sigma_n(A)U=\operatorname{Re}(u_{nn}),\qquad D^2\sigma_n^2(A)(U,U)=2\sum_{j=1}^m|u_{nj}|^2-2\sum_{k=1}^{n-1}\frac{|u_{kn}\sigma_n+u_{nk}\sigma_k|^2}{\sigma_k^2-\sigma_n^2}.$$

Proof. Since $\sigma_n^2$ is an eigenvalue of $AA^*$ with multiplicity 1, the implicit function theorem proves the existence of smooth functions $\sigma_n^2(B)\in\mathbb R$ and $u(B)\in\mathbb K^n$, defined in an open neighborhood of $A$ and satisfying
$$BB^*u(B)=\sigma_n^2(B)u(B),\qquad\|u(B)\|^2=1,\qquad u(A)=e_n=(0,\ldots,0,1)^T\in\mathbb K^n,\qquad\sigma_n^2(A)=\sigma_n^2.$$
Differentiating these equations at $B$ gives, for any $U\in\mathbb K^{n\times m}$,
$$(UB^*+BU^*)u(B)+BB^*\dot u(B)=(\sigma_n^2)'u(B)+\sigma_n^2(B)\dot u(B),\qquad u(B)^*\dot u(B)=0,$$
with $\dot u(B)=Du(B)U$ and $(\sigma_n^2)'=D\sigma_n^2(B)U$. Pre-multiplying the first equation by $u(B)^*$ gives
$$u(B)^*(UB^*+BU^*)u(B)+u(B)^*BB^*\dot u(B)=(\sigma_n^2)'u(B)^*u(B)+\sigma_n^2(B)u(B)^*\dot u(B),$$
so that
$$D\sigma_n^2(B)U=(\sigma_n^2)'=2\operatorname{Re}(u(B)^*UB^*u(B))\qquad\text{and}\qquad D\sigma_n(B)U=\frac{\operatorname{Re}(u(B)^*UB^*u(B))}{\sigma_n(B)}.$$
The derivative of the eigenvector is now easy to compute:
$$Du(B)U=\dot u(B)=(\sigma_n^2(B)I_n-BB^*)^\dagger\big(UB^*+BU^*-(\sigma_n^2)'I_n\big)u(B),$$
where $(\sigma_n^2(B)I_n-BB^*)^\dagger$ denotes the generalized inverse (or Moore-Penrose inverse) of $\sigma_n^2(B)I_n-BB^*$. The second derivative of $\sigma_n^2$ at $B$ is given by
$$D^2\sigma_n^2(B)(U,U)=2\operatorname{Re}\big(\dot u(B)^*UB^*u(B)+u(B)^*UU^*u(B)+u(B)^*UB^*\dot u(B)\big)=2\operatorname{Re}\big(u(B)^*UU^*u(B)+u(B)^*(UB^*+BU^*)\dot u(B)\big)$$
$$=2\operatorname{Re}\big(u(B)^*UU^*u(B)+u(B)^*(UB^*+BU^*)(\sigma_n^2(B)I_n-BB^*)^\dagger(UB^*+BU^*-(\sigma_n^2)'I_n)u(B)\big).$$
Using $u(A)=e_n$ and $\sigma_n(A)=\sigma_n$ we get
$$D\sigma_n^2(A)U=2\operatorname{Re}(UA^*)_{nn}=2\sigma_n\operatorname{Re}(u_{nn}),\qquad D\sigma_n(A)U=\operatorname{Re}(u_{nn}),$$
and the second derivative is given by
$$D^2\sigma_n^2(A)(U,U)=2\operatorname{Re}\left((UU^*)_{nn}+\sum_{k=1}^{n-1}(UA^*+AU^*)_{nk}(\sigma_n^2-\sigma_k^2)^{-1}(UA^*+AU^*-(\sigma_n^2)'I_n)_{kn}\right)$$
$$=2\operatorname{Re}\left((UU^*)_{nn}+\sum_{k=1}^{n-1}\frac{|(UA^*+AU^*)_{kn}|^2}{\sigma_n^2-\sigma_k^2}\right)=2\sum_{j=1}^m|u_{nj}|^2-2\sum_{k=1}^{n-1}\frac{|u_{kn}\sigma_n+u_{nk}\sigma_k|^2}{\sigma_k^2-\sigma_n^2}.$$

Corollary 4. Let $A=(\Sigma,0)\in GL^>_{n,m}$, where $\Sigma=\operatorname{diag}(\sigma_1\ge\cdots\ge\sigma_{n-1}>\sigma_n>0)\in\mathbb K^{n\times n}$. Let us define $\rho(A)=\sigma_n(A)/\|A\|_F$. Then, for any $U\in\mathbb K^{n\times m}$ such that $\operatorname{Re}\langle A,U\rangle_F=0$, we have
$$D\rho(A)U=\frac{\operatorname{Re}(u_{nn})}{\|A\|_F},\qquad D^2\rho^2(A)(U,U)=\frac{2}{\|A\|_F^2}\left(\sum_{j=1}^m|u_{nj}|^2-\sum_{k=1}^{n-1}\frac{|u_{kn}\sigma_n+u_{nk}\sigma_k|^2}{\sigma_k^2-\sigma_n^2}-\frac{\|U\|_F^2}{\|A\|_F^2}\sigma_n^2\right).$$

Proof. Note that
$$D\rho(A)U=\frac{D\sigma_n(A)U\,\|A\|_F-\sigma_n(A)\frac{\operatorname{Re}\langle A,U\rangle_F}{\|A\|_F}}{\|A\|_F^2}=\frac{D\sigma_n(A)U}{\|A\|_F},$$
and the first assertion of the corollary follows from Proposition 5. For the second one, note that $h=h_1/h_2$ (for real valued $C^2$ functions $h,h_1,h_2$ with $h_2\ne 0$) implies
$$D^2h=\frac{h_2^2D^2h_1-h_1h_2D^2h_2-2h_2\,Dh_1\,Dh_2+2h_1(Dh_2)^2}{h_2^3}.$$
Now $\rho^2(A)=\sigma_n^2(A)/\|A\|_F^2$, $D(\|A\|_F^2)U=2\operatorname{Re}\langle A,U\rangle_F=0$, $D^2(\|A\|_F^2)(U,U)=2\|U\|_F^2$, and $D^2\sigma_n^2(A)(U,U)$ is known from Proposition 5. The formula for $D^2\rho^2(A)$ follows after some elementary calculations.

4 The affine linear case

We consider here the Riemannian manifold $\mathcal M=GL^>_{n,m}$ equipped with the usual Frobenius Hermitian product. Let $\alpha:GL^>_{n,m}\to\mathbb R$ be defined as $\alpha(A)=1/\sigma_n^2(A)$.

Corollary 5. The function $\alpha$ is self-convex in $GL^>_{n,m}$.

Proof. From Proposition 3, it suffices to see that
$$2\|U\|_F^2\|D\sigma_n(A)\|_F^2\ge D^2\sigma_n^2(A)(U,U).$$
Since unitary transformations are isometries of $GL^>_{n,m}$ with respect to the condition metric, we may suppose, via a singular value decomposition, that $A=(\Sigma,0)\in GL^>_{n,m}$, where $\Sigma=\operatorname{diag}(\sigma_1\ge\cdots\ge\sigma_{n-1}>\sigma_n)\in\mathbb K^{n\times n}$. Now the inequality to verify is obvious from Proposition 5, as $\|D\sigma_n(A)\|_F=1$ and
$$D^2\sigma_n^2(A)(U,U)=2\sum_{j=1}^m|u_{nj}|^2-2\sum_{k=1}^{n-1}\frac{|u_{kn}\sigma_n+u_{nk}\sigma_k|^2}{\sigma_k^2-\sigma_n^2}\le 2\sum_{j=1}^m|u_{nj}|^2\le 2\|U\|_F^2.$$

Corollary 6. Let $r>0$. The function $\alpha$ is self-convex in the sphere $S_r(GL^>_{n,m})$ of radius $r$ in $GL^>_{n,m}$.

Proof. It is enough to prove that any geodesic in $(S_r(GL^>_{n,m}),\alpha)$ is also a geodesic in $(GL^>_{n,m},\alpha)$. Indeed, suppose that $A$ and $B$ are matrices in $S_r(GL^>_{n,m})$ and that the minimal geodesic in $(GL^>_{n,m},\alpha)$ between $A$ and $B$ is $X(t)$, $a\le t\le b$. Then we claim that
$$L_\kappa\left(\frac{rX(t)}{\|X(t)\|_F}\right)\le L_\kappa(X(t)).$$
Indeed, for any $t$,
$$\frac{d}{dt}\left(\frac{rX(t)}{\|X(t)\|_F}\right)=r\frac{\frac{dX(t)}{dt}}{\|X(t)\|_F}-r\frac{X(t)\operatorname{Re}\big(\langle X(t),\frac{dX(t)}{dt}\rangle_F\big)}{\|X(t)\|_F^3},$$
so that
$$\left\|\frac{d}{dt}\left(\frac{rX(t)}{\|X(t)\|_F}\right)\right\|_F=\left(r^2\frac{\big\|\frac{dX(t)}{dt}\big\|_F^2}{\|X(t)\|_F^2}+r^2\frac{\operatorname{Re}\big(\langle X(t),\frac{dX(t)}{dt}\rangle_F\big)^2}{\|X(t)\|_F^4}-2r^2\frac{\operatorname{Re}\big(\langle X(t),\frac{dX(t)}{dt}\rangle_F\big)^2}{\|X(t)\|_F^4}\right)^{1/2}$$
$$=\left(r^2\frac{\big\|\frac{dX(t)}{dt}\big\|_F^2}{\|X(t)\|_F^2}-r^2\frac{\operatorname{Re}\big(\langle X(t),\frac{dX(t)}{dt}\rangle_F\big)^2}{\|X(t)\|_F^4}\right)^{1/2}\le r\frac{\big\|\frac{dX(t)}{dt}\big\|_F}{\|X(t)\|_F}.$$
Hence, using the homogeneity $\sigma_n(\lambda X)=\lambda\sigma_n(X)$ for $\lambda>0$,
$$\sigma_n^{-1}\left(\frac{rX(t)}{\|X(t)\|_F}\right)\left\|\frac{d}{dt}\left(\frac{rX(t)}{\|X(t)\|_F}\right)\right\|_F\le\frac{\|X(t)\|_F}{r}\,\sigma_n^{-1}(X(t))\cdot r\frac{\big\|\frac{dX(t)}{dt}\big\|_F}{\|X(t)\|_F}=\sigma_n^{-1}(X(t))\left\|\frac{dX(t)}{dt}\right\|_F,$$
that is,
$$\left\|\frac{d}{dt}\left(\frac{rX(t)}{\|X(t)\|_F}\right)\right\|_\kappa\le\left\|\frac{dX(t)}{dt}\right\|_\kappa.$$
Therefore $X(t)$ can only be a minimizing geodesic if it belongs to $S_r(GL^>_{n,m})$. Since all geodesics are locally minimizing geodesics, Corollary 6 follows.
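The formulas of Proposition 5 and the inequality used in the proof of Corollary 5 can be checked numerically. The sketch below is our own illustration, not part of the paper; it assumes Python with NumPy, and the helper names and test matrices are ours. It compares the closed formula for $D^2\sigma_n^2(A)(U,U)$ with a centered finite difference, and verifies that it does not exceed $2\|U\|_F^2$.

```python
import numpy as np

def d2_sigma_min_sq(Sigma, U):
    """Closed formula of Proposition 5 for D^2 sigma_n^2(A)(U,U), A = (Sigma, 0),
    in the real case."""
    s = np.diag(Sigma)
    n = len(s)
    val = 2.0 * np.sum(U[n - 1, :] ** 2)
    for k in range(n - 1):
        val -= 2.0 * (U[k, n - 1] * s[n - 1] + U[n - 1, k] * s[k]) ** 2 \
               / (s[k] ** 2 - s[n - 1] ** 2)
    return val

def sigma_min_sq(A):
    return np.linalg.svd(A, compute_uv=False)[-1] ** 2

n, m = 3, 5
Sigma = np.diag([3.0, 2.0, 1.0])               # sigma_1 > sigma_2 > sigma_3 > 0
A = np.hstack([Sigma, np.zeros((n, m - n))])   # A = (Sigma, 0)
rng = np.random.default_rng(2)
U = rng.standard_normal((n, m))

h = 1e-4                                        # centered second difference in t
fd = (sigma_min_sq(A + h * U) - 2 * sigma_min_sq(A) + sigma_min_sq(A - h * U)) / h ** 2
exact = d2_sigma_min_sq(Sigma, U)
print(fd, exact)                                # should agree to several digits
print(exact <= 2 * np.linalg.norm(U, "fro") ** 2)   # inequality of Corollary 5
```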
The following gives an example of a smooth function which is not self-convex in $GL_{n,m}$.

Example 1. For $n\ge 3$, the function $\alpha(A)=\sigma_1(A)^{-2}+\cdots+\sigma_n(A)^{-2}$ is not self-convex in $GL_{n,m}$.

Proof. For simplicity we consider the case of real square matrices. We have
$$\alpha(A)=\|A^{-1}\|_F^2,\qquad D\alpha(A)\dot A=-2\langle A^{-1},A^{-1}\dot AA^{-1}\rangle_F=-2\langle A^{-T}A^{-1}A^{-T},\dot A\rangle_F,$$
$$\|D\alpha(A)\|_F^2=4\|A^{-T}A^{-1}A^{-T}\|_F^2,\qquad D^2\alpha(A)(\dot A,\dot A)=2\|A^{-1}\dot AA^{-1}\|_F^2+4\langle A^{-1},A^{-1}\dot AA^{-1}\dot AA^{-1}\rangle_F.$$
Self-convexity of $\alpha$ in $GL_n$ implies convexity of $\alpha$ in the condition structure (since $\exp$ is increasing and convex), which, according to Proposition 4, is equivalent to
$$2\|A^{-1}\|_F^2\left(2\|A^{-1}\dot AA^{-1}\|_F^2+4\langle A^{-1},A^{-1}\dot AA^{-1}\dot AA^{-1}\rangle_F\right)+4\|\dot A\|_F^2\|A^{-T}A^{-1}A^{-T}\|_F^2-8\langle A^{-1},A^{-1}\dot AA^{-1}\rangle_F^2\ge 0.$$
This inequality is not satisfied when
$$A=\begin{pmatrix}1&0&0\\0&1&0\\0&0&2\end{pmatrix}\qquad\text{and}\qquad\dot A=\begin{pmatrix}0&1&0\\-1&0&0\\0&0&0\end{pmatrix}.$$

5 The homogeneous linear case

5.1 The complex projective space

The material of this subsection is mainly taken from Gallot-Hulin-Lafontaine [6, Sect. 2.A.5]. Let $V$ be a Hermitian space of complex dimension $\dim_{\mathbb C}V=d+1$. We denote by $\mathbb P(V)$ the corresponding projective space, that is, the quotient of $V\setminus\{0\}$ by the group $\mathbb C^*$ of dilations of $V$; $\mathbb P(V)$ is equipped with its usual smooth manifold structure, of complex dimension $\dim\mathbb P(V)=d$. We denote by $p$ the canonical surjection.

Let $V$ be considered as a real vector space of dimension $\dim_{\mathbb R}V=2d+2$, equipped with the scalar product $\operatorname{Re}\langle\cdot,\cdot\rangle_V$. The sphere $S(V)$ is a submanifold of $V$ of real dimension $2d+1$. This sphere, equipped with the induced metric, becomes a Riemannian manifold and, as usual, we identify the tangent space at $z\in S(V)$ with
$$T_zS(V)=\{u\in V:\operatorname{Re}\langle u,z\rangle_V=0\}.$$
The projective space $\mathbb P(V)$ can also be seen as the quotient $S(V)/S^1$ of the unit sphere in $V$ by the unit circle in $\mathbb C$, for the action $(\lambda,z)\in S^1\times S(V)\mapsto\lambda z\in S(V)$. The canonical map is denoted by $p_V:S(V)\to\mathbb P(V)$; $p_V$ is the restriction of $p$ to $S(V)$.

The horizontal space at $z\in S(V)$ related to $p_V$ is defined as the (real) orthogonal complement of $\ker Dp_V(z)$ in $T_zS(V)$. This horizontal space is denoted by $H_z$. Since $V$ decomposes into the (real) orthogonal sum $V=\mathbb Rz\oplus\mathbb R iz\oplus z^\perp$ and since $\ker Dp_V(z)=\mathbb R iz$ (the tangent space at $z$ to the circle $S^1z$), we get
$$H_z=z^\perp=\{u\in V:\langle u,z\rangle=0\}.$$
There exists on $\mathbb P(V)$ a unique Riemannian metric such that $p_V$ is a Riemannian submersion, that is, $p_V$ is a smooth submersion and, for any $z\in S(V)$, $Dp_V(z)$ is an isometry between $H_z$ and $T_{p(z)}\mathbb P(V)$. Thus, for this Riemannian structure, one has
$$\langle Dp_V(z)u,Dp_V(z)v\rangle_{T_{p(z)}\mathbb P(V)}=\operatorname{Re}\langle u,v\rangle_V$$
for any $z\in S(V)$ and $u,v\in H_z$.

Proposition 6. Let $z\in S(V)$ be given.
1. A chart at $p(z)\in\mathbb P(V)$ is defined by $\varphi_z:H_z\to\mathbb P(V)$, $\varphi_z(u)=p(z+u)$.
2. Its derivative at $0$ is the restriction of $Dp(z)$ to $H_z$: $D\varphi_z(0)=Dp(z):H_z\to T_{p(z)}\mathbb P(V)$, which is an isometry.
3. For any smooth mapping $\psi:\mathbb P(V)\to\mathbb R$ and any $v\in H_z$ we have
$$D\psi(p(z))(Dp(z)v)=D(\psi\circ\varphi_z)(0)v\qquad\text{and}\qquad D^2\psi(p(z))(Dp(z)v,Dp(z)v)=D^2(\psi\circ\varphi_z)(0)(v,v).$$

Proof. Assertions 1 and 2 are easy. We have $D(\psi\circ\varphi_z)(0)=D\psi(p(z))D\varphi_z(0)$, which gives the first identity of assertion 3 since $D\varphi_z(0)v=Dp(z)v$ for any $v\in H_z$. For the second derivative, recall that
$$D^2\psi(p(z))(Dp(z)v,Dp(z)v)=(\psi\circ\tilde\gamma)''(0),$$
where $\tilde\gamma$ is a geodesic curve in $\mathbb P(V)$ such that $\tilde\gamma(0)=p(z)$, $\tilde\gamma'(0)=Dp(z)v$. Now, consider the horizontal $p_V$-lift $\gamma$ of $\tilde\gamma$ to $S(V)$ with base point $z$. Note that $\gamma(0)=z$, $\gamma'(0)=v$. Hence,
$$(\psi\circ\tilde\gamma)''(0)=(\psi\circ p\circ\gamma)''(0)=D^2(\psi\circ p)(z)(v,v)+D\psi(p(z))Dp(z)\gamma''(0).$$
As $\gamma''(0)$ is orthogonal to $T_zS(V)$, we have $Dp(z)\gamma''(0)=0$.
Finally,
$$D^2(\psi\circ p)(z)(v,v)=(\psi\circ p(z+tv))''(0)=(\psi\circ\varphi_z(tv))''(0)=D^2(\psi\circ\varphi_z)(0)(v,v),$$
and the assertion on the second derivative follows.

The following result will be helpful.

Proposition 7. Let $\mathcal M_1,\mathcal M_2$ be Riemannian manifolds and let $\alpha_2:\mathcal M_2\to(0,\infty)$ be of class $C^2$. Let $\pi:\mathcal M_1\to\mathcal M_2$ be a Riemannian submersion. Let $\mathcal U_2\subseteq\mathcal M_2$ be an open set and assume that $\alpha_1=\alpha_2\circ\pi$ is self-convex in $\mathcal U_1=\pi^{-1}(\mathcal U_2)$. Then $\alpha_2$ is self-convex in $\mathcal U_2$.

Proof. Let $\mathcal M_{\kappa,1}$ be $\mathcal M_1$ endowed with the condition metric given by $\alpha_1$, and let $\mathcal M_{\kappa,2}$ be $\mathcal M_2$ endowed with the condition metric given by $\alpha_2$. Then $\pi:\mathcal M_{\kappa,1}\to\mathcal M_{\kappa,2}$ is also a Riemannian submersion. Now, let $\gamma_2:[a,b]\to\mathcal U_2\subseteq\mathcal M_{\kappa,2}$ be a geodesic, and let $\gamma_1\subseteq\mathcal M_{\kappa,1}$ be its horizontal lift by $\pi$. Then $\gamma_1$ is a geodesic in $\mathcal U_1\subseteq\mathcal M_{\kappa,1}$ (see [6, Cor. 2.109]) and hence $\log\alpha_1(\gamma_1(t))$ is a convex function of $t$. Now,
$$\log(\alpha_2(\gamma_2(t)))=\log(\alpha_2\circ\pi(\gamma_1(t)))=\log(\alpha_1(\gamma_1(t)))$$
is convex, as wanted.

Corollary 7. The function $\alpha_2:\mathbb P(GL^>_{n,m})\to\mathbb R$, $\alpha_2(A)=\|A\|_F^2\,\sigma_n^{-2}(A)$, is self-convex in $\mathbb P(GL^>_{n,m})$.

Proof. Note that $p:S(GL^>_{n,m})\to\mathbb P(GL^>_{n,m})$ is a Riemannian submersion and that $\alpha_2\circ p$ coincides on the unit sphere with the function $\alpha$ of Corollary 6 (with $r=1$). The corollary follows from Proposition 7.

5.2 The solution variety

Let us denote by $p_1$ and $p_2$ the canonical maps
$$p_1:S_1\to\mathbb P\big(\mathbb K^{n\times(n+1)}\big)\qquad\text{and}\qquad p_2:S_2\to\mathbb P\big(\mathbb K^{n+1}\big)=\mathbb P_n(\mathbb K),$$
where $S_1$ is the unit sphere in $\mathbb K^{n\times(n+1)}$ and $S_2$ is the unit sphere in $\mathbb K^{n+1}$. Consider the affine solution variety
$$\hat{\mathcal W}^>=\{(M,\zeta)\in S_1\times S_2:M\in GL^>_{n,n+1}\ \text{and}\ M\zeta=0\}.$$
It is a Riemannian manifold, equipped with the metric induced by the product metric on $\mathbb K^{n\times(n+1)}\times\mathbb K^{n+1}$. The tangent space to $\hat{\mathcal W}^>$ is given by
$$T_{(M,\zeta)}\hat{\mathcal W}^>=\{(\dot M,\dot\zeta)\in T_MS_1\times T_\zeta S_2:\dot M\zeta+M\dot\zeta=0\}.$$
The projective solution variety considered here is
$$\mathcal W^>=\{(p_1(M),p_2(\zeta))\in\mathbb P\big(\mathbb K^{n\times(n+1)}\big)\times\mathbb P_n(\mathbb K):M\in GL^>_{n,n+1}\ \text{and}\ M\zeta=0\},$$
which is also a Riemannian manifold, equipped with the metric induced by the product metric on $\mathbb P\big(\mathbb K^{n\times(n+1)}\big)\times\mathbb P_n(\mathbb K)$. Let us denote by $\pi_1$ the restriction to $\hat{\mathcal W}^>$ of the first projection $S_1\times S_2\to S_1$, and by $R:\hat{\mathcal W}^>\to\mathbb R$ the map $R=\sigma_n\circ\pi_1$. We have:

Lemma 2. Let $w=(M,\zeta)\in\hat{\mathcal W}^>$ and let $\gamma$ be a geodesic in $\hat{\mathcal W}^>$ with $\gamma(0)=w$. Then
$$D\sigma_n(\pi_1(w))(\pi_1\circ\gamma)''(0)<0.$$

Proof. Our problem is invariant under unitary change of coordinates. Hence, using a singular value decomposition, we can assume that $M=(\Sigma,0)\in GL^>_{n,n+1}$, where $\Sigma=\operatorname{diag}(\sigma_1\ge\cdots\ge\sigma_{n-1}>\sigma_n)\in\mathbb K^{n\times n}$, and $\zeta=e_{n+1}=(0,\ldots,0,1)^T\in S_2$. As $\gamma(t)=(M(t),\zeta(t))$ is a geodesic of $\hat{\mathcal W}^>\subseteq\mathbb K^{n\times(n+1)}\times\mathbb K^{n+1}$, $\gamma''(0)$ is orthogonal to $T_w\hat{\mathcal W}^>$, which contains all the pairs of the form $((A,0),0)$ where $A$ is an $n\times n$ matrix with $\operatorname{Re}\langle\Sigma,A\rangle=0$. Hence $M''(0)$ has the form $M''(0)=(a\Sigma,*)$ for some real number $a\in\mathbb R$. Finally, $M(t)$ is contained in the sphere, so $\|M(t)\|_F=1$ and
$$0=\big(\|M(t)\|_F^2\big)''(0)=2\|M'(0)\|_F^2+2\operatorname{Re}\langle M(0),M''(0)\rangle=2\|M'(0)\|_F^2+2a,$$
so that $a=-\|M'(0)\|_F^2$ and $(M''(0))_{nn}=-\|M'(0)\|_F^2\,\sigma_n$. From Proposition 5,
$$D\sigma_n(\pi_1(w))(\pi_1\circ\gamma)''(0)=\operatorname{Re}\big(((\pi_1\circ\gamma)''(0))_{nn}\big)=\operatorname{Re}\big((M''(0))_{nn}\big)<0.$$

Theorem 3. The map $\alpha:\hat{\mathcal W}^>\to\mathbb R$ given by $\alpha(M,\zeta)=\sigma_n(M)^{-2}$ is self-convex.

Proof. Using unitary invariance we can take $M=(\Sigma,0)\in GL^>_{n,n+1}$, where $\Sigma=\operatorname{diag}(\sigma_1\ge\cdots\ge\sigma_{n-1}>\sigma_n)\in\mathbb K^{n\times n}$, and $\zeta=e_{n+1}=(0,\ldots,0,1)^T\in S_2$. According to Proposition 3 we have to prove that
$$2\|\dot w\|_w^2\|DR(w)\|^2\ge D^2R^2(w)(\dot w,\dot w)$$
for every $w\in\hat{\mathcal W}^>$ and $\dot w\in T_w\hat{\mathcal W}^>$. From Proposition 5 we have $DR(w)\dot w=D\sigma_n(\pi_1(w))(D\pi_1(w)\dot w)=\operatorname{Re}\big((D\pi_1(w)\dot w)_{nn}\big)$, so that $\|DR(w)\|=1$. On the other hand, assume that $\dot w\ne 0$ and let $\gamma$ be a geodesic in $\hat{\mathcal W}^>$ with $\gamma(0)=w$, $\dot\gamma(0)=\dot w$.
From Lemma 2,
$$D^2R^2(w)(\dot w,\dot w)=(\sigma_n^2\circ\pi_1\circ\gamma)''(0)=D^2\sigma_n^2(\pi_1(w))(D\pi_1(w)\dot w,D\pi_1(w)\dot w)+2\sigma_nD\sigma_n(\pi_1(w))(\pi_1\circ\gamma)''(0)$$
$$<D^2\sigma_n^2(\pi_1(w))(D\pi_1(w)\dot w,D\pi_1(w)\dot w).$$
Thus we have to prove that, for $\dot y\in\mathbb K^{n\times(n+1)}$,
$$2\|\dot y\|^2\ge D^2\sigma_n^2(\pi_1(w))(\dot y,\dot y),$$
which is a consequence of Proposition 5.

Corollary 8. The map $\alpha_2:\mathcal W^>\to\mathbb R$ given by $\alpha_2(M,\zeta)=\|M\|_F^2/\sigma_n^2(M)$ is self-convex.

Proof. Consider the Riemannian submersion
$$p_1\times p_2:S_1\times S_2\to\mathbb P\big(\mathbb K^{n\times(n+1)}\big)\times\mathbb P_n(\mathbb K),\qquad (p_1\times p_2)(M,\zeta)=(p_1(M),p_2(\zeta)).$$
Note that $T_{(M,\zeta)}\hat{\mathcal W}^>$ contains the kernel of the derivative $D(p_1\times p_2)(M,\zeta)$. Thus the restriction $p_1\times p_2:\hat{\mathcal W}^>\to\mathcal W^>$ is also a Riemannian submersion. The corollary follows by combining Proposition 7 and Theorem 3.

6 Self-convexity of the distance from a submanifold of $\mathbb R^j$

Let $\mathcal N\subset\mathbb R^j$ be a $C^k$ submanifold without boundary, $k\ge 2$. Let us denote by
$$\rho(x)=d(x,\mathcal N)=\inf_{y\in\mathcal N}\|x-y\|$$
the distance from $x\in\mathbb R^j$ to $\mathcal N$ (here $d(x,y)=\|x-y\|$ denotes the Euclidean distance). Let $\mathcal U$ be the largest open set in $\mathbb R^j$ such that, for any $x\in\mathcal U$, there is a unique closest point in $\mathcal N$ to $x$. This point is denoted by $K(x)$, so that we have a map $K:\mathcal U\to\mathcal N$ with $\rho(x)=d(x,K(x))$. Classical properties of $\rho$ and $K$ are given in the following proposition (see also Foote [5], Li-Nirenberg [7]).

Proposition 8.
1. $\rho$ is defined and 1-Lipschitz on $\mathbb R^j$.
2. For any $x\in\mathcal U$, $x-K(x)$ is a vector normal to $\mathcal N$ at $K(x)$, i.e., $x-K(x)\in T_{K(x)}\mathcal N^\perp$.
3. $K$ is $C^{k-1}$ on $\mathcal U$.
4. $\rho^2$ is $C^k$ on $\mathcal U$, with
$$D\rho^2(x)\dot x=2\langle x-K(x),\dot x\rangle\qquad\text{and}\qquad D^2\rho^2(x)(\dot x,\dot x)=2\|\dot x\|^2-2\langle DK(x)\dot x,\dot x\rangle.$$
5. $\rho$ is $C^k$ on $\mathcal U\setminus\mathcal N$.
6. $\langle DK(x)\dot x,\dot x\rangle\ge 0$ for every $x\in\mathcal U$ and $\dot x\in\mathbb R^j$.

Proof. 1. For any $x$ and $y$ one has $\rho(x)=d(x,K(x))\le d(x,K(y))\le d(x,y)+d(y,K(y))=d(x,y)+\rho(y)$. Since $x$ and $y$ play symmetric roles we get $|\rho(x)-\rho(y)|\le d(x,y)$.

2. This is the classical first-order optimality condition in optimization.

3. This classical result may be derived from the inverse function theorem applied to the canonical map defined on the normal bundle of $\mathcal N$,
$$\operatorname{can}:N\mathcal N\to\mathbb R^j,\qquad\operatorname{can}(y,n)=y+n,$$
for $y\in\mathcal N$ and $n\in N_y\mathcal N=(T_y\mathcal N)^\perp$. The normal bundle is a $C^{k-1}$ manifold, the canonical map is a $C^{k-1}$ diffeomorphism when restricted to the set $\{(y,n):y+tn\in\mathcal U\ \text{for all}\ 0\le t\le 1\}$, and $K(x)$ is easily obtained from $\operatorname{can}^{-1}$.

4. The derivative of $\rho^2$ is $D\rho^2(x)\dot x=2\langle x-K(x),\dot x-DK(x)\dot x\rangle=2\langle x-K(x),\dot x\rangle$, because $DK(x)\dot x\in T_{K(x)}\mathcal N$ and $x-K(x)\in T_{K(x)}\mathcal N^\perp$. Thus $\nabla\rho^2(x)=2(x-K(x))$ is $C^{k-1}$ on $\mathcal U$, so that $\rho^2$ is $C^k$. The formula for $D^2\rho^2$ follows.

5. Obvious.

6. Let $x(t)$ be a curve in $\mathcal U$ with $x(0)=x$. Let us denote $\frac{dx(t)}{dt}=\dot x(t)$, $\frac{d^2x(t)}{dt^2}=\ddot x(t)$, $y(t)=K(x(t))$, $\frac{dy(t)}{dt}=\dot y(t)$ and $\frac{d^2y(t)}{dt^2}=\ddot y(t)$. From the first-order optimality condition we get
$$\langle x(t)-y(t),\dot y(t)\rangle=0,$$
whose derivative at $t=0$ is $\langle\dot x-\dot y,\dot y\rangle+\langle x-y,\ddot y\rangle=0$. Thus
$$\langle DK(x)\dot x,\dot x\rangle=\langle\dot y,\dot x\rangle=\langle\dot y,\dot y\rangle-\langle x-y,\ddot y\rangle.$$
This last quantity is equal to $\frac{1}{2}\frac{d^2}{dt^2}\|x-y(t)\|^2\big|_{t=0}$. It is nonnegative by the second-order optimality condition.

Proof of Theorem 2 and Corollary 1. We are now able to prove our second main theorem. Let us denote $\alpha(x)=1/\rho(x)^2$. We shall prove that $\alpha$ is self-convex on $\mathcal U\setminus\mathcal N$. From Proposition 3 it suffices to prove that, for every $\dot x\in\mathbb R^j$,
$$2\|\dot x\|^2\|D\rho(x)\|^2\ge D^2\rho^2(x)(\dot x,\dot x),$$
or, according to Proposition 8.4 and since $\|D\rho(x)\|=1$, that
$$2\|\dot x\|^2\ge 2\|\dot x\|^2-2\langle DK(x)\dot x,\dot x\rangle.$$
This is obvious from Proposition 8.6.

Now we prove Corollary 1. Let $S_1(\mathbb R^j)$ be the sphere of radius 1 in $\mathbb R^j$ and let $p_{\mathbb R^j}$ denote the canonical projection $p_{\mathbb R^j}:\mathbb R^j\setminus\{0\}\to\mathbb P(\mathbb R^j)$. Note that the preimage of $\mathcal N$ by $p_{\mathbb R^j}$ satisfies
$$d\big(y,p_{\mathbb R^j}^{-1}(\mathcal N)\big)=d_P\big(p_{\mathbb R^j}(y),\mathcal N\big)\,\|y\|.$$
As in the proof of Corollary 6, the mapping $1/\rho(x)^2$ is self-convex in the set $S_1(\mathbb R^j)\cap p_{\mathbb R^j}^{-1}(\mathcal U)$. Now apply Proposition 7 to the Riemannian submersion $p_{\mathbb R^j}$ to conclude the corollary.

Two examples.

Example 2. Take $\mathcal U$ the unit disk in $\mathbb R^2$ and $\mathcal N$ the unit circle. The corresponding function is given by $\alpha(x)=d(x,\mathcal N)^{-2}=1/(1-\|x\|)^2$. According to Theorem 2, the map $\log\alpha(x)$ is convex along the condition geodesics in
$$\mathcal U\setminus\{(0,0)\}=\{x\in\mathbb R^2:0<\|x\|<1\}.$$
This property also holds in $\mathcal U$: a geodesic through the origin is a ray $x(t)=(-1+e^t)(\cos\theta,\sin\theta)$ for $-\infty<t\le 0$ and $x(t)=(1-e^{-t})(\cos\theta,\sin\theta)$ for $0\le t<\infty$, for some $\theta$. In that case $\log\alpha(x(t))=2|t|$, which is convex.

Example 3. Take $\mathcal N\subset\mathbb R^2$ equal to the union of the two points $(-1,0)$ and $(1,0)$. In that case
$$\alpha(x)^{-1}=d(x,\mathcal N)^2=\min\big((1+x_1)^2+x_2^2,\ (1-x_1)^2+x_2^2\big).$$
It may be shown that, for any $0<a\le 1/10$, the straight line segment is the only minimizing geodesic joining the points $(0,-a)$ and $(0,a)$. Since $\log\alpha(0,t)=-\log(1+t^2)$ has a strict maximum at $t=0$, the function $t\mapsto\alpha(0,t)$, $-a\le t\le a$, cannot be log-convex. Here $\{0\}\times\mathbb R$ is the locus of points of $\mathbb R^2$ equidistant from the two points of $\mathcal N$, which is the set we avoid in Theorem 2.

References

[1] C. Beltrán and M. Shub, Complexity of Bézout's Theorem VII: Distance Estimates in the Condition Metric, Foundations of Computational Mathematics, 9 (2009), pp. 179-195.

[2] P. Boito and J.-P. Dedieu, The condition metric in the space of full rank rectangular matrices, preprint, http://www.math.univ-toulouse.fr/~dedieu/Boito-Dedieu-future.pdf

[3] F. H. Clarke, Optimization and Nonsmooth Analysis, Les Publications CRM, 1989. ISBN 2-921120-01-1.

[4] J. W. Demmel, The probability that a numerical analysis problem is difficult, Mathematics of Computation, 50 (1988), pp. 449-480.

[5] R. Foote, Regularity of the distance function, Proceedings of the AMS, 92 (1984), pp. 153-155.

[6] S. Gallot, D. Hulin, and J. Lafontaine, Riemannian Geometry, Springer, 2004. ISBN 9780387524016.

[7] Y. Li and L. Nirenberg, Regularity of the distance function to the boundary, Rendiconti Accad. Naz. delle Sc., 123 (2005), pp. 257-264.

[8] M. Shub, Complexity of Bézout's Theorem VI: Geodesics in the Condition Metric, Foundations of Computational Mathematics, 9 (2009), pp. 171-178.

[9] C. Udrişte, Convex Functions and Optimization Methods on Riemannian Manifolds, Kluwer, 1994. ISBN 0-7923-3002-1.