Solving Underdetermined Nonlinear Equations by Newton-Like Method
Abstract The Newton method is one of the most powerful methods for finding solutions of nonlinear equations. In its classical form it is applied to systems of n equations with n variables. However, it can be modified for underdetermined equations (with m < n, m being the number of equations). Theorems on solvability of such equations, as well as conditions for convergence and rate of convergence of Newton-like methods, are addressed in the paper. The results are applied to systems of quadratic equations, one-dimensional equations and inequalities.
Keywords nonlinear equations · quadratic equations · existence of solution · Newton method · underdetermined equations · feasibility
B. Polyak
Institute for Control Sciences, Profsoyuznaya 65, 117997 Moscow, Russia;
Skolkovo Institute of Science and Technology, Skolkovo Innovation Center Building 3, 143026 Moscow, Russia. E-mail: boris@ipu.ru
A. Tremba
Institute for Control Sciences, Profsoyuznaya 65, 117997 Moscow, Russia. E-mail: atremba@ipu.ru

1 Introduction
and references see [8, 23, 19]. The basic version of the Newton method for (1) is applicable when $g(x)$ is differentiable and $g'(x)$ is invertible (this implies $m = n$):
$$x^{k+1} = x^k - \alpha_k z^k, \quad k = 0, 1, \ldots, \qquad z^k = \arg\min_z \{\|z\| : g'(x^k) z = g(x^k) - y\}. \qquad (3)$$
If $m = n$ and $g'(x^k)^{-1}$ exists, the method coincides with the classical Newton method for $\alpha_k = 1$ and with the damped Newton method for $\alpha_k = \alpha < 1$. Starting at $x^0 = 0$, the latter method converges to $x^*$ under some additional constraints on $\|y\|, L, \rho, \mu, \alpha$ (see Theorem 1 below for rigorous conditions). Notice that the norms in $\mathbb{R}^n$, $\mathbb{R}^m$ can be chosen arbitrarily, and they imply different forms of the Newton method (3) and various conditions on solvability and convergence.
The first goal of the present paper is to give explicit expressions for the method (3) under various norms and to provide simple, easily checkable conditions for its convergence. This also yields existence theorems: we characterize a feasible set $Y$ such that $y \in Y$ implies solvability of (1).
The second goal is to develop constructive algorithms for choosing the step-sizes $\alpha_k$ to achieve fast and as global as possible convergence. We suggest different strategies for constructing algorithms and study their properties.
We also examine some special cases of the nonlinear equations. One of them is the quadratic case, when all components of $g$ are quadratic functions:
$$g_i(x) = \tfrac{1}{2}(A_i x, x) + (b_i, x), \quad A_i = A_i^T, \; b_i \in \mathbb{R}^n. \qquad (4)$$
In this case we try to specialize the above results and design algorithms to check feasibility of a vector $y \in \mathbb{R}^m$. The next important case is the scalar one, i.e. $m = 1$. We specialize the general results to scalar equations and inequalities; the arising algorithms have much in common with unconstrained minimization methods. Finally, we discuss nonlinear equations having some special structure, for which the convergence results can be strongly enhanced.
A few words on comparison with known results. The paper which contains the results closest to ours is [14]. Nesterov addresses the same problem (1), and his method (in our notation) has the form
$$x^{k+1} = x^k - z^k, \quad k = 0, 1, \ldots, \qquad z^k = \arg\min_z \{\|g(x^k) - y + g'(x^k) z\| + M \|z\|^2\},$$
2 Preliminaries
First of all, let us specify the subproblem of finding the vector $z^k$ in (3) for different norms of $x \in \mathbb{R}^n$.
$$Az = b, \quad b \in \mathbb{R}^m, \; z \in \mathbb{R}^n. \qquad (5)$$
The Lemma claims that the matrix $A$ has full row rank equal to $m$ provided (6) holds. Equivalently, the mapping $A : \mathbb{R}^n \to \mathbb{R}^m$ is onto, i.e. it covers the whole image space. In the case of Euclidean norms the parameter $\mu_0$ is the smallest singular value of the matrix, $\mu_0 = \sigma_m(A)$ (we denote the singular values of a matrix in $\mathbb{R}^{m \times n}$ in decreasing order as $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_m$).
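As an illustration, in the Euclidean case the minimum-norm solution of (5) can be computed with a standard least-squares routine. The following is only a sketch under the full-row-rank assumption above; the function name is ours, not the paper's.

```python
import numpy as np

def min_norm_step(A, b):
    """Minimum Euclidean-norm solution z of A z = b, A of full row rank m <= n.

    For a consistent underdetermined system np.linalg.lstsq returns exactly
    arg min { ||z||_2 : A z = b }.
    """
    z, *_ = np.linalg.lstsq(A, b, rcond=None)
    return z

# toy usage: 2 equations, 4 unknowns
A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 3.0, 1.0]])
b = np.array([1.0, 2.0])
z = min_norm_step(A, b)
print(np.allclose(A @ z, b))                    # True: the constraint holds
print(np.linalg.svd(A, compute_uv=False)[-1])   # mu_0 = sigma_m(A)
```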
Finally, we introduce the sums of double exponential functions $H_k : [0,1) \to \mathbb{R}_+$, $H_k(\delta) = \sum_{\ell=k}^{\infty} \delta^{(2^\ell)}$ (cf. [16]), and the inverse function of the first of them, $\Delta(\cdot) : \mathbb{R}_+ \to [0,1)$, such that $\Delta(H_0(\delta)) \equiv \delta$, $\delta \in [0,1)$. All functions $H_k(\delta)$ are monotonically increasing in $\delta$. In the results below we also use the specific constant
$$c = H_0\!\left(\tfrac{1}{2}\right) \approx 0.8164, \quad \text{s.t. } \Delta(c) = \tfrac{1}{2}.$$
The following upper and lower approximations appear to be useful for simplifying the resulting expressions in the Theorems:
$$H_k(\delta) \le \frac{\delta^{(2^k)}}{1 - \delta^{(2^k)}} = \frac{1}{\delta^{-(2^k)} - 1}, \qquad \Delta(H) \ge \frac{H}{1+H}. \qquad (7)$$
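These definitions are easy to check numerically; below is a small sketch (the truncation length and bisection tolerance are our choices, not the paper's).

```python
import numpy as np

def H(k, delta, terms=60):
    """Truncated sum H_k(delta) = sum_{l >= k} delta**(2**l); the tail is negligible."""
    return sum(delta ** min(2.0 ** l, 1e18) for l in range(k, k + terms))

def Delta(H_value, tol=1e-12):
    """Inverse of H_0 on [0, 1): find delta with H_0(delta) = H_value by bisection."""
    lo, hi = 0.0, 1.0 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if H(0, mid) < H_value:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

c = H(0, 0.5)
print(c)          # ~0.8164, the constant c = H_0(1/2)
print(Delta(c))   # ~0.5, since Delta(H_0(delta)) = delta
```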
Proof We apply algorithm (3) with $\alpha > 0$ small enough and $x^0 = 0$. The algorithm is well defined: condition B and Lemma 1 imply the existence of solutions $z^k$ provided that $x^k \in B$; this is true for $k = 0$ and will be validated recurrently. The standard formula
$$g(x + z) = g(x) + \int_0^1 g'(x + tz)\, z \, dt,$$
applied with $x = x^k$ and $z = -\alpha z^k$, yields for $u_k = \|g(x^k) - y\|$
$$u_{k+1} \le |1 - \alpha| u_k + \frac{L \alpha^2}{2} \|z^k\|^2.$$
Now condition B and Lemma 1 transform this estimate into
$$u_{k+1} \le |1 - \alpha| u_k + \frac{L \alpha^2 u_k^2}{2\mu^2}.$$
Choose $\alpha = \varepsilon \frac{2\mu^2}{L u_0}\left(1 - \frac{u_0}{\mu\rho}\right)$ with small $\varepsilon < 1$ satisfying $0 < \alpha < 1$; this is possible due to condition C. From the above inequality we get
$$u_{k+1} \le u_k \left(1 - \alpha + \alpha \varepsilon \frac{u_k}{u_0}\left(1 - \frac{u_0}{\mu\rho}\right)\right).$$
For $k = 0$ this implies $u_1 < u_0$ and, recurrently, $u_{k+1} < u_k$. We also get $u_{k+1} \le q u_k$ with $q = 1 - \alpha + \alpha\varepsilon\left(1 - \frac{u_0}{\mu\rho}\right) < 1$. Thus $u_k \le q^k u_0$ and $u_k \to 0$ for $k \to \infty$.
On the other hand we have $\|x^{k+1} - x^k\| = \alpha\|z^k\| \le \|g(x^k) - y\|/\mu = u_k/\mu \le q^k u_0/\mu$. Hence for any $k, s$ and for $k \to \infty$
$$\|x^{k+s} - x^k\| \le \sum_{i=k}^{k+s-1} \|x^{i+1} - x^i\| \le q^k \frac{u_0}{(1-q)\mu} \to 0.$$
It is worth noting that if we apply the pure Newton method (i.e. take $\alpha_k \equiv 1$), the conditions for its convergence are more restrictive: we need $\|y\| \le \frac{2\mu^2}{L}$, that is, we guarantee only local convergence even for $\rho = \infty$.
In this case our method (3) reduces to the classical Newton method (2).
For $m < n$, equation (1) in the general case has many solutions; we have constructed just one of them, and we cannot guarantee that this solution is the closest one to the initial point $x^0 = 0$.
Among the assumptions of Theorem 1 the main difficulty is Assumption B. In many examples $L$ can be estimated relatively easily; this is not the case for $\mu$. It is much simpler to check its analog at a single point $x = 0$. Let us modify the solvability result in this way, with assumptions
B$'$. The following inequality holds for some $\mu_0 > 0$:
$$\|g'(0)^T h\|_* \ge \mu_0 \|h\|_*, \quad \forall h \in \mathbb{R}^m.$$
C$'$. $\|y\| < \frac{\mu_0^2}{4L}$, $\quad \frac{\mu_0}{2L} \le \rho$.
Theorem 2 If conditions A, B$'$, C$'$ hold then there exists a solution $x^*$ of (1), and $\|x^*\| \le \frac{2\|y\|}{\mu_0}$.
Proof For any linear operator $A$ and its dual $A^*$ (coinciding with $A^T$ in real-valued vector spaces) it holds that $\|A\| = \|A^*\|_*$, where the dual operator norm is induced by the dual norms [7]. Then condition A implies the same Lipschitz constant for the transposed derivative operator $g'(\cdot)^T$:
$$\|(g'(x_a) - g'(x_b))^T\|_* = \|g'(x_a) - g'(x_b)\| \le L\|x_a - x_b\|, \quad x_a, x_b \in B.$$
Denote $r = \frac{\mu_0}{2L}$; due to the second part of Condition C$'$ the ball $B_r = \{x : \|x\| \le r\}$ lies within $B$. In $B_r$ condition A evidently holds. Also in $B_r$ we have
$$\|g'(x)^T h\|_* \ge \|g'(0)^T h\|_* - \|(g'(x) - g'(0))^T h\|_* \ge (\mu_0 - L\|x\|)\|h\|_* \ge (\mu_0 - Lr)\|h\|_* \ge \frac{\mu_0}{2}\|h\|_*,$$
thus condition B holds with $\mu = \frac{\mu_0}{2}$. Condition C is also satisfied on this ball due to the first part of Condition C$'$, as $\|y\| < \frac{\mu_0^2}{4L} = r\mu$, and Theorem 1 holds (with $r$ instead of $\rho$ in its statement). ⊓⊔
It is worth noting that the result of Theorem 2 is always local, even for $\rho = \infty$ (compare with Corollary 1). From the proof it also follows that the second condition $\frac{\mu_0}{2L} \le \rho$ in Assumption C$'$ is non-restrictive.
Corollary 3 Assumption C$'$ can be replaced with $\|y\| \le (\mu_0 - Lr_*)\, r_*$, resulting in the theorem's statement $\|x^*\| \le r_*$, where $r_* = \min\{\rho, \frac{\mu_0}{2L}\}$.
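For Euclidean norms the solvability test of Theorem 2 is easy to automate. Below is a small sketch; the Lipschitz constant L is assumed to be supplied by the user, and the function name is ours.

```python
import numpy as np

def solvable_by_theorem2(J0, y, L, rho=np.inf):
    """Check the sufficient condition of Theorem 2 (Euclidean norms).

    J0  : Jacobian g'(0), an (m x n) array with m <= n,
    y   : right-hand side of g(x) = y,
    L   : Lipschitz constant of g'(x) on the ball B,
    rho : radius of the ball B.
    Returns (condition_holds, guaranteed bound on the norm of a solution).
    """
    mu0 = np.linalg.svd(J0, compute_uv=False)[-1]   # smallest singular value sigma_m
    cond = (np.linalg.norm(y) < mu0**2 / (4 * L)) and (mu0 / (2 * L) <= rho)
    bound = 2 * np.linalg.norm(y) / mu0 if cond else None
    return cond, bound
```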
4 Main algorithms
For our purposes it is more convenient to write the main equation not in the form (1) (where we assumed $g(0) = 0$, $x^0 = 0$) but as
$$P(x) = 0, \quad P : \mathbb{R}^n \to \mathbb{R}^m. \qquad (8)$$
Then, via a trivial change of variables, the Newton method becomes
$$z^k = \arg\min_{P'(x^k) z = P(x^k)} \|z\|, \qquad x^{k+1} = x^k - \alpha_k z^k, \quad k = 0, 1, \ldots$$
B$''$. The following inequality holds for all $x \in B$ and some $\mu > 0$:
If A$''$, B$''$ hold true, we have the same recurrent inequalities for $u_k = \|P(x^k)\|$:
$$u_{k+1} \le |1 - \alpha_k| u_k + \frac{L \alpha_k^2 \|z^k\|^2}{2}, \qquad (9)$$
$$u_{k+1} \le |1 - \alpha_k| u_k + \frac{L \alpha_k^2 u_k^2}{2\mu^2}, \qquad (10)$$
the second one being just a continuation of the first one based on the estimate $\|z^k\| \le u_k/\mu$; compare the calculations in the proof of Theorem 1. Now we can minimize the right-hand sides of these inequalities over $\alpha_k$; it is natural to expect that such a choice of step-size implies the fastest convergence of $u_k$ to zero and thus the fastest convergence of the iterates $x^k$ to a solution. If one applies such a policy based on inequality (10), the optimal $\alpha$ depends on $\mu$, $L$ (Algorithm 1 below). The constant $\mu$ is hard to estimate in most applications, thus the method would be hard to implement. Fortunately, we can modify the algorithm using parameter adjustment (Algorithm 2). On the other hand, the same policy based on (9) requires just the value $L$, which is commonly available (Algorithm 3).
Thus we arrive at an algorithm which we call a Newton method, while in fact it blends pure Newton and damped Newton steps with a special rule for damping. In some respects it resembles the Newton method for minimization of self-concordant functions [13].
$$x^{k+1} = x^k - \min\left\{1, \frac{\mu^2}{L\|P(x^k)\|}\right\} z^k, \quad k \ge 0. \qquad (11)$$
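A compact sketch of this iteration for Euclidean norms, with the minimum-norm step computed by least squares; the stopping rule, iteration cap and parameter names are our own assumptions, and $\mu$, $L$ are user-supplied estimates.

```python
import numpy as np

def newton_algorithm1(P, Pprime, x0, mu, L, tol=1e-10, max_iter=100):
    """Newton-like iteration (11): x_{k+1} = x_k - min(1, mu^2/(L*||P(x_k)||)) * z_k,
    where z_k is the minimum-norm solution of P'(x_k) z = P(x_k)."""
    x = np.asarray(x0, dtype=float)
    beta = mu**2 / L                       # threshold separating damped and pure steps
    for _ in range(max_iter):
        Pk = P(x)
        uk = np.linalg.norm(Pk)
        if uk <= tol:
            break
        z, *_ = np.linalg.lstsq(Pprime(x), Pk, rcond=None)  # min-norm z: P'(x) z = P(x)
        alpha = min(1.0, beta / uk)        # damped step while u_k > beta, pure Newton after
        x = x - alpha * z
    return x

# toy usage: one quadratic equation in two unknowns, x1^2 + x2^2 - 1 = 0
P = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0])
Pprime = lambda x: np.array([[2 * x[0], 2 * x[1]]])
print(newton_algorithm1(P, Pprime, x0=[2.0, 1.0], mu=1.0, L=2.0))
```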
Its optimum over $\alpha$ is at $\alpha_k = \frac{\beta}{u_k} < 1$ if $\frac{\beta}{u_k} < 1$ (i.e. $u_k > \beta$), and at $\alpha_k = 1$ otherwise.
During Stage 1 of damped Newton steps ($\alpha_k < 1$) the target functional monotonically decreases as
$$u_{k+1} \le u_k - \frac{\beta}{2}. \qquad (15)$$
There are at most $k_{\max} = \max\{0, \frac{2u_0}{\beta} - 2\}$ iterations in this phase, say $\bar{k}$ of them, resulting in $u_{\bar{k}} \le \beta$. As soon as $u_k$ reaches the threshold $\beta$, the algorithm switches to Stage 2 (pure Newton steps). Then the recurrent relation (15) becomes
$$u_{k+1} \le \frac{1}{2\beta} u_k^2 = 2\beta \left(\frac{u_k}{2\beta}\right)^2, \quad k \ge \bar{k},$$
so we can write
$$u_{\bar{k}+\ell} \le 2\beta \left(\frac{u_{\bar{k}}}{2\beta}\right)^{(2^\ell)} \le 2\beta \left(\frac{1}{2}\right)^{(2^\ell)}, \quad \ell \ge 0.$$
For the second phase $\|x^{i+1} - x^i\| = \|z^i\| \le \frac{1}{\mu} u_i$ due to Lemma 1, and for $\ell_2 \ge \ell_1 \ge 0$ it holds that
$$\|x^{\bar{k}+\ell_2} - x^{\bar{k}+\ell_1}\| \le \sum_{i=\ell_1}^{\ell_2-1} \|x^{\bar{k}+i+1} - x^{\bar{k}+i}\| \le \frac{2\beta}{\mu}\left[H_{\ell_1}\!\left(\frac{u_{\bar{k}}}{2\beta}\right) - H_{\ell_2}\!\left(\frac{u_{\bar{k}}}{2\beta}\right)\right]. \qquad (16)$$
The sequence $\{x^k\}$ is a Cauchy sequence because $H_j\!\left(\frac{u_{\bar{k}}}{2\beta}\right) \le H_j\!\left(\frac{1}{2}\right) \to 0$ as $j \to \infty$. It converges to a point $x^*$ with $\|P(x^*)\| = \lim_{k\to\infty} u_k = 0$ due to continuity of $P$, with
$$\|x^{\bar{k}+\ell} - x^*\| \le \frac{2\beta}{\mu} H_\ell\!\left(\frac{u_{\bar{k}}}{2\beta}\right) \le \frac{2\beta}{\mu} H_\ell\!\left(\frac{1}{2}\right), \quad \ell \ge 0. \qquad (17)$$
Next we are to estimate the distance from the points $x^k$ in Stage 1 to the limit solution point $x^*$. The one-step distance is bounded by a constant, $\|x^{k+1} - x^k\| \le \frac{\alpha_k}{\mu} u_k = \frac{\beta}{\mu}$, $k < \bar{k}$, and
$$\|x^k - x^*\| \le \|x^{\bar{k}} - x^*\| + \sum_{i=k}^{\bar{k}-1} \|x^{i+1} - x^i\| \le \frac{\beta}{\mu}(\bar{k} - k + 2c), \quad k < \bar{k}. \qquad (18)$$
Note that this formula also coincides with the upper bound (17) at $k = \bar{k}$. The exact number $\bar{k}$ of steps in the first phase is not known, but we can replace it with the upper bound $k_{\max}$ in all estimates (15)–(18). Substituting $\beta = \frac{\mu^2}{L}$ back, we arrive at the Theorem 3 bounds (14).
$^1$ The relation is very similar to the Newton method analysis of [16], with $\gamma = |1 - \alpha|$, $\lambda = \frac{\alpha}{\mu}$.
Finally, we are to ensure our primary assumption that the algorithm-generated points $x^k$ stay within $B$. This is guaranteed by one of two conditions, depending on whether the Algorithm starts with Stage 1 or Stage 2 steps.
In the first case $\|P(x^0)\| \ge \frac{\mu^2}{L}$, and $\|x^0 - x^k\|$ can be bounded similarly to (16):
$$\|x^0 - x^k\| \le \sum_{i=0}^{k-1} \|x^{i+1} - x^i\| \le \sum_{i=0}^{\bar{k}-1} \|x^{i+1} - x^i\| + \sum_{i=\bar{k}}^{\infty} \|x^{i+1} - x^i\| \le$$
$$\le \frac{\mu}{L}(\bar{k} + 2c) \le \frac{\mu}{L}(k_{\max} + 2c) = \frac{\mu}{L}\left(\frac{2L}{\mu^2}\|P(x^0)\| - 2 + 2c\right).$$
It is sufficient to satisfy $\|P(x^0)\| \le \frac{\mu^2}{L}\left(1 + \frac{1}{2}\left(\frac{L\rho}{\mu} - 2c\right)\right)$ for guaranteeing $\|x^0 - x^k\| \le \rho$. As we assumed $\|P(x^0)\| \ge \frac{\mu^2}{L}$ in this case, we conclude that this condition may hold only if $\frac{L\rho}{\mu} \ge 2c$. This results in (12b).
In the second case we have $\|P(x^0)\| < \frac{\mu^2}{L}$, and the algorithm makes pure Newton steps with $\alpha_k \equiv 1$ from the beginning. Then $\bar{k} = 0$, and from (16) it follows that
$$\|x^k - x^0\| \le \frac{2\mu}{L}\left(H_0\!\left(\frac{\|P(x^0)\|}{2\beta}\right) - H_k\!\left(\frac{\|P(x^0)\|}{2\beta}\right)\right) \le \frac{2\mu}{L} H_0\!\left(\frac{L\|P(x^0)\|}{2\mu^2}\right), \quad k \ge 0.$$
The inequality $\|x^0 - x^k\| \le \rho$, $k \ge 0$, is satisfied if
$$\|P(x^0)\| \le \frac{2\mu^2}{L} \Delta\!\left(\frac{L\rho}{2\mu}\right).$$
In order to have $\|P(x^0)\| < \frac{\mu^2}{L}$, the argument value $\frac{L\rho}{2\mu}$ must be less than $c$. This results in condition (12a). Altogether (12) ensures $x^k \in B$, $k \ge 0$, and the bounds (14). ⊓⊔
The result on the rate of convergence means, roughly speaking, that after no more than $k_{\max}$ iterations one has very fast (quadratic) convergence. For good initial approximations $k_{\max} = 0$, and pure Newton steps are performed from the very start.
If we use the approximation bounds (7), then condition (12a) can be replaced with the simpler one:
$$\|P(x^0)\| \le \frac{2\mu^2}{L}\left(1 + \frac{2\mu}{L\rho}\right)^{-1}, \quad \text{if } \frac{L\rho}{2\mu} \le c. \qquad (19)$$
Also (14d) can be roughly estimated as
$$\|x^k - x^*\| \le \frac{2\mu}{L} \frac{1}{2^{(2^{k-k_{\max}})} - 1}, \quad k \ge k_{\max},$$
or by the even simpler bound $\|x^k - x^*\| \le 2.32 \cdot 2^{-2^{(k-k_{\max}-1)}} \frac{\mu}{L}$.
The presented Algorithm 1 explicitly uses two constants $\mu$ and $L$, but both enter the algorithm through one parameter $\beta = \frac{\mu^2}{L}$. There is a simple modification allowing one to adaptively change an estimate of this parameter.
The input of the algorithm is an initial point $x^0$, an approximation $\beta_0$ and a parameter $0 < q < 1$.
$$u_{k+1} = \|P(x^k - \alpha_k z^k)\|.$$
2. If either
$$\alpha_k < 1 \ \text{ and } \ u_{k+1} < \left(1 - \frac{\alpha_k}{2}\right) u_k,$$
or
$$\alpha_k = 1 \ \text{ and } \ u_{k+1} < \frac{1}{2} u_k$$
holds, then go to Step 4. Otherwise
3. apply the update rule $\beta_k \leftarrow q \beta_k$ and return to Step 1 without increasing the counter.
4. Take
$$x^{k+1} = x^k - \alpha_k z^k,$$
set $\beta_{k+1} = \beta_k$, increase the counter $k \leftarrow k + 1$, and go to Step 1.
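A sketch of this adaptive scheme for Euclidean norms. Step 1, which computes $z^k$ and $\alpha_k$ from the current $\beta_k$, is reconstructed from the surrounding text and Algorithm 1, so treat it as our assumption; the underflow guard is also ours.

```python
import numpy as np

def newton_algorithm2(P, Pprime, x0, beta0, q=0.5, tol=1e-10, max_iter=200):
    """Adaptive variant of Algorithm 1: beta_k plays the role of mu^2/L and is
    decreased (Step 3) whenever the trial step fails the decrease test (Step 2)."""
    x, beta, k = np.asarray(x0, dtype=float), float(beta0), 0
    while k < max_iter and beta > 1e-16:
        Pk = P(x)
        uk = np.linalg.norm(Pk)
        if uk <= tol:
            break
        # Step 1 (our reconstruction): min-norm z_k and step-size from the current beta_k
        z, *_ = np.linalg.lstsq(Pprime(x), Pk, rcond=None)
        alpha = min(1.0, beta / uk)
        u_next = np.linalg.norm(P(x - alpha * z))            # trial value u_{k+1}
        # Step 2: sufficient-decrease test
        if (alpha < 1 and u_next < (1 - alpha / 2) * uk) or (alpha == 1 and u_next < 0.5 * uk):
            x, k = x - alpha * z, k + 1                      # Step 4: accept, keep beta
        else:
            beta *= q                                        # Step 3: shrink beta, same k
    return x
```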
Let us mention two other versions of the adaptive Newton method. The first one uses an increasing update (e.g. $\beta_{k+1} = q_2 \beta_k$ with $q_2 > 1$) at the end of Step 4, thus adapting the constant to the current $x^k$. Other decrease policies can also be applied to $\beta_k$ in Step 3.
The second option is to use line-search or Armijo-like rules for choosing the step-size $\alpha_k$ to minimize the objective function $\|P(x^k - \alpha z^k)\|$ directly. Rigorous validation of these algorithms can be provided.
$$x^{k+1} = x^k - \min\left\{1, \frac{\|P(x^k)\|}{L\|z^k\|^2}\right\} z^k, \quad k \ge 0.$$
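This rule needs only $L$. A minimal sketch of one such step, reusing the minimum-norm direction from above (names are ours):

```python
import numpy as np

def algorithm3_step(P, Pprime, x, L):
    """One step of the L-only rule: alpha_k = min(1, ||P(x_k)|| / (L ||z_k||^2))."""
    Pk = P(x)
    z, *_ = np.linalg.lstsq(Pprime(x), Pk, rcond=None)
    alpha = min(1.0, np.linalg.norm(Pk) / (L * np.linalg.norm(z)**2))
    return x - alpha * z
```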
distance (21), and eventually leads to the more conservative condition (20). Surprisingly enough, in practice the algorithm (and its adaptive variant) sometimes converges faster than Algorithm 1, because the direction-wise (along $z^k$) Lipschitz constant is less than or equal to the uniform Lipschitz constant of Assumption A$''$, and the convergence rate can be better.
The idea of an adaptive algorithm with estimates $L_k$ works for Algorithm 3 as well, including its modifications with increasing $L_k$ and/or line-search over $\alpha$.
For comparison, we specify the convergence property of the pure Newton method ($\alpha_k = 1$).
Theorem 4 Let conditions A$''$, B$''$ hold. If $\delta = \frac{L}{2\mu^2}\|P(x^0)\| < 1$ and $\frac{2\mu}{L} H_0(\delta) \le \rho$, then the pure Newton method converges to a solution $x^*$ of (8), and
$$\|P(x^k)\| \le \frac{2\mu^2}{L}\, \delta^{(2^k)}, \qquad \|x^k - x^*\| \le \frac{2\mu}{L} H_k(\delta).$$
It coincides with Corollary 1 of [16], proven in a Banach space setup (a misprint in [16] is corrected here). In the case $m = n$ the result is a minor extension of Mysovskikh's theorem [7].
Most of the theorems above require the quite strict Assumption B$''$ or similar ones. In fact, usually we can estimate $\mu$ at one point, say $x^0$, but not on the entire ball $B$. Originally, Kantorovich-type results on the Newton method were proved in such a "single-point condition" setup, and we proceed to such results as a corollary of the more general case.
Let $\mu_0$ satisfy condition (6) with $A = P'(x^0)$. Following the same reasoning as in the proof of Theorem 2, we can estimate the constants $\mu$ on balls $B_r$ with variable radius:
$$\mu(r) \ge \mu_0 - rL.$$
We added the dependence on $r$ to indicate that for arbitrary $r \in [0, \frac{\mu_0}{L})$ there is a corresponding constant $\mu > 0$.
If assumptions of type A or A$''$ hold on $B_\rho$, then Theorem 3 and its extensions can be rewritten in terms of $\mu(r)$ and $r$ instead of $\rho$. This is done similarly to Corollary 3. Optimization over the interval $r \in [0, \frac{\mu_0}{L}) \cap [0, \rho]$ gives the maximal allowed range of $\|P(x^0)\|$. The simplest choice is $r = \frac{\mu_0}{2L}$; then we can take $\mu = \frac{\mu_0}{2}$. We omit the obvious versions of the Algorithms and convergence results for this case. Two examples of such theorems are given in Subsection 5.4, for the case where A$''$, B$''$ hold everywhere. We found that both Algorithm 1 and Algorithm 3 in this case are subject to the same upper bound and, in fact, start with Stage 2 of the pure Newton method.
5 Special Cases
In this section we outline a few important cases in more detail, namely solving equations with special structure, solving scalar equations and inequalities, and solvability of quadratic equations.
It is not hard to see that Assumptions A$''$, B$''$ hold on the entire space $\mathbb{R}^n$ and Algorithm 1 converges, with Theorem 3 and Corollary 1 providing the rate of convergence. The rate of convergence depends on estimates for $L$, $\mu$, which can be written as functions of $\mu_\varphi$, $M$ and the singular values of the matrix $C$ with rows $c_i$. However, the special structure of the problem allows one to get much sharper results.
Indeed $P'(x) = D(x)C$, $D(x) = \mathrm{diag}(\varphi'(c_i^T x))$, and the main inequality (16) can be proved to be
$$u_{k+1} \le (1 - \alpha) u_k + \gamma \frac{\alpha^2 u_k^2}{2}, \qquad \gamma = \frac{M}{\mu_\varphi^2}.$$
Hence $u_{k+1} \le u_k - \frac{1}{2\gamma}$ at Stage 1, and this inequality does not depend on $C$! As a result we get estimates for the rate of convergence which are the same for ill-conditioned and well-conditioned matrices $C$.
This example is just an illustrative one (an explicit solution of the problem can easily be found), but it emphasizes the role of special structure in the equations to be solved.
$$f(x) = 0, \quad f : \mathbb{R}^n \to \mathbb{R}.$$
$$z^k = \frac{f(x^k)}{\|\nabla f(x^k)\|_\infty}\, e_i, \quad i = \arg\max_j |\nabla f(x^k)_j|, \quad \text{in case of the } \ell_1\text{-norm},$$
$$z^k = \frac{f(x^k)}{\|\nabla f(x^k)\|_2^2}\, \nabla f(x^k), \quad \text{in case of the Euclidean norm},$$
$$z^k = \frac{f(x^k)}{\|\nabla f(x^k)\|_1}\, \mathrm{sign}(\nabla f(x^k)), \quad \text{in case of the } \ell_\infty\text{-norm},$$
where $e_j = (0, \ldots, 0, 1, 0, \ldots, 0)^T$ is the $j$-th coordinate vector, and $\mathrm{sign}(\cdot)$ is the coordinate-wise sign function, $\mathrm{sign} : \mathbb{R}^n \to \{-1, 1\}^n$.
The constants $\mu$ (and $\mu_0$) are also calculated explicitly via the conjugate (dual) vector norm: $\mu = \min_{x \in B} \|\nabla f(x)\|_*$, $\mu_0 = \|\nabla f(x^0)\|_*$. For any norm, $\|z^k\| = |f(x^k)|/\|\nabla f(x^k)\|_*$, and a damped Newton step is performed iff $\|\nabla f(x^k)\|_*^2 < L|f(x^k)|$; otherwise a pure Newton step is made.
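A sketch of the scalar-case step for the three norms; the switch between damped and pure steps follows the rule just stated (the L-only step-size of Algorithm 3), and the helper names are ours.

```python
import numpy as np

def scalar_newton_step(f, grad_f, x, L, norm="l2"):
    """One step for the scalar equation f(x) = 0, f: R^n -> R, under different norms of x."""
    x = np.asarray(x, dtype=float)
    g, fx = np.asarray(grad_f(x), dtype=float), float(f(x))
    if fx == 0.0:
        return x
    if norm == "l1":
        # one-coordinate step: minimum ell_1-norm z solving grad_f(x)^T z = f(x)
        i = int(np.argmax(np.abs(g)))
        z = np.zeros_like(g)
        z[i] = fx / g[i]
        dual = np.abs(g).max()                 # dual (ell_inf) norm of the gradient
    elif norm == "l2":
        z = fx / np.dot(g, g) * g              # minimum Euclidean-norm direction
        dual = np.linalg.norm(g)
    else:  # "linf"
        z = fx / np.abs(g).sum() * np.sign(g)  # minimum ell_inf-norm direction
        dual = np.abs(g).sum()                 # dual (ell_1) norm of the gradient
    # damped step iff ||grad f||_*^2 < L |f(x)|, otherwise a pure Newton step
    alpha = min(1.0, dual**2 / (L * abs(fx)))
    return x - alpha * z

# toy usage: f(x) = x1^2 + sin(x2) - 1
f = lambda x: x[0]**2 + np.sin(x[1]) - 1.0
grad_f = lambda x: np.array([2 * x[0], np.cos(x[1])])
x = np.array([2.0, 0.5])
for _ in range(8):
    x = scalar_newton_step(f, grad_f, x, L=2.5, norm="l2")
print(f(x))   # close to zero
```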
If we choose the $\ell_1$ norm, the method becomes a coordinate-wise one. Thus, if we start with $x^0 = 0$ and perform a few steps (e.g. if we are in the domain of attraction of the pure Newton algorithm), we arrive at a sparse solution of the equation.
In the Euclidean case a Stage 1 step (damped Newton) of Algorithm 1 is
$$x^{k+1} = x^k - \frac{1}{L}\,\mathrm{sign}(f(x^k))\, \nabla f(x^k),$$
which is exactly a gradient minimization step for the function $|f(x)|$. A Stage 2 (pure Newton) step is
$$x^{k+1} = x^k - \frac{f(x^k)}{\|\nabla f(x^k)\|_2^2}\, \nabla f(x^k).$$
This resembles the well-known subgradient method for minimization of convex functions. However, in our case we do not assume any convexity properties, and the direction may be either the gradient or the anti-gradient, in contrast with minimization methods!
The one-dimensional inequality
$$f(x) \le 0, \quad f : \mathbb{R}^n \to \mathbb{R},$$
can be efficiently solved as well. Denote the set of points where the inequality is violated by $S = \{x : f(x) > 0\}$. Suppose that $\mu = \min_{x \in S} \|\nabla f(x)\|_* > 0$ and $L$ is a Lipschitz constant for $\nabla f(x)$ on $S$. Then Algorithm 1 can be applied with the only change that if $f(x^k) \le 0$ (i.e. $x^k \in \mathbb{R}^n \setminus S$), then we have obtained the solution and the algorithm stops. Thus we arrive at the following method (for the $\ell_2$ norm):
1. If $f(x^k) \le 0$, then stop: a solution is found.
2. If $\|\nabla f(x^k)\|_2^2 < L f(x^k)$, then $x^{k+1} = x^k - \frac{1}{L} \nabla f(x^k)$.
3. Otherwise $x^{k+1} = x^k - \frac{f(x^k)}{\|\nabla f(x^k)\|_2^2} \nabla f(x^k)$; increase $k$ and return to Step 1.
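A direct transcription of these three steps for the Euclidean norm; this is a sketch, and the iteration cap and example are our own additions.

```python
import numpy as np

def solve_inequality(f, grad_f, x0, L, max_iter=500):
    """Find x with f(x) <= 0 by damped/pure Newton steps on the scalar residual."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx = f(x)
        if fx <= 0:                       # Step 1: feasible point found
            return x
        g = grad_f(x)
        if np.dot(g, g) < L * fx:         # Step 2: damped (gradient-like) step
            x = x - g / L
        else:                             # Step 3: pure Newton step
            x = x - fx / np.dot(g, g) * g
    return x                              # may still be infeasible if max_iter is hit

# toy usage: f(x) = ||x||^2 - 1 <= 0 (unit ball), started outside
f = lambda x: float(np.dot(x, x) - 1.0)
grad_f = lambda x: 2.0 * x
print(solve_inequality(f, grad_f, x0=[3.0, 4.0], L=2.0))
```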
We proceed to a specific nonlinear equation, namely the quadratic one. Then the function $g(x)$ may be written componentwise as (4), with gradients
$$\nabla g_i(x) = A_i x + b_i \in \mathbb{R}^n, \quad i = 1, \ldots, m,$$
$$g'(x) = \begin{pmatrix} \nabla g_1(x)^T \\ \vdots \\ \nabla g_m(x)^T \end{pmatrix} = \begin{pmatrix} x^T A_1 + b_1^T \\ \vdots \\ x^T A_m + b_m^T \end{pmatrix} \in \mathbb{R}^{m \times n}.$$
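For the Euclidean case these formulas translate directly into code; a small sketch, where stacking the $A_i$ into a 3-D array is our choice of representation.

```python
import numpy as np

def quadratic_map(A, b, x):
    """g_i(x) = 0.5 * x^T A_i x + b_i^T x for a stack A of m symmetric n x n matrices."""
    return 0.5 * np.einsum('i,mij,j->m', x, A, x) + b @ x

def quadratic_jacobian(A, b, x):
    """g'(x): row i equals (A_i x + b_i)^T, an m x n matrix."""
    return np.einsum('mij,j->mi', A, x) + b

# toy usage: m = 2 equations, n = 3 unknowns
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3, 3)); A = (A + A.transpose(0, 2, 1)) / 2  # symmetrize
b = rng.standard_normal((2, 3))
x = rng.standard_normal(3)
print(quadratic_map(A, b, x).shape, quadratic_jacobian(A, b, x).shape)  # (2,) (2, 3)
```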
from [18], where $\lambda_{\max}$ is the maximal eigenvalue of a matrix. Other estimates can be obtained via an elaborate convex semidefinite optimization problem (SDP), cf. [22] for details. We obtain the following consequence of Theorem 2 for Euclidean norms.
Theorem 5 Suppose that the matrix $H$ has rank $m$, and $\mu_0 > 0$ is its smallest singular value. For the quadratic function $g$ the equation $g(x) = y$ has a solution $x^*(y)$ for all
$$\|y\| < \frac{\mu_0^2}{4L}, \qquad (22)$$
with $\|x^*(y)\| \le \frac{\mu_0}{2L}$.
$$\|y\| \le s_1 \frac{\mu_0^2}{L}, \quad s_1 \approx 0.1877178,$$
with $\|x^*(y)\| \le t_1 \frac{\mu_0}{L}$, $t_1 \approx 0.40100511$, where the constants $s_1$ and $t_1$ are the maximum and the maximizer of the function $S(t) = 2(1-t)^2 \Delta\!\left(\frac{t}{2(1-t)}\right)$, $t \in [0, \frac{1}{2}]$.
Proof As $\rho = \infty$, Theorem 3 is valid for any $r \le \frac{\mu_0}{L}$ with $\mu(r) = \mu_0 - Lr$. We are to estimate a value of $r$ with the maximal allowed bound (12) on $\|P(x^0)\|$.
First assume that $r \le \frac{2c}{2c+1}\frac{\mu_0}{L} \approx 0.62 \frac{\mu_0}{L}$. Then the function $S(t)$ is a representation of the upper bound (12a) normalized by the multiplier $\frac{\mu_0^2}{L}$ and written via the variable $t = \frac{L}{\mu_0} r$. The function is unimodal with maximum $s_1 = S(t_1)$. A direct check of the second case $r \in \left(\frac{2c}{2c+1}\frac{\mu_0}{L}, \frac{\mu_0}{L}\right]$ and the corresponding bounds (12b), (20), with respect to the substitution $\mu \leftarrow \mu(r)$, $\rho \leftarrow r$, reveals a bound on $\|P(x^0)\|$ smaller than $s_1 \frac{\mu_0^2}{L}$.
$$\|y\| \le \frac{3}{16}\frac{\mu_0^2}{L}.$$
We have discussed above how Newton-like methods can be applied to solving one inequality. Below we address some tricks to convert systems of inequalities into systems of equations.
$$g_i(x) \le 0, \quad i = 1, \ldots, m, \quad x \in \mathbb{R}^\ell,$$
$$g_i(x) + x_{\ell+i}^2 = 0, \quad i = 1, \ldots, m, \quad x \in \mathbb{R}^n, \; n = \ell + m.$$
$$\sum_{j=1}^n A_{i,j} z_j^2 = b_i, \quad i = 1, \ldots, m, \quad z \in \mathbb{R}^n.$$
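A small sketch of the slack-variable conversion of the inequalities into equations (the wrapper name is ours):

```python
import numpy as np

def inequalities_to_equations(g, m, ell):
    """Turn g_i(x) <= 0, x in R^ell, into P_i(x) = g_i(x[:ell]) + x[ell+i]^2 = 0, x in R^(ell+m)."""
    def P(x):
        x_orig, slack = x[:ell], x[ell:ell + m]
        return g(x_orig) + slack**2
    return P

# toy usage: two linear inequalities in R^2 become two equations in R^4
g = lambda x: np.array([x[0] + x[1] - 1.0, x[0] - x[1]])
P = inequalities_to_equations(g, m=2, ell=2)
print(P(np.array([0.2, 0.3, 1.0, 0.5])))   # residuals of the equation system
```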
6 Numerical tests
$$P_i(x) = \varphi((c_i, x) - b_i) - y_i, \quad x \in \mathbb{R}^n, \; y \in \mathbb{R}^m,$$
where
$$\varphi(t) = \frac{t}{1 + e^{-|t|}}, \qquad \varphi'(t) = \frac{1 + (1 + |t|)e^{-|t|}}{(1 + e^{-|t|})^2}.$$
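These two formulas are easy to sanity-check numerically (a sketch; the finite-difference grid is our choice):

```python
import numpy as np

def phi(t):
    return t / (1.0 + np.exp(-np.abs(t)))

def phi_prime(t):
    e = np.exp(-np.abs(t))
    return (1.0 + (1.0 + np.abs(t)) * e) / (1.0 + e)**2

# finite-difference check of phi' and of the lower bound on its value
t = np.linspace(-10, 10, 2001)
fd = np.gradient(phi(t), t)
print(np.max(np.abs(fd - phi_prime(t))))   # small discretization error
print(phi_prime(t).min())                  # ~0.5, attained at t = 0
```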
The matrix $C$ with rows $c_i$ and the vectors $b$, $y$ were generated randomly. For the function $\varphi(t)$ we have $\mu_\varphi = \min_t \varphi'(t) \ge 0.5$ and $M = \max_t |\varphi''(t)| \le 2$. Thus, if we do not pay attention to the special structure of the problem, we have $\mu \ge 0.5\,\sigma_{\min}(C)$, $L \le 2\,\sigma_{\max}(C)$, and the convergence can be slow, because the matrix $C$ can be ill-conditioned.
On the other hand, if we take into account the structure and replace $\frac{\mu^2}{L}$ ($= 0.0031$ in the example) in Algorithm 3 (see Subsection 4.3) with $\frac{\mu_\varphi^2}{M} = 0.125$, much faster convergence is achieved. The simulation results in Figure 1 confirm this conclusion. Most iterations are generally spent in the first phase of the Algorithm, and their number is close to the theoretical bound $N \approx k_{\max}(\beta)$, (13).
We also tested the adaptive Algorithm 2 from the same initial point, and it performed better for a bigger initial $\beta_0 = 5$, as shown in Figure 2.
Acknowledgements The authors thank Yuri Nesterov for helpful discussions and references.
References
24. Yu, S., Nguyen, H.D., Turitsyn, K.S.: Simple certificate of solvability of power flow
equations for distribution systems. Power & Energy Society General Meeting, IEEE
(2015)