BolognaniZampieri NECSYS2010
• h(g(q), q) = 0;
• ∃ q̄ such that v = 0 is exponentially stable for (1) with constant input q(t) = q̄ (and g(q̄) = 0).
Assume that each node is capable of measuring the i-th
element vi of v.
Let us consider a quadratic cost function of q

F(q) = ½ qᵀMq + mᵀq,   M > 0   (2)

whose gradient ∇F(q) = Mq + m coincides with the function g(q) given above. In other words, by measuring the steady-state response of the system (1) to the constant input q, the nodes can estimate (element-wise) the gradient of the cost function (2) in q. Notice that as g(q̄) = 0, then q̄ = arg min F(q).

In this work we focus on the problem of designing a distributed algorithm for the minimization of (2) subject to a linear equality constraint, that is solving

min_q F(q)   subject to   aᵀq = b.   (3)

2.1 Communication constraints

Communication between agents is constrained to be consistent with a given communication graph G = (V, E), where E ⊆ V × V is a set of edges (i, j), with i, j ∈ V. Therefore, for every node i, there exists a set of neighbors Ni = {j ∈ V | (j, i) ∈ E} ⊆ V, which is the set of agents from which node i can gather data. We assume that (i, i) ∈ E for all i, and that G is strongly connected.

3. OPTIMAL REACTIVE POWER COMPENSATION IN MICROGRIDS

As a motivating example, we introduce in this section the problem of optimal reactive power compensation in power distribution networks.

Let us define a smart microgrid as a portion of the electrical power distribution network that connects to the larger distribution grid (or to the transmission grid) in one point, and that is managed autonomously from the rest of the network. In particular, ancillary services like reactive power compensation, voltage support, and voltage profile quality enhancement are taken care of by some microgrid controllers, whose objective is to provide a high quality of service to the microgrid users while satisfying some constraints on how the microgrid interfaces with the larger grid. We focus here on reactive power compensation.

Consider the steady state of the network, when voltages and currents are sinusoidal signals at frequency ω0/2π in any point (at any port) of the network:

ui(t) = Ui sin(ω0 t + θi^u),   ii(t) = Ii sin(ω0 t + θi^i).

Fig. 1. Schematic representation of the controller structure proposed in Tedeschi et al. (2008).

Both residential and industrial users may require a sinusoidal current which is not in phase with the voltage. A convenient description for that consists in stating that they demand reactive power together with active power, associated with the out-of-phase and in-phase components of the current, respectively. Precisely, the active power p and the reactive power q delivered at node i are

pi(t) = Ui Ii cos φ,   qi(t) = Ui Ii sin φ,

where φ is the phase difference θi^u − θi^i.

These power terms can be defined also in the case in which the signals are not sinusoidal, and in which they can be considered perturbed versions of periodic signals with period T0 = 2π/ω0. Following Tenti and Mattavelli (2003), we define the homo-integral of a generic function x(t) as

x̂(t) = ω0 (X(t) − X̄(t)),

where X(t) = ∫_0^t x(τ) dτ and X̄(t) = (1/T0) ∫_t^{t+T0} X(τ) dτ.

By introducing the scalar product (function of time)

⟨x, y⟩_t = (1/T0) ∫_t^{t+T0} x(τ) y(τ) dτ   (4)

we can then define active and reactive powers in this more general framework as

pi(t) = ⟨u, i⟩_t,   qi(t) = ⟨û, i⟩_t.

Like active power flows, reactive power flows contribute to power losses on the transmission and distribution lines, cause voltage drops in the network, and may lead to grid instability (see Kundur (1994) and references therein).

Reactive power is not a "real" physical power, meaning that to produce it there is no energy conversion involved, nor fuel costs. It is therefore preferable to minimize reactive power flows by producing reactive power as close as possible to the users that need it.

One possible approach has been proposed in Tedeschi et al. (2008), and is sketched in Figure 1. It consists in a centralized controller that measures the reactive power flow at the input port of the microgrid, i.e. where the microgrid connects with the main grid. According to this
Fig. 3. The subset Fi, defined as the set of nodes containing the root for which the edge i is the only bridge.
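The power definitions of Section 3 can be checked numerically. The following standalone sketch (illustrative code, not from the paper; the 50 Hz frequency, amplitudes, and phases are arbitrary choices) evaluates the scalar product (4) and the homo-integral for sinusoidal u and i, recovering the familiar expressions (Ui Ii/2) cos φ and (Ui Ii/2) sin φ, where the factor 1/2 appears because Ui, Ii here denote amplitudes rather than RMS values:

```python
import numpy as np

# Sketch: verify p = <u, i>_t and q = <u_hat, i>_t for sinusoidal signals.
# All numerical values (50 Hz grid, amplitudes, phases) are illustrative only.
w0 = 2 * np.pi * 50.0                # omega_0
T0 = 2 * np.pi / w0                  # period T_0 = 2*pi/omega_0
U, I, th_u, th_i = 230.0, 10.0, 0.3, -0.4
phi = th_u - th_i                    # phase difference theta^u - theta^i

n = 4000
dt = T0 / n
t = np.arange(n) * dt                # one full period, endpoint excluded
u = U * np.sin(w0 * t + th_u)
i = I * np.sin(w0 * t + th_i)

# <x, y>_t = (1/T0) * integral over [t, t+T0] of x*y  (eq. (4), at t = 0);
# on a uniform grid over one period this is just the sample mean of x*y.
avg = lambda x, y: np.mean(x * y)

# Homo-integral u_hat = w0 * (X - Xbar), with X the running integral of u
# (cumulative trapezoids) and Xbar its one-period average.
X = np.zeros_like(u)
X[1:] = np.cumsum(0.5 * (u[1:] + u[:-1])) * dt
u_hat = w0 * (X - np.mean(X))

p = avg(u, i)        # active power,   expected (U*I/2) * cos(phi)
q = avg(u_hat, i)    # reactive power, expected (U*I/2) * sin(phi)
print(p, q)
```

A positive φ (current lagging the voltage) yields q > 0, which is the usual inductive-load convention.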
is generally tackled via gradient-driven iterative methods (see for example Boyd and Vandenberghe (2008)). A general formulation of the iterative update law of these methods is the following:

q+ = q − Γ g(q)
Γ+ = Φ(q, Γ, g(q))   (7)

where Γ is an N × N gain matrix.¹

¹ In this and in the following sections, we introduce the shorter notation q+ = q(t_{k+1}) and q = q(t_k) (and similarly for other quantities that appear in the algorithms), when this does not lead to confusion.

constraint (as the proof of Lemma 1 shows, feasibility of q+ is guaranteed regardless of the choice of H).

According to the available level of knowledge of the system, different degrees of approximation of the inverse of M can be achieved, resulting in different algorithms.

On one hand, we saw in Lemma 1 that if M⁻¹ is completely known, it can be exploited to obtain the fastest (one-step) convergence. However, Newton's method requires that node i knows the whole i-th row of M⁻¹. This may not be possible in large-scale systems, and it jeopardizes the possibility of node insertion and removal.

On the other hand, when minimal knowledge is available, a diagonal H = αI can be used, obtaining the specialization of the steepest descent method to the linearly constrained case. Unfortunately, the steepest descent method may require a large number of iterations to converge, depending on the condition number of the Hessian M (Boyd and Vandenberghe (2008)). This turns out to be a major problem in our framework, where estimating the gradient g(q) for a given q requires that the system is driven into the state q, and consists in measuring the steady-state response of a dynamical system (therefore introducing an implicit tradeoff between accuracy and time delay in the measurement).

Quasi-Newton methods (see for example Dennis and Schnabel (1983) and Nocedal and Wright (2006)) have instead the useful feature of building an estimate of the inverse of the Hessian from the previous steps of the algorithm. In the framework of complex systems, these methods deserve special attention, as they require minimal knowledge of the problem, and they can deal with time-varying structures via their adaptive behavior.

Consider the following specialization of Broyden's algorithm (belonging to the class of quasi-Newton methods) for the constrained optimization problem (3):

q+ = q − G d
G+ = G + [∆q − G∆d] ∆dᵀ / (∆dᵀ∆d)   (11)

where d = Ωa g, with Ωa = I − aaᵀ/aᵀa, is the projection of the gradient on the constraint, and

∆d = d+ − d
∆q = q+ − q.

Suppose that G is initialized as αI, for some α > 0, and that it is not updated if ‖∆d‖ = 0.²

² It will be clear in the proof of Theorem 5 that if ‖∆d‖ = 0, then the algorithm has converged.

It is easy to see that (11) is a special case of (7), in which Γ = GΩa and in which the rank-1 update for G satisfies the secant condition

G+ ∆d = ∆q.

The following lemmas will be helpful in proving the global convergence of this algorithm.³

³ For these lemmas and for Theorem 5 we need to express the time dependence explicitly.

Lemma 2. For any G(t_k) returned by the algorithm (11), and for all u ∈ R^N,

aᵀu = 0 ⇒ aᵀG(t_k)u = 0.

Proof. Consider the base case G(0) = αI. We have

aᵀG(0)u = α aᵀu = 0.

Let us now suppose that the condition is verified for G(t_k), and consider G(t_{k+1}). We have

aᵀG(t_{k+1})u = aᵀG(t_k)u + aᵀ∆q(t_k)∆dᵀ(t_k)u / (∆dᵀ(t_k)∆d(t_k)) − aᵀG(t_k)∆d(t_k)∆dᵀ(t_k)u / (∆dᵀ(t_k)∆d(t_k)) = 0,

where we used ∆q(t_k) = −G(t_k)d(t_k), and the fact that aᵀ∆d(t_k) = 0 and aᵀd(t_k) = 0. Therefore by induction the thesis is verified.

Lemma 2 guarantees that aᵀ∆q = 0, or in other words it guarantees that the update step for q always returns a feasible point for the constrained optimization problem, if q(0) is feasible.

Lemma 3. The estimate G(t_{k+1}) has full rank as long as G(0) is full rank and d(t_j) ≠ 0 for 0 ≤ t_j ≤ t_k.

Proof. By the Sherman–Morrison formula, G(t_{k+1}) has full rank whenever G(t_k) is full rank and

∆dᵀ(t_k)G(t_k)⁻¹∆q(t_k) ≠ 0.

By substituting we have

∆dᵀ(t_k)G(t_k)⁻¹∆q(t_k) = −∆dᵀ(t_k)d(t_k) = dᵀ(t_k)Gᵀ(t_k)MΩa d(t_k) = dᵀ(t_k)Gᵀ(t_k)M d(t_k),

which is zero if and only if d(t_k) = 0. Therefore by induction G(t_{k+1}) has full rank.

Lemma 4. (Lemma 2.1 in Gay (1979)). Consider the k-th iteration of algorithm (11). If d(t_k) and ∆d(t_{k−1}) are linearly independent, then for 1 ≤ j ≤ ⌊(k+1)/2⌋, the j+1 vectors

[Ωa M G(t_{k−2j+1})]^i d(t_{k−2j+1}),   0 ≤ i ≤ j,

are linearly independent.

Proof. The proof is given in Gay (1979).

We can now state the following result on the global, finite-time convergence of (11).

Theorem 5. Consider the algorithm (11) initialized with G(0) = αI, with α > 0, and q(0) any feasible state. Then the algorithm converges in at most 2N steps to the solution of the constrained quadratic problem (3).

Proof. By Lemma 4, there exists k with 1 ≤ k ≤ 2N such that d(t_k) and ∆d(t_{k−1}) are linearly dependent. Trivially, if d(t_k) = 0, this solves the optimization problem (3), as the gradient is orthogonal to the constraint and q(t_k) is a feasible point. If instead ∆d(t_{k−1}) = 0, then from the definition of d we have Ωa M∆q(t_{k−1}) = 0, and therefore

Mq(t_k) = Mq(t_{k−1}) + βa for some β ∈ R.

M being invertible, this means that ∆q(t_{k−1}) = βM⁻¹a. By left-multiplying both terms by aᵀ and using Lemma 2, we get

β aᵀM⁻¹a = aᵀ∆q(t_{k−1}) = 0.

Therefore β = 0 and ∆q(t_{k−1}) = 0. By Lemma 3 and (11), this implies that d(t_j) = 0 for some j ≤ k−1, and therefore the solution has been reached in at most k ≤ 2N steps. As a last case, suppose that d(t_k) and ∆d(t_{k−1}) are both nonzero, but linearly dependent. Then there exists λ ≠ 0 such that

d(t_k) = λ∆d(t_{k−1}).
From the algorithm equations and from the secant condition we have that ∆d(t_{k−1}) = Ωa M∆q(t_{k−1}) = Ωa M G(t_k)∆d(t_{k−1}). The same is then true for d(t_k), yielding d(t_k) = Ωa M G(t_k)d(t_k). By rearranging the algorithm equations it is easy to see that d(t_{k+1}) = d(t_k) + Ωa M∆q(t_k), and therefore

d(t_{k+1}) = d(t_k) − Ωa M G(t_k)d(t_k) = 0.

Even in this case, d(t_{k+1}) = 0 together with Lemma 2 guarantees that q(t_{k+1}) is the solution of the constrained optimization problem.

5. ALGORITHM DISTRIBUTION

In this section we deal with the problem of distributing the algorithm among the nodes, in a way that is consistent with the capabilities of the nodes to gather information from their neighbors according to the communication constraints given by the problem.

Consider the decomposition of the generic algorithm (7) into update laws for the single agents. The generic agent i has to perform the update

qi+ = qi − Γiᵀ g(q)
Γi+ = Φi(q, Γ, g(q))   (12)

where Γiᵀ is the i-th row of Γ. It is not possible, in general, to implement (12) in a distributed manner, as both the update for qi and that for the vector Γi require quantities that are not available at node i: q, g(q), and Γ.

In the special case in which, at every iteration of the algorithm,

Γij ≠ 0 ⇒ (i, j) ∈ E,   (13)

the update law for qi only requires information that can be gathered from the set of neighbors Ni. The issue of distributing the algorithm among the agents then reduces to a sparsity condition on Γ (and a similar condition can be stated for the update law for Γi).

However, note that the decision variables in the optimization problem (3) are coupled in two ways: by a possibly non-diagonal Hessian M, and by the complicating constraint in which all the variables appear. For this reason, it is inherently hard to solve the optimization problem via some purely local laws.

To tackle this issue, we introduce some shared piece of information among all agents: let x ∈ R^p be an auxiliary fusion vector, a function of the whole system state, and suppose that all the nodes agree on the value of x.

The local update laws (12) can then be replaced by

qi+ = qi − γi(qj, gj(q), j ∈ Ni; ηi, x)
ηi+ = φi(qj, gj(q), j ∈ Ni; ηi, x)   (14)

where ηi is a local parameter vector. The fusion vector x is a function of all the data available in the system:

x = f(q, g(q), ηi, i ∈ V).   (15)

Algorithm (14) is now consistent with the communication graph, because the update laws for both qi and ηi are functions of local data (qi, gi(q), ηi), of data that can be gathered from the neighbors j ∈ Ni, and of the fusion vector x.

Of course, the way in which x is computed, and the way in which all agents agree on its value, is a key point in the design of the algorithm. For example, a finer time scale might exist: the fusion vector x can be obtained as the result of another distributed algorithm, which exploits the same communication graph. This faster algorithm is initialized locally on the basis of the data stored in the nodes and of the measured steady state of the underlying system. As it runs at a much faster pace, by the end of the period of time T it is able to implement the function f of the data and to guarantee that all nodes agree on a common x.

Among the main algorithms that can be exploited to obtain the fusion vector x, the tool of average consensus (as described for example in Olfati-Saber et al. (2007) and in Fagnani and Zampieri (2008)) is of particular interest.

Average consensus can be quite useful when dealing with optimization problems subject to linear equality constraints. Consider indeed the decomposition (10) of the update step in an approximate Newton method, consisting in an unconstrained descent step followed by its projection over the constraint:

q+ = q − Hg(q) + (aᵀHg(q) / aᵀHa) Ha
   = q − Hg(q) + x Ha.

The proposed decomposition has the major advantage that the scalar x, on which all nodes have to agree, can be obtained via average consensus, once the communication graph is strongly connected. Indeed, if every node is capable of initializing a vector

z^(i)(0) = ( ai Hi g(q), ai Hi a ),   (16)

where ai and Hi are the i-th component of a and the i-th row of H, respectively, then by running an average consensus algorithm the nodes eventually agree on the quantity

z̄ = ( (1/N) aᵀHg(q), (1/N) aᵀHa ),

from which every node can obtain the desired quantity x = aᵀHg(q) / aᵀHa.

Therefore, if we choose an estimate H of M⁻¹ that satisfies the sparsity constraint given by the communication graph, i.e.

Hij ≠ 0 ⇒ (i, j) ∈ E,

then both the computation of ∆q_desc and the initialization of z for the computation of ∆q_proj require only communication between neighbors in G.

6. DISTRIBUTED QUASI-NEWTON METHODS

The approach presented in the last section for algorithm distribution will now be applied to the Broyden quasi-Newton method proposed in Section 4.

Let us define g^(i) as the part of the gradient corresponding to the set of neighbors Ni = {j1, ..., j_{ni}} of node i:

g^(i) ∈ R^{ni},   g^(i)_ℓ = g_{jℓ} for ℓ = 1, ..., ni.   (17)

Following the sparse secant method proposed in Dennis and Schnabel (1983) (Theorem 11.2.1), and including a projection step as proposed in Section 5, let us consider
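To make the constrained Broyden iteration concrete, here is a standalone numerical sketch (not the authors' implementation; the 3×3 problem data are arbitrary test values) of algorithm (11) on a small instance of problem (3). It checks the two claims proved above: every iterate remains feasible (Lemma 2), and the minimizer is reached within 2N steps (Theorem 5):

```python
import numpy as np

# Standalone sketch of the constrained Broyden iteration (11);
# the problem data below are arbitrary test values, not from the paper.
N = 3
M = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])          # Hessian M > 0 of F(q)
m = np.array([1.0, -2.0, 0.5])
a = np.array([1.0, 1.0, 1.0])
b = 3.0

grad = lambda q: M @ q + m               # gradient g(q) of the cost (2)
Omega_a = np.eye(N) - np.outer(a, a) / (a @ a)   # projector onto a^T v = 0

q = np.array([1.0, 1.0, 1.0])            # feasible start: a^T q(0) = b
G = 0.2 * np.eye(N)                      # G(0) = alpha * I, alpha > 0
d = Omega_a @ grad(q)                    # projected gradient d = Omega_a g
feas_err = 0.0

for _ in range(2 * N):                   # Theorem 5: at most 2N steps
    q_new = q - G @ d                    # q+ = q - G d
    d_new = Omega_a @ grad(q_new)
    dq, dd = q_new - q, d_new - d
    if dd @ dd > 1e-24:                  # footnote 2: no update if ||dd|| = 0
        G = G + np.outer(dq - G @ dd, dd) / (dd @ dd)
    q, d = q_new, d_new
    feas_err = max(feas_err, abs(a @ q - b))   # Lemma 2: iterates stay feasible
    if np.linalg.norm(d) < 1e-12:
        break

# Reference solution from the KKT conditions of (3)
KKT = np.block([[M, a[:, None]], [a[None, :], np.zeros((1, 1))]])
q_star = np.linalg.solve(KKT, np.concatenate([-m, [b]]))[:N]
print(q, q_star, feas_err)
```

The projector Ωa keeps every step in the constraint plane, so feasibility holds to machine precision even though the constraint is never re-solved during the iteration.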