Abstract
In this work, we study resolvent splitting algorithms for solving composite monotone inclusion problems. The objective of these general problems is finding a zero in the sum of maximally monotone operators composed with linear operators. Our main contribution is establishing the first primal-dual splitting algorithm for composite monotone inclusions with minimal lifting. Specifically, the proposed scheme reduces the dimension of the product space where the underlying fixed point operator is defined, in comparison to other algorithms, without requiring additional evaluations of the resolvent operators. We prove the convergence of this new algorithm and analyze its performance in a problem arising in image deblurring and denoising. This work also contributes to the theory of resolvent splitting algorithms by extending the minimal lifting theorem recently proved by Malitsky and Tam to schemes with resolvent parameters.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
1 Introduction
In the last decades, monotone inclusion problems have become an attractive topic of research in operator theory and numerical optimization. The wide variety of situations in applied mathematics that can be modeled as finding a zero of the sum of mixtures of maximally monotone operators is one of the reasons for its increasing popularity. Among the methods that are usually employed for tackling these problems, splitting algorithms (see, e.g., [2, Chapter 26]) are the ones that have received more attention. Using simple operations, these methods define an iterative sequence which separately handles the operators defining the problem and is convergent to a solution to the inclusion problem. Furthermore, as these methods only use first-order information, they are well suited for large-scale optimization problems.
In this work, we focus on the study of primal-dual splitting algorithms for composite monotone inclusion problems in real Hilbert spaces of the following form.
Problem 1
Let \({\mathscr{H}}\) and \((\mathcal {G}_{j})_{1\leq j\leq m}\) be real Hilbert spaces. Let \(A_{1}, \ldots , A_{n}:{\mathscr{H}} \rightrightarrows {\mathscr{H}}\) be maximally monotone operators, let \(B_{j}:\mathcal {G}_{j}\rightrightarrows \mathcal {G}_{j}\) be maximally monotone and \(L_{j}:{\mathscr{H}} \to \mathcal {G}_{j}\) be a bounded linear operator whose adjoint is denoted by \(L_{j}^{*}\), for all j ∈{1,…,m}. The problem consists in solving the primal inclusion
together with its associated dual inclusion
Problem 1 encompasses numerous important problems in mathematical optimization and real-world applications (see, e.g., [10, 11, 20]). In these settings, it is highly desirable to devise algorithms that simultaneously obtain solutions to both problems (1) and (2) –namely, a primal-dual solution– and which only make use of resolvents of the maximally monotone operators, forward evaluations of the linear operators and their adjoints, scalar multiplication, and vector addition. Many splitting methods can be found in the literature satisfying these conditions (see, e.g., [3,4,5, 12, 22]). One of the best-known primal-dual algorithm is the one proposed by Briceño-Arias and Combettes in [7], which was further studied in [6]. To derive this scheme, let us consider first the particular instance of Problem 1 in which n = m = 1 and let us define the pair of operators M and N given by
The operator M is maximally monotone and N is a skew symmetric bounded linear operator. Furthermore, the set of zeros of the sum M + N consists of primal-dual solutions to Problem 1. Applying the forward-backward-forward algorithm to the problem of finding the zeros of M + N results in the fixed point iteration given by
where γ > 0, Id denotes the identity operator and JγA stands for the resolvent of A with parameter γ (see Definition 3). Thus, since the resolvent of a cartesian product is the cartesian product of the resolvents, it can be seen that (3) is a full splitting algorithm, as it only requires evaluations of the resolvents JγA and \(J_{\gamma B^{-1}}\), and of the linear operator and its adjoint.
The general problem involving more than two operators can be addressed by setting
In this case, according to (3), the resulting algorithm is generated by a fixed point iteration of an operator defined in the ambient space \({\mathscr{H}}^{n} \times \mathcal {G}_{1} \times {\cdots } \times \mathcal {G}_{m}\). The dimension of the underlying space is directly related to the memory requirements of the resulting algorithm. In general, a smaller dimension of the space translates into less consumption of computational resources. For this reason, the development of algorithms with reduced dimension for solving monotone inclusion problems has recently become an active topic of research [8, 14, 16, 19].
Lifted splitting algorithms
The notion of lifted splitting, first introduced in [19], relates a fixed point algorithm with the dimension of its underlying ambient space. Consider the simplest case of the classical monotone inclusion problem obtained by setting m = 0 in (1):
Problem 2
Let \(A_{1},\ldots ,A_{n}:{\mathscr{H}}\rightrightarrows {\mathscr{H}}\) be maximally monotone operators and consider the problem
A fixed point algorithm for finding a solution to Problem 2 employs a d-fold lifting if its underlying fixed point operator can be defined on the d-fold Cartesian product \({\mathscr{H}}^{d}\). For example, if n = 2, the famous Douglas–Rachford algorithm [15] makes use of a 1-fold lifting, since it can be written as the fixed point iteration
with λ ∈ ]0,2[. Until very recently, the only way to tackle the problem when n > 2 was using Pierra’s product space reformulation [18], which implies an n-fold lifting. Nowadays, various algorithms have been proposed allowing to solve the problem by only resorting to an (n − 1)-lifting (see, e.g., [8, 13, 14]). This reduction from n to n − 1 has been proven to be minimal [16] when the algorithms are required to be frugal resolvent splittings [19], which means that each of the resolvents \(J_{A_{1}}, \ldots , J_{A_{n}}\) is evaluated only once per iteration.
To the best of the authors’ knowledge, the notion of lifting has not been developed in the setting of primal-dual inclusions given by Problem 1. We will say that a primal-dual splitting has (d,f)-lifting if the underlying fixed point operator can be written in the product space
with \(f = {\sum }_{j=1}^{m} f_{j}\). Thus, the Briceño–Arias–Combettes primal-dual splitting algorithm makes use of an (n,m)-fold lifting. This is also the case for the other primal-dual algorithms existing in the literature. In this work, we propose the first (n − 1,m)-lifted splitting method for solving primal-dual inclusions and demonstrate the minimality of the algorithm. In order to do this, it is important to note that the definition of frugal resolvent splitting does not allow the use of parametrized resolvents. The inclusion of these resolvent parameters is of crucial importance for controlling the Lipschitz constants of the linear operators in Problem 1, as can be seen in all the existent primal-dual schemes. This motivates the introduction of the concept of frugal parametrized resolvent splitting, whose definition coincides with the one of frugal resolvent splitting except that it permits the inclusion of resolvent parameters. Our contribution to the theory of minimal lifting splitting methods is double: (i) we extend the results of Malitsky–Tam in [16, Section 3] to frugal parametrized resolvent splitting algorithms, (ii) we prove that for a frugal primal-dual parametrized resolvent splitting (see Section 3 for a precise definition) with (d,m)-fold lifting to solve Problem 1, one necessarily has d ≥ n − 1. Our proposed algorithm is the first algorithm in the literature being minimal according to this relation.
The rest of this work is structured as follows. In Section 2, we recall some preliminary notions and results. In particular, in Section 2.1, we present the extension of the results by Malitsky–Tam [16] to parametrized resolvent splitting algorithms. In Section 3, we introduce the first primal-dual algorithm with reduced lifting for composite monotone inclusion problems and prove its convergence. The concept of parametrized resolvent splitting is adapted to primal-dual schemes in Section 3. We prove a minimality theorem under the hypothesis of frugality and show that our proposed algorithm verifies it. In Section 4, we include a numerical experiment on image deblurring and compare the performance of the new algorithm with the best performing primal-dual algorithm for this problem. The paper ends with some conclusions and possible future work directions in Section 5. Finally, in Appendix A, a detailed proof of the results in Section 2.1 is presented.
2 Preliminaries
Throughout this paper, \({\mathscr{H}}\), \(\mathcal {G}\), and \((\mathcal {G}_{j})_{1 \leq j \leq m}\) are real Hilbert spaces. Otherwise stated, to simplify the notation we will employ 〈⋅,⋅〉 and ∥⋅∥ to denote the inner product and the induced norm, respectively, of any space. We use → to denote norm convergence of a sequence. We denote by \({\mathscr{H}}^{n}\) the product Hilbert space \({\mathscr{H}}^{n}={\mathscr{H}} \times \stackrel {(n)}{\cdots } \times {\mathscr{H}}\) with inner product defined as
Sequences and sets in product spaces are marked with bold, e.g., \(\mathbf {x}=(x_{1},\ldots ,x_{n})\in {\mathscr{H}}^{n}\).
For a set-valued operator, we write \(A:{\mathscr{H}} \rightrightarrows {\mathscr{H}}\), in opposite to \(A:{\mathscr{H}}\to {\mathscr{H}}\) which denotes a single-valued operator. The notation dom, Fix, zer and gra is used for the domain, the set of fixed points, the zeros and the graph of A, respectively, i.e.,
The inverse operator of A, denoted by A− 1, is the operator whose graph is given by \(\text {gra} A^{-1} = \left \{ (u,x)\in {\mathscr{H}}\times {\mathscr{H}} : u\in A(x)\right \}\). The identity operator is denoted by Id. When \(L:{\mathscr{H}}\to \mathcal {G}\) is a bounded linear operator, we use \(L^{*}:\mathcal {G}\to {\mathscr{H}}\) to denote its adjoint, which is the unique bounded linear operator such that 〈Lx,y〉 = 〈x,L∗y〉, for all \(x\in {\mathscr{H}}\) and \(y\in \mathcal {G}\).
To simplify the notation, we will use ⟦k,l⟧ to denote the set of integers between \(k,l\in \mathbb {N}\), i.e.,
Definition 1
An operator \(T:{\mathscr{H}} \to {\mathscr{H}}\) is said to be
-
(i)
κ-Lipschitz continuous for κ > 0 if
$$ \|T(x)-T(y)\| \leq \kappa \|x-y\| \quad \forall x,y \in \mathcal{H}; $$ -
(ii)
nonexpansive if it is 1-Lipschitz continuous, i.e.,
$$ \|T(x)-T(y)\| \leq \|x-y\| \quad \forall x,y \in \mathcal{H}; $$ -
(iii)
α-averaged nonexpansive for α ∈ ]0,1[ if
$$ \|T(x)-T(y)\|^{2}+\frac{1-\alpha}{\alpha} \|(\text{Id}-T)(x)-(\text{Id}-T)(y)\|^{2} \leq \|x-y\|^{2} \quad \forall x,y\in\mathcal{H}. $$
Definition 2
A set-valued operator \(A:{\mathscr{H}} \rightrightarrows {\mathscr{H}}\) is monotone if
Furthermore, A is said to be maximally monotone if there exists no monotone operator \(B:{\mathscr{H}} \rightrightarrows {\mathscr{H}}\) such that graB properly contains graA.
Definition 3
Given an operator \(A\colon {\mathscr{H}}\rightrightarrows {\mathscr{H}}\), the resolvent of A with parameter γ > 0 is the operator \(J_{\gamma A}\colon {\mathscr{H}}\rightrightarrows {\mathscr{H}}\) defined by JγA := (Id + γA)− 1.
The next result contains Minty’s theorem [17].
Proposition 1 ([2, Proposition 23.10])
Let \(A:{\mathscr{H}}\rightrightarrows {\mathscr{H}}\) be monotone and let γ > 0. Then,
-
(i)
JγA is single-valued,
-
(ii)
\(\text {dom} J_{\gamma A}={\mathscr{H}}\) if and only if A is maximally monotone.
2.1 Parametrized resolvent splitting
Besides developing lifted splitting algorithms with reduced dimension, different works have been devoted to determine the minimal dimension reduction that can be achieved under some conditions. This is the case of [16, 19], where a minimality result is obtained for the classical monotone inclusion Problem 2. In what follows, we employ T for denoting a fixed point operator and S for a solution operator, both depending on the maximally monotone operators appearing in the problem.
Definition 4 (Fixed point encoding 19)
A pair of operators (T,S) is a fixed point encoding for Problem 2 if, for all particular instance of the problem,
Previous works on minimality are based on the concept of resolvent splitting, which does not allow employing parametrized resolvents (i.e., it only permits computation of the resolvents \(J_{A_{1}},\ldots ,J_{A_{n}}\)). In this work, we introduce the notion of parametrized resolvent splitting and adapt the minimality result in [16, Section 3] to the more general parametrized setting. Since the reasoning is very similar to the one in the mentioned reference, we only present the results here and refer the interested reader to Appendix A for a detailed demonstration.
Definition 5 (Parametrized resolvent splitting)
A fixed point encoding (T,S) for Problem 2 is a parametrized resolvent splitting if, for all particular instances of the problem, there is a finite procedure that evaluates T and S at a given point which only uses vector addition, scalar multiplication, and the parametrized resolvents of A1,…,An.
Definition 6 (Frugality)
A parametrized resolvent splitting (T,S) for Problem 2 is frugal if, in addition, each of the parametrized resolvents of A1,…,An is used exactly once.
Definition 7 (Lifting 19)
Let \(d\in \mathbb {N}\). A fixed point encoding (T,S) is a d-fold lifting for Problem 2 if \(T:{\mathscr{H}}^{d}\to {\mathscr{H}}^{d}\) and \(S:{\mathscr{H}}^{d}\to {\mathscr{H}}\).
Example 1
In [8], a product space reformulation with reduced dimension is proposed, which applied to Problem 2 yields the following lifted splitting. Given any γ > 0 and λ ∈ ]0,2], the algorithm in [8, Theorem 5.1] can be defined by the operator \(R:{\mathscr{H}}^{n-1}\to {\mathscr{H}}^{n-1}\) given by
where z = (z0,z1,…,zn− 1) and \(\mathbf {x} = (x_{0},x_{1},\ldots ,x_{n-1})\in {\mathscr{H}}^{n}\) is the vector defined as
Moreover, if we let \(S:{\mathscr{H}}^{n-1} \to {\mathscr{H}}\) be the operator given by
then the pair (R,S) is a frugal parametrized resolvent splitting with (n − 1)-fold lifting which is not a resolvent splitting, since it makes use of resolvent parameters.
Malitsky and Tam prove in [16, Theorem 3.3] that the minimal lifting that one can achieve for Problem 2 with frugal resolvent splittings is n − 1. From their proof, it cannot be directly determined whether the same result holds when the resolvents are allowed to have different parameters. The next theorem provides an affirmative answer to this question.
Theorem 2 (Minimal lifting for frugal parametrized splittings)
Let n ≥ 2 and let (T,S) be a frugal parametrized resolvent splitting with d-fold lifting for Problem 2. Then, d ≥ n − 1.
Proof
See Theorem 6 in Appendix A. □
3 A primal-dual splitting with minimal lifting
In this section, we devise a primal-dual splitting algorithm for Problem 1 with minimal lifting. We base our analysis in the case in which the primal problem involves only one linear composition, i.e., m = 1, and later extend to an arbitrary finite number of linearly composed maximally monotone operators by appealing to a product space reformulation. Lastly, we prove minimality of the algorithm by adapting the concept of lifted splitting to primal-dual algorithms.
The case with one linear composition
Let n ≥ 2. We start by considering the primal-dual problem given by
and
where \(A_{1},\ldots ,A_{n}:{\mathscr{H}}\rightrightarrows {\mathscr{H}}\) and \(B:\mathcal {G} \rightrightarrows \mathcal {G}\) are maximally monotone operators and \(L:{\mathscr{H}}\to \mathcal {G}\) is a bounded linear operator. Note that in this case, (5) corresponds to the Attouch–Théra dual problem of (4) (see [1]). In the following, we denote the set of solutions of (4) and (5) by \(\mathcal {P}\) and \(\mathcal {D}\), respectively, and consider the set Z defined as
which is useful for tackling primal-dual inclusion problems. It is well-known that Z is a subset of \(\mathcal {P} \times \mathcal {D}\) and that
Indeed, we have
We refer to an element of Z as a primal-dual solution of (4) and (5).
Now, we introduce a fixed point algorithm for solving the primal-dual problem given by (4) and (5). Let λ,γ > 0 and let \(T:{\mathscr{H}}^{n-1} \times \mathcal {G} \to {\mathscr{H}}^{n-1} \times \mathcal {G}\) be the operator given by
where \((\mathbf {x},y)=(x_{1},\ldots ,x_{n},y) \in {\mathscr{H}}^{n}\times \mathcal {G}\) depends on \((\mathbf {z},v) = (z_{1},\ldots ,z_{n-1},v) \in {\mathscr{H}}^{n-1} \times \mathcal {G}\) in the following way
In the next lemma, we characterize the set of fixed points of the operator T by means of the set of primal-dual solutions to (4) and (5).
Lemma 1
Let n ≥ 2 and λ,γ > 0. The following assertions hold.
-
(i)
If \((\bar {x},\bar {u})\in \mathbf {Z}\), then there exists \(\bar {\textbf {z}}\in {\mathscr{H}}^{n-1}\) such that \((\bar {\textbf {z}},\gamma L\bar {x}-\bar {u})\in \text {Fix} {T}\).
-
(ii)
If \((\bar {z}_{1},\ldots ,\bar {z}_{n-1},\bar {v})\in \text {Fix} {T}\), then \((J_{A_{1}}(\bar {z}_{1}),\gamma L\bar {x}-\bar {v})\in \mathbf {Z}\).
As a result,
Proof
(i) Let \((\bar {x},\bar {u})\in \mathbf {Z}\). Then, \(\bar {u} \in B(L\bar {x})\) and there exists \((a_{1},\ldots ,a_{n})\in {\mathscr{H}}^{n}\) such that \(a_{i}\in A_{i}(\bar {x})\) and \(-L^{*}\bar {u} = {\sum }_{i=1}^{n} a_{i}\). Consider the vectors \((\bar {z}_{1},\ldots ,\bar {z}_{n-1},\bar {v})\in {\mathscr{H}}^{n-1}\times \mathcal {G}\) defined as
Then, we deduce that \(\bar {x} = J_{A_{1}}(\bar {z}_{1})\) and \(\bar {x} = J_{A_{i}} (\bar {z}_{i} + \bar {x}-\bar {z}_{i-1})\) for all i ∈ ⟦2,n − 1⟧. Moreover, we have
Altogether, we obtain
which implies that \((\bar {z}_{1},\ldots ,\bar {z}_{n-1},\bar {v})\in \text {Fix} {T}\).
(ii) Let \((\bar {z}_{1},\ldots ,\bar {z}_{n-1},\bar {v})\in \text {Fix} {T}\) and set \(\bar {x}:= J_{A}(\bar {z}_{1})\). By (6), \(y = L\bar {x}\) and \(\bar {x}_{i} = \bar {x}\) for all i = 1,…,n. Consequently, from (7), we derive
Summing together the first n inclusions above and setting \(\bar {u}:=\gamma L\bar {x}-\bar {v}\), we deduce
which implies \((\bar {x},\bar {u})\in \mathbf {Z}\), as claimed. □
The following technical lemma provides nonexpansive properties of the operator T in the Hilbert space \({\mathscr{H}}^{n-1} \times \mathcal {G} \) with scalar product given by
for \((z_{1},\ldots ,z_{n-1},v),(\bar {z}_{1},\ldots ,\bar {z}_{n-1},\bar {v})\in {\mathscr{H}}^{n-1}\times \mathcal {G}\) and γ > 0.
Lemma 2
For all \((\mathbf {z},v) \! =\! (z_{1},\ldots ,z_{n-1},v)\! \in \! {\mathscr{H}}^{n-1}\! \times \! \mathcal {G}\) and \((\bar {\mathbf {z}},\bar {v}) \! =\! (\bar {z}_{1},\ldots ,\bar {z}_{n-1},\bar {v})\) \(\in {\mathscr{H}}^{n-1}\times \mathcal {G}\),
where ∥⋅∥γ denotes the norm induced by the scalar product (8). In particular, if λ ∈ ]0,1[ and \(\gamma \in {\left ]0,\frac {1}{\|L\|^{2}}\right ]}\), the operator T is λ-averaged nonexpansive.
Proof
Let \((x_{1},\ldots ,x_{n},y)\in {\mathscr{H}}^{n}\times \mathcal {G}\) and \((\bar {x}_{1},\ldots ,\bar {x}_{n},\bar {y})\in {\mathscr{H}}^{n} \times \mathcal {G}\) be given by (7) from (z,v) and \(({\bar {\mathbf {z}}},\bar {v})\), respectively. For simplicity, we denote (z+,v+) = T(z,v) and \(({\bar {\mathbf {z}}}^{+},\bar {v}^{+})=T({\bar {\mathbf {z}}},\bar {v})\). Since z1 − x1 ∈ A1(x1) and \(\bar {z}_{1}-\bar {x}_{1}\in A_{1}(\bar {x}_{1})\), by monotonicity of A1
For every i ∈ ⟦2,n − 1⟧ , we have zi + xi− 1 − zi− 1 − xi ∈ Ai(xi) and \(\bar {z}_{i} + \bar {x}_{i-1}-\bar {z}_{i-1}-\bar {x}_{i}\in A_{i}(\bar {x}_{i})\) and thus, by monotonicity of Ai
Now, since \(x_{1}+x_{n-1}-z_{n-1}-x_{n} - L^{*}\left (\gamma Lx_{1}-v\right ) \in A_{n}(x_{n}) \) and \(\bar {x}_{1}+\bar {x}_{n-1}-\bar {z}_{n-1}-\bar {x}_{n} - L^{*}\left (\gamma L\bar {x}_{1}-\bar {v}\right )\in A_{n}(\bar {x}_{n})\), again monotonicity of An results in the inequality
Finally, we have γL(x1 + xn) − v − γy ∈ B(y) and \(\gamma L(\bar {x}_{1}+\bar {x}_{n})-\bar {v}-\gamma \bar {y}\in B(\bar {y})\), so by monotonicity of B we get
Summing together (10)–(13) and rearranging, yields
The sums in (14) can be written, respectively, as
and
The third term in (14), becomes
while the fourth term yields
Lastly, making use of the Cauchy–Schwarz and Young’s inequalities, the second last term of (14) gives
while the last term can be rearranged as follows
Summing together (19) and (20) and using the Lipschitz continuity of L, we get
Multiplying (14) by 2λ and substituting (15)–(21), we obtain the final inequality
To complete the proof, just note that
from where (9) finally follows. □
Next, we state our main result, which establishes the convergence of the iterative algorithm defined by the operator T in (6) and (7).
Theorem 3
Let n ≥ 2, let \(L:{\mathscr{H}} \to \mathcal {G}\) be a bounded linear operator and let \(A_{1},\ldots ,A_{n}:{\mathscr{H}}\rightrightarrows {\mathscr{H}}\) and \(B:\mathcal {G}\rightrightarrows \mathcal {G}\) be maximally monotone operators with \(\text {zer}~{\left ({\sum }_{i=1}^{n} A_{i} + L^{*}BL\right )\neq \emptyset }\). Furthermore, let λ ∈ ]0,1[ and \(\gamma \in {\left ]0,\frac {1}{\|L\|^{2}}\right ]}\). Given an initial point \((\mathbf {z}^{0},v^{0}) = ({z_{1}^{0}},\ldots ,z_{n-1}^{0},v^{0})\in {\mathscr{H}}^{n-1}\times \mathcal {G}\), consider the sequences given by
with
Then, the following statements hold.
-
(i)
The sequence \((\mathbf {z}^{k},v^{k})_{k\in \mathbb {N}}\) converges weakly to a point \((\bar {\mathbf {z}},\bar {v})\in \text {Fix} {T}\).
-
(ii)
The sequence \(({x^{k}_{1}},\ldots ,{x^{k}_{n}},y^{k})_{k\in \mathbb {N}}\) converges weakly to \((\bar {x},\ldots ,\bar {x},L\bar {x})\) with \(\bar {x} \in \mathcal {P}\).
-
(iii)
The sequence \(\left (\gamma L{x_{i}^{k}}-v^{k} \right )_{k\in \mathbb {N}}\) converges weakly to \(\gamma L\bar {x}-\bar {v}\in \mathcal {D}\), for all i ∈ ⟦1,n⟧.
Proof
(i) The sequence in (22) is the fixed point iteration generated as
Since λ ∈ ]0,1[ and \(\gamma \in {\left ]0,\|L\|^{-2}\right ]}\), T is averaged nonexpansive by Lemma 2 and, moreover, FixT = ∅, due to Z≠∅ and Lemma 1(i). Then, by [2, Theorem 5.15] the sequence \((\mathbf {z}^{k},v^{k})_{k\in \mathbb {N}}\) converges weakly to a point \((\bar {\mathbf {z}},\bar {v})\in \text {Fix} {T}\) and \(\lim _{k\to \infty } \|(\mathbf {z}^{k+1},v^{k+1})-(\mathbf {z}^{k},v^{k})\|_{\gamma }= 0\).
(ii) From (i), the sequence \((\mathbf {z}^{k},v^{k})_{k\in \mathbb {N}}\) is bounded. Then, nonexpansivity of the resolvents and boundedness of the linear operator L imply that the sequence \((\mathbf {x}^{k},y^{k})_{k\in \mathbb {N}}=({x_{1}^{k}},\ldots ,{x_{n}^{k}},y^{k})_{k\in \mathbb {N}}\) is also bounded. Furthermore, the fact that \((\mathbf {z}^{k+1},v^{k+1})_{k\in \mathbb {N}}-(\mathbf {z}^{k},v^{k})_{k\in \mathbb {N}}\to 0\), as \(k\to \infty \), implies by (22) that
Next, by making use of the definition of resolvents and (23), we can write
where the operator \(C\colon {\mathscr{H}}^{n}\times \mathcal {G}\rightrightarrows {\mathscr{H}}^{n}\times \mathcal {G}\) is given by
The operator C is maximally monotone as the sum of a maximally monotone operator and a skew symmetric linear operator (see, e.g., [2, Corollary 25.5 (i) & Example 20.35]). Thus, the graph of C is sequentially closed in the weak-strong topology, by demiclosedness of maximally monotone operators [2, Corollary 20.38].
Now, let \((\bar {\mathbf {x}},\bar {y})\) be a weak sequential cluster point of \((\mathbf {x}^{k},y^{k})_{k\in \mathbb {N}}\). Due to (24), \(\bar {\mathbf {x}}\) is of the form \(\bar {\mathbf {x}} = (\bar {x},\ldots ,\bar {x})\in {\mathscr{H}}^{n}\) and \(\bar {y}=L\bar {x}\). Taking the limit along a subsequence of \((\mathbf {x}^{k},y^{k})_{k\in \mathbb {N}}\) which converges weakly to \((\bar {\mathbf {x}},\bar {y})\) and using demiclosedness of C, (25) and (26) yield the expression
which, by summing the first n equations, implies that \((\bar {x},\gamma L\bar {x}-\bar {v})\in \mathbf {Z}\) with \(\bar {x} = J_{A_{1}}(\bar {z}_{1})\). In particular, we have shown that \((\bar {\mathbf {x}},\bar {y})\) is directly obtained from \(\bar {\mathbf {z}}\), implying that it is the unique weak sequential cluster point of the bounded sequence \((\mathbf {x}^{k},y^{k})_{k\in \mathbb {N}}\). Thus, the full sequence converges weakly to this point.
(iii) From (i)-(ii), for all i ∈ ⟦1,n⟧, we deduce that the sequence \((\gamma L{x_{i}^{k}}-v^{k})_{k\in \mathbb {N}}\) weakly converges to \(\gamma L\bar {x}-\bar {v}\), which belongs to \(\mathcal {D}\) since \((\bar {x},\gamma L\bar {x}-\bar {v})\in \mathbf {Z}\). □
Remark 1 (Malitsky–Tam resolvent splitting 16 as a special case)
Consider Problems (4) and (5) in the particular case in which L = Id. Then, \(B:{\mathscr{H}}\rightrightarrows {\mathscr{H}}\) and (4) becomes the classical monotone inclusion problem with (n + 1)-operators. Furthermore, by setting γ = 1 in Theorem 3, it is straightforward to see that the sequences in (22) and (23) yield the Malitsky–Tam resolvent splitting with minimal lifting for (n + 1)-operators.
Remark 2 (On the parameter γ in the definition of the norm ∥⋅∥γ)
In Lemma 2, we proved that the operator T is λ-averaged nonexpansive with respect to the norm ∥⋅∥γ induced by the scalar product defined in (8). Although the use of this norm did not require detours from the usual procedure to prove convergence of the fixed point algorithm in Theorem 3, it may numerically affect the performance of the algorithm. To give an intuition about this, consider the norm of the sequence of residuals \(\left (\|(\mathbf {z}^{k+1},v^{k+1})-(\mathbf {z}^{k},v^{k})\|_{\gamma }\right )_{k\in \mathbb {N}}\), which converges to 0 as the algorithm reaches a fixed point, and note that we have
Lemma 2 implies that this sequence is monotone decreasing, but if γ is very small, the weight of the sequence of dual variables \((v^{k+1}- v^{k})_{k\in \mathbb {N}}\) in the norm would be much larger than the one of the sequence of primal variables \((\mathbf {z}^{k+1}-\mathbf {z}^{k})_{k\in \mathbb {N}}\), so a small decrease in the value of ∥vk+ 1 − vk∥ will readily imply a decrease of the norm of the sequence of residuals even if ∥zk+ 1 −zk∥ does not diminish much. Because of that, a larger number of iterations might be needed to achieve convergence of the primal sequence, which can slow down the overall convergence of the algorithm. Nonetheless, it is possible to perform some sort of pre-conditioning to prevent from having a large constant in the definition of the norm. We will further comment on this in the numerical experiments in Section 4.
The case with multiple linear compositions
A standard product space reformulation permits to extend our method to the more general inclusion Problem 1, which has finitely many linearly composed maximally monotone operators. We detail this in the following corollary, while the resulting scheme is displayed in Algorithm 1.
Corollary 1
Let n ≥ 2 and assume that Problem 1 has a solution. Let λ ∈ ]0,1[ and \(\gamma \in {\left ]0,1/ {\sum }_{j=1}^{m} \|L_{j}\|^{2} \right ]}\). Given some initial points \(\mathbf {z}^{0}= (z_{1},\ldots ,z_{n-1})\in {\mathscr{H}}^{n-1}\) and \(\mathbf {v}^{0} = ({v_{1}^{0}},\ldots ,{v_{m}^{0}})\in \mathcal {G}_{1}\times \cdots \times \mathcal {G}_{m}\), consider the sequences \((\mathbf {z}^{k},\mathbf {v}^{k})_{k\in \mathbb {N}}\) and \((\mathbf {x}^{k},\mathbf {y}^{k})_{k\in \mathbb {N}}\) generated by Algorithm 1. Then, the following assertions hold:
-
(i)
The sequence \((\mathbf {z}^{k},\mathbf {v}^{k})_{k\in \mathbb {N}}\) converges weakly to a point \((\bar {\mathbf {z}},\bar {\mathbf {v}})\in {\mathscr{H}}^{n-1}\times \mathcal {G}_{1}\times \cdots \times \mathcal {G}_{m}\).
-
(ii)
The sequence \(({x_{1}^{k}}, \ldots , {x_{n}^{k}},{y_{1}^{k}},\ldots ,{y_{m}^{k}})_{k\in \mathbb {N}}\) converges weakly to \((\bar {x},\ldots ,\bar {x},\) \(L_{1}\bar {x},\ldots ,L_{m}\bar {x})\) with \(\bar {x}\in {\mathscr{H}}\) solving the primal inclusion (1).
-
(iii)
For all i ∈ ⟦1,n⟧, the sequence \((\gamma L_{1} {x_{i}^{k}}-{v_{1}^{k}},\ldots ,\gamma L_{m} {x_{i}^{k}}-{v_{m}^{k}})_{k\in \mathbb {N}}\) converges weakly to \((\gamma L_{1}\bar {x}-\bar {v}_{1}, \ldots , \gamma L_{m}\bar {x}-\bar {v}_{m})\), which solves the dual inclusion (2).
Proof
Just note that Problem 1 can be reformulated as an instance of Problems (4) and (5) by replacing B by the operator \(\mathbf {B}: \mathcal {G}_{1}\times \cdots \times \mathcal {G}_{m}\rightrightarrows \mathcal {G}_{1}\times \cdots \times \mathcal {G}_{m}\) defined as the cartesian product and L by the linear operator
. In particular, \(\|\mathbf {L}\|^{2} ={\sum }_{j=1}^{n}\|L_{j}\|^{2}\) and its adjoint operator is \(\mathbf {L}^{*}:\mathcal {G}_{1}\times \cdots \times \mathcal {G}_{m}\to {\mathscr{H}}:(v_{1},\ldots ,v_{m})\to {\sum }_{j=1}^{m} L_{j}^{*} v_{j}\). Hence, the result follows by considering the averaged nonexpansive operator T in (6) for this choice of operators and applying Theorem 3. □
Minimality for primal-dual parametrized resolvent splitting
We begin by extending the definition of fixed point encoding to englobe primal-dual problems. As in Section 2.1, we denote by T a fixed point operator and by S a solution operator, both parametrized by the maximally monotone operators as well as the linear and adjoint operators appearing in Problem 1.
Definition 8 (Fixed point encoding)
A pair of operators (T,S) is a fixed point encoding for Problem 1 if, for all particular instance of the problem,
where we recall that Z denotes the set of primal-dual solutions of the problem.
When talking about lifting for primal-dual problems, the need to distinguish between variables in the space of primal solutions and dual solutions arises. This motivates the following definition.
Definition 9 (Primal-dual lifting)
Let \(d,f\in \mathbb {Z}_{+}\). A fixed point encoding (T,S) is a (d,f)-fold lifting for Problem 1 if
and
where fj ≥ 0 for all j ∈ ⟦1,m⟧ and \(f = {\sum }_{j=1}^{m} f_{j}\). We adopt the convention that the space \(\mathcal {G}_{j}\) vanishes from the equation when fj = 0.
The need to control the Lipschitz constants of the linear operators requires the introduction of parameters in the resolvents of the maximally monotone operators. This motivates the definition of parametrized resolvent splitting introduced in Section 2.1 and which we now adapt to primal-dual splitting algorithms.
Definition 10 (Primal-dual parametrized resolvent splitting)
A fixed point encoding (T,S) for Problem 1 is a primal-dual parametrized resolvent splitting if, for all particular instance of the problem, there is a finite procedure that evaluates T and S at a given point which only uses vector addition, scalar multiplication, the parametrized resolvents of A1,…An and B1,…,Bm, and forward evaluations of L1,…,Lm and their adjoints.
Definition 11 (Frugality)
A primal-dual parametrized resolvent splitting (T,S) for Problem 1 is frugal if, in addition, each of the parametrized resolvents of A1,…,An and B1,…,Bm is used exactly once.
Remark 3 (On the absence of restrictions on the evaluation of the linear operators)
Since in the finite case, a forward evaluation of a linear operator is computationally equivalent to performing vector addition and scalar multiplication, this suggests that for practical applications there is no computational need to control the number of evaluations of the linear operators in the definition of frugality.
Example 4
Let n ≥ 2 and consider Problem 1. Let \(T:{\mathscr{H}}^{n-1}\times \mathcal {G}_{1}\times \cdots \times \mathcal {G}_{m}\to {\mathscr{H}}^{n-1}\times \mathcal {G}_{1}\times \cdots \times \mathcal {G}_{m}\) be the operator defined in (6) by setting and
. Let \(S:{\mathscr{H}}^{n-1}\times \mathcal {G}_{1}\times \cdots \times \mathcal {G}_{m}\to {\mathscr{H}}\times \mathcal {G}_{1}\times \cdots \times \mathcal {G}_{m}\) be defined as
Then, by Lemma 1 and Corollary 1, the pair (T,S) is a frugal parametrized resolvent splitting with (n − 1,m)-fold lifting.
The following result shows that the lifting of Algorithm 1 is minimal among frugal primal-dual parametrized resolvent splitting algorithms with m dual variables.
Theorem 5 (Minimality theorem for frugal parametrized splitting)
Let (T,S) be a frugal primal-dual parametrized resolvent splitting for Problem 1 with (d,m)-fold lifting. Then, if n ≥ 2, necessarily d ≥ n − 1.
Proof
By way of contradiction, let (T,S) be a frugal primal-dual parametrized resolvent splitting for Problem 1 with (d,m) fold lifting and d < n − 1. Consider the instance of the problem in which \(L_{j} =\text {Id} :{\mathscr{H}} \to {\mathscr{H}}\) for all j ∈ ⟦1,m⟦. Then, Problem 1 becomes the classical monotone inclusion problem with n + m operators and (T,S) is a frugal resolvent splitting with (d + m)-fold lifting for such problem with d + m < n + m − 1, which contradicts Theorem 2. □
Finally, we conclude this section by highlighting that Algorithm 1 can be applied with n < 2, by setting Ai = 0 if required. However, a reduction in the lifting is not obtained in this case.
Remark 4 (Algorithm 1 when n ≤ 1)
Consider Algorithm 1 applied to Problem 1 with n ≤ 1. We distinguish the two cases:
-
(i)
If n = 1, then Algorithm 1 has (1,m)-lifting. Indeed, (27) and (28) become
$$ \left( \begin{array}{ll} z^{k+1} \\ \mathbf{v}^{k+1} \end{array}\right) = \left( \begin{array}{ll} z^{k} \\ \mathbf{v}^{k} \end{array}\right) + \lambda \left( \begin{array}{cc} x^k-z^k \\ \gamma (y_1^k - L_1x^k) \\ {\vdots} \\ \gamma(y_m^k - L_m x^k) \end{array}\right) \quad \forall k\geq 0, $$(29)and
$$ \left\{\begin{array}{ll} x^{k} &= J_{A_{1}}\left( z^{k} -\sum\limits_{j=1}^{m} L_{j}^{*}(\gamma L_{j} z^{k} - {v_{j}^{k}})\right), \\ {y_{j}^{k}} &= J_{B_{j}/\gamma} \left( L_{j}(z^{k}+x^{k}) - \frac{{v^{k}_{j}}}{\gamma}\right), \quad \forall j\in{\llbracket 1,m\rrbracket}, \end{array}\right. $$(30)respectively. This means that, in contrast with what happens when n ≥ 2, there is no reduction in the lifting with respect to the number of operators involved.
-
(ii)
If n = 0, the scheme also has (1,m)-lifting. In fact, the scheme is the same as in the previous case but substituting \(J_{A_{1}}\) by Id in (30). Note that this is also the lifting obtained by the already known algorithms in the literature applied to this case.
4 Numerical experiments
In this section, we test our algorithm for solving an ill-conditioned linear inverse problem which arises in image deblurring and denoising. Let \(b\in \mathbb {R}^{n}\) be an observed blurred and noisy image of size M × N, with n = MN for grayscale and n = 3MN for color images, and denote by \(A\in \mathbb {R}^{n\times n}\) the blur operator. The problem can be tackled by means of the regularized convex non-differentiable problem
where α1,α2 > 0 are regularization parameters, \(\delta _{[0,1]^{n}}\) denotes the indicator function of the set [0,1]n, \(TV:\mathbb {R}^{n}\to \mathbb {R}\) is the discrete isotropic total variation function and W is the linear operator given by the normalized nonstandard Haar transform [21].
Recalling Remark 2, it is of interest to consider a mechanism which allows tuning the parameter γ appearing in the definition of the norm given by the inner product in (8) to an appropriate value. To this aim, we perform in (31) a change of variable of the form s = μx, with μ > 0, and instead handle the problem
Below, we will see the way in which the choice of μ can help setting a suitable parameter γ.
The minimization problem in (32) can be modeled as a composite monotone inclusion problem. For this, define the operator \(L:\mathbb {R}^{n}\to \mathbb {R}^{n}\times \mathbb {R}^{n} : x \to (L_{1} x, L_{2} x)\) where L1 and L2 are defined component-wise as
Then, the parametrized total variation function can be written as TV (μ⋅) = ∥L(⋅)∥×, with \(\|(p,q)\|_{\times } := {\sum }_{i=1}^{m} {\sum }_{j=1}^{n} \sqrt {p_{i,j}^{2} + q_{i,j}^{2}}\). Furthermore, an upper bound of the Lipschitz constant of L is given by ∥L∥2 ≤ 8μ2 (see [9] for details).
By [2, Proposition 27.5], obtaining a solution to the following problem is equivalent to solving (32)
with \(g_{1}: \mathbb {R}^{n} \to \mathbb {R}\), g1(y) = α1μ∥y∥1, \(g_{2}:\mathbb {R}^{n} \to \mathbb {R}\), g2(y) = μ∥y − b/μ∥1, \(g_{3}:\mathbb {R}^{n} \times \mathbb {R}^{n} \to \mathbb {R}\), g3(p,q) = α2∥(p,q)∥×, and \(N_{[0,1/\mu ]^{n}}\) the normal cone operator to the set [0,1/μ]n. In order to implement Algorithm 1 for solving (34), we need the expression of the following resolvents and proximity operators. By [2, Proposition 23.25 (iii)], the second term in (34) is a maximally monotone operator and its resolvent can be expressed as
where proxg = J∂g denotes the proximity operator of a function g, and \(g_{1}^{*}\) is the conjugate function to g1, which is equal to the indicator function \(\delta _{[-\alpha _{1}\mu ,\alpha _{1}\mu ]^{n}}\), and thus \(\text {prox}_{g_{1}^{*}} = P_{[-\alpha _{1}\mu ,\alpha _{1}\mu ]^{n}}\). Given σ > 0, the proximity operators of g2 and g3 are, respectively,
where ⊙ denotes element-wise product and [⋅]+ and |⋅| are applied element-wise, and
since the conjugate function of g3 is \(g_{3}^{*}:\mathbb {R}^{n}\times \mathbb {R}^{n} \to \mathbb {R}^{n}\), \(g_{3}^{*} = \delta _{S}\), with the set S defined as
and the projection operator \(P_{S}:\mathbb {R}^{n}\times \mathbb {R}^{n}\to S\) is given component-wise by
Hence, when choosing \(z^{0}\in \mathbb {R}^{n}\), \({v_{1}^{0}}\in \mathbb {R}^{n}\) and \({v_{2}^{0}}\in \mathbb {R}^{n}\times \mathbb {R}^{n}\) as starting values, and letting λ ∈ ]0,1[ and \(\gamma \in {\left ]0,1/(\|A\|^{2}+\|L\|^{2})\right ]}\), the iterative scheme in Algorithm 1 becomes
In our experiment, we replicate the problem in [5, Section 4.2], where an extensive comparison between different primal-dual algorithms is presented. Since the best performing algorithm is the Douglas–Rachford type primal-dual method in [5, Algorithm 3.1], we limit our comparison to this algorithm, whose detailed implementation is given in the cited work. We ran our experiments in MATLAB, making use of the inbuilt functions fspecial and imfilter to define an operator A which is a Gaussian blur operator of size 9 × 9 with standard deviation 4 and reflexive boundary conditions. In particular, A verifies ∥A∥ = 1 and A∗ = A. We employed as observed image b a picture taken at the Schönbrunn Palace Gardens (Vienna) subjected to the already specified blur followed by the addition of a zero-mean Gaussian noise with standard deviation 10− 3 (see Fig. 2). To test the influence on the performance of the picture size, we resized the original picture to different pixel resolutions (see Table 1).
When measuring the quality of the restored images, we use the improvement in signal-to-noise-ratio (ISNR), which is given by
where x and xk are the original and the reconstructed image at iteration k, respectively. We tuned the regularization parameters in order to guarantee an adequate ISNR value for the restored images, setting α1 := 0.005 and α2 := 0.009.
We recall that the stepsize parameter γ of Algorithm 1 must be taken in the interval \(\gamma \in {\left ]0,1/(\|A\|^{2}+\|L\|^{2})\right ]}= {\left ]0,1/(1+8\mu ^{2})\right ]}\). When μ = 1 (i.e., we solve (31)), this interval is ]0,0.111]. In our numerical experiments, we empirically observed that a very small stepsize negatively affects the performance of the algorithm, as mentioned in Remark 2. After testing different options, the most convenient one seems to be \(\mu = 1/\sqrt {8}\), which implies making the Lipschitz constant of both linear operators in the problem equal to 1.
The initialization of each of the methods was the following:
-
DR1([5, Algorithm 3.1]): starting points x0 = b and (v1,0,v2,0,v3,0) = (0,0,0), σ1 = 1, σ2 = 0.05, σ3 = 0.05, τ = 1(σ1 + σ2 + 8σ3)− 1 − 0.01, λn = 1.5 for al \(n\in \mathbb {N}\).
-
Algorithm 1 with μ = 1: starting points z0 = b and \(({v_{1}^{0}},{v_{2}^{0}}) = (0,0)\), λ = 0.99 and γ = 1/9;
-
Algorithm 1 with \(\mu =1/\sqrt {8}\): starting points z0 = b/μ and \(({v_{1}^{0}},{v_{2}^{0}}) = (0,0)\), λ = 0.99 and γ = 1/2.
We performed 400 iterations of each of the algorithms and compared the values of the objective function in (32) and the ISNR with respect to the CPU time, which provides a more realistic comparison than iteration count, since DR1 has a higher computational cost per iteration than Algorithm 1. The tests were ran on a desktop of Intel Core i7-4770 CPU 3.40GHz with 32GB RAM, under Windows 10 (64-bit). The algorithms were ran 3 times, once for each of the RGB components of the picture. The evolution in CPU time of adding these 3 values of the objective function and those of the ISNR for the 640 × 768-sized picture are represented in Fig. 1, where we observe that Algorithm 1 with \(\mu =1/\sqrt {8}\) obtains slightly better values than those returned by DR1, but in significantly less time.
The evolution of the values of the objective function and of the ISNR in CPU time for 400 iterations of Algorithm 1 with μ = 1 and \(\mu =1\sqrt {8}\) and DR1, using the 640 × 768 pixels image displayed in Fig. 2
The restored images are presented in Fig. 2. There is no much difference between the ones corresponding to Algorithm 1 with \(\mu =1/\sqrt {8}\) (bottom-middle) and DR1 (bottom-right), but a close look at the image obtained with Algorithm 1 with μ = 1 permits to observe its worse quality. To show that this trend in the performance of the algorithms is not affected by the image size, we present in Table 1 the results from running the algorithms on the same picture for five different pixel resolutions. Overall, we notice that the CPU time required for computing the 400 iterations is significantly lower for Algorithm 1, as expected. On average, DR1 required 45% more time than Algorithm 1 to compute the 400 iterations, independently of the size of the image. Regarding the parameter μ, Algorithm 1 with μ = 1 is notably outperformed by the other two methods, making thus clear the influence that this parameter has on it. The function values obtained were slightly lower for DR1, while the ISNR was slightly lower for Algorithm 1 with \(\mu =1/\sqrt {8}\), which implies that both algorithms performed similarly with respect to the restored image quality.
Interpretation of the results of the experiments
The experimental results show that, after performing the same number of iterations, Algorithm 1 with \(\mu =1/\sqrt {8}\) obtains similar results in the function values and the measurement in the quality of the image recovery than those obtained by DR1, but in considerably less time. This decrease in the running time can be attributed to the reduction in the lifting of the operator. Although in the first iterations DR1 achieves a larger reduction of the objective function, the quality of the restored image is not sufficient, as assessed by the low ISNR values. On the other hand, Algorithm 1 with μ = 1 can be discarded, as it obtains higher objective and lower ISNR values. Consequently, Algorithm 1 with \(\mu =1/\sqrt {8}\) is the preferable choice to address problem (31).
5 Conclusions and open questions
In this work, we have considered the composite monotone inclusion problem together with its dual counterpart given by Problem 1. We have extended the definition of resolvent splitting given in [19] to encompass primal-dual algorithms and the inclusion of parameters in the resolvent and presented a definition of minimal lifting for frugal schemes of this form. We have proposed the first primal-dual algorithm which presents minimal lifting in this sense, and show its good performance with a numerical example.
To conclude, we outline possible directions for further research.
Establishing an optimal criterion for tuning the stepsize γ
We pointed out in Remark 2 the influence that the parameter γ can have in the performance of the algorithm. In Section 4 we presented a possibility for controlling this parameter, by making use of a change of variable which modifies the Lipschitz constants of the linear operators, and we empirically showed that it significantly affects the speed of performance of the algorithm. However, there is no guarantee that this strategy is optimal. It would be interesting to further investigate which is the best way for tuning the value of γ.
Achieving lifting reduction in the dual variables
The reduction in the lifting with respect to the number of operators achieved in the algorithm here presented only affects the primal variables. It remains open the question of whether it is possible to reduce the dimension of the underlying space associated to the linearly composed operators. More precisely, if we consider the problem given by
is it possible to obtain an algorithm for solving this problem with (0,m − 1)-fold lifting (according to Definition 9)? Or even with (1,m − 1) or (0,m)-fold lifting? All these questions remain open.
Notes
Here, we make use of an abuse of notation. Indeed (36), should be written as y = (Yz ⊗Id)z + (Yx⊗)x, where ⊗ denotes the Kronecker product.
References
Attouch, H., Théra, M.: A general duality principle for the sum of two operators. J. Convex Anal. 3, 1–24 (1996)
Bauschke, H.H., Combettes, P.L.: Convex analysis and monotone operator theory in Hilbert spaces, 2nd edn. Springer, Berlin (2017)
Boţ, R.I., Csetnek, E.R., Heinrich, A.: A primal-dual splitting algorithm for finding zeros of sums of maximally monotone operators. SIAM J. Optim. 23(4), 2011–2036 (2013)
Boţ, R.I., Csetnek, E.R., Heinrich, A., Hendrich, C.: On the convergence rate improvement of a primal-dual splitting algorithm for solving monotone inclusion problems. Math. Program. 150(2), 251–279 (2015)
Boţ, R.I., Hendrich, C.: A Douglas–Rachford type primal-dual method for solving inclusions with mixtures of composite and parallel-sum type monotone operators. SIAM. J. Optim. 23(4), 2541–2565 (2013)
Boţ, R.I., Hendrich, C: Solving monotone inclusions involving parallel sums of linearly composed maximally monotone operators. Inverse Probl. Imaging 10(3), 617–640 (2016)
Briceño-Arias, L., Combettes, P.L.: A monotone + skew splitting model for composite monotone inclusions in duality. SIAM J. Optim. 21(4), 1230–1250 (2011)
Campoy, R: A product space reformulation with reduced dimension for splitting algorithms. Comput. Optim. Appl 83, 319–348 (2022)
Chambolle, A.: An algorithm for total variation minimization and applications. J. Math. Imaging Vis. 20(1–2), 89–97 (2004)
Chambolle, A., Lions, P.L.: Image recovery via total variation minimization and related problems. Numer. Math. 76(2), 167–188 (1997)
Chambolle, A., Pock, T: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)
Combettes, P.L., Pesquet, J.-C.: Primal-dual splitting algorithm for solving inclusions with mixture of composite, Lipschtizian, and parallel-sum type monotone operators. Set-valued Var. Anal. 20(2), 307–330 (2012)
Condat, L, Kitahara, D, Contreras, A, Hirabayashi, A: Proximal splitting algorithms for convex optimization: a tour of recent advances, with new twists. To be published in SIAM Review (2022)
Dao, M.N., Dizon, N., Hogan, J.A., Tam, M.K.: Constraint reduction reformulations for projection algorithms with applications to wavelet construction. J. Optim. Theory Appl. 190, 201–233 (2021)
Lions, P.L., Mercier, B: Splitting algorithms for the sum of two nonlinear operators. SIAM. J. Numer. Anal. 16(6), 964–979 (1979)
Malitsky, Y, Tam, M.K: Resolvent splitting for sums of monotone operators with minimal lifting. arXiv:2108.02897 (2021)
Minty, G.J.: Monotone (nonlinear) operators in Hilbert space. Duke. Math. J. 29, 341–346 (1962)
Pierra, G.: Decomposition through formalization in a product space. Math. Program. 28, 96–115 (1984)
Ryu, E.K.: Uniqueness of DRS as the 2-operator resolvent-splitting and impossibility of 3-operator resolvent-splitting. Math. Program. 182(1), 233–273 (2020)
Setzer, S., Steidl, G., Teuber, T.: Infimal convolution regularizations with discrete ℓ1-type functionals. Commun. Math. Sci. 9(3), 797–827 (2011)
Stollnitz, E.J., DeRose, T.D., Salesim, H.D.: Wavelets for computer graphics: a primer, part 1. IEEE Comput. Graph. Appl. 15(3), 76–84 (1995)
Vũ, B.C.: A splitting algorithm for dual monotone inclusions involving cocoercive operators. Adv. Comput. Math. 38, 667–681 (2013)
Funding
FJAA and DTB were partially supported by the Ministry of Science, Innovation and Universities of Spain and the European Regional Development Fund (ERDF) of the European Commission, Grant PGC2018-097960-B-C22. FJAA was partially supported by the Generalitat Valenciana (AICO/2021/165). RIB was partially supported by FWF (Austrian Science Fund), project P 34922-N. DTB was supported by MINECO and European Social Fund (PRE2019-090751) under the program “Ayudas para contratos predoctorales para la formación de doctores” 2019.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A: Proof of the minimality theorem for parametrized resolvent splitting
Appendix A: Proof of the minimality theorem for parametrized resolvent splitting
Throughout this section, we assume that n ≥ 2 and we denote by \(\mathcal {A}_{n}\) the set of all n-tuples of maximally monotone operators on \({\mathscr{H}}\). Hence, an element \(A\in \mathcal {A}_{n}\) is of the form A = (A1,…,An), where \(A_{i}:{\mathscr{H}} \rightrightarrows {\mathscr{H}}\) are maximally monotone operators for all i ∈ ⟦1,n⟧. Every instance of Problem 2 is determined by the choice of \(A\in \mathcal {A}_{n}\). In particular, when considering a fixed point encoding for this problem, the fixed point operator and the solution operator are both parametrized in terms of \(A\in \mathcal {A}_{n}\). To emphasize this idea and to facilitate the exposition, we denote these operators by TA and SA in the following.
Let (TA,SA) be a d-fold lifted frugal parametrized resolvent splitting for Problem 2. By definition, there exists a finite procedure for evaluating TA and SA using only vector addition, scalar multiplication and the resolvents \(J_{\delta _{1} A_{1}}, \ldots , J_{\delta _{n} A_{n}}\) precisely once, where δ = (δ1,…,δn)T is a vector of positive parameters. Following the same reasoning than in [16, Section 3], we can completely describe the evaluation of a point \(\mathbf {z} = (z_{1},\ldots ,z_{d})\in {\mathscr{H}}^{d}\) by TA with a series of equations. We directly present them here.
-
(1)
There exists \(\mathbf {x}= (x_{1},\ldots ,x_{n})\in {\mathscr{H}}^{n}\) and \(\mathbf {y} = (y_{1},\ldots ,y_{n})\in {\mathscr{H}}^{n}\) such that
$$ \mathbf{x}= J_{\delta A} (\mathbf{y}) \Longleftrightarrow 0 \in \mathbf{x} - \mathbf{y} + \delta A(\mathbf{x}), $$(35)where \(\delta A := (\delta _{1} A_{1}, \ldots , \delta _{n} A_{n})\in \mathcal {A}_{n}\).
-
(ii)
There exists \(Y_{z} \in \mathbb {R}^{n\times d}\) and a lower-triangular matrix \(Y_{x} \in \mathbb {R}^{n\times n}\) with zeros in the diagonal such thatFootnote 1
$$ \mathbf{y} = Y_{z} \mathbf{z} + Y_{x}\mathbf{x}. $$(36) -
(iii)
By frugality, there exists \(T_{z}\in \mathbb {R}^{d\times d}\) and \(T_{x}\in \mathbb {R}^{d \times n}\) such that
$$ T_{A}(\mathbf{z}) = T_{z}\mathbf{z} + T_{x}\mathbf{x}. $$(37)
Similarly, also by frugality, the evaluation of z by the solution operator S can be expressed as
where \(S_{z}\in \mathbb {R}^{1\times d}\) and \(S_{x}\in \mathbb {R}^{1\times n}\).
The proof of the next technical lemma can be obtained by following the same steps than in [16, Lemma 3.1], so we do not replicate it here.
Lemma 3
Let (TA,SA) be a frugal parametrized resolvent splitting for Problem 2. Let M denote the block matrix given by
If z ∈FixTA, then there exists \(\mathbf {v} = \left [\mathbf {z}, \mathbf {x}, \mathbf {y}, \mathbf {a} \right ]^{T}\in \ker {M}\) with a ∈ A(x). Conversely, if \(\mathbf {v} = \left [ \mathbf {z}, \mathbf {x}, \mathbf {y}, \mathbf {a} \right ]^{T}\in \ker {M}\) and a ∈ A(x), then z ∈FixTA, x = JδA(y) and SA(z) = Szz + Sxx.
Proposition 5 (Solution operator)
Let (TA,SA) be a frugal parametrized resolvent splitting for Problem 2 . Then, for all \(\bar {\mathbf {z}}\in \text {Fix} {T_{A}}\) and \(\bar {\mathbf {x}} = J_{\delta A}(\bar {\mathbf {y}})\), we have
where \(\bar {\mathbf {a}} =A(\bar {\mathbf {x}})\).
Proof
Consider a particular instance of Problem 2 given by some operators \(A\in \mathcal {A}_{n}\). Let TA and SA be the fixed point and solution operators of this particular instance, respectively. Let \(\bar {\mathbf {z}}\in \text {Fix} {T_{A}}\) and \(x^{*} = S_{A}(\bar {\mathbf {z}})\). By Lemma 1, there exists \(\mathbf {v} := \left [ \bar {\mathbf {z}}, \bar {\mathbf {x}}, \bar {\mathbf {y}}, \bar {\mathbf {a}} \right ]^{T}\in \ker {M}\) with \(\bar {\mathbf {a}}\in A(\bar {\mathbf {x}})\) and \(x^{*} = S_{A}(\bar {\mathbf {z}}) = S_{z} \bar {\mathbf {z}} + S_{z} \bar {\mathbf {x}}\).
Consider now the n + 1 instances of Problem 2 given by the n-tuples of maximally monotone operators \(A^{(0)}, A^{(1)},\ldots , A^{(n)}\in \mathcal {A}_{n}\) defined as
Since \({\mathbf {v}}\in \ker M\) and \(\bar {\mathbf {a}} = A^{(j)}(\bar {\mathbf {x}})\), for all j ∈ ⟦0,n⟧, Lemma 1 implies that \(\bar {\mathbf {z}} \in \text {Fix} {T_{A^{(j)}}}\), \(\bar {\mathbf {x}} = J_{\delta A^{(j)}}(\bar {\mathbf {y}})\) and thus, \(S_{A^{(j)}}(\bar {\mathbf {z}})= S_{z} \bar {\mathbf {z}} + S_{x}\bar {\mathbf {x}}=x^{*}\) is a solution to every instance. Therefore, we have \(0 = {\sum }_{i=1}^{n} A_{i}^{(0)}(x^{*}) = {\sum }_{i=}^{n} \bar {a}_{i}\) and hence
from where it follows that \(x^{*} = \bar {x}_{1} = {\cdots } = \bar {x}_{n}\). Finally, since \(\bar {\mathbf {x}} = J_{\delta A^{(0)}} (\bar {\mathbf {y}})\), we have that \(\bar {\mathbf {y}}-\bar {\mathbf {x}} = \delta A^{(0)}(\bar {\mathbf {x}})=(\delta _{1}\bar {a}_{1},\ldots ,\delta _{n}\bar {a}_{n})\). Consequently, \({\sum }_{i=1}^{n} \bar {y}_{i} - n x^{*} = {\sum }_{i=1}^{n} \delta _{i} \bar {a}_{i}\), which completes the proof. □
Note that, although the expression for the solution operator given by (39) differs from the one obtained in [16, Proposition 3.2], it still holds that the vector \(\bar {\mathbf {x}}\) belongs to the diagonal subspace of dimension n, which we denote by Δn. This is what we employ to prove the following theorem.
Theorem 6
Let (TA,SA) be a frugal parametrized resolvent splitting with d-fold lifting for Problem 2. Then, d ≥ n − 1.
Proof
Suppose, by contradiction, that (TA,SA) is a frugal parametrized resolvent splitting for Problem 2 with d-fold lifting such that d ≤ n − 2. Consider a particular instance of the problem given by \(A\in \mathcal {A}_{n}\) such that \(\text {zer}~{\left ({\sum }_{i=1}^{n} A_{i}\right )} \neq \emptyset \) and take z ∈FixTA. By Lemma 1, there exists \(\mathbf {v}:= \left [ \mathbf {z}, \mathbf {x}, \mathbf {y}, \mathbf {a} \right ]^{T}\in \ker M\) with a ∈ A(x). The last row of M implies that 0 = (Tz −Id)z + Txx. Since \(T_{x}\in \mathbb {R}^{d\times n}\) and d ≤ n − 2, by the rank-nullity theorem, \(\dim \ker T_{x} = n - \dim \text {rank} T_{x} \geq n - d \geq 2\). Since Δn is a subspace of dimension 1, there exists \(\bar {\mathbf {x}}\notin {\Delta }_{n}\) such that \(T_{x} \mathbf {x} = T_{x}\bar {\mathbf {x}}\).
Now, set \({\bar {\mathbf {z}}} := \mathbf {z}\), \(\bar {\mathbf {y}} := Y_{z} \bar {\mathbf {z}} + Y_{x} \bar {\mathbf {x}}\) and \(\bar {\mathbf {a}} := ((\bar {y}_{1}-\bar {x}_{1})/\delta _{1},\ldots ,(\bar {y}_{n} -\bar {x}_{n})/\delta _{n})\) and consider the instance of the problem given by \(\bar {A}\in \mathcal {A}_{n}\) defined as \(\bar {A}(\mathbf {s}) := \bar {\mathbf {a}}\) for all \(\mathbf {s}\in {\mathscr{H}}^{n}\). Then, \(\bar {\mathbf {v}} := \left [\bar {\mathbf {z}}, \bar {\mathbf {x}}, \bar {\mathbf {y}}, \bar {\mathbf {a}}\right ]^{T}\in \ker M\) with \(\bar {\mathbf {a}} =\bar {A}(\bar {\mathbf {x}})\). By Lemma 3 and Proposition 5, this implies that \(\bar {\mathbf {x}} \in {\Delta }_{n}\), obtaining thus a contradiction which completes the proof. □
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Aragón-Artacho, F.J., Boţ, R.I. & Torregrosa-Belén, D. A primal-dual splitting algorithm for composite monotone inclusions with minimal lifting. Numer Algor 93, 103–130 (2023). https://doi.org/10.1007/s11075-022-01405-9
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11075-022-01405-9