Backward-Forward-Reflected-Backward Splitting for Three Operator Monotone Inclusions
Janosch Rieger∗   Matthew K. Tam†‡

January 22, 2020

∗ School of Mathematics, Monash University, 9 Rainforest Walk, Clayton VIC 3800, Australia. Email: [email protected]
† School of Mathematics & Statistics, The University of Melbourne, Parkville VIC 3010, Australia. Email: [email protected]
‡ Institute for Numerical and Applied Mathematics, University of Göttingen, Lotzestr. 16–18, 37083 Göttingen, Germany
Abstract
In this work, we propose and analyse two splitting algorithms for finding a zero of the sum of three monotone operators, one of which is assumed to be Lipschitz continuous. Each iteration of these algorithms requires one forward evaluation of the Lipschitz continuous operator and one resolvent evaluation of each of the other two operators. By specialising to two operator inclusions, we recover the forward-reflected-backward and the reflected-forward-backward splitting methods as particular cases. The inspiration for the proposed algorithms arises from interpretations of the aforementioned reflected splitting algorithms as discretisations of the continuous-time proximal point algorithm.
Keywords. operator splitting · monotone operators · dynamical systems

MSC2010. · · ·

1 Introduction

In this work, we propose two new splitting algorithms for finding a zero of the sum of three monotone operators in a real Hilbert space $\mathcal{H}$ with inner product $\langle\cdot,\cdot\rangle$ and induced norm $\|\cdot\|$. Precisely, we consider monotone inclusions of the form
$$0 \in (A + B + C)(x), \tag{1}$$
where $A, C\colon \mathcal{H} \rightrightarrows \mathcal{H}$ are maximally monotone operators, the operator $B\colon \mathcal{H} \to \mathcal{H}$ is single-valued, monotone and Lipschitz continuous, and $(A + B + C)^{-1}(0) \neq \emptyset$. We are particularly interested in the case when $B$ is not cocoercive. This situation arises, for instance, when considering the first order optimality condition for saddle-point problems of the form
$$\min_{x \in \mathcal{H}_1}\max_{y \in \mathcal{H}_2}\ f_1(x) + f_2(x) + \Phi(x, y) - g_1(y) - g_2(y), \tag{2}$$
where $f_1, f_2\colon \mathcal{H}_1 \to (-\infty, +\infty]$ and $g_1, g_2\colon \mathcal{H}_2 \to (-\infty, +\infty]$ are proper lower semicontinuous convex functions and $\Phi\colon \mathcal{H}_1 \times \mathcal{H}_2 \to \mathbb{R}$ is a smooth convex-concave function.
Australia .Email: [email protected] † School of Mathematics & Statistics, The University of Melbourne, Parkville VIC 3010,
Australia .Email: [email protected] ‡ Institute for Numerical and Applied Mathematics, University of Göttingen, Lotztestr. 16–18, 37083Göttingen,
Precisely, the optimality condition for (2) is given by (1) with $\mathcal{H} = \mathcal{H}_1 \times \mathcal{H}_2$ and
$$A(x,y) = \begin{pmatrix} \partial f_1(x) \\ \partial g_1(y) \end{pmatrix},\qquad B(x,y) = \begin{pmatrix} \nabla_x \Phi(x,y) \\ -\nabla_y \Phi(x,y) \end{pmatrix},\qquad C(x,y) = \begin{pmatrix} \partial f_2(x) \\ \partial g_2(y) \end{pmatrix}. \tag{3}$$
The operator $B$ in (3) is Lipschitz continuous whenever $\nabla\Phi$ is but, even in the simple realisation of (2) where $\Phi$ is a bilinear form, the operator $B$ fails to be cocoercive [15, Section 1]. Splitting algorithms which do not require cocoercivity of $B$ are therefore of general interest for solving the saddle-point problem (2), and they currently attract particular attention because of their success in training generative adversarial networks [9, 20, 16].

Until recently, most known splitting algorithms could only directly solve monotone inclusions with two operators, instead resorting to a higher-dimensional product space reformulation when more than two operators were involved (see, for instance, [4, Proposition 26.4]). One of the first schemes to overcome this for three operator inclusions was proposed by Davis & Yin in [10] which, in turn, generalises earlier work by Raguet, Fadili & Peyré [17]. In this connection, see [12, 13]. Davis–Yin splitting for (1) with stepsize $\lambda > 0$ takes the form
$$\begin{aligned} x^k &= J_{\lambda A}(z^k)\\ y^k &= J_{\lambda C}\bigl(2x^k - z^k - \lambda B(x^k)\bigr)\\ z^{k+1} &= z^k + y^k - x^k, \end{aligned}\tag{4}$$
where $J_T := (\mathrm{Id} + T)^{-1}$ denotes the resolvent of a maximally monotone operator $T\colon \mathcal{H} \rightrightarrows \mathcal{H}$, which is a single-valued operator with full domain [4, Chapter 23]. When $B = 0$, the method (4) reduces to Douglas–Rachford splitting and, when $A = 0$, it reduces to the forward-backward method given by
$$x^{k+1} = J_{\lambda C}\bigl(x^k - \lambda B(x^k)\bigr). \tag{5}$$
Thus, as for the forward-backward method, it is necessary that $B$ is cocoercive to guarantee convergence of (4). Consequently, (4) cannot be used to solve (2).
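Expressed in code, one iteration of (4) is three lines. The following Python sketch is our illustration only: the resolvents are supplied as callables, and the toy problem data are placeholder assumptions rather than anything from the paper.

```python
import numpy as np

def davis_yin(J_lamA, J_lamC, B, lam, z0, iters=1000):
    """Minimal sketch of Davis-Yin splitting (4).

    J_lamA, J_lamC : callables computing the resolvents of lam*A and lam*C.
    B              : callable for the single-valued operator B.
    Returns an approximate zero of A + B + C (valid when B is
    cocoercive and lam is suitably small).
    """
    z = z0
    for _ in range(iters):
        x = J_lamA(z)
        y = J_lamC(2 * x - z - lam * B(x))
        z = z + y - x                      # z^{k+1} = z^k + y^k - x^k
    return J_lamA(z)

# Toy instance: A = subdifferential of |.|_1 (resolvent: soft-thresholding),
# C = Id (resolvent: a scaling), B = Qx with Q symmetric PSD, so B is cocoercive.
lam = 0.5
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
soft = lambda v: np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)
x_star = davis_yin(soft, lambda v: v / (1 + lam), lambda v: Q @ v,
                   lam, z0=np.array([5.0, -3.0]))
print(x_star)  # approximately solves 0 in d|x|_1 + Qx + x (here the origin)
```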
In order to overcome this shortcoming, Ryu & Vũ proposed the forward-reflected-Douglas–Rachford splitting method in [19], which, for stepsizes $\gamma > \lambda > 0$, takes the form
$$\begin{aligned} x^{k+1} &= J_{\lambda A}\bigl(x^k - \lambda z^k - 2\lambda B(x^k) + \lambda B(x^{k-1})\bigr)\\ y^{k+1} &= J_{\gamma C}\bigl(2x^{k+1} - x^k + \lambda z^k\bigr)\\ z^{k+1} &= z^k + \tfrac{1}{\lambda}\bigl(2x^{k+1} - x^k - y^{k+1}\bigr). \end{aligned}\tag{6}$$
The discovery of this method was computer-assisted and used ideas from the performance estimation methodology [21, 18, 11]. When $B = 0$, the method (6) reduces to Douglas–Rachford splitting and, when $C = 0$, it reduces to the forward-reflected-backward method [15] given by
$$x^{k+1} = J_{\lambda C}\bigl(x^k - 2\lambda B(x^k) + \lambda B(x^{k-1})\bigr). \tag{7}$$
Unlike the forward-backward method, the iteration (7) does not require $B$ to be cocoercive and can be applied to (2) when $f_2 = g_2 = 0$ (see [15]). As a consequence, the method (6) also does not require cocoercivity of $B$. A closely related method with the same property is the reflected-forward-backward method [6, 14], which takes the form
$$x^{k+1} = J_{\lambda C}\bigl(x^k - \lambda B(2x^k - x^{k-1})\bigr). \tag{8}$$
To the best of the authors' knowledge, an explanation for why (7) or (8) converge without cocoercivity has not yet been given (beyond their proofs), although there are satisfactory interpretations in the case when $C = 0$ (see [20, 8]).

In the first part of this work, we address this issue by providing interpretations of (7) and (8) in the general case as two different discretisations of the same asymptotically stable dynamical system. This interpretation explains their convergence in the absence of cocoercivity. In the second part of this work, we use this interpretation to derive two new three operator splitting algorithms for solving (1) which, in contrast to (6), use the same stepsize in the resolvents of $A$ and $C$. In further contrast to the approach used in (6), our algorithms were not discovered using computer-assistance; rather, they are derived systematically through discretising a continuous-time dynamical system.

The remainder of this work is structured as follows. In Section 2, we propose an interpretation of two reflected splitting algorithms for two operator inclusions, namely the forward-reflected-backward and the reflected-forward-backward splitting methods, as discretisations of the continuous-time proximal point algorithm. In Sections 3 and 4, we exploit an analogous technique to derive two new algorithms for solving the three operator inclusion (1) and analyse their convergence. More precisely, the scheme in Section 3 generalises the forward-reflected-backward method, and the scheme in Section 4 generalises the reflected-forward-backward splitting method.

2 Reflected Splitting Methods as Discretisations of the Proximal Point Algorithm

In this section, we provide interpretations of the forward-reflected-backward (7) and the reflected-forward-backward (8) splitting schemes as discretisations of continuous-time dynamical systems associated with the proximal point algorithm. To this end, in this section, we restrict our attention to the inclusion
$$0 \in (B + C)(x), \tag{9}$$
where $B\colon \mathcal{H} \to \mathcal{H}$ and $C\colon \mathcal{H} \rightrightarrows \mathcal{H}$ are maximally monotone and $B$ is $L$-Lipschitz continuous. In other words, we consider the three operator inclusion (1) with $A = 0$. In the absence of stronger assumptions such as cocoercivity of $B$ or strong monotonicity of $B + C$, the forward-backward method need not converge [7].
Recall that, with constant stepsize $\lambda > 0$, this method takes the form
$$x^{k+1} = J_{\lambda C}\bigl(x^k - \lambda B(x^k)\bigr), \tag{10}$$
and can be interpreted as a discretisation of the dynamical system (see [1, 3])
$$\dot{x}(t) + x(t) = J_{\lambda C}\bigl(x(t) - \lambda B(x(t))\bigr).$$
As a monotone inclusion, this system takes the form
$$-\dot{x}(t) \in \lambda C\bigl(\dot{x}(t) + x(t)\bigr) + \lambda B\bigl(x(t)\bigr). \tag{11}$$
We posit that the reliance of (10) on cocoercivity of $B$ for convergence arises from the choice of the argument of $B$ in (11). Indeed, by augmenting the argument of $B$, we obtain the dynamical system
$$-\dot{x}(t) \in \lambda C\bigl(\dot{x}(t) + x(t)\bigr) + \lambda B\bigl(\dot{x}(t) + x(t)\bigr),$$
which is asymptotically stable (i.e., it has the property that all its trajectories converge to solutions of the inclusion (9)) whenever the sum $B + C$ is maximally monotone. This fact can be shown by noting its equivalence to the continuous-time proximal point algorithm given by
$$\dot{x}(t) + x(t) = J_{\lambda(B+C)}\bigl(x(t)\bigr). \tag{12}$$
We will now use the dynamical system (12) to interpret the forward-reflected-backward and the reflected-forward-backward splitting schemes (7) and (8). To this end, we first decouple $B$ and $C$ to obtain
$$\dot{x}(t) + x(t) = J_{\lambda C}\Bigl(x(t) - \lambda B\bigl(\dot{x}(t) + x(t)\bigr)\Bigr). \tag{13}$$
In this form, the system is not explicit due to the appearance of $\dot{x}(t)$ on the right-hand side. To deal with this difficulty, we approximate $B\bigl(\dot{x}(t) + x(t)\bigr)$ using the linearisation of $B$ at $x(t)$. Denoting $y(t) = B(x(t))$ and assuming sufficient smoothness, we obtain
$$B\bigl(x(t) + \dot{x}(t)\bigr) \approx B(x(t)) + J_B(x(t))\,\dot{x}(t) = y(t) + \dot{y}(t), \tag{14}$$
where $J_B$ denotes the Jacobian of $B$ and the identity $J_B(x(t))\,\dot{x}(t) = \dot{y}(t)$ is a consequence of the chain rule. Substituting (14) into (13) gives the system
$$\begin{aligned} \dot{x}(t) + x(t) &= J_{\lambda C}\bigl(x(t) - \lambda y(t) - \lambda\dot{y}(t)\bigr)\\ y(t) &= B\bigl(x(t)\bigr). \end{aligned}\tag{15}$$
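The stability contrast between (10) and (12) can be seen numerically on a simple non-cocoercive instance. The following Python sketch, which is our illustration rather than part of the original paper, takes $C = 0$ and $B$ a rotation by 90 degrees (monotone and Lipschitz, but not cocoercive): the forward-backward iteration (10) diverges, while a forward Euler discretisation of the proximal flow (12) converges to the unique zero.

```python
import numpy as np

B = np.array([[0.0, 1.0], [-1.0, 0.0]])   # skew-symmetric: monotone, L = 1, not cocoercive
lam, I = 0.5, np.eye(2)
J = np.linalg.inv(I + lam * B)            # resolvent of lam*B (C = 0, so J_{lam(B+C)} = J_{lam B})

x_fb = x_ppa = np.array([1.0, 0.0])
eta = 0.1                                 # Euler step for the flow (12)
for _ in range(200):
    x_fb = x_fb - lam * (B @ x_fb)        # forward-backward (10): norm grows by sqrt(1+lam^2)
    x_ppa = x_ppa + eta * (J @ x_ppa - x_ppa)  # Euler step of x'(t) = J_{lam B}(x(t)) - x(t)

print(np.linalg.norm(x_fb))   # diverges
print(np.linalg.norm(x_ppa))  # decays towards 0, the unique zero of B
```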
Let $h > 0$. We now approximate the trajectories of (15) at the time points $(kh)_{k\in\mathbb{N}}$ by discrete trajectories $(x^k)_{k\in\mathbb{N}}$ and $(y^k)_{k\in\mathbb{N}}$ with $x^k \approx x(kh)$ and $y^k \approx y(kh) = B(x^k)$. In this notation, using the forward discretisation $\dot{x}(t) \approx \frac{x^{k+1} - x^k}{h}$ and the backward discretisation $\dot{y}(t) \approx \frac{y^k - y^{k-1}}{h}$ in (15) gives the scheme
$$x^{k+1} = (1-h)x^k + h\,J_{\lambda C}\Bigl(x^k - \lambda B(x^k) - \frac{\lambda}{h}\bigl(B(x^k) - B(x^{k-1})\bigr)\Bigr). \tag{16}$$
For $h \in (0, 1)$, the iteration (16) can be viewed as a relaxed variant of the forward-reflected-backward method and, when $h = 1$, (16) recovers the standard forward-reflected-backward method (7). Thus, in summary, (7) can be interpreted as a discretisation of a linearisation of the proximal point algorithm (12).

Alternatively, using the forward discretisation $\dot{x}(t) \approx \frac{x^{k+1} - x^k}{h}$ on the left-hand side and the backward discretisation $\dot{x}(t) \approx \frac{x^k - x^{k-1}}{h}$ on the right-hand side of equation (13) leads to the scheme
$$x^{k+1} = (1-h)x^k + h\,J_{\lambda C}\Bigl(x^k - \lambda B\bigl(x^k + \tfrac{1}{h}(x^k - x^{k-1})\bigr)\Bigr). \tag{17}$$
When $h = 1$, the iteration (17) is precisely the reflected-forward-backward method (8), so the method (8) can be interpreted as a discretisation of the proximal point algorithm (12). The fact that both schemes (16) and (17) converge without cocoercivity of $B$, while the method (10) does not, can therefore be partly explained by a connection to the proximal point algorithm, which also does not require cocoercivity for convergence.
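The two discretisations are easy to implement side by side. The following Python sketch (our illustration; the operators and stepsizes are placeholder assumptions) implements (16) and (17) for general $h \in (0, 1]$, recovering (7) and (8) at $h = 1$.

```python
import numpy as np

def frb_relaxed(J_lamC, B, lam, h, x0, iters=500):
    """Scheme (16): relaxed forward-reflected-backward, h in (0, 1]."""
    x_prev, x = x0, x0
    for _ in range(iters):
        inner = x - lam * B(x) - (lam / h) * (B(x) - B(x_prev))
        x_prev, x = x, (1 - h) * x + h * J_lamC(inner)
    return x

def rfb_relaxed(J_lamC, B, lam, h, x0, iters=500):
    """Scheme (17): relaxed reflected-forward-backward, h in (0, 1]."""
    x_prev, x = x0, x0
    for _ in range(iters):
        inner = x - lam * B(x + (1.0 / h) * (x - x_prev))
        x_prev, x = x, (1 - h) * x + h * J_lamC(inner)
    return x

# Non-cocoercive test problem: B a rotation, C = Id (so J_{lam C}(v) = v/(1+lam)).
B = lambda v: np.array([v[1], -v[0]])
J = lambda v: v / (1 + 0.2)
x0 = np.array([1.0, 2.0])
print(frb_relaxed(J, B, lam=0.2, h=1.0, x0=x0))  # h = 1 recovers (7); tends to 0
print(rfb_relaxed(J, B, lam=0.2, h=1.0, x0=x0))  # h = 1 recovers (8); tends to 0
```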
3 Backward-Forward-Reflected-Backward Splitting

In this section, we use the idea of discretising a linearisation of a dynamical system to derive a new algorithm for solving a three operator monotone inclusion. Recall that we consider the three operator monotone inclusion
$$0 \in (A + B + C)(x), \tag{18}$$
where $A, C\colon \mathcal{H} \rightrightarrows \mathcal{H}$ are maximally monotone operators, and $B\colon \mathcal{H} \to \mathcal{H}$ is single-valued, monotone and $L$-Lipschitz continuous. Note that, in this case, the sum $B + C$ is also maximally monotone [4, Corollary 25.2].
Let $\lambda > 0$. The Douglas–Rachford splitting method can only be directly applied to problems of finding a zero of the sum of two monotone operators. Applying a continuous-time version of this method to the inclusion (18) yields the system
$$\begin{aligned} x(t) &= J_{\lambda A}(z(t))\\ y(t) &= J_{\lambda(B+C)}\bigl(2x(t) - z(t)\bigr)\\ \dot{z}(t) &= y(t) - x(t). \end{aligned}\tag{19}$$
All trajectories of this dynamical system converge weakly to solutions of the inclusion (18) (see [5, 8]). Our aim is to derive a new algorithm in which $B$ and $C$ are decoupled. Proceeding as in the previous section, we rewrite the second equation in (19) as
$$y(t) = J_{\lambda C}\bigl(2x(t) - z(t) - \lambda B(y(t))\bigr).$$
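As a sanity check on (19), one can verify on a concrete linear instance that every point constructed from a solution of (18) is an equilibrium of the system. The sketch below is our illustration, with $A$ affine, $C$ a scaling and $B$ skew-symmetric chosen purely for convenience, since then all resolvents have closed form; it confirms that $\dot{z}(t) = 0$ at such a point.

```python
import numpy as np

a, c, lam = 1.0, 2.0, 0.7
I = np.eye(2)
B = np.array([[0.0, 3.0], [-3.0, 0.0]])     # skew: monotone, Lipschitz, not cocoercive
b = np.array([1.0, -2.0])

# A(x) = a*x - b and C(x) = c*x, so (18) reads (a + c)x + Bx = b.
x_star = np.linalg.solve((a + c) * I + B, b)
z_star = x_star + lam * (a * x_star - b)    # z* - x* in lam*A(x*)

J_A = lambda v: (v + lam * b) / (1 + lam * a)        # resolvent of lam*A
J_BC = np.linalg.inv(I + lam * (B + c * I))          # resolvent of lam*(B+C)

x = J_A(z_star)
y = J_BC @ (2 * x - z_star)
print(np.allclose(x, x_star), np.allclose(y - x, np.zeros(2)))  # True True: z'(t) = 0
```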
Let $h > 0$. Assuming sufficient smoothness, we may approximate $B(y(t))$ by using the linearisation of $B(y(\cdot))$ at $t - h$. That is,
$$B(y(t)) \approx B(y(t-h)) + (B \circ y)'(t-h)\,h. \tag{20}$$
Substituting this approximation into (19) gives the system
$$\begin{aligned} x(t) &= J_{\lambda A}(z(t))\\ y(t) &= J_{\lambda C}\Bigl(2x(t) - z(t) - \lambda B(y(t-h)) - \lambda (B \circ y)'(t-h)\,h\Bigr)\\ \dot{z}(t) &= y(t) - x(t). \end{aligned}\tag{21}$$
Now we approximate the trajectories of (21) at the time points $(kh)_{k\in\mathbb{N}}$ by discrete trajectories $(x^k)_{k\in\mathbb{N}}$, $(y^k)_{k\in\mathbb{N}}$ and $(z^k)_{k\in\mathbb{N}}$ with $x^k \approx x(kh)$, $y^k \approx y(kh)$ and $z^k \approx z(kh)$. Using the forward discretisation $\dot{z}(t) \approx \frac{z^{k+1} - z^k}{h}$ and the backward discretisation $(B \circ y)'(t-h) \approx \frac{B(y^{k-1}) - B(y^{k-2})}{h}$ yields the iteration
$$\begin{aligned} x^k &= J_{\lambda A}(z^k)\\ y^k &= J_{\lambda C}\bigl(2x^k - z^k - 2\lambda B(y^{k-1}) + \lambda B(y^{k-2})\bigr)\\ z^{k+1} &= z^k + h(y^k - x^k). \end{aligned}$$
For simplicity, we take $h = 1$, so the discrete-time system becomes
$$\begin{aligned} x^k &= J_{\lambda A}(z^k)\\ y^k &= J_{\lambda C}\bigl(2x^k - z^k - 2\lambda B(y^{k-1}) + \lambda B(y^{k-2})\bigr)\\ z^{k+1} &= z^k + y^k - x^k. \end{aligned}\tag{22}$$
We refer to (22) as the backward-forward-reflected-backward method and assume throughout this section that the initial points $z^0, y^{-1}, y^{-2} \in \mathcal{H}$ are chosen arbitrarily. When $B = 0$, the iteration (22) reduces to the Douglas–Rachford method for $A + C$, and when $A = 0$, it reduces to the forward-reflected-backward splitting method for $B + C$. Note also that (22) can equivalently be written as the system of inclusions
$$\begin{aligned} \lambda A(x^k) &\ni z^k - x^k\\ \lambda C(y^k) &\ni 2x^k - z^k - y^k - 2\lambda B(y^{k-1}) + \lambda B(y^{k-2})\\ z^{k+1} &= z^k + y^k - x^k. \end{aligned}\tag{23}$$
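A direct implementation of (22) is given below. This is a minimal sketch under assumed resolvent oracles; the test problem (a skew $B$ with $A = C = \mathrm{Id}$) is our own placeholder.

```python
import numpy as np

def bfrb(J_lamA, J_lamC, B, z0, y_m1, y_m2, lam, iters=2000):
    """Backward-forward-reflected-backward method (22).

    One forward evaluation of B and one resolvent of each of A and C
    per iteration; B(y^{k-1}) is cached and reused as B(y^{k-2}).
    """
    z, By_prev, By_prev2 = z0, B(y_m1), B(y_m2)
    for _ in range(iters):
        x = J_lamA(z)
        y = J_lamC(2 * x - z - 2 * lam * By_prev + lam * By_prev2)
        z = z + y - x
        By_prev2, By_prev = By_prev, B(y)
    return J_lamA(z)

# Toy instance in R^2: A = C = Id (resolvent: v / (1 + lam)), B = rotation (L = 1).
lam = 0.1                                   # small stepsize, consistent with Theorem 3
J = lambda v: v / (1 + lam)
B = lambda v: np.array([v[1], -v[0]])
z0 = np.array([4.0, -1.0])
print(bfrb(J, J, B, z0, z0, z0, lam))       # tends to 0, the zero of A + B + C
```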
The following lemma will be key in proving convergence of the iteration (22).

Lemma 1. Consider points $x, z \in \mathcal{H}$ such that $z - x \in \lambda A(x)$ and $x - z \in \lambda(B+C)(x)$. Then the sequences $(x^k)_{k\in\mathbb{N}}$, $(y^k)_{k\in\mathbb{N}}$ and $(z^k)_{k\in\mathbb{N}}$ given by the iteration (22) satisfy
$$\begin{aligned} &\|z^{k+1} - z\|^2 + 2\lambda\langle B(y^k) - B(y^{k-1}), x - y^k\rangle + \|z^{k+1} - z^k\|^2\\ &\quad\le \|z^k - z\|^2 + 2\lambda\langle B(y^{k-1}) - B(y^{k-2}), x - y^{k-1}\rangle + 2\lambda\langle B(y^{k-1}) - B(y^{k-2}), y^{k-1} - y^k\rangle. \end{aligned}\tag{24}$$
Proof. By monotonicity of $\lambda A$, we have
$$0 \le \langle (z - x) - (z^k - x^k), x - x^k\rangle. \tag{25}$$
Using monotonicity of $\lambda C$ and (22) gives
$$\begin{aligned} 0 &\le \langle (x - z) - \lambda B(x) - 2x^k + z^k + y^k + 2\lambda B(y^{k-1}) - \lambda B(y^{k-2}), x - y^k\rangle\\ &= \langle (x - z) - (x^k - z^k), x - x^k\rangle + \langle z^{k+1} - z^k, z - z^{k+1}\rangle\\ &\quad + \lambda\langle B(y^{k-1}) - B(x), x - y^k\rangle + \lambda\langle B(y^{k-1}) - B(y^{k-2}), x - y^k\rangle. \end{aligned}\tag{26}$$
From monotonicity of $\lambda B$, it follows that
$$\lambda\langle B(y^{k-1}) - B(x), x - y^k\rangle \le -\lambda\langle B(y^k) - B(y^{k-1}), x - y^k\rangle. \tag{27}$$
By summing the inequalities (25), (26) and (27), and using the identity
$$\langle z^{k+1} - z^k, z - z^{k+1}\rangle = \frac{1}{2}\bigl(\|z^k - z\|^2 - \|z^{k+1} - z\|^2 - \|z^{k+1} - z^k\|^2\bigr), \tag{28}$$
we obtain
$$0 \le \|z^k - z\|^2 - \|z^{k+1} - z\|^2 - \|z^{k+1} - z^k\|^2 - 2\lambda\langle B(y^k) - B(y^{k-1}), x - y^k\rangle + 2\lambda\langle B(y^{k-1}) - B(y^{k-2}), x - y^k\rangle,$$
from which the claimed inequality (24) follows.

For the convenience of the reader and clarity of presentation, we recall the following well-known result for reference in the proof of Theorem 3.

Lemma 2 (Opial's lemma). Let $(z^k)_{k\in\mathbb{N}}$ be a sequence in $\mathcal{H}$ and let $\Omega$ be a non-empty subset of $\mathcal{H}$. Suppose the following assertions hold.
(a) For every $z \in \Omega$, the sequence $(\|z^k - z\|)_{k\in\mathbb{N}}$ converges.
(b) Every weak sequential cluster point of $(z^k)_{k\in\mathbb{N}}$ belongs to $\Omega$.
Then $(z^k)_{k\in\mathbb{N}}$ converges weakly to a point in $\Omega$.

Proof. See, for instance, [4, Lemma 2.47].

The following theorem is our main result regarding convergence of the backward-forward-reflected-backward method (22). In what follows, we make use of the fact that the resolvent $J_{\lambda A}$ is firmly nonexpansive (see, for instance, [4, Proposition 23.10]), that is,
$$\|J_{\lambda A}(x) - J_{\lambda A}(y)\|^2 + \|(\mathrm{Id} - J_{\lambda A})(x) - (\mathrm{Id} - J_{\lambda A})(y)\|^2 \le \|x - y\|^2 \quad \forall x, y \in \mathcal{H}.$$
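Estimates such as (24) are easy to sanity-check numerically before trusting a long telescoping argument. The sketch below is our own check, on a linear instance where $x = z = 0$ satisfies the hypotheses of Lemma 1; it runs (22) and asserts that (24) holds at every iteration.

```python
import numpy as np

lam, rng = 0.1, np.random.default_rng(0)
J = lambda v: v / (1 + lam)                 # resolvent of lam*A = lam*C, with A = C = Id
B = lambda v: np.array([v[1], -v[0]])       # skew, monotone, L = 1
x_bar = z_bar = np.zeros(2)                 # z-x in lam*A(x) and x-z in lam*(B+C)(x) hold

z = rng.standard_normal(2)
y1, y2 = rng.standard_normal(2), rng.standard_normal(2)   # y^{k-1}, y^{k-2}
for _ in range(100):
    x = J(z)
    y = J(2 * x - z - 2 * lam * B(y1) + lam * B(y2))
    z_new = z + y - x
    lhs = (np.dot(z_new - z_bar, z_new - z_bar)
           + 2 * lam * np.dot(B(y) - B(y1), x_bar - y)
           + np.dot(z_new - z, z_new - z))
    rhs = (np.dot(z - z_bar, z - z_bar)
           + 2 * lam * np.dot(B(y1) - B(y2), x_bar - y1)
           + 2 * lam * np.dot(B(y1) - B(y2), y1 - y))
    assert lhs <= rhs + 1e-12               # inequality (24)
    z, y2, y1 = z_new, y1, y
print("inequality (24) verified on this instance")
```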
Theorem 3. Suppose $(A + B + C)^{-1}(0) \neq \emptyset$, let $\lambda \in (0, \frac{1}{8L})$, and consider the sequences $(z^k)_{k\in\mathbb{N}}$, $(x^k)_{k\in\mathbb{N}}$ and $(y^k)_{k\in\mathbb{N}}$ given by (22) for arbitrary initial points $z^0, y^{-1}, y^{-2} \in \mathcal{H}$. Then the following assertions hold:
(a) The sequence $(z^k)_{k\in\mathbb{N}}$ converges weakly to a point $\bar{z} \in \mathcal{H}$.
(b) The sequences $(x^k)_{k\in\mathbb{N}}$ and $(y^k)_{k\in\mathbb{N}}$ converge weakly to a point $\bar{x} \in \mathcal{H}$.
(c) We have $\bar{x} = J_{\lambda A}(\bar{z}) \in (A + B + C)^{-1}(0)$.

Proof. We shall apply Lemma 2 to the sequence $(z^k)_{k\in\mathbb{N}}$ and the set $\Omega$ defined by
$$\Omega := \{z \in \mathcal{H} : J_{\lambda A}(z) \in (A+B+C)^{-1}(0),\ (J_{\lambda A} - \mathrm{Id})(z) \in \lambda(B+C)(J_{\lambda A}(z))\}, \tag{29}$$
which we claim is nonempty. Indeed, since $(A+B+C)^{-1}(0) \neq \emptyset$ by assumption, there exist $x \in (A+B+C)^{-1}(0)$ and $z \in \mathcal{H}$ such that $z - x \in \lambda A(x)$ and $x - z \in \lambda(B+C)(x)$. Combining these two inclusions yields $x = J_{\lambda A}(z)$ and
$$(J_{\lambda A} - \mathrm{Id})(z) = x - z \in \lambda(B+C)(x) = \lambda(B+C)(J_{\lambda A}(z)).$$
Thus $z \in \Omega$ and hence $\Omega \neq \emptyset$.

Next, consider an arbitrary $z \in \Omega$ and denote $x := J_{\lambda A}(z)$. By Lemma 1, we have
$$\begin{aligned} &\|z^{k+1} - z\|^2 + 2\lambda\langle B(y^k) - B(y^{k-1}), x - y^k\rangle + \|z^{k+1} - z^k\|^2\\ &\quad\le \|z^k - z\|^2 + 2\lambda\langle B(y^{k-1}) - B(y^{k-2}), x - y^{k-1}\rangle + 2\lambda\langle B(y^{k-1}) - B(y^{k-2}), y^{k-1} - y^k\rangle \end{aligned}\tag{30}$$
for all $k \in \mathbb{N}$. To estimate the last term, first note that firm nonexpansivity of $J_{\lambda A}$ gives
$$\begin{aligned} \|y^{k-1} - y^{k-2}\|^2 &= \|(z^k - z^{k-1} + x^{k-1}) - (z^{k-1} - z^{k-2} + x^{k-2})\|^2\\ &\le 2\|z^k - z^{k-1}\|^2 + 2\|(x^{k-1} - z^{k-1}) - (x^{k-2} - z^{k-2})\|^2\\ &\le 2\|z^k - z^{k-1}\|^2 + 2\|z^{k-1} - z^{k-2}\|^2, \end{aligned}\tag{31}$$
and thus Lipschitz continuity of $B$ yields
$$\begin{aligned} \frac{2}{L}\langle B(y^{k-1}) - B(y^{k-2}), y^{k-1} - y^k\rangle &\le \|y^{k-1} - y^{k-2}\|^2 + \|y^k - y^{k-1}\|^2\\ &\le 2\|z^{k-1} - z^{k-2}\|^2 + 4\|z^k - z^{k-1}\|^2 + 2\|z^{k+1} - z^k\|^2 \end{aligned}\tag{32}$$
for all $k \in \mathbb{N}$. Consider the sequence $(\varphi_k)_{k\in\mathbb{N}}$ given by
$$\varphi_k := \|z^k - z\|^2 + 2\lambda\langle B(y^{k-1}) - B(y^{k-2}), x - y^{k-1}\rangle + \frac{3}{4}\|z^k - z^{k-1}\|^2 + 2\lambda L\|z^{k-1} - z^{k-2}\|^2.$$
Substituting inequality (32) into (30), and setting $\epsilon := \frac{1 - 8\lambda L}{4} > 0$, we deduce that
$$\varphi_{k+1} + \epsilon\|z^{k+1} - z^k\|^2 \le \varphi_k \implies \varphi_{k+1} + \epsilon\sum_{i=0}^{k}\|z^{i+1} - z^i\|^2 \le \varphi_0 \quad \forall k \in \mathbb{N}. \tag{33}$$
On the other hand, Lipschitz continuity of $B$, nonexpansivity of $J_{\lambda A}$ and (31) imply
$$\begin{aligned} \frac{2}{L}\langle B(y^k) - B(y^{k-1}), x - y^k\rangle &\le \|y^k - y^{k-1}\|^2 + \|(z^{k+1} - x^{k+1}) - (z^k - x^k) + (x^{k+1} - x)\|^2\\ &\le 2\|z^k - z^{k-1}\|^2 + 2\|z^{k+1} - z^k\|^2 + 2\|(z^{k+1} - x^{k+1}) - (z^k - x^k)\|^2 + 2\|x^{k+1} - x\|^2\\ &\le 2\|z^k - z^{k-1}\|^2 + 4\|z^{k+1} - z^k\|^2 + 2\|z^{k+1} - z\|^2, \end{aligned}$$
which yields the lower bound
$$\varphi_{k+1} \ge (1 - 2\lambda L)\|z^{k+1} - z\|^2 + \Bigl(\frac{3}{4} - 4\lambda L\Bigr)\|z^{k+1} - z^k\|^2 \ge 0 \quad \forall k \in \mathbb{N}.$$
By combining this with inequality (33), we deduce that $(\varphi_k)_{k\in\mathbb{N}}$ converges, $\|z^{k+1} - z^k\| \to 0$, and $(z^k)_{k\in\mathbb{N}}$ is bounded. Since $x^k = J_{\lambda A}(z^k)$ and $x = J_{\lambda A}(z)$, nonexpansivity of $J_{\lambda A}$ implies that $\|x^{k+1} - x^k\| \to 0$ and $(x^k)_{k\in\mathbb{N}}$ is bounded. Because of the identity $y^k = z^{k+1} - z^k + x^k$, we then have that $\|y^{k+1} - y^k\| \to 0$ and $(y^k)_{k\in\mathbb{N}}$ is bounded. It then follows that
$$\lim_{k\to\infty}\|z^k - z\|^2 = \lim_{k\to\infty}\varphi_k,$$
which establishes Lemma 2(a).

Now, let $z \in \mathcal{H}$ be a weak sequential cluster point of $(z^k)_{k\in\mathbb{N}}$. Since $(x^k)_{k\in\mathbb{N}}$ is bounded, it follows that there exists $x \in \mathcal{H}$ such that $(z, x)$ is a weak sequential cluster point of $((z^k, x^k))_{k\in\mathbb{N}}$. Next, we note that (23) implies the inclusion
$$\begin{pmatrix} z^k - z^{k+1}\\ z^k - z^{k+1} \end{pmatrix} - \lambda\begin{pmatrix} 0\\ B(y^{k-1}) - B(y^{k-2}) \end{pmatrix} - \lambda\begin{pmatrix} 0\\ B(y^{k-1}) - B(y^k) \end{pmatrix} \in \left[\begin{pmatrix} (\lambda A)^{-1} & 0\\ 0 & \lambda(B+C) \end{pmatrix} + \begin{pmatrix} 0 & -\mathrm{Id}\\ \mathrm{Id} & 0 \end{pmatrix}\right]\begin{pmatrix} z^k - x^k\\ z^{k+1} - z^k + x^k \end{pmatrix}. \tag{34}$$
Since $A$ and $B + C$ are maximally monotone operators, appealing to [8, Lemma 1] and taking the limit in (34) along a subsequence of $((z^k, x^k))_{k\in\mathbb{N}}$ which converges weakly to $(z, x)$ yields
$$\begin{pmatrix} 0\\ 0 \end{pmatrix} \in \left[\begin{pmatrix} (\lambda A)^{-1} & 0\\ 0 & \lambda(B+C) \end{pmatrix} + \begin{pmatrix} 0 & -\mathrm{Id}\\ \mathrm{Id} & 0 \end{pmatrix}\right]\begin{pmatrix} z - x\\ x \end{pmatrix}.$$
The first inclusion gives $z - x \in \lambda A(x)$, which is equivalent to $x = J_{\lambda A}(z)$. The second inclusion gives $x - z \in \lambda(B+C)(x)$, which implies $(J_{\lambda A} - \mathrm{Id})(z) \in \lambda(B+C)(J_{\lambda A}(z))$. Thus, altogether, we have that $z \in \Omega$, which establishes Lemma 2(b).

Having verified all of its assumptions, we now invoke Lemma 2 to deduce that $(z^k)_{k\in\mathbb{N}}$ converges weakly to a point $\bar{z} \in \Omega$. Finally, let $\bar{x} \in \mathcal{H}$ be an arbitrary weak sequential cluster point of the bounded sequence $(x^k)_{k\in\mathbb{N}}$. Then, by using an argument analogous to the above, we deduce that $\bar{x} = J_{\lambda A}(\bar{z})$. Thus, $(x^k)_{k\in\mathbb{N}}$ possesses precisely one weak sequential cluster point and hence $(x^k)_{k\in\mathbb{N}}$ is weakly convergent. The remainder of the proof easily follows from the identity $y^k = z^{k+1} - z^k + x^k$ and the definition of $\Omega$.

Remark 4. By setting $A = 0$ in (22), we have $x^k = z^k$ and Theorem 3 reduces to [15, Corollary 2.6], albeit with a worse stepsize. On the other hand, setting $C = 0$ gives a scheme which seems to be new, even in the two operator case.

Remark 5. In [19], Ryu & Vũ proposed a related method known as forward-reflected-Douglas–Rachford (FRDR) splitting for the inclusion (1), which takes the form
$$\begin{aligned} x^{k+1} &= J_{\lambda A}\bigl(x^k - \lambda u^k - 2\lambda B(x^k) + \lambda B(x^{k-1})\bigr)\\ y^{k+1} &= J_{\gamma C}\bigl(2x^{k+1} - x^k + \lambda u^k\bigr)\\ u^{k+1} &= u^k + \tfrac{1}{\lambda}\bigl(2x^{k+1} - x^k - y^{k+1}\bigr). \end{aligned}$$
For its convergence, the constants $\lambda, \gamma > 0$ used in the resolvents of $A$ and $C$ are required to satisfy
$$0 < \lambda < \frac{\gamma}{1 + 2L\gamma}. \tag{35}$$
Note that, in particular, this means that $\lambda < \gamma$, whereas in the setting of Theorem 3 both resolvents use the same constant. On the other hand, by taking $\gamma$ sufficiently large in (35), the constant $\lambda$ can be chosen arbitrarily close to $\frac{1}{2L}$, in line with [15, Corollary 2.6]. We also remark that, in practice, there is usually no advantage to having both resolvents with the same stepsize (i.e., $\lambda = \gamma$).
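The two stepsize regimes are easy to compare. The helper below is our illustration only: the bound for FRDR is (35), and the bound quoted for (22) is the one from Theorem 3.

```python
def frdr_lambda_bound(gamma, L):
    """Upper bound on lambda for FRDR from (35): lambda < gamma / (1 + 2*L*gamma)."""
    return gamma / (1 + 2 * L * gamma)

def bfrb_lambda_bound(L):
    """Upper bound on lambda for (22) from Theorem 3: lambda < 1 / (8*L)."""
    return 1 / (8 * L)

L = 1.0
for gamma in (0.1, 1.0, 10.0, 1e6):
    print(f"gamma = {gamma:>9}: FRDR needs lambda < {frdr_lambda_bound(gamma, L):.4f}")
print(f"BFRB (22) needs lambda < {bfrb_lambda_bound(L):.4f}, with gamma = lambda")
# As gamma grows, the FRDR bound tends to 1/(2L) = 0.5.
```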
4 Backward-Reflected-Forward-Backward Splitting

In this section, we derive a second algorithm for solving the inclusion (18) by means of discretising the system (19) which, in decoupled form, is given by
$$\begin{aligned} x(t) &= J_{\lambda A}(z(t))\\ y(t) &= J_{\lambda C}\bigl(2x(t) - z(t) - \lambda B(y(t))\bigr)\\ \dot{z}(t) &= y(t) - x(t). \end{aligned}\tag{36}$$
Let $h > 0$. Assuming sufficient smoothness, we approximate $y(t)$ on the right-hand side of the second equation by using the linearisation of $y$ at $t - h$. That is, we have
$$y(t) \approx y(t-h) + \dot{y}(t-h)\,h. \tag{37}$$
Substituting this approximation into (36) gives
$$\begin{aligned} x(t) &= J_{\lambda A}(z(t))\\ y(t) &= J_{\lambda C}\Bigl(2x(t) - z(t) - \lambda B\bigl(y(t-h) + \dot{y}(t-h)\,h\bigr)\Bigr)\\ \dot{z}(t) &= y(t) - x(t). \end{aligned}\tag{38}$$
Now we discretise the trajectories in (38) at the time points $(kh)_{k\in\mathbb{N}}$, which we denote by $z^k := z(kh)$, $x^k := x(kh)$ and $y^k := y(kh)$. Using the forward discretisation $\dot{z}(t) \approx \frac{z^{k+1} - z^k}{h}$ and the backward discretisation $\dot{y}(t-h) \approx \frac{y^{k-1} - y^{k-2}}{h}$ yields
$$\begin{aligned} x^k &= J_{\lambda A}(z^k)\\ y^k &= J_{\lambda C}\bigl(2x^k - z^k - \lambda B(2y^{k-1} - y^{k-2})\bigr)\\ z^{k+1} &= z^k + h(y^k - x^k). \end{aligned}$$
As in Section 3, we take $h = 1$ for simplicity, so that the system becomes
$$\begin{aligned} x^k &= J_{\lambda A}(z^k)\\ y^k &= J_{\lambda C}\bigl(2x^k - z^k - \lambda B(2y^{k-1} - y^{k-2})\bigr)\\ z^{k+1} &= z^k + y^k - x^k. \end{aligned}\tag{39}$$
We refer to (39) as the backward-reflected-forward-backward method and assume that the initial points $z^0, y^{-1}, y^{-2} \in \mathcal{H}$ are chosen arbitrarily. Note that, when $B = 0$, (39) reduces to the Douglas–Rachford algorithm for $A + C$, and when $A = 0$, it reduces to the reflected-forward-backward splitting method for $B + C$. For convenience, we use the notation
$$\bar{z}^k = 2z^k - z^{k-1},\qquad \bar{x}^k = 2x^k - x^{k-1},\qquad \bar{y}^k = 2y^k - y^{k-1}.$$
The iteration (39) can then be equivalently written as the system of inclusions
$$\begin{aligned} \lambda A(x^k) &\ni z^k - x^k\\ \lambda C(y^k) &\ni 2x^k - z^k - y^k - \lambda B(\bar{y}^{k-1})\\ z^{k+1} &= z^k + y^k - x^k. \end{aligned}\tag{40}$$
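In code, (39) differs from (22) only in the argument passed to $B$: the reflected point $\bar{y}^{k-1} = 2y^{k-1} - y^{k-2}$ replaces the extrapolated combination of past evaluations of $B$. A minimal sketch (ours, with placeholder operators as before):

```python
import numpy as np

def brfb(J_lamA, J_lamC, B, z0, y_m1, y_m2, lam, iters=2000):
    """Backward-reflected-forward-backward method (39)."""
    z, y_prev, y_prev2 = z0, y_m1, y_m2
    for _ in range(iters):
        x = J_lamA(z)
        y = J_lamC(2 * x - z - lam * B(2 * y_prev - y_prev2))  # B at the reflected point
        z = z + y - x
        y_prev2, y_prev = y_prev, y
    return J_lamA(z)

lam = 0.04                                  # small stepsize, consistent with Theorem 7
J = lambda v: v / (1 + lam)                 # resolvent of lam*Id, used for both A and C
B = lambda v: np.array([v[1], -v[0]])       # skew, L = 1
z0 = np.array([4.0, -1.0])
print(brfb(J, J, B, z0, z0, z0, lam))       # tends to 0, the zero of A + B + C
```

Note that for linear $B$ the two schemes coincide, since then $B(2y^{k-1} - y^{k-2}) = 2B(y^{k-1}) - B(y^{k-2})$; in general they differ, although each still uses exactly one forward evaluation of $B$ per iteration.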
Before proving weak convergence of (39), we require the following preparatory lemma.

Lemma 6. Consider points $x, z \in \mathcal{H}$ such that $z - x \in \lambda A(x)$ and $x - z \in \lambda(B+C)(x)$. Then the sequences $(z^k)_{k\in\mathbb{N}}$, $(x^k)_{k\in\mathbb{N}}$ and $(y^k)_{k\in\mathbb{N}}$ given by (39) satisfy
$$\begin{aligned} &\|z^{k+1} - z\|^2 + 2\lambda\langle B(\bar{y}^{k-1}) - B(x), y^k - y^{k-1}\rangle + 2\|z^{k+1} - z^k\|^2 + \|z^{k+1} - \bar{z}^k\|^2\\ &\quad\le \|z^k - z\|^2 + 2\lambda\langle B(\bar{y}^{k-2}) - B(x), y^{k-1} - y^{k-2}\rangle + \|z^k - z^{k-1}\|^2 + 2\lambda\langle B(\bar{y}^{k-1}) - B(\bar{y}^{k-2}), \bar{y}^{k-1} - y^k\rangle. \end{aligned}\tag{41}$$
Proof. By monotonicity of $\lambda A$, we have
$$0 \le \langle (z - x) - (z^k - x^k), x - x^k\rangle. \tag{42}$$
Using monotonicity of $\lambda C$ and the iteration (39) gives
$$\begin{aligned} 0 &\le \langle (x - z) - \lambda B(x) - 2x^k + z^k + y^k + \lambda B(\bar{y}^{k-1}), x - y^k\rangle\\ &= \langle (x - z) - (x^k - z^k), x - x^k\rangle + \langle z^{k+1} - z^k, z - z^{k+1}\rangle + \lambda\langle B(\bar{y}^{k-1}) - B(x), x - y^k\rangle. \end{aligned}\tag{43}$$
From monotonicity of $\lambda B$, it follows that
$$\lambda\langle B(\bar{y}^{k-1}) - B(x), x - y^k\rangle \le \lambda\langle B(\bar{y}^{k-1}) - B(x), \bar{y}^{k-1} - y^k\rangle. \tag{44}$$
By summing the inequalities (42), (43) and (44), and using the identity (28), we obtain
$$\|z^{k+1} - z\|^2 + \|z^{k+1} - z^k\|^2 - \|z^k - z\|^2 \le 2\lambda\langle B(\bar{y}^{k-1}) - B(\bar{y}^{k-2}), \bar{y}^{k-1} - y^k\rangle + 2\lambda\langle B(\bar{y}^{k-2}) - B(x), \bar{y}^{k-1} - y^k\rangle. \tag{45}$$
Now, by monotonicity of $\lambda A$, we have
$$0 \le \langle (z^k - x^k) - (z^{k-1} - x^{k-1}), x^k - x^{k-1}\rangle. \tag{46}$$
Using monotonicity of $\lambda C$ gives
$$\begin{aligned} 0 &\le \langle (2x^k - z^k - y^k - \lambda B(\bar{y}^{k-1})) - (2x^{k-1} - z^{k-1} - y^{k-1} - \lambda B(\bar{y}^{k-2})), y^k - y^{k-1}\rangle\\ &= \langle (x^k - z^{k+1} - \lambda B(\bar{y}^{k-1})) - (x^{k-1} - z^k - \lambda B(\bar{y}^{k-2})), y^k - y^{k-1}\rangle\\ &= \langle (x^k - z^k) - (x^{k-1} - z^{k-1}), x^k - x^{k-1}\rangle + \langle z^k - z^{k+1}, (z^{k+1} - z^k) - (z^k - z^{k-1})\rangle\\ &\quad + \lambda\langle B(x) - B(\bar{y}^{k-1}), y^k - y^{k-1}\rangle + \lambda\langle B(\bar{y}^{k-2}) - B(x), y^k - \bar{y}^{k-1}\rangle + \lambda\langle B(\bar{y}^{k-2}) - B(x), y^{k-1} - y^{k-2}\rangle. \end{aligned}\tag{47}$$
By summing (46) and (47), and using the identity
$$\langle z^k - z^{k+1}, (z^{k+1} - z^k) - (z^k - z^{k-1})\rangle = \frac{1}{2}\bigl(\|z^k - z^{k-1}\|^2 - \|z^{k+1} - z^k\|^2 - \|z^{k+1} - \bar{z}^k\|^2\bigr),$$
we obtain
$$\begin{aligned} &\|z^{k+1} - z^k\|^2 + \|z^{k+1} - \bar{z}^k\|^2 - \|z^k - z^{k-1}\|^2 + 2\lambda\langle B(\bar{y}^{k-1}) - B(x), y^k - y^{k-1}\rangle\\ &\quad\le 2\lambda\langle B(\bar{y}^{k-2}) - B(x), y^{k-1} - y^{k-2}\rangle + 2\lambda\langle B(\bar{y}^{k-2}) - B(x), y^k - \bar{y}^{k-1}\rangle. \end{aligned}\tag{48}$$
The claimed inequality follows by summing (45) and (48).

The following theorem is our main result regarding convergence of the backward-reflected-forward-backward method (39).
Theorem 7. Suppose $(A + B + C)^{-1}(0) \neq \emptyset$, let $\lambda \in (0, \frac{1}{22L})$, and consider the sequences $(z^k)_{k\in\mathbb{N}}$, $(x^k)_{k\in\mathbb{N}}$ and $(y^k)_{k\in\mathbb{N}}$ given by (39) for arbitrary initial points $z^0, y^{-1}, y^{-2} \in \mathcal{H}$. Then the following assertions hold:
(a) The sequence $(z^k)_{k\in\mathbb{N}}$ converges weakly to a point $\bar{z} \in \mathcal{H}$.
(b) The sequences $(x^k)_{k\in\mathbb{N}}$ and $(y^k)_{k\in\mathbb{N}}$ converge weakly to a point $\bar{x} \in \mathcal{H}$.
(c) We have $\bar{x} = J_{\lambda A}(\bar{z}) \in (A + B + C)^{-1}(0)$.

Proof. The proof strategy is analogous to Theorem 3 and uses the same nonempty set $\Omega$ defined in (29). Consider an arbitrary $z \in \Omega$ and denote $x := J_{\lambda A}(z)$. By Lemma 6, we have
$$\begin{aligned} &\|z^{k+1} - z\|^2 + 2\lambda\langle B(\bar{y}^{k-1}) - B(x), y^k - y^{k-1}\rangle + 2\|z^{k+1} - z^k\|^2 + \|z^{k+1} - \bar{z}^k\|^2\\ &\quad\le \|z^k - z\|^2 + 2\lambda\langle B(\bar{y}^{k-2}) - B(x), y^{k-1} - y^{k-2}\rangle + \|z^k - z^{k-1}\|^2 + 2\lambda\langle B(\bar{y}^{k-1}) - B(\bar{y}^{k-2}), \bar{y}^{k-1} - y^k\rangle. \end{aligned}\tag{49}$$
We now estimate the last term in (49). To this end, first observe that firm nonexpansivity of $J_{\lambda A}$ implies
$$\begin{aligned} \|(\bar{z}^{k-1} - \bar{x}^{k-1}) - (z^k - x^k)\|^2 &\le 2\|(z^k - x^k) - (z^{k-1} - x^{k-1})\|^2 + 2\|(z^{k-1} - x^{k-1}) - (z^{k-2} - x^{k-2})\|^2\\ &\le 2\|z^k - z^{k-1}\|^2 + 2\|z^{k-1} - z^{k-2}\|^2. \end{aligned}$$
Using this inequality, we deduce that
$$\begin{aligned} \|\bar{y}^{k-1} - y^k\|^2 &= \|(\bar{z}^k - \bar{z}^{k-1} + \bar{x}^{k-1}) - (z^{k+1} - z^k + x^k)\|^2\\ &\le (1 + 6)\|z^{k+1} - \bar{z}^k\|^2 + \Bigl(1 + \frac{1}{6}\Bigr)\|(\bar{z}^{k-1} - \bar{x}^{k-1}) - (z^k - x^k)\|^2\\ &\le 7\|z^{k+1} - \bar{z}^k\|^2 + \frac{7}{3}\bigl(\|z^k - z^{k-1}\|^2 + \|z^{k-1} - z^{k-2}\|^2\bigr). \end{aligned}\tag{50}$$
Note that the inequality (31) from the proof of Theorem 3 is still valid for (39), as the third lines of (22) and (39) are identical. By combining (50) with (31), we obtain
$$\begin{aligned} \|\bar{y}^{k-1} - \bar{y}^{k-2}\|^2 &\le 2\|y^{k-1} - y^{k-2}\|^2 + 2\|y^{k-1} - \bar{y}^{k-2}\|^2\\ &\le 4\|z^k - z^{k-1}\|^2 + \frac{26}{3}\|z^{k-1} - z^{k-2}\|^2 + \frac{14}{3}\|z^{k-2} - z^{k-3}\|^2 + 14\|z^k - \bar{z}^{k-1}\|^2. \end{aligned}$$
Thus, using (50) and the previous inequality yields
$$\begin{aligned} \frac{2}{L}\langle B(\bar{y}^{k-1}) - B(\bar{y}^{k-2}), \bar{y}^{k-1} - y^k\rangle &\le \|\bar{y}^{k-1} - \bar{y}^{k-2}\|^2 + \|\bar{y}^{k-1} - y^k\|^2\\ &\le \frac{19}{3}\|z^k - z^{k-1}\|^2 + 11\|z^{k-1} - z^{k-2}\|^2 + \frac{14}{3}\|z^{k-2} - z^{k-3}\|^2 + 7\|z^{k+1} - \bar{z}^k\|^2 + 14\|z^k - \bar{z}^{k-1}\|^2. \end{aligned}\tag{51}$$
We define the sequence $(\varphi_k)_{k\in\mathbb{N}}$ by
$$\begin{aligned} \varphi_k &:= \|z^k - z\|^2 + 2\lambda\langle B(\bar{y}^{k-2}) - B(x), y^{k-1} - y^{k-2}\rangle + (1 + 22\lambda L)\|z^k - z^{k-1}\|^2\\ &\quad + \frac{47}{3}\lambda L\|z^{k-1} - z^{k-2}\|^2 + \frac{14}{3}\lambda L\|z^{k-2} - z^{k-3}\|^2 + \frac{7}{11}\|z^k - \bar{z}^{k-1}\|^2. \end{aligned}\tag{52}$$
Substituting (51) into the estimate (49) and setting $\epsilon := 1 - 22\lambda L > 0$, we deduce that
$$\varphi_{k+1} + \epsilon\|z^{k+1} - z^k\|^2 \le \varphi_k \implies \varphi_{k+1} + \epsilon\sum_{i=0}^{k}\|z^{i+1} - z^i\|^2 \le \varphi_0 \quad \forall k \in \mathbb{N}. \tag{53}$$
Next, we derive a lower bound for $\varphi_{k+1}$. To this end, note that firm nonexpansivity of $J_{\lambda A}$ implies
$$\begin{aligned} \|(\bar{z}^k - \bar{z}^{k-1} + \bar{x}^{k-1}) - (z - z + x)\|^2 &\le 2\|\bar{z}^k - z\|^2 + 2\|(\bar{x}^{k-1} - \bar{z}^{k-1}) - (x - z)\|^2\\ &\le 2\|\bar{z}^k - z\|^2 + 4\|(x^{k-1} - z^{k-1}) - (x - z)\|^2 + 4\|(x^{k-1} - z^{k-1}) - (x^{k-2} - z^{k-2})\|^2\\ &\le 4\|z^{k+1} - z\|^2 + 4\|z^{k+1} - \bar{z}^k\|^2 + 4\|z^{k-1} - z\|^2 + 4\|z^{k-1} - z^{k-2}\|^2\\ &\le 12\|z^{k+1} - z\|^2 + 16\|z^{k+1} - z^k\|^2 + 16\|z^k - z^{k-1}\|^2 + 4\|z^{k-1} - z^{k-2}\|^2 + 4\|z^{k+1} - \bar{z}^k\|^2. \end{aligned}$$
Combining this with the Lipschitz continuity of $B$ and (31) gives
$$\begin{aligned} \frac{2}{L}\langle B(\bar{y}^{k-1}) - B(x), y^k - y^{k-1}\rangle &\le \frac{1}{2}\|(\bar{z}^k - \bar{z}^{k-1} + \bar{x}^{k-1}) - (z - z + x)\|^2 + 2\|y^k - y^{k-1}\|^2\\ &\le 6\|z^{k+1} - z\|^2 + 12\|z^{k+1} - z^k\|^2 + 12\|z^k - z^{k-1}\|^2 + 2\|z^{k-1} - z^{k-2}\|^2 + 2\|z^{k+1} - \bar{z}^k\|^2. \end{aligned}$$
Altogether, we have the lower bound
$$\begin{aligned} \varphi_{k+1} &\ge (1 - 6\lambda L)\|z^{k+1} - z\|^2 + (1 + 10\lambda L)\|z^{k+1} - z^k\|^2 + \frac{11}{3}\lambda L\|z^k - z^{k-1}\|^2\\ &\quad + \frac{8}{3}\lambda L\|z^{k-1} - z^{k-2}\|^2 + \Bigl(\frac{7}{11} - 2\lambda L\Bigr)\|z^{k+1} - \bar{z}^k\|^2 \ge (1 - 6\lambda L)\|z^{k+1} - z\|^2 \ge 0. \end{aligned}$$
By combining this with (53), we deduce that $(\varphi_k)_{k\in\mathbb{N}}$ converges, $\|z^{k+1} - z^k\| \to 0$, and $(z^k)_{k\in\mathbb{N}}$ is bounded. Arguing as in Theorem 3, it then follows that
$$\lim_{k\to\infty}\|z^k - z\|^2 = \lim_{k\to\infty}\varphi_k,$$
which establishes Lemma 2(a). Next, we note that (40) implies the inclusion
$$\begin{pmatrix} z^k - z^{k+1}\\ z^k - z^{k+1} \end{pmatrix} + \lambda\begin{pmatrix} 0\\ B(y^k) - B(\bar{y}^{k-1}) \end{pmatrix} \in \left[\begin{pmatrix} (\lambda A)^{-1} & 0\\ 0 & \lambda(B+C) \end{pmatrix} + \begin{pmatrix} 0 & -\mathrm{Id}\\ \mathrm{Id} & 0 \end{pmatrix}\right]\begin{pmatrix} z^k - x^k\\ z^{k+1} - z^k + x^k \end{pmatrix}.$$
The remainder of the proof is now analogous to Theorem 3.
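To close the circle with the motivation (2) and (3), the sketch below (our illustration; all problem data are placeholder choices) assembles $A$, $B$ and $C$ as in (3) for a bilinear coupling $\Phi(x, y) = x^\top M y$ with $f_1 = g_1 = \|\cdot\|_1$ and $f_2, g_2$ quadratics, and solves the resulting inclusion with the backward-forward-reflected-backward method (22).

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho = 3, 1.0
M = rng.standard_normal((n, n))
L = np.linalg.norm(M, 2)                    # Lipschitz constant of B
lam = 1 / (9 * L)                           # within the stepsize range of Theorem 3

# Operators from (3) with Phi(x, y) = x^T M y, f1 = g1 = |.|_1, f2 = g2 = rho/2 |.|^2.
B = lambda w: np.concatenate([M @ w[n:], -M.T @ w[:n]])
J_A = lambda w: np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)   # soft-thresholding
J_C = lambda w: w / (1 + lam * rho)

z = rng.standard_normal(2 * n)
By1 = By2 = B(rng.standard_normal(2 * n))
for _ in range(5000):                       # BFRB iteration (22)
    x = J_A(z)
    y = J_C(2 * x - z - 2 * lam * By1 + lam * By2)
    z = z + y - x
    By2, By1 = By1, B(y)

print(J_A(z))                               # approximate zero of A + B + C (here the origin)
```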
Remark 8. By using slightly tighter estimates in the current proof, numerics suggest that the upper bound on $\lambda L$ in Theorem 7 can be improved slightly. However, since the arithmetic in the resulting proof becomes significantly more complex, we have decided to present the slightly sub-optimal version for the sake of clarity of presentation. Moreover, as the main novelty of Theorem 7 is its connection to a continuous-time dynamical system, the precise value of this upper bound is not our main concern.
5 Conclusion

In this work, we provided an intuitive interpretation to explain convergence of the forward-reflected-backward and reflected-forward-backward methods in the absence of cocoercivity. More precisely, we showed that these methods can be understood as two different discretisations of the continuous-time proximal point algorithm, which corresponds to an asymptotically stable dynamical system. This insight allowed us to derive two new three operator splitting algorithms, neither of which relies on cocoercivity for convergence. Future work will investigate whether the insights gained from Section 2 can be combined with the three operator resolvent-splitting scheme with minimal lifting from [18, Section 4] to derive a four operator scheme which exploits forward evaluations, or with [2] to compute the resolvent of three operator sums.

Acknowledgements.
This work was supported in part by a Robert Bartnik Visiting Fellowship from the School of Mathematics at Monash University. MKT is the recipient of a Discovery Early Career Researcher Award (DE200100063) from the Australian Research Council.
References

[1] B. Abbas, H. Attouch, and B. F. Svaiter. Newton-like dynamics and forward-backward methods for structured monotone inclusions in Hilbert spaces. Journal of Optimization Theory and Applications, 161(2):331–360, 2014.
[2] F. J. Aragón Artacho and R. Campoy. Computing the resolvent of the sum of maximally monotone operators with the averaged alternating modified reflections algorithm. Journal of Optimization Theory and Applications, 181(3):709–726, 2019.
[3] H. Attouch and A. Cabot. Convergence of a relaxed inertial forward–backward algorithm for structured monotone inclusions. Applied Mathematics & Optimization, 80(3):547–598, 2019.
[4] H. H. Bauschke and P. L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics. Springer, 2nd edition, 2017.
[5] R. I. Boţ and E. R. Csetnek. A dynamical system associated with the fixed points set of a nonexpansive operator. Journal of Dynamics and Differential Equations, 29(1):155–168, 2017.
[6] V. Cevher and B. C. Vũ. A reflected forward-backward splitting method for monotone inclusions involving Lipschitzian operators. arXiv:1908.05912, 2019.
[7] G. H. Chen and R. T. Rockafellar. Convergence rates in forward–backward splitting. SIAM Journal on Optimization, 7(2):421–444, 1997.
[8] E. R. Csetnek, Y. Malitsky, and M. K. Tam. Shadow Douglas–Rachford splitting for monotone inclusions. Applied Mathematics and Optimization, pages 1–14, 2019.
[9] C. Daskalakis, A. Ilyas, V. Syrgkanis, and H. Zeng. Training GANs with optimism. In International Conference on Learning Representations, 2018.
[10] D. Davis and W. Yin. A three-operator splitting scheme and its optimization applications. Set-Valued and Variational Analysis, 25(4):829–858, 2017.
[11] Y. Drori and M. Teboulle. Performance of first-order methods for smooth convex minimization: a novel approach. Mathematical Programming, 145(1-2):451–482, 2014.
[12] P. Giselsson. Nonlinear forward-backward splitting with projection correction. arXiv:1908.07449, 2019.
[13] P. R. Johnstone and J. Eckstein. Projective splitting with forward steps: asynchronous and block-iterative operator splitting. arXiv:1803.07043, 2018.
[14] Y. Malitsky. Projected reflected gradient methods for monotone variational inequalities. SIAM Journal on Optimization, 25(1):502–520, 2015.
[15] Y. Malitsky and M. K. Tam. A forward-backward splitting method for monotone inclusions without cocoercivity. arXiv:1808.04162, 2018.
[16] K. Mishchenko, D. Kovalev, E. Shulgin, P. Richtárik, and Y. Malitsky. Revisiting stochastic extragradient. arXiv:1905.11373, 2019.
[17] H. Raguet, J. Fadili, and G. Peyré. A generalized forward-backward splitting. SIAM Journal on Imaging Sciences, 6(3):1199–1226, 2013.
[18] E. K. Ryu, A. B. Taylor, C. Bergeling, and P. Giselsson. Operator splitting performance estimation: tight contraction factors and optimal parameter selection. arXiv:1812.00146, 2018.
[19] E. K. Ryu and B. C. Vũ. Finding the forward-Douglas–Rachford-forward method. Journal of Optimization Theory and Applications, 2019.
[20] E. K. Ryu, K. Yuan, and W. Yin. ODE analysis of stochastic gradient methods with optimism and anchoring for minimax problems and GANs. arXiv:1905.10899, 2019.
[21] A. B. Taylor, J. M. Hendrickx, and F. Glineur. Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Mathematical Programming, 161(1-2):307–345, 2017.