The Projected Polar Proximal Point Algorithm Converges Globally

Abstract

Friedlander, Macêdo, and Pong recently introduced the projected polar proximal point algorithm (P4A) for solving optimization problems by using the closed perspective transforms of convex objectives. We analyse a generalization (GP4A) which replaces the closed perspective transform with a more general closed gauge. We decompose GP4A into the iterative application of two separate operators, and analyse it as a splitting method. By showing that GP4A and its under-relaxations exhibit global convergence whenever a fixed point exists, we obtain convergence guarantees for P4A by letting the gauge specify to the closed perspective transform for a convex function. We then provide easy-to-verify sufficient conditions for the existence of fixed points for the GP4A, using the Minkowski function representation of the gauge. Conveniently, the approach reveals that global minimizers of the objective function for P4A form an exposed face of the dilated fundamental set of the closed perspective transform.



Scott B. Lindstrom
Hong Kong Polytechnic University
February 23, 2021


Key words: projected polar proximal point algorithm, gauge optimization, polar convolution

Friedlander, Macêdo, and Pong introduced the projected polar proximal point algorithm (P4A, Definition 4) as the first proximal-point-like algorithm based on the polar envelope [6]. The motivation to study such algorithms stems from the polar envelope's relationship to infimal max convolution; Friedlander, Macêdo, and Pong showed that this relationship is analogous to the connection between the Moreau envelope and infimal convolution.

The method makes use of the closed perspective transform of a proper convex function $f : X \to \mathbb{R}\cup\{+\infty\}$:
$$f^\pi : X \times \mathbb{R} \to \mathbb{R}\cup\{+\infty\} : (x,\lambda) \mapsto \begin{cases} \lambda f(\lambda^{-1}x) & \text{if } \lambda > 0;\\ f^\infty(x) & \text{if } \lambda = 0;\\ +\infty & \text{if } \lambda < 0,\end{cases}$$
where $f^\infty$ denotes the recession function of $f$ [1, Definition 2.5.1]. The perspective $f^\pi$ is proper, closed, and convex [7, Page 67] and is characterized by
$$\operatorname{epi} f^\pi = \operatorname{cl}\bigl(\operatorname{cone}(\operatorname{epi} f \times \{1\})\bigr).$$
Friedlander, Macêdo, and Pong also provided a result [6, Theorem 7.5] showing convergence under an assumption of strong convexity of $(f^\pi)$; however, the question of convergence in general has remained open until now.

Outline and contributions

In Section 2, we recall familiar notation and concepts from convex analysis. In Section 3, we recall and analyse the polar proximity operator, a fundamental component of P4A. In particular, we show that it is firmly quasinonexpansive (Theorem 3.4).

In Section 4, we recall the algorithm P4A and introduce a generalization thereof, GP4A (Definition 5). In Section 4.1, we show that, when the operator associated with GP4A has a nonempty fixed point set, its fixed points all share a special property (Proposition 4.4), on which our analysis depends. In Section 4.2, we use this property to show that the operator is strictly quasinonexpansive (Theorem 4.8) and that its iterates converge globally to a fixed point whenever one exists (Theorem 4.11). We show similar results for the algorithm's under-relaxed variants (Theorem 4.10). In Section 4.3, we provide convergence results for the associated shadow sequences. In Section 4.4, we provide an example showing that the operator associated with GP4A is not, in general, firmly quasinonexpansive.

In Section 5, we provide sufficient conditions that guarantee the existence of fixed points of GP4A (Theorem 5.2). Moreover, when GP4A specializes to P4A, we show that the set of global minimizers of $f$ defines an exposed face of the fundamental set for the perspective function $f^\pi$ (Theorem 5.3). The latter results connect to known results ([6, Theorem 7.4]) about fixed points of P4A, and we explain how (Remark 2).

In Section 6, we combine our sufficient conditions for the existence of fixed points with our convergence results that depend on that existence, to state a simple global convergence guarantee (Theorem 6.1). It shows that P4A is globally convergent in the full generality of [6].

Throughout, $X$ is a finite-dimensional Euclidean space with the Euclidean norm, and $\kappa$ is a closed gauge in the sense of [5]. In other words, $\kappa$ is convex and
$$(\forall x \in X)(\forall \lambda \in [0,\infty[)\quad 0 \le \kappa(\lambda x) = \lambda\kappa(x).$$
For an operator $U : X \to X$, $\operatorname{Fix} U := \{x \mid U(x) = x\}$ is its fixed point set.
For a function $f : X \to \mathbb{R}_+\cup\{+\infty\}$, $\operatorname{argmin} f := \{x \mid f(x) = \inf f(X)\}$ is its set of global minimizers, $\operatorname{dom} f := \{x \mid f(x) < \infty\}$ is its domain, $\operatorname{lev}_{\le r}(f) := \{x \mid f(x) \le r\}$ is its $r$-lower level set, and $\operatorname{zer} f := \{x \mid f(x) = 0\}$ is its zero set. For a closed, convex subset $C \subset X$, $\operatorname{cone} C := \{\lambda x \mid (x,\lambda) \in C \times [0,\infty[\}$ is the cone of $C$, and $P_C : x \mapsto \operatorname{argmin}_{y\in C}\|y - x\|$ is the projection operator associated with $C$. For projection operators associated with (closed, convex) lower level sets, we use the shorthand $P_{f\le r} := P_{\operatorname{lev}_{\le r}(f)}$.

We will make use of various notions of nonexpansivity, which we now introduce; more information may be found in [2], and a comparison of what may be shown through different cutter and projection methods is found in [4]. The following definition may be found in either of these.

Definition 1 (Properties of operators). Let $D \subset X$ be nonempty, let $U : D \to X$, and assume that $\operatorname{Fix} U \neq \emptyset$. Then $U$ is said to be
1. firmly nonexpansive if $\|U(x) - U(y)\|^2 + \|(\operatorname{Id} - U)(x) - (\operatorname{Id} - U)(y)\|^2 \le \|x - y\|^2$ for all $x, y \in D$;
2. nonexpansive if it is Lipschitz continuous with constant 1, i.e. $\|U(x) - U(y)\| \le \|x - y\|$ for all $x, y \in D$;
3. quasinonexpansive (QNE) if $\|U(x) - y\| \le \|x - y\|$ for all $x \in D$ and $y \in \operatorname{Fix} U$ (an operator that is both quasinonexpansive and continuous is called paracontracting);
4. firmly quasinonexpansive (FQNE) (or a cutter) if $\|Ux - y\|^2 + \|Ux - x\|^2 \le \|x - y\|^2$ for all $x \in D$ and $y \in \operatorname{Fix} U$;
5. strictly quasinonexpansive (SQNE) if $\|U(x) - y\| < \|x - y\|$ for all $x \in D \setminus \operatorname{Fix} U$ and $y \in \operatorname{Fix} U$;
6. $\rho$-strongly quasinonexpansive, for $\rho > 0$, if $\|Ux - y\|^2 \le \|x - y\|^2 - \rho\|Ux - x\|^2$ for all $x \in D \setminus \operatorname{Fix} U$ and $y \in \operatorname{Fix} U$.

Lemma 2.1.
[2, Proposition 4.4] Let $D \subset X$ be nonempty and let $U : D \to X$. The following are equivalent:
1. $U$ is firmly quasinonexpansive.
2. $2U - \operatorname{Id}$ is quasinonexpansive.
3. $(\forall x \in D)(\forall y \in \operatorname{Fix} U)$ $\|Ux - y\|^2 \le \langle x - y, Ux - y\rangle$.
4. $(\forall x \in D)(\forall y \in \operatorname{Fix} U)$ $\langle y - Ux, x - Ux\rangle \le 0$.
5. $(\forall x \in D)(\forall y \in \operatorname{Fix} U)$ $\|Ux - x\|^2 \le \langle y - x, Ux - x\rangle$.

Definition 2 (Fejér monotonicity [2, 5.1]). A sequence $(x_n)_{n\in\mathbb{N}}$ is Fejér monotone with respect to a closed convex set $C$ if
$$\|x_{n+1} - x\| \le \|x_n - x\| \quad \forall x \in C,\ \forall n \in \mathbb{N}.$$
A Fejér monotone sequence with respect to a closed convex set $C$ may be thought of as a sequence defined by $x_n := U^n x_0$, where $U$ is QNE with $C = \operatorname{Fix} U$. We will make use of the fact that a Fejér monotone sequence with respect to a nonempty set is always bounded. We will also make use of the following convergence result.

Theorem 2.2. [2, Theorem 5.11] Let $(x_n)_{n\in\mathbb{N}}$ be a sequence in $X$ and let $C$ be a nonempty closed convex subset of $X$. Suppose that $(x_n)_{n\in\mathbb{N}}$ is Fejér monotone with respect to $C$. Then the following are equivalent:
1. the sequence $(x_n)_{n\in\mathbb{N}}$ converges strongly (i.e. in norm) to a point in $C$;
2. $(x_n)_{n\in\mathbb{N}}$ possesses a strong sequential cluster point in $C$;
3. $\liminf_{n\to\infty} d(x_n, C) = 0$.

We will make use of the following result, which may be recognized as a simplified version of [4, Theorem 4] and variants of which may be found in [3].

Lemma 2.3.

Let $U$ be FQNE. Then the operator $(2-\gamma)(U - \operatorname{Id}) + \operatorname{Id}$ is QNE for all $\gamma \in [0,2]$. Moreover, if $\gamma \in\, ]0,2[$, then this operator is SQNE, and the sequence $(y_n)_{n\in\mathbb{N}}$ given by $y_{n+1} := \bigl((2-\gamma)(U - \operatorname{Id}) + \operatorname{Id}\bigr)y_n$ satisfies $\lim_{n\to\infty}\|y_n - U(y_n)\| = 0$.

Friedlander, Macêdo, and Pong introduced the polar envelope and its associated polar proximity operator [6], which we now recall.

Definition 3 (Polar envelope and polar proximal map [6]). For any closed gauge $\kappa$ and positive scalar $\alpha$, the function
$$\kappa_\alpha(x) := \inf_u \max\{\kappa(u), (1/\alpha)\|x - u\|\}$$
is the polar envelope of $\kappa$. The corresponding polar proximal map
$$T_{\kappa,\alpha}(x) := \operatorname{argmin}_u \max\{\kappa(u), (1/\alpha)\|x - u\|\}$$
sends a point $x$ to the minimizing set that defines $\kappa_\alpha(x)$.

Figure 1 shows the construction of the polar envelope and its proximity operator for $\kappa = \|\cdot\|_\infty$. At top and at bottom left, we take three choices of $x$ and plot the functions $\|\cdot - x\|$ in yellow, red, and orange, respectively. The domain points at which each of these epigraphs intersects the epigraph of $\|\cdot\|_\infty$ at lowest height are the respective proximal points, and the height at the point of intersection is the envelope value. For points $x$ in the white regions, such as the points for which the functions $\|\cdot - x\|$ are orange and yellow, the envelope value is simply $\|x\|_\infty/2$. For points lying in the red regions, the proximal point lies on the diagonals; for points in the interiors of the red regions, the envelope values are strictly greater than $\|x\|_\infty/2$. This results in the smoothing apparent in the red regions for the envelope shown at bottom right.

Figure 1: Construction of the polar envelope and corresponding polar proximal map for $\kappa = \|\cdot\|_\infty$.

We devote the remainder of this section to showing that the polar proximity operator $T_{\kappa,\alpha}$ is firmly quasinonexpansive. We have the symmetry of vertical rescaling,
$$T_{\kappa,\alpha}(x) = \operatorname{argmin}_u \max\{\kappa(u), (1/\alpha)\|x - u\|\} = \operatorname{argmin}_u \max\{\alpha\kappa(u), \|x - u\|\} = T_{\alpha\kappa,1}(x),$$
by which the proximity operators satisfy $T_{\kappa,\alpha} = T_{\alpha\kappa,1}$, while the envelopes satisfy $(\alpha\kappa)_1 = \alpha(\kappa_\alpha)$. Thus, by working with a general $\kappa$, we can, and do, let $\alpha = 1$ without loss of generality. For simplicity, we also write $T$ instead of $T_{\kappa,\alpha}$. Note that we still need the notation $\kappa_\alpha$ to distinguish the polar envelope from the gauge $\kappa$ itself.

Lemma 3.1.

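For a closed gauge $\kappa$, it holds that $\operatorname{Fix} T = \operatorname{zer}\kappa$.

Proof. The inclusion $\operatorname{zer}\kappa \subset \operatorname{Fix} T$ is obvious: if $\kappa(x) = 0$, then $u = x$ is the only point at which $\max\{\kappa(u), \|x - u\|\}$ attains the value $0$. We will show the reverse inclusion. Let $x \in \operatorname{Fix} T$. Then $x = Tx = \operatorname{argmin}_{u\in X}\max\{\kappa(u), \|x - u\|\}$, and so
$$\inf_{u\in X}\max\{\kappa(u), \|x - u\|\} = \max\{\kappa(x), \|x - x\|\} = \kappa(x).$$
Suppose, for contradiction, that $\kappa(x) > 0$; then $x \neq 0$. For $t \in\, ]0,1[$ with $t\|x\| < \kappa(x)$, the point $u_t := (1-t)x$ satisfies $\max\{\kappa(u_t), \|x - u_t\|\} = \max\{(1-t)\kappa(x), t\|x\|\} < \kappa(x)$, contradicting the minimality of $x$. Thus $\kappa(x) \le 0$. Combining with the fact that $\kappa(x) \ge 0$, we have $\kappa(x) = 0$.

Lemma 3.2.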

Let $\kappa$ be a closed gauge and $\kappa_\alpha$ its polar envelope. Then $\kappa(x) = 0 \iff \kappa_\alpha(x) = 0$.

Proof. Let $\kappa(x) = 0$. Then
$$\kappa_\alpha(x) = \inf_{u\in X}\max\{\kappa(u), (1/\alpha)\|x - u\|\} \le \max\{\kappa(x), (1/\alpha)\|x - x\|\} = 0.$$
Thus $\kappa_\alpha(x) = 0$. Now let $\kappa_\alpha(x) = 0$. Then there exists a sequence $(u_n)_{n\in\mathbb{N}}$ such that $\max\{\kappa(u_n), (1/\alpha)\|x - u_n\|\} \to 0$. Hence $\kappa(u_n) \to 0$ and $\|x - u_n\| \to 0$. Since $\|x - u_n\| \to 0$, we have that $u_n \to x$. Combining with the fact that $\kappa$ is lower semicontinuous, we have that $\kappa(x) \le \liminf_{n\to\infty}\kappa(u_n) = 0$. This concludes the result.

Lemma 3.3.

Let $\kappa$ be a closed gauge and $\kappa(x) > 0$. Then one of the following holds:
(i) $\kappa(Tx) < \frac{1}{\alpha}\|Tx - x\|$, in which case
$$0 \in \frac{1}{\alpha}\,\frac{Tx - x}{\|Tx - x\|} + N_{\operatorname{dom}\kappa}(Tx) \quad\text{and}\quad Tx = P_{\operatorname{dom}\kappa}\,x.$$
(ii) We have that $r := \kappa(Tx) = \frac{1}{\alpha}\|Tx - x\|$, where $Tx = P_{\operatorname{lev}_{\le r}(\kappa)}\,x$, and there exists $\lambda \in [0,1]$ such that
$$0 \in \frac{1-\lambda}{\alpha}\,\frac{Tx - x}{\|Tx - x\|} + \partial(\lambda\kappa)(Tx) = \frac{1-\lambda}{\alpha}\,\frac{Tx - x}{\|Tx - x\|} + \begin{cases}\{\lambda z \mid z \in \partial\kappa(Tx)\} & \text{if } \lambda > 0;\\ N_{\operatorname{dom}\kappa}(Tx) & \text{if } \lambda = 0.\end{cases}$$

Proof.

Let $\kappa(x) > 0$. From Lemma 3.2, $\kappa_\alpha(x) > 0$. We have from [6, Section 5] that the condition $\kappa_\alpha(x) > 0$ […] $T$ from Lemma 3.3 often; hence the shorthand $P_{\kappa\le r} := P_{\operatorname{lev}_{\le r}(\kappa)}$. Now we have the principal result of this section, which establishes that the polar proximity operator $T$ is firmly quasinonexpansive.

Theorem 3.4. $T$ is firmly quasinonexpansive.

Proof. Let $y \in X$ and $x \in \operatorname{Fix} T$. If $\kappa(y) = 0$, then $y \in \operatorname{Fix} T$ by Lemma 3.1, so let $\kappa(y) > 0$. By Lemma 3.3, we need only consider three cases.

Case 1: If Lemma 3.3(i) holds, then we have that
$$-\frac{1}{\alpha}\,\frac{Ty - y}{\|Ty - y\|} \in N_{\operatorname{dom}\kappa}(Ty), \tag{3.1}$$
and so $y - Ty \in N_{\operatorname{dom}\kappa}(Ty)$. Thus $Ty = P_{\operatorname{dom}\kappa}(y)$. The operator $P_{\operatorname{dom}\kappa}$ is FQNE with $\operatorname{Fix} P_{\operatorname{dom}\kappa} = \operatorname{dom}\kappa$. Thus, by Lemma 2.1,
$$(\forall v \in X)(\forall u \in \operatorname{Fix} P_{\operatorname{dom}\kappa})\quad \langle u - P_{\operatorname{dom}\kappa}v,\ v - P_{\operatorname{dom}\kappa}v\rangle \le 0. \tag{3.2}$$
Since $x \in \operatorname{Fix} T \subset \operatorname{dom}\kappa = \operatorname{Fix} P_{\operatorname{dom}\kappa}$, (3.2) holds, in particular, for $u = x$ and $v = y$. Thus we obtain $\langle x - P_{\operatorname{dom}\kappa}y,\ y - P_{\operatorname{dom}\kappa}y\rangle \le 0$. This is just $\langle x - Ty, y - Ty\rangle \le 0$. By Lemma 2.1, this is what we needed to show.

Case 2: If Lemma 3.3(ii) holds with $\lambda = 0$, then we again obtain (3.1) and proceed as in Case 1, obtaining what we needed to show.

Case 3: If Lemma 3.3(ii) holds with $\lambda > 0$, then $r > 0$ and $Ty = P_{\kappa\le r}\,y$. Now since $P_{\kappa\le r}$ is FQNE, we have that
$$(\forall v \in X)(\forall u \in \operatorname{Fix} P_{\kappa\le r})\quad \langle u - P_{\kappa\le r}v,\ v - P_{\kappa\le r}v\rangle \le 0. \tag{3.3}$$
Since $x \in \operatorname{Fix} T = \operatorname{lev}_{\le 0}\kappa \subset \operatorname{lev}_{\le r}\kappa = \operatorname{Fix} P_{\kappa\le r}$, (3.3) holds, in particular, for $u = x$ and $v = y$. Thus we obtain $\langle x - P_{\kappa\le r}y,\ y - P_{\kappa\le r}y\rangle \le 0$. This is just $\langle x - Ty, y - Ty\rangle \le 0$. By Lemma 2.1, this is what we needed to show.
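The following Python sketch illustrates Definition 3, Figure 1, and Theorem 3.4 numerically for $\kappa = \|\cdot\|_\infty$ on $\mathbb{R}^2$. It is a minimal sketch under our own assumptions, not code from [6]. It computes the envelope through the identity $\kappa_\alpha(x) = \inf_{r \ge 0}\max\{r, \operatorname{dist}(x, \operatorname{lev}_{\le r}\kappa)/\alpha\}$, obtained by splitting the infimum in Definition 3 by level. It then finds the crossing level $r^*$ by bisection and returns $Tx = P_{\operatorname{lev}_{\le r^*}\kappa}(x)$, in line with Lemma 3.3(ii). The helper name `polar_prox_linf` and the bisection scheme are illustrative choices.

```python
import numpy as np


def polar_prox_linf(x, alpha=1.0, iters=100):
    """Polar proximal point and polar envelope of kappa = ||.||_inf (sketch).

    Uses kappa_alpha(x) = inf_{r >= 0} max{r, dist(x, lev_r)/alpha} with
    lev_r = [-r, r]^n. The map r -> dist(x, lev_r)/alpha - r is strictly
    decreasing, so its root r* is found by bisection; then
    kappa_alpha(x) = r* and T x = P_{lev_{r*}}(x), a box projection (clip).
    """
    x = np.asarray(x, dtype=float)
    lo, hi = 0.0, float(np.max(np.abs(x)))  # x lies in lev_hi, so the gap at hi is <= 0
    if hi == 0.0:
        return x.copy(), 0.0                # x = 0 is the only zero of kappa (Lemma 3.1)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        gap = np.linalg.norm(x - np.clip(x, -mid, mid)) / alpha - mid
        if gap > 0:
            lo = mid
        else:
            hi = mid
    return np.clip(x, -hi, hi), hi


# Figure 1 with alpha = 1: an off-diagonal point and a diagonal point.
Tx, env = polar_prox_linf([1.0, 0.0])
Tx2, env2 = polar_prox_linf([1.0, 1.0])

# Theorem 3.4 against the fixed point y = 0 (Fix T = zer kappa = {0}):
#   ||T x - 0||^2 + ||T x - x||^2 <= ||x - 0||^2.
rng = np.random.default_rng(0)
worst = -np.inf
for _ in range(1000):
    x = 3 * rng.standard_normal(2)
    T, _ = polar_prox_linf(x)
    worst = max(worst, T @ T + (T - x) @ (T - x) - x @ x)
print(env, env2, worst)
```

With $\alpha = 1$, the point $(1,0)$ gives the envelope value $\|x\|_\infty/2 = 0.5$ and proximal point $(0.5, 0)$. The diagonal point $(1,1)$ gives $\sqrt{2}/(1+\sqrt{2}) \approx 0.586 > 0.5$, matching the description of Figure 1. The worst FQNE residual against the fixed point $0$ should be nonpositive up to rounding.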

The projected polar proximal point algorithm

We now recall the projected polar proximal point algorithm.

Definition 4 (Projected polar proximal point algorithm P4A [6, 7.2]). Fix $\alpha > 0$ and define
$$\rho_{\alpha,f}(v) := f^\pi_\alpha(v,1), \qquad P_{\alpha,f}(v) := T_{f^\pi,\alpha}(v,1).$$
The projected polar proximal point algorithm begins with any $v_0$ and updates by
$$(v_{k+1}, \lambda_{k+1}) = P_{\alpha,f}(v_k). \tag{4.1}$$
Friedlander, Macêdo, and Pong showed that this mapping has a useful fixed point property, which we now recall.

Theorem 4.1 (Fixed points of P4A [6, Theorem 7.4]). Let $f : X \to \mathbb{R}_+\cup\{+\infty\}$ be a proper closed nonnegative convex function with $\inf f > 0$ and $\operatorname{argmin} f \neq \emptyset$. The following hold:
(i) If $(v, \lambda_*) = P_{1,f}(v)$, then $\lambda_* > 0$ and $\lambda_*^{-1}v \in \operatorname{argmin} f$.
(ii) If $v \in \operatorname{argmin} f$, then there exists $\lambda_* > 0$ such that $(\tau v, \lambda_*) = P_{1,f}(\tau v)$, where $\tau := [1 + f(v)]^{-1}$.

To show convergence of P4A, we will analyse a generalization of it, which we now introduce.

Definition 5 (Generalized projected polar proximal point algorithm (GP4A)). For a gauge $\kappa : X \times \mathbb{R} \to \mathbb{R}\cup\{+\infty\}$ and fixed $\alpha > 0$, choose a starting point $x_0 \in X \times \mathbb{R}$ and iterate by
$$x_{k+1} = P_S \circ T\,x_k, \tag{4.2}$$
where $S = X \times \{1\}$ and $P_S : X \times \mathbb{R} \to X \times \mathbb{R} : (y,\lambda) \mapsto (y,1)$ is the projection onto $S$.

Note that P4A and GP4A are defined on $X$ and $X \times \mathbb{R}$, respectively. However, when $\kappa = f^\pi$, the sequences $(v_k)_k$ from (4.1) and $(x_k)_k$ from (4.2) clearly satisfy $(v_k, 1) = x_k$, and so we may study P4A by studying GP4A. We choose to analyse GP4A because the decomposition of the method into the iterative application of two separate operators, $T$ and $P_S$, allows for greater flexibility. We next establish a useful characterization of the fixed points (Proposition 4.4). For this purpose, we need the following two lemmas.

Lemma 4.2.

Let $(x,1) \in \operatorname{Fix} P_S \circ T$. Then the following hold.
(i) $T(x,1) = (x,\lambda)$ for some $\lambda \in [0,1]$.
(ii) Moreover, if $(x,1) \notin \operatorname{Fix} T$, then $\lambda \in [0,1[$.

Proof. (i): Since $(x,1) \in \operatorname{Fix} P_S \circ T$, we have that $P_S \circ T(x,1) = (x,1)$. Since $P_S^{-1}(u,1) = \{(u,\lambda) \mid \lambda \in \mathbb{R}\}$ for all $u \in X$, we have that $T(x,1) = (x,\lambda)$ for some $\lambda \in \mathbb{R}$. We now show that $\lambda \in [0,1]$. Since $T$ is FQNE (Theorem 3.4), Lemma 2.1 gives
$$(\forall u \in X \times \mathbb{R})(\forall v \in \operatorname{Fix} T)\quad \langle v - Tu,\ u - Tu\rangle \le 0. \tag{4.3}$$
In particular, since $\kappa(0) = 0$, Lemma 3.1 gives $0 \in \operatorname{Fix} T$, and so (4.3) yields
$$\langle -Tu,\ u - Tu\rangle \le 0. \tag{4.4}$$
In particular, (4.4) holds for $u = (x,1)$, and so
$$0 \ge \langle -T(x,1),\ (x,1) - T(x,1)\rangle = \langle -(x,\lambda),\ (x,1) - (x,\lambda)\rangle = \langle -(x,\lambda),\ (0, 1-\lambda)\rangle = -\lambda(1-\lambda),$$
which shows that $\lambda \in [0,1]$.

(ii): Since $T$ is FQNE, it is SQNE [2], and so we have that
$$(\forall u \in (X \times \mathbb{R}) \setminus \operatorname{Fix} T)(\forall v \in \operatorname{Fix} T)\quad \|Tu - v\| < \|u - v\|. \tag{4.5}$$
Specifically, (4.5) holds for $u = (x,1)$ and $v = 0$, and so we obtain $\|(x,\lambda)\| < \|(x,1)\|$. This is just $\|x\|^2 + \lambda^2 < \|x\|^2 + 1$, and so we conclude that $\lambda < 1$. This concludes the result.
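To make Definition 5 and Lemma 4.2 concrete, here is a small Python sketch of GP4A with $\kappa = f^\pi$ for the hypothetical test function $f(v) = 1 + \|v\|_1$ on $\mathbb{R}^2$. This is our own choice, not an example from the paper; it satisfies $\inf f = 1 > 0$ and $\operatorname{argmin} f = \{0\}$. For $\lambda \ge 0$ one has $f^\pi(v,\lambda) = \lambda + \|v\|_1$ (at $\lambda = 0$ this is $f^\infty(v) = \|v\|_1$), so $\operatorname{lev}_{\le r} f^\pi = \{(v,\lambda) : \lambda \ge 0,\ \|v\|_1 + \lambda \le r\}$. Projecting onto this set amounts to clamping a negative $\lambda$ to $0$ and then projecting onto the $\ell_1$ ball of radius $r$. This works because, if $\lambda < 0$, replacing any feasible $\lambda' > 0$ by $0$ stays feasible and moves closer. As an assumption of this sketch, $T = T_{f^\pi,1}$ is computed by bisection on the level $r$ at which $\operatorname{dist}(z, \operatorname{lev}_{\le r} f^\pi) = r$, which follows from splitting the infimum in Definition 3 by level. All function names are hypothetical.

```python
import numpy as np


def proj_l1_ball(z, r):
    """Euclidean projection onto {z : ||z||_1 <= r} (sort-and-threshold method)."""
    if np.sum(np.abs(z)) <= r:
        return z.copy()
    if r <= 0.0:
        return np.zeros_like(z)
    a = np.sort(np.abs(z))[::-1]
    cs = np.cumsum(a)
    k = np.arange(1, a.size + 1)
    rho = np.nonzero(a - (cs - r) / k > 0)[0][-1]
    theta = (cs[rho] - r) / (rho + 1)
    return np.sign(z) * np.maximum(np.abs(z) - theta, 0.0)


def proj_lev(z, r):
    """Projection onto lev_r(f^pi) = {(v, lam) : lam >= 0, ||v||_1 + lam <= r}."""
    w = z.copy()
    w[-1] = max(w[-1], 0.0)          # if lam < 0, the nearest point has lam = 0
    return proj_l1_ball(w, r)


def polar_prox(z, iters=100):
    """T = T_{f^pi,1}: bisection on the level r where dist(z, lev_r) = r."""
    z = np.asarray(z, dtype=float)
    lo, hi = 0.0, float(np.linalg.norm(z))  # 0 lies in lev_r, so dist(z, lev_hi) <= hi
    if hi == 0.0:
        return z.copy()
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(z - proj_lev(z, mid)) > mid:
            lo = mid
        else:
            hi = mid
    return proj_lev(z, hi)


def gp4a(v0, iters=50):
    """GP4A iteration x_{k+1} = P_S(T x_k) with S = R^n x {1}; returns v."""
    x = np.append(np.asarray(v0, dtype=float), 1.0)
    for _ in range(iters):
        x = polar_prox(x)
        x[-1] = 1.0                  # P_S: reset the last coordinate to 1
    return x[:-1]


T01 = polar_prox(np.array([0.0, 0.0, 1.0]))
v_final = gp4a([3.0, -2.0])

# Lemma 4.3(i) with (x, 1) = (0, 1): ||T(y,1) - (y,1)|| >= ||T(0,1) - (0,1)||.
rng = np.random.default_rng(1)
min_res = min(np.linalg.norm(polar_prox(np.append(y, 1.0)) - np.append(y, 1.0))
              for y in 2 * rng.standard_normal((200, 2)))
print(T01, v_final, min_res)
```

By hand, $T(0,1) = (0, 1/2)$. So $\lambda = 1/2 \in [0,1[$, as Lemma 4.2 predicts, and $\lambda_* = 1/2$ is the value appearing in Theorem 4.1(ii) for $v = 0$. In this run, starting from $v_0 = (3,-2)$, the iterates reach $\operatorname{argmin} f = \{0\}$ within a few steps, and every residual $\|T(y,1) - (y,1)\|$ stays at least $1/2$.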

Lemma 4.3.

Let $(x,1) \in \operatorname{Fix} P_S \circ T$ and $(x,\lambda) := T(x,1)$ (a representation that always holds by Lemma 4.2). Then for any $(y,1) \in S$ the following hold:
(i) $\|T(y,1) - (y,1)\| \ge \|T(x,1) - (x,1)\|$;
(ii) Additionally, if $T(x,1) = P_{\operatorname{dom}\kappa}(x,1)$ or $\lambda = 1$, then
(a) $\|T(y,1) - P_S \circ T(y,1)\| \ge \|T(x,1) - (x,1)\|$;
(b) $T(y,1) = (w,\mu)$ for some $\mu \in \mathbb{R}$ satisfying $|1 - \lambda| \le |1 - \mu|$;
(c) if $\lambda < 1$, then $T(y,1) = (w,\mu)$ for some $\mu \le \lambda$.

Proof. Let $(y,1) \in S$ and $(w,\mu) := T(y,1)$. Let $r' := \|T(x,1) - (x,1)\|$ and $r := \|T(y,1) - (y,1)\|$. We have by Lemma 4.2 that $\lambda \in [0,1]$. We will consider two cases: when $\lambda = 1$ and when $\lambda \in [0,1[$.

Case $\lambda = 1$. Suppose $\lambda = 1$. Then $|1 - \lambda| = 0$, and so (ii)(b) clearly holds. Additionally, $T(x,1) = (x,1)$, so $(x,1) \in \operatorname{Fix} T$ and $\|T(x,1) - (x,1)\| = 0$; hence (i) and (ii)(a) both clearly hold, while (ii)(c) does not apply. This concludes what we needed to show in the case $\lambda = 1$.

Case $\lambda < 1$. Let
$$\lambda \in [0,1[. \tag{4.6}$$
By Lemma 3.3, we may further consider two subcases, namely when $T(x,1) = P_{\operatorname{dom}\kappa}(x,1)$ and when $T(x,1) = P_{\kappa\le r'}(x,1)$.

Subcase 1: Let $T(x,1) = P_{\operatorname{dom}\kappa}(x,1)$. Since $P_{\operatorname{dom}\kappa}$ is FQNE with $\operatorname{Fix} P_{\operatorname{dom}\kappa} = \operatorname{dom}\kappa$, we have that
$$(\forall u \in X \times \mathbb{R})(\forall v \in \operatorname{dom}\kappa)\quad \langle P_{\operatorname{dom}\kappa}u - u,\ P_{\operatorname{dom}\kappa}u - v\rangle \le 0. \tag{4.7}$$
In particular, (4.7) holds for $u = (x,1)$ and $v = T(y,1)$, and so
$$\langle P_{\operatorname{dom}\kappa}(x,1) - (x,1),\ P_{\operatorname{dom}\kappa}(x,1) - T(y,1)\rangle \le 0. \tag{4.8}$$
Since $P_{\operatorname{dom}\kappa}(x,1) = T(x,1)$, we have $P_{\operatorname{dom}\kappa}(x,1) = (x,\lambda)$. Combining with (4.8), we obtain
$$\langle (x,\lambda) - (x,1),\ (x,\lambda) - T(y,1)\rangle \le 0 \iff \langle (0,\lambda - 1),\ (x,\lambda) - (w,\mu)\rangle \le 0 \tag{4.9}$$
$$\iff (\lambda - 1)(\lambda - \mu) \le 0. \tag{4.10}$$
Recall that by (4.6), we have $\lambda \in [0,1[$, and so $\lambda - 1 < 0$. Combining this fact with (4.10), we have $\lambda - \mu \ge 0$, and so $\lambda \ge \mu$. This shows (ii)(c). Thus $(1 - \mu) \ge (1 - \lambda) > 0$, where the final inequality is because $\lambda < 1$; this shows (ii)(b). Hence
$$(\mu - 1)^2 \ge (\lambda - 1)^2. \tag{4.11}$$
We also have that
$$\|T(y,1) - (y,1)\|^2 = \|(w,\mu) - (y,1)\|^2 = \|w - y\|^2 + (\mu - 1)^2 \ge (\mu - 1)^2, \tag{4.12}$$
and that
$$(\lambda - 1)^2 = \|(x,1) - (x,\lambda)\|^2 = \|T(x,1) - (x,1)\|^2 = (r')^2. \tag{4.13}$$
Combining (4.11), (4.12), and (4.13), we obtain $\|T(y,1) - (y,1)\|^2 \ge (r')^2$. Thus $r \ge r'$, which shows (i). Since $P_S \circ T(x,1) = (x,1)$, we also have that
$$\|T(x,1) - P_S \circ T(x,1)\| = \|(x,\lambda) - (x,1)\| = 1 - \lambda \le 1 - \mu$$
and
$$1 - \mu = \|(w,\mu) - (w,1)\| = \|T(y,1) - P_S \circ T(y,1)\|,$$
which shows (ii)(a). Thus we have shown everything we needed to show in the case when $T(x,1) = P_{\operatorname{dom}\kappa}(x,1)$.

Subcase 2: Let $T(x,1) = P_{\kappa\le r'}(x,1)$, and suppose, for contradiction, that $r < r'$. For the sake of simplicity, define $P_{r'} := P_{\kappa\le r'}$. We have that $\kappa(T(y,1)) \le r < r'$, where the first inequality is because $\kappa(T(y,1)) \le \|T(y,1) - (y,1)\|$ [6, Theorem 4.4] and the second is our contradiction assumption. Thus $T(y,1) \in \operatorname{lev}_{\le r}\kappa \subset \operatorname{lev}_{\le r'}\kappa$ […] $T(x,1) = P_{r'}(x,1) = (x,\lambda)$. Let $(w,\mu) := T(y,1)$ […] $r \ge r'$, a contradiction. Thus $r \ge r'$, which shows (i).

We now establish the useful alternative characterization of the set $\operatorname{Fix} P_S \circ T$.

Proposition 4.4 (Fixed points of $P_S \circ T$). Let $(x,1) \in \operatorname{Fix} P_S \circ T$ and set $r' := \|(x,1) - T(x,1)\|$. It holds that
$$\operatorname{Fix} P_S \circ T = \{(u,1) \mid \|T(u,1) - (u,1)\| = r'\}. \tag{4.16}$$

Proof.

The first inclusion, $\operatorname{Fix} P_S \circ T \subset \{(u,1) \mid \|T(u,1) - (u,1)\| = r'\}$, is a consequence of Lemma 4.3. Simply let $(x,1), (y,1) \in \operatorname{Fix} P_S \circ T$. Applying Lemma 4.3(i) twice, with the roles of $(x,1)$ and $(y,1)$ exchanged, we have that $\|T(y,1) - (y,1)\| \ge \|T(x,1) - (x,1)\|$ and $\|T(y,1) - (y,1)\| \le \|T(x,1) - (x,1)\|$, and so $\|T(y,1) - (y,1)\| = \|T(x,1) - (x,1)\| = r'$. Now we will show the reverse inclusion. Let $(y,1) \in S$ satisfy $\|(y,1) - T(y,1)\| = r' = \|(x,1) - T(x,1)\|$.

By Lemma 4.2, there exists $\lambda \in [0,1]$ such that $(x,\lambda) = T(x,1)$. Write $(w,\mu) := T(y,1)$.

Case 1: $\lambda = 1$. Suppose $\lambda = 1$. Then $(x,1) \in \operatorname{Fix} T$, and so $r' = 0$. Since $\|T(y,1) - (y,1)\| = r'$, we have $\|T(y,1) - (y,1)\| = 0$. Thus $(y,1) \in \operatorname{Fix} T$, and so $(y,1) \in \operatorname{Fix} P_S \circ T$. This concludes the case when $\lambda = 1$.

Case 2: $\lambda \in [0,1[$. Since $\lambda \in [0,1[$, we have $(x,1) \notin \operatorname{Fix} T$, and so $\kappa(x,1) > 0$ by Lemma 3.1. Since $\kappa(x,1) > 0$, we have by Lemma 3.3 that either $T(x,1) = P_{\operatorname{dom}\kappa}(x,1)$ or $T(x,1) = P_{\kappa\le r'}(x,1)$.

Case 2(a): $T(x,1) = P_{\operatorname{dom}\kappa}(x,1)$. Since $\operatorname{dom}\kappa$ is closed and convex, $P_{\operatorname{dom}\kappa}$ is FQNE with $\operatorname{Fix} P_{\operatorname{dom}\kappa} = \operatorname{dom}\kappa$, and so
$$(\forall v \in X \times \mathbb{R})(\forall u \in \operatorname{dom}\kappa)\quad \langle u - P_{\operatorname{dom}\kappa}v,\ v - P_{\operatorname{dom}\kappa}v\rangle \le 0. \tag{4.17}$$
In particular, $T(y,1) \in \operatorname{dom}\kappa$, and so we may apply (4.17) with $u = T(y,1)$ and $v = (x,1)$, obtaining
$$\langle T(y,1) - P_{\operatorname{dom}\kappa}(x,1),\ (x,1) - P_{\operatorname{dom}\kappa}(x,1)\rangle \le 0. \tag{4.18}$$
Using the fact that $T(y,1) = (w,\mu)$ and $P_{\operatorname{dom}\kappa}(x,1) = T(x,1) = (x,\lambda)$, (4.18) becomes
$$\langle (w,\mu) - (x,\lambda),\ (x,1) - (x,\lambda)\rangle \le 0. \tag{4.19}$$
From (4.19) we have that
$$(\mu - \lambda)(1 - \lambda) \le 0. \tag{4.20}$$
Using the fact that $\lambda < 1$, (4.20) implies that $\mu \le \lambda$. Thus $1 - \mu \ge 1 - \lambda > 0$, and so
$$(1 - \mu)^2 \ge (1 - \lambda)^2. \tag{4.21}$$
Now since
$$(r')^2 = \|(x,1) - T(x,1)\|^2 = \|(x,1) - (x,\lambda)\|^2 = (1 - \lambda)^2$$
and
$$(r')^2 = \|(y,1) - T(y,1)\|^2 = \|(y,1) - (w,\mu)\|^2 = \|y - w\|^2 + (1 - \mu)^2,$$
we have that
$$(1 - \lambda)^2 = \|y - w\|^2 + (1 - \mu)^2. \tag{4.22}$$
Combining (4.21) and (4.22), we obtain $0 \ge (1 - \lambda)^2 - (1 - \mu)^2 = \|w - y\|^2$. Thus $w = y$, and so $T(y,1) = (y,\mu)$. Thus $P_S \circ T(y,1) = P_S(y,\mu) = (y,1)$, and so $(y,1) \in \operatorname{Fix} P_S \circ T$.

Case 2(b): $T(x,1) = P_{\kappa\le r'}(x,1)$. Since $P_{\kappa\le r'}$ is FQNE with $\operatorname{Fix} P_{\kappa\le r'} = \operatorname{lev}_{\le r'}\kappa$, we have that
$$(\forall v \in X \times \mathbb{R})(\forall u \in \operatorname{lev}_{\le r'}\kappa)\quad \langle u - P_{\kappa\le r'}v,\ v - P_{\kappa\le r'}v\rangle \le 0. \tag{4.23}$$
In particular, since $\kappa(T(y,1)) \le \|T(y,1) - (y,1)\| = r'$, we have that $T(y,1) \in \operatorname{Fix} P_{\kappa\le r'}$, and so we may apply (4.23) with $u = T(y,1)$ and $v = (x,1)$, obtaining
$$\langle T(y,1) - P_{\kappa\le r'}(x,1),\ (x,1) - P_{\kappa\le r'}(x,1)\rangle \le 0. \tag{4.24}$$
Using the fact that $T(y,1) = (w,\mu)$ and $P_{\kappa\le r'}(x,1) = T(x,1) = (x,\lambda)$, (4.24) again yields (4.19), and we proceed as in Case 2(a). This shows the desired result.

The following definition will simplify notation in the results that follow.

Definition 6.

Let $\operatorname{Fix} P_S \circ T \neq \emptyset$ and define
$$E : (\operatorname{Fix} P_S \circ T) \times S \to \mathbb{R} : (x,y) \mapsto \|Ty - y\|^2 - \|Tx - x\|^2.$$
Our previous results admit the following important property.

Lemma 4.5.

Whenever $\operatorname{Fix} P_S \circ T \neq \emptyset$, we have that
$$(\forall x \in \operatorname{Fix} P_S \circ T)(\forall y \in S)\quad E(x,y) \ge 0,$$
and equality holds if and only if $y \in \operatorname{Fix} P_S \circ T$.

Proof. The inequality is an immediate consequence of Lemma 4.3(i). The fact that equality holds if and only if $y \in \operatorname{Fix} P_S \circ T$ is an immediate consequence of Proposition 4.4.

In this subsection, we will show that sequences generated by GP4A converge globally to a point in $\operatorname{Fix} P_S \circ T$ whenever the latter is nonempty. The key result, Theorem 4.8, uses the following auxiliary lemma.

Lemma 4.6.

Let $\operatorname{Fix} P_S \circ T \neq \emptyset$. The following holds:
$$(\forall x \in \operatorname{Fix} P_S \circ T)(\forall y \in S)\quad \langle Tx - Ty,\ y - Ty\rangle \le 0.$$

Proof. Let $x \in \operatorname{Fix} P_S \circ T$ and $y \in S$. By Lemma 3.3, we may consider two cases: when $Ty = P_{\operatorname{dom}\kappa}\,y$ and when $Ty = P_{\kappa\le r}\,y$, where $r = \|Ty - y\|$.

Case 1: Let $Ty = P_{\operatorname{dom}\kappa}\,y$. Since $P_{\operatorname{dom}\kappa}$ is FQNE, we have from Lemma 2.1 that
$$(\forall u \in \operatorname{Fix} P_{\operatorname{dom}\kappa})(\forall v \in X \times \mathbb{R})\quad \langle u - P_{\operatorname{dom}\kappa}v,\ v - P_{\operatorname{dom}\kappa}v\rangle \le 0. \tag{4.25}$$
In particular, $Tx \in \operatorname{Fix} P_{\operatorname{dom}\kappa}$, and so we can apply (4.25) with $u = Tx$ and $v = y$, obtaining
$$\langle Tx - P_{\operatorname{dom}\kappa}y,\ y - P_{\operatorname{dom}\kappa}y\rangle \le 0. \tag{4.26}$$
Finally, substituting in (4.26) using the fact that $P_{\operatorname{dom}\kappa}y = Ty$, we obtain $\langle Tx - Ty, y - Ty\rangle \le 0$. This shows the result in Case 1.

Case 2: Let $Ty = P_{\kappa\le r}\,y$, where $r = \|Ty - y\|$. As $P_{\kappa\le r}$ is FQNE, we have that
$$(\forall u \in \operatorname{Fix} P_{\kappa\le r})(\forall v \in X \times \mathbb{R})\quad \langle u - P_{\kappa\le r}v,\ v - P_{\kappa\le r}v\rangle \le 0. \tag{4.27}$$
By Lemma 4.3(i), $r \ge r' := \|Tx - x\|$. Combining this with the fact from [6, Theorem 4.4(ii)] that $\kappa(Tx) \le \|Tx - x\|$, we have that $Tx \in \operatorname{lev}_{\le r'}\kappa \subset \operatorname{lev}_{\le r}\kappa$. Thus $Tx \in \operatorname{Fix} P_{\kappa\le r}$, and so we can apply (4.27) with $u = Tx$ and $v = y$, obtaining
$$\langle Tx - P_{\kappa\le r}y,\ y - P_{\kappa\le r}y\rangle \le 0. \tag{4.28}$$
Finally, since $P_{\kappa\le r}y = Ty$, we may substitute in (4.28) to obtain $\langle Tx - Ty, y - Ty\rangle \le 0$. This concludes the result.
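Proposition 4.4 and Lemmas 4.5 and 4.6 can be checked numerically on an example of our own choosing: $\kappa = \|\cdot\|_\infty$ on $\mathbb{R}^2 \times \mathbb{R}$. As an assumption of this sketch, $T$ is computed by bisection on the level $r$ at which $\operatorname{dist}(z, \operatorname{lev}_{\le r}\kappa) = r$. A hand computation shows $T(u,1) = (u, 1/2)$ whenever $\|u\|_\infty \le 1/2$. So every such $(u,1)$ is a fixed point of $P_S \circ T$ with $r' = 1/2$, and in particular $\operatorname{Fix} P_S \circ T$ need not be a singleton. The sketch checks $E(x,y) \ge 0$ and the inequality of Lemma 4.6 for two of these fixed points, and checks the characterization (4.16) on a small grid. Names such as `PS_T` are hypothetical helpers.

```python
import numpy as np


def polar_prox_linf(z, iters=100):
    """T = T_{kappa,1} for kappa = ||.||_inf: bisection on the level r."""
    z = np.asarray(z, dtype=float)
    lo, hi = 0.0, float(np.max(np.abs(z)))
    if hi == 0.0:
        return z.copy()
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(z - np.clip(z, -mid, mid)) > mid:
            lo = mid
        else:
            hi = mid
    return np.clip(z, -hi, hi)


T = polar_prox_linf
on_S = lambda u: np.append(np.asarray(u, dtype=float), 1.0)   # (u, 1) in S
res = lambda p: np.linalg.norm(T(p) - p)                       # ||T p - p||


def PS_T(p):
    q = T(p)
    q[-1] = 1.0              # P_S resets the last coordinate to 1
    return q


# Two hand-checked fixed points of P_S o T; both have residual r' = 1/2.
fixed = [on_S([0.0, 0.0]), on_S([0.5, -0.5])]
r_prime = res(fixed[0])

rng = np.random.default_rng(2)
worst_E, worst_ip = np.inf, -np.inf
for _ in range(500):
    y = on_S(2 * rng.standard_normal(2))
    for x in fixed:
        worst_E = min(worst_E, res(y) ** 2 - res(x) ** 2)         # Lemma 4.5: E >= 0
        worst_ip = max(worst_ip, np.dot(T(x) - T(y), y - T(y)))   # Lemma 4.6: <= 0

# Proposition 4.4 on a grid kept away from the boundary ||u||_inf = 1/2:
# (u, 1) is fixed  <=>  ||T(u, 1) - (u, 1)|| = r'.
grid = [-1.0, -0.3, 0.0, 0.4, 0.8]
prop44_ok = all(
    (np.linalg.norm(PS_T(on_S([a, b])) - on_S([a, b])) < 1e-8)
    == (abs(res(on_S([a, b])) - r_prime) < 1e-8)
    for a in grid for b in grid
)
print(r_prime, worst_E, worst_ip, prop44_ok)
```

The grid deliberately avoids points with $\|u\|_\infty$ close to $1/2$, where the two tolerance-based tests could disagree because of rounding. Away from that boundary, the fixed points are exactly the grid points with $\|u\|_\infty \le 0.4$.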

Fact 4.7.

Since $S$ is an affine subspace, it holds that
$$(\forall u \in X \times \mathbb{R})(\forall v \in S)\quad \|u - v\|^2 = \|P_S(u) - v\|^2 + \|u - P_S(u)\|^2. \tag{4.29}$$

Proof. This follows immediately from the Pythagorean theorem.

Figure 2 illustrates the strategy of the following theorem, which brings together all of the different results we have established so far.

Theorem 4.8.

Let $\operatorname{Fix} P_S \circ T \neq \emptyset$. The operator $P_S \circ T : S \to S$ is SQNE, and so, for any $y_0 \in S$, the sequence $(y_n)_{n\in\mathbb{N}} \subset S$ given by $y_{n+1} := P_S \circ T y_n$ is Fejér monotone with respect to $\operatorname{Fix}(P_S \circ T)$. More specifically,
$$(\forall x \in \operatorname{Fix} P_S \circ T)(\forall y \in S)\quad \|P_S \circ T y - x\|^2 \le \|y - x\|^2 - E(x,y),$$
where $E$ is as in Definition 6.

Proof. Let $x \in \operatorname{Fix} P_S \circ T$ and $y \in S$. There exists $\beta \in \mathbb{R}$ such that
$$Tx - y = (1 + \beta)(Ty - y) + u \quad\text{for some } u \in (\operatorname{span}\{Ty - y\})^\perp. \tag{4.30}$$
We will first show that $\beta \ge 0$. From Lemma 4.6 we have that
$$\langle Tx - Ty,\ y - Ty\rangle \le 0. \tag{4.31}$$

Figure 2: The strategy for Theorem 4.8 is illustrated.

Adding $\langle y - Tx, y - Ty\rangle$ to both sides of (4.31) yields
$$\|Ty - y\|^2 \le \langle Tx - y,\ Ty - y\rangle. \tag{4.32}$$
Combining (4.30) and (4.32), we obtain
$$\|Ty - y\|^2 \le \langle (1 + \beta)(Ty - y) + u,\ Ty - y\rangle = (1 + \beta)\|Ty - y\|^2. \tag{4.33}$$
Now (4.33) implies that $\beta \ge 0$ or $\|Ty - y\| = 0$. If $\|Ty - y\| = 0$, then $Ty = y \in S$, so $P_S \circ T y = y$ and $y \in \operatorname{Fix}(P_S \circ T)$; in this case $E(x,y) = 0$ by Lemma 4.5, and the claimed inequality holds trivially. Thus we may restrict to the case $\|Ty - y\| > 0$. This, together with (4.33), yields $\beta \ge 0$. Now applying the Pythagorean theorem to (4.30) yields
$$\|u\|^2 = \|Tx - y\|^2 - \|(1 + \beta)(Ty - y)\|^2. \tag{4.34}$$
Moreover, we may rearrange (4.30) to obtain
$$Tx - Ty = \beta(Ty - y) + u. \tag{4.35}$$
Applying the Pythagorean theorem to (4.35), we obtain
$$\|Tx - Ty\|^2 = \|\beta(Ty - y)\|^2 + \|u\|^2. \tag{4.36}$$
Using (4.34) to substitute for $\|u\|^2$ in (4.36), we obtain
$$\|Tx - Ty\|^2 = \|\beta(Ty - y)\|^2 + \|Tx - y\|^2 - \|(1 + \beta)(Ty - y)\|^2 = \|Tx - y\|^2 - (1 + 2\beta)\|Ty - y\|^2. \tag{4.37}$$
Now from Fact 4.7, (4.29) holds for $u = Tx$ and $v = y$, and so
$$\|Tx - y\|^2 = \|Tx - P_S \circ T(x)\|^2 + \|y - P_S \circ T(x)\|^2,$$
which, since $P_S \circ T(x) = x$, is
$$\|Tx - y\|^2 = \|Tx - x\|^2 + \|y - x\|^2. \tag{4.38}$$
Now we may use (4.38) to substitute for $\|Tx - y\|^2$ in (4.37) and obtain
$$\|Tx - Ty\|^2 = \|Tx - x\|^2 + \|y - x\|^2 - (1 + 2\beta)\|Ty - y\|^2. \tag{4.39}$$
Now we have that
$$\|Ty - y\|^2 = \|Tx - x\|^2 + \|Ty - y\|^2 - \|Tx - x\|^2 = \|Tx - x\|^2 + E(x,y), \tag{4.40}$$
where $E$ is as defined in Definition 6. Multiplying both sides of (4.40) by $-(1 + 2\beta)$, we obtain
$$-(1 + 2\beta)\|Ty - y\|^2 = -(1 + 2\beta)\|Tx - x\|^2 - (1 + 2\beta)E(x,y). \tag{4.41}$$
Using (4.41) to substitute for $-(1 + 2\beta)\|Ty - y\|^2$ in (4.39), we obtain
$$\begin{aligned}\|Tx - Ty\|^2 &= \|Tx - x\|^2 + \|y - x\|^2 - (1 + 2\beta)\|Tx - x\|^2 - (1 + 2\beta)E(x,y)\\ &= \|y - x\|^2 - 2\beta\|Tx - x\|^2 - (1 + 2\beta)E(x,y).\end{aligned} \tag{4.42}$$
Since $S$ is closed and convex, $P_S$ is nonexpansive ([2, Proposition 4.16]). Thus
$$\|P_S \circ T y - P_S \circ T x\| \le \|Ty - Tx\|. \tag{4.43}$$
Since $x \in \operatorname{Fix} P_S \circ T$, we have that $P_S \circ T x = x$. Making this substitution in (4.43), we obtain
$$\|P_S \circ T y - x\| \le \|Ty - Tx\|. \tag{4.44}$$
Together, (4.42) and (4.44) yield
$$\begin{aligned}\|P_S \circ T y - x\|^2 &\le \|y - x\|^2 - 2\beta\|Tx - x\|^2 - (1 + 2\beta)E(x,y) &&\text{(4.45)}\\ &\le \|y - x\|^2 - E(x,y), &&\text{(4.46)}\end{aligned}$$
where the second inequality uses the facts that $\beta \ge 0$ and $E(x,y) \ge 0$. This shows the desired inequality; since $E(x,y) > 0$ whenever $y \notin \operatorname{Fix} P_S \circ T$ (Lemma 4.5), it also shows that $P_S \circ T$ is SQNE.

Theorem 4.8 admits the following corollary.

Corollary 4.9 (Averaged variant) . Let

Fix P S ◦ T (cid:54) = ∅ . The operator given by P S ◦ T + 12 Id is FQNE. roof. By Theorem (4.8), we have that P S ◦ T is QNE. By Lemma 2.1, anoperator R is FQNE if and only if 2 U − Id is QNE. Letting2 U − Id = P S ◦ T we have that U is FQNE and that U = P S ◦ T + Id.Now having the key results of Theorem 4.8, we are ready to show convergencefor both GP A and its under-relaxed variants. Theorem 4.10 (Convergence of under-relaxed variants of GP A ) . Let

$\operatorname{Fix} P_S\circ T \neq \varnothing$. Let $\gamma \in\, ]0,1[$ and $y_0 \in S$. The sequence given by $y_{n+1} := U_\gamma y_n$, where $U_\gamma := (1-\gamma)P_S\circ T + \gamma\operatorname{Id}$, is strongly convergent to some $y \in \operatorname{Fix} P_S\circ T$.

Proof. Notice that
$$\begin{aligned} U_\gamma &= (1-\gamma)(P_S\circ T - \operatorname{Id}) + \operatorname{Id}\\ &= \left(\frac{2-2\gamma}{2}\right)(P_S\circ T - \operatorname{Id}) + \operatorname{Id}\\ &= (2-2\gamma)\left(\frac{1}{2}P_S\circ T - \frac{1}{2}\operatorname{Id}\right) + \operatorname{Id}\\ &= (2-2\gamma)\left(\left(\frac{1}{2}P_S\circ T + \frac{1}{2}\operatorname{Id}\right) - \operatorname{Id}\right) + \operatorname{Id}\\ &= (2-2\gamma)(U - \operatorname{Id}) + \operatorname{Id}, \end{aligned}$$
where $2 - 2\gamma \in\, ]0,2[$ and $U := \frac{1}{2}P_S\circ T + \frac{1}{2}\operatorname{Id}$ is the FQNE operator from Corollary 4.9. Thus, applying Lemma 2.3 to the operator $U$, we have that $U_\gamma$ is QNE and that
$$\|P_S\circ T y_n - y_n\| = 2\left\|\frac{1}{2}P_S\circ T y_n - \frac{1}{2}y_n\right\| = 2\left\|y_n - \left(\frac{1}{2}P_S\circ T + \frac{1}{2}\operatorname{Id}\right)y_n\right\| = 2\|y_n - U(y_n)\| \to 0 \text{ as } n \to \infty. \tag{4.47}$$
Since $(y_n)_{n\in\mathbb{N}}$ is Fejér monotone, it is bounded. Since $S$ is finite dimensional and $(y_n)_{n\in\mathbb{N}} \subset S$ is bounded, we may take a convergent subsequence $(y_j)_{j\in J\subset\mathbb{N}}$ such that
$$\|y_j - y\| \to 0 \text{ as } j \to \infty \tag{4.48}$$
for some $y \in S$. Since $T$ and $P_S$ are continuous, $P_S\circ T$ is continuous. Thus we have
$$\lim_{j\to\infty} P_S\circ T(y_j) = P_S\circ T\left(\lim_{j\to\infty} y_j\right) = P_S\circ T(y). \tag{4.49}$$
The triangle inequality yields
$$\|y - P_S\circ T(y)\| \le \|y - y_j\| + \|y_j - P_S\circ T(y_j)\| + \|P_S\circ T(y_j) - P_S\circ T(y)\|. \tag{4.50}$$
Taking the limit as $j \to \infty$, the terms on the right-hand side of (4.50) go to zero by (4.48), (4.47), and (4.49), respectively. Thus $\|y - P_S\circ T(y)\| = 0$, and so $y = P_S\circ T(y)$; that is, $y \in \operatorname{Fix} P_S\circ T$. Since $(y_n)_{n\in\mathbb{N}}$ is Fejér monotone with respect to $\operatorname{Fix} P_S\circ T$ and possesses a sequential cluster point $y \in \operatorname{Fix} P_S\circ T$, we conclude by Theorem 2.2 that $y_n \to y$ as $n \to \infty$.

Having proven convergence for the under-relaxed variants of GP4A, we now show the convergence of its non-relaxed version.

Theorem 4.11 (Convergence of GP4A). Let

$\operatorname{Fix} P_S\circ T \neq \varnothing$. Let $y_0 \in S$. The sequence $(y_n)_{n\in\mathbb{N}}$ given by $y_{n+1} := P_S\circ T y_n$ converges to a point $y \in \operatorname{Fix} P_S\circ T$.

Proof. Fix $x \in \operatorname{Fix} P_S\circ T$. Applying Theorem 4.8, we have that
$$\|y_{n+1} - x\|^2 \le \|y_n - x\|^2 - E(x, y_n),$$
and so, summing these inequalities,
$$0 \le \|y_{n+1} - x\|^2 \le \|y_0 - x\|^2 - \sum_{i=0}^{n} E(x, y_i).$$
Thus we obtain
$$\sum_{i=0}^{n} E(x, y_i) \le \|y_0 - x\|^2,$$
which, because each $E(x, y_i) \ge 0$, shows that
$$E(x, y_n) \to 0 \text{ as } n \to \infty. \tag{4.51}$$
From the definition of $E$, (4.51) implies that
$$\|Ty_n - y_n\| \to \|Tx - x\| \text{ as } n \to \infty. \tag{4.52}$$
As $(y_n)_{n\in\mathbb{N}}$ is Fejér monotone, it is bounded. Thus we may take a convergent subsequence $(y_j)_{j\in J\subset\mathbb{N}}$; let $y_j \to y$ as $j \to \infty$. Then
$$\|Ty - y\| = \lim_{j\to\infty}\|Ty_j - y_j\| = \|Tx - x\|,$$
where the first equality follows from the continuity of $T - \operatorname{Id}$ and the second equality is from (4.52). Now since $\|Ty - y\| = \|Tx - x\|$ with $x \in \operatorname{Fix} P_S\circ T$, we have by Proposition 4.4 that $y \in \operatorname{Fix} P_S\circ T$. Since $(y_n)_{n\in\mathbb{N}}$ is Fejér monotone with respect to $\operatorname{Fix} P_S\circ T$ and possesses a sequential cluster point $y \in \operatorname{Fix} P_S\circ T$, we conclude by Theorem 2.2 that $y_n \to y$ as $n \to \infty$.

When $\operatorname{Fix} P_S\circ T \neq \varnothing$, Theorem 4.11 guarantees the convergence of GP4A, and Theorem 4.10 does the same for its under-relaxed variants. The next corollary simply formalizes this by including both cases.

Corollary 4.12.

Let $\operatorname{Fix} P_S\circ T \neq \varnothing$. Let $\gamma \in [0,1[$ and $y_0 \in S$. Then the sequence given by $y_{n+1} := U_\gamma y_n$, where
$$U_\gamma := (1-\gamma)P_S\circ T + \gamma\operatorname{Id}, \tag{4.53}$$
is convergent to some $y \in \operatorname{Fix} P_S\circ T$, and the operator $U_\gamma$ is paracontracting.

Proof. The convergence when $\gamma \in\, ]0,1[$ is shown by Theorem 4.10, and the convergence when $\gamma = 0$ is dealt with by Theorem 4.11. These theorems also show that $U_\gamma$ is QNE in these two cases, respectively. Combining this with the fact that $U_\gamma$ is a weighted average of continuous operators, and hence continuous, the paracontracting property is clear.

Having established convergence for the governing sequence, we also have the following result, which describes the behaviour of the sequence of shadows of the proximity operator, $T\circ(U_\gamma)^{n}(y_0,1)$.

Corollary 4.13.

Let $\operatorname{Fix} P_S\circ T \neq \varnothing$. Set $(y_0,1) \in S$ and $\gamma \in [0,1[$. The sequence $(\lambda_n)_{n\in\mathbb{N}}$ given by
$$(y_{n+1}, \lambda_{n+1}) := T\circ(U_\gamma)^{n}(y_0, 1),$$
where $U_\gamma$ is as specified in (4.53), satisfies $\lambda_n \to 1 - r' \in [0,1]$ as $n \to \infty$, where $r'$ is as specified in (4.16).

Proof. From Corollary 4.12 we have that $(y_n, 1) \to (y, 1)$ for some $(y,1) \in \operatorname{Fix} P_S\circ T$. Moreover,
$$\|T(y,1) - (y,1)\| = r', \tag{4.54}$$
where $r'$ is as characterized in (4.16). Using Lemma 4.2, we have that $T(y,1) = (y,\lambda)$ for some $\lambda \in [0,1]$, and so
$$\|T(y,1) - (y,1)\| = \|(y,\lambda) - (y,1)\| = 1 - \lambda. \tag{4.55}$$
Combining (4.54) and (4.55), we have that $r' = 1 - \lambda$, and so $\lambda = 1 - r' \in [0,1]$. Since $(y_n,1) \to (y,1)$ and $T$ is continuous, we have that
$$(y_{n+1}, \lambda_{n+1}) = T(y_n, 1) \to T(y,1) = (y,\lambda).$$
Thus $\lambda_n \to \lambda = 1 - r'$. This concludes the result.

$P_S\circ T$ is not, generically, FQNE

The property of firm quasinonexpansivity is especially important in the analysis of algorithms. In this section, we discuss under which conditions the operator $P_S\circ T$ may or may not exhibit this property. In particular, we provide an example illustrating that it is not, generically, FQNE, and we show that failure to act like an FQNE operator implies some very specific conditions.

Proposition 4.14.

Let $(y,1) \in S$, and let $(x,1) \in \operatorname{Fix} P_S\circ T$. Let $(x,\lambda) = T(x,1)$ and $(w,\mu) = T(y,1)$. If
$$0 < \langle (x,1) - P_S\circ T(y,1),\, (y,1) - P_S\circ T(y,1)\rangle, \tag{4.56}$$
then the following hold:
(i) $\lambda < \mu < 1$;
(ii) $T(x,1) \neq P_{\operatorname{dom}\kappa}(x,1)$.

Proof.

Suppose that (4.56) holds. Then we have that
$$0 < \langle (x,1) - P_S\circ T(y,1),\, (y,1) - P_S\circ T(y,1)\rangle \tag{4.57}$$
$$= \langle (x,1) - (w,1),\, (y,1) - (w,1)\rangle = \langle (x-w, 0),\, (y-w, 0)\rangle = \langle x-w,\, y-w\rangle. \tag{4.58}$$
We will first show (i). From Lemma 4.6 we have that
$$\langle T(x,1) - T(y,1),\, (y,1) - T(y,1)\rangle \le 0. \tag{4.59}$$
Recall that $(w,\mu) = T(y,1)$ and $(x,\lambda) = T(x,1)$ with $\lambda \le 1$. Then (4.59) becomes
$$\begin{aligned} &\langle (x,\lambda) - (w,\mu),\, (y,1) - (w,\mu)\rangle \le 0\\ \iff\ &\langle (x-w, \lambda-\mu),\, (y-w, 1-\mu)\rangle \le 0\\ \iff\ &\langle x-w,\, y-w\rangle + (\lambda-\mu)(1-\mu) \le 0. \end{aligned}\tag{4.60}$$
We have $\lambda \in [0,1]$, and so there are only four possibilities: $\lambda = 1$; $\mu \le \lambda < 1$; $\lambda < 1 \le \mu$; and $\lambda < \mu < 1$. We will show that any case other than $\lambda < \mu < 1$ leads to a contradiction.

Case $\lambda = 1$. Suppose $\lambda = 1$. Then $(\lambda-\mu)(1-\mu) = (1-\mu)^2 \ge 0$, and so (4.60) yields
$$\langle x-w,\, y-w\rangle \le 0, \tag{4.61}$$
which contradicts (4.58). This concludes the case $\lambda = 1$.

Case $\mu \le \lambda < 1$. Suppose $\mu \le \lambda < 1$. Then $\lambda - \mu \ge 0$ and $1 - \mu > 0$, and so clearly $(\lambda-\mu)(1-\mu) \ge 0$. Combining this with (4.60), we again obtain (4.61), which is a contradiction. This concludes the case $\mu \le \lambda < 1$.

Case $\lambda < 1 \le \mu$. Suppose $\lambda < 1 \le \mu$. Then $\lambda - \mu < 0$ and $1 - \mu \le 0$, so $(\lambda-\mu)(1-\mu) \ge 0$. Combining this with (4.60), we again obtain (4.61), a contradiction. This concludes the case $\lambda < 1 \le \mu$.

We are left with only one possibility, $\lambda < \mu < 1$, and so (i) holds. We next show (ii). Having established that $\lambda < \mu$, we have by Lemma 4.3(ii)c that $T(x,1) \neq P_{\operatorname{dom}\kappa}(x,1)$.

Example 1 ($P_S\circ T$ is generically not a cutter). Let $\kappa : \mathbb{R}^2 \to \mathbb{R} : x \mapsto \|x\|_\infty + \iota_C(x)$, where C := { ( v, u ) | v ≤ − u/ } . Then T γ (1 , − /

20) = P κ ≤ / (1 , − /

20) = (1 / , − / , and so (1 , − / ∈ Fix P S ◦ T γ . Additionally, T γ (2 ,

1) = P dom κ (2 ,

1) = ( − / , / , and so (2 , / ∈ Fix P S ◦ T γ . We have that ⟨ P S ◦ T γ (2 , − ( − / , , ( − / , − (2 , ⟩ = ( − / / − / − > 0. This example is illustrated in Figure 3.

Figure 3: An illustration of Example 1.
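Returning to the convergence results of this section, Corollaries 4.12 and 4.13 can be watched on a toy model. The Python sketch below rests on assumptions that are not from the paper: $X = \mathbb{R}$, $\kappa = \gamma_D$ for the disk $D$ with centre $(0,\rho)$ and radius $\rho$, and $T(z)$ is computed as the projection of $z$ onto $r^*D$, where $r^*$ solves $r = \operatorname{dist}(z, rD)$. For this particular $D$ one can check by hand that $P_S\circ T$ acts on the first coordinate as $y \mapsto \rho y/(1+\rho)$, that $(0,1)$ is its only fixed point, and that $r' = 1/(1+2\rho)$; with $\rho = 1$ the shadows $\lambda_n$ should therefore approach $1 - r' = 2/3$ for every $\gamma \in [0,1[$. The function names `polar_prox` and `relaxed_gp4a` are hypothetical.

```python
import math

# Assumption (illustration only): X = R and kappa = gamma_D, where D is the closed
# disk in R^2 with centre (0, RHO) and radius RHO.
RHO = 1.0


def polar_prox(z, tol=1e-13):
    """Return (T(z), r*), with T(z) = argmin_w max{kappa(w), ||w - z||}.

    r* solves dist(z, r D) = r (found by bisection) and T(z) is the
    projection of z onto the disk r* D.
    """
    def dist(r):
        return max(0.0, math.hypot(z[0], z[1] - r * RHO) - r * RHO)

    lo, hi = 0.0, 1.0
    while dist(hi) > hi:  # enlarge the bracket until it contains r*
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if dist(mid) > mid else (lo, mid)
    r = hi
    dx, dy = z[0], z[1] - r * RHO  # offset of z from the centre (0, r*RHO)
    norm = math.hypot(dx, dy)
    if norm <= r * RHO:  # z already lies in r* D
        return (z[0], z[1]), r
    return (r * RHO * dx / norm, r * RHO + r * RHO * dy / norm), r


def relaxed_gp4a(y0, gamma, iters=1000):
    """Iterate (y_{n+1}, 1) = U_gamma (y_n, 1), U_gamma = (1 - gamma) P_S o T + gamma Id.

    Returns the final iterate y_n, the shadow lambda of T(y_n, 1), and
    r* = ||T(y_n, 1) - (y_n, 1)||.
    """
    y = y0
    for _ in range(iters):
        (w, _), _ = polar_prox((y, 1.0))   # P_S(T(y, 1)) = (w, 1)
        y = (1.0 - gamma) * w + gamma * y  # both points lie in S, so average first coordinates
    (_, lam), r = polar_prox((y, 1.0))
    return y, lam, r


for gamma in [0.0, 0.5, 0.9]:
    y, lam, r = relaxed_gp4a(5.0, gamma)
    print(f"gamma={gamma}: y_n={y:.2e}  lambda_n={lam:.10f}  1 - r'={1 - r:.10f}")
```

In this model the relaxed map contracts the first coordinate by the factor $(1+\gamma)/2$ per step, so larger $\gamma$ converges more slowly but still to the same fixed point, in line with Corollary 4.12.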

In the previous section, we consistently assumed that $\operatorname{Fix} P_S\circ T \neq \varnothing$. It bears noting that this condition may not hold for a general gauge $\kappa$. Of course, it does hold for P4A under the conditions in Theorem 4.1. We will provide sufficient conditions for the more general GP4A, which allow us to describe the solutions to P4A as lying on an exposed face of a dilated fundamental set. For this purpose, we make use of the Minkowski function representation of the gauge:
$$\kappa(x) = \gamma_D(x) := \inf\{\mu \ge 0 \mid x \in \mu D\}. \tag{5.1}$$
Such a representation always holds by choosing $D = \{x \mid \kappa(x) \le 1\}$ [6]. The following lemma will be instrumental to our main result in Theorem 5.2.

Lemma 5.1.

Let $D = \operatorname{lev}_{\le 1}\kappa$. The following hold.
(i) $\operatorname{cone} D = \operatorname{dom}\kappa$.
(ii) If there exists $\lambda' > 0$ such that
$$\lambda' = \max_{\lambda\in\mathbb{R}}\{\lambda \mid \exists\, y' \in X \text{ so that } (y', \lambda) \in D\},$$
then for any $(y,\beta) \in \operatorname{cone} D \cap (X \times \mathbb{R}_{>0})$ there exists a minimal $0 < r < \infty$ so that $(y,\beta) = rd$ for some $d \in D$.

Proof. (i): Let $x \in \operatorname{cone} D$. Then there exist $0 \le \lambda < \infty$ and $d \in D$ such that $x = \lambda d$. By positive homogeneity, $\kappa(\lambda d) = \lambda\kappa(d) \le \lambda < \infty$, and so $x \in \operatorname{dom}\kappa$; hence $\operatorname{cone} D \subset \operatorname{dom}\kappa$. Now let $x \in \operatorname{dom}\kappa$. Then $\kappa(x) = r$ for some $r < \infty$. If $r = 0$, then $x \in D \subset \operatorname{cone} D$. Otherwise, by homogeneity $\kappa(x/r) = 1$, so $x/r \in D$ and $x = r(x/r) \in \operatorname{cone} D$. Hence $\operatorname{dom}\kappa \subset \operatorname{cone} D$.

(ii): Let $(y,\beta) \in \operatorname{cone} D \cap (X \times \mathbb{R}_{>0})$. First of all, notice that by (i), $(y,\beta) \in \operatorname{dom}\kappa$, and so $r := \inf\{\lambda \mid (y,\beta) \in \lambda D\} < \infty$. Next we show $r > 0$. Since $(y,\beta) \in \operatorname{cone} D$, there exist some $\lambda \ge 0$ and $(d_y, \mu) \in D$ such that $\lambda(d_y, \mu) = (y,\beta)$. Any such $\lambda$ satisfies $\lambda\mu = \beta$ with $\beta > 0$, and so $\lambda$, $\mu$, and $\beta$ are all greater than zero. Moreover, since $\mu \le \lambda'$ by the definition of $\lambda'$, any such $\lambda$ satisfies $\lambda = \beta/\mu \ge \beta/\lambda' > 0$. Thus
$$r = \inf\{\lambda \mid (y,\beta) \in \lambda D\} \ge \beta/\lambda' > 0.$$
Now let $(\lambda_n)_n$ satisfy $\lambda_n \downarrow r$ as $n \to \infty$. Since $\lambda_n > r$ for all $n$ (and $sD \subset tD$ whenever $0 \le s \le t$, because $D$ is convex and contains $0$), $(y,\beta) \in \lambda_n D$ for all $n$, and so there exists $(d_n)_n$ such that $(y,\beta) = \lambda_n d_n$ for all $n$. Notice that
$$\|d_n\| = \frac{\|(y,\beta)\|}{\lambda_n} \le \frac{\|(y,\beta)\|}{r} < \infty,$$
and so the sequence $(d_n)_n$ is bounded. Thus we can pass to a convergent subsequence if need be and have $d_n \to d$ as $n \to \infty$. Since $D$ is closed and $d_n \in D$ for all $n$, we have $d \in D$. Taking the limit of both sides of $(y,\beta) = \lambda_n d_n$ as $n \to \infty$, we have $(y,\beta) = rd$ with $d \in D$ and $r$ being the attained infimum of all values such that $(y,\beta) \in rD$. This shows the desired result.

The following theorem provides conditions that guarantee nonemptiness of the fixed point set. The strategy is to relate an exposed face of the fundamental set $D$ to the fixed points of the algorithm.

Theorem 5.2 (Existence of fixed points of GP4A). Let $D$ be the (closed) fundamental set of $\kappa$ as in (5.1). The following hold.
(i) If there exists $\lambda' \ge 0$ such that
$$\lambda' = \max_{\lambda\in\mathbb{R}}\{\lambda \mid \exists\, y \in X \text{ so that } (y,\lambda) \in D\},$$
then $F = D \cap \{(y,\lambda') \mid y \in X\}$ is an exposed face of $D$ and
(a) $\left(\frac{1}{1+\lambda'}\right)F = T(\operatorname{Fix} P_S\circ T)$; and
(b) any $(x,1) \in \operatorname{Fix} P_S\circ T$ satisfies $T(x,1) = \left(x, \frac{\lambda'}{1+\lambda'}\right)$.
For example, this is always the case when $D$ is bounded.
(ii) If such a $\lambda'$ does not exist and there exists a sequence $(y_n,\lambda_n)_{n\in\mathbb{N}} \subset D$ such that $\lambda_n \to \infty$ and $\lambda_n/\|y_n\| \to m > 0$, then $T(\operatorname{Fix} P_S\circ T) = \operatorname{Fix} P_S\circ T = \operatorname{zer}\kappa \cap S$.

Proof. (i): Suppose $\lambda'$ exists as described. To understand why $F$ is an exposed face of $D$, see Remark 1 below.

Case 1: $\lambda' = 0$. Then any point in $F$ is of the form $(x,0)$ for some $x \in X$. Moreover, $(0,1)$ is a fixed point and satisfies $T(0,1) = (0,0)$, so that $\|T(0,1) - (0,1)\| = 1$. It is then a straightforward consequence of Proposition 4.4 that $T(x,1) = (x,0)$ if and only if $(x,1)$ is a fixed point of $P_S\circ T$. This is all we needed to show in this case.

Case 2: $\lambda' > 0$. Since $\lambda' > 0$, the set $\bigcup_{r\ge 0} rD \cap (X \times \{\theta\})$ is nonempty for each $\theta > 0$. For any $(x,\theta) \in \operatorname{cone} D$ with $\theta > 0$, Lemma 5.1(ii) yields a minimal $r_{(x,\theta)} > 0$ such that $(x,\theta) \in r_{(x,\theta)}D$. By the Minkowski definition of the gauge, this means $\kappa(x,\theta) = r_{(x,\theta)}$. There must also exist $(d_x, \mu_\theta) \in D$ such that $r_{(x,\theta)}(d_x, \mu_\theta) = (x,\theta)$. Thus $r_{(x,\theta)}\mu_\theta = \theta$ and $r_{(x,\theta)} = \theta/\mu_\theta$. Furthermore, our choice of $\lambda'$ guarantees that $\mu_\theta \le \lambda'$. Using positive homogeneity, we have
$$r_{(x,\theta)} = \kappa(x,\theta) = \kappa\big(r_{(x,\theta)}(d_x,\mu_\theta)\big) = r_{(x,\theta)}\,\kappa(d_x,\mu_\theta),$$
and so $\kappa(d_x,\mu_\theta) = 1$. Again using positive homogeneity, we obtain
$$\kappa(x,\theta) = \kappa\big((\theta/\mu_\theta)(d_x,\mu_\theta)\big) = (\theta/\mu_\theta)\,\kappa(d_x,\mu_\theta) = \theta/\mu_\theta, \tag{5.2}$$
where the final equality is because we just showed $\kappa(d_x,\mu_\theta) = 1$. Additionally, for any point $(y',\lambda') \in F$, homogeneity assures that
$$\theta/\lambda' \ge (\theta/\lambda')\,\kappa(y',\lambda') = \kappa\big((\theta/\lambda')(y',\lambda')\big) = \kappa(p_\theta y', \theta), \quad \text{where } p_\theta := \theta/\lambda'. \tag{5.3}$$
Now let
$$\theta' := \operatorname*{argmin}_{\theta\in\mathbb{R}} \max\{\theta/\lambda',\, |\theta - 1|\} = \frac{\lambda'}{1+\lambda'}. \tag{5.4}$$
Now we show that $(p_{\theta'}y', 1)$ is in $\operatorname{Fix} P_S\circ T$. Recall the polar envelope of $\kappa$ from Definition 3; at $(p_{\theta'}y',1)$ it takes the value on the left-hand side of (5.5), and $T(p_{\theta'}y',1)$ is the point attaining it. We have
$$\begin{aligned}
\inf_{(x,\theta)\in X\times\mathbb{R}} \max\{\kappa(x,\theta),\, \|(x,\theta) - (p_{\theta'}y',1)\|\}
&= \inf_{(x,\theta)\in\operatorname{cone}D} \max\{\kappa(x,\theta),\, \|(x,\theta) - (p_{\theta'}y',1)\|\} \qquad (5.5a)\\
&= \inf_{(x,\theta)\in\operatorname{cone}D} \max\{\theta/\mu_\theta,\, \|(x,\theta) - (p_{\theta'}y',1)\|\} \qquad (5.5b)\\
&\ge \min_{\theta\in\mathbb{R}} \max\{\theta/\lambda',\, |\theta - 1|\} \qquad (5.5c)\\
&= \max\{\theta'/\lambda',\, |\theta' - 1|\} \qquad (5.5d)\\
&\ge \max\{\kappa(p_{\theta'}y',\theta'),\, \|(p_{\theta'}y',\theta') - (p_{\theta'}y',1)\|\}. \qquad (5.5e)
\end{aligned}$$
Here (5.5a) is true by Lemma 5.1(i), (5.5b) holds by (5.2), (5.5c) holds because $\mu_\theta \le \lambda'$ and $\|(x,\theta) - (p_{\theta'}y',1)\| \ge |\theta - 1|$, (5.5d) holds by (5.4), and (5.5e) is obtained by applying (5.3) with $\theta = \theta'$. Altogether, (5.5) shows that $(p_{\theta'}y', \theta') = T(p_{\theta'}y', 1)$, and so $(p_{\theta'}y', 1) \in \operatorname{Fix} P_S\circ T$.

Notice that $\theta' \in\, ]0,1[$ and is nearer to $1$ for larger $\lambda'$ and nearer to $0$ for smaller $\lambda'$, exactly as we would expect. Notice also that we have shown that any point
$$(\theta'/\lambda')(y',\lambda') = \frac{1}{1+\lambda'}(y',\lambda') \in \frac{1}{1+\lambda'}F$$
admits a corresponding point $(p_{\theta'}y', 1) \in \operatorname{Fix} P_S\circ T$ whose proximal image is
$$T(p_{\theta'}y', 1) = (p_{\theta'}y', \theta') = (\theta'/\lambda')(y',\lambda').$$
This shows that
$$\frac{1}{1+\lambda'}F \subset T(\operatorname{Fix} P_S\circ T).$$
Now let $(x,1) \in \operatorname{Fix} P_S\circ T$. Using Proposition 4.4, we have that any $(x,1) \in \operatorname{Fix} P_S\circ T$ must satisfy $T(x,1) = (x,\lambda)$, where $\lambda \in [0,1]$ and
$$\|(x,1) - (x,\lambda)\| = \|(p_{\theta'}y',\theta') - (p_{\theta'}y',1)\| = r' = |1 - \theta'|,$$
which forces $\lambda = \theta'$. This shows that (i)b is true.

Now by Lemma 5.1(i), $(x,\theta') \in \operatorname{cone} D$, since $(x,\theta') \in \operatorname{dom}\kappa$. Using the fact that $(x,\theta') \in \operatorname{cone} D$ and following the same reasoning as we used above to obtain (5.2), there exist a value $r_{(x,\theta')}$ and a point $(d_x, \mu_{\theta'})$ with $\mu_{\theta'} \le \lambda'$ such that $r_{(x,\theta')}(d_x,\mu_{\theta'}) = (x,\theta')$ and $\kappa(x,\theta') = \theta'/\mu_{\theta'}$. Since $(x,1), (p_{\theta'}y',1) \in \operatorname{Fix} P_S\circ T$, we have from Proposition 4.4 that
$$|\theta' - 1| = \|(x,\theta') - (x,1)\| = \|(p_{\theta'}y',\theta') - (p_{\theta'}y',1)\| = r'.$$
Moreover, $\theta'/\mu_{\theta'} = \kappa(x,\theta') \le \|(x,\theta') - (x,1)\| = r'$, and so
$$\begin{aligned} r' &= \max\{\theta'/\mu_{\theta'},\, |1-\theta'|\}\\ &\ge \min_{(\theta,\mu)\in\mathbb{R}_+\times]0,\lambda']} \max\{\theta/\mu,\, |1-\theta|\}\\ &= \max\{\theta'/\lambda',\, |1-\theta'|\}\\ &= r'. \end{aligned}\tag{5.6}$$
The equality throughout (5.6) forces $\mu_{\theta'} = \lambda'$. Finally, $(d_x,\mu_{\theta'}) = (d_x,\lambda') \in F$, and so
$$T(x,1) = (x,\theta') = r_{(x,\theta')}(d_x,\mu_{\theta'}) = (\theta'/\lambda')(d_x,\lambda') \in (\theta'/\lambda')F = \frac{1}{1+\lambda'}F.$$
This shows that
$$\frac{1}{1+\lambda'}F \supset T(\operatorname{Fix} P_S\circ T).$$
This concludes the proof of (i)a.

(ii): Let the sequence $(y_n,\lambda_n)_{n\in\mathbb{N}}$ exist as described. By compactness of the unit ball in Euclidean space and by appealing to a subsequence if necessary, the sequence $y_n/\|y_n\|$ converges to some $y$ in the unit ball. Now since $(y_n,\lambda_n) \in D$ for all $n$, $\kappa(y_n,\lambda_n) \le 1$ for all $n$, and by the Minkowski function representation of $\kappa$,
$$\kappa\left(\frac{y_n}{m\|y_n\|}, \frac{\lambda_n}{m\|y_n\|}\right) = \frac{1}{m\|y_n\|}\,\kappa(y_n,\lambda_n) \le \frac{1}{m\|y_n\|}.$$
Since $\lambda_n \to \infty$ and $\lambda_n/\|y_n\| \to m$, we have $\|y_n\| \to \infty$, while the argument on the left converges to $(y/m, 1)$. Taking the limits of both sides as $n \to \infty$ and using the lower semicontinuity and nonnegativity of $\kappa$, we obtain
$$\kappa\left(\frac{y}{m}, 1\right) = 0.$$
The point $\left(\frac{y}{m}, 1\right) \in \operatorname{zer}\kappa \cap S$ is clearly a fixed point of $T$, since
$$\max\left\{\kappa\left(\frac{y}{m},1\right),\, \left\|\left(\frac{y}{m},1\right) - \left(\frac{y}{m},1\right)\right\|\right\} = 0.$$
Thereafter, appealing to Lemma 3.1, Proposition 4.4, and the fact that $\kappa(T(x,1)) \le \|(x,1) - T(x,1)\| = r' = 0$ for any $(x,1) \in \operatorname{Fix} P_S\circ T$, the result (ii) is clear.

Remark 1 (What do we mean by an exposed face?). Let us explain what we mean in Theorem 5.2 when we say that $F$ is an exposed face of $D$. Recalling [8, Definition 6], $F$ is an exposed face of a closed, convex set $D$ if there exists a supporting hyperplane $H$ to $D$ with $F = D \cap H$. In our case, $H = X \times \{\lambda'\}$. Recalling [8, Definition 5], the hyperplane $H$ is a supporting hyperplane because $D$ lies entirely in the affine half-space $X \times \mathbb{R}_{\le\lambda'}$ defined by $H$.

The following example showcases a situation in which $\operatorname{Fix} P_S\circ T$ may be empty. In so doing, it illustrates the importance of the condition $m > 0$.

Example 2.

Let $D = \{(y,\lambda) \mid y \ge \lambda^2\} \subset \mathbb{R}^2$. Then for any $(y,1) \in S$, $P_S\circ T(y,1) = (u,1)$ with $u > y$, and so $\operatorname{Fix} P_S\circ T = \varnothing$.

P4A: facial characterization

When we take the results of Theorem 5.2 and specialize from $\kappa$ back to the perspective transform $f^\pi$, we recover the following characterization of the fixed points of P4A.

Theorem 5.3 (Facial characterization of fixed points of P4A). Let $f : X \to \mathbb{R}_+ \cup \{+\infty\}$ be a proper closed nonnegative convex function with $\inf f > 0$ and $\operatorname{argmin} f \neq \varnothing$. Let $\kappa = f^\pi$. The following hold:
(i) $\lambda' = \dfrac{1}{\min_{u\in X} f(u)}$, where $\lambda'$ is as in Theorem 5.2(i);
(ii) where $F$ is as in Theorem 5.2(i), $F = \dfrac{1}{\min_{u\in X} f(u)}\left(\operatorname{argmin} f \times \{1\}\right)$;
(iii) any $(x,1) \in \operatorname{Fix} P_S\circ T$ satisfies $\dfrac{1+\lambda'}{\lambda'}\,x \in \operatorname{argmin} f$;
(iv) any $y \in \operatorname{argmin} f$ satisfies $\left(\dfrac{\lambda'}{1+\lambda'}\,y, 1\right) \in \operatorname{Fix} P_S\circ T$.

Proof. (i): For simplicity, let $\eta := \min_{u\in X} f(u)$. We will first show that
$$\frac{1}{\eta} = \max_{\lambda\in\mathbb{R}}\{\lambda \mid \exists\, y \in X \text{ so that } (y,\lambda) \in D\}.$$
Let $y \in \operatorname{argmin} f$. Then
$$f^\pi(y/\eta, 1/\eta) = \frac{1}{\eta}f^\pi(y,1) = \frac{1}{\eta}f(y) = \frac{\eta}{\eta} = 1,$$
and so $(y/\eta, 1/\eta) \in D$. To see that $1/\eta$ is maximal, suppose for a contradiction that there exists $(y_0,\lambda_0) \in (X \times\, ]1/\eta,\infty[) \cap D$. Then
$$1 \ge f^\pi(y_0,\lambda_0) = \lambda_0 f^\pi(y_0/\lambda_0, 1) > \frac{1}{\eta}f^\pi(y_0/\lambda_0, 1) = \frac{1}{\eta}f(y_0/\lambda_0) \ge 1,$$
a contradiction.

(ii): Having shown (i), we have from the definition of $F$ that $F = D \cap \{(x,1/\eta) \mid x \in X\}$. Let $(y,1/\eta) \in F$. Then
$$1 \ge f^\pi(y,1/\eta) = (1/\eta)f^\pi(y\eta,1) = (1/\eta)f(y\eta) \ge 1,$$
and the equality throughout forces $f(y\eta) = \eta$. Thus $y\eta \in \operatorname{argmin} f$, and so $y \in (1/\eta)\operatorname{argmin} f$. Thus $F \subset (1/\eta)(\operatorname{argmin} f \times \{1\})$. The reverse inclusion is similar.

(iii) & (iv): By Theorem 5.2(i), $(x,1) \in \operatorname{Fix} P_S\circ T$ is equivalent to
$$\left(x, \frac{\lambda'}{1+\lambda'}\right) = \left(\frac{1}{1+\lambda'}\right)\left(y, \frac{1}{\eta}\right) \text{ for some } (y,1/\eta) \in F. \tag{5.7}$$
Having shown (ii), we have that the latter inclusion is equivalent to $y\eta \in \operatorname{argmin} f$. Combining with (5.7),
$$y\eta \in \operatorname{argmin} f \iff \big((1+\lambda')x\big)\eta \in \operatorname{argmin} f.$$
Having shown (i), this is equivalent to
$$\frac{1+\lambda'}{\lambda'}\,x \in \operatorname{argmin} f,$$
which shows both (iii) and (iv).

In the following remark, we compare the facial characterization of fixed points of P4A from Theorem 5.3 with the closely related results from [6].

Remark 2 (On synchronicity between Theorems 4.1 and 5.3). Theorem 5.3 subsumes and is closely connected with the original results of [6, Theorem 7.4], which we recalled as Theorem 4.1. To see why, notice that items (iii) and (iv) of Theorem 5.3 have the following characterizations.
(iii): Applying Theorem 4.1(i), we have that $(x,\lambda_*) = T(x,1)$ satisfies $\lambda_*^{-1}x \in \operatorname{argmin} f$. Theorem 5.2(i) guarantees that $\lambda_* = \frac{\lambda'}{1+\lambda'}$.
(iv): From Theorem 4.1(ii), $((1+\eta)^{-1}y, 1) \in \operatorname{Fix} P_S\circ T$. From Theorem 5.3(i), $(1+\eta)^{-1} = \frac{\lambda'}{1+\lambda'}$.
Theorem 5.3 essentially uses the more general results from Theorem 5.2 to show that the minimizers of $f$ form an exposed face of $(\min_{u\in X} f(u))D$: namely, the face $(\min_{u\in X} f(u))F$.

We now state our eponymous convergence result, which shows global convergence of P4A in the full generality of [6]. Under sufficient conditions to guarantee the existence of a fixed point, it also shows convergence of GP4A.

Theorem 6.1 (Convergence of P4A and GP4A). Let $D$ be the (closed) fundamental set of $\kappa$ as in (5.1). Suppose one of the following holds.
(i) $\kappa = f^\pi$ for $f : X \to \mathbb{R}_+ \cup \{+\infty\}$ a proper closed nonnegative convex function with $\inf f > 0$ and $\operatorname{argmin} f \neq \varnothing$;
(ii) there exists $\lambda' \ge 0$ such that $\lambda' = \max_{\lambda\in\mathbb{R}}\{\lambda \mid \exists\, y \in X \text{ so that } (y,\lambda) \in D\}$;
(iii) such a $\lambda'$ does not exist and there exists a sequence $(y_n,\lambda_n)_{n\in\mathbb{N}} \subset D$ such that $\lambda_n \to \infty$ and $\lambda_n/\|y_n\| \to m > 0$.
Let $\gamma \in [0,1[$ and $(y_0,1) \in S$. Then the following hold.
1. The sequence given by $(y_{n+1},1) := U_\gamma(y_n,1)$, where $U_\gamma := (1-\gamma)P_S\circ T + \gamma\operatorname{Id}$, is convergent to some $(y,1) \in \operatorname{Fix} P_S\circ T$;
2. the shadow sequences $(y_{n+1},\lambda_{n+1}) = T(y_n,1)$ satisfy $\lambda_n \to \lambda$ for some $\lambda \in [0,1]$;
3. when (i) or (ii) holds, $\lambda = \frac{\lambda'}{1+\lambda'}$;
4. when (i) holds, $\lambda' = 1/(\inf f)$ and $\left(\frac{1}{\lambda_n}\right)y_n \to \left(\frac{1+\lambda'}{\lambda'}\right)y \in \operatorname{argmin} f$.

Proof. By Theorem 5.3, (i) $\implies$ (ii). Either of the assumptions (ii) or (iii) guarantees the existence of a fixed point by Theorem 5.2. The convergence of $(y_n)_n$ is then assured by Corollary 4.12, and the convergence of $(\lambda_n)_n$ is guaranteed by Corollary 4.13. The characterization of $\lambda$ in cases (i) and (ii), and of $\lambda'$ in case (i), is due to Theorems 5.2 and 5.3.

Further research

We suggest three further avenues of inquiry. Firstly, results on faces of fundamental sets (e.g. Theorem 5.3) are of interest in the development of more general theory. Secondly, Friedlander, Macêdo, and Pong also introduced a second algorithm, EMA, which is not addressed here [6]. A natural question is whether EMA possesses similar properties to P4A. Finally, a motivating question is whether or not algorithms such as P4A may have computational advantages for certain problems.

Acknowledgements

The author was supported by Hong Kong Research Grants Council PolyU153085/16p. The author thanks Ting Kei Pong and Michael P. Friedlander for their useful suggestions on this manuscript.

References

[1] Alfred Auslender and Marc Teboulle. Asymptotic cones and functions in optimization and variational inequalities. Springer Monographs in Mathematics. Springer-Verlag, New York, 2003.
[2] Heinz H. Bauschke and Patrick L. Combettes. Convex analysis and monotone operator theory in Hilbert spaces. CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC. Springer, Cham, second edition, 2017.
[3] Andrzej Cegielski. Iterative methods for fixed point problems in Hilbert spaces, volume 2057 of Lecture Notes in Mathematics. Springer, Heidelberg, 2012.
[4] Reinier Díaz Millán, Scott B. Lindstrom, and Vera Roshchina. Comparing averaged relaxed cutters and projection methods: Theory and examples. In David H. Bailey, Naomi Borwein, Richard P. Brent, Regina S. Burachik, Judy-Anne Osborn, Brailey Sims, and Qiji Zhu, editors, From Analysis to Visualization: A Celebration of the Life and Legacy of Jonathan M. Borwein, Callaghan, Australia, September 2017, Springer Proceedings in Mathematics and Statistics, pages 75–98. Springer, 2020.
[5] Michael P. Friedlander, Ives Macêdo, and Ting Kei Pong. Gauge optimization and duality. SIAM Journal on Optimization, 24(4):1999–2022, 2014.
[6] Michael P. Friedlander, Ives Macêdo, and Ting Kei Pong. Polar convolution. SIAM Journal on Optimization, 29(2):1366–1391, 2019.
[7] Ralph Tyrrell Rockafellar. Convex Analysis. Princeton University Press, 1970.
[8] Vera Roshchina. Faces of convex sets. Available at