Generalized Bregman envelopes and proximity operators
Regina S. Burachik∗, Minh N. Dao†, and Scott B. Lindstrom‡

February 22, 2021
Abstract
Every maximally monotone operator can be associated with a family of convex functions, called the Fitzpatrick family or family of representative functions. Surprisingly, in 2017, Burachik and Martínez-Legaz showed that the well-known Bregman distance is a particular case of a general family of distances, each one induced by a specific maximally monotone operator and a specific choice of one of its representative functions. For the family of generalized Bregman distances, sufficient conditions for convexity, coercivity, and supercoercivity have recently been furnished. Motivated by these advances, we introduce in the present paper the generalized left and right envelopes and proximity operators, and we provide asymptotic results for parameters. Certain results extend readily from the more specific Bregman context, while others only extend for certain generalized cases. To illustrate, we construct examples from the Bregman generalizing case, together with the natural “extreme” cases that highlight the importance of which generalized Bregman distance is chosen.

Mathematics Subject Classification: Primary 90C25; Secondary 26A51, 26B25, 47H05, 47H09.

Keywords: convex function, Fitzpatrick function, generalized Bregman distance, maximally monotone operator, Moreau envelope, proximity operator, regularization, representative function.
1. Introduction
In this paper, unless stated otherwise, $(X, \|\cdot\|)$ is a reflexive Banach space with dual $(X^*, \|\cdot\|_*)$, and $\Gamma_0(X)$ is the set of all proper lower semicontinuous convex functions from $X$ to $]-\infty,+\infty]$.

In 1962, Moreau [27] introduced what has come to be known as the Moreau envelope,
\[
\operatorname{env}_{\gamma,\theta} \colon X \to [-\infty,+\infty] \colon y \mapsto \inf_{x \in X} \Big\{ \theta(x) + \frac{1}{\gamma} D_{\|\cdot\|^2/2}(x,y) \Big\}, \tag{1}
\]
and its corresponding proximity operator
\[
\operatorname{Prox}_{\gamma,\theta} \colon X \to X \colon y \mapsto \operatorname*{argmin}_{x \in X} \Big\{ \theta(x) + \frac{1}{\gamma} D_{\|\cdot\|^2/2}(x,y) \Big\}, \tag{2}
\]
where $D_{\|\cdot\|^2/2} \colon (x,y) \mapsto \|x-y\|^2/2$. Moreau worked in a Hilbert space and with parameter $\gamma = 1$, and then Attouch [1, 2] introduced the more general parameter $\gamma \in {]0,+\infty[}$; see also [5, Chapter 12].

∗ Mathematics, UniSA STEM, University of South Australia, Mawson Lakes, SA 5095, Australia. E-mail: [email protected].
† School of Engineering, Information Technology and Physical Sciences, Federation University Australia, Ballarat, VIC 3353, Australia. E-mail: [email protected].
‡ Department of Applied Mathematics, Hong Kong Polytechnic University, Hong Kong. E-mail: [email protected].
arXiv: [math.FA]

In 1967, Bregman [11] introduced the distance associated with a differentiable convex function $f$,
\[
D_f \colon X \times X \to [0,+\infty] \colon (x,y) \mapsto
\begin{cases}
f(x) - f(y) - \langle \nabla f(y), x - y \rangle & \text{if } y \in \operatorname{int}\operatorname{dom} f,\\
+\infty & \text{otherwise,}
\end{cases} \tag{3}
\]
which now bears his name and whose corresponding envelopes and proximity operators specialize to the Moreau envelope and proximity operator when $f$ is the energy, namely $f = \|\cdot\|^2/2$. When $f$ is not the energy, the distance may fail to be symmetric, and so one is led to consider the left and right envelopes defined by
\[
\overleftarrow{\operatorname{env}}_{\gamma,\theta} \colon X \to [-\infty,+\infty] \colon y \mapsto \inf_{x \in X} \Big\{ \theta(x) + \frac{1}{\gamma} D_f(x,y) \Big\} \tag{4a}
\]
and
\[
\overrightarrow{\operatorname{env}}_{\gamma,\theta} \colon X \to [-\infty,+\infty] \colon x \mapsto \inf_{y \in X} \Big\{ \theta(y) + \frac{1}{\gamma} D_f(x,y) \Big\}, \tag{4b}
\]
where the left and right proximity operators are defined as in (2), with $D_f$ in place of $D_{\|\cdot\|^2/2}$.

The asymptotic properties of the envelopes and proximity operators for differentiable $f$ with respect to the parameter $\gamma$ were explored in [7]. Bregman distances admit proximal point methods while also casting light on those constructed from the classical Moreau envelopes; see, for example, [4, 6, 14, 15, 17, 20, 21, 22, 24, 26], as well as [16, Chapter 6].

Recently, Burachik and Martínez-Legaz [18] have introduced two distances based on a representative function $h$ of a maximally monotone operator $T$:
\[
D_T^{\flat,h}(x,y) := \inf_{v \in Ty} \big( h(x,v) - \langle x, v \rangle \big) \tag{5a}
\]
and
\[
D_T^{\sharp,h}(x,y) := \sup_{v \in Ty} \big( h(x,v) - \langle x, v \rangle \big). \tag{5b}
\]
When $T = \nabla f$ and $h(x, \nabla f(y)) = (f \oplus f^*)(x, \nabla f(y)) := f(x) + f^*(\nabla f(y))$, these distances reduce (under mild domain conditions) to the Bregman distance $D_f(x,y)$.

More recently, [13] has provided a framework of sufficient conditions for coercivity and supercoercivity of the left and right variants of the distance, calling it the generalized Bregman distance (or GBD). It has also shown how such properties are useful for establishing coercivity of the sum of the distance together with a function in $\Gamma_0(X)$.

Hence, it is natural to use these new coercivity properties for carrying out a detailed analysis of the envelopes and proximity operators that are obtained when the GBD replaces the Bregman distance in (4a) and (4b). In particular, ours can be seen as a unifying analysis which includes Moreau envelopes as a particular case. The goal of the present work is to furnish this analysis.
We characterize the domains of the envelopes. We then show under what conditions the GBD envelopes possess the same advantageous properties that we have when specializing to the Bregman case, and under what conditions those properties may be lost. We will illustrate with the same GBDs used in [13]. We show, in particular, that the GBDs that arise from using the Fitzpatrick representative generically share the same desirable properties as their Bregman-generalizing counterparts. Such results will be important if some Fitzpatrick cases possess computational advantages over their Bregman counterparts or admit envelopes that are useful for examining existing algorithms through their dual characterizations.

Outline and contributions

In Section 2, we recall the generalized Bregman distances, along with some of their basic properties. We also recall the coercivity framework as well as the computed distances recently established in [13]. Moreover, we explain why these distances are important, since we use them to build the envelopes and proximity operators in our examples.

In Section 3, we introduce the left and right GBD envelopes and their associated proximity operators. We characterize the domains of the envelopes, and we provide sufficient conditions to guarantee the attainment of minimizers so that the proximity operators have nonempty images. The sufficient conditions rely upon the framework for coercivity established in [13].

In Section 4, we provide asymptotic results for the parameter $\gamma$. We then show how the results in the setting of GBDs vary from those we obtain more easily when specializing to Bregman distances. We illustrate all examples with both left and right versions, along with images of the envelope nets for a selection of $\gamma$ values. For all examples, we include three prototypical cases: the case of the distance constructed from the Fenchel–Young representative (which, under certain conditions, coincides with the classical Bregman distance), as well as the distances constructed from the smallest and largest members of the representative function set.

We conclude in Section 5, and we provide explicit forms for all of our computed examples and proximity operators in Appendix A.
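As a concrete illustration of the classical objects (1) and (2), the following Python sketch approximates the Moreau envelope and proximity operator on the real line by brute-force minimization over a grid. The choice $\theta = |\cdot|$, the helper names, and the grid are assumptions made purely for illustration and do not come from the paper. For this particular $\theta$, the envelope is the Huber function and the proximity operator is soft-thresholding, which gives a closed form to compare against.

```python
def moreau_envelope(theta, y, gamma, grid):
    """Approximate inf_x theta(x) + (x - y)^2 / (2*gamma) over a finite grid.

    Returns the approximate envelope value and the grid point attaining it.
    This is a brute-force sketch, not an efficient or general method.
    """
    def objective(x):
        return theta(x) + (x - y) ** 2 / (2.0 * gamma)

    best_x = min(grid, key=objective)
    return objective(best_x), best_x


def huber(y, gamma):
    """Closed-form Moreau envelope of the absolute value with parameter gamma."""
    if abs(y) <= gamma:
        return y * y / (2.0 * gamma)
    return abs(y) - gamma / 2.0


def soft_threshold(y, gamma):
    """Closed-form proximity operator of the absolute value with parameter gamma."""
    magnitude = max(abs(y) - gamma, 0.0)
    return magnitude if y > 0 else -magnitude


# Hypothetical grid on [-3, 3] with step 0.001 (an illustrative choice).
grid = [i / 1000.0 for i in range(-3000, 3001)]

if __name__ == "__main__":
    for y, gamma in [(1.5, 0.5), (0.2, 0.5), (-2.0, 1.0)]:
        value, minimizer = moreau_envelope(abs, y, gamma, grid)
        print(y, gamma, value, huber(y, gamma), minimizer, soft_threshold(y, gamma))
```

The grid search is only meant to make the definitions tangible in one dimension; it says nothing about how envelopes are computed in the paper's examples.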
2. Preliminaries
Given a nonempty subset $C$ of $X$, we denote by $\iota_C \colon X \to {]-\infty,+\infty]}$ the indicator function of $C$, i.e., $\iota_C(x) := 0$ when $x \in C$ and $\iota_C(x) := +\infty$ otherwise. We denote by $\operatorname{int}(C)$ the interior of $C$ and by $\overline{C}$ the closure of $C$.

Let $f \colon X \to {]-\infty,+\infty]}$. The domain of $f$ is defined by $\operatorname{dom} f := \{x \in X : f(x) < +\infty\}$, the lower level set of $f$ at height $\xi \in \mathbb{R}$ by $\operatorname{lev}_{\le \xi} f := \{x \in X : f(x) \le \xi\}$, and the epigraph of $f$ by $\operatorname{epi} f := \{(x,\rho) \in X \times \mathbb{R} : f(x) \le \rho\}$. We say that $f$ is proper if $\operatorname{dom} f \neq \varnothing$ and lower semicontinuous (lsc) at $\bar{x}$ if $f(\bar{x}) \le \liminf_{x \to \bar{x}} f(x)$. Unless specifically mentioned, these concepts are with respect to the strong (norm) topology. The function $f$ is said to be convex if
\[
\forall x, y \in X,\ \forall \lambda \in [0,1], \quad f((1-\lambda)x + \lambda y) \le (1-\lambda) f(x) + \lambda f(y); \tag{6}
\]
coercive if $\lim_{\|x\| \to +\infty} f(x) = +\infty$; and supercoercive if $\lim_{\|x\| \to +\infty} f(x)/\|x\| = +\infty$.

For a proper function $f \colon X \to {]-\infty,+\infty]}$, its subdifferential is the point-to-set mapping $\partial f \colon X \rightrightarrows X^*$ given by
\[
\partial f(x) :=
\begin{cases}
\{ v \in X^* : \forall y \in X,\ \langle y - x, v \rangle + f(x) \le f(y) \} & \text{if } x \in \operatorname{dom} f,\\
\varnothing & \text{otherwise,}
\end{cases} \tag{7}
\]
and its Fenchel conjugate is the function $f^* \colon X^* \to {]-\infty,+\infty]}$ given by
\[
f^*(v) := \sup_{x \in X} \{ \langle x, v \rangle - f(x) \}. \tag{8}
\]
Given a point-to-set operator $T \colon X \rightrightarrows X^*$, its domain is $\operatorname{dom} T := \{x \in X : Tx \neq \varnothing\}$, its range is $\operatorname{ran} T := T(X)$, and its graph is $\mathcal{G}(T) := \{(x,x^*) \in X \times X^* : x^* \in Tx\}$. We say that $T$ is maximally monotone if
\[
(x,u) \in \mathcal{G}(T) \iff \forall (y,v) \in \mathcal{G}(T),\ \langle x - y, u - v \rangle \ge 0. \tag{9}
\]
More properties and facts on maximally monotone operators can be found in [5, 16]. From the definition of $f^*$, we directly obtain the Fenchel–Young inequality
\[
\forall (x,v) \in X \times X^*, \quad f(x) + f^*(v) \ge \langle x, v \rangle, \tag{10}
\]
and when $f$ is convex, we also have the following well-known characterization of $\mathcal{G}(\partial f)$:
\[
f(x) + f^*(v) = \langle x, v \rangle \iff v \in \partial f(x). \tag{11}
\]

Let $S \colon X \rightrightarrows X^*$ be a maximally monotone operator.
Following [18, Definition 2.3], we say that a function $h \colon X \times X^* \to {]-\infty,+\infty]}$ represents $S$ if it satisfies the following conditions:
(a) $h$ is convex and norm × weak∗ lower semicontinuous in $X \times X^*$ (the weak∗ topology in $X^*$ is the smallest topology that makes continuous the linear functionals induced by $x \in X$).
(b) $\forall (x,v) \in X \times X^*$, $h(x,v) \ge \langle x, v \rangle$.
(c) $h(x,v) = \langle x, v \rangle \iff (x,v) \in \mathcal{G}(S)$.
In this situation, we write $h \in \mathcal{H}(S)$ and call $\mathcal{H}(S)$ the Fitzpatrick family of $S$. We will make use, in particular, of three prototypical members of $\mathcal{H}(S)$. These are as follows.
(i) The Fitzpatrick function of $S$, denoted by $F_S \colon (x,y) \mapsto \sup_{(z,w) \in \mathcal{G}(S)} \big( \langle z - x, y - w \rangle + \langle x, y \rangle \big)$. It is well known that $F_S$ is the smallest member of $\mathcal{H}(S)$; see [25].
(ii) The largest member of $\mathcal{H}(S)$, which we denote by $\sigma_S$.
(iii) In the case when $T = \partial f$ for $f \in \Gamma_0(X)$, we also consider the Fenchel–Young representative, denoted by $(f \oplus f^*) \in \mathcal{H}(\partial f)$, where $f \oplus f^* \colon X \times X^* \to {]-\infty,+\infty]}$ is given by
\[
\forall (x,v) \in X \times X^*, \quad (f \oplus f^*)(x,v) := f(x) + f^*(v). \tag{12}
\]

From now on, we assume that $S \colon X \rightrightarrows X^*$ is a maximally monotone operator, $h \in \mathcal{H}(S)$, and $T \colon X \rightrightarrows X^*$ is any point-to-set operator. As in [18, Definition 3.1], for each $(x,y) \in \operatorname{dom} S \times \operatorname{dom} T$, we define
\[
D_T^{\flat,h}(x,y) := \inf_{v \in Ty} \big( h(x,v) - \langle x, v \rangle \big) \tag{13a}
\]
and
\[
D_T^{\sharp,h}(x,y) := \sup_{v \in Ty} \big( h(x,v) - \langle x, v \rangle \big). \tag{13b}
\]
If $y \notin \operatorname{dom} T$, then $D_T^{\flat,h}(x,y) = D_T^{\sharp,h}(x,y) := +\infty$ for every $x \in X$. If $x \notin \operatorname{dom} S$, then $D_T^{\flat,h}(x,y) = D_T^{\sharp,h}(x,y) := +\infty$ for every $y \in X$. When $T$ is point to point, we simply write $D_T^{h} := D_T^{\flat,h} = D_T^{\sharp,h}$. In some situations, we will refer to both distances (13a) and (13b) simultaneously by the symbol $D_T^{\star,h}$.

If a distance is of the form (13a) or (13b), we call it a generalized Bregman distance, or GBD for short. We mentioned before that the GBDs specialize to the Bregman distance under certain circumstances, which we now make precise.
To a proper and convex function $f \colon X \to {]-\infty,+\infty]}$, we associate two Bregman distances (see [26]) defined by
\[
D_f^{\flat}(x,y) := f(x) - f(y) + \inf_{v \in \partial f(y)} \langle y - x, v \rangle \tag{14a}
\]
and
\[
D_f^{\sharp}(x,y) := f(x) - f(y) + \sup_{v \in \partial f(y)} \langle y - x, v \rangle. \tag{14b}
\]
It is known that the GBDs specialize to the Bregman distances in the case where the Fenchel–Young representative is used [18, Proposition 3.5], under mild domain conditions illuminated in [13, Proposition 2.2]. We recall this result in the following proposition.

Proposition 2.1 (GBDs specialize to Bregman distances).
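Let $f \in \Gamma_0(X)$. Then, for all $(x,y) \notin (\operatorname{dom} f \setminus \operatorname{dom} \partial f) \times \operatorname{dom} \partial f$,
\[
D_{\partial f}^{\flat,(f \oplus f^*)}(x,y) = D_f^{\flat}(x,y) \quad \text{and} \quad D_{\partial f}^{\sharp,(f \oplus f^*)}(x,y) = D_f^{\sharp}(x,y). \tag{15}
\]

Remark 2.2.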
By this proposition, we see that, in the case when $\operatorname{dom} f \setminus \operatorname{dom} \partial f = \varnothing$, the two kinds of distances are everywhere equal. However, if $f$ is the Boltzmann–Shannon entropy (see (23)), then $\operatorname{dom} f \setminus \operatorname{dom} \partial f = \{0\}$, and the two kinds of distances fail to be equal on the set $\{(0,y) \mid y > 0\}$ (see [13] for more details).

We will make use of the following lemma to simplify our analysis of the asymptotic behaviour in Section 4.

Lemma 2.3.
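Let $f \in \Gamma_0(X)$ and suppose that $h \in \mathcal{H}(\partial f)$ is smaller than the Fenchel–Young representative. In other words,
\[
\forall (x,v) \in \operatorname{dom} f \times \operatorname{ran} \partial f, \quad h(x,v) \le (f \oplus f^*)(x,v) = f(x) + f^*(v). \tag{16}
\]
Then the following hold:
(i) $\operatorname{dom} D_{\partial f}^{\flat,h} = \operatorname{dom} \partial f \times \operatorname{dom} \partial f$.
(ii) $\operatorname{dom} \partial f \times \operatorname{int}(\operatorname{dom} \partial f) \subset \operatorname{dom} D_{\partial f}^{\sharp,h} \subseteq \operatorname{dom} \partial f \times \operatorname{dom} \partial f$. Consequently, when $\operatorname{dom} \partial f$ is open, $\operatorname{dom} D_{\partial f}^{\sharp,h} = \operatorname{dom} \partial f \times \operatorname{dom} \partial f$.

Proof. Let $(x,y) \in \operatorname{dom} \partial f \times \operatorname{dom} \partial f$. It follows from the assumption on $h$ and Proposition 2.1 that
\[
D_{\partial f}^{\star,h}(x,y) \le D_{\partial f}^{\star,f \oplus f^*}(x,y) = D_f^{\star}(x,y), \tag{17}
\]
where $\star \in \{\flat, \sharp\}$.

(i): By definition, we have that $\operatorname{dom} D_{\partial f}^{\flat,h} \subseteq \operatorname{dom} \partial f \times \operatorname{dom} \partial f$. Hence, it is enough to prove the opposite inclusion. Indeed, fix any $v \in \partial f(y)$. By (17) for $\star = \flat$ and then by the Cauchy–Schwarz inequality, we have that
\begin{align}
D_{\partial f}^{\flat,h}(x,y) \le D_f^{\flat}(x,y) &= f(x) - f(y) + \inf_{w \in \partial f(y)} \langle y - x, w \rangle \tag{18a}\\
&\le f(x) - f(y) + \|y - x\| \cdot \|v\| < +\infty. \tag{18b}
\end{align}
Here, the final inequality follows from the fact that $x, y \in \operatorname{dom} \partial f \subseteq \operatorname{dom} f$. Thus $(x,y) \in \operatorname{dom} D_{\partial f}^{\flat,f \oplus f^*}$, and we deduce that $\operatorname{dom} D_{\partial f}^{\flat,h} = \operatorname{dom} \partial f \times \operatorname{dom} \partial f$.

(ii): As in part (i), we always have that $\operatorname{dom} D_{\partial f}^{\sharp,h} \subseteq \operatorname{dom} \partial f \times \operatorname{dom} \partial f$. For the leftmost inclusion, assume that $(x,y) \in \operatorname{dom} \partial f \times \operatorname{int}(\operatorname{dom} \partial f)$. Since $y \in \operatorname{int}(\operatorname{dom} \partial f)$ and $\partial f$ is maximally monotone, the set $\partial f(y)$ is bounded, so there exists a constant $M(y) \ge 0$ such that $\|v\| \le M(y)$ for every $v \in \partial f(y)$. Altogether, (17) for $\star = \sharp$ and the Cauchy–Schwarz inequality give
\begin{align}
D_{\partial f}^{\sharp,h}(x,y) \le D_f^{\sharp}(x,y) &= f(x) - f(y) + \sup_{v \in \partial f(y)} \langle y - x, v \rangle \tag{19a}\\
&\le f(x) - f(y) + \|y - x\| \cdot M(y) < +\infty, \tag{19b}
\end{align}
because $x, y$ are fixed. This proves both inclusions. The last statement in (ii) follows directly from the first one. ∎

Remark 2.4.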
The rightmost inclusion in Lemma 2.3(ii) might be strict, in contrast with part (i). Let $B := \{(a,b) \in \mathbb{R}^2 : a^2 + b^2 \le 1\}$ be the unit ball in two dimensions, and consider $f := \iota_B$, the indicator function of the unit ball. Take $y = (1,0) \in B$ and any $x = (x_1, x_2) \in B$ such that $x_1 < 1$ (i.e., $x \in B$, $x \neq y$). Then $\partial f(y) = \{(t,0) : t \ge 0\}$ and $f(x) = f(y) = 0$, thus
\[
D_{\partial f}^{\sharp,(f \oplus f^*)}(x,y) = \sup_{t \ge 0} \langle y - x, t y \rangle = (1 - x_1) \sup_{t \ge 0} t = +\infty. \tag{20}
\]
Therefore, $(x,y) \notin \operatorname{dom} D_{\partial f}^{\sharp,(f \oplus f^*)}$ for all $x \neq y$.

For our examples in this section, $X = X^* = \mathbb{R}$ and $T = S = \partial f$ is point-to-point on $\operatorname{int} \operatorname{dom} f$, in which case we simply write $D^{h}$ for a specific choice of $h \in \mathcal{H}(\partial f)$ (not to be confused with $D_f$, the classical Bregman distance for $f$). In these cases, we will refer to the representative function used by its name. Specifically,
(i) When $h = F_S$ is the Fitzpatrick function for a maximally monotone operator $S$, we will write $D^{F_S}$;
(ii) When $h = \sigma_S$, the largest member of $\mathcal{H}(S)$, we will write $D^{\sigma_S}$;
(iii) When $h = f \oplus f^* \in \mathcal{H}(\partial f)$ for $f \in \Gamma_0(X)$ is the Fenchel–Young representative, we will denote this by $D^{f \oplus f^*}$.

Remark 2.5 (The lower closed distance).
The function $D_T^{\star,h}(\cdot, y)$ may not be lower semicontinuous at $x \in \overline{\operatorname{dom} \partial f} \setminus \operatorname{dom} \partial f$. Additionally, for $y \notin \operatorname{int} \operatorname{dom}(T)$, the distance may not be lower semicontinuous with respect to the right variable either. For more details on the semicontinuity properties of the GBDs, see [18, Section 3]. For these reasons, [13] introduced the lower closed GBD $\overline{D}_T^{\star,h}$ defined by
\[
\operatorname{epi} \overline{D}_T^{\star,h} = \overline{\operatorname{epi} D_T^{\star,h}}, \tag{21}
\]
where, as before, $\star \in \{\flat, \sharp\}$. The lower closed distances $\overline{D}^{F_S}$, $\overline{D}^{\sigma_S}$, and $\overline{D}^{f \oplus f^*}$ are defined analogously (with $D^{F_S}$, $D^{\sigma_S}$, and $D^{f \oplus f^*}$ as in (i)–(iii) above). For all our computed examples, in the cases when $D^{h}$ and $\overline{D}^{h}$ do not agree, we will compute with $\overline{D}^{h}$ so as to have a lower semicontinuous distance.

The authors in [7, 13] illustrated their findings with the energy, whose Bregman distance specializes to the Moreau case, and the Boltzmann–Shannon entropy, whose importance we will soon recall. Naturally, we will illustrate our results about envelopes and proximity operators using the GBDs associated with these operators that were computed in [13]. They are as follows.
Example 2.6 (Energy).
In the case where $f \colon x \mapsto x^2/2$ is the energy, we have $\partial f = \operatorname{Id}$. The distances constructed from the smallest and biggest members of $\mathcal{H}(\operatorname{Id})$ are
\[
D^{F_{\operatorname{Id}}}(x,y) = \tfrac{1}{4}(x - y)^2 \quad \text{and} \quad D^{\sigma_{\operatorname{Id}}} = \iota_{\mathcal{G}(\operatorname{Id})}. \tag{22}
\]
Already from this example, we see that we should not expect all asymptotic properties of Bregman envelopes to extend to GBD envelopes, because envelopes associated with the distance $D^{\sigma_{\operatorname{Id}}}$ will always be vacuously equal to the function being regularized, while their associated proximity operators will vacuously be equal to $\operatorname{Id}$.

Example 2.7 (Kullback–Leibler divergence and GBD $D^{\operatorname{ent} \oplus \operatorname{ent}^*}$). The (negative) Boltzmann–Shannon entropy is defined as
\[
\operatorname{ent} \colon \mathbb{R} \to {]-\infty,+\infty]} \colon x \mapsto
\begin{cases}
x \log x - x & \text{if } x > 0,\\
0 & \text{if } x = 0,\\
+\infty & \text{otherwise.}
\end{cases} \tag{23}
\]
The Boltzmann–Shannon entropy is particularly important and natural to consider, because its derivative is $\log$, its conjugate is $\operatorname{ent}^* = \exp$, and its associated Bregman distance is the Kullback–Leibler divergence,
\[
D_{\operatorname{ent}} \colon (x,y) \mapsto
\begin{cases}
x(\log(x) - \log(y)) - x + y & \text{if } x > 0 \text{ and } y > 0,\\
y & \text{if } y > x = 0,\\
+\infty & \text{otherwise,}
\end{cases} \tag{24}
\]
which is frequently used as a measure of distance between positive vectors in information theory, statistics, and portfolio selection. The GBD associated with the Fenchel–Young representative $\operatorname{ent} \oplus \operatorname{ent}^* \in \mathcal{H}(\log)$ is
\[
D^{\operatorname{ent} \oplus \operatorname{ent}^*} \colon (x,y) \mapsto
\begin{cases}
x(\log(x) - \log(y)) - x + y & \text{if } x, y > 0,\\
+\infty & \text{otherwise.}
\end{cases} \tag{25}
\]
Thus it may be seen that the Bregman distance of the Boltzmann–Shannon entropy is the special case of the GBD for the Fenchel–Young representative of the logarithm function, except on the set $(\operatorname{dom} f \setminus \operatorname{dom} \partial f) \times \operatorname{dom} \partial f = \{0\} \times {]0,+\infty[}$ (see Proposition 2.1 and Remark 2.5). Its lower closure is given by
\[
\overline{D}^{\operatorname{ent} \oplus \operatorname{ent}^*} \colon (x,y) \mapsto
\begin{cases}
x(\log(x) - \log(y)) - x + y & \text{if } y > 0,\ x \ge 0,\\
0 & \text{if } y = x = 0,\\
+\infty & \text{otherwise,}
\end{cases} \tag{26}
\]
with the convention $0 \log 0 = 0$, and is shown in Figure 1b. Hence, this new distance allows us to extend the domain of the classical Bregman distance to the boundary points.

Example 2.8 (Kullback–Leibler divergence and GBD $D^{F_{\log}}$). The GBD constructed with the Fitzpatrick representative is
\[
D^{F_{\log}}(x,y) =
\begin{cases}
+\infty & \text{if } x \le 0 \text{ or } y \le 0,\\
x \Big( W\big(\tfrac{xe}{y}\big) + \dfrac{1}{W\big(\tfrac{xe}{y}\big)} - 2 \Big) & \text{otherwise,}
\end{cases} \tag{27}
\]
where $W$ is the principal branch of the Lambert W function. The above expression coincides, except on the set $\{0\} \times [0,+\infty[$, with the lower closed version
\[
\overline{D}^{F_{\log}} \colon \mathbb{R}_+ \times \mathbb{R}_+ \to [0,+\infty] \colon (x,y) \mapsto
\begin{cases}
+\infty & \text{if } x < 0 \text{ or } y < 0 \text{ or } (x > 0 \text{ and } y = 0),\\
y e^{-1} & \text{if } x = 0 \text{ and } y \ge 0,\\
x \Big( W\big(\tfrac{xe}{y}\big) + \dfrac{1}{W\big(\tfrac{xe}{y}\big)} - 2 \Big) & \text{otherwise.}
\end{cases} \tag{28}
\]
This distance is shown in Figure 1a. Taking the closure of the epigraph of $D^{F_{\log}}$ yields $\overline{D}^{F_{\log}}(0,y) = y e^{-1}$, which is the difference we see in (28). We opt to use the latter lower semicontinuous version of the distance in our later examples of proximity operators and envelopes. For information on the computation of $F_{\log}$ and $D^{F_{\log}}$, see [9, Example .17.6] and [13, Example 3.2], respectively.

[Figure 1: Three prototypical distances constructed from $\mathcal{H}(\log)$. Panels: (a) $\overline{D}^{F_{\log}}$, (b) $\overline{D}^{\operatorname{ent} \oplus \operatorname{ent}^*}$, (c) $\overline{D}^{\sigma_{\log}}$.]

Example 2.9 (Kullback–Leibler divergence and GBD $D^{\sigma_{\log}}$). The GBD constructed with the biggest member of $\mathcal{H}(\log)$ is
\[
D^{\sigma_{\log}} \colon (x,y) \mapsto -x \log(y) +
\begin{cases}
x \log(x) & \text{if } \log(y) \le \log(x),\\
+\infty & \text{otherwise,}
\end{cases} \tag{29}
\]
which may be recognized as equal, except at the point $(0,0)$, to its lower closure
\[
\overline{D}^{\sigma_{\log}} \colon (x,y) \mapsto
\begin{cases}
x \log(x) - x \log(y) & \text{if } 0 < y \le x,\\
0 & \text{if } x = y = 0,\\
+\infty & \text{otherwise,}
\end{cases} \tag{30}
\]
which is shown in Figure 1c.
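The lower closed distances (26), (28), and (30) can be evaluated numerically as in the following Python sketch. The function names and the Newton iteration used for the principal branch of the Lambert W function are illustrative assumptions, not the authors' implementation. Since $F_{\log}$ is the smallest and $\sigma_{\log}$ the largest member of $\mathcal{H}(\log)$, the three distances should be ordered pointwise, which offers a simple sanity check.

```python
import math


def lambert_w0(z, tol=1e-14, max_iter=100):
    """Principal branch W(z) for z >= 0 via Newton's method on w*exp(w) = z."""
    if z < 0:
        raise ValueError("this sketch only handles z >= 0")
    w = math.log1p(z)  # starting guess; log(1 + z) >= W(z) when z >= 0
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


def d_fenchel_young(x, y):
    """Lower closed distance (26) built from ent (+) ent*."""
    if y > 0 and x > 0:
        return x * (math.log(x) - math.log(y)) - x + y
    if y > 0 and x == 0:
        return y  # value of the first branch of (26) under 0*log(0) = 0
    if x == 0 and y == 0:
        return 0.0
    return math.inf


def d_fitzpatrick(x, y):
    """Lower closed distance (28) built from the Fitzpatrick function of log."""
    if x < 0 or y < 0 or (x > 0 and y == 0):
        return math.inf
    if x == 0:
        return y / math.e
    w = lambert_w0(x * math.e / y)
    return x * (w + 1.0 / w - 2.0)


def d_sigma(x, y):
    """Lower closed distance (30) built from the largest representative."""
    if 0 < y <= x:
        return x * math.log(x) - x * math.log(y)
    if x == 0 and y == 0:
        return 0.0
    return math.inf


if __name__ == "__main__":
    for x, y in [(2.0, 1.0), (1.0, 2.0), (3.0, 0.1)]:
        print(x, y, d_fitzpatrick(x, y), d_fenchel_young(x, y), d_sigma(x, y))
```

At each test point the printed values should be nondecreasing from left to right, with `d_sigma` equal to infinity whenever $y > x$.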
3. Envelopes and proximity operators
We next define formally the envelopes and the proximity operators.
Definition 3.1.
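Given $\theta \colon X \to {]-\infty,+\infty]}$ and $\gamma \in \mathbb{R}_{++}$, the left and right $D_T^{\star,h}$-envelopes of $\theta$ with parameter $\gamma$ are respectively defined by
\[
\overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta} \colon X \to [-\infty,+\infty] \colon y \mapsto \inf_{x \in X} \Big( \theta(x) + \frac{1}{\gamma} D_T^{\star,h}(x,y) \Big) \tag{31a}
\]
and
\[
\overrightarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta} \colon X \to [-\infty,+\infty] \colon x \mapsto \inf_{y \in X} \Big( \theta(y) + \frac{1}{\gamma} D_T^{\star,h}(x,y) \Big). \tag{31b}
\]
The left and right proximity operators of $\theta$ with parameter $\gamma$ are respectively defined by
\[
\overleftarrow{P}^{\star,h}_{\gamma,\theta} \colon X \rightrightarrows X \colon y \mapsto \operatorname*{argmin}_{x \in X} \Big( \theta(x) + \frac{1}{\gamma} D_T^{\star,h}(x,y) \Big) \tag{32a}
\]
and
\[
\overrightarrow{P}^{\star,h}_{\gamma,\theta} \colon X \rightrightarrows X \colon x \mapsto \operatorname*{argmin}_{y \in X} \Big( \theta(y) + \frac{1}{\gamma} D_T^{\star,h}(x,y) \Big). \tag{32b}
\]
As before, the symbol $\star$ can be either $\flat$ or $\sharp$. When $T$ is point to point, we remove the symbol $\star$. When the symbol $\dagger$ appears next to the name of the representative function with the envelope or proximity operator, it is understood that the distance being used is the closed GBD $\overline{D}_T^{\star,h}$. When the Bregman distance (14a) or (14b) is used, we remove the name of the representative function in the envelopes and proximity operators.

Remark 3.2 (A selection operator for the proximity operator).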
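We will find it useful to employ a selection map for the proximity operators.
(i) If $\overleftarrow{s}^{\star,h}_{\gamma,\theta}(y) \in \overleftarrow{P}^{\star,h}_{\gamma,\theta}(y)$, then
\[
\overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(y) = \theta\big(\overleftarrow{s}^{\star,h}_{\gamma,\theta}(y)\big) + \frac{1}{\gamma} D_T^{\star,h}\big(\overleftarrow{s}^{\star,h}_{\gamma,\theta}(y), y\big) \ge \theta\big(\overleftarrow{s}^{\star,h}_{\gamma,\theta}(y)\big). \tag{33}
\]
(ii) If $\overrightarrow{s}^{\star,h}_{\gamma,\theta}(x) \in \overrightarrow{P}^{\star,h}_{\gamma,\theta}(x)$, then
\[
\overrightarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(x) = \theta\big(\overrightarrow{s}^{\star,h}_{\gamma,\theta}(x)\big) + \frac{1}{\gamma} D_T^{\star,h}\big(x, \overrightarrow{s}^{\star,h}_{\gamma,\theta}(x)\big) \ge \theta\big(\overrightarrow{s}^{\star,h}_{\gamma,\theta}(x)\big). \tag{34}
\]

Example 3.3 (Energy).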
Let $f$ be the energy and $\theta \colon x \mapsto |x - 1/2|$. In this case, $\partial f = \operatorname{Id}$, and the distances $D^{F_{\operatorname{Id}}}$ and $D^{\sigma_{\operatorname{Id}}}$ are given in Example 2.6 and are both lsc. It is straightforward to determine that their corresponding envelopes and proximity operators are given by:
(i) $D^{F_{\operatorname{Id}}}$: the corresponding GBD envelopes (both left $\overleftarrow{\operatorname{env}}^{F_{\operatorname{Id}}}_{\gamma,\theta}$ and right $\overrightarrow{\operatorname{env}}^{F_{\operatorname{Id}}}_{\gamma,\theta}$) are equal to the Moreau envelope with parameter $2\gamma$, and the corresponding proximity operators are equal to the respective Moreau proximity operators with parameter $2\gamma$.
(ii) $D^{\sigma_{\operatorname{Id}}}$: the left and right proximity operators are both necessarily $\operatorname{Id}$, and the left and right envelopes of $\theta$ are exactly equal to $\theta$, a fact that would hold true with any other choice of $\theta$.

The following result makes use of a particular type of enlargement of a maximally monotone operator $S$. Fix $h \in \mathcal{H}(S)$; the enlargement $L^h$ of $S$ is defined by
\[
L^h(x, \varepsilon) := \{ v \in X^* : h(x,v) - \langle x, v \rangle \le \varepsilon \}. \tag{35}
\]
More details on $L^h$ can be found, e.g., in [16, 19]. Part (i) of the next result extends [7, Proposition 2.1] and its proof is the same as the one in [5, Proposition 12.22(i)]. Parts (ii) and (iii) are new and establish relationships between the domains of the left and right envelopes and those of $S$ and $T$.

Proposition 3.4.
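Let $\theta \colon X \to {]-\infty,+\infty]}$ and let $\gamma, \mu \in \mathbb{R}_{++}$. Then the following hold:
(i) $\overleftarrow{\operatorname{env}}^{\star,h}_{\mu,\gamma\theta} = \gamma\, \overleftarrow{\operatorname{env}}^{\star,h}_{\gamma\mu,\theta}$ and $\overrightarrow{\operatorname{env}}^{\star,h}_{\mu,\gamma\theta} = \gamma\, \overrightarrow{\operatorname{env}}^{\star,h}_{\gamma\mu,\theta}$.
(ii) $\operatorname{dom} \overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta} \subseteq \operatorname{dom} T$ and $\operatorname{dom} \overrightarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta} \subseteq \operatorname{dom} S$. Moreover,
\[
\operatorname{dom} \overleftarrow{\operatorname{env}}^{\sharp,h}_{\gamma,\theta} = \bigcup_{x \in \operatorname{dom} \theta,\ \varepsilon \in \mathbb{R}_+} T^{-1}\big(L^h(x,\varepsilon)\big) \tag{36a}
\]
and
\[
\operatorname{dom} \overrightarrow{\operatorname{env}}^{\sharp,h}_{\gamma,\theta} = \Big\{ x \in X : \operatorname{dom} \theta \cap \bigcup_{\varepsilon \in \mathbb{R}_+} T^{-1}\big(L^h(x,\varepsilon)\big) \neq \varnothing \Big\}. \tag{36b}
\]
(iii) If $\operatorname{dom} S \cap \operatorname{dom} \theta \neq \varnothing$ and $(\operatorname{dom} S \cap \operatorname{dom} \theta) \times \operatorname{dom} T \subseteq \operatorname{dom} D_T^{\star,h}$, then $\operatorname{dom} \overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta} = \operatorname{dom} T$. If $\operatorname{dom} T \cap \operatorname{dom} \theta \neq \varnothing$ and $\operatorname{dom} S \times (\operatorname{dom} T \cap \operatorname{dom} \theta) \subseteq \operatorname{dom} D_T^{\star,h}$, then $\operatorname{dom} \overrightarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta} = \operatorname{dom} S$.

Proof. (i): This is straightforward from the definition.

(ii): We observe that
\begin{align}
y \in \operatorname{dom} \overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta} &\iff \inf_{x \in X} \Big( \theta(x) + \frac{1}{\gamma} D_T^{\star,h}(x,y) \Big) = \overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(y) < +\infty \tag{37a}\\
&\iff \exists x \in X,\ \theta(x) + \frac{1}{\gamma} D_T^{\star,h}(x,y) < +\infty \tag{37b}\\
&\iff \exists x \in \operatorname{dom} \theta,\ D_T^{\star,h}(x,y) < +\infty, \tag{37c}
\end{align}
since it always holds that $\theta(x) > -\infty$ and $D_T^{\star,h}(x,y) \ge 0$. On the one hand, (37c) implies that $y \in \operatorname{dom} T$, and hence $\operatorname{dom} \overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta} \subseteq \operatorname{dom} T$. To prove the first equality in (ii), note that, by definition of $D_T^{\sharp,h}$,
\begin{align}
D_T^{\sharp,h}(x,y) < +\infty &\iff \sup_{v \in Ty} \big( h(x,v) - \langle x, v \rangle \big) < +\infty \tag{38a}\\
&\iff \exists \varepsilon \in \mathbb{R}_+,\ \forall v \in Ty,\ h(x,v) - \langle x, v \rangle \le \varepsilon \tag{38b}\\
&\iff \exists \varepsilon \in \mathbb{R}_+,\ Ty \subseteq L^h(x,\varepsilon) \tag{38c}\\
&\iff y \in \bigcup_{\varepsilon \in \mathbb{R}_+} T^{-1}\big(L^h(x,\varepsilon)\big). \tag{38d}
\end{align}
Combining with (37) yields
\[
\operatorname{dom} \overleftarrow{\operatorname{env}}^{\sharp,h}_{\gamma,\theta} = \bigcup_{x \in \operatorname{dom} \theta,\ \varepsilon \in \mathbb{R}_+} T^{-1}\big(L^h(x,\varepsilon)\big). \tag{39}
\]
The proof for the right envelope is analogous.

(iii): We will only prove the first claim because the second one is similar. In view of (ii), it suffices to prove that $\operatorname{dom} T \subseteq \operatorname{dom} \overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}$. Let $y \in \operatorname{dom} T$. By assumption, there exists $x_0 \in \operatorname{dom} S \cap \operatorname{dom} \theta$. Then $(x_0, y) \in (\operatorname{dom} S \cap \operatorname{dom} \theta) \times \operatorname{dom} T \subseteq \operatorname{dom} D_T^{\star,h}$, which gives $D_T^{\star,h}(x_0, y) < +\infty$. Using (37), we obtain that $y \in \operatorname{dom} \overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}$. This completes the proof. ∎

The following proposition compares minimum values of the envelopes with those of the reference function $\theta$ over the domains of $S$ and $T$.

Proposition 3.5.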
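Let $\theta \colon X \to {]-\infty,+\infty]}$, $x \in \operatorname{dom} S$, $y \in \operatorname{dom} T$, and $\gamma, \mu \in \mathbb{R}_{++}$ with $\gamma \le \mu$. Then the following hold:
(i) Suppose that $\operatorname{dom} S \cap \operatorname{dom} \theta \neq \varnothing$ and that $Tz \subseteq Sz$ for all $z \in \operatorname{dom} T$. Then
\[
\inf \theta(\operatorname{dom} S) \le \overleftarrow{\operatorname{env}}^{\star,h}_{\mu,\theta}(y) \le \overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(y) \le \theta(y) \tag{40a}
\]
and
\[
\inf \theta(\operatorname{dom} S) \le \overleftarrow{\operatorname{env}}^{\star,\dagger h}_{\mu,\theta}(y) \le \overleftarrow{\operatorname{env}}^{\star,\dagger h}_{\gamma,\theta}(y) \le \overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(y) \le \theta(y). \tag{40b}
\]
Consequently, $\inf \theta(\operatorname{dom} S) \le \inf \overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(X) \le \inf \theta(\operatorname{dom} T)$, with equality throughout when $\operatorname{dom} S \subseteq \operatorname{dom} T$. Moreover, there exist $\alpha, \beta \in [\inf \theta(\operatorname{dom} S), \theta(y)]$ such that
\[
\overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(y) \downarrow \alpha \text{ as } \gamma \uparrow +\infty \quad \text{and} \quad \overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(y) \uparrow \beta \text{ as } \gamma \downarrow 0. \tag{41}
\]
(ii) Suppose that $\operatorname{dom} T \cap \operatorname{dom} \theta \neq \varnothing$ and that $Tz \subseteq Sz$ for all $z \in \operatorname{dom} S$. Then
\[
\inf \theta(\operatorname{dom} T) \le \overrightarrow{\operatorname{env}}^{\star,h}_{\mu,\theta}(x) \le \overrightarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(x) \le \theta(x) \tag{42a}
\]
and
\[
\inf \theta(\operatorname{dom} T) \le \overrightarrow{\operatorname{env}}^{\star,\dagger h}_{\mu,\theta}(x) \le \overrightarrow{\operatorname{env}}^{\star,\dagger h}_{\gamma,\theta}(x) \le \overrightarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(x) \le \theta(x). \tag{42b}
\]
Consequently, $\inf \theta(\operatorname{dom} T) \le \inf \overrightarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(X) \le \inf \theta(\operatorname{dom} S)$, with equality throughout when $\operatorname{dom} T \subseteq \operatorname{dom} S$. Moreover, there exist $\alpha, \beta \in [\inf \theta(\operatorname{dom} T), \theta(x)]$ such that
\[
\overrightarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(x) \downarrow \alpha \text{ as } \gamma \uparrow +\infty \quad \text{and} \quad \overrightarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(x) \uparrow \beta \text{ as } \gamma \downarrow 0. \tag{43}
\]

Proof.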
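The proof for the right envelopes follows the same steps as those for the left ones, and hence the details are omitted. The assumption that $Tz \subseteq Sz$ for all $z \in \operatorname{dom} T$, together with [18, Remark 3.3(c)], implies that
\[
\forall z \in \operatorname{dom} T, \quad D_T^{\star,h}(z,z) = 0. \tag{44}
\]
As $\mu \ge \gamma > 0$, we have that, for all $x \in \operatorname{dom} S$,
\[
\theta(x) \le \theta(x) + \frac{1}{\mu} D_T^{\star,h}(x,y) \le \theta(x) + \frac{1}{\gamma} D_T^{\star,h}(x,y). \tag{45}
\]
Taking the infimum over $x \in \operatorname{dom} S$, and noting that $D_T^{\star,h}(x,y) = +\infty$ for $x \notin \operatorname{dom} S$, yields
\begin{align}
\inf \theta(\operatorname{dom} S) &\le \inf_{x \in \operatorname{dom} S} \Big( \theta(x) + \frac{1}{\mu} D_T^{\star,h}(x,y) \Big) = \inf_{x \in X} \Big( \theta(x) + \frac{1}{\mu} D_T^{\star,h}(x,y) \Big) = \overleftarrow{\operatorname{env}}^{\star,h}_{\mu,\theta}(y) \tag{46a}\\
&\le \inf_{x \in \operatorname{dom} S} \Big( \theta(x) + \frac{1}{\gamma} D_T^{\star,h}(x,y) \Big) = \inf_{x \in X} \Big( \theta(x) + \frac{1}{\gamma} D_T^{\star,h}(x,y) \Big) = \overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(y) \tag{46b}\\
&\le \theta(y) + \frac{1}{\gamma} D_T^{\star,h}(y,y) = \theta(y), \tag{46c}
\end{align}
where the last equality is due to (44). This proves (40a).

Next, for all $x \in \operatorname{dom} S$, it holds that $\theta(x) \le \theta(x) + \frac{1}{\mu} \overline{D}_T^{\star,h}(x,y) \le \theta(x) + \frac{1}{\gamma} \overline{D}_T^{\star,h}(x,y)$, and hence
\[
\inf \theta(\operatorname{dom} S) = \inf_{x \in \operatorname{dom} S} \theta(x) \le \overleftarrow{\operatorname{env}}^{\star,\dagger h}_{\mu,\theta}(y) \le \overleftarrow{\operatorname{env}}^{\star,\dagger h}_{\gamma,\theta}(y), \tag{47}
\]
where we used the fact that $D_T^{\star,h}(x,y) = +\infty$ for $x \notin \operatorname{dom} S$. Noting also that $\overline{D}_T^{\star,h} \le D_T^{\star,h}$, we have $\overleftarrow{\operatorname{env}}^{\star,\dagger h}_{\gamma,\theta}(y) \le \overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(y)$, which combined with (47) and (40a) implies (40b).

Now, taking the infimum over $y \in \operatorname{dom} T$ in (40a) and using the fact that $\operatorname{dom} \overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta} \subseteq \operatorname{dom} T$ (see Proposition 3.4(ii)), we obtain that
\[
\inf \theta(\operatorname{dom} S) \le \inf \overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(X) \le \inf \theta(\operatorname{dom} T). \tag{48}
\]
If $\operatorname{dom} S \subseteq \operatorname{dom} T$, then $\inf \theta(\operatorname{dom} S) \ge \inf \theta(\operatorname{dom} T)$, and the equalities in (48) must hold. The remaining conclusion follows directly from (40a) and the fact that $\gamma \le \mu$. ∎

Remark 3.6. Suppose that $\operatorname{dom} T = \operatorname{dom} S =: D$ and we want to solve the optimization problem
\[
\min\ \theta(x) \quad \text{s.t.} \quad x \in D. \tag{49}
\]
It is interesting to be able to state relationships between the optimal value of the problem and $\inf \overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(D)$ as well as $\inf \overrightarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(D)$. It is also important to establish relationships between the sets $\operatorname{argmin} \theta(D)$, $\operatorname{argmin} \overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(D)$, and $\operatorname{argmin} \overrightarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(D)$. We establish these relationships in the next result.

Proposition 3.7.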
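Suppose that $\operatorname{dom} T = \operatorname{dom} S =: D$ and that $Tz \subseteq Sz$ for all $z \in D$. Let $\theta \colon X \to {]-\infty,+\infty]}$ with $D \cap \operatorname{dom} \theta \neq \varnothing$, let $\gamma \in \mathbb{R}_{++}$, and let $x, y \in D$. Set
\[
\overleftarrow{A} := \bigcup_{x \in \operatorname{argmin} \theta(D)} \{ y \in D : Ty \cap Sx \neq \varnothing \}, \qquad \overleftarrow{B} := \bigcup_{x \in \operatorname{argmin} \theta(D)} \{ y \in D : Ty \subseteq Sx \} \tag{50a}
\]
and
\[
\overrightarrow{A} := \bigcup_{y \in \operatorname{argmin} \theta(D)} \{ x \in D : Ty \cap Sx \neq \varnothing \}, \qquad \overrightarrow{B} := \bigcup_{y \in \operatorname{argmin} \theta(D)} \{ x \in D : Ty \subseteq Sx \}. \tag{50b}
\]
Then the following hold:
(i) $\inf \theta(D) = \inf \overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(X) = \inf \overrightarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(X)$ and $\operatorname{argmin} \theta(D) \subseteq \operatorname{argmin} \overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(X) \cap \operatorname{argmin} \overrightarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(X)$.
(ii) $\overleftarrow{A} \subseteq \operatorname{argmin} \overleftarrow{\operatorname{env}}^{\flat,h}_{\gamma,\theta}(X)$ and $\overrightarrow{A} \subseteq \operatorname{argmin} \overrightarrow{\operatorname{env}}^{\flat,h}_{\gamma,\theta}(X)$.
(iii) If $\overleftarrow{P}^{\sharp,h}_{\gamma,\theta}(y) \neq \varnothing$ for all $y \in \operatorname{argmin} \overleftarrow{\operatorname{env}}^{\sharp,h}_{\gamma,\theta}(X)$, then $\operatorname{argmin} \overleftarrow{\operatorname{env}}^{\sharp,h}_{\gamma,\theta}(X) \subseteq \overleftarrow{B}$. Consequently, $\operatorname{argmin} \overleftarrow{\operatorname{env}}^{\sharp,h}_{\gamma,\theta}(X) \subseteq \overleftarrow{B} \subseteq \overleftarrow{A} \subseteq \operatorname{argmin} \overleftarrow{\operatorname{env}}^{\flat,h}_{\gamma,\theta}(X)$.
(iv) If $\overrightarrow{P}^{\sharp,h}_{\gamma,\theta}(y) \neq \varnothing$ for all $y \in \operatorname{argmin} \overrightarrow{\operatorname{env}}^{\sharp,h}_{\gamma,\theta}(X)$, then $\operatorname{argmin} \overrightarrow{\operatorname{env}}^{\sharp,h}_{\gamma,\theta}(X) \subseteq \overrightarrow{B}$. Consequently, $\operatorname{argmin} \overrightarrow{\operatorname{env}}^{\sharp,h}_{\gamma,\theta}(X) \subseteq \overrightarrow{B} \subseteq \overrightarrow{A} \subseteq \operatorname{argmin} \overrightarrow{\operatorname{env}}^{\flat,h}_{\gamma,\theta}(X)$.

Proof. (i): Since $\operatorname{dom} T = \operatorname{dom} S = D$, Proposition 3.5 implies that $\inf \theta(D) = \inf \overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(X) = \inf \overrightarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(X)$. Now, let $z \in \operatorname{argmin} \theta(D)$. Then $z \in D$ and, by assumption, $Tz \subseteq Sz$, from which we have $D_T^{\star,h}(z,z) = 0$. Therefore, $\theta(z) + \frac{1}{\gamma} D_T^{\star,h}(z,z) = \theta(z) = \inf \theta(D) = \inf \overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(X) = \inf \overrightarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(X)$, which implies that $z \in \operatorname{argmin} \overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(X)$ and also $z \in \operatorname{argmin} \overrightarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(X)$. We obtain that $\operatorname{argmin} \theta(D) \subseteq \operatorname{argmin} \overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(X) \cap \operatorname{argmin} \overrightarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(X)$.

(ii): Let $y \in \overleftarrow{A}$. Then $y \in D$ and there exists $x \in \operatorname{argmin} \theta(D)$ such that $Ty \cap Sx \neq \varnothing$. By [18, Remark 3.3(b)], $D_T^{\flat,h}(x,y) = 0$. Using (i) with $\star = \flat$, we have
\[
\inf \theta(D) = \inf \overleftarrow{\operatorname{env}}^{\flat,h}_{\gamma,\theta}(X) \le \overleftarrow{\operatorname{env}}^{\flat,h}_{\gamma,\theta}(y) \le \theta(x) + \frac{1}{\gamma} D_T^{\flat,h}(x,y) = \theta(x) = \inf \theta(D), \tag{51}
\]
which implies that $y \in \operatorname{argmin} \overleftarrow{\operatorname{env}}^{\flat,h}_{\gamma,\theta}(X)$. Hence, $\overleftarrow{A} \subseteq \operatorname{argmin} \overleftarrow{\operatorname{env}}^{\flat,h}_{\gamma,\theta}(X)$. Similarly, we have that $\overrightarrow{A} \subseteq \operatorname{argmin} \overrightarrow{\operatorname{env}}^{\flat,h}_{\gamma,\theta}(X)$.

(iii): Let $y \in \operatorname{argmin} \overleftarrow{\operatorname{env}}^{\sharp,h}_{\gamma,\theta}(X)$. By assumption, there exists $x \in X$ such that $\theta(x) + \frac{1}{\gamma} D_T^{\sharp,h}(x,y) = \overleftarrow{\operatorname{env}}^{\sharp,h}_{\gamma,\theta}(y) < +\infty$. Then, we must have $x \in \operatorname{dom} S = D$ and, by (i) with $\star = \sharp$,
\[
\inf \theta(D) = \inf \overleftarrow{\operatorname{env}}^{\sharp,h}_{\gamma,\theta}(X) = \overleftarrow{\operatorname{env}}^{\sharp,h}_{\gamma,\theta}(y) = \theta(x) + \frac{1}{\gamma} D_T^{\sharp,h}(x,y) \ge \theta(x) \ge \inf \theta(D). \tag{52}
\]
Therefore, $\theta(x) = \inf \theta(D)$ and $D_T^{\sharp,h}(x,y) = 0$. While the former implies $x \in \operatorname{argmin} \theta(D)$, the latter implies $Ty \subseteq Sx$ (see [18, Proposition 3.7(b)]). We deduce that $y \in \overleftarrow{B}$.

(iv): This is similar to (iii). ∎

Proposition 3.7 shows that the envelopes can be used to automatically impose the constraints. Namely, they transform the original constrained problem into an unconstrained one. This remark justifies the assumption imposed in the next proposition.

For $A \subseteq X$, denote by $\overline{A}^{w}$ its weak closure. Namely, $\overline{A}^{w}$ contains all the (weak) limits of weakly convergent sequences contained in $A$. Recall that a set is said to be precompact when its closure is compact.

Proposition 3.8 (Left proximity operators).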
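Let $\theta \in \Gamma_0(X)$ and $y \in \operatorname{dom} T$. Suppose that $\operatorname{dom} S \cap \operatorname{dom} \theta \neq \varnothing$ and $(\operatorname{dom} S \cap \operatorname{dom} \theta) \times \{y\} \subseteq \operatorname{dom} D_T^{\sharp,h}$. Then the following hold:
(i) Suppose that $\varphi_\mu(\cdot) := \theta(\cdot) + \frac{1}{\mu} D_T^{\sharp,h}(\cdot, y)$ is coercive for some $\mu \in \mathbb{R}_{++}$. Then, for all $\gamma \in {]0,\mu]}$, $\varnothing \neq \overleftarrow{P}^{\sharp,h}_{\gamma,\theta}(y) \subseteq \operatorname{dom} S \cap \operatorname{dom} \theta$. If $Tz \subseteq Sz$ for all $z \in \operatorname{dom} T$, then
\[
P := \bigcup_{\gamma \in ]0,\mu]} \overleftarrow{P}^{\sharp,h}_{\gamma,\theta}(y) \subseteq \operatorname{lev}_{\le \theta(y)} \varphi_\mu. \tag{53}
\]
If, in addition, $y \in \operatorname{dom} \theta$, then $P$ is weakly precompact.
(ii) Suppose that $\theta$ is coercive. Then, for all $\gamma \in \mathbb{R}_{++}$, $\varnothing \neq \overleftarrow{P}^{\sharp,h}_{\gamma,\theta}(y) \subseteq \operatorname{dom} S \cap \operatorname{dom} \theta$. If $Tz \subseteq Sz$ for all $z \in \operatorname{dom} T$, then
\[
P := \bigcup_{\gamma \in \mathbb{R}_{++}} \overleftarrow{P}^{\sharp,h}_{\gamma,\theta}(y) \subseteq \operatorname{lev}_{\le \theta(y)} \theta. \tag{54}
\]
If, in addition, $y \in \operatorname{dom} \theta$, then $P$ is weakly precompact.

Proof. For each $\gamma \in \mathbb{R}_{++}$, set $\varphi_\gamma(\cdot) := \theta(\cdot) + \frac{1}{\gamma} D_T^{\sharp,h}(\cdot, y)$. It follows from $\operatorname{dom} S \cap \operatorname{dom} \theta \neq \varnothing$ and $(\operatorname{dom} S \cap \operatorname{dom} \theta) \times \{y\} \subseteq \operatorname{dom} D_T^{\sharp,h}$ that $\varphi_\gamma$ is proper. By the same argument as in [18, Lemma 3.17(b)], $D_T^{\sharp,h}(\cdot, y)$ is (strongly) lsc on $X$, and so is $\varphi_\gamma$. Since convexity of $\varphi_\gamma$ follows from that of $\theta$ and $D_T^{\sharp,h}(\cdot, y)$, we obtain that $\varphi_\gamma \in \Gamma_0(X)$.

(i): Let $\gamma \in {]0,\mu]}$. Then $\varphi_\gamma \ge \varphi_\mu$, and the coercivity of $\varphi_\gamma$ follows from the assumption that $\varphi_\mu$ is coercive. By combining with the fact that $\varphi_\gamma \in \Gamma_0(X)$ and using [29, Theorem 5.4.4], there exists $z_\gamma \in X$ such that $\varphi_\gamma(z_\gamma) = \min_{x \in X} \varphi_\gamma(x) < +\infty$. We obtain that $\theta(z_\gamma) + \frac{1}{\gamma} D_T^{\sharp,h}(z_\gamma, y) = \overleftarrow{\operatorname{env}}^{\sharp,h}_{\gamma,\theta}(y) < +\infty$, and so $\overleftarrow{P}^{\sharp,h}_{\gamma,\theta}(y) \neq \varnothing$. Now, take an arbitrary $s_\gamma \in \overleftarrow{P}^{\sharp,h}_{\gamma,\theta}(y)$. Then $\theta(s_\gamma) + \frac{1}{\gamma} D_T^{\sharp,h}(s_\gamma, y) = \overleftarrow{\operatorname{env}}^{\sharp,h}_{\gamma,\theta}(y) < +\infty$. Since $\theta(s_\gamma) > -\infty$ and $D_T^{\sharp,h}(s_\gamma, y) \ge 0$, we must have that $D_T^{\sharp,h}(s_\gamma, y) < +\infty$ and $\theta(s_\gamma) < +\infty$, which yield $s_\gamma \in \operatorname{dom} S \cap \operatorname{dom} \theta$. Hence, $\overleftarrow{P}^{\sharp,h}_{\gamma,\theta}(y) \subseteq \operatorname{dom} S \cap \operatorname{dom} \theta$.

Next, assume that $Tz \subseteq Sz$ for all $z \in \operatorname{dom} T$. We have from the rightmost inequality in (40a) of Proposition 3.5(i) that, for all $\gamma \in {]0,\mu]}$ and $s_\gamma \in \overleftarrow{P}^{\sharp,h}_{\gamma,\theta}(y)$,
\[
\theta(y) \ge \overleftarrow{\operatorname{env}}^{\sharp,h}_{\gamma,\theta}(y) = \theta(s_\gamma) + \frac{1}{\gamma} D_T^{\sharp,h}(s_\gamma, y) \ge \varphi_\mu(s_\gamma) = \theta(s_\gamma) + \frac{1}{\mu} D_T^{\sharp,h}(s_\gamma, y). \tag{55}
\]
Therefore,
\[
P = \bigcup_{\gamma \in ]0,\mu]} \overleftarrow{P}^{\sharp,h}_{\gamma,\theta}(y) \subseteq \operatorname{lev}_{\le \theta(y)} \varphi_\mu = \{ z \in X : \varphi_\mu(z) \le \theta(y) \}. \tag{56}
\]
Now, assume that $y \in \operatorname{dom} \theta$, so $\theta(y) < +\infty$. Since $\varphi_\mu$ is convex and lsc, it is weakly lsc. Moreover, $\varphi_\mu$ is coercive, so its level set $\operatorname{lev}_{\le \theta(y)} \varphi_\mu$ is bounded and weakly closed. By [12, Theorem 3.17], $\operatorname{lev}_{\le \theta(y)} \varphi_\mu$ is weakly compact. This directly implies that $\overline{P}^{w} \subseteq \operatorname{lev}_{\le \theta(y)} \varphi_\mu$. Being a weakly closed subset of a weakly compact set, $\overline{P}^{w}$ is also weakly compact. Hence, $P$ is weakly precompact.

(ii): Since $\theta$ is coercive, we have that $\varphi_\mu$ is also coercive for all $\mu \in \mathbb{R}_{++}$. The first conclusion follows from (i). Now, assume that $Tz \subseteq Sz$ for all $z \in \operatorname{dom} T$. Then, by the rightmost inequality in (40a) of Proposition 3.5(i), for all $\gamma \in \mathbb{R}_{++}$ and $s_\gamma \in \overleftarrow{P}^{\sharp,h}_{\gamma,\theta}(y)$,
\[
\theta(y) \ge \overleftarrow{\operatorname{env}}^{\sharp,h}_{\gamma,\theta}(y) = \theta(s_\gamma) + \frac{1}{\gamma} D_T^{\sharp,h}(s_\gamma, y) \ge \theta(s_\gamma). \tag{57}
\]
The rest of the proof is similar to that of (i). ∎

Proposition 3.9 (Right proximity operators).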
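Let $\theta \colon X \to \,]-\infty,+\infty]$ be proper and lsc, and let $x \in \operatorname{dom} S$. Suppose that $X$ is finite-dimensional, that $\operatorname{dom} T \cap \operatorname{dom} \theta \neq \varnothing$, that $\{x\} \times (\operatorname{dom} T \cap \operatorname{dom} \theta) \subseteq \operatorname{dom} D^{\star,h}_T$, and that $D^{\star,h}_T(x, \cdot)$ is lsc. Then the following hold:

(i) Suppose that $\varphi_\mu(\cdot) := \theta(\cdot) + \frac{1}{\mu} D^{\star,h}_T(x, \cdot)$ is coercive for some $\mu \in \mathbb{R}_{++}$. Then, for all $\gamma \in \,]0,\mu]$, $\varnothing \neq \overrightarrow{P}^{\star,h}_{\gamma,\theta}(x) \subseteq \operatorname{dom} T \cap \operatorname{dom} \theta$. If $Tz \subseteq Sz$ for all $z \in \operatorname{dom} S$, then
$$P := \bigcup_{\gamma \in ]0,\mu]} \overrightarrow{P}^{\star,h}_{\gamma,\theta}(x) \subseteq \operatorname{lev}_{\leq \theta(x)} \varphi_\mu. \tag{58}$$
If, in addition, $x \in \operatorname{dom} \theta$, then $P$ is bounded.

(ii) Suppose that $\theta$ is coercive. Then, for all $\gamma \in \mathbb{R}_{++}$, $\varnothing \neq \overrightarrow{P}^{\star,h}_{\gamma,\theta}(x) \subseteq \operatorname{dom} T \cap \operatorname{dom} \theta$. If $Tz \subseteq Sz$ for all $z \in \operatorname{dom} S$, then
$$P := \bigcup_{\gamma \in \mathbb{R}_{++}} \overrightarrow{P}^{\star,h}_{\gamma,\theta}(x) \subseteq \operatorname{lev}_{\leq \theta(x)} \theta. \tag{59}$$
If, in addition, $x \in \operatorname{dom} \theta$, then $P$ is bounded.

Proof. Arguing as in the proof of Proposition 3.8, for all $\gamma \in \,]0,\mu]$ in case (i) and for all $\gamma \in \mathbb{R}_{++}$ in case (ii), $\varphi_\gamma(\cdot) := \theta(\cdot) + \frac{1}{\gamma} D^{\star,h}_T(x, \cdot)$ is proper, lsc, and coercive. Since $\overrightarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(x) = \inf_{y \in X} \varphi_\gamma(y) < +\infty$, we can take a sequence $(y_n)_{n \in \mathbb{N}}$ in $X$ such that $\varphi_\gamma(y_n) \to \overrightarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(x)$ as $n \to +\infty$. We derive from the coercivity of $\varphi_\gamma$ that $(y_n)_{n \in \mathbb{N}}$ is bounded, and so there is a subsequence $(y_{k_n})_{n \in \mathbb{N}}$ converging to some $z_\gamma \in X$. As $\varphi_\gamma$ is lsc, $\varphi_\gamma(z_\gamma) \leq \liminf_{n \to +\infty} \varphi_\gamma(y_{k_n}) = \overrightarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(x) = \inf_{y \in X} \varphi_\gamma(y) \leq \varphi_\gamma(z_\gamma)$. Therefore, $\overrightarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(x) = \varphi_\gamma(z_\gamma) = \theta(z_\gamma) + \frac{1}{\gamma} D^{\star,h}_T(x, z_\gamma)$, which implies that $\overrightarrow{P}^{\star,h}_{\gamma,\theta}(x) \neq \varnothing$. Proceeding as in the proof of Proposition 3.8(i), we have $\overrightarrow{P}^{\star,h}_{\gamma,\theta}(x) \subseteq \operatorname{dom} T \cap \operatorname{dom} \theta$.

We now prove the second conclusion of (i). By the assumption that $Tz \subseteq Sz$ for all $z \in \operatorname{dom} S$, the rightmost inequality in (42a) of Proposition 3.5(ii) implies that, for all $\gamma \in \,]0,\mu]$ and $s_\gamma \in \overrightarrow{P}^{\star,h}_{\gamma,\theta}(x)$,
$$\theta(x) \geq \overrightarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(x) = \theta(s_\gamma) + \frac{1}{\gamma} D^{\star,h}_T(x, s_\gamma) \geq \varphi_\mu(s_\gamma) = \theta(s_\gamma) + \frac{1}{\mu} D^{\star,h}_T(x, s_\gamma). \tag{60}$$
We deduce that
$$P = \bigcup_{\gamma \in ]0,\mu]} \overrightarrow{P}^{\star,h}_{\gamma,\theta}(x) \subseteq \operatorname{lev}_{\leq \theta(x)} \varphi_\mu = \{ z \in X : \varphi_\mu(z) \leq \theta(x) \}. \tag{61}$$
Combining this fact with the coercivity of $\varphi_\mu$ and the additional assumption that $\theta(x) < +\infty$, we obtain the boundedness of $P$. The second conclusion of (ii) follows by the same argument as in the proof of Proposition 3.8(ii). ∎

4. Asymptotic behaviour properties

Moreau first considered the envelope that has come to bear his name in the setting of $\gamma = 1$ [27]. He was interested, in particular, in the characterization of infimal convolution as epigraph addition; see [28]. Attouch introduced the more general parameter $\gamma$ for regularizing convex functions [1, 2] and later with Wets for nonconvex functions [3]. When the regularized function is the sum of a convex objective function $\theta$ together with the indicator function for a constraint set, the Moreau envelope provides a smooth regularization with full domain. The behaviour as $\gamma \downarrow 0$ concerns the recovery of the regularized function $\theta$, while the asymptotic properties as $\gamma \uparrow +\infty$ shed light on other properties of the regularization. In what follows, we will analyse both.

We say that a maximally monotone operator $S$ is strictly monotone over a subset $B \subseteq \operatorname{dom} S$ if, for every $z, z' \in B$,
$$\langle z - z', w - w' \rangle = 0 \text{ with } w \in Sz,\ w' \in Sz' \quad \text{implies} \quad z = z'. \tag{62}$$
Because the proximity operators are generically set-valued operators, when their images are nonempty we will make use of selection operators $\overleftarrow{s}^{\star,h}_{\gamma,\theta}$, $\overrightarrow{s}^{\star,h}_{\gamma,\theta}$ that satisfy $\overleftarrow{s}^{\star,h}_{\gamma,\theta}(y) \in \overleftarrow{P}^{\star,h}_{\gamma,\theta}(y)$ and $\overrightarrow{s}^{\star,h}_{\gamma,\theta}(x) \in \overrightarrow{P}^{\star,h}_{\gamma,\theta}(x)$.

Theorem 4.1 (Asymptotic left behaviour when $\gamma \downarrow 0$). Let $\theta \in \Gamma(X)$ and let $y \in \operatorname{dom} T \cap \operatorname{dom} \theta$. Suppose that $\operatorname{dom} S \cap \operatorname{dom} \theta \neq \varnothing$, that $(\operatorname{dom} S \cap \operatorname{dom} \theta) \times \{y\} \subseteq \operatorname{dom} D^{\sharp,h}_T$, that $Tz \subseteq Sz$ for all $z \in \operatorname{dom} T$, and that $\varphi_\mu(\cdot) := \theta(\cdot) + \frac{1}{\mu} D^{\sharp,h}_T(\cdot, y)$ is coercive for some $\mu \in \mathbb{R}_{++}$. For each $\gamma \in \,]0,\mu]$, let $s_\gamma := \overleftarrow{s}^{\sharp,h}_{\gamma,\theta}(y) \in \overleftarrow{P}^{\sharp,h}_{\gamma,\theta}(y)$. Then the following hold:

(i) $D^{\sharp,h}_T(s_\gamma, y) \to 0$ as $\gamma \downarrow 0$.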
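(ii) For every weak cluster point $z$ of $(s_\gamma)_{\gamma \in ]0,\mu]}$ as $\gamma \downarrow 0$, it holds that $D^{\sharp,h}_T(z, y) = 0$, $Ty \subseteq Sz$, and $z \in \operatorname{lev}_{\leq \theta(y)} \theta$.

(iii) If $T = S$ and $S$ is strictly monotone over $\operatorname{dom} S \cap \operatorname{dom} \theta$, then, as $\gamma \downarrow 0$,
$$s_\gamma \rightharpoonup y, \quad \overleftarrow{\operatorname{env}}^{\sharp,h}_{\gamma,\theta}(y) \uparrow \theta(y), \quad \theta(s_\gamma) \to \theta(y), \quad \text{and} \quad \frac{1}{\gamma} D^{\sharp,h}_T(s_\gamma, y) \to 0. \tag{63}$$

Proof. By the rightmost inequality in (40a) of Proposition 3.5(i), for all $\gamma \in \,]0,\mu]$,
$$\theta(y) \geq \overleftarrow{\operatorname{env}}^{\sharp,h}_{\gamma,\theta}(y) = \theta(s_\gamma) + \frac{1}{\gamma} D^{\sharp,h}_T(s_\gamma, y) \geq \theta(s_\gamma). \tag{64}$$

(i): Since $\theta \in \Gamma(X)$, we can apply [16, Proposition 3.4.17] to conclude the existence of $u \in X^*$ and $\eta \in \mathbb{R}$ such that $\theta \geq \langle \cdot, u \rangle + \eta$ (equivalently, $\operatorname{dom} \theta^* \neq \varnothing$). Let $\rho \in \mathbb{R}_{+}$ be a bound on $(\|s_\gamma\|)_{\gamma \in ]0,\mu]}$; such a bound exists because, under the present assumptions, the set $P$ in Proposition 3.8(i) is weakly precompact and hence bounded. Using (64) and the Cauchy–Schwarz inequality, we have that, for all $\gamma \in \,]0,\mu]$,
$$\theta(y) \geq \theta(s_\gamma) + \frac{1}{\gamma} D^{\sharp,h}_T(s_\gamma, y) \geq \langle s_\gamma, u \rangle + \eta + \frac{1}{\gamma} D^{\sharp,h}_T(s_\gamma, y) \tag{65a}$$
$$\geq -\rho \|u\| + \eta + \frac{1}{\gamma} D^{\sharp,h}_T(s_\gamma, y), \tag{65b}$$
which re-arranges as
$$0 \leq D^{\sharp,h}_T(s_\gamma, y) \leq \gamma \left( \theta(y) + \rho \|u\| - \eta \right) \to 0 \quad \text{as } \gamma \downarrow 0. \tag{66}$$
Therefore, $D^{\sharp,h}_T(s_\gamma, y) \to 0$ as $\gamma \downarrow 0$.

(ii): Let $(s_{k(\gamma)})$ be a subnet of $(s_\gamma)_{\gamma \in ]0,\mu]}$ weakly converging to $z$ as $k(\gamma) \to 0$. Note that $D^{\sharp,h}_T(\cdot, y)$ and $\theta$ are weakly lsc since they are convex and lsc. Applying (66) to the subnet $(s_{k(\gamma)})$ and using the weak lsc of $D^{\sharp,h}_T(\cdot, y)$, we derive that
$$0 \leq D^{\sharp,h}_T(z, y) \leq \liminf_{k(\gamma) \downarrow 0} D^{\sharp,h}_T(s_{k(\gamma)}, y) \leq \liminf_{k(\gamma) \downarrow 0} k(\gamma) \left( \theta(y) + \rho \|u\| - \eta \right) = 0, \tag{67}$$
which yields $D^{\sharp,h}_T(z, y) = 0$. Therefore, $Ty \subseteq Sz$ due to the definition of $D^{\sharp,h}_T$ and property (c) of $h$. Next, by applying the first inequality in (64) to the subnet $(s_{k(\gamma)})$ and using the weak lsc of $\theta$,
$$\theta(y) \geq \liminf_{k(\gamma) \downarrow 0} \left( \theta(s_{k(\gamma)}) + \frac{1}{k(\gamma)} D^{\sharp,h}_T(s_{k(\gamma)}, y) \right) \geq \liminf_{k(\gamma) \downarrow 0} \theta(s_{k(\gamma)}) \geq \theta(z), \tag{68}$$
and so $z \in \operatorname{lev}_{\leq \theta(y)} \theta$.

(iii): Let $z$ be an arbitrary weak cluster point of $(s_\gamma)_{\gamma \in ]0,\mu]}$ as $\gamma \downarrow 0$. By assumption and (ii), $Ty \subseteq Tz$. Since $y \in \operatorname{dom} T$, taking $v \in Ty \subseteq Tz$, we have that $0 = \langle z - y, v - v \rangle$ with $v \in Tz$ and $v \in Ty$. The strict monotonicity of $T$ implies that $z = y$. We deduce that $s_\gamma \rightharpoonup y$ as $\gamma \downarrow 0$. Now, recall from Proposition 3.5(i) that $\overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(y) \uparrow \beta$ as $\gamma \downarrow 0$ for some $\beta \in \mathbb{R}$. Using (64) and the weak lsc of $\theta$, we obtain that
$$\theta(y) \geq \beta \geq \liminf_{\gamma \downarrow 0} \theta(s_\gamma) \geq \theta(y) \geq \limsup_{\gamma \downarrow 0} \theta(s_\gamma), \tag{69}$$
which yields $\beta = \theta(y) = \lim_{\gamma \downarrow 0} \theta(s_\gamma)$. Again using (64), this implies $\frac{1}{\gamma} D^{\sharp,h}_T(s_\gamma, y) \to 0$, and we are done. ∎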
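Theorem 4.2 (Asymptotic right behaviour when $\gamma \downarrow 0$). Let $\theta \colon X \to \,]-\infty,+\infty]$ be proper and lsc, and let $x \in \operatorname{dom} S \cap \operatorname{dom} \theta$. Suppose that $X$ is finite-dimensional, that $\operatorname{dom} T \cap \operatorname{dom} \theta \neq \varnothing$, that $\{x\} \times (\operatorname{dom} T \cap \operatorname{dom} \theta) \subseteq \operatorname{dom} D^{\star,h}_T$, that $D^{\star,h}_T(x, \cdot)$ is lsc, and that $\varphi_\mu(\cdot) := \theta(\cdot) + \frac{1}{\mu} D^{\star,h}_T(x, \cdot)$ is coercive for some $\mu \in \mathbb{R}_{++}$. For each $\gamma \in \,]0,\mu]$, let $s_\gamma := \overrightarrow{s}^{\star,h}_{\gamma,\theta}(x) \in \overrightarrow{P}^{\star,h}_{\gamma,\theta}(x)$. Then the following hold:

(i) $D^{\star,h}_T(x, s_\gamma) \to 0$ as $\gamma \downarrow 0$.

(ii) For every cluster point $z$ of $(s_\gamma)_{\gamma \in ]0,\mu]}$ as $\gamma \downarrow 0$, it holds that $D^{\star,h}_T(x, z) = 0$ and $z \in \operatorname{lev}_{\leq \theta(x)} \theta$.

(iii) If $T = S$ and $S$ is strictly monotone over $\operatorname{dom} S \cap \operatorname{dom} \theta$, then, as $\gamma \downarrow 0$,
$$s_\gamma \to x, \quad \overrightarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(x) \uparrow \theta(x), \quad \theta(s_\gamma) \to \theta(x), \quad \text{and} \quad \frac{1}{\gamma} D^{\star,h}_T(x, s_\gamma) \to 0. \tag{70}$$

Proof. This is proved similarly to Theorem 4.1 by using Proposition 3.5(ii). ∎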
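The conclusions of Theorem 4.1(iii) can be observed numerically in the classical Bregman case. The following Python sketch is only an illustration under stated assumptions: it takes $X = \mathbb{R}$, $\theta = |\cdot - 1/2|$, and the Kullback–Leibler distance $D(x,y) = x\ln(x/y) - x + y$ (the Bregman distance of the Boltzmann–Shannon entropy). The names `theta`, `kl`, `left_prox`, `left_env`, the search interval $[0,2]$, and the use of ternary search are hypothetical choices made for this example and are not taken from the paper.

```python
import math


def theta(x):
    """Objective being regularized (assumed for this sketch): |x - 1/2|."""
    return abs(x - 0.5)


def kl(x, y):
    """Kullback-Leibler distance x*ln(x/y) - x + y for x >= 0 and y > 0."""
    if x == 0.0:
        return y  # limit of x*ln(x/y) as x -> 0 is 0
    return x * math.log(x / y) - x + y


def left_prox(gamma, y, lo=0.0, hi=2.0, iters=200):
    """Approximate minimizer of x -> theta(x) + kl(x, y)/gamma on [lo, hi].

    The objective is convex in x, so ternary search is applicable.
    The interval [lo, hi] is an assumption that suits y in ]0, 1].
    """
    def phi(x):
        return theta(x) + kl(x, y) / gamma

    a, b = lo, hi
    for _ in range(iters):
        m1 = a + (b - a) / 3.0
        m2 = b - (b - a) / 3.0
        if phi(m1) <= phi(m2):
            b = m2
        else:
            a = m1
    return 0.5 * (a + b)


def left_env(gamma, y):
    """Value of the left envelope at y, evaluated at the computed minimizer."""
    s = left_prox(gamma, y)
    return theta(s) + kl(s, y) / gamma


def main():
    y = 0.2
    print("theta(y) =", theta(y))
    for gamma in (1.0, 0.1, 0.01, 0.001):
        s = left_prox(gamma, y)
        # Quantities appearing in (63): s_gamma, the envelope value,
        # theta(s_gamma), and D(s_gamma, y)/gamma.
        print(gamma, s, left_env(gamma, y), theta(s), kl(s, y) / gamma)


if __name__ == "__main__":
    main()
```

Running `main()` prints, for decreasing $\gamma$, the four quantities in (63); under the assumptions above one should see the minimizer approach $y$, the envelope value increase towards $\theta(y)$, and the scaled distance shrink.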
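It is worthwhile to connect these asymptotic results with previous ones in the literature. When $T = S = \nabla f$ for a strictly convex and differentiable $f \colon \mathbb{R}^j \to \mathbb{R}_\infty$, then $(Tz \subseteq Sz) \iff Tz = Sz$ and $(D_f(z,y) = 0) \iff z = y$. In this specific case, Theorems 4.1 and 4.2 recover both [7, Proposition 3.2] and [7, Theorem 3.3]. Of course, Theorems 4.1 and 4.2 are stronger. In addition to not requiring right convexity of the distance, these results also include envelopes that are not classical Bregman envelopes. We include several such examples of non-classical left envelopes in Figure 2 and of non-classical right envelopes in Figure 3.

In [7], the authors assume joint convexity and coercivity of $D_f$ to obtain non-emptiness of the right proximity operator images by [6, Proposition 3.5]; the latter result relies on the lower semicontinuity of the right distance as shown in [6, Lemma 2.6]. Thus our rather weak assumption that the right distance be lower semicontinuous is much less restrictive than the assumptions in [7].

Theorem 4.3 (Asymptotic left behaviour when $\gamma \uparrow +\infty$). Let $\theta \colon X \to \,]-\infty,+\infty]$, $y \in \operatorname{dom} T$, and $\gamma \in \mathbb{R}_{++}$. Suppose that $\operatorname{dom} S \cap \operatorname{dom} \theta \neq \varnothing$, that $(\operatorname{dom} S \cap \operatorname{dom} \theta) \times \{y\} \subseteq \operatorname{dom} D^{\star,h}_T$, and that $Tz \subseteq Sz$ for all $z \in \operatorname{dom} T$. Then the following hold:

(i) $\overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(y) \downarrow \inf \theta(\operatorname{dom} S)$ as $\gamma \uparrow +\infty$. Consequently, if $\inf \theta(\overline{\operatorname{dom} S}) = \inf \theta(\operatorname{dom} S)$, then $\overleftarrow{\operatorname{env}}^{\star,\dagger h}_{\gamma,\theta}(y) \downarrow \inf \theta(\operatorname{dom} S)$ as $\gamma \uparrow +\infty$.

(ii) Suppose that $\theta \in \Gamma(X)$ and $\theta$ is coercive. For each $\gamma \in \mathbb{R}_{++}$, let $s_\gamma := \overleftarrow{s}^{\sharp,h}_{\gamma,\theta}(y) \in \overleftarrow{P}^{\sharp,h}_{\gamma,\theta}(y)$. Then
$$\theta(s_\gamma) \to \inf \theta(\operatorname{dom} S) \quad \text{as } \gamma \uparrow +\infty. \tag{71}$$
Moreover, if $\inf \theta(\overline{\operatorname{dom} S}^{w}) = \inf \theta(\operatorname{dom} S)$, then all weak cluster points of $(s_\gamma)_{\gamma \in \mathbb{R}_{++}}$ as $\gamma \uparrow +\infty$ lie in $\operatorname{argmin} \theta(\overline{\operatorname{dom} S}^{w}) \neq \varnothing$. If additionally $\operatorname{argmin} \theta(\overline{\operatorname{dom} S}^{w})$ is a singleton, then $s_\gamma \to \operatorname{argmin} \theta(\overline{\operatorname{dom} S}^{w})$ as $\gamma \uparrow +\infty$.

(iii) Suppose that $\theta \in \Gamma(X)$, $\theta$ is coercive, and $\inf \theta(\overline{\operatorname{dom} S}) = \inf \theta(\operatorname{dom} S)$. For each $\gamma \in \mathbb{R}_{++}$, let $s_\gamma := \overleftarrow{s}^{\sharp,\dagger h}_{\gamma,\theta}(y) \in \overleftarrow{P}^{\sharp,\dagger h}_{\gamma,\theta}(y)$. Then
$$\theta(s_\gamma) \to \inf \theta(\operatorname{dom} S) \quad \text{as } \gamma \uparrow +\infty. \tag{72}$$
Moreover, if $\inf \theta(\overline{\operatorname{dom} S}^{w}) = \inf \theta(\operatorname{dom} S)$, then all weak cluster points of $(s_\gamma)_{\gamma \in \mathbb{R}_{++}}$ as $\gamma \uparrow +\infty$ lie in $\operatorname{argmin} \theta(\overline{\operatorname{dom} S}^{w}) \neq \varnothing$. If additionally $\operatorname{argmin} \theta(\overline{\operatorname{dom} S}^{w})$ is a singleton, then $s_\gamma \to \operatorname{argmin} \theta(\overline{\operatorname{dom} S}^{w})$ as $\gamma \uparrow +\infty$.

Proof. (i): In view of Proposition 3.5(i),
$$\overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(y) \downarrow \alpha \geq \inf \theta(\operatorname{dom} S) \quad \text{as } \gamma \uparrow +\infty. \tag{73}$$
By definition, for all $x \in \operatorname{dom} S \cap \operatorname{dom} \theta$,
$$\overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(y) \leq \theta(x) + \frac{1}{\gamma} D^{\star,h}_T(x, y). \tag{74}$$
Letting $\gamma \uparrow +\infty$ and using the assumption that $(\operatorname{dom} S \cap \operatorname{dom} \theta) \times \{y\} \subseteq \operatorname{dom} D^{\star,h}_T$, we derive that, for all $x \in \operatorname{dom} S \cap \operatorname{dom} \theta$, $\alpha \leq \theta(x)$, and so $\alpha \leq \inf \theta(\operatorname{dom} S \cap \operatorname{dom} \theta) = \inf \theta(\operatorname{dom} S)$. Combining with (73) implies that $\overleftarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(y) \downarrow \inf \theta(\operatorname{dom} S)$ as $\gamma \uparrow +\infty$. In turn, by invoking (40b), we get the second conclusion.

(ii): According to Proposition 3.8(ii), for all $\gamma \in \mathbb{R}_{++}$, $\varnothing \neq \overleftarrow{P}^{\sharp,h}_{\gamma,\theta}(y) \subseteq \operatorname{dom} S \cap \operatorname{dom} \theta$. Since $s_\gamma \in \overleftarrow{P}^{\sharp,h}_{\gamma,\theta}(y)$, we have that $s_\gamma \in \operatorname{dom} S \cap \operatorname{dom} \theta$, and so
$$\inf \theta(\operatorname{dom} S) \leq \theta(s_\gamma) \leq \theta(s_\gamma) + \frac{1}{\gamma} D^{\sharp,h}_T(s_\gamma, y) = \overleftarrow{\operatorname{env}}^{\sharp,h}_{\gamma,\theta}(y). \tag{75}$$
By combining with (i), $\theta(s_\gamma) \to \inf \theta(\operatorname{dom} S)$ as $\gamma \uparrow +\infty$.

Now, if $\inf \theta(\overline{\operatorname{dom} S}^{w}) = \inf \theta(\operatorname{dom} S)$, then we also have that $\theta(s_\gamma) \to \inf \theta(\overline{\operatorname{dom} S}^{w})$ as $\gamma \uparrow +\infty$. The conclusion follows from [7, Lemma 3.1].

(iii): Similar to Proposition 3.8(ii), we have that, for all $\gamma \in \mathbb{R}_{++}$, $\varnothing \neq \overleftarrow{P}^{\sharp,\dagger h}_{\gamma,\theta}(y) \subseteq \overline{\operatorname{dom} S} \cap \operatorname{dom} \theta$. Thus, $s_\gamma \in \overline{\operatorname{dom} S} \cap \operatorname{dom} \theta$ and
$$\inf \theta(\overline{\operatorname{dom} S}) \leq \theta(s_\gamma) \leq \theta(s_\gamma) + \frac{1}{\gamma} D^{\sharp,h}_T(s_\gamma, y) = \overleftarrow{\operatorname{env}}^{\sharp,\dagger h}_{\gamma,\theta}(y). \tag{76}$$
By assumption and (i), $\overleftarrow{\operatorname{env}}^{\sharp,\dagger h}_{\gamma,\theta}(y) \to \inf \theta(\operatorname{dom} S) = \inf \theta(\overline{\operatorname{dom} S})$ as $\gamma \uparrow +\infty$, and hence $\theta(s_\gamma) \to \inf \theta(\operatorname{dom} S)$ as $\gamma \uparrow +\infty$. Finally, proceeding as in (ii), we complete the proof. ∎

Theorem 4.4 (Asymptotic right behaviour when $\gamma \uparrow +\infty$). Let $\theta \colon X \to \,]-\infty,+\infty]$, $x \in \operatorname{dom} S$, and $\gamma \in \mathbb{R}_{++}$. Suppose that $\operatorname{dom} T \cap \operatorname{dom} \theta \neq \varnothing$, that $\{x\} \times (\operatorname{dom} T \cap \operatorname{dom} \theta) \subseteq \operatorname{dom} D^{\star,h}_T$, and that $Tz \subseteq Sz$ for all $z \in \operatorname{dom} S$. Then the following hold:

(i) $\overrightarrow{\operatorname{env}}^{\star,h}_{\gamma,\theta}(x) \downarrow \inf \theta(\operatorname{dom} T)$ as $\gamma \uparrow +\infty$. Consequently, if $\inf \theta(\overline{\operatorname{dom} T}) = \inf \theta(\operatorname{dom} T)$, then $\overrightarrow{\operatorname{env}}^{\star,\dagger h}_{\gamma,\theta}(x) \downarrow \inf \theta(\operatorname{dom} T)$ as $\gamma \uparrow +\infty$.

(ii) Suppose that $X$ is finite-dimensional, that $\theta$ is lsc and coercive, and that $D^{\star,h}_T(x, \cdot)$ is lsc. For each $\gamma \in \mathbb{R}_{++}$, let $s_\gamma := \overrightarrow{s}^{\star,h}_{\gamma,\theta}(x) \in \overrightarrow{P}^{\star,h}_{\gamma,\theta}(x)$. Then
$$\theta(s_\gamma) \to \inf \theta(\operatorname{dom} T) \quad \text{as } \gamma \uparrow +\infty. \tag{77}$$
Moreover, if $\inf \theta(\overline{\operatorname{dom} T}) = \inf \theta(\operatorname{dom} T)$, then all cluster points of $(s_\gamma)_{\gamma \in \mathbb{R}_{++}}$ as $\gamma \uparrow +\infty$ lie in $\operatorname{argmin} \theta(\overline{\operatorname{dom} T}) \neq \varnothing$. If additionally $\operatorname{argmin} \theta(\overline{\operatorname{dom} T})$ is a singleton, then $s_\gamma \to \operatorname{argmin} \theta(\overline{\operatorname{dom} T})$ as $\gamma \uparrow +\infty$.

(iii) Suppose that $X$ is finite-dimensional, that $\theta$ is lsc and coercive, and that $\inf \theta(\overline{\operatorname{dom} T}) = \inf \theta(\operatorname{dom} T)$. For each $\gamma \in \mathbb{R}_{++}$, let $s_\gamma := \overrightarrow{s}^{\star,\dagger h}_{\gamma,\theta}(x) \in \overrightarrow{P}^{\star,\dagger h}_{\gamma,\theta}(x)$. Then
$$\theta(s_\gamma) \to \inf \theta(\operatorname{dom} T) \quad \text{as } \gamma \uparrow +\infty. \tag{78}$$
Moreover, all cluster points of $(s_\gamma)_{\gamma \in \mathbb{R}_{++}}$ as $\gamma \uparrow +\infty$ lie in $\operatorname{argmin} \theta(\overline{\operatorname{dom} T}) \neq \varnothing$. If additionally $\operatorname{argmin} \theta(\overline{\operatorname{dom} T})$ is a singleton, then $s_\gamma \to \operatorname{argmin} \theta(\overline{\operatorname{dom} T})$ as $\gamma \uparrow +\infty$.

Proof. This is analogous to the proof of Theorem 4.3 and uses Proposition 3.5(ii) and Proposition 3.9(ii). We note for (iii) that $D^{\star,\dagger h}_T(x, \cdot)$ is lsc. ∎

Remark 4.5.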
We note that the condition $\inf \theta(\overline{\operatorname{dom} S}^{w}) = \inf \theta(\operatorname{dom} S)$ in Theorem 4.3 is satisfied as soon as either

(i) $\operatorname{dom} S$ is weakly closed, or

(ii) $\theta \in \Gamma(X)$, $\operatorname{dom} S$ is convex, and $\operatorname{int} \operatorname{dom} S \cap \operatorname{dom} \theta \neq \varnothing$.

Indeed, the former case is obvious, while the latter case follows from [5, Proposition 11.1(iv)] (whose proof is still valid in a Banach space). Analogous statements hold true for the finite-dimensional counterpart $\inf \theta(\overline{\operatorname{dom} T}) = \inf \theta(\operatorname{dom} T)$ in Theorem 4.4.

It is worthwhile to remark on the importance of the domain conditions imposed in the left Theorem 4.3(i)–(ii), and in the right Theorem 4.4(i)–(ii). In particular, the distance $D^{\dagger \sigma_{\log}}$ does not satisfy them for all $x, y \in \operatorname{dom} \log$, and we use it to derive envelopes for $\theta = |\cdot - 1/2|$ in Figures 2c and 3c that do not have the asymptotic properties (i)–(ii) for every $x, y \in \operatorname{dom} \log$. Of course, the domain conditions we have imposed in Proposition 3.7, Theorem 4.3, and Theorem 4.4 are still quite broad in what they cover. Corollary 4.6 will showcase how the corresponding asymptotic guarantees subsume those of [7, Proposition 2.2], while generalizing them to many other classes of distances and envelopes. In particular, we illustrate with the GBD $D^{\dagger F_{\log}}$ by constructing its generalized left and right envelopes in Figures 2a and 3a. It may be seen from the figures that these new envelopes respectively satisfy the asymptotic guarantees of Theorem 4.3(i)–(ii) and Theorem 4.4(i)–(ii).

Corollary 4.6.
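Let $f \in \Gamma(X)$ with $D := \operatorname{dom} \partial f$. Suppose that $T = S = \partial f$ and that $h \in \mathcal{H}(\partial f)$ satisfies
$$\forall (x, v) \in \operatorname{dom} f \times \operatorname{ran} \partial f, \quad h(x, v) \leq (f \oplus f^*)(x, v) = f(x) + f^*(v). \tag{79}$$
Let $\theta \colon X \to \mathbb{R}_\infty$ with $D \cap \operatorname{dom} \theta \neq \varnothing$, let $\gamma \in \mathbb{R}_{++}$, and let $x, y \in D$. Then the following hold:

(i) As $\gamma \uparrow +\infty$,
(a) $\overleftarrow{\operatorname{env}}^{\flat,h}_{\gamma,\theta}(y) \downarrow \inf \theta(D)$ and $\overrightarrow{\operatorname{env}}^{\flat,h}_{\gamma,\theta}(x) \downarrow \inf \theta(D)$;
(b) $\overleftarrow{\operatorname{env}}^{\flat,\dagger h}_{\gamma,\theta}(y) \downarrow \inf \theta(D)$ and $\overrightarrow{\operatorname{env}}^{\flat,\dagger h}_{\gamma,\theta}(x) \downarrow \inf \theta(D)$;
(c) Classic Bregman envelopes satisfy $\overleftarrow{\operatorname{env}}^{\flat}_{\gamma,\theta}(y) \downarrow \inf \theta(D)$ and $\overrightarrow{\operatorname{env}}^{\flat}_{\gamma,\theta}(x) \downarrow \inf \theta(D)$.

(ii) Suppose further that $\operatorname{dom} \partial f$ is open. Then, as $\gamma \uparrow +\infty$,
(a) $\overleftarrow{\operatorname{env}}^{\sharp,h}_{\gamma,\theta}(y) \downarrow \inf \theta(D)$ and $\overrightarrow{\operatorname{env}}^{\sharp,h}_{\gamma,\theta}(x) \downarrow \inf \theta(D)$;
(b) $\overleftarrow{\operatorname{env}}^{\sharp,\dagger h}_{\gamma,\theta}(y) \downarrow \inf \theta(D)$ and $\overrightarrow{\operatorname{env}}^{\sharp,\dagger h}_{\gamma,\theta}(x) \downarrow \inf \theta(D)$;
(c) Classic Bregman envelopes satisfy $\overleftarrow{\operatorname{env}}^{\sharp}_{\gamma,\theta}(y) \downarrow \inf \theta(D)$ and $\overrightarrow{\operatorname{env}}^{\sharp}_{\gamma,\theta}(x) \downarrow \inf \theta(D)$.

Proof. (i): By Lemma 2.3(i), $\operatorname{dom} D^{\flat,h}_{\partial f} = D \times D$, so $D^{\flat,h}_{\partial f}$ satisfies the conditions of Theorem 4.3(i) and Theorem 4.4(i), and we therefore get (i)(a) and (i)(b).

Now, since $y \in D = \operatorname{dom} \partial f$, in view of Proposition 2.1 and Definition 3.1, we have that $\overleftarrow{\operatorname{env}}^{\star}_{\gamma,\theta}(y) = \overleftarrow{\operatorname{env}}^{\star,(f \oplus f^*)}_{\gamma,\theta}(y)$ and $\overrightarrow{\operatorname{env}}^{\star}_{\gamma,\theta}(x) = \overrightarrow{\operatorname{env}}^{\star,(f \oplus f^*)}_{\gamma,\theta}(x)$. Thus (i)(c) follows by applying (i)(a) with $h = f \oplus f^*$.

(ii): As $\operatorname{dom} \partial f$ is open, Lemma 2.3(ii) implies that $\operatorname{dom} D^{\sharp,h}_{\partial f} = D \times D$. The proof is then completed by a similar argument as in (i). ∎

The results in Corollary 4.6(i)(c) and (ii)(c) for classical Bregman distances reduce to the ones in [7, Proposition 2.2] when $f$ is a differentiable convex function. The following corollary provides analogous extensions for the generalized proximity operators as well. We denote by $\overleftarrow{P}^{\star}_{\gamma,\theta}$, $\overrightarrow{P}^{\star}_{\gamma,\theta}$ the classical Bregman proximity operators associated to the classical Bregman envelopes $\overleftarrow{\operatorname{env}}^{\star}_{\gamma,\theta}$, $\overrightarrow{\operatorname{env}}^{\star}_{\gamma,\theta}$.

Corollary 4.7.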
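Let $f \in \Gamma(X)$ with $D := \operatorname{dom} \partial f$ open. Suppose that $T = S = \partial f$ and that $h \in \mathcal{H}(\partial f)$ satisfies
$$\forall (x, v) \in \operatorname{dom} f \times \operatorname{ran} \partial f, \quad h(x, v) \leq (f \oplus f^*)(x, v) = f(x) + f^*(v). \tag{80}$$
Let $\theta \in \Gamma(X)$ be coercive with $\operatorname{int} D \cap \operatorname{dom} \theta \neq \varnothing$, let $\gamma \in \mathbb{R}_{++}$, and let $y \in D$. Suppose that one of the following holds:

(i) For each $\gamma \in \mathbb{R}_{++}$, $s_\gamma := \overleftarrow{s}^{\sharp,h}_{\gamma,\theta}(y) \in \overleftarrow{P}^{\sharp,h}_{\gamma,\theta}(y)$;

(ii) For each $\gamma \in \mathbb{R}_{++}$, $s_\gamma := \overleftarrow{s}^{\sharp,\dagger h}_{\gamma,\theta}(y) \in \overleftarrow{P}^{\sharp,\dagger h}_{\gamma,\theta}(y)$;

(iii) For each $\gamma \in \mathbb{R}_{++}$, $s_\gamma := \overleftarrow{s}^{\sharp}_{\gamma,\theta}(y) \in \overleftarrow{P}^{\sharp}_{\gamma,\theta}(y)$.

Then $\theta(s_\gamma) \to \inf \theta(D)$ as $\gamma \uparrow +\infty$. Moreover, the net $(s_\gamma)_{\gamma \in \mathbb{R}_{++}}$ is bounded with all cluster points as $\gamma \uparrow +\infty$ lying in $\operatorname{argmin} \theta(\overline{D}^{w}) \neq \varnothing$. If additionally $\operatorname{argmin} \theta(\overline{D}^{w})$ is a singleton, then $s_\gamma \to \operatorname{argmin} \theta(\overline{D}^{w})$ as $\gamma \uparrow +\infty$.

Proof. We first have from Lemma 2.3(ii) that $\operatorname{dom} D^{\sharp,h}_{\partial f} = D \times D$. In view of Remark 4.5, $\inf \theta(\overline{D}^{w}) = \inf \theta(D)$. So, the conditions of Theorem 4.3 are satisfied, and we have the desired result in cases (i) and (ii).

As shown in the proof of Corollary 4.6(i), $\overleftarrow{\operatorname{env}}^{\star}_{\gamma,\theta}(y) = \overleftarrow{\operatorname{env}}^{\star,(f \oplus f^*)}_{\gamma,\theta}(y)$, which implies that $\overleftarrow{P}^{\star}_{\gamma,\theta} = \overleftarrow{P}^{\star,f \oplus f^*}_{\gamma,\theta}$. Therefore, the desired result in case (iii) follows from (i) with $h = f \oplus f^*$. ∎

Corollary 4.8.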
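Let $f \in \Gamma(X)$ with $D := \operatorname{dom} \partial f$. Suppose that $T = S = \partial f$ and that $h \in \mathcal{H}(\partial f)$ satisfies
$$\forall (x, v) \in \operatorname{dom} f \times \operatorname{ran} \partial f, \quad h(x, v) \leq (f \oplus f^*)(x, v) = f(x) + f^*(v). \tag{81}$$
Let $\theta \colon X \to \,]-\infty,+\infty]$ be lsc and coercive with $\operatorname{int} D \cap \operatorname{dom} \theta \neq \varnothing$, let $\gamma \in \mathbb{R}_{++}$, and let $x \in D$. Suppose that $X$ is finite-dimensional and that one of the following holds:

(i) $D^{\flat,h}_T(x, \cdot)$ is lsc and, for each $\gamma \in \mathbb{R}_{++}$, $s_\gamma := \overrightarrow{s}^{\flat,h}_{\gamma,\theta}(x) \in \overrightarrow{P}^{\flat,h}_{\gamma,\theta}(x)$;

(ii) For each $\gamma \in \mathbb{R}_{++}$, $s_\gamma := \overrightarrow{s}^{\flat,\dagger h}_{\gamma,\theta}(x) \in \overrightarrow{P}^{\flat,\dagger h}_{\gamma,\theta}(x)$;

(iii) $D^{\flat}_{\partial f}(x, \cdot)$ is lsc and, for each $\gamma \in \mathbb{R}_{++}$, $s_\gamma := \overrightarrow{s}^{\flat}_{\gamma,\theta}(x) \in \overrightarrow{P}^{\flat}_{\gamma,\theta}(x)$;

(iv) $\operatorname{dom} \partial f$ is open, $D^{\sharp,h}_T(x, \cdot)$ is lsc, and, for each $\gamma \in \mathbb{R}_{++}$, $s_\gamma := \overrightarrow{s}^{\sharp,h}_{\gamma,\theta}(x) \in \overrightarrow{P}^{\sharp,h}_{\gamma,\theta}(x)$;

(v) $\operatorname{dom} \partial f$ is open and, for each $\gamma \in \mathbb{R}_{++}$, $s_\gamma := \overrightarrow{s}^{\sharp,\dagger h}_{\gamma,\theta}(x) \in \overrightarrow{P}^{\sharp,\dagger h}_{\gamma,\theta}(x)$;

(vi) $\operatorname{dom} \partial f$ is open, $D^{\sharp}_{\partial f}(x, \cdot)$ is lsc, and, for each $\gamma \in \mathbb{R}_{++}$, $s_\gamma := \overrightarrow{s}^{\sharp}_{\gamma,\theta}(x) \in \overrightarrow{P}^{\sharp}_{\gamma,\theta}(x)$.

Then $\theta(s_\gamma) \to \inf \theta(D)$ as $\gamma \uparrow +\infty$. Moreover, the net $(s_\gamma)_{\gamma \in \mathbb{R}_{++}}$ is bounded with all cluster points as $\gamma \uparrow +\infty$ lying in $\operatorname{argmin} \theta(\overline{D}) \neq \varnothing$. If additionally $\operatorname{argmin} \theta(\overline{D})$ is a singleton, then $s_\gamma \to \operatorname{argmin} \theta(\overline{D})$ as $\gamma \uparrow +\infty$.

Proof. The proof is similar to Corollary 4.7, using Lemma 2.3 and Theorem 4.4. ∎

[Figure 2: Left envelopes for representative functions of the logarithm. Panels: (a) $\overleftarrow{\operatorname{env}}^{\dagger F_{\log}}_{\gamma,\theta}$; (b) $\overleftarrow{\operatorname{env}}^{\dagger \operatorname{ent} \oplus \operatorname{ent}^*}_{\gamma,\theta}$; (c) $\overleftarrow{\operatorname{env}}^{\dagger \sigma_{\log}}_{\gamma,\theta}$; (d) key.]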
When $f$ is differentiable, the results for the classical Bregman envelopes given in Corollaries 4.7 and 4.8 generalize those in [7, Theorem 3.5].

Examples of envelopes and proximity operators
Theorems 4.1–4.4 and Corollaries 4.6, 4.7, and 4.8 are important for several reasons. They reveal that some of the asymptotic results provided in [7, Propositions 2.2 & 3.2, Theorems 3.3 & 3.5] for Bregman envelopes are only a special case of a class of results that hold for envelopes constructed from representative functions for maximally monotone operators. We have also clarified the domain conditions in the exposition of [7, Propositions 2.2 & 3.2], namely, that the asymptotic results are, of course, restricted to the domain of the distance operator.

We illustrate these connections with the envelopes that correspond to the distances $D^{F_{\log}}$, $D^{\operatorname{ent} \oplus \operatorname{ent}^*}$, and $D^{\sigma_{\log}}$ from Examples 2.8, 2.7, and 2.9 respectively, which are illustrated in Figure 1. The derivation of these distances may be found in [13]. The left envelopes are shown in Figure 2, and the right envelopes are shown in Figure 3. For both figures, we use the closed versions of the distances. The explicit forms for the envelopes and their proximity operators are given in Appendix A.

[Figure 3: Right envelopes for representative functions of the logarithm. Panels: (a) $\overrightarrow{\operatorname{env}}^{\dagger F_{\log}}_{\gamma,\theta}$; (b) $\overrightarrow{\operatorname{env}}^{\dagger \operatorname{ent} \oplus \operatorname{ent}^*}_{\gamma,\theta}$; (c) $\overrightarrow{\operatorname{env}}^{\dagger \sigma_{\log}}_{\gamma,\theta}$; (d) key.]

Let us first discuss how these examples illustrate Theorems 4.1 & 4.2. As the parameter $\gamma \to 0$, the envelopes recover $\theta$, the Legendre function being regularized. This is, at first, less clear when the representative function employed is $\sigma_{\log}$ as in Figures 2c and 3c, and so some clarification is in order. For any $\gamma \in \,]0,+\infty[$, the function $\overleftarrow{\operatorname{env}}^{\dagger \sigma_{\log}}_{\gamma,\theta}$ is exactly equal to $\theta$ on $[1/2, 1]$, and $\overrightarrow{\operatorname{env}}^{\dagger \sigma_{\log}}_{\gamma,\theta}$ is exactly equal to $\theta$ on $[0, 1/2]$, so the convergence as $\gamma \to 0$ only has to be checked on $[0, 1/2]$ in Figure 2c and on $[1/2, 1]$ in Figure 3c. The reason for this may be found by scrutinizing the distance $D^{\sigma_{\log}}$ in (30) and in Figure 1c, and observing that the distance takes the value infinity whenever the right variable is greater than the left. For this reason, the sum $\theta(\cdot) + D^{\sigma_{\log}}(\cdot, y)$ is minimized at $y$ for $y \geq 1/2$, while the sum $\theta(\cdot) + D^{\sigma_{\log}}(x, \cdot)$ is minimized at $x$ for $x \leq 1/2$.

The distance $D^{\sigma_{\log}}$ is also instrumental in understanding Theorems 4.3 & 4.4 and Corollary 4.6. For the reasons we have just discussed, the condition $\theta(\overleftarrow{s}^{\dagger \sigma_{\log}}_{\gamma,\theta}(y)) \to \inf \theta(U)$ as $\gamma \uparrow +\infty$ fails to hold for $y \in \,]1/2, 1]$. Likewise, for $y \in \,]1/2, 1]$ the condition $\overleftarrow{\operatorname{env}}^{\dagger \sigma_{\log}}_{\gamma,\theta}(y) \downarrow \inf \theta(U)$ as $\gamma \uparrow +\infty$ does not hold. An analogous situation arises for $\overrightarrow{\operatorname{env}}^{\dagger \sigma_{\log}}_{\gamma,\theta}(x)$ in the case of $x \in [0, 1/2[$. The distance $D^{\sigma_{\mathrm{Id}}} = \iota_{G(\mathrm{Id})}$ is an even more dramatic case where the GBD envelope does not asymptotically approach $\inf \theta(U)$.

On the other hand, for $y \in [0, 1/2]$ we have that $\theta(\overleftarrow{s}^{\dagger \sigma_{\log}}_{\gamma,\theta}(y)) \to \inf \theta(U)$ and $\overleftarrow{\operatorname{env}}^{\dagger \sigma_{\log}}_{\gamma,\theta}(y) \downarrow \inf \theta(U)$ as $\gamma \uparrow +\infty$. Similarly, for $x \in [1/2, 1]$ we have that $\theta(\overrightarrow{s}^{\dagger \sigma_{\log}}_{\gamma,\theta}(x)) \to \inf \theta(U)$ and $\overrightarrow{\operatorname{env}}^{\dagger \sigma_{\log}}_{\gamma,\theta}(x) \downarrow \inf \theta(U)$ as $\gamma \uparrow +\infty$.

The loss of some desirable asymptotic properties for the largest member of $\mathcal{H}(\partial f)$ highlights the advantage of Corollary 4.7, which assures us that $\overleftarrow{\operatorname{env}}^{\dagger F_{\log}}_{\gamma,\theta}(y) \downarrow \inf \theta(U)$ and $\overrightarrow{\operatorname{env}}^{\dagger F_{\log}}_{\gamma,\theta}(x) \downarrow \inf \theta(U)$ as $\gamma \uparrow +\infty$, because $F_{\log} \leq \operatorname{ent} \oplus \operatorname{ent}^*$. We see this property illustrated in Figures 2a and 3a. Comparing with Figures 2b and 3b, we can also see the essential property from the proof of Corollary 4.7: that envelopes built from smaller representative functions than the Fenchel–Young representative are majorized thereby.

Finally, from a theoretical standpoint, the case of $\overrightarrow{P}^{\dagger F_{\log}}_{\gamma,\theta}$ is illustrative of why we had to employ selection operators in our analysis of the proximity operators, because it is set-valued for $x = 0$ and $e = 1/\gamma$.

5. Conclusion

In Section 2, we recalled the theory of GBDs [18] and the coercivity framework established in [13]. We also introduced the specific computed distances from [13] that we have used to build the envelopes and proximity operators in the current exposition, along with an explanation of why they are natural distances to consider. In Section 3, we introduced the left and right envelopes for the GBDs, along with their associated proximity operators. In Section 4, we provided a selection of asymptotic results. The examples in Section 4 illustrate how the results in the setting of GBDs vary from those we obtain more easily when specializing to Bregman distances.

Our analysis also yields results on the Bregman case when specializing thereto by using the Fenchel–Young representative. Pleasingly, the desirable asymptotic properties for Bregman distances extend to GBDs constructed from Fitzpatrick representatives, which suggests that such distances may be a useful subject for specialized investigation.
Many important optimization algorithms may be studied as special cases of gradient descent applied to envelopes; now that a sufficient coercivity framework has been developed [13] and distances with desirable asymptotic envelope properties identified, two natural follow-up questions present themselves. The first is: excluding the already known Bregman cases, are there useful descent algorithms (discovered or undiscovered) whose analysis may fit within such a framework? The second is: are there previously unknown GBDs (for example, GBDs constructed from Fitzpatrick representatives) whose forms admit computational advantages over their Bregman counterparts?
Acknowledgements
Part of this work was done during MND's visit to the University of South Australia in 2018, and he gratefully acknowledges its hospitality. SBL was supported by an Australian Mathematical Society Lift-Off Fellowship, and by Hong Kong Research Grants Council PolyU153085/16p.
A. Appendix: Closed forms for envelopes and proximity operators
For all our examples, we use the closed distances as described in Remark 2.5. The function denoted W is the principal branch of the Lambert W function, whose occurrences in variational analysis hasbeen discussed in, for example, [8, 10, 23]. Example A.1 (Left prox and envelope for D F log ). Beginning with the smallest member of H (log), we first consider the left envelope and corresponding proximity operator characterized by ←− env † F log γ,θ ( x ) = inf y ∈ R + { θ ( y ) + 1 γ D F log ( y, x ) } = θ ( ←− P † F log γ,θ ( x )) + 1 γ D F log (cid:16) ←− P † F log γ,θ ( x ) , x (cid:17) , where F log is given in [9, Example 3.6] and D F log is in (28). We have that ←− P † F log γ,θ ( x ) = (cid:16) γ + 1 (cid:17) e γ γx if x ≤ e − γ γ ; (cid:16) γ − (cid:17) e − γ γx if e γ − γ < x ; otherwise , ←− env † F log γ,θ ( x ) = − e γ x if x ≤ e − γ γ ; γ (cid:16)(cid:16) γ − (cid:17) e − γ x + e γ x (cid:17) − if x > e γ − γ ;( W ( e x ) − ) γ W ( e x ) otherwise . The envelope is shown in Figure 2a.
Example A.2 (Right prox and envelope for D F log ). We next consider the right envelope andcorresponding proximity operator characterized by −→ env † F log γ,θ ( x ) = inf y ∈ R + { θ ( y ) + 1 γ D F log ( x, y ) } = θ ( −→ s † F log γ,θ ( x )) + 1 γ D F log (cid:16) x, −→ s † F log γ,θ ( x ) (cid:17) , where −→ s † F log γ,θ ( x ) ∈ −→ P † F log γ,θ ( x ) . We use the selection operator −→ s F log γ,θ because, in this case, the prox operator −→ P F log γ,θ is set-valued: −→ P † F log γ,θ ( x ) = x = 0 and e < γ ;[0 , ] if x = 0 and γ = e ; if x = 0 and γ < e ; − γx W (cid:0) − γ (cid:1) W (cid:0) − γ (cid:1) +1 if 0 < x < − γ ( W ( − γ )+1)2 W ( − γ ) and e < γ ; γx W (cid:0) γ (cid:1) W (cid:0) γ (cid:1) +1 if γ ( W ( γ )+1)2 W ( γ ) < x ; otherwise . The corresponding right envelope is −→ env † F log γ,θ ( x ) = if x = 0 and e ≤ γ ; γe if x = 0 and γ < e ; x W ( γ ) γ ( W ( γ )+1) − + x (cid:0) W (cid:0) eγ ( W ( γ )+1) W ( γ ) (cid:1) − (cid:1) γ W (cid:0) eγ ( W ( γ )+1) W ( γ ) (cid:1) if γ ( W ( γ )+1)2 W ( γ ) < x ; x W ( − γ ) γ ( W ( − γ )+1) + + x (cid:0) W (cid:0) − eγ ( W ( − γ )+1) W ( − γ ) (cid:1) − (cid:1) γ W (cid:0) − eγ ( W ( − γ )+1) W ( − γ ) (cid:1) if 0 < x < − γ ( W ( − γ )+1)2 W ( − γ ) and e < γ ; x ( W (2 xe ) − γ W (2 xe ) otherwise . The envelope is shown in Figure 3a.
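Example A.3 (Left prox and envelope for $D^{\sigma_{\log}}$). Turning to the biggest of the representative functions for the logarithm, we next consider the left envelope and corresponding proximity operator characterized by
$$\overleftarrow{\operatorname{env}}^{\dagger \sigma_{\log}}_{\gamma,\theta}(x) = \inf_{y \in \mathbb{R}_+} \left\{ \theta(y) + \frac{1}{\gamma} D^{\sigma_{\log}}(y, x) \right\} = \theta\left(\overleftarrow{P}^{\dagger \sigma_{\log}}_{\gamma,\theta}(x)\right) + \frac{1}{\gamma} D^{\sigma_{\log}}\left(\overleftarrow{P}^{\dagger \sigma_{\log}}_{\gamma,\theta}(x), x\right),$$
where $D^{\sigma_{\log}}$ is as in (30). We have that
$$\overleftarrow{P}^{\dagger \sigma_{\log}}_{\gamma,\theta}(x) = \begin{cases} \dfrac{x}{e^{-\gamma+1}} & \text{if } x \leq \dfrac{e^{-\gamma+1}}{2} \text{ and } 1 \leq \gamma; \\[1ex] x & \text{if } x \geq \dfrac{1}{2} \text{ or } \gamma < 1; \\[1ex] \dfrac{1}{2} & \text{otherwise}. \end{cases}$$
The corresponding envelope is
$$\overleftarrow{\operatorname{env}}^{\dagger \sigma_{\log}}_{\gamma,\theta}(x) = \begin{cases} \dfrac{1}{2} & \text{if } x = 0; \\[1ex] x - \dfrac{1}{2} & \text{if } x \geq \dfrac{1}{2}; \\[1ex] \dfrac{1}{2} - x & \text{if } x < \dfrac{1}{2} \text{ and } \gamma < 1; \\[1ex] -\dfrac{x}{\gamma} e^{\gamma-1} \left( -\log\left(x e^{\gamma-1}\right) + \gamma + \log(x) \right) + \dfrac{1}{2} & \text{if } x \leq \dfrac{e^{-\gamma+1}}{2} \text{ and } 1 \leq \gamma; \\[1ex] -\dfrac{1}{2\gamma} \log(2x) & \text{otherwise}. \end{cases}$$
The envelope is shown in Figure 2c.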
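Example A.4 (Right prox and envelope for $D^{\sigma_{\log}}$). We consider the right envelope and corresponding proximity operator characterized by
$$\overrightarrow{\operatorname{env}}^{\dagger \sigma_{\log}}_{\gamma,\theta}(x) = \inf_{y \in \mathbb{R}_+} \left\{ \theta(y) + \frac{1}{\gamma} D^{\sigma_{\log}}(x, y) \right\} = \theta\left(\overrightarrow{P}^{\dagger \sigma_{\log}}_{\gamma,\theta}(x)\right) + \frac{1}{\gamma} D^{\sigma_{\log}}\left(x, \overrightarrow{P}^{\dagger \sigma_{\log}}_{\gamma,\theta}(x)\right).$$
The corresponding proximity operator is
$$\overrightarrow{P}^{\dagger \sigma_{\log}}_{\gamma,\theta}(x) = \begin{cases} \dfrac{1}{2} & \text{if } \dfrac{1}{2} < x \text{ and } x \leq \dfrac{\gamma}{2}; \\[1ex] \dfrac{x}{\gamma} & \text{if } 1 \leq \gamma \text{ and } \dfrac{\gamma}{2} < x; \\[1ex] x & \text{otherwise}, \end{cases}$$
while the corresponding envelope is
$$\overrightarrow{\operatorname{env}}^{\dagger \sigma_{\log}}_{\gamma,\theta}(x) = \begin{cases} \dfrac{1}{2} - x & \text{if } x \leq \dfrac{1}{2}; \\[1ex] x - \dfrac{1}{2} & \text{if } \dfrac{1}{2} < x \text{ and } \gamma \leq 1; \\[1ex] -\dfrac{1}{2} + \dfrac{x}{\gamma} \left( 1 + \log(\gamma) \right) & \text{if } 1 \leq \gamma \text{ and } \dfrac{\gamma}{2} < x; \\[1ex] \dfrac{x}{\gamma} \log(2x) & \text{otherwise}. \end{cases}$$
The envelope is shown in Figure 3c.

The operators in Example A.5 (and their computation) may be found in [7], with the minor modification that here we are computing with the lower closure of the distance and so obtain closed forms which differ at zero. We include them here for their comparison with the new GBDs for the logarithm.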
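Example A.5 (Proxes and envelopes for $D^{\operatorname{ent} \oplus \operatorname{ent}^*}$). We next consider the case when $D^{\operatorname{ent} \oplus \operatorname{ent}^*}$ is the closed GBD for the Fenchel–Young representative of $\log$ (26), a case whose relationship to the Bregman distance for the Boltzmann–Shannon entropy is discussed in Example 2.7. The left proximity operator and envelope are given by
$$\overleftarrow{P}^{\dagger \operatorname{ent} \oplus \operatorname{ent}^*}_{\gamma,\theta}(y) = \begin{cases} y \exp(\gamma), & \text{if } 0 \leq y < \dfrac{\exp(-\gamma)}{2}; \\[1ex] y \exp(-\gamma), & \text{if } y > \dfrac{\exp(\gamma)}{2}; \\[1ex] \dfrac{1}{2}, & \text{otherwise}, \end{cases}$$
$$\overleftarrow{\operatorname{env}}^{\dagger \operatorname{ent} \oplus \operatorname{ent}^*}_{\gamma,\theta}(y) = \begin{cases} \dfrac{y (1 - e^{\gamma})}{\gamma} + \dfrac{1}{2}, & \text{if } 0 \leq y < \dfrac{\exp(-\gamma)}{2}; \\[1ex] \dfrac{y (1 - e^{-\gamma})}{\gamma} - \dfrac{1}{2}, & \text{if } \dfrac{\exp(\gamma)}{2} < y; \\[1ex] \dfrac{1}{\gamma} \left( y - \dfrac{\ln(y)}{2} - \dfrac{1}{2} - \dfrac{\ln(2)}{2} \right), & \text{otherwise}. \end{cases}$$
For details, see [7, Example 4.1(ii)]. The envelope is shown in Figure 2b. The right proximity operator and envelope are given by
$$\overrightarrow{P}^{\dagger \operatorname{ent} \oplus \operatorname{ent}^*}_{\gamma,\theta}(x) = \begin{cases} \dfrac{x}{1 - \gamma}, & \text{if } 0 \leq x < \dfrac{1 - \gamma}{2}; \\[1ex] \dfrac{x}{1 + \gamma}, & \text{if } x > \dfrac{1 + \gamma}{2}; \\[1ex] \dfrac{1}{2}, & \text{otherwise}, \end{cases}$$
$$\overrightarrow{\operatorname{env}}^{\dagger \operatorname{ent} \oplus \operatorname{ent}^*}_{\gamma,\theta}(x) = \begin{cases} \dfrac{\ln(1 - \gamma)}{\gamma} x + \dfrac{1}{2}, & \text{if } 0 \leq x < \dfrac{1 - \gamma}{2}; \\[1ex] \dfrac{\ln(1 + \gamma)}{\gamma} x - \dfrac{1}{2}, & \text{if } x > \dfrac{1 + \gamma}{2}; \\[1ex] \dfrac{1}{\gamma} \left( x \ln(2x) - x + \dfrac{1}{2} \right), & \text{otherwise}. \end{cases}$$
For details, see [7, Example 4.1(ii)]. The envelope is shown in Figure 3b.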
References

[1] H. Attouch, Convergence de fonctions convexes, des sous-différentiels et semi-groupes associés, Comptes Rendus de l'Académie des Sciences de Paris :539–542, 1977.
[2] H. Attouch, Variational Convergence for Functions and Operators, Pitman, 1984.
[3] H. Attouch and R.J-B. Wets, Approximation and convergence in nonlinear optimization, in Nonlinear Programming 4, edited by O. Mangasarian, R. Meyer, and S. Robinson, 367–394, Academic Press, New York, 1983.
[4] H.H. Bauschke and J.M. Borwein, Legendre functions and the method of random Bregman projections, Journal of Convex Analysis (1):27–67, 1997.
[5] H.H. Bauschke and P.L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, second edition, Springer, 2017.
[6] H.H. Bauschke, P.L. Combettes, and D. Noll, Joint minimization with alternating Bregman proximity operators, Pacific Journal of Optimization (3):401–424, 2006.
[7] H.H. Bauschke, M.N. Dao, and S.B. Lindstrom, Regularizing with Bregman–Moreau envelopes, SIAM Journal on Optimization (4):3208–3228, 2018.
[8] H.H. Bauschke and S.B. Lindstrom, Proximal averages for minimization of entropy functionals, Pure and Applied Functional Analysis (3):505–531, 2020.
[9] H.H. Bauschke, D.A. McLaren, and H.S. Sendov, Fitzpatrick functions: inequalities, examples, and remarks on a problem by S. Fitzpatrick, Journal of Convex Analysis (3–4):499–523, 2006.
[10] J.M. Borwein and S.B. Lindstrom, Meetings with Lambert W and other special functions in optimization and analysis, Pure and Applied Functional Analysis (3):361–396, 2017.
[11] L.M. Bregman, The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming, USSR Computational Mathematics and Mathematical Physics (3):200–217, 1967.
[12] H. Brezis, Functional Analysis, Sobolev Spaces and Partial Differential Equations, Springer, Berlin, 2011.
[13] R.S. Burachik, M.N. Dao, and S.B. Lindstrom, The generalized Bregman distance, to appear in SIAM Journal on Optimization (1):404–424, 2021.
[14] R.S. Burachik and J. Dutta, Inexact proximal point methods for variational inequality problems, SIAM Journal on Optimization (5):2653–2678, 2010.
[15] R.S. Burachik and A.N. Iusem, A generalized proximal point algorithm for the variational inequality problem in a Hilbert space, SIAM Journal on Optimization (1):197–216, 1998.
[16] R.S. Burachik and A.N. Iusem, Set-Valued Mappings and Enlargements of Monotone Operators, Springer, Berlin, 2008.
[17] R. Burachik and G. Kassay, On a generalized proximal point method for solving equilibrium problems in Banach spaces, Nonlinear Analysis (18):6456–6464, 2012.
[18] R.S. Burachik and J.E. Martínez-Legaz, On Bregman-type distances for convex functions and maximally monotone operators, Set-Valued and Variational Analysis (2):369–384, 2018.
[19] R.S. Burachik and B.F. Svaiter, Maximal monotone operators, convex functions and a special family of enlargements, Set-Valued Analysis (4):297–316, 2002.
[20] C. Byrne and Y. Censor, Proximity function minimization using multiple Bregman projections, with applications to split feasibility and Kullback–Leibler distance minimization, Annals of Operations Research (1–4):77–98, 2001.
[21] Y. Censor and S.A. Zenios, Proximal minimization algorithm with D-functions, Journal of Optimization Theory and Applications (3):451–464, 1992.
[22] G. Chen and M. Teboulle, Convergence analysis of a proximal-like minimization algorithm using Bregman functions, SIAM Journal on Optimization (3):538–543, 1993.
[23] R.M. Corless, G.H. Gonnet, D.E. Hare, D.J. Jeffrey, and D.E. Knuth, On the Lambert W function, Advances in Computational Mathematics (1):329–359, 1996.
[24] J. Eckstein, Nonlinear proximal point algorithms using Bregman functions, with applications to convex programming, Mathematics of Operations Research (1):202–226, 1993.
[25] S. Fitzpatrick, Representing monotone operators by convex functions, Functional Analysis and Optimization, Workshop and Miniconference, Australia, Proc. Center Math. Anal., Australian Nat. Univ. :59–65, 1988.
[26] K.C. Kiwiel, Proximal minimization methods with generalized Bregman functions, SIAM Journal on Control and Optimization (4):1142–1168, 1997.
[27] J.-J. Moreau, Fonctions convexes duales et points proximaux dans un espace hilbertien, Comptes Rendus de l'Académie des Sciences :2897–2899, 1962.
[28] R.T. Rockafellar and R.J.B. Wets,
Variational analysis , Springer, 2009.[29] W. Schirotzek,