Lp-norms, Log-barriers and Cramer transform in Optimization
J.B. LASSERRE AND E.S. ZERON
Abstract.
We show that the Laplace approximation of a supremum by $L_p$-norms has interesting consequences in optimization. For instance, the logarithmic barrier functions (LBF) of a primal convex problem $P$ and its dual $P^*$ appear naturally when using this simple approximation technique for the value function $g$ of $P$ or its Legendre-Fenchel conjugate $g^*$. In addition, minimizing the LBF of the dual $P^*$ is just evaluating the Cramer transform of the Laplace approximation of $g$. Finally, this technique sometimes makes it possible to define an explicit dual problem $P^*$ in cases when the Legendre-Fenchel conjugate $g^*$ cannot be derived explicitly from its definition.

1. Introduction
Let $f : X \to \mathbb{R}$ and $\omega : X \to \mathbb{R}^m$ be a pair of continuous mappings defined on the convex cone $X \subseteq \mathbb{R}^n$. Consider the function $g : \mathbb{R}^m \to \mathbb{R} \cup \{-\infty\}$ given by the formula:
\[ y \mapsto g(y) := \sup_x \{ f(x) : \omega(x) \le y,\ x \in X \}. \tag{1.1} \]
For each fixed $y \in \mathbb{R}^m$, computing $g(y)$ is solving the optimization problem
\[ P : \quad \sup_x \{ f(x) : \omega(x) \le y,\ x \in X \}, \tag{1.2} \]
and $g$ is called the value function associated with $P$. The value function $g$ provides a systematic way to generate a dual problem $P^*$ via its Legendre-Fenchel conjugate, denoted $g^* : \mathbb{R}^m \to \mathbb{R} \cup \{-\infty\}$. In the concave version (i.e. when $g$ and $g^*$ are concave instead of convex), the Legendre-Fenchel conjugate $g^*$ is defined by
\[ \lambda \mapsto g^*(\lambda) := \inf_{y \in \mathbb{R}^m} \{ \lambda' y - g(y) \}, \tag{1.3} \]
and is finite on some domain $\mathcal{D} \subset \mathbb{R}^m_+$. Then a dual problem is defined by:
\[ P^* : \quad \tilde g(y) := (g^*)^*(y) = \inf_\lambda \{ \lambda' y - g^*(\lambda) \}. \tag{1.4} \]
Of course, one has the property $\tilde g(y) \ge g(y)$ because from
\[ g^*(\lambda) = \inf_x \{ \lambda' x - g(x) \} \le \lambda' y - g(y), \]
one may deduce
\[ \tilde g(y) = (g^*)^*(y) = \inf_\lambda \{ \lambda' y - g^*(\lambda) \} \ge \inf_\lambda \{ \lambda' y + g(y) - \lambda' y \} = g(y). \]
Moreover, notice that:
\[
\begin{aligned}
\tilde g(y) &= \inf_\lambda \{ \lambda' y - g^*(\lambda) \} = \inf_\lambda \Big\{ \lambda' y + \sup_z \{ g(z) - \lambda' z \} \Big\} \\
&= \inf_\lambda \Big\{ \lambda' y + \sup_z \Big\{ \sup_{x \in X} \{ f(x) : \omega(x) \le z \} - \lambda' z \Big\} \Big\} \\
&= \inf_\lambda \begin{cases} +\infty & \text{if } \lambda \notin \mathbb{R}^m_+, \\ \lambda' y + \sup_{x \in X} \{ f(x) - \lambda' \omega(x) \} & \text{otherwise} \end{cases} \\
&= \inf_{\lambda \in \mathbb{R}^m_+} \sup_{x \in X} \{ f(x) + \lambda' (y - \omega(x)) \}.
\end{aligned} \tag{1.5}
\]
Finally, noting that $\inf_{\lambda \in \mathbb{R}^m_+} \{ f(x) + \lambda'(y - \omega(x)) \} = f(x)$ if $\omega(x) \le y$ and $-\infty$ otherwise, one may write
\[ g(y) = \sup_{x \in X} \inf_{\lambda \in \mathbb{R}^m_+} \{ f(x) + \lambda'(y - \omega(x)) \} \tag{1.6} \]
\[ \le \tilde g(y) = \inf_{\lambda \in \mathbb{R}^m_+} \sup_{x \in X} \{ f(x) + \lambda'(y - \omega(x)) \} \quad \text{[by (1.5)]}, \tag{1.7} \]
and the equality $g(y) = \tilde g(y)$ holds true under some convexity assumption.

However, in general $g^*$ cannot be obtained explicitly from its definition (1.3), and for dual methods to solve $P$, the inner maximization in (1.7) must be done numerically for each fixed $\lambda$. A notable exception is the conic optimization problem where $f$ and $\omega$ are both linear mappings, for which the dual (1.4) has an explicit form in terms of $\lambda$. Of course, alternative explicit duals have been proposed but they involve both primal ($x$) and dual ($\lambda$) variables. In particular, the Wolfe [14] and Mond-Weir [11] duals even allow one to consider weakened notions of convexity such as pseudo- or quasi-convexity. For a nice exposition and related references on this topic, the interested reader is referred to Mond [12] and the references therein.

Key words and phrases. Optimization; logarithmic barrier function; Legendre-Fenchel and Cramer transforms.

Contribution.
Our contribution is to show that the simple and well-known Laplace approximation of a supremum via a converging sequence of $L_p$-norms has interesting consequences in optimization, for both primal and dual problems $P$ and $P^*$.

Recall that the celebrated Logarithmic Barrier Function (LBF in short) associated with a convex optimization problem $P$ as in (1.2), or with its dual $P^*$ in (1.4) when $g^*$ is explicit, is an important tool in convex optimization because of its remarkable mathematical properties. For instance, when the LBF has the self-concordance property, the associated Logarithmic Barrier algorithm to solve $P$ or its dual $P^*$ runs in time polynomial in the input size of the problem; see e.g. [5] and [6, p. 13, 51 and 60]. But the LBF is only one particular choice among many other interior penalty functions!

(Footnote on self-concordance: a function $\varphi : D \to \mathbb{R}$ is called $\kappa$-self-concordant on $D \subset \mathbb{R}^n$, $\kappa \ge 0$, if $\varphi$ is three times continuously differentiable in $D$, and for all $x \in D$ and $h \in \mathbb{R}^n$, one has $|\nabla^3 \varphi(x)[h,h,h]| \le 2\kappa \left( h' \nabla^2 \varphi(x) h \right)^{3/2}$, where $\nabla^3 \varphi(x)[h,h,h]$ is the third differential of $\varphi$ at $x$ and $h$; see e.g. [6, p. 52].)

Our main contribution is to provide a rationale behind the LBF, as we show that the LBF can be obtained by approximating the "max" of a function over a domain by standard $L_p$-norms on the same domain. The scalar $1/p$ becomes the parameter of the LBF and nice convergence properties hold when $p \to \infty$. More precisely:

• We first show that the LBF (with parameter $p$) associated with the primal problem $P$ appears naturally by using the simple and well-known Laplace approximation of a supremum via $L_p$-norms, applied to the inner infimum in (1.6). It is a bit surprising to obtain an efficient method in this way. Indeed, the inner infimum in (1.6) (which is exactly equal to $f(x)$ when $x$ is an admissible solution of $P$) is replaced with its "naive" Laplace approximation by $L_p$-norms, and to the best of our knowledge, the efficiency of this approximation has not been proved or even tested numerically!

• Similarly, when using the same Laplace $L_p$-norm approximation technique for the infimum in the definition (1.3) of the conjugate function $g^*$, we obtain a function $\phi_p : \mathbb{R}^m \to \mathbb{R}$ which: (a) depends on an integer parameter $p$, and (b) is valid on the relative interior $\mathrm{ri}\,\mathcal{D}$ of some domain $\mathcal{D} \subset \mathbb{R}^m$. Theorem 4 states that the minimum of $\phi_p$ converges to the minimum of $P$ as $p \to \infty$.
In doing so for conic optimization problems, the set $\mathcal{D}$ is just the feasible set of the (known) explicit dual problem $P^*$, and $\phi_p$ is (up to a constant) the LBF with parameter $p$ associated with $P^*$. So again, for conic programs, the simple Laplace approximation of a supremum by $L_p$-norms permits us to retrieve the LBF of the dual problem $P^*$! Interestingly, Theorem 5 states that the function $y \mapsto \min_\lambda \phi_p(\lambda; y)$ is nothing less than the Cramer transform of the Laplace approximation $\|e^f\|^p_{L_p(\Omega(y))}$, where $\Omega(y) \subset X$ is the feasible set of problem $P$ and $\|\cdot\|_{L_p}$ is the usual norm associated with the Lebesgue space $L_p$. To the best of our knowledge, this interpretation of the Logarithmic Barrier algorithm (with parameter $1/p$) for the dual $P^*$ is new (although in the particular context of Linear Programming, this result was already alluded to in [8]).

Analogies between the Laplace and Fenchel transforms via exponentials and logarithms in the Cramer transform have already been explored in other contexts, in order to establish nice parallels between optimization and probability via a change of algebra; see e.g. Baccelli et al. [1], Maslov [10], Lasserre [9], and the many references therein. In probability, the Cramer transform of a probability measure has also been used to provide exact asymptotics of some integrals as well as to derive large deviation principles. For a nice survey on this topic the interested reader is referred to Piterbarg and Fatalov [13].

• Finally, an interesting feature of this Laplace approximation technique is to provide us with a systematic way to obtain a dual problem (1.4) in cases when $g^*$ cannot be obtained explicitly from its definition (1.3). Namely, in a number of cases and in contrast with $g^*$, the function $\phi_p(\lambda; y)$ obtained by using the Laplace approximation of the conjugate function $g^*$ by $L_p$-norms can be computed explicitly in closed form. Examples of such situations are briefly discussed. In the general case, $\phi_p$ is of the form $h_1(\lambda; y) + h_2(\lambda; p)$ where: for every fixed $\lambda \in \mathrm{ri}\,\mathcal{D}$, $h_2(\lambda; p) \to 0$ as $p \to \infty$, and for each fixed $p$, the function $\lambda \mapsto h_2(\lambda; p)$ is a barrier for the domain $\mathcal{D}$. This leads us to consider the optimization problem
\[ P^* : \quad \min_\lambda \{ h_1(\lambda; y) : \lambda \in \mathcal{D} \} \]
as a natural dual of $P$, for which $\phi_p$ is an associated barrier function with parameter $p$. If $g^*$ is concave then strong duality holds.

2. Main result
We need some helpful intermediate results before stating our main result.

2.1. Some preliminary results.
Let $L_q(X)$ be the usual Lebesgue space of integrable functions defined on a Borel-measurable set $X \subseteq \mathbb{R}^n$, and let $\|h\|_{L_q(X)}$ (or sometimes $\|h\|_q$) be the associated norm
\[ \|h\|_{L_q(X)} = \|h\|_q := \left( \int_X |h(x)|^q \, dx \right)^{1/q}. \]
To make the paper self-contained we prove the following known result.
Lemma 1.
Let $X \subseteq \mathbb{R}^n$ be any Borel-measurable set, and $h \in L_q(X)$ for some given $q \ge 1$, so that $\|h\|_{L_q(X)} < \infty$. Then:
\[ \lim_{p \to \infty} \|h\|_{L_p(X)} = \|h\|_\infty := \operatorname*{ess\,sup}_{x \in X} |h(x)|. \]

Proof.
Notice that $X$ may be an unbounded set. Suppose that $\|h\|_q < \infty$ for some given $q \ge 1$, and define $\Lambda$ to be the essential supremum of $|h|$ in $X$. The result is trivial when $\Lambda = 0$, so we first assume that $\Lambda \in (0, \infty)$. Then
\[
\begin{aligned}
\operatorname*{ess\,sup}_{x \in X} |h(x)| = \Lambda &= \lim_{p \to \infty} \left( \|h/\Lambda\|_{L_q(X)} \right)^{q/p} \Lambda = \lim_{p \to \infty} \left[ \int_X \Lambda^p \left( \frac{|h(x)|}{\Lambda} \right)^q dx \right]^{1/p} \\
&\ge \lim_{p \to \infty} \left[ \int_X |h(x)|^p \, dx \right]^{1/p} = \lim_{p \to \infty} \|h\|_{L_p(X)}.
\end{aligned} \tag{2.1}
\]
The inequality holds because $|h|/\Lambda \le 1$ almost everywhere, so that $(|h|/\Lambda)^p \le (|h|/\Lambda)^q$ whenever $p \ge q$. It is also obvious that $\Lambda \ge \lim_p \|h\|_p$ when $\Lambda = \infty$.

On the other hand, suppose that the essential supremum $\Lambda$ of $|h|$ in $X$ is finite. Given an arbitrary parameter $\epsilon > 0$, there exists a bounded subset $B \subset X$ with positive finite Lebesgue measure $\lambda(B) \in (0, \infty)$ such that $|h(x)| > \Lambda - \epsilon$ for every $x \in B$. Then
\[ \lim_{p \to \infty} \|h\|_{L_p(X)} \ge \lim_{p \to \infty} \|h\|_{L_p(B)} \ge \lim_{p \to \infty} \lambda(B)^{1/p} (\Lambda - \epsilon) = \Lambda - \epsilon. \]
Therefore, since $\epsilon$ is arbitrary, combining the previous inequality with (2.1) yields the desired result $\lim_{p \to \infty} \|h\|_{L_p(X)} = \Lambda$.

In the same way, assume that the essential supremum of $|h|$ in $X$ is infinite. Given an arbitrary natural number $N \in \mathbb{N}$, there exists a bounded subset $B \subset X$ with positive finite Lebesgue measure $\lambda(B) \in (0, \infty)$ such that $|h(x)| > N$ for every $x \in B$. Then
\[ \lim_{p \to \infty} \|h\|_{L_p(X)} \ge \lim_{p \to \infty} \|h\|_{L_p(B)} \ge \lim_{p \to \infty} \lambda(B)^{1/p} N = N. \]
Therefore, since $N$ is arbitrary, combining the previous inequality with (2.1) yields the desired result $\lim_{p \to \infty} \|h\|_{L_p(X)} = \Lambda = \infty$. □

Next we also need the following intermediate result.
Lemma 2.
For every $p \in \mathbb{N}$ let $U_p \subset \mathbb{R}^n$ be some open subset, and let $h_p : U_p \to \mathbb{R}$ be a sequence of functions indexed by the parameter $p \in \mathbb{N}$. Suppose that $h_p$ converges pointwise to a function $h$ defined on an open subset $U$ of $\mathbb{R}^n$. Then:
\[ \lim_{p \to \infty} \inf_{x \in U_p} h_p(x) \le \inf_{x \in U} h(x), \]
provided that the limit on the left-hand side of the inequality exists in the extended interval $[-\infty, \infty)$.

Proof. Suppose that the infimum of $h$ on $U$ is equal to $-\infty$. For every $N \in \mathbb{R}$ there is a point $x \in U$ such that $h(x) < N$, and so there is also an index $p_0$ such that $x \in U_p$ and $h_p(x) < N$ for every $p > p_0$. Hence the infimum of $h_p$ on $U_p$ is strictly less than $N$, and so
\[ \lim_{p \to \infty} \inf_{x \in U_p} h_p(x) = -\infty = \inf_{x \in U} h(x), \]
because $N \in \mathbb{R}$ is arbitrary. On the other hand, assume that the infimum of $h$ on $U$ is equal to $\lambda \in \mathbb{R}$. For every $\epsilon > 0$ there is a point $x \in U$ such that $h(x) < \lambda + \epsilon$, and so there is also an index $p_0$ such that $x \in U_p$ and $h_p(x) < \lambda + \epsilon$ for every $p > p_0$. Since the infimum of $h_p$ on $U_p$ is strictly less than $\lambda + \epsilon$ and $\epsilon > 0$ is arbitrary,
\[ \lim_{p \to \infty} \inf_{x \in U_p} h_p(x) \le \lambda = \inf_{x \in U} h(x). \]
□

2.2. $L_p$-norm approximations for the primal. Let us go back to problem $P$ in (1.1) where $X \subseteq \mathbb{R}^n$ is a convex cone, and let $Z := \mathbb{R}^m_+$. Let $X^* \subset \mathbb{R}^n$ be the dual convex cone associated with $X$, and let $x \mapsto \Delta(x)$ be the universal logarithmic barrier function associated with the convex cone $X$, that is,
\[ x \mapsto \Delta(x) := \ln \left( \int_{X^*} e^{-x' y} \, dy \right), \quad x \in \mathrm{int}\, X, \tag{2.2} \]
where $\mathrm{int}\, X$ denotes the interior of $X$. See e.g. Güler [4] and Güler and Tunçel [5]. Next, let $\mathcal{H} \subset \mathbb{R}^n$ be the set
\[ \mathcal{H} := \{ x \in \mathbb{R}^n : \omega(x) < y;\ x \in \mathrm{int}\, X \}. \tag{2.3} \]
Recalling that $P$ is a maximization problem, the LBF associated with the (primal) problem $P$, and with parameter $p \in \mathbb{N}$, is the function $\psi_p : \mathcal{H} \to \mathbb{R}$ defined by:
\[ x \mapsto \psi_p(x) := f(x) + \frac{1}{p} \left[ -\Delta(x) + \sum_{j=1}^m \ln (y - \omega(x))_j \right]. \tag{2.4} \]
(In some references such as [6], $p\,\psi_p$ is used instead.)
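Definition (2.4) can be made concrete on a toy problem. The following is a minimal sketch, assuming the one-dimensional case $n = m = 1$, $X = \mathbb{R}_+$, $f(x) = x$ and $\omega(x) = x$, for which $-\Delta(x) = \ln x$ and $g(y) = y$ when $y > 0$. The function names and the golden-section search are illustrative choices and are not taken from the paper.

```python
import math


def psi(x, y, p):
    """LBF (2.4) for the toy problem sup { x : x <= y, x >= 0 }.

    Assumes X = R_+, f(x) = x, omega(x) = x, so that -Delta(x) = ln x.
    """
    return x + (math.log(x) + math.log(y - x)) / p


def maximize_concave(func, lo, hi, iters=100):
    """Golden-section search for the maximizer of a concave function on (lo, hi).

    Only interior points are evaluated, so func may blow up at the endpoints.
    """
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if func(c) < func(d):
            a = c  # the maximizer lies in [c, b]
        else:
            b = d  # the maximizer lies in [a, d]
    return 0.5 * (a + b)


def barrier_value(y, p):
    """Return the p-center x_p and the barrier optimum psi_p(x_p)."""
    x_p = maximize_concave(lambda x: psi(x, y, p), 0.0, y)
    return x_p, psi(x_p, y, p)


if __name__ == "__main__":
    for p in (10, 100, 1000, 10000):
        x_p, value = barrier_value(2.0, p)
        print(p, x_p, value)
```

In this toy case the printed barrier optimum increases towards $g(2) = 2$ as $p$ grows, while the maximizer $x_p$ stays strictly inside the interval $(0, 2)$.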
The LBF in convex programming dates back to Frisch [3] and became widely known later in Fiacco and McCormick [2]. For more details and a discussion, see e.g. den Hertog [6, Chapter 2]. It is well known that under some convexity assumptions, and if $g(y) < \infty$,
\[ g(y) = \lim_{p \to \infty} \sup_x \{ \psi_p(x) : x \in \mathcal{H} \}, \tag{2.5} \]
and the sequence of maximizers $(x_p)_{p \in \mathbb{N}} \subset \mathcal{H}$ of $\psi_p$ converges to a maximizer of $P$.

We next provide a simple rationale that explains how the LBF naturally appears to solve problem $P$.

Proposition 3.
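With $\mathcal{H}$ as in (2.3), let $y \in \mathbb{R}^m$ and $x \in \mathcal{H}$. Then:
\[
\begin{aligned}
\sup_{(\lambda, \mu) \in Z \times X^*} \{ \lambda'(\omega(x) - y) - \mu' x \} &= \lim_{p \to \infty} \ln \left\| e^{\lambda'(\omega(x) - y) - \mu' x} \right\|_{L_p(Z \times X^*)} \\
&= \lim_{p \to \infty} \frac{1}{p} \left[ \Delta(x) - \sum_{j=1}^m \ln (y - \omega(x))_j \right].
\end{aligned} \tag{2.6}
\]

Proof.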
The first equation is trivial, whereas the second one follows from Lemma 1. Next,
\[
\begin{aligned}
\ln \left\| e^{\lambda'(\omega(x) - y) - \mu' x} \right\|_{L_p(Z \times X^*)} &= \frac{1}{p} \ln \int_Z \int_{X^*} e^{p \lambda'(\omega(x) - y) - p \mu' x} \, d\mu \, d\lambda \\
&= \frac{1}{p} \left[ \Delta(p x) - \sum_{j=1}^m \ln \big( p (y - \omega(x)) \big)_j \right] \\
&= \frac{1}{p} \left[ \Delta(x) - \sum_{j=1}^m \ln (y - \omega(x))_j - (m + n) \ln p \right],
\end{aligned}
\]
where we have used that for each $p \in \mathbb{N}$, $\Delta(p x) = \Delta(x) - n \ln p$ because $X^*$ is a cone. □

(Footnote on the convexity assumptions under which (2.5) holds: for instance, the mappings $-f$ and $\omega$ are convex and twice continuously differentiable, the interior of the feasible set is bounded, and the Hessian $-\nabla^2 \psi_p$ is positive definite on its domain; see den Hertog [6, page 2]. As $p$ varies, the unique maximizer $x(p)$ of $\psi_p$, called the $p$-center, lies on the so-called central path.)

Observe that
\[ \inf_{(\lambda, \mu) \in Z \times X^*} \{ f(x) + \lambda'(y - \omega(x)) + \mu' x \} = \begin{cases} f(x) & \text{if } x \in X \text{ and } \omega(x) \le y, \\ -\infty & \text{otherwise,} \end{cases} \]
and so (1.6) can be rewritten
\[ g(y) = \sup_{x \in \mathcal{H}} \inf_{(\lambda, \mu) \in Z \times X^*} \{ f(x) + \lambda'(y - \omega(x)) + \mu' x \}. \tag{2.7} \]
Using Proposition 3 and $\lim_{p \to \infty} (m + n) \ln p / p = 0$ yields
\[ g(y) = \sup_{x \in \mathcal{H}} \left\{ f(x) + \lim_{p \to \infty} \frac{1}{p} \left[ -\Delta(x) + \sum_{j=1}^m \ln (y - \omega(x))_j \right] \right\}. \tag{2.8} \]
A direct application of Lemma 2 to (2.8) yields
\[ g(y) \le \lim_{p \to \infty} \sup_{x \in \mathcal{H}} \left\{ f(x) + \frac{1}{p} \left[ -\Delta(x) + \sum_{j=1}^m \ln (y - \omega(x))_j \right] \right\} = \lim_{p \to \infty} \sup_x \{ \psi_p(x) : x \in \mathcal{H} \}, \]
and in fact, (2.5) states that one also has the reverse inequality.

In other words, the LBF $\psi_p$ appears naturally when one approximates
\[ \inf_{(\lambda, \mu) \in Z \times X^*} \{ \lambda'(y - \omega(x)) + \mu' x \} \]
(whose value is exactly zero if $\omega(x) \le y$ and $x \in X$) by the quantity
\[ \frac{1}{p} \left[ -\Delta(x) + \sum_{j=1}^m \ln (y - \omega(x))_j \right], \]
which comes from the Laplace approximation of a "sup" by $L_p$-norms.

For instance, in Linear Programming where $X = \mathbb{R}^n_+$, $x \mapsto f(x) = c' x$ and $x \mapsto \omega(x) = A x$ for some vector $c \in \mathbb{R}^n$ and some matrix $A \in \mathbb{R}^{m \times n}$,
\[ g(y) = \lim_{p \to \infty} \sup_x \left\{ c' x + \frac{1}{p} \left[ \sum_{j=1}^m \ln (y - A x)_j + \sum_{i=1}^n \ln (x_i) \right] : x \in \mathcal{H} \right\}. \]

2.3. $L_p$-norm approximations for the dual.
We now use the same approximation technique via $L_p$-norms to either retrieve the known dual $P^*$ when it is explicit, or to provide an explicit dual problem $P^*$ in cases where $g^*$ cannot be obtained explicitly from its definition (1.3). Recall that if $g$ is concave, upper semi-continuous, and bounded from above by some linear function, then by Legendre-Fenchel duality,
\[ g(y) = \inf_\lambda \{ \lambda' y - g^*(\lambda) \}, \quad \text{where} \tag{2.9} \]
\[ g^*(\lambda) := \inf_y \{ \lambda' y - g(y) \}. \tag{2.10} \]
One can express $g^*$ in terms of the definition (1.1) of $g$ and the involved continuous transformations $f$ and $\omega$. Namely,
\[ -g^*(\lambda) = \sup_y \{ g(y) - \lambda' y \} = \sup_y \sup_{x \in X,\ \omega(x) \le y} \{ f(x) - \lambda' y \} \tag{2.11} \]
\[ = \begin{cases} \sup_{x \in X} \{ f(x) - \lambda' \omega(x) \} & \text{if } \lambda \ge 0, \\ +\infty & \text{otherwise.} \end{cases} \tag{2.12} \]
Therefore the domain of definition $\mathcal{D} \subset \mathbb{R}^m$ of $g^*$ is given by:
\[ \mathcal{D} := \left\{ \lambda \in \mathbb{R}^m : \lambda \ge 0,\ \sup_{x \in X} \{ f(x) - \lambda' \omega(x) \} < \infty \right\}, \tag{2.13} \]
with relative interior denoted by $\mathrm{ri}\,\mathcal{D}$. Observe that $\mathcal{D}$ is convex because $-g^*$ is convex and proper on $\mathcal{D}$.

Theorem 4.
Let $g$ and $g^*$ be as in (1.1) and (2.10), respectively. Assume that $g$ is concave, upper semi-continuous, and bounded from above by some linear function. Suppose that the relative interior $\mathrm{ri}\,\mathcal{D}$ is not empty and for every $\lambda \in \mathrm{ri}\,\mathcal{D}$ there exists an exponent $q \gg 1$ such that
\[ \left\| e^{f(x) - \lambda' \omega(x)} \right\|_{L_q(X)} < \infty. \tag{2.14} \]
Then:
\[ g(y) = \lim_{p \to \infty} \inf_{\lambda \in \mathrm{ri}\,\mathcal{D}} \left\{ \lambda' y + \ln \left\| e^{f(x) - \lambda' \omega(x)} \right\|_{L_p(X)} - \sum_{j=1}^m \frac{\ln (p \lambda_j)}{p} \right\}. \tag{2.15} \]

Proof.
In view of (2.11),
\[ -g^*(\lambda) = \begin{cases} \ln \left[ \sup_y \sup_{x \in X,\ \omega(x) \le y} \left\{ e^{-\lambda' y + f(x)} \right\} \right] & \text{if } \lambda \in \mathcal{D}, \\ +\infty & \text{otherwise.} \end{cases} \tag{2.16} \]
Hypothesis (2.14) and Lemma 1 allow us to replace the supremum in (2.16) by the limit of the $L_p$-norms as $p \to \infty$. Namely,
\[
\begin{aligned}
-g^*(\lambda) &= \lim_{p \to \infty} \ln \left( \int_{x \in X} \int_{\omega(x) \le y} e^{-p \lambda' y + p f(x)} \, dy \, dx \right)^{1/p} \\
&= \lim_{p \to \infty} \left\{ \ln \left( \int_{x \in X} e^{-p \lambda' \omega(x) + p f(x)} \, dx \right)^{1/p} - \sum_{j=1}^m \frac{\ln (p \lambda_j)}{p} \right\} \\
&= \lim_{p \to \infty} \left\{ \ln \left\| e^{f(x) - \lambda' \omega(x)} \right\|_{L_p(X)} - \sum_{j=1}^m \frac{\ln (p \lambda_j)}{p} \right\}.
\end{aligned} \tag{2.17}
\]
Hence from (2.17), equation (2.9) can be rewritten as follows:
\[
\begin{aligned}
g(y) &= \inf_{\lambda \in \mathrm{ri}\,\mathcal{D}} \{ \lambda' y - g^*(\lambda) \} \\
&= \inf_{\lambda \in \mathrm{ri}\,\mathcal{D}} \lim_{p \to \infty} \left\{ \lambda' y + \ln \left\| e^{f(x) - \lambda' \omega(x)} \right\|_{L_p(X)} - \sum_{j=1}^m \frac{\ln (p \lambda_j)}{p} \right\} \\
&\ge \lim_{p \to \infty} \inf_{\lambda \in \mathrm{ri}\,\mathcal{D}} \left\{ \lambda' y + \ln \left\| e^{f(x) - \lambda' \omega(x)} \right\|_{L_p(X)} - \sum_{j=1}^m \frac{\ln (p \lambda_j)}{p} \right\},
\end{aligned} \tag{2.18}
\]
where we have applied Lemma 2 in order to interchange the "inf" and "lim" operators. Notice that the terms between the brackets are the functions $h_p(\lambda)$ of Lemma 2. On the other hand, given $y \in \mathbb{R}^m$, let
\[ \Theta(y) := \{ (x, z) \in X \times \mathbb{R}^m_+ : \omega(x) + z \le y \} \subset \mathbb{R}^{n+m}, \]
so that whenever $\lambda \in \mathrm{ri}\,\mathcal{D}$,
\[
\begin{aligned}
\left\| e^{f(x)} \right\|_{L_p(\Theta(y))} &\le \left\| e^{f(x) + \lambda'(y - \omega(x) - z)} \right\|_{L_p(\Theta(y))} \\
&\le e^{\lambda' y} \left\| e^{f(x) - \lambda' \omega(x) - \lambda' z} \right\|_{L_p(X \times \mathbb{R}^m_+)} = e^{\lambda' y} \left\| e^{f(x) - \lambda' \omega(x)} \right\|_{L_p(X)} \prod_{j=1}^m (p \lambda_j)^{-1/p}.
\end{aligned}
\]
By hypothesis (2.14), given $\lambda \in \mathrm{ri}\,\mathcal{D}$ fixed, the $L_p$-norm term in the last identity above is finite for some $p$ large enough. Therefore, by definition (1.1), Lemma 1, and the continuity of the logarithm, one obtains
\[
\begin{aligned}
g(y) &= \ln \Big\{ \sup_{(x, z) \in \Theta(y)} e^{f(x)} \Big\} = \lim_{p \to \infty} \ln \left\| e^{f(x)} \right\|_{L_p(\Theta(y))} \\
&\le \lim_{p \to \infty} \inf_{\lambda \in \mathrm{ri}\,\mathcal{D}} \left\{ \lambda' y + \ln \left\| e^{f(x) - \lambda' \omega(x)} \right\|_{L_p(X)} - \sum_{j=1}^m \frac{\ln (p \lambda_j)}{p} \right\},
\end{aligned}
\]
which combined with (2.18) yields the desired result (2.15). □

Remark 1.
Condition (2.14) can be easily checked in particular cases. For instance, let $x \mapsto \omega(x) := A x$ for some matrix $A \in \mathbb{R}^{m \times n}$.

• Let $X := \mathbb{R}^n_+$ and $x \mapsto f(x) := c' x + \ln x^d$ for some $c$ and $d$ in $\mathbb{R}^n$ with $d \ge 0$. The notation $x^d$ stands for the monomial $\prod_{k=1}^n x_k^{d_k}$. Then $f$ is concave and
\[ \int_{\mathbb{R}^n_+} e^{p f(x) - p \lambda' \omega(x)} \, dx = \int_{\mathbb{R}^n_+} e^{p (c - A' \lambda)' x} x^{p d} \, dx = \prod_{k=1}^n \frac{\Gamma(1 + p d_k)}{(p A_k' \lambda - p c_k)^{1 + p d_k}} < \infty, \]
whenever $\lambda \in \mathrm{ri}\,\mathcal{D} := \{ \lambda \in \mathbb{R}^m : \lambda > 0,\ A' \lambda > c \}$ and $p \in \mathbb{N}$.

• If $X = \mathbb{R}^n$ and $x \mapsto f(x) := -x' Q x + c' x$ for some $c \in \mathbb{R}^n$ and a symmetric (strictly) positive definite matrix $Q \in \mathbb{R}^{n \times n}$, then
\[ \int_{\mathbb{R}^n} e^{p f(x) - p \lambda' \omega(x)} \, dx = \int_{\mathbb{R}^n} e^{-p x' Q x} e^{p (c - A' \lambda)' x} \, dx < \infty, \]
whenever $\lambda \in \mathrm{ri}\,\mathcal{D} := \{ \lambda \in \mathbb{R}^m : \lambda > 0 \}$ and $p \in \mathbb{N}$.

Consider next the following functions for every $p \in \mathbb{N}$:
\[ \lambda \mapsto \phi_p(\lambda; y) := \lambda' y + \ln \left\| e^{f(x) - \lambda' \omega(x)} \right\|_{L_p(X)} - \sum_{j=1}^m \frac{\ln (p \lambda_j)}{p} \tag{2.19} \]
defined on some domain of $\mathbb{R}^m$, and
\[ y \mapsto g_p(y) := \inf_\lambda \{ \phi_p(\lambda; y) : \lambda \in \mathrm{ri}\,\mathcal{D} \} \tag{2.20} \]
defined on $\mathbb{R}^m$. Recall that the Cramer transform (denoted $\mathcal{C}$) applied to an integrable function $u : \mathbb{R}^m \to \mathbb{R}$ is the Legendre-Fenchel transform (denoted $\mathcal{F}$) of the logarithm of the Laplace transform (denoted $\mathcal{L}$) of $u$, i.e.,
\[ u \mapsto \mathcal{C}(u) = \mathcal{F} \circ \ln \circ \mathcal{L}(u). \]
The Cramer transform is natural in the sense that the logarithm of the Laplace transform is always a convex function. For our purpose, we will consider the concave version of the Fenchel transform
\[ \hat u \mapsto [\mathcal{F}(\hat u)](\lambda) = \inf_y \{ \lambda' y + \hat u(y) \}, \tag{2.21} \]
for $\hat u : \mathbb{R}^m \to \mathbb{R}$ convex, so that $-\hat u$ is concave. We claim that:

Theorem 5.
The function $y \mapsto p\, g_p(y)$ defined in (2.20) is the Cramer transform of the function
\[ y \mapsto \tilde g_p(y) := \int_{\Omega(y)} e^{p f(x)} \, dx = \left\| e^f \right\|^p_{L_p(\Omega(y))}, \tag{2.22} \]
where $\Omega(y) := \{ x \in X : \omega(x) \le y \} \subset \mathbb{R}^n$.

Proof.
The result follows from the definition of the Cramer transform $\mathcal{C}$:
\[ \tilde g_p \mapsto \mathcal{C}(\tilde g_p) := \mathcal{F} \circ \ln \circ \mathcal{L}(\tilde g_p), \qquad y \mapsto \mathcal{C}(\tilde g_p)(y) = \inf_\lambda \{ \lambda' y + [\ln \circ \mathcal{L}(\tilde g_p)](\lambda) \}. \]
Hence
\[
\begin{aligned}
[\mathcal{L}(\tilde g_p)](p \lambda) &= \int_{y \in \mathbb{R}^m} e^{-p \lambda' y} \tilde g_p(y) \, dy = \int_{y \in \mathbb{R}^m} e^{-p \lambda' y} \left[ \int_{x \in X,\ \omega(x) \le y} e^{p f(x)} \, dx \right] dy \\
&= \int_{x \in X} e^{p f(x)} \left[ \int_{y \ge \omega(x)} e^{-p \lambda' y} \, dy \right] dx = \left[ \int_{x \in X} e^{p f(x) - p \lambda' \omega(x)} \, dx \right] \prod_{j=1}^m \frac{1}{p \lambda_j} \\
&= \left\| e^{f(x) - \lambda' \omega(x)} \right\|^p_{L_p(X)} \prod_{j=1}^m \frac{1}{p \lambda_j}.
\end{aligned}
\]
Therefore,
\[ [\ln \circ \mathcal{L}(\tilde g_p)](p \lambda) = \ln \left\| e^{f(x) - \lambda' \omega(x)} \right\|^p_{L_p(X)} - \sum_{j=1}^m \ln (p \lambda_j). \]
On the other hand, recall the definition of $g_p(y)$ given in (2.19)-(2.20),
\[ g_p(y) = \inf_{\lambda \in \mathrm{ri}\,\mathcal{D}} \left\{ \lambda' y + \ln \left\| e^{f(x) - \lambda' \omega(x)} \right\|_{L_p(X)} - \sum_{j=1}^m \frac{\ln (p \lambda_j)}{p} \right\}. \]
Thus, with $\mathcal{F}$ as in (2.21) and $\mathcal{D}_p := \{ z : p^{-1} z \in \mathcal{D} \}$, we obtain the desired result:
\[
\begin{aligned}
p\, g_p(y) &= \inf_{\lambda \in \mathrm{ri}\,\mathcal{D}} \left\{ p \lambda' y + \ln \left\| e^{f(x) - \lambda' \omega(x)} \right\|^p_{L_p(X)} - \sum_{j=1}^m \ln (p \lambda_j) \right\} \\
&= \inf_{\lambda \in \mathrm{ri}\,\mathcal{D}} \left\{ p \lambda' y + [\ln \circ \mathcal{L}(\tilde g_p)](p \lambda) \right\} = \inf_{z \in \mathrm{ri}\,\mathcal{D}_p} \left\{ z' y + [\ln \circ \mathcal{L}(\tilde g_p)](z) \right\} \\
&= [\mathcal{F} \circ \ln \circ \mathcal{L}(\tilde g_p)](y) = [\mathcal{C}(\tilde g_p)](y).
\end{aligned}
\]
□

For linear programming, this result was already obtained in [8, 9].
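The Laplace-transform identity at the heart of the proof of Theorem 5 can be checked numerically in a toy case. The sketch below assumes $n = m = 1$, $X = \mathbb{R}_+$, $f(x) = c x$ and $\omega(x) = a x$ with $a > 0$, $c \ne 0$ and $a \lambda > c$, so that $\tilde g_p(y) = (e^{p c y / a} - 1)/(p c)$ for $y \ge 0$ and $\| e^{f - \lambda \omega} \|^p_{L_p(X)} = 1/(p(a \lambda - c))$. The function names, the truncation point and the trapezoidal rule are illustrative assumptions, not part of the paper.

```python
import math


def g_tilde(y, p, c, a):
    """g~_p(y): integral of e^{p c x} over 0 <= x <= y / a (zero when y <= 0)."""
    if y <= 0.0:
        return 0.0
    return (math.exp(p * c * y / a) - 1.0) / (p * c)


def laplace_of_g_tilde(lam, p, c, a, upper=20.0, steps=200_000):
    """Trapezoidal approximation of the Laplace transform of g~_p at p * lam.

    The integral over y in R reduces to y >= 0 because g~_p vanishes for y < 0;
    it is truncated at `upper`, where the integrand is negligible.
    """
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        y = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-p * lam * y) * g_tilde(y, p, c, a)
    return total * h


def closed_form(lam, p, c, a):
    """Right-hand side: ||e^{f - lam * omega}||_p^p divided by p * lam."""
    norm_p = 1.0 / (p * (a * lam - c))
    return norm_p / (p * lam)


if __name__ == "__main__":
    lhs = laplace_of_g_tilde(2.0, 2, 1.0, 1.0)
    rhs = closed_form(2.0, 2, 1.0, 1.0)
    print(lhs, rhs)
```

With $p = 2$, $a = c = 1$ and $\lambda = 2$, both sides equal $1/8$ up to the quadrature error.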
Example 1. (Linear Programming)
In this case set the cone $X = \mathbb{R}^n_+$ and the functions $f(x) := c' x$ and $\omega(x) = A x$ for some vector $c \in \mathbb{R}^n$ and matrix $A \in \mathbb{R}^{m \times n}$. We easily have that
\[ \left\| e^{f(x) - \lambda' \omega(x)} \right\|^p_{L_p(X)} = \int_X e^{p (c - A' \lambda)' x} \, dx = \prod_{k=1}^n \frac{1}{p A_k' \lambda - p c_k} \]
for every $p \in \mathbb{N}$ and each $\lambda$ in the relative interior $\mathrm{ri}\,\mathcal{D}$ of the set
\[ \mathcal{D} = \{ \lambda \in \mathbb{R}^m : A' \lambda \ge c,\ \lambda \ge 0 \}. \tag{2.23} \]
Hence from (2.19),
\[
\begin{aligned}
\phi_p(\lambda; y) &= \lambda' y + \frac{1}{p} \ln \left\| e^{f(x) - \lambda' \omega(x)} \right\|^p_{L_p(X)} - \sum_{j=1}^m \frac{\ln (p \lambda_j)}{p} \\
&= \lambda' y - \sum_{k=1}^n \frac{\ln (A_k' \lambda - c_k)}{p} - \sum_{j=1}^m \frac{\ln (\lambda_j)}{p} - \frac{m + n}{p} \ln p.
\end{aligned}
\]
One easily recognizes (up to the constant $(m + n)[\ln p]/p$) the LBF with parameter $p$ of the dual problem:
\[ P^* : \quad \min_\lambda \{ \lambda' y : A' \lambda \ge c,\ \lambda \ge 0 \}. \]

Example 2. (The general conic problem)
Consider the conic optimization problem $\max_x \{ c' x : \omega x \le y,\ x \in X \}$, for some convex cone $X \subset \mathbb{R}^n$, some vector $c \in \mathbb{R}^n$, and some linear mapping $\omega : \mathbb{R}^n \to \mathbb{R}^m$ with adjoint mapping $\omega^* : \mathbb{R}^m \to \mathbb{R}^n$. We easily have that
\[ \left\| e^{c' x - \lambda' \omega x} \right\|^p_{L_p(X)} = \left\| e^{(c - \omega^* \lambda)' x} \right\|^p_{L_p(X)} = \int_X e^{p (c - \omega^* \lambda)' x} \, dx = p^{-n} \int_X e^{(c - \omega^* \lambda)' x} \, dx, \]
because $X$ is a cone. Formula (2.19) reads
\[
\begin{aligned}
\phi_p(\lambda; y) &= \lambda' y + \frac{1}{p} \ln \left\| e^{c' x - \lambda' \omega x} \right\|^p_{L_p(X)} - \sum_{j=1}^m \frac{\ln (p \lambda_j)}{p} \\
&= \lambda' y + \frac{\psi(\omega^* \lambda - c)}{p} - \sum_{j=1}^m \frac{\ln \lambda_j}{p} - \frac{m + n}{p} \ln p,
\end{aligned} \tag{2.24}
\]
where $\psi : \mathbb{R}^n \to \mathbb{R}$ is the universal LBF (2.2) associated with the dual cone $X^*$, and with domain $\mathrm{ri}\,\mathcal{D}$, where
\[ \mathcal{D} = \{ \lambda \in \mathbb{R}^m : \omega^* \lambda - c \in X^*,\ \lambda \ge 0 \}. \tag{2.25} \]
In $\phi_p$ (and up to a constant), one easily recognizes the LBF with parameter $p$ of the dual problem:
\[ P^* : \quad \min_\lambda \{ \lambda' y : \omega^* \lambda - c \in X^*,\ \lambda \ge 0 \}. \]

Example 3. (Quadratic programming: non conic formulation)
Consider symmetric positive semidefinite matrices $Q_j \in \mathbb{R}^{n \times n}$ and vectors $c_j \in \mathbb{R}^n$ for $j = 0, 1, \dots, m$. The notation $Q \succeq 0$ (resp. $Q \succ 0$) means that $Q$ is positive semidefinite (resp. strictly positive definite). Let $X := \mathbb{R}^n$, $f(x) := -x' Q_0 x - 2 c_0' x$ and let $\omega : \mathbb{R}^n \to \mathbb{R}^m$ have entries $\omega_j(x) := x' Q_j x + 2 c_j' x$ for every $j = 1, \dots, m$. For $\lambda \in \mathbb{R}^m$ with $\lambda > 0$, define the real symmetric matrix $Q_\lambda \in \mathbb{R}^{n \times n}$ and vector $c_\lambda \in \mathbb{R}^n$:
\[ Q_\lambda := Q_0 + \sum_{j=1}^m \lambda_j Q_j \quad \text{and} \quad c_\lambda := c_0 + \sum_{j=1}^m \lambda_j c_j, \]
so that
\[ \left\| e^{f(x) - \lambda' \omega(x)} \right\|^p_{L_p(X)} = \int_X \exp \left( -p\, x' Q_\lambda x - 2 p\, c_\lambda' x \right) dx = \frac{\pi^{n/2} \exp \left( p\, c_\lambda' Q_\lambda^{-1} c_\lambda \right)}{\sqrt{\det (p\, Q_\lambda)}} < \infty, \]
whenever $p \in \mathbb{N}$ and $Q_\lambda \succ 0$. Therefore
\[
\begin{aligned}
\phi_p(\lambda; y) &= \lambda' y + \frac{1}{p} \ln \left\| e^{f(x) - \lambda' \omega(x)} \right\|^p_{L_p(X)} - \sum_{j=1}^m \frac{\ln (p \lambda_j)}{p} \\
&= \lambda' y + c_\lambda' Q_\lambda^{-1} c_\lambda - \frac{\ln (\det Q_\lambda)}{2p} - \sum_{j=1}^m \frac{\ln \lambda_j}{p} + n\, \frac{\ln \pi - \ln p}{2p} - \frac{m \ln p}{p},
\end{aligned} \tag{2.26}
\]
on the domain of definition $\mathrm{ri}\,\mathcal{D} := \{ \lambda : \lambda > 0,\ Q_\lambda \succ 0 \}$. Again, in equation (2.26) one easily recognizes (up to a constant) the LBF with parameter $p$ of the dual problem $P^*$:
\[
\begin{aligned}
&\min_{\lambda \ge 0,\ Q_\lambda \succeq 0} \ \max_{x \in X} \left\{ -x' Q_0 x - 2 c_0' x - \sum_{j=1}^m \lambda_j \left( x' Q_j x + 2 c_j' x - y_j \right) \right\} \\
&\quad = \min_{\lambda \ge 0,\ Q_\lambda \succeq 0} \left\{ \lambda' y + \max_{x \in X} \left\{ -x' Q_\lambda x - 2 c_\lambda' x \right\} \right\} \\
&\quad = \min_\lambda \left\{ \lambda' y + c_\lambda' Q_\lambda^{-1} c_\lambda : \lambda \ge 0,\ Q_\lambda \succeq 0 \right\},
\end{aligned}
\]
where we have used the fact that $x^* = -Q_\lambda^{-1} c_\lambda \in \mathbb{R}^n$ is the unique optimal solution to the inner maximization problem in the second equation above. If $Q_0 \succ 0$ and $Q_j \succeq 0$ for $j = 1, \dots, m$, then $\mathrm{ri}\,\mathcal{D} := \{ \lambda : \lambda > 0 \}$ because $Q_\lambda \succ 0$ whenever $\lambda > 0$; in this case $P$ is a convex optimization problem and there is no duality gap between $P$ and $P^*$.

2.4. An explicit dual.
A minimization problem $P^*$ in the variables $\lambda \in \mathcal{D} \subset \mathbb{R}^m$ with cost function $\lambda \mapsto h(\lambda)$ is a natural dual of $P$ in (1.2) if weak duality holds, that is, if for every feasible solution $\lambda \in \mathcal{D}$ of $P^*$ and every feasible solution $x \in \mathbb{R}^n$ of $P$, one has $f(x) \le h(\lambda)$. Of course, a highly desirable feature is that strong duality holds, that is, the optimal values of $P$ and $P^*$ coincide.

In Examples 1, 2, and 3, the function $\phi_p$ defined in (2.19) can be decomposed into a sum of the form:
\[ \lambda \mapsto \phi_p(\lambda; y) = h_1(\lambda; y) + h_2(\lambda; p), \tag{2.27} \]
where $h_1$ is independent of the parameter $p$. Moreover, if $h_2(\lambda; p) < \infty$ for some $\lambda > 0$, then $h_2(\lambda; p)$ converges to zero when $p \to \infty$; in addition, for fixed $p$, $h_2$ is a barrier, as $h_2(\lambda; p) \to \infty$ when $\lambda$ approaches the boundary of $\mathcal{D}$. One may also verify that
\[ h_1(\lambda; y) = \lambda' y - g^*(\lambda), \quad \forall \lambda \in \mathcal{D}, \]
where $g^*$ is the Legendre-Fenchel conjugate in (2.10). In fact, the above decomposition is more general and can be deduced from some simple facts. Recall that problem $P$ is given in (1.2), and
\[ y \mapsto g(y) := \sup_x \{ f(x) : \omega(x) \le y,\ x \in X \}, \]
for a convex cone $X \subset \mathbb{R}^n$ and continuous mappings $f$ and $\omega$. Assume that $g$ is concave, upper semi-continuous, and bounded from above by some affine function, so that Legendre-Fenchel duality yields
\[ g(y) = \inf_\lambda \{ \lambda' y - g^*(\lambda) \}, \quad \text{where} \quad g^*(\lambda) := \inf_y \{ \lambda' y - g(y) \}, \]
and where the domain of $g^*$ is the set $\mathcal{D} \subset \mathbb{R}^m$ in (2.13).

Lemma 6.
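Suppose that $\mathrm{ri}\,\mathcal{D}$ is not empty and for every $\lambda \in \mathrm{ri}\,\mathcal{D}$ there exists an exponent $q \gg 1$ such that (2.14) holds. Then:
\[ \lambda \mapsto h_1(\lambda; y) := \lim_{p \to \infty} \phi_p(\lambda; y) = \lambda' y - g^*(\lambda), \quad \forall \lambda \in \mathrm{ri}\,\mathcal{D}. \tag{2.28} \]

Proof.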
Let $\lambda \in \mathrm{ri}\,\mathcal{D}$ be fixed. The given hypothesis and Lemma 1 imply that
\[
\begin{aligned}
\lim_{p \to \infty} \ln \left\| e^{f(x) - \lambda' \omega(x)} \right\|_{L_p(X)} &= \sup_{x \in X} \{ f(x) - \lambda' \omega(x) \} = \sup_y \sup_{x \in X,\ \omega(x) \le y} \{ f(x) - \lambda' y \} \\
&= \sup_y \{ g(y) - \lambda' y \} = -g^*(\lambda),
\end{aligned}
\]
and so (2.28) follows from the definition (2.19) of $\phi_p$. □

As a consequence we obtain:

Corollary 7.
Let $\mathcal{D}$ be as in (2.13) with $\mathrm{ri}\,\mathcal{D} \ne \emptyset$, $\phi_p$ as in (2.19) and let $\lambda \mapsto h_1(\lambda; y)$ be as in (2.28). Then the optimization problem
\[ P^* : \quad \min_\lambda \{ h_1(\lambda; y) : \lambda \in \mathrm{ri}\,\mathcal{D} \} \tag{2.29} \]
is a dual of $P$. Moreover, if $g$ is concave, upper semi-continuous and bounded above by some affine function, then strong duality holds.

Proof. By Lemma 6, $h_1(\lambda; y) = \lambda' y - g^*(\lambda)$ for all $\lambda \in \mathrm{ri}\,\mathcal{D}$. And so if $\min P^*$ (resp. $\max P$) denotes the optimal value of $P^*$ (resp. $P$), one has
\[ \min P^* = \min_\lambda \{ \lambda' y - g^*(\lambda) : \lambda \in \mathrm{ri}\,\mathcal{D} \} \ge g(y) = \max P, \]
where we have used that $-g^*(\lambda) = \sup_z \{ g(z) - \lambda' z \} \ge g(y) - \lambda' y$. Finally, if $g$ is concave, upper semi-continuous and bounded above by some affine function, then $-g^*$ is convex. Therefore, as a convex function is continuous on its domain $\mathcal{D}$ (which is convex),
\[
\begin{aligned}
\min P^* &= \min_\lambda \{ h_1(\lambda; y) : \lambda \in \mathrm{ri}\,\mathcal{D} \} = \min_\lambda \{ \lambda' y - g^*(\lambda) : \lambda \in \mathrm{ri}\,\mathcal{D} \} \\
&= \min_\lambda \{ \lambda' y - g^*(\lambda) : \lambda \in \mathcal{D} \} = g(y),
\end{aligned}
\]
that is, strong duality holds. □

In a number of cases, the $L_p$-norm approximation of $g^*$ can be obtained explicitly as a function of $\lambda$, whereas $g^*$ itself cannot be obtained explicitly from (1.3). In this situation one obtains an explicit LBF $\phi_p$ with parameter $p$, for some dual $P^*$ of $P$, and sometimes an explicit dual problem $P^*$. Indeed if $\phi_p$ is known explicitly, one may sometimes get its pointwise limit $h_1(\lambda; y)$ in (2.28) in closed form, and so $P^*$ is defined explicitly by (2.29).

With $p$ fixed, computing $\phi_p(\lambda; y)$ reduces to computing the integral over a convex cone of an exponential of some function parametrized by $\lambda$ and $p$. Sometimes this can be done with the help of some known transforms such as the Laplace or Weierstrass transforms, as illustrated below.

Linear mappings and Laplace transform.
Let $\omega : \mathbb{R}^n \to \mathbb{R}^m$ be a linear mapping, with $\omega(x) = A x$ for some real matrix $A \in \mathbb{R}^{m \times n}$, and let $X = \mathbb{R}^n_+$. Then
\[ \ln \left\| e^{f(x) - \lambda' \omega(x)} \right\|_{L_p(X)} = \frac{1}{p} \ln \left( \int_X e^{-(p A' \lambda)' x} e^{p f(x)} \, dx \right) = \frac{1}{p} \ln \left( \mathcal{L}[e^{p f}](p A' \lambda) \right). \]
That is, the $L_p$-norm approximation is, up to the factor $1/p$, the logarithm of the Laplace transform of the function $e^{p f}$, evaluated at the point $p A' \lambda \in \mathbb{R}^n$. So if in problem $P$ the objective function $f$ is such that $e^{p f}$ has an explicit Laplace transform, then one obtains an explicit expression for the LBF $\lambda \mapsto \phi_p(\lambda; y)$ defined in (2.19). For instance if $f(x) = c' x + \ln q(x)$ for some vector $c \in \mathbb{R}^n$ and some polynomial $q \in \mathbb{R}[x]$, positive on the feasible set of $P$, write
\[ q(x)^p = \sum_{\alpha \in \mathbb{N}^n} q_{p \alpha} x^\alpha, \]
for finitely many nonzero coefficients $(q_{p \alpha})$, where the notation $x^\alpha$ stands for the monomial $x_1^{\alpha_1} \cdots x_n^{\alpha_n}$. Then $\ln \left( \mathcal{L}[e^{p f}](p A' \lambda) \right)$ can be computed in closed form since we have:
\[
\begin{aligned}
\ln \left( \mathcal{L}[e^{p f}](p A' \lambda) \right) &= \ln \left( \sum_{\alpha \in \mathbb{N}^n} q_{p \alpha} \int_X e^{p (c - A' \lambda)' x} x^\alpha \, dx \right) \\
&= -n \ln p + \ln \left( \sum_{\alpha \in \mathbb{N}^n} q_{p \alpha} \frac{\partial^{|\alpha|}}{\partial x^\alpha} \frac{1}{\prod_{i=1}^n (A' \lambda - c)_i} \right),
\end{aligned}
\]
where $\frac{\partial^{|\alpha|}}{\partial x^\alpha} = \prod_{i=1}^n \frac{\partial^{\alpha_i}}{\partial x_i^{\alpha_i}}$. Of course the above expression can become quite complicated, especially for large values of $p$. But it is explicit in the variables $(\lambda_i)$. If the function $x \mapsto \ln q(x)$ is concave, then Corollary 7 applies. On the other hand, obtaining $g^*$ explicitly would require solving $A' \lambda - c - \nabla q(x)/q(x) = 0$ in closed form, which is impossible in general.

Similarly if $f$ is linear, i.e. $x \mapsto f(x) = c' x$ for some vector $c \in \mathbb{R}^n$, then
\[ \ln \left\| e^{f(x) - \lambda' \omega(x)} \right\|_{L_p(X)} = p^{-1} \ln \left( \mathcal{L}[e^{-p \lambda' \omega(x)}](-p c) \right), \]
and so if the function $x \mapsto e^{-p \lambda' \omega(x)}$ has an explicit Laplace transform then so does the $L_p$-norm approximation, and again, $\phi_p$ is obtained in closed form.

Example 4.
As a simple illustrative example, consider the optimization problem
\[
P:\quad \sup_x \left\{\, c'x + \sum_{k=1}^n b_k \ln x_k \;:\; Ax \leq y,\ x > 0,\ x \in \mathbb{R}^n \right\}, \tag{2.30}
\]
for some given matrix $A \in \mathbb{R}^{m \times n}$ and vectors $b, c \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$. We suppose that $b \geq 0$, so that $c'x + \ln(x^b)$ is concave, in which case $P$ is a convex program. Notice that with $X = \mathbb{R}^n_{++} := \{ x \in \mathbb{R}^n : x > 0 \}$,
\[
\sup_x \left\{\, c'x + \ln(x^b) - \lambda' A x \;:\; x \in X \right\} < \infty
\]
whenever $\lambda$ lies in $D = \{ \lambda \in \mathbb{R}^m : A'\lambda > c,\ \lambda \geq 0 \}$. The notation $x^b$ stands for the monomial $x_1^{b_1} \cdots x_n^{b_n}$.

Proposition 8.
The function $\phi_p$ in (2.19) associated with the optimization problem (2.30) is given by
\[
\phi_p(\lambda; y) = \lambda' y + \sum_{k=1}^n \left[ \frac{\ln \Gamma(1 + p b_k)}{p} - b_k \ln (p A_k' \lambda - p c_k) \right]
- \sum_{k=1}^n \frac{\ln (A_k' \lambda - c_k)}{p} - \sum_{j=1}^m \frac{\ln \lambda_j}{p} - \frac{m + n}{p} \ln p, \tag{2.31}
\]
where $A_k$ denotes the $k$-th column of $A$, and it is the LBF with parameter $p$ of the dual problem
\[
P^*:\quad \inf_\lambda \left\{\, \lambda' y - \sum_{k=1}^n b_k \ln \left[ \frac{A_k' \lambda - c_k}{e^{-1} b_k} \right] \;:\; A' \lambda > c,\ \lambda \geq 0 \right\}.
\]
In particular, strong duality holds, i.e., the optimal values of $P$ and $P^*$ are equal.

Proof. We have
\[
\big\| e^{c'x - \lambda' A x}\, x^b \big\|^p_{L^p(X)} = \int_X e^{p (c - A' \lambda)' x}\, x^{p b} \, dx = \prod_{k=1}^n \frac{\Gamma(1 + p b_k)}{(p A_k' \lambda - p c_k)^{1 + p b_k}} < \infty,
\]
whenever $p \in \mathbb{N}$ and $\lambda > 0$ in $\mathbb{R}^m$ satisfies $A' \lambda > c$ (and where $\Gamma$ is the usual Gamma function). Next, $\phi_p$ in (2.19) reads
\[
\phi_p(\lambda; y) = \lambda' y + \frac{1}{p} \ln \big\| e^{c'x - \lambda' A x}\, x^b \big\|^p_{L^p(X)} - \sum_{j=1}^m \frac{\ln (p \lambda_j)}{p},
\]
which is (2.31). Next, Stirling's approximation $\Gamma(1 + t) \approx (t/e)^t \sqrt{2 \pi t}$ for real numbers $t \gg 1$ yields
\[
\lim_{p \to \infty} \left[ \frac{\ln \Gamma(1 + p b_k)}{p} - b_k \ln p \right] = \lim_{p \to \infty} \left[ b_k \ln \frac{p b_k}{e} - b_k \ln p \right] = b_k \ln (e^{-1} b_k),
\]
while the remaining terms of (2.31), each of order $(\ln p)/p$ or smaller, vanish as $p \to \infty$. By Lemma 6,
\[
\lambda' y - g^*(\lambda) = \lim_{p \to \infty} \phi_p(\lambda; y) = \lambda' y - \sum_{k=1}^n b_k \ln \left[ \frac{A_k' \lambda - c_k}{e^{-1} b_k} \right].
\]
And so, by Corollary 7, the function $\phi_p$ is the LBF with parameter $p$ of the dual problem
\[
P^*:\quad \inf_\lambda \left\{\, \lambda' y - \sum_{k=1}^n b_k \ln \left[ \frac{A_k' \lambda - c_k}{e^{-1} b_k} \right] \;:\; A' \lambda > c,\ \lambda \geq 0 \right\}.
\]
In particular, strong duality holds. □
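The closed form (2.31) and its limit can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the instance ($m = n = 1$, $A = 1$, $b = 2$, $c = 1$, $y = 3$) and the helper names `slack`, `phi_p`, `h` and `argmin_convex` are hypothetical and do not come from the paper. It evaluates $\phi_p$ through `math.lgamma`, compares it with the dual objective of Proposition 8 for increasing $p$, and minimizes that dual objective by a simple ternary search.

```python
import math

def slack(lam, A, c):
    """u_k = A_k' lam - c_k, with A_k the k-th column of the m-by-n matrix A."""
    m, n = len(A), len(c)
    return [sum(A[j][k] * lam[j] for j in range(m)) - c[k] for k in range(n)]

def phi_p(lam, y, A, b, c, p):
    """Closed form (2.31) of the log-barrier function, for lam > 0 and A'lam > c."""
    m, n = len(A), len(c)
    u = slack(lam, A, c)
    if min(u) <= 0 or min(lam) <= 0:
        raise ValueError("lam must satisfy lam > 0 and A'lam > c")
    val = sum(l * yj for l, yj in zip(lam, y))
    for k in range(n):
        val += math.lgamma(1 + p * b[k]) / p - b[k] * math.log(p * u[k])
        val -= math.log(u[k]) / p
    val -= sum(math.log(l) for l in lam) / p
    val -= (m + n) / p * math.log(p)
    return val

def h(lam, y, A, b, c):
    """Pointwise limit of phi_p: the objective of the dual P* in Proposition 8."""
    u = slack(lam, A, c)
    val = sum(l * yj for l, yj in zip(lam, y))
    for uk, bk in zip(u, b):
        if bk > 0:  # a term with b_k = 0 contributes nothing
            val -= bk * math.log(uk * math.e / bk)
    return val

def argmin_convex(fun, lo, hi, iters=200):
    """Ternary search for the minimizer of a convex function on [lo, hi]."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if fun(m1) < fun(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

# Hypothetical instance with m = n = 1: sup { x + 2 ln x : x <= 3, x > 0 }.
A, b, c, y = [[1.0]], [2.0], [1.0], [3.0]
# The objective is increasing in x, so the primal optimum is attained at x = 3.
primal_value = 3.0 + 2.0 * math.log(3.0)

# Minimize the dual objective over lam > c/A = 1 (search interval is an assumption).
lam_star = argmin_convex(lambda t: h([t], y, A, b, c), 1.0 + 1e-9, 10.0)
dual_value = h([lam_star], y, A, b, c)

# Gap between phi_p and its limit at a fixed feasible lam, for growing p.
gaps = [abs(phi_p([2.0], y, A, b, c, p) - h([2.0], y, A, b, c))
        for p in (1e2, 1e4, 1e6)]

print(lam_star, dual_value, primal_value, gaps)
```

On this instance the dual objective $3\lambda - 2\ln\big((\lambda - 1)e/2\big)$ is minimized at $\lambda = 5/3$, where it equals $3 + 2\ln 3$, the primal value, in agreement with the strong duality claim. The gaps shrink roughly like $(\ln p)/p$, which is the order of the terms of (2.31) that vanish in the limit.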
References

[1] F. Baccelli, G. Cohen, J. Olsder and J.P. Quadrat, Synchronization and Linearity, Wiley, New York, 1992.
[2] A.V. Fiacco, G.P. McCormick, Nonlinear Programming, Sequential Unconstrained Minimization Techniques, John Wiley & Sons, New York, 1968.
[3] K.R. Frisch, The logarithmic potential method of convex programming, Memorandum, Institute of Economics, Oslo, Norway, 1955.
[4] O. Güler, Barrier functions in interior point methods, Math. Oper. Res. (1996), 860–885.
[5] O. Güler and L. Tuncel, Characterization of the barrier parameter of homogeneous convex cones, Math. Progr. (1998), 55–76.
[6] D. den Hertog, Interior Point Approach to Linear, Quadratic and Convex Programming, Kluwer, Dordrecht, 1994.
[7] J.B. Hiriart-Urruty, Conditions for global optimality, in: Handbook of Global Optimization, R. Horst and P. Pardalos (Eds.), Kluwer, Dordrecht (1995), pp. 1–26.
[8] J.B. Lasserre, Why the logarithmic barrier function in convex and linear programming, Oper. Res. Letters (2000), 149–152.
[9] J.B. Lasserre, Linear and Integer Programming Versus Linear Integration and Counting, Springer, New York, 2009.
[10] V.P. Maslov, Méthodes Opératorielles, Éditions Mir, Moscou 1973, Traduction Française, 1987.
[11] B. Mond and T. Weir, Generalized convexity and higher order duality, J. Math. Sci. (1981-83), 74–94.
[12] B. Mond, Mond-Weir duality, in: C.E.M. Pearce and E. Hunt (Eds.), Optimization: Structures and Applications, Springer, Dordrecht (2009), pp. 157–165.
[13] V.I. Piterbarg, V.R. Fatalov, The Laplace method for probability measures on Banach spaces, Russian Math. Surveys (1995), 1152–1223.
[14] P. Wolfe, A duality theorem in nonlinear programming, Quart. Appl. Math. (1961), 239–244.

LAAS-CNRS and Institute of Mathematics, University of Toulouse, LAAS, 7 avenue du Colonel Roche, 31077 Toulouse Cédex 4, France
E-mail address: [email protected]

Depto. Matematicas, CINVESTAV-IPN, Apdo. Postal 14-740, Mexico, D.F. 07000, Mexico