Decomposition by Partial Linearization: Parallel Optimization of Multi-Agent Systems
Gesualdo Scutari, Francisco Facchinei, Peiran Song, Daniel P. Palomar, Jong-Shi Pang
Abstract—We propose a novel decomposition framework for the distributed optimization of general nonconvex sum-utility functions arising naturally in the system design of wireless multiuser interfering systems. Our main contributions are: i) the development of the first class of (inexact) Jacobi best-response algorithms with provable convergence, where all the users simultaneously and iteratively solve a suitably convexified version of the original sum-utility optimization problem; ii) the derivation of a general dynamic pricing mechanism that provides a unified view of existing pricing schemes, which are based, instead, on heuristics; and iii) a framework that can be easily particularized to well-known applications, giving rise to very efficient practical (Jacobi or Gauss-Seidel) algorithms that outperform existing ad-hoc methods proposed for very specific problems. Interestingly, our framework contains as special cases well-known gradient algorithms for nonconvex sum-utility problems, and many block-coordinate descent schemes for convex functions.
I. INTRODUCTION

WIRELESS networks are composed of users that may have different objectives and generate interference when no multiplexing scheme is imposed a priori to regulate the transmissions; examples are peer-to-peer, ad-hoc, and cognitive radio systems. A usual and convenient way of designing such multiuser systems is by optimizing the "social function", i.e., the (weighted) sum of the users' objective functions. Since centralized solution methods are too demanding in most applications, the main difficulty of this formulation lies in performing the optimization in a distributed manner with limited signaling among the users. When the social problem is a sum-separable convex program, many distributed methods have been proposed, based on primal and dual decomposition techniques; see, e.g., [2]–[4] and references therein. In this paper we address the more frequent and difficult case in which the social function is nonconvex. It is well known that the problem of finding a global minimum of the social function is, in general, NP-hard (see, e.g., [5]), and centralized solution methods (e.g., based on combinatorial approaches) are too demanding in most applications. As a consequence, recent research efforts have focused on finding high-quality suboptimal solutions efficiently, via easy-to-implement (possibly) distributed algorithms. A recent survey on nonconvex resource allocation problems in interfering networks modeled as Gaussian Interference Channels (ICs) is [6].
G. Scutari and P. Song are with the Dpt. of Electrical Eng., State Univ. of New York at Buffalo, Buffalo, USA. F. Facchinei is with the Dpt. of Computer, Control, and Management Eng., Univ. of Rome "La Sapienza", Rome, Italy. J.-S. Pang is with the Dpt. of Industrial and Systems Eng., Univ. of Southern California Viterbi School of Eng., Los Angeles, USA. D. Palomar is with the Dpt. of Electronic and Computer Eng., Hong Kong Univ. of Science and Technology, Hong Kong.
Sequential decomposition algorithms were proposed in [21]–[24] for the sum-rate maximization problem over SISO/MIMO ICs, and in [25] for more general (nonconvex) functions. In these algorithms, only one agent at a time is allowed to update his optimization variables, a fact that in large-scale networks may lead to excessive communication overhead and slow convergence. The aim of this paper is instead the study of more appealing simultaneous distributed methods for general nonconvex sum-utility problems, where all users can update their variables at the same time. The design of such algorithms with provable convergence is much more difficult, as also witnessed by the scarcity of results available in the literature. Besides the application of the classical gradient projection algorithm to the sum-rate maximization problem over MIMO ICs [26], parallel iterative algorithms (with message passing) for DSL/ad-hoc SISO networks and MIMO broadcast interfering channels were proposed in [27]–[29] and [30], respectively. Unfortunately, the gradient schemes [26] suffer from slow convergence and do not exploit any degree of convexity that might be present in the sum-utility function; [27]–[29] hinge crucially on the special log-structure of the users' rate functions; and [30] is based on the connection with a weighted MMSE problem. This makes [27]–[30] not applicable to different classes of sum-utility problems.

Building on the idea first introduced in [1], the main contribution of this paper is to propose a new decomposition method that: i) converges to stationary points of a large class of (nonconvex) social problems, encompassing most sum-utility functions of practical interest (including functions of complex variables); ii) decomposes well across the users, resulting in the parallel solution of convex subproblems, one for each user; iii) converges also if the users' subproblems are solved in an inexact way; and iv) contains as special cases the gradient algorithms for nonconvex sum-utility problems, and many block-coordinate descent schemes for convex functions. Moreover, the proposed framework can be easily particularized to well-known applications, such as [21]–[24], [29], [31], giving rise in a unified fashion to distributed simultaneous algorithms that outperform existing ad-hoc methods both theoretically and numerically. We remark that, while we follow the seminal ideas put forward in [1], in this paper, besides providing full proofs of the results in [1], we i) consider a much wider class of social problems and (possibly inexact) algorithms, including [1] as special cases, ii) discuss in detail the case of functions of complex variables, and iii) compare numerically with state-of-the-art alternative methods. To the best of our knowledge, this paper is the first attempt toward the development of decomposition techniques for general nonconvex sum-utility problems that allow distributed simultaneous (possibly inexact) best-response-based updates among the users.

On one hand, our approach draws on the Successive Convex Approximation (SCA) paradigm, but relaxes the key requirement that the convex approximation must be a tight global upper bound of the social function, as required instead in [27], [32], [33] (see Sec. VI for a detailed comparison with [27], [32], [33]).
This represents a turning point in the design of distributed SCA-based methods, since, to date, finding such an upper-bound convex approximation for sum-utility functions having no specific structure (as, e.g., [24], [26]–[30]) has been an elusive task. On the other hand, our method also sheds new light on widely used pricing mechanisms: indeed, our scheme can be viewed as a dynamic pricing algorithm where the pricing rule derives from a deep understanding of the problem characteristics and is not obtained on an ad-hoc basis, as instead in [21]–[24], [31]. We conclude this review by mentioning the recent work [34], where the authors, developing ideas contained in [30], [33], proposed parallel schemes based on the SCA idea that are applicable (only) to the class of sum-utility problems for which a connection with an MMSE formulation can be established. Note that [33], [34], which share some ideas with our approach, appeared after [1].

The rest of the paper is organized as follows. Sec. II introduces the sum-utility optimization problem along with some motivating examples. Sec. III presents our novel decomposition mechanism based on partial linearizations; the algorithmic framework is described in Sec. IV. Sec. V extends our results to sum-utility problems in the complex domain; further generalizations are discussed in Sec. VI. In Sec. VII we apply our new algorithms to some resource allocation problems over SISO and MIMO ICs, and compare their performance with state-of-the-art decomposition schemes. Finally, Sec. VIII draws some conclusions.

II. PROBLEM FORMULATION
We consider the design of a multiuser system composed of $I$ coupled users $\mathcal{I} \triangleq \{1, \ldots, I\}$. Each user $i$ makes decisions on his own $n_i$-dimensional real strategy vector $\mathbf{x}_i$, which belongs to the feasible set $\mathcal{K}_i$; the vector of the variables of the other users is denoted by $\mathbf{x}_{-i} \triangleq (\mathbf{x}_j)_{j \neq i} \in \mathcal{K}_{-i} \triangleq \prod_{j \neq i} \mathcal{K}_j$; the users' strategy profile is $\mathbf{x} = (\mathbf{x}_i)_{i=1}^I$, and the joint strategy set of the users is $\mathcal{K} \triangleq \prod_{j \in \mathcal{I}} \mathcal{K}_j$. The system design is formulated as

$$\underset{\mathbf{x}}{\text{minimize}} \quad U(\mathbf{x}) \triangleq \sum_{\ell \in \mathcal{I}_f} f_\ell(\mathbf{x}) \qquad \text{subject to} \quad \mathbf{x}_i \in \mathcal{K}_i, \ \forall i \in \mathcal{I}, \tag{1}$$

with $\mathcal{I}_f \triangleq \{1, \ldots, I_f\}$. Observe that, in principle, the set $\mathcal{I}_f$ of objective functions is different from the set $\mathcal{I}$ of users; we show shortly how to exploit this extra degree of freedom to good effect. Of course, (1) contains the most common case where there is exactly one function for each user, i.e., $I = I_f$.

Assumptions.
We make the following blanket assumptions:
A1) Each $\mathcal{K}_i$ is closed and convex;
A2) Each $f_i$ is continuously differentiable on $\mathcal{K}$;
A3) Each $\nabla_{\mathbf{x}} f_i$ is Lipschitz continuous on $\mathcal{K}$, with constant $L_{\nabla f_i}$; let $L_{\nabla U} \triangleq \sum_i L_{\nabla f_i}$;
A4) The lower level set $\mathcal{L}(\mathbf{x}^0) \triangleq \{\mathbf{x} \in \mathcal{K} : U(\mathbf{x}) \le U(\mathbf{x}^0)\}$ of the social function $U$ is compact for some $\mathbf{x}^0 \in \mathcal{K}$.

The assumptions above are quite standard and are satisfied by a large class of problems of practical interest. In particular, condition A4 guarantees that the social problem has a solution, even when the feasible set $\mathcal{K}$ is not bounded; if $\mathcal{K}$ is bounded, A4 is trivially satisfied. A sufficient condition for A4 when $\mathcal{K}$ is not necessarily bounded is that $U$ be coercive [i.e., $U(\mathbf{x}) \to +\infty$ as $\|\mathbf{x}\| \to +\infty$, with $\mathbf{x} \in \mathcal{K}$]. Note that, differently from classical Network Utility Maximization (NUM) problems, here we do not assume any convexity of the functions $f_\ell$; thus, (1) is a nonconvex minimization problem. For the sake of simplicity, in (1) we assume that the users' strategies are real vectors; in Sec. V, we extend our framework to complex matrix strategies, to cover also the design of MIMO systems.

A motivating example. The social problem (1) is general enough to encompass many sum-utility problems of practical interest. It also includes well-known utility functions studied in the literature; an example is given next. Consider an $N$-parallel Gaussian IC composed of $I$ active users, and let

$$r_i(\mathbf{p}_i, \mathbf{p}_{-i}) \triangleq \sum_{k=1}^N \log\left(1 + \frac{|H_{ii}(k)|^2\, p_{ik}}{\sigma_{ik}^2 + \sum_{j \neq i} |H_{ij}(k)|^2\, p_{jk}}\right)$$

be the maximum achievable rate on link $i$, where $\mathbf{p}_i \triangleq (p_{ik})_{k=1}^N$ denotes the power allocation of user $i$ over the $N$ parallel channels, $\mathbf{p}_{-i} \triangleq (\mathbf{p}_j)_{j \neq i}$ is the power profile of all the other users $j \neq i$, $|H_{ij}(k)|^2$ is the gain of the channel between the $j$-th transmitter and the $i$-th receiver, $\sigma_{ik}^2$ is the variance of the thermal noise over carrier $k$ at receiver $i$, and $\sum_{j \neq i} |H_{ij}(k)|^2 p_{jk}$ represents the multiuser interference generated by the users $j \neq i$ at receiver $i$. Each transmitter $i$ is subject to the power constraints $\mathbf{p}_i \in \mathcal{P}_i$, with

$$\mathcal{P}_i \triangleq \left\{\mathbf{p}_i \in \mathbb{R}_+^N : \mathbf{W}_i\, \mathbf{p}_i \le \mathbf{I}_i^{\max}\right\}, \tag{2}$$

where the inequality, with given $\mathbf{I}_i^{\max} \in \mathbb{R}_+^{m_i}$ and $\mathbf{W}_i \in \mathbb{R}_+^{m_i \times N}$, is intended component-wise. Note that the linear (vector) constraints in (2) are general enough to model classical power budget constraints and different interference constraints, such as spectral masks or interference-temperature limits. Finally, let $\theta_i : \mathbb{R}_+ \to \mathbb{R}$ be the utility functions of the users' rates. The system design can then be formulated as

$$\underset{\mathbf{p}_1, \ldots, \mathbf{p}_I}{\text{maximize}} \quad \sum_{i \in \mathcal{I}} \theta_i(r_i(\mathbf{p}_i, \mathbf{p}_{-i})) \qquad \text{subject to} \quad \mathbf{p}_i \in \mathcal{P}_i, \ \forall i \in \mathcal{I}. \tag{3}$$

Note that (3) is an instance of (1), with $I_f = I$; moreover, assumptions A1–A4 are satisfied if the utility functions $\theta_i(\cdot)$ are i) concave and nondecreasing on $\mathbb{R}_+$, and ii) continuously differentiable with Lipschitz gradients. Interestingly, this class of functions $\theta_i(\cdot)$ includes many well-known special cases studied in the literature, such as the weighted sum-rate function, the harmonic mean of the rates, the geometric mean of (one plus) the rates, etc.; see, e.g., [6], [21], [22], [35].
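For concreteness, a minimal Python sketch of the rate functions $r_i(\mathbf{p}_i, \mathbf{p}_{-i})$ above follows; the array layout (powers as an $I \times N$ matrix, squared channel gains $|H_{ij}(k)|^2$ as an $I \times I \times N$ tensor) is an illustrative choice, not part of the model:

```python
import numpy as np

def achievable_rates(P, H2, sigma2):
    """Per-link rates r_i(p_i, p_{-i}) of the N-parallel Gaussian IC.

    P      : (I, N) power allocations, P[i, k] = p_{ik}
    H2     : (I, I, N) squared channel gains, H2[i, j, k] = |H_ij(k)|^2
    sigma2 : (I, N) noise variances sigma_{ik}^2
    """
    I, N = P.shape
    r = np.zeros(I)
    for i in range(I):
        # Multiuser interference at receiver i: sum_{j != i} |H_ij(k)|^2 p_{jk}
        mui = sum(H2[i, j] * P[j] for j in range(I) if j != i)
        r[i] = np.sum(np.log(1.0 + H2[i, i] * P[i] / (sigma2[i] + mui)))
    return r
```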
Since the class of problems (1) is in general nonconvex (and generally NP-hard [5]), the focus of this paper is on the design of distributed solution methods for computing stationary solutions (possibly local minima) of (1). Our major goal is to devise simultaneous best-response schemes fully decomposed across the users, meaning that all the users can solve in parallel a sequence of convex problems while converging to a stationary solution of the original nonconvex problem.

III. A NEW DECOMPOSITION TECHNIQUE
We begin with an informal description of our new algorithms that sheds light on the core idea of the novel decomposition technique and establishes the connection with classical gradient-based descent schemes. This will also explain why our scheme is expected to outperform current gradient methods. A formal description of the proposed algorithms, along with their main properties, is given in Sec. IV for the real case, and in Sec. V for the complex case.
A. What do conditional gradient methods miss?
A classical approach to solving a nonconvex problem like (1) would be to use some well-known gradient-based descent scheme. A simple way to generate a (feasible) descent direction is, for example, the conditional gradient method (also called the Frank-Wolfe method) [4]: given the current iterate $\mathbf{x}^n = (\mathbf{x}_i^n)_{i=1}^I$, the next feasible vector $\mathbf{x}^{n+1}$ is given by

$$\mathbf{x}^{n+1} = \mathbf{x}^n + \gamma^n\, \mathbf{d}^n, \tag{4}$$

where $\mathbf{d}^n \triangleq \bar{\mathbf{x}}^n - \mathbf{x}^n$, $\bar{\mathbf{x}}^n = (\bar{\mathbf{x}}_i^n)_{i=1}^I$ is the solution of the following set of convex problems (one for each user):

$$\bar{\mathbf{x}}_i^n = \underset{\mathbf{x}_i \in \mathcal{K}_i}{\text{argmin}} \left\{ \nabla_{\mathbf{x}_i} U(\mathbf{x}^n)^T (\mathbf{x}_i - \mathbf{x}_i^n) \right\}, \tag{5}$$

for all $i \in \mathcal{I}$, and $\gamma^n \in (0, 1]$ is the step-size of the algorithm, which needs to be properly chosen to guarantee convergence. Looking at (5), one infers that gradient methods are based on solving a sequence of parallel convex problems, one for each user, obtained by linearizing the whole utility function $U(\mathbf{x})$ around $\mathbf{x}^n$, a fact that does not exploit any "nice" structure that the original problem may potentially have.

At the basis of the proposed decomposition techniques there is instead the attempt to properly exploit any degree of convexity that might be present in the social function. To capture this idea, for each user $i \in \mathcal{I}$, let $\mathcal{S}_i \subseteq \mathcal{I}_f$ be the set of indices of all the functions $f_j(\mathbf{x}_i, \mathbf{x}_{-i})$ that are convex in $\mathbf{x}_i$ on $\mathcal{K}_i$, for any given $\mathbf{x}_{-i} \in \mathcal{K}_{-i}$:

$$\mathcal{S}_i \triangleq \{j \in \mathcal{I}_f : f_j(\bullet, \mathbf{x}_{-i}) \text{ is convex on } \mathcal{K}_i, \ \forall \mathbf{x}_{-i} \in \mathcal{K}_{-i}\}, \tag{6}$$

and let $\mathcal{C}_i \subseteq \mathcal{S}_i$ be a given subset of $\mathcal{S}_i$. The idea is to preserve the convex structure of the functions in $\mathcal{C}_i$ while linearizing the rest. Note that we allow the possibility that $\mathcal{S}_i = \emptyset$, even though we "hope" that $\mathcal{S}_i \neq \emptyset$; this latter case in fact occurs in most applications of interest, see Sec. VII. For each user $i \in \mathcal{I}$, we can introduce the following convex approximation of $U(\mathbf{x})$ around $\mathbf{x}^n \in \mathcal{K}$:

$$\tilde{f}_{\mathcal{C}_i}(\mathbf{x}_i; \mathbf{x}^n) \triangleq \sum_{j \in \mathcal{C}_i} f_j(\mathbf{x}_i, \mathbf{x}_{-i}^n) + \boldsymbol{\pi}_{\mathcal{C}_i}(\mathbf{x}^n)^T (\mathbf{x}_i - \mathbf{x}_i^n) + \frac{\tau_i}{2}\, (\mathbf{x}_i - \mathbf{x}_i^n)^T \mathbf{H}_i(\mathbf{x}^n)\, (\mathbf{x}_i - \mathbf{x}_i^n), \tag{7}$$

with

$$\boldsymbol{\pi}_{\mathcal{C}_i}(\mathbf{x}^n) \triangleq \sum_{j \in \mathcal{C}_{-i}} \nabla_{\mathbf{x}_i} f_j(\mathbf{x})\big|_{\mathbf{x} = \mathbf{x}^n}, \tag{8}$$

where $\mathcal{C}_{-i} \triangleq \mathcal{I}_f \setminus \mathcal{C}_i$ is the complement of $\mathcal{C}_i$, $\tau_i$ is a given nonnegative constant, and $\mathbf{H}_i(\mathbf{x}^n)$ is an $n_i \times n_i$ uniformly positive definite matrix (possibly dependent on $\mathbf{x}^n$), i.e., $\mathbf{H}_i(\mathbf{x}^n) - c_{H_i}\mathbf{I} \succeq \mathbf{0}$ for some positive $c_{H_i}$. For notational simplicity, we omit in $\tilde{f}_{\mathcal{C}_i}(\mathbf{x}_i; \mathbf{x}^n)$ the dependence on $\tau_i$ and $\mathbf{H}_i(\mathbf{x}^n)$. Note that in (7) we added a proximal-like regularization term, in order to relax the convergence conditions of the resulting algorithm or to enhance the convergence speed (cf. Sec. IV). A key feature of $\tilde{f}_{\mathcal{C}_i}$ that we will always require is that $\tilde{f}_{\mathcal{C}_i}(\bullet; \mathbf{x})$ be uniformly strongly convex. By this we mean the following. Let $c_{\tau_i}(\mathbf{x})$ be the constant of strong convexity of $\tilde{f}_{\mathcal{C}_i}(\bullet; \mathbf{x})$. We require that

$$c_{\tau_i} \triangleq \inf_{\mathbf{x} \in \mathcal{K}} c_{\tau_i}(\mathbf{x}) > 0. \tag{9}$$

Note that this is not an additional assumption, but just a requirement on the way $\tau_i$ is chosen. Under the uniform positive definiteness of $\mathbf{H}_i(\mathbf{x}^n)$, condition (9) is always satisfied if $\tau_i > 0$; however, it is also satisfied with $\tau_i = 0$ if $\sum_{j \in \mathcal{C}_i} f_j(\bullet, \mathbf{x}_{-i})$ is uniformly strongly convex on $\mathcal{K}_i$, a fact that occurs in many applications; see, e.g., Sec. VII.

Associated with each $\tilde{f}_{\mathcal{C}_i}(\mathbf{x}_i; \mathbf{x}^n)$ we can define the following "best-response" map, which resembles (5):

$$\hat{\mathbf{x}}_{\mathcal{C}_i}(\mathbf{x}^n, \tau_i) \triangleq \underset{\mathbf{x}_i \in \mathcal{K}_i}{\text{argmin}}\ \tilde{f}_{\mathcal{C}_i}(\mathbf{x}_i; \mathbf{x}^n). \tag{10}$$

Note that, in the setting above, $\hat{\mathbf{x}}_{\mathcal{C}_i}(\mathbf{x}^n, \tau_i)$ is always well defined, since the optimization problem in (10) is strongly convex and thus has a unique solution.
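To make the construction concrete, the following sketch computes the pricing vector (8) and solves (10) numerically for the special case $\mathbf{H}_i(\mathbf{x}^n) = \mathbf{I}$, with $\mathcal{K}_i$ modeled as a box; the function names and the use of a generic off-the-shelf solver are illustrative assumptions (in many applications (10) admits a closed form instead, cf. Sec. VII):

```python
import numpy as np
from scipy.optimize import minimize

def best_response(i, x, f_conv, grads_lin, tau=1.0, bounds=None):
    """Solve (10) for user i with H_i = I: minimize the surrogate (7),
    i.e. the convex part kept intact plus the linearized (priced) remainder.

    x         : list of current strategies (x_1^n, ..., x_I^n)
    f_conv    : callable xi -> sum_{j in C_i} f_j(xi, x_{-i}^n), convex in xi
    grads_lin : callables x -> grad_{x_i} f_j(x), one for each j in C_{-i}
    bounds    : simple box model of K_i (illustrative only)
    """
    xi_n = np.asarray(x[i], dtype=float)
    # Pricing vector (8): gradients of the linearized terms, evaluated at x^n
    pi = sum((g(x) for g in grads_lin), np.zeros_like(xi_n))

    def surrogate(xi):                      # cf. (7) with H_i = I
        d = xi - xi_n
        return f_conv(xi) + pi @ d + 0.5 * tau * d @ d

    return minimize(surrogate, xi_n, bounds=bounds).x
```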
Given (10), we can introduce the best-response mapping of the users, defined as

$$\mathcal{K} \ni \mathbf{y} \mapsto \hat{\mathbf{x}}_{\mathcal{C}}(\mathbf{y}, \boldsymbol{\tau}) \triangleq (\hat{\mathbf{x}}_{\mathcal{C}_i}(\mathbf{y}, \tau_i))_{i=1}^I, \tag{11}$$

where we also set $\boldsymbol{\tau} \triangleq (\tau_i)_{i=1}^I$. The proposed search direction $\mathbf{d}^n$ at the point $\mathbf{x}^n$ in (4) becomes then $\hat{\mathbf{x}}_{\mathcal{C}}(\mathbf{x}^n, \boldsymbol{\tau}) - \mathbf{x}^n$. The challenging question now is whether such a direction is still a descent direction for the function $U$ at $\mathbf{x}^n$, and how to choose the free parameters (such as the $\tau_i$'s, $\gamma^n$'s, and $\mathbf{H}_i(\mathbf{x}^n)$'s) in order to guarantee convergence to a stationary solution of the original nonconvex sum-utility problem. These issues are addressed in the next sections.

B. Properties of the best-response mapping $\hat{\mathbf{x}}_{\mathcal{C}}(\mathbf{y}, \boldsymbol{\tau})$

Before introducing a formal description of the proposed algorithms, we derive next some key properties of the best-response map $\hat{\mathbf{x}}_{\mathcal{C}}(\mathbf{y}, \boldsymbol{\tau})$, which shed light on how to choose the free parameters in (10) and prove convergence.
Proposition 1: Given the social problem (1) under A1–A4, suppose that each $\mathbf{H}_i(\mathbf{x}) - c_{H_i}\mathbf{I} \succeq \mathbf{0}$ for all $\mathbf{x} \in \mathcal{K}$ and some $c_{H_i} > 0$, and that $(c_{\tau_i})_{i=1}^I > \mathbf{0}$. Then the mapping $\mathcal{K} \ni \mathbf{y} \mapsto \hat{\mathbf{x}}_{\mathcal{C}}(\mathbf{y}, \boldsymbol{\tau})$ has the following properties:

(a) $\hat{\mathbf{x}}_{\mathcal{C}}(\bullet, \boldsymbol{\tau})$ is Lipschitz continuous on $\mathcal{K}$, i.e., there exists a positive constant $\hat{L}$ such that

$$\|\hat{\mathbf{x}}_{\mathcal{C}}(\mathbf{y}, \boldsymbol{\tau}) - \hat{\mathbf{x}}_{\mathcal{C}}(\mathbf{z}, \boldsymbol{\tau})\| \le \hat{L}\, \|\mathbf{y} - \mathbf{z}\|, \quad \forall \mathbf{y}, \mathbf{z} \in \mathcal{K}; \tag{12}$$

(b) the set of fixed points of $\hat{\mathbf{x}}_{\mathcal{C}}(\bullet, \boldsymbol{\tau})$ coincides with the set of stationary solutions of the social problem (1); therefore $\hat{\mathbf{x}}_{\mathcal{C}}(\mathbf{y}, \boldsymbol{\tau})$ has a fixed point;

(c) for every given $\mathbf{y} \in \mathcal{K}$, the vector $\hat{\mathbf{x}}_{\mathcal{C}}(\mathbf{y}, \boldsymbol{\tau}) - \mathbf{y}$ is a descent direction of the social function $U(\mathbf{x})$ at $\mathbf{y}$ such that

$$(\hat{\mathbf{x}}_{\mathcal{C}}(\mathbf{y}, \boldsymbol{\tau}) - \mathbf{y})^T\, \nabla_{\mathbf{x}} U(\mathbf{y}) \le -c\, \|\hat{\mathbf{x}}_{\mathcal{C}}(\mathbf{y}, \boldsymbol{\tau}) - \mathbf{y}\|^2, \tag{13}$$

for some positive constant $c \ge c_\tau$, with

$$c_\tau \triangleq \min_{i \in \mathcal{I}} \{c_{\tau_i}\}; \tag{14}$$

(d) if $\nabla_{\mathbf{x}} U(\mathbf{x})$ is bounded on $\mathcal{K}$, then there exists a finite constant $\alpha > 0$ such that

$$\|\hat{\mathbf{x}}_{\mathcal{C}}(\mathbf{y}, \boldsymbol{\tau}) - \mathbf{y}\| \le \alpha, \quad \forall \mathbf{y} \in \mathcal{K}. \tag{15}$$

Proof:
See Appendix A.

Proposition 1 makes formal the idea introduced in Sec. III-A and thus paves the way to the design of distributed best-response-like algorithms for (1) based on $\hat{\mathbf{x}}_{\mathcal{C}}(\bullet, \boldsymbol{\tau})$. Indeed, the inequality (13) states that either $(\hat{\mathbf{x}}_{\mathcal{C}}(\mathbf{x}^n, \boldsymbol{\tau}) - \mathbf{x}^n)^T \nabla_{\mathbf{x}} U(\mathbf{x}^n) < 0$ or $\hat{\mathbf{x}}_{\mathcal{C}}(\mathbf{x}^n, \boldsymbol{\tau}) = \mathbf{x}^n$. In the former case, $\mathbf{d}^n \triangleq \hat{\mathbf{x}}_{\mathcal{C}}(\mathbf{x}^n, \boldsymbol{\tau}) - \mathbf{x}^n$ is a descent direction of $U(\mathbf{x})$ at $\mathbf{x}^n$; in the latter case, $\mathbf{x}^n$ is a fixed point of the mapping $\hat{\mathbf{x}}_{\mathcal{C}}(\bullet, \boldsymbol{\tau})$ and thus a stationary solution of the original nonconvex problem (1) [Prop. 1(b)]. Quite interestingly, we can also provide a characterization of the fixed points of $\hat{\mathbf{x}}_{\mathcal{C}}(\mathbf{y}, \boldsymbol{\tau})$ [and thus of the stationary solutions of (1)] in terms of Nash equilibria of a game with a proper pricing mechanism. Formally, we have the following.
Proposition 2: Any fixed point $\mathbf{x}^\star$ of $\hat{\mathbf{x}}_{\mathcal{C}}(\bullet, \boldsymbol{\tau})$ is a Nash equilibrium of the game where each user $i \in \mathcal{I}$ solves the following priced convex optimization problem: given $\mathbf{x}_{-i}^\star$,

$$\min_{\mathbf{x}_i \in \mathcal{K}_i}\ \sum_{j \in \mathcal{C}_i} f_j(\mathbf{x}_i, \mathbf{x}_{-i}^\star) + \boldsymbol{\pi}_{\mathcal{C}_i}(\mathbf{x}^\star)^T \mathbf{x}_i. \tag{16}$$

According to the above proposition, the stationary solutions of (1) achievable as fixed points of $\hat{\mathbf{x}}_{\mathcal{C}_i}(\bullet, \boldsymbol{\tau})$ are unilaterally optimal for the objective functions in (16). This result is in agreement with those obtained in [22], [23] for the sum-rate maximization problem over SISO frequency-selective channels. Despite its theoretical interest, however, Prop. 2 does not help in practice to solve (1). Indeed, the computation of a Nash equilibrium of the game in (16) would require the a-priori knowledge of the prices $\boldsymbol{\pi}_{\mathcal{C}_i}(\mathbf{x}^\star)$, and thus of the equilibrium itself, which of course is not available.

IV. DISTRIBUTED DECOMPOSITION ALGORITHMS
We are now ready to introduce our new algorithms as a direct product of Prop. 1. We first focus on (inexact) Jacobi schemes (cf. Sec. IV-A); then we show that the same results hold also for (inexact) Gauss-Seidel updates (cf. Sec. IV-C).
A. Exact Jacobi best-response schemes
The first algorithm we propose is a Jacobi scheme where all users update their strategies simultaneously based on the best-response $\hat{\mathbf{x}}_{\mathcal{C}_i}(\bullet, \tau_i)$ (possibly with a memory); the formal description is given in Algorithm 1 below, and its convergence properties are given in Theorem 3.

Algorithm 1: Exact Jacobi SCA Algorithm
Data: $\boldsymbol{\tau} \ge \mathbf{0}$, $\{\gamma^n\} > 0$, $\mathbf{x}^0 \in \mathcal{K}$. Set $n = 0$.
(S.1): If $\mathbf{x}^n$ satisfies a termination criterion: STOP;
(S.2): For all $i \in \mathcal{I}$, compute $\hat{\mathbf{x}}_{\mathcal{C}_i}(\mathbf{x}^n, \tau_i)$ [cf. (10)];
(S.3): Set $\mathbf{x}^{n+1} \triangleq \mathbf{x}^n + \gamma^n\, (\hat{\mathbf{x}}_{\mathcal{C}}(\mathbf{x}^n, \boldsymbol{\tau}) - \mathbf{x}^n)$;
(S.4): $n \leftarrow n + 1$, and go to (S.1).
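A minimal sketch of the main loop of Algorithm 1 follows; the best-response callables stand for (10), and the step-length test is an illustrative placeholder for the termination criterion in (S.1):

```python
import numpy as np

def jacobi_sca(x0, best_responses, gamma_seq, tol=1e-6, max_iter=1000):
    """Algorithm 1 (exact Jacobi SCA): all users update simultaneously.

    x0             : list of feasible starting points x_i^0
    best_responses : callables x -> xhat_{C_i}(x, tau_i), one per user [cf. (10)]
    gamma_seq      : iterable of step-sizes {gamma^n}
    """
    x = [np.asarray(xi, dtype=float) for xi in x0]
    for n, gamma in zip(range(max_iter), gamma_seq):
        xhat = [br(x) for br in best_responses]                         # (S.2)
        x_new = [xi + gamma * (xh - xi) for xi, xh in zip(x, xhat)]     # (S.3)
        if max(np.linalg.norm(a - b) for a, b in zip(x_new, x)) < tol:  # (S.1)
            return x_new
        x = x_new
    return x
```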
Theorem 3: Given the social problem (1) under A1–A4, suppose that one of the two following conditions is satisfied:

(a) For each $i$, $\mathbf{H}_i(\mathbf{x})$ is such that $\mathbf{H}_i(\mathbf{x}) - c_{H_i}\mathbf{I} \succeq \mathbf{0}$ for all $\mathbf{x} \in \mathcal{K}$ and some $c_{H_i} > 0$; furthermore, $\{\gamma^n\}$ and $\boldsymbol{\tau} \ge \mathbf{0}$ are chosen so that

$$0 < \inf_n \gamma^n \le \sup_n \gamma^n \le \gamma^{\max} \le 1 \quad \text{and} \quad c_\tau \ge \gamma^{\max}\, L_{\nabla U}, \tag{17}$$

with $c_\tau$ defined in (14).

(b) For each $i$, $\mathbf{H}_i(\mathbf{x})$ is such that $\mathbf{H}_i(\mathbf{x}) - c_{H_i}\mathbf{I} \succeq \mathbf{0}$ for all $\mathbf{x} \in \mathcal{K}$ and some $c_{H_i} > 0$; $\boldsymbol{\tau} \ge \mathbf{0}$ is such that $c_\tau > 0$; and furthermore $\{\gamma^n\}$ is chosen so that

$$\gamma^n \in (0, 1], \quad \gamma^n \to 0, \quad \text{and} \quad \sum_n \gamma^n = +\infty. \tag{18}$$

Then, either Algorithm 1 converges in a finite number of iterations to a stationary solution of (1), or every limit point of the sequence $\{\mathbf{x}^n\}_{n=1}^\infty$ (at least one such point exists) is a stationary solution of (1). Moreover, none of such points is a local maximum of $U$.
Proof: See Appendix B.

Main features of Algorithm 1. The algorithm implements a novel distributed SCA decomposition: all the users solve in parallel a sequence of decoupled strongly convex optimization problems as in (10). The algorithm is expected to perform better than classical gradient-based schemes (at least in terms of convergence speed) at the cost of no extra signaling, because the structure of the objective functions is better preserved. It is guaranteed to converge under very mild assumptions (the weakest available in the literature) while offering some flexibility in the choice of the free parameters [conditions (a) or (b) of Theorem 3]. This degree of freedom can be exploited, e.g., to achieve the desired tradeoff between signaling, convergence speed, and computational effort, as discussed next.

As far as the computation of the best-response $\hat{\mathbf{x}}_{\mathcal{C}_i}(\mathbf{x}^n, \tau_i)$ is concerned, at each iteration every user needs to know $\sum_{j \in \mathcal{C}_i} f_j(\bullet, \mathbf{x}_{-i}^n)$ and $\boldsymbol{\pi}_{\mathcal{C}_i}(\mathbf{x}^n)$. The signaling required to acquire this information is of course problem-dependent. If the problem under consideration does not have any specific structure, the most natural message-passing strategy is to communicate directly $\mathbf{x}_{-i}^n$ and $(\nabla_{\mathbf{x}_i} f_j(\mathbf{x}^n))_{j \notin \mathcal{C}_i}$. However, in many specific applications much less signaling may be needed; see Sec. VII for some examples.

On the choice of the free parameters.
Convergence of Algorithm 1 is guaranteed either using a constant step-size rule [cf. (17)] or a diminishing step-size rule [cf. (18)]. Moreover, different choices of $\{\mathcal{C}_i\}$ are in general feasible for a given social function, resulting in different best-response functions and different signaling among the users.
1) Constant step-size:
In this case, $\gamma^n = \gamma \le \gamma^{\max}$ for all $n$, where $\gamma^{\max} \in (0, 1]$ needs to be chosen together with $\boldsymbol{\tau} \ge \mathbf{0}$ and $(\mathbf{H}_i(\mathbf{y}))_{i=1}^I$ so that the condition $c_\tau \ge \gamma^{\max} L_{\nabla U}$ is satisfied, with $c_\tau$ defined in (14). This can be done in several ways. A simple (but conservative) choice satisfying that condition is, e.g., $\tau_i = \tau > 0$ for all $i \in \mathcal{I}$, $\gamma^{\max} \in (0, 1]$, and $\gamma/\tau \le 1/L_{\nabla U}$. Note that this condition imposes a constraint only on the ratio $\gamma/\tau$, leaving free the choice of one of the two parameters.

An interesting special case worth mentioning is: $\gamma^n = \gamma^{\max} = 1$ for all $n$, $\mathbf{H}_i(\mathbf{y}) = \mathbf{I}$ for all $i \in \mathcal{I}$, and $\tau$ large enough so that $c_\tau \ge L_{\nabla U}$. This choice leads to the classical Jacobi best-response scheme (but with a proximal regularization), namely: at each iteration $n$,

$$\mathbf{x}_i^{n+1} = \hat{\mathbf{x}}_{\mathcal{C}_i}(\mathbf{x}^n, \tau_i), \quad \forall i \in \mathcal{I}.$$

To the best of our knowledge, this algorithm, along with its convergence conditions [Theorem 3(a)], represents a new result in the optimization literature; indeed, classical best-response nonlinear Jacobi schemes require much stronger (sufficient) conditions to converge (implying contraction) [4, Ch. 3.3.5]. Note that the choice of the $\tau_i$'s guaranteeing convergence [i.e., $c_\tau \ge L_{\nabla U}$] can be made locally by each user with no signaling exchange, once the Lipschitz constant $L_{\nabla U}$ is known. As a final remark, we point out that, in the case of a constant and "sufficiently" small step-size $\gamma^n$, one can relax the synchronization requirements among the users, allowing (partially) asynchronous updates of the users' best-responses (in the sense of [4]); we omit the details because of space limitation.
2) Variable step-size:
In scenarios where the knowledge of the system parameters, e.g., $L_{\nabla U}$, is not available, one can use the diminishing step-size rule (18). Under such a rule, convergence is guaranteed for any choice of $\mathbf{H}_i(\mathbf{x}) - c_{H_i}\mathbf{I} \succeq \mathbf{0}$ and $\boldsymbol{\tau} \ge \mathbf{0}$ such that $c_\tau > 0$. Note that if $\sum_{j \in \mathcal{C}_i} f_j(\bullet, \mathbf{x}_{-i})$ is strongly convex on $\mathcal{K}_i$ for any $\mathbf{x}_{-i} \in \mathcal{K}_{-i}$, one can also set $\tau_i = 0$; otherwise an arbitrary but positive $\tau_i$ is necessary. We will show in the next section that a diminishing step-size rule is also useful to allow an inexact computation of the best-response $\hat{\mathbf{x}}_{\mathcal{C}_i}(\mathbf{x}^n, \tau_i)$ while preserving convergence of the algorithm. Two classes of step-size rules satisfying (18) are: given $\gamma^0 = 1$,

Rule 1: $\quad \gamma^n = \gamma^{n-1}\left(1 - \epsilon\, \gamma^{n-1}\right), \quad n = 1, \ldots, \tag{19}$

Rule 2: $\quad \gamma^n = \dfrac{\gamma^{n-1} + \alpha(n)}{1 + \beta(n)}, \quad n = 1, \ldots, \tag{20}$

where in (19) $\epsilon \in (0, 1)$ is a given constant, whereas in (20) $\alpha(n)$ and $\beta(n)$ are two nonnegative real functions of $n \ge 1$ such that: i) $0 \le \alpha(n) \le \beta(n)$; and ii) $\alpha(n)/\beta(n) \to 0$ as $n \to \infty$, while $\sum_n (\alpha(n)/\beta(n)) = \infty$. Examples of such $\alpha(n)$ and $\beta(n)$ are: $\alpha(n) = \alpha$ or $\alpha(n) = \log(n)^\alpha$, and $\beta(n) = \beta n$ or $\beta(n) = \beta \sqrt{n}$, where $\alpha, \beta$ are given constants satisfying $\alpha \in (0, 1)$, $\beta \in (0, 1)$, and $\alpha \le \beta$.

Another issue to discuss is the choice of the free positive definite matrices $\mathbf{H}_i(\mathbf{y})$. Mimicking (quasi-)Newton-like schemes [36], a possible choice is to consider for $\mathbf{H}_i(\mathbf{x}^n)$ a proper (diagonal) uniformly positive definite "approximation" of the Hessian matrix $\nabla^2_{\mathbf{x}_i \mathbf{x}_i} U(\mathbf{x}^n)$. The exact expression to consider depends on the amount of signaling and the computational complexity required to compute such an $\mathbf{H}_i(\mathbf{x}^n)$, and thus varies with the specific problem under consideration.
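For illustration, the two diminishing step-size rules (19) and (20) can be generated as follows (in (20) we take $\alpha(n) = \alpha$ and $\beta(n) = \beta n$, one of the admissible choices listed above):

```python
def stepsize_rule_1(eps, gamma0=1.0):
    """Rule (19): gamma^n = gamma^{n-1} * (1 - eps * gamma^{n-1}), eps in (0, 1)."""
    gamma = gamma0
    while True:
        yield gamma
        gamma = gamma * (1.0 - eps * gamma)

def stepsize_rule_2(alpha, beta, gamma0=1.0):
    """Rule (20) with alpha(n) = alpha and beta(n) = beta * n, 0 < alpha <= beta."""
    gamma, n = gamma0, 0
    while True:
        yield gamma
        n += 1
        gamma = (gamma + alpha) / (1.0 + beta * n)
```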
3) On the choice of the $\mathcal{C}_i$'s: In general, more than one (feasible) choice of $\{\mathcal{C}_i\}$ is possible for a given social function, resulting in different decomposition schemes. Some illustrative examples are discussed next.

Example #1 − (Proximal) gradient/Newton algorithms: If each $\mathcal{C}_i = \emptyset$ and $I = I_f$, $\hat{\mathbf{x}}_{\mathcal{C}_i}(\mathbf{x}^n, \tau_i)$ reduces to the gradient response (5) (possibly with a proximal regularization). It turns out that (exact and inexact) gradient algorithms, along with their convergence conditions, are special cases of our framework. Note that if $\mathcal{S}_i = \emptyset$ for every $i$ (i.e., no convexity whatsoever is present in $U$), this is the only possible choice, and indeed our approach reduces to a gradient-like method. On the other hand, as soon as at least some $\mathcal{S}_i \neq \emptyset$, we may depart from the gradient method and exploit the available convexity.

Note that our framework also contains Newton-like updates. For instance, if $U(\mathbf{x}_i, \mathbf{x}_{-i}^n)$ is convex in $\mathbf{x}_i \in \mathcal{K}_i$ for any $\mathbf{x}_{-i}^n \in \mathcal{K}_{-i}$, a feasible choice is $\mathcal{C}_i = \emptyset$ and $\mathbf{H}_i(\mathbf{x}^n) = \nabla^2_{\mathbf{x}_i \mathbf{x}_i} U(\mathbf{x}^n)$, resulting in

$$\hat{\mathbf{x}}_i(\mathbf{x}^n, \tau_i) \triangleq \underset{\mathbf{x}_i \in \mathcal{K}_i}{\text{argmin}} \left\{ \nabla_{\mathbf{x}_i} U(\mathbf{x}^n)^T (\mathbf{x}_i - \mathbf{x}_i^n) + \frac{1}{2} (\mathbf{x}_i - \mathbf{x}_i^n)^T \nabla^2_{\mathbf{x}_i \mathbf{x}_i} U(\mathbf{x}^n) (\mathbf{x}_i - \mathbf{x}_i^n) + \frac{\tau_i}{2} \|\mathbf{x}_i - \mathbf{x}_i^n\|^2 \right\}. \tag{21}$$

Essentially, (21) corresponds to a Newton-like step of user $i$ in minimizing the "reduced" problem $\min_{\mathbf{x}_i \in \mathcal{K}_i} U(\mathbf{x}_i, \mathbf{x}_{-i}^n)$.

Example #2 − Pricing algorithms in [1]: Suppose that $I = I_f$ and each $\mathcal{S}_i = \{i\}$ (implying that $f_i(\bullet, \mathbf{x}_{-i})$ is convex on $\mathcal{K}_i$ for any $\mathbf{x}_{-i} \in \mathcal{K}_{-i}$). By taking each $\mathcal{C}_i = \{i\}$ and $\mathbf{H}_i(\mathbf{x}^n) = \mathbf{I}$, we obtain the pricing-based algorithms in [1]:

$$\hat{\mathbf{x}}_i(\mathbf{x}^n, \tau_i) \triangleq \underset{\mathbf{x}_i \in \mathcal{K}_i}{\text{argmin}} \left\{ f_i(\mathbf{x}_i, \mathbf{x}_{-i}^n) + \boldsymbol{\pi}_i(\mathbf{x}^n)^T \mathbf{x}_i + \frac{\tau_i}{2} \|\mathbf{x}_i - \mathbf{x}_i^n\|^2 \right\},$$

where $\boldsymbol{\pi}_i(\mathbf{x}^n) \triangleq \sum_{j \neq i} \nabla_{\mathbf{x}_i} f_j(\mathbf{x}^n)$. Algorithm 1 based on the above best-response naturally implements a pricing mechanism; indeed, each $\boldsymbol{\pi}_i(\mathbf{x}^n)$ represents a dynamic pricing that measures, in some sense, the marginal increase of the sum-utility of the other users due to a variation of the strategy of user $i$; roughly speaking, it works like a punishment imposed on each user for being too aggressive in choosing his own strategy and thus "hurting" the other users. Pricing algorithms based on heuristics have been proposed in a number of papers for the sum-rate maximization problem over SISO/SIMO/MIMO ICs [21]–[23], [31], [37]. However, on top of being sequential schemes, convergence of the algorithms in the aforementioned papers is established under relatively strong assumptions (e.g., limited number of users, special classes of functions, specific channel models and transmission schemes, etc.); see [23]. The pricing in our framework is instead the natural consequence of the proposed SCA decomposition technique, and it leads to simultaneous algorithms that can be applied (with convergence guarantees) to a very large class of problems, even when [21]–[23], [31], [37] fail.

Example #3 − (Proximal) Jacobi algorithms for a single jointly convex function: Suppose that the social function is a single (jointly) convex function $f(\mathbf{x}_1, \ldots, \mathbf{x}_I)$ on $\mathcal{K} = \prod_i \mathcal{K}_i$. Of course, this optimization problem can be interpreted as a special case of the framework (1), with $\mathcal{C}_i = \mathcal{S}_i = \{1\} = \mathcal{I}_f$ for all $i \in \mathcal{I}$ and $f_1(\mathbf{x}) = f(\mathbf{x})$. Then, setting $\mathbf{H}_i(\mathbf{x}^n) = \mathbf{I}$, the best-response (10) of each user $i$ reduces to

$$\hat{\mathbf{x}}_{\mathcal{C}_i}(\mathbf{x}^n, \tau_i) \triangleq \underset{\mathbf{x}_i \in \mathcal{K}_i}{\text{argmin}} \left\{ f(\mathbf{x}_i, \mathbf{x}_{-i}^n) + \frac{\tau_i}{2} \|\mathbf{x}_i - \mathbf{x}_i^n\|^2 \right\}. \tag{22}$$

Algorithm 1 based on (22) reads as a block-Jacobi scheme converging to the global minima of $f(\mathbf{x}_1, \ldots, \mathbf{x}_I)$ over $\mathcal{K}$ (cf. Theorem 3).
To the best of our knowledge, these are new algorithms in the literature; moreover, their convergence conditions enlarge the current ones; see, e.g., [4, Sec. 3.2.4]. Quite interestingly, this new algorithm can be readily applied to solve the sum-rate maximization over MIMO multiple-access channels [38], resulting in the first (inexact) simultaneous MIMO iterative waterfilling algorithm in the literature; we omit the details because of the space limitation.
Example #4 − Algorithms for DC programming:
The proposed framework applies naturally to sum-utility problems where the users' functions are the difference of two convex functions, namely:

$$\underset{\mathbf{x}_1, \ldots, \mathbf{x}_I}{\text{minimize}} \quad \sum_{i \in \mathcal{I}} f_i^{\text{cvx}}(\mathbf{x}) + \sum_{i \in \mathcal{I}} f_i^{\text{ccv}}(\mathbf{x}) \qquad \text{subject to} \quad \mathbf{x}_i \in \mathcal{K}_i, \ \forall i \in \mathcal{I}, \tag{23}$$

where $f_i^{\text{cvx}}(\mathbf{x})$ and $f_i^{\text{ccv}}(\mathbf{x})$ are convex and concave functions on $\mathcal{K}$, respectively. Letting

$$f_1(\mathbf{x}) \triangleq \sum_{i \in \mathcal{I}} f_i^{\text{cvx}}(\mathbf{x}) \quad \text{and} \quad f_2(\mathbf{x}) \triangleq \sum_{i \in \mathcal{I}} f_i^{\text{ccv}}(\mathbf{x}),$$

the optimization problem (23) can be interpreted as a special case of the framework (1), with $\mathcal{I}_f = \{1, 2\}$ and $\mathcal{C}_i = \{1\}$ for all $i \in \mathcal{I}$. The best-response (10) of each user $i$ then reduces to

$$\hat{\mathbf{x}}_{\mathcal{C}_i}(\mathbf{x}^n, \tau_i) = \underset{\mathbf{x}_i \in \mathcal{K}_i}{\text{argmin}} \left\{ f_1(\mathbf{x}_i, \mathbf{x}_{-i}^n) + \boldsymbol{\pi}_i(\mathbf{x}^n)^T \mathbf{x}_i + \frac{\tau_i}{2} \|\mathbf{x}_i - \mathbf{x}_i^n\|^2 \right\}, \tag{24}$$

where $\boldsymbol{\pi}_i(\mathbf{x}^n) \triangleq \nabla_{\mathbf{x}_i} f_2(\mathbf{x}^n)$ and $\mathbf{H}_i(\mathbf{x}^n) = \mathbf{I}$. The above decomposition can be applied, e.g., to the sum-rate maximization (3) when all $\theta_i(x) = w_i x$, with $w_i > 0$; see Sec. VII.

B. Inexact Jacobi best-response schemes
In many practical network settings, it can be useful to further reduce the computational effort needed to solve the users' (convex) subproblems (10) by allowing inexact computations of the best-response functions $\hat{\mathbf{x}}_{\mathcal{C}_i}(\mathbf{x}^n, \tau_i)$. Algorithm 2 is a variant of Algorithm 1 in which suitable approximations of $\hat{\mathbf{x}}_{\mathcal{C}_i}(\mathbf{x}^n, \tau_i)$ can be used.

Algorithm 2: Inexact Jacobi SCA Algorithm
Data: $\{\varepsilon_i^n\}$ for $i \in \mathcal{I}$, $\boldsymbol{\tau} \ge \mathbf{0}$, $\{\gamma^n\} > 0$, $\mathbf{x}^0 \in \mathcal{K}$. Set $n = 0$.
(S.1): If $\mathbf{x}^n$ satisfies a termination criterion: STOP;
(S.2): For all $i \in \mathcal{I}$, solve (10) within the accuracy $\varepsilon_i^n$: find $\mathbf{z}_i^n$ s.t. $\|\mathbf{z}_i^n - \hat{\mathbf{x}}_{\mathcal{C}_i}(\mathbf{x}^n, \tau_i)\| \le \varepsilon_i^n$;
(S.3): Set $\mathbf{x}^{n+1} \triangleq \mathbf{x}^n + \gamma^n\, (\mathbf{z}^n - \mathbf{x}^n)$;
(S.4): $n \leftarrow n + 1$, and go to (S.1).

The error term $\varepsilon_i^n$ in Step (S.2) measures the accuracy used at iteration $n$ in computing the solution $\hat{\mathbf{x}}_{\mathcal{C}_i}(\mathbf{x}^n, \tau_i)$ of each problem (10). Note that if we set $\varepsilon_i^n = 0$ for all $n$ and $i$, Algorithm 2 reduces to Algorithm 1. Obviously, the errors $\varepsilon_i^n$ and the step-sizes $\gamma^n$ must be chosen according to suitable conditions if one wants to guarantee convergence. These conditions are established in the following theorem.
Theorem 4: Let $\{\mathbf{x}^n\}_{n=1}^\infty$ be the sequence generated by Algorithm 2, under the setting of Theorem 3, where however we reinforce assumption A4 by assuming that $U$ is coercive on $\mathcal{K}$. Suppose that the sequences $\{\gamma^n\}$ and $\{\varepsilon_i^n\}$ satisfy the following conditions: i) $\gamma^n \in (0, 1]$; ii) $\gamma^n \to 0$; iii) $\sum_n \gamma^n = +\infty$; iv) $\sum_n (\gamma^n)^2 < +\infty$; and v) $\sum_n \varepsilon_i^n \gamma^n < +\infty$ for all $i = 1, \ldots, I$. Then, either Algorithm 2 converges in a finite number of iterations to a stationary solution of (1), or every limit point of the sequence $\{\mathbf{x}^n\}_{n=1}^\infty$ (at least one such point exists) is a stationary solution of (1).

Proof: See Appendix B.

As expected, in the presence of errors, convergence of Algorithm 2 is guaranteed if the sequence of approximated problems (10) is solved with increasing accuracy. Note that, in addition to requiring $\varepsilon_i^n \to 0$, condition v) of Theorem 4 also imposes a constraint on the rate at which the $\varepsilon_i^n$ go to zero, which depends on the rate of decrease of $\{\gamma^n\}$. Two instances of step-size rules satisfying the summability condition iv) are given by (19) and (some choices of) (20). An example of an error sequence satisfying condition v) is $\varepsilon_i^n \le c_i \gamma^n$, where $c_i$ is any finite positive constant. Such a condition can be enforced in Algorithm 2 in a distributed way, using classical error-bound results in convex analysis; see, e.g., [17, Ch. 6, Prop. 6.3.7].

Finally, it is worth observing that Algorithm 2 (and Algorithm 1) with a diminishing step-size rule satisfying i)–iv) of Theorem 4 can be made robust against (stochastic) errors on the price estimates, due to an imperfect communication scenario (random link failures, noisy estimates, quantization, etc.). Because of the space limitation, we do not elaborate further on this here; see [39] for details.
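As an illustration of conditions i)–v), the following generator couples rule (19), which also satisfies the summability condition iv), with the error sequence $\varepsilon_i^n = c_i \gamma^n$ mentioned above:

```python
def inexact_schedule(c=1.0, eps=1e-2, gamma=1.0):
    """Couple rule (19) with accuracies eps_i^n = c * gamma^n, so that
    sum_n eps_i^n * gamma^n <= c * sum_n (gamma^n)^2 < +inf  [condition v)]."""
    while True:
        yield gamma, c * gamma    # (step-size, accuracy) to use in Step (S.2)
        gamma = gamma * (1.0 - eps * gamma)
```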
C. (Inexact) Gauss-Seidel best-response schemes

The Gauss-Seidel implementation of the proposed SCA decomposition is described in Algorithm 3, where the users solve the convex subproblems (10) sequentially, in an exact or inexact form. In the algorithm, we use the notation $\mathbf{x}_{i<}^{t+1} \triangleq (\mathbf{x}_1^{t+1}, \ldots, \mathbf{x}_{i-1}^{t+1})$ and $\mathbf{x}_{i\ge}^t \triangleq (\mathbf{x}_i^t, \ldots, \mathbf{x}_I^t)$.

Algorithm 3: Inexact Gauss-Seidel SCA Algorithm
Data: $\{\varepsilon_i^t\}$ for $i \in \mathcal{I}$, $\boldsymbol{\tau} \ge \mathbf{0}$, $\{\gamma^t\} > 0$, $\mathbf{x}^0 \in \mathcal{K}$. Set $t = 0$.
(S.1): If $\mathbf{x}^t$ satisfies a termination criterion: STOP;
(S.2): For $i = 1, \ldots, I$,
  a) find $\mathbf{z}_i^t$ s.t. $\|\mathbf{z}_i^t - \hat{\mathbf{x}}_{\mathcal{C}_i}((\mathbf{x}_{i<}^{t+1}, \mathbf{x}_{i\ge}^t), \tau_i)\| \le \varepsilon_i^t$;
  b) set $\mathbf{x}_i^{t+1} \triangleq \mathbf{x}_i^t + \gamma^t\, (\mathbf{z}_i^t - \mathbf{x}_i^t)$;
(S.3): $t \leftarrow t + 1$, and go to (S.1).

Note that one round of Algorithm 3 (i.e., $t \leftarrow t + 1$), wherein all users sequentially update their own strategies, corresponds to $I$ consecutive iterations $n$ of the Jacobi updates described in Algorithms 1 and 2. In Appendix C we prove that, quite interestingly, Algorithm 3 can be interpreted as an inexact Jacobi scheme based on the best-response $\hat{\mathbf{x}}_{\mathcal{C}}(\bullet, \boldsymbol{\tau})$ satisfying Theorem 4. It turns out that convergence of Algorithm 3 follows readily from that of Algorithm 2, and is stated next.
Theorem 5: Let $\{\mathbf{x}^t\}_{t=1}^\infty$ be the sequence generated by Algorithm 3, under the setting of Theorem 4. Then, the conclusions of Theorem 4 hold.

Proof: See Appendix C.
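For comparison with the Jacobi sketch given after Algorithm 1, a minimal sketch of Algorithm 3 follows (with exact updates, i.e., $\varepsilon_i^t = 0$, for simplicity); updating the strategy profile in place is what realizes the Gauss-Seidel sweep over $(\mathbf{x}_{i<}^{t+1}, \mathbf{x}_{i\ge}^t)$:

```python
def gauss_seidel_sca(x0, best_responses, gamma_seq, rounds=100):
    """Algorithm 3 (sketch, exact updates): users update sequentially; the
    best response of user i is evaluated at (x_{<i}^{t+1}, x_{>=i}^t), i.e.
    it already sees the strategies refreshed earlier in the same round.
    x0 is assumed to be a list of numpy arrays."""
    x = [xi.copy() for xi in x0]
    for t, gamma in zip(range(rounds), gamma_seq):
        for i, br in enumerate(best_responses):
            z_i = br(x)                       # x holds fresh entries for j < i
            x[i] = x[i] + gamma * (z_i - x[i])
    return x
```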
V. THE COMPLEX CASE

In this section we show how to extend our framework to sum-utility problems where the users' optimization variables are complex matrices. This will allow us to deal with the design of MIMO multiuser systems. Let us consider the following sum-utility optimization:

$$\underset{\mathbf{X}_1, \ldots, \mathbf{X}_I}{\text{minimize}} \quad U(\mathbf{X}) \triangleq \sum_{\ell \in \mathcal{I}_f} f_\ell(\mathbf{X}) \qquad \text{subject to} \quad \mathbf{X}_i \in \mathcal{X}_i, \ \forall i \in \mathcal{I}, \tag{25}$$

where $\mathbf{X} \triangleq (\mathbf{X}_i)_{i \in \mathcal{I}}$, with $\mathbf{X}_i \in \mathbb{C}^{n_i \times m_i}$ being the (matrix) strategy of user $i$, $\mathcal{X}_i \subseteq \mathbb{C}^{n_i \times m_i}$, and $f_\ell : \mathcal{X} \to \mathbb{R}$, with $\mathcal{X} \triangleq \prod_{i \in \mathcal{I}} \mathcal{X}_i$; let us also define $\mathcal{X}_{-i} \triangleq \prod_{j \neq i} \mathcal{X}_j$. We study (25) under the same assumptions A1–A4 stated for the real case, where in A2 the differentiability condition is now replaced by $\mathbb{R}$-differentiability (see, e.g., [40], [41]), and in A3 $U(\mathbf{X})$ is required to have a Lipschitz conjugate-gradient $\nabla_{\mathbf{X}^*} U(\mathbf{X})$ on $\mathcal{X}$, with constant $L_{\nabla U}^{\mathbb{C}}$, where $\mathbf{X}^*$ is the conjugate of $\mathbf{X}$.

A motivating example.
An instance of (25) is the MIMO version of (3):

$$\underset{\mathbf{Q}_1, \ldots, \mathbf{Q}_I}{\text{maximize}} \quad \sum_{i \in \mathcal{I}} \theta_i(R_i(\mathbf{Q}_i, \mathbf{Q}_{-i})) \qquad \text{subject to} \quad \mathbf{Q}_i \in \mathcal{Q}_i, \ \forall i \in \mathcal{I}, \tag{26}$$

where $R_i(\mathbf{Q}_i, \mathbf{Q}_{-i})$ is the rate over the MIMO link $i$,

$$R_i(\mathbf{Q}_i, \mathbf{Q}_{-i}) \triangleq \log \det\left(\mathbf{I} + \mathbf{H}_{ii}^H\, \mathbf{R}_i(\mathbf{Q}_{-i})^{-1}\, \mathbf{H}_{ii}\, \mathbf{Q}_i\right), \tag{27}$$

$\mathbf{Q}_i$ is the covariance matrix of transmitter $i$, $\mathbf{R}_i(\mathbf{Q}_{-i}) \triangleq \mathbf{R}_{n_i} + \sum_{j \neq i} \mathbf{H}_{ij} \mathbf{Q}_j \mathbf{H}_{ij}^H$ is the covariance matrix of the multiuser interference plus the thermal noise $\mathbf{R}_{n_i}$ (assumed to be full rank), with $\mathbf{Q}_{-i} \triangleq (\mathbf{Q}_j)_{j \neq i}$, $\mathbf{H}_{ij}$ is the channel matrix between the $j$-th transmitter and the $i$-th receiver, and $\mathcal{Q}_i$ is the set of constraints of user $i$,

$$\mathcal{Q}_i \triangleq \left\{\mathbf{Q}_i \in \mathbb{C}^{n_i \times n_i} : \mathbf{Q}_i \succeq \mathbf{0},\ \text{tr}(\mathbf{Q}_i) \le P_i,\ \mathbf{Q}_i \in \mathcal{Z}_i\right\}.$$

In $\mathcal{Q}_i$ we have also included an arbitrary convex and closed set $\mathcal{Z}_i$, which allows us to add further constraints, such as: i) null constraints $\mathbf{U}_i^H \mathbf{Q}_i = \mathbf{0}$, where $\mathbf{U}_i \in \mathbb{C}^{n_i \times r_i}$ is a full-rank matrix with $r_i < n_i$; ii) soft-shaping constraints $\text{tr}(\mathbf{G}_i^H \mathbf{Q}_i \mathbf{G}_i) \le I_i^{\text{ave}}$, with $\mathbf{G}_i \in \mathbb{C}^{n_i \times m_i^G}$ for some $m_i^G > 0$; iii) peak-power constraints $\lambda_{\max}(\mathbf{F}_i^H \mathbf{Q}_i \mathbf{F}_i) \le I_i^{\text{peak}}$, with $\mathbf{F}_i \in \mathbb{C}^{n_i \times m_i^F}$ for some $m_i^F > 0$; and iv) per-antenna constraints $[\mathbf{Q}_i]_{kk} \le \alpha_{ik}$. Note that the optimization problems in [23], [24], [26] are special cases of (26).
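A small numerical sketch of the rate (27) follows; the nested-list layout for the channel matrices is an illustrative assumption:

```python
import numpy as np

def mimo_rate(i, Q, H, Rn):
    """Rate (27) on link i: log det(I + H_ii^H R_i(Q_{-i})^{-1} H_ii Q_i).

    Q  : list of transmit covariance matrices Q_j
    H  : H[i][j] is the channel between transmitter j and receiver i
    Rn : noise covariance R_{n_i} at receiver i (full rank)
    """
    # Interference-plus-noise covariance R_i(Q_{-i})
    R = Rn + sum(H[i][j] @ Q[j] @ H[i][j].conj().T
                 for j in range(len(Q)) if j != i)
    M = np.eye(Q[i].shape[0]) + H[i][i].conj().T @ np.linalg.solve(R, H[i][i]) @ Q[i]
    return np.linalg.slogdet(M)[1]   # log|det(M)|; det(M) is real positive here
```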
A. Distributed decomposition algorithms

At the basis of the proposed decomposition techniques for (25) there is the (second-order) Taylor expansion of a continuously $\mathbb{R}$-differentiable function $f : \mathbb{C}^{n \times m} \to \mathbb{R}$ [41]:

$$f(\mathbf{X} + \Delta\mathbf{X}) - f(\mathbf{X}) \approx \langle \Delta\mathbf{X}, \nabla_{\mathbf{X}^*} f(\mathbf{X}) \rangle + \frac{1}{2}\, \text{vec}([\Delta\mathbf{X}, \Delta\mathbf{X}^*])^H\, \mathcal{H}_{\mathbf{X}\mathbf{X}^*} f(\mathbf{X})\, \text{vec}([\Delta\mathbf{X}, \Delta\mathbf{X}^*]), \tag{28}$$

where $\langle \mathbf{A}, \mathbf{B} \rangle \triangleq \text{Re}\{\text{tr}(\mathbf{A}^H \mathbf{B})\}$, $\text{vec}(\bullet)$ denotes the "vec" operator, and $\mathcal{H}_{\mathbf{X}\mathbf{X}^*} f(\mathbf{X})$ is the so-called augmented Hessian of $f$, defined as [41]

$$\mathcal{H}_{\mathbf{X}\mathbf{X}^*} f(\mathbf{X}) \triangleq \frac{\partial}{\partial\, \text{vec}([\mathbf{X}, \mathbf{X}^*])^T} \left(\frac{\partial f(\mathbf{X})}{\partial\, \text{vec}([\mathbf{X}^*, \mathbf{X}])^T}\right)^T. \tag{29}$$

In [41], we proved that $\mathcal{H}_{\mathbf{X}\mathbf{X}^*} f(\mathbf{X})$ plays the role of the Hessian matrix for functions of real variables. In particular, $f$ is strongly convex on $\mathbb{C}^{n \times m}$ if and only if there exists a $c_f^{\mathbb{C}} > 0$, the constant of strong convexity of $f$, such that

$$\text{vec}([\mathbf{Y}, \mathbf{Y}^*])^H\, \mathcal{H}_{\mathbf{X}\mathbf{X}^*} f(\mathbf{X})\, \text{vec}([\mathbf{Y}, \mathbf{Y}^*]) \ge c_f^{\mathbb{C}}\, \|\mathbf{Y}\|_F^2, \tag{30}$$

for all $\mathbf{X} \in \mathbb{C}^{n \times m}$ and $\mathbf{Y} \in \mathbb{C}^{n \times m}$, where $\|\bullet\|_F$ denotes the Frobenius norm. When (30) holds, we say that $\mathcal{H}_{\mathbf{X}\mathbf{X}^*} f(\mathbf{X})$ is augmented uniformly positive definite, and write $\mathcal{H}_{\mathbf{X}\mathbf{X}^*} f(\mathbf{X}) - c_f^{\mathbb{C}}\, \mathbf{I} \succeq_A \mathbf{0}$ [41]. If $f$ is only convex but not strongly convex, then $c_f^{\mathbb{C}}$ in (30) is zero.

Motivated by the Taylor expansion (28), and using the same symbols $\mathcal{S}_i$ and $\mathcal{C}_i$ to denote the complex counterparts of the sets $\mathcal{S}_i$ and $\mathcal{C}_i$ introduced for the real case [cf. (6)], let us consider for each user $i$ the following convex approximation of $U(\mathbf{X})$ at $\mathbf{X}^n$: denoting $\Delta\mathbf{X}_i \triangleq \mathbf{X}_i - \mathbf{X}_i^n$,

$$\tilde{f}_{\mathcal{C}_i}(\mathbf{X}_i; \mathbf{X}^n) \triangleq \sum_{j \in \mathcal{C}_i} f_j(\mathbf{X}_i, \mathbf{X}_{-i}^n) + \langle \boldsymbol{\Pi}_{\mathcal{C}_i}(\mathbf{X}^n), \Delta\mathbf{X}_i \rangle + \frac{\tau_i}{2}\, \text{vec}([\Delta\mathbf{X}_i, \Delta\mathbf{X}_i^*])^H\, \mathbf{H}_i(\mathbf{X}^n)\, \text{vec}([\Delta\mathbf{X}_i, \Delta\mathbf{X}_i^*]), \tag{31}$$

with

$$\boldsymbol{\Pi}_{\mathcal{C}_i}(\mathbf{X}^n) \triangleq \sum_{j \in \mathcal{C}_{-i}} \nabla_{\mathbf{X}_i^*} f_j(\mathbf{X})\big|_{\mathbf{X} = \mathbf{X}^n}, \tag{32}$$

where $\mathbf{H}_i(\mathbf{X}^n)$ is any given $2 n_i m_i \times 2 n_i m_i$ matrix such that $\mathbf{H}_i(\mathbf{X}) - c_{H_i}\, \mathbf{I} \succeq_A \mathbf{0}$ for all $\mathbf{X} \in \mathcal{X}$ and some $c_{H_i} > 0$. Note that if $\mathbf{H}_i(\mathbf{X}) = \mathbf{I}$, the quadratic term in (31) reduces to the standard proximal regularization $\tau_i\, \|\mathbf{X}_i - \mathbf{X}_i^n\|_F^2$. Then, the best-response matrix function of each user is

$$\hat{\mathbf{X}}_{\mathcal{C}_i}(\mathbf{X}^n, \tau_i) \triangleq \underset{\mathbf{X}_i \in \mathcal{X}_i}{\text{argmin}}\ \tilde{f}_{\mathcal{C}_i}(\mathbf{X}_i; \mathbf{X}^n). \tag{33}$$

Decomposition algorithms for (25) are formally the same as those proposed in Sec. IV for (1) [namely, Algorithms 1–3], where the real-valued best-response map $\hat{\mathbf{x}}_{\mathcal{C}}(\mathbf{x}^n, \boldsymbol{\tau})$ is replaced by its complex-valued counterpart $\hat{\mathbf{X}}_{\mathcal{C}}(\mathbf{X}^n, \boldsymbol{\tau}) \triangleq (\hat{\mathbf{X}}_{\mathcal{C}_i}(\mathbf{X}^n, \tau_i))_{i=1}^I$. Convergence conditions read as in Theorems 3–5, under the following natural changes: i) $L_{\nabla U}$ becomes $L_{\nabla U}^{\mathbb{C}}$; ii) the condition $\mathbf{H}_i(\mathbf{x}) - c_{H_i}\mathbf{I} \succeq \mathbf{0}$ for all $\mathbf{x} \in \mathcal{K}$ reads as $\mathbf{H}_i(\mathbf{X}) - c_{H_i}\mathbf{I} \succeq_A \mathbf{0}$ for all $\mathbf{X} \in \mathcal{X}$; and iii) in the constant $c_\tau$ defined in (14), $c_{\tau_i}(\mathbf{x})$ is replaced by $c_{\tau_i}(\mathbf{X})$, where $c_{\tau_i}(\mathbf{X})$ is the constant of strong convexity of $\tilde{f}_{\mathcal{C}_i}(\bullet; \mathbf{X})$ [41]:

$$\left\langle \mathbf{Z}_i - \mathbf{W}_i,\ \nabla_{\mathbf{X}_i^*} \tilde{f}_{\mathcal{C}_i}(\mathbf{Z}_i; \mathbf{X}) - \nabla_{\mathbf{X}_i^*} \tilde{f}_{\mathcal{C}_i}(\mathbf{W}_i; \mathbf{X}) \right\rangle \ge c_{\tau_i}(\mathbf{X})\, \|\mathbf{Z}_i - \mathbf{W}_i\|_F^2, \quad \forall \mathbf{Z}_i, \mathbf{W}_i \in \mathcal{X}_i.$$
VI. EXTENSIONS AND RELATED WORKS

The key idea in the proposed SCA schemes, e.g., (33), is to convexify the nonconvex part of $U$ via the partial linearization of $\sum_{j \in \mathcal{C}_{-i}} f_j(\mathbf{X})$, resulting in the term $\langle \boldsymbol{\Pi}_{\mathcal{C}_i}(\mathbf{X}^n), \Delta\mathbf{X}_i \rangle$. In the same spirit of [27], [32], [33], it is not difficult to show that one can generalize this idea and replace the linear term $\langle \boldsymbol{\Pi}_{\mathcal{C}_i}(\mathbf{X}^n), \Delta\mathbf{X}_i \rangle$ in (31) with a nonlinear scalar function $\Pi_{\mathcal{C}_i}(\bullet; \mathbf{X}^n) : \mathcal{X}_i \ni \mathbf{X}_i \mapsto \Pi_{\mathcal{C}_i}(\mathbf{X}_i; \mathbf{X}^n)$. All the results presented so far remain valid provided that $\Pi_{\mathcal{C}_i}(\bullet; \mathbf{X}^n)$ enjoys the following properties: for all $\mathbf{X}^n \in \mathcal{X}$,

P1) $\Pi_{\mathcal{C}_i}(\bullet; \mathbf{X}^n)$ is $\mathbb{R}$-continuously differentiable on $\mathcal{X}_i$;
P2) $\nabla_{\mathbf{X}_i^*} \Pi_{\mathcal{C}_i}(\mathbf{X}_i^n; \mathbf{X}^n) = \sum_{j \in \mathcal{C}_{-i}} \nabla_{\mathbf{X}_i^*} f_j(\mathbf{X}^n)$;
P3) $\nabla_{\mathbf{X}_i^*} \Pi_{\mathcal{C}_i}(\mathbf{X}_i^n; \bullet)$ is uniformly Lipschitz on $\mathcal{X}$;
P4) $\Pi_{\mathcal{C}_i}(\mathbf{X}_i; \mathbf{X}^n)$ is continuous in $(\mathbf{X}_i; \mathbf{X}^n) \in \mathcal{X}_i \times \mathcal{X}$.

Similar conditions can be written in the real case for a nonlinear function $\pi_{\mathcal{C}_i}(\bullet; \mathbf{x}^n) : \mathcal{K}_i \ni \mathbf{x}_i \mapsto \pi_{\mathcal{C}_i}(\mathbf{x}_i; \mathbf{x}^n)$ replacing the linear pricing $\boldsymbol{\pi}_{\mathcal{C}_i}^T \mathbf{x}_i$. It is interesting to compare P1–P3 with the conditions in [27], [32], [33]. First of all, our conditions do not require that the approximating function be a global upper bound of the original sum-utility function, a constraint that remains elusive for sum-utility problems with no special structure. Second, even when the aforementioned constraint can be met, it is not always guaranteed that the resulting convex subproblems are decomposable across the users, implying that a centralized implementation might be required. Third, the SCA algorithms [27], [32], [33], even when distributed, are generally sequential schemes (unless the sum-utility has a special structure). On the contrary, the algorithms proposed in this paper do not suffer from any of the above drawbacks, which enlarges substantially the class of (large-scale) nonconvex problems solvable within our framework.

VII. APPLICATIONS AND NUMERICAL RESULTS
In this section, we customize the proposed decomposition framework to the SISO and MIMO sum-rate maximization problems introduced in (3) and (26), respectively, and compare the resulting new algorithms with state-of-the-art schemes [23], [24], [29], [30], [33]. Quite interestingly, our algorithms are shown to outperform current schemes in terms of convergence speed and computational effort, while reaching the same sum-rate. It is worth mentioning that this was not obvious at all, because the algorithms in [23], [24], [29], [30], [33] are ad-hoc schemes for the sum-rate problem, whereas our framework has been introduced for general sum-utility problems.
A. Sum-Rate Maximization over SISO ICs
Consider the social problem (3), with $\theta_i(x) = w_i x$, where the $w_i$'s are given positive weights; to avoid redundant constraints, we also assume w.l.o.g. that all the columns of $\mathbf{W}_i$ are linearly independent. We describe next two alternative decompositions for (3), corresponding to different choices of $\mathcal{I}_f$ and $\{\mathcal{C}_i\}$.
1) Decomposition #1 − Pricing Algorithms:
Since each user's rate $r_i(\mathbf{p}_i, \mathbf{p}_{-i})$ is concave in $\mathbf{p}_i \in \mathcal{P}_i$, a natural choice is $\mathcal{I}_f = \mathcal{I}$ and $\mathcal{C}_i = \{i\}$, which leads to the following class of strongly concave subproblems [cf. (7)]: given $\mathbf{p}^n = (\mathbf{p}_i^n)_{i=1}^I$ and choosing $\mathbf{H}_i(\mathbf{p}^n) = \mathbf{I}$, the best-response of user $i$ is

$$\hat{\mathbf{p}}_i(\mathbf{p}^n) \triangleq \underset{\mathbf{p}_i \in \mathcal{P}_i}{\text{argmax}} \left\{ w_i\, r_i(\mathbf{p}_i, \mathbf{p}_{-i}^n) - \boldsymbol{\pi}_i(\mathbf{p}^n)^T \mathbf{p}_i - \frac{\tau_i}{2}\, \|\mathbf{p}_i - \mathbf{p}_i^n\|^2 \right\},$$

where $\boldsymbol{\pi}_i(\mathbf{p}^n) \triangleq (\pi_{ik}(\mathbf{p}^n))_{k=1}^N$ is the pricing vector, given by

$$\pi_{i,k}(\mathbf{p}^n) \triangleq \sum_{j \in \mathcal{N}_i} \frac{w_j\, |H_{ji}(k)|^2\, \mathsf{snr}_{jk}^n}{(1 + \mathsf{snr}_{jk}^n)\, \mathsf{mui}_{jk}^n}; \tag{34}$$

$\mathcal{N}_i$ denotes the set of neighbors of user $i$, i.e., the set of users $j$ with which user $i$ interferes; and $\mathsf{snr}_{jk}^n$ and $\mathsf{mui}_{jk}^n$ are the SINR and the multiuser interference-plus-noise power experienced by user $j$, generated by the power profile $\mathbf{p}^n$:

$$\mathsf{snr}_{jk}^n \triangleq \frac{|H_{jj}(k)|^2\, p_{jk}^n}{\mathsf{mui}_{jk}^n}, \qquad \mathsf{mui}_{jk}^n \triangleq \sigma_{jk}^2 + \sum_{i \neq j} |H_{ji}(k)|^2\, p_{ik}^n.$$

The best-response $\hat{\mathbf{p}}_i(\mathbf{p}^n)$ can be computed in closed form (up to the multipliers associated with the inequality constraints in $\mathcal{P}_i$) according to the following multi-level waterfilling-like expression [41]:

$$\hat{\mathbf{p}}_i(\mathbf{p}^n) = \left[\frac{1}{2}\, \mathbf{p}_i^n \circ \left(\mathbf{1} - (\mathsf{snr}_i^n)^{-1}\right) - \frac{1}{2\tau_i} \left(\tilde{\boldsymbol{\mu}}_i - \sqrt{\left[\tilde{\boldsymbol{\mu}}_i - \tau_i\, \mathbf{p}_i^n \circ \left(\mathbf{1} + (\mathsf{snr}_i^n)^{-1}\right)\right]^2 + 4 \tau_i w_i \mathbf{1}}\right)\right]_+, \tag{35}$$

where $\circ$ denotes the Hadamard product, the square, square root, and $[\bullet]_+$ are intended component-wise, $(\mathsf{snr}_i^n)^{-1} \triangleq (1/\mathsf{snr}_{ik}^n)_{k=1}^N$, and $\tilde{\boldsymbol{\mu}}_i \triangleq \boldsymbol{\pi}_i(\mathbf{p}^n) + \mathbf{W}_i^T \boldsymbol{\mu}_i$, with the multiplier vector $\boldsymbol{\mu}_i$ chosen to satisfy the nonlinear complementarity condition (CC)

$$\mathbf{0} \le \boldsymbol{\mu}_i \ \perp\ \mathbf{I}_i^{\max} - \mathbf{W}_i\, \hat{\mathbf{p}}_i(\mathbf{p}^n) \ge \mathbf{0}.$$

The optimal $\boldsymbol{\mu}_i$ satisfying the CC can be efficiently computed (in a finite number of steps) using a multiple nested bisection method, as described in [41, Alg. 6]; we omit the details because of the space limitation. Note that, in the presence of the power budget constraint only (as in [23], [29], [30]), $\boldsymbol{\mu}_i$ reduces to a scalar quantity $\mu_i$ such that $0 \le \mu_i \perp P_i - \mathbf{1}^T \hat{\mathbf{p}}_i(\mathbf{p}^n) \ge 0$, whose solution can be obtained using classical bisection algorithms (or the methods in [42]).

Given $\hat{\mathbf{p}}_i(\mathbf{p}^n)$, one can now use any of the algorithms introduced in Sec. IV. For instance, a good candidate is the exact Jacobi scheme with diminishing step-size (Algorithm 1), whose convergence is guaranteed if, e.g., the rules in (19) or (20) are used for the sequence $\{\gamma^n\}$ (Theorem 3). Note that the proposed algorithm is fairly distributed. Indeed, given the interference generated by the other users [and thus the MUI coefficients $\mathsf{mui}_{jk}^n$] and the current interference prices $\boldsymbol{\pi}_i(\mathbf{p}^n)$, each user can efficiently and locally compute the optimal power allocation $\hat{\mathbf{p}}_i(\mathbf{p}^n)$ via the waterfilling-like expression (35). The estimation of the prices $\pi_{ik}(\mathbf{p}^n)$ requires, however, some signaling among nearby users. Interestingly, the pricing expression in (34), as well as the signaling overhead necessary to compute it, coincides with that in [23]. But, because of their sequential nature, the algorithms in [23] require more CSI exchange in the network than our simultaneous schemes.
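The following sketch evaluates (35) in the power-budget-only case just described, with the scalar multiplier $\mu_i$ found by bisection; it assumes $\tau_i > 0$ and strictly positive current powers (so that $\mathsf{snr}_{ik}^n > 0$), and the general constraints $\mathbf{W}_i \mathbf{p}_i \le \mathbf{I}_i^{\max}$ would require the nested bisection of [41, Alg. 6] instead:

```python
import numpy as np

def sjbr_best_response(p_n, snr_n, price, w=1.0, tau=1.0, P=1.0, tol=1e-10):
    """Evaluate (35) for user i under a single power budget 1^T p <= P.
    p_n, snr_n, price : length-N arrays (current powers, SINRs, prices (34)).
    tau > 0 and snr_n > 0 are assumed in this sketch."""
    def p_hat(mu):
        mu_t = price + mu                      # tilde-mu_i with W_i = 1^T
        root = np.sqrt((mu_t - tau * p_n * (1 + 1 / snr_n)) ** 2 + 4 * tau * w)
        return np.maximum(0.5 * p_n * (1 - 1 / snr_n) - (mu_t - root) / (2 * tau), 0.0)

    if p_hat(0.0).sum() <= P:                  # budget inactive: mu = 0
        return p_hat(0.0)
    lo, hi = 0.0, 1.0
    while p_hat(hi).sum() > P:                 # grow the bracket for mu
        hi *= 2.0
    while hi - lo > tol:                       # bisection on the multiplier
        mu = 0.5 * (lo + hi)
        lo, hi = (mu, hi) if p_hat(mu).sum() > P else (lo, mu)
    return p_hat(hi)
```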
2) Decomposition #2 − DC Algorithms:
An alternative class of algorithms for the sum-rate maximization problem under consideration can be obtained by exploiting the D.C. nature of the rate functions (cf. Example #4): $U(\mathbf{p}) = f_1(\mathbf{p}) + f_2(\mathbf{p})$, where

$$f_1(\mathbf{p}) \triangleq \sum_i w_i \sum_k \log\Big(\sigma_{ik}^2 + \sum_j |H_{ij}(k)|^2\, p_{jk}\Big), \qquad f_2(\mathbf{p}) \triangleq -\sum_i w_i \sum_k \log\Big(\sigma_{ik}^2 + \sum_{j \neq i} |H_{ij}(k)|^2\, p_{jk}\Big),$$

which is an instance of (23) with $\mathcal{I}_f = \{1, 2\}$. A natural choice of $\mathcal{C}_i$ is then $\mathcal{C}_i = \{1\}$ for all $i \in \mathcal{I}$, resulting in the best-response

$$\tilde{\mathbf{p}}_i(\mathbf{p}^n) \triangleq \underset{\mathbf{p}_i \in \mathcal{P}_i}{\text{argmax}} \left\{ f_1(\mathbf{p}_i, \mathbf{p}_{-i}^n) - \boldsymbol{\pi}_i(\mathbf{p}^n)^T \mathbf{p}_i - \frac{\tau_i}{2}\, \|\mathbf{p}_i - \mathbf{p}_i^n\|^2 \right\},$$

where $\boldsymbol{\pi}_i(\mathbf{p}^n) \triangleq (\pi_{ik}(\mathbf{p}^n))_{k=1}^N$, with

$$\pi_{i,k}(\mathbf{p}^n) \triangleq \sum_{j \in \mathcal{N}_i} \frac{w_j\, |H_{ji}(k)|^2}{\mathsf{mui}_{jk}^n}. \tag{36}$$

We remark that the best-response $\tilde{\mathbf{p}}_i(\mathbf{p}^n)$ can be efficiently computed by a fixed-point iterate, in the same spirit of [29]; we omit the details because of the space limitation. Note that the communication overhead needed to compute the prices (34) and (36) is the same, but the computation of $\tilde{\mathbf{p}}_i(\mathbf{p}^n)$ requires more CSI exchange in the network than that of $\hat{\mathbf{p}}_i(\mathbf{p}^n)$, since each user $i$ also needs to estimate the cross-channels $\{|H_{ji}(k)|^2\}_{j \in \mathcal{N}_i}$.

Numerical Example. We now compare Algorithm 1 based on the best-response $\hat{\mathbf{p}}_i(\mathbf{p}^n)$ in (35) (termed SJBR) with the schemes proposed in [29] [termed SCALE and SCALE one-step, the latter being a simplified version of SCALE where, instead of solving the fixed-point equation (16) in [29], only one iteration of (16) is performed], in [23] (termed MDP), and in [30] (termed WMMSE). Since in the aforementioned papers only power budget constraints can be dealt with, to allow the comparison we simplified the sum-rate maximization problem described above and considered only power budget constraints (and all $w_i = 1$). We assume the same power budget $P_i = P$ and noise variances $\sigma_{ik}^2 = \sigma^2$, with $\mathsf{snr} = P/\sigma^2 = 3$ dB for all the users. We simulated SISO frequency-selective channels with $N = 64$ subcarriers; the channels are generated as FIR filters of order $L = 10$, whose taps are i.i.d. Gaussian random variables with zero mean and variance $1/(d_{ij}^2(L+1))$, where $d_{ij}$ is the distance between transmitter $j$ and receiver $i$. All the algorithms are initialized with the uniform power allocation, and are terminated when (the absolute value of) the sum-utility error in two consecutive rounds becomes smaller than a fixed threshold; a tighter fixed accuracy is used in the bisection loops required by all methods. In our algorithm, we used rule (19) and set all $\tau_i = 0$. In Fig. 1, we plot the average number of iterations required by the aforementioned algorithms to converge versus the number of users; the average is taken over independent channel realizations; we set $d_{ij}/d_{ii} = 3$, with $d_{ij} = d_{ji}$ and $d_{ii} = d_{jj}$ for all $i$ and $j \neq i$. As a benchmark, we also plot two instances of the proximal conditional gradient algorithm [4], which can be interpreted as special cases of our SJBR with $\mathcal{C}_i = \emptyset$ for all $i \in \mathcal{I}$ (cf. Example #1). In one instance [termed Gradient (SJBR tuning)] we used the same $\tau_i$ and $\epsilon$ as in SJBR, whereas in the other [termed Gradient (opt. tuning)] we chose $\tau_i = 50$ for all $i \in \mathcal{I}$ and tuned $\epsilon$ so as to obtain, experimentally, the fastest behavior of the gradient algorithm.

[Fig. 1: Average number of iterations versus number of users in SISO frequency-selective ICs. All the algorithms are simultaneous except MDP: at each iteration of MDP only one user updates his strategy, whereas in the other algorithms all users do so.]

All the algorithms reach the same average sum-rate (which is thus not reported here; see [43]), but their convergence behavior is quite different. The figure clearly shows that our SJBR outperforms all the others (note that SCALE, WMMSE, and the proximal gradient are also simultaneous schemes). For instance, the gap with WMMSE is about one order of magnitude for all the network sizes considered in the experiment, while the gap with MDP is up to three orders of magnitude. The good behavior of our scheme has been observed also for other choices of $d_{ij}/d_{ii}$, termination tolerances, and step-size rules; we cannot present more experiments here because of the space limitation, and refer the interested reader to the technical report [43] for further numerical results. Note that SJBR, SCALE one-step, WMMSE, MDP, and the gradient schemes have similar per-user computational complexity, whereas SCALE is much more demanding and is not appealing for a real-time implementation. Therefore, Fig. 1 also provides a rough indication of the per-user CPU time of SJBR, SCALE one-step, WMMSE, and the gradient algorithms.

It is also interesting to compare the proposed algorithm with gradient schemes. A first natural question is whether the partial linearization (as performed in SJBR) really improves the convergence speed of the algorithm. The answer is given by the comparison in Fig. 1 between SJBR and "Gradient (SJBR tuning)". One can see that, under the same choice of $\{\gamma^n\}$ and $(\tau_i)_{i=1}^I$, the former is almost three orders of magnitude faster than the latter, for all the network sizes considered in the experiment. If an independent, ad-hoc tuning of $\{\gamma^n\}$ and $(\tau_i)_{i=1}^I$ is performed for the gradient algorithm, the gap reduces to (up to) one order of magnitude, still in favor of SJBR. This result supports the intuition motivating this work: preserving the structure of the problem via a partial linearization can significantly improve the convergence speed of the algorithm.

The comparison with gradient algorithms also reveals a well-known issue of those schemes: the convergence behavior strongly depends on the choice of the step-size sequence $\{\gamma^n\}$ and of the proximal gains $\tau_i$. It is then natural to ask whether the proposed algorithms suffer from the same drawback. To answer this question, in Fig. 2 we compare the convergence behavior of the proximal conditional gradient algorithm with that of SJBR, using the step-size rule (19) but changing the free parameter $\epsilon \in (0, 1)$ by several orders of magnitude. For the gradient schemes, we considered two choices of $\tau_i$, namely $\tau_i = 0$ and $\tau_i = 50$ (as in Fig. 1), the latter resulting in the experimentally fastest behavior of the gradient schemes (see Fig. 1).
More specifically, in Fig. 2 we plot the average number of iterations needed to reach convergence (within the same accuracy as in Fig. 1) versus $\epsilon \in (0, 1)$, for different numbers of users (the rest of the setting is as in Fig. 1). The figure clearly shows that, differently from the gradient algorithms, the convergence behavior of our scheme appears to be almost independent of the choice of $\epsilon$. This is a very desirable feature that lets one avoid the expensive and difficult tuning of the step-size, thus making the proposed algorithms a very good candidate for many applications. We remark once more that the gradient method is very sensitive to the choice of the parameters; indeed, based on further simulations that we do not report here for lack of space, the behavior of the gradient method is very sensitive to the number of users and to the characteristics of the network (SNR, pair distances, etc.), and its optimal behavior requires a different tuning of the parameters each time.

[Fig. 2: Proximal conditional gradient algorithms versus SJBR: average number of iterations versus $\epsilon \in (0, 1)$ [cf. (19)], for 5, 10, and 15 users; SJBR is run with $\tau_i = 0$, the gradient scheme with $\tau_i = 0$ and $\tau_i = 50$.]

B. Sum-Rate Maximization over MIMO ICs
Let us focus now on the MIMO formulation (26), assuming $\theta_i(x) = w_i x$, with $w_i > 0$.
1) Decomposition #1 − Pricing Algorithms:
Choosing $\mathcal{I}_f = \mathcal{I}$, $\mathcal{C}_i = \{i\}$, and $\mathbf{H}_i(\mathbf{Q}^n) = \mathbf{I}$, the best-response of user $i$ is

$$\hat{\mathbf{Q}}_i(\mathbf{Q}^n, \tau_i) \triangleq \underset{\mathbf{Q}_i \in \mathcal{Q}_i}{\text{argmax}} \left\{ w_i\, R_i(\mathbf{Q}_i, \mathbf{Q}_{-i}^n) - \langle \boldsymbol{\Pi}_i(\mathbf{Q}^n), \mathbf{Q}_i - \mathbf{Q}_i^n \rangle - \tau_i\, \|\mathbf{Q}_i - \mathbf{Q}_i^n\|_F^2 \right\}, \tag{37}$$

with

$$\boldsymbol{\Pi}_i(\mathbf{Q}^n) \triangleq \sum_{j \in \mathcal{N}_i} w_j\, \mathbf{H}_{ji}^H\, \widetilde{\mathbf{R}}_j(\mathbf{Q}_{-j}^n)\, \mathbf{H}_{ji}, \qquad \widetilde{\mathbf{R}}_j(\mathbf{Q}_{-j}^n) \triangleq \mathbf{R}_j(\mathbf{Q}_{-j}^n)^{-1} - \left(\mathbf{R}_j(\mathbf{Q}_{-j}^n) + \mathbf{H}_{jj}\, \mathbf{Q}_j^n\, \mathbf{H}_{jj}^H\right)^{-1},$$

where $\mathcal{N}_i$ is defined as in the SISO case. Note that, once the price matrix $\boldsymbol{\Pi}_i(\mathbf{Q}^n)$ is given, the best-response $\hat{\mathbf{Q}}_i(\mathbf{Q}^n, \tau_i)$ can be computed locally by each user by solving a convex optimization problem. Moreover, for some specific structures of the feasible sets $\mathcal{Q}_i$, in the case of full-column-rank channel matrices $\mathbf{H}_{ii}$ and $\tau_i = 0$, a solution in closed form (up to the multipliers associated with the power budget constraints) is also available [24]. Given $\hat{\mathbf{Q}}_i(\mathbf{Q}^n, \tau_i)$, one can now use any of the algorithms introduced in Sec. V. To the best of our knowledge, our schemes are the first class of best-response Jacobi (inexact) algorithms for MIMO IC systems based on pricing with provable convergence.

Complexity Analysis and Message Exchange. It is interesting to compare the computational complexity and signaling (i.e., message exchange) of our algorithms, e.g., Algorithm 1 based on the best-response $\hat{\mathbf{Q}}_i(\mathbf{Q}^n, \tau_i)$ (termed MIMO-SJBR), with those of the schemes proposed in the literature for a similar problem, namely the MIMO-MDP [23], [24] and the MIMO-WMMSE [30]. We assume that all channel matrices $\mathbf{H}_{ii}$ are full-column-rank, and set $\tau_i = 0$ in (37). For the purpose of the complexity analysis, since all the algorithms include a similar bisection step, which generally takes few iterations, we ignore this step in the computation of the complexity (as in [30]). Also, WMMSE and SJBR are simultaneous schemes, while MDP is sequential; we thus compare the algorithms in terms of the per-round complexity, where one round means one update of all users. Denoting by $n_T$ (resp. $n_R$) the number of antennas at each transmitter (resp. receiver), the per-round complexity of the three algorithms turns out to be very similar: each is polynomial in $n_T$ and $n_R$ and quadratic in $I$, and all three are of the same order when $n_T = n_R\ (\triangleq n)$. A similar conclusion holds when the computation of $\mathbf{R}_i(\mathbf{Q}_{-i})$ from $\mathbf{R}_{n_i} + \sum_{j \neq i} \mathbf{H}_{ij} \mathbf{Q}_j \mathbf{H}_{ij}^H$ is not included in the analysis, which is of interest since, in a real system, the MUI covariance matrices $\mathbf{R}_i(\mathbf{Q}_{-i})$ come from an estimation process. Finally, if one is interested in the time necessary to complete one iteration, it can be shown to be proportional to the above complexity divided by $I$.

As far as the communication overhead is concerned, the same remarks made about the schemes described in the SISO setting apply also here in the MIMO case. The only difference is that now the users need to exchange a (pricing) matrix rather than a vector, resulting, for all the algorithms, in an amount of message exchange per iteration that grows with the number of users and quadratically with the number of antennas.
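For concreteness, a sketch of the computation of the price matrix $\boldsymbol{\Pi}_i(\mathbf{Q}^n)$ in (37) follows; the data layout (nested lists of channel matrices, per-user neighbor sets) is an illustrative assumption:

```python
import numpy as np

def price_matrix(i, Q, H, Rn, w, neighbors):
    """Pi_i(Q^n) in (37): Pi_i = sum_{j in N_i} w_j H_ji^H Rtilde_j H_ji,
    with Rtilde_j = R_j^{-1} - (R_j + H_jj Q_j H_jj^H)^{-1}.

    Q : list of covariances; H[j][l] : channel from tx l to rx j;
    Rn[j] : noise covariance; w[j] : weight; neighbors[i] : the set N_i."""
    Pi = 0.0
    for j in neighbors[i]:
        # Interference-plus-noise covariance R_j(Q_{-j}) at receiver j
        Rj = Rn[j] + sum(H[j][l] @ Q[l] @ H[j][l].conj().T
                         for l in range(len(Q)) if l != j)
        Rtot = Rj + H[j][j] @ Q[j] @ H[j][j].conj().T
        Rtilde = np.linalg.inv(Rj) - np.linalg.inv(Rtot)
        Pi = Pi + w[j] * H[j][i].conj().T @ Rtilde @ H[j][i]
    return Pi
```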
2) Decomposition − WMMSE Algorithms:
In [30], the authors showed that the MIMO problem (26) (under power constraints only) is equivalent to the following sum-MSE minimization: writing $\mathbf{Q}_i = \mathbf{V}_i \mathbf{V}_i^H$, $\mathbf{V} \triangleq (\mathbf{V}_i)_{i=1}^I$, and introducing the auxiliary matrix variables $\mathbf{U} \triangleq (\mathbf{U}_i)_{i=1}^I$ and $\mathbf{W} \triangleq (\mathbf{W}_i)_{i=1}^I$,
$$\begin{aligned} \min_{\mathbf{W}, \mathbf{U}, \mathbf{V}}\ \ & f(\mathbf{W}, \mathbf{U}, \mathbf{V}) \triangleq \sum_{i \in \mathcal{I}} w_i \left( \mathrm{tr}\!\left( \mathbf{W}_i \mathbf{E}_i(\mathbf{U}, \mathbf{V}) \right) - \log\det(\mathbf{W}_i) \right) \\ \text{s.t.}\ \ & \mathrm{tr}(\mathbf{V}_i \mathbf{V}_i^H) \le P_i,\ \ \mathbf{W}_i \succeq \mathbf{0},\ \ \forall i \in \mathcal{I}, \end{aligned} \quad (38)$$
where $\mathbf{E}_i(\mathbf{U}, \mathbf{V})$ is the MSE matrix at receiver $i$ (see (3) in [30]). The formulation (38) has some desirable properties, namely: i) $f(\mathbf{W}, \mathbf{U}, \mathbf{V})$ is continuously ($\mathbb{R}$-)differentiable with Lipschitz continuous (conjugate) gradient on the feasible set; ii) $f(\mathbf{W}, \mathbf{U}, \mathbf{V})$ is convex in each of the variables $\mathbf{W}$, $\mathbf{U}$, $\mathbf{V}$ separately; iii) the minimization of $f(\mathbf{W}, \mathbf{U}, \mathbf{V})$ w.r.t. each of $\mathbf{W}$, $\mathbf{U}$, $\mathbf{V}$ can be performed in parallel by the users; and iv) the optimal solutions of the individual minimizations are available in closed form; see [30] for details. We denote these optimal solutions by $\hat{\mathbf{W}}_i(\mathbf{U}, \mathbf{V})$, $\hat{\mathbf{U}}_i(\mathbf{U}, \mathbf{V})$, and $\hat{\mathbf{V}}_i(\mathbf{U}, \mathbf{W})$, for all $i \in \mathcal{I}$, where we made explicit the dependence on the variables that are kept fixed. In [30] the authors proposed to solve (38) by the (Gauss-Seidel) block coordinate descent method, resulting in the so-called MIMO-WMMSE algorithm.

It is not difficult to see that the formulation (38) can be cast into our framework, resulting in the following best-response mapping for each user $i$:
$$\hat{\mathbf{X}}_i(\mathbf{W}^n, \mathbf{U}^n, \mathbf{V}^n) \triangleq \left( \hat{\mathbf{W}}_i(\mathbf{U}^n, \mathbf{V}^n),\ \hat{\mathbf{U}}_i(\mathbf{U}^n, \mathbf{V}^n),\ \hat{\mathbf{V}}_i(\mathbf{U}^n, \mathbf{W}^n) \right).$$
We can then compute a stationary solution of (38), and thus of (26), using any of the Jacobi algorithms introduced in the previous sections based on $\hat{\mathbf{X}}_i(\mathbf{W}^n, \mathbf{U}^n, \mathbf{V}^n)$ (or its inexact computation); a sketch of one such Jacobi round is given below. Note that the computational complexity as well as the communication overhead of these algorithms are roughly the same as those of the MIMO-WMMSE [30].

Numerical Example. In Tables I and II we compare the MIMO-SJBR, the MIMO-MDP [23], [24], and the MIMO-WMMSE [30] in terms of the average number of iterations required to reach convergence, for different numbers of users, normalized distances $d \triangleq d_{ij}/d_{ii}$ (with $d_{ij} = d_{ji}$ and $d_{ii} = d_{jj}$ for all $i$ and $j \neq i$), and termination accuracies (namely 1e-3 and 1e-6). We considered the following setup. All transmitters/receivers are equipped with the same number of antennas; we simulated an uncorrelated fading channel model, where the coefficients are Gaussian distributed with zero mean and variance $1/d_{ij}$; and we set $\mathbf{R}_{n_i} = \sigma^2 \mathbf{I}$ for all $i$, and $\mathrm{snr} \triangleq P/\sigma^2 = 3$ dB. We used the step-size rule (19) with a small $\epsilon$ and $\tau_i = 0$, and computed the best-response (37) using the closed-form solution of [24].

In our simulations all the algorithms reached the same average sum-rate. Given the results in Tables I and II, the following comments are in order. The proposed SJBR outperforms all the other schemes in terms of iterations, while having similar (or even better) computational complexity. Interestingly, the iteration gap with respect to the other schemes reduces with the distance and the termination accuracy. More specifically: i) SJBR is much faster than all the other schemes (about one order of magnitude) when $d_{ij}/d_{ii} = 3$ (low-interference scenarios), and just a bit faster than (or comparable to) MIMO-WMMSE when $d_{ij}/d_{ii} = 1$ (high-interference scenarios); and ii) SJBR is much faster than all the others if a high termination accuracy is set (see Table I). Also, the convergence speed of SJBR is not strongly affected by the number of users.
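For concreteness, here is a minimal sketch of one Jacobi round based on the mapping $\hat{\mathbf{X}}_i$ above. The closed-form minimizers below are our rendering of the standard WMMSE updates and should be checked against [30]; the crude multiplier search stands in for the usual bisection, and all names are illustrative.

```python
import numpy as np

def mse_matrix(i, U, V, H, Rn):
    """Full MSE matrix E_i(U, V) at receiver i (cf. (3) in [30])."""
    d = V[i].shape[1]
    Cov = Rn[i] + sum(H[i][j] @ V[j] @ V[j].conj().T @ H[i][j].conj().T
                      for j in range(len(V)))
    A = U[i].conj().T @ H[i][i] @ V[i]
    return np.eye(d) - A - A.conj().T + U[i].conj().T @ Cov @ U[i]

def jacobi_wmmse_round(H, Rn, w, P, U, V, W):
    """One parallel (Jacobi) round on (38): every user updates
    (W_i, U_i, V_i) from the *previous* iterates of all users."""
    I = len(V)
    W_new, U_new, V_new = [], [], []
    for i in range(I):
        # \hat W_i(U, V): inverse of the MSE matrix at the old (U, V)
        W_new.append(np.linalg.inv(mse_matrix(i, U, V, H, Rn)))
        # \hat U_i: MMSE receive matrix for the old precoders V
        Cov = Rn[i] + sum(H[i][j] @ V[j] @ V[j].conj().T @ H[i][j].conj().T
                          for j in range(I))
        U_new.append(np.linalg.solve(Cov, H[i][i] @ V[i]))
        # \hat V_i(U, W): transmit update at the old (U, W); mu >= 0
        # enforces tr(V_i V_i^H) <= P_i (a crude search replaces bisection)
        B = sum(w[j] * H[j][i].conj().T @ U[j] @ W[j] @ U[j].conj().T @ H[j][i]
                for j in range(I))
        rhs = w[i] * H[i][i].conj().T @ U[i] @ W[i]
        mu, eye = 0.0, np.eye(B.shape[0])
        Vi = np.linalg.solve(B + 1e-12 * eye, rhs)
        while np.trace(Vi @ Vi.conj().T).real > P[i]:
            mu = 2.0 * mu + 1e-3
            Vi = np.linalg.solve(B + mu * eye, rhs)
        V_new.append(Vi)
    return W_new, U_new, V_new
```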
Finally, in our experiments we also observed that the performance of SJBR is not much affected by the choice of the parameter $\epsilon$ in (19): changing $\epsilon$ over many orders of magnitude leads to a difference in the average number of iterations within 5%; we refer the reader to [43] for details, where one can also find a comparison of several other step-size rules. We must stress, however, that MIMO-MDP and MIMO-WMMSE do not need any tuning, which is an advantage of those methods with respect to ours.

             # of users = 10        # of users = 50        # of users = 100
             d=1     d=2    d=3     d=1     d=2    d=3     d=1     d=2    d=3
  MDP      1370.5   187    54.4   4148.5   1148   348     8818    1904   704
  WMMSE     169.2    68.8  53.3    138.5    115.2  76.7    154.3   126.9  103.2
  SJBR      169.2    24.3   6.9    115.2     34.3   9.3    114.3    28.4    9.7

TABLE I: Average number of iterations (termination accuracy = 1e-6)

             # of users = 10        # of users = 50        # of users = 100
             d=1     d=2    d=3     d=1     d=2    d=3     d=1     d=2    d=3
  MDP       429.4    74.3  32.8   1739.5    465.5  202     3733     882   442.6
  WMMSE      51.6    19.2  14.7     59.6     24.9   16.3     69.8    26.0   19.2
  SJBR       48.6     9.4   4.0     46.9     12.6    5.1     49.7    12      5.5

TABLE II: Average number of iterations (termination accuracy = 1e-3)
VIII. CONCLUSION
In this paper, we proposed a novel decomposition framework, based on SCA, to compute stationary solutions of general nonconvex sum-utility problems (including social functions of complex variables). The main result is a new class of convergent distributed Jacobi (inexact) best-response algorithms, where all users simultaneously solve (inexactly) a suitably convexified version of the original social problem. Our framework contains as special cases many decomposition methods already proposed in the literature, such as gradient algorithms and many block-coordinate descent schemes for convex functions. Finally, we tested our methodology on some sum-rate maximization problems over SISO/MIMO ICs; our experiments show that our algorithms are faster than ad-hoc state-of-the-art methods, while having the same (per-user) computational complexity in the SISO case and similar (or better) complexity in the MIMO case. Some interesting future directions of this work are under investigation, e.g., how to choose the step-size rule adaptively (so that no a-priori tuning is needed), and how to generalize our framework to scenarios where only long-term channel statistics are available.
ACKNOWLEDGMENTS
The authors wish to thank the Associate Editor, Prof. Anthony So, and the anonymous reviewers for their valuable comments. The authors are also deeply grateful to Prof. Tom Luo, Wei-Cheng Liao, and Yang Yang, whose comments contributed to improving the quality of the paper. The research of Scutari and Song is supported by the grants NSF No. CNS-1218717 and NSF CAREER No. ECCS-1254739. The research of Palomar is supported by the Hong Kong RGC 617810 research grant. The research of Pang is supported by NSF grant No. CMMI 0969600 (awarded to the University of Illinois at Urbana-Champaign).
APPENDIX
For notational simplicity, in the following we omit in each $\hat{\mathbf{x}}_{C_i}(\mathbf{y}, \tau_i)$ [and $\hat{\mathbf{x}}_C(\mathbf{y}, \boldsymbol{\tau})$] the dependence on $C_i$ and $\tau_i$, and write $\hat{\mathbf{x}}_i(\mathbf{y})$ [and $\hat{\mathbf{x}}(\mathbf{y})$]; we also introduce $f_{C_i}(\mathbf{x}_i, \mathbf{x}_{-i}) \triangleq \sum_{j \in C_i} f_j(\mathbf{x}_i, \mathbf{x}_{-i})$ and $f_{C_{-i}}(\mathbf{x}_i, \mathbf{x}_{-i}) \triangleq \sum_{j \in C_{-i}} f_j(\mathbf{x}_i, \mathbf{x}_{-i})$.

A. Proof of Proposition 1
Before proving the proposition, let us introduce the following intermediate result, whose proof is a consequence of assumptions A1-A3 and is thus omitted.
Lemma 6:
Let $\tilde{f}(\mathbf{x}; \mathbf{y}) \triangleq \sum_i \tilde{f}_{C_i}(\mathbf{x}_i; \mathbf{y})$, with $\tilde{f}_{C_i}(\mathbf{x}_i; \mathbf{y})$ defined in (7). Then the following hold:
(i) $\tilde{f}(\bullet; \mathbf{y})$ is uniformly strongly convex on $K$ with constant $c_\tau > 0$, i.e.,
$$(\mathbf{x} - \mathbf{w})^T \left( \nabla_{\mathbf{x}} \tilde{f}(\mathbf{x}; \mathbf{y}) - \nabla_{\mathbf{x}} \tilde{f}(\mathbf{w}; \mathbf{y}) \right) \ge c_\tau \|\mathbf{x} - \mathbf{w}\|^2, \quad (39)$$
for all $\mathbf{x}, \mathbf{w} \in K$ and given $\mathbf{y} \in K$;
(ii) $\nabla_{\mathbf{x}} \tilde{f}(\mathbf{x}; \bullet)$ is uniformly Lipschitz continuous on $K$, i.e., there exists a $0 < L_{\nabla \tilde{f}} < \infty$, independent of $\mathbf{x}$, such that
$$\left\| \nabla_{\mathbf{x}} \tilde{f}(\mathbf{x}; \mathbf{y}) - \nabla_{\mathbf{x}} \tilde{f}(\mathbf{x}; \mathbf{w}) \right\| \le L_{\nabla \tilde{f}} \|\mathbf{y} - \mathbf{w}\|, \quad (40)$$
for all $\mathbf{y}, \mathbf{w} \in K$ and given $\mathbf{x} \in K$.

We prove the statements of Proposition 1 in the following order: (c), (a), (b), (d).

(c): Given $\mathbf{y} \in K$, by definition each $\hat{\mathbf{x}}_i(\mathbf{y})$ is the unique solution of problem (10) and thus satisfies the minimum principle: for all $\mathbf{z}_i \in K_i$,
$$(\mathbf{z}_i - \hat{\mathbf{x}}_i(\mathbf{y}))^T \left( \nabla_{\mathbf{x}_i} f_{C_i}(\hat{\mathbf{x}}_i(\mathbf{y}), \mathbf{y}_{-i}) + \boldsymbol{\pi}_{C_i}(\mathbf{y}) + \tau_i \mathbf{H}_i(\mathbf{y}) (\hat{\mathbf{x}}_i(\mathbf{y}) - \mathbf{y}_i) \right) \ge 0. \quad (41)$$
Summing and subtracting $\nabla_{\mathbf{x}_i} f_{C_i}(\mathbf{y}_i, \mathbf{y}_{-i})$ in (41), choosing $\mathbf{z}_i = \mathbf{y}_i$, and using $\boldsymbol{\pi}_{C_i}(\mathbf{y}) \triangleq \nabla_{\mathbf{x}_i} f_{C_{-i}}(\mathbf{y})$, we get
$$(\mathbf{y}_i - \hat{\mathbf{x}}_i(\mathbf{y}))^T \left( \nabla_{\mathbf{x}_i} f_{C_i}(\hat{\mathbf{x}}_i(\mathbf{y}), \mathbf{y}_{-i}) - \nabla_{\mathbf{x}_i} f_{C_i}(\mathbf{y}_i, \mathbf{y}_{-i}) \right) + (\mathbf{y}_i - \hat{\mathbf{x}}_i(\mathbf{y}))^T \nabla_{\mathbf{x}_i} U(\mathbf{y}) - \tau_i (\hat{\mathbf{x}}_i(\mathbf{y}) - \mathbf{y}_i)^T \mathbf{H}_i(\mathbf{y}) (\hat{\mathbf{x}}_i(\mathbf{y}) - \mathbf{y}_i) \ge 0, \quad (42)$$
for all $i \in \mathcal{I}$. Recalling the definition of $c_\tau$ [cf. (14)] and using (42), we obtain
$$(\mathbf{y}_i - \hat{\mathbf{x}}_i(\mathbf{y}))^T \nabla_{\mathbf{x}_i} U(\mathbf{y}) \ge c_\tau \|\hat{\mathbf{x}}_i(\mathbf{y}) - \mathbf{y}_i\|^2, \quad (43)$$
for all $i \in \mathcal{I}$. Summing (43) over $i$, we obtain (13).

(a): Let us use the notation of Lemma 6. Given $\mathbf{y}, \mathbf{z} \in K$, by the minimum principle we have
$$(\mathbf{v} - \hat{\mathbf{x}}(\mathbf{y}))^T \nabla_{\mathbf{x}} \tilde{f}(\hat{\mathbf{x}}(\mathbf{y}); \mathbf{y}) \ge 0 \quad \forall \mathbf{v} \in K, \qquad (\mathbf{w} - \hat{\mathbf{x}}(\mathbf{z}))^T \nabla_{\mathbf{x}} \tilde{f}(\hat{\mathbf{x}}(\mathbf{z}); \mathbf{z}) \ge 0 \quad \forall \mathbf{w} \in K. \quad (44)$$
Setting $\mathbf{v} = \hat{\mathbf{x}}(\mathbf{z})$ and $\mathbf{w} = \hat{\mathbf{x}}(\mathbf{y})$, summing the two inequalities above, and adding and subtracting $\nabla_{\mathbf{x}} \tilde{f}(\hat{\mathbf{x}}(\mathbf{y}); \mathbf{z})$, we obtain
$$(\hat{\mathbf{x}}(\mathbf{z}) - \hat{\mathbf{x}}(\mathbf{y}))^T \left( \nabla_{\mathbf{x}} \tilde{f}(\hat{\mathbf{x}}(\mathbf{z}); \mathbf{z}) - \nabla_{\mathbf{x}} \tilde{f}(\hat{\mathbf{x}}(\mathbf{y}); \mathbf{z}) \right) \le (\hat{\mathbf{x}}(\mathbf{y}) - \hat{\mathbf{x}}(\mathbf{z}))^T \left( \nabla_{\mathbf{x}} \tilde{f}(\hat{\mathbf{x}}(\mathbf{y}); \mathbf{z}) - \nabla_{\mathbf{x}} \tilde{f}(\hat{\mathbf{x}}(\mathbf{y}); \mathbf{y}) \right). \quad (45)$$
Using (39), we can lower bound the left-hand side of (45) as
$$(\hat{\mathbf{x}}(\mathbf{z}) - \hat{\mathbf{x}}(\mathbf{y}))^T \left( \nabla_{\mathbf{x}} \tilde{f}(\hat{\mathbf{x}}(\mathbf{z}); \mathbf{z}) - \nabla_{\mathbf{x}} \tilde{f}(\hat{\mathbf{x}}(\mathbf{y}); \mathbf{z}) \right) \ge c_\tau \|\hat{\mathbf{x}}(\mathbf{z}) - \hat{\mathbf{x}}(\mathbf{y})\|^2, \quad (46)$$
whereas the right-hand side of (45) can be upper bounded as
$$(\hat{\mathbf{x}}(\mathbf{y}) - \hat{\mathbf{x}}(\mathbf{z}))^T \left( \nabla_{\mathbf{x}} \tilde{f}(\hat{\mathbf{x}}(\mathbf{y}); \mathbf{z}) - \nabla_{\mathbf{x}} \tilde{f}(\hat{\mathbf{x}}(\mathbf{y}); \mathbf{y}) \right) \le L_{\nabla \tilde{f}} \|\hat{\mathbf{x}}(\mathbf{y}) - \hat{\mathbf{x}}(\mathbf{z})\| \, \|\mathbf{y} - \mathbf{z}\|, \quad (47)$$
where the inequality follows from the Cauchy-Schwarz inequality and (40). Combining (45), (46), and (47), we obtain the desired Lipschitz property of $\hat{\mathbf{x}}(\bullet)$.

(b): Let $\mathbf{x}^\star \in K$ be a fixed point of $\hat{\mathbf{x}}(\mathbf{y})$, that is, $\mathbf{x}^\star = \hat{\mathbf{x}}(\mathbf{x}^\star)$. By definition, each $\hat{\mathbf{x}}_i(\mathbf{y})$ satisfies (41) for any given $\mathbf{y} \in K$. Setting $\mathbf{y} = \mathbf{x}^\star$ and using $\mathbf{x}^\star = \hat{\mathbf{x}}(\mathbf{x}^\star)$, (41) reduces to
$$(\mathbf{z}_i - \mathbf{x}^\star_i)^T \nabla_{\mathbf{x}_i} U(\mathbf{x}^\star) \ge 0, \quad (48)$$
for all $\mathbf{z}_i \in K_i$ and $i \in \mathcal{I}$. Taking into account the Cartesian structure of $K$ and summing (48) over $i \in \mathcal{I}$, we obtain $(\mathbf{z} - \mathbf{x}^\star)^T \nabla_{\mathbf{x}} U(\mathbf{x}^\star) \ge 0$ for all $\mathbf{z} \in K$, with $\mathbf{z} \triangleq (\mathbf{z}_i)_{i=1}^I$; therefore $\mathbf{x}^\star$ is a stationary solution of (1). The converse holds because i) $\hat{\mathbf{x}}(\mathbf{x}^\star)$ is the unique optimal solution of (10) with $\mathbf{y} = \mathbf{x}^\star$, and ii) $\mathbf{x}^\star$ is also an optimal solution of (10), since it satisfies the minimum principle.

(d): It follows readily from (43). □

B. Proof of Theorems 3 and 4
We prove Theorem 4; Theorem 3(b) is a special case, and the proof of the simpler Theorem 3(a), which follows similar steps, is omitted. The proof is based on standard descent arguments, suitably combined with the properties of $\hat{\mathbf{x}}(\mathbf{y})$ (cf. Prop. 1) and the presence of the errors $\{\epsilon^n_i\}$. We also use the following lemma, which is a deterministic version of the Robbins-Siegmund result for random sequences [44, Lemma 11] (but without requiring the nonnegativity of $X_n$ and $Z_n$, as instead required in [44, Lemma 11]).

Lemma 7:
Let $\{X_n\}$, $\{Y_n\}$, and $\{Z_n\}$ be three sequences of numbers such that $Y_n \ge 0$ for all $n$. Suppose that
$$X_{n+1} \le X_n - Y_n + Z_n, \quad \forall n = 0, 1, \ldots,$$
and $\sum_n Z_n < \infty$. Then either $X_n \to -\infty$ or else $\{X_n\}$ converges to a finite value and $\sum_n Y_n < \infty$. □

We are now ready to prove Theorem 4. For any given $n \ge 0$, the Descent Lemma [36] yields
$$U(\mathbf{x}^{n+1}) \le U(\mathbf{x}^n) + \gamma^n \nabla_{\mathbf{x}} U(\mathbf{x}^n)^T (\mathbf{z}^n - \mathbf{x}^n) + \frac{(\gamma^n)^2}{2}\, L_{\nabla U} \|\mathbf{z}^n - \mathbf{x}^n\|^2, \quad (49)$$
with $\mathbf{z}^n \triangleq (\mathbf{z}^n_i)_{i=1}^I$ and $\mathbf{z}^n_i$ defined in Step 2 (Algorithm 2). Using
$$\|\mathbf{z}^n - \mathbf{x}^n\|^2 \le \Big( \|\hat{\mathbf{x}}(\mathbf{x}^n) - \mathbf{x}^n\| + \sum_i \|\mathbf{z}^n_i - \hat{\mathbf{x}}_i(\mathbf{x}^n)\| \Big)^2 \le 2\, \|\hat{\mathbf{x}}(\mathbf{x}^n) - \mathbf{x}^n\|^2 + 2 \Big( \sum_i \epsilon^n_i \Big)^2,$$
where in the last inequality we used $\|\mathbf{z}^n_i - \hat{\mathbf{x}}_i(\mathbf{x}^n)\| \le \epsilon^n_i$, and
$$\nabla_{\mathbf{x}} U(\mathbf{x}^n)^T \left( \mathbf{z}^n - \hat{\mathbf{x}}(\mathbf{x}^n) + \hat{\mathbf{x}}(\mathbf{x}^n) - \mathbf{x}^n \right) \le -c_\tau \|\hat{\mathbf{x}}(\mathbf{x}^n) - \mathbf{x}^n\|^2 + \sum_i \epsilon^n_i \|\nabla_{\mathbf{x}_i} U(\mathbf{x}^n)\|, \quad (50)$$
which follows from Prop. 1(c), (49) yields: for all $n \ge 0$,
$$U(\mathbf{x}^{n+1}) \le U(\mathbf{x}^n) - \gamma^n (c_\tau - \gamma^n L_{\nabla U}) \|\hat{\mathbf{x}}(\mathbf{x}^n) - \mathbf{x}^n\|^2 + T^n, \quad (51)$$
where $T^n \triangleq \gamma^n \sum_i \epsilon^n_i \|\nabla_{\mathbf{x}_i} U(\mathbf{x}^n)\| + (\gamma^n)^2 L_{\nabla U} \big( \sum_i \epsilon^n_i \big)^2$. Note that, under the assumptions of the theorem, $\sum_{n=0}^\infty T^n < \infty$. Since $\gamma^n \to 0$, we have, for some positive constant $\beta$ and sufficiently large $n$, say $n \ge \bar{n}$,
$$U(\mathbf{x}^{n+1}) \le U(\mathbf{x}^n) - \gamma^n \beta\, \|\hat{\mathbf{x}}(\mathbf{x}^n) - \mathbf{x}^n\|^2 + T^n. \quad (52)$$
Invoking Lemma 7 with the identifications $X_n = U(\mathbf{x}^n)$, $Y_n = \gamma^n \beta \|\hat{\mathbf{x}}(\mathbf{x}^n) - \mathbf{x}^n\|^2$, and $Z_n = T^n$, while using $\sum_n T^n < \infty$, we deduce from (52) that either $\{U(\mathbf{x}^n)\} \to -\infty$ or else $\{U(\mathbf{x}^n)\}$ converges to a finite value and
$$\lim_{n \to \infty} \sum_{t = \bar{n}}^{n} \gamma^t \left\| \hat{\mathbf{x}}(\mathbf{x}^t) - \mathbf{x}^t \right\|^2 < +\infty. \quad (53)$$
Since $U(\mathbf{x})$ is coercive, $U(\mathbf{x}) \ge \min_{\mathbf{y} \in K} U(\mathbf{y}) > -\infty$, implying that $\{U(\mathbf{x}^n)\}_n$ is convergent; it follows from (53) and $\sum_{n=0}^\infty \gamma^n = \infty$ that $\liminf_{n \to \infty} \|\hat{\mathbf{x}}(\mathbf{x}^n) - \mathbf{x}^n\| = 0$.

Using Prop. 1, we show next that $\lim_{n \to \infty} \|\hat{\mathbf{x}}(\mathbf{x}^n) - \mathbf{x}^n\| = 0$; for notational simplicity we write $\triangle\hat{\mathbf{x}}(\mathbf{x}^n) \triangleq \hat{\mathbf{x}}(\mathbf{x}^n) - \mathbf{x}^n$. Suppose, by contradiction, that $\limsup_{n \to \infty} \|\triangle\hat{\mathbf{x}}(\mathbf{x}^n)\| > 0$. Then there exists a $\delta > 0$ such that $\|\triangle\hat{\mathbf{x}}(\mathbf{x}^n)\| > 2\delta$ for infinitely many $n$ and also $\|\triangle\hat{\mathbf{x}}(\mathbf{x}^n)\| < \delta$ for infinitely many $n$. Therefore, one can always find an infinite set of indices, say $N$, having the following property: for any $n \in N$, there exists an integer $i_n > n$ such that
$$\|\triangle\hat{\mathbf{x}}(\mathbf{x}^n)\| < \delta, \qquad \|\triangle\hat{\mathbf{x}}(\mathbf{x}^{i_n})\| > 2\delta, \quad (54)$$
$$\delta \le \|\triangle\hat{\mathbf{x}}(\mathbf{x}^j)\| \le 2\delta, \qquad n < j < i_n. \quad (55)$$
Given the above bounds, the following holds: for all $n \in N$,
$$\begin{aligned} \delta &\overset{(a)}{<} \|\triangle\hat{\mathbf{x}}(\mathbf{x}^{i_n})\| - \|\triangle\hat{\mathbf{x}}(\mathbf{x}^n)\| \le \|\hat{\mathbf{x}}(\mathbf{x}^{i_n}) - \hat{\mathbf{x}}(\mathbf{x}^n)\| + \|\mathbf{x}^{i_n} - \mathbf{x}^n\| \quad (56) \\ &\overset{(b)}{\le} (1 + \hat{L})\, \|\mathbf{x}^{i_n} - \mathbf{x}^n\| \quad (57) \\ &\overset{(c)}{\le} (1 + \hat{L}) \sum_{t=n}^{i_n - 1} \gamma^t \left( \|\triangle\hat{\mathbf{x}}(\mathbf{x}^t)\| + \|\mathbf{z}^t - \hat{\mathbf{x}}(\mathbf{x}^t)\| \right) \overset{(d)}{\le} (1 + \hat{L}) (2\delta + \epsilon^{\max}) \sum_{t=n}^{i_n - 1} \gamma^t, \quad (58) \end{aligned}$$
where (a) follows from (54) and (55); (b) is due to Prop. 1(a); (c) comes from the triangle inequality and the updating rule of the algorithm; and in (d) we used (54), (55), and $\|\mathbf{z}^t - \hat{\mathbf{x}}(\mathbf{x}^t)\| \le \sum_i \epsilon^t_i$, where $\epsilon^{\max} \triangleq \max_n \sum_i \epsilon^n_i < \infty$. It follows from (58) that
$$\liminf_{n \to \infty} \sum_{t=n}^{i_n - 1} \gamma^t \ge \frac{\delta}{(1 + \hat{L})(2\delta + \epsilon^{\max})} > 0. \quad (59)$$
We show next that (59) is in contradiction with the convergence of $\{U(\mathbf{x}^n)\}_n$.
To do that, we first prove that, for sufficiently large $n \in N$, it must be that $\|\triangle\hat{\mathbf{x}}(\mathbf{x}^n)\| \ge \delta/2$. Proceeding as in (58), we have, for any given $n \in N$,
$$\|\triangle\hat{\mathbf{x}}(\mathbf{x}^{n+1})\| - \|\triangle\hat{\mathbf{x}}(\mathbf{x}^n)\| \le (1 + \hat{L})\, \|\mathbf{x}^{n+1} - \mathbf{x}^n\| \le (1 + \hat{L})\, \gamma^n \left( \|\triangle\hat{\mathbf{x}}(\mathbf{x}^n)\| + \epsilon^{\max} \right).$$
It turns out that, for sufficiently large $n \in N$ so that $(1 + \hat{L})\, \gamma^n < \delta/(\delta + 2\epsilon^{\max})$, it must be that
$$\|\triangle\hat{\mathbf{x}}(\mathbf{x}^n)\| \ge \delta/2, \quad (60)$$
otherwise the condition $\|\triangle\hat{\mathbf{x}}(\mathbf{x}^{n+1})\| \ge \delta$ would be violated [cf. (55)]. Hereafter we assume w.l.o.g. that (60) holds for all $n \in N$ (in fact, one can always restrict $\{\mathbf{x}^n\}_{n \in N}$ to a proper subsequence).

We can now show that (59) is in contradiction with the convergence of $\{U(\mathbf{x}^n)\}_n$. Using (52) (possibly over a subsequence), we have, for sufficiently large $n \in N$,
$$U(\mathbf{x}^{i_n}) \le U(\mathbf{x}^n) - \beta \sum_{t=n}^{i_n - 1} \gamma^t \|\triangle\hat{\mathbf{x}}(\mathbf{x}^t)\|^2 + \sum_{t=n}^{i_n - 1} T^t \overset{(a)}{<} U(\mathbf{x}^n) - \beta\, (\delta^2/4) \sum_{t=n}^{i_n - 1} \gamma^t + \sum_{t=n}^{i_n - 1} T^t, \quad (61)$$
where in (a) we used (55) and (60), and $\beta$ is some positive constant. Since $\{U(\mathbf{x}^n)\}_n$ converges and $\sum_{n=0}^\infty T^n < \infty$, (61) implies $\lim_{N \ni n \to \infty} \sum_{t=n}^{i_n - 1} \gamma^t = 0$, which contradicts (59).

Finally, since the sequence $\{\mathbf{x}^n\}$ is bounded [due to the coercivity of $U(\mathbf{x})$ and the convergence of $\{U(\mathbf{x}^n)\}_n$], it has at least one limit point $\bar{\mathbf{x}}$, which must belong to $K$. By the continuity of $\hat{\mathbf{x}}(\bullet)$ [Prop. 1(a)] and $\lim_{n \to \infty} \|\hat{\mathbf{x}}(\mathbf{x}^n) - \mathbf{x}^n\| = 0$, it must be that $\hat{\mathbf{x}}(\bar{\mathbf{x}}) = \bar{\mathbf{x}}$. By Prop. 1(b), $\bar{\mathbf{x}}$ is also a stationary solution of the social problem (1). Note that, in the setting of Theorem 3, $\epsilon^n_i = 0$ for all $i$ and $n$; therefore $T^n = 0$ for all $n$. It then follows from (52) that $\{U(\mathbf{x}^n)\}$ is a decreasing sequence, which entails that no limit point of $\{\mathbf{x}^n\}$ can be a local maximum. □
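As a sanity check on the mechanics of the proof (descent of $U$ and vanishing residual $\|\hat{\mathbf{x}}(\mathbf{x}^n) - \mathbf{x}^n\|$), the following self-contained toy run implements the exact Jacobi update on a scalar SISO power-control instance; the utility, the constants, and the diminishing step-size schedule are illustrative assumptions of ours, not the paper's simulation setup.

```python
import numpy as np

rng = np.random.default_rng(0)
I, P, sigma, tau = 4, 1.0, 0.1, 1.0           # users, power cap, noise, proximal weight
w = np.ones(I)                                 # utility weights
h = rng.uniform(0.05, 0.2, (I, I))             # h[j, i]: gain from tx i to rx j
h[np.arange(I), np.arange(I)] = rng.uniform(0.8, 1.2, I)

def U(x):
    """Social function to minimize: negative weighted sum-rate."""
    Q = sigma + h @ x - np.diag(h) * x         # interference-plus-noise at each rx
    return -np.sum(w * np.log(1.0 + np.diag(h) * x / Q))

def best_response(x):
    """x_hat_i(x): user i minimizes f_i + linearized others (price pi_i)
    + (tau/2)(x_i - x_i^n)^2 over [0, P]; each subproblem is convex 1-D."""
    Q = sigma + h @ x - np.diag(h) * x
    S = np.diag(h) * x
    G = (w * S / (Q * (Q + S)))[:, None] * h   # G[j, i] = d f_j / d x_i  (j != i)
    np.fill_diagonal(G, 0.0)
    pi = G.sum(axis=0)                         # pricing vector
    a = np.diag(h) / Q                         # effective direct gains
    xhat = np.empty(I)
    for i in range(I):
        gp = lambda t: -w[i] * a[i] / (1.0 + a[i] * t) + pi[i] + tau * (t - x[i])
        if gp(0.0) >= 0.0:
            xhat[i] = 0.0
        elif gp(P) <= 0.0:
            xhat[i] = P
        else:                                   # bisection on the increasing gradient
            lo, hi = 0.0, P
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                lo, hi = (mid, hi) if gp(mid) < 0.0 else (lo, mid)
            xhat[i] = 0.5 * (lo + hi)
    return xhat

x, gamma, eps = np.full(I, P / 2), 1.0, 1e-2
for n in range(500):
    xhat = best_response(x)
    x = x + gamma * (xhat - x)                 # simultaneous (Jacobi) update
    gamma *= (1.0 - eps * gamma)               # diminishing step size
print(U(x), np.linalg.norm(best_response(x) - x))   # residual should be ~0
```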
C. Proof of Theorem 5

The main idea of the proof is to interpret Algorithm 3 as an instance of the inexact Jacobi scheme described in Algorithm 2, and to show that Theorem 4 applies. It is not difficult to see that this reduces to proving that, for all $i = 1, \ldots, I$, the sequence $\mathbf{z}^t_i$ in Step 2a) of Algorithm 3 satisfies
$$\|\mathbf{z}^t_i - \hat{\mathbf{x}}_i(\mathbf{x}^t)\| \le \tilde{\epsilon}^t_i, \quad (62)$$
for some $\{\tilde{\epsilon}^t_i\}$ such that $\sum_t \tilde{\epsilon}^t_i \gamma^t < \infty$. The following holds for the left-hand side of (62):
$$\begin{aligned} \|\mathbf{z}^t_i - \hat{\mathbf{x}}_i(\mathbf{x}^t)\| &\le \|\hat{\mathbf{x}}_i(\mathbf{x}^{t+1}_{i<}, \mathbf{x}^t_{i\ge}) - \hat{\mathbf{x}}_i(\mathbf{x}^t)\| + \|\mathbf{z}^t_i - \hat{\mathbf{x}}_i(\mathbf{x}^{t+1}_{i<}, \mathbf{x}^t_{i\ge})\| \\ &\overset{(a)}{\le} \|\hat{\mathbf{x}}_i(\mathbf{x}^{t+1}_{i<}, \mathbf{x}^t_{i\ge}) - \hat{\mathbf{x}}_i(\mathbf{x}^t)\| + \epsilon^t_i \overset{(b)}{\le} \hat{L}\, \|\mathbf{x}^{t+1}_{i<} - \mathbf{x}^t_{i<}\| + \epsilon^t_i, \end{aligned}$$
where (a) follows from the error bound in Step 2a) of Algorithm 3, and (b) from the Lipschitz property of $\hat{\mathbf{x}}(\bullet)$ [Prop. 1(a)]. By the updating rule of the algorithm (and the boundedness of the iterates), $\|\mathbf{x}^{t+1}_{i<} - \mathbf{x}^t_{i<}\| = \mathcal{O}(\gamma^t)$, so (62) holds with $\tilde{\epsilon}^t_i \triangleq \epsilon^t_i + \mathcal{O}(\gamma^t)$, which satisfies $\sum_t \tilde{\epsilon}^t_i \gamma^t < \infty$ under the assumptions of the theorem; the desired result then follows from Theorem 4. □
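Continuing the toy sketch given after the proof of Theorem 4 (and reusing its `I`, `best_response`, `x`, `gamma`, and `eps`), the Gauss-Seidel variant covered by Theorem 5 is obtained by letting each user's best response see the already-updated components $(\mathbf{x}^{t+1}_{i<}, \mathbf{x}^t_{i\ge})$, i.e., by overwriting $\mathbf{x}$ in place during the sweep:

```python
# Gauss-Seidel sweep: user i's subproblem is built at the partially updated
# point (x^{t+1}_{i<}, x^t_{i>=}), exactly the z_i^t of the proof above.
for n in range(500):
    for i in range(I):
        x[i] += gamma * (best_response(x)[i] - x[i])  # x mutates within the sweep
    gamma *= (1.0 - eps * gamma)
# (recomputing all responses per user is wasteful but keeps the sketch short)
```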
REFERENCES

[1] Proc. of Int. Conf. on NETwork Games, COntrol and OPtimization (NetGCooP 2011), Paris, France, Oct. 2011, pp. 12–14.
[2] D. P. Palomar and M. Chiang, "Alternative distributed algorithms for network utility maximization: Framework and applications," IEEE Trans. on Automatic Control, vol. 52, no. 12, pp. 2254–2269, Dec. 2007.
[3] M. Chiang, S. H. Low, A. R. Calderbank, and J. C. Doyle, "Layering as optimization decomposition: A mathematical theory of network architectures," Proc. of the IEEE, vol. 95, no. 1, pp. 255–312, Jan. 2007.
[4] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods, 2nd ed. Athena Scientific Press, 1989.
[5] Z.-Q. Luo and S. Zhang, "Spectrum management: Complexity and duality," IEEE J. Sel. Topics Signal Process., vol. 2, no. 1, pp. 57–72, Feb. 2008.
[6] M. Hong and Z.-Q. Luo, "Signal processing and optimal resource allocation for the interference channel," Elsevier e-Reference—Signal Processing, 2013. Available at http://arxiv.org/pdf/1206.5144v1.pdf.
[7] W. Yu, G. Ginis, and J. M. Cioffi, "Distributed multiuser power control for digital subscriber lines," IEEE J. Sel. Areas Commun., vol. 20, no. 5, pp. 1105–1115, June 2002.
[8] Z.-Q. Luo and J.-S. Pang, "Analysis of iterative waterfilling algorithm for multiuser power control in digital subscriber lines," EURASIP Jour. on Applied Signal Processing, vol. 2006, pp. 1–10, May 2006.
[9] G. Scutari, D. P. Palomar, and S. Barbarossa, "Optimal linear precoding strategies for wideband noncooperative systems based on game theory—Part I & II: Nash equilibria and distributed algorithms," IEEE Trans. Signal Process., vol. 56, no. 3, pp. 1230–1267, March 2008.
[10] ——, "Asynchronous iterative water-filling for Gaussian frequency-selective interference channels," IEEE Trans. on Information Theory, vol. 54, no. 7, pp. 2868–2878, July 2008.
[11] R. Cendrillon, J. Huang, M. Chiang, and M. Moonen, "Autonomous spectrum balancing for digital subscriber lines," IEEE Trans. Signal Process., vol. 55, no. 8, pp. 4241–4257, Aug. 2007.
[12] G. Scutari, D. P. Palomar, and S. Barbarossa, "Competitive design of multiuser MIMO systems based on game theory: A unified view," IEEE J. Sel. Areas Commun., vol. 26, no. 7, pp. 1089–1103, Sept. 2008.
[13] ——, "The MIMO iterative waterfilling algorithm," IEEE Trans. Signal Process., vol. 57, no. 5, May 2009.
[14] G. Scutari and D. P. Palomar, "MIMO cognitive radio: A game theoretical approach," IEEE Trans. Signal Process., vol. 58, no. 2, pp. 761–780, Feb. 2010.
[15] E. Larsson, E. Jorswieck, J. Lindblom, and R. Mochaourab, "Game theory and the flat-fading Gaussian interference channel," IEEE Signal Process. Mag., vol. 26, no. 5, pp. 18–27, Sept. 2009.
[16] A. Leshem and E. Zehavi, "Game theory and the frequency selective interference channel," IEEE Signal Process. Mag., vol. 26, no. 5, pp. 28–40, Sept. 2009.
[17] F. Facchinei and J.-S. Pang, Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer-Verlag, New York, 2003.
[18] J.-S. Pang, G. Scutari, D. P. Palomar, and F. Facchinei, "Design of cognitive radio systems under temperature-interference constraints: A variational inequality approach," IEEE Trans. Signal Process., vol. 58, no. 6, pp. 3251–3271, June 2010.
[19] G. Scutari, D. Palomar, F. Facchinei, and J.-S. Pang, "Flexible design of cognitive radio wireless systems: From game theory to variational inequality theory," IEEE Signal Process. Mag., vol. 26, no. 5, pp. 107–123, Sept. 2009.
[20] ——, "Convex optimization, game theory, and variational inequality theory in multiuser communication systems," IEEE Signal Process. Mag., vol. 27, no. 3, pp. 35–49, May 2010.
[21] J. Huang, R. Berry, and M. L. Honig, "Distributed interference compensation for wireless networks," IEEE J. Sel. Areas Commun., vol. 24, no. 5, pp. 1074–1084, May 2006.
[22] F. Wang, M. Krunz, and S. Cui, "Price-based spectrum management in cognitive radio networks," IEEE J. Sel. Topics Signal Process., vol. 2, no. 1, pp. 74–87, Feb. 2008.
[23] D. Schmidt, C. Shi, R. Berry, M. Honig, and W. Utschick, "Distributed resource allocation schemes: Pricing algorithms for power control and beamformer design in interference networks," IEEE Signal Process. Mag., vol. 26, no. 5, pp. 53–63, Sept. 2009.
[24] S.-J. Kim and G. B. Giannakis, "Optimal resource allocation for MIMO ad hoc cognitive radio networks," IEEE Trans. on Information Theory, vol. 57, no. 5, pp. 3117–3131, May 2011.
[25] L. Grippo and M. Sciandrone, "On the convergence of the block nonlinear Gauss-Seidel method under convex constraints," Operations Research Letters, vol. 26, no. 3, pp. 127–136, April 2000.
[26] S. Ye and R. S. Blum, "Optimized signaling for MIMO interference systems with feedback," IEEE Trans. Signal Process., vol. 51, no. 11, pp. 2839–2848, Nov. 2003.
[27] M. Chiang, C. W. Tan, D. P. Palomar, D. O'Neill, and D. Julian, "Power control by geometric programming," IEEE Trans. Wireless Commun., vol. 6, no. 7, pp. 2640–2651, July 2007.
[28] P. Tsiaflakis, M. Diehl, and M. Moonen, "Distributed spectrum management algorithms for multiuser DSL networks," IEEE Trans. Signal Process., vol. 56, no. 10, pp. 4825–4843, Oct. 2008.
[29] J. Papandriopoulos and J. S. Evans, "SCALE: A low-complexity distributed protocol for spectrum balancing in multiuser DSL networks," IEEE Trans. on Information Theory, vol. 55, no. 8, pp. 3711–3724, Aug. 2009.
[30] Q. Shi, M. Razaviyayn, Z.-Q. Luo, and C. He, "An iteratively weighted MMSE approach to distributed sum-utility maximization for a MIMO interfering broadcast channel," IEEE Trans. Signal Process., vol. 59, no. 9, pp. 4331–4340, Sept. 2011.
[31] C. Shi, D. A. Schmidt, R. A. Berry, M. L. Honig, and W. Utschick, "Distributed interference pricing for the MIMO interference channel," in Int. Conf. on Comm. (ICC), Princeton, NJ, USA, June 14-18 2009.
[32] B. R. Marks and G. P. Wright, "A general inner approximation algorithm for nonconvex mathematical programs," Operations Research, vol. 26, no. 2, pp. 681–683, 1978.
[33] M. Razaviyayn, M. Hong, and Z.-Q. Luo, "A unified convergence analysis of block successive minimization methods for nonsmooth optimization," Arxiv.org, Oct. 2012. http://arxiv.org/abs/1209.2385v1.
[34] M. Hong, Q. Li, Y.-F. Liu, and Z.-Q. Luo, "Decomposition by successive convex approximation: A unifying approach for linear transceiver design in interfering heterogenous networks," Arxiv.org, Oct. 2012. http://arxiv.org/abs/1210.1507v1.
[35] R. Cendrillon, W. Yu, M. Moonen, J. Verlinden, and T. Bostoen, "Optimal multiuser spectrum balancing for digital subscriber lines," IEEE Trans. Signal Process., vol. 54, no. 5, pp. 922–933, May 2006.
[36] D. Bertsekas, Nonlinear Programming, 2nd ed. Belmont, MA, USA: Athena Scientific, 1999.
[37] C. Shi, R. Berry, and M. Honig, "Distributed interference pricing for OFDM wireless networks with non-separable utilities," in , Princeton, NJ, USA, Mar. 2008.
[38] W. Yu, W. Rhee, S. Boyd, and J. Cioffi, "Iterative water-filling for Gaussian vector multiple access channels," IEEE Trans. on Information Theory, vol. 50, no. 1, pp. 145–151, Jan. 2004.
[39] Y. Yang, G. Scutari, and D. Palomar, "Parallel stochastic decomposition algorithms for multi-agent systems," in Proc. of 2013 IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC '13), Darmstadt, Germany, June 16-19 2013.
[40] A. Hjorungnes, Complex-Valued Matrix Derivatives With Applications in Signal Processing and Communications. London: Cambridge University Press, May 2011.
[41] G. Scutari, F. Facchinei, J.-S. Pang, and D. P. Palomar, "Real and complex monotone communication games," IEEE Trans. on Information Theory, (submitted, Nov. 2012). [Online]. Available: http://arxiv.org/abs/1212.6235
[42] D. P. Palomar and J. Fonollosa, "Practical algorithms for a family of waterfilling solutions," IEEE Trans. Signal Process.