Rate Allocation and Content Placement in Cache Networks
Khashayar Kamran, Armin Moharrer, Stratis Ioannidis, Edmund Yeh
Electrical and Computer Engineering, Northeastern University, Boston, MA, USA
{kamrank, amoharrer, ioannidis, eyeh}@ece.neu.edu

Abstract—We introduce the problem of optimal congestion control in cache networks, whereby both rate allocations and content placements are optimized jointly. We formulate this as a maximization problem with non-convex constraints, and propose solving this problem via (a) a Lagrangian barrier algorithm and (b) a convex relaxation. We prove different optimality guarantees for each of these two algorithms; our proofs exploit the fact that the non-convex constraints of our problem involve DR-submodular functions.
Index Terms—Congestion control, caching, rate control, utility maximization, DR-submodular maximization, non-convex optimization
I. INTRODUCTION
Traffic engineering and congestion control have played a crucial role in the stability and scalability of communication networks since the early days of the Internet. They have been extremely active research areas since the seminal work by Kelly et al. [1], who studied optimal rate control subject to link capacity constraints. Formally, given a network G(V, E) with nodes v ∈ V, links e ∈ E, and flows n ∈ N, Kelly et al. [1] studied the following convex optimization problem:

max_λ  ∑_{n∈N} U_n(λ_n)              (1a)
s.t.   ρ_e(λ) ≤ C_e,  ∀e ∈ E,        (1b)

where λ = [λ_n]_{n∈N} ∈ R_+^{|N|} is the vector of rate allocations λ_n, n ∈ N, across flows, ρ_e : R_+^{|N|} → R_+ and C_e ∈ R_+ are the loads and capacities of links e ∈ E, respectively, and U_n : R_+ → R, n ∈ N, are concave utility functions of rates. Motivated by Kelly et al. [1], distributed congestion control algorithms solving Prob. (1) are now both numerous and classic [2]–[6].

In this work, we revisit this problem in the context of cache networks [7]–[9]. Motivated by technologies such as software defined networks [10], [11] and network function virtualization [12], nodes in cache networks are no longer merely static routers. Instead, they are entities capable of storing data, performing computations, and making decisions. Nodes can thus fetch user-requested content [13], [14], or perform user-specified computation tasks [15], [16], instead of simply maintaining point-to-point communication sessions. In turn, such functionalities can address the ever increasing interest in running data-intensive applications in large-scale networks, such as machine learning at the edge [17], IoT-enabled health care [18], and scientific data-intensive computation [19], [20].

Fig. 1: Example of a cache network. N distinct flows of requests enter the network at node v_1. Each flow contains requests for an item i in a catalog I. Requests are forwarded towards the designated server that stores all items in I. Upon reaching it, responses carrying the requested items follow the reverse path towards node v_1. However, all intermediate nodes have caches that can be used to store items in I. Thus, requests need not traverse the entire path, but can be satisfied upon the first hit. Hence, the load on link e = (v_{k+1}, v_k) is a function of both rates λ = [λ_n]_{n=1}^N as well as cache allocation decisions made by the intermediate nodes v_1, ..., v_k.

Congestion control in such networks is fundamentally different from the classic setting. When nodes can store user-requested content or provide server functionalities, network design amounts to determining not only the rate allocations per flow but also the location of offered network services. Put differently, to attain optimality, cache allocation decisions need to be optimized jointly with rate allocation decisions. In turn, this necessitates the development of novel congestion control algorithms that take cache allocation into account.

To make this point clear, we illustrate the effect of cache allocations on congestion in the cache network shown in Fig. 1. Requests for content items in a catalog I arrive over N flows on a node on the left of a path network. They are subsequently forwarded towards a designated server on the right, that stores all items in I. Upon reaching the server, responses carrying the requested items are sent back over the reverse path. Assuming that request traffic is negligible, the traffic load on an edge is determined by the item (i.e., response) traffic flowing through it. However, if intermediate nodes are equipped with caches that can store some of the items in I, as in Fig. 1, requests need not be propagated all the way to the designated server. As a result, the load ρ_e caused by items traversing edge e = (v_{k+1}, v_k) depends not only on the rate vector λ = [λ_n]_{n∈N} ∈ R_+^{|N|}, but also on the cache allocation decisions made at all nodes v_1, ..., v_k preceding e in the path. For example, the load ρ_e is zero if all items in I are stored in nodes v_1, ..., v_k.

Formally, in cache networks, Problem (1) becomes:

max_{λ,x}  ∑_{n∈N} U_n(λ_n)              (2a)
s.t.   ρ_e(λ, x) ≤ C_e,  ∀e ∈ E,         (2b)
       ∑_{i∈I} x_vi ≤ c_v,  ∀v ∈ V,      (2c)

where x = [x_vi]_{v∈V,i∈I} ∈ {0,1}^{|V||I|} is the vector of cache allocation decisions x_vi ∈ {0,1}, indicating if node v ∈ V stores i ∈ I, c_v ∈ N is the storage capacity of node v ∈ V, and λ, ρ_e, C_e, U_n are respectively the rate allocation vector, loads, link capacities, and utilities, as in Eq. (1). Crucially, the load ρ_e on links e ∈ E is a function of both the allocated rates and cache decisions. As a result, constraints (2b) define a non-convex set. This is not just due to the combinatorial nature of cache allocation decisions x: even if the x_vi are relaxed to real values in [0, 1], which corresponds to making probabilistic cache allocation decisions, the resulting constraint (2b) is still not convex, and Problem (2) cannot be solved via standard convex optimization techniques. This is a significant departure from Problem (1), in which constraints (1b) are linear.

In spite of the challenges posed by the lack of convexity, we propose algorithms solving Problem (2) with provable approximation guarantees. Specifically:

1) We provide a unified optimization formulation for congestion control in cache networks, through joint probabilistic content placement and rate control. To the best of our knowledge, we are the first to study this class of non-convex problems and develop algorithms with approximation guarantees.

2) We propose two algorithms, each yielding different approximation guarantees. The first is a Lagrangian barrier method; the second is a convex relaxation. In both cases, we exploit the fact that constraints (2b) can be expressed in terms of DR-submodular functions [21].
Both algorithms and their corresponding analysis are novel and of independent interest, as they may be applicable for attacking problems with DR-submodular constraints beyond the cache network setting we consider here.

3) Finally, we implement both methods and compare them experimentally to greedy algorithms over several real-world and synthetic topologies, observing an improvement in aggregate utility by as much as 5.43×.

The remainder of this paper is structured as follows. We review related work in Section II. Our network model and problem formulation are discussed in Section III. In Section IV, we describe our two different methods for solving the optimal congestion control problem, as well as our performance guarantees. Finally, we present our evaluations in Section V, and we conclude in Section VI.

II. RELATED WORK
Network Cache Optimization.
Studies on optimal in-network cache allocation are numerous, roughly split into offline and online solutions. Several papers study centralized, offline cache optimization in a network modeled as a bipartite graph [22]–[24]. Shanmugam et al. [25] consider a femtocell network, where content is placed in caches to reduce the cost of fetching data from a base station. They do not consider congestion, and study routing costs that are linear in the traffic per link. Mahdian et al. [26] model every link with an M/M/1 queue, and consider objectives that are (non-linear) functions of the queue sizes. Similarly, Li and Ioannidis [27] model every link with an M/M/1c queue to capture the consolidation of identical responses before being forwarded downstream. The same problem was also studied, albeit in a different model, by Dehghan et al. [28]. Online cache allocation algorithms exist, e.g., for maximizing throughput [15], [29], or minimizing delay [30]. Ioannidis and Yeh [9] study a similar problem as Shanmugam et al. [25] for networks with arbitrary topology and linear link costs, seeking cache allocations that minimize routing costs across multiple hops. The same authors extend this work to jointly optimizing cache and routing decisions [31].

Although we too consider offline algorithms, we depart substantially from prior work. First, all mentioned papers assume the input request rates are fixed, whereas we consider joint cache allocation and rate control. Prior works on allocations minimizing costs [9], [25]–[28], [31] cast the problem as a submodular maximization problem subject to matroid constraints, for which a (1 − 1/e)-approximate solution can be constructed in polynomial time. Instead, akin to Kelly et al.
[1], we treat loads on links as constraints rather than part of the objective. Hence, we cannot directly leverage submodular maximization techniques and need to design altogether new algorithms. Moreover, works which consider congestion [26]–[28] assume that the system is stable when all caches are empty; in fact, finding a cache allocation under which the system is stable is left open. We partially resolve this, jointly finding a rate and an allocation that ensure stability.

Similar to this paper, many works consider fixed routing for requests. Although one can incorporate routing decisions into the problem, not doing so does not make the optimal cache allocation problem trivial, and the problem is still NP-complete [9], [25]–[28], [30], [32], [33].

TTL caches. Time-to-Live (TTL) caches provide an elegant general framework for analyzing cache replacement policies. In TTL caches, a timer is assigned to each content, and an eviction occurs upon timer expiration. Multiple studies analyzed TTL caches as approximations to popular cache eviction policies (see [8], [34]–[38]). TTL cache optimization includes maximizing the cache hit rate [39] and the aggregate utility of cache hits [40]–[42]. In contrast to our approach, however, works on TTL caching do not provide a solution for joint cache and rate allocation, and do not guarantee network stability. Furthermore, they focus on the utility of cache hits, whereas we consider rate utility, similar to Kelly [1].
Rate admission control in cache networks.
Various methods have been proposed for rate admission control in Content-Centric Networking (CCN) [13] and Named Data Networking (NDN) [14] architectures, primarily using congestion feedback from the network for rate control [43]–[47]. In contrast to our work, none of these come with optimality guarantees. Closer to us, Carofiglio et al. [48] fix a cache allocation and maximize rate utility via rate control; we depart by jointly optimizing rate and cache allocations.
Non-Convex Optimization Techniques.
Our analysis employs the Lagrangian barrier algorithm proposed by Conn et al. [49] along with a trust-region algorithm proposed by the same authors [50]. This is a modified version of the barrier method [51] which explicitly deals with numerical difficulties arising from maximizing an unconstrained barrier function. For general, non-convex optimization problems, the aforementioned Lagrangian barrier and trust-region algorithms come with no optimality guarantees. One of our main technical contributions is to provide such guarantees for our problem by exploiting the fact that the non-convex constraints involve DR-submodular functions [21].
DR-submodular optimization.
Since their introduction by Bian et al. [21], DR-submodular functions have received much attention [52]–[58] as examples of functions which can be maximized with performance guarantees, in spite of the fact that they are not convex. Bian et al. [21] propose a constant-factor approximation algorithm for (a) maximizing monotone DR-submodular functions subject to down-closed convex constraints and (b) maximizing non-monotone DR-submodular functions subject to box constraints. In a follow-up paper, Bian et al. [52] provide a constant-factor algorithm for maximizing non-monotone continuous DR-submodular functions under general down-closed convex constraints. These works, however, consider DR-submodular functions in the objective, rather than in constraints. In a combinatorial setting, Crawford et al. [53] and Iyer et al. [54] provide approximate greedy algorithms for minimizing a submodular function subject to a single threshold constraint involving a submodular function. None of the above solutions, however, is applicable to our problem, which involves maximizing a concave function subject to multiple
DR-submodular constraints. To the best of our knowledge, we are the first to study this problem and provide solutions with optimality guarantees.

III. SYSTEM MODEL
We consider a network of caches, each capable of storing items such that the number of stored items cannot exceed a finite cache capacity. Requests are routed over fixed (and given) paths and are satisfied upon hitting the first cache that contains the requested item. Our goal is to determine (a) the items to be stored at each cache as well as (b) the request rates, so that the aggregate utility is maximized, subject to both bandwidth and storage capacity constraints in the network.
A. Network Model
Caches and Items.
Following [9], [26], we represent a network by a directed graph G(V, E). We assume G(V, E) is symmetric, i.e., (b, a) ∈ E implies that (a, b) ∈ E. There exists a catalog I of items (e.g., files, or file chunks) of equal size which network users can request. Each node is associated with a cache that can store a finite number of items. We describe cache contents via indicator variables x_vi ∈ {0, 1} for v ∈ V, i ∈ I, where x_vi = 1 indicates that node v stores item i ∈ I. The total number of items that a node v ∈ V can store is bounded by its node capacity c_v ∈ N (measured in number of items). More precisely,

∑_{i∈I} x_vi ≤ c_v,  for all v ∈ V.    (3)

We associate each item i with a fixed set of designated servers S_i ⊆ V that permanently store i; equivalently, x_vi = 1 for all v ∈ S_i. As we discuss below, these act as "caches of last resort", and ensure that all items can be eventually retrieved.

Content Requests.
Item requests are routed over the network toward the designated servers. We denote by N the set of all requests. A request is determined by the item requested and the path that the request follows. Formally, a request n is a pair (i, p) where i ∈ I is the requested item and p ⊆ V is the path to be traversed to serve this request. A path p of length |p| = K is a sequence of nodes {p_1, p_2, ..., p_K} ⊆ V, where (p_j, p_{j+1}) ∈ E for all j ∈ [K−1] ≜ {1, 2, ..., K−1}. An incoming request (i, p) is routed over the graph G and follows the path p, until it reaches a node that stores item i. At that point, a response message is generated, which carries the requested item. The response is propagated over p in the reverse direction, i.e., from the node that stores the item, back to the first node in p, from which the request was generated. Following [9], [26], we say that a request n = (i, p) is well-routed if: (a) the path p is simple, i.e., it contains no loops, (b) the last node in the path is a designated server for i, i.e., p_K ∈ S_i, and (c) no other node in the path is a designated server for i, i.e., p_k ∉ S_i for k ∈ [K−1]. Without loss of generality, we assume that all requests in N are well-routed; note that every well-routed request eventually encounters a node that contains the item requested.

Bandwidth Capacities.
For each link (a, b) ∈ E there exists a positive and finite link capacity C_ab > 0 (measured in items/sec) indicating the bandwidth available on (a, b). We denote the vector of link capacities by C ∈ R_+^{|E|}. We consider two means of controlling the rate of item transmission on a link, thereby preventing congestion in the network: (a) via the cache allocation strategy, i.e., by storing the requested item on a node along the path, which eliminates the flow of the item on upstream links, and (b) via the rate allocation strategy, i.e., by controlling the rate with which requests enter the network. We describe each one in detail below.

Cache Allocation Strategy.
We adopt a probabilistic cache allocation strategy. That is, we partition time into periods of equal length T > 0. At the beginning of the t-th time period, each node v ∈ V stores an item i ∈ I independently of other nodes and other time periods with probability y_vi ∈ [0, 1], i.e., y_vi = P{x_vi(t) = 1} = E[x_vi(t)] for all t > 0, where x_vi(t) = 1 indicates that node v stores item i at the t-th time period. We denote by Y = [y_vi]_{v∈V,i∈I} ∈ [0, 1]^{|V||I|} the cache allocation strategy vector, satisfying the constraints:

∑_{i∈I} y_vi ≤ c_v,  for all v ∈ V.    (4)

Although condition (4) implies that cache capacity constraints are satisfied in expectation, it is necessary and sufficient for the existence of a probabilistic content placement (i.e., a mapping of items to caches) that satisfies capacity constraints (3) exactly (see, e.g., [9], [59]). We present this probabilistic placement in detail in Appendix A. In this mapping, (a) node v stores at most c_v items, and (b) the marginal probability of storing i is y_vi. Given a global cache allocation strategy Y that satisfies (4), this mapping can be used to randomly place items at caches at the beginning of each time period. We therefore treat Y as the parameter to optimize over in the remainder of the paper.

Rate Allocation Strategy.
Our second knob for controlling congestion is classic rate allocation, as in [1], [48]. That is, we control the input rate of requests so that the final requests injected into the network have a rate equal to or smaller than the original rates. We refer to the original exogenous arrival rate of a request n = (i, p) ∈ N as the demand rate, and denote it by λ̄_n > 0 (in requests per second). We denote the vector of demand rates by λ̄ = [λ̄_n]_{n∈N}. We also denote the admitted input rate of requests into the network by λ_n, where

λ_n ≤ λ̄_n,  for all n ∈ N.    (5)

We refer to the vector λ = [λ_n]_{n∈N} ∈ R_+^{|N|} as the rate allocation strategy. We make the following assumptions on requests admitted into the network: (a) the request process is stationary and ergodic, (b) a corresponding response message is eventually created for every admitted request, and (c) the network is stable if, for all (b, a) ∈ E, the following holds:

ρ_(b,a)(λ, Y) = ∑_{(i,p):(a,b)∈p} λ_(i,p) ∏_{v=p_1}^{a} (1 − y_vi) ≤ C_ba.    (6)

Using the probabilistic cache allocation scheme, and the fact that the admitted request process is stationary and ergodic, ρ_(b,a)(λ, Y) is the expected rate of requests passing through link (a, b). In particular, ∏_{v=p_1}^{a} (1 − y_vi) is the fraction of the admitted rate λ_(i,p) which is forwarded on link (a, b). Since we have assumed that for each request a response message is generated, and comes back on the reverse path, the condition in (6) ensures that the rate of items transmitted on link (b, a) is less than or equal to the link capacity C_ba. If the traffic rate on a link is greater than the link capacity, the network becomes unstable.
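To make Eq. (6) concrete, the following sketch (a toy instance of our own; the function and variable names are illustrative, not from the paper) computes the expected load on a link of a path network as in Fig. 1, and also exhibits the non-convexity of constraint (6): two feasible (λ, Y) pairs can have an infeasible midpoint.

```python
# Sketch of the expected link load rho_(b,a)(lam, Y) in Eq. (6).
# A request n = (i, p) contributes its admitted rate lam[n] to link (b, a),
# scaled by the probability that every cache on p up to node a misses item i.
# All instance values below are illustrative.

def link_load(b, a, lam, y, requests):
    total = 0.0
    for n, (i, p) in requests.items():
        if (a, b) not in list(zip(p, p[1:])):   # request must traverse (a, b)
            continue
        miss = 1.0
        for v in p[: p.index(a) + 1]:           # nodes p_1, ..., a
            miss *= 1.0 - y[v][i]               # cache-miss probability at v
        total += lam[n] * miss
    return total

# Path v1 -> v2 -> v3, designated server at v3, a single request flow.
requests = {0: ("item", ["v1", "v2", "v3"])}
C = 4.0  # capacity of the response link (v2, v1)

lamA, YA = {0: 20.0}, {"v1": {"item": 1.0}, "v2": {"item": 0.0}, "v3": {"item": 1.0}}
lamB, YB = {0: 4.0},  {"v1": {"item": 0.0}, "v2": {"item": 0.0}, "v3": {"item": 1.0}}
assert link_load("v2", "v1", lamA, YA, requests) <= C   # load 0.0: always cached
assert link_load("v2", "v1", lamB, YB, requests) <= C   # load 4.0: never cached

# Midpoint of the two feasible points: lam = 12, y_v1 = 0.5.
lamM = {0: 12.0}
YM = {"v1": {"item": 0.5}, "v2": {"item": 0.0}, "v3": {"item": 1.0}}
assert link_load("v2", "v1", lamM, YM, requests) > C    # load 6.0: infeasible
print("constraint (6) is non-convex on this instance")
```

The midpoint violation shows that the feasible region cut out by (6) is not convex even though every y_vi is continuous, which is exactly the difficulty addressed in Section IV.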
In order for (6) to ensure stability, similar to [15], [29], we in effect assume that the size of requests is negligible compared to the size of requested items, and that the load primarily consists of the downstream traffic of items. Note that the load on edge (b, a) depends on both the rate and the cache allocation strategy, and constraints (6) are non-convex.

System Utility.
Consistent with Kelly et al. [1], each request class n ∈ N is associated with a utility function U_n : R_+ → R of the admitted rate λ_n. The network utility is then the social welfare, i.e., the sum of all request utilities in the network:

U(λ) = ∑_{n∈N} U_n(λ_n).    (7)

We assume that each function U_n is twice continuously differentiable, non-decreasing, and concave for all n ∈ N. Our goal is to determine a rate allocation strategy λ = [λ_n]_{n∈N} ∈ R_+^{|N|} and a cache allocation strategy Y = [y_vi]_{v∈V,i∈I} ∈ [0, 1]^{|V||I|} that jointly maximize (7), subject to the constraints (4), (5), and (6). For technical reasons, we first transform this problem into an equivalent problem via a change of variables.

B. Problem Formulation
Change of variables.
Let the residual rate per request be r_n ≜ λ̄_n − λ_n, for n ∈ N. Given the rate residual strategy R ≜ [r_n]_{n∈N} ∈ R_+^{|N|}, we rewrite the utility as

F(R) ≜ U(λ̄ − R) = ∑_{n∈N} U_n(λ̄_n − r_n).    (8)

Under this change of variables, we state our problem as:

UTILITYMAX

maximize    F(R)              (9a)
subject to  (Y, R) ∈ D,       (9b)

where D is the set of points (Y, R) ∈ R^{|V||I|} × R^{|N|} satisfying the following constraints:

g_ba(Y, R) ≥ ∑_{(i,p):(a,b)∈p} λ̄_(i,p) − C_ba,  ∀(b, a) ∈ E    (10a)
g_v(Y) ≤ c_v,  ∀v ∈ V                                           (10b)
0 ≤ y_vi ≤ 1,  ∀v ∈ V, i ∈ I                                    (10c)
0 ≤ r_n ≤ λ̄_n,  ∀n ∈ N,                                        (10d)

where, for (b, a) ∈ E and v ∈ V, g_v(Y) ≜ ∑_{i∈I} y_vi, and

g_ba(Y, R) ≜ ∑_{(i,p):(a,b)∈p} [ λ̄_(i,p) − (λ̄_(i,p) − r_(i,p)) ∏_{v=p_1}^{a} (1 − y_vi) ].    (11)

An important consequence of this change of variables is the following lemma.

Lemma 1.
For all (b, a) ∈ E, the functions g_ba : R^{|V||I|} × R^{|N|} → R are monotone DR-submodular.

Proof. Please see Appendix B.

We use this in Section IV to provide algorithms with optimality guarantees for UTILITYMAX. For this reason, we briefly review DR-submodular functions below.

C. DR-submodular Functions
Bian et al. [21] define a DR-submodular function as follows:
Definition 1.
Suppose X is a subset of R^n. A function f : X → R is DR-submodular if for all a ≤ b ∈ X, i ∈ [n], and k ∈ R_+ such that (ke_i + a) and (ke_i + b) are still in X, the following inequality holds:

f(ke_i + a) − f(a) ≥ f(ke_i + b) − f(b).

Intuitively, a DR-submodular function f is concave coordinate-wise along any non-negative or non-positive direction. DR-submodular functions arise in a variety of different settings (see Bian et al. [21]), and in some sense satisfy a weakened notion of concavity. They can also be defined in alternative ways that parallel the zero-th, first, and second order conditions for concavity (see [60]). For example, for X ⊆ R^n, a function f : X → R is DR-submodular iff for all x, y ∈ X,

f(x) + f(y) ≥ f(x ∨ y) + f(x ∧ y),

where ∨ and ∧ are the coordinate-wise maximum and minimum operations, respectively. A list of such conditions is summarized in Table I. Each one of these properties serves as a necessary and sufficient condition for a function to be DR-submodular [21].

TABLE I: Properties of DR-submodular functions

0'th order:  f(x) + f(y) ≥ f(x ∨ y) + f(x ∧ y), ∀x, y ∈ X, and f(·) is coordinate-wise concave
1'st order:  Definition 1
2'nd order:  ∂²f(x)/∂x_i∂x_j ≤ 0, ∀i, j ∈ [n]

For more information on DR-submodular functions, we refer the interested reader to [21], [52].

(Footnote: W.l.o.g., we implicitly set y_vi = 1 for all v ∈ S_i, i ∈ I, and do not include these constraints in (10).)

IV. CACHE AND RATE ALLOCATION
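Before turning to the algorithms, it may help to see the DR-submodularity they exploit (Lemma 1) checked numerically. The sketch below verifies the diminishing-returns inequality of Definition 1 on a grid, for a toy instance of g in Eq. (11): a single request of demand λ̄ = 2 traversing two caches, with variables (y1, y2, r). All values are illustrative; this is a sanity check, not a proof.

```python
from itertools import product

LAMBDA_BAR = 2.0  # illustrative demand rate of the single request

def g(x):
    # Eq. (11) specialized to one request over two caches:
    # g(y1, y2, r) = lambda_bar - (lambda_bar - r) * (1 - y1) * (1 - y2)
    y1, y2, r = x
    return LAMBDA_BAR - (LAMBDA_BAR - r) * (1.0 - y1) * (1.0 - y2)

def bump(x, i, k):
    z = list(x)
    z[i] += k
    return tuple(z)

# Definition 1: for a <= b, the gain from the same increment k*e_i is
# at least as large at a as at b (diminishing returns).
grid, k = [0.0, 0.25, 0.5], 0.25
for a in product(grid, repeat=3):
    for b in product(grid, repeat=3):
        if all(ai <= bi for ai, bi in zip(a, b)):
            for i in range(3):
                assert g(bump(a, i, k)) - g(a) >= g(bump(b, i, k)) - g(b) - 1e-12
print("Definition 1 holds for g on the grid")
```

Consistently with the second-order condition in Table I, all mixed second partials of this g are non-positive (e.g., ∂²g/∂y1∂y2 = −(λ̄ − r) ≤ 0 whenever r ≤ λ̄).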
The constraint set D in Problem (9) is not convex. Therefore, there is in general no efficient way to find the global optimum. Constrained optimization techniques can be used to find a Karush-Kuhn-Tucker (KKT) point (i.e., a point at which the KKT necessary conditions for optimality hold) under mild conditions. In general, there is no guarantee on the value of the objective at a KKT point compared to the global optimum. However, as one of our major contributions, we provide optimality guarantees for the objective value of (9) at a KKT point. In particular, we propose two algorithms that come with (different) optimality guarantees. Both algorithms exploit the fact that the functions g_ba(·) in (10a), which are the cause of non-convexity, are DR-submodular.

In our first approach, described in Section IV-A, we solve Problem (9) using a Lagrangian barrier algorithm [49]. We show that this converges to a KKT point under mild assumptions. Crucially, and in contrast to general non-convex problems [49], we provide guarantees on the objective value at such KKT points. In particular, we show that the ratio of the objective value at a KKT point to the global optimum approaches 1, asymptotically, under an appropriate proportional scaling of capacities and demand. In Section IV-B, we provide an alternative solution via a convex relaxation of the constraint set D. This turns our problem into a convex optimization problem for which efficient algorithms exist. We show that the solution obtained by solving the convex problem is feasible, and its objective value is bounded from below by the optimal value of another instance of Problem (9) with tighter constraints.

A. Lagrangian Barrier Algorithm for UTILITYMAX

Problem (9) is a maximization problem subject to the inequality constraints (10a) and (10b), and the simple box constraints on the variables, (10c) and (10d).
Due to this structure, we propose to use the Lagrangian Barrier with Simple Bounds (LBSB) algorithm, introduced by Conn et al. [49].
Algorithm description.
LBSB defines the Lagrangian barrier function Ψ(Y, R, µ, γ, s), given by:

Ψ(Y, R, µ, γ, s) = F(R) + ∑_{(b,a)∈E} µ_ba s_ba log( g_ba(Y, R) − ∑_{(i,p):(a,b)∈p} λ̄_(i,p) + C_ba + s_ba ) + ∑_{v∈V} γ_v s_v log( c_v − g_v(Y) + s_v ),    (12)

where the elements of the vectors µ ≜ [µ_ba]_{(b,a)∈E} ∈ R_+^{|E|} and γ ≜ [γ_v]_{v∈V} ∈ R_+^{|V|} are the positive Lagrange multiplier estimates corresponding to (10a) and (10b), respectively, and the vector s ∈ R_+^{|E|+|V|} consists of the positive values [s_ba]_{(b,a)∈E} and [s_v]_{v∈V}, called shifts [49]. Intuitively, the Lagrangian barrier function in (12) penalizes the infeasibility of the link and cache constraints, and the shifts allow the constraints to be violated to some extent. Consider the following problem:

max_{(Y,R)}  Ψ(Y, R, µ_k, γ_k, s_k)    (13)
s.t.         (Y, R) ∈ B,

where the values µ_k, γ_k, s_k are given, and B is the box constraint set defined by (10c) and (10d). Then the necessary optimality condition for Problem (13) is

‖P((Y, R), ∇_{Y,R} Ψ(Y, R, µ_k, γ_k, s_k))‖ = 0,    (14)

where P(a, b) ≜ a − Π_B(a + b), and Π_B(a) is the projection of the vector a on the set B. At the k-th iteration, LBSB updates Y_k, R_k by finding a point in B such that the following condition is satisfied:

‖P((Y, R), ∇_{Y,R} Ψ(Y, R, µ_k, γ_k, s_k))‖ ≤ ω_k,    (15)

where the parameter ω_k ≥ 0 indicates the accuracy of the solution; when ω_k = 0, the point (Y_k, R_k) satisfies the necessary optimality condition (14). In general, this point can be found by iterative algorithms such as interior-point methods or projected gradient ascent. Here, we use the trust-region algorithm [50] for simple box constraints, which we describe for completeness in Appendix C.

After updating (Y_k, R_k), LBSB checks whether the solution is in a "locally convergent regime" (with tolerance δ_k). If so, it updates the Lagrange multiplier estimates.
It also updates the accuracy parameter ω_{k+1}, the tolerance parameter δ_{k+1}, and the shifts s_{k+1}; these updates differ depending on whether the algorithm is in a locally convergent regime or not. These iterations continue until the algorithm converges; a high-level summary of LBSB is given in Alg. 1. We refer the interested reader to Conn et al. [49] or Appendix D for a detailed description of the algorithm. The details include the initial parameters, the updates of the Lagrange multiplier estimates, shifts, accuracy and tolerance parameters, as well as a formal definition of the locally convergent regime. Under relatively mild assumptions (see Lemma 3), the solution generated by LBSB converges to a KKT point and the Lagrange multiplier estimates converge to the Lagrange multipliers corresponding to that KKT point.

Algorithm 1 Summary of Lagrangian Barrier with Simple Bounds (LBSB)
  Set accuracy parameter ω_0
  Set tolerance parameter for locally convergent regime δ_0
  Set Lagrange multiplier estimates µ_0, γ_0, and other initial parameters
  k ← −1
  repeat
    k ← k + 1
    Compute shifts s_k
    Find (Y_k, R_k) ∈ B such that ‖P((Y_k, R_k), ∇_{Y,R} Ψ(Y_k, R_k, µ_k, γ_k, s_k))‖ ≤ ω_k
    if in locally convergent regime (with threshold δ_k) then
      Update Lagrange multiplier estimates µ_{k+1}, γ_{k+1}
      Update ω_{k+1} using ω_k
      Update δ_{k+1} using δ_k
    else
      Update ω_{k+1} using initial parameters
      Update δ_{k+1} using initial parameters
    end if
  until convergence

Guarantees.
For general non-convex problems, the KKT point to which LBSB converges comes with no optimality guarantees. Our main contribution is showing that, due to DR-submodularity, applying LBSB to Problem (9) yields a stronger result. We first need a few additional assumptions.
Definition 2.
A function U_n : R_+ → R has logarithmic diminishing return if there exists a finite number θ_n ∈ R_+ such that λ (dU_n(λ)/dλ) ≤ θ_n for all λ ∈ [0, ∞).

Assumption 1.
All utility functions U_n, n ∈ N, have logarithmic diminishing return.

Assumption 2.
At least one of the utility functions is unbounded from above.

We want to stress that Assumptions 1 and 2 are relatively mild. For example, consider the well-known α-fair utility functions [4]:

U_α(x) = ω · x^{1−α} / (1 − α)   if α > 0, α ≠ 1,
U_α(x) = ω · log(x)              if α = 1,

where ω ≥ 0. All α-fair utility functions with α ≥ 1 have logarithmic diminishing return. For α = 1, the utility function is unbounded from above. Therefore, for example, a problem instance with α-fair utility functions, where α ≥ 1 and at least one function has α = 1, satisfies Assumptions 1 and 2.

Definition 3.
Regular point: If the gradients of the active inequality constraints at (Y, R) are linearly independent, then (Y, R) is called a regular point.

Our main result is the following theorem, characterizing the quality of regular limit points of the sequence {(Y_k, R_k)} generated by Alg. 1. We note that the regularity of limit points is typically considered in the analysis of other methods in the constrained optimization literature as well [61]–[63].

Theorem 1.
Consider a problem instance with link capacity vector C ∈ R_+^{|E|} and demand rate vector λ̄ ∈ R_+^{|N|}. Suppose Assumptions 1 and 2 hold, and {(Y_k, R_k)}, k ∈ K, is a sub-sequence generated by Alg. 1 which converges to a regular point [Ŷ(C, λ̄), R̂(C, λ̄)]. Denote the optimal solution by [Y*(C, λ̄), R*(C, λ̄)]. Then, we have

lim_{m→∞} F(R̂(mC, mλ̄)) / F(R*(mC, mλ̄)) = 1.

Hence, the value of the objective at a regular limit point of Alg. 1 approaches the optimal objective value when link capacities and demand rates grow to infinity by the same factor m. Note that increasing the link capacities does not make the problem easier, since demand rates increase proportionally. The proof of Theorem 1 follows from a sequence of lemmas, which we now outline.

Lemma 3.
Let {(Y_k, R_k)}, k ∈ K, be any subsequence generated by Alg. 1 which converges to a regular point (Ŷ, R̂). Then (Ŷ, R̂) is a KKT point for Problem (9).

Proof. Please see Appendix F. The lemma is proved by showing that the regularity assumption is equivalent to the assumption stated in Theorem 4.4 of Conn et al. [49].

Our major contribution is to characterize the difference between the value of the objective at a KKT point and the global optimal value, as shown in Lemma 4. The key factor in proving Lemma 4 is the concavity of F(·) and the fact that the g_ba(·) are monotone DR-submodular functions for all (b, a) ∈ E.

Lemma 4.
Let (Ŷ, R̂) be a KKT point and (Y*, R*) be the optimal point for Problem (9). Then

F(R̂) ≥ F(R*) − ∑_{(b,a)∈E} μ̂_ba (∑_{(i,p):(a,b)∈p} λ̄(i,p) − C_ba).  (16)

Lemma 4 is proved in Appendix G, and implies that the value of the objective at the KKT point is bounded away from the optimal value by the additive term ∑_{(b,a)∈E} μ̂_ba (∑_{(i,p):(a,b)∈p} λ̄(i,p) − C_ba).

Lemma 5.
Under Assumption 1,

∑_{(b,a)∈E} μ̂_ba (∑_{(i,p):(a,b)∈p} λ̄(i,p) − C_ba) ≤ θ ∑_{(b,a)∈E} (n_ab / C_ba) (∑_{(i,p):(a,b)∈p} λ̄(i,p) − C_ba),

where n_ab is the number of paths passing through (a, b), and θ ≜ max_{n∈N} θ_n is the maximum logarithmic diminishing return parameter among the utilities.

Proof. Please see Appendix H.
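To make the bound concrete, the additive gap of Lemmas 4 and 5 can be computed directly; the sketch below (with hypothetical values for θ, n_ab, C_ba, and per-link demand) also illustrates that scaling all capacities and demands by a common factor m leaves the gap unchanged, which is the mechanism behind the proof of Theorem 1:

```python
def gap_bound(theta, n, C, lam):
    """Additive optimality gap from Lemmas 4-5:
    theta * sum_e (n_e / C_e) * (lam_e - C_e),
    where lam_e is the total demand routed through link e
    and n_e is the number of paths crossing it."""
    return theta * sum(n_e / C_e * (l_e - C_e)
                       for n_e, C_e, l_e in zip(n, C, lam))

# Hypothetical 2-link instance: gap = 3/1 * 0.5 + 2/2 * 0.5 = 2.0.
g1 = gap_bound(theta=1.0, n=[3, 2], C=[1.0, 2.0], lam=[1.5, 2.5])

# Scaling C and lam by m cancels inside (n_e / C_e) * (lam_e - C_e),
# while the optimal objective F(R*) keeps growing with m.
m = 10.0
g2 = gap_bound(theta=1.0, n=[3, 2], C=[m * 1.0, m * 2.0], lam=[m * 1.5, m * 2.5])
```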
Proof of Theorem 1.
By Lemma 3, we know that [Ŷ(mC, mλ̄), R̂(mC, mλ̄)] is a KKT point for all m ∈ R₊. Thus, Lemma 4 and Lemma 5 imply that, for all m ∈ R₊, F(R̂(mC, mλ̄)) is bounded from below, i.e.,

F(R̂(mC, mλ̄)) ≥ F(R*(mC, mλ̄)) − θ ∑_{(b,a)∈E} (n_ab / C_ba) (∑_{(i,p):(a,b)∈p} λ̄(i,p) − C_ba).  (17)

According to (8), F(R*(mC, mλ̄)) = U(λ*(mC, mλ̄)). The rate vector mλ*(C, λ̄) is feasible in Problem (9) with link capacity vector mC and demand rate vector mλ̄, and we have U(λ*(mC, mλ̄)) ≥ U(mλ*(C, λ̄)). Combining this with the fact that there exists a utility function U_n(·) which grows without bound as the input rate goes to infinity (Assumption 2), we have lim_{m→∞} U(λ*(mC, mλ̄)) = ∞, or, equivalently, lim_{m→∞} F(R*(mC, mλ̄)) = ∞. This implies that there exists an m₀ > 0 such that F(R*(mC, mλ̄)) > 0 for all m ≥ m₀. We conclude the proof by dividing both sides of (17) by F(R*(mC, mλ̄)) for m ≥ m₀, and letting m → ∞.

Convergence Rate.
Conn et al. [49] have also studied the convergence rate of Alg. 1 under additional assumptions. We can use this result to characterize the convergence rate of the algorithm as applied to UTILITYMAX.

Assumption 3.
The function F(R), its gradient, and the elements of its Hessian are Lipschitz continuous.

Proposition 1.
Suppose Assumption 3 holds, and the iterates {(Y_k, R_k)} generated by Alg. 1 have a single limit point (Ŷ, R̂) which is regular and satisfies the second-order sufficiency condition (discussed in Appendix E). Then, with a proper choice of parameters, {(Y_k, R_k)} converges to (Ŷ, R̂) at least R-linearly for sufficiently large k, i.e., there exist r ∈ (0, 1), P > 0, and k₀ such that ‖(Y_k, R_k) − (Ŷ, R̂)‖ ≤ P r^k, for all k ≥ k₀.

We prove this in Appendix I, by showing that the assumptions in Proposition 1 imply all the assumptions for part (ii) of Theorem 5.3 and Corollary 5.7 of Conn et al. [49]. We can combine the results of Thm. 1 and Proposition 1 to characterize the quality of solutions obtained at the k-th iteration of Alg. 1.

Corollary 1.
If the assumptions of Thm. 1 and Prop. 1 hold, there exist r ∈ (0, 1), Q > 0, β > 0, and k₀ such that

F(R^(k)) / F(R*) ≥ 1 − (θ / F(R*)) ∑_{(b,a)∈E} (n_ab / C_ba) (∑_{(i,p):(a,b)∈p} λ̄(i,p) − C_ba) − Q r^k / F(R*),

for all k ≥ k₀.

Proof. By Assumption 3, F(·) is Lipschitz continuous. Hence, there exists an L ∈ R₊ s.t.

|F(R_k) − F(R̂)| ≤ L ‖(Y_k, R_k) − (Ŷ, R̂)‖ ≤ L P r^k,

where the second inequality is due to Prop. 1 and L is the Lipschitz constant. Letting Q ≜ LP, we then have the following for all k ≥ k₀:

F(R_k) ≥ F(R̂) − Q r^k ≥ F(R*) − θ ∑_{(b,a)∈E} (n_ab / C_ba) (∑_{(i,p):(a,b)∈p} λ̄(i,p) − C_ba) − Q r^k,

where the second inequality follows from Lemmas 4 and 5. Dividing both sides by F(R*) concludes the proof.

Corollary 1 states that if the assumptions in Thm. 1 and Proposition 1 hold and k is large enough, the ratio of the objective value for (Y_k, R_k) to the optimum is bounded away from 1 by two additive terms; the latter, Q r^k / F(R*), can be made arbitrarily small (by increasing k), while the former also goes to zero when link capacities and demand rates are increased by the same factor (see Thm. 1).

B. Convex Relaxation of UTILITYMAX

An alternative approach for solving Problem (9) is to construct a convex relaxation of the constraint set D. This turns our problem into a convex optimization problem, which can be solved efficiently. Similar to prior literature [9], [64], [65], we construct concave upper and lower bounds for the non-convex and non-concave functions in constraints (10a), using the so-called Goemans–Williamson inequality:

Lemma 6 (Goemans and Williamson [66]). For Z ∈ [0, 1]^n, define A(Z) ≜ 1 − ∏_{i=1}^n (1 − z_i) and B(Z) ≜ min{1, ∑_{i=1}^n z_i}. Then, (1 − 1/e) B(Z) ≤ A(Z) ≤ B(Z).

Applying Lemma 6 to the functions g_ba(·) yields the following corollary.

Corollary 2.
Functions g_ba(Y, R), (b, a) ∈ E, satisfy

(1 − 1/e) g̃_ba(Y, R) ≤ g_ba(Y, R) ≤ g̃_ba(Y, R),

where

g̃_ba(Y, R) ≜ ∑_{(i,p)∈N:(a,b)∈p} λ̄(i,p) min{1, r(i,p)/λ̄(i,p) + ∑_{k=1}^{a} y_{p_k i}}  (18)

are concave functions, for all (b, a) ∈ E.

The proof of Corollary 2 is presented in Appendix J. We use this to formulate the following convex problem:

CONVEX UTILITY MAX

maximize F(R)  (19a)
subject to (Y, R) ∈ D̃,  (19b)

where D̃ ⊆ R^{|V|×|I|} × R^{|N|} is the set of (Y, R) satisfying:

g̃_ba(Y, R) ≥ (∑_{(i,p):(a,b)∈p} λ̄(i,p) − C_ba) / (1 − 1/e),  ∀(b, a) ∈ E
g_v(Y) ≤ c_v,  ∀v ∈ V
0 ≤ y_vi ≤ 1,  ∀v ∈ V, ∀i ∈ I
0 ≤ r_n ≤ λ̄_n,  ∀n ∈ N.

Although the g̃_ba(·), for all (b, a) ∈ E, are non-differentiable, the optimal solution can be found using sub-gradient methods, as D̃ is convex. The following theorem provides a bound on the optimal value of Problem (19) with respect to Problem (9).

Theorem 2.
Let (Y**_C, R**_C) be the optimal solution of Problem (19) with link capacity vector C. Also, let (Y*_C, R*_C) and (Y*_{C′}, R*_{C′}) be the optimal solutions of two instances of Problem (9) with link capacity vectors C and C′, respectively, where for all (b, a) ∈ E

C′_ba = C_ba − (∑_{(i,p):(a,b)∈p} λ̄(i,p) − C_ba) / (e − 1).  (21)

Then, (Y**_C, R**_C) is a feasible solution to Problem (9) with link capacity vector C, and F(R*_{C′}) ≤ F(R**_C) ≤ F(R*_C).

Proof. Let D₀ ⊆ R^{|V||I|} × R^{|N|} be the set of (Y, R) satisfying:

g_ba(Y, R) ≥ (∑_{(i,p):(a,b)∈p} λ̄(i,p) − C_ba) / (1 − 1/e),  ∀(b, a) ∈ E
g_v(Y) ≤ c_v,  ∀v ∈ V
0 ≤ y_vi ≤ 1,  ∀v ∈ V, ∀i ∈ I
0 ≤ r_n ≤ λ̄_n,  ∀n ∈ N.

Observe that D₀ is the constraint set of Problem (9) with the link capacity vector C′. By Corollary 2, we have D₀ ⊆ D̃ ⊆ D. By definition, F(R*_C), F(R**_C), and F(R*_{C′}) are the maximum values of F(R) subject to D, D̃, and D₀, respectively. As a result, we have F(R*_{C′}) ≤ F(R**_C) ≤ F(R*_C).

Thm. 2 implies that instead of Problem (9), we can solve Problem (19), which is a convex program with a tighter constraint set. Since Problem (19) has more restrictive constraints, its solution is naturally a feasible solution to (9), and its value is upper bounded by the optimum. On the other hand, Thm. 2 states that this solution is no worse than the optimum of an instance of the original problem with link capacities C′_ba = C_ba − (∑_{(i,p):(a,b)∈p} λ̄(i,p) − C_ba)/(e − 1). Note that C′_ba can be negative. In that case, Problem (9) with negative link capacities has no feasible solutions, and the solution of Problem (19) has no lower bound. The following corollary is stated in terms of the rates of requests λ for further clarification.

Corollary 3.
If ∑_{(i,p):(a,b)∈p} λ̄(i,p) ≤ Δ C_ba for all (b, a) ∈ E and some Δ ∈ [1, e], then

U( ((e − Δ)/(e − 1)) λ* ) ≤ U(λ**) ≤ U(λ*),

where λ* = λ̄ − R* are the optimal rates for Problem (9), and λ** = λ̄ − R** are the optimal rates for Problem (19).

TABLE II: Graph Topologies and Experiment Parameters.
Graph           |V|   |E|   |I|   |N|   |Q|   c′_v   F̂ (loose)   F̂ (tight)   # variables
cycle            30    60    10   100    10     2       9.53        9.53         344
lollipop         30   240    10   100    10     2       9.53        9.53         274
geant            22    66    10   100    10     2       9.53        9.53         228
abilene           —     —     —     —     —     —          —           —           —
dtelekom         68   546    15   125    15     3      11.91       11.91         301
balanced-tree    63   124    30   450    15     3      42.88       28.45        1434
grid-2d          64   224    30   450    15     3      42.88       37.08        1665
hypercube        64   384    15   450    15     3      42.88       35.59        1189
small-world      64   308    30   450    15     3      42.88       37.90        1349
erdos-renyi      64   378    30   450    15     3      42.88       35.06        1191
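As a numeric sanity check of the capacity mapping in (21) and the factor (e − Δ)/(e − 1) of Corollary 3 (the capacity and Δ values below are hypothetical):

```python
import math

def relaxed_capacity(C, load):
    """C' from eq. (21): C' = C - (load - C)/(e - 1), where `load` is the
    total demand routed through the link."""
    return C - (load - C) / (math.e - 1)

# With load = Delta * C and Delta in [1, e], C' = (e - Delta)/(e - 1) * C,
# the shrink factor appearing in Corollary 3:
C, Delta = 10.0, 1.5
Cp = relaxed_capacity(C, Delta * C)
factor = (math.e - Delta) / (math.e - 1)
# At Delta = e the guarantee vanishes (C' = 0); for Delta > e, C' < 0 and
# the lower-bounding instance of Problem (9) becomes infeasible.
```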
Proof.
By (8) and Thm. 2 we have that U(λ*_{C′}) ≤ U(λ**_C) ≤ U(λ*_C), where λ*_C = λ̄ − R*_C and λ*_{C′} = λ̄ − R*_{C′} are the optimal rates of two instances of Problem (9) with link capacity vectors C and C′, respectively, and λ**_C = λ̄ − R**_C is the optimal rate vector for Problem (19) with link capacity vector C. Since ∑_{(i,p):(a,b)∈p} λ̄(i,p) ≤ Δ C_ba for all (b, a) ∈ E, by (21) we have ((e − Δ)/(e − 1)) C ≤ C′. Therefore, the rate vector ((e − Δ)/(e − 1)) λ*_C is feasible in Problem (9) with link capacity vector C′. Hence, U(((e − Δ)/(e − 1)) λ*_C) ≤ U(λ*_{C′}), and we can write

U(((e − Δ)/(e − 1)) λ*_C) ≤ U(λ*_{C′}) ≤ U(λ**_C) ≤ U(λ*_C),

which concludes the proof.

Compared to LBSB, the convex relaxation is a simpler approach, as it only requires solving a convex program. On the other hand, LBSB provides better optimality guarantees, especially when the demand rate of requests exceeds the capacity of the links. In practice, as we see in the numerical evaluations (Section V), the Lagrangian barrier method outperforms the convex relaxation method for a wide range of network topologies and parameter settings.

V. NUMERICAL EVALUATION
In this section, we evaluate the performance of the proposed methods, i.e., LBSB, introduced in Section IV-A, and the convex relaxation, introduced in Section IV-B. We also implement two greedy algorithms and compare the performance of the proposed methods against them. As discussed below, we observe that our methods demonstrate impressive performance; for example, in Fig. 2 we observe that, out of 20 scenarios, the Lagrangian barrier and the convex relaxation methods attain higher objective values than the greedy algorithms in 19 and 15 scenarios, respectively.
Topologies.
The networks we consider are summarized in Table II. Graph cycle is a simple cyclic graph, and lollipop is a clique (i.e., a complete graph) connected to a path graph of equal size. The next 3 graphs represent the Deutsche Telekom, Abilene, and GEANT backbone networks [67]. Graph grid-2d is a two-dimensional square grid, balanced-tree is a complete binary tree of depth 5, and hypercube is a 6-dimensional hypercube. Finally, the last two graphs are random: small-world is the graph by Kleinberg [68], which comprises a grid with additional long-range links, and erdos-renyi is an Erdős–Rényi graph with parameter p = 0. .

Fig. 2: Objective performance. The figure shows the normalized objective obtained by different algorithms across different topologies in two settings, i.e., the loose setting (Fig. 2a) and the tight setting (Fig. 2b). We see that LBSB outperforms the other algorithms. Also, CR's performance almost matches LBSB in the loose setting (Fig. 2a); however, its performance significantly deteriorates in the tight setting (Fig. 2b). Note that the solutions of all 4 algorithms are feasible in all cases.

Fig. 3: Execution Times and Convergence. Figures 3a and 3b show the execution times of the algorithms w.r.t. the number of variables, for the loose and tight settings, respectively. We observe that in the loose setting the execution times for LBSB are comparable with the greedy algorithms; however, in the tight setting LBSB is much slower, as the trust-region algorithm that we use at each iteration of LBSB requires more iterations to satisfy (15). Fig. 3c shows the objective and feasibility trajectories over the iterations of LBSB for grid-2d; as iterations progress, feasibility improves and the objective value accordingly decreases.

Experiment Setup.
We evaluate our algorithms on the ten topologies of Table II. Given a graph G(V, E), we generate a catalog I, and assign a cache to each node in the graph. For every item i ∈ I, we designate a source node selected uniformly at random (u.a.r.) from V. We set the capacity c_v of every node v so that c′_v = c_v − |{i : v ∈ S_i}| is constant among all nodes in V. We then generate a set of requests N as follows. First, we select a set Q of nodes in V u.a.r., which we refer to as query nodes: these are the only nodes that generate requests. More specifically, for each query node v ∈ Q, we generate ≈ |N|/|Q| requests, sampled according to a Zipf distribution and without replacement from the catalog I. Each request is then routed over the shortest path between the query node and the designated source for the requested item. We assign a demand rate λ̄(i,p) = 1 to every request n ∈ N. The values of |I|, |N|, |Q|, and c_v for each topology are given in Table II. Our process also makes sure that each item i ∈ I is requested at least once.

We determine the link capacities C_ba, (b, a) ∈ E, as follows. First, note that the maximum possible load on each link (b, a) ∈ E is λ^{(max)}_ba ≜ ∑_{(i,p):(a,b)∈p} λ̄(i,p). We set the link capacities to C_ba = κ λ^{(max)}_ba, where κ ∈ (0, 1] is a looseness coefficient: the higher κ is, the easier it becomes to satisfy the demand. Note that for every link (b, a), if C_ba ≥ λ^{(max)}_ba (or, equivalently, κ ≥ 1), then the link constraint corresponding to (b, a) in (10a) is trivially satisfied. In contrast, as κ decreases below 1, the link constraints are tightened, and finding optimal rate and cache allocation strategies becomes non-trivial. For each topology in Table II, we study two settings, i.e., (1) a loose setting, where κ = 0. , and (2) a tight setting, where κ = 0. .

Fig. 4: Effects of tightening the constraints. The figure shows the objective values w.r.t. the looseness coefficient κ, for the three topologies abilene, geant, and cycle. We see that as the constraints are tightened (i.e., as κ decreases), the gap between the objective values obtained by LBSB and the other algorithms increases significantly, with LBSB delivering superb performance. Moreover, for tighter settings CR's performance becomes poor; e.g., for κ = 0. , Greedy1 and Greedy2 achieve higher objectives, in all three cases. Note that the solutions of all 4 algorithms are feasible in all cases.

Algorithms.
We implement Alg. 1 and refer to it as
LBSB. We run this algorithm until the convergence criterion in [49] is met (with δ* = 10^− , ω* = 10^− ). We also solve the convex relaxation in (19) via a sub-gradient method, as described in Section 7.5 of Bertsekas [63]; we refer to this algorithm as CR, for convex relaxation. We run CR for a fixed number of iterations (500 iterations). In addition, we implement two greedy algorithms, i.e., Greedy1 and
Greedy2. We describe each of these algorithms below; we stress that, as we are the first to consider problem UTILITYMAX, there are no prior-art algorithms to compare with. (Our code is publicly available at https://github.com/neu-spiral/UtilityMaximizationProbCaching.)

• Greedy1 consists of three steps. In Step 1, we initialize Y = 0 and update R by solving (9) only w.r.t. R. This is a convex optimization problem. In Step 2,
Greedy1 keeps R fixed, as computed in Step 1, and updates Y by maximizing the sum ∑_{(b,a)∈E} g_ba(Y, R), subject to the constraints (10b) and (10c) and only w.r.t. Y. This is equivalent to minimizing the total long-term time-average item load over the links of the graph (c.f. Section III-A). Note that Step 2 is a monotone DR-submodular maximization subject to a polytope, which we solve via the Frank–Wolfe algorithm proposed by Bian et al. [21]. Finally, in Step 3,
Greedy1 updates R by solving (9) w.r.t. R one more time, while Y is fixed to the value computed in Step 2.

• Greedy2 is an alternating optimization algorithm. We initialize Y = 0, and then alternately update Y and R, keeping the other variable fixed at each step; we refer to the former step as the cache allocation step and to the latter as the rate allocation step. In the cache allocation step, Greedy2 "greedily" places one item in a cache: it changes one zero variable (y_vi = 0) to 1, where (v, i) is the feasible pair with the largest marginal gain in load reduction. Formally, (a) node v has not fully used its cache capacity (g_v(Y) < c′_v), and (b) changing y_vi from 0 to 1 yields the highest increase in the total sum ∑_{(b,a)∈E} g_ba(Y, R) (or, equivalently, the highest decrease in the aggregate item load). In the rate allocation step, Greedy2 keeps Y constant from the previous step, and updates R by solving (9) w.r.t. R. Finally, Greedy2 terminates once the node storage capacities are depleted, i.e., there is no pair (v, i) left s.t. y_vi = 0 and g_v(Y) < c′_v.

Metrics.
Throughout the experiments, we report the objective function F(R) obtained by the different algorithms for the choice of logarithmic utility functions U_n(λ) = log(λ + 0. ), for all n ∈ N. Note that these utility functions satisfy Assumption 1 and Assumption 2. To assess feasibility during the progression of an algorithm, we also report the ratio of satisfied constraints, i.e., the fraction of constraints in (10a) and (10b) that are satisfied.
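A minimal sketch of these two metrics (the additive constant inside the logarithm is truncated in the source text, so `eps = 0.1` below is an assumed placeholder; the rates, loads, and capacities are hypothetical):

```python
import math

def objective(rates, eps=0.1):
    """F(R) = sum_n U_n(lambda_n) with U_n(l) = log(l + eps).
    eps is an assumed placeholder; the exact constant is truncated in the source."""
    return sum(math.log(l + eps) for l in rates)

def satisfied_ratio(loads, caps):
    """Fraction of capacity constraints load_e <= cap_e that hold."""
    return sum(1 for l, c in zip(loads, caps) if l <= c) / len(caps)

obj = objective([1.0, 0.5, 0.0])                  # log(1.1) + log(0.6) + log(0.1)
ratio = satisfied_ratio([0.9, 1.2], [1.0, 1.0])   # only 1 of 2 constraints hold
```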
Objective Performance.
Fig. 2 shows the objectives attained by different algorithms, normalized by the objective under LBSB; the latter is given in the F̂ (loose) and F̂ (tight) columns of Table II, for the loose (κ = 0. ) and tight (κ = 0. ) settings, respectively. We observe that LBSB outperforms all its competitors across all topologies in both settings, except for one case (balanced-tree with κ = 0. ). In Fig. 2a, we see that in the loose setting, CR's performance almost matches LBSB and achieves better objective values in comparison with Greedy1 and Greedy2. However, in Fig. 2b we see that as the constraint set is tightened, the performance of CR deteriorates. Moreover, by comparing Fig. 2a and Fig. 2b we see that for some topologies (e.g., grid-2d, hypercube, small-world, and erdos-renyi), the gap between the objective values obtained by LBSB and the other algorithms is significantly higher in the tight-constraints regime.
Execution Time.
In Fig. 3, we plot the execution times of all algorithms for each scenario as a function of the number of variables in the corresponding instance of problem UTILITYMAX (reported in the last column of Table II). Figures 3a and 3b correspond to Figures 2a and 2b (the loose and tight settings), respectively. In particular, in the loose setting (Fig. 3a) we see that the execution times for LBSB almost match the execution times for Greedy2, and LBSB is much faster than CR. In the tight setting (Fig. 3b), however, we see that the execution time of LBSB is higher, particularly when the number of variables is large (corresponding to the larger topologies, i.e., grid-2d, balanced-tree, hypercube, small-world, and erdos-renyi). The main reason is that, in the tight setting, the trust-region algorithm used as a subroutine at each iteration requires a higher number of iterations to satisfy (15); as a result, the execution time of LBSB increases. Nonetheless, as we observed in Fig. 2b, LBSB achieves significantly better objective performance compared to the other algorithms.
Convergence.
To obtain further insight into the convergence of LBSB, we plot in Fig. 3c the objective and feasibility (i.e., the ratio of satisfied constraints (10a) and (10b)) trajectories as a function of time. Each marker in these plots corresponds to an outer iteration of LBSB. We observe that, initially, the solutions are infeasible and the objective value is high; as the algorithm converges, the feasibility improves and consequently the objective decreases. We stress that although the constraint satisfaction ratio in Fig. 3c does not reach 1, the highest constraint violation at the last displayed iteration is of the order 10^− (and, hence, our convergence criterion was met). For brevity, we show only the trajectory for grid-2d and κ = 0. , for which LBSB is slower than the other algorithms.
Effect of Tightening Constraints.
Motivated by our observation regarding the superior performance of LBSB in the tighter setting, we study the effects of further decreasing the looseness coefficient κ. In Fig. 4, we plot the objective values achieved by different algorithms for a range of looseness coefficients κ ∈ (0, 1]. For brevity, we report these results only for abilene, geant, and cycle. From Fig. 4, we observe that for all three topologies, when κ = 1, all algorithms achieve the optimal objective value; this is expected because, as explained, when C_ba ≥ λ^{(max)}_ba, the non-convex constraints (10a) are trivially satisfied. As we tighten the constraints by decreasing κ, we observe that all algorithms obtain smaller objectives, which is also expected. Crucially, LBSB significantly outperforms the other algorithms and remains quite resilient to the tightening of the constraints: for example, for all three topologies, LBSB still obtains the maximum objective value (i.e., the same value as with κ = 1) even for considerably smaller κ. In fact, for cycle, the performance of LBSB remains practically invariant, while all other algorithms deteriorate. Moreover, we also see in Fig. 4 that for moderate tightness of the constraints, CR shows decent performance and outperforms Greedy1 and Greedy2; however, tightening the constraints further grossly affects the performance of CR: the objective values decrease significantly, falling below those of the greedy algorithms. In fact, this is expected from Thm. 2. To see this, note that when κ < 1/e ≈ 0.37, for the capacities in (21) we have C′_ba < 0, for all (a, b) ∈ E. As a result, based on Thm. 2, for κ < 1/e, the lower bound on the optimal objective of CR is non-existent, as the constraint set D₀ (see (22)) is an empty set. In other words, in this regime, CR comes with no guaranteed lower bound.

VI. CONCLUSION
We studied a new class of non-convex optimization problems for joint content placement and rate allocation in cache networks, and proposed solutions with optimality guarantees. Our solutions establish a foundation for several possible future investigations. First, in the spirit of Kelly et al. [1], studying distributed algorithms that converge to a KKT point and provide guarantees similar to Thm. 1 is an important open question. Ideally, such algorithms would amount to protocols that adjust both rates and caching decisions in a way that leads to optimality guarantees. Second, in both the centralized/offline setting we study here, as well as in a distributed/adaptive setting, designing new rounding techniques for deterministic content placement is another open question. Finally, both providing lower bounds for UTILITYMAX, and devising algorithms with better/tighter optimality guarantees, are additional open questions in the context of our problem.

ACKNOWLEDGMENT
The authors gratefully acknowledge support from National Science Foundation grants NeTS-1718355 and CCF-1750539, and a research grant from American Tower Corp.

APPENDIX A
PROBABILISTIC CONTENT PLACEMENT ALGORITHM
Here we describe a distributed and randomized content placement algorithm [9], [59]. Each node v ∈ V has access to its cache allocation strategy [ŷ_vi]_{i∈I}, obtained by solving Problem (9). Our goal, at the beginning of the t-th time period, is to use [ŷ_vi]_{i∈I} as marginal caching probabilities at node v, and provide a probabilistic content placement (i.e., a mapping of content items to the cache) X̂(t) ≜ [x̂_vi(t)]_{v∈V,i∈I}. The probabilistic content placement must ensure ŷ_vi = Pr{x̂_vi(t) = 1}, and that the cache capacity constraint is satisfied exactly, i.e.,

∑_i x̂_vi(t) ≤ c_v,  ∀t > 0.

If we naively pick the x̂_vi independently using a Bernoulli distribution with marginal probabilities [ŷ_vi]_{i∈I}, the capacity constraint is only satisfied in expectation, and at each time the cache may store fewer or more items than its capacity. To construct a desirable placement, consider a rectangular box of area c_v × 1. For each i ∈ I, place a rectangle of length ŷ_vi and height 1 inside the box, starting from the top left corner. If a rectangle does not fit in a row, cut it, and place the remainder in the row immediately below, starting again from the left. As ∑_{i∈I} ŷ_vi ≤ c_v, this space-filling placement is contained in the c_v × 1 box.

Fig. 5: A content placement satisfying the cache capacity constraints (3), using [y_vi]_{i∈I} as the marginal distribution. Here, c_v = 3 and I = {1, 2, 3, 4}. At the beginning of the first time period, a random number τ(1) ∈ [0, 1] is chosen. The corresponding vertical line intersects the rectangles of item 1, item 2, and item 4. Thus, the set {1, 2, 4} is stored in the cache for the duration of this time period. At the beginning of the second time period, another random number τ(2) ∈ [0, 1] is chosen, and the corresponding set of items is stored in the cache for the duration of that time period.
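The space-filling construction above, together with the uniform sampling step described next, can be sketched as follows (a minimal sketch; the marginals and capacity are hypothetical, and the c_v × 1 box is linearized into a single strip of length c_v):

```python
import random

def place_items(y, c):
    """Probabilistic placement: caches item i with probability y[i], while
    never storing more than c items. Requires sum(y.values()) <= c and
    0 <= y[i] <= 1 for all i."""
    # Lay rectangles of length y[i] end-to-end on a strip of length c
    # (equivalent to filling the box row by row, cutting at row ends).
    starts, pos = {}, 0.0
    for i, yi in y.items():
        starts[i] = pos
        pos += yi
    tau = random.random()  # uniform in [0, 1)
    # A vertical line at tau crosses the strip at tau, tau + 1, ..., tau + c - 1;
    # item i is cached iff one of these points falls inside its rectangle.
    cuts = [tau + k for k in range(c)]
    return {i for i in y if any(starts[i] <= t < starts[i] + y[i] for t in cuts)}
```

Since each rectangle has length y[i] ≤ 1, it contains at most one of the c cut points, so at most c distinct items are stored, and item i is selected with probability exactly y[i].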
In order to randomly choose a set of items, at the beginning of the t-th time period we pick uniformly at random a number τ(t) ∈ [0, 1] and draw a vertical line located at that number; this line intersects the box in no more than c_v distinct items. The items are distinct because y_vi ∈ [0, 1]. Moreover, the probability of appearance of item i in a memory of size c_v is exactly equal to ŷ_vi. A graphical explanation of this algorithm is presented in Fig. 5.

APPENDIX B
PROOF OF LEMMA

We have ∂²g_ba(Y, R)/(∂y_vi ∂y_v′i′) ≤ 0, ∂²g_ba(Y, R)/(∂y_vi ∂r_n) ≤ 0, and ∂²g_ba(Y, R)/(∂r_n ∂r_n′) ≤ 0, for all v, v′ ∈ V, i, i′ ∈ I, and n, n′ ∈ N. This proves the DR-submodularity of g_ba(·), for all (b, a) ∈ E (see Section III-C). In addition, since ∂g_ba(Y, R)/∂y_vi ≥ 0 and ∂g_ba(Y, R)/∂r_n ≥ 0, for all v ∈ V, i ∈ I, n ∈ N, the g_ba(·) are monotone, for all (b, a) ∈ E.

APPENDIX C
A TRUST-REGION ALGORITHM FOR SIMPLE BOX BOUNDS
We briefly describe a trust-region algorithm proposed by Conn et al. [50]. This algorithm is used to find a point satisfying the necessary optimality condition for the following problem:

max_x Ψ(x)  (23)
s.t. x ∈ B,

where B is a region consisting of simple box constraints. The necessary optimality condition for Problem (23) can be written as

‖P(x, ∇_x Ψ(x))‖ = 0,

where P(a, b) ≜ a − Π_B(a + b), and Π_B(a) is the projection of the vector a onto the box region B. We use this algorithm to find a point satisfying the condition in line 8 of Alg. 1.

The trust-region algorithm starts from an initial point inside the box constraint B. At the k-th iteration, it performs projected gradient ascent; s(t) is the projected gradient ascent direction, parameterized by the step size t. This step size t is chosen such that (a) it is within a trust region defined by ∆_k, and (b) it is the smallest local maximum of Q_Ψ^(k), i.e., the second-order approximation of the objective Ψ(·) around the current point x_k. Next, the algorithm assesses the improvement of the objective along the direction s_k. If the improvement is above some threshold µ, it accepts the direction s_k, updates the solution, and enlarges the trust region. Otherwise, the algorithm rejects the direction, does not update the solution, and shrinks the trust region. These steps are outlined in Alg. 2.

APPENDIX D
LAGRANGIAN BARRIER WITH SIMPLE BOUNDS
We describe the details of the Lagrangian Barrier with Simple Bounds (in short, LBSB) algorithm by Conn et al. [49]. For simplicity, and to avoid repetition, let us define an index set for the inequality constraints: J ≜ {(b, a) ∈ E} ∪ {v ∈ V}. Thus, we can denote all inequality constraints (10a), (10b) by

c_j(Y, R) ≜ { g_ba(Y, R) − ∑_{(i,p):(a,b)∈p} λ̄(i,p) + C_ba,  j = (b, a) ∈ E,
              c_v − g_v(Y),                                  j = v ∈ V,

for all j ∈ J. In addition, we denote the box constraint set defined by (10c), (10d) by B. Problem (9) is then written as

max F(R)
s.t. c_j(Y, R) ≥ 0, ∀j ∈ J,
     (Y, R) ∈ B.  (24)

Algorithm 2 Trust-Region Algorithm for Simple Box Constraints
  Set parameters µ ∈ (0, 1), η ∈ (µ, 1), 0 < γ₀ ≤ γ₁ ≤ 1 ≤ γ₂
  Set the initial point x₀ ∈ B; k ← 0
  while not converged do
    s(t) ≜ Π_B(x_k + t∇Ψ(x_k)) − x_k
    t_k ← smallest local maximum of Q_Ψ^(k)(s(t) + x_k) subject to ‖s(t)‖ ≤ ∆_k
    s_k ← s(t_k)
    ρ_k ← [Ψ(x_k + s_k) − Ψ(x_k)] / [Q_Ψ^(k)(x_k + s_k) − Ψ(x_k)]
    if ρ_k ≥ µ then
      x_{k+1} ← x_k + s_k
      if ρ_k ≥ η then
        ∆_{k+1} ← ∆ ∈ [∆_k, γ₂∆_k]
      else
        ∆_{k+1} ← ∆ ∈ [γ₁∆_k, ∆_k]
      end if
    else
      x_{k+1} ← x_k
      ∆_{k+1} ← ∆ ∈ [γ₀∆_k, γ₁∆_k]
    end if
    k ← k + 1
  end while

LBSB defines the Lagrangian barrier function

Ψ(Y, R, Σ, s) ≜ F(R) + ∑_{j∈J} σ_j s_j log(c_j(Y, R) + s_j),

where the elements of the vector Σ ≜ [σ_j]_{j∈J} ∈ R₊^{|J|} are the positive Lagrange multiplier estimates associated with the inequality constraints c_j(·) ≥ 0, ∀j ∈ J, in (24).
The elements of the vector s ≜ [s_j]_{j∈J} ∈ R₊^{|J|} are positive shifts. Writing the gradient of the Lagrangian barrier function, we have:

∇_{Y,R} Ψ(Y, R, Σ, s) = ∇_R F(R) + ∑_{j∈J} [σ_j s_j / (c_j(Y, R) + s_j)] ∇_{Y,R} c_j(Y, R).  (25)

The values multiplying the gradients of the constraints in (25) are called first-order Lagrange multiplier approximations:

σ̄_j(Y, R, Σ, s) ≜ σ_j s_j / (c_j(Y, R) + s_j).

The vector of first-order Lagrange multiplier approximations is denoted by Σ̄(Y, R, Σ, s) ≜ [σ̄_j(Y, R, Σ, s)]_{j∈J} ∈ R^{|J|}, and is used in updating the Lagrange multiplier estimates in LBSB.

Consider the following problem:

max_{(Y,R)} Ψ(Y, R, Σ_k, s_k)  (26)
s.t. (Y, R) ∈ B,

where the values Σ_k and s_k are given. Then the following condition is the necessary optimality condition for Prob. (26):

‖P((Y, R), ∇_{Y,R} Ψ(Y, R, Σ_k, s_k))‖ = 0,

where P(a, b) ≜ a − Π_B(a + b), and Π_B(a) is the projection of the vector a onto the box region B. LBSB uses this condition as explained below.

After setting initial parameters, LBSB proceeds as follows. At the k-th iteration, it updates (Y_k, R_k) by finding a point in B such that the following condition is satisfied:

‖P((Y_k, R_k), ∇_{Y,R} Ψ(Y_k, R_k, Σ_k, s_k))‖ ≤ ω_k,

where the parameter ω_k indicates the accuracy of the solution; when ω_k = 0, the point (Y_k, R_k) satisfies the necessary optimality conditions.
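The barrier function Ψ, the first-order multiplier approximations σ̄_j, and the projected-gradient criticality measure ‖P(·, ·)‖ can be sketched as follows (a minimal numeric sketch; the one-dimensional objective and constraint below are hypothetical):

```python
import numpy as np

def criticality(x, grad, lo, hi):
    """||P(x, grad)|| with P(a, b) = a - proj_B(a + b), for the box B = [lo, hi]."""
    return np.linalg.norm(x - np.clip(x + grad, lo, hi))

def barrier(F, cons, x, sigma, s):
    """Lagrangian barrier Psi = F(x) + sum_j sigma_j s_j log(c_j(x) + s_j)."""
    cx = np.array([c(x) for c in cons])
    if np.any(cx + s <= 0):
        return -np.inf  # outside the domain of the shifted barrier
    return F(x) + np.sum(sigma * s * np.log(cx + s))

def multiplier_estimates(cons, x, sigma, s):
    """First-order approximations sigma_bar_j = sigma_j s_j / (c_j(x) + s_j)."""
    cx = np.array([c(x) for c in cons])
    return sigma * s / (cx + s)

# Hypothetical instance: maximize log(x) subject to 1 - x >= 0, x in [0, 2].
cons = [lambda x: 1.0 - x]
sigma, s = np.array([1.0]), np.array([0.5])
psi = barrier(np.log, cons, 0.5, sigma, s)           # log(0.5) + 1*0.5*log(1.0)
sig_bar = multiplier_estimates(cons, 0.5, sigma, s)  # 0.5/(0.5 + 0.5) = 0.5
crit = criticality(0.5, 0.1, 0.0, 2.0)               # 0.1: ascent step stays in B
```

Note that as a constraint becomes active (c_j(x) → 0), σ̄_j → σ_j, which is why Σ̄ serves as the multiplier update; the inner loop of Alg. 1 drives the criticality measure below ω_k.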
LBSB is designed to be locally convergent to a KKT point if the penalty parameter ε_k is fixed at a sufficiently small value and the Lagrange multiplier estimates are updated using their first-order approximations. The following condition detects whether we are able to move from the globally convergent to the locally convergent regime, using a tolerance parameter δ_k:

‖[c_j(Y_k, R_k) σ̄_j(Y_k, R_k, Σ_k, s_k) / σ_{k,j}^{α_σ}]_{j∈J}‖ ≤ δ_k.   (27)

After updating (Y_k, R_k), if condition (27) is satisfied, the Lagrange multiplier estimates are updated using their first-order approximations Σ̄(Y_k, R_k, Σ_k, s_k), the penalty parameter ε_{k+1} stays the same, and ω_{k+1} and δ_{k+1} are updated using their previous values ω_k and δ_k. Otherwise, the penalty parameter is not yet small enough: the Lagrange multipliers are left unchanged, the penalty parameter is decreased, and ω_{k+1} and δ_{k+1} are reset using the initial parameters ω_s and δ_s. The iterations stop once both conditions below are satisfied:

‖P((Y_k, R_k), ∇_{Y,R} Ψ(Y_k, R_k, Σ_k, s_k))‖ ≤ ω*   (28a)
‖[c_j(Y_k, R_k) σ̄_j(Y_k, R_k, Σ_k, s_k)]_{j∈J}‖ ≤ δ*.   (28b)

If the convergence criteria (28a) and (28b) are not met, the algorithm proceeds to the next iteration, at the beginning of which LBSB recomputes the shifts using the updated parameters. The algorithm guarantees that the penalty parameter becomes sufficiently small by driving it toward zero, while at the same time ensuring that the Lagrange multiplier estimates converge to the Lagrange multipliers of the KKT point. The procedure is fully described in Alg. 3. For more details, refer to the paper by Conn et al. [49].
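To make the flow of the outer loop concrete, here is a minimal illustrative sketch for a single scalar variable and one inequality constraint. The toy objective, all parameter values, and the crude backtracking projected-gradient inner solver are our own choices for illustration only, not those of Conn et al. [49]:

```python
import numpy as np

def lbsb_1d(F, F_grad, c, c_grad, x0, lo, hi):
    """Illustrative LBSB outer loop for one variable and one constraint c(x) >= 0.

    The inner task "find x with ||P(x, grad Psi)|| <= omega" is solved by crude
    projected gradient ascent with backtracking. x0 must satisfy c(x0) > 0."""
    eps, tau, alpha_s = 0.5, 0.1, 0.5            # penalty, shrink factor, alpha_sigma
    omega_star, delta_star = 1e-4, 1e-4          # final tolerances, cf. (28a)-(28b)
    sigma, x = 1.0, x0
    omega, delta = eps, eps                      # omega_s = delta_s = alpha_w = alpha_d = 1

    def psi(x, sigma, s):                        # shifted Lagrangian barrier
        cs = c(x) + s
        return -np.inf if cs <= 0 else F(x) + sigma * s * np.log(cs)

    for _ in range(100):                         # outer iterations
        s = eps * sigma ** alpha_s               # shift s_k = eps_k * sigma_k^alpha_sigma
        while c(x) + s <= 0:                     # re-enter the shifted-feasible region
            x = 0.5 * (x + x0)
        for _ in range(10000):                   # inner solve
            g = F_grad(x) + sigma * s / (c(x) + s) * c_grad(x)
            P = x - np.clip(x + g, lo, hi)
            if abs(P) <= omega:
                break
            step, x_new = 1.0, np.clip(x + g, lo, hi)
            while psi(x_new, sigma, s) < psi(x, sigma, s) and step > 1e-16:
                step /= 2                        # backtrack until Psi does not decrease
                x_new = np.clip(x + step * g, lo, hi)
            x = x_new
        sigma_bar = sigma * s / (c(x) + s)       # first-order multiplier approximation
        if abs(P) <= omega_star and abs(c(x) * sigma_bar) <= delta_star:
            return x, sigma_bar                  # stopping tests met
        if abs(c(x) * sigma_bar / sigma ** alpha_s) <= delta:
            sigma, omega, delta = sigma_bar, omega * eps, delta * eps
        else:                                    # multiplier kept, penalty shrunk
            eps = tau * eps
            omega, delta = eps, eps
    return x, sigma_bar

# Toy problem: maximize F(x) = x s.t. c(x) = 1 - x >= 0 and x in [0, 2];
# the optimum is x* = 1 with Lagrange multiplier mu* = 1.
x, mu = lbsb_1d(lambda x: x, lambda x: 1.0, lambda x: 1.0 - x,
                lambda x: -1.0, 0.5, 0.0, 2.0)
```

On this toy instance the iterates approach x* = 1 while the multiplier estimate approaches µ* = 1, mirroring how the two update branches (multiplier update versus penalty decrease) interleave.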
Algorithm 3 Lagrangian Barrier with Simple Bounds (LBSB)
  Set strictly positive parameters δ_s, ω_s, α_ω, β_ω, α_δ, β_δ, α_σ ≤ 1, τ < 1, ω* ≪ 1, δ* ≪ 1, such that α_δ + (1 + α_σ)⁻¹ ≥ 1
  Set penalty parameter ε₀ < 1
  Set initial solution (Y₋₁, R₋₁) ∈ B
  Set initial vector of Lagrange multiplier estimates Σ₀ = [σ_{j,0}]_{j∈J}, such that c_j(Y₋₁, R₋₁) + ε₀ σ_{j,0}^{α_σ} > 0 for all j ∈ J
  Set accuracy parameter ω₀ ← ω_s ε₀^{α_ω}
  Set tolerance parameter δ₀ ← δ_s ε₀^{α_δ}
  k ← −1
  repeat
    k ← k + 1
    Compute shifts s_{k,j} = ε_k σ_{k,j}^{α_σ} for all j ∈ J
    Find (Y_k, R_k) ∈ B such that: ‖P((Y_k, R_k), ∇_{Y,R} Ψ(Y_k, R_k, Σ_k, s_k))‖ ≤ ω_k
    if ‖[c_j(Y_k, R_k) σ̄_j(Y_k, R_k, Σ_k, s_k) / σ_{k,j}^{α_σ}]_{j∈J}‖ ≤ δ_k then
      Σ_{k+1} ← Σ̄(Y_k, R_k, Σ_k, s_k); ε_{k+1} ← ε_k
      ω_{k+1} ← ω_k ε_{k+1}^{β_ω}; δ_{k+1} ← δ_k ε_{k+1}^{β_δ}
    else
      Σ_{k+1} ← Σ_k; ε_{k+1} ← τ ε_k
      ω_{k+1} ← ω_s ε_{k+1}^{α_ω}; δ_{k+1} ← δ_s ε_{k+1}^{α_δ}
    end if
  until ‖P((Y_k, R_k), ∇_{Y,R} Ψ(Y_k, R_k, Σ_k, s_k))‖ ≤ ω* and ‖[c_j(Y_k, R_k) σ̄_j(Y_k, R_k, Σ_k, s_k)]_{j∈J}‖ ≤ δ*

APPENDIX E
CONSTRAINED OPTIMIZATION AND OPTIMALITY CONDITIONS
Consider a general optimization problem of the following form:

Maximize f(x)   (29a)
Subj. to g_j(x) ≥ 0,  j = 1, ..., r,   (29b)
         h_i(x) = 0,  i = 1, ..., m,   (29c)

where f: Rⁿ → R, g_j: Rⁿ → R, j = 1, ..., r, and h_i: Rⁿ → R, i = 1, ..., m, are continuously differentiable functions. Here we provide a statement of the most common first-order necessary optimality conditions, known as the KKT conditions. First, let us define a regular point:
Definition 4.
Regular point: If the gradients of the equality constraints and of the active inequality constraints are linearly independent at x*, then x* is called a regular point.

Now, let us formally define Karush-Kuhn-Tucker (KKT) points, which we use extensively throughout this paper and in stating optimality conditions:
Definition 5.
A point x* ∈ Rⁿ is called a KKT point for Problem (29) if there exist Lagrangian variables ν* ∈ Rᵐ and µ* ∈ Rʳ such that:

∇_x L(x*, µ*, ν*) = 0
h_i(x*) = 0,  ∀ i ∈ {1, ..., m}
g_j(x*) ≥ 0,  ∀ j ∈ {1, ..., r}
µ*_j ≥ 0,  ∀ j ∈ {1, ..., r}
µ*_j g_j(x*) = 0,  ∀ j ∈ {1, ..., r},

where L(x, µ, ν) ≜ f(x) + Σ_i ν_i h_i(x) + Σ_j µ_j g_j(x) is called the Lagrangian function.

Proposition 2. (First-order KKT Necessary Conditions) Let x* be a local maximum of Problem (29), and assume x* is regular. Then x* is a KKT point.

Using the second derivatives of the Lagrangian function, we can state a sufficient condition for optimality.
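The conditions in Definition 5 are straightforward to verify numerically. The following minimal sketch checks each KKT condition on a hypothetical one-dimensional instance with a single inequality constraint and no equality constraints:

```python
def is_kkt(x, mu, f_grad, gs, g_grads, tol=1e-8):
    """Check the KKT conditions of Definition 5 for  max f(x) s.t. g_j(x) >= 0
    (no equality constraints): stationarity of L = f + sum_j mu_j * g_j,
    primal feasibility, dual feasibility, and complementary slackness."""
    lag_grad = f_grad(x) + sum(m * gg(x) for m, gg in zip(mu, g_grads))
    return (abs(lag_grad) <= tol                              # stationarity
            and all(g(x) >= -tol for g in gs)                 # primal feasibility
            and all(m >= -tol for m in mu)                    # dual feasibility
            and all(abs(m * g(x)) <= tol for m, g in zip(mu, gs)))  # compl. slackness

# Toy instance: maximize f(x) = 2x subject to g(x) = 1 - x >= 0.
# At x* = 1 the constraint is active and mu* = 2 makes the Lagrangian stationary.
f_grad = lambda x: 2.0
gs, g_grads = [lambda x: 1.0 - x], [lambda x: -1.0]

assert is_kkt(1.0, [2.0], f_grad, gs, g_grads)
# x = 0.5 with the same multiplier is stationary and feasible, but violates
# complementary slackness (mu * g = 2 * 0.5 != 0), so it is not a KKT point:
assert not is_kkt(0.5, [2.0], f_grad, gs, g_grads)
```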
Proposition 3. (Second-order Sufficiency Conditions) Assume f, g_j, ∀ j = 1, ..., r, and h_i, ∀ i = 1, ..., m, are twice continuously differentiable, and x* is a KKT point with corresponding Lagrange variables µ* and ν*. In addition, let

Vᵀ ∇²_{xx} L(x*, µ*, ν*) V < 0

for all V ≠ 0 such that

∇h_i(x*)ᵀ V = 0, ∀ i = 1, ..., m,   ∇g_j(x*)ᵀ V = 0, ∀ j ∈ A(x*).

In addition, assume we have strict complementary slackness, i.e., µ*_j > 0 for all j ∈ A(x*), where A(x*) is the set of active inequality constraints at x*, i.e., A(x*) ≜ {j | g_j(x*) = 0}. Then x* is a strict local maximum of Problem (29).

For further information on other forms of necessary and sufficient conditions for optimality, refer to the book by Bertsekas [63].

APPENDIX F
PROOF OF LEMMA

We show that regularity of the point (Ŷ, R̂) is equivalent to the assumption stated in Theorem 4.4 of Conn et al. [49]. We divide the variables (Ŷ, R̂) into two distinct classes F and F′, such that the variables in F′ are equal to their upper or lower bound and the variables in F are strictly between their bounds. As a result, we have exactly |F′| active bound constraints. We denote by A the set of active link and cache constraints (10a), (10b). Thus, we can decompose the Jacobian matrix of the active constraints at the point (Ŷ, R̂) as

J = [ J[A,F]          J[A,F′]        ]
    [ 0_{|F′|×|F|}    Q_{|F′|×|F′|}  ],

where we denote by M[s₁, s₂] the sub-matrix of the matrix M whose rows are picked according to the set s₁ and whose columns are picked according to the set s₂. The last |F′| rows correspond to the active box constraints (10c), (10d). It can easily be seen that Q_{|F′|×|F′|} is a diagonal matrix whose diagonal elements are +1 or −1, depending on whether the corresponding variable is at its upper or its lower bound. If (Ŷ, R̂) is a regular point, then J is full rank at (Ŷ, R̂). Hence, J[A,F] is also full rank, which is exactly the assumption in Theorem 4.4 of Conn et al. [49].

APPENDIX G
PROOF OF LEMMA

Since (Ŷ, R̂) is a KKT point, the KKT necessary conditions for optimality hold at (Ŷ, R̂). Hence, there exist Lagrange multipliers [µ̂_ba]_{(b,a)∈E} associated with (10a), [γ̂_v]_{v∈V} associated with (10b), [ξ̂_vi]_{v∈V,i∈I}, [ξ̂′_vi]_{v∈V,i∈I} associated with (10c), and [η̂_(i,p)]_{(i,p)∈N}, [η̂′_(i,p)]_{(i,p)∈N} associated with (10d).
To be concise and avoid repetition, we collect the variables and the Lagrange multipliers corresponding to the box constraints (10c), (10d) in column vectors as follows:

[Ŷ; R̂] ∈ R^{(|V||I|+|N|)×1},   [Y*; R*] ∈ R^{(|V||I|+|N|)×1},
η̂ ≜ [0_{|V||I|×1}; [η̂_n]_{n∈N}],   η̂′ ≜ [0_{|V||I|×1}; [η̂′_n]_{n∈N}],
ξ̂ ≜ [[ξ̂_vi]_{v∈V,i∈I}; 0_{|N|×1}],   ξ̂′ ≜ [[ξ̂′_vi]_{v∈V,i∈I}; 0_{|N|×1}].

Here the first |V||I| entries correspond to the variables Ŷ and the next |N| entries correspond to the variables R̂. Now we write the KKT conditions as:

∇_{Y,R} F(R̂) + Σ_{(b,a)∈E} µ̂_ba ∇_{Y,R} g_ba(Ŷ, R̂) − Σ_{v∈V} γ̂_v ∇_{Y,R} g_v(Ŷ) + η̂ − η̂′ + ξ̂ − ξ̂′ = 0   (31a)
µ̂_ba (g_ba(Ŷ, R̂) − (Σ_{(i,p):(a,b)∈p} λ̄_(i,p) − C_ba)) = 0,  µ̂_ba ≥ 0,  ∀ (b,a) ∈ E   (31b)
γ̂_v (g_v(Ŷ) − c_v) = 0,  γ̂_v ≥ 0,  ∀ v ∈ V   (31c)
ξ̂_vi ŷ_vi = 0,  ξ̂_vi ≥ 0,  ∀ v ∈ V, ∀ i ∈ I   (31d)
ξ̂′_vi (1 − ŷ_vi) = 0,  ξ̂′_vi ≥ 0,  ∀ v ∈ V, ∀ i ∈ I   (31e)
η̂_n r̂_n = 0,  η̂_n ≥ 0,  ∀ n ∈ N   (31f)
η̂′_n (λ̄_n − r̂_n) = 0,  η̂′_n ≥ 0,  ∀ n ∈ N.   (31g)

Due to the concavity of F, we can write

∇_{Y,R} F(R̂)ᵀ [Y* − Ŷ; R* − R̂] = ∇_R F(R̂)ᵀ (R* − R̂) ≥ F(R*) − F(R̂).

Hence,

F(R̂) ≥ F(R*) − ∇_{Y,R} F(R̂)ᵀ [Y* − Ŷ; R* − R̂]
     =(31a) F(R*) + Σ_{(b,a)∈E} µ̂_ba ∇_{Y,R} g_ba(Ŷ, R̂)ᵀ [Y* − Ŷ; R* − R̂] − Σ_{v∈V} γ̂_v ∇_{Y,R} g_v(Ŷ)ᵀ [Y* − Ŷ; R* − R̂]
            + ξ̂ᵀ [Y* − Ŷ; R* − R̂] − ξ̂′ᵀ [Y* − Ŷ; R* − R̂] + η̂ᵀ [Y* − Ŷ; R* − R̂] − η̂′ᵀ [Y* − Ŷ; R* − R̂].

Due to Lemma 2, the functions g_ba(·), for all (b,a) ∈ E, are monotone DR-submodular.
Therefore, we can use the following lemma from Bian et al. [52]:

Lemma 7. For any differentiable DR-submodular function f: X → R and any two points a, b in X, we have

(b − a)ᵀ ∇f(a) ≥ f(a ∨ b) + f(a ∧ b) − 2 f(a),

where ∨ and ∧ are the coordinate-wise maximum and minimum operations, respectively.

Hence, we can write

F(R̂) ≥(Lemma 7) F(R*) + Σ_{(b,a)∈E} µ̂_ba (g_ba(Ŷ ∨ Y*, R̂ ∨ R*) + g_ba(Ŷ ∧ Y*, R̂ ∧ R*) − 2 g_ba(Ŷ, R̂))
       − Σ_{v∈V} γ̂_v ∇_{Y,R} g_v(Ŷ)ᵀ [Y* − Ŷ; R* − R̂] + ξ̂ᵀ [Y* − Ŷ; R* − R̂] − ξ̂′ᵀ [Y* − Ŷ; R* − R̂] + η̂ᵀ [Y* − Ŷ; R* − R̂] − η̂′ᵀ [Y* − Ŷ; R* − R̂].

By Lemma 2, the g_ba(·) are monotone, and we know that they are positive for all (b,a) ∈ E. Hence, g_ba(Ŷ ∨ Y*, R̂ ∨ R*) ≥ g_ba(Y*, R*) and g_ba(Ŷ ∧ Y*, R̂ ∧ R*) ≥ 0 for all (b,a) ∈ E. Therefore,

F(R̂) ≥ F(R*) + Σ_{(b,a)∈E} µ̂_ba (g_ba(Y*, R*) − 2 g_ba(Ŷ, R̂)) − Σ_{v∈V} γ̂_v ∇_{Y,R} g_v(Ŷ)ᵀ [Y* − Ŷ; R* − R̂]
       + ξ̂ᵀ [Y* − Ŷ; R* − R̂] − ξ̂′ᵀ [Y* − Ŷ; R* − R̂] + η̂ᵀ [Y* − Ŷ; R* − R̂] − η̂′ᵀ [Y* − Ŷ; R* − R̂]
     =(**) F(R*) + Σ_{(b,a)∈E} µ̂_ba (g_ba(Y*, R*) − 2(Σ_{(i,p):(a,b)∈p} λ̄_(i,p) − C_ba)) − Σ_{v∈V} γ̂_v (Σ_{i∈I} y*_vi − Σ_{i∈I} ŷ_vi)
       + Σ_{v∈V,i∈I} ξ̂_vi y*_vi − Σ_{v∈V,i∈I} ξ̂′_vi (y*_vi − 1) + Σ_{n∈N} η̂_n r*_n − Σ_{n∈N} η̂′_n (r*_n − λ̄_n)
     ≥ F(R*) − 2 Σ_{(b,a)} µ̂_ba (Σ_{(i,p):(a,b)∈p} λ̄_(i,p) − C_ba),

where (**) is due to (31b), (31c), (31d), (31e), (31f), and (31g). Therefore, we have

F(R̂) ≥ F(R*) − 2 Σ_{(b,a)} µ̂_ba (Σ_{(i,p):(a,b)∈p} λ̄_(i,p) − C_ba).

APPENDIX H
PROOF OF LEMMA

We first define an active link as follows.
Definition 6.
A link for which the constraint (10a) is satisfied with equality is called an active link. In other words, for an active link (b,a) we have:

Σ_{(i,p):(a,b)∈p} λ_(i,p) ∏_{v=p₁}^{a} (1 − y_vi) = C_ba.
For an active link (b,a), there exists a set N̂^active_(b,a) ⊆ N,

N̂^active_(b,a) = {(i,p) : (a,b) ∈ p, (λ̄_(i,p) − r̂_(i,p)) ∏_{v=p₁}^{a} (1 − ŷ_vi) > 0},

such that

Σ_{(i,p)∈N̂^active_(b,a)} (λ̄_(i,p) − r̂_(i,p)) ∏_{v=p₁}^{a} (1 − ŷ_vi) = C_ba.

The proof of Lemma 8 follows from the definition of an active link and from C_ba being positive. Suppose (b,a) is an active link. By Lemma 8, we have r̂_(i,p) < λ̄_(i,p) for all (i,p) ∈ N̂^active_(b,a). By writing the KKT conditions with respect to r̂_(i,p) for (i,p) ∈ N̂^active_(b,a), we have

dU_(i,p)(λ̄_(i,p) − r̂_(i,p))/dλ = Σ_{(c,d):(c,d)∈p} µ̂_dc ∏_{v=p₁}^{c} (1 − ŷ_vi) + η̂_(i,p).

This implies

dU_(i,p)(λ̄_(i,p) − r̂_(i,p))/dλ ≥ µ̂_ba ∏_{v=p₁}^{a} (1 − ŷ_vi).

After multiplying both sides by (λ̄_(i,p) − r̂_(i,p)), we have

(λ̄_(i,p) − r̂_(i,p)) dU_(i,p)(λ)/dλ |_{λ = λ̄_(i,p) − r̂_(i,p)} ≥ µ̂_ba (λ̄_(i,p) − r̂_(i,p)) ∏_{v=p₁}^{a} (1 − ŷ_vi).   (32)

Using Assumption 1 and the fact that θ is the maximum among the logarithmic diminishing-return parameters, we can write (32) as

θ ≥ µ̂_ba (λ̄_(i,p) − r̂_(i,p)) ∏_{v=p₁}^{a} (1 − ŷ_vi).   (33)

By summing (33) over all (i,p) ∈ N̂^active_(b,a), we have

θ |N̂^active_(b,a)| ≥ µ̂_ba C_ba  ⇒  µ̂_ba ≤ θ |N̂^active_(b,a)| / C_ba.

If (b,a) is not an active link, then µ̂_ba = 0. As a result, we have

Σ_{(b,a)∈E} µ̂_ba (Σ_{(i,p):(a,b)∈p} λ̄_(i,p) − C_ba) ≤ θ Σ_{(b,a)∈E} n_ab (Σ_{(i,p):(a,b)∈p} λ̄_(i,p) − C_ba) / C_ba,

where the last inequality is due to |N̂^active_(b,a)| ≤ n_ab.

APPENDIX I
PROOF OF PROPOSITION 1
Lemma 9.
Suppose (Ŷ, R̂) is regular and satisfies the second-order sufficiency conditions. Then Assumption 5 of Conn et al. [49] holds.

Proof. Let us decompose the Jacobian matrix of the active constraints at the point (Ŷ, R̂) in the same way as in Appendix F:

J = [ J[A,F]          J[A,F′]        ]
    [ 0_{|F′|×|F|}    Q_{|F′|×|F′|}  ].

Similarly, we denote by M[s₁, s₂] the sub-matrix of the matrix M whose rows are picked according to the set s₁ and whose columns are picked according to the set s₂; the sets A, F, F′ and the matrix Q are introduced in Appendix F. The Lagrangian function is written as

L(Y, R, µ, γ, η, ξ) ≜ F(R) + Σ_{(b,a)∈E} µ_ba g_ba(Y, R) − Σ_{v∈V} γ_v g_v(Y) + Σ_{n∈N} η_n r_n − Σ_{n∈N} η′_n (r_n − λ̄_n) + Σ_{v∈V,i∈I} ξ_vi y_vi − Σ_{v∈V,i∈I} ξ′_vi (y_vi − 1).

Due to the second-order sufficiency conditions, we have

Vᵀ ∇²_{xx} L(Ŷ, R̂, µ̂, γ̂, η̂, ξ̂) V < 0   (34)

for all V ≠ 0 such that

J V = 0.   (35)

We decompose V and ∇²_{xx} L(Ŷ, R̂, µ̂, γ̂, η̂, ξ̂) into the blocks corresponding to the classes F and F′:

V = [V_F; V_F′],   ∇²_{xx} L(Ŷ, R̂, µ̂, γ̂, η̂, ξ̂) = [ H[F,F]   H[F,F′] ; H[F′,F]   H[F′,F′] ].

For all V ≠ 0 that satisfy (35) we can write

[ J[A,F]   J[A,F′] ; 0_{|F′|×|F|}   Q_{|F′|×|F′|} ] [V_F; V_F′] = 0  ⇒  V_F′ = 0.   (36)

Based on (34), (35), and (36), we can write

V_Fᵀ H[F,F] V_F < 0   (37)

for all V_F ≠ 0 such that J[A,F] V_F = 0. Now consider the following matrix:

[ H[F,F]   J[A,F]ᵀ ; J[A,F]   0 ].   (38)

We claim that the matrix defined in (38) is non-singular. Suppose

[ H[F,F]   J[A,F]ᵀ ; J[A,F]   0 ] [X_F; X_A] = 0.
Then,

J[A,F] X_F = 0,   H[F,F] X_F + J[A,F]ᵀ X_A = 0.   (39)

Since J[A,F] X_F = 0, if X_F ≠ 0 we must have X_Fᵀ H[F,F] X_F < 0 according to (37). We multiply both sides of (39) by X_Fᵀ to obtain

X_Fᵀ H[F,F] X_F + X_Fᵀ J[A,F]ᵀ X_A = 0  ⇒  X_Fᵀ H[F,F] X_F = 0,

which is not possible. As a result, X_F = 0 and J[A,F]ᵀ X_A = 0. If X_A ≠ 0, this violates the regularity of (Ŷ, R̂). So X_F = 0 and X_A = 0, and as a result the matrix in (38) is non-singular. Similarly to Conn et al. [49], we define

g_l(R, Y, µ, γ) ≜ F(R) + Σ_{(b,a)∈E} µ_ba g_ba(Y, R) − Σ_{v∈V} γ_v g_v(Y).

Due to the KKT conditions, we have

∂g_l(R, Y, µ, γ)/∂r_n + η̂_n − η̂′_n = 0,
∂g_l(R, Y, µ, γ)/∂y_vi + ξ̂_vi − ξ̂′_vi = 0.

If ∂g_l(R, Y, µ, γ)/∂r_n = 0, we have η̂_n − η̂′_n = 0. The Lagrange multipliers η̂_n and η̂′_n correspond to the lower-bound and upper-bound constraints on the variable r_n, respectively. Due to complementary slackness, they cannot both be positive, since a variable cannot be equal to its lower bound and its upper bound at the same time. Therefore, both are zero. If both are zero, then neither the upper-bound nor the lower-bound constraint is active, due to the strict complementary slackness in the second-order sufficient conditions. Hence, if ∂g_l(R, Y, µ, γ)/∂r_n = 0, then λ̄_n > r_n > 0. The same is true for the cache allocation variables y_vi. Therefore, the set J in Assumption 5 of Conn et al. [49] is exactly the set F of variables which are not at their bounds. As a result, the matrix defined in Assumption 5 of Conn et al. [49] is equivalent to the matrix in (38). Hence, if the second-order sufficient conditions and the regularity assumption hold, Assumption 5 of Conn et al. [49] holds automatically.

According to Lemma 9, if the second-order sufficient conditions and the regularity assumption hold, Assumption 5 of Conn et al. [49] holds automatically.
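The non-singularity argument above can be checked numerically. In the following sketch, the matrices are made up for illustration: H is negative definite on the nullspace of J (while indefinite overall), and J has full row rank, exactly the two properties the proof relies on:

```python
import numpy as np

# Hypothetical small instance of the structure in the proof.
H = np.array([[-2.0, 0.0, 0.0],
              [ 0.0, 1.0, 0.0],     # H is indefinite on the whole space ...
              [ 0.0, 0.0, -1.0]])
J = np.array([[0.0, 1.0, 0.0]])     # ... but null(J) = span{e1, e3}, where
                                    # v' H v = -2*v1^2 - v3^2 < 0 for v != 0.

# The bordered ("saddle-point") matrix corresponding to (38):
K = np.block([[H, J.T],
              [J, np.zeros((1, 1))]])

# Full row rank of J plus negative definiteness of H on null(J)
# imply that K is non-singular, matching the claim in the proof.
assert np.linalg.matrix_rank(J) == J.shape[0]
assert abs(np.linalg.det(K)) > 1e-9
```

Dropping either hypothesis (a rank-deficient J, or an H with a nullspace direction inside null(J)) makes the determinant vanish, which is precisely how the proof derives its contradiction.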
In addition, Assumption 3 is exactly Assumption 4 of Conn et al. [49], and the single-limit-point assumption is the same as Assumption 6 of Conn et al. [49]. Strict complementary slackness is the same assumption made in part (ii) of Theorem 5.3 of Conn et al. [49]. The proper choice of parameters is defined by Conn et al. [49] as follows: α ≜ min(1, α_ω) and β ≜ min(1, β_ω), and α_η and β_η satisfy the conditions

α_η < min(1, α_ω),   β_η < β,   α_η + β_η < α + 1.

Thus, the assumptions of Proposition 1 imply all the assumptions of part (ii) of Theorem 5.3 and of Corollary 5.7 of Conn et al. [49]. Hence, the R-linear convergence results stated in Corollary 5.7 of Conn et al. [49] hold here.

APPENDIX J
PROOF OF COROLLARY

We have

(1 − 1/e) min{1, r_(i,p)/λ̄_(i,p) + Σ_{k=1}^{a} y_{p_k i}} ≤ 1 − (1 − r_(i,p)/λ̄_(i,p)) ∏_{v=p₁}^{a} (1 − y_vi) ≤ min{1, r_(i,p)/λ̄_(i,p) + Σ_{k=1}^{a} y_{p_k i}}.

Multiplying all sides by λ̄_(i,p), we have

(1 − 1/e) λ̄_(i,p) min{1, r_(i,p)/λ̄_(i,p) + Σ_{k=1}^{a} y_{p_k i}} ≤ λ̄_(i,p) − (λ̄_(i,p) − r_(i,p)) ∏_{v=p₁}^{a} (1 − y_vi) ≤ λ̄_(i,p) min{1, r_(i,p)/λ̄_(i,p) + Σ_{k=1}^{a} y_{p_k i}}.

Summing over all (i,p) : (a,b) ∈ p, we have

(1 − 1/e) Σ_{(i,p):(a,b)∈p} λ̄_(i,p) min{1, r_(i,p)/λ̄_(i,p) + Σ_{k=1}^{a} y_{p_k i}} ≤ Σ_{(i,p):(a,b)∈p} [λ̄_(i,p) − (λ̄_(i,p) − r_(i,p)) ∏_{v=p₁}^{a} (1 − y_vi)] ≤ Σ_{(i,p):(a,b)∈p} λ̄_(i,p) min{1, r_(i,p)/λ̄_(i,p) + Σ_{k=1}^{a} y_{p_k i}},

and this concludes the proof.

REFERENCES

[1] F. P. Kelly, A. K. Maulloo, and D. K. Tan, "Rate control for communication networks: shadow prices, proportional fairness and stability,"
Journal of the Operational Research Society, vol. 49, no. 3, pp. 237–252, 1998.
[2] R. Srikant, The Mathematics of Internet Congestion Control. Springer Science & Business Media, 2012.
[3] F. Kelly and E. Yudovina, Stochastic Networks. Cambridge University Press, 2014, vol. 2.
[4] J. Mo and J. Walrand, "Fair end-to-end window-based congestion control," IEEE/ACM Transactions on Networking, vol. 8, no. 5, pp. 556–567, 2000.
[5] S. H. Low and D. E. Lapsley, "Optimization flow control. I. Basic algorithm and convergence," IEEE/ACM Transactions on Networking, vol. 7, no. 6, pp. 861–874, 1999.
[6] M. Chiang, S. H. Low, A. R. Calderbank, and J. C. Doyle, "Layering as optimization decomposition: A mathematical theory of network architectures," Proceedings of the IEEE, vol. 95, no. 1, pp. 255–312, 2007.
[7] E. J. Rosensweig, J. Kurose, and D. Towsley, "Approximate models for general cache networks," in INFOCOM, 2010, pp. 1–9.
[8] N. C. Fofack, P. Nain, G. Neglia, and D. Towsley, "Analysis of TTL-based cache networks," in , 2012, pp. 1–10.
[9] S. Ioannidis and E. Yeh, "Adaptive caching networks with optimality guarantees," SIGMETRICS Perform. Eval. Rev., vol. 44, no. 1, pp. 113–124, 2016.
[10] S. Shenker, M. Casado, T. Koponen, N. McKeown et al., "The future of networking, and the past of protocols," Open Networking Summit, vol. 20, pp. 1–30, 2011.
[11] D. Kreutz, F. M. V. Ramos, P. E. Veríssimo, C. E. Rothenberg, S. Azodolmolky, and S. Uhlig, "Software-defined networking: A comprehensive survey," Proceedings of the IEEE, vol. 103, no. 1, pp. 14–76, 2015.
[12] B. Han, V. Gopalakrishnan, L. Ji, and S. Lee, "Network function virtualization: Challenges and opportunities for innovations," IEEE Communications Magazine, vol. 53, no. 2, pp. 90–97, 2015.
[13] V. Jacobson, D. K. Smetters, J. D. Thornton, M. F. Plass, N. H. Briggs, and R. L. Braynard, "Networking named content," in CoNEXT, 2009, pp. 1–12.
[14] L. Zhang, D. Estrin, J. Burke, V. Jacobson, J. D. Thornton, D. K. Smetters, B. Zhang, G. Tsudik, D. Massey, C. Papadopoulos et al., "Named data networking (NDN) project," Relatório Técnico NDN-0001, Xerox Palo Alto Research Center-PARC, vol. 157, p. 158, 2010.
[15] K. Kamran, E. Yeh, and Q. Ma, "DECO: Joint computation, caching and forwarding in data-centric computing networks," in MobiHoc, 2019, pp. 111–120.
[16] H. Feng, J. Llorca, A. M. Tulino, and A. F. Molisch, "Optimal dynamic cloud network control," IEEE/ACM Transactions on Networking, vol. 26, no. 5, pp. 2118–2131, 2018.
[17] H. Li, K. Ota, and M. Dong, "Learning IoT in edge: Deep learning for the Internet of Things with edge computing," IEEE Network, vol. 32, no. 1, pp. 96–101, 2018.
[18] R. Mahmud, F. L. Koch, and R. Buyya, "Cloud-fog interoperability in IoT-enabled healthcare solutions," in ICDCN, 2018.
[19] J. Ekanayake, S. Pallickara, and G. Fox, "MapReduce for data intensive scientific analyses," in IEEE Fourth International Conference on eScience, 2008, pp. 277–284.
[20] S. A. Makhlouf and B. Yagoubi, "Data-aware scheduling strategy for scientific workflow applications in IaaS cloud computing," International Journal of Interactive Multimedia and Artificial Intelligence, vol. 5, no. 4, pp. 75–85, 2019.
[21] A. A. Bian, B. Mirzasoleiman, J. Buhmann, and A. Krause, "Guaranteed non-convex optimization: Submodular maximization over continuous domains," in Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, vol. 54, 2017, pp. 111–120.
[22] I. Baev, R. Rajaraman, and C. Swamy, "Approximation algorithms for data placement problems," SIAM J. Comput., vol. 38, no. 4, pp. 1411–1429, 2008.
[23] Y. Bartal, A. Fiat, and Y. Rabani, "Competitive algorithms for distributed data management," Journal of Computer and System Sciences, vol. 51, no. 3, pp. 341–358, 1995.
[24] L. Fleischer, M. X. Goemans, V. S. Mirrokni, and M. Sviridenko, "Tight approximation algorithms for maximum general assignment problems," in SODA, 2006, pp. 611–620.
[25] K. Shanmugam, N. Golrezaei, A. G. Dimakis, A. F. Molisch, and G. Caire, "FemtoCaching: Wireless content delivery through distributed caching helpers," IEEE Transactions on Information Theory, vol. 59, no. 12, pp. 8402–8413, 2013.
[26] M. Mahdian, A. Moharrer, S. Ioannidis, and E. Yeh, "Kelly cache networks," in INFOCOM, 2019, pp. 217–225.
[27] Y. Li and S. Ioannidis, "Universally stable cache networks," in INFOCOM, 2020.
[28] M. Dehghan, A. Seetharam, B. Jiang, T. He, T. Salonidis, J. Kurose, D. Towsley, and R. Sitaraman, "On the complexity of optimal routing and content caching in heterogeneous networks," in INFOCOM, 2015, pp. 936–944.
[29] E. Yeh, T. Ho, Y. Cui, M. Burd, R. Liu, and D. Leong, "VIP: A framework for joint dynamic forwarding and caching in named data networks," in ACM-ICN, 2014, pp. 117–126.
[30] M. Mahdian and E. Yeh, "MinDelay: Low-latency joint caching and forwarding for multi-hop networks," in ICC, 2018, pp. 1–7.
[31] S. Ioannidis and E. Yeh, "Jointly optimal routing and caching for arbitrary network topologies," in Proceedings of the 4th ACM Conference on Information-Centric Networking, ser. ICN '17. Association for Computing Machinery, 2017, pp. 77–87.
[32] K. Poularakis and L. Tassiulas, "On the complexity of optimal content placement in hierarchical caching networks," IEEE Transactions on Communications, vol. 64, no. 5, pp. 2092–2103, 2016.
[33] Y. Wang, Z. Li, G. Tyson, S. Uhlig, and G. Xie, "Optimal cache allocation for content-centric networking," in , 2013, pp. 1–10.
[34] H. Che, Y. Tung, and Z. Wang, "Hierarchical web caching systems: Modeling, design and experimental results," IEEE Journal on Selected Areas in Communications, vol. 20, no. 7, pp. 1305–1314, 2002.
[35] C. Fricker, P. Robert, and J. Roberts, "A versatile and accurate approximation for LRU cache performance," in ITC, 2012, pp. 1–8.
[36] V. Martina, M. Garetto, and E. Leonardi, "A unified approach to the performance analysis of caching systems," in INFOCOM, 2014, pp. 2040–2048.
[37] G. Bianchi, A. Detti, A. Caponi, and N. Blefari Melazzi, "Check before storing: What is the performance price of content integrity verification in LRU caching?" SIGCOMM Comput. Commun. Rev., vol. 43, no. 3, pp. 59–67, 2013.
[38] D. S. Berger, P. Gland, S. Singla, and F. Ciucu, "Exact analysis of TTL cache networks," Performance Evaluation, vol. 79, pp. 2–23, 2014. Special issue: Performance 2014.
[39] A. Ferragut, I. Rodriguez, and F. Paganini, "Optimizing TTL caches under heavy-tailed demands," SIGMETRICS Perform. Eval. Rev., vol. 44, no. 1, pp. 101–112, 2016.
[40] M. Dehghan, L. Massoulié, D. Towsley, D. S. Menasché, and Y. C. Tay, "A utility optimization approach to network cache design," IEEE/ACM Transactions on Networking, vol. 27, no. 3, pp. 1013–1027, 2019.
[41] N. K. Panigrahy, J. Li, F. Zafari, D. Towsley, and P. Yu, "A TTL-based approach for content placement in edge networks," arXiv, 2017.
[42] N. K. Panigrahy, J. Li, and D. Towsley, "Hit rate vs. hit probability based cache utility maximization," SIGMETRICS Perform. Eval. Rev., vol. 45, no. 2, pp. 21–23, 2017.
[43] N. Rozhnova and S. Fdida, "An effective hop-by-hop interest shaping mechanism for CCN communications," in INFOCOM, 2012, pp. 322–327.
[44] Y. Ren, J. Li, S. Shi, L. Li, G. Wang, and B. Zhang, "Congestion control in named data networking – a survey," Computer Communications, vol. 86, pp. 1–11, 2016.
[45] M. Mahdian, S. Arianfar, J. Gibson, and D. Oran, "MIRCC: Multipath-aware ICN rate-based congestion control," in ACM-ICN, 2016, pp. 1–10.
[46] M. Badov, A. Seetharam, J. Kurose, V. Firoiu, and S. Nanda, "Congestion-aware caching and search in information-centric networks," in ACM-ICN, 2014, pp. 37–46.
[47] D. Tanaka and M. Kawarasaki, "Congestion control in named data networking," in LANMAN, 2016, pp. 1–6.
[48] G. Carofiglio, M. Gallo, L. Muscariello, M. Papalini, and S. Wang, "Optimal multipath congestion control and request forwarding in information-centric networks," in ICNP, 2013, pp. 1–10.
[49] A. Conn, N. Gould, and P. Toint, "A globally convergent Lagrangian barrier algorithm for optimization with general inequality constraints and simple bounds," Mathematics of Computation of the American Mathematical Society, vol. 66, no. 217, pp. 261–288, 1997.
[50] A. Conn, P. Toint, and N. Gould, "Global convergence of a class of trust region algorithms for optimization with simple bounds," SIAM Journal on Numerical Analysis, vol. 25, 1988.
[51] A. V. Fiacco and G. P. McCormick, "The sequential unconstrained minimization technique (SUMT) without parameters," Operations Research, vol. 15, no. 5, pp. 820–827, 1967.
[52] A. Bian, K. Levy, A. Krause, and J. M. Buhmann, "Continuous DR-submodular maximization: Structure and algorithms," in Advances in Neural Information Processing Systems, 2017, pp. 486–496.
[53] V. G. Crawford, A. Kuhnle, and M. T. Thai, "Submodular cost submodular cover with an approximate oracle," arXiv:1908.00653, 2019.
[54] R. K. Iyer and J. A. Bilmes, "Submodular optimization with submodular cover and submodular knapsack constraints," in NeurIPS, 2013, pp. 2436–2444.
[55] A. Mokhtari, H. Hassani, and A. Karbasi, "Stochastic conditional gradient methods: From convex minimization to submodular maximization," arXiv:1804.09554, 2018.
[56] H. Hassani, A. Karbasi, A. Mokhtari, and Z. Shen, "Stochastic conditional gradient++," arXiv:1902.06992, 2019.
[57] A. Mokhtari, H. Hassani, and A. Karbasi, "Decentralized submodular maximization: Bridging discrete and continuous settings," arXiv:1802.03825, 2018.
[58] Y. Bian, J. Buhmann, and A. Krause, "Optimal continuous DR-submodular maximization and applications to provable mean field inference," in ICML, 2019.
[59] B. Blaszczyszyn and A. Giovanidis, "Optimal geographic caching in cellular networks," in ICC, 2015, pp. 3358–3363.
[60] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[61] D. P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods. Academic Press, 1982.
[62] R. Fletcher, Practical Methods of Optimization, Vol. 2. John Wiley, 1981.
[63] D. P. Bertsekas, Nonlinear Programming. Athena Scientific, 1999.
[64] L. Seeman and Y. Singer, "Adaptive seeding in social networks," in FOCS, 2013, pp. 459–468.
[65] M. Karimi, M. Lucic, H. Hassani, and A. Krause, "Stochastic submodular maximization: The case of coverage functions," in NeurIPS, 2017, pp. 6853–6863.
[66] M. X. Goemans and D. P. Williamson, "New 3/4-approximation algorithms for the maximum satisfiability problem," SIAM Journal on Discrete Mathematics, vol. 7, no. 4, pp. 656–666, 1994.
[67] D. Rossi and G. Rossini, "Caching performance of content centric networks under multi-path routing (and more)," Telecom ParisTech, Tech. Rep., 2011.
[68] J. Kleinberg, "The small-world phenomenon: An algorithmic perspective," in