Cache Placement in Fog-RANs: From Centralized to Distributed Algorithms
Juan Liu, Member, IEEE, Bo Bai, Member, IEEE, Jun Zhang, Senior Member, IEEE, and Khaled B. Letaief, Fellow, IEEE
Abstract—To deal with the rapid growth of high-speed and/or ultra-low latency data traffic for massive mobile users, fog radio access networks (Fog-RANs) have emerged as a promising architecture for next-generation wireless networks. In Fog-RANs, the edge nodes and user terminals possess storage, computation and communication functionalities to various degrees, which provides high flexibility for network operation, i.e., from fully centralized to fully distributed operation. In this paper, we study the cache placement problem in Fog-RANs, taking into account flexible physical-layer transmission schemes and diverse content preferences of different users. We develop both centralized and distributed transmission aware cache placement strategies to minimize the users' average download delay subject to the storage capacity constraints. In the centralized mode, the cache placement problem is transformed into a matroid constrained submodular maximization problem, and an approximation algorithm is proposed to find a solution within a constant factor of the optimum. In the distributed mode, a belief propagation based distributed algorithm is proposed to provide a suboptimal solution, with iterative updates at each BS based on locally collected information. Simulation results show that by exploiting caching and cooperation gains, the proposed transmission aware caching algorithms can greatly reduce the users' average download delay.
Index Terms—Content placement, Fog-RAN, submodular optimization, belief propagation.
I. INTRODUCTION
With the explosive growth of consumer-oriented multimedia applications, a large number of end devices, such as smart phones, wearable devices and vehicles, need to be connected via wireless networking [2]. This has triggered the rapid increase of high-speed and/or ultra-low latency data traffic that is very likely generated, processed and consumed locally at the edge of wireless networks. To cope with this trend, the fog radio access network (Fog-RAN) is emerging as a promising network architecture, in which the storage, computation, and communication functionalities are moved to the edge of wireless networks, i.e., to the near-user edge devices and end-user terminals [2]–[4]. To further improve the delivery rate and decrease latency for mobile users, a promising solution is to push popular contents towards end users by caching them at the edge nodes in Fog-RANs [3]. Thus, the content delivery service of mobile users consists of two phases, i.e., cache placement and content delivery [1], [5]–[9].

This work was supported in part by the NSFC under Grant No. 61601255, the Hong Kong Research Grants Council under Grant No. 610113, the Scientific Research Foundation of Ningbo University under Grant No. 010-421703900 and the Zhejiang Open Foundation of the Most Important Subjects under Grant No. 010-421500212. This work was presented in part at the IEEE International Conference on Communications (ICC), Kuala Lumpur, Malaysia, May 2016 [1]. J. Liu is with the College of Electrical Engineering and Computer Science, Ningbo University, Zhejiang, China, 315211. E-mail: [email protected]. B. Bai is with the Future Network Theory Lab, Huawei Technologies Co., Ltd., Shatin, N. T., Hong Kong. E-mail: [email protected]. J. Zhang and K. B. Letaief are with the Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong. K. B. Letaief is also with Hamad bin Khalifa University, Doha, Qatar. E-mail: [email protected], [email protected].
The recent works studying cache-aided wireless networks fall into two major categories: 1) analyzing the content delivery performance for certain cache placement policies; 2) designing cache placement strategies for efficient content delivery.

It is critical to study the content delivery performance in cache-assisted wireless networks to reveal the benefits of placing caches distributedly across the whole network [10]–[16]. By coupling physical-layer transmission and random caching, the authors in [10] investigated the system performance in terms of the average delivery rate and outage probability for small-cell networks, where cache-enabled BSs are modeled as a Poisson point process. In [11] and [12], the throughput-outage tradeoff was investigated and the throughput-outage scaling laws were revealed for cache-assisted wireless networks, where clustered device caching and one-hop device-to-device (D2D) transmission are applied. This line of work has also been extended to the multi-hop D2D network in [13], where the multi-hop capacity scaling laws were studied. The throughput scaling laws were studied for wireless ad-hoc networks with device caching in [14], where the maximum distance separable (MDS) code and cache-assisted multi-hop transmission/cache-induced coordinated multipoint (CoMP) delivery were applied. In [15] and [16], content-centric multicasting was studied for cache-enabled cloud RAN and heterogeneous cellular networks, respectively.

Cache placement strategies should be carefully designed such that flexible transmission opportunities can be provided among users and caching gain can be efficiently exploited in the content delivery phase [1], [7]–[9], [17]–[24]. The cache placement problem in femtocell networks was studied in [8], where femtocell BSs with finite-capacity storages are deployed to act as helper nodes to cache popular files. In [7], [17], coded caching was exploited to create simultaneous coded multicasting opportunities to mobile users.
This work was extended to the decentralized setting in [18] and the hierarchical two-layer network in [19], respectively. By applying an Alternating Direction Method of Multipliers approach, the authors of [21] proposed a distributed caching algorithm for cache-enabled small base stations (SBSs) to minimize the global backhaul costs of all the SBSs subject to the cache storage capacities. In [9], the design of optimal cache placement was pursued for wireless networks, by taking the extra delay induced via backhaul links and physical-layer transmissions into consideration. The authors in [20] proposed user preference profile based caching policies for radio access networks along with a backhaul and wireless channel scheduler to support more concurrent video sessions. In [22], mobility-aware caching strategies were proposed to exploit user mobility patterns to improve cache performance. The joint routing and caching problem was studied for small-cell networks and heterogeneous networks in [23] and [24], respectively, subject to both the storage and transmission bandwidth capacity constraints on the small-cell BSs.

The existing works mainly focused on designing centralized cache placement strategies for specific network structures (e.g., small cell networks), where some specific transmission schemes are applied for content delivery. However, very few works have studied the cache placement problem in Fog-RANs. We notice that different users may be connected to Fog-RANs in different ways and with different transmission opportunities. Meanwhile, Fog-RANs support flexible network operation, i.e., from fully centralized to fully distributed operation.
This motivates us to develop both centralized and distributed transmission aware cache placement strategies for the emerging Fog-RANs so that the spectrum efficiency of content delivery is improved as much as possible.

In this paper, we consider a Fog-RAN system, where each user is served by one or multiple network edge devices, e.g., base stations (BSs), and each BS is equipped with a cache of finite capacity. In contrast to [8] and [24], where each user has the same file preference and file delivery scheme, we consider that the users have different file preferences [25] and possibly different candidate transmission schemes. Then, we formulate an optimization problem to minimize the users' average download delay subject to the BSs' storage capacities, which turns out to be NP-hard. To deal with this difficulty, we apply different optimization techniques to find efficient cache placement policies for the centralized and distributed operation modes of Fog-RANs, respectively.

In the centralized mode, we transform the delay minimization problem into a matroid constrained submodular maximization problem [26]. In this problem, the average delay function is submodular for all the possible transmission schemes, and the cache placement strategy subject to the BSs' storage capacities is a partition matroid. Based on submodular optimization theory [26], we then develop a centralized low-complexity algorithm to find a caching solution within 1/2 of the optimum in polynomial-time complexity O(MNK), where M, N and K denote the number of BSs, files and users, respectively.

In the distributed mode, we develop a low-complexity belief propagation based distributed algorithm to find a suboptimal cache placement strategy [27]. Based on local information of its storage capacity, the users in its serving range and their file request statistics, each BS performs individual computations and exchanges its belief on the local caching strategy with its neighboring BSs iteratively.
Fig. 1. An illustration of a Fog-RAN that consists of BSs and mobile users, where BSs are connected to a cloud data center via backhaul links. With the aid of transmission aware caching designs, the neighboring BSs could cache the same files and deliver them to their common users via cooperative beamforming.

Through iterations, the distributed algorithm converges to a suboptimal caching solution which achieves an average delay performance comparable to the centralized algorithm, as shown by simulation results. By distributing the computing tasks, each individual BS always does much fewer calculations than the central controller when running the caching algorithms. Notice that the distributed caching algorithm proposed in [21] is run by each SBS individually and no parameters are shared between the SBSs. In this work, we propose a belief propagation based transmission aware distributed caching algorithm which requires cooperation and message passing between neighboring BSs.

The rest of this paper is organized as follows. Section II introduces the system model of Fog-RANs. Section III formulates the cache placement problem which minimizes the average download delay under the cache capacity constraints. In Section IV, a centralized algorithm is proposed to solve the cache placement problem under the framework of submodular optimization for the centralized Fog-RANs.
In Section V, a belief propagation based distributed algorithm is proposed for cache placement in the distributed Fog-RANs. Section VI demonstrates the simulation results. Finally, Section VII concludes this paper.

II. SYSTEM MODEL
As shown in Fig. 1, we consider a Fog-RAN consisting of M edge nodes, i.e., BSs, and K mobile users. Let A = {a_1, ..., a_M} and U = {u_1, ..., u_K} denote the BS set and the user set, respectively. Each user can be served by one or multiple BSs, depending on the way it connects to the Fog-RAN. The connectivity between the users and the BSs is denoted by a K × M matrix L, where each binary element l_km indicates whether user u_k can be served by BS a_m. That is, l_km = 1 if user u_k is located in the coverage of BS a_m, and l_km = 0 otherwise. The set of users in the coverage of BS a_m is denoted by U_m = {u_k ∈ U | l_km = 1}. Similarly, the set of serving BSs of user u_k is denoted by A_k = {a_m ∈ A | l_km = 1}.

Suppose that the library of N files, denoted by F = {f_1, ..., f_N}, is stored at one or multiple content servers which could be far away in the cloud data center. The content servers can be accessed by the BSs via backhaul links, as illustrated in Fig. 1. Assume all the files have the same size, i.e., |f_n| = |f| (∀ f_n ∈ F). The file popularity distribution conditioned on the event that user u_k makes a request is denoted by p_nk, which can be viewed as the user preference indicator and estimated via some learning procedure [28], [29]. The user's file preferences are normalized such that Σ_{n=1}^N p_nk = 1. We also assume that each BS a_m has a finite-capacity storage. Denote by Q_m the normalized storage capacity of BS a_m, which means that each BS a_m can store at most Q_m files. Let x_nm be a binary variable indicating whether file f_n is cached at BS a_m. That is, x_nm = 1 if file f_n is stored at BS a_m, and otherwise x_nm = 0.
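The coverage sets U_m and A_k follow directly from the connectivity matrix L. A minimal sketch (hypothetical helper with 0-based indices, not from the paper):

```python
# Sketch: recover the per-BS user sets U_m and per-user serving-BS sets A_k
# from the K x M connectivity matrix L, where L[k][m] = 1 iff user u_k lies
# in the coverage of BS a_m.

def coverage_sets(L):
    K = len(L)
    M = len(L[0]) if K else 0
    users_of_bs = {m: {k for k in range(K) if L[k][m]} for m in range(M)}  # U_m
    bss_of_user = {k: {m for m in range(M) if L[k][m]} for k in range(K)}  # A_k
    return users_of_bs, bss_of_user
```

A user with |A_k| > 1 is a candidate for the cooperative transmission schemes discussed below.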
The caching variables {x_nm} shall be determined collaboratively by the BSs to improve the probability that the users' requested files can be found in the caches of the BSs, i.e., the hit probability. Meanwhile, the cooperative caching strategy, denoted by X, should also be carefully designed to provide flexible and cooperative transmission opportunities for each user.

When user u_k makes a request for file f_n, the serving BSs A_k jointly decide how to transmit to this user based on the caching strategy X. Specifically, when file f_n is cached at one or multiple BSs, the BSs transmit this file to the user directly by employing some transmission scheme, e.g., non-cooperative transmission or cooperative beamforming, as shown in Fig. 1. When file f_n has not been cached at any serving BS of the user, the associated BSs A_k fetch the file from a content server via backhaul links before they transmit it to user u_k over wireless channels.

The users' file delivery performance depends not only on the cache placement strategy but also on the specific transmission schemes applied to deliver the files to the users. In the following, we discuss the file delivery rates for some typical physical-layer transmission schemes, when the requested file is cached at one or multiple associated BSs.
1) Non-cooperative Transmission:
When user u_k is served by one single BS a_m, a non-cooperative transmission scheme can be applied by this BS to transmit the file to the user directly, if the requested file f_n is cached at this BS. Assume that efficient interference management schemes are applied and the interference power is constrained by a fixed value χ. Let SINR_m = P_m / (N_0 B + χ) denote the target signal-to-interference-plus-noise ratio (SINR) at the transmitter side, where P_m is the average transmission power at BS a_m, N_0 is the power spectral density of the noise, and B is the system bandwidth. The file delivery rate in time slot i can be estimated as

R_nk(X, i) = B log_2 (1 + |h_km(i)|^2 l_km x_nm SINR_m),   (1)

where h_km(i) denotes the channel coefficient between user u_k and BS a_m in time slot i.
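As a concrete sketch of the non-cooperative rate, the following assumes the standard Shannon form of (1); function names and any numeric values used with them are illustrative, not from the paper:

```python
import math

# Sketch of the non-cooperative delivery rate in (1):
#   R = B * log2(1 + |h|^2 * l_km * x_nm * SINR_m),
# with the target SINR of BS a_m given by SINR_m = P_m / (N0 * B + chi).

def sinr_target(P_m, N0, B, chi):
    # P_m: transmit power, N0: noise PSD, B: bandwidth, chi: interference bound
    return P_m / (N0 * B + chi)

def rate_noncoop(B, h_abs, l_km, x_nm, sinr_m):
    # The rate is zero unless the user is covered (l_km = 1) and the
    # requested file is cached at the BS (x_nm = 1).
    return B * math.log2(1.0 + (h_abs ** 2) * l_km * x_nm * sinr_m)
```

Note that a cache miss (x_nm = 0) drives the direct-delivery rate to zero, which is exactly what forces the backhaul fetch discussed below.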
2) Cooperative Beamforming:
When user u_k is served by multiple BSs, cooperative beamforming can be applied by the associated BSs A_k, if file f_n has been cached at multiple BSs and the instantaneous channel state information is available. During the file delivery phase, the cooperative beamformer can be created, possibly in a distributed way, to avoid signaling overhead [30]. Accordingly, the file delivery rate in time slot i is estimated as

R_nk(X, i) = B log_2 (1 + Σ_{a_m ∈ A_{k,n}} |h_km(i)|^2 x_nm SINR_m),   (2)

where A_{k,n} ⊆ A_k denotes the set of BSs that transmit file f_n to user u_k via cooperative beamforming.

In this work, we aim at finding the optimal cache placement strategy to minimize the average download delay, considering different candidate transmission schemes for each user, as will be presented in the next section.

III. PROBLEM FORMULATION FOR CACHE PLACEMENT
In this section, we first show how to calculate the average download delay by applying martingale theory [31]. Then, we formulate the cache placement problem.

Let D̄_nk(X) denote the average delay for user u_k to download file f_n from its serving BSs for a given caching strategy X and a specific transmission scheme. When file f_n has been cached at one or multiple BSs, user u_k can download this file from the associated BSs with rate R_nk(X, i) (c.f. (1)-(2)) in each time slot i. In this case, it takes at least T*_nk(X) time slots for user u_k to successfully receive all the bits of file f_n. The minimum number of time slots T*_nk(X) can be evaluated as

T*_nk(X) = min { T : Σ_{i=1}^T R_nk(X, i) Δt ≥ |f_n| },   (3)

where Δt is the duration of one time slot. Thus, for user u_k, the average delay of downloading file f_n is expressed as

D̄_nk(X) = E_h{T*_nk(X)} Δt.   (4)

When file f_n has not been cached at any associated BS, one or multiple serving BSs of user u_k, denoted by A'_k, should first fetch the file from the content server via the backhaul link before delivering the requested file to this user over the wireless channel. Let D_nk denote the extra delay of downloading file f_n from the content server to the BSs A'_k. We then evaluate the average download delay under the assumption that the channel coefficients {h_km(i)} are identically and independently distributed (i.i.d.) across the time slots i in the following theorem.

Theorem 1.
If the channel coefficients {h_km(i)} are i.i.d. across the time slots, the average delay for user u_k to download file f_n can be expressed as

D̄_nk(X) = |f_n| / E_h{R_nk(X)},  if Σ_{a_m ∈ A_k} x_nm > 0,
D̄_nk(X) = D_nk + |f_n| / E_h{R_nk(X_k)},  if Σ_{a_m ∈ A_k} x_nm = 0,   (5)

where E_h{·} denotes the expectation over the channel coefficients {h_km(i)} and X_k is a caching strategy with x_nm = 1 for a_m ∈ A'_k.

Proof: The proof is deferred to Appendix A.

From this theorem, we can evaluate the average download delay by (5) for any given caching strategy and employed transmission scheme. Without loss of generality, we assume that the users' average delay of downloading file f_n from the content server is larger than the average delay of direct file delivery from the BSs, so that the following inequality holds:

|f_n| / E_h{R_nk(X_k)} + D_nk > max_{Σ_{a_m ∈ A_k} x_nm > 0} { |f_n| / E_h{R_nk(X)} }.   (6)

If D_nk is much larger than |f_n| / E_h{R_nk(X_k)}, the average delay D̄_nk(X) can be approximated by D_nk when Σ_{a_m ∈ A_k} x_nm = 0. Notice that D_nk is the sum of the delay of file delivery within the Internet, which mainly depends on the level of congestion in the network, and the delay of file delivery via backhaul links, which may depend on the backhaul capacities and the caching strategy X. Considering all these effects, the impact of the caching strategy X on the delay D_nk is negligible. Hence, we assume that the average delay D_nk is fixed and can be evaluated by the average time of downloading file f_n from the content server to the serving BSs of user u_k.

In the considered system, we seek to design transmission aware cache placement strategies to minimize the average delay of all the users, by taking the different candidate transmission schemes for each user into consideration.
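A Monte-Carlo sketch of the delay expression in (5): when some serving BS caches the file, the average delay is roughly the file size over the mean rate; otherwise the backhaul delay D_nk is added. Rayleigh fading (|h|^2 ~ Exp(1)), the shared rate in both branches, and all numeric values are assumptions for illustration, not taken from the paper:

```python
import math
import random

def avg_rate(B, sinr, trials=20000, seed=0):
    # E_h{ B log2(1 + |h|^2 * SINR) } estimated by Monte-Carlo sampling,
    # with |h|^2 drawn from Exp(1) (Rayleigh fading assumption).
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        acc += B * math.log2(1.0 + rng.expovariate(1.0) * sinr)
    return acc / trials

def avg_delay(file_bits, cached, B, sinr, D_backhaul):
    # (5): file size over mean delivery rate, plus the backhaul delay D_nk
    # when no serving BS caches the file.
    mean_rate = avg_rate(B, sinr)
    return file_bits / mean_rate + (0.0 if cached else D_backhaul)
```

With identical rate statistics in both branches, a cache hit saves exactly the backhaul delay, which is the gain the cache placement problem below tries to maximize.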
Formally, the cache placement problem can be formulated as follows:

minimize_{x_nm}  D̄(X) = (1/K) Σ_{k=1}^K Σ_{n=1}^N p_nk D̄_nk(X)
subject to  Σ_{n=1}^N x_nm ≤ Q_m, ∀ a_m ∈ A,   (7a)
            x_nm ∈ {0, 1}, ∀ f_n ∈ F, a_m ∈ A,   (7b)

where constraint (7a) means that each BS a_m is allowed to store at most Q_m files. Since the variable x_nm is binary, Problem (7) is a constrained integer programming problem, which is generally NP-hard [32]. Hence, it is very challenging to find the optimal solution X* to Problem (7). In the next two sections, we show how to approach the optimal cache placement strategy in the centralized and distributed modes of Fog-RANs, respectively.

IV. SUBMODULAR OPTIMIZATION BASED CENTRALIZED CACHE PLACEMENT ALGORITHM
As a powerful tool for solving combinatorial optimization problems, submodular optimization is applied when Fog-RANs operate in the centralized mode with the aid of a central controller. In this section, Problem (7) is first reformulated into a monotone submodular maximization problem subject to a matroid constraint. A centralized low-complexity greedy algorithm is then proposed to obtain a suboptimal cache placement strategy with guaranteed performance. The basic concepts of matroids and submodular functions can be found in [26].
A. Matroid Constrained Submodular Optimization
We first define the ground set for cache placement as

S = { f_1^(1), ..., f_N^(1), ..., f_1^(M), ..., f_N^(M) },   (8)

where f_n^(m) denotes the event that file f_n is placed in the cache of BS a_m. The ground set S contains all possible caching decisions which can be made in the system. In particular, we use

S_m = { f_1^(m), f_2^(m), ..., f_N^(m) }, ∀ m = 1, 2, ..., M,   (9)

to denote the set of all files that might be placed in the cache of BS a_m. Thus, the ground set S can be partitioned into M disjoint sets, i.e., S = ∪_{m=1}^M S_m with S_m ∩ S_m' = ∅ for any m ≠ m'.

Given the finite ground set S, we continue to define a partition matroid M = (S; I), where I ⊆ 2^S is the collection of independent sets defined as

I = { X ⊆ S : |X ∩ S_m| ≤ Q_m, ∀ m = 1, 2, ..., M },   (10)

which accounts for the constraint on the cache capacity Q_m at each BS a_m (c.f. (7a)). The set of files placed in the cache of BS a_m can be denoted by X_m = X ∩ S_m.

Then, we show that the average delay is a monotone supermodular set function over the ground set S. Note that every set has an equivalent boolean representation. For any X ⊆ S, the incidence vector of X is the vector µ ∈ {0, 1}^{|S|} whose i-th element is defined as

µ_i ≐ x_nm,  i = (m − 1)N + n,   (11)

where ≐ represents the mapping between x_nm and µ_i. In the set X ⊆ S, f_n^(m) ∈ X indicates µ_i = x_nm = 1; otherwise, µ_i = x_nm = 0. Similarly, the boolean representation of the subset X_m is denoted by µ_m. In this context, the delay function D̄_nk(X) is equivalent to the set function D̄_nk(X) over the set X ⊆ S. The property of D̄_nk(X) is summarized in the following theorem.

Theorem 2. D̃_nk(X) = −D̄_nk(X) is a monotone submodular function defined over X ∈ I.

Proof:
The proof is deferred to Appendix B.

From [26], the class of submodular functions is closed under non-negative linear combinations. Therefore, for p_nk ≥ 0 with k = 1, 2, ..., K and n = 1, 2, ..., N, the set function

D̃(X) = (1/K) Σ_{k=1}^K Σ_{n=1}^N p_nk D̃_nk(X)   (12)

is also monotone submodular.

By taking the partition matroid M = (S; I) (c.f. (10)) into consideration, Problem (7) can be reformulated into a matroid constrained monotone submodular maximization problem:

maximize  D̃(X) = (1/K) Σ_{k=1}^K Σ_{n=1}^N p_nk D̃_nk(X)
subject to  X ∈ I,   (13)

where the constraint X ∈ I (c.f. (10)) shows that each BS a_m can cache up to Q_m files.

Algorithm 1 Centralized algorithm for cache placement
Set X ← ∅ and Y ← S; set X_m ← ∅ and Y_m ← S_m for m = 1, 2, ..., M;
Calculate Δ_X(s) for each element s ∈ S \ X;
repeat
  Select the element f_n^(m) with the highest marginal gain, f_n^(m) = argmax_{s ∈ S\X, X∪{s} ∈ I} Δ_X(s);
  Add f_n^(m) to the sets X and X_m: X ← X ∪ {f_n^(m)}, X_m ← X_m ∪ {f_n^(m)};
  Remove f_n^(m) from the sets Y and Y_m: Y_m ← Y_m \ {f_n^(m)}, Y ← Y \ {f_n^(m)};
  if |X_m| = Q_m then
    Y ← Y \ Y_m;
  end if
  Calculate Δ_X(s) for each element s ∈ S \ X;
until Y = ∅ or Δ_X(s) = 0 for all s ∈ S \ X

B. Centralized Algorithm Design for Cache Placement
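Algorithm 1 above can be sketched compactly in Python. This is a hypothetical illustration, not the paper's implementation: files and BSs are plain indices, and `gain(X, s)` stands in for the marginal-gain oracle Δ_X(s) of the submodular objective:

```python
# Greedy cache placement over the ground set of (file, BS) pairs, subject
# to the per-BS capacities Q[m] (the partition-matroid constraint).

def greedy_cache_placement(M, N, Q, gain):
    X = set()                                     # selected (n, m) placements
    ground = {(n, m) for n in range(N) for m in range(M)}
    load = [0] * M                                # files currently cached per BS
    while True:
        # feasible = elements whose addition keeps X independent in the matroid
        feasible = [s for s in ground - X if load[s[1]] < Q[s[1]]]
        if not feasible:
            break                                 # every BS cache is full
        best = max(feasible, key=lambda s: gain(X, s))
        if gain(X, best) <= 0:
            break                                 # no positive marginal gain left
        X.add(best)
        load[best[1]] += 1
    return X
```

For a modular (popularity-only) gain, the sketch simply fills each cache with the most popular files, matching the intuition behind the greedy rule.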
We adopt a greedy algorithm [26] to find a suboptimal solution to Problem (13) in a centralized way. Define the marginal gain of adding one element s ∈ S \ X to the set X as

Δ_X(s) = D̃(X ∪ {s}) − D̃(X).   (14)

At first, X and X_m are initialized to be the empty set ∅, while Y and Y_m are initialized as the sets S and S_m, respectively. In each step, we calculate the marginal gain Δ_X(s) for each element s ∈ S \ X and select the element f_n^(m) with the highest marginal gain, i.e.,

f_n^(m) = argmax_{s ∈ S\X, X∪{s} ∈ I} Δ_X(s),   (15)

where X ∪ {s} ∈ I indicates that adding the new element f_n^(m) into the current set X does not violate the cache capacity constraint at each BS a_m. Then, we add this element f_n^(m) to the set X_m as well as the set X, and remove it from the sets Y and Y_m at the same time. When the set X_m has accumulated Q_m elements, the set Y_m is removed from the set Y, which means that BS a_m has cached up to Q_m files and has no space for any more files. This step runs repeatedly until no more elements can be added, i.e., the marginal gain Δ_X(s) is zero for all s ∈ S \ X or the set Y becomes empty. The above procedure is summarized in Algorithm 1. According to [33], the greedy algorithm can achieve a 1/2-ratio of the optimal value in general. The computation complexity of the centralized algorithm can be estimated as O(NMK) in the worst case.

V. BELIEF PROPAGATION BASED DISTRIBUTED CACHE PLACEMENT ALGORITHM
When Fog-RANs operate in the distributed mode, there exists no central controller. The BSs should carry out a distributed algorithm for cache placement autonomously, relying on locally collected network-side and user-related information, as well as local interactions between BSs in the neighborhood. In this section, we propose a belief propagation based distributed algorithm to perform cooperative caching. The basic concept of the message passing procedure can be found in Appendix C.
A. Factor Graph Model for Cache Placement
To apply the belief propagation based distributed algorithm, Problem (7) is first transformed into an unconstrained optimization problem, as presented in Lemma 3. To this end, we define two functions of the caching strategy X as:

η_nk(X) = exp(−p_nk D̄_nk(X)),   (16)

g_m(X) = { 1, if Σ_{n=1}^N x_nm ≤ Q_m; 0, otherwise }.   (17)

Lemma 3.
Let C = {(f_n, u_k) | p_nk > 0, f_n ∈ F, u_k ∈ U} denote the set of all possible pairs of file f_n and user u_k. Problem (7) is equivalent to the following problem:

X̂ = argmax_{X ∈ {0,1}^{NM}} Π_{(f_n,u_k) ∈ C} η_nk(X) Π_{m=1}^M g_m(X).   (18)

Proof:
Problem (7) is equivalent to maximizing −Σ_{k=1}^K Σ_{n=1}^N p_nk D̄_nk(X) subject to the constraints Σ_{n=1}^N x_nm ≤ Q_m for all m. By introducing the exponential function η_nk(X) given by (16) and the indicator function g_m(X) given by (17), the equivalent optimization problem is converted into a product form, as presented in (18).

In (18), η_nk(X) is used to measure the delay performance when transmitting file f_n to user u_k, and g_m(X) imposes a strict constraint on the cache capacity of BS a_m.

Then, we present the factor graph model for the optimization problem (18). According to the network topology (e.g., Fig. 2(a)), we introduce a variable node µ_i for each element x_nm and a function node F_j for each function η_nk(X) or g_m(X), as shown in Fig. 2(b). The mapping rule from x_nm to µ_i is given by (11), and the mapping rule from η_nk(X) or g_m(X) to F_j is expressed as

F_j ≐ η_nk for j = Σ_{l=1}^{k−1} |F_l| + ξ(n, k),  F_j ≐ g_m for j = Σ_{k=1}^K |F_k| + m,   (19)

where F_k = {f_n | p_nk > 0} denotes the set of files which may be requested by user u_k, |F_k| is the number of elements in the set F_k, and ξ(n, k) denotes the index of file f_n in the set F_k.

In the bipartite factor graph (e.g., Fig. 2(b)), each variable node µ_i ≐ x_nm is adjacent to the function nodes {F_j} ≐ {η_nk} ∪ {g_m} for all u_k ∈ U_m. Similarly, each function node F_j ≐ η_nk is connected to the variable nodes {µ_i ≐ x_nm} for all a_m ∈ A_k. Each function node F_j ≐ g_m is adjacent to the variable nodes {µ_i ≐ x_nm} for all f_n ∈ F. Hence, there are I = NM variable nodes and J = M + Σ_{k=1}^K |F_k| function nodes in this factor graph model.

B. Message Passing Procedure for Cache Placement
Our goal is to design a message-passing procedure which allows us to gradually approach the optimal solution to (18).

Fig. 2. An illustrative example: (a) a system with 2 BSs, 3 users, and a library of N files; (b) the factor graph model.
1) Message Update:
Let m^t_{µi→Fj}(x) denote the message from a variable node µ_i to a function node F_j, and m^t_{Fj→µi}(x) denote the message from a function node F_j to a variable node µ_i, respectively. The updates of the messages m^t_{µi→Fj}(x) and m^t_{Fj→µi}(x) are given by (31) and (32), respectively. Since all the variables {x_nm} are binary, it is sufficient in practice to pass the scalar ratio of the messages between each pair of nodes. We can also express the message ratios in the logarithmic domain as

α^t_{i→j} = log( m^t_{µi→Fj}(1) / m^t_{µi→Fj}(0) ),  β^t_{j→i} = log( m^t_{Fj→µi}(1) / m^t_{Fj→µi}(0) ).   (20)

In this way, the computation complexity and communication overhead are greatly reduced, because only half of the messages are actually calculated and passed. As shown in Fig. 2(b), the message α^t_{i→j}, instead of m^t_{µi→Fj}(x) (x ∈ {0, 1}), is sent from the variable node µ_i to the function node F_j, and the message β^t_{j→i}, instead of m^t_{Fj→µi}(x) (x ∈ {0, 1}), is sent from the function node F_j to the variable node µ_i. Meanwhile, the product operations in (31) and (32) become simple additive operations in the logarithmic domain, as presented in the following theorem.

Theorem 4.
The message α^t_{i→j} is updated as

α^{t+1}_{i→j} = Σ_{l ∈ Γ_{µi} \ {j}} β^t_{l→i}.   (21)

When F_j ≐ η_nk, the message β^{t+1}_{j→i} is given by

β^{t+1}_{j→i} = p_nk ( D̄_nk(X^t_{i,0}) − D̄_nk(X^t_{i,1}) ),   (22)

where the caching vectors X^t_{i,0} and X^t_{i,1} can be obtained by assigning their elements as

x_nm ≐ µ_l = 1 if l ∈ E^t_i = { l ∈ Γ_{Fj} \ {i} | α^t_{l→j} > 0 }, and 0 otherwise,

and

x_nm ≐ µ_l = 1 if l ∈ E^t_i ∪ {i}, and 0 otherwise,

respectively. When F_j ≐ g_m, the message β^{t+1}_{j→i} is updated as

β^{t+1}_{j→i} = min{ 0, −α^{(Q_m)}_{l→j}(t) },   (23)

where α^{(Q_m)}_{l→j}(t) is the Q_m-th message among the messages {α^t_{l→j}} (l ∈ Γ_{Fj} \ {i}) sorted in descending order.

Proof: The proof is deferred to Appendix D.

In practice, the messages α^t_{i→j} and β^t_{j→i} reflect the beliefs on the value of µ_i and should be updated according to (21) and (22) (or (23)), respectively, in each iteration.
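The updates (21) and (23) can be sketched as follows (hypothetical container types: dicts mapping neighbor index to current message value; the assumption that the neighbor list has at least Q_m other entries is left unchecked for brevity):

```python
# Log-domain message updates of Theorem 4.

def update_alpha(beta_in, j):
    # (21): alpha_{i->j} is the sum of incoming beta messages, excluding F_j.
    return sum(b for l, b in beta_in.items() if l != j)

def update_beta_capacity(alpha_in, i, Q_m):
    # (23): beta_{j->i} = min{0, -alpha^(Q_m)}, where alpha^(Q_m) is the
    # Q_m-th largest of the messages alpha_{l->j} with l != i.
    others = sorted((a for l, a in alpha_in.items() if l != i), reverse=True)
    return min(0.0, -others[Q_m - 1])
```

The capacity message is never positive: it can only discourage caching a file at a BS whose cache is contested by Q_m stronger candidates.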
2) Belief Update:
In the t-th iteration, the belief on µ_i = x is expressed as

b^{t+1}_i(x) = Π_{j ∈ Γ_{µi}} m^t_{Fj→µi}(x),   (24)

which is the product of all the messages incident to µ_i. Hence, the belief ratio in the logarithmic domain can be obtained as

b̃^t_i = log( b^t_i(1) / b^t_i(0) ) = Σ_{j ∈ Γ_{µi}} β^t_{j→i},   (25)

where β^t_{j→i} is given by (23) for F_j ≐ g_m, and by (22) for F_j ≐ η_nk, respectively. As a result, the estimate of µ_i can be expressed as

µ̂^t_i = 1 if b̃^t_i > 0, and µ̂^t_i = 0 if b̃^t_i < 0.   (26)

In each iteration, each variable node µ_i updates its belief on its associated variable x_nm according to (25) and makes an estimate of x_nm according to (26) until convergence.
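The belief update (25)-(26) reduces to a sum and a threshold in the log domain; a minimal sketch (hypothetical dict-based container):

```python
# (25)-(26): a variable node sums its incoming log-ratio messages and
# thresholds the result at zero to estimate its caching variable x_nm.

def estimate_variable(beta_in):
    belief = sum(beta_in.values())   # (25): log belief ratio
    return 1 if belief > 0 else 0    # (26): hard decision on x_nm
```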
When we map the message passing procedure derived on the factor graph (e.g., Fig. 2(b)) back to the original network graph (e.g., Fig. 2(a)), we notice that all the messages are updated at the BSs, and some of them are exchanged between neighboring BSs.

Algorithm 2 Distributed algorithm for cache placement
  Map η_nk and g_m to F_j, and x_nm to µ_i, for all n, k, m;
  Set t = 0 and α^t_{i→j} = β^t_{j→i} = 0 for all i, j;
  Set t_max as a sufficiently large constant.
  while not convergent and t ≤ t_max do
    for m = 1 : M do
      for n = 1 : N do
        Calculate the message α^t_{i→j} by (21);
        for k ∈ Ũ_m do
          Calculate the message β^t_{j→i} for F_j ≐ η_nk by (22);
        end for
      end for
      Calculate the message β^t_{j→i} for F_j ≐ g_m by (23);
      Calculate the belief b̃^t_i by (25);
      Estimate each variable µ̂_i by (26);
    end for
    Check the convergence, and set t = t + 1;
  end while
  Obtain the final estimate X̂ of the solution to (18).
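To make the iteration concrete, the following toy sketch (an illustrative simplification under stated assumptions, not the authors' code) runs the loop of Algorithm 2 for a single BS with N candidate files: each delay factor is assumed to contribute a fixed scalar gain, so the α messages stay constant, and a single capacity factor enforces at most Q cached files via the update (23):

```python
def bp_cache_single_bs(gain, Q, t_max=50):
    """Toy message-passing loop: one BS, N candidate files, each file n with a
    fixed delay-reduction gain[n] (standing in for the eta-factor message), and
    one capacity factor allowing at most Q cached files."""
    N = len(gain)
    alpha = list(gain)              # variable-to-capacity messages (constant here)
    beta = [0.0] * N                # capacity-to-variable messages
    est = [0] * N
    for _ in range(t_max):
        for n in range(N):
            # Capacity update (23): beta = min(0, -alpha^(Q)) over the other files.
            others = sorted((alpha[l] for l in range(N) if l != n), reverse=True)
            beta[n] = min(0.0, -others[Q - 1]) if len(others) >= Q else 0.0
        # Belief (25) and hard decision (26) at each variable node.
        new_est = [1 if gain[n] + beta[n] > 0 else 0 for n in range(N)]
        if new_est == est:          # convergence check
            break
        est = new_est
    return est
```

In this degenerate setting the capacity message simply suppresses all but the Q largest gains, so the fixed point caches the top-Q files; this matches the role of g_m in pruning the beliefs of weakly preferred files.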
1) Scenario I:
When user u_k is connected to one single BS a_m, as shown in Fig. 2(b), the updates of the messages α^t_{i→j} and β^t_{j→i} are performed at this BS for the variable node µ_i ≐ x_nm and the function nodes F_j ≐ η_nk and F_j ≐ g_m. In this case, each BS a_m performs the message calculation and belief update for all the users served by itself alone, i.e., u_k ∈ U_m with |A_k| = 1.
2) Scenario II:
When user u_k is in the coverage of multiple BSs A_k, the update of the messages α^t_{i→j} and β^t_{j→i} associated with the function node F_j ≐ η_nk is performed at one BS a_m, and the results are exchanged among the serving BSs A_k of this user over control links, as shown in Fig. .

Notice that message exchanges take place only in Scenario II, and the induced communication overhead depends on the number of common users covered by multiple BSs. Based on the above discussion, the message passing based distributed algorithm for cache placement is summarized in Algorithm 2. In this algorithm, the message update for each user should be performed exactly once, by one single BS, in each iteration. To avoid confusion, Ũ_m denotes the set of users whose messages are processed by BS a_m in Algorithm 2.

VI. SIMULATION RESULTS
In this section, we present simulation results to demonstrate the performance of the proposed cache placement algorithms, i.e., Algorithm 1 and Algorithm 2. We consider a Fog-RAN with M BSs and K mobile users. Each BS serves the users in a circular cell with a radius of  m, and the distance between neighboring BSs is  m. The K users are uniformly and independently distributed in the area covered by the M cells. The file requests of each user u_k follow a Zipf distribution with parameter γ_k. Users in the cell interior are served by one single BS, while users in the overlapping area of cells are covered by multiple BSs, so that cooperative transmission may be enabled. The connectivity between the BSs and the users is thus established.
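The per-user request model defined below (Zipf preferences with a user-specific random permutation) and the popularity-based cache selection used by the GPC/LPC baselines can be sketched as follows; this is a hedged illustration, and the function names and NumPy-based structure are assumptions of this sketch:

```python
import numpy as np

def zipf_preferences(N, gamma, rng):
    """Per-user request distribution p_nk = phi(n)^(-gamma) / sum_n n^(-gamma),
    where phi is a user-specific random permutation of [1, ..., N]."""
    phi = rng.permutation(np.arange(1, N + 1)).astype(float)
    weights = phi ** (-gamma)
    return weights / weights.sum()

def popular_caching(pref, Q):
    """Cache the Q files with the largest average preference: passing the rows
    of all K users gives GPC, passing only the rows of one BS's users gives LPC."""
    popularity = pref.mean(axis=0)              # average preference per file
    return set(np.argsort(-popularity)[:Q])     # indices of the Q most popular files
```

Stacking one `zipf_preferences` row per user yields the preference matrix [p_nk]; the transmission aware strategies additionally use the per-user rows and the delivery model, which the popularity-based baselines ignore.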
[Fig. 3 here: (a) the average download delay and (b) the average hit probability vs. the cache capacity Q, comparing Algorithm 1 (CoTC and Non-CoTC), Algorithm 2, and the LPC/GPC baselines with cooperative and non-cooperative transmission.]

Fig. 3. The average delay and hit probability of the proposed caching strategies when γ_k = 0. and N = 1000.

Suppose that the system bandwidth is  MHz, and the length of each time slot is  ms. The file size is equal to  Mbits. The path-loss exponent is set as . The small-scale channel gain |h_km|² follows an independent standard exponential distribution in each time slot. We assume that no inter-cell interference is induced, owing to appropriate scheduling policies, and the transmit power is set such that the average received SNR at the cell edge is equal to  dB. Unless otherwise stated, we set K = 100, M = 10, and D_nk = 40 s. Suppose that each user u_k requests file f_n with probability

p_nk = (φ(n))^{−γ_k} / Σ_{n=1}^N n^{−γ_k},

where {φ(n)}_{n=1}^N is a random permutation of [1, · · · , N], i.e., we assume that different users have different request distributions.

In the considered system, we compare two transmission aware caching strategies and two baseline popular caching strategies: 1) the non-cooperative transmission aware caching (Non-CoTC) strategy, which is designed based on the prior knowledge that each individual user has the file preference p_nk and is served by one serving BS using the non-cooperative transmission given by (1); 2) the cooperative transmission aware caching (CoTC) strategy, which is designed based on the prior knowledge that each individual user has the file preference p_nk and is served either by one BS using the non-cooperative transmission given by (1), or by multiple BSs using the cooperative beamforming given by (2), depending on the connectivity between the user and the BSs; 3) the globally popular caching (GPC) strategy, which caches the Q_m most popular files at each BS a_m based on the network-wide file popularity {p̃_n}.
Here, the file popularity is evaluated as p̃_n = (1/K) Σ_{k=1}^K p_nk, i.e., the average of the file preferences of all users in the network; 4) the locally popular caching (LPC) strategy, which caches the Q_m most popular files at each BS a_m based on the local file popularity p̃^{(m)}_n = (1/|U_m|) Σ_{u_k ∈ U_m} p_nk, i.e., the average of the file preferences of the users served by BS a_m. The proposed transmission aware caching strategies can be performed in either a centralized or a distributed way, whereas the popular caching strategies are identical under centralized and distributed operation.

A. Performance Evaluation
We demonstrate the performances of the four considered caching strategies in two scenarios, γ_k = 0. with N = 1000 and γ_k = 0. ·k/K with N = 200, in Fig. 3 and Fig. 4, respectively. In each scenario, we plot the average download delay and the hit probability of these caching strategies in sub-figures (a) and (b), respectively, for different cache capacities Q_m = Q. When the proposed Non-CoTC or CoTC strategy is applied, the users' average download delay D̄(X) is computed by substituting the solution X obtained either by Algorithm 1 or by Algorithm 2. When the GPC or LPC strategy is applied, the average delay D̄(X) is obtained by substituting the GPC or LPC solution X. As shown in Fig. 3 and Fig. 4, the average download delay monotonically decreases as the cache capacity Q increases for any given caching strategy. This is because, with more storage capacity, more files are cached at each BS and more users can download files from their local BSs instead of the content server. For the same reason, the users' average hit probability monotonically increases with the cache capacity.

As shown in Fig. 3(a) and Fig. 4(a), the two transmission aware caching strategies, i.e., Non-CoTC and CoTC, achieve smaller average download delays than the two popular caching strategies, i.e., LPC and GPC, for any cache capacity Q less than N. Meanwhile, the average hit probabilities of the CoTC and Non-CoTC strategies are higher than or equal to that of the LPC strategy, and much higher than that of the GPC strategy when Q < N, as shown in Fig. 3(b) and Fig. 4(b). This is because the transmission aware caching strategies cache files at the BSs based on the accurate file preferences of individual users and on prior information about the content delivery techniques that will be applied by the BSs.
In contrast, the LPC and GPC strategies perform caching based on the file preference statistics of the users in each cell or in the whole network, which cannot reflect the file preferences of individual users.

The delay performance of the caching strategies depends not only on the users' hit performance, but also on the transmission schemes the BSs adopt to deliver the requested files. It is observed from Fig. 3(a) and Fig. 4(a) that the CoTC strategy performs much better than the Non-CoTC strategy in terms of both the average delay and the hit probability. The delay performance gap between the two transmission aware caching strategies grows as the cache capacity increases, since more files can be cached to facilitate cooperative transmission for cell-edge users. In other words, the CoTC strategy can exploit both the caching gain and the cooperation gain to reduce the average delay. Hence, the design of caching strategies should target not only improving the users' average hit probability, but also creating more cooperative transmission opportunities. Similarly, for any caching strategy, the delay performance is significantly improved when cooperative transmission is applied instead of non-cooperative transmission.

At the same time, the skewness of the users' content popularity has a great impact on the performances of the considered caching strategies. When γ_k = 0. , each user is interested in a large number of files, while only a very small number of files can be cached locally at the serving BSs of each user when Q is less than N. From Fig. 3(a), the delay gap between the CoTC (or Non-CoTC) strategy and the LPC strategy is not very large, and the GPC strategy, which caches the same files at every BS, achieves the worst delay and hit performances. When γ_k = 0. ·k/K, the skewness of the content popularity differs considerably among users: some users are interested in many files, while others have preferences for very few files. In contrast to the case with γ_k = 0.
, a higher proportion of the files that the users may request can be cached at the BSs. Therefore, the delay and hit performances of the considered caching strategies are all improved, and the delay gap between the CoTC (or Non-CoTC) strategy and the LPC strategy becomes very significant, especially when the cache capacity Q is small. It is also interesting to see that the delay performance of the LPC strategy is affected by the content delivery schemes applied by the BSs. As shown in Fig. 4(a), the LPC strategy always achieves a smaller average delay than the GPC strategy if cooperative transmission is adopted. However, it performs worse in the larger-Q region (Q > ) when non-cooperative transmission is applied. This happens when some users are served by BSs that have not cached their requested files, since the LPC strategy caches files based on the file preferences of co-located users and thus pushes quite different contents to each BS.

From Fig. 3 and Fig. 4, the proposed belief propagation based distributed algorithm achieves a delay performance nearly identical to that of the centralized greedy algorithm, which provides a guaranteed performance [33], i.e., a 1/2-approximation in the general case and a (1 − 1/e)-approximation in some special cases. The distributed algorithm has a slightly larger delay in the small-capacity region (e.g., when Q is around ), and achieves almost the same performance as the centralized algorithm in the other scenarios.

B. Approximation of File Preferences
[Fig. 4 here: (a) the average download delay and (b) the average hit probability vs. the cache capacity Q, comparing Algorithm 1 (CoTC and Non-CoTC), Algorithm 2, and the LPC/GPC baselines with cooperative and non-cooperative transmission.]

Fig. 4. The average delay and hit probability performances of the proposed caching strategies when γ_k = 0. ·k/K and N = 200.

[Fig. 5 here: the average download delay vs. the parameter γ, for the centralized and distributed algorithms with perfect and approximate preferences, and for the popular caching algorithm.]

Fig. 5. The average download delay vs. the parameter γ.

[Fig. 6 here: the objective value in each iteration of the distributed algorithm, for Q = 10, 30, 50, 70, 90.]

Fig. 6. The iterative procedure of the proposed distributed algorithm.

In practice, it is very challenging to accurately estimate the file preference of each individual user due to the lack of sufficient samples. Instead, each BS may estimate an approximate file preference for all the users in its coverage, i.e., estimate the average preference. In this part, we discuss the impact of knowing the users' file request preference statistics either perfectly or approximately. In Fig. 5, we show how the average download delay changes with the content popularity skewness. In this experiment, all the users are assumed to have the same preference parameter γ_k = γ. The cache capacity is set as Q = 50 and the total number of files is N = 100. The approximate preference for file f_n is given by p̃_nk = (1/|U_m|) Σ_{u_k ∈ U_m} p_nk (∀ u_k ∈ U_m), i.e., only the statistical average over all the users in the coverage of each BS a_m is known, whereas perfect knowledge of p_nk includes the preference of each individual user. It is observed that the average delay is significantly reduced as the parameter γ increases within . ≤ γ ≤ . In this range, the users concentrate their preferences on fewer and fewer files as γ increases, which means that more and more of the requested files are cached at the BSs and can be transmitted to the users directly. As a result, the average download delay is greatly reduced as γ increases within this range. When γ > , almost all the requested files have been cached, and the average download delay is nearly equal to the average transmission time from the BSs to the users; in this case, the change of the average delay is marginal. In Fig. 5, we also plot the average delay performance when approximate file preferences are applied instead of accurate ones. It can be seen that the delay gap is very small.

In Fig. 6, we plot the iterative procedure of the belief propagation based distributed algorithm for different storage capacities Q_m = Q and N = 100. In this experiment, the CoTC strategy is performed in a distributed way.
It is observed that the average delay starts from an initial value, fluctuates for up to dozens of iterations, and gradually converges to a suboptimal solution.

[Fig. 7 here: the number of calculations vs. the cache capacity Q, for the centralized algorithm, the distributed algorithm at each BS, and the distributed algorithm over all BSs.]

Fig. 7. The number of calculations vs. the cache capacity Q.

C. Algorithm Complexity
We now discuss the computation complexity of the proposed centralized and distributed algorithms when performing the CoTC strategy. Here, we measure the computation complexity by the number of calculations required by the algorithms. In Fig. 7, we plot the computation complexity of the proposed algorithms versus the cache capacity Q. In this experiment, the number of BSs and the number of users are set as M = 10 and K = 100, and the total number of files is set to . It can be seen that the computation complexity of the centralized algorithm rapidly increases with the cache capacity Q, while that of the distributed algorithm increases very slowly. This indicates that the cache capacity has a greater impact on the computation complexity of the centralized algorithm than on that of the distributed algorithm, since more elements are greedily added and more iterations are processed in the centralized algorithm as the cache capacity Q increases. In the distributed algorithm, the cache capacity is a parameter that only adjusts the values of the messages during the iterations; it does not change the factor graph model, and hence has no significant impact on the computation complexity.

VII. CONCLUSIONS
In this work, we studied the cache placement problem in Fog-RANs, taking into account the different file preferences and diverse transmission opportunities of each user. We developed transmission aware cache placement strategies for both the centralized and distributed operation modes of Fog-RANs. In the centralized mode, a low-complexity centralized greedy algorithm was proposed to achieve a suboptimal solution within a constant factor of the optimum using submodular optimization techniques. In the distributed mode, a low-complexity belief propagation based distributed algorithm was proposed to place files at the BSs based on locally collected information. Each BS runs computations and iteratively exchanges very few messages with its neighboring BSs until convergence. By simulations, we showed that both of the proposed algorithms can not only improve the users' cache hit probability but also provide more flexible cooperative transmission opportunities for the users. As a result, our proposed centralized and distributed cache placement algorithms can significantly improve the file delivery performance by providing cooperative transmission opportunities for mobile users to the maximum extent. It was also shown that the distributed cache placement algorithm can achieve an average delay performance comparable to that of the centralized algorithm while requiring far fewer calculations at each individual BS.

APPENDIX
A. Proof of Theorem 1
In the scenario when file f_n has been cached at one or multiple serving BSs of user u_k, i.e., Σ_{a_m ∈ A_k} x_nm ≠ 0, the associated BSs can transmit to user u_k with rate R_nk(X, i) (c.f. (1)-(2)) by applying some specific transmission scheme. Since the channel coefficients h_km(i) are i.i.d. across the time slots {i}, the file delivery rates R_nk(X, i) are i.i.d. random variables. Hence, the stopping time of completing the transmission of file f_n, T*_nk(X), given by (3), is also a random variable. By the definition of channel capacity, we have R_nk(X, i) ≥ 0 for i = 1, 2, · · · , T*_nk(X). According to Wald's equation in martingale theory [31], we have

E_h{ Σ_{i=1}^{T*_nk(X)} R_nk(X, i) } = E_h{T*_nk(X)} · E_h{R_nk(X)} = |f_n| / Δt.   (27)

Therefore, the average download delay is expressed as

D̄_nk(X) = E_h{T*_nk(X) · Δt} = |f_n| / E_h{R_nk(X)},   (28)

when file f_n is cached at the associated BSs, i.e., Σ_{a_m ∈ A_k} x_nm ≠ 0. When Σ_{a_m ∈ A_k} x_nm = 0, file f_n has not been cached at any serving BS of user u_k. The BSs A′_k download this file from the content server over the backhaul links and then transmit it to user u_k over the wireless channel. Accordingly, the average delay can be estimated by D̄_nk(X) = D_nk + |f_n| / E_h{R_nk(X_k)}, where D_nk is the extra delay of delivering the file from the content server to the serving BSs A′_k, and R_nk(X_k) is the data rate at which the BSs A′_k transmit file f_n to user u_k over the wireless channel. Here, X_k is an equivalent caching strategy indicating that file f_n can be downloaded from the BSs A′_k by user u_k. Thus, the average delay D̄_nk(X) is established as in (5).

B. Proof of Theorem 2
From Theorem 1, the average delay of downloading file f_n for user u_k, presented in (5), can also be expressed as

D̄_nk(X) = |f_n| / R̄_nk(X) if Σ_{m=1}^M x_nm ≠ 0, and D̄_nk(X) = D_nk + |f_n| / R̄_nk(X_k) otherwise,   (29)

where R̄_nk(X) = E{ B log(1 + Y_nk(X)) } with Y_nk(X) = Σ_{m=1}^M |h_km|² x_nm SINR_m representing the received SINR. We will show that the average delay ˜D_nk(X) = −D̄_nk(X) is a monotone submodular function.

Let X ⊆ X′ ∈ I, and s ∈ S \ X′. The incidence vectors of X and X′ are denoted by X = [x_nm] and X′ = [x′_nm], respectively. If s ≠ f^{(m)}_n for every m ∈ A_k, we have ˜D_nk(X ∪ {s}) − ˜D_nk(X) = ˜D_nk(X′ ∪ {s}) − ˜D_nk(X′) = 0. We then consider the case when s = f^{(m*)}_n for some m* ∈ A_k.

Case I: X ⊆ X′ ∈ I and Σ_{m ∈ A_k} x_nm = Σ_{m ∈ A_k} x′_nm. In this case, X′ \ X contains no element f^{(m)}_n with m ∈ A_k, and D̄_nk(X) = D̄_nk(X′). Hence, we have ˜D_nk(X ∪ {s}) − ˜D_nk(X) = ˜D_nk(X′ ∪ {s}) − ˜D_nk(X′).

Case II:
X ⊆ X′ ∈ I and 0 < Σ_{m ∈ A_k} x_nm < Σ_{m ∈ A_k} x′_nm. According to the definition of R̄_nk(X), we have R̄_nk(X ∪ {s}) = E{ B log(1 + Y_nk(X) + |h_km*|² SINR_m*) }. Hence, R̄_nk(X) < R̄_nk(X′) and R̄_nk(X ∪ {s}) < R̄_nk(X′ ∪ {s}) naturally hold, due to Σ_{m=1}^M x_nm < Σ_{m=1}^M x′_nm and Y_nk(X) < Y_nk(X′). The gap between ˜D_nk(X ∪ {s}) and ˜D_nk(X) satisfies

˜D_nk(X ∪ {s}) − ˜D_nk(X)
= ( |f_n| / ( R̄_nk(X) R̄_nk(X ∪ {s}) ) ) · E{ B log( 1 + |h_km*|² SINR_m* / (1 + Y_nk(X)) ) }
(a)> ( |f_n| / ( R̄_nk(X′) R̄_nk(X′ ∪ {s}) ) ) · E{ B log( 1 + |h_km*|² SINR_m* / (1 + Y_nk(X)) ) }
(b)> ( |f_n| / ( R̄_nk(X′) R̄_nk(X′ ∪ {s}) ) ) · E{ B log( 1 + |h_km*|² SINR_m* / (1 + Y_nk(X′)) ) }
= ˜D_nk(X′ ∪ {s}) − ˜D_nk(X′),

where inequality (a) follows from R̄_nk(X) ≤ R̄_nk(X′) and R̄_nk(X ∪ {s}) ≤ R̄_nk(X′ ∪ {s}), and inequality (b) holds since Y_nk(X) < Y_nk(X′) implies B log(1 + |h_km*|² SINR_m* / (1 + Y_nk(X))) > B log(1 + |h_km*|² SINR_m* / (1 + Y_nk(X′))).

Case III:
X ⊆ X′ ∈ I and 0 = Σ_{m ∈ A_k} x_nm < Σ_{m ∈ A_k} x′_nm. We have ˜D_nk(X ∪ {s}) − ˜D_nk(X) = D_nk + |f_n| / R̄_nk(X_k) − |f_n| / R̄_nk({s}). The inequality

˜D_nk(X ∪ {s}) − ˜D_nk(X) = D_nk + |f_n| / R̄_nk(X_k) − |f_n| / R̄_nk({s}) > |f_n| / R̄_nk(X′) − |f_n| / R̄_nk(X′ ∪ {s}) = ˜D_nk(X′ ∪ {s}) − ˜D_nk(X′)

is satisfied, since D_nk + |f_n| / R̄_nk(X_k) > |f_n| / R̄_nk(X′) and |f_n| / R̄_nk({s}) < |f_n| / R̄_nk(X′ ∪ {s}). In this case, we still get ˜D_nk(X ∪ {s}) − ˜D_nk(X) > ˜D_nk(X′ ∪ {s}) − ˜D_nk(X′).

Combining the above three cases, we have

˜D_nk(X ∪ {s}) − ˜D_nk(X) ≥ ˜D_nk(X′ ∪ {s}) − ˜D_nk(X′).   (30)

Meanwhile, since R̄_nk(X) ≤ R̄_nk(X′), it is trivial to show that ˜D_nk(X) ≤ ˜D_nk(X′) for any X ⊆ X′. Therefore, ˜D_nk(X) is a monotone submodular function. In the above discussion, cooperative beamforming is applied as a candidate transmission scheme to demonstrate the monotone submodular property of the average delay function; in fact, this property holds for any candidate transmission scheme.

C. Basics of the Message Passing Procedure
We briefly introduce the factor graph model and the max-product algorithm. A factor graph is a bipartite graph consisting of I variable nodes {µ_1, · · · , µ_I} and J function nodes {F_1, · · · , F_J}. Let Γ_{µ_i} and Γ_{F_j} denote the index set of the neighboring function nodes of a variable node µ_i and that of the neighboring variable nodes of a function node F_j, respectively. Max-product is a belief propagation algorithm based on the factor graph model, which is widely applied to find the optimum of a global function of the form F(µ) = ∏_{j=1}^J F_j(µ_{Γ_{F_j}}) in a distributed manner. A comprehensive tutorial can be found in [27].

In each iteration, each variable node sends one updated message to each of its neighboring function nodes and receives one updated message from that node. According to the max-product algorithm [27], the message from a variable node µ_i to a function node F_j, i.e., m^t_{µ_i→F_j}(x), is updated as

m^{t+1}_{µ_i→F_j}(x) = ∏_{l ∈ Γ_{µ_i}\{j}} m^t_{F_l→µ_i}(x),   (31)

which collects all the beliefs on the value of µ_i = x from the neighboring function nodes F_l (l ∈ Γ_{µ_i}\{j}) other than F_j. The message from a function node F_j to a variable node µ_i, i.e., m^t_{F_j→µ_i}(x), is updated as

m^{t+1}_{F_j→µ_i}(x) = max_{Γ_{F_j}\{i}} { F_j(X) ∏_{l ∈ Γ_{F_j}\{i}} m^t_{µ_l→F_j}(x_l) },   (32)

which maximizes the product of the local function F_j(X) and the incident messages over the configurations of Γ_{F_j}\{i}.

D. Proof of Theorem 4

By substituting (20) into (31), we can easily obtain the practical message α^t_{i→j} as given by (21). From (32), the derivation of the message β^t_{j→i} involves one maximization operation over all possible values of {µ_l = x_l} (l ∈ Γ_{F_j}\{i}). We then discuss the message β^t_{j→i} in the cases F_j ≐ η_nk and F_j ≐ g_m, respectively.

Case I:
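The generic updates (31) and (32) can be sketched as follows; this is an illustrative example, and the dictionary-based message containers and function names are assumptions of this sketch rather than notation from the paper:

```python
from itertools import product

def msg_var_to_factor(incoming, j):
    """(31): for each value x, the product of all factor-to-variable
    messages at this variable node except the one from F_j."""
    out = {}
    for x in (0, 1):
        m = 1.0
        for l, msg in incoming.items():
            if l != j:
                m *= msg[x]
        out[x] = m
    return out

def msg_factor_to_var(F, neighbors, incoming, i):
    """(32): for each value x of mu_i, maximize F times the incoming
    variable messages over all configurations of the other neighbors."""
    others = [l for l in neighbors if l != i]
    out = {}
    for x in (0, 1):
        best = 0.0  # messages are non-negative, so 0 is a safe floor
        for assign in product((0, 1), repeat=len(others)):
            cfg = dict(zip(others, assign))
            cfg[i] = x
            val = F(cfg)
            for l in others:
                val *= incoming[l][cfg[l]]
            best = max(best, val)
        out[x] = best
    return out
```

The exhaustive enumeration in `msg_factor_to_var` is exponential in the factor degree; the point of Theorem 4 is precisely that, for the delay and capacity factors here, this maximization collapses to the closed forms (22) and (23).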
Derivation of β^t_{j→i} for F_j ≐ η_nk.

By substituting the average delay (such as the metric presented in (5)) into (32), the message m^{t+1}_{F_j→µ_i}(1) with F_j ≐ η_nk can be represented as

m^{t+1}_{F_j→µ_i}(1) = max_{E_i} exp(−p_nk D̄_nk(X^{(1)})) ∏_{l ∈ E_i} ( m^t_{µ_l→F_j}(1) / m^t_{µ_l→F_j}(0) ) × ∏_{l ∈ Γ_{F_j}\{i}} m^t_{µ_l→F_j}(0),   (33)

where E_i ⊆ Γ_{F_j}\{i} is a subset of the index set Γ_{F_j}\{i} such that its associated elements of X^{(1)} are equal to one, i.e., µ_l = 1 for all l ∈ E_i ∪ {i}, while µ_l = 0 for all l ∈ Γ_{F_j}\{i}\E_i. Similarly, we can compute the message m^{t+1}_{F_j→µ_i}(0) as

m^{t+1}_{F_j→µ_i}(0) = max_{E_i} exp(−p_nk D̄_nk(X^{(0)})) ∏_{l ∈ E_i} ( m^t_{µ_l→F_j}(1) / m^t_{µ_l→F_j}(0) ) × ∏_{l ∈ Γ_{F_j}\{i}} m^t_{µ_l→F_j}(0),   (34)

where E_i ⊆ Γ_{F_j}\{i} is again a subset of the index set Γ_{F_j}\{i} such that its associated elements of X^{(0)} are equal to one, while the other elements are zero, with µ_l = 0 for all l ∈ Γ_{F_j}\E_i. From (33) and (34), the message β^{t+1}_{j→i} can be expressed as

β^{t+1}_{j→i} = max_{E_i} ( −p_nk D̄_nk(X^{(1)}) + Σ_{l ∈ E_i} α^t_{l→j} ) − max_{E_i} ( −p_nk D̄_nk(X^{(0)}) + Σ_{l ∈ E_i} α^t_{l→j} ) = p_nk ( D̄_nk(X^{(0)}_i) − D̄_nk(X^{(1)}_i) ),   (35)

where the caching vectors X^{(0)}_i and X^{(1)}_i are set by selecting the variable nodes {µ_l} with positive α^t_{l→j}, i.e., l ∈ E^+_i = { i′ ∈ Γ_{F_j}\{i} | α^t_{i′→j} > 0 }, and assigning their associated elements to one. Thus, we have µ_l ≐ x_nm = 1 for all l ∈ E^+_i in X^{(0)}_i, and µ_l ≐ x_nm = 1 for all l ∈ E^+_i ∪ {i} in X^{(1)}_i. This means that each function node F_j should select its neighboring variable nodes µ_l with positive input message α^t_{l→j} and then calculate the delay gap between D̄_nk(X^{(0)}_i) and D̄_nk(X^{(1)}_i).

Case II:
Derivation of β^t_{j→i} for F_j ≐ g_m.

By substituting the constraint function into (32), the message m^{t+1}_{F_j→µ_i}(1) when F_j ≐ g_m can be represented as

m^{t+1}_{F_j→µ_i}(1) = max_{E_i} g_m(X^{(1)}) ∏_{l ∈ E_i} ( m^t_{µ_l→F_j}(1) / m^t_{µ_l→F_j}(0) ) × ∏_{l ∈ Γ_{F_j}\{i}} m^t_{µ_l→F_j}(0),   (36)

where E_i is a subset of the index set Γ_{F_j}\{i} with |E_i| ≤ Q_m − 1. This means that, to satisfy the cache capacity constraint, at most Q_m − 1 neighboring variable nodes {µ_l} can take µ_l = 1 (l ∈ E_i) besides the variable node µ_i = 1. Similarly, we can compute the message m^{t+1}_{F_j→µ_i}(0) when F_j ≐ g_m as

m^{t+1}_{F_j→µ_i}(0) = max_{E_i} g_m(X^{(0)}) ∏_{l ∈ E_i} ( m^t_{µ_l→F_j}(1) / m^t_{µ_l→F_j}(0) ) × ∏_{l ∈ Γ_{F_j}\{i}} m^t_{µ_l→F_j}(0),   (37)

where E_i is a subset of the index set Γ_{F_j}\{i} with |E_i| ≤ Q_m. Since µ_i = 0, at most Q_m neighboring variable nodes {µ_l} (l ∈ E_i) can take µ_l = 1 while satisfying the cache capacity constraint. From (36) and (37), the ratio of m^{t+1}_{F_j→µ_i}(1) and m^{t+1}_{F_j→µ_i}(0) in the logarithmic domain can be expressed as

β^{t+1}_{j→i} = max_{|E_i| ≤ Q_m−1} Σ_{l ∈ E_i} α^t_{l→j} − max_{|E_i| ≤ Q_m} Σ_{l ∈ E_i} α^t_{l→j}.   (38)

By sorting the messages {α^t_{l→j}} (∀ l ∈ Γ_{F_j}\{i}) in decreasing order as α^{(1)}_{l→j}, α^{(2)}_{l→j}, · · · , α^{(Q_m−1)}_{l→j}, · · · , we can further simplify β^{t+1}_{j→i} as

β^{t+1}_{j→i} = min{ 0, −α^{(Q_m)}_{l→j} } if α^{(Q_m−1)}_{l→j} ≥ 0, and 0 otherwise,   (39)

which is exactly equal to min{ 0, −α^{(Q_m)}_{l→j} }, as given by (23).

REFERENCES

[1] J. Liu, B. Bai, J. Zhang, and K. B. Letaief, "Content caching at the wireless network edge: A distributed algorithm via belief propagation," in
Proc. IEEE ICC , Kuala Lumpur, Malaysia, May 2016.[2] M. Chiang, “Fog networking: An overview on re-search opportunities,” Jan. 2016. [Online]. Available:http://arxiv.org/ftp/arxiv/papers/1601/1601.00835.pdf[3] S.-H. Park, O. Simeone, and S. Shamai (Shitz), “Joint optimization ofcloud and edge processing for fog radio access networks,” Jan. 2016.[Online]. Available: http://arxiv.org/pdf/1601.02460v1.pdf[4] Y. Shi, J. Zhang, K. B. Letaief, B. Bai, and W. Chen, “Large-scale convexoptimization for ultra-dense cloud-RAN,”
IEEE Wireless Commun. ,vol. 22, no. 3, pp. 84–91, Jun. 2015.[5] S. Borst, V. Gupta, and A. Walid, “Distributed caching algorithms forcontent distribution networks,” in
Proc. IEEE INFOCOM , Mar. 2010,pp. 1–9.[6] N. Golrezaei, A. F. Molisch, and A. G. Dimakis, “Base-station assisteddevice-to-device communications for high-throughput wireless videonetworks,” in
Proc. IEEE ICC , Jun. 2012, pp. 7077–7081.[7] M. A. Maddah-Ali and U. Niesen, “Fundamental limits of caching,”
IEEE Trans. Inf. Theory , vol. 60, no. 5, pp. 2856–2867, May 2014.[8] N. Golrezaei, K. Shanmugam, A. G. Dimakis, A. F. Molisch, andG. Caire, “FemtoCaching: Wireless video content delivery throughdistributed caching helpers,” in
Proc. IEEE INFOCOM , Mar. 2012, pp.1107–1115.[9] X. Peng, J.-C. Shen, J. Zhang, and K. B. Letaief, “Backhaul-awarecaching placement for wireless networks,” in
Proc. IEEE Globecom ,San Diego, CA, Dec. 2015.[10] E. Ba¸stuˇg, M. Bennis, M. Kountouris, and M. Debbah, “Cache-enabledsmall cell networks: Modeling and tradeoffs,”
EURASIP J. WirelessCommun. , vol. 2015, no. 1, p. 41, Feb. 2015. [Online]. Available:http://jwcn.eurasipjournals.com/content/2015/1/41/abstract[11] M. Ji, G. Caire, and A. F. Molisch, “Optimal throughput-outage trade-off in wireless one-hop caching networks,” in
Proc. IEEE InternationalSymposium on Information Theory Proceedings (ISIT) , Jul. 2013, pp.1461–1465.[12] ——, “Wireless device-to-device caching networks: Basic principles andsystem performance,”
IEEE J. Sel. Areas in Commun. , vol. 34, no. 1,pp. 176–189, Jan. 2015.[13] S.-W. Jeon, S.-N. Hong, M. Ji, G. Caire, and A. F. Molisch, “Wirelessmultihop device-to-device caching networks,” Nov. 2015. [Online].Available: http://arxiv.org/abs/1511.02574[14] A. Liu and V. Lau, “On the improvement of scaling laws for wirelessad hoc networks with physical layer caching,” in
Proc. IEEE ISIT , Jun.2015, pp. 161–165.[15] M. Tao, E. Chen, H. Zhou, and W. Yu, “Content-centric sparse multicastbeamforming for cache-enabled cloud RAN,”
IEEE Trans. WirelessCommun. , vol. 15, no. 9, pp. 6118–6131, Sep. 2016.[16] B. Zhou, Y. Cui, and M. Tao, “Stochastic content-centric multicastscheduling for cache-enabled heterogeneous cellular networks,”
IEEETrans. Wireless Commun. , vol. 15, no. 9, pp. 1536–1276, Sep. 2016.[17] U. Niesen and M. A. Maddah-Ali, “Coded caching with nonuniformdemands,” in
Proc. IEEE INFOCOM WKSHPS , April 27-May 2 2014,pp. 221–226.[18] M. A. Maddah-Ali and U. Niesen, “Decentralized coded caching at-tains order-optimal memory-rate tradeoff,”
IEEE/ACM Trans. Network. ,vol. 23, no. 4, pp. 1029–1040, Aug. 2015.[19] N. Karamchandani, U. Niesen, M. A. Maddah-Ali, and S. Diggavi,“Hierarchical coded caching,” in
Proc. IEEE ISIT , Jun. 2014, pp. 2142–2146.[20] H. Ahlehagh and S. Dey, “Video-aware scheduling and caching in theradio access network,”
IEEE/ACM Trans. Network. , vol. 22, no. 5, pp.1444–1462, Oct. 2014.
21] A. Abboud, E. Ba¸stuˇg, K. Hamidouche, and M. Debbah, “Distributedcaching in 5G networks: An alternating direction method of multipliersapproach,” in
Proc. IEEE International Workshop on Signal ProcessingAdvances in Wireless Communications (SPAWC), Stockholm, Sweden ,June 28-July 1 2015.[22] R. Wang, X. Peng, J. Zhang, and K. B. Letaief, “Mobility-aware cachingfor content-centric wireless networks: Modeling and methodology,”
IEEE Commun. Mag. , vol. 54, no. 8, pp. 77–83, Aug. 2016.[23] K. Poularakis, G. Iosifidis, and L. Tassiulas, “Approximation algorithmsfor mobile data caching in small cell networks,”
IEEE Trans. Commun. ,vol. 62, no. 10, pp. 3665–3677, Oct. 2014.[24] M. Dehghan, A. Seetharam, B. Jiang, T. He, T. Salonidis, J. Kurose,D. Towsley, and R. Sitaraman, “On the complexity of optimal routingand content caching in heterogeneous networks,” in
Proc. INFOCOM ,April 26-May 1 2015.[25] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker, “Web cachingand Zipf-like distributions: Evidence and implications,” in
Proc. IEEEINFOCOM , Mar. 1999, pp. 126–134.[26] A. Schrijver,
Combinatorial optimization: Polyhedra and efficiency .Berlin: Springer, 2003.[27] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, “Factor graphs andthe sum-product algorithm,”
IEEE Trans. Inf. Theory , vol. 47, no. 2, pp.498–519, Feb. 2001.[28] E. Ba¸stuˇg, M. Bennis, and M. Debbah, “A transfer learning approachfor cache-enabled wireless networks,” in
Proc. 13th International Sym-posium on Modeling and Optimization in Mobile, Ad Hoc, and WirelessNetworks (WiOpt) , May 2015, pp. 161–166.[29] B. N. Bharath, K. G. Nagananda, and H. V. Poor, “A learning-basedapproach to caching in heterogenous small cell networks,” Aug. 2015.[Online]. Available: http://arxiv.org/abs/1508.03517[30] R. Mudumbai, G. Barriac, and U. Madhow, “On the feasibility ofdistributed beamforming in wireless networks,”
IEEE Trans. WirelessCommun. , vol. 6, no. 5, pp. 1754–1763, May 2007.[31] David Williams,
Probability with Martingales. Cambridge, U.K.: Cambridge University Press, 1991.[32] K. Shanmugam, N. Golrezaei, A. G. Dimakis, A. F. Molisch, and G. Caire, "FemtoCaching: Wireless content delivery through distributed caching helpers,"
IEEE Trans. Inf. Theory , vol. 59, no. 12, pp. 8402–8413, Dec. 2013.[33] J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N.Glance, “Cost-effective outbreak detection in networks,” in
Proc. 13th ACM Int. Conf. on Knowledge Discovery and Data Mining (KDD), 2007, pp. 420–429.