Content Placement in Networks of Similarity Caches
Michele Garetto a,∗, Emilio Leonardi b, Giovanni Neglia c
a Università degli Studi di Torino, C.so Svizzera 185, Torino, Italy
b Politecnico di Torino, C.so Duca degli Abruzzi 24, Torino, Italy
c Inria - Université Côte d'Azur, 2004 route des Lucioles, Sophia Antipolis, France
Abstract
Similarity caching systems have recently attracted the attention of the scientific community, as they can be profitably used in many application contexts, like multimedia retrieval, advertising, object recognition, recommender systems and online content-match applications. In such systems, a user request for an object o, which is not in the cache, can be (partially) satisfied by a similar stored object o′, at the cost of a loss of user utility. In this paper we make a first step into the novel area of similarity caching networks, where requests can be forwarded along a path of caches to get the best efficiency-accuracy tradeoff. The offline problem of content placement can be easily shown to be NP-hard, while different polynomial algorithms can be devised to approach the optimal solution in discrete cases. As the content space grows large, we propose a continuous problem formulation whose solution exhibits a simple structure in a class of tree topologies. We verify our findings using synthetic and realistic request traces.

Keywords:
Cache networks, Similarity search, Content distribution
1. Introduction
Similarity caching is an extension to traditional (exact) caching, whereby a request for an object can be satisfied by providing a similar cached item, under a dissimilarity cost. In some cases, user requests are themselves queries for objects similar to a given one (similarity searching [1]). Caching at network edges can drastically reduce the latency experienced by users, as well as backbone traffic and server provisioning. Similarity searching and caching have several applications in multimedia retrieval [2], contextual advertising [3], object recognition [4, 5, 6, 7], recommender

∗ Corresponding author
Email addresses: [email protected] (Michele Garetto), [email protected] (Emilio Leonardi), [email protected] (Giovanni Neglia)
Preprint submitted to Computer Networks, 2021/02/10

systems [3, 8], and online prediction serving systems [9]. However, the theoretical understanding of similarity caching and the development of related algorithms and policies are at their early stages. Simple modifications to the Least Recently Used policy (LRU) which deal with approximate (soft) hits were proposed in [2, 3]. In [8] the authors have studied how to statically place contents in edge caches of a cellular network, given their popularities and the utility for a user interested in content o to receive a similar content o′. An adversarial setting was studied in [10] by competitive analysis. The authors of [11] have proposed a similarity caching policy tailored for the case when cached objects may be embedded in R^d with a distance that captures dissimilarity costs. The work most closely related to this paper is [12], where we have analyzed a single similarity cache in the offline, adversarial, and stochastic settings, proposing also some dynamic online policies to manage the cache. We mention that many researchers have studied networks of exact caches (e.g., [13, 14, 15, 16, 17, 18]); however, their results cannot be applied to the similarity caching setting, which is a fundamentally different problem (in exact caching there is no notion of distance between objects). To the best of our knowledge, only the recent letter [19] has considered a network of similarity caches, where requests can be forwarded along a path of caches towards a repository storing all objects, at the cost of increasing delays and resource consumption.
The authors of [19] have proposed a heuristic based on the gradient descent/ascent algorithm to jointly decide request routing and caching, similarly to what is done in [16] for exact caches, but without the corresponding theoretical guarantees. The proposed algorithm requires memory proportional to the size of the catalog, and appears to be computationally feasible only on small-scale systems. In our work, similarly to [19], we focus mainly on the offline setting, i.e., the problem of statically placing objects in the caches so as to minimize the expected cost under known content request rates and routing. In contrast to [19], we first propose algorithms with guaranteed performance, and then we move to the continuous limit of the large request/catalog space, where we investigate the structure of the optimal solution. Our contributions are the following:

1. while the content placement problem in networks of similarity caches is NP-hard, we show that it can be formulated as the maximization of a submodular function over a matroid; therefore a polynomial Greedy algorithm can be defined with 1/2 approximation ratio;
2. we propose a LocalSwap algorithm that does not enjoy worst-case guarantees as
Greedy, but asymptotically converges to a locally optimal solution;
3. we characterize the structure of the optimal similarity-caching placement problem in special cases; in particular, we show that, under mild assumptions, when the cache network has a regular tree structure and requests arrive only at the leaves, the optimal solution in the large catalog regime has a relatively simple structure;
4. we show that the above structure is lost in general networks, analyzing a simple tandem network where requests arrive at both caches;
5. we propose an online, λ-unaware policy called NetDuel, that extends
Duel [12] to the networked setting;
6. we illustrate our findings considering both synthetic and real request processes for Amazon items.
2. Main assumptions and problem formulation
Let X be the (finite or infinite) set of objects that can be requested by the users. We assume that all objects have equal size and that cache i can store up to k_i objects. We consider a network of caches with requests potentially arriving at every node. Some nodes can act as content repositories, where (a subset of) requests can be satisfied exactly or with a small approximation cost. Specifically, we assume that each request has at least one repository acting as 'authoritative server' for it, meaning that the approximation cost at the content repository is either zero or negligible as compared to the fixed cost to reach the repository (see next). Let K be the set of all nodes in the network (including caches and repositories).

A request r is a pair (o, i), where o is the requested object and i is the node where the request first enters the network. Every request is issued according to a Poisson process with rate λ_r. At each cache, for any two objects x and y in X there is a non-negative (potentially infinite) cost C_a(x, y) to locally approximate x with y. We consider C_a(x, x) = 0. We assume that caches can efficiently compute, upon arrival of a request for x, the closest stored object y. This is typically done resorting to locality sensitive hashing (LSH) [3]. Moreover, there is an additional retrieval cost h(i, j) to reach node j from cache i, which is assumed to increase as more and more hops need to be traversed by the request. Costs h(i, j) represent the additional penalty (in terms of network delay) incurred by requests, in addition to the approximation cost C_a. If a request from i cannot be forwarded to cache j, then h(i, j) = +∞. We call approximizer α a pair (o′, j), where object o′ has been placed at cache j.
If a request r = (o, i) is served by object o′ at node j, it will incur a total cost C(r, α) = C_a(o, o′) + h(i, j), which depends on how dissimilar o is from o′ and how far node i is from node j. For approximizers located at a content repository j, we take C(r, α) = h(i, j), neglecting the local approximation cost. We assume that each cache knows how to route each request to a corresponding repository. Nevertheless, deciding whether a request should be served locally or forwarded along the path to the repository is still a challenging problem to solve in a distributed way: while a relatively good approximizer can be found at a cache i, a better one may be located at an upstream cache j, justifying the additional cost h(i, j). This is in sharp contrast to what happens in an exact caching network, where the forwarding operation is straightforward (a request is forwarded upon a miss).

In our initial investigation, we will suppose that an optimal forwarding strategy is available at all caches, i.e., that each cache knows whether to solve a request locally or forward it towards the repository. This assumption is reasonable in two possible scenarios: i) when caches exchange meta-data information about their stored objects (this is acceptable when content is static or quasi-static); ii) when the dominant component of the delay is content download, so that, prior to download, small request messages can go all the way up to the repository and back, dynamically finding the best approximizer along the path. We leave to future work the challenging case in which optimal forwarding is not available at the nodes.

A consequence of our assumptions is that each request r will be served minimizing the total cost, i.e., given S the initial set of approximizers at content repositories, and A the set of approximizers at the caches, we have

C(r, A) = min_{α ∈ A∪S} C(r, α).   (1)

In what follows we will consider two main instances for X and C_a().
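To make the serving rule of Eq. (1) concrete, here is a minimal sketch in Python. The objects, nodes, rates and cost values are hypothetical choices for illustration only, not taken from the text.

```python
import math

# Hypothetical instance: objects 0 and 1, a cache (node 1) and a repository
# (node 2); requests enter at node 0.  Missing C_a pairs are infinite.
C_a = {(0, 0): 0.0, (1, 1): 0.0, (0, 1): 0.5, (1, 0): 0.5}
h = {(0, 1): 1.0, (0, 2): 2.0}            # retrieval costs h(i, j)

def serve_cost(request, approximizers):
    """C(r, A ∪ S) = min over approximizers (o', j) of C_a(o, o') + h(i, j)."""
    o, i = request
    return min((C_a.get((o, obj), math.inf) + h.get((i, node), math.inf)
                for obj, node in approximizers), default=math.inf)

A = {(0, 1)}                               # cache allocation: object 0 at node 1
S = {(0, 2), (1, 2)}                       # repository stores every object exactly
print(serve_cost((1, 0), A | S))           # min(0.5 + 1.0, 0.0 + 2.0) = 1.5
```

In this toy setting the nearby approximate copy (total cost 1.5) beats fetching the exact object from the farther repository (cost 2.0) — exactly the approximation/retrieval trade-off the forwarding decision must resolve.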
In the first instance, X is a finite set of objects and thus the approximation cost can be characterized by an |X| × |X| matrix of non-negative values. This case could well describe the (dis)similarity of contents (e.g. videos) in a finite catalog. In the second instance, X is a subset of R^p and C_a(x, y) = f(d(x, y)), where f : R^+ → R^+ is a non-decreasing non-negative function and d(x, y) is a metric in R^p (e.g. the Euclidean one). This case is more suitable to describe objects characterized by continuous features, as in machine learning applications. For example, consider a query to retrieve similar images, as one can issue to images.google.com. The set of images the user may query Google for is essentially unbounded, and in any case it is larger than the catalog of images Google has indexed. In the continuous case, we assume a spatial density of requests arriving at each cache defined by a Borel-measurable function λ_{x,i} : X × K → R^+, i.e., for every Borel set B ⊆ X and every cache i ∈ K, the rate with which requests for objects in B arrive at node i is given by ∫_B λ_{x,i} dx. We will refer to the above two instances as discrete and continuous, respectively.

Under the above assumptions, our goal is to find the optimal static allocation A that minimizes the expected cost C(A) per time unit (or per request, if we normalize the aggregate request arrival rate to 1):

C(A) ≜ ∑_r λ_r C(r, A)   (discrete case),
C(A) ≜ ∑_{i∈K} ∫_X λ_{x,i} C((x, i), A) dx   (continuous case),   (2)

i.e.,

minimize_A C(A)
subject to |{o : (o, i) ∈ A}| ≤ k_i, ∀ i ∈ K   (3)

3. Algorithms for the Discrete case

In this section, we restrict ourselves to the discrete scenario, as this allows us to make rigorous statements about NP-hardness and algorithms' complexity.
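Since the problem is NP-hard (as stated next), only brute force is exact; at toy scale it is nonetheless instructive. The sketch below enumerates all allocations satisfying the per-cache cardinality constraints of (3) on a hypothetical two-cache instance (all numbers are illustrative) and returns the cost-minimizing one.

```python
import itertools

OBJECTS = range(3)
K = {0: 1, 1: 1}                       # cache capacities k_i; node 2 = repository
lam = {(o, 0): [5.0, 2.0, 1.0][o] for o in OBJECTS}   # rates at entry node 0
Ca = lambda x, y: 0.3 * abs(x - y)     # toy approximation cost
h = {(0, 0): 0.0, (0, 1): 0.2, (0, 2): 1.0}           # retrieval costs

def cost(alloc):                       # expected cost (2), discrete case
    repo = {(o, 2) for o in OBJECTS}
    return sum(r * min(Ca(o, oo) + h[(i, j)] for oo, j in alloc | repo)
               for (o, i), r in lam.items())

def feasible():                        # all allocations respecting (3)
    per_cache = [[(o, c) for o in OBJECTS] for c in K]
    for picks in itertools.product(*[itertools.combinations(p, K[c])
                                     for c, p in enumerate(per_cache)]):
        yield set(itertools.chain.from_iterable(picks))

best = min(feasible(), key=cost)
print(best, cost(best))                # {(0, 0), (2, 1)} with cost 0.8
```

Note how the exhaustive search grows combinatorially with catalog and cache sizes, which is what motivates the approximation algorithms developed below.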
Proposition 3.1.
The static off-line similarity caching problem in a network (3) is NP-hard.
This is an immediate consequence of the fact that, as shown in [12, Thm. III.1], the static off-line similarity caching problem is already NP-hard for a single cache. Nevertheless, we will show in Sec. 4 that, when the cache network has a regular tree structure, a simple characterization of the optimal solution can be determined in the large catalog regime, by exploiting a continuous approximation.

Given the initial set S of objects allocated at content repositories, we want to pick an additional set A of objects and place them at the caches. Let I denote the set of possible allocations that satisfy the cardinality constraints at each cache (corresponding to the constraints in (3)). Let G(A) quantify the caching gain [20, 16] from allocation A in comparison to the case when each request needs to be served by its content repository, i.e., G(A) = C(∅) − C(A). Problem (3) is equivalent to the following maximization problem:

maximize_{A∈I} G(A).   (4)

Proposition 3.2.
The static off-line similarity caching problem in a network is a submodular maximization problem with matroid constraints.
The result does not rely on any specific assumption on C(r, α) but for the cost being non-negative. In particular, we can define C(r, α) to embed requests' routing constraints. For example, given a request r = (o, i), we can enforce the request to be satisfied by the repository of content o or by one of the caches on the routing path between node i and the repository (we denote it as P_{i,o}). This constraint can be imposed by selecting C((o, i), (o′, j)) = ∞ for each j ∉ P_{i,o}. The proof is quite standard and we report it in Appendix A for completeness.

Greedy algorithm and its complexity
As Problem (4) is the maximization of a monotone non-negative submodular function with matroid constraints, the
Greedy algorithm achieves a 1/2 approximation ratio: G(A_Greedy) ≥ (1/2) max_{A∈I} G(A) [21]. We mention that there exists also a randomized algorithm that combines a continuous greedy process and pipage rounding to achieve a (1 − 1/e) approximation ratio in expectation [22].

The Greedy algorithm proceeds from an empty allocation A = ∅ and progressively adds to the current allocation an approximizer in argmax_α G(A ∪ {α}) − G(A) = argmax_α ∑_r λ_r (C(r, A) − C(r, A ∪ {α})), up to selecting ∑_i k_i = K objects, where K is the total cache capacity in the network (by respecting local constraints at individual caches). Let O, O_R, and N denote the number of objects in the catalog, the number of objects that can be requested, and the number of caches in the network. When choosing the i-th approximizer, the greedy algorithm needs in general to evaluate ON − i + 1 possible approximizers, and how each of them reduces the cost for the set of requests, whose cardinality is at most O_R N. The time-complexity of the algorithm is then bounded by ∑_{i=1}^K O_R N (ON − i + 1) = O_R N (ONK − K(K − 1)/2). Hence, the Greedy algorithm would be too complex for catalogue sizes O beyond a few thousands of objects. Moreover, the set of possible requested objects O_R may be much larger than O.

LocalSwap algorithm and its complexity
We now present a different algorithm, called
LocalSwap, which is based on the simple idea to systematically move to states with a smaller expected cost (2).
LocalSwap can be used both in an off-line and an on-line scenario. It works as follows. At the beginning, the state of the caches is populated by random contents. Then, in the on-line scenario the algorithm adapts the cache state upon every request. In the off-line scenario, instead, a sequence of emulated requests is generated (satisfying the same statistical properties of the original arrival process) and applied to drive cache state changes. Let A_t be the allocation obtained by the algorithm at iteration t. Upon an (emulated) request r for o, LocalSwap computes the maximum decrement in the expected cost that can be obtained by replacing one of the objects currently stored at some cache along the forwarding path with o, i.e., ∆C ≜ min_{y∈A_t} C(A_t ∪ {o} \ {y}) − C(A_t).
• if ∆C < 0 (i.e., inserting o contributes to decrease the cost), then the cache replaces y∗ ∈ argmin_{y∈A_t} C(A_t ∪ {o} \ {y}) with o;
• if ∆C ≥
0, the cache state is not updated.
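Both heuristics can be sketched together on one hypothetical single-cache instance (the rates and cost values below are illustrative choices, not taken from the text): Greedy adds objects by marginal gain, while LocalSwap applies the two-case update rule above along an emulated request sequence.

```python
import math

lam = {1: 10.0, 2: 3.0, 3: 3.0, 4: 1.0, 5: 1.0}  # hypothetical request rates
eps, h_s, k = 0.1, 1.0, 2                         # approx. cost, server cost, cache size
Ca = {(x, x): 0.0 for x in lam}                   # missing pairs are infinite
for x, y in [(1, 2), (1, 3)]:
    Ca[(x, y)] = Ca[(y, x)] = 0.0                 # perfect substitutes
for x, y in [(2, 4), (3, 5)]:
    Ca[(x, y)] = Ca[(y, x)] = eps                 # imperfect substitutes

def cost(state):                                  # expected cost (2)
    return sum(r * min(min(Ca.get((o, y), math.inf) for y in state), h_s)
               for o, r in lam.items())

def greedy():
    state = set()
    for _ in range(k):                            # add by largest marginal gain
        state.add(min(sorted(set(lam) - state), key=lambda o: cost(state | {o})))
    return state

def local_swap(state, requests):
    for o in requests:
        if o not in state:
            y = min(sorted(state), key=lambda y: cost(state - {y} | {o}))
            if cost(state - {y} | {o}) < cost(state):   # ∆C < 0: swap
                state = state - {y} | {o}
    return state

g = greedy()
swapped = local_swap(g, [5, 4, 3, 2, 1] * 3)      # emulated request sequence
print(g, cost(g))                                 # greedy's final state
print(swapped, cost(swapped))                     # LocalSwap finds a cheaper one
```

For these (hypothetical) parameters Greedy stops in a configuration that is not locally optimal, and cascading LocalSwap afterwards escapes it, in line with the discussion and the Remark below.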
LocalSwap does not provide worst case guarantees as
Greedy, but it asymptotically reaches a locally optimal cache configuration, defined as a configuration whose cost (2) is lower than the cost of all configurations that can be obtained by replacing just one content in one cache. On the contrary, Greedy does not necessarily reach a locally optimal state (as we show below in Sect. 3.4).
Proposition 3.3.
For a long enough request sequence,
LocalSwap converges with probability 1 to a locally optimal cache configuration.
LocalSwap generalizes a similar algorithm proposed in [12] for a single cache (called there "greedy") with similar theoretical guarantees. Under the assumption that requests are optimally forwarded, the proof of Proposition 3.3 is essentially the same as that of [12, Thm. V.3], so we omit it. By clever data structure design, the computational cost of each iteration can be kept O(N O_R).

Remark. Note that by cascading Greedy and LocalSwap it is possible to achieve a locally optimal cache configuration whose approximation ratio is guaranteed to be at least 1/2 (i.e., G(A_{Greedy+LocalSwap}) ≥ (1/2) max_{A∈I} G(A)).

Greedy and LocalSwap in a toy example
This example shows that 1) Greedy does not necessarily converge to a locally optimal cache configuration, and 2) there are both settings where Greedy finds the optimal cache configuration while LocalSwap may not, and settings where LocalSwap finds the optimal cache configuration while Greedy does not.

Consider a scenario with 5 contents x_i, for 1 ≤ i ≤ 5. Let us assume that C_a(x_1, x_2) = C_a(x_1, x_3) = 0 and C_a(x_2, x_4) = C_a(x_3, x_5) = ε > 0, while C_a(x_i, x_j) = ∞ otherwise; all costs are assumed to be symmetric. We want to solve the content placement problem for a single cache with k = 2 and λ_{x_1} > λ_{x_2} = λ_{x_3} > λ_{x_4} = λ_{x_5}. The cost to retrieve the objects from the remote server is h_s, large with respect to ε. The optimal placement configuration is {x_2, x_3}. Greedy will reach one of the following equivalent sub-optimal configurations: {x_1, x}, with x ∈ {x_4, x_5}. LocalSwap, on the contrary, will reach the optimal configuration {x_2, x_3} (because it is the unique locally optimal configuration). We observe that the configurations reached by Greedy are not locally optimal: for example, if Greedy selects {x_1, x_4}, it is convenient to replace x_1 with x_3.

Consider now two caches 1 and 2 in tandem, each of size k = 1, with requests arriving only at the first cache, and retrieval cost equal to h(1, 2) if the object is retrieved from cache 2, and h(1, 2) + h_s if it is retrieved from the server. For h(1, 2) small, the optimal configurations maintain a similar structure, splitting the pair {x_2, x_3} across the two caches: {(x_2, 1), (x_3, 2)} and {(x_3, 1), (x_2, 2)}. Greedy will still reach a state {(x_1, 1), (x, 2)} with x ∈ {x_4, x_5}, while LocalSwap will reach an optimal state. For h(1, 2) large enough, the optimal states become {(x_1, 1), (x, 2)} with x ∈ {x_4, x_5}, and both previous algorithms will succeed in reaching an optimal solution. At the same time, there are intermediate settings for which the configurations {(x_1, 1), (x_4, 2)} and {(x_1, 1), (x_5, 2)} correspond to global minima, while the configurations {(x_2, 1), (x_3, 2)} and {(x_3, 1), (x_2, 2)} correspond only to local minima: Greedy finds one of the first configurations, while LocalSwap may reach one of the second. This is the case, for example, for h_s = 1 and appropriate values of h(1, 2) = ε and of the arrival rates.

4. The Continuous case

When O_R is much larger than O, or O is itself very large, it makes sense to study the request space as continuous. Such a continuous representation permits us to formulate a simplified optimization problem whose solution well approximates the optimal cost achieved in discrete scenarios with large catalog size. If the number of objects in the catalog is finite, one could in principle devise a Greedy algorithm also for this case, working exactly as in the discrete case. Indeed, problem (3) can be easily shown to be still submodular even when requests lie over a continuous space. However, one now has to evaluate, for each possible candidate approximizer α to add to the current allocation, complex integrals over the infinite query space. It is not simple to characterize in general the complexity of such operations, but it is evident that the previous algorithmic approaches rapidly become unfeasible for a large set of requests and/or a large catalog. Hereinafter, we will assume that both the request space and the catalog space are continuous.

As a necessary background, we summarize here some results obtained in [12] for the case of a single cache with capacity k. Let B_r(y) be the closed ball of radius r around y, i.e., the set of points y′ such that d(y, y′) ≤ r. The authors of [12] proved:

Proposition 4.1.
Under a homogeneous request process with intensity λ over a bounded set X, any cache state A = {y_1, …, y_k}, such that, for some r, the balls B_r(y_h) for h = 1, …, k are a tessellation of X (i.e., ∪_h B_r(y_h) = X and |B_r(y_i) ∩ B_r(y_j)| = 0 for each i and j), is optimal.

Such a regular tessellation exists, in all dimensions, under the norm-1 distance, and corresponds to the case in which balls are squares (assuming that k such squares cover exactly the domain X). It is then immediate to analytically compute the optimal cost for this case. For example, in a two-dimensional domain (see Fig. 1), requests arriving in a particular ball produce an approximation cost:

c(r) = 4 ∫_0^r ∫_0^{r−x} (x + y)^γ λ dy dx = 4λ r^{γ+2} / (γ + 2)   (5)

and the total cost is just C(A) = k c(r).

If the request rate is not space-homogeneous, one can apply the results above over small regions X_i of X where λ_x can be approximated by a constant value λ_{X_i}. Intuitively, the approximation becomes better and better the more smoothly λ_x varies over each Voronoi cell of region i. This in particular occurs when λ_x is smooth over the entire domain, and the cache size increases. Under this approximation, let k_{i,1} be the number of cache slots devoted to region i (with the constraint that ∑_i k_{i,1} = k).

Figure 1: Perfect tessellation with square cells in a two-dimensional domain, under the norm-1 distance.

Then, using standard
Without loss of generality, wecan assume that domain X is partitioned into M regions of unitary area, onwhich the request rate is approximately assumed to be constant and equal to λ i , 1 ≤ i ≤ M .Then, focusing for simplicity on the two dimensional case when d ( x, y ) isthe norm-1, and C a ( x, y ) = d ( x, y ) γ , each cache slot is used to approximaterequests falling in a square of area 1 /k i, and radius r i = (cid:112) / (2 k i, ). Following(5), the approximation cost c i within a square belonging to region i can be easilycomputed as: c i ( r i ) = 4 λ i r γ +2 i γ + 2 = ζλ i k − γ +22 i, where ζ (cid:44) (2 − γ ) / / ( γ +2). Hence the total approximation cost in the whole do-main, which depends on the vector k of cache slots k i, ’s, is C ( k ) = (cid:80) Mi =1 k i, c i, ( k i, ).We select the values k that minimize the expected cost:minimize k , ,...,k M, ζ M (cid:88) i =1 λ i k − γ/ i, subject to M (cid:88) i =1 k i, = k (6)Employing the standard Lagrange method, one obtains that λ i k − ( γ +2) / i, equals some unique constant for any region i , which means that k i, has to be9roportional to λ / ( γ +2) i . After some algebra we get:min C ( k ) = ζk − γ/ (cid:32) M (cid:88) i =1 λ γ +2 i (cid:33) γ +22 . (7)In the limit of large M , we substitute the sum in (7) with an integral, obtaining:min C ( k ) ≈ ζk − γ/ (cid:18)(cid:90) X λ ( x ) γ +2 d x (cid:19) γ +22 . (8)We observe that, when the distance is the norm-1, this approach from [12]can be extended to higher dimensions computing integrals similar to (5). Underother distances, things are not as simple, but in principle one can determine thebest partitioning of the domain into k Voronoi cells V i with center b i , suchthat C ( A ) = (cid:88) i (cid:90) V i C a ( x, b i ) d x is minimum, and store in the cache objects { b i } i . Similarly to [12], we prefer toavoid such geometric complications, and stick for simplicity to the norm-1 case. 
Here we extend the approach recalled in the previous section to a chain network of N caches, where requests arrive at the leaf cache 1 and are possibly forwarded along the chain up to the node providing the best approximizer. In a chain, the cost incurred by a request r for object x, served by approximizer α = (o′, j), is C(r, α) = C_a(x, o′) + h(1, j). As requests always originate at the leaf cache 1, we simplify the notation and denote h(1, j) by h_j. We naturally assume h_i > h_j if i > j. The N-th cache in the chain is the repository, where the approximation cost is negligible. In the following formulas, we recover this situation by considering that the last cache has infinite cache size.

Let k_{i,j} be the number of cache slots devoted by cache j to region i. Each of these slots is used to approximate requests falling in a square of area 1/k_{i,j} and radius r_{i,j} = √(1/(2 k_{i,j})). Hence the cost incurred by requests falling in a square of region i and served by cache j is:

c_{i,j}(r_{i,j}) = 4 ∫_0^{r_{i,j}} ∫_0^{r_{i,j}−x} [(x + y)^γ + h_j] λ_i dy dx = 4 λ_i r_{i,j}^{γ+2} / (γ + 2) + 2 λ_i r_{i,j}^2 h_j   (9)

(Footnotes: in the d-dimensional case we have c(r) = a_d λ r^{γ+d}, for an appropriate constant a_d. Moreover, determining the Voronoi partition is not hard when the domain X can be exactly partitioned into k Voronoi cells of the same shape; otherwise, for sufficiently large cache sizes, one can neglect border effects and approximately consider k Voronoi cells of the same shape covering the entire domain.)

The total cost C_{i,j} incurred by all requests falling in region i and served by cache j, as a function of k_{i,j}, reads:

C_{i,j}(k_{i,j}) = ζ λ_i k_{i,j}^{−γ/2} + λ_i h_j   (10)

In general, a region i can be served by several caches along the path (every cache for which k_{i,j} > 0); however, at the optimum essentially all requests of a region are served by the single cache j∗ with j∗ = argmin_j C_{i,j} (ties can be neglected). We encode the previous property by introducing weights w_{i,j} ∈ [0, 1], where w_{i,j} represents the fraction of region i served exclusively by cache j.
Let w_j be the vector of {w_{i,j}}_i. We obtain the optimization problem:

minimize_{w_2,…,w_N}  ζ k_1^{−γ/2} ( ∑_{i=1}^M (1 − ∑_{j=2}^N w_{i,j}) λ_i^{2/(γ+2)} )^{(γ+2)/2} + ∑_{i=1}^M (1 − ∑_{j=2}^N w_{i,j}) λ_i h_1
    + ∑_{j=2}^N [ ζ k_j^{−γ/2} ( ∑_{i=1}^M w_{i,j} λ_i^{2/(γ+2)} )^{(γ+2)/2} + ∑_{i=1}^M w_{i,j} λ_i h_j ]

subject to  w_{i,j} ≥ 0, ∀ j > 1, ∀ i
      ∑_{j=2}^N w_{i,j} ≤ 1, ∀ i   (11)

where notice that we have separated the contribution of cache 1, and taken as decision variables the vectors w_j, with j >
1, since w_1 = 1 − ∑_{j=2}^N w_j. Moreover, notice that the constraints in (11) are sufficient to guarantee that also the following obvious constraints hold:

w_{i,j} ≤ 1, ∀ j > 1, ∀ i
0 ≤ w_{i,1} ≤ 1, ∀ i

In this form, (11) is a convex minimization problem over a convex domain, thus it has a global minimum. Without loss of generality, let the M regions be sorted in increasing values of λ_i. Employing the standard method of Lagrange multipliers, KKT conditions imply that the global optimum is attained when cache 1 handles all the most popular regions i > i∗ (i.e., w_{i,1} = 1 for i > i∗), plus possibly a piece of region i∗ (if 0 < w_{i∗,1} < 1), and no piece of the regions i < i∗.

The previous result allows us to prove the following interesting property about the structure of the optimal solution:

Proposition 4.2.
In the case of a chain topology, with requests arriving only at the first cache, the best solution of the continuous-domain, finite-M problem (11) is characterized by a set of popularity thresholds max_i{λ_i} = λ∗_0 ≥ λ∗_1 ≥ … ≥ λ∗_{N−1} ≥ λ∗_N = min_i{λ_i}, such that cache j approximates all requests falling in regions i with λ∗_j < λ_i < λ∗_{j−1}, plus possibly a portion of a region with λ_i = λ∗_{j−1}, and a portion of a region with λ_i = λ∗_j.

Proof. It is sufficient to apply the above property about the regions handled by cache 1, to filter out the requests handled by cache 1, and to iteratively apply the same result to the request process forwarded upstream to caches 2, …, N.

When the set of popularity values is not finite, it is possible to extend the result in Proposition 4.2, letting M diverge. We partition X into N sub-domains X_j, j = 1, …, N, stacked in a vector X, such that cache j handles only requests falling into domain X_j, and we seek to minimize:

C(X) = ∑_{j=1}^N [ ζ k_j^{−γ/2} ( ∫_{X_j} λ(x)^{2/(γ+2)} dx )^{(γ+2)/2} + h_j ∫_{X_j} λ(x) dx ]

In principle we would like to find the best partitioning: X∗ = argmin_X C(X). In this asymptotic case we can restate Proposition 4.2 as follows, providing a simpler and more elegant proof.

Proposition 4.3.
In the case of a chain topology with requests arriving only at the first cache, the best partition X∗ is characterized by the following property: for any i < j, inf_{X∗_i} λ(x) ≥ sup_{X∗_j} λ(x).

Proof. By contradiction, let us assume that we can find two non-negligible areas ∆X_i ⊆ X∗_i and ∆X_j ⊆ X∗_j such that:

sup_{∆X_j} λ(x) > inf_{∆X_i} λ(x).

Denote β = 2/(2 + γ) < 1. Then we can always find two non-negligible areas ∆X′_i ⊆ ∆X_i and ∆X′_j ⊆ ∆X_j such that we jointly have:

∫_{∆X′_i} λ(x)^β dx = ∫_{∆X′_j} λ(x)^β dx   (12)

and

inf_{∆X′_j} λ(x) ≥ sup_{∆X′_i} λ(x) > 0.   (13)

We can then swap ∆X′_i with ∆X′_j, i.e., take a new partition X′ where X′_i = (X∗_i \ ∆X′_i) ∪ ∆X′_j and X′_j = (X∗_j \ ∆X′_j) ∪ ∆X′_i. Note that, by construction, (12) leaves the approximation-cost terms unchanged, so that

C(X′) = C(X∗) + (h_j − h_i) ∫_{∆X′_i} λ(x) dx + (h_i − h_j) ∫_{∆X′_j} λ(x) dx.

Since h_j > h_i, we have C(X′) ≤ C(X∗) if we can show that ∫_{∆X′_j} λ(x) dx ≥ ∫_{∆X′_i} λ(x) dx. Indeed:

∫_{∆X′_j} λ(x) dx = ∫_{∆X′_j} λ(x)^β λ(x)^{1−β} dx
 ≥ ( inf_{∆X′_j} λ(x) )^{1−β} ∫_{∆X′_j} λ(x)^β dx
 = ( inf_{∆X′_j} λ(x) )^{1−β} ∫_{∆X′_i} λ(x)^β dx   by (12)
 ≥ ( sup_{∆X′_i} λ(x) )^{1−β} ∫_{∆X′_i} λ(x)^β dx   by (13)
 = ∫_{∆X′_i} ( sup_{∆X′_i} λ(x) )^{1−β} λ(x)^β dx
 ≥ ∫_{∆X′_i} λ(x)^{1−β} λ(x)^β dx = ∫_{∆X′_i} λ(x) dx.

The previous results obtained for the chain topology can be easily extended to trees with L leaves at the same depth D, where requests arrive only at the leaves and all caches at the same level have the same size. Let h_{D−j} be the (equal) cost to reach the cache at level j starting from a leaf.
We assume the spatial arrival rate at leaf ℓ to be given by λ_ℓ(x) = β_ℓ λ(x), for some constant β_ℓ >
0, i.e., spatial arrival rates at different caches are identical after rescaling by a constant factor. Moreover, arrival processes at different leaves are assumed to be independent. We will call equi-depth tree a cache network with the above characteristics. We naturally assume h_i > h_j if i > j.

Proposition 4.4.
In an equi-depth tree the optimal cost is achieved by replicating the same allocation at each cache of the same level. The allocation to be replicated is the one obtained in the special case of a chain topology (L = 1).

Proof. Suppose we increase the number of nodes in the topology, creating a system of L parallel chain topologies. Each leaf now has an independent path towards a dedicated copy of the root node. By doing so, the total cost in the system of parallel chains is surely not larger than the total cost achievable in the original tree, and, in general, it might be smaller (this because we can independently place objects in every chain so as to minimize the cost induced by the requests arriving at the corresponding leaf). On the other hand, the optimal allocation on each chain is the same, since the objective function in (11) is linear with respect to the parameter β_ℓ. Therefore, by adopting such equal allocation on each cache of the same level in the original tree, we obtain exactly the same total cost achieved in the system of parallel chains; hence this allocation is optimal.

In general cache networks that do not belong to the class of equi-depth trees, the simple optimal structure described in Proposition 4.2 is, unfortunately, lost. To see why, it is sufficient to consider the simple case of a tandem network with two identical caches (hereinafter called the leaf and the parent), where the same external arrival process λ(x) of requests arrives at both nodes. Now, let us suppose that the cost h to reach the parent from the leaf is large (but it does not need to be disproportionately large). Then the leaf will not find it particularly convenient to forward its requests to the parent, except maybe for objects very close to the ones stored at the parent (whichever they are). On the other hand, the parent has to locally approximate all requests, hence it will need to adequately cover the entire domain X like an isolated cache.
As a consequence, we do not expect any clear separation of $\mathcal{X}$ into a sub-domain handled by the leaf and a sub-domain handled by the parent. In particular, the property that we had before, according to which a single cache has to allocate slots to cover a particular region of the domain, does not hold anymore.

A more formal explanation of what happens in this simple case can be provided by the following model. Again, we divide the domain, both at the leaf and at the parent cache, into $M$ regions of unitary area. The request rate over each region is assumed to be constant, and we denote it by $\lambda_i$ and $\beta\lambda_i$ for the leaf and the parent cache, respectively (hence by setting $\beta = 0$ we recover the previous case in which requests arrive only at the leaf). Let $k_{i,1}$ and $k_{i,2}$ be the number of slots devoted to region $i$ by the leaf and the parent node, respectively. Notice that now both quantities are in general different from zero. The leaf node will forward to the parent the requests falling in a fraction $(1 - w_{i,1})$ of region $i$, and it is natural to assume that these requests are those falling farther from the locally stored objects, i.e., at a distance larger than $r^*_{i,1} = \sqrt{w_{i,1}}\, r_{i,1}$, where $r_{i,1} = \sqrt{1/(2 k_{i,1})}$. Therefore the approximation cost (10) is changed to:
$$C_{i,1}(k_{i,1}, w_{i,1}) = \zeta \lambda_i\, w_{i,1}^{\frac{\gamma+2}{2}}\, k_{i,1}^{-\frac{\gamma}{2}}.$$
Requests forwarded to the parent cache will experience an additional movement cost $h$, plus a local approximation cost at the parent, which we model by assuming that the total area of the sub-region forwarded to the parent cache, $2 k_{i,1} r_{i,1}^2 (1 - w_{i,1})$, will be served by the $k_{i,2}$ points at the parent, within squares of radius:
$$\sqrt{\frac{2 k_{i,1} r_{i,1}^2 (1 - w_{i,1})}{2 k_{i,2}}} = \sqrt{\frac{1 - w_{i,1}}{2 k_{i,2}}}.$$
Moreover, at the parent cache the local requests will generate an approximation cost similar to (10) (with no retrieval cost). The total approximation cost in the network is then:
$$C(\mathcal{A}) = \zeta \sum_{i=1}^{M} \lambda_i\, w_{i,1}^{\frac{\gamma+2}{2}}\, k_{i,1}^{-\frac{\gamma}{2}} + \zeta \sum_{i=1}^{M} \lambda_i \left(\beta + (1 - w_{i,1})^{\frac{\gamma+2}{2}}\right) k_{i,2}^{-\frac{\gamma}{2}} + h \sum_{i=1}^{M} \lambda_i (1 - w_{i,1}).$$
This cost should be minimized over $\{w_{i,1}\}_i$, $\{k_{i,1}\}_i$, and $\{k_{i,2}\}_i$. By finding the optimal values for $\{k_{i,1}\}_i$ and $\{k_{i,2}\}_i$ given $\{w_{i,1}\}_i$, we get
$$C(\mathbf{w}) = \zeta k^{-\frac{\gamma}{2}} \left( \sum_{i=1}^{M} \lambda_i^{\frac{2}{\gamma+2}} w_{i,1} \right)^{\frac{\gamma+2}{2}} + \zeta k^{-\frac{\gamma}{2}} \left( \sum_{i=1}^{M} \lambda_i^{\frac{2}{\gamma+2}} \left(\beta + (1 - w_{i,1})^{\frac{\gamma+2}{2}}\right)^{\frac{2}{\gamma+2}} \right)^{\frac{\gamma+2}{2}} + h \sum_{i=1}^{M} \lambda_i (1 - w_{i,1}). \quad (14)$$
Note that for $\beta = 0$ we recover the cost resulting from (11) in the case of a tandem network. Computing the derivative of the above cost with respect to $w_{j,1}$ we get:
$$\frac{\partial C(\mathbf{w})}{\partial w_{j,1}} = \zeta k^{-\frac{\gamma}{2}} \frac{\gamma+2}{2} \left( \sum_{i=1}^{M} \lambda_i^{\frac{2}{\gamma+2}} w_{i,1} \right)^{\frac{\gamma}{2}} \lambda_j^{\frac{2}{\gamma+2}} - \zeta k^{-\frac{\gamma}{2}} \frac{\gamma+2}{2} \left( \sum_{i=1}^{M} \lambda_i^{\frac{2}{\gamma+2}} \left(\beta + (1 - w_{i,1})^{\frac{\gamma+2}{2}}\right)^{\frac{2}{\gamma+2}} \right)^{\frac{\gamma}{2}} \times \lambda_j^{\frac{2}{\gamma+2}} (1 - w_{j,1})^{\frac{\gamma}{2}} \left(\beta + (1 - w_{j,1})^{\frac{\gamma+2}{2}}\right)^{-\frac{\gamma}{\gamma+2}} - h \lambda_j. \quad (15)$$
Imposing the optimality conditions, we find that there may be multiple regions with different popularities $\lambda_i$ for which $w^*_{i,1} \in (0, 1)$. Consider now the case in which $\lambda$ is uniform over the whole domain. In this case it is convenient to shift over space the two regular tessellations so that the centroids at the leaf and at the parent are as far as possible, as shown in Fig. 2. This allows the leaf to forward the requests farthest from its centroids to the parent, where they are better approximated.
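To make the optimization of (14) concrete, the following sketch (with illustrative, made-up parameter values) implements the cost as a function of the fractions $w_{i,1}$ alone, since the optimal slot allocations are already folded into the closed form of (14), and minimizes it by coordinate descent over a grid of candidate values.

```python
import numpy as np

def cost(w, lam, k1, k2, beta, h, gamma, zeta=1.0):
    """Total cost (14): leaf approximation + parent approximation + movement.

    w[i] is the fraction of region i whose requests the leaf serves locally;
    the per-region slot allocations have been optimized out already."""
    e = 2.0 / (gamma + 2.0)                  # inner exponent 2/(gamma+2)
    p = (gamma + 2.0) / 2.0                  # outer exponent (gamma+2)/2
    a = lam ** e
    leaf = zeta * k1 ** (-gamma / 2) * np.sum(a * w) ** p
    parent = zeta * k2 ** (-gamma / 2) * np.sum(a * (beta + (1 - w) ** p) ** e) ** p
    movement = h * np.sum(lam * (1 - w))
    return leaf + parent + movement

# toy instance: 4 regions with different popularities (illustrative values)
lam = np.array([4.0, 2.0, 1.0, 0.5])
k1 = k2 = 10.0
beta, h, gamma = 0.5, 0.1, 1.0

grid = np.linspace(0.0, 1.0, 201)
w = np.ones_like(lam)                        # start: serve everything at the leaf
for _ in range(20):                          # coordinate-descent sweeps
    for i in range(len(w)):
        trial = w.copy()
        best_g, best_c = w[i], cost(w, lam, k1, k2, beta, h, gamma)
        for g in grid:
            trial[i] = g
            c = cost(trial, lam, k1, k2, beta, h, gamma)
            if c < best_c:
                best_g, best_c = g, c
        w[i] = best_g
print("w* =", w, " cost =", cost(w, lam, k1, k2, beta, h, gamma))
```

For $\beta = 0$ the parent term reduces to the leaf-only expression derived from (11), so the same routine also covers the case of arrivals at the leaf only.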
Figure 2: Optimal allocation in the tandem network with uniform arrival process at both nodes: square tessellation in the leaf (red nodes), and square tessellation in the parent (green nodes).
Requests arriving at the leaf are approximated by the leaf in the red portion of the domain, as depicted in Fig. 2, while they are approximated by the parent in the green portion of the domain. The distance $z$ (in Fig. 2) that defines the separation between the two portions can be easily computed (for $\gamma = 1$) as $z = \max\{0, (r - h)/2\}$, where $r$ is the radius of the square of each tessellation (note that if $h > r$, requests are not forwarded from the leaf to the parent). Then one can easily compute the reduction $\Delta c = z^2$ in the approximation cost for requests arriving at the leaf, provided by each slot of the second cache, and compute the resulting overall approximation cost (the approach can be generalized to $\gamma \neq 1$, but we omit the details here).
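For $\gamma = 1$, the separation distance $z$ follows from an indifference condition: at the boundary, the cost of local approximation at the leaf equals the movement cost plus the approximation cost at the parent. The minimal sketch below (the function name is ours) encodes this condition and checks it numerically.

```python
def boundary_shift(r, h):
    """Width z of the band of each leaf cell handed over to the parent (gamma = 1).

    Along the segment joining a leaf centroid to the nearest parent centroid
    (at norm-1 distance r under the maximal shift of Fig. 2), a request at
    distance d from the leaf centroid is at distance r - d from the parent
    centroid. The two options cost the same when d = (r - d) + h, i.e.,
    d = (r + h) / 2; the parent serves the band of width z = r - d beyond it."""
    return max(0.0, (r - h) / 2.0)

# sanity check of the indifference condition (illustrative values)
r, h = 1.0, 0.4
z = boundary_shift(r, h)
d = r - z                                    # split point, measured from the leaf centroid
assert abs(d - ((r - d) + h)) < 1e-12        # equal cost at the boundary
assert boundary_shift(1.0, 2.0) == 0.0       # h > r: requests are never forwarded
```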
5. NetDuel: an online dynamic policy
Although in our work we have focused on the static, offline problem of content allocation at similarity caches, we have also devised an online, $\lambda$-unaware dynamic policy, NetDuel, which is a networked version of the policy Duel we proposed in [12]. At a high level, it is based on the following idea: each (real) content currently in the cache is paired to a (virtual) content competing with it.¹ The cumulative savings in the total cost produced by the real and the virtual objects are observed over a suitable time window, and if the saving of the virtual object exceeds the saving of the real one by a sufficient amount, the virtual object replaces the real one in the cache. Otherwise, at the end of the observation window, the virtual object is discarded, and afterwards the real object is paired to a new virtual object taken from the arrival process. NetDuel achieves an allocation close to the optimal one, suggesting that effective online dynamic policies can be devised for networks of similarity caches, at least under the assumption that each node knows when to forward requests upstream.

¹ The cache stores only the metadata of a virtual object, not the object itself. Virtual objects are taken from the arrival process.

Figure 3: Total cost obtained by Greedy, LocalSwap, the continuous approximation and NetDuel in a tandem network with arrivals at the leaf, for σ = L/2 and σ = L/8.
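The duel mechanism described above can be sketched as follows. This is a simplified, single-cache rendition: the class and parameter names (`W`, `delta`) are ours and, unlike the actual policy of [12], every duelling object is credited with the saving it would yield on every request, which keeps the sketch short.

```python
import random

class DuelCache:
    """Simplified sketch of the duel mechanism for a single similarity cache.

    Each stored (real) object can be paired with a virtual challenger taken
    from the arrival process; the savings both would produce are accumulated
    over a window of W requests, and the challenger replaces the real object
    if it saved at least delta more."""

    def __init__(self, contents, dist, W=100, delta=0.0):
        self.contents = list(contents)     # real objects currently stored
        self.dist = dist                   # dissimilarity cost d(x, y)
        self.W, self.delta = W, delta
        self.duels = {}                    # real -> [virtual, age, s_real, s_virt]

    def _saving(self, o, r, cost_miss):
        # saving w.r.t. retrieving r from the server at cost cost_miss
        return max(0.0, cost_miss - self.dist(o, r))

    def request(self, r, cost_miss):
        best = min(self.contents, key=lambda o: self.dist(o, r))
        for o in list(self.duels):
            duel = self.duels[o]
            duel[1] += 1
            duel[2] += self._saving(o, r, cost_miss)
            duel[3] += self._saving(duel[0], r, cost_miss)
            if duel[1] >= self.W:          # window over: keep the winner
                if duel[3] > duel[2] + self.delta:
                    self.contents.remove(o)
                    self.contents.append(duel[0])
                del self.duels[o]
        # pair one idle real object with r as a new virtual challenger
        idle = [o for o in self.contents if o not in self.duels]
        if idle and r not in self.contents:
            self.duels[random.choice(idle)] = [r, 0, 0.0, 0.0]
        return min(cost_miss, self.dist(best, r))

# usage: scalar items, absolute-difference dissimilarity, retrieval cost 5
cache = DuelCache([0, 10], dist=lambda a, b: abs(a - b), W=10)
for _ in range(100):
    c = cache.request(4, cost_miss=5.0)
```

In this toy run, a stream of requests for item 4 lets the virtual copy of 4 win its duels, so the cache eventually stores it and serves subsequent requests at zero cost.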
6. Numerical experiments
To test our algorithms, we consider 10000 objects falling on the points of a bi-dimensional $L \times L$ grid with $L = 100$, equipped with the norm-1 metric and the local cost $C_a(x, y) = d(x, y)^\gamma$, where (unless otherwise specified) we take $\gamma = 1$. The request process follows a Gaussian distribution, such that the request rate of object $i$ is proportional to $\exp(-d_i^2/(2\sigma^2))$, where $d_i$ is the hop distance from the grid center. To jointly test our continuous approximations, we assume that each grid point $i$ is the center of a small square of area 1, on which $\lambda$ is assumed to be constant and equal to $\lambda_i$.

We first consider a simple tandem network with arrivals only at the leaf, and fixed cost $h$ to reach the parent. In Fig. 3 we compare the total cost produced by Greedy, LocalSwap, the continuous approximation (the solution of (11)) and
NetDuel, as a function of $h$, for a large Gaussian ($\sigma = L/2$) or a narrow Gaussian ($\sigma = L/8$).
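The synthetic workload just described can be generated in a few lines; the centering convention for the even-sized grid below is our own choice.

```python
import numpy as np

L = 100
# grid coordinates and norm-1 (hop) distance from the grid center
xs, ys = np.meshgrid(np.arange(L), np.arange(L), indexing="ij")
d = np.abs(xs - (L - 1) / 2) + np.abs(ys - (L - 1) / 2)

def request_rates(sigma):
    """Normalized rates lambda_i proportional to exp(-d_i^2 / (2 sigma^2))."""
    rates = np.exp(-d ** 2 / (2 * sigma ** 2))
    return rates / rates.sum()

# sample a synthetic request trace (flat indices of grid points)
rng = np.random.default_rng(0)
lam = request_rates(L / 8)
trace = rng.choice(L * L, size=10_000, p=lam.ravel())
```

Feeding such a trace to the placement algorithms then only requires the local cost $C_a$ induced by the norm-1 distance between grid points.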
Figure 4: Allocations obtained by Greedy, LocalSwap, the continuous approximation and NetDuel in the tandem network with $\sigma = L/2$, $h = 3$. Circle marks for the parent cache and triangle marks for the leaf cache.

In all cases, LocalSwap performs better than Greedy, which in turn performs better than NetDuel. The continuous approximation does not necessarily provide a lower bound for the discrete algorithms/policies, since it refers to a different system in which the request space is continuous, rather than constrained to the grid points. However, we do observe that the continuous approximation curve gets closer to the curve produced by
LocalSwap for $\sigma = L/2$, since $\lambda$ varies more smoothly over the domain. In Fig. 4 we show the allocations (circles for the parent, triangles for the leaf) produced by the four approaches above in the case $\sigma = L/2$, $h = 3$, using two different colors for the sub-domains where requests arriving at the leaf are approximated by the leaf or by the parent. We observe that Greedy and
NetDuel produce more irregularities than
LocalSwap, as compared to the theoretical prediction of the continuous approximation. In Fig. 5 we report, for a larger system with 100000 contents, the allocation produced at the parent by
LocalSwap in a tandem network with requests arriving at both nodes, showing also with two different colors the regions where requests arriving at the leaf are approximated by the leaf or by the parent. We consider both a Gaussian arrival process with $\sigma = L/2$ and a uniform arrival process, taking $h = 3$.

² For the continuous approximation, we do not show stored contents, and (border) squarelets are considered as handled exclusively by the parent if $w_{i,2} > w_{i,1}$.

Figure 5: Parent allocation obtained by LocalSwap in a tandem network with arrivals at both nodes. Gaussian traffic (left plot) and uniform traffic (right plot).

Notice that the parent cache²
LocalSwap suggest that now, for the requests arriving at theleaf, the regions served directly by the leaf and the regions approximated bythe parent are intertwined in a complex way. For uniform λ , Fig. 6 shows theaccuracy of the continuous approximation based on the shifted regular squaretessellations shown in Fig. 2. By crawling the Amazon web-store, the authors of [23] built an image-baseddataset of users’ preferences for millions of items. Using a neural network pre-trained on ImageNet, each item is embedded into a d -dimensional space, onwhich Euclidean distance is used as item similarity. We consider as requestprocess the timestamped reviews left by users for the 10000 most popular itemsbelonging to the baby category, with d = 100. The resulting trace, containingabout 10.3M requests, is fed into a cache of size 100, with a parent cache ofthe same size (a tandem network) reachable by paying an additional fixed cost h = 150. The local approximation cost is set equal to the Euclidean distance.In Fig. 7 we show the allocations produced by LocalSwap in both caches,reporting, for each stored item, the popularity rank ( x axes) and the distancefrom the baricenter ( y axes). Across the entire catalog we found no correlationbetween popularity rank and distance from the baricenter. Nevertheless, we doobserve that the leaf cache tends to store items that are either very popular orvery close to the baricenter. The resulting total cost is C = 266 (left plot inFig. 7).Moreover, by computing the request density within spherical shells at dis-tance d ∈ [ ρ, ρ + 1] from the baricenter, we found a decreasing trend in ρ , seeFig. 8, which justifies the attempt of ‘enforcing’ the structure of the optimalsolution that we found in chain topologies fed only from the leaf. We do so byconstraining the leaf (parent) cache to store only contents at distance from thebaricenter smaller (larger) than a given threshold d ∗ . 
The constrained LocalSwap algorithm obtains, for the best possible $d^* = 350$, a total cost $C = 269$ (only 1% worse than before), right plot in Fig. 7, suggesting that a simple allocation and forwarding rule based on the distance from the barycenter is close to optimal also in a realistic scenario.

Figure 6: Total cost in the tandem network with arrivals at both nodes, $\lambda$ uniform, as a function of $h$, for different values of $\gamma$, according to LocalSwap (points) and the continuous approximation (curves).

Figure 7: Allocations obtained by LocalSwap in a tandem network with arrivals at the leaf according to the Amazon trace. Unconstrained version (left) and constrained version (right).

Figure 8: Density of requests of the Amazon trace within spherical shells at distance $d \in [\rho_k, \rho_{k+1}]$ from the barycenter.
7. Conclusions
We made a first step into the analysis of networks of similarity caches, focusing on the offline problem of static content allocation. Despite the NP-hardness of the problem, effective greedy algorithms can be devised with guaranteed performance, but they become prohibitive as the system size increases. For very large request space/catalog sizes, we relaxed the problem to the continuous setting, obtaining for equi-depth tree topologies an easily implementable solution with a simple structure, which greatly simplifies the related request forwarding problem. This simple structure is unfortunately lost in more general networks. We have also proposed a first online dynamic policy, though much more can be done in the design of practical online policies and request forwarding strategies for similarity caching networks.
Appendix A. Proof of Proposition 3.2
Proof.
We first show that the constraints are matroid constraints. The empty set obviously belongs to $\mathcal{I}$, and if $\mathcal{A} \subset \mathcal{B}$ with $\mathcal{B} \in \mathcal{I}$, then $\mathcal{A} \in \mathcal{I}$. Finally, given two allocations with $|\mathcal{A}| < |\mathcal{B}|$, there exists a cache $i$ that stores fewer elements under $\mathcal{A}$ than under $\mathcal{B}$, i.e., such that $\sum_{o':(o',i) \in \mathcal{A}} 1 < \sum_{o':(o',i) \in \mathcal{B}} 1$. Then, there exists an object $o$ that is stored at $i$ under $\mathcal{B}$, but not under $\mathcal{A}$. As $\sum_{o':(o',i) \in \mathcal{A}} 1 < \sum_{o':(o',i) \in \mathcal{B}} 1 \le k_i$, $\mathcal{A} \cup (o, i)$ is still a feasible allocation.

We now prove that $G(\mathcal{A})$ is a non-negative monotone submodular function:
$$\begin{aligned}
G(\mathcal{A}) &= \sum_r \lambda_r C(r, \emptyset) - \sum_r \lambda_r C(r, \mathcal{A}) \\
&= \sum_r \lambda_r \left( C(r, \emptyset) - C(r, \mathcal{A}) \right) \\
&= \sum_r \lambda_r \left( C(r, \emptyset) - \min_{\alpha \in S \cup \mathcal{A}} C(r, \alpha) \right) \\
&= \sum_r \lambda_r \left( C(r, \emptyset) - \min\left( \min_{\alpha \in \mathcal{A}} C(r, \alpha),\, C(r, \emptyset) \right) \right) \\
&= \sum_r \lambda_r \left( C(r, \emptyset) - \min_{\alpha \in \mathcal{A}} \min\left( C(r, \alpha),\, C(r, \emptyset) \right) \right) \\
&= \sum_r \max_{\alpha \in \mathcal{A}} \lambda_r \left( C(r, \emptyset) - \min\left( C(r, \alpha),\, C(r, \emptyset) \right) \right) \\
&= \sum_r \max_{\alpha \in \mathcal{A}} \lambda_r \max\left( C(r, \emptyset) - C(r, \alpha),\, 0 \right).
\end{aligned}$$
Then $G(\mathcal{A}) = \sum_r \max_{\alpha \in \mathcal{A}} M_{r,\alpha}$, where $M_{r,\alpha} \ge 0$ for every $r$ and $\alpha$. The set function is obviously monotone (i.e., if $\mathcal{A} \subset \mathcal{B}$, then $G(\mathcal{A}) \le G(\mathcal{B})$) and non-negative, and corresponds to the utility of a facility location problem, which is known to be submodular (e.g., [24], but it is also easy to check directly).

References

[1] A. Gionis, P. Indyk, R. Motwani, Similarity search in high dimensions via hashing, in: Proceedings of the 25th International Conference on Very Large Data Bases, VLDB '99, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1999, pp. 518–529. URL https://dl.acm.org/doi/10.5555/645925.671516

[2] F. Falchi, C. Lucchese, S. Orlando, R. Perego, F. Rabitti, A Metric Cache for Similarity Search, in: Proceedings of the 2008 ACM Workshop on Large-Scale Distributed Systems for Information Retrieval, LSDS-IR '08, ACM, New York, NY, USA, 2008, pp. 43–50. doi:10.1145/1458469.1458473.

[3] S. Pandey, A. Broder, F. Chierichetti, V. Josifovski, R. Kumar, S. Vassilvitskii, Nearest-neighbor Caching for Content-match Applications, in: Proceedings of the 18th International Conference on World Wide Web, WWW '09, ACM, New York, NY, USA, 2009, pp. 441–450. doi:10.1145/1526709.1526769.

[4] U. Drolia, K. Guo, J. Tan, R. Gandhi, P. Narasimhan, Cachier: Edge-caching for recognition applications, in: Proc. of IEEE ICDCS, IEEE, 2017, pp. 276–286. doi:10.1109/ICDCS.2017.94.

[5] U. Drolia, K. Guo, P. Narasimhan, Precog: Prefetching for image recognition applications at the edge, in: Proc. of ACM/IEEE Symposium on Edge Computing, 2017, pp. 1–13. doi:10.1145/3132211.3134456.

[6] P. Guo, B. Hu, R. Li, W. Hu, Foggycache: Cross-device approximate computation reuse, in: Proc. of MobiCom, 2018, pp. 19–34. doi:10.1145/3241539.3241557.

[7] S. Venugopal, M. Gazzetti, Y. Gkoufas, K. Katrinis, Shadow puppets: Cloud-level accurate AI inference at the speed and economy of edge, in: USENIX HotEdge, 2018.

[8] P. Sermpezis, T. Giannakas, T. Spyropoulos, L. Vigneri, Soft Cache Hits: Improving Performance Through Recommendation and Delivery of Related Content, IEEE Journal on Selected Areas in Communications 36 (6) (2018) 1300–1313. doi:10.1109/JSAC.2018.2844983.

[9] D. Crankshaw, X. Wang, G. Zhou, M. J. Franklin, J. E. Gonzalez, I. Stoica, Clipper: A Low-Latency Online Prediction Serving System, in: 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17), USENIX Association, Boston, MA, 2017, pp. 613–627.

[10] F. Chierichetti, R. Kumar, S. Vassilvitskii, Similarity Caching, in: Proceedings of the Twenty-eighth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS '09, ACM, New York, NY, USA, 2009, pp. 127–136. doi:10.1145/1559795.1559815.

[11] A. Sabnis, T. Si Salem, G. Neglia, M. Garetto, E. Leonardi, R. K. Sitaraman, GRADES: Gradient descent for similarity caching, in: IEEE Conference on Computer Communications (INFOCOM), 2021.

[12] M. Garetto, E. Leonardi, G. Neglia, Similarity caching: Theory and algorithms, in: IEEE INFOCOM 2020 - IEEE Conference on Computer Communications, IEEE Press, 2020, pp. 526–535. doi:10.1109/INFOCOM41043.2020.9155221.

[13] E. J. Rosensweig, D. S. Menasche, J. Kurose, On the steady-state of cache networks, in: 2013 Proceedings IEEE INFOCOM, 2013, pp. 863–871. doi:10.1109/INFCOM.2013.6566874.

[14] K. Shanmugam, N. Golrezaei, A. G. Dimakis, A. F. Molisch, G. Caire, Femtocaching: Wireless content delivery through distributed caching helpers, IEEE Transactions on Information Theory 59 (12) (2013) 8402–8413. doi:10.1109/TIT.2013.2281606.

[15] N. Choungmo Fofack, P. Nain, G. Neglia, D. Towsley, Performance evaluation of hierarchical TTL-based cache networks, Computer Networks 65 (2014) 212–231. doi:10.1016/j.comnet.2014.03.006.

[16] S. Ioannidis, E. Yeh, Adaptive caching networks with optimality guarantees, IEEE/ACM Transactions on Networking 26 (2) (2018) 737–750. doi:10.1109/TNET.2018.2793581.

[17] E. Leonardi, G. Neglia, Implicit coordination of caches in small cell networks under unknown popularity profiles, IEEE Journal on Selected Areas in Communications 36 (6) (2018) 1276–1285. doi:10.1109/JSAC.2018.2844982.

[18] Y. Li, S. Ioannidis, Universally stable cache networks, in: IEEE INFOCOM 2020 - IEEE Conference on Computer Communications, 2020, pp. 546–555. doi:10.1109/INFOCOM41043.2020.9155416.

[19] J. Zhou, O. Simeone, X. Zhang, W. Wang, Adaptive offline and online similarity-based caching, IEEE Networking Letters 2 (4) (2020) 175–179. doi:10.1109/LNET.2020.3031961.

[20] N. Golrezaei, K. Shanmugam, A. G. Dimakis, A. F. Molisch, G. Caire, Femtocaching: Wireless video content delivery through distributed caching helpers, in: 2012 Proceedings IEEE INFOCOM, 2012, pp. 1107–1115. doi:10.1109/INFCOM.2012.6195469.

[21] M. L. Fisher, G. L. Nemhauser, L. A. Wolsey, An analysis of approximations for maximizing submodular set functions—II, Springer Berlin Heidelberg, Berlin, Heidelberg, 1978, pp. 73–87. doi:10.1007/BFb0121195.

[22] G. Calinescu, C. Chekuri, M. Pál, J. Vondrák, Maximizing a monotone submodular function subject to a matroid constraint, SIAM Journal on Computing 40 (6) (2011) 1740–1766. doi:10.1137/080733991.

[23] J. McAuley, C. Targett, Q. Shi, A. van den Hengel, Image-based recommendations on styles and substitutes, in: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '15, Association for Computing Machinery, New York, NY, USA, 2015, pp. 43–52. doi:10.1145/2766462.2767755.

[24] A. Krause, D. Golovin, Submodular Function Maximization, Cambridge University Press, 2014, pp. 71–104. doi:10.1017/CBO9781139177801.004.