On the Power of False Negative Awareness in Indicator-based Caching Systems

Abstract

Distributed caching systems such as content distribution networks often advertise their content via lightweight approximate indicators (e.g., Bloom filters) to efficiently inform clients where each datum is likely cached. While false-positive indications are necessary and well understood, most existing works assume no false-negative indications. Our work illustrates practical scenarios where false-negatives are unavoidable and ignoring them has a significant impact on system performance. Specifically, we focus on false-negatives induced by indicator staleness, which arises whenever the system advertises the indicator only periodically, rather than immediately reporting every change in the cache. Such scenarios naturally occur, e.g., in bandwidth-constrained environments or when latency impedes the ability of each client to obtain an updated indicator. Our work introduces novel false-negative aware access policies that continuously estimate the false-negative ratio and sometimes access caches despite negative indications. We present optimal policies for homogeneous settings and provide approximation guarantees for our algorithms in heterogeneous environments. We further perform an extensive simulation study with multiple real system traces. We show that our false-negative aware algorithms incur a significantly lower access cost than existing approaches, or match the cost of these approaches while requiring an order of magnitude fewer resources (e.g., caching capacity or bandwidth).


arXiv preprint [cs.NI]

Itamar Cohen (Department of Electronics and Telecommunications, Politecnico di Torino, Italy)
Gil Einziger (Department of Computer Science, Ben-Gurion University of the Negev, Beer Sheva, Israel)
Gabriel Scalosub (School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer Sheva, Israel)


I. INTRODUCTION

Caches are extensively used in networking environments such as Content Delivery Networks [1]–[3], Named Data Networks [4], 5G networks [5], and Information Centric Networks [6]. In such networks, accessing caches often incurs some overhead in terms of latency, bandwidth, or energy [3], [7]. On the other hand, fetching a datum without caches usually incurs a larger miss penalty, e.g., for retrieving the requested item from a remote server [2].

In large distributed systems, caches often further optimize performance by advertising their content [1]–[4], [6], [8]. Such advertisements allow clients to minimize costs by selecting which cache to access for a requested datum. Ideally, the advertisement policy would always accurately reflect the up-to-date content of every cache in the network. However, such a solution requires a prohibitive amount of memory, computation, and bandwidth resources. Hence, systems often compromise some accuracy for efficiency by periodically advertising approximate indicators. Indicators are data structures that trade accuracy for space efficiency. Common embodiments of such indicators are Bloom filters [7], [9]–[11] and fingerprint hash tables [12].

Such approximations commonly introduce the risk of false-positive errors, i.e., the indicator sometimes wrongly indicates that a datum is stored in the cache. In such a case, accessing the cache results in an unnecessary cache access, which translates to an excessive cost. Consequently, the problem of advertising space-efficient indicators while keeping a low false-positive rate has attracted considerable research effort [7], [9]–[11], [13]. Other works addressed the cache selection problem, namely, selecting which cache to access when there are one or more positive indications, some of which may actually be false-positives [7], [14].

Most previous works [3], [7], [14] assume that there are no false-negative indications.
Indeed, there exist indicators that theoretically guarantee a false-negative ratio of zero (e.g., a simple Bloom filter [9]), or a negligible false-negative ratio (e.g., a Counting Bloom Filter (CBF) [10]). However, to realize this guarantee in a practical distributed environment, every cache should advertise its indicator to all the clients in the network upon every change in the cached content, usually resulting in prohibitive bandwidth consumption. For instance, a leading CDN provider reports using Bloom filters of about 70MB in size in every cache [2]. Insisting on sending an update upon every change in the cached content in such a system may result in the advertised indicators consuming more bandwidth than the cached content itself. Hence, caches commonly advertise their content only periodically.

When using periodic updates, the advertised content gradually becomes stale. Namely, it takes time for the indicator available at the clients to reflect changes in the cached content. Unfortunately, such staleness may lead to a significant increase in false-negative indications. To illustrate the problem, consider a cache that advertises a fresh indicator, and later admits a new item $x$. When the client tests for $x$, the indicator is likely to wrongly indicate that $x$ is not in the cache (as it wasn't in the cache when the advertisement was sent), resulting in a false-negative error. Such scenarios are quite common in highly dynamic networks, such as 5G networks [5].

To explore the significance of false-negatives caused by staleness, consider Fig. 1. The figure presents the false-negative ratio as a function of the time between subsequent advertisements, referred to as the update interval. We measure the update interval by the number of cache changes (insertions of new items). The indicator used is an optimally configured simple Bloom filter [9], where the figure shows distinct indicators with a varying number of Bits Per cached Element (bpe); a higher bpe implies a larger indicator, which is guaranteed to provide a lower false-positive ratio [13]. Notice that both the X-axis and the Y-axis are in logarithmic scale.

[Fig. 1. Effect of the update interval on the false-negative ratio. Both axes are in log-scale, the cache size is 10K, and the policy is LRU. The traces are Wiki and Gradle (described in Sec. V). The distinct plots for increasing values of bits-per-cached-element (bpe) correspond to increasing indicator sizes, with decreasing false-positive ratios, respectively.]

Fig. 1 shows that the false-negative ratio increases dramatically with the update interval for all indicator sizes. Furthermore, this phenomenon is manifested for various types of workloads; Fig. 1 shows it for two specific traces, Wiki and Gradle (which represent significantly distinct workloads, as described in Sec. V-A). For instance, it is not uncommon to have a false-negative ratio as high as 10% when the update interval is above 1K. Most interestingly, using a larger indicator, which guarantees a lower inherent false-positive ratio [13], results in a higher false-negative ratio. We discuss and explain this phenomenon in more detail in Sec. V-C.

Designing an access strategy that considers both false-positives and false-negatives is a challenging task. In particular, it is unclear whether, or when, it may be beneficial to access a cache despite a negative indication. However, to the best of our knowledge, despite its importance, this problem has never been studied.

A. Our Contribution

We consider the problem of accessing a multi-cache system while using indicators that exhibit both false-positive and false-negative indications. We challenge the common practice which assumes that it is always better not to access caches with negative indications. Specifically, we develop a framework that supports false-negative awareness, and design policies that actively access caches with negative indications, aiming at minimizing the overall access cost.

We first present an algorithm for fully-homogeneous environments, and show that it is optimal in terms of the overall access cost. These results appear in Sec. III. In Sec. IV, we develop a strategy for heterogeneous environments where both the cache and the client estimate some of the underlying distributions (which may depend, inter alia, on the system configuration and the workload being served). Our suggested false-negative aware (FNA) strategy makes deliberate accesses also to caches with negative indications. Furthermore, we show that any approximation guarantee provided by a false-negative oblivious (FNO) access strategy (in our model) can be used for our false-negative aware strategy. In particular, we show how to employ known FNO strategies as subroutines, which induce their performance guarantees on our proposed FNA solution.

Finally, in Sec. V, we present the results of our in-depth simulation study, where we evaluate the performance of our proposed solution in varying system configurations. We show that our FNA strategy yields a significant reduction in access costs in many real-life scenarios, compared to state-of-the-art FNO approaches. Furthermore, our results show that our FNA strategy with minimal resources obtains comparable results to those obtained by FNO strategies that use considerably more resources. For instance, our results indicate that in order to match the performance of our FNA strategy, an FNO approach might require as much as an order of magnitude more resources (e.g., in terms of system caching capacity or the bandwidth required for indicator advertisement).

B. Related Work

As described in the previous section, indicators are commonly used to periodically advertise the content of caches in an efficient manner. Since indicators are of bounded size, they usually fail to provide a precise representation of the cache content and exhibit false-positive indications [9], [13]. The pioneering work of [7] shows that due to these false-positives, sometimes naively relying on an indicator for accessing even a single cache may do worse than not using an indicator at all. Subsequent work [14] tackles a distributed scenario where multiple caches advertise indicators, and develops access strategies that take into account both the access cost and the false-positive rate in each cache, to minimize the overall expected cost. However, [7], [14] disregard false-negative indications.

The work of [15] studies the problem of false-negatives in practical deployments of Counting Bloom Filters [10]. Other techniques to reduce the false-negative ratio in numerous variants of Bloom filters are surveyed in [11]. These works address false-negatives that stem from architectural design, i.e., from the concrete data structures used to implement indicators. Consequently, these works focus on developing enhanced data structures that reduce such false-negatives. In contrast, we focus on false-negatives caused by staleness, i.e., false-negatives that follow from the operational usage of the system. Such false-negatives may occur in any indicator, even if its design is false-negative-free, such as a simple Bloom filter [9], [13]. In this sense our approach is orthogonal to previous work targeting the reduction of false-negatives [11], [15], and these approaches may be seamlessly combined with our proposed solutions.

Since constantly advertising a fresh indicator might be prohibitively costly, in practice caches commonly advertise fresh indicators only periodically [1], [16]–[18], where one usually refers to the period between the advertisements of fresh indicators as the update interval.
Several works addressed the interplay between the update interval and performance by means of simulations [1], [17], [18]. The works [19], [20] reduce the transmission overheads by accurately advertising important information, while allowing less important information to be stale, or less accurate. Some previous work [16] analyzed the impact of stale Bloom filter replicas on the false-positive ratio and the false-negative ratio. However, the framework of [16] implicitly assumes that requests are drawn from a uniform distribution, and that each object is stored in a single cache. This framework may conform to the properties of distributed storage systems. However, in many (if not most) real-life distributed caching environments, workloads do not conform to a uniform distribution of requests, and furthermore objects may be found in either a single cache, multiple caches, or no cache at all [3], [4], [6]. Part of our analysis of such general environments is inspired by ideas introduced in [16] (see, e.g., Sec. IV-B).

The problem of stale indicators relates to other problems of decision making under uncertainty. In particular, our problem is closely related to the concept of the Age of Information (AoI). The AoI quantifies the time since the generation of the last successfully received information from a remote system. The AoI paradigm was applied to numerous environments, e.g., vehicular networks, scheduling, and buffer management; a detailed survey can be found in [21]. The AoI was also applied to caches [22], but in the context of the coherency of the cached data, while we focus on the coherency of the indicators.

Lastly, the question of whether to follow the recommendation of a binary indicator has been extensively studied in the context of branch prediction [23]. However, the models used to study such systems are significantly different from the ones considered in our work. In particular, these models do not adhere to traditional cache-memory models, which lie at the core of our work.

II. SYSTEM MODEL AND PRELIMINARIES

This section formally defines our system model and notations, which are summarized in Table I. We consider a set $N$ of $n = |N|$ caches, containing possibly overlapping sets of items. We associate each time $t$ with a unique item request $x_t$ issued at time $t$, and we refer to the entire sequence of requests as $\sigma$. Let $S_{j,t}$ denote the set of items stored at cache $j$ at time $t$, prior to handling request $x_t$. For every request $x_t$ issued at time $t$, drawn from some distribution, we let $h_{j,t}$ denote the probability that $x_t \in S_{j,t}$. This probability depends both on the distribution of the requests and on the cache policy. The average of $h_{j,t}$ over the entire sequence is commonly referred to as the hit ratio, i.e., the fraction of requests in $\sigma$ that were available in cache $j$ upon being issued. Similarly to previous works, we assume that the past hit ratio is a reasonable estimate of $h_{j,t}$ [24], [25]. We refer to this estimation as the probability that the next accessed item $x_t$ is available in $S_{j,t}$ (we provide further details of how to obtain such an estimation in Sec. IV-C).

TABLE I. LIST OF SYMBOLS. THE TOP PART CORRESPONDS TO OUR SYSTEM MODEL (SECTION II), THE MIDDLE PART CORRESPONDS TO THE FULLY-HOMOGENEOUS CASE (SEC. III), AND THE BOTTOM PART CORRESPONDS TO THE HETEROGENEOUS PART (SECTIONS IV-V).

Symbol : Meaning
(System model, Section II)
$N$ : Set of caches
$n$ : Number of caches: $n = |N|$
$x$ ($x_t$) : Item request (issued at time $t$)
$N_x$ : Set of caches with positive indications for requested item $x$
$n_x$ : Number of positive indications for requested item $x$: $n_x = |N_x|$
$S_j$ : The set of data items in cache $j$
$h_j$ : Hit ratio of cache $j$, $h_j = \Pr(x \in S_j)$
$I_j$ : Indicator of cache $j$
$I_j(x)$ : Indication of indicator $I_j$ for item $x$
$FP_j$ : False-positive ratio for $I_j$
$FN_j$ : False-negative ratio for $I_j$
$\pi_j$ : Probability of a miss in cache $j$ given a positive indication
$\nu_j$ : Probability of a miss in cache $j$ given a negative indication
$q_j$ : Probability of a positive indication in indicator $I_j$
$c_j$ : Access cost of cache $j$
$M$ : Miss penalty
$\phi$ : Cost function. See Eq. (4).
(Fully-homogeneous case, Sec. III)
$r_0$ : Number of caches with negative indication accessed
$r_1$ : Number of caches with positive indication accessed
$\hat{\phi}$ : Cost function for the fully-homogeneous case. See Eq. (5).
$r_0^*$ : Optimal choice of $r_0$. See Eq. (6).
$r_1^*$ : Optimal choice of $r_1$. See Eq. (6).
(Heterogeneous case, Sections IV-V)
$k$ : Number of hash functions in the Bloom filter
$C_j$ : Size of cache $j$ ($|S_j| \le C_j$)
bpe : Bits per cached element in the Bloom filter
$B_1(t)$ : Number of '1' bits in the updated Bloom filter at time $t$
$B_0(t)$ : Number of '0' bits in the updated Bloom filter at time $t$
$\Delta_1(t)$ : Number of bits that are '1' in the updated Bloom filter, but '0' in the stale Bloom filter at time $t$
$\Delta_0(t)$ : Number of bits that are '0' in the updated Bloom filter, but '1' in the stale Bloom filter at time $t$
$\delta$ : Smoothness parameter of moving average. See Eq. (9).

Each cache $j$ maintains an indicator $I_{j,t}$, which approximates the set of items in cache $j$ at time $t$; given an item $x$, $I_{j,t}(x) = 1$ is referred to as a positive indication, while $I_{j,t}(x) = 0$ is considered a negative indication.

The false-positive ratio of $I_{j,t}$ is defined by $FP_{j,t} = \Pr(I_{j,t}(x) = 1 \mid x \notin S_{j,t})$. It captures the probability that, given a request $x$ issued at time $t$, the indicator would mistakenly indicate that it is in $S_{j,t}$. Similarly, the false-negative ratio of $I_{j,t}$ is defined by $FN_{j,t} = \Pr(I_{j,t}(x) = 0 \mid x \in S_{j,t})$.
It captures the probability that, given a request for $x$ issued at time $t$, the indicator would mistakenly indicate that it is not in $S_{j,t}$.

For every cache $j$, and every time $t$, we denote by $\pi_{j,t} = \Pr(x \notin S_{j,t} \mid I_{j,t}(x) = 1)$ the positive exclusion probability, that is, the probability that a requested item $x$ is not in the cache, despite a positive indication. Similarly, we let $\nu_{j,t} = \Pr(x \notin S_{j,t} \mid I_{j,t}(x) = 0)$ denote the negative exclusion probability, that is, the probability that a requested item $x$ is not in the cache, given a negative indication. We denote by $q_{j,t}$ the probability of a positive indication for an item requested from cache $j$, and refer to $q_{j,t}$ as the positive indication ratio. When clear from the context, we abuse notation and omit the time $t$.

Since a positive indication occurs when either $x \in S_j$ and no false-negative occurs, or $x \notin S_j$ and a false-positive occurs, we have

$$q_j = \Pr(I_j(x) = 1) = h_j \cdot (1 - FN_j) + (1 - h_j) \cdot FP_j. \quad (1)$$

Using Bayes' theorem it follows that

$$\pi_j = \Pr(x \notin S_j \mid I_j(x) = 1) = FP_j \cdot (1 - h_j) / q_j \quad (2)$$

$$\nu_j = \Pr(x \notin S_j \mid I_j(x) = 0) = (1 - FP_j) \cdot (1 - h_j) / (1 - q_j), \quad (3)$$

for $q_j$ as defined in Eq. (1).

We say that a system is sufficiently-accurate if for every indicator of cache $j$, $FP_j + FN_j < 1$. We note that in most real-life scenarios, both the false-positive ratio $FP_j$ and the false-negative ratio $FN_j$ are well below 0.5, and therefore such systems are sufficiently-accurate. The following simple condition characterizes sufficiently-accurate systems (proof omitted due to space constraints).

Proposition 1. A system is sufficiently-accurate iff for every $j$ it holds that $\nu_j > \pi_j$.

For any queried item $x$, let $N_x$ denote the set of caches with positive indications, i.e., $N_x = \{ j \mid I_j(x) = 1 \}$. We let $n_x$ denote the number of caches with positive indications, i.e., $n_x = |N_x|$.

A request for datum $x$ triggers a data access which consists of (i) querying for $x$ in all the $n$ indicators, (ii) selecting a subset $D \subseteq N$ of caches, and (iii) accessing all the $|D|$ selected caches in parallel. Accessing each cache incurs some predefined access cost, $c_j$. For ease of presentation, we assume without loss of generality that $\min_j c_j = 1$. The overall access cost of accessing a set $D$ of caches is $c_D = \sum_{j \in D} c_j$.

A multi-cache data access is considered a hit if the item $x$ is found in at least one of the accessed caches, and a miss otherwise. A miss incurs a miss penalty of $M$, for some $M \ge 1$. In our model, we do not assume any specific sharing policy among the caches. Yet, in the analysis of our system (Sections III-IV) we assume that the exclusion probabilities are mutually independent. Under this assumption, our analysis provides a baseline for understanding the performance of such systems. However, in the evaluation of our algorithms, we also consider environments where the exclusion probabilities need not be mutually independent (Section V).

For a subset of caches $D$, we define its (expected) miss cost for a query $x$ by $M \cdot \prod_{j \in D,\, I_j(x) = 1} \pi_j \cdot \prod_{j \in D,\, I_j(x) = 0} \nu_j$. The (expected) service cost of a query is the sum of the access cost and the miss cost, namely,

$$\phi_x(D) = \sum_{j \in D} c_j + M \prod_{j \in D,\, I_j(x) = 1} \pi_j \cdot \prod_{j \in D,\, I_j(x) = 0} \nu_j. \quad (4)$$

The Cache Selection (CS) problem is to find a subset of caches $D \subseteq N$ that minimizes the expected cost $\phi_x(D)$.
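As an illustration of Eqs. (1)-(4), the following is a minimal Python sketch, not taken from the paper. The function names and the list-based representation of caches are assumptions made only for this example; the formulas themselves follow the definitions above.

```python
from math import prod


def positive_indication_ratio(h, fp, fn):
    """Eq. (1): q = h * (1 - FN) + (1 - h) * FP."""
    return h * (1.0 - fn) + (1.0 - h) * fp


def exclusion_probabilities(h, fp, fn):
    """Eqs. (2)-(3): return (pi, nu). Assumes 0 < q < 1."""
    q = positive_indication_ratio(h, fp, fn)
    pi = fp * (1.0 - h) / q                  # miss despite a positive indication
    nu = (1.0 - fp) * (1.0 - h) / (1.0 - q)  # miss given a negative indication
    return pi, nu


def service_cost(selected, indications, costs, pis, nus, miss_penalty):
    """Eq. (4): access cost of the selected caches plus the expected miss cost.

    selected     -- indices of the caches in D (hypothetical representation)
    indications  -- indications[j] is True for a positive indication of cache j
    """
    access_cost = sum(costs[j] for j in selected)
    # Exclusion probabilities are assumed mutually independent, as in the analysis.
    miss_prob = prod(pis[j] if indications[j] else nus[j] for j in selected)
    return access_cost + miss_penalty * miss_prob
```

For example, with h = 0.5, FP = 0.1 and FN = 0.2, the sketch gives q = 0.45, and the resulting exclusion probabilities satisfy nu > pi, in line with Proposition 1 since FP + FN < 1. Selecting no cache yields a service cost equal to the miss penalty M.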
In what follows we refer to an access to a cache with a positive indication as a positive access, and to an access to a cache with a negative indication as a negative access. In particular, we consider two types of approaches to solving the CS problem: (i) false-negative oblivious (FNO) schemes, which only perform positive accesses, and (ii) false-negative aware (FNA) schemes, which may also perform negative accesses. While the former may be viewed as the traditional manner in which access strategies are designed, the latter can be viewed as a more speculative approach, which sometimes accesses a cache even with no positive indication, risking an increased access cost.

Footnotes: The definition of sufficiently-accurate systems is inspired by the notions of accuracy and informedness [26]. When clear from the context, we omit the subscript $x$ from $\phi_x$.

III. THE FULLY HOMOGENEOUS CASE

In this section, we focus on a simplified fully-homogeneous case. In such settings, the access cost of all caches is the same, and is normalized to one ($c = 1$). The per-cache hit ratio, false-positive ratio, and false-negative ratio are identical for all caches, i.e., for each cache $j$, $h_j = h$, $FP_j = FP$ and $FN_j = FN$ for some constants $h, FP, FN \in [0, 1]$.

We explore the challenges and potential benefits arising from developing a false-negative aware cache selection strategy. We first describe the aspects specific to such homogeneous settings, and then describe and analyze our false-negative aware Homogeneous Cache Selection policy, HoCS_FNA. Our analysis shows that HoCS_FNA minimizes the service cost in the fully-homogeneous case. Later on, we use HoCS_FNA to derive insights as to when it is beneficial to access a cache despite a negative indication.

A. Preliminaries

In the fully-homogeneous settings, the task of selecting a subset of the caches $D \subseteq N$ that minimizes the service cost is reduced to selecting two integers: $0 \le r_1 \le n_x$, the number of caches with positive indication to access; and $0 \le r_0 \le n - n_x$, the number of caches with negative indication to access. The objective function $\phi$ (Eq. (4)) is reduced to

$$\hat{\phi}(r_0, r_1) = r_0 + r_1 + M \cdot \nu^{r_0} \cdot \pi^{r_1}. \quad (5)$$

We let $r_0^*$ and $r_1^*$ denote the values of $r_0$ and $r_1$ that minimize the service cost, namely

$$\hat{\phi}(r_0^*, r_1^*) = \min_{0 \le r_1 \le n_x,\; 0 \le r_0 \le n - n_x} \hat{\phi}(r_0, r_1). \quad (6)$$

In our analysis of $\hat{\phi}$, we use the extension of $\hat{\phi}$ over the reals, which we hereafter denote by $\tilde{\phi}$. Observe that for any fixed constants $a, b$, the functions $\tilde{\phi}(a, r_1)$ and $\tilde{\phi}(r_0, b)$ are strictly convex. The following proposition will be used throughout our analysis in this section (proof omitted due to space constraints).

Proposition 2. Let $\tilde{f} : \mathbb{R} \to \mathbb{R}$ be a strictly convex function, and let $\hat{f}$ be its restriction over the integers. If $\hat{f}$ obtains its minimum value at some integer $\hat{a} \ge 1$, then $\hat{f}(1) < \hat{f}(0)$.

We recall that the function $\tilde{\phi}(r_0, r_1^*)$ is strictly convex. By applying Proposition 2, we therefore obtain the following corollary.

Corollary 3. If $r_0^* \ge 1$, then $\hat{\phi}(1, r_1^*) < \hat{\phi}(0, r_1^*)$.

Algorithm 1 HoCS_FNA
1: $r_0^* = 0$; $r_1^* = \arg\min_{r_1 \in [0, n_x]} \left[ r_1 + M \cdot \pi^{r_1} \right]$
2: if $M \cdot \pi^{r_1^*} > 1$ then
3:     $r_0^* = \arg\min_{r_0 \in [0, n - n_x]} \left[ r_0 + M \cdot \pi^{r_1^*} \nu^{r_0} \right]$
4: return $r_0^*, r_1^*$

B. An Optimal Strategy

Our algorithm for the fully-homogeneous settings, HoCS_FNA, is formally defined in Algorithm 1. The algorithm first calculates the number of caches with positive indication to access, $r_1^*$, assuming no cache with negative indication is accessed (line 1). Next, if the expected miss cost is still higher than the cost of accessing an additional cache (the condition in line 2), the algorithm also considers caches with negative indications (line 3). Theorem 4 shows that HoCS_FNA is optimal in the fully-homogeneous case.
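The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under the assumptions of this section (unit access costs, identical $\pi$ and $\nu$ for all caches); the function names `hocs_fna` and `homogeneous_cost` are chosen for this example and are not the authors' implementation.

```python
def homogeneous_cost(r0, r1, M, pi, nu):
    """Eq. (5): service cost when accessing r0 negative and r1 positive caches."""
    return r0 + r1 + M * nu ** r0 * pi ** r1


def hocs_fna(n, n_x, M, pi, nu):
    """Sketch of Algorithm 1; returns (r0_star, r1_star).

    n    -- total number of caches
    n_x  -- number of caches with a positive indication
    """
    # Line 1: default r0* = 0; choose r1* assuming no negative access.
    r0_star = 0
    r1_star = min(range(n_x + 1), key=lambda r1: r1 + M * pi ** r1)
    # Line 2: is the residual expected miss cost above the unit access cost?
    if M * pi ** r1_star > 1:
        # Line 3: choose how many caches with negative indication to access.
        r0_star = min(range(n - n_x + 1),
                      key=lambda r0: r0 + M * pi ** r1_star * nu ** r0)
    # Line 4
    return r0_star, r1_star
```

For instance, with n = 5, n_x = 2, M = 100, pi = 0.2 and nu = 0.6, both positive-indication caches are accessed, the residual miss cost is 100 * 0.04 = 4 > 1, and the sketch additionally selects one cache with a negative indication, for a total cost of 5.4 instead of 6.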

Theorem 4. If a fully-homogeneous system is sufficiently-accurate, then HoCS_FNA minimizes the service cost.

Proof. By Proposition 1, we have $\nu > \pi$. In addition, by the definition of $\hat{\phi}$ (Eq. (5)), the objective function $\hat{\phi}(r_0, r_1)$ is symmetric in $r_0, r_1$. It follows that assigning $r_0 > 0$ may reduce cost only if $r_1$ is maximized, namely, if $r_1 = n_x$. Hence, line 1 indeed calculates the optimal value of $r_1^*$. If $M \cdot \pi^{r_1^*} > 1$, then the algorithm sets $r_0^*$ to a value which minimizes the total cost (line 3). Else (namely, if $M \cdot \pi^{r_1^*} \le 1$), accessing cache(s) with negative indications can only increase the total cost, as it increases the aggregate access cost by at least 1, which is at least as high as the potential marginal decrease in the miss cost, which is at most $M \cdot \pi^{r_1^*} \le 1$. Hence, when the if-condition (line 2) is not met, it is best to keep the default value $r_0^* = 0$.

HoCS_FNA implies that sometimes it is better to access a cache in spite of a negative indication. Proposition 5 characterizes such cases.

Proposition 5. Assume $\nu, \pi \in (0, 1)$. (i) If $n_x = 0$, then accessing at least one cache with negative indication strictly reduces the service cost iff $\nu < 1 - \frac{1}{M}$. (ii) If $n_x > 0$, then accessing at least one cache with negative indication strictly reduces the service cost iff $\nu < 1 - \frac{1}{M \pi^{n_x}}$ and $M \cdot \pi^{n_x - 1} (1 - \pi) > 1$.

Proof. We first provide some intuition as to the validity of the claim. Intuitively, $\nu$ is a proxy for the true-negative rate. Hence, the conditions in cases (i) and (ii) reflect the fact that if the expected miss cost after accessing all the caches with a positive indication is high, and the true-negative rate is low, then it might be beneficial to access a cache with a negative indication. We now turn to prove the claim.

(i) Assume $n_x = 0$, implying that $r_1^* = 0$. By the definition of $\hat{\phi}$, the inequality $\nu < 1 - \frac{1}{M}$ holds iff $\hat{\phi}(1, 0) = 1 + M \cdot \nu < M = \hat{\phi}(0, 0)$. By Corollary 3, accessing at least one cache with a negative indication strictly reduces the service cost iff $\hat{\phi}(1, 0) < \hat{\phi}(0, 0)$, thus completing the proof of this case.

(ii) Assume $n_x > 0$. By the definition of $\hat{\phi}$, the condition $M \cdot \pi^{n_x - 1} (1 - \pi) > 1$ holds true iff $n_x + M \pi^{n_x} < n_x - 1 + M \pi^{n_x - 1}$, which is equivalent to $\hat{\phi}(0, n_x) < \hat{\phi}(0, n_x - 1)$. Similarly, the condition $\nu < 1 - \frac{1}{M \pi^{n_x}}$ holds true iff $\hat{\phi}(1, n_x) < \hat{\phi}(0, n_x)$. We therefore have to prove that accessing a cache with a negative indication strictly reduces the service cost iff $\hat{\phi}(1, n_x) < \hat{\phi}(0, n_x) < \hat{\phi}(0, n_x - 1)$.

For the first direction, assume that accessing a cache with negative indication strictly reduces the service cost, namely, $r_0^* > 0$. By the proof of Theorem 4, having $r_0^* > 0$ implies that $r_1^* = n_x$. Hence, by Corollary 3, $\hat{\phi}(1, n_x) < \hat{\phi}(0, n_x)$. Furthermore, HoCS_FNA calculates $r_1^*$ while using $r_0 = 0$ (line 1). We can therefore conclude that $\hat{\phi}(0, n_x) < \hat{\phi}(0, n_x - 1)$, thus completing the proof for this direction.

For the other direction, assume that $\hat{\phi}(1, n_x) < \hat{\phi}(0, n_x) < \hat{\phi}(0, n_x - 1)$. The function $\tilde{\phi}(0, r_1)$ is convex in $r_1$, and therefore does not have a local maximum within the domain $r_1 \in [0, n_x]$. As $\hat{\phi}(0, n_x) < \hat{\phi}(0, n_x - 1)$, it follows that $\tilde{\phi}(0, r_1)$ is monotonically decreasing in the range $r_1 \in [0, n_x]$. Hence, $\hat{\phi}(0, r_1)$ is minimized when $r_1 = n_x$. As HoCS_FNA calculates $r_1^*$ by assigning $r_0 = 0$ (line 1), we know that $r_1^* = n_x$. By assigning $r_1^* = n_x$ in our assumption $\hat{\phi}(1, n_x) < \hat{\phi}(0, n_x)$, we obtain $\hat{\phi}(1, r_1^*) < \hat{\phi}(0, r_1^*)$. Hence, accessing at least one cache with a negative indication strictly reduces the service cost.

HoCS_FNA also implies that sometimes it is better to access no cache, even if there exist positive indications. Proposition 6 characterizes such cases.

Proposition 6. If there exist positive indications but $(1 - h) FP \ge h (1 - FN)(M - 1)$, then the best policy is to access no cache.

Proof. We first provide some intuition as to the validity of our claim. Observe that in a system where $(1 - h) FP$ is large, a positive indication is very likely to be a false-positive. Hence, it may be beneficial not to access a cache despite a positive indication. This is true especially if the miss penalty is small, as reflected by the right-hand side of the condition.

We now turn to prove the claim. It suffices to show that if there exist positive indications, and the best policy is to access at least one cache, then $(1 - h) FP < h (1 - FN)(M - 1)$. If the best policy is to access at least one cache, then either $r_1^* > 0$, or $r_0^* > 0$. By the proof of Theorem 4, assigning $r_0 > 0$ may reduce the cost only if $r_1^* = n_x$. As it is given that there exist positive indications (namely, $n_x > 0$), it follows that $r_0^* > 0$ only if $r_1^* > 0$. Hence, the best policy is to access at least one cache only if $r_1^* > 0$.

Algorithm HoCS_FNA calculates $r_1^*$ by setting $r_0 = 0$ (line 1). As HoCS_FNA minimizes the service cost (Theorem 4), we have that $\hat{\phi}(0, r_1^*) < \hat{\phi}(0, 0)$. As the function $\tilde{\phi}(0, r_1)$ is strictly convex, by Proposition 2 we know that $\hat{\phi}(0, 1) < \hat{\phi}(0, 0)$. By the definition of $\hat{\phi}$ (Eq. (5)), we have $1 + M \cdot \pi < M$. Using the expression for $\pi$ from Eq. (2) (recall that in the homogeneous case, we omit the subscript $j$), we obtain $\frac{FP (1 - h)}{q} < \frac{M - 1}{M}$. Assigning the value of $q$ from Eq. (1), we have $M \cdot FP (1 - h) < (M - 1) \left[ h (1 - FN) + (1 - h) FP \right]$. By algebraic simplification this implies that $FP (1 - h) < h (1 - FN)(M - 1)$.
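The condition of Proposition 6 is easy to evaluate numerically. The short Python sketch below is an illustration only (the function names are hypothetical); it checks the condition and relates it to the comparison $1 + M \cdot \pi \ge M$ that appears in the proof, using Eqs. (1)-(2) for $\pi$.

```python
def no_access_is_best(h, fp, fn, M):
    """Condition of Proposition 6: (1 - h) * FP >= h * (1 - FN) * (M - 1)."""
    return (1.0 - h) * fp >= h * (1.0 - fn) * (M - 1.0)


def single_positive_access_not_worthwhile(h, fp, fn, M):
    """Equivalent form used in the proof: 1 + M * pi >= M, with pi from Eq. (2)."""
    q = h * (1.0 - fn) + (1.0 - h) * fp  # Eq. (1)
    pi = fp * (1.0 - h) / q              # Eq. (2)
    return 1.0 + M * pi >= M
```

For a rarely-hit cache with a noisy indicator and a small miss penalty (h = 0.05, FP = 0.2, FN = 0.3, M = 2), both checks return True, so a positive indication is better ignored. With h = 0.5, FP = 0.05, FN = 0.1 and M = 10, both return False.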

IV. THE HETEROGENEOUS CASE

This section focuses on the heterogeneous case, where both cache access costs and exclusion probabilities can be arbitrary. The main challenge in such settings is to evaluate the exclusion probabilities $\pi_j$ and $\nu_j$. In order to evaluate these terms, we collect recent statistics of the various parameters governing system behavior.

In what follows, we first provide an overview of the relevant aspects of Bloom filters, which we use as indicators (Sec. IV-A). Subsequently, we detail how we estimate and use the statistics we collect, both by the cache (Sec. IV-B) and by the client (Sec. IV-C), to evaluate the exclusion probabilities $\pi_j$ and $\nu_j$. Later on, we show how approximation algorithms devised for false-negative oblivious settings (e.g., in [14]) can be used as a building block in algorithms for solving the CS problem.

A. Bloom Filters and False Negatives

For completeness, we provide a brief overview of simple Bloom filters [9], and their structural properties which are relevant for our proposed solutions. A Bloom filter is a randomized data structure that approximately represents a set of items. Bloom filters are composed of a bit array of size $|I|$, as well as $k$ independent hash functions. When adding an item to the filter, each of the $k$ hash functions is applied to the item, and the corresponding bit in the array is set. When testing for an item's existence, we apply the $k$ hash functions and test the corresponding bits. If all bits are set, the Bloom filter replies with a positive indication. Otherwise, the indication is negative.

In a fresh Bloom filter, which is updated upon every insertion of an item to the set, positive indications may be false due to hash collisions, but negative indications are guaranteed to be correct. However, in a stale Bloom filter, negative indications may also be erroneous. Such false-negatives occur, e.g., when indicators are advertised to the users only periodically. In such a scenario, when a new item is admitted, but the updated indicator is not yet advertised, the stale indicator available to the user fails to represent this change.

To allow a meaningful analysis of the trade-off between accuracy and memory footprint, it is useful to express the size of Bloom filters using the notion of Bits Per Element (bpe). Intuitively, optimally configured Bloom filters of the same bpe have the same false-positive ratio, regardless of the size of the set being approximated by the Bloom filter. More formally, given the value of bpe, one can calculate the optimal number of hash functions $k$ that minimizes the false-positive ratio (see [13]).

[Fig. 2. An example of an updated Bloom filter and a stale Bloom filter at time $t$.]
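To make the overview above concrete, here is a minimal Python sketch of a simple Bloom filter together with a stale replica. It is illustrative only: the class and method names are hypothetical, the $k$ hash functions are emulated by salting SHA-256 with the hash index (an assumption of this sketch, not the construction used in the paper), and the choice $k \approx \text{bpe} \cdot \ln 2$ is the standard optimum for simple Bloom filters.

```python
import hashlib
import math


def optimal_num_hashes(bpe):
    """Standard optimum for a simple Bloom filter: k = bpe * ln(2), rounded."""
    return max(1, round(bpe * math.log(2)))


class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Emulate k hash functions by salting SHA-256 with the index i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item):
        # Positive indication iff all k corresponding bits are set.
        return all(self.bits[pos] for pos in self._positions(item))

    def snapshot(self):
        """Return a copy, modelling the replica advertised to clients."""
        replica = BloomFilter(self.num_bits, self.num_hashes)
        replica.bits = list(self.bits)
        return replica
```

A client holding a replica obtained via `snapshot()` sees the filter as it was at advertisement time. If the cache later calls `add("x")`, the updated filter reports `"x"` as present, whereas the replica reports it only if all $k$ bits of `"x"` happened to be set already; otherwise the client observes a false-negative caused purely by staleness.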
We consider the usage of Bloom filters as indicators of cache content. In particular, we let $C_j$ denote the size of cache $j$, i.e., the maximum number of elements that can be stored in cache $j$, where we assume that all elements have the same size. The actual size of the Bloom filter indicator $I_j$ associated with cache $j$ is therefore $|I_j| = \mathrm{bpe} \cdot C_j$. Each cache manages its own Bloom filter, and occasionally advertises the indicator to the client. At any time $t$, we consider the updated Bloom filter maintained by the cache, and let $B_1(t)$ and $B_0(t)$ denote the number of bits set (i.e., with value 1) and reset (i.e., with value 0), at time $t$, respectively, in the Bloom filter approximating the content of the cache. A client uses a replica of the cache's Bloom filter, which represents a snapshot of the Bloom filter that the cache advertised at some time $t' \le t$. We refer to this replica as the stale Bloom filter. Let $\Delta_1(t)$ denote the number of bits that are set in the updated Bloom filter but are reset in the stale Bloom filter. Similarly, we let $\Delta_0(t)$ denote the number of bits that are reset in the updated Bloom filter but are set in the stale Bloom filter. Fig. 2 illustrates this situation. For clarity of presentation, and without loss of generality, we group together all the bits contributing to $\Delta_1(t)$, and also group together the bits accounted for by $\Delta_0(t)$.

The false-negative ratio:

Consider a query for an item $x$ that is stored in the cache at time $t$. Recall that an updated Bloom filter never exhibits false negatives. Hence, all the $k$ hashes of $x$ are mapped to bits that are set in the updated Bloom filter, of which there are $B_1(t)$. The query for $x$ in a stale indicator is a true positive iff all the hashes are mapped to the $B_1(t) - \Delta_1(t)$ bits that are also set in the stale indicator, where $\Delta_1(t)$ counts the bits that are set in the updated filter but reset in the stale one. Since the $k$ hash functions are independent and uniformly distributed over their range, this happens with probability $\left[\frac{B_1(t) - \Delta_1(t)}{B_1(t)}\right]^k$. Otherwise, the query for $x$ is a false-negative. It follows that the false-negative ratio of the cache at time $t$ can be estimated by

$$\mathrm{FN}_t = 1 - \left[\frac{B_1(t) - \Delta_1(t)}{B_1(t)}\right]^k. \tag{7}$$

The false-positive ratio:

Consider a query for an item $y$ that is not stored in the cache at time $t$. For uniformly distributed and independent hash functions, the hashes of $y$ are mapped to arbitrary locations in the Bloom filter. The stale Bloom filter exhibits a false positive iff all the $k$ hashes of $y$ map to bits that are set in the stale Bloom filter. The number of such bits is $B_1(t) - \Delta_1(t) + \Delta_0(t)$, where $B_1(t)$ is the number of bits set in the updated filter, $\Delta_1(t)$ counts the bits set in the updated filter but reset in the stale one, and $\Delta_0(t)$ counts the bits reset in the updated filter but set in the stale one. Hence, the probability of a false positive in the stale indicator at time $t$ can be estimated by

$$\mathrm{FP}_t = \left[\frac{B_1(t) - \Delta_1(t) + \Delta_0(t)}{|I_j|}\right]^k. \tag{8}$$

We note that Eqs. (7) and (8) are only estimations, while the exact miss probabilities strongly depend upon the workload and the cache policy. For instance, consider the case where immediately after cache $j$ sends an update, it caches an item $x$. Then every subsequent request for $x$, until the cache advertises the next update, results in a false-negative indication. However, until the cache advertises the next updated indicator, $x$ may be accessed many times, or not accessed at all, according to the concrete workload.

Algorithm 2 CS_FNA($N$, $\vec{c}$, $M$, Alg)
 1: periodically obtain updated $\mathrm{FP}_j$, $\mathrm{FN}_j$ from each cache $j$
 2: periodically obtain updated indicator $I_j$ from each cache $j$
 3: estimate $q_j$ for each cache $j$
 4: for every request for datum $x$ do
 5:     for $j = 1, \ldots, n$ do
 6:         $h_j = \frac{q_j - \mathrm{FP}_j}{1 - \mathrm{FP}_j - \mathrm{FN}_j}$    ⊲ Eq. (1)
 7:         if $I_j(x) == 1$ then
 8:             $\rho_j = \mathrm{FP}_j \cdot (1 - h_j) / q_j$    ⊲ Eq. (2)
 9:         else
10:             $\rho_j = (1 - \mathrm{FP}_j) \cdot (1 - h_j) / (1 - q_j)$    ⊲ Eq. (3)
11:     $D = \mathrm{Alg}(N, \vec{c}, \vec{\rho}, M)$    ⊲ Reduction of Theorem 7
12:     access $D$

The analysis above assumes that the indicator is a simple Bloom filter [9]. However, one may apply a similar technique of estimating the false-positive ratio and the false-negative ratio also to other indicators (e.g., TinyTable [12]).

Estimating the false-negative ratio and the false-positive ratio can be done periodically to reduce the computational overhead of comparing the stale and updated Bloom filters. In what follows, we show how these estimations can be harnessed for solving the CS problem in heterogeneous and dynamic settings.

B. Cache-side Algorithm
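As a rough illustration of the cache-side computation, the estimates of Eqs. (7) and (8) can be obtained by a single pass over the updated and the stale bit arrays. The function name `estimate_fn_fp`, the list-of-bits representation, and the convention of returning a false-negative ratio of 0 for an empty filter are assumptions of this sketch.

```python
def estimate_fn_fp(updated_bits, stale_bits, k):
    """Estimate (FN, FP) of a stale indicator following Eqs. (7) and (8)."""
    if len(updated_bits) != len(stale_bits):
        raise ValueError("both filters must have the same number of bits")
    size = len(updated_bits)
    # Bits set in the updated filter.
    b1 = sum(updated_bits)
    # Bits set in the updated filter but reset in the stale one.
    delta1 = sum(1 for u, s in zip(updated_bits, stale_bits) if u and not s)
    # Bits reset in the updated filter but set in the stale one.
    delta0 = sum(1 for u, s in zip(updated_bits, stale_bits) if not u and s)
    # Assumed convention: an empty updated filter yields no false negatives.
    fn = 1.0 - ((b1 - delta1) / b1) ** k if b1 else 0.0
    fp = ((b1 - delta1 + delta0) / size) ** k
    return fn, fp


updated = [1, 1, 1, 1, 0, 0, 0, 0]
stale = [1, 1, 0, 0, 1, 0, 0, 0]
fn, fp = estimate_fn_fp(updated, stale, k=2)
print(fn, fp)  # 0.75 0.140625
```

In this toy input, two of the four bits set in the updated filter are missing from the stale one, so the estimate is $1 - (2/4)^2 = 0.75$ for false negatives and $(3/8)^2 \approx 0.14$ for false positives.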

The cache maintains both the stale Bloom filter (i.e., the most recently advertised Bloom filter, which is also available at the client) and the updated Bloom filter. Along a sequence of requests $\sigma$, each cache $j$ estimates the false-negative ratio and the false-positive ratio, according to Eqs. (7) and (8), by comparing the stale and updated Bloom filters. These estimates are sent (periodically) to the client.

C. Client-side Algorithm
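The client-side procedure of Algorithm 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `exhaustive_alg` is a brute-force stand-in for the approximation algorithm Alg (it is not DS_PGM), the clamping of estimates to $[0, 1]$ is added here to tolerate noisy inputs, and all function names and numeric values are hypothetical. The hit-ratio formula follows from $q_j = h_j (1 - \mathrm{FN}_j) + (1 - h_j)\,\mathrm{FP}_j$, and the sketch assumes $0 < q_j < 1$ and $\mathrm{FP}_j + \mathrm{FN}_j < 1$.

```python
from itertools import combinations


def clamp(value, low=0.0, high=1.0):
    return max(low, min(high, value))


def update_positive_ratio(q_prev, positives_in_epoch, epoch_len, delta):
    """Epoch update of the positive-indication ratio q_j, as in Eq. (9)."""
    return delta * positives_in_epoch / epoch_len + (1.0 - delta) * q_prev


def exclusion_probability(indication, q, fp, fn):
    """Probability that the datum is NOT in the cache, given its indication."""
    h = clamp((q - fp) / (1.0 - fp - fn))              # hit ratio estimate
    if indication:
        return clamp(fp * (1.0 - h) / q)               # positive exclusion, pi_j
    return clamp((1.0 - fp) * (1.0 - h) / (1.0 - q))   # negative exclusion, nu_j


def service_cost(subset, costs, rho, miss_penalty):
    """Expected service cost of accessing the caches in `subset`, Eq. (10)."""
    miss_prob = 1.0
    for j in subset:
        miss_prob *= rho[j]
    return sum(costs[j] for j in subset) + miss_penalty * miss_prob


def exhaustive_alg(costs, rho, miss_penalty):
    """Brute-force stand-in for Alg: exact, but exponential in the number of caches."""
    n = len(costs)
    subsets = (s for r in range(n + 1) for s in combinations(range(n), r))
    return min(subsets, key=lambda s: service_cost(s, costs, rho, miss_penalty))


def cs_fna(indications, q, fp, fn, costs, miss_penalty, alg=exhaustive_alg):
    """Handle one request: build rho from the indications, then delegate to alg."""
    rho = [exclusion_probability(i, qj, fpj, fnj)
           for i, qj, fpj, fnj in zip(indications, q, fp, fn)]
    return alg(costs, rho, miss_penalty)


# Three caches, all with negative indications (hypothetical estimates).
aware = cs_fna([0, 0, 0], [0.3] * 3, [0.01] * 3, [0.2] * 3, [1, 2, 3], 100)
oblivious = cs_fna([0, 0, 0], [0.3] * 3, [0.01] * 3, [0.0] * 3, [1, 2, 3], 100)
print(aware, oblivious)  # (0, 1, 2) ()
```

With a false-negative ratio of 0.2, each negative exclusion probability is about 0.895, so accessing all three caches has an expected cost of roughly 77.7 instead of the miss penalty of 100. With a false-negative ratio of 0 the negative exclusion probability is 1 and no cache is accessed.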

The client has two main tasks when solving the CS problem: (i) estimating the exclusion probabilities $\pi_j$ and $\nu_j$ for every cache $j$, and (ii) deciding which of the caches should be accessed for every request. The algorithm executed by the client, which we dub CS_FNA, is formally defined in Algorithm 2. We now turn to describe and discuss the two main tasks performed by CS_FNA.

Exclusion probabilities estimation:

For estimating the exclusion probabilities of some cache $j$, the client makes use of Eqs. (1)-(3). First, recall that the client periodically receives estimations of $\mathrm{FN}_j$ and $\mathrm{FP}_j$ from the cache. Next, for evaluating $q_j$, the client periodically estimates the probability $\Pr(I_j(x) = 1)$ empirically, using a weighted exponential moving average. Formally, consider a sequence of requests $\sigma$, and consider epochs of $T$ requests. Let $a_j(s, t)$ denote the number of positive indications of indicator $I_j$ for requests $s + 1, \ldots, t$ made by the client. For any $t \le T$, we let the estimated positive indication ratio after handling request $t$ be $q_{j,t} = \frac{a_j(0, t)}{t}$. For every $i = 1, 2, \ldots$ and every $iT < t < (i + 1)T$, we let $q_{j,t}$ be the most recent estimate over epochs of $T$ requests, i.e., $q_{j,t} = q_{j, \lfloor t/T \rfloor \cdot T}$, and the estimate is updated at $t = (i + 1)T$ such that

$$q_{j,(i+1)T} = \delta \cdot \frac{a_j(iT, (i + 1)T)}{T} + (1 - \delta) \cdot q_{j,iT}, \tag{9}$$

where $\delta \in (0, 1]$ is some constant governing the dynamics of the estimate change. We note that only the client can perform such an estimation, since it requires knowing all the requests in $\sigma$, and not only the requests for which the cache has been accessed.

Finally, given the current values of $\mathrm{FP}_j$, $\mathrm{FN}_j$, and $q_j$, for every item being requested in the sequence $\sigma$, the client estimates the hit ratio $h_j$ (line 6), and the exclusion probabilities $\pi_j$ (line 8) and $\nu_j$ (line 10) using Eqs. (1), (2), and (3), respectively.

Choosing the caches to access:

For any set of caches $D$, the client's estimations of the exclusion probabilities essentially determine the expected miss cost. By setting $\rho_j = \pi_j$ if $I_j(x) = 1$, and $\rho_j = \nu_j$ if $I_j(x) = 0$, the expected miss cost can be expressed by $M \cdot \prod_{j \in D} \rho_j$, and the objective function defined in Eq. (4) translates to finding the set of caches $D$ minimizing

$$\phi_x(D) = \sum_{j \in D} c_j + M \cdot \prod_{j \in D} \rho_j. \tag{10}$$

The problem of finding a set of caches $D$ (out of those with a positive indication) minimizing an objective of the form depicted in Eq. (10) has been studied in [14], which presents several approximation algorithms for the problem. The problem studied in [14] is essentially equivalent to assuming that there are no false-negative indications, and therefore it sufficed to consider only caches for which $I_j(x) = 1$. We refer to this special case of the CS problem as the restricted CS problem.

When considering the restricted CS problem within our model, the framework of [14] can be viewed as assuming that all caches have a positive indication, and $\rho_j$ represents the positive exclusion probability of cache $j$. Equivalently, the model of [14] essentially assumed that $\nu_j = 1$ for all $j$, which is fundamentally not the case in the CS problem.

Our proposed algorithm CS_FNA selects the set of caches to access as follows: (i) CS_FNA gets as input an algorithm Alg for solving the restricted CS problem (assuming all caches have a positive indication), (ii) generates the appropriate input for this algorithm (as described above) in lines 7-10, and (iii) accesses the set of caches prescribed by algorithm Alg.

The following theorem serves to analyze the worst-case performance guarantees of CS_FNA.

Theorem 7.

If there exists an algorithm Alg that is an $\alpha$-approximation algorithm for the restricted CS problem, then there exists an $\alpha$-approximation algorithm for the general CS problem (with arbitrary values of $\nu_j$).

Proof sketch. Assume an input to the CS problem such that every cache $j$ has its indicator $I_j$, and its positive and negative exclusion probabilities $\pi_j$ and $\nu_j$, respectively. In what follows we abuse notation, and refer to $\phi_{x,\vec{\pi},\vec{\nu},\vec{I}}$ as the expected service cost for an input $x$, given these system parameters.

Let Alg be an $\alpha$-approximation algorithm for the restricted CS problem. Assume each cache $j$ has some arbitrary negative exclusion probability $\nu_j$. For every cache $j$, we let $\pi^*_j = \pi_j$ if $I_j(x) = 1$, and let $\pi^*_j = \nu_j$ if $I_j(x) = 0$. Furthermore, we let $\nu^*_j = 1$ for all $j$. Finally, for every cache $j$ we define indicator $I^*_j$ such that $I^*_j(x) = 1$, implying that the set of caches with a positive indication according to $\vec{I}^*$ is the set of all caches, $N$.

We define algorithm Alg$^*$ such that Alg$^*$ returns the output of Alg for the inputs of $\vec{\pi}^*$ (for the positive exclusion probabilities), $\vec{\nu}^*$ (for the negative exclusion probabilities), and the set of all caches with a positive indication according to $\vec{I}^*$ (i.e., $N$). We now show that the solution returned by Alg$^*$ is an $\alpha$-approximate solution for the CS problem with negative exclusion probabilities $\nu_j$.

By the assumption on Alg, its output $D$ satisfies

$$\phi_{x,\vec{\pi}^*,\vec{\nu}^*,\vec{I}^*}(D) \le \alpha \cdot \phi_{x,\vec{\pi}^*,\vec{\nu}^*,\vec{I}^*}(D^*), \tag{11}$$

where $D^*$ is the optimal solution to the CS problem with $\vec{\pi}^*$, $\vec{\nu}^*$, and the set of caches induced by $\vec{I}^*$ as inputs. By the definition of $\vec{\pi}^*$, $\vec{\nu}^*$, and $\vec{I}^*$, it follows that

$$\phi_{x,\vec{\pi}^*,\vec{\nu}^*,\vec{I}^*}(\tilde{D}) = \phi_{x,\vec{\pi},\vec{\nu},\vec{I}}(\tilde{D}) \tag{12}$$

for every set of caches $\tilde{D}$. Consequently, $D^*$ is also the optimal solution to the CS problem with $\vec{\pi}$, $\vec{\nu}$, and $\vec{I}$ as inputs. The theorem follows.

The proof of Theorem 7 implies the following corollary.

Corollary 8.

If the estimations of $\pi_j$ and $\nu_j$ produced by CS_FNA are precise, and the algorithm Alg used by CS_FNA is an $\alpha$-approximation algorithm for the restricted CS problem, then CS_FNA produces an $\alpha$-approximate solution to the CS problem.

Combining Corollary 8 with the results of [14], we obtain a myriad of tradeoffs and possible approximation guarantees for CS_FNA. In particular, in Sec. V we consider the performance of one specific realization of CS_FNA, which uses algorithm DS_PGM for the restricted CS problem presented in [14].

V. SIMULATION STUDY

In this section, we evaluate the performance and trade-offs of our proposed false-negative aware algorithm, CS_FNA, in a variety of scenarios, using traces of real-life workloads. We begin by describing our evaluation settings and parameters.

A. Simulation Settings

Traces:

We use the following real workloads, which are commonly used when evaluating caching systems: (i) Wiki: Read requests to Wikipedia pages [27]. (ii) Gradle: Gradle is a build tool that reduces the compilation time of large projects by caching build parts. Users access the cache to fetch up-to-date compiled parts, and they compile from scratch upon a cache miss. This trace [28] was provided by the Gradle project. (iii) Scarab: A trace from Scarab Research, a personalized recommendation system for e-commerce sites [28]. (iv) F2: Traces taken from a financial transaction processing system [29]. For each of the traces above, we use the first 1M requests in the trace.

Caches:

We consider a system-wide request distribution where a missed item is placed in a single cache that is chosen by the controller. Such an approach is common in large distributed systems, in an attempt to obtain good load balancing and to maximize the amount of content being cached [30].

Each cache applies the Least Recently Used (LRU) policy, which is arguably the most common policy used in practice. LRU maintains items ordered by their last access time and evicts the least recently used item when admitting an item to a full cache.
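The LRU behavior described above can be sketched with Python's `collections.OrderedDict`. The class name `LRUCache` and its two-method interface are assumptions made for illustration, not the interface of the simulator used in the evaluation. Returning the evicted key from `insert` is a design choice that lets the caller remove it from any bookkeeping structure such as an indicator.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()  # ordered from least to most recently used

    def access(self, key) -> bool:
        """Return True on a hit and mark the key as most recently used."""
        if key in self._items:
            self._items.move_to_end(key)
            return True
        return False

    def insert(self, key):
        """Admit key; return the evicted key, or None if nothing was evicted."""
        if key in self._items:
            self._items.move_to_end(key)
            return None
        evicted = None
        if len(self._items) >= self.capacity:
            evicted, _ = self._items.popitem(last=False)  # drop the LRU entry
        self._items[key] = True
        return evicted


cache = LRUCache(capacity=2)
cache.insert("a")
cache.insert("b")
cache.access("a")            # "a" becomes the most recently used item
evicted = cache.insert("c")  # the cache is full, so "b" is evicted
print(evicted)               # b
```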

Indicators:

Each cache $j$ of size $C_j$ periodically advertises an indicator $I_j$ of size $\mathrm{bpe} \cdot C_j$. For computing the indicator, cache $j$ maintains a Counting Bloom Filter (CBF) [10] with three-bit counters, where the number of counters is $\mathrm{bpe} \cdot C_j$. The CBF uses the optimal number of hash functions so as to minimize the false-positive probability. The CBF is maintained for bookkeeping: we add an item to the CBF upon item insertion to the cache, and remove an item from the CBF upon item eviction. The cache constructs the advertised indicator by compressing the CBF to a simple (1-bit) indicator, where a bit is set iff the respective counter in the CBF is strictly positive.

Algorithms compared:

Recall that CS_FNA makes use of an algorithm for solving the CS problem for the case where indicators exhibit no false-negatives. In our evaluation, we make use of the DS_PGM algorithm from [14]. This strategy was shown to produce a $(\log M)$-approximation for the CS problem with no false-negatives. By Theorem 7, this guarantee also applies to the general CS problem. Furthermore, this algorithm exhibits close-to-optimal results in practice when tested on real-world workloads [14].

We consider two benchmarks for evaluating the performance of CS_FNA: (i) applying the vanilla DS_PGM algorithm, which only considers accessing caches with a positive indication (albeit stale), using only the estimates of $\pi_j$ for every cache $j$, and using $\nu_j = 1$ for all $j$, and (ii) the ideal strategy that uses perfect information (PI), i.e., a strategy that always has access to the precise cache content, which accesses the cheapest cache containing an item if such a cache exists, and doesn't access any cache otherwise. We refer to the former algorithm, which is false-negative oblivious, as the CS_FNO algorithm, and to the latter ideal strategy as the PI strategy.

Throughout our evaluation, both CS_FNA and CS_FNO evaluate $q_j$ with a time horizon of $T = 100$ requests and using δ = 0 . for the weighting of the moving average. Furthermore, each cache $j$ re-estimates the false-positive ratio $\mathrm{FP}_j$ and the false-negative ratio $\mathrm{FN}_j$ once every 50 insertions to the cache.

Evaluation metric:

We consider the mean service cost per request for each algorithm over the entire input. In order to obtain a qualitative metric for comparing the performance of the various algorithms and scenarios, we consider the normalized cost of each algorithm, where we normalize the mean service cost by the mean service cost of the ideal PI strategy.

Fig. 3. Normalized cost of the heterogeneous 3-caches baseline scenario for varying traces and miss penalty values. (Panels: M = 50, M = 100, M = 500; traces: Wiki, Gradle, Scarab, F2; series: CS_FNO, CS_FNA.)

We recall that the PI strategy is infeasible in real life, as it requires the client to have an accurate representation of the cache content at any time. However, it is instructive to use it as the baseline comparison, and its performance can be viewed as a lower bound on the cost of any policy for solving the CS problem.
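The PI baseline and the normalization can be written down in a few lines. The sketch below is only illustrative: the function names are hypothetical, and the cache contents and the per-request costs of the policy are made-up numbers, not results from the evaluation. It assumes, as in the description of PI above, that PI pays the cheapest access cost on a hit and only the miss penalty otherwise.

```python
def pi_cost(holders, costs, miss_penalty):
    """Service cost of the perfect-information (PI) strategy for one request.

    holders: indices of the caches that currently store the requested item.
    """
    if holders:
        return min(costs[j] for j in holders)  # cheapest cache holding the item
    return miss_penalty                        # no cache is accessed


def normalized_cost(policy_costs, pi_costs):
    """Mean service cost of a policy divided by the mean service cost of PI."""
    policy_mean = sum(policy_costs) / len(policy_costs)
    pi_mean = sum(pi_costs) / len(pi_costs)
    return policy_mean / pi_mean


costs, miss_penalty = [1, 2, 3], 100
holders_per_request = [{1, 2}, set(), {0}]  # made-up cache contents
pi_costs = [pi_cost(h, costs, miss_penalty) for h in holders_per_request]
policy_costs = [3, 101, 1]                  # made-up costs of some policy
print(pi_costs)                             # [2, 100, 1]
print(normalized_cost(policy_costs, pi_costs))
```

A normalized cost of 1 means that the policy matches PI on the given input; larger values quantify the price of relying on approximate and stale indicators.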

Baseline scenario:

Unless stated otherwise, our evaluation considers three K elements caches whose access costs are 1, 2, and 3, and a miss penalty of 100 (chosen to be 50 times the average cache access cost). Each cache advertises an updated indicator once the number of insertions performed since the previous update reaches 10% of its capacity. For our baseline scenario, this translates to an indicator advertisement once every K insertions. The advertised indicator of each cache $j$ uses bpe = 14, which implies an indicator size of $14 \cdot C_j$, where the number of hash functions is optimized to minimize the false-positive ratio. In particular, in our baseline scenario, this translates to a designed false-positive ratio of 0.1% [13]. In each of our evaluations, we explore the impact of varying one of the system parameters, where the remaining parameters are set according to our baseline scenario.

Our Python code used for performing the evaluation is available in [31].

B. Impact of Miss Penalty and Workload Diversity

We first compare the performance of CS_FNO and CS_FNA when varying the miss penalty values $M$ in the range $\{50, 100, 500\}$. The results in Figure 3 show that while the performance of the false-negative oblivious policy CS_FNO degrades as the miss penalty increases, the performance of our proposed false-negative aware algorithm CS_FNA improves significantly. Furthermore, the performance of CS_FNA tends to the optimal performance as the miss penalty increases. This behavior follows from the fact that a higher miss penalty accentuates the impact of false-negative events. In particular, ignoring negative indications (as is done by CS_FNO) might reduce the access cost but is severely penalized by an increased expected miss cost in cases where the miss penalty is large.

Fig. 3 also demonstrates significant differences across distinct workloads. The worst performance of CS_FNO is exhibited for the Gradle trace, whereas its best performance is obtained for the Wiki trace. To understand this phenomenon, we observe that Gradle exhibits a high recency-bias, where items are requested shortly after their first appearance. As false-negatives occur since the indicator takes time to reflect the insertion of new items, CS_FNO, which never accesses caches with a negative indication, fails to take advantage of this recency-bias. In contrast, the Wiki trace is more frequency-biased, which implies that popular items do not rapidly change over time and that the impact of false-negatives is less pronounced.

Fig. 4. Normalized cost of the heterogeneous 3-caches baseline scenario for varying update intervals. Update intervals are measured by the number of cache insertions between subsequent updates. (Panels include Wiki; series: CS_FNO, CS_FNA.)

As the Wiki trace and the Gradle trace exhibit the most extreme scenarios in terms of the impact of false-negatives, due to space constraints, in subsequent sections we focus our attention on these traces alone.
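The role of the miss penalty can be made explicit by a small worked case of the objective of Eq. (10). The derivation below is a sketch for a single cache $j$ with a negative indication, considered in isolation, and it uses the usual convention that an empty product equals 1. It compares accessing no cache with accessing only cache $j$.

```latex
\begin{align*}
\phi_x(\emptyset) &= M, \\
\phi_x(\{j\})     &= c_j + M \cdot \nu_j, \\
\phi_x(\{j\}) < \phi_x(\emptyset)
  &\iff 1 - \nu_j > \frac{c_j}{M}.
\end{align*}
```

For an access cost of $c_j = 2$, the right-hand side equals 4%, 2%, and 0.4% for $M = 50$, $100$, and $500$, respectively. Hence, as the miss penalty grows, even a small probability that the item resides in a cache with a negative indication justifies accessing it, which a false-negative oblivious policy never does.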

C. Impact of Advertisement Policy and Indicator Parameters

1) Update interval:

We now turn to study the effect of staleness on the performance of our algorithm. To this end, we let the update interval, which is the number of insertions between indicator advertisements, vary between 16 and 8K (8192), and consider the normalized cost of both CS_FNA and CS_FNO. These results are presented in Figure 4, where we consider the performance for the Gradle and Wiki workloads.

Our results show that both algorithms' performance degrades as the update interval increases. When updates are relatively frequent (i.e., up to 128), the performance of CS_FNA and CS_FNO is similar. However, a significant gap emerges between the performance of the two algorithms for larger update intervals. In particular, the performance of CS_FNO, which ignores negative indications, quickly degrades, whereas CS_FNA shows a considerably milder degradation. This phenomenon is directly related to the fact that when the update interval is large, the false-negative ratio increases significantly (as demonstrated in Fig. 1). Under such regimes, CS_FNO fails to access a cache even when the item is available at the cache, whereas CS_FNA relies on its false-negative awareness to make accesses even in cases of negative indications, taking into account the false-negative ratio estimation provided by the caches. Our results imply that CS_FNA can match the performance of CS_FNO while using a significantly lower bandwidth overhead for cache advertisements. For instance, for the Wiki workload, CS_FNA obtains the same service cost as CS_FNO while using 16 times less bandwidth for indicator advertisements. To observe this, notice that CS_FNA's cost using an update interval of K is on par with the service cost of CS_FNO with an update interval of .

Fig. 5. Normalized cost of the heterogeneous 3-caches baseline scenario for varying indicator sizes, measured by the number of bits-per-cached-element (bpe). (Panels: Wiki and Gradle, each with update intervals of 256 and 1024; series: CS_FNO, CS_FNA.)

2) Indicator size:

Fig. 5 illustrates our results for varying the size of the indicator being used and advertised by the cache. We vary the number of indicator bits per cached element (bpe) and study the impact of the indicator's size on the service cost. In our evaluation, we compare the performance of CS_FNO and CS_FNA with update intervals of 256 and 1024.

Fig. 6. Normalized cost of the heterogeneous 3-caches baseline scenario for varying cache sizes. (Panels: update intervals of 256 and 1024; series: PI, CS_FNO, CS_FNA.)

Not surprisingly, in most cases, the performance improves when increasing the indicator size, which can be attributed to the fact that larger indicators exhibit fewer false-positive errors. Interestingly, and somewhat counter-intuitively, there exist cases where the performance of CS_FNO does not improve when increasing the indicator size, and in some cases performance actually degrades. To understand this anomaly, it is instructive to understand the impact of false-positive indications, false-negative indications, and the interplay between such errors. First, note that the false-positive rate is often inversely proportional to the false-negative rate. I.e., a constant decrease in the false-positive ratio is usually associated with an increase in the false-negative ratio. An extreme such case occurs, for example, when all indications are negative, thus exhibiting a false-positive ratio of 0 and a large false-negative ratio. Next, note that a false-positive event typically translates to an unnecessary cache access, resulting in a relatively small penalty (e.g., an access cost of 1, 2, or 3 in our evaluation). However, a false-negative event typically translates to a "non-compulsory" miss, translating to a high miss penalty (e.g., 100, in our evaluation). It follows that even a mild decrease in the false-positive ratio may easily result in a non-negligible increase in the false-negative ratio that may nullify the contribution of a lower false-positive ratio. Such effects are especially significant when the miss penalty is high. High miss penalties are common in systems, as cache misses often imply accessing slower memories whose access time may be orders of magnitude higher than that of the cache [24], [32]. Still, our proposed false-negative aware algorithm CS_FNA handles such scenarios seamlessly, extracting the benefits of the reduced false-positive ratio without any adverse impact on performance.
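To give a feeling for how the indicator size drives the false-positive ratio, the sketch below evaluates the standard approximation for a simple, fully loaded and fresh Bloom filter, $(1 - e^{-k/\mathrm{bpe}})^k$ with $k$ chosen as the rounded value of $\mathrm{bpe} \cdot \ln 2$. The function name is hypothetical, and the formula ignores staleness, so it only describes the designed false-positive ratio and not the ratios observed in the evaluation.

```python
import math


def designed_fp_ratio(bpe: float) -> float:
    """Approximate false-positive ratio of a full, fresh simple Bloom filter."""
    k = max(1, round(bpe * math.log(2)))  # number of hash functions
    return (1.0 - math.exp(-k / bpe)) ** k


for bpe in range(5, 16, 2):
    print(bpe, f"{designed_fp_ratio(bpe):.4%}")
```

For bpe = 14 the approximation gives roughly 0.12%, in line with the designed false-positive ratio of about 0.1% of the baseline scenario.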

D. Impact of Caching Capacity

In this section, we study the effect of the caching architecture on the performance of our algorithm. In particular, we study the effect of having a larger, or more diverse, caching capacity on system performance. To this end, we use a longer trace of Wiki to ensure that sufficiently many elements are indeed accessed, and caches are not underutilized. Specifically, we make use of a Wiki trace containing 4.3M requests and 394K unique elements. One should note that the cost of the ideal PI strategy usually drops when increasing the caching capacity, as it has perfect information and the requested items are more likely to be stored in at least one of the caches. Therefore, in order to provide better insight as to the performance of the system, throughout this section we consider the actual mean cost per request (and not the normalized service cost, as done in previous sections).

1) Scaling the cache size:

We study the impact of the cache size on the performance of CS_FNO and CS_FNA with update intervals of 256 and 1024. The results in Fig. 6 show that, as could be expected, for every given setting, scaling up the caches' capacities decreases the service cost due to the improved hit ratio. Our results show that when updates are relatively frequent (e.g., the case where the update interval is 256), the performance of CS_FNO is comparable to that of CS_FNA, and both exhibit a performance close to that of the ideal PI strategy. However, once the updates are less frequent (e.g., the case where the update interval is 1024), CS_FNO exhibits a significant degradation in performance. CS_FNA, on the other hand, is far less affected by the increase in the update interval, and is still quite comparable to PI. In general, CS_FNA shows up to 25% reduction in cost compared to CS_FNO. The differences between CS_FNO and CS_FNA become more accentuated when one considers the cache size required to maintain a certain level of cost; CS_FNA performs better in a system where caches are of size 4K than CS_FNO performs with caches of size 32K.

Fig. 7. Effect of varying the number of caches. The access cost to each cache is 2, and the miss penalty is 100. (Panels: update intervals of 256 and 1024; series: PI, CS_FNO, CS_FNA.)

2) Scaling the number of caches:

To study the effect of varying the number of caches in the system, we focus our attention on homogeneous cache access costs to obtain results comparable with the results presented in previous sections. In particular, we assume all caches have an access cost of 2, ensuring that the average access cost is the same as in other scenarios examined in our evaluation. Fig. 7 shows the results for update intervals of 256 and 1024. The results show that having more caches is not necessarily beneficial to either CS_FNA or CS_FNO. Intuitively, for CS_FNA, having more caches with negative indications makes it harder to identify which of them are actually false-negative. Similarly, for both CS_FNA and CS_FNO, having more caches with false-positive indications makes it harder to reliably select a cache without a false-positive indication. However, CS_FNA consistently outperforms CS_FNO, and significantly more so as the update interval increases.

VI. CONCLUSIONS

This work studies the cache selection problem while using approximate indicators exhibiting both false-positive and false-negative errors. The client in such a system selects a subset of the caches to minimize the expected service cost. While there is extensive work in this field, previous access strategies never access caches with negative indications. While reasonable at first glance, our work shows that such an omission severely hinders the system performance. We argue that caches that periodically advertise their content indicators inherently introduce false-negative indications, and the rate of such indications is non-negligible. In particular, we show that it is sometimes advisable to access caches with a negative indication, as it may reduce the overall system cost.

We devise false-negative-aware access strategies in two main scenarios: (i) fully-homogeneous settings, where we show a policy that attains the optimal (minimal) access cost, and (ii) general heterogeneous environments, where we present a strategy whose approximation guarantee we can bound compared to the optimal solution. We complete our study through an extensive evaluation based on real system traces. Our results show that our proposed methods perform significantly better than the state-of-the-art in diverse settings. Furthermore, our false-negative aware solutions can match the cost of competitive false-negative oblivious approaches while requiring an order of magnitude fewer resources (e.g., caching capacity or bandwidth required for indicator advertisement).

Our results demonstrate the potential benefits of embracing false-negative awareness into the algorithmic design space. We expect our work to further induce both analytic and experimental research on the role of false negatives in large distributed systems.

REFERENCES

[1] L. Fan, P. Cao, J. Almeida, and A. Z. Broder, “Summary cache: a scalable wide-area web cache sharing protocol,” IEEE/ACM Trans. Netw., vol. 8, no. 3, pp. 281–293, 2000.
[2] B. M. Maggs and R. K. Sitaraman, “Algorithmic nuggets in content delivery,” ACM SIGCOMM CCR, vol. 45, no. 3, pp. 52–66, 2015.
[3] X. Guo, T. Wang, and S. Wang, “Joint optimization of caching and routing strategies in content delivery networks: A big data case,” in IEEE ICC, 2019.
[4] R. Hou, L. Zhang, T. Wu, T. Mao, and J. Luo, “Bloom-filter-based request node collaboration caching for named data networking,” Cluster Computing, vol. 22, no. 3, pp. 6681–6692, 2019.
[5] X. Wang, M. Chen, T. Taleb, A. Ksentini, and V. C. M. Leung, “Cache in the air: exploiting content caching and delivery techniques for 5G systems,” IEEE Comm. Mag., vol. 52, no. 2, pp. 131–139, 2014.
[6] M. Zhang, H. Luo, and H. Zhang, “A survey of caching mechanisms in information-centric networking,” IEEE Comm. Surv. & Tut., vol. 17, no. 3, pp. 1473–1499, 2015.
[7] O. Rottenstreich and I. Keslassy, “The bloom paradox: When not to use a bloom filter,” IEEE/ACM Trans. Netw., vol. 23, no. 3, pp. 703–716, 2015.
[8] T. Le, Y. Lu, and M. Gerla, “Social caching and content retrieval in disruption tolerant networks (dtns),” in IEEE ICNC, 2015, pp. 905–910.
[9] B. H. Bloom, “Space/time trade-offs in hash coding with allowable errors,” Commun. ACM, vol. 13, no. 7, pp. 422–426, 1970.
[10] F. Bonomi, M. Mitzenmacher, R. Panigrahy, S. Singh, and G. Varghese, “An improved construction for counting bloom filters,” in ESA, 2006, pp. 684–695.
[11] L. Luo, D. Guo, R. T. Ma, O. Rottenstreich, and X. Luo, “Optimizing bloom filter: Challenges, solutions, and comparisons,” IEEE Comm. Surv. & Tut., vol. 21, no. 2, pp. 1912–1949, 2018.
[12] G. Einziger and R. Friedman, “Counting with tinytable: Every bit counts!” IEEE Access, vol. 7, pp. 166292–166309, 2019.
[13] S. Tarkoma, C. E. Rothenberg, and E. Lagerspetz, “Theory and practice of bloom filters for distributed systems,” IEEE Comm. Surv. & Tut., vol. 14, no. 1, pp. 131–155, 2012.
[14] I. Cohen, G. Einziger, R. Friedman, and G. Scalosub, “Access strategies for network caching,” IEEE/ACM Trans. Netw., 2021, [Online].
[15] D. Guo, Y. Liu, X. Li, and P. Yang, “False negative problem of counting bloom filter,” IEEE Trans. Knowl. Data Eng., vol. 22, no. 5, pp. 651–664, 2010.
[16] Y. Zhu and H. Jiang, “False rate analysis of bloom filter replicas in distributed systems,” in ICPP, 2006, pp. 255–262.
[17] I.-W. Ting and Y.-K. Chang, “Improved group-based cooperative caching scheme for mobile ad hoc networks,” J. Parallel. and Distrib. Comp., vol. 73, no. 5, pp. 595–607, 2013.
[18] M. Tortelli, L. A. Grieco, and G. Boggia, “CCN forwarding engine based on bloom filters,” in CFI, 2012, pp. 13–14.
[19] S. Z. Kiss, É. Hosszu, J. Tapolcai, L. Rónyai, and O. Rottenstreich, “Bloom filter with a false positive free zone,” in IEEE INFOCOM, 2018, pp. 1412–1420.
[20] Y. Zhu, H. Jiang, J. Wang, and F. Xian, “HBA: Distributed metadata management for large cluster-based storage systems,” IEEE Trans. Parallel Distrib. Syst., vol. 147, pp. 204–220, 2018.
[21] A. Kosta, N. Pappas, V. Angelakis et al., “Age of information: A new concept, metric, and tool,” Foundat. and Trend. in Netw., vol. 12, no. 3, pp. 162–259, 2017.
[22] R. D. Yates, P. Ciblat, A. Yener, and M. Wigger, “Age-optimal constrained cache updating,” in IEEE ISIT, 2017, pp. 141–145.
[23] E. Jacobsen, E. Rotenberg, and J. E. Smith, “Assigning confidence to conditional branch predictions,” in MICRO, 1996, pp. 142–152.
[24] G. Einziger, O. Eytan, R. Friedman, and B. Manes, “Adaptive software cache management,” in ACM Middleware, 2018, pp. 94–106.
[25] G. Einziger, R. Friedman, and B. Manes, “Tinylfu: A highly efficient cache admission policy,” TOS, vol. 13, no. 4, pp. 35:1–35:31, 2017.
[26] D. Powers, “Evaluation: From precision, recall and F-measure to ROC, informedness, markedness & correlation,” J. Mach. Learn. Tech., vol. 2, no. 1, pp. 37–63, 2011.
[27] G. Urdaneta, G. Pierre, and M. van Steen, “Wikipedia workload analysis for decentralized hosting,” Comp. Netw., vol. 53, no. 11, pp. 1830–1845, 2009.
[28] “Caffeine’s simulator cache traces.” [Online]. Available: https://github.com/ben-manes/caffeine/tree/master/simulator/src/main/resources/com/github/benmanes/caffeine/cache/simulator/parser
[29] M. Liberatore and P. Shenoy, “Umass trace repository,” 2016. [Online]. Available: http://traces.cs.umass.edu/
[30] G. Einziger, R. Friedman, and E. Kibbar, “Kaleidoscope: Adding colors to kademlia,” in