Christian BERTHET, 5/19/2017

Approximation of LRU Caches Miss Rate: Application to Power-law Popularities

Christian BERTHET

STMicroelectronics, Grenoble, France,

Abstract. Building on the 1977 pioneering work of R. Fagin, we give a closed-form expression for the approximated Miss Rate (MR) of LRU caches assuming a power-law popularity. The asymptotic behavior of this expression is an already known result when the power-law parameter is above 1. It is extended here to any value of the parameter. In addition, we bring a new analysis of the conditions (cache relative size, popularity parameter) under which the ratio of LRU MR to Static MR is worst-case.

Keywords : LRU (Least Recently Used) Miss Rate, IRM, Power-Law, Generalized Exponential Integral.

Address all correspondence to:

BERTHET Christian; E-mail: [email protected]

1. Introduction

Computation of the exact Miss Rate (MR) of a cache with LRU replacement policy has been thoroughly studied. Given an occurrence distribution (the so-called ‘popularity’) over a set of addresses (‘alphabet’ or ‘footprint’), two formulas exist: one proposed by King (1971), also in (Fagin, 1977; Fagin, 1978), and another one by Flajolet et al. (1992). Unfortunately, both of them are intractable and therefore, in practice, one has to resort to approximation formulas to evaluate the MR of real-life LRU caches. The following work relies on a very simple approximation stated by Ronald Fagin in 1977.

Let us stress a major preliminary point: cache theoretical models and Miss Rate prediction always rely on an underlying hypothesis, namely the ‘Independent Reference Model’ (IRM). This means that a set of accesses (‘trace’) respecting the popularity law is a sequence of independent, identically distributed random variables. IRM generally does not hold for real-life cache accesses (in particular for HW Level 1 processor caches), where addresses are subject to some sort of ‘clustering’ bias; hence, the Miss Rate effectively measured is much less pessimistic than the one produced assuming IRM. However, to our knowledge, no one knows how to quantify, in a MR formula, the degree of locality of a trace. So, in the following work, IRM is assumed.

We focus on popularities that can be modelled as power laws (a.k.a. Generalized Zipf law): items (cacheline addresses for a cache) are ranked according to their popularity (occurrence frequency), and popularity is in a power-law relation with the rank. These laws have long been recognized as the most accurate way to represent sw-cache interactions (Voldman, 1983) and more generally computer programs (Zhang, 2009). Let us note also that, in this report, caches are assumed to be fully-associative.

The current report is organized as follows. Section 2 is a reminder of concepts related to caches, in particular Stack Distance, Working-Set function and Miss Rate. In Section 3 we introduce the Fagin approximation for LRU Miss Rate under IRM and see how it has been recently rediscovered under the “Che’s approximation” label. In Section 4, starting from the Fagin equations and moving them to the continuous domain, we give an analytic form of LRU MR for power-law popularities under IRM generation, using the Generalized Exponential Integral notation (we use the nickname ‘ExpInt’). Section 5 details how this closed-form expression matches previous results from different sources, in particular the asymptotics given by Jelenkovic (1999) and Fill (1996). We also extend the Jelenkovic LRU vs Static relation to the case of a power-law parameter between 0 and 1. Static is the ‘(non-)replacement policy’ according to which the D most frequently accessed items are permanently resident in the cache of size D. Finally, Section 6 develops the analysis of the LRU vs Static MR ratio, and particularly of the maximum of this ratio.

Under the IRM hypothesis, LRU MR is always worse than the Miss Rate of the Static policy. Two quantities determine how this LRU/Static ratio varies: first, the parameter (‘exponent’) a of the power law, a positive real, and second, the cache ratio δ = D/N (0 ≤ δ ≤ 1). We study in particular the cache ratio for which the LRU/Static ratio is maximum (i.e., LRU MR is worst-case compared to Static MR). For the a=1 power law (standard Zipf law), a maximum of 1.43227 is obtained for a cache ratio δ=0.453. More generally, there is a direct relation between ‘a’ and δ. In particular, δ → […] when a → […] and δ → […] when a → ∞. Although a closed-form expression does not seem to exist to compute the maximum of the LRU/Static ratio and the corresponding δ, we propose an approximation which appears reasonably good at least for any parameter a ≤ 2. We conclude by stressing some open questions.

Regarding proofs, the longer ones are given in the Appendices. We strove to correlate theoretical results with practical measurements done using different tools: DineroIV (or a variant) for simulation of cache traces, and the GSL package or WolframAlpha © for mathematical computations.

2. Caches: Stack Distance and Re-reference Probability

a. Terminology

We consider traces represented by a sequence of memory accesses to a memory space (also called the alphabet or footprint of the execution). Such a trace results from the execution of programs on a processor and is the input stream to a cache. Each trace is characterized by its length L and a number N of distinct addresses. The following terminology is used to characterize a trace:

• An occurrence is an access to a specific address. Addresses are characterized by their number of occurrences in the input stream to a cache. The distribution of occurrences among the addresses is often called popularity.

• A re-reference distance is the number of accesses (possibly 0) between two accesses to the same address. The pdf (probability density function) of the distribution of re-reference distances is noted p_reref. The CCDF (complementary cumulative distribution function) of re-reference distances is noted P_reref.

• A stack distance is the number of unique addresses (possibly 0) between two accesses to the same address. The CCDF of the distribution of stack distances is noted P_stack. Note that the re-referenced datum is excluded from the stack distance count.

Intuitively, the performance of an LRU cache depends on the address occurrence law of the input stream as well as on the re-reference profile. This observation led to many works on the so-called “Stack Distance” analysis, which date back to the 1970s with the seminal paper on Stack Distances (Mattson, 1970).

b. Working-Set Function

The working-set function WS(D) of a trace of accesses is the average number of distinct (unique) addresses in a D-size window, D varying from 1 to L, the length of the trace. Obviously WS(1)=1, and the limit of WS(D) when D increases is the number of distinct addresses accessible in the trace under consideration. Also, WS(D) ≤ D. When L and N are large, there is no other way to represent the WS function than a log-log graph such as the following one.
It gives two examples of WS functions for two different power-law popularities (with exponents a=0, i.e., a uniform law, and a=1, i.e., a standard Zipf law) on a 256K state space, with traces in the range of 100M accesses generated under the IRM hypothesis.

c. Working-Set Steady-State Relation

Stack Distance analysis generally consists in computing a histogram of the stack distances (whose complexity is in N*log(M) for a sequence of N accesses over M unique addresses). Alternatively, the WS(D) function can be computed from the re-reference probability, obtained through a linear traversal of the trace, and the following observation: when D increments by one item, WS(D) increases by the probability that no re-reference of length less than D occurs in the window due to the additional item. Hence

$$WS(D+1) = WS(D)\,(1-P_{reref}[D]) + (WS(D)+1)\,P_{reref}[D] = WS(D)+P_{reref}[D].$$

Remember that, by our definition of the CCDF, P_reref[D] is the probability that D or more accesses sit between two references to the same address. Assuming WS(0)=0, the relation holds for D=0, since WS(1)=WS(0)+P_reref[0]=0+1=1. It follows that

$$WS(D)=\sum_{X=0}^{D-1} P_{reref}[X].$$

Another form of this relation is $\Delta_D WS(D) = P_{reref}[D]$, known as Denning and Schwartz’s difference equation (Denning, 1972).

d. Steady-State compared to Trace Sliding window

The following figures are experimental measurements for two traces, I0 L1 (Instruction Level 1 cache) and L2 (Level 2 cache), illustrating the two possible ways to compute the average stack distance. Traces are respectively 1.8G long for I0 L1 and 83M long for L2 and are generated from a real-life platform. For each graph, the blue curve uses the steady-state relation (with P_reref) while the red one computes the average WS(D) directly on the trace for a sliding window of width D. The red curve is computed as follows: the average number of distinct addresses in the incoming stream is evaluated for a moving window of size D using a 0.001 log binning. The algorithm takes ~25min (C code) on a linux 70GB box for L2, but it takes much longer, in the range of 40h, for the I0 L1 trace.

[Figure: I0 L1 cache: Average Stack distance vs Rereference distance]

[Figure: L2 cache: Average Stack distance vs Rereference distance]

Note that both curves match almost entirely; for example, on L2 they show two thresholds at approximately 10^[…] and 10^[…]. A divergence occurs after 10^[…] for L1 and 10^[…] for L2, i.e., when the window width is above 1/10 of the total trace length. The difference is due to the fact that the first curve describes the steady state from the re-reference probability function, whereas the other one is a measure restricted to the selected trace. The steady-state relation and the corresponding computation of the WS function are perfectly suited to our needs.

e. Relation Pstack and LRU Miss Rate

P_reref[X>=D] is the CCDF probability that at least D records occur between two occurrences of the same cache line address. P_stack[X>=D] is the CCDF probability that at least D unique records occur in the interval. LRU replacement policy means that there is a Hit in a cache of size D for an access to a given address (i.e., the address is in the cache at that time) if and only if there have been accesses to at most (D-1) other addresses since the previous occurrence of the address under consideration. Conversely, an access results in a miss if and only if at least D unique items have been accessed since the previous occurrence of the address. Hence, for any trace, the P_stack CCDF calculated at value D is the same expression as the Miss Rate of an LRU cache of size D: MR(D)=P_stack[D].
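The equivalence between the stack-distance criterion and LRU behaviour can be checked on a toy trace. The following is a minimal sketch, not the tool used for the measurements in this report: the function names are hypothetical, the stack-distance routine is a deliberately naive cubic-time scan (a real analysis would use the N*log(M) histogram algorithm mentioned earlier), and first references are counted as misses.

```c
#include <stddef.h>
#include <stdlib.h>

/* Misses of a fully-associative LRU cache of size cache_size, counted from
 * stack distances: an access misses iff it is a first reference, or at least
 * cache_size distinct other addresses were accessed since the previous
 * access to the same address. Naive scan, for illustration only. */
size_t lru_misses_by_stack_distance(const int *trace, size_t len, size_t cache_size)
{
    size_t misses = 0;
    for (size_t t = 0; t < len; t++) {
        size_t distinct = 0;
        int found = 0;
        for (size_t s = t; s-- > 0; ) {          /* s runs from t-1 down to 0 */
            if (trace[s] == trace[t]) { found = 1; break; }
            int seen_later = 0;                  /* count each address once */
            for (size_t u = s + 1; u < t; u++)
                if (trace[u] == trace[s]) { seen_later = 1; break; }
            if (!seen_later) distinct++;
        }
        if (!found || distinct >= cache_size) misses++;
    }
    return misses;
}

/* Same count obtained by simulating the LRU stack directly
 * (stack[0] is the most recently used address). */
size_t lru_misses_by_simulation(const int *trace, size_t len, size_t cache_size)
{
    if (cache_size == 0) return len;             /* nothing can ever hit */
    int *stack = malloc(cache_size * sizeof *stack);
    if (stack == NULL) return (size_t)-1;        /* allocation failure */
    size_t used = 0, misses = 0;
    for (size_t t = 0; t < len; t++) {
        size_t pos = 0;
        while (pos < used && stack[pos] != trace[t]) pos++;
        if (pos == used) {                       /* miss */
            misses++;
            if (used < cache_size) used++;
            pos = used - 1;                      /* free slot or LRU victim */
        }
        for (size_t j = pos; j > 0; j--)         /* move the address to the top */
            stack[j] = stack[j - 1];
        stack[0] = trace[t];
    }
    free(stack);
    return misses;
}
```

Dividing either count by the trace length gives the measured MR(D); on any trace the two functions are expected to return the same value, which is the statement MR(D)=P_stack[D] restricted to that trace.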

3. LRU MR Asymptotic under IRM model

a. P_reref and WS function under IRM model

We consider an IRM (Independent Reference Model) framework for each trial of the sequence; in other words, every reference is an i.i.d. random variable. We also assume that addresses follow a given popularity distribution $p_i$ among N addresses, with $\sum_{i=1}^{N} p_i = 1$. With these two assumptions, the re-reference distance PDF is:

$$p_{reref}[D]=\sum_{i=1}^{N} p_i\,(1-p_i)^{D}\,p_i .$$

In particular, $p_{reref}[0]=\sum_{i=1}^{N} p_i^2$. The probability is well-formed:

$$\sum_{D=0}^{+\infty}\sum_{i=1}^{N} p_i^2(1-p_i)^{D}=\sum_{i=1}^{N} p_i^2\sum_{D=0}^{+\infty}(1-p_i)^{D}=\sum_{i=1}^{N}\frac{p_i^2}{1-(1-p_i)}=\sum_{i=1}^{N}p_i=1 .$$

The re-reference CCDF (Complementary Cumulative Distribution Function) is:

$$P_{reref}[D]=\sum_{X=D}^{+\infty}\sum_{i=1}^{N}p_i^2(1-p_i)^{X}=\sum_{i=1}^{N}p_i^2\,\frac{(1-p_i)^{D}}{1-(1-p_i)}=\sum_{i=1}^{N}p_i(1-p_i)^{D}.$$

In particular, $P_{reref}[0]=1$ and $\lim_{D\to+\infty}P_{reref}[D]=0$. From its definition, the working-set function is:

$$WS(D)=\sum_{X=0}^{D-1}P_{reref}[X]=\sum_{X=0}^{D-1}\sum_{i=1}^{N}p_i(1-p_i)^{X}=\sum_{i=1}^{N}\left(1-(1-p_i)^{D}\right).$$

The WS function verifies WS(0)=0, WS(1)=1 and $\lim_{D\to+\infty}WS(D)=N$.

b. Fagin approximation “in a certain asymptotic sense”

In (Fagin 1977), a tractable computation of LRU MR is given and is shown to be “correct in a certain asymptotic sense”. The P_reref CCDF is noted M (page 224) and called the expected working-set miss ratio (its parameter is the window size). Another quantity, noted S, is the expected working-set size, which is our WS function. The main claim of (Fagin 1977) is that, “in a certain asymptotic sense”, the Miss Rate of an LRU cache of size D (called expected working-set miss ratio with expected working-set size) is M(S^{-1}(D)), where S^{-1} is the inverse function of S. A proof is given, as well as measurements for a Zipf popularity that justify the claim. Recalling that P_stack is another notation for LRU MR, the Fagin relation is:

$$MR(D)=P_{stack}[D]=P_{reref}[WS^{-1}(D)].$$

The next figure illustrates the Fagin relation for the 83M trace of an L2 cache. It shows that P_stack[X] computed by the usual stack distance algorithm is very close to the curve obtained by computing P_reref[WS^{-1}(X)]. For comparison, we also show P_reref[X] as well as the Miss Rate curve obtained using the DineroIV tool (Dinero) with LRU-Fully-Associative settings; this curve fits P_stack almost exactly.

c. Constraint on popularity distribution

We consider the extension of the integer variable D to the continuous domain: P_reref[D] and WS(D) are then considered as real functions of a positive real variable D. We also consider a constraint on the occurrences, namely we assume that $p_i \ll 1$ for all i, hence $\ln(1-p_i)\approx -p_i$. Under this assumption, it is direct that P_reref[D]=WS'(D), since:

$$WS'(D)=-\sum_{i=1}^{N}\ln(1-p_i)\,(1-p_i)^{D}\approx\sum_{i=1}^{N}p_i(1-p_i)^{D}=P_{reref}[D].$$

In (Xiang, 2013), this relation is presented as an additional order compared to the usual pdf/CCDF relation, which in the continuous domain reads $P'_{reref}[D]=-p_{reref}[D]$.

Using the Fagin asymptotic equation together with the relation on the derivative of an inverse function, $(f^{-1})'(x)=1/f'(f^{-1}(x))$, one finally obtains:

$$MR[D]=P_{reref}[WS^{-1}(D)]=WS'[WS^{-1}(D)]=\frac{1}{(WS^{-1})'[D]}.$$

d. Fagin rediscovered: “Che’s approximation”

Another direct consequence of the constraint is that if $\ln(1-p_i)\sim -p_i$ holds for all i, then

$$P_{reref}[D]=\sum_{i=1}^{N}p_i\,e^{-p_i D}\quad\text{and}\quad WS(D)=\sum_{i=1}^{N}\left(1-e^{-p_i D}\right).$$

(Che & al., 2002) introduced an approximation of the LRU-FA Hit Rate which is defined as follows (see also (Fricker & al. 2012)): the hit rate of object n of popularity q(n) in a cache of size C is approximated by $h(n)=1-e^{-q(n)\,t}$, where t is the unique root of $\sum_{n=1}^{N}\left(1-e^{-q(n)\,t}\right)=C$. Using our notation, t is the solution of WS(t)=C, and then the MR of a C-size cache is:

$$\sum_{i=1}^{N}q(i)\,(1-h(i))=P_{reref}[t]=P_{reref}[WS^{-1}(C)],$$

which is the Fagin approximation. Consequently, “Che’s approximation” is essentially a rephrasing, 25 years later, of the Fagin asymptotic formula together with the constraint $\ln(1-p_i)\sim -p_i$ for each i.

4. LRU MR Approximation for Power-law Occurrences

We now consider that popularity is distributed according to a power law, also called Generalized Zipf law in the literature. As usual, let the law be $p_i = k/i^a$, where i, 1 ≤ i ≤ N, is the rank of the item, N the size of the address footprint (‘alphabet’), and k the normalizing constant: $1/k=\sum_{i=1}^{N}1/i^a = H_{N,a}$ ($H_{N,a}$ is the N-th generalized harmonic number). P_reref[D] and WS(D) are calculated by integration on the [0,N] domain of the previous approximations extended to the continuous domain.

a. Uniform (a=0) Distribution

The pdf form of the occurrence probability is $p_i = 1/N$ and its CCDF form is $P_{occur}[D]=1-D/N$. Then

$$P_{reref}[D]=\int_0^N \frac{1}{N}\,e^{-D/N}\,dx = e^{-D/N}$$

is the exponential distribution with mean N, and

$$WS(D)=\int_0^N\left(1-e^{-D/N}\right)dx = N\left(1-e^{-D/N}\right).$$

Note that $\lim_{D\to\infty}WS(D)=N$. Then for D ≤ N, $WS^{-1}(D)=-N\ln(1-D/N)$ and $P_{stack}[D]=P_{reref}[WS^{-1}(D)]=1-D/N$: P_stack (i.e. LRU Miss Rate) is also uniform and is the exact replica of P_occur, as illustrated in the next figure (abscissa is log D) for N=2^[…], where the re-reference CCDF P_reref is given as well.

P_reref[N] = e^{-1} = 0.367879 and P_reref[10*N] = e^{-10} = 4.53997e-05, hence very close to 0.

b. Zipf (a=1) Distribution

Using the upper incomplete gamma function $\Gamma(a,x)$ (http://dlmf.nist.gov/8.2), the following expressions are obtained:

$$P_{reref}[D]=\int_0^N \frac{k}{x}\,e^{-kD/x}\,dx=\Big[k\,\Gamma\big(0,\tfrac{kD}{x}\big)\Big]_0^N = k\,\Gamma\big(0,\tfrac{kD}{N}\big),$$

since $\Gamma(\nu,+\infty)=0,\ \forall\nu$, and

$$WS(D)=\int_0^N\left(1-e^{-kD/x}\right)dx=\Big[x-kD\,\Gamma\big(-1,\tfrac{kD}{x}\big)\Big]_0^N = N-kD\,\Gamma\big(-1,\tfrac{kD}{N}\big).$$

From the relation $\Gamma(s+1,x)=s\,\Gamma(s,x)+x^{s}e^{-x}$,

$$WS(D)=N+kD\,\Gamma\big(0,\tfrac{kD}{N}\big)-N\,e^{-kD/N},$$

and since $\Gamma(s,x)\sim x^{s-1}e^{-x}$ when $x\to+\infty$, we see that WS(D) tends to N as expected.

Rather than the incomplete Gamma function we use a more compact form: $E_p$, the generalized exponential integral of order p (http://dlmf.nist.gov/8.19), $E_p(z)=z^{p-1}\,\Gamma(1-p,z)$; in particular $E_1(z)=\Gamma(0,z)$. See Appendix 1 for a graph and asymptotics of this function, nicknamed ‘ExpInt’. With this function, and introducing $k=1/H_N$, where $H_{N,1}=H_N=\ln N+\gamma+O(1/N)$ and $\gamma$ is the Euler-Mascheroni constant ($\gamma\approx 0.5772$):

$$P_{reref}[D]=\frac{1}{H_N}\,E_1\!\Big(\frac{D}{N H_N}\Big)\quad\text{and}\quad WS(D)=N\Big(1-E_2\!\Big(\frac{D}{N H_N}\Big)\Big).$$

Note that since $E_p(0)=1/(p-1)$ for p>1 (http://dlmf.nist.gov/8.19.E6), WS(0)=0, and it can be shown that $\lim_{D\to\infty}WS(D)=N$. Also, $\partial E_p/\partial x=-E_{p-1}$ (http://dlmf.nist.gov/8.19.E13) implies that the WS'=P_reref relation is preserved. The inverse function of WS is:

$$WS^{-1}(D)=N H_N\,E_2^{-1}\!\Big(1-\frac{D}{N}\Big)\quad\text{for } D\le N.$$

And, finally, we have the closed-form expression:

$$MR[D]=\frac{1}{H_N}\,E_1\!\Big(E_2^{-1}\!\Big(1-\frac{D}{N}\Big)\Big).$$

The following graph (with log abscissa) shows the Occurrences (for N=2^15 addresses), Re-references and Stack CCDFs for a simulation of (10^3*N) accesses generated randomly (IRM) according to an occurrence power law with parameter a=1. We can now see how the previous computations of P_reref and P_stack based on ExpInt functions fit with these simulations. The graphs of P_reref and P_stack are computed with the GSL package (using the gsl_sf_gamma_inc incomplete upper gamma function). Strikingly, the approximation appears almost perfect for a distance above 10. Below this value, the $\ln(1-p_i)\sim -p_i$ assumption is likely too strong. However, for P_stack (and MR) we are interested in much larger cache sizes, hence the assumption is essentially OK for our purposes.

c. General Power Law (a ≠ 1) Distribution

[Figure: power(1.0), N=2^15, Trace Length=1000*N; curves: E1(D/N*Hn)/Hn, simulated Preref, E1(invE2(1-D/N))/Hn, simulated Pstack]

The P_reref and WS expressions are:

$$P_{reref}[D]=\int_0^N\frac{1}{H_{N,a}\,x^{a}}\,e^{-D/(H_{N,a}x^{a})}\,dx=\left[\frac{(D/H_{N,a})^{1/a}}{aD}\,\Gamma\Big(1-\frac1a,\frac{D}{H_{N,a}x^{a}}\Big)\right]_0^N=\frac{(D/H_{N,a})^{1/a}}{aD}\,\Gamma\Big(1-\frac1a,\frac{D}{H_{N,a}N^{a}}\Big)$$

and:

$$WS(D)=\int_0^N\left(1-e^{-D/(H_{N,a}x^{a})}\right)dx=\left[x-\frac{(D/H_{N,a})^{1/a}}{a}\,\Gamma\Big(-\frac1a,\frac{D}{H_{N,a}x^{a}}\Big)\right]_0^N=N-\frac{(D/H_{N,a})^{1/a}}{a}\,\Gamma\Big(-\frac1a,\frac{D}{H_{N,a}N^{a}}\Big).$$

As in the a=1 case, these can be expressed with the generalized exponential integral (with a non-integer parameter) $E_p(z)=z^{p-1}\,\Gamma(1-p,z)$:

$$P_{reref}[D]=\frac{N^{1-a}}{a\,H_{N,a}}\,E_{1/a}\!\Big(\frac{D}{N^{a}H_{N,a}}\Big)\quad\text{and}\quad WS(D)=N\Big(1-\frac1a\,E_{1+1/a}\!\Big(\frac{D}{N^{a}H_{N,a}}\Big)\Big).$$

Consequently,

$$WS^{-1}(D)=N^{a}H_{N,a}\,E_{1+1/a}^{-1}\!\Big(a\Big(1-\frac{D}{N}\Big)\Big).$$

Similarly to a=1, WS(0)=0. Finally, the LRU miss rate is:

$$MR[D]=\frac{N^{1-a}}{a\,H_{N,a}}\,E_{1/a}\!\Big(E_{1+1/a}^{-1}\!\Big(a\Big(1-\frac{D}{N}\Big)\Big)\Big).$$

Remarks: There is no discontinuity for the a=1 case at this point. If the power-law parameters are such that a>b, then WS_a(D) […]

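We use the following code to compute P_reref and P_stack. As in the previous section, it uses the gsl_sf_gamma_inc GSL function (rather than the GSL generalized exponential integral function, which unfortunately is limited to integer parameters):

    #include <stdio.h>
    #include <math.h>
    #include <gsl/gsl_sf_gamma.h>

    int main(void)
    {
        double N = pow(2, 15);        /* alphabet size */
        double a = 2.0;               /* power-law exponent */
        double inva = 1.0 / a;
        double Hn = 0.0;              /* generalized harmonic number H_{N,a} */
        double k, X;
        int i;

        for (i = 1; i <= N; i++)
            Hn += 1.0 / pow(i, a);
        k = 1.0 / Hn;                 /* normalizing constant */
        /* printf("%.10f\t%.10f", N, k); */
        printf("\n");

        /* X is log10(D), swept from 0 to 6 in steps of 0.01 */
        for (X = 0.0; X <= 6.0; X = X + 0.01) {
            double D = pow(10, X);
            double T = k * D / pow(N, a);
            /* Y is WS(D) */
            double Y = N * (1.0 - (inva * pow(T, inva) * gsl_sf_gamma_inc(-inva, T)));
            /* Z is Preref[D] */
            double Z = pow(D * k, inva) * gsl_sf_gamma_inc(1.0 - inva, T) / (a * D);
            printf("%.5f\t%.5f\t%.5f\n", X, log(Y) / log(10), Z);
        }
        /* Columns: log10(D), log10(WS(D)), Preref[D].
           Plotting Z against Y gives Preref[WS^-1()], i.e. Pstack. */
        return 0;
    }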

Then we compare them to the graphs of P_reref and P_stack obtained by simulation (i.e. IRM random generation according to the occurrence power law and computation of P_reref and P_stack from the trace). The following figures show the results for a=0.5 and a=2. In the former case, the match is almost perfect.

[Figure: a=0.5 power law]

In the latter case, the curves are similar for distances above 10. This is not the case below 10, probably for the same reason as before, i.e., the logarithm approximation is too strong for high values of the exponent.

However, let us notice that, for values of D corresponding to real-life caches, the match is largely sufficient to guarantee that the approximation is a faithful representation of the P_stack CCDF and, therefore, of the LRU Miss Rate.

[Figure: a=2 power law]

5. Asymptotics of LRU MR Approximation

We first define the Miss Rate for Static caches, since in terms of performance the LRU caching scheme is often compared to static optimal caching. Then we distinguish between large and small LRU caches, since they lead to different expressions for the asymptotic behavior when the alphabet size N increases.

a. Static caches

Static (or ‘Static optimal’) caching simply consists in ‘locking’ the most popular D addresses in the cache (in other words, the most frequently accessed items are permanently resident in the cache) (Jelenkovic, 1999). With a minor variant, Static is also called “A0” in (Fagin, 1977). Although it is not necessarily the local optimum, it is optimal over a large time window. The Static Miss Rate is the tail of the popularity distribution:

$$MR_{static}(D)=\int_D^N\frac{dx}{H_{N,a}\,x^{a}}.$$

This leads to the following. If a=1,

$$MR_{static}(D)=\left[\frac{\ln x}{H_N}\right]_D^N=\frac{\ln N-\ln D}{H_N},$$

else

$$MR_{static}(D)=\left[\frac{x^{1-a}}{(1-a)H_{N,a}}\right]_D^N=\frac{N^{1-a}-D^{1-a}}{(1-a)\,H_{N,a}}.$$

This gives (using the approximation of generalized harmonic numbers, see Appendix 4) the following asymptotics when N → ∞:

If a>1, $MR_{static}(D)\to\dfrac{1}{(a-1)\,\zeta(a)\,D^{a-1}}$, since for a>1 the limit of $H_{N,a}$ when N is infinite is $\zeta(a)$, where $\zeta$ is the Riemann Zeta function.

If a=1, $MR_{static}(D)\to 1-\dfrac{\ln D}{\ln N}$, since $H_N\sim\ln N$ when N → ∞.

If 0<a<1, $MR_{static}(D)\to\dfrac{N^{1-a}-D^{1-a}}{N^{1-a}}=1-\delta^{1-a}$, noting δ = D/N. For a=1/2, the Static MR limit is $1-\sqrt{\delta}$. Note the continuity for a=0 with MR = 1-δ.

b. Large LRU Caches

In that case, δ=D/N is close to 1 (case of a large cache size, close to the alphabet size). Then the parameter a(1-δ) is in the vicinity of 0. Using the ExpInt reciprocal approximation $E_p^{-1}(x)=-\ln(-x\ln x)$ when x → 0 (see Appendix 2), and noting that it holds regardless of p, it follows that $E_p\big(E_{p+1}^{-1}(x)\big)\sim x$. Hence, for large caches, LRU MR is:

$$MR(D)\sim\frac{N^{1-a}}{a\,H_{N,a}}\cdot a(1-\delta)=\frac{N^{1-a}(1-\delta)}{H_{N,a}}.$$

And the LRU vs Static ratio is, if a=1:

$$\frac{(1-\delta)/H_N}{(\ln N-\ln D)/H_N}=-\frac{1-\delta}{\ln\delta},$$

otherwise, if a ≠ 1:

$$\frac{N^{1-a}(1-\delta)/H_{N,a}}{(N^{1-a}-D^{1-a})/((1-a)H_{N,a})}=\frac{(1-a)(1-\delta)}{1-\delta^{1-a}}.$$

Therefore the LRU to Static ratio does not depend on the exact value of N but rather on the cache size ratio. There is a continuity for a=0, i.e. the uniform distribution: in that case both Static and LRU MR are equal to 1-δ, hence the ratio is 1. The same continuity holds for a=1 since $\lim_{a\to1}\dfrac{1-a}{1-\delta^{1-a}}=-\dfrac{1}{\ln\delta}$. The limit of the ratio is 1 when δ → 1: for a=1, obviously from (1-δ) ~ -ln δ; and for a ≠ 1, from the series expansion

$$\frac{(1-a)(1-\delta)}{1-\delta^{1-a}}=1-\frac{a(1-\delta)}{2}+O\big((1-\delta)^2\big)$$

in the vicinity of 1.

Asymptotics of LRU MR of large caches when N → ∞

When N increases and the cache size is large (i.e., δ in the vicinity of 1), LRU MR is: if 0<a<1, $MR\to(1-a)(1-\delta)$; if a>1, $MR\to\dfrac{N^{1-a}(1-\delta)}{\zeta(a)}$; and, for a=1 (Zipf), $MR\to\dfrac{1-\delta}{H_N}\approx\dfrac{1-\delta}{\ln N}$.

c. Small LRU Caches

This case (i.e., δ in the vicinity of 0) is much more interesting because it corresponds to real-life caches, with a cache size generally much lower than the alphabet size. When δ is close to 0, the previous approximation on Generalized Exponential Integral functions, $E_p\big(E_{p+1}^{-1}(x)\big)\sim x$, does not hold. We first relate our analysis to other models: (Jelenkovic, 1999), (Fricker&al., 2012) and (Fill, 1996).

(i) Jelenkovic Asymptotic Relation for a>1

For a>1, (Jelenkovic, 1999) gives an asymptotic formula of the LRU cache miss rate compared to ‘optimal static’ for power-law distributions. The LRU Miss Rate formula when the support N increases to infinity is, for a>1:

$$MR[D]=\frac{1}{a\,\zeta(a)\,D^{a-1}}\left[\Gamma\Big(1-\frac1a\Big)\right]^{a},$$

hence the LRU to Static ratio is $\Big(1-\dfrac1a\Big)\left[\Gamma\Big(1-\dfrac1a\Big)\right]^{a}$. In Appendix 5, we show that this result can be derived simply by using the first two terms of the series expansion of the generalized exponential integral.

An interesting result given by (Jelenkovic, 1999) is that when a → ∞, the limit is $e^{\gamma}$, meaning that LRU MR cannot be worse than ~1.78 times Static MR. Note also that, according to Figure 1 of (Jelenkovic, 1999), when the power-law parameter approaches 1 (i.e. Zipf law), the LRU to Static MR ratio tends to 1: LRU is exactly equivalent to the static algorithm when the support size N → ∞.

(ii) Deriving Jelenkovic relation for 0<a<1

In two different papers, (Jelenkovic, 2002) and (Jelenkovic, 2005), Theorem 3 with k=1, the following formula is given:

$$MR[\delta]=\Big(\frac1a-1\Big)\,\eta(\delta)^{\frac1a-1}\,\Gamma\Big(1-\frac1a,\eta(\delta)\Big),$$

where $\eta(\delta)$ is the unique solution of $1-\dfrac1a\,\eta^{1/a}\,\Gamma\Big(-\dfrac1a,\eta\Big)=\delta$ and δ is the cache size ratio. Using the generalized exponential integral notation, $\eta(\delta)$ is simply $\eta(\delta)=E_{1+1/a}^{-1}\big(a(1-\delta)\big)$, and

$$MR[\delta]=\Big(\frac1a-1\Big)E_{1/a}\big(\eta(\delta)\big)=\Big(\frac1a-1\Big)E_{1/a}\Big(E_{1+1/a}^{-1}\big(a(1-\delta)\big)\Big).$$

It is very similar to the formula derived from the Fagin relation. Indeed, the two match exactly assuming the approximation of a<1 generalized harmonic numbers, $H_{N,a}\approx\dfrac{N^{1-a}}{1-a}$ (see Appendix 4).

(iii) Fricker&al. formula

(Fricker&al., 2012) Proposition 3 (page 62) gives an asymptotics of a quantity called $t_{\lfloor\delta N\rfloor}$, where $\lfloor\delta N\rfloor$, δ<1, is the cache size, for an un-normalized power law $q(n)=1/n^{a}$, 1 ≤ n ≤ N:

$$t_{\lfloor\delta N\rfloor}=N^{a}\,\psi_a^{-1}(\delta)+o(N^{a}),$$

with $\psi_\alpha$ defined in Lemma 2 for any β>0: $\psi_\alpha(\beta)=\int_0^1\big(1-e^{-\beta x^{-\alpha}}\big)dx$. We note that

$$\psi_\alpha(\beta)=1-\frac{\beta^{1/\alpha}}{\alpha}\,\Gamma\Big(-\frac1\alpha,\beta\Big)=1-\frac1\alpha\,E_{1+1/\alpha}(\beta).$$

Hence

$$t_{\lfloor\delta N\rfloor}=N^{a}\,\psi_a^{-1}(\delta)+o(N^{a})=N^{a}\,E_{1+1/a}^{-1}\big(a(1-\delta)\big)+o(N^{a})=\frac{WS^{-1}(\delta N)}{H_{N,a}}+o(N^{a}).$$

In other words, $t_{\lfloor\delta N\rfloor}$ is asymptotic to the WS^{-1} function after normalizing the power law: $p(n)=1/(n^{a}H_{N,a})=q(n)/H_{N,a}$, 1 ≤ n ≤ N.

(iv) Asymptotics for 0<a≤1

In (Fill, 1996) detailed results are given for the search cost under the move-to-front rule, a problem which has been shown to be equivalent to LRU caching (see (Flajolet, 1992)). Formulas are given for the density of the search cost of 0[…]1/2) of a quantity noted f_A(a), which can be interpreted as the derivative of the LRU Hit Rate w.r.t. the cache size. We show that these formulas can be derived from the series expansion of the generalized exponential integral function for different cases of 0[…]

Consequently, a first order approximation of the reciprocal function is $E_{1+1/a}^{-1}(x)=\dfrac{(1-a)(a-x)}{a}$, and then $E_{1+1/a}^{-1}\big(a(1-\delta)\big)=(1-a)\,\delta$. The approximation of the $E_{1/a}$ series expansion is:

$$E_{1/a}(x)=\frac{a}{1-a}+\Gamma\Big(1-\frac1a\Big)x^{\frac1a-1}+\frac{a\,x}{2a-1}-\frac{a\,x^{2}}{2!\,(3a-1)}+\dots$$

Two cases have to be analyzed depending on whether 1/a-1<1, i.e. 1>a>1/2, or not.

Case 1>a>1/2

The Gamma function is always defined on the interval ]1/2,1[, thus $E_{1/a}(x)=\dfrac{a}{1-a}+\Gamma\Big(1-\dfrac1a\Big)x^{\frac1a-1}$. Then

$$E_{1/a}\Big(E_{1+1/a}^{-1}\big(a(1-\delta)\big)\Big)=\frac{a}{1-a}+\Gamma\Big(1-\frac1a\Big)\big((1-a)\delta\big)^{\frac1a-1}\ \Rightarrow\ MR[\delta]=1-(1-a)^{\frac1a-1}\,\Gamma\Big(2-\frac1a\Big)\,\delta^{\frac1a-1}.$$

This matches the (Fill, 1996) lemma.8.b.(iii) result.

Case ½>a>0

On the other hand, the interval ½>a>0 includes values for which there are undefined expressions. As described in the Appendix on ExpInt series expansion singularities (Appendix 3), the factor with the gamma function term can be paired with a term in the infinite series such that the sum is negligible: merging the two terms and passing to the limit leads to an expression which is at least quadratic, so:

$$E_{1/a}(x)=\frac{a}{1-a}+\frac{a\,x}{2a-1}\ \Rightarrow\ E_{1/a}\Big(E_{1+1/a}^{-1}\big(a(1-\delta)\big)\Big)=\frac{a}{1-a}+\frac{a(1-a)\,\delta}{2a-1}.$$

And finally $MR[\delta]=1-\dfrac{(1-a)^{2}}{1-2a}\,\delta$, matching (Fill, 1996) lemma.8.b.(i). Note that when the power-law parameter ‘a’ tends to 0 (i.e., a uniform distribution), LRU tends to Static MR, i.e. (1-δ), as expected.

Case a=1/2

Point a=1/2 is a singular point. The ExpInt series expansion cannot be used directly since $\Gamma\Big(1-\dfrac1a\Big)$ is not defined. Using the approximation $E_2(x)=1+x(\ln x+\gamma-1)+o(x)$, and with $E_3^{-1}\Big(\dfrac{1-\delta}{2}\Big)=\dfrac{\delta}{2}$, then:

$$MR[\delta]=E_2\Big(\frac{\delta}{2}\Big)=1+\frac{\delta}{2}\Big(\ln\frac{\delta}{2}+\gamma-1\Big),$$

which is equivalent to (Fill, 1996) lemma.8.b.(ii).

Case a=1

There again, the series expansion cannot be used directly because of undefined terms. See the ExpInt series expansion singularities in Appendix 3, where the series at x=0 of both $E_1(x)$ and $E_2(x)$ are given: $E_1(x)=-(\ln x+\gamma)+x+o(x)$ and $E_2(x)=1+x(\ln x+\gamma-1)+o(x)$. The reciprocal function of $E_2(x)$ cannot be easily devised; however, it can be observed that the series expansion of $-E_1(x)/\ln(1-E_2(x))$ tends to $(\ln x+\gamma)/(\ln x+\ln(-\ln x-\gamma+1))$, whose limit is 1 when x tends to 0. With this limit, $E_1\big(E_2^{-1}(X)\big)\approx-\ln\big(1-E_2(E_2^{-1}(X))\big)=-\ln(1-X)$ when X is close to 1. Thus $E_1\big(E_2^{-1}(1-D/N)\big)\approx-\ln(D/N)$ and

$$MR[D]=\frac{1}{H_N}\,E_1\Big(E_2^{-1}\Big(1-\frac{D}{N}\Big)\Big)\approx\frac{\ln N-\ln D}{H_N}\approx1-\frac{\ln D}{\ln N}$$

when D is close to 0. This MR formula is exactly the Static MR for a=1 and confirms the Jelenkovic trend shown in Figure 1 of his paper (Jelenkovic, 1999), i.e. a limit of 1 when parameter a tends to 1 (when N → ∞, LRU MR tends to Static MR).

(v) Relation to other works on Caching analysis

Based on their measurements, (Breslau & al., 1999) claim a ln(D) trend for the a=1 power law, and a D^{(1/a)-1} trend for 0[…]a>1/2. The Jelenkovic relation for a>1 is also proved (and derived by other means) in (Sugimoto, 2006). (Hattori, 2009) also gives the Jelenkovic formula for a>1 laws. They also address the case 1>a>1/2 in formula (67) page 18, where they give a result similar to ours in which the Hit rate is in first order proportional to t times a constant proportional to Γ(2-1/a): MissRate(t)=1-Γ(2-1/a)*K*t + O(t).

(vi) Encounter of another kind

A consequence of the WS(D) definition for power laws is that the slope at the origin of the WS(D) function in a log-log graph is always 1 when the power-law parameter ‘a’ is 0 ≤ a ≤ 1, and is 1/a when a>1. This observation is proven in Appendix 6.

Not surprisingly, this result converges with an empirical observation made in the field of computational linguistics (see formula (7) of (Lü 2010)), relating the so-called Heaps law (measuring the growth of the vocabulary size with document size) and the power-law frequency distribution of the lexical items (simply named Zipf law in (Lü 2010)).

6. Analysis of the Maximum of LRU/Static MR ratio

This Section is intended to be the main contribution of the report. We think the analysis of the maximum of LRU/Static MR ratio is a novelty, bringing confirmation of previous results and opening new questions. a. Zipf law (a=1)

For the Zipf law, the LRU/Static MR ratio is, for $\delta = D/N$, $1 \le D < N$:

$$F_1(y) = \frac{E_1(y)}{-\ln(1 - E_2(y))}, \qquad \delta = 1 - E_2(y).$$

$F_1(y) > 1$, i.e. LRU MR is always higher than Static MR; in other words, $E_1(y) > -\ln(1 - E_2(y))$ for $y>0$. This follows by noting that the following relation holds: $f(y) = E_1(y) + \ln(1 - E_2(y)) > 0$, since $\lim_{y \to +\infty} f(y) = 0$ and $\lim_{y \to 0^+} f(y) = +\infty$, using the series expansions of $E_1$ and $E_2$ at the origin, which imply that $f(y) = \ln(1 - \gamma - \ln y) - \gamma + o(1)$. Its derivative $f'(y) = -E_0(y) + \frac{E_1(y)}{1 - E_2(y)}$ is always negative for $y>0$, since $y E_0(y) = e^{-y} < 1$ for $y>0$, hence (using $E_2(y) = y(E_0(y) - E_1(y))$ and $E_0 > E_1$) $E_0(y)E_2(y) = y E_0(y)\,(E_0(y) - E_1(y)) < E_0(y) - E_1(y)$, so $E_1(y) - E_0(y) + E_0(y)E_2(y) < 0$ and, since $0 < E_2(y) < 1$, finally $-E_0(y) + \frac{E_1(y)}{1 - E_2(y)} < 0$. Then, $F_1(y)$ is always above 1.

The function $F_1(y)$ reaches a maximum when its derivative is null, i.e. when $y$ is a solution of

$$E_1(y)^2 = -(1 - E_2(y))\,\ln(1 - E_2(y))\,E_0(y), \qquad \delta = 1 - E_2(y).$$

Unfortunately, an analytical solution does not seem to exist. Using the WolframAlpha© tool, we found a maximum of 1.43227 for $y \approx 0.223059$, and then a cache ratio $\delta = 1 - E_2(0.223059) = 0.453$. The coordinates of the maximum ($\delta = 0.453$, Max = 1.43227) are strikingly confirmed by a set of runs on DineroIV (actually a variant of Dinero allowing for non-power-of-2 cache sizes); each run is performed on a 20M IRM trace (over 64K addresses) generated according to a Zipf (a=1) popularity law. Each point is a percent of the cache ratio from 0% to 99%.

b. Generalized power law (a ≠ 1)

We generalize the LRU/Static MR ratio to any parameter a>0 of the power law. With $p = 1/a$, the ratio for the generalized power law (a>0, a ≠ 1) is (for $\delta = D/N$, $1 \le D < N$):

$$F_a(y) = \frac{(p-1)\,E_p(y)}{1 - \left(1 - pE_{p+1}(y)\right)^{1-1/p}}, \qquad \delta = 1 - pE_{p+1}(y).$$
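As a numerical cross-check of the Zipf-case maximum reported in subsection a, the following sketch evaluates $F_1(y) = E_1(y) / (-\ln(1 - E_2(y)))$ and locates its maximum by golden-section search. The helper names (`expint_e1`, `zipf_ratio`, `golden_section_max`) are illustrative and not taken from the report; $E_1$ is computed from its power series, which is an assumption that only holds up well for moderate arguments (roughly $0 < y \le 5$).

```python
import math

EULER_GAMMA = 0.5772156649015329


def expint_e1(x, terms=60):
    """E_1(x) from the series -gamma - ln(x) - sum_{k>=1} (-x)^k / (k * k!)."""
    total = -EULER_GAMMA - math.log(x)
    term = 1.0  # holds (-x)^k / k!
    for k in range(1, terms + 1):
        term *= -x / k
        total -= term / k
    return total


def expint_e2(x):
    """E_2(x) from the recurrence E_2(x) = exp(-x) - x * E_1(x)."""
    return math.exp(-x) - x * expint_e1(x)


def cache_ratio(y):
    """Cache ratio delta = D/N = 1 - E_2(y) for the Zipf law (a = 1)."""
    return 1.0 - expint_e2(y)


def zipf_ratio(y):
    """LRU/Static miss-rate ratio F_1(y) for the Zipf law (a = 1)."""
    return expint_e1(y) / -math.log(cache_ratio(y))


def golden_section_max(f, lo, hi, tol=1e-7):
    """Maximise a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if f(c) > f(d):
            hi = d  # the maximum lies in [lo, d]
        else:
            lo = c  # the maximum lies in [c, hi]
    return 0.5 * (lo + hi)


y_max = golden_section_max(zipf_ratio, 0.01, 2.0)
print(f"y = {y_max:.4f}  ratio = {zipf_ratio(y_max):.5f}  delta = {cache_ratio(y_max):.3f}")
```

The search interval [0.01, 2] is a choice made for this sketch; it relies on $F_1$ having a single maximum there, as stated in the text.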

Property P1: $\lim_{a \to 0} F_a(y) = 1,\ \forall y>0$. This stems directly from $E_n(x) \sim E_{n+1}(x)$ when $n \to +\infty$, and is coherent with a constant ratio equal to 1 for a uniform popularity.

[Figure: Ratio of LRU/Static MRs for a=1]

Property P2: $\lim_{a \to 1} F_a(y) = F_1(y)$, where $F_1(y)$ is defined in the previous paragraph for a=1: $F_1(y) = \frac{-E_1(y)}{\ln(1 - E_2(y))}$. This comes from: $\lim_{p \to 1} \frac{p-1}{1 - (1 - p\,u_p(x))^{1-1/p}} = \frac{-1}{\ln(1 - u_1(x))}$.

Property P3: $\lim_{a \to \infty} F_a(y) = \frac{E_0(y)}{e^{E_1(y)} - 1}$. We note this function $F_\infty$. This limit comes from: $\lim_{p \to 0} \left[1 - (1 - p\,u_p(x))^{1-1/p}\right] = 1 - e^{u_0(x)}$. It holds that $\lim_{y \to \infty} F_\infty(y) = 1$, and $\lim_{y \to 0} F_\infty(y) = e^{\gamma}$. One can verify that $F_\infty(y) > 1,\ \forall y>0$, from the consideration that $\ln(E_0(y) + 1) - E_1(y)$ is always positive, since it tends to 0 when $y \to +\infty$, to $\gamma$ when $y \to 0$, and its derivative $E_0(y) - \frac{E_0(y)\,(1 + 1/y)}{1 + E_0(y)} = \frac{E_0(y)\,(E_0(y) - 1/y)}{1 + E_0(y)}$ is always negative.
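The two limits of $F_\infty$ quoted in Property P3 can be written out as a short worked computation. The sketch below assumes only the standard expansions $E_1(y) = -\gamma - \ln y + O(y)$ near the origin and $E_n(y) \sim e^{-y}/y$ at infinity; it is an illustration and not necessarily the route taken by the author.

```latex
% Limit at the origin: E_0(y) = e^{-y}/y and E_1(y) = -\gamma - \ln y + O(y)
\begin{align*}
e^{E_1(y)} &= \frac{e^{-\gamma}}{y}\,\bigl(1 + O(y)\bigr), \\
F_\infty(y) &= \frac{E_0(y)}{e^{E_1(y)} - 1}
  = \frac{\tfrac{1}{y}\,\bigl(1 + O(y)\bigr)}{\tfrac{e^{-\gamma}}{y}\,\bigl(1 + O(y)\bigr)}
  \xrightarrow[y \to 0^+]{} e^{\gamma} \approx 1.781 .
\end{align*}
% Limit at infinity: E_0(y) ~ E_1(y) ~ e^{-y}/y -> 0, so e^{E_1} - 1 ~ E_1
\begin{align*}
F_\infty(y) = \frac{E_0(y)}{E_1(y)\,\bigl(1 + O(E_1(y))\bigr)}
  \xrightarrow[y \to +\infty]{} 1 .
\end{align*}
```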

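Property P4: $\lim_{y \to +\infty} F_a(y) = 1$. When $y \to +\infty$, $\delta = 1 - pE_{p+1}(y) \to 1$, meaning the cache is large. When $y \to +\infty$, $z = E_n(y) \to 0$, regardless of n, hence: $\lim_{y \to +\infty} F_a(y) = \lim_{z \to 0} \frac{(p-1)\,z}{1 - (1 - pz)^{1-1/p}}$. Using the Laurent series, when $x \to 0$, $\frac{1}{1 - (1-x)^n} = \frac{1}{nx} + \frac{n-1}{2n} + O(x)$, finally, $\lim_{y \to +\infty} F_a(y) = \lim_{z \to 0} (p-1)\,z \left[\frac{1}{(p-1)z} - \frac{1}{2(p-1)} + O(z)\right] = 1$.

Property P5: $\lim_{y \to 0^+} F_a(y) = 1$ when $0 \le a \le 1$, and $\left(1 - \frac{1}{a}\right)\left[\Gamma\left(1 - \frac{1}{a}\right)\right]^a$ when $a>1$. Proof is given in Appendix 7. The result for a>1 is the relation given by (Jelenkovic, 1999). In the sequel we note $F_a(0)$ the Jelenkovic limit. It is always above 1, tends to 1 when $a \to 1$, and tends to $e^{\gamma} \approx 1.781$ when $a \to \infty$. Note it is valid only for small caches ($y \to 0$, $\delta \to 0$).

Property P6: $\forall a>0,\ \forall y>0,\ F_a(y) > 1$. Proof is in Appendix 8.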
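The Jelenkovic limit of Property P5 is easy to tabulate. The sketch below, with the illustrative name `jelenkovic_limit`, evaluates $(1 - 1/a)\,\Gamma(1 - 1/a)^a$ for a few exponents using only the standard library, to show the behaviour stated above: close to 1 just above a=1, equal to $\pi/2$ at a=2, and approaching $e^{\gamma}$ for large a.

```python
import math

EULER_GAMMA = 0.5772156649015329


def jelenkovic_limit(a):
    """F_a(0) = (1 - 1/a) * Gamma(1 - 1/a)**a, the small-cache limit for a > 1."""
    if a <= 1.0:
        raise ValueError("the closed form applies to a > 1 only")
    return (1.0 - 1.0 / a) * math.gamma(1.0 - 1.0 / a) ** a


for a in (1.001, 1.4, 2.0, 5.0, 1000.0):
    print(f"a = {a:8.3f}  F_a(0) = {jelenkovic_limit(a):.5f}")
print(f"e^gamma = {math.exp(EULER_GAMMA):.5f}")
```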

Property P7: $F_\infty(y) > F_1(y),\ \forall y>0$. Proof is in Appendix 9.

Summary of the Properties

$F_a(y)$ is a monotonically-increasing family of functions such that, for any $y>0$: […]

The derivative of $F_a$, for $a>0$ and $a \ne 1$, is null when $y$ is a solution of:

$$F_a(y) = \frac{\frac{d}{dy}\left[(p-1)E_p(y)\right]}{\frac{d}{dy}\left[1 - (1 - pE_{p+1}(y))^{1-1/p}\right]} = \frac{E_{p-1}(y)\,\left(1 - pE_{p+1}(y)\right)^{1/p}}{E_p(y)}$$

and the maximum of the ratio is then $F_a(y_0)$ if such a solution $y_0$ exists. Yet this does not lead to an analytic form of its zeroes. For a=1 this leads to the equation $F_1(y) = \frac{E_0(y)\,(1 - E_2(y))}{E_1(y)}$ which, as we have seen, has a unique solution (see previous section) but does not lead to a closed-form expression.

For $a \to 0$, since $\lim_{a \to 0} F_a(y) = \lim_{p \to \infty} \frac{E_{p-1}(y)\,(1 - pE_{p+1}(y))^{1/p}}{E_p(y)} = 1$, this means that for any y, $F_0(y)$ is a maximum equal to 1. When $p \to 0$ ($a \to \infty$), $\frac{E_{p-1}(y)\,(1 - pE_{p+1}(y))^{1/p}}{E_p(y)} \to \frac{E_{-1}(y)\,e^{-E_1(y)}}{E_0(y)} = \frac{y+1}{y\,e^{E_1(y)}}$, hence a maximum is found for y such that $\frac{E_0(y)}{e^{E_1(y)} - 1} = \frac{y+1}{y\,e^{E_1(y)}}$. Each of those two expressions tends to $e^{\gamma}$ when $y \to 0$, therefore the maximum of the ratio sits at abscissa y=0 and is equal to $F_\infty(0) = e^{\gamma}$.

d. Graph of Maximum

Approximations are made for specific values of parameter 'a' using the WolframAlpha© tool. For example, for a=1/2, the ratio $F_{1/2}$ is $\frac{E_2(y)}{1 - (1 - 2E_3(y))^{1/2}}$ with limit $F_{1/2}(0) = 1$ and a maximum of 1.28732 for y=0.55779224, i.e. $\delta$=0.5927514. For a=1/3, the maximum is 1.21657 for y=0.8158 and $\delta$=0.6729. Similarly a maximum above $F_a(0) = 1$ exists for any $0 < a \le 1$. Intuitively, $\delta$ increases to 1 as a gets closer to 0. For a=2, the ratio is $\frac{E_{1/2}(y)\,(2 - E_{3/2}(y))}{2E_{3/2}(y)}$, giving a maximum of 1.58 with the Jelenkovic limit $F_2(0) = \frac{1}{2}\,\Gamma\left(\frac{1}{2}\right)^2 = \pi/2 \approx 1.57$ very close to the maximum value, while the abscissa of the maximum is very close to 0 (<0.035).

Extending to a number of values of parameter 'a' and using both WolframAlpha© and GSL, we obtain the following graph, which compares the maximum of the LRU/Static ratio with the Jelenkovic limit: as 'a' increases, the $F_a(0)$ limit gets closer to the maximum of the ratio, and both tend to $e^{\gamma}$. When a is above 2, the maximum is very close to the $F_a(0)$ Jelenkovic limit. The corresponding cache ratio as a function of the power-law exponent 'a' behaves as follows: $\delta \to 1$ when $a \to 0$, and $\delta \to 0$ when $a \to \infty$.

In (Jelenkovic, 1999) Fig. 2, experiments with a=1.4 show that simulation results for a cache ratio below $10^{-3}$ (cache size up to 10[…] on a vocabulary of size 10[…]) are very close to the $F_a(0)$=1.42362 constant approximation of the LRU to Static ratio. The figure comparing Max $F_p$ to $F_p(0)$ shows a maximum of about 1.5 when a=1.4, which is reached for $\delta \approx 0.38$, so for a cache size much higher than those analyzed in (Jelenkovic, 1999) Fig. 2. This explains why $F_p(0)$ is an excellent approximation of $F_p(\delta)$ for low values of $\delta$. On the other hand, when parameter a is just above 1, and very close to 1, this approximation may be at risk depending on the size of the cache: we have seen that the LRU/Static ratio for a=1 goes from 1 (for $\delta$=0) with a very steep curve up to a maximum of 1.43227 (for $\delta$=0.453). It is so steep that its value is 1.243 for $\delta$=0.00096 (abscissa y=0.0001). So, for a cache in the range studied by (Jelenkovic, 1999), when parameter a is close to 1, the $F_p(0)$ approximation can lead to an underestimation of LRU MR in the range of 25%.

[Figure: Maximum of LRU/Static ratio vs. the Jelenkovic limit $(1 - 1/a)\,\Gamma(1 - 1/a)^a$ (abscissa is a=1/p)]
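The sample values quoted in this subsection can be reproduced with a few lines of code. The sketch below assumes the ratio has the form $F_a(y) = (p-1)E_p(y) / (1 - \delta^{1-a})$ with $p = 1/a$ and $\delta = 1 - pE_{p+1}(y)$, which matches the a=1/2 and a=2 expressions given above. The function names are illustrative, and `expint` is a small power-series implementation written for this example (reasonable only for roughly $0 < y \le 5$), not a library routine.

```python
import math

EULER_GAMMA = 0.5772156649015329


def expint(p, x, terms=80):
    """Generalized exponential integral E_p(x), p > 0, small to moderate x > 0."""
    if abs(p - round(p)) < 1e-12:
        # Integer order: series for E_1, then recurrence n*E_{n+1} = exp(-x) - x*E_n.
        value = -EULER_GAMMA - math.log(x)
        term = 1.0  # holds (-x)^k / k!
        for k in range(1, terms + 1):
            term *= -x / k
            value -= term / k
        for n in range(1, int(round(p))):
            value = (math.exp(-x) - x * value) / n
        return value
    # Non-integer order: Gamma(1-p) x^(p-1) - sum_k (-x)^k / (k! (1 - p + k)).
    value = math.gamma(1.0 - p) * x ** (p - 1.0)
    term = 1.0  # holds (-x)^k / k!
    for k in range(terms):
        value -= term / (1.0 - p + k)
        term *= -x / (k + 1)
    return value


def cache_ratio(a, y):
    """delta = D/N = 1 - p * E_{p+1}(y), with p = 1/a."""
    p = 1.0 / a
    return 1.0 - p * expint(p + 1.0, y)


def ratio(a, y):
    """LRU/Static miss-rate ratio F_a(y); a == 1 is the Zipf special case."""
    delta = cache_ratio(a, y)
    if abs(a - 1.0) < 1e-12:
        return expint(1.0, y) / -math.log(delta)
    p = 1.0 / a
    return (p - 1.0) * expint(p, y) / (1.0 - delta ** (1.0 - a))


print(f"a=1/2: F(0.55779224) = {ratio(0.5, 0.55779224):.5f}")
# Coarse grid scan for a=2; the maximum sits very close to the origin.
best_y = max((i * 1e-3 for i in range(1, 201)), key=lambda y: ratio(2.0, y))
print(f"a=2:   max F = {ratio(2.0, best_y):.4f} at y = {best_y:.3f}")
```

A grid scan is used for a=2 rather than a root finder because the maximum is shallow and very near y=0; a finer grid or a bracketing optimiser would refine the abscissa.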

[Figure: $\delta$ of the maximum as a function of a]

Clearly this underestimation gets smaller as parameter a increases.

e. Confirmation with Cache simulation tools

The following graph shows 99 runs (cache ratio varying from 1 to 99%) on a Dinero-variant tool for a 20M IRM trace generated over 64K addresses according to a power-law (for both a=0.1 and a=0.2). They confirm the results obtained both for the cache ratio of the maximum and for the maximum itself. For a>1, we have results on the DineroIV tool (hence restricted to power-of-2 caches from 0 to 1K) on 100M traces IRM-generated over 1K addresses for a set of values from a=1.0 to a=2.0. They show the trend towards $F_a(0)$ as well as the very steep slope at the origin. Having in mind that a real cache is in the range of 0.1 to 1% of the address space, this justifies the concern raised in the previous paragraph.

[Figure: LRU/Static (1K @, 100M trace) Dinero results]

f. A possible approximation of abscissa of Maximum

Repeating the previous computations using WolframAlpha© on a number of points, we obtained the following graph comparing the abscissa of the maximum of $F_p$ with the function $E_1(1/p)$; they appear to match almost exactly up to a=2 (i.e. $p \ge 0.5$). A similar (slightly more precise) graph can be obtained using the GSL package. The divergence from $E_1(1/p)$ can be further analyzed above a=2. For a=1, using $x = E_1(1) \approx 0.219384$, the value of the ratio is 1.43129, which agrees with the maximum found with WolframAlpha© to within about $10^{-3}$. In conclusion, $E_1(1/p)$ seems to be an excellent predictor of the abscissa of the maximum when p>0.5 (i.e. a<2). For $a \ge 2$ this is not the case; however, we have seen that for these values the maximum is extremely close to the value at the origin (Jelenkovic's value). Finally we conjecture that, for p>0.5, $y = E_1(1/p)$ is a good approximate solution of the equation:

$$(p-1)\,E_p(y)^2 = E_{p-1}(y)\,\left(1 - pE_{p+1}(y)\right)^{1/p}\left[1 - \left(1 - pE_{p+1}(y)\right)^{1-1/p}\right]$$

In particular, using the limits where p=1, $y = E_1(1) = \Gamma(0,1)$ is a good approximate solution of $E_1(y)^2 = -(1 - E_2(y))\,\ln(1 - E_2(y))\,E_0(y)$.

[Figure: Abscissa of the maximum compared with $E_1(1/p)$, as a function of a=1/p]
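The conjectured predictor can be compared directly with the abscissas of the maximum quoted earlier in this section (y=0.223059 for a=1, y=0.55779224 for a=1/2, y=0.8158 for a=1/3). The sketch below computes $E_1(1/p)$, i.e. $E_1(a)$ since $p = 1/a$, from the power series of $E_1$. The names `expint_e1` and `reported_abscissa` are illustrative, and the three reference values are simply copied from the text.

```python
import math

EULER_GAMMA = 0.5772156649015329


def expint_e1(x, terms=60):
    """E_1(x) from the series -gamma - ln(x) - sum_{k>=1} (-x)^k / (k * k!)."""
    total = -EULER_GAMMA - math.log(x)
    term = 1.0  # holds (-x)^k / k!
    for k in range(1, terms + 1):
        term *= -x / k
        total -= term / k
    return total


# Abscissa of the maximum of F_a as quoted in the text, keyed by a = 1/p.
reported_abscissa = {1.0: 0.223059, 0.5: 0.55779224, 1.0 / 3.0: 0.8158}

predicted = {a: expint_e1(a) for a in reported_abscissa}  # E_1(1/p) with 1/p = a
for a, y_reported in reported_abscissa.items():
    print(f"a = {a:.4f}  E_1(1/p) = {predicted[a]:.4f}  reported y = {y_reported:.4f}")
```

On these three points the predictor and the reported abscissas differ by about 0.01 or less, which is consistent with the "almost exact" match claimed for p>0.5.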


7. Conclusion

In this report, we have proved that a closed-form expression for power-law popularities can be derived from R. Fagin's LRU Miss Rate approximation. Asymptotics of this expression are coherent with previously known results. The main contribution of this work is a more thorough analysis of the LRU/Static ratio, which shows that, for any real positive power-law parameter 'a', there is a cache ratio $0 \le \delta \le$ […]

References

Breslau, L., Cao, P., Fan, L., Phillips, G., & Shenker, S. (1999). Web caching and Zipf-like distributions: Evidence and implications. Proc. of IEEE INFOCOM '99, 126-134.

Che, H., Tung, Y., & Wang, Z. (2002). Hierarchical web caching systems: Modeling, design and experimental results. IEEE Journal on Selected Areas in Communications, 20(7), 1305-1314.

Chiccoli, C., Lorenzutta, S., & Maino, G. (1990). Recent results for generalized exponential integrals. Computers & Mathematics with Applications, (5), 21-29.

Denning, P. J., & Schwartz, S. C. (1972). Properties of the working-set model. Communications of the ACM, (3), 191-198.

Dinero IV Trace-Driven Uniprocessor Cache Simulator. http://pages.cs.wisc.edu/~markhill/DineroIV/

Fagin, R., & Price, T. G. (1978). Efficient calculation of expected miss ratios in the independent reference model. SIAM Journal on Computing, 7(3), 288-297.

Fill, J. A. (1996). Limits and rates of convergence for the distribution of search cost under the move-to-front rule. Theoret. Comput. Sci., 164(1-2).

Flajolet, P., Gardy, D., & Thimonier, L. (1992). Birthday paradox, coupon collectors, caching algorithms and self-organizing search. Discrete Applied Mathematics, 39(3), 207-229.

Fricker, C., Robert, P., & Roberts, J. (2012, September). A versatile and accurate approximation for LRU cache performance.

Hattori, K., & Hattori, T. (2009). Hydrodynamic limit of move-to-front rules and search cost probabilities. arXiv preprint arXiv:0908.3222.

Jelenković, P. R. (1999). Asymptotic approximation of the move-to-front search cost distribution and least-recently used caching fault probabilities. Annals of Applied Probability, 430-464.

Jelenković, P. R. (2002). Least-Recently-Used Caching with Zipf's Law Requests. In The Sixth INFORMS Telecommunications Conference.

Jelenković, P. R., Kang, X., & Radovanovic, A. (2005, February). Near optimality of the discrete persistent access caching algorithm. In International Conference on Analysis of Algorithms, DMTCS proc. AD (Vol. 201, p. 222).

Jelenković, P. R., & Kang, X. (2007, January). LRU caching with moderately heavy request distributions. In Proceedings of the Meeting on Analytic Algorithmics and Combinatorics (pp. 212-222). Society for Industrial and Applied Mathematics.

King, W. F., III (1971). Analysis of demand paging algorithms. Proc. of the I.F.I.P. Congress.

Lü, L., Zhang, Z.-K., & Zhou, T. (2010). Zipf's Law Leads to Heaps' Law: Analyzing Their Relation in Finite-Size Systems. Sporns O, ed. PLoS ONE.

Mattson, R. L., Gecsei, J., Slutz, D., & Traiger, I. L. (1970). Evaluation techniques for storage hierarchies. IBM Systems Journal, 9(2), 78-117.

Sugimoto, T., & Miyoshi, N. (2006). On the asymptotics of fault probability in least-recently-used caching with Zipf-type request distribution. Random Structures & Algorithms, (3), 296-323.

Voldman, J., Mandelbrot, B., Hoevel, L. W., Knight, J., & Rosenfeld, P. (1983). Fractal nature of software-cache interaction. IBM J. Res. Develop., 27, 164-170.

Zhang, H. (2009). Discovering power laws in computer programs. Information Processing & Management, (4), 477-483.

Xiang, X., Ding, C., Luo, H., & Bao, B. (2013, March). HOTL: a higher order theory of locality. In ACM SIGARCH Computer Architecture News (Vol. 41, No. 1, pp. 343-356). ACM.

Appendix 1: Generalized Exponential Integral: ‘ExpInt’

A graph of the Generalized Exponential Integral $E_p(x)$ is given in http://dlmf.nist.gov/8.19

General considerations

Maplesoft uses the notation $Ei(a,z) = z^{a-1}\,\Gamma(1-a,z)$ and WolframAlpha© uses expint(a,z). We use the nickname 'ExpInt' for the Generalized Exponential Integral. $E_0(x) = x^{-1}e^{-x}$ […] $E_p(0) = \frac{1}{p-1}$ for $p>1$ (http://dlmf.nist.gov/8.19.E6) and $\frac{\partial E_p}{\partial x} = -E_{p-1}$ (http://dlmf.nist.gov/8.19.E13). We also have the relation $nE_{n+1}(x) = e^{-x} - xE_n(x)$. We are interested in the $E_{1+1/a}$ functions, a>0. We have $E_{1+1/a}(0) = a$, for a>0. Since $E_2(0) = 1$, p=2 is a particular point for $E_p(0)$.

Vicinity of +∞

The generalized exponential integral satisfies $E_p(x) \sim e^{-x}/x$ when $x \to \infty$, regardless of p. See the asymptotic series expansion in http://functions.wolfram.com/GammaBetaErf/ExpIntegralEi/introductions/ExpIntegrals/ShowAll.html

Vicinity of 0

Obviously an approximation of $E_{1+1/a}$ will depend on how 1+1/a compares to 2, i.e. a to 1. When $0 < a < 1$, $E_2 > E_{1+1/a} > E_\infty$. The slope of $E_{1+1/a}(x)$ at point x=0 is $-E_{1/a}(0) = \frac{a}{a-1}$, hence a possible linear approximation of $E_{1+1/a}$ around 0 is $E_{1+1/a}(x) \approx a - \frac{a}{1-a}\,x$. When $1 \le a < +\infty$, $E_1 > E_{1+1/a} \ge E_2$ and $E_{1+1/a}(0) = a$, therefore $+\infty > E_{1+1/a}(0) \ge 1$. The slope of $E_{1+1/a}(x)$ at point x=0 is $-E_{1/a}(0)$, which is infinite for a=1 since $E_1(0) = +\infty$. When a increases, the $E_{1+1/a}$ function tends to the $E_1(z) = \Gamma(0,z)$ function, but starting from an ever-increasing origin since $E_{1+1/a}(0) = a$. For the Generalized Exponential Integral, an interesting reference is (Chiccoli, 1990).

Appendix 2: Approximation of Reciprocal of Generalized Exponential Integral

We apply the previous results to an approximation of the reciprocal $E^{-1}_{1+1/a}(x)$ when $x \in [0,a]$.

Vicinity of 0

It is known that the Lambert function W is the inverse of the function $xe^x$. Hence the inverse function: Inverse($e^{-x}/x$) = W(1/x). This can be checked with WolframAlpha©. Since $E_p(x) \sim e^{-x}/x$ when $x \to \infty$, regardless of p, this gives the following: $E_p^{-1}(x) \sim W(1/x)$. It is known that W(x) has a first-order approximation ln(x)-ln(ln(x)): http://dlmf.nist.gov/4.13. Hence we can approximate the reciprocal of the exponential integral function in the vicinity of 0+, regardless of parameter p, by $E_p^{-1}(x) = \ln(1/x) - \ln(\ln(1/x))$ or, simply, $E_p^{-1}(x) = \ln(1/x)$.

Vicinity of a

Obviously the vicinity of 0+ cannot extend higher than x=1 (since ln(ln(1/x)) is not defined after this point); however, the approximation may be acceptable over the whole [0,a] if a is sufficiently small. If this is not the case, we use an approximation of $E_p(x)$ when x=0. When 0 […]

We are interested in the series expansion of the ExpInt function in the vicinity of 0: $E_p(x) = \Gamma(1-p)\,x^{p-1} + \frac{1}{p-1} - \frac{x}{p-2} + \frac{x^2}{2!\,(p-3)} - \ldots$ This series expansion is not defined for a positive integer p since the Gamma function $\Gamma(1-p)$ is not defined at p=1,2,3,.. and also, for each of these values of p, there exists a value of k for which the denominator is null (respectively for k=0,1,2,..). However there is an interesting limit that can be computed for each case of p. For p=1, we find the very nice relation $\lim_{p \to 1}\left(\frac{1}{p-1} + \Gamma(1-p)\,x^{p-1}\right) = -(\ln x + \gamma)$ with $\gamma$ the Euler-Mascheroni constant, hence the approximation $E_1(x) = -(\ln x + \gamma) + x + o(x)$. The nice relation stems directly from the definition of $\gamma$: $\gamma = \lim_{p \to 0}\left(\frac{1}{p} - \Gamma(p)\right)$ and the series expansion of the Gamma function.

More generally, a known relation can be used for positive integer parameters of the ExpInt function (http://dlmf.nist.gov/8.19.E8):

$$E_n(x) = \frac{(-x)^{n-1}}{(n-1)!}\left(\psi(n) - \ln x\right) - \sum_{k=0,\,k \ne n-1}^{\infty} \frac{(-x)^k}{k!\,(1-n+k)}$$

which gives $E_1(x) = (\psi(1) - \ln x) + x + o(x)$, where $\psi$ is the digamma function such that $\psi(1) = -\gamma$, so $E_1(x) = -(\ln x + \gamma) + x + o(x)$. Using $\psi(n) - \psi(n-1) = 1/(n-1)$, we have $\psi(2) = -\gamma + 1$ and $E_2(x) = -x\,(\psi(2) - \ln x) + 1 + o(x) = 1 + x\,(\ln x + \gamma - 1) + o(x)$. Note that, correctly, $E_2'(x) = -E_1(x)$. Similarly $E_3(x) = \frac{x^2}{2!}\left(\psi(3) - \ln x\right) + \frac{1}{2} - x + o(x) = \frac{1}{2} - x + \frac{x^2}{4}\left(3 - 2\gamma - 2\ln x\right) + o(x)$. For an ExpInt parameter n above or equal to 3, the singularity $(\psi(n) - \ln x)$ is applied to a monomial of exponent at least 2. This form is given by WolframAlpha© under the name Generalized Puiseux series (see http://functions.wolfram.com/GeneralIdentities/4/).

Appendix 4: Generalized harmonic numbers

The Nth generalized harmonic number is $H_{N,a} = \sum_{i=1}^{N} i^{-a}$. For N=28 and 0<=a<=2, the following graph is obtained: […] If a=1, it is well known that $H_N = \ln N + \gamma + O(1/N)$ where $\gamma$ is the Euler-Mascheroni constant ($\gamma \approx 0.5772$). […] the Euler-Maclaurin formula, with Bernoulli numbers $B_k$: $\sum_{j=m}^{n} f(j) = \int_m^n f(x)\,dx + \sum_{k=1}^{\infty} \frac{B_k}{k!}\left[f^{(k-1)}(x)\right]_m^n$. So: $H_{m,a} = \sum_{j=1}^{m} j^{-a} \approx \frac{m^{1-a} - a}{1-a}$. Note that $H_{m,0} = m$ as expected. The following figure shows the result of the approximation for N=28 and 0<=a<=2:
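To give a feel for the quality of such an approximation, the sketch below compares the exact sum with a closed form for m=28. The closed form used here, $(m^{1-a} - a)/(1-a)$ (first term plus the integral of $x^{-a}$ over $[1, m]$), is an assumption reconstructed from the damaged formula above, and $\ln m + \gamma$ is used at a=1 as stated in the text. The function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329


def harmonic(m, a):
    """Exact generalized harmonic number H_{m,a} = sum_{j=1..m} j^(-a)."""
    return sum(j ** -a for j in range(1, m + 1))


def harmonic_approx(m, a):
    """Assumed closed form (m^(1-a) - a) / (1 - a); ln(m) + gamma when a == 1."""
    if abs(a - 1.0) < 1e-12:
        return math.log(m) + EULER_GAMMA
    return (m ** (1.0 - a) - a) / (1.0 - a)


for a in (0.0, 0.5, 1.0, 1.5, 2.0):
    exact, approx = harmonic(28, a), harmonic_approx(28, a)
    print(f"a = {a:.1f}  exact = {exact:.4f}  approx = {approx:.4f}")
```

Under this assumption the closed form is exact at a=0 and overestimates the sum for other a ≠ 1, since the integral of a decreasing function bounds the tail of the sum from above.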

[Figure: $H_{m,a}$ for m=28; $H_{m,0} = m$ and $H_{m,1} = \ln(m) + \gamma$]

Appendix 5: Deriving Jelenkovic a>1 Asymptotics from Fagin equations

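The LRU miss rate equation

$$MR[D] = \frac{N^{1-a}}{a\,H_{N,a}}\,E_{1/a}\left(E^{-1}_{1+1/a}\left(a\left(1 - \frac{D}{N}\right)\right)\right)$$

means that, if D is small, D/N will tend to 0 when N increases, and consequently the inverse function of $E_{1+1/a}$ will take its argument in the vicinity of 'a' or, reciprocally, $E^{-1}_{1+1/a}$ is in the vicinity of 0. ExpInt has the following series expansion (http://dlmf.nist.gov/8.19.10): $E_p(x) = \Gamma(1-p)\,x^{p-1} + \frac{1}{p-1} - \frac{x}{p-2} + \frac{x^2}{2!\,(p-3)} - \ldots$

Obviously when a>1, the exponent p-1=1/a is less than 1 and $E_{1+1/a}(x)$ can be approximated by the first two terms of the series expansion in the vicinity of 0. Hence $E_{1+1/a}(x) \sim a + \Gamma\left(-\frac{1}{a}\right)x^{1/a}$, implying: $E^{-1}_{1+1/a}(x) \sim \left(\frac{x - a}{\Gamma(-1/a)}\right)^{a}$. And $E_{1/a}(x) = -\frac{d}{dx}E_{1+1/a}(x) = -\frac{1}{a}\,\Gamma\left(-\frac{1}{a}\right)x^{1/a - 1}$. Thus: $E_{1/a}\left(E^{-1}_{1+1/a}(x)\right) \sim -\frac{1}{a}\,\Gamma\left(-\frac{1}{a}\right)\left(\frac{x - a}{\Gamma(-1/a)}\right)^{1-a}$. With the relation $\Gamma(x+1) = x\,\Gamma(x)$, i.e. $\Gamma\left(1 - \frac{1}{a}\right) = -\frac{1}{a}\,\Gamma\left(-\frac{1}{a}\right)$ (http://dlmf.nist.gov/5.2.5):

$$E_{1/a}\left(E^{-1}_{1+1/a}(x)\right) \sim \Gamma\left(1 - \frac{1}{a}\right)\left(\frac{a - x}{a\,\Gamma(1 - 1/a)}\right)^{1-a} = \left[\Gamma\left(1 - \frac{1}{a}\right)\right]^{a}\left(\frac{a - x}{a}\right)^{1-a}$$

Finally, with $E_{1/a}\left(E^{-1}_{1+1/a}\left(a\left(1 - \frac{D}{N}\right)\right)\right) = \left(\frac{D}{N}\right)^{1-a}\left[\Gamma\left(1 - \frac{1}{a}\right)\right]^{a}$, the MR formula gives:

$$MR[D] = \frac{N^{1-a}}{a\,H_{N,a}}\left(\frac{D}{N}\right)^{1-a}\left[\Gamma\left(1 - \frac{1}{a}\right)\right]^{a} = \frac{D^{1-a}}{a\,H_{N,a}}\left[\Gamma\left(1 - \frac{1}{a}\right)\right]^{a}$$

And asymptotically, when $N \to \infty$:

$$MR(D) \to \frac{D^{1-a}}{a\,\zeta(a)}\left[\Gamma\left(1 - \frac{1}{a}\right)\right]^{a} = MR_{static}[D] \cdot \left(1 - \frac{1}{a}\right)\left[\Gamma\left(1 - \frac{1}{a}\right)\right]^{a}$$

which is the well-known Jelenkovic relation for a>1 (Jelenkovic 1999).

Appendix 6: Slope of WS(D) for power-law popularities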

We observe the form of the slope at the origin of the following WS(D) curves for different values of the popularity parameter: a=0, a=1 and a=2, having in mind that both axes are in logarithmic scale. Clearly the a=0 and a=1 curves have the same slope (i.e., the identity) at the origin, whereas for a=2 the slope is ½. Traces are in the range of 100M ($10^8$) and generated under the IRM hypothesis. Note that for both a=0 and a=1 the state space is 256K, whereas it is restricted to 21400 for a=2, even with a trace length of 1G (extending to a 256K space would likely have required a trace longer than 100G).

[Figure: IRM WS for a=2, a=1, a=0]

We are interested in the slope in the vicinity of 0 of the WS(D) log-log representation, i.e., Y=ln WS versus X=ln D. The following computation is done using the natural logarithm; however, it is clearly equivalent to the decimal logarithm regarding the slope at the origin. Under IRM, $WS(D) = N\left(1 - \frac{1}{a}\,E_{1+1/a}\left(\frac{D}{N^a H_{N,a}}\right)\right)$, so $Y = \ln WS = \ln N + \ln\left(1 - \frac{1}{a}\,E_{1+1/a}\left(\frac{e^X}{N^a H_{N,a}}\right)\right)$. Hence, setting $Z = \frac{e^X}{N^a H_{N,a}}$,

$$Y'(Z) = \frac{Z\,E_{1/a}(Z)}{a - E_{1+1/a}(Z)}.$$

When $D \to 0^+$, $X = \ln D \to -\infty$ and $Z \to 0^+$ for N fixed. We show that when $Z \to 0^+$: $Y'(Z) \to 1$ when $0 < a \le 1$, and $1/a$ when a>1. In order to do so, we use the series expansion of ExpInt in the vicinity of 0. For p non-integer, $E_p(x) = \Gamma(1-p)\,x^{p-1} + \frac{1}{p-1} - \frac{x}{p-2} + \frac{x^2}{2!\,(p-3)} - \ldots$ Hence, for p=1/a non-integer, $E_{1/a}(x) = \frac{a}{1-a} + \Gamma\left(1 - \frac{1}{a}\right)x^{1/a - 1} + O(x)$ and, with $E_{1+1/a}(x) = a + \Gamma\left(-\frac{1}{a}\right)x^{1/a} - \frac{x}{1/a - 1} + o(x)$, we finally obtain:

$$Y'(Z) = \frac{\frac{a}{1-a} + \Gamma\left(1 - \frac{1}{a}\right)Z^{1/a - 1} + o(1)}{\frac{a}{1-a} - \Gamma\left(-\frac{1}{a}\right)Z^{1/a - 1} + o(1)}.$$

When $Z \to 0^+$, this expression tends to 1 for 1/a-1>0, i.e. for a<1, and for a>1 its limit is $\frac{\Gamma(1 - 1/a)}{-\Gamma(-1/a)} = \frac{1}{a}$.

In case 1/a=n is an integer, the previous series is not defined and we use the other series expansion (see previous Appendix), $E_n(Z) = \frac{(-Z)^{n-1}}{(n-1)!}\left(\psi(n) - \ln Z\right) - \sum_{k \ge 0,\,k \ne n-1} \frac{(-Z)^k}{k!\,(1-n+k)}$, which gives (for n=1,2,3..) $Y'(Z) = \frac{Z\,E_n(Z)}{\frac{1}{n} - E_{n+1}(Z)}$. Limits when $Z \to 0^+$ are the following. For n=1, $Y'(Z) = \frac{Z\,(\psi(1) - \ln Z) + o(Z)}{Z\,(\psi(2) - \ln Z) + o(Z)} \to 1$. For n>=2, $Y'(Z) = \frac{\frac{Z}{n-1} + o(Z)}{\frac{Z}{n-1} + o(Z)} \to 1$. This concludes the proof that, if $a \le 1$, the limit of the slope at the origin of WS in a log-log representation is always 1.

Hence the result for any value of a>0: the slope is 1 when $0 < a \le 1$, and 1/a when a>1. Note that, for a=0, the limit of the slope at the origin is also 1 since $WS(D) = N\left(1 - e^{-D/N}\right)$, so $Y = \ln WS = \ln N + \ln\left(1 - e^{-D/N}\right)$. With X=ln D, $Y'(X) = D\,\frac{dY}{dD} = \frac{(D/N)\,e^{-D/N}}{1 - e^{-D/N}}$, and, setting $Z = D/N$, then: $Y'(Z) = \frac{Z\,e^{-Z}}{1 - e^{-Z}} \to 1$ when $Z \to 0$.
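The two limiting slopes can be checked numerically in the cases where closed forms are available in the standard library. The sketch below assumes the slope has the form $Y'(Z) = Z\,E_{1/a}(Z) / (a - E_{1+1/a}(Z))$ and, for a=2, uses the identity $E_{1/2}(z) = \sqrt{\pi/z}\,\mathrm{erfc}(\sqrt{z})$ together with the recurrence $pE_{p+1}(z) = e^{-z} - zE_p(z)$. The function names are illustrative.

```python
import math


def slope_uniform(z):
    """a = 0: Y'(Z) = Z * exp(-Z) / (1 - exp(-Z)), with Z = D/N."""
    return z * math.exp(-z) / -math.expm1(-z)


def slope_a2(z):
    """a = 2 (p = 1/2): Y'(Z) = Z * E_{1/2}(Z) / (2 - E_{3/2}(Z))."""
    e_half = math.sqrt(math.pi / z) * math.erfc(math.sqrt(z))  # E_{1/2}(z)
    # E_{3/2}(z) = 2 * (exp(-z) - z * E_{1/2}(z)), so
    # 2 - E_{3/2}(z) = 2 * ((1 - exp(-z)) + z * E_{1/2}(z)), written without cancellation.
    denominator = 2.0 * (-math.expm1(-z) + z * e_half)
    return z * e_half / denominator


for z in (1e-2, 1e-4, 1e-6, 1e-10):
    print(f"Z = {z:.0e}  slope(a=0) = {slope_uniform(z):.6f}  slope(a=2) = {slope_a2(z):.6f}")
```

The a=2 slope approaches 1/2 only like $\sqrt{Z}$, which is consistent with the observation above that very long traces are needed to see the limiting slope.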

Appendix 7: Property P5: limits of $F_a(y)$ LRU/Static ratio at origin

We use the series expansions for 1/a non-integer, $E_{1/a}(y) = \frac{a}{1-a} + \Gamma\left(1 - \frac{1}{a}\right)y^{1/a - 1} + O(y)$ and $E_{1+1/a}(y) = a + \Gamma\left(-\frac{1}{a}\right)y^{1/a} - \frac{y}{1/a - 1} + o(y)$. Hence

$$F_a(y) = \frac{\left(\frac{1}{a} - 1\right)E_{1/a}(y)}{1 - \left(1 - \frac{1}{a}\,E_{1+1/a}(y)\right)^{1-a}} = \frac{1 + \left(\frac{1}{a} - 1\right)\Gamma\left(1 - \frac{1}{a}\right)y^{1/a - 1} + O(y)}{1 - \left(\Gamma\left(1 - \frac{1}{a}\right)y^{1/a} + \frac{y}{1-a} + o(y)\right)^{1-a}}$$

When $y \to 0$, the limit of the ratio is the ratio of the lowest-order coefficients. Thus, if 1/a-1>0 (i.e. a<1), $\lim_{y \to 0^+} F_a(y) = 1$. On the opposite, when 1/a-1<0 (a>1), $\lim_{y \to 0^+} F_a(y) = \frac{\left(\frac{1}{a} - 1\right)\Gamma\left(1 - \frac{1}{a}\right)}{-\left[\Gamma\left(1 - \frac{1}{a}\right)\right]^{1-a}} = \left(1 - \frac{1}{a}\right)\left[\Gamma\left(1 - \frac{1}{a}\right)\right]^{a}$. We found a similar result, $\lim_{y \to 0^+} F_a(y) = 1$, when 1/a is an integer (hence $a \le 1$), using similar reasoning as in the previous Appendix (Appendix 6).

Appendix 8: Property P6: $\forall a>0,\ \forall y>0,\ F_a(y) > 1$

This is equivalent to showing that: for p>1 ($0 < a < 1$), $\left(1 - (p-1)E_p(y)\right)^{p} < \left(1 - pE_{p+1}(y)\right)^{p-1}$, and the reverse inequality when $0 < p < 1$ (a>1). The relation can be checked by analyzing the function $f_p(y) = \frac{\left(1 - (p-1)E_p(y)\right)^{p}}{\left(1 - pE_{p+1}(y)\right)^{p-1}}$. One can prove that, on one hand, $\lim_{y \to \infty} f_p(y) = 1$ whatever p (this is direct from the $E_p$ limit), and on the other hand, $\lim_{y \to 0} f_p(y) = 0$ if p>1 and $\lim_{y \to 0} f_p(y) = (1-p)^{p}\,\Gamma(1-p) > 1$ if $0 < p < 1$. […] the derivative of $f_p$ is always positive for p>1 (i.e., $f_p$ is increasing from 0 to 1) or always negative for $0 < p < 1$. […] (a relation stated for y>0 and p positive integer, but which can be readily extended to real positive p (Chiccoli, 1990)) twice, the second factor is equal to the product […]. The first factor of the product is positive, from the left-hand side of the inequality $\frac{p-1}{p}\,E_p(y) < E_{p+1}(y) < E_p(y)$ (http://dlmf.nist.gov/8.19.E19) for y>0 and p real positive (Chiccoli, 1990). The second factor is obviously positive as well. Consequently the derivative of $f_p$ has the sign of (p-1), which completes the proof.

An interesting by-product of this inequality is the following. For p=2, one gets: $\left(1 - E_2(y)\right)^2 < 1 - 2E_3(y)$, or: $E_2(y)^2 < 2\left(E_2(y) - E_3(y)\right)$. Using $2E_3(y) = e^{-y} - yE_2(y)$, it results that: $(2 + y)\,E_2(y) - E_2(y)^2 > e^{-y}$. This inequality is stronger than the well-known $(2 + y)\,E_2(y) > e^{-y}$. Another interesting property is: […]. Both sides directly result from the previous relations.

Appendix 9: Property P7: $F_\infty(y) > F_1(y),\ \forall y>0$

We first show that $\frac{E_0(y)}{e^{E_1(y)} - 1} > \frac{E_0(y)\,(1 - E_2(y))}{E_1(y)} > 1,\ \forall y>0$. From $\ln(1 + E_0(y)) - E_1(y) > 0$ it stands: $\frac{E_0(y)}{e^{E_1(y)} - 1} > 1$. Also from the case a=1, one has: $E_0(y) > \frac{E_1(y)}{1 - E_2(y)}$. We consider the function $f(y) = \ln\left(1 + \frac{E_1(y)}{1 - E_2(y)}\right) - E_1(y)$. The first inequality holds iff f(y)>0, $\forall y>0$. One can see that f(y) tends to 0 when $y \to \infty$ and to a positive number when $y \to 0$ (using the $E_1$ and $E_2$ series expansions at y=0 and the series expansion $\ln(1+X) = -\ln(1/X) + 1/X + o(1/X)$ in the vicinity of $X = +\infty$), and its derivative has the sign of $E_0(y)\,(1 - E_2(y))\,(E_1(y) - E_2(y)) - E_1(y)^2$. In turn this expression can be proven always negative for y real positive, since it tends to -1 when $y \to 0$, tends to 0 when $y \to \infty$, and its derivative can be brought down to […], which is always positive (each term is positive).

Property P7: $F_\infty(y) = \frac{E_0(y)}{e^{E_1(y)} - 1} > \frac{-E_1(y)}{\ln(1 - E_2(y))} = F_1(y),\ \forall y>0$. This stems from the a=1 analysis. When y increases from 0, $F_1$ increases from 1 to a maximum where y is the solution $y_0$ of $F_1(y) = \frac{E_0(y)\,(1 - E_2(y))}{E_1(y)}$. From the previous property, it holds that $F_\infty(y) > \frac{E_0(y)\,(1 - E_2(y))}{E_1(y)}$, hence $F_\infty(y_0) > F_1(y_0)$. Obviously $F_\infty(y) > F_1(y)$ for $y < y_0$ […] and, for $y > y_0$, $F_\infty(y) > \frac{E_0(y)\,(1 - E_2(y))}{E_1(y)} > F_1(y)$.