The Study of Dynamic Caching via State Transition Field -- the Case of Time-Invariant Popularity

Abstract

This two-part paper investigates cache replacement schemes with the objective of developing a general model to unify the analysis of various replacement schemes and illustrate their features. To achieve this goal, we study the dynamic process of caching in the vector space and introduce the concept of state transition field (STF) to model and characterize replacement schemes. In the first part of this work, we consider the case of time-invariant content popularity based on the independent reference model (IRM). In this case, we demonstrate that the resulting STFs are static, and each replacement scheme leads to a unique STF. The STF determines the expected trace of the dynamic change in the cache state distribution, as a result of content requests and replacements, from any initial point. Moreover, given the replacement scheme, the STF is only determined by the content popularity. Using four example schemes including random replacement (RR) and least recently used (LRU), we show that the STF can be used to analyze replacement schemes, e.g., finding their steady states, highlighting their differences, and revealing insights regarding the impact of knowledge of content popularity. Based on the above results, the STF is shown to be useful for characterizing and illustrating replacement schemes. Extensive numerical results are presented to demonstrate analytical STFs and STFs from simulations for the considered example replacement schemes.


arXiv [cs.OS]

Jie Gao, Member, IEEE, Lian Zhao, Senior Member, IEEE, and Xuemin (Sherman) Shen, Fellow, IEEE


Index Terms: cache replacement policy, probabilistic caching, cache state transition, IRM, online caching, mobile edge caching.

J. Gao and X. Shen are with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, N2L 3G1, Canada (e-mail: {jie.gao, sshen}@uwaterloo.ca). L. Zhao is with the Department of Electrical, Computer, and Biomedical Engineering, Ryerson University, Toronto, ON, M5B 2K3, Canada (e-mail: [email protected]).

I. INTRODUCTION

Caching has been attracting an increasing amount of attention in the research of wireless communications, especially in the context of mobile edge caching [1] and the joint study of communication, computation, and caching with the objective of deploying services close to mobile users [2], [3]. Research on caching performance in wireless communication systems may adopt various metrics. The focus can be decreasing the content delivery latency [4], alleviating congestion over the backhaul [5], reducing energy consumption [6], or a combination of the above [7]. While the metrics differ, the underlying caching performance is largely centered around one measurement, i.e., the cache hit ratio. Since a cache can only accommodate a limited portion of all contents, the cache hit ratio is determined by how the cached contents are selected and how they are updated.

Selecting the contents to be cached is relevant in the context of proactive caching. For example, an edge node can cache contents in advance during off-peak hours to reduce the peak-hour network traffic load [8]-[10]. The key to proactive caching is adapting to unknown content popularity or an unknown network environment, usually leading to a Markov decision problem [11] or a learning problem [12].

Updating the cached contents is relevant in the context of online caching. Specifically, a cached content may be evicted and replaced by a new content whenever a cache miss occurs, which leads to a dynamic process that updates the cached contents on the fly [13]. The guiding rule in updating the contents is referred to as a cache replacement scheme. Evidently, the cache replacement scheme has a significant impact on the performance of caching. In fact, even if the contents are cached proactively, cache replacement can still play an important role in updating the cached contents while requests are being received.

Due to the importance of cache replacement schemes, related topics have been extensively studied in various scenarios [14].
Classic replacement schemes include first in first out (FIFO), least recently used (LRU), least frequently used (LFU), random replacement (RR), and their variants. Some early works adopted simple probabilistic models with primitive assumptions on the request distribution [15] or focused on bounding the performance of the aforementioned schemes [16]. More recent works adopted Markov chains to model and analyze cache replacement schemes [17]-[20]. This class of studies generally focused on deriving the steady states of the aforementioned schemes and the mixing times of their underlying Markov chains [21], [22]. In our previous work [23], we considered the problem in reverse and designed the Markov chain underlying the replacement scheme so that a target set of content caching probabilities can be achieved.

Most recent works in the communications field tended to evaluate existing replacement schemes in their considered network scenarios or to propose new schemes that suit their specific objectives. Chang et al. studied the joint problem of cache replacement and bandwidth allocation in peer-assisted video-on-demand systems and compared different cache replacement schemes through simulations [24]. Fiore et al. developed a replacement scheme for boosting content diversity in a wireless ad hoc network based on the estimated content presence at peer nodes [25]. A least fresh first scheme was designed in [26] to maintain the freshness of cached data in Internet of Things scenarios based on named data networking. Two replacement schemes were proposed for the video-on-demand service in femtocells [27], the first of which exploits content access history to improve the cache hit ratio, while the second exploits information on user access delay to promote service fairness. Kamiyama et al. proposed a replacement scheme for content delivery networks based on the hop count from end users to the content server, with the objective of reducing the network traffic load [28]. Chattopadhyay et al.
investigated content replacement based on the knowledge of cached contents at neighboring base stations for a cellular network with densely deployed base stations [29]. A similar scenario was studied in [30], in which the authors proposed replacement schemes that implicitly coordinate the contents at caches over the network to maximize the overall hit ratio of the considered system.

While there has been abundant research on the topic of cache replacement, a model that can conveniently unify the analysis of different replacement schemes, characterize their features, and intuitively illustrate their differences is not yet available. The objective of this two-part paper is to develop such a model. Specifically, we have three targets. First, we aim to integrate cache replacement schemes under a unified general probabilistic cache replacement model and demonstrate this using several specific schemes as examples. Second, we aim to study the general cache replacement model from a novel perspective, the state transition field (STF), which characterizes replacement schemes in the vector space and captures insights into their features. Third, we strive to develop the model and methodology for studying cache replacement using the STF.

The first part of this work focuses on the case in which the content popularity is time-invariant, while the second part investigates the scenario of time-varying content popularity [31]. Through the two parts of this paper, we demonstrate that a replacement scheme corresponds to a unique state transition matrix, which in turn generates a unique STF, and that the resulting STF, jointly with the content request statistics, determines the performance of the replacement scheme.
Furthermore, although such an extension is not directly included, we provide the motivation, basic model, and methodology for studying the problem in reverse: given a performance target, can a replacement scheme be designed by determining its state transition matrix, which is in turn generated by creating the STF according to the performance target and the content request statistics?

The contributions of the first part are as follows.

First, we propose a general content replacement model based on probabilistic state transitions as a unified model for cache replacement schemes. Unlike existing general models based on Markov chains, e.g., [21], we focus not only on the steady states but more on the dynamic change of the cache state distribution, and we describe this dynamic change in the vector space of state caching probabilities. Moreover, we introduce new ideas and results, such as the decomposition of state transition probability matrices based on contents and the mapping between state and content caching probabilities, to form a complete toolset for establishing our model.

Second, based on the aforementioned model, we introduce the STF, which is a vector field defined over the state transition domain. We demonstrate that the STF can characterize and illustrate cache replacement schemes. The STF determines the expected change of the dynamic cache state distribution, just as an electromagnetic field determines the movement of a charged particle placed in it (although the STF can have more than three dimensions). Moreover, we show that the steady state of a replacement scheme can be conveniently found based on the STF.

Third, we analyze the STF using four example replacement schemes of three types as case studies: RR, replace less popular (LP) and replace the least popular (TLP), and LRU. RR exploits no knowledge of content popularity, LP and TLP exploit perfect knowledge based on an assumption of perfect prediction, and LRU exploits imperfect knowledge from historical requests.
We compare their STFs and analyze the impact of the exploited knowledge on their steady states through the STF. Moreover, we conduct extensive simulations to generate the STFs of the above example schemes to demonstrate the impact of replacement schemes and content popularity on the STF.

Fig. 1: Illustration of the timeline model, from the initial point through the $n$th and $(n+1)$th request instants and replacement points. $D^{(n)}$ represents the duration between the $n$th and the $(n+1)$th replacement points.

II. SYSTEM MODEL

We consider a scenario with $N_c$ contents and a cache of size $L$. The set of all contents is denoted by $\mathcal{C}$. Without loss of generality, we assume that all contents are of identical and unit size. We do not target a specific scenario, as the model can be applied to a cache located at a small-cell base station in a cellular network, a road side unit (RSU) in a vehicular network, or even a user device for D2D caching.

A. Content Request and Replacement

The fundamental assumption in the first part of this paper is that the requested contents at all instants, as integer random variables, are independent and identically distributed. This follows from the widely used independent reference model (IRM), a simplification of the actual request process that can be accurate with a large number of requesting users [21] or within a short time frame [32]. As the distribution of the requested content is time-invariant, the probability of content $l \in \mathcal{C}$ being requested can be denoted by $\upsilon_l$. The probabilities $\{\upsilon_l\}_{\forall l}$ are organized into the request probability vector $\boldsymbol{\upsilon}$, which is referred to as the content popularity.

If content $l$ is requested but not cached, it will be downloaded and, depending on the replacement scheme, may replace one cached content. It is assumed that the download and replacement can be completed before the next content request arrives at the cache.

The timeline of the considered dynamic caching is illustrated in Fig. 1. For simplicity of notation, we place a replacement point after each request regardless of whether a replacement actually happens. If a replacement occurs following the $n$th request, it is completed by the $n$th replacement point.

Fig. 2: An illustration of states with $N_c = 5$ and $L = 2$. Each circle represents a state. The number above a circle represents the state ID, and the set inside a circle represents the set of cached contents in that state. For example, state 7 caches contents 2 and 5.

B. Cache State

The cache state is introduced to describe the combination of cached contents. There are $N_s = \binom{N_c}{L}$ different possible combinations of cached contents, corresponding to $N_s$ cache states. The set of all cache states is denoted by $\mathcal{S}$. The set of contents cached in state $k$ is denoted by $\mathcal{C}_k$. The cache state vector for state $k$ is defined as an $N_c \times 1$ vector with elements determined as follows:

$$ s_k(l) = \begin{cases} 1, & \text{if } l \in \mathcal{C}_k, \\ 0, & \text{if } l \notin \mathcal{C}_k, \end{cases} \quad \forall l \in \mathcal{C},\ \forall k \in \mathcal{S}, \tag{1} $$

where the $l$th element of vector $\mathbf{s}_k$ corresponds to the $l$th content. An example with $N_c = 5$ and $L = 2$ is illustrated in Fig. 2. In this example, there are $\binom{5}{2} = 10$ states. Each circle in the figure represents a state, while the number above the circle represents the state ID. The set given in the circle of state $k$ is the set of cached contents in state $k$, i.e., $\mathcal{C}_k$, and the vector beneath state $k$ is $\mathbf{s}_k$. For example, state 7 caches contents $\{2, 5\}$ and is represented by the cache state vector $\mathbf{s}_7 = [0\ 1\ 0\ 0\ 1]^T$, given beneath the circle of state 7 in Fig. 2, where $(\cdot)^T$ denotes transpose.

A state is a neighbor of state $k$ if its cached contents differ from those cached in state $k$ by just one element. The set of neighbors of state $k$ is denoted by $\mathcal{H}_k$. For any content $l \notin \mathcal{C}_k$, a content-$l$ neighbor of state $k$ is a neighboring state that caches $l$. The set of content-$l$ neighboring states of state $k$ is denoted by $\mathcal{H}_{k,l}$. Using Fig. 2 and state 8 as an example, $\mathcal{H}_8$ is the set of all colored states, and, for one content $l \notin \mathcal{C}_8$, $\mathcal{H}_{8,l}$ is the set of the two states with deep color.

C. State and Content Caching Probabilities

The cached contents and the cache state remain constant in the durations between consecutive replacement points (shown in Fig. 1). The state caching probability (SCP) for state $k$ and the $n$th duration, denoted by $\eta^{(n)}_k$, is the probability that the cache is in state $k$ in the $n$th duration. The content caching probability (CCP) for content $l$ and the $n$th duration, denoted by $\lambda^{(n)}_l$, is the probability that content $l$ is cached in the $n$th duration.

Define the SCP vector $\boldsymbol{\eta}^{(n)}$ and the CCP vector $\boldsymbol{\lambda}^{(n)}$ such that $\boldsymbol{\eta}^{(n)}(k) = \eta^{(n)}_k$ and $\boldsymbol{\lambda}^{(n)}(l) = \lambda^{(n)}_l$. Evidently, $\mathbf{1}^T \boldsymbol{\eta}^{(n)} = 1$ and $\mathbf{1}^T \boldsymbol{\lambda}^{(n)} = L$, where $\mathbf{1}$ is an all-one vector. Based on the timeline in Fig. 1, the SCP and CCP vectors at the instant of the $(n+1)$th request are $\boldsymbol{\eta}^{(n)}$ and $\boldsymbol{\lambda}^{(n)}$, respectively.

The SCP and the CCP are connected through cache states. Using Fig. 2 as an example, the probability that content 5 is cached is equal to the sum of the probabilities of the states in the dotted box. Define the cache state matrix $\mathbf{C}_s = [\mathbf{s}_1, \ldots, \mathbf{s}_{N_s}]$. In general, the relation between the SCP $\boldsymbol{\eta}^{(n)}$ and the CCP $\boldsymbol{\lambda}^{(n)}$ is given by:

$$ \boldsymbol{\lambda}^{(n)} = \mathbf{C}_s \boldsymbol{\eta}^{(n)}. \tag{2} $$

D. Cache Hit Probability
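A minimal numerical sketch of eqs. (1) and (2), together with the hit probability of eq. (3) given next, may help fix the notation. It assumes NumPy, indexes contents and states from 0 with states in lexicographic order (which need not match the state IDs in Fig. 2), and uses a hypothetical popularity vector and a uniform SCP chosen only for illustration.

```python
import itertools

import numpy as np


def enumerate_states(n_contents, cache_size):
    """Every cache state as a sorted tuple of cached content indices (0-based)."""
    return list(itertools.combinations(range(n_contents), cache_size))


def cache_state_matrix(states, n_contents):
    """C_s = [s_1, ..., s_Ns]: column k is the 0/1 cache state vector s_k of eq. (1)."""
    c_s = np.zeros((n_contents, len(states)))
    for k, cached in enumerate(states):
        c_s[list(cached), k] = 1.0
    return c_s


N_c, L = 5, 2
states = enumerate_states(N_c, L)          # N_s = binom(5, 2) = 10 states
C_s = cache_state_matrix(states, N_c)

# Hypothetical inputs, for illustration only.
upsilon = np.array([0.4, 0.25, 0.15, 0.12, 0.08])   # content popularity, sums to 1
eta = np.full(len(states), 1.0 / len(states))       # uniform SCP vector

lam = C_s @ eta          # CCP vector via eq. (2); its entries sum to L
gamma = upsilon @ lam    # instantaneous hit probability via eq. (3)
print(len(states), lam, gamma)
```

With a uniform SCP, each content appears in 4 of the 10 states, so every entry of the CCP vector equals 0.4 and the hit probability is $0.4 \sum_l \upsilon_l = 0.4$.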

Given that content $l$ is requested at the $(n+1)$th request, the conditional instantaneous cache hit probability is $\lambda^{(n)}_l$. The instantaneous cache hit probability at the $(n+1)$th request, denoted by $\gamma^{(n+1)}$, is given by:

$$ \gamma^{(n+1)} = \boldsymbol{\upsilon}^T \boldsymbol{\lambda}^{(n)}. \tag{3} $$

The symbols used in this paper are listed in Table I. Throughout the paper, we use lower-case bold letters for vectors, upper-case bold letters for matrices, and calligraphic letters for sets. The superscript $(\cdot)^{(n)}$ is used on letters related to the $n$th request or replacement. Greek letters are used to represent various probabilities. The indexes $m$ and $k$ are used to denote cache states, while the indexes $l$ and $q$ are used to denote contents.

TABLE I: List of Symbols
$N_c$: the number of all contents
$N_s$: the number of all cache states
$L$: the cache size limit
$\mathcal{C}$: the set of all contents, i.e., $\{1, \ldots, N_c\}$
$\mathcal{S}$: the set of all cache states, i.e., $\{1, \ldots, N_s\}$
$\mathcal{S}_l$: the set of all cache states that cache content $l$
$\mathbf{s}_k$: the $k$th cache state vector
$\mathcal{C}_k$: the set of contents cached in state $k$
$\mathbf{C}_s$: the cache state matrix, i.e., $[\mathbf{s}_1, \ldots, \mathbf{s}_{N_s}]$
$\mathcal{H}_k$: the set of all neighbors of state $k$
$\mathcal{H}_{k,l}$: the set of all content-$l$ neighbors of state $k$
$e(k,m)$: the unique element in the set $\mathcal{C}_k - \mathcal{C}_m$, where $m \in \mathcal{H}_k$
$\upsilon_l$: the request probability of content $l$
$\boldsymbol{\upsilon}$: the content request probability vector, i.e., $[\upsilon_1, \ldots, \upsilon_{N_c}]^T$
$\phi_{l,q,k}$: the conditional probability that content $l$ replaces content $q$ given that the cache is in state $k$ and content $l$ is requested
$\boldsymbol{\Theta}$: the state transition probability matrix
$\boldsymbol{\Theta}_l$: the conditional state transition probability matrix given that content $l$ is requested
$\boldsymbol{\Theta}(m,k)$: the probability of transitioning from state $k$ to state $m$
$\boldsymbol{\Theta}_l(m,k)$: the probability of transitioning from state $k$ to state $m$ given that content $l$ is requested
$\eta^{(n)}_k$: the SCP for state $k$ in the duration from the $n$th to the $(n+1)$th replacement
$\boldsymbol{\eta}^{(n)}$: the SCP vector in the duration from the $n$th to the $(n+1)$th replacement, i.e., $[\eta^{(n)}_1, \ldots, \eta^{(n)}_{N_s}]^T$
$\lambda^{(n)}_l$: the CCP for content $l$ in the duration from the $n$th to the $(n+1)$th replacement
$\boldsymbol{\lambda}^{(n)}$: the CCP vector in the duration from the $n$th to the $(n+1)$th replacement, i.e., $[\lambda^{(n)}_1, \ldots, \lambda^{(n)}_{N_c}]^T$
$\gamma^{(n)}$: the instantaneous cache hit probability at the $n$th request
$\mathbf{u}(\boldsymbol{\eta})$: the state transition field at $\boldsymbol{\eta}$
$\mathbf{u}_l(\boldsymbol{\eta})$: the content-$l$ state transition field at $\boldsymbol{\eta}$
$u_m(\boldsymbol{\eta})$: the $m$th element of the state transition field at $\boldsymbol{\eta}$
$u_{m,l}(\boldsymbol{\eta})$: the $m$th element of the content-$l$ state transition field at $\boldsymbol{\eta}$

III. GENERAL CONTENT REPLACEMENT MODEL AND STATE TRANSITION FIELD

If the cache is in state $k$ while content $l \notin \mathcal{C}_k$ is requested, the cache downloads content $l$ and decides whether to replace a cached content with content $l$. In the general model, the probability of replacing content $q$ with content $l$ when the cache is in state $k$ is denoted by $\phi_{l,q,k}$, for any $q \in \mathcal{C}_k$ and $l \notin \mathcal{C}_k$. For each state, there are $L(N_c - L)$ possible replacements.

A. General Cache State Transition Model

A content replacement triggers a cache state transition. For neighboring states $k$ and $m$ that satisfy $m \in \mathcal{H}_{k,l}$ and $k \in \mathcal{H}_{m,q}$, replacing content $q$ with $l$ triggers a transition from state $k$ to state $m$. The conditional cache state transition probabilities given that content $l$ is requested can be organized into the following matrix $\boldsymbol{\Theta}_l$:

$$ \boldsymbol{\Theta}_l(m,k) = \begin{cases} 1, & \text{if } k = m \text{ and } l \in \mathcal{C}_k, \\ 1 - \sum_{m' \in \mathcal{H}_{k,l}} \phi_{l,e(k,m'),k}, & \text{if } k = m \text{ and } l \notin \mathcal{C}_k, \\ \phi_{l,e(k,m),k}, & \text{if } m \in \mathcal{H}_{k,l}, \\ 0, & \text{otherwise}, \end{cases} \tag{4} $$

where $e(k,m)$ denotes the unique content that is cached in state $k$ but not in state $m$, given that $k \in \mathcal{H}_m$. Accordingly, the overall cache state transition probability matrix in the general case is given by:

$$ \boldsymbol{\Theta} = \sum_{l \in \mathcal{C}} \upsilon_l \boldsymbol{\Theta}_l. \tag{5} $$

From the definitions of the SCP vector $\boldsymbol{\eta}^{(n)}$ and the state transition probability matrix $\boldsymbol{\Theta}$, it can be seen that:

$$ \boldsymbol{\eta}^{(n)} = \boldsymbol{\Theta} \boldsymbol{\eta}^{(n-1)}. \tag{6} $$

It is worth mentioning that the model can be extended to the scenario in which each content request (and replacement) involves multiple contents. In such a case, assuming that each request is for a block of $B$ contents ($B < L$), there are $N_B = \binom{N_c}{B}$ different blocks. Then, eq. (5) can be extended as follows:

$$ \boldsymbol{\Theta}^B = \sum_{b=1}^{N_B} \upsilon^B_b \boldsymbol{\Theta}^B_b, \tag{7} $$

where $\upsilon^B_b$ is the probability that the $b$th block is requested, and $\boldsymbol{\Theta}^B_b$ is the matrix of conditional cache state transition probabilities given that block $b$ is requested. The size of $\boldsymbol{\Theta}^B_b$ remains $N_s \times N_s$. However, for any given state, e.g., state $k$, the set of its neighbors $\mathcal{H}_k$ will contain more states under block replacement, and the set of its content-$l$ neighbors $\mathcal{H}_{k,l}$ will be replaced by a set of block-$b$ neighbors. The extension is straightforward and the details are omitted here.

B. STF
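The general model of eqs. (4)-(6) can be made concrete with a short sketch. It is a hypothetical implementation, assuming NumPy and 0-based indices, that takes the replacement probability $\phi_{l,q,k}$ as a user-supplied function `phi(l, q, k)`; the constant value 0.3 used below is only an illustrative choice (any choice must keep $\sum_{q} \phi_{l,q,k} \le 1$).

```python
import itertools

import numpy as np


def build_theta_l(states, l, phi):
    """Conditional transition matrix Theta_l of eq. (4).

    Column k is the next-state distribution given the current state k and a
    request for content l; phi(l, q, k) is the probability that l replaces q.
    """
    index = {frozenset(s): i for i, s in enumerate(states)}
    theta = np.zeros((len(states), len(states)))
    for k, cached in enumerate(states):
        if l in cached:
            theta[k, k] = 1.0                  # cache hit: the state is unchanged
            continue
        stay = 1.0
        for q in cached:                       # evicting q leads to a content-l neighbor m
            m = index[(frozenset(cached) - {q}) | {l}]
            theta[m, k] = phi(l, q, k)
            stay -= theta[m, k]
        theta[k, k] = stay                     # miss without replacement
    return theta


def build_theta(states, upsilon, phi):
    """Overall transition matrix Theta = sum_l upsilon_l Theta_l of eq. (5)."""
    return sum(u_l * build_theta_l(states, l, phi) for l, u_l in enumerate(upsilon))


N_c, L = 5, 2
states = list(itertools.combinations(range(N_c), L))
upsilon = np.array([0.4, 0.25, 0.15, 0.12, 0.08])   # hypothetical popularity
Theta = build_theta(states, upsilon, lambda l, q, k: 0.3)

eta0 = np.zeros(len(states))
eta0[0] = 1.0                # start in state (0, 1) with certainty
eta1 = Theta @ eta0          # one step of eq. (6)
print(Theta.sum(axis=0), eta1)
```

Every column of $\boldsymbol{\Theta}$ sums to one, so each step of eq. (6) keeps $\mathbf{1}^T \boldsymbol{\eta} = 1$. Here the probability of staying in state (0, 1) is $0.65 + 0.35 \times (1 - 2 \times 0.3) = 0.79$.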

Denote the general SCP without specifying any time instant by $\boldsymbol{\eta}$. Consider $\boldsymbol{\eta}$ as a point in the $N_s$-dimensional vector space. Driven by the requests and replacements, $\boldsymbol{\eta}$ varies in the following domain:

$$ \mathcal{D} = \left\{ (\eta_1, \ldots, \eta_{N_s}) \,\middle|\, 0 \le \eta_k \le 1, \forall k \in \mathcal{S}; \ \sum_k \eta_k = 1 \right\}. \tag{8} $$

The expected 'movement' of $\boldsymbol{\eta}$ in $\mathcal{D}$ after the $n$th replacement point, assuming a replacement actually happens, is characterized by $\boldsymbol{\eta}^{(n)} - \boldsymbol{\eta}^{(n-1)}$. This difference, in turn, is determined by three factors:
• the current position of $\boldsymbol{\eta}$ in $\mathcal{D}$, i.e., the value of $\boldsymbol{\eta}^{(n-1)}$;
• the content popularity $\boldsymbol{\upsilon}$;
• the state transition probability matrix $\boldsymbol{\Theta}$,

where $\boldsymbol{\Theta}$ is determined by the replacement scheme and generally depends on $\boldsymbol{\upsilon}$ (as shown in eq. (5)).

Define the STF at the point $\boldsymbol{\eta}^{(n-1)}$ using the aforementioned difference:

$$ \mathbf{u}(\boldsymbol{\eta}^{(n-1)}) = \boldsymbol{\eta}^{(n)} - \boldsymbol{\eta}^{(n-1)}. \tag{9} $$

Substituting eq. (6) into eq. (9), it follows that:

$$ \mathbf{u}(\boldsymbol{\eta}^{(n-1)}) = \boldsymbol{\Theta} \boldsymbol{\eta}^{(n-1)} - \boldsymbol{\eta}^{(n-1)}. \tag{10} $$

The STF is a vector field defined over the domain $\mathcal{D}$. Understanding the STF can provide insight into the design and performance analysis of replacement schemes. Similar to a magnetic or electric field, the STF can vary in direction and strength at different points in the domain (although the STF exists only mathematically, not physically).

In the definition in eq. (9), the $\boldsymbol{\eta}^{(n-1)}$ in the brackets specifies a point in the domain $\mathcal{D}$. If the STF is known at all points in $\mathcal{D}$, then a path can be identified from any initial point, as illustrated in Fig. 3. The end of the path gives the steady state of the replacement scheme, while the number of steps in the path reflects the time for the underlying Markov chain to attain its stationary state from that initial point. Different replacement schemes yield different STFs, and the impact is conveyed through $\boldsymbol{\Theta}$.
Therefore, the STF is a complete characterization of replacement schemes. It is worth noting that the STF does not change over time under the IRM in general, as $\boldsymbol{\upsilon}$ and $\boldsymbol{\Theta}$ are both constant.

Fig. 3: An illustration of the STF at four points, i.e., $\boldsymbol{\eta}^{(0)}$ to $\boldsymbol{\eta}^{(3)}$, with field vectors $\mathbf{u}(\boldsymbol{\eta}^{(0)})$ to $\mathbf{u}(\boldsymbol{\eta}^{(3)})$ in the domain $\mathcal{D}$. The end point $\boldsymbol{\eta}^\star$ represents the steady state, at which the STF diminishes to an all-zero vector.

C. Content-specific STF
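The path construction illustrated in Fig. 3 can be sketched by repeatedly stepping along the STF of eq. (10) until the field numerically vanishes. This sketch assumes NumPy, uses a constant replacement probability $\phi_{l,q,k} = \phi$ purely for illustration, and stops at a hypothetical tolerance `tol`; the number of steps is only a rough proxy for the convergence time discussed above.

```python
import itertools

import numpy as np


def constant_phi_theta(n_contents, cache_size, upsilon, phi):
    """Theta of eq. (5) when phi_{l,q,k} = phi for every l, q, k (illustrative choice)."""
    states = list(itertools.combinations(range(n_contents), cache_size))
    index = {frozenset(s): i for i, s in enumerate(states)}
    theta = np.zeros((len(states), len(states)))
    for k, cached in enumerate(states):
        for l, u_l in enumerate(upsilon):
            if l in cached:
                theta[k, k] += u_l                              # hit
                continue
            theta[k, k] += u_l * (1.0 - cache_size * phi)       # miss, no replacement
            for q in cached:                                     # miss, l replaces q
                theta[index[(frozenset(cached) - {q}) | {l}], k] += u_l * phi
    return states, theta


def stf(theta, eta):
    """STF of eq. (10): expected one-step displacement u(eta) = Theta eta - eta."""
    return theta @ eta - eta


def follow_field(theta, eta0, tol=1e-10, max_steps=100_000):
    """Walk along the STF from eta0 until the field is numerically zero."""
    eta, path = eta0.copy(), [eta0.copy()]
    for _ in range(max_steps):
        u = stf(theta, eta)
        if np.abs(u).sum() < tol:      # approximately at the steady state eta*
            break
        eta = eta + u
        path.append(eta.copy())
    return eta, path


upsilon = np.array([0.4, 0.25, 0.15, 0.12, 0.08])   # hypothetical popularity
states, Theta = constant_phi_theta(5, 2, upsilon, phi=0.3)
eta0 = np.zeros(len(states))
eta0[-1] = 1.0                                       # start in state (3, 4)
eta_star, path = follow_field(Theta, eta0)
print(f"{len(path) - 1} steps; eta* = {np.round(eta_star, 4)}")
```

Because each column of $\boldsymbol{\Theta}$ sums to one, the entries of the field sum to zero at every point of the path, so the walk never leaves the hyperplane $\sum_k \eta_k = 1$.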

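The STF can be decomposed. Define:

$$ \mathbf{u}_l(\boldsymbol{\eta}^{(n-1)}) = \boldsymbol{\Theta}_l \boldsymbol{\eta}^{(n-1)} - \boldsymbol{\eta}^{(n-1)}. \tag{11} $$

It follows that:

$$ \sum_{l \in \mathcal{C}} \upsilon_l \mathbf{u}_l(\boldsymbol{\eta}^{(n-1)}) = \sum_{l \in \mathcal{C}} \upsilon_l \boldsymbol{\Theta}_l \boldsymbol{\eta}^{(n-1)} - \boldsymbol{\eta}^{(n-1)} = \mathbf{u}(\boldsymbol{\eta}^{(n-1)}), \tag{12} $$

where the first equality uses $\sum_{l \in \mathcal{C}} \upsilon_l = 1$ and the last step uses eq. (5). Accordingly, $\mathbf{u}_l(\boldsymbol{\eta}^{(n-1)})$ can be considered as the content-specific STF that represents the 'movement' of $\boldsymbol{\eta}$ from the point $\boldsymbol{\eta}^{(n-1)}$ after content $l$ is requested. The superposition of all content-specific STFs, weighted by the corresponding content popularity, yields the overall STF.

Since each column of $\boldsymbol{\Theta}_l$ (and hence of $\boldsymbol{\Theta}$) sums to one, the following equalities hold:

$$ \mathbf{1}^T \mathbf{u}_l(\boldsymbol{\eta}^{(n-1)}) = 0, \quad \forall l \in \mathcal{C}, \ \forall \boldsymbol{\eta}^{(n-1)} \in \mathcal{D}, \tag{13} $$

$$ \mathbf{1}^T \mathbf{u}(\boldsymbol{\eta}^{(n-1)}) = 0, \quad \forall \boldsymbol{\eta}^{(n-1)} \in \mathcal{D}. \tag{14} $$

IV. STATE TRANSITION MATRICES OF SPECIFIC REPLACEMENT SCHEMES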

In this section, we demonstrate how four specific replacement schemes, i.e., RR, LP, TLP, and LRU, fit into the general content replacement model in the preceding section. As the impact of replacement schemes on the STFs is conveyed through the state transition matrix $\boldsymbol{\Theta}$, the focus will be on finding $\boldsymbol{\Theta}$ for the considered schemes. The four replacement schemes can be categorized into three groups based on the content popularity information that they exploit:
• RR does not use any content popularity information;
• both LP and TLP rely on the prediction of content popularity, and a perfect prediction will be assumed;
• LRU exploits imperfect content popularity information from request history, i.e., the information of recent content requests.

The impact of the difference in the exploited content popularity information on the STF will be presented in subsequent sections of this paper.

A. RR

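For RR, the conditional content replacement probability $\phi_{l,q,k}$ reduces to a constant:

$$ \phi_{l,q,k} = \phi \in (0, 1/L], \quad \forall q \in \mathcal{C}_k, \ l \notin \mathcal{C}_k. \tag{15} $$

Accordingly, the conditional state transition probability matrix $\boldsymbol{\Theta}_l$ is given by:

$$ \boldsymbol{\Theta}_{\text{RR},l}(m,k) = \begin{cases} 1, & \text{if } l \in \mathcal{C}_k \text{ and } k = m, \\ 1 - L\phi, & \text{if } l \notin \mathcal{C}_k \text{ and } k = m, \\ \phi, & \text{if } m \in \mathcal{H}_{k,l}, \\ 0, & \text{otherwise}, \end{cases} \tag{16} $$

i.e., the probabilities of content $l$ replacing a cached content and of no replacement are $L\phi$ and $1 - L\phi$, respectively.

The overall state transition probability matrix $\boldsymbol{\Theta}_{\text{RR}}$ is given by:

$$ \boldsymbol{\Theta}_{\text{RR}}(m,k) = \begin{cases} 1 - L\phi \sum_{l \notin \mathcal{C}_k} \upsilon_l, & \text{if } k = m, \\ \phi \upsilon_{e(m,k)}, & \text{if } m \in \mathcal{H}_k, \\ 0, & \text{otherwise}. \end{cases} \tag{17} $$

B. LP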

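Denote the predicted content popularity by $\tilde{\boldsymbol{\upsilon}}$. Using LP, the requested content $l \notin \mathcal{C}_k$ may replace a cached content $q$ in state $k$ if $\tilde{\upsilon}_l > \tilde{\upsilon}_q$, i.e., if the requested content is more popular. The conditional state transition probability is given by:

$$ \boldsymbol{\Theta}_{\text{LP},l}(m,k) = \begin{cases} 1, & \text{if } l \in \mathcal{C}_k \text{ and } k = m, \\ 1 - \alpha, & \text{if } l \notin \mathcal{C}_k, \ k = m, \text{ and } \tilde{\upsilon}_l > \tilde{\upsilon}_q, \\ \alpha \phi_{l,q,k}, & \text{if } m \in \mathcal{H}_{k,l}, \ k \in \mathcal{H}_{m,q}, \text{ and } \tilde{\upsilon}_l > \tilde{\upsilon}_q, \\ 0, & \text{otherwise}, \end{cases} \tag{18} $$

where $\alpha$ is a parameter that controls the probability of a replacement.

The conditional replacement probability, assuming that $\tilde{\upsilon}_l > \tilde{\upsilon}_q$, is set to be proportional to $\tilde{\upsilon}_l - \tilde{\upsilon}_q$, as follows:

$$ \phi_{l,q,k} = \frac{\tilde{\upsilon}_l - \tilde{\upsilon}_q}{\sum_{t \in \mathcal{C}^{\downarrow}_{k,l}} (\tilde{\upsilon}_l - \tilde{\upsilon}_t)}, \tag{19} $$

where

$$ \mathcal{C}^{\downarrow}_{k,l} = \{ t \mid t \in \mathcal{C}_k, \ \tilde{\upsilon}_t < \tilde{\upsilon}_l \}. \tag{20} $$

Order the states based on $\sum_{t \in \mathcal{C}_k} \tilde{\upsilon}_t$, i.e., the sum of the predicted request probabilities of the contents cached in each state, in non-decreasing order. Then, it can be shown that the state transition matrix $\boldsymbol{\Theta}_{\text{LP}}$ becomes a lower-triangular matrix:

$$ \boldsymbol{\Theta}_{\text{LP}}(m,k) = \begin{cases} \sum_{q \in \mathcal{C}_k} \upsilon_q + \sum_{l \in \bar{\mathcal{C}}^{\downarrow}_k} \upsilon_l + \sum_{l \in \bar{\mathcal{C}}^{\uparrow}_k} \upsilon_l (1 - \alpha), & \text{if } m = k, \\ \alpha \upsilon_{e(m,k)} \phi_{e(m,k), e(k,m), k}, & \text{if } m > k \text{ and } m \in \mathcal{H}_k, \\ 0, & \text{otherwise}, \end{cases} \tag{21} $$

in which

$$ \bar{\mathcal{C}}^{\downarrow}_k = \left\{ l \,\middle|\, l \notin \mathcal{C}_k, \ \tilde{\upsilon}_l \le \min_{t \in \mathcal{C}_k} \{\tilde{\upsilon}_t\} \right\}, \tag{22a} $$

$$ \bar{\mathcal{C}}^{\uparrow}_k = \left\{ l \,\middle|\, l \notin \mathcal{C}_k, \ \tilde{\upsilon}_l > \min_{t \in \mathcal{C}_k} \{\tilde{\upsilon}_t\} \right\}. \tag{22b} $$

C. TLP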

Denote the least popular content of state $k$ based on the prediction by $q^\dagger(k)$, i.e.,

$$ q^\dagger(k) = \arg\min_{t \in \mathcal{C}_k} \{\tilde{\upsilon}_t\}. \tag{23} $$

Using TLP, the requested content $l \notin \mathcal{C}_k$ can only replace $q^\dagger(k)$ when the cache is in state $k$, and the replacement can happen only if $\tilde{\upsilon}_l > \tilde{\upsilon}_{q^\dagger(k)}$. The conditional state transition probability is given by:

$$ \boldsymbol{\Theta}_{\text{TLP},l}(m,k) = \begin{cases} 1, & \text{if } l \in \mathcal{C}_k \text{ and } k = m, \\ 1 - \phi_{l,q^\dagger(k),k}, & \text{if } l \notin \mathcal{C}_k, \ k = m, \text{ and } \tilde{\upsilon}_l > \tilde{\upsilon}_{q^\dagger(k)}, \\ \phi_{l,q^\dagger(k),k}, & \text{if } m \in \mathcal{H}_{k,l}, \ k \in \mathcal{H}_{m,q^\dagger(k)}, \text{ and } \tilde{\upsilon}_l > \tilde{\upsilon}_{q^\dagger(k)}, \\ 0, & \text{otherwise}. \end{cases} \tag{24} $$

Two choices of the replacement probability $\phi_{l,q^\dagger(k),k}$ are considered when $\tilde{\upsilon}_l > \tilde{\upsilon}_{q^\dagger(k)}$: $\phi_{l,q^\dagger(k),k} = 1$ and $\phi_{l,q^\dagger(k),k} = \tilde{\upsilon}_l - \tilde{\upsilon}_{q^\dagger(k)}$. In the first case, the replacement always occurs, and TLP in this case will be referred to as TLP-A. In the second case, the replacement occurs probabilistically, and TLP in this case will be referred to as TLP-P. Intuitively, TLP-A would lead to faster convergence, while TLP-P could be useful when each replacement incurs a cost.

Order the states based on $\sum_{t \in \mathcal{C}_k} \tilde{\upsilon}_t$, i.e., the sum of the predicted request probabilities of the contents cached in each state, in non-decreasing order. Then, the state transition matrix $\boldsymbol{\Theta}_{\text{TLP}}$ also becomes a lower-triangular matrix:

$$ \boldsymbol{\Theta}_{\text{TLP}}(m,k) = \begin{cases} \sum_{q \in \mathcal{C}_k} \upsilon_q + \sum_{l \in \bar{\mathcal{C}}^{\downarrow}_k} \upsilon_l + \sum_{l \in \bar{\mathcal{C}}^{\uparrow}_k} \upsilon_l (1 - \phi_{l,q^\dagger(k),k}), & \text{if } m = k, \\ \upsilon_{e(m,k)} \phi_{e(m,k),q^\dagger(k),k}, & \text{if } m > k \text{ and } e(k,m) = q^\dagger(k), \\ 0, & \text{otherwise}. \end{cases} \tag{25} $$

D. LRU

When the cache is in state $k$ while content $l$ is requested, the conditional state transition probability matrix $\boldsymbol{\Theta}_l$ is given by:

$$ \boldsymbol{\Theta}_{\text{LRU},l}(m,k) = \begin{cases} 1, & \text{if } l \in \mathcal{C}_k \text{ and } k = m, \\ \rho^{\text{LRU}}_{e(k,m)|k}, & \text{if } m \in \mathcal{H}_{k,l}, \\ 0, & \text{otherwise}, \end{cases} \tag{26} $$

where $\rho^{\text{LRU}}_{e(k,m)|k}$ represents the conditional probability that content $e(k,m)$ is the least recently used content given that the cache is in state $k$. The probability $\rho^{\text{LRU}}_{e(k,m)|k}$ can be found, as a simplified special case under the IRM, based on Lemma 1 in the second part of this two-part paper, which addresses the more general case of time-varying content popularity [31].

The overall state transition probability matrix $\boldsymbol{\Theta}_{\text{LRU}}$ is given by:

$$ \boldsymbol{\Theta}_{\text{LRU}}(m,k) = \begin{cases} \sum_{l \in \mathcal{C}_k} \upsilon_l, & \text{if } k = m, \\ \upsilon_{e(m,k)} \rho^{\text{LRU}}_{e(k,m)|k}, & \text{if } m \in \mathcal{H}_k, \\ 0, & \text{otherwise}. \end{cases} \tag{27} $$

Note that, unlike RR and LRU, LP and TLP are not practical replacement schemes. However, the latter two are considered here to analyze what the STF of a replacement scheme would become in the ideal case with perfect content popularity information, as a comparison to the cases with no and with imperfect content popularity information (i.e., RR and LRU, respectively).

V. STF BASED ANALYSIS FOR CACHE REPLACEMENT UNDER TIME-INVARIANT CONTENT POPULARITY

In this section, we analyze specific replacement schemes using the STF to demonstrate that STF-based analysis can characterize the features of different replacement schemes and reveal insights regarding their steady states.

A. RR

Using the definition of the content-specific STF in eq. (11) and the state transition probability matrix of RR in eq. (16), it can be shown that the $m$th element of the content-specific STF at $\boldsymbol{\eta}$ is given by:

$$ u_{m,l,\text{RR}}(\boldsymbol{\eta}) = \begin{cases} \phi \sum_{\{k \mid m \in \mathcal{H}_{k,l}\}} \eta_k, & \text{if } l \in \mathcal{C}_m, \\ -L\phi\eta_m, & \text{otherwise}. \end{cases} \tag{28} $$

Using the STF in eq. (28), the following result becomes straightforward.

Theorem 1:

The steady state of RR, denoted by $\boldsymbol{\eta}^\star$, is independent of the parameter $\phi$ and satisfies the following property:

$$ \eta^\star_m \sum_{l \notin \mathcal{C}_m} \upsilon_l = \frac{1}{L} \sum_{k \in \mathcal{H}_m} \eta^\star_k \upsilon_{e(m,k)}, \quad \forall m \in \mathcal{S}. \tag{29} $$

Proof: See Section A in the Appendix.

The property in Theorem 1 can be used to obtain a closed-form expression of the steady state. Define $N_s$ vectors, one for each state, such that

$$ \mathbf{a}_m(k) = \begin{cases} \sum_{l \notin \mathcal{C}_m} \upsilon_l, & \text{if } k = m, \\ -\frac{1}{L} \upsilon_{e(m,k)}, & \text{if } m \in \mathcal{H}_k, \\ 0, & \text{otherwise}, \end{cases} \tag{30} $$

where $\mathbf{a}_m(k)$ represents the $k$th element of the vector for the $m$th state. Then, $N_s - 1$ out of the $N_s$ vectors are linearly independent. Define matrix $\mathbf{A}$ as follows:

$$ \mathbf{A} = [\mathbf{a}_1, \ldots, \mathbf{a}_{N_s - 1}, \mathbf{1}]^T, \tag{31} $$

where $\mathbf{1}$ is an all-one vector. Then, the steady state $\boldsymbol{\eta}^\star$ can be given by

$$ \boldsymbol{\eta}^\star = \mathbf{A}^{-1} \mathbf{g}, \tag{32} $$

in which $\mathbf{g} = [0, \ldots, 0, 1]^T$ is the vector that has 0 as its first $N_s - 1$ elements and 1 as its last element.

Evidently, the steady state of RR does not maximize the cache hit probability, as RR does not exploit any content popularity information. The property in Theorem 1 characterizes the steady state of RR. Specifically, eq. (29) shows that the steady state of RR achieves a balance such that, if a randomly selected cached content is to be replaced by a random content not cached, the resulting expected cache miss probability due to this replacement equals the cache miss ratio of the steady state without any replacement.

The rate of convergence of a finite-state ergodic Markov chain is decided by the second largest eigenvalue of its transition probability matrix [33]. Specifically, it holds that [34]:

$$ \left\| \boldsymbol{\Theta}^t \boldsymbol{\eta}^{(0)} - \boldsymbol{\eta}^\star(\boldsymbol{\Theta}) \right\| \le d^t(\boldsymbol{\Theta}) \left\| \boldsymbol{\eta}^{(0)} \right\| \tag{33} $$

for any initial state distribution $\boldsymbol{\eta}^{(0)}$, where $\boldsymbol{\Theta}$ represents any ergodic Markov chain, $\boldsymbol{\eta}^\star(\boldsymbol{\Theta})$ represents the corresponding steady state, $d(\boldsymbol{\Theta})$ represents the second largest eigenvalue of $\boldsymbol{\Theta}$, and $t$ is the number of steps since the initial point.
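The closed form in eqs. (30)-(32) and the $\phi$-independence stated in Theorem 1 can be checked numerically. The sketch below assumes NumPy and 0-based indices, builds $\boldsymbol{\Theta}_{\text{RR}}$ from eq. (17), and reports $d(\boldsymbol{\Theta}_{\text{RR}})$ of eq. (33), computed here as the second largest eigenvalue modulus; the popularity vector and the helper `neighbor_moves` are hypothetical, for illustration only.

```python
import itertools

import numpy as np


def neighbor_moves(states, n_contents):
    """Yield (k, m, q, l): state m is state k after content l replaces cached content q."""
    index = {frozenset(s): i for i, s in enumerate(states)}
    for k, cached in enumerate(states):
        for q in cached:
            for l in range(n_contents):
                if l not in cached:
                    yield k, index[(frozenset(cached) - {q}) | {l}], q, l


def rr_theta(states, upsilon, phi, cache_size):
    """Theta_RR of eq. (17)."""
    theta = np.zeros((len(states), len(states)))
    for k, cached in enumerate(states):
        miss = sum(u for l, u in enumerate(upsilon) if l not in cached)
        theta[k, k] = 1.0 - cache_size * phi * miss
    for k, m, q, l in neighbor_moves(states, len(upsilon)):
        theta[m, k] = phi * upsilon[l]          # e(m, k) = l
    return theta


def rr_steady_state(states, upsilon, cache_size):
    """eta* = A^{-1} g from eqs. (30)-(32); note that phi never enters."""
    n_s = len(states)
    a = np.zeros((n_s, n_s))                    # row m holds the vector a_m of eq. (30)
    for m, cached in enumerate(states):
        a[m, m] = sum(u for l, u in enumerate(upsilon) if l not in cached)
    for m, k, q, l in neighbor_moves(states, len(upsilon)):
        a[m, k] = -upsilon[q] / cache_size      # e(m, k) = q
    a[-1, :] = 1.0                              # last row is the all-one vector, eq. (31)
    g = np.zeros(n_s)
    g[-1] = 1.0
    return np.linalg.solve(a, g)


N_c, L = 5, 2
states = list(itertools.combinations(range(N_c), L))
upsilon = np.array([0.4, 0.25, 0.15, 0.12, 0.08])   # hypothetical popularity
eta_star = rr_steady_state(states, upsilon, L)

for phi in (0.1, 0.3, 0.5):                          # phi in (0, 1/L]
    theta = rr_theta(states, upsilon, phi, L)
    field = np.abs(theta @ eta_star - eta_star).max()    # STF at eta*, should be ~0
    d = np.sort(np.abs(np.linalg.eigvals(theta)))[-2]    # second largest modulus
    print(f"phi={phi}: max|u(eta*)|={field:.1e}, d={d:.4f}")
```

In eq. (17), changing $\phi$ only rescales $\boldsymbol{\Theta}_{\text{RR}} - \mathbf{I}$, so the STF keeps its direction at every point and its magnitude grows linearly with $\phi$.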
While it is generally impossible to derive an eigenvalue of an arbitrary transition matrix $\boldsymbol{\Theta}$ in closed form, bounds on the second largest eigenvalue of a reversible transition matrix can be estimated [35]. The STF provides another intuitive perspective for analyzing the rate of convergence. In the case of RR, a larger $\phi$ implies a stronger STF, while the direction of the STF at every point remains the same. Therefore, a larger $\phi$ generally leads to a shorter mixing time.

B. LP and TLP

Note that, in practice, if the content popularity is known, the $L$ most popular contents can be placed in the cache from the beginning without using LP or TLP for replacements. However, as we intend to analyze the impact of the content popularity information adopted by a replacement scheme on the path of the cache state distribution starting from an arbitrary state, the analysis of LP and TLP is of interest.

For LP and TLP, the steady state is straightforward. Sort the contents in nondecreasing order of their predicted popularity so that $\tilde{\upsilon}_l \ge \tilde{\upsilon}_q$ if $l \ge q$. Sort the states in nondecreasing order of $\sum_{t \in \mathcal{C}_k} \tilde{\upsilon}_t$, i.e., the sum of the predicted content request probabilities of each state. Then, the $L$ least popular contents are cached in state 1, and the $L$ most popular contents are cached in state $N_s$.

Lemma 1:

The steady state for both LP and TLP is $\boldsymbol{\eta}^{\star} = [0, \ldots, 0, 1]^T$.

The proof is straightforward given that, for any $k \in \mathcal{S}$, the following two facts hold: 1) state $k$ can only transition to state $m$ if $m > k$; and 2) if $k < N_s$, state $k$ transitions to at least one neighboring state in $\mathcal{H}_k$ with a positive probability. Both observations follow from eq. (21) and eq. (25).

Compared to the steady state of RR, the result in Lemma 1 reflects the impact of exploiting content popularity information on the steady state of a replacement scheme.

Since both $\boldsymbol{\Theta}_{\mathrm{LP}}$ and $\boldsymbol{\Theta}_{\mathrm{TLP}}$ are lower-triangular matrices, their eigenvalues are their respective diagonal elements. Evidently, neither $\boldsymbol{\Theta}_{\mathrm{LP}}$ nor $\boldsymbol{\Theta}_{\mathrm{TLP}}$ is ergodic. Nevertheless, in both cases the largest eigenvalue is 1 and the second largest eigenvalue falls in $(0, 1)$, and the result in eq. (33) still holds for $\boldsymbol{\Theta}_{\mathrm{LP}}$ and $\boldsymbol{\Theta}_{\mathrm{TLP}}$. The second largest eigenvalue, which determines the mixing time of LP and TLP, is given by the following result.

Lemma 2:

The second largest eigenvalues of $\boldsymbol{\Theta}_{\mathrm{LP}}$ and $\boldsymbol{\Theta}_{\mathrm{TLP}}$ are given by

$$g(\boldsymbol{\Theta}_{\mathrm{LP}}) = 1 - \alpha\, \tilde{\upsilon}_{\hat{l}}, \tag{34}$$

$$g(\boldsymbol{\Theta}_{\mathrm{TLP}}) = 1 - \tilde{\upsilon}_{\hat{l}}\, \phi_{\hat{l}, \hat{l}-1, N_s-1}, \tag{35}$$

where $\hat{l} = N_c - L + 1$.

Proof: See Section B in Appendix.

Based on Lemma 2, the rate of convergence depends on $\alpha$ in the case of LP and on $\phi_{\hat{l}, \hat{l}-1, N_s-1}$ in the case of TLP. Furthermore, Lemma 2 shows that in both cases the rate of convergence also depends on the popularity of a particular content, i.e., the $(N_c - L + 1)$th content, or equivalently, the $L$th most popular content.

Unlike RR and LRU, which do not rely on content popularity information, LP and TLP are affected by prediction errors in the content request probabilities, which can change either the STF alone or both the STF and the steady state. Specifically, if there are errors in the prediction but the set of the $L$ most popular contents is predicted correctly, then the predicted STF can differ from the actual STF, but the steady state is not affected. By contrast, if the predicted $L$ most popular contents differ from the actual $L$ most popular contents, then both the STF and the steady state from the prediction will differ from their respective actual values.

C. LRU

Using the definition of the content-specific STF in eq. (11) and the state transition probability matrix of LRU in eq. (26), it can be shown that the $m$th element of the content-specific STF at $\boldsymbol{\eta}$ is given by:

$$u_{m,l,\mathrm{LRU}}(\boldsymbol{\eta}) = \begin{cases} \sum_{\{k \mid m \in \mathcal{H}_{k,l}\}} \rho^{\mathrm{LRU}}_{e(k,m)\mid k}\, \eta_k, & \text{if } l \in \mathcal{C}_m, \\ -\eta_m, & \text{otherwise}. \end{cases} \tag{36}$$

Under the IRM, the probabilities $\{\rho^{\mathrm{LRU}}_{e(k,m)\mid k}\}_{\forall k, \forall m \in \mathcal{H}_k}$ are constants and can be calculated. Given $\{\rho^{\mathrm{LRU}}_{e(k,m)\mid k}\}$, the following result regarding the steady state in the case of LRU can be found using the STF.

Theorem 2:

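The steady state $\boldsymbol{\eta}^{\star}$ in the case of LRU satisfies the following property:

$$\eta^{\star}_m \sum_{l \notin \mathcal{C}_m} \upsilon_l = \sum_{k \in \mathcal{H}_m} \upsilon_{e(m,k)}\, \rho^{\mathrm{LRU}}_{e(k,m)\mid k}\, \eta^{\star}_k. \tag{37}$$

Proof: See Section C in Appendix.

Comparing eq. (37) and eq. (29) reveals an interesting insight. Denote the steady-state SCP in the case of RR by $\boldsymbol{\eta}^{\star}_{\mathrm{RR}}$. The STF of LRU at the point $\boldsymbol{\eta}^{\star}_{\mathrm{RR}}$ is given by:

$$\begin{aligned}
u_{m,\mathrm{LRU}}(\boldsymbol{\eta}^{\star}_{\mathrm{RR}}) &= \sum_{l \in \mathcal{C}_m} \upsilon_l\, u_{m,l,\mathrm{LRU}}(\boldsymbol{\eta}^{\star}_{\mathrm{RR}}) + \sum_{l \notin \mathcal{C}_m} \upsilon_l\, u_{m,l,\mathrm{LRU}}(\boldsymbol{\eta}^{\star}_{\mathrm{RR}}) \\
&= \sum_{l \in \mathcal{C}_m} \upsilon_l \sum_{\{k \mid m \in \mathcal{H}_{k,l}\}} \rho^{\mathrm{LRU}}_{e(k,m)\mid k}\, \eta^{\star}_{k,\mathrm{RR}} - \sum_{l \notin \mathcal{C}_m} \upsilon_l\, \eta^{\star}_{m,\mathrm{RR}} \\
&= \sum_{l \in \mathcal{C}_m} \upsilon_l \sum_{\{k \mid m \in \mathcal{H}_{k,l}\}} \rho^{\mathrm{LRU}}_{e(k,m)\mid k}\, \eta^{\star}_{k,\mathrm{RR}} - \frac{1}{L} \sum_{k \in \mathcal{H}_m} \eta^{\star}_{k,\mathrm{RR}}\, \upsilon_{e(m,k)} \\
&= \sum_{l \in \mathcal{C}_m} \upsilon_l \sum_{\{k \mid m \in \mathcal{H}_{k,l}\}} \rho^{\mathrm{LRU}}_{e(k,m)\mid k}\, \eta^{\star}_{k,\mathrm{RR}} - \sum_{l \in \mathcal{C}_m} \upsilon_l\, \frac{1}{L} \sum_{\{k \mid m \in \mathcal{H}_{k,l}\}} \eta^{\star}_{k,\mathrm{RR}} \\
&= \sum_{l \in \mathcal{C}_m} \upsilon_l \sum_{\{k \mid m \in \mathcal{H}_{k,l}\}} \left(\rho^{\mathrm{LRU}}_{e(k,m)\mid k} - \frac{1}{L}\right) \eta^{\star}_{k,\mathrm{RR}},
\end{aligned} \tag{38}$$

where the second step uses eq. (36), the third step uses the property in eq. (29), and the fourth step uses eq. (43). The term $\rho^{\mathrm{LRU}}_{e(k,m)\mid k} - 1/L$ in eq. (38) is interesting, as it captures the difference between the steady states of RR and LRU. Specifically, eq. (38) shows that, compared to RR, the steady state of LRU favors states with popular contents.

As an example, consider the case when state $m$ caches the $L$ most popular contents. Then it follows that $\rho^{\mathrm{LRU}}_{e(k,m)\mid k} > 1/L$ in eq. (38) for any $k$ such that $m \in \mathcal{H}_k$. This is true because content $e(k,m)$ is less popular than the other $L - 1$ contents in state $k$, which are also cached by state $m$ and are therefore among the $L$ most popular contents. Note that the constant $1/L$ can be considered as the probability that $e(k,m)$ is the LRU content when all cached contents have exactly the same request probability. As $\rho^{\mathrm{LRU}}_{e(k,m)\mid k} > 1/L$ for any $k$ such that $m \in \mathcal{H}_k$ in eq. (38), $u_{m,\mathrm{LRU}}(\boldsymbol{\eta}^{\star}_{\mathrm{RR}}) > 0$, which shows that the STF of LRU at $\boldsymbol{\eta}^{\star}_{\mathrm{RR}}$ points towards a direction that increases the probability of caching state $m$.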
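The sign argument built on eq. (38) can be checked on the three-content example of Fig. 8c. In the hypothetical sketch below, the LRU steady state is computed from the Markov chain on ordered caches (most recently used first) and aggregated into the unordered states, which also yields $\rho^{\mathrm{LRU}}_{e(k,m)\mid k}$. The RR steady state is taken from the product form $\eta^{\star}_m \propto \prod_{l \in \mathcal{C}_m} \upsilon_l$; this is an assumption of the sketch, and the code checks it against eq. (29) instead of taking it on faith. Contents are 0-indexed and all helper names are illustrative only.

```python
import itertools

import numpy as np

upsilon = np.array([0.5, 0.29, 0.21])  # setting of Fig. 8c; content 0 is the most popular
L = 2
states = [frozenset(c) for c in itertools.combinations(range(len(upsilon)), L)]
orders = list(itertools.permutations(range(len(upsilon)), L))  # LRU stacks, most recent first


def lru_order_distribution():
    """Stationary distribution of LRU on ordered caches under the IRM (hypothetical helper)."""
    index = {o: i for i, o in enumerate(orders)}
    n = len(orders)
    theta = np.zeros((n, n))  # column-stochastic: theta[i, j] = P(order i | order j)
    for j, o in enumerate(orders):
        for l, p in enumerate(upsilon):
            # Hit: move l to the front. Miss: put l in front and drop the LRU content.
            rest = [c for c in o if c != l] if l in o else list(o[:-1])
            theta[index[(l, *rest)], j] += p
    A = theta - np.eye(n)
    A[-1, :] = 1.0  # replace one balance equation by the normalization
    b = np.zeros(n)
    b[-1] = 1.0
    return dict(zip(orders, np.linalg.solve(A, b)))


pi = lru_order_distribution()
eta_lru = np.array([sum(pi[o] for o in orders if set(o) == s) for s in states])


def neighbors(m):
    return [k for k in range(len(states)) if len(states[m] - states[k]) == 1]


def e(m, k):
    """e(m, k): the content cached in state m but not in its neighbor k."""
    (c,) = states[m] - states[k]
    return c


def rho(c, k):
    """Probability that content c is the LRU item, given the LRU cache is in state k."""
    return sum(pi[o] for o in orders if set(o) == states[k] and o[-1] == c) / eta_lru[k]


def miss_prob(m):
    return sum(p for l, p in enumerate(upsilon) if l not in states[m])


def u_lru(eta, m):
    """u_{m,LRU}(eta) as in the first two lines of eq. (38), at an arbitrary point eta."""
    inflow = sum(upsilon[l] * sum(rho(e(k, m), k) * eta[k] for k in neighbors(m) if e(m, k) == l)
                 for l in states[m])
    return inflow - miss_prob(m) * eta[m]


# Assumed RR steady state (product form), checked against eq. (29) below.
eta_rr = np.array([np.prod(upsilon[list(s)]) for s in states])
eta_rr /= eta_rr.sum()
residual_29 = [eta_rr[m] * miss_prob(m)
               - sum(eta_rr[k] * upsilon[e(m, k)] for k in neighbors(m)) / L
               for m in range(len(states))]

for m, s in enumerate(states):
    print(sorted(s), round(eta_rr[m], 4), round(eta_lru[m], 4),
          round(u_lru(eta_rr, m), 4), round(u_lru(eta_lru, m), 12))
print("eq. (29) residual at eta_RR:", np.max(np.abs(residual_29)))
```

For this example, $u_{m,\mathrm{LRU}}$ vanishes at $\boldsymbol{\eta}^{\star}_{\mathrm{LRU}}$ (Theorem 2), is positive at $\boldsymbol{\eta}^{\star}_{\mathrm{RR}}$ for the state caching contents 0 and 1, and is negative for the state caching contents 1 and 2; accordingly, LRU assigns a larger steady-state probability than RR to the state holding the two most popular contents.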
Similarly, it can be shown that $u_{m',\mathrm{LRU}}(\boldsymbol{\eta}^{\star}_{\mathrm{RR}}) < 0$ if $m'$ caches the least popular contents. The above difference between the steady states of RR and LRU stems from the difference in the information exploited by the two schemes. Unlike RR, which exploits no information and treats all cached contents equally in every replacement, LRU exploits historical request information, which reflects the content popularity. As a result, LRU can converge to a steady state that caches popular contents with larger probabilities.

Fig. 4: Illustration of decomposing the STF at the steady state. (a) Small $\|\mathbf{u}_l(\boldsymbol{\eta}^{\star})\|$. (b) Large $\|\mathbf{u}_l(\tilde{\boldsymbol{\eta}}^{\star})\|$.

VI. DISCUSSIONS

In this section, we discuss the benefits of using the proposed STF to analyze replacement schemes in practice. First, we use an example to show how the STF can characterize the properties of the steady states. Then, we use another example to show how the STF can be used to compare the convergence rates of replacement schemes.

A. On the Steady State

Given two replacement schemes (or the same replacement scheme with different parameters), can we tell more about their steady states besides the cache hit probability?

At the steady state, the overall STF must be equal to $\mathbf{0}$ regardless of the replacement scheme. However, this does not mean that no replacement happens after the steady state is achieved. Instead, contents can still be evicted from or accepted into the cache, while the probabilities of the two events must be equal for any content at the steady state. Therefore, replacements can be more frequent at the steady state of one replacement scheme than at that of another. This frequency of replacement at a steady state can be analyzed by decomposing the STF into content-specific STFs using eq. (12), as illustrated in Fig. 4. In the illustrated cases, we assume the same content request probabilities, while the content-specific STFs in Fig. 4a have much smaller norms than those in Fig. 4b. Correspondingly, replacements can be less frequent at the steady state $\boldsymbol{\eta}^{\star}$ in Fig. 4a than at the steady state $\tilde{\boldsymbol{\eta}}^{\star}$ in Fig. 4b.

In the case when each replacement incurs a cost or when cache wear-out is a concern, characterizing the frequency of replacement can be of interest. Based on the above discussion, the weighted sum of the norms of the content-specific STFs can be used as a metric for comparing the frequency of content replacement at the steady states of different replacement schemes. For example, a metric can be calculated as follows:

$$M(\boldsymbol{\eta}^{\star}) = \sum_{l \in \mathcal{C}} \upsilon_l\, \|\mathbf{u}_l(\boldsymbol{\eta}^{\star})\|, \tag{39}$$

where the weights are the content request probabilities.

Fig. 5: Illustration of sampling the STF for characterizing the convergence rate.

B. On the Convergence to the Steady State

We mentioned the rate of convergence and its relation to the second largest eigenvalue of the transition probability matrix $\boldsymbol{\Theta}$ in Section V. Since the STF is derived from the state transition probability matrix, it does not provide a new characterization of the rate of convergence in theory. However, we can use the STF to develop a metric for comparing the convergence rates of different replacement schemes in practice.

For example, we can generate sample points in the state transition region, as illustrated using hollow circles in Fig. 5. Hypothetically, if the STF at every point of the state transition region pointed toward the steady state $\boldsymbol{\eta}^{\star}$, then the rate of convergence would be determined by the strength (norm) of the STF. In practice, the STF at the sample points generally does not point straight toward the steady state. Nevertheless, we can project the STF at a sample point onto the line connecting that sample point and the steady state. This is illustrated with two example sample points, i.e., $\boldsymbol{\eta}_a$ and $\boldsymbol{\eta}_b$, in Fig. 5. In this figure, the solid red circle represents the steady state $\boldsymbol{\eta}^{\star}$. The black arrows at sample points $\boldsymbol{\eta}_a$ and $\boldsymbol{\eta}_b$ represent the STFs $\mathbf{u}(\boldsymbol{\eta}_a)$ and $\mathbf{u}(\boldsymbol{\eta}_b)$, respectively. The two dashed lines connect $\boldsymbol{\eta}_a$ and $\boldsymbol{\eta}_b$ with the steady state $\boldsymbol{\eta}^{\star}$. The two blue arrows on the dashed lines represent the projections of $\mathbf{u}(\boldsymbol{\eta}_a)$ and $\mathbf{u}(\boldsymbol{\eta}_b)$, respectively. The norm of the projection, aggregated over all sample points, can provide a metric for characterizing the rate of convergence of replacement schemes. The accuracy of this approach depends on the number and locations of the chosen sample points.

Fig. 6: STF of RR in 3-D. (a) STF of RR, $N_c = 3$, $L = 2$, $\phi = 0.45$, $\boldsymbol{\upsilon} = [0.5, 0.29, 0.21]^T$. (b) STF of RR, $N_c = 30$, $L = 3$, in a 3-D subspace.

VII. NUMERICAL RESULTS

The numerical examples are organized into three sections. The first section demonstrates STFs obtained from analysis. The second section demonstrates STFs obtained from simulations and compares them with the analytical results. The third section demonstrates and compares the CCP and cache hit probability of the considered schemes to reveal the impact of different STFs.

A. STF - Analytical

In this section, the analytical STFs of RR, LP, TLP, and LRU are demonstrated. In general, an STF can be high-dimensional. We limit most of our demonstration to the three-dimensional case, as three-dimensional fields can be readily visualized. A three-dimensional subspace of a high-dimensional STF is also illustrated.

Fig. 6a demonstrates a three-dimensional STF of RR. In this figure, $N_c = 3$ and $L = 2$, and therefore there are only three cache states, $\mathcal{C}_1$, $\mathcal{C}_2$, and $\mathcal{C}_3$, each caching a distinct pair of the three contents. The $x$, $y$, and $z$ axes correspond to the SCP of cache states 1, 2, and 3, respectively. The triangular area is the state transition domain $\mathcal{D}$, the square marker represents the center of the triangle, and the circle represents the steady-state SCP $\boldsymbol{\eta}^{\star}$ in this example. The STF at a point in $\mathcal{D}$ is represented by an arrow originating from that point, while the strength and direction of the STF are shown by the length of the arrow and the direction of the arrowhead, respectively. The straight lines in the $x$-$y$ plane show the contours of the cache hit probability for the SCP.

Fig. 7: The impact of $\boldsymbol{\upsilon}$ and $\phi$ on the STF of RR. (a) STF of RR, $\phi = 0.2$, $\boldsymbol{\upsilon} = [0.5, 0.29, 0.21]^T$. (b) STF of RR, $\phi = 0.45$, $\boldsymbol{\upsilon} = [0.55, 0.35, 0.1]^T$.

Fig. 6b demonstrates part of a high-dimensional STF over the surface of an ellipsoid in a three-dimensional subspace. In this example, $N_c = 30$ and $L = 3$, and there are 4060 cache states. Three mutually neighboring cache states are selected, corresponding to the three-dimensional subspace in the figure. The STF over the surface of an ellipsoid in this subspace is demonstrated as an example. The $x$, $y$, and $z$ axes correspond to the SCP of the three selected cache states. Unlike the case in Fig. 6a, the SCPs in Fig. 6b are small and do not sum to 1, since there are many other states. Fig. 6b thus serves as an example of a high-dimensional STF.

Fig. 7 demonstrates the impact of the content popularity $\boldsymbol{\upsilon}$ and the parameter $\phi$ on the STF of RR. Fig. 7a shows the STF under the same settings as in Fig. 6a except that $\phi$ is decreased from 0.45 to 0.2. Two observations can be made by comparing Fig. 7a with Fig. 6a. First, the steady-state SCPs in both cases are identical, which confirms Theorem 1. Second, the STF at any given point in Fig. 7a is weaker than that in Fig. 6a, which implies a longer mixing time. Fig. 7b shows the STF under the same settings as in Fig. 6a except for a change in the content popularity $\boldsymbol{\upsilon}$. It can be seen from this figure that the steady state also changes following the change in content popularity. Comparing Fig. 7b with Fig. 6a, the impact of content popularity on the STF can be observed.

Fig. 8 demonstrates three-dimensional STFs of LP, TLP, and LRU. In all three plots in Fig. 8, $\boldsymbol{\upsilon}$ is set to $[0.5, 0.29, 0.21]^T$. In Figs. 8a and 8b, the steady state is the vertex of the triangle with the highest cache hit probability. The difference is that the STF in Fig. 8a leads to 'curvy' paths towards the steady state, while the curvature of the paths in Fig. 8b is much smaller. This reflects the fact that TLP makes replacements along the path that increases the cache hit probability most rapidly, bearing a certain resemblance to the 'steepest ascent' in gradient ascent. Fig. 8c appears similar to Fig. 6a. However, it can be observed that, compared to RR, the steady state of LRU assigns larger caching probabilities to states with more popular contents. This is consistent with eq. (38) and the fact that RR exploits no historical information while making cache replacements.

Fig. 8: The STF of LP, TLP, and LRU in 3-D, with $N_c = 3$, $L = 2$, and $\boldsymbol{\upsilon} = [0.5, 0.29, 0.21]^T$. (a) STF of LP, $\alpha = 0.9$. (b) STF of TLP-A. (c) STF of LRU.
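The observations on Fig. 6a and Fig. 7a, together with the two metrics of Section VI, can be reproduced on the same three-content example with the sketch below. It evaluates the RR content-specific STF exactly as written in eq. (28), combines it into the overall STF with weights $\upsilon_l$ (first line of eq. (42)), and computes the replacement-frequency metric of eq. (39) and a projection-based convergence metric at grid sample points in $\mathcal{D}$. The Euclidean norm, the grid step of 0.1, and the plain averaging of projection norms are our assumptions, since Section VI does not fix them; the function names are hypothetical and contents are 0-indexed.

```python
import itertools

import numpy as np

upsilon = np.array([0.5, 0.29, 0.21])  # popularity used in Figs. 6a and 7a
L = 2
states = [frozenset(c) for c in itertools.combinations(range(len(upsilon)), L)]


def content_specific_stf_rr(eta, phi):
    """Matrix whose row l is the content-specific STF u_{l,RR}(eta), following eq. (28)."""
    u = np.zeros((len(upsilon), len(states)))
    for l in range(len(upsilon)):
        for m, s_m in enumerate(states):
            if l in s_m:
                # {k | m in H_{k,l}}: neighbors of m that do not cache l
                u[l, m] = phi * sum(eta[k] for k, s_k in enumerate(states)
                                    if l not in s_k and len(s_m - s_k) == 1)
            else:
                u[l, m] = -L * phi * eta[m]
    return u


def stf_rr(eta, phi):
    """Overall STF u_m = sum_l upsilon_l * u_{m,l}, as in the first line of eq. (42)."""
    return upsilon @ content_specific_stf_rr(eta, phi)


def steady_state(phi):
    """Solve u(eta) = 0 together with sum(eta) = 1; u is linear in eta."""
    n_s = len(states)
    B = np.column_stack([stf_rr(np.eye(n_s)[j], phi) for j in range(n_s)])
    B[-1, :] = 1.0  # replace one (redundant) equation by the normalization
    rhs = np.zeros(n_s)
    rhs[-1] = 1.0
    return np.linalg.solve(B, rhs)


def replacement_metric(phi):
    """M(eta*) of eq. (39), assuming the Euclidean norm."""
    eta_star = steady_state(phi)
    return float(upsilon @ np.linalg.norm(content_specific_stf_rr(eta_star, phi), axis=1))


def projection_metric(phi, step=0.1):
    """Mean norm of the STF projected towards eta* over grid points of the triangle D."""
    eta_star = steady_state(phi)
    norms = []
    for a in np.arange(0.0, 1.0 + 1e-9, step):
        for b in np.arange(0.0, 1.0 - a + 1e-9, step):
            eta = np.array([a, b, 1.0 - a - b])
            gap = eta_star - eta
            if np.linalg.norm(gap) < 1e-9:
                continue  # skip a sample that coincides with the steady state
            norms.append(abs(stf_rr(eta, phi) @ gap) / np.linalg.norm(gap))
    return float(np.mean(norms))


for phi in (0.45, 0.2):
    print(phi, np.round(steady_state(phi), 4),
          round(replacement_metric(phi), 4), round(projection_metric(phi), 4))
```

Because eq. (28) is linear in $\phi$, the steady state printed for $\phi = 0.45$ and $\phi = 0.2$ is the same, while both metrics shrink by the factor $0.2/0.45$, matching the weaker arrows in Fig. 7a and the longer mixing time noted above.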

B. STF - Numerical

In this section, using RR and LRU as examples, we demonstrate STFs obtained from simulations and compare them with the analytical STFs from the preceding section.

Fig. 9 shows the STF of RR generated from simulations. The settings of $\phi$ and $\boldsymbol{\upsilon}$ in Fig. 9 are exactly the same as those in Fig. 6a. For each point in the STF, $M$ realizations of states are generated based on the corresponding SCP. For each realization, $R$ content requests are generated based on the content popularity. Each data point (i.e., each arrow) in Figs. 9a and 9b is obtained by averaging the state transitions following the $M \times R$ requests. In Fig. 9a, $M$ and $R$ are both set to 100. It can be seen that the STF is not accurate, especially in the area close to the steady state, due to insufficient samples. In addition, the arrows point to a steady state that deviates slightly from the true steady state in Fig. 6a. In Fig. 9b, $M$ and $R$ are both increased to 1000. It can be seen that the resulting simulated STF in Fig. 9b closely matches the analytical STF in Fig. 6a.

Fig. 9: The STF of RR from simulations, $N_c = 3$, $L = 2$, $\phi = 0.45$, $\boldsymbol{\upsilon} = [0.5, 0.29, 0.21]^T$. (a) $M = 100$, $R = 100$. (b) $M = 1000$, $R = 1000$.

Fig. 10 shows the STF of LRU generated from simulations. The setting of $\boldsymbol{\upsilon}$ in Fig. 10 is exactly the same as that in Fig. 8c. Since LRU depends on the request history, the simulation method used for Fig. 9, which is based on randomly generated states, cannot be applied. Instead, for each point in the STF, $R$ content requests are generated based on the content popularity, and the STF is generated based on the state transitions following the $R$ requests. Figs. 10a and 10b demonstrate a result similar to that of Figs. 9a and 9b: the STF from simulations can deviate from the analytical STF when the number of samples is small, while the two become an almost exact match when the number of sampled requests is sufficiently large.

Fig. 10: The STF of LRU from simulations, $N_c = 3$, $L = 2$, $\boldsymbol{\upsilon} = [0.5, 0.29, 0.21]^T$. (a) $R = 500$. (b) $R = 10000$.

Fig. 11: Demonstration of instantaneous CCP of the replacement schemes, $N_c = 1000$, $L = 30$; each panel plots the instantaneous content caching probability of contents $l = 1, 10, 35$ against the number of requests. (a) Instantaneous CCP - RR, two values of $\phi$ including $\phi = 1$. (b) Instantaneous CCP - LP, two values of $\alpha$ including $\alpha = 1$. (c) Instantaneous CCP - TLP, TLP-A and TLP-P. (d) Instantaneous CCP - LRU.

C. Instantaneous CCP

This section demonstrates the instantaneous CCP of RR, LP, TLP, and LRU, and relates the results to the STFs demonstrated in the preceding sections.

The number of contents $N_c$ and the cache size $L$ are set to 1000 and 30, respectively. For each of the four considered replacement schemes, the simulation consists of 5000 rounds. In each round, 2000 content requests are generated randomly based on a Zipf distribution with parameter 0.8. The contents are sorted in decreasing order of request probability. The cache is empty at the beginning. The instantaneous CCP of each content after each request is obtained and averaged over the 5000 rounds.

The resulting CCPs for three selected contents, i.e., contents 1, 10, and 35, are shown in Fig. 11. It can be seen from Figs. 11a and 11d that, starting with an empty cache, RR and LRU become stationary faster than LP and TLP, which are shown in Fig. 11b and Fig. 11c, respectively. In addition, comparing Figs. 11a and 11d shows that LRU caches popular contents, e.g., content 1, with larger probabilities than RR. This is consistent with the observation from comparing Fig. 8c and Fig. 6a. The impact of $\alpha$ on the performance of LP can be seen from Fig. 11b, while the difference between TLP-A and TLP-P can be seen from Fig. 11c. In the cases of LP and TLP, the caching probability of content 35 first increases and then decreases to zero. This corresponds to the 'curvy' paths in the STF shown in Fig. 8a and Fig. 8b.

VIII. CONCLUSION

We have revisited the problem of modeling and analyzing cache replacement schemes under the IRM with the objective of providing a rigorous yet intuitive general model from a novel perspective. Through this work, we have developed a basic tool set based on the STF to characterize and illustrate cache replacement schemes. Our investigation has also targeted insights into the relation between content popularity, the knowledge of content popularity exploited by replacement schemes, and the resulting STFs. The model and methodology established in this paper can also be applied to multi-level caches and cache networks after appropriate extensions.

APPENDIX

A. Proof of Theorem 1

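We first prove, using the STF, that the steady state is independent of $\phi$. It can be seen from eq. (28) that $\phi$ is just a scaling factor in $u_{m,l,\mathrm{RR}}$. Moreover, the scaling factor is the same for any state $m$ and content $l$. Therefore, we can define a base STF such that at point $\boldsymbol{\eta}$ it satisfies:

$$\bar{u}_{m,l,\mathrm{RR}}(\boldsymbol{\eta}) = \begin{cases} \sum_{\{k \mid m \in \mathcal{H}_{k,l}\}} \eta_k, & \text{if } l \in \mathcal{C}_m, \\ -L\,\eta_m, & \text{otherwise}. \end{cases} \tag{40}$$

Then, it is easy to show that:

$$\mathbf{u}_{l,\mathrm{RR}}(\boldsymbol{\eta}) = \phi\, \bar{\mathbf{u}}_{l,\mathrm{RR}}(\boldsymbol{\eta}), \tag{41a}$$

$$\mathbf{u}_{\mathrm{RR}}(\boldsymbol{\eta}) = \phi\, \bar{\mathbf{u}}_{\mathrm{RR}}(\boldsymbol{\eta}). \tag{41b}$$

Accordingly, a change in $\phi$ can change the strength of the STF but does not alter the direction of the STF at any point in the state transition domain. Therefore, the steady state of RR must be independent of $\phi$.

Next, we prove the property of the steady state. Based on the definition of the STF in eq. (9), the STF at the steady-state SCP $\boldsymbol{\eta}^{\star}$ must be equal to $\mathbf{0}$. It follows that:

$$\begin{aligned} u_{m,\mathrm{RR}}(\boldsymbol{\eta}^{\star}) &= \sum_{l \in \mathcal{C}_m} \upsilon_l\, u_{m,l,\mathrm{RR}}(\boldsymbol{\eta}^{\star}) + \sum_{l \notin \mathcal{C}_m} \upsilon_l\, u_{m,l,\mathrm{RR}}(\boldsymbol{\eta}^{\star}) \\ &= \sum_{l \in \mathcal{C}_m} \upsilon_l\, \phi \sum_{\{k \mid m \in \mathcal{H}_{k,l}\}} \eta^{\star}_k + \sum_{l \notin \mathcal{C}_m} \upsilon_l\, (-L\phi)\, \eta^{\star}_m = 0, \end{aligned} \tag{42}$$

which must hold for any $m \in \mathcal{S}$. Based on the definitions of neighbors and content-specific neighbors, it can be seen that:

$$\sum_{l \in \mathcal{C}_m} \upsilon_l \sum_{\{k \mid m \in \mathcal{H}_{k,l}\}} \eta^{\star}_k = \sum_{k \in \mathcal{H}_m} \upsilon_{e(m,k)}\, \eta^{\star}_k. \tag{43}$$

Combining eq. (43) and eq. (42) gives eq. (29). ■

B. Proof of Lemma 2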

First, we prove that the second largest eigenvalue of both $\boldsymbol{\Theta}_{\mathrm{LP}}$ and $\boldsymbol{\Theta}_{\mathrm{TLP}}$ is the $(N_s - 1, N_s - 1)$th element. In the case of LP, the total probability of state $k$ transitioning into any other state is given by $\alpha \sum_{l \in \bar{\mathcal{C}}_k^{\uparrow}} \upsilon_l$, which is non-increasing in $k$. Accordingly, $\Theta_{\mathrm{LP}}(m,m) \ge \Theta_{\mathrm{LP}}(k,k)$ if $m > k$. Similarly, the same result can be shown for TLP.

Second, as the states are sorted based on the sum of the predicted request probabilities of their cached contents, it can be seen that $e(N_s, N_s - 1)$ is the $(N_c - L + 1)$th content. Based on eq. (21), $\Theta_{\mathrm{LP}}(N_s - 1, N_s - 1)$ is equal to $1 - \alpha\, \tilde{\upsilon}_{\hat{l}}$, with $\hat{l}$ denoting $N_c - L + 1$. Similarly, $\Theta_{\mathrm{TLP}}(N_s - 1, N_s - 1)$ is equal to $1 - \tilde{\upsilon}_{\hat{l}}\, \phi_{\hat{l}, \hat{l}-1, N_s-1}$ based on eq. (25). ■

C. Proof of Theorem 2

The STF at the steady-state SCP $\boldsymbol{\eta}^{\star}$ must be equal to $\mathbf{0}$. It follows that:

$$\begin{aligned} u_{m,\mathrm{LRU}}(\boldsymbol{\eta}^{\star}) &= \sum_{l \in \mathcal{C}_m} \upsilon_l\, u_{m,l,\mathrm{LRU}}(\boldsymbol{\eta}^{\star}) + \sum_{l \notin \mathcal{C}_m} \upsilon_l\, u_{m,l,\mathrm{LRU}}(\boldsymbol{\eta}^{\star}) \\ &= \sum_{l \in \mathcal{C}_m} \upsilon_l \sum_{\{k \mid m \in \mathcal{H}_{k,l}\}} \rho^{\mathrm{LRU}}_{e(k,m)\mid k}\, \eta^{\star}_k - \sum_{l \notin \mathcal{C}_m} \upsilon_l\, \eta^{\star}_m = 0, \end{aligned} \tag{44}$$

which must hold for any $m \in \mathcal{S}$. It can be shown that:

$$\sum_{l \in \mathcal{C}_m} \upsilon_l \sum_{\{k \mid m \in \mathcal{H}_{k,l}\}} \rho^{\mathrm{LRU}}_{e(k,m)\mid k}\, \eta^{\star}_k = \sum_{k \in \mathcal{H}_m} \upsilon_{e(m,k)}\, \rho^{\mathrm{LRU}}_{e(k,m)\mid k}\, \eta^{\star}_k. \tag{45}$$

Combining eq. (45) and eq. (44) gives eq. (37). ■

REFERENCES

[1] Z. Piao, M. Peng, Y. Liu, and M. Daneshmand, “Recent Advances of Edge Cache in Radio Access Networks for Internet of Things: Techniques, Performances, and Challenges,” IEEE Internet Things J., vol. 6, no. 1, pp. 1010–1028, Feb. 2019.
[2] E. K. Markakis, K. Karras, A. Sideris, G. Alexiou, and E. Pallis, “Computing, Caching, and Communication at the Edge: The Cornerstone for Building a Versatile 5G Ecosystem,” IEEE Commun. Mag., vol. 55, no. 11, pp. 152–157, Nov. 2017.
[3] M. Tang, L. Gao, and J. Huang, “Enabling Edge Cooperation in Tactile Internet via 3C Resource Sharing,” IEEE J. Sel. Areas Commun., vol. 36, no. 11, pp. 2444–2454, Nov. 2018.
[4] S. Zhang, P. He, K. Suto, P. Yang, L. Zhao, and X. Shen, “Cooperative Edge Caching in User-Centric Clustered Mobile Networks,” IEEE Trans. Mobile Comput., vol. 17, no. 8, pp. 1791–1805, Aug. 2018.
[5] M. Emara, H. Elsawy, S. Sorour, S. Al-Ghadhban, M. Alouini, and T. Y. Al-Naffouri, “Optimal Caching in 5G Networks With Opportunistic Spectrum Access,” IEEE Trans. Wireless Commun., vol. 17, no. 7, pp. 4447–4461, July 2018.
[6] T. X. Vu, S. Chatzinotas, B. Ottersten, and T. Q. Duong, “Energy Minimization for Cache-Assisted Content Delivery Networks With Wireless Backhaul,” IEEE Wireless Commun. Lett., vol. 7, no. 3, pp. 332–335, June 2018.
[7] G. Lee, I. Jang, S. Pack, and X. Shen, “FW-DAS: Fast Wireless Data Access Scheme in Mobile Networks,” IEEE Trans. Wireless Commun., vol. 13, no. 8, pp. 4260–4272, Aug. 2014.
[8] E. Bastug, M. Bennis, and M. Debbah, “Living on the Edge: The Role of Proactive Caching in 5G Wireless Networks,” IEEE Commun. Mag., vol. 52, no. 8, pp. 82–89, Aug. 2014.
[9] K. N. Doan, T. Van Nguyen, T. Q. S. Quek, and H. Shin, “Content-Aware Proactive Caching for Backhaul Offloading in Cellular Network,” IEEE Trans. Wireless Commun., vol. 17, no. 5, pp. 3128–3140, May 2018.
[10] J. Gao, L. Zhao, and L. Sun, “Probabilistic Caching as Mixed Strategies in Spatially-Coupled Edge Caching,” in Proc. 29th Biennial Symp. Commun., Toronto, Canada, 2018.
[11] J. Qiao, Y. He, and X. Shen, “Proactive Caching for Mobile Video Streaming in Millimeter Wave 5G Networks,” IEEE Trans. Wireless Commun., vol. 15, no. 10, pp. 7187–7198, Oct. 2016.
[12] S. O. Somuyiwa, A. György, and D. Gündüz, “A Reinforcement-Learning Approach to Proactive Caching in Wireless Networks,” IEEE J. Sel. Areas Commun., vol. 36, no. 6, pp. 1331–1344, June 2018.
[13] R. Pedarsani, M. A. Maddah-Ali, and U. Niesen, “Online Coded Caching,” IEEE/ACM Trans. Netw., vol. 24, no. 2, pp. 836–845, Apr. 2016.
[14] S. Podlipnig and L. Böszörmenyi, “A Survey of Web Cache Replacement Strategies,” ACM Comput. Surv., vol. 35, no. 4, pp. 374–398, Dec. 2003.
[15] L. A. Belady, “A Study of Replacement Algorithms for a Virtual-Storage Computer,” IBM Sys. J., vol. 5, no. 2, pp. 78–101, 1966.
[16] D. D. Sleator and R. E. Tarjan, “Amortized Efficiency of List Update and Paging Rules,” Commun. ACM, vol. 28, no. 2, pp. 202–208, Feb. 1985.
[17] G. S. Rao, “Performance Analysis of Cache Memories,” J. ACM, vol. 25, no. 3, pp. 378–395, July 1978.
[18] A. R. Karlin, S. J. Phillips, and P. Raghavan, “Markov Paging,” SIAM J. Comput., vol. 30, no. 3, pp. 906–922, Aug. 2000.
[19] R. Hirade and T. Osogami, “Analysis of Page Replacement Policies in the Fluid Limit,” Operations Research, vol. 58, no. 4, pp. 971–984, July 2010.
[20] H. Gomaa, G. G. Messier, C. Williamson, and R. Davies, “Estimating Instantaneous Cache Hit Ratio Using Markov Chain Analysis,” IEEE/ACM Trans. Netw., vol. 21, no. 5, pp. 1472–1483, Oct. 2013.
[21] S. Tarnoi, V. Suppakitpaisarn, W. Kumwilaisak, and Y. Ji, “Performance Analysis of Probabilistic Caching Scheme using Markov Chains,” in Proc. IEEE LCN, Clearwater Beach, USA, 2015, pp. 46–54.
[22] J. Li, S. Shakkottai, J. C. S. Lui, and V. Subramanian, “Accurate Learning or Fast Mixing? Dynamic Adaptability of Caching Algorithms,” IEEE J. Sel. Areas Commun., vol. 36, no. 6, pp. 1314–1330, June 2018.
[23] J. Gao, S. Zhang, L. Zhao, and X. Shen, “The Design of Dynamic Probabilistic Caching with Time-Varying Content Popularity,” submitted to IEEE Trans. Mobile Comput., under review.
[24] L. Chang, J. Pan, and M. Xing, “Effective Utilization of User Resources in PA-VoD Systems with Channel Heterogeneity,” IEEE J. Sel. Areas Commun., vol. 31, no. 9, pp. 227–236, Sept. 2013.
[25] M. Fiore, C. Casetti, and C. Chiasserini, “Caching Strategies Based on Information Density Estimation in Wireless Ad Hoc Networks,” IEEE Trans. Veh. Technol., vol. 60, no. 5, pp. 2194–2208, June 2011.
[26] M. Meddeb, A. Dhraief, A. Belghith, T. Monteil, K. Drira, and H. Mathkour, “Least Fresh First Cache Replacement Policy for NDN-based IoT Networks,” Pervasive Mob. Comput., vol. 52, pp. 60–70, Jan. 2019.
[27] Z. H. Meybodi, J. Abouei, and A. H. F. Raouf, “Cache Replacement Schemes based on Adaptive Time Window for Video on Demand Services in Femtocell Networks,” IEEE Trans. Mobile Comput., vol. 18, no. 7, pp. 1476–1487, July 2019.
[28] N. Kamiyama, Y. Nakano, and K. Shiomoto, “Cache Replacement Based on Distance to Origin Servers,” IEEE Trans. Netw. Service Manag., vol. 13, no. 4, pp. 848–859, Dec. 2016.
[29] A. Chattopadhyay, B. Błaszczyszyn, and H. P. Keeler, “Gibbsian On-Line Distributed Content Caching Strategy for Cellular Networks,” IEEE Trans. Wireless Commun., vol. 17, no. 2, pp. 969–981, Feb. 2018.
[30] E. Leonardi and G. Neglia, “Implicit Coordination of Caches in Small Cell Networks Under Unknown Popularity Profiles,” IEEE J. Sel. Areas Commun., vol. 36, no. 6, pp. 1276–1285, June 2018.
[31] J. Gao, L. Zhao, and X. Shen, “The Study of Caching via State Transition Field - the Case of Time-Varying Popularity,” IEEE Trans. Wireless Commun., accepted.
[32] G. S. Paschos, G. Iosifidis, M. Tao, D. Towsley, and G. Caire, “The Role of Caching in Future Communication Systems and Networks,” IEEE J. Sel. Areas Commun., vol. 36, no. 6, pp. 1111–1125, June 2018.
[33] D. Levin, Y. Peres, and E. Wilmer, Markov Chains and Mixing Times. American Mathematical Society, Providence, RI, USA, 2008.
[34] S. Arora, Random walks, Markov Chains, and How to Analyse Them. Lecture Notes, Department of Computer Science, Princeton University, 2013.
[35] S. G. Walker, “Bounds for the Second Largest Eigenvalue of a Transition Matrix,”