Model and Machine Learning based Caching and Routing Algorithms for Cache-enabled Networks
Adita Kulkarni, Anand Seetharam
Department of Computer Science, SUNY Binghamton, USA
[email protected], [email protected]
Abstract—In-network caching is likely to become an integral part of various networked systems (e.g., 5G networks, LPWAN and IoT systems) in the near future. In this paper, we compare and contrast model-based and machine learning approaches for designing caching and routing strategies to improve cache network performance (e.g., delay, hit rate). We first outline the key principles used in the design of model-based strategies and discuss the analytical results and bounds obtained for these approaches. By conducting experiments on real-world traces and networks, we identify the interplay between content popularity skewness and request stream correlation as an important factor affecting cache performance. With respect to routing, we show that the main factors impacting performance are alternate path routing and content search. We then discuss the applicability of multiple machine learning models, specifically reinforcement learning, deep learning, transfer learning and probabilistic graphical models, for the caching and routing problem.
I. INTRODUCTION
Over the last decade, cache networking research (e.g., information-centric networking) has gathered significant momentum, and its benefits are likely to impact a variety of future communication systems including 5G networks, clouds, LPWAN and IoT systems [1], [2]. In fact, information-centric architectures have already shown promising initial results in IoT systems [3]. By caching and serving content from in-network nodes rather than the content custodians (origin servers), cache-enabled networks seek to improve user performance. Accordingly, a number of caching and routing strategies have been designed that effectively leverage in-network caching to improve performance.

In this paper, we outline the key principles used in the design of these protocols and quantitatively demonstrate how these principles aid in improving performance. Based on prior work, we identify three main approaches for developing caching and routing protocols: i) design optimized cache management strategies assuming that requests for content are routed according to the network's underlying routing strategy; ii) design optimized routing strategies assuming that the network adopts some native cache management strategy; iii) design strategies that jointly optimize caching and routing.

We then present research on the analysis of caching and routing protocols that complements and aids the understanding of the design factors required for developing new protocols. In particular, we present an overview of recent analytical research that seeks to answer questions related to optimality and performance guarantees, and attempts to determine the actual performance of particular protocols in specific settings.

We conduct experiments using multiple real-world networks and traces and show that the interplay between content popularity skewness and request stream correlation is an important factor affecting cache performance. We also demonstrate that augmenting shortest path routing with alternate path routing and content search can significantly improve performance.

We next present an overview of machine learning approaches that have been used to address the caching and routing problem in cache-enabled networked systems. We discuss the potential benefits of multiple different classes of machine learning algorithms, in particular reinforcement learning, deep learning, deep reinforcement learning, transfer learning, and probabilistic graphical models. We conclude by discussing the various challenges that need to be overcome to allow the seamless adoption of machine learning models to solve these problems.
The goal of this paper is to provide an overview of state-of-the-art research in cache networks in a succinct manner, to draw attention to key contributions in the field, to highlight the various model-based and machine learning approaches that can be used to solve the caching and routing problem, and to stimulate further discussion.
II. KEY DESIGN PRINCIPLES OF ROUTING AND CACHING
In this section, we provide an overview of the main design considerations in developing new caching and routing protocols. We first outline the principles behind designing caching protocols, followed by routing, and conclude by discussing joint caching and routing. Table I provides an overview of some of the recently proposed caching and routing algorithms. Due to lack of space, we are unable to cite each paper individually. Most citations can be found within [1].
A. Caching
When adopting a cache management strategy, the network has two options: static caching/content placement or dynamic caching. We first describe static and dynamic caching and then highlight the differences between them.
1) Static Caching:
If static caching is adopted, the network decides the set of content to be placed at the different network nodes so as to optimize performance. The set of content to be placed at the various network nodes is determined a priori, primarily based on content popularity, and these pieces of content are then cached at the network nodes. As network caches do not change their cached content, in static caching, requests for cached content result in hits while requests for all other content result in misses. While for a single cache the optimal static caching strategy is to cache the most popular content, for an interconnected network comprising multiple nodes, determining the set of content to cache, particularly at core network nodes, is considerably harder. As upstream caches only receive the miss request stream from downstream caches, this miss stream dictates the content placement at these nodes. Though determining the optimal set of content to cache in a general network is still largely unsolved, Banerjee et al. propose a greedy solution to this problem [4].
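To make the single-cache case concrete, the following minimal sketch (with hypothetical request counts) pins the C most popular contents and computes the resulting expected hit rate under an independent request model:

```python
# Sketch of static caching at a single cache: under the independent
# reference model, caching the C most popular contents maximizes the
# hit rate. The popularity counts below are hypothetical.

def static_placement(popularity, cache_size):
    """Return the set of contents to pin in the cache a priori."""
    ranked = sorted(popularity, key=popularity.get, reverse=True)
    return set(ranked[:cache_size])

def hit_rate(popularity, cached):
    """Expected hit rate = total request probability of cached items."""
    total = sum(popularity.values())
    return sum(popularity[c] for c in cached) / total

popularity = {"a": 50, "b": 30, "c": 15, "d": 5}   # request counts
cached = static_placement(popularity, cache_size=2)
print(sorted(cached))                # ['a', 'b']
print(hit_rate(popularity, cached))  # 0.8
```

Because the placement never changes, a burst of requests for an unpopular content still produces only misses, which is exactly the weakness dynamic caching addresses below.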
2) Dynamic Caching:
In dynamic caching, the content of network caches can potentially change as new content passes through them. Dynamic caching strategies thus have two important decisions to make: cache eviction and cache insertion.
Cache Eviction:
If a node decides to cache a particular content, the cache eviction policy decides what content to evict from the cache. Popular cache eviction policies are the Least Recently Used (LRU) and First In First Out (FIFO) policies.
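As an illustration, LRU eviction can be sketched in a few lines; the cache here is a toy in-memory structure, not any particular system's implementation:

```python
from collections import OrderedDict

class LRUCache:
    """Sketch of LRU eviction: when inserting into a full cache,
    evict the content that was accessed longest ago."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()           # least recently used first

    def request(self, content):
        if content in self.store:
            self.store.move_to_end(content)  # refresh recency on a hit
            return True                      # hit
        if len(self.store) >= self.capacity:
            self.store.popitem(last=False)   # evict the LRU content
        self.store[content] = True
        return False                         # miss

cache = LRUCache(2)
for c in ["a", "b", "a", "c"]:   # "b" is least recently used when "c" arrives
    cache.request(c)
print(list(cache.store))          # ['a', 'c']
```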
Cache Insertion:
The other important aspect of dynamic caching is cache insertion, which aims to improve performance by increasing the network content diversity as well as by pushing popular content closer to the user. The simple Leave Copy Everywhere (LCE) policy results in a piece of content being cached at all nodes on the return path from the custodian. To increase network content diversity, two widely adopted approaches are: i) network centrality, which uses the centrality of nodes to determine what content to cache, and ii) a probabilistic approach that takes factors such as content popularity, node connectivity and whether other nodes on the path have cached the same content into account to determine if a content should be inserted into a cache. Cache Less for More (CL4M) and ProbCache are two popular strategies that rely on network centrality and adopt a probabilistic approach, respectively.

Joint Cache Insertion and Eviction:
A variety of policies have been proposed that address the cache insertion and eviction aspects together. For example, a number of different variants of LRU (e.g., p-LRU, k-LRU [10]) have been proposed that address the cache insertion aspect assuming that the eviction policy is LRU. In p-LRU, a piece of content is inserted into a cache with some probability p, while the k-LRU policy exploits a chain of (k − 1) virtual caches to filter content. Before a request arrives at the physical cache that stores the actual content, it passes through the chain of (k − 1) virtual caches in front of it. These virtual caches only store object pointers and perform caching operations on them. A content or a pointer can only be stored in the cache at level i if it obtains a hit at level (i − 1).
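A minimal sketch of the p-LRU insertion rule described above (eviction remains plain LRU; the cache is modeled as a list ordered from least to most recently used, and the random source is injectable purely for illustration):

```python
import random

def maybe_insert(cache, content, p, capacity, rng=random.random):
    """p-LRU insertion sketch: on a miss, insert the content with
    probability p. `cache` is a list ordered least- to most-recently
    used; eviction is plain LRU (drop the head of the list)."""
    if rng() < p:
        if len(cache) >= capacity:
            cache.pop(0)              # evict the LRU content
        cache.append(content)         # newly inserted content is MRU

cache = ["a", "b"]                    # "a" is least recently used
maybe_insert(cache, "c", p=0.5, capacity=2, rng=lambda: 0.0)   # forces insert
print(cache)                          # ['b', 'c']
maybe_insert(cache, "d", p=0.5, capacity=2, rng=lambda: 0.99)  # forces skip
print(cache)                          # ['b', 'c']
```

Filtering insertions this way keeps one-hit-wonder content from polluting the cache, which is the same intuition the k-LRU virtual-cache chain pushes further.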
3) Static vs. Dynamic Caching:
A natural question that arises is: what are the advantages of adopting one type of caching strategy (i.e., static or dynamic) over the other? To answer this question, it is important to understand how static caching and dynamic caching attempt to serve requests. Static caching strategies take advantage of the skewness of the content popularity distribution (i.e., a small number of contents receive the majority of requests) and cache popular content within the network, while dynamic caching leverages the correlation in the request stream. Therefore, the performance of static and dynamic caching strategies is dominated by the interplay of the skewness of the popularity distribution and the request stream correlation. Another important question that arises is how to adapt static caching to real-world scenarios where content popularity varies over time. In such scenarios, the approach adopted by static caching is to estimate content popularity over a certain time window and to cache content based on it. This process is repeated periodically to help static caching capture temporal variations in popularity.
B. Routing
Having studied the main principles adopted for designing caching strategies, we now focus on routing. A key idea for effectively utilizing in-network caches is to seek alternate paths for obtaining content in addition to the shortest path. In this context, the simplest approach is to adopt standard multi-path routing. A smarter approach is to perform book-keeping in the form of breadcrumbs (i.e., pointers) at users and intermediate routers in order to keep track of the node(s) from which a particular content was recently obtained. By following the trail of these breadcrumbs, a node can potentially obtain content faster than via shortest path routing. Content search, particularly in mobile networks, is another key principle used in conjunction with shortest path routing to improve performance [11]. It exploits the fact that the requested content may be cached nearby and thus readily available at neighboring nodes.
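The breadcrumb idea can be sketched as a per-node pointer table; the topology and content names below are hypothetical:

```python
# Sketch of breadcrumb routing: each node remembers the neighbor from
# which it last obtained a content; a request follows the pointer trail
# instead of the default path to the custodian.

breadcrumbs = {                 # node -> {content: neighbor it came from}
    "u":  {"video1": "r1"},
    "r1": {"video1": "r2"},
    "r2": {},                   # trail ends; r2 holds video1 itself
}

def follow_trail(node, content):
    """Return the sequence of nodes visited by following breadcrumbs."""
    path = [node]
    while content in breadcrumbs.get(node, {}):
        node = breadcrumbs[node][content]
        path.append(node)
    return path

print(follow_trail("u", "video1"))   # ['u', 'r1', 'r2']
```

In a real deployment the trail can go stale (the pointed-to cache may have evicted the content), so breadcrumb schemes fall back to the underlying routing strategy on a trail miss.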
C. Joint Caching and Routing
Instead of focusing solely on caching or routing, recent research has also tried to jointly optimize caching and routing [8]. While solving the joint problem, the majority of existing approaches attempt to find the optimal content placement and routing and do not approach the problem from the dynamic caching perspective. Based on previous research, we next outline the basic steps generally adopted by the research community to solve the joint caching and routing problem.

• The usual methodology is to cast the joint content placement and routing problem as an optimization problem subject to constraints such as the caching capacity at various nodes and the connectivity among the different nodes.

• The main objective functions considered in prior work are delay and hit rate, with some recent research also focusing on general utility functions [8].

• Prior research has also demonstrated that most of these formulations turn out to be computationally hard (i.e., NP-hard) [8]. One of the main factors contributing to the hardness is that finding the optimal content placement results in a combinatorial explosion.

• A natural next step is to formulate approximation algorithms that are computationally efficient and provide performance guarantees. Most of these problem formulations are integer linear programs, thus making them amenable to approximation algorithms. One approach is to demonstrate that the objective function is submodular and that the constraints form a matroid. This entails that there exists a greedy solution that provides a (1 − 1/e) approximation guarantee. Additionally, researchers have designed heuristic solutions that provide good performance in practice.

• The proposed approximation and heuristic solutions are generally centralized in nature, which necessitates that the problem be solved at a central server and the results be distributed to network nodes. To address this concern, several efficient distributed solutions have also been proposed recently. A widely adopted technique is to design a gradient descent/ascent approach that asymptotically converges to the same solution as the centralized approach.

TABLE I: A Comparison of Caching and Routing Algorithms

Protocol Name | Type | ML-based Approach | Summary of Contributions
Greedy Caching [5] | Static Caching | No | Exploits content popularity and the miss stream from downstream nodes to make caching decisions in a general network.
Femto Caching | Static Caching | No | Exploits content popularity to make caching decisions in a heterogeneous cellular network with performance guarantees.
Least Recently Used (LRU) | Dynamic Caching (Cache Eviction) | No | Evicts the content that has not been accessed for the longest time duration.
Leave Copy Everywhere (LCE) | Dynamic Caching (Cache Insertion) | No | Caches a copy of the content on all en route caches.
Leave Copy Down (LCD) | Dynamic Caching (Cache Insertion) | No | Caches a copy of the content at the node one hop downstream from the cache hit.
Cache Less for More (CL4M) | Dynamic Caching (Cache Insertion) | No | Caches content based on network centrality.
PopCache | Dynamic Caching (Cache Insertion) | No | Caches content based on popularity.
ProbCache | Dynamic Caching (Cache Insertion) | No | Probabilistically caches content at a node.
Hybrid Caching [6] | Combines Static and Dynamic Caching | No | Divides caches into static and dynamic components based on a utility function.
Hash-Routing | Routing | No | Routes requests based on hash tables.
Breadcrumbs | Routing | No | Uses pointers (breadcrumbs) to keep track of content; follows pointers to obtain content.
CTR [7] | Routing | No | Uses the characteristic time of a content in a cache to route requests.
Optimal Caching [8] | Joint Routing and Caching | No | Determines the optimal set of content to be cached and the routing strategy adopted by each node, taking network congestion into account.
DeepCache [9] | Dynamic Caching | Yes | Deep LSTM-based encoder-decoder model to predict the request stream; smart caching policy based on these predictions.
Q-Caching | Dynamic Caching | Yes | Uses a Q-learning based approach to determine what content to cache.
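The greedy approach discussed above can be sketched as follows. The marginal-gain function `gain` below is a hypothetical stand-in for the modeler's utility; the constant-factor guarantee applies when the objective is monotone submodular and the per-node capacity constraints form a (partition) matroid:

```python
# Sketch of greedy content placement: repeatedly add the (node, content)
# pair with the largest marginal gain until every cache is full. This is
# an illustrative skeleton, not a specific published algorithm.

def greedy_placement(gain, nodes, contents, capacity):
    placement = {n: set() for n in nodes}
    while any(len(placement[n]) < capacity for n in nodes):
        candidates = [(n, c) for n in nodes for c in contents
                      if len(placement[n]) < capacity and c not in placement[n]]
        if not candidates:
            break
        n, c = max(candidates, key=lambda nc: gain(placement, *nc))
        placement[n].add(c)
    return placement

# Toy marginal gain: full popularity for the first copy of a content,
# heavily discounted gain for duplicate copies (hypothetical numbers).
popularity = {"a": 5, "b": 3, "c": 1}
def gain(placement, node, content):
    already = any(content in s for s in placement.values())
    return 0.1 * popularity[content] if already else popularity[content]

print(greedy_placement(gain, ["n1", "n2"], ["a", "b", "c"], capacity=1))
# {'n1': {'a'}, 'n2': {'b'}}
```

Note how the discount on duplicates steers the greedy choice toward diversity: the second cache stores "b" rather than a second copy of "a".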
D. Analysis of Caching and Routing
While understanding the key factors governing performance is necessary to develop novel caching and routing strategies, analysis is essential for quantifying the performance of algorithms in particular network settings and understanding the scenarios where one strategy is likely to outperform another. Additionally, analytical bounds and expressions can also be used to design better caching and routing strategies and can aid network management and maintenance.
1) Caching:
A significant amount of effort has been invested in understanding the performance, in particular the network hit rate, of different cache insertion and eviction policies. Table II succinctly describes the research on determining the hit rate of network caches. In one of the seminal papers, Che et al. derive approximations for the hit rate of LRU caches. This approximation, popularly known as Che's approximation, has been shown to be applicable for general content popularity distributions. In recent years, this approximation has been extended to non-stationary requests and to general networks comprising multiple nodes. Garetto et al. [10] derive expressions for the hit rate of multiple cache eviction policies such as LRU, p-LRU, k-LRU, FIFO, LFU and RANDOM, and cache insertion policies such as LCE, Leave Copy Down (LCD) and Leave Copy Probabilistically (LCP). Simulation and trace-based evaluation show that the analytical and simulation results match closely. This study also demonstrates the superiority of the k-LRU policy in comparison to other strategies. Alongside, research effort has also been devoted to analyzing the performance of Time To Live (TTL) based caches, for which it is in general easier to derive exact expressions under both uncorrelated and correlated request streams.

In [14], [15] the authors analytically study the fundamental limits of caching in wireless networks. For example, the authors in [14] obtain upper bounds on capacity and achievable capacity lower bounds in wireless cache networks. Similarly, in [15], the authors investigate the capacity scaling laws in cache-enabled wireless networks considering the skewness of the Zipfian popularity distribution. They show that the capacity at individual nodes increases monotonically with the number of nodes for skewed popularity distributions. These scaling laws are invaluable and help us appreciate the maximum benefits of caching. Similarly, the authors of [16] investigate a general cache network using queuing theory and determine how to place objects in caches to attain a desired objective.

TABLE II: Analytical Approaches for Determining Cache Hit Rate

Approach | Summary of Contributions
Che et al.'s Approximation | Determines the hit rate of LRU caches.
Rizk et al.'s Approximation [12] | Determines the hit rate at LRU caches in cache hierarchies described by a directed acyclic graph.
a-Net [13] | Iterative algorithm to determine the hit rate of a network of LRU caches.
Garetto et al.'s Approximation [10] | Extends Che's approximation to determine the hit rate of FIFO and RANDOM cache eviction policies; also determines expressions for LCE, LCD and LCP cache insertion policies.
Approximation of TTL caches | Determines how to set the parameters of TTL caches to mimic the behavior of other policies such as FIFO and LRU.
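Che's approximation is easy to compute numerically: find the characteristic time T solving Σ_i (1 − e^(−λ_i T)) = C, then read off the per-content hit rate 1 − e^(−λ_i T). A sketch with hypothetical Poisson request rates:

```python
import math

def che_hit_rates(rates, capacity, iters=200):
    """Sketch of Che's approximation for an LRU cache of size
    `capacity` fed by Poisson request rates `rates`. The
    characteristic time T is found by bisection on the monotone
    occupancy function sum_i (1 - exp(-lam_i * T))."""
    total = lambda t: sum(1 - math.exp(-lam * t) for lam in rates)
    lo, hi = 0.0, 1.0
    while total(hi) < capacity:      # grow the bracket until it covers C
        hi *= 2
    for _ in range(iters):           # bisect to pin down T
        mid = (lo + hi) / 2
        if total(mid) < capacity:
            lo = mid
        else:
            hi = mid
    T = (lo + hi) / 2
    return [1 - math.exp(-lam * T) for lam in rates]

rates = [0.5, 0.3, 0.2]              # hypothetical request rates
hits = che_hit_rates(rates, capacity=1)
print(round(sum(hits), 6))           # 1.0 (occupancies sum to the capacity)
```

By construction, the per-content occupancies sum to the cache size, and more popular contents obtain higher hit rates.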
2) Routing:
Theoretical analysis has also been conducted to determine the extent to which content search and scoped flooding are beneficial. Analysis and experiments show that the optimal flooding radius is small (less than 3 hops). This means that flooding requests beyond the immediate neighborhood of a requester is likely to incur significant overhead while providing minimal performance improvement. The benefits of opportunistic routing, an important routing paradigm designed for wireless networks that exploits the broadcast nature of the wireless medium to select the best relay to forward a request toward the custodian, have been analyzed in [17]. The authors design Markovian models to analyze the performance of opportunistic request routing in wireless cache networks in the presence of multi-path fading. Based on their results, the authors conclude that the benefits of in-network caching are more pronounced when the probability of successful packet transmission is low. This result suggests that caching is likely to have more benefits in a wireless network with lossy links.

Popular implementations of the ICN architecture such as Content-Centric Networking (CCN) and Named Data Networking (NDN) aggregate requests for the same content (popularly known as Interest aggregation) through the use of a data structure called the Pending Interest Table (PIT) to improve routing performance. A recent study investigates and quantifies the benefits that PITs provide under realistic conditions. This preliminary investigation suggests that only a small fraction of requests may benefit from request aggregation, with the benefits being closely tied to the network cache budget.

III. EXPERIMENTAL RESULTS
In this section, we experimentally demonstrate the performance benefits of the key principles discussed in the previous section. To demonstrate how the interplay of content popularity skewness and correlation impacts the performance of caching strategies, we conduct experiments on multiple real-world topologies (e.g., GARR, WIDE, GEANT), synthetic and real-world request stream traces (e.g., YouTube, Wikipedia) and multiple cache insertion policies (LCE, CL4M and ProbCache). We present representative results for the GARR topology and the YouTube trace to avoid cluttering the paper with multiple similar figures. The GARR topology comprises 61 nodes and 21 users. The YouTube request stream trace used here was collected over a campus network at the University of Massachusetts Amherst. In this particular trace, the long-term content popularity is low whereas the overall correlation among requests is high, which means that requests for the same content tend to occur in bursts. We assume all content to be of unit size and vary the cache size in our experiments.

Fig. 1: Performance of Static and Dynamic Caching (cache hit ratio vs. cache size; (a) synthetic request stream, (b) YouTube request stream).

Fig. 2: Comparing Caching and Routing Strategy Performance ((a) network content diversity vs. cache size for LCE, CL4M and ProbCache; (b) percentage of requests served vs. cache size for shortest path (SP), multi-path (MP) and content search).

To study the impact of the request stream on the performance of static and dynamic caching, let us consider Figures 1a and 1b. In Figure 1a, we assume that content popularity is distributed according to a Zipfian distribution with skewness parameter α = 0.7 and requests for content are generated following an independent request stream model, while Figure 1b is generated for the YouTube trace. The static caching policy adopted in the figures is Greedy Caching [4], while the dynamic cache insertion and eviction policies are Leave Copy Everywhere (LCE) and LRU, respectively. We observe from Figure 1a that static caching outperforms dynamic caching, while the opposite is true in Figure 1b. This is because Figure 1a is generated considering an identically and independently distributed request stream, so the correlation is 0 and hence static caching outperforms dynamic caching. In comparison, as the overall correlation in the request stream increases, as is the case in the YouTube trace, dynamic caching outperforms static caching (Figure 1b).

We next turn our attention to dynamic caching policies and study the impact of cache insertion policies on content diversity, which in turn affects performance. We define content diversity as the total number of unique contents in the network. Figure 2a shows the content diversity of LCE, CL4M and ProbCache for various cache sizes for the GARR topology and the YouTube trace.
We observe from Figure 2a that the content diversity of CL4M and ProbCache is considerably higher in comparison to LCE. This content diversity also translates to improved performance (e.g., hit rate), with both ProbCache and CL4M outperforming LCE [4].

From the above discussion, it is evident that static and dynamic caching attempt to exploit different aspects of the request stream to improve performance. Recent work has explored the benefits of hybrid caching that combines the best aspects of static and dynamic caching [6]. The key idea is to split a cache into two parts: a static part that caches content based on popularity and a dynamic part that leverages the correlation in the request stream. Determining the optimal split is an important research question that is still being investigated.

We next study the positive performance impact of ideas such as multi-path routing and content search (Figure 2b). As paths to content custodians are always available in a static network, we consider a real-world mobile network comprising pedestrian users to demonstrate their benefits. To this end, we consider the Stockholm pedestrian mobility trace that contains simulation traces of pedestrians walking in a part of downtown Stockholm, covering an area of 5872 sq. m. For our experiments, we consider 300,000 location entries consisting of 587 pedestrians. As mobility can result in frequent path breakages, Figure 2b shows the percentage of requests served for various cache sizes if we adopt either multi-path routing or content search on top of shortest path routing. We observe from the figure that both multi-path routing (denoted by MP) and content search aid in serving a greater number of requests. We also observe that using shortest path (denoted by SP) and content search together serves significantly more requests than multi-path forwarding. The reason for the limited performance improvement of multi-path routing is that both paths are calculated based on the prior network state; thus, when the shortest path breaks due to node mobility, the alternate path to the custodian is also likely to be broken. In comparison, a large number of requests can be served by content search because node mobility helps in exploiting content diversity by searching new neighbors.

IV. MACHINE LEARNING APPROACHES FOR CACHING AND ROUTING
So far, we have focused on the design principles and analysis of caching and routing strategies. In this section, we first investigate the benefits of adopting machine learning techniques to solve the caching and routing problem and then list some of the challenges that need to be overcome to allow the seamless adoption of machine learning for these problems.
A. Possible Machine Learning Models
The performance of caching strategies can be improved if one can accurately predict changes in content popularity and determine what content is likely to be requested in the future. Learning web request streams and using them to improve the performance of HTTP caches was well investigated in the early part of this century. Models such as n-gram models, Markov models and Markov trees developed in the context of HTTP caches can also be adopted in cache networks with certain modifications.
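As a concrete example of such a model, a first-order Markov predictor over a (synthetic) request stream can be sketched as follows; a cache could prefetch or protect the predicted next content:

```python
from collections import Counter, defaultdict

# Sketch of a first-order Markov model of a request stream: count
# transitions between consecutive requests and predict the most likely
# next content. The trace below is synthetic.

def fit_transitions(trace):
    counts = defaultdict(Counter)
    for prev, nxt in zip(trace, trace[1:]):
        counts[prev][nxt] += 1          # transition prev -> nxt observed
    return counts

def predict_next(counts, current):
    if current not in counts:
        return None                     # unseen content: no prediction
    return counts[current].most_common(1)[0][0]

trace = ["a", "b", "a", "b", "a", "c", "a", "b"]
model = fit_transitions(trace)
print(predict_next(model, "a"))   # prints: b  ("a" is followed by "b" 3 of 4 times)
```

Higher-order n-gram variants condition on a longer request history at the cost of a much larger state space.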
Reinforcement Learning:
The necessity of making joint caching and routing decisions to improve performance makes this problem an ideal candidate for reinforcement learning methods. To this end, a reinforcement learning method called Q-caching has been proposed that builds on standard Q-routing to make joint caching and routing decisions. Q-caching increases network content diversity and reduces both the load at custodians and content download times for clients. Similarly, in recent work, the authors propose a reinforcement learning approach that uses model-free acceleration for online cache replacement by taking predicted content popularity, cache hits and replacement costs into account.
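The following toy sketch is in the spirit of Q-caching but is not the published algorithm: a tabular Q-learning update where the state is a content, the actions are cache/skip, and the reward (delay saved) is hypothetical:

```python
import random
from collections import defaultdict

# Illustrative tabular Q-learning for a caching decision. State =
# content id, actions = "cache"/"skip", reward = hypothetical delay
# saved on subsequent hits. Not the published Q-caching algorithm.

ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
Q = defaultdict(lambda: {"cache": 0.0, "skip": 0.0})

def choose(content, rng=random.random):
    if rng() < EPS:                                  # explore
        return random.choice(["cache", "skip"])
    return max(Q[content], key=Q[content].get)       # exploit

def update(content, action, reward, next_content):
    best_next = max(Q[next_content].values())
    Q[content][action] += ALPHA * (reward + GAMMA * best_next
                                   - Q[content][action])

update("video1", "cache", reward=10.0, next_content="video2")
print(round(Q["video1"]["cache"], 2))   # 1.0
```

Over many such updates, contents whose caching repeatedly pays off accumulate high Q-values for the "cache" action.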
Deep Learning:
Deep learning models have also been proposed to predict future content popularity variations by taking the spatio-temporal features of popularity into account. For example, a recently proposed approach, DeepCache [9], uses a deep LSTM-based encoder-decoder model to predict changes in content popularity. The authors then design a caching policy that takes the predicted content popularity information into account to make smart caching decisions. Similarly, other recent work has proposed lightweight cache insertion and eviction schemes that take these deep learning predictions into consideration.
Deep Reinforcement Learning:
In recent years, multiple deep reinforcement learning approaches (e.g., asynchronous advantage actor-critic (A3C), deep deterministic policy gradient (DDPG), Deep Q Networks (DQN), Trust Region Policy Optimization (TRPO)) have been designed and shown to provide good performance in multiple different domains. Most of the above-mentioned approaches are based on deep actor-critic architectures, a recent architectural improvement to reinforcement learning. Actor-critic architectures are also the easiest to adapt to an online training setting. For example, the asynchronous nature of A3C helps in designing parallel and distributed implementations of the algorithm that allow for greater exploration of the state space, resulting in good test performance on previously unseen data. Similarly, the off-policy updates in DDPG allow for a wide exploration of the state space. DDPG also has provisions to learn from a large amount of past data and from uncorrelated transitions in the replay buffer.
Transfer Learning and Bandit Models:
Along with reinforcement learning, transfer learning and bandit models are also good candidates for determining changes in content popularity and making joint caching and routing decisions. The main idea behind transfer learning is to leverage the knowledge gained in one domain and apply it to a related but 'new' domain. In this regard, content popularity estimation, and in turn caching performance, can potentially be improved by taking into account information such as a user's location and social networking connections. To apply bandit models to cache networks, we can assume the agent to have only partial knowledge of the network. Based on this current knowledge, the agent takes actions to maximize its accumulated reward while acquiring new knowledge.
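A bandit view of caching can be sketched with a simple ε-greedy learner; the arms, rewards and parameters below are illustrative only (arms are candidate contents for a cache slot, and the reward is 1 on a hit for the pinned content):

```python
import random

# Sketch of an epsilon-greedy bandit for choosing which content to pin
# in a cache slot. Arms = contents; reward = 1 on a hit, 0 otherwise.

class EpsGreedy:
    def __init__(self, arms, eps=0.1):
        self.eps = eps
        self.pulls = {a: 0 for a in arms}
        self.value = {a: 0.0 for a in arms}    # running mean reward

    def select(self, rng=random.random):
        if rng() < self.eps:                   # explore a random arm
            return random.choice(list(self.pulls))
        return max(self.value, key=self.value.get)   # exploit best arm

    def feedback(self, arm, reward):
        self.pulls[arm] += 1
        n = self.pulls[arm]
        self.value[arm] += (reward - self.value[arm]) / n  # incremental mean

bandit = EpsGreedy(["a", "b"])
bandit.feedback("a", 1.0)          # "a" produced a hit
bandit.feedback("b", 0.0)          # "b" produced a miss
print(bandit.select(rng=lambda: 1.0))   # prints: a  (exploits the higher mean)
```

The ε-greedy exploration term keeps the agent sampling currently unpopular contents, which matters when popularity drifts over time.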
Probabilistic Graphical Models:
Probabilistic graphical models (both discriminative and generative) are also good candidates for predicting content popularity variations [18]. Moreover, the low computational requirements of graphical models during the training phase make them a more attractive option than deep learning models. While discriminative models (e.g., conditional random fields (CRFs)) only learn the conditional dependencies of the output variables (future predictions) given the input variables (past data) at training time, generative models (e.g., Markov random fields (MRFs), hidden Markov models (HMMs)) learn the joint dependencies in the entire data. While at a cursory glance generative models may appear superior to discriminative models because they jointly model the dependencies in the entire data, prior work has demonstrated that discriminative models often achieve superior prediction performance as they are tuned to maximize performance by learning structured outputs. In comparison, generative models capture the inherent dependencies in the data and help in accurately generating the data, thus enabling us to better understand the underlying network characteristics.
B. Challenges
One of the biggest hurdles in adopting machine learning models, in particular deep models, is the high computational resource requirement for training. This computational overhead makes it harder to train the models in an online manner. Apart from the computational overhead, another well-known issue with deep learning models is their lack of interpretability. Due to the complex interconnection of cells in a neural network architecture as well as the large number of hidden layers, it is often very difficult to explain the predictive performance of deep learning models. Another important issue that has been largely overlooked in prior research is the applicability of a trained model to a variety of different settings. In the majority of prior work, even though the models are trained and tested on different data, both the train and test data are usually collected in a similar setting. In our preliminary investigation, we have found that a model trained in one setting may not necessarily perform well in a new setting. The main reason is that the sequences and data seen by the model at test time may not be similar to the ones seen at training time, leading to poor performance at test time. A fundamental question that arises in this regard is: how does one design training datasets so as to obtain good overall performance even in previously unseen network settings?

V. CONCLUSION
In this paper, we highlighted the key principles adopted in the design of model-based caching and routing strategies to improve performance in cache networks. We then discussed the applicability of machine learning models, in particular deep learning, reinforcement learning, transfer learning and probabilistic graphical models, for this problem.

REFERENCES

[1] G. S. Paschos, G. Iosifidis, M. Tao, D. Towsley, and G. Caire, "The role of caching in future communication systems and networks," to appear in JSAC Special Issue on Caching, 2018.
[2] I. U. Din, S. Hassan, M. K. Khan, M. Guizani, O. Ghazali, and A. Habbal, "Caching in information-centric networking: Strategies, challenges, and future research directions," IEEE Communications Surveys & Tutorials, vol. 20, no. 2, pp. 1443–1474.
[3] J. Pfender, A. Valera, and W. K. Seah, "Performance comparison of caching strategies for information-centric IoT," in Proceedings of the 5th ACM Conference on Information-Centric Networking, 2018, pp. 43–53.
[4] B. Banerjee, A. Kulkarni, and A. Seetharam, "Greedy caching: An optimized content placement strategy for information-centric networks," Computer Networks, vol. 140, pp. 78–91, 2018.
[5] B. Banerjee, A. Seetharam, and C. Tellambura, "Greedy caching: A latency-aware caching strategy for information-centric networks," in Networking, 2017 International Conference on. IFIP, 2017.
[6] A. Kulkarni and A. Seetharam, "Exploiting correlations in request streams: A case for hybrid caching in cache networks," in Local Computer Networks, 2017 International Conference on. IEEE, 2018.
[7] B. Banerjee, A. Seetharam, A. Mukherjee, and M. K. Naskar, "Characteristic time routing in information-centric networks," Computer Networks, vol. 113, pp. 148–158, 2017.
[8] M. Dehghan, B. Jiang, A. Seetharam, T. He, T. Salonidis, J. Kurose, D. Towsley, and R. Sitaraman, "On the complexity of optimal request routing and content caching in heterogeneous cache networks," IEEE/ACM Transactions on Networking (TON), vol. 25, no. 3, pp. 1635–1648, 2017.
[9] A. Narayanan, S. Verma, E. Ramadan, P. Babaie, and Z.-L. Zhang, "DeepCache: A deep learning based framework for content caching," in Proceedings of the 2018 Workshop on Network Meets AI & ML. ACM, 2018, pp. 48–53.
[10] M. Garetto, E. Leonardi, and V. Martina, "A unified approach to the performance analysis of caching systems," ACM Transactions on Modeling and Performance Evaluation of Computing Systems, vol. 1, no. 3, p. 12, 2016.
[11] A. Banerjee, B. Banerjee, A. Seetharam, and C. Tellambura, "Content search and routing under custodian unavailability in information-centric networks," Computer Networks, vol. 141, pp. 92–101, 2018.
[12] A. Rizk, M. Zink, and R. Sitaraman, "Model-based design and analysis of cache hierarchies," in Networking, 2017 International Conference on. IFIP, 2017.
[13] E. Rosensweig, "On the analysis and management of cache networks," 2012.
[14] L. Qiu and G. Cao, "Cache increases the capacity of wireless networks," in INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications. IEEE, 2016, pp. 1–9.
[15] ——, "Popularity-aware caching increases the capacity of wireless networks," in INFOCOM 2017 - IEEE Conference on Computer Communications. IEEE, 2017, pp. 1–9.
[16] M. Mahdian, S. Ioannidis, and E. Yeh, "Kelly cache networks." IEEE, 2019.
[17] J. D. Herath and A. Seetharam, "Analyzing opportunistic request routing in wireless cache networks." IEEE, 2018, pp. 1–6.
[18] D. Koller, N. Friedman, and F. Bach, Probabilistic Graphical Models: Principles and Techniques, 2009.