A Survey of Deep Learning for Data Caching in Edge Network
A Preprint
Yantong Wang and Vasilis Friderikos
Center for Telecommunications Research, Department of Engineering, King's College London
London, WC2R 2LS, U.K.
[email protected], [email protected]

Abstract
The concept of edge caching provision in emerging 5G and beyond mobile networks is a promising method to deal with the traffic congestion problem in the core network as well as to reduce the latency of accessing popular content. In that respect, end user demand for popular content can be satisfied by proactively caching it at the network edge, i.e., in close proximity to the users. In addition to model-based caching schemes, learning-based edge caching optimization has recently attracted significant attention, and the aim hereafter is to capture these recent advances for both model-based and data-driven techniques in the area of proactive caching. This paper summarizes the utilization of deep learning for data caching in edge networks. We first outline the typical research topics in content caching and formulate a taxonomy based on the network hierarchical structure. Then, a number of key types of deep learning algorithms are presented, ranging from supervised learning to unsupervised learning as well as reinforcement learning. Furthermore, a comparison of state-of-the-art literature is provided from the aspects of caching topics and deep learning methods. Finally, we discuss research challenges and future directions of applying deep learning for caching.

Keywords: deep learning · content caching · network optimization · edge network

1 Introduction

Undoubtedly, future 5G and beyond mobile communication networks will have to address stringent requirements of delivering popular content at ultra high speeds and low latency due to the proliferation of advanced mobile devices and data-rich applications. In that ecosystem, edge caching has received significant research attention over the last decade as an efficient technique to reduce delivery latency and network congestion, especially during peak-traffic times or during unexpected network congestion episodes, by bringing popular data closer to the end users.
One of the main reasons for enabling edge caching in the network is to reduce the number of requests that traverse the access and core mobile network, as well as to reduce the load at the origin servers, which would otherwise have to respond to all requests directly in the absence of edge caching. In that case, popular content and objects can be stored in and served from edge locations, which are closer to the end users. This operation is also beneficial from the end user perspective, since edge caching can dramatically reduce the overall latency to access the content and, in that sense, improve the overall user experience. It is also important to note what the notion of popular content means: requests for the top 10% of video content on the Internet account for almost 80% of all traffic, which corresponds to multiple requests for the same content from different end users [1].

Recently, deep learning (DL) has attracted significant attention from both academia and industry and has been applied to diverse domains such as self-driving, medical diagnosis, and playing complex games such as Go [2]. DL has also made its way into the communications area [3]. In this paper, we focus on the application of DL to caching policies. Though there are some earlier surveys related to machine learning applications, they either focus on general machine learning techniques for caching [4, 5, 6] or concentrate on wireless applications overall [7, 8, 9]. The work [3] provides a big picture of applying machine learning in wireless communications. In [4], the authors consider machine learning for both caching and routing strategies. A comprehensive survey on machine learning applications for caching content in edge networks is provided in [5]. The researchers in [6] provide a survey on machine learning for mobile edge caching and communication resources. On the other hand, [7] overviews how artificial neural networks can be employed for various wireless network problems.
The authors in [8] detail a survey on deep reinforcement learning (DRL) for issues in communications and networking. [9] presents a comprehensive survey on deep learning applications and the edge computing paradigm. Our work can be distinguished from the aforementioned papers by the fact that we focus on deep learning techniques for content caching, and both wired and wireless caching are taken into account. Our main contributions are listed as follows:

• We classify the content caching problem into Layer 1 Caching and Layer 2 Caching. Caching at each layer consists of four tightly coupled subproblems: where to cache, what to cache, cache dimensioning and content delivery. Related research is surveyed accordingly.

• We present the fundamentals of DL techniques which are widely used in content caching, such as the convolutional neural network, the recurrent neural network, actor-critic model based deep reinforcement learning, etc.

• We analyze a broad range of state-of-the-art literature which applies DL to content caching. These papers are compared based on the DL structure, the coupled caching subproblems at each layer and the objective of DL in each scenario. Then we discuss research challenges and potential directions for the utilization of DL in caching.

Figure 1: Survey Architecture

The rest of this survey is organized as follows (as illustrated in Figure 1). Section 2 presents the categories of the content caching problem. Section 3 reviews typical deep neural network structures. In Section 4, we list state-of-the-art DL-based caching strategies and their comparison. Section 5 discusses challenges as well as potential research directions. In the end, Section 6 concludes this paper. For better readability, the abbreviations used in this paper are listed in Table 1.
2 Data Caching in Edge Networks

The paradigm of data caching in edge networks is illustrated in Figure 2. Similarly to [10], the scope of the edge in this paper is the path between the end user and the data server, which contains the Content Router (CR), Macro Base Station (MBS), Femto Base Station (FBS) and End Device (ED). In the context of the Cloud-Radio Access Network (C-RAN) [11], both the baseband unit (BBU) and the remote radio head (RRH) are considered as potential caching candidates

Table 1: List of Abbreviations.
Abbr.     Description
3C        Computing, Caching and Communication
A3C       Asynchronous Advantage Actor-Critic
BBU       Baseband Unit
CCN       Content-Centric Network
CNN       Convolutional Neural Network
CoMP-JT   Coordinated Multi-Point Joint Transmission
CR        Content Router
C-RAN     Cloud-Radio Access Network
CSI       Channel State Information
D2D       Device to Device
DDPG      Deep Deterministic Policy Gradient
DL        Deep Learning
DNN       Deep Neural Network
DQN       Deep Q Network
DRL       Deep Reinforcement Learning
DT        Digital Twin
ED        End Device
ES        Edge Server
ESN       Echo-State Network
ETSI      European Telecommunication Standardization Institute
FBS       Femto Base Station
FIFO      First In First Out
FNN       Feedforward Neural Network
ICN       Information-Centric Network
LFU       Least Frequently Used
LP        Linear Programming
LRU       Least Recently Used
LSTM      Long Short-Term Memory
MAR       Mobile Augmented Reality
MBS       Macro Base Station
MD        Mobile Device
MILP      Mixed Integer Linear Programming
NFV       Network Function Virtualization
PNF       Physical Network Function
PPO       Proximal Policy Optimization
QoE       Quality of Experience
RL        Reinforcement Learning
RNN       Recurrent Neural Network
RRH       Remote Radio Head
SAE       Sparse Auto Encoder
SDN       Software Defined Network
seq2seq   Sequence to Sequence
SNM       Shot Noise Model
TRPO      Trust Region Policy Optimization
TTL       Time to Live
VNF       Virtual Network Function
WSN       Wireless Sensor Network

to host content, where the BBUs are clustered centrally as a BBU pool and the RRHs are deployed distributively near the BSs' antennas. According to the hierarchical structure of the edge network, data caching is classified into two categories: Layer 1 Caching and Layer 2 Caching. In this section we illustrate the typical research topics in these two areas.
2.1 Layer 1 Caching

In Layer 1 Caching, the popular content is considered to be hosted in CRs. In the context of the Information-Centric Network (ICN), a CR plays dual roles, both as a typical router (i.e. data flow forwarding) and as a content store (i.e. a local area data caching facility). Generally, the CR is connected via wired networks. Layer 1 Caching consists of four tightly coupled problems: where to cache, what to cache, cache dimensioning and content delivery [12].

Where to cache focuses on selecting the proper CRs to host the content. For instance, in Figure 2, content replicas can be placed in CRs at lower hierarchical levels, such as routers B and C, as a means of reducing the transmission cost, at the price of an extra cost for hosting the content; conversely, consolidating caching in CR A can be adopted to save caching cost at the expense of a higher transmission cost and the risk of violating end users' delay requirements. Here the caching/hosting cost is the cost of deploying the content, which can be measured by space utilization, energy consumption or other metrics. The transmission cost represents the price of delivering the content from the caching CR (or the data server) to the end user and is typically estimated via the number of hops. The where to cache problem has usually been modelled as a Mixed Integer Linear Program (MILP):

    min_x   c^T x        (1a)
    s.t.    Ax ≤ b       (1b)
            x ∈ {0, 1}   (1c)
    or      x ≥ 0        (1d)

where x is the decision variable. Normally it is a binary variable indicating the CR assignment. In special cases, for the purpose of modelling or linearization, some non-binary auxiliary variables are introduced, as constraint (1d) shows. If caching only a part of a file rather than the complete file is taken into consideration, the decision variable x becomes a continuous variable representing the segments hosted in the CR; then constraint (1c) becomes x ∈ [0, 1] and the MILP turns into a linear program (LP). There are many works allocating content via MILP with different objectives and limitations.
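As an illustration of the trade-off captured by model (1), the following sketch solves a toy where-to-cache instance by exhaustive search over the binary placement vector. All numbers (hosting costs, hop counts, topology) are hypothetical, and brute force stands in for a MILP solver only because the instance is tiny:

```python
from itertools import product

# Toy instance of model (1): choose which content routers (CRs) host a replica,
# trading hosting cost against hop-based transmission cost (all values hypothetical).
caching_cost = [4, 2, 2]        # cost of hosting the content at CR A, B, C
hops = [                        # hops[u][r]: hop distance from user u to CR r
    [1, 3, 2],
    [3, 1, 2],
]
server_hops = [5, 5]            # fallback: hops from each user to the origin server

def total_cost(x):
    """Hosting cost plus per-user transmission cost (nearest replica, else server)."""
    host = sum(c for c, placed in zip(caching_cost, x) if placed)
    tx = 0
    for u, fallback in enumerate(server_hops):
        options = [hops[u][r] for r, placed in enumerate(x) if placed]
        tx += min(options + [fallback])
    return host + tx

# Exhaustive search over binary placements x in {0, 1}^3, i.e. constraint (1c).
best = min(product([0, 1], repeat=3), key=total_cost)
print(best, total_cost(best))
```

With these numbers the search returns a single cheap CR: replicating everywhere doubles the hosting cost for almost no transmission saving, which is exactly the tension the MILP objective encodes.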
Figure 2: Data Caching in Edge Network

The authors in [13] propose a model to minimize the user delay and the load balancing level of CRs while satisfying the cache space constraint. The work in [14] considers a trade-off between caching and transmission cost under cache space, link bandwidth and user latency constraints. In [15], an energy efficient optimization model is constructed, consisting of caching energy and transport energy. [16] provides more details of the mathematical models and related heuristic algorithms for caching deployment in wired networks.

What to cache concentrates on selecting the proper contents for the CRs with the purpose of maximizing the cache hit ratio. By exploiting the statistical patterns of user requests, the popularity of requested information and user preferences can be forecast, and these play a very significant role in determining the content to cache. On the one hand, from the view of aggregated content requests, researchers have proposed many different models and algorithms for popularity estimation. One widely used model in web caching is the Zipf model, based on the assumption that the content popularity is static and each user's request is independent [17]. However, this method fails to reflect the temporal and spatial correlations of the content, where the temporal correlation reflects that popularity varies over time and the spatial correlation represents that content preference differs across geographical areas and socio-cultural groups. A temporal model named the shot noise model (SNM) is built in [18], which enables users to estimate the content popularity dynamically. Inspired by SNM, the work in [19] considers both spatial and temporal characteristics in the caching decision. On the other hand, from the view of a specific end user during a certain period, caching his/her preferred content (which may not be popular network-wide) can also help to reduce the traffic flow. Many approaches from recommendation systems can be applied in this case [20].
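The static Zipf popularity model [17] mentioned above can be sketched in a few lines; the skew parameter alpha is a hypothetical value chosen for illustration:

```python
# Zipf popularity model: the request probability of the k-th most popular of
# N contents is p(k) proportional to 1 / k^alpha (alpha is an illustrative value).
def zipf_popularity(n_contents, alpha=0.8):
    weights = [1.0 / (k ** alpha) for k in range(1, n_contents + 1)]
    total = sum(weights)
    return [w / total for w in weights]

probs = zipf_popularity(100, alpha=0.8)
# The distribution is static and heavily skewed: a small head of the catalogue
# captures a large share of all requests, which is what makes caching pay off.
print(f"top 10 of 100 contents attract {sum(probs[:10]):.1%} of requests")
```

Under this model the popularity ranking never changes, which is precisely the limitation (no temporal or spatial correlation) that SNM-style models address.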
Another aspect of the what to cache problem is the design of cache eviction strategies for when the storage space faces the risk of overflow. Depending on the lifetime of cached contents, these policies can be divided roughly into two categories: in the first, which includes first in first out (FIFO), least frequently used (LFU), least recently used (LRU) and randomized replacement, contents are not removed until no more memory is available; the other is the time to live (TTL) strategy, where eviction happens once the related timer expires. [21] presents an analytic model for the hit ratio in TTL-based caches requested by independent and identically distributed flows. It is worth noting that in [21], the TTL-based cache policy is used for the consistency of dynamic contents instead of content replacement. In [22], the authors introduce a TTL model for cache eviction in which the timer is reset once a cache hit on the related content happens.

Cache dimensioning addresses how much storage space should be allocated. Benefiting from softwarization and virtualization technologies, the cache size in each CR or edge cloud can be managed in a more flexible and dynamic way, which makes cache dimensioning decisions an important feature in data caching. Technically, the cache hit ratio rises with increasing cache memory, and consequently eases the traffic congestion in the core network. However, excessive space allocation wastes resources, such as the energy needed to support the caching function. Hence there is a trade-off between cache size cost and network congestion. Economically, consider the scenario where a small content provider wants to rent service from a CDN provider such as Akamai or Huawei Cloud: there is again a balance between investment savings and network performance. In [23], the proper cache size of an individual CR in a Content-Centric Network (CCN) is investigated by exploiting the network topology.
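The two eviction families described earlier, capacity-triggered LRU and timer-triggered TTL (with the hit-resets-timer behaviour of [22]), can be combined in a minimal sketch; the class and its interface are hypothetical, and time is passed in explicitly to keep the example deterministic:

```python
from collections import OrderedDict

# Minimal sketch of combined eviction: LRU when capacity overflows,
# TTL expiry otherwise, with the timer reset on every cache hit (as in [22]).
class TTLLRUCache:
    def __init__(self, capacity, ttl):
        self.capacity, self.ttl = capacity, ttl
        self.store = OrderedDict()              # key -> (value, expiry time)

    def get(self, key, now):
        if key not in self.store:
            return None                         # cache miss
        value, expiry = self.store[key]
        if now >= expiry:                       # timer expired: evict on access
            del self.store[key]
            return None
        self.store[key] = (value, now + self.ttl)   # hit resets the TTL timer
        self.store.move_to_end(key)             # mark as most recently used
        return value

    def put(self, key, value, now):
        self.store[key] = (value, now + self.ttl)
        self.store.move_to_end(key)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)      # capacity overflow: evict LRU entry

cache = TTLLRUCache(capacity=2, ttl=10)
cache.put('a', 1, now=0)
cache.put('b', 2, now=1)
print(cache.get('a', now=2))    # hit: 'a' becomes most recently used
cache.put('c', 3, now=3)        # overflow: 'b' (least recently used) is evicted
```

A real deployment would of course use the system clock and per-content sizes; the point here is only how the two eviction triggers compose.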
In [24], the authors consider the effect of the network traffic distribution and user behaviour when designing the cache size.

Content delivery considers how to transfer the cached content to the requesting user. The delivery traffic embraces single cache file downloading and video content streaming, and the metrics for these two scenarios differ. Regarding file downloading, the content cannot be consumed until the delivery is completed, so the downloading time of the entire file is viewed as the metric reflecting the quality of experience (QoE). For video streaming, especially for large videos split into several chunks, the delay limitation applies only to the first chunk. In that case, delivering the first chunk in time and keeping the transmission of the remaining chunks smooth are the key aims [25]. Apart from these metrics, another problem in content delivery is the routing policy. CCN [26], one implementation of the ICN architecture, employs a flooding-based name routing protocol to publish the request among caching CRs. On the one hand, the flooding strategy simplifies the design complexity and reduces the maintenance cost, particularly in unstable scenarios; on the other hand, it wastes bandwidth resources. In [27], the authors discuss the optimal radius in scoped flooding. The delivery route is often considered jointly with the where to cache problem, in which case the objective function (1a) includes both deployment and routing costs.

2.2 Layer 2 Caching

In contrast to Layer 1 Caching over wired connections, Layer 2 Caching considers implementing caching techniques in wireless networks. Though both need to solve the where to cache, what to cache, cache dimensioning and content delivery problems, wireless caching is more challenging, and some mature strategies from wired caching cannot be migrated directly to the wireless case.
Some of the reasons are the following: the resources in the wireless environment, such as caching storage and spectrum, are limited compared with the CRs in Layer 1 Caching; the mobility of end users and dynamic network topologies also need to be considered during the design of caching strategies; moreover, the wireless channels are uncertain since they can be affected by fading and interference.

In wireless caching, where to cache focuses on finding the proper candidates among the MBS, FBS, ED, and even the BBU pool and RRH in C-RAN, to host the content. Caching at the MBS and FBS can alleviate backhaul congestion, since end users obtain the requested content from the BS directly instead of from a CR via backhaul links. Compared with the FBS, the MBS has wider coverage and, typically, there is no overlap among different MBSs [28]. As mentioned above, the caching space in BSs is limited and it is impractical to cache all popular content. With the aim of improving the cache hit ratio, a MILP-modelled collaborative caching strategy among MBSs is proposed in [29]: if the accessed MBS does not host the content, the request is served by a neighbouring MBS which caches the file, rather than by the data server. For FBS caching, a distributed caching method is presented in [30], whose main idea is that an ED located in the coverage overlap of several FBSs is able to obtain content from multiple hosts. Caching at the ED can not only ease backhaul congestion but also improve the area spectral efficiency [28]. When an end user requests a content, he/she is served from local storage if the content is precached in his/her ED, or by an adjacent ED via D2D communication if the content is hosted there. In [31], the authors model the cache-enabled D2D network as a Poisson cluster process, where end users are grouped into several clusters and the collective performance is improved. From an individual's perspective, caching content of interest to other users affects personal benefit.
In [32], a Stackelberg game model is applied to formulate the conflict among end users and a related incentive mechanism is designed to encourage content sharing. For the case of the cache-enabled C-RAN, caching at the BBU can ease the traffic congestion in the backhaul, while caching at the RRH can reduce the fronthaul communication cost. On the other hand, caching everything at the BBU raises the signaling overhead of the BBU pool, while caching everything at the RRH weakens the processing capability. Therefore, where to cache the content in C-RAN makes a substantial contribution to balancing the signal processing capability at the BBU pool against the backhaul/fronthaul costs [28]. The work in [33] investigates caching at RRHs, jointly considering the cell outage probability and fronthaul utilization. Due to end users' mobility, the prediction/awareness of user movement behaviour also influences the proper host selection; some works exploit user mobility in cache strategy design, such as [34] and [35].

As in Layer 1, the what to cache decision as well as the eviction policy of Layer 2 depends on the accurate prediction of content popularity or user preference in proactive caching methods. The content popularity exhibits the temporal and spatial correlations already described for Layer 1 Caching. In Layer 2 Caching, the proper spatial granularity in popular content estimation needs special attention [36]. For example, the coverage of the MBS and FBS is different, which makes the popularity observed at the MBS and FBS different as well: the former is based on the behaviour of a large number of users, whereas individuals may prefer specific content categories. For small cells, the preference estimation requires more accurate information, such as historical data [28].
In order to capture the temporal and spatial dynamics of user preference, many different deep learning based algorithms have been proposed, which will be illustrated in Section 4.

Cache dimensioning in Layer 2 Caching has more complicated factors to consider, including not only the network topology and content popularity, as in Layer 1 Caching, but also the backhaul transmission status and wireless channel features. The proper cache size assignment is studied in the scenario of a backhaul-limited cellular network in [37], which also provides a closed-form bound on the minimum cache size in the one-cell case. For dense wireless networks, the work in [38] quantifies the minimum cache required to achieve linear capacity scaling of the network throughput. The authors of [39] also consider dense networks: they derive a closed form for the optimal memory size which reduces the consumption of backhaul capacity as well as guaranteeing wireless QoS.

According to the number of transmitters and receivers, we divide content delivery in Layer 2 Caching into three categories: one candidate serves one end user, as in unicast and D2D transmission; one candidate serves multiple users, as in multicast; and coordinated delivery, where multiple transmitters serve one or more receivers, as in coordinated multi-point joint transmission (CoMP-JT). Once the requested content is cached locally, the BS can serve the end user via unicast, or an adjacent device can share the content via D2D transmission. Concurrent transmission carries the risk of co-channel interference in densely deployed networks; in D2D networks, link scheduling is introduced to select subsets of links to transmit simultaneously [28]. With the aim of improving the spectral efficiency, multicast is applied in content delivery when serving multiple simultaneous requests for the same content. There is therefore a trade-off between spectral efficiency and service delay.
To serve more users in one transmission and achieve higher spectral efficiency, the BS waits to collect enough requests for the same content, which imposes a long waiting time on the first request. An optimal dynamic multicast scheduling is proposed in [40] to balance these two factors. Multicast can also serve multiple requests for different contents. In [41], the authors provide a coded caching scheme which requires the communication link to be error free; each user caches a part of its own content and parts of the other users' contents, and the BS then multicasts the coded data to all users. Each user can decode its own requested content by an XOR operation between the received data and the precached parts of the other users' files. However, the coding complexity increases exponentially as the number of end users grows. CoMP-JT can also improve the spectral efficiency by sharing channel state information (CSI) and contents among BSs, but it requires high-capacity backhaul for exchanging data. In C-RAN, the BBUs are centralized in the BBU pool, which makes communication among BSs very efficient. [42] designs CoMP-JT in C-RAN with the purpose of minimizing power consumption under limitations on transmission energy, link capacity and requested QoS.
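The XOR decoding step of the coded multicast idea in [41] can be illustrated with two users and single-byte "files"; the byte values are arbitrary, and an error-free link is assumed as in the original scheme:

```python
# Toy version of coded multicast delivery: two users, one coded transmission.
file_a, file_b = 0b1010_1100, 0b0101_0011   # contents requested by user 1 and user 2

# Placement phase: each user precaches the *other* user's file.
cache_user1, cache_user2 = file_b, file_a

# Delivery phase: the BS multicasts a single coded message to both users.
coded = file_a ^ file_b

# Each user recovers its own request by XOR-ing the multicast with its cache.
decoded_user1 = coded ^ cache_user1
decoded_user2 = coded ^ cache_user2
print(decoded_user1 == file_a, decoded_user2 == file_b)
```

One multicast transmission thus replaces two unicasts, which is the spectral-efficiency gain the scheme trades against coding complexity.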
3 Typical Deep Neural Network Structures

As Figure 3 shows, some typical deep neural network (DNN) methods are presented in this section. These models are classified into three categories depending on the training method: supervised learning, unsupervised learning and reinforcement learning.
3.1 Feedforward Neural Network (FNN)

The FNN is a kind of DNN whose information propagation direction is forward, with no cycles among neurons. In this paper, the term FNN is used to denote the fully connected neural network, i.e. the connections between two adjacent layers are complete. According to the Universal Approximation Theorem, an FNN has the ability to approximate any closed and bounded function given enough neurons in the hidden layer [43]. The hidden layer is applied to extract features of the input vector, which then feed the output layer, which works as a classifier. Though the FNN is very powerful, it runs into trouble when dealing with real-world tasks such as image recognition, due to the enormous number of weight parameters (because of the full connectivity) and the lack of data augmentation.
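The forward pass just described (a hidden layer for feature extraction feeding a linear output layer) can be sketched in pure Python; all weights here are arbitrary illustrative values, not trained parameters:

```python
import math

# One fully connected layer: y_j = activation(sum_i w_ji * x_i + b_j).
def dense(x, weights, bias, activation):
    return [activation(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for w, b in zip(weights, bias)]

sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
identity = lambda z: z

x = [0.5, -1.0]                                                    # input vector
hidden = dense(x, [[1.0, -0.5], [0.3, 0.8]], [0.0, 0.1], sigmoid)  # feature extraction
output = dense(hidden, [[2.0, -1.0]], [0.0], identity)             # linear output layer
print(output)
```

Even this tiny two-input network already has eight weight parameters; the quadratic growth of that count with layer width is exactly the scaling problem noted above.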
3.2 Convolutional Neural Network (CNN)

To overcome the aforementioned drawbacks of the FNN, the CNN employs convolution and pooling operations, where the former applies sliding convolutional filters to the input and the latter performs down-sampling, usually via maximum or mean pooling. Recently, CNNs have tended to contain deeper layers and smaller convolutional filters, and the structure has moved towards the fully convolutional network [44], reducing the ratio of pooling layers as well as fully connected layers. Taxonomically, the CNN belongs to the FNN family and has been broadly employed in image recognition, video analysis, natural language processing, etc. One limitation of FNNs, including CNNs, is that the output depends only on the current input vector, so they struggle with sequential tasks.

Figure 3: Typical DNN Structures
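The two operations named above, a sliding convolutional filter and max pooling, reduce to a few lines in one dimension; the signal and the edge-detecting kernel are hypothetical examples:

```python
# Sliding convolutional filter: slide the kernel along the signal,
# producing one weighted sum per position.
def conv1d(signal, kernel):
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# Max pooling: down-sample by keeping the strongest response in each window.
def max_pool(signal, size=2):
    return [max(signal[i:i + size]) for i in range(0, len(signal) - size + 1, size)]

x = [1, 3, 2, 5, 4, 1]
feature_map = conv1d(x, [1, -1])   # a simple difference (edge-detecting) filter
pooled = max_pool(feature_map)
print(feature_map, pooled)
```

Because the same small kernel is reused at every position, the parameter count is independent of the input length, which is the CNN's answer to the FNN's weight explosion.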
3.3 Recurrent Neural Network (RNN)

In order to deal with sequential tasks and use historical information, the RNN employs neurons with self-feedback in the hidden layers. Unlike the hidden neuron in an FNN, the output of a recurrent neuron depends on both the current output of the previous layer and the last hidden state. Whereas the FNN approximates any continuous function, an RNN with the Sigmoid activation function can simulate a universal Turing Machine and thus has the ability to solve any computable problem [45]. It is worth noting that the RNN risks suffering from the long-term dependencies problem [43], including gradient explosion and vanishing. Additionally, the RNN has more parameters to be trained, due to the added recurrent weights. In the following, we introduce some RNN variants, as Figure 4 shows.

Figure 4: RNN Variants
As mentioned, the simple RNN involves more parameters in the training step, where the recurrent weights and input weights are difficult to learn [43]. The basic idea of the ESN is to fix these two kinds of weights and learn only the output weights (the links highlighted in Figure 4). The hidden layer is renamed the reservoir in the ESN, where the neurons are sparsely connected and the weights are randomly assigned. The recurrent weights are kept constant, so the information of previous moments is stored in the reservoir with constant weight, like a voice echoing.
Recently, an efficient way to cope with long-term dependencies in practice is to employ gated RNNs, including the LSTM [43]. We compare it with the recurrent neuron in the simple RNN. Internally, the LSTM introduces three gates to control signal propagation: the input gate I decides the portion of the input signal to be stored; the forget gate F controls the ratio of the last moment's memory to be kept until the next period (the name "forget gate" may be a little misleading, because it actually represents the ratio to be remembered); and the output gate O influences the proportion of the current state to be delivered. Externally, the LSTM has four inputs, comprising one input signal and three control signals for the three gates. All four signals are derived via calculation over the current network input and the state delivered at the last moment.

A typical application of the RNN is converting one sequence to another sequence (seq2seq), as in machine translation. Conventionally, the output of a seq2seq architecture is a probability distribution over an output dictionary. However, this cannot handle problems where the size of the output depends on the length of the input, due to the fixed output dictionary. In [46], the authors modify the output to be a distribution over the input sequence, which is analogous to pointers in C/C++. The pointer network has been widely used in text condensation.
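The gate mechanics described above can be written out for a single scalar LSTM cell; the weights are arbitrary illustrative values, and biases are omitted for brevity:

```python
import math

sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))

# One LSTM step for a scalar state, mirroring the three-gate description above.
def lstm_step(x, h_prev, c_prev, w):
    i = sigmoid(w["xi"] * x + w["hi"] * h_prev)    # input gate I: portion of new signal to store
    f = sigmoid(w["xf"] * x + w["hf"] * h_prev)    # forget gate F: ratio of old memory to keep
    o = sigmoid(w["xo"] * x + w["ho"] * h_prev)    # output gate O: proportion of state delivered
    g = math.tanh(w["xg"] * x + w["hg"] * h_prev)  # candidate memory content
    c = f * c_prev + i * g                         # new cell state (gated mix of old and new)
    h = o * math.tanh(c)                           # new hidden state, the delivered signal
    return h, c

w = {"xi": 0.5, "hi": 0.1, "xf": 0.4, "hf": 0.2,
     "xo": 0.3, "ho": 0.1, "xg": 0.8, "hg": 0.2}
h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.2]:          # process a short input sequence step by step
    h, c = lstm_step(x, h, c, w)
print(h, c)
```

The additive update of the cell state c is what lets gradients survive over long sequences, in contrast to the purely multiplicative recurrence of the simple RNN.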
3.4 Auto Encoder

The auto encoder is a stack of two NNs, named the encoder and the decoder respectively, where the former tries to learn representative characteristics of the input and generate a corresponding code, and the latter reads the code and reconstructs the original input. In order to prevent the auto encoder from simply copying the input, some restrictions are imposed, such as making the dimension of the code smaller than that of the input vector [43]. The quality of an auto encoder can be measured via the reconstruction error, which estimates the similarity between input and output. In most cases, the auto encoder is used to obtain a proper representation of the input vector, so the decoder part is removed after unsupervised training. The code can then be employed as input for further deep learning models.
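The encode–decode–measure cycle can be sketched with a hand-set linear auto encoder. Here the weights are chosen analytically rather than learned: inputs on the line x = t * (1, 2, 3) compress losslessly into a one-number code, while inputs off that line incur reconstruction error:

```python
# A hand-set linear auto encoder with a 1-D code for 3-D inputs.
direction = [1.0, 2.0, 3.0]
norm_sq = sum(d * d for d in direction)

def encode(x):                      # code = projection of x onto the direction
    return sum(xi * di for xi, di in zip(x, direction)) / norm_sq

def decode(code):                   # reconstruct a 3-D vector from the 1-D code
    return [code * di for di in direction]

def reconstruction_error(x):        # squared error between input and reconstruction
    x_hat = decode(encode(x))
    return sum((xi - ri) ** 2 for xi, ri in zip(x, x_hat))

x_on_line = [0.5, 1.0, 1.5]         # matches the encoded structure: error ~ 0
x_off_line = [1.0, 0.0, 0.0]        # does not: information is lost in the bottleneck
print(reconstruction_error(x_on_line), reconstruction_error(x_off_line))
```

The bottleneck (a 1-D code for 3-D inputs) is the restriction mentioned above that prevents trivial copying; training would search for the direction that minimizes the error over the data.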
3.5 Deep Reinforcement Learning

Reinforcement Learning (RL) is a Markov Decision Process represented by a quintuple {S, A, P, R, γ}, where S is the state space, controlled by the environment; A is the action space, determined by the agent; P is the state transition function, measuring the probability of moving to a new state s_{t+1} given the previous state s_t and action a_t; R is the reward function, calculated by the environment considering state and action; and γ is a discount factor for estimating the total reward. During the interaction between agent and environment, the agent observes the current state s_t from the environment, and then takes an action a_t following its policy π. The environment moves to a new state s_{t+1} stochastically based on P(s_t, a_t) and returns a reward r_t to the agent. The aim of RL is to find the policy π that maximizes the accumulated reward Σ_t γ^t r_t. In the early stage, RL focused on scenarios whose S and A are discrete and limited, so the agent can use a table to record this information. Recently, some tasks have enormous numbers of discrete states and actions, such as playing Go, or even continuous values, such as self-driving, which makes tabular recording impractical. To solve this, DRL combines RL and DL, where RL defines the problem and the optimization objective, and DL models the policy and the reward expectation. Depending on the role of the DNN in DRL, we classify DRL into three categories, as the figure shows.

3.5.1 DNN as Critic (Value-Based)

In the value-based method, the DNN does not get involved in the policy decision but estimates the policy performance. Two functions are introduced for this measurement: V^π(s) represents the reward expectation of policy π starting from state s; Q^π(s, a) is the reward expectation of policy π starting from state s and taking action a. In addition, V^π(s) is the expected value of Q^π(s, a). If we can estimate Q^π(s, a), the policy π can be improved by choosing an action a* such that Q^π(s, a*) ≥ V^π(s).
The DNN employed in the agent thus approximates the function Q^π(s, a): the inputs are the state s and action a, and the output is the estimated value Q^π(s, a). Some representative critic methods are Deep Q Networks (DQN) [47] and its variants Double DQN [48], Dueling DQN [49], etc.

3.5.2 DNN as Actor (Policy-Based)

In the policy-based method, the DNN gets involved in the action selection directly, instead of via Q^π(s, a). The policy search can be viewed as an optimization problem, where the objective function is the maximization of the reward expectation and the search space is the policy space. The input of the DNN is the current state and the output is a probability distribution over potential actions. By employing gradient ascent, we can update the DNN to provide better actions and thus maximize the total reward. Some popular algorithms include Trust Region Policy Optimization (TRPO) [50] and Proximal Policy Optimization (PPO) [51].

3.5.3 Actor-Critic Model

Generally, compared with the policy-based approach, the value-based method is less stable and suffers from poor convergence, since the policy is derived from the approximation of Q^π(s, a); but the value-based method is more sample efficient, while the policy-based method falls into local optima more easily because the search space is vast. The actor-critic model combines these two approaches, i.e. the agent contains two DNNs, named the actor and the critic respectively. In each training iteration, the actor considers the current state s and policy π to decide an action a. The environment then changes to state s′ and returns a reward r. The critic updates its own parameters based on the feedback from the environment and outputs a mark for the actor's action. The actor updates the policy π depending on the critic's mark. Some typical algorithms proposed in recent years are Deep Deterministic Policy Gradient (DDPG) [52] and Asynchronous Advantage Actor-Critic (A3C) [53].
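As a concrete miniature of the RL loop above, the following sketch runs tabular Q-learning (the non-deep precursor of DQN) on a hypothetical caching MDP: the state is the single content currently cached, the action replaces it, and the reward is a hit on the next request. All parameters and the 80/20 request distribution are illustrative:

```python
import random
from collections import defaultdict

random.seed(0)

# Tabular Q-learning on a toy caching MDP. Content 0 is requested 80% of the
# time (hypothetical), so the learned policy should always cache content 0.
alpha, gamma, epsilon = 0.1, 0.5, 0.2
Q = defaultdict(float)                      # Q[(state, action)] table
state = 0
for _ in range(5000):
    if random.random() < epsilon:           # epsilon-greedy exploration
        action = random.choice([0, 1])
    else:                                   # greedy action from the Q table
        action = max([0, 1], key=lambda a: Q[(state, a)])
    request = 0 if random.random() < 0.8 else 1
    reward = 1.0 if request == action else 0.0      # reward = cache hit
    next_state = action                             # the cache now holds `action`
    best_next = max(Q[(next_state, 0)], Q[(next_state, 1)])
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = next_state

policy = {s: max([0, 1], key=lambda a: Q[(s, a)]) for s in [0, 1]}
print(policy)
```

When S and A explode beyond what a table can hold, the DQN family replaces the Q table above with a DNN trained on the same temporal-difference target, which is exactly the value-based DRL described in this section.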
4 Deep Learning for Data Caching

We divide the studies on deep learning for data caching in edge networks into four categories depending on the DL tools employed: FNN and CNN; RNN; auto encoder; and DRL. Many recent works utilize more than one DL technique for jointly considered caching problems: for instance, a work may first apply an RNN to predict content popularity, and then a DRL to find suboptimal content placement solutions with reduced time complexity. In such cases, we classify the work under DRL, since the DRL represents the caching allocation policy. Unless a caching location (such as CRs, MBSs, FBSs, EDs or BBUs) is mentioned explicitly, the approaches in this section can be utilized for both Layer 1 and Layer 2 caching. Table 2 summarizes the studies of DL for caching.
In [54], the content delivery problem in wireless networks is formulated as two MILP optimization models with the aims of minimizing the delivery time slots and the energy consumption respectively. Both models consider the data rate for content delivery. Considering the computational complexity of solving MILPs, a CNN is introduced to reduce the feasible region of the decision variables, where the input is the channel coefficient matrix. The FNN in [55] plays a similar role to that in [54], simplifying the search space of the content delivery optimization model.

For the resource allocation problem, the authors of [96] model it as a linear sum assignment problem and then utilize a CNN and a FNN to solve the model. The idea is extended in [56] and [57], where the authors jointly consider the where-to-cache problem among potential CRs and content delivery, modeled as a MILP with the aim of balancing caching and transmission cost while considering user mobility, space utilization and bandwidth limitations. The cache allocation is viewed as a multi-label classification problem and is decomposed into several independent sub-problems, where each one correlates with a CNN to predict the assignment. The input of the CNN is a grey-scale image which combines the information of user mobility, space and link utilization level. In [56], a hill-climbing local search algorithm is provided to improve the performance of the CNN, while in [57] the prediction of the CNN is used to feed a smaller MILP model. For the above works [54, 55, 56, 57, 96], the FNN or CNN input is extracted from the optimization model. The work in [97] trains a CNN on the original graph instead of a parameter matrix/image, which makes the process human-recognizable and interpretable. Though the authors take the traveling salesman problem rather than data caching as an example, the method can be viewed as a potential research direction.

In [58], an ILP model is proposed to minimize the backhaul load of video-type data by determining the portion of cached content in BSs.
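The feasible-region-reduction idea shared by these works can be sketched abstractly as follows: a trained classifier outputs a probability for each binary placement variable, confident predictions are fixed, and only the uncertain variables are left to a (now much smaller) MILP. This is an illustrative sketch of the general pattern, not the exact procedure of [54]–[57]; the thresholds and function name are assumptions.

```python
def reduce_feasible_region(probs, lo=0.1, hi=0.9):
    """Split binary decision variables into fixed and free sets based on a
    classifier's predicted probabilities: confident predictions are fixed to
    0 or 1, uncertain ones are left for the exact solver.
    Returns (fixed, free): fixed maps variable -> 0/1, free lists the rest."""
    fixed, free = {}, []
    for var, p in probs.items():
        if p <= lo:
            fixed[var] = 0          # confidently not selected
        elif p >= hi:
            fixed[var] = 1          # confidently selected
        else:
            free.append(var)        # still decided by the MILP
    return fixed, free
```

The smaller the `free` set, the smaller the residual MILP, at the risk of cutting off the true optimum if the classifier is overconfident.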
Considering the fact that the mobile users covered by a BS change frequently, predicting individual user preference is unnecessary. Instead, the authors concentrate on the generally popular content. At the beginning, a 3D CNN is introduced to extract spatio-temporal features of videos. The popularity of new content without historical information is determined by comparing similar video features. The authors of [59] also consider the spatio-temporal features among visited contents in a mobile bus WiFi environment. By exploiting data collected over the previous 9 days, the content that a user may visit on the last day and the corresponding visiting frequency can be forecast. The social property is taken into account in [60]. By observing users' interests in tweets during the 2016 U.S. election, a CNN-based prediction model can foresee the content category that is most likely to be requested. Such content would be cached in MBSs and FBSs.

Table 2: Summary of Deep Learning for Data Caching

Method | Study | Caching Problem | DL Objective
FNN and CNN | [54] | content delivery | reduce feasible region of time slot allocation
FNN and CNN | [55] | where to cache, content delivery | determine MBSs for caching & delivery duration
FNN and CNN | [56] | where to cache, content delivery | nominate proper CRs for caching
FNN and CNN | [57] | where to cache, content delivery | reduce feasible region for caching
FNN and CNN | [58] | what to cache | extract video features
FNN and CNN | [59] | what to cache | predict requested content & frequency
FNN and CNN | [60] | what to cache | predict requested content
FNN and CNN | [61] | what to cache | predict content popularity
RNN | [62, 63] | what to cache | predict requested content & user mobility
RNN | [64, 65, 66, 67, 68] | what to cache | predict content popularity
RNN | [69] | content delivery | reduce traffic load, select optimal BS subset
Auto Encoder | [70, 71, 72, 73, 74] | what to cache | predict content popularity
Auto Encoder | [75] | what to cache | predict top popular contents
DRL | [76, 77, 78, 79, 80, 81, 82, 83, 84, 85] | what to cache | decide cache placement
DRL | [86] | what to cache | decide cache replacement & power allocation
DRL | [87] | what to cache | predict popularity & search best NN model
DRL | [88] | where to cache | decide cache location
DRL | [89] | content delivery | users grouping
DRL | [90] | where to cache, content delivery | decide BS connection, computation offloading & caching location
DRL | [91] | what to cache, content delivery | decide caching & bandwidth allocation
DRL | [92] | what to cache, content delivery | decide caching, computing offloading & radio resource allocation
DRL | [93] | what to cache, content delivery | decide multicast scheduling & caching replacement
DRL | [94] | where & what to cache | predict popularity, decide caching & task offloading
DRL | [95] | where & what to cache, content delivery | predict user mobility & content popularity, determine D2D link

The work of [61] examines the role of DNNs in caching from another aspect. The authors propose a FNN to predict content popularity as a regression problem.
The results show that the FNN outperforms the RNN, though the latter is believed to be effective for sequential predictions. Moreover, replacing the FNN by a linear estimator does not degrade the performance significantly. The authors explain that the FNN would work better than a linear predictor in the case of incomplete information, and that the RNN has more of an advantage when popularity prediction is modeled as a classification rather than a regression problem.
Considering that RNNs are superior in dealing with sequential tasks, the work [64] applies a bidirectional RNN for online content popularity prediction in mobile edge networks. A simple RNN's output depends on previous and current states, but a bidirectional RNN can also take future information into account. The forecast model consists of three cascaded blocks: a CNN reads user requests and extracts features; a bidirectional LSTM learns the association of requests over time steps; a FNN is added at the end to improve the prediction performance. Content eviction is then based on the popularity prediction.

The authors in [62] utilize an ESN to predict both the content request distribution and the end-user mobility pattern. The user's preference is viewed as context, which links with personal information combining gender, age, job, location, etc. For the request prediction, the input of the ESN is the user's information vector and the output represents the probability distribution over contents. For mobility prediction, the input includes the historical and present user locations and the output is the expected position for the next time duration. Eventually, the prediction influences the caching content decisions in BBUs and RRHs for the purpose of minimizing traffic load and delay in C-RAN. The authors extend their work in [63] by introducing a conceptor-based ESN which can split users' contexts into different patterns and learn them independently, thereby achieving a more accurate prediction.

In [65], a caching decision policy named PA-Cache is proposed to predict time-variant video popularity for cache eviction when the space is full. The temporal content popularity is exploited by attaching every hidden layer representation of the RNN to an output regression. In order to improve the accuracy, hedge backpropagation is introduced during the training process, which decides when and how to adapt the depth of the DNN in an evolving manner. Similarly, the work in [66] also considers caching replacement of video content.
A deep LSTM network is utilized for popularity prediction, consisting of multiple stacked LSTM layers and one softmax layer, where the input of the network is request sequence data (device, timestamp, location, title of video) without any preprocessing and the output is the estimated content popularity. Another work concentrating on the prediction of, and interaction between, user mobility and content popularity can be found in [67].

The work of [68] treats popularity prediction as a seq2seq modeling problem and proposes an LSTM encoder-decoder model. The input vector consists of past probabilities, where each vector is calculated over a predefined time window. In [69], the authors focus on caching content delivery with the aim of minimizing the number of BSs needed to cover all requesting users, i.e. a set cover problem, via coded caching. Unlike [68], an auto encoder is introduced in the coded caching stage for file conversion to reduce the transmission load. In addition, an RNN model is employed to select BSs for broadcasting.

The paper [98] shows the potential of RNNs in solving the where-to-cache problem. In [98], a task allocation model is formulated as a knapsack problem and the decision variables represent whether a task is processed locally in mobile devices (MDs) or remotely in edge servers (ESs). The authors design a multi-pointer network structure of 3 RNNs, where 2 encoders encode MDs and ESs respectively and 1 decoder determines the ES and MD pairing. Considering the similarity of the where-to-cache optimization model and the knapsack problem, the multi-pointer network can be transferred to caching location decisions after appropriate parameter modifications.
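The windowed-probability input described for [68] can be sketched as follows: a time-ordered request log is split into fixed-length windows, and each window yields one empirical popularity vector that a sequence model could consume. The window length and normalization here are illustrative assumptions, not the exact preprocessing of the paper.

```python
from collections import Counter

def popularity_windows(requests, contents, window):
    """Split a time-ordered request log into fixed-length windows and return,
    for each window, the empirical request probability of every content.
    `requests` is a list of content ids; `contents` fixes the vector layout."""
    vectors = []
    for start in range(0, len(requests), window):
        chunk = requests[start:start + window]
        counts = Counter(chunk)          # requests per content in this window
        total = len(chunk)
        vectors.append([counts[c] / total for c in contents])
    return vectors
```

Feeding the sequence of such vectors into an encoder-decoder model then amounts to predicting the next windows' vectors from the past ones.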
Generally, an auto encoder is utilized to learn efficient representations or extract features of raw data in an unsupervised manner. The work in [70] considers cache replacement in wireless sensor networks (WSNs) based on content popularity. Since a sparse auto encoder (SAE) can extract a representative expression of the input data, the authors employ a SAE followed by a classifier, where the input contains the collected user content requests and the output represents the content popularity level. The authors also consider a distributed implementation via SDN/NFV techniques, i.e. the input layer is deployed on the sink node, while the remaining layers are implemented on the main controller. A related work applying an auto encoder to proactive caching in 5G networks can be found in [71]. In [72], two auto encoders are utilized to extract the features of users and content respectively. The extracted information is then explored to estimate popularity at the core network. Similarly, the auto encoder in [73] is used for spatio-temporal popularity feature extraction, and auto encoders work collaboratively in [75] to predict the top K popular videos.
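The compression principle behind these feature extractors can be shown in miniature: the sketch below trains a tied-weight linear auto encoder with a single hidden unit by gradient descent on the reconstruction error, so it learns the dominant direction of the data. The architecture, initialization and hyperparameters are illustrative assumptions, a one-unit stand-in for the deep (sparse) auto encoders used in the cited works.

```python
def train_linear_autoencoder(data, lr=0.005, steps=100):
    """Tiny tied-weight linear auto encoder on 2-D points:
    hidden code h = w.x, reconstruction x_hat = h * w, trained by gradient
    descent on the squared reconstruction error.
    Returns (w, initial_loss, final_loss)."""
    w = [0.3, 0.1]                       # small fixed init for reproducibility

    def loss_and_grad(w):
        loss, grad = 0.0, [0.0, 0.0]
        for x in data:
            h = w[0] * x[0] + w[1] * x[1]
            err = [x[0] - h * w[0], x[1] - h * w[1]]
            loss += err[0] ** 2 + err[1] ** 2
            # Gradient of ||x - (w.x) w||^2 with tied encoder/decoder weights.
            dot_err_w = err[0] * w[0] + err[1] * w[1]
            for i in range(2):
                grad[i] += -2.0 * (err[i] * h + dot_err_w * x[i])
        return loss, grad

    initial_loss, _ = loss_and_grad(w)
    for _ in range(steps):
        _, g = loss_and_grad(w)
        w = [w[i] - lr * g[i] for i in range(2)]
    final_loss, _ = loss_and_grad(w)
    return w, initial_loss, final_loss
```

In the caching works above, the same objective is optimized with deep nonlinear encoders, and the learned code (rather than the raw request log) is handed to the popularity classifier.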
The work in [78] focuses on the cooperative caching policy at FBSs with maximum distance separable coding in ultra-dense networks. A value-based model is utilized to determine the caching categories and the content quantity at FBSs during off-peak durations. The authors in [79] study the problem of caching 360° videos and virtual viewports in FBSs with unknown content popularity. The virtual viewport represents the most popular tiles of a 360° video over the user population. A DQN is introduced to decide which tiles of a video should be hosted and in which quality. Additionally, [80] employs a DQN for the content eviction decision to offer a satisfactory quality of experience, and [81] does so for the purpose of minimizing energy consumption. In [82], the authors also apply a DQN to decide cache eviction in a single BS. Moreover, the critic is built by stacking LSTM and FNN layers to evaluate the Q value, and an external memory is added for recording learned knowledge. For the purpose of improving the prediction accuracy, the Q value update is determined by the similarity between the critic's estimated value and the recorded information in the external memory, instead of by the critic alone. The paper [84] applies a DQN to two-level network caching, where a parent node linked with multiple leaf nodes caches content instead of a single BS. In [76], a DRL framework with the Wolpertinger architecture [99] is presented for content caching at BSs. The Wolpertinger architecture is based on the actor-critic model and performs efficiently in large discrete action spaces. [76] employs two FNNs working as actor and critic respectively, where the former determines whether the requested content is cached or not and the latter estimates the reward. The whole framework consists of two phases: in the offline phase, the two FNNs are trained in a supervised manner; in the online phase, the critic and actor update via interaction with the environment.
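The cache-eviction decision problems recurring in these works share a common MDP shape, sketched below as a minimal environment: the state is the cached content plus the incoming request, the action picks the victim slot on a miss, and the reward is a cache hit. This is an illustrative abstraction, not the exact MDP of any cited paper.

```python
class CacheEvictionEnv:
    """Minimal cache-replacement environment. The state is the tuple of cached
    content ids plus the incoming request; when the cache is full and the
    request misses, the action selects which slot to evict.
    The reward is 1 on a hit and 0 on a miss."""
    def __init__(self, capacity, requests):
        self.capacity = capacity
        self.requests = requests        # time-ordered request trace
        self.cache = []
        self.t = 0

    def state(self):
        return tuple(self.cache), self.requests[self.t]

    def step(self, evict_slot):
        req = self.requests[self.t]
        if req in self.cache:
            reward = 1                  # hit: cache unchanged
        else:
            reward = 0
            if len(self.cache) >= self.capacity:
                self.cache.pop(evict_slot)   # the action chooses the victim
            self.cache.append(req)
        self.t += 1
        done = self.t >= len(self.requests)
        return reward, done
```

A DQN agent would observe `state()` and learn which `evict_slot` maximizes the long-run hit rate; classical policies such as LRU or LFU correspond to fixed eviction rules in the same environment.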
The authors extend their work to a multi-agent actor-critic model for decentralized cooperative caching at multiple BSs [77]. In [83], an actor-critic model is used for solving the cache replacement problem, balancing data freshness and communication cost. The aforementioned papers focus on network performance while ignoring the influence of caching on information processing and resource consumption. Therefore, the authors of [85] design a caching policy considering both the network performance during content transmission and the processing efficiency during data consumption. A DQN is employed to determine the number of chunks of the requested file to be updated. The paper [86] investigates a joint cache replacement and power allocation optimization problem to minimize latency in a downlink F-RAN. A DQN is proposed for finding a suboptimal solution.

Though [87] is regarded as solving the what-to-cache problem like [76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86], the reinforcement learning approach plays a different role there. In [87], a DNN is utilized for content popularity prediction, and RL is then used for DNN hyperparameter tuning instead of determining the caching content. Therefore the action space consists of choosing model architectures (i.e. CNN, LSTM, etc.), the number of layers and the layer configurations. In [88], the authors formulate an optimization model with the aim of maximizing the network operator's utility in mobile social networks under the framework of mobile edge computing, in-network caching and D2D communications (3C). Trust values, which are estimated through social relationships among users, are also considered. A DQN model is then utilized for solving the optimization problem, including determining the video provider and subscriber association, the video transcoding offloading and the video cache allocation for video providers. The DQN employs two CNNs in the training process, where one generates the target Q value and the other the estimated Q value.
Unlike the conventional DQN, the authors in [88] introduce a dueling structure, i.e. the Q value is not computed in the final fully connected layer but is decomposed into two components whose sum is used as the estimated Q value, which helps achieve a more robust result. The authors also consider utilizing the dueling DQN model in different scenarios such as cache-enabled opportunistic interference alignment [89] and orchestrating 3C in vehicular networks [90]. The work [91] provides a DDPG model to cope with continuous-valued control decisions for 3C in vehicular edge networks, which combines the ideas of DQN and the actor-critic model. The DDPG structure can be divided into two parts as in DQN, one for the estimated Q value and the other for the target Q value. Each part consists of two DNNs, which play the roles of actor and critic respectively. The critic updates its parameters as in DQN while the actor learns the policy via the deterministic policy gradient approach. The proposed DRL is used for deciding content caching/replacement, vehicle organization and bandwidth resource assignment over different durations.

The paper [92] provides an optimization model which takes what to cache and content delivery into consideration in a fog-enabled IoT network in order to minimize service latency. Since the wireless signals and user requests are stochastic, an actor-critic model is engaged, where the actor makes decisions for the requested contents while the critic estimates the reward. Specifically, the action space consists of the decision variables and the reward function is a variant of the objective function. A caching replacement strategy and a dynamic multicast scheduling strategy are studied in [93]. In order to obtain a suboptimal result, an auto encoder is used to approximate the state. Further, a weighted double DQN scheme is utilized to avoid overestimation of the Q value. [94] applies an RNN to predict content popularity by collecting historical requests, and the output represents the popularity in the near future.
The prediction is then employed for cooperative caching and computation offloading among MEC servers, which is modeled as an ILP problem. For the purpose of solving it efficiently, a multi-agent DQN is applied where each user is viewed as an agent. The action space consists of the local-computing and offloading decisions for tasks as well as the local caching and cooperative caching determination. The reward is measured by the accumulated latency. Each agent chooses its own action based on the current state without cooperation. The where to cache, what to cache and content delivery decisions of a D2D network are jointly modeled in [95]. Two RNNs, an ESN and an LSTM, are considered to predict mobile users' locations and the requested content popularity. The prediction result is then used for determining the content categories and cache locations. The content delivery is formulated in an actor-critic based DRL framework. The state space includes the CSI, transmission distances and communication power between the requesting user and other available candidates. The function of the DRL is to determine the communication links among users with the aim of minimizing power consumption and content delay.

We notice that most papers prefer to use the value-based (critic) model and the value-policy-based (actor-critic) model in the DRL framework, but few papers consider only the policy-based model to solve the data caching problem. One probable reason is that the search space of the caching problem is enormous, so the policy-based model more easily falls into local optima, resulting in poor performance. Though the value-based model is less stable, some variant structures are utilized, such as Double DQN in [93] to avoid value overestimation and dueling DQN in [88, 89, 90] to improve robustness.

Research Challenges and Future Directions
A series of open issues on content caching and potential research directions are discussed in this section. We first extend the idea of content caching to virtual network function chains, since caching can be viewed as a specific network function. Then we consider caching for augmented reality applications. Moreover, we notice that cache dimensioning has not yet been covered by DL methods. Finally, we discuss the additional cost introduced by DL.
The concept of Network Function Virtualization (NFV) was first discussed and proposed within the realms of the European Telecommunications Standards Institute (ETSI). The rationale is to facilitate the dynamic provisioning of network services through virtualization technologies, decoupling the service creation process from the underlying hardware. The framework allows network services to be implemented by a specific chaining and ordering of a set of functions, which can be implemented either on more traditional dedicated hardware, in which case they are called Physical Network Functions (PNFs), or alternatively as Virtual Network Functions (VNFs), i.e. software running on top of virtualized general-purpose hardware. The decoupling between the hardware and the software is one of the important considerations; the other, equally important, is that a virtualized service lends itself naturally to dynamic programmable service creation where VNF resources can be deployed as required. Hence, edge cloud and network resource usage can be adapted to the instantaneous user demand while avoiding more static over-provisioned configurations.

Within that framework, an incoming network service request includes the specification of the service function chain that needs to be created, in the form of an ordered sequence of VNFs. For example, different types of VNFs such as a firewall or a NAT mechanism need to be visited in a specific order. In such a constructed service chain, each independent VNF requires specific underlying resources in terms of, for example, CPU cycles and/or memory.

Under this framework, caching of popular content can be considered as a specialized VNF chain function, since inevitably the delivery of cached popular content to users will require a set of other supporting functions related to security, optimization of the content, etc.
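The ordered-chain-with-resource-demands structure described above can be captured in a few lines; the sketch below is a minimal data model whose VNF names and resource units are illustrative assumptions, not part of any ETSI specification.

```python
from dataclasses import dataclass

@dataclass
class VNF:
    """A virtual network function with its (illustrative) resource demands."""
    name: str
    cpu_cycles: int   # required CPU, arbitrary units
    memory_mb: int    # required memory in MB

def chain_demand(chain):
    """Total resources an ordered service function chain requires end to end.
    The order of the list is the order in which traffic traverses the VNFs."""
    return (sum(v.cpu_cycles for v in chain),
            sum(v.memory_mb for v in chain))
```

An embedding algorithm would then map each element of the ordered list onto edge nodes subject to these aggregate (and per-node) resource constraints, with a caching function appearing as simply another, storage-heavy, element of the chain.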
However, data caching and VNF chaining have evolved rather independently in the literature, and how to optimize data caching when viewing it as part of a VNF chain is still an interesting open-ended issue.

Mobile augmented reality (MAR) applications can be considered as a way to augment the physical real-world environment with artificial computer-generated information, and this area has received significant research attention recently. In order to successfully superimpose digital objects on the physical world, MAR applications include several computationally and storage-intensive components such as image recognition, mobile camera calibration, and the use of advanced 2D and 3D graphics rendering. These functionalities are highly computationally intensive and as such require support from an edge cloud; in addition, the virtual objects to be embedded in the physical world are expected to be proactively cached close to the end user so that latency is minimized. Ultra-low latency in these types of applications is of paramount importance so as to provide a photorealistic embedding of virtual objects in the video view of the end user. However, since computational and augmented reality objects need to be readily available, the caching of those objects should be considered in conjunction with the computational capabilities of the edge cloud. Moreover, when MAR is considered under the lens of an NFV environment, the application might inherently require access to some VNFs, and therefore the above discussion on VNF chaining is also valid for MAR applications.

Recently, the concept of the Digital Twin (DT) [100], [101] has received significant research attention due to the plethora of applications ranging from industrial manufacturing and health to smart cities.
In a nutshell, a DT can be defined as an accurate digital replica of a real-world object across multiple granularity levels; this real-world object could be a machine, a robot, an industrial process or a (sub)system. Reflecting the physical status of the system under consideration in a virtual space opens up a plethora of optimization, prediction, fault-tolerance and automation processes that cannot be realized using solely the physical object. At the core of DT applications is the requirement of stringent two-way real-time communication between the digital replica and the physical object. This requirement inevitably demands support from edge clouds to minimize latency, together with efficient storage and computational resources, including caching. In that setting, the use of the aforementioned deep learning technologies will have a key role to play in providing high-quality real-time decision making to avoid misalignment between the digital replica and the physical object.

Network Functions Virtualisation: An Introduction, Benefits, Enablers, Challenges and Call for Action, ETSI, 2012. https://portal.etsi.org/NFV/NFV_White_Paper.pdf
As introduced in Section 2, cache dimensioning explores the appropriate cache size allocation for content hosts such as CRs and BSs. Disappointingly, few papers apply DL to cache dimensioning decisions. One probable reason is the lack of a training data set, in contrast to content popularity prediction, where historical user request logs are available to train a DNN. In addition, the caching size allocation affects both the network performance and the economic investment. Recently, network slicing has been identified as an important tool enabling 5G to provide multiple services with diverse characteristics. A slice is established on physical infrastructure including network storage. It is therefore a very interesting topic to consider the allocation of memory space to support content caching and other storage services while guaranteeing QoE and satisfying task requirements. Furthermore, when a training data set is lacking, DRL can be viewed as a promising technology to configure slicing settings as well as cache dimensioning. The action space can be designed either as discrete, by setting storage levels, or as continuous, by allocating the memory space directly. However, a caching-enabled network slicing model, especially for dynamic allocation, still needs to be designed, together with the associated DRL framework including the state space, detailed action space, reward function and agent structure.
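The two action-space designs mentioned above can be sketched directly; the function names, the level count and the MB budget below are illustrative assumptions for a hypothetical cache-dimensioning agent, not a proposal from the literature.

```python
def discrete_actions(total_mb, levels):
    """Discrete action space: cache-size levels as fractions of the slice's
    storage budget; action i allocates i/levels of total_mb."""
    return [total_mb * i // levels for i in range(levels + 1)]

def clip_continuous_action(requested_mb, total_mb):
    """Continuous action space: the agent outputs a raw cache size in MB,
    which is clipped to the feasible budget before being applied."""
    return max(0, min(requested_mb, total_mb))
```

A value-based agent (e.g. a DQN) would pick an index into `discrete_actions`, whereas a DDPG-style agent would emit a raw size passed through `clip_continuous_action`.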
Though the application of DL brings performance efficiency to caching policies, the additional cost introduced by DL cannot be neglected, since training and deploying a DL model require not only network resources but also time. Naturally, there is a trade-off between the cost that a DL-assisted caching policy saves and the consumption that supports the running of DL itself, which means that adopting DL results in either profit, loss, or break-even. Therefore, where and when to apply DL should be carefully investigated. In addition, for the purpose of reducing resource consumption and accelerating the training process, knowledge transfer methods such as transfer learning [102] can be utilized, which transfer the knowledge already learnt in a source domain to a relevant target domain.
This article presents a comprehensive study of the application of deep learning methods in the area of content caching. In particular, data caching is divided into two classes according to the caching location in the edge network. Each category contains where to cache, what to cache, cache dimensioning and content delivery. We then introduce typical DNN methods, which are categorised by training process into supervised learning, unsupervised learning and RL. Further, this paper critically compares and analyzes state-of-the-art papers along dimensions such as the DL methods employed, the caching problems solved and the objective of applying DL. The challenges and research directions of DL for caching are also examined on the topics of extending caching to VNF chains, the application of caching for MAR as well as DTs, DL for cache size allocation and the additional cost of employing DL. Undoubtedly, DL is playing a significant role in 5G and beyond. We hope this paper will stimulate discussion and interest in DL for caching policy design and relevant applications, which will advance future network communications.
IEEE Wireless Communications Letters , 7(4):490–493, 2017.[40] Bo Zhou, Ying Cui, and Meixia Tao. Optimal dynamic multicast scheduling for cache-enabled content-centricwireless networks.
IEEE Transactions on Communications , 65(7):2956–2970, 2017.[41] Mohammad Ali Maddah-Ali and Urs Niesen. Fundamental limits of caching.
IEEE Transactions on InformationTheory , 60(5):2856–2867, 2014.[42] Vu Nguyen Ha, Long Bao Le, et al. Coordinated multipoint transmission design for cloud-rans with limitedfronthaul capacity constraints.
IEEE Transactions on Vehicular Technology , 65(9):7432–7447, 2015.[43] Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
Deep Learning . MIT Press, 2016. .[44] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation.In
Proceedings of the IEEE conference on computer vision and pattern recognition , pages 3431–3440, 2015.[45] Hava T Siegelmann and Eduardo D Sontag. Turing computability with neural nets.
Applied MathematicsLetters , 4(6):77–80, 1991.[46] Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. Pointer networks. In
Advances in neural informationprocessing systems , pages 2692–2700, 2015. 1647] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, AlexGraves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deepreinforcement learning. nature , 518(7540):529–533, 2015.[48] Hado Van Hasselt, Arthur Guez, and David Silver. Deep reinforcement learning with double q-learning. In
Thirtieth AAAI conference on artificial intelligence , 2016.[49] Ziyu Wang, Tom Schaul, Matteo Hessel, Hado Hasselt, Marc Lanctot, and Nando Freitas. Dueling networkarchitectures for deep reinforcement learning. In
International conference on machine learning , pages 1995–2003, 2016.[50] John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy opti-mization. In
International conference on machine learning , pages 1889–1897, 2015.[51] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimizationalgorithms. arXiv preprint arXiv:1707.06347 , 2017.[52] Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver,and Daan Wierstra. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 ,2015.[53] Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley,David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In
Interna-tional conference on machine learning , pages 1928–1937, 2016.[54] Lei Lei, Yaxiong Yuan, Thang X Vu, Symeon Chatzinotas, and Björn Ottersten. Learning-based resourceallocation: Efficient content delivery enabled by convolutional neural network. In , pages 1–5. IEEE, 2019.[55] Lei Lei, Lei You, Gaoyang Dai, Thang Xuan Vu, Di Yuan, and Symeon Chatzinotas. A deep learning ap-proach for optimizing content delivering in cache-enabled hetnet. In , pages 449–453. IEEE, 2017.[56] Yantong Wang and Vasilis Friderikos. Caching as an image characterization problem using deep convolutionalneural networks. arXiv preprint arXiv:1907.07263 , 2019.[57] Yantong Wang and Vasilis Friderikos. Network orchestration in mobile networks via a synergy of model-drivenand ai-based techniques. arXiv preprint arXiv:2004.00660 , 2020.[58] Khai Nguyen Doan, Thang Van Nguyen, Tony QS Quek, and Hyundong Shin. Content-aware proactive cachingfor backhaul offloading in cellular network.
IEEE Transactions on Wireless Communications , 17(5):3128–3140,2018.[59] Zhou Qin, Yikun Xian, and Desheng Zhang. A neural networks based caching scheme for mobile edge networks.In
Proceedings of the 17th Conference on Embedded Networked Sensor Systems , pages 408–409, 2019.[60] Kuo Chun Tsai, Li Wang, and Zhu Han. Mobile social media networks caching with convolutional neuralnetwork. In , pages83–88. IEEE, 2018.[61] Vladyslav Fedchenko, Giovanni Neglia, and Bruno Ribeiro. Feedforward neural networks for caching: nenough or too much?
ACM SIGMETRICS Performance Evaluation Review , 46(3):139–142, 2019.[62] Mingzhe Chen, Walid Saad, Changchuan Yin, and Mérouane Debbah. Echo state networks for proactive cachingin cloud-based radio access networks with mobile users.
IEEE Transactions on Wireless Communications ,16(6):3520–3535, 2017.[63] Mingzhe Chen, Mohammad Mozaffari, Walid Saad, Changchuan Yin, Mérouane Debbah, and Choong SeonHong. Caching in the sky: Proactive deployment of cache-enabled unmanned aerial vehicles for optimizedquality-of-experience.
IEEE Journal on Selected Areas in Communications , 35(5):1046–1061, 2017.[64] Laha Ale, Ning Zhang, Huici Wu, Dajiang Chen, and Tao Han. Online proactive caching in mobile edgecomputing using bidirectional deep recurrent neural network.
IEEE Internet of Things Journal , 6(3):5520–5530, 2019.[65] Qilin Fan, Jian Li, Xiuhua Li, Qiang He, Shu Fu, and Sen Wang. Pa-cache: Learning-based popularity-awarecontent caching in edge networks. arXiv preprint arXiv:2002.08805 , 2020.[66] Cong Zhang, Haitian Pang, Jiangchuan Liu, Shizhi Tang, Ruixiao Zhang, Dan Wang, and Lifeng Sun. Towardedge-assisted video content intelligent caching with long short-term memory learning.
IEEE access , 7:152832–152846, 2019. 1767] Hanlin Mou, Yuhong Liu, and Li Wang. Lstm for mobility based content popularity prediction in wirelesscaching networks. In , pages 1–6. IEEE, 2019.[68] Arvind Narayanan, Saurabh Verma, Eman Ramadan, Pariya Babaie, and Zhi-Li Zhang. Deepcache: A deeplearning based framework for content caching. In
Proceedings of the 2018 Workshop on Network Meets AI &ML , pages 48–53, 2018.[69] Zhengming Zhang, Yaru Zheng, Chunguo Li, Yongming Huang, and Luxi Yang. On the cover problem forcoded caching in wireless networks via deep neural network. In , pages 1–6. IEEE, 2019.[70] Fangyuan Lei, Jun Cai, Qingyun Dai, and Huimin Zhao. Deep learning based proactive caching for effectivewsn-enabled vision applications.
Complexity , 2019, 2019.[71] FangYuan Lei, QinYun Dai, Jun Cai, HuiMin Zhao, Xun Liu, and Yan Liu. A proactive caching strategy basedon deep learning in epc of 5g. In
International Conference on Brain Inspired Cognitive Systems , pages 738–747.Springer, 2018.[72] Shailendra Rathore, Jung Hyun Ryu, Pradip Kumar Sharma, and Jong Hyuk Park. Deepcachnet: A proactivecaching framework based on deep learning in cellular networks.
IEEE Network , 33(3):130–138, 2019.[73] Wai-Xi Liu, Jie Zhang, Zhong-Wei Liang, Ling-Xi Peng, and Jun Cai. Content popularity prediction andcaching for icn: A deep learning approach with sdn.
IEEE access , 6:5075–5089, 2017.[74] Wei Li, Jun Wang, Guoyong Zhang, Li Li, Ze Dang, and Shaoqian Li. A reinforcement learning based smartcache strategy for cache-aided ultra-dense network.
IEEE Access , 7:39390–39401, 2019.[75] Yu-Tai Lin, Chia-Cheng Yen, and Jia-Shung Wang. Video popularity prediction: An autoencoder approach withclustering.
IEEE Access , 8:129285–129299, 2020.[76] Chen Zhong, M Cenk Gursoy, and Senem Velipasalar. A deep reinforcement learning-based framework forcontent caching. In , pages 1–6.IEEE, 2018.[77] Chen Zhong, M Cenk Gursoy, and Senem Velipasalar. Deep reinforcement learning-based edge caching inwireless networks.
IEEE Transactions on Cognitive Communications and Networking , 6(1):48–61, 2020.[78] Shen Gao, Peihao Dong, Zhiwen Pan, and Geoffrey Ye Li. Reinforcement learning based cooperative codedcaching under dynamic popularities in ultra-dense networks.
IEEE Transactions on Vehicular Technology ,69(5):5442–5456, 2020.[79] Pantelis Maniotis and Nikolaos Thomos. Viewport-aware deep reinforcement learning approach for 360 videocaching. arXiv preprint arXiv:2003.08473 , 2020.[80] Xiaoming He, Kun Wang, and Wenyao Xu. Qoe-driven content-centric caching with deep reinforcement learn-ing in edge-enabled iot.
IEEE Computational Intelligence Magazine , 14(4):12–20, 2019.[81] Jie Tang, Hengbin Tang, Xiuyin Zhang, Kanapathippillai Cumanan, Gaojie Chen, Kai-Kit Wong, andJonathon A Chambers. Energy minimization in d2d-assisted cache-enabled internet of things: A deep rein-forcement learning approach.
IEEE Transactions on Industrial Informatics , 16(8):5412–5423, 2019.[82] Pingyang Wu, Jun Li, Long Shi, Ming Ding, Kui Cai, and Fuli Yang. Dynamic content update for wireless edgecaching via deep reinforcement learning.
IEEE Communications Letters , 23(10):1773–1777, 2019.[83] Hao Zhu, Yang Cao, Xiao Wei, Wei Wang, Tao Jiang, and Shi Jin. Caching transient data for internet of things:A deep reinforcement learning approach.
IEEE Internet of Things Journal , 6(2):2074–2083, 2018.[84] Alireza Sadeghi, Gang Wang, and Georgios B Giannakis. Deep reinforcement learning for adaptive cachingin hierarchical content delivery networks.
IEEE Transactions on Cognitive Communications and Networking ,5(4):1024–1033, 2019.[85] Yimeng Wang, Yongbo Li, Tian Lan, and Vaneet Aggarwal. Deepchunk: Deep q-learning for chunk-basedcaching in wireless data processing networks.
IEEE Transactions on Cognitive Communications and Network-ing , 5(4):1034–1045, 2019.[86] GM Shafiqur Rahman, Mugen Peng, Shi Yan, and Tian Dang. Learning based joint cache and power allocationin fog radio access networks.
IEEE Transactions on Vehicular Technology , 69(4):4401–4411, 2020.[87] Kyi Thar, Thant Zin Oo, Yan Kyaw Tun, Ki Tae Kim, Choong Seon Hong, et al. A deep learning modelgeneration framework for virtualized multi-access edge cache management.
IEEE Access , 7:62734–62749,2019. 1888] Ying He, Chengchao Liang, Richard Yu, and Zhu Han. Trust-based social networks with computing, cachingand communications: A deep reinforcement learning approach.
IEEE Transactions on Network Science andEngineering , 2018.[89] Ying He, Zheng Zhang, F Richard Yu, Nan Zhao, Hongxi Yin, Victor CM Leung, and Yanhua Zhang. Deep-reinforcement-learning-based optimization for cache-enabled opportunistic interference alignment wireless net-works.
IEEE Transactions on Vehicular Technology , 66(11):10433–10445, 2017.[90] Ying He, Nan Zhao, and Hongxi Yin. Integrated networking, caching, and computing for connected vehicles:A deep reinforcement learning approach.
IEEE Transactions on Vehicular Technology , 67(1):44–55, 2017.[91] Guanhua Qiao, Supeng Leng, Sabita Maharjan, Yan Zhang, and Nirwan Ansari. Deep reinforcement learningfor cooperative content caching in vehicular edge computing and networks.
IEEE Internet of Things Journal ,7(1):247–257, 2019.[92] Yifei Wei, F Richard Yu, Mei Song, and Zhu Han. Joint optimization of caching, computing, and radio resourcesfor fog-enabled iot using natural actor–critic deep reinforcement learning.
IEEE Internet of Things Journal ,6(2):2061–2073, 2018.[93] Zhengming Zhang, Hongyang Chen, Meng Hua, Chunguo Li, Yongming Huang, and Luxi Yang. Double codedcaching in ultra dense networks: Caching and multicast scheduling via deep reinforcement learning.
IEEETransactions on Communications , 68(2):1071–1086, 2019.[94] Shilu Li, Baogang Li, and Wei Zhao. Joint optimization of caching and computation in multi-server noma-mecsystem via reinforcement learning.
IEEE Access , 2020.[95] Lixin Li, Yang Xu, Jiaying Yin, Wei Liang, Xu Li, Wei Chen, and Zhu Han. Deep reinforcement learningapproaches for content caching in cache-enabled d2d networks.
IEEE Internet of Things Journal , 7(1):544–557,2019.[96] Mengyuan Lee, Yuanhao Xiong, Guanding Yu, and Geoffrey Ye Li. Deep neural networks for linear sumassignment problems.
IEEE Wireless Communications Letters , 7(6):962–965, 2018.[97] Zhengxuan Ling, Xinyu Tao, Yu Zhang, and Xi Chen. Solving optimization problems through fully convolu-tional networks: An application to the traveling salesman problem.
IEEE Transactions on Systems, Man, andCybernetics: Systems , 2020.[98] Qingmiao Jiang, Yuan Zhang, and Jinyao Yan. Neural combinatorial optimization for energy-efficient offloadingin mobile edge computing.
IEEE Access , 8:35077–35089, 2020.[99] Gabriel Dulac-Arnold, Richard Evans, Hado van Hasselt, Peter Sunehag, Timothy Lillicrap, Jonathan Hunt,Timothy Mann, Theophane Weber, Thomas Degris, and Ben Coppin. Deep reinforcement learning in largediscrete action spaces. arXiv preprint arXiv:1512.07679 , 2015.[100] A. El Saddik. Digital twins: The convergence of multimedia technologies.
IEEE MultiMedia , 25(2):87–92,2018.[101] K. M. Alam and A. El Saddik. C2ps: A digital twin architecture reference model for the cloud-based cyber-physical systems.
IEEE Access , 5:2050–2062, 2017.[102] Karl Weiss, Taghi M Khoshgoftaar, and DingDing Wang. A survey of transfer learning.