Serverless Streaming for Emerging Media: Towards 5G Network-Driven Cost Optimization
Konstantinos Konstantoudakis, David Breitgand, Alexandros Doumanoglou, Nikolaos Zioulis, Avi Weit, Kyriaki Christaki, Petros Drakoulis, Emmanouil Christakis, Dimitrios Zarpalas, Petros Daras
A real-time adaptive streaming FaaS service for small-session-oriented immersive media
Abstract
Immersive 3D media is an emerging type of media that captures, encodes and reconstructs the 3D appearance of people and objects, with applications in tele-presence, teleconferencing, entertainment, gaming and other fields. In this paper, we discuss a novel concept of live 3D immersive media streaming in a serverless setting. In particular, we present a novel network-centric adaptive streaming framework which deviates from the traditional client-based adaptive streaming used in 2D video. In our framework, the decisions for the production of the transcoding profiles are taken in a centralized manner, by considering consumer metrics versus provisioning costs and inferring an expected consumer quality of experience and behavior based on them.
This work has been realized in the context of the 5G-MEDIA project ( ), which has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 761699.

K. Konstantoudakis, A. Doumanoglou, N. Zioulis, K. Christaki, P. Drakoulis, E. Christakis, D. Zarpalas and P. Daras are with:
Visual Computing Lab (VCL), Information Technologies Institute (ITI), Centre for Research and Technology - Hellas (CERTH), Thessaloniki, Greece
Tel.: +30-2310-464160; Fax: +30-2310-464164
E-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]

D. Breitgand and A. Weit are with:
Hybrid Cloud, Cloud and Data Technologies, IBM Research, Haifa, Israel
Tel.: +972-4-8296211
E-mail: [email protected]; [email protected]
In addition, we demonstrate that a naive application of the serverless paradigm might be suboptimal under some common immersive 3D media scenarios.
Keywords immersive media · serverless · real-time adaptive streaming · service optimization · cognitive networking · OPEX optimization · Function-as-a-Service (FaaS)
1 Introduction

Media intensive applications and services are becoming increasingly important. The ongoing COVID-19 pandemic has forced people to work, learn, and communicate remotely on an unprecedented scale. With more people in quarantine and isolation, the demand for low latency applications, such as video streaming, online games, and teleconferencing, has soared to the point of prompting some countries to look at ways to curb streaming data to avoid overwhelming the Internet [39]. It has been suggested by many that in a post-COVID-19 world, as restrictions are gradually lifted, many people might use telecommunication as a new normal mode of working. Several large companies have already announced that this unintended pilot of remote teleworking might become the standard way people will work in the 21st century.

With the emergence of immersive media, this option – remote teleworking and infotainment – becomes even more attractive and real, since much better quality of experience will be provided to the users. However, immersive media is likely to further exacerbate the issues related to bandwidth and latency (even in the new generation 5G networks), since all next-generation media types [87] – omni-directional (360°), multi-view, or three-dimensional – impose bandwidth and latency requirements that vastly surpass those of traditional media, even when the high-end profiles of the current media are considered (i.e. UHD).

A number of approaches aim at mitigating this issue by optimizing the use of resources for media intensive services. Media services vary by type. A large and increasingly important family of media services is related to tele-presence and infotainment. These services are characterised by highly dynamic consumer populations.
Therefore, they require efficient scaling with instantaneous elasticity to handle irregular workload spikes [80]. However, for the real-time media streaming setting, resource management should extend beyond scaling. The finer-grained decisions might include the selection of bit-rate and transcoding profiles to optimize cost-efficiency from the service provider perspective [96]. More advanced optimization relies on recent standards, such as MPEG-DASH SAND [46], which leverages the knowledge it obtains from the network to collaboratively manage media services in order to optimize the users' quality of experience (QoE) [60].

The aforementioned approaches evolved for the centralized cloud model and are limited by each cloud provider's infrastructure, functionalities, and billing schemes. With the emergence of the 5G networks, an ultra-fast, ultra-reliable, and high-bandwidth edge becomes an attractive option to media service developers. For immersive media, 5G is a crucial enabling technology, since its targeted key performance indicators, stipulated by the architecture documents, are essential to providing good QoE for the users [64].

Software Defined Networks (SDN) and Network Function Virtualization (NFV) technologies drive the network softwarization transformation. A softwarized network is much more amenable to collaborative application and infrastructure optimization via optimized workload placement, application demand adaptation, and network optimization across cloud and edge, based on elaborate monitoring of the infrastructure and service behavior analytics.
The cloud-native transformation that drives innovation in the modern cloud and telco cloud edge opens up a number of new opportunities for fine-grained resource optimization. The finer-grained approaches are able to factor the information provided by the network into the optimization schemes [96] and are better suited to address the central challenge of developing a network architecture able to dynamically adapt to fluctuating traffic patterns [28].

Serverless computing was first introduced at the end of 2014, and in the last two or three years it has become an extremely popular cloud-native pattern used to build highly granular, yet very cost-efficient, microservices. Serverless computing is an execution model in which the provider of a serverless computing platform manages servers in the back-end and dynamically allocates server resources to virtualization containers (e.g. Docker containers) to execute customers' workloads. In serverless computing, a developer focuses only on the code, while the actual packaging and execution is taken care of by the serverless framework. Broadly speaking, a serverless application scales to zero in the absence of load and automatically scales out (almost instantaneously) when load is applied. A customer of the serverless computing framework (e.g. a developer) does not have to worry about auto-scaling; this mechanism is automatically included with the serverless framework. A serverless execution model, where the unit of work is a function provided on demand (e.g. in response to some event), is called Function-as-a-Service (FaaS). FaaS is a sub-model of the broader serverless paradigm. However, except where clarity demands a specific term, we will use the terms serverless and FaaS interchangeably in this paper. An important feature of FaaS is its billing model. FaaS comes much closer to the initial business value promise of the Cloud – pay as you go – than any other cloud consumption model.
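As a concrete illustration of this pay-as-you-go property, the sketch below computes a single invocation's cost under a hypothetical per-GB-second tariff metered at 100 ms granularity, which is typical of FaaS billing schemes. The price constant and function names are illustrative assumptions, not any provider's actual tariff.

```python
import math

# Hypothetical per-GB-second FaaS tariff; real providers publish their own rates.
PRICE_PER_GB_SECOND = 0.0000166667  # assumed price in USD
BILLING_GRANULARITY_S = 0.1         # 100 ms rounding granularity

def faas_invocation_cost(memory_gb: float, duration_s: float) -> float:
    """Cost of a single function invocation under per-GB-second billing."""
    # The measured duration is rounded up to the next 100 ms increment.
    billed_s = math.ceil(duration_s / BILLING_GRANULARITY_S) * BILLING_GRANULARITY_S
    return memory_gb * billed_s * PRICE_PER_GB_SECOND

# A 512 MB transcoder invocation running for 2.34 s is billed as 2.4 s.
print(f"{faas_invocation_cost(0.5, 2.34):.10f}")
```

Because idle sessions scale to zero, no cost accrues between invocations, which is precisely the "pay as you go" property discussed above.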
A typical billing scheme for FaaS is based on the amount of main memory committed during the execution multiplied by the number of seconds, at a granularity of 100 ms (and is hence priced per GB·second). In order to simplify scheduling and flatten the capacity planning cycle, FaaS providers limit the maximal lifetime of serverless functions to 10–15 minutes. Because of its importance to scheduling (serverless functions are treated as "sand" that can always be scheduled among "boulders", i.e. jobs with generally distributed lifetimes), the limited maximal lifetime is not a transient feature of FaaS. Therefore, FaaS is mostly suited for session-based, event-driven, highly dynamic, but relatively short workloads. (Note that there exist popular serverless frameworks, such as Knative [5], that do not limit the maximal lifetime of a function. In these frameworks, a function is not even a building block; rather, such frameworks help build Web services that scale to zero, helping with the code-to-container DevOps cycle along the way.)

These are exactly the characteristics of many immersive media applications. However, to the best of our knowledge, FaaS is not yet widely applied to media intensive services. There are multiple reasons for this. First, serverless functions do not communicate with each other via the data network; typical inter-function communication is via a database, which is far too slow and inadequate for media. Second, FaaS frameworks do not support Day 1 and Day 2 configuration of services based on FaaS. Consider an application in which serverless functions should be executed in response to events, get configured to connect to the rest of the application, and then terminate while the rest of the application continues to execute. Such complex management flows are not supported in current frameworks. Third, in many media intensive applications, the use of specialized hardware (e.g. GPUs) is required.
This is not supported out of the box, either by open source FaaS or by commercial offerings. Fourth, since this model is relatively new, it is largely unknown to the broader community involved with immersive media. This paper is intended to fill this void.

We present a novel architectural approach to developing cost-efficient immersive media applications using the FaaS approach. The overall architectural framework and standards for deploying applications in the 5G edge are being evolved by standardization organizations, such as ETSI, which stipulates the application of FaaS technology in 5G MEC [71]. A typical use case envisioned for FaaS in 5G MEC is IoT. In our previous work, we applied the 5G MEC principles to media intensive applications at the cloud edge [9] and presented an overall architectural framework that pioneered the use of FaaS in media intensive applications. That work applied FaaS for the first time to NFV orchestration, utilizing a FaaS VIM integrated with the ETSI MANO framework.

In this paper, we continue this line of work by considering session-based workloads typical in immersive media streaming related to infotainment and tele-presence. We developed a fully functional prototype of a tele-immersive gaming service, where time-varying multi-view textured meshes of two players are produced in real time (a watertight geometry of a player is produced from four camera streams) and embedded into a virtual environment, where the players can freely move with all 6 degrees of freedom. The players communicate with each other via a broker that is placed in the 5G MEC in geographical proximity to the players, to leverage the 5G latency and bandwidth for the sake of the application.
Spectators can join from any edge location and also from non-5G access networks. The spectators tolerate some small lag (much as is the case for sport event broadcasting).

In contrast to the players, who directly exchange immersive media frames via the broker, the spectators consume 3D streams that are transcoded to match the capabilities of the spectators' terminals. It should be noted that we take an approach different from a typical media streaming architecture. Rather than letting spectators ask for a specific transcoding, our application automatically considers the capabilities of the users' terminals and the network conditions and allocates the most cost-efficient transcoding scheme, trying to balance the trade-off between the cost of transcoders, the revenue produced by the spectators, and the total benefit for spectators in the form of QoE that motivates them to stay longer in the sessions. In other words, our application optimization strives to achieve maximum profit while providing maximum QoE to spectators. Each player's stream may be transcoded to lower bit-rate versions, namely transcoding profiles, that may be consumed by a multitude of spectators. Cheaper transcoding profiles are accommodated on CPU, with less RAM and perhaps a lower quality configuration, while more expensive ones utilize GPUs (using our extended FaaS framework based on Apache OpenWhisk and Kubernetes).

When in-application events of interest occur (e.g. scoring in an immersive game), a replay serverless function can be executed on demand. The function uses some buffered media to produce a replay clip and stores it in low-cost cloud storage, from which spectators can retrieve it at any time.
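The cost/benefit trade-off behind this transcoder allocation can be sketched as follows: a transcoding profile is started only when the expected spectator benefit outweighs the transcoder's running cost. The profile names, prices, and the linear QoE-to-revenue mapping below are illustrative assumptions, not the paper's actual optimization model.

```python
from dataclasses import dataclass

@dataclass
class Profile:
    name: str
    bitrate_mbps: float   # stream bit-rate produced by this profile
    cost_per_min: float   # assumed FaaS cost of running this transcoder
    qoe: float            # expected QoE delivered by this profile (0..1)

def plan_profiles(profiles, spectators_per_profile, revenue_per_qoe_min):
    """Return the names of the profiles worth starting.

    A profile is started only if the aggregate spectator benefit,
    expressed as revenue per QoE-minute, exceeds its transcoder cost.
    """
    plan = []
    for p in profiles:
        n = spectators_per_profile.get(p.name, 0)
        expected_revenue = n * p.qoe * revenue_per_qoe_min
        if expected_revenue > p.cost_per_min:
            plan.append(p.name)
    return plan

profiles = [
    Profile("gpu-high", 50.0, cost_per_min=0.12, qoe=0.95),
    Profile("cpu-low", 8.0, cost_per_min=0.02, qoe=0.60),
]
# With a single spectator per profile, only the cheap CPU profile pays off.
print(plan_profiles(profiles, {"gpu-high": 1, "cpu-low": 1}, revenue_per_qoe_min=0.05))
```

A naive serverless design would start every requested transcoder on demand; the point of the network-centric variant is exactly this admission decision per profile.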
The number of events happening during the session serves as a proxy to estimate the session's popularity with the spectators.

For remote spectators joining at edges where no broker is present, a broker is started on demand and connected to the main broker used by the players; each of the transcoded 3D streams is transmitted only once to the remote broker, to reduce the overall traffic load on the network.

A few important points should be noted about our approach. First, each serverless function in our application has one well-defined functionality and a single configuration profile. This greatly simplifies design and operation. Second, thanks to the inbuilt auto-scaling, the application is elastic by design. Third, FaaS is an excellent match for the session-based nature of the application, and its fine granularity (a single function level) allows optimizing the cost-efficiency of resource allocation at the level of individual sessions, paying only for what is actually used. These advantages are not available out of the box in any other cloud-native model.

We validate our approach via extensive experimentation, contrasting our network-centric optimization approach with a naive serverless implementation (which would always start transcoders on demand, irrespective of the predicted accrued benefit), and with a traditional Virtual Machine (VM) based approach. Since some features (e.g. support for GPUs) are not yet available in commercial offerings, the billing schemes necessary for experimentation on cost efficiency are not available. To that end, we examine how GPUs are offered in the cloud today and examine the conditions for their cost efficiency in FaaS offerings in 5G MEC. We then use the billing schemes extrapolated from this study as a proxy
to obtain preliminary figures illustrating the comparative cost-efficiency of the proposed approach.

In summary, our main contributions are as follows:

– We expand the range of applications for serverless architectures to media streaming, addressing its requirements and introducing the concept of serverless streaming;
– We apply this concept to a demanding use-case of next-generation media by implementing and deploying an adaptive streaming service on 5G-enabled network infrastructure, in the context of a real-time and interactive media scenario;
– We show how a serverless architecture within a 5G framework can also enable in-network service optimization and network-centric adaptation for media intensive verticals;
– We demonstrate the cost effectiveness of serverless streaming compared to traditional solutions, taking into account the balance between the total QoE and the cost of production;
– Our findings also serve as a guideline for how serverless should be used in similar use-cases and indicate that naively applying serverless would be suboptimal.

The rest of this paper is organized as follows: In Section 2 we discuss related work and its connection to the present work. In Section 3, we outline our extensions to the Apache OpenWhisk serverless framework, while in Section 4 we present our serverless adaptive streaming service. In Section 5, network-centric cost optimization is discussed, and in Section 6 experimental results are given. Finally, Section 7 concludes the paper.
2 Related Work

In this work, we expand on the novel concept of network-centric 3D immersive media real-time adaptive streaming in a serverless setting. The concept is multidisciplinary and therefore partially overlaps with various topics in the literature.

To facilitate the reader's comprehension, we split this section into a small number of more focused subsections covering different sub-topics. In Subsection 2.1, we briefly describe and provide examples of 3D immersive media production platforms. In Subsection 2.2, we present the principal ideas and some of the more recent advancements in the area of video adaptive streaming. In Subsection 2.3, we focus on immersive media, namely 360° video and 3D representations. Next, in Subsection 2.4, we go over other adaptive streaming solutions, including some of the more recent works in the area of server-based, network-assisted adaptive streaming and cloud-based streaming solutions. Finally, in Subsection 2.5 we provide an overview of the serverless computing model, focusing on the features most relevant to the context of this work.

2.1 3D Immersive Media Production Platforms

The key enabler of 3D immersive media production is a volumetric capturing system. A volumetric capturing setup usually comprises a 360° arrangement of inward-looking camera sensors, defining a capturing space with specific boundaries. Although volumetric capturing systems most commonly output a multi-view plus depth [29] representation of the captured scene, the most common 3D immersive media formats are colored point clouds and textured 3D meshes. The latter are produced by 3D reconstruction algorithms [50,51] run on the 3D points of the spatially aligned captured views.
In general, the 3D reconstruction process can be performed either offline or in real time, which – given sufficient computational and network resources – can additionally allow for live streaming.

An open, free-to-use, state-of-the-art, low-cost and portable volumetric capture system, which does not integrate a 3D reconstruction algorithm, is [82]. One of the earliest low-cost platforms [97] utilized 4 consumer-grade RGB-D sensors and incorporated 3D reconstruction, enabling tele-immersion at interactive rates.

More recently, Holoportation [66] utilized 16 IR-stereo pairs for depth estimation, along with 8 color cameras for texturing, to produce and stream high quality 3D textured meshes. Even though this system produces stunning 3D reconstructions, its computational complexity is high, as it requires 1 GPU per IR-stereo pair and a main workstation equipped with 2 GPUs to undertake the actual processing. Moreover, the output bit-stream requires approximately 40 Mbit/frame, which for a 30 frames-per-second real-time streaming scenario would require over 1 Gbps of bandwidth.

A significant improvement on the volume of the streamable content, which has been kept below 16 Mbps without compromising quality, has been demonstrated by the offline immersive media platform in [25]. To achieve such remarkable performance, the authors employed 61 12-core Intel Xeon machines, while processing would take 25–29 sec/frame. Other 3D immersive media platforms also exist in the literature [30,75]. Common elements among most existing works are the increased processing power required to achieve high quality content and the extreme bandwidth requirements for streaming, which can only be mitigated by devoting even more processing resources.

2.2 Adaptive Streaming

Consumers of media content over the internet are highly heterogeneous.
A consumer is characterized by device capabilities, available processing power and network quality (bandwidth, latency, and loss rate). The most common way contemporary technology optimizes QoE for consumers is through HTTP Adaptive Streaming (HAS) [16]. The objective of HAS is to maintain the viewer's QoE at high levels, countering the negative impact of network bandwidth fluctuations. In HAS, prior to distribution, the video needs to be available in segments and encoded in multiple qualities. The most popular HAS protocols today are MPEG Dynamic Adaptive Streaming over HTTP (DASH) [79] and Apple's HTTP Live Streaming (HLS) [67]. While they differ in specification and content deployment, MPEG's recent standardization efforts for the Common Media Application Format (CMAF) [2] allow adaptive streaming using either MPEG-DASH or HLS from a single source.

There exist multiple studies on QoE in video adaptive streaming [76,45,15,69]. Some of the more important factors affecting QoE include: initial delay, stalling frequency, stalling duration, adaptation (quality change) interval, adaptation frequency, adaptation direction, adaptation amplitude, and the video's spatial resolution, frame-rate and visual quality [76]. Due to the multiplicity of factors affecting a consumer's QoE, there is no single QoE model that different studies converge on and which can serve as a common reference framework.

HAS leaves the encoding schemes and the adaptation strategy unspecified. According to [17], and based on the location of the adaptation logic inside the HAS system, HAS schemes can be split into four categories: i) client-based, ii) server-based, iii) network-assisted and iv) hybrid. The most common scheme is (i), in which the adaptation logic runs on the client, with the video player fetching video segments based on a manifest. In most implementations, the adaptation logic relies on monitoring internal buffer levels and measuring throughput [19].
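A minimal sketch of such client-side adaptation logic is shown below, assuming an illustrative bitrate ladder, a throughput safety margin, and a low-buffer threshold; none of these values come from a specific player implementation.

```python
def select_bitrate(ladder_kbps, throughput_kbps, buffer_s,
                   safety=0.8, low_buffer_s=5.0):
    """Return the bitrate (kbps) to request for the next segment."""
    # Leave headroom below the measured throughput.
    budget = throughput_kbps * safety
    if buffer_s < low_buffer_s:
        # Buffer nearly drained: halve the budget to avoid a stall.
        budget /= 2
    # Highest ladder rung that fits the budget; fall back to the lowest rung.
    candidates = [b for b in sorted(ladder_kbps) if b <= budget]
    return candidates[-1] if candidates else min(ladder_kbps)

ladder = [500, 1500, 3000, 6000]
print(select_bitrate(ladder, throughput_kbps=5000, buffer_s=20))  # healthy buffer
print(select_bitrate(ladder, throughput_kbps=5000, buffer_s=2))   # draining buffer
```

Real players combine many more signals (segment download history, buffer dynamics, quality-switch penalties), but the throughput-plus-buffer skeleton above is the common core.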
Current state-of-the-art client-based bitrate adaptation algorithms are presented in [81] and [57]. A cutting-edge reinforcement-learning approach is provided by [59], while [93] describes ensemble algorithms tailored to different network conditions.

2.3 Immersive Media Adaptive Streaming

A survey on 360° video streaming can be found in [36]. Regarding immersive media, MPEG has recently standardized the Omnidirectional Media Format (OMAF) [40] specification for 360-degree video streaming. For 360° video streaming, the most common adaptation strategy is viewport-based, in which the equirectangular image is split into tiles that are encoded at different bitrates based on the viewing direction of the client [77,73,72,78,38], often exploiting tiling support in video coding algorithms like HEVC [84,63].

In [44], a tile-based approach is described using MPEG-DASH SRD (Spatial Relationship Descriptor) and tile-over-viewport prioritization. Naive tile-based approaches download the portion of the video that the viewer is looking at. However, fetching new tiles from the network results in more latency than the motion-to-photon latency of the VR headset. In [88], a probabilistic approach is taken for a tile pre-fetching strategy that minimizes the expected distortion of the downloaded tiles. In [21], a tile-based probabilistic approach is taken that captures the likelihood of the viewer navigating towards specific tiles in the form of heatmaps. In [41], the authors attempt to provide a solution for 360° video streaming to smartphones, overcoming their processing power limitations compared to desktop PCs. Finally, [14] presents a real-time streaming system for 360° video relying on GPU-based HEVC [84] coding.
Due to the higher complexity of 3D representations, 3D immersive media coding and streaming approaches are less mature compared to 360° or standard 2D video. To begin with, there exist very few 3D immersive media codecs exploiting inter-frame redundancy in time-varying mesh sequences (mesh sequences of varying geometry and connectivity, like the ones produced by real-time 3D reconstruction systems) [31], [90]. Thus, for the 3D mesh geometry, only static 3D mesh codecs are utilized [32]. Furthermore, there is very little literature regarding QoE for 3D immersive media streaming, which could drive adaptive streaming systems [33].

On the other hand, more options exist for point-cloud representations. In [55] and [23], point clouds are compressed exploiting volumetric function representations, while in [62] point cloud sequences are intra-frame and inter-frame coded based on octrees and motion prediction. A detailed survey summarizing works in 3D geometry compression can be found in [58]. Finally, the accompanying textures used to colorize the 3D mesh are often compressed using standard 2D image or video compression algorithms like Motion-JPEG (MJPEG) or HEVC.

One of the first works for real-time adaptive streaming of textured 3D time-varying meshes is [26], which is based on a dynamic rule adaptation strategy modifying the compression parameters of the real-time stream. For point-cloud streaming, in [43] and [68], multiple 3D objects of the same scene are streamed, with adaptation relying on the content's proximity to the viewer, along with the viewer's looking direction and distance to the content. One of the first complete HAS implementations of a point-cloud adaptive streaming solution is presented in [42]. The authors in [42] demonstrate a DASH-compliant HAS system for dynamic point clouds, introducing rate adaptation heuristics based on the viewer's position and looking direction, network bandwidth and buffer status.
At the same time, the encoding scheme utilizes the recently introduced MPEG Video-based Point Cloud Compression (V-PCC) algorithm [27].

2.4 Other Adaptive Streaming Solutions

Server-based, network-assisted and hybrid approaches to adaptive streaming used to be less popular, but recently they have started attracting increased interest with the emergence of Software-Defined Networking (SDN) and 5G networks. In [35], a DASH-based server-client adaptive streaming system for standard 2D video is proposed that achieves efficiency, stability, fairness and convergence, with the server and clients cooperating for maximum gains. A SAND-DASH network-assisted approach is given in [61], describing a method to perform adaptive video streaming to mobile devices in Multi-Access Edge Computing (MEC) scenarios. Noticeable performance improvements have been observed when the achievable throughput was moderately high or the link qualities across mobile clients were alike. Finally, in [86], the authors propose Cloud Live Video Streaming (CLVS), a model that exploits Amazon S3's storage capabilities in order to enable cost-efficiency in a live video streaming scenario oriented towards small streaming sessions.

The solution in [86] eliminates the need for a constantly up-and-running streaming server (and in that sense it is serverless). Rather, a source video is recorded by a mobile device, on which it is then segmented and encoded. Next, those video segments are pushed into a designated Amazon S3 bucket. On end-user devices, the CLVS client program directly retrieves the most recent video segments from the S3 bucket, then performs decoding and video playback. While being inventive and accruing cost-efficiency advantages compared to a typical solution, in which a cloud-based video streaming server can have idle periods, CLVS will not scale to support 3D adaptive streaming from the latency, bandwidth, or cost-efficiency perspectives.
Also, this design does not allow a network-centric adaptation of QoE.

In addition to the previously presented related work, we find it important to also mention two other works in the literature that are relevant to our domain. In [52], a game engine plugin is designed based on MPEG-DASH [79] SRD and HEVC [84] with an in-game 360° virtual camera, in order to enable 360° video streaming of the game environment at e-sports events. And in [85], the authors try to exploit the 5G network infrastructure to offer better QoE in 360° video streaming.

2.5 Serverless Computing for Media

The serverless programming model is rapidly becoming popular with developers. Serverless computing relieves developers from the tasks related to application packaging and server provisioning. The developers need only provide the code of their application, and a source-to-executable pipeline automatically creates a running task in a cloud. As explained in the previous section, broadly speaking, the serverless computing paradigm refers to services that scale to zero. FaaS, a particularly popular serverless computing model, refers to structuring applications as stateless functions that are called on demand (e.g. in response to events). The reader is advised to consult [54] for a comprehensive review of serverless frameworks.

Recently, FaaS has been applied by practitioners to video streaming [37,74]. Little scientific literature exists on the topic. In [94], a measurement study of transcoding tasks has been performed to explore how different lambda function configurations (in terms of memory and proportionally allocated CPU) affect performance and cost. The study reveals that the memory configuration for cost-efficient serverless functions is non-trivial: the best memory configuration is influenced by the task type or even the video content. More work is needed to design an efficient and adaptive system to find the best configuration for serverless functions in video processing pipelines.
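The event-triggered pattern common to these serverless video pipelines can be sketched in the shape of an Apache OpenWhisk Python action, where the platform invokes main() with the trigger's payload as a dict. The transcode helper, payload fields, and profile names below are placeholders for illustration, not a real storage or transcoder API.

```python
def transcode(source_key: str, profile: str) -> str:
    # Placeholder: a real action would fetch the object, run a transcoder,
    # and write the result back to object storage or push it to a CDN.
    return f"{source_key}.{profile}.mp4"

def main(args: dict) -> dict:
    """OpenWhisk entry point, fired e.g. by an object-storage upload event."""
    key = args.get("key", "unknown")
    profiles = args.get("profiles", ["720p", "480p"])
    outputs = [transcode(key, p) for p in profiles]
    # The returned dict may itself trigger further actions downstream.
    return {"source": key, "outputs": outputs}

print(main({"key": "match42/replay01"}))
```

Note that this file-at-rest pattern is exactly what does not transfer to live 3D streams, where functions must exchange media over the network rather than through storage events.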
In [10], a serverless framework facilitating the development of video processing pipelines is described. Common to all these solutions is the use of serverless functions (e.g. AWS Lambda) for performing operations (e.g. transcoding) on a video file that is uploaded to cloud storage (e.g. an S3 bucket). Upon the file upload, an event is generated by the storage, which triggers the execution of a serverless function, whose output is either stored in the cloud storage again (potentially creating another trigger for another function execution) or propagated to a Content Distribution Network (CDN), such as AWS CloudFront.

To the best of our knowledge, until this work, no attempt has been made to use serverless functions for the adaptive transcoding of a live 3D immersive media stream. In our implementation we used the open source Apache OpenWhisk project [1]. We leveraged OpenWhisk's capability of executing functions on top of Kubernetes to provide features such as direct network communication among serverless functions (as opposed to communication via storage or a database, which is typical in other frameworks), support for Day 0, Day 1, and Day 2 configuration, as well as support for GPUs. None of these features are provided out of the box to developers, which hinders serverless adoption for media intensive applications. In this paper we demonstrate how adding these features might open up new opportunities to achieve cost-efficient immersive media implementations.

2.6 Summary of our innovations

In this paper, we expand on a preliminary design [34] of a novel multiplayer tele-immersive game application [24] where players are embedded inside the game environment via their 3D reconstructed avatars. The gaming application is supported by a 3D immersive media production platform which uses [82] for volumetric capturing and a re-implementation of the 3D reconstruction algorithm found in [7].
This platform is low-cost, portable, real-time, and produces streamable content of ∼50 Mbps at interactive rates (25 frames per second). The application offers live spectating of the game action and on-demand viewing of replay clips. It is deployed on 5G serverless infrastructure and employs adaptive streaming techniques to stream the 3D appearance of the players to spectators. Our adaptive streaming algorithm is based on [32] for compressing geometry and on MJPEG for compressing textures. Adaptation is achieved by varying compression parameters to produce different profiles at various bit-rates. Further, apart from costs, adaptation optimization is driven by a variant of the QoE model in [92]. Our work is among the first to discuss live 3D immersive media streaming under a 5G, serverless framework. Such an attempt was not possible before, mainly due to serverless frameworks lacking support for network communication between serverless functions, Day 0/1/2 configurations, and GPUs. Furthermore, this work is also among the first to provide a novel network-centric adaptive streaming algorithm which takes the serverless benefits into account in order to minimize service costs while offering high QoE to spectators.
FaaS frameworks and offerings are rapidly proliferating. However, there are only a few industrial-grade open source FaaS platforms available. One such framework is Apache OpenWhisk [1], which powers the IBM Cloud Functions commercial offering [4]. Presently, FaaS commercial offerings do not offer the usage of GPUs in serverless functions. The reason is that GPU sharing is a relatively new topic that poses a number of challenges. Since NVIDIA introduced the Multi-Process Service (MPS) in its Volta GPU architecture [65], GPU sharing has been a hot research topic [95].

The most common compute virtualization technology powering FaaS is containers. In production, containers are managed by container orchestrators, such as Kubernetes [56]. However, current container orchestrators do not yet know how to leverage architectures such as NVIDIA MPS. Thus, the solution that we adopt for extending FaaS to use GPUs is time-sharing of GPUs rather than collocating workloads on the same GPU. Another reason for preferring time-sharing to spatial collocation is that collocating workloads on GPUs might require re-writing of the application code.

Another problem that is currently not addressed by FaaS frameworks is supporting both inbound and outbound network traffic to and from serverless functions. Usually, only the outbound traffic is supported seamlessly. For inbound traffic, the image of the serverless function container must include some communication service, which might be difficult to achieve due to the inability to expose the function as a service to the outside world and intricate firewall settings. In our solution we rely on the Container Network Interface (CNI) to connect serverless functions to a logical network maintained by the container orchestrator.

Finally, in the context of 5G MEC, a FaaS framework is provided as part of the MEC platform. Figure 1 shows the ETSI reference architecture for 5G MEC. In this architecture variant, termed
MEC in NFV, the application components (serverless functions) are required to be packaged as Virtual Network Functions (VNFs) to be managed by the ETSI Management and Orchestration stack (MANO) via either a Virtual Network Function Manager (VNFM) or a Network Function Virtualization Orchestrator (NFVO). Finally, the actual container allocation should be performed by the Virtual Infrastructure Manager (VIM) MANO component. Therefore, a challenge arises in how to harmonize ETSI
MANO standards with FaaS. We partially addressed this problem in our previous work [9], where we described an ETSI-compatible FaaS VIM. In this paper, we deal with additional problems related to harmonizing the orchestration of serverless functions with ETSI MANO to implement the tele-immersive media application.

We will now briefly discuss the challenges mentioned above, and outline how we deal with them in the proposed solution.

3.1 Orchestrating Serverless Applications in 5G MEC

One of the most important challenges to the integration between MANO and serverless technology that we faced in our serverless tele-immersive media implementation was the inability to model a FaaS-based service using ETSI VNF Descriptors and VNF Packages. In particular, a FaaS-based network service includes components that should not be started upon service instantiation, but created and deleted based on custom events. Some of these events may happen inside the application itself. This pattern cannot be reduced to auto-scaling, which ETSI MANO already handles well. Rather, it requires additional flexible orchestration mechanisms, which are application specific. We have developed such a serverless orchestration, which generalizes to arbitrary custom orchestration scenarios and across multiple use cases.

In Figure 2 we show how we combine serverless orchestration with MANO for the sake of managing serverless tele-immersive media in 5G MEC. We used Kubernetes as our NFVI because it is the de facto container orchestration standard.
Also, it provides out-of-the-box capabilities for networking and GPU consumption by the OpenWhisk serverless functions, as we discuss in Subsection 3.2 and Subsection 3.3, respectively. (A reference implementation of our extended FaaS framework and its integration with MANO can be found at https://github.com/5g-media/faas-vim-plugin.)

To harmonize serverless functions with the standard ETSI network service modeling and life cycle management, we add key/value pairs to the optional information field of a Virtual Network Function Descriptor (VNFD), indicating whether a VNF/CNF is serverless and whether it should be started upon instantiation of the network service or upon some custom event.

Each serverless VNF/CNF in our system is an OpenWhisk action (in Apache OpenWhisk parlance, functions are termed actions; we use the terms interchangeably wherever this does not cause ambiguity) that is pre-registered with the
OpenWhisk FaaS system, which is provided as part of the 5G MEC platform. This is part of the onboarding into a VIM mechanism prescribed by MANO. For more details see our previous work [6]. The image of a serverless VNF/CNF is simply a fully qualified action name that points to the appropriate metadata associated with the action: key/value pairs describing the action executable; resource limits such as memory, CPU, and maximum execution time; and a variety of annotations that help OpenWhisk invoke and manage this function.

This metadata is interpreted by our OpenWhisk VIM, which does not invoke an action (does not start the serverless VNF/CNF) upon the initial instantiation driven by ETSI MANO (shown on the left of the figure). A standard MANO instantiation flow consists of un-packaging the network service package and instantiating VNFs/CNFs in the sequential order of appearance of their VNFDs in the package. MANO's NFVO or VNFM polls the VIM periodically to obtain metadata on the instantiated VNFs/CNFs when they get started and configured with Day 0 configuration (e.g. IP:port address). This data is stored inside MANO's in-memory inventory of the running VNFs/CNFs, called the Virtual Network Function Record (VNFR). Our OpenWhisk VIM initially reports placeholder metadata, such as "0.0.0.0:0", for serverless VNFs/CNFs that should not be started upon instantiation, but rather upon events.

To handle event-driven instantiation and configuration of serverless functions, we developed a novel orchestration subsystem, which is shown on the right side of Figure 2. We use CNCF Argo Workflows [12] and Argo Events [11] as the basic mechanism for the proposed serverless orchestration. The former is a Kubernetes-native workflow management engine, while the latter is a Kubernetes-native event dependency resolution system that can trigger Argo Workflows in response to external events.
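The VNFD annotation scheme described above could look roughly as follows. This is a hedged sketch: the key names in the optional information field and the descriptor layout are illustrative assumptions, not an ETSI-standardized or project-verified schema.

```yaml
# Hypothetical VNFD fragment: key/value pairs in the optional information
# field mark a VNF/CNF as serverless and as event-driven, so the FaaS
# (OpenWhisk) VIM defers its invocation until a custom event arrives.
vnfd:
  id: vtranscoder-vnfd
  vdu:
  - id: vtranscoder
    image: /5g-media/vtranscoder_gpu   # fully qualified OpenWhisk action name
  extra:                               # optional information field
    serverless: "true"                 # interpreted by the FaaS VIM
    start-on-instantiation: "false"    # invoked later, upon a custom event
```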
We include a special bootstrap function with every network service that uses serverless functions that should be started on demand. In particular, in our implementation of the tele-immersive gaming, transcoders and replay functions are started on demand in response to in-application events rather than upon the initial instantiation.

The bootstrap function contains yaml definitions for two Kubernetes
Custom Resources (CRs): Gateway and Sensor, which are specific to this network service. CRs comprise the standard Kubernetes mechanism to extend the Kubernetes resources ecosystem, so that external resources can be managed like native ones, such as pods, deployments, jobs, etc. An interested reader is referred to the Argo documentation for details of how to use Argo. For the sake of the exposition in this work, it suffices to mention that CRs are
essentially yaml files adhering to the Argo dialect. Each such file, a CR, is an instance of a schema called a Custom Resource Definition (CRD).

Fig. 1: Serverless Tele-Immersive Media in 5G MEC

Gateway and Sensor lifecycles are managed by the Gateway and Sensor controllers, respectively, which watch for new CR instances of the Gateway and Sensor CRDs. When such instances appear as a result of applying the CR document to the Kubernetes API server, the Gateway controller sets up a new Gateway instance and connects it to an external event source and a Sensor target, as specified in the CR specification. Likewise, when a new Sensor CR is applied, a Sensor controller that watches the Sensor CRD creates a new Sensor instance and makes itself available to receive events from the appropriate gateways.

In our implementation, the Argo Gateway, Sensor, and Workflow controllers are part of the pre-deployed services provided by the MEC (see Figure 1). A bootstrap function is always started upon instantiation and, immediately after starting, applies the yaml CR definitions of the Gateway and Sensor for this service instance, thus creating a session-level event-driven orchestration control plane. This control plane exists for the duration of the service; once the service is deleted (or naturally comes to a termination, e.g. if the game time is up), this event-driven control plane is purged from the system.

Our implementation uses an out-of-the-box Webhook Gateway that can receive external HTTP requests, which it passes to the Sensor. The Sensor is more intricate. Based on the payload of the HTTP request (i.e. an event that it receives from the Gateway), it conditionally executes lifecycle management actions, such as starting a serverless VNF/CNF, stopping a VNF/CNF, Day 2 configuration related actions, etc.
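A session-level Sensor of the kind applied by the bootstrap function could be sketched as follows. This is a hedged illustration only: the resource names, event names, and nested workflow body are assumptions, and the exact Argo Events schema varies between releases, so the Argo documentation for the deployed version should be consulted.

```yaml
# Illustrative Sensor CR: react to events from the session's Webhook
# Gateway and trigger an Argo Workflow that invokes or terminates
# vTranscoder actions through the OpenWhisk API.
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: session-sensor            # assumed, one per game session
spec:
  dependencies:
  - name: session-events
    gatewayName: session-webhook-gateway
    eventName: profile-change
  triggers:
  - template:
      name: reallocate-transcoders
      argoWorkflow:
        source:
          resource:
            apiVersion: argoproj.io/v1alpha1
            kind: Workflow
            # ...workflow steps calling the OpenWhisk API to start or
            # stop vTranscoder actions for this session...
```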
A service developer has to program the Sensor to enable this event-driven orchestration at runtime.

A lifecycle management action is an Argo Workflow instance (yet another CRD) that natively executes in Kubernetes. In essence, a Sensor conditionally triggers Argo Workflows depending on the operation passed to the Sensor by the Gateway. The operation specification is part of the original HTTP request. The triggering is performed by applying a corresponding Argo Workflow CR instance to the Kubernetes API Server. The Argo Workflows controller, which watches for new CR instances, picks it up and sees to the workflow's execution.

We use this novel orchestration mechanism as follows. When our network-centric optimization decides to reallocate specific transcoding profiles, the control plane of the application that performs the optimization of a specific session sends an HTTP request to the Gateway of that session (previously set up by the bootstrap function upon instantiation of the service), requesting the termination of some transcoder profiles and the allocation of others (i.e. terminating some running OpenWhisk actions and invoking other OpenWhisk actions in the Kubernetes NFVI through the OpenWhisk API). Likewise, when an event of interest happens in the session, an HTTP request to start a replay function is sent to the Gateway of the session, triggering a management workflow in the Sensor of the session that invokes the replay action, configures it, and connects it to the rest of the running service.

3.2 Networking for Serverless Applications in the 5G MEC

FaaS frameworks do not support direct network communication among functions out of the box. In our prototypical implementation, we use Kubernetes as the backend for OpenWhisk action (container) execution. Kubernetes provides a number of networking solutions out of the box through its Container Network Interface (CNI) standard. These solutions differ in their level of maturity and sophistication.
A survey of the Kubernetes networking landscape is out of scope for this paper, so we provide only a high-level description pertaining to our implementation. In our proposed solution, we use Flannel [3], a simple pod-level overlay network that enables the containers running in these pods to communicate directly. The challenge in using Flannel for our work was in devising the orchestration workflows in the Sensor to set up the network just in time upon service instantiation, and then connecting the newly invoked OpenWhisk actions (which eventually run as pods) to this network.

A typically hard problem associated with this is port mapping. For each pod in Kubernetes, the externally visible IP address of the pod is the address of the Kubernetes Master (also known as the address of the cluster). However, ports should be allocated dynamically and without conflicts. For internally addressable components (i.e. within the same Kubernetes cluster), the port mapping is automatically solved by using a NodePort resource that exposes a pod as a service. However, in our case, if a service component should be accessed externally, a more elaborate Ingress resource should be defined. We omit the technical details of setting up and configuring the Ingress resource and Ingress Controller. It is important to stress that in our system this is done on demand using the serverless orchestration mechanism described in the previous subsection.

3.3 GPU Allocation for Serverless Applications in 5G MEC

Some transcoding profiles require GPUs for efficiency. In fact, a large part of this work is devoted to optimizing the usage of GPUs for serverless tele-immersive media applications in 5G MEC, where these resources might be scarce and relatively expensive. However, before we can optimize the usage of GPUs by serverless frameworks, we need basic support for consuming them. Apache OpenWhisk proved to be an easily extensible framework in this respect.
OpenWhisk contains an extensible dictionary of action kinds that define their runtimes. We created a new runtime that uses NVIDIA's CUDA framework. For example, a generic CUDA action kind can be defined as follows:

```json
"cuda": [
  {
    "kind": "cuda:8@gpu",
    "default": true,
    "image": {
      "prefix": "docker5gmedia",
      "name": "cuda8action",
      "tag": "latest"
    },
    "deprecated": false,
    "attached": {
      "attachmentName": "codefile",
      "attachmentType": "text/plain"
    }
  }
]
```

Listing 1: CUDA action

Adding an entry to the action kinds dictionary is not sufficient to make Apache OpenWhisk interpret this new action kind. There is a component in Apache OpenWhisk called the Kubernetes Client which, when OpenWhisk is configured to use Kubernetes as the container management environment for actions, creates a Kubernetes pod yaml definition out of the action metadata. This yaml definition is then applied by the Kubernetes Client to the Kubernetes API Server, and the action starts executing as a Kubernetes pod. A Kubernetes yaml definition for the action shown in Listing 1 would look as follows:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda8action
spec:
  containers:
  - name: cuda8action
    image: "docker5gmedia/cuda8action:latest"
    resources:
      limits:
        nvidia.com/gpu: 1
```

Listing 2: CUDA action Kubernetes yaml definition

Fig. 2: Serverless Tele-Immersive Media in 5G MEC

We modified the Apache OpenWhisk Kubernetes Client to recognize the GPU action kinds that we defined for transcoding profiles that use GPUs. When such an action is on-boarded on an OpenWhisk deployment configured to work with a Kubernetes cluster that has GPU-equipped worker nodes, the action will be placed by the Kubernetes scheduler on a node that has an NVIDIA GPU (this functionality has been supported in Kubernetes as an experimental feature since Kubernetes 1.8). A full implementation is available in [20].
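The translation that the modified Kubernetes Client performs can be illustrated with a small sketch mapping a Listing-1-style action kind entry to a Listing-2-style pod definition. The function and the convention of detecting GPU kinds by the "@gpu" suffix mirror the listings but are illustrative assumptions, not the actual OpenWhisk code.

```python
# Illustrative sketch: build a pod spec (cf. Listing 2) from an OpenWhisk
# action-kind entry (cf. Listing 1), adding a GPU resource limit for kinds
# whose name carries the assumed "@gpu" suffix.

def kind_to_pod(kind_entry: dict) -> dict:
    img = kind_entry["image"]
    name = img["name"]
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {"containers": [{
            "name": name,
            "image": f'{img["prefix"]}/{name}:{img["tag"]}',
            "resources": {},
        }]},
    }
    if kind_entry["kind"].endswith("@gpu"):
        # GPU action kinds request one NVIDIA GPU via the device plugin resource
        pod["spec"]["containers"][0]["resources"]["limits"] = {"nvidia.com/gpu": 1}
    return pod
```

Applied to the entry of Listing 1, this yields the container image "docker5gmedia/cuda8action:latest" with an nvidia.com/gpu limit of 1, as in Listing 2.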
In this section, we describe the design of our adaptive serverless streaming service. In our implementation we used FaaS, as described in Section 3. Although FaaS actions have a limited lifetime, we found them adequate for implementing our short-session-based service.
Tele-immersive media streaming services are usually sporadic in nature, with long periods of idleness interspersed with short sessions of activity (e.g. gaming or conferencing). As a result, under a traditional VM-based design, apart from the increased service complexity, a constant sizing problem would manifest when seeking to optimize the service's costs. FaaS offers a more cost-efficient alternative, as it automatically scales to the number of active sessions.

As media streaming consumers can have very different bandwidth or processing capabilities and network conditions can fluctuate, a crucial part of an effective media streaming service is adaptation. The original content is transcoded into a number of media profiles, each targeting a different bandwidth and media quality, allowing each consumer to receive the profile most suited to their needs. The lack of an appropriate profile can lead to frequent buffering events for on-demand consumption, or make meaningful reception completely impossible for live streaming. Hence, adaptation is especially important in live streaming media services.

An apparent advantage of a serverless adaptive media streaming service is a more efficient utilization of its available resources, such as the different transcoding profiles. Indeed, for sessions with smaller consumer populations, not all profiles might be relevant, which allows for cost-optimized use of resources. Thus, apart from inter-session scaling, serverless streaming offers the capability of finer-grained intra-session scaling and adaptation.

This is more pronounced for emerging media services, whose profiles and codecs have not yet converged to a standard, in contrast to traditional (i.e. flat/2D) audiovisual media. Thus, emerging media has to deal with a wider repertoire of profiles.
Specifically for 3D immersive media [49], [24], the profile selection problem is more complex [32] due to the simultaneous availability of various profiles (joint 2D and 3D) and their suitability to highly heterogeneous consumer types (i.e. mobile, workstations, VR headsets) that, in turn, come with different requirements in terms of the delivered profiles.

This type of immersive media delivers two payloads simultaneously: the 3D mesh media stream and the multi-view texture media streams. While the latter are encoded with traditional flat/2D media encoders, the former use distinct 3D codecs. This effectively renders each immersive media stream profile a tuple of a video (i.e. 2D) profile, P_2D^a, and a 3D profile, P_3D^b, leading to a more complex visual quality formulation [33]. Furthermore, emerging consumption means that VR and AR accompany traditional displays (i.e. desktop/laptop and mobile), creating a far more complex landscape for profile selection that depends on each consumer type's computing and viewing characteristics. We argue that for sessions with relatively few consumers, which will require only an optimal subset of profiles, a serverless streaming model is more appropriate, because it opens up more opportunities for optimization.

As explained in the previous section, our extended FaaS framework allows GPU consumption. This adds another dimension to our profiles, expanding the two-tuple to a three-tuple (P_2D^a, P_3D^b, R^c), containing the 3D and video profiles in addition to the computing resource type R^c (i.e. CPU: c = 0; or GPU: c = 1). Thus, profiles with similar bit-rates may reduce processing latency at the expense of using higher-cost resources. For conciseness, we denote a transcoder's joint 3D media profile as P_n, with n encoding a unique combination of a, b, and c.

Serverless design follows a single responsibility principle: each function is responsible only for a single task, instantiated as the need arises and destroyed when the task is completed.
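The encoding of a unique profile index n from the combination (a, b, c) introduced above can be made concrete with a small sketch. The level counts A and B and the packing order are illustrative assumptions; the paper does not prescribe a particular numbering scheme.

```python
# Illustrative packing of the joint profile tuple (a, b, c) into the single
# index n of P_n. A = assumed number of 3D (mesh) profile levels,
# B = assumed number of 2D (texture) profile levels.
A, B = 4, 3

def encode(a: int, b: int, c: int) -> int:
    """Map (3D level a, 2D level b, resource type c) to a unique index n."""
    assert 0 <= a < A and 0 <= b < B and c in (0, 1)
    return (c * A + a) * B + b

def decode(n: int) -> tuple:
    """Recover (a, b, c) from the profile index n."""
    ca, b = divmod(n, B)
    c, a = divmod(ca, A)
    return (a, b, c)
```

Any bijective packing would do; the point is only that a flat index n suffices to name one vTranscoder configuration, including its CPU/GPU placement.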
In the context of media streaming adaptation, this translates to having one transcoding function for every combination of profile and source (i.e. a player).

A general scenario for tele-immersive media streaming includes a population of producers (K), who generate live 3D video streams, and a population of spectators (S), who need to receive the streams of all producers and reconstruct them in the virtual environment. Our service then comprises a broker function (vBroker) and |N| · |K| transcoder functions (vTranscoder), where N is the set of transcoding profiles, as each vTranscoder is responsible for transcoding the stream of one specific player to one specific profile. Producers send their production streams to the vBroker, while vTranscoders receive these streams from the vBroker, transcode them according to predefined profiles, and upload them back to the vBroker. Consumers are then served either the production stream or a transcoded one from each producer, based on an adaptation logic. In the context of Kubernetes, this means allocating a set of transcoding actions for each production stream.

This serverless adaptive streaming design lends itself to optimization. Transcoder functions can be deployed on demand, as events in response to the analysis of the service's monitored behaviour. Typically this means monitoring the service's cost, seeking to minimize it, and monitoring the QoE of its consumers, seeking to maximize it. Taking into account the cloud-native transformation happening thanks to the emergence of 5G and the virtualization and softwarization of the network, it is possible to perform service optimization in an integrated manner with the network itself. Instead of relying on local client-based adaptation, service adaptation and optimization can take a more global approach.

Our streaming service is entirely dynamic, with the vBroker action deployed at the start of each session, for that specific session.
This allows for edge proximity placements and a flexible vBroker interconnection scheme that unifies edge and core resources, allowing our session-based services to span multiple infrastructures. Transcoders are deployed on-the-fly according to the network-centric session optimization logic. The service can either start with zero transcoders and subsequently add them on demand as guided by the optimization, or start with a default transcoder profile configuration and then adapt it to the actual consumers. This is similar to client-based adaptation schemes that start either on the lowest or the highest profile, and then adapt to the one which results in higher QoE.

Application-specific events (e.g. replaying highlights) trigger processing functions that are deployed on the serverless infrastructure and are responsible for synchronizing media and game-state streams to produce replay clips that can later be served to spectators on demand.

In Fig. 3, the service components are depicted. On the left, producers in the 3D immersive media production platform produce high quality profile 3D media streams, denoted as P_0^k. The adaptive streaming components comprise a set of vTranscoders, each one responsible for transcoding an input 3D media stream from a single player to a single profile. The transcoded streams become available to the consumers via the vBroker instance. Additionally, vReplay instances are instantiated on the FaaS infrastructure in response to specific events, as described in Section 3. Upon the completion of replay clip processing, the processed media clips become available to the application consumers on demand.

Fig. 3: An abstracted architecture of our service: vTranscoders are instantiated or destroyed as needed, each one responsible for transcoding the media stream of one producer to one profile. vReplay functions are similarly triggered by certain events. All streams flow through the vBroker, from which consumers receive the allocated media streams.

In more detail, our network-centric real-time adaptive streaming service drives an AugmentedVR [49] gaming application. The application manages gaming sessions supporting K players and S spectators, where |S| ≫ |K|. Each player is captured with a volumetric capturing station [83] and 3D-reconstructed in real time [7], producing a live 3D media stream. The players' live media traffic, along with the application game state metadata, is transmitted and synchronized among the playing users (more details regarding the application's architecture can be found in [24]). In this way, players are emplaced within the same shared virtual environment, and interact within it under a capture-the-flag context. Through the aforementioned adaptive streaming service back-end, the application allows for remote party spectating of each gaming session.

The spectators S receive the synchronized game state and all |K| players' media streams, faithfully reproducing the current session, with example screenshots presented in Fig. 4. While the players' communication is based on stringent real-time requirements, the spectators' media consumption relies on broadcast traffic, and thus requires consistent streaming with relaxed latency constraints. This is driven by a centralized control plane of the application, which oversees the production and delivery of appropriate profiles to each spectator for smooth playback. The control plane is extensible, and new optimization algorithms can be plugged in as needed.
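The pluggable control-plane loop just described can be sketched as follows. All names here (the Optimizer signature, the Gateway URL, and the payload shape) are illustrative assumptions, not the project's actual interfaces; the only project-level fact it mirrors is that plans are actuated by POSTing to the session's Webhook Gateway.

```python
# Hedged sketch of one control-plane iteration: a pluggable optimizer maps
# per-spectator metrics to a transcoder allocation plan, which is then
# POSTed to the session's Webhook Gateway (the Argo Events entry point).
import json
import urllib.request
from typing import Callable

# An optimizer maps a list of per-spectator metric dicts to an allocation plan.
Optimizer = Callable[[list], dict]

def control_step(metrics: list, optimizer: Optimizer) -> dict:
    """Run the plugged-in optimizer and wrap its plan as a Gateway operation."""
    plan = optimizer(metrics)
    return {"operation": "reallocate-transcoders", "profiles": plan}

def actuate(gateway_url: str, payload: dict) -> None:
    """POST the operation to the session Gateway, triggering the Sensor."""
    req = urllib.request.Request(
        gateway_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

A naive baseline and the network-centric optimization then differ only in the optimizer function that is plugged in.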
In Section 5.5 we present our proposed network-centric optimization to drive the control plane, and in Section 6 we compare this smart optimization with a more naive baseline algorithm to quantify the benefits of the network-centric optimization. The control plane of the application interacts with the Serverless Orchestration mechanism described in Section 3 to actuate the transcoder profile allocation plans calculated via optimization.

These profiles are selected from a set of profiles N, with each spectator receiving one profile P_n^k ∈ N (with n ∈ |N|) for each player k ∈ K. Each profile is served by a single transcoding action, spawned and managed by the service, that re-encodes the originally produced profile P_0^k from a specific playing user to a lower bit-rate profile P_n^k, which is made available on the broker. At the same time, the application orchestrates the production of on-demand media in the form of highlight replay clips. These are event-driven processing actions that produce finite media streams of previously captured live traffic. Once produced, these too are available on the service's broker for on-demand consumption by the spectators. Finally, the orchestration and management of the transcoding actions are handled by our service's optimization logic, which has a dual role: on the one hand, to optimize the application's costs while preserving the resulting QoE by making scaling decisions for its elastic components (i.e. the transcoding actions); and, on the other hand, to apply network-centric adaptation by collectively deciding each spectator's consumed profile.

Fig. 4: Screenshots of the AugmentedVR [49] immersive media game where the playing users' real-time 3D media streams are embedded into the same shared virtual environment.
The screenshots' viewpoints are those of spectating users, who can freely navigate the scene in order to spectate the action around the virtual arena.

One important design concern is dealing with the fixed maximal lifetime of FaaS executions. In cases when the session time is about to exceed the lifetime of the functions involved, shadow FaaS invocations can be started and configured. As explained in Section 3, we use NodePort to expose serverless functions as Kubernetes services. This means that we can transparently switch one FaaS invocation for another without disturbing the service. Therefore, while no single serverless function can execute beyond its maximal lifetime, collectively an intensive media session can be extended as needed at fine granularity.
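The timing of these shadow invocations can be sketched as follows. The numeric limits are illustrative assumptions (OpenWhisk's per-action limit is configurable, and the paper does not state the values used), and the sketch only computes when shadows should start; the actual invocation and NodePort retargeting are left abstract.

```python
# Sketch: schedule "shadow" invocations so a session can outlive the
# per-invocation FaaS limit. Each shadow starts HANDOVER_MARGIN_S seconds
# before the running invocation would expire, leaving time to configure it
# before the Kubernetes Service is switched to the new pod.
MAX_LIFETIME_S = 300     # assumed maximum FaaS execution time
HANDOVER_MARGIN_S = 30   # assumed overlap for configuring the shadow

def plan_handovers(session_length_s: int) -> list:
    """Times (seconds from session start) at which to start shadow invocations."""
    period = MAX_LIFETIME_S - HANDOVER_MARGIN_S
    times, t = [], period
    while t < session_length_s:
        times.append(t)
        t += period
    return times
```

With these assumed values, a 10-minute (600 s) session hands over twice, while a session shorter than 270 s never needs a shadow.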
When considering the optimization of a serverless live streaming delivery network, there are two conflicting objectives: to maximize the QoE of every individual spectator and to minimize the cost to the service provider. Maximizing the QoE entails making the streams available in multiple versions differentiated in visual quality and bitrate, so that each spectator can consume the version most suited to her device type, processing power, and connection capabilities. The production of multiple transcoding profiles, however, involves running more transcoder FaaS functions, thus increasing cost.

In order to balance the tradeoff between QoE and cost, both must be expressed in common units. Providing a certain QoE level can be naturally connected to generating revenue for the stream producer, either directly or indirectly. Our proposed optimization maximizes profit for the stream provider (i.e. the revenue-minus-cost objective). This section describes the components involved in modelling revenue and cost.

5.1 Spectator behavior

During the course of a session of live-streamed media, individual spectators may be consuming the stream from its start, or join at any later point in time. Streams of different characteristics (e.g. popularity) may attract new spectators at different rates and numbers. Similarly, spectators may stay online until the stream ends, or quit before that, for reasons which may or may not be relevant to the stream characteristics.
Traditionally, the arrival processes of people to stores and facilities, and of telephone calls, have been modeled using the Poisson distribution [53], [47]. The Poisson distribution gives the probability of k events (e.g. arrivals) occurring in a specified interval, given the average number λ of events per interval [91]:

P(k events occurring in interval) = (λ^k e^(−λ)) / k!    (1)

The process of spectators arriving to an immersive gaming session involves humans making discrete decisions on joining a gaming session. Modeling spectators' arrival to immersive gaming requires a thorough study and careful characterization. However, we are unaware of literature presenting accurate stochastic modeling of spectators' arrival. This can probably be explained by this domain being relatively new and rapidly evolving.

It is reasonable to assume that in certain settings the overall arrival process in immersive gaming might differ considerably from the Poisson distribution. For example, a large number of spectators can arrive simultaneously at the beginning of a session if the session is perceived as extremely interesting because of the prominence and high ratings of the players. More spectators might arrive later, and their arrival process can follow the Poisson distribution, with the overall distribution being non-Poisson. Note that a Poisson arrival process requires that the inter-arrival times are distributed exponentially. Obviously, a simultaneous bulk joining of spectators breaks this assumption. Likewise, a session's arrival process can start as a Poisson one and then a bulk of spectators can join all at once, e.g. because of a viral notification in a social network that this session is a must to attend. Obviously, the Poisson assumptions will be broken in this case as well.

These are extreme scenarios which, albeit possible, are not necessarily expected in each and every gaming session.
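Eq. (1) can be checked numerically. The sketch below implements the pmf directly; the session-length arithmetic that follows uses the paper's default λ = 0.5 arrivals per ten-second step and a 10-minute session, purely as an illustration.

```python
# Eq. (1): Poisson probability of exactly k arrivals in one interval.
import math

def poisson_pmf(k: int, lam: float) -> float:
    return lam ** k * math.exp(-lam) / math.factorial(k)

# With the default lambda = 0.5 arrivals per 10-second step, a 10-minute
# session (60 steps) sees 0.5 * 60 = 30 expected arrivals on average.
lam = 0.5
expected_arrivals = lam * 60
```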
Furthermore, these scenarios are not amenable to the fine-grained optimization that we propose in this paper, because in large spectator populations we would statistically expect enough members with every possible terminal capability to justify the allocation of all possible combinations of transcoding profiles, so as to maximize the total QoE of the spectators. This would make the proposed serverless architecture perform no better than other approaches.

To explore and characterise the sweet spot of our proposed optimization, we focus on sessions with relatively small numbers of spectators, and assume that these spectators behave similarly to the viewers of video streaming services. It should be noted that spectators consume video and, because the sessions are relatively short and dynamic, their behavior might resemble that of video streaming customers browsing video content and watching previews. It is widely accepted to model the arrival process of such customers as Poisson [48,18], even though some works indicate that arrivals to media streaming services can be better modeled by the lognormal distribution [8].

In this work, we follow the mainstream approach and model the arrival process of spectators to gaming sessions as Poisson. We define the distribution's interval as the ten-second time step and set the average number λ of arrivals per interval to a range of values from 0.25 to 1, with a default of 0.5 arrivals per ten-second time step.

Once spectators join, they may remain online until the end of the stream or quit before then. Chen et al.
[22] model spectator quitting probability as a function of their QoE: a spectator with very bad QoE is certain to quit, while a spectator with very good QoE is likely to remain, but still has some 20% probability of quitting before the session ends, for non-QoE-related reasons. Between these two extremes, the decrease of quitting probability with QoE is assumed to be linear.

In a scenario with a diverse mix of spectators, QoE may vary significantly depending on device type, processing power and connection bandwidth. Spectators with powerful PCs and a good connection will have a better QoE than spectators with mobile devices, which would lead to mobile spectators quitting much more frequently. In this work we consider that each spectator is aware of their own hardware and connection capabilities, and will be happy with the best QoE possible for that configuration. Hence, in estimating quitting probability, we consider the difference between the maximum QoE possible for each spectator and their actual QoE.

Other factors that might impact quitting for non-QoE-related reasons include the interest level of a given session: spectators may abandon a boring or slow session more easily than a very active or thrilling one. This also impacts QoE-based quitting probability, as spectators may be reluctant to leave an interesting stream despite mediocre QoE. QoE-related quitting can be further altered by how demanding a spectator population is.

Hence, based on the findings of [22] and these considerations, we build a linear quitting model for each 10-second time step. The probability that a spectator experiencing QoE_t will quit during time step t is:

q_t = q(QoE_t) = b + (QoE_max − QoE_t) · d = b + dQoE · d    (2)

where:

– b is the base quitting probability per time step for non-QoE-related reasons, with a default value of 0.37%, corresponding to a cumulative probability of 20% to quit at some point in the course of a 10-minute session.
– dQoE is the difference from the maximum possible QoE for that spectator.
– d is a factor denoting how much QoE impacts quitting, which depends on the QoE value range produced by the QoE model and on the session parameters (i.e. how interesting or important a session is, and consequently how likely spectators are to leave because of QoE dissatisfaction). The QoE model we adopt (see 5.3) produces values usually within the range of 2.8–3.8. Hence, d ranges from 10% (an interesting session that spectators won't quit easily) to 50% (very demanding spectators), with a default value of 20%.

The probability that a spectator remains online in a given time step is p_t = 1 − q_t. The probability that a spectator remains online from t_0 to t_1 is the product of the probabilities of remaining at each individual time step in between, which, naturally, decreases over time:

p_{t_0 → t_1} = ∏_{t = t_0}^{t_1} p_t = ∏_{t = t_0}^{t_1} (1 − q_t)    (3)

Eq. 3 assumes that quitting events during different time steps are independent and identically distributed. While this may not always be the case, a more sophisticated model of spectator behavior is currently outside the scope of this work, because more field data on the online behavior of immersive media spectators should be collected as these services become mainstream. Presently, this is still a new area, and we believe that using simpler modelling is justified for an initial exploration of cost/QoE trade-offs.

To calculate the probability of a spectator remaining active from the beginning of the session to its end, t_0 and t_1 can be set to 0 and |T|, respectively. For a 10-minute session comprised of 10-second time steps, |T| would equal 60.

Hence, for example, a demanding spectator in a boring session, with a dQoE of 0.5, might have a 5.37% probability to quit every 10 seconds, meaning she may soon leave unless her QoE improves.
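The quitting model of Eq. 2 and the survival probability of Eq. 3 can be sketched as follows, using the default parameter values from the text (the function names are ours):

```python
def quit_prob(dqoe: float, b: float = 0.0037, d: float = 0.2) -> float:
    """Eq. 2: per-time-step quitting probability, where dqoe is the
    spectator's deficit from their maximum possible QoE, b the base
    quitting probability and d the QoE-sensitivity factor."""
    return b + dqoe * d

def survival_prob(quit_probs) -> float:
    """Eq. 3: probability of staying online across consecutive steps."""
    p = 1.0
    for q in quit_probs:
        p *= 1.0 - q
    return p

# Sanity check of the baseline: with b = 0.37% per step and no impact
# from the QoE deficit, roughly 80% of spectators survive a 60-step
# (10-minute) session, i.e. about 20% quit for non-QoE reasons.
baseline = survival_prob([0.0037] * 60)
```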
Note that in the relatively narrow QoE range produced by the QoE model (see Section 5.3), a dQoE of 0.5 represents a significant decrease from the optimal QoE for this spectator. Conversely, for an undemanding spectator in an interesting session with a dQoE of only 0.1, the quitting probability would be 0.57% per ten-second time step, and he is 71% likely to remain until the end of a 10-minute session.

5.2 Revenue

Depending on the use-case and the marketing approach, revenue for the media stream service provider can range from direct (e.g. a subscription-based or pay-per-use service) to indirect (e.g. a service supported by ads). In general, the provider is interested in keeping spectators engaged for longer time periods, because this might generate more revenue. In an ad-supported service, spending more time watching the stream results in greater exposure to the advertisements. In a subscription service, spectators who do not spend much time watching the stream may reconsider renewing their subscription. Spectator QoE may also impact the revenue they generate, or not, depending on the specific use-case. In a pay-per-use service, the revenue generated is directly proportional to the time spent in the service.

In this work, we consider an ad-supported use-case as a baseline scenario, and correspondingly assume that each active spectator generates indirect revenue per time unit, so long as they remain active. Revenue generated per time step can be constant, or a function of the spectator's QoE, considering that spectators happier with their QoE may be more receptive to ads. As revenue modelling varies by use-case and is outside the scope of this paper, we consider the generic case that revenue is a function of QoE. This can be modelled by any monotonically non-decreasing function, e.g. constant, linear or logistic:

r_t = r(QoE_t) = a    (constant)    (4)

r_t = r(QoE_t) = a · QoE_t    (linear)    (5)

r_t = r(QoE_t) = a · 1 / (1 + e^(−QoE_t))    (logistic)    (6)

Over the course of a streaming session, the revenue generated by a spectator during each time step accumulates to produce the total revenue over time:

r_{t_join → t_quit} = Σ_{t = t_join}^{t_quit} r_t    (7)

However, the time that spectators remain active, and therefore generate revenue, is directly affected by the QoE they are experiencing, as mentioned in Subsection 5.1.2 and Eq. 2. For a given future time step t_1, the average expected revenue generated by a spectator with QoE_t and quitting probability q_t depends on the probability that they remain active until t_1. Taking into consideration Eq. 3:

E(r_{t_1}) = r_{t_1} · p_{t_0 → t_1} = r_{t_1} · ∏_{t = t_0}^{t_1} (1 − q_t)    (8)

Therefore, taking into consideration Eq. 8 and the dependency of r and q on QoE for every time step, the total expected revenue from a spectator, from the current time t_0 until time t_1, is:

E(r_{t_0 → t_1}) = Σ_{k = t_0}^{t_1} E(r_k) = Σ_{k = t_0}^{t_1} [ r_k · ∏_{t = t_0}^{k} (1 − q_t) ] = Σ_{k = t_0}^{t_1} [ r(QoE_k) · ∏_{t = t_0}^{k} (1 − q(QoE_t)) ]    (9)

Eq. 9 highlights how QoE can impact revenue both directly, by altering the revenue an active spectator generates per time unit, and indirectly, by affecting their probability of quitting early.

5.3 QoE model

In order to keep spectators from quitting the stream early, thus maximizing generated revenue, an optimization algorithm needs to know what each spectator's QoE is at present, and how it may change depending on the network-centric optimization decisions. Although a number of video streaming QoE models exist (e.g. [69,70,89]), to the best of our knowledge there is none that regards textured 3D meshes viewed in a free-viewpoint environment. However, for testing purposes, a suitable 2D video QoE model may be adopted.

In this paper we derive our QoE model from Zadtootaghaj et al. [92].
In that work the authors consider cloud gaming, which is a close match to our own use-case. Using subjective mean opinion score (MOS) measurements, they derive QoE as a second-degree function of image PSNR and frame rate (FR), fitted to the MOS:

QoE = c_0 + c_1 · FR + c_2 · PSNR + c_3 · PSNR² + c_4 · FR² + c_5 · FR · PSNR    (10)

with the coefficients c_0, ..., c_5 taking the values fitted in [92].

Knowing the average PSNR and frame size of each transcoding profile, as well as each spectator's bandwidth, we use this model to calculate each spectator's QoE at present and to estimate their QoE for different profiles in the future.

In a tele-immersive game, a spectator receives each player's 3D representation in some transcoding profile. For each profile, the average PSNR is known, calculated from the PSNR of the textures used to color the 3D mesh and considering the part of the screen occupied by the 3D reconstruction. Although the latter of course varies with the spectator's viewpoint, in the vast majority of cases the 3D reconstruction occupies 1–5% of the total screen area. Given that the area not occupied by the 3D reconstruction is computer-generated and suffers no loss of quality under different transcoding profiles, we offset the average texture PSNR to obtain an estimate of the average PSNR of the spectator's view.

Depending on a spectator's maximum bandwidth, they may be unable to receive the incoming stream at its full framerate. Eq. 10 considers the actual framerate experienced by a spectator, which depends on that spectator's connection bandwidth and the average frame size of the received profile.

In the general tele-immersive scenario, each spectator receives transcoded media streams from |K| players. Each received stream might employ a different transcoding profile and have different PSNR and framerate values, thus resulting in a different QoE regarding the quality of the 3D reconstruction of a particular player.

The total QoE for each spectator, which aims to reflect their satisfaction with the whole immersive experience, will be a function of the individual QoEs corresponding to each player. A simple approach would be to average the QoE of each player's 3D reconstruction.
A more thorough modelling, which is beyond the scope of the present work, might take into account the position and orientation of the spectator and the players inside the virtual space, assigning greater weights to the 3D representations of players closer to the spectator and nearer the center of their field of view. However, position and orientation would not likely remain constant in an immersive environment, even over a ten-second time step.

In this work we opt for the simple averaging approach, assuming that spectators can see both players equally in the virtual space. This in no way limits the generality of the methodology and outcomes, as it considers the most generic case.

5.4 Costs

The costs of delivering live media to a population of spectators comprise two separate categories: the cost of running the necessary software to transcode and buffer the data, and the cost of delivering the data to the consumers.

In the serverless approach we examine in this work, each transcoded media quality is produced by its own dedicated FaaS transcoder. We assume that such transcoders are deployed in a 5G MEC FaaS (e.g. using our extended FaaS framework). Since MEC is, essentially, a cloud deployed at the edge (also referred to by telcos as a cloud edge), the business model is similar to that of the cloud, but the resources are more scarce and therefore likely to be priced differently. Applications (such as our tele-immersive gaming) rent these resources on a pay-as-you-go basis. In fact, the application deployment can be more sophisticated. For example, some low-end transcoders, for spectators who can tolerate slower (and thus cheaper) network connections, can be placed in a public cloud, such as those provided by IBM, Amazon, Google, etc. Other, more demanding transcoders will be placed closer to the spectators, i.e.
in the MEC, utilizing the unprecedented 5G connectivity speed and bandwidth at a higher price point, striving for overall profit maximization for the service provider.

In addition to the regular resources available to FaaS in current commercial offerings, our video transcoders may require the use of a GPU for real-time processing, which incurs additional costs, as described in the following subsection.

In addition to the transcoders, a broker function, active throughout the session, is also necessary to facilitate the media stream traffic. The core broker function also facilitates communication between the players. Therefore it will always be placed in the MEC. In cloud-deployed functions, only outgoing (not internal) traffic is usually charged.

Over the last few years, GPUs have become essential to a multitude of applications. Cloud vendors have recognized this market potential and have started providing new virtual server families that include GPUs. However, GPUs present some new issues. In particular, GPUs are not easily amenable to sharing among different workloads. This dictates a time-sharing approach and drives up the cost of cloud-based GPU servers.

Limitations to GPU sharing are especially challenging for serverless computing. If time-sharing is used, then only one serverless function consuming a GPU can run on a GPU-equipped virtual server at a time, with the rest of the server's resources (CPU, RAM) being wasted. As we go to press, we are not aware of any commercial offering for serverless computing with GPUs. This does not preclude such offerings in the near future as GPU sharing improves (Nvidia, Nuweba). Furthermore, we believe that significant progress in building commercial cloud offerings for serverless GPUs will only become possible when shareable GPU architectures become ubiquitous and this programming model becomes consumable at the application level.

In our previous work, we developed a first-of-its-kind prototype for using GPUs with serverless functions. Our prototype uses Apache OpenWhisk and Kubernetes [20]. To enable quantitative reasoning about using serverless computing for tele-immersive gaming in the 5G MEC's FaaS, we need to develop an estimate of a realistic pricing model for GPU usage in serverless computing. The MEC business model is essentially the same as the public cloud business model, with the important distinction that resources are more scarce in the MEC, which justifies pricing them higher than in a typical public cloud. Essentially, the supposition of MEC is that it behaves like a cloud at the edge, leveraging proximity to users and higher KPIs at possibly higher price points for providers, but overall making more profit by enabling new application capabilities and providing much better QoE that would attract a large customer base.

We therefore derive our hypothetical pricing plan for MEC using public clouds as a starting point. To that end, we consider a typical pricing of CPU-based cloud functions, as well as the pricing of CPU-based versus GPU-based virtual servers, and develop a speculative model for GPU-based serverless costs. It should be stressed that our intention is neither to propose an actual pricing model for GPU-based serverless computing, nor to argue that the profit margins should necessarily be the same as for the CPU-based one.
Rather, our intention is to provide an educated guess of what this model might look like, and to use it to study the pros and cons of our proposed approach quantitatively.

Our methodology is to assume the same profit margin ratio between GPU- and CPU-based serverless computing as between GPU- and CPU-based virtual servers. The latter is directly observable from the publicly advertised pricing plans of cloud vendors. Note that while this assumption can deviate from the actual ratios in practice, a proportionality between the internal cost of production and the profit should exist. Hence, as long as we preserve the directly observable ratios in our estimations, they should serve as a reasonable proxy.

As an example pricing reference point, we consider the pricing plans for IBM Cloud Functions (https://cloud.ibm.com/functions/learn/pricing) and IBM Cloud Virtual Server Instances; similar results can be obtained for other cloud vendors. ACL1.8x60 and M1.8x64 are two models of virtual servers with and without a GPU, respectively. These two models have the same number of CPUs (8) and approximately the same amount of RAM (60 and 64 GB, respectively). Billing is done on a monthly basis, with full-time utilization corresponding to 720 hours of up time per month. Dividing a server's monthly price by the 2,880 = 720 · 4 fifteen-minute slots in a month yields the cost of a single 15-minute execution. To verify this calculation, one can observe that exactly the same number is obtained by simply multiplying the hourly rate of ACL1.8x60 by the duration of the execution (we assume the same usage of RAM as in the CPU case).

Note that while running a GPU-based serverless function, the same host can be used to also run CPU-based functions; otherwise the CPUs and RAM of the GPU-based host would simply be wasted. From the pricing plans above, the cost ratio between a GPU-based and a CPU-based VM is 3.8. Applying this ratio to the base rate of IBM Cloud Functions yields our speculative rate for GPU-based serverless functions. The provider can thus reduce costs by producing fewer or cheaper transcoding profiles, when this does not impact revenue too much by triggering too many spectators to quit the stream.

In the course of our network-centric cost-efficiency optimization, two sets of decisions must be taken online, based on the metrics reported by active spectators and on the models for spectator behavior, cost, revenue and QoE developed in the previous subsections:

1. Which transcoding profiles should be deployed in production at each point in time, to minimize the costs of production?
2. Which of the produced transcoding profiles for each player should be allocated by the service provider to each spectator, to maximize their QoE and, thus, maximize revenue?

We now define our optimization problem more rigorously. Table 1 summarizes the notation used in the problem formulation. Given a set of transcoding profiles N, a set of players K, and a set of spectators S_t at time t, determine the transcoding profiles x^k_n that should be produced, and assign which of those produced profiles each spectator should consume, y^s_{k→n}, so as to maximize an estimated total profit (ETP) for the immersive gaming service provider. Based on all of the above, the expected total profit is given by:
ETP = Σ_{s∈S_t} E( r^s_{t→|T|}( QoE^s_t(y^s_{K→N}) ) )
      − Σ_{k∈K} Σ_{n∈N} c(n⃗) · ⌈ (Σ_{s∈S_t} y^s_{k→n}) / |S| ⌉
      − o · Σ_{n∈N} [ b_n · Σ_{k∈K} Σ_{s∈S_t} ( y^s_{k→n} − f^s_{k→n} ) ]    (11)

Subject to constraints:

Σ_{n∈N} y^s_{k→n} = 1,  ∀s ∈ S_t, ∀k ∈ K, ∀t ∈ T    (12)

Σ_{k∈K} Σ_{n∈N} b_n · ( y^s_{k→n} − f^s_{k→n} ) ≤ b^s_t,  ∀s ∈ S_t, ∀t ∈ T,  f^s_{k→n} ∈ (0, 1)    (13)

y^s_{k→n} ≥ f^s_{k→n},  ∀s ∈ S_t, ∀k ∈ K, ∀n ∈ N, ∀t ∈ T    (14)

Eq. 11 gives the expected total profit (ETP) of the provider as the difference between the expected revenue and the costs. It consists of three terms. The first term, deriving from Eq. 8, sums the expected revenue over all spectators, which is a function of their QoE, which, in turn, depends on the profiles each of them is assigned to consume. The second term represents transcoding costs, summed over all players and profiles; for each player/profile combination, the ceiling function returns 1 if at least one spectator consumes that profile (and hence it is actually in production), and 0 if none do. The third term calculates traffic costs, summed over all spectators, players and profiles. Each profile n has an average bandwidth requirement of b_n, which is the maximum consumed by a spectator s who is receiving that profile from player k (i.e. y^s_{k→n} = 1). However, some of these spectators may be receiving n at a lower framerate and thus consume less bandwidth; this reduction is expressed by f^s_{k→n}.

Constraint 12 ensures that each spectator is allocated exactly one transcoding profile per player. Constraint 13 makes sure that the total effective bandwidth consumed by any spectator at any given time instance does not exceed the maximum bandwidth that this spectator can sustain. The QoE modeled by Eq. 10 implicitly corrects the frame rate to match a spectator's inbound bandwidth constraints at time t; in the problem formulation we model this explicitly via the bandwidth adaptation coefficients f^s_{k→n} ∈ (0, 1). Finally, Constraint 14 prevents negative outbound traffic allocation.

Algorithm 1 depicts how we solve the provider problem in the online setting. Since in this setting the future is not known, we solve the optimization problem at every time step, estimating the revenue that will be accumulated if all currently active spectators remain in the stream. In the next time step we correct the estimation and again solve the optimization problem to deploy the transcoding profiles and allocate them to the spectators. Since the network conditions (as well as the availability of compute resources) might change from one time window to another for a spectator s, the transcoding profile allocation for s can also change. As we use FaaS, there are no additional costs associated with releasing serverless transcoders and starting new ones. Since in practical settings the optimization problem is relatively small, it can be solved exactly, either using linear solvers like CPLEX or even through brute force.

Obviously, the proposed algorithm is suboptimal, because it is based on estimating the quitting probabilities of spectators from an estimated QoE, and it does not make long-term decisions. Estimating QoE can be tackled in a number of ways; in [13] this problem is approached using reinforcement learning, reducing its complexity.

Table 1: Notation Summary
Notation — Description

Sets:
K — playing users (players), k ∈ K
S — spectators, s ∈ S; S_t is the set of spectators at time t
N — transcoding profiles, n ∈ N

Sequences:
T = {t_i}, i = 0..|T| — equidistant time steps t_0, t_1, ..., t_|T|, where |T| is the maximal session lifetime

Parameters:
q^s(QoE) = q^s_t — an estimated probability that spectator s quits at time t for a given QoE value (see Eq. 2)
p^s_t = 1 − q^s_t — an estimated probability that spectator s stays in the session at time t
n⃗ = (m, g) — the resource demand vector of a transcoding profile n, where m is memory in GB and g = 1 if a GPU should be allocated, 0 otherwise
b_n — the average outbound bandwidth (GB/sec) of a transcoding profile n
b^s_t — the average inbound bandwidth (GB/sec) that spectator s can sustain at time t
c(n⃗) — the cost (per second) of hosting a transcoding profile n using FaaS
o(b_n) — the cost (per GB) of the outbound traffic produced by a transcoding profile n
r^s_t(QoE) — the revenue generated by spectator s at time t for a given QoE level (see Eqs. 4, 5, 6)
QoE^s_{K→N} — the QoE of a spectator s under the transcoding profile allocation K → N, ∀k ∈ K, ∀n ∈ N

Decision variables:
y^s_{k→n} — spectator consumption assignment: 1 if spectator s consumes profile n for player k, 0 otherwise
f^s_{k→n} ∈ (0, 1) — the fraction of b_n by which the consumption of s is reduced to match capacity

Auxiliary variables:
x^k_n — transcoding profile active status: 1 if transcoding profile n is produced for player k, 0 otherwise

Achieving cost efficiency depends on accurate modelling of the costs and revenues. The former depends on the available cloud and 5G MEC commercial offerings for FaaS; the latter depends on the spectators' behavior. The modeling approach of Subsections 5.1–5.4 is relatively simple and generic, developed with the use-case of immersive 3D media live streaming in mind. Naturally, each use-case will have its own peculiarities, which will need to be modelled accurately and possibly fine-tuned using real data.

In this paper, our focus is on demonstrating that, even with this relatively simple model, the serverless computing paradigm can yield significant benefits to the provider.
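For small instances, the brute-force option mentioned above is indeed practical. The sketch below (all profile parameters, costs and the revenue/quitting callables are illustrative stand-ins of our own, not values from the paper) enumerates every spectator-to-profile assignment for a single player and keeps the one maximizing a single-player analogue of Eq. 11:

```python
from itertools import product

# Hypothetical profile catalogue: per-second hosting cost c(n) and
# per-second outbound bandwidth b_n in GB (illustrative numbers only).
PROFILES = {
    "cpu_low":  {"cost": 0.001, "bw": 0.00017},
    "gpu_high": {"cost": 0.004, "bw": 0.000085},
}
TRAFFIC_COST = 0.12  # o: assumed cost per GB of outbound traffic
STEP_SECONDS = 10    # duration of one time step

def expected_profit(assignment, spectators, revenue, quit_prob, steps_left):
    """Single-player analogue of Eq. 11: expected revenue over the
    remaining steps (Eq. 8/9 with a constant per-spectator QoE), minus
    hosting costs of the profiles actually produced, minus traffic."""
    rev, traffic = 0.0, 0.0
    for s, prof in zip(spectators, assignment):
        q, r = quit_prob(s, prof), revenue(s, prof)
        p = 1.0
        for _ in range(steps_left):
            p *= 1.0 - q          # survival up to this step (Eq. 3)
            rev += r * p          # expected revenue contribution (Eq. 8)
            traffic += PROFILES[prof]["bw"] * STEP_SECONDS * TRAFFIC_COST * p
    hosting = sum(PROFILES[p]["cost"] for p in set(assignment))
    return rev - hosting * STEP_SECONDS * steps_left - traffic

def best_assignment(spectators, revenue, quit_prob, steps_left):
    """Enumerate all |N|^|S| assignments and keep the most profitable."""
    return max(product(PROFILES, repeat=len(spectators)),
               key=lambda a: expected_profit(a, spectators, revenue,
                                             quit_prob, steps_left))
```

Under these made-up numbers the search exhibits exactly the behavior the optimization targets: with only a couple of spectators it consolidates everyone onto the cheap CPU profile, while a larger audience justifies producing the costlier GPU profile that keeps spectators online longer.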
We performed a series of experiments to validate the proposed optimization approach and quantify its benefits under different scenarios and conditions. Our experiments consider the aforementioned Augmented VR game use-case, in which the spectators must receive two 3D video streams, one for each of the two players.

6.1 Experimental setup

To develop and test the application functionality, all components of the service, as described in Section 4 and Fig. 3, were implemented, deployed and tested on the infrastructure provided by the 5G-MEDIA project, which offered a Kubernetes NFVI with worker nodes equipped with NVIDIA GeForce GTX 1650 GPUs and Open Source MANO (OSM) R5.0 with the FaaS VIM plugin installed (https://github.com/5g-media/faas-vim-plugin). Players, spectators, and the control plane have been deployed locally on PCs, while the broker, transcoders and replay application components have been deployed as FaaS VNFs via the OSM/FaaS plugin, and orchestrated by the control plane in an event-driven manner using our serverless orchestration mechanism, with OSM being a unified entry point.

However, the infrastructure we had access to was relatively small and imposed hard limits on both the number of spectators and the number of concurrent transcoders that could use GPUs. Hence, after initial tests on the actual infrastructure, a more extensive study of cost optimization was conducted using simulation.

Algorithm 1:
The overall Smart network-centric optimization algorithmic framework

Input: transcoding profiles N, players K
Output: (1) optimized deployment of transcoding profiles ∀t ∈ T; (2) optimized allocation of transcoding profiles to spectators ∀t ∈ T

QoE model ← Eq. 10;
Costs model ← pricing plan for GPU and CPU FaaS and outbound bandwidth;
Revenue model ← e.g. Eq. 4;
for t ← t_0 to |T| do
    Metrics ← collect metrics from the active spectators S_t;
    Infer b^s_t and compute power ∀s ∈ S_t;
    Estimate QoE^s_{K→N}, ∀s ∈ S_t and ∀K → N;
    Estimate q^s_t, ∀s ∈ S_t;
    y^s_{k→n}, f^s_{k→n} ← solve the optimization, considering Eq. 11, Eq. 12, Eq. 13, Eq. 14;
    Determine which transcoding profiles to activate for each player:
        x^k_n = 1 if ∃ y^s_{k→n} = 1, 0 otherwise;
    Activate transcoders (via FaaS), as required;
    Inform spectators about the new profile allocation;
end

The simulated spectators adhere to the joining and quitting behavior described in Subsection 5.1.1 and Subsection 5.1.2. The experiments feature a diverse set of spectators, varying in connection bandwidth and processing capabilities, to reflect a mixture of real-life user profiles. For each spectator, a set of metrics is collected every 10 seconds, reporting their bandwidth, processing power, the transcoding profiles they are currently receiving, and the framerate of each. Based on those metrics, each spectator's current QoE is calculated (from Eq. 10), as well as an estimate of the QoE they would experience if they were to receive different transcoding profiles.

We consider relatively small sessions and assume that GPUs are available as needed in the 5G MEC's NFVI. Spectator bandwidth is subject to a small degree of random fluctuation, to simulate changing network conditions. Likewise, the processing power that can be allocated to video processing is varied, to simulate changing workload conditions on the user equipment. Processing power can impose a limit on the maximum frame-rate a spectator can decode.
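The per-spectator QoE estimate used in the simulation can be sketched as follows. The effective frame rate follows the text (capped by the production rate, the inbound link, and decoding power); the polynomial mirrors only the shape of Eq. 10, with `coeffs` standing in for the numeric values fitted in [92], which are not reproduced here:

```python
PRODUCTION_FPS = 25  # production frame rate used in the experiments

def effective_framerate(bandwidth_kbps: float, frame_kb: float,
                        decode_fps_cap: float) -> float:
    """Frame rate a spectator actually experiences for a profile:
    the minimum of the production rate, what the inbound link can
    carry (bandwidth and frame size in matching units, e.g. KB/s
    and KB), and what the device can decode."""
    return min(PRODUCTION_FPS, bandwidth_kbps / frame_kb, decode_fps_cap)

def qoe_estimate(fr: float, psnr: float, coeffs) -> float:
    """Second-degree polynomial in FR and PSNR (the shape of Eq. 10);
    coeffs = (c0, ..., c5) are placeholders for the fitted values."""
    c0, c1, c2, c3, c4, c5 = coeffs
    return (c0 + c1 * fr + c2 * psnr
            + c3 * psnr ** 2 + c4 * fr ** 2 + c5 * fr * psnr)
```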
The control plane receives metrics from all spectators and, based on them, decides the optimal set of transcoding profiles that must be produced, and which one of them each spectator should consume. The algorithm makes decisions at 10-second time steps, corresponding to the 10-second monitoring intervals.

We compare two optimization algorithms:

– Naive Optimization greedily optimizes spectator QoE. Based on the QoE modelling described in Subsection 5.3, it determines which transcoding profile will result in the optimal QoE for each spectator and allocates transcoders to produce this set of profiles, regardless of the production cost.
– Smart Network-Centric Optimization optimizes cost-efficiency, balancing the trade-off between profit and QoE. It considers spectator QoE, the quitting probability as a function of QoE, the revenue generated by the spectators remaining online, and the production and delivery costs, and determines the set of transcoding profiles to be produced so as to maximize profit (see Eq. 11, Eq. 12, Eq. 13, Eq. 14). In particular, expected revenue (Eq. 8) is calculated on the assumption that the profiles assigned to spectators during the current time step will persist in future time steps.

In what follows, we refer to these two algorithms simply as Naive and Smart.

For each of the two players' 3D video streams, five transcoding profiles are supported, in addition to the production streams, which are also available for consumption by the spectators and require no transcoding. The production stream and all still-image profiles encode textures as JPEG images of quality 30, in various resolutions. The video profiles encode textures as an HEVC video of fixed resolution, targeting various bitrates. The production frame-rate is set to 25 frames per second. Table 2 lists the specifications of the transcoding profiles used in our experiments.

Table 2: Transcoding profiles' specifications
Name       | Node | Frame size | Texture resolution | Texture PSNR | Mesh geometry | Blend weights
Production | None | 200 KB     | 960x540            | 32.02 dB     | 10 bits       | 6 bits
Images Mid | CPU  | 170 KB     | 864x486            | 28.78 dB     | 9 bits        | 5 bits
Images Low | CPU  | 135 KB     | 768x432            | 28.02 dB     | 8 bits        | 4 bits
Video Low  | GPU  | 55 KB      | 960x540            | 28.66 dB     | 8 bits        | 4 bits
Video Mid  | GPU  | 70 KB      | 960x540            | 30.00 dB     | 9 bits        | 5 bits
Video High | GPU  | 85 KB      | 960x540            | 31.59 dB     | 10 bits       | 6 bits
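From Table 2, the average outbound bandwidth b_n of each profile follows directly from its average frame size at the 25 fps production rate (a small helper of our own):

```python
PRODUCTION_FPS = 25
FRAME_KB = {  # average frame sizes from Table 2
    "Production": 200, "Images Mid": 170, "Images Low": 135,
    "Video Low": 55, "Video Mid": 70, "Video High": 85,
}

def bandwidth_kbps(profile: str, fps: int = PRODUCTION_FPS) -> int:
    """Average outbound bandwidth of a profile in KB/s: average frame
    size times frame rate."""
    return FRAME_KB[profile] * fps
```

For instance, Video Low needs 55 × 25 = 1375 KB/s, versus 135 × 25 = 3375 KB/s for Images Low, illustrating the bitrate advantage of the video profiles despite their higher transcoding cost.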
Still-image profiles produce slightly worse image quality and, naturally, a significantly larger average frame size. They can be transcoded in real time on a CPU node. Since they have no inter-frame decoding dependency, spectators who fail to receive or decode the frames at the production frame-rate can skip frames, so as to always display the most current frame.

Video profiles, on the other hand, achieve better image quality at much higher compression rates, but require a GPU node for real-time transcoding. In the test implementation they do not support skipping frames, due to inter-frame compression. Spectators who cannot match the production framerate may start lagging behind, so the optimization algorithm running in the control plane will never assign a video profile to such a spectator.
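This last rule can be expressed as a simple eligibility filter (a sketch with helper names of our own; frame sizes and node types as in Table 2):

```python
PRODUCTION_FPS = 25

def eligible_profiles(profiles: dict, bandwidth_kbps: float,
                      decode_fps_cap: float) -> list:
    """Profiles a spectator may be assigned. GPU (video) profiles
    cannot skip frames, so they require sustaining the full production
    frame rate; CPU (image) profiles tolerate a lower effective rate."""
    names = []
    for name, spec in profiles.items():
        fps = min(PRODUCTION_FPS, bandwidth_kbps / spec["frame_kb"],
                  decode_fps_cap)
        if spec["node"] == "GPU" and fps < PRODUCTION_FPS:
            continue  # the spectator would lag behind the live stream
        names.append(name)
    return names
```

For example, a spectator with 1000 KB/s can carry only about 18 fps of Video Low (55 KB frames), so only image profiles remain eligible; at 2000 KB/s the full 25 fps can be sustained and the video profile becomes available again.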
Each experiment considers a streaming session of ten minutes. The stream starts with a default initial population of 10 spectators. New spectators join the stream according to a Poisson process (see Subsection 5.1.1) and leave it according to their probability of quitting (see Subsection 5.1.2).

Figure 5 shows the timeline of a sample experiment featuring the Smart optimization in the control plane. Cost-related quantities (revenue, cost, and profit) correspond to the upper left vertical axis and are shown on a per-time-step basis. Initially revenue is low, due to the small number of initial spectators, resulting in low profit that at times dips into the negative for the first couple of minutes. As the session progresses, more spectators join, gradually increasing revenue and profit.

Node costs are significant at the beginning of the session, becoming less so as more spectators join and revenue increases. They remain more or less constant, while random changes in spectator bandwidth or processing power occasionally result in the production of an extra set of GPU profiles, when this is deemed profitable. Towards the end of the session node costs dip lower, as the expected future revenue of individual spectators diminishes, capped by the session duration. The higher cost of video profiles is partially offset by their lower bitrate, which results in lower traffic costs. The latter naturally increase as more spectators join, but at a much lower rate, reaching a plateau after about four minutes, when the active spectator population can justify the production of more video profiles.

Average QoE follows an upward trend: as more spectators join, the increased revenue can support the production of more transcoding profiles, able to satisfy a more diverse population.

The part of the graph below the horizontal axis shows the QoE progress of three sample spectators.
Although the behavior of individual spectators has very little impact on the total revenue and profit in a session of about 20 active spectators, these examples provide some intuition about the progress of a session. Spectator A has joined from the start; she has limited bandwidth with significant fluctuations. Sometimes she is unable to receive the better-quality profiles, and her QoE drops as a result. Finally, when it drops too low, she decides to leave. Her departure can be seen as a small decrease in the revenue and profit of the next couple of time steps. Spectator B joins in the middle of the stream and experiences only very minor fluctuations. After some time he leaves, perhaps for non-QoE-related reasons, as his QoE is not particularly low. When spectator C joins, she experiences quite low QoE. However, the optimization algorithm quickly assigns her an appropriate transcoding profile that maximizes her QoE, and she remains active until the end of the stream. Abrupt changes in individual spectators' QoE are not always reflected in the average QoE or the profit, meaning that the optimizer decided to lower some spectators' QoE and raise that of others, aiming for maximum expected profit.
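The session's population dynamics (Poisson arrivals per 10-second step, probabilistic departures) can be sketched with a toy simulation. The constant per-step quitting probability and all names here are simplifying assumptions; in the paper, quitting depends on QoE dissatisfaction and session interest (Subsection 5.1.2).

```python
import math
import random

def poisson_sample(rng, lam):
    """Draw a Poisson(lam) variate by CDF inversion (fine for small lam)."""
    u, k = rng.random(), 0
    p = math.exp(-lam)
    cdf = p
    while u > cdf:
        k += 1
        p *= lam / k
        cdf += p
    return k

def simulate_population(steps=60, arrival_rate=0.5, initial=10,
                        p_quit=0.004, seed=7):
    """Toy model of the active-spectator count over a session: one step is
    one 10-second monitoring interval, so a ten-minute session is 60 steps.
    The arrival rate (0.5/step) and initial population (10) match the
    paper's defaults; the constant per-step quitting probability is an
    illustrative simplification."""
    rng = random.Random(seed)
    n, trace = initial, [initial]
    for _ in range(steps):
        arrivals = poisson_sample(rng, arrival_rate)               # joins
        departures = sum(1 for _ in range(n) if rng.random() < p_quit)
        n = n + arrivals - departures
        trace.append(n)
    return trace
```

With these defaults the population drifts upward over the session, mirroring the revenue growth visible in Fig. 5.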
In order to obtain general results and compare the adaptability of the Smart and Naive optimizations under different client conditions, experiments were performed using a range of values for different experiment variables. Each set of experiments measured the impact of changing one variable while keeping the others constant at their default values. Experiment variables included:

– Spectator arrival rate, following the Poisson distribution, with a default value of 0.5 new spectators per time step.
– Revenue generated by each spectator, with a default value of 0.2 cents per time step.
– Number of GPUs available. Six of the transcoding profiles require a GPU to perform in real time, so a limit on the number of GPUs that can be used concurrently implies that some profiles may not be produced concurrently. The default number is 6, i.e. no limitation on the concurrent use of GPUs.
– GPU costs. As mentioned before, there is currently no commercial option to rent GPU processing for FaaS. Based on calculations derived from current CPU and GPU pricing for VMs, and considering the implementation obstacles in GPU sharing, we estimate a default value of 10 times that of an otherwise equivalent CPU node, and test for factors between
Fig. 5: A sample timeline of an experiment. Above the horizontal axis: revenue, node cost, traffic cost, profit (left vertical axis) and mean QoE (right vertical axis). Below the horizontal axis: individual QoE progress for three example spectators.

x5 and x20, which seem reasonable considering the analysis in Subsection 5.4.1.
– Spectator population. We identify 5 broad types of spectators:
  • Mobile devices on Wi-Fi.
  • Mobile devices on 4G data.
  • Standard PC on a basic DSL connection.
  • Standard PC on a faster connection (e.g. VDSL).
  • High-end PC on a fiber-optic connection.
In the preliminary experiments, average decoding timings for all transcoding profiles were measured for each device type, and these are used to calculate a maximum frame rate from a hardware perspective. In addition, each connection type is associated with a typical bandwidth, which provides a frame rate cap from a connection perspective. The default population consists of a balanced mix of the above spectator types, while we also conduct experiments where specific types of spectators are dominant.
– Quitting behavior. As mentioned in Subsection 5.1.2, quitting probability is a function of QoE dissatisfaction and non-QoE-related causes, such as how interesting a specific session is. As the default, derived from [22], we assume a 20% probability to quit before the session's end at maximum QoE. We conduct experiments for relatively boring (50% probability to quit) or interesting sessions (10% probability to quit), and also for more demanding spectators, in which case QoE dissatisfaction weighs more.

6.2 Results

In the following graphs we present a comparison between the Smart and Naive optimizations. Each graph displays a number of experiments differing in one experiment variable (see Subsection 6.1.5), shown on the horizontal axis, while keeping the others constant.
The graphs show aggregate measurements of the entire ten-minute sessions, averaged across several replications of the same experiment. Quantities denoted with an S refer to Smart and are shown in a lighter shade, while those denoted with an N refer to Naive. Across all graphs, money-related quantities (revenue, cost, and profit) are shown as bars and correspond to the left vertical axis. An additional quantity, relevant to each graph, is shown as a line corresponding to the right vertical axis. Such additional values may include:

– Spectators: the average number of spectators active during the session. This directly impacts revenue.
– dQoE: the average difference between spectators' actual QoE and the maximum QoE they could possibly achieve, given their network connection and hardware. This directly impacts the quitting probability, which indirectly affects revenue. dQoE is shown on an inverted vertical axis.
– Total QoE: the sum of spectators' QoE, averaged across all time steps, which can be perceived as a measurement of total quality of experience.

Fig. 6: Total session revenue, costs and profit (on the left vertical axis) and average spectator number (on the right vertical axis) for different rates of new spectator arrival.
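The two QoE aggregates plotted on the right-hand axes can be computed as follows. These helper functions are illustrative implementations of the definitions above, not code from the experimental platform.

```python
def dqoe(actual_qoe, max_qoe):
    """Average gap between each spectator's achieved QoE and the best QoE
    their connection and hardware permit, over all (spectator, step)
    samples. Zero means every spectator got the best feasible quality."""
    gaps = [m - a for a, m in zip(actual_qoe, max_qoe)]
    return sum(gaps) / len(gaps)

def total_qoe(qoe_per_step):
    """Sum of spectators' QoE per step, averaged across time steps.
    qoe_per_step: list of lists, one inner list per monitoring interval
    (inner lists may differ in length as spectators join and leave)."""
    return sum(sum(step) for step in qoe_per_step) / len(qoe_per_step)
```

For example, two spectators achieving QoE 3 and 4 against maxima of 5 and 4 give a dQoE of 1.0.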
Fig. 6 compares the performance of Naive and Smart under different spectator arrival rates. As the rate increases, so do the average number of spectators staying in the session and the revenue they generate. Traffic costs also naturally increase, as there are more spectators downloading the streams. With more spectators, Smart's spending on transcoding nodes also increases slightly, as the higher cost of producing more profiles is offset by the larger number of spectators who benefit from them. As expected, Smart's advantage is more pronounced when fewer spectators are active.
Fig. 7 presents the experimental results for different revenue rates around the default of 1 cent per 10-second time step. This set of experiments follows the default arrival rate of 0.5, meaning that revenue is generated by an average of about 25 spectators per session. Naturally, as the revenue generated by each active spectator increases, so does the overall revenue. Costs also increase, as it becomes more profitable to keep spectators satisfied. This is also reflected in the decreasing dQoE, shown on the right vertical axis. At lower revenue rates Smart makes a greater difference in profit; a similar behavior of Smart is observed with increasing arrival rate, as shown in Fig. 6.
This series of experiments, shown in Fig. 8, considers the case where production GPU usage is limited, capping the number of video profiles that can be transcoded simultaneously. As the number of available GPUs drops, so do the options and versatility of Smart, limiting its benefit.
Fig. 9 compares system behavior under different revenue models with respect to spectator QoE. With the constant model, spectators deliver a set revenue so long as they remain online, while with the linear and sigmoid models spectators with good QoE generate more revenue than those with bad QoE. This makes good QoE more important, as it impacts revenue both directly and indirectly (by affecting quitting probability). With the linear and sigmoid revenue models the advantage of Smart over Naive is less pronounced, since Smart then tends to maximize QoE similarly to Naive.
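The three revenue models can be sketched as functions of QoE. The exact linear and sigmoid shapes used in the experiments are not specified here, so the slopes and the sigmoid steepness below are illustrative guesses; only the constant model's behavior (flat revenue while online) is taken directly from the text.

```python
import math

def revenue_rate(qoe, model="constant", base=0.2, qoe_max=5.0):
    """Per-step revenue of one spectator under the three models compared
    in Fig. 9. `base` is the paper's default of 0.2 cents per time step;
    the linear/sigmoid shapes are illustrative assumptions."""
    if model == "constant":
        return base                      # same revenue regardless of QoE
    x = qoe / qoe_max                    # normalized QoE in [0, 1]
    if model == "linear":
        return base * x                  # revenue proportional to QoE
    if model == "sigmoid":
        # Steep transition around mid-QoE: poor QoE earns little,
        # good QoE earns close to the full rate.
        return base / (1.0 + math.exp(-10.0 * (x - 0.5)))
    raise ValueError(model)
```

Under the linear and sigmoid models, raising a spectator's QoE pays off twice: directly through a higher rate and indirectly through a lower quitting probability.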
As mentioned before, to the best of our knowledge there is currently no commercial option to rent GPU nodes
for FaaS processing. However, it is entirely possible that such options will become available in the near future, especially if demand rises. GPU processing will certainly cost more than CPU processing. This set of experiments examines the impact of the price ratio between GPU and CPU nodes. As seen in Fig. 10, as GPU processing becomes more expensive, Smart becomes more frugal with GPU-dependent profiles, letting spectator QoE drop away from the optimal. Hence, it can keep running costs manageable and generate a profit even when GPU utilization is priced high, at the cost of a small decrease in QoE. Note that although at low GPU pricing optimization offers a relatively small benefit, this becomes much more pronounced when GPU usage is more expensive. Also note how Smart's traffic costs rise as GPU cost increases and still-image CPU profiles, which have a lower compression rate and thus a higher bitrate, are preferred.

Fig. 7: Total session revenue, costs and profit (on the left vertical axis) and average difference from the maximum possible QoE (on the right vertical axis) for different (constant) rates of revenue per active spectator and per time step.

Fig. 8: Total session revenue, costs and profit (on the left vertical axis) and average difference from the maximum possible QoE (on the right vertical axis) when different numbers of GPUs are available for the transcoding of streams to video in real time.
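The per-step cost structure that drives this trade-off can be sketched as follows. The function, its parameters and all monetary constants are illustrative placeholders; only the GPU:CPU price ratio (default x10, tested x5-x20) comes from the experiments.

```python
def step_cost(n_cpu_nodes, n_gpu_nodes, spectators_kb, cpu_cost=1.0,
              gpu_ratio=10.0, traffic_cost_per_mb=0.05):
    """Hypothetical per-step operating cost: transcoding nodes plus
    delivery traffic. gpu_ratio is the GPU:CPU price ratio studied in
    Fig. 10; spectators_kb lists the KB downloaded by each spectator
    during the step."""
    node_cost = n_cpu_nodes * cpu_cost + n_gpu_nodes * gpu_ratio * cpu_cost
    traffic_mb = sum(spectators_kb) / 1024.0
    return node_cost + traffic_mb * traffic_cost_per_mb
```

This makes the observed behavior easy to see: raising `gpu_ratio` inflates the node term for each GPU transcoder, so a profit-maximizing optimizer shifts spectators to CPU still-image profiles, trading node cost for the higher traffic term of their larger frames.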
Fig. 9: Total session revenue, costs and profit (on the left vertical axis) and average difference from the maximum possible QoE (on the right vertical axis) if we assume a constant revenue rate (regardless of QoE) or a revenue rate that is a linear or sigmoid function of QoE, rewarding better QoE with higher revenue.

Fig. 10: Total session revenue, costs and profit (on the left vertical axis) and average difference from the maximum possible QoE (on the right vertical axis) if we assume different ratios in the prices of GPU nodes to CPU nodes. Ratios of x5 - x20 seem reasonable, based on current VM rental prices.
Fig. 11 concerns experiments with different spectator populations. Although all experiments contain all types of spectators, in this set we examine the impact of different dominant spectator types. Smart holds a steady advantage across all cases. The two right-most sets of bars, corresponding to a greater percentage of faster connections, show a marked decrease in node costs, offset by an increase in traffic costs, as many of those spectators can consume the high-bitrate production stream, obviating the need for (and cost of) transcoding.
This set of experiments considers how quitting probability impacts Smart's decisions. Fig. 12 shows measurements for standard, boring and interesting sessions, in which quitting probability is respectively higher or lower, and also how more demanding spectators (in whose case QoE dissatisfaction weighs more in their
probability of quitting) affect the process. In a boring session, Smart increases spending in an effort to keep spectators from quitting by providing a better QoE. On the contrary, in an interesting session Smart reduces costs and allows QoE to drop, as spectators are less likely to leave anyway. With more demanding spectators, Smart targets higher QoE to keep them engaged, resulting in higher transcoding node costs.

Fig. 11: Total session revenue, costs and profit (on the left vertical axis) and total aggregate QoE (on the right vertical axis) for spectator populations where different types are more frequent.

Fig. 12: Total session revenue, costs and profit (on the left vertical axis) and average difference from the maximum possible QoE (on the right vertical axis) when different factors affect spectator quitting probability, including how interesting or boring the session is and how demanding the spectators are.
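The interplay between session interest, spectator demandingness and QoE dissatisfaction can be sketched with a toy quitting model. The additive combination rule, the parameter names and the linear dissatisfaction term are all assumptions for illustration; the paper's actual model is defined in its Subsection 5.1.2.

```python
def quit_probability(dqoe, p_base, demand_weight=1.0, dqoe_max=5.0):
    """Sketch of a spectator's quitting probability: a non-QoE base term
    (how boring or interesting the session is) plus a dissatisfaction
    term growing with the QoE gap. demand_weight > 1 models the 'more
    demanding spectators' case."""
    dissatisfaction = demand_weight * (dqoe / dqoe_max)
    # Combine the two causes: quit if either the base cause or the
    # dissatisfaction cause triggers, clipped to a valid probability.
    return min(1.0, p_base + (1.0 - p_base) * dissatisfaction)
```

At maximum QoE (dqoe = 0) only the base term remains, matching the paper's defaults of 20% for a standard session, 50% for a boring one and 10% for an interesting one; a larger `demand_weight` makes the same QoE gap more likely to drive a spectator away.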
Having conducted experiments with parameters spanning numerous different assumptions and cases, some overall conclusions may be reached. Smart QoE-cost optimization can reduce transcoding costs by up to 60% and traffic costs by about 20%, while keeping revenue and QoE very close to the optimum. Optimization's benefits are especially pronounced in cases with few spectators, low revenue and high GPU costs. When GPUs become available for FaaS, possibly in the near future, they can be expected to start at higher prices, gradually dropping as their use becomes more common. Live streaming platforms, especially those dealing in emerging media, will likely need to start with a small spectator base before gaining momentum and scaling. Hence, smart QoE-cost optimization will be indispensable in the live media streaming landscape of the near future.
5G networks will disrupt the way media-intensive services are developed and operated by unlocking a plethora of opportunities for both service developers and service operators. In this work, we studied one such capability that integrates modern serverless technology with real-time adaptive media streaming in 5G MEC. To the best of our knowledge, this is the first work to do so.

Apart from the conceptual, architectural and technical contributions, our work further examined the potential of this option in terms of network-centric service cost optimization. Our findings indicate that for small user populations and finite-duration sessions, serverless adaptive streaming can reduce operating expenditure (OPEX) while preserving the service's QoE.

Through our extensive modelling and analysis we also concluded that naively applying serverless will not necessarily offer these gains. We hope that our work will inspire further research and development towards adapting services not originally suited for lighter-weight virtualization to serverless architectures, and unifying them with the advanced capabilities that 5G networks offer, to capitalize on their advantages in novel ways. Finally, taking into account the recent introduction and emerging availability of GPUs specifically designed for data centers, our work can be extended to accommodate these developments. Specifically for media services, GPU slicing can allow for even finer-grained cost optimization, something that was not possible before.

One interesting direction for future work is to explore more sophisticated placement schemes for transcoders and other components, in which they can be spread across the full compute spectrum of cloud and edge, leveraging differentiated pricing for compute, storage and network resources to meet demanding KPIs at lower price points.
Conflict of interest
The authors declare that they have no conflict of inter-est.
References
1. Apache OpenWhisk: Open Source Serverless Cloud Platform. https://openwhisk.apache.org/
2. Common Media Application Format | MPEG. URL https://mpeg.chiariglione.org/standards/mpeg-a/common-media-application-format
3. Flannel, Kubernetes Networking. https://github.com/coreos/flannel
4. IBM Cloud Functions. https://cloud.ibm.com/functions/
5. Knative: Kubernetes-based platform to deploy and manage modern serverless workloads. https://knative.dev/
6. Service Development Kit for Media-Type Virtualized Network Services in 5G Networks (to appear). IEEE Communications Magazine (2020)
7. Alexiadis, D., Chatzitofis, A., Zioulis, N., Zoidi, O., Louizis, G., Zarpalas, D., Daras, P.: An Integrated Platform for Live 3D Human Reconstruction and Motion Capturing. IEEE Transactions on Circuits and Systems for Video Technology (4), 798–813 (2017). DOI 10.1109/TCSVT.2016.2576922
8. Ali-Eldin, A., Kihl, M., Tordsson, J., Elmroth, E.: Analysis and characterization of a video-on-demand service workload. In: Proceedings of the 6th ACM Multimedia Systems Conference, pp. 189–200 (2015)
9. Alvarez, F., Breitgand, D., Griffin, D., Andriani, P., Rizou, S., Zioulis, N., Moscatelli, F., Serrano, J., Keltsch, M., Trakadas, P., et al.: An edge-to-cloud virtualized multimedia service platform for 5G networks. IEEE Transactions on Broadcasting (2), 369–380 (2019)
10. Ao, L., Izhikevich, L., Voelker, G.M., Porter, G.: Sprocket: A serverless video processing framework. In: Proceedings of the ACM Symposium on Cloud Computing (SoCC '18) (2018). DOI 10.1145/3267809.3267815. URL http://par.nsf.gov/biblio/10098946
11. Argo Events Team: Argo Events - The Event-drivenWorkflow Automation Framework. https://github.com/argoproj/argo-events
12. Argo Team: Argo Workflows. https://github.com/argoproj/argo
13. Athanasoulis, P., Christakis, E., Konstantoudakis, K., Drakoulis, P., Rizou, S., Weit, A., Doumanoglou, A., Zioulis, N., Zarpalas, D.: Optimizing QoE and cost in a 3D immersive media platform: A reinforcement learning approach. In: MMEDIA 2020: The Twelfth International Conference on Advances in Multimedia. IARIA (2020)
14. Ballard, T., Griwodz, C., Steinmetz, R., Rizk, A.: RATS: Adaptive 360-degree live streaming. In: Proceedings of the 10th ACM Multimedia Systems Conference, MMSys '19, pp. 308–311. Association for Computing Machinery, Amherst, Massachusetts (2019). DOI 10.1145/3304109.3323837
15. Barman, N., Martini, M.G.: QoE Modeling for HTTP Adaptive Video Streaming–A Survey and Open Challenges. IEEE Access, 30831–30859 (2019). DOI 10.1109/ACCESS.2019.2901778
16. Bentaleb, A., Taani, B., Begen, A.C., Timmerer, C., Zimmermann, R.: A survey on bitrate adaptation schemes for streaming media over HTTP. IEEE Communications Surveys & Tutorials (1), 562–585 (2018)
17. Bentaleb, A., Taani, B., Begen, A.C., Timmerer, C., Zimmermann, R.: A Survey on Bitrate Adaptation Schemes for Streaming Media Over HTTP. IEEE Communications Surveys & Tutorials (1), 562–585 (2019). DOI 10.1109/COMST.2018.2862938
18. Bentaleb, A., Yadav, P.K., Ooi, W.T., Zimmermann, R.: DQ-DASH: A queuing theory approach to distributed adaptive video streaming. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) (1), 1–24 (2020)
19. Bhargava, A., Martin, J., Babu, S.V.: Comparative Evaluation of User Perceived Quality Assessment of Design Strategies for HTTP-based Adaptive Streaming (2019). URL https://doi.org/10.1145/3345313
20. Breitgand, D., Weit, A.: Using GPUs with Apache OpenWhisk. https://medium.com/openwhisk/using-gpus-with-apache-openwhisk-c6773efcccfb (2019)
21. Chakareski, J., Aksu, R., Corbillon, X., Simon, G., Swaminathan, V.: Viewport-driven rate-distortion optimized 360º video streaming. In: 2018 IEEE International Conference on Communications (ICC), pp. 1–7 (2018). DOI 10.1109/ICC.2018.8422859
22. Chen, Y., Zhang, F., Wu, K., Zhang, Q.: QoE-aware dynamic video rate adaptation. In: 2015 IEEE Global Communications Conference (GLOBECOM), pp. 1–6. IEEE (2015)
23. Chou, P.A., Koroteev, M., Krivokuća, M.: A volumetric approach to point cloud compression—part I: Attribute compression. IEEE Transactions on Image Processing, 2203–2216 (2020). DOI 10.1109/TIP.2019.2908095
24. Christaki, K., Apostolakis, K.C., Doumanoglou, A., Zioulis, N., Zarpalas, D., Daras, P.: Space wars: An augmented VR game. In: International Conference on Multimedia Modeling, pp. 566–570. Springer (2019)
25. Collet, A., Chuang, M., Sweeney, P., Gillett, D., Evseev, D., Calabrese, D., Hoppe, H., Kirk, A., Sullivan, S.: High-quality streamable free-viewpoint video (2015). URL https://doi.org/10.1145/2766945
26. Crowle, S., Doumanoglou, A., Poussard, B., Boniface, M., Zarpalas, D., Daras, P.: Dynamic adaptive mesh streaming for real-time 3D teleimmersion. In: Proceedings of the 20th International Conference on 3D Web Technology, Web3D '15, pp. 269–277. Association for Computing Machinery, Heraklion, Crete, Greece (2015). DOI 10.1145/2775292.2775296
27. Cui, L., Mekuria, R., Preda, M., Jang, E.: Point-cloud compression: Moving Picture Experts Group's new standard in 2020. IEEE Consumer Electronics Magazine (4), 17–21 (2019). DOI 10.1109/MCE.2019.2905483
28. Ding, A.Y., Janssen, M.: 5G applications: Requirements, challenges, and outlook. arXiv preprint arXiv:1810.06057 (2018)
29. Domański, M., Stankiewicz, O., Wegner, K., Grajek, T.: Immersive visual media — MPEG-I: 360 video, virtual navigation and beyond. In: 2017 International Conference on Systems, Signals and Image Processing (IWSSIP), pp. 1–9 (2017)
30. Dou, M., Khamis, S., Degtyarev, Y., Davidson, P., Fanello, S.R., Kowdle, A., Escolano, S.O., Rhemann, C., Kim, D., Taylor, J., Kohli, P., Tankovich, V., Izadi, S.: Fusion4D: real-time performance capture of challenging scenes (2016). URL https://doi.org/10.1145/2897824.2925969
31. Doumanoglou, A., Alexiadis, D.S., Zarpalas, D., Daras, P.: Toward real-time and efficient compression of human time-varying meshes. IEEE Transactions on Circuits and Systems for Video Technology (12), 2099–2116 (2014). DOI 10.1109/TCSVT.2014.2319631
32. Doumanoglou, A., Drakoulis, P., Zioulis, N., Zarpalas, D., Daras, P.: Benchmarking open-source static 3D mesh codecs for immersive media interactive live streaming. IEEE Journal on Emerging and Selected Topics in Circuits and Systems (1), 190–203 (2019). DOI 10.1109/JETCAS.2019.2898768
33. Doumanoglou, A., Griffin, D., Serrano, J., Zioulis, N., Phan, T.K., Jiménez, D., Zarpalas, D., Alvarez, F., Rio, M., Daras, P.: Quality of Experience for 3-D Immersive Media Streaming. IEEE Transactions on Broadcasting (2), 379–391 (2018). DOI 10.1109/TBC.2018.2823909
34. Doumanoglou, A., Zioulis, N., Griffin, D., Serrano, J., Phan, T.K., Jiménez, D., Zarpalas, D., Alvarez, F., Rio, M., Daras, P.: A system architecture for live immersive 3D-media transcoding over 5G networks. In: 2018 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), pp. 11–15. IEEE (2018)
35. El Marai, O., Taleb, T., Menacer, M., Koudil, M.: On improving video streaming efficiency, fairness, stability, and convergence time through client–server cooperation. IEEE Transactions on Broadcasting (1), 11–25 (2018). DOI 10.1109/TBC.2017.2781146
36. Fan, C.L., Lo, W.C., Pai, Y.T., Hsu, C.H.: A survey on 360° video streaming: Acquisition, transmission, and display. ACM Computing Surveys (4) (2019). DOI 10.1145/3329119
37. Girinathan, J., Breckinridge, R.: Simple serverless video on demand (VOD) workflow. https://aws.amazon.com/blogs/networking-and-content-delivery/serverless-video-on-demand-vod-workflow/ (2018)
38. Graf, M., Timmerer, C., Mueller, C.: Towards Bandwidth Efficient Adaptive Streaming of Omnidirectional Video over HTTP: Design, Implementation, and Evaluation.
In: Proceedings of the 8th ACM on Multimedia Systems Conference - MMSys'17, pp. 261–271. ACM Press, Taipei, Taiwan (2017). DOI 10.1145/3083187.3084016. URL http://dl.acm.org/citation.cfm?doid=3083187.3084016
39. Gold, H.: Netflix and YouTube are slowing down in Europe to keep the internet from breaking. URL https://edition.cnn.com/2020/03/19/tech/netflix-internet-overload-eu/index.html
40. Hannuksela, M.M., Wang, Y.K., Hourunranta, A.: An Overview of the OMAF Standard for 360° Video. In: 2019 Data Compression Conference (DCC), pp. 418–427 (2019). DOI 10.1109/DCC.2019.00050
41. He, J., Qureshi, M., Qiu, L., Li, J., Li, F., Han, L.: Rubiks: Practical 360-degree streaming for smartphones. pp. 482–494 (2018). DOI 10.1145/3210240.3210323
42. van der Hooft, J., Wauters, T., De Turck, F., Timmerer, C., Hellwagner, H.: Towards 6DoF HTTP Adaptive Streaming Through Point Cloud Compression. In: Proceedings of the 27th ACM International Conference on Multimedia, MM '19, pp. 2405–2413. Association for Computing Machinery, Nice, France (2019). DOI 10.1145/3343031.3350917
43. Hosseini, M.: Adaptive rate allocation for view-aware point-cloud streaming. CoRR abs/1911.00812 (2019). URL http://arxiv.org/abs/1911.00812
44. Hosseini, M., Swaminathan, V.: Adaptive 360 VR video streaming: Divide and conquer. In: 2016 IEEE International Symposium on Multimedia (ISM), pp. 107–110 (2016). DOI 10.1109/ISM.2016.0028
45. Hoßfeld, T., Seufert, M., Sieber, C., Zinner, T., Tran-Gia, P.: Identifying QoE optimal adaptation of HTTP adaptive streaming based on subjective studies. Computer Networks, 320–332 (2015). DOI 10.1016/j.comnet.2015.02.015
46. ISO/IEC: Information technology — Dynamic adaptive streaming over HTTP (DASH) — Part 5: Server and network assisted DASH (SAND).
47. Jain, R.: The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling. Wiley (1990). URL https://books.google.co.il/books?id=eOR0kJjgMqkC
48. Jin, S., Bestavros, A.: GISMO: a generator of internet streaming media objects and workloads. ACM SIGMETRICS Performance Evaluation Review (3), 2–10 (2001)
49. Karakottas, A., Papachristou, A., Doumanoglou, A., Zioulis, N., Zarpalas, D., Daras, P.: Augmented VR. In: 2018 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), pp. 1–1 (2018)
50. Kazhdan, M.: Reconstruction of solid models from oriented point sets. In: Proceedings of the Third Eurographics Symposium on Geometry Processing, p. 73. Eurographics Association (2005)
51. Kazhdan, M., Bolitho, M., Hoppe, H.: Poisson surface reconstruction. In: Proceedings of the Fourth Eurographics Symposium on Geometry Processing, vol. 7 (2006)
52. Kim, H., Yang, J., Choi, M., Lee, J., Yoon, S., Kim, Y., Park, W.: Immersive 360° VR tiled streaming system for esports service. In: Proceedings of the 9th ACM Multimedia Systems Conference, MMSys '18, pp. 541–544. Association for Computing Machinery, Amsterdam, Netherlands (2018). DOI 10.1145/3204949.3209619
53. Kleinrock, L.: Queueing Systems. Wiley (1976)
54. Kritikos, K., Skrzypek, P.: A review of serverless frameworks. In: 2018 IEEE/ACM International Conference on Utility and Cloud Computing Companion (UCC Companion), pp. 161–168. IEEE (2018)
55. Krivokuća, M., Chou, P.A., Koroteev, M.: A volumetric approach to point cloud compression–part II: Geometry compression. IEEE Transactions on Image Processing, 2217–2229 (2020). DOI 10.1109/TIP.2019.2957853
56. Kubernetes: Production-Grade Container Orchestrator. https://kubernetes.io/
57. Li, Z., Zhu, X., Gahm, J., Pan, R., Hu, H., Begen, A.C., Oran, D.: Probe and Adapt: Rate Adaptation for HTTP Video Streaming At Scale. IEEE Journal on Selected Areas in Communications (4), 719–733 (2014). DOI 10.1109/JSAC.2014.140405. URL http://arxiv.org/abs/1305.0510
58. Maglo, A., Lavoué, G., Dupont, F., Hudelot, C.: 3D mesh compression: Survey, comparisons, and emerging trends. ACM Comput. Surv. (3) (2015). DOI 10.1145/2693443. URL https://doi.org/10.1145/2693443
59. Mao, H., Netravali, R., Alizadeh, M.: Neural Adaptive Video Streaming with Pensieve. In: Proceedings of the Conference of the ACM Special Interest Group on Data Communication, SIGCOMM '17, pp. 197–210. Association for Computing Machinery, Los Angeles, CA, USA (2017). DOI 10.1145/3098822.3098843. URL https://doi.org/10.1145/3098822.3098843
60. Mehrabi, A., Siekkinen, M., Ylä-Jääski, A.: Joint optimization of QoE and fairness through network assisted adaptive mobile video streaming. In: 2017 IEEE 13th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), pp. 1–8. IEEE (2017)
61. Mehrabi, A., Siekkinen, M., Ylä-Jääski, A.: Edge computing assisted adaptive mobile video streaming. IEEE Transactions on Mobile Computing (4), 787–800 (2019). DOI 10.1109/TMC.2018.2850026
62. Mekuria, R., Blom, K., Cesar, P.: Design, implementation, and evaluation of a point cloud codec for tele-immersive video. IEEE Transactions on Circuits and Systems for Video Technology (4), 828–842 (2017). DOI 10.1109/TCSVT.2016.2543039
63. Misra, K., Segall, A., Horowitz, M., Xu, S., Fuldseth, A., Zhou, M.: An Overview of Tiles in HEVC. IEEE Journal of Selected Topics in Signal Processing (6), 969–977 (2013). DOI 10.1109/JSTSP.2013.2271451
64. New European Media (NEM): 5G-MEDIA Slice Definition. https://bscw.5g-ppp.eu/pub/bscw.cgi/d322688/NEM%20Networld2020%205G%20media%20slice%20V1-2_24092019.pdf (2019)
65. NVIDIA: Multi-Process Service. https://docs.nvidia.com/deploy/mps/
66. Orts-Escolano, S., Rhemann, C., Fanello, S., Chang, W., Kowdle, A., Degtyarev, Y., Kim, D., Davidson, P.L., Khamis, S., Dou, M., Tankovich, V., Loop, C., Cai, Q., Chou, P.A., Mennicken, S., Valentin, J., Pradeep, V., Wang, S., Kang, S.B., Kohli, P., Lutchyn, Y., Keskin, C., Izadi, S.: Holoportation: Virtual 3D Teleportation in Real-time. In: Proceedings of the 29th Annual Symposium on User Interface Software and Technology, UIST '16, pp. 741–754. Association for Computing Machinery, Tokyo, Japan (2016). DOI 10.1145/2984511.2984517. URL https://doi.org/10.1145/2984511.2984517
67. Pantos, R., May, W.: HTTP Live Streaming. Tech. Rep. RFC 8216, RFC Editor (2017). DOI 10.17487/RFC8216
68. Park, J., Chou, P.A., Hwang, J.N.: Volumetric media streaming for augmented reality. In: 2018 IEEE Global Communications Conference (GLOBECOM), pp. 1–6 (2018). DOI 10.1109/GLOCOM.2018.8647537
69. Paudyal, P., Battisti, F., Carli, M.: Impact of video content and transmission impairments on quality of experience. Multimedia Tools and Applications (23), 16461–16485 (2016). DOI 10.1007/s11042-015-3214-0. URL https://doi.org/10.1007/s11042-015-3214-0
70. Robitza, W., Garcia, M.N., Raake, A.: A modular HTTP adaptive streaming QoE model—candidate for ITU-T P.1203 ("P.NATS"). In: 2017 Ninth International Conference on Quality of Multimedia Experience (QoMEX), pp. 1–6. IEEE (2017)
71. Kekki, S., et al.: MEC in 5G networks. ETSI White Paper No. 28 (2018)
72. Schatz, R., Sackl, A., Timmerer, C., Gardlo, B.: Towards subjective quality of experience assessment for omnidirectional video streaming. In: 2017 Ninth International Conference on Quality of Multimedia Experience (QoMEX), pp. 1–6 (2017). DOI 10.1109/QoMEX.2017.7965657
73. Schatz, R., Zabrovskiy, A., Timmerer, C.: Tile-based Streaming of 8K Omnidirectional Video: Subjective and Objective QoE Evaluation. In: 2019 Eleventh International Conference on Quality of Multimedia Experience (QoMEX), pp. 1–6 (2019). DOI 10.1109/QoMEX.2019.8743230
74. Schoeffelen, T.: Designing a serverless video streaming pipeline. https://medium.com/@tschoffelen/designing-a-serverless-video-streaming-pipeline-2d3828f3ccf8 (2020)
75. Schreer, O., Feldmann, I., Renault, S., Zepp, M., Worchel, M., Eisert, P., Kauff, P.: Capture and 3D Video Processing of Volumetric Video. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 4310–4314 (2019). DOI 10.1109/ICIP.2019.8803576
76. Seufert, M., Egger, S., Slanina, M., Zinner, T., Hoßfeld, T., Tran-Gia, P.: A Survey on Quality of Experience of HTTP Adaptive Streaming. IEEE Communications Surveys & Tutorials (1), 469–492 (2015). DOI 10.1109/COMST.2014.2360940
77. Singla, A., Göring, S., Raake, A., Meixner, B., Koenen, R., Buchholz, T.: Subjective quality evaluation of tile-based streaming for omnidirectional videos. In: Proceedings of the 10th ACM Multimedia Systems Conference, MMSys '19, pp. 232–242. Association for Computing Machinery, Amherst, Massachusetts (2019). DOI 10.1145/3304109.3306218
78. Skupin, R., Sanchez, Y., Podborski, D., Hellge, C.,Schierl, T.: Viewport-dependent 360 degree video stream-ing based on the emerging Omnidirectional Media For-mat (OMAF) standard. In: 2017 IEEE InternationalConference on Image Processing (ICIP), pp. 4592–4592(2017). DOI 10.1109/ICIP.2017.8297155. ISSN: 2381-854979. Sodagar, I.: The MPEG-DASH Standard for MultimediaStreaming Over the Internet. IEEE MultiMedia (4),62–67 (2011). DOI 10.1109/MMUL.2011.71. ConferenceName: IEEE MultiMedia80. Soltanian, A., Naboulsi, D., Salahuddin, M.A., Glitho,R., Elbiaze, H., Wette, C.: Ads: Adaptive and dynamicscaling mechanism for multimedia conferencing servicesin the cloud. In: 2018 15th IEEE Annual Consumer Com-munications & Networking Conference (CCNC), pp. 1–6.IEEE (2018)81. Spiteri, K., Urgaonkar, R., Sitaraman, R.K.: BOLA:Near-Optimal Bitrate Adaptation for Online Videos.arXiv:1601.06748 [cs] (2016). URL http://arxiv.org/abs/1601.06748 . ArXiv: 1601.0674882. Sterzentsenko, V., Karakottas, A., Papachristou, A.,Zioulis, N., Doumanoglou, A., Zarpalas, D., Daras, P.:A Low-Cost, Flexible and Portable Volumetric Captur-ing System. pp. 200–207 (2018). DOI 10.1109/SITIS.2018.0003883. Sterzentsenko, V., Karakottas, A., Papachristou, A.,Zioulis, N., Doumanoglou, A., Zarpalas, D., Daras, P.: Alow-cost, flexible and portable volumetric capturing sys-tem. In: 2018 14th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS),pp. 200–207. IEEE (2018)84. Sullivan, G., Ohm, J.R., Han, W.J., Wiegand, T.:Overview of the high efficiency video coding (HEVC)standard. IEEE Transactions on Circuits and Systemsfor Video Technology (12), 1649–1668 (2012). DOI10.1109/TCSVT.2012.222119185. Sun, L., Duanmu, F., Liu, Y., Wang, Y., Ye, Y.,Shi, H., Dai, D.: Multi-path multi-tier 360-degree videostreaming in 5g networks. In: Proceedings of the 9thACM Multimedia Systems Conference, MMSys ’18, p. 162–173. Association for Computing Machinery (2018).DOI 10.1145/3204949.3204978. 
URL https://doi.org/10.1145/3204949.3204978 . Event-place: Amsterdam,Netherlands86. Tian, Y., Babcock, R., Taylor, C., Ji, Y.: A new live videostreaming approach based on amazon s3 pricing model.In: 2018 IEEE 8th Annual Computing and Communi-cation Workshop and Conference (CCWC), p. 321–328(2018). DOI 10.1109/CCWC.2018.830161587. Wien, M., Boyce, J.M., Stockhammer, T., Peng, W.H.:Standardization status of immersive video coding. IEEEJournal on Emerging and Selected Topics in Circuits andSystems (1), 5–17 (2019)88. Xie, L., Xu, Z., Ban, Y., Zhang, X., Guo, Z.: 360prob-dash: Improving qoe of 360 video streaming using tile-based http adaptive streaming. In: Proceedings of the2017 ACM on Multimedia Conference, MM 2017, Moun-tain View, CA, USA, October 23-27, 2017, p. 315–323.ACM (2017). DOI 10.1145/3123266.3123291. URL https://doi.org/10.1145/3123266.3123291
89. Xin, Z., Fu, S.: User-centric qoe model of visual per-ception for mobile videos. The Visual Computer (9),1245–1254 (2019)90. Yamasaki, T., Aizawa, K.: Patch-based compression fortime-varying meshes. In: 2010 IEEE International Con-ference on Image Processing, p. 3433–3436 (2010). DOI10.1109/ICIP.2010.565291191. Yates, R.D., Goodman, D.J.: Probability and stochasticprocesses: a friendly introduction for electrical and com-puter engineers. John Wiley & Sons (2014)92. Zadtootaghaj, S., Schmidt, S., M¨oller, S.: Modeling gam-ing qoe: Towards the impact of frame rate and bit rate oncloud gaming. In: 2018 Tenth International Conferenceon Quality of Multimedia Experience (QoMEX), pp. 1–6.IEEE (2018)93. Zhang, G., Lee, J.Y.B.: Ensemble Adaptive Streaming- A New Paradigm to Generate Streaming Algorithmsvia Specializations. IEEE Transactions on Mobile Com-puting pp. 1–1 (2019). DOI 10.1109/TMC.2019.2909202.Conference Name: IEEE Transactions on Mobile Com-puting94. Zhang, M., Zhu, Y., Zhang, C., Liu, J.: Video process-ing with serverless computing: a measurement study. In:Proceedings of the 29th ACM Workshop on Network andOperating Systems Support for Digital Audio and Video,pp. 61–66 (2019)95. Zhang, W., Chen, Q., Fu, K., Zheng, N., Huang, Z., Leng,J., Li, C., Zheng, W., Guo, M.: Towards QoS-Awareand Resource-Efficient GPU Microservices Based on Spa-tial Multitasking GPUs In Datacenters. arXiv preprintarXiv:2005.02088 (2020)96. Zheng, Y., Wu, D., Ke, Y., Yang, C., Chen, M., Zhang,G.: Online cloud transcoding and distribution for crowd-sourced live game video streaming. IEEE Transactions onCircuits and Systems for Video Technology27