Key Technologies for Networked Virtual Environments: A new taxonomy
Juan González, Fernando Boronat, Almanzor Sapena, Javier Pastor
PRE-PRINT MANUSCRIPT SUBMITTED TO IEEE COMMUNICATIONS SURVEYS & TUTORIALS
Abstract — Thanks to the technological improvements of the last few years, especially in virtual reality systems, the number and potential of networked virtual environments (NVEs) and of their users are increasing. NVEs aim to give distributed users a feeling of immersion in a virtual world and the possibility of interacting with other users or with virtual elements inside it, in a similar way to how they interact in the real world. Being able to provide that feeling and natural interactions when the users are geographically separated is one of the goals of these systems. Nevertheless, this goal is especially sensitive to different issues, such as connections with heterogeneous bitrates or different network latencies, which can lead to consistency and synchronization problems and, thus, to a worsening of the users’ quality of experience (QoE). With the purpose of solving these issues, many technical solutions have been proposed and evaluated so far, in fields like network architectures, data distribution and filtering, resource balancing, computing models, predictive modeling, and synchronization in NVEs. This paper gathers and classifies most of them under a novel taxonomy, summarizing their advantages and disadvantages. With the current increase in the number of NVEs and the multiple solutions proposed so far, this work aims to become a useful tool and a starting point not only for future researchers in this field but also for those who are new to NVE development, in which guaranteeing a good QoE is essential.
Index Terms — Consistency, Networked Virtual Environment, NVE, NVE Architectures, NVE Components, NVE Resource Balancing, NVE Taxonomy, Responsiveness, Synchronization.

Submitted for review: 02/02/2021. This work was supported, in part, by the Spanish State Research Agency (Agencia Estatal de Investigación) under Grant PEJ2018-003875-A-AR. The authors are with the Immersive Interactive Media (IIM) R&D Group at Universitat Politècnica de València (UPV), Campus de Gandia, C/ Paraninfo, 1, 46730, Grao de Gandia, Valencia, Spain (e-mails: {juagons4@epsg, fboronat@dcom, asapena@mat, fjpastor@dib}.upv.es).

I. INTRODUCTION

Networked Virtual Environments (NVEs) are computer-based systems simulating a virtual world that support multiple distributed users, who can interact within that virtual space in two ways: with the objects in it (e.g., grabbing them) and with the other users in the NVE in real time [1]. To do so, users usually employ an avatar inside the NVE, which is a virtual representation of themselves, controlled by one or more input devices (e.g., a mouse, keyboard, joystick, gamepad, or haptic device) connected to their computers. Moreover, by providing feedback (e.g., images, sounds, pressure, wind, or aromas) on the events within the virtual world, the users’ feeling of immersion is greatly improved. For visually representing the world, traditional screens and Cave Automatic Virtual Environments (CAVEs) are usually employed, but head-mounted displays (HMDs) are gaining momentum, as they are becoming more affordable and are the ones that really provide users with a complete feeling of immersion.

Although NVEs have been mostly used for entertainment (e.g., videogames), they have shown a clear potential for other areas, such as remote virtual meetings, collaboration, or teaching/learning. There are five main scenarios of NVEs depending on their purpose [2]–[9]: multiplayer online games (MOGs), educational or training NVEs, NVEs for collaborative work, commercially oriented NVEs, and NVEs for social interaction.
• MOGs consist of computer-based games where several remote users play, at the same time, in the same virtual environment. These games can have real-time or turn-based play, which might influence the networking requirements [10]. Some examples are MiMaze [11], City of Heroes (CoH) [12], Kingspray [13], Rokkatan [14], and World of Warcraft (WoW) [15].
• Educational or training NVEs are used for teaching/learning in different fields, such as in military [16] or medical [17] applications. They can take the trainees away from the hazards of real-world training, help with motor learning, or enable e-learning. Some examples are Future Visual [18], FishWorld [19], CoMove [20], Virtway [21], and Acadicus [22].
• NVEs for collaborative work are virtual environments where remote users communicate, interact, and work together to accomplish a mission or a task [23]. NPSNET-V [24], VSculpt [25], RING [26], MASSIVE [27], DIVE [28], ShareX3D [29], CAVRNUS [30], Spatial [31], Co-Surgeon [32], The Wild [33], and COLLAVIZ [34] are some examples.
• Commercially oriented NVEs are used by companies for marketing goals, like researching, designing, testing, or advertising products [35]. Some solutions are found in Spinview [36], Trezi [37], Vizible [38], and Theia Interactive [39].
• NVEs for social interaction offer a new way of communication and social interaction for relatives, friends, and strangers to connect from any place around the world and engage in social activities. Second Life [40], IMVU [41], Diamond Park [42], Virtual Real Meeting [43], Mozilla Hubs [44], CAVRN [45], Decentraland [46], VRChat [47], and Bigscreen [48] are some examples.

Although NVEs can vary significantly depending on their purposes, all of them must provide users with the following features [1], [49], [50]:
• A shared sense of space. The distributed users should have a common feeling of being in the same space.
• A shared sense of presence. The users usually have a virtual representation of themselves in the NVE (avatar), which should be visible to all the users in the virtual world.
• A shared sense of time. Changes in the NVE should be seen or perceived by all the users at the same time.
• A communication medium between remote users for interaction (e.g., through text messages, voice, or gestures).
• Realistic interactions between users and between users and objects in the virtual world, if possible.

Being able to reproduce a feeling of immersion and natural interactions when users are geographically separated is one of the goals of NVEs. Nevertheless, this goal is especially sensitive to different network issues, such as connections with heterogeneous throughput or users experiencing different network latencies, which can lead to consistency and synchronization problems and, thus, to a worsening of their quality of experience (QoE). For example, users with lower network throughput will probably interact in an unfair way within the virtual world, since some update messages or events may take more time to reach them.

Consistency is very important in NVEs and is directly connected to other, strongly related aspects [51], [52], such as synchronization, causality (ordering of events), and concurrency. Nevertheless, apart from these three aspects, when designing an NVE, there are other factors that should also be considered, such as robustness, scalability, interactivity, coherence, flexibility, responsiveness, fairness, security, and traffic overload, which will be defined later in this paper.
With the purpose of minimizing the effects of all the mentioned issues, several solutions have been proposed and evaluated in the past, in fields like network architectures, data distribution and filtering, resource balancing, computing models, predictive modeling, and synchronization. This paper firstly focuses on the different aspects and factors that have an influence on the consistency of the NVE, affecting the end-users’ QoE. Secondly, the different techniques presented in most of the existing NVE-related works are surveyed and classified, summarizing their advantages and disadvantages, and identifying some subfields that are less (or not yet) explored, as potential areas of interest for future research related to NVEs. As far as the authors know, this is the most complete survey compiling the existing key technologies in each field regarding the development of NVEs.

In summary, the main aims of this work are the following:
• To review the up-to-date technologies needed for designing NVEs and compare their advantages and disadvantages.
• To provide a novel taxonomy, grouping the technologies for managing consistency and responsiveness in NVEs.
• To identify the different subfields that are less (or not yet) explored as potential areas of interest for future research.
• To provide a useful baseline for developing new NVEs or improving the current ones.

With the increase in the number of NVEs and the multiple solutions proposed so far, the presented survey aims to become a useful tool and a starting point not only for future researchers in NVE design but also for those who are new to this field.

The rest of the paper is structured as follows. In Section II, the fundamentals of NVEs are described. In Section III, a compilation of other surveys related to the classification of techniques in NVEs is presented. In Section IV, the novel classification that this paper proposes is summarized.
In the following sections, different network architectures (Section V), filtering techniques and data distribution models (Section VI), resource balancing techniques (Section VII), prediction and synchronization techniques (Section VIII), and computing models (Section IX) for NVEs are described and compared, providing highlights on fields that deserve further exploration. Finally, some conclusions and insights into future work are summarized in Section X.

II. FUNDAMENTALS
To help the reader better understand the content of the rest of the paper, the main fundamentals of NVEs are explained in this section. Firstly, some useful definitions are provided and then, due to their relevance in NVEs, consistency and interaction issues are described in more detail.

A. Definitions
NVEs are usually a collection of computers, networks, software, databases (DBs), and users. Related to them, there are some important concepts, such as:
• Avatar: An entity directly controlled by a user, which usually represents that user within the NVE.
• Behavior: Events and modifications related to an entity (e.g., movement).
• Client: A node that allows the user to participate in the NVE.
• Data or Database: All the information related to the NVE.
• Entity: A virtual object or avatar represented in the virtual world.
• Event or Update message: A state update or other kind of NVE-related information, transmitted through the network.
• Node: A computer or device connected to a network, enabling communication or using it.
• Parameter: A characteristic of the NVE, the virtual world, or the entities that can be modified.
• Peer: Each node belonging to a serverless network. It can act as both Server and Client.
• Server: A device that provides resources and information to the clients of the NVE.
• State: Any parameter or combination of parameters that defines a particular entity (e.g., its position) or the NVE characteristics.
• Technique: A method that solves or mitigates the problem/s that NVEs can experience.
• Viewport: The part of the virtual world being represented on a screen or HMD and visualized by the user.
• Virtual world: A virtual representation (usually in 3D) where different entities are placed.
• Zone: A virtual geographic area inside the virtual world.

To design an NVE, the following factors must be considered [51], [53]–[60], to fulfill the features of NVEs described in Section I:
• Interactivity is the ability of the NVE to respond to users’ input. The level of interactivity required in each NVE depends on its purpose. For example, the level of interactivity of a first-person shooter (FPS) MOG is higher than that of a simple quest MOG.
• Concurrency consists of the execution of events by different users on the same entity at the same time. It is related to the causality concept and to the fact that each of these events should not interfere with the others. If concurrency is not controlled, interactivity and consistency can be impaired.
• Consistency consists of the perception of the same information by all users at the same time, notwithstanding network delay, losses, or other negative issues. Consistency errors can affect the interactions between users and with objects, as, e.g., the users might not share the same state of the objects or sense of time.
• Robustness, or flexibility of the NVE, refers to its tolerance to user disconnections and failures. User disconnections or failures should not provoke a malfunction or stop the NVE application. NVEs must be able to adapt to possible unexpected events without (or minimally) affecting the users’ QoE.
• Scalability of an NVE is its ability to support and adapt to a growing number of simultaneous users.
• Coherence of the NVE refers to its capacity to coordinate the different copies, or viewports, of the virtual world in a simultaneous and synchronized way. All users must be viewing the same scenario, at the same time, including the actions taking place in it. Errors in the coherence of the NVE can cause problems in the interaction between users, worsening their QoE.
• Responsiveness is a concept highly linked to interactivity. The users of the NVE must be provided with a sense of natural interactions. The time between the starting instant of an action, usually related to a user’s input, and the visualization or perception of its effects by the users should be close to reality.
• Fairness of the NVE is important when interacting. It is defined as the ability of the users to interact with other users and with the NVE entities regardless of their networking capabilities.
• Security refers to the measures taken in NVEs to prevent users from cheating and to protect their data from malicious changes, either intentional or unintentional.
• Traffic overload. When the number of users increases, the number of messages and the amount of data exchanged in the NVE increase accordingly. If this increment exceeds a certain threshold, the NVE can collapse, failing to process and deliver all the information and making interaction impossible for the users.

Another key attribute when designing an NVE is time. Even though the meaning of time can vary from one NVE application to another, there are two principal concepts of time to be considered:
Absolute Time (a.k.a. Wall Clock Time) and Virtual Time (a.k.a. Causal Time or Simulation Time) [51], [61]. The former is based on the concept of a periodic clock, usually synchronized to Coordinated Universal Time (UTC). The latter is based on a logical, loosely synchronized clock, seen as a sequence of ordered events, which halts if no new events occur.

Furthermore, NVE designers must also deal with network- and access-device-related issues caused by having distributed users across heterogeneous networks and different access devices. Moreover, mobile devices and networks are gaining momentum in NVEs. The main issues are latency (delay of messages), jitter (variance in latency), network throughput (data transmission capability), and packet loss [62]–[65]. For NVEs, in general, the latency values should be less than 100ms, but the limit also depends on the NVE’s purpose (e.g., in MOGs it varies from 80ms to 200ms) [64], [66], [67]. Furthermore, having high latency with low jitter is preferable to having low latency with high jitter [62]. Throughput requirements in NVEs depend on the number of simultaneously connected users, the number of entities in their virtual world, and the size and transmission rate of the update messages [5]. As these issues directly affect the consistency of any NVE, they should be considered too. Even though NVEs could get rid of these problems in some cases (e.g., by increasing the available network bandwidth), in most common circumstances, NVEs cannot control these network-related issues directly (e.g., when they depend on the Internet or on the Internet Service Provider’s access network conditions).

Regarding the use of wireless access devices, they face high communication latency (e.g., in 4G it goes from 50ms to 200ms), limited available wireless bandwidth (compared to wired devices), reduced energy, storage, and processing capabilities and, due to the absence of a shared memory or a global clock, a lack of global references [68]–[72].
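One classic way to order events without a shared global clock, matching the Virtual Time concept described above, is a Lamport-style logical clock, in which time only advances with events. The following minimal sketch uses illustrative class and method names (they are not taken from any of the surveyed systems):

```python
class LogicalClock:
    """Minimal Lamport-style logical clock: time advances only with events."""

    def __init__(self):
        self.time = 0  # halts (stays constant) if no new events occur

    def local_event(self):
        # Any local action (e.g., an avatar update) ticks the clock.
        self.time += 1
        return self.time

    def send_event(self):
        # Timestamp attached to an outgoing update message.
        self.time += 1
        return self.time

    def receive_event(self, msg_time):
        # Merge rule: jump ahead of the sender if it is further along.
        self.time = max(self.time, msg_time) + 1
        return self.time


# Two nodes exchanging one update message:
a, b = LogicalClock(), LogicalClock()
t_send = a.send_event()           # a's clock becomes 1
t_recv = b.receive_event(t_send)  # b's clock becomes max(0, 1) + 1 = 2
```

With this rule, any event that causally depends on another always carries a larger timestamp, which is enough to order update messages consistently even though the nodes share no wall clock.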
With the arrival of 5G/6G networks, the first two problems are expected to be alleviated. As an example, one of the goals proposed for 5G networks is the reduction of latency to around 1ms [73], [74]. A solution for the low performance of lightweight devices is to render the virtual world and perform other intensive tasks on another, more powerful computer, and then send the frames back to be displayed (e.g., on mobile devices embedded in HMDs). However, this brings a new issue known as motion-to-photon latency, that is, the delay between a user input (e.g., a head movement wearing an HMD, or moving with keyboard and mouse) and its reflection in the virtual world (e.g., the viewport changing), which produces an unpleasant effect and needs to be treated [75], [76]. Users normally detect this latency when it is above 20ms to 80ms while they are making fast movements (e.g., shaking their head abruptly), and above 120ms during slow ones (e.g., panning their head gradually) [77], [78].

B. Consistency
Consistency can be considered one of the most important features of NVEs. When different users try to simultaneously access the same entities in the NVE, their actions could interfere, giving rise to inconsistencies that could lead to a worsening of the users’ QoE.

As proved in [79], in any kind of distributed (partitioned) system, there is no possible way of guaranteeing both consistency and responsiveness (availability of the information) at the same time, so designers must find a trade-off between consistency and responsiveness for the sake of having an interactive immersive experience. To achieve this balance, a tolerable degree of inconsistency could be allowed by the NVE, hence letting the virtual world representation differ between users. So, in this field, there are contrasts between real-time (or imposed) consistency and delayed consistency, depending on whether the same NVE state is perceived by the users at the same time or some users receive it later; and between local consistency and global consistency, depending on whether the same NVE state is simultaneously perceived by a part of the NVE’s users or by all of them, respectively [51], [80]. Moreover, even when information availability is ensured, if data can be lost (e.g., due to network packet loss), inconsistencies may still appear [79]. This reasserts the importance of techniques for solving issues related to consistency and responsiveness in NVEs.

To tackle the consistency and responsiveness trade-off problem, the existing consistency maintenance techniques dealing with the interactivity or the concurrency of the NVEs may try to balance consistency and responsiveness in different ways. Based on how the users’ actions are managed, they can be divided into optimistic, pessimistic, predictive, or hybrid approaches [51], [80]–[82].
• Optimistic (or aggressive) approach: actions can be executed without previously checking whether they affect consistency. Each user’s copy of the NVE does not wait for his/her interactions to be validated and communicated to the other users, but executes them and goes on. In case the processed actions cause inconsistencies, rollbacks (the reversal of the NVE state to a previously known consistent one) must be applied to recover the lost consistency. This approach is useful for high-latency networks, improving responsiveness, but only when interactions have a low chance of producing inconsistencies, as too many rollbacks will produce the opposite effect, worsening the responsiveness.
• Pessimistic (or conservative) approach: basically, the opposite of the optimistic one. Each user’s copy of the NVE must validate every action, ensuring that consistency is maintained and communicating the new state to the rest of the users before allowing further interactions. This effectively limits the responsiveness, but ensures consistency without the need for rollbacks, being a good option when low-latency networks are used or when responsiveness is not essential. This approach is best suited for turn-based games or for NVEs with a low number of users who are geographically close to each other.
• Prediction-based approach: it comes between the optimistic and pessimistic ones. It tries to accurately predict interactions (when they will happen or what their effects will be) before they occur. Thus, the user’s copy of the NVE does not need to stop to check the consistency (as it can be done in advance), while also reducing the number of rollbacks. Only when wrong predictions are made are rollbacks needed. This approach is very important, as the previous two are not highly scalable.
• Hybrid control approaches combine the previous approaches, employing them at different intervals in the NVE life cycle, by, for example, switching approaches when network conditions change or using different approaches depending on the types of entities (e.g., a vehicle that moves fast may require a higher update rate).

C. Interactions in NVEs
Given the nature of communications in NVEs, which introduces delays, events are not conveyed instantly, and this could cause actions to conflict in a concurrent space, which is known as a race condition. The NVE could avoid it by making users wait for others to finish their actions, lowering the responsiveness. Nevertheless, this is not suitable for real-time applications. Concurrency in NVEs can be controlled by limiting the interactivity to be sequential or collaborative, to balance, again, between consistency and responsiveness, and provide a good QoE [51], [80], [83].

On the one hand, in sequential interactions, a single user at a time is given a turn to modify an entity, so other users cannot access it. This guarantees the consistency but, if the actions take too much time or the latency is too high, the more connected users, the lower the interactivity and responsiveness. On the other hand, in collaborative interactions, several users can interact at the same time, but which parameters or entities the users can modify simultaneously should be defined to reduce inconsistencies in the NVEs. In this sense, the parameters in an NVE can be divided into three types: independent, co-dependent, and non-restricted parameters.
• Independent parameters. Users can interact with entities that are not already in use by other users and whose modifications do not interfere with the behavior of other entities that are being updated by other users.
• Co-dependent parameters. Users, although unable to interact with entities already in use, can update parameters of other entities whose changes affect those entities in use. In this case, modifications may violate the rules of the NVE and, therefore, corrections should be made, even if a specific level of inconsistency is tolerated.
• Non-restricted parameters. In this case, no limit is imposed, allowing the users to change even the same parameters of the same entities in the NVE.
This offers the highest responsiveness but, when corrections (for consistency) happen, the flexibility, or even the responsiveness (which was supposed to be improved), can be impaired.

III. RELATED WORKS
In the past, some works have made efforts to classify the vast set of existing solutions and techniques designed for, or used in, NVEs, as well as to classify the different components that NVEs comprise.

In [84] the distinct parts of an NVE are identified, but only a few related techniques are classified (e.g., network protocols or data distribution models). Four parts are described: Communication, Views, Data, and Processes. The Communication part is related to network issues (bandwidth, latency, and reliability) and the geographic distribution of the users. The Views part is related to the users’ viewports of the NVE. The Data part is related to the models of data distribution between nodes. The Processes part is related to the execution of the NVE, including the involved servers and clients, computing requirements, and software.

In [51] and [52] the problems of consistency and responsiveness that can happen in NVEs are studied, dividing the mechanisms to solve them into three main components: Time, Information, and System Architecture Management. The Time Management component includes synchronization, time prediction, and concurrency control techniques. The Information Management component includes techniques dealing with network latency and with data filtering and management methods. Finally, the System Architecture Management component includes network and software architecture, communication protocols, and QoS constraints.

In [80] several techniques for maintaining the consistency and responsiveness in NVEs are classified. A main division between the system architecture and consistency maintenance components is proposed. The system architecture component is divided into the network architecture, data distribution, and communication parts.
The consistency maintenance component is mostly presented as a catch-all group containing the synchronization, prediction, interactivity approaches, and concurrency mechanisms.

Finally, in [1] the following basic components of NVEs are defined: Graphic Engine and Displays to visualize the environment, Communication and Control Devices, Processing Systems for the computing and transmission of events, and Data Networks for the actual communication and information sharing. Related techniques/solutions are classified into two groups: Architectures, and Technologies and Protocols. While the Architectures group only includes the basic network architectures (Client/Server, P2P, and Hybrid), the Technologies and Protocols group includes 3D Technologies (software, programming languages, frameworks, and interfaces) and several Communication Protocols.

These works have presented their classifications of techniques, including their benefits and drawbacks, but focusing on the explored use cases. Although quite informative and descriptive, they have left other important fields unexplored, such as computing models or prediction techniques, which are included in the taxonomy presented in this paper. Moreover, more modern techniques that appeared in the last decade (mostly thanks to cloud technologies) are also surveyed and a more complete classification is provided in this paper.

IV. PROPOSED TAXONOMY
In this paper, a novel taxonomy, more NVE-based and better suited for NVE design than the ones in the works summarized in the previous section, is presented. It is shown in Fig. 1. It includes most of the techniques used to design NVEs, classified into the following five components or parts of an NVE: Network, Information Management, Resource Balancing, Time Management, and Computing Models. In this section, these five parts are described, while all the techniques involved in each one of them will be explained in the following sections. With this new taxonomy, a clearer and more intuitive classification than the ones in previous works is provided, making it easier to identify the different parts of an NVE, navigate through all the techniques included in them, and quickly find all their relationships. Moreover, the different identified parts and subparts of an NVE also point out the main research fields related to NVEs (e.g., networking, data distribution models, computing, etc.).

The Network part of NVEs is related to the structure that enables communication between the involved nodes in an NVE, as in [1], [51], [80]. This part consists of two subparts: Network Architecture, which defines the connections between nodes; and Communication Protocols, which establish the protocols that the network nodes employ in the communications. In this work, only the Network Architecture subpart is considered, and a classification of the existing solutions so far is provided. As far as the authors know, the only communication protocols specifically designed for NVEs are DIS and HLA. The DIS protocol was used in NPSNET [24], while the HLA protocol [85] was created to outperform and replace DIS. The latter is often used for military applications in private networks. In the rest of NVEs, general-purpose and widely supported protocols can be employed, such as TCP, UDP, RTP/RTCP, etc.
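As an illustration of how little is needed to carry NVE update messages over such a general-purpose transport, the snippet below sends one state update over UDP on the loopback interface. The message format (entity id, position, sequence number) is an illustrative assumption, not a protocol taken from the surveyed literature:

```python
import json
import socket

# Hypothetical update message: entity id, position, and a sequence number,
# so receivers can discard stale, out-of-order updates (UDP is unreliable).
update = {"entity": 7, "pos": [1.0, 2.5, 0.0], "seq": 42}
payload = json.dumps(update).encode()

# Receiver bound to localhost for this self-contained example.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))          # ephemeral port
addr = rx.getsockname()

tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(payload, addr)           # fire-and-forget, no connection setup

data, _ = rx.recvfrom(2048)
received = json.loads(data)
tx.close()
rx.close()
```

The fire-and-forget nature of `sendto` is exactly why UDP suits frequent state updates: a lost update is simply superseded by the next one, with no retransmission delay.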
For that reason, the review and classification of protocols used in NVEs is out of the scope of this paper and, therefore, the Communication Protocols subpart of the Network component of NVEs is not considered.

Fig. 1. NVE Taxonomy. Parts of an NVE and the techniques included in them.

Regarding the Information Management part of an NVE, it includes all the techniques that directly manage the NVE’s data (reading, modifying, copying, or deleting them). Data are one of the most important parts of NVEs [52], [80], [84], [86]. This part contains techniques for data filtering and data distribution. Additionally, it could also include data compression and file system control techniques [87], but they will not be considered in the classification either, as no solutions specifically designed for and applied to NVEs have been found.

The division of the distinct responsibilities that nodes can have in an NVE, to ensure its correct operation, is another pillar of NVE development. Different NVEs could identify different resources that must be managed by one or several nodes. For example, one NVE could require multiple zones to be controlled by the same server, while another NVE could allow the same zone to be managed by multiple peers. The techniques that orchestrate the roles between the available nodes are classified in the Resource Balancing part of the NVE in the presented taxonomy. In previous works, this part is also referred to as Interest Management, Resource Management, or Load Balancing [5], [84], [88]–[90].

The Time Management part of an NVE deals with the execution (or simulation) of the involved processes and with the events and other messages generated and transmitted between nodes, so that actions take place in a causal, coherent, and consistent manner [66], [86]. As NVEs are programs that execute orders over time, and the passage of time is required for users to perceive movement and progress, this part constitutes another pillar of NVE design. Many techniques (explained later) already exist for time management in NVEs, which can be divided into two groups: Predictive Modeling and Synchronization.

Finally, the Computing Models part [91], [92] includes the techniques that deal with the computation tasks in an NVE (e.g., rendering a frame), which are important for bringing enhanced functionalities to the distributed users. Previous works on NVEs have barely delved into this subject, and most solutions exist only for MOG applications. In this work, this part is considered relevant, due to the recent improvements in computation technologies (e.g., faster processors and networks) that go hand in hand with the increase in computation requirements (e.g., more quality and data).
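As a taste of the Predictive Modeling techniques grouped under the Time Management part, the classic dead-reckoning idea (extrapolating an entity's position from its last reported state, and transmitting a new update only when the remote extrapolation drifts too far) can be sketched as follows. The names and the 0.5 error threshold are illustrative assumptions, not parameters from any specific NVE surveyed here:

```python
from dataclasses import dataclass


@dataclass
class EntityState:
    x: float   # last reported position (1-D for brevity)
    vx: float  # last reported velocity
    t: float   # timestamp of the report


def dead_reckon(state: EntityState, now: float) -> float:
    """First-order extrapolation of position between update messages."""
    return state.x + state.vx * (now - state.t)


def needs_update(state: EntityState, true_x: float, now: float,
                 threshold: float = 0.5) -> bool:
    """Sender-side rule: transmit a new update only when the remote
    extrapolation would drift beyond the error threshold."""
    return abs(true_x - dead_reckon(state, now)) > threshold


# A remote copy extrapolates the entity while no update arrives:
last = EntityState(x=0.0, vx=2.0, t=0.0)
predicted = dead_reckon(last, now=1.5)          # 0.0 + 2.0 * 1.5 = 3.0
small_drift = needs_update(last, true_x=3.2, now=1.5)  # drift 0.2, no update
large_drift = needs_update(last, true_x=4.0, now=1.5)  # drift 1.0, update
```

The payoff is in the update rate: as long as the entity keeps moving roughly as predicted, no messages need to be sent at all, which is why this family of techniques trades a bounded, tolerable inconsistency for lower traffic and better responsiveness.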
Each identified part of an NVE is clearly separated from the other parts, making it easier to classify the existing techniques included in them. In the following sections, an up-to-date and more complete classification of many techniques for each part of the NVE is provided. Unlike in previous works, several comparison tables are included, showing their advantages and disadvantages.

V. NETWORK ARCHITECTURES
The layout of the network(s) involved in an NVE is very important. A well-designed one will perform better by optimizing network usage and reducing packet transmission load and delay, hence making consistency easier to control without loss of responsiveness. In this section, the main types of network architectures used in NVEs are described and compared: Client/Server, Cloud-based, Edge-based, Peer-to-Peer (P2P), and Hybrid.

A. Client/Server architecture
In a Client/Server architecture (Fig. 2), a server stores all the virtual environment data and manages the NVE state and the communications between clients. The server acts as a central authority to which all clients must connect and send updates [60]. To inform about an event, a client must communicate it to the server, which then forwards it to the other clients [52], [80]. Rokkatan [14], RING [26], ShareX3D [29], Co-Surgeon [32], CAVRN [45], TerraNet [93], and the one in [94] are examples of NVEs that follow this architecture. On the one hand, the main advantages of this architecture are its easy implementation and simple consistency, synchronization, and security control, as the server manages the whole NVE. On the other hand, this architecture presents robustness and scalability problems. The server is a single point of failure, although this can be overcome with the use of redundant or load-balanced servers. When the number of clients (i.e., users) increases, a bottleneck can occur on the server or its network access links due to the rising number of communications and, thus, consistency maintenance can be affected. Finally, it can unnecessarily increase the latency between clients, since all communication goes through the server.

B. Cloud-based architecture
The Cloud-based architecture (Fig. 3) is similar to Client/Server but, instead of having a single server, the workload is distributed among several computers with sufficient resources and computing capabilities to manage the needs of the NVE [60], [95]. It is mostly used in massively multiplayer online (MMO) games to solve the scalability problem of the Client/Server architecture. When following Client/Server architectures, game operators are forced to adapt their resources to the game load peaks, with the consequent high economic cost and expensive maintenance. Another possibility is to take advantage of the current Cloud Computing business model, which is based on a pay-per-use scheme in which customers only pay for the resources they use. Thus, it is an elastic model that allows NVE operators to dynamically and rapidly adapt the resources used depending on the game load. Some examples of the use of this architecture can be found in WoW [15], in CloudyGame [96], in [97], and in [98]. This architecture presents some advantages, such as the dynamic adjustment of resources, the reduction in maintenance costs, and the ease of maintaining consistency. The NVE application must include additional components to support the dynamic provisioning of resources, such as a monitor and a provisioner. The monitor component must measure different performance parameters, such as the response time, the average system throughput, the amount of consumed bandwidth, or the use of the rented machines. The provisioner component handles analytical load models and fast prediction algorithms to anticipate load peaks and under-utilization of resources. As a drawback, the implementation of this architecture is more complex, since additional components
Fig. 2. Client/Server architecture.
Fig. 3. Cloud-based architecture.
are needed, and extra delays appear due to the use of, or access to, those components.

C. Edge-based architecture
The Edge-based architecture (Fig. 4) extends the Cloud concept by offering data and processing resources on servers closer to the clients (called Edge nodes), improving network usage and responsiveness. In that sense, an Edge node is just a part of the Cloud that interacts with it, performing tasks that benefit the clients close to it. Variations of this architecture can bring the components of the Cloud to the Edge partially (Fog [99]) or totally (Cloudlet [100]). Examples of the use of this architecture can be found in MUVR [76] and in CloudyGame [96]. Due to the increasing problems of accessing the Cloud with growing amounts of data and numbers of clients, which impair the quality of the service or content, this architecture offers a specialized way of dealing with them and reducing the overhead of centralized or Cloud-based infrastructures. The advantages of this architecture are lower latency, reduced costs, and optimized network bandwidth usage, while its disadvantages are less security and robustness compared to the Cloud-based one, as the Edge nodes are more vulnerable to attacks and failures.

D. P2P architecture
The P2P (Peer-to-Peer) architecture (Fig. 5) is a decentralized architecture in which peers are interconnected without the need for a central authority or server. The peers supervise and distribute the NVE load between them, all of them having similar roles by acting, at the same time, as clients and servers [1], [60], [80], [101]. Examples of the use of this architecture can be found in MiMaze [11], NPSNET-V [24], DIVE [28], Phaneros [102], SimMud [103], and in Pithos [104]. In general, P2P architectures have some advantages: they provide high scalability (i.e., they support a large number of clients) and facilitate local consistency and responsiveness. By contrast, as clients have local copies of the NVE, it is more difficult to keep global consistency compared to Client/Server architectures. Moreover, since every client has control over part of the NVE, security issues may arise (e.g., cheating) [105]. NVEs can tackle this problem by including a component to monitor activity and validate clients' identities.

E. Hybrid architecture
This kind of architecture mixes other architectures to solve common problems of P2P and Client/Server ones, combining centralized and decentralized schemes (Fig. 6). In hybrid architectures, on the one hand, multiple servers interconnect using P2P and, on the other hand, each client connects to only one server, in the same way as in a traditional Client/Server architecture [80]. Servers coordinate among themselves to inform the clients. Examples of this architecture can be found in Diamond Park [42], in [106], and in [107]. The main advantages of hybrid architectures are the ability to provide scalability (as the number of clients increases, more servers can easily be added) and redundancy (i.e., duplication of the NVE data). On the other hand, servers communicate with clients and with other servers; thus, each server must process more data, in addition to the fact that multiple servers can introduce more latency into the NVE.

F. Comparison
Table 1 presents a summary of the different network architectures employed in NVEs, described in this section, including their advantages and disadvantages, as well as some examples of NVEs following each of them.
Client/Server
  Advantages: easy implementation; easy consistency, synchronization, and security control.
  Disadvantages: robustness problems; scalability problems; adds latency.
  Examples: Rokkatan [14]; RING [26]; ShareX3D [29]; Co-Surgeon [32]; CAVRN [45]; TerraNet [93]; Pandzic et al. [94].
Cloud
  Advantages: easy consistency control; easy robustness control; high scalability; reduced maintenance cost.
  Disadvantages: increased latency; complex implementation.
  Examples: WoW [15]; CloudyGame [96]; Nguyen et al. [97]; Nae et al. [98].
Edge
  Advantages: reduced latency; reduced network usage; reduced maintenance cost.
  Disadvantages: global consistency problems; less secure than the Cloud-based one.
  Examples: MUVR [76]; CloudyGame [96].
P2P
  Advantages: easy local consistency control; easy local responsiveness control; high scalability.
  Disadvantages: global consistency problems; less secure than centralized ones.
  Examples: MiMaze [11]; NPSNET-V [24]; DIVE [28]; Phaneros [102]; SimMud [103]; Pithos [104].
Hybrid
  Advantages: high robustness; high scalability.
  Disadvantages: adds data usage and latency; complex implementation.
  Examples: Diamond Park [42]; Anthes et al. [106]; Capece et al. [107].
Table 1. Network architectures used in NVEs.
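As a rough illustration of the Client/Server relay summarized above, the following sketch (with invented class and method names, not taken from any cited system) shows the server acting as the single authority that applies each update and forwards it to every other client:

```python
# Minimal sketch of the Client/Server relay: the server keeps the
# authoritative state and forwards each client's update to all others.
# All names are illustrative, not from any cited NVE.

class Server:
    def __init__(self):
        self.state = {}        # authoritative NVE state: entity -> value
        self.clients = []      # connected clients

    def connect(self, client):
        self.clients.append(client)
        client.server = self

    def handle_update(self, sender, entity, value):
        self.state[entity] = value              # apply authoritatively
        for c in self.clients:
            if c is not sender:                 # forward to everyone else
                c.receive(entity, value)

class Client:
    def __init__(self, name):
        self.name = name
        self.replica = {}      # local view of the NVE state

    def send_update(self, entity, value):
        self.replica[entity] = value
        self.server.handle_update(self, entity, value)

    def receive(self, entity, value):
        self.replica[entity] = value

server = Server()
a, b = Client("A"), Client("B")
server.connect(a)
server.connect(b)
a.send_update("door", "open")   # B learns about it only via the server
```

Note the extra hop through the server, which is the source of the added client-to-client latency mentioned above.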
Fig. 4. Edge-based architecture.
Fig. 5. P2P mesh architecture.
Fig. 6. Hybrid architecture.
The way the nodes interconnect is highly related to the techniques that are needed to guarantee and maintain consistency in the NVE. Additionally, networked systems may also experience scalability, flexibility, or security issues, which should be balanced or managed by those or other techniques explained in the following sections.

VI. INFORMATION MANAGEMENT
One of the most important aspects of NVEs is their data: how they are stored and how they are transmitted between the participants. In this section, Data Filtering and Data Distribution techniques, for optimizing the data of the NVE and for reducing its network usage, respectively, are presented. Data filtering techniques select the data that need to be transmitted or processed, while data distribution techniques manage the replication of the database of the NVE (or specific parts of it) among the participant nodes.

A. Data filtering techniques
Filtering data at transmission, at reception, or at any of the intermediary nodes can reduce traffic overload and increase scalability [108]. To do so, data filtering techniques can be used to set a priority for the updates of the generated states and, if needed, to discard the transmission of data deemed less important. This way, when many events happen and not all the data can be processed or transferred, the network overload does not increase. However, the extra processing load required for filtering data increases latency and, hence, causes transmission delays that reduce responsiveness. So, the goal of data filtering techniques is to reduce the network congestion and usage that could cause large delays, at the expense of some temporal inconsistencies, given that those inconsistencies are barely perceived by the users. In this subsection, some data filtering techniques are explained and compared: Potentially Visible Sets, Frontier Sets, Update-free Regions, Reachability Range, and Local Perception.

Potentially Visible Sets (PVS)
PVS is built on the premise that an avatar can only see a subset of the entities in the virtual world, so only the entities in that subset should be rendered [109]. The virtual world is divided into zones and, for each one, all the entities that can be viewed from any point of that zone are stored. Static entities can be processed and stored beforehand, whereas dynamic ones (e.g., entities that can change their position) must be re-processed when they change. Clients, instead of constantly (every frame) calculating visibility, only need to receive updates of the set of entities visible from the zone in which their avatars are located. Examples of the use of PVS can be found in Phaneros [102] and in [110]. Fig. 7 shows an example of a world divided into 9 zones. When the avatar moves from one zone (a) to another zone (b), its visible entities change. PVS is simple to apply and can save data processing and network usage when there are few visible entities. Nevertheless, it requires extra storage and, in open virtual spaces or with many dynamic entities displayed at the same time, the growth in data size can become unmanageable.

Frontier Sets
Frontier Sets is based on PVS. The NVE is also divided into zones but, in this case, the clients look for other visible zones instead of visible entities, so that all the entities inside a visible zone are considered, even if they are hidden. Frontier Sets consists of finding, for each zone containing an avatar, a pair of groups of zones in which one group has no visibility in common with the other [111]. A group of zones is called a Set, and a pair of sets, in which an avatar located inside one set cannot see the contents or entities located in the other set, is called a Frontier. Ultimately, all existing frontiers together are called the Frontier Set of the NVE. When an avatar enters a zone in any set, the frontiers including that set are calculated, and the corresponding client only receives update messages from the zones visible from that set. This way, an avatar can move freely within a set, without the need to request information from other non-visible sets. Examples of the use of Frontier Sets can be found in [112] and in [113]. As an example, a couple of frontiers are shown in Fig. 8. An avatar in zone 4 could combine frontiers a) and b) to ignore zones 3, 5, and 9. If the avatar moves to zone 7, it will leave frontier a), being able to view zone 9. In Frontier Sets, each client checks all the frontiers of the zone in which its avatar is located to know from which other zones the user does not need to receive updates, instead of calculating the visibility for each entity (as in PVS). Thanks to that, the required memory and processing resources of the NVE do not increase when the number of entities increases. Nevertheless, if the number of entities is high inside one zone,
Fig. 7. Virtual world divided into 9 zones with PVS. The avatar moves from a) to b), changing the zone and, thus, the visible entities.
Fig. 8. Frontier Sets. In a), the sets of 1-2-4 and 5-9 zones. In b), the sets of 3-5 and 4-7 zones.
the number of messages transmitted could still be high, and the zone should be further divided (increasing the storage need) or the consistency should be ensured by other means.

Update-free regions (UFR)
In UFR, zones are called regions, and the NVE defines, for every possible pair of entities, a pair of regions such that the contents of one region are hidden from the other region in the pair [114]. Both regions are considered update-free regions (UFR) in the pair, since clients with avatars in one region will not receive updates from the other region in the pair. While an avatar stays inside a UFR in a pair, the associated client will not send update messages to the clients with avatars in the other region of the pair. An example of the use of UFR can be found in [114]. Although UFR may be less robust than Frontier Sets, it does not require each client to know the whole contents of the NVE, as clients only keep track of the regions they have to leave, instead of all the possible zones or entities in sight [52]. This makes UFR suited for distributed architectures like P2P. Fig. 9 shows an NVE with two avatars, each one in a different UFR (separated by a wall) in the same pair of regions. The client of each avatar will not receive updates from the other while its avatar remains inside its respective region. When the avatars leave their UFRs, they check whether they have become visible to each other, and a new pair of UFRs will be generated for that pair of avatars, so that when the avatars enter and stay inside them, they will not send update messages to each other. The main benefit of UFR is that network usage is reduced by not sending unnecessary updates, without affecting the consistency perceived by the user. The UFRs are easy to compute but, as the number of entities increases, the amount of memory needed to store the data of the generated UFRs increases, becoming hard to manage for all the possible pairs of entities.

Reachability Range
In Reachability Range, instead of dividing the virtual world into zones, a circular zone is defined around each avatar, and the associated client only accepts state updates from entities inside that zone (i.e., only entities inside it are updated) [108], as shown in the example in Fig. 10. Examples of the use of Reachability Range can be found in Second Life [40], in Pithos [104], and in [108]. Reachability Range is easy to implement, but its performance depends on the number of entities within the defined range, as too many could slow down the NVE or reduce consistency. In this case, no computation is required to determine pairs or zones, saving memory as well.

Local Perception
In Local Perception, messages are prioritized based on how close or far the entities in the virtual world are from the avatar controlled by a client [115]. All the state changes will arrive at the client, but ordered depending on how close the entities are to the avatar. The farther the entity, the later the update will be received. Local Perception is similar to Reachability Range but, instead of discarding information of entities beyond a range, that information will be received with delay and, therefore, the corresponding updates of those entities will be delayed. Local Perception effectively distorts time, as the closest entities are quickly updated, while the updates of those far away are delayed. This can be perceived as a bad effect (e.g., as entities move away from the avatar, their movement slows down), but it allows the client to receive events from more, and more distant, entities in comparison to Reachability Range. This way, if an entity is getting closer, its state will be updated more frequently, so no jumpy movements are perceived. Fig. 11 shows, from a) to c), how an entity approaching the avatar is perceived by the client, together with its real position. The closer the entity, the sooner it is updated, closing the gap between the real position and the received one. An example of the use of Local Perception can be found in [115]. As an advantage, Local Perception supports a high number of entity updates without a high loss of consistency and responsiveness. On the contrary, when there are many entities, update messages can be delayed more than expected, even causing network congestion.

Comparison
Table 2 presents a summary of the different data filtering techniques described in this section, including their advantages and disadvantages, as well as examples of NVEs using each of them. Notice that the techniques found in the literature focus on data filtering based on position or visibility; filtering techniques yet to be developed could be dynamic or based on predictions.
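As a concrete illustration of the last two techniques compared above, the following sketch (with invented positions and update records) shows Reachability Range dropping updates from entities beyond a radius, while Local Perception keeps them all but delivers the closest entities first:

```python
# Sketch of two distance-based filters (illustrative only):
# Reachability Range discards updates beyond a radius; Local
# Perception orders/delays them by distance instead.
import math

def distance(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def reachability_filter(avatar_pos, updates, radius):
    """Keep only updates whose entity lies within the radius."""
    return [u for u in updates if distance(avatar_pos, u["pos"]) <= radius]

def local_perception_order(avatar_pos, updates):
    """Deliver every update, closest entities first (farther ones later)."""
    return sorted(updates, key=lambda u: distance(avatar_pos, u["pos"]))

updates = [
    {"entity": "far",  "pos": (30, 0)},
    {"entity": "near", "pos": (2, 1)},
    {"entity": "mid",  "pos": (10, 5)},
]
kept = reachability_filter((0, 0), updates, radius=15)   # drops "far"
ordered = local_perception_order((0, 0), updates)        # near, mid, far
```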
Fig. 9. Example of Update-Free Regions.
Fig. 10. Example of reachability range (nearby entities are still updated even if not visible).
Fig. 11. Movement of an entity (dashed square) and the state received at the time (black square). The closer to the avatar, the more accurate its position is.
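The zone-to-visible-set lookup underlying PVS (as in Fig. 7) can be sketched as follows; the zone numbering and entity names are invented for illustration:

```python
# Sketch of a PVS lookup (illustrative data): visibility is precomputed
# per zone, so a client only subscribes to the entities visible from the
# zone its avatar currently occupies.
pvs = {                       # zone -> set of entities visible from it
    1: {"tree", "rock"},
    2: {"rock", "house"},
    3: {"house", "car"},
}

def visible_entities(zone):
    return pvs.get(zone, set())

def on_zone_change(old_zone, new_zone):
    """Entities to subscribe to / unsubscribe from after a zone change."""
    old, new = visible_entities(old_zone), visible_entities(new_zone)
    return new - old, old - new   # (newly visible, no longer visible)

added, removed = on_zone_change(1, 2)   # the avatar changes zone
```

Only the set difference travels over the network, which is where the bandwidth saving of the technique comes from.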
Potentially Visible Sets
  Advantages: easy implementation; reduced network usage.
  Disadvantages: increased storage need; scalability problems.
  Examples: Phaneros [102]; Moreira et al. [110].
Frontier Sets
  Advantages: reduced storage need; scalable for dispersed entities.
  Disadvantages: scalability problems for a high number of entities.
  Examples: Avni et al. [112]; Steed et al. [113].
Update-free Regions
  Advantages: reduced traffic overload; reduced computation needs.
  Disadvantages: increased storage need.
  Examples: Makbily et al. [114].
Reachability Range
  Advantages: easy implementation; reduced computation needs; reduced storage need.
  Disadvantages: scalability problems for a high number of entities.
  Examples: Second Life [40]; Bassiouni et al. [108].
Local Perception
  Advantages: high scalability.
  Disadvantages: delayed consistency; high traffic overload; high network latency.
  Examples: Sharkey et al. [115].
Table 2. Data filtering techniques used in NVEs.

B. Data distribution techniques
Choosing the location of the data is a critical decision when designing an NVE. Data are usually collected in a DB containing all the information about the elements composing the virtual world: the position of the avatars, the model geometry, textures, terrain, and their behaviors (i.e., the way they react to an event). Where the data are stored and how they are replicated among the participants can reduce network usage and network latency. In this subsection, several methods to distribute and replicate the data among the NVE nodes are explained and compared: Shared Centralized World, Homogeneous Replicated World, Shared Distributed World, and Dynamically Changing Data Distribution.

Shared Centralized World
In Shared Centralized World, a server stores the DB and shares its contents with the clients (Fig. 12, with the different shapes inside the DB representing different entities). Clients must connect to the server to be able to interact within the NVE [84]. Every time the state of an object is going to be modified, a request must be sent to the server. Then, the server performs the changes in the DB and sends an update message to all the other clients to update the state of that object (Fig. 12). Shared Centralized World is frequently used in Client/Server architectures, and some examples can be found in [116] and in VISTEL [117]. The main advantages of Shared Centralized World are the ensured consistency, as only one database is used, and the absence of data replication on several servers. On the other hand, in addition to the well-known drawbacks present when using a Client/Server architecture (e.g., robustness, scalability), this mode presents two additional main drawbacks. First, possibly high transmission delays between the clients and the server, together with the processing time in the server, can increase the latency, inducing a lack of responsiveness, worsening the interaction and, therefore, becoming annoying for users. Second, a bottleneck might occur in the server since, the higher the number of clients, the higher the number of messages to be processed by it and the higher the number of update messages to be sent by it.

Homogeneous Replicated World
In Homogeneous Replicated World, each client stores a copy of the virtual world DB (which is modified locally) and has control of the object behaviors [1], [84] (Fig. 13). Only object state changes or events (e.g., collisions) are exchanged between clients to maintain consistency. When a client modifies an object behavior (e.g., opening a door or cutting down a tree), synchronization techniques must be applied to replicate these behaviors in the rest of the clients (Fig. 13). Examples of Homogeneous Replicated World can be found in SIMNET [16], RING [26], and ShareX3D [29]. The use of Homogeneous Replicated World presents two main benefits: the sent messages are simple updates, so their size and number are quite small; and the latency of the interactions is very low. Furthermore, any modification of the objects in the virtual world is performed by the clients. On the contrary, it has a few drawbacks. First, some inconsistencies can occur between the clients, since message losses or network delays could prevent some clients from updating their own copy of the DB on time. Moreover, additional mechanisms are needed on each client to manage concurrent access: each user can locally modify the environment, but possible conflicts can only be detected when that modification is transmitted. Another flaw is the size of the DB since, the bigger the NVE, the bigger the amount of data to be stored. Finally, the lack of flexibility is another important disadvantage: adding new elements to the NVE can be a hard task, as they must be created and replicated into all the DB copies in each client.

Shared Distributed World
Unlike in Homogeneous Replicated World, in Shared Distributed World the clients do not store a copy of the full database; each one stores only a different part of it, which is shared with the rest of the clients [84] (Fig. 14). So, these clients act as the servers of their part of the DB. Examples can be found in DIVE [28], RAVE [118], and BrickNet [119].
Fig. 12. Shared Centralized World, with the steps of a modification.
Fig. 13. Homogeneous Replicated World. All the clients store the same DB; and every time an event is triggered, each DB is updated and synchronized.
By using it, the data filtering process and the initial database download can be reduced, saving time and network usage. The needed resources are lower, and keeping consistency in the NVE becomes an easier task than in Homogeneous Replicated World. The consistency of the complete database, however, needs to be guaranteed by other means (e.g., by using Dynamically Changing Data Distribution, presented in the next subsection, or a hybrid network architecture).

Dynamically Changing Data Distribution
This is a hybrid data distribution technique, which dynamically adapts the replication of data to be either Shared Centralized or Shared Distributed, depending on the consistency and responsiveness requirements of the entities. In the same NVE, some entity manipulations require good responsiveness (e.g., real-time movement) while others require strong consistency (e.g., a turn-based event) [86]. Data distribution can also be changed depending on the capabilities of the network or its nodes. This way, depending on the object the user is interacting with and the current network latency, the data distribution can be dynamically changed to strike a trade-off between consistency and responsiveness. Examples of the use of this data distribution technique can be found in COLLAVIZ [34] and in OpenMask [120]. In the latter, clients have a replicated copy of the NVE data. Nonetheless, each of the object behaviors will be executed in just one of the clients, called the controller of that object. In other words, each object is assigned to only one controller. Two types of handles are used: reference object handles and mirror object handles. Reference object handles are used by the controllers to identify simulated objects whose behaviors they can execute or control (these objects are named referents). Mirror object handles are used by the controllers to identify additional copies of their referent objects that are controlled by other clients (i.e., copies of the objects whose behaviors will be executed in a different controller). The process to modify an object is shown in Fig. 15. If the referent object to be modified is in the user's client, that object will be locally manipulated and then an update message will be sent to the rest of the clients which have and control a copy of that object (steps 1 and 2 of Fig. 15).
On the other hand, if the object to be modified by the user is not a referent object in the user's client, a request will be sent to the remote client that is the controller of that (mirror) object (steps 3 to 6 of Fig. 15). In that remote client, the object will be modified, and an update message will be sent to all the other clients holding copies of that object. The advantages of Dynamically Changing Data Distribution are better scalability compared to centralized solutions and the fact that the NVE can minimize the throughput requirements. On the other side, this technique is more complex to implement, and the exchange of additional control messages can affect the network throughput.

Comparison
Table 3 presents a summary of the data distribution techniques presented in this section, including their advantages and disadvantages, as well as some examples of NVEs using them. They cover all the possibilities for the storage and replication of the NVE data, from the most centralized database to more distributed ones, including hybrid solutions as well.

VII. RESOURCE BALANCING
To address the scalability problems of certain techniques, NVEs can distribute their computation load and, hence, the tasks needed for running the NVE applications among several nodes by using resource balancing techniques, depending on the architecture of the NVE [5], [121]. This is quite different from the information management techniques, which have direct control over the data and not over the computing requirements. Resource balancing techniques help reduce the network usage and the end-to-end delay, making the NVE more scalable, but may add problems related to security, robustness, or global consistency maintenance. Moreover, they can be centralized, when controlled individually (by a server or a sole peer), or distributed, when several peers manage the same processes.

A. Centralized balancing
Centralized techniques for resource balancing are mainly suited for centralized architectures (Client/Server or Cloud-based ones) [5]. They grant authority to the same node or group of nodes (e.g., multiple servers sharing workload and different tasks between them) for managing a session for a group of clients that share the same interests (e.g., the players of the same match, a group of clients with avatars in the same zone, or all the attendees of a virtual event). Different centralized resource balancing techniques are described and compared in this subsection, such as: Mirroring, Instancing and Zoning.
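The session-granting idea described above can be sketched as follows. The round-robin policy, the server names, and the interest keys are all invented for illustration; real systems use more elaborate assignment policies:

```python
# Sketch of centralized balancing (illustrative): clients that share
# the same interest (e.g., the same match or zone) are assigned to the
# same managing node, spreading sessions across the available servers.
from itertools import cycle

servers = cycle(["server-1", "server-2"])   # simple round-robin pool
session_manager = {}                        # interest key -> managing server

def assign(interest):
    if interest not in session_manager:          # first client with this
        session_manager[interest] = next(servers)  # interest opens a session
    return session_manager[interest]

s1 = assign("match-42")   # first client of match-42 opens the session
s2 = assign("match-42")   # same interest -> same managing server
s3 = assign("zone-7")     # different interest -> next server in the pool
```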
Fig. 14. Shared distributed world (clients storing different parts of the DB and sharing them through the network).
Fig. 15. Dynamically Changing Data Distribution. A client modifies a local entity, while another modifies a mirrored entity.
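The referent/mirror dispatch of Dynamically Changing Data Distribution (Fig. 15) can be sketched as follows. The class and attribute names are illustrative and do not reflect the actual OpenMask API; the point is that each object has exactly one controller and other clients must route changes to it:

```python
# Sketch of referent/mirror dispatch (illustrative names): each object
# has exactly one controller; clients holding only a mirror must route
# modification requests to that controller.
class Client:
    def __init__(self, name):
        self.name = name
        self.referents = {}     # objects this client controls: id -> state
        self.mirrors = {}       # mirror object id -> controller client

    def modify(self, obj_id, value):
        if obj_id in self.referents:            # steps 1-2 of Fig. 15:
            self.referents[obj_id] = value      # manipulate locally...
            for client in self.mirrors_of(obj_id):
                client.on_update(obj_id, value) # ...then update mirrors
        else:                                   # steps 3-6 of Fig. 15:
            self.mirrors[obj_id].modify(obj_id, value)  # ask controller

    def mirrors_of(self, obj_id):
        return [c for c in ALL_CLIENTS
                if c is not self and obj_id in c.mirrors]

    def on_update(self, obj_id, value):
        self.mirror_state = (obj_id, value)     # refresh the local copy

a, b = Client("A"), Client("B")
ALL_CLIENTS = [a, b]
a.referents["lamp"] = "off"     # A is the controller of "lamp"
b.mirrors["lamp"] = a           # B holds a mirror pointing at A
b.modify("lamp", "on")          # B routes the request to controller A
```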
Shared Centralized World
  Advantages: easy consistency control; reduced storage need.
  Disadvantages: responsiveness problems; scalability problems.
  Examples: Curtis et al. [116]; VISTEL [117].
Homogeneous Replicated World
  Advantages: easy responsiveness control; reduced network usage; reduced network latency.
  Disadvantages: consistency problems; concurrency problems; robustness problems; increased storage need.
  Examples: SIMNET [16]; RING [26]; ShareX3D [29].
Shared Distributed World
  Advantages: easy local consistency control; easy responsiveness control; reduced storage need.
  Disadvantages: global consistency problems.
  Examples: DIVE [28]; RAVE [118]; BrickNet [119].
Dynamically Changing Data Distribution
  Advantages: high scalability; reduced network usage.
  Disadvantages: complex implementation; adds control messages.
  Examples: COLLAVIZ [34]; OpenMask [120].
Table 3. Data Distribution in NVEs.

Mirroring
Mirroring consists of dividing the management of overloaded NVE zones among a set of servers. Each of these servers manages and processes the computation, the states, and a part of the DB for a different subgroup of entities placed inside the overloaded zone (Fig. 16). The information and updates of all the entities in that zone are then synchronized among the servers. Examples of the use of Mirroring can be found in [14], [122]. Fig. 16 shows two servers, each managing different entities (represented with different shapes inside the DBs) of the same NVE, which a client accesses by connecting to each of the servers, while the servers are also connected to each other for exchanging control messages. Mirroring efficiently manages the network usage, adapting it to the overload of zones in the virtual world. Nevertheless, it adds extra computing requirements and design complexity, given the need for synchronization when the number of users increases.

Instancing
Instancing is a simplification of Mirroring in which the session load is distributed by starting multiple independent instances of the same NVE zone [123]. These instances are independent of each other; thus, players in different instances cannot interact and do not see each other, even if their avatars are closely placed in the virtual world (Fig. 17). Thanks to this, the NVE is easily scalable. However, this solution cannot be used when the users need to view and interact with all the other users in the same virtual zone at the same instant. Instancing is mostly applied in MMO games, such as CoH [12] and WoW [15]. In WoW, it is referred to as sharding: different instances of a zone are generated, and the players within each zone are distributed between instances depending on different features of the game, such as their level, progress, or party (grouped in-game players). In that game, a technique called cross-region zoning is also employed to reduce the number of instances when needed: if the number of users inside some instances decreases, multiple instances can be merged into one, so that the number of players in each instance is always balanced.

Zoning
Zoning
In Zoning, the virtual world is divided into different zones that are handled independently by separate servers [89], [124]. Clients connect to the server that holds the zone in which their avatars are located and interact with each other through that server. When an avatar moves to another zone, the client changes server (Fig. 18). Examples of the use of Zoning can be found in [124], [125], and [126]. The load of the NVE is distributed, making it more manageable and easier to implement. Nevertheless, controlling the zones can become a difficult task when the number of clients fluctuates a lot.
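The zone-to-server mapping behind Zoning can be sketched as follows; this is a minimal illustration, and all names, the zone size, and the server pool are invented for the example.

```python
# Illustrative sketch of Zoning: the virtual world is split into fixed-size
# square zones, each statically owned by one server; a client reconnects
# (handover) when its avatar crosses a zone boundary.

ZONE_SIZE = 100.0                      # width of a square zone, world units
SERVERS = ["server-1", "server-2"]     # hypothetical server pool

def zone_of(x, y):
    """Return the (column, row) zone index containing position (x, y)."""
    return (int(x // ZONE_SIZE), int(y // ZONE_SIZE))

def server_for(zone):
    """Statically map a zone to a server (round-robin over the pool)."""
    col, row = zone
    return SERVERS[(col + row) % len(SERVERS)]

# An avatar moving from the left zone to the right one triggers a handover:
old = server_for(zone_of(50, 20))    # zone (0, 0)
new = server_for(zone_of(150, 20))   # zone (1, 0)
print(old, new)  # the client disconnects from `old` and connects to `new`
```

A real deployment would also migrate the avatar's state between the two servers during the handover, which this sketch omits.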
B. Distributed balancing
Distributed resource balancing consists in changing the manager of the processes and resources as avatars enter, move through, or leave the different zones of the NVE or the surroundings of other avatars. These techniques are based on distributed network architectures, like P2P, and, therefore, cannot be applied in NVEs following centralized network architectures [115], [127]. The techniques explained and compared in this subsection are: Mutual Notification, Neighbor List Exchange, Fully Connected, Multicast, Distributed Hash Tables, and World Partitioning.
Fig. 16. Mirroring. Two servers, each managing different parts of the NVE (represented with the different shapes) for the same overloaded zone.
Fig. 17. Instancing. Two instances of the same zone, with different entities.
Fig. 18. Zoning. The client changes the server when its avatar moves from the left zone (controlled by Server 1) to the right one (controlled by Server 2).

Mutual Notification
With Mutual Notification, clients are directly connected only to those other clients (peers) whose avatars are closely located in the virtual world (considered as neighbors). Notifications between direct neighbors are sent only when a peer's avatar changes its position, interacts with the virtual world, or a new peer's avatar enters the surroundings of another peer's avatar, so that global connectivity and neighbor discovery can be achieved. Thanks to that, network usage is optimized, but at the expense of adding the overhead of calculating the neighborhood. Additionally, having many close users in the virtual world can impair the consistency or responsiveness. In Fig. 19, it can be seen how only peers with their avatars close enough in the same zone of the virtual world get connected (3 groups of peers). Examples of the use of Mutual Notification can be found in VON [128] and in pSense [129].
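The neighborhood test at the core of Mutual Notification can be sketched as follows; the function names, positions, and area-of-interest radius are illustrative assumptions.

```python
# Sketch of Mutual Notification's neighbor discovery: peers connect only to
# peers whose avatars lie within an area-of-interest radius of their own.
import math

AOI_RADIUS = 10.0  # hypothetical area-of-interest radius, world units

def neighbors(me, avatars):
    """IDs of the avatars within AOI_RADIUS of `me` (excluding itself)."""
    mx, my = avatars[me]
    return {pid for pid, (x, y) in avatars.items()
            if pid != me and math.hypot(x - mx, y - my) <= AOI_RADIUS}

positions = {"A": (0, 0), "B": (3, 4), "C": (50, 50)}
print(neighbors("A", positions))  # {'B'}: C is too far, so no link to it
```

Each peer would rerun this test as avatars move, connecting to entering neighbors and dropping links to leaving ones; that bookkeeping is the overhead the text mentions.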
Neighbor List Exchange
When using Neighbor List Exchange, peers keep knowledge of the existence of their neighbor peers' avatars (in the virtual world) and of the avatars of the neighbors of those neighbors by continuously exchanging messages (Fig. 20). To get a better view of the virtual world and be better informed of state changes and updates, a peer receives all the information from its neighbor peers [130]. These neighbor peers not only send information about their own avatars and actions (e.g., updated states of entities modified by their own avatars) but also retransmit the information received from their own neighbor peers (e.g., updated states of entities modified by their neighbor peers' avatars). Fig. 20 shows peer number 4 sending update information to the rest of its neighbor peers. Peer number 1 retransmits the update information to its own neighbor, peer number 2. Examples of NVEs using this technique can be found in [131] and [132]. Neighbor List Exchange can reduce the network usage, but the exchange of messages can affect the NVE negatively, and global consistency cannot be guaranteed.
Fully Connected
This technique consists in connecting every peer to each of the others, so they exchange updates directly. This way, delays between the participants are reduced. This has been done in some systems intended to improve the performance of MOGs, such as Pithos [104] and Donnybrook [133]. When the number of connected peers increases, scalability problems arise. With the purpose of having a more efficient NVE, data filtering techniques can be applied, limiting message transmissions.
Multicast-based technique
This technique takes advantage of IP multicast delivery, as well as application layer multicast (ALM). A multicast group is created with peer clients that share an interest in common entities or zones of the virtual world. Only members of that group will receive updates about state changes of those entities or zones. Some examples of NVEs employing Multicast are NPSNET-V [24], DIVE [28], TerraNet [93], and SimMud [103]. Multicast scales well when the number of clients increases but can experience some flexibility and fairness issues when peers connect through heterogeneous access networks, with different bitrates and latencies.
Distributed Hash Tables (DHT)
DHT is an effective and fair way of balancing the system load among all the peers. In general, the entities in the virtual world are divided into groups, and each group is assigned to one peer participating in the NVE. Each peer has an ID and manages a group of entities. To decide which group of entities each peer manages, a hash function is used (Fig. 21). This process has two steps. First, a hash function converts data of variable length into fixed-size values; the parameters used by the hash function are usually the latency and the geographical location of the peer. Second, similar values produced by the hash function are associated to the same peer. For instance, Fig. 21 shows how closer positions and latencies produce similar values. SimMud [103] and Colyseus [134] are NVE systems that use DHT, combined with multicast. Another example can be found in [127]. DHT improves the NVE scalability and robustness and reduces network delays. Nevertheless, as can be expected, it is more complex to implement than other, simpler methods.
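A hash-based assignment of this kind can be sketched with a simplified Chord-style ring; note that, for brevity, this sketch hashes entity keys rather than the latency and position parameters described above, and all names and the ring size are invented.

```python
# Sketch of a DHT-style assignment: each key is hashed to a fixed-size
# value, and the entity is managed by the peer whose own hashed ID is
# nearest that value on a ring (a simplified Chord-like rule).
import hashlib

RING = 2 ** 16  # size of the identifier ring

def h(key):
    """Map any string key to a fixed-size ring position."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % RING

def responsible_peer(entity_key, peer_ids):
    """Peer whose ring position is nearest the entity's hashed key."""
    target = h(entity_key)
    return min(peer_ids, key=lambda p: min((h(p) - target) % RING,
                                           (target - h(p)) % RING))

peers = ["peer-1", "peer-2", "peer-3"]
owner = responsible_peer("entity:door-42", peers)
assert owner in peers  # every key deterministically maps to exactly one peer
```

Because the mapping is deterministic, any peer can locate an entity's manager locally, which is what gives DHT its robustness and reduced lookup delay.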
Fig. 19. Mutual Notification in a virtual world (10 peers) with 3 groups.
Fig. 20. Neighbor List Exchange with a peer sending an update message.
Fig. 21. Distributed Hash Tables. Position and latency are passed through a hash function and entities are assigned to peers based on the returned value.
Fig. 22. World Partitioning (zones and avatars assigned to an SP in each).

World Partitioning
The virtual world is divided into zones, each assigned to a specific peer whose avatar is inside the zone and which has sufficient computing power, throughput, and storage capacity, known as a Super Peer (SP) [135], [136] (Fig. 22). The SPs act as servers and clients at the same time, controlling the zone management (i.e., data from entities in that zone are managed by the SP). However, an avatar's information can be managed by both the SP and the associated peer to which the avatar belongs. Examples of NVEs with World Partitioning are Pithos [104], MOPAR [137], and Badumna [138]. World Partitioning is similar to Mutual Notification in that both divide and group peers based on the zone their avatars are located in; however, while with Mutual Notification the peers of the avatars in the same zone get connected to each other, with World Partitioning the peers get connected to the same SP. This reduces the computing needs for group management, compared to Mutual Notification, as the neighborhood calculation processes a lower number of connections with the peers. Additionally, it also offers good scalability, and the consistency is easy to maintain. Nevertheless, it is not suited to deployments where most clients have low capabilities (e.g., mobile phones), as managing the NVE with only a few capable peers becomes harder.
C. Comparison
Table 4 shows a summary of the explained centralized and distributed techniques for resource balancing used in NVEs, their advantages and disadvantages, as well as examples of NVEs using them. Resource balancing techniques can be either centralized or distributed, being suited for centralized or for distributed network architectures, respectively. In general, these techniques already cover all the kinds of applications and network architectures NVEs can have but, as with data filtering, newer techniques could be based on more parameters than the position of avatars and the geographic locations of nodes. For instance, avatar mobility patterns not based on random walk could be considered for deciding how the nodes will connect and when the connections should change, as proposed by Liang et al. in [88]. Additionally, other studies on modeling the behavior of avatars can be found in [127], [139], [140]. In conclusion, this field could be further explored.
VIII. TIME MANAGEMENT
NVE systems need to allow users in a concurrent environment to perceive and interact within the virtual world despite experiencing different network delays. In this sense, time management solutions deal with the time instants or periods at which messages are transmitted and/or processed, to balance consistency and responsiveness, and include Predictive Modeling and Synchronization techniques. Predictive modeling techniques employ information from previous instants and manage time in a predictive manner (e.g., predicting future events), while synchronization techniques work in a more deterministic way and are used to ensure the causality or consistency of the events in the NVE.
A. Predictive modeling
Predictive modeling techniques try to predict the behaviors of users and entities, and their consequences, to reduce the need to send update messages or to optimize their delivery. Consequently, the traffic overload and network usage are minimized, and end-to-end latency is decreased, while keeping a tolerable degree of consistency. The prediction of events can happen in the client originating the event or in the rest of the clients that are supposed to receive the associated update message. In this subsection, the following predictive modeling techniques are explained and compared: Dead Reckoning, Position History-based Prediction, Exponentially Weighted Moving Average, and Kalman Predictor.
Centralized techniques:
• Mirroring — Advantages: reduced network usage. Disadvantages: synchronization problems; high computing needs. Examples: Rokkatan [14]; Morillo et al. [122].
• Instancing — Advantages: high scalability. Disadvantages: limited number of users. Examples: CoH [12]; WoW [15].
• Zoning — Advantages: easy implementation. Disadvantages: scalability problems. Examples: Abdulazeez et al. [124]; Cai et al. [125]; Zhang et al. [126].
Distributed techniques:
• Mutual Notification — Advantages: reduced network usage. Disadvantages: consistency problems; responsiveness problems. Examples: VON [128]; pSense [129].
• Neighbor List Exchange — Advantages: reduced traffic overload. Disadvantages: consistency problems. Examples: Chen et al. [131]; Kawahara et al. [132].
• Fully Connected — Advantages: high responsiveness. Disadvantages: scalability problems. Examples: Pithos [104]; Donnybrook [133].
• Multicast — Advantages: high scalability. Disadvantages: robustness problems; fairness problems. Examples: NPSNET [24]; DIVE [28]; TerraNet [93]; SimMud [103].
• DHT — Advantages: high robustness; high scalability; reduced network delay. Disadvantages: complex implementation. Examples: SimMud [103]; Colyseus [134]; Wang et al. [127].
• World Partitioning — Advantages: easy consistency control; high scalability. Disadvantages: robustness problems; bad when there are only lightweight clients. Examples: Pithos [104]; MOPAR [137]; Badumna [138].
Table 4. Resource Balancing in NVEs.

Dead Reckoning
In Dead Reckoning, a client that is modifying the state of an entity (e.g., moving an avatar) also runs a background simulation of what the rest of the users are perceiving, based on the last update message communicated to them, until it notifies a new change of the state to the rest of the users [52]. For example, when an entity is moving in a direction at a certain speed according to the last update, it is assumed that it will keep that speed and direction until a change is notified. If the actual state of the entity (e.g., speed, direction, acceleration, position, etc.) exceeds a certain allowed threshold, or degree, of inconsistency (divergence) compared to the simulation (e.g., the current velocity of an avatar is much lower than when the last update message was sent), a message is sent to update it. The rest of the users then change the state to the actual one, and the client modifying the state of that entity goes on simulating what the others perceive from that last update message. With Dead Reckoning, a margin of consistency is ensured, but the complexity of the NVE is increased. Examples of the use of Dead Reckoning can be found in MiMaze [11], SIMNET [16], TerraNet [93], and in [141]. Fig. 23 shows Client 2 simulating the movement of Client 1's avatar from the last known direction. Meanwhile, Client 1 is moving the avatar and, at the same time, running a simulation of the movement that Client 2 should be viewing. At some point in time (t1 in Fig. 23), Client 1 changes the direction of the avatar, so the movement differs from the simulation of what Client 2 perceives. Nonetheless, Client 1 does not send any update message until the actual movement and the simulated movement diverge beyond a threshold. This way, a tolerable difference between any state of an entity and its simulation reduces the need for updates. The main benefit of Dead Reckoning is that update messages are only sent when the divergence exceeds a threshold; thus, network usage is reduced when participants can estimate the states with a tolerable degree of consistency. Nevertheless, if the states fluctuate a lot or are unpredictable (e.g., avatars making arbitrary movements or entities with high speed), messages will be exchanged more often, and other techniques will be required to handle the inconsistencies and reduce the number of updates needed.
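The divergence check that drives Dead Reckoning's update sending can be sketched as follows, using one-dimensional positions; the threshold value and all names are illustrative.

```python
# Sketch of Dead Reckoning: the sender extrapolates the last broadcast
# state (as the other clients do) and transmits a new update only when the
# true position diverges from that extrapolation beyond a threshold.

THRESHOLD = 1.0  # maximum tolerated divergence, world units

def extrapolate(last_pos, last_vel, dt):
    """Position the other clients are assumed to be simulating."""
    return last_pos + last_vel * dt

def needs_update(true_pos, last_pos, last_vel, dt):
    """True when the divergence exceeds the allowed threshold."""
    return abs(true_pos - extrapolate(last_pos, last_vel, dt)) > THRESHOLD

# Last update said: position 0.0, velocity 2.0 units/s.
print(needs_update(2.1, 0.0, 2.0, 1.0))  # False: only 0.1 off the prediction
print(needs_update(5.0, 0.0, 2.0, 1.0))  # True: 3.0 off, send an update
```

Real systems (e.g., DIS-style dead reckoning) extrapolate full position, velocity, and orientation vectors, but the send-only-on-divergence rule is the same.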
Position History-based Prediction
Position History-based Prediction is used to anticipate the movement of entities in other clients in a way that seems close to the real movement experienced in the client that originated it, even with network delays. To do this, the other clients extrapolate future positions using the previous directions and, when an update with the correct position is received, instead of instantly moving to that position, the movement is changed so the entity converges toward it. An example of the use of Position History-based Prediction can be found in [142]. Fig. 24 shows an example in which, when an update message for a moving entity is received, the received and the previously known directions are used to estimate the current direction the entity is following (e.g., an average direction) until the next update is received. With Position History-based Prediction, users will still see a continuous movement, even when no updates are received. Nonetheless, the memory and processing requirements of the NVE are increased and, as it uses past movements to predict new ones, it may lose effectiveness when arbitrary movements happen.
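The extrapolate-and-converge idea can be sketched minimally as follows, assuming 2-D direction vectors; the function names and the blend factor are invented for the illustration.

```python
# Sketch of Position History-based Prediction: between updates, a client
# extrapolates along the average of the last known direction vectors and,
# when a correction arrives, steers toward it instead of snapping.

def average_direction(history):
    """Average the stored 2-D direction vectors (tuples)."""
    n = len(history)
    return (sum(d[0] for d in history) / n, sum(d[1] for d in history) / n)

def converge(current, target, blend=0.5):
    """Move a fraction of the way toward the corrected position."""
    return (current[0] + blend * (target[0] - current[0]),
            current[1] + blend * (target[1] - current[1]))

dirs = [(1.0, 0.0), (0.0, 1.0)]           # two previous directions
print(average_direction(dirs))            # (0.5, 0.5): the estimated heading
print(converge((0.0, 0.0), (4.0, 2.0)))   # (2.0, 1.0): halfway to the fix
```

Calling `converge` every frame produces the smooth, non-jumping correction the text describes, at the cost of storing the direction history.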
Exponentially Weighted Moving Average (EWMA)
EWMA is inspired by the previous technique, but it assigns different weights to the previous directions, so that the more recent ones are weighted higher [143]. Movements are a combination of position, direction, and speed. The last known movements are used to estimate future movements, favoring the recent ones in the prediction. An example of this predictive model can be found in [144]. Fig. 25 compares Position History-based Prediction, in a), and EWMA, in b), in which the movement of an entity is approximated to a direction based on weighted previous directions, so the most recent ones have greater influence. EWMA provides a quick adaptation to the specific way entities move (e.g., if an entity has been moving in circles recently, EWMA predicts it will keep following that circular course), reducing the inconsistency without increasing the network usage. Nonetheless, as both Position History-based Prediction and EWMA use past movements to predict new ones, they may lose effectiveness when arbitrary movements happen. They also increase the memory and computing requirements, needed to store previous directions and calculate the predicted ones.
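The weighting idea can be sketched with a one-dimensional EWMA over heading samples; the smoothing factor `alpha` and the sample values are illustrative assumptions.

```python
# Sketch of EWMA direction prediction: recent samples get exponentially
# larger weights than older ones, so a recent turn dominates the estimate.

def ewma(samples, alpha=0.5):
    """Exponentially weighted average of 1-D samples, oldest first:
    each newer sample pulls the estimate toward it by factor alpha."""
    est = samples[0]
    for s in samples[1:]:
        est = alpha * s + (1 - alpha) * est
    return est

headings = [0.0, 0.0, 90.0]     # degrees; the entity just turned
print(ewma(headings))           # 45.0: the recent turn dominates...
print(sum(headings) / 3)        # 30.0: ...versus a plain average
```

Compared with the plain average of Position History-based Prediction, the EWMA estimate reacts faster to the entity's latest behavior, which is exactly the quick adaptation described above.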
Fig. 23. Dead Reckoning (predicted and real movements of an avatar with the diverged distance).
Fig. 24. Position history-based prediction (from two previous directions).
Fig. 25. Comparison of Position History-based Prediction (a) and EWMA (b).

Kalman Predictor
The Kalman Predictor is used to reduce the inconsistencies of the tracking prediction between the original entity movement and the one simulated in the rest of the clients [144]. The Kalman Predictor has two phases: in the first one, it makes a prediction based on the previous states, while in the second one, when receiving the actual state, it sets a weight on that prediction depending on how accurate it was (the more accurate, the higher the weight). Then, the process is repeated, using the previous weighted predictions for the next estimations. This is represented in Fig. 26. In a), the Kalman Predictor is not employed, and therefore, the system does not learn from previous inaccurate predictions. In b), a previous prediction with a high error is processed to set new weights, and a later prediction is weighted accordingly, obtaining a more precise prediction. The Kalman Predictor adapts easily, requiring fewer update messages and correcting errors. Like Dead Reckoning, the Kalman Predictor is also useful when the movements of the entities are stable (i.e., with low variations). Nonetheless, when movements are fast and arbitrary, the Kalman Predictor loses accuracy. An example of the use of this predictive model can be found in [145], where, rather than predicting current positions on the destination client, it is used to predict future positions at the time the update will be received, accounting for network delays. For instance, if there is a 100 ms delay, the sender estimates the position 100 ms later and transmits it. Another example can be found in [146], where it is used to estimate the head motion of users' HMDs.
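A minimal one-dimensional Kalman-style predict/correct loop illustrating the two phases described above is sketched below; the process and measurement noise values are invented, and a real tracker would use full state vectors.

```python
# Minimal 1-D Kalman-style predictor sketch: predict from the previous
# state, then weight the correction by the Kalman gain, which itself
# depends on how uncertain the prediction currently is.

def kalman_step(est, var, measurement, process_var=1.0, meas_var=2.0):
    """One predict/correct cycle; returns the new estimate and variance."""
    # Predict phase: uncertainty grows while we extrapolate.
    var = var + process_var
    # Correct phase: the gain is high when the prediction is uncertain,
    # so inaccurate predictions are pulled strongly toward the measurement.
    gain = var / (var + meas_var)
    est = est + gain * (measurement - est)
    var = (1 - gain) * var
    return est, var

est, var = 0.0, 1.0
for z in [1.0, 2.0, 3.0]:        # noisy position measurements
    est, var = kalman_step(est, var, z)
print(round(est, 2))             # the estimate tracks the rising positions
```

The gain plays the role of the accuracy-dependent weight in the description above: the worse the previous prediction, the more the next correction counts.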
Comparison
Table 5 presents a summary of the predictive modeling techniques used in NVEs described in this section, including their advantages and disadvantages, as well as some examples of NVEs following each of them. On the one hand, predictive modeling techniques deal with the consistency in the NVE and, as can be expected, their consistency approach (conservative, aggressive, etc.) falls into the group of prediction-based approaches. On the other hand, these techniques predict only the movement of entities, but newer techniques, yet to be developed and applied in the field of NVEs, could be used to estimate other parameters, like entity access (i.e., whether an entity is going to be used by several clients), the user density of zones (i.e., when a zone is going to be crowded, considering other techniques, like distribution), and other interactions, like an avatar shooting a gun or opening a door.
B. Synchronization
In NVEs, synchronization (abbreviated as sync, hereinafter) techniques constitute important mechanisms to maintain a satisfactory level of consistency and fairness, which contributes to providing truly engaging and interactive experiences to users, despite the existence of network issues. These techniques schedule the notification and execution of events to be performed at specific times. Additionally, they may offer either a delayed global consistency, when users perceive the same consistent world but at different times, or an imposed global consistency, when the execution of events happens for all users at the same instant. Moreover, sync techniques for NVEs can be employed either to synchronize the occurrence of events, when their temporal relationship should be maintained between clients, or to synchronize media streams (composed of stream media units or MUs), which can be continuous streams (e.g., audio or video streams) or data streams containing parts or the totality of one or more events. Well-known sync techniques in media communications usually deal with the sync of the playout of media streams (i.e., stream MUs inside data packets) [147], [148]. These sync techniques, in turn, are divided into four types [149], [150]: intra-stream sync, inter-stream sync, inter-destination media sync (IDMS), and inter-device sync (IDES) techniques. Intra-stream sync handles and maintains the temporal relationship within each time-dependent media stream (i.e., received stream MUs are processed and presented in the correct order and timing). Inter-stream sync handles and maintains the sync between the playout processes of related (time-dependent or not) media streams (e.g., audio-to-video sync, or lip-sync). Those streams can be played out on the same device or on different devices, which, in turn, can be either close by (a.k.a. IDES) or far apart, in different locations (a.k.a. IDMS or group sync).
On the one hand, in NVEs, IDES techniques handle the sync between the different devices used by one client, such as HMDs, haptic devices, smartphones, smart TVs, and computers, to maintain interactivity and a good QoE in multi-device scenarios. Group synchronization techniques (IDES or IDMS) can be divided into three groups, according to the sync control scheme followed by the solution [59], [151], [152]: Master-Slave, Synchronization Maestro, and Distributed Control schemes.
• Master-Slave scheme (M/S), which consists of selecting one device as the master while the other devices are considered slaves; only the master device sends timing information about its playout processes to the slave devices, which take it as the sync timing reference.
• Synchronization Maestro Scheme (SMS), in which a sync maestro device, which can even be an independent device, is in charge of collecting playout timing information from all the involved devices, processing it, and sending messages with a calculated sync timing reference to all of them, so they adjust their playout processes to be in sync.
• Distributed Control Scheme (DCS), in which all the devices exchange their playout timing information, individually calculate the asynchronies between them, and adjust their own playout processes to be in sync.
In a previous work by the authors [59], a qualitative comparison of the three schemes used for IDMS is presented.
• Dead Reckoning — Advantages: reduces the network usage. Disadvantages: cannot handle high divergences. Examples: MiMaze [11]; SIMNET [16]; TerraNet [93]; Chen et al. [141].
• Position History-based Prediction — Advantages: reduces the network usage; can predict big deviations. Disadvantages: cannot estimate arbitrary actions; increased computation needs. Examples: Singhal et al. [142].
• EWMA — Advantages: reduces the network usage; adapts quickly to changes. Disadvantages: cannot estimate arbitrary actions; increased computation needs. Examples: Chan et al. [144].
• Kalman Predictor — Advantages: adapts quickly to changes; reduces the network usage. Disadvantages: loses accuracy on arbitrary actions; increased computation needs. Examples: Tumanov et al. [145]; Gül et al. [146].
Table 5. Predictive modeling techniques in NVEs.

Fig. 26. Kalman Predictor improving predictions from previous errors.
On the other hand, IDMS is the most important sync type in NVE deployment, as it is related to more parts of the NVE that affect the consistency and responsiveness between the partakers of the NVE. Such parts are the end-to-end network, the clients, the servers (if they exist), and the NVE data. Furthermore, this type of technique is basically in charge of synchronizing the virtual world and the states of its entities between several distributed clients. So, apart from the above group sync schemes (M/S, SMS, and DCS), in this section other group sync techniques used in the past for NVEs are considered and compared: Local Lag, Dynamic Local Lag, Adaptive ∆-causality, Bucket Synchronization and Breathing Time Buckets, Lockstep sync, Asynchronous sync, Adaptive Event sync, Time Warp sync, Breathing Time Warp, Trailing State sync, Event Correlation sync, and Optimistic Obsolescence-based sync.
Local Lag & Dynamic Local Lag
Local Lag (LL) is used to reduce short-term inconsistencies between clients in an NVE by delaying an operation in every client a certain amount of time, called the local lag. This way, all the clients execute the same actions at the same time, despite their differences in network latency, at the cost of slowing down the responsiveness [153]. Fig. 27 shows an example in a Client/Server-based solution. When the clients connect, the Server tests the latency of every client and, based on the maximum of those latencies, sets a fixed waiting time (local lag) for each client which, when added to the network latency of the client, equals that maximum delay for all the clients. This technique can be further improved by using Dynamic Local Lag (DLL), in which, instead of using a fixed amount of delay, the information is delayed dynamically according to the network latency of both the source and destination clients [147]. First, the local lag is calculated at the beginning for each type of entity, according to the network latency and the responsiveness requirements of that type of entity (e.g., an entity with higher relevance in the NVE would require a lower delay). Then, every time the network load or the position of an entity changes, the value of the local lag is recalculated and updated. Update messages are stored depending on the local lag value and sent after their waiting time. Examples of both techniques can be found in [69] (LL) and [147] (DLL). LL and DLL have a limit, as too much lag negatively affects the responsiveness; therefore, other techniques should be used in conjunction. Also, DLL ensures an optimum delay for every participant that adapts to network fluctuations, but enough throughput and processing resources are needed for the periodic calculation.
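The Local Lag computation just described can be sketched as follows; the client names and latency figures are illustrative.

```python
# Sketch of Local Lag: every client delays its local execution so that,
# added to its network latency, all clients reach the same maximum total
# delay and thus execute the same action at the same time.

def local_lags(latencies_ms):
    """Per-client waiting time so every client totals the same delay."""
    target = max(latencies_ms.values())
    return {client: target - lat for client, lat in latencies_ms.items()}

lags = local_lags({"client-A": 100, "client-B": 70})
print(lags)  # {'client-A': 0, 'client-B': 30}: B waits 30 ms extra
```

This reproduces the Fig. 27 scenario: the client with 70 ms latency waits an extra 30 ms so that both clients total 100 ms before executing the event.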
Adaptive ∆-causality
All the clients of the NVE use the same maximum end-to-end delay, as in LL, which is set dynamically according to the network latency and determines the time limit the updates can take to reach their destination [147]. The clients send the update messages as soon as they are generated and, if they are received before the time limit expires, they are stored until that moment and then executed. If an update is received later, it is not executed but used for estimating values (e.g., to predict the future position of an entity). An example of an NVE with Adaptive ∆-causality can be found in [147]. In Fig. 28, Client 1 sends update messages to Client 2 with information on the position of its avatar. Update messages 1 and 3 are received on time, but update message 2 (including the position x = 5) arrives late at Client 2. If Adaptive ∆-causality were not used, Client 2, receiving the update 200 milliseconds late, would either update the value of x to 5, since that is the value received, or discard the message and wait for the next one. However, when using Adaptive ∆-causality, Client 2 can use the timestamps and delays to calculate a new value instead. The new position is calculated with, e.g., the formula Ve = Vl + d × s, where Ve is the value to be estimated, Vl is the value received late, d is the delay of that value, and s is the step (or slope) of the value between the late update and its previous one. In this case, it will be x = 5 + 200 ms × (5 − 2) / 600 ms = 6. Adaptive ∆-causality keeps the causal relationship among messages. Thanks to this, the consistency can be maintained when the network latency is not too high. Otherwise, other techniques should be combined with it.
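The Ve = Vl + d × s estimation above can be reproduced directly as a sketch, using the sample numbers from the worked example; the function name is invented.

```python
# Sketch of the Adaptive ∆-causality late-update estimation: a value that
# arrives after the time limit is extrapolated along the slope of the last
# two updates instead of being applied as-is or discarded.

def estimate_late_value(v_late, v_prev, delay_ms, interval_ms):
    """Ve = Vl + d * s, with s the slope between the last two updates."""
    slope = (v_late - v_prev) / interval_ms   # change per millisecond
    return v_late + delay_ms * slope

# Update arrives 200 ms late: last value 5, previous value 2, 600 ms apart.
print(estimate_late_value(5, 2, 200, 600))  # 6.0, matching the worked example
```

The late message thus still contributes causal information, which is how the technique preserves consistency under moderate network latency.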
Fig. 27. Local lag in a Client/Server-based solution. The update message of an event is delayed 30 ms for the client with the lower network delay (on the right), to match the total delay of 100 ms (the other client, on the left).
Fig. 28. Adaptive ∆-causality. A late update used to estimate a position.

Bucket Synchronization (BS)
The main idea of BS is that messages originated by all the clients at the same time, or during the same period, should be processed together and at the same time in all of them. In BS, time is divided into time slots of fixed length, and a bucket is associated to each slot (called the bucket period) [51]. All the update messages received by a client that were generated and transmitted by sender clients during a given period are stored by the receiver clients in the bucket corresponding to that period. At the end of each bucket interval, the receiver clients compute all the (own and received) update messages in that bucket to obtain their new local views of the global state of the NVE. With BS, the simulation runs in cycles of the same duration (the bucket period). After each cycle, clients synchronize with the rest, and then the virtual time of each client is increased by the same amount, that is, one bucket period. However, when an update message is received after its bucket period due to, e.g., network delay, the clients return the states to that previous bucket and repeat the execution of all its messages (rollback). When late update messages from other clients, or other rollbacks, affect more than one bucket, cascade rollbacks are conducted, forcing several buckets to be reprocessed successively. An example of the use of BS can be found in MiMaze [11]. In Fig. 29, bucket 'a' receives the update messages 1 and 2 out of order, but both belong to that same bucket, so they can be processed together without causing rollbacks or inconsistencies. During bucket 'b', update message 4 is received, but message 3 is missing. The received message 4 is processed, therefore causing an inconsistency. During bucket 'c', the late update message 3 is received, forcing a rollback that reprocesses the previous bucket 'b' and then bucket 'c', recovering the consistency. The main advantage of BS is the low computation overhead required, but it presents several flaws that prevent its use in NVEs requiring a high level of responsiveness. These disadvantages are mainly related to the bucket period, as it should be long enough for each client to process enough messages, but also short enough to support fast and realistic interactions.
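The grouping of update messages by generation time can be sketched as follows; the bucket length and message names are illustrative, and rollback handling is omitted.

```python
# Sketch of Bucket Synchronization: update messages are assigned to the
# fixed-length time slot in which they were generated, and each bucket is
# then processed as a unit at the end of its period.
from collections import defaultdict

BUCKET_MS = 100  # fixed bucket period

def bucket_of(timestamp_ms):
    """Index of the bucket an update generated at `timestamp_ms` joins."""
    return timestamp_ms // BUCKET_MS

def group_updates(updates):
    """Map bucket index -> list of updates, processed together per bucket."""
    buckets = defaultdict(list)
    for ts, msg in updates:
        buckets[bucket_of(ts)].append(msg)
    return dict(buckets)

# Messages 1 and 2 share bucket 0 even if they arrive out of order;
# message 3 belongs to bucket 1 and would force a rollback if it only
# arrived after bucket 1 had already been processed.
print(group_updates([(30, "msg2"), (10, "msg1"), (120, "msg3")]))
```

A full implementation would also detect a late arrival for an already-processed bucket and re-execute that bucket (and any dependent ones) to restore consistency.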
Breathing Time Buckets (BTB)
BTB is similar to BS but, in this case, the length of the bucket periods is variable, and each client processes its buckets independently of the others [154]. During a bucket period, the received update messages are executed as they arrive, provided they do not precede another that has already been processed. Otherwise, if rollbacks are needed, the bucket ends prematurely while corrections are made. With BTB, when this situation happens, control messages are exchanged between clients so that rollbacks are done locally and no cascade rollback occurs. An example of the use of BTB can be found in [155]. Fig. 30 shows an example of a transmission of update messages between two clients. Client 1 sends update message 3, which arrives late (outside its corresponding bucket period) at Client 2, but it does not cause a rollback in that client because it does not break consistency (e.g., it is an independent event). Nevertheless, when Client 2 sends update message 4 to Client 1, it arrives too late and generates a rollback in Client 1, because this update message is received after update message 5, which depends on update message 4, so the consistency has been broken. With BTB, network usage is reduced, since most rollbacks happen locally, and responsiveness is increased. Nevertheless, global consistency cannot be ensured, and enough messages are required in the same bucket to reduce the number of rollbacks.
In LS, a server manages a global time reference. For every interaction (event) in the NVE that changes the state of any entity, the server stops the simulation time until all the participants update their states for those entities. Clients do not advance in time until the server notifies them to do so. Then, the simulation time is resumed. This way, a consistent NVE is achieved [26]. In Fig. 31, an example of the steps followed for each interaction in a simple scenario is described. First, when a user wants to interact, the associated client informs the server. Then, the server stops the global simulation time or GST (lockstep mode), notifies the rest of the clients and sends them the event, so each client can start its processing. When finished, each client notifies the server. Only when all the clients’ notifications have been received does the server advance the GST and move to the next interaction or turn of the NVE. Examples of the use of LS can be found in RING [26] and in [156]. With LS, consistency is always ensured. Nevertheless, it is not recommended for NVEs demanding a high level of interaction and responsiveness, since clients would continuously enter lockstep mode, waiting for the rest of the clients to notify that they have finished processing the events, which is very annoying for users. Furthermore, if the processing of an interaction in any client is delayed for a considerable amount of time (e.g., the end-to-end delay is high, or an update message takes too long to be processed on one client), the server will stop the GST until that client finishes and notifies the server. So, the responsiveness of the NVE and, consequently, the users’ QoE could be seriously affected.
Asynchronous Synchronization (AS)
AS is based on the previous technique, but with a decentralized clock, which allows each client to advance its simulation time without depending on the other clients [157]. This is achieved by only sending event update messages to the clients that are affected by those events (e.g., when a client’s avatar can see another client’s avatar opening a door, or they are shooting each other). Each client has its own clock and perceives the same consistent world, but at different times. An example can be found in [158], where the concept of Spheres of Influence (SoI) is used. A SoI is the zone close to a client’s avatar that can be affected by it in future turns, as shown in Fig. 32.
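As a sketch of this interest-filtering idea, the following Python snippet selects the clients that need to receive an update. All names are illustrative, and modeling the SoI as a circle of fixed radius is an assumption made for simplicity, since the SoI is actually defined by what the avatar can affect in future turns:

```python
import math

def clients_in_soi(sender_pos, soi_radius, client_positions):
    """Return the ids of clients whose avatars lie inside the sender's
    Sphere of Influence (SoI), i.e. the only ones that need the update."""
    recipients = []
    for client_id, pos in client_positions.items():
        if math.dist(sender_pos, pos) <= soi_radius:
            recipients.append(client_id)
    return recipients

# Only clients B and C are close enough to be affected by the event;
# the distant client D never receives (or waits for) this update.
positions = {"B": (3.0, 4.0), "C": (1.0, 1.0), "D": (50.0, 50.0)}
print(clients_in_soi((0.0, 0.0), 10.0, positions))  # → ['B', 'C']
```

Because D is outside the SoI, its clock can advance without waiting for this event, which is exactly what decouples the clients’ simulation times in AS.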
Fig. 29. Example of BS with a rollback.
Fig. 30. Example of BTB with different sized buckets.
Fig. 31. Steps of the LS.
AS is resilient to game cheaters and improves on the flaws of LS, solving isolated effects of poor connections. However, this method requires all clients to have similar network conditions, because a client with higher latency than the rest would act as a bottleneck. Moreover, global consistency is also hard to maintain when the number of connected clients increases.
Adaptive event synchronization (AES)
AES takes the fluctuating conditions of the network into account and combines delay and packet loss to provide information that helps determine the playout delay. The playout delay is a controllable delay added in each client so that all of them have the same end-to-end delay (Fig. 33), attaining visualization of the events at the same time. This playout delay is calculated by selecting a main client that takes the maximum delay among the different clients, estimates the jitter based on previously known measurements, monitors the packet loss, and shares all this information with the rest of the clients. Then, the clients determine how long the events will be kept in a buffer before being transmitted or executed in sync. The usage of that buffer is the main difference between AES and LL. The idea of using both the delay and the loss is to solve isolated, as well as long-term, inconsistencies. An example of the use of AES can be found in [159]. As network characteristics are used to establish the sync parameters, users will likely be able to interact in a consistent virtual world. Nevertheless, as one of the clients is elected to perform the sync computations, it needs enough computational resources to work properly. Furthermore, AES also adds control messages that reduce the available network throughput.
Time Warp Synchronization (TWS)
In TWS, the event-related messages exchanged in the NVE have four fields: the name of the sender, the name of the receiver, and the virtual sending and receiving timestamps [160], which are filled in by the clients and used to synchronize the update messages. The sender and receiver clocks should be synchronized beforehand, so that the timestamps indicate the correct relationships between events. The update messages are processed by the clients as they arrive. If an update message arrives containing information about an event timestamped before the event being processed, a rollback is made by executing the older event and then the following ones in order. As shown in Fig. 34, when an event (3) is received late (a), a rollback is performed to return to the state corresponding to an execution moment before that event happened, and then all the received events are processed in order (b). To avoid inconsistency problems caused by update messages that the client sent before the late update message was received, that client informs the rest of the clients about the rollback and that some of its previously sent update messages could represent an incorrect state of the NVE. An example of TWS can be found in [97]. TWS allows the NVE to have high responsiveness when there are enough throughput and processing resources. If these cannot be guaranteed, TWS should be used only when rollbacks do not occur often, because they can be highly annoying for the users. The main disadvantage of TWS is the need for high memory capacity, as copies of the processed NVE messages must be stored.
Breathing Time Warp (BTW)
BTW is a combination of TWS and BTB and addresses the issues each of them can experience [161]. First, it starts with a TWS phase, where events are treated in an optimistic way up to a chosen time delay (lookahead), performing rollbacks and informing about them if inconsistencies happen. When a specified time passes, BTW moves into a BTB phase, until a specific number of events are processed, and rollbacks are stored to be sent later. After this, it goes back to the TWS phase and repeats the processing cycle. In Fig. 35, the two delayed update messages in the TWS phase lead to two rollbacks, each happening when a late update message is received. Later, in the BTB phase, the two late update messages received in bucket ‘b’ generate a single rollback to bucket ‘a’. Examples of the use of BTW can be found in [161], [162]. The problem of BTW is that the time between cycles can affect the consistency in scenarios where the number of messages varies dynamically. To solve this problem, in SafeBTW [162] the time between these two phases is changed dynamically, adapting to the network and to the number of received messages. Generally, BTW can provide better consistency than TWS without sacrificing as much responsiveness as BTB.
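The optimistic rollback mechanism shared by TWS, and by the TWS phase of BTW, can be sketched as follows. This is a minimal, illustrative Python model (real systems would restore a state snapshot rather than replay the full event list, and would also notify the other clients about the rollback):

```python
class TimeWarpSimulation:
    """Minimal time-warp sketch: events carry virtual timestamps; a late
    arrival triggers a rollback that replays all events in timestamp order."""

    def __init__(self):
        self.events = []      # processed events, kept sorted by timestamp
        self.state = []       # order in which events were applied to the state
        self.rollbacks = 0

    def receive(self, timestamp, payload):
        late = bool(self.events) and timestamp < self.events[-1][0]
        self.events.append((timestamp, payload))
        if late:
            # Rollback: return to a state prior to the late event and
            # reprocess every event in the correct virtual-time order.
            self.rollbacks += 1
            self.events.sort()
            self.state = [p for _, p in self.events]
        else:
            self.state.append(payload)

sim = TimeWarpSimulation()
for ts, ev in [(1, "move"), (2, "shoot"), (4, "jump"), (3, "reload")]:  # 3 is late
    sim.receive(ts, ev)
print(sim.state, sim.rollbacks)  # → ['move', 'shoot', 'reload', 'jump'] 1
```

The memory cost the text mentions is visible here: the whole event history must be kept so that a rollback can reconstruct a consistent state.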
Fig. 32. Example of AS. The clients inside a SoI receive the update message.
Fig. 33. AES. The latency estimated from the loss and delay is used to calculate the end-to-end delay needed to equalize the timings of all clients.
Fig. 34. Rollback process in Time Warp.
Fig. 35. Example of BTW (TWS and BTB combination).
Trailing state synchronization (TSS)
TSS is like TWS but, in this case, a series of consistent copies of the NVE at previous execution times are stored [163]. In parallel to the main simulation of the NVE, delayed copies of it (i.e., copies of the NVE at a different virtual time) are being executed. When a new event arrives, if it precedes the current visible state, a rollback process takes place. As there are several delayed copies of the NVE running, it is probable that one of them is in a state prior to the instant the late update message was sent, so there will not be any inconsistency in that copy of the NVE. In that case, that copy of the NVE is turned into the main version, executing all the received updates in it in the correct order. This is represented in Fig. 36, in which there is a main processing copy of the NVE and three delayed copies (trailing state copies) of the NVE (a). When late update message 6 arrives, a kind of rollback is forced. The copy stored prior to the instant the event in the late message should have happened is restored as the main processing copy, and the NVE continues by reprocessing all the ordered received events since the time it was stored. Although several copies of the NVE are simultaneously running, only the main one is visible to the users. An example can be found in [163]. In comparison to TWS, TSS provides better responsiveness, and the rollback process is improved, making it suitable for fast-paced NVEs, such as first-person shooter (FPS) MOGs. Nevertheless, it requires high memory capacity and processing resources to maintain all the needed copies of the NVE.
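The copy-selection step just described can be sketched in a few lines of Python. The names and the event-log representation are illustrative assumptions; real TSS implementations keep full world-state replicas, not just logs:

```python
def pick_trailing_copy(now, lags, late_ts):
    """Return the lag of the least-delayed trailing copy whose virtual time
    (now - lag) precedes the late event, so that copy is still consistent."""
    for lag in sorted(lags):
        if now - lag < late_ts:
            return lag
    return None  # even the most delayed copy already executed past the event

def recover(now, lags, log, late_event):
    """Promote that copy to be the main one and replay the event log from
    its virtual time onwards, in timestamp order."""
    lag = pick_trailing_copy(now, lags, late_event[0])
    if lag is None:
        return None
    base_time = now - lag
    replay = sorted(e for e in log + [late_event] if e[0] > base_time)
    return base_time, replay

# Main copy is at t=9; trailing copies run 2, 4 and 8 units behind.
# An update timestamped 7 arrives late: the copy at t=5 is promoted
# and the events after t=5 are replayed in order, including the late one.
print(recover(now=9, lags=(2, 4, 8), log=[(5, "a"), (8, "b"), (9, "c")],
              late_event=(7, "x")))  # → (5, [(7, 'x'), (8, 'b'), (9, 'c')])
```

Compared with the time-warp rollback, only the events after the promoted copy’s virtual time are reprocessed, which is why TSS offers better responsiveness at the cost of running several copies at once.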
Event Correlation Synchronization (ECS)
ECS is like TWS and is based on event correlation algorithms. Some events are time-related (event correlation) and some are not (non-event correlation) [164]. Correlation means that one event depends on another, so that one can happen after the other, but not vice versa. When a late update message arrives, instead of rolling back as in TWS, the correlation of this event with the already processed (stored) events is first checked to decide what the best choice is. If no correlation is found, the late event can be processed without rolling back and will not lead to inconsistency. Otherwise, the rollback process will take place. As shown in Fig. 37, Client 1 sends update messages to Client 2, and update message 2 is received after update message 3 by Client 2, e.g., due to fluctuations in network latency. If the events in messages 2 and 3 are correlated, so that event 2 must happen before event 3, a rollback will take place. Otherwise, update message 2 will be processed without the need for a rollback. Moreover, if update message 3 already updated the same state that update message 2 would (e.g., both changed an avatar’s position), the late update message 2 will not be executed, as there is already a more recently updated state from update message 3. An example of ECS can be found in [164]. ECS can reduce the number of rollbacks, improving responsiveness and interactivity, lowering network usage and avoiding inconsistencies. Nevertheless, it has high memory capacity and processing resource requirements to perform the correlation computation.
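The decision made for each late event can be sketched as below. The `depends_on` predicate is an assumed, application-specific correlation rule, and the door/walk events are hypothetical examples (not taken from [164]):

```python
def handle_late_event(late, processed, depends_on):
    """ECS sketch: depends_on(a, b) says event `a` must precede event `b`.
    A late event only forces a rollback when it is correlated with an
    already-processed event; independent events are simply applied."""
    if any(depends_on(late, done) for done in processed):
        return "rollback"   # order mattered: reprocess as in TWS
    return "apply"          # independent event: execute it without rollback

# Hypothetical rule: an avatar must open a door before walking through it.
rule = lambda a, b: (a, b) == ("open_door", "walk_through")
print(handle_late_event("open_door", ["walk_through"], rule))  # → rollback
print(handle_late_event("pick_item", ["walk_through"], rule))  # → apply
```

The saving is exactly the second case: an uncorrelated late event is applied directly, avoiding the rollback that TWS would always perform.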
Optimistic Obsolescence-based Synchronization (OOS)
In OOS, besides applying TWS and ECS, event obsolescence is also considered. If an event arrives after it has already been overridden by a newer one, it is discarded [165], [166]. Given two update messages, i and j, j makes i obsolete if processing j (generated after i) without i achieves the same final state that would be reached if both events were processed in the correct order. The obsolete events are discarded, so there is no need to check for inconsistencies and no rollback will happen. For instance, i could be an update message that changed the position of an entity, while j could be an update message that destroyed this entity (eliminating it from the virtual world); the older position update therefore becomes unnecessary. Examples of the use of OOS can be found in [167], [168]. As in TWS, in OOS a list of the processed events is also maintained, and responsiveness is improved. In OOS, however, the number of rollback actions is greatly decreased, reducing the users’ annoyance and improving interactivity and users’ QoE. Nevertheless, like ECS, it also requires high memory capacity and processing resources to execute the required processes.
Comparison
Table 6 presents a summary of the sync techniques presented in this section, including their advantages and disadvantages, as well as some examples of NVEs using them. In the future, newer and more advanced solutions could apply artificial intelligence (AI) or machine learning (ML) techniques to model and predict things like the occurrence of events, the network conditions, or user behavior, so that the synchronization can be adapted [169]. For example, in [170], AI techniques are used to optimize the scheduling of events and the performance of the synchronization of wireless sensor devices. So, AI-based NVE sync solutions may appear soon.
Fig. 36. Example of TSS. In a), 3 TS copies are recorded, holding the changes from the preceding received update messages. In b), a rollback moves the execution point to the last TS copy that did not miss any update message.
Fig. 37. Example of the ECS.
Summary of synchronization techniques (advantages / disadvantages / examples):
• Local lag (LL). Advantages: easy consistency control; reduced network usage. Disadvantages: responsiveness problems; robustness problems. Examples: Khan et al. [69].
• Dynamic local lag (DLL). Advantages: easy consistency control. Disadvantages: control messages add to network usage. Examples: Huang et al. [147].
• Adaptive ∆-causality. Advantages: easy consistency and responsiveness control. Disadvantages: requires low network delay. Examples: Huang et al. [147].
• Bucket synchronization (BS). Advantages: low sync overhead. Disadvantages: reduced responsiveness. Examples: MiMaze [11].
• Breathing time buckets (BTB). Advantages: better responsiveness; reduced network usage. Disadvantages: hard to keep global consistency. Examples: Damitio et al. [155].
• Lockstep synchronization (LS). Advantages: ensured consistency; easy to implement. Disadvantages: not suited for real-time applications; not suitable if high responsiveness is needed. Examples: RING [26]; Chen et al. [156].
• Asynchronous synchronization (AS). Advantages: resilience to cheaters; good local consistency. Disadvantages: hard to keep global consistency; not suitable for fast-paced NVEs. Examples: Baughman et al. [158].
• Adaptive event synchronization (AES). Advantages: good local consistency and responsiveness. Disadvantages: high computation resources needed; control messages increase network use. Examples: Kim et al. [159].
• Time warp synchronization (TWS). Advantages: good responsiveness. Disadvantages: increased network usage; high memory capacity needs; rollbacks. Examples: Nguyen et al. [97].
• Breathing time warp (BTW). Advantages: good consistency and responsiveness. Disadvantages: complex to implement; high computation resources needed. Examples: Steinman et al. [161]; SafeBTW [162].
• Trailing state synchronization (TSS). Advantages: good responsiveness (better than TWS); suited for real-time applications. Disadvantages: high computing resources needed; high memory capacity needs; rollbacks. Examples: Cronin et al. [163].
• Event correlation synchronization (ECS). Advantages: good responsiveness (better than TSS); decreased network usage; reduced number of rollbacks. Disadvantages: high computing resources needed; high memory capacity needs; rollbacks. Examples: Bin Shi et al. [164].
• Optimistic obsolescence-based synchronization (OOS). Advantages: high responsiveness and good consistency. Disadvantages: high computing resources needed; high memory capacity needs; rollbacks. Examples: Ferretti et al. [167], [168].
Table 6. Sync techniques in NVEs.
IX. COMPUTING MODELS
Depending on the employed network architecture, the data, the tasks, and the computation needed by each of the interconnected nodes in an NVE can be managed in different ways, optimizing the delivery of information and the performance of tasks, such as the rendering of the 3D virtual world. The rendering of the virtual world, other intensive tasks (e.g., the complex behavior of an entity), and the required storage for the NVE can be delegated to remote nodes, which provide the results and information needed for updating the virtual world and representing it in the client. Those helper nodes may be closer to or farther from the client (e.g., in Edge nodes or in the Cloud, respectively), in another peer or in the same house, and could be serving a single user or multiple ones (e.g., rendering frames for a group of users). These techniques also allow NVE designers to provide a service-oriented solution, where access to products, programs and other components is offered as a service (e.g., a subscription to use an application temporarily) instead of the traditional on-premises approach. So, clients delegate part of (or all) their computational requirements and roles to third parties. In the Cloud Computing area, this is known as *aaS (“Something” as a Service). Examples are SaaS (Software as a Service), PaaS (Platform as a Service) and GaaS (Games as a Service). In the NVE scope, these techniques mitigate the problems that lightweight end-user clients (e.g., computers with low processing capabilities, or smartphones with low storage capacity) experience. Furthermore, these techniques are still acceptable for the rest of the clients if the downsides they present are not severe, allowing companies to offer this business model to all possible clients.
In this section, several techniques to manage the computing and processing requirements for clients in NVEs are explained, such as Remote Rendering, Adaptive Streaming, Foveated Imaging, Memoization, and Progressive Downloading.
A. Remote Rendering
Remote Rendering is based on using other computers to render the contents of the NVE. The resulting audio and video streams are delivered through a network connection to the clients [92]. As users interact with the NVE, their clients send the input interactions to the rendering computer, which executes the necessary processes and returns the results or needed information. To allow a good level of interactivity, the latency between the clients and that computer must be low, or responsiveness will be impaired. This becomes a troublesome task when the quality of the images or video of the NVE is very high and the available throughput is very low. To tackle this problem, besides predictive modelling and sync, Adaptive Streaming and Memoization techniques (explained later) can be used. There are two main reasons to use this model: 1) it allows lightweight clients to still run and interact with NVEs that have high computation requirements, also saving money with a Cloud-based architecture [91]; and 2) it avoids the need to store the entire NVE program (as it runs on another computer), hence saving storage capacity on the client. Two additional benefits come with Remote Rendering: 1) it is platform-independent, since the NVE must only be developed for the server that is going to execute it, and the same rendered frames can be transmitted to all kinds of clients and platforms (Android, Linux, Windows, PlayStation, Xbox, etc.) without restrictions; and 2) there is only one copy of the NVE, making it easier to maintain and update. There are many examples that employ Remote Rendering. Examples of Cloud Gaming platforms are Sora Stream [171], PlayKey [172], GeForce Now [173] and Google Stadia [174]. These platforms offer games on demand for a monthly fee, all rendered in their Cloud and delivered to the clients.
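The interaction loop behind this model can be sketched as follows. This is a toy illustration with hypothetical names, in which the server owns the world state and a string stands in for an encoded video frame:

```python
class RemoteRenderer:
    """Toy remote-rendering sketch: the server owns the world state,
    applies each client input, 'renders' a frame and streams it back."""

    def __init__(self):
        self.avatar_x = 0

    def handle_input(self, client_input):
        # Apply the client's input to the authoritative world state.
        if client_input == "move_right":
            self.avatar_x += 1
        return self.render()

    def render(self):
        # A real system would produce an encoded video frame here.
        return f"frame(avatar_x={self.avatar_x})"

# A lightweight client just forwards inputs and displays the returned frames;
# every round trip adds the client-server latency discussed above.
server = RemoteRenderer()
for key in ["move_right", "move_right"]:
    frame = server.handle_input(key)
print(frame)  # → frame(avatar_x=2)
```

The round trip per input is exactly why low latency is critical: the user sees no effect of an action until the rendered frame travels back.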
Parsec [175] and Steam Remote Play [176] allow users to store games on their own computer and stream them to other devices (e.g., a TV over the LAN). MUVR [76] and CloudyGame [96] allow mobile clients (e.g., smartphones) to play games using Edge computers, which render the frames instead of a farther Cloud that would increase the delay. Finally, DROVA [177] and Vectordash [178] are distributed solutions, a.k.a. P2P Cloudless Gaming, which, instead of using a Cloud, depend on a decentralized network that balances the required rendering and computation load. This way, users can have a nearby available computer to manage the NVE load, instead of using a Cloud-based service, improving responsiveness.
B. Adaptive Streaming
As high-quality rendered images (frames) require a high throughput, adaptive video streaming methods can be applied to reduce throughput usage. The quality of the rendered frames must be the optimum for the available throughput (which can change dynamically depending on the fluctuation of the network conditions) [179]. With Adaptive Streaming, when the available throughput decreases, the client receives lower-quality frames instead of getting them delayed or getting disconnected from the NVE session. When the available throughput increases, higher-quality frames can be rendered again and transmitted. Examples of the use of Adaptive Streaming can be found in [180]–[182]. With Adaptive Streaming, the network usage is optimized, reducing congestion and possible packet loss. Nonetheless, if the available throughput is too low, the reduced quality of the received frames can provide the users with a bad QoE. It is also important to note that these frames are generated and transmitted in real time instead of being stored beforehand. Therefore, an algorithm is needed to decide the quality of the frames to be transmitted at each moment.
C. Foveated Imaging
When using Adaptive Streaming, users can perceive a bad QoE when the throughput is too low. To solve this, Foveated Imaging defines a region of interest (RoI) in the viewport of the user, so that the quality of the RoI in each transmitted frame is higher than in the rest of the frame. It improves the users’ QoE by reducing the quality of the parts of the rendered frame the user is not paying attention to, as shown in Fig. 38. Examples of the use of Foveated Imaging can be found in [183], [184]. In the virtual reality (VR) world, this is called foveated rendering [183]. As users wearing an HMD can move the head, and hence the viewport, more freely and easily than without one, the NVE should be able to send newer frames faster, updating to the new perspective and reducing the motion-to-photon latency. The advantage of Foveated Imaging is that the amount of transmitted information can be significantly reduced without affecting the users’ QoE too much, improving the network usage and the experienced latency. Nonetheless, this comes with a higher computation demand.
D. Memoization
Since, during an NVE session, there can be redundancy between the content of frames rendered for different clients (e.g., users whose avatars move in the same environment view similar backgrounds), with Memoization the rendered frames are cached and reused when needed. It consists of storing rendered frames, or the results of other long computations, to use them again when possible [76]. This way, the time needed for processing and the computing resources required are decreased, reducing delays and optimizing the performance of remote rendering. An example of the use of Memoization can be found in MUVR [76], where it is used in an Edge-based architecture to reduce delays and rendering requirements for mobile clients. Every time a frame is requested from a specific position in the virtual environment (usually from the head of the user’s avatar), it is checked whether there is a stored frame recorded from a similar position. If not, the frame is rendered and then stored along with the position and orientation it was viewed from, for possible future use. If cached frames from a similar position exist, the NVE combines them to generate a new frame representing what the user should see from that perspective. This is called image-based rendering (IBR) [92], and it consists of combining 2D images to simulate and render 3D points of view at different positions in the virtual world. The NVE can also reuse frames that were rendered for some clients to render new ones for other clients (e.g., a Cloud NVE rendering the frames of several users participating in the same session). So, a system rendering the NVE session for multiple clients can take advantage of that redundancy too, reducing computing needs. Fig. 39 illustrates Memoization. In a) and b), two different points of view of the same entity are employed to store two new frames generated by 3D rendering.
In c), the two frames are used to generate a new frame from a different point of view with IBR, avoiding the need for 3D rendering. As IBR requires less computing power, clients with low processing capabilities can store and use cached frames to render new frames, reducing the network usage and the latency, but needing more memory capacity to store them. However, the frames can become obsolete as time passes or when the virtual environment changes (e.g., entities move), forcing new frames to be rendered. Furthermore, if the number of cached frames is high, a node may choose to delete less-used or older ones to save storage space.
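A minimal sketch of the caching side of Memoization is shown below. It simplifies the idea to reusing a cached frame for sufficiently close viewpoints (instead of combining nearby frames with IBR), and all names and quantization steps are illustrative assumptions:

```python
import functools

# Memoization sketch: quantize the requested viewpoint so nearby positions
# share a cache entry; only a cache miss triggers the expensive 3D render.
renders = 0

@functools.lru_cache(maxsize=256)          # evicts least-recently-used frames
def cached_frame(qx, qy, qyaw):
    global renders
    renders += 1                            # stands in for a costly 3D render
    return f"frame@({qx},{qy},yaw={qyaw})"

def frame_for(x, y, yaw, step=0.5, ang=15):
    # Viewpoints within `step` units / `ang` degrees reuse the same frame.
    return cached_frame(round(x / step), round(y / step), round(yaw / ang))

frame_for(1.02, 2.01, 44)   # cache miss: rendered
frame_for(0.98, 1.99, 46)   # quantizes to the same key: cache hit
print(renders)  # → 1
```

The `maxsize` bound mirrors the eviction policy mentioned above: when the cache fills up, the least recently used frames are dropped to save storage.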
Fig. 38. Foveated rendered image.
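The RoI-based quality assignment of Foveated Imaging illustrated in Fig. 38 can be sketched per tile of the frame. The radius, the falloff curve and the quality floor below are illustrative assumptions, not values from [183] or [184]:

```python
import math

def tile_quality(tile_center, gaze, roi_radius=2.0):
    """Foveated-imaging sketch: tiles near the user's gaze (the RoI) are
    encoded at full quality; quality drops with distance from the fovea."""
    d = math.dist(tile_center, gaze)
    if d <= roi_radius:
        return 1.0                      # full quality inside the RoI
    return max(0.2, roi_radius / d)     # gracefully degraded periphery

# A 1D strip of tiles with the gaze on tile 5: quality falls off
# toward the edges of the viewport, saving bits in the periphery.
print([round(tile_quality((x, 0), (5, 0)), 2) for x in range(0, 10, 3)])
# → [0.4, 1.0, 1.0, 0.5]
```

An encoder would then spend its bit budget according to these per-tile weights, which is where the network savings come from.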
E. Progressive Download
Progressive Download consists of allowing clients to execute the NVE even though not all of its content is available yet [185]. This means that the NVE application can run while downloading its remaining content (Fig. 40). Examples that use Progressive Download can be found in SuperStreamer [185] and Utomik [186]. Thanks to Progressive Download, users start experiencing the NVE while the complete version keeps downloading. Recent videogames with high-definition graphics already surpass 100 GB of storage (e.g., Destiny 2 is 105 GB [187] and Call of Duty: Modern Warfare is 175 GB [188]). Furthermore, on clients with low storage capacity, Progressive Download allows loading just the information required at each moment (e.g., a virtual zone and its contents). Progressive Download is also practical for highly customizable and continuously changing NVEs, like Second Life [40], Mozilla Hubs [44] and Decentraland [46], where the parts of the environment are loaded on demand, as the user requires them. Also following Progressive Download, Blizzard’s Battle.net [189] launcher has three stages for a game download. In the first one, a predefined percentage of the game must be downloaded to be able to run it with reduced performance and graphics. In the second stage, after a subsequent amount has been downloaded, the game can be executed without stability issues (i.e., without the frames per second fluctuating), but with some content missing. Finally, the third stage comes when the game is fully downloaded. Moreover, the well-known videogame platforms PS4 and Xbox One have the features Play as you Download [190] and Ready to Start [191], respectively, which allow their users to play small segments of a game while the rest of it is being downloaded.
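A staged scheme like the one just described can be sketched as a simple threshold check. The fractions and stage names below are illustrative assumptions, not Battle.net’s actual values:

```python
def playable_stage(downloaded_mb, total_mb, thresholds=(0.1, 0.4, 1.0)):
    """Progressive-download sketch: map the downloaded fraction of the
    game to a playability stage (thresholds are illustrative)."""
    stages = ["not playable", "reduced performance", "some content missing",
              "fully playable"]
    fraction = downloaded_mb / total_mb
    stage = 0
    for threshold in thresholds:
        if fraction >= threshold:
            stage += 1
    return stages[stage]

print(playable_stage(12_000, 100_000))   # 12% downloaded → reduced performance
print(playable_stage(100_000, 100_000))  # → fully playable
```

A launcher would re-evaluate this check as the download progresses, unlocking features as each stage is reached.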
Additionally, the Xbox One has another functionality, called FastStart [191], that determines which resources of the game are needed earliest, so they can be downloaded first, allowing a user to run the game before it is fully downloaded.
F. Comparison
Table 7 summarizes the described techniques for optimizing the NVE requirements, including their advantages and disadvantages, as well as examples of NVEs using them. For the remote rendering of stereoscopic vision (e.g., VR on HMDs), there is no standardized image-compression solution yet that considers the redundancy existing between the two frames rendered for the two eyes [76]. These frames, being rendered at the same moment, are oftentimes quite similar, and their transmission could take advantage of a streaming technique that reduces that extra network usage when remote rendering for VR, improving current compression solutions such as Adaptive Streaming and Foveated Imaging. Other works that also study this redundancy between eyes and between different clients are DeltaVR [192] and Coterie [193]. Moreover, in [194], there is a study on modeling the viewpoint (or gaze) of users in VR, which could be useful to improve the performance of Foveated Imaging. Additionally, IBR can also be used for local rendering, to save computation resources that could be used by other techniques; the same happens with Memoization, which can also store the results of other computationally intensive tasks (e.g., those of a prediction algorithm), reducing the delay added by computing processes. Unfortunately, techniques applied specifically to NVEs, for the Computing Models component, could not be found.
Summary of computing models (advantages / disadvantages / examples):
• Remote rendering. Advantages: easy updates for all clients; reduces computing and storage requirements of the end clients. Disadvantages: high bitrate requirements; sensitive to delays. Examples: MUVR [76]; CloudyGame [96]; Sora Stream [171]; PlayKey [172]; GeForce Now [173]; Stadia [174]; Parsec [175]; Remote Play [176]; DROVA [177]; Vectordash [178].
• Adaptive streaming. Advantages: reduces network usage. Disadvantages: can worsen the QoE. Examples: Rhee et al. [180]; Hong et al. [181]; Wang et al. [182].
• Foveated imaging. Advantages: reduces network usage; low impact on QoE. Disadvantages: increases computation requirements. Examples: Illahi et al. [183]; Ahmadi et al. [184].
• Memoization. Advantages: reduced overload of the Cloud; lower requirements for clients. Disadvantages: not suited for dynamic worlds; balance needed between storage and computing. Examples: MUVR [76].
• Progressive downloading. Advantages: data easy to manage by the NVE owner; reduces storage needs; reduces waiting times. Disadvantages: increases NVE complexity; data downloading decreases the available throughput employed for interaction. Examples: SuperStreamer [185]; Utomik [186]; Second Life [40]; Mozilla Hubs [44]; Decentraland [46]; Battle.net [189]; PS4 [190]; Xbox One [191].
Table 7. Computing models in NVEs.
Fig. 39. Two stored frames used to create a frame from a new point of view.
Fig. 40. Progressive Download when the avatar is moving.
Furthermore, upscaling and sharpening techniques have recently been used in videogames to improve the graphic quality of the rendered images, by increasing the image resolution and detail without much increase in computation cost, the most notable example being NVIDIA DLSS (Deep Learning Super Sampling) [195]. The problem with this technology is that it is applied during the rendering process, meaning that remotely rendered images get upscaled before being transmitted to the client, instead of sending the downscaled version and letting the client, or a node closer than the Cloud, upscale the frame, which would improve network usage. Overall, more techniques in these fields are expected to be developed and applied to NVEs alongside the increasing requirements of more modern NVEs.
X. CONCLUSIONS
This paper provides a big picture of the main technologies employed for designing NVEs. First, some NVE-related background information has been provided and the most important factors to be considered in NVEs have been described, such as consistency, responsiveness, concurrency, and synchronization, among others, needed to guarantee the overall satisfaction of the users (i.e., a good QoE). Then, an up-to-date review and compilation of the most important network architectures, computing models, data distribution models, and techniques for resource balancing, predictive modeling, and synchronization used in NVEs has been presented. These models and techniques have been revised, compared, and classified, also pointing out the diverse fields that may be of interest to explore further. Since these techniques manage the aforementioned important factors in an NVE, a classification into different components was required. This classification is based on the nature of the techniques, making it simpler to extract the relationships between them and to choose the most appropriate ones for each NVE to be designed. A novel taxonomy has been provided as an assistance tool for the study of NVE techniques and for classifying new techniques that may appear in the future. This paper is intended to serve as a starting point for future investigations in the NVE field, as well as a useful tool for future NVE developments. More research is needed in this field to improve the users' QoE, as it is continuously growing with new and better techniques. Some of the less explored areas in NVEs (e.g., prediction and data compression) could be further investigated to find more advanced and useful techniques for future developments. Perhaps those techniques will be easily accommodated in the proposed taxonomy, or it will have to be modified or extended in the future.

REFERENCES

[1] C. J. Bouras, E. Giannaka, and T. Tsiatsos, "Networked Virtual Environments," in Gaming and Simulations, IGI Global, 2011, p. 7. [2] K. Hosoya, Y. Ishibashi, S. Sugawara, and K. E. Psannis, "QoE Assessment of Group Synchronization Control in Distributed Virtual Environments with Avatars,"
IEICE Tech. Rep. , vol. 108, no. 137, pp. 27–32, Jun. 2008, doi: 10.1109/DS-RT.2008.12. [3] M. Peruzzini, M. Mengoni, and M. Germani, “The Impact of Virtual Environments on Humans Collaboration in Product Design,” in , 2009, pp. 57–68. The availability of the references has been checked on 26 January 2021. [4] A. Soares Pereira and S. Dutra Piovesan, “Virtual Reality Applied in Distance Education,” in
Distance education , 2012, pp. 81–98. [5] E. Buyukkaya, M. Abdallah, and G. Simon, “A survey of peer-to-peer overlay approaches for networked virtual environments,”
Peer-to-Peer Netw. Appl. , vol. 8, no. 2, pp. 276–300, 2015, doi: 10.1007/s12083-013-0231-5. [6] D. Roth, C. Kleinbeck, T. Feigl, C. Mutschler, and M. E. Latoschik, “Social Augmentations in Multi-User Virtual Reality: A Virtual Museum Experience,”
Adjunct Proceedings of the 2017 IEEE International Symposium on Mixed and Augmented Reality, ISMAR-Adjunct 2017 . IEEE, pp. 42–43, 2017, doi: 10.1109/ISMAR-Adjunct.2017.28. [7] D. Checa and A. Bustillo, “A review of immersive virtual reality serious games to enhance learning and training,”
Multimed. Tools Appl. , vol. 79, no. 9, pp. 5501–5527, 2019, doi: 10.1007/s11042-019-08348-9. [8] A. de Regt and S. J. Barnes, “Multi-User Virtual Reality Technology as Means to Engage Global Consumers: An Abstract,” in
Academy of Marketing Science World Marketing Congress , 2019, pp. 945–946, doi: 10.1007/978-3-030-02568-7_269. [9] M. Alcañiz, E. Bigné, and J. Guixeres, “Virtual reality in marketing: A framework, review, and research agenda,”
Front. Psychol. , vol. 10, no. JULY, p. 1530, 2019, doi: 10.3389/fpsyg.2019.01530. [10] S. S. Sabet, S. Schmidt, S. Zadtootaghaj, C. Griwodz, and S. Möller, “Delay sensitivity classification of cloud gaming content,” in , 2020, pp. 25–30, doi: 10.1145/3386293.3397116. [11] L. Gautier and C. Diot, “Design and evaluation of MiMaze a multi-player game on the Internet,” in
International Conference on Multimedia Computing and Systems , 1998, pp. 233–236, doi: 10.1109/mmcs.1998.693647. [12] “City of Heroes.” [Online]. Available: http://cityofheroes.ca/. [13] “Kingspray.” [Online]. Available: http://infectiousape.com/. [14] J. Müller and S. Gorlatch, “Rokkatan: Scaling an RTS game design to the massively multiplayer realm,”
Comput. Entertain. , vol. 4, no. 3, p. 11, 2006, doi: 10.1145/1146816.1146833. [15] “World of Warcraft.” [Online]. Available: https://worldofwarcraft.com/. [16] J. Calvin, A. Dickens, B. Gaines, P. Metzger, D. Miller, and D. Owen, “The SIMNET virtual world architecture,” in
Virtual Reality Annual International Symposium , 1993, pp. 450–455, doi: 10.1109/vrais.1993.380745. [17] S. Touel, M. Mekkadem, M. Kenoui, and S. Benbelkacem, “Collocated learning experience within collaborative augmented environment (anatomy course),” in
ACM Trans. Access. Comput.
Future Technologies Conference , 2018, pp. 962–981, doi: 10.1007/978-3-030-02686-8_72. [24] M. Capps, D. McGregor, D. Brutzman, and M. Zyda, “NPSNET-V: A new beginning for dynamically extensible virtual environments,”
IEEE Comput. Graph. Appl. , vol. 20, no. 5, pp. 12–15, 2000, doi: 10.1109/38.865873. [25] F. W. B. Li, R. W. H. Lau, and F. F. C. Ng, “Collaborative distributed virtual sculpting,” in
Virtual Reality Annual International Symposium , 2001, pp. 217–224, doi: 10.1109/vr.2001.913789.
[26] T. A. Funkhouser, "RING: A client-server system for multi-user virtual environments," in
Symposium on Interactive 3D Graphics , 1995, pp. 85-ff, doi: 10.1145/199404.199418. [27] C. Greenhalgh and S. Benford, “MASSIVE: A Collaborative Virtual Environment for Teleconferencing,”
Trans. Comput. Interact. , vol. 2, no. 3, pp. 239–261, 1995, doi: 10.1145/210079.210088. [28] C. Carlsson and O. Hagsand, “DIVE a multi-user virtual reality systems,” in
Virtual Reality Annual International Symposium , 1993, pp. 394–400, doi: 10.1109/vrais.1993.380753. [29] S. Jourdain et al. , “ShareX3D, a scientific collaborative 3D viewer over HTTP,” in , 2008, pp. 35–41, doi: 10.1145/1394209.1394220. [30] “CAVRNUS.” [Online]. Available: https://cavrn.us/. [31] “Spatial.” [Online]. Available: https://spatial.io/. [32] Yeongho Kim et al. , “Collaborative surgical simulation over the Internet,”
IEEE Internet Comput. , vol. 5, no. 3, pp. 65–73, May 2001, doi: 10.1109/4236.935179. [33] “The Wild.” [Online]. Available: https://thewild.com/. [34] F. Dupont et al. , “Collaborative Scientific Visualization: The COLLAVIZ Framework,” in
Joint Virtual Reality Conference of EGVE- EuroVR - VEC , 2010. [35] H. Song, F. Chen, Q. Peng, J. Zhang, and P. Gu, “Improvement of user experience using virtual reality in open-architecture product design,”
Inst. Mech. Eng. Part B J. Eng. Manuf. et al. , “Diamond Park and Spline: Social virtual reality with 3D animation, spoken interaction, and runtime extendability,”
Presence Teleoperators Virtual Environ. , vol. 6, no. 4, pp. 461–481, Aug. 1997, doi: 10.1162/pres.1997.6.4.461. [43] “Virtual Real Meeting.” [Online]. Available: https://jansen.itch.io/vr-meeting. [44] “Mozilla Hubs.” [Online]. Available: https://hubs.mozilla.com/. [45] S. Herscher et al. , “CAVRN: An Exploration and Evaluation of a Collective Audience Virtual Reality Nexus Experience,” in
IEEE Trans. Vis. Comput. Graph. , vol. 25, no. 11, pp. 3178–3189, 2019, doi: 10.1109/TVCG.2019.2932173. [50] W. Park, H. Heo, S. Park, and J. Kim, “A Study on the Presence of Immersive User Interface in Collaborative Virtual Environments Application,”
Symmetry (Basel). , vol. 11, no. 4, p. 476, 2019, doi: 10.3390/sym11040476. [51] D. Delaney, T. Ward, and S. McLoone, “On consistency and network latency in distributed interactive applications: A survey-part I,”
Presence Teleoperators Virtual Environ. , vol. 15, no. 2, pp. 218–234, Aug. 2006, doi: 10.1162/pres.2006.15.2.218. [52] D. Delaney, T. Ward, and S. McLoone, “On consistency and network latency in distributed interactive applications: A survey-part II,”
Presence Teleoperators Virtual Environ. , vol. 15, no. 4, pp. 465–482, Aug. 2006, doi: 10.1162/pres.15.4.465. [53] L. Gautier, C. Diot, and J. Kurose, “End-to-end transmission control mechanisms for multiparty interactive applications on the Internet,” in , 1999, vol. 3, pp. 1470–1479, doi: 10.1109/INFCOM.1999.752168. [54] T. Manninen, “Interaction in networked virtual environments as communicative action: Social theory and multi-player games,” in , 2000, pp. 154–157, doi: 10.1109/CRIWG.2000.885173. [55] W. Lamotte, P. Quax, and E. Flerackers, “Large-scale networked virtual environments: Architecture and applications,”
Campus-Wide Inf. Syst. , vol. 25, no. 5, pp. 329–341, Nov. 2008, doi: 10.1108/10650740810921475. [56] J. R. Jiang, Y. L. Huang, and S. Y. Hu, “Scalable AOI-cast for peer-to-peer networked virtual environments,” in , 2008, pp. 447–452, doi: 10.1109/ICDCS.Workshops.2008.80. [57] W. Zhang, H. Zhou, Y. Peng, and S. Li, “Providing responsiveness requirement based consistency in DVE,” in , 2009, pp. 594–601, doi: 10.1109/ICPADS.2009.52. [58] J. L. Miller, “Distributed virtual environment scalability and security,” University of Cambridge, 2011. [59] M. Montagud, F. Boronat, H. Stokking, and R. Van Brandenburg, “Inter-destination multimedia synchronization: Schemes, use cases and standardization,”
Multimed. Syst. , vol. 18, no. 6, pp. 459–482, Nov. 2012, doi: 10.1007/s00530-012-0278-9. [60] B. Shen, W. Tan, J. Guo, H. Cai, B. Wang, and S. Zhuo, “A Study on Design Requirement Development and Satisfaction for Future Virtual World Systems,”
Futur. Internet , vol. 12, no. 7, p. 112, 2020, doi: 10.3390/fi12070112. [61] L. Lamport, “Time, Clocks, and the Ordering of Events in a Distributed System,”
Commun. ACM , vol. 21, no. 7, pp. 558–565, 1978, doi: 10.1145/359545.359563. [62] J. Smed, T. Kaukoranta, and H. Hakonen, “Aspects of networking in multiplayer computer games,”
Electron. Libr. , vol. 20, no. 2, pp. 87–97, 2002, doi: 10.1108/02640470210424392. [63] J. Saldana and M. Suznjevic, “QoE and Latency Issues in Networked Games,” in
Handbook of Digital Games and Entertainment Technologies , R. Nakatsu and M. Rauterberg, Eds. Singapore: Springer Singapore, 2015, pp. 1–36. [64] S. U. Shah Khalid, A. Alam, and F. Din, “Optimal Latency in Collaborative Virtual Environment to Increase User Performance: A Survey,”
Int. J. Comput. Appl. , vol. 142, no. 3, pp. 35–47, May 2016, doi: 10.5120/ijca2016909723. [65] M. Amiri, H. Al Osman, and S. Shirmohammadi, “Game-aware bandwidth allocation for home gateways,” in , 2017, pp. 1–3, doi: 10.1109/NetGames.2017.7991546. [66] M. Roccetti, S. Ferretti, and C. E. Palazzi, “The brave new world of multiplayer online games: Synchronization issues with smart solutions,” in , 2008, pp. 587–592, doi: 10.1109/ISORC.2008.17. [67] L. Ricci and E. Carlini, “Distributed virtual environments: From client server to cloud and P2P architectures,” in
International Conference on High Performance Computing and Simulation. , 2012, pp. 8–17, doi: 10.1109/HPCSim.2012.6266885. [68] A. M. Khan, S. Chabridon, and A. Beugnard, “Synchronization medium: A consistency maintenance component for mobile multiplayer games,” in , 2007, pp. 99–104, doi: 10.1145/1326257.1326275. [69] A. M. Khan, S. Chabridon, and A. Beugnard, “A dynamic approach to consistency management for mobile multiplayer games,” in , 2008, p. 42, doi: 10.1145/1416729.1416783. [70] F. Messaoudi, A. Ksentini, G. Simon, and P. Bertin, “Performance Analysis of Game Engines on Mobile and Fixed Devices,”
ACM Trans. Multimed. Comput. Commun. Appl. , vol. 13, no. 4, p. 28, 2017, doi: 10.1145/3115934. [71] M. A. O. Bello, E. L. Dominguez, S. E. P. Hernandez, and J. R. Perez Cruz, “Synchronization Protocol for Real Time Multimedia in Mobile Distributed Systems,”
IEEE Access , vol. 6, pp. 15926–15940, 2018, doi: 10.1109/ACCESS.2018.2817386. [72] I. Parvez, A. Rahmati, I. Guvenc, A. I. Sarwat, and H. Dai, “A Survey on Low Latency Towards 5G: RAN, Core Network and Caching Solutions,”
IEEE Commun. Surv. Tutorials , vol. 20, no. 4, pp. 3098–3130, 2018, doi: 10.1109/COMST.2018.2841349. [73] S. Y. Lien, S. C. Hung, D. J. Deng, and Y. J. Wang, “Efficient ultra-reliable and low latency communications and massive machine-Type communications in 5G new radio,” in
IEEE Global Communications Conference , 2017, pp. 1–7, doi: 10.1109/GLOCOM.2017.8254211. [74] A. Nasrallah et al. , “Ultra-Low Latency (ULL) Networks: The IEEE TSN and IETF DetNet Standards and Related 5G ULL Research,”
IEEE Commun. Surv. Tutorials , vol. 21, no. 1, pp. 88–145, 2019, doi: 10.1109/COMST.2018.2869350.
[75] A. Becher, J. Angerer, and T. Grauschopf, "Novel Approach to Measure Motion-To-Photon and Mouth-To-Ear Latency in Distributed Virtual Reality Systems," in , 2018. [76] Y. Li and W. Gao, "MUVR: Supporting Multi-User Mobile Virtual Reality with Resource Constrained Edge Cloud," in , 2018, pp. 1–16, doi: 10.1109/SEC.2018.00008. [77] L. Hoyet, C. Spies, P. Plantard, A. Sorel, R. Kulpa, and F. Multon, "Influence of Motion Speed on the Perception of Latency in Avatar Control," in
IEEE International Conference on Artificial Intelligence and Virtual Reality (AIVR) , 2019, pp. 286–2863, doi: 10.1109/AIVR46125.2019.00066. [78] C. Roth, E. Luckett, and J. A. Jones, “Latency Detection and Illusion in a Head-Worn Virtual Environment,” in
IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW) , 2020, pp. 215–218, doi: 10.1109/VRW50115.2020.00046. [79] S. Gilbert and N. Lynch, “Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services,”
ACM SIGACT News , vol. 33, no. 2, p. 51, 2002, doi: 10.1145/564585.564601. [80] C. Fleury, T. Duval, V. Gouranton, and B. Arnaldi, “Architectures and Mechanisms to Maintain efficiently Consistency in Collaborative Virtual Environments,” in
Workshop on Software Engineering and Architectures for Realtime Interactive Systems , 2010. [81] P. F. Reynolds, “A spectrum of options for parallel simulation,” in
Winter Simulation Conference Proceedings , 1988, pp. 325–332, doi: 10.1109/WSC.1988.716181. [82] X. Song and J. W. S. Liu, “Maintaining Temporal Consistency: Pessimistic vs. Optimistic Concurrency Control,”
IEEE Trans. Knowl. Data Eng. , vol. 7, no. 5, pp. 786–796, Nov. 1995, doi: 10.1109/69.469820. [83] H. Xia, S. Herscher, K. Perlin, and D. Wigdor, “Spacetime: Enabling Fluid Individual and Collaborative Editing in Virtual Reality,” in , 2018, pp. 853–866, doi: 10.1145/3242587.3242597. [84] M. R. Macedonia and M. J. Zyda, “A taxonomy for networked virtual environments,”
IEEE Multimed. , vol. 4, no. 1, pp. 48–56, 1997, doi: 10.1109/93.580395. [85] S. D. Knight, “A comparison of analysis in DIS and HLA,” Naval Postgraduate School, Monterey, California, 1998. [86] C. Fleury, T. Duval, V. Gouranton, and B. Arnaldi, “A new adaptive data distribution model for consistency maintenance in collaborative virtual environments,” in , 2010, pp. 29–36, doi: 10.2312/EGVE/JVRC10/029-036. [87] A. B. Karuvally, B. Hameem, A. J. Sundar, and J. P. Joseph, “Enhancing Performance and Reliability of Network File System,” in
International CET Conference on Control, Communication, and Computing (IC4) , 2018, pp. 317–321, doi: 10.1109/CETIC4.2018.8531062. [88] H. Liang, I. Tay, M. F. Neo, W. T. Ooi, and M. Motani, “Avatar Mobility in Networked Virtual Environments: Measurements, Analysis, and Implications,” Jul. 2008. [89] A. Yahyavi and B. Kemme, “Peer-to-Peer Architectures for Massively Multiplayer Online Games: A Survey,”
ACM Comput. Surv. , vol. 46, no. 1, p. 51, 2013, doi: 10.1145/2522968.2522977. [90] D. Meiländer, S. Köttinger, and S. Gorlatch, “A Scalability Model for Distributed Resource Management in Real-Time Online Applications,” in , 2013, pp. 763–772, doi: 10.1109/ICPP.2013.90. [91] R. Shea, J. Liu, E. C.-H. Ngai, and Y. Cui, “Cloud gaming: architecture and performance,”
IEEE Netw. , vol. 27, no. 4, pp. 16–21, 2013, doi: 10.1109/MNET.2013.6574660. [92] S. Shi and C.-H. Hsu, “A Survey of Interactive Remote Rendering Systems,”
ACM Comput. Surv. , vol. 47, no. 4, pp. 1–29, 2015, doi: 10.1145/2719921. [93] V. Y. Kharitonov, “A Software Architecture for High-Level Development of Component-Based Distributed Virtual Reality Systems,” in
IEEE 37th Annual Computer Software and Applications Conference , 2013, pp. 696–705, doi: 10.1109/COMPSAC.2013.111. [94] I. Sunday Pandzic, E. Lee, N. Magnenat Thalmann, T. K. Capin, and D. Thalmann, “A Flexible Architecture for Virtual Humans in Networked Collaborative Virtual Environments,”
Comput. Graph. Forum , vol. 16, no. 3, pp. 177–188, 2008, doi: 10.1111/1467-8659.16.3conferenceissue.19. [95] F. Messaoudi, “User equipment based-computation offloading for real-time applications in the context of Cloud and edge networks,” Université Rennes 1, 2018. [96] A. Bhojan, S. P. Ng, J. Ng, and W. T. Ooi, “CloudyGame: Enabling Cloud Gaming on the Edge with Dynamic Asset Streaming and Shared Game Instances,”
Multimed. Tools Appl. , vol. 79, no. 43–44, pp. 32503–32523, 2020, doi: 10.1007/s11042-020-09612-z. [97] T. C. Nguyen, S. Kim, J. Son, and J. Yun, “Selective Timewarp Based on Embedded Motion Vectors for Interactive Cloud Virtual Reality,”
IEEE Access , vol. 7, pp. 3031–3045, 2018, doi: 10.1109/ACCESS.2018.2888700. [98] V. Nae, A. Iosup, and R. Prodan, “Dynamic resource provisioning in Massively Multiplayer Online Games,”
IEEE Trans. Parallel Distrib. Syst. , vol. 22, no. 3, pp. 380–395, 2012, doi: 10.1109/TPDS.2010.82. [99] F. Bonomi, R. Milito, J. Zhu, and S. Addepalli, “Fog Computing and Its Role in the Internet of Things,” in , 2012, pp. 13–16, doi: 10.1145/2342509.2342513. [100] M. Satyanarayanan, P. Bahl, R. Cáceres, and N. Davies, “The Case for VM-Based Cloudlets in Mobile Computing,”
IEEE Pervasive Comput. , vol. 8, no. 4, pp. 14–23, 2009, doi: 10.1109/MPRV.2009.82. [101] F. Boronat, J. Lloret, and M. García, “Multimedia group and inter-stream synchronization techniques: A comparative study,”
Inf. Syst. , vol. 34, no. 1, pp. 108–131, Mar. 2009, doi: 10.1016/j.is.2008.05.001. [102] Ş. B. Çevikbaş and V. İşler, “Phaneros: Visibility-based framework for massive peer-to-peer virtual environments,”
Comput. Animat. Virtual Worlds , vol. 30, no. 1, p. e1808, 2019, doi: 10.1002/cav.1808. [103] B. Knutsson, H. Lu, W. Xu, and B. Hopkins, “Peer-to-peer support for massively multiplayer games,” in , 2004, vol. 1, pp. 96–107, doi: 10.1109/infcom.2004.1354485. [104] H. A. Engelbrecht and J. S. Gilmore, “Pithos: Distributed Storage for Massive Multi-User Virtual Environments,”
ACM Trans. Multimed. Comput. Commun. Appl. , vol. 13, no. 3, p. 33, 2017, doi: 10.1145/3105577. [105] S. Ferretti, “Cheating detection through game time modeling: A better way to avoid time cheats in P2P MOGs?,”
Multimed. Tools Appl. , vol. 37, no. 3, pp. 339–363, 2008, doi: 10.1007/s11042-007-0163-2. [106] C. Anthes, P. Heinzlreiter, and J. Volkert, “An adaptive network architecture for close-coupled collaboration in distributed virtual environments,” in
SIGGRAPH International Conference on Virtual Reality Continuum and its Applications in Industry , 2004, pp. 382–385, doi: 10.1145/1044588.1044671. [107] N. Capece, U. Erra, G. Losasso, and F. D’Andria, “Design and Implementation of a Web-Based Collaborative Authoring Tool for the Virtual Reality,” in , 2019, pp. 603–610, doi: 10.1109/SITIS.2019.00123. [108] M. A. Bassiouni, M.-H. Chiu, M. L. Loper, M. Garnsey, and J. Williams, “Performance and Reliability Analysis of Relevance Filtering for Scalable Distributed Interactive Simulation,”
ACM Trans. Model. Comput. Simul. , vol. 7, no. 3, pp. 293–331, 1997, doi: 10.1145/259207.259209. [109] M. Laakso, “Potentially Visible Set (PVS),” in
Tik-111.500 Seminar on computer graphics , 2003, p. 15. [110] F. O. Moreira, J. L. D. Comba, and C. M. D. S. Freitas, “Smart Visible Sets for Networked Virtual Environments,” in , 2002, pp. 373–380, doi: 10.1109/SIBGRA.2002.1167168. [111] A. Steed and C. Angus, “Frontier Sets: A Partitioning Scheme to Enable Scalable Virtual Environments,” in
Eurographics , 2004, doi: 10.2312/egs.20041019. [112] S. Avni and J. Stewart, “Frontier Sets in Large Terrains,” in
Graphics Interface , 2010, pp. 169–176, doi: 10.5555/1839214.1839244. [113] A. Steed and C. Angus, “Enabling Scalability by Partitioning Virtual Environments Using Frontier Sets,”
Presence Teleoperators
Virtual Environ. , vol. 15, no. 1, pp. 77–92, 2006, doi: 10.1162/pres.15.1.77. [114] Y. Makbily, C. Gotsman, and R. Bar-Yehuda, “Geometric algorithms for message filtering in decentralized virtual environments,” in
Symposium on Interactive 3D Graphics , 1999, pp. 39–46, doi: 10.1145/300523.300527. [115] P. M. Sharkey, M. D. Ryan, and D. J. Roberts, “A local perception filter for distributed virtual environments,” in
Virtual Reality Annual International Symposium , 1998, pp. 242–249, doi: 10.1109/VRAIS.1998.658502. [116] P. Curtis and D. A. Nichols, “MUDs grow up: social virtual reality in the real world,” in
Compcon Spring’94, Digest of Papers , 2002, pp. 193–200, doi: 10.1109/cmpcon.1994.282924. [117] M. Yoshida, Y. A. Tijerino, S. Abe, and F. Kishino, “A virtual space teleconferencing system that supports intuitive interaction for creative and cooperative work,” in
Symposium on Interactive 3D Graphics , 1995, pp. 115–122, doi: 10.1145/199404.199425. [118] I. J. Grimstead, N. J. Avis, and D. W. Walker, “RAVE: the resource-aware visualization environment,”
Concurr. Comput. Pract. Exp. , vol. 21, no. 4, pp. 415–448, Mar. 2009, doi: 10.1002/cpe.1327. [119] G. Singh, L. Serra, W. Png, A. Wong, and H. Ng, “BrickNet: sharing object behaviors on the net,” in
Virtual Reality Annual International Symposium , 1995, pp. 19–25, doi: 10.1109/vrais.1995.512475. [120] D. Margery, B. Arnaldi, A. Chauffaut, S. Donikian, and T. Duval, “OpenMASK: {Multi-Threaded| Modular} Animation and Simulation {Kernel | Kit}: A General Introduction,” in
Virtual Reality International Conference , 2002, pp. 101–110. [121] R. Prodan and V. Nae, “Prediction-based real-time resource provisioning for massively multiplayer online games,”
Futur. Gener. Comput. Syst. , vol. 25, no. 7, pp. 785–793, Jul. 2009, doi: 10.1016/j.future.2008.11.002. [122] P. Morillo, J. M. Orduña, M. Fernández, and J. Duato, “An Adaptive Load Balancing Technique For Distributed Virtual Environment Systems,” in , 2003, pp. 256–261. [123] U. Farooq and J. Glauert, “Faster dynamic spatial partitioning in OpenSimulator,”
Virtual Real. , vol. 21, no. 4, pp. 193–202, 2017, doi: 10.1007/s10055-017-0307-2. [124] S. A. Abdulazeez, A. El Rhalibi, and D. Al-Jumeily, “Dynamic Area of Interest Management for Massively Multiplayer Online Games Using OPNET,” in , 2017, pp. 50–55, doi: 10.1109/DeSE.2017.19. [125] W. Cai, P. Xavier, S. J. Turner, and B. S. Lee, “A scalable architecture for supporting interactive games on the internet,” in , 2002, pp. 60–67, doi: 10.1109/PADS.2002.1004201. [126] W. Zhang and H. Zhou, “A dynamic mapping method for reducing migrations in DVE systems,” in
IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC) , 2017, pp. 187–190, doi: 10.1109/IAEAC.2017.8054003. [127] M. Wang, J. Jia, N. Xie, and C. Zhang, “Interest-driven avatar neighbor-organizing for P2P transmission in distributed virtual worlds,”
Comput. Animat. Virtual Worlds , vol. 27, no. 6, pp. 519–531, Nov. 2016, doi: 10.1002/cav.1670. [128] S. Y. Hu, J. F. Chen, and T. H. Chen, “VON: A scalable peer-to-peer network for virtual environments,”
IEEE Netw. , vol. 20, no. 4, pp. 22–31, 2006, doi: 10.1109/MNET.2006.1668400. [129] A. Schmieg, P. Kabus, M. Stieler, B. Kemme, S. Jeckel, and A. Buchmann, “pSense - maintaining a dynamic localized peer-to-peer structure for position based multicast in games,” in , 2008, pp. 247–256, doi: 10.1109/P2P.2008.20. [130] P. Mildner, T. Triebel, S. Kopf, and W. Effelsberg, “Scaling Online Games with NetConnectors: A Peer-to-Peer Overlay for Fast-Paced Massively Multiplayer Online Games,”
Comput. Entertain. , vol. 15, no. 3, p. 21, 2017, doi: 10.1145/2818383. [131] J. F. Chen, W. C. Lin, H. S. Bai, and S. Y. Dai, “A message interchange protocol based on routing information protocol in a virtual world,” in , 2005, vol. 2, pp. 377–384, doi: 10.1109/AINA.2005.34. [132] Y. Kawahara, T. Aoyama, and H. Morikawa, “A peer-to-peer message exchange scheme for large-scale networked virtual environments,”
Telecommun. Syst. , vol. 25, no. 3–4, pp. 353–370, 2004, doi: 10.1023/B:TELS.0000014789.70171.fd. [133] A. Bharambe et al. , “Donnybrook: Enabling Large-Scale, High-Speed, Peer-to-Peer Games,”
SIGCOMM Comput. Commun. Rev. , vol. 38, no. 4, pp. 389–400, 2008. [134] A. Bharambe, J. Pang, and S. Seshan, “Colyseus: A Distributed Architecture for Online Multiplayer Games,” in , 2006, vol. 6, pp. 155–168. [135] K. Prasetya, “Efficient Methods for Improving Scalability and Playability of Massively Multiplayer Online Game (MMOG),” 2010. [136] Y. Amar, G. Tyson, G. Antichi, and L. Marcenaro, “Towards Cheap Scalable Browser Multiplayer,” in
IEEE Conference on Games (CoG) , 2019, pp. 1–4, doi: 10.1109/CIG.2019.8847958. [137] A. (Peiqun) Yu and S. T. Vuong, “MOPAR: a mobile peer-to-peer overlay architecture for interest management of massively multiplayer online games,” in
International Workshop on Network and Operating Systems Support for Digital Audio and Video , 2005, pp. 99–104, doi: 10.1145/1065983.1066007. [138] S. Kulkarni, S. Douglas, and D. Churchill, “Badumna: A decentralised network engine for virtual environments,”
Comput. Networks , vol. 54, no. 12, pp. 1953–1967, 2010, doi: 10.1016/j.comnet.2010.05.015. [139] G. Vigueras, J. M. Orduña, and M. Lozano, “Performance improvements of real-time crowd simulations,” in
IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW) , 2010, pp. 1–4, doi: 10.1109/IPDPSW.2010.5470807. [140] E. Carlini and A. Lulli, “Analysis of Movement Features in Multiplayer Online Battle Arenas,”
J. Grid Comput. , vol. 17, no. 1, pp. 45–57, 2019, doi: 10.1007/s10723-018-9470-2. [141] Y. Chen and E. S. Liu, “Comparing Dead Reckoning Algorithms for Distributed Car Simulations,” in
ACM SIGSIM Conference on Principles of Advanced Discrete Simulation , 2018, pp. 105–111, doi: 10.1145/3200921.3200939. [142] S. K. Singhal and D. R. Cheriton, “Exploiting Position History for Efficient Remote Rendering in Networked Virtual Reality,”
Presence Teleoperators Virtual Environ. , vol. 4, no. 2, pp. 169–193, 1995, doi: 10.1162/pres.1995.4.2.169. [143] J. H. P. Chim, M. W. Green, R. W. H. Lau, H. V. Leong, and A. Si, “On caching and prefetching of virtual objects in distributed virtual environments,” in , 1998, pp. 171–180, doi: 10.1145/290747.290769. [144] A. Chan, R. W. H. Lau, and B. Ng, “A hybrid motion prediction method for caching and prefetching in distributed virtual environments,” in
Symposium on Virtual reality software and technology , 2001, pp. 135–142, doi: 10.1145/505008.505035. [145] A. Tumanov, R. Allison, and W. Stuerzlinger, “Variability-Aware Latency Amelioration in Distributed Environments,” in
IEEE Virtual Reality Conference , 2007, pp. 123–130, doi: 10.1109/VR.2007.352472. [146] S. Gül, S. Bosse, D. Podborski, T. Schierl, and C. Hellge, “Kalman Filter-based Head Motion Prediction for Cloud-based Mixed Reality,” in , 2020, pp. 3632–3641, doi: 10.1145/3394171.3413699. [147] P. Huang and Y. Ishibashi, “Simultaneous output-timing control in networked games and virtual environments,” in
MediaSync: Handbook on Multimedia Synchronization , M. Montagud, P. Cesar, F. Boronat, and J. Jansen, Eds. Cham: Springer International Publishing, 2018, pp. 149–166. [148] M. Montagud, P. Cesar, F. Boronat, and J. Jansen, “Introduction to Media Synchronization (MediaSync),” in
MediaSync , Springer, 2018, pp. 3–31. [149] Y. Ida, Y. Ishibashi, N. Fukushima, and S. Sugawara, “QoE assessment of interactivity and fairness in first person shooting with group synchronization control,” in , 2010, p. 10, doi: 10.1109/NETGAMES.2010.5680283. [150] D. N. Kanellopoulos, “Group Synchronization for Multimedia Systems,” in
Advanced Methodologies and Technologies in Media and Communications , 2019, pp. 229–241. [151] P. Huang, Y. Ishibashi, N. Fukushima, and S. Sugawara, “QoE
Assessment of Group Synchronization Control Scheme with Prediction in Work Using Haptic Media,”
Int. J. Commun. Netw. Syst. Sci. , vol. 5, no. 6, pp. 321–331, 2012, doi: 10.4236/ijcns.2012.56042. [152] S. U. Din, B. Ahmad, A. Ahmed, M. Amin, and S. Aoudi, “Inter-destination Synchronization: A Comparison between Master-Slave and Synchronization-Manager Techniques,” in
International Arab Conference on Information Technology (ACIT) , 2019, pp. 222–229, doi: 10.1109/ACIT47987.2019.8991020. [153] M. Mauve, J. Vogel, V. Hilt, and W. Effelsberg, “Local-lag and timewarp: Providing consistency for replicated continuous applications,”
IEEE Trans. Multimed. , vol. 6, no. 1, pp. 47–57, 2004, doi: 10.1109/TMM.2003.819751. [154] A. Ferscha and S. K. Tripathi, “Parallel and Distributed Simulation of Discrete Event Systems,” USA, 2001. [155] M. Damitio and S. J. Turner, “Comparing the Breathing Time Buckets Algorithm and the Time Warp Operating System on a Transputer Architecture,” in
SCS European Simulation Multiconference , 1994, pp. 141–145, doi: 10.1.1.52.3785. [156] T. Chen, Z. Wang, and Q. Lu, “An Adaptive Lockstep Synchronization Method for Scene Collaborative Editing of 3D Geometry,” in
International Conference on Intelligent Computing, Automation and Systems (ICICAS) , 2019, pp. 324–328, doi: 10.1109/ICICAS48597.2019.00076. [157] W. Zhang and H. Zhou, “An Asynchronous Control Method For Reducing Inconsistency In DVE,” in , 2017, pp. 6–11, doi: 10.2991/jimec-17.2017.2. [158] N. E. Baughman and B. N. Levine, “Cheat-proof playout for centralized and distributed online games,” in , 2001, vol. 1, pp. 104–113, doi: 10.1109/infcom.2001.916692. [159] J. Kim, S. Lee, and J. W. Kim, “Adaptive event synchronization control for distributed virtual environment,” in , 2005, pp. 1–4, doi: 10.1109/MMSP.2005.248612. [160] D. R. Jefferson, “Virtual Time,”
Trans. Program. Lang. Syst. , vol. 7, no. 3, pp. 404–425, 1985, doi: 10.1145/3916.3988. [161] J. S. Steinman, “Breathing time warp,”
SIGSIM Simul. Dig. , vol. 23, no. 1, pp. 109–118, 1993, doi: 10.1145/174134.158473. [162] Y. Zhang and G. Li, “SafeBTW: A Scalable Optimistic Yet Non-risky Synchronization Algorithm,” in , 2012, vol. 1, pp. 75–77, doi: 10.1109/PADS.2012.39. [163] E. Cronin, A. R. Kurc, B. Filstrup, and S. Jamin, “An efficient synchronization mechanism for mirrored game architectures,”
Multimed. Tools Appl. , vol. 23, no. 1, pp. 7–30, 2004, doi: 10.1023/B:MTAP.0000026839.31028.9f. [164] X. Bin Shi, L. Fang, D. Ling, Z. Xing-hai, and X. Yuan-sheng, “An event correlation synchronization algorithm for MMOG,” in , 2007, vol. 1, pp. 746–751, doi: 10.1109/SNPD.2007.152. [165] S. Ferretti, M. Roccetti, and S. Cacciaguerra, “On distributing interactive storytelling: Issues of event synchronization and a solution,” in
International Conference on Technologies for Interactive Digital Storytelling and Entertainment , 2004, vol. 3105, pp. 219–231, doi: 10.1007/978-3-540-27797-2_29. [166] S. Ferretti and M. Roccetti, “Fast delivery of game events with an optimistic synchronization mechanism in massive multiplayer online games,” in
SIGCHI International Conference on Advances in Computer Entertainment Technology, 2005, vol. 265, pp. 405–412, doi: 10.1145/1178477.1178570.
[167] S. Ferretti and M. Roccetti, “A novel obsolescence-based approach to event delivery synchronization in multiplayer games,” Int. J. Intell. Games Simul., vol. 3, no. 1, pp. 7–19, 2004, doi: 11576/2679118.
[168] S. Ferretti, M. Roccetti, and C. E. Palazzi, “An optimistic obsolescence-based approach to event synchronization for massively multiplayer online games,” Int. J. Comput. Appl., vol. 29, no. 1, pp. 33–43, 2007, doi: 10.1080/1206212X.2007.11441830.
[169] R. Boutaba et al., “A comprehensive survey on machine learning for networking: evolution, applications and research opportunities,” J. Internet Serv. Appl., vol. 9, no. 1, p. 16, 2018, doi: 10.1186/s13174-018-0087-2.
[170] L. Paladina, A. Biundo, M. Scarpa, and A. Puliafito, “Artificial Intelligence and Synchronization in wireless sensor networks,” J. Networks, 2018, pp. 7–12, doi: 10.1145/3210424.3210434.
[180] E. Rhee, I. Shin, and H. Lee, “Implementation of the cloud gaming platform with adaptive bitrate streaming,” in International Conference on Information and Communication Technology Convergence (ICTC), 2014, pp. 478–479, doi: 10.1109/ICTC.2014.6983185.
[181] H.-J. Hong, C.-F. Hsu, T.-H. Tsai, C.-Y. Huang, K.-T. Chen, and C.-H. Hsu, “Enabling Adaptive Cloud Gaming in an Open-Source Cloud Gaming Platform,” IEEE Trans. Circuits Syst. Video Technol., vol. 25, no. 12, pp. 1–1, 2015, doi: 10.1109/TCSVT.2015.2450173.
[182] L. Wang, M. J. Suarez, and R. A. Domanico, “Adaptive Bitrate Streaming in Cloud Gaming,” Worcester Polytechnic Institute, 2017.
[183] G. K. Illahi, T. Van Gemert, M. Siekkinen, E. Masala, A. Oulasvirta, and A. Ylä-Jääski, “Cloud Gaming with Foveated Video Encoding,” ACM Trans. Multimed. Comput. Commun. Appl., vol. 16, no. 1, p. 24, 2020, doi: 10.1145/3369110.
[184] H. Ahmadi, S. Zadtootaghaj, F. Pakdaman, M. R. Hashemi, and S. Shirmohammadi, “A Skill-Based Visual Attention Model for Cloud Gaming,” IEEE Access (Early Access), p. 1, 2021, doi: 10.1109/ACCESS.2021.3050489.
[185] Y. Eu et al., “SuperStreamer: Enabling Progressive Content Streaming in a Game Engine,” in , 2019, pp. 13–24, doi: 10.1145/3302506.3310385.
[193] J. Meng, S. Paul, and Y. C. Hu, “Coterie: Exploiting Frame Similarity to Enable High-Quality Multiplayer VR on Commodity Mobile Devices,” in , 2020, pp. 923–937, doi: 10.1145/3373376.3378516.
[194] S. Shen et al., “Analysis of Viewing Behaviors in a Head-Mounted Virtual Geographic Environment,” in International Conference on Virtual Reality and Visualization, 2017, pp. 461–462, doi: 10.1109/ICVRV.2017.00123.
[195] “Deep learning super sampling.” [Online]. Available: https://developer.nvidia.com/dlss.
Juan González Salinas was born in Simat de la Valldigna (Valencia, Spain). He received a degree in Telecommunication Systems, Sound and Image Engineering (2016) from the Polytechnic University of Valencia (UPV) and an MSc in Bioinformatics and Biostatistics (2018) from the UOC (Universitat Oberta de Catalunya) and the UB (Universitat de Barcelona). He is an assistant researcher and developer in the Immersive Interactive Media (IIM) R&D Group. His main interests are web and mobile app development, VR and AR design, and machine learning.
Fernando Boronat Seguí is the head of the IIM R&D Group (http://iim.webs.upv.es) at the Gandia Campus of the UPV, Spain. He received the M.E. and Ph.D. degrees in telecommunication engineering from the UPV in 1994 and 2014, respectively. After working for several Spanish telecommunication companies, he returned to the UPV in 1996, where he is currently an Assistant Professor in the Communications Department. He has extensive experience in research and in both undergraduate and postgraduate teaching in communication networks, multimedia systems and protocols, and media synchronization. His main research interests are immersive and interactive media systems and applications, mulsemedia, and media synchronization. He is the author of two books, several book chapters, an IETF RFC, and more than 100 research papers in relevant journals and conferences. He is involved in several IPCs of national and international refereed journals and conferences and serves as a reviewer for highly respected journals. He is a member of the IEEE (M’93–SM’11) and the ACM (M’15).
Almanzor Sapena Piera received his BS degree in mathematics from the University of Valencia. In April 2002, he received his PhD degree from the UPV, with a focus on topological properties of fuzzy metric spaces. He is now an associate professor in the Department of Applied Mathematics at the UPV, doing research on fuzzy topology, noise reduction in digital images, and adaptive media playout techniques. He has published several papers on these topics in international journals and conferences. He is a member of the IIM R&D Group at the Gandia Campus of the UPV.