Deployment of Elastic Virtual Hybrid Clusters Across Cloud Sites
Miguel Caballer a, Marica Antonacci b, Zdeněk Šustr c, Michele Perniola d,b, Germán Moltó a,*

a Instituto de Instrumentación para Imagen Molecular (I3M), Centro mixto CSIC - Universitat Politècnica de València, Camino de Vera s/n, 46022 Valencia, Spain
b INFN Bari, Via E. Orabona 4, 70125 Bari, Italy
c CESNET, Zikova 4, 160 00 Praha 6, Czechia
d Department of Physics, University of Bari, Via G. Amendola 173, 70126 Bari, Italy
Abstract
Virtual clusters are widely used computing platforms that can be deployed in multiple cloud platforms. The ability to dynamically grow and shrink the number of nodes has paved the way for customised elastic computing, both for High Performance Computing and High Throughput Computing workloads. However, elasticity is typically restricted to a single cloud site, thus hindering the ability to provision computational resources from multiple geographically distributed cloud sites. To this end, this paper introduces an architecture of open-source components that coherently deploy a virtual elastic cluster across multiple cloud sites to perform large-scale computing. These hybrid virtual elastic clusters are automatically deployed and configured, using an Infrastructure as Code (IaC) approach, on a distributed hybrid testbed that spans different organizations, including on-premises and public clouds, and support automated tunneling of communications across the cluster nodes with advanced VPN topologies. The results indicate that cluster-based computing of embarrassingly parallel jobs can benefit from hybrid virtual clusters that aggregate computing resources from multiple cloud back-ends and bring them together into a dedicated, albeit virtual, network.

* Corresponding author
Email address: [email protected] (Germán Moltó)
Keywords: Cloud Computing, Network virtualization, Cluster computing
1. Introduction
Scientific computing typically requires the execution of resource-intensive applications that demand the collaborative usage of multiple computing and storage resources. This is why Distributed Computing Infrastructures (DCIs), such as the European Grid Infrastructure (EGI) [1] or the Open Science Grid (OSG) [2], emerged in the last decades to support the rise of eScience. Indeed, as described in [3], "eScience studies, enacts, and improves the ongoing process of innovation in computationally-intensive or data-intensive research methods; typically this is carried out collaboratively, often using distributed infrastructures".

The advent of cloud computing [4], exemplified by public cloud providers such as Amazon Web Services, Microsoft Azure or Google Cloud Platform, together with Cloud Management Platforms such as OpenStack and OpenNebula, popularised access to on-demand elastic computing, storage and networking capabilities. This represented a major step forward from previous Grid infrastructures, in which users had to adapt their applications to run on the Grid. Instead, using cloud technologies, users can customize the virtualized execution environment to satisfy the requirements of the application, not the other way round.

However, coordinated execution on these customized computing environments was still required to run scientific applications. Indeed, High Performance Computing (HPC) applications, typically based on message-passing libraries such as the Message Passing Interface (MPI) or shared-memory programming libraries such as OpenMP, require the coordinated execution of a set of processes on the computing infrastructure. To this end, computing clusters have been widely adopted in the last decades as the preferred computing platform to run scientific applications.

A computing cluster is a set of nodes, typically interconnected by low-latency networks, that uses a Local Resource Management System (LRMS) together with a shared file system to ease operations across all the nodes of the cluster. Popular examples of LRMS are SLURM [5] and HTCondor [6, 7], which schedule jobs on a finite set of computing resources provided by the nodes of the computing cluster. However, the inherent acquisition and maintenance costs of physical clusters, together with the general availability of cloud providers, made it economically viable for certain workloads to deploy virtual elastic clusters on the cloud. A thorough discussion on the viability of outsourcing cluster computing to a cloud is available in a previous work by the authors [8].

Furthermore, the coexistence of on-premises and public clouds has enabled cloud bursting, where virtual clusters can be enlarged with resources from outside the organization; hybrid clusters can therefore harness on-premises and public cloud resources simultaneously. This approach is highly advantageous because users can seamlessly access cluster-based computing resources beyond those available in their on-premises clouds.
Other topologies can be considered for hybrid clusters when using virtual resources, such as heterogeneous clusters, where various nodes in the cluster have different hardware characteristics.

However, when deploying hybrid virtual elastic clusters across multiple IaaS cloud sites, several challenges need to be solved, which include, but are not limited to: i) deciding on the set of cloud sites that are most suitable for the hybrid deployment; ii) provisioning and configuring the computing nodes across multiple cloud back-ends; iii) automatically creating secure networking tunnels that provide seamless connectivity among the geographically distributed nodes of the cluster; iv) minimising the number of public IPv4 addresses involved in the deployment of the cluster, to cope with the restricted quotas that many research centers face; v) taking advantage of the Layer 2 capabilities of user-created private networks in Cloud Management Frameworks such as OpenStack.

To this end, this paper describes an architecture that integrates the following open-source components: the INDIGO-DataCloud PaaS Orchestrator [9], which collects high-level deployment requests from the software layer and coordinates the resource or service deployment over dynamic Apache Mesos clusters (http://mesos.apache.org/) or directly over IaaS platforms using continuous configuration and DevOps-based deployment approaches; the Infrastructure Manager (IM), which deploys complex and customized virtual infrastructures on IaaS clouds; and the INDIGO Virtual Router [10], which establishes overlay virtual private networks to interconnect the nodes of a cloud deployment even if they are deployed at multiple, geographically distant sites. The main contribution of this paper is to showcase an open-source integrated platform that provides users with the ability to dynamically provision elastic virtual clusters across hybrid clouds, seamlessly supporting cloud bursting, backed by flexible virtual routing in hybrid setups, to accommodate workload increases.

The remainder of the paper is structured as follows. First, Section 2 describes the related work in the area of virtual clusters and hybrid networking. Next, Section 3 introduces the proposed architecture to deploy virtual elastic clusters on hybrid infrastructures composed of several IaaS clouds. Then, Section 4 describes a case study that uses the aforementioned architecture to deploy virtual clusters, with automated networking connectivity among the nodes, using hybrid resources from both research centers and public clouds. The results are analysed to point out the benefits and limitations of the proposed approach. Finally, Section 5 summarises the main achievements of the paper and points to future work.
2. Related Work
There are previous works in the literature that aim at deploying virtual clusters on cloud infrastructures. ElastiCluster [11] is a command-line tool to create, manage and set up computing clusters hosted on cloud infrastructures (like Amazon's Elastic Compute Cloud EC2, Google Compute Engine or OpenStack-based clouds). The clusters must be scaled manually by the user.
3. Architecture Design and Components for Hybrid Virtual Clusters
This section describes the main open-source components integrated in the architectural solution to deploy hybrid virtual clusters. For each component, a brief description is provided together with the role played by the component in the overall architecture.
Figure 1 describes the architecture employed to support the deployment of hybrid virtual elastic clusters across sites.

Figure 1: Architecture to deploy hybrid virtual elastic clusters across multiple cloud sites (exemplified for two specific cloud sites).
In this approach, the front-end node of the cluster (the only VM that requires a public IP) also acts as the vRouter Central Point (CP), in order to avoid requiring an additional VM with a public IP. The rest of the VMs hosted at the same provider do not require the installation of any vRouter component; they only need the front-end set as their network gateway through DHCP. Then, in each additional cloud provider where the cluster's working nodes are deployed, an extra VM is also deployed and configured as the vRouter node, routing traffic through the vRouter CP installed on the front-end node. Finally, the remaining VMs deployed at each site are configured to use the vRouter node of their own site as the network gateway.

The deployment flow starts with the user choosing an existing TOSCA template from a curated repository publicly available on GitHub (indigo-dc/tosca-templates: https://github.com/indigo-dc/tosca-templates). The template describes an application architecture to be deployed in the cloud.

The PaaS Orchestrator coordinates the provisioning of virtualized compute and storage resources on Cloud Management Frameworks, both private and public (OpenStack, OpenNebula, AWS, etc.), and the deployment of dockerized services and jobs on Mesos clusters. It receives the deployment requests, expressed through TOSCA templates, and deploys them on the best available cloud site. In order to select the best site, the Orchestrator implements a complex workflow: it gathers information about the SLAs signed by the providers and monitoring data about the availability of the compute and storage resources.

The Orchestrator architecture (Figure 2) is modular and based on plugins: depending on the deployment requirements specified in the TOSCA template, the proper adapter/connector is automatically activated:

Figure 2: PaaS Orchestrator high-level architecture.

• IaaS adapters, in charge of managing the interaction with the Cloud Management Frameworks; this interaction can be performed directly through the Heat orchestration service for deployments on top of OpenStack clouds, or by delegating to the Infrastructure Manager.

• Mesos connectors, implementing the interfaces that abstract the interaction with the Mesos frameworks: Marathon (https://mesosphere.github.io/marathon/) for managing containerized long-running services and Chronos (https://mesos.github.io/chronos/) for submitting containerized batch-like jobs.

• HPC adapter, implementing the interfaces to interact with HPC services such as QCG-Computing.

• Data Management connectors, implementing the interfaces to interact with data orchestration systems in order to coordinate data movement and replication requested by the user. The reference implementation is based on the Rucio system [30] (https://rucio.github.io/index.html).

• Data placement connectors, in charge of interacting with the storage services in order to obtain information about the location of user data; this information is used by the Orchestrator to perform data-aware scheduling: if requested by the user, the Orchestrator will try to match compute resources with data availability in order to start the processing near the data.

Users can interact with the PaaS Orchestrator via its REST API, through the orchent command-line interface (http://github.com/indigo-dc/orchent), and via the web-based graphical dashboard shown in Figure 3.

Figure 3: Orchestrator dashboard.
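To make this flow concrete, the sketch below shows how such a deployment could be submitted from the command line with orchent. It is only an illustration: the endpoint, the token handling via oidc-agent, the template file name and the input parameter names are assumptions, not the exact values used in this work.

```bash
# Point orchent at a PaaS Orchestrator instance and supply an OIDC access token
# (endpoint and account name are placeholders).
export ORCHENT_URL=https://my-orchestrator.example.org/orchestrator
export ORCHENT_TOKEN=$(oidc-token my-account)   # assumes oidc-agent is configured

# Submit a deployment from a TOSCA template of the curated repository;
# the template file and its input parameters are hypothetical.
orchent depcreate slurm_elastic_cluster.yaml \
  '{ "max_wn_num": 5, "fe_cpus": 2, "fe_mem": "4 GB" }'

# List deployments and check their status.
orchent depls
```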
The Infrastructure Manager (IM) [31] is a cloud orchestration tool that deploys complex and customized virtual infrastructures on IaaS cloud deployments (such as AWS, Azure, OpenStack, OpenNebula, etc.), as shown in Figure 4. It performs automatic deployment of applications described as code, using the OASIS TOSCA (Topology and Orchestration Specification for Cloud Applications) standard [32] and its own description language (Resource and Application Description Language, RADL). It automates the deployment, configuration, software installation, monitoring and update of the virtual infrastructures. It supports the APIs of a large number of virtual platforms, making user applications cloud-agnostic. In addition, it integrates a contextualization system (based on the Ansible DevOps tool) to install and configure all the applications required by the user, providing a fully functional infrastructure. Internally, the IM uses Apache Libcloud (https://libcloud.apache.org/) in several connectors to interact with the corresponding cloud back-end.

The IM enables the deployment of virtual infrastructures on top of on-premises, public and federated clouds, as well as container orchestration platforms, thus enabling the deployment of virtual hybrid infrastructures that span multiple providers. Furthermore, it uses an SSH reverse approach that allows Ansible to configure all the VMs of an infrastructure from a single VM, known as the "master", thus requiring only one public IP per infrastructure.

Its functionality can be accessed via a CLI, a web GUI, a REST API and an XML-RPC API. It can be deployed in a highly available mode, with a set of load-balanced replicated instances, in order to cope with a large pool of users.

Figure 4: Infrastructure Manager diagram.
The IM is used in this architecture to provide multi-cloud provisioning and configuration of the Virtual Machines that constitute the nodes of the virtual cluster.
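For illustration, an equivalent direct interaction with the IM command-line client could look as follows. The credential lines and file names are placeholders; only the general auth-file structure and the create/getstate commands reflect the IM client's documented usage.

```bash
# Illustrative auth.dat: one credential line for the IM itself plus one per cloud.
cat > auth.dat <<'EOF'
type = InfrastructureManager; username = imuser; password = impass
id = ost; type = OpenStack; host = https://keystone.example.org:5000; username = user; password = pass; tenant = mytenant
id = ec2; type = EC2; username = ACCESS_KEY_ID; password = SECRET_ACCESS_KEY
EOF

# Create the infrastructure described in cluster.tosca and query its state;
# 'create' returns the infrastructure identifier used in later calls.
im_client.py -a auth.dat create cluster.tosca
im_client.py -a auth.dat getstate <infrastructure-id>
```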
CLUES (CLuster Elasticity System) [33, 34] (https://github.com/grycap/clues) is a general elasticity framework that can be applied both to power on/off nodes of a physical cluster and to provision/terminate virtual nodes from a cloud provider. It supports a wide variety of plugins to introduce elasticity in several LRMS (Local Resource Management Systems), such as HTCondor, SGE and SLURM; in container orchestration platforms such as Kubernetes, Mesos and Nomad; and in on-premises cloud management frameworks such as OpenNebula.

CLUES is used in this work as the elasticity component of the clusters provisioned by the Infrastructure Manager: it monitors the job queue of the system and decides whether additional nodes should be provisioned or idle nodes terminated.
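As a rough sketch of how CLUES sits between the LRMS and the provisioning layer, the excerpt below is illustrative only: the option values follow the CLUES2 configuration style, but the exact keys and plugin names depend on the installed version and are assumptions here.

```bash
# Hypothetical excerpt of /etc/clues2/clues2.cfg on the cluster front-end.
cat >> /etc/clues2/clues2.cfg <<'EOF'
[general]
LRMS_CLASS = cluesplugins.slurm        # monitor the SLURM queue for pending jobs
POWERMANAGER_CLASS = cluesplugins.im   # power nodes on/off by (de)provisioning VMs through the IM
EOF
```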
The virtual networking approach adopted in this architecture originates in the INDIGO-DataCloud and DEEP Hybrid-DataCloud (https://deep-hybrid-datacloud.eu/) projects. The single most important requirement on virtual networking in the context of hybrid virtual cluster deployments is the ability to support geographically distributed hybrid clusters, wherein different nodes can be located at different sites. This requirement translates into the need for private networks that span multiple sites and provide adequate security and connectivity to the virtual machines contained within.

There are multiple reasons to design inter-site virtual networks as private, isolated infrastructures with tightly controlled outside access. One or more of the following apply in many of the hybrid deployment scenarios considered and analysed in the framework of the DEEP Hybrid-DataCloud (DEEP) project.
Fire and forget
The community developing tools for academic hybrid clouds is currently striving to cater to the needs of the so-called "long tail of science", entailing small user groups with little manpower to spend on provisioning and maintaining their computing tools. This puts extra emphasis on deployment isolation, since a deployed cluster, once verified as operational, tends to receive very little attention in terms of maintenance and updates (security or otherwise). As such, it must be tightly isolated from the outside network to prevent exploitation of missing updates, yet full visibility among the nodes of the cluster must be preserved.

Legacy tools
Somewhat similar to the previous point, but rather than through aging, cluster deployments are sometimes obsolete by design from the time of their very deployment. Virtual installations are a viable way of continuing to run outdated legacy software, which is a sorely needed capability to assure comparability and reproducibility of scientific results. Again, strict isolation and encapsulation in a private network is a necessity. On top of that, it is highly beneficial for such a private network to support Internet Protocol version 4 (IPv4), since many legacy tools require it. Also, the degree of IPv6 adoption differs between countries, which makes IPv4 addresses a scarce network resource in at least some geographical areas.

Real world resemblance
Cluster software management tools may occasionally employ "automagic" algorithms for self-configuration, which assume physical network components. Hence it is also advantageous to make the purpose-built private virtual network resemble a physical one in terms of components and their purpose.
The design of the INDIGO Virtual Router component (https://github.com/indigo-dc/ansible-role-indigovr), originally conceived within the INDIGO-DataCloud project, is based on several key assumptions:

• Cluster nodes (whole clusters) need to be located in private networks, primarily for security reasons. In the case of hybrid deployments, such a private network must span all participating sites and comprise all the deployed nodes, while "visibility" within the network remains unhampered, to facilitate auto-discovery and communication across the cluster.

• There is negligible chance that the orchestration layer will have control of all intermediate network components across all participating cloud sites.

• Some cluster nodes may not allow changes in their internal configuration, typically those based on custom, user-supplied images whose contents cannot be modified by the orchestrating component. These may come simply as pre-prepared, user-produced binary images with no root access, or even as images of "alien" operating systems such as MS Windows, which the orchestration layer does not know how to modify. Hence their networking behavior should be configurable with DHCP, and no additional intervention should be required. Note that this "black-box-like" behavior is not the case in the particular example presented in Section 4 of this paper, where the IM actually has control of the internals of individual nodes, but it is a regular occurrence the authors observe while supporting user communities in HTC clouds.

• It is impossible (or at least difficult) to rely on a custom appliance image being available in all participating cloud sites. Therefore, the networking component cannot take the form of a binary image, but rather that of a "recipe" deployable and configurable on standard distribution images (e.g., latest Debian, latest Ubuntu) universally available in all IaaS clouds (see the deployment sketch after this list).

• It is likewise impossible (or at least difficult) to rely on solutions bundled with popular cluster software, such as the OVN4NFV-K8S-Plugin for Kubernetes [35], since that would make every cluster deployment depend on Kubernetes or a similar heavy-weight cluster stack, limiting the generic nature of cloud VMs.

• Finally, it may be difficult to deploy the virtual router in a container. While some of its core components (for example a VPN client or server) may lend themselves to containerization quite readily, in general a routing appliance (especially one fulfilling the role of default gateway in a local area network) needs to take over such a large portion of its host system's networking that, firstly, the host system would have to delegate its core networking functions to the container and, secondly, the configuration of the router would have to reside in two locations, i.e., the host system and the containerized system.
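Because the vRouter ships as an Ansible "recipe" rather than a binary appliance, deploying it on a stock distribution image reduces to applying the role from the repository cited above. The sketch below makes assumptions about the playbook layout and omits the role variables, whose names depend on the role version.

```bash
# Fetch the vRouter role directly from its Git repository.
ansible-galaxy install git+https://github.com/indigo-dc/ansible-role-indigovr.git

# Hypothetical one-host playbook applying the role to a plain Debian/Ubuntu VM.
cat > vrouter.yml <<'EOF'
- hosts: all
  become: yes
  roles:
    - role: ansible-role-indigovr
      # role variables (e.g., the central point's public address and the
      # OpenVPN credentials) omitted; they are role-version specific.
EOF

# Target IP is a placeholder; the trailing comma makes it an ad-hoc inventory.
ansible-playbook -i 203.0.113.10, vrouter.yml
```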
Figure 5: A demonstration use case for a hybrid cluster deployment with the INDIGO Virtual Router. A private network spans all participating sites. There is a private virtual network allocated in each cloud, and each has an instance of the vRouter; together, these vRouter instances form the overlay network.
Figure 5 shows the simplest use case of cluster deployment with the virtual router (vRouter). A virtual private network is set up dynamically, specifically for the given deployment, in each participating site. Each such network holds an instance of the vRouter. One of the vRouters acts as a Central Point [of the star topology] while the others set up a point-to-point VPN connection to it, routing traffic from their local network through to the central point and onward.

Each vRouter is a simple virtual machine (VM), which primarily runs an OpenVPN component (client or server). The appliance may be deployed in two roles:

vRouter routes traffic between nodes in the local private network and remote sites participating in the deployment. It may also comprise a DHCP server to advertise the desired network configuration to nearby cluster nodes. This capability, however, is typically used only if the local cloud middleware cannot perform custom DHCP configuration by itself.
Central Point is a designated vRouter. There must be at least one in each deployment. It accepts the VPN connections established by the individual vRouters. It holds the intended network configuration of the deployment and, as clients connect, it assigns subnet address ranges and other configuration. It is the only node within the whole deployment requiring a public IP address. The topology may contain multiple Central Points (Figure 6), but currently an additional central point is only used as a hot backup.

Figure 6: An illustration of a redundant star topology with five private networks and five vRouters, two of which have the role of central point. The remaining vRouters are aware of both central points, but would only use their connection to the backup central point if the connection to the primary one was lost.
Although not originally assumed in the INDIGO Virtual Router design, it turned out that there was demand for supporting "stand-alone" nodes, be it cluster nodes or other machines the user wishes to connect to the setup. Such stand-alone nodes typically exist in public or shared networks, which cannot be fully dedicated to the given hybrid cluster deployment. That makes it impossible for a vRouter instance to take control of the whole subnet. As a solution, the concept of a stand-alone node has been introduced.

Stand-alone nodes will typically appear for one of two reasons:

1. The participating cloud does not support the creation of virtual private networks, or will not otherwise allow the vRouter to take control of the private network, yet the orchestration layer has decided to place cluster node(s) in that site.

2. The user wishes to connect a pre-existing machine (e.g., their own workstation) to the deployment.

Should either be the case, a VPN client can be installed directly on the stand-alone node, and a VPN connection established directly with the central point (Figure 7). This, however, breaks the third assumption made in Section 3.5.2, as the orchestration layer must now be able to directly install and configure components (VPN clients) on the nodes. This precludes the use of "black box" images for setting up cluster nodes, but as a trade-off allows for hybrid cluster deployment in sites that do not expose sufficient control capabilities for networking resources.
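Joining a stand-alone node is then a matter of running a plain OpenVPN client against the central point. A minimal sketch follows, assuming the CP has already issued a client profile (here called client.ovpn) through the certificate workflow described below:

```bash
# Install the OpenVPN client on the stand-alone node (Debian/Ubuntu assumed).
sudo apt-get update && sudo apt-get install -y openvpn

# Connect to the deployment's central point; the profile bundles the CP
# address, the client certificate and the key.
sudo openvpn --config client.ovpn --daemon
```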
The overlay inter-site network deployed in this manner can be considered fully secured. The individual local networks in participating cloud sites are private, established on purpose for the given deployment (or, at worst, assigned from a pool of pre-configured networks). Inter-site connections between vRouter instances and possible stand-alone nodes are built with OpenVPN, and as such they are also secure.

OpenVPN uses X.509 certificates for its clients to authenticate to servers. With client identities (certificate subjects) pre-registered at the server (CP), it is even possible to assign a static configuration to each node. That makes it possible for the orchestration layer to pre-determine which subnet each client vRouter will be assigned.

Figure 7: A complex virtual private network topology with a stand-alone node located outside any of the vRouter-controlled local virtual networks.
The decision to rely on host certificates to establish trust between vRouter instances led to the need for an ability to generate said certificates, distribute them and register their subjects with the CP. Third-party solutions were considered but finally rejected because, at that point, there was no other use case requiring a separate, stand-alone trust storage solution. OpenVPN comes bundled with the Easy-RSA tool, which could be easily reused for this purpose. Alternatively, the OpenSSL suite also provides tools to manage certificates. Certificates are generated at the CP, and the Infrastructure Manager's existing call-back function is used to retrieve the generated client certificates, or even to request additional ones in more dynamic use cases. This makes it possible for the orchestration layer to distribute secrets and establish complete trust between the networking elements of the deployment relying solely on pre-existing tools already available in the DEEP stack, without the need to introduce additional tools into the mix.
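As an illustration of that workflow, the commands below sketch how a CP could mint a deployment-specific CA and per-node certificates with Easy-RSA 3. The node names are placeholders, and the exact procedure used by the vRouter tooling may differ.

```bash
# One-time initialisation of a PKI and a CA at the central point.
./easyrsa init-pki
./easyrsa build-ca nopass

# Server certificate for the central point itself.
./easyrsa build-server-full vrouter-cp nopass

# One client certificate per vRouter or stand-alone node joining the overlay;
# the subject name is what gets pre-registered at the CP for static configuration.
./easyrsa build-client-full vrouter-site-b nopass
./easyrsa build-client-full standalone-workstation nopass
```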
An encrypted OpenVPN connection between local networks may prove to be a bottleneck, especially should the hybrid cluster deployment require extensive inter-node communication. This may be perceived as a problem if inter-node communication is already encrypted natively by the cluster software, making another layer of encryption by OpenVPN superfluous.

This particular issue is easily tackled by configuring OpenVPN to use a less costly encryption algorithm, or even no encryption at all. That would be perfectly adequate for clusters based on software that is already prepared for working over the public Internet. However, it is not the default option, because most relevant deployments will assume they can rely on the safety of a local private network.

In the future, it should also be possible to alleviate the burden imposed especially on the central point by enabling dynamic identification of shorter network paths within the deployed topology.
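Returning to the encryption trade-off above: under the assumption that the tunnelled traffic is already protected at the application layer, the tunnel could skip payload encryption, a deliberate and non-default choice. A minimal sketch for OpenVPN 2.x:

```bash
# Disable payload encryption on both tunnel endpoints; tunnel authentication
# via the X.509 certificates remains in place.
echo "cipher none" | sudo tee -a /etc/openvpn/server.conf
sudo systemctl restart openvpn@server
```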
4. Use Case: Scalable Batch Computing Across Geographic and Administrative Boundaries
In order to test the suitability of the proposed architecture, we designed a use case that fully embraces the term hybrid. First, it consists of the deployment of a virtual elastic cluster across two different cloud sites. Second, one site is a public cloud provider and the other is an on-premises cloud used in production at a research center. This demonstrates that the cluster can span multiple organizational entities and different technology stacks.
These are the cloud sites involved in the use case:

MetaCentrum Cloud (MCC)
Operated by CESNET, MCC is based on OpenStack as its Cloud Management Platform and supports federated authentication through OIDC. The site was used for prototyping and deployment testing of hybrid cluster deployments with vRouter-based virtual networks.
Amazon Web Services
The largest public cloud provider to date. In particular, the region us-east-2 (Ohio) was used. VPC (Virtual Private Cloud), together with dynamically created private networks, was used to accommodate the execution of Virtual Machines. The instance type t2.medium (2 vCPUs, 4 GB of RAM) was chosen since it provides an adequate compromise between hourly price (billed by the second) and the performance required to execute the jobs. A plain Ubuntu 16.04 image was used in both clouds as the base image of the VMs.

The hybrid virtual elastic cluster was created using the Orchestrator dashboard shown in Figure 3. The user selects the "SLURM Elastic cluster" option and fills in the required fields (maximum number of working nodes, CPU and amount of memory of the nodes, etc.). Finally, the dashboard submits the deployment to the PaaS Orchestrator in order to start the deployment process.

Figure 8: Deployed virtual hybrid infrastructure across cloud sites.

The deployment of the hybrid virtual elastic cluster proceeded according to the following steps (corresponding to the scenario shown in Figure 8):

1. Only the front-end node (FE) of the cluster is initially deployed at CESNET, and configured with CLUES in order to detect when pending jobs are queued up at the LRMS (SLURM was used). The FE node does not execute jobs, since it is in charge of running the control plane (NFS volume shared across all the nodes, vRouter, SLURM, etc.).

2. Up to two additional nodes are deployed at CESNET in order to build the first part of the cluster. This represents a situation where existing quotas at the on-premises cloud prevent a user from deploying additional nodes.

3. Jobs are submitted as workload to the virtual cluster. This causes jobs to be allocated for execution on either of the two working nodes (WNs) provisioned at CESNET.

4. Once all the execution slots are filled, CLUES triggers the provisioning of additional WNs through the PaaS Orchestrator, which provisions the nodes from AWS.

5. CLUES is responsible for monitoring the state of the WNs and the LRMS to decide on allocating additional WNs or terminating them when no pending jobs are left to be executed.

These executions used the Audio Classifier model [36] from the DEEP Open Catalog [37]. This is a Deep Learning model to perform audio classification, initially pre-trained with 527 high-level classes from Google's AudioSet dataset [38], and packaged as Docker images. To easily execute the audio classifier in the virtual cluster, as jobs sent through the LRMS, the udocker tool [39] was used to run Docker containers in user space.

To produce a real-world workload, we selected a subset of the Urban Sound Datasets [40] consisting of the first 4 audio folders, with 3,676 audio files to process and a total size of 2.8 GB.

The workload was split into 4 blocks, with a certain waiting time in between, to demonstrate the ability of the elasticity manager to add/remove nodes dynamically based on the system workload. The workload timeline is shown in Figure 9.
Figure 9: Workload timeline.
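The block structure of this timeline can be reproduced with a simple submission loop. The sketch below is illustrative, with placeholder paths, block names and inter-block delay rather than the exact values of the test; the per-job script job.sh is sketched after the step list below.

```bash
# Submit the audio files in 4 blocks, pausing between blocks so that the
# cluster drains and CLUES gets a chance to power nodes off (cf. Figure 9).
for block in fold1 fold2 fold3 fold4; do
    for f in /home/shared/input/"$block"/*.wav; do
        sbatch job.sh "$f"
    done
    sleep 3600   # placeholder inter-block delay
done
```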
Each processing job includes four steps:
• Download and install udocker (in case it is not already installed in the WN).

• Pull the audio classifier Docker image from Docker Hub (in case it is not already available in the WN from a previous pull operation).

• Create the container (in case it has not been created before).

• Process the audio file.

Notice that the first three steps are only carried out once per node, and they took an average total of 4 min 30 s. The audio processing step usually takes about 15-20 s. Each job processes an audio (WAV) file and creates a JSON output file with the labels of the most accurately predicted categories of sounds, together with links to both Google and Wikipedia.
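A job of this kind boils down to a few udocker invocations wrapped in a SLURM batch script. The following sketch of the job.sh used in the submission loop above is illustrative: the Docker Hub image name is assumed, and the in-container command is a hypothetical stand-in for the model's actual entry point.

```bash
#!/bin/bash
#SBATCH --job-name=audio-classify
#SBATCH --output=%x-%j.out

WAV="$1"   # one input file on the shared NFS volume

# Step 1: install udocker in user space if this WN does not have it yet.
command -v udocker >/dev/null 2>&1 || {
    pip install --user udocker
    export PATH="$PATH:$HOME/.local/bin"
}

# Steps 2-3: pull the image and create the container (no-ops when already done).
IMAGE=deephdc/deep-oc-audio-classification-tf   # assumed image name
udocker pull "$IMAGE"
udocker create --name=audio-clf "$IMAGE" 2>/dev/null || true

# Step 4: classify the file; 'classify_audio.py' is a hypothetical entry point.
udocker run -v /home/shared:/home/shared audio-clf \
    classify_audio.py "$WAV" > "${WAV%.wav}.json"
```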
Figure 10 shows the usage of all participating nodes during the test. Note that vnode-1 and vnode-2 correspond to the WNs initially deployed at CESNET, while the remaining WNs correspond to those dynamically deployed in AWS on demand. It took approximately 19 minutes for the AWS nodes to be deployed, configured and added to the SLURM cluster.

Figure 10: Cluster usage evolution.

Figure 11: Node state evolution.
Figure 11 shows the states of the nodes. The number of nodes in use (executing jobs) is shown in blue; green indicates that nodes are powering on (until they are fully configured and added to the cluster); orange is reserved for idle nodes, and purple is used for nodes that are powering off (until they are finally destroyed). The figure clearly shows the evolution of the cluster's status over time: the nodes are powered on when the first set of jobs arrives at the cluster (15:00), until all of them are powered on and in use (16:00). These 60 minutes correspond to the time required to power on three nodes (20 minutes each). Then, at 16:05, the first set of jobs finishes and all the nodes become idle. Finally, three of them are selected to power off until the next set of jobs arrives. However, the early arrival of new jobs made CLUES cancel the pending power-off operations on the nodes in use, so that only vnode-3 actually powered off. A problem appeared in this second step: vnode-5 was detected for a few minutes as "off" by the SLURM manager and, thus, CLUES marked the node as "failed" and powered it off to avoid unnecessary costs caused by failed VMs. Then, since there were remaining jobs, CLUES powered it on again. This introduced an additional delay until vnode-5 could start processing jobs. Lastly, in the final two steps, the behavior was once again as expected, and only vnode-5 was powered off and on, respectively, when the set of jobs ended and started again.

The total duration of the test was 5 hours and 40 minutes, with a total CPU usage of 20 hours. In particular, the total time needed to execute all the jobs was 5 hours and 20 minutes. Twenty extra minutes were required for the additional AWS nodes to power off correctly.

The PaaS Orchestrator workflow engine has a limitation in that it does not allow a deployment to be modified (nodes added or removed) while an update operation is in progress. As a result, Figures 10 and 11 show steps of about 20 minutes spent deploying additional nodes (15:00-15:20, 15:20-15:40 and 15:40-16:00).

The AWS WNs were executing jobs for 9 h 42 min, and about 5 hours were spent idle or in the power on/off processes. Therefore, 66% of the paid time of these nodes was used for effective job computation. Notice that the aforementioned limitation of the Orchestrator causes the nodes to spend a considerable amount of time in power on/off operations, since multiple node deployments cannot be performed simultaneously.

The total cost of the AWS resources allocated for this test was $0.75. This cost includes nearly 15 CPU hours for the three WNs deployed at AWS (5:31 for vnode-3, 4:45 for vnode-4 and 4:25 for vnode-5) and 6 extra hours for the vRouter instance.

Assuming that there had been no possibility of extending the cluster with AWS resources, the workload that required 9 h and 42 min of computation would have been distributed among the only two nodes available at the CESNET site. Thus, considering that the CESNET WNs have features similar to the selected AWS instance type, the test would have required approximately four extra hours to complete all the jobs. Note that the user of the cluster did not need to change anything in their deployment procedure, apart from submitting the jobs and waiting for them to finish. Furthermore, all the external network connections were made through secured VPN tunnels, ensuring the security and confidentiality of the data transferred among the nodes.
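The 66% figure follows directly from the measured busy and overhead times of the AWS nodes:

```latex
\frac{T_{\mathrm{busy}}}{T_{\mathrm{busy}} + T_{\mathrm{overhead}}}
  = \frac{9\,\mathrm{h}\,42\,\mathrm{min}}{9\,\mathrm{h}\,42\,\mathrm{min} + 5\,\mathrm{h}}
  = \frac{582\ \mathrm{min}}{882\ \mathrm{min}} \approx 0.66
```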
Therefore, being able to provision clusters across hybrid cloud deployments makes it possible to pool a significantly larger amount of resources in order to decrease the time required to process high-throughput computing tasks.
5. Conclusions and Future Work
This paper has introduced an architecture to automatically deploy hybrid virtual infrastructures across multiple cloud platforms. This has been exemplified for the use case of elastic virtual clusters, computing entities that can span geographical and administrative boundaries. These clusters pool computing resources from different cloud sites, integrated within a single batch system, and have the ability to deploy additional working nodes and terminate them when no longer needed.

A real-world workload, based on an existing dataset processed with available Deep Learning models from an open catalog, was adopted as the use case to demonstrate the effectiveness of the designed platform. The integration of resources from both an on-premises cloud and a public cloud provider proved the benefits of cloud bursting within a single computing entity, the virtual cluster. This allows users to be completely abstracted from deciding at which cloud sites the resources are actually provisioned.

The integration of the virtual router provided seamless connectivity among the virtual nodes of the cluster, including automated encryption of communications, all dynamically deployed as part of the virtual infrastructure provisioning.

Future work includes performing large-scale tests involving a larger number of cloud sites in order to determine the bottlenecks of the developed approach, as well as the integration of both CPU- and GPU-based resources, pooled from multiple cloud sites, within the same virtual cluster entity and made available to users via different batch queues. Furthermore, optimising the PaaS Orchestrator's ability to provision nodes in parallel will reduce the deployment time.

Another set of future objectives relates to the dynamic balancing of inter-cloud virtual network connections. The private virtual overlay networks based on the INDIGO Virtual Router, as explained in Section 3.5, already have the advantage of resembling actual physical "metropolitan area" networks (MAN) in their topology. They consist of multiple local networks (analogous to LANs) with routers directing traffic to and from distant networks, i.e., other LANs within the deployment. This is perceived as a benefit, since it makes it easy for platform users to understand the topology of their deployment without learning new networking concepts. What is currently missing from the design, though, is the dynamic identification of the best path for each data frame, which has become the hallmark of IP-based traffic on the Internet. While the vRouter can be configured to recognize and maintain connections to multiple routing counterpoints within the virtual infrastructure, only one such connection is used as the primary route; the others serve as "hot backups", ready to take over in case the primary connection (or vRouter central point) fails.

Extending the overall configuration of the overlay deployment so that it mimics actual physical networks even in this trait of automatic optimum (shortest) path lookup would be a logical and potentially quite worthwhile step forward.
6. Acknowledgement
The work presented in this article has been partially funded by the project DEEP Hybrid-DataCloud (grant agreement No 777435). GM and MC would also like to thank the Spanish "Ministerio de Economía, Industria y Competitividad" for the project "BigCLOE" with reference number TIN2016-79951-R. Computational resources at CESNET, used in the real-world use case, were supplied by the project "e-Infrastruktura CZ" (e-INFRA LM2018140) provided within the program Projects of Large Research, Development and Innovations Infrastructures.

References

[1] D. Kranzlmüller, J. M. de Lucas, P. Öster, The European Grid Initiative (EGI), in: Remote Instrumentation and Virtual Laboratories, Springer US, 2010, pp. 61–66. doi:10.1007/978-1-4419-5597-5_6.

[2] M. Altunay, P. Avery, K. Blackburn, B. Bockelman, M. Ernst, D. Fraser, R. Quick, R. Gardner, S. Goasguen, T. Levshina, M. Livny, J. McGee, D. Olson, R. Pordes, M. Potekhin, A. Rana, A. Roy, C. Sehgal, I. Sfiligoi, F. Wuerthwein, A Science Driven Production Cyberinfrastructure - the Open Science Grid, Journal of Grid Computing 9 (2) (2011) 201–218. doi:10.1007/s10723-010-9176-6.

[3] C. B. Medeiros, D. S. Katz, eScience today and tomorrow (Mar 2016). doi:10.1016/j.future.2015.10.016.

[4] P. Mell, T. Grance, The NIST Definition of Cloud Computing. NIST Special Publication 800-145 (Final), Tech. rep. (2011). URL http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-145.pdf

[5] Slurm Workload Manager. URL https://slurm.schedmd.com/

[6] D. Thain, T. Tannenbaum, M. Livny, Distributed computing in practice: The Condor experience (Feb 2005). doi:10.1002/cpe.938.

[7] E. M. Fajardo, J. M. Dost, B. Holzman, T. Tannenbaum, J. Letts, A. Tiradani, B. Bockelman, J. Frey, D. Mason, How much higher can HTCondor fly?, in: Journal of Physics: Conference Series, Vol. 664, Institute of Physics Publishing, 2015. doi:10.1088/1742-6596/664/6/062014.

[8] C. De Alfonso, M. Caballer, F. Alvarruiz, G. Moltó, An economic and energy-aware analysis of the viability of outsourcing cluster computing to a cloud, Future Generation Computer Systems 29 (3) (2013) 704–712. doi:10.1016/j.future.2012.08.014.

[9] INFN, INDIGO PaaS Orchestrator.

[10] CESNET, INDIGO Virtual Router. URL https://github.com/indigo-dc/ansible-role-indigovr

[11] University of Zurich, ElastiCluster. URL https://github.com/elasticluster/elasticluster

[12] MIT, StarCluster. URL http://web.mit.edu/stardev/cluster/

[13] J. E. Coulter, E. Abeysinghe, S. Pamidighantam, M. Pierce, Virtual clusters in the Jetstream cloud: A story of elasticized HPC, in: Proceedings of the Humans in the Loop: Enabling and Facilitating Research on Cloud Computing, HARC '19, ACM, New York, NY, USA, 2019, pp. 8:1–8:6. doi:10.1145/3355738.3355752.

[14] L. Yu, Z. Cai, Dynamic scaling of virtual clusters with bandwidth guarantee in cloud datacenters, in: IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications, 2016, pp. 1–9. doi:10.1109/INFOCOM.2016.7524355.

[15] L. Yu, H. Shen, Z. Cai, L. Liu, C. Pu, Towards bandwidth guarantee for virtual clusters under demand uncertainty in multi-tenant clouds, IEEE Transactions on Parallel and Distributed Systems 29 (2) (2018) 450–465. doi:10.1109/TPDS.2017.2754366.

[16] M. Caballer, C. De Alfonso, F. Alvarruiz, G. Moltó, EC3: Elastic cloud computing cluster, Journal of Computer and System Sciences 79 (8) (2013) 1341–1351. doi:10.1016/j.jcss.2013.06.005.

[17] G. Sipos, G. La Rocca, D. Scardaci, P. Solagna, The EGI Applications on Demand service, Future Generation Computer Systems 98 (2019) 171–179. doi:10.1016/j.future.2019.03.009.

[18] A. Calatrava, E. Romero, G. Moltó, M. Caballer, J. M. Alonso, Self-managed cost-efficient virtual elastic clusters on hybrid Cloud infrastructures, Future Generation Computer Systems 61 (2016) 13–25. doi:10.1016/j.future.2016.01.018.

[19] Azure, Azure CycleCloud. URL https://azure.microsoft.com/en-us/features/azure-cyclecloud/

[20] AWS, AWS ParallelCluster. URL https://aws.amazon.com/es/blogs/opensource/aws-parallelcluster/

[21] AWS Batch. URL https://aws.amazon.com/batch/

[22] ALCES Flight. URL https://alces-flight.com/

[23] Istio: Connect, secure, control, and observe services. URL https://istio.io/

[24] Submariner k8s project documentation website. URL https://submariner.io/

[25] Open Networking Environment. URL https://networkencyclopedia.com/open-network-environment/

[26] Open vSwitch.

[27] Cloudify. URL https://cloudify.co/

[28] D. Salomoni, et al., INDIGO-DataCloud: A platform to facilitate seamless access to e-science resources, Journal of Grid Computing 16 (2018). arXiv:1711.03334, doi:10.1007/s10723-017-9418-y. URL http://link.springer.com/10.1007/s10723-017-9418-y

[29] M. Caballer, G. Donvito, G. Moltó, R. Rocha, M. Velten, TOSCA-based orchestration of complex clusters at the IaaS level, Journal of Physics: Conference Series 898 (2017) 082036. doi:10.1088/1742-6596/898/8/082036.

[30] M. Barisits, T. Beermann, F. Berghaus, B. Bockelman, J. Bogado, D. Cameron, D. Christidis, D. Ciangottini, G. Dimitrov, M. Elsing, V. Garonne, A. di Girolamo, L. Goossens, W. Guan, J. Guenther, T. Javurek, D. Kuhn, M. Lassnig, F. Lopez, N. Magini, A. Molfetas, A. Nairz, F. Ould-Saada, S. Prenner, C. Serfon, G. Stewart, E. Vaandering, P. Vasileva, R. Vigne, T. Wegner, Rucio: Scientific data management, Computing and Software for Big Science 3 (1) (2019) 11. doi:10.1007/s41781-019-0026-3.

[31] M. Caballer, I. Blanquer, G. Moltó, C. de Alfonso, Dynamic management of virtual infrastructures, Journal of Grid Computing 13 (1) (2015) 53–70. doi:10.1007/s10723-014-9296-5.

[32] D. Palma, M. Rutkowski, T. Spatzier, TOSCA Simple Profile in YAML Version 1.1, Tech. rep. (2016). URL http://docs.oasis-open.org/tosca/TOSCA-Simple-Profile-YAML/v1.1/TOSCA-Simple-Profile-YAML-v1.1.html

[33] C. de Alfonso, M. Caballer, F. Alvarruiz, V. Hernández, An energy management system for cluster infrastructures, Computers & Electrical Engineering 39 (8) (2013) 2579–2590.

[34] C. de Alfonso, M. Caballer, A. Calatrava, G. Moltó, I. Blanquer, Multi-elastic Datacenters: Auto-scaled Virtual Clusters on Energy-Aware Physical Infrastructures, Journal of Grid Computing 17 (1) (2019) 191–204. doi:10.1007/s10723-018-9449-z.

[35] S. R. Addepalli, R. Sood, OVN4NFV-K8s Plugin. URL https://github.com/opnfv/ovn4nfv-k8s-plugin

[36] DEEP Audio Model. URL https://marketplace.deep-hybrid-datacloud.eu/modules/deep-oc-audio-classification-tf.html

[37] DEEP Open Catalog. URL https://marketplace.deep-hybrid-datacloud.eu

[38] AudioSet: A large-scale dataset of manually annotated audio events. URL https://research.google.com/audioset/

[39] J. Gomes, E. Bagnaschi, I. Campos, M. David, L. Alves, J. Martins, J. Pina, A. López-García, P. Orviz, Enabling rootless Linux Containers in multi-user environments: The udocker tool, Computer Physics Communications 232 (2018) 84–97. arXiv:1711.01758, doi:10.1016/j.cpc.2018.05.021.

[40] J. Salamon, C. Jacoby, J. P. Bello, A dataset and taxonomy for urban sound research, in: MM 2014 - Proceedings of the 2014 ACM Conference on Multimedia, Association for Computing Machinery, 2014, pp. 1041–1044. doi:10.1145/2647868.2655045.