MEDAL: An AI-driven Data Fabric Concept for Elastic Cloud-to-Edge Intelligence
Vasileios Theodorou, Ilias Gerostathopoulos, Iyad Alshabani, Alberto Abelló, David Breitgand
Abstract
Current Cloud solutions for Edge Computing are inefficient for data-centric applications, as they focus on the IaaS/PaaS level and miss the data modeling and operations perspective. Consequently, Edge Computing opportunities are lost due to cumbersome and data-asset-agnostic processes for end-to-end deployment over the Cloud-to-Edge continuum. In this paper, we introduce MEDAL, an intelligent Cloud-to-Edge Data Fabric to support Data Operations (DataOps) across the continuum and to automate management and orchestration operations over a combined view of the data and resource layers. MEDAL facilitates building and managing data workflows on top of existing flexible and composable data services, seamlessly exploiting and federating IaaS/PaaS/SaaS resources across different Cloud and Edge environments. We describe the MEDAL Platform as a usable tool for Data Scientists and Engineers, encompassing our concept, and we illustrate its application through a connected cars use case.
Modern consumers seek personalized, innovative services and a superior user experience that can only be achieved through novel data-driven technologies. Connected Cars, Smart City and Industry 4.0 are notable examples of domains that are backed by mission-critical applications, fueled by and heavily dependent on data. Such applications typically need to process vast amounts of data at various levels to extract actionable information in a timely, reliable and privacy-preserving manner.
Vasileios Theodorou, Intracom Telecom, Peania, Greece, e-mail: [email protected]

Ilias Gerostathopoulos, Vrije Universiteit Amsterdam, Amsterdam, Netherlands, e-mail: [email protected]

Iyad Alshabani, BitSparkles, Sophia Antipolis, France, e-mail: [email protected]

Alberto Abelló, Universitat Politècnica de Catalunya, Barcelona, Spain, e-mail: [email protected]

David Breitgand, IBM, Haifa, Israel, e-mail: [email protected]
The emergence of Cloud computing has been a huge leap forward in effectively hosting applications, with on-demand resources and a pay-as-you-go business model significantly simplifying management and reducing upfront investment costs. With the advances in virtualization and cloud-native technologies and the abundance of devices, efficient ways have emerged to store and process data away from centralized data centers and "on the Edge", i.e., closer to or even right at the data sources. This emerging service delivery paradigm, referred to as Edge Computing, promises decreased latency, which is of paramount importance for time-critical applications, e.g., autonomous driving, as well as more efficient utilization of both the currently ubiquitous computation resources (smartphones, telecom servers, cars' on-board units, IoT devices, etc.) and communication bandwidth. Equally importantly, it enables the efficient analysis of data that, due to practical, legal, or confidentiality constraints, are not allowed to leave the environments in which they were generated, or whose transfer entails great performance or other costs. Edge environments are well represented in the offerings of major Cloud providers, usually as enablers for Internet of Things (IoT) scenarios (AWS IoT Greengrass, Azure IoT Edge, Google Cloud IoT Core).

The problem is that existing Edge computing resources are underutilized, while the network is overutilized. The reason is that, although modern Edge offerings support some data preparation and pre-processing services at the Edge, data still needs to be transferred to a central location to be properly analysed. Deploying and managing data analytics applications at the Edge is still not straightforward, since existing Cloud solutions for the Edge miss the data modeling and operations perspective. Instead, they focus on the infrastructure and platform level, oblivious to the applications running on top of them. This view encumbers the flow of intelligence from the Cloud to the Edge, e.g., decision-making processes of when to move data analytics tasks between Cloud and Edge versus when to move data. Thus, existing Cloud solutions for the Edge cannot actively support organizations in the continuous development, operations, and lifecycle management of data analytics applications (DataOps) [5], which is essential for effectively leveraging data for competitive advantage.
Overall, there is no solution yet that supports advanced DataOps on the Cloud-to-Edge continuum, despite the abundance of mature, yet disconnected, Cloud solutions for data analytics at the Edge.
To illustrate this problem, we consider the scenario of continuously collecting data from a large number of vehicles and combining them with other context data, such as that from wearable sensors and smartphones, to detect driving behaviors. The data must be analysed so that statistics over large datasets can be calculated and AI/ML prediction models can be trained to identify correlations and mine frequent patterns. This scenario includes performing anomaly detection to identify different safety-related events, e.g., sudden loss of driver's focus, and scoring of driver behavior. Nevertheless, the driver's sensitive data produced within the car may not be allowed to leave the vehicle, or may entail privacy restrictions on being shared among different service providers. In addition, as the number of cars increases, there is a significant rise in the volume of data that needs to be analysed, as well as in the complexity of the required data and model management. The challenge then is to deploy, test, execute, and manage service components in the most efficient way, both regarding response time and resource utilization (network bandwidth, compute, storage), from the vehicle to the Cloud, while at the same time respecting data privacy restrictions.

Fig. 1 Continuous data application life-cycle management on the Cloud-to-Edge continuum.

Another challenge is the methodological gap in how data scientists can deal with the challenges of the Cloud-to-Edge continuum, i.e., the volatility and dynamicity of resources, the varying quality and utility of diverse data sources all along the data path, and the difficulty in discovering and managing relevant data assets [10]. A crucial question is how to abstract and obtain a data-centric view of the underlying infrastructure and assets, while at the same time considering the capabilities and opportunities they offer and avoiding vendor lock-in effects. Essentially, the data scientist should be concerned with the data aspects of the analytics workflows, which in turn poses a requirement for sophisticated automation mechanisms to handle and optimize infrastructure and deployment aspects, as well as dataset and data model management and operation, even across operational domains.
Our proposed solution aims to support data scientists in the DataOps activities of building and maintaining data analytics applications of high flexibility and quality on the Cloud-to-Edge continuum, optimally utilizing Cloud/Edge resources and services (Fig. 1). In particular, it aims to contribute to the evolution of Cloud services for data analytics in the Cloud-to-Edge continuum by:

• Introducing the MEDAL concept: an Intelligent Data Fabric as a continuum on the data application layer, formed by the federation of semantically enabled, cloud-native, data-centric constructs acting as building blocks;
• Offering a platform for AI-driven Cloud-to-Edge DataOps that provides the data scientist with a comprehensive data-centric view over Edge/Fog/Cloud assets, as well as the ability to manage and automate the lifecycle and operation of data-intensive analytics and ML workflows, deployed in a distributed fashion that respects data locality and the cost models of data operations.

To this end, we introduce innovations in the areas of: (i) a cloud-native Data Fabric across the continuum; (ii) DataOps over the Cloud-to-Edge continuum; (iii) AIOps for runtime adaptations over the continuum; and (iv) semantic representation and management of Cloud-to-Edge resources and data assets, to ultimately provide a flexible, scalable, and cost-effective platform for Cloud-to-Edge intelligence. In Sec. 2, we describe the main concepts and methodologies empowering our approach; in Sec. 3, we showcase the application of MEDAL on an illustrative connected cars use case; finally, in Sec. 4, we conclude with our remarks.

Fig. 2 The MEDAL Platform for an intelligent continuum of cloud-native Data Fibers.
We adopt a data architectural angle where the main structural component of our data analytics workflow is the Data Fiber, which we define as a homogeneous wrapper of data assets and services at the data layer. We consider that the Data Fabric is formed by the federation of Data Fibers of different volumes and capacities on the Cloud-to-Edge continuum (Fig. 2). The Data Fabric facilitates data representation, storage, processing, access and exchange, and can be realized using Data Lake technologies [7] in a distributed manner. The high flexibility and configurability provided by Data Fibers as our structural units primarily stems from following cloud-native principles, according to which data ingestion and state are decoupled from data processing and analytics. This allows for paying the effort and cost of data transformation/integration on demand, when it is required.

At the deployment level, Data Fibers across the continuum are realized as containerized micro-services with a focus on scalability and resilience. Data Fibers are equipped with advanced data profiling and summarization mechanisms, as well as with cloud-native capabilities at the resource layer, fostering rapid instantiation of data workflows over collected data where the data resides, a concept also known as in-situ processing [8]. Data Fibers may belong to one or more administrative domains (e.g., in multi-cloud setups) and need to interconnect and interoperate. Moreover, Data Fibers are highly dynamic and volatile, making it essential to manage their efficient and automated cloud-native orchestration at the infrastructure layer, including primitives such as dynamic provisioning/decommissioning, auto-scaling and migration, triggerable via declarative interfaces.

We envision the MEDAL Platform, a platform to elastically manage and orchestrate Data Fibers and their federations over the continuum, while offering a unified view over the underlying data assets and resources to Data Scientists and Engineers (Fig. 2). Thus, the MEDAL Platform composes an intelligent Data Fabric for managing heterogeneous data and resources adaptively and on demand, facing versatile needs and requirements. To achieve these objectives, MEDAL builds on innovative DataOps principles, tools and techniques for managing the complete lifecycle of data applications; AIOps mechanisms for intelligent response to observed events and evolving requirements; and semantic annotation of data assets and metadata management processes, as we further describe in the following subsections.
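The orchestration primitives mentioned above (provisioning/decommissioning, auto-scaling, migration) can be illustrated with a minimal, purely hypothetical sketch; the `FiberSpec` schema and `FiberOrchestrator` API below are our own illustrative names, not part of an existing MEDAL implementation.

```python
from dataclasses import dataclass


@dataclass
class FiberSpec:
    """Hypothetical declarative descriptor for a Data Fiber."""
    name: str
    level: str                 # "edge", "fog", or "cloud"
    replicas: int = 1
    cpu_millicores: int = 500
    memory_mb: int = 512


class FiberOrchestrator:
    """Toy orchestrator exposing the declarative primitives named in the text."""

    def __init__(self):
        self.fibers = {}

    def provision(self, spec: FiberSpec):
        # Dynamic provisioning of a new containerized Data Fiber
        self.fibers[spec.name] = spec

    def scale(self, name: str, replicas: int):
        # Auto-scaling: adjust the replica count of a running fiber
        self.fibers[name].replicas = replicas

    def migrate(self, name: str, target_level: str):
        # Migration across continuum levels (e.g., edge -> fog)
        self.fibers[name].level = target_level

    def decommission(self, name: str):
        # Decommissioning releases the fiber's resources
        del self.fibers[name]
```

In a production setting these operations would map to container-platform APIs (e.g., a Kubernetes operator), but the declarative shape of the interface is the point of the sketch.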
Recently, the DataOps paradigm has emerged as a catalyst for data workflow automation, aiming to streamline data operations, accelerate data application development and foster quality and continuous improvement throughout all phases of data workflow development and operation [5, 2]. DataOps combines ideas from agile methodologies, DevOps, and lean manufacturing, and tries to deal with changing requirements and accelerate time to market, break the silos between development and operations, and improve quality by reducing non-value-add activities [1]. DataOps views the development of data analytics as a continuous process and focuses on how to make it iterate faster and with higher quality, by advocating both following best practices and using the right tools.

We tailor the DataOps paradigm and apply it to the development of data analytics in the Cloud-to-Edge continuum. We adopt the "infinite loop" of DataOps, according to which the development of data analytics passes through different phases: Planning, Composition, Testing, and Release of logical data workflows, and Orchestration, Adaptation, and Evaluation of deployed workflows on the continuum. Our DataOps framework includes both (i) methodological principles and best practices that guide data scientists and (ii) tools that help speed up the design and automate the testing, quality assurance, and deployment of data analytics workflows in the continuum. Contrary to other Cloud frameworks and platforms for Edge computing, we support the complete development lifecycle, from design to maintenance, and put emphasis on continuous integration and deployment of data analytics workflows. To this end, the MEDAL Platform incorporates tools for the following features:
Data Workflow Composition.
The MEDAL Platform provides data scientists with customized access to input their data queries as workflows and compositions of different data processing tasks. It exposes (i) visual editors for highly automated development (akin to mashup tools such as Node-RED), and (ii) script/code editors (akin to Jupyter Notebook) for end-users to directly input their code and define data services. Data Service Composition provides the logical model of a data workflow, which is further mapped by service orchestration tools to a physical model over the available resources. Data workflows can optionally be annotated with requirements on geographical restrictions, resource affinity, capacity (CPU, RAM, throughput), priority and isolation, for optimized mapping.
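A logical workflow of this kind, with optional requirement annotations on individual tasks, can be sketched as follows; the `Task`/`Workflow` model and the annotation keys are illustrative assumptions, not the platform's actual API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """One data-processing step in a logical workflow (illustrative model)."""
    name: str
    # Optional annotations, e.g. geographical restrictions or capacity needs
    requirements: dict = field(default_factory=dict)


@dataclass
class Workflow:
    """Logical data workflow: an ordered composition of tasks."""
    tasks: List[Task] = field(default_factory=list)

    def then(self, task: Task) -> "Workflow":
        self.tasks.append(task)
        return self


# A three-step workflow: annotated ingest, plain cleansing, annotated training
wf = (Workflow()
      .then(Task("ingest", {"geo": "EU", "cpu_millicores": 250}))
      .then(Task("cleanse"))
      .then(Task("train_model", {"priority": "high", "ram_mb": 4096})))
```

The annotations stay attached to the logical model, so a later orchestration step can use them when deriving the physical deployment.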
Monitoring Dashboard.
The MEDAL Platform provides visualizations for interactive data and analytics exploration, exposing information about the data analytics outputs and quality at the various application ensembles. In addition, it provides visual graphs for the health, status and availability of infrastructure, as well as log monitoring for event management, as exposed by the Cloud-to-Edge resources.
Autonomic Cloud-to-Edge Management & Orchestration.
The MEDAL Platform manages the flexibility and adaptivity of data workflows, as well as the provisioning and data-asset-aware coordination of Cloud/Fog/Edge services. In this respect, data workflow deployment (including both service binding and job scheduling) and quality control are performed in a resource-aware fashion, matching available resources' characteristics with data workflow requirements. Data service orchestration can be realised using open source workflow and data pipeline management tools.
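The resource-aware matching of workflow requirements against resource characteristics can be sketched with a deliberately simple greedy matcher; a real orchestrator would additionally optimize over cost models and data locality, and the fiber/requirement keys here are hypothetical.

```python
def match_fiber(workflow_reqs, fibers):
    """Return the first fiber whose capacities satisfy every requirement.

    `workflow_reqs` maps capacity names to minimum values, e.g.
    {"cpu": 4, "ram_mb": 2048}; fibers are dicts of offered capacities.
    """
    for fiber in fibers:
        if all(fiber.get(key, 0) >= needed for key, needed in workflow_reqs.items()):
            return fiber
    return None  # no fiber can host this workflow


# Two candidate fibers: a constrained on-board unit and a roadside unit
fibers = [
    {"name": "obu-3", "cpu": 2, "ram_mb": 1024},
    {"name": "rsu-7", "cpu": 8, "ram_mb": 8192},
]
chosen = match_fiber({"cpu": 4, "ram_mb": 2048}, fibers)
```

First-fit matching is only a baseline; the design choice worth noting is that requirements and capacities share one vocabulary, so the matcher stays oblivious to which level of the continuum a fiber sits on.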
Continuous Quality Control.
The MEDAL Platform provides data workflow testing and optimization environments for data engineers, for the continuous improvement of data and infrastructure compositions. These include mechanisms to create staging environments using Cloud and Edge nodes and test data, and to automate the testing of data workflows in those environments. In particular, input data quality is continuously estimated using the Semantic Knowledge Base described below. Once a data workflow passes its prescribed quality tests, it is deployed in production, where its quality and operation continue to be monitored and profiled.
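The promotion decision from staging to production can be reduced to a quality gate over monitored metrics; the metric names and thresholds below are illustrative placeholders.

```python
def quality_gate(metrics, thresholds):
    """Return True only if every monitored metric meets its threshold.

    A workflow in staging is promoted to production only when this passes;
    the same check can keep running against production metrics afterwards.
    """
    return all(metrics.get(name, 0.0) >= bound
               for name, bound in thresholds.items())


# Example staging run against hypothetical quality thresholds
passed = quality_gate({"accuracy": 0.93, "completeness": 0.99},
                      {"accuracy": 0.90, "completeness": 0.95})
```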
The inclusion of Edge devices, Fog nodes and corresponding services into the pool of Cloud resources introduces new challenges related to volatility, mobility, dynamicity and capacity limitations [10]. Advanced IT operations over such complex and dynamic environments are necessary for maintaining the quality of deployed data analytics while minimizing resource usage costs. The challenge here is to introduce Edge Intelligence [4] mechanisms both (i) for supporting the elastic lifecycle management and interoperation of Data Fibers (i.e., the data infrastructure layer of our approach) and (ii) for managing the distributed nature of data analytics and ML pipelines spread across the continuum. However, such intelligent mechanisms can only take place with the appropriate visibility and reaction over performance data across all disparate Cloud-to-Edge resources. AIOps [6] has recently been proposed as an effective paradigm to exploit AI/ML techniques for IT operations automation, by correlating data across different interdependent environments and providing real-time, actionable insights over system behaviors, as well as recommendations and (semi-)automated corrective actions. AIOps services provide timely awareness and proactive actions over service quality degradation, resource utilization changes and system misconfigurations, using event management mechanisms combined with application logic to identify root causes and to trigger appropriate restorative management workflows.

We adopt an AIOps angle of high automation with services of built-in intelligence, where runtime adaptation mechanisms play a central role in closing the loop from issue detection or prediction to autonomic response. Adaptation mechanisms are crucial for managing the unpredictability of resources' and services' availability, as well as for accounting for the varying availability and quality of data along the Cloud-to-Edge continuum, which can also continuously change. Runtime adaptation primitives are instilled into proactive management workflows and include:
Quality-driven scheduling: Re-allocation and re-scheduling of data collection and data analytics tasks to sensing/compute nodes, based on intelligent monitoring of data and analytics quality;
Flexible Data/ML Model deployment: Moving data models across levels (i.e., closer to the Cloud or closer to the Edge) to efficiently utilize resources and maintain analytics quality, affecting where data aggregation/model training [11] takes place and thus the necessity of transferring unaggregated/training data across levels;
Elasticity of Data Fibers: Dynamic provisioning, auto-scaling and migration primitives for the Data Fibers across the continuum, to respond to detected or predicted over-/under-utilization of resources and to adjust to evolving data analytics requirements (e.g., increasing sample size for higher accuracy).

Runtime adaptations follow the Monitor-Analyze-Plan-Execute over Knowledge (MAPE-K) control loop. The Monitoring phase collects data at both the infrastructure and platform level (e.g., CPU load, memory consumption) and at the application level (e.g., application telemetry data on data analytics accuracy and precision, logs). The Analyze phase is responsible for preprocessing, combining, and applying AI/ML techniques to identify situations that trigger adaptations, also throwing relevant events. Such situations can be both negative (e.g., reduced output quality of a deployed data workflow) and positive (e.g., the addition of Edge nodes bringing an opportunity to increase service availability). In the Plan phase, different adaptation actions or plans are determined and compared to each other. If more than one plan is available, a decision is taken either by involving a human operator or (to be fully autonomous) via prioritization based on the contribution of each plan to meeting certain predefined and prioritized goals (e.g., load balancing, increase of output quality). We should note here the importance of cost models [3, 9] for the evaluation of alternative plans; these play the role of a Knowledge Base and are continuously augmented with historical data from monitoring and reaction to past events. Finally, in the Execute phase, the selected plan is rolled out via the activation of a workflow comprising a series of concrete changes (e.g., provisioning of a new Data Fiber, decommissioning of another one, and starting a computation on the new Data Fiber).
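A single iteration of such a MAPE-K loop can be sketched as follows; the threshold, the candidate-plan names and their benefit scores are illustrative stand-ins for what the cost models in the Knowledge Base would provide.

```python
def mape_k_step(monitored, knowledge):
    """One toy MAPE-K iteration.

    Monitor: `monitored` is the latest telemetry snapshot.
    Analyze: detect a quality degradation against a threshold from the
             knowledge base.
    Plan:    rank candidate adaptations by their estimated benefit.
    Execute: return the chosen action (a real system would trigger an
             orchestration workflow here).
    """
    # Analyze: no triggering situation, no adaptation
    if monitored["output_quality"] >= knowledge["quality_threshold"]:
        return None
    # Plan: pick the plan with the highest estimated contribution
    plans = knowledge["candidate_plans"]
    best = max(plans, key=plans.get)
    # Execute: hand the selected plan to the orchestrator
    return best


knowledge = {
    "quality_threshold": 0.9,
    # Benefit estimates as the cost models might supply them (hypothetical)
    "candidate_plans": {"scale_out_fiber": 0.7, "migrate_model": 0.4},
}
action = mape_k_step({"output_quality": 0.82}, knowledge)
```

The loop's Knowledge component is just a dict here; the structure still shows how analysis, planning and execution stay decoupled.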
Semantic interoperability takes place both at the data layer and at the infrastructure resource layer, to seamlessly manage heterogeneous resources and services across the Cloud-to-Edge continuum. This can only be achieved with semantically rich information about available computing resources and data assets, combined with appropriate mechanisms for persisting and exchanging such information. In this respect, we introduce the concept of a decentralized Semantic Knowledge Base that acts as the source of information used both by (i) management entities, to monitor Data Fibers and obtain a unified view over the available resource, data, and service assets, and by (ii) assets, to discover and interoperate with each other. This information includes metadata about infrastructure resource characteristics, data assets (data sources, schemas, profiling, data quality, information available, etc.) and monitored runtime state (utilization, active sessions). The Semantic Knowledge Base is also enhanced with predictive cost models for future performance estimations that provide recommendations about the deployment of analytics workflows over the available resources.

To be scalable and allow partially autonomous operation, the Semantic Knowledge Base is decentralized, i.e., there is no central node which keeps track of the metadata in the whole continuum. Instead, nodes form metadata exchange clusters dynamically and only share with other clusters in the continuum the metadata necessary for inter-cluster provisioning and management of data workflows. This way, a single point of failure is avoided and a certain degree of autonomicity in resource management and scheduling of operations is retained by each cluster (which could also be a single node). While the discovery and metadata-exchange process can be continuous, the establishment of federations between Data Fibers on the continuum can take place on demand and have a temporal nature.
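The cluster-local metadata with selective inter-cluster sharing can be modeled with a small sketch; the `MetadataCluster` class and its `shareable` flag are illustrative assumptions about how such a policy could be expressed.

```python
class MetadataCluster:
    """Toy model of one cluster in the decentralized Semantic Knowledge Base.

    Full metadata stays local; only entries marked shareable are pushed to
    peer clusters, mirroring the selective inter-cluster exchange in the text.
    """

    def __init__(self, name):
        self.name = name
        self.local = {}    # full metadata for this cluster's assets
        self.remote = {}   # metadata learned from peer clusters

    def register(self, asset, meta, shareable=False):
        self.local[asset] = {"meta": meta, "shareable": shareable}

    def exchange_with(self, peer):
        """Symmetric exchange of only the shareable metadata subsets."""
        for asset, entry in self.local.items():
            if entry["shareable"]:
                peer.remote[asset] = entry["meta"]
        for asset, entry in peer.local.items():
            if entry["shareable"]:
                self.remote[asset] = entry["meta"]
```

Because each cluster decides what is shareable, no node ever holds global state, which is exactly what removes the single point of failure.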
Modern cars are equipped with a plethora of sensors, enabling a variety of services in the context of safety, control and entertainment. Insurance companies, as well as city and road safety administrators and fleet owners, are particularly interested in automated car analytics such as driving behavior analysis (DBA) and predictive maintenance. The value chain ranges from processing units and actuators embedded in the car, to service providers using car data to provide advanced connected services (for the driver, for the manufacturer, for the city, etc.). The execution of analytics over generated data can take place inside the vehicle's Onboard Units (OBUs), at centralized Cloud environments, or at intermediary nodes along the Edge-to-Cloud data path (i.e., Edge/Fog nodes), such as Roadside Units (RSUs) or cellular network infrastructure (Mobile Base Stations acting as MEC points of presence).

Fig. 3 Intelligent Data Fabric for Connected Cars applications.

In Fig. 3, we depict the application of the MEDAL concept to this use case. Data Fibers are instantiated through the MEDAL Platform as interconnected containers at the different levels, forming an intelligent Data Fabric. Onboard the car, at the OBU, the Data Fiber collects data from car sensors and performs local storage and processing (i) for ML inference tasks, such as the diagnosis of hazardous driving behavior or its prediction from vital signs; (ii) for local model training in the case of distributed machine learning data applications, e.g., collection of sensitive (DBA) data from thousands of drivers and federated learning of correlations without any raw data actually leaving any car; and (iii) for data preparation, so that data can be transformed and cleansed accordingly before being moved to higher-level Data Fibers. At the Edge/Fog nodes (RSU, Base Station, etc.), the Data Fiber performs model averaging over the model parameters received from the Data Fibers on the various cars, or data aggregation. On the powerful centralized Cloud, the Data Fiber performs global model training and advanced analytics tasks, possibly interoperating with other available services in the Cloud. The scheduling of tasks between the different levels, as well as the activation, termination, scaling and migration of Data Fibers, is managed in an automated fashion by the platform, according to the availability and cost of resources (e.g., the density of cars over a particular geographical area at a particular time) and changing application requirements which depend on situational awareness (e.g., spawning Data Fibers in multiple cars close to a traffic collision).
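The model-averaging step performed by the Edge/Fog Data Fiber can be sketched as a weighted parameter average in the style of federated averaging [11]; representing each car's model as a flat list of floats with a sample count is a simplification for illustration.

```python
def federated_average(client_updates):
    """Average model parameters from per-car Data Fibers.

    `client_updates` is a list of (params, n_samples) pairs, where `params`
    is a list of floats. Each car's contribution is weighted by its local
    sample count, so only parameters, never raw data, leave the vehicle.
    """
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [sum(params[i] * n for params, n in client_updates) / total
            for i in range(dim)]


# Two cars report local model parameters together with their sample counts
avg = federated_average([([0.2, 1.0], 100), ([0.4, 0.0], 300)])
```

The averaged parameters would then flow upward to the Cloud fiber for global model training, while the per-car training data stays on the OBUs.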
In this work, we have introduced MEDAL, a novel concept for the efficient management of the complete lifecycle of data applications deployed all along the Cloud-to-Edge continuum. We developed the notion of an intelligent Data Fabric composed of Data Fibers, our semantically enabled, cloud-native, distributed building units that can dynamically launch, federate and scale on and across the different levels of the Cloud-to-Edge continuum. We described the DataOps, AIOps and semantic annotation principles underpinning MEDAL, and we illustrated our approach through a use case from the connected cars domain. In contrast with existing Cloud solutions, MEDAL fully exploits available knowledge about data assets over the continuum and uses this information to provide a unified data and monitoring view to application developers, as well as to make informed decisions about the management, orchestration and adaptation of data workflows. As a next step, we plan to build a prototype of the MEDAL Platform and conduct large-scale experiments to assess its benefits for interesting distributed learning scenarios.
References
1. Christopher Bergh, Gil Benghiat, and Eran Strod. The DataOps Cookbook, 2nd Edition. DataKitchen, 2019.
2. Antonio Capizzi, Salvatore Distefano, and Manuel Mazzara. From DevOps to DevDataOps: Data Management in DevOps Processes. In Software Engineering Aspects of Continuous Development and New Paradigms of Software Production and Deployment, pages 52–62, 2020.
3. M. Casimiro, D. Didona, P. Romano, L. E. T. Rodrigues, and W. Zwaenepoel. Lynceus: Tuning and provisioning data analytic jobs on a budget. CoRR, abs/1905.02119, 2019.
4. S. Deng, H. Zhao, W. Fang, J. Yin, S. Dustdar, and A. Y. Zomaya. Edge intelligence: The confluence of edge computing and artificial intelligence. IEEE Internet of Things Journal, 7(8):7457–7469, 2020.
5. Julian Ereth. DataOps - Towards a Definition. In LWDApp, pages 104–112, 2018.
6. Adnan Masood and Adnan Hashmi. AIOps: Predictive Analytics and Machine Learning in Operations. In Cognitive Computing Recipes: Artificial Intelligence Solutions Using Microsoft Cognitive Services and TensorFlow, pages 359–382, 2019.
7. F. Nargesian, E. Zhu, R. J. Miller, K. Q. Pu, and P. C. Arocena. Data Lake Management: Challenges and Opportunities. Proc. VLDB Endow., 12(12):1986–1989, 2019.
8. Joy Rahman and Palden Lama. MPLEX: In-Situ Big Data Processing with Compute-Storage Multiplexing. In IEEE MASCOTS 2017, Banff, Canada, pages 43–52, 2017.
9. V. A. Stefanidis, Y. Verginadis, D. Bauer, T. Przezdziek, and G. Mentzas. Reconfiguration penalty calculation for cross-cloud application adaptations. In , pages 355–362, 2020.
10. B. Varghese, N. Wang, S. Barbhuiya, P. Kilpatrick, and D. Nikolopoulos. Challenges and Opportunities in Edge Computing. In IEEE Int. Conf. on Smart Cloud (SmartCloud), 2016.
11. Qiang Yang, Yang Liu, Tianjian Chen, and Yongxin Tong. Federated Machine Learning: Concept and Applications.