A Self-Integration Testbed for Decentralized Socio-technical Systems



Farzam Fanitabasi (a, ∗), Edward Gaere (a), Evangelos Pournaras (b)
(a) Professorship of Computational Social Science, ETH Zurich, Zurich, Switzerland
(b) School of Computing, University of Leeds, Leeds, UK

Abstract

The Internet of Things (IoT) comes with new challenges for experimenting with, testing, and operating decentralized socio-technical systems at large scale. In such systems, autonomous agents interact locally with their users, and remotely with other agents, to make intelligent collective choices. Through these interactions they self-regulate the consumption and production of distributed (common) resources, e.g., self-management of traffic flows and power demand in Smart Cities. Such complex systems are often deployed and operated using centralized computing infrastructures. However, the socio-technical nature of these decentralized systems requires new value-sensitive design paradigms that empower trust, transparency, and alignment with citizens' social values, such as privacy preservation, autonomy, and fairness among citizens' choices. Instruments and tools to study such systems are currently missing, or are not practical in this distributed socio-technical context. The same holds for tools that guide the prototyping process from simulation, to live deployment, and ultimately to robust operation at a high Technology Readiness Level (TRL). This paper bridges this gap by introducing a novel testbed architecture for decentralized socio-technical systems running on IoT. This new architecture is designed for seamless reusability of (i) application-independent decentralized services by an IoT application, and (ii) different IoT applications by the same decentralized service. This dual self-integration promises IoT applications that are simpler to prototype and that can interoperate with decentralized services during runtime to self-integrate more complex functionality, e.g., data analytics and distributed artificial intelligence. Additionally, such integration provides stronger validation of IoT applications and improves resource utilization, because computational resources are shared, which cuts deployment and operational costs. Pressure and crash tests during continuous operation over several weeks confirm the robustness and practicality of the testbed architecture. These tests involved more than 80K agents joining and leaving the network, 2.4M parameter changes, and 100M communicated messages. This work promises new pathways for managing the prototyping and deployment complexity of decentralized socio-technical systems running on IoT, whose complexity has so far hindered the adoption of value-sensitive self-management approaches in Smart Cities.

Keywords:

Internet of Things, Testbed architecture, Socio-technical system, Self-integration, Decentralized systems, Multi-agent system

1. Introduction

The Internet of Things (IoT) radically transforms how complex socio-technical systems are designed, operated, and managed. Smart Cities turn into organic ecosystems of ubiquitous sensors, autonomous vehicles, and personal pervasive devices that are massively interconnected and distributed [1, 2, 3]. New opportunities arise to control and manage socio-technical systems in real time as a means to cope with uncertainties and continuous change [4, 5, 6]. One such opportunity is self-improving socio-technical operations that seamlessly self-integrate decentralized services that measure, learn, optimize, and adapt [7, 8, 9], e.g., load-balancing transport or power networks to prevent traffic congestion and blackouts. However, IoT complexity, heterogeneity, scale, infrastructural cost, and privacy concerns have so far limited broader experimentation and research on designing such general-purpose services [2, 10].

∗ Corresponding author. Stampfenbachstrasse 48, 8092 Zurich, Switzerland. Email: [email protected]

Nevertheless, decentralized systems exhibit properties that can empower values by design in a socio-technical context: (i) They can better preserve privacy by processing sensitive information locally and allowing informational self-determination [11, 12]. (ii) They are more transparent against algorithmic nudging and manipulation, as data are not centrally located and users preserve their autonomy [13, 14]. (iii) They can be designed to promote social welfare, such as fairness [15, 16]. Therefore, their adoption in socio-technical IoT applications of Smart Cities has a social and sustainability impact.

Prototyping and testing decentralized socio-technical systems that continuously change and adapt is challenging. Self-improving system integration (SISSY) [4, 5, 6] is a means to cope with complex system dynamics as well as user and network uncertainties. Testing it in the real world, however, remains largely an ad hoc process.
This paper introduces a new IoT testbed architecture with a novel dual self-integration capability: (i) An IoT application integrates several application-independent and modular decentralized services to compose low-cost complex functionalities without changing the application implementation. (ii) A decentralized service is integrated into several IoT applications without changing the service implementation. This reusability is made possible by abstracting the software engineering complexity and interactions within two software agents under the user's control: the application agent and the service agent.

Preprint submitted to Elsevier, July 23, 2020

Prototyping IoT applications and services, in particular to support multiple self-integration scenarios, requires testing and refinements at multiple stages. These start from simulations, move to live deployments, and ultimately reach operations at a high Technology Readiness Level (TRL). Maintaining different implementations, or changing the code back and forth to validate new functionality, is costly and complex [17, 3]. Experience shows that such flexibility for decentralized multi-agent systems is extremely scarce [1, 2]. In practice, existing toolkits cannot serve the envisioned self-integration scenarios. This paper overcomes this barrier with the Livepeer toolkit, a prototyping toolkit that extends and improves earlier work [18]. Livepeer provides:

- support for IoT devices, e.g., software agents running on smartphones;
- a new efficient networking module;
- a new scalable logging infrastructure for system monitoring and analysis;
- an improved design that limits the severe memory leaks and synchronization problems of the earlier version.

https:// /directorates/heo/scan/engineering/technology [Last accessed: May 2020]

The testbed architecture with the two software agents is evaluated experimentally with real-world data under long-lasting pressure and crash tests over several weeks. As a proof of concept, the complex operations of two decentralized socio-technical services are integrated in Livepeer: (i) I-EPOS (Iterative Economic Planning and Optimized Selections) [8], and (ii) DIAS (Dynamic Intelligent Aggregation Service) [9]. I-EPOS performs decentralized combinatorial optimization using learning agents with structured interactions. In contrast, DIAS performs real-time collective measurements over a dynamic unstructured network of agents. In this network, agents can arbitrarily join, leave, or fail, while their input data continuously change. Both services empower highly sophisticated IoT application scenarios, such as traffic flow optimization, power peak-shaving, load-balancing of bike-sharing stations, and participatory crowd-sensing of mobility and traffic [19, 16, 8, 9]. Results confirm the self-integration capability. The performance benchmarks validate the robustness of the testbed architecture in scenarios of continuous change and adaptation, with more than 80,000 agents joining and leaving the network, 2.4 million parameter changes, and 100 million communicated messages. The findings of this paper provide new insights to communities, government bodies, system operators, and utilities on how to manage, operate, and regulate complex socio-technical IoT infrastructures.

In summary, the contributions of this paper are as follows: (i) A conceptual testbed architecture that facilitates dual self-integration: of various decentralized services by an IoT application, and of different IoT applications by a decentralized service. (ii) The realization of the conceptual testbed architecture, which abstracts the software engineering complexity into two software agents and their interactions in a generic communication protocol. (iii) An improved and extended distributed prototyping toolkit for decentralized socio-technical systems at TRL-6. (iv) Improvements of the I-EPOS software artifact [8], which moves from simulations to live deployment with a demonstrated TRL-6 continuous operation.
(v) A proof of concept based on the self-integration and experimental evaluation of two decentralized services for IoT applications in highly dynamic environments.

The rest of this paper is organized as follows: Section 2 reviews relevant previous work. Section 3 introduces the testbed architecture and the realization protocol. Section 4 describes the Livepeer toolkit, and Section 5 introduces the two studied services. Sections 6 and 7 present the experimental methodology and the evaluations, respectively. Finally, Section 8 concludes this paper and outlines future work.

2. Related Work

Self-adaptive frameworks have been introduced earlier to address the complexity, heterogeneity, and uncertainties [39, 40] of large-scale integrated networked systems, such as pervasive/ubiquitous computing and IoT [41]. Such frameworks study adaptive service composition in dynamic environments at runtime, using various techniques such as context-aware computing [42, 43], service re-selection heuristics [44], and parallel service execution [45, 46]. However, these frameworks often do not address the self-integration of different physical devices at runtime [39].

In the field of IoT, experimental facilities and physical testbeds have been the subject of extensive previous research and surveys [1, 2, 10, 38, 47, 48]. Physical testbeds equip researchers with deployed and ready-to-use physical devices. This simplifies the design and evaluation of novel IoT systems and services (e.g., network protocols, Big Data algorithms, city-wide IoT services) under realistic operational conditions [20, 21, 22, 23, 24, 25]. One example is SmartSantander [21] with a city-wide scale (∼ [...] /project-specific requirements determine the design and technological aspects (e.g., communication protocols) of sensors, smart objects, and middleware. This limits their reusability in different domains and applications. To tackle such challenges, the PaaS (platform-as-a-service) model has been studied and applied. The PaaS model leverages standard interfaces and interoperability measures. With these, it provides researchers with tools to rapidly develop, execute, and manage IoT systems without the complexity of building and maintaining the infrastructure [3]. This enables the design and deployment of cross-application IoT platforms [3]. Xively is an example of a distributed cloud-based application with a centralized control plane, where different tasks are executed on separate platforms and devices. For instance, application-level functions can be executed in different virtual and real entities to reduce latency and bottlenecks. Nevertheless, PaaS approaches often neglect socio-technical requirements, such as data locality, privacy, autonomy, and decentralized control.

Table 1: Comparison of related work.

Related Work | Paradigm | Abstraction | Data Locality | Privacy | Autonomy | Decentralized Control | Reusability | Interoperability
FIT IoT-Lab [20] | PT | H | - | - | - | - | H | Te + Sy
SmartSantander [21] | PT | H | - | ✓ | - | - | H | Te + Sy
City of Things [22] | PT | H | - | - | - | - | H | Te
CityLab [23] | PT | H | - | - | - | - | H | Te
SmartCampus [24] | PT | H | - | - | - | - | H | Te + Sy
MakeSense [25] | PT | H | - | ✓ | - | - | H | Te
VICINITY [26] | PaaS | H + D | - | ✓ | - | - | H + D | Te + Sy + Se
IoTbed [27] | TaaS | H | - | - | - | - | H | Te + Sy
Xively [28] | PaaS | D | - | - | - | - | D | Sy
Lysis [3] | AO-PaaS | D (AO) | ✓ | ✓ | ✓ | ✓ | D | Sy + Se
AoT [29] | AO-Arch | D (AO) | - | - | ✓ | - | D | Te + Sy
SIoT [30] | AO-Arch | D (AO) | - | - | ✓ | ✓ | - | Se
iSapiens [31] | AO-Arch | D (AO) | - | - | ✓ | - | D | Te + Sy
BEMOSS [32] | AO-Arch | D (AO) | - | - | - | - | D | Te + Sy
UBIWARE [33] | AO-Arch | D (AO) | - | - | - | - | D | Te + Sy + Se
FIoT [34] | AO-Arch | D (AO) | - | - | ✓ | ✓ | D | Te + Sy
ACOSO-Meth [17] | AO-Arch | D (AO) | ✓ | - | ✓ | - | D + S | Sy + Se
VIVO [35] | Framework | D | ✓ | ✓ | - | - | D | Sy
iCore [36] | Framework | D (AO) | - | ✓ | ✓ | - | D + S | Te + Sy + Se
Fluidware [37] | Framework | D | ✓ | ✓ | - | ✓ | D | Sy + Se
Proposed | AO-Arch | D (AO) + S | ✓ | ✓ | ✓ | ✓ | D + S | Sy

Data Locality, Privacy, Autonomy, and Decentralized Control are the socio-technical system design considerations (✓ = addressed, - = not addressed).

Symbols: PT = Physical Testbed, PaaS = Platform-as-a-Service, TaaS = Testbed-as-a-Service, AO-Arch = Agent-Oriented Architecture, H = Hardware, D = Device, S = Service, Te = Technical, Sy = Syntactical, Se = Semantical.

The abstraction column indicates which of three levels of abstraction is applied:
- H: hardware abstraction, which provides software routines to access the hardware via programming interfaces.
- D: device abstraction, which provides a virtualized counterpart for each IoT device at the system level. The (AO) marker indicates that the virtual counterpart is an agent. Agents are networked software components that autonomously perform specific tasks on behalf of a device/user by interacting with other agents and with their environment [17].
- S: service-level abstraction, which provides common communication protocols for IoT services.

Data locality refers to local processing of data, and autonomy is the ability of the device to autonomously interact and actuate its function. Decentralized control indicates the existence/lack of central control entities at the service level. Reusability refers to the ability to reuse the hardware (H), devices (D), or services (S) in different application scenarios. Interoperability indicates the communication paradigm used: technical refers to technological approaches (e.g., Bluetooth), syntactical to shared message formats, and semantical to the use of shared ontologies and knowledge representation [38].

Agent-based computing has been used extensively to enable cooperative, decentralized, dynamic, and open IoT systems [38].
In such systems, agents autonomously interact and cooperate through (typically) asynchronous message-passing mechanisms to perform a task or a service. Shared communication standards facilitate agent interoperability and allow heterogeneous resources to be incorporated. Examples of such systems include Lysis [3], which introduces a PaaS model with virtualized autonomous social agents, allowing for the deployment of fully distributed applications. ACOSO-Meth [17] introduces an agent-oriented architecture based on IoT smart objects. It also provides a taxonomy for assessing the system-level requirements and technological readiness of IoT systems. Agent-based approaches use device virtualization and address some socio-technical considerations. At the service level, however, they often suffer from a lack of standard interfaces and interoperability. Thus, to reuse a specific service in different applications, the code and communication protocol must change. The architecture proposed in this paper uses service abstraction to enable the reusability of devices and services in various application domains.

Xively: https://xively.com [Last accessed: May 2020]

IoT systems operate in increasingly dynamic and complex environments [37]. During system runtime, devices can fail, users might join/leave, communication across devices and agents can be disrupted, system goals and requirements can vary, and new services may be required. Not all such changes can be foreseen during the initial design phase. It is often infeasible for a centralized controller to learn of all such changes in a timely manner. Self-adaptive approaches, autonomic computing [49, 50], and hierarchical self-aware decision-making [51] have been studied as means to handle such changes at runtime with minimal human intervention [52, 53, 5, 50].
To this end, the proposed testbed architecture facilitates rapid prototyping and experimentation with IoT services. These services can handle dynamic environments (with high TRL) and can autonomously initialize and include various devices and services during runtime. Table 1 presents a non-exhaustive comparison with relevant previous research, providing insights into existing experimental IoT testbeds.

3. A Conceptual IoT Testbed Architecture for System Self-integration

This paper introduces a conceptual testbed architecture designed to enable seamless reusability and self-integration of (i) application-independent decentralized services by an IoT application, and (ii) different IoT applications by the same decentralized service. The architecture focuses on decentralized socio-technical IoT services built from autonomous agents, with no central authority coordinating the agents and their actions. These agents interact locally with their users and remotely with each other to make intelligent collective choices, through which they self-regulate the consumption and production of common resources. Hence, in this context, a service is essentially distributed software running on multiple agents. Examples of such services include monitoring services [54], real-time analytics [9], planning and coordination systems [8], learning techniques [55], and distributed control systems [7]. Figure 1 illustrates the conceptual testbed architecture and two examples of self-integrating different decentralized services with different IoT applications.

To enable this reusability and self-integration, the proposed architecture uses two levels of abstraction: (i) the IoT application level, and (ii) the decentralized service level. At the IoT application level, the abstraction creates application agents. An application agent is a piece of software residing on each user's IoT device that acts as middleware for communication with the decentralized service. These IoT devices provide sensing and actuation capabilities. They are of different types (e.g., sensors, mobile phones) and are geo-spatially distributed. At the decentralized service level, the abstraction creates service agents, which are one-to-one counterparts of the application agents. Each service agent has the following tasks: (i) Receiving data from the corresponding application agent (IoT device). (ii) Executing the service by interacting and cooperating with other service agents.
Finally, (iii) Providing the outcome of the service to the application agent (e.g., in the form of control commands). This dual abstraction decouples the internal operations of the IoT application from the complex functionality of the decentralized service. The first abstraction level (application to services) facilitates the inclusion of heterogeneous devices and the reuse of their applications in different services. The second abstraction level (service to applications) simplifies the reuse of decentralized services in new applications, because the interfaces, communication logic, and protocols are unaffected by changes in the IoT devices or applications. In production-ready systems, the application and service agents can run on the same computational node to reduce latency and provide data locality, e.g., on a user's device such as a smartphone. Alternatively, they can run on two remote nodes, e.g., a user's device and a cloud node or a crowdsourced community server.

Note that the comparisons and distinctions of the socio-technical considerations in Table 1 are based on the system design goals rather than on subsequent third-party augmentations and applications.

Table 2: System entities.

Entity | Explanation | Example (Figure 1a)
IoT Application | Control logic | Monitoring power demand
IoT Device | Sensing and actuation | Smart meters
IoT Service | Autonomous general-purpose agents | Collective measurements
Gateway | System bootstrapping proxy for service agents | Software
Service Operator | Setting up service agents and the gateway | Communities, power utility

The deployment and governance of the testbed depend on the target IoT application. The service agents are deployed and managed by the service operator. The operator can be a third-party mediator in sensing-as-a-service scenarios [47], or a community in participatory sensing applications [57]. Examples of third-party mediators include companies such as Waze, Uber, and Swiss Mobility. Environmental monitoring [58] and urban sensing [59] are examples of participatory sensing applications deployed and managed by service communities. Furthermore, in some settings a central authority, such as the municipality or the utility company, governs the system. Examples include Smart City scenarios run by municipalities [60], Smart Grids [61], and smart supply chains [62].
However, in participatory sensing scenarios, such as environmental monitoring [58] and urban sensing [59], users and the service community can self-govern the testbed as a public-good infrastructure.

& Runtime Cycle
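The gateway's bootstrapping role in this protocol can be illustrated with a minimal Python sketch that covers steps (ii) to (viii). The sketch is hypothetical and is not the Livepeer implementation. Only the message names and their fields come from the text. The class names, the snake_case field names, and the in-memory `outbox` that stands in for the network are assumptions. Following step (i), the sketched gateway only pairs agents and relays control messages and never reads sensing data. Steps (ix) and (x), which run between service and application agents, are not modeled.

```python
from dataclasses import dataclass, field

# Hypothetical message classes named after the protocol messages in the text;
# their fields mirror the listed tuples, e.g. regDevMsg:{devAddr, devInfo, servInfo}.
@dataclass
class RegDevMsg:
    dev_addr: str
    dev_info: dict
    serv_info: str

@dataclass
class AsgnAgnMsg:
    agn_addr: str

@dataclass
class ServReqMsg:
    serv_info: str
    serv_md: dict

@dataclass
class ReadyMsg:
    serv_info: str
    serv_md: dict

@dataclass
class AgnReadyMsg:
    agn_addr: str
    serv_info: str

@dataclass
class RunServMsg:
    serv_info: str


@dataclass
class Gateway:
    """Bootstrapping proxy: pairs application and service agents, never reads service data."""
    gw_addr: str
    serv_info: str
    free_agents: list  # service agents initialized by the service operator, step (i)
    assignment: dict = field(default_factory=dict)  # devAddr -> agnAddr, one-to-one
    ready: set = field(default_factory=set)
    started: bool = False
    outbox: list = field(default_factory=list)  # (destination, message) pairs to send

    def broadcast(self) -> dict:
        # Step (ii): broadcastMsg:{GWAddr, servInfo}
        return {"GWAddr": self.gw_addr, "servInfo": self.serv_info}

    def on_reg_dev(self, msg: RegDevMsg) -> AsgnAgnMsg:
        # Steps (iii)-(iv): assign a free service agent to the newly registered device.
        if msg.serv_info != self.serv_info:
            raise ValueError(f"unknown service: {msg.serv_info}")
        agn_addr = self.free_agents.pop(0)
        self.assignment[msg.dev_addr] = agn_addr
        return AsgnAgnMsg(agn_addr)

    def on_serv_req(self, msg: ServReqMsg) -> None:
        # Steps (v)-(vi): forward the request as readyMsg to every assigned agent.
        self.ready.clear()
        self.started = False
        for agn_addr in self.assignment.values():
            self.outbox.append((agn_addr, ReadyMsg(msg.serv_info, msg.serv_md)))

    def on_agn_ready(self, msg: AgnReadyMsg) -> None:
        # Steps (vii)-(viii): once every assigned agent confirmed, send runServMsg once.
        self.ready.add(msg.agn_addr)
        if not self.started and self.ready == set(self.assignment.values()):
            self.started = True
            for agn_addr in self.assignment.values():
                self.outbox.append((agn_addr, RunServMsg(msg.serv_info)))


# Example run with two devices and two service agents.
gw = Gateway("gw-addr", "DIAS", free_agents=["sa-1", "sa-2"])
asg1 = gw.on_reg_dev(RegDevMsg("app-1", {"type": "smart meter"}, "DIAS"))
asg2 = gw.on_reg_dev(RegDevMsg("app-2", {"type": "smartphone"}, "DIAS"))
gw.on_serv_req(ServReqMsg("DIAS", {"numAgents": 2}))
gw.on_agn_ready(AgnReadyMsg(asg1.agn_addr, "DIAS"))
gw.on_agn_ready(AgnReadyMsg(asg2.agn_addr, "DIAS"))
print([type(m).__name__ for _, m in gw.outbox])
# ['ReadyMsg', 'ReadyMsg', 'RunServMsg', 'RunServMsg']
```

The sketched gateway has no service-specific logic. This mirrors the application- and service-independence that the text attributes to the protocol.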

A distributed protocol is designed for communication and self-integration between the two abstraction levels. This generic communication protocol is application- and service-independent. It determines the communication logic and the common interfaces between service and application agents. Table 2 lists the system entities in the protocol according to Figure 1a. Figure 2 illustrates the protocol sequence diagram and runtime cycle. The protocol is outlined as follows:

(i) The service operator initializes the service agents and the gateway. The gateway acts as a bootstrapping proxy that connects the service agents to the corresponding application agents. The gateway is agnostic of the data and of the internal processes of IoT applications and decentralized services.
(ii) Application agents are assumed to know the public address of the gateway. The gateway makes this possible by sending broadcastMsg:{GWAddr, servInfo}, where GWAddr is the gateway address and servInfo indicates the service.
(iii) To connect to the decentralized IoT service, each application agent contacts the service gateway. It registers itself and its corresponding IoT device via regDevMsg:{devAddr, devInfo, servInfo}, where devAddr is the application agent's address and devInfo includes information such as device type and location.
(iv) In response, the gateway assigns a service agent to the IoT device and informs the application agent via asgnAgnMsg:{agnAddr}, where agnAddr is the address of the assigned service agent.
(v) The service operator submits a service request to the gateway via servReqMsg:{servInfo, servMD}, specifying the requested service and its execution metadata. servMD contains the metadata required to execute the service, such as the number of service agents, the number of devices, and their locations.
(vi) The gateway receives the service request and notifies the service agents via readyMsg:{servInfo, servMD}.
(vii) The service agents validate the service information along with the associated metadata and reply to the gateway via agnReadyMsg:{agnAddr, servInfo}.
(viii) When all agents are notified and ready to run the service, the gateway sends the execute-service command via runServMsg:{servInfo}.
(ix) Each service agent requests/receives data from the application agent via sensingMsg:{servInfo, data}. It submits control/actuation messages to the IoT device via actuationMsg:{servInfo, actuation}, either periodically, on demand, or at the end of the service execution.
(x) Finally, after the service is executed, the application agent, the gateway, and the service operator are informed.

In practice, the gateway does not need to be a separate entity and can be incorporated into a service agent.

Crowdsourced community servers: such as the Diaspora [56] foundation, diasporafoundation.org [Last accessed: May 2020]

Figure 1: Conceptual testbed architecture and two examples of application scenarios that self-integrate different decentralized services with different IoT applications: (a) Monitoring power demand and optimizing traffic. (b) Monitoring traffic and optimizing power demand. Note how switching the coupling of the two IoT applications with the two decentralized services seamlessly supports new application scenarios.

Figure 2: The communication protocol that realizes the conceptual testbed architecture. [Sequence diagram between Application Agent, Service Agent, Gateway, and Service Operator, covering messages (ii) to (ix).] The details of the messages are given in Section 3. This protocol treats the running services as black boxes and creates standard interfaces between the different components of the testbed. Hence, different devices and services can be self-integrated at runtime.

4. Prototyping Decentralized IoT Systems

Efficient and reliable performance requires continuous testing and refinement, from the early stages of simulation, to live deployment, to long-lasting stable operation. Moreover, maintaining different implementations is costly and complex when code is changed back and forth to validate new functionality and to expand to new application domains. To address these challenges, this paper introduces the Livepeer toolkit. Livepeer is based on an improved version of the general-purpose prototyping toolkit Protopeer [18], now made highly robust and efficient for long-term operations. Livepeer reaches a high Technology Readiness Level (TRL-6), and its new modules include: (i) The redesigned and reengineered Protopeer node, which provides core functionality such as communication protocols (e.g., TCP messaging), timers, and an execution environment for the service agents. (ii) Software clients, acting as application agents, that support IoT devices (e.g., smartphones). (iii) A networking module for efficient and reliable application-to-service and service-to-application communication. (iv) A scalable monitoring infrastructure, integrated into each computational node, for application/service monitoring and analysis.

& Reengineering Protopeer

The Protopeer toolkit [18] is designed with the main goal of facilitating the rapid prototyping of P2P applications and the transition from simulation to live environments. However, the transition from live environments to robust, long-lasting TRL-6 operations is not trivial, as the latter often have a larger scale and higher realism regarding performance degradation caused by network partitions and message losses. For instance, long-term operations require orders of magnitude more resources, incur a higher number of operations, and communicate larger volumes of messages. This results in the discovery of unforeseen system faults and deficiencies, such as sporadic message losses, deadlocks, synchronization issues, excessive thread counts, and memory leaks caused by evolving communication processes. To address the above challenges, this paper introduces a redesigned and reengineered version of Protopeer for highly efficient and robust long-lasting experimentation. In addition to several implementation-specific improvements, the redesigned Protopeer includes: (i) a redesigned communication and networking module, enabling multiple queues for message processing (Section 4.2), and (ii) a novel module for distributed monitoring and logging of services, events, and memory usage (Section 4.3). This new system has been tested extensively over several weeks under highly dynamic environments, with approximately 3000 node joins/leaves, 150,000 runtime parameter changes, and over 2.1 million exchanged messages per day (Section 7.2).

Figure 3: Internal architecture and modules of the Livepeer node, a redesigned version of the Protopeer node. The modules are separated into two categories: deployment-dependent and deployment-independent. This design has the advantage that a system can first be tested in a controlled simulated environment and then gradually moved to a large-scale live deployment, while only the deployment-dependent modules require change. [Components: peerlets (service peerlet (agent), communication topology peerlet, monitoring peerlet); Peer; Network API; Time API; messaging queuing (inbound and outbound queues); message serialisation; TCP (ZeroMQ) or simulated network. Groups: common for simulation & live deployments; different implementations for simulation and live.]

Figure 3 illustrates a summarized view of the internal architecture of a Livepeer node. There are two core concepts within each node: the peer and the peerlet. The peer provides core functionality, such as communication protocols (e.g., TCP messaging) and timers, and acts as the execution environment (container) for the peerlets. The peerlets, on the other hand, are independent modules that provide specific functionality and tasks. Typically, a node consists of a single Livepeer peer and multiple peerlets that collectively fulfill the required functionality of the service. In addition to the service agent, two other examples of peerlets are the communication topology peerlet and the monitoring peerlet. The communication topology peerlet determines the service network topology and the communication logic, for instance, a tree topology [63] (utilized in I-EPOS, see Section 5.1), or gossip-based peer sampling for P2P systems [64] (utilized in DIAS, see Section 5.2). The monitoring peerlet stores and submits logs from different modules in the peer to the monitoring infrastructure (Section 4.3). By default, this peerlet includes three logging modes: (i) the service logger, which records service-specific logs; (ii) the event logger, which provides event-based logging and insights into the execution sequence; and (iii) the memory logger, which measures the total memory footprint of a peer, including nested objects.
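As a rough illustration of the peer/peerlet split, the Python sketch below models a peer as a container that dispatches every inbound message to the peerlets it hosts. It is not the Livepeer API (Livepeer runs on the JVM); `Peer`, `Peerlet`, `add_peerlet`, and `deliver` are hypothetical names, and the peer here offers only message dispatch, not networking or timers.

```python
from typing import List


class Peerlet:
    """Hypothetical base class: an independent module hosted by a peer."""

    def init(self, peer: "Peer") -> None:
        self.peer = peer

    def handle_incoming_message(self, message: dict) -> None:
        pass  # default behaviour: ignore the message


class Peer:
    """Hypothetical container; here it only dispatches messages to peerlets."""

    def __init__(self, peer_id: int) -> None:
        self.peer_id = peer_id
        self.peerlets: List[Peerlet] = []

    def add_peerlet(self, peerlet: Peerlet) -> None:
        peerlet.init(self)
        self.peerlets.append(peerlet)

    def deliver(self, message: dict) -> None:
        # The peer owns communication; every hosted peerlet sees each message.
        for peerlet in self.peerlets:
            peerlet.handle_incoming_message(message)


class EventLoggerPeerlet(Peerlet):
    """Hypothetical monitoring peerlet that keeps an in-memory event log."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def handle_incoming_message(self, message: dict) -> None:
        self.events.append(f"peer {self.peer.peer_id} received {message['type']}")


class CounterServicePeerlet(Peerlet):
    """Hypothetical service peerlet standing in for the service agent."""

    def __init__(self) -> None:
        self.received = 0

    def handle_incoming_message(self, message: dict) -> None:
        self.received += 1
```

Because service logic and logging live in separate peerlets, a monitoring peerlet can be added or removed without touching the service peerlet, which is the kind of modularity the node design aims for.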

The proposed testbed utilizes a messaging protocol based on a fast and lightweight TCP/IP implementation using ZeroMQ for agent-to-agent communication. Each agent is instantiated with a single PULL socket and multiple PUSH sockets. Thus, each agent can act as a sink that receives messages from any other agent in the network, whilst simultaneously sending messages to other agents through its PUSH sockets. This is achieved by implementing two independent message queues, one for inbound messages and one for outbound messages. This separation allows the queue sizes within each agent to be monitored in order to regulate traffic flows.

4.3. Monitoring Infrastructure

A major challenge in decentralized IoT services with autonomous agents is to log and monitor the service, both at the individual agent level and system-wide [65]. Devices and agents can be geo-spatially distributed and deployed over different computational clusters and networks [66]. While each agent can run autonomously and independently, most analytics require an aggregated view in real time, whilst also allowing users to drill down and investigate the internal activities within a single agent. Providing such multi-granular views is even more challenging for large-scale systems [67]. Simple NFS (Network File System) solutions may not provide the necessary throughput to sustain heavy logging from many agents [67]. Additionally, IoT devices and agents can be restricted in computational resources; hence, logging and monitoring need to be lightweight, simple to use, efficient, and with minimal impact on real-time performance [68]. To address these challenges, this paper devises a monitoring infrastructure comprising the following components: (i) a single database containing the logged data from agents; (ii) a single logging gateway, accessible to all agents, which receives the logs and commits them to the database, and whose main task is to perform authentication, authorization, and connection pooling for the database; and (iii) a single, lightweight peerlet on each agent, known as the monitoring peerlet, which collects the logs and submits them to the logging gateway. This infrastructure is easy to integrate into Livepeer nodes, since it only requires adding the peerlet. It is distributed and modular in design, and can be connected to various observability platforms and dashboards, such as Grafana and Redash, for real-time visualization.
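The split into one inbound and one outbound queue per agent, described in the messaging paragraph above, can be mimicked in-process. In this Python sketch, `queue.Queue` objects stand in for the ZeroMQ PULL socket (inbound) and the PUSH sockets (outbound), so it shows only the queueing and queue-size monitoring, not real networking; `QueuedAgent` and `flush` are hypothetical names.

```python
import queue
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Message:
    sender: int
    receiver: int
    payload: str


class QueuedAgent:
    """Hypothetical agent with independent inbound and outbound queues."""

    def __init__(self, agent_id: int, max_queue: int = 1000) -> None:
        self.agent_id = agent_id
        self.inbound = queue.Queue(maxsize=max_queue)   # stand-in for the PULL socket
        self.outbound = queue.Queue(maxsize=max_queue)  # stand-in for the PUSH sockets

    def send(self, receiver: int, payload: str) -> None:
        # Sending only enqueues; a separate step drains the outbound queue.
        self.outbound.put(Message(self.agent_id, receiver, payload))

    def queue_sizes(self) -> Tuple[int, int]:
        # (inbound, outbound) sizes: the signal a monitor could use to regulate traffic.
        return self.inbound.qsize(), self.outbound.qsize()


def flush(agents: Dict[int, QueuedAgent]) -> int:
    """Move every outbound message to its receiver's inbound queue; return the count."""
    moved = 0
    for agent in agents.values():
        while not agent.outbound.empty():
            msg = agent.outbound.get_nowait()
            agents[msg.receiver].inbound.put(msg)  # any agent can receive from any other
            moved += 1
    return moved
```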

5. Studied Services

This paper studies two live implementations of generic multi-purpose IoT services as proof-of-concept use cases of the proposed architecture, namely collective learning based on the I-EPOS system (Section 5.1), and decentralized collective measurements based on the DIAS system (Section 5.2). I-EPOS performs collective learning for multi-agent combinatorial problems [8], utilizing a structured network topology (tree topology) with synchronous learning iterations and communication between agents. DIAS performs decentralized privacy-preserving data analytics, relying on local computations, peer-to-peer interactions, and hashed information [9, 7]. It has an unstructured (P2P) topology for asynchronous communication [9]. Due to their decentralized socio-technical design, both services are highly relevant to IoT applications. However, they differ profoundly in their operation, which challenges the flexibility and applicability of the proposed architecture.

(Footnotes: The earlier version of the Protopeer toolkit saves logging objects locally, which creates discrepancies for post-processing and analysis as systems operate in the long run. Grafana: https://grafana.com [Last accessed: May 2020]. Redash: https://redash.io [Last accessed: May 2020].)

The I-EPOS system [8] performs fully decentralized, self-organizing, privacy-preserving combinatorial optimization. The I-EPOS agents (service agents) provide the learning service, and each has a set of local plans generated by the application agent. These plans can be alternative routes of an autonomous vehicle, or power consumption schedules of a smart appliance (e.g., a smart washing machine). I-EPOS optimizes a system-wide goal, measured by a global cost function. This goal can be load-balancing traffic flows in a city [19], or peak-shaving power demand in Smart Grids [16]. The I-EPOS agents interact and cooperate with each other to select the plans that minimize the global cost.
The agents self-organize in a tree topology [63] as a way of structuring their interactions. I-EPOS performs consecutive learning iterations, each of which includes two phases: a bottom-up (leaves to root) phase and a top-down (root to leaves) phase. At each iteration $t$, agent $u$ selects the plan $p^t_{u,s}$ that satisfies the following optimization objective:

$$p^t_{u,s} := \arg\min_{j \in \{1,\dots,|P_u|\}} \Big( (1-(\alpha+\beta))\, f_G(p^t_{u,j}) + \beta\, f_L(p^t_{u,j}) + \alpha\, f_U(p^t_{u,j}) \Big) \qquad (1)$$

In Equation 1, $|P_u|$ is the number of plans of agent $u$, and $f_G(p^t_{u,j})$ is the global cost of selecting $p^t_{u,j}$, which can be the variance of traffic load across different routes (in the case of traffic load-balancing). Each plan has a local cost, calculated by $f_L(p^t_{u,j})$, which can be the trip duration (in the case of alternative routes) or user discomfort (in the case of shifting power consumption). $f_U(p^t_{u,j})$ is the unfairness, calculated by the dispersion of the local cost of the selected plans over all agents, with lower values indicating a more equal distribution of the local cost across all agents. The $\alpha$, $\beta$, and $1-(\alpha+\beta)$ parameters indicate the agents' preferences for unfairness, local cost, and global cost, respectively. For instance, an agent with $\alpha = 0$, $\beta = 1$ represents a selfish agent, which prioritizes the minimization of its local cost, while an agent with $\alpha = \beta = 0$ represents an altruistic agent, which minimizes the global cost. After the final iteration $F$ is completed, $p^F_{u,s}$ is presented to the users' devices for execution. Further elaboration on I-EPOS is out of the scope of this paper, and the interested reader is referred to previous work [8].

The DIAS system [9] performs fully decentralized privacy-preserving data analytics for the Internet of Things.
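To make Equation 1 concrete, the following Python sketch evaluates the weighted objective for each of one agent's plans and returns the index of the minimizer. It is a simplification, not the I-EPOS implementation: it ignores the tree-based bottom-up/top-down exchange and assumes the agent already knows the other agents' aggregated demand and local costs. Using the variance of total demand for $f_G$ and the standard deviation of local costs for $f_U$ are illustrative assumptions, as are all names and numbers.

```python
from statistics import pstdev, pvariance
from typing import Callable, List, Sequence

Plan = Sequence[float]


def select_plan(plans: Sequence[Plan], local_costs: Sequence[float],
                alpha: float, beta: float,
                global_cost: Callable[[Plan], float],
                unfairness: Callable[[float], float]) -> int:
    """Return the index j that minimises the Equation 1 objective for one agent."""
    if alpha < 0 or beta < 0 or alpha + beta > 1:
        raise ValueError("expects alpha, beta >= 0 and alpha + beta <= 1")

    def objective(j: int) -> float:
        return ((1 - (alpha + beta)) * global_cost(plans[j])  # f_G term
                + beta * local_costs[j]                       # f_L term
                + alpha * unfairness(local_costs[j]))         # f_U term

    return min(range(len(plans)), key=objective)


# Hypothetical state seen by one agent in one iteration.
aggregate: List[float] = [5.0, 1.0, 1.0]  # other agents' summed demand
others_local: List[float] = [0.2, 0.4]    # other agents' local costs
plans = [[1.0, 0.0, 0.0], [0.0, 2.0, 2.0]]
local_costs = [0.1, 0.9]


def variance_of_total(plan: Plan) -> float:
    # Global cost: variance of the total demand if this plan were chosen.
    return pvariance([a + x for a, x in zip(aggregate, plan)])


def dispersion_of_local(cost: float) -> float:
    # Unfairness: standard deviation of all agents' local costs.
    return pstdev(others_local + [cost])


print(select_plan(plans, local_costs, 0.0, 0.0, variance_of_total, dispersion_of_local))  # altruistic: prints 1
print(select_plan(plans, local_costs, 0.0, 1.0, variance_of_total, dispersion_of_local))  # selfish: prints 0
```

The altruistic agent picks the flatter plan (lower variance of the total), while the selfish agent picks the plan with the lowest discomfort, mirroring the role of the weights in Equation 1.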
Each application agent acts as a data supplier and consumer: data suppliers are sensors that locally generate a stream of real-time privacy-sensitive data, while data consumers collect these data to compute information (e.g., summation, average, max/min, top-k). For instance, data suppliers provide consumption data from residential smart meters, and data consumers receive the […] k possible states. Additionally, DIAS addresses two other uncertainties: changes in the set of possible states, and agents leaving/failing/rejoining the network. The challenge here is to preserve the accuracy of DIAS estimations under these two dynamics. To address this, DIAS uses a distributed memory system based on Bloom filters [71] to track the history of the performed computations and, when needed, perform self-corrective actions. Further elaboration on DIAS is out of the scope of this paper, and the interested reader is referred to previous work [7, 9].

(Footnotes: I-EPOS is available at http://epos-net.org [Last accessed: May 2020]. DIAS is available at http://dias-net.org [Last accessed: May 2020].)
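How DIAS encodes its computation history in Bloom filters is not described here, so the following Python sketch only illustrates the underlying data structure [71]: a bit array in which each recorded item sets `k` positions derived from hashes. The class name, the SHA-256-based position derivation, and the `"agent|state"` key format are assumptions for illustration. The property relevant to self-correction is that a Bloom filter has no false negatives: if `might_contain` returns `False`, the item was never recorded, while `True` can occasionally be a false positive.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Minimal Bloom filter: m bits, k positions per item derived from SHA-256."""

    def __init__(self, m: int = 1024, k: int = 3) -> None:
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k bit positions by salting the item with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False: definitely never added. True: added, or a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


memory = BloomFilter()
memory.add("agent-3|state-2")  # hypothetical key: this supplier/state was counted
```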

6. Experimental Methodology & Settings

The experiments in this paper are divided into two evaluation scenarios, both of which utilize the Livepeer toolkit and follow the conceptual architecture and communication protocol illustrated in Figures 1 and 2, respectively. The first evaluation scenario (Section 6.1) studies the accuracy of the two services in live environments, to provide a performance benchmark for the testbed architecture and the Livepeer toolkit given experimental realism [21]. The second evaluation scenario (Section 6.2) studies the efficiency and robustness of the services during long-lasting operation under dynamic and volatile environments.

Experiments in live environments, even without dynamic changes, can incur inaccuracies due to networking errors (e.g., packet losses), clock differences across machines, and system failures [10]. This evaluation scenario analyzes the accuracy of the two studied services and provides a benchmark comparison in live non-volatile environments, to study the validity of the testbed architecture and the Livepeer toolkit. For I-EPOS, this evaluation is made by comparing the simulation and live deployments of the service; for DIAS, the evaluation is based on long-lasting operation with high experimental realism.

Table 3: I-EPOS settings and parameters for the two evaluation scenarios. The profiles for evaluation scenario I are illustrated in Table 4. In evaluation scenario II, the number of agents ranges between [150, …], the α, β values range between [0, …], and α + β = ….

Parameters | Value in Evaluation Scenario I | Value in Evaluation Scenario II
Performed Experiments | 100 per profile | Continuous: intensity change every 8 hours
Number of Agents | 50 / … / 300 | [150, …]
Global Cost Function | MIN-VAR / MIN-RMSE | MIN-VAR / RMSE
Local Cost Function | Discomfort | Discomfort
Agent Preference | α = β = … / … | α, β ∈ [0, …], α + β = …

The experimental settings and parameters for I-EPOS in evaluation scenario I are illustrated in Table 3. The utilized dataset contains charging plans for 2779 electric vehicles (EVs) in three different planning horizons: 1, 3, and 7 days ahead [72]. In all cases, each EV has 4 alternative charging plans in the form of a vector, specifying the energy demand for each minute during the planning horizon. For 1-day-ahead plans, the length is 1440 (24 h × 60 min), and for 3- and 7-days-ahead plans, the length is 4320 (3 d × 24 h × 60 min) and 10080 (7 d × 24 h × 60 min), respectively. Two different global cost functions are applied to the total charging demand of the participating EVs, each addressing a different charging scenario: (i) minimizing the charging demand variance (MIN-VAR), and (ii) shifting charging times to the night (MIN-RMSE). Each plan also has a local cost, which is its discomfort, calculated by the historical likelihood of using the EV while charging [72]. The performed experiments are based on the 12 profiles illustrated in Table 4. Each profile is tested 100 times; overall, there are 1200 experiments in the simulation and 1200 in the real-world environment. To compare the performance between the simulation and live environments, the relative differences in the global cost and the average local cost are calculated as:

$$\text{Relative global cost difference} = \frac{g^t_{s,i} - g^t_{l,i}}{g^t_{s,i}}, \qquad \text{Relative average local cost difference} = \frac{l^t_{s,i} - l^t_{l,i}}{l^t_{s,i}} \qquad (2)$$

where $g^t_{s,i}$ and $g^t_{l,i}$ are the global costs of profile $i$ at iteration $t$ in the simulation and live settings, respectively. Similarly, $l^t_{s,i}$ and $l^t_{l,i}$ are the average local costs of all agents in profile $i$ at iteration $t$ in the simulation and live settings, respectively.
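The plan-length arithmetic, the two global cost functions, and the comparison metric of Equation 2 can be sketched in Python as follows. This is a sketch under stated assumptions: MIN-VAR is taken as the population variance of the total charging demand vector, MIN-RMSE as the root mean square error against the steering signal defined in the MIN-RMSE footnote, and the function names (`plan_length`, `min_var`, `min_rmse`, `relative_difference`) are hypothetical.

```python
from math import sqrt
from statistics import pvariance
from typing import Sequence


def plan_length(days: int) -> int:
    """Number of per-minute entries in a plan for the given horizon."""
    return days * 24 * 60


def min_var(total_demand: Sequence[float]) -> float:
    """MIN-VAR global cost: variance of the total charging demand."""
    return pvariance(total_demand)


def min_rmse(total_demand: Sequence[float], steering: Sequence[float]) -> float:
    """MIN-RMSE global cost: RMSE between total demand and the steering signal."""
    if len(total_demand) != len(steering):
        raise ValueError("demand and steering signal must have equal length")
    squared = sum((d - s) ** 2 for d, s in zip(total_demand, steering))
    return sqrt(squared / len(steering))


def relative_difference(sim: float, live: float) -> float:
    """Equation 2: (simulation value - live value) / simulation value."""
    return (sim - live) / sim
```

A relative difference close to 0 means the live run reproduces the simulated cost; its sign shows whether the live run ended higher (negative) or lower (positive) than the simulation.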
(Footnotes: MIN-RMSE: minimizing the root mean square error between the total charging demand of all EVs and the steering signal set by the service operator to incentivize night charging. The steering signal is a vector of the same length as the charging plans, with the day-time charging target set to 0. Further elaboration on this dataset can be found in previous work [72].)

Table 4: 12 profiles used for I-EPOS experiments in scenario I
[Columns: Profiles | Scale | Agent Preference (α, β) | Planning Horizon | Global Cost Function]

These experiments are based on the GDELT (Global Database of Events, Language, and Tone) platform. GDELT monitors and captures print, broadcast, and web-based global news media in real time, and its data can be accessed via an API in 15-minute intervals. This paper employs the DIAS-GDELT demonstrator [73]. It fetches GDELT news updates every 15 minutes, extracts the possible states, and sends them to the application agents. DIAS agents (service agents) are mapped to 28 application agents, each representing a country from GDELT. Each DIAS agent receives from its application agent the number of news items generated during the last 15 minutes, disseminates this value in the network, and receives the aggregated total number of news items generated by the other agents.

(Footnote: DIAS-GDELT live demonstrator: http://dias-net.org/dias-gdelt-live/ [Last accessed: May 2020].)

In this evaluation scenario, a set of continuous real-world experiments between 24/11/2019 and 24/12/2019 […] different rate of change for system dynamics.

Table 5: Rate of change for dynamics in I-EPOS and DIAS live experiments (′ denotes minutes)

Service / Parameter | Low Intensity | Medium Intensity | High Intensity
I-EPOS
Plan Change | 10% | 20% | 50%
α and β Change | 10% | 20% | 50%
Global Cost Function (System-wide) | 10% | 20% | 50%
Agent Join / Leave | 10% | 20% | 50%
DIAS
Change of Possible States | 3h | 2h | 1h
Change of Selected State | 5′ | … | …
Agent Join / Leave | 10′ | … | 2′

The experimental settings for this scenario are shown in Table 3. The I-EPOS service is initialized with 200 agents, and each agent is randomly assigned to one of the 2779 EVs from the EV dataset with a 7-days-ahead planning horizon. During runtime, four dynamics are adopted, each corresponding to a change in system settings: (i) agents joining/leaving, (ii) local plan change, (iii) α, β (weight) change, and (iv) global cost function change. The rate of change for each dynamic varies across the intensity periods (Table 5), with the high-intensity period incurring the highest number of changes. For example, in the low-intensity period, at the end of each run agents change their plans with 5% probability. The rate of change for α and β operates the same way; however, the change in the global cost function is applied system-wide. The effect of such dynamic changes on the performance of I-EPOS is studied using two metrics: (i) The latency indicates the variation of the I-EPOS execution time with varying dynamics, with respect to non-changing dynamics [74]. The execution time is defined as the time it takes (in milliseconds) for I-EPOS to complete 50 iterations, plus applying the changes enforced by the dynamics (e.g., agents joining/leaving, changes in plans, or α, β values). The latency is calculated as follows:

$$\text{Latency} := \frac{\text{Varying Dynamics Execution Time}}{\text{Non-changing Dynamics Execution Time}} \qquad (3)$$

(ii) WAT, which indicates whether the system spends excessive time adapting to dynamic changes rather than performing service-related tasks. The WAT is calculated as follows:

$$\text{WAT} := \frac{\text{Working time}}{\text{Adaptivity time}} \qquad (4)$$

where the working time is the time (in milliseconds) required to execute the 50 learning iterations of I-EPOS, while the adaptivity time is the time required to adapt to dynamic changes [74], for instance, adapting to changes in the number of agents, which triggers the self-reorganization of the tree topology.
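Equations 3 and 4 are simple ratios; the Python sketch below computes them from hypothetical timings. It assumes that the varying-dynamics execution time equals the working time plus the adaptivity time, consistent with the definition of execution time above; the function names and all numbers are illustrative.

```python
def latency(varying_ms: float, static_ms: float) -> float:
    """Equation 3: execution time under varying dynamics / under static dynamics."""
    if static_ms <= 0:
        raise ValueError("static execution time must be positive")
    return varying_ms / static_ms


def wat(working_ms: float, adaptivity_ms: float) -> float:
    """Equation 4: working time / adaptivity time (below 1: adaptation dominates)."""
    if adaptivity_ms <= 0:
        raise ValueError("adaptivity time must be positive")
    return working_ms / adaptivity_ms


# Hypothetical run: 1200 ms of learning iterations plus 300 ms of adaptation,
# compared against a 1000 ms run without any dynamic changes.
working, adapting, static = 1200.0, 300.0, 1000.0
print(latency(working + adapting, static))  # prints 1.5
print(wat(working, adapting))               # prints 4.0
```

With these hypothetical timings, the dynamic run is 50% slower than the static one, yet WAT stays well above 1, so the system still spends most of its time on learning rather than adaptation.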

At the start of each day, DIAS is initialized with 20 agents. During runtime, three different dynamics are adopted, each corresponding to a change in the system settings: (i) agents joining/leaving, (ii) change in the set of possible states, and (iii) change in the selected state. Every time an agent changes its set of possible states, it randomly selects 9 numbers between the current time and the next hour. For instance, the possible states for an agent at 14:00 (1400) are 9 numbers in the range [1400, 1500]. The rate of change for each dynamic varies across the intensity periods (Table 5), with the high-intensity period incurring the highest number of changes on the system. For example, in the low-intensity period, each DIAS agent changes its selected state every 5 minutes. The agent join/leave rate means that, in the high-intensity case, all agents leave the network every 2 minutes and return 2 minutes later.

(Footnote: Each run refers to the completion of 50 learning iterations by I-EPOS.)

Figure 4: Relative difference in global and local cost between the simulation and live settings, calculated based on Equation 2. The value of each cell shows the mean across 100 repeated experiments for the given profile. The low values confirm that, utilizing the conceptual architecture and the Livepeer toolkit, I-EPOS can transition from simulation to live with minimal introduced error. [Panels: (a) Global Cost, (b) Local Cost; axes: iterations vs. profiles; cell values: relative difference.]

Figure 5: (i) GDELT Actual: baseline values extracted from GDELT (i.e., the total number of news items generated by the 28 countries). (ii) DIAS Actual: sum of the selected states of each of the 28 DIAS agents, based on the set of possible states of each agent. (iii) DIAS Estimated: the estimated total number of news items by all countries, calculated by averaging the estimation of each agent. DIAS can accurately estimate the actual GDELT events in the long term. [Time series of the sum over time: GDELT Actual, DIAS Actual, DIAS Estimated.]

The deployment infrastructure of the testbed is as follows: one server with higher computational power for scaling, and a less powerful server for long-running experiments. Both servers provide 'bare metal' access, which is substantially faster than using virtual images. The larger machine has the following specifications: Intel Xeon hex-core 3.50GHz, 256GB DDR3 RAM, 2TB RAID 1 storage, Ubuntu 16.04. The smaller machine has an Intel Core i7-6700 quad-core CPU, 64GB DDR4 RAM, 1TB storage, and Ubuntu 16.04. Each service agent is implemented using the Livepeer toolkit (Section 4) as a separate JVM object. (Footnote: By using this approach, agents can in principle run on different machines and networks.) The logging gateway is a single persistence daemon that creates a single connection to the database (PostgreSQL 10.6) and has a predefined commit rate and queue size that can be adapted based on system scale. It listens to ZeroMQ messages with logging information sent by the agents and commits the information to the database.
Each agent notifies the logging gateway of the required relations, tables, and indices in the form of SQL query templates, which are created on the database. The communication between all agents is based on message passing, implemented with the ZeroMQ library.
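The logging gateway's behaviour (one database connection, agent-supplied SQL templates, and a configurable commit rate) can be sketched as follows. To keep the example self-contained, Python's built-in `sqlite3` replaces PostgreSQL and a direct method call replaces the ZeroMQ message; the `LoggingGateway` class, the `logs` table schema, and the batch-size trigger (used here as a stand-in for the commit rate) are assumptions.

```python
import sqlite3
from typing import List, Tuple

LogRow = Tuple[int, str, str]  # (agent id, log kind, payload)


class LoggingGateway:
    """Hypothetical gateway: one shared DB connection, rows committed in batches."""

    def __init__(self, conn: sqlite3.Connection, batch_size: int = 100) -> None:
        self.conn = conn              # single connection shared by all agents
        self.batch_size = batch_size  # stand-in for the configurable commit rate
        self.pending: List[LogRow] = []

    def register(self, sql_template: str) -> None:
        """Create a relation, table, or index requested by an agent."""
        self.conn.execute(sql_template)
        self.conn.commit()

    def receive(self, agent_id: int, kind: str, payload: str) -> None:
        """Queue one log record; commit once the batch is full."""
        self.pending.append((agent_id, kind, payload))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Write all queued records in a single transaction."""
        if self.pending:
            self.conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", self.pending)
            self.conn.commit()
            self.pending.clear()


conn = sqlite3.connect(":memory:")
gateway = LoggingGateway(conn, batch_size=3)
gateway.register("CREATE TABLE IF NOT EXISTS logs (agent INTEGER, kind TEXT, payload TEXT)")
gateway.register("CREATE INDEX IF NOT EXISTS logs_by_agent ON logs (agent)")
for i in range(5):
    gateway.receive(i, "event", f"msg-{i}")
```

Batching trades a small delay in visibility for far fewer transactions, which is one way a single connection can keep up with many agents.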

7. Experimental Results

This section presents the results of the experiments based on the methodology introduced in Section 6.1.

Figure 4 shows the relative difference in global and local cost between the simulation and live environments across the I-EPOS iterations. For each profile, the global cost is calculated based on the corresponding global cost function (MIN-VAR/MIN-RMSE) in Table 4, and the local cost is the average discomfort of the agents' selected plans, calculated by the likelihood of using the EV while it is charging [72]. The value of each cell in Figure 4 shows the mean across 100 repeated experiments for the given profile. For each experiment, I-EPOS initializes random trees and assigns the agents to the nodes. This random assignment generates small variations in the I-EPOS outcome [75]. However, the general trend across all profiles shows that, as the learning iterations progress to the final iteration (50), the global and average local costs of the simulation and live environments converge. After 10 learning iterations, the relative difference in global cost for all profiles is less than 0.02, and the highest difference in average local cost at the final iteration is 0.0256, for Profile 10. This confirms that the I-EPOS transition from simulation to live based on the Livepeer toolkit can be performed with minimal introduced inaccuracies.

Figure 6: I-EPOS live operations between 24/11/2019 and 24/12/2019. The upper figures show a snapshot from December 12th, 2019, while the lower figures illustrate the operations over the month-long experiments. [Panels: (a) Nodes Join/Leave, (b) Plan Changes, (c) Weight Changes, (d) GCF Changes (VAR/RMSE); legends: Low/Medium/High Intensity, Cumulative, Rolling Mean.]

Figure 7: I-EPOS live operations between 24/11/2019 and 24/12/2019; the upper figures show a snapshot from December 12th, 2019, and the lower figures the 30-day live operations. Latency indicates the ratio between the I-EPOS runtime given varying dynamics and the runtime with static, non-changing dynamics (Equation 3). WAT indicates whether the system is spending too much time adapting to dynamic changes rather than performing the I-EPOS learning iterations (Equation 4). Note that due to lower WAT (higher adaptivity time), I-EPOS completes fewer runs in the higher-intensity period during the same time frame. [Panels: (a) Latency, (b) WAT; x-axis: runs; legends: Low/Medium/High Intensity, Rolling Mean.]

Figure 5 outlines three time series based on the experimental methodology introduced in Section 6.1.2: (i) GDELT Actual: the raw baseline values extracted from GDELT, representing the total number of news items generated by the 28 countries. (ii) DIAS Actual: the sum of the selected states of each of the 28 DIAS agents. Each selected state is the number of news items generated by the assigned country. The set of possible states is extracted by a sliding window of 27 observations, uniformly sampling 9 values. (iii) DIAS Estimated: the estimated total number of news items by all countries (estimated DIAS actual), calculated by averaging the estimates of each agent.
This estimation is what each DIAS agent calculates as the true value of the DIAS actual. The accuracy of this estimation is affected by various factors, such as the sampling pool size of data suppliers, the convergence time, and selected state changes. This experiment has been running since November 1st, 2018. As illustrated in Figure 5, the DIAS service based on the Livepeer toolkit can sustain long-running operation and accurately estimate the GDELT baseline, and it rapidly adapts to sudden changes.

This section presents the results of the experiments based on the methodology introduced in Section 6.2.

These experiments were continuously executed between 24/11/2019 and 24/12/2019. […] December 12th, 2019, as well as the live operation during the month-long experiments. During a typical day (over the month-long period), I-EPOS live handles approximately 80,000 changes in agents' plans (2.4 million), 80,000 changes in α, β parameters (2.4 million), and 3000 agents joining/leaving […] different intensity settings, where the average WAT is always higher than 1. Generally, if the ratio is less than one, the system is spending a lot of time adapting to changes [74]. The above experiments confirm that even under highly dynamic environments, I-EPOS completes its learning iterations without any crashes or failures and delivers the learning outcome.

Figure 8: DIAS live operations between 24/11/2019 and 24/12/2019. The upper figures show a snapshot from December 12th, 2019, while the lower figures illustrate the operations over the month-long experiments. The initial burst in DIAS error (Figure 8d) is due to the change in the set of possible states at midnight. The estimated sum is calculated by averaging the estimation of each agent, indicating the value each DIAS agent estimates as the true sum of the selected states of all DIAS agents. DIAS error is the difference between the true sum (raw data) and the estimated sum. While this error increases with the rise in intensity, the rolling mean remains low due to quick dissemination and convergence. [Panels: (a) Nodes Join/Leave, (b) Number of Messages, (c) Estimated Sum (estimated vs. true sum), (d) DIAS Error; legends: Low/Medium/High Intensity, Cumulative, Rolling Mean.]

The experiments are continuously executed between 24/11/2019 and 24/12/2019. […] December 12th, 2019, as well as the live operation during the month-long experiments. During a typical day (over the month-long period), DIAS handles approximately 4000 agent joins/leaves (250,000), 16,000 state changes (480,000), and 2 million exchanged messages (100 million). The estimated sum of the selected states of all DIAS agents is shown in Figure 8c. As shown, even under intense dynamic changes, DIAS live still provides accurate estimations. Finally, Figure 8d illustrates the overall DIAS error, calculated as the difference between the true sum (raw data) and the estimated sum. This error is caused by various factors, such as summarization (raw values to the set of possible states), rapid state changes, agents joining/leaving, and convergence time. As shown, this error increases with the rise in intensity; however, due to the quick dissemination of state changes and convergence in the network, the rolling mean error remains low.
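The quantities plotted in Figures 8c and 8d can be computed as in the following Python sketch: the estimated sum is the average of the per-agent estimates, the DIAS error is the true sum minus that estimate, and a trailing rolling mean smooths the error series. The rolling-window length is not specified in the text, so the `window` parameter and all function names are assumptions.

```python
from collections import deque
from statistics import mean
from typing import Deque, Iterable, List, Sequence


def estimated_sum(agent_estimates: Sequence[float]) -> float:
    """Average of the per-agent estimates of the total (as in Figures 5 and 8c)."""
    return mean(agent_estimates)


def dias_error(true_sum: float, agent_estimates: Sequence[float]) -> float:
    """DIAS error: true sum (raw data) minus the estimated sum."""
    return true_sum - estimated_sum(agent_estimates)


def rolling_mean(values: Iterable[float], window: int) -> List[float]:
    """Trailing rolling mean; the first window-1 points average what is available."""
    buf: Deque[float] = deque(maxlen=window)
    out: List[float] = []
    for v in values:
        buf.append(v)
        out.append(sum(buf) / len(buf))
    return out
```

A rolling mean of this kind explains why short error spikes during high-intensity periods can coexist with a low smoothed error curve.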

8. Conclusion and Future Work

This paper introduces a novel testbed architecture for decentralized socio-technical services and applications running on IoT. This architecture applies two layers of abstraction, on the IoT application (devices) and on the decentralized services, enabling a dual self-integration capability: (i) an IoT application integrating several application-independent and modular decentralized services, and (ii) a decentralized service integrating with several IoT applications without changing the implementation of the service. A distributed communication protocol is designed to realize and operationalize the conceptual architecture, providing common interfaces and the communication logic required for the self-integration of applications and services at runtime. Additionally, this paper contributes the Livepeer toolkit, a general-purpose IoT prototyping toolkit for the rapid design and testing of decentralized socio-technical applications, which also facilitates the transition from simulation to live environments. Experimental evaluations of two decentralized IoT services, performed under highly dynamic environments, confirm the efficiency and robustness of the testbed architecture.

This work promises new instruments for prototyping and developing decentralized socio-technical services running on IoT, and pathways to manage their complexity, which has so far hindered value-oriented self-management approaches in Smart Cities. Ultimately, the architecture and toolkit will be able to facilitate pilot tests in Smart City use cases. Future research can address the inclusion of other decentralized services with different requirements and network structures, device/agent mobility, and semantic service composition. Lastly, further deployments in larger-scale infrastructures (e.g., PlanetLab) can provide new insights into the applicability of the proposed architecture.

Acknowledgement

This work was supported by the ERC Advanced Grant (324247) Momentum, and the Engineering Social Technologies for a Responsible Digital Future Project at ETH Zurich and TU Delft. The authors would also like to thank Renato Kunz for his contributions to the development and implementation of the testbed.

Artifacts & Reusability

To facilitate the reusability of the testbed and the Livepeer toolkit by the community, the code bases, protocols, and documentation are openly available: simulation and live versions of I-EPOS (https://github.com/epournaras/EPOS), simulation and live versions of DIAS (https://github.com/epournaras/DIAS-Development), the monitoring infrastructure and its documentation (https://github.com/epournaras/Livelog, https://github.com/epournaras/Livelog-Documentation), Livepeer and its documentation (https://github.com/epournaras/Livepeer, https://github.com/epournaras/Livepeer-Documentation), and the IoT device/application agents (https://github.com/epournaras/DIASClient).

(Footnote: PlanetLab: https://…/about)

References

[1] H. Arasteh, V. Hosseinnezhad, V. Loia, A. Tommasetti, O. Troisi, M. Shafie-Khah, P. Siano, Iot-based smart cities: a survey, in: 2016 IEEE 16th International Conference on Environment and Electrical Engineering (EEEIC), IEEE, pp. 1–6.
[2] M. Chernyshev, Z. Baig, O. Bello, S. Zeadally, Internet of things (iot): Research, simulators, and testbeds, IEEE Internet of Things Journal 5 (2017) 1637–1647.
[3] R. Girau, S. Martis, L. Atzori, Lysis: A platform for iot distributed applications over socially connected objects, IEEE Internet of Things Journal 4 (2016) 40–51.
[4] K. Bellman, J. Botev, A. Diaconescu, L. Esterle, C. Gruhl, C. Landauer, P. R. Lewis, A. Stein, S. Tomforde, R. P. Würtz, Self-improving system integration-status and challenges after five years of sissy, in: 2018 IEEE 3rd International Workshops on Foundations and Applications of Self* Systems (FAS* W), IEEE, pp. 160–167.
[5] K. L. Bellman, C. Gruhl, C. Landauer, S. Tomforde, Self-improving system integration-on a definition and characteristics of the challenge, in: 2019 IEEE 4th International Workshops on Foundations and Applications of Self* Systems (FAS* W), IEEE, pp. 1–3.

[6] S. Tomforde, M. Goller, To adapt or not to adapt: A quantification technique for measuring an expected degree of self-adaptation, Computers 9 (2020) 21.
[7] E. Pournaras, M. Yao, D. Helbing, Self-regulating supply–demand systems, Future Generation Computer Systems 76 (2017) 73–91.
[8] E. Pournaras, P. Pilgerstorfer, T. Asikis, Decentralized collective learning for self-managed sharing economies, ACM Transactions on Autonomous and Adaptive Systems (TAAS) 13 (2018) 10.
[9] E. Pournaras, J. Nikolic, A. Omerzel, D. Helbing, Engineering democratization in internet of things data analytics, in: 2017 IEEE 31st International Conference on Advanced Information Networking and Applications (AINA), IEEE, pp. 994–1003.
[10] A. Gluhak, S. Krco, M. Nati, D. Pfisterer, N. Mitton, T. Razafindralambo, A survey on facilities for experimental internet of things research (2011).
[11] N. Zhang, W. Zhao, Distributed privacy preserving information sharing, in: Proceedings of the 31st international conference on Very large databases, Citeseer, pp. 889–900.
[12] S. Mahboubi, R. Akbarinia, P. Valduriez, Privacy-preserving top-k query processing in distributed systems, in: Transactions on Large-Scale Data- and Knowledge-Centered Systems XLII, Springer, 2019, pp. 1–24.
[13] A. Preece, Asking why in ai: Explainability of intelligent systems–perspectives and challenges, Intelligent Systems in Accounting, Finance and Management 25 (2018) 63–72.
[14] A. Adadi, M. Berrada, Peeking inside the black-box: A survey on explainable artificial intelligence (xai), IEEE Access 6 (2018) 52138–52160.
[15] Y. Huang, G. Poderi, S. Šćepanović, H. Hasselqvist, M. Warnier, F. Brazier, Embedding internet-of-things in large-scale socio-technical systems: A community-oriented design in future smart grids, in: The Internet of Things for Smart Urban Ecosystems, Springer, 2019, pp. 125–150.
[16] F. Fanitabasi, E. Pournaras, Appliance-level flexible scheduling for socio-technical smart grid optimization, IEEE Access (2020).
[17] G. Fortino, W. Russo, C. Savaglio, W. Shen, M. Zhou, Agent-oriented cooperative smart objects: From iot system design to implementation, IEEE Transactions on Systems, Man, and Cybernetics: Systems 48 (2017) 1939–1956.
[18] W. Galuba, K. Aberer, Z. Despotovic, W. Kellerer, Protopeer: a p2p toolkit bridging the gap between simulation and live deployement, in: Proceedings of the 2nd International Conference on Simulation Tools and Techniques, ICST (Institute for Computer Sciences, and Social-Informatics, p. 60.
[19] I. Gerostathopoulos, E. Pournaras, Trapped in traffic? a self-adaptive framework for decentralized traffic optimization, in: 2019 IEEE/ACM 14th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS), IEEE, pp. 32–38.
[20] C. Adjih, E. Baccelli, E. Fleury, G. Harter, N. Mitton, T. Noel, R. Pissard-Gibollet, F. Saint-Marcel, G. Schreiner, J. Vandaele, et al., Fit iot-lab: A large scale open experimental iot testbed, in: 2015 IEEE 2nd World Forum on Internet of Things (WF-IoT), IEEE, pp. 459–464.
[21] L. Sanchez, L. Muñoz, J. A. Galache, P. Sotres, J. R. Santana, V. Gutierrez, R. Ramdhany, A. Gluhak, S. Krco, E. Theodoridis, et al., Smartsantander: Iot experimentation over a smart city testbed, Computer Networks 61 (2014) 217–238.
[22] S. Latre, P. Leroux, T. Coenen, B. Braem, P. Ballon, P. Demeester, City of things: An integrated and multi-technology testbed for iot smart city experiments, in: 2016 IEEE international smart cities conference (ISC2), IEEE, pp. 1–8.
[23] J. Struye, B. Braem, S. Latré, J. Marquez-Barja, The citylab testbed large-scale multi-technology wireless experimentation in a city environment: Neural network-based interference prediction in a smart city, in: IEEE INFOCOM 2018-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), IEEE, pp. 529–534.
[24] M. Nati, A. Gluhak, H. Abangar, W. Headley, Smartcampus: A user-centric testbed for internet of things experimentation, in: 2013 16th International Symposium on Wireless Personal Multimedia Communications (WPMC), IEEE, pp. 1–6.
[25] J. Jiang, R. Pozza, N. Gilbert, K. Moessner, Makesense: An iot testbed for social research of indoor activities, arXiv preprint arXiv:1908.03380 (2019).
[26] A. Cimmino, V. Oravec, F. Serena, P. Kostelnik, M. Poveda-Villalón, A. Tryferidis, R. García-Castro, S. Vanya, D. Tzovaras, C. Grimm, Vicinity: Iot semantic interoperability based on the web of things, in: 2019 15th International Conference on Distributed Computing in Sensor Systems (DCOSS), IEEE, pp. 241–247.
[27] M. M. Hossain, S. Al Noor, Y. Karim, R. Hasan, Iotbed: A generic architecture for testbed as a service for internet of things-based systems., in: ICIOT, pp. 42–49.
[28] N. Sinha, K. E. Pujitha, J. S. R. Alex, Xively based sensing and monitoring system for iot, in: 2015 International Conference on Computer Communication and Informatics (ICCCI), IEEE, pp. 1–6.
[29] A. M. Mzahm, M. S. Ahmad, A. Y. Tang, Agents of things (aot): An intelligent operational concept of the internet of things (iot), in: 2013 13th International Conference on Intellient Systems Design and Applications, IEEE, pp. 159–164.
[30] L. Atzori, A. Iera, G. Morabito, Siot: Giving a social structure to the internet of things, IEEE communications letters 15 (2011) 1193–1195.
[31] F. Cicirelli, A. Guerrieri, G. Spezzano, A. Vinci, O. Briante, G. Ruggeri, isapiens: A platform for social and pervasive smart environments, in: 2016 IEEE 3rd World Forum on Internet of Things (WF-IoT), IEEE, pp. 365–370.
[32] M. Pipattanasomporn, M. Kuzlu, W. Khamphanchai, A. Saha, K. Rathinavel, S. Rahman, Bemoss: An agent platform to facilitate grid-interactive building operation with iot devices, in: 2015 IEEE Innovative Smart Grid Technologies-Asia (ISGT ASIA), IEEE, pp. 1–6.
[33] V.-M. Scuturici, S. Surdu, Y. Gripay, J.-M.
Petit, Ubiware: Web-baseddynamic data & service management platform for ami, in: Proceedingsof the posters and demo track, 2012, pp. 1–2.[34] N. M. do Nascimento, C. J. P. de Lucena, Fiot: An agent-based frameworkfor self-adaptive and self-organizing applications based on the internet ofthings, Information Sciences 378 (2017) 161–176.[35] L. Luceri, F. Cardoso, M. Papandrea, S. Giordano, J. Buwaya, S. Kundig,C. M. Angelopoulos, J. Rolim, Z. Zhao, J. L. Carrera, et al., Vivo: Asecure, privacy-preserving, and real-time crowd-sensing framework forthe internet of things, Pervasive and Mobile Computing 49 (2018) 126–138.[36] P. Vlacheas, R. Gia ﬀ reda, V. Stavroulaki, D. Kelaidonis, V. Foteinos,G. Poulios, P. Demestichas, A. Somov, A. R. Biswas, K. Moessner, En-abling smart cities through a cognitive management framework for theinternet of things, IEEE communications magazine 51 (2013) 102–111.[37] F. Zambonelli, M. Viroli, G. Fortino, B. Re, Towards adaptive ﬂow pro-gramming for the iot: The ﬂuidware approach, in: 2019 IEEE Interna-tional Conference on Pervasive Computing and Communications Work-shops (PerCom Workshops), IEEE, pp. 549–554.[38] C. Savaglio, M. Ganzha, M. Paprzycki, C. B˘adic˘a, M. Ivanovi´c,G. Fortino, Agent-based internet of things: State-of-the-art and researchchallenges, Future Generation Computer Systems 102 (2020) 1038–1053.[39] P. Novoa-Hern´andez, C. C. Corona, D. A. Pelta, Self-adaptation in dy-namic environments-a survey and open issues, International Journal ofBio-Inspired Computation 8 (2016) 1–13.[40] C. Krupitzer, F. M. Roth, S. VanSyckel, G. Schiele, C. Becker, A sur-vey on engineering approaches for self-adaptive systems, Pervasive andMobile Computing 17 (2015) 184–206.[41] L. Baresi, L. Pasquale, Live goals for adaptive service compositions, in:Proceedings of the 2010 ICSE Workshop on Software Engineering forAdaptive and Self-Managing Systems, pp. 114–123.[42] A. Urbieta, A. Gonz´alez-Beltr´an, S. B. Mokhtar, M. A. Hossain, L. 
Capra,Adaptive and context-aware service composition for iot-based smartcities, Future Generation Computer Systems 76 (2017) 262–274.[43] K. Geihs, M. Wagner, Context-awareness for self-adaptive applicationsin ubiquitous computing environments, in: International Conference onContext-Aware Systems and Applications, Springer, pp. 108–120.[44] L. Barakat, S. Miles, M. Luck, Adaptive composition in dynamic serviceenvironments, Future Generation Computer Systems 80 (2018) 215–228.[45] Y. Brun, R. Desmarais, K. Geihs, M. Litoiu, A. Lopes, M. Shaw, M. Smit,A design space for self-adaptive systems, in: Software Engineering forSelf-Adaptive Systems II, Springer, 2013, pp. 33–50.[46] A. Ferscha, Collective adaptive systems, in: Adjunct Proceedings of the2015 ACM International Joint Conference on Pervasive and UbiquitousComputing and Proceedings of the 2015 ACM International Symposiumon Wearable Computers, pp. 893–895.[47] S. K. YR, H. Champa, An extensive review on sensing as a serviceparadigm in iot: Architecture, research challenges, lessons learned andfuture directions, International Journal of Applied Engineering Research / IFIP / USENIX International Conference on Dis-tributed Systems Platforms and Open Distributed Processing, Springer,pp. 79–98.[65] Q. Wang, W. U. Hassan, A. Bates, C. Gunter, Fear and logging in theinternet of things, in: Network and Distributed Systems Symposium.[66] F. Bonomi, R. Milito, J. Zhu, S. Addepalli, Fog computing and its rolein the internet of things, in: Proceedings of the ﬁrst edition of the MCCworkshop on Mobile cloud computing, ACM, pp. 13–16.[67] H. Cai, B. Xu, L. Jiang, A. V. Vasilakos, Iot-based big data storage sys-tems in cloud computing: perspectives and challenges, IEEE Internet ofThings Journal 4 (2016) 75–87.[68] D. Trihinas, G. Pallis, M. Dikaiakos, Low-cost adaptive monitoring tech-niques for the internet of things, IEEE Transactions on Services Comput-ing (2018).[69] E. Pournaras, M. Vasirani, R. E. Kooij, K. 
Aberer, Decentralized plan-ning of energy demand for the management of robustness and discomfort, IEEE Transactions on Industrial Informatics 10 (2014) 2280–2289.[70] M. Jelasity, S. Voulgaris, R. Guerraoui, A.-M. Kermarrec, M. Van Steen,Gossip-based peer sampling, ACM Transactions on Computer Systems(TOCS) 25 (2007) 8.[71] B. H. Bloom, Space / time trade-o ﬀ s in hash coding with allowable errors,Communications of the ACM 13 (1970) 422–426.[72] E. Pournaras, S. Jung, S. Yadhunathan, H. Zhang, X. Fang, Socio-technical smart grid optimization via decentralized charge control of elec-tric vehicles, Applied Soft Computing 82 (2019) 105573.[73] E. Pournaras, E. Gaere, R. Kunz, A. N. Ghulam, Democratizing data an-alytics: Crowd-sourcing decentralized collective measurements, in: 13thInternational Conference on Self-adaptive and Self-organizing Systems(SASO 2019), IEEE.[74] E. Kaddoum, C. Raibulet, J.-P. Georg´e, G. Picard, M.-P. Gleizes, Crite-ria for the evaluation of self-* systems, in: Proceedings of the 2010 ICSEWorkshop on Software Engineering for Adaptive and Self-Managing Sys-tems, ACM, pp. 29–38.[75] J. Nikolic, E. Pournaras, Structural self-adaptation for decentralized per-vasive intelligence, in: 2019 22nd Euromicro Conference on Digital Sys-tem Design (DSD), IEEE, pp. 562–571.s in hash coding with allowable errors,Communications of the ACM 13 (1970) 422–426.[72] E. Pournaras, S. Jung, S. Yadhunathan, H. Zhang, X. Fang, Socio-technical smart grid optimization via decentralized charge control of elec-tric vehicles, Applied Soft Computing 82 (2019) 105573.[73] E. Pournaras, E. Gaere, R. Kunz, A. N. Ghulam, Democratizing data an-alytics: Crowd-sourcing decentralized collective measurements, in: 13thInternational Conference on Self-adaptive and Self-organizing Systems(SASO 2019), IEEE.[74] E. Kaddoum, C. Raibulet, J.-P. Georg´e, G. Picard, M.-P. 
Gleizes, Crite-ria for the evaluation of self-* systems, in: Proceedings of the 2010 ICSEWorkshop on Software Engineering for Adaptive and Self-Managing Sys-tems, ACM, pp. 29–38.[75] J. Nikolic, E. Pournaras, Structural self-adaptation for decentralized per-vasive intelligence, in: 2019 22nd Euromicro Conference on Digital Sys-tem Design (DSD), IEEE, pp. 562–571.