openCoT: The opensource Cloud of Things platform
Abolfazl Danayi and Saeed Sharifian, Amirkabir University of Technology, Tehran, Iran
[email protected], [email protected]
Abstract—In order to address the growing complexity and extensiveness of technology, Cloud Computing is offered through four main service models. The most recent service model, Function-as-a-Service (FaaS), enables developers to build their applications in a function-based structure and then deploy them to the Cloud. With an optimal elastic auto-scaling policy, the performance of executing an application over a FaaS Cloud outweighs the extra overhead and reduces the total cost. However, researchers need a simple and well-documented FaaS Cloud manager in order to implement their proposed auto-scaling algorithms. In this paper, we present the openCoT platform and explain its building blocks and details. Experimental results show that executing a function (invoking it and passing arguments) and returning the result using openCoT takes 21 ms over a remote connection. The source code of openCoT is available in the GitHub repository of the project for public usage.
Index Terms—Cloud Computing, FaaS, serverless, function-as-a-service, cloud of things
1 INTRODUCTION
Cloud Computing is one of the answers to the problem of the growing complexity and extensiveness of technology. Besides, it provides a faster platform for application development and a better way of updating an application. According to the NIST definition in [1], a cloud achieves this goal by providing on-demand resources that can be rapidly provisioned and released. To this end, clouds offer different service models. In both academia and enterprise, IaaS, PaaS and SaaS are well known. A recently born service model is FaaS, in which the cloud allows a user to execute code in the form of a function, so that the user does not face the complexity of managing, scheduling and executing the code over the underlying resources [2].

In order to make optimal use of the benefits of this model, the programming architecture must be reviewed and possibly modified. Historically, the monolithic programming architecture has been the dominant choice for both application development and execution [3]. However, there have been alternatives to this approach. The microservices architecture, a subset of the Service Oriented Architectures [4], is one of the potentially optimal choices for utilizing FaaS Clouds [5], [2], [6]. Service-oriented programming has some overheads (such as API calls) in comparison to plain programs and, at first glance, seems to suffer from lower efficiency. But when we consider its execution on Clouds, the scaling and resource allocation may result in better overall efficiency [7], [2], [8]. The usage of FaaS clouds for microservices programming is receiving growing attention, and in [9] the authors proposed a benchmark procedure for evaluating the performance of FaaS clouds.

In the cases of IaaS, PaaS and SaaS, researchers have published many papers on the auto-scaling subject, and this problem has received sufficient attention. However, as FaaS is a recent service model, its auto-scaling algorithms need a new branch of research work. In this paper, we propose the openCoT platform, which is designed and implemented to help researchers implement their cloud provisioning algorithms and analyze the output in the real world. openCoT has a modular design and can be set up easily in a local or remote network.

The rest of this paper is structured as follows. In section 2, the main blocks of openCoT are defined and explained. Then, the underlying structure, abstracted as a Node, is covered in section 3. In the next section, the heart of the system, the Controller module, and the mechanisms used to connect Nodes to the Controller are presented. In section 5, the implementation details and experimental results are given, and in the last section the conclusion and future works are established.
2 MAIN BLOCKS OF THE ARCHITECTURE
The main concepts in openCoT are the Controller, Node, Function and Cloud Broker. The Cloud Broker is not a part of openCoT itself, but is the external layer that uses the Controller and is therefore considered a part of the architecture. In this section, we introduce each building block using a top-down approach.
The Broker uses the Controller's APIs in order to make use of openCoT. The Broker is responsible for:
• Collecting user requests
• Formatting requests and inserting them into openCoT
• Collecting the returned values of function executions and passing them to the corresponding user/application(s)
• Auto-scaling the system using openCoT's scaling API (the auto-scaling format will be discussed later)
• Setting up the ports table (ports and communication mechanisms will be discussed later)
• Setting up the initial state of the system
• Setting up a folder for the functions' source codes and introducing its path to the Controller
• Introducing the number of Nodes inside each cluster to openCoT
The heart of openCoT is the Controller: the core that sets up the servers with which Nodes communicate and, on the other hand, provides a simple API for the Cloud Broker. The Controller dispatches Function Execution Requests (FERs) between Nodes and scales the system based on the Broker's orders.
A Node is a host computer that is set up and running openCoT's Node.py program. A Node only needs to know the Internet address (IP and port) of the Clerk server (in the Controller); it then automatically starts pairing with the Controller, auto-scaling itself, and receiving Function Execution Requests, executing them and returning the RET values. In order to support heterogeneity, we have also defined a concept called a Cluster. A Cluster is a number of Nodes with similar physical attributes. Nodes automatically download the source codes of functions from the Controller and build Docker [10] images. On a Node there is a number of Function Execution Units. Each Function Execution Unit is a Docker container that can run its corresponding function when invoked.

A Function consists of a func.py and a requirements.txt file, inside a folder whose name is the name of that function. The func.py file is the main code part of the function. The following script is the simplest example of a func.py file and shows the fixed naming and definition structure of a function.

def f(FER):
    return {'ret': 'Hello Cloud of Things!'}
Each request for the execution of a function is shaped as an FER, a defined entity that exists inside openCoT until a Node (that can handle that function) receives it for execution. An FER has the following structure:

{'id': ID, 'x': INPUTS, 'm': METADATA}

INPUTS is a dictionary with an arbitrary structure, which is defined by the developer user and announced to the consuming users if needed. METADATA is the data provided by the Cloud Broker (the upper layer of openCoT) and is passed to the function too; the Cloud Broker should announce the structure of METADATA to the users. The Cloud Broker has to provide all three fields when submitting an FER to the system, though the ID field is eliminated from the FER when it is passed to the function for execution. The ID is attached to the corresponding RET again and passed back by the Broker at the end, so the Broker knows which request each RET belongs to. RET values have the following structure, where RET_VAL is the dictionary returned by the function and STATUS holds the status of the execution (e.g. OK or ERROR):

{'id': ID, 'ret': {'stat': STATUS, 'val': RET_VAL}}
3 CLUSTER AND NODE

In the previous section, we briefly explained Nodes and Clusters. In this section, we provide detailed information about Nodes. As mentioned before, on a Node there is a number of Function Execution Units (FEUs) that run functions when invoked. Each FEU has its own Docker container and is able to execute one FER in parallel with the other FEUs. The structure of a Node in openCoT is shown in Fig. 1.

Fig. 1. The structure of a Node

A Node provides five characteristics:
• Communication with the (remote) Controller
• Performing the auto-scaling mechanism
• Performing the FER execution mechanism
• Performing the RET collection mechanism
• Performing the container deployment mechanism
As shown in Fig. 1, the three top classes are Node, NodeService and Deployer. Node.py is the main building block and communicates with the Controller, while the NodeService is responsible for creating, managing and communicating with FEUs via FEUService objects. However, when creating an FEU container, the corresponding Docker image must exist on the system; thus, the Deployer class is responsible for managing and creating FEU images when necessary. Along with FER execution and FEU image deployment, the third mechanism an openCoT Node provides is auto-scaling. These three mechanisms are explained in the following subsections.
In the function execution mechanism shown in Fig. 5, three modules are involved on a Node. In this subsection we describe each of them.
On the Controller's side, a Gate is devoted to each function. A Gate consists of two entities: a Dispatcher, the server that sends FERs to Nodes, and a Collector, which receives RETs from Nodes; they are named the PUSH and PULL servers, respectively. On the other side, the Agents (on Nodes) are the Gate's clients. An Agent is a process which asks the PUSH server for FERs. If no FERs have been submitted to the Gate, it waits a determined period of time and then rechecks. Otherwise, if the Gate responds with an FER, the Agent schedules the FER on an available (non-busy) FEU via the NodeService, waits for completion, sends the result to the PULL server, and restarts this cycle. It is worth mentioning that the number of Agents for a function on a Node is equal to the number of FEUs for that function. Thus, when an Agent asks the NodeService for an available FEU, there is at least one available FEU. However, FEUs are not bound to Agents; in other words, an Agent can send its FER to any FEU.
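One iteration of this Agent cycle can be sketched as follows. The three callables are stand-ins for the zmq exchanges and the NodeService call, so the names and signatures are assumptions, not openCoT's actual API:

```python
import time

def agent_cycle(request_fer, execute_on_feu, deliver_ret, wait_s=0.1):
    """One pass of the Agent loop: ask the PUSH server for an FER,
    run it on a free FEU, and hand the RET to the PULL server."""
    fer = request_fer()            # REQ to the Gate's PUSH server
    if fer is None:                # no FER submitted to the Gate yet
        time.sleep(wait_s)         # wait a determined period, then recheck
        return False
    ret = execute_on_feu(fer)      # schedule on an available FEU and wait
    deliver_ret(ret)               # send the result to the PULL server
    return True
```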
The FEUService class is a thread that provides an inter-process TCP/IP connection (over localhost) with its FEU and sends FERs to the FEU over this socket. It also receives RETs and passes them to the NodeService (and the Agent) using a Python asynchronous scheme. The FEUService is also responsible for sending the FIN message to the FEU, which causes the FEU to finish its life-cycle and shut down. Unlike Agents, FEUServices are bound to their corresponding FEUs.
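The dictionary-to-plain-bytes conversions this exchange relies on (provided by common_convs.py, described later) could look like the following minimal sketch. A JSON/ASCII encoding is an assumption here, consistent with the ASCII transfer noted in the experimental section, but not necessarily the library's exact wire format:

```python
import json

def dict_to_bytes(d):
    # Serialize a FER/RET dictionary into plain ASCII bytes for the socket.
    return json.dumps(d).encode('ascii')

def bytes_to_dict(b):
    # Inverse conversion on the receiving side.
    return json.loads(b.decode('ascii'))
```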
The Function Execution Unit (FEU) is a Docker-based container which receives FERs and passes them to the function implemented on it. The FEU structure is shown in Fig. 2.

Fig. 2. The structure of an FEU

Each FEU container immediately starts the FEU.py program, which reads the Boot file. The Boot file consists of the inner server's IP and port. This server is implemented in the Core.py file and listens on the dedicated port. An important trick is that this inner port is the same within every FEU container, but is mapped to a different outer port by the Docker engine and the NodeService. In the first version of openCoT, Core.py's server can handle two types of requests (primitives): EXE and FIN. When the FEUService invokes the EXE primitive with the plain (bytes) FER attached, Core.py executes Func.py and returns the plain RET over the TCP/IP connection to the FEUService. The FIN primitive, on the other hand, requests Core.py to close the program.
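The two primitives reduce to a small dispatch routine; the function below is a simplified stand-in for Core.py's request handling (names and framing are assumptions):

```python
def handle_primitive(primitive, payload, run_function):
    """Dispatch one request received by the FEU's inner server.

    primitive    -- b'EXE' or b'FIN'
    payload      -- the plain-bytes FER (used by EXE)
    run_function -- callable wrapping Func.py's f(); returns plain-bytes RET
    Returns (reply, keep_running).
    """
    if primitive == b'EXE':
        return run_function(payload), True   # execute and reply with the RET
    if primitive == b'FIN':
        return b'', False                    # end the FEU's life-cycle
    raise ValueError('unknown primitive: %r' % (primitive,))
```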
Another important task of the Node class is the scaling mechanism. As explained in the previous subsection, a Node consists of FEUs, FEUServices and Agents, and on a Node the numbers of FEUs, FEUServices and Agents are the same. We define the scaling process of a Node as the process of allocating a Scaling Table over it. The transmission of this table from the Controller to the Node will be covered later in this paper. It is up to the NodeService class to perform the allocation. The structure of a Scaling Table is given in the following code segment. It is a dictionary where the keys are the names of the requested functions (the same as the FEU images), N_i is the number of instances of that function, and P_i is the portion of the CPU allocated to each of those N_i FEUs.

{'function_1': (N_1, P_1),
 'function_2': (N_2, P_2),
 ...
 'function_M': (N_M, P_M)}

When the NodeService receives a Scaling Table, it first flushes the Node, in other words closes all current FEUs by sending the FIN primitive to them through their FEUServices. After that, the NodeService creates new FEUs and FEUServices and binds each FEU to its corresponding FEUService; finally, it hands all created FEUServices to the upper layer, Node.py. Finally, Node.py creates N_i Agents for each function. As an instance, the following Scaling Table requests the creation of 1 hellocot FEU with 10% of the CPU, 2 echo FEUs with 10% of the CPU each, and 3 fft FEUs with 20% of the CPU each. (Note that, since the table is a dictionary, each function label can appear only once.)

{'hellocot': (1, 0.1),
 'echo': (2, 0.1),
 'fft': (3, 0.2)}
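Expanding a Scaling Table into per-FEU allocations is straightforward bookkeeping; the following sketch illustrates it (the function name is ours, not openCoT's):

```python
def expand_scaling_table(table):
    # Turn {'func': (N, P), ...} into one (function, cpu_portion) entry
    # per FEU that the NodeService has to create.
    slots = []
    for func_name, (n_instances, cpu_portion) in table.items():
        slots.extend([(func_name, cpu_portion)] * n_instances)
    return slots
```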
Another key process is the deployment mechanism. When Node.py receives the scaling request (Scaling Table), before passing it to the NodeService it checks whether the images of the requested FEUs exist (are cached) on the Node. If one or more are missing, Node.py requests the source files, func.py and requirements.txt, from the Clerk server on the Controller. Then, using the Deployer class, it creates FEU images for the missing functions. We call this process deployment. The Deployer class, as shown in Fig. 1, has access to the Base source files. The Base contains the following list:
• The FEU.py file
• The Core.py file
• The Boot file
• The common_convs.py file
• The Dockerfile

Fig. 3. The structure of the Controller
The first three files are already known to the reader and are the same classes that exist in the FEU. The common_convs.py file is used inside the FEU but is not shown in Fig. 2; it is a simple library that provides dictionary-to-plain-bytes conversion functions, which are used by the FEUService and Core.py. The last file, the Dockerfile, is the most important one: it holds the settings the Docker engine needs in order to create an FEU image.
4 CONTROLLER
The previous section explored the structure and functionalities of Nodes. In this section, we introduce the Controller together with the mechanisms used for Node-Controller communication, which are built upon the high-performance ZeroMQ (zmq) library [12]. The structure of the Controller is shown in Fig. 3. Starting with the next subsection, we first explain this structure and then discuss the proposed communication mechanisms, one per subsection.
As shown in Fig. 3, the Controller consists of four main communication mechanisms that are implemented in the ControllerCore.py class. We call each of these mechanisms a space. The first space, the Methods space, is the wrapper of the high-level functions offered to the Cloud Broker. The Clerk server is used to announce the Internet addresses of the other servers. The auto-scaling space consists of an Events server and a number of Scaling servers. Finally, the Gates space is responsible for sending tasks to Nodes.

Furthermore, three more modules are embedded in the Controller. The Functions database is a root directory which contains the source (func.py) and requirements files of each function, within a sub-directory named with the label of that function. Init is a directory that includes the initialization settings, such as the ports table, the labels of the clusters and the number of Nodes inside each cluster. The Autoscaler is another module, which ControllerCore.py utilizes in order to manage the auto-scaling process and keep track of the Nodes' scaling.
The Methods space provides high-level methods (function calls) for the Cloud Broker. The main methods are listed and explained below.
• Autoscale: This method receives the Auto-Scaling Table. Although the structure of a Node's Scaling Table has already been explained, the Auto-Scaling Table is slightly different: it describes how many Nodes within each cluster, and with which Scaling Table(s) (there can be more than one), are required.
• Push FER: This method receives an FER and the function label, and pushes the FER into the FERs queue for that function label.
• Pop RET: The argument of this method is the function label; it returns a RET object on a FIFO basis.
• Check Available: This method takes the function label and returns True if there are RET objects available in the queue of that function. The Broker can then call the Pop RET method to get such a RET.
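From the Broker's point of view, these methods combine into a simple submit-and-collect loop. The sketch below assumes hypothetical method names (push_fer, check_available, pop_ret) on a controller object; openCoT's actual identifiers may differ:

```python
def run_batch(controller, label, fers):
    # Submit a batch of FERs for one function label, then poll the
    # Methods space until every RET has come back (FIFO order).
    for fer in fers:
        controller.push_fer(label, fer)
    rets = []
    while len(rets) < len(fers):
        if controller.check_available(label):
            rets.append(controller.pop_ret(label))
    return rets
```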
The Clerk server is used by Nodes to query information from the Controller. As mentioned before, we use the ZeroMQ platform for communication. ZeroMQ offers four messaging patterns, and the REQ/REP pattern fulfills the Clerk server's requirements better than the others. In this pattern, a Node sends a REQ message and the Clerk server replies with a REP. The first usage of the Clerk server is to query whether the Controller is set up or not: the Node sends a chk message and the server replies with an OK message. The second and third usages are the queries for the Gate ports table and the auto-scaling ports table. The fourth case is the function source query, in which the Node asks the Clerk server for the source code of a function when it needs to deploy that function.

Fig. 4. The auto-scaling space

As depicted in Fig. 4, a Scaling server is dedicated to each cluster. This server follows the REQ/REP pattern. In addition to these servers, the Scaling Events server is shared between all clusters and follows the PUB/SUB pattern. Whenever the Cloud Broker submits an Auto-Scaling Table to the Controller, the Events server publishes the scaling event to all of the listening (subscriber) Nodes. As the Nodes receive this event, each sends a scaling request to the corresponding Scaling server, and the server returns that Node's Scaling Table. It is also possible for the Scaling server to return a null Scaling Table; in this case, the Node finds out that it is no longer needed, first flushes all of its working FEUs, FEUServices and Agents, and then shuts itself down.
Fig. 5. The function execution mechanism
As mentioned before, the function execution mechanism has two main sides: a Gate, which is devoted to each function on the Controller, and the Agents on the Nodes. This space is shown in Fig. 5. When the Cloud Broker submits FERs using the Controller's Methods space, the Controller pushes each FER into the FERs queue of the related Gate. On the other side, the Agent's Pull client sends an FER request to the PUSH server using the REQ/REP pattern. The PUSH server checks the FERs queue; if there are FERs, it pops one from the queue and sends it to the client. Conversely, when the queue is empty, the PUSH server returns a Null REP message. After the PUSH server passes an FER to the Agent, the Agent executes the FER, acquires the RET, and then its Push client pushes the RET object to the PULL server. The PUSH/PULL pattern is used for the connection between the PULL server and its clients.
5 EXPERIMENTAL RESULTS
In this section we check the performance and functionality of openCoT in two scenarios. In Scenario-A, the execution of the hellocot function (available as a default function in the GitHub repository of the project) is analyzed. In Scenario-B, openCoT is used to calculate the fast Fourier transform (FFT) of 100 blocks, where each block contains 256 samples. Two host computers are used in the scenarios: C_a has an Intel(R) Core(TM) i7-6500U CPU @ 2.50 GHz with 8 GB of RAM, and C_b utilizes an AMD Athlon(tm) II X2 240 processor @ 2.8 GHz with 3 GB of RAM. In both scenarios, C_a runs the Controller. It is worth mentioning that the Controller computer can be a Node too; in this case, the connection is made using the localhost IP.

In this scenario, the hellocot function is first executed in a typical Python environment 1000 times on the C_a computer. The results show that the mean execution time is 1.196 microseconds with a standard deviation of 0.3 microseconds. Then C_a is utilized as the Controller and simultaneously as a Node with 10 FEUs, and this configuration is tested; the result is shown in Fig. 6. In each iteration, 1000 FERs are submitted, and when all RETs have been received the mean execution time is calculated. Finally, C_b is used for the same test, and the result is shown in Fig. 7.

Fig. 6. Scenario-A, result 1

Fig. 7. Scenario-A, result 2

Fig. 8. Scenario-B result
In this scenario, a simple FFT function is implemented using the NumPy library [13] and is again called 1000 times on C_a outside openCoT, for blocks of size 256. In this case we measure an average execution time of 98 microseconds with a standard deviation of 64 microseconds. Then both C_a and C_b are used, and the result is shown in Fig. 8.

Based on the result of Scenario-A1, the overhead of running a function using openCoT on the local host is 6.2 ms. Using the ping command, it is determined that the round-trip time between C_a and C_b is 6.25 ms; the pure overhead in Scenario-A2 is 21 ms. The difference between these two values comes from the fact that C_a is more powerful than C_b, and thus it can be concluded that the overhead time is related to the power of the computers on both sides (Node and Controller).

According to the results of Scenario-A, we can predict the mean execution time of Scenario-B as

(10 · T_Ca + 10 · T_Cb) / (10 + 10) ≈ 13 s

However, the measured mean execution time is somewhat higher, at 16.1 s. The difference comes from the high data rate of Scenario-B, as the values are sent and received in ASCII format.
6 CONCLUSION AND FUTURE WORKS
In this work, motivated by the goal of providing academia with a simple and well-documented FaaS Cloud platform for research purposes, we introduced the openCoT platform and described its main blocks: the Broker, Controller, Node and FEU. After that, we explained the structure of Nodes and FEUs in detail. Three communication mechanisms were proposed for the Controller-Node connection. We tested the performance of openCoT and measured the overhead time added to function execution. Experimental results show that on the local host 6.25 ms is added by openCoT, while this increases to 21 ms over a remote connection. Based on these results, we suggest the following as future works for openCoT:
• This version of openCoT only supports CPU allocation. However, bandwidth allocation plays an essential role too and must be added to openCoT.
• In this paper, we have tested openCoT using two simple scenarios. More comprehensive benchmarks are needed.
• Currently, the function execution mechanism of openCoT supports Python objects which can be transformed into JSON strings. In order to provide better performance, a bytes-level data transfer should be utilized.
• In this version of openCoT, there are no security considerations. For commercial use, or for more precise academic analysis, considering security protocols (e.g. Node authentication) is vital.

REFERENCES
[1] P. Mell, T. Grance et al., "The NIST definition of cloud computing," 2011.
[2] A. Abrahamsson, "Using function as a service for dynamic application scaling in the cloud," 2018.
[3] S. Daya, N. Van Duy, K. Eati, C. M. Ferreira, D. Glozic, V. Gucer, M. Gupta, S. Joshi, V. Lampkin, M. Martins et al., Microservices from Theory to Practice: Creating Applications in IBM Bluemix Using the Microservices Approach. IBM Redbooks, 2016.
[4] P. Di Francesco, I. Malavolta, and P. Lago, "Research on architecting microservices: trends, focus, and potential for industrial adoption," in Software Architecture (ICSA), 2017 IEEE International Conference on. IEEE, 2017, pp. 21-30.
[5] F. Alder, N. Asokan, A. Kurnikov, A. Paverd, and M. Steiner, "S-FaaS: Trustworthy and accountable function-as-a-service using Intel SGX," arXiv preprint arXiv:1810.06080, 2018.
[6] E. van Eyk, L. Toader, S. Talluri, L. Versluis, A. Uță, and A. Iosup, "Serverless is more: From PaaS to present cloud computing," IEEE Internet Computing, vol. 22, no. 5, pp. 8-17, 2018.
[7] G. McGrath and P. R. Brenner, "Serverless computing: Design, implementation, and performance," in Distributed Computing Systems Workshops (ICDCSW), 2017 IEEE 37th International Conference on. IEEE, 2017, pp. 405-410.
[8] T. Asghar, S. Rasool, M. Iqbal, Z. Qayyum, A. Noor Mian, and G. Ubakanma, "Feasibility of serverless cloud services for disaster management information systems," 2018.
[9] T. Back and V. Andrikopoulos, "Using a microbenchmark to compare function as a service solutions," in European Conference on Service-Oriented and Cloud Computing. Springer, 2018, pp. 146-160.
[10] D. Merkel, "Docker: lightweight Linux containers for consistent development and deployment," Linux Journal, vol. 2014, no. 239, p. 2, 2014.
[11] M. F. Sanner et al., "Python: a programming language for software integration and development," J Mol Graph Model, vol. 17, no. 1, pp. 57-61, 1999.
[12] P. Hintjens, ZeroMQ: Messaging for Many Applications. O'Reilly Media, Inc., 2013.
[13] NumPy Developers, "NumPy."