From Zero to Fog: Efficient Engineering of Fog-Based IoT Applications
Tobias Pfandzelter, Jonathan Hasenburg, and David Bermbach
Mobile Cloud Computing Research Group
Technische Universität Berlin & Einstein Center Digital Future
{tp, jh, db}@mcc.tu-berlin.de
August 19, 2020

Abstract
In IoT data processing, cloud computing alone does not suffice due to latency constraints, bandwidth limitations, and privacy concerns. By introducing intermediary nodes closer to the edge of the network that offer compute services in proximity to IoT devices, fog computing can reduce network strain and high access latency to application services. While this is the only viable approach to enable efficient IoT applications, the issue of component placement among cloud and intermediary nodes in the fog adds a new dimension to system design. State-of-the-art solutions to this issue rely on either simulation or solving a formalized assignment problem through heuristics, which are both inaccurate and fail to scale with a solution space that grows exponentially. In this paper, we present a three-step process for designing practical fog-based IoT applications that uses best practices, simulation, and testbed analysis to converge towards an efficient system architecture. We then apply this process in a smart factory case study. By deploying filtered options to a physical testbed, we show that each step of our process converges towards more efficient application designs.
1 Introduction

For more than a decade, cloud computing has been the dominating paradigm when designing and deploying software services, but this is not a good fit for new application domains such as the Internet of Things (IoT): Sending the world's IoT data to a centralized cloud for processing is not only inefficient but also prohibitively expensive [1]. Processing should instead happen where IoT data is generated and needed [2]. Fog computing, as first proposed by Bonomi et al. [3], brings the required paradigm shift: It extends the cloud to the edge of the network so that applications can leverage additional infrastructure between the cloud and end-devices. From powerful data centers in larger cities to small, single-board computers co-located with cellular base stations, application designers can deploy their services not only in a central cloud but anywhere on the edge-cloud continuum. While cloud resources still provide elastic, seemingly infinite scalability at low cost, edge infrastructure offers service consumers low latency access while also consuming less network bandwidth [2]. Overall, fog computing enables hitherto impossible application architectures but it does not simplify application design. Even worse, when designing fog-based IoT applications, the placement of software services within the fog is now a new dimension on top of actually building the application.

Correctly placing services, however, is vital in leveraging fog computing for the IoT as it directly influences both quality and cost of applications. At the same time, the number of deployment options grows exponentially with each service or location. Existing approaches to designing fog-based IoT data processing applications each have their drawbacks. First, there are those that try to parameterize the entire system to form an optimization problem solved by heuristics or similar means (e.g., [4–8]). This requires detailed information upfront, is limited by the assumptions of the applied model, and can become insolvable for complex applications. Alternatively, a second approach is to follow guidelines, best practices, or reference architectures such as [9–12], which, while useful as a starting point, target generalized scenarios and are hence not sufficient for a specific use case. Third, there are approaches that aim to simulate the fog environment to help make informed decisions about application performance (e.g., [13–15]) and, fourth, those that introduce tooling to create (emulated) fog testbeds (e.g., [16–18]) to deploy, test, and benchmark applications. Simulation and emulation, however, do not scale well with the growing amount of application deployment options, especially given the cost of testbeds.

In this paper, we propose a new process for designing efficient fog-based systems that combines and extends existing approaches, namely following best practices, simulation, and testbed emulation. Through this combination, we leverage the advantages of each approach while mitigating their respective limitations. For instance, we apply best practices to reduce the parameter space for simulation, which prevents incurring the costs of simulating the entire parameter space without sacrificing the accuracy of simulation results. Our overall goal is, hence, to identify an efficient fog application design as effectively as possible.

To this end, we make two core contributions:

• We extend and integrate previous research of ours into a novel framework. We use best practices [9], simulation with FogExplorer [13, 14], and infrastructure emulation with MockFog [16] (Section 3).

• We implement a smart factory application following our proposed process and compare the final application design to a range of discarded design options in experiments on a physical fog testbed (Section 4).

Figure 1: The layered fog architecture comprises cloud, intermediary, and edge nodes, as well as IoT devices.
2 Background

In this section, we summarize fog computing concepts and discuss characteristics of fog-based IoT applications and efficient IoT application design.
2.1 Fog Computing

Our definition of fog computing is adapted from [2]: Fog computing is the extension of the cloud towards the edge of the network. The idea is to combine cloud resources, intermediary nodes, edge computing, and even on-device computation to distribute applications across a wide variety of infrastructure. In this way, application developers can leverage both low access latency at the edge and scalability in the cloud. We show an example of a layered fog architecture in Fig. 1.

As fog computing combines platforms from different vendors, e.g., a cloud provider or a network provider, heterogeneity is a major challenge. Different platforms are likely to provide different programming models and service levels. Furthermore, intermediary and especially edge nodes are also likely to be more expensive and less scalable than their cloud counterparts.

A major obstacle to using fog computing is that applications need to be deployed in a distributed manner, with different software components placed on different nodes in the fog. This is impossible when dealing with traditional monolithic information systems. Only a modularized application split into distinct software services allows each service to be placed at specific locations within the fog, whether that be in the cloud or towards the edge. While increasing the communication overhead, smaller services are necessary for fine-granular scaling and enable more flexibility in service placement on the fog infrastructure [2]. To this end, leveraging lightweight virtualization technologies such as Docker can make software deployment easier [19].
2.2 IoT Applications

IoT applications analyze data from sensors or process them to trigger actuator devices and software systems [9]. A key characteristic of IoT applications is that they do not follow a request-response model as in user-facing systems; instead, data move through a processing pipeline in a more "linear" way – typically in the form of a directed acyclic graph (DAG). Overall, there are two classes of IoT data processing: event processing and data analytics. Zhang et al. [1] describe these as "real-time applications with low-latency requirements" and "ambient data collection and analytics," respectively. An application often comprises multiple data processing components that can each be classified individually in this manner.

In event processing, events from the outside world, measured through connected devices, trigger reactions in the system and, by extension, possibly in the physical world. The main focus here is time sensitivity: events are expected to be reacted to as fast as possible. Advantageously, operations are thus also well-defined and simple, and events as data points are small as they only carry metadata [20].

Data analytics is the process of collecting and processing data to obtain information. Here, complex operations are applied to data from multiple sources over a longer period of time [21].
2.3 Efficient IoT Application Design

We consider two dimensions to efficiency in fog-based IoT applications: service level and cost.
Service level, often also referred to as quality of service (QoS), can be both the availability of the application and the access latency for particular services [2, 22]. Latency is highly dependent on service placement and is caused by data processing and transmission. Data processing latency describes the time that passes between the input into the processing unit, which could, for example, be a cloud function, and the output of a computed result. Data transmission latency, on the other hand, is the delay from the first packet of data to be sent by the sender to the last packet of data to be received by the receiver. To limit the scope of this paper, we do not consider availability and leave this to future work.
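Since the service level objectives introduced later constrain the combination of both latency components, it helps to keep the resulting quantity in mind. In our own notation (not a formalization used in this paper), the end-to-end latency of one application path is simply

```latex
\[
t_{\text{end-to-end}} \;=\; \underbrace{\sum_{s \in \text{services}} t_{\text{processing}}(s)}_{\text{data processing latency}} \;+\; \underbrace{\sum_{\ell \in \text{links}} t_{\text{transmission}}(\ell)}_{\text{data transmission latency}} .
\]
```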
Cost is incurred through the usage of resources in the fog, such as compute, storage, and network bandwidth, and through upfront investment in IoT devices or other hardware. Generally, compute and storage are far cheaper towards the cloud, as providers are more capable of leveraging economies of scale in large data centers rather than on the edge. For network bandwidth, fog platform providers often charge for outgoing and incoming traffic to a data center, and IoT devices may use cellular network access where each packet incurs a specific cost. These costs are the main contributors towards the total cost of operating an IoT application.

When designing fog-based IoT applications, different design options result in different service levels and cost. An efficient design offers the best possible QoS levels at the lowest possible cost, i.e., it finds a sweet spot on the QoS and cost tradeoffs [22–24], as QoS and cost are not independent from each other. Deploying powerful servers at every edge location would minimize latency but result in high cost. Similarly, moving all services to the cloud can minimize cost but dramatically increase latency [1, 2].
3 Designing Efficient IoT Applications
In this section, we present our proposed fog application design process. We start by giving a high-level overview of our approach before describing the individual steps in detail.
3.1 Process Overview

The process we propose for designing efficient fog-based IoT applications comprises five main steps. Initially, there is a broad range of design options which each describe a mapping of software services to nodes in the cloud, edge, or in-between, i.e., the service placement. Each step of the proposed process then filters out application design options, starting at the Cartesian product of all software and infrastructure models, thereby converging towards the limited set of most efficient designs. The key idea is to create a sequence of steps in which each step provides more accurate recommendations than its predecessor but is also more expensive to execute. Since each step reduces the application design space by orders of magnitude, we use more expensive analysis steps for only a limited number of options late in the process while relying on low-cost heuristics in the first steps; see Figure 2 for a high-level overview of the proposed process.

In the first step, we build models of software components and of the infrastructure on which the application will be deployed. In each later step, we then extend these models and augment them with additional details as available further in the design process. Finally, we are able to select an efficient fog application design.

In the second step, we apply a set of best practices in IoT data processing. By following these informed rules, we can already discard all highly inefficient options. This reduces the set of options that we have to consider later in the process, enabling us to move through these subsequent steps more efficiently. As the number of available options grows exponentially with each additional component, this step reduces the design options considered in the subsequent steps from millions to only thousands.

In the third step, we simulate service placement to infrastructure components. With this, we can calculate service cost based on the given cost factors and examine latency constraints for different designs. By introducing service level objectives (SLOs) for parts of the application, we can remove application design options that violate required service levels and instead focus only on inexpensive options that conform to all constraints.

In the fourth step, we set up emulated testbeds for each of the remaining application design options to deploy and benchmark software services. As this step is expensive and time-consuming, we propose to use only the options in the 95th percentile of the third step, again reducing the number of considered application design options by orders of magnitude. Based on the number of remaining design options, this selection may be limited or broadened, reducing testing cost or leading to more accurate results, respectively.

This process eventually converges towards a small set of highly efficient design options. If available, these options that show the best performance at good cost levels can then be deployed on a physical testbed or the actual infrastructure to measure their performance in their real environment (fifth step).

3.2 Preparation

Our process requires basic insights into the available runtime infrastructure and the individual software services. We start with a notion of infrastructure components, yet at this early step in the design process we cannot assume that detailed information about runtime infrastructure is available. We therefore only require high-level, abstract descriptions of available data processing locations, such as IoT devices, edge nodes, or cloud platform providers.
Such knowledge can, for example, be gained by surveying and analyzing eligible providers and products or by comparing options for IoT devices and gateways. For some more complex use cases, synthesizing possible edge infrastructure configurations as proposed by Rausch et al. [25] could be an alternative approach.

Figure 2: Starting from the set of all possible application design options derived from a software and infrastructure model, we remove poor design options in each step of the application design process, converging on the most efficient one.

Aside from infrastructure components, we also model software components. At this point, no actual implementation has to be available yet. For our model, we use three kinds of components: sources, services, and sinks. Sources are components that produce new data. For an IoT use case, sources are typically IoT sensors. Services consume data and perform operations, thereby producing new data. Services could, for instance, transform data through aggregation or trigger events. Finally, sinks are components that persist data, e.g., a database system, or interact with the physical world based on data, e.g., an IoT actuator. Sinks that persist data can also have a secondary role as sources exposing historical data. We show an example application of this kind in Fig. 3.

We define the overall application as a collection of application paths. Each application path starts with one or more data sources, has a number of services along the way, and ends in one sink, i.e., an application path is the DAG of processing steps that leads to a particular sink. Please note that we are not trying to derive a formal model as used in either mathematically formulated optimization problems or in standardized modeling languages. Rather, our efforts focus on deriving and abstracting certain properties from an application and its available infrastructure; the way this abstract information is represented is irrelevant for our purposes.
At this point, albeit early in the process, it is already useful to simplify both software and infrastructure models. In most IoT applications, specific components are instances of the same class of components. In a smart home use case, for example, there could be various light bulbs and corresponding light switches. Assuming that each switch controls a number of lights, a pattern emerges. To simplify simulation and benchmarking, we model only one application path and later apply this to all instances of light switches and lights in the system. This also allows our process to scale well and to require less upfront information about the system, while not influencing the results, as we merge instances of the same component rather than modifying them.

For sources and sinks, the mapping to infrastructure components is clear, as these are tied to the physical world. An IoT device, for instance, exists as a physical device, i.e., an infrastructure component, and as a source in the software model. Consequently, we only need to consider the placement of services, i.e., the software components that process data, in the subsequent steps of the design process.
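To keep the later steps concrete, these abstract models can be written down as a handful of data structures. The representation is deliberately left open by our process, so the following Go sketch, including all type and field names, is merely one possible illustration:

```go
package fogdesign

// ComponentKind distinguishes the three classes of software components.
type ComponentKind int

const (
	Source  ComponentKind = iota // produces new data, e.g., an IoT sensor
	Service                      // consumes data, performs operations, and produces new data
	Sink                         // persists data or acts on the physical world
)

// Component is one element of the software model.
type Component struct {
	Name string
	Kind ComponentKind
}

// Node is an abstract infrastructure location such as an IoT device,
// an edge node, or a cloud platform.
type Node struct {
	Name string
}

// Path is one application path: the processing steps that lead to a particular
// sink (stored as an ordered slice for simplicity, although the general case is a DAG).
type Path struct {
	Components []Component
}

// DesignOption maps every service to an infrastructure node. Sources and sinks
// are tied to the physical world, so only services appear in the mapping.
type DesignOption map[string]string // service name -> node name
```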
Figure 3: Example software model with two application paths, sources, services, and sinks.

3.3 Best Practices

In previous work [9], we proposed best practices for fog-based IoT application design, which we now use to exclude unsuitable application design options. In the following, we will briefly describe how we apply these best practices, which we split into rules for event processing and data analytics application paths.

In event processing, processing is time-sensitive and services should be placed on the shortest communication path between data source(s) and sink, as close to the cloud as possible to minimize cost, and as close to the edge as necessary to fulfill SLOs. As typical event processing services are not compute-intensive, minimizing round-trip time is more important than reducing processing delay. Yet, as cloud computing resources scale better and moving towards the edge reduces flexibility and increases cost, it is still important to process events as close to the cloud as possible. That means selecting the infrastructure node that provides the most flexibility and least expensive compute power from the set of nodes on the shortest path between the event source and its sink.

For data analytics, rather than time sensitivity being the focus, operations are complex and require a lot of processing power. These operations range from filtering or aggregation to predictive analytics with machine learning. Furthermore, services here must consider and even combine data from different sources. Data analytics processors that preprocess data and reduce the data volume, e.g., through filtering and aggregation, should be kept as close to the edge as possible and as close to the cloud as necessary. Compute-heavy operators, on the other hand, should be placed near the cloud, where processing is cheaper.

Given these best practices, we can filter the set of application design options. Here, we consider each application path individually. For each application path, we first identify whether it targets event processing or data analytics. For an event processing application path, infrastructure nodes that lie on the shortest path between the infrastructure components that host the event source and sink are an efficient location for software services. In data processing, we argue for preprocessing of data close to the edge where possible. This reduces usage of bandwidth towards the cloud, where we propose to place more complex data processing. We also rule out options where the resulting data flow uses the same network links more than twice.
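Applied mechanically, the event-processing rule becomes a filter over the design space: keep only those mappings whose services sit on the shortest source-to-sink path. The following Go sketch assumes that design options are given as service-to-node maps and that the shortest path has already been computed on the infrastructure graph; all function names are ours:

```go
package fogdesign

// onShortestPath reports whether a node is part of the shortest communication
// path between the infrastructure components hosting an event source and its sink.
func onShortestPath(node string, shortestPath []string) bool {
	for _, n := range shortestPath {
		if n == node {
			return true
		}
	}
	return false
}

// filterEventProcessing keeps only those design options (service -> node maps)
// that place every service of an event-processing application path on the
// shortest path between its source and sink, as the best practice demands.
func filterEventProcessing(options []map[string]string, services []string, shortestPath []string) []map[string]string {
	var kept []map[string]string
	for _, option := range options {
		valid := true
		for _, service := range services {
			if !onShortestPath(option[service], shortestPath) {
				valid = false
				break
			}
		}
		if valid {
			kept = append(kept, option)
		}
	}
	return kept
}
```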
3.4 Simulation

In the third step, we use simulation to analyze the remaining application design options. For this, we rely on FogExplorer [13, 14], which we presented in previous work. For a given mapping, FogExplorer calculates QoS and cost metrics through simulation and provides recommendations for optimizing component placement. FogExplorer can be used in an interactive way in which application designers update mappings and observe the resulting metric values instantly, but it can also be used in a batch mode through its API.

Based on an infrastructure and software model, FogExplorer calculates four metrics per mapping: processing cost, processing time, transmission cost, and transmission time. Processing cost and transmission cost describe average cost per second within the system. Processing time and transmission time describe latency induced by services and transmission of data.

To calculate these metrics, FogExplorer first determines the data stream routing by identifying the path with the lowest total bandwidth cost for each set of two communicating software components. In a second step, FogExplorer calculates resource usage to assert that the selected mapping does not exceed resource limits; for example, a connection may have a limited amount of bandwidth. FogExplorer will thus determine if the bandwidth required by any connection within the mapping exceeds the available bandwidth. In the third step, FogExplorer calculates total cost based on resource usage. Transmission costs depend on bandwidth used and the respective bandwidth price. In a similar manner, FogExplorer also calculates processing costs. In addition, FogExplorer also determines time metrics and calculates processing time and transmission time for each application path. Processing time is the total latency induced by services processing data, while transmission time is the total connection latency along the application path. Finally, FogExplorer tallies transmission costs and processing costs, as well as transmission times and processing times, to project the total cost and end-to-end latency of the given mapping.

We use FogExplorer to further filter out application design options as the third step of our proposed process. To use this simulation, we have to extend our software and infrastructure models slightly.

In the infrastructure model, we also specify different hardware options that are available for each node. For example, at an edge data center location, the installation of different types of servers with different capabilities yet also different price points may be possible. Here, FogExplorer allows us to compare these different options to find the most efficient one. While this increases the space of application design options, this is necessary to determine the optimal infrastructure. For each infrastructure option at each node, we specify a relativePerformanceIndicator, which is a rough estimate of compute power compared to a chosen reference machine. For instance, if a machine type has a performance indicator of 2, it is twice as "fast" as the reference machine. Furthermore, the availableMemory metric specifies how much memory is available for the machine and the price metric specifies the price for using the machine. Network components are extended with an availableBandwidth, a bandwidthPrice, and a latency for each connection. If latency cannot be accurately benchmarked ahead of time, it is also possible to use estimates based on link layer performance and geographical locations of nodes as done in, e.g., [26].

Similarly, we add quantitative attributes to software model components as well. Sources produce data at a constant rate that we mark as their average outputRate in the form of Byte/s. The rate at which services output data depends on their input rate, hence we use an outputRatio to calculate their outgoing bandwidth. For services, we also employ a referenceProcessingDelay factor that describes how long, on average, the service needs to process data on the aforementioned reference machine, and a requiredMemory metric to describe the amount of memory needed by the service. Of course, both sinks and sources as software components require a certain amount of memory as well once they run on an infrastructure node. The infrastructure nodes then incur cost for running these components. As we have described, however, the mapping for sources and sinks is fixed, as these components relate to objects in the physical world. Hence, while it is possible to simulate costs incurred here as well, these costs would be static and, subsequently, not influence our decision between one application design option and another, which is why we omit them in the simulation and only focus on resources required by service components. We show the extended version of our example software model from Section 3 in Fig. 4.

Figure 4: Extension of the example software model; we infer connection data rates from given outputRates and outputRatios.

We also introduce SLOs in the form of limits to end-to-end latency for each application path at this point. As we have described in Section 2.3, we measure efficiency for fog application design in cost and latency. Yet as cost and latency depend on each other, finding the most efficient application design is a difficult multi-objective optimization problem. Rather than finding the quantitatively optimal solution, we apply constraints in the form of SLOs to convert this problem into a single-objective optimization problem. (An alternative would be to use a utility function to transform the multi-objective optimization problem into a single-objective optimization problem.) While it depends on the specific application, the economic law of diminishing returns usually also applies to the tradeoff between cost and latency: To give an example, imagine both a user-facing web service and a machine-to-machine communication use case. In the first use case, investing a considerable cost to decrease latency by 10ms would often not be useful, while it can be in the second scenario. Application designers can set the required access latency for all application paths arbitrarily high or low as is required by the application, and our process will optimize cost within this specified service level.

Given these limits on end-to-end latency, we only consider those models further that satisfy these constraints in an efficient way, that is, at low cost. From the set of application design options, we select only those that do not violate the service levels for any application path as defined in Section 3.2. If no model conforms to these constraints, it is useful to reconsider the constraints or available infrastructure. From the remaining design options, we now select those that we will consider in the testbed step through the remaining influence factor, i.e., total cost.

As testbed evaluation is expensive and time-consuming, the number of application design options that will be benchmarked needs to be low. On the other hand, the design options that are identified as good options in the simulation step are not necessarily the best options, i.e., it can be beneficial to proceed with a broader variety of options. We propose to solve this tradeoff by proceeding with design options that lie in the 95th percentile when considering their total cost. If necessary, this range can be adapted.
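The metrics and checks of this step can be approximated in a few lines. The Go sketch below is not FogExplorer's actual implementation (FogExplorer is used through its Node.js API); in particular, the cost formulas are simplified assumptions of ours:

```go
package fogdesign

import "sort"

// Machine is one hardware option at an infrastructure node.
type Machine struct {
	RelativePerformanceIndicator float64 // compute power relative to the reference machine
	AvailableMemoryGB            float64
	PricePerMonth                float64
}

// ServiceSpec carries the simulation attributes of a service component.
type ServiceSpec struct {
	ReferenceProcessingDelayMs float64 // processing delay on the reference machine
	RequiredMemoryGB           float64
	OutputRatio                float64 // outgoing data rate = incoming data rate * OutputRatio
}

// Hop is one network connection used by an application path.
type Hop struct {
	LatencyMs           float64
	BandwidthPricePerGB float64
	TrafficGBPerMonth   float64
}

// processingTimeMs scales the reference delay by the hosting machine's performance.
func processingTimeMs(s ServiceSpec, m Machine) float64 {
	return s.ReferenceProcessingDelayMs / m.RelativePerformanceIndicator
}

// pathLatencyMs sums processing and transmission time along one application path;
// hosts[i] is the machine that the i-th service is mapped to.
func pathLatencyMs(services []ServiceSpec, hosts []Machine, hops []Hop) float64 {
	total := 0.0
	for i, s := range services {
		total += processingTimeMs(s, hosts[i])
	}
	for _, h := range hops {
		total += h.LatencyMs
	}
	return total
}

// transmissionCost multiplies the traffic on every hop with its bandwidth price.
func transmissionCost(hops []Hop) float64 {
	total := 0.0
	for _, h := range hops {
		total += h.TrafficGBPerMonth * h.BandwidthPricePerGB
	}
	return total
}

// violatesSLO checks one application path against its end-to-end latency limit.
func violatesSLO(latencyMs, limitMs float64) bool {
	return latencyMs > limitMs
}

// cheapestCandidates sorts SLO-compliant designs by total cost and keeps the given
// fraction, e.g., 0.05 for the percentile-based cut described in the text.
func cheapestCandidates(totalCost map[string]float64, fraction float64) []string {
	names := make([]string, 0, len(totalCost))
	for name := range totalCost {
		names = append(names, name)
	}
	sort.Slice(names, func(i, j int) bool { return totalCost[names[i]] < totalCost[names[j]] })
	keep := int(float64(len(names)) * fraction)
	if keep < 1 && len(names) > 0 {
		keep = 1
	}
	return names[:keep]
}
```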
3.5 Emulation

In the fourth step of our process, we evaluate design options through experiments on an emulated fog testbed. This evaluation requires an implementation of the application software that we can deploy to the testbed and is thus the most time-consuming and costly. Yet, the low number of viable options that remain after the first three steps of our process limits the required experiments. Furthermore, it also limits needed implementation efforts as services only need to be implemented for the platforms they could be deployed on in the remaining application design options [19, 27].

To benchmark fog application design options, we propose to use MockFog as we have presented in [16]. MockFog provides an emulated yet realistic environment for functional testing and benchmarking of fog applications in the cloud. In MockFog, cloud, edge, and intermediate nodes as well as IoT devices are instantiated as cloud virtual machines. Compute power, memory, and inter-node network characteristics such as latency or failure rates can be configured; also, failure scenarios can be emulated.

Again, we need to modify our initial software and infrastructure models to fit the model used by MockFog. Instead of a performance indicator given for machines in the infrastructure model, we now need to quantify the actual compute power, memory, and storage capabilities. Furthermore, we have to define bandwidth and latency parameters for network connections. MockFog introduces routers between connected machines rather than direct connections. Hence, in order for all nodes to be able to communicate, we have to add these routing components where applicable.

Rather than extending it, we need to replace the application model with actual implementations of service and sink components that we then deploy on the MockFog testbed. For source components, the majority of which are IoT devices, implementation is more difficult. These source components need to produce IoT data in conformance with the application model. It is possible to use traces of real IoT data, e.g., through BenchFoundry [28], or to attach real-world IoT devices, although this requires a consideration of network conditions between these devices and the MockFog testbed location. Finally, we can also employ artificial workload generators such as Apache JMeter (https://jmeter.apache.org) to generate data.

On the emulated MockFog testbed, we can then analyze the behavior of the IoT application, especially in the context of component placement. While the MockFog environment also allows us to change configuration parameters at runtime, e.g., to inject failures, we use it only to benchmark application designs under the assumption that the provided application implementation is correct.
3.6 Deployment

After these four steps, an informed decision on the best design option can be made. The selected design option is likely to be the most efficient one regarding cost and service level, as it has been selected through best practices and simulation as well as verified on an emulated fog testbed. If in doubt, the best two or three options can then also be test-deployed in the real runtime environment or on a physical testbed to further substantiate the results.
4 Evaluation

To evaluate our approach, we use a case study based on a smart factory scenario. In the first part, we follow the process described in Section 3 to show that it can be used in practice. We make all software we use available as open source (https://github.com/pfandzelter/zero2fog-artifacts).

In the second part (Section 4.2), we show that the design option identified by our process is among the best options; for this, we implement the design on a physical testbed and compare it to alternative design options. Due to the number of permutations and the resulting experiment effort, it is not feasible to show that the identified option is the best option. We, hence, rely on sampling and run experiments with randomly selected design options that we have discarded in earlier process steps.

Figure 5: The smart factory comprises a factory floor, factory data center, and logistics office, and is augmented by an office data center and the cloud.
4.1 Case Study

In our case study, we apply our proposed process to a smart factory IoT application. We start by describing the scenario and derive software and infrastructure models (Section 4.1.1), apply our set of best practices (Section 4.1.2), use simulation (Section 4.1.3) and testbed experiments with the implemented software services (Section 4.1.4) to identify good design options, and briefly discuss the results of following our approach (Section 4.1.5). This shows that it is indeed possible to follow our proposed process and to pick a resulting design option.
4.1.1 Scenario and Models

We give an overview of our IoT application's components in Figure 5. The factory comprises a factory floor, a small data center, and a logistics office. In addition to the factory, there is a central office in an offsite location.

The factory floor has two machines: the Production Machine produces a part that the Packaging Machine then prepares for shipment. To ensure that the Packaging Machine processes only faultless parts, the Production Machine has an attached camera that takes a picture of each produced part and checks for defects. The Packaging Machine should adapt its speed to the output rate of its preceding machine. Furthermore, the Packaging Machine can only operate within a fixed ambient temperature range and thus has a temperature sensor installed to shut off the Packaging Machine if necessary. Each machine is also equipped with a controller that controls the speed at which the machine operates. These controllers are able to communicate over a common wireless gateway. In the onsite logistics office, logistics personnel make the decision on when to arrange outgoing product shipments. To this end, a logistics dashboard predicts machine output based on recent productivity. The factory data center provides some compute power and a connection to the WAN.

In the central company office in an offsite location, the business requires central reporting of factory productivity. This central office also has a collocated medium-size data center. Additionally, it is possible to leverage cloud computing to outsource some computational tasks.

We use this information to create our infrastructure model with the cloud, data centers in the smart factory and central office, as well as wireless gateway, machine controllers, and sensor nodes that all have additional compute capabilities.

Figure 6: Data sources, services, and sinks in our application. We mark application paths A1-A4 for the components.

We also derive the following application paths from the initial concept (see Figure 6 for the software model):
A1: The Camera takes pictures of parts leaving the first machine and the Check for Defects service analyzes each picture for defects. In case of a defect, the service instructs the Production Controller to discard the respective part.

A2: The Production Controller has information on the output rate of the machine that produces parts and uses this information to adapt the packaging rate of the packaging machine through an intermediary service. As a second input, the Packaging Controller also relies on data from the Temperature Sensor to control the packaging rate. When temperature readings leave a specified range, as detected by the Adapt Machine service, the Packaging Controller instructs the packaging machine to pause operation.

A3: The Packaging Controller provides data on the rate and amount of packaged parts to the Predict Pickup service that feeds into the Logistics Team Prognosis.

A4: Data of the Packaging Controller is also consumed by a service that aggregates and filters that data to generate a dashboard for the central office, which then runs inside a browser on a machine in the central office.

Data sources and sinks closely mirror the real world and placement for them is straightforward. For example, the Camera component in both the infrastructure and software models is the same device in the real world. For services, however, we still need to find an efficient mapping. To this end, we now follow the process introduced in Section 3.
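For reference, the four application paths can be written down directly in the abstract model from Section 3; the component names follow Figure 6, while everything else in this Go sketch is our own structuring:

```go
package smartfactory

// ApplicationPath lists the components of one path from its source(s) to its sink.
type ApplicationPath struct {
	Name     string
	Sources  []string
	Services []string
	Sink     string
}

// The four application paths derived from the smart factory scenario.
var applicationPaths = []ApplicationPath{
	{Name: "A1", Sources: []string{"Camera"}, Services: []string{"Check for Defects"}, Sink: "Production Controller"},
	{Name: "A2", Sources: []string{"Production Controller", "Temperature Sensor"}, Services: []string{"Adapt Machine"}, Sink: "Packaging Controller"},
	{Name: "A3", Sources: []string{"Packaging Controller"}, Services: []string{"Predict Pickup"}, Sink: "Logistics Team Prognosis"},
	{Name: "A4", Sources: []string{"Packaging Controller"}, Services: []string{"Aggregate", "Generate Dashboard"}, Sink: "Central Office Dashboard"},
}
```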
4.1.2 Best Practices

As described in Section 3.3, we need to consider all application paths individually in this step. We begin by classifying each application path and then use the corresponding best practice advice to filter out some application design options.
A1: Although a photo is larger than a sensor value, we classify A1 as event processing. Each photo corresponds to an event in the physical world, in this case the production of a part, and the Camera translates this event into a message carrying metadata in the form of the image. Processing the image is also time-critical as the Production Machine should discard any faulty part before it arrives at the Packaging Machine. Although the event message has a relatively large size, the Check for Defects service on this application path only needs to consider one source at a time, which, depending on the complexity of analysis for each event, limits processing time. As such, limited bandwidth and high network latency can be a bigger factor in not achieving QoS goals here. Therefore, image processing should at least be kept on factory premises, if not even inside the machine on either Camera or Production Controller. A more specific decision is not possible as long as more detailed information about service complexity and infrastructure capabilities is not available at this stage.

A2: We can make a similar argument for A2. Here, two event sources produce events independently but a single service that controls the packaging rate consumes all of them. Again, we classify this path as event processing as events are small in size and decisions need to be made quickly. Service complexity is also low as, albeit consuming two data sources, the service does not consider historic data and performs simple calculations. Thus, placing the Adapt Machine service close to data sources and sinks, on factory premises, is the most efficient option.

A3: Despite using only one data source producing rather simple data items, we classify A3 as data analytics since it needs to consider current and historical data; also, the processing is more complex as the goal is to predict future packaging rates. Furthermore, QoS limits for latency are in the range of seconds (rather than milliseconds) as the staff will only periodically check the report. Consequently, depending on prediction complexity, we propose placing the Predict Pickup service where compute power is the cheapest, the cloud or a data center for instance. Correct placement then comes down to a cost calculation between bandwidth price and compute costs, as is part of the subsequent simulation.

A4: Finally, there is A4, which monitors the factory output rate to feed data into a dashboard in the central office. This, too, is a data analytics workflow and there are no strict latency constraints. Instead, again, data amount and processing complexity are the limiting factors. Consequently, as the Aggregate service is a preprocessing step, placing it close to the Packaging Controller limits bandwidth usage. Similarly to A3, we can then place the complex processing service Generate Dashboard where processing is available for the lowest price, which is likely to be the cloud or one of the data centers.

Starting with five services that we can deploy to one of eight infrastructure components each means that there are 32,768 permutations, growing exponentially with additional services or infrastructure components. By following our best practices, we managed to reduce the set of options to only 864.
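The size of the initial design space follows directly from these counts; as a quick check (our arithmetic):

```latex
\[
\text{design options} = (\text{candidate nodes})^{\text{services}} = 8^{5} = 32{,}768.
\]
```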
4.1.3 Simulation

We now use FogExplorer to simulate QoS and cost of the remaining application design options as explained in Section 3.4. To use FogExplorer, we first need to extend the software and infrastructure models, which we show in Figures 7 and 8, respectively. To give an example, in the application model the camera produces data in the form of images at a rate of 100kb/s, and the service thereafter takes an estimated 20ms to process data items on a reference machine with an outputRatio of 0.1, meaning that with a 100kb/s input it outputs 10kb/s. Furthermore, this service requires 250MB of memory. For each application path we have also introduced QoS requirements in the form of latency limits. In the simulation, we discard any service mapping that violates either of these conditions. For the A1 application path, for instance, we set an upper limit of 50ms as delay between taking a picture and the command reaching the production controller. In the infrastructure model, we introduce different machine options with different capabilities and price points for some nodes. For example, there are two options for the camera component: One has computational capabilities of 0.1% that of the reference machine with 1MB of memory at a price of $0.5/month, while the other has 5% performance of the reference machine with 10MB of memory available at a higher price of $5/month.

Figure 7: We extend components of the application paths in our software model with attributes as required by FogExplorer: sources have an outputRate and services have an outputRatio, referenceProcessingDelay, and requiredMemory. Furthermore, application paths have a QoS limit of acceptable end-to-end latency. (In the figure: A1: Camera, 100kb/s; Check for Defects, ratio 0.1, 20ms, 250MB; limit 50ms. A2: Production Controller and Temperature Sensor, 10kb/s each; Adapt Machine, ratio 0.5, 1ms, 100MB; limit 30ms. A3: Packaging Controller, 10kb/s; Predict Pickup, ratio 0.1, 100ms, 1500MB; limit 1s. A4: Packaging Controller, 10kb/s; Aggregate, ratio 0.1, 2ms, 100MB; Generate Dashboard, ratio 100, 50ms, 2500MB; limit 2s.)

As our case study is fictional, we estimate these prices in lieu of actual infrastructure. As a basis, we use pricing for a moderate compute instance with a 2-core processor and 4GB of memory on Amazon Web Services (AWS) Lightsail (https://aws.amazon.com/lightsail), which costs $20/month. This is similar in price and performance to the medium machine option for the Factory Data Center node. We estimate the total cost of ownership per performance to be lower near the cloud and with more powerful machine options, yet higher near the edge, where maintenance is a higher factor, and extrapolate accordingly.

Figure 8: We extend infrastructure components and their network links with more attributes as required by FogExplorer: each node has a relativePerformanceIndicator, availableMemory, and price. Network connections have a latency, availableBandwidth, and a bandwidthPrice. Square brackets denote that more than one hardware option is available at a specific node. These hardware options differ in price and capability. (In the figure: Cloud rPI [10,25,50], memory [64,256,512]GB, [50,250,500]$/month; Office Data Center rPI [5,10,20], memory [32,64,256]GB, [50,100,300]$/month; Factory Data Center rPI [0.5,1,5], memory [2,4,16]GB, [10,20,50]$/month; Wireless Gateway rPI [0.1,0.5], memory [0.1,0.25]GB, [25,60]$/month; Camera rPI [0.001,0.05], memory [0.001,0.01]GB, [0.5,5]$/month; Production Controller rPI [0.05,0.5], memory [0.01,0.1]GB, [10,30]$/month; Packaging Controller rPI 0.5, memory 0.1GB, 30$/month; Sensor rPI 0.001, memory 0.001GB, 0.5$/month.)

The A2 application path has two sources and, depending on its placement, they have a different connection latency to their common service. As both sources send their data in parallel, we consider the maximum end-to-end latency for this application path and assert that this does not violate the QoS.

We automate the simulation using the Node.js interface of FogExplorer. Although the number of possible application design options grows exponentially with software and infrastructure components and machine options for nodes, the preceding step in which we have discarded options using best practices has already limited those options, allowing us to simulate all remaining design options efficiently. In fact, with the current software and infrastructure models we need to consider only 186,624 different options and are able to simulate and calculate metrics for all of them in about one minute on a standard laptop computer. For comparison, and to emphasize the importance of the first step of our process, there is a total of 7,077,888 application design options and a complete simulation of those already takes 50 minutes for this simple use case. As such, using only simulation without applying best practices first is infeasible, especially for more complex application scenarios.

In addition to overall cost and time metrics, we also calculate metrics for each application path on its own. This helps us discard options that violate SLO limits. From 186,624 possible application design options only 2,520 are valid and only 215 remain after applying the latency limits we defined. Consequently, FogExplorer lets us discard the 99.9% of application design options that are impossible to deploy in practice given infrastructure and SLO constraints.

The options that remain are therefore those that conform to all infrastructure and QoS constraints, and we can now choose those that have the lowest overall cost according to the simulation. We select the application designs in the 95th percentile in the pool of options based on cost, a total of ten designs. From the simulation, it is clear that placing the Check for Defects service of the A1 application path in the Factory Data Center, the Adapt service of the A2 path on the Packaging Controller or the Factory Data Center, and the Aggregate service of path A4 on the Wireless Gateway are the most efficient application design options. Furthermore, it becomes apparent that the Camera, Production Controller, and Sensor do not require additional compute capabilities as they do not need to run any data processing services. For the Factory Data Center, the simulation recommends the medium machine option for each application design option and the least expensive options for both Office Data Center and Cloud.

4.1.4 Emulation

Through simulation, we have chosen the ten most efficient application designs and can now deploy these on an emulated MockFog testbed. Before deployment can begin, we must first implement our software components. To this end, we implement each source, service, and sink in Go 1.14. We then install the compiled binaries on the MockFog nodes as Docker containers. We use an extended version of MockFog for our experiments that is available with all other software artifacts.

Each node in the system maps to one instance on AWS Elastic Compute Cloud (EC2, https://aws.amazon.com/ec2) in the same availability zone of the us-east-1 region. To emulate different kinds of hardware, we use different instance types. We show the mapping from referencePerformanceIndicator as employed in FogExplorer to EC2 instance types in Table 1.

Table 1: referencePerformanceIndicator (rPI) and Corresponding AWS EC2 Instance Types Used in Our MockFog Experiments

rPI      | EC2 Instance Type | vCPUs | Memory (GB) | sysbench CPU Score
[0, 1[   | t2.micro          | 1     | 2           | 1.25
[1, 5[   | t2.medium         | 2     | 4           | 2.90
[5, 10[  | t2.xlarge         | 4     | 16          | 5.89
[10, 20[ | t2.2xlarge        | 8     | 32          | 11.78
[20, 50[ | m5a.12xlarge      | 48    | 192         | 45.48
[50, ∞[  | m5a.24xlarge      | 96    | 384         | 90.91
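In code, the mapping of Table 1 is a simple interval lookup; a Go sketch (the function name is our own):

```go
package smartfactory

// instanceTypeFor returns the AWS EC2 instance type that Table 1 assigns to a
// given referencePerformanceIndicator (rPI) interval.
func instanceTypeFor(rPI float64) string {
	switch {
	case rPI < 1:
		return "t2.micro"
	case rPI < 5:
		return "t2.medium"
	case rPI < 10:
		return "t2.xlarge"
	case rPI < 20:
		return "t2.2xlarge"
	case rPI < 50:
		return "m5a.12xlarge"
	default:
		return "m5a.24xlarge"
	}
}
```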
For instances of the t2 family, we enable unlimited accrual of CPU credits to prevent inconsistent CPU bursting. Given the limited number of available instance types, however, this is not as fine-grained as the referencePerformanceIndicator in FogExplorer. Furthermore, it also does not allow setting the availableMemory to the same value as in the FogExplorer infrastructure model. To validate performance differences between instance types, we use the sysbench CPU benchmark in version 1.0.20 (https://github.com/akopytov/sysbench/tree/1.0). This benchmark calculates all prime numbers up to a certain limit, which we set at 1,000,000, in 1,024 threads simultaneously. It reports a CPU speed metric that describes the number of events the benchmarked CPU was able to handle per second, with each event corresponding to one completed prime computation. We repeat this benchmark three times and report median results. As shown in Table 1, this metric scales nearly linearly with the amount of CPU cores. Note that in order to leverage this performance for our application, the services we deploy have to actually use all available CPU cores. To this end, our implemented application services use multithreading through goroutines. Nevertheless, we can expect that performance does not scale strictly linearly with the number of CPU cores in practice.

MockFog sets artificial network bandwidth and latency limits between machines, and deploys our software components to the machines. The mappings for sinks and sources are identical each time, with, for instance, the Camera process running on the Camera node. Service mappings follow the ten most efficient design options identified through simulation. For each option, MockFog runs the application for 20 minutes and then collects logs to determine end-to-end latency for each application path. We repeat this process three times to gain accurate results and use median results in further analysis.

We measure end-to-end latency by attaching timestamps and unique identifiers to each request that passes through the system. Each component logs when it sends or receives a request with a specific identifier. One problem with measuring end-to-end latency in this manner is clock skew. When the clocks of two machines are not in sync, the measurement can become inaccurate. To limit this effect, all machines synchronize their clocks through the AWS Time Sync Service in their region before the experiments run, which, during our experiments, resulted in clock deviations lower than 0.3ms.

Between re-runs of the same experiment setup, we see a small overall coefficient of variation of between 0% and 3%. Consequently, we can say that our experiment results are robust. We use the average end-to-end latency unless stated otherwise and show these results in Figure 9. As expected, latency for the A1 application path is similar across all design options, as the Check for Defects service is always deployed to the same kind of Factory Data Center. On the A2 application path, we observe an end-to-end latency of between 3ms and 4ms when the Adapt service runs on the Packaging Controller and 14ms when placed on the Factory Data Center, due to the increase in network latency caused by additional hops for each request. This difference is even greater when considering only the Sensor source, where end-to-end latency is sub-millisecond when the Adapt service is deployed on the Packaging Controller. For the A3 application path, processing latency of the Predict service is higher when it runs on the Factory Data Center, with an average latency of 89ms for application design option 1, and even higher for options 2 and 9, where the Check for Defects, Adapt Machine, and Predict service are all deployed on this node, with 123ms and 108ms, respectively. When the Predict service runs on the Office Data Center or Cloud, this processing latency is lower, between 67ms and 77ms. For placement on the Cloud node, this reduction of processing latency is offset by a considerable increase in network latency to 257ms. The Aggregate service of application path A4 has a processing latency of between 0.1ms and 0.15ms, regardless of the machine type of the Wireless Gateway, which this service is always deployed to. At this scale, this difference could also be attributed to measurement error. The Generate Dashboard service has a lower processing latency when deployed to the Cloud at 89ms to 90ms than when deployed to the Factory Data Center, where processing latency ranges from 95ms up to 109ms. Yet again, this difference is offset by transmission latency, which, here, is lower at 23ms compared to 243ms.

As we had already ensured through simulation with FogExplorer, all application design options we have benchmarked on the MockFog testbed comply with all SLOs defined for the application paths.
Figure 9: Results for testbed experiments with MockFog. We show average end-to-end latency measured for all application design options for each application path (panels for application paths A1-A4). Error bars show the standard deviation. One application design option is consistently among those with the lowest end-to-end latency for each application path.

4.1.5 Results

Using the results from our MockFog experiments, we can now discard more application design options. From the ten application design options we have deployed to the emulated testbed, one option is the most efficient. We show its service mapping and determined infrastructure options in Figure 10. Here, the Factory Data Center hosts the Check for Defects and Generate Dashboard services, the Adapt Machine service is placed on the Packaging Controller, the Predict Pickup service on the Office Data Center, and the Wireless Gateway is used for the Aggregate service. As infrastructure options, we use the smallest available machines for the Wireless Gateway and Office Data Center, and the medium option for the Factory Data Center. In this application design option, the Cloud is not used to host any services, hence we do not require a machine there. Here, we skip the optional deployment of several options on a physical fog testbed as described in Section 3.6, since we will do exactly that in our evaluation of result quality in Section 4.2.

Figure 10: Service mapping and infrastructure options in the best application design option as determined in our case study.

Table 2: Overview of placement options and the step in which the option was discarded. This shows that early process steps alone cannot provide good enough recommendations.

Path | Service            | Camera         | Production Controller | Sensor         | Packaging Controller | Wireless Gateway | Factory Data Center | Office Data Center | Cloud
A1   | Check for Defects  | Simulation     | Simulation            | Best Practices | Best Practices       | Simulation       | Final Design        | Best Practices     | Best Practices
A2   | Adapt Machine      | Best Practices | Best Practices        | Simulation     | Final Design         | Simulation       | Simulation          | Best Practices     | Best Practices
A3   | Predict Pickup     | Best Practices | Best Practices        | Best Practices | Best Practices       | Best Practices   | Emulated Testbed    | Final Design       | Emulated Testbed
A4   | Aggregate          | Simulation     | Simulation            | Simulation     | Simulation           | Final Design     | Simulation          | Best Practices     | Best Practices
A4   | Generate Dashboard | Best Practices | Best Practices        | Best Practices | Best Practices       | Best Practices   | Final Design        | Simulation         | Emulated Testbed
After having shown the applicability of our processthrough a case study, we now evaluate it by deploy-ing our resulting architecture on a physical testbed.We benchmark our application with a synthetic work-load and determine whether our process has reallyconverged towards the most efficient design by com-paring it to application design options that were dis-carded in earlier steps of the process.We show application design options and at whichstep we have filtered them out in Table 2. This figurealso shows the final application design as determinedby our process to be the most efficient. The finaldesign has passed the check for best practices, sim-ulation with FogExplorer, and benchmarking on theemulated MockFog testbed. We now further evaluatethis design by comparing it to other design optionsthat we have filtered out during the process. Obvi-ously, we cannot compare all possible design options. For each filter we have applied, we randomly choosethree of the discarded design options, deploy themon a physical testbed and benchmark them.
M1-3, F1-3, and B1-3 denote the three designs each that were filtered out by MockFog, FogExplorer, and the application of best practices, respectively. For the sake of comparison, we also deploy and benchmark our final, winning design as presented in Section 4.1.5, which we denote as W. Software components use the same implementation and deployment method, i.e., Docker containers, as on our emulated MockFog testbed.
Our testbed comprises two Raspberry Pi 3B+ single-board computers, one acting as Camera and Production Controller and the other as Sensor and Packaging Controller. These boards connect over 2.4 GHz WiFi to a MacBook Pro with an Intel Core 2 Duo processor that we use as our Wireless Gateway. This computer, in turn, connects to a LAN over Gigabit Ethernet. This network has a 50 Mbit/s Internet uplink and a ThinkPad X220 laptop with an Intel Core i5 processor connected to it that acts as the Factory Data Center. Finally, as our Office Data Center, we use a virtual machine instance on AWS EC2 in the eu-west-1 (Ireland) region. As the Cloud instance, we use an AWS EC2 virtual machine instance in the ap-northeast-2 region. The respective instance types depend on the machine type used in the selected application design, see Table 1.
Experiments run for 20 minutes after an initial startup time of 5 minutes and are repeated three times. We report the results of the median run; variance across runs with the same experiment setup was between 1% and 4% for all experiments, except for setup M3 (9%), where one outlier had a higher end-to-end latency for the A3 application path, and experiments B1 (15%) and B3 (6%), which were unable to complete correctly.
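As a minimal sketch of this aggregation (not the actual evaluation scripts; the list runs_ms and the spread metric are illustrative assumptions), the median run and the relative spread across the three repetitions of one setup could be computed as follows:

```python
import statistics

# Hypothetical mean end-to-end latencies (ms) of one application path for the
# three repetitions of a single experiment setup.
runs_ms = [102.4, 103.1, 104.3]

median_run = statistics.median(runs_ms)  # the repetition we report
# Relative spread across repetitions, roughly corresponding to the 1-4%
# variance reported above (the exact metric is an assumption).
spread = (max(runs_ms) - min(runs_ms)) / median_run

print(f"median run: {median_run:.1f} ms, spread: {spread:.1%}")
```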
Figure 11: Latency results for experiments on the physical testbed. We show the average end-to-end latency measured for all application design options (W, M1-M3, F1-F3, B1-B3) on each application path (A1 to A4). Error bars show the standard deviation. Application design options B1 and B2 were unable to run the Predict Pickup service as the infrastructure component would run out of memory, hence no results for the A3 application path can be shown for them.

Figure 11 shows the average transmission and processing times measured in our experiments. Experiments for application design options B1 and B2 were unable to complete as the Predict Pickup service ran out of memory on the Packaging Controller and the Wireless Gateway, respectively, where it was deployed in these design options. The B3 option, while able to run all services, leads to a higher latency than other options that were selected in the first step of our process. Design option F1 was determined by FogExplorer to comply with all SLOs, yet was not in the 95th percentile cost-wise and was hence discarded. Nevertheless, its latency measurements are on par with designs W and M1 through M3. Option F2 violates SLO requirements in the simulation and we observe that it is also less efficient than the other options we test, so this elimination was correct. Finally, while FogExplorer discards F3 for insufficient resources, as the Wireless Gateway here has too little available memory for the Check for Defects service, we were able to deploy it correctly on our physical testbed and its latency is similar to that of our winning design option W. Yet this deployment is more costly than W as it uses more expensive infrastructure components. For options W and M1 through M3, we see the same results as in MockFog, where we have already tested these design options. Consequently, design option W is again the most efficient option among them.

Discussion
The five-step design process we propose can help to address the challenge of designing efficient fog-based IoT applications. Yet, as with all tools, it is important to know its limits in order to employ it correctly. First and foremost, our proposed process targets static applications. Although not all information about the system is necessarily required upfront, and infrastructure and software models are extended and modified along the way, as we have described, our design process is not equipped to deal with dynamic deployment changes as would be necessary for physically moving sources, sinks, or compute nodes. For example, in order to augment the application with a new service, parts of the process would need to be re-run from the start. While simulation and testbed emulation can be automated, best practices would need to be applied by an actual application design engineer.
While networks with mobile nodes, frequent outages, or regular changes in topology may exist, we envision that static applications such as the smart factory in our case study are common. Furthermore, our process may be used for the static components of a more dynamic application while the dynamic components are deployed using other approaches such as [29].
Another challenge is that fog application design is complicated by the number of factors that are at play. For example, we mention in Section 2.3 that we only consider service latency and cost as metrics to describe application design efficiency. Beyond that, availability is of course important as well. Cloud platforms may, for instance, provide better availability guarantees than a local data center. Availability, performance, network latency, or available network bandwidth may also be subject to external influence factors such as another tenant using the same network connection. Abstracting from such factors in our models means that our simulation and testbed experiments cannot accurately reflect the results that we would observe in the real world. Yet, we argue that we need this abstraction to keep models and simulation simple, which in turn is necessary to even facilitate their use in such a design process. These factors can then be tested later in the process by using physical testbeds.
In Section 3.4, we have introduced SLOs for application paths as a way to convert the multi-objective optimization of cost and service latency for each path into a single-objective optimization of cost within the specified latency constraints. While reducing end-to-end latency is always better, we argue that additional investment leads to diminishing returns after a certain point. Finding these fixed constraints, however, can be difficult for system designers, and setting SLOs too low or too high can negatively impact the overall satisfaction with the final application design by unnecessarily increasing cost or latency, respectively. In future work, we want to further explore this relationship between the cost and utility of reduced latency so that this decision can be made on a more informed basis.
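Written out (a sketch with notation not used in the paper: $D$ denotes the set of candidate application designs, $P$ the set of application paths, $c(d)$ the cost of design $d$, $\ell_p(d)$ the end-to-end latency of path $p$ under design $d$, and $L_p$ the latency SLO of path $p$), this reduction corresponds to the constrained problem

\[
\min_{d \in D} \; c(d) \quad \text{subject to} \quad \ell_p(d) \leq L_p \quad \text{for all } p \in P .
\]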
Related Work

We have motivated how the correct placement of IoT application components in the fog is difficult yet crucial for an efficient use of resources. This is a known research problem that has been discussed in existing publications.
Brogi et al. [30] present FogTorch, which models fog infrastructure by parameterizing available fog nodes, communication links, end devices, application components, and QoS constraints, and then finds eligible deployments of application components. While this approach leads to a set of valid application deployment options, solving fog application deployment in this manner is NP-hard, as the authors show. Consequently, finding valid deployment options becomes exponentially harder with each added component and is infeasible for larger deployments. Tong et al. [31] and, to some extent, Heintz et al. [32] take a similar approach to FogTorch, while [4–7, 33–36] employ heuristics to solve the formalized optimization problem. While using heuristics can lead to results more efficiently, it requires infrastructure and software implementation details upfront, allowing little room for flexibility and agile development. Often, such information may not be available at design time. Furthermore, these approaches only find solutions through static analysis, yet it is hard to verify whether the calculated results hold up in a real deployment, which is only possible through benchmarks on an emulated or physical testbed.
Khare et al. [37] also employ heuristics to create an efficient application design for distributed, edge-based stream processing. They apply them in a multi-step process in which a DAG of the entire application is first split into a set of linear chains whose latency is estimated individually, similar to the application paths we introduced in Section 3.2. The authors, however, approximate these processing chains algorithmically, which is an interesting alternative approach as it leads to less overhead for application designers, albeit at the price of accuracy.
Fogernetes, as proposed in [38], automates the deployment of software services across a number of fog nodes by leveraging the Kubernetes orchestrator, as Santos et al. [39] have also proposed. Similarly, [27, 40, 41] have presented such dynamic middleware. While these systems are flexible, they can only optimize latency and are not aware of system cost. Rather, they assume that a specific set of infrastructure already exists and that a mapping exists that does not lead to under-provisioning. In our proposed process, we provision only infrastructure that is really needed, keeping overall cost to a minimum. We argue that a more efficient fog application can be designed by building the underlying infrastructure alongside it. Furthermore, the infrastructure may often not yet be fixed when the development process is started. To this end, Roy et al. [42] present MAQ-PRO, a process for infrastructure capacity planning for component-based applications that is similar to our proposed process. MAQ-PRO begins with a profile of components, an analysis of the application scenario (compare Section 3.2), and a base performance model (compare Section 3.4), and it also considers SLA bounds and workloads. Their approach, however, is unsuitable for the novel paradigm of fog computing as it does not consider the network distance between infrastructure components, which is crucial in the fog.
In Section 3.4, we propose to use FogExplorer to simulate fog placement. Alternatively, Gupta et al. [15] have proposed the iFogSim tool to model and simulate the use of fog application resources. Their tool, however, is constrained in that it only allows tree-shaped infrastructure models, which is not representative of most fog infrastructures that can contain cycles, as is the case in our case study, for example. Furthermore, their tool requires highly detailed application traces, which is not feasible this early in the design phase. In [43], Brambilla et al. present an approach for simulating large-scale sensor networks for the IoT. While useful in its own right, it lacks an estimation of system cost, and we target more heterogeneous fog networks, albeit at a lower scale. Additionally, [44–49] also present simulation tools that could be applied to fog computing.
We also propose to use MockFog as an emulated testbed for different application designs in Section 3.5. Besides MockFog, other application testbeds exist as well. Eisele et al. [50] propose a hardware-in-the-loop simulation that uses a simulation tool in conjunction with a physical testbed. This allows them to leverage the flexibility in workload generation of the simulation tool and the realistic environment of the physical testbed, yet it also leads to increased cost without being entirely accurate. The D-Cloud [51] software testing framework allows individual software components to be placed on different virtual machines to emulate a cloud environment. This tool, however, cannot be applied to a fog infrastructure. Furthermore, Coutinho et al. [18] and Mayer et al. [17] propose Fogbed and EmuFog, which use the network simulators Mininet and MaxiNet [52] to test distributed fog applications. Yet unlike MockFog, these testbeds can only simulate realistic network conditions, not the constrained compute capabilities of fog nodes, especially at the edge. Balasubramanian et al. [53] present a testbed for fog applications that facilitates emulating these constraints, yet it requires physical hardware for each node rather than cheaper virtual machines.
To the best of our knowledge, our work is the first that combines best practices, simulation, and emulation into a complete design process for fog-based IoT applications.
Conclusion

Engineering IoT applications in an efficient way is challenging, as the process needs to consider both the software architecture and its deployment to a physical infrastructure. Existing approaches can only provide limited guidance since they are either based on theoretical models and simulation, i.e., inherently limited in their accuracy, or based on experiment testbeds, i.e., the evaluation effort is too high to explore more than a few design options.
In this paper, we have proposed a five-step process for designing efficient fog-based IoT applications that integrates and extends previous work of ours. Rather than relying solely on global optimization, simulation, or testbed benchmarking, we combine best practices, simulation, and testbed evaluation to choose the most efficient infrastructure options and software service placements from an exponentially growing pool of deployment options. Furthermore, we have shown the effectiveness of this approach through a smart factory case study. By deploying different options on a physical testbed, we also showed that our process identified an efficient application design in our case study and, by extension, that our process achieves the desired results.
Acknowledgments
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – 415899119.
References

[1] B. Zhang, N. Mor, J. Kolb, D. S. Chan, K. Lutz, E. Allman, J. Wawrzynek, E. Lee, and J. Kubiatowicz, “The cloud is not enough: Saving IoT from the cloud,” in , Jul. 2015, pp. 21–27.
[2] D. Bermbach, F. Pallas, D. G. Pérez, P. Plebani, M. Anderson, R. Kat, and S. Tai, “A research perspective on fog computing,” in Service-Oriented Computing – ICSOC 2017 Workshops, Jun. 2018, pp. 198–210.
[3] F. Bonomi, R. Milito, J. Zhu, and S. Addepalli, “Fog computing and its role in the internet of things,” in Proceedings of the First Edition of the MCC Workshop on Mobile Cloud Computing, Aug. 2012, pp. 13–16.
[4] A. Brogi, S. Forti, and A. Ibrahim, “How to best deploy your fog applications, probably,” in , May 2017, pp. 105–114.
[5] O. Skarlat, M. Nardelli, S. Schulte, M. Borkowski, and P. Leitner, “Optimized IoT service placement in the fog,” Service Oriented Computing and Applications, vol. 11, no. 4, pp. 427–443, 2017.
[6] R. Mahmud, K. Ramamohanarao, and R. Buyya, “Latency-aware application module management for fog computing environments,” ACM Trans. Internet Technol., vol. 19, no. 1, pp. 1–21, 2018.
[7] H. Hong, P. Tsai, and C. Hsu, “Dynamic module deployment in a fog computing platform,” in , Oct. 2016, pp. 1–6.
[8] O. Skarlat, M. Nardelli, S. Schulte, and S. Dustdar, “Towards QoS-aware fog service placement,” in , May 2017, pp. 89–96.
[9] T. Pfandzelter and D. Bermbach, “IoT data processing in the fog: Functions, streams, or batch processing?” in Proc. of DaMove, Jun. 2019, pp. 201–206.
[10] M. Gusev, B. Koteska, M. Kostoska, B. Jakimovski, S. Dustdar, O. Scekic, T. Rausch, S. Nastic, S. Ristov, and T. Fahringer, “A deviceless edge computing approach for streaming IoT applications,” IEEE Internet Computing, vol. 23, no. 1, pp. 37–45, 2019.
[11] V. Karagiannis and S. Schulte, “Comparison of alternative architectures in fog computing,” in , May 2020, pp. 19–28.
[12] L. Santos, E. Silva, T. Batista, E. Cavalcante, J. Leite, and F. Oquendo, “An architectural style for internet of things systems,” in Proceedings of the 35th Annual ACM Symposium on Applied Computing, Mar. 2020, pp. 1488–1497.
[13] J. Hasenburg, S. Werner, and D. Bermbach, “Supporting the evaluation of fog-based IoT applications during the design phase,” in Proceedings of the 5th Workshop on Middleware and Applications for the Internet of Things, Dec. 2018, pp. 1–6.
[14] ——, “FogExplorer,” in Proceedings of the 19th International Middleware Conference (Demos and Posters), Dec. 2018, pp. 1–2.
[15] H. Gupta, A. Vahid Dastjerdi, S. K. Ghosh, and R. Buyya, “iFogSim: A toolkit for modeling and simulation of resource management techniques in the internet of things, edge and fog computing environments,” Softw. Pract. Exp., vol. 47, no. 9, pp. 1275–1296, 2017.
[16] J. Hasenburg, M. Grambow, E. Grünewald, S. Huk, and D. Bermbach, “MockFog: Emulating fog computing infrastructure in the cloud,” in , Jun. 2019, pp. 144–152.
[17] R. Mayer, L. Graser, H. Gupta, E. Saurez, and U. Ramachandran, “EmuFog: Extensible and scalable emulation of large-scale fog computing infrastructures,” in , Oct. 2017, pp. 1–6.
[18] A. Coutinho, F. Greve, C. Prazeres, and J. Cardoso, “Fogbed: A rapid-prototyping emulation environment for fog computing,” in , May 2018, pp. 1–7.
[19] R. Morabito, V. Cozzolino, A. Y. Ding, N. Beijar, and J. Ott, “Consolidate IoT edge computing with lightweight virtualization,” IEEE Netw., vol. 32, no. 1, pp. 102–111, 2018.
[20] N. Govindarajan, Y. Simmhan, N. Jamadagni, and P. Misra, “Event processing across edge and the cloud for internet of things applications,” in Proceedings of the 20th International Conference on Management of Data, Dec. 2014, pp. 101–104.
[21] M. R. Anawar, S. Wang, M. Azam Zia, A. K. Jadoon, U. Akram, and S. Raza, “Fog computing: An overview of big IoT data analytics,” Proc. Int. Wirel. Commun. Mob. Comput. Conf., vol. 2018, pp. 1–22, 2018.
[22] D. Bermbach, E. Wittern, and S. Tai, Cloud Service Benchmarking: Measuring Quality of Cloud Services from a Client Perspective. Springer, Cham, 2017.
[23] D. Bermbach, “Benchmarking eventually consistent distributed storage systems,” Ph.D. dissertation, Karlsruhe Institute of Technology, Feb. 2014.
[24] D. Kossmann, T. Kraska, and S. Loesing, “An evaluation of alternative architectures for transaction processing in the cloud,” in Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, Jun. 2010, pp. 579–590.
[25] T. Rausch, C. Lachner, P. A. Frangoudis, P. Raith, and S. Dustdar, “Synthesizing plausible infrastructure configurations for evaluating edge computing systems,” in USENIX Workshop on Hot Topics in Edge Computing (HotEdge 20), Jun. 2020.
[26] J. Hasenburg, F. Stanek, F. Tschorsch, and D. Bermbach, “Managing latency and excess data dissemination in fog-based publish/subscribe systems,” in Proceedings of the Second IEEE International Conference on Fog Computing (ICFC 2020), Apr. 2020, pp. 9–16.
[27] O. Skarlat, V. Karagiannis, T. Rausch, K. Bachmann, and S. Schulte, “A framework for optimization, service placement, and runtime operation in the fog,” in , Dec. 2018, pp. 164–173.
[28] D. Bermbach, J. Kuhlenkamp, A. Dey, A. Ramachandran, A. Fekete, and S. Tai, “BenchFoundry: A benchmarking framework for cloud storage services,” in Proceedings of the 15th International Conference on Service Oriented Computing (ICSOC 2017). Springer, 2017.
[29] D. Bermbach, S. Maghsudi, J. Hasenburg, and T. Pfandzelter, “Towards auction-based function placement in serverless fog platforms,” in Proceedings of the Second IEEE International Conference on Fog Computing (ICFC 2020). IEEE, 2020.
[30] A. Brogi and S. Forti, “QoS-aware deployment of IoT applications through the fog,” IEEE Internet of Things Journal, vol. 4, no. 5, pp. 1185–1192, 2017.
[31] L. Tong, Y. Li, and W. Gao, “A hierarchical edge cloud architecture for mobile computing,” in IEEE INFOCOM 2016 – The 35th Annual IEEE International Conference on Computer Communications, Apr. 2016, pp. 1–9.
[32] B. Heintz, A. Chandra, and R. K. Sitaraman, “Optimizing timeliness and cost in geo-distributed streaming analytics,” IEEE Transactions on Cloud Computing, vol. 8, no. 1, pp. 232–245, 2020.
[33] X. Xu, D. Li, Z. Dai, S. Li, and X. Chen, “A heuristic offloading method for deep learning edge services in 5G networks,” IEEE Access, vol. 7, pp. 67734–67744, 2019.
[34] V. Cardellini, V. Grassi, F. Lo Presti, and M. Nardelli, “Optimal operator placement for distributed stream processing applications,” in Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems, Jun. 2016, pp. 69–80.
[35] S. Shekhar, A. Chhokra, H. Sun, A. Gokhale, A. Dubey, X. Koutsoukos, and G. Karsai, “URMILA: Dynamically trading-off fog and edge resources for performance and mobility-aware IoT services,” Journal of Systems Architecture, vol. 107, no. 101710, 2020.
[36] K. Oh, A. Chandra, and J. Weissman, “A network cost-aware geo-distributed data analytics system,” in , May 2020, pp. 649–658.
[37] S. Khare, H. Sun, J. Gascon-Samson, K. Zhang, A. Gokhale, Y. Barve, A. Bhattacharjee, and X. Koutsoukos, “Linearize, predict and place: Minimizing the makespan for edge-based stream processing of directed acyclic graphs,” in Proceedings of the 4th ACM/IEEE Symposium on Edge Computing, Nov. 2019, pp. 1–14.
[38] C. Wöbker, A. Seitz, H. Mueller, and B. Bruegge, “Fogernetes: Deployment and management of fog computing applications,” in NOMS 2018 – 2018 IEEE/IFIP Network Operations and Management Symposium, Apr. 2018, pp. 1–7.
[39] J. Santos, T. Wauters, B. Volckaert, and F. De Turck, “Towards network-aware resource provisioning in kubernetes for fog computing applications,” in , Jun. 2019, pp. 351–359.
[40] E. Saurez, K. Hong, D. Lillethun, U. Ramachandran, and B. Ottenwälder, “Incremental deployment and migration of geo-distributed situation awareness applications in the fog,” in Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems, Jun. 2016, pp. 258–269.
[41] D. Santoro, D. Zozin, D. Pizzolli, F. D. Pellegrini, and S. Cretti, “Foggy: A platform for workload orchestration in a fog computing environment,” in , Dec. 2017, pp. 231–234.
[42] N. Roy, A. Dubey, A. Gokhale, and L. Dowdy, “A capacity planning process for performance assurance of component-based distributed systems,” in Proceedings of the 2nd ACM/SPEC International Conference on Performance Engineering, Sep. 2011, pp. 259–270.
[43] G. Brambilla, M. Picone, S. Cirani, M. Amoretti, and F. Zanichelli, “A simulation platform for large-scale internet of things scenarios in urban environments,” in Proceedings of the First International Conference on IoT in Urban Space, Oct. 2014, pp. 50–55.
[44] S. Sotiriadis, N. Bessis, E. Asimakopoulou, and N. Mustafee, “Towards simulating the internet of things,” in , May 2014, pp. 444–448.
[45] X. Zeng, S. K. Garg, P. Strazdins, P. P. Jayaraman, D. Georgakopoulos, and R. Ranjan, “IOTSim: A simulator for analysing IoT applications,” Int. J. High Perform. Syst. Archit., vol. 72, pp. 93–107, 2017.
[46] T. Qayyum, A. W. Malik, M. A. Khan Khattak, O. Khalid, and S. U. Khan, “FogNetSim++: A toolkit for modeling and simulation of distributed fog environment,” IEEE Access, vol. 6, pp. 63570–63583, 2018.
[47] D. Fernández-Cerero, A. Fernández-Montes, F. Javier Ortega, A. Jakóbik, and A. Widłak, “Sphere: Simulator of edge infrastructures for the optimization of performance and resources energy consumption,” Simulation Modelling Practice and Theory, vol. 101, no. 1019663, 2020.
[48] C. Sonmez, A. Ozgovde, and C. Ersoy, “EdgeCloudSim: An environment for performance evaluation of edge computing systems,” Trans. Emerging Tel. Tech., vol. 29, no. 11, 2018.
[49] N. K. Giang, M. Blackstock, R. Lea, and V. C. M. Leung, “Developing IoT applications in the fog: A distributed dataflow approach,” in , Oct. 2015, pp. 155–162.
[50] S. Eisele, G. Pettet, A. Dubey, and G. Karsai, “Towards an architecture for evaluating and analyzing decentralized fog applications,” in , Oct. 2017, pp. 1–6.
[51] T. Banzai, H. Koizumi, R. Kanbayashi, T. Imada, T. Hanawa, and M. Sato, “D-Cloud: Design of a software testing environment for reliable distributed systems using cloud computing technology,” in , May 2010, pp. 631–636.
[52] R. L. S. de Oliveira, C. M. Schweitzer, A. A. Shinoda, and L. Rodrigues Prete, “Using Mininet for emulation and prototyping software-defined networks,” in , Jun. 2014, pp. 1–6.
[53] D. Balasubramanian, A. Dubey, W. R. Otte, W. Emfinger, P. S. Kumar, and G. Karsai, “A rapid testing framework for a mobile cloud,” in , Oct. 2014, pp. 128–134.
Overview of Application Design Options Deployed to Emulated Testbed in Case Study
Table 3: Overview of the ten most efficient designs as established by our FogExplorer simulation.
(a) Service placement in the different application design options tested on the emulated MockFog testbed.

Application Design Option | Check for Defects Placement | Adapt Machine Placement | Predict Pickup Placement | Aggregate Placement | Generate Dashboard Placement
1  | FDC | PKC | FDC | WGW | CLD
2  | FDC | FDC | FDC | WGW | CLD
3  | FDC | FDC | ODC | WGW | CLD
4  | FDC | FDC | CLD | WGW | CLD
5  | FDC | PKC | ODC | WGW | FDC
6  | FDC | FDC | ODC | WGW | FDC
7  | FDC | PKC | CLD | WGW | FDC
8  | FDC | FDC | CLD | WGW | FDC
9  | FDC | FDC | FDC | WGW | CLD
10 | FDC | FDC | CLD | WGW | FDC

PKC = Packaging Controller, WGW = Wireless Gateway, FDC = Factory Data Center, ODC = Office Data Center, CLD = Cloud.
(b) Infrastructure options in the different application design options tested on the emulated MockFog testbed. Hardware options for the Camera and Production Controller have been omitted for brevity as no service is deployed on these nodes.

Application Design Options | Wireless Gateway Hardware Option | Factory Data Center Hardware Option | Office Data Center Hardware Option | Cloud Hardware Option
1, 2, 4, 7, 8 | 1 | 2 | — | 1
3             | 1 | 2 | 1 | 1
5, 6          | 1 | 2 | 1 | —
9, 10         | 2 | 2 | — | 1
(c) Results of the FogExplorer simulation for the ten best application design options tested on the emulated MockFog testbed: simulated end-to-end latency in ms for each application path (A1 to A4) and cost in $/month per application design option.

Overview of Application Design Options Deployed to the Physical Testbed in Case Study
Table 4: Overview of the ten application design options selected for deployment on the physical testbed. W denotes the most efficient design as determined by our process. M1-3, F1-3, and B1-3 denote the three designs that were filtered out by MockFog, FogExplorer, and the application of best practices, respectively.
(a) Service placement in the different application design options tested on the physical testbed.

Application Design Option | Check for Defects Placement | Adapt Machine Placement | Predict Pickup Placement | Aggregate Placement | Generate Dashboard Placement
W  | FDC | PKC | ODC | WGW | FDC
M1 | FDC | PKC | FDC | WGW | CLD
M2 | FDC | FDC | CLD | WGW | CLD
M3 | FDC | PKC | CLD | WGW | FDC
F1 | FDC | FDC | ODC | WGW | CLD
F2 | FDC | PKC | CLD | WGW | CLD
F3 | WGW | PKC | FDC | WGW | ODC
B1 | PKC | CLD | PRC | ODC | FDC
B2 | CLD | ODC | WGW | WGW | PKC
B3 | FDC | CLD | FDC | PKC | PKC

PRC = Production Controller, PKC = Packaging Controller, WGW = Wireless Gateway, FDC = Factory Data Center, ODC = Office Data Center, CLD = Cloud.
(b) Infrastructure options in the different application design options tested on the physical testbed. Hardware options for the Camera and Production Controller have been omitted for brevity as no service is deployed on these nodes.

Application Design Options | Wireless Gateway Hardware Option | Factory Data Center Hardware Option | Office Data Center Hardware Option | Cloud Hardware Option
W          | 1 | 2 | 1 | —
M1, M2, M3 | 1 | 2 | — | 1
F1         | 1 | 2 | 2 | 3
F2         | 1 | 2 | 3 | 3
F3         | 1 | 2 | 3 | —
B1         | — | 2 | 1 | 2
B2         | 1 | — | 1 | 3
B3         | — | 2 | — | 1