PIERES: A Playground for Network Interrupt Experiments on Real-Time Embedded Systems in the IoT
Franz Bender, Jan Jonas Brune, Nick Lauritz Keutel, Ilja Behnke, Lauritz Thamsen
Franz Bender, Technische Universität Berlin, Berlin, Germany
Jan Jonas Brune, Technische Universität Berlin, Berlin, Germany
Nick Lauritz Keutel, Technische Universität Berlin, Berlin, Germany
Ilja Behnke, Technische Universität Berlin, Berlin, Germany
Lauritz Thamsen, Technische Universität Berlin, Berlin, Germany
ABSTRACT
IoT devices have become an integral part of our lives and of industry. Many of these devices run real-time systems or are used as part of them. As these devices receive network packets over IP networks, the network interface informs the CPU about their arrival using interrupts that might preempt critical processes. Therefore, the question arises whether network interrupts pose a threat to the real-timeness of these devices. However, there are few tools to investigate this issue.

We present a playground which enables researchers to conduct experiments in the context of network interrupt simulation. The playground comprises different network interface controller implementations, load generators, and timing utilities. It forms a flexible and easy-to-use foundation for future network interrupt research. We conduct two validation experiments and two real-world examples. The latter give insight into the impact of the interrupt handling strategy parameters and the influence of different load types on the execution time with respect to these parameters.
CCS CONCEPTS
• Computer systems organization → Embedded hardware; • Computing methodologies → Simulation tools; • Hardware → Testing with distributed and parallel systems.
KEYWORDS
internet of things, real time, interrupts, load simulation, cyber physical systems, benchmarking
ACM Reference Format:
Franz Bender, Jan Jonas Brune, Nick Lauritz Keutel, Ilja Behnke, and Lauritz Thamsen. 2021. PIERES: A Playground for Network Interrupt Experiments on Real-Time Embedded Systems in the IoT. In Companion of the 2021 ACM/SPEC International Conference on Performance Engineering (ICPE ’21 Companion), April 19–23, 2021, Virtual Event, France. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3447545.3451189
ICPE ’21 Companion, April 19–23, 2021, Virtual Event, France
© 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM. This is the author’s version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Companion of the 2021 ACM/SPEC International Conference on Performance Engineering (ICPE ’21 Companion), April 19–23, 2021, Virtual Event, France, https://doi.org/10.1145/3447545.3451189.
INTRODUCTION
Many processes in industrial settings have real-time constraints. Engineers need guarantees for these processes, e.g. that an action $A$ takes at most $N$ seconds. Real-time operating systems (RTOSs) have proven to be a suitable platform for implementing such software because they provide these guarantees. In contrast to regular operating systems, RTOSs offer special interfaces for precise timing and scheduling of time-critical tasks [9]. RTOSs have therefore found application in embedded systems such as sensor systems, industrial control systems, and many other specialized devices. For researchers and engineers using RTOSs, it is essential that the real-timeness of their devices is maintained.

With the rise of the Internet of Things (IoT), many of these devices are connected to the internet. This trend has been described as a new era [10]. It enables features such as remote monitoring, remote debugging, and a more intelligent management of devices that combines data from multiple devices for decisions. Smart navigation systems as proposed in [3] are one example of the latter.

Modern network interface controllers (NICs) such as the Intel® 82574 GbE Controller Family use interrupts to inform the CPU about new packets. This implies that other network devices are able to trigger interrupts on the target device. As interrupts are handled by interrupt service routines (ISRs), which have a very high execution priority, these interrupts may interfere with user code. As the rate of network interrupts increases, more time is spent in the ISR instead of the user code [1].

This behavior is sound, as ISRs reside in a higher priority space to minimize the I/O delay of the embedded system. However, while valid, this behavior may not be intended by the engineer of the device, as network packets might not be as important as the critical user code.
Note that the high number of interrupts may be caused by a malicious attacker or by non-malicious conditions such as a bad network configuration. All in all, adding network capabilities to real-time systems adds a per-packet workload which may drown critical tasks and break timing guarantees.

In this paper, we present a playground which enables researchers and engineers to tackle these issues. The playground
• can simulate network interrupts on a microcontroller to help analyze the impact of network traffic on applications running on IoT devices.
• offers simulation of continuous and Poisson-distributed network interrupts, as well as replays of captures of actual network traffic.

The remainder of this paper is structured as follows. Section II gives an overview of the related work. Section III details the approach we took to design the playground. Section IV details the experiments we conducted using our playground. Finally, Section V summarizes our work.

RELATED WORK
Available literature focuses mostly on mitigation strategies. As a general example, interrupt moderation or coalescing [8] is used by the authors of [2] to reduce energy consumption in a system for high-speed networks: multiple interrupts are grouped to invoke a single interrupt at a later time.

Other approaches rely on optimizing the interrupts themselves. The authors of [6] exploit the fact that the interrupt mechanism can be split into preprocessing, the ISR, and postprocessing. Since preprocessing and postprocessing are independent of the individual interrupt, they “reuse” these phases across multiple interrupts and invoke only the ISR for every interrupt.

While these authors worked on mitigation strategies, others conducted research on the interrupts themselves. This resulted in different models for network interrupts that differ in their approaches: the authors of [4] present a complex model for the firing and execution of interrupts.
They use an extension of stochastic Petri nets to model the system, as this allows them to combine the probabilistic aspects caused by the randomness of the interrupts with the stateful aspects of prioritized interrupt handling.

A more basic model was presented in [7]. There, a double interrupted Poisson process is employed for the calculations. This is a Poisson process that can be in either a high or a low state, which determines the Poisson parameter of the load. The authors conclude by giving formulas for blocking probabilities, i.e. the probability that a link is blocked and packets are dropped.

To the best of our knowledge, there is no work that focuses on providing a testing capability for the impact of network interrupts on real-time systems.

APPROACH
This section describes the goal of the playground, how different NIC implementations are enabled, and the choice of network traffic scenarios as test loads, each of which is detailed in the following.
The goal of this playground is to enable researchers to conduct experiments and engineers to test their code under simulated network interrupts. This shall be done on an actual device used for IoT applications. Therefore, the playground has to fulfill the following requirements. It has to
• run on real IoT hardware.
• be capable of simulating multiple NIC implementations.
• be able to simulate multiple network traffic scenarios.
• be easily configurable.
• have minimal performance impact on the tested process.

Figure 1: General procedure of a user operating the playground (set up microcontroller → insert user code → configure load or parse recorded PCAP → configure NIC → flash to µC & execute → copy result for analysis).

The operation of the playground is shown in Fig. 1. The playground is set up on the microcontroller. The user then inserts the code of the critical process under test and either configures a uniform or a random load, or parses a recorded network trace. Random loads are Poisson-distributed, as a Poisson distribution is commonly used to model traffic in the literature, e.g. [7]. Afterwards, the user configures the NIC and flashes the playground onto the device. Measured metrics can be configured by the user and range from execution time and number of interrupts to more complex metrics such as the ratio between interrupt sources (see Section IV).
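Parsing a recorded trace (Fig. 1) turns a PCAP capture into data the microcontroller can replay. A minimal sketch of this step could look as follows. It assumes the classic libpcap file layout (pcapng captures would need converting first) and a C++ array layout of our own choosing; `parse_pcap`, `emit_cpp`, `Packet`, and `recorded_load` are hypothetical names, not the playground's actual script.

```python
import struct
from typing import List, Tuple

# Classic libpcap format (not pcapng): a 24-byte global header, then for each
# packet a 16-byte record header (ts_sec, ts_usec, incl_len, orig_len)
# followed by incl_len captured bytes.
PCAP_MAGIC_USEC = 0xA1B2C3D4  # magic number for microsecond timestamps


def parse_pcap(data: bytes) -> List[Tuple[int, int]]:
    """Return (timestamp in microseconds, original packet length) per packet."""
    if struct.unpack_from("<I", data, 0)[0] == PCAP_MAGIC_USEC:
        endian = "<"
    elif struct.unpack_from(">I", data, 0)[0] == PCAP_MAGIC_USEC:
        endian = ">"
    else:
        raise ValueError("expected a classic pcap file with microsecond timestamps")
    packets: List[Tuple[int, int]] = []
    offset = 24  # skip the global header
    while offset + 16 <= len(data):
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack_from(
            endian + "IIII", data, offset)
        # orig_len is the on-wire length, even if the capture was truncated.
        packets.append((ts_sec * 1_000_000 + ts_usec, orig_len))
        offset += 16 + incl_len  # record header plus captured bytes
    return packets


def emit_cpp(packets: List[Tuple[int, int]], name: str = "recorded_load") -> str:
    """Emit C++ source with one {delay since previous packet in us, length} entry."""
    if not packets:
        raise ValueError("a C++ array needs at least one packet")
    lines = [
        "// Generated from a pcap capture (hypothetical layout).",
        "struct Packet { unsigned long delay_us; unsigned int length; };",
        f"const Packet {name}[] = {{",
    ]
    previous = packets[0][0]
    for timestamp, length in packets:
        lines.append(f"    {{{timestamp - previous}, {length}}},")
        previous = timestamp
    lines.append("};")
    return "\n".join(lines)
```

Storing delays relative to the previous packet keeps the numbers small and lets a replay loop simply wait for each delay in turn; computing in integer microseconds avoids the precision loss that floating-point epoch timestamps would introduce.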
The playground has to model different NICs, which can be configured by the user. The simplest NIC informs the CPU about the arrival of every packet by triggering an interrupt for each. Alternatively, some modern NICs offer interrupt moderation. To incorporate this into the playground, we offer the choice between a simple NIC model without any interrupt moderation and several smarter NIC implementations.

For the simple NIC, the duration $d(l)$ of an interrupt depends on the packet length $l$. It is modeled as a length-dependent delay $d_l$, incurred $l$ times, plus a constant, length-independent delay $d_c$, which is an overhead incurred for every interrupt:

$$d(l) = d_l \cdot l + d_c$$

Different simple NIC implementations can be characterized by the user by setting the values of the length-dependent delay $d_l$ and the length-independent delay $d_c$.

For smarter, more elaborate NIC simulations with interrupt moderation, we support NICs defined with a counter mode, a timer mode, or a combination of both out of the box. Here, the parts of the interrupt duration model (packet length and the corresponding length-dependent and length-independent delays) are modeled twice each: once for the simulated ISR and once for a simulated receiver task.

A NIC in counter mode does not trigger an interrupt for every received packet. Instead, it counts the arriving packets, stores them in a buffer, and, after a specified number of packets, triggers one interrupt for all of them. Afterwards, the counter and buffer are reset. Another option is a NIC in timer mode. In this case, a delay timer of specified duration is set for an arriving packet. Upon expiration, one interrupt is triggered. If further packets arrive before the timer has run out, the timer is reset without triggering an interrupt. However, problems may arise if the timer is constantly being reset by arriving packets, never allowing an interrupt to be triggered. The combination of both modes circumvents this problem, because the counter threshold bounds how many packets can accumulate before an interrupt is triggered.
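The following Python sketch illustrates when the NIC models described above raise interrupts. It is not the playground's C/C++ implementation: the function names `interrupt_duration` and `moderate`, the `(time, packets, reason)` tuples, and the rule that a timer expiring exactly at a packet arrival fires before that packet is counted are our own assumptions. It ignores the separate ISR and receiver-task delays that the playground models.

```python
from typing import List, Optional, Tuple

# One simulated interrupt: (time it fires, packets it serves, trigger reason).
Interrupt = Tuple[float, int, str]


def interrupt_duration(length: int, d_l: float, d_c: float) -> float:
    """Simple NIC: d(l) = d_l * l + d_c for a packet of length l."""
    return d_l * length + d_c


def moderate(arrivals: List[float],
             counter_threshold: Optional[int] = None,
             timer_delay: Optional[float] = None) -> List[Interrupt]:
    """Compute when a moderating NIC raises interrupts for sorted arrival times.

    counter_threshold only -> counter mode (a threshold of 1 is the simple NIC)
    timer_delay only       -> timer mode
    both                   -> combined mode
    """
    if counter_threshold is None and timer_delay is None:
        raise ValueError("set a counter threshold, a timer delay, or both")
    interrupts: List[Interrupt] = []
    pending = 0                        # packets buffered since the last interrupt
    deadline: Optional[float] = None   # expiry of the running packet delay timer

    for t in arrivals:
        # Assumption: a timer expiring at or before this arrival fires first.
        if deadline is not None and deadline <= t:
            interrupts.append((deadline, pending, "timer"))
            pending, deadline = 0, None
        pending += 1
        if timer_delay is not None:
            deadline = t + timer_delay  # each arrival (re)starts the timer
        if counter_threshold is not None and pending >= counter_threshold:
            interrupts.append((t, pending, "counter"))
            pending, deadline = 0, None  # reset counter, buffer, and timer

    # A timer still running after the last packet eventually fires; in pure
    # counter mode, leftover packets simply stay in the buffer.
    if deadline is not None:
        interrupts.append((deadline, pending, "timer"))
    return interrupts


if __name__ == "__main__":
    steady = [i * 0.25 for i in range(8)]  # a packet every 0.25 time units
    # Timer mode alone: every arrival resets the timer, so only one late interrupt.
    print(moderate(steady, timer_delay=1.0))  # [(2.75, 8, 'timer')]
    # Combined mode: the counter bounds how many packets can pile up.
    print(moderate(steady, counter_threshold=4, timer_delay=1.0))
    # [(0.75, 4, 'counter'), (1.75, 4, 'counter')]
```

The usage example shows the starvation problem of the timer mode: with packets arriving faster than the timer delay, no interrupt fires until the traffic stops, whereas the combined mode keeps serving packets in groups of four.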
The arrival of packets with corresponding time stamps over some observed time constitutes a load scenario. As we want to simulate different scenarios, the playground offers uniform loads, random loads, and user-defined/recorded loads.

Uniform loads have a constant receive frequency. For the randomized loads, a Poisson distribution is used to model the arrival of new packets.

The Poisson-distributed arrivals are generated by inverse transform sampling from a uniform distribution. Assuming that packets arrive according to a Poisson process with rate $\lambda$, i.e. the number of packets arriving per unit interval is Poisson distributed, and denoting the arrival time of the $i$-th packet by $p_i$, the inter-arrival time $d_i$ is exponentially distributed:

$$d_i = p_{i+1} - p_i \sim \mathrm{Exp}(\lambda)$$

The cumulative distribution function of $\mathrm{Exp}(\lambda)$ is $F(d) = 1 - e^{-\lambda d}$. Inverting it gives

$$\hat{d}_i = F^{-1}(u_i) = -\frac{1}{\lambda}\ln(1 - u_i) \;\hat{=}\; -\frac{1}{\lambda}\ln(u_i),$$

where the last step holds in distribution because $1 - u_i$ is itself uniformly distributed on $(0, 1)$. By inverse transform sampling (as described in [5], Section 23.2), we determine the empirical delays $\hat{d}_i$ between packets by sampling $u_i \sim U(0, 1)$ and calculating $\hat{d}_i$. By setting the parameter $\lambda$, different randomized Poisson-distributed loads can be specified.

As a third option, the playground allows recorded network scenarios to be replayed.

EXPERIMENTS
Two validation and two demonstration experiments were performed, for which a large summation (in a loop) with a conditional statement at each step of the iteration was used as the user code.
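The uniform and Poisson-distributed loads described above can be generated as in the following Python sketch, which applies the inverse transform sampling formula for $\hat{d}_i$. It is an illustration under our own assumptions (the names `uniform_load` and `poisson_load`, arrival times counted from 0 in the same time unit as $1/\lambda$, Python's `random` module as the uniform source), not the playground's C/C++ load generator.

```python
import math
import random
from typing import List


def uniform_load(frequency: float, count: int) -> List[float]:
    """Arrival times of a uniform load: one packet every 1/frequency time units."""
    return [i / frequency for i in range(count)]


def poisson_load(lam: float, count: int, rng: random.Random) -> List[float]:
    """Arrival times of a Poisson load with rate lam via inverse transform sampling.

    Each inter-arrival delay is d_i = -ln(u_i) / lam with u_i uniform on (0, 1],
    which is exponentially distributed with rate lam.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    arrivals: List[float] = []
    t = 0.0
    for _ in range(count):
        # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1] and the
        # logarithm is always defined. This is the -ln(1 - u) form of the inverse
        # CDF, which has the same distribution as -ln(u).
        u = 1.0 - rng.random()
        t += -math.log(u) / lam
        arrivals.append(t)
    return arrivals


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed: the same load can be reused for each NIC setting
    load = poisson_load(5000.0, 100_000, rng)
    print(f"mean inter-arrival time: {load[-1] / len(load):.6f} (expected {1 / 5000:.6f})")
```

Seeding the generator makes a random load reproducible, which helps when the same load is run against several NIC configurations.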
The playground is implemented on the ESP32, a microcontroller with a dual-core CPU, and is written in C and C++, matching the microcontroller's software environment. The parsing script for the recorded network scenarios (PCAPs) is written in Python and generates C++ code. The last requirement, minimal performance impact on the tested process, is fulfilled by using both cores to separate the computational load of the playground code from the tested user code.

Figure 2: Execution time and number of interrupts decrease as the timer threshold increases when using the smart NIC packet delay timer for interrupt moderation (axes: execution time in s, interrupts).

Figure 3: Comparison of the execution time of three Poisson-distributed random network load scenarios for different $\lambda$ as a function of the counter threshold ($\lambda_1 = 2500$, $\lambda_2 = 5000$, $\lambda_3 = 10000$; axis: execution time in s).

All validation testing was performed using a Poisson-distributed random load. The following results merely show a selection of all the experiments that have been carried out with the playground. First, the impact of the packet delay timer on the execution time and the number of interrupts was measured, as seen in Fig. 2. This test was performed using the combined interrupt moderation mode. Both the execution time and the number of interrupts decrease with an increasing packet delay timer. As the number of packets is constant, this shows that interrupt moderation reduces the execution time by lowering the number of interrupts. This gain from increasing the packet delay timer comes with the caveat of a higher packet latency.

In the second validation experiment, three different parameters $\lambda$ for the Poisson-distributed random load generation were compared, as seen in Fig. 3. This test was performed using the counter mode. A larger parameter value corresponds to a higher load. In the graph, the execution time is plotted against the counter threshold. $\lambda_1$ is half as big as $\lambda_2$, and $\lambda_3$ is twice as big as $\lambda_2$.
As shown in the legend, the difference between the execution times of $\lambda_2$ and $\lambda_3$ is twice as big as the one between $\lambda_1$ and $\lambda_2$. This ratio stays roughly the same along the x-axis, which indicates that the execution time in counter mode scales linearly with the load.

Figure 4: Ratio of the reasons for triggering interrupts, shown in relation to counter threshold, timer delay, and execution time when using the smart NIC with mixed mode (axes: counter threshold, timer delay in ms, execution time in s).

Figure 5: Execution time of the test code in two replayed network scenarios from prerecorded PCAPs, shown for different delay timer thresholds (series: execution time and number of interrupts for Spotify and Zoom; axes: execution time in s, number of interrupts).
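Why a constant ratio points to linear scaling can be made explicit. As a simplifying assumption of our own, suppose that for a fixed counter threshold the execution time grows linearly with the load parameter, $t(\lambda) \approx t_0 + c\,\lambda$, with unknown constants $t_0$ and $c$. With the parameter values from Fig. 3 ($\lambda_1 = 2500$, $\lambda_2 = 5000$, $\lambda_3 = 10000$), both constants cancel in the ratio of differences:

$$\frac{t(\lambda_3) - t(\lambda_2)}{t(\lambda_2) - t(\lambda_1)} = \frac{c\,(\lambda_3 - \lambda_2)}{c\,(\lambda_2 - \lambda_1)} = \frac{10000 - 5000}{5000 - 2500} = 2$$

A ratio of about 2 at every counter threshold is therefore what a linear dependence predicts, whereas a superlinear (convex) dependence on $\lambda$ would push the ratio above 2.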
In the first practical example, we investigated the reasons for interrupts when using the combined interrupt moderation mode of the smart NIC model, as seen in Fig. 4. A Poisson-distributed random load was used for this test. The coloring of the data points indicates the reason why an interrupt was triggered. An area in the plane of the data points can be observed in which the coloring indicates an equilibrium between the two reasons. This area extends in both the counter threshold and the timer delay directions but drifts towards the counter threshold axis, indicating that with increasing counter threshold, the counter plays less of a role in causing interrupts than the timer delay does.

In the second practical example, we take a look at recorded loads. We compare the execution time of the user code when using a mixed-mode NIC with a prerecorded Spotify network load to that with a prerecorded Zoom conference load. These loads are considerably less intense than those in the previous experiments. We use these two loads because they have different packet arrival patterns: the Spotify load is bursty, while the Zoom load is rather continuous. Note that we use a longer-running user code (more iterations) here to allow for a longer measurement.

Fig. 5 combines two diagrams: the lines show the execution time for different delay timer thresholds, while the filled area is a histogram showing the intervals at which packets arrive in the two load scenarios. The execution time in the Spotify scenario benefits more from relaxing the timer timeout delay, while in the Zoom scenario the behavior more closely matches that of a Poisson load.

Comparing the results from both scenarios indicates that the expected load pattern can be used to find suitable interrupt moderation parameters.

CONCLUSION
We presented a playground which enables researchers to conduct experiments in the context of network interrupt simulation in real-time scenarios.
It offers multiple load generators, including random and custom prerecorded loads, as well as logging capabilities. The playground was validated through a series of tests. We also presented two practical use cases, highlighting the ability of the playground to simulate desired network characteristics and to analyze the results.

Further steps include a broader range of more complex NIC models and random load sources. Additionally, playground users could create a repository of PCAP files.
REFERENCES
[1] Ilja Behnke, Lukas Pirl, Lauritz Thamsen, Robert Danicki, Andreas Polze, and Odej Kao. 2020. Interrupting Real-Time IoT Tasks: How Bad Can It Be to Connect Your Critical Embedded System to the Internet?. In IPCCC 2020: 39th International Performance Computing and Communications Conference. IEEE, 1–6.
[2] Jaeil Han and Young Man Kim. 2016-10. Interval-Based Adaptive Interrupt Coalescing in High-Speed Networks. In (Jeju). IEEE, 68–70. https://doi.org/10.1109/ICTC.2016.7763437
[3] Marcus Handte, Stefan Foell, Stephan Wagner, Gerd Kortuem, and Pedro Jose Marron. 2016-10. An Internet-of-Things Enabled Connected Navigation System for Urban Bus Riders. 3, 5 (2016-10), 735–744. https://doi.org/10.1109/JIOT.2016.2554146
[4] Gang Hou, Weiqiang Kong, Kuanjiu Zhou, Jie Wang, Xun Cao, and Akira Fukuda. 2018-07. Analysis of Interrupt Behavior Based on Probabilistic Model Checking. In (Yonago, Japan). IEEE, 86–91. https://doi.org/10.1109/IIAI-AAI.2018.00026
[5] Kevin Patrick Murphy. 2012. Machine Learning: A Probabilistic Perspective. MIT Press.
[6] K. Nakashima, S. Kusakabe, H. Taniguchi, and M. Amamiya. 2002. Design and Implementation of Interrupt Packaging Mechanism. In International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems (Big Island, HI, USA). IEEE Comput. Soc, 95–102. https://doi.org/10.1109/IWIA.2002.1035023
[7] M. Rajaratnam and F. Takawira. 1996. Network Modelling in Circuit-Switched Networks Using the Double Interrupted Poisson Process Model. In Proceedings of 8th Mediterranean Electrotechnical Conference on Industrial Applications in Power Systems, Computer Science and Telecommunications (MELECON 96) (Bari, Italy), Vol. 2. IEEE, 971–975. https://doi.org/10.1109/MELCON.1996.551371
[8] Khaled Salah. 2007-04. To Coalesce or Not to Coalesce. 61, 4 (2007-04), 215–225. https://doi.org/10.1016/j.aeue.2006.04.007
[9] John A. Stankovic. 1994. Real-Time Operating Systems. In Real Time Computing, Wolfgang A. Halang and Alexander D. Stoyenko (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 65–82.
[10] Martin Wollschlaeger, Thilo Sauter, and Juergen Jasperneite. 2017-03. The Future of Industrial Communication: Automation Networks in the Era of the Internet of Things and Industry 4.0. 11, 1 (2017-03), 17–27. https://doi.org/10.1109/MIE.2017.2649104

Code and details on https://github.com/dos-group/pieres_playground