Boomerang: Real-Time I/O Meets Legacy Systems
Ahmad Golchin
Computer Science Department, Boston University
Boston, MA
[email protected]

Soham Sinha
Computer Science Department, Boston University
Boston, MA
[email protected]

Richard West
Computer Science Department, Boston University
Boston, MA
[email protected]
Abstract—This paper presents Boomerang, an I/O system that integrates a legacy non-real-time OS with one that is customized for timing-sensitive tasks. A relatively small RTOS benefits from the pre-existing libraries, drivers and services of the legacy system. Additionally, timing-critical tasks are isolated from less critical tasks by securely partitioning machine resources among the separate OSes. Boomerang guarantees end-to-end processing delays on input data that requires outputs to be generated within specific time bounds.

We show how to construct composable task pipelines in Boomerang that combine functionality spanning a custom RTOS and a legacy Linux system. By dedicating time-critical I/O to the RTOS, we ensure that complementary services provided by Linux are sufficiently predictable to meet end-to-end service guarantees. While Boomerang benefits from spatial isolation, it also outperforms a standalone Linux system using deadline-based CPU reservations for pipeline tasks. We also show how Boomerang outperforms a virtualized system called ACRN, designed for automotive systems.
Index Terms—Partitioning hypervisor, real-time operating system, composable task pipelines, input/output
I. INTRODUCTION
Mixed-criticality systems require the spatial and temporal isolation of tasks to meet timing, safety and security constraints [1]. Additionally, these systems involve real-time task pipelines to implement sensing, processing and actuation. For example, an automotive system supports low-criticality infotainment services, which must be isolated from highly critical driving assistance tasks that process sensor data to avoid vehicle collisions.

Spatial isolation ensures that one software component cannot alter another component's private code or data, or interfere with the control of its devices. Temporal isolation ensures that a software component cannot affect when another component accesses a resource (e.g., a CPU). Lack of temporal and spatial isolation leads to potential timing or functional failures. Failure of a highly critical task has potentially catastrophic consequences, while failure of a low-criticality task has less significant consequences.

One way to support mixed-criticality systems is to partition tasks onto separate hardware. This ensures less critical tasks are unable to directly affect those of greater importance. Automotive systems have traditionally taken this approach, by assigning a different functional component to a separate electronic control unit (ECU) [2]. However, as the complexity of these systems increases, hardware costs, wiring and packaging become prohibitive. For this reason, new hardware platforms that integrate the functionality of multiple hardware components, including multicore processors, accelerators, GPUs, and various input/output (I/O) interfaces, are now emerging.
Tesla's AutoPilot 2.x, for example, already uses platforms such as the Nvidia Drive PX2 in its cars, to assist with vehicle control.

An integrated solution, combining tasks of different criticality levels on the same hardware, requires an operating system to correctly enforce temporal and spatial isolation. Partitioning operating systems such as Tresos [1] and LynxOS [3] have been developed for automotive and avionics systems, respectively, in accordance with standards such as AUTOSAR [4] and ARINC653 [5], to isolate tasks of different criticality levels. However, these types of systems are not able to take advantage of legacy software, including libraries and device drivers written for the newest hardware. In contrast, systems such as Linux, Windows and OS X are regularly updated with features that would take an operating system developer years to reproduce in a clean-slate design. Unfortunately, general purpose systems lack the necessary temporal and spatial requirements, including the ability to perform real-time sensing, processing and actuation required by emerging mixed-criticality systems.

In this paper, we present a system called Boomerang. Boomerang uses a partitioning hypervisor [6], which separates the hardware of a physical machine into different guest domains that directly manage their assigned resources. This contrasts with a conventional multiplexing (or consolidating) hypervisor, which intervenes in the sharing of physical machine resources among multiple guests. Boomerang's approach removes the hypervisor from resource management, once CPU cores, physical memory and I/O devices are assigned to separate guests.

Using separate partitions, Boomerang supports the co-existence of a real-time operating system (RTOS) and a legacy system such as Linux.
Rather than treating these systems as separate guests, Boomerang establishes a tightly-coupled symbiotic relationship, such that the RTOS is empowered with legacy features, and the legacy system is empowered with real-time capabilities. For example, a Boomerang Linux partition might support OpenGL and CUDA libraries for hardware accelerators, camera devices, and machine learning algorithms, which would be difficult to write and certify for an RTOS. Likewise, the RTOS partition in Boomerang provides the timing guarantees for real-time tasks to perform sensor data processing and actuation.

Key to this paper's contributions is the construction of a composable tuned pipe abstraction. This abstraction implements real-time task pipelines that ensure end-to-end guarantees on sensing, processing and actuation, spanning both RTOS and legacy OS services. Boomerang extends prior work on tuned pipes between a USB device and a task running in the same OS [7] to encompass task pipelines spanning an RTOS and another guest. The aim is to show that Boomerang is able to combine legacy and real-time services in a way that ensures information flow is bounded by throughput, loss and delay constraints.

As stated above, many emerging mixed-criticality systems require tasks to process sensory inputs before subsequently generating outputs that affect the actuation of a device. For example, a cruise control system in an electric car may collect data from cameras and speed sensors before determining that the motors need to change speed to keep a safe distance to the vehicle ahead.

Novel to Boomerang's composable tuned pipes is the ability for an integrated RTOS based on Quest [8] to manage I/O that requires services in a legacy system such as Linux. We show how to construct composable task pipelines in Boomerang that combine tasks spanning Quest and a legacy Linux system. By assigning time-critical I/O to Quest, Boomerang ensures that complementary services provided by Linux meet end-to-end timing guarantees.
We compare Boomerang to a standalone Linux system, using specific cores to handle timing-sensitive I/O. Boomerang not only benefits from spatial isolation, it also outperforms a standalone Linux system using deadline-based CPU reservations for pipeline tasks. We also show how Boomerang outperforms a partitioning hypervisor called ACRN, designed for automotive systems.

The following section provides background to the problem addressed by Boomerang. Section III describes the Boomerang partitioning hypervisor and composable tuned pipes. An evaluation of Boomerang is described in Section IV. Related work is discussed in Section V. Finally, conclusions and future work are described in Section VI.

II. BACKGROUND
Boomerang supports composable task pipelines that form a round-trip path, originating from a device input and ultimately finishing with a device output. It is designed specifically for applications that require sensing, processing and actuation.

Figure 1(a) shows the round-trip path in a typical OS. A device acknowledges the completion of an I/O request by generating an interrupt. Most systems handle interrupts at priorities above those of software tasks. They also incorrectly charge interrupt handling to the task that was preempted by the arrival of the interrupt. Worse still, a burst of interrupts within a short time may delay a time-critical task enough to miss its deadline [8], [9].

Figure 1: (a) Round-trip I/O in a single OS, and (b) possible I/O paths in a Boomerang partitioning hypervisor.

Figure 1(a) uses a dedicated core for I/O handling of device interrupts, to avoid interference with task execution. However, the single OS approach does not provide adequate spatial isolation of tasks of different criticalities, and underutilizes the core exclusively used for interrupt handling. If the OS malfunctions then tasks of all criticalities are potentially compromised. In contrast, Figure 1(b) shows how Boomerang supports three different classes of I/O using a partitioning hypervisor [10], [11] to separate highly critical timing sensitive operations from less critical system components using different guest OSes.

In the first case (shown with a dashed line), all I/O is contained within the RTOS. Real-time tasks and interrupt handlers for device I/O share the same processor cores, as the RTOS ensures predictable timing guarantees on task and I/O processing.

In the second case, the I/O path traverses a task pipeline that enters into a legacy OS via secure shared memory. Here, the legacy OS provides services that would require significant effort to port to the RTOS.
The round-trip I/O path in case 2 is still able to meet end-to-end timing guarantees because the tasks in the legacy OS are isolated from timing unpredictability caused by interrupts. This is possible by demoting interrupts (in the legacy OS) to priorities that are distinctly lower than those of tasks. Additionally, legacy OSes such as Linux support SCHED_DEADLINE execution for tasks, thereby ensuring some degree of timing guarantees, as long as there is no interference from interrupts [7].

In the third case, it may be necessary for some I/O to be handled by a legacy system, which has drivers and libraries that are unavailable in the RTOS. For example, a series of cameras used in a driverless car need suitable device drivers and machine learning algorithms to perform object classification. The outcomes of object classification dictate whether information needs to be communicated to the RTOS to issue real-time outputs that adjust vehicle motion. As with the single OS approach, I/O originating in case 3 may handle interrupts on a dedicated core, to avoid interference with tasks that serve RTOS requests in case 2. Alternatively, I/O processing in the legacy OS is given lower priority than task execution, leaving critical I/O to the real-time OS.
A. VCPU Scheduling
Boomerang's partitioning hypervisor allows each guest to directly manage its assignment of physical CPUs (PCPUs). A PCPU is either a processor core, hardware thread, or individual CPU. This differs from a traditional hypervisor, which schedules the virtual CPUs (VCPUs) of multiple guests onto PCPUs. In Boomerang's RTOS, based on Quest [8], each VCPU has a budget capacity, C, and period, T. A VCPU is required to receive at least C units of execution time every T time units when it is runnable, as long as a schedulability test [13] is passed when creating new VCPUs. This way, the Quest scheduler guarantees temporal isolation between threads associated with different VCPUs.

Figure 2: VCPU scheduling hierarchy in Quest.

Figure 2 shows the scheduling of threads and VCPUs for real-time tasks and interrupt handlers. Tasks are assigned to Main VCPUs, and separate IO VCPUs are used for interrupt handling. Main VCPUs are implemented as Sporadic Servers [14]. Each Sporadic Server keeps track of its VCPU's budget usage, and constructs a list of timestamped future replenishments, to ensure timing guarantees. By default each Sporadic Server VCPU is scheduled using Rate-Monotonic Scheduling (RMS) [15], although an alternative policy such as earliest-deadline first (EDF) may be chosen. With RMS, the VCPU with the smallest period, T, has the highest priority.

To ensure that tasks are isolated from interrupts, Quest promotes interrupt handling to a schedulable thread context, whose execution is charged to an IO VCPU. Each IO VCPU is associated with the Main VCPU that led to the occurrence of the interrupt. Such occurrences result from tasks issuing blocking requests (e.g., via a read() system call), or a system thread awaiting a kernel event.

Consider a task that issues a blocking I/O request on a device (e.g., USB interface). When the task blocks, it stops charging execution time to its Main VCPU. Some time later an interrupt occurs when an I/O transfer is complete. This causes a top half handler to execute, which determines the Main VCPU waiting on I/O. The top half then inserts into a system ready queue an IO VCPU with a dynamically calculated budget and period, based on the parameters of its corresponding Main VCPU. Finally, the interrupt is acknowledged, and all subsequent handling occurs in a bottom half thread context, when the corresponding IO VCPU is scheduled. Consequently, all bottom half execution time is charged to its IO VCPU before the blocked task resumes execution on its Main VCPU.

Each IO VCPU in Quest is given a utilization bound, U_IO. There is one IO VCPU for each device class, with classes existing for USB, networking, ATA, and GPIO devices, among others.
When an IO VCPU is added to the scheduler ready queue, its budget is set to U_IO × T_Main and its period is set to T_Main, where T_Main is the period of the Main VCPU of the source entity associated with the interrupt. Quest is then able to correctly schedule bottom half interrupt handlers at the priority of the source task running on a Main VCPU. This contrasts with systems such as Linux, which schedule bottom halves (a.k.a., tasklets or softirqs) at priorities that are not tied to the source of corresponding interrupts.

IO VCPUs have a dynamically calculated budget and period based on the Main VCPUs they serve, to avoid the overhead of maintaining replenishment lists for short-lived interrupt service routines (ISRs). This budget is eligible for use as long as the sustained IO VCPU utilization does not exceed U_IO. This policy is shown to be effective for short-lived ISRs, which would otherwise fragment a Sporadic Server budget as used for Main VCPUs.

Quest requires reprogramming of hardware timers in one-shot mode, to determine the next system event. This is similar to Linux's tickless operation. As IO VCPUs only have one budget replenishment to consider, rather than a list, this leads to reduced timer reprogramming overhead.

B. Communication Model
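The IO VCPU parameter assignment above is simple arithmetic. The following sketch illustrates it; the struct and function names are ours, not Quest's actual data structures:

```c
/* Sketch of the IO VCPU parameter calculation described above:
 * budget C = U_IO * T_Main, period T = T_Main. Names illustrative. */
struct io_vcpu_params {
    unsigned budget_us;
    unsigned period_us;
};

struct io_vcpu_params io_vcpu_tune(double u_io, unsigned t_main_us) {
    struct io_vcpu_params p;
    p.period_us = t_main_us;                      /* T = T_Main       */
    p.budget_us = (unsigned)(u_io * t_main_us);   /* C = U_IO*T_Main  */
    return p;
}
```

For example, with U_IO = 0.04 and a Main VCPU period of 10 ms, the IO VCPU receives a 400 µs budget every 10 ms, so bottom-half work runs at the source task's priority while remaining bounded by the class utilization.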
Data flow involves a pipeline of communicating tasks. Each task processes its input data to produce output, either for devices or subsequent tasks in the pipeline. This leads to a communication model characterized by: (1) the interarrival times of tasks in the pipeline, (2) inter-task buffering, and (3) each task's access pattern to communication buffers.
Task Interarrival Times.
Each task ordinarily samples input data periodically. However, a task will block if data is unavailable, leading to aperiodic or irregular intervals between successive task instances. Either way, a task pipeline's timing requirements assume that data will propagate with a minimum inter-arrival time between tasks.
Register-based versus FIFO-based Communication.
A FIFO-based shared buffer is used in scenarios where data history is an important factor. However, in sensor-data processing the most recent data is often more important. For example, a driving assistance system should always compute outputs that affect vehicle dynamics from the latest sensor data. FIFO-based communication results in loosely synchronous communication: the producer is suspended when the FIFO buffer is full and the consumer is suspended when the buffer is empty. Register-based communication achieves asynchrony between two parties using Simpson's four-slot algorithm [16].
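A minimal sketch of the four-slot idea follows, showing the slot-selection logic only. In a real concurrent implementation the control variables would be atomic; here the writer and reader are exercised sequentially:

```c
/* Single-writer/single-reader sketch of Simpson's four-slot algorithm.
 * The writer never touches the pair the reader announced, and never
 * reuses the slot it last wrote, so neither side ever blocks. */
typedef struct {
    int data[2][2];   /* two pairs of two slots */
    int slot[2];      /* last slot written in each pair */
    int latest;       /* last pair written */
    int reading;      /* pair the reader is using */
} fourslot_t;

void fourslot_write(fourslot_t *f, int item) {
    int pair = !f->reading;      /* avoid the pair being read */
    int idx  = !f->slot[pair];   /* avoid the slot last written */
    f->data[pair][idx] = item;
    f->slot[pair] = idx;         /* publish slot, then pair */
    f->latest = pair;
}

int fourslot_read(fourslot_t *f) {
    int pair = f->latest;
    f->reading = pair;           /* announce which pair we read */
    int idx = f->slot[pair];
    return f->data[pair][idx];
}
```

After any number of writes, a read returns the freshest completed item (freshness), and the writer can never corrupt a slot mid-read (integrity), matching the guarantees described in Section III-B.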
Implicit versus Explicit Communication.
Explicit communication allows access to shared data at any time during a task's execution. This might lead to data inconsistency in the presence of task preemption. Conversely, the implicit communication model [17] essentially follows a read-before-execute paradigm to avoid data inconsistency. It mandates a task to make a local copy of the shared data at the beginning of its execution and to work on that copy throughout its execution.

Boomerang supports both periodic and aperiodic tasks. It also supports both register- and FIFO-based communication. Implicit communication is enforced for data consistency.

III. BOOMERANG
The Boomerang partitioning hypervisor divides processor cores, physical memory and I/O devices among guest domains. Each guest manages its physical resources without involvement of the hypervisor. This has two important properties: (1) the hypervisor is only used to bootstrap the system and to establish secure communication channels between guests using hardware extended page tables (EPTs), and (2) the hypervisor is removed from runtime management of physical machine resources, making its trusted code base extremely small. Boomerang's partitioning hypervisor has a text segment of less than 4KB, although more space is needed for EPTs (e.g., 24KB for a 4GB guest). Given the hypervisor is not accessed under normal guest operation, the system's most privileged ring of protection is less susceptible to security attacks than a conventional OS image running directly on hardware. In the latter case, system calls must pass control to the OS kernel, whereas in Boomerang these are restricted to the local guest.

Unlike traditional hypervisors that multiplex guests onto the same shared physical machine, partitioning hypervisors offer opportunities for applications that require security and timing predictability. Hardware virtualization features isolate guests, using an additional ring of protection reserved for the hypervisor. At the same time, time-critical guests are able to run real-time resource management policies without being compromised by additional resource management policies in the hypervisor.

We see partitioning hypervisors as being suitable for mixed-criticality systems, requiring spatial and temporal isolation of application tasks and software components according to different system criticality levels. For example, automotive systems adhering to standards such as ISO 26262 [18] are required to meet specific functional safety requirements, according to several classes of automotive safety integrity levels known as ASIL A-D.
Software certified to the ASIL D standard operates at the most stringent safety level, where the risk of failure is potentially life threatening. In contrast, ASIL A applies to software that has a very low probability of significant human injury even during failures. Other standards such as ARINC 653 and DO-178B have similar requirements for avionics systems. For these types of systems, it is possible to assign software to machine partitions according to their safety integrity levels.

(Intel processors with VT-x capabilities refer to these tables as EPTs. AMD-V processors have similar nested page tables, NPTs.)
A. Composable Tuned Pipes
Figure 3 shows a logical representation of a single tuned pipe (a.k.a., tpipe). A pipe has one pipe processor and two endpoints, with one endpoint for input and the other for output. A pipe processor is represented by a VCPU, guaranteeing at least C units of execution time every T time units when runnable. Pipe processors are associated with tasks bound to Main VCPUs, or threaded interrupt handlers bound to IO VCPUs.

Figure 3: A tuned pipe.

A tuned pipe guarantees data flowing from an input to an output endpoint is processed according to specific service requirements. These requirements apply end-to-end, through a pipeline of one or more tuned pipes. If the pipeline is lossless, it ensures specific throughput and delay guarantees, whereas if it is lossy, it guarantees a maximum fraction of lost data while meeting delay bounds.

Boomerang maintains a local repository for each guest OS (a.k.a., sandbox or machine partition), which stores information about available endpoints. The repository records a globally unique identifier for each endpoint, in the form: hostID:sandboxID:asID:epID. This distinguishes endpoints in different host machines (by hostID), sandboxes (by sandboxID), and address spaces (by asID). Access capabilities restrict which tuned pipes are able to connect to endpoints. The rules controlling connectivity to endpoints are a topic of ongoing research. They have implications for secure information flow analysis [19]–[21], which is outside the scope of this paper. Notwithstanding, pipelines may be constructed within a single address space, between address spaces in the same machine partition, between different partitions on the same host, and across different hosts.

When creating a tuned pipe, Boomerang automatically calculates (i.e., tunes) the budget and period of the pipe VCPU to ensure end-to-end guarantees are met.
Tuned pipes are created with a call to tpipe(), as follows:

    tpipe_id_t tpipe(ep_t *inp[], int n_inp, ep_t *outp,
                     qos_t spec, tpipe_task_t func, void *arg);

The input endpoint of the new tuned pipe specifies an array of pointers, inp, to endpoint types. This array identifies the endpoint addresses of n_inp inputs to the tuned pipe, along with the buffering semantics of each input, which will be discussed in Section III-B. (In this paper, we restrict communication within the same host machine.) A tuned pipe executes a callback function (func), which sends its output to specific destinations connected to the output endpoint, identified by outp. The callback function takes an optional argument (arg), and runs in its own thread context. The thread context defines a task, which is bound to a VCPU having an automatically-generated budget, C_i, and period, T_i, for the tuned pipe, tpipe_i. The budget and period are derived from the quality-of-service (QoS) requirement (spec) for end-to-end throughput and delay on data processing. This requirement must also satisfy the schedulability of all VCPUs on a given physical CPU (PCPU), otherwise the tuned pipe is not created. If a tuned pipe is successfully created, it is given a unique ID within its guest OS.

tpipe_i requires its callback function to process data from one or more input endpoints and produce output in one quantum of size C_i, every period, T_i. Functions are selected from a predefined repository of callbacks. Each callback has a known worst-case execution time (WCET) based on pre-profiled timing information to handle a maximum of I_i inputs and produce up to O_i outputs in one quantum. The actual amount of processing in a quantum depends on the availability of data in input buffers, and how many outputs need to be written.

Each function in the repository declares the allowable buffering capabilities for its inputs and outputs. Any tuned pipe connecting to another with a function that does not match the allowed buffering capabilities is rejected.
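To illustrate the call, the hedged sketch below stubs out the opaque types from the signature: ep_t, qos_t and the field names inside them are not defined in the paper, so everything here other than the tpipe() prototype itself is an assumption.

```c
#include <stddef.h>

/* Stand-ins for the paper's opaque types; all fields are assumed. */
typedef int tpipe_id_t;
typedef struct { int id; } ep_t;
typedef struct { unsigned e2e_tput; unsigned e2e_delay_us; } qos_t;
typedef void (*tpipe_task_t)(void *arg);

/* Stub: the real tpipe() would tune a VCPU budget/period from 'spec',
 * run a schedulability test, and fail if the QoS is infeasible. */
static tpipe_id_t next_id = 1;
tpipe_id_t tpipe(ep_t *inp[], int n_inp, ep_t *outp,
                 qos_t spec, tpipe_task_t func, void *arg) {
    (void)inp; (void)n_inp; (void)outp;
    (void)spec; (void)func; (void)arg;
    return next_id++;
}

void classify_frame(void *arg) { (void)arg; /* process one input */ }

/* Example: a one-input pipe from a (hypothetical) camera device pipe
 * to an actuator endpoint, requesting 30 msgs/s within 33 ms e2e. */
tpipe_id_t make_camera_pipe(ep_t *camera_out, ep_t *actuator_in) {
    ep_t *inp[1] = { camera_out };
    qos_t spec = { 30, 33000 };
    return tpipe(inp, 1, actuator_in, spec, classify_frame, NULL);
}
```

The design point to note is that the QoS specification travels with the creation call, so admission control happens before the pipe exists.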
B. POSIX Pipes versus Tuned Pipes
Similarities exist between a pair of tuned pipes and a single POSIX pipe. The latter provides a shared memory buffer that is accessible to a group of communicating threads via file descriptors. The file descriptors describe the endpoint capabilities, including whether the pipe is readable or writable. A tuned pipe pair in Boomerang differs from a POSIX pipe by capturing the timing requirements for data processing and communication. Tuned pipes also define the buffering semantics for I/O endpoints. Two pipes, tpipe_i and tpipe_j, are composed by connecting the output endpoint of tpipe_i to the input endpoint of tpipe_j. Boomerang allows the composition of two or more pipes to support either asynchronous (RT_ASYNC) or semi-asynchronous (RT_FIFO) communication, as shown in Figure 4.

Figure 4: A two-stage pipeline with (a) 4-slot asynchronous buffering, and (b) semi-asynchronous ring buffering.

With RT_ASYNC, Simpson's four-slot buffering scheme [16], [22] is used to allow the two pipe threads to execute independently of each other. Four-slot communication guarantees freshness and integrity of data objects exchanged between a producer and consumer, without the sender or receiver ever having to block. Freshness guarantees the most recent value of a data object is made available. Integrity ensures a data object is not partially updated before the previous object has been read in its entirety.

With RT_FIFO, a ring-buffer is established between the communicating pair of pipes to avoid data loss. However, the sender must block when the buffer is full, and the receiver must block when the buffer is empty. This places a timing dependency on producers and consumers, which potentially violates end-to-end timing guarantees unless data flow rates are managed correctly.
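A minimal single-producer/single-consumer sketch of the RT_FIFO semantics follows. The names and fixed capacity are illustrative; Boomerang's actual buffers block the caller, whereas the try_* variants here only report the full/empty condition that would cause blocking:

```c
#include <stdbool.h>
#include <stddef.h>

#define RING_SLOTS 8   /* stores at most RING_SLOTS - 1 messages */

typedef struct {
    int buf[RING_SLOTS];
    size_t head, tail;   /* head: next read, tail: next write */
} ring_t;

bool ring_try_push(ring_t *r, int msg) {
    if ((r->tail + 1) % RING_SLOTS == r->head)
        return false;                  /* full: producer must block */
    r->buf[r->tail] = msg;
    r->tail = (r->tail + 1) % RING_SLOTS;
    return true;
}

bool ring_try_pop(ring_t *r, int *msg) {
    if (r->head == r->tail)
        return false;                  /* empty: consumer must block */
    *msg = r->buf[r->head];
    r->head = (r->head + 1) % RING_SLOTS;
    return true;
}
```

The full/empty branches are exactly where the timing dependency between producer and consumer arises, which is why Boomerang rate-matches the two ends (Section III-E).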
C. Device versus Task Pipes
Boomerang's RTOS provides a pre-defined set of tuned pipes for all devices involved in real-time I/O. A device pipe features an IO VCPU for interrupt handling, and an optional Main VCPU for endpoint buffer management of shared devices. Sharing requires scatter-gather functions to move data between the device endpoint buffer and pipe-specific buffers of task pipes. If a device is not shared, its handler directly accesses the buffer of a specific task pipe.

The tpipe() call, described earlier, creates a task pipe. Unlike a device pipe, there is no IO VCPU for interrupt handling. Task pipes form pipelines between device pipes that act as the sources and sinks of input and output data, respectively.

Figure 5: Example composition of a device and task pipe for asynchronous I/O.

Figure 5 shows an example composition of a device and task pipe for asynchronous (non-blocking) I/O communication. The device is assumed to be shared with other tasks. If a task requires semi-asynchronous device communication for blocking I/O, it would replace the four-slot pipe buffer with a ring buffer.
D. Pipeline Construction
Pipelines of tuned pipes are constructed in the order in which data flows, from input to output. A tuned pipe is responsible for the creation of all buffers that connect to its input endpoint. It also declares its output endpoint, which includes a count of the number of outputs it handles. A pipeline is incomplete until all I_i inputs and O_i outputs of each tpipe_i are connected.

The output endpoint of each task pipe has a connection to a default device pipe, which could be a null device. A system call interface allows this output endpoint to be redirected to one or more different device pipes.

Once fully connected, the system activates the pipeline by allowing each tpipe task to be scheduled for execution. Those tasks that execute in the RTOS are runnable when they have available budgets on their corresponding VCPUs. Tuned pipe tasks that execute in Linux are runnable when they have available budgets in their SCHED_DEADLINE scheduling class. Linux's SCHED_DEADLINE scheduling class uses a Constant Bandwidth Server [23] to limit the maximum CPU bandwidth consumed by a task within a specific period. The end of the period is used to define the task deadline, and all tasks are scheduled earliest deadline first. However, interrupt handlers are not managed in this scheme.

Boomerang runs our in-house RTOS in one sandbox, and Linux in another sandbox on the same physical machine. A Linux kernel module maps a secure shared memory region by calling into the hypervisor. The hypervisor uses EPTs to map machine physical memory into each sandbox so they are able to communicate.

Each sandbox is equipped with kernel services that manage a local repository of endpoints and tuned pipes. Communication services allow queries to a remote sandbox, to discover endpoints and to connect or disconnect from tuned pipes. Mailbox channels are established by Boomerang to enable OSes in different sandboxes to send remote OS requests. Access policies determine whether address spaces in the local or remote sandbox are able to connect to endpoints of existing tuned pipes.

Boomerang's RTOS provides a remote shell to Linux through inter-sandbox shared memory. Linux uses a kernel module to allow user-space application programs with root privilege to execute shell commands on the RTOS. A shell interface allows pipelines of tuned pipes to be constructed. The RTOS is able to query endpoints and tuned pipes that exist in Linux, and issue requests to connect to them via tpipe() calls. After the construction of the pipeline, the RTOS runs an end-to-end throughput and delay analysis. If the end-to-end requirements are met for the pipeline, the transmission of data is allowed to begin from the RTOS.
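For reference, a Linux task requests such a SCHED_DEADLINE reservation through the sched_setattr(2) system call. The sketch below only fills the attribute structure (declared manually, since older glibc versions do not export it); installing it requires privilege and is shown as a comment:

```c
#include <stdint.h>
#include <string.h>

/* struct sched_attr as defined by the Linux sched_setattr(2) ABI. */
struct sched_attr {
    uint32_t size;
    uint32_t sched_policy;
    uint64_t sched_flags;
    int32_t  sched_nice;
    uint32_t sched_priority;
    uint64_t sched_runtime;    /* ns: CBS budget  */
    uint64_t sched_deadline;   /* ns              */
    uint64_t sched_period;     /* ns: CBS period  */
};

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE 6
#endif

/* Fill a reservation of 'runtime_ns' every 'period_ns', with the
 * deadline at the end of the period, as described above. */
struct sched_attr deadline_attr(uint64_t runtime_ns, uint64_t period_ns) {
    struct sched_attr a;
    memset(&a, 0, sizeof a);
    a.size = sizeof a;
    a.sched_policy = SCHED_DEADLINE;
    a.sched_runtime = runtime_ns;
    a.sched_deadline = period_ns;   /* deadline = end of period */
    a.sched_period = period_ns;
    return a;
}
/* A privileged process would then call:
 *   syscall(SYS_sched_setattr, 0, &attr, 0);                       */
```

Note that, as the paper observes, this bounds only task execution; Linux interrupt handlers remain outside the reservation.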
Tuned pipe functions synchronize their start and end of operation life-cycle using Start-Of-Task and End-Of-Task packets on their input endpoints.

The following example illustrates a pipeline specification:

    [*](A|B),C | D | E,F [e2e_tput|loss_rate, e2e_delay]

The resultant pipeline is shown in Figure 6. Boomerang's repository of tuned pipe functions requires that A and C connect to a device output endpoint for reading, while E and F connect to a device input endpoint for writing.

Figure 6: An example pipeline with multiple inputs & outputs.

Boomerang defaults to non-blocking tuned pipe semantics, where data freshness is more important than lossless communication. Figure 6 shows four-slot buffering of all pipeline stages. If lossless communication is required, the entire pipeline specification is preceded by an asterisk. This pipeline would then use FIFO buffers between each pair of tuned pipes.

With four-slot buffering, the entire pipeline has an optional end-to-end service specification in terms of tolerable loss_rate and e2e_delay. With FIFO buffering, the pipeline is specified with an optional end-to-end throughput, e2e_tput, and delay. The throughput is measured as the minimum number of data objects per unit time exiting a final tuned pipe, while the delay is measured in microseconds. Each data object represents a message, which is the size of one slot of either a four-slot or FIFO buffer.

If the QoS specification is omitted, then the pipeline defaults to best effort. In such a case, the VCPUs of each tuned pipe revert to their default values. If the pipeline overloads the PCPUs to which it is assigned, leading to an infeasible schedule, its VCPU periods are repeatedly extended until the pipeline is schedulable.

The shell interpreter allows parallel sections of a pipeline to be defined by comma-separated lists of tuned pipes. Here, the pipeline section A|B runs parallel with C. This could be representative of two separate input sensor streams coming from two different devices. Parentheses ensure correct grouping of pipeline sequences, while two tuned pipes are connected using the shell vertical bar symbol (|).

In the example, the outputs of B and C feed into the single tuned pipe, D. Similarly, the output of D is split across E and F. D might represent a sensor fusion and control task, while E and F might be specific actuator tasks that output their data to different devices. In an automotive system, for example, E and F might send their outputs to two different CAN buses, managed by device pipes.

The e2e_delay constraint applies to the longest path through the pipeline, while for FIFO-buffered communication the e2e_tput applies to whichever final task pipe has the lowest rate of output. If FIFO-buffering were used in the figure, whichever of E and F had the lowest output rate would dictate the end-to-end throughput.

As a four-slot buffered pipeline allows each tuned pipe to read and process whatever data sits in its input buffers, it is possible that new data has overwritten old data before the consumer runs. This happens if the producer has an arrival rate, λ = 1/T_p, greater than the consumer's service rate, µ = 1/T_c. Here, it is assumed that T_p and T_c are set to ensure one message transfer every corresponding period, regardless of whether it is a new message or not.

E. End-to-end QoS Guarantees
Given a pipeline of tuned pipes and buffers, Boomerang runs a constraint solver to determine C_i and T_i for each tpipe_i. The function executed by tpipe_i is assumed to process at least one of its I_i inputs and generate one of its O_i outputs every period, T_i. Essentially, one or more processed data messages propagate through a tuned pipe within C_i execution time. Boomerang assumes that C_i is derived by pre-profiling the WCET of the corresponding task function. This WCET is then stored in the local repository, along with the set of inputs and outputs used by the function.

For a pipeline to successfully meet its end-to-end timing requirements, Boomerang must still determine each period, T_i | T_i > C_i, and possibly scale each service time C_i to forward more than one message at a time. It follows that a FIFO-buffered pipeline successfully meets its end-to-end timing requirements if:

1) Σ_{i∈l} T_i ≤ e2e_delay, for the longest path l,
2) min_{∀i} {m_i/T_i} ≥ e2e_tput, where m_i ≥ 1 messages are transferred by tpipe_i every C_i,
3) all FIFO buffers are sized to ensure no additional blocking delays of tasks, and
4) all task scheduling constraints are satisfied on their respective PCPUs.

Similarly, a pipeline with four-slot buffering meets its end-to-end requirements if:

1) Σ_{i∈l} T_i ≤ e2e_delay, for the longest path l,
2) max {1 − T_p/T_c} ≤ loss_rate, for all T_p ≤ T_c, and
3) all task scheduling constraints are satisfied on their respective PCPUs.

The end-to-end delay represents the time for a message to traverse the longest path through a pipeline. The final message output from the pipeline is a transformation of data propagated through each tuned pipe.

The worst-case end-to-end delay is the sum of all the periods of the tuned pipes in the longest path, plus any blocking delays. The blocking delays are zero with asynchronous communication as each tuned pipe processes its most recent data, regardless of it being updated.
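The FIFO-buffered conditions reduce to a few arithmetic checks. The sketch below assumes a single-path pipeline on one PCPU with an EDF utilization bound of 1; buffer sizing (condition 3) is omitted, and all names are illustrative:

```c
#include <stdbool.h>
#include <stddef.h>

struct tpipe_params {
    double C;   /* budget (s)                    */
    double T;   /* period (s)                    */
    double m;   /* messages forwarded per period */
};

/* Check conditions (1), (2) and (4) for a FIFO-buffered pipeline:
 * sum of periods <= e2e_delay, min(m_i/T_i) >= e2e_tput, and
 * total utilization <= 1 on the shared PCPU. */
bool fifo_pipeline_feasible(const struct tpipe_params *p, size_t n,
                            double e2e_delay, double e2e_tput) {
    double sumT = 0.0, util = 0.0, min_rate = -1.0;
    for (size_t i = 0; i < n; i++) {
        sumT += p[i].T;                    /* condition (1) term */
        util += p[i].C / p[i].T;           /* condition (4) term */
        double rate = p[i].m / p[i].T;     /* condition (2) term */
        if (min_rate < 0 || rate < min_rate) min_rate = rate;
    }
    return sumT <= e2e_delay && min_rate >= e2e_tput && util <= 1.0;
}
```

For instance, two stages each with C = 1 ms, T = 10 ms and m = 1 have a 20 ms worst-case delay and a 100 msg/s throughput floor, so they satisfy a (50 ms, 50 msg/s) specification but not a 15 ms delay bound.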
Similarly, blocking delays are avoided with FIFO-based communication if each buffer is never empty or totally full.

It follows that each tuned pipe propagates a message after C_i worst-case execution time. However, if data arrives at the inputs to a tuned pipe when it has just depleted its budget, it must wait T_i − C_i before the budget is replenished. If the next tuned pipe is not synchronized to start exactly when the previous pipe forwards its data, there could be an additional delay of T_i − C_i on top of C_i to process the data in tpipe_i.

To see this more clearly, consider a system of T tasks, each with a service time of 1 time unit every T. Suppose two of these tasks are associated with tpipe_1 and tpipe_2. Input data D_in to tpipe_1 is processed and forwarded to tpipe_2, which produces D_out. These two tuned pipes form a pipeline, while all other tasks compete for execution on the same PCPU. Using either rate-monotonic or earliest-deadline-first scheduling [15] yields the same schedule in this case: neglecting scheduling overheads, each task has the same priority. A possible schedule is shown in Figure 7.

Figure 7: Worst-case delay: D_in → tpipe_1 → tpipe_2 → D_out.

The worst-case end-to-end delay is when each of the T − 2 tasks other than those for tpipe_1 and tpipe_2 runs immediately after the data, D_in, has arrived. Then, tpipe_2 executes and processes old input data before tpipe_1 is able to read D_in. Consequently, tpipe_1 does not process D_in and forward the output to tpipe_2 until T time after the data first arrived. Similarly, tpipe_2 is not able to run again until 2T − 2, when it finally reads D_in. This is because the scheduler will not provide it with a budget replenishment until one period after it last executed. The total end-to-end delay between D_in and D_out is therefore 2T − 1. For large T this approaches a worst-case delay of 2T.
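The 2T − 1 figure can be reproduced with a few lines of arithmetic. The slot indices below follow the worst-case schedule just described; this is a sketch assuming unit budgets and a single PCPU, not code from the paper.

```python
def two_stage_worst_case_delay(T):
    """Worst-case end-to-end delay for the two-stage example: T tasks,
    each with budget 1 and period T, sharing one PCPU.

    D_in arrives at t = 0.  The T-2 unrelated tasks run first; tpipe_2
    then consumes stale data in slot T-2; tpipe_1 reads D_in in slot T-1
    and forwards it at time T.  tpipe_2's budget is only replenished one
    period after it last ran, so it runs again at 2T-2 and emits D_out
    at 2T-1.
    """
    tpipe2_stale_run = T - 2                # processes old input data
    tpipe1_forward = tpipe2_stale_run + 2   # = T: D_in leaves tpipe_1
    tpipe2_next_run = tpipe2_stale_run + T  # budget replenished at 2T-2
    return tpipe2_next_run + 1              # D_out produced at 2T-1

assert two_stage_worst_case_delay(10) == 19   # 2T - 1
```

As T grows, the returned delay 2T − 1 approaches the 2T bound stated above.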
Extending this to more than two tasks in a pipeline leads to the worst-case end-to-end delay being the sum of the corresponding tuned pipe periods.

The end-to-end throughput of a path through a FIFO-buffered pipeline is limited by the minimum output rate of any one tuned pipe in that path. A tuned pipe's output rate is how many messages it is able to forward in its period. As FIFO buffering allows tpipe_i to forward m_i ≥ 1 messages per period, the minimum value of m_i/T_i ≥ e2e_tput for all i is a lower bound on overall throughput.

For any pair of tuned pipes connected via FIFO buffers, it is essential that blocking delays are factored into the end-to-end service guarantees. Boomerang tries to avoid blocking on message exchanges by matching the arrival and departure rates of messages passed through shared FIFO buffers. Suppose a producing and consuming pair of tuned pipes have budgets C_p and C_c, respectively. Given that C_p = C_in is sufficient to produce one message in T_p, and C_c = C_out is sufficient to consume one message in T_c, Boomerang starts by setting T_p = T_c = Δ, where Δ · n = e2e_delay, and n is the number of tuned pipes in the longest path. This ensures the producer and consumer are rate-matched, to prevent the buffer between them either completely filling or emptying. Rate-matching is applied to all tuned pipes in the pipeline.

If the pipeline cannot feasibly be scheduled on its PCPUs, each tuned pipe period is scaled by a factor α, where α > 1. This is repeated until all tuned pipes are schedulable, but it leads to a violation of the end-to-end latency requirement.

To reduce end-to-end latency, Boomerang adjusts tuned pipe periods, starting with the inputs to the pipeline. For each tuned pipe pair, T_p is repeatedly halved and C_c is similarly doubled, ensuring that T_p > C_p, T_c > C_c and all VCPUs are schedulable when possible. The doubling of C_c enables it to process multiple messages, m_c, in one budget cycle.
T_p is reduced until either the entire pipeline meets its end-to-end delay requirement or it is set as low as feasibly possible. If Σ_{i ∈ l} T_i ≤ e2e_delay for the longest path l, the algorithm stops, or else it moves onto the next stage in the path and repeats the above procedure.

If all stages of the pipeline have been processed from input to output, the algorithm revisits each consumer whose budget is set to process multiple messages in one period. For each consumer, both C_c and T_c are halved, as long as C_c is no smaller than the time to process one message. If the path's e2e_delay is satisfied, or tuned pipe periods and budgets cannot be reduced further, the algorithm stops. At this point each C_p = m_p · C_in and each C_c = m_c · C_out, for m_p, m_c ≥ 1.

If a feasible pipeline schedule is found, each FIFO buffer is set to have enough space for m_p · (⌈T_c/T_p⌉ + 1) messages from the producer. ⌈T_c/T_p⌉ accounts for the maximum number of times the producer can generate m_p messages within one period of the consumer. An additional m_p messages may be generated by the producer by the time the consumer accesses the buffer, due to potential phase shifting between the two tasks.

For four-slot communication, if the consumer has a smaller period than a producer at any stage in the pipeline, then the consumer will always see the most recent data. Given that four-slot communication restricts each tuned pipe to read, process and write one message every period, it is impossible for a pipeline to lose any data if all consumer periods are smaller than their corresponding producer periods. However, if a consumer has a larger period than its producer, such that T_c > T_p, then the producer may overwrite data before the consumer sees the previous message. It follows that the loss rate through a four-slot pipeline is limited to the maximum value of 1 − T_p/T_c of any stage in the pipeline.
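The solver's rate-matched starting point, its period scaling, and the buffer-sizing rule can be sketched as follows. The function names and the choice of α are illustrative assumptions; the paper only requires α > 1 and does not prescribe a value.

```python
from math import ceil

def initial_periods(n, e2e_delay):
    """Rate-matched starting point: every tuned pipe in the longest
    path gets the same period Delta, where Delta * n = e2e_delay."""
    return e2e_delay / n

def scale_until_schedulable(periods, utilization_ok, alpha=1.1):
    """If the pipeline is infeasible, scale all periods by alpha > 1
    until the schedulability test passes (budgets held fixed).
    Assumes the test eventually passes as periods grow."""
    while not utilization_ok(periods):
        periods = [T * alpha for T in periods]
    return periods

def fifo_buffer_slots(m_p, T_p, T_c):
    """FIFO buffer sizing: space for m_p * (ceil(T_c / T_p) + 1)
    messages, covering the producer's runs per consumer period plus
    one extra burst due to phase shifting between the two tasks."""
    return m_p * (ceil(T_c / T_p) + 1)
```

For instance, a 5-stage path with a 10 ms delay bound starts with Δ = 2 ms per stage, and a producer forwarding one message every 2 ms into a consumer with a 4 ms period needs a 3-message buffer.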
This is an important metric for sensor data processing, where the fraction of lost data must be constrained.

Irrespective of four-slot or FIFO-based communication, all VCPUs serving all tuned pipes in a pipeline must satisfy the system scheduling requirements. For n tuned pipes scheduled using rate-monotonic scheduling, the scheduling constraint is satisfied if Σ_{i=1}^{n} C_i/T_i ≤ n · (2^{1/n} − 1). If earliest-deadline-first scheduling is used, the scheduling constraint is satisfied if Σ_{i=1}^{n} C_i/T_i ≤ 1 on a single PCPU. Boomerang applies these constraints, including utilization bounds on I/O VCPUs used by device pipes, to ensure pipeline schedulability. This holds for pipelines encompassing our RTOS and Linux SCHED_DEADLINE tasks.

IV. EVALUATION
We evaluated Boomerang on an Up Squared single-board computer (SBC), featuring an Intel Pentium N4200 processor, as shown in Figure 8. We connected a five-channel Kvaser USBCan Pro 5xHS CAN bus interface via USB 3.0, to emulate an automotive system.

Figure 8: Boomerang experimental setup.

Traffic on CAN channels 1-3 (CAN1-3) was produced by Woodward MotoHawk ECM5634-70 ECUs, as used in chassis and powertrain applications in a real vehicle. Each of these channels produced data at 20%, 30% and 40% of their 500kbps bandwidths, respectively. Channels 4 and 5 (CAN4-5) were replaced with Arduino UNOs [24] equipped with CAN shields, to collect performance data.

Two separate pipelines were constructed for CAN4 and CAN5, with thread budgets and periods shown in Table I. These pipelines shared three device I/O threads: mhydra_rx and mhydra_tx for Kvaser USBCan scatter-gather functionality, and a USB xHCI bottom half handler (USB_BH).

Pipeline 1 (labeled 1 in Figure 8) consisted of three task pipes: CanRead, ProcData and CanWrite, to read, process, and write CAN data, respectively. All tasks ran in the RTOS except ProcData (τ4), which executed in Linux and represented a task requiring capabilities unavailable in the RTOS. Pipeline 1 extended from the RTOS into Linux via a secure shared memory channel using extended page table mappings. Pipeline 2 (whose I/O path is shown with a dashed line and labeled 2) consisted of two task pipes that both ran in the RTOS. These tasks were RTFusion and RTControl, for sensor data fusion and control.
Pipeline 1 (CAN4: τ1 → τ2 → τ3 → τ4 → τ5 → τ6 → τ7)
Thread            Budget (ms)  Period (ms)  Utilization (%)  Core
USB_BH (τ1)       0.1          1            10               0
mhydra_rx (τ2)    0.2          1            20               0
CanRead (τ3)      0.1          2            5                0
ProcData (τ4)     0.2          2            10               1
CanWrite (τ5)     0.1          2            5                0
mhydra_tx (τ6)    0.2          1            20               0
USB_BH (τ7)       0.1          1            10               0

Pipeline 2 (CAN5: τ1 → τ2 → τ8 → τ9 → τ6 → τ7)
USB_BH (τ1)       0.1          1            10               0
mhydra_rx (τ2)    0.2          1            20               0
RTFusion (τ8)     0.1          2            5                0
RTControl (τ9)    0.1          2            5                0
mhydra_tx (τ6)    0.2          1            20               0
USB_BH (τ7)       0.1          1            10               0

Background × 11   –            –            57               1

Table I: Pipeline task parameters.

xHCI device interrupts were mapped to Core 0, while all other device interrupts were redirected to Core 1. 11 background tasks running on Core 1 generated disk and network I/O activity. These included five wget tasks that each retrieved a copy of a 1.9GB binary image over the Internet. Five other tasks performed file copies of a local version of the binary image to different directories. A periodic task additionally consumed 20% of the CPU time to bring the total utilization on Core 1 (including ProcData) to 67%.

Given the above setup, we compared Boomerang to tuned pipes implemented in a standalone Linux SMP system. The standalone system did not have the support of Quest, instead mapping all tasks in Table I to the specified cores of the same OS. Yocto Linux (Pyro release), featuring kernel 4.9.99 with the PREEMPT_RT patch, was used in both the standalone system and the Boomerang Linux guest. With Linux SMP, all threads were assigned budgets and periods within the SCHED_DEADLINE scheduling class except the USB_BH bottom half handler.

All experiments featuring Boomerang and Linux SMP were run for 30s, averaged over 10 runs each. End-to-end delay results are limited to the first 200 packet transmissions, due to space. In all results, similar behavior was observed for more extensive runs.
A. Asynchronous Communication
Asynchronous communication has the potential to suffer information loss. We constructed two experiments with expected pipeline losses of 0% and 20%. In both cases, packets for Pipelines 1 and 2 arrived and departed on the CAN4 and CAN5 channels, respectively. We measured the round-trip time for each packet to be read from and written to each of these channels. From Table I (Period column), the expected end-to-end delay for Pipeline 1 is 10ms, and for Pipeline 2 is 8ms.
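As a quick sanity check, these bounds are simply the sums of the Table I periods along each pipeline:

```python
# Periods (ms) along each pipeline, taken from Table I.
pipeline1_periods = [1, 1, 2, 2, 2, 1, 1]  # USB_BH, mhydra_rx, CanRead,
                                           # ProcData, CanWrite, mhydra_tx, USB_BH
pipeline2_periods = [1, 1, 2, 2, 1, 1]     # USB_BH, mhydra_rx, RTFusion,
                                           # RTControl, mhydra_tx, USB_BH

# The worst-case end-to-end delay is the sum of periods along the path.
assert sum(pipeline1_periods) == 10   # ms
assert sum(pipeline2_periods) == 8    # ms
```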
1) End-to-end Delay:
Figures 9a and 9b show the end-to-end delay for Pipelines 1 and 2 when there is no expected loss. The horizontal lines represent the expected latency, as calculated from the sum of the task periods. The end-to-end latency for Boomerang is always less than the theoretically calculated bound. However, Linux SMP frequently fails to meet the end-to-end delay requirements. The main reason is the priority mismatch between bottom-half handlers and the tasks awaiting I/O operations. Our RTOS ensures that bottom-half handlers run at the correct priority with a specific CPU reservation. Therefore, Boomerang achieves temporal isolation between tasks and interrupts.
Figure 9: End-to-end delay with no expected loss. (a) Pipeline 1; (b) Pipeline 2.

As Linux is unable to achieve the same level of timing guarantees, even when tasks are guaranteed CPU reservations, there are some lost packets, as observed by the missing data points in Figures 9a and 9b. Table II summarizes the end-to-end latency results. It also shows that Linux suffers packet losses of 28% and 56% for Pipelines 1 and 2, respectively.
Pipeline 1 (Delay bound = 10 ms)
System      Min (ms)  Max (ms)  Avg (ms)  Loss (%)
Boomerang   0.79      9.57      3.27      0
Linux SMP   2.1       31.5      11.70     28

Pipeline 2 (Delay bound = 8 ms)
Boomerang   0.92      7.97      4.35      0
Linux SMP   1.8       24.77     6.79      56

Table II: Latency - no expected loss.

Boomerang experienced a total of 20623 interrupts compared to 16693 for Linux SMP during these experiments. Linux has fewer overall interrupts but more on Core 0. We conjecture this is caused by local APIC timer interrupts, which are influenced by the budget management of SCHED_DEADLINE tasks. However, this requires further investigation. Notwithstanding, Linux SMP fails to meet end-to-end delay guarantees because of its unpredictability in interrupt handling.
2) Loss:
Sensor data processing is often tolerant of lost samples. We increased the periods of certain pipeline tasks, as shown in Table III, to allow up to 20% lost data. The expected latency for Pipeline 1 is now changed from 10ms to 11ms due to the increased periods of ProcData and CanWrite. Similarly, the expected latency of Pipeline 2 is changed from 8ms to 8.5ms due to the increased periodicity of RTControl.

Task        Pipeline Loss (%)  Budget (ms)  Period (ms)
ProcData    0                  0.2          2
            20                 0.2          2.5
CanWrite    0                  0.1          2
            20                 0.1          2.5
RTControl   0                  0.1          2
            20                 0.1          2.5
Table III: Task parameters for different loss constraints.

Figures 10a and 10b show the performance of Boomerang versus Linux SMP. Once again, both pipelines transfer data within their end-to-end delay bounds with Boomerang, but not with Linux SMP. The packet latency for Pipeline 2 is, on average, worse for Linux SMP in Figure 10b compared to Figure 9b. This is because the RTControl task might not receive a packet until a later period due to lost transfers. The task period itself is also larger, which increases the likelihood of packet loss.

Boomerang keeps the loss ratio within 20%, as observed in Table IV. However, Linux SMP loses 50-55% of the 200 packets sent across each pipeline.
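The 20% figure follows directly from the four-slot loss bound 1 − T_p/T_c: raising a consumer's period to 2.5ms against a 2ms producer yields exactly that loss rate. A minimal sketch (the helper name is ours, not the paper's):

```python
from math import isclose

def four_slot_loss_rate(T_p, T_c):
    """Upper bound on the fraction of messages lost at a four-slot
    stage when the consumer period T_c exceeds the producer period T_p.
    A consumer at least as fast as its producer (T_c <= T_p) loses nothing."""
    return max(0.0, 1.0 - T_p / T_c)

# Table III raises ProcData's period from 2ms to 2.5ms while its
# producer (CanRead) keeps a 2ms period:
assert isclose(four_slot_loss_rate(2.0, 2.5), 0.2)   # 20% allowed loss
assert four_slot_loss_rate(2.0, 2.0) == 0.0          # rate-matched: no loss
```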
B. ACRN Partitioning Hypervisor
Figure 10: End-to-end delay with 20% allowed loss. (a) Pipeline 1; (b) Pipeline 2.

Pipeline 1 (Delay bound = 11 ms)
System      Min (ms)  Max (ms)  Avg (ms)  Loss (%)
Boomerang   0.64      10.96     4.87      3.5
Linux SMP   2.24      98.21     14.46     55

Pipeline 2 (Delay bound = 8.5 ms)
Boomerang   0.64      2.38      1.07      0
Linux SMP   3.49      96.02     13.91     50

Table IV: Latency - 20% allowed loss.

The experiments in Section IV-A were repeated with an implementation of tuned pipes in the ACRN partitioning hypervisor. ACRN has similarities to Jailhouse [11], but is already ported to the Up Squared board and is targeted at the same applications as Boomerang. ACRN specifically supports safety-critical applications such as Automotive SDC (Software Defined Cockpit) and IVE (In-Vehicle Experience), similar to Boomerang [25]. It supports partitioning of CPU cores, memory, and I/O devices among one Service OS (SOS) and multiple User OSes (UOS). The SOS provides backend device drivers and bootstraps UOSes.

Figure 11: Inter-sandbox communication in ACRN.

The ACRN tuned pipe implementation uses a virtual network bridge and tap devices for inter-sandbox communication. Figure 11 depicts how data is exchanged between a UOS and the SOS, using shared memory ring buffers mapped to both VMs. A UOS request passes through a TCP stack and virtual device driver, causing a VMExit. Then the hypervisor notifies the SOS about the new message. Virtio services within the SOS deliver the message to the appropriate backend device driver, where it passes through the TCP stack and into user space. Although capable of mimicking network communication between two guests, this approach incurs far more timing unpredictability compared to Boomerang's dedicated shared memory communication channels. Data exchanges between tuned pipes in ACRN incur VMExits and, hence, control flow transitions via the ACRN hypervisor that are avoided with Boomerang. The consequence of this is shown in Figures 12a-12d, where ACRN is compared with Linux SMP. Linux SMP has already been shown to be less predictable than Boomerang in Section IV-A.

In these experiments we used a PREEMPT_RT-patched ClearOS Linux based on kernel version 4.19.73 for both the SOS and UOS, as recommended by the ACRN developers. Both ACRN and Linux SMP had the same mapping of threads to cores, as shown for Boomerang in Table I. ACRN additionally partitioned tasks and resources in the same way as Boomerang in Figure 8, except the SOS replaced Boomerang's Quest RTOS, and the UOS featured ClearOS Linux instead of Yocto Linux.
We intentionally did not port the Quest RTOS to ACRN, as doing so would leave little difference between the solutions provided by Boomerang and ACRN beyond the implementation of the hypervisor and the inter-sandbox communication method. As Boomerang already outperforms Linux SMP, it follows that ACRN's lack of timing predictability makes it inferior for end-to-end communication guarantees.
C. Synchronous Communication
We repeated the experiments with Pipelines 1 and 2 using FIFO buffering. The constraint solver described in Section III-E is used to establish correct budgets, periods and buffer sizes when pipelines are constructed. The updated budgets and periods are presented in Table V. Buffer sizes are 4, 2 and 4 messages, respectively, between CanRead and ProcData, ProcData and CanWrite, and RTFusion and RTControl.

Pipeline 1 (CAN4)
Task        Budget (ms)  Period (ms)  Utilization (%)
CanRead     0.1          2            5
ProcData    0.2          4            5
CanWrite    0.2          4            5

Pipeline 2 (CAN5)
RTFusion    0.1          2            5
RTControl   0.125        2.5          5

Table V: Synchronous pipeline (common threads not shown).

1) Throughput and Delay:
The expected end-to-end delay of Pipeline 1 is increased to 14ms because of the increased periods of the tpipe threads. Figures 13a and 13b show the revised end-to-end delays. Measurements are summarized in Table VI. FIFO buffering does not improve the latency for Linux SMP, because of the previously mentioned issues with interrupts. However, it reduces the packet loss for Linux SMP, as a buffer holds messages even if a tpipe thread is interrupted.

Table VII shows that the throughput with Boomerang and Linux SMP is similar, although the standard deviation is smaller with Boomerang. Arrival rates (λ) from CAN4 and CAN5 are shown for each pipeline.
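The 14ms bound is again the sum of periods along the path, combining the Table V task-pipe periods with the unchanged device-thread periods from Table I:

```python
# Pipeline 1 periods (ms) after the solver's adjustments (Table V),
# plus the unchanged device-thread periods from Table I:
device_threads = [1, 1, 1, 1]   # USB_BH, mhydra_rx, mhydra_tx, USB_BH
task_pipes = [2, 4, 4]          # CanRead, ProcData, CanWrite

# End-to-end delay bound = sum of all periods along the path.
assert sum(device_threads) + sum(task_pipes) == 14   # ms
```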
Figure 12: ACRN versus Linux SMP for asynchronous communication. (a) Pipeline 1 - no expected loss; (b) Pipeline 2 - no expected loss; (c) Pipeline 1 - 20% allowed loss; (d) Pipeline 2 - 20% allowed loss.
Figure 13: FIFO buffered synchronous communication. (a) Pipeline 1; (b) Pipeline 2.
Pipeline 1 (Delay bound = 14 ms)
System      Min (ms)  Max (ms)  Avg (ms)  Loss (%)
Boomerang   0.77      11.23     4.25      0
Linux SMP   0.96      65.24     33.10     0.5

Pipeline 2 (Delay bound = 8.5 ms)
Boomerang   0.70      5.03      1.56      0
Linux SMP   0.70      38.46     12.84     0

Table VI: Latency - FIFO buffering.

Pipeline 1 (λ = 100 msgs/s)
System      Min (msg/s)  Max (msg/s)  Avg (msg/s)  StdDev
Boomerang   99           101          99.77        0.63
Linux SMP   86           105          98.1         4.39

Pipeline 2 (λ = 125 msgs/s)
Boomerang   123          126          124.77       0.73
Linux SMP   120          126          123.17       1.39

Table VII: Synchronous throughput.
D. MIMO Pipelines
Boomerang supports the construction of pipelines with multiple inputs and outputs (MIMO). We constructed a pipeline based on Figure 6, representative of automotive tasks where multiple sensor inputs are combined to control more than one actuator. Using the labeling in that figure, tuned pipes A-F have the following (budget, period) pairs in milliseconds: A (0.1, 1), B (0.2, 2), C (0.1, 1), D (0.4, 2), E (0.1, 1) and F (0.1, 1). A reads input from CAN4 while C reads input from CAN5. Similarly, E writes back to CAN4, and F writes to CAN5. The CAN4 path traverses ABDE, while the CAN5 path traverses CDF. Tuned pipe D is shared by both paths; it runs in Linux while all other tuned pipes operate in the RTOS.

Tables VIIIa and VIIIb summarize the latencies and throughput, while Figure 14 shows the end-to-end delay.

(a) Latency
Path                            Min (ms)  Max (ms)  Avg (ms)  StdDev  Loss (%)
CAN4 path (Delay bound=10 ms)   0.86      9.63      2.56      1.32    0
CAN5 path (Delay bound=8 ms)    0.70      5.00      2.11      0.86    0

(b) Throughput
Path                        Min (msg/s)  Max (msg/s)  Avg (msg/s)  StdDev
CAN4 path (λ=100 msgs/s)    99           101          99.74        0.58
CAN5 path (λ=125 msgs/s)    124          126          124.74       0.58

Table VIII: MIMO pipelines in Boomerang.

The delay bounds of the two paths are 10ms and 8ms, accounting for 4ms worst-case delay from the mhydra_rx/tx and USB_BH threads, using the parameters shown in Table I. Even with multiple device inputs and outputs, both paths through CAN4 and CAN5 transfer data within their expected bounds.
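Both bounds follow from summing the tuned pipe periods along each path and adding the stated 4ms device-thread delay:

```python
# Periods (ms) for tuned pipes A-F from the (budget, period) pairs above,
# plus 4ms worst-case delay attributed to the shared device threads.
periods = {"A": 1, "B": 2, "C": 1, "D": 2, "E": 1, "F": 1}
device_delay = 4  # mhydra_rx/tx and USB_BH threads, from Table I

can4_bound = sum(periods[p] for p in "ABDE") + device_delay
can5_bound = sum(periods[p] for p in "CDF") + device_delay
assert (can4_bound, can5_bound) == (10, 8)   # ms
```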
Figure 14: MIMO pipeline delay guarantees.

V. RELATED WORK
A. Operating Systems
Mercer et al. implemented processor capacity reserves in the Mach micro-kernel [12], to provide tasks with budgets and periods. Steere et al. used a reservation-based scheme along with a feedback-based controller to adjust CPU allocations among tasks [26]. Linux supports reservation-based scheduling using the PREEMPT_RT patch [27] and SCHED_DEADLINE [28] task execution managed by a Constant Bandwidth Server [23]. LITMUS^RT [29] is a Linux-based system that supports configurable real-time schedulers, including those with reservations. Multiple RTOSes attempt to provide temporal isolation to tasks [30]-[32]. However, these systems do not properly handle events such as interrupts, which may interfere with the timing requirements of real-time tasks.

RT-Linux virtualizes interrupts for non-time-critical parts of the system, thereby ensuring real-time service to time-critical tasks [33]. Similar approaches have been adopted by Wind River Linux [34], the Real-time Application Interface (RTAI) for Linux [35], Xenomai [36], and NASA's cFS Linux [37]. Zhang et al. integrated interrupt handling with task scheduling in Linux. A bottom half handler for a device interrupt inherited the highest priority of a blocked process waiting on the device [9]. However, interrupt handling was not limited to a CPU reservation, meaning a burst of interrupts could still interfere with tasks.

Many real-time OSes provide a single address space, multi-threaded solution for multicore machines [38]-[40]. However, this is insufficient for many safety-critical domains, which require both temporal and spatial isolation between components of different criticality levels. The Quest RTOS [8] not only supports multiple address spaces, but also provides a Priority-Inherited Bandwidth-preserving Server approach to serve interrupts in a timely manner along with CPU-bound tasks. While Quest provides timing isolation for both I/O- and CPU-bound tasks, it does not support the richness of services found in a legacy system such as Linux.

B. Hypervisors
Several hypervisors attempt to support both temporal and spatial isolation of guests [41]-[44]. RT-Xen [45] adds real-time scheduling support to the Xen [46] hypervisor. However, all these hypervisors multiplex their guests on a shared physical machine. They virtualize interrupts, and perform additional resource management operations that conflict with the policies within each guest.

Partitioning hypervisors allow guests to directly manage subsets of machine resources. The Quest-V [10] separation kernel [47] uses a partitioning hypervisor to support the co-existence of the Quest RTOS and one or more general-purpose OSes. Each guest OS runs simultaneously on separate cores in a multicore machine, with device interrupts delivered directly to the guest that owns the device.

PikeOS [48] and Muen [49] are separation kernels that support multiple guest OSes. However, unlike Quest-V, interrupts are trapped into the hypervisor and subsequently delivered to the guest OSes. Jailhouse [11] and ACRN [50] have similarities to Quest-V. Jailhouse uses Linux to bootstrap a system that provides cells for system inmates. These are essentially restricted hardware subsets assigned to guests. ACRN's philosophy is to allow a service OS to manage machine resources on behalf of other safety-critical OSes. However, as with Jailhouse, there is currently no way to communicate between guests with end-to-end timing guarantees. Boomerang's partitioning hypervisor is modeled on the approach taken by Quest-V, but provides support for composable tuned pipes spanning multiple guests.
C. Predictable Communication
Boomerang's support for composable tuned pipes is inspired by Scout [51], which treats paths through a sequence of services as first-class schedulable entities. Path processing is entirely within the context of a single thread that is scheduled according to the bottleneck queue. Boomerang, in contrast, schedules each component of a pipeline with a separate time-budgeted thread. This allows paths to be interleaved and executed on different PCPUs, spanning different sandboxes.

RAD-FLOWS [52] provided a design framework for predictable data communication. Golchin et al. developed a system abstraction for predictable data delivery between USB devices and a real-time process [7]. Boomerang provides support for real-time I/O to span multiple tasks in different guest VMs.

VI. CONCLUSIONS AND FUTURE WORK
This paper presents Boomerang, an I/O system comprising real-time task pipelines in a partitioning hypervisor. Boomerang's partitioning hypervisor connects a built-in guest RTOS (Quest) with a legacy system such as Linux, via secure and predictable shared memory communication channels. The legacy OS benefits from timing-predictable services that are isolated from less critical code. At the same time, the RTOS benefits from the pre-existing services, including libraries and lower-criticality device drivers, of a legacy non-real-time system.

Boomerang supports composable tuned pipes for real-time task pipelines that require guaranteed end-to-end service on data transfers. The system provides real-time task pipelines with complementary legacy services that are timing predictable using CPU reservations.

Experiments show that real-time task pipelines guarantee end-to-end throughput, delay and loss requirements in Boomerang. This is the case both for pipelines contained within the RTOS and for those that span the RTOS and Linux. In contrast, task pipelines in a Linux-only system are not able to ensure end-to-end service constraints, even when using CPU reservations. This is because of task interference by interrupts from I/O devices. The interrupt handlers need to be assigned suitable CPU reservations at appropriate priorities that match the pipelined tasks they serve. Alternatively, if I/O processing is assigned to a dedicated core, it reduces system utilization. Finally, other partitioning hypervisors such as ACRN rely on heavyweight networking protocols and VMExits to perform inter-guest communication via shared memory, rendering them unsuitable for real-time data processing.

Future work will extend Boomerang's composable tuned pipes to span different physical machines. We see a programming model for real-time pipes as useful in data flow machines and stream processing applications, such as those in neuromorphic computing.

ACKNOWLEDGMENT

This work is supported in part by the National Science Foundation (NSF) under Grant
REFERENCES

[1] B. Leiner, M. Schlager, R. Obermaisser, and B. Huber, "A Comparison of Partitioning Operating Systems for Integrated Systems," in Proceedings of the 26th International Conference on Computer Safety, Reliability and Security (SAFECOMP), 2007.
[7] In Proceedings of the 39th IEEE Real-Time Systems Symposium (RTSS), Dec. 2018, pp. 196-207.
[8] M. Danish, Y. Li, and R. West, "Virtual-CPU Scheduling in the Quest Operating System," IEEE, 2011, pp. 169-179.
[9] Y. Zhang and R. West, "Process-Aware Interrupt Scheduling and Accounting," in Proceedings of the 27th IEEE International Real-Time Systems Symposium, ser. RTSS '06, IEEE Computer Society, 2006, pp. 191-201.
[10] R. West, Y. Li, E. Missimer, and M. Danish, "A Virtualized Separation Kernel for Mixed-Criticality Systems," ACM Transactions on Computer Systems, vol. 34, no. 3, pp. 8:1-8:41, Jun. 2016.
[11] R. Ramsauer, J. Kiszka, D. Lohmann, and W. Mauerer, "Look Mum, No VM Exits! (Almost)," in Proceedings of the 13th Workshop on Operating Systems Platforms for Embedded Real-time Applications (OSPERT), 2017.
[12] C. W. Mercer, S. Savage, and H. Tokuda, "Processor Capacity Reserves for Multimedia Operating Systems," in Proceedings of the IEEE International Conference on Multimedia Computing and Systems, 1994.
[13] J. Lehoczky, L. Sha, and Y. Ding, "The Rate Monotonic Scheduling Algorithm: Exact Characterization and Average Case Behavior," in Proceedings of the IEEE Real-Time Systems Symposium (RTSS), 1989.
[14] B. Sprunt, "Scheduling Sporadic and Aperiodic Events in a Hard Real-Time System," Software Engineering Institute, Carnegie Mellon University, Tech. Rep., 1989.
[15] C. L. Liu and J. W. Layland, "Scheduling Algorithms for Multiprogramming in a Hard Real-Time Environment," Journal of the ACM, vol. 20, no. 1, pp. 46-61, 1973.
[16] H. Simpson, "Four-slot Fully Asynchronous Communication Mechanism," IEE Proceedings - Computers and Digital Techniques, vol. 137, pp. 17-30, January 1990.
[17] A. Hamann, D. Dasari, S. Kramer, M. Pressler, and F. Wurst, "Communication Centric Design in Complex Automotive Embedded Systems," in Proceedings of the 29th Euromicro Conference on Real-Time Systems, Dagstuhl, Germany, 2017.
[18] ISO, "ISO 26262-3: Road Vehicles - Functional Safety - Part 3: Concept Phase," 2011.
[19] N. Zeldovich, S. Boyd-Wickizer, E. Kohler, and D. Mazières, "Making Information Flow Explicit in HiStar," in OSDI '06: Proceedings of the USENIX Symposium on Operating Systems Design and Implementation, 2006, pp. 263-278.
[20] P. Efstathopoulos, M. Krohn, S. VanDeBogart, C. Frey, D. Ziegler, E. Kohler, D. Mazières, F. Kaashoek, and R. Morris, "Labels and Event Processes in the Asbestos Operating System," in SOSP '05: Proceedings of the Twentieth ACM Symposium on Operating Systems Principles, ACM Press, 2005, pp. 17-30.
[21] D. E. Bell and L. J. LaPadula, "Secure Computer System: Unified Exposition and Multics Interpretation," MITRE Corporation, Bedford, MA, Tech. Rep. ESD-TR-75-306, March 1976.
[22] J. Rushby, "Model Checking Simpson's Four-slot Fully Asynchronous Communication Mechanism," Computer Science Laboratory, SRI International, Tech. Rep., 2002.
[23] L. Abeni and G. Buttazzo, "Integrating Multimedia Applications in Hard Real-Time Systems," in Proceedings of the 19th IEEE Real-Time Systems Symposium, 1998, pp. 4-13.
[24] Arduino homepage, 2019, http://arduino.cc.
[25] S. Sinha, A. Golchin, C. Einstein, and R. West, "A Paravirtualized Android for Next Generation Interactive Automotive Systems," in Proceedings of the 21st International Workshop on Mobile Computing Systems and Applications (HotMobile 2020), Austin, Texas, USA, March 2020.
[26] D. C. Steere, A. Goel, J. Gruenberg, D. McNamee, C. Pu, and J. Walpole, "A Feedback-driven Proportion Allocator for Real-rate Scheduling," in Proceedings of the Third Symposium on Operating Systems Design and Implementation.
[29] "LITMUS^RT: A Testbed for Empirically Comparing Real-time Multiprocessor Schedulers," in ACM SIGOPS Operating Systems Review, vol. 9, no. 5, 1975, pp. 33-42.
[32] J. J. Labrosse, MicroC/OS-II: The Real-Time Kernel. CRC Press, 2002.
[33] V. Yodaiken, The RT Linux Manifesto.
Motion Control. Citeseer, 2009, pp. 263-272.
[45] S. Xi, J. Wilson, C. Lu, and C. Gill, "RT-Xen: Towards Real-time Hypervisor Scheduling in Xen," in Proceedings of the Ninth ACM International Conference on Embedded Software, ACM, 2011, pp. 39-48.
[46] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, "Xen and the Art of Virtualization," in ACM SIGOPS Operating Systems Review, vol. 37, no. 5, ACM, 2003, pp. 164-177.
[47] J. Rushby, "The Design and Verification of Secure Systems," in Eighth ACM Symposium on Operating System Principles (SOSP), Asilomar, CA, Dec. 1981, pp. 12-21 (ACM Operating Systems Review).
[49] University of Applied Sciences Rapperswil (HSR), Tech. Rep., 2013.
[50] Project ACRN, 2019, https://projectacrn.org/.
[51] D. Mosberger and L. L. Peterson, "Making Paths Explicit in the Scout Operating System," in Proceedings of the Second USENIX Symposium on Operating Systems Design and Implementation, ser. OSDI '96, 1996, pp. 153-167.
[52] R. Pineiro, K. Ioannidou, S. A. Brandt, and C. Maltzahn, "RAD-FLOWS: Buffering For Predictable Communication," in Proceedings of the 17th IEEE Real-Time and Embedded Technology and Applications Symposium, April 2011, pp. 23-33.