LINTS^RT: A Learning-driven Testbed for Intelligent Scheduling in Embedded Systems
Zelun Kong, Yaswanth Yadlapalli, Soroush Bateni, Junfeng Guo, Cong Liu
Department of Computer Science, The University of Texas at Dallas
Abstract—Due to the increasing complexity seen in both workloads and hardware resources in state-of-the-art embedded systems, developing efficient real-time schedulers and the corresponding schedulability tests becomes rather challenging. Although close to optimal schedulability performance can be achieved for supporting simple system models in practice, adding any small complexity element into the problem context, such as non-preemption or resource heterogeneity, would cause significant pessimism, which may not be eliminated by any existing scheduling technique. In this paper, we present LINTS^RT, a learning-based testbed for intelligent real-time scheduling, which has the potential to handle various complexities seen in practice. The design of LINTS^RT is fundamentally motivated by AlphaGo Zero for playing the board game Go, and specifically addresses several critical challenges due to the real-time scheduling context. We first present a clean design of LINTS^RT supporting the basic case, scheduling sporadic workloads on a homogeneous multiprocessor, and then demonstrate how to easily extend the framework to handle further complexities such as non-preemption and resource heterogeneity. Both application- and OS-level implementation and evaluation demonstrate that LINTS^RT is able to achieve significantly higher runtime schedulability under different settings compared to perhaps the most commonly applied schedulers, global EDF and RM. To our knowledge, this work is the first attempt to design and implement an extensible learning-based testbed for autonomously making real-time scheduling decisions.

I. INTRODUCTION

One of the core areas within the embedded and real-time systems research community is to design efficient and practical rule-based scheduling algorithms (e.g., EDF) for workloads characterized by various formal task models. For earlier task and system models, rather elegant scheduling algorithms along with their corresponding schedulability tests have been proposed, e.g., the optimality of EDF on a uniprocessor [1]. Unfortunately, it becomes increasingly hard to design such scheduling tools due to the increasing complexities seen in both workloads and hardware resources in state-of-the-art embedded systems. Any tiny piece of complexity in either workload (e.g., non-preemption or data dependency) or hardware (e.g., resource heterogeneity) would cause significant pessimism in using existing scheduling disciplines. For instance, the non-preemptive scheduling of sporadic task systems on a homogeneous multiprocessor has been shown to be hard [2], [3], [4], where using existing scheduling algorithms may result in unacceptable utilization loss under various scenarios. Adding multiple complexity elements into the real-time scheduling context would make most existing rule-based scheduler designs fail. For instance, the problem of non-preemptively scheduling sporadic tasks on a heterogeneous multiprocessor largely remains an open problem, despite only two straightforward complexity elements (i.e., non-preemption and resource heterogeneity) being considered.

This work is motivated by a collaboration with a prominent industrial partner in the field of wireless baseband networking systems. We were asked a challenging question: is it possible to design a real-time scheduling framework that is efficient in terms of achieving a low deadline miss ratio when handling workloads that may exhibit various complexity characteristics? (In many industrial systems including wireless baseband, it is often not possible, and thus not required, to guarantee stringent hard real-time correctness, i.e., that every single deadline must be met. Many such systems allow deadline misses; however, such misses shall be rather infrequent (e.g., < ). Note that different application domains have different performance criteria.)
Unfortunately, we realize that it is difficult (if not impossible) to design such a scheduling solution containing a set of fixed rules or heuristics that may cover various system settings exhibiting different workload and hardware characteristics. Indeed, many of these real-time scheduling problems are NP-hard, and coming up with a general base framework using carefully chosen rule-based approaches for these problems may not be feasible.

An alternative for solving traditional NP-hard problems is machine learning. In particular, reinforcement learning (RL) has been applied to many such problems, with a recent famous example being the board game Go. The state-of-the-art RL-driven solution is called AlphaGo Zero [5], [6], which uses a Monte Carlo tree search to find moves based on previously "learned" moves.

Driven by the issues raised above, we present LINTS^RT, a learning-based testbed for intelligent real-time scheduling. LINTS^RT contains three major components: Simulator, Deep Neural Network (DNN), and Monte Carlo Tree Search (MCTS). During the training phase, given any training task set, the simulator simulates a sporadic multiprocessor system. By looking at the current state of the simulator, the DNN determines the probabilities of future actions. Then MCTS randomly selects a finite set of actions guided by the action probabilities from the DNN. The simulator then simulates by applying the actions from MCTS. This procedure is repeated until we reach an end state, upon which a reward is calculated. This reward is sent as feedback to the DNN. The DNN associates the reward with all the intermediate states and actions between the start and end states. This process is then repeated for all the task sets in the training data. The trained DNN from this procedure is used as a dynamic priority scheduler in the decision phase. Hence, the structural design and implementation of the DNN have to be considered carefully due to the tradeoff between expressibility and overhead, as running a DNN to make runtime decisions could cause an unacceptable latency penalty. This three-component system can be easily extended to take into consideration additional constraints, as shown in Sec. III-F.

We have implemented LINTS^RT both as an application-level simulator and as a scheduler plugin within a real-time OS, i.e., LITMUS^RT [7], [8]. Through extensive evaluation, LINTS^RT is shown to have the following properties:
• Efficacy. We compare the schedulability of various task sets under LINTS^RT in simulated environments considering practical scheduling-induced overheads, as well as in LITMUS^RT. We observe significant improvements w.r.t. runtime schedulability under both settings in comparison to GEDF and RM, for scheduling real-time sporadic task sets considering complexities expressed via non-preemption and resource heterogeneity.
• Extensibility. We demonstrate how to easily extend the basic framework of LINTS^RT, which supports sporadic tasks scheduled on a homogeneous multiprocessor in a preemptive manner, to incorporate non-preemption and/or resource heterogeneity, which represent practical yet difficult characteristics of workloads and hardware, respectively. Through this demonstration of extensibility, we aim to encourage other researchers from both academia and industry to leverage and extend LINTS^RT in addressing their specific scheduling problems.
• Practicality. Our implementation of LINTS^RT is shown to be practical. Due to an optimized structure and a GPU-based implementation of the DNN component, LINTS^RT incurs rather low latency overhead when making runtime scheduling decisions.

To our knowledge, this work represents the first attempt to design and implement an extensible learning-based framework for autonomously making real-time scheduling decisions. A few limitations of our focused settings are worth noting. First, while we believe that the design of LINTS^RT can apply to other scheduling disciplines, LINTS^RT focuses on global scheduling, which has been shown to be superior under many common settings [9], [7]. In future work, we will extend LINTS^RT to consider other disciplines. Similarly, in creating LINTS^RT, producing a fully-featured testbed covering all complexities seen in practice would simply not be feasible at present. Rather, our goal was to produce an easily extensible platform which could be further leveraged and extended by other researchers. For this reason, our design and implementation have focused on independent sporadic tasks. We leave issues such as support for I/O, shared resources, and other workload characteristics such as dependency and parallelism as future work.

Fig. 1: Reinforcement learning.

II. BACKGROUND
A. Real-time Scheduling
A rich set of real-time scheduling algorithms and their corresponding schedulability tests have been proposed in the literature, including a large body of work focusing on scheduling sporadic real-time tasks on a homogeneous multiprocessor. As of recently, this classical scheduling problem is deemed to be practically resolved, as a heuristic-based algorithm has been shown to achieve rather high runtime schedulability considering real system overheads [10]. Besides this basic system model (i.e., sporadic workloads preemptively scheduled on a homogeneous multiprocessor), extensive efforts have been made to handle increasing complexities seen in the task model, such as non-preemptive execution [2], [3], [4], precedence constraints [11], [12], [13], [14], self-suspensions [15], [16], and mixed-criticality [17], [18], [19], [20], as well as those seen in the hardware platform, such as heterogeneous multiprocessors [21], [22], computing accelerators [23], [24], [25], and shared resources [26], [27], [28]. Unfortunately, although promising progress has been made in dealing with various complexities, many such complexities yield rather pessimistic results. For instance, the non-preemptive scheduling of sporadic task systems on a homogeneous multiprocessor, which merely adds the complexity of non-preemptive execution into the scheduling context, remains largely an open problem since the state-of-the-art schedulability tests [2], [3], [4] may cause significant utilization loss under many settings. When facing a system environment where multiple such complexities are present, the real-time scheduling problem may become significantly more difficult. For instance, no efficient scheduling technique is known to be able to non-preemptively schedule even simple sporadic workloads on a heterogeneous multiprocessor where cores have different speeds.

This observed difficulty in designing efficient rule-based real-time scheduling algorithms motivates us to develop LINTS^RT. LINTS^RT focuses on global scheduling, which has been shown to be effective and superior under several common settings compared to other disciplines such as partitioned and clustered scheduling [29].

B. Machine Learning Techniques
Reinforcement learning (RL) is a branch of machine learning which is particularly good at training an intelligent agent that can act autonomously in a specific environment. The trained autonomous agent is supposed to take proper actions through interaction with the environment and improve the "goodness" of the system state over time.

Fig. 2: Overview of LINTS^RT, with the training phase and runtime scheduling phase illustrated on the left and right subfigures, respectively.
Figure 1 illustrates the structure of a reinforcement learning system. Reinforcement learning systems are composed of four elements:

• State. The system state of our scheduling problem, denoted as S. In our scheduling problem, the job queue and processor information (heterogeneity) constitute S.

• Actions. Given the current state, only a subset of all actions is allowed. For example, in our scheduling problem, only jobs that are already in the job waiting list or the job running list may be selected to run in the next time slot. Actions are denoted as a.

• Reward r. After the agent takes some actions, the reinforcement learning system needs to interact with the environment and evaluate the utility of the action. The utility values are used to generate a reward that helps the reinforcement learning system train the intelligent agent. For example, scheduling actions that make some jobs miss their deadlines will have a low reward, whereas scheduling actions that achieve the opposite will have a higher reward.

• Policy. A mapping from states to actions, denoted as π : S → A. It is the only output of the reinforcement learning system. Note that a policy could be a table, a function, or even a neural network.

Detailed state, action, and reward definitions for our system are given in Section III-C. RL enables the system to keep continuously tweaking its learned hypothesis as the system progresses through its computation. For our scheduling system, we define the state to be the current job queue and the actions as selecting a job for execution. However, setting an immediate reward for every action the scheduler takes is challenging because it is not immediately apparent how granular scheduling decisions affect the overall schedule. Coincidentally, learning systems for board games such as Go run into a similar problem, where the reward for every move is unclear, as it is not immediately apparent how a move affects the outcome of the game. AlphaGo Zero, a state-of-the-art RL algorithm for playing Go, inspires our approach.

III. DESIGN OF LINTS^RT

LINTS^RT utilizes an optimization search algorithm which originates from AlphaGo Zero. However, there is a set of inherent differences between the environment of real-time scheduling and the board game Go, which makes it non-trivial to leverage the fundamental reinforcement learning idea.
Latency sensitivity.
A major difference regards latency sensitivity: runtime actions in playing the board game Go do not need to be real-time, whereas real-time scheduling decisions have to be made in a stringent real-time fashion. This poses a key challenge for leveraging any reinforcement learning technique (particularly those using deep neural networks) in the scheduling problem context, because such a learning-based decision-making process often incurs a non-trivial latency overhead, unlike fast heuristic-based schedulers such as EDF and RM. Since the game Go does not have such strict latency constraints, the DNN deployed in AlphaGo Zero utilizes ResNet [30], which may incur a runtime overhead of at least 100ms [30]. On the other hand, using a small but fast DNN may result in unsatisfactory accuracy, which may cause frequent deadline misses. We carefully explore the tradeoff between accuracy and latency when designing and implementing LINTS^RT. As discussed in Sec. III-E in detail, we develop a new DNN structure which yields high accuracy while incurring a rather small runtime overhead, as reported in the evaluation.

Adversary.
Board games are played against an opponent, so one can improve the performance of the algorithm through self-play, and it is straightforward to define a reward function because the win condition is clear. In contrast, there is no such adversary for real-time scheduling problems, which makes it challenging to define and compute rewards, as there is no intuitive win condition for sporadic task scheduling. To resolve this issue, we define the reward for an action as the longest possible time during which the sporadic task system does not miss a deadline after taking the action. To compute this reward, we implement an efficient scheduling simulator, which we describe in Sec. III-D.
Action predictability.
Predicting moves of a deterministic board game such as Go is vastly simpler than predicting a real-time system, because the system has hidden variables, such as the unpredictable job releases of sporadic tasks, that result in non-deterministic behavior. To leverage the reinforcement learning framework for the sporadic real-time task scheduling problem, it is thus required to obtain more comprehensive and reliable training data. Only well-distributed training data can ensure that the neural network learns the patterns of the sporadic tasks. This is because the job release time and execution time of sporadic tasks are usually subject to random distributions. Therefore, if the training data is not comprehensive enough, the distribution cannot be learned properly, leading to improper scheduling decisions. This is illustrated in Fig. 4 and discussed in detail in Sec. III-E.
Operation duration.
Real-time scheduling, especially for sporadic tasks, is infinite in length, whereas any board game is finite. This infinity property can lead to error accumulation, i.e., an earlier improper scheduling decision is very likely to cause a series of deadline misses at later times, which is a very hard problem. A perfect solution to this challenge would be to guarantee that optimal scheduling decisions are made at every instant, which is clearly impossible. Our alternative solution is to design a DNN that outputs an evaluation score measuring the "optimality" of the current state. LINTS^RT then makes scheduling decisions that increase this score accordingly. Details are discussed in Sec. III-E4 and Sec. III-E3.

These differences make it challenging in many respects to develop a learning-based framework for making real-time scheduling decisions. To overcome these challenges, the design and implementation of LINTS^RT have been carefully optimized and tailored specifically to the problem context of real-time scheduling.

A. Overview
The overview of LINTS^RT is illustrated in Fig. 2, and has been generally described in Sec. I.

For the training phase, the input to LINTS^RT is a set of training task sets, and the output is a State-Action mapping. A task set from the input is given to the simulator, which then simulates the job releases and constructs the state as a matrix (also called a tensor) derived from the current job queue and processor information. This state contains all the information that a rule-based scheduler would require to make a scheduling decision. The Action is defined as a tensor that represents a job being selected for execution.

The reward for an action is defined as the maximum runtime achievable without missing a deadline. Through our design, this reward can be estimated by examining multiple executions of the Monte Carlo Tree Search simulation and finding scenarios that include that specific action.

However, the inherently stochastic nature of the MCTS makes it very inefficient. To remedy this problem, we use a Deep Neural Network as a companion to the MCTS in order to reduce the randomness of the tree search. The DNN is trained to take the state tensor as an input and give a reasonable action as the output. This training should be possible because the state already contains all the necessary information to make a scheduling decision (e.g., a rule-based scheduler can make decisions based on the same state).

As we shall discuss in Sec. III-D, we use the DNN as a guide for the MCTS to limit the search to known "good" areas of the search space. This search is continued until the MCTS finds a feasible schedule for a task set (or an iteration threshold is reached). The output of this search is, in turn, used to train the DNN. This process is repeated for all the task sets in the input.

Both the DNN-guided MCTS simulations and the DNN by itself are different versions of the State-Action mapping. This means that a well-trained DNN can potentially be used alone to make runtime scheduling decisions. However, the simulation-based DNN-guided MCTS is much more accurate, as we shall discuss in Sec. III-E, albeit with a larger overhead. To explore this tradeoff, we design LINTS^RT under two configurations: a dynamic configuration, suitable for dynamic workloads, that only uses the DNN, and a static configuration that uses the entirety of the DNN-guided MCTS and is thus more suitable for static scenarios where workload parameters are known in advance, allowing LINTS^RT to generate a scheduling table offline and make runtime scheduling decisions accordingly. We evaluate both configurations in the evaluation.

B. System Model
We consider the problem of scheduling a set of $n$ sporadic tasks $\tau = \{\tau_1, \tau_2, \cdots, \tau_n\}$ on $m$ identical processors (homogeneous setting) or processors with varying speeds (heterogeneous setting), in a preemptive or non-preemptive manner. As discussed before, our current design scope focuses on global scheduling, where a job running on one processor can be migrated to another processor by the scheduler with certain overhead. Each task $\tau_i$ is specified by a vector $\vec{\tau}_i = (p_i, d_i, e_i)$, where $p_i$ denotes the minimum interval between two jobs of the task, $e_i$ denotes the worst-case execution time of a job, and $d_i$ is the relative deadline ($d_i = p_i$). Let $J_i^{(j)}$ denote the $j$-th instance of task $\tau_i$, which can also be described by a vector $\vec{J}_i^{(j)} = (R_i, E_i, D_i)$, where $R_i$ is the release time, $E_i$ is the actual execution time, and $D_i$ is the absolute deadline.

A time slot is considered to be the basic scheduling time unit. A job must execute during a whole time slot or not execute in that time slot at all. A job has two different states: waiting and running. Jobs with the same status are stored respectively in two different job lists, the job waiting list $L_w$ and the job running list $L_r$. Let $N_l = \|L_r\| + \|L_w\|$, where $\|L_r\|$ and $\|L_w\|$ are the lengths of $L_r$ and $L_w$, respectively. Note that these lists are disjoint, as a job can only be in one of the lists.
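To make the notation concrete, the following sketch renders the task and job vectors above as plain Python data classes; the class and field names are illustrative assumptions, not taken from the LINTS^RT codebase.

    from dataclasses import dataclass

    @dataclass
    class Task:
        """Sporadic task tau_i = (p_i, d_i, e_i) with implicit deadline d_i = p_i."""
        period: int   # p_i: minimum separation between consecutive job releases
        wcet: int     # e_i: worst-case execution time, in time slots

        @property
        def deadline(self) -> int:
            return self.period  # d_i = p_i

    @dataclass
    class Job:
        """The j-th instance J_i^(j) = (R_i, E_i, D_i) of a task."""
        release: int           # R_i: release time
        exec_time: int         # E_i: actual execution time (at most the task's WCET)
        deadline: int          # D_i: absolute deadline
        remaining: int         # execution still owed; the job completes at zero
        running: bool = False  # True if the job is in the running list L_r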
C. State, Action and Reward Representation

1) State: The system state at the start of the time slot $[t, t+1)$ with $i$ idle processors is represented as $\vec{s}_{t,i}$. The current status of the system can be represented as a tensor of size $N_l \times 3$, in which every job in $L_r$ and $L_w$ is represented by a row. The first column of this tensor is the relative deadline of the job, the second column is the remaining execution time of the job, and the third column is a boolean which is true if the job is executing in time slot $[t, t+1)$. For example, the following is a valid representation of the system's status:

$\begin{bmatrix} D_{i_1}^{(j_1)} - t & E_{i_1}^{(j_1)} - \mathrm{execution}(R_{i_1}^{(j_1)}, t) & 0/1 \\ D_{i_2}^{(j_2)} - t & E_{i_2}^{(j_2)} - \mathrm{execution}(R_{i_2}^{(j_2)}, t) & 0/1 \\ \vdots & \vdots & \vdots \\ D_{i_n}^{(j_n)} - t & E_{i_n}^{(j_n)} - \mathrm{execution}(R_{i_n}^{(j_n)}, t) & 0/1 \end{bmatrix}$

This tensor can be calculated from $L_r$ and $L_w$. Additionally, we keep the last $N_h$ statuses of the current tasks in the state (this improves the performance of the DNN while making the scheduling decisions described in Section III-E). Thus, we construct a state tensor at time point $t$ of size $N_l \times 3 \times N_h$.
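A minimal sketch of assembling this status matrix from $L_r$ and $L_w$, reusing the hypothetical Job class above (stacking the last $N_h$ such matrices then yields the full state tensor):

    import torch

    def status_tensor(t: int, running: list, waiting: list) -> torch.Tensor:
        """Build the N_l x 3 status matrix at the start of slot [t, t+1):
        column 0: relative deadline D - t; column 1: remaining execution;
        column 2: 1.0 if the job executes in [t, t+1), else 0.0."""
        rows = [[job.deadline - t, job.remaining, 1.0 if job.running else 0.0]
                for job in running + waiting]
        return torch.tensor(rows, dtype=torch.float32)  # shape: (N_l, 3)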
2) Action:
An action is defined as a boolean vector of length $N_l$, in which only one entry can be true:

$\vec{a}_{t,i} = (0, \cdots, 1, \cdots, 0)$   (1)

Here $t$ and $i$ have the same meaning as in the system state $\vec{s}_{t,i}$. One action corresponds to choosing one job from either $L_w$ or $L_r$ to execute in the next time slot. Therefore, for an $m$-processor system, given the current state, we need to take at most $m$ actions with $m-1$ intermediate states (i.e., we use $\{(\vec{s}_{t,m}, \vec{a}_{t,m}), (\vec{s}_{t,m-1}, \vec{a}_{t,m-1}), \cdots, (\vec{s}_{t,1}, \vec{a}_{t,1})\}$ to represent the jobs running in the next time slot). These intermediary states are constructed by removing the jobs that have been selected by the previous action.
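As a tiny illustration (the function name is hypothetical), the action vector of Equation (1) is a one-hot encoding over the $N_l$ queued jobs:

    import torch

    def action_vector(n_l: int, job_index: int) -> torch.Tensor:
        """One-hot action per Eq. 1: exactly one of the N_l entries is true."""
        a = torch.zeros(n_l, dtype=torch.bool)
        a[job_index] = True
        return a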
3) Reward:
The reward value computation method is inspired by AlphaGo Zero, which uses MCTS. MCTS is a simulation-based algorithm that uses a tree where each node represents a state and each edge represents a possible action to take. A policy $p$ is defined as the probability distribution over the set of actions given a state, $p(a \mid s)$. Additionally, each node in the tree is assigned a value $v$ to represent the goodness of a state. Initially, $p$ and $v$ are completely uniform. However, these distributions are updated by performing roll-outs. A roll-out is defined as simulating random actions (sampled from the probability distribution defined by $p$ and $v$) from the current node until the end condition of the system, which for us is either a deadline miss or the elapsed time exceeding the hyper-period of the task set.

For example, a roll-out from the initial state (the root of the tree) simulates the schedule from $\vec{s}_{0,m} \rightarrow \vec{s}_{t,m}$, where either $t > H(\tau)$ or, at $t$, one of the jobs misses its deadline. The outcome of a roll-out contributes to $v$ and $p$ of the nodes and edges, respectively, on the path of the roll-out. These updates make subsequent roll-outs explore the "good" areas of the search tree, as the random actions are sampled from the probability distributions.

For this approach to work, we need a scheduling simulator to perform the roll-outs; the simulator implementation is described in Sec. IV-A. We define the reward for an action as the maximum system runtime over all the roll-outs that go through this action. Note that our reward is always a positive number, and the RL framework is capable of identifying the best scheduling decision using the variation in the magnitude of the reward. However, the inherently stochastic nature of MCTS makes it very inefficient. It might take thousands of roll-outs until the algorithm starts to search the "good" areas. It would be beneficial if the tree search had a guide to help it find good states to expand.

D. Deep Neural Network guided MCTS
To guide the MCTS algorithm, we need a component which, given a state $\vec{s}_{t,i}$, produces a reasonable action to take. Since the state contains all the information that a rule-based scheduler requires to make a scheduling decision (for sporadic tasks on a homogeneous multiprocessor system), we should be able to train a Deep Neural Network (DNN) to do the same. Let $\theta$ be the parameters of DNN $f$; then the DNN can be written as $f_\theta(\vec{s}_{t,i}) = (\vec{p}, v)$ where:

• $\vec{p}$ is the policy, which represents the probability of each job being selected to run in the next time slot for the given system state $\vec{s}_{t,i}$.

• $v \in [-1, 1]$ is the evaluation value, which represents the "goodness" of state $\vec{s}_{t,i}$. A value of $1$ means that, in the foreseeable future, no job will miss its deadline. A value of $-1$ means that, in the next time slot, some job misses its deadline.
Fig. 3: Network architecture.

The DNN achieves the same functionality as the State-Action mapping, which is the output from RL. Therefore, we can use the overall RL model to train the DNN over time. With each iteration of the RL model, the DNN keeps getting better at guiding the MCTS. The details of the DNN's structure and training are discussed in the next section. For now, assume that we have a DNN that is semi-competent. Next, we describe how such a DNN can be used to guide the MCTS.

In the MCTS tree, each edge $(\vec{s}_{t,i}, \vec{a}_{t,i})$ needs to store three values: the prior probability $P(\vec{s}_{t,i}, \vec{a}_{t,i}) = \vec{p} \cdot \vec{a}_{t,i}$ where $f_\theta(\vec{s}_{t,i}) = (\vec{p}, v)$; the visit count $N(\vec{s}_{t,i}, \vec{a}_{t,i})$, which is equal to the number of times the edge has been visited in previous roll-outs; and the job selection value $Q(\vec{s}_{t,i}, \vec{a}_{t,i})$, which is initialized to zero. Each roll-out starts from the initial state $\vec{s}_{0,m}$, then iteratively selects the actions that maximize an upper confidence bound $Q(\vec{s}_{t,i}, \vec{a}_{t,i}) + U(\vec{s}_{t,i}, \vec{a}_{t,i})$, where

$U(\vec{s}_{t,i}, \vec{a}_{t,i}) \propto \dfrac{P(\vec{s}_{t,i}, \vec{a}_{t,i})}{1 + N(\vec{s}_{t,i}, \vec{a}_{t,i})}$.   (2)

The simulation stops when it encounters a leaf node $\vec{s}_{t',j}$ (i.e., when a job misses its deadline or $t' > H(\tau)$). For each edge $(\vec{s}_{t,i}, \vec{a}_{t,i})$ traversed in the roll-out, $N(\vec{s}_{t,i}, \vec{a}_{t,i})$ is incremented by 1 and:

$Q(\vec{s}_{t,i}, \vec{a}_{t,i}) = \dfrac{1}{N(\vec{s}_{t,i}, \vec{a}_{t,i})} \sum_{\vec{s} \in \{\vec{s}_{t,i} \rightarrow \vec{s}_{t',j}\}} V(\vec{s})$   (3)

where $\{\vec{s}_{t,i} \rightarrow \vec{s}_{t',j}\}$ denotes the set of states on the path from the current node to the leaf node, and $V(\vec{s}) = v$ where $f_\theta(\vec{s}) = (\vec{p}, v)$.
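The per-edge bookkeeping of Equations (2) and (3) can be sketched as follows; the exploration constant c_puct and the concrete data structures are assumptions for illustration, as the paper does not specify them.

    class Edge:
        """Statistics stored per tree edge (s, a)."""
        def __init__(self, prior: float):
            self.P = prior   # prior probability from the DNN policy head
            self.N = 0       # visit count
            self.Q = 0.0     # mean roll-out value through this edge (Eq. 3)

    def select_action(edges: dict, c_puct: float = 1.0):
        """Pick the action maximizing Q + U, with U proportional to
        P / (1 + N) per Eq. 2."""
        return max(edges, key=lambda a: edges[a].Q
                   + c_puct * edges[a].P / (1.0 + edges[a].N))

    def backup(path_edges: list, rollout_value: float):
        """After a roll-out, increment N on every traversed edge and fold the
        roll-out's value (the sum of V(s) along the path, per Eq. 3) into Q
        as an incremental mean."""
        for edge in path_edges:
            edge.N += 1
            edge.Q += (rollout_value - edge.Q) / edge.N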
E. DNN Structure and Training

1) Deep Neural Network Architecture: Figure 3 shows the architecture of the DNN used in LINTS^RT. The network is mainly composed of three modules: the input module, the policy block, and the value block. The input module is composed of 5 CNN blocks [31], with each block containing a convolutional layer with 64 filters, an $m \times 3$ kernel, and a stride of 1, followed by batch normalization and ReLU layers. The policy block is composed of 3 linear layers and uses softmax as the activation function, as each element in the policy lies in the range $[0, 1]$. The value block contains 2 linear layers and uses tanh as the activation function, as the evaluation value $v \in [-1, 1]$. Note that the structure of the DNN is purposefully kept lean compared to DNNs for other applications such as Go, to minimize the scheduling overhead.
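Under the stated parameters (5 convolutional blocks with 64 filters, m x 3 kernels, and stride 1; a 3-layer policy head with softmax; a 2-layer value head with tanh), a minimal PyTorch rendering might look as follows. Treating the N_h history slices as input channels, and the hidden widths L and L/2, are assumptions made for illustration.

    import torch
    import torch.nn as nn

    class SchedulerNet(nn.Module):
        """Policy/value network per Fig. 3: five conv blocks feeding a policy
        head (softmax over the N_l jobs) and a value head (tanh scalar)."""
        def __init__(self, n_l: int, n_h: int, m: int):
            super().__init__()
            blocks, in_ch = [], n_h            # history depth as input channels
            for _ in range(5):
                blocks += [nn.Conv2d(in_ch, 64, kernel_size=(m, 3),
                                     stride=1, padding="same"),
                           nn.BatchNorm2d(64), nn.ReLU()]
                in_ch = 64
            self.backbone = nn.Sequential(*blocks)
            flat = 64 * n_l * 3                # flattened feature width L
            self.policy = nn.Sequential(       # 3 linear layers + softmax
                nn.Linear(flat, flat // 2), nn.ReLU(),
                nn.Linear(flat // 2, flat // 2), nn.ReLU(),
                nn.Linear(flat // 2, n_l), nn.Softmax(dim=-1))
            self.value = nn.Sequential(        # 2 linear layers + tanh
                nn.Linear(flat, flat // 2), nn.ReLU(),
                nn.Linear(flat // 2, 1), nn.Tanh())

        def forward(self, s):                  # s: (batch, N_h, N_l, 3)
            h = self.backbone(s).flatten(1)
            return self.policy(h), self.value(h)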
2) Policy Iteration and Network Training:
Here, the DNN-guided MCTS may be seen as a trial-and-error algorithm which, given the DNN parameters $\theta$ and a root state $\vec{s}_{t,i}$, computes the State-Action mapping as a probability tensor $\vec{\pi}$. Note that $\vec{\pi}$ and $\vec{p}$ both represent the probability of a job being selected at state $\vec{s}_{t,i}$. However, as $\vec{\pi}$ is computed by simulating many roll-outs, it is more accurate and reliable than $\vec{p}$, especially when the DNN has not completed training.

For a state $\vec{s}_{t,i}$, we can compute $\vec{\pi}$ from the DNN-guided MCTS and then compute $z$ as the number of time slots from $t$ during which no job misses its deadline. The $(\vec{\pi}, z)$ pair can then be used as a training sample to update the parameters $\theta$ of the DNN, making the output job selection probabilities and evaluation value, $f_\theta(\vec{s}_{t,i}) = (\vec{p}_{t,i}, v)$, closer to the best job selection decisions.

Fig. 4 illustrates how MCTS computes the optimal State-Action mapping $\vec{\pi}$, starting from an initial policy:
• Policy evaluation. Use the simulator to obtain the value $v_{\vec{\pi}}$ of policy $\vec{\pi}$.
• Policy improvement. Optimize the policy according to the evaluation value $v_{\vec{\pi}}$.

• Repeat steps 1 and 2 until the optimal policy is found.

To keep the size of the input of the DNN constant, we zero-pad or truncate the state tensor $\vec{s}_t$ according to the absolute deadlines $D$.

The DNN is trained by the policy iteration training procedure for each job selection (note that all the job selections done in the MCTS are only simulated). The steps to train the neural network are:

• The DNN is initialized to random parameters, $\theta_0$.

• At each subsequent iteration, $k \geq 1$, do a scheduling simulation (shown in Fig. 4).

• For each iteration (at time point $t$) and for each system state $\vec{s}_{t,i}$, a Monte-Carlo Tree Search is executed using the previous iteration of the DNN, $f_{\theta_{k-1}}$. The job selection is done by maximizing the upper confidence bound $Q(\vec{s}_{t,i}, \vec{a}_{t,i}) + U(\vec{s}_{t,i}, \vec{a}_{t,i})$.

• A simulation stops/terminates at time point $t'$ when any job misses its deadline or $t' > H(\tau)$. The simulation is then scored (Equation 3) to calculate the reward $z_i$.

• The data for each system state is stored as $(\vec{s}_{t,i}, \vec{\pi}_{t,i}, z_{t,i})$, and the parameters of the DNN ($\theta_{k-1}$) are updated to $\theta_k$ using data uniformly sampled from all system states of the last iteration, with $(\vec{s}, \vec{\pi}, z)$ as the training samples.

The training objective for the DNN is to minimize the error between $v_{t,i}$ and $z_{t,i}$, and to maximize the similarity between $\vec{p}_{t,i}$ and $\vec{\pi}_{t,i}$. Thus, the loss function used by the gradient descent procedure to adjust the parameters $\theta$ of the DNN can be written as:

$(\vec{p}, v) = f_\theta(\vec{s})$, $\quad l = (z - v)^2 - \vec{\pi}^{T} \log \vec{p} + c\|\theta\|^2$   (4)

Fig. 4: (a) Policy iteration. The simulator performs the simulations from $\vec{s}_{0,m}$ to $\vec{s}_T$. For each system state $\vec{s}_{t,i}$, an MCTS is executed using the deep neural network with the latest parameters $\theta$. Job selections are determined through the probabilities $\pi$ computed by the MCTS. The simulation terminates when any job misses its deadline or a feasible schedule is found (the simulation lasts for a whole hyper-period). The terminal system state $\vec{s}_T$ is evaluated by Equation 3. (b) Neural network training. The neural network $f_\theta$ takes the system state $\vec{s}_{t,i}$ as its input and outputs both a probability tensor $\vec{p}_{t,i}$ and an evaluation value $v_{t,i}$. $\vec{p}_{t,i}$ represents the probability distribution over job selections, and $v_{t,i}$ reflects how well the system works. The parameters $\theta$ of the neural network are updated to minimize the error between $\vec{p}_{t,i}$ and the search probabilities $\vec{\pi}_{t,i}$, and to minimize the error between the output evaluation value $v_{t,i}$ and the actual evaluation value. The updated parameters are used in the next iteration.
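Equation (4) translates almost directly into a PyTorch loss; the sketch below assumes the hypothetical SchedulerNet from Sec. III-E1 and an arbitrary regularization constant c.

    import torch
    import torch.nn.functional as F

    def alphazero_loss(net, states, pi, z, c=1e-4):
        """Eq. 4: l = (z - v)^2 - pi^T log p + c * ||theta||^2.
        states: (B, N_h, N_l, 3); pi: (B, N_l) MCTS probabilities; z: (B,)."""
        p, v = net(states)
        value_loss = F.mse_loss(v.squeeze(-1), z)
        policy_loss = -(pi * torch.log(p + 1e-8)).sum(dim=-1).mean()
        l2 = c * sum((w ** 2).sum() for w in net.parameters())
        return value_loss + policy_loss + l2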
3) Decision Phase for Dynamic Configuration:
As discussed earlier, we use the DNN as a dynamic runtime scheduler. Given the current state $\vec{s}_{t,m}$ of the system, we get the most probable action by finding the index $i$ of the maximum entry of $\vec{p}$, where $f_\theta(\vec{s}_{t,m}) = (\vec{p}, v)$. We mark $J_i$ to be executed on processor 1 for the next time slot and update the state tensor to $\vec{s}_{t,m-1}$ by changing only the entries corresponding to $J_i$. Then, $\vec{s}_{t,m-1}$ is given as the input to the DNN, and the same procedure repeats until we reach $\vec{s}_{t,0}$ or some $\vec{s}_{t,i}$ which is empty. Jobs that have been marked for execution are sent to their corresponding processors by performing context switches, or migrations where required, and the system resumes execution for one time slot. After this time slot, we get the new state tensor $\vec{s}_{t+1,m}$, and the above procedure is repeated.
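This per-slot decision loop might be sketched as follows, where mark_job_running is a hypothetical helper that flips the execution flag of the chosen row and yields the next intermediate state:

    import torch

    def schedule_slot(net, state, m: int):
        """Greedily pick up to m jobs for slot [t, t+1), rebuilding the
        intermediate state after each pick as described above."""
        chosen = []
        for _proc in range(m):
            if state is None or state.numel() == 0:  # reached an empty state
                break
            p, _v = net(state.unsqueeze(0))          # add a batch dimension
            i = int(p.argmax(dim=-1))                # most probable job index
            chosen.append(i)
            # hypothetical helper: marks row i as executing and returns the
            # next intermediate state (s_{t,m-1}, s_{t,m-2}, ...)
            state = mark_job_running(state, i)
        return chosen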
4) Decision Phase for Static Configuration:
In this configuration, we are given a static task set, and RL training starts from $\vec{s}_{0,m}$. We use the DNN-guided MCTS to perform roll-outs. After one roll-out is complete, if we have found a feasible schedule, then we return that schedule. Otherwise, we compute the reward and train the DNN according to that reward. This procedure is repeated for subsequent roll-outs until either a roll-out manages to find a feasible schedule or the roll-out threshold is reached. If the roll-out threshold is reached, our algorithm has failed to find a feasible schedule. In that case, we can re-run the algorithm with a higher roll-out threshold.

F. Extensions
In addition to preemptive multiprocessor systems, LINTS^RT can be easily extended to handle other complexities such as non-preemptive systems and/or heterogeneous multiprocessors. The flexibility and extensibility of LINTS^RT are fundamentally due to the fact that it is a training-based approach. Thus, one only needs to modify the system state tensor when facing additional complexities in the models. The DNN can automatically learn to convert this new state to a reasonable action and, in turn, guide the MCTS to regions that conform to these new constraints.
1) Non-preemptive System:
For non-preemptive workloads, the system state tensor has the same format as the tensor in Sec. III-C1. Since we do not need to consider the running jobs for selection, the state tensor here is derived only from the job waiting list of the system, and the parameter $i$ starts from the number of idle processors.
2) Heterogeneous Multiprocessors:
For heterogeneous multiprocessors, $\vec{s}_{t,i}$ has the same format. However, since a job will have different execution times on different types of processors, the second column of $\vec{s}_{t,i}$ needs modification. For example, assume there are two types of processors in the system, where a computation that consumes 1 millisecond on a processor of type 1 needs 2 milliseconds on a processor of type 2, because a processor of type 1 is twice as fast. First, we define the execution of $J_i^{(j)}$ at time $t$ as follows:

$\mathrm{execution}(R_i^{(j)}, t) = \mathrm{execution}(R_i^{(j)}, t, 1) + \tfrac{1}{2}\,\mathrm{execution}(R_i^{(j)}, t, 2)$

where $\mathrm{execution}(R_i^{(j)}, t, k)$ denotes the time allocated on processors of type $k$ to $J_i^{(j)}$ in the window $[R_i^{(j)}, t)$. Thus, the system state tensor used for selecting a job to run on a type 1 processor is the same as the tensor for the homogeneous case, whereas the system state used for a type 2 processor is:

$\vec{s}^{(2)}_{t,i} = \begin{bmatrix} D_{i_1}^{(j_1)} - t & 2 \times [E_{i_1}^{(j_1)} - \mathrm{execution}(R_{i_1}^{(j_1)}, t)] & 0/1 \\ D_{i_2}^{(j_2)} - t & 2 \times [E_{i_2}^{(j_2)} - \mathrm{execution}(R_{i_2}^{(j_2)}, t)] & 0/1 \\ \vdots & \vdots & \vdots \\ D_{i_n}^{(j_n)} - t & 2 \times [E_{i_n}^{(j_n)} - \mathrm{execution}(R_{i_n}^{(j_n)}, t)] & 0/1 \end{bmatrix}$   (5)

We first select jobs to execute on type 1 processors and then construct the next intermediate state for type 2 processors for selecting jobs to execute on them. Note that the scheduling overhead is similar to the homogeneous case, as the number of intermediary states is the same, and the conversion between the states for type 1 and type 2 processors is trivial.
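Under this two-speed example, deriving the type 2 state from the homogeneous one only rescales the remaining-execution column, as in this sketch:

    import torch

    def state_for_type2(state_type1: torch.Tensor) -> torch.Tensor:
        """Per Eq. 5: on the half-speed processors the remaining execution
        time (column 1) doubles; deadlines and execution flags are unchanged."""
        s = state_type1.clone()
        s[:, 1] *= 2.0
        return s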
3) Scheduling-related Overhead Consideration:
Note that LINTS^RT takes the various kinds of overheads due to the scheduling process into account in the design space. Similar to the above-mentioned extensions, the base workflow remains the same; the only needed modification is to the system state representation. We rely on an existing straightforward overhead accounting method [32] and gather the actual overheads of running task sets in LITMUS^RT, which is a Linux-based testbed for evaluating various scheduling algorithms. The system state considering such overheads modifies each job's worst-case execution time from $E$ to $E' = \bar{o} + E$, where $\bar{o}$ denotes the average overhead over all the tasks in $\tau$, an empirical value originating from the data collected in LITMUS^RT.

IV. IMPLEMENTATION AND EVALUATION
We have implemented LINTS^RT in a simulator for the online configuration and in LITMUS^RT for the offline configuration.

A. Implementation details

1) Simulator Implementation:
The simulator is used for both evaluation and MCTS roll-outs, and is responsible for releasing jobs, simulating job execution, and computing the system state. The simulator maintains a clock value $c$. For each run of the simulator, the following steps happen (sketched in the example after this list):

• A random integer $r$ is generated.

• For each task $\tau_i$ in the task set, if there is no pending job of $\tau_i$ and $c \bmod p_i = r$, then release a new job of $\tau_i$.

• Compute the system state and send it to the scheduler.

• Update the remaining execution time of every job that was selected by the scheduler to execute.

• Remove completed jobs from the queue.

• Increment $c$.

In the future, we plan to extend this simulator to handle other kinds of hardware constraints such as I/O and memory. For MCTS roll-outs, multiple simulations need to run at each node; since these simulations are independent of each other, they can be run in parallel. Hence, we developed a simulator implemented in CUDA [33] to utilize the capabilities of a GPU. Additionally, extending the simulator to non-preemptive and heterogeneous systems is intuitive.
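A minimal sketch of one simulator run following these steps, reusing the hypothetical Task class from Sec. III-B (the release-offset handling and the scheduler callback signature are assumptions):

    import random

    def run_simulation(tasks, horizon, scheduler):
        """One simulator run: release jobs, query the scheduler, advance
        execution, drop finished jobs, increment the clock. Returns the clock
        value reached; a deadline miss ends the run early."""
        r = random.randrange(1000)      # the per-run random integer r
        remaining = [0] * len(tasks)    # remaining execution of each active job
        deadline = [0] * len(tasks)     # absolute deadline of each active job
        clock = 0
        while clock < horizon:
            for i, task in enumerate(tasks):
                # release a new job if the task has none pending (r is reduced
                # modulo the period so every task releases for any r)
                if remaining[i] == 0 and clock % task.period == r % task.period:
                    remaining[i] = task.wcet
                    deadline[i] = clock + task.deadline
            for i in scheduler(remaining, deadline, clock):  # chosen job indices
                remaining[i] -= 1       # one time slot of progress
            clock += 1
            if any(rem > 0 and clock > deadline[i]
                   for i, rem in enumerate(remaining)):
                return clock            # a job missed its deadline
        return clock                    # survived the whole horizon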
B. Evaluation and Training Setup

We consider both preemptive and non-preemptive systems, and for each setting, we also consider platforms with $m \in \{2, 4\}$ processors of both homogeneous and heterogeneous architecture. For a given $m$, preemption setting, and architecture setting, we vary the total utilization $U$ across $[0.5 \times m, m]$. We also separate the task sets into 4 different types based on their per-task utilization: light, medium, heavy, and mixed. "Light" task sets contain tasks with utilization that varies from 0 to 0.2, "medium" task sets contain tasks with utilization from 0.2 to 0.5, for "heavy", utilization is from 0.5 to 0.8, and "mixed" contains tasks with all the different utilizations. Furthermore, for each configuration ($m$, preemption setting, architecture setting, total utilization $U$, and per-task utilization type), we randomly generate multiple task sets using the unbiased task set generator from [34], which yields tasks whose utilization is uniformly distributed in the specified range for the configuration. Task periods were drawn uniformly at random from a fixed set of values containing only small prime factors, which reflects realistic timing constraints and ensures a short hyper-period. These generated task sets are used to train and evaluate LINTS^RT. Also, note that the total utilization of our generated task sets is slightly smaller than in the configuration (by at most 0.5 percent). For heterogeneous systems, we use the setting where half of the processors work at a speed of "1" and the remaining processors work at a speed of "0.5".

Fig. 5: Runtime schedulability for (m = 4, preemptive, homogeneous, {light, medium, heavy, mixed}) vs. U/m.

Fig. 6: Runtime schedulability for (m = 4, preemptive, heterogeneous, {light, medium, heavy, mixed}) vs. U/m.

Fig. 7: Runtime schedulability for (m = 4, non-preemptive, homogeneous, {light, medium, heavy, mixed}) vs. U/m.

Fig. 8: Runtime schedulability for (m = 4, non-preemptive, heterogeneous, {light, medium, heavy, mixed}) vs. U/m.
1) DNN Training:
We implement our DNN on a mature machine learning framework, PyTorch [35]. We train our DNN using the approach described in Sec. III-E for each system configuration, which is a tuple of ($m$, preemption setting, architecture setting, per-task utilization type). Training is done on 100 task sets generated using the methods discussed in the previous section. The sporadicity of the task model is handled by the simulator, as discussed in Section IV-A1.

The machine used to train the deep neural network contains an Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz and two NVIDIA GTX 2080 Ti GPUs. DNN parameters are updated using the stochastic gradient descent method (SGD) with momentum and learning rate annealing. The loss function is calculated according to Equation 4. Table I shows the learning rates of the policy and value networks over different training phases. Parameters of the networks are updated over 10000 mini-batches containing 256 samples each, which are selected uniformly from the search trees.
Simulation    Learning Rate for Policy    Learning Rate for Evaluation
0-500         0.1                         0.01
500-1000      0.01                        0.001
1000-1500     0.001                       0.0001
1500-2000     0.0001                      0.00001

TABLE I: Learning rates.
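The annealing schedule in Table I can be applied by rewriting the optimizer's learning rate at each phase boundary; a sketch using the policy-head rates (the momentum value is an assumption, and the value head would use a second parameter group with its own schedule):

    import torch

    # Phase boundaries and policy-head learning rates from Table I
    POLICY_LR = {0: 0.1, 500: 0.01, 1000: 0.001, 1500: 0.0001}

    def make_optimizer(net):
        # momentum value (0.9) is an assumption; the paper only says "with momentum"
        return torch.optim.SGD(net.parameters(), lr=POLICY_LR[0], momentum=0.9)

    def anneal(optimizer, simulation: int):
        """Drop the learning rate at the phase boundaries listed in Table I."""
        if simulation in POLICY_LR:
            for group in optimizer.param_groups:
                group["lr"] = POLICY_LR[simulation]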
As discussed earlier, each pass through the DNN schedules one job, which takes around 10 nanoseconds to execute on a GPU (NVIDIA GTX 2080 Ti) and 4 milliseconds on a CPU with 8 cores. This makes the per-slot overhead of LINTS^RT 10ns × m (20ns and 40ns for m = 2 and m = 4, respectively).

Fig. 9: Runtime schedulability (LITMUS^RT) for (m = 2, {preemptive, non-preemptive}, homogeneous, {heavy, mixed}) vs. U/m.

C. Results
Since LINTS^RT makes scheduling decisions based on the current system state, coming up with a static schedulability test is difficult. We therefore evaluate LINTS^RT based on runtime schedulability, i.e., the percentage of 1000 generated task sets that are schedulable by the scheduler for each system configuration ($m$, preemption setting, architecture setting, $U$, per-task utilization distribution). The following results show graphs with the y-axis representing the runtime schedulability of each scheduler and the x-axis representing the total utilization of the task set divided by the number of processors, i.e., U/m.
1) Results on Preemptive, Homogeneous / Heterogeneous Settings:
Fig. 5 shows the results of LINTS^RT working on a preemptive and homogeneous system with processor count m = 4. As seen in the figure, LINTS^RT yields a much higher runtime schedulability compared to GEDF and GRM, particularly when per-task utilizations are heavy. We performed experiments for m ∈ {2, 4}. Due to space constraints, we only show the results for the 4-processor case. However, the general trend for both the 2-processor and 4-processor cases is similar, with all the schedulers performing slightly better in the 2-processor case. The reason for this in the case of LINTS^RT is that, at the beginning of every time slot, the DNN needs to be executed for each of the processors. Thus, the latency overhead due to the DNN decision-making process increases with an increasing number of processors. Under heterogeneous settings, as seen in Fig. 6, LINTS^RT still outperforms GEDF and GRM by a large margin.
2) Results on Non-preemptive, Homogeneous / Heterogeneous Settings:
For non-preemptive settings, Figs. 7 and 8 show that LINTS^RT outperforms GEDF and GRM by a significant margin. The reason is that MCTS is a search-based method, which takes trial-and-error actions while finding feasible schedules, and the neural network is trained on the data generated by MCTS. Therefore, the neural network can learn patterns of job selections that are able to reduce the number of missed deadlines. This result is encouraging because non-preemptive scheduling on either homogeneous or heterogeneous multiprocessors is known to be a notoriously hard problem.

It is clear from the results that the acceptance rate of LINTS^RT is much higher than that of GEDF and GRM. Additionally, empirical observations show that task sets accepted by either GEDF or GRM are also accepted by LINTS^RT. From this, we can conclude that LINTS^RT offers significant optimizations with no detrimental side effects over the traditional scheduling approaches. Immediate future work is formulating a utilization test using the weights of the DNN as parameters in its schedulability test, since a trained DNN is a deterministic component which, when given the same system state, always provides a consistent scheduling decision.

D. Experiments in LITMUS^RT

We also perform an evaluation in LITMUS^RT through the table-driven scheduling tools of LITMUS^RT for periodic task sets. Due to space constraints, we omit some discussion of implementation details. Table-driven scheduling is a part of LITMUS^RT, and it is realized as a reservation type in the P-RES plugin. On multiprocessor systems, table-driven reservations are partitioned, which means each reservation is restricted to a scheduling slot on only one processor. Thus, we use different table-driven reservations on each core. For the sake of comparison, we still choose to compare against GEDF, which is implemented using the
GSN-EDF plugin.

As discussed earlier, the simulator is important for LINTS^RT to perform well. If the simulator does not reflect the real environment of a platform, neither the trained neural network nor the generated static schedule will work well. Our way of handling overheads is to add an empirical value $\bar{o}$ to the original execution time. We would like to use this evaluation to answer two questions: Is our estimation of the total overhead effective? Will the generated static schedule outperform the GEDF-based scheduler implemented within LITMUS^RT? Fig. 9 shows the experimental results. As we can see in Fig. 9, LINTS^RT outperforms GEDF by a wide margin for both preemptive and non-preemptive settings, which means that considering the empirically-measured overheads in the training phase works well and facilitates LINTS^RT in providing overhead-conscious runtime scheduling decision making.

(We note that supporting sporadic task scheduling using LINTS^RT within LITMUS^RT requires integrating the entire machine learning framework into the OS kernel, which consists of a large number of mathematical libraries, such as tensor calculation libraries and automatic differentiation libraries. As this effort would be significant, we leave this implementation as immediate future work.)

V. RELATED WORKS
As we have discussed the extensive set of related works on real-time scheduling in Sec. II, we focus herein on works that apply machine learning techniques in application domains relevant to resource management. DeepRM [36] is of particular relevance. DeepRM applies RL techniques to solve the problem of resource management in online scheduling.
It was proposed as an online resource management scheduler for compute clusters, cloud computing, and video streaming applications. Even though DeepRM is an online scheduler, the latency constraints of its application domain are very loose, which makes the design of DeepRM totally different from that of LINTS^RT, which is expected to make real-time scheduling decisions. Additionally, their approach does not consider any non-determinism, such as sporadic task releases, while calculating the reward for their actions. As discussed in Sec. III, addressing these differences is non-trivial. We are not aware of any prior work that applies reinforcement learning to dynamic-priority real-time scheduling.

VI. CONCLUSION

In this paper, we present LINTS^RT, a learning-based testbed for intelligent real-time scheduling, which has the potential to handle various workload and hardware complexities that are hard to handle in practice. We first present a basic design of LINTS^RT for supporting sporadic workloads on homogeneous multiprocessors, and then demonstrate how to easily extend the framework to handle further complexities in the form of non-preemptivity and resource heterogeneity. Both application- and OS-level implementation and evaluation demonstrate that LINTS^RT is able to achieve significantly higher runtime schedulability under different settings compared to perhaps the most commonly applied schedulers, global EDF and RM.

REFERENCES

[1] J. Liu, Real-Time Systems. Prentice Hall, 2000.
[2] S. K. Baruah, "The non-preemptive scheduling of periodic tasks upon multiprocessors," Real-Time Systems, vol. 32, no. 1-2, pp. 9–20, 2006.
[3] H. Baek, N. Jung, H. S. Chwa, I. Shin, and J. Lee, "Non-preemptive scheduling for mixed-criticality real-time multiprocessor systems," IEEE Transactions on Parallel and Distributed Systems, vol. 29, no. 8, pp. 1766–1779, 2018.
[4] J. Lee, "Improved schedulability analysis using carry-in limitation for non-preemptive fixed-priority multiprocessor scheduling," IEEE Transactions on Computers, vol. 66, no. 10, pp. 1816–1823, 2017.
[5] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton et al., "Mastering the game of Go without human knowledge," Nature, vol. 550, no. 7676, p. 354, 2017.
[6] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot et al., "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, no. 7587, p. 484, 2016.
[7] B. Brandenburg and J. H. Anderson, "Scheduling and locking in multiprocessor real-time operating systems," Ph.D. dissertation, The University of North Carolina at Chapel Hill, 2011.
[8] J. M. Calandrino, H. Leontyev, A. Block, U. C. Devi, and J. H. Anderson, "LITMUS^RT: A testbed for empirically comparing real-time multiprocessor schedulers," in . IEEE, 2006, pp. 111–126.
[9] A. Bastoni, B. B. Brandenburg, and J. H. Anderson, "An empirical comparison of global, partitioned, and clustered multiprocessor EDF schedulers," in . IEEE, 2010, pp. 14–24.
[10] B. B. Brandenburg and M. Gül, "Global scheduling not required: Simple, near-optimal multiprocessor real-time scheduling with semi-partitioned reservations," in . IEEE, 2016, pp. 99–110.
[11] A. Saifullah, D. Ferry, J. Li, K. Agrawal, C. Lu, and C. D. Gill, "Parallel real-time scheduling of DAGs," IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 12, pp. 3242–3252, 2014.
[12] X. Jiang, N. Guan, X. Long, and W. Yi, "Semi-federated scheduling of parallel real-time tasks on multiprocessors," in . IEEE, 2017, pp. 80–91.
[13] X. Jiang, N. Guan, X. Long, and H. Wan, "Decomposition-based real-time scheduling of parallel tasks on multi-core platforms," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2019.
[14] N. Ueter, G. Von Der Brüggen, J.-J. Chen, J. Li, and K. Agrawal, "Reservation-based federated scheduling for parallel real-time tasks," in . IEEE, 2018, pp. 482–494.
[15] J.-J. Chen, G. Nelissen, W.-H. Huang, M. Yang, B. Brandenburg, K. Bletsas, C. Liu, P. Richard, F. Ridouard, N. Audsley et al., "Many suspensions, many problems: A review of self-suspending tasks in real-time systems," Real-Time Systems, vol. 55, no. 1, pp. 144–207, 2019.
[16] Z. Dong and C. Liu, "Closing the loop for the selective conversion approach: A utilization-based test for hard real-time suspending task systems," in . IEEE, 2016, pp. 339–350.
[17] H. Li and S. Baruah, "Outstanding paper award: Global mixed-criticality scheduling on multiprocessors," in . IEEE, 2012, pp. 166–175.
[18] S. Baruah, B. Chattopadhyay, H. Li, and I. Shin, "Mixed-criticality scheduling on multiprocessors," Real-Time Systems, vol. 50, no. 1, pp. 142–177, 2014.
[19] Z. Guo, S. Sruti, B. C. Ward, and S. Baruah, "Sustainability in mixed-criticality scheduling," in . IEEE, 2017, pp. 24–33.
[20] A. Papadopoulos, E. Bini, S. Baruah, and A. Burns, "AdaptMC: A control-theoretic approach for achieving resilience in mixed-criticality systems," in Proceedings of the ECRTS Conference. LIPIcs, 2018, p. 14.
[21] S. Baruah, M. Bertogna, and G. Buttazzo, "Real-time scheduling upon heterogeneous multiprocessors," in Multiprocessor Scheduling for Real-Time Systems. Springer, 2015, pp. 205–211.
[22] J. Singh and N. Auluck, "Real time scheduling on heterogeneous multiprocessor systems—a survey," in . IEEE, 2016, pp. 73–78.
[23] G. A. Elliott, "Real-time scheduling for GPUs with applications in advanced automotive systems," University of North Carolina at Chapel Hill, Tech. Rep., 2015.
[24] A. Biondi, A. Balsini, M. Pagani, E. Rossi, M. Marinoni, and G. Buttazzo, "A framework for supporting real-time applications on dynamic reconfigurable FPGAs," in . IEEE, 2016, pp. 1–12.
[25] N. Capodieci, R. Cavicchioli, M. Bertogna, and A. Paramakuru, "Deadline-based scheduling for GPU with preemption support," in . IEEE, 2018, pp. 119–130.
[26] A. Biondi, G. C. Buttazzo, and M. Bertogna, "Schedulability analysis of hierarchical real-time systems under shared resources," IEEE Transactions on Computers, vol. 65, no. 5, pp. 1593–1605, 2015.
[27] W.-H. Huang, M. Yang, and J.-J. Chen, "Resource-oriented partitioned scheduling in multiprocessor systems: How to partition and how to share?" in . IEEE, 2016, pp. 111–122.
[28] A. Biondi and B. B. Brandenburg, "Lightweight real-time synchronization under P-EDF on symmetric and asymmetric multiprocessors," in . IEEE, 2016, pp. 39–49.
[29] R. I. Davis and A. Burns, "A survey of hard real-time scheduling for multiprocessor systems," ACM Computing Surveys (CSUR), vol. 43, no. 4, p. 35, 2011.
[30] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[31] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[32] B. B. Brandenburg, H. Leontyev, and J. H. Anderson, "Accounting for interrupts in multiprocessor real-time systems," in . IEEE, 2009, pp. 273–283.
[33] C. Nvidia, "NVIDIA CUDA C programming guide," Nvidia Corporation, vol. 120, no. 18, p. 8, 2011.
[34] P. Emberson, R. Stafford, and R. I. Davis, "Techniques for the synthesis of multiprocessor tasksets," in Proceedings of the 1st International Workshop on Analysis Tools and Methodologies for Embedded and Real-time Systems (WATERS 2010), 2010, pp. 6–11.
[35] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, "Automatic differentiation in PyTorch," in NIPS Autodiff Workshop, 2017.
[36] H. Mao, M. Alizadeh, I. Menache, and S. Kandula, "Resource management with deep reinforcement learning," in Proceedings of the 15th ACM Workshop on Hot Topics in Networks. ACM, 2016, pp. 50–56.