Scheduling of Real-Time Tasks with Multiple Critical Sections in Multiprocessor Systems
Jian-Jia Chen, Junjie Shi, Georg von der Brüggen, Niklas Ueter
EARLY ANNOUNCEMENT

Department of Informatics, TU Dortmund University, Germany
{jian-jia.chen, junjie.shi, georg.von-der-brueggen, niklas.ueter}@tu-dortmund.de

Abstract—The performance of multiprocessor synchronization and locking protocols is a key factor in utilizing the computation power of multiprocessor systems under real-time constraints. While multiple protocols have been developed in the past decades, their performance highly depends on the task partition and prioritization. The recently proposed Dependency Graph Approach showed its advantages and attracted a lot of interest. It is, however, restricted to task sets where each task has at most one critical section. In this paper, we remove this restriction and demonstrate how to utilize algorithms for the classical job shop scheduling problem to construct a dependency graph for tasks with multiple critical sections. To show the applicability, we discuss the implementation in LITMUS^RT and report the overheads. Moreover, we provide extensive numerical evaluations under different configurations, which in many situations show significant improvement compared to the state-of-the-art.

Index Terms—Real-Time Systems, Multiprocessor Resource Synchronization, Job Shop, and Dependency Graph Approaches
1 INTRODUCTION

Under the von-Neumann programming model, shared resources that require mutually exclusive accesses, such as shared files, data structures, etc., have to be protected by applying synchronization (binary semaphores) or locking (mutex locks) mechanisms. A protected code segment that has to access a shared resource mutually exclusively is called a critical section. For uniprocessor real-time systems, the state of the art are longstanding protocols that have been developed in the 90s, namely the Priority Inheritance Protocol (PIP) and the Priority Ceiling Protocol (PCP) by Sha et al. [34], as well as the Stack Resource Policy (SRP) by Baker [3]. Specifically, a variant of PCP has been implemented in Ada (called Ceiling Locking) and in POSIX (called Priority Protect Protocol).

Due to the development of multiprocessor platforms, multiprocessor resource synchronization and locking protocols have been proposed and extensively studied, such as the Distributed PCP (DPCP) [33], the Multiprocessor PCP (MPCP) [32], the Multiprocessor SRP (MSRP) [16], the Flexible Multiprocessor Locking Protocol (FMLP) [4], the Multiprocessor PIP [13], the O(m) Locking Protocol (OMLP) [7], the Multiprocessor Bandwidth Inheritance (M-BWI) [15], and the Multiprocessor resource sharing Protocol (MrsP) [8]. Since the performance of these protocols highly depends on task partitioning, several partitioning algorithms were developed in the literature, e.g., for MPCP by Lakshmanan et al. [26] and Nemati et al. [30], for MSRP by Wieder and Brandenburg [42], and for DPCP by Hsiu et al. [21], Huang et al. [22], and von der Brüggen et al. [40].
In addition to the theoretical soundness of these protocols, some of them have been implemented in the real-time operating systems LITMUS^RT [5], [9] and RTEMS. For several decades, the primary focus when considering multiprocessor synchronization and locking in real-time systems has been the design and analysis of resource sharing protocols, where the protocols decide the order in which newly incoming requests access the shared resources dynamically. Contrarily, the Dependency Graph Approach (DGA), proposed by Chen et al. [11] in 2018, pre-computes the order in which tasks are allowed to access resources, and consists of two individual steps:
1) A dependency graph is constructed to determine the execution order of the critical sections guarded by one binary semaphore or mutex lock.
2) Multiprocessor scheduling algorithms are applied to schedule the tasks by respecting the constraints given by the constructed dependency graph(s).
Chen et al. [11] showed significant improvement against existing protocol-based approaches from the empirical as well as from the theoretical perspective, and demonstrated the practical applicability of the DGA by implementing it in LITMUS^RT [5], [9]. However, the original dependency graph approach presented in [11] has two strong limitations: 1) the construction in the first step allows only one critical section per task, and 2) the presented algorithms can only be applied to frame-based real-time task systems, i.e., all tasks have the same period and always release their jobs at the same time. The latter restriction has recently been removed by Shi et al. [36], who applied the DGA after unrolling the jobs in the hyperperiod. However, the former remains open and is a fundamental obstacle which limits the generality of the dependency graph approaches.

In the original DGA, the assumption that each task has only one non-nested critical section allows the algorithm to partition the tasks according to their shared resources in the first step.
However, when a task accesses multiple shared resources, such a partitioning is no longer possible. Therefore, to enable the DGA for tasks with multiple critical sections, an exploration of effective construction mechanisms for a dependency graph that considers the interactions of the shared resources is needed.
Contribution: In this paper, we focus on allowing multiple critical sections per task in the dependency graph approaches for both frame-based and periodic real-time task systems with synchronous releases. Our contributions are:
• Our key observation is the correlation between the dependency graph in DGA and the classical job shop scheduling problem. With respect to the computational complexity, we present a polynomial-time reduction from the classical job shop scheduling problem, which is NP-hard in the strong sense [28]. Intractability results are established even for severely restricted instances of the studied multiprocessor synchronization problem, as detailed in Sec. 3.
• For frame-based task sets, we reduce the problem of constructing the dependency graph in the DGA to the classical job shop scheduling problem in Sec. 4, and establish approximation bounds for minimizing the makespan based on the approximation bounds of job-shop algorithms. Sec. 4.4 details how these results can be extended to periodic real-time task systems.
• We explain how we implemented the dependency graph approach with multiple critical sections in LITMUS^RT and report the overheads in Sec. 5, showing that our newly implemented approach is comparable to the existing methods with respect to the overheads.
• We provide extensive numerical evaluations in Sec. 6, which demonstrate the performance of the proposed approach under different system configurations. Compared to the state-of-the-art, our approach shows significant improvement for all the evaluated frame-based real-time task systems and for most of the evaluated periodic task systems.
2 SYSTEM MODEL
We consider a set T of n recurrent tasks to be scheduled on M identical (homogeneous) processors. All tasks can have multiple (non-nested) critical sections and may access several of the Z shared resources. Each task τ_i is described by τ_i = ((η_i, C_i), T_i, D_i), where:
• η_i is the number of computation segments in task τ_i.
• C_i is the total worst-case execution time (WCET) of the computation segments in task τ_i.
• T_i is the period of τ_i.
• D_i is the relative deadline of τ_i.
We consider constrained deadlines, i.e., ∀τ_i ∈ T, D_i ≤ T_i. The j-th segment of task τ_i, denoted as θ_{i,j} = (C_{i,j}, λ_{i,j}), has the following properties:
• C_{i,j} ≥ 0 is the WCET of computation segment θ_{i,j}, with C_i = Σ_{j=1}^{η_i} C_{i,j}.
• λ_{i,j} indicates whether the corresponding segment is a non-critical section or a critical section. If θ_{i,j} is a critical section, λ_{i,j} is 1; otherwise, λ_{i,j} is 0.
• If θ_{i,j} is a non-critical section, then θ_{i,j−1} and θ_{i,j+1} must be critical sections (if they exist). That is, θ_{i,j} and θ_{i,j+1} cannot both be non-critical sections.
• If θ_{i,j} is a critical section, it starts with the lock of a mutex lock (or a wait for a binary semaphore), denoted by σ_{i,j}, and ends with the unlock of the same mutex lock (or a signal to the same binary semaphore).
Furthermore, we make the following assumptions:
• A job cannot be executed in parallel, i.e., the computation segments in a job must be sequentially executed.
• The critical sections guarded by one mutex lock (or one binary semaphore) must be sequentially executed. Hence, if two computation segments share the same lock, they must be executed one after another.
• There are in total Z mutex locks (or binary semaphores).
We consider two kinds of task systems, namely:
• Frame-based task systems: all tasks release their jobs at the same time and have the same period and relative deadline, i.e., ∀i, j, T_i = T_j ∧ D_i = D_j. Hence, the analysis can be restricted to one job of each task.
• Periodic task systems (with synchronous release): all tasks release their first job at time 0 and subsequent jobs are released periodically, but different tasks may have different periods and relative deadlines. The hyperperiod of the task set T is defined as the least common multiple (LCM) of the periods of the tasks in T.

In this subsection, we define the problem of scheduling frame-based real-time tasks with multiple critical sections on homogeneous multiprocessor systems. We define a schedule from the sub-job's perspective. Suppose that Θ is the set of the computation segments, i.e., Θ = {θ_{i,j} | τ_i ∈ T, j = 1, 2, . . . , η_i}. A schedule for T is a function ρ : R × M → Θ ∪ {⊥}, where ρ(t, m) = θ_{i,j} denotes that the sub-job θ_{i,j} is executed at time t on processor m, and ρ(t, m) = ⊥ denotes that processor m is idle at time t. Since a job has to be sequentially executed, at any time point t ≥ 0, only one sub-job of τ_i can be executed on one of the M processors, i.e., if ρ(t, m) is θ_{i,j}, then ρ(t, m′) ≠ θ_{i,k} for any k ≤ η_i and m′ ≠ m. Moreover, since the sub-jobs of a job must be executed sequentially, θ_{i,k} cannot be executed before θ_{i,j} finishes for any j < k ≤ η_i, i.e., if ρ(t, m) is θ_{i,j} for some t, m, i, j, then ρ(t′, m) ≠ θ_{i,k} for any t′ ≤ t and any k > j. The critical sections guarded by one mutex lock must be sequentially executed, i.e., if λ_{i,j} is 1, λ_{k,ℓ} is 1, and σ_{i,j} = σ_{k,ℓ}, then whenever ρ(t, m) is θ_{i,j}, a schedule must guarantee that ρ(t, m′) ≠ θ_{k,ℓ} for any m′ ≠ m. We only consider schedules that can finish the execution demand of the computation segments. Let R be the finishing time of the schedule. In this case, Σ_{m=1}^{M} ∫_0^R [ρ(t, m) = θ_{i,j}] dt must be equal to C_{i,j}, where [P] is the Iverson bracket, i.e., [P] is 1 when the condition P holds, and otherwise [P] is 0.
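The schedule function ρ and its non-overlapping constraints can be illustrated programmatically. The following is a minimal Python sketch under the simplifying assumption of discrete unit-length time slots; the names Segment, is_feasible, and makespan are our own illustrative choices, not part of the paper's formalism.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass(frozen=True)
class Segment:
    task: int            # i in tau_i
    idx: int             # j in theta_{i,j}, executed in increasing order
    lock: Optional[int]  # sigma_{i,j} if lambda_{i,j} = 1, else None

# A schedule rho over discrete unit-length time slots:
# rho[(t, m)] = segment running in slot t on processor m (idle slots omitted).
Schedule = Dict[Tuple[int, int], Segment]

def is_feasible(rho: Schedule) -> bool:
    """Check the non-overlapping constraints of the system model:
    (1) a job never runs on two processors in the same slot,
    (2) critical sections guarded by the same lock never run in parallel,
    (3) the segments of a task execute in index order."""
    by_slot: Dict[int, list] = {}
    for (t, _m), seg in rho.items():
        by_slot.setdefault(t, []).append(seg)
    for segs in by_slot.values():
        tasks = [s.task for s in segs]
        if len(tasks) != len(set(tasks)):        # (1) intra-job parallelism
            return False
        locks = [s.lock for s in segs if s.lock is not None]
        if len(locks) != len(set(locks)):        # (2) lock conflict
            return False
    start: Dict[Tuple[int, int], int] = {}
    finish: Dict[Tuple[int, int], int] = {}
    for (t, _m), seg in rho.items():
        key = (seg.task, seg.idx)
        start[key] = min(start.get(key, t), t)
        finish[key] = max(finish.get(key, t), t)
    for (task, j), f in finish.items():          # (3) sequential sub-jobs
        nxt = (task, j + 1)
        if nxt in start and start[nxt] <= f:
            return False
    return True

def makespan(rho: Schedule) -> int:
    """C_max: one past the last busy slot of the schedule."""
    return max(t for (t, _m) in rho) + 1 if rho else 0
```

For example, a schedule that runs two critical sections guarded by the same lock in the same slot on different processors violates constraint (2) and is rejected.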
Note that the integration is used in this paper only as a symbolic notation to represent the summation over time. The earliest moment when all sub-jobs finish their computation segments in the schedule (under all the constraints defined above) is called the makespan of the schedule, commonly denoted as C_max in scheduling theory, i.e., C_max of schedule ρ is:

min. R
s.t. Σ_{m=1}^{M} ∫_0^R [ρ(t, m) = θ_{i,j}] dt = C_{i,j}, ∀θ_{i,j} ∈ Θ

The problem of multiprocessor synchronization with multiple critical sections per task can be transferred to the following two general problems:
Definition 1 (Multiprocessor Multiple critical-Sections task Synchronization (MMSS) makespan problem). Assume M identical (homogeneous) processors and n tasks arriving at time 0. Each task τ_i is composed of η_i computation segments, each of which is either a non-nested critical section or a non-critical section. The objective is to find a schedule that minimizes the makespan.

A feasible schedule of the MMSS makespan problem is a schedule that satisfies all aforementioned non-overlapping constraints. An optimal solution of an input instance of the MMSS makespan problem is the makespan of a schedule that has the minimum makespan among the feasible schedules of the input instance. An algorithm A for the MMSS makespan problem has an approximation ratio a ≥ 1 if, given any task set T and M processors, the resulting makespan is at most a · C*_max, where C*_max is the optimal makespan.

Definition 2 (MMSS schedulability problem). Assume there are M identical (homogeneous) processors and n tasks arriving at time 0. All tasks τ_i have the same deadline D. Each task is composed of η_i computation segments, each of which is either a non-nested critical section or a non-critical section. The objective is to find a feasible schedule that meets the deadline D on the given M processors.

A feasible schedule of the MMSS schedulability problem is a schedule that has a makespan no more than D and satisfies all the non-overlapping constraints. The MMSS schedulability problem is a decision problem, in which for a given D and a given algorithm either a feasible schedule that meets the deadline is derived or no feasible schedule can be derived by the algorithm. For such a decision setting, the speedup factor [23], [31] can be used to examine the performance. Provided that there exists one feasible schedule at the original speed 1, the speedup factor a ≥ 1 of a scheduling algorithm A for the MMSS schedulability problem is the factor by which the overall speed of the system would need to be increased so that algorithm A always derives a feasible schedule.

In this subsection, for completeness, we summarize the classical flow shop and job shop scheduling problems in operations research (OR). In scheduling theory, a scheduling problem is described by a triplet α | β | γ:
• α describes the machine (i.e., processing) environment.
• β specifies the characteristics and constraints.
• γ is the objective to be optimized.
The widely used machine environments in α are:
• 1: a single machine (or uniprocessor).
• P: independent machines (or homogeneous multiprocessor systems).
• FM: flow shop. The environment FM consists of M machines, and each job i has a chain of M sub-jobs, denoted as O_{i,1}, O_{i,2}, . . . , O_{i,M}, where the M operations are executed in the specified order and O_{i,m} is executed on the m-th machine. A job has to finish its operation on the m-th machine before it can start any operation on the (m+1)-th machine, for any m = 1, 2, . . . , M − 1.
• JM: job shop, i.e., a job i has a chain of η_i sub-jobs, denoted as O_{i,1}, O_{i,2}, . . . , O_{i,η_i}, where the η_i operations should be executed in the specified order and O_{i,m} is executed on a specified machine.
Note that a flow shop is a special case of a job shop environment. In this paper, we are specifically interested in the following constraints specified in β:
• prmp: preemptive scheduling. In classical scheduling theory, preemption in parallel machines implies the possibility of job migration from one machine to another machine.
• r_j: with specified arrival time of the job (and deadline).
• l_{i,j}: preparation time between a dependent job pair, i.e., job i and job j.
• prec: the jobs have precedence constraints.
Note that the scheduler is implicitly assumed to be non-preemptive if prmp is not specified. Furthermore, the job set is assumed to arrive at time 0 if r_j is not specified. In addition, we are specifically interested in two objectives specified in γ:
• C_max: to minimize the makespan, as defined in Sec. 2.2.
• L_max: to minimize the maximum lateness over all jobs, in which the lateness of a job is defined as its finishing time minus its absolute deadline.

Two types of access patterns of the critical sections are considered, which we name according to the applicable algorithms for convenience:
• Flow-Shop Compatible Access Patterns: A task set has a pattern where flow-shop approaches can be applied if all tasks access each resource (in a non-nested manner) at most once and a total order ≺ in which tasks access the resources can be constructed over all tasks in the set. Hence, a flow-shop pattern means that σ_{i,j′} ≺ σ_{i,j} when j′ < j and θ_{i,j′} and θ_{i,j} are both critical sections. In such a case, we can assume that the mutex locks are indexed according to the specified order. However, while tasks have to respect the order ≺ when accessing the resources, mutex locks that are not needed may be skipped.
• Job-Shop Compatible Access Patterns allow tasks to access shared resources multiple times and without any restriction on the order in which resources are accessed.
Flow-shop compatible access patterns are a very restrictive special case of the much more general job-shop compatible access patterns. We implicitly assume job-shop compatible access patterns if not specified differently, but examine flow-shop compatible access patterns when showing certain complexity results.
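A task set can be tested for flow-shop compatibility by checking whether any lock is accessed twice by one task and whether a total order over the locks consistent with every task's access sequence exists. The following is our own simplified Python sketch of such a check, not an algorithm from the paper:

```python
from itertools import combinations

def classify_access_pattern(access_seqs):
    """access_seqs[i] is the sequence of mutex locks task tau_i acquires,
    e.g. [[1, 3], [1, 2, 3]]. Returns 'flow-shop' if every task accesses
    each lock at most once and a total order over the locks consistent
    with all sequences exists; otherwise 'job-shop'."""
    edges = set()
    for seq in access_seqs:
        if len(seq) != len(set(seq)):
            return "job-shop"              # some lock is accessed twice
        for a, b in combinations(seq, 2):  # a precedes b in this task
            edges.add((a, b))
    # A consistent total order exists iff the precedence relation is
    # acyclic (checked by repeatedly removing sources, as in Kahn's
    # topological-sort algorithm).
    pending = {x for e in edges for x in e}
    remaining = set(edges)
    while pending:
        free = [n for n in pending if all(b != n for (_a, b) in remaining)]
        if not free:
            return "job-shop"              # cycle: no consistent order
        for n in free:
            pending.discard(n)
            remaining = {(a, b) for (a, b) in remaining if a != n}
    return "flow-shop"
```

For instance, the sequences [1, 3] and [1, 2, 3] are flow-shop compatible (lock 2 is simply skipped by the first task), whereas [1, 2] together with [2, 1] is not, since no total order satisfies both.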
3 COMPUTATIONAL COMPLEXITY ANALYSIS
In this section, we first provide a short overview of results regarding job shop and flow shop problems in the literature. Afterwards, we explain the connection of the MMSS schedulability problem to the job and flow shop problems by presenting different reductions that are later applied to demonstrate the computational complexity of different scenarios.
Since the late 1950s, many computational complexity results, approximation algorithms, heuristic algorithms, and tools for job and flow shop scheduling problems have been established. Intractability results have been well-established even for severely restricted instances of job shop or flow shop problems. The reader is referred to the surveys by Lawler et al. [27] and Chen et al. [10] for details. Specifically, the following restricted scenarios are NP-complete in the strong sense:
• J2 || C_max, see [28].
• J3 | p_{i,j} = 1 | C_max, i.e., unit execution time, see [28].
• J3 | n = 3 | C_max, i.e., 3 jobs with multiple operations on 3 shops, see [39].
• F3 || C_max, i.e., three-stage flow shop [17].
• F2 | r_j | C_max, i.e., two-stage flow shop with arrival times, as shown in [28].
• F2 | p_{i,j} = 1, t_j | C_max, i.e., two-stage flow shop with unit processing time and transportation time between the finishing time of the first and the starting time of the second stage [44].
The best polynomial-time approximation algorithm for the general JM || C_max problem was provided by Shmoys et al. [38], showing an approximation ratio of O(log²(Mµ)/log log(Mµ)), where M is the number of shops and µ is the maximum number of operations per job. The approximation ratio of this algorithm was later improved by Goldberg et al. [18], showing a ratio of O(log²(Mµ)/(log log(Mµ))²). Whether there exists a polynomial-time algorithm with a constant approximation ratio for the general FM || C_max or JM || C_max problem remained open until 2011, when Mastrolilli and Svensson [29] showed that FM || C_max (and hence JM || C_max) does not admit any polynomial-time approximation algorithm with a constant approximation ratio. Moreover, they also showed that the lower bound on the approximation ratio is very close to the existing upper bound provided by Goldberg et al. [18]. In Sec.
3.3, we demonstrate that the MMSS schedulability problem is already NP-complete in the strong sense for very restrictive scenarios, even when M and Z are both extremely small. In Sec. 3.4, we further reduce from the master-slave problem [44] to show that the MMSS schedulability problem is NP-complete in the strong sense even when there are two critical sections per task that access the unique shared resource with unit execution time.

Chen et al. [11] showed that a special case of the MMSS makespan problem is NP-hard in the strong sense when a task has only one critical section and M is sufficiently large. The MMSS schedulability problem is the decision version of the MMSS makespan problem. We therefore focus on the hardness of the decision version in Definition 2. Here, we provide reductions from the job/flow shop scheduling problems to different restricted scenarios of the MMSS schedulability problem. Such reductions are used in Sec. 3.3 for demonstrating the NP-completeness of different scenarios. We start from the more general scenario under the semi-partitioned scheduling paradigm.

Theorem 1.
Under the semi-partitioned scheduling paradigm, there is a polynomial-time reduction from an input instance of the decision version of the job shop scheduling problem JZ || C_max with Z shops to an input instance of the MMSS schedulability problem that has Z mutex locks on M processors with M ≥ Z.

Proof. The proof is based on a polynomial-time reduction from an instance of the job shop scheduling problem JZ || C_max to the MMSS schedulability problem. Suppose a given input instance with n jobs of the job shop scheduling problem JZ || C_max:
• We have Z shops with non-preemptive execution.
• A job i is defined by a chain of η_i sub-jobs, denoted as O_{i,1}, O_{i,2}, . . . , O_{i,η_i}. The processing time of O_{i,j} is C_{i,j}.
• These η_i operations should be executed in the specified order, and O_{i,j} is executed on one of the given Z shops, i.e., on shop s(O_{i,j}), where s(O_{i,j}) ∈ {1, 2, . . . , Z}.
The decision version of the job shop scheduling problem is to decide whether there is a non-preemptive schedule whose makespan is no more than a given D. The polynomial-time reduction to the MMSS schedulability problem is as follows:
• There are M ≥ Z processors.
• There are Z mutex locks, indexed as 1, 2, . . . , Z.
• For a job i of the input instance of the job shop scheduling problem, we create a task τ_i, which is composed of η_i computation segments. The execution time of θ_{i,j} is the same as the processing time of the operation O_{i,j}. The mutex lock σ_{i,j} used by θ_{i,j} is s(O_{i,j}).
• The deadline of the tasks is D and the period is T = D.
We denote the above input instance of the job shop scheduling problem as I (and that of the MMSS schedulability problem as I′, respectively).
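The instance transformation just described can be sketched in a few lines of Python; the dictionary encoding of tasks and segments below is our own illustrative choice, not the paper's notation.

```python
def job_shop_to_mmss(jobs, Z, D):
    """Reduction sketch (cf. the construction above): jobs[i] is the
    operation chain of job i as a list of (shop, processing_time) pairs
    with shop in 1..Z. Each operation O_{i,j} becomes a computation
    segment that is a critical section guarded by mutex lock s(O_{i,j});
    every task gets deadline D and period T = D."""
    tasks = []
    for ops in jobs:
        # every operation must name one of the Z shops
        assert all(1 <= shop <= Z for (shop, _p) in ops)
        segments = [{"wcet": p, "lock": shop} for (shop, p) in ops]
        tasks.append({"segments": segments, "deadline": D, "period": D})
    return tasks
```

The reduction runs in time linear in the number of operations, which makes the polynomial-time claim of the theorem concrete.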
We show that there exists a feasible schedule ρ for I (in the job shop scheduling problem) if and only if there exists a feasible schedule ρ′ for I′ (in the MMSS schedulability problem).

Only-if part: Suppose ρ is a feasible schedule for I, i.e.,

(Σ_{m=1}^{Z} ∫_0^D [ρ(t, m) = O_{i,j}] dt) = C_{i,j}, ∀O_{i,j}   (1)

and ρ(t, m) ≠ O_{i,j} for any t and m if s(O_{i,j}) ≠ m. Since the execution on the shops is non-preemptive, if two operations O_{i,j} and O_{k,ℓ} are supposed to be executed on a shop z, they are executed sequentially in ρ. As a result, without any conflict, for 0 ≤ t ≤ D, we can set

ρ′(t, m) = { ⊥ if ρ(t, m) = ⊥; θ_{i,j} if ρ(t, m) = O_{i,j} }   (2)

In the schedule ρ′, critical sections guarded by the mutex lock z are executed sequentially on the z-th processor. Therefore,

(Σ_{m=1}^{Z} ∫_0^D [ρ′(t, m) = θ_{i,j}] dt) = C_{i,j}, ∀θ_{i,j} ∈ Θ   (3)
2. Although we do not formally define the schedule function of the job shop scheduling problem, we believe that the context is clear enough by replacing the use of the computation segments with the operations.
and all the constraints for a feasible schedule for I′ are met. Such a schedule is a semi-partitioned and non-preemptive schedule (from the sub-job's perspective), which is also a global preemptive schedule (from the job's perspective).

If part: Suppose that ρ′ is a feasible schedule for I′, i.e.,

Σ_{m=1}^{M} ∫_0^D [ρ′(t, m) = θ_{i,j}] dt = C_{i,j}, ∀θ_{i,j} ∈ Θ   (4)

and the schedule ρ′ executes any two critical sections θ_{i,j} and θ_{k,ℓ} with σ_{i,j} = σ_{k,ℓ} = z sequentially. Therefore, for a mutex lock z ∈ {1, 2, . . . , Z}, the critical sections guarded by z must be sequentially executed. As a result, without any conflict, for 0 ≤ t ≤ D, we can set

ρ(t, z) = { O_{i,j} if ∃m with ρ′(t, m) = θ_{i,j} and σ_{i,j} = z; ⊥ otherwise }   (5)

However, since we do not put any constraint on the feasible schedule ρ′, it is possible that the execution of O_{i,j} on shop z is not continuous. Suppose that a_{i,j} (f_{i,j}, respectively) is the first (last, respectively) time instant when O_{i,j} is executed on shop z in ρ. Since the schedule ρ′ executes any two critical sections θ_{i,j} and θ_{k,ℓ} sequentially when σ_{i,j} = σ_{k,ℓ} = z, we know that for any t between a_{i,j} and f_{i,j}, either ρ(t, z) = O_{i,j} or ρ(t, z) = ⊥. Therefore, we can simply set ρ(t, z) to O_{i,j} for any t in the time interval [a_{i,j}, a_{i,j} + C_{i,j}) and set ρ(t, z) to ⊥ for any t in [a_{i,j} + C_{i,j}, f_{i,j}).
The resulting schedule ρ executes all the operations non-preemptively on the corresponding shops. Therefore, all the scheduling constraints of the job shop scheduling problem are met and

(Σ_{m=1}^{Z} ∫_0^D [ρ(t, m) = O_{i,j}] dt) = C_{i,j}, ∀O_{i,j}   (6)

We note that there is no specific constraint on scheduling imposed by the schedule ρ′.

The proof of Theorem 1 is not valid for the more restrictive partitioned scheduling paradigm, i.e., when all the computation segments of a task must be executed on the same processor, since the constructed schedule ρ′ in the proof of the only-if part is not a partitioned schedule. Interestingly, if we use an abundant number of processors, i.e., M ≥ n, then the reduction in Theorem 1 holds for the partitioned scheduling paradigm as well.

Theorem 2.
Under the partitioned scheduling paradigm, there is a polynomial-time reduction from an input instance of the decision version of the job shop scheduling problem JZ || C_max with Z shops to an input instance of the MMSS schedulability problem that has n tasks and Z mutex locks on M processors with M ≥ n ≥ Z.

Proof. The proof is identical to the proof of Theorem 1 by ensuring that the schedule ρ′ constructed in the only-if part in the proof of Theorem 1 can be converted to a partitioned schedule. Instead of applying Eq. (2), since M ≥ n, without any conflict, for 0 ≤ t ≤ D and i = 1, 2, . . . , n, we can set

ρ′(t, i) = { ⊥ if ∄m with ρ(t, m) = O_{i,j}; θ_{i,j} if ∃m with ρ(t, m) = O_{i,j} }   (7)

Since all computation segments of τ_i are executed on processor i, the schedule ρ′ is a partitioned schedule. All the remaining analysis follows the proof of Theorem 1.

Theorem 3.
There is a polynomial-time reduction from an input instance of the decision version of the flow shop scheduling problem FZ || C_max with Z flow shops to an input instance of the MMSS schedulability problem that has Z mutex locks with a flow-shop compatible access pattern. The conditions in Theorems 1 and 2 for different scheduling paradigms with respect to the constraint on M remain the same.

Proof. The proof is identical to the proofs of Theorems 1 and 2. The additional condition is that the Z mutex locks are accessed by following their indexes, starting from 1.

The above theorems show that the computational complexity of the MMSS schedulability problem is almost independent of the number of processors (i.e., adding processors may not be helpful) and the underlying scheduling paradigm. The fundamental problem is the sequencing of the critical sections.

We can now reach the computational complexity of the
MMSS schedulability problem when Z ≥ for small M .For completeness, we state the following lemma. Lemma 1.
The
MMSS schedulability problem is in N P .Proof. Since the feasibility of a given schedule for the
MMSS schedulability problem can be verified in polynomial-time,it is in N P .The following four theorems are based on the reductionsin Theorem 1 and Theorem 3. In general, even very specialcases are N P -complete in the strong sense. Theorem 4.
Under the semi-partitioned scheduling paradigm, the MMSS schedulability problem is NP-complete in the strong sense when Z = M = 2.

Proof. The job shop scheduling problem J2 || C_max with 2 shops is NP-complete in the strong sense [28]. Together with Theorem 1, we conclude the theorem.

The MMSS schedulability problem is also difficult when all computation segments have the same execution time.
Theorem 5. Under the semi-partitioned scheduling paradigm, the MMSS schedulability problem is NP-complete in the strong sense when Z = M = 3 and C_{i,j} = 1 for any computation segment θ_{i,j}.

Proof. The job shop scheduling problem J3 | p_{i,j} = 1 | C_max with unit execution time on 3 shops is NP-complete in the strong sense [28]. Together with Theorem 1, we conclude the theorem.

The following theorem shows that the MMSS schedulability problem is also difficult when there are just three tasks, three mutex locks, and three processors.
Theorem 6. The MMSS schedulability problem is NP-complete in the strong sense when n = Z = M = 3.

Proof. The job shop scheduling problem J3 | n = 3 | C_max with 3 jobs (with multiple operations) on 3 shops is NP-complete in the strong sense [39]. Together with Theorem 1, we conclude the theorem for the semi-partitioned scheduling paradigm. For the partitioned scheduling paradigm, since there are exactly 3 tasks, 3 processors, and 3 mutex locks, the computational complexity remains the same, as a semi-partitioned schedule can be mapped to a partitioned schedule.

Theorem 7.
Under the semi-partitioned scheduling paradigm, the MMSS schedulability problem for flow-shop compatible access patterns is NP-complete in the strong sense when Z = M = 3.

Proof. The flow shop scheduling problem F3 || C_max with 3 shops is NP-complete in the strong sense [17]. Together with Theorem 3, we conclude the theorem.

Chen et al. [11] showed that a special case of the MMSS makespan problem is NP-hard in the strong sense when a task has only one critical section and M is sufficiently large. The following theorem shows that the MMSS schedulability problem is NP-complete when there are only two critical sections per task and the critical sections have unit execution time.

Theorem 8.
The MMSS schedulability problem is NP-complete in the strong sense when Z = 1, η_i ≥ 3 for every τ_i ∈ T, C_{i,j} = 1 for every computation segment θ_{i,j} with λ_{i,j} = 1, and M ≥ n.

Proof. The problem is in NP, since the feasibility of a given schedule can be verified in polynomial time. Similar to the proof of Theorem 1, we show a polynomial-time reduction from the master-slave scheduling problem with unit execution time on the master [44]. Assume a given input instance with n jobs of the master-slave scheduling problem:
• We assume a sufficient number of slaves, but only one master that can be modeled as a uniprocessor.
• A job i has a chain of three sub-jobs, in which the first and third sub-jobs have to be executed on the master and the second sub-job has to be executed on a slave.
• The processing time of the first and third sub-jobs of a job i is 1. The processing time of the second sub-job of a job i is O_i > 0.
The decision version of the master-slave scheduling problem is to decide whether there is a schedule whose makespan is no more than a given target D, which is NP-complete in the strong sense [44]. The master-slave scheduling problem is equivalent to the uniprocessor self-suspension problem with two computation segments and one suspension interval. The polynomial-time reduction to the MMSS schedulability problem is as follows:
• There are M ≥ n processors.
• There is one mutex lock.
• For a job i of the input instance of the master-slave scheduling problem, we create a task τ_i, which is composed of three computation segments. The execution times are C_{i,1} = C_{i,3} = 1 and C_{i,2} = O_i. Computation segments θ_{i,1} and θ_{i,3} are critical sections guarded by the only mutex lock. Computation segment θ_{i,2} is a non-critical section.
• The deadline of each task is D and the period is T_i = D.
It is not difficult to prove that a feasible schedule ρ for the original input of the master-slave scheduling problem exists if and only if there exists a feasible schedule ρ′ for the reduced input of the MMSS schedulability problem. The details are omitted due to space limitations.

THE DGA BASED ON JOB/FLOW SHOP
In this section, we detail the DGA for tasks with multiple critical sections, which uses job shop scheduling to construct a dependency graph.
• In the first step, we construct a directed acyclic graph G = (V, E). For each sub-job θ_{i,j} of task τ_i in T, we create a vertex in V. The sub-job θ_{i,j} is a predecessor of θ_{i,j+1} for j = 1, 2, ..., η_i − 1. Suppose that Θ_z is the set of the computation segments that are critical sections guarded by mutex lock z, i.e., Θ_z ← {θ_{i,j} | λ_{i,j} = 1 and σ_{i,j} = z}. For each z = 1, 2, ..., Z, the subgraph of the computation segments in Θ_z is a directed chain, which represents the total execution order of these computation segments.
• In the second step, we construct a schedule of G on M processors, either global or partitioned, and either preemptive or non-preemptive.
For a directed acyclic graph G, a critical path of G is a longest path of G, and its length is denoted by len(G). We now explain how to reduce an input instance I_MS of the MMSS makespan problem to an input instance I_JS of the job shop scheduling problem J^{Z+n} || C_max.
• We create Z + n shops:
– Shop z ∈ {1, 2, ..., Z} is exclusively used to execute critical sections guarded by mutex lock z. That is, only critical sections θ_{i,j} with λ_{i,j} = 1 and σ_{i,j} = z (i.e., θ_{i,j} ∈ Θ_z) can be executed on shop z.
– Shop Z + i is exclusively used to execute the non-critical sections of task τ_i. That is, only non-critical sections θ_{i,j} with λ_{i,j} = 0 can be executed on shop Z + i.
• The operation of each computation segment θ_{i,j} is assigned to the corresponding shop, and its processing time is the same as the segment's execution time, i.e., C_{i,j}.
Suppose that ρ_JS is a feasible job shop schedule for I_JS. Since ρ_JS is non-preemptive, the operations on a shop are executed sequentially in ρ_JS.
The construction of the dependency graph G sets the precedence constraints of Θ_z by following the total order in which the operations are executed on shop z, i.e., the shop dedicated to Θ_z in ρ_JS.
Once the dependency graph G is constructed, a schedule ρ_MS of the original input instance I_MS can be generated by applying any scheduling algorithm to schedule G, as already detailed in [11], [36]. Specifically, for semi-partitioned scheduling, the LIST-EDF in [36], based on classical list scheduling by Graham [19], can be applied: whenever a processor idles and at least one sub-job is eligible, the sub-job with the earliest deadline starts its execution on that processor. Additionally, its partitioned extension in [37] (P-EDF) can be applied to generate a partitioned schedule.

Fig. 1. An example of the DGA based on job shop scheduling: (a) a dependency graph of a task set with two binary semaphores (rectangles are critical sections of mutex lock 1/mutex lock 2, circles are non-critical sections); (b) the job shop schedule with 6 shops; (c) the schedule of the dependency graph on 2 processors using LIST-EDF.
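The first step of this construction (per-task segment chains plus one chain per mutex lock) and the computation of len(G) can be sketched in a few lines of Python. The task data below are hypothetical, and `lock_orders` stands in for the per-lock total orders read off the corresponding shops of a job shop schedule:

```python
from collections import defaultdict

def build_dependency_graph(tasks, lock_orders):
    """Step 1 of the DGA: per-task segment chains plus one chain per mutex lock.
    tasks: {task_id: [(wcet, lock_id_or_None), ...]} in execution order.
    lock_orders: {lock_id: [(task_id, seg_idx), ...]} total order per lock,
    e.g. read off the shop dedicated to that lock in a job shop schedule."""
    wcet, edges = {}, []
    for tau, segs in tasks.items():
        for j, (c, _lock) in enumerate(segs):
            wcet[(tau, j)] = c
            if j > 0:                        # theta_{i,j} precedes theta_{i,j+1}
                edges.append(((tau, j - 1), (tau, j)))
    for order in lock_orders.values():       # chain the critical sections of each lock
        edges.extend(zip(order, order[1:]))
    return wcet, edges

def critical_path_length(wcet, edges):
    """len(G): length of a longest path in the DAG (memoized DFS)."""
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    memo = {}
    def longest_from(v):
        if v not in memo:
            memo[v] = wcet[v] + max((longest_from(w) for w in succ[v]), default=0)
        return memo[v]
    return max(longest_from(v) for v in wcet)

# hypothetical instance: two tasks, one lock 'z' guarding each middle segment
tasks = {1: [(2, None), (3, 'z'), (1, None)],
         2: [(1, None), (4, 'z'), (2, None)]}
wcet, edges = build_dependency_graph(tasks, {'z': [(1, 1), (2, 1)]})
print(critical_path_length(wcet, edges))  # -> 11
```

The longest path here runs through both critical sections of lock 'z' (2 + 3 + 4 + 2 = 11), illustrating how the chosen lock order determines len(G).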
We assume that each computation segment/sub-task executes for exactly its WCET in all releases, i.e., early completion is forbidden; thus the schedule generated for one hyper-period is static and repeated periodically. Accordingly, an exact schedulability test is performed by simply evaluating the LIST-EDF or P-EDF schedule over one hyper-period and checking whether there is any deadline miss. Since the schedule is static and repeated periodically, there are no dynamics that can lead to the multiprocessor anomalies pointed out by Graham [19].
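Such an evaluation can be sketched as a small event-driven simulator, assuming a non-preemptive LIST-EDF in which a sub-job becomes eligible once all its predecessors in G have finished; the graph and deadlines below are hypothetical:

```python
import heapq
from collections import defaultdict

def list_edf_simulate(wcet, edges, deadline, M):
    """Non-preemptive list scheduling with EDF tie-breaking on M processors
    (a sketch of evaluating a LIST-EDF schedule over one hyper-period).
    Returns {sub_job: finish_time}; compare against `deadline` to detect misses."""
    succ = defaultdict(list)
    indeg = {v: 0 for v in wcet}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    ready = [(deadline[v], v) for v in wcet if indeg[v] == 0]
    heapq.heapify(ready)
    running, t, finish = [], 0, {}
    while ready or running:
        # fill idle processors with eligible sub-jobs, earliest deadline first
        while ready and len(running) < M:
            _, v = heapq.heappop(ready)
            heapq.heappush(running, (t + wcet[v], v))
        t, v = heapq.heappop(running)        # advance to the next completion
        finish[v] = t
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                heapq.heappush(ready, (deadline[w], w))
    return finish

# hypothetical dependency graph a->c, b->d on M = 2 processors
wcet = {'a': 2, 'b': 3, 'c': 2, 'd': 1}
edges = [('a', 'c'), ('b', 'd')]
deadline = {v: 10 for v in wcet}
finish = list_edf_simulate(wcet, edges, deadline, 2)
assert all(finish[v] <= deadline[v] for v in wcet)   # no deadline miss
```

Because the simulated schedule is static, running this check once per hyper-period is an exact schedulability test under the stated no-early-completion assumption.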
To demonstrate the workflow of our approach, we provide an illustrative example in Fig. 1. Consider a frame-based task set consisting of four tasks and two shared resources, where all tasks have the same period, i.e., T_i = 25. Each task consists of five segments: two critical sections (rectangles in Fig. 1 (a)) and three non-critical sections (circles). The computation segments within one task have to be executed sequentially, following the predefined order (black solid arrows in Fig. 1 (a)). Each critical section accesses one of the shared resources, each protected by its own mutex lock. The numbers in the circles and rectangles are the execution times of the corresponding computation segments. To construct a dependency graph for the task set, we apply job shop scheduling with exclusively assigned shops: shops 1 and 2 are for the critical sections of the two shared resources, and shops 3 to 6 are for the non-critical sections of tasks τ_1 to τ_4. Hence, once a task needs to access a shared resource, its execution is migrated to the corresponding shop, e.g., to shop 1 for resource 1. This input instance for J^{Z+n} || C_max is I_JS.
Fig. 1 (b) shows a job shop schedule for I_JS. The execution order for shared resources 1 and 2 on shops 1 and 2 follows the precedence constraints in Fig. 1 (a), where dashed red directed edges represent the precedence constraints of mutex lock 1 and dotted blue directed edges represent the precedence constraints of mutex lock 2. The resulting schedule is shown in Fig. 1 (c), where the LIST-EDF presented in [36] is adopted to generate the schedule on two processors.
We now prove the equivalence of a schedule of I_JS and a directed acyclic graph G for I_MS.

Lemma 2.
Suppose that there is a directed acyclic graph G for I_MS whose critical path length is len(G). Then there is a job shop schedule for I_JS whose makespan is len(G).
Proof. The lemma is proved by constructing a job shop schedule ρ_JS for I_JS whose makespan is len(G). Let L_{i,j} be the length of the longest path in G that ends at vertex θ_{i,j} ∈ V. There are two cases to schedule θ_{i,j} in ρ_JS:
• If θ_{i,j} is a non-critical section, ρ_JS schedules the operation on shop Z + i from time L_{i,j} − C_{i,j} to L_{i,j}.
• If θ_{i,j} is a critical section guarded by mutex lock z, ρ_JS schedules the operation on shop z from time L_{i,j} − C_{i,j} to L_{i,j}.
This schedule has a makespan of len(G) by construction. It remains to prove that it is a feasible job shop schedule for I_JS. Suppose for contradiction that ρ_JS is not feasible. This is only possible if ρ_JS makes a conflicting decision to schedule two operations at the same time t on a shop z. There are two cases:
1) z is a shop exclusively reserved for the non-critical sections of a task τ_i. This contradicts the definition of G, since the non-critical sections of task τ_i form a total order in G.
2) z is a shop for the critical sections guarded by mutex lock z. This contradicts the definition of G, since the critical sections in Θ_z form a total order in G.
In both cases, we reach a contradiction. Therefore, ρ_JS is a feasible job shop schedule with a makespan of len(G).

Lemma 3.
Suppose that there is a job shop schedule for I_JS whose makespan is ∆. Then there is a directed acyclic graph G for I_MS whose critical path length is at most ∆.
Proof. The lemma is proved by constructing a graph G for I_MS whose critical path length is at most ∆. By the definition of G, the sub-job θ_{i,j} is a predecessor of θ_{i,j+1} for j = 1, 2, ..., η_i − 1 for every task τ_i. For the sub-jobs in Θ_z, we define their total order and form a chain in G by following the execution order on shop z in the given schedule ρ_JS for I_JS. Such a graph G must be acyclic; otherwise, ρ_JS would not be a valid job shop schedule for I_JS.
We now prove that the critical path length len(G) of G is no more than ∆. Suppose for contradiction that len(G) > ∆. The critical path of G defines a total order on the execution of the computation segments in the critical path, which follows exactly the total order of the operations of a job and a shop in ρ_JS. This contradicts the fact that the makespan of ρ_JS for I_JS is ∆.
Based on Lemmas 2 and 3, we get the following theorem:

Theorem 9. An a-approximation algorithm for the job shop scheduling problem J^{Z+n} || C_max can be used to construct a dependency graph G with len(G) ≤ a × len(G∗), where G∗ is a dependency graph with the shortest critical path length for the input instance I_MS of the MMSS makespan problem.
Proof.
Suppose that ∆∗ is the optimal makespan for I_JS. By Lemma 2, we know that ∆∗ ≤ len(G∗). By Lemma 3, we know that ∆∗ ≥ len(G∗). Therefore, ∆∗ = len(G∗). Suppose that the algorithm derives a solution for I_JS with makespan ∆. By the a-approximation for I_JS, we know ∆ ≤ a × ∆∗. Therefore, by Lemma 3 and the above discussion, len(G) ≤ ∆ ≤ a × ∆∗ = a × len(G∗).

Lemma 4.
Let G∗ be defined as in Theorem 9. The optimal makespan for the input instance I_MS of the MMSS makespan problem is at least

  max{ (Σ_{τ_i ∈ T} C_i) / M , len(G∗) }    (8)

Proof.
The lower bound (Σ_{τ_i ∈ T} C_i) / M holds since the total workload must be executed on M processors (a pigeonhole argument). The lower bound len(G∗) holds by the definition of the critical path, even with an infinite number of processors.

Theorem 10.
Applying list scheduling to a dependency graph G with len(G) ≤ a × len(G∗) results in a schedule with an approximation ratio of a + 1 for the MMSS makespan problem under semi-partitioned scheduling, where G∗ is defined as in Theorem 9.
Proof. According to Theorem 1 and Section 4 in [19], by applying list scheduling, the makespan of I_MS for the MMSS makespan problem is at most

  len(G) + (Σ_{τ_i ∈ T} C_i) / M ≤ a × len(G∗) + (Σ_{τ_i ∈ T} C_i) / M ≤ (a + 1) × max{ (Σ_{τ_i ∈ T} C_i) / M , len(G∗) }.

The resulting schedule is a semi-partitioned schedule, since two computation segments of a task can be executed on different processors. By Lemma 4, we conclude the theorem.
Since the 1950s [10], [27], job/flow shop scheduling problems have been extensively studied. Although these problems are NP-complete in the strong sense (even for very restrictive cases), algorithms with different properties have been reported in the literature. If time complexity is not a major concern, applying constraint programming, mixed integer linear programming (MILP), or branch-and-bound heuristics can derive optimal solutions for the job shop scheduling problem, i.e., a = 1. In such a case, based on Theorem 10, our DGA has an approximation ratio of 2 for the MMSS makespan problem.
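The bound len(G) + Σ C_i / M used in the proof can be checked empirically with any work-conserving list scheduler; the sketch below uses a random synthetic DAG and an arbitrary priority order (all instance data are made up):

```python
import heapq
import random
from collections import defaultdict

def graham_list_schedule(wcet, edges, M):
    """Work-conserving list scheduling of a DAG on M processors; returns the makespan."""
    succ, indeg = defaultdict(list), {v: 0 for v in wcet}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    ready = [v for v in wcet if indeg[v] == 0]
    running, t = [], 0
    while ready or running:
        while ready and len(running) < M:     # fill idle processors
            v = ready.pop()                   # arbitrary priority order
            heapq.heappush(running, (t + wcet[v], v))
        t, v = heapq.heappop(running)         # next completion
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    return t

def len_G(wcet, edges):
    """Critical-path length of the DAG."""
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    memo = {}
    def f(v):
        if v not in memo:
            memo[v] = wcet[v] + max((f(w) for w in succ[v]), default=0)
        return memo[v]
    return max(f(v) for v in wcet)

random.seed(1)
M = 4
wcet = {v: random.randint(1, 9) for v in range(20)}
edges = [(u, v) for u in range(20) for v in range(u + 1, 20) if random.random() < 0.1]
bound = len_G(wcet, edges) + sum(wcet.values()) / M
assert graham_list_schedule(wcet, edges, M) <= bound   # Graham's bound holds
```

The assertion holds for any priority order, which is exactly why the (a + 1) ratio in Theorem 10 does not depend on how ties are broken during list scheduling.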
At first glance, it may seem impractical to reduce the MMSS makespan problem to another very challenging problem, i.e., job shop scheduling, in the first step of our DGA. However, an advantage of considering the job shop scheduling problem is that it has been extensively studied in the literature; related results can be applied directly, and commercial tools, like the Google OR-Tools, can be utilized, as we did in our evaluation. In addition, due to Lemma 2, constructing a good dependency graph implies a good schedule for I_JS.
The last n shops, i.e., shops Z + 1, Z + 2, ..., Z + n in I_JS, are in fact created only to match the standard job shop scheduling problem. From the literature on flow and job shop scheduling, we know that these additional n shops can be removed by introducing delays (l_{i,j} in Sec. 2.3). If the first computation segment θ_{i,1} of task τ_i is a non-critical section, this implies a non-zero release time r_i of task τ_i in I_JS.
In our Google OR-Tools implementation for solving I_JS, the no-overlap constraint has to be taken into consideration from both the machine and the job perspective. For each machine, it prevents jobs assigned to the same machine from overlapping in time. For each job, it prevents sub-jobs of the same job from overlapping in time. The first constraint can be achieved by applying the AddNoOverlap method, supported by default in Google OR-Tools, for each machine. For the second constraint, instead of creating n + Z shops, we utilize the above concept by creating only Z shops and adding proper delays between the operations. We configure the start time (denoted as θ_{i,j}.start) of a computation segment based on the end time (denoted as θ_{i,j}.end) of an earlier computation segment. For notational brevity, we assign θ_{i,1}.start ≥ 0 and θ_{i,0}.end = 0.
For any j ≥ 2 with λ_{i,j} = 1:

  θ_{i,j}.start ≥ θ_{i,j−1}.end                 if λ_{i,j−1} is 1
  θ_{i,j}.start ≥ θ_{i,j−2}.end + C_{i,j−1}     if λ_{i,j−1} is 0    (9)

In other words, if θ_{i,j−1} is a non-critical section, its execution time C_{i,j−1} is added as a delay to the end (finishing) time of θ_{i,j−2}; otherwise, θ_{i,j} starts after the end time of θ_{i,j−1}. Hence, a proper job shop scheduling problem for I_JS is J^Z | r_j, l_j | C_max, i.e., scheduling of jobs with release times and delays between operations on Z shops. An a-approximation algorithm for the problem J^Z | r_j, l_j | C_max can be used to construct a dependency graph. This problem is not widely studied, and only few results can be found in the literature. For a task system with a flow-shop compatible access pattern, i.e., when the Z mutex locks have a pre-defined total order, the instance I_JS is in fact a flow shop problem. For a special case with three computation segments per task, in which the second segment is a non-critical section and the first and third segments are critical sections of mutex locks 1 and 2, respectively, the constructed input I_JS is a two-stage flow
3. https://developers.google.com/optimization/
shop problem with delays, i.e., F2 | l_j | C_max. For the problem F2 | l_j | C_max, several polynomial-time approximation algorithms are known: Karuno and Nagamochi [24] developed a 2-approximation, Ageev [1] developed a 1.5-approximation for a special case in which C_{i,1} = C_{i,3} for every task τ_i, and Zhang and van de Velde [45] proposed polynomial-time approximation schemes (PTASes), i.e., (1 + ε)-approximations for any ε > 0. Specifically, Zhang and van de Velde presented PTASes for different settings of the job/flow shop scheduling problem in [45]. For any such scenario, the approximation ratio of the DGA is at most 2 + ε for any ε > 0, according to Theorem 10.
The treatment used in [36] to construct dependency graphs can also be applied here. That is, we unroll the jobs of all tasks in one hyper-period and then construct a dependency graph of these jobs. Since the jobs of one task must not overlap in execution, we only need one dedicated shop for them. Therefore, there are two modifications of the job shop scheduling problem considered in Sec. 4 (the studied problem is J^{Z+n} | r_j, l_{i,j} | L_max):
• For the ℓ-th job of task τ_i, we set its release time to (ℓ − 1) T_i and its absolute deadline to (ℓ − 1) T_i + D_i.
• Instead of optimizing the makespan, the objective is to minimize the maximum lateness.
In the end, the schedules are generated offline by applying LIST-EDF or P-EDF, similar to frame-based task systems.
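For a single task considered in isolation, the delay-based constraints of Eq. (9) induce earliest possible start times for its critical sections (blocking on the shops is ignored here); a small sketch with a hypothetical segment list:

```python
def earliest_starts(segments):
    """segments: list of (wcet, is_critical) for one task, in execution order,
    assuming no two consecutive non-critical sections.
    Returns the earliest start time of each critical section when non-critical
    sections are modeled as release delays (cf. Eq. (9)), ignoring blocking."""
    end = 0            # end time of the previous critical section (theta_{i,0}.end = 0)
    pending_delay = 0  # accumulated non-critical execution since that end time
    starts = {}
    for j, (c, is_cs) in enumerate(segments):
        if is_cs:
            starts[j] = end + pending_delay
            end = starts[j] + c
            pending_delay = 0
        else:
            pending_delay += c     # non-critical execution becomes a delay
    return starts

# hypothetical task: non-critical (2), critical (3), non-critical (1), critical (4)
print(earliest_starts([(2, False), (3, True), (1, False), (4, True)]))  # -> {1: 2, 3: 6}
```

Note that a leading non-critical section simply shifts the first critical section, which is exactly the non-zero release time r_i mentioned above.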
IMPLEMENTATION AND OVERHEADS
In this section, we present details on how we implemented the dependency graph approach in LITMUS^RT to support multiple critical sections per task. Afterwards, the implementation overheads are compared with those of the Flexible Multiprocessor Locking Protocol (FMLP) [4] provided by LITMUS^RT, for both partitioned and global scheduling.
When implementing our approach in LITMUS^RT, we can either apply the table-driven scheduling that LITMUS^RT provides, or implement a new binary semaphore which enforces the execution order of critical sections that access the same resource, since this order is defined in advance by the dependency graph. For table-driven scheduling, a static scheduling table can be generated over one hyper-period and repeated periodically; this table determines which sub-job is executed on which processor at each point in time in the hyper-period. However, due to the large number of sub-jobs in one hyper-period and the possible migrations among processors, the resulting table can be very large. To avoid this problem, we decided to implement a new binary semaphore that supports all the properties of our new approach instead.
Since our approach is an extension of the DGA by Chen et al. [11] and Shi et al. [36], our implementation is based on the source code the authors provided online [35]. That is, it is implemented under the plug-in Partitioned EDF with synchronization support (PSN-EDF), called P-DGA-JS, and the plug-in Global EDF with synchronization support (GSN-EDF), called G-DGA-JS. The EDF feature is guaranteed by the original design of these two plug-ins.
Therefore, we only need to provide the relative deadlines of all sub-jobs of each task, and LITMUS^RT automatically updates the absolute deadlines accordingly during runtime.
In order to enforce that the sub-jobs follow the execution order determined by the dependency graph, our implementation has to: 1) let all sub-jobs inside one job follow the predefined order; and 2) force all sub-jobs that access the same resource to follow the order determined by the graph. The first order is ensured in LITMUS^RT by default. The task deployment tool rtspin provided by the user-space library liblitmus defines the task structure, e.g., the execution order of non-critical sections and critical sections within one task, the related execution times, and the resource ID that each critical section accesses. Moreover, the resource ID for each critical section is parsed by rtspin, so each critical section can find the correct semaphore to lock, and our implementation does not have to further consider addressing the corresponding resources. Afterwards, rtspin emulates the workload on a CPU according to the task set. A sub-job can only be released when its predecessor (if any) has finished its execution. Please note that, for sub-jobs related to critical sections, the release time is not only defined by the finish time of the predecessor inside the same job, but also by another predecessor that accesses the same resource (if one exists).
A ticket system with a general concept similar to [35] is applied to enforce the execution order. However, due to the different task structure, which allows multiple critical sections per task, additional parameters had to be introduced and the structure of existing parameters had to be revised compared to [35]. To be precise, we extended the LITMUS^RT data structure rt_params that describes tasks, e.g., priority, period, and execution time, by adding:
• total_jobs: an integer that defines the number of jobs of the related task in one hyper-period.
• total_cs: an integer that defines the number of critical sections in this task.
• job_order: an array that defines the total order of the sub-jobs related to critical sections that access the same resource over one hyper-period. In addition, the last Z elements record the total number of critical sections of the task set for each shared resource. Thus, the length of the array is the number of critical sections in one hyper-period plus the number of shared resources, i.e., len(job_order) = total_jobs × total_cs + Z.
• current_cs: an integer that defines the index of the current critical section of the task that is being executed.
• relative_ddls: an array that records the relative deadlines of all sub-jobs of one task.
Furthermore, we implemented a new binary semaphore, named mdga_semaphore, to make sure that the execution order of all sub-jobs that access the same resource follows the order specified by the dependency graph. The semaphore has the following common components:
• litmus_lock protects the semaphore structure,

TABLE 1. An example of the data structure (total_jobs, total_cs, job_order, and current_cs) for tasks τ_1 to τ_5.

• semaphore_owner defines the current holder of the semaphore, and
• wait_queue stores all jobs waiting for this semaphore.
A new parameter named serving_ticket is added to control the non-work-conserving access pattern of the critical sections, i.e., a job can only lock the semaphore and start its critical section if it holds the ticket that equals the corresponding serving_ticket.
The pseudo code in Algo. 1 shows the three main functions of our implementation. The function get_cs_order returns the position of the sub-job in the execution order of all sub-jobs that access the same shared resource during runtime. In LITMUS^RT, job_no counts the number of jobs that one task has released. In order to find the exact position of this job in one hyper-period, we apply a modulo operation on job_no and total_jobs.
Since a job has multiple critical sections and current_cs represents the position of the current critical section within a job, the index is calculated by counting the critical sections of the previous jobs in the hyper-period plus current_cs of this job. After that, the value of cs_order is retrieved from job_order based on the obtained index.
We provide an example with 5 tasks that share two resources. The four tasks τ_1, τ_2, τ_3, and τ_4 are identical to Fig. 1, and task τ_5 has a period T_5 = 50 and the same access pattern as τ_2, i.e., it requests resource 2 in its second segment and resource 1 in its fourth segment. Hence, the hyper-period of this task set is 50; τ_1, τ_2, τ_3, and τ_4 release two jobs in one hyper-period, and τ_5 releases one job in one hyper-period. The related data structure is shown in Table 1. Task τ_2 has job_order = [1, 3, 6, 8, 9, 9]: the first two elements, i.e., [1, 3], represent that the two critical sections of J_1 have the execution orders 1 and 3, respectively; the following two elements, i.e., [6, 8], denote the execution orders of the two critical sections of J_2 in one hyper-period; and the last two elements, i.e., [9, 9], show the number of jobs that request the related resources: for both resource 1 and resource 2, nine jobs request the resource in one hyper-period. Assume that job_no for τ_2 is 13. Line 1 in Algo. 1 returns current_jobno, which represents the corresponding relative position in one hyper-period, i.e., the 13th job of τ_2 is the second job of τ_2 in the current hyper-period. Then line 2 finds the index of the corresponding critical section, e.g., the second critical section of the second job of τ_2 has the index 3. In the end, the corresponding execution order is found in job_order according to line 3 in Algo. 1. Therefore, the 13th job of τ_2 is granted access to the corresponding resource when it holds the execution order 8.

Algorithm 1: DGA with multiple critical sections
Input: arriving task τ_i {job_no, total_jobs, total_cs, current_cs, relative_ddls}, and requested semaphore s_z {semaphore_owner, serving_ticket, wait_queue}

Function get_cs_order():
1: current_jobno ← τ_i.job_no mod τ_i.total_jobs;
2: index ← current_jobno × τ_i.total_cs + current_cs;
3: cs_order ← τ_i.job_order[index];

Function mdga_lock():
   if s_z.semaphore_owner is NULL and s_z.serving_ticket equals τ_i.cs_order then
      s_z.semaphore_owner ← τ_i;
      update the deadline for τ_i;
      τ_i starts the execution of its critical section;
   else
      add τ_i to s_z.wait_queue;

Function mdga_unlock():
   τ_i releases the semaphore lock;
   update the deadline for τ_i;
   τ_i.current_cs++;
   if τ_i.current_cs = total_cs then
      set τ_i.current_cs ← 0;
   s_z.serving_ticket++;
   if s_z.serving_ticket = num_cs then
      set s_z.serving_ticket ← 0;
   τ_next ← head of s_z.wait_queue (if it exists);
   if s_z.serving_ticket equals τ_next.cs_order then
      s_z.semaphore_owner ← τ_next;
      τ_next starts the execution of its critical section;
   else
      s_z.semaphore_owner ← NULL;
      add τ_next back to s_z.wait_queue;

The function mdga_lock is called in order to lock the semaphore and get access to the corresponding resource. After getting the correct position in the execution order in one hyper-period by applying get_cs_order(), the semaphore's ownership is checked. If the semaphore is occupied by another job at that moment, the newly arriving job is added to the wait_queue directly; otherwise, the semaphore's serving_ticket and the job's cs_order are compared. If they are equal, the semaphore's owner is set to that job, and the job starts its critical section; otherwise, the job is added to the wait_queue as well. In our setting, the wait_queue is sorted by the jobs' cs_order, i.e., the job with the smallest cs_order is the head of the waiting queue. Hence, only the head of the wait_queue has to be checked when the current semaphore owner finishes its execution, rather than the whole unsorted wait_queue.
The function mdga_unlock is called once a job has finished its critical section and tries to unlock the semaphore. The task's current_cs is increased by one to point to the next possible critical section of this job. If current_cs reaches total_cs, which means that all critical sections of this job have finished their execution, current_cs is reset to zero. Next, the semaphore's serving_ticket is increased by one, i.e., the semaphore is ready to be obtained by the successor in the dependency graph. If serving_ticket reaches the total number of critical sections related to this resource in one hyper-period, i.e., num_cs, the dependency graph has been traversed completely, i.e., all sub-jobs that access the related resource have finished the execution of their critical sections in the current hyper-period, and serving_ticket is reset to 0 to start the next iteration. Please note that num_cs can be found in the last Z elements of job_order according to the related resource ID. After that, the first job (if any) in the wait_queue, named τ_next, is checked.
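The index computation of get_cs_order can be mirrored in a few lines of Python; the numbers below are the ones from the worked example (job_no 13, two jobs per hyper-period, two critical sections per job):

```python
def get_cs_order(job_no, total_jobs, total_cs, current_cs, job_order):
    """Position of the current critical section in the per-resource total
    order over one hyper-period (a sketch of get_cs_order)."""
    current_jobno = job_no % total_jobs          # position of this job in the hyper-period
    index = current_jobno * total_cs + current_cs
    return job_order[index]

# second critical section (current_cs = 1) of the 13th released job
order = get_cs_order(job_no=13, total_jobs=2, total_cs=2, current_cs=1,
                     job_order=[1, 3, 6, 8, 9, 9])
print(order)  # -> 8
```

Only the semaphore whose serving_ticket has reached this value (8 here) grants the lock, which is what serializes the critical sections in the graph-defined order.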
If τ_next has a cs_order that equals the semaphore's serving_ticket, the semaphore's owner is set to τ_next, and τ_next starts the execution of its critical section. Otherwise, the semaphore owner is set to NULL, and τ_next is put back into the corresponding wait_queue.
Additionally, each sub-job has its own modified deadline, which means that a job can have different deadlines while executing different segments. Therefore, we have to take care of the deadline updates in the implementation. When we deploy a task using rtspin, we deliver the relative deadline of its first sub-task as the relative deadline of the whole task. Since no two consecutive non-critical sections are allowed in the task model, once a sub-job finishes its execution, either mdga_lock or mdga_unlock is called. If mdga_lock is called, the new critical section's deadline is updated by searching relative_ddls; if mdga_unlock is called, only the finished critical section updates the related job's deadline for its successor (if any), since τ_next's deadline was already updated when it tried to lock the semaphore.
The implementations for the global and partitioned plug-ins are similar. However, due to the frequent preemptions and/or interrupts in global scheduling, preemption has to be disabled during the execution of the semaphore-related functions in order to protect their functionality.
We evaluated the overheads of our implementation on the following platform: a cache-coherent SMP consisting of two 64-bit Intel Xeon E5-2650L v4 processors, with 35 MB cache and 64 GB main memory. The FMLP supported in LITMUS^RT was also evaluated for comparison, including P-FMLP for partitioned scheduling and G-FMLP for global scheduling. These four protocols were evaluated using the same task sets, where each task has multiple critical sections. The overheads that we tracked are:
• CXS: context-switch overhead.
• RELEASE: time spent to enqueue a newly released job into a ready queue.
• SCHED: time spent to make a scheduling decision, i.e., to find the next job to be executed.
• SCHED2: time spent to perform post-context-switch and management activities.
• SEND-RESCHED: inter-processor interrupt latency, including migrations.
The overheads are reported in Table 2, which shows that the overheads of our approach are comparable to those of P-FMLP and G-FMLP. Furthermore, the implementations provided in [36], called P-LIST-EDF and G-LIST-EDF, were also evaluated and are reported in Table 2. A direct comparison between P-LIST-EDF and P-DGA-JS (G-LIST-EDF and G-DGA-JS, respectively) is not possible, because they are designed for different scenarios, depending on the number of critical sections per task. The reported overheads in Table 2 for our approach are for task sets with multiple critical sections per task, whilst the overheads for P-LIST-EDF and G-LIST-EDF are for task sets with one critical section per task. Regardless, they are in the same order of magnitude.

TABLE 2. Overheads of the protocols in LITMUS^RT, Max. (Avg.) in µs.

Protocol     CXS           RELEASE        SCHED         SCHED2        SEND-RESCHED
P-FMLP       29.51 (0.98)  17.68 (0.96)   31.85 (1.31)  28.77 (0.18)  66.33 (2.86)
P-DGA-JS     30.65 (1.25)  18.63 (1.02)   31.09 (1.64)  29.43 (0.19)  59.09 (21.06)
G-FMLP       30.51 (1.05)  48.53 (3.75)   45.99 (1.51)  29.62 (0.16)  72.26 (2.50)
G-DGA-JS     26.87 (0.94)  30.01 (2.19)   30.25 (1.02)  19.26 (0.14)  72.53 (21.50)
P-LIST-EDF   18.76 (0.90)  18.98 (1.06)   48.50 (1.33)  29.25 (0.16)  38.30 (1.61)
G-LIST-EDF   30.87 (1.79)  61.63 (12.06)  59.05 (4.46)  27.17 (0.25)  72.09 (20.77)

EVALUATIONS
We evaluated the performance of the proposed approach by applying numerical evaluations for both frame-based and periodic task sets, and by measuring its overheads.
We conducted evaluations on M = 4, 8, and 16 processors. Based on M, we generated synthetic task sets using the RandomFixedSum method [14]. We set Σ_{τ_i ∈ T} U_i = M and enforced an upper bound on the utilization U_i = C_i / T_i of each task. The number of shared resources (binary semaphores) Z was one of three settings. Each task τ_i accesses the available shared resources randomly, at least twice, i.e., Σ_j λ_{i,j} ≥ 2. The total length of the critical sections, Σ_{λ_{i,j}=1} C_{i,j}, is a fraction of the total execution time C_i of task τ_i, depending on H, drawn from one of three percentage ranges starting at 5%, 10%, and 40%, respectively. When considering shared resources in real-time systems, the utilization of critical sections for each task in classical settings is relatively low. However, with the increasing computation demand in real-time systems (e.g., for machine learning algorithms), adopted accelerators, like GPUs, behave like classical shared resources (i.e., they are non-preemptive and mutually exclusive), but have a relatively high utilization. Hence, we chose settings of H that cover the complete spectrum. The total lengths of the critical sections and of the non-critical sections are split into dedicated segments by applying UUniFast [14] separately. For task τ_i, the number of critical sections Num_cs equals Σ_j λ_{i,j}, and the number of non-critical sections is Num_ncs = Num_cs + 1. In the end, the generated non-critical sections and critical sections are combined in pairs (one non-critical section followed by one critical section), and the last segment is the remaining non-critical section. We evaluated all resulting 27 combinations of M, Z, and H.
The dependency graph is generated by applying:
1) The method in Sec. 4 with the objective to minimize the makespan, denoted as JS. We utilized the constraint programming approach provided by the Google OR-Tools to solve the job shop scheduling problem.
2) The extension to multiple critical sections sketched in [36], denoted as PRP. To check the feasibility of the generated dependency graph, one simulated schedule with respect to the dependency graph is generated.

We name these algorithms by combining:
1) JS/PRP: the two dependency graph generation methods.
2) LEDF/PEDF: to schedule the generated graph, we used either the LIST-EDF in [36] (LEDF) or the partitioned EDF (PEDF) in [37] together with a worst-fit partitioning algorithm.
3) P/NP: preemptive or non-preemptive scheduling of critical sections.

We also compare our approach with the following protocols regarding their schedulability, applying the publicly available tool SET-MRTS [12] with the same naming:
• ROP-PCP [22]: Resource-Oriented Partitioned PCP, which binds the resources to dedicated processors and schedules tasks using semi-partitioned PCP.
• GS-MSRP [41]: the Greedy Slacker (GS) partitioning heuristic for the spin-based locking protocol MSRP [16], using Audsley's Optimal Priority Assignment [2].
• LP-GFP-PIP: LP-based global FP scheduling using the Priority Inheritance Protocol (PIP) [13].
• LP-PFP-DPCP [6]: DPCP [33] with a Worst-Fit-Decreasing (WFD) task assignment strategy [6]. The analysis is based on linear programming (LP).
• LP-PFP-MPCP [6]: MPCP [32] with a WFD task assignment strategy as proposed in [6]. The analysis is based on an LP.
• LP-GFP-FMLP [4]: FMLP [4] for global FP scheduling with an LP-based analysis.
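The per-segment splitting used in the task-set generation above (dividing the total critical- and non-critical-section lengths into dedicated segments with UUniFast) can be sketched as follows; the function name and call pattern are illustrative only:

```python
import random

def uunifast(n, total):
    """UUniFast: draw n non-negative values that sum exactly to
    `total`, uniformly distributed over the simplex."""
    values = []
    remaining = total
    for i in range(n - 1):
        # shrink the remaining budget by a random factor
        remaining_next = remaining * random.random() ** (1.0 / (n - i - 1))
        values.append(remaining - remaining_next)
        remaining = remaining_next
    values.append(remaining)
    return values

# Example: split a total critical-section length of 2.0 time units
# across 3 critical sections of one task.
random.seed(42)
sections = uunifast(3, 2.0)   # three non-negative lengths summing to 2.0
```

Running the generator once per task for the critical sections and once for the non-critical sections (with Num_ncs = Num_cs + 1 segments) yields the alternating segment structure described above.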
Note that a comparison to the original DGA in [11] is not possible,since the approach in [11] is only applicable when there is onecritical section per task.
We also attempted to evaluate the LP-based Priority Inheritance Protocol (PIP) [13], but we were not able to collect complete results because validating a single task set took multiple hours. However, according to [11], [36], [43], the LP-based PIP performs similarly to LP-GFP-FMLP.
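The JS variant above treats each task as a job in a job shop: its operations alternate between the task's own processor (non-critical sections) and the shared resources (critical sections), each of which serves one operation at a time. A minimal pure-Python sketch of this view, using a simple greedy list-scheduling heuristic rather than the constraint-programming solver from Google OR-Tools, and with all names hypothetical:

```python
def greedy_jobshop_makespan(jobs):
    """Each job is an ordered list of (machine, duration) operations;
    every machine (a processor or a shared resource) serves one
    operation at a time. Repeatedly dispatch the ready operation that
    can start earliest (a greedy list-scheduling heuristic)."""
    machine_free = {}                 # machine -> time it becomes free
    job_ready = [0] * len(jobs)       # job -> finish time of its last op
    next_op = [0] * len(jobs)         # job -> index of its next operation
    remaining = sum(len(ops) for ops in jobs)
    while remaining:
        best, best_start = None, None
        for j, ops in enumerate(jobs):
            if next_op[j] == len(ops):
                continue              # job j is finished
            machine, _ = ops[next_op[j]]
            start = max(job_ready[j], machine_free.get(machine, 0))
            if best_start is None or start < best_start:
                best, best_start = j, start
        machine, duration = jobs[best][next_op[best]]
        finish = best_start + duration
        machine_free[machine] = finish
        job_ready[best] = finish
        next_op[best] += 1
        remaining -= 1
    return max(job_ready)

# Two tasks on their own processors sharing one resource R; the two
# critical sections on R must be serialized.
jobs = [[("P0", 1), ("R", 2), ("P0", 1)],
        [("P1", 1), ("R", 2), ("P1", 1)]]
print(greedy_jobshop_makespan(jobs))  # -> 6
```

The exact solver minimizes the makespan (or, for periodic tasks, the maximum lateness) over all dispatching decisions, whereas this sketch only illustrates the operation/machine structure of the encoding.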
For frame-based task systems, we set T = D = 1 for all tasks, i.e., the execution time of each task equals its utilization. We tracked the number of dependency graphs calculated with PRP for which the ratio PRP/JS of the critical-path lengths is less than a certain factor. The results are shown in Fig. 2, where F denotes the number of dependency graphs that are infeasible for the PRP method due to cycle detection. The job-shop based dependency graph generation method clearly outperforms the method extended from the original DGA. In addition, the failure rate of PRP increases with the length of the critical sections, cf. Fig. 2 (a), (b), and (c). The other results show similar trends and are omitted due to space limitations.

Fig. 2. Comparison of critical paths from the two graph generation methods. Panels: (a) M=8, Z=8, H=5%-10%, F=0; (b) M=8, Z=8, H=10%-40%, F=4; (c) M=8, Z=8, H=40%-50%, F=10; (d) M=4, Z=4, H=10%-40%, F=0; (e) M=8, Z=16, H=10%-40%, F=1; (f) M=16, Z=16, H=10%-40%, F=5.

In our schedulability evaluation, we considered synthetic task sets under the aforementioned settings, testing utilization levels from 30% to 100% of M. The acceptance ratios of LP-PFP-DPCP and LP-PFP-MPCP are zero for all configurations, even at low utilization levels. Hence, they are omitted in Fig. 3. Additionally, for readability, we only show PRP-LEDF-P, which performs best among the approaches whose dependency graphs are generated by PRP.

Fig. 3 shows that our approach significantly outperforms the other non-DGA based methods for all evaluated settings, and performs slightly better than the methods using PRP. Fig. 2 and Fig. 3 also show that a better dependency graph, i.e., a shorter critical path, does not always result in better schedulability in the second step of the DGA.
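The critical-path lengths compared in Fig. 2 can be computed with a standard longest-path pass over the dependency DAG in topological order. A sketch under an assumed representation (node durations in a dict, precedence edges as pairs):

```python
from collections import deque

def critical_path_length(durations, edges):
    """Longest (critical) path in a DAG, where durations[v] is the
    execution time of node v and (u, v) in edges means u precedes v."""
    succ = {v: [] for v in durations}
    indeg = {v: 0 for v in durations}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    # earliest finish time of each node, assuming unlimited processors
    finish = {v: durations[v] for v in durations}
    queue = deque(v for v in durations if indeg[v] == 0)
    while queue:
        u = queue.popleft()
        for v in succ[u]:
            finish[v] = max(finish[v], finish[u] + durations[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return max(finish.values())

# Two sub-jobs feeding a common successor: the critical path is b -> c.
print(critical_path_length({"a": 2, "b": 3, "c": 1},
                           [("a", "c"), ("b", "c")]))  # -> 4
```

The critical-path length is a lower bound on the achievable makespan on any number of processors, which is why a shorter critical path gives the list scheduler in the second step more slack (though, as noted above, not always better schedulability).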
Fig. 3. Schedulability of different approaches for frame-based task sets. Panels: (a) M=8, Z=8, H=5%-10%; (b) M=8, Z=8, H=10%-40%; (c) M=8, Z=8, H=40%-50%; (d) M=4, Z=4, H=10%-40%; (e) M=8, Z=16, H=10%-40%; (f) M=16, Z=16, H=10%-40%. Curves: JS-LEDF-P, JS-LEDF-NP, JS-PEDF-P, JS-PEDF-NP, PRP-LEDF-P, ROP-PCP, GS-MSRP, LP-GFP-FMLP.

We applied constraint programming to solve the job shop problem J_Z | r_j, l_j | L_max and construct the dependency graph. We extended the settings for frame-based task sets in Sec. 6.2 to periodic task systems by choosing the period T_i randomly from a small set of semi-harmonic periods, a subset of the periods used in automotive systems [20], [25]. We used a small range of periods to generate reasonable task sets with a high utilization of the critical sections, which are otherwise by default not schedulable.

Due to space limitations, only a subset of the results is presented in Fig. 4. When the utilization of critical sections is high, i.e., H = [40%, 50%] in Fig. 4 (c), or under medium utilization when the numbers of processors and shared resources are relatively high, i.e., M = Z = 16 in Fig. 4 (f), our approaches outperform the other methods significantly. However, when the utilization of critical sections is low, i.e., in Fig. 4 (a) and (b), ROP-PCP outperformed the proposed approaches. The reason is that the constraint program for J_Z | r_j, l_j | L_max minimizes the maximum lateness but ignores the execution order of the sub-jobs that have no influence on the optimal lateness, which may lead to lower performance when the utilization of the non-critical sections is high. When the utilization of critical sections is medium, i.e., H = [10%, 40%], and the number of processors is relatively small, i.e., M ∈ {4, 8}, the newly proposed DGA-based methods and the extension PRP-LEDF-P both outperform all other methods significantly, but their relation differs depending on the utilization value.

CONCLUSION
We have removed an important restriction of the recently developed dependency graph approaches (DGA), namely that each task may have only one critical section. Regarding the computational complexity, we show that the multiprocessor synchronization problem is NP-complete even in very restrictive scenarios, as detailed in Sec. 3. We propose a systematic design flow based on the DGA by using existing algorithms developed for job/flow shop scheduling and provide approximation ratios for the derived makespan.

The evaluation results in Sec. 6.2 show that our approach is very effective for frame-based real-time task systems. Extensions to periodic task systems are presented in Sec. 4.4, and the evaluation results show that our approach achieves significant improvements compared to existing protocols in most evaluated cases, except under light shared-resource utilization. This paper significantly improves the applicability of the DGA by allowing arbitrary numbers of non-nested critical sections per task.

Fig. 4. Schedulability of different approaches for periodic task sets. Panels: (a) M=8, Z=8, H=5%-10%; (b) M=8, Z=8, H=10%-40%; (c) M=8, Z=8, H=40%-50%; (d) M=4, Z=4, H=10%-40%; (e) M=8, Z=16, H=10%-40%; (f) M=16, Z=16, H=10%-40%. Curves: JS-LEDF-P, JS-LEDF-NP, JS-PEDF-P, JS-PEDF-NP, PRP-LEDF-P, ROP-PCP, GS-MSRP, LP-GFP-FMLP.

ACKNOWLEDGMENTS
This paper is supported by DFG, as part of the Collaborative Research Center SFB876, projects A1 and A3 (http://sfb876.tu-dortmund.de/). The authors thank Zewei Chen and Maolin Yang for their tool SET-MRTS [12] (Schedulability Experimental Tools for Multiprocessors Real Time Systems), used to evaluate GS-MSRP, LP-GFP-FMLP, and ROP-PCP in Fig. 3 and Fig. 4.

REFERENCES

[1] A. A. Ageev. A 3/2-approximation for the proportionate two-machine flow shop scheduling with minimum delays. In Approximation and Online Algorithms, 5th International Workshop, WAOA, 2007.
[2] N. C. Audsley. Optimal priority assignment and feasibility of static priority tasks with arbitrary start times. Technical Report YCS-164, Department of Computer Science, University of York, 1991.
[3] T. P. Baker. Stack-based scheduling of realtime processes. Real-Time Systems, 3(1):67–99, 1991.
[4] A. Block, H. Leontyev, B. Brandenburg, and J. Anderson. A flexible real-time locking protocol for multiprocessors. In RTCSA, 2007.
[5] B. Brandenburg. Scheduling and Locking in Multiprocessor Real-Time Operating Systems. PhD thesis, The University of North Carolina at Chapel Hill, 2011.
[6] B. Brandenburg. Improved analysis and evaluation of real-time semaphore protocols for P-FP scheduling. In RTAS, 2013.
[7] B. B. Brandenburg and J. H. Anderson. Optimality results for multiprocessor real-time locking. In RTSS, 2010.
[8] A. Burns and A. J. Wellings. A schedulability compatible multiprocessor resource sharing protocol - MrsP. In Euromicro Conference on Real-Time Systems (ECRTS), pages 282–291, 2013.
[9] J. M. Calandrino, H. Leontyev, A. Block, U. C. Devi, and J. H. Anderson. LITMUS^RT: A testbed for empirically comparing real-time multiprocessor schedulers. In RTSS, 2006.
[10] B. Chen, C. N. Potts, and G. J. Woeginger. A Review of Machine Scheduling: Complexity, Algorithms and Approximability, pages 1493–1641. Springer US, Boston, MA, 1998.
[11] J.-J. Chen, G. von der Brüggen, J. Shi, and N. Ueter. Dependency graph approach for multiprocessor real-time synchronization. In IEEE Real-Time Systems Symposium, RTSS, pages 434–446, 2018.
[12] Z. Chen. SET-MRTS: Schedulability Experimental Tools for Multiprocessors Real Time Systems. https://github.com/RTLAB-UESTC/SET-MRTS-public, 2018.
[13] A. Easwaran and B. Andersson. Resource sharing in global fixed-priority preemptive multiprocessor scheduling. In RTSS, 2009.
[14] P. Emberson, R. Stafford, and R. I. Davis. Techniques for the synthesis of multiprocessor tasksets. In WATERS, pages 6–11, 2010.
[15] D. Faggioli, G. Lipari, and T. Cucinotta. The multiprocessor bandwidth inheritance protocol. In Euromicro Conference on Real-Time Systems (ECRTS), pages 90–99, 2010.
[16] P. Gai, G. Lipari, and M. D. Natale. Minimizing memory utilization of real-time task sets in single and multi-processor systems-on-a-chip. In Real-Time Systems Symposium (RTSS), pages 73–83, 2001.
[17] M. R. Garey and D. S. Johnson. Computers and intractability: A guide to the theory of NP-completeness. W. H. Freeman and Co., 1979.
[18] L. A. Goldberg, M. Paterson, A. Srinivasan, and E. Sweedyk. Better approximation guarantees for job-shop scheduling. SIAM J. Discrete Math., 14(1):67–92, 2001.
[19] R. L. Graham. Bounds on multiprocessing timing anomalies. SIAM Journal of Applied Mathematics, 17(2):416–429, 1969.
[20] A. Hamann, D. Dasari, S. Kramer, M. Pressler, and F. Wurst. Communication centric design in complex automotive embedded systems. 2017.
[21] P.-C. Hsiu, D.-N. Lee, and T.-W. Kuo. Task synchronization and allocation for many-core real-time systems. In International Conference on Embedded Software (EMSOFT), pages 79–88, 2011.
[22] W.-H. Huang, M. Yang, and J.-J. Chen. Resource-oriented partitioned scheduling in multiprocessor systems: How to partition and how to share? In Real-Time Systems Symposium, 2016.
[23] B. Kalyanasundaram and K. Pruhs. Speed is as powerful as clairvoyance. Journal of ACM, 47(4):617–643, July 2000.
[24] Y. Karuno and H. Nagamochi. A better approximation for the two-machine flowshop scheduling problem with time lags. In Algorithms and Computation, 14th International Symposium, 2003.
[25] S. Kramer, D. Ziegenbein, and A. Hamann. Real world automotive benchmark for free. In WATERS, 2015.
[26] K. Lakshmanan, D. de Niz, and R. Rajkumar. Coordinated task scheduling, allocation and synchronization on multiprocessors. In Real-Time Systems Symposium, pages 469–478, 2009.
[27] E. L. Lawler, J. K. Lenstra, A. H. R. Kan, and D. B. Shmoys. Sequencing and scheduling: Algorithms and complexity. Handbooks in Operations Research and Management Science, 4:445–522, 1993.
[28] J. Lenstra and A. Rinnooy Kan. Computational complexity of discrete optimization problems. Ann. Discrete Math., 4, 1979.
[29] M. Mastrolilli and O. Svensson. Hardness of approximating flow and job shop scheduling problems. Journal of the ACM, 58(5):20:1–20:32, Oct. 2011.
[30] F. Nemati, T. Nolte, and M. Behnam. Partitioning real-time systems on multiprocessors with shared resources. In Principles of Distributed Systems - International Conference, pages 253–269, 2010.
[31] C. Phillips, C. Stein, E. Torng, and J. Wein. Optimal time-critical scheduling via resource augmentation. In ACM Symposium on Theory of Computing, pages 140–149, 1997.
[32] R. Rajkumar. Real-time synchronization protocols for shared memory multiprocessors. In Proceedings, 10th International Conference on Distributed Computing Systems, pages 116–123, 1990.
[33] R. Rajkumar, L. Sha, and J. P. Lehoczky. Real-time synchronization protocols for multiprocessors. In RTSS, 1988.
[34] L. Sha, R. Rajkumar, and J. P. Lehoczky. Priority inheritance protocols: An approach to real-time synchronization. IEEE Trans. Computers, 39(9):1175–1185, 1990.
[35] J. Shi. HDGA-LITMUS-RT. https://github.com/Strange369/Dependency-Graph-Approach-for-Periodic-Tasks, 2019.
[36] J. Shi, N. Ueter, G. von der Brüggen, and J.-J. Chen. Multiprocessor synchronization of periodic real-time tasks using dependency graphs. pages 279–292, 2019.
[37] J. Shi, N. Ueter, G. von der Brüggen, and J.-J. Chen. Partitioned scheduling for dependency graphs in multiprocessor real-time systems. In Proceedings of the 25th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, RTCSA, 2019.
[38] D. B. Shmoys, C. Stein, and J. Wein. Improved approximation algorithms for shop scheduling problems. SIAM J. Comput., 23(3):617–632, 1994.
[39] Y. Sotskov and N. Shakhlevich. NP-hardness of shop-scheduling problems with three jobs. Discrete Appl. Math., 59(3):237–266, 1995.
[40] G. von der Brüggen, J.-J. Chen, W.-H. Huang, and M. Yang. Release enforcement in resource-oriented partitioned scheduling for multiprocessor systems. In RTNS, 2017.
[41] A. Wieder and B. Brandenburg. On spin locks in AUTOSAR: blocking analysis of FIFO, unordered, and priority-ordered spin locks. In RTSS, 2013.
[42] A. Wieder and B. B. Brandenburg. Efficient partitioning of sporadic real-time tasks with shared resources and spin locks. In International Symposium on Industrial Embedded Systems (SIES), pages 49–58, 2013.
[43] M. Yang, A. Wieder, and B. B. Brandenburg. Global real-time semaphore protocols: A survey, unified analysis, and comparison. In Real-Time Systems Symposium (RTSS), pages 1–12, 2015.
[44] W. Yu, H. Hoogeveen, and J. K. Lenstra. Minimizing makespan in a two-machine flow shop with delays and unit-time operations is NP-hard. J. Scheduling, 7(5):333–348, 2004.
[45] X. Zhang and S. L. van de Velde. Polynomial-time approximation schemes for scheduling problems with time lags.