Discrete Frequency Selection of Frame-Based Stochastic Real-Time Tasks
Vandy Berten, Chi-Ju Chang, Tei-Wei Kuo
National Taiwan University, Computer Science and Information Engineering Dept.
{vberten, ktw}@csie.ntu.edu.tw, [email protected]

April 29, 2018

Abstract
Energy-efficient real-time task scheduling has been actively explored in the past decade. Different from past work, this paper considers schedulability conditions for stochastic real-time tasks. A schedulability condition is first presented for frame-based stochastic real-time tasks, and several algorithms are examined to check the schedulability of a given strategy. An approach is then proposed, based on the schedulability condition, to adapt a continuous-speed-based method to a discrete-speed system. The approach stays as close as possible to the continuous-speed-based method, while still guaranteeing schedulability. It is shown by simulations that the energy saving can be more than 20% for some system configurations.
Keywords:
Stochastic low-power real-time scheduling, frame-based systems, schedulability conditions.
1 Introduction

In the past decade, energy efficiency has received a lot of attention in system design, ranging from server farms to embedded devices. With a limited energy supply but an increasing demand on system performance, energy-efficient real-time task scheduling in embedded systems has become a highly critical issue. There are two major ways of changing the frequency of task executions: inter-task and intra-task dynamic voltage scaling (DVS). Although intra-task DVS seems to save more energy, its implementation is far more complicated than that of inter-task DVS: it usually requires very good support from compilers and/or operating systems, which is often hard to obtain for many embedded systems. On the other hand, inter-task DVS is easier to deploy, and tasks might not even be aware of the deployment of the technology.

Energy-efficient real-time task scheduling has been actively explored in the past decade. Low-power real-time systems with stochastic or unknown durations have been studied for several years. The problem was first considered in systems with only one task, or systems in which each task gets a fixed amount of time. Gruian [3, 4] and Lorch and Smith [5, 6] both showed that when intra-task frequency changes are available, the most efficient way to save energy is to increase the speed progressively. Solutions using a discrete set of frequencies and taking speed change overhead into account have also been proposed [11, 10]. For inter-task frequency changes, some work has already been undertaken. In [7], the authors consider a model similar to the one we consider here, even if it is presented differently. They present several dynamic power management techniques: Proportional, Greedy and Statistical. They do not really take the distribution of the number of cycles into account, but only its maximum, and its average for Statistical.
According to the strategy, a task gives its slack time (the difference between the worst case and the actual number of used cycles) either to the next task in the frame, or to all of them. In [1], the authors allow the manager to tune this aggressiveness level, while in [10], they propose to adapt the aggressiveness automatically, using the distribution of the number of cycles of each task. The same authors have also proposed a strategy taking the number of available speeds into account from the beginning, instead of patching algorithms developed for continuous-speed processors [8]. Some multiprocessor extensions have been considered in [2].

Although excellent research results have been proposed for energy-efficient real-time task scheduling, little work has been done for stochastic real-time tasks, where the execution cycles of tasks might not be known in advance. In this paper, we are interested in frame-based stochastic real-time systems with inter-task DVS, where frame-based real-time tasks share the same deadline (also referred to as the frame). Note that the frame-based real-time task model does exist in many embedded system designs, and the results of this paper can provide insight for the design of more complicated systems. Our contribution is twofold. First, we propose a schedulability test, allowing us to easily know whether a frequency selection will meet the deadlines of every task in the system. As a second contribution, we provide a general method to adapt a method designed for a continuous set of speeds (or frequencies) to a discrete set of speeds. This can be done more efficiently than the classical way by using the schedulability condition we give in the first part. Apart from this alternative way of adapting continuous strategies, we will show how this schedulability test can be used to improve the robustness to parameter variations.
The capability of the proposed approach is demonstrated by a set of simulations, and we show that the energy saving can be more than 20% for some system configurations.

The rest of this paper is organized as follows: we first present the mathematical model of a real-time system in Section 2. We then present our first contribution in Section 3, which consists of schedulability conditions and tests for the model. We use those results in Sections 3.5 and 4 to explain how we can improve the discretization of continuous-speed-based strategies, show the efficiency of this approach in the experimental part, in Section 5, and finally conclude in Section 6.

2 Model

We have N tasks {T_i, i ∈ [1, ..., N]} which run on a DVS CPU. They all share the same deadline and period D (which we call the frame), and are executed in the order T_1, T_2, ..., T_N. The maximum execution number of cycles of T_i is w_i. Task T_i requires x cycles with probability c_i(x), where c_i(·) is then the distribution of the number of cycles. Of course, in practice, we cannot use such precise information, and authors usually group cycles in "bins". For instance, we can choose to use a fixed-bin system, with b_i the size of the bins. In this case, the probability distribution c'_i(·) is such that c'_i(k) represents the probability of using between (k−1)×b_i (excluded) and k×b_i (included) cycles.

The system is said to be expedient if a task never waits intentionally. In other words, T_1 starts at time 0, T_2 starts as soon as T_1 finishes, and so on.

The CPU can run at M frequencies (or speeds) f_1 < f_2 < ... < f_M. The scheduling function S_i(t) gives the frequency at which T_i is run when it starts at time t.

Figure 1: Example of scheduling with functions S_i(t). We have 5 tasks T_1, ..., T_5, running every D. In this frame, T_1 is run at frequency S_1(t_1), T_2 at S_2(t_2), T_3 at S_3(t_3), etc.

3 Schedulability conditions

We now need to define the concept of schedulability in our model:

Definition 1.
An expedient system {T_i, S_i(·)}, {f_j} (i ∈ {1, ..., N}, j ∈ {1, ..., M}) is said to be schedulable if, whatever the combination of effective numbers of cycles of the tasks, every task T_i finishes its execution no later than the end of the frame.

From this definition, we can easily see that if {T_i} is such that (1/f_M) Σ_{i=1}^N w_i > D (the left-hand side represents the time needed to run every task in the frame at the highest speed if every task requires its worst-case execution cycles), the system will never be schedulable, whatever the set of scheduling functions. In the same way, we can see that if {T_i} is such that (1/f_1) Σ_{i=1}^N w_i ≤ D, the system is always schedulable, even with a "very bad" set of scheduling functions.

Of course, a non-schedulable system could be able to run its tasks completely in almost every case. Being non-schedulable means that stochastically certainly (with a probability equal to 1), there will eventually be a frame in which some task does not have the time to finish before the deadline (i.e., the end of the frame).

Lemma 1. Any task in {T_i, T_{i+1}, ..., T_N} can always finish no later than D if and only if the system is expedient, and T_i starts no later than z_i, defined as

    z_i = D − (1/f_M) Σ_{k=i}^N w_k.

Proof. This lemma can be proved by induction.

Initialization. We first consider the case of T_N. The very last time T_N can start is the time allowing it to end before D even if it consumes its w_N cycles. At the highest frequency f_M, T_N takes at most w_N/f_M to finish. T_N then necessarily has to start no later than D − w_N/f_M. Otherwise, if the task starts after that time, even at the highest frequency, there is no certainty that T_N will finish by D.

Induction. We know that if (and only if) T_{i+1} starts no later than z_{i+1}, the schedulability of {T_{i+1}, ..., T_N} is ensured. We then need to show that if T_i starts no later than z_i, it will be finished by z_{i+1}.
If T_i starts no later than z_i, we can choose the frequency such that T_i finishes before

    z_i + w_i/f_M = D − (1/f_M) Σ_{k=i}^N w_k + w_i/f_M = z_{i+1}.

Definition 2. The danger zone of T_i is the range ]z_i, D].

This danger zone means that if T_i has to start in ]z_i, D], we cannot guarantee the schedulability anymore, even though, because of the variable nature of execution times, we cannot guarantee either that some task will miss its deadline. Of course, the danger zone of T_i is larger than that of T_j if i < j, which means that z_i < z_j iff i < j. In order to simplify some notation, we will state z_{N+1} = D.

Let us now consider conditions on {S_i} allowing us to guarantee the schedulability of the system. We prove the following theorem:

Theorem 1.

    S_i(t) ≥ w_i / (z_{i+1} − t)   ∀ i ∈ [1, ..., N], t ∈ [0, z_i[,

where z_i = D − (1/f_M) Σ_{k=i}^N w_k, is a necessary and sufficient condition to guarantee that, if task T_i never requires more than w_i cycles and the system is expedient, every task T_i finishes no later than z_{i+1}, and then the last one, T_N, no later than D.

Proof. We show this by induction. Let τ_i be the worst finishing time of task T_i. Please note that this does not necessarily correspond to the case where every task before T_i consumes its WCEC. Figure 2 highlights why.

Figure 2: Example showing that a smaller number of cycles for one task can result in a worse ending time for subsequent tasks. Here, t' is the point at which S_2(t) goes from one frequency to the next. In the top plot, T_1 uses slightly fewer cycles than in the bottom plot, and T_2 uses the same number in both cases, but is run at the lower frequency in the first case, and at the higher one in the second.

First, we have to show that in the range [0, z_i], w_i/(z_{i+1} − t) ≤ f_M.
As this function is an increasing function of t, we just need to consider the maximal value we need:

    w_i / (z_{i+1} − z_i) = w_i / [ (D − (1/f_M) Σ_{k=i+1}^N w_k) − (D − (1/f_M) Σ_{k=i}^N w_k) ] = w_i / (w_i/f_M) = f_M.

Initialization. For the initialization, we consider T_1. Clearly, as the execution length is not taken into account for the frequency selection, the worst case occurs when T_1 uses w_1 cycles. As T_1 starts at time 0, we have τ_1 = w_1/S_1(0). Since S_1(t) ≥ w_1/(z_2 − t) by hypothesis, we have τ_1 ≤ w_1/(w_1/z_2) = z_2. T_1 then ends no later than z_2 in any case. Similarly, we have that if S_1(t) < w_1/(z_2 − t), then τ_1 > z_2, and we cannot guarantee that T_1 finishes no later than z_2.

Induction. Let us now consider T_i, with i > 1. We know by induction that T_{i−1} finished its execution between time 0 and time z_i. Let θ be this end time. Knowing that task T_i starts at θ, the worst case for T_i is to use w_i cycles. The worst end time of T_i is then τ_i = θ + w_i/S_i(θ), with θ ∈ [0, τ_{i−1} = z_i]. Then, as S_i(t) ≥ w_i/(z_{i+1} − t) (which is possible, because we have just shown that the right-hand side is not higher than f_M in the range we have to consider), we have

    τ_i = θ + w_i/S_i(θ) ≤ θ + w_i / (w_i/(z_{i+1} − θ)) = θ + z_{i+1} − θ = z_{i+1}.

We then have that if S_i(t) ≥ w_i/(z_{i+1} − t), task T_i always finishes no later than z_{i+1}, and, as a consequence, every task finishes no later than z_{N+1} = D. Symmetrically, we can show that if S_i(t) < w_i/(z_{i+1} − t) for some t, the schedulability cannot be guaranteed.

We denote by L_i(t) the schedulability limit, or

    L_i(t) = w_i / (z_{i+1} − t), where z_i = D − (1/f_M) Σ_{k=i}^N w_k.

An example of such schedulability limits is given in Figure 3, with four tasks, and a maximum frequency of 1000MHz.

Figure 3: Set of limit functions L_i(t), for an example with 4 tasks. DZ represents the danger zone of T_1.

The scheduling-function set closest to the limit is

    S_i(t) = min { f ∈ {f_1, ..., f_M} : f ≥ L_i(t) }.
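To make the notation concrete, the danger-zone starts z_i, the limit functions L_i(t), and this minimal frequency selection can be sketched as follows (a minimal illustration, not the authors' code; all function names are ours):

```python
# Danger-zone starts z_i = D - (1/f_M) * sum_{k=i..N} w_k (Lemma 1),
# limit functions L_i(t) = w_i / (z_{i+1} - t) (Theorem 1), and the
# minimal selection S_i(t) = min{f : f >= L_i(t)}.

def zone_starts(w, f_max, D):
    """Return z as a 1-indexed list z[1..N+1] (z[0] unused), with z[N+1] = D."""
    N = len(w)
    z = [0.0] * (N + 2)
    z[N + 1] = D
    tail = 0.0
    for i in range(N, 0, -1):          # i = N, N-1, ..., 1
        tail += w[i - 1]               # sum_{k=i..N} w_k
        z[i] = D - tail / f_max
    return z

def limit(i, t, w, z):
    """Schedulability limit L_i(t); only meaningful for t < z[i]."""
    return w[i - 1] / (z[i + 1] - t)

def limit_speed(i, t, w, z, freqs):
    """Smallest available frequency not below L_i(t), or None if t
    already lies inside the danger zone of T_i (no guarantee possible)."""
    needed = limit(i, t, w, z)
    for f in sorted(freqs):
        if f >= needed:
            return f
    return None
```

For instance, with two tasks of one mega-cycle each (w = [1e6, 1e6]), f_M = 2 MHz and D = 2 s, this gives z_1 = 1 s and z_2 = 1.5 s.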
Informally, we could write this function as

    S_i(t) = ⌈ w_i / (z_{i+1} − t) ⌉,

where ⌈x⌉ stands for "the smallest available frequency not lower than x". This function varies as a discrete hyperbola between ⌈w_i/z_{i+1}⌉ and

    ⌈ w_i / (z_{i+1} − z_i) ⌉ = ⌈ w_i / (w_i/f_M) ⌉ = ⌈f_M⌉ = f_M.

This function is, however, generally not very efficient: T_1 is run at the slowest frequency still allowing the following jobs to run in the remaining time. But then, T_1 is run very slowly, while {T_2, ..., T_N} have a pretty high probability of running at a high frequency. A more balanced frequency usage is often better. This strategy actually corresponds to the Greedy technique (DPM-G) described by Mossé et al. [7], except that they consider continuous speeds.

Building such a function is very easy, and takes O(M) per task, with the method given by Algorithm 1. We mainly need to be able to invert L_i:

    L_i^{−1}(f) = z_{i+1} − w_i/f.

Algorithm 1: Building Limit, the worst-case scheduling functions. (a)⁺ means max{0, a}, and S_i +← (t, f) appends a step starting at time t with frequency f.

    z ← D
    foreach i ∈ {N, ..., 1} do
        S_i +← (0, f_1)
        foreach j ∈ {2, ..., M} do
            S_i +← ( (z − w_i/f_{j−1})⁺ , f_j )
        z ← z − w_i/f_M

In the following, this strategy is named Limit.

3.4 Checking the schedulability

Given a set of scheduling functions {S_i}, checking its schedulability is pretty simple. As we know that the limit function is non-decreasing, we just need to check that each step of S_i is above the limit. This can be done with the following algorithm.

Algorithm 2: Schedulability check.

    z ← D
    foreach i ∈ {N, ..., 1} do
        foreach k ∈ {2, ..., |S_i|} do
            if S_i[k−1].f < w_i / (z − S_i[k].t) then
                return false
        z ← z − w_i/f_M
    return true

This check can then be performed in O(Σ_{i=1}^N |S_i|), which, if each S_i is non-decreasing (which is almost always the case), is lower than O(N × M). This test can be used offline to check the schedulability of some method or heuristic, but can also be performed as soon as some parameter change has been detected. For instance, if the system observes that a task T_i used more cycles than its (expected) WCEC w_i, the test could be performed with the new WCEC in order to see whether the current set of S functions can still be used. Notice that we only need to check tasks between 1 and i, because the schedulability of tasks in {i+1, ..., N} does not depend upon w_i. See Section 6 about future work for more details.

Figure 4: Two different ways of discretizing a continuous strategy: Discr. strat. 1 rounds up to the first available frequency. Discr. strat. 2 (our proposal) uses the closest available frequency, taking the limit into account. Limit is the strategy described by Algorithm 1.

4 Discretizing continuous strategies

There are mainly two ways of building a set of S-functions for a given system. The first method consists in considering the problem with continuously available frequencies, and, by some heuristic, adapting the result to a discrete-speed system.
The second method consists in taking into account from the beginning that there is only a limited number of available speeds. The second family of methods has the advantage of usually being more efficient in terms of energy, but the disadvantage of being much more complex, requiring a non-negligible amount of computation or memory. This is not problematic if the system is very stable and its parameters do not change often, but as soon as some on-line adaptation is required, heavy and complex computations cannot be performed anymore.

In the first family, the heuristic usually used consists in computing a continuous function S^c_i(t) which is built to be schedulable, and obtaining a discrete function by using, for any t, the smallest frequency above S^c_i(t), i.e., S_i(t) = ⌈S^c_i(t)⌉. However, this strategy is often pessimistic. So far, though, there was no other method ensuring the schedulability. This assertion is not valid anymore, because we provide in this paper a schedulability condition which can be used.

The main idea is, instead of using the smallest frequency above S^c_i(t), to use the closest frequency to S^c_i(t), and, if needed, to round this up to the schedulability limit L_i(t). In other words, we will use:

    S_i(t) = max { [S^c_i(t)] , ⌈L_i(t)⌉ },

where [x] denotes the available frequency closest to x. The advantage of this technique is that we have a better chance of staying close to the continuous function (which is often optimal in the case of a continuous CPU). However, both techniques (ceiling and closest frequency) are approximations, and neither is guaranteed to be better than the other in every case. As we will show in the experimental section, there are systems in which the classical discretization is better, but there are also many cases where our discretization is better. Algorithm 3 shows how the step functions can be obtained.
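The construction can be sketched as follows for one task (an illustrative sketch under our own naming, not the authors' implementation); the inverse of the continuous strategy, (S^c_i)^{−1}, is passed in as a function:

```python
# Build the "closest" step function S_i as (start_time, frequency) pairs.
# freqs must be sorted increasingly; s_cont_inv(f) returns the time at
# which the continuous strategy S^c_i reaches speed f; z and w follow
# the notation of the paper (z[i+1] = z_{i+1}, w[i-1] = w_i).

def closest_step_function(i, w, z, freqs, s_cont_inv):
    steps = [(0.0, freqs[0])]                          # start at the lowest frequency
    for j in range(1, len(freqs)):
        f_mid = (freqs[j - 1] + freqs[j]) / 2.0        # closest-rounding threshold
        t_lim = z[i + 1] - w[i - 1] / freqs[j - 1]     # L_i^{-1}(f_{j-1})
        t = min(s_cont_inv(f_mid), t_lim)              # never cross the schedulability limit
        steps.append((max(0.0, t), freqs[j]))
    return steps
```

Replacing f_mid with freqs[j - 1] in this sketch yields the classical round-up discretization.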
For each task, computing its function takes O(M × A), where A is the complexity of computing (S^c_i)^{−1}(f). Depending on the kind of continuous method used, A can range between 1 (if (S^c_i)^{−1}(f) has a constant-time closed form) and log(D/ε) × B with a binary search, where ε is the desired precision and B is the complexity of computing S^c_i(t).

Actually, computing the closest frequency amongst {f_1, f_2, ..., f_M} roughly boils down to computing the round-up frequency amongst the set {(f_1+f_2)/2, (f_2+f_3)/2, ..., (f_{M−1}+f_M)/2}: the range corresponding to (f_1+f_2)/2 is then mapped onto f_1, etc. In Algorithm 3, if we simply use f_{j−1} instead of (f_{j−1}+f_j)/2, we obtain the classical round-up operation.

Algorithm 3: Computing the closest step function to S^c_i(·), respecting the schedulability limit L_i(·). (a)⁺ stands for max{0, a}.

    foreach i ∈ {N, ..., 1} do
        S_i +← (0, f_1)
        foreach j ∈ {2, ..., M} do
            f ← (f_{j−1} + f_j) / 2
            t ← min { (S^c_i)^{−1}(f) , L_i^{−1}(f_{j−1}) }
            S_i +← ( (t)⁺ , f_j )

4.1 Taking frequency change overheads into account

Our model allows us to easily take the time penalty of frequency changes into account. Let P_T(f_i, f_j) be the time penalty of changing from f_i to f_j. This means that once the frequency change is requested (usually, a special register is set to some predefined value), the processor is "idle" during P_T(f_i, f_j) units of time before the next instruction is run. We assume that the worst time overhead occurs when the CPU goes from f_1 to f_M, and denote P^M_T = max_{i,j} P_T(f_i, f_j) = P_T(f_1, f_M).

Notice that this model is rather pessimistic: on modern DVS CPUs, the processor does not stop after a change request, but still runs at the old frequency for a few cycles before the change becomes effective. However, even if the processor never stops, there is still a penalty, but the time penalty is negative when the speed goes down (because the job will be finished sooner than if the frequency change had been performed before it started).
As a first approximation, we could then consider that negative penalties compensate positive ones. But this approximation does not hold for energy penalties, because those are all positive.

We also want to take the switching time between jobs into account, even if there is no frequency change (we assume that the job switching time is already included in P_T). Let S_T(f_i) be the switching time when the frequency is f_i and is not changed between two consecutive jobs, and let S^M_T denote S_T(f_M). Usually, we have S_T(f_i) < S_T(f_j) if f_i > f_j. We make here the simplifying hypothesis that the switching time is job independent, which is an approximation, since this time usually depends upon the amount of used memory. However, for our purpose, we only need to consider an upper bound on this time.

As before, we know that T_N must start no later than D − w_N/f_M. If T_N starts at this limit (or even before), the selected frequency must be f_M. We can then face two situations:

• Best case: the previous task T_{N−1} was already running at f_M. Then T_{N−1} needs to finish before the start limit of T_N, minus the switching time, i.e., by D − w_N/f_M − S^M_T;

• Worst case: the previous task T_{N−1} was not running at f_M; we then need to change the frequency. In the worst case, the time penalty will be P^M_T. T_{N−1} then needs to finish no later than D − w_N/f_M − P^M_T.

The first limit is then a necessary condition, and the second a sufficient condition, to ensure the schedulability of T_N.
Similarly, we can see that T_i must start before z^n_i to ensure the schedulability of itself and any subsequent task (necessary condition), and this schedulability is ensured (sufficient condition) if T_i starts before z^s_i, where z^n_i and z^s_i are defined as:

    z^n_i = D − (1/f_M) Σ_{k=i}^N w_k − (N − i + 1) S^M_T = z_i − (N − i + 1) S^M_T

and

    z^s_i = D − (1/f_M) Σ_{k=i}^N w_k − (N − i + 1) P^M_T = z_i − (N − i + 1) P^M_T.

We can then provide two schedulability conditions:

• Necessary condition: S_i(t) ≥ w_i / (z^n_{i+1} − t);

• Sufficient condition: S_i(t) ≥ w_i / (z^s_{i+1} − t).

Algorithm 3 can easily be adapted using those conditions: we then use L_i(t) = w_i / (z^s_{i+1} − t).

4.2 Soft deadlines

If we want to be a little more flexible, we could consider soft deadlines, and adapt our schedulability condition accordingly. The main idea is not to consider the WCEC, but to use some percentile: if κ_i(ε) is such that P[c_i < κ_i(ε)] ≥ 1 − ε, where c_i is the actual number of cycles of T_i, we can use κ_i(ε) as a worst-case execution cycle count.

However, it seems to be almost impossible to compute analytically the probability of missing a deadline with this model. It would boil down to computing P[E_1 + E_2 + ... + E_N > D], where E_i represents the execution time of jobs of task T_i. E_i depends upon the job length distribution, but also upon the speed at which T_i is run, which depends upon the time at which T_{i−1} ends, which depends upon the time at which T_{i−2} ended, and so on. As the E_i's are not independent, it seems that we cannot use the central limit theorem.

If we accept an approximation of the failure probability, we could proceed in the following way. Let C_i be the random variable giving the number of cycles of T_i, and C = Σ_i C_i. Let W = Σ_i w_i be the maximal value of C (the frame worst-case execution cycles).
Let C_ε = min_c {c : P[C < c] > 1 − ε}. We assume that using the deadline D·W/C_ε allows deadlines to be respected with a probability close to 1 − ε. Those propositions are only heuristics, and would require more work, both analytic and experimental.

5 Experimental results

In order to evaluate the advantage of using a "closest" approach instead of an "upper bound" approach, we applied it to two methods. The first one is described by Mossé et al. in [7], and is called DPM-S (Dynamic Power Management-Statistical); the second one is described by Xu, Melhem and Mossé [10], and is called PITDVS (Practical Inter-Task DVS).

The method DPM-S described in [7] bets that the next jobs will not need more cycles than their average, and computes the speed under this assumption when a job starts. Of course, the schedulability limit is also taken into account. In their paper, the authors consider that they can use any (normalized) frequency between 0 and 1. In order to apply this method to a system with a limited number of frequencies, we can either round up, or use our "closest" approach. They do not take frequency change overheads into account, but according to what we claimed above, those overheads are easy to integrate. We compute the two following step functions, where avg_k stands for the average number of cycles of T_k, in Algorithm 3 adapted to take frequency change overheads into account (cf. Section 4.1):

• DPM-S up: we replace (S^c_i)^{−1} by D − (1/f_{j−1}) Σ_{k=i}^N avg_k; (1)

• DPM-S closest: we replace (S^c_i)^{−1} by D − (1/f) Σ_{k=i}^N avg_k. (2)

The second method we consider, by Xu, Melhem and Mossé [10], is called PITDVS (Practical Inter-Task DVS), and aims at patching OITDVS (Optimal Inter-Task DVS [9]), an optimal method for ideal processors (with a continuous range of available frequencies). They apply several patches in order to make this optimal method usable on realistic processors.
They start by taking speed change overhead into account, then they introduce maximal and minimal speeds (OITDVS assumes speeds from 0 to infinity), and finally, they round up the S-function to the smallest available frequency. It is in this last patch that we apply our technique. Using the β_i value described in [10] (representing the aggressiveness level), we compute the step functions in the following way, in Algorithm 3 adapted to take frequency change overheads into account (cf. Section 4.1):

• PITDVS up (in [10]): we replace (S^c_i)^{−1} by D − P_T × (N − i) − w_i/(β_i f_{j−1}); (3)

• PITDVS closest (our adaptation): we replace (S^c_i)^{−1} by D − P_T × (N − i) − w_i/(β_i f). (4)

In the following, we also run simulations using L (Limit) to choose the frequency. Our aim was not to show how efficient or how bad this technique is, but rather to show that we often observe rather counterintuitive results.

For the simulations we present below, we use two different sets of workloads. The first one is pretty simple, and quite theoretical: a set of 12 tasks, each of them having lengths uniformly distributed between miscellaneous bounds, different from each other. For the second set of simulations, we used several workloads coming from video decoding using H.264, which is used in our lab for other experiments on a TI DaVinci DM6446 DVS processor. In Figure 9, we show the distributions of the 8 video clips we used, each with several thousands of frames.

We present here experimental results for two different kinds of DVS processors (see for instance [8] for details about their characteristics): an Intel XScale processor (with frequencies 150, 400, 600, 800 and 1000MHz), and a PowerPC 405LP (with frequencies 33, 100, 266 and 333MHz). We took frequency change overheads into account, but their contribution was usually negligible in all of the simulations we performed (lower than 0.1% in most cases).
As a third CPU, we used the characteristics of the XScale, but disabled one of its available frequencies (400MHz in the plots we show here), in order to highlight the advantage of our approximation over the round-up approximation when the number of available frequencies is quite low.

We performed a large number of simulations in order to compare the energy performance of "round up" and "round to closest". We compare several processor characteristics, and several job characteristics. We use both theoretical models and realistic values extracted from production systems.

Figure 5: Energy consumption relative to DPM-S closest, for a set of 12 tasks with uniform distributions, on the PowerPC, the XScale, and the XScale without 400MHz (curves: DPM-S closest, DPM-S up, Limit).

Figure 6: Energy consumption relative to PITDVS closest, for a set of 12 tasks with uniform distributions, on the same three processors (curves: PITDVS closest, PITDVS up, Limit).

For the figures we present here, we simulated the same system with different strategies computed with variations of Algorithm 3, amongst DPM-S closest (Eq. (2)), DPM-S up (Eq. (1)), PITDVS closest (Eq. (4)), PITDVS up (Eq. (3)) and Limit (Algorithm 1), computed the energy consumption, and presented the ratio of this energy to PITDVS closest or DPM-S closest.
We then simulated the same system for various deadlines, going from the deadline allowing every task to run at the lowest frequency (D = (1/f_1) Σ_{i=1}^N w_i) to the smallest deadline allowing every task to run at the highest frequency (D = (1/f_M) Σ_{i=1}^N w_i). We even used smaller deadlines, because this limit represents a frame where every task needs its WCEC at the same time, which has a very tiny probability of occurring. We can consider that decreasing the deadline boils down to increasing the load: the smaller the deadline, the higher the average frequency. And quite intuitively, for small and large deadlines (or frame lengths), we do not see any difference between strategies, because they all always use either the lowest (large deadline) or the highest (small deadline) frequency.

A first observation was that in many cases, the S-function of PITDVS up was already almost equal to Limit. As a consequence, we could not observe any difference between PITDVS up and PITDVS closest. We can for instance see this in Figure 6, right plot: for deadlines between 0.1 and 0.06, we do not see any difference between PITDVS closest and Limit.

In the first set of simulations (Figures 5 and 6), we used 12 tasks, each of them having a uniformly distributed number of cycles, with miscellaneous parameters. On the PowerPC processor, we observe a large variety in the performance comparison. According to the load (or the frame length), we see that PITDVS closest can gain around 30% compared to PITDVS up, or lose almost 20%, while we obtain a similar comparison between DPM-S closest and DPM-S up, but with smaller values. We also observe very abrupt and surprising variations, such as in Figure 6, middle and right plots, for Limit, around 0.03. A closer look at those variations shows that they usually occur when the frequency of T_1 changes. Indeed, as T_1 always starts at time 0, its speed does not really depend upon S_1(t), but only upon S_1(0).
So when D varies, S (0) goes suddenly fromone frequency to another one. Then a very slight varia-tion of D could have a big impact of each frame. Thoseslight variations do not have the same impact for othertasks, because of the stochastic nature of tasks length.For instance, if we slightly change S i ( i (cid:54) = 1 ), it willonly impact a few task speeds. But slight changes in S have either no impact at all, or an impact on every taskin every frame.From those first figures, we can for sure not claimthat doing a “closest” approach is always better than a“upper bound”. But those simulations highlight thatthere are certainly situations where one approach isbetter than the other one, and situations with the otherway around. System designers should then pay at-tention to the way they round continuous frequencies.With a very small additional effort, we can often dobetter than simply round up the original schedulingfunction.For the second set of simulations (using real videoworkloads), on Figures 7 and 8, we observe the same8 igure 7 Energy consumption relative to DPM-S closest , for a set of 8 tasks distributed as shown in Figure 9. ( R e l a t i v e ) E ne r g y Frame length (Deadline)PowerPCDPMS closest DPM-S up Limit 0.6 0.8 1 1.2 1.4 1.6 0.5 1 1.5 2 2.5 ( R e l a t i v e ) E ne r g y Frame length (Deadline)XScaleDPM-S closest DPM-S up Limit 0.95 1 1.05 1.1 1.15 1.2 1.25 0.5 1 1.5 2 2.5 ( R e l a t i v e ) E ne r g y Frame length (Deadline)XScale (no 400MHz)DPM-S closest DPM-S up Limit Figure 8 Energy consumption relative to PITDVS closest , for a set of 8 tasks distributed as shown in Figure 9. 
We observe the same kind of differences as in the previous experiments: depending on the configuration, one rounding method is better than the other. With the PowerPC configuration, PITDVS_closest is better than PITDVS_up, but DPM-S_up seems to be better than DPM-S_closest. However, with the XScale processor where we disabled one frequency, both "closest" methods are better than the "up" methods. Note that we observe the same kind of benefit when disabling a frequency other than 400 MHz.

From the many experiments we performed, it seems that our approach is especially interesting when the number of available frequencies is limited, which is not surprising: the fewer frequencies are available, the further the system is from the continuous model. As the two strategies we adapt were originally designed for a continuous model, and as our adaptation attempts to stay closer to the original strategy than the classical adaptation does, this behavior was to be expected.

We have also observed that "smooth" systems, such as the one with uniform distributions (we also simulated other distributions, such as normal and bimodal normal), do not give smoother curves than the realistic workloads, even though several of the latter contain very chaotic data. The irregular behavior of our curves does not seem to be related to irregular data, but rather to the fact that, as already mentioned, slight variations in S_1 can have a large impact on the average energy.
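The two rounding policies compared throughout this section can be sketched as below. This is a minimal illustration with a hypothetical frequency set, not our implementation; in our approach, a "closest" choice that rounds down is only kept when the schedulability condition still holds, since rounding down without that check could miss deadlines:

```python
import bisect

def round_up(s, freqs):
    """Classical adaptation: smallest available frequency >= s.

    freqs must be sorted ascending; requests above f_M are clamped
    to f_M (such a request would be unschedulable anyway).
    """
    i = bisect.bisect_left(freqs, s)
    return freqs[min(i, len(freqs) - 1)]

def round_closest(s, freqs):
    """Pick the available frequency closest to s; may round down."""
    return min(freqs, key=lambda f: abs(f - s))

freqs = [150, 400, 600, 800, 1000]  # hypothetical discrete set (MHz)
s = 420.0                           # continuous speed requested by S(t)
print(round_up(s, freqs), round_closest(s, freqs))  # 600 (up) vs 400 (closest)
```

The gap between the two answers (here 600 MHz versus 400 MHz) is exactly where the energy differences discussed above come from, and it grows as the set of available frequencies shrinks.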
In this paper, we do not present a huge number of simulations, because we do not claim that our approach is always better: what we present should be enough to persuade system designers to take a deeper look at the way they manage discretization.

The aim of our work was twofold. First, we presented a simple schedulability condition for frame-based low-power stochastic real-time systems. Thanks to this condition, we can quickly check that a scheduling function guarantees the schedulability of the system, even when frequency change overheads are taken into account. This test can be used either off-line, to check that a scheduling function is schedulable, or on-line, after some parameter change, to check whether the functions can still be used.

The second contribution of this paper was to use this schedulability condition to improve the way a strategy developed for systems with continuous speeds is adapted to systems with a discrete set of available speeds. We showed that our approach is not always better than the classical one, which consists in rounding up to the first available frequency, but that it can, in some circumstances, yield a gain of up to almost 40% in the simulations we presented.

Our future work includes several aspects. First, by running many more simulations, we would like to identify more precisely when our approach is better than the classical one. This would allow system designers to choose which approach to use without running simulations or experiments on their own system.

Another aspect we would like to consider is a deeper look at how the schedulability test we provide can improve the robustness of a system. In particular, if we observe that a job has required more than its (expected) worst-case number of cycles, how

Figure 9: Distribution of the number of cycles needed to decode different kinds of video, ranging from news streaming to complex 3D animations. The x-axis is the number of cycles, and the y-axis the probability.
can we temporarily adapt our system to improve its schedulability, before the new set of functions, using the new parameters, can be computed?

References

[1] Aydin, H., Mejía-Alvarez, P., Mossé, D., and Melhem, R. Dynamic and aggressive scheduling techniques for power-aware real-time systems. In RTSS '01: Proceedings of the 22nd IEEE Real-Time Systems Symposium (Washington, DC, USA, 2001), IEEE Computer Society, p. 95.

[2] Chen, J.-J., Yang, C.-Y., Kuo, T.-W., and Shih, C.-S. Energy-efficient real-time task scheduling in multiprocessor DVS systems. In ASP-DAC '07: Proceedings of the 2007 Conference on Asia South Pacific Design Automation (Washington, DC, USA, 2007), IEEE Computer Society, pp. 342–349.

[3] Gruian, F. Hard real-time scheduling for low-energy using stochastic data and DVS processors. In ISLPED '01: Proceedings of the 2001 International Symposium on Low Power Electronics and Design (New York, NY, USA, 2001), ACM, pp. 46–51.

[4] Gruian, F. On energy reduction in hard real-time systems containing tasks with stochastic execution times. In Proceedings of the Workshop on Power Management for Real-Time and Embedded Systems (2001), pp. 11–16.

[5] Lorch, J. R., and Smith, A. J. Improving dynamic voltage scaling algorithms with PACE. In SIGMETRICS '01: Proceedings of the 2001 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (New York, NY, USA, 2001), ACM, pp. 50–61.

[6] Lorch, J. R., and Smith, A. J. PACE: A new approach to dynamic voltage scaling. IEEE Transactions on Computers 53, 7 (2004), 856–869.

[7] Mossé, D., Aydin, H., Childers, B., and Melhem, R. Compiler-assisted dynamic power-aware scheduling for real-time applications. In COLP '00: Proceedings of the Workshop on Compilers and Operating Systems for Low Power (2000).

[8] Xu, R., Melhem, R., and Mossé, D.
A unified practical approach to stochastic DVS scheduling. In EMSOFT '07: Proceedings of the 7th ACM & IEEE International Conference on Embedded Software (New York, NY, USA, 2007), ACM, pp. 37–46.

[9] Xu, R., Mossé, D., and Melhem, R. Minimizing expected energy in real-time embedded systems. In EMSOFT '05: Proceedings of the 5th ACM International Conference on Embedded Software (New York, NY, USA, 2005), ACM, pp. 251–254.

[10] Xu, R., Mossé, D., and Melhem, R. Minimizing expected energy consumption in real-time systems through dynamic voltage scaling. ACM Trans. Comput. Syst. 25, 4 (2007), 9.

[11] Xu, R., Xi, C., Melhem, R.,