Learning to Emulate an Expert Projective Cone Scheduler
Neal Master
Abstract
Projective cone scheduling defines a large class of rate-stabilizing policies for queueing models relevant to several applications. While there exists considerable theory on the properties of projective cone schedulers, there is little practical guidance on choosing the parameters that define them. In this paper, we propose an algorithm for designing an automated projective cone scheduling system based on observations of an expert projective cone scheduler. We show that the estimated scheduling policy is able to emulate the expert in the sense that the average loss realized by the learned policy will converge to zero. Specifically, for a system with n queues observed over a time horizon T, the average loss for the algorithm is O(ln(T)√(ln(n)/T)). This upper bound holds regardless of the statistical characteristics of the system. The algorithm uses the multiplicative weights update method and can be applied online so that additional observations of the expert scheduler can be used to improve an existing estimate of the policy. This provides a data-driven method for designing a scheduling policy based on observations of a human expert. We demonstrate the efficacy of the algorithm with a simple numerical example and discuss several extensions.
1. Introduction
In a variety of application areas, processing systems are dynamically scheduled to maintain stability and to meet various other objectives. Indeed, the basic problem in scheduling theory has been to find and study policies that accomplish this task under different modeling assumptions. In practice, however, while human experts may be able to manage real-world processing systems, it is typically non-trivial to precisely quantify the costs and objectives that govern expert schedulers. For example, in operating room scheduling, ad hoc metrics have been applied in an attempt to model the cost of delays, e.g. [1], but these metrics are largely subjective. The Delphi Method is commonly used in management science to quantitatively model expert opinions, but such methods have no algorithmic guarantees and are not always reliable [2].

In this paper, we present an online algorithm that allows us to emulate an expert scheduler based on observations of the backlog of the queues in the system and observations of the expert's scheduling decisions. We use the term "emulate" to mean that while the parametric form of the learned policy may not converge to the parametric form of the expert policy in all cases, it will always yield scheduling decisions that on average converge to the expert's decisions. This offers a data-driven way of designing autonomous scheduling systems. We specifically consider a projective cone scheduling (PCS) model which has applications in manufacturing, call/service centers, and in communication networks [3, 4].
Email address: [email protected] (Neal Master)
The algorithm in this paper uses the multiplicative weights update (MWU) method [5]. The MWU method has been used in several areas including solving zero-sum games [6], solving linear programs [7], and inverse optimization [8]. Because the PCS policy can be written as a maximization, our techniques are most similar to those used in [8]. In [8], the authors apply an MWU algorithm over a fixed horizon to learn the objective of an expert who is solving a sequence of linear programs. Our results differ from [8] in several ways. One is that because of the queueing dynamics that we consider, our expert's objective will vary over time whereas in [8] the objective is constant. A related issue is that in [8], when the expert has a decision variable of dimension n, the dimension of the parameter being learned is also n. In our case, when the expert has a decision variable of dimension n (i.e. there are n queues in the system), we need to estimate Θ(n²) parameters. We also note that in this paper we provide an algorithm that can be applied even when the horizon is not known a priori.

The goal of inferring parts of an optimization model from data is a well-studied problem in many other applications. For example, genetic algorithm heuristics have been applied to estimate the objective and constraints of a linear program in a data envelopment analysis context [9]. The goal of imputing the objective function of a convex optimization problem has also been considered in the optimization community, e.g. [10, 11]. These papers rely heavily on the convexity of the objective and the feasible set. This approach does not apply in a PCS context because the set of feasible scheduling actions is discrete and hence non-convex.

Preprint, January 29, 2018

This paper is also related to inverse reinforcement learning.
Inverse reinforcement learning is the problem of estimating the rewards of a Markov decision process (MDP) given observations of how the MDP evolves under an optimal policy [12]. Inverse reinforcement learning can be used to emulate expert decision makers (referred to as "apprenticeship learning" in the machine learning community) as long as the underlying dynamics are Markovian [13]. In the PCS model, no such assumption is made and so our results naturally do not require Markovian dynamics.

The remainder of this paper is organized as follows. Section 2 specifies the PCS model that we consider. Section 3 presents our algorithms and the relevant guarantees. Because we take an MWU approach to the problem, our guarantees are bounds on the average loss. However, we also provide a concentration bound which gives guarantees on the tail of the loss distribution. We provide a simple numerical demonstration of our algorithms in Section 4. In Section 5 we discuss some extensions of our results and we conclude in Section 6.
2. Projective Cone Scheduling Dynamics
In this section we summarize the PCS model presented in [4] and comment on the connection to the model presented in [3]. The PCS model has n queues, each with infinite waiting room, following an arbitrary queueing discipline. Time is discretized into slots t ∈ Z₊.¹ The backlog in queue i at the beginning of time slot t is x_t(i). The backlog across all queues can be written as a vector x_t ∈ Z₊ⁿ. The number of customers that arrive at queue i at the end of time slot t is a_t(i). The arrivals across all queues can be written as a vector a_t ∈ Z₊ⁿ. Scheduling configurations are chosen from a finite set S ⊊ Z₊ⁿ. If configuration s_t ∈ S is chosen in time slot t then for each queue i, min{s_t(i), x_t(i)} customers are served at the beginning of the time slot. We take the departure vector as d_t = min{s_t, x_t} ∈ Z₊ⁿ where the minimum is taken component-wise. This gives us the following dynamics:

    x_{t+1} = x_t − d_t + a_t    (1)

where x₀ ∈ Z₊ⁿ is arbitrary. Note that the arrival vector is allowed to depend on previous scheduling configurations, previous arrivals, and previous backlogs in an arbitrary way.

The scheduling configurations are dynamically chosen by solving the maximization

    max_{s ∈ S} ⟨s, B x_t⟩ = max_{s ∈ S} Σ_{i,j} s(i) B(i,j) x_t(j)    (2)

where B ∈ R^{n×n} is symmetric and positive-definite with non-positive off-diagonal elements. We assume that S is endowed with some arbitrary ordering used for breaking ties. This PCS policy defines a broad class of scheduling policies and in particular we note that by taking B as the identity matrix, we return to the typical maximum weight matching scheduling algorithm.

Although B is a matrix, because B is symmetric, there are only p = n(n + 1)/2 free parameters that need to be learned. Consequently, we will represent the projective cone scheduler with an upper-triangular array rather than a matrix.

¹ We use the notation Z₊ = {0, 1, 2, . . .}.
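As a concrete illustration, the dynamics (1) and the decision rule (2) can be sketched in a few lines of Python. The function name and the arrival argument are ours, not the paper's; in particular, the model places no restriction on the arrival process, so arrivals are simply passed in as data here.

```python
import numpy as np

def pcs_step(x, S, B, arrivals):
    """One slot of the PCS dynamics: choose s_t by the rule (2),
    serve d_t = min{s_t, x_t} component-wise, then add the arrivals a_t."""
    # argmax of <s, B x_t> over the finite set S; Python's max breaks
    # ties by keeping the first maximizer, i.e. the ordering of S.
    s = np.asarray(max(S, key=lambda c: float(np.asarray(c) @ (B @ x))))
    d = np.minimum(s, x)  # departures: cannot serve more than the backlog
    return x - d + arrivals, s
```

With B equal to the identity matrix, the rule reduces to maximum weight matching, as noted above.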
In particular, take b(i,i) ∝ B(i,i) for i ∈ [n] and b(i,j) ∝ −B(i,j) for (i,j) ∈ {(i,j) ∈ [n] × [n] : i < j}. We can also assume without loss of generality that Σ_{i≤j} b(i,j) = 1. Then we can write the projective cone scheduling decision as follows:

    s_t = argmax_{s ∈ S} ⟨s, B x_t⟩
        = argmax_{s ∈ S} Σ_{i=1}^n Σ_{j=1}^n B(i,j) s(i) x_t(j)
        = argmax_{s ∈ S} { Σ_i B(i,i) s(i) x_t(i) + Σ_{i<j} B(i,j) (s(i) x_t(j) + s(j) x_t(i)) }
        = argmax_{s ∈ S} Σ_{i≤j} σ(i,j) b(i,j) (s(i) y_t(j) + s(j) y_t(i)) ≜ µ(y_t; b)

where σ(i,i) = +1 and σ(i,j) = −1 for i < j keep track of the appropriate signs, and y_t denotes the backlog observation supplied to the scheduler.

3. The Learning Algorithm

In this section we present our algorithm. We first present a finite horizon algorithm and then leverage this to present an infinite horizon algorithm. For both algorithms, we show that the average error is O(ln(T)√(ln(p)/T)). We also provide a bound on the fraction of observations for which the error exceeds our average case bound.

These algorithms are applied causally in an online fashion. Although we do not focus on computational issues, we note that computing µ(y_t; b) is generally a difficult problem. However, there are local search heuristics that allow efficient computation of µ(y_{t+1}; b) based on the solution to µ(y_t; b) [14]. Our algorithms require the computation of µ(y_t; b̂_t) where b̂_t is the estimate of b at time t and so an online algorithm is appropriate if we want to use the previous solution as a warm start.

Before presenting the algorithms, we consider the loss function of interest. Since the expert we are trying to emulate is specified by the array b, it may seem reasonable to want our estimates b̂_t to converge to b. However, this goal is not as reasonable as it may seem. Because S is discrete, it is possible that two different values of b can render the same scheduling decisions. Consequently, the goal of exactly recovering b may be ill-posed.
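A minimal sketch of µ(y; b) under this triangular parameterization follows; the helper names are ours, and b is stored as an upper-triangular NumPy array (entries below the diagonal unused).

```python
import numpy as np

def mu(y, b, S):
    """mu(y; b): argmax over s in S of
    sum_{i<=j} sigma(i,j) * b(i,j) * (s(i)*y(j) + s(j)*y(i)),
    with sigma(i,i) = +1 and sigma(i,j) = -1 for i < j."""
    n = len(y)
    sigma = 2.0 * np.eye(n) - 1.0            # +1 on the diagonal, -1 off it
    def score(s):
        s = np.asarray(s)
        M = np.outer(s, y) + np.outer(y, s)  # M[i,j] = s[i]*y[j] + s[j]*y[i]
        return float(np.triu(sigma * b * M).sum())
    return max(S, key=score)                 # ties broken by the ordering of S
```

Using `max` with a `key` implements both the argmax and the tie-breaking-by-ordering convention assumed for S.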
We aim to emulate the expert scheduler so we want ŝ_t = µ(y_t; b̂_t), the scheduling decision induced by the estimate b̂_t, to be the same as s_t, the expert's scheduling decision. Hence, the loss should directly penalize discrepancies between s_t and ŝ_t. This leads us to jointly consider b̂_t and ŝ_t so that the loss at time t is

    ℓ_t = Σ_{i≤j} σ(i,j) (b̂_t(i,j) − b(i,j)) (δ_t(i) y_t(j) + δ_t(j) y_t(i))    (7)

where δ_t = ŝ_t − s_t. When b = b̂_t, we have that s_t = ŝ_t and ℓ_t = 0. In addition, when s_t = ŝ_t we have that ℓ_t = 0 even if b ≠ b̂_t. The definition of µ will allow us to show below that ℓ_t ≥ 0. For example, if some queues never receive customers, then no algorithm can learn the entries of b relevant to those queues. More generally, the arrival process may not sufficiently excite all modes of the system. By considering ŝ_t and b̂_t simultaneously, we can provide bounds that apply even in the presence of pathological arrival processes.

We first present Algorithm 1, a finite horizon algorithm that requires knowledge of the horizon. Algorithm 1 is a multiplicative weights update algorithm and this time horizon is used to set the learning rate.

Algorithm 1: Online Parameter Learning with a Fixed Horizon
  Input: ((y_1, s_1), . . . , (y_T, s_T))
  Output: (b̂_1, . . . , b̂_T)
  Output: (ŝ_1, . . . , ŝ_T)
   1: η ← √(ln(p)/T)
   2: w_1 ← upper-triangular array of 1s
   3: for t = 1, . . . , T do
   4:   b̂_t ← w_t / Σ_{i≤j} w_t(i,j)
   5:   ŝ_t ← µ(y_t; b̂_t)
   6:   m_t ← upper-triangular array of 0s
   7:   δ_t ← ŝ_t − s_t
   8:   if ŝ_t ≠ s_t then
   9:     z_t ← δ_t / ||δ_t||_∞
  10:     for (i,j) ∈ [n] × [n] with i ≤ j do
  11:       m_t(i,j) ← σ(i,j) (z_t(i) y_t(j) + z_t(j) y_t(i))
  12:     end
  13:   end
  14:   w_{t+1} ← w_t (1 − η m_t)   (element-wise)
  15: end

Theorem 1. Let D = max_{u,v ∈ S} ||u − v||_∞ and p = n(n + 1)/2. If T > 4 ln(p) then the output of Algorithm 1 satisfies the following inequality:

    0 ≤ (1/T) Σ_{t=1}^T ℓ_t ≤ 2D √(ln(p)/T)    (8)

Proof. Note that because m_t ∈ [−1, 1]^p and η < 1/2, we can directly apply [5, Corollary 2.2]:

    Σ_{t=1}^T Σ_{i≤j} m_t(i,j) b̂_t(i,j) ≤ Σ_{t=1}^T Σ_{i≤j} (m_t(i,j) + η |m_t(i,j)|) b(i,j) + ln(p)/η    (9)

Since |m_t(i,j)| ≤ 1 and Σ_{i≤j} b(i,j) = 1, we have the following:

    Σ_{t=1}^T Σ_{i≤j} m_t(i,j) b̂_t(i,j) ≤ Σ_{t=1}^T Σ_{i≤j} m_t(i,j) b(i,j) + ηT + ln(p)/η    (10)

A straightforward calculation shows that this upper bound is minimized when η = √(ln(p)/T). Rearranging the inequality and applying this fact gives us the following:

    (1/T) Σ_{t=1}^T Σ_{i≤j} m_t(i,j) b̂_t(i,j) − (1/T) Σ_{t=1}^T Σ_{i≤j} m_t(i,j) b(i,j) ≤ 2 √(ln(p)/T)    (11)

Now we apply the specifics of m_t. By definition of D, ||δ_t||_∞ ≤ D. This gives us the following:

    (1/T) Σ_{t=1}^T Σ_{i≤j} σ(i,j) (δ_t(i) y_t(j) + δ_t(j) y_t(i)) b̂_t(i,j)
      + (1/T) Σ_{t=1}^T Σ_{i≤j} σ(i,j) (−δ_t(i) y_t(j) − δ_t(j) y_t(i)) b(i,j) ≤ 2D √(ln(p)/T)    (12)

Note that ŝ_t = µ(y_t; b̂_t) and µ is defined in terms of a maximization. Therefore,

    Σ_{i≤j} σ(i,j) (ŝ_t(i) y_t(j) + ŝ_t(j) y_t(i)) b̂_t(i,j) ≥ Σ_{i≤j} σ(i,j) (s(i) y_t(j) + s(j) y_t(i)) b̂_t(i,j)

for any s ∈ S. This shows that each term in the first Cesàro sum in (12) is non-negative. Similarly, since s_t = µ(y_t; b), each term in the second Cesàro sum in (12) is non-negative. This gives us a lower bound of zero. Noting that the left-hand side of (12) is exactly (1/T) Σ_{t=1}^T ℓ_t, rearranging the terms leaves us with the desired result. ∎

We now present Algorithm 2, an infinite horizon algorithm that dynamically changes the learning rate. Algorithm 2 applies the "doubling trick" to Algorithm 1. The idea is that we define epochs [T_k, T_{k+1}] where T_k = 2^k (4 ln(p)) for k ≥ 0 and T_{−1} = 0. The duration of the kth epoch is T_k and in this epoch we apply Algorithm 1.
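Algorithm 1's weight updates can be sketched compactly in Python. The variable names are ours; for simplicity the raw backlog is fed in as y_t, whereas the analysis above presumes the observations are scaled so that the entries of m_t lie in [−1, 1].

```python
import numpy as np

def algorithm1(observations, S):
    """Sketch of Algorithm 1: multiplicative weights over the p = n(n+1)/2
    upper-triangular entries of b. `observations` is a list of (y_t, s_t)
    pairs, where s_t is the expert's decision; returns our decisions."""
    T = len(observations)
    n = len(observations[0][0])
    p = n * (n + 1) / 2
    eta = np.sqrt(np.log(p) / T)       # learning rate set from the horizon T
    sigma = 2.0 * np.eye(n) - 1.0      # sigma(i,i) = +1, off-diagonal -1
    w = np.triu(np.ones((n, n)))       # upper-triangular array of 1s
    decisions = []
    for y, s_expert in observations:
        b_hat = w / w.sum()            # normalize the weights
        # s_hat = mu(y; b_hat): argmax of the triangular score over S
        def score(s):
            s = np.asarray(s)
            M = np.outer(s, y) + np.outer(y, s)   # s(i)y(j) + s(j)y(i)
            return float(np.triu(sigma * b_hat * M).sum())
        s_hat = np.asarray(max(S, key=score))
        decisions.append(s_hat)
        delta = s_hat - np.asarray(s_expert)
        if np.any(delta != 0):         # weights change only on a mismatch
            z = delta / np.max(np.abs(delta))     # z_t = delta_t / ||delta_t||_inf
            m = np.triu(sigma * (np.outer(z, y) + np.outer(y, z)))
            w = w * (1.0 - eta * m)    # the multiplicative weights update
    return decisions
```

On a stream where the expert repeatedly prioritizes one queue, the learned decisions typically lock onto the expert's after a few mismatches, after which the weights stop changing.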
Up to poly-logarithmic factors of T, this gives us the same convergence rate that we had for Algorithm 1.

Algorithm 2: Online Parameter Learning with an Unknown Horizon
  Input: ((y_1, s_1), (y_2, s_2), . . .)
  Output: (b̂_1, b̂_2, . . .)
  Output: (ŝ_1, ŝ_2, . . .)
  1: T_{−1} ≜ 0 and T_k ≜ 2^k (4 ln(p)) for k ∈ {0, 1, 2, . . .}
  2: for t = 1, 2, . . . do
  3:   if T_k < t ≤ T_{k+1} then
  4:     Apply Algorithm 1 with T ≡ T_k and without re-initializing w_t
  5:   end
  6: end

Theorem 2. Suppose T ≥ T₀. Define lg(·) as ⌈log₂(·)⌉. Then the output of Algorithm 2 satisfies the following inequality:

    0 ≤ (1/T) Σ_{t=1}^T ℓ_t ≤ 2√2 D lg(2T/T₀) √(ln(p)/T)    (13)

Note that these are the same bounds as in Theorem 1 but with an additional factor of √2 lg(2T/T₀).

Proof. First note that the proof of [5, Corollary 2.2] does not require the initial weights to be uniform, so Theorem 1 still applies even without the initialization on line 2 of Algorithm 1. For convenience, let U_k = 2D √(ln(p)/T_k) and take K = lg(T/T₀). Applying Theorem 1 to each stage of Algorithm 2 gives us the following:

    0 ≤ Σ_{t=1}^T ℓ_t ≤ Σ_{k=0}^K Σ_{t=1}^{T_k} ℓ_t ≤ Σ_{k=0}^K T_k U_k ≤ Σ_{k=0}^K 2D √(ln(p)) √(T_k)
      ≤ 2D (K + 1) √(ln(p)) √(T_K) ≤ 2√2 D (K + 1) √(ln(p)) √T    (14)

The first inequality follows from the fact that ℓ_t ≥ 0; the second inequality follows by extending the sum from T up to T_K; the third and fourth inequalities follow from Theorem 1. The penultimate inequality follows from the fact that {T_k} is an increasing sequence and the final inequality follows because T_K can be no more than 2T. Since K + 1 = lg(2T/T₀), dividing by T gives the desired result. ∎

Our previous results provided bounds on the average loss of our algorithms. In this section, we provide bounds for the tail of the distribution of the loss.
This gives us the guarantee that the fraction of observations for which the loss exceeds our average case bound tends to zero.

Theorem 3. Let

    f_T(ε) = |{t ≤ T : ℓ_t > 2√2 D lg(2T/T₀) √(ln(p)/T) + ε}| / T    (15)

be the fraction of observations up to time T ≥ T₀ for which the loss exceeds the average-case bound by at least ε. Then for any ε > 0 we have that

    f_T(ε) ≤ 1 − ε / (2√2 D lg(2T/T₀) √(ln(p)/T) + ε)    (16)

and hence,

    lim_{T→∞} f_T(ε) = 0.    (17)

Proof. The observed loss sequence {ℓ_t}_{t=1}^T defines a point measure on R₊ where each point has mass 1/T. Applying Markov's Inequality to this measure gives us that

    f_T(ε) ≤ [2√2 D lg(2T/T₀) √(ln(p)/T) + ε]⁻¹ · (1/T) Σ_{t=1}^T ℓ_t
           ≤ [2√2 D lg(2T/T₀) √(ln(p)/T) + ε]⁻¹ · 2√2 D lg(2T/T₀) √(ln(p)/T)

Rearranging the upper bound gives the first result. For the second result we simply take the limit and note that

    lim_{T→∞} 2√2 D lg(2T/T₀) √(ln(p)/T) = 0. ∎

4. A Numerical Demonstration

We now demonstrate Algorithm 2 on a small example of n = 2 queues. In each time slot, the number of arriving customers is geometrically distributed on Z₊. For queue 1 the mean number of arriving customers is 1 and for queue 2 the mean number of arriving customers is 2. The arrivals are independent across time slots as well as across queues. We take

    b = [ . . . ]   and   S = { [ . . ]′, [ . . ]′, [ . . ]′, [ . . ]′ }.

This choice of b shows that the expert scheduler prioritizes queue 1 over queue 2 and the expert also has a preference to not serve both queues simultaneously. We simulate the system and run Algorithm 2 for T = 10 time slots with x₀ = [ . . ]′.
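A sketch of such a simulation follows. The values of b, S, and the horizon below are illustrative stand-ins chosen to match the description (queue 1 prioritized, simultaneous service discouraged), not the paper's exact values.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2
# Illustrative stand-in for the expert's array: a large b(1,1) entry
# prioritizes queue 1, and b(1,2) > 0 discourages serving both queues.
b = np.array([[0.6, 0.2],
              [0.0, 0.2]])
S = [(0, 0), (1, 0), (0, 1), (1, 1)]
sigma = 2.0 * np.eye(n) - 1.0            # +1 on the diagonal, -1 off it

def expert_decision(y):
    """The expert's PCS decision mu(y; b) under the triangular scoring."""
    def score(s):
        s = np.asarray(s)
        M = np.outer(s, y) + np.outer(y, s)
        return float(np.triu(sigma * b * M).sum())
    return np.asarray(max(S, key=score))

x = np.array([0, 0])
trace = []                                # (backlog, expert decision) pairs
for t in range(1000):
    s = expert_decision(x)
    trace.append((x.copy(), s))
    d = np.minimum(s, x)                  # departures
    a = rng.geometric([0.5, 1 / 3]) - 1   # geometric on Z+, means 1 and 2
    x = x - d + a
```

The recorded (x_t, s_t) pairs are exactly the observation stream that Algorithm 2 would consume. Note that NumPy's `geometric` is supported on {1, 2, . . .}, so subtracting 1 yields the distribution on Z₊ with the stated means.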
The results are shown in Figure 1.

[Figure 1: Output of Algorithm 2 for the example in Section 4. (a) Evolution of b̂_t; (b) error in the learned scheduling decisions, ||ŝ_t − s_t||_∞; (c) average realized loss and the upper bound from Theorem 2.]

First note that Figure 1a shows that b̂_t does not converge to b. We see that (to 4 decimal places)

    b̂_T = [ . . . ]

but b̂_T yields the same scheduling decisions as b. The algorithm learns to emulate the expert scheduler so the loss becomes zero and the weights stop updating. This possibility was discussed at the beginning of Section 3. Figure 1c shows that while the average loss does indeed tend to zero, the upper bound proved in Theorem 2 is quite loose in this situation. This is expected due to the generality of the theorem. This also means that the concentration bound in Theorem 3 is quite conservative. Indeed, for this simulation we see that f_t(ε) = 0 for all t and for any ε > 0. In other words, no observed loss ever exceeds the average case bound.

5. Extensions

We now discuss some extensions to our algorithms. We first note that we could replace line 14 in Algorithm 1 with

    w_{t+1} ← w_t exp(−η m_t)   (element-wise).

The new algorithm would be a Hedge-style algorithm and we would be able to apply other results (e.g. [5, Theorem 2.4]) to obtain similar upper bounds on the average loss.

We also note that we could modify our algorithms and obtain tighter upper bounds if we impose additional assumptions on the expert. For example, the expert may have a fairly simple objective that leads to prioritization of some queues over others. In this case, we would have b(i,j) = 0 for i ≠ j. Rather than having a triangular array of p parameter estimates, we could instead keep track of just n estimates. Since Θ(ln(p)) = Θ(ln(n)), the convergence rate would not change but we would have smaller constant factors.
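As a sketch of this substitution (the function names are ours; `w` holds the weight array and `m` the loss array, as in Algorithm 1), the two update rules agree to first order in η:

```python
import numpy as np

def mwu_update(w, m, eta):
    """Algorithm 1's update: w_{t+1} = w_t * (1 - eta * m_t), element-wise."""
    return w * (1.0 - eta * m)

def hedge_update(w, m, eta):
    """Hedge-style variant: w_{t+1} = w_t * exp(-eta * m_t), element-wise."""
    return w * np.exp(-eta * m)
```

Since exp(−ηm) = 1 − ηm + O(η²), the two updates are close for small learning rates; the exponential form also keeps the weights strictly positive regardless of the magnitude of ηm.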
Other sparsity patterns could be handled in a similar fashion. The diagonal case is slightly simpler because there would be no need to use σ(i,j) to keep track of the appropriate signs.

As noted in Section 2, a continuous-time PCS model with heterogeneous and stochastic service times was considered in [3]. Our algorithms could be applied in this setting as well by updating the algorithm immediately after customer arrivals and departures rather than in discrete time slots. In [3], B is diagonal and so we could apply the simplifications mentioned above. Our theorems would still hold because they do not require that the state update happen at regularly spaced intervals; the algorithms merely require a stream of observed backlogs and observed scheduling actions.

6. Conclusions and Future Work

In this paper we have proposed an algorithm that learns a scheduling policy that emulates the behavior of an expert projective cone scheduler. This offers a data-driven way of designing automated scheduling policies that achieve the same goals as a human manager. We have provided several theoretical guarantees and have numerically demonstrated the efficacy of the algorithm on a simple example.

This paper opens the door for a few areas of future work. One idea is to provide tighter bounds that depend on the statistical properties of the arrival process. A benefit of the current approach is that it does not require any assumptions on the arrival process but the clear downside is that the resulting bounds are quite loose. An algorithm that uses information about the arrival process could have faster convergence rates and tighter bounds.

Another idea is to investigate the impact of an approximate computation of µ. As mentioned in Section 3, in large-scale problems, exactly computing µ(y; b) is generally a difficult problem and heuristic approaches are typically taken in practice.
An area of future work would be to consider how such approximation "noise" affects our ability to emulate the expert scheduler.

References

[1] N. Master, Z. Zhou, D. Miller, D. Scheinker, N. Bambos, P. Glynn, Improving predictions of pediatric surgical durations with supervised learning, International Journal of Data Science and Analytics (2017) 1–18.
[2] C. Okoli, S. D. Pawlowski, The Delphi method as a research tool: an example, design considerations and applications, Information & Management 42 (1) (2004) 15–29.
[3] M. Armony, N. Bambos, Queueing dynamics and maximal throughput scheduling in switched processing systems, Queueing Systems 44 (3) (2003) 209–252.
[4] K. Ross, N. Bambos, Projective cone scheduling (PCS) algorithms for packet switches of maximal throughput, IEEE/ACM Transactions on Networking 17 (3) (2009) 976–989.
[5] S. Arora, E. Hazan, S. Kale, The multiplicative weights update method: a meta-algorithm and applications, Theory of Computing 8 (1) (2012) 121–164.
[6] Y. Freund, R. E. Schapire, Adaptive game playing using multiplicative weights, Games and Economic Behavior 29 (1–2) (1999) 79–103.
[7] S. A. Plotkin, D. B. Shmoys, É. Tardos, Fast approximation algorithms for fractional packing and covering problems, Mathematics of Operations Research 20 (2) (1995) 257–301.
[8] A. Bärmann, S. Pokutta, O. Schneider, Emulating the expert: inverse optimization through online learning, in: International Conference on Machine Learning, 2017, pp. 400–410.
[9] M. D. Troutt, A. A. Brandyberry, C. Sohn, S. K. Tadisina, Linear programming system identification: the general nonnegative parameters case, European Journal of Operational Research 185 (1) (2008) 63–75.
[10] A. Keshavarz, Y. Wang, S. Boyd, Imputing a convex objective function, in: IEEE International Symposium on Intelligent Control, IEEE, 2011, pp. 613–619.
[11] J. Thai, A. M.
Bayen, Imputing a variational inequality function or a convex objective function: a robust approach, Journal of Mathematical Analysis and Applications.
[12] A. Y. Ng, S. J. Russell, Algorithms for inverse reinforcement learning, in: International Conference on Machine Learning, 2000, pp. 663–670.
[13] P. Abbeel, A. Y. Ng, Apprenticeship learning via inverse reinforcement learning, in: International Conference on Machine Learning, ACM, 2004, pp. 1–8.
[14] K. Ross, N. Bambos, Local search scheduling algorithms for maximal throughput in packet switches, in: Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM), Vol. 2, IEEE, 2004, pp. 1158–1169.