A Constraint Programming-based Job Dispatcher for Modern HPC Systems and Applications
Cristian Galleguillos, Zeynep Kiziltan, Ricardo Soto
Pontificia Universidad Católica de Valparaíso, Valparaíso, Chile
University of Bologna, Bologna, Italy
[email protected], [email protected], [email protected]
September 17, 2020
Abstract
Constraint Programming (CP) is a well-established area in AI as a programming paradigm for modelling and solving discrete optimization problems, and it has been successfully applied to tackle the on-line job dispatching problem in HPC systems, including those running modern applications. The limitations of the available CP-based job dispatchers may hinder their practical use in today's systems, which are becoming larger in size and more demanding in resource allocation. In an attempt to bring basic AI research closer to a deployed application, we present a new CP-based on-line job dispatcher for modern HPC systems and applications. Unlike its predecessors, our new dispatcher tackles the entire problem in CP and its model size is independent of the system size. Experimental results based on a simulation study show that with our approach dispatching performance increases significantly in a large system and in a system where allocation is nontrivial.
Motivations
High Performance Computing (HPC) is the application of supercomputers to solve complex computational problems in science, business and engineering. As such, HPC systems have become indispensable for scientific progress, industrial competitiveness, economic growth and quality of life in our modern society [12, 15]. An HPC system is a network of computing nodes, each containing one or more CPUs and its own memory. The next generation of HPC systems aims at reaching the exaFLOP level (10^18 floating-point operations per second). Indeed, in single or further reduced precision, which is often used in machine learning and AI applications, the peak performance of today's most powerful system Fugaku is over 1 exaFLOPS. In their march towards exascale performance, HPC systems are getting larger in their number of nodes and becoming more heterogeneous in their computing resources in an effort to keep the power consumption at bay. Figure 1 shows in blue dots and green triangles the size of today's top 500 systems, where the majority has thousands of nodes. Around 30% of these systems employ specialized energy-efficient accelerators such as GPUs and MICs.

Figure 1: Size of the Eurora, KIT ForHLR II and the top 500 HPC systems.

Central to the efficiency and the Quality-of-Service (QoS) of an HPC system is the job dispatcher, which has the key role of deciding which jobs to run next among those waiting in the queue (scheduling) and on which resources to run them (allocation). This is on-line decision making in the sense that the process is repeated periodically as new jobs arrive to the system while some previously dispatched jobs are still running. Traditionally, HPC job dispatchers have been designed for compute-intensive jobs requiring days to complete.
There is an increasing trend where HPC systems are being used for modern applications that employ many short jobs with strict timing requirements.

Related work

[9] presented two CP-based on-line job dispatchers for HPC systems, which we here refer to as PCP'19 and HCP'19. These dispatchers are built on previous CP-based dispatchers [4, 6] and are redesigned to satisfy the challenges of systems running modern applications that employ many short jobs and that have strict timing requirements. A simulation study [9] based on a workload trace collected from the Eurora system [7] reveals that PCP'19 and HCP'19 yield substantial improvements over the original dispatchers [4, 6] and provide a better QoS compared to Eurora's dispatcher [11], which is a part of the commercial workload management system PBS Professional [2].

PCP'19 and HCP'19 play significant roles in the adoption of an AI-driven technology in the workload management of HPC systems, yet they have limitations which may hinder their practical use in today's systems. PCP'19 is not scalable to large systems composed of thousands of nodes. This is because in PCP'19 the entire problem is tackled using CP, and the number of decision variables increases proportionally to the number of nodes and the possible allocations of jobs in each node. Figure 1 shows where a system like Eurora stands compared to today's top 500 systems. As we will show in our experimental results, PCP'19 cannot be used in a larger system like KIT ForHLR II, whose size is comparable to that of the majority of the top systems.

In HCP'19, the problem is decomposed and solved in two stages. First, the scheduling problem is addressed using CP by treating the resources of the same type across all nodes as a pool of resources. Then the allocation problem is solved with a heuristic algorithm using the best-fit strategy [18], while fixing any inconsistencies introduced in the first phase.
Without an allocation model on the CP side, the number of decision variables drops dramatically and HCP'19 can scale to larger systems like KIT ForHLR II. However, the decoupled approach may result in several iterations between the two stages when allocation is nontrivial, for instance when many jobs demand the scarce resource types in a heterogeneous system [14], or when power-aware allocation is required to limit power consumption [6]. This in turn could decrease dispatching performance, as we will show in our experimental results with the heterogeneous system Eurora. The advantage of tackling the entire problem in CP is that scheduling and allocation decisions are made jointly, and complex allocation constraints that emerge from the needs of today's systems can be integrated in the CP model.

Contributions
We exploit the strengths of PCP'19 and HCP'19 to overcome their limitations. We present a new CP allocation model where the number of variables is independent of the system size. We combine this model with the CP scheduling model common to PCP'19 and HCP'19 and devise a tailored search algorithm. Our contributions are (i) a novel CP-based on-line job dispatcher (PCP'20) suitable for modern HPC systems and applications, and (ii) experimental evidence of the benefits of PCP'20 over PCP'19 and HCP'19, supported by a simulation study based on workload traces collected from the Eurora and KIT ForHLR II systems.

Organization
The rest of the paper is organized as follows. In Section 2, we introduce the on-line job dispatching problem in HPC systems, and describe briefly the CP scheduling and allocation models of PCP'19, as we will later use the same scheduling model in PCP'20 and contrast the allocation model with ours. In Section 3, we present our new CP allocation model and search algorithm. In Sections 4 and 5, we detail our simulation study and present our results. We conclude and describe future work in Section 6.

The on-line job dispatching problem in HPC systems

A job is a user request in an HPC system and consists of the execution of a computational application over the system resources. A set of jobs is a workload. A job has a name, required resource types (cores, memory, etc.) to run the corresponding application, and an expected duration, which is the maximum time it is allowed to execute on the system. An HPC system typically receives multiple jobs simultaneously from different users and places them in a queue together with the other waiting jobs (if there are any). The waiting time of a job is the time interval during which the job remains in the queue until its execution time.

An HPC system has N nodes, with each node n ∈ N having a capacity cap_{n,r} for each of its resource types r ∈ R. Each job i in the queue Q has an arrival time q_i, a maximum number of requested nodes rn_i, and a demand req_{i,r} giving the amount of resources required from r during its expected duration d_i. The resource request of i is distributed among rn_i identical job units, each requiring req_{i,r}/rn_i amount of resources from r. A specific resource can be used by one job unit only. We have rn_i = 1 for serial jobs and rn_i > 1 for parallel jobs.
The on-line job dispatching problem at a time t consists in scheduling (a subset of) the queued jobs Q by assigning each job i a start time s_i ≥ t, and allocating i to the requested resources during its expected duration d_i, such that the capacity constraints are satisfied: at any time in the schedule, the capacity cap_{n,r} of a resource r is not exceeded by the total demand req_{i,r} of the jobs i allocated on it, taking into account the presence of jobs already in execution. The objective is to dispatch in the best possible way according to a measure of QoS, such as with minimum waiting times s_i − q_i for the jobs, which is directly perceived by the HPC users. A solution to the problem is a dispatching decision. Once the problem is solved, only the jobs with s_i = t are dispatched. The remaining jobs with s_i > t are queued again with their original q_i. During execution, a job exceeding its expected duration is killed. It is the workload management system software that decides the dispatching time t and the subsequent dispatching times.

Scheduling model
The scheduling problem is modeled using Conditional Interval Variables (CIVs) [13]. A CIV τ_i ∈ τ represents a job i and defines the time interval during which i runs. At a dispatching time t, there may already be jobs in execution which were previously scheduled and allocated. We refer to such jobs as running jobs. The scheduling model considers in the τ variables both the running jobs and a subset Q̄ ⊆ Q of the queued jobs that can start execution as of time t. The properties s(τ_i) and d(τ_i) correspond respectively to the start time and the duration of the job i. Since the actual runtime (real) duration d^r_i of a running or queued job i is unknown at modeling time, PCP'19 uses an expected duration d_i for d(τ_i), which is supplied by a job duration prediction method. For the queued jobs, we have d(τ_i) = d_i. For the running jobs instead, d(τ_i) = max(1, s(τ_i) + d_i − t), taking into account the possibility that d_i < d^r_i due to underestimation. While the start times of the running jobs have already been decided, the queued jobs have s(τ_i) ∈ [t, eoh], where eoh is the end of the worst-case makespan, calculated as t + Σ_{τ_i} d(τ_i).

The capacity constraints are enforced via the cumulative constraint [1] as cumulative([s(τ_i)], [d(τ_i)], [req_{i,r}], Tcap_r) for all r ∈ R, with Tcap_r = Σ_{n∈N} cap_{n,r}. The objective function minimizes the total job slowdown Σ_{τ_i} (s(τ_i) − q_i + d(τ_i)) / d(τ_i). The search for solutions focuses on the jobs with highest priority, where the priority of a job i is its slowdown (t − q_i + d(τ_i)) / d(τ_i) at the dispatching time t.

Allocation model

The allocation model replicates each τ_i variable p_{i,n} times for each n ∈ N, where p_{i,n} = min(rn_i, min_{r∈R} ⌊cap_{n,r} / (req_{i,r}/rn_i)⌋), giving the maximum number of times a job unit can fit on n.
Such a variable u_{i,n,j} represents a possible allocation of a job unit j of i on node n and has s(u_{i,n,j}) = s(τ_i) and d(u_{i,n,j}) = d(τ_i). To define the allocation, the model relies on the execution state property (x) of CIVs. We have x(u_{i,n,j}) ∈ [0, 1], whereas x(τ_i) = 1 because all jobs need to be scheduled and thus be present in the solution. The model uses the alternative constraint [13] to restrict the number of variables in ∪_{n∈N} [x(u_{i,n,j})] present in the solution to be the maximum number of requested nodes rn_i, that is, Σ_{n∈N} Σ_j x(u_{i,n,j}) = rn_i with s(τ_i) = s(u_{i,n,j}) iff x(u_{i,n,j}) = 1. Additionally, the capacity constraints are enforced for each n ∈ N and for each r ∈ R as cumulative([s(u_{i,n,j})], [d(u_{i,n,j})], [req_{i,r}/rn_i], cap_{n,r}).

A drawback of this model is its number of variables. While the scheduling model has |Q̄| variables, the allocation model has Σ_{i∈Q̄} Σ_{n∈N} p_{i,n} variables, which increases proportionally to N (i.e., system size). At minimum, 1 + |N| variables are needed to model a serial job. Parallel jobs require even more variables, which may create difficulty in big systems with many parallel jobs.

Our new dispatcher PCP'20 imports the scheduling model, the objective function and the job priorities of PCP'19, and contains a new allocation model with |Q̄| + Σ_{i∈Q̄} rn_i · |R| variables, which is independent of the system size. The number of variables thus depends mainly on the workload, with a variable number of requested nodes for each job i multiplied by the number of resource types in the system, which has a small value. In the following, we first present the allocation model and then describe how we search on the scheduling and the allocation variables.
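The difference in model size can be illustrated with a back-of-the-envelope count that applies the two formulas above to a hypothetical queue on a KIT-sized homogeneous system. The job figures below are invented for illustration; only the counting formulas come from the text.

```python
# Comparing the allocation-model sizes of PCP'19 and PCP'20 on a
# hypothetical queue. System figures mimic the KIT ForHLR II thin
# partition; the two queued jobs are made up.
from math import floor

N = 1152                        # nodes
R = ["cores", "mem"]            # resource types
cap = {"cores": 20, "mem": 64}  # per-node capacities (homogeneous nodes)

# queue: (requested nodes rn_i, total demand req_{i,r} per type)
queue = [(4, {"cores": 40, "mem": 64}),
         (1, {"cores": 2,  "mem": 4})]

def p(rn, req):
    # p_{i,n} = min(rn_i, min_r floor(cap_{n,r} / (req_{i,r}/rn_i)))
    return min(rn, min(floor(cap[r] / (req[r] / rn)) for r in R))

# PCP'19 allocation variables: sum_i sum_n p_{i,n}
pcp19_alloc_vars = sum(N * p(rn, req) for rn, req in queue)
# PCP'20 variables: |Q| + sum_i rn_i * |R|
pcp20_alloc_vars = len(queue) + sum(rn * len(R) for rn, _ in queue)
print(pcp19_alloc_vars, pcp20_alloc_vars)   # → 3456 12
```

Even for this two-job queue, the PCP'19-style count grows with N = 1,152 and reaches thousands of variables, while the PCP'20 count stays at a dozen.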
Allocation model

In this new model, we represent the system in a way that emphasizes the resources instead of the nodes as in the previous model. We consider all the resources of a certain type r in an ordered list, following the sequence of the nodes. This is exemplified in Figure 2, which partially represents the Eurora system composed of 64 nodes. Each node has 16 cores and 16 GB memory; additionally, the first 32 nodes have 2 GPUs each, and the next 32 have 2 MICs instead of GPUs. In the figure, the line labelled GPU, for instance, lists all the GPU resources available in the system. There are in total 2 · 32 GPUs: the first two in the list are from the first node, the third and the fourth from the second node, and so on. Each position in a list thus refers to a specific resource of type r in a node n.

Figure 2: Node mapping on the Eurora system.

This representation allows us to model the position of a job unit in a timeless way; however, it can easily be transformed into a 2-dimensional representation considering also the time, as depicted in Figure 3, where the y-axis gives the available positions to allocate a job unit on a resource type r, and the x-axis gives the time interval during which the job unit will consume the resource. In detail, the allocation of a unit j of a job i on a resource type r is represented as a box. The vertices of the box are defined by the variables at the origin: s(τ_i), which is the starting time of the job i, and y_{i,r,j}, which is the starting position of the allocation of the job unit in the new system representation. The box spans from the origin by the expected duration d(τ_i) in the x-axis, and by req_{i,r}/rn_i, the amount of resource required by the job unit, in the y-axis. As for the domains of the variables, we have D(y_{i,r,j}) = [1, Tcap_r], where Tcap_r = Σ_{n∈N} cap_{n,r}. The domain of the starting time remains the same as in the scheduling model, that is, D(s(τ_i)) = [t, eoh].

Figure 3: Representation of the allocation of a job unit on a resource type as a box.

To enforce that a resource can be used by one job unit only, we forbid the boxes to overlap via the diffn constraint. For each r ∈ R, we have diffn([s(τ_i)], [d(τ_i)], [y_{i,r,j}], [req_{i,r}/rn_i]). As the domain size of the y_{i,r,j} variables depends on the system size and can be very large, we add implied constraints to the model to shrink the domains. The first one regards the positions y_{i,r,j} of a job i on a resource type r when rn_i > 1: we post alldifferent([y_{i,r,j}]) to ensure that the positions are different. The other implied constraints are the classical cumulative constraints used together with a diffn constraint in packing problems, as was also done in [20]: cumulative([s(τ_i)], [d(τ_i)], [req_{i,r}/rn_i], Tcap_r) and cumulative([y_{i,r,j}], [req_{i,r}/rn_i], [d(τ_i)], eoh).

Finally, we need additional constraints to guarantee that certain job units are allocated in the same node. For that, we utilize a mapping array map_r for each resource type r, which is based on the new representation of the system introduced earlier. The positions of map_r correspond to the available resources, indexed by 1 to Tcap_r = Σ_{n∈N} cap_{n,r}, and each value in the array is a number corresponding to a system node. To ensure that the allocations of a unit j of a job i are in the same node across all its requested resource types, we post an element constraint, which indexes an array with a variable, as element(map_{r1}, y_{i,r1,j}) = element(map_{r2}, y_{i,r2,j}) for all r1, r2 ∈ R̂, where R̂ is the set of the requested resource types of the unit j of job i. We use the element constraint also to enforce that the covered positions spanning from y_{i,r,j} to y_{i,r,j} + req_{i,r}/rn_i are in the same node: element(map_r, y_{i,r,j}) = element(map_r, y_{i,r,j} + req_{i,r}/rn_i) for all r ∈ R̂ if req_{i,r}/rn_i > 1.
Search

We search on the scheduling and the allocation variables by interleaving the scheduling and the allocation assignments of a selected job. At each decision node during search, we select the job i whose priority is highest and that can start first. Note that the priorities are calculated once, statically, at the dispatching time t before search starts. We assign to s(τ_i) its earliest start time min(D(s(τ_i))). Then, among the allocation variables [y_{i,r,j}] of i, we select the one that has the minimum domain and assign it to its maximum value, following the best-fit strategy.

To evaluate the significance of our approach, we conducted an experimental study by simulating on-line job submission in two HPC systems. We dispatched the jobs using PCP'20, PCP'19 and HCP'19, and compared them in various aspects.
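The variable- and value-ordering heuristic of the search described above can be sketched in plain Python on hypothetical domain data. A real implementation hooks into the solver's search; all names and figures below are invented, and the earliest-start tie-break among equal priorities is omitted.

```python
# Static job priorities (slowdown at dispatching time t), then
# min-domain variable selection and max-value (best-fit) assignment.
t = 100
jobs = {                       # job -> (arrival q_i, expected duration d_i)
    "j1": (90, 10),
    "j2": (40, 30),
    "j3": (95, 50),
}

def priority(q, d):
    # slowdown of a queued job at time t: (t - q + d) / d
    return (t - q + d) / d

# highest-priority job first; computed once before search starts
order = sorted(jobs, key=lambda j: priority(*jobs[j]), reverse=True)

# domains of the allocation variables y_{i,r,j} of the selected job
y_domains = {"y_cores_0": range(0, 64), "y_gpu_0": range(0, 4)}
var = min(y_domains, key=lambda v: len(y_domains[v]))  # minimum domain
val = max(y_domains[var])                              # best fit: max value
print(order[0], var, val)   # → j2 y_gpu_0 3
```

The job that has waited longest relative to its duration (j2) is dispatched first, and its scarcest allocation variable (the GPU position) is assigned before the plentiful core positions.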
HPC systems and workload datasets
Our study is based on workload traces collected from two HPC systems different in size and architecture. The first is the KIT ForHLR II system, located at the Karlsruhe Institute of Technology in Germany. The system size is comparable to the current trend (see Figure 1), with 1,152 thin nodes, each equipped with 20 cores and 64 GB memory, along with 21 fat nodes, each containing 48 cores, 4 GPUs, and 1 TB memory. The workload dataset is available on-line and contains logs for 114,355 jobs submitted during the time period June 2016–January 2018. Of all the jobs, 66.26% are short.
We derived the expected durations d_i of jobs via three prediction methods. The first is a data-driven heuristic first proposed in [10] and later used with PCP'19 and HCP'19 during the simulation of the Eurora dataset [9]. Despite being a valid alternative, this method relies on job names, a type of data omitted in the KIT and some other public datasets. We thus employed a second heuristic method that uses the run times of the last two jobs to predict the duration of the next job [21]. In both methods, the predictions are calculated on-line during the simulation and the knowledge base is updated upon job termination. The last prediction method is an oracle which gives the actual runtime (real) durations d^r_i and provides a baseline during the simulation of both datasets.
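As a concrete illustration of the second method, a minimal sketch of the last-two heuristic [21] follows: the predictor averages the two most recent actual runtimes recorded for a user, falling back to the user-supplied estimate when no history exists. The per-user keying and the fallback rule are our assumptions for this sketch.

```python
# Last-two runtime predictor: knowledge base updated on job termination,
# queried on-line when a new job is submitted.
from collections import defaultdict, deque

history = defaultdict(lambda: deque(maxlen=2))  # user -> last two runtimes

def predict(user, user_estimate):
    h = history[user]
    return sum(h) / len(h) if h else user_estimate

def record(user, actual_runtime):
    # called when a job of this user terminates
    history[user].append(actual_runtime)

record("alice", 100)
record("alice", 300)
print(predict("alice", 3600))   # → 200.0 (average of the last two runtimes)
print(predict("bob", 600))      # → 600 (no history yet, use the estimate)
```

The deque with maxlen=2 keeps the knowledge base bounded: each terminated job overwrites the oldest of the user's two stored runtimes.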
We used the AccaSim workload management system simulator [8] to simulate the HPC systems with their workload datasets. Each job submission is simulated using its available data, for instance the owner, the requested resources, the real duration, and the execution command or the name of the application executed. AccaSim uses the real duration to simulate the job execution during its entire duration; therefore, job duration prediction errors do not affect the running time of the jobs with respect to the real workload data. The dispatchers are implemented using the AccaSim directives, which allows them to generate the dispatching decisions during the system simulation.
Experimental setup
As a CP modelling and solving toolkit, we customized Google OR-Tools and carried over its parameters to PCP'20. All experiments were performed on a CentOS machine equipped with an Intel Xeon CPU E5-2640 processor and 15 GB of RAM. The source code of all the dispatchers is available at https://git.io/fjia1.

Results

In this section, we show our experimental results. In each simulation, we compare the dispatchers' performance (in Tables 1 and 2) in terms of (i) the average CPU time spent in generating a dispatching decision over all dispatcher invocations, including the time for modeling the dispatching problem instance and searching for a solution, and (ii) the total simulation time from the first job submission until the last job completion. We also compare the dispatchers' QoS (in Figures 4 and 5) in terms of the average slowdown and waiting times of the jobs. To refer to a dispatcher using a certain job duration prediction method, we append -D, -L2 or -R to the name of the dispatcher for the data-driven heuristic, the last-two heuristic and the real duration, respectively.
PCP'19 cannot finalize the simulation of a big system like KIT ForHLR II. At some point in time, it stops dispatching, even if new jobs are entering the queue and the system is empty with all its resources available. This is because PCP'19 cannot handle certain dispatching instances within the available time limit and blocks the current and the next dispatching decisions. PCP'20 and HCP'19 instead complete the simulation, confirming their advantage over PCP'19 in a big system. Comparing PCP'20 and HCP'19 (Table 1 and Figure 4), we see that HCP'19 has a much better performance than PCP'20 and provides a slightly better QoS. This is not surprising due to the system architecture, with only CPU cores and memory in 98% of its nodes. In such a homogeneous system, allocation is rather trivial. The decisions generated in the scheduling stage of HCP'19 are often feasible also during the allocation stage, with no need of a special allocation approach.
All dispatchers finalize the simulation of a small system like Eurora. Comparing their results (Table 2 and Figure 5), we can clearly see the benefits of using PCP'20. In a heterogeneous system, allocation decisions are nontrivial, hence the decoupled approach of HCP'19 decreases the dispatcher performance significantly. We observe a further performance decrease in PCP'19, which can be attributed to its higher number of decision variables. While the quality of the dispatching decisions is comparable across the dispatchers (and is superior to that of the Eurora dispatcher PBS), we note the substantial decrease in the error of average slowdown from HCP'19-D to PCP'19-D and then to PCP'20-D.
Dispatcher   Avg. disp. time [ms]   Total sim. time [s]
HCP'19-L2    292                    58,934
PCP'19-L2    ∞                      ∞
PCP'20-L2    537                    108,608
HCP'19-R     270                    54,719
PCP'19-R     ∞                      ∞
PCP'20-R     662                    133,086

Table 1: Times obtained from the KIT ForHLR II system.

Figure 4: Average and error bars showing one std. deviation of slowdown and waiting times [s] obtained from KIT.

Dispatcher   Avg. disp. time [ms]   Total sim. time [s]
HCP'19-D     411                    219,142
PCP'19-D     565                    301,078
PCP'20-D     252                    134,240
HCP'19-R     385                    204,363
PCP'19-R     512                    272,925
PCP'20-R     364                    193,751

Table 2: Times obtained from the Eurora system.

Figure 5: Average and error bars showing one std. deviation of slowdown and waiting times [s] obtained from Eurora.

An additional analysis is needed in order to quantify the reduction in the number of decision variables obtained by going from PCP'19 to PCP'20. During the simulation of an HPC system and its workload data, all dispatchers start with the same dispatching instance, but then they schedule and allocate jobs diversely. This in turn leads to different jobs running on different resources of the system, as well as to different jobs waiting in the queue at the next dispatching time. We cannot therefore compare the dispatchers' model size on the distinct instances they entail throughout the simulation period. To analyze the dispatchers on the same instances, we saved the instances created during the simulation of the Eurora workload while using PCP'19-D and PCP'19-R as a dispatcher. Each instance is created when the simulator calls the corresponding dispatcher, and is described by the queued jobs, the running jobs, and their specific allocation on the system. We obtained in total 624,564 instances.

Figure 6 shows the ratio of the number of decision variables between PCP'20 and PCP'19 on each instance. For all instances, the ratio is below 0.1, proving the significance of the new allocation model in PCP'20. To confirm the impact on the dispatching time, we show in Figure 7 the ratio of the dispatching time. For almost all the instances, the ratio is between 1 and 0.01, supporting the direct effect of model size on the dispatcher performance. We also analyzed the ratio of the quality of the dispatching decisions. The results (not shown for space reasons) are in line with those shown in Figure 5: the ratio is 1 for the vast majority of the instances.

Figure 6: Ratio of the number of decision variables between PCP'20 and PCP'19 on the individual Eurora instances.

Figure 7: Ratio of the dispatching time between PCP'20 and PCP'19 on the individual Eurora instances.
Conclusions and future work

Constraint Programming (CP) is a well-established area in AI as a programming paradigm for modelling and solving discrete optimization problems, and it has been successfully applied to tackle the on-line job dispatching problem in HPC systems [4, 6], including those running modern applications [9]. The limitations of the available CP-based job dispatchers may hinder their practical use in today's systems, which are becoming larger in size and more demanding in resource allocation. In an attempt to bring basic AI research closer to a deployed application, we presented a new CP-based on-line job dispatcher for HPC systems (PCP'20). Unlike its predecessors, PCP'20 tackles the entire problem in CP and its model size is independent of the system size. Experimental results based on a simulation study show that with our approach dispatching performance increases significantly in a large system and in a system where allocation is nontrivial.

While we have used in our experiments real data representing the workload of modern applications, our conclusions are based on a simulation study which is restricted by the capabilities of the simulator. For instance, AccaSim does not add the dispatching time to the waiting times of jobs. This could be the reason why we have not observed meaningful gains in the QoS. In a real system, jobs' waiting time (and slowdown) would increase during dispatching, therefore dispatcher performance would directly affect the QoS. We want to investigate this by modifying the simulator accordingly. Towards our objective to deploy and evaluate a CP-based dispatcher in a real system, we plan to integrate in the model sophisticated allocation strategies, like those proposed for heterogeneous systems [14]. Moreover, we plan to improve the search performance by breaking the symmetry introduced in the model by the resources of the same type.
Acknowledgements
We thank A. Bartolini, L. Benini, M. Milano, M. Lombardi and the SCAI group at Cineca for providing the Eurora data. We also thank the School of Computer Engineering of PUCV in Chile for providing access to computing resources for simulations. C. Galleguillos has been supported by Postgraduate Grant INF-PUCV 2020.

References

[1] A. Aggoun and N. Beldiceanu. Extending CHIP in order to solve complex scheduling and placement problems. In JFPL'92, 1st French Conference on Logic Programming, 25-27 May 1992, Lille, France, page 51, 1992.

[2] Altair. Altair PBS Professional (accessed September 4, 2020), 2020.

[3] P. Baptiste, P. Laborie, C. Le Pape, and W. Nuijten. Chapter 22 - Constraint-based scheduling and planning. In Handbook of Constraint Programming, volume 2 of Foundations of Artificial Intelligence, pages 761–799. Elsevier, 2006.

[4] A. Bartolini, A. Borghesi, T. Bridi, M. Lombardi, and M. Milano. Proactive workload dispatching on the EURORA supercomputer. In Proceedings of Principles and Practice of Constraint Programming - 20th International Conference, CP 2014, Lyon, France, September 8-12, 2014, volume 8656 of Lecture Notes in Computer Science, pages 765–780. Springer, 2014.

[5] J. Blazewicz, J. K. Lenstra, and A. H. G. R. Kan. Scheduling subject to resource constraints: classification and complexity. Discrete Applied Mathematics, 5(1):11–24, 1983.

[6] A. Borghesi, F. Collina, M. Lombardi, M. Milano, and L. Benini. Power capping in high performance computing systems. In Proceedings of Principles and Practice of Constraint Programming - 21st International Conference, CP 2015, Cork, Ireland, August 31 - September 4, 2015, volume 9255 of Lecture Notes in Computer Science, pages 524–540. Springer, 2015.

[7] C. Cavazzoni. EURORA: a European architecture toward exascale. In Proceedings of the Future HPC Systems - the Challenges of Power-Constrained Performance, FutureHPC@ICS 2012, Venezia, Italy, June 25, 2012, pages 1:1–1:4. ACM, 2012.

[8] C. Galleguillos, Z. Kiziltan, A. Netti, and R. Soto. AccaSim: a customizable workload management simulator for job dispatching research in HPC systems. Cluster Computing, 23(1):107–122, 2020.

[9] C. Galleguillos, Z. Kiziltan, A. Sîrbu, and Ö. Babaoglu. Constraint programming-based job dispatching for modern HPC applications. In Proceedings of Principles and Practice of Constraint Programming - 25th International Conference, CP 2019, Stamford, CT, USA, September 30 - October 4, 2019, volume 11802 of Lecture Notes in Computer Science, pages 438–455. Springer, 2019.

[10] C. Galleguillos, A. Sîrbu, Z. Kiziltan, Ö. Babaoglu, A. Borghesi, and T. Bridi. Data-driven job dispatching in HPC systems. In Proceedings of Machine Learning, Optimization, and Big Data - Third International Conference, MOD 2017, Volterra, Italy, September 14-17, 2017, Revised Selected Papers, volume 10710 of Lecture Notes in Computer Science, pages 449–461. Springer, 2017.

[11] R. L. Henderson. Job scheduling under the Portable Batch System. In Proceedings of Job Scheduling Strategies for Parallel Processing, IPPS'95 Workshop, Santa Barbara, CA, USA, April 25, 1995, volume 949 of Lecture Notes in Computer Science, pages 279–294. Springer, 1995.

[12] ITIF. The vital importance of high-performance computing to U.S. competitiveness. Information Technology and Innovation Foundation (accessed September 4, 2020), 2016.

[13] P. Laborie and J. Rogerie. Reasoning with conditional time-intervals. In Proceedings of the Twenty-First International Florida Artificial Intelligence Research Society Conference, May 15-17, 2008, Coconut Grove, Florida, USA, pages 555–560. AAAI Press, 2008.

[14] A. Netti, C. Galleguillos, Z. Kiziltan, A. Sîrbu, and Ö. Babaoglu. Heterogeneity-aware resource allocation in HPC systems. In Proceedings of High Performance Computing - 33rd International Conference, ISC High Performance 2018, Frankfurt, Germany, June 24-28, 2018, volume 10876 of Lecture Notes in Computer Science, pages 3–21. Springer, 2018.

[15] PRACE. The scientific case for computing in Europe 2018-2026. PRACE Scientific Steering Committee (accessed September 4, 2020), 2018.

[16] A. Reuther, C. Byun, W. Arcand, D. Bestor, B. Bergeron, M. Hubbell, M. Jones, P. Michaleas, A. Prout, A. Rosa, and J. Kepner. Scalable system scheduling for HPC and big data. J. Parallel Distributed Comput., 111:76–92, 2018.

[17] F. Rossi, P. van Beek, and T. Walsh, editors. Handbook of Constraint Programming, volume 2 of Foundations of Artificial Intelligence. Elsevier, 2006.

[18] A. Silberschatz, P. B. Galvin, and G. Gagne. Operating System Concepts, 9th Edition. Wiley, 2014.

[19] G. Simonin, C. Artigues, E. Hebrard, and P. Lopez. Scheduling scientific experiments for comet exploration. Constraints, 20(1):77–99, 2015.

[20] H. Simonis and B. O'Sullivan. Search strategies for rectangle packing. In Proceedings of Principles and Practice of Constraint Programming - 14th International Conference, CP 2008, Sydney, Australia, September 14-18, 2008, volume 5202 of Lecture Notes in Computer Science, pages 52–66. Springer, 2008.

[21] D. Tsafrir, Y. Etsion, and D. G. Feitelson. Backfilling using system-generated predictions rather than user runtime estimates.