QoS-Aware Power Minimization of Distributed Many-Core Servers using Transfer Q-Learning
Dainius Jenkus∗, Fei Xia∗, Rishad Shafik∗, and Alex Yakovlev∗
∗School of Engineering, Newcastle University, Newcastle upon Tyne, NE2 1AD, UK
E-mail: {d.jenkus1, fei.xia, rishad.shafik, alex.yakovlev}@newcastle.ac.uk

Abstract—Web servers scaled across distributed systems necessitate complex runtime controls to provide quality of service (QoS) guarantees while minimizing energy costs under dynamic workloads. This paper presents a QoS-aware runtime controller that synergistically combines horizontal scaling (node allocation) and vertical scaling (resource allocation within nodes) to adapt to workloads while minimizing power consumption under a QoS constraint (i.e., response time). Horizontal scaling determines the number of active nodes based on workload demands and the required QoS according to a set of rules. It is coupled with vertical scaling using transfer Q-learning, which further tunes power/performance to the workload profile using dynamic voltage/frequency scaling (DVFS). The approach transfers Q-values within minimally explored states, reducing exploration requirements. In addition, it exploits the scalable architecture of the many-core server, reusing available knowledge from fully or partially explored nodes. Combined, these methods reduce exploration time and QoS violations compared to model-free Q-learning. The technique balances design-time and runtime costs to maximize portability and operational optimality, demonstrated through persistent power reductions with minimal QoS violations under different workload scenarios on heterogeneous multi-processing nodes of a server cluster.
I. INTRODUCTION
An important aspect of web servers is the need to scale with user traffic demands at a specified quality of service (QoS) with minimal energy costs [1]. Compute node allocation and/or management of application instances is often provided by horizontal scaling using rule-based controls, which employ server-level metrics, e.g., CPU utilization [2], or application workload characteristics [3] to make scaling decisions. Model-based techniques (e.g., using time-series analysis [2] or neural networks [4]) have been proposed to improve the optimality of runtime controls. However, their portability is affected because extensive modelling data is required, which can be limited given the complexity of the decision space of distributed systems and the dynamic environments of applications.

Alternatively, model-free Q-learning has been used [2], [5] to provide adaptability with little or no domain- or platform-specific knowledge. Although low-cost, model-free horizontal scaling using Q-learning often suffers from much greater QoS violation rates during exploration. At higher complexity, reinforcement learning (RL) hybrid controls [6] and model-assisted Q-learning [7] have been deployed to achieve faster learning and lower QoS violations. The authors of [8] proposed energy minimization using a transfer learning approach, where learnt DVFS actions are mapped to unexplored states, achieving faster learning and better application performance.

This work balances design-time and runtime costs to maximize portability and operational optimality by making the following contributions:
• a QoS-aware runtime controller (termed the RHQV Scaler) with joint horizontal and vertical scaling to provide elastic controls that minimize power cost and QoS violations under dynamic workloads,
• vertical scaling using the principle of transfer Q-learning, combining the proposed Inter-Node Learning Transfer (INLT) and Intra-State Learning Transfer (ISLT) methods to minimize QoS violations,
• experimental validation and comparative analysis of the method, demonstrating minimal QoS violations and minimized power under different workload scenarios.

The remainder of this paper is organized as follows: Section II explains the proposed runtime management. A case study and discussions are presented in Section III.

Fig. 1: Runtime controller (RHQV Scaler) and its interfaces.
II. PROPOSED RUNTIME MANAGEMENT
The runtime controls are carried out by rule-based horizontal scaling (RH) and vertical scaling using transfer Q-learning (QV), together termed the RHQV Scaler (Fig. 1). A monitor collects metrics, e.g., active users and response time (rt), to guide scaling decisions and to enable learning at runtime.

Rule-based Horizontal Scaling: first, the hierarchical controls of RHQV synergistically tune the power/performance of an application by scaling nodes up/down using a set of rules, while ensuring QoS (defined as a soft response time constraint). The scale-up action is reactively triggered and enables an additional node when QoS is violated and vertical scaling is no longer sufficient to maintain the required QoS. As the workload decreases, the scale-down rule ensures that rt is within the QoS constraint and that the workload would be sustained after scaling down, to avoid instability of scaling actions. This is achieved by comparing the current state (defined as active users) to the available history of states at which the system transitioned to the current configuration of nodes; a minimal sketch of these rules follows.
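To make the rule-based logic concrete, the following Python sketch shows one way the scale-up/scale-down rules could be realized. It is an illustration under stated assumptions, not the authors' implementation; the `at_max_dvfs` flag and the min-of-history sustainability test are hypothetical details.

```python
# Illustrative sketch of the rule-based horizontal scaling (RH) rules.
# Helper names and the history test are assumptions, not the paper's code.

class HorizontalScaler:
    def __init__(self, qos_ms: float, max_nodes: int):
        self.qos_ms = qos_ms          # soft response-time constraint (T_rt)
        self.max_nodes = max_nodes
        self.active_nodes = 1
        # history[n]: user counts observed when transitioning to n nodes
        self.history: dict[int, list[int]] = {}

    def decide(self, rt_ms: float, users: int, at_max_dvfs: bool) -> int:
        """One control interval; returns the new number of active nodes."""
        # Scale-up: QoS violated and vertical scaling (DVFS) exhausted.
        if rt_ms > self.qos_ms and at_max_dvfs and self.active_nodes < self.max_nodes:
            self.active_nodes += 1
            self.history.setdefault(self.active_nodes, []).append(users)
        # Scale-down: QoS met and the current load is below every load that
        # previously forced the current configuration (avoids oscillation).
        elif rt_ms <= self.qos_ms and self.active_nodes > 1:
            forced = self.history.get(self.active_nodes, [])
            if forced and users < min(forced):
                self.active_nodes -= 1
        return self.active_nodes
```

For example, `HorizontalScaler(qos_ms=500, max_nodes=4).decide(620, 180, True)` would enable a second node, and recording the 180 users that forced the transition provides the later scale-down test.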
Vertical Scaling using Transfer Q-learning: the power/performance of an application is further tuned by globally scaling the CPU frequencies of the active nodes using DVFS (see Fig. 1). The controls are orchestrated by the Q-learning agent, where the Q-values are estimated as follows:

$Q(s,a) = Q(s,a) + \alpha \left[ R(s,a) + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]$, (1)

where $R(s,a)$ is the reward, $\alpha$ is the learning rate, and $\gamma$ is the discount factor applied to the next state $s'$. The Q-values are transferred using the learning transfer, consisting of the ISLT and INLT components.

At the early exploration stage of an unexplored application, ISLT identifies how the performance varies (e.g., linearly) with different DVFS scaling actions across multiple states (i.e., active users). Once identified, the exploration of minimally explored states is guided to select specific actions, e.g., min/max DVFS operating points, in order to derive the state's performance-action relationship as an approximated function, $f_{rt}$. The Q-values are then transferred using Eq. (1), where the reward is estimated using Eq. (2). For each unexplored action within a state, the reward is found using the response time predicted by the identified performance-action function (i.e., $f_{rt}$). The best rewards are given to actions that allow the system to operate neither too fast (wasting energy) nor too slow (violating the QoS). The reward is defined as follows:

$R(s,a) = \begin{cases} \beta_2 \times (T_{slack}/T_{rt}), & \text{if } T_{slack} < 0, \\ \beta_1 \times (1/T_{slack}), & \text{otherwise,} \end{cases}$ (2)

where $T_{rt}$ is the QoS constraint, $T_{slack} = T_{rt} - rt$, $\beta_1$ is the constant for scaling the positive rewards ($T_{slack} \geq 0$), while $\beta_2$ scales the negative rewards received when QoS is violated.

INLT works by exploiting the scalability provided by clusters consisting of identical compute nodes. It transfers available knowledge between nodes which host instances of distributed applications. INLT maps an unobserved state in an unexplored multi-node configuration to a range of states corresponding to a fully or partially explored reference node. Such a state is then searched for in the Q-table of the reference node and, if found, the corresponding Q-values are transferred to the unexplored nodes; otherwise, exploration continues as described in the ISLT method. Sketches of both transfer mechanisms are given below.
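To make Eqs. (1) and (2) concrete, the following Python sketch shows how ISLT could transfer Q-values to the unexplored actions of a minimally explored state using a fitted performance-action function $f_{rt}$. The constants, the DVFS action set, and the linear $f_{rt}$ are illustrative assumptions; the next-state term of Eq. (1) is approximated by the same state's maximum Q-value, a simplification for the offline transfer step.

```python
import numpy as np

# Illustrative constants (assumed, not from the paper).
ALPHA, GAMMA = 0.5, 0.9            # learning rate, discount factor
BETA1, BETA2 = 1.0, 2.0            # positive / negative reward scales
T_RT = 0.5                         # QoS constraint T_rt in seconds
FREQS = [0.6, 1.0, 1.4, 1.8, 2.0]  # example DVFS operating points (GHz)

def reward(rt: float) -> float:
    """Eq. (2): penalize QoS violations; favour small positive slack."""
    t_slack = T_RT - rt
    if t_slack < 0:
        return BETA2 * (t_slack / T_RT)        # negative reward
    return BETA1 * (1.0 / max(t_slack, 1e-3))  # positive reward (capped)

def islt_transfer(Q: np.ndarray, state: int, f_rt) -> None:
    """Apply Eq. (1) to every unexplored action of `state`, using the
    response time predicted by f_rt instead of an observed one."""
    for a, freq in enumerate(FREQS):
        rt_pred = f_rt(freq)                   # predicted response time
        td_target = reward(rt_pred) + GAMMA * Q[state].max()
        Q[state, a] += ALPHA * (td_target - Q[state, a])

# f_rt fitted from probing only the min/max DVFS points, under the
# (roughly linear) performance-action relationship the paper describes.
f_rt = lambda freq: 0.9 - 0.3 * freq           # rt falls as frequency rises
Q = np.zeros((100, len(FREQS)))                # states = active-user levels
islt_transfer(Q, state=42, f_rt=f_rt)
```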
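A companion sketch for INLT follows, assuming identical nodes with evenly balanced load; the user-to-state bucketing and the explored-state test are hypothetical details standing in for the paper's state mapping.

```python
import numpy as np

def inlt_transfer(q_ref: np.ndarray, q_new: np.ndarray,
                  users: int, n_nodes: int, bucket: int = 10) -> bool:
    """Map an unobserved state of an unexplored multi-node configuration
    onto the state range of an explored reference node, then copy its
    Q-values if that state exists in the reference Q-table."""
    # Assumed mapping: with balanced load, each of the n identical nodes
    # serves users / n_nodes, which indexes the reference node's states.
    ref_state = (users // n_nodes) // bucket
    if ref_state < len(q_ref) and q_ref[ref_state].any():
        q_new[ref_state] = q_ref[ref_state]    # transfer learnt Q-values
        return True
    return False  # state not found: fall back to ISLT-guided exploration
```

Returning False corresponds to the paper's fallback: exploration continues as described in the ISLT method.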
III. CASE STUDY AND DISCUSSIONS

The proposed controller is evaluated in terms of power savings and QoS violation rate (the percentage of control intervals with violated QoS). The case study application is a Wordpress website hosted on a cluster of four Odroid XU4s. Two workload cases are investigated, covering irregular (rapidly changing) web traffic and a scaled one-day English Wikipedia trace, both applied using the developed workload generator.

Fig. 2: Comparative QoS trade-offs and power savings.

Fig. 2 (a) indicates that ISLT knowledge transfer improves the selection of vertical scaling decisions (DVFS actions), reducing QoS violations by around 50% when exposed to both workloads for the first time. Fig. 2 (b) displays the QoS violation rate when a second, unexplored node is enabled and Q-values are transferred from the explored node using INLT. Similarly to ISLT, QoS violations are lower for the Wikipedia workload and are reduced by approximately a factor of two compared to model-free Q-learning.

The effectiveness of power minimization has been evaluated against the QoS-unaware Linux Ondemand and Performance governors [9], as well as the RH method, which excludes vertical scaling but employs horizontal scaling identical to RHQV. The proposed scaler provides up to 28.39% savings compared to the system using the Performance governor and up to 16.33% over the Ondemand governor, as shown in Fig. 2 (c). RHQV offers 13.65% more savings than RH (without vertical scaling), suggesting that the remaining power reductions come from the transfer Q-learning based vertical scaling. This highlights the advantage of synergistically joint vertical and horizontal scaling controls for power minimization.

REFERENCES
[1] E. M. Elnozahy et al., “Energy-efficient server clusters,” in Int. Workshop on Power-Aware Computer Systems, pp. 179–197, Springer, 2002.
[2] S. Horovitz and Y. Arian, “Efficient cloud auto-scaling with SLA objective using Q-learning,” pp. 85–92, 2018.
[3] T. Lorido-Botran, J. Miguel-Alonso, and J. A. Lozano, “A review of auto-scaling techniques for elastic applications in cloud environments,” Journal of Grid Computing, vol. 12, pp. 559–592, Dec 2014.
[4] S. Islam et al., “Empirical prediction models for adaptive resource provisioning in the cloud,” Future Generation Computer Systems, vol. 28, no. 1, pp. 155–162, 2012.
[5] H. Arabnejad et al., “A comparison of reinforcement learning techniques for fuzzy cloud auto-scaling,” pp. 64–73, 2017.
[6] G. Tesauro et al., “A hybrid reinforcement learning approach to autonomic resource allocation,” pp. 65–73, 2006.
[7] F. Rossi et al., “Horizontal and vertical scaling of container-based applications using reinforcement learning,” pp. 329–338, 2019.
[8] R. A. Shafik et al., “Learning transfer-based adaptive energy minimization in embedded systems,” IEEE Trans. on CAD of Integrated Circuits and Systems, vol. 35, no. 6, pp. 877–890, 2016.
[9] H. Kopka and P. W. Daly, The Ondemand governor. Harlow, England: Addison-Wesley, 3rd ed., 1999.