Sapphire: Automatic Configuration Recommendation for Distributed Storage Systems
Wenhao Lyu, Youyou Lu, Jiwu Shu, Wei Zhao
Tsinghua University, SenseTime Research
Abstract
Modern distributed storage systems come with a plethora of configurable parameters that control module behavior and affect system performance. The default settings provided by developers are often sub-optimal for specific use cases. Tuning parameters can provide significant performance gains but is a difficult task requiring profound experience and expertise, due to the immense number of configurable parameters, complex inner dependencies, and non-linear system behaviors. To overcome these difficulties, we propose an automatic simulation-based approach, Sapphire, which recommends optimal configurations by leveraging machine learning and black-box optimization techniques. We evaluate Sapphire on Ceph. Results show that Sapphire significantly boosts Ceph performance to 2.2x compared to the default configuration.

1 Introduction

Modern distributed storage systems often have multi-layered and highly modular software architectures, support various types of use cases, and consist of heterogeneous storage devices. Building such a complex system involves many design choices, producing a large number of configurable parameters [30]. Figure 1 shows that the number of Ceph parameters has grown drastically to over 1500, nearly three times the original version. Yet storage systems are often deployed with the default settings provided by developers, which are sub-optimal for specific use cases [21]. Tuning parameters can provide significant performance gains [34], but is a challenging task for ordinary users, as rich experience and deep insight into system internals are required. The performance impact of parameter settings is highly related to hardware and workload characteristics: no single configuration works well under all use cases [8]. However, depending on expert system administrators to manually tune parameter values for different use cases is unrealistic. An automatic solution to generate near-optimal configurations for different user scenarios is in demand.
Figure 1: Ceph parameter growth. The number of parameters in major Ceph releases since 2012.

We find new challenges in distributed storage scenarios compared to previous studies on automatic parameter tuning in storage systems [5, 8, 34]. (1)
Configuration constraints.
Many value constraints exist inside the parameter domain. Misconfigurations that violate these constraints can cause system failures or even crashes. (2)
Huge numbers of parameters.
Distributed storage systems often provide immense numbers of parameters; the newest Ceph Nautilus comes with 1536. Searching for optimal settings in such an enormous knob space remains challenging for popular black-box techniques. (3)
Higher noise.
Benchmarking results often contain stochastic noise that becomes much more noticeable in distributed environments.

In this paper, we propose Sapphire
to automatically recommend near-optimal configurations. First, we provide general guidelines to resolve the complex parameter constraints and generate a clean configurable parameter domain for later processing. Second, we analyze the performance effects of Ceph parameters and find that only a small set of parameters has a significant impact on system performance; based on this observation, we rank parameters by impact and tune only the top knobs, which significantly shrinks the optimization time of Sapphire while maintaining its efficacy. Third, we recommend using Bayesian optimization with a Gaussian Process to effectively approximate system performance from noise-corrupted evaluation results.

Figure 2: (a) Non-reusable configuration: performance of the optimized and the default configurations under two different workloads. (b) Non-linear performance effect: how Ceph bandwidth changes as the PG number of a pool is altered.

Our evaluations show that Sapphire
can significantly improve average Ceph performance, by 2.2x compared to the default configuration.

The rest of this paper is organized as follows. In Section 2, we explain our motivation and the difficulties of tuning a distributed storage system. In Section 3, we elaborate on the design and implementation of our automatic tuning system. In Section 4, we conduct evaluations and demonstrate that Sapphire recommends configurations far better than the default and the expert ones. In Section 5, we analyze recent related work and explain the uniqueness and advantages of Sapphire. Finally, we discuss future work and conclude.
2 Background and Motivation

Ceph is a unified open-source distributed storage system that has gained popularity in cloud environments [28]. It comes with plenty of configurable parameters, but most users resort to the default settings, trusting the configuration provided by developers to be "good enough". Tuning parameters is massively time-consuming and challenging for ordinary users, as profound experience and expertise in system internals are required. Even worse, Ceph lacks functionality descriptions for its parameters and guidelines on how to tune them, leaving users clueless. This scarcity of parameter information also complicated our own study.

Unfortunately, default parameter settings are far from optimal and may result in poor I/O performance, especially on new hardware platforms like NVMe SSDs. Default settings are mostly tuned for common commodity machines; users who pursue higher performance with more powerful servers should reconfigure system settings to leverage the extra hardware resources. Our evaluation shows that tuning parameters in Ceph can provide significant performance gains. Thus, we advocate that Ceph users tune parameters based on their specific situations.

Traditionally, parameter tuning is done by system administrators. They adjust settings, measure system performance, and tweak parameter values intuitively based on their experience and insight into the system. But in distributed storage systems like Ceph, tuning is much more challenging. Moreover, the optimal settings depend on hardware and workload characteristics [7]: the best configuration for one workload may not perform as well for others. In Figure 2a, we optimize the configuration for one workload, then test the default and the optimized configurations on a different workload; the optimized setting becomes ineffective there. The tuning complexity and the non-reusability of optimal settings make manual tuning intractable. Thus, to provide the best settings for various users, an efficient and effective automatic parameter tuning approach is needed in distributed storage systems.

Evaluation of distributed systems can be very time-consuming, as restarting and redeploying clusters is needed to make new parameter values take effect. For a Ceph cluster with 48 OSDs (Object Storage Daemons), one system evaluation may cost nearly half an hour, and the time increases as the cluster scales out. Thus, we use a simulation-based approach to design a lightweight system for near-optimal configuration searching (Sec. 3.1); evaluations are conducted in a small fixed-size test environment.

During our study, we find new challenges in automatic parameter tuning of distributed storage systems. (1)
Configuration constraints.
Complicated parameter value constraints exist inside the configuration space, and system performance does not change linearly with parameter values. Figure 2b shows how bandwidth changes as we alter the placement group number in a storage pool; such irregular, multi-peak correlations make it hard to reach globally optimal performance, as local optima must be avoided.

Also, some constraints are neither documented by developers nor pinpointed by system log messages. For example, in Ceph Luminous, the placement group number is restricted to between 30 and 250 per OSD; this constraint is hard-coded in the system but does not appear in the developers' documentation. Changes to parameter values must obey these constraints, as violating them can cause system failures or even crashes [31]. However, to apply automatic algorithms, we need a clean search domain that has explicitly defined boundaries and contains no misconfigurations. To address this problem, we propose the parameter constraints solution, which generates a well-defined parameter value domain under the constraints (Sec. 3.2). (2)
Huge numbers of parameters.
Distributed storage systems provide many more parameters, often exceeding one thousand. Figure 1 shows the growth of Ceph knobs over the last ten years: with each release, more parameters are added, while few are deprecated. The newest Ceph Nautilus comes with 1536 parameters. Searching for optimal settings in such an enormous parameter space is still challenging for popular black-box optimization techniques. This motivates us to rank parameter impacts and search for optimal configurations only within the top set of parameters (Sec. 3.3). (3)
Higher noise.
Benchmark results of storage systems often contain stochastic noise, which becomes much more noticeable in distributed environments. In our experiments, benchmark noise can deviate from the system's average performance by 150MB/s (2.5%). The conventional approach is to average multiple tests, but that is unbearably time-consuming. To handle the high stochastic noise, we use Bayesian optimization with a Gaussian Process, which effectively searches for optimal configurations through noise-corrupted observations (Sec. 3.4).

These efficiency and efficacy difficulties make tuning Ceph parameters a laborious and complicated task. Worse, optimal settings cannot be reused across different hardware and workloads, and manually configuring every use case is intractable due to the long tuning time and the difficulty of finding the global optimum. Thus, to provide the best configurations for various users, an efficient and effective automatic parameter tuning approach is highly needed in the Ceph distributed storage system.
3 Design

In this section, we present our automatic optimal configuration recommendation system, Sapphire, which overcomes these parameter tuning challenges. Figure 3 shows an overview of the workflow and the main components of the system.

Figure 3: The Sapphire framework. The controller manages the cluster, benchmarks probing configurations in the test environment, and stores target metrics in the evaluation database; the ranking and optimization models select the top K parameters and recommend a configuration for the product environment.

3.1 Overview
Sapphire consists of two main parts: the controller and the machine learning (ML) models. The controller accepts user settings such as the cluster setup, the maximum number of iteration steps, and the number of top parameters used in the ML models. It manages the storage cluster and makes parameter changes take effect by injecting run-time commands or by restarting and re-deploying system services. The controller also benchmarks the performance of the storage system and sends the target metrics to the machine learning models. The models consist of the ranking model and the optimization model. All system measurement results are stored in the evaluation database. The ranking model processes all evaluation results and produces a parameter ranking list according to each parameter's impact on system performance. Based on the ranking, the optimization model uses the top K parameters to generate the search domain; it then probes different configurations and refines the learning model until the optimal settings are found or the iteration limit is reached.
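To make the controller's apply-and-measure step concrete, the following Python sketch shows one way such a step could be driven. It is an illustration under assumptions, not Sapphire's actual code: it relies on Ceph's injectargs interface (which only affects options changeable at run time; other options need a config-file change and a service restart) and on the summary line printed by the stock rados bench tool. The pool name and the probed parameter value are placeholders.

```python
import re
import subprocess

def apply_config(params: dict) -> None:
    # Push new values into all running OSDs at run time. Options that
    # cannot change at run time would instead require rewriting ceph.conf
    # and restarting the daemons, which this sketch omits.
    args = " ".join(f"--{name} {value}" for name, value in params.items())
    subprocess.run(["ceph", "tell", "osd.*", "injectargs", args], check=True)

def benchmark(pool: str = "sapphire-test", seconds: int = 60) -> float:
    # Run the built-in benchmark and parse its summary line, e.g.
    # "Bandwidth (MB/sec):   1543.21".
    out = subprocess.run(
        ["rados", "bench", "-p", pool, str(seconds), "write", "--no-cleanup"],
        capture_output=True, text=True, check=True,
    ).stdout
    return float(re.search(r"Bandwidth \(MB/sec\):\s*([\d.]+)", out).group(1))

# One probing iteration: apply a candidate configuration, then report
# the target metric back to the models.
apply_config({"osd_op_num_shards_ssd": 8})
print("bandwidth (MB/s):", benchmark())
```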
Sapphire adopts a simulation-based approach: it gathers experience and builds its optimization models by evaluating the performance of a small test cluster under the same simulated or initialized workload, and based on the trained models it recommends optimal configurations for the large product storage cluster. In this way, the search for optimal configurations does not interfere with the online service in the product environment. We find the test environment much more efficient to evaluate while accurately simulating the dynamic behavior of the large product cluster, as modern distributed storage systems provide good scalability.

3.2 Parameter Constraints Solution

Parameter constraints | Ceph examples (Luminous)
1. Some parameters are unconfigurable. | fsid and mon host are fixed at startup and should not be tuned.
2. Some parameters have strict boundaries. | In Ceph Luminous, the PG number is restricted to [30, 250] per OSD.
3. Some parameters determine whether others take effect, as they control which module or functionality to use. | osd objectstore determines the backend type for OSDs, which can be bluestore, filestore, memstore, or kstore.
4. Some parameter values are interdependent, e.g., they must have a fixed sum, or one must stay below the other. | The sum of bluestore cache kv ratio and bluestore cache meta ratio must not exceed 1; ms async max op threads constrains the maximum value of ms async op threads.

Table 1: Summary of value constraints inside the parameter space.
We address the problem of the constrained parameter value space. In Table 1, we analyze and summarize the existing constraints inside the parameter domain, and we propose general preprocessing guidelines to deal with them. With this solver, we wash out unconfigurable parameters, prune unused ones, and set up value boundaries. Finally, we generate a clean and complete configurable parameter space, which contains no misconfigurations and has well-defined boundaries for the later impact ranking and automatic tuning.
Parameter washing:
Unfortunately, parameter tuning receives little attention from Ceph developers. The documents they provide lack descriptions of parameter functionality and do not cover all the knobs that can be tuned. Thus, we analyze the configuration source code directly to obtain the complete parameter set. In that set, parameters that cannot be tuned, like port numbers and IP addresses, are mixed with configurable ones. We statically analyze the variable names, data types, usage tags, and descriptions of the parameters, and remove unconfigurable ones such as IP addresses, port numbers, path strings, and debugging parameters.
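As a small illustration of these washing rules, the sketch below filters a list of option records by name patterns and metadata. The record fields and the exact patterns are our assumptions; the real analysis runs over Ceph's configuration source code.

```python
import re

# Name fragments that mark an option as unconfigurable or irrelevant
# for performance tuning: network identities, paths, and debug knobs.
NON_TUNABLE = re.compile(r"(addr|port|path|dir|uuid|fsid|^debug)", re.IGNORECASE)

def wash(options: list) -> list:
    """Keep only options that are plausibly tunable."""
    kept = []
    for opt in options:  # opt: {"name": str, "type": str, "tags": list}
        if NON_TUNABLE.search(opt["name"]):
            continue                      # fixed identity or path string
        if "dev" in opt.get("tags", []):  # developer/debug-only option
            continue
        kept.append(opt)
    return kept

print(wash([{"name": "public_addr", "type": "str", "tags": []},
            {"name": "osd_max_pgls", "type": "int", "tags": []}]))
# only osd_max_pgls survives the filter
```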
Parameter pruning:
Modern distributed storage systems often consist of multiple layers of sub-modules and provide different implementations of the same functionality for customization. Different modules and implementations have unique parameters that control their behavior, with no shared parameters between them. For example, Ceph provides Bluestore and Filestore as two different backends and uses osd objectstore to determine which one is active. For a specific use case, some modules are never used, and their parameters cannot affect system performance; we can therefore discard them to reduce the parameter space.

To exploit this, we arrange the module-selecting parameters into an indexing structure and classify knobs by the module and sub-module they belong to. For a specific use case, we analyze the modules it depends on, set the values of the module-selecting parameters, and prune the unused knobs to shrink the configuration space.
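A minimal sketch of that indexing structure, assuming a toy index with only a couple of knobs per backend, could look like this:

```python
# Each module-selecting parameter maps its candidate values to the knobs
# that only matter when that value is chosen (contents are illustrative).
MODULE_INDEX = {
    "osd_objectstore": {
        "bluestore": {"bluestore_cache_size_ssd", "bluefs_alloc_size"},
        "filestore": {"filestore_queue_max_ops", "filestore_max_sync_interval"},
    },
}

def prune(param_space: set, selections: dict) -> set:
    """Drop knobs belonging to modules the target use case never loads."""
    kept = set(param_space)
    for selector, choices in MODULE_INDEX.items():
        for value, knobs in choices.items():
            if value != selections.get(selector):
                kept -= knobs  # unused backend: its knobs cannot matter
    return kept

all_knobs = {"bluestore_cache_size_ssd", "bluefs_alloc_size",
             "filestore_queue_max_ops", "osd_op_num_shards_ssd"}
# A BlueStore deployment drops every FileStore-only knob.
print(prune(all_knobs, {"osd_objectstore": "bluestore"}))
```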
Parameter boundary:
Parameter values are a mix of numerical and non-numerical ones. We convert non-numerical values, such as booleans or strings, into integers by mapping candidate values to consecutive indexes.

The other problem is that developers do not provide boundaries for most parameters, yet a bounded search domain is necessary for the optimization models. Setting parameter boundaries intuitively around the default values would be simple, but such a static approach cannot guarantee that the optimal setting is included. Thus, we dynamically enlarge the range of a parameter when the probing point comes near its boundary.

Figure 4: Dynamic boundary. The recommendation process of Sapphire with dynamic and static boundaries.

Figure 4 shows that with the dynamic boundary strategy, our optimization algorithm successfully finds the globally optimal result.
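The enlargement rule itself fits in a few lines. The sketch below is our reading of the strategy; the 5% edge margin and the one-span extension are assumptions rather than values reported here, and any hard constraints from Sec. 3.2 would still cap the widened range.

```python
def widen_if_near_edge(lo, hi, probe, margin=0.05):
    """Enlarge a parameter's range when a probing point lands within
    `margin` (a fraction of the current range) of either bound."""
    span = hi - lo
    if probe <= lo + margin * span:    # optimum may lie below the range
        lo -= span
    elif probe >= hi - margin * span:  # optimum may lie above the range
        hi += span
    return lo, hi

# A probe at 1000 against the range [80, 1040] is near the upper edge,
# so the bound is pushed out and the optimizer can keep exploring.
print(widen_if_near_edge(80, 1040, 1000))  # -> (80, 2000)
```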
3.3 Parameter Ranking

In this section, we present parameter ranking, which addresses the problem of the high-dimensional parameter space. After the parameter preprocessing, we obtain a clean set of configurable parameters that may affect system performance. The challenge is that there are still hundreds of configuration parameters in Ceph, and searching for near-optimal settings in such a high-dimensional configuration space is impractical.

During our experiments, we find that some parameters have a massive impact on system performance, while others seem to have no effect at all. Based on this observation, we suggest tuning only the most influential parameters in the configuration recommendation process. But another challenge is that we do not know which parameters those are. Ceph developers do not provide this information, and given the complexity of the system, we believe even developers would have difficulty determining the most influential parameters. Running experiments for each parameter to measure its influence is not feasible, as the number of parameters is huge and, as discussed before, system evaluation in Ceph is very time-consuming: we have nearly a thousand parameters to analyze, yet we can only conduct a few tens to hundreds of evaluations.

We therefore propose to use machine learning techniques to quantify and rank parameter importance, and finally select the most important parameters. We sample the entire parameter space randomly, probe Ceph with the sampled configurations, and collect the corresponding system performance. Finally, based on the Lasso [36] method, we analyze the relationship between parameters and performance in the sample data and rank parameter importance.

In machine learning, feature selection is the process of selecting a subset of the most relevant features; such techniques are used to avoid the curse of dimensionality [3] and are often applied in domains with many features and comparatively few samples. Feature selection methods are typically divided into filter, wrapper, and embedded methods [12]. Filter methods are particularly efficient in computation time and robust to overfitting [32], but they tend to select redundant features when correlations exist. Wrapper methods can detect possible interactions between features [20], but they increase the overfitting risk when the number of samples is insufficient and need significant computation time when the number of features is large. Embedded methods combine the advantages of both, performing feature selection as part of the model construction process. Lasso regression is a typical embedded feature selection method [19].

Ridge regression and Lasso regression are derived from the ordinary least squares (OLS) method, a standard approach in regression analysis that minimizes the sum of squared residuals. But OLS may have huge variance in such a high-dimensional, small-sample setting, resulting in a biased, inefficient model. To address this overfitting problem, ridge regression uses the L2 penalty, which penalizes the sum of squared coefficients; this shrinks coefficients toward zero to decrease model complexity, but it cannot zero them out, so all features remain. Lasso instead uses the L1 penalty, which penalizes the sum of absolute values. With the L1 penalty, Lasso can zero out many small coefficients, excluding less relevant features and thereby performing feature selection. This makes Lasso work well in high-dimensional scenarios.

Lasso has further advantages over other regularization and feature selection methods. It is interpretable, stable, and computationally efficient [10, 23], and much practical and theoretical research backs its effectiveness as a consistent feature selection algorithm [24, 25, 35]. Thus, we use Lasso to quantify and rank parameter importance and select the most important parameters.

We also need to preprocess the sample data before applying the Lasso model, because Lasso provides higher-quality results when the features are continuous and have approximately the same order of magnitude. Sapphire preprocesses the sample data in two steps.

First, Sapphire deals with categorical parameters. A categorical parameter has two or more categories with no intrinsic ordering. For example, osd objectstore has four values, bluestore, filestore, memstore, and kstore, and there is no agreed way to order them from highest to lowest. Sapphire transforms categorical parameters into dummy variables: a categorical parameter with n values becomes n binary parameters that take the value zero or one. This conversion introduces more parameters, but the number of categorical parameters in Ceph is fairly small (about ten percent), so the resulting degradation can be ignored.

Second, Sapphire normalizes the data values to the same order of magnitude. Sapphire
uses a log-transformation (log p) on parameter and performance values. After the log-transformation, the data values have approximately the same order of magnitude. The log transformation also decreases the variability of the data and makes it conform more closely to the normal distribution.

Parameter | Description | Type | Default | Range
osd per nvme | The number of OSDs on a single NVMe SSD. | Integer | 1 | Dynamic
osd op num threads per shard ssd | The number of threads per shard for SSD operations. | Integer | 2 | Dynamic
osd op num shards ssd | The number of shards for SSD operations. | Integer | 8 | Dynamic
bluestore cache size ssd | Default bluestore cache size for non-rotational (solid state) media. | Integer | 3221225472 | Dynamic
bluefs alloc size | The allocation size the BlueFS allocator instance is initialized with. | Integer | 1048576 | Dynamic
pg per osd | The placement group number for each OSD. | Integer | 100 | [30, 250]
objecter tick interval | None | Double | 5 | Dynamic
ms async rdma send buffers | How many work requests for the RDMA send queue. | Integer | 1024 | Dynamic
osd max pgls | Maximum number of placement groups to list. | Integer | 1024 | Dynamic
osd loop before reset tphandle | Max number of loops before we reset the threadpool's handle. | Integer | 64 | Dynamic
osd op pq min cost | None | Integer | 65536 | Dynamic
osd max omap bytes per request | The max omap size for a single request. | Integer | 1073741824 | Dynamic
journaler write head interval | None | Integer | 15 | Dynamic
osd agent delay time | None | Double | 5 | Dynamic
osd agent max ops | Maximum number of simultaneous flushing ops per tiering agent in the high-speed mode. | Integer | 4 | Dynamic
mgr mon bytes | None | Integer | 134217728 | Dynamic

Table 2: Description of the top 16 configuration parameters generated by ranking.
3.4 Configuration Recommendation

In this section, we introduce the configuration recommendation process in Sapphire. Sapphire leverages experiment-driven black-box optimization techniques to search for near-optimal settings. Black-box optimization suits our problem well, as it views the complex system in terms of its inputs and outputs and assumes obliviousness to the system internals. Modeling-based techniques [15] instead try to build effective and efficient performance-prediction models via a deep understanding and formalization of the system, but they are hard to apply here due to the complexity of distributed storage systems.

We model our problem as an optimization problem with the objective function m = F_hw(wl, conf): given a Ceph system deployed on a certain hardware environment (F_hw) and a given workload (wl), suggest a configuration (conf) that optimizes the target metric (m). The target metric can be system bandwidth, latency, or energy.

The experiment-driven black-box auto-tuning method contains two main units: the Experiment Unit and the
Search Unit. The Experiment Unit takes parameter configurations as input, executes tests on the system with those configurations, and automatically collects the resulting target performance metrics. Each configuration and its result are combined and provided to the Search Unit. Guided by an optimization algorithm, the Search Unit selects the next configuration to try based on previously learned information. This try-and-test procedure continues until it reaches the optimal result or the iteration limit.

Selecting the optimization algorithm is important to Sapphire's performance. Considering the characteristics of Ceph, we analyzed and compared four widely studied optimization algorithms: Simulated Annealing (SA), Genetic Algorithms (GA), Reinforcement Learning (RL), and Bayesian Optimization (BO) [2, 9, 17, 26, 37]. SA suggests the next configuration based only on the current system state and does not learn from old experience, which makes it unreliable for finding the global optimum. GA generates a whole set of new configurations in each iteration and thus requires many more system measurements than the others; since system measurement in Ceph is very time-consuming, GA is less practical. RL requires an accurate data set to train its deep convolutional neural network; when data noise exists, the learning process in RL can be extremely slow, because much effort is spent unlearning biased estimates [11].

We observe that, with a Gaussian Process, Bayesian Optimization can approximate the objective function accurately through noise-corrupted observations. Besides tolerating stochastic noise, BO utilizes the full history of past evaluation results, which makes the search more accurate, and it probes only one new configuration per iteration, which makes the search more efficient. Thus, we implement the optimization algorithm in Sapphire based on Bayesian Optimization with a Gaussian Process.
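As one concrete realization of this search loop, the sketch below uses scikit-optimize's gp_minimize, which fits a Gaussian Process to noisy observations and picks each next probe with an acquisition function; the apply_config and benchmark helpers are the ones sketched in Sec. 3.1. The parameter ranges, iteration budget, and the choice of this particular library are our assumptions, not Sapphire's documented implementation.

```python
from skopt import gp_minimize
from skopt.space import Integer

# Search domain over a few top-ranked knobs (ranges illustrative;
# pg_per_osd's hard bounds come from the constraint solver, Sec. 3.2).
space = [
    Integer(1, 8, name="osd_per_nvme"),
    Integer(1, 16, name="osd_op_num_threads_per_shard_ssd"),
    Integer(30, 250, name="pg_per_osd"),
]

def objective(values):
    # Apply the candidate configuration, benchmark it, and return the
    # negated bandwidth, since gp_minimize minimizes its objective.
    apply_config({dim.name: v for dim, v in zip(space, values)})
    return -benchmark()

result = gp_minimize(
    objective,
    space,
    n_calls=50,          # evaluation budget (assumption)
    noise="gaussian",    # model benchmark noise instead of averaging runs
    random_state=0,
)
print("best configuration:", result.x, "bandwidth (MB/s):", -result.fun)
```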
4 Evaluation

In this section, we detail our evaluation of Sapphire. We first cover the experimental settings in Section 4.1. Section 4.2 analyzes the top storage parameters generated by the parameter ranking process. Section 4.3 demonstrates the efficiency of Sapphire, and Section 4.4 shows that Sapphire's recommended configuration outperforms the default and the expert ones.
4.1 Experimental Setup

We implement Sapphire on the Ceph RADOS [29] layer to improve the performance of the whole storage system from the very bottom. RADOS is the object storage layer in Ceph that provides the shared storage backend; all user-consumable services, like the Ceph Object Store, Ceph Block Device, and Ceph File System, use RADOS to store data and metadata [1]. RADOS consists of two types of daemons: the Monitors and the OSDs. The Monitors maintain the cluster map information; the OSDs provide the underlying object storage for all user data.

We perform optimal configuration searching in the test environment and then test the recommended configuration in the product environment. The test environment comprises three hosts: one Monitor host, one OSD host, and one Client host. The product environment has six hosts: one Monitor host, three OSD hosts, and two Client hosts. Each host has four NVMe SSDs as storage devices. Both clusters consist of machines with the same hardware and software specifications, summarized in Table 3.

Ceph   | Luminous 12.2.12
OS     | CentOS 7.4 with kernel 3.10
CPU    | Intel Xeon Gold 6148 Processor
Memory | 32GB Micron DDR4 DRAM

Table 3: The hardware and software details. All experiment machines have the same setup.
Table 4: Radosbench workload settings. Rand and Seq refer to random read and sequential read. For the read benchmarks, we prefill objects at the start. Size stands for the object size, Procs for the number of concurrent rados bench processes, and Ops for the concurrent I/O operations inside each process.

We use the Ceph built-in benchmark tool, rados bench, to generate various workloads for evaluating system performance. There are three different workloads in our experiments; Table 4 shows their detailed settings.

For simplicity of cluster management and accuracy of performance evaluation, we disable Ceph authentication, debugging functions, and cache tiering agents in all experiments. We set the replica count to one and use a single storage pool for all rados bench clients. Before measuring random read and sequential read performance, we prefill the pool with write operations; prefilling takes more time than reading, as write speed is generally much slower.

Figure 5: Performance measurement of Ceph under the default, manual, and Sapphire-recommended configurations in the test and product environments for three workloads.
4.2 Parameter Ranking Analysis

Here we calculate and analyze the importance of the storage parameters generated by Sapphire's parameter ranking process. We collect about three hundred system evaluations as sample data, each under a different configuration generated by random sampling. After collection and preprocessing, Sapphire calculates the importance of every configurable storage parameter. Table 2 lists details of the top 16 most important parameters after ranking.
Figure 6: Parameter ranking. Quantified parameter importance plotted against rank.

Figure 6 presents the parameter ranking result obtained by using Sapphire to analyze the three hundred evaluation results. The curve drops drastically, which tells us that only the top set of parameters can significantly affect Ceph storage performance.
4.3 Efficiency of Sapphire

Here we evaluate the time Sapphire needs to recommend near-optimal configurations. In Sections 3.3 and 3.4, we rank the effect of Ceph parameters using Lasso and use only the top parameters for the subsequent optimization. To validate this design, we test Sapphire with the top 64, 32, and 16 parameters under the same workload.

Figure 7: The recommendation process of Sapphire with the top 64, 32, and 16 parameters.

In Figure 7, with the top 16 parameters, Sapphire needs only 2 hours to reach the optimal configuration, while with the top 32 parameters it takes nearly 7 hours. Compared to using the top 64 parameters, using the top 16 consumes only 30% of the optimization time, while the final recommendations show no apparent performance difference. This is because Ceph parameter impacts decrease drastically: most knobs have little effect on performance. Thus, by using only the top parameters, we significantly shrink Sapphire's optimization time while maintaining its efficacy.
4.4 Performance of Sapphire

Here we evaluate the performance of the configuration recommended by Sapphire. We generate the manually tuned configuration based on Micron's storage solution for all-NVMe Ceph [18]. Then, we compare the optimal configuration recommended by Sapphire with the default configuration and the manual configuration.

Figure 5 presents the measurement results for three different workloads under the test and product environments. The results show that average Ceph performance increases by 120% with Sapphire compared to the default configuration, and Sapphire also outperforms the manually tuned settings by 40% on average. The default and manual configurations are far from optimal; the manual setting can even impair system performance under specific workloads, because configuration effects are highly related to hardware and workload characteristics. It is also notable that the settings recommended based on the test environment work similarly well in the large product environment, due to the excellent scalability of Ceph.
5 Related Work

In this section, we describe related auto-tuning studies. In recent years, several studies have automated the tuning of various computer systems [16]. Jian et al. [22] use neural networks to optimize the memory allocation of database instances by adjusting buffer pool sizes dynamically according to the miss ratio. Ashraf et al. [17] perform a cost-benefit analysis to achieve long-horizon optimized performance for clustered NoSQL DBMSs in the face of dynamic workload changes. Ana et al. [13] recommend near-optimal cloud VM and storage hardware configurations for target applications based on sparse training data. Black-box optimization is used in these systems, as it views the system in terms of its inputs and outputs and assumes obliviousness to the system internals. Methods like Simulated Annealing [9], Genetic Algorithms [2], Reinforcement Learning [37], and Bayesian Optimization [17, 26] have been applied to find near-optimal configurations.

Zhen et al. [5, 7, 8] auto-tune storage systems to improve I/O performance. They summarize the challenges in tuning storage parameters and then analyze multiple black-box optimization techniques on storage systems. Their work mainly focuses on local storage systems, often with fewer than 10 parameters. In contrast, we focus on distributed storage systems and find new challenges, such as configuration constraints, huge numbers of parameters, and higher noise.

Carver [6] also addresses the challenge of a large number of parameters and an exponential number of possible configurations. Like Sapphire, Carver proposes focusing on a smaller number of more important parameters. Inspired by CART [4], Carver uses a variance-based metric to quantify storage parameter importance. Carver is designed for categorical parameters, as most parameters in local storage systems are discrete or categorical. But we observe the exact opposite in Ceph, where most configurable parameters are continuous (about 90 percent); Table 2 shows that all of the top 16 parameters are continuous. Although discretization techniques can break continuous parameters into discrete sections, feature-selection results depend heavily on the quality of the discretization [14]. Thus, Carver is not suitable for our problem. Unlike Carver, Sapphire leverages Lasso to choose important knobs: Lasso provides higher-quality results for continuous parameters, and the small number of categorical parameters does not degrade the ranking's quality.

SmartConf [27] auto-adjusts performance-sensitive parameters in the distributed in-memory computing system Spark. SmartConf uses a control-theoretic framework to automatically set and dynamically adjust parameters to meet required operating constraints while optimizing other system performance metrics. But SmartConf does not work if the relationship between performance and a parameter is not monotonic, while in our case, as shown in Figure 2b, those relationships can be irregular and multi-peaked. Unlike SmartConf, Sapphire uses machine learning techniques, which better fit such a complicated configuration space.

DAC [33] finds that the number of performance-sensitive parameters in Spark is much larger than in previous studies (more than 40 vs. around 10). DAC combines multiple individual regression trees in a hierarchical manner to address the high-dimensionality problem. To reduce modeling error, the hierarchical modeling process requires a large number of training examples, proportional to the number of parameters. But there are hundreds of performance-related parameters in our problem, compared to 40 in DAC; modeling such a high-dimensional system with DAC would require hundreds of hours to collect training examples, which is impractical.
6 Conclusion

Configuration constraints and huge numbers of parameters are difficult challenges for automatic configuration recommendation in distributed storage systems. We provide general guidelines to resolve such constraints and use a ranking strategy to identify the most influential parameters. Our simulation-based approach not only leaves online services undisturbed but also produces high-quality configurations. Evaluations show that the configurations recommended by Sapphire perform much better than the default and manual configurations.
APPHIRE perform much better than default andthe manual configurations.9 eferences [1] Ceph documentation. https://ceph.readthedocs.io/en/latest/ .[2] B. Behzad, H. V. T. Luu, J. Huchette, S. Byna, Prab-hat, R. Aydt, Q. Koziol, and M. Snir. Taming paralleli/o complexity with auto-tuning. In
SC ’13: Pro-ceedings of the International Conference on HighPerformance Computing, Networking, Storage andAnalysis , pages 1–12, Nov 2013.[3] M. L. Bermingham, R. Pong-Wong,A. Spiliopoulou, C. Hayward, I. Rudan, H. Camp-bell, A. F. Wright, J. F. Wilson, F. Agakov,P. Navarro, et al. Application of high-dimensionalfeature selection: evaluation for genomic predictionin man.
Scientific reports , 5:10312, 2015.[4] L. Breiman, J. Friedman, C. J. Stone, and R. A.Olshen.
Classification and regression trees . CRCpress, 1984.[5] Z. Cao, G. Kuenning, K. Mueller, A. Tyagi, andE. Zadok. Graphs are not enough: Using interactivevisual analytics in storage research. In , Renton, WA, July 2019. USENIXAssociation.[6] Z. Cao, G. Kuenning, and E. Zadok. Carver: Findingimportant parameters for storage system tuning. In , pages 43–57, Santa Clara, CA,Feb. 2020. USENIX Association.[7] Z. Cao, V. Tarasov, H. P. Raman, D. Hildebrand, andE. Zadok. On the performance variation in modernstorage stacks. In , pages 329–344, Santa Clara, CA, Feb. 2017. USENIX Asso-ciation.[8] Z. Cao, V. Tarasov, S. Tiwari, and E. Zadok.Towards better understanding of black-box auto-tuning: A comparative analysis for storage sys-tems. In , pages 893–907, Boston,MA, July 2018. USENIX Association.[9] F. Douglis, D. Bhardwaj, H. Qian, and P. Shi-lane. Content-aware load balancing for distributedbackup. In
Proceedings of the 25th InternationalConference on Large Installation System Adminis-tration , LISA11, page 13, USA, 2011. USENIX As-sociation. [10] B. Efron, T. Hastie, I. Johnstone, R. Tibshirani, et al.Least angle regression.
The Annals of statistics ,32(2):407–499, 2004.[11] R. Fox, A. Pakman, and N. Tishby. Taming thenoise in reinforcement learning via soft updates.In
Proceedings of the Thirty-Second Conference onUncertainty in Artificial Intelligence , UAI16, page202211, Arlington, Virginia, USA, 2016. AUAIPress.[12] I. Guyon and A. Elisseeff. An introduction to vari-able and feature selection.
Journal of machine learn-ing research , 3(Mar):1157–1182, 2003.[13] A. Klimovic, H. Litz, and C. Kozyrakis. Selecta:Heterogeneous cloud storage configuration for dataanalytics. In
Proceedings of the 2018 USENIXConference on Usenix Annual Technical Confer-ence , USENIX ATC 18, page 759773, USA, 2018.USENIX Association.[14] J. Li, K. Cheng, S. Wang, F. Morstatter, R. P.Trevino, J. Tang, and H. Liu. Feature selection: Adata perspective.
ACM Comput. Surv. , 50(6), Dec.2017.[15] C. Liu, D. Zeng, H. Yao, C. Hu, X. Yan, andY. Fan. Mr-cof: A genetic mapreduce configurationoptimization framework. In G. Wang, A. Zomaya,G. Martinez, and K. Li, editors,
Algorithms and Ar-chitectures for Parallel Processing , pages 344–357,Cham, 2015. Springer International Publishing.[16] J. Lu, Y. Chen, H. Herodotou, and S. Babu. Speedupyour analytics: Automatic parameter tuning fordatabases and big data systems.
Proc. VLDB En-dow. , 12(12):19701973, Aug. 2019.[17] A. Mahgoub, P. Wood, A. Medoff, S. Mitra,F. Meyer, S. Chaterji, and S. Bagchi. SOPHIA:Online reconfiguration of clustered nosql databasesfor time-varying workloads. In , pages223–240, Renton, WA, July 2019. USENIX Associ-ation.[18] R. Meredith. All-nvme performance deep dive intoceph. https://flashmemorysummit.com/English/Collaterals/Proceedings/2018/20180807_INVT-101A-1_Meredith.pdf , 2018.[19] A. Y. Ng. Feature selection, l 1 vs. l 2 regulariza-tion, and rotational invariance. In
Proceedings ofthe twenty-first international conference on Machinelearning , page 78, 2004.1020] T. M. Phuong, Z. Lin, and R. B. Altman. Choos-ing snps using feature selection. In , pages 301–309. IEEE, 2005.[21] P. Sehgal, V. Tarasov, and E. Zadok. Evaluatingperformance and energy in file system server work-loads. In
Proceedings of the 8th USENIX Conferenceon File and Storage Technologies , FAST10, page 19,USA, 2010. USENIX Association.[22] J. Tan, T. Zhang, F. Li, J. Chen, Q. Zheng, P. Zhang,H. Qiao, Y. Shi, W. Cao, and R. Zhang. Ibtune:Individualized buffer tuning for large-scale clouddatabases.
Proc. VLDB Endow. , 12(10):12211234,June 2019.[23] R. Tibshirani. Regression shrinkage and selectionvia the lasso.
Journal of the Royal Statistical So-ciety: Series B (Methodological) , 58(1):267–288,1996.[24] R. J. Tibshirani, A. Rinaldo, R. Tibshirani,L. Wasserman, et al. Uniform asymptotic inferenceand the bootstrap after model selection.
The Annalsof Statistics , 46(3):1255–1287, 2018.[25] R. J. Tibshirani, J. Taylor, R. Lockhart, and R. Tib-shirani. Exact post-selection inference for sequentialregression procedures.
Journal of the American Sta-tistical Association , 111(514):600–620, 2016.[26] D. Van Aken, A. Pavlo, G. J. Gordon, and B. Zhang.Automatic database management system tuningthrough large-scale machine learning. In
Proceed-ings of the 2017 ACM International Conference onManagement of Data , SIGMOD 17, page 10091024,New York, NY, USA, 2017. Association for Com-puting Machinery.[27] S. Wang, C. Li, H. Hoffmann, S. Lu, W. Sentosa,and A. I. Kistijantoro. Understanding and auto-adjusting performance-sensitive configurations. In
Proceedings of the Twenty-Third International Con-ference on Architectural Support for ProgrammingLanguages and Operating Systems , ASPLOS 18,page 154168, New York, NY, USA, 2018. Associ-ation for Computing Machinery.[28] S. A. Weil, S. A. Brandt, E. L. Miller, D. D. E.Long, and C. Maltzahn. Ceph: A scalable, high-performance distributed file system. In
Proceedingsof the 7th Symposium on Operating Systems Designand Implementation , OSDI 06, page 307320, USA,2006. USENIX Association. [29] S. A. Weil, A. W. Leung, S. A. Brandt, andC. Maltzahn. Rados: A scalable, reliable storage ser-vice for petabyte-scale storage clusters. In
Proceed-ings of the 2nd International Workshop on PetascaleData Storage: Held in Conjunction with Supercom-puting 07 , PDSW 07, page 3544, New York, NY,USA, 2007. Association for Computing Machinery.[30] T. Xu, L. Jin, X. Fan, Y. Zhou, S. Pasupathy,and R. Talwadker. Hey, you have given me toomany knobs!: Understanding and dealing with over-designed configuration in system software. In
Pro-ceedings of the 2015 10th Joint Meeting on Foun-dations of Software Engineering , ESEC/FSE 2015,page 307319, New York, NY, USA, 2015. Associa-tion for Computing Machinery.[31] T. Xu, J. Zhang, P. Huang, J. Zheng, T. Sheng,D. Yuan, Y. Zhou, and S. Pasupathy. Do not blameusers for misconfigurations. In
Proceedings of theTwenty-Fourth ACM Symposium on Operating Sys-tems Principles , SOSP 13, page 244259, New York,NY, USA, 2013. Association for Computing Ma-chinery.[32] L. Yu and H. Liu. Feature selection for high-dimensional data: A fast correlation-based filter so-lution. In
Proceedings of the 20th international con-ference on machine learning (ICML-03) , pages 856–863, 2003.[33] Z. Yu, Z. Bei, and X. Qian. Datasize-awarehigh dimensional configurations auto-tuning of in-memory cluster computing. In
Proceedings of theTwenty-Third International Conference on Architec-tural Support for Programming Languages and Op-erating Systems , ASPLOS 18, page 564577, NewYork, NY, USA, 2018. Association for ComputingMachinery.[34] E. Zadok, A. Arora, Z. Cao, A. Chaganti, A. Chaud-hary, and S. Mandal. Parametric optimization ofstorage systems. In ,Santa Clara, CA, July 2015. USENIX Association.[35] C. Zhang, A. Kumar, and C. R´e. Materializa-tion optimizations for feature selection workloads.
ACM Transactions on Database Systems (TODS) ,41(1):1–32, 2016.[36] C.-H. Zhang, J. Huang, et al. The sparsity and biasof the lasso selection in high-dimensional linear re-gression.
The Annals of Statistics , 36(4):1567–1594,2008.1137] J. Zhang, Y. Liu, K. Zhou, G. Li, Z. Xiao, B. Cheng,J. Xing, Y. Wang, T. Cheng, L. Liu, and et al. Anend-to-end automatic cloud database tuning systemusing deep reinforcement learning. In