Modeling and Predicting DNS Server Load

Abstract

The DNS relies on caching to ensure high scalability and good performance. In optimizing caching, TTL adjustment provides a means of balancing query load against TTL-dependent properties such as data consistency, load balancing, and migration time. To strike the desired balance, TTL adjustment depends on predictions of query load under alternative TTLs. This paper proposes a model of DNS server load that employs the uniform aggregate caching model to simplify the modeling of clients' requests and their caching. A method of predicting DNS server load is developed using that model. The prediction method is based solely on unilateral measurements or observations at authoritative servers. Because it relies on neither extensive multi-point measurements nor distributed measuring facilities, the method is best suited for operators of authoritative DNS servers. The proposed model and prediction method are validated through extensive simulations. Finally, global sensitivity analysis is conducted to evaluate the impact of measurement uncertainties or errors on the predictions.



Zheng Wang

Information Technology Laboratory, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA. Email: [email protected]


I. INTRODUCTION

The Domain Name System (DNS) provides an indispensable substrate for Internet applications and services by linking human-friendly names with machine-readable addresses. Being a deceptively simple protocol based on the client-server model, the DNS basically forwards users' queries to authoritative servers, asking for the translation from name to address or vice versa, and the servers then return the answers. Given strenuous efforts to optimize DNS efficiency, security, and privacy, the DNS is increasingly complex in its implementation and operation.

A DNS client typically does not interact directly with authoritative servers. Rather, it relies on a caching resolver to iteratively query the relevant authoritative servers for the final answer. This model intentionally keeps the DNS client simple and lightweight, leaving the caching resolver responsible for iteration, caching, and even validation on behalf of the client.

In the client-resolver-authoritative model, caching at the resolver also enables scalability in terms of authoritative server load. With caching, a DNS record recently retrieved by a caching resolver can be kept temporarily in cache to answer future queries. Hence a caching resolver may find a cache hit for a large share of incoming recursive queries rather than issuing an outgoing query for each of them. Since the original DNS queries from clients may be greatly filtered by the caching resolver, both the resulting query load on the authoritative server and the query latency are reduced. This ensures that the DNS scales well with the surging number of Internet terminals and intensified DNS usage.

The duration of DNS caching follows a Time-to-Live (TTL) specified as a field of the DNS record. The TTL counts down from its initial value once the record arrives at a caching resolver. The cached record is evicted and no longer used for immediate responses once its TTL reaches zero. Naturally, a large TTL means a high cache hit rate and thereby an attenuated query load on the authoritative server. In general, the higher the TTL, the less frequently the authoritative server is accessed. However, the penalty of a large TTL is the slow propagation of authoritative DNS records to caching resolvers. Under the TTL-based caching mechanism, a caching resolver has no opportunity to refresh a cached record until its TTL expires. The cached copies held at a caching resolver may therefore be inconsistent with the records served by the authoritative server, and a DNS update on the authoritative side is unlikely to propagate immediately to the caching side. In general, the higher the TTL, the slower DNS updates propagate. Therefore, any TTL value reflects a balance between server load (as well as query latency) and data freshness.

In DNS operational practice, the TTL often needs to be adjusted or reset for different purposes in various scenarios. For example, the migration of a site may change the site server's IP address, which requires an update to the DNS record mapping the site's domain to its IP address. Due to cache inconsistency, some users, misled by the old cached record, may still navigate to the old server after the migration and thus experience downtime. Since the maximum downtime can be approximated by the TTL, the DNS operator may reduce the TTL prior to the migration to minimize downtime and then restore the original TTL afterwards. Another example of lowering the TTL is fine-grained DNS-based load balancing. As probably the most common non-proxy load distribution strategy used by applications, Round-Robin DNS is employed to balance the workload of multiple application servers accessed via one domain name. The IP address of each application server is provisioned in one record of the domain name's record set. The load balancing preferences among application servers are encoded by the order of the records, which is controlled by the authoritative server operator. The effectiveness of DNS-based load balancing is distorted by caching: the higher the TTL, the less fine-grained the load balancing. So a DNS operator may tune the TTL low, or even to zero, for better load balancing. The major concern over aggressively decreasing the TTL is the resulting increase in load, which may overload or overwhelm the authoritative server. Conversely, when an authoritative server detects heavy request traffic that almost saturates its capacity, it may choose a higher TTL as a means of defense to ensure availability and resilience. In all of the TTL tuning scenarios discussed above, DNS operators are confronted with the problem of predicting DNS server load under an alternative TTL.

Fig. 1. Original stub client only DNS model.

DNS server load is the aggregate DNS request rate originating from individual clients, filtered through caching, and destined for the target domain. The complexity of predicting DNS server load is reflected in three aspects:

• Each individual client's request pattern is rarely known in practice. Since each individual client is hidden from the authoritative server by the caching resolver, the authoritative server can neither count the individual clients nor determine their request rates.

• The mapping of which individual client is served by which caching resolver is invisible to the authoritative server. The authoritative server can only observe the query traffic output by each caching resolver, not the original query traffic entering it.

• The caching mechanism is hard to model quantitatively in terms of the collective effect of request filtering by all caching resolvers. While there have been a number of measurement and modeling studies in recent years, their focus is limited to a single caching resolver rather than aggregated caching resolvers.

This paper proposes a uniform aggregate caching model to simplify the analysis of the equivalent aggregate query load from a diversity of nonuniform clients. Using the proposed model, the complex DNS query pattern with caching effects is parameterized. The load of authoritative servers can then be obtained by retrieving the parameter(s) of the model using the limited observations from the authoritative servers.

The rest of this paper is organized as follows.

II. THE PREDICTION METHOD

A. Stub Client Only Model

We first assume a client-resolver-authoritative model here. The TTL of the requested DNS record is set as $\tau$.

Fig. 2. Equivalent stub client only DNS model using the UAC model.

Fig. 3. Original stub client and full client DNS model.

Fig. 4. Equivalent stub client and full client DNS model using the UAC model.

There are $N$ caching resolvers, $R_1, R_2, \ldots, R_N$, querying the authoritative servers for the DNS record. Caching resolver $R_i$ serves $M_i$ stub clients, $S_{i1}, S_{i2}, \ldots, S_{iM_i}$, $i = 1, 2, \ldots, N$. The request rate of stub client $S_{ij}$ for the DNS record is $a_{ij}$, $i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, M_i$.

The incoming DNS request rate for the DNS record observed by caching resolver $R_i$, $i = 1, 2, \ldots, N$, can be aggregated as

$$A_i = \sum_{j=1}^{M_i} a_{ij} \tag{1}$$

With some requests hitting the cache, the outgoing DNS request rate for the DNS record from caching resolver $R_i$ towards the authoritative servers is $b_i$, $i = 1, 2, \ldots, N$.

Note that for a constant set of stub clients with invariant request rates, $A_i$ can be considered constant. Therefore, the outgoing request rate of each individual caching resolver is determined solely by the TTL-based caching mechanism. That dependency can be expressed as

$$b_i(\tau) = \frac{A_i}{1 + A_i \tau} \tag{2}$$

The incoming DNS request rate observed by the authoritative servers is the aggregate of the outgoing DNS request rates from all caching resolvers, and thus can be written as

$$B(\tau) = \sum_{i=1}^{N} b_i(\tau) \tag{3}$$

The authoritative server is commonly able and authorized to observe, record, and analyze the incoming DNS requests. By identifying the source IP address of each DNS message, the authoritative server can identify every caching resolver as well as its request rate. So the server load and the number of caching resolvers are known to the authoritative server. Since the TTL is fully controlled by the authoritative server, the TTL value is also available.

Because the profile of individual clients belonging to each caching resolver is always unknown, we simplify the aggregate model to the uniform aggregate caching (UAC) model. The UAC model assumes that the original aggregate request rate entering the caching resolvers is distributed equally among all caching resolvers. That is,

$$A_1 = A_2 = \cdots = A_N = \widetilde{A} \tag{4}$$

Because of Eq. (2), we also have

$$b_1(\tau) = b_2(\tau) = \cdots = b_N(\tau) = \widetilde{b}(\tau) \tag{5}$$

So there are $N$ uniformly requested caching resolvers in the UAC model, equivalent to the $N$ diversely requested caching resolvers in practice. The query load of the authoritative servers can thus be written as

$$\widetilde{B}(\tau) = \frac{N \widetilde{A}}{1 + \widetilde{A} \tau} \tag{6}$$

Assume that the query pattern of the stub clients and the set of requesting caching resolvers remain constant. Given one TTL of the requested record $\tau$ and the respective query load of the authoritative servers $\widetilde{B}(\tau)$, we can derive the equivalent aggregate request rate arriving at each caching resolver as

$$\widetilde{A} = \frac{\widetilde{B}(\tau)}{N - \widetilde{B}(\tau)\,\tau} \tag{7}$$

For a new TTL of the requested record $\tau^*$, the query load of the authoritative servers can be predicted using the estimate $\widetilde{A}$:

$$\widetilde{B}(\tau^*) = \frac{N \widetilde{A}}{1 + \widetilde{A} \tau^*} \tag{8}$$

B. Stub Client and Full Client Model

Besides the stub clients and their respective caching resolvers assumed in the stub client only model, we consider some full clients that simultaneously query the authoritative servers. Full clients are able to contact the authoritative servers by themselves, independent of caching resolvers. Hence the authoritative servers observe not only the incoming query traffic from caching resolvers but also that from full clients. Note that the former is affected by the TTL of the requested DNS record, whereas the latter does not depend on that TTL at all. Based on the assumptions in the stub client only model, we further add $K$ full clients, namely $C_1, C_2, \ldots, C_K$. The request rate of full client $C_i$ for the DNS record is $c_i$, $i = 1, 2, \ldots, K$.

The incoming DNS request rate observed by the authoritative servers is the aggregate of the outgoing DNS request rates from all caching resolvers plus the overall DNS request rate from all full clients:

$$B'(\tau) = \sum_{i=1}^{N} b_i(\tau) + D \tag{9}$$

where $D$ is the overall query rate from full clients:

$$D = \sum_{i=1}^{K} c_i \tag{10}$$

As in the stub client only model, we also suppose $N$ equivalent uniformly requested caching resolvers so that Eqs. (4) and (5) hold. The query load of the authoritative servers can be expressed as

$$\widetilde{B'}(\tau) = \frac{\widetilde{N} \widetilde{A}}{1 + \widetilde{A} \tau} + \widetilde{D} \tag{11}$$
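As a minimal sketch of the stub client only prediction, Eq. (7) inverts one observed load into the equivalent per-resolver rate and Eq. (8) evaluates the load at a new TTL. The function names and the numbers in the usage lines are illustrative assumptions, not part of the paper. The per-resolver load is taken to be $A/(1 + A\tau)$, the form implied by Eq. (7), and the optional argument adds the TTL-independent full client term of Eq. (11).

```python
def invert_load(load, n_resolvers, ttl):
    """Eq. (7): equivalent per-resolver request rate from one observed load."""
    denom = n_resolvers - load * ttl
    if denom <= 0:
        raise ValueError("observed load is inconsistent with N and TTL")
    return load / denom


def predict_load(rate, n_resolvers, ttl, full_client_rate=0.0):
    """Eq. (8), or Eq. (11) when full_client_rate (D) is non-zero."""
    return n_resolvers * rate / (1.0 + rate * ttl) + full_client_rate


# Hypothetical numbers: 1,000 resolvers, 40 qps observed at TTL = 20 s.
rate = invert_load(40.0, 1000, 20.0)        # 40 / (1000 - 40 * 20)
predicted = predict_load(rate, 1000, 60.0)  # load expected at TTL = 60 s
```

The guard in `invert_load` reflects a constraint of Eq. (7): the observed load must stay below $N/\tau$, otherwise no positive request rate can explain it.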

1) Two-Measurement-Based Prediction:

We can see from Eq. (11) that the equivalent aggregate request rate $\widetilde{A}$ and the overall query rate from full clients $\widetilde{D}$ are both unknown. Assuming a constant query pattern of the stub clients and a constant set of requesting caching resolvers, the number of requesting caching resolvers $N$ can be estimated by the authoritative servers. Since both the caching resolvers and the full clients are visible to the authoritative servers, the authoritative servers can no longer infer the number of requesting caching resolvers simply by counting the requestors. What differentiates a requesting caching resolver from a full client is that the query rate of a full client is hardly affected by a TTL change, while a TTL change does affect the outbound query rate of a requesting caching resolver. So when the authoritative servers change the TTL, those requestors whose query rates change comparatively little should be identified as full clients, and the number of requesting caching resolvers can be determined by subtracting the estimated number of full clients from the total number of observed requestors.

To estimate the remaining two parameters, namely $\widetilde{A}$ and $\widetilde{D}$, we need the observed inbound query rates of the target authoritative servers under at least two TTL values. Since the estimation of $N$ also relies on measurements under two TTL values, the overall measurements required by the prediction method can be summarized as each requestor's query rate under two TTL values. Given the two observed inbound query rates $\widetilde{B'}(\tau_1)$ and $\widetilde{B'}(\tau_2)$ under the two TTL values $\tau_1$ and $\tau_2$ and an estimated $\widetilde{N}$, $\widetilde{A}$ and $\widetilde{D}$ can be obtained by solving the following pair of equations in two unknowns:

$$\begin{cases} \widetilde{B'}(\tau_1) = \dfrac{\widetilde{N} \widetilde{A}}{1 + \widetilde{A} \tau_1} + \widetilde{D} \\[2ex] \widetilde{B'}(\tau_2) = \dfrac{\widetilde{N} \widetilde{A}}{1 + \widetilde{A} \tau_2} + \widetilde{D} \end{cases} \tag{12}$$

For a new TTL of the requested record $\tau^*$, the query load of the authoritative servers can be predicted using the estimates $\widetilde{A}$, $\widetilde{D}$, and $\widetilde{N}$:

$$\widetilde{B'}(\tau^*) = \frac{\widetilde{N} \widetilde{A}}{1 + \widetilde{A} \tau^*} + \widetilde{D} \tag{13}$$
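For a fixed $\widetilde{N}$, the pair of equations in Eq. (12) reduces to a single quadratic in $\widetilde{A}$. The short derivation below is a sketch added for illustration; it assumes the per-resolver load $\widetilde{A}/(1 + \widetilde{A}\tau)$ implied by Eq. (7), and $\Delta$ is a shorthand introduced here.

```latex
\begin{align*}
\Delta &:= \widetilde{B'}(\tau_1) - \widetilde{B'}(\tau_2)
        = \frac{\widetilde{N}\,\widetilde{A}^{2}\,(\tau_2 - \tau_1)}
               {(1 + \widetilde{A}\tau_1)(1 + \widetilde{A}\tau_2)}, \\
0 &= \bigl[\widetilde{N}(\tau_2 - \tau_1) - \Delta\,\tau_1\tau_2\bigr]\,\widetilde{A}^{2}
     - \Delta\,(\tau_1 + \tau_2)\,\widetilde{A} - \Delta, \\
\widetilde{D} &= \widetilde{B'}(\tau_1) - \frac{\widetilde{N}\,\widetilde{A}}{1 + \widetilde{A}\tau_1}.
\end{align*}
```

With $\tau_1 < \tau_2$, both $\Delta$ and the leading coefficient are positive for loads generated by Eq. (11), so the quadratic has exactly one positive root, which is taken as $\widetilde{A}$.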

2) Three-Measurement-Based Prediction:

The accuracy of the two-measurement-based prediction is somewhat limited by the estimate $\widetilde{N}$. As mentioned above, the caching resolvers and the full clients can be identified using cluster analysis with two cluster centers. The cluster of caching resolvers shows a significant difference in query rate between the two TTL values, and the cluster of full clients shows a minor difference. The distance between the two clusters decreases as the two TTL values become closer, and differentiating the two clusters becomes more difficult. So the estimate $\widetilde{N}$ is prone to larger error for close TTL values. However, if measurements under three different TTL values are available, $\widetilde{N}$ can be obtained, together with $\widetilde{A}$ and $\widetilde{D}$, by solving three equations:

$$\begin{cases} \widetilde{B'}(\tau_1) = \dfrac{\widetilde{N} \widetilde{A}}{1 + \widetilde{A} \tau_1} + \widetilde{D} \\[2ex] \widetilde{B'}(\tau_2) = \dfrac{\widetilde{N} \widetilde{A}}{1 + \widetilde{A} \tau_2} + \widetilde{D} \\[2ex] \widetilde{B'}(\tau_3) = \dfrac{\widetilde{N} \widetilde{A}}{1 + \widetilde{A} \tau_3} + \widetilde{D} \end{cases} \tag{14}$$

where $\widetilde{B'}(\tau_i)$ is the observed inbound query rate under $\tau_i$ ($i = 1, 2, 3$).

For a new TTL of the requested record $\tau^*$, the query load of the authoritative servers can be predicted using Eq. (13).

III. VALIDATIONS
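Before turning to the simulations, note that Eq. (14) can be solved without an iterative solver, because the ratio of successive load differences is linear in $\widetilde{A}$. The following is a sketch under that observation, again assuming the per-resolver load $\widetilde{A}/(1 + \widetilde{A}\tau)$ implied by Eq. (7); the function names and the synthetic ground truth are hypothetical.

```python
def solve_three(loads, ttls):
    """Recover (N, A, D) of Eq. (14) from loads observed at three distinct TTLs."""
    (b1, b2, b3), (t1, t2, t3) = loads, ttls
    r = (b1 - b2) / (b2 - b3)  # the unknowns N and D cancel in this ratio
    # r * (t3 - t2) * (1 + A*t1) = (t2 - t1) * (1 + A*t3) is linear in A.
    a = ((t2 - t1) - r * (t3 - t2)) / (r * (t3 - t2) * t1 - (t2 - t1) * t3)
    n = (b1 - b2) * (1 + a * t1) * (1 + a * t2) / (a * a * (t2 - t1))
    d = b1 - n * a / (1 + a * t1)
    return n, a, d


def load(n, a, d, ttl):
    """Eq. (11): cached resolver traffic plus TTL-independent full client traffic."""
    return n * a / (1 + a * ttl) + d


# Synthetic check with made-up ground truth N = 1000, A = 0.5 qps, D = 200 qps.
ttls = (60.0, 300.0, 600.0)
loads = tuple(load(1000, 0.5, 200.0, t) for t in ttls)
n_est, a_est, d_est = solve_three(loads, ttls)
```

The closed form is exact only when the loads follow Eq. (11); with noisy measurements the recovered values can be sensitive to small differences between the observed loads, especially when the TTL values are close.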

In this section, we use various inbound query distributions of caching resolvers to validate the proposed prediction method. For comparison, we let each of the following distributions have a mean of 1 (qps).

1) Exponential distribution. We use an exponential distribution with mean parameter 1.

2) Uniform distribution. We use a uniform distribution with lower and upper endpoints 0 and 2 respectively (ensuring a mean of 1).

3) Lognormal distribution. The probability density function (PDF) of the lognormal distribution is given by

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma x} \exp\left(-\frac{[\ln(x) - \mu]^2}{2\sigma^2}\right), \quad x > 0 \tag{15}$$

We use a lognormal distribution with parameters $\mu = -0.5493$ and $\sigma = 1.0481$ to ensure a mean of 1 and a variance of 2.

4) Weibull distribution. The Weibull distribution can assume the characteristics of many different types of distributions, which has made it popular among engineers. The PDF of the Weibull distribution is given by

$$f(x) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k}, \quad x \ge 0 \tag{16}$$

where $\lambda$ and $k$ are the scale and shape parameters respectively. Here we let $\lambda = 1.$ and $k = 5$, ensuring a mean of 1.

5) Zipf's distribution. Zipf's law was first explained by G. K. Zipf [1], who found that the frequency of any word is approximately inversely proportional to its rank in the frequency table. Zipf's law has been used to model Web links [2] and network traffic [3]. Efficient caching also relies heavily on Zipf's law to replicate a small number of immensely popular files near the users [4]. Zipf's distribution is usually written as

$$p(k) = C k^{-\alpha} \tag{17}$$

where the constant $\alpha \approx$ .

Fig. 5. PDFs of the query distributions used in validations: (a) exponential distribution, (b) lognormal distribution, (c) Weibull distribution, (d) Zipf's distribution.

A. Validations under Stub Client Only Model

Fig. 6. Exponential distribution under stub client only model. (a) Estimation error under varying TTL of measurement and TTL of estimation.

Fig. 7. Uniform distribution under stub client only model. (a) Estimation error under varying TTL of measurement and TTL of estimation.

B. Validations under Stub Client and Full Client Model

Fig. 8. Exponential distribution under stub client and full client model (two-measurement-based prediction). (a) Estimation error under varying two TTLs of measurement and Requestor Distribution 1.

Fig. 9. Uniform distribution under stub client and full client model (two-measurement-based prediction). (a) Estimation error under varying two TTLs of measurement and Requestor Distribution 1.

1) Two-Measurement-Based Prediction:

Let the TTL of estimation be 1800 s. For each query distribution, we first assume that 1) the number of caching resolvers and the number of full clients are both 10,000 (Requestor Distribution 1), and then that 2) they are 50,000 and 150,000 respectively (Requestor Distribution 2).
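A validation run of this kind can be sketched as follows. The sketch assumes Requestor Distribution 1 with exponentially distributed resolver input rates, full clients sending exactly 1 qps each, measurement TTLs of 300 s and 600 s, and a known $\widetilde{N}$. These choices, the seed, and all names are illustrative assumptions rather than the paper's simulation setup. The true load sums $A_i/(1 + A_i\tau)$ over heterogeneous resolvers, while the prediction fits the uniform model of Eq. (12) by subtracting its two equations, which leaves a quadratic in $\widetilde{A}$.

```python
import math
import random

rng = random.Random(0)                      # fixed seed, illustrative only
N, K = 10_000, 10_000                       # Requestor Distribution 1
rates = [rng.expovariate(1.0) for _ in range(N)]  # resolver input rates, mean 1 qps
D = K * 1.0                                 # assumed: each full client sends 1 qps


def true_load(ttl):
    """Load seen by the authoritative servers with heterogeneous resolvers."""
    return sum(a / (1.0 + a * ttl) for a in rates) + D


def fit_two(b1, b2, t1, t2, n):
    """Solve Eq. (12) for (A, D) given N; requires t1 < t2."""
    delta = b1 - b2
    qa = n * (t2 - t1) - delta * t1 * t2
    qb = -delta * (t1 + t2)
    qc = -delta
    a = (-qb + math.sqrt(qb * qb - 4.0 * qa * qc)) / (2.0 * qa)  # positive root
    return a, b1 - n * a / (1.0 + a * t1)


t1, t2, t_star = 300.0, 600.0, 1800.0
a_hat, d_hat = fit_two(true_load(t1), true_load(t2), t1, t2, N)
predicted = N * a_hat / (1.0 + a_hat * t_star) + d_hat   # Eq. (13)
rel_error = abs(predicted - true_load(t_star)) / true_load(t_star)
```

Because the same sampled rates are reused at every TTL, the error here reflects only the mismatch between heterogeneous resolvers and the uniform model, not measurement noise.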

2) Three-Measurement-Based Prediction:

Let the TTL of estimation be 1800 s and let one TTL of measurement remain constant at 1000 s. For each query distribution, we first assume that 1) the number of caching resolvers and the number of full clients are both 10,000 (Requestor Distribution 1), and then that 2) they are 50,000 and 150,000 respectively (Requestor Distribution 2).

Fig. 10. Exponential distribution under stub client and full client model (three-measurement-based prediction). (a) Estimation error under varying two TTLs of measurement and Requestor Distribution 1.

Fig. 11. Uniform distribution under stub client and full client model (three-measurement-based prediction). (a) Estimation error under varying two TTLs of measurement and Requestor Distribution 1.

IV. GLOBAL SENSITIVITY ANALYSIS

Global Sensitivity Analysis (GSA) is a term describing a set of mathematical techniques for investigating how the variation in the output of a numerical model can be attributed to variations of its inputs. GSA can be applied for multiple purposes, including: to apportion output uncertainty to the different sources of uncertainty of the model (e.g. unknown parameters, measurement errors in input forcing data) and thus prioritise the efforts for uncertainty reduction; to investigate the relative influence of model parameters on the predictive accuracy and thus support model calibration, verification, and simplification; and to understand the dominant controls of a system (model) and support model-based decision-making.

Fig. 12. GSA results using EET (mean of EEs versus standard deviation of EEs for inputs B and N).

In this section, we investigate how the uncertainties of the two inputs, namely the number of caching resolvers and the server load, affect the output of the prediction. We set the number of caching resolvers to 100,000 and assume a Zipf's distribution of inbound queries of caching resolvers with a mean of 1 qps. The uncertainties of the two inputs are both configured to range from 10% below to 10% above the true values.

A. EET Method

We first use the Elementary Effects Test (EET) [5] for the GSA. The EET is a One-At-the-Time method for global sensitivity analysis. It computes two indices for each input: i) the mean (mi) of the EEs, which measures the total effect of an input on the output; ii) the standard deviation (sigma) of the EEs, which measures the degree of interaction with the other inputs. Both sensitivity indices are relative measures. We use a One-At-the-Time sampling strategy as described in [6], with Latin Hypercube sampling. The base sample size is 6,000. We use bootstrapping with 1,000 resamples to derive confidence bounds.
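The two EET indices can be illustrated on the stub client only predictor, whose inputs are $N$ and $B$. The sketch below uses plain random base points instead of the Latin Hypercube design and omits bootstrapping. The nominal values (an equivalent rate of 0.01 qps, TTLs of 60 s and 300 s), the step size, and all names are hypothetical choices made so that Eq. (7) stays well defined over the ±10% ranges, and the per-resolver load $A/(1 + A\tau)$ is the form implied by Eq. (7).

```python
import random
import statistics

TAU, TAU_STAR = 60.0, 300.0          # assumed measurement and target TTLs
N0 = 100_000.0                       # nominal number of caching resolvers
B0 = N0 * 0.01 / (1.0 + 0.01 * TAU)  # nominal load for an assumed rate of 0.01 qps


def predict(x_n, x_b):
    """Prediction output for inputs scaled from [0, 1] to +/-10% of nominal."""
    n = N0 * (0.9 + 0.2 * x_n)
    b = B0 * (0.9 + 0.2 * x_b)
    a = b / (n - b * TAU)                 # Eq. (7)
    return n * a / (1.0 + a * TAU_STAR)   # Eq. (8)


def elementary_effects(samples=1000, step=0.1, seed=0):
    """Return {input: (mean of EEs, standard deviation of EEs)}."""
    rng = random.Random(seed)
    ees = {"N": [], "B": []}
    for _ in range(samples):
        x_n = rng.uniform(0.0, 1.0 - step)
        x_b = rng.uniform(0.0, 1.0 - step)
        base = predict(x_n, x_b)
        # Perturb one input at a time and record the finite-difference slope.
        ees["N"].append((predict(x_n + step, x_b) - base) / step)
        ees["B"].append((predict(x_n, x_b + step) - base) / step)
    return {k: (statistics.mean(v), statistics.stdev(v)) for k, v in ees.items()}


indices = elementary_effects()
```

For this predictor the output increases with both inputs, so every elementary effect is positive and the mean of the EEs coincides with the mean of their absolute values.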

B. FAST Method

We then adopt the Fourier Amplitude Sensitivity Test (FAST) [7] for the GSA. FAST uses the Fourier decomposition of the model output to approximate the variance-based first-order sensitivity indices. The base sample size is set to 3,185. The indices for $N$ and $B$ are 0.4902 and 0.4935 respectively.

C. VBSA Method

We use Variance Based Sensitivity Analysis (VBSA) [8] for the GSA. We use two well-established variance-based sensitivity indices: the first-order sensitivity index (or 'main effects') and the total-order sensitivity index (or 'total effects'). We estimate the main effects and total effects indices [9] using the approximation technique described, e.g., in [11]. The base sample size is 6,000. We use bootstrapping with 1,000 resamples to derive confidence bounds.
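As an illustration of the two variance-based indices, the sketch below estimates them for the stub client only predictor from two independent sample matrices, using a centred form of the first-order estimator associated with Saltelli et al. and the total-order estimator associated with Jansen. Whether these match the exact estimators behind the reported results is not stated in the text, so this is an assumption; the nominal values, TTLs, sampling scheme, and names are also hypothetical, and bootstrapping is omitted.

```python
import random
import statistics

TAU, TAU_STAR = 60.0, 300.0          # assumed measurement and target TTLs
N0 = 100_000.0                       # nominal number of caching resolvers
B0 = N0 * 0.01 / (1.0 + 0.01 * TAU)  # nominal load for an assumed rate of 0.01 qps


def predict(x):
    """Stub client only prediction; x = (x_N, x_B) in [0, 1]^2 maps to +/-10%."""
    n = N0 * (0.9 + 0.2 * x[0])
    b = B0 * (0.9 + 0.2 * x[1])
    a = b / (n - b * TAU)                 # Eq. (7)
    return n * a / (1.0 + a * TAU_STAR)   # Eq. (8)


def sobol_indices(samples=6000, seed=0):
    """Return (main effects, total effects), each ordered as [N, B]."""
    rng = random.Random(seed)
    mat_a = [(rng.random(), rng.random()) for _ in range(samples)]
    mat_b = [(rng.random(), rng.random()) for _ in range(samples)]
    f_a = [predict(x) for x in mat_a]
    f_b = [predict(x) for x in mat_b]
    # Centring leaves the first-order estimator's expectation unchanged
    # but reduces its variance when the output mean is large.
    mean = statistics.fmean(f_a + f_b)
    variance = statistics.pvariance(f_a + f_b, mu=mean)
    main, total = [], []
    for i in range(2):
        # Column i is taken from matrix B; the other column stays from matrix A.
        f_ab = [predict(tuple(xb[k] if k == i else xa[k] for k in range(2)))
                for xa, xb in zip(mat_a, mat_b)]
        main.append(sum((fb - mean) * (fab - fa)
                        for fa, fb, fab in zip(f_a, f_b, f_ab))
                    / samples / variance)
        total.append(sum((fa - fab) ** 2 for fa, fab in zip(f_a, f_ab))
                     / (2 * samples) / variance)
    return main, total


main_effects, total_effects = sobol_indices()
```

For a nearly additive model such as this one over narrow input ranges, the main effects and the total effects should each sum to roughly 1, which gives a quick plausibility check on the estimates.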

Fig. 13. GSA results using VBSA (main effects and total effects for inputs N and B).

V. CONCLUSION

This paper proposed a model of DNS server load and a method of predicting DNS server load using that model. Simulations over various scenarios demonstrated that the prediction method based on the DNS server load model has high precision and good robustness. Moreover, the prediction requires only limited local measurements at authoritative servers, so it is best suited for operators of authoritative DNS servers.

REFERENCES

[1] G. K. Zipf, Human Behaviour and the Principle of Least Effort. Addison-Wesley, 1949.
[2] M. Levene, J. Borges, and G. Loizou, Zipf's law for Web surfers. Knowledge and Information System, 3(1), pp. 120-129, 2001.
[3] N. Sarrar, S. Uhlig, A. Feldmann, R. Sherwood, and X. Huang, Leveraging Zipf's law for traffic offloading. ACM SIGCOMM Computer Communication Review, 42(1), pp. 16-22, 2012.
[4] I. Kotera, R. Egawa, H. Takizawa, and H. Kobayashi, Modeling of cache access behavior based on Zipf's law. In Proc. of the 9th Workshop on Memory Performance: Dealing with Applications, Systems and Architecture, pp. 9-15, 2008.
[5] M. D. Morris, Factorial Sampling Plans for Preliminary Computational Experiments, Technometrics, 33(2), pp. 161-174, 1991.
[6] F. Campolongo, A. Saltelli, and J. Cariboni, From Screening to Quantitative Sensitivity Analysis: A Unified Approach, Computer Physics Communications, 182(4), pp. 978-988, 2011.
[7] R. I. Cukier, C. M. Fortuin, K. E. Shuler, A. G. Petschek, and J. H. Schaibly, Study of the Sensitivity of Coupled Reaction Systems to Uncertainties in Rate Coefficients. I Theory, J. Chem. Phys., 59, pp. 3873-3878, 1973.
[8] I. Sobol, Sensitivity Estimates for Nonlinear Mathematical Models, Mathematical Modeling & Computational Experiment (Engl. Transl.), 1, pp. 407-414, 1993.
[9] T. Homma and A. Saltelli, Importance Measures in Global Sensitivity Analysis of Nonlinear Models, Reliability Engineering & System Safety, 52(1), pp. 1-17, 1996.
[10] A. Saltelli, P. Annoni, I. Azzini, F. Campolongo, M. Ratto, and S. Tarantola, Variance Based Sensitivity Analysis of Model Output. Design and Estimator for the Total Sensitivity Index, Computer Physics Communications, 181, pp. 259-270, 2010.
[11] A. Saltelli, P. Annoni, I. Azzini, F. Campolongo, M. Ratto, and S. Tarantola, Variance Based Sensitivity Analysis of Model Output. Design and Estimator for the Total Sensitivity Index, Computer Physics Communications, 181, pp. 259-270, 2010.
[12] SAFE Toolbox, http://bristol.ac.uk/cabot/resources/safe-toolbox/
[13] F. Pianosi, F. Sarrazin, and T. Wagener, A Matlab Toolbox for Global Sensitivity Analysis,