A review of analytical performance modeling and its role in computer engineering and science
Y.C. Tay ∗
National University of Singapore [email protected]
ABSTRACT
This article is a review of analytical performance modeling for computer systems. It discusses the motivation for this area of research, examines key issues, introduces some ideas, illustrates how it is applied, and points out a role that it can play in developing Computer Science.
∗ Some parts of this article appeared in [T] and "Lessons from Teaching Analytical Performance Modeling", Proc. ICPE Workshop on Education and Practice of Performance Engineering, Mumbai, India (April 2019), 79–84. Many thanks to Haifeng Yu and Prashant Shenoy for reading the draft and making many helpful suggestions.

1. INTRODUCTION

A computer system has multiple components, possibly a mix of both hardware and software. It may be a processor architecture on a chip, or a software suite that supports some application in the cloud. In our context, analytical modeling essentially refers to the formulation of equations to describe the performance of the system.

System performance can be measured by some metric y, say throughput, latency, availability, etc. This y depends on the workload on the system, such as the number of concurrent threads running on a chip, or the number of users for an application. It also depends on the system configuration, like the number of cores in a processor, or the number of virtual machines processing user requests. Collectively, we refer to the numerical workload and configuration variables as input parameters x.

Abstractly speaking, a computer system determines a function f that maps input parameters x to the performance metric y, i.e. y = f(x). The complexity and nondeterminism in a real system make it impossible to know this f exactly. The purpose of an analytical model is then to derive a function f̂ that approximates the ground truth f, i.e. f̂(x) ≈ y.

There are various ways of constructing f̂. Using a statistical approach, one might start with a sample of ⟨x, y⟩ values, fix a class of functions for f̂, and determine the f̂ that best fits the sample according to some optimization criterion (mean square error, say); a minimal sketch of this approach appears at the end of this section. Alternatively, a machine learning approach might use the sample to train an artificial neural network that computes f̂. In both cases, f̂ is determined from the sample ⟨x, y⟩, so no knowledge of the system is used. It follows that we may not be able to use f̂ to analyze the behavior of the system, to understand why one particular hardware architecture is better than another, or what combination of parameters can cause performance to deteriorate, etc. Such an f̂ is thus a blackbox model of f.

Sometimes, very little is known about the system itself: the processor architecture may be proprietary, or the server cluster may be virtualized. In such cases, it is entirely appropriate that we resort to a blackbox model of the system. In many cases, though, the performance analyst may actually have a sufficiently detailed understanding of the system to mathematically describe the relationships among the speeds and delays within it. One can then use equations to construct f̂. Such an f̂ is then a whitebox model: it is based on a mathematical analysis of the system, can be analyzed by standard mathematical techniques (calculus, probability, etc.), and the analysis can be interpreted in terms of details in the system. Therein lies the power of an analytical model.

Our review begins in Sec. 2 by describing a common motivation in constructing an analytical model, namely to engineer a computer system: to predict the throughput for a workload, to determine how failures affect response time, to evaluate the performance implications of different design choices, etc.

The derivations in an analytical model are often based on strong assumptions; this is often considered a weakness of the approach. Sec. 3 presents examples to show that these models are often accurate even when the assumptions are violated. In addition to the assumptions, the derivations may make approximations that are hard to justify theoretically (Sec. 4). However, the final arbiter for whether the assumptions and approximations are acceptable lies not in theory, but in the experimental validation of the model.

One broadly applicable technique in modeling is bottleneck analysis (Sec. 5). It is powerful in that it requires very little information about the details in a system, but it is also weak in that it offers only performance bounds. Nonetheless, these may suffice for some purposes, like comparing scalability limits, say.

A crucial advantage that analytical models have over simulation models lies in the global view of the parameter space that they offer. One can often use the equations to identify important (and unimportant) regions of the space, reduce the number of parameters, etc. (Sec. 6).

A computer system can have many components that interact in complicated ways, but Sec. 7 shows how this complex interaction can be decomposed into separate smaller models.
Such a decomposition then provides a way to study the separate impact of hardware and software on a workload, the interaction between resource and data contention, etc.

Since an analytical model is an approximation of the real system, we must verify not just the accuracy of its numerical predictions, but also check that crucial properties revealed by the model (are not just artefacts of the model, but) are in fact properties of the real system. Sec. 8 discusses this concept of analytic validation.

In contrast to the engineering motivation for an analytical model, Sec. 9 points out the role that such models can play in developing a science for the behavior of these engineered systems.

Throughout, we will draw examples from recent literature on hardware, networking and datacenters, etc. to show the use of analytical models in designing, controlling and studying systems, large and small. The range of examples is wide, as wide as the broad perspective that we want our students to have. Inevitably, we will need to use some notation (e.g. M/D/1).
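To make the blackbox approach described above concrete, here is a minimal Python sketch of the statistical route: it fits a polynomial f̂ to a handful of hypothetical ⟨x, y⟩ measurements by least squares. The data, the choice of a quadratic, and the prediction point are assumptions made purely for illustration; none of this comes from the cited papers.

```python
import numpy as np

# Hypothetical sample: x = number of concurrent users, y = measured latency (ms).
x = np.array([1, 2, 4, 8, 16, 32, 64])
y = np.array([10.2, 10.9, 12.5, 16.0, 24.1, 41.8, 80.3])

# Fix a class of functions for f_hat (here: quadratics) and fit by least squares.
coeffs = np.polyfit(x, y, deg=2)
f_hat = np.poly1d(coeffs)

print("fitted coefficients:", coeffs)
print("predicted latency at x = 50:", f_hat(50))
```

Such an f̂ may predict well near the sample, but it offers no explanation for why the latency curves upward; that is exactly the blackbox limitation discussed above.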
2. WHY AN ANALYTICAL MODEL?
But first, why model a system? To give an example, cloud providers offer their customers an attractive proposition: the resources provided can be elastically scaled according to demand. However, a tenant may need a tool to help determine how many resources (compute power, memory, etc.) to acquire, but is hampered by having very little knowledge of the cloud architecture. [ElasticScaling] considers this problem in the context of a NoSQL data grid, where the number of grid nodes can be increased or decreased to match the workload (see Figure 1). The authors present a tool (TAS) that uses equations to analytically model how transaction throughput varies nonlinearly (and nonmonotonically, sometimes) as the number of nodes increases.

Figure 1: TAS is a tool to help a tenant determine how many grid nodes are needed to meet the demand [ElasticScaling].

Energy and latency are major issues for data centers. [DatacenterAMP] uses a simple queueing model to study how asymmetric multicore processors (AMP) can facilitate a tradeoff between energy use and latency bounds. The idea is to dynamically marshal the resources of multiple cores to give a larger processor that can speed up a sequential computation or, when possible, scale it back down to reduce energy consumption (see Figure 2).

Figure 2: Scaling a processor down or up (by reducing or increasing its number of cores) to save energy or speed up computation [DatacenterAMP].

[GPU] also uses a model for architectural exploration. A GPU runs multiple threads simultaneously. This speeds up the execution, but also causes delays from competition for MSHRs (Miss Status Handling Registers) and the DRAM bus. The paper uses a model to analyze how changing the number of MSHRs, cache line size, miss latency, DRAM bandwidth, etc. affects the tradeoff. There is no practical way of doing such an exploration with real hardware. Part of the analysis is based on using the model to generate the CPI (cycles per instruction) stack that classifies the stall cycles into MSHR queueing delay, DRAM access latency, L1 cache hits, etc.

The model in [SoftErrors] is similarly used to determine how various microarchitectural structures (reorder buffer, issue queue, etc.) contribute to the soft error rate that is induced by cosmic radiation. Again, tweaking hardware structures is infeasible, and simulating soft errors is intractable, so an analysis with a mathematical model is essential.

One equation from the [SoftErrors] model explains why an intuition (that a workload with a low CPI should have a low vulnerability to soft errors) can be contradicted. Similarly, the CPI stack breakdown by the [GPU] model explains how a workload that spends many cycles queueing for DRAM bandwidth can, counter-intuitively, have negligible stall cycles from L2 cache misses.

To design a complicated system, an engineer needs help from intuition that is distilled from experience. However, experience with real systems is limited by availability and configuration history. Although one can get around that via simulation, the overwhelming size of the parameter space usually requires a limited exploration; this exploration is, in turn, guided by intuition. One way of breaking this circularity is to construct an analytical model that abstracts away the technical details and zooms in on the factors affecting the central issues. We can then check our intuition with an analysis of the model.

Intuition is improved through contradictions: they point out limits on the old intuition, and new intuition is gained to replace the contradictions.
Again, such contradictions can be hard to find in a large simulation space, but may be plain to see with the equations in a mathematical model.

The above examples illustrate the role that analytical models can play in engineering complex computer systems.
3. ASSUMPTIONS
Analytical models are often dismissed because they are based on unrealistic assumptions. For example, the latency model in [DatacenterAMP] uses a simple M/M/1 queue and Amdahl's Law [HM]. The exponential distribution has a strong "memoryless" property, and Amdahl's Law is an idealized program behavior. Therefore, one expects that the engineering complexity and overheads of dynamically resizing a multicore processor will render any model (simulators included) inaccurate. However, the contribution of an analytical model does not necessarily lie in numerical accuracy, but possibly in providing insight into system behavior. In the case of [DatacenterAMP], its simple model suffices to reveal an instructive interplay among chip area, latency bound and power consumption.

Anyway, many models adopt equations from theory while breaking the assumptions that were used to derive those equations. In [GPU], the model puts a bound on the delay computed with the Pollaczek-Khinchin formula for an M/D/1 queue, even though memory requests need not arrive as a Poisson process.
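To make the formula concrete, here is a minimal Python sketch (mine, not taken from [GPU]) of the Pollaczek-Khinchin mean waiting time for an M/G/1 queue, specialized to deterministic service (M/D/1); the numbers in the example call are arbitrary.

```python
def mg1_wait(lam, ES, ES2):
    """Pollaczek-Khinchin mean waiting time for an M/G/1 queue:
    W = lam * E[S^2] / (2 * (1 - rho)), where rho = lam * E[S] < 1."""
    rho = lam * ES
    assert rho < 1.0, "queue is unstable"
    return lam * ES2 / (2.0 * (1.0 - rho))

def md1_response(lam, D):
    """M/D/1 mean response time: deterministic service of length D,
    so E[S^2] = D^2; response time = waiting time + service time."""
    return mg1_wait(lam, D, D * D) + D

# Example: service time of 0.5 time units, arrival rate 1.2 requests per unit time.
print(md1_response(lam=1.2, D=0.5))
```

A model like [GPU] uses such an expression (or a bound on it) even when the Poisson assumption behind it does not hold; the experiments then decide whether the approximation is acceptable.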
Similarly, the Mean Value Analysis (MVA) algorithm assumes a separable queueing network. [DatabaseScalability] shows how, by first profiling the performance of a standalone database, the MVA algorithm can be used to predict the performance of a replicated database and, in addition, compare two alternative designs (multi-master and single-master). However, the probability of aborting a transaction increases with the replication and the number of clients, thus changing the demand for resources (at the CPU, disks, etc.) and thus breaking the MVA assumptions. Yet, experiments show good agreement between model predictions and simulation experiments.

[MapReduce] is another example where the model uses the MVA algorithm for a system that violates MVA assumptions. Jobs in a system suffer delays for various reasons. In this example, a job consists of map, shuffle and reduce tasks that are delayed not only by queueing for resources, but also by precedence constraints among the tasks (see Figure 3).

Figure 3: Precedence constraints among tasks in a MapReduce job add to delays [MapReduce].

The key equation in the model is

$A_{ik}(\vec{N}) = \sum_j f_{ij}\, Q_{jk}(\vec{N} - 1_i)$,    (1)

where $A_{ik}(\vec{N})$ and $Q_{jk}(\vec{N} - 1_i)$ are average queue lengths, and $f_{ij}$ is a quantity determined by the precedence constraint; this equation is then used in the MVA algorithm to iterate from job population $\vec{N} - 1_i$ to $\vec{N}$, and thus solve the model. Strictly speaking, however, precedence constraints violate MVA's separability assumptions, and there is no $f_{ij}$ in Mean Value Analysis.

This practice of using equations even when their underlying assumptions are violated recalls the proverbial reminder to keep the baby (e.g. the Pollaczek-Khinchin formula, the MVA algorithm) while throwing out the bath water (i.e. the assumptions). This seems like a mathematical sin, but only if one confuses the sufficiency and necessity of the equations' assumptions: the equations may in fact be robust with respect to violations of their assumptions.
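For readers unfamiliar with MVA, the following Python sketch shows the textbook exact MVA iteration for a closed, single-class separable network. It is a baseline illustration only; it does not include the precedence factors $f_{ij}$ that [MapReduce] adds, and the demands in the example call are made up.

```python
def exact_mva(demands, N):
    """Exact MVA for a closed, single-class separable queueing network.
    demands[k] = service demand at queueing station k; N = job population.
    Returns (throughput X, per-station mean queue lengths Q)."""
    K = len(demands)
    Q = [0.0] * K                   # queue lengths at population 0
    X = 0.0
    for n in range(1, N + 1):
        R = [demands[k] * (1.0 + Q[k]) for k in range(K)]   # residence times
        X = n / sum(R)                                       # throughput
        Q = [X * R[k] for k in range(K)]                     # Little's Law
    return X, Q

# Example: CPU and disk demands (seconds) of a transaction, 20 concurrent clients.
print(exact_mva([0.010, 0.030], N=20))
```

Models like [DatabaseScalability] and [MapReduce] keep this iteration while feeding it inputs (replication-dependent demands, or the $f_{ij}$-weighted queue lengths of Equation (1)) that strictly violate the separability behind it.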
4. AVERAGE VALUE APPROXIMATION (AVA)
The assumptions in an analytical model are often there to help justify an equation. [TCP] is a well-cited mathematical analysis of the protocol. The first step in its derivation for calculating expected throughput was

$E\left(\frac{Y}{A}\right) = \frac{EY}{EA}$,

where Y is the number of packets sent in a period A between triple-duplicate packet loss indications. This equation can be justified by assuming the TCP window size is a Markov regenerative process, but why should this assumption hold? TCP has many states (slow start, exponential backoff, etc.) and correlated variables (window size, timeout threshold, etc.). These make it increasingly difficult in the analysis for one to even identify the assumptions needed to go from one equation to another. Eventually, the authors simply adopt the approximation

$EQ(W) \approx Q(EW)$,

without stating the assumptions. Here, Q(W) is the probability that a packet loss in a window of size W is caused by a timeout, W is a random variable, and the approximation estimates the average of Q(W) by Q(EW), i.e. replacing the variable W with its average EW. This technique, which I call Average Value Approximation (AVA) [T], is widely used in performance modeling.

For example, [SoftErrors] estimates the time an instruction stays in a reorder buffer as $\ell K$, where $\ell$ is the average latency per instruction, and K is the average critical path length. Strictly speaking, this time is $E(T_1 + \cdots + T_n)$, where $T_i$ is the latency for instruction i, n is the critical path length, $ET_i = \ell$ and $En = K$. It is mathematically wrong to say $E(T_1 + \cdots + T_n) = ET_1 + \cdots + ET_n$, since n is a random variable. If we use AVA and replace n by En, then $T_{En}$ will not make sense if En is not an integer; if it is, then indeed $ET_1 + \cdots + ET_{En} = \ell + \cdots + \ell = \ell K$.

A similar approximation appears in [5G], which is a performance analysis of 5G cellular networks. The system in such a network consists of overlapping wireless cells, with bit rates that differ from cell to cell, and from zone to zone within a cell. How long would it take to download a file (e.g. a movie) as a user moves across the cells? The final equation in the analysis estimates the average time as

$T = n_c R + n_h t_h$,    (2)

where $n_c$ is the average number of cells visited during the download, R is the average time the user spends in a cell, $n_h$ is the average number of handovers between cells, and $t_h$ is the handover delay. Notice the expression $n_c R$ is again an approximation for $E(R_1 + \cdots + R_{n_c})$, where $R_i$ is the time spent in the i-th visited cell.

One could state some assumptions to rigorously justify $E(T_1 + \cdots + T_n) = \ell K$ and $E(R_1 + \cdots + R_{n_c}) = n_c R$, but how does one justify those assumptions? In analytical modeling of a complicated system, one is focused on making progress with the derivation (often liberally replacing random variables by their averages, like $E(Y/A) = EY/EA$) without worrying about the assumptions. Whether the approximations have gone overboard will eventually be decided by experimental validation.
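A small numerical experiment can suggest when AVA is benign and when it is not. The Python sketch below is my own illustration (not from [TCP] or [5G]): it compares E(Y/A) with EY/EA for a denominator A with low and then high variability; the distributions are arbitrary assumptions chosen only to make the point.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# AVA replaces E(Y/A) by EY/EA.  With Y independent of A, the error comes
# entirely from approximating E(1/A) by 1/EA, and it grows with A's variability.
for lo, hi in [(1.8, 2.2), (0.2, 3.8)]:        # same mean EA = 2, different spread
    A = rng.uniform(lo, hi, n)
    Y = rng.exponential(5.0, n)                # EY = 5
    print(f"A ~ U({lo},{hi}):  E(Y/A) = {np.mean(Y / A):.3f},  EY/EA = {Y.mean() / A.mean():.3f}")
```

Whether a given workload puts the model in the benign regime or not is exactly what the experimental validation must decide.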
5. BOTTLENECK ANALYSIS
There is a common misunderstanding that analytical modeling requires queueing theory. This is certainly untrue for the influential [Roofline] model for analyzing multicore architectures. It is a very simple model that has a roofline with two straight segments, one representing the peak memory bandwidth between processor caches and DRAM, and the other representing peak floating-point processor performance (see Figure 4). No queueing theory is involved.

Figure 4: An application's rate of execution is bounded by memory bandwidth (sloping line) and processor speed (horizontal line); here, the application stencil is limited by memory bandwidth, while FFT is limited by processor speed [Roofline].

Despite its simplicity, the roofline model suffices for answering some questions: (i) Given an architecture and a floating-point kernel, is performance constrained by processor speed or memory bandwidth? The answer lies in comparing the kernel's operational intensity (i.e. operations per byte of DRAM traffic) to the two roofline segments. For example, Figure 4 shows that the kernel stencil is limited by memory bandwidth, not processor speed. (ii) Given a particular kernel, which architecture will provide the best performance? One can answer this by comparing the rooflines of alternative architectures. (iii) Given a particular architecture, how can the kernels be optimized to push the performance? The optimization can be evaluated by examining how it shifts the roofline and changes the operational intensity.

[Roofline] makes very few assumptions and treats the processor as a blackbox. Details like multithreading, how threads are distributed over the cores, the cache hierarchy, etc. are not explicitly modeled. Rather, it focuses on the possible bottlenecks (memory bandwidth and processor speed) in the blackbox.
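The roofline bound itself is just the minimum of two lines. The Python sketch below is a minimal rendition of that idea; the peak numbers and kernel intensities are made-up values, not measurements from [Roofline].

```python
def roofline(intensity, peak_gflops, peak_bw_gbs):
    """Attainable performance (GFLOP/s) for a kernel with the given
    operational intensity (FLOPs per byte of DRAM traffic)."""
    return min(peak_gflops, intensity * peak_bw_gbs)

# Hypothetical machine: 100 GFLOP/s peak compute, 25 GB/s peak DRAM bandwidth.
for name, oi in [("stencil", 0.5), ("FFT", 8.0)]:
    print(name, roofline(oi, peak_gflops=100.0, peak_bw_gbs=25.0), "GFLOP/s")
```

With these illustrative numbers, the low-intensity kernel is bandwidth-bound and the high-intensity kernel is compute-bound, which is the same comparison the roofline plot makes visually.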
Processor architecture aside, every blackbox has its bottlenecks. This is true of datacenter-scale architectures as well. For example, different cloud providers may have different architectures for running transaction workloads. [CloudTransactions] considers three of these (see Figure 5). What determines their scalability limits?

Figure 5: Three different cloud architectures for supporting transactions [CloudTransactions].

Once again, a cloud architecture is essentially a blackbox, with very little information on how the virtualized system is physically deployed. Nonetheless, one can do a bottleneck analysis [T] to get the limits on transaction throughput:

$\lambda_{classic} = \min\left\{ \frac{N_{WA}}{D_{WA}}, \frac{c}{D_{db}}, \frac{N_{st}}{D_{st}} \right\}$,
$\lambda_{partition} = \min\left\{ \frac{N_{WA}}{D_{WA}}, \frac{N'_{st}}{D_{db} + D_{st}} \right\}$,
$\lambda_{dist.control} = \min\left\{ \frac{N'_{WA}}{D_{db} + D_{WA}}, \frac{N_{st}}{D_{st}} \right\}$,    (3)

where classic, partition and dist.control denote the three architectures in Figure 5; $N'_{WA}$ and $N_{WA}$ are the number of web/application servers with and without (respectively) co-located database servers; $N'_{st}$ and $N_{st}$ are the number of storage servers with and without (respectively) co-located database servers; $D_{WA}$, $D_{db}$ and $D_{st}$ are the service demands (seconds on a commodity machine M) per transaction at a web/application, database and storage server (respectively); and c is the ratio of database server speed in the classic architecture to M's speed.

By scrutinizing the expressions in Equation (3), one can compare the scalability limits of two architectures, and determine how one can push that limit for a particular architecture. We thus see from Equation (3) how an analytical model can bring clarity to the complex engineering choices that support cloud computing.
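Equation (3) is easy to turn into a small calculator. The sketch below does so for the classic architecture only; the parameter values in the example call are arbitrary placeholders, not the values measured in [CloudTransactions].

```python
def lambda_classic(N_WA, D_WA, c, D_db, N_st, D_st):
    """Upper bound on transaction throughput for the 'classic' architecture
    in Equation (3): the slowest tier is the bottleneck."""
    return min(N_WA / D_WA,   # web/application servers
               c / D_db,      # database server (speed ratio c relative to machine M)
               N_st / D_st)   # storage servers

# Hypothetical server counts and service demands (seconds per transaction).
print(lambda_classic(N_WA=8, D_WA=0.02, c=4, D_db=0.01, N_st=4, D_st=0.005))
```

Comparing which term of the min is active for each architecture is precisely how one reads off and compares their scalability limits.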
6. PARAMETER SPACE
The experimental results in [CloudTransactions] appear to show throughput increasing almost linearly for dist.control as the number of emulated browsers EB increases. However, no system can scale its throughput linearly forever. A closer examination of Equation (3) shows that, for the parameter values in the experiments, saturation should set in at around EB = 13500, but that is beyond the range of the experiments.

Another example is [P2PVoD], which analyzes peer-to-peer (P2P) video-on-demand. The viral success of P2P protocols inspired many ideas from academia for improving such systems. Each proposal can be viewed as a point in the design space, whereas [P2PVoD] uses an analytical model to represent the entire space spanned by throughput (number of bytes downloaded per second), sequentiality (whether video chunks arrive in playback order) and robustness (with respect to peer arrival/departure/failure, bandwidth heterogeneity, etc.). This global view of the parameter space leads to a counter-intuitive observation, that a reduction in sequentiality can increase (sequential) throughput, thus demonstrating the point in Section 2 that an analytical model can help improve intuition.

Furthermore, this model yields a Tradeoff Theorem that says a P2P video-on-demand system cannot simultaneously maximize throughput, sequentiality and robustness. This theorem is an example of how an analytical model can discover the science that underlies an engineered system.

While an analytical model can give us a global view of the parameter space, not all of this space is of practical interest. Rather than delimit this space with some "magic constant" (e.g. the bound on EB for [CloudTransactions]), [TransactionalMemory] illustrates how the restriction can be done through an aggregated parameter.

Single-processor techniques for coordinating access to shared memory (e.g. semaphores) do not scale well as the number of cores increases, and [TransactionalMemory] examines an alternative. For the software transactional memory in this study, locks are used to control access to shared objects, and locks in a piece of code are grouped into transactions; each transaction is, in effect, executed atomically. The parameter space in the experiments is restricted to kN/L < 1, where k is the number of locks per transaction, N is the number of concurrent transactions, and L is the number of objects that can be locked (so, intuitively, the system is overloaded if kN > L). In fact, one can show that the performance (throughput, abort rate, etc.) is determined by k and Λ, where Λ = N/L [TGS], thus reducing the parameter space from 3-dimensional (k, N, L) to 2-dimensional (k, Λ). The use of such aggregated parameters (kN/L and Λ) significantly reduces the space that the experiments must cover.

Parameter aggregation can go much further. One key issue in analyzing transaction performance is modeling the access pattern. Most studies assume access is uniform, i.e. the probability of requesting any one of L objects is 1/L. In reality, some objects are more frequently requested than others, but how many parameters would it take to model a realistic access distribution? One solution to this difficulty was discovered by [ElasticScaling]. This paper defines an Application Contention Factor

$ACF = \frac{P_{lock}}{\lambda_{lock} T_{hold}}$,

where $P_{lock}$ is the probability of a lock conflict, $\lambda_{lock}$ is the rate of lock requests and $T_{hold}$ is the average lock holding time. Experiments show that the ACF of a given workload is constant with respect to the number of nodes in the data grid, and the same whether it is run in a private or public cloud (see Figure 6).

Figure 6: The Application Contention Factor ACF is a property of the transactions; it does not vary with the number of grid nodes, and is the same for private and public clouds [ElasticScaling].

Now, let D = 1/ACF, so

$P_{lock} = \frac{\lambda_{lock} T_{hold}}{D}$.    (4)

By Little's Law, $\lambda_{lock} T_{hold}$ is the expected number of objects that are locked, so Equation (4) says the probability of conflict is as if access is uniformly distributed over D objects. This is because the access pattern, being a property of the transactions, should be constant with respect to the number of grid nodes, and the same whether the cloud is private or public.

This observation simplifies tremendously the analysis of transaction behavior. It says the large literature that rests on the assumption of uniform access remains valid once the number of objects is reinterpreted via ACF; going forward, we can continue to adopt uniform access in modeling transaction performance. This is another example where analytical modeling is helping to discover the science in engineering computer systems.
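The ACF calculation, and its use as an "effective number of uniformly accessed objects", fit in a few lines. The following sketch illustrates the arithmetic with made-up measurements; it is not the [ElasticScaling] code.

```python
def acf(P_lock, lock_rate, T_hold):
    """Application Contention Factor: ACF = P_lock / (lock_rate * T_hold)."""
    return P_lock / (lock_rate * T_hold)

def effective_objects(P_lock, lock_rate, T_hold):
    """D = 1/ACF: conflicts look as if access were uniform over D objects."""
    return 1.0 / acf(P_lock, lock_rate, T_hold)

def predicted_conflict(lock_rate, T_hold, D):
    """Equation (4): P_lock = lock_rate * T_hold / D, since by Little's Law
    lock_rate * T_hold is the expected number of locked objects."""
    return lock_rate * T_hold / D

# Hypothetical profiling run: 2% conflicts, 500 lock requests/s, held 1 ms each.
D = effective_objects(P_lock=0.02, lock_rate=500.0, T_hold=0.001)
# Reuse the same D to predict the conflict probability at a higher request rate.
print(D, predicted_conflict(lock_rate=2000.0, T_hold=0.001, D=D))
```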
7. DECOMPOSITION AND DECOUPLING
Analyzing a computer system is not any easier if it is as small as a chip, as is the case with [SoftErrors] (see Section 2). The probability that a radiation-induced fault in a microarchitectural structure (e.g. reorder buffer or issue queue) causes a program output error depends on both the hardware (microarchitecture) and the software (program). Recall from Section 4 that the model estimates the time an instruction stays in a reorder buffer as $\ell K$, where $\ell$ is the average instruction latency, and K is the average critical path length. It thus decouples the hardware and software: one can analyze separately the impact of changing the hardware (which determines $\ell$) and changing the workload (which determines K). For example, we can evaluate how changing issue width affects performance, without having to re-profile the workload. Such decoupling is a powerful technique in the scientific analysis of a complicated system.

One can view such a decoupling as decomposing the model into two submodels: one for hardware, the other for software. [MapReduce] has a similar decomposition. In Equation (1), $f_{ij}$ is a factor determined by the precedence constraint, while $A_{ik}$ and $Q_{jk}$ are queue lengths in the queueing network. The precedence constraint is a model for the job execution, while the queueing network is a model for the resource contention. These two submodels are not completely decoupled, as evaluating $f_{ij}$, $A_{ik}$ and $Q_{jk}$ requires an iteration between the graph and queueing models.

There is also model decomposition in [ElasticScaling]. Figure 1 shows that the performance predictor consists of two submodels: (i) a statistical model for the resource contention; since the resources are virtualized and the hardware configuration is unknown, this model uses machine learning to model how the cloud performance responds to changes in the workload; (ii) an analytical model for the data contention, i.e. how the number of locks, transactions and objects affects the probability that a transaction encounters a lock conflict, or has to abort. We see here that the blackbox/whitebox difference mentioned in Section 1 does not mean we must choose just one of them. The [ElasticScaling] predictor has both: the machine learning model is a blackbox, while the analytical model is a whitebox; the two are developed independently but, again, the performance prediction requires an iteration between the two.
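Coupling two submodels usually comes down to a fixed-point iteration: each submodel's output is the other's input, and the loop repeats until the shared estimate stops changing. The generic Python skeleton below only illustrates that structure; the two submodel functions here are toy stand-ins, not the actual [MapReduce] or [ElasticScaling] submodels.

```python
def solve_by_iteration(submodel_a, submodel_b, x0, tol=1e-6, max_iter=100):
    """Alternate between two submodels until their shared estimate converges."""
    x = x0
    for _ in range(max_iter):
        a_out = submodel_a(x)        # e.g. resource-contention estimate
        x_new = submodel_b(a_out)    # e.g. data-contention estimate
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Toy stand-ins: each submodel nudges the shared estimate toward a fixed point.
print(solve_by_iteration(lambda x: 0.5 * x + 1.0, lambda a: 0.8 * a, x0=0.0))
```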
8. ANALYTIC VALIDATION
Although analytical models are often touted as an alternative to simulators, many of them are actually used as simulators. To give an example, [5G] presents an interesting plot showing that, as user speed increases, throughput at first increases, then decreases (see Figure 7). This plot was not generated by a simulator, but by numerical solution of the [5G] analytical model; and the nonmonotonic throughput behavior is observed from the plot, not proved with the model. In this sense, the model is used as a simulator; I call this analytic simulation [T].

Figure 7: As user speed increases while traversing 5G cells, the throughput for a file download at first increases, then decreases [5G].

If we draw conclusions from plotting numerical solutions of an analytical model, how do we know if these conclusions are about the model, rather than the system? This is a possibility because the model is only an approximation of the system (this is an issue for simulators as well).

To give an example, classical MVA solutions of a queueing network model have throughput that increases monotonically as the number of jobs increases. If such a model were used for [ElasticScaling], it would fail to capture a critical behavior (throughput nonmonotonicity) in the system (see Section 2). Indeed, one can find examples in the literature of such a qualitative divergence in properties between model and system.

We should therefore verify that an interesting property observed in a model is, in fact, a property of the system. I call this analytic validation [T].

For example, the [ElasticScaling] model claims that the Application Contention Factor ACF is a constant that is a property of the transactions (independent of the number of grid nodes and whether the transactions run in a private or public cloud), and Figure 6 presents an experiment to validate this claim. Note that nothing in this plot is generated by the model: the properties we observe in the data are properties of the system itself. In contrast, in a numeric validation of an analytical model, there is always a comparison of experimental measurements to numerical solutions from the model.

We can see another example of analytic validation in [RouterBuffer]. Internet routers drop packets when their buffers are full. To avoid this, and to accommodate the large number of flows, router buffers have become very large. [RouterBuffer] uses an analytical model to study whether flow multiplexing can make this buffer bloat unnecessary. Among the results is an expression that says buffer occupancy EQ depends on the link rate only through utilization ρ. This is a strong claim from the model, so it requires experimental verification.

Figure 8: For a fixed utilization ρ = 0.8, simulated buffer occupancy EQ for different link rates is similar. Moreover, it has a saw-tooth behavior, as predicted by the model [RouterBuffer].

Figure 8 shows that, for fixed ρ = 0.8, simulation measurements of EQ are indeed similar for different link rates. Furthermore, a numerical solution of the model shows that EQ has a saw-tooth behavior as the TCP flow length increases; Figure 8 shows that the simulated EQ has this behavior too. The numerical difference between model and simulation in the plot is beside the point; rather, the experiment shows that an unusual behavior predicted by the model is not an artifact of the assumptions and approximations, but is in fact a property of the system itself.

For another illustration, consider the IEEE 802.11 protocol for WiFi: it specifies what a base station and the mobile devices in its wireless cell should do in sending and receiving packets. Simultaneous transmissions from different mobile devices can cause packet collisions and induce retransmission and backoff, so maximum throughput in the cell can be lower than channel bandwidth. The model in [802.11] examines how this saturation throughput depends on the protocol parameters, and analyzes the tradeoff between collision and backoff.

One of the claims from the [802.11] model says that the probability p of a collision depends on the protocol's minimum window size W and the number of mobile devices n only through the gap g = W/(n−1).

Figure 9: Probability of packet collision p depends on minimum window size W and number of mobile devices n only through the gap g = W/(n−1); the maximum window size is 2^m W [802.11].

Figure 9 shows that ⟨g, p⟩ from simulations with different configurations do in fact lie on the same curve. As a corollary, this reduces the 3-parameter space ⟨n, W, m⟩ (where the maximum window size is 2^m W) to just a single parameter g (recall Section 6). The model also claims that bandwidth wasted by packet collisions exceeds idle bandwidth caused by backoffs if and only if r > T, where r is the transmission rate and T is the transmission time (including packet headers); this was also analytically validated by the experiments. Here, we see how an analytical model can discover the science that governs wireless packet transmissions over a shared channel.
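One mechanical way to carry out such an analytic validation is to check that measurements from different configurations collapse onto a single curve in the aggregated parameter. The sketch below does this for the g = W/(n−1) claim; the (n, W, p) triples are fabricated stand-ins for simulation output, used only to show the shape of the check.

```python
from collections import defaultdict

# Fabricated (n, W, p) triples standing in for simulation measurements.
samples = [(9, 32, 0.081), (17, 64, 0.079), (33, 128, 0.080),
           (5, 32, 0.160), (9, 64, 0.158)]

groups = defaultdict(list)
for n, W, p in samples:
    g = W / (n - 1)                  # aggregated parameter suggested by the model
    groups[round(g, 2)].append(p)

# If the claim holds, p values within each g-group should nearly coincide.
for g, ps in sorted(groups.items()):
    print(f"g = {g}:  p in [{min(ps):.3f}, {max(ps):.3f}]")
```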
9. ANALYSIS WITH AN ANALYTICAL MODEL
In answering the question "Why an Analytical Model?" (Section 2), we gave examples where the model is used for architectural exploration ([GPU] and [SoftErrors]) and resource provisioning ([ElasticScaling]); other examples include [DatacenterAMP], [CloudTransactions] and [Roofline] (design exploration) and [MapReduce] (capacity planning). For these examples, the analytical models were used to generate numerical predictions.

This analytic simulation can sometimes reveal interesting behavior in a system (e.g. [5G] in Section 8), but such revelations could arguably be obtained by a more realistic, albeit slower, simulation model. The power in an analytical model lies not in its role as a fast substitute for a simulator, but in the analysis that one can bring to bear on its equations. Such an analysis can yield insights that cannot be obtained by eyeballing any number of plots (without knowing what you are looking for) and provide conclusions that no simulator can offer.

We see illustrations of this power in [ElasticScaling], which discovers that nonuniform access is equivalent to uniform access via the Application Contention Factor; in [TransactionalMemory], which shows that two dimensions of the parameter space (N and L) can be reduced to one (N/L); in [P2PVoD], where the Tradeoff Theorem says throughput, sequentiality and robustness cannot be simultaneously maximized for P2P video-on-demand; and in [802.11], which characterizes the optimal balance between bandwidth loss through packet collisions and time loss to transmission backoff. There are two other examples in [RouterBuffer] and [TCP].

[RouterBuffer] shows that (i) for n long TCP flows, buffer size should be inversely proportional to √n and (ii) for short flows, the probability of a buffer overflow does not depend on n, round-trip time, nor link capacity. These are insights obtained from the analytical model; one cannot get them from any finite number of simulation experiments.

The key equation in [TCP] expresses throughput in terms of packet loss probability p and round-trip time RTT. Clearly, for any nontrivial Internet path, p and RTT can only be measured, not predicted, so what is the point of having that equation? Its significance lies not in predicting throughput, but in characterizing its relationship with p and RTT. Such a characterization led to the concept of TCP-friendliness and the design of equation-based protocols.

[RouterBuffer] and [TCP] thus advance the science of network communication.

For a final, topical example, consider the spread of fake news and memes, etc. over the Web, driven by user interest, modulated by daily and weekly cycles, and dampened over time. To adequately capture this behavior, [InformationDiffusion] modifies the classical epidemic model. However, there is no way of integrating the resulting differential equation in closed form, so it is solved numerically. The parameters of the model can then be calibrated by fitting measured data points. Since measurements are needed for calibration, the model has limited predictive power. Nonetheless, the parametric values serve to succinctly characterize the diffusion, and provide some insight into its spread. This example illustrates the point that, although the pieces of a system are designed, engineered and artificial, they can exhibit a hard-to-understand, organic behavior when put together.
An analytical model is thus a tool for developing the science of such organisms.

(Incidentally, given the recent, tremendous success of artificial neural networks, it is fashionable now to speculate that artificial intelligence may one day become so smart that some AI system will control humans. I believe that, if that day comes, the system will behave biologically, with biological vulnerabilities that are open to attack.)
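Returning to the [TCP] example above: as a deliberately simplified illustration of what "characterizing throughput through p and RTT" buys us, the sketch below uses the well-known square-root law, throughput ≈ (MSS/RTT)·sqrt(3/(2p)). This is a simpler relative of the full [TCP] equation (which adds timeout terms), shown here only to make the idea of an equation-based sending rate tangible; the numbers in the example are arbitrary.

```python
from math import sqrt

def tcp_rate_bytes_per_sec(mss_bytes, rtt_sec, p):
    """Simplified square-root law for long-lived TCP throughput.
    The full [TCP] model adds timeout terms; this is only the leading behavior."""
    return (mss_bytes / rtt_sec) * sqrt(3.0 / (2.0 * p))

# Example: 1460-byte segments, 100 ms round-trip time, 1% packet loss.
print(tcp_rate_bytes_per_sec(1460, 0.100, 0.01))
```

An equation-based protocol measures p and RTT on-line and paces its sending rate to such a value, which is precisely the TCP-friendliness idea mentioned above.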
10. CONCLUSION
This review has discussed some issues (Sec. 3 and Sec. 8) and introduced some key ideas and techniques (Sec. 4, Sec. 5, Sec. 6 and Sec. 7) in analytical performance modeling. Although the models are often conceived as an engineering tool to generate numerical predictions for the design and control of computer systems (Sec. 2), Sec. 9 points to a less obvious role in discovering the science that underlies these systems, much like the role that mathematical models play in discovering physics in nature. This is a role that spans all computer systems (processor, memory, bandwidth, database and multimedia systems, etc.), and one that helps develop a Computer Science that withstands changes in technology.
11. REFERENCES

[5G] B. Baynat and N. Narcisse. Performance model for 4G/5G networks taking into account intra- and inter-cell mobility of users. Proc. LCN 2016, 212–215.

[802.11] Y.C. Tay and K.C. Chua. A capacity analysis for the IEEE 802.11 MAC protocol. Wireless Networks, 7(2):159–171 (2001).

[CloudTransactions] D. Kossmann, T. Kraska and S. Loesing. An evaluation of alternative architectures for transaction processing in the cloud. Proc. SIGMOD 2010, 579–590.

[DatacenterAMP] V. Gupta and R. Nathuji. Analyzing performance asymmetric multicore processors for latency sensitive datacenter applications. Proc. HotPower 2010, 1–8.

[DatabaseScalability] S. Elnikety, S. Dropsho, E. Cecchet and W. Zwaenepoel. Predicting replicated database scalability from standalone database profiling. Proc. EuroSys 2009, 303–316.

[ElasticScaling] D. Didona, P. Romano, S. Peluso and F. Quaglia. Transactional Auto Scaler: elastic scaling of in-memory transactional data grids. Proc. ICAC 2012, 125–134.

[GPU] J.-C. Huang, J.H. Lee, H. Kim and H.-H. S. Lee. GPUMech: GPU performance modeling technique based on interval analysis. Proc. MICRO 2014, 268–279.

[HM] M.D. Hill and M.R. Marty. Amdahl's Law in the Multicore Era. IEEE Computer, 41(7):33–38 (2008).

[InformationDiffusion] Y. Matsubara, Y. Sakurai, B.A. Prakash, L. Li and C. Faloutsos. Rise and fall patterns of information diffusion: model and implications. Proc. KDD 2012, 6–14.

[MapReduce] E. Vianna, G. Comarela, T. Pontes, J. Almeida, V. Almeida, K. Wilkinson, H. Kuno and U. Dayal. Analytical performance models for MapReduce workloads. Int. J. Parallel Programming.

[P2PVoD] B. Fan, D.G. Andersen, M. Kaminsky and K. Papagiannaki. Balancing throughput, robustness, and in-order delivery in P2P VoD. Proc. CoNEXT 2010, 10:1–10:12.

[PipelineParallelism] A. Navarro, R. Asenjo, S. Tabik and C. Cascaval. Analytical modeling of pipeline parallelism. Proc. PACT 2009, 281–290.

[Roofline] S. Williams, A. Waterman and D. Patterson. Roofline: an insightful visual performance model for multicore architectures. CACM, April 2009, 65–76.

[RouterBuffer] G. Appenzeller, I. Keslassy and N. McKeown. Sizing router buffers. Proc. SIGCOMM 2004, 281–292.

[SoftErrors] A.A. Nair, S. Eyerman, L. Eeckhout and L.K. John. A first-order mechanistic model for architectural vulnerability factor. Proc. ISCA 2012, 273–284.

[T] Y.C. Tay. Analytical Performance Modeling for Computer Systems (3rd edition). Morgan & Claypool Publishers, 2018.

[TCP] J. Padhye, V. Firoiu, D. Towsley and J. Kurose. Modeling TCP throughput: a simple model and its empirical validation. Proc. SIGCOMM 1998, 303–314.

[TGS] Y.C. Tay, N. Goodman and R. Suri. Locking performance in centralized databases. ACM Transactions on Database Systems, 10(4):415–462 (Dec. 1985).

[TransactionalMemory] A. Heindl, G. Pokam and A.-R. Adl-Tabatabai. An analytic model of optimistic software transactional memory.