A review of analytical performance modeling and its role in computer engineering and science
Y.C. Tay ∗
National University of Singapore [email protected]
ABSTRACT
This article is a review of analytical performance modeling for computer systems. It discusses the motivation for this area of research, examines key issues, introduces some ideas, illustrates how it is applied, and points out a role that it can play in developing Computer Science.
∗ Some parts of this article appeared in [T] and "Lessons from Teaching Analytical Performance Modeling", Proc. ICPE Workshop on Education and Practice of Performance Engineering, Mumbai, India (April 2019), 79–84. Many thanks to Haifeng Yu and Prashant Shenoy for reading the draft and making many helpful suggestions.

1. INTRODUCTION

A computer system has multiple components, possibly a mix of both hardware and software. It may be a processor architecture on a chip, or a software suite that supports some application in the cloud. In our context, analytical modeling essentially refers to the formulation of equations to describe the performance of the system.

System performance can be measured by some metric y, say throughput, latency, availability, etc. This y depends on the workload on the system, such as the number of concurrent threads running on a chip, or the number of users for an application. It also depends on the system configuration, like the number of cores in a processor, or the number of virtual machines processing user requests. Collectively, we refer to the numerical workload and configuration variables as input parameters x.

Abstractly speaking, a computer system determines a function f that maps input parameters x to the performance metric y, i.e. y = f(x). The complexity and nondeterminism in a real system make it impossible to know this f exactly. The purpose of an analytical model is then to derive a function f̂ that approximates the ground truth f, i.e. f̂(x) ≈ y.

There are various ways of constructing f̂. Using a statistical approach, one might start with a sample of ⟨x, y⟩ values, fix a class of functions for f̂, and determine the f̂ that best fits the sample according to some optimization criterion (mean square error, say); a minimal sketch of this approach appears at the end of this section. Alternatively, a machine learning approach might use the sample to train an artificial neural network that computes f̂. In both cases, f̂ is determined from the sample ⟨x, y⟩, so no knowledge of the system is used. It follows that we may not be able to use f̂ to analyze the behavior of the system, to understand why one particular hardware architecture is better than another, or what combination of parameters can cause performance to deteriorate, etc. Such an f̂ is thus a blackbox model of f.

Sometimes, very little is known about the system itself: the processor architecture may be proprietary, or the server cluster may be virtualized. In such cases, it is entirely appropriate that we resort to a blackbox model of the system. In many cases, though, the performance analyst may actually have a sufficiently detailed understanding of the system to mathematically describe the relationships among the speeds and delays within it. One can then use equations to construct f̂. Such an f̂ is then a whitebox model: it is based on a mathematical analysis of the system, can be analyzed by standard mathematical techniques (calculus, probability, etc.), and the analysis can be interpreted in terms of details in the system. Therein lies the power of an analytical model.

Our review begins in Sec. 2 by describing a common motivation in constructing an analytical model, namely to engineer a computer system: to predict the throughput for a workload, to determine how failures affect response time, to evaluate the performance implications of different design choices, etc.

The derivations in an analytical model are often based on strong assumptions; this is often considered a weakness of the approach. Sec. 3 presents examples to show that these models are often accurate even when the assumptions are violated. In addition to the assumptions, the derivations may make approximations that are hard to justify theoretically (Sec. 4). However, the final arbiter for whether the assumptions and approximations are acceptable lies not in theory, but in the experimental validation of the model.

One broadly applicable technique in modeling is bottleneck analysis (Sec. 5). It is powerful in that it requires very little information about the details in a system, but it is also weak in that it offers only performance bounds. Nonetheless, these may suffice for some purposes, like comparing scalability limits, say.

A crucial advantage that analytical models have over simulation models lies in the global view of the parameter space that they offer. One can often use the equations to identify important (and unimportant) regions of the space, reduce the number of parameters, etc. (Sec. 6).

A computer system can have many components that interact in complicated ways, but Sec. 7 shows how this complex interaction can be decomposed into separate smaller models.
Such a decomposition then provides a way to study the separate impact of hardware and software on a workload, the interaction between resource and data contention, etc.

Since an analytical model is an approximation of the real system, we must verify not just the accuracy of its numerical predictions, but also check that crucial properties revealed by the model (are not just artefacts of the model, but) are in fact properties of the real system. Sec. 8 discusses this concept of analytic validation.

In contrast to the engineering motivation for an analytical model, Sec. 9 points out the role that such models can play in developing a science for the behavior of these engineered systems.

Throughout, we will draw examples from recent literature on hardware, networking and datacenters, etc. to show the use of analytical models in designing, controlling and studying systems, large and small. The range of examples is wide, as wide as the broad perspective that we want our students to have. Inevitably, we will need to use some notation (e.g. M/D/1).
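To make the blackbox approach described above concrete, here is a minimal Python sketch of the statistical route: it fits a polynomial f̂ to a handful of hypothetical ⟨x, y⟩ measurements by least squares. The data, the choice of a quadratic, and the prediction point are assumptions made purely for illustration; none of this comes from the cited papers.

```python
import numpy as np

# Hypothetical sample: x = number of concurrent users, y = measured latency (ms).
x = np.array([1, 2, 4, 8, 16, 32, 64])
y = np.array([10.2, 10.9, 12.5, 16.0, 24.1, 41.8, 80.3])

# Fix a class of functions for f_hat (here: quadratics) and fit by least squares.
coeffs = np.polyfit(x, y, deg=2)
f_hat = np.poly1d(coeffs)

print("fitted coefficients:", coeffs)
print("predicted latency at x = 50:", f_hat(50))
```

Such an f̂ may predict well near the sample, but it offers no explanation for why the latency curves upward; that is exactly the blackbox limitation discussed above.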
2. WHY AN ANALYTICAL MODEL?
But first, why model a system? To give an example, cloud providers offer their customers an attractive proposition: the resources provided can be elastically scaled according to demand. However, a tenant may need a tool to help determine how many resources (compute power, memory, etc.) to acquire, but is hampered by having very little knowledge of the cloud architecture. [ElasticScaling] considers this problem in the context of a NoSQL data grid, where the number of grid nodes can be increased or decreased to match the workload (see Figure 1). The authors present a tool (TAS) that uses equations to analytically model how transaction throughput varies nonlinearly (and nonmonotonically, sometimes) as the number of nodes increases.

Figure 1: TAS is a tool to help a tenant determine how many grid nodes are needed to meet the demand [ElasticScaling].

Energy and latency are major issues for data centers. [DatacenterAMP] uses a simple queueing model to study how asymmetric multicore processors (AMP) can facilitate a tradeoff between energy use and latency bounds. The idea is to dynamically marshal the resources of multiple cores to give a larger processor that can speed up a sequential computation or, when possible, scale it back down to reduce energy consumption (see Figure 2).

Figure 2: Scaling a processor down or up (by reducing or increasing its number of cores) to save energy or speed up computation [DatacenterAMP].

[GPU] also uses a model for architectural exploration. A GPU runs multiple threads simultaneously. This speeds up the execution, but also causes delays from competition for MSHRs (Miss Status Handling Registers) and the DRAM bus. The paper uses a model to analyze how changing the number of MSHRs, cache line size, miss latency, DRAM bandwidth, etc. affects the tradeoff. There is no practical way of doing such an exploration with real hardware. Part of the analysis is based on using the model to generate the CPI (cycles per instruction) stack that classifies the stall cycles into MSHR queueing delay, DRAM access latency, L1 cache hits, etc.

The model in [SoftErrors] is similarly used to determine how various microarchitectural structures (reorder buffer, issue queue, etc.) contribute to the soft error rate that is induced by cosmic radiation. Again, tweaking hardware structures is infeasible, and simulating soft errors is intractable, so an analysis with a mathematical model is essential.

One equation from the [SoftErrors] model explains why an intuition (that a workload with a low CPI should have a low vulnerability to soft errors) can be contradicted. Similarly, the CPI stack breakdown by the [GPU] model explains how a workload that spends many cycles queueing for DRAM bandwidth can, counter-intuitively, have negligible stall cycles from L2 cache misses.

To design a complicated system, an engineer needs help from intuition that is distilled from experience. However, experience with real systems is limited by availability and configuration history. Although one can get around that via simulation, the overwhelming size of the parameter space usually requires a limited exploration; this exploration is, in turn, guided by intuition. One way of breaking this circularity is to construct an analytical model that abstracts away the technical details and zooms in on the factors affecting the central issues. We can then check our intuition with an analysis of the model.

Intuition is improved through contradictions: they point out limits on the old intuition, and new intuition is gained to replace the contradictions.
Again, such contradictions can be hard to find in a large simulation space, but may be plain to see with the equations in a mathematical model.

The above examples illustrate the role that analytical models can play in engineering complex computer systems.
3. ASSUMPTIONS
Analytical models are often dismissed because they are based on unrealistic assumptions. For example, the latency model in [DatacenterAMP] uses a simple M/M/1 queue and Amdahl's Law [HM]. The exponential distribution has a strong "memoryless" property, and Amdahl's Law is an idealized program behavior. Therefore, one expects that the engineering complexity and overheads of dynamically resizing a multicore processor will render any model (simulators included) inaccurate. However, the contribution of an analytical model does not necessarily lie in numerical accuracy, but possibly in providing insight into system behavior. In the case of [DatacenterAMP], its simple model suffices to reveal an instructive interplay among chip area, latency bound and power consumption.

Anyway, many models adopt equations from theory while breaking the assumptions that were used to derive those equations. In [GPU], the model puts a bound on the delay computed with the Pollaczek-Khinchin formula for an M/D/1 queue, even though memory requests need not arrive as a Poisson process.
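To make the formula concrete, here is a minimal Python sketch (mine, not taken from [GPU]) of the Pollaczek-Khinchin mean waiting time for an M/G/1 queue, specialized to deterministic service (M/D/1); the numbers in the example call are arbitrary.

```python
def mg1_wait(lam, ES, ES2):
    """Pollaczek-Khinchin mean waiting time for an M/G/1 queue:
    W = lam * E[S^2] / (2 * (1 - rho)), where rho = lam * E[S] < 1."""
    rho = lam * ES
    assert rho < 1.0, "queue is unstable"
    return lam * ES2 / (2.0 * (1.0 - rho))

def md1_response(lam, D):
    """M/D/1 mean response time: deterministic service of length D,
    so E[S^2] = D^2; response time = waiting time + service time."""
    return mg1_wait(lam, D, D * D) + D

# Example: service time of 0.5 time units, arrival rate 1.2 requests per unit time.
print(md1_response(lam=1.2, D=0.5))
```

A model like [GPU] uses such an expression (or a bound on it) even when the Poisson assumption behind it does not hold; the experiments then decide whether the approximation is acceptable.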
Similarly, the Mean Value Analysis (MVA) algorithm assumes a separable queueing network. [DatabaseScalability] shows how, by first profiling the performance of a standalone database, the MVA algorithm can be used to predict the performance of a replicated database and, in addition, compare two alternative designs (multi-master and single-master). However, the probability of aborting a transaction increases with the replication and the number of clients, thus changing the demand for resources (at the CPU, disks, etc.) and thus breaking the MVA assumptions. Yet, experiments show good agreement between model predictions and simulation experiments.

[MapReduce] is another example where the model uses the MVA algorithm for a system that violates MVA assumptions. Jobs in a system suffer delays for various reasons. In this example, a job consists of map, shuffle and reduce tasks that are delayed not only by queueing for resources, but also by precedence constraints among the tasks (see Figure 3).

Figure 3: Precedence constraints among tasks in a MapReduce job add to delays [MapReduce].

The key equation in the model is

$A_{ik}(\vec{N}) = \sum_j f_{ij}\, Q_{jk}(\vec{N} - 1_i)$,    (1)

where $A_{ik}(\vec{N})$ and $Q_{jk}(\vec{N} - 1_i)$ are average queue lengths, and $f_{ij}$ is a quantity determined by the precedence constraint; this equation is then used in the MVA algorithm to iterate from job population $\vec{N} - 1_i$ to $\vec{N}$, and thus solve the model. Strictly speaking, however, precedence constraints violate MVA's separability assumptions, and there is no $f_{ij}$ in Mean Value Analysis.

This practice of using equations even when their underlying assumptions are violated recalls the proverbial reminder to keep the baby (e.g. the Pollaczek-Khinchin formula, the MVA algorithm) while throwing out the bath water (i.e. the assumptions). This seems like a mathematical sin, but only if one confuses the sufficiency and necessity of the equations' assumptions: the equations may in fact be robust with respect to violations of their assumptions.
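For readers unfamiliar with MVA, the following Python sketch shows the textbook exact MVA iteration for a closed, single-class separable network. It is a baseline illustration only; it does not include the precedence factors $f_{ij}$ that [MapReduce] adds, and the demands in the example call are made up.

```python
def exact_mva(demands, N):
    """Exact MVA for a closed, single-class separable queueing network.
    demands[k] = service demand at queueing station k; N = job population.
    Returns (throughput X, per-station mean queue lengths Q)."""
    K = len(demands)
    Q = [0.0] * K                   # queue lengths at population 0
    X = 0.0
    for n in range(1, N + 1):
        R = [demands[k] * (1.0 + Q[k]) for k in range(K)]   # residence times
        X = n / sum(R)                                       # throughput
        Q = [X * R[k] for k in range(K)]                     # Little's Law
    return X, Q

# Example: CPU and disk demands (seconds) of a transaction, 20 concurrent clients.
print(exact_mva([0.010, 0.030], N=20))
```

Models like [DatabaseScalability] and [MapReduce] keep this iteration while feeding it inputs (replication-dependent demands, or the $f_{ij}$-weighted queue lengths of Equation (1)) that strictly violate the separability behind it.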
4. AVERAGE VALUE APPROXIMATION (AVA)
The assumptions in an analytical model are often there to help justify an equation. [TCP] is a well-cited mathematical analysis of the protocol. The first step in its derivation for calculating expected throughput was

$E\left(\frac{Y}{A}\right) = \frac{EY}{EA}$,

where Y is the number of packets sent in a period A between triple-duplicate packet loss indications. This equation can be justified by assuming the TCP window size is a Markov regenerative process, but why should this assumption hold? TCP has many states (slow start, exponential backoff, etc.) and correlated variables (window size, timeout threshold, etc.). These make it increasingly difficult in the analysis for one to even identify the assumptions needed to go from one equation to another. Eventually, the authors simply adopt the approximation

$EQ(W) \approx Q(EW)$,

without stating the assumptions. Here, Q(W) is the probability that a packet loss in a window of size W is caused by a timeout, W is a random variable, and the approximation estimates the average of Q(W) by Q(EW), i.e. replacing the variable W with its average EW. This technique, which I call Average Value Approximation (AVA) [T], is widely used in performance modeling.

For example, [SoftErrors] estimates the time an instruction stays in a reorder buffer as $\ell K$, where $\ell$ is the average latency per instruction, and K is the average critical path length. Strictly speaking, this time is $E(T_1 + \cdots + T_n)$, where $T_i$ is the latency for instruction i, n is the critical path length, $ET_i = \ell$ and $En = K$. It is mathematically wrong to say $E(T_1 + \cdots + T_n) = ET_1 + \cdots + ET_n$, since n is a random variable. If we use AVA and replace n by En, then $T_{En}$ will not make sense if En is not an integer; if it is, then indeed $ET_1 + \cdots + ET_{En} = \ell + \cdots + \ell = \ell K$.

A similar approximation appears in [5G], which is a performance analysis of 5G cellular networks. The system in such a network consists of overlapping wireless cells, with bit rates that differ from cell to cell, and from zone to zone within a cell. How long would it take to download a file (e.g. a movie) as a user moves across the cells? The final equation in the analysis estimates the average time as

$T = n_c R + n_h t_h$,    (2)

where $n_c$ is the average number of cells visited during the download, R is the average time the user spends in a cell, $n_h$ is the average number of handovers between cells, and $t_h$ is the handover delay. Notice the expression $n_c R$ is again an approximation for $E(R_1 + \cdots + R_{n_c})$, where $R_i$ is the time spent in the i-th visited cell.

One could state some assumptions to rigorously justify $E(T_1 + \cdots + T_n) = \ell K$ and $E(R_1 + \cdots + R_{n_c}) = n_c R$, but how does one justify those assumptions? In analytical modeling of a complicated system, one is focused on making progress with the derivation (often liberally replacing random variables by their averages, like $E(Y/A) = EY/EA$) without worrying about the assumptions. Whether the approximations have gone overboard will eventually be decided by experimental validation.
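A small numerical experiment can suggest when AVA is benign and when it is not. The Python sketch below is my own illustration (not from [TCP] or [5G]): it compares E(Y/A) with EY/EA for a denominator A with low and then high variability; the distributions are arbitrary assumptions chosen only to make the point.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# AVA replaces E(Y/A) by EY/EA.  With Y independent of A, the error comes
# entirely from approximating E(1/A) by 1/EA, and it grows with A's variability.
for lo, hi in [(1.8, 2.2), (0.2, 3.8)]:        # same mean EA = 2, different spread
    A = rng.uniform(lo, hi, n)
    Y = rng.exponential(5.0, n)                # EY = 5
    print(f"A ~ U({lo},{hi}):  E(Y/A) = {np.mean(Y / A):.3f},  EY/EA = {Y.mean() / A.mean():.3f}")
```

Whether a given workload puts the model in the benign regime or not is exactly what the experimental validation must decide.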
5. BOTTLENECK ANALYSIS
There is a common misunderstanding that analytical modeling requires queueing theory. This is certainly untrue for the influential [Roofline] model for analyzing multicore architectures. It is a very simple model that has a roofline with two straight segments, one representing the peak memory bandwidth between processor caches and DRAM, and the other representing peak floating-point processor performance (see Figure 4). No queueing theory is involved.

Figure 4: An application's rate of execution is bounded by memory bandwidth (sloping line) and processor speed (horizontal line); here, the application stencil is limited by memory bandwidth, while FFT is limited by processor speed [Roofline].

Despite its simplicity, the roofline model suffices for answering some questions: (i) Given an architecture and a floating-point kernel, is performance constrained by processor speed or memory bandwidth? The answer lies in comparing the kernel's operational intensity (i.e. operations per byte of DRAM traffic) to the two roofline segments. For example, Figure 4 shows that the kernel stencil is limited by memory bandwidth, not processor speed. (ii) Given a particular kernel, which architecture will provide the best performance? One can answer this by comparing the rooflines of alternative architectures. (iii) Given a particular architecture, how can the kernels be optimized to push the performance? The optimization can be evaluated by examining how it shifts the roofline and changes the operational intensity.

[Roofline] makes very few assumptions and treats the processor as a blackbox. Details like multithreading, how threads are distributed over the cores, the cache hierarchy, etc. are not explicitly modeled. Rather, it focuses on the possible bottlenecks (memory bandwidth and processor speed) in the blackbox.
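The roofline bound itself is just the minimum of two lines. The Python sketch below is a minimal rendition of that idea; the peak numbers and kernel intensities are made-up values, not measurements from [Roofline].

```python
def roofline(intensity, peak_gflops, peak_bw_gbs):
    """Attainable performance (GFLOP/s) for a kernel with the given
    operational intensity (FLOPs per byte of DRAM traffic)."""
    return min(peak_gflops, intensity * peak_bw_gbs)

# Hypothetical machine: 100 GFLOP/s peak compute, 25 GB/s peak DRAM bandwidth.
for name, oi in [("stencil", 0.5), ("FFT", 8.0)]:
    print(name, roofline(oi, peak_gflops=100.0, peak_bw_gbs=25.0), "GFLOP/s")
```

With these illustrative numbers, the low-intensity kernel is bandwidth-bound and the high-intensity kernel is compute-bound, which is the same comparison the roofline plot makes visually.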
Processor architecture aside, every blackbox has its bottlenecks. This is true of datacenter-scale architectures as well. For example, different cloud providers may have different architectures for running transaction workloads. [CloudTransactions] considers three of these (see Figure 5). What determines their scalability limits?

Figure 5: Three different cloud architectures for supporting transactions [CloudTransactions].

Once again, a cloud architecture is essentially a blackbox, with very little information on how the virtualized system is physically deployed. Nonetheless, one can do a bottleneck analysis [T] to get the limits on transaction throughput:

$\lambda_{classic} = \min\left\{ \frac{N_{WA}}{D_{WA}}, \frac{c}{D_{db}}, \frac{N_{st}}{D_{st}} \right\}$,
$\lambda_{partition} = \min\left\{ \frac{N_{WA}}{D_{WA}}, \frac{N'_{st}}{D_{db} + D_{st}} \right\}$,
$\lambda_{dist.control} = \min\left\{ \frac{N'_{WA}}{D_{db} + D_{WA}}, \frac{N_{st}}{D_{st}} \right\}$,    (3)

where classic, partition and dist.control denote the three architectures in Figure 5; $N'_{WA}$ and $N_{WA}$ are the number of web/application servers with and without (respectively) co-located database servers; $N'_{st}$ and $N_{st}$ are the number of storage servers with and without (respectively) co-located database servers; $D_{WA}$, $D_{db}$ and $D_{st}$ are the service demands (seconds on a commodity machine M) per transaction at a web/application, database and storage server (respectively); and c is the ratio of database server speed in the classic architecture to M's speed.

By scrutinizing the expressions in Equation (3), one can compare the scalability limits of two architectures, and determine how one can push that limit for a particular architecture. We thus see from Equation (3) how an analytical model can bring clarity to the complex engineering choices that support cloud computing.
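Equation (3) is easy to turn into a small calculator. The sketch below does so for the classic architecture only; the parameter values in the example call are arbitrary placeholders, not the values measured in [CloudTransactions].

```python
def lambda_classic(N_WA, D_WA, c, D_db, N_st, D_st):
    """Upper bound on transaction throughput for the 'classic' architecture
    in Equation (3): the slowest tier is the bottleneck."""
    return min(N_WA / D_WA,   # web/application servers
               c / D_db,      # database server (speed ratio c relative to machine M)
               N_st / D_st)   # storage servers

# Hypothetical server counts and service demands (seconds per transaction).
print(lambda_classic(N_WA=8, D_WA=0.02, c=4, D_db=0.01, N_st=4, D_st=0.005))
```

Comparing which term of the min is active for each architecture is precisely how one reads off and compares their scalability limits.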
6. PARAMETER SPACE
The experimental results in [CloudTransactions] appear to show throughput increasing almost linearly for dist.control as the number of emulated browsers EB increases. However, no system can scale its throughput linearly forever. A closer examination of Equation (3) shows that, for the parameter values in the experiments, saturation should set in at around EB = 13500, but that is beyond the range of the experiments.

Another example is [P2PVoD], which analyzes peer-to-peer (P2P) video-on-demand. The viral success of P2P protocols inspired many ideas from academia for improving such systems. Each proposal can be viewed as a point in the design space, whereas [P2PVoD] uses an analytical model to represent the entire space spanned by throughput (number of bytes downloaded per second), sequentiality (whether video chunks arrive in playback order) and robustness (with respect to peer arrival/departure/failure, bandwidth heterogeneity, etc.). This global view of the parameter space leads to a counter-intuitive observation, that a reduction in sequentiality can increase (sequential) throughput, thus demonstrating the point in Section 2 that an analytical model can help improve intuition.

Furthermore, this model yields a Tradeoff Theorem that says a P2P video-on-demand system cannot simultaneously maximize throughput, sequentiality and robustness. This theorem is an example of how an analytical model can discover the science that underlies an engineered system.

While an analytical model can give us a global view of the parameter space, not all of this space is of practical interest. Rather than delimit this space with some "magic constant" (e.g. the bound on EB for [CloudTransactions]), [TransactionalMemory] illustrates how the restriction can be done through an aggregated parameter.

Single-processor techniques for coordinating access to shared memory (e.g. semaphores) do not scale well as the number of cores increases, and [TransactionalMemory] examines an alternative. For the software transactional memory in this study, locks are used to control access to shared objects, and locks in a piece of code are grouped into transactions; each transaction is, in effect, executed atomically. The parameter space in the experiments is restricted to kN/L < 1, where k is the number of locks per transaction, N is the number of concurrent transactions, and L is the number of objects that can be locked (so, intuitively, the system is overloaded if kN > L). In fact, one can show that the performance (throughput, abort rate, etc.) is determined by k and Λ, where Λ = N/L [TGS], thus reducing the parameter space from 3-dimensional (k, N, L) to 2-dimensional (k, Λ). The use of such aggregated parameters (kN/L and Λ) significantly reduces the space that the experiments must cover.

Parameter aggregation can go much further. One key issue in analyzing transaction performance is modeling the access pattern. Most studies assume access is uniform, i.e. the probability of requesting any one of L objects is 1/L. In reality, some objects are more frequently requested than others, but how many parameters would it take to model a realistic access distribution? One solution to this difficulty was discovered by [ElasticScaling]. This paper defines an Application Contention Factor

$ACF = \frac{P_{lock}}{\lambda_{lock} T_{hold}}$,

where $P_{lock}$ is the probability of a lock conflict, $\lambda_{lock}$ is the rate of lock requests and $T_{hold}$ is the average lock holding time. Experiments show that the ACF of a given workload is constant with respect to the number of nodes in the data grid, and the same whether it is run in a private or public cloud (see Figure 6).

Figure 6: The Application Contention Factor ACF is a property of the transactions; it does not vary with the number of grid nodes, and is the same for private and public clouds [ElasticScaling].

Now, let D = 1/ACF, so

$P_{lock} = \frac{\lambda_{lock} T_{hold}}{D}$.    (4)

By Little's Law, $\lambda_{lock} T_{hold}$ is the expected number of objects that are locked, so Equation (4) says the probability of conflict is as if access is uniformly distributed over D objects. This is because the access pattern, being a property of the transactions, should be constant with respect to the number of grid nodes, and the same whether the cloud is private or public.

This observation simplifies tremendously the analysis of transaction behavior. It says the large literature that rests on the assumption of uniform access remains valid once the number of objects is reinterpreted via ACF; going forward, we can continue to adopt uniform access in modeling transaction performance. This is another example where analytical modeling is helping to discover the science in engineering computer systems.
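The ACF calculation, and its use as an "effective number of uniformly accessed objects", fit in a few lines. The following sketch illustrates the arithmetic with made-up measurements; it is not the [ElasticScaling] code.

```python
def acf(P_lock, lock_rate, T_hold):
    """Application Contention Factor: ACF = P_lock / (lock_rate * T_hold)."""
    return P_lock / (lock_rate * T_hold)

def effective_objects(P_lock, lock_rate, T_hold):
    """D = 1/ACF: conflicts look as if access were uniform over D objects."""
    return 1.0 / acf(P_lock, lock_rate, T_hold)

def predicted_conflict(lock_rate, T_hold, D):
    """Equation (4): P_lock = lock_rate * T_hold / D, since by Little's Law
    lock_rate * T_hold is the expected number of locked objects."""
    return lock_rate * T_hold / D

# Hypothetical profiling run: 2% conflicts, 500 lock requests/s, held 1 ms each.
D = effective_objects(P_lock=0.02, lock_rate=500.0, T_hold=0.001)
# Reuse the same D to predict the conflict probability at a higher request rate.
print(D, predicted_conflict(lock_rate=2000.0, T_hold=0.001, D=D))
```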
7. DECOMPOSITION AND DECOUPLING
Analyzing a computer system is not any easier if it is as small as a chip, as is the case with [SoftErrors] (see Section 2). The probability that a radiation-induced fault in a microarchitectural structure (e.g. reorder buffer or issue queue) causes a program output error depends on both the hardware (microarchitecture) and the software (program). Recall from Section 4 that the model estimates the time an instruction stays in a reorder buffer as $\ell K$, where $\ell$ is the average instruction latency, and K is the average critical path length. It thus decouples the hardware and software: one can analyze separately the impact of changing the hardware (which determines $\ell$) and changing the workload (which determines K). For example, we can evaluate how changing issue width affects performance, without having to re-profile the workload. Such decoupling is a powerful technique in the scientific analysis of a complicated system.

One can view such a decoupling as decomposing the model into two submodels: one for hardware, the other for software. [MapReduce] has a similar decomposition. In Equation (1), $f_{ij}$ is a factor determined by the precedence constraint, while $A_{ik}$ and $Q_{jk}$ are queue lengths in the queueing network. The precedence constraint is a model for the job execution, while the queueing network is a model for the resource contention. These two submodels are not completely decoupled, as evaluating $f_{ij}$, $A_{ik}$ and $Q_{jk}$ requires an iteration between the graph and queueing models.

There is also model decomposition in [ElasticScaling]. Figure 1 shows that the performance predictor consists of two submodels: (i) a statistical model for the resource contention; since the resources are virtualized and the hardware configuration is unknown, this model uses machine learning to model how the cloud performance responds to changes in the workload; (ii) an analytical model for the data contention, i.e. how the number of locks, transactions and objects affects the probability that a transaction encounters a lock conflict, or has to abort. We see here that the blackbox/whitebox difference mentioned in Section 1 does not mean we must choose just one of them. The [ElasticScaling] predictor has both: the machine learning model is a blackbox, while the analytical model is a whitebox; the two are developed independently but, again, the performance prediction requires an iteration between the two.
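Coupling two submodels usually comes down to a fixed-point iteration: each submodel's output is the other's input, and the loop repeats until the shared estimate stops changing. The generic Python skeleton below only illustrates that structure; the two submodel functions here are toy stand-ins, not the actual [MapReduce] or [ElasticScaling] submodels.

```python
def solve_by_iteration(submodel_a, submodel_b, x0, tol=1e-6, max_iter=100):
    """Alternate between two submodels until their shared estimate converges."""
    x = x0
    for _ in range(max_iter):
        a_out = submodel_a(x)        # e.g. resource-contention estimate
        x_new = submodel_b(a_out)    # e.g. data-contention estimate
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Toy stand-ins: each submodel nudges the shared estimate toward a fixed point.
print(solve_by_iteration(lambda x: 0.5 * x + 1.0, lambda a: 0.8 * a, x0=0.0))
```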
8. ANALYTIC VALIDATION
Although analytical models are often touted as an alternative to simulators, many of them are actually used as simulators. To give an example, [5G] presents an interesting plot showing that, as user speed increases, throughput at first increases, then decreases (see Figure 7). This plot was not generated by a simulator, but by numerical solution of the [5G] analytical model; and the nonmonotonic throughput behavior is observed from the plot, not proved with the model. In this sense, the model is used as a simulator; I call this analytic simulation [T].

Figure 7: As user speed increases while traversing 5G cells, the throughput for a file download at first increases, then decreases [5G].

If we draw conclusions from plotting numerical solutions of an analytical model, how do we know if these conclusions are about the model, rather than the system? This is a possibility because the model is only an approximation of the system (this is an issue for simulators as well).

To give an example, classical MVA solutions of a queueing network model have throughput that increases monotonically as the number of jobs increases. If such a model were used for [ElasticScaling], it would fail to capture a critical behavior (throughput nonmonotonicity) in the system (see Section 2). Indeed, one can find examples in the literature of such a qualitative divergence in properties between model and system.

We should therefore verify that an interesting property observed in a model is, in fact, a property of the system. I call this analytic validation [T].

For example, the [ElasticScaling] model claims that the Application Contention Factor ACF is a constant that is a property of the transactions (independent of the number of grid nodes and whether the transactions run in a private or public cloud), and Figure 6 presents an experiment to validate this claim. Note that nothing in this plot is generated by the model: the properties we observe in the data are properties of the system itself. In contrast, in a numeric validation of an analytical model, there is always a comparison of experimental measurements to numerical solutions from the model.

We can see another example of analytic validation in [RouterBuffer]. Internet routers drop packets when their buffers are full. To avoid this, and to accommodate the large number of flows, router buffers have become very large. [RouterBuffer] uses an analytical model to study whether flow multiplexing can make this buffer bloat unnecessary. Among the results is an expression that says buffer occupancy EQ depends on the link rate only through utilization ρ. This is a strong claim from the model, so it requires experimental verification.

Figure 8: For a fixed utilization ρ = 0.8, simulated buffer occupancy EQ for different link rates is similar. Moreover, it has a saw-tooth behavior, as predicted by the model [RouterBuffer].

Figure 8 shows that, for fixed ρ = 0.8, simulation measurements of EQ are indeed similar for different link rates. Furthermore, a numerical solution of the model shows that EQ has a saw-tooth behavior as the TCP flow length increases; Figure 8 shows that the simulated EQ has this behavior too. The numerical difference between model and simulation in the plot is beside the point; rather, the experiment shows that an unusual behavior predicted by the model is not an artifact of the assumptions and approximations, but is in fact a property of the system itself.

For another illustration, consider the IEEE 802.11 protocol for WiFi: it specifies what a base station and the mobile devices in its wireless cell should do in sending and receiving packets. Simultaneous transmissions from different mobile devices can cause packet collisions and induce retransmission and backoff, so maximum throughput in the cell can be lower than channel bandwidth. The model in [802.11] examines how this saturation throughput depends on the protocol parameters, and analyzes the tradeoff between collision and backoff.

One of the claims from the [802.11] model says that the probability p of a collision depends on the protocol's minimum window size W and the number of mobile devices n only through the gap g = W/(n−1).

Figure 9: Probability of packet collision p depends on minimum window size W and number of mobile devices n only through the gap g = W/(n−1); the maximum window size is 2^m W [802.11].

Figure 9 shows that ⟨g, p⟩ from simulations with different configurations do in fact lie on the same curve. As a corollary, this reduces the 3-parameter space ⟨n, W, m⟩ (where the maximum window size is 2^m W) to just a single parameter g (recall Section 6). The model also claims that bandwidth wasted by packet collisions exceeds idle bandwidth caused by backoffs if and only if r > T, where r is the transmission rate and T is the transmission time (including packet headers); this was also analytically validated by the experiments. Here, we see how an analytical model can discover the science that governs wireless packet transmissions over a shared channel.
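One mechanical way to carry out such an analytic validation is to check that measurements from different configurations collapse onto a single curve in the aggregated parameter. The sketch below does this for the g = W/(n−1) claim; the (n, W, p) triples are fabricated stand-ins for simulation output, used only to show the shape of the check.

```python
from collections import defaultdict

# Fabricated (n, W, p) triples standing in for simulation measurements.
samples = [(9, 32, 0.081), (17, 64, 0.079), (33, 128, 0.080),
           (5, 32, 0.160), (9, 64, 0.158)]

groups = defaultdict(list)
for n, W, p in samples:
    g = W / (n - 1)                  # aggregated parameter suggested by the model
    groups[round(g, 2)].append(p)

# If the claim holds, p values within each g-group should nearly coincide.
for g, ps in sorted(groups.items()):
    print(f"g = {g}:  p in [{min(ps):.3f}, {max(ps):.3f}]")
```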
9. ANALYSIS WITH AN ANALYTICAL MODEL
In answering the question "Why an Analytical Model?" (Section 2), we gave examples where the model is used for architectural exploration ([GPU] and [SoftErrors]) and resource provisioning ([ElasticScaling]); other examples include [DatacenterAMP], [CloudTransactions] and [Roofline] (design exploration) and [MapReduce] (capacity planning). For these examples, the analytical models were used to generate numerical predictions.

This analytic simulation can sometimes reveal interesting behavior in a system (e.g. [5G] in Section 8), but such revelations could arguably be obtained by a more realistic, albeit slower, simulation model. The power in an analytical model lies not in its role as a fast substitute for a simulator, but in the analysis that one can bring to bear on its equations. Such an analysis can yield insights that cannot be obtained by eyeballing any number of plots (without knowing what you are looking for) and provide conclusions that no simulator can offer.

We see illustrations of this power in [ElasticScaling], which discovers that nonuniform access is equivalent to uniform access via the Application Contention Factor; in [TransactionalMemory], which shows that two dimensions of the parameter space (N and L) can be reduced to one (N/L); in [P2PVoD], where the Tradeoff Theorem says throughput, sequentiality and robustness cannot be simultaneously maximized for P2P video-on-demand; and in [802.11], which characterizes the optimal balance between bandwidth loss through packet collisions and time loss to transmission backoff. There are two other examples in [RouterBuffer] and [TCP].

[RouterBuffer] shows that (i) for n long TCP flows, buffer size should be inversely proportional to √n and (ii) for short flows, the probability of a buffer overflow does not depend on n, round-trip time, nor link capacity. These are insights obtained from the analytical model; one cannot get them from any finite number of simulation experiments.

The key equation in [TCP] expresses throughput in terms of packet loss probability p and round-trip time RTT. Clearly, for any nontrivial Internet path, p and RTT can only be measured, not predicted, so what is the point of having that equation? Its significance lies not in predicting throughput, but in characterizing its relationship with p and RTT. Such a characterization led to the concept of TCP-friendliness and the design of equation-based protocols.

[RouterBuffer] and [TCP] thus advance the science of network communication.

For a final, topical example, consider the spread of fake news and memes, etc. over the Web, driven by user interest, modulated by daily and weekly cycles, and dampened over time. To adequately capture this behavior, [InformationDiffusion] modifies the classical epidemic model. However, there is no way of integrating the resulting differential equation in closed form, so it is solved numerically. The parameters of the model can then be calibrated by fitting measured data points. Since measurements are needed for calibration, the model has limited predictive power. Nonetheless, the parametric values serve to succinctly characterize the diffusion, and provide some insight into its spread. This example illustrates the point that, although the pieces of a system are designed, engineered and artificial, they can exhibit a hard-to-understand, organic behavior when put together.
An analytical model is thus a tool for developing the science of such organisms.

(Incidentally, given the recent, tremendous success of artificial neural networks, it is fashionable now to speculate that artificial intelligence may one day become so smart that some AI system will control humans. I believe that, if that day comes, the system will behave biologically, with biological vulnerabilities that are open to attack.)
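Returning to the [TCP] example above: as a deliberately simplified illustration of what "characterizing throughput through p and RTT" buys us, the sketch below uses the well-known square-root law, throughput ≈ (MSS/RTT)·sqrt(3/(2p)). This is a simpler relative of the full [TCP] equation (which adds timeout terms), shown here only to make the idea of an equation-based sending rate tangible; the numbers in the example are arbitrary.

```python
from math import sqrt

def tcp_rate_bytes_per_sec(mss_bytes, rtt_sec, p):
    """Simplified square-root law for long-lived TCP throughput.
    The full [TCP] model adds timeout terms; this is only the leading behavior."""
    return (mss_bytes / rtt_sec) * sqrt(3.0 / (2.0 * p))

# Example: 1460-byte segments, 100 ms round-trip time, 1% packet loss.
print(tcp_rate_bytes_per_sec(1460, 0.100, 0.01))
```

An equation-based protocol measures p and RTT on-line and paces its sending rate to such a value, which is precisely the TCP-friendliness idea mentioned above.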
10. CONCLUSION
This review has discussed some issues (Sec. 3 and Sec. 8) and introduced some key ideas and techniques (Sec. 4, Sec. 5, Sec. 6 and Sec. 7) in analytical performance modeling. Although the models are often conceived as an engineering tool to generate numerical predictions for the design and control of computer systems (Sec. 2), Sec. 9 points to a less obvious role in discovering the science that underlies these systems, much like the role that mathematical models play in discovering physics in nature. This is a role that spans all computer systems (processor, memory, bandwidth, database and multimedia systems, etc.), and one that helps develop a Computer Science that withstands changes in technology.
11. REFERENCES

[5G] B. Baynat and N. Narcisse. Performance model for 4G/5G networks taking into account intra- and inter-cell mobility of users. Proc. LCN 2016, 212–215.

[802.11] Y.C. Tay and K.C. Chua. A capacity analysis for the IEEE 802.11 MAC protocol. Wireless Networks, 7(2):159–171 (2001).

[CloudTransactions] D. Kossmann, T. Kraska and S. Loesing. An evaluation of alternative architectures for transaction processing in the cloud. Proc. SIGMOD 2010, 579–590.

[DatacenterAMP] V. Gupta and R. Nathuji. Analyzing performance asymmetric multicore processors for latency sensitive datacenter applications. Proc. HotPower 2010, 1–8.

[DatabaseScalability] S. Elnikety, S. Dropsho, E. Cecchet and W. Zwaenepoel. Predicting replicated database scalability from standalone database profiling. Proc. EuroSys 2009, 303–316.

[ElasticScaling] D. Didona, P. Romano, S. Peluso and F. Quaglia. Transactional Auto Scaler: elastic scaling of in-memory transactional data grids. Proc. ICAC 2012, 125–134.

[GPU] J.-C. Huang, J.H. Lee, H. Kim and H.-H. S. Lee. GPUMech: GPU performance modeling technique based on interval analysis. Proc. MICRO 2014, 268–279.

[HM] M.D. Hill and M.R. Marty. Amdahl's Law in the Multicore Era. IEEE Computer, 41(7):33–38 (2008).

[InformationDiffusion] Y. Matsubara, Y. Sakurai, B.A. Prakash, L. Li and C. Faloutsos. Rise and fall patterns of information diffusion: model and implications. Proc. KDD 2012, 6–14.

[MapReduce] E. Vianna, G. Comarela, T. Pontes, J. Almeida, V. Almeida, K. Wilkinson, H. Kuno and U. Dayal. Analytical performance models for MapReduce workloads. Int. J. Parallel Programming.

[P2PVoD] B. Fan, D.G. Andersen, M. Kaminsky and K. Papagiannaki. Balancing throughput, robustness, and in-order delivery in P2P VoD. Proc. CoNEXT 2010, 10:1–10:12.

[PipelineParallelism] A. Navarro, R. Asenjo, S. Tabik and C. Cascaval. Analytical modeling of pipeline parallelism. Proc. PACT 2009, 281–290.

[Roofline] S. Williams, A. Waterman and D. Patterson. Roofline: an insightful visual performance model for multicore architectures. CACM, April 2009, 65–76.

[RouterBuffer] G. Appenzeller, I. Keslassy and N. McKeown. Sizing router buffers. Proc. SIGCOMM 2004, 281–292.

[SoftErrors] A.A. Nair, S. Eyerman, L. Eeckhout and L.K. John. A first-order mechanistic model for architectural vulnerability factor. Proc. ISCA 2012, 273–284.

[T] Y.C. Tay. Analytical Performance Modeling for Computer Systems (3rd edition). Morgan & Claypool Publishers, 2018.

[TCP] J. Padhye, V. Firoiu, D. Towsley and J. Kurose. Modeling TCP throughput: a simple model and its empirical validation. Proc. SIGCOMM 1998, 303–314.

[TGS] Y.C. Tay, N. Goodman and R. Suri. Locking performance in centralized databases. ACM Transactions on Database Systems, 10(4):415–462 (Dec. 1985).

[TransactionalMemory] A. Heindl, G. Pokam and A.-R. Adl-Tabatabai. An analytic model of optimistic software transactional memory.