Measuring the Complexity of Packet Traces

Abstract

This paper studies the structure of several real-world traces (including Facebook, High-Performance Computing, Machine Learning, and simulation generated traces) and presents a systematic approach to quantify and compare the structure of packet traces based on the entropy contained in the trace file. Insights into the structure of packet traces can lead to improved network algorithms that are optimized toward specific traffic patterns. We then present a methodology to quantify the temporal and non-temporal components of entropy contained in a packet trace, called the trace complexity, using randomization and compression. We show that trace complexity provides unique insights into the characteristics of various applications and argue that there is a need for traffic generation models that preserve the intrinsic structure of empirically measured application traces. We then propose a traffic generator model that is able to produce a synthetic trace that matches the complexity level of its corresponding real-world trace.


Chen Avin, Manya Ghobadi, Chen Griner, Stefan Schmid ∗

School of Electrical and Computer Engineering, Ben Gurion University of the Negev, Israel
Computer Science and Artificial Intelligence Laboratory, MIT, USA
Faculty of Computer Science, University of Vienna, Austria


∗ Authors appear in alphabetical order. Research conducted as part of Chen Griner's thesis.
arXiv: … [cs.NI], May …

Packet traces collected from networking applications, such as data center traffic, tend not to be completely random and have been observed to feature structure: data center traffic matrices are sparse and skewed [1, 2], exhibit locality [3], and are bursty [4, 5]. Motivated by the existence of such structure, the networking community is currently putting much effort into designing algorithms that optimize different network layers toward such structure, including self-driving and demand-aware networks [6, 7, 8], learning-based traffic engineering [9] and video streaming [10], as well as reconfigurable optical networks [11, 2, 12]. For instance, many network optimizations exploit the presence of elephant flows [13, 14].

However, the structure available in different applications can differ significantly, and a unified approach to measure the structure in traffic traces is missing. Better quantification of a trace's structure will lead to better network optimization and to an understanding of how much improvement, if any, is available in current solutions. Moreover, one of the critical factors in evaluating new proposals is their traffic workload. Ideally, the traffic workload should contain the same structure as real-world traces, but this is often overlooked due to the lack of a traffic generation model that can replicate real-world traces while preserving their temporal and non-temporal structure.

For instance, consider a trace file containing the communication pattern of a Machine Learning (ML) application executing a popular convolutional neural network training job on four GPUs. This workload was obtained from the authors of [15]. Figure 1(a) is a visualization of the trace file where each entry in the trace is represented by a unique color corresponding to its ⟨source, destination⟩ GPU pair. This visualization shows the temporal structure in the trace file, as colors appear consecutively and follow some pattern. In contrast, Fig. 1(b) shows the same trace file but with the entries shuffled to remove the temporal structure in Fig. 1(a). The traffic matrix (TM) in Fig. 1(c) shows a skewed heatmap indicating that some GPU pairs communicate more frequently than others. Note that even though the temporal structures in (a) and (b) are different, they both have the same TM shown in (c). In other words, the TM is able to capture the non-temporal structure in the trace files but not the temporal one. For comparison, let us now consider two synthetic traces shown in Fig. 1(d) and (e). Trace (d) is generated uniformly at random and has the lowest temporal structure compared to (a), while trace (e) is sorted by its ⟨source, destination⟩ key and hence has the highest temporal structure. Similarly, Fig. 1(f) captures the non-temporal structure in (d) and (e) but not the temporal one.

Figure 1: Visualization of temporal and non-temporal structure in a machine learning workload. Entries in the trace are shown in the order of appearance (horizontal axis: time). (a) Temporal structure in the original trace. (b) Shuffling the entries in (a) removes the temporal structure. (c) TM of (a) and (b) (axes: source GPU, destination GPU). (d) A trace with the lowest temporal structure corresponding to (a). (e) A trace with the highest temporal structure corresponding to (a). (f) TM of (d) and (e) (axes: source GPU, destination GPU).

But how can we measure the structure of a trace file? And how should we generate synthetic traces while preserving the structure of empirical traces? This paper aims at providing initial steps toward answering these questions. In particular, we quantify the amount of temporal and non-temporal structure in traffic traces using the information-theoretic measure of entropy [16] in the trace. Since the term entropy is defined for random variables, as opposed to a sequence of individual communication requests in a packet trace, we use a more general term called complexity [17] to quantify the structure in a packet trace and call it trace complexity. Moreover, we provide a traffic generation model to generate synthetic traces that match the structure of a given trace. Intuitively, a packet trace with high structure has low entropy and low complexity: it contains little information, and the sequence behavior is more predictable [18].
Our approach allows us to chart what we call a complexity map of individual traffic traces: each traffic trace is mapped to a point on a two-dimensional graph indicating the amount of temporal and non-temporal information present in the trace.

The main contributions of this paper are as follows. First, we present an information-theoretic perspective to systematically separate the temporal and non-temporal structures available in a traffic trace (§ …).

This section describes our methodology to quantify the inherent structure in packet traces. Given a packet trace, $\sigma$, we define its trace complexity as the ratio of its entropy over that of a random trace, $U(\sigma)$. Intuitively, a random trace does not compress as well as a more structured trace. At the heart of our methodology lie two concepts: (i) Randomization: we systematically randomize different slices of a traffic trace to profile their contributions to the trace complexity; (ii) Compression: we then measure the complexity of the trace and its randomized variants by compressing the trace files. The size of the compressed trace file is then taken as the measure of trace complexity. Next, we explain these two concepts more formally.

Table 1: Traces used in the paper.

Eliminate Structure by Randomization.

A traffic trace file $\sigma$ consists of an ordered list of entries $\sigma_1, \sigma_2, \ldots, \sigma_t$, where each entry $\sigma_i = (s_i, d_i)$ is a ⟨source, destination⟩ pair arriving at time $i$. In this work, we ignore other fields in a packet header (such as packet size and port number) and focus on the order of entries in the trace file, to capture the temporal complexity, and on the source/destination pairs, to capture the non-temporal complexity of the trace. We find that this information is enough for our methodology to capture the differences in temporal and non-temporal complexities of real traces. For instance, § …

Trace Complexity.

We now define the trace complexity, $\Psi(\sigma)$, as the ratio between the complexity of the original trace, $\sigma$, and the expected complexity of its randomized counterpart, $U(\sigma)$, in which each entry is chosen uniformly at random from the set of IDs in $\sigma$:

$$\Psi(\sigma) = \frac{C(\sigma)}{C(U(\sigma))}. \tag{1}$$

As we will discuss later in this section, $C(\cdot)$ represents the size of the compressed trace file, and the more structure a trace has, the better it can be compressed. Hence, $\Psi(\sigma) \in [0, 1]$ since $C(\sigma) \le C(U(\sigma))$.

Temporal Trace Complexity.

The temporal structure of a trace is a reflection of the burstiness in the traffic pattern. Prior work has measured the degree of burstiness in a trace as a sequence of data packets with inter-arrival time less than or equal to 1 millisecond [21, 22, 5]. Instead, we capture the temporal structure in a trace, $\sigma$, by systematically randomizing the original trace to eliminate all temporal relations in the trace and obtain a new trace file $\Gamma(\sigma)$. Intuitively, $\Gamma(\sigma)$ is a trace where each communication request is chosen at random, independently of the previous ones. Formally, let $\Gamma(\sigma)$ be a temporal transformation: a transformation that performs a uniform random permutation of the rows of $\sigma$, eliminating any time dependency between rows; therefore $C(\sigma) \le C(\Gamma(\sigma))$. To measure how much temporal complexity is contained in $\sigma$, we normalize $C(\sigma)$ by the complexity of its temporal transformation, $\Gamma(\sigma)$. Hence, the normalized temporal trace complexity is defined as

$$T(\sigma) = \frac{C(\sigma)}{C(\Gamma(\sigma))} \in [0, 1].$$

Non-Temporal Trace Complexity.

Note that non-temporal structure is unaffected by the transformation $\Gamma(\sigma)$, because correlations such as request frequencies and source-destination dependencies are conserved, while only the order of the elements is changed so that it is uniformly random. Therefore, all the complexity remaining after the elimination of temporal complexity is non-temporal. This can be formally defined using our methodology by normalizing $\Gamma(\sigma)$ with $U(\sigma)$, which has maximum complexity and no structure:

$$NT(\sigma) = \frac{C(\Gamma(\sigma))}{C(U(\sigma))} \in [0, 1].$$

Theoretical Properties.

It directly follows from our definitions that the measure of trace complexity $\Psi(\sigma)$ defined in Eq. 1 is the product of the temporal and non-temporal complexity ratios. Formally, $\Psi(\sigma) = T(\sigma) \times NT(\sigma)$, since the $C(\Gamma(\sigma))$ terms cancel. An important feature of our methodology is that it enables comparing traces of different sizes and domains. Section 4.2 describes the relationship of our metric to the entropy rate of a trace when $\sigma$ is generated by a stationary stochastic process.

Compression-based Complexity.

Our methodology to measure the empirical entropy of a trace file relies on the principle of data compression. The better we can compress a traffic trace, $\sigma$, the lower must be its entropy rate and hence its complexity. We assume a compression function or algorithm $C$ is applied to $\sigma$, and the complexity of $\sigma$, $C(\sigma)$, is the size of the compressed trace file. In this work, we use the 7zip compressor with Lempel-Ziv-Markov chain compression (LZMA) [23]; other compression techniques such as DEFLATE [24] can also be used.

Figure 2: The complexity map of seven real traces (colored circles) and four reference points placed on the corners of the map. X-axis: temporal complexity; Y-axis: non-temporal complexity. Real traces: HAD, MultiGrid, ML, pFab, WEB, DB, CNS. Reference points: Bursty, Uniform, Skewed & Bursty, Skewed.

In this section, we first propose a graphical representation, called the complexity map, to quantify and compare the temporal and non-temporal complexities of different traces on a 2-dimensional plane (§ …).

Setup and Dataset.

As shown in Table 1, our dataset consists of 17 trace files in five categories: (i) trace files from three Facebook (FB) [19] datacenters: hadoop (HAD), web (WEB), and database (DB), including IP-level and rack-level traces; (ii) MPI traces of three exascale applications in high performance computing (HPC) clusters [20]: CNS, MultiGrid, and NeckBone; (iii) pFabric [14] packet traces that we generated by running the NS2 simulation script obtained from the authors of the paper; (iv) a machine learning (ML) trace we obtained from [15] that measures the communication pattern between four GPUs running VGG19, a popular convolutional neural network training job; and (v) four reference traces that we synthetically generate to represent bursty, skewed, bursty & skewed, and uniform traces. To avoid result distortion due to non-random ID selection in production traces, we uniformly hash all the source/destination IDs to the same length and domain.
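The measurement pipeline described so far (hash the IDs, randomize, compress, take ratios) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' tooling: it uses Python's `lzma` module in place of the 7zip command-line compressor, serializes each entry as a `source,destination` text line, estimates the expected $C(U(\sigma))$ from a single random sample, and draws the source and destination columns of $U(\sigma)$ from the IDs seen in each column. All function names are hypothetical.

```python
import hashlib
import lzma
import random


def hash_id(raw_id, length=8):
    """Map any ID to a fixed-length hex string so all IDs share length and domain."""
    return hashlib.sha256(str(raw_id).encode()).hexdigest()[:length]


def compressed_size(trace):
    """C(sigma): size in bytes of the LZMA-compressed textual trace (assumed format)."""
    data = "\n".join(f"{s},{d}" for s, d in trace).encode()
    return len(lzma.compress(data, preset=9))


def temporal_shuffle(trace, rng):
    """Gamma(sigma): uniform random permutation of the rows of the trace."""
    rows = list(trace)
    rng.shuffle(rows)
    return rows


def uniform_trace(trace, rng):
    """U(sigma): each column drawn uniformly at random from the IDs seen in it."""
    sources = sorted({s for s, _ in trace})
    destinations = sorted({d for _, d in trace})
    return [(rng.choice(sources), rng.choice(destinations)) for _ in trace]


def complexities(trace, seed=0):
    """Return (T, NT, Psi) for a list of (source, destination) pairs."""
    rng = random.Random(seed)
    hashed = [(hash_id(s), hash_id(d)) for s, d in trace]
    c_sigma = compressed_size(hashed)
    c_gamma = compressed_size(temporal_shuffle(hashed, rng))
    # Assumption: one sample of U(sigma) stands in for the expected value.
    c_uniform = compressed_size(uniform_trace(hashed, rng))
    temporal = c_sigma / c_gamma
    non_temporal = c_gamma / c_uniform
    return temporal, non_temporal, temporal * non_temporal
```

On a bursty trace, where each randomly chosen pair is repeated many times, this sketch should report a temporal complexity well below 1, while an i.i.d. uniform trace should stay close to 1 in both dimensions.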

Figure 3: Traffic matrices corresponding to three traces in Fig. 2: (a) CNS application, (b) MultiGrid application, (c) Uniform reference point.

To compare the complexity of traces in our dataset, we place them on a complexity map where the X and Y axes represent the temporal and non-temporal complexity dimensions. Fig. 2 shows the complexity map of seven real traces and four reference points, where each circle indicates a trace and the area of the circle corresponds to the overall complexity of $\sigma$, $\Psi(\sigma)$, which is calculated by multiplying the temporal and non-temporal complexities of $\sigma$, as described in § …. The four reference points are Uniform, Skewed, Bursty, and Skewed & Bursty. The Uniform trace is located at (1, 1), indicating that it has the highest possible complexity and, hence, no structure. This means that the trace is a uniformly chosen random sequence. The Skewed trace is located at (1, 0.4), which indicates that it has high temporal complexity and low non-temporal complexity. This is a result of requests in the sequence being distributed i.i.d. (hence the high temporal complexity) but drawn from a skewed distribution with low entropy (hence the low non-temporal complexity). In contrast, the Bursty trace, located at (0.4, 1), has low temporal complexity and high non-temporal complexity. This is the case when the next source-destination pair is selected uniformly at random, i.e., with high non-temporal complexity, but then repeated for some time (i.e., modeling a burst), creating temporal patterns and lower temporal complexity. Lastly, the Skewed & Bursty trace, located at (0.4, 0.4), has the lowest complexity in the current map and has both temporal and non-temporal structure. Requests are temporally dependent (i.e., with repetitions), and new requests arrive from a skewed distribution. All four traces can be generated using a Markovian model, which we describe in more detail in Section 4.

Fig. 3 shows the traffic matrices of three of the traces in Fig. 2: CNS, MultiGrid, and the Uniform reference point. The observation that MultiGrid has less non-temporal complexity than CNS is captured by the differences in their traffic matrices shown in Fig. 3. In particular, we can observe that Fig. 3(a) has less structure than Fig. 3(b); hence CNS has higher non-temporal complexity than MultiGrid. In contrast, Fig. 3(c) shows no structure and hence has the highest complexity in the complexity map in Fig. 2.
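As a rough illustration of how a traffic matrix summarizes non-temporal structure, the following Python sketch builds the joint source-destination distribution of a trace and computes its Shannon entropy normalized by $2 \log_2 n$, where $n$ is the number of distinct IDs. The normalization mirrors the model quantity $H(M) / (2 \log n)$ that the paper uses in Section 4. The function names are hypothetical, and the paper's own measurements rely on compression rather than on this direct entropy estimate.

```python
import math
from collections import Counter


def traffic_matrix(trace):
    """Joint probability traffic matrix M as {(source, destination): probability}."""
    counts = Counter(trace)
    total = sum(counts.values())
    return {pair: count / total for pair, count in counts.items()}


def joint_entropy(matrix):
    """Shannon entropy H(M) in bits of a joint distribution."""
    return -sum(p * math.log2(p) for p in matrix.values() if p > 0)


def normalized_entropy(trace):
    """H(M) / (2 log2 n), where n is the number of distinct IDs in the trace."""
    ids = {s for s, _ in trace} | {d for _, d in trace}
    n = len(ids)
    if n < 2:
        return 0.0  # a single ID leaves no uncertainty to normalize
    return joint_entropy(traffic_matrix(trace)) / (2 * math.log2(n))
```

A value of 1 corresponds to a trace whose pairs are uniform over all $n \times n$ ID combinations, as in Fig. 3(c); lower values indicate a more skewed matrix.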

In this section, we apply the complexity map to different traces and discuss the main takeaways with respect to their complexities and the differences between them.

Applications have different complexity measures.

The complexity map highlights the different characteristics and structures available in different applications, confirming observations such as those in [19] made on Facebook's datacenters. Recall Fig. 2: pFabric and ML traces feature a higher non-temporal complexity than MultiGrid and all Facebook (DB, HAD, WEB) traces, but pFabric has a lower temporal complexity than all the other traces. Interestingly, Facebook traces have the highest temporal complexity. We suspect this is because of the 30,000-to-1 sampling of Facebook traces, which destroys the temporal structure, resulting in a high temporal complexity. This indicates that different applications may be identified by their specific complexity characteristics and may provide different opportunities for optimization. To obtain a more detailed understanding, let us zoom in on the pFabric and HPC traces. Fig. 4(a) shows that the complexity of pFabric also depends on the load (here, 10%, 50%, and 80% loads are shown): at lower loads, fewer flows are competing, and hence flows mix to a lesser extent, naturally resulting in a lower temporal complexity. While this is expected, it validates our methodology and shows that compression can capture this behavior. The non-temporal complexity of the different HPC traces (CNS, MultiGrid, and NeckBone) depends on the specific application but is generally lower than that of pFabric, validating that compression can capture the non-temporal structure, as expected.

Figure 4: Using different complexity maps to understand various trace structures. (a) pFabric & HPC traces (pFabric at 10%, 50%, and 80% load; CNS; MultiGrid; NeckBone). (b) IP and rack-level aggregations (HAD, WEB, and DB at the IP and rack levels). (c) Src & Dst-level complexities (WEB, HAD, HPC, and pFab, each split into SRC and DST). All panels plot temporal complexity (X-axis) against non-temporal complexity (Y-axis).

Aggregation-level matters.

The Facebook traces provide an opportunity for additional insights, as they allow us to study not only IP-to-IP traces but also rack-to-rack traffic. Fig. 4(b) shows the complexities of three different Facebook clusters: HAD, WEB, and DB. First, we see that some applications (e.g., Hadoop) feature much more structure than others (e.g., DB). However, we also observe the impact of aggregation: at the rack level, we see a higher temporal complexity, i.e., communication becomes more random. Moreover, WEB and DB have a slightly lower non-temporal complexity at the rack level, an indication that this traffic has high structure and that placement in datacenters is subject to optimization. In the case of HAD, the rack-level complexity is higher than the IP-level complexity in both dimensions: as we move higher in the topology, communication becomes more uniform. It is important to note that the Facebook traces are sampled from real traffic, but only from a part of the racks in the datacenter [19]. As a result, these traces are not a perfect representation of the entire network, especially in terms of their source distribution. To get more accurate results, without introducing under-estimation in non-temporal complexity, we modified the uniform transformation $U(\sigma)$ so that it generates the source and destination columns individually, each only from the set of IDs found in its respective column. Also, we note that while sampling influences the measured absolute values, it does not affect the relative conclusions, e.g., regarding the relatively higher complexity observed at the rack level.

The complexity of sources and destinations is unique to each trace.

So far, we have focused on communication pairs; however, our methodology also allows us to investigate the complexities introduced by sources and destinations separately. Indeed, traffic matrices may be asymmetric in that communication sources are more skewed in the traffic trace while communication destinations are uniform, or vice versa. Accordingly, Fig. 4(c) depicts the complexity map for different traces separated by their sources and destinations. In the case of pFabric traces, sources and destinations behave similarly, which is expected, given that they are sampled uniformly at random. In the case of HPC, the non-temporal complexity is high, namely, the work seems to be uniformly divided among all CPUs. See also the marginal source and destination distributions in Fig. 3(a). Interestingly, the sources and destinations individually contain temporal structure (where the source has lower complexity), which may be an indication that operations proceed in rounds, e.g., where a node (CPU) first sends several requests and then receives answers asynchronously. The IP traces for the WEB application reveal that the sources have less temporal complexity than the destinations, which may be explained by the star-like communication patterns of a web server; the lower non-temporal complexity indicates a more skewed popularity distribution of web servers (compared to cache destinations, which are load-balanced). The high non-temporal complexity of Hadoop at the rack level shows a rather equal distribution; however, temporal structure may still be leveraged due to consecutive communications. The low non-temporal complexity of destinations is a result of the outband sampling of the FB trace, where the number of sources is much smaller than the number of destinations.
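The per-column analysis can be sketched by projecting a trace onto its source or destination column and comparing the compressed size of the projected sequence with that of a shuffled copy. This is a hypothetical sketch: it uses Python's `lzma` module rather than 7zip, covers only the temporal dimension, assumes IDs have already been hashed to a common format, and `column_temporal_complexity` is an illustrative name.

```python
import lzma
import random


def compressed_size(symbols):
    """Size in bytes of the LZMA-compressed sequence, one symbol per line."""
    data = "\n".join(str(symbol) for symbol in symbols).encode()
    return len(lzma.compress(data, preset=9))


def column_temporal_complexity(trace, column, seed=0):
    """Temporal complexity of one column: C(column) / C(shuffled column).

    column = 0 selects the sources, column = 1 the destinations.
    """
    sequence = [entry[column] for entry in trace]
    shuffled = list(sequence)
    random.Random(seed).shuffle(shuffled)  # removes temporal order only
    return compressed_size(sequence) / compressed_size(shuffled)
```

For a trace in which each source sends a long run of requests to randomly chosen destinations, the source column should show a much lower temporal complexity than the destination column.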

One interesting implication of our methodology is that it naturally lends itself as the basis for synthetic traffic generators. Furthermore, it allows us to provide formal guarantees on the complexity of stochastic processes. In the following, we discuss these two aspects.

We next tackle the question of how to synthesize traffic workloads of a particular temporal and non-temporal complexity. Given the limited amount of publicly available communication traces, such a model can be particularly useful to generate synthetic benchmarks, allowing researchers to compare their algorithms in different settings (e.g., for longer communication traces). Hence, in the following, we propose an approach that efficiently generates traces with formal guarantees on their expected complexity for any specific point on the complexity map. It is important to note that for every point on the map, there could be many traces (and models) whose complexity maps to this point. Our model provides one such solution.

To derive formal guarantees, we propose a simple Markovian model which is a stationary random process with a well-defined entropy rate [25]. The model has two components: temporal and non-temporal. The non-temporal component is a joint probability traffic matrix, $M$, which can be computed from a given trace $\sigma$, similar to Fig. 1(c), or can represent a known distribution (e.g., Zipf) where the entropy depends on the distribution's parameters. The temporal component is a repeating probability $p$. To generate a trace, we start by sampling the first pair from $M$, and then at each step we add a pair to the trace: with probability $p$ we repeat the last pair, and with probability $1 - p$ we sample a new pair from $M$. More formally, to emulate a point $(x, y)$ on the complexity map we set $H(M) = y \cdot 2 \log n$, where $H(M)$ is the joint entropy of $M$. It can be shown that $M$ is the stationary distribution of the chain and the non-temporal complexity of the model is

$$y = \frac{H(M)}{2 \log n}.$$

The temporal complexity of the model is

$$x = \frac{H(p, 1 - p) + (1 - p) H(M)}{H(M)},$$

where $p$ can be computed analytically given $x$.
Therefore, we can produce traces with similar complexities on the complexity map.

Figure 5 presents the quality of the above traffic generation model in producing synthetic traces for seven example points in the complexity map. First, we used the model to reproduce the four hypothetical points from Fig. 2: Uniform, Skewed, Bursty, and Skewed & Bursty. As a skewed distribution we used a Zipf distribution with an exponent parameter of … which leads to a normalized entropy of 0.…

Figure 5: Our traffic generation model is able to produce new trace files with complexities similar to those of the original traces. The solid circles indicate the original trace's complexity and dashed circles represent the complexity of the trace produced by our traffic generator. X-axis: temporal complexity; Y-axis: non-temporal complexity. Points shown: MultiGrid, ML, WEB, Bursty, Uniform, Skewed, and Skewed & Bursty.

Moreover, currently we do not reproduce exact packet arrival times; rather, we take the order of packets as a proxy for temporal structure. In future work, we plan to add packet arrival times and flow-level information to the complexity analysis and our traffic generation model.
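The Markovian generator described in this section can be sketched as follows. The sketch assumes the traffic matrix is given as a Python dictionary from ⟨source, destination⟩ pairs to probabilities, uses base-2 logarithms, and finds $p$ by numerical bisection rather than in closed form; the function names are hypothetical. The bisection searches to the right of $p^* = 1 / (1 + 2^{H(M)})$, where the model's expression for $x$ peaks before decreasing monotonically to 0 at $p = 1$.

```python
import math
import random


def binary_entropy(p):
    """H(p, 1 - p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def model_temporal_complexity(p, h_m):
    """x = (H(p, 1 - p) + (1 - p) H(M)) / H(M), the model's temporal complexity."""
    return (binary_entropy(p) + (1 - p) * h_m) / h_m


def solve_repeat_probability(x, h_m, tol=1e-9):
    """Find the repeat probability p that yields temporal complexity x (0 < x <= 1)."""
    if not 0.0 < x <= 1.0 or h_m <= 0.0:
        raise ValueError("need 0 < x <= 1 and H(M) > 0")
    # x(p) is concave with its maximum at p*, so it is decreasing on [p*, 1].
    lo, hi = 1.0 / (1.0 + 2.0 ** h_m), 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if model_temporal_complexity(mid, h_m) > x:
            lo = mid  # complexity still too high: repeat more often
        else:
            hi = mid
    return (lo + hi) / 2


def generate_trace(matrix, p, length, seed=0):
    """Repeat the last pair with probability p, else sample a new pair from M."""
    rng = random.Random(seed)
    pairs = list(matrix)
    weights = [matrix[pair] for pair in pairs]
    trace = [rng.choices(pairs, weights)[0]]
    while len(trace) < length:
        if rng.random() < p:
            trace.append(trace[-1])
        else:
            trace.append(rng.choices(pairs, weights)[0])
    return trace
```

For example, `solve_repeat_probability(0.4, h_m)` returns the repeat probability for a target temporal complexity of 0.4, and `generate_trace(matrix, p, length)` then emits a trace of the requested length.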

Our methodology can also provide a framework for formal analysis. In the following, assume that $\sigma$ is a trace generated by a stationary stochastic process [25]. Then, for long sequences and using an optimal compression algorithm (such as Lempel-Ziv [26]), we will achieve the compression limit defined by Shannon [25, 27, 28]: the entropy rate of the process. From this, the normalized complexities of $\sigma$ can be proved analytically.

Let $Z = \{Z_i\}$ be a stationary stochastic process that generates $\sigma$, where $Z_i = \{S_i, D_i\}$ are time-indexed random variables, and $S_i \in S$ and $D_i \in D$ are a random source and a random destination at time $i$, respectively. Since $Z$ is stationary, let $\pi$ denote the stationary distribution and note that $\pi$ is a joint distribution over $S$ and $D$. With a slight abuse of notation, let $S$ and $D$ also denote the random variables of $\pi$. Note that $S$ and $D$ may be defined over different domains (IDs) and may be dependent. Also, $Z_i$ may be dependent on a past element in $Z$.

A basic measure for the complexity of $Z$ is based on Shannon entropy and is known as the entropy rate [25], $H(Z)$. The entropy rate of a stochastic process captures the expected number of bits per symbol (i.e., source or destination IDs) that are both necessary and sufficient to describe $\sigma$; or, alternatively, the expected amount of uncertainty in the next symbol given past symbols in the sequence. The smaller the entropy rate, the less complex the sequence is (it requires fewer bits to describe or compress it).

We can use the previous definitions of normalized trace complexity of $\sigma$ and formally relate them to entropy rates when $\sigma$ is generated by a stationary stochastic process.

Theorem 1 (Trace Complexity Ratios of a Stationary Process). Consider an indexed stationary stochastic trace process $Z = \{Z_t\}$ that generates $\sigma$, where $Z_t = \{S_t, D_t\}$ and $n = |S \cup D|$. If an optimal compression algorithm $C$ is used, then:

1. The trace complexity ratio is:

$$\lim_{t \to \infty} \Psi(\sigma) = \frac{C(\sigma)}{C(U(\sigma))} = \frac{H(Z)}{2 \log n} \tag{2}$$

2. The temporal complexity ratio is:

$$\lim_{t \to \infty} T(\sigma) = \frac{C(\sigma)}{C(\Gamma(\sigma))} = \frac{H(Z)}{H(S, D)} \tag{3}$$

3. The non-temporal complexity ratio is:

$$\lim_{t \to \infty} NT(\sigma) = \frac{C(\Gamma(\sigma))}{C(U(\sigma))} = \frac{H(S, D)}{2 \log n} \tag{4}$$

Information-theoretic approaches and compression methodologies have already proven successful in capturing entropy in other domains such as email [29] or comment [30] spam filtering, or estimating neural discharges [31]. The study of traffic patterns and the design of models is an evergreen topic of high relevance in the networking literature, and examples where measurement studies spurred much research into traffic modeling date back to the 1990s [32, 33, 34]. Since then, a large number of methodologies have been developed [35, 36, 37], based on temporal statistics [38, 39, 40], spatial statistics [41, 42], and physical [43] and information-theoretic [44, 45] models. In contrast to prior work, we are primarily interested in the communication pattern itself, rather than in the volume or headers of the exchanged data. While our work builds upon many significant results developed over the last decades [17], we are not aware of any work that systematically differentiates between temporal and non-temporal components of traffic traces.

The specific characteristics of traffic workloads have important implications for emerging network fabrics and algorithms [19]. This paper takes first steps toward understanding the complexity of traffic traces. In addition to temporal and non-temporal complexity measures, our entropy-based approach can be used to investigate other dimensions of the trace structure, e.g., regarding source-destination dependencies, or to explore structure in transmission times and packet headers. More generally, while trace structure indicates potential for optimizations, it remains to develop algorithms which exploit this structure to improve network performance and/or utilization.

References

[1] T. Benson, A. Akella, and D. A. Maltz, “Network traffic characteristics of data centers in the wild,” in Proc. ACM SIGCOMM Conference on Internet Measurement (IMC), pp. 267–280, ACM, 2010.
[2] M. Ghobadi et al., “Projector: Agile reconfigurable data center interconnect,” in Proc. ACM SIGCOMM, pp. 216–229, 2016.
[3] K. Chen, A. Singla, A. Singh, K. Ramachandran, L. Xu, Y. Zhang, X. Wen, and Y. Chen, “Osa: An optical switching architecture for data center networks with unprecedented flexibility,” IEEE/ACM Transactions on Networking (TON), vol. 22, no. 2, pp. 498–511, 2014.
[4] S. Zou, X. Wen, K. Chen, S. Huang, Y. Chen, Y. Liu, Y. Xia, and C. Hu, “Virtualknotter: Online virtual machine shuffling for congestion resolving in virtualized datacenter,” Computer Networks, vol. 67, pp. 141–153, 2014.
[5] Q. Zhang, V. Liu, H. Zeng, and A. Krishnamurthy, “High-resolution measurement of data center microbursts,” in Proceedings of the 2017 Internet Measurement Conference, IMC ’17, (New York, NY, USA), pp. 78–85, ACM, 2017.
[6] C. Avin and S. Schmid, “Toward demand-aware networking: A theory for self-adjusting networks,” in ACM SIGCOMM Computer Communication Review (CCR), 2018.
[7] S. Xiao, D. He, and Z. Gong, “Deep-q: Traffic-driven qos inference using deep generative network,” in Proceedings of the 2018 Workshop on Network Meets AI & ML, NetAI’18, (New York, NY, USA), pp. 67–73, ACM, 2018.
[8] K. Tu, B. Ribeiro, A. Swami, and D. Towsley, “Tracking groups in mobile network traces,” in Proceedings of the 2018 Workshop on Network Meets AI & ML, NetAI’18, (New York, NY, USA), pp. 35–40, ACM, 2018.
[9] A. Valadarsky, M. Schapira, D. Shahaf, and A. Tamar, “Learning to route,” in Proceedings of the 16th ACM Workshop on Hot Topics in Networks, HotNets-XVI, (New York, NY, USA), pp. 185–191, ACM, 2017.
[10] H. Mao, R. Netravali, and M. Alizadeh, “Neural adaptive video streaming with pensieve,” in Proceedings of the Conference of the ACM Special Interest Group on Data Communication, SIGCOMM ’17, (New York, NY, USA), pp. 197–210, ACM, 2017.
[11] W. M. Mellette, R. McGuinness, A. Roy, A. Forencich, G. Papen, A. C. Snoeren, and G. Porter, “Rotornet: A scalable, low-complexity, optical datacenter network,” in Proc. ACM SIGCOMM, pp. 267–280, 2017.
[12] N. Hamedazimi, Z. Qazi, H. Gupta, V. Sekar, S. R. Das, J. P. Longtin, H. Shah, and A. Tanwer, “Firefly: A reconfigurable wireless data center fabric using free-space optics,” in Proc. ACM SIGCOMM Computer Communication Review (CCR), vol. 44, pp. 319–330, 2014.
[13] A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta, “Vl2: a scalable and flexible data center network,” in Proc. ACM SIGCOMM Computer Communication Review (CCR), vol. 39, pp. 51–62, 2009.
[14] M. Alizadeh, S. Yang, M. Sharif, S. Katti, N. McKeown, B. Prabhakar, and S. Shenker, “pFabric: Minimal near-optimal datacenter transport,” in ACM SIGCOMM Computer Communication Review, vol. 43, pp. 435–446, ACM, 2013.
[15] M. Khani, M. Ghobadi, M. Alizadeh, Z. Zhu, M. Glick, K. Bergman, and A. Vahdat, “Scaling Distributed Machine Learning with Silicon Photonics.”
[16] C. E. Shannon, “A mathematical theory of communication,” Bell system technical journal, vol. 27, no. 3, pp. 379–423, 1948.
[17] J. Ziv and A. Lempel, “Compression of individual sequences via variable-rate coding,” IEEE transactions on Information Theory, vol. 24, no. 5, pp. 530–536, 1978.
[18] M. Feder, N. Merhav, and M. Gutman, “Universal prediction of individual sequences,” IEEE transactions on Information Theory, vol. 38, no. 4, pp. 1258–1270, 1992.
[19] A. Roy, H. Zeng, J. Bagga, G. Porter, and A. C. Snoeren, “Inside the social network’s (datacenter) network,” in Proc. ACM SIGCOMM Computer Communication Review (CCR), vol. 45, pp. 123–137, ACM, 2015.
[20] U. DOE, “Characterization of the DOE mini-apps.” https://portal.nersc.gov/project/CAL/doe-miniapps.htm, 2016.
[21] H. Jiang and C. Dovrolis, “Source-level ip packet bursts: Causes and effects,” in Proceedings of the 3rd ACM SIGCOMM Conference on Internet Measurement, IMC ’03, (New York, NY, USA), pp. 301–306, ACM, 2003.
[22] M. Ghobadi, Y. Cheng, A. Jain, and M. Mathis, “Trickle: Rate limiting youtube video streaming,” in Presented as part of the 2012 USENIX Annual Technical Conference (USENIX ATC 12), (Boston, MA), pp. 191–196, USENIX, 2012.
[23] 7-zip.
[24] P. Deutsch, “Deflate compressed data format specification version 1.3,” tech. rep., 1996.
[25] T. M. Cover and J. A. Thomas, Elements of information theory. John Wiley & Sons, 2012.
[26] J. Ziv and A. Lempel, “A universal algorithm for sequential data compression,” IEEE Transactions on information theory, vol. 23, no. 3, pp. 337–343, 1977.
[27] A. D. Wyner and J. Ziv, “The sliding-window lempel-ziv algorithm is asymptotically optimal,” Proceedings of the IEEE, vol. 82, no. 6, pp. 872–877, 1994.
[28] B. Vegetabile, J. Molet, T. Z. Baram, and H. Stern, “Estimating the entropy rate of finite markov chains with application to behavior studies,” arXiv preprint arXiv:1711.03962, 2017.
[29] A. Bratko, G. V. Cormack, B. Filipič, T. R. Lynam, and B. Zupan, “Spam filtering using statistical data compression models,” Journal of machine learning research, vol. 7, no. Dec, pp. 2673–2698, 2006.
[30] A. Kantchelian, J. Ma, L. Huang, S. Afroz, A. Joseph, and J. Tygar, “Robust detection of comment spam using entropy rate,” in

Proc. 5th ACM Workshop on Security and Artiﬁcial Intelligence , pp. 59–70,ACM, 2012.[31] J. M. Amig´o, J. Szczepa´nski, E. Wajnryb, and M. V. Sanchez-Vives, “Estimating the entropy rate ofspike trains via lempel-ziv complexity,”

Neural Computation , vol. 16, no. 4, pp. 717–736, 2004.[32] N. Likhanov, B. Tsybakov, and N. D. Georganas, “Analysis of an atm buﬀer with self-similar (” fractal”)input traﬃc,” in

Proc. IEEE INFOCOM , vol. 3, pp. 985–992, IEEE, 1995.[33] M. E. Crovella and A. Bestavros, “Self-similarity in world wide web traﬃc: evidence and possiblecauses,”

IEEE/ACM Transactions on networking , vol. 5, no. 6, pp. 835–846, 1997.[34] W. E. Leland, M. S. Taqqu, W. Willinger, and D. V. Wilson, “On the self-similar nature of ethernettraﬃc (extended version),”

IEEE/ACM Transactions on Networking (ToN) , vol. 2, no. 1, pp. 1–15,1994.[35] K. Park, G. Kim, and M. Crovella, “On the relationship between ﬁle sizes, transport protocols, andself-similar network traﬃc,” in

Network Protocols, 1996. Proceedings., 1996 International Conferenceon , pp. 171–180, IEEE, 1996.[36] M. S. Taqqu, V. Teverovsky, and W. Willinger, “Estimators for long-range dependence: an empiricalstudy,”

Fractals , vol. 3, no. 04, pp. 785–798, 1995.[37] E. Shriver, A. Merchant, and J. Wilkes, “An analytic behavior model for disk drives with reada-head caches and request reordering,” in

ACM SIGMETRICS Performance Evaluation Review , vol. 26,pp. 182–191, ACM, 1998. 1138] M. W. Garrett and W. Willinger, “Analysis, modeling and generation of self-similar vbr video traﬃc,”in

ACM SIGCOMM computer communication review , vol. 24, pp. 269–280, ACM, 1994.[39] R. H. Riedi, M. S. Crouse, V. J. Ribeiro, and R. G. Baraniuk, “A multifractal wavelet model withapplication to network traﬃc,”

IEEE transactions on Information Theory , vol. 45, no. 3, pp. 992–1018,1999.[40] M. Wang, T. Madhyastha, N. H. Chan, S. Papadimitriou, and C. Faloutsos, “Data mining meetsperformance evaluation: Fast algorithms for modeling bursty traﬃc,” in

Proceedings 18th InternationalConference on Data Engineering , pp. 507–516, IEEE, 2002.[41] N. Cressie, “Statistics for spatial data,”

Terra Nova , vol. 4, no. 5, pp. 613–617, 1992.[42] D. R. Cox and V. Isham,

Point processes , vol. 12. CRC Press, 1980.[43] P. Barford and M. Crovella, “Generating representative web workloads for network and server per-formance evaluation,” in

ACM SIGMETRICS Performance Evaluation Review , vol. 26, pp. 151–160,ACM, 1998.[44] Y. Liu, D. Towsley, T. Ye, and J. C. Bolot, “An information-theoretic approach to network monitoringand measurement,” in

Proc. 5th ACM SIGCOMM Conference on Internet Measurement , pp. 14–14.[45] Y. Liu, D. Towsley, J. Weng, and D. Goeckel, “An information theoretic approach to network tracecompression,”