Measuring the Complexity of Packet Traces
MMeasuring the Complexity of Packet Traces ∗ Chen Avin Manya Ghobadi Chen Griner Stefan Schmid School of Electrical and Computer Engineering, Ben Gurion University of the Negev, Israel Computer Science and Artificial Intelligence Laboratory, MIT, USA Faculty of Computer Science, University of Vienna, Austria
Abstract
This paper studies the structure of several real-world traces (including Facebook, High PerformanceComputing, Machine Learning, and simulation generated traces) and presents a systematic approach toquantify and compare the structure of packet traces based on the entropy contained in the trace file.Insights into the structure of packet traces can lead to improved network algorithms that are optimized to-ward specific traffic patterns. We then present a methodology to quantify the temporal and non-temporalcomponents of entropy contained in a packet trace, called the trace complexity , using randomization andcompression. We show that trace complexity provides unique insights into the characteristics of variousapplications and argue that there is a need for traffic generation models that preserve the intrinsic struc-ture of empirically measured application traces. We then propose a traffic generator model that is ableto produce a synthetic trace that matches the complexity level of its corresponding real-world trace.
Packet traces collected from networking applications, such as data center traffic, tend not to be completelyrandom and have been observed to feature structure : data center traffic matrices are sparse and skewed [1, 2],exhibit locality [3], and are bursty [4, 5]. Motivated by the existence of such structure, the networking com-munity is currently putting much effort into designing algorithms to optimize different network layers towardsuch structure towards self-driving and demand-aware networks [6, 7, 8], learning-based traffic engineering [9]and video streaming [10], as well as reconfigurable optical networks [11, 2, 12]. For instance, many networkoptimizations exploit the presence of elephant flows [13, 14].However, the structure available in different applications can differ significantly, and a unified approachto measure the structure in traffic traces is missing. Better quantification of a trace structure will lead tobetter network optimization and to the understanding of the available improvement, if possible, in currentsolutions. Moreover, one of the critical factors in evaluating new proposals is their traffic workload. Ideally,the traffic workload should contain the same structure as real-world traces but often is overlooked due to thelack of a traffic generation model that can replicate real-world traces while preserving their temporal andnon-temporal structure.For instance, consider a trace file including the communication pattern of a Machine Learning (ML)application executing a popular convolutional neural network training job on four GPUs. This workload wasobtained from authors of [15]. Figure 1(a) is a visualization of the trace file where each entry in the trace isrepresented by a unique color corresponding to its ¡source, destination¿ GPU pair. This visualization showsthe temporal structure in the trace file, as colors appear consecutive and follow some pattern. In contrast,Fig. 1(b) shows the same trace file but the entries in the file are shuffled to remove the temporal structurein Fig. 1(a). The traffic matrix (TM) in Fig. 1(c) shows a skewed heatmap indicating that some GPU pairscommunicate more frequently than others. Note that even thought the temporal structures in (a) and (b)are different, they both have the same TM shown in (c). In other words, the TM is able to capture the ∗ Authors appear in alphabetical order. Research conducted as part of Chen Griner’s thesis. a r X i v : . [ c s . N I] M a y a) Temporal structure in the original trace S o u r c e G P U D e s t i n a t i o n G P U (c) TM of (a)&(b) Time
Entries in the trace are shown in the order of appearance.(b) Shuffling the entries in (a) removes the temporal structure(d) A trace with lowest temporal structure corresponding to (a)(e) A trace with highest temporal structure corresponding to (a) (f) TM of (d)&(e) S o u r c e G P U D e s t i n a t i o n G P U Figure 1: Visualization of temporal and non-temporal structure in machine learning workload.non-temporal structure in the trace files but not the temporal one. For comparison, let us now consider twosynthetic traces shown in Fig. 1(d) and (e). Trace (d) is generated uniformly and random and has the lowesttemporal structure compared to (a), while trace (e) is sorted based on ¡source, destination¿ key and hencehas the highest temporal structure. Similarly, Fig. 1(f) captures the non-temporal structure in (d) and (e)but not the temporal one.But how can we measure the structure of a trace file? And how should we generate synthetic traces whilepreserving the structure of empirical traces? This paper aims at providing initial steps to these questions.In particular, we quantify the amount of temporal and non-temporal structure in traffic traces using theinformation theoretic measure of entropy [16] in the trace. Since the term entropy is define for randomvariables, as opposed to a sequence of individual communication requests in a packet trace, we use a moregeneral term called complexity [17] to quantify the structure in a packet trace and call it trace complexity .Moreover, we provide a traffic generation model to generate synthetic traces that match the structure of agiven trace. Intuitively, a packet trace with high structure has low entropy and low complexity : it containslittle information, and the sequence behavior is more predictable [18]. Our approach allows us to chart, whatwe call, a complexity map of individual traffic traces: to map each traffic trace to a two-dimensional graphindicating the amount of temporal and non-temporal information that is present in a trace.The main contributions of this paper are as follows. First, we present an information theoretic perspec-tive to systematically separate the temporal and non-temporal structures available in a traffic trace ( § § § This section describes our methodology to quantify the inherent structure in packet traces. Given a packettrace, σ , we define its trace complexity as the ratio of its entropy over that of a random trace, U ( σ ). Intuitively,a random trace does not compress as well as a more structured trace. At the heart of our methodology lie2able 1: Traces used in the paper.Type i ) Randomization: we systematically randomize different slices of a traffic trace toprofile their contributions to the trace complexity; ( ii ) Compression: we then measure the complexity ofthe trace and its randomized variants by compressing the trace files. The size of the compressed trace file isthen taken as the measure of trace complexity. Next, we explain these two concepts more formally.
Eliminate Structure by Randomization.
A traffic trace file σ consists of an ordered list of entries σ , σ , . . . , σ t , where each entry σ i = ( s i , d i ) is a ¡source, destination¿ pair arriving at time i . In this work,we ignore other fields in a packet header (such as packet size and port number) and focus on the orderof entries in the trace file, to capture temporal complexity, and source/destination pairs, to capture non-temporal complexity, of the trace. We find that there is enough information in our methodology to capturethe differences in temporal and non-temporal complexities of real traces. For instance, § Trace Complexity.
We now define the trace complexity, Ψ( σ ), as the ratio between the complexity of theoriginal trace, σ , to the expected complexity of its randomized counterpart, U ( σ ), where each of its entriesare chosen uniformly at random from the set of IDs in σ :Ψ( σ ) = C ( σ ) C ( U ( σ )) . (1)As we will discuss later in this section, C ( · ) represents the size of the compressed trace file and the morestructure a trace has, the better it can be compressed. Hence, Ψ( σ ) ∈ [0 ,
1] since C ( σ ) ≤ C ( U ( σ )). Temporal Trace Complexity.
The temporal structure of a trace is a reflection of the burstiness in thetraffic pattern. Prior work has measured the degree of burstiness in a trace as a sequence data packets withinter-arrival time less or equal to 1 millisecond [21, 22, 5]. Instead, we capture the temporal structure ina trace, σ , by systematically randomizing the original trace to eliminate all temporal relations in the traceand obtain a new trace file Γ( σ ). Intuitively, Γ( σ ) is a trace where each communication request is chosenindependently at random from the previous ones. Formally, let Γ( σ ) be a temporal transformation: a trans-formation that performs a uniform random permutation of the rows of σ , eliminating any time dependencybetween rows, therefore C ( σ ) ≤ C (Γ( σ )). To measure how much temporal complexity is contained in σ ,we therefore normalized it by the complexity of its temporal transformation, Γ( σ ). Hence, the normalizedtemporal trace complexity is defined as T ( σ ) = C ( σ ) C (Γ( σ )) ∈ [0 , Non-Temporal Trace Complexity.
Note that, non-temporal structure is unaffected by the transformationΓ( σ ) because correlations such as requests frequency, source destination dependency are conserved, whileonly the order of the elements is changed such that it is uniformly random. Therefore, all the remainingcomplexity after the elimination of temporal complexity is non-temporal. This can be formally definedusing our methodology by normalizing Γ( σ ) with U ( σ ) which has maximum complexity and no structure: N T ( σ ) = C (Γ( σ ))) C ( U ( σ )) ∈ [0 , Theoretical Properties.
It directly follows from our definition that the measure of trace complexityΨ( σ ) defined in Eq. 1 is the multiplication of the temporal and non-temporal complexity ratios. Formally,Ψ( σ ) = T ( σ ) × N T ( σ ). An important feature of our methodology is that is enables comparing traces ofdifferent sizes and domains. Section 4.2 describes the relationship of our metric to the entropy rate of a tracewhen σ is generated by a stationary stochastic process. Compression-based Complexity.
Our methodology to measure the empirical entropy of a trace file relies3 .30.40.50.60.70.80.91.0 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
HADMulti GridMLpFabBursty UniformSkewed &Bursty SkewedWEBDBCNS N o n - t e m p o r a l c o m p l e x i t y Temporal complexity
Figure 2: The complexity map of seven real traces (colored circles) and four reference points placed on thecorners of the map.on the principle of data compression . The better we can compress a traffic trace, σ , the lower must be itsentropy rate and hence its complexity. We assume a compression function or algorithm C is applied to σ and the complexity of σ , C ( σ ), is the size of the compressed trace file. In this work, we use the 7zipcompressor with Lempel-Ziv-Markov chain compression (LZMA) [23]; other compression techniques such asDEFLATE [24] can also be used. In this section, we first propose a graphical representation, called the complexity map , to quantify andcompare the temporal and non-temporal complexities of different traces on a 2-dimensional plane ( § § Setup and Dataset.
As shown in Table 1, our dataset consists of 17 trace files in five categories: ( i ) tracefiles from three Facebook (FB) [19] datacenters: hadoop (HAD), web (WEB), and database (DB) includingIP and rack-level traces; ( ii ) MPI traces of three exascale applications in high performance computing (HPC)clusters [20]: CNS, MultiGrid, and NeckBone; ( iii ) pFabric [14] packet traces that we generated by runningthe NS2 simulation script obtained from the authors of the paper; ( iv ) a machine learning (ML) trace weobtained from [15] that measures the communication pattern between four GPUs running VGG19, a popularconvolutional neural network training job; and ( v ) four reference traces that we synthetically generate torepresent bursty, skewed, busty & skewed, and uniform traces. To avoid result distortion due to non-randomID selection in production traces, we uniformly hash all the source/destination IDs to the same length anddomain. 4
28 256 384 512 640 768 896 10241024896768640512384256128 128 256 384 512 640 768 896 10241024896768640512384256128 (a) (b) (c)Figure 3: Traffic matrices corresponding to three traces in Fig. 2: (a) CNS application, (b) MultiGridapplication, (c) Uniform reference point.
To compare the complexity of traces in our dataset, we place them on a complexity map where the Xand Y axes represent the temporal and non-temporal complexity dimensions. Fig. 2 shows the complexitymap of seven real traces and four reference points where each circle indicates a trace and the area of thecircles corresponds to the overall complexity of σ , Ψ( σ ), which is calculated by multiplying the temporal andnon-temporal complexities of σ , as described in § Uniform , Skewed , Bursty , and
Skewed & Bursty . The
Uniform trace is located at (1,1), indicating that it has thehighest possible complexity and, hence, no structure. This means that the trace is a uniformly chosen randomsequence. The
Skewed trace is located at (1,0.4), which indicates that it has high temporal complexity andlow non-temporal complexity. This is a result of requests in the sequence that are distributed iid (and hencewith high temporal complexity), but which arrive from a skewed distribution with low entropy (and hencehave low non-temporal complexity). In contrast, the
Bursty trace, located at (0.4,1), has low temporalcomplexity and high non-temporal complexity. This is the case when the next source-destination pair isselected uniformly at random, i.e., with high non-temporal complexity, but then repeated for some time(i.e., modeling a burst), creating temporal patterns and lower temporal complexity. Lastly, the
Skewed &Bursty trace, located at (0.4, 0.4), has the lowest complexity in the current map and has both temporal andnon-temporal structure. Requests are both temporally dependent (i.e., with repetitions) and new requestsarrive from a skewed distribution. All four traces can be generated using a Markovian model which wedescribe in more details later in Section 4. Fig. 3 shows the traffic matrix of three of the traces in Fig. 2:CNS, MultiGrid, and the uniform reference point. The observation that MultiGrid has less non-temporalcomplexity than CNS is captured by the differences in their traffic matrices shown in Fig. 3. In particular,we can observe that Fig. 3(a) has less structure than 3(b), hence CNS has higher non-temporal complexitythan MultiGrid. In contrast, Fig. 3(c) shows no structure and hence it has the highest complexity in thecomplexity map in Fig 2.
In this section, we apply the complexity map to different traces and discuss the main takeaways with respectto their complexities and the differences between them.
Applications have different complexity measures.
The complexity map highlights the different char-acteristics and structures available in different applications, confirming observations such as [19] conductedon Facebook’s datacenters. Recall Fig. 2: pFabric and ML traces feature a higher non-temporal complexitythan MultiGrid and all Facebook (DB, HAD, WEB) traces, but pFabric has a lower temporal complexitythan all the other traces. Interestingly, Facebook traces have the highest temporal complexity. We suspectthis is because of the 30,000 to 1 sampling of Facebook traces that destroys the temporal structure resulting5 .50.60.70.80.91.0 0.0 0.2 0.4 0.6 0.8 1.0 pFabric 10% load pFabric 80% load CNS Multi Grid
Temporal complexity N o n - t e m p o r a l c o m p l e x i t y pFabric 50% load NeckBone HAD Rack DB Rack DB IP WEB IP WEB Rack HAD IP T e m p o r a l & n o n - t e m p o r a l j u m p Temporal complexity N o n - t e m p o r a l c o m p l e x i t y WEB SRC WEB DST HAD DST HAD SRC HPC SRC HPC DST pFab SRC pFab DST N o n - t e m p o r a l j u m p T e m p o r a l j u m p Temporal complexity N o n - t e m p o r a l c o m p l e x i t y (a) pFabric & HPC traces (b) IP and Rack-level aggregations (c) Src&Dst-level complexitiesFigure 4: Using different complexity maps to understand various trace structures.in a high temporal complexity. This indicates that different applications may be identified by their specificcomplexity characteristic, and may provide different opportunities for optimization. To obtain a more de-tailed understanding, let us zoom in to the pFabric and HPC traces. Fig. 4(a) shows that the complexity ofpFabric also depends on the load (here, 10%, 50%, and 80% loads are shown): at lower loads, fewer flowsare competing, and hence flows mix to a lesser extent, naturally resulting in a lower temporal complexity.While this is expected, it validates our methodology and shows that compression can capture this behavior.The non-temporal complexity of different HPC traces (CNS, MultiGrid, and NeckBone) depends on thespecific application, but is generally lower than that of pFabric, validating that compression can capture thenon-temporal structure, as expected. Aggregation-level matters.
The Facebook traces provide an opportunity for additional insights, as theyallow us to study not only IP-to-IP traces but also rack-to-rack traffic. Fig. 4 (b) shows the complexities ofthree different Facebook clusters, HAD, WEB, and DB. First, we see that some applications (e.g., Hadoop)feature much more structure than others (e.g., DB). However, we also observe the impact of aggregation :at the rack level, we see a higher temporal complexity: communication becomes more random. Moreover,WEB and DB have a slightly lower non-temporal complexity on the rack level, an indication that thistraffic has a high structure and placement in datacenters is subject to optimization. In case of HAD, therack-level complexity is higher than the IP-level complexity, in both dimensions: as we move higher tothe topology, communication becomes more uniform. It is important to note that the Facebook traces aresampled from real traffic, but only from a part of the racks in the datacenter [19]. As a result, this trace isnot a perfect representation of the entire network, especially in terms of its source distribution. To get moreaccurate results, without introducing under-estimation in non-temporal complexity, we modified the uniformtransformation U ( σ ) such that it will generate the source and destination columns individually, only fromthe set of IDs found in the its respective columns. Also, we note that while sampling influences the measuredabsolute values, it does not affect the relative conclusions, e.g., regarding the relatively higher complexityobserved on the rack-level. The complexity of sources and destinations is unique to each trace.
So far, we have focused oncommunication pairs , however, our methodology also allows us to investigate the complexities introduced bysources and destinations separately. Indeed, traffic matrices may be asymmetric in that while communica-tion sources are more skewed in the traffic trace, communication destinations are uniform, and vice versa.Accordingly, Fig. 4 (c) depicts the complexity map for different traces separated by their source and desti-nations. In case of pFabric traces, sources and destinations behave similarly, which is expected, given thatthey are sampled uniformly at random. In the case of HPC, the non-temporal complexity is high, namely,the work seems to be uniformly divided among all CPUs. See also the marginal source and destinationdistributions in Fig. 3(a). Interestingly, the sources and destinations individually contain temporal structure(where the source has lower complexity), which may be an indication that operations proceed in rounds,6.g., where a node (CPU) first sends several requests and then receives answers asynchronously. The IPtraces for WEB application reveal that the sources have less temporal complexity than the destinations,which may be explained by the star-like communication patterns of a web server; the lower non-temporalcomplexity indicates a more skewed popularity distribution of web servers (compared to cache destinationswhich are load-balanced). The high non-temporal complexity of Hadoop on the rack-level shows the ratherequal distribution, however, temporal structure may be leveraged due to consecutive communications. Thelow non-temporal complexity of destinations is a result of the outband sampling of the FB trace where thenumber of sources is much smaller than the number of destinations.
One interesting implication of our methodology is that it naturally lends itself as the basis for synthetictraffic generators. Furthermore, it allows us to provide formal guarantees on the complexity of stochasticprocesses. In the following, we discuss these two aspects.
We next tackle the question of how to synthesize traffic workloads of a particular temporal and non-temporalcomplexity. Given the limited amount of publicly available communication traces, such a model can beparticularly useful to generate synthetic benchmarks, allowing researchers to compare their algorithms indifferent settings (e.g., for longer communication traces). Hence, in the following, we propose an approachwhich allows to efficiently generate traces with formal guarantees on their expected complexity for anyspecific point on the complexity map. It is important to note that for all points on the map, there could bemany traces (and models) whose complexity maps to this point. Our model provides one such solution.To derive formal guarantees, we propose a simple Markovian model which is a stationary random processwith a well-defined entropy rate [25]. The model has two components: temporal and non-temporal. Thenon-temporal component is a joint probability traffic matrix, M , which can be computed from a given trace σ , similar to Fig. 1(c), or represent a known distribution (e.g. Zipf) where the entropy depends on thedistribution’s parameters. The temporal component is a repeating probability p . To generate a trace, westart by sampling the first pair from M , and then at each step we add a pair to the trace, with probability p we repeat the last pair and with probability 1 − p we sample a new pair from M . More formally, to emulatea point ( x, y ) on the complexity map we set H ( M ) = y · n where H ( M ) is the joint entropy of M . Itcan be shown that M is the stationary distribution of the chain and the non-temporal complexity of themodel is y = H ( M ) / (2 log n )The temporal complexity of the model is x = H ( p, − p ) + (1 − p ) H ( M ) H ( M )where p can be computed analytically given x . Therefore, we can produce traces with similar complexitieson the complexity map.Figure 5 presents the quality of the above traffic generation model in producing syntactic traces forseven example points in the complexity map. First we used the model to reproduce the four hypotheticalpoints from Fig. 2: Uniform , Skewed , Bursty , and
Skewed & Bursty . As a skewed distribution we used Zipfdistribution with an exponent parameter of which leads to a normalized entropy of 0 . .30.40.50.60.70.80.91.0 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 Multi GridMLBursty UniformSkewedWEB
Temporal complexity N o n - t e m p o r a l c o m p l e x i t y Original traceReproduced trace
Skewed &Bursty
Figure 5: Our traffic generation model is able to a produce new trace files with similar complexities as theoriginal traces. The solid circles indicate the original trace’s complexity and dashed circles represent thecomplexity of the trace produced by our traffic generator.Moreover, currently we do not reproduce exact packet arrival times, rather we take the order of packets as aproxy for temporal structure. In future work, we plan to add packet arrival times and flow-level informationto the complexity analysis and our traffic generation model.
Our methodology can also provide a framework for formal analysis. In the following, assume that σ is a tracegenerated by a stationary stochastic process [25]. Then, for long sequences and using an optimal compressionalgorithm (such as the Lampel-Ziv [26]), we will achieve the compression limit defined by Shannon [25, 27, 28]:the entropy rate of the process. From this, the normalized complexities of σ can be proved analytically.Let Z = { Z i } be a stationary stochastic process that generates σ where Z i = { S i , D i } are time-indexedrandom variables, and S i ∈ S and D i ∈ D are a random source and a random destination at time i ,respectively. Since Z is stationary, let π denote the stationary distribution and note that π is a jointdistribution over S and D . With a slight abuse of notation, let S and D also denote the random variable of π . Note that S and D may be defined over different domains (IDs) and may be dependent. Also Z i may bedependent on a past element in Z .A basic measure for the complexity of Z is based on Shannon Entropy and known as the entropy rate [25], H ( Z ). The entropy rate of a stochastic process captures the expected number of bits per symbol (i.e.,source or destination IDs) that are both necessary and sufficient to describe σ ; or, alternatively, the expectedamount of uncertainty in the next symbol given past symbols in the sequence. The smaller the entropy rate,the less complex is the sequence (it requires fewer bits to describe/compress it).We can use the previous definitions of normalized trace complexity of σ and formally relate them toentropy rates when σ is generated by a stationary stochastic process. Theorem 1 (Trace Complexity Ratios of a Stationary Process) . Consider an indexed stationary stochastictrace process Z = { Z t } to generate σ where Z t = { S t , D t } and n = | S ∪ D | . If an optimal compression lgorithm C is used, then:1. The trace complexity ratio is lim t →∞ Ψ( σ ) = C ( σ ) C ( U ( σ )) = H ( Z )2 log n (2)
2. The temporal complexity ratio is: lim t →∞ T ( σ ) = C ( σ ) C (Γ( σ )) = H ( Z ) H ( S, D ) (3)
3. The non-temporal complexity ratio is: lim t →∞ N T ( σ ) = C (Γ( σ ))) C ( U ( σ )) = H ( S, D )2 log n (4) Information-theoretic approaches and compression methodologies have already been proven successful incapturing entropy in other domains such as email [29], or comment [30] spam filtering, or estimating neuraldischarges [31]. The study of traffic patterns and the design of models is an evergreen topic of high relevancein the networking literature, and examples where measurement studies spurred much research into trafficmodeling dates back to the 1990s [32, 33, 34]. Since then, a large number of methodologies have beendeveloped [35, 36, 37], based on temporal statistics [38, 39, 40], spatial statistics [41, 42], and physical [43]and information-theoretic [44, 45] models. In contrast to prior work, we are primarily interested in thecommunication pattern itself, rather than in the volume or headers of the exchanged data. While our workbuilds upon many significant results developed over the last decades [17], we are not aware of any work whichallows to systematically differentiate between temporal and non-temporal components of traffic traces.
The specific characteristics of traffic workloads have important implications for emerging network fabricsand algorithms [19]. This paper takes the first steps at understanding the complexity of traffic traces. Inaddition to temporal and non-temporal complexity measures, our entropy-based approach can be used toinvestigate other dimensions of the trace structure, e.g., regarding source-destination dependencies, or toexplore structure in transmission times and packet headers. More generally, while trace structure indicatespotential for optimizations, it remains to develop algorithms which exploit this structure to improve networkperformance and/or utilization.
References [1] T. Benson, A. Akella, and D. A. Maltz, “Network traffic characteristics of data centers in the wild,” in
Proc. ACM SIGCOMM Conference on Internet Measurement (IMC) , pp. 267–280, ACM, 2010.[2] M. Ghobadi et al., “Projector: Agile reconfigurable data center interconnect,” in
Proc. ACM SIG-COMM , pp. 216–229, 2016.[3] K. Chen, A. Singla, A. Singh, K. Ramachandran, L. Xu, Y. Zhang, X. Wen, and Y. Chen, “Osa:An optical switching architecture for data center networks with unprecedented flexibility,”
IEEE/ACMTransactions on Networking (TON) , vol. 22, no. 2, pp. 498–511, 2014.94] S. Zou, X. Wen, K. Chen, S. Huang, Y. Chen, Y. Liu, Y. Xia, and C. Hu, “Virtualknotter: Onlinevirtual machine shuffling for congestion resolving in virtualized datacenter,”
Computer Networks , vol. 67,pp. 141–153, 2014.[5] Q. Zhang, V. Liu, H. Zeng, and A. Krishnamurthy, “High-resolution measurement of data center mi-crobursts,” in
Proceedings of the 2017 Internet Measurement Conference , IMC ’17, (New York, NY,USA), pp. 78–85, ACM, 2017.[6] C. Avin and S. Schmid, “Toward demand-aware networking: A theory for self-adjusting networks,” in
ACM SIGCOMM Computer Communication Review (CCR) , 2018.[7] S. Xiao, D. He, and Z. Gong, “Deep-q: Traffic-driven qos inference using deep generative network,”in
Proceedings of the 2018 Workshop on Network Meets AI & ML , NetAI’18, (New York, NY, USA),pp. 67–73, ACM, 2018.[8] K. Tu, B. Ribeiro, A. Swami, and D. Towsley, “Tracking groups in mobile network traces,” in
Proceedingsof the 2018 Workshop on Network Meets AI & ML , NetAI’18, (New York, NY, USA), pp. 35–40, ACM,2018.[9] A. Valadarsky, M. Schapira, D. Shahaf, and A. Tamar, “Learning to route,” in
Proceedings of the 16thACM Workshop on Hot Topics in Networks , HotNets-XVI, (New York, NY, USA), pp. 185–191, ACM,2017.[10] H. Mao, R. Netravali, and M. Alizadeh, “Neural adaptive video streaming with pensieve,” in
Proceedingsof the Conference of the ACM Special Interest Group on Data Communication , SIGCOMM ’17, (NewYork, NY, USA), pp. 197–210, ACM, 2017.[11] W. M. Mellette, R. McGuinness, A. Roy, A. Forencich, G. Papen, A. C. Snoeren, and G. Porter,“Rotornet: A scalable, low-complexity, optical datacenter network,” in
Proc. ACM SIGCOMM , pp. 267–280, 2017.[12] N. Hamedazimi, Z. Qazi, H. Gupta, V. Sekar, S. R. Das, J. P. Longtin, H. Shah, and A. Tanwer, “Firefly:A reconfigurable wireless data center fabric using free-space optics,” in
Proc. ACM SIGCOMM ComputerCommunication Review (CCR) , vol. 44, pp. 319–330, 2014.[13] A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, andS. Sengupta, “Vl2: a scalable and flexible data center network,” in
Proc. ACM SIGCOMM ComputerCommunication Review (CCR) , vol. 39, pp. 51–62, 2009.[14] M. Alizadeh, S. Yang, M. Sharif, S. Katti, N. McKeown, B. Prabhakar, and S. Shenker, “pFabric:Minimal near-optimal datacenter transport,” in
ACM SIGCOMM Computer Communication Review ,vol. 43, pp. 435–446, ACM, 2013.[15] M. Khani, M. Ghobadi, M. Alizadeh, Z. Zhu, M. Glick, K. Bergman, and A. Vahdat, “Scaling DistributedMachine Learning with Silicon Photonics.”[16] C. E. Shannon, “A mathematical theory of communication,”
Bell system technical journal , vol. 27, no. 3,pp. 379–423, 1948.[17] J. Ziv and A. Lempel, “Compression of individual sequences via variable-rate coding,”
IEEE transactionson Information Theory , vol. 24, no. 5, pp. 530–536, 1978.[18] M. Feder, N. Merhav, and M. Gutman, “Universal prediction of individual sequences,”
IEEE transac-tions on Information Theory , vol. 38, no. 4, pp. 1258–1270, 1992.[19] A. Roy, H. Zeng, J. Bagga, G. Porter, and A. C. Snoeren, “Inside the social network’s (datacenter)network,” in
Proc. ACM SIGCOMM Computer Communication Review (CCR) , vol. 45, pp. 123–137,ACM, 2015. 1020] U. DOE, “Characterization of the DOE mini-apps.” https://portal.nersc.gov/project/CAL/doe-miniapps.htm , 2016.[21] H. Jiang and C. Dovrolis, “Source-level ip packet bursts: Causes and effects,” in
Proceedings of the 3rdACM SIGCOMM Conference on Internet Measurement , IMC ’03, (New York, NY, USA), pp. 301–306,ACM, 2003.[22] M. Ghobadi, Y. Cheng, A. Jain, and M. Mathis, “Trickle: Rate limiting youtube video streaming,”in
Presented as part of the 2012 USENIX Annual Technical Conference (USENIX ATC 12) , (Boston,MA), pp. 191–196, USENIX, 2012.[23] 7-zip. .[24] P. Deutsch, “Deflate compressed data format specification version 1.3,” tech. rep., 1996.[25] T. M. Cover and J. A. Thomas,
Elements of information theory . John Wiley & Sons, 2012.[26] J. Ziv and A. Lempel, “A universal algorithm for sequential data compression,”
IEEE Transactions oninformation theory , vol. 23, no. 3, pp. 337–343, 1977.[27] A. D. Wyner and J. Ziv, “The sliding-window lempel-ziv algorithm is asymptotically optimal,”
Pro-ceedings of the IEEE , vol. 82, no. 6, pp. 872–877, 1994.[28] B. Vegetabile, J. Molet, T. Z. Baram, and H. Stern, “Estimating the entropy rate of finite markov chainswith application to behavior studies,” arXiv preprint arXiv:1711.03962 , 2017.[29] A. Bratko, G. V. Cormack, B. Filipiˇc, T. R. Lynam, and B. Zupan, “Spam filtering using statisticaldata compression models,”
Journal of machine learning research , vol. 7, no. Dec, pp. 2673–2698, 2006.[30] A. Kantchelian, J. Ma, L. Huang, S. Afroz, A. Joseph, and J. Tygar, “Robust detection of commentspam using entropy rate,” in
Proc. 5th ACM Workshop on Security and Artificial Intelligence , pp. 59–70,ACM, 2012.[31] J. M. Amig´o, J. Szczepa´nski, E. Wajnryb, and M. V. Sanchez-Vives, “Estimating the entropy rate ofspike trains via lempel-ziv complexity,”
Neural Computation , vol. 16, no. 4, pp. 717–736, 2004.[32] N. Likhanov, B. Tsybakov, and N. D. Georganas, “Analysis of an atm buffer with self-similar (” fractal”)input traffic,” in
Proc. IEEE INFOCOM , vol. 3, pp. 985–992, IEEE, 1995.[33] M. E. Crovella and A. Bestavros, “Self-similarity in world wide web traffic: evidence and possiblecauses,”
IEEE/ACM Transactions on networking , vol. 5, no. 6, pp. 835–846, 1997.[34] W. E. Leland, M. S. Taqqu, W. Willinger, and D. V. Wilson, “On the self-similar nature of ethernettraffic (extended version),”
IEEE/ACM Transactions on Networking (ToN) , vol. 2, no. 1, pp. 1–15,1994.[35] K. Park, G. Kim, and M. Crovella, “On the relationship between file sizes, transport protocols, andself-similar network traffic,” in
Network Protocols, 1996. Proceedings., 1996 International Conferenceon , pp. 171–180, IEEE, 1996.[36] M. S. Taqqu, V. Teverovsky, and W. Willinger, “Estimators for long-range dependence: an empiricalstudy,”
Fractals , vol. 3, no. 04, pp. 785–798, 1995.[37] E. Shriver, A. Merchant, and J. Wilkes, “An analytic behavior model for disk drives with reada-head caches and request reordering,” in
ACM SIGMETRICS Performance Evaluation Review , vol. 26,pp. 182–191, ACM, 1998. 1138] M. W. Garrett and W. Willinger, “Analysis, modeling and generation of self-similar vbr video traffic,”in
ACM SIGCOMM computer communication review , vol. 24, pp. 269–280, ACM, 1994.[39] R. H. Riedi, M. S. Crouse, V. J. Ribeiro, and R. G. Baraniuk, “A multifractal wavelet model withapplication to network traffic,”
IEEE transactions on Information Theory , vol. 45, no. 3, pp. 992–1018,1999.[40] M. Wang, T. Madhyastha, N. H. Chan, S. Papadimitriou, and C. Faloutsos, “Data mining meetsperformance evaluation: Fast algorithms for modeling bursty traffic,” in
Proceedings 18th InternationalConference on Data Engineering , pp. 507–516, IEEE, 2002.[41] N. Cressie, “Statistics for spatial data,”
Terra Nova , vol. 4, no. 5, pp. 613–617, 1992.[42] D. R. Cox and V. Isham,
Point processes , vol. 12. CRC Press, 1980.[43] P. Barford and M. Crovella, “Generating representative web workloads for network and server per-formance evaluation,” in
ACM SIGMETRICS Performance Evaluation Review , vol. 26, pp. 151–160,ACM, 1998.[44] Y. Liu, D. Towsley, T. Ye, and J. C. Bolot, “An information-theoretic approach to network monitoringand measurement,” in
Proc. 5th ACM SIGCOMM Conference on Internet Measurement , pp. 14–14.[45] Y. Liu, D. Towsley, J. Weng, and D. Goeckel, “An information theoretic approach to network tracecompression,”