Bert: Scalable Source Routed Multicast for Cloud Data Centers
Jarallah Alqahtani, Bechir Hamdaoui
Oregon State University, Corvallis, {alqahtaj,hamdaoui}@eecs.oregonstate.edu
Abstract—Traditional IP multicast routing is not suitable for cloud data center (DC) networks because of the need to support large numbers of groups with large group sizes. State-of-the-art DC multicast routing approaches aim to overcome this scalability issue by, for instance, exploiting the symmetry of DC topologies and the programmability of DC switches to compactly encode multicast group information inside packets, thereby reducing the overhead that results from storing per-flow state at the network switches. However, although these approaches scale well with the number of multicast groups, they do not do so with group sizes, and as a result they incur substantial traffic overhead and network congestion. In this paper, we present Bert, a scalable, source-initiated DC multicast routing approach that scales well with both the number and the size of multicast groups. It does so through clustering: the members of each multicast group are divided into a set of clusters, with each cluster employing its own forwarding rules. Compared to the state-of-the-art approach, Bert yields much lower traffic overhead by significantly reducing the packet header sizes and the number of extra packet transmissions that result from compacting forwarding rules across the switches.
Keywords—Data center networks, multicast routing.
I. INTRODUCTION
Today's cloud data centers (DCs) host hundreds of thousands of tenants [1], with each tenant possibly running hundreds of workloads supported through thousands of virtual machines (VMs) running on different servers [2]–[4]. These workloads often involve one-to-many communications among the different servers, as required by the supported applications [5], [6]. Therefore, to enable efficient communication and data transfer among the different servers running VMs that support the same workload/application, multicast routing protocol designs need to be revisited to suit today's cloud data center network topologies. Traditional IP multicast routing is primarily designed for arbitrary network topologies and Internet traffic, with a focus on reducing CPU and network bandwidth overheads, and hence is not suitable for DCs, which need to support large numbers of groups on commodity switches with limited memory. In other words, DC switches would have to maintain per-group routing rules for all multicast addresses, because such rules cannot be aggregated on a per-prefix basis.

That said, a few research efforts have been devoted to overcoming this scalability issue [7]–[13]. For instance, Elmo [10], a recently proposed source-initiated multicast routing approach for DCs, overcomes the scalability issue and is shown to support millions of multicast groups with reasonable overhead in terms of switch state and network traffic. Elmo does so by taking advantage of programmable switches [14] and the symmetry of DC topologies to compactly encode multicast group information inside packets, thereby reducing the overhead that results from storing per-flow state at the network switches. However, although Elmo scales well with the number of multicast groups, it does not do so with multicast group sizes. For large multicast groups, the Elmo header can carry several hundred extra bytes, which increases traffic overhead in the network. In addition, the number of extra transmissions Elmo incurs due to the compacting of packet rules increases significantly with the size of the multicast group, yielding higher traffic congestion on the DC's downlinks. To overcome Elmo's aforementioned limitations, we propose in this paper
Bert, a source-initiated multicast routing approach for DCs. Unlike Elmo, Bert scales well with both the number and the size of multicast groups, and does so through clustering, by dividing the members of the multicast group into a set of clusters, with each cluster employing its own forwarding rules. In essence, Bert yields much lower multicast traffic overhead than Elmo by significantly reducing (1) the forwarding header sizes of multicast packets and (2) the number of per-group-member transmissions that result from compacting forwarding rules across the switches.

The rest of this paper is organized as follows. We briefly illustrate the network architecture of modern DCs and describe the limitations of prior state-of-the-art works in Section II. We present the proposed multicast routing scheme, Bert, in Section III. We study and evaluate the performance of Bert and compare it to that obtained under Elmo in Section IV. We conclude the paper in Section V.

II. LIMITATIONS OF THE STATE OF THE ART
1) Background—DC Topologies:
Large-scale DCs typically use multi-rooted tree-based topologies (e.g., fat-tree [15] and its variants [16]–[18]). These topologies provide large numbers of parallel paths to support high bandwidth, low latency, and non-blocking connectivity among servers. The servers are the tree leaves and are connected to top-of-rack (ToR) (edge/leaf) switches. In general, DCs contain three types of switches, leaf, spine, and core, with each type residing in one layer, as shown in Fig. 1. At the lowest layer, leaf (aka edge) switches are interconnected through spine (aka aggregation) switches, which constitute the second layer of switches. The core switches, constituting the top/root layer, serve as connections among the spine switches. With such a DC topology, every server can communicate with any other server using the same number of hops.

[Fig. 1: An example multicast tree on a three-tier Clos topology with four pods, with 4 hosts under each leaf switch (ToR). One host is the source of the multicast group and seven hosts are its receivers. Panel (a) shows Bert and panel (b) shows Elmo; each panel indicates the upstream and downstream p-rules.]
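To make the topology parameters concrete, the following is a minimal sketch (our own illustration, not from the paper) that computes the size of a standard three-tier fat-tree built from p-port switches; the function name and structure are assumptions for illustration only.

```python
def fat_tree_dimensions(p: int) -> dict:
    """Sizes of a standard three-tier fat-tree built from p-port switches.

    Each of the p pods has p/2 leaf (edge) and p/2 spine (aggregation)
    switches; every leaf switch uses p/2 ports for hosts and p/2 for spines.
    """
    assert p % 2 == 0, "fat-tree requires an even switch port count"
    pods = p
    leaves_per_pod = p // 2
    spines_per_pod = p // 2
    cores = (p // 2) ** 2
    hosts = pods * leaves_per_pod * (p // 2)   # = p^3 / 4
    return {"pods": pods, "leaves_per_pod": leaves_per_pod,
            "spines_per_pod": spines_per_pod, "cores": cores, "hosts": hosts}

# e.g., 48-port switches give the 27,648-server topology used in Section IV:
# fat_tree_dimensions(48)["hosts"] == 27648
```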
2) Multicast in DCs:
DC multicast has been studied from different points of view. For example, the frameworks proposed in [19], [20] studied the resource allocation and embedding of multicast virtual networks; they mainly focused on how to place and restore VMs so as to provide high-performance, non-blocking multicast virtual networks while reducing hardware cost in fat-tree DCs. Other works, including ours, focused on the scalability problem of multicast routing in DCs. These works relied either on decentralized protocols such as IGMP and PIM [9] or on centralized, SDN-based approaches [7], [8], [13], [21]. Even though these approaches overcome the scalability issue of multicast routing by supporting large numbers of multicast groups, they perform poorly when group sizes are large.
3) Elmo:
Elmo [10] is a recently proposed DC multicast routing scheme that scales well with the number of multicast groups. Elmo uses source-based routing, encoding packet forwarding state/rules in packet headers to limit the flow state information that DC switches have to maintain. It also exploits the programmability of DC switches and the symmetry of DC topologies to compactly encode multicast group information inside packets, thereby reducing packet header overhead and, consequently, network traffic load. Even though Elmo is shown to scale well with the number of multicast groups, it still suffers from scalability issues in terms of incurred traffic overhead when it comes to large group sizes. For example, a packet header could be as large as 325 bytes to contain all p-rules (packet rules) [10], incurring excessive network traffic overhead and link congestion. Elmo tries to mitigate this in two ways. (1) Per-hop p-rules are removed from the header as packets traverse the network switches; unfortunately, the p-rules of the downstream spine and leaf switches, which happen to consume most of the header space, are removed last, so most of the header overhead travels over most of the network topology. (2) Switches in downstream paths with the same or similar bitmaps are mapped to a single bitmap. For example, as shown in Fig. 1a, at the leaf layer two leaf switches can share one p-rule (the bitwise OR of their port bitmaps, e.g., 1100), yielding one extra transmission at one of them. However, sharing bitmaps results in extra packet transmissions, which also increase traffic overhead.

To overcome the aforementioned challenges of Elmo, we propose Bert, which first clusters the set of multicast destination members into multiple subsets/clusters, and then encodes multicast information in packet headers separately for each of these clusters. Our proposed multicast routing approach, Bert, outperforms Elmo in terms of traffic overhead by significantly reducing (1) packet header sizes and (2) the number of extra transmissions that result from compacting forwarding rules.

III. THE PROPOSED MULTICAST ROUTING: Bert
A. Motivating Example
In this section, a detailed example, illustrated in Fig. 1, is presented to explain the limitations of Elmo and motivate the design of the proposed scheme, Bert. At a high level, for each multicast group, the controller first computes a multicast tree and the forwarding rules, and then installs these rules in the hypervisor of the multicast group's source. The hypervisor intercepts each multicast packet and adds the forwarding rules to the packet header. Elmo essentially focuses on how to efficiently encode a multicast forwarding policy in the packet header, whereas Bert, in addition to efficiently encoding the forwarding rules, aims to alleviate the traffic overhead caused by header size and by extra packet transmissions on the downstream paths. The forwarding header consists of a succession of p-rules that include rules for the upstream leaf and spine switches, as well as for the downstream core, spine, and leaf switches. Each switch in the multicast tree removes its p-rules from the header when forwarding the packet to the next layer. For both Elmo and Bert, each multicast packet's journey can be explained in two main phases:
1) Upstream (leaf switches to core switches) path:
The p-rules for the upstream switches (leaf and spine) consist of downstream ports and a multipath flag. When the packet arrives at the upstream leaf switch, the switch forwards it to the given downstream ports and, at the same time, multipaths it toward an upstream spine switch using an underlying multipath routing scheme, e.g., ECMP [22]. In Elmo, only one packet goes through the upstream paths. Using Fig. 1b for illustration, the upstream leaf switch first removes its p-rules from the packet, then forwards it to the destination host under it as well as multipathing it to any spine switch. The upstream spine switches do the same to forward the packet to the core switches. Our proposed Bert, on the other hand, first clusters the destination members of the multicast group into multiple (two in the example) clusters, and then sends multiple (two in the example) copies of the packet (with different headers but the same payload), one for each cluster; more detail on the clustering part is provided later. The first packet (R1) carries the same upstream p-rules as in Elmo, while the second packet (R2) does not carry downstream-port rules for the upstream leaf and spine switches, so as to avoid extra transmissions on the way up. Even though the duplicate packets incur some minor extra traffic on the upstream paths, they reduce the traffic on the downstream paths substantially when compared to Elmo; that is, the overall traffic over both the upstream and downstream paths is significantly reduced under Bert.
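As a rough illustration of the p-rule structure just described (a downstream-port bitmap plus a multipath flag for upstream switches), the following sketch uses our own hypothetical field and function names; it is not Elmo's or Bert's actual header layout.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PRule:
    ports: int        # bitmap of downstream ports to replicate the packet to
    multipath: bool   # upstream rule only: also forward upward via ECMP

def forward_upstream(rule: PRule, num_ports: int) -> List[str]:
    """Return the actions an upstream switch takes for its own p-rule,
    which it strips from the header before sending the packet onward."""
    actions = [f"replicate to port {p}"
               for p in range(num_ports) if rule.ports >> p & 1]
    if rule.multipath:
        actions.append("hash onto one upstream link (ECMP)")
    return actions

# Example: bitmap 0b0001 delivers to one local receiver, then ECMP upward.
print(forward_upstream(PRule(ports=0b0001, multipath=True), num_ports=4))
```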
2) Downstream (core switches to leaf switches) path:
The p-rules for the core, spine, and leaf switches in the downstream path consist of downstream ports and switch IDs. In the downstream path, the core switches forward the packet to the given pods based on the core-switch p-rules. In Elmo, one core switch sends the packet to the spine switches, which in turn forward it (based on the spine-switch p-rules) to the leaf switches; the leaf switches do the same to deliver the packet to the destination hosts. Note that, because of the topology symmetry, any core switch can forward the packet to the destination pods. Referring to the example in Fig. 1b again, in Elmo one core switch sends the packet to three spine switches (three packets in total); once the packet arrives at a downstream spine switch, it is forwarded based on the spine-switch p-rules to the leaf switches, which in turn do the same to deliver the packet to the destination hosts. For example, when all leaf switches in the multicast tree share one p-rule, which must then be the bitwise OR of all these leaf switches' bitmaps (i.e., 1111 across the four participating leaf switches), Elmo incurs 10 extra packet transmissions (see Fig. 1b).

Unlike Elmo, to reduce the number of unneeded extra transmissions, Bert first clusters the destination members into multiple (two in the example) clusters, and then sends a different copy for each cluster in the downstream direction (all copies have the same payload and size but different headers/rules). Referring to Fig. 1a again for illustration, in Bert the core layer forwards the first packet (R1) to two spine switches and the second packet (R2) to a third spine switch. Note that the number of core-to-pod packets, which is three in the example, is the same in both Elmo and Bert. However, Bert substantially reduces the number of extra packet transmissions from the leaf switches to the end hosts. To illustrate, when, as done above for the case of Elmo, all leaf switches within the same cluster share one p-rule (i.e., in the example of Fig. 1a, when each of R1 and R2 compacts its leaf-switch rules into one rule only, with R1's and R2's shared bitmaps becoming 0110 and 1100, respectively), Bert incurs only 2 extra packet transmissions as opposed to 10 in the case of Elmo. Taking into account both the upstream and downstream paths, compared to Elmo, Bert incurs 3 more extra transmissions in the upstream (1 extra in each upstream layer), but 8 fewer transmissions in the downstream, thereby reducing the total number of extra transmissions by 5.

In addition to reducing the extra packet transmissions, Bert reduces the header size of the downstream packets. For example, the header size of the first packet (R1) is 36 bits and that of the second packet (R2) is 21 bits, using three bits to identify each spine and leaf switch. Hence, the average header size in Bert is about 29 bits per packet, whereas the Elmo packet header is 55 bits (see Fig. 1b). In general, the average header size of a downstream packet in Bert is 1/k of that of Elmo's packet, where k is the number of clusters of the multicast group, a design parameter of Bert.
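For concreteness, the averaging in this example can be written out as follows, where $H_u$ denotes the downstream header size of cluster $u$'s packet (our notation for the numbers quoted above):
\[
\bar{H}_{\text{Bert}} \;=\; \frac{1}{k}\sum_{u=1}^{k} H_u \;=\; \frac{36 + 21}{2} \;=\; 28.5 \approx 29 \ \text{bits},
\qquad
H_{\text{Elmo}} = 55 \ \text{bits}.
\]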
B. Bert

Bert aims to reduce control traffic by reducing the number of extra transmissions that Elmo incurs on the downstream paths, as well as the size of the multicast packet header. As illustrated in the motivating example given in the previous section, Bert achieves this goal by clustering the set of group members into k clusters; this is done for each multicast group independently. Before presenting the clustering approach of Bert, we introduce the following notation/parameters of the studied three-tier DC: throughout, we denote the number of pods by n, the number of ports per leaf switch by l, and the number of leaf switches per pod by m. Note that although in a traditional fat-tree DC, m = n/2 and l = n/2, we use this general parameter notation to keep our technique applicable to any tree-based DC topology. Also, let $L^j_{g,i}$ be the l-bit binary vector corresponding to the j-th leaf switch belonging to the i-th pod, where $1 \leq i \leq n$ and $1 \leq j \leq m$, with each bit corresponding to one port of the leaf switch and set to 1 when the port serves a member of multicast group g and to 0 otherwise. For each multicast group g and each pod i, let $L_{g,i}$ be the concatenation of the m l-bit vectors of the m leaf switches belonging to pod i, that is, $L_{g,i} = L^1_{g,i} \,\|\, L^2_{g,i} \,\|\, \cdots \,\|\, L^m_{g,i}$; here, $L_{g,i}$ is a binary vector of size $l \times m$.

Turning to Bert's clustering method, we begin by noting that in Bert we choose to cluster group members based on the pods, as opposed to the leaf switches. That is, for each multicast group g, Bert clusters the set of n vectors $L_{g,i}$, $1 \leq i \leq n$, as opposed to the set of $n \times m$ vectors $L^j_{g,i}$, $1 \leq i \leq n$, $1 \leq j \leq m$. This choice is supported shortly via an example. Bert uses the K-Means clustering algorithm with the Hamming distance as the distance metric, where the Hamming distance between two binary vectors is simply the number of bit positions in which they differ. For each multicast group g, the K-Means algorithm takes as input the set of n vectors $L_{g,i}$, $1 \leq i \leq n$, and the number of clusters k, and outputs k clusters, with each cluster specifying a subset of the pods that belong to the same cluster. Once clustering is done, the p-rules of each cluster are created by the hypervisor, which makes one copy of the multicast packet (data + header/p-rules) for each cluster. For example, in Fig. 1a, when the hypervisor of the source host receives the multicast packet, it creates another copy of this packet, and adds the R1 rules to the first packet and the R2 rules to the second packet.

As with K-Means clustering in general, the number of clusters, k, is a design choice, as the algorithm takes it as an input. The observation we made is that the larger k is, the fewer the extra transmissions on the downstream paths and the smaller the header size overhead, but also the greater the number of extra transmissions on the upstream links (from leaf switches to core switches). However, we also observe that the overall (including both upstream and downstream paths) traffic overhead reduction improves with the number of clusters, k. More on this is provided in the evaluation section.

[Fig. 2: Clustering-choice example of a multicast tree on a three-tier Clos topology with four pods, with 4 hosts under each leaf switch (ToR). One host is the source of the multicast group and nine hosts are its receivers. (a) Locality-aware clustering; (b) locality-oblivious clustering.]

The reason why Bert adopts clustering based on pods and not on leaf switches is as follows: if we cluster based on the p-rules of the downstream leaf switches regardless of which pod they belong to, extra packet transmissions occur at the core and spine switches on the downstream path. For example, in Fig. 2b, when clustering is based on leaf switches only and uses the Hamming-distance similarity, the two leaf switches of pod 2 end up in different clusters; because they are in the same pod but in different clusters, the packet has to be sent twice at both the core and spine downstream layers. The same happens for another pod in the example. To avoid this, Bert adopts a clustering choice that is locality aware with respect to the pods of the leaf switches (see Fig. 2a).
C. Key Features of Bert

Compared to Elmo, Bert reduces multicast traffic substantially, and does so by:
1) Reducing Packet Header Size:
In multi-rooted Clos topologies, unlike the traffic load on the upstream paths, which is evenly distributed, the load on the downstream paths is much heavier and is always the main bottleneck of the network. This is because, in these topologies, the upstream routing is fully adaptive while the downstream routing is deterministic. Moreover, multicast workloads may make this worse, because multicast packets are replicated on the downstream paths in order to reach each group member. In Elmo, adding the p-rules to the packet means that a data packet may carry as many as 325 bytes of forwarding rules. In Bert, the average header size of a downstream packet is inversely proportional to the number of clusters k, i.e., it is 1/k of that of Elmo's packet, as explained in the previous subsection.
2) Reducing Number of Extra Transmissions:
Bert first clusters the multicast group members into k clusters, and then sends one copy (with the same payload but different rules/header) for each cluster in the downstream direction, thereby reducing the number of extra transmissions on the downstream paths substantially.

IV. PERFORMANCE EVALUATION
In this section, using simulations, we evaluate and compare the performance of Bert to that of Elmo in terms of their ability to reduce multicast control traffic. Mimicking the experimental setup of Elmo [10], we simulate a 3-tier DC topology built with 48-port switches connecting 27,648 servers, while considering different multicast group sizes. Group members for each simulated multicast group are distributed uniformly at random across the servers. Let l = 48 denote the number of ports of each leaf switch.
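As a sketch of this setup (our own illustration of the stated assumptions: 27,648 servers, 48 ports per leaf switch, members placed uniformly at random), the per-leaf bit-vectors $L^j_{g,i}$ used below can be built as follows; all names are hypothetical.

```python
import numpy as np

L_PORTS = 48                      # ports per leaf switch (l)
NUM_SERVERS = 27_648              # 48-port, 3-tier fat-tree
NUM_LEAVES = NUM_SERVERS // L_PORTS   # 576 leaf switches

def random_group_leaf_vectors(group_size: int, seed: int = 0) -> dict:
    """Place `group_size` members uniformly at random on distinct servers and
    return, for each leaf switch that hosts at least one member, its l-bit
    port vector (1 where the port serves a group member)."""
    rng = np.random.default_rng(seed)
    members = rng.choice(NUM_SERVERS, size=group_size, replace=False)
    vectors = {}
    for server in members:
        leaf, port = divmod(int(server), L_PORTS)
        vectors.setdefault(leaf, np.zeros(L_PORTS, dtype=int))[port] = 1
    return vectors

# e.g., a group of 200 members spread over the 576 leaf switches:
leaf_vectors = random_group_leaf_vectors(200)
print(len(leaf_vectors), "leaf switches host at least one member")
```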
A. Extra Packet Transmission Overhead

We focus on the downstream leaf switches here, since they are the main source of extra packet transmissions. In this evaluation, we impose only one p-rule per packet for all the downstream leaf switches. For Elmo, this single rule, denoted by $M$, is constructed as the bitwise OR of the l-bit vectors of all leaf switches that host at least one group member, that is, $M = \mathrm{OR}\{L^j_{g,i}\}_{1 \leq i \leq n,\, 1 \leq j \leq m}$. For Bert, one rule $M_u$ is constructed for each cluster $u$, $1 \leq u \leq k$, also by bitwise-ORing the l-bit vectors of all leaf switches that host at least one group member and whose pod belongs to cluster $u$.

The number of extra packet transmissions $ET$ incurred by Elmo and Bert can be calculated as the sum of the Hamming distances (XOR) between the rule and the l-bit vector of each leaf switch participating in the multicast tree/group. That is, for multicast group g,
\[
ET^{\mathrm{Elmo}}_g = \sum_{i \in S_g} \mathrm{XOR}(L_i, M)
\]
where $S_g$ is the set of all leaf switches hosting at least one member of multicast group g, and $L_i$ is the l-bit binary vector of leaf switch i. Similarly,
\[
ET^{\mathrm{Bert}}_g = \sum_{u=1}^{k} \sum_{i \in S^u_g} \mathrm{XOR}(L_i, M_u)
\]
where $S^u_g$ is the set of all leaf switches whose pods belong to cluster u and that host at least one member of multicast group g, and k is the number of clusters per multicast group.

We vary the size of the multicast group from d = 100 to d = 500 members. Fig. 3 shows the total number of extra packet transmissions across all the downstream leaf switches in the multicast tree caused by combining their p-rules. From Fig. 3, we observe that in Bert, the number of extra transmissions depends on the size of the group as well as on the number of clusters. First, Bert reduces the number of extra packet transmissions compared to Elmo, especially when the number of clusters is increased; for example, for a group size of 200 members, the number of extra transmissions drops substantially when the number of clusters is increased from k = 2 to k = 12. Second, the reduction in the number of extra packet transmissions achieved by Bert increases as the multicast group size decreases; for example, Fig. 4 shows that for k = 5, the savings in extra packet transmissions decrease as the group size grows from 100 to 500 members. Here, all numbers are normalized with respect to the total number of extra transmissions incurred by Elmo.
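A minimal sketch of this computation (our own illustration; function and variable names are hypothetical), reusing the per-leaf vectors from the setup sketch above:

```python
import numpy as np

def extra_transmissions_elmo(leaf_vectors: dict) -> int:
    """ET_g for Elmo: one shared p-rule M = bitwise OR of all leaf vectors;
    the cost is the summed Hamming distance between M and each leaf vector."""
    vecs = list(leaf_vectors.values())
    M = np.bitwise_or.reduce(vecs)
    return int(sum(np.count_nonzero(M != v) for v in vecs))

def extra_transmissions_bert(leaf_vectors: dict, pod_of_leaf, cluster_of_pod) -> int:
    """ET_g for Bert: one shared p-rule M_u per cluster u, computed the same
    way but only over the leaf switches whose pod belongs to cluster u."""
    total = 0
    for u in set(cluster_of_pod.values()):
        vecs = [v for leaf, v in leaf_vectors.items()
                if cluster_of_pod[pod_of_leaf(leaf)] == u]
        if not vecs:
            continue
        M_u = np.bitwise_or.reduce(vecs)
        total += sum(np.count_nonzero(M_u != v) for v in vecs)
    return int(total)
```

Here `pod_of_leaf` maps a leaf-switch index to its pod, and `cluster_of_pod` would be the output of the pod-level clustering sketched in Section III-B.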
[Fig. 3: Number of extra transmissions caused when combining p-rules; x-axis: number of clusters k; y-axis: normalized extra transmissions; curves for Elmo and for d = 100 to 500.]

[Fig. 4: Extra-transmission savings (in %) for different multicast group sizes, when k = 5.]

B. Header Size Overhead

Again, we focus on the downstream leaf switches because they use up most of the forwarding header capacity. In this experiment, we focused on multicast groups with large sizes, e.g., 2000 members. Fig. 5 shows that the header size is dramatically decreased in
Bert, especially when the number of clusters is small. For example, already with k = 2 the header size is substantially reduced, and it keeps gently decreasing as the number of clusters increases.

[Fig. 5: Header size of the p-rules sent to all downstream switches of a multicast group; x-axis: number of clusters k; y-axis: normalized p-rules size; curves for Elmo and Bert.]

[Fig. 6: Link traffic load on the upstream paths caused by a multicast flow of 1000 packets with d = 2000.]

[Fig. 7: Link traffic load on the downstream paths caused by a multicast flow of 1000 packets with d = 2000.]

Now, in Figs. 6 and 7, we show the impact of the header size reduction as well as of the packet duplication caused by Bert's clustering. We calculate the average traffic traversing each link on the upstream and downstream paths of each layer. Without loss of generality, we assume that the forwarding header and the payload of a packet each amount to one unit of traffic, and we consider the multicast flow size for this group to be 1000 packets. We also assume that the Equal-Cost Multipath protocol (ECMP) [22] is used for load balancing the traffic, and we use the standard deviation across all links' traffic loads to show the evenness of the load distribution across the links in each layer. In this experiment, we illustrate the tradeoffs between large and small values of k discussed in Sec. III-B.

Fig. 6 shows that the average link traffic load on the upstream paths achieved under Bert is higher than that obtained under Elmo. This is expected because
Bert creates and sends multiple packets, one for each cluster. For example, when k = 2, the upstream traffic load under Bert is higher than that under Elmo, and this traffic load increases further with the number of clusters (e.g., when k = 6). However, these links are evenly utilized, as shown by the standard deviation values. On the other hand, on the downstream paths, shown in Fig. 7, Bert achieves lower link traffic loads compared to Elmo, and this is true regardless of the number of clusters, though the more clusters
Bert has, the lower the load. For example, when k = 6, the average link traffic observed under Bert is noticeably lower than that observed under Elmo. To sum up, when accounting for both the upstream and downstream paths, Bert also outperforms Elmo in achieving lower traffic loads across the links, leading to less network congestion.

V. CONCLUSION
We proposed Bert, a scalable, source-routed multicast scheme for cloud data centers. Bert builds on existing approaches to better suit today's cloud data center networks. It alleviates traffic congestion on the downstream paths (usually the most congested links) by reducing both the packet header sizes and the number of extra packet transmissions.
REFERENCES

[1] "Amazon cloud has 1 million users," https://arstechnica.com/information-technology/2016/04/amazon-cloud-has-1-million-users-and-is-near-10-billion-in-annual-sales, accessed: 2020-02-19.
[2] "Microsoft Azure," https://azure.microsoft.com/, accessed: 2020-02-10.
[3] "Google Compute Engine," https://cloud.google.com/compute/, accessed: 2020-02-10.
[4] "Amazon AWS kernel description," https://aws.amazon.com/, accessed: 2020-02-10.
[5] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, "The Hadoop distributed file system," in Mass Storage Systems and Technologies (MSST), 2010 IEEE 26th Symposium on. IEEE, 2010, pp. 1–10.
[6] J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.
[7] A. Iyer, P. Kumar, and V. Mann, "Avalanche: Data center multicast using software defined networking." IEEE, 2014, pp. 1–8.
[8] X. Li and M. J. Freedman, "Scaling IP multicast on datacenter topologies," in Proceedings of the Ninth ACM Conference on Emerging Networking Experiments and Technologies, 2013, pp. 61–72.
[9] F. Fan, B. Hu, K. L. Yeung, and M. Zhao, "MiniForest: Distributed and dynamic multicasting in datacenter networks," IEEE Transactions on Network and Service Management, vol. 16, no. 3, pp. 1268–1281, 2019.
[10] M. Shahbaz, L. Suresh, J. Rexford, N. Feamster, O. Rottenstreich, and M. Hira, "Elmo: Source routed multicast for public clouds," in Proceedings of the ACM Special Interest Group on Data Communication, 2019, pp. 458–471.
[11] D. Li, M. Xu, M.-c. Zhao, C. Guo, Y. Zhang, and M.-y. Wu, "RDCM: Reliable data center multicast." IEEE, 2011, pp. 56–60.
[12] W.-K. Jia, "A scalable multicast source routing architecture for data center networks," IEEE Journal on Selected Areas in Communications, vol. 32, no. 1, pp. 116–123, 2013.
[13] W. Cui and C. Qian, "Dual-structure data center multicast using software defined networking," arXiv preprint arXiv:1403.8065, 2014.
[14] "Barefoot Tofino: World's fastest P4-programmable Ethernet switch ASICs," https://barefootnetworks.com/products/brief-tofino/, accessed: 2020-02-10.
[15] M. Al-Fares, A. Loukissas, and A. Vahdat, "A scalable, commodity data center network architecture," ACM SIGCOMM Computer Communication Review, vol. 38, no. 4, pp. 63–74, 2008.
[16] A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta, "VL2: a scalable and flexible data center network," in Proceedings of the ACM SIGCOMM 2009 Conference on Data Communication, 2009, pp. 51–62.
[17] V. Liu, D. Halperin, A. Krishnamurthy, and T. Anderson, "F10: A fault-tolerant engineered network," in Presented as part of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2013, pp. 399–412.
[18] J. Alqahtani and B. Hamdaoui, "Rethinking fat-tree topology design for cloud data centers." IEEE, 2018, pp. 1–6.
[19] J. Duan and Y. Yang, "Placement and performance analysis of virtual multicast networks in fat-tree data center networks," IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 10, pp. 3013–3028, 2016.
[20] S. Ayoubi, C. Assi, Y. Chen, T. Khalifa, and K. B. Shaban, "Restoration methods for cloud multicast virtual networks," Journal of Network and Computer Applications, vol. 78, pp. 180–190, 2017.
[21] D. Li, M. Xu, Y. Liu, X. Xie, Y. Cui, J. Wang, and G. Chen, "Reliable multicast in data center networks,"