Guohan Lu
Microsoft
Publications
Featured research published by Guohan Lu.
acm special interest group on data communication | 2009
Chuanxiong Guo; Guohan Lu; Dan Li; Haitao Wu; Xuan Zhang; Yunfeng Shi; Chen Tian; Yongguang Zhang; Songwu Lu
This paper presents BCube, a new network architecture specifically designed for shipping-container based, modular data centers. At the core of the BCube architecture is its server-centric network structure, where servers with multiple network ports connect to multiple layers of COTS (commodity off-the-shelf) mini-switches. Servers act not only as end hosts, but also as relay nodes for each other. BCube supports various bandwidth-intensive applications by speeding up one-to-one, one-to-several, and one-to-all traffic patterns, and by providing high network capacity for all-to-all traffic. BCube exhibits graceful performance degradation as the server and/or switch failure rate increases. This property is of special importance for shipping-container data centers, since once the container is sealed and operational, it becomes very difficult to repair or replace its components. Our implementation experience shows that BCube can be seamlessly integrated with the TCP/IP protocol stack and that BCube packet forwarding can be efficiently implemented in both hardware and software. Experiments in our testbed demonstrate that BCube is fault tolerant and load balancing, and that it significantly accelerates representative bandwidth-intensive applications.
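The abstract does not spell out the structure itself, so the following is only a minimal sketch of the server-centric layout, assuming the usual BCube(n, k) construction: n-port switches, k+1 switch levels, and servers addressed by k+1 base-n digits, where two servers share a switch exactly when their addresses differ in one digit. The function names are hypothetical and are not taken from the paper.

```python
from itertools import product


def bcube_servers(n, k):
    """All server addresses in BCube(n, k): tuples of k+1 base-n digits."""
    return list(product(range(n), repeat=k + 1))


def bcube_switch_count(n, k):
    """k+1 levels, each holding n**k switches with n ports."""
    return (k + 1) * n ** k


def bcube_neighbors(addr, n):
    """Servers one switch hop away: addresses differing in exactly one digit."""
    neighbors = []
    for level in range(len(addr)):
        for digit in range(n):
            if digit != addr[level]:
                neighbors.append(addr[:level] + (digit,) + addr[level + 1:])
    return neighbors


if __name__ == "__main__":
    # BCube(4, 1): 16 servers, 8 four-port switches, 6 one-hop neighbors each.
    print(len(bcube_servers(4, 1)), bcube_switch_count(4, 1))
    print(bcube_neighbors((0, 0), 4))
```

Under these assumptions, every server has (k+1)(n-1) one-hop neighbors, which is what lets servers relay for each other across levels.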
conference on emerging network experiment and technology | 2009
Haitao Wu; Guohan Lu; Dan Li; Chuanxiong Guo; Yongguang Zhang
Shipping-container-based data centers have been introduced as building blocks for constructing mega-data centers. However, interconnecting those containers with reasonable cost and cabling complexity is a challenge, because a mega-data center can have hundreds or even thousands of containers and the aggregate bandwidth among containers can easily reach terabits per second. As a new inner-container server-centric network architecture, BCube [9] interconnects thousands of servers inside a container and provides high bandwidth support for typical traffic patterns. It naturally serves as a building block for mega-data centers. In this paper, we propose MDCube, a high performance interconnection structure to scale BCube-based containers to mega-data centers. MDCube uses the high-speed uplink interfaces of the commodity switches in BCube containers to build the inter-container structure, greatly reducing the cabling complexity. MDCube puts its inter- and inner-container routing intelligence solely into servers to handle load balancing and fault tolerance, and thus directly leverages commodity instead of high-end switches to scale. Through analysis, we prove that MDCube has low diameter and high capacity. Both simulations and experiments in our testbed demonstrate the fault tolerance and high network capacity of MDCube.
conference on emerging network experiment and technology | 2013
Jiaxin Cao; Rui Xia; Pengkun Yang; Chuanxiong Guo; Guohan Lu; Lihua Yuan; Yixin Zheng; Haitao Wu; Yongqiang Xiong; David A. Maltz
Clos-based networks including Fat-tree and VL2 are being built in data centers, but existing per-flow based routing causes low network utilization and a long latency tail. In this paper, by studying the structural properties of Fat-tree and VL2, we propose a per-packet round-robin based routing algorithm called Digit-Reversal Bouncing (DRB). DRB achieves perfect packet interleaving. Our analysis and simulations show that, compared with random-based load-balancing algorithms, DRB results in smaller and bounded queues even when traffic load approaches 100%, and it uses a smaller re-sequencing buffer for absorbing out-of-order packet arrivals. Our implementation demonstrates that our design can be readily implemented with commodity switches. Experiments on our testbed, a Fat-tree with 54 servers, confirm our analysis and simulations, and further show that our design handles network failures in 1-2 seconds and has the desirable graceful performance degradation property.
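The abstract names the technique but does not give the algorithm, so the sketch below only illustrates the general idea of a digit-reversal ordering for per-packet round-robin. It assumes that candidate paths are numbered 0 to base**digits - 1 and that successive packets pick paths by reversing the digits of a running counter; the function names and this numbering are assumptions, not the paper's definition of DRB.

```python
def digit_reverse(i, base, digits):
    """Reverse the base-`base` digits of i, zero-padded to `digits` digits."""
    out = 0
    for _ in range(digits):
        i, d = divmod(i, base)   # peel off the least significant digit
        out = out * base + d     # append it on the most significant side
    return out


def digit_reversal_order(base, digits):
    """Path indices in the order a per-packet counter would visit them."""
    return [digit_reverse(i, base, digits) for i in range(base ** digits)]


if __name__ == "__main__":
    print(digit_reversal_order(2, 3))  # [0, 4, 2, 6, 1, 5, 3, 7]
    print(digit_reversal_order(3, 2))  # [0, 3, 6, 1, 4, 7, 2, 5, 8]
```

Compared with plain round-robin (0, 1, 2, ...), this ordering makes consecutive packets differ in the most significant digit first, so back-to-back packets land in different top-level groups of paths before any group is reused.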
acm special interest group on data communication | 2015
Rohan Gandhi; Hongqiang Harry Liu; Y. Charlie Hu; Guohan Lu; Jitendra Padhye; Lihua Yuan; Ming Zhang
Load balancing is a foundational function of datacenter infrastructures and is critical to the performance of online services hosted in datacenters. As the demand for cloud services grows, expensive and hard-to-scale dedicated hardware load balancers are being replaced with software load balancers that scale using a distributed data plane that runs on commodity servers. Software load balancers offer low cost, high availability and high flexibility, but suffer from high latency and low capacity per load balancer, making them less than ideal for applications that demand high throughput, low latency, or both. In this paper, we present Duet, which offers all the benefits of a software load balancer, along with low latency and high availability -- at next to no cost. We do this by exploiting a hitherto overlooked resource in data center networks -- the switches themselves. We show how to embed the load balancing functionality into existing hardware switches, thereby achieving organic scalability at no extra cost. For flexibility and high availability, Duet seamlessly integrates the switch-based load balancer with a small deployment of software load balancers. We enumerate and solve several architectural and algorithmic challenges involved in building such a hybrid load balancer. We evaluate Duet using a prototype implementation, as well as extensive simulations driven by traces from our production data centers. Our evaluation shows that Duet provides 10x more capacity than a software load balancer, at a fraction of the cost, while reducing latency by a factor of 10 or more, and is able to quickly adapt to network dynamics including failures.
acm special interest group on data communication | 2012
Guohan Lu; Rui Miao; Yongqiang Xiong; Chuanxiong Guo
Commodity switches are becoming increasingly important as they are the basic building blocks for enterprise and data center networks. With the availability of all-in-one switching ASICs, these switches almost universally adopt a single switching ASIC design. However, such a design also brings two major limitations, i.e., a limited forwarding table for flow-based forwarding schemes such as OpenFlow and a shallow buffer for bursty traffic patterns. In this paper, we propose to use the CPU in the switches to handle not only control plane but also data plane traffic. We show that this design can provide a large forwarding table for flow-based forwarding schemes and a deep packet buffer for bursty traffic. We build such a prototype switch on the ServerSwitch platform. In our evaluation, we show that our prototype can achieve an over 90% traffic offloading ratio, absorb large traffic bursts without a single packet drop, and can be easily programmed to detect and defend against low-rate burst attacks.
conference on emerging network experiment and technology | 2012
Haitao Wu; Jiabo Ju; Guohan Lu; Chuanxiong Guo; Yongqiang Xiong; Yongguang Zhang
There have been some serious concerns about TCP performance in data center networks, including the long completion time of short TCP flows in competition with long TCP flows, and the congestion due to TCP incast. In this paper, we show that a properly tuned instant queue length based Explicit Congestion Notification (ECN) at the intermediate switches can alleviate both problems. Compared with previous work, our approach is appealing as it can be supported on current commodity switches with a simple parameter setting and it does not need any modification to the ECN protocol at the end servers. Furthermore, we have observed a dilemma in which a higher ECN threshold leads to higher throughput for long flows whereas a lower threshold leads to more senders on incast under buffer pressure. We address this problem with a switch-modification-only scheme, dequeue marking, which further tunes the instant queue length based ECN to achieve optimal incast performance and long flow throughput with a single threshold value. Our experimental study demonstrates that dequeue marking is effective for increasing the maximum number of incast senders close to the performance limit of ECN, achieving a gain anywhere from 16% to 140%.
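The difference between marking on the instant queue length at enqueue and at dequeue can be sketched with a toy FIFO. This is an illustration only: the class name, the `>=` comparison, and the choice to count only the other packets in the queue are assumptions, since the abstract does not give the exact marking rule.

```python
from collections import deque


class EcnQueue:
    """Toy FIFO that sets an ECN mark from the instantaneous queue length."""

    def __init__(self, threshold, mark_on_dequeue=False):
        self.threshold = threshold
        self.mark_on_dequeue = mark_on_dequeue
        self.queue = deque()

    def enqueue(self, packet_id):
        # Enqueue marking: compare the backlog seen by the arriving packet.
        marked = (not self.mark_on_dequeue) and len(self.queue) >= self.threshold
        self.queue.append((packet_id, marked))

    def dequeue(self):
        packet_id, marked = self.queue.popleft()
        if self.mark_on_dequeue:
            # Dequeue marking: compare the backlog left behind the departing packet.
            marked = len(self.queue) >= self.threshold
        return packet_id, marked


def drain_burst(burst_size, threshold, mark_on_dequeue):
    """Enqueue a burst back to back, then drain it; return the marks in departure order."""
    q = EcnQueue(threshold, mark_on_dequeue)
    for packet_id in range(burst_size):
        q.enqueue(packet_id)
    return [q.dequeue()[1] for _ in range(burst_size)]


if __name__ == "__main__":
    print(drain_burst(5, 3, mark_on_dequeue=False))  # [False, False, False, True, True]
    print(drain_burst(5, 3, mark_on_dequeue=True))   # [True, True, False, False, False]
```

In this toy burst, enqueue marking attaches the congestion signal to the packets at the tail, so it leaves the switch only after the backlog ahead of them drains, whereas dequeue marking puts the signal on the first departing packets with the same single threshold.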
acm special interest group on data communication | 2009
Guohan Lu; Yunfeng Shi; Chuanxiong Guo; Yongguang Zhang
Recently, Data Center Networking (DCN) has attracted much research attention, and innovative DCN designs have been proposed [1, 2]. All these designs need specialized packet forwarding engines due to their special routing algorithms, which are based either on commonly used packet headers or on self-defined ones. Although programmable forwarding devices are available, it is difficult to use them to prototype these DCN designs, especially when self-defined headers are introduced. In this paper, we present a hardware based Configurable pAcket Forwarding Engine (CAFE) to facilitate the prototyping process. Through simple APIs, CAFE can be easily configured to forward self-defined packets and to modify, insert, and delete arbitrary packet header fields without re-designing the hardware. We have implemented CAFE using NetFPGA. Evaluation demonstrates that CAFE can be easily configured and that it can forward packets at line rate.
conference on emerging network experiment and technology | 2012
Jiaxin Cao; Chuanxiong Guo; Guohan Lu; Yongqiang Xiong; Yixin Zheng; Yongguang Zhang; Yibo Zhu; Chen Chen
Reliable Group Data Delivery (RGDD) is a pervasive traffic pattern in data centers. In an RGDD group, a sender needs to reliably deliver a copy of data to all the receivers. Existing solutions either do not scale due to the large number of RGDD groups (e.g., IP multicast) or cannot efficiently use network bandwidth (e.g., end-host overlays). Motivated by recent advances on data center network topology designs (multiple edge-disjoint Steiner trees for RGDD) and innovations on network devices (practical in-network packet caching), we propose Datacast for RGDD. Datacast explores two design spaces: 1) Datacast uses multiple edge-disjoint Steiner trees for data delivery acceleration. 2) Datacast leverages in-network packet caching and introduces a simple soft-state based congestion control algorithm to address the scalability and efficiency issues of RGDD. Our analysis reveals that Datacast congestion control works well with small cache sizes (e.g., 125KB) and causes few duplicate data transmissions (e.g., 1.19%). Both simulations and experiments confirm our theoretical analysis. We also use experiments to compare the performance of Datacast and BitTorrent. In a BCube(4, 1) with 1Gbps links, we use both Datacast and BitTorrent to transmit 4GB data. The link stress of Datacast is 1.01, while it is 1.39 for BitTorrent. By using two Steiner trees, Datacast finishes the transmission in 16.9s, while BitTorrent uses 52s.
IEEE Journal on Selected Areas in Communications | 2013
Jiaxin Cao; Chuanxiong Guo; Guohan Lu; Yongqiang Xiong; Yixin Zheng; Yongguang Zhang; Yibo Zhu; Chen Chen; Ye Tian
Reliable Group Data Delivery (RGDD) is a pervasive traffic pattern in data centers. In an RGDD group, a sender needs to reliably deliver a copy of data to all the receivers. Existing solutions either do not scale due to the large number of RGDD groups (e.g., IP multicast) or cannot efficiently use network bandwidth (e.g., end-host overlays). Motivated by recent advances on data center network topology designs (multiple edge-disjoint Steiner trees for RGDD) and innovations on network devices (practical in-network packet caching), we propose Datacast for RGDD. Datacast explores two design spaces: 1) Datacast uses multiple edge-disjoint Steiner trees for data delivery acceleration. 2) Datacast leverages in-network packet caching and introduces a simple soft-state based congestion control algorithm to address the scalability and efficiency issues of RGDD. Our analysis reveals that Datacast congestion control works well with small cache sizes (e.g., 125KB) and causes few duplicate data transmissions (e.g., 1.19%). Both simulations and experiments confirm our theoretical analysis. We also use experiments to compare the performance of Datacast and BitTorrent. In a BCube(4, 1) with 1Gbps links, we use both Datacast and BitTorrent to transmit 4GB data. The link stress of Datacast is 1.01, while it is 1.39 for BitTorrent. By using two Steiner trees, Datacast finishes the transmission in 16.9s, while BitTorrent uses 52s.
symposium on operating systems principles | 2017
Hongqiang Harry Liu; Yibo Zhu; Jitu Padhye; Jiaxin Cao; Sri Tallapragada; Nuno P. Lopes; Andrey Rybalchenko; Guohan Lu; Lihua Yuan
Network reliability is critical for large clouds and online service providers like Microsoft. Our network is large, heterogeneous, complex and undergoes constant churn. In such an environment even small issues triggered by device failures, buggy device software, configuration errors, unproven management tools and unavoidable human errors can quickly cause large outages. A promising way to minimize such network outages is to proactively validate all network operations in a high-fidelity network emulator, before they are carried out in production. To this end, we present CrystalNet, a cloud-scale, high-fidelity network emulator. It runs real network device firmware in a network of containers and virtual machines, loaded with production configurations. Network engineers can use the same management tools and methods to interact with the emulated network as they do with a production network. CrystalNet can handle heterogeneous device firmware and can scale to emulate thousands of network devices in a matter of minutes. To reduce resource consumption, it carefully selects a boundary of emulations, while ensuring correctness of propagation of network changes. Microsoft's network engineers use CrystalNet on a daily basis to test planned network operations. Our experience shows that CrystalNet enables operators to detect many issues that could trigger significant outages.
