Kang Xi
New York University
Publication
Featured research published by Kang Xi.
conference on computer communications workshops | 2011
Adrian Sai-Wah Tam; Kang Xi; H. Jonathan Chao
In a data center network, for example, it is common to use controllers to manage resources in a centralized manner. Centralized control, however, imposes a scalability problem. In this paper, we investigate the use of multiple independent controllers instead of a single omniscient controller to manage resources. Each controller looks after only a portion of the network, but together they cover the whole network, which addresses the scalability problem. We use flow allocation as an example of how this approach can manage bandwidth use in a distributed manner. The focus is on how to assign components of the network to the controllers so that (1) each controller only needs to look after a small part of the network, but (2) at least one controller can answer any request. We outline a way to configure the controllers to fulfill these requirements as a proof that the use of devolved controllers is feasible. We also discuss several issues related to such an implementation.
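Below is a minimal Python sketch (not the paper's configuration method) of checking the two stated requirements for devolved controllers: every link is covered by at least one controller, and no controller is responsible for more than a small fraction of the network. The function names, variable names, and coverage threshold are illustrative assumptions.

```python
# A minimal sketch, assuming links are the "components" assigned to controllers.
def check_assignment(links, assignment, max_fraction=0.5):
    """links: iterable of (u, v) pairs; assignment: {controller: set of links}."""
    all_links = set(links)
    covered = set().union(*assignment.values()) if assignment else set()

    # Requirement (2): every link is handled by at least one controller.
    fully_covered = covered >= all_links

    # Requirement (1): no controller is responsible for too large a region.
    small_enough = all(
        len(region) <= max_fraction * len(all_links)
        for region in assignment.values()
    )
    return fully_covered and small_enough


if __name__ == "__main__":
    links = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
    assignment = {
        "ctrl1": {("a", "b"), ("b", "c")},
        "ctrl2": {("c", "d"), ("d", "a")},
    }
    print(check_assignment(links, assignment))  # True: covered and balanced
```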
international conference on communications | 2013
Kuan Yin Chen; Yang Xu; Kang Xi; H. Jonathan Chao
An important challenge of running large-scale cloud services in a geo-distributed cloud system is minimizing the overall operating cost. The operating cost of such a system includes two major components: electricity cost and wide-area-network (WAN) communication cost. While the WAN communication cost is minimized when all virtual machines (VMs) are placed in one datacenter, the high workload at one location requires extra power for cooling facilities and results in worse power usage effectiveness (PUE). In this paper, we develop a model to capture the intrinsic trade-off between electricity and WAN communication costs and formulate the optimal VM placement problem, which is NP-hard due to its binary and quadratic nature. While exhaustive search is not feasible for large-scale scenarios, heuristics that minimize only one of the two cost terms yield less optimized results. We propose a cost-aware two-phase metaheuristic algorithm, Cut-and-Search, that approximates the best trade-off point between the two cost terms. We evaluate Cut-and-Search by simulating it over multiple cloud service patterns. The results show that the operating cost has great potential for improvement via optimal VM placement. Cut-and-Search achieves a highly optimized trade-off point within reasonable computation time, outperforming random placement by 50% and the partial-optimizing heuristics by 10-20%.
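The trade-off being modeled can be illustrated with a toy cost function: WAN cost is incurred only by traffic between VMs placed in different datacenters, while electricity cost grows with per-datacenter load through a worsening PUE. The sketch below, including the linear PUE model and all numbers, is an illustrative assumption, not Cut-and-Search or the paper's exact formulation.

```python
# A toy placement-cost evaluator: total cost = electricity + WAN communication.
def total_cost(placement, traffic, elec_price, base_pue=1.2, pue_slope=0.01):
    """placement: {vm: dc}; traffic: {(vm_i, vm_j): volume}; elec_price: {dc: price}."""
    # Electricity: per-datacenter load with a PUE that worsens as load grows.
    load = {}
    for vm, dc in placement.items():
        load[dc] = load.get(dc, 0) + 1
    electricity = sum(
        n * elec_price[dc] * (base_pue + pue_slope * n) for dc, n in load.items()
    )

    # WAN cost: only traffic between VMs in different datacenters is billed.
    wan = sum(
        vol for (i, j), vol in traffic.items() if placement[i] != placement[j]
    )
    return electricity + wan


if __name__ == "__main__":
    placement = {"vm1": "dc_east", "vm2": "dc_east", "vm3": "dc_west"}
    traffic = {("vm1", "vm2"): 5.0, ("vm2", "vm3"): 2.0}
    price = {"dc_east": 1.0, "dc_west": 0.7}
    print(total_cost(placement, traffic, price))
```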
Archive | 2013
Kang Xi; Yu-Hsiang Kao; H. Jonathan Chao
As critical infrastructure in the Internet, data centers have evolved to include hundreds of thousands of servers in a single facility to support data- and computing-intensive applications. For such large-scale systems, it becomes a great challenge to design an interconnection network that provides high capacity, low complexity, and low latency. The traditional approach is to build a hierarchical packet network using switches and routers. This approach has scalability problems in terms of wiring, control, and latency. We tackle the challenge by designing a novel switch architecture that supports direct interconnection of a huge number of server racks and provides Petabit switching capacity. Our design combines the best features of electronics and optics. Exploiting recent advances in optics, we propose to build a bufferless optical switch fabric that consists of interconnected arrayed waveguide grating routers (AWGRs) and tunable wavelength converters (TWCs). The optical fabric is integrated with electronic buffering and control to perform high-speed switching with nanosecond-level reconfiguration overhead. In particular, our architecture reduces the wiring complexity from O(N) to O(sqrt(N)). We design a practical and scalable scheduling algorithm that achieves high throughput under various traffic loads. We also discuss implementation issues to demonstrate the feasibility of this design. Simulation results show that our design achieves good throughput and delay performance.
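To give a feel for the O(N)-to-O(sqrt(N)) wiring claim, the back-of-the-envelope sketch below assumes a simple grouping of N racks into roughly sqrt(N) clusters, each connected to the central fabric by one bundle; this grouping model is an assumption for illustration only, not the paper's wiring scheme.

```python
# Rough comparison of flat vs. grouped cabling toward a central fabric.
import math

def wiring_counts(n_racks):
    flat = n_racks                  # one cable per rack straight to the fabric
    clusters = math.isqrt(n_racks)  # ~sqrt(N) clusters of ~sqrt(N) racks each
    grouped = clusters              # one bundle per cluster to the fabric
    return flat, grouped

if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        flat, grouped = wiring_counts(n)
        print(f"N={n}: ~{flat} flat cables vs. ~{grouped} cluster bundles")
```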
acm special interest group on data communication | 2014
Bo Yan; Yang Xu; Hongya Xing; Kang Xi; H. Jonathan Chao
Software-Defined Networking (SDN) enables flexible flow control by caching policy rules at OpenFlow switches. Compared with exact-match rule caching, wildcard rule caching can better preserve the flow table space at switches. However, one of the challenges of wildcard rule caching is the dependency between rules, which arises when caching wildcard rules that overlap in field space with different priorities. Failure to handle rule dependency may lead to wrong matching decisions for newly arrived flows or introduce high storage overhead in flow table memory. In this paper, we propose a wildcard rule caching system for SDN named CAching in Buckets (CAB). The main idea of CAB is to partition the field space into logical structures called buckets and cache buckets along with all the associated rules. Through CAB, we resolve the rule dependency problem with small storage overhead. Compared with previous schemes, CAB reduces flow setup requests by an order of magnitude, cuts control bandwidth in half, and significantly reduces the average flow setup time.
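A minimal sketch of the bucket idea, under simplifying assumptions (a one-dimensional field space, interval rules, illustrative names): a cache miss installs the matching bucket together with every rule that overlaps it, so a higher-priority overlapping rule is never bypassed. This is not CAB's actual data structures or OpenFlow encoding.

```python
BUCKET_SIZE = 16  # partition a 1-D field space [0, 256) into fixed-size buckets

def bucket_of(value):
    return value // BUCKET_SIZE

def rules_for_bucket(rules, b):
    """rules: list of (priority, lo, hi, action); return all rules overlapping bucket b."""
    lo, hi = b * BUCKET_SIZE, (b + 1) * BUCKET_SIZE
    return [r for r in rules if r[1] < hi and r[2] > lo]  # interval overlap test

class BucketCache:
    def __init__(self, rules):
        self.rules = sorted(rules, reverse=True)  # highest priority first
        self.cached_buckets = {}                  # bucket id -> overlapping rules

    def lookup(self, value):
        b = bucket_of(value)
        if b not in self.cached_buckets:          # "flow setup": fetch bucket + rules
            self.cached_buckets[b] = rules_for_bucket(self.rules, b)
        for prio, lo, hi, action in self.cached_buckets[b]:
            if lo <= value < hi:
                return action
        return "default"

if __name__ == "__main__":
    rules = [(10, 0, 64, "drop"), (5, 32, 128, "forward")]
    cache = BucketCache(rules)
    print(cache.lookup(40))   # "drop": the higher-priority overlapping rule wins
    print(cache.lookup(100))  # "forward"
```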
international conference on computer communications | 2015
Cing Yu Chu; Kang Xi; Min Luo; H. Jonathan Chao
As service providers have started deploying SDN in their networks, traditional IP routers are gradually being upgraded to SDN-enabled switches. In other words, the network will have traditional IP routers and SDN switches coexisting, which is called a hybrid SDN network. In such a network, we take advantage of SDN and propose an approach to guarantee traffic reachability in the presence of any single link failure. By redirecting traffic on the failed link to SDN switches through pre-configured IP tunnels, the proposed approach is able to react to failures very quickly. With the help of coordination among SDN switches, we are also able to explore multiple backup paths for failure recovery. This allows the proposed approach to avoid potential congestion in the post-recovery network by choosing proper backup paths. Simulation results show that our scheme requires only a small number of SDN switches in the hybrid SDN network to achieve fast recovery and guarantee 100% reachability under any single link failure. The results also show that the proposed approach load-balances the post-recovery network better than IP Fast Reroute and shortest path re-calculation.
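The recovery behavior can be sketched as a lookup into tables configured at planning time: each protected link maps to a pre-established IP tunnel toward an SDN switch and a backup path. The router and switch names below are hypothetical, and the coordination and congestion-aware path selection are omitted.

```python
# Pre-configured at planning time: protected link -> (tunnel endpoint, backup path).
BACKUP_TUNNELS = {
    ("r1", "r2"): {"sdn_switch": "s1", "backup_path": ["r1", "r3", "s1", "r2"]},
    ("r2", "r4"): {"sdn_switch": "s2", "backup_path": ["r2", "s2", "r4"]},
}

def on_link_failure(failed_link, packets_on_link):
    """Redirect the failed link's traffic into its pre-configured tunnel."""
    entry = BACKUP_TUNNELS.get(failed_link)
    if entry is None:
        raise RuntimeError(f"no backup configured for {failed_link}")
    for pkt in packets_on_link:
        # Tag each packet with the tunnel endpoint and pre-computed backup path.
        pkt["tunnel_to"] = entry["sdn_switch"]
        pkt["path"] = entry["backup_path"]
    return packets_on_link

if __name__ == "__main__":
    pkts = [{"dst": "r4"}, {"dst": "r2"}]
    print(on_link_failure(("r1", "r2"), pkts))
```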
high performance switching and routing | 2014
Junjie Zhang; Kang Xi; Min Luo; H. Jonathan Chao
Classical traffic engineering (TE) methods calculate the optimal routing based on a single traffic matrix. However, they are unable to handle unexpected traffic changes. Thus, it is of interest to find a good routing configuration that accommodates multiple possible traffic scenarios. There are two major approaches to achieving load balancing for multiple traffic matrices: destination-based routing and explicit routing. It has been shown that explicit routing performs better than destination-based routing for multiple traffic matrices. However, explicit routing has high complexity and requires large Ternary Content Addressable Memory (TCAM) in the routers; thus, it is power hungry and unscalable. This paper presents an approach called hybrid routing to achieve load balancing for multiple traffic matrices with low complexity and good scalability. Our basic idea is to complement destination-based routing with a small number of explicit forwarding entries to take advantage of both routing approaches. Hybrid routing greatly reduces the number of forwarding entries compared with pure explicit routing. This is of great practical value because the scheme requires very little TCAM to implement. Hybrid routing is well suited to implementation using SDN. A heuristic algorithm is developed to obtain a near-optimal hybrid routing configuration. Extensive evaluation demonstrates the effectiveness of hybrid routing. The results show that hybrid routing achieves near-optimal load balancing compared with pure explicit routing. In particular, hybrid routing saves at least 84.6% of TCAM resources in all practical networks used in our evaluation.
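The forwarding-plane side of hybrid routing can be sketched as a two-table lookup: a small explicit (source, destination) table is consulted first, and the ordinary destination-based table handles everything else. The entries and port names below are hypothetical; the heuristic that decides which explicit entries to install is not shown.

```python
# Ordinary destination-based table (one next hop per destination prefix).
DEST_TABLE = {"10.0.1.0/24": "portA", "10.0.2.0/24": "portB"}

# A few explicit entries (e.g., installed in TCAM via SDN), consulted first.
EXPLICIT_TABLE = {("10.0.9.0/24", "10.0.2.0/24"): "portC"}

def next_hop(src_prefix, dst_prefix):
    # Explicit routing entries take precedence over destination-based ones.
    if (src_prefix, dst_prefix) in EXPLICIT_TABLE:
        return EXPLICIT_TABLE[(src_prefix, dst_prefix)]
    return DEST_TABLE.get(dst_prefix, "default_port")

if __name__ == "__main__":
    print(next_hop("10.0.1.0/24", "10.0.2.0/24"))  # portB (destination-based)
    print(next_hop("10.0.9.0/24", "10.0.2.0/24"))  # portC (explicit override)
```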
international workshop on quality of service | 2012
Adrian Sai-Wah Tam; Kang Xi; Yang Xu; H. Jonathan Chao
Incast applications have grown in popularity with the advancement of data center technology. TCP incast can suffer from a throughput collapse problem, a consequence of TCP retransmission timeouts when the bottleneck buffer is overwhelmed and packets are lost. This is critical to the Quality of Service of cloud computing applications. While previous literature has proposed solutions, the problem is still not completely solved. In this paper, we investigate three root causes of the poor performance of TCP incast flows and propose three solutions, one each for the beginning, the middle, and the end of a TCP connection. The three solutions are: admission control of TCP flows so that the flow population does not exceed the network's capacity; timestamp-based retransmission to detect the loss of retransmitted packets; and reiterated FIN packets to keep the TCP connection active until the termination of a session is acknowledged. The orchestration of these solutions prevents the throughput collapse. The main idea is to ensure that all ongoing TCP incast flows maintain self-clocking, thus eliminating the need to resort to retransmission timeouts for recovery. We evaluate these solutions and find that they work well in preventing retransmission timeouts of TCP incast flows, and hence also the throughput collapse.
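As an illustration of the first fix, the sketch below implements a simple admission controller that caps the number of concurrently active incast senders based on a rough buffer-capacity estimate. The capacity formula and class names are illustrative assumptions, not the paper's exact mechanism.

```python
from collections import deque

class IncastAdmission:
    def __init__(self, buffer_bytes, per_flow_window_bytes):
        # Rough estimate of how many flows the bottleneck buffer can absorb.
        self.max_concurrent = max(1, buffer_bytes // per_flow_window_bytes)
        self.active = set()
        self.waiting = deque()

    def request(self, flow_id):
        """Admit the flow now, or queue it until a slot frees up."""
        if len(self.active) < self.max_concurrent:
            self.active.add(flow_id)
            return True
        self.waiting.append(flow_id)
        return False

    def finished(self, flow_id):
        self.active.discard(flow_id)
        if self.waiting:
            self.active.add(self.waiting.popleft())

if __name__ == "__main__":
    ctrl = IncastAdmission(buffer_bytes=64 * 1024, per_flow_window_bytes=16 * 1024)
    print([ctrl.request(f) for f in ("f1", "f2", "f3", "f4", "f5")])
    # [True, True, True, True, False] -- the fifth sender waits its turn
```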
global communications conference | 2009
Kang Xi; H. Jonathan Chao
Failure recovery using IP fast reroute (IPFRR) has gained much attention recently. The basic idea is to find backup paths and configure the routing tables in advance. After a failure is detected, the pre-determined backup paths are used immediately to forward the affected packets. Since the calculation and configuration are performed in advance, recovery can be completed very quickly. IPFRR is considered a promising approach to enhance the survivability of IP networks. While single-failure recovery has been extensively researched, using IPFRR for double-link failure recovery remains a great challenge. We propose a solution to this problem called Efficient SCan for Alternate Paths for double-link failure recovery (ESCAP-DL). ESCAP-DL guarantees 100% coverage of both single and double-link failures and has the advantages of low complexity and low resource requirements. The scheme resumes packet forwarding immediately after failures are detected and does not require failure advertising throughout the network.
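The pre-computed backup-path idea can be sketched as follows: a router keeps, alongside its primary next hops, a backup next hop per (destination, failed link) pair installed in advance, and switches to it the moment the local failure is detected. The tables and node names below are hypothetical, and the double-link logic of ESCAP-DL is not reproduced here.

```python
NODE = "r1"                                      # the local router
PRIMARY = {"dst_x": "r2", "dst_y": "r4"}         # normal next hop per destination
BACKUP = {("dst_x", ("r1", "r2")): "r3",         # backup if link r1-r2 is down
          ("dst_y", ("r1", "r4")): "r5"}         # backup if link r1-r4 is down

def next_hop(dst, failed_links):
    """Use the pre-installed backup next hop when the primary link has failed."""
    primary = PRIMARY[dst]
    link = (NODE, primary)
    if link in failed_links:
        return BACKUP[(dst, link)]   # no re-convergence or failure advertising needed
    return primary

if __name__ == "__main__":
    print(next_hop("dst_x", set()))               # r2 (primary)
    print(next_hop("dst_x", {("r1", "r2")}))      # r3 (pre-computed backup)
```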
global communications conference | 2009
Adrian Sai-Wah Tam; Kang Xi; H. Jonathan Chao
We propose a fast reroute scheme for single-link failure protection of IP multicast networks. A multicast network connects nodes as a spanning tree, which is vulnerable to link failures. Our scheme uses ring topologies to ensure that a backup path always exists for any single link failure and that only a few nodes in the neighborhood of the failed link have to react to the failure. Compared with other solutions, our scheme has four advantages: (1) the multicast tree can be arbitrary; (2) there is only minimal disruption and no packet loss; (3) failure protection is available to a wider set of network topologies; and (4) only a small number of multicast nodes are involved and the number of spare links used is also small.
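For intuition, the sketch below shows the generic ring-protection principle such schemes build on: nodes on a ring can reach each other in two directions, so when a ring link fails, traffic simply goes the long way around. This is a heavily simplified illustration, not the paper's multicast tree protection algorithm.

```python
def ring_backup_path(ring, u, v):
    """ring: nodes in cyclic order; return a path from u to v avoiding the direct link (u, v)."""
    n = len(ring)
    i, j = ring.index(u), ring.index(v)
    if (i + 1) % n == j:          # failed link is the forward hop u -> v
        return [ring[(i - k) % n] for k in range(n)]
    if (j + 1) % n == i:          # failed link is the backward hop v -> u
        return [ring[(i + k) % n] for k in range(n)]
    raise ValueError("u and v are not adjacent on the ring")

if __name__ == "__main__":
    ring = ["a", "b", "c", "d"]
    print(ring_backup_path(ring, "a", "b"))  # ['a', 'd', 'c', 'b'] -- around the ring
```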
high performance interconnects | 2010
Adrian Sai-Wah Tam; Kang Xi; H. Jonathan Chao
Data center networks (DCNs) usually have a regular topology providing high bandwidth and low latency over multiple paths. For data-intensive applications on DCNs, it is important to be able to use all the available bandwidth. However, this is not easy to achieve because of the locality of congestion information. In this paper, we propose a reactive reroute strategy that works with the IEEE 802.1Qau standard on Ethernet-based DCNs, so that we can exploit the multipath property of a network to reduce congestion and hence improve performance. We show by simulation that the throughput is significantly higher and the number of out-of-order packets induced by rerouting is limited.
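A minimal sketch of a reactive reroute loop under illustrative assumptions: each flow has several candidate paths through the DCN, and receiving a congestion notification (e.g., an 802.1Qau congestion notification message) for the current path moves the flow to an alternative. The random path selection here is a placeholder, not the paper's actual strategy.

```python
import random

class ReactiveRerouter:
    def __init__(self, paths_per_flow):
        self.paths = paths_per_flow          # flow_id -> list of candidate paths
        self.current = {f: p[0] for f, p in paths_per_flow.items()}

    def on_congestion_notification(self, flow_id):
        """Move the congested flow onto a different candidate path."""
        alternatives = [p for p in self.paths[flow_id] if p != self.current[flow_id]]
        if alternatives:
            self.current[flow_id] = random.choice(alternatives)
        return self.current[flow_id]

if __name__ == "__main__":
    rerouter = ReactiveRerouter({"f1": [["s1", "a1", "c1", "a2", "s2"],
                                        ["s1", "a1", "c2", "a2", "s2"]]})
    print(rerouter.current["f1"])                   # initial path
    print(rerouter.on_congestion_notification("f1"))  # rerouted path
```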