Publication


Featured research published by Guangdeng Liao.


Architectures for Networking and Communications Systems | 2008

Software techniques to improve virtualized I/O performance on multi-core systems

Guangdeng Liao; Danhua Guo; Laxmi N. Bhuyan; Steve R. King

Virtualization technology is now widely deployed on high performance networks such as 10-Gigabit Ethernet (10GE). It offers useful features like functional isolation, manageability and live migration. Unfortunately, the overhead of network I/O virtualization significantly degrades the performance of network-intensive applications. Two major sources of I/O performance loss are the extra driver domain that processes I/O requests and the extra scheduler inside the virtual machine monitor (VMM) that schedules domains. In this paper, we first examine the negative effect of virtualization in multi-core platforms with 10GE networking. We study virtualization overhead and develop two optimizations for the VMM scheduler to improve I/O performance. The first solution uses cache-aware scheduling to reduce inter-domain communication cost. The second solution steals scheduler credits to favor I/O VCPUs in the driver domain. We also propose two optimizations to improve packet processing in the driver domain. First, we re-design a simple bridge for more efficient switching of packets. Second, we develop a patch to make the transmit (TX) queue length in the driver domain configurable and adaptable to 10GE networks. Using all the above techniques, our experiments show that virtualized I/O bandwidth can be increased by 96%. Our optimizations also improve efficiency, saving 36% in core utilization per gigabit. All the optimizations are based on pure software approaches and do not hinder live migration. We believe that the findings from our study will be useful in guiding future VMM development.
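The credit-stealing idea above can be illustrated with a toy model: the I/O VCPU in the driver domain borrows scheduler credits from compute VCPUs so it runs sooner. This is a minimal sketch; the class and function names are illustrative, not Xen's actual scheduler API.

```python
# Toy credit scheduler: I/O VCPUs "steal" credits from compute VCPUs
# so they win the next scheduling decision. Purely illustrative.

class VCPU:
    def __init__(self, name, credits, is_io=False):
        self.name = name
        self.credits = credits
        self.is_io = is_io

def steal_credits(vcpus, amount):
    """Move up to `amount` credits from each compute VCPU to the I/O VCPUs."""
    io_vcpus = [v for v in vcpus if v.is_io]
    if not io_vcpus:
        return
    for v in vcpus:
        if not v.is_io:
            taken = min(amount, v.credits)
            v.credits -= taken
            # Split the stolen credits evenly among I/O VCPUs.
            share = taken // len(io_vcpus)
            for io in io_vcpus:
                io.credits += share

def pick_next(vcpus):
    """A credit scheduler runs the VCPU with the most remaining credits."""
    return max(vcpus, key=lambda v: v.credits)

vcpus = [VCPU("dom1-cpu", 100), VCPU("dom2-cpu", 100),
         VCPU("dom0-io", 50, is_io=True)]
steal_credits(vcpus, 40)
print(pick_next(vcpus).name)  # the I/O VCPU now wins the CPU
```

Favoring the I/O VCPU this way shortens the time network packets wait in the driver domain before being delivered to guest domains.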


Architectures for Networking and Communications Systems | 2008

A scalable multithreaded L7-filter design for multi-core servers

Danhua Guo; Guangdeng Liao; Laxmi N. Bhuyan; Bin Liu; Jianxun Jason Ding

L7-filter is a significant component in Linux's QoS framework that classifies network traffic based on application layer data. It enables subsequent distribution of network resources according to the priority of applications. Considerable research has been reported on deploying multi-core architectures for computationally intensive applications. Unfortunately, the proliferation of multi-core architectures has not helped fast packet processing due to: 1) the lack of efficient parallelism in legacy network programs, and 2) the non-trivial configuration required for scalable utilization of multi-core servers. In this paper, we propose a highly scalable parallelized L7-filter system architecture with affinity-based scheduling on a multi-core server. We start with an analytical study of the system architecture based on an offline design. Similar to Receive Side Scaling (RSS) in the NIC, we develop a model to exploit connection level parallelism in L7-filter and propose an affinity-based scheduler to optimize system scalability. Performance results show that our optimized L7-filter has superior scalability over the naive multithreaded version. It improves system performance by about 50% when all the cores are deployed.
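Connection level parallelism of the kind described above can be sketched in a few lines: every packet of a flow hashes to the same worker, much like RSS steers flows to NIC queues. The tuple format, hash choice, and core count below are illustrative assumptions, not the L7-filter implementation.

```python
# Sketch of connection-level parallelism: hash the 5-tuple so all
# packets of one connection land on the same core, avoiding cross-core
# locking of per-connection classifier state. Illustrative only.
import zlib

NUM_CORES = 4

def core_for_connection(src_ip, src_port, dst_ip, dst_port, proto):
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}/{proto}".encode()
    return zlib.crc32(key) % NUM_CORES

# Two packets of the same flow always map to the same core.
a = core_for_connection("10.0.0.1", 1234, "10.0.0.2", 80, "tcp")
b = core_for_connection("10.0.0.1", 1234, "10.0.0.2", 80, "tcp")
assert a == b and 0 <= a < NUM_CORES
```

Because the mapping is deterministic, per-connection state (such as the classifier's partial-match state) stays core-local, which is the property the affinity-based scheduler exploits.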


IEEE International Symposium on Workload Characterization | 2009

Performance characterization and cache-aware core scheduling in a virtualized multi-core server under 10GbE

Danhua Guo; Guangdeng Liao; Laxmi N. Bhuyan

Virtual Machine (VM) technology is experiencing a resurgence of interest as ubiquitous multi-core processors have become the de facto configuration on modern web servers. Multi-core servers potentially provide sufficient physical resources to realize the benefits of VMs, including performance isolation, manageability and scalability. However, the network performance of virtualized multi-core servers falls short of expectations. It is therefore important to understand the overhead implications. In this paper, we evaluate the network performance of a virtualized multi-core server using a TCP streaming microbenchmark (Iperf) and SPECweb2005. We first motivate our research by presenting the performance gap between native and virtualized environments. We then break down the overhead from an architectural viewpoint and show that the cache topology greatly influences performance. We also profile the Virtual Machine Monitor (VMM) at the function level to illustrate that functions in the current version of the Xen scheduler are the major contributors to the poor utilization of the cache topology. Consequently, we implement a static onloading scheme to separate interrupt handling from application processes and execute them on cores with cache affinity. Based on the observed benefits, we modify the Xen scheduler to migrate virtual CPUs dynamically to exploit the cache topology. Our results show that VM performance improves by an average of 12% for Iperf and 15% for SPECweb2005.
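The cache-affine placement described above boils down to choosing a core that shares a last-level cache with the core handling NIC interrupts. A minimal sketch, assuming a made-up two-socket topology map (the real scheduler would read the topology from the hardware):

```python
# Sketch of cache-aware core selection: run the application (or VCPU)
# on an idle core that shares an LLC with the interrupt-handling core.
# The core-to-LLC map below is an illustrative two-socket layout.

LLC_OF = {0: 0, 1: 0, 2: 0, 3: 0,   # socket 0, shared LLC 0
          4: 1, 5: 1, 6: 1, 7: 1}   # socket 1, shared LLC 1

def cache_affine_core(irq_core, busy_cores):
    """Pick an idle core sharing an LLC with the interrupt core."""
    for core, llc in LLC_OF.items():
        if llc == LLC_OF[irq_core] and core != irq_core and core not in busy_cores:
            return core
    return None   # no idle sharer; the caller must fall back

print(cache_affine_core(irq_core=0, busy_cores={1}))  # -> 2
```

Placing the two on LLC-sharing cores lets packet data warmed by the interrupt handler be consumed from the shared cache instead of being re-fetched from memory, which is the effect the 12-15% improvement comes from.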


Design Automation Conference | 2010

A new IP lookup cache for high performance IP routers

Guangdeng Liao; Heeyeol Yu; Laxmi N. Bhuyan

IP lookup is on the critical data path of a high speed router. In this paper, we propose a new on-chip IP cache architecture for high performance IP lookup. We design the IP cache along two important axes: cache indexing and cache replacement policies. First, we study the performance of various hash functions and employ 2-universal hashing for our IP cache. Second, coupled with our cache indexing scheme, we present a progressive cache replacement policy that considers Internet traffic characteristics. Our experiments with IP traces show that our IP cache reduces the miss ratio by 15%, and that a small 32KB IP cache can achieve routing throughput as high as 2Tbps.
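2-universal hashing, as used for the cache indexing above, is the classic Carter-Wegman family h(x) = ((a*x + b) mod p) mod m with random a, b and a prime p. A minimal sketch; the prime, set count, and seed are illustrative, not the paper's hardware parameters.

```python
# Sketch of a 2-universal hash index for an IP cache. Any two distinct
# IPs collide with probability at most 1/m over the random choice of
# (a, b), which spreads skewed IP traffic evenly across cache sets.
import random

P = 2**61 - 1   # Mersenne prime, larger than any 32-bit IP address
M = 512         # number of cache sets (illustrative)

random.seed(0)                  # fixed seed for a reproducible example
a = random.randrange(1, P)
b = random.randrange(0, P)

def cache_index(ip_as_int):
    return ((a * ip_as_int + b) % P) % M

idx = cache_index(0xC0A80001)   # 192.168.0.1 as an integer
assert 0 <= idx < M
```

In hardware the multiply-mod structure is what makes this family attractive: it needs only a multiplier and adder, unlike table-based hashes, while still giving provable collision bounds.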


Architectures for Networking and Communications Systems | 2010

A new TCB cache to efficiently manage TCP sessions for web servers

Guangdeng Liao; Laxmi N. Bhuyan; Wei Wu; Heeyeol Yu; Steve R. King

TCP/IP, the most commonly used network protocol, consumes a significant portion of time in Internet servers. While a wide spectrum of studies, such as TOE and Direct Cache Access, has been done to reduce its processing overhead, most of them worked solely from the per-packet perspective and concentrated on the packet memory access overhead. They ignored the per-session TCP Control Block (TCB) data, which poses a challenge in web servers with a large volume of concurrent sessions. In this paper, we begin by characterizing this challenge and show that TCB data must be efficiently managed. We propose a new TCB cache, addressed by session identifiers, to address the challenge. We carefully design the TCB cache along two important axes: cache indexing and cache replacement policies. First, we study the performance of various hash functions and propose a new indexing scheme for the TCB cache that employs two universal hash functions. We analyze session identifiers and choose some important bits as indexing bits to reduce hashing hardware complexity. Second, by leveraging characteristics of web sessions, we design a speculative cache replacement policy that works effectively on a TCB cache with two cache banks. Experimental results show that the new cache efficiently manages the per-session data. When it is used in TOEs or integrated into CPUs to manage per-session data, TCP/IP processing time is significantly reduced, thus reducing web server response time.
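The two-bank, two-hash structure described above can be modeled in miniature: a session identifier is indexed into each bank by an independent hash, and a lookup probes both. This is a loose sketch; the hash functions and sizes are illustrative, and the simple replace-in-bank-0 fallback stands in for the paper's speculative replacement policy.

```python
# Toy two-bank TCB cache: each session identifier maps to one set per
# bank via independent hashes; lookups probe both banks. Illustrative
# model, not the paper's hardware design.
import zlib

BANK_SETS = 256

def h1(sid): return zlib.crc32(sid.encode()) % BANK_SETS
def h2(sid): return zlib.adler32(sid.encode()) % BANK_SETS

class TCBCache:
    def __init__(self):
        self.banks = [dict(), dict()]   # set index -> (sid, tcb)

    def lookup(self, sid):
        for bank, h in ((0, h1), (1, h2)):
            entry = self.banks[bank].get(h(sid))
            if entry and entry[0] == sid:
                return entry[1]
        return None

    def insert(self, sid, tcb):
        # Prefer an empty set in either bank; otherwise replace in
        # bank 0 (a stand-in for the speculative policy).
        for bank, h in ((0, h1), (1, h2)):
            if h(sid) not in self.banks[bank]:
                self.banks[bank][h(sid)] = (sid, tcb)
                return
        self.banks[0][h1(sid)] = (sid, tcb)

cache = TCBCache()
cache.insert("10.0.0.1:1234->10.0.0.2:80", {"state": "ESTABLISHED"})
assert cache.lookup("10.0.0.1:1234->10.0.0.2:80")["state"] == "ESTABLISHED"
```

Using two independent hashes gives each session two candidate sets, which sharply reduces conflict misses when many concurrent sessions would otherwise collide in a single-indexed cache.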


High Performance Interconnects | 2010

Understanding Power Efficiency of TCP/IP Packet Processing over 10GbE

Guangdeng Liao; Xia Zhu; Steen Larsen; Laxmi N. Bhuyan; Ram Huggahalli

With the rapid evolution of network speed from 1Gbps to 10Gbps, a wide spectrum of research has been done on TCP/IP to improve its processing efficiency on general purpose processors. However, most of it studied only the performance perspective and ignored power efficiency. As power has become a major concern in data centers, where servers are often interconnected with 10GbE, it becomes critical to understand the power efficiency of TCP/IP packet processing over 10GbE. In this paper, we extensively examine the power consumption of TCP/IP packet processing over 10GbE on Intel Nehalem platforms across a range of I/O sizes by using a power analyzer. In order to understand the power consumption, we use an external Data Acquisition System (DAQ) to obtain a breakdown of power consumption for individual hardware components such as the CPU, memory and NIC. In addition, as integrated NIC architectures are gaining more attention in high-end servers, we also study the power consumption of TCP/IP packet processing on an integrated NIC by using a Sun Niagara 2 processor with two integrated 10GbE NICs. We carefully compare the power efficiency of an integrated NIC with that of a PCI-E based discrete NIC. We make several new observations: 1) Unlike 1GbE NICs, 10GbE NICs have high idle power dissipation, and TCP/IP packet processing over 10GbE consumes significant dynamic power. 2) Our power breakdown reveals that the CPU is the major source of dynamic power consumption, followed by memory. As the I/O size increases, CPU power consumption falls but memory power consumption grows. Compared to the CPU and memory, the NIC has low dynamic power consumption. 3) Large I/O sizes are much more power efficient than small I/O sizes. 4) While integrating a 10GbE NIC slightly increases CPU power consumption, it not only reduces system idle power dissipation due to the elimination of the PCI-E interface in the NIC, but also achieves dynamic power savings due to better processing efficiency. Our studies motivate us to design a more power efficient server architecture for next generation data centers.


Architectures for Networking and Communications Systems | 2009

EINIC: an architecture for high bandwidth network I/O on multi-core processors

Guangdeng Liao; Laxmi N. Bhuyan; Danhua Guo; Steve R. King

This paper proposes a new server architecture, EINIC (Enhanced Integrated NIC), for multi-core processors to tackle the mismatch between network speed and host computational capacity. Similar to prior work, EINIC integrates a redesigned NIC onto a CPU. However, we extend the integrated NIC (INIC) to multi-core platforms and examine its behavior with network receive optimizations. Additionally, by exploiting the NIC's proximity to CPUs, we also design an I/O-aware last level shared cache (LLC). Our I/O-aware design allows us to split the cache into an I/O cache and a general cache in a flexible way, mitigating cache interference between network and non-network data. Our simulation results show that EINIC not only addresses the mismatch but also mitigates cache interference.
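The split-LLC idea above can be illustrated with a tiny partitioned-cache model: I/O (network) data and general data each get their own capacity, so one can never evict the other. A dict-per-partition sketch with FIFO eviction, purely illustrative of the isolation property rather than the paper's cache design.

```python
# Toy split LLC: two partitions with separate capacities; eviction
# happens only within a partition, so network bursts cannot flush
# application working sets. Illustrative model only.

class SplitLLC:
    def __init__(self, io_lines, general_lines):
        self.caps = {"io": io_lines, "general": general_lines}
        self.parts = {"io": {}, "general": {}}
        self.order = {"io": [], "general": []}   # FIFO eviction order

    def access(self, part, addr):
        cache = self.parts[part]
        if addr in cache:
            return "hit"
        if len(cache) >= self.caps[part]:        # evict within partition
            victim = self.order[part].pop(0)
            del cache[victim]
        cache[addr] = True
        self.order[part].append(addr)
        return "miss"

llc = SplitLLC(io_lines=2, general_lines=2)
llc.access("general", 0x100)
llc.access("io", 0x200)
llc.access("io", 0x300)
llc.access("io", 0x400)      # evicts I/O line 0x200, not the general line
assert llc.access("general", 0x100) == "hit"
```

In an unpartitioned cache the three I/O fills above could have evicted the general line; with the split design the application's line survives, which is exactly the interference reduction the simulation results measure.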


Journal of Parallel and Distributed Computing | 2012

Analyzing performance and power efficiency of network processing over 10 GbE

Guangdeng Liao; Laxmi N. Bhuyan

Ethernet continues to be the most widely used network architecture today for its low cost and backward compatibility with the existing Ethernet infrastructure. Driven by the increasing networking demands of cloud workloads, network speed is rapidly migrating from 1 to 10 Gbps and beyond. Ethernet's ubiquity and its continuously increasing rate motivate us to fully understand high speed network processing performance and its power efficiency. In this paper, we begin with a per-packet processing overhead breakdown on Intel Xeon servers with 10 GbE networking. We find that besides data copy, the driver and buffer release unexpectedly take 46% of the processing time for large I/O sizes and even 54% for small I/O sizes. To further understand the overheads, we manually instrument the 10 GbE NIC driver and OS kernel along the packet processing path using hardware performance counters (PMU). Our fine-grained instrumentation pinpoints performance bottlenecks that were never reported before. In addition to detailed performance analysis, we also examine the power consumption of network processing over 10 GbE by using a power analyzer. We then use an external Data Acquisition System (DAQ) to obtain a breakdown of power consumption for individual hardware components such as the CPU, memory and NIC, and make several interesting observations. Our detailed performance and power analysis guides us in designing a more processing- and power-efficient server I/O architecture for high speed networks.


High Performance Interconnects | 2009

Performance Measurement of an Integrated NIC Architecture with 10GbE

Guangdeng Liao; Laxmi N. Bhuyan


Architectures for Networking and Communications Systems | 2009

An adaptive hash-based multilayer scheduler for L7-filter on a highly threaded hierarchical multi-core server

Danhua Guo; Guangdeng Liao; Laxmi N. Bhuyan; Bin Liu

Collaboration


Dive into Guangdeng Liao's collaborations.

Top Co-Authors
Danhua Guo

University of California
