Wenxue Cheng | Researchain


Publication

Featured research published by Wenxue Cheng.

international conference on distributed computing systems | 2017

Modeling and Analyzing Latency in the Memcached system

Wenxue Cheng; Fengyuan Ren; Wanchun Jiang; Tong Zhang

Memcached is a widely used in-memory caching solution in large-scale search scenarios. The most critical performance metric in Memcached is latency, which is affected by various factors, including the workload pattern, the service rate, the unbalanced load distribution, and the cache miss ratio. To quantify the impact of each factor on latency, we establish a theoretical model of the Memcached system. Specifically, we formulate the unbalanced load distribution among Memcached servers as a set of probabilities, capture the bursty and concurrent key arrivals at Memcached servers in the form of batching blocks, and add a cache-miss processing stage. Based on this model, we conduct algebraic derivations to estimate latency in Memcached. The latency estimates are validated by extensive experiments. Moreover, we obtain a quantitative understanding of how much latency improvement can be achieved by optimizing each factor, and we provide several useful recommendations for optimizing latency in Memcached.
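
As a rough illustration of why both load imbalance and the cache miss ratio show up in mean latency, the sketch below treats each Memcached server as an independent M/M/1 queue and adds a fixed backend penalty for every miss. This is a simplifying assumption, not the paper's model: it ignores the batching-block arrivals, and the names `mean_latency`, `load_probs`, and `miss_penalty` are hypothetical.

```python
def mean_latency(arrival_rate, service_rate, load_probs, miss_ratio, miss_penalty):
    """Mean latency under a simplified per-server M/M/1 model (hypothetical sketch).

    arrival_rate: total key requests per second across all servers
    service_rate: requests per second a single server can process
    load_probs:   probability that a request is routed to each server
    miss_ratio:   fraction of requests that miss the cache
    miss_penalty: extra seconds spent fetching a missed key from the backend
    """
    if abs(sum(load_probs) - 1.0) > 1e-9:
        raise ValueError("load probabilities must sum to 1")
    queueing = 0.0
    for p in load_probs:
        server_rate = p * arrival_rate  # arrival rate seen by this server
        if server_rate >= service_rate:
            raise ValueError("a server is overloaded (utilization >= 1)")
        # M/M/1 mean sojourn time is 1 / (mu - lambda); weight it by routing probability
        queueing += p / (service_rate - server_rate)
    # Every miss adds a fixed backend penalty on top of the queueing delay
    return queueing + miss_ratio * miss_penalty


balanced = mean_latency(80.0, 50.0, [0.5, 0.5], 0.0, 0.0)
skewed = mean_latency(80.0, 50.0, [0.6, 0.4], 0.0, 0.0)
print(f"balanced: {balanced:.4f} s, skewed: {skewed:.4f} s")
```

With the same total load, shifting 10% of requests toward one server raises mean latency from 0.1000 s to about 0.3222 s in this toy setting, because the busier server runs at 96% utilization, where queueing delay grows sharply.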

international conference on computer communications | 2017

Modeling and analyzing the influence of chunk size variation on bitrate adaptation in DASH

Tong Zhang; Fengyuan Ren; Wenxue Cheng; Xiaohui Luo; Ran Shu; Xiaolan Liu

Recently, HTTP-based adaptive video streaming has been widely adopted on the Internet. It has been standardized as Dynamic Adaptive Streaming over HTTP (DASH), in which a client-side video player dynamically picks the bitrate level according to the perceived network conditions. In practice, not only does the available bandwidth vary, but the chunk sizes within the same bitrate level also fluctuate significantly, which also influences bitrate adaptation. However, existing bitrate adaptation algorithms do not accurately account for chunk size variation, leading to performance losses. In this paper, we theoretically analyze the influence of chunk size variation on bitrate adaptation performance. Based on DASH system features, we build a general model describing the evolution of the playback buffer. Applying stochastic theory, we analyze the influence of chunk size variation on the rebuffering probability and on the average bitrate level, respectively. Furthermore, based on these theoretical insights, we provide several recommendations for algorithm design and rate encoding, and we also propose a simple bitrate adaptation algorithm. Extensive simulations verify our insights as well as the efficiency of the proposed recommendations and algorithm.
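
To make the buffer-evolution idea concrete, the following sketch uses a common textbook-style recursion rather than the paper's model: before chunk k is downloaded the buffer holds B seconds of video, the download takes S_k / C seconds at a fixed bandwidth C, a stall occurs if that exceeds B, and the buffer then becomes max(B - S_k / C, 0) + L for chunk duration L. The uniform chunk-size noise, the absence of a maximum buffer cap, and the function name `simulate_buffer` are all assumptions.

```python
import random


def simulate_buffer(bitrate, bandwidth, chunk_duration, size_variation,
                    n_chunks, initial_buffer, seed=0):
    """Count stall events for one bitrate level with fluctuating chunk sizes.

    bitrate and bandwidth are in bits per second, chunk_duration and
    initial_buffer in seconds; size_variation is the assumed relative
    spread of chunk sizes (0.5 means +/-50%, drawn uniformly).
    Returns (stall_count, final_buffer_seconds).
    """
    rng = random.Random(seed)
    buffer = initial_buffer
    stalls = 0
    for _ in range(n_chunks):
        # Chunk size fluctuates around the nominal bitrate * duration
        size = bitrate * chunk_duration * (1 + rng.uniform(-size_variation, size_variation))
        download_time = size / bandwidth
        if download_time > buffer:
            stalls += 1  # the buffer runs dry before this chunk arrives
        # Playback drains the buffer during the download; then one chunk is appended
        buffer = max(buffer - download_time, 0.0) + chunk_duration
    return stalls, buffer


# Average bitrate equals bandwidth in both runs; only chunk size variation differs
print(simulate_buffer(1e6, 1e6, 2.0, 0.0, 1000, 2.0))
print(simulate_buffer(1e6, 1e6, 2.0, 0.5, 1000, 2.0))
```

With constant chunk sizes the first run never stalls, while the second run, whose chunks have the same average size, records stall events; this mirrors the abstract's point that chunk size variation by itself affects rebuffering.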

international conference on parallel processing | 2018

Power Efficient High Performance Packet I/O

Xuesong Li; Wenxue Cheng; Tong Zhang; Jing Xie; Fengyuan Ren; Bailong Yang

High-performance packet I/O frameworks are expected to see extensive application because of their ability to process packets from links of 10 Gbps or higher. To achieve high throughput and low latency, these frameworks usually employ busy polling. Because busy polling burns all CPU cycles even when there are no packets to process, these frameworks are quite power-inefficient. Meanwhile, exploiting power management techniques such as DVFS and LPI in high-performance packet I/O frameworks is challenging, because neither the OS nor the frameworks can provide the information (e.g., the actual CPU utilization, the available idle period, or the target frequency) that these techniques require. In this paper, we establish an analytical model of the packet processing flow of high-performance packet I/O to help address these challenges. From the model, we deduce the actual CPU utilization and the average idle period under different traffic loads, and gain insight into choosing a CPU frequency that appropriately balances power consumption and packet latency. We then propose two simple but effective approaches to power conservation for high-performance packet I/O: one with the aid of traffic information and one without. Experiments with Intel DPDK show that both approaches achieve significant power reduction (35.90% and 34.43% on average, respectively) while adding less than 1 μs of latency.
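
Because busy polling keeps the core at 100% reported utilization, the useful load has to be estimated from the traffic instead. A minimal sketch, assuming a fixed average per-packet cost in CPU cycles and a discrete set of DVFS frequencies (both hypothetical inputs, not values from the paper), picks the lowest frequency whose estimated utilization stays below a target:

```python
def choose_frequency(packet_rate, cycles_per_packet, frequencies_hz, max_utilization=0.8):
    """Return (frequency, estimated utilization) for the lowest adequate frequency.

    packet_rate:       packets per second arriving at the polling core
    cycles_per_packet: assumed average CPU cycles spent processing one packet
    frequencies_hz:    available DVFS operating points
    max_utilization:   headroom target so queues (and latency) stay short
    """
    for freq in sorted(frequencies_hz):
        # Cycles actually needed per second divided by cycles available per second
        utilization = packet_rate * cycles_per_packet / freq
        if utilization <= max_utilization:
            return freq, utilization
    # No frequency meets the target: run as fast as possible
    fastest = max(frequencies_hz)
    return fastest, packet_rate * cycles_per_packet / fastest


freqs = [1.2e9, 1.6e9, 2.0e9, 2.4e9]
print(choose_frequency(1e6, 1000, freqs))  # -> (1600000000.0, 0.625)
```

The utilization threshold is the knob that trades power for latency: a lower `max_utilization` keeps more headroom and selects a higher frequency, while the remaining fraction 1 - utilization is the idle time that a power-saving mechanism could exploit.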

international workshop on quality of service | 2017

Performance analysis of randomized data fetching in cluster computing

Tong Zhang; Peng Cheng; Wenxue Cheng; Bo Wang; Fengyuan Ren

The shuffle transfer pattern is widely adopted in today's cluster computing applications, and the completion time of each group of transmissions directly affects application performance. Because of the limit on the number of concurrent threads and the TCP Incast problem, the randomized data fetching strategy is widely employed for this kind of communication in practice. In this paper, to assess the performance of randomized data fetching, we build a general analytical model and define two metrics, link overload probability and K-deviation load balancing probability, to evaluate the degree of link overload and load balancing, respectively, since both are closely related to the transfer completion time. Leveraging our model, we theoretically analyze transfer performance in three typical scenarios and provide recommendations for setting the number of concurrent connections per receiver. Finally, we validate the theoretical analysis and the recommendations through extensive simulations.
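
One simple way to reason about link overload under randomized fetching is sketched here under assumptions that may differ from the paper's model: each of N receivers independently opens connections to K of M senders chosen uniformly at random, so each receiver picks a given sender with probability K/M, and that sender's number of concurrent connections is Binomial(N, K/M). A sender link is counted as overloaded when this number exceeds a threshold T. The function names and the threshold definition are hypothetical; a Monte Carlo estimate is included as a cross-check.

```python
import random
from math import comb


def overload_probability(n_receivers, n_senders, k, threshold):
    """Exact P(a given sender serves more than `threshold` receivers at once)."""
    q = k / n_senders  # chance that one receiver includes this sender among its K picks
    return sum(comb(n_receivers, j) * q**j * (1 - q)**(n_receivers - j)
               for j in range(threshold + 1, n_receivers + 1))


def simulate_overload(n_receivers, n_senders, k, threshold, trials=20000, seed=1):
    """Monte Carlo estimate of the same probability, watching sender 0."""
    rng = random.Random(seed)
    senders = range(n_senders)
    overloaded = 0
    for _ in range(trials):
        # Each receiver samples K distinct senders; count how many chose sender 0
        load = sum(1 for _ in range(n_receivers) if 0 in rng.sample(senders, k))
        if load > threshold:
            overloaded += 1
    return overloaded / trials


print(overload_probability(10, 20, 2, 2), simulate_overload(10, 20, 2, 2))
```

Raising K increases each sender's expected load N·K/M, so the overload probability grows; this is the kind of tradeoff behind choosing the number of concurrent connections per receiver.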

international workshop on quality of service | 2017

XpressEth: Concise and efficient converged real-time Ethernet

Kun Qian; Fengyuan Ren; Danfeng Shan; Wenxue Cheng; Bo Wang

Owing to Ethernet's low cost, high bandwidth, and open architecture, much attention has been paid to developing converged Ethernet that supports both time-critical services and conventional communication services on a unified network infrastructure. The greatest challenge here is providing low and deterministic latency for time-critical packets. The IEEE Time-Sensitive Networking (TSN) task group was recently launched to address this challenge. However, its framework is complex and unsuitable for commodity switch architectures. In this paper, we propose a concise and efficient converged real-time Ethernet framework called XpressEth, which leverages a Dual Preemption mechanism to minimize the delay of time-critical packets and employs a lightweight Slot Assignment Scheduler to minimize conflicts among time-critical packets at their sources. XpressEth greatly reduces the burden of both forwarding and scheduling. Simulation results verify that XpressEth provides ultra-low and deterministic latency for time-critical packets (1.024 µs per hop and zero jitter in a 1 Gbps network), which is 13× better than the time-sensitive networking solution, while its side effects on conventional communication traffic are negligible.
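
The Slot Assignment Scheduler is described only at a high level here. As a hedged illustration of the general idea of avoiding conflicts at the sources, the sketch below assigns each time-critical flow a transmission slot using greedy graph coloring, so that two flows whose paths share a link never receive the same slot. This is a generic technique chosen for illustration, not XpressEth's actual scheduler, and the flow and link identifiers are hypothetical.

```python
def assign_slots(flows):
    """Greedy conflict-free slot assignment.

    flows maps a flow id to the set of link ids on its path.
    Returns a dict mapping each flow id to a slot index (0, 1, 2, ...).
    """
    slots = {}
    # Longer paths conflict with more flows, so place them first (largest-first heuristic)
    for fid in sorted(flows, key=lambda f: (-len(flows[f]), f)):
        # Slots already taken by flows that share at least one link with this one
        taken = {slots[other] for other in slots if flows[other] & flows[fid]}
        slot = 0
        while slot in taken:
            slot += 1
        slots[fid] = slot
    return slots


paths = {"a": {1, 2}, "b": {2, 3}, "c": {4}}
print(assign_slots(paths))  # flows a and b share link 2, so they get different slots
```

Greedy coloring does not guarantee the minimum number of slots, but it is cheap to run at the sources, which matches the "lightweight" goal stated in the abstract.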

international workshop on quality of service | 2017

Congestion control in Converged Ethernet with heterogeneous and time-varying delays

Wenxue Cheng; Wanchun Jiang; Tong Zhang; Bo Wang; Kun Qian; Fengyuan Ren

Congestion control is an indispensable mechanism in the trend toward enhanced Ethernet as a unified fabric for traditional LAN, SAN, and high-performance computing networks. A congestion management framework for Converged Ethernet (CE) networks has been standardized by the IEEE 802.1Qau working group, and QCN is recommended as the congestion control scheme in the standard draft. QCN was designed heuristically for 1/10 Gbps Ethernet without considering the impact of delays. Recent work finds that QCN encounters stability issues under feedback delays, and these issues become more serious as Ethernet extends to 40/100 Gbps and the delays become heterogeneous and time-varying. This work aims to mitigate the negative impact of delays on congestion control in CE. Specifically, considering that the delays are heterogeneous and time-varying, we build a model of Converged Ethernet with the standard congestion management framework. The model provides a new congestion detector that estimates the real congestion status under the impact of delays, and it treats the heterogeneous and time-varying delays as disturbances. Leveraging the new congestion detector and tolerating the disturbances through sliding mode control, we design the Delay-tolerant Sliding Mode (DSM) congestion control scheme. Extensive simulations show that DSM outperforms other congestion control schemes when Ethernet link speeds range from 1 Gbps to 100 Gbps and the delays are heterogeneous and time-varying.
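
For context on what DSM improves upon, the sketch below shows the baseline QCN computations as they are commonly described (this is not the DSM algorithm): the congestion point samples its queue and computes Fb = -(Q_off + w·Q_delta), where Q_off = Q - Q_eq and Q_delta = Q - Q_old, and a reaction point cuts its rate by R ← R(1 - G_d·|Fb|). The constants w = 2 and G_d = 1/128, the cap on |Fb|, and the arbitrary queue-length units are assumptions of this sketch. Because Fb is computed from a queue snapshot, feedback that arrives late reflects a stale queue, which is the delay sensitivity the abstract describes.

```python
W = 2.0         # weight on queue growth (commonly cited QCN default; assumption)
GD = 1.0 / 128  # rate-decrease gain (commonly cited QCN default; assumption)
FB_CAP = 64     # cap on |Fb| so one message cuts the rate by at most 50%


def congestion_feedback(q_len, q_old, q_eq):
    """Congestion-point feedback; returns None when no congestion is signaled."""
    q_off = q_len - q_eq      # how far the queue is above its equilibrium target
    q_delta = q_len - q_old   # how fast the queue grew since the last sample
    fb = -(q_off + W * q_delta)
    if fb >= 0:
        return None           # queue is short or draining: no message is sent
    return max(fb, -FB_CAP)


def reduce_rate(rate, fb):
    """Reaction-point multiplicative decrease: R <- R * (1 - Gd * |Fb|)."""
    return rate * (1 - GD * abs(fb))


fb = congestion_feedback(q_len=30, q_old=20, q_eq=26)
print(fb, reduce_rate(10.0, fb))  # Fb = -(4 + 2*10) = -24.0, rate 10.0 -> 8.125
```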

Proceedings of the First Asia-Pacific Workshop on Networking | 2017

SoftRDMA: Rekindling High Performance Software RDMA over Commodity Ethernet

Mao Miao; Fengyuan Ren; Xiaohui Luo; Jing Xie; Qingkai Meng; Wenxue Cheng

Recent academic and industrial work is exploring the challenges of using RDMA over Ethernet to support highly reliable, latency-sensitive services in today's datacenters. Previous work on high-speed packet I/O, such as netmap and DPDK, and on high-performance user-level stacks, such as mTCP and IX, has inspired us to implement high-performance software RDMA over commodity Ethernet devices. This paper summarizes and explores the largely overlapping design philosophies of RDMA and high-performance user-level stacks. Inspired by these, we design SoftRDMA, a user-level iWARP stack based on a One-Copy design and a deliberately designed threading model. SoftRDMA's implementation includes user-level iWARP/TCP/IP protocols and DPDK packet I/O; no special hardware or software is required beyond these. It provides the basic iWARP verbs for RDMA communication. In our evaluation, SoftRDMA demonstrates latency and throughput comparable to those of a hardware-supported iWARP scheme. It achieves microsecond latency for short messages and nearly full line rate for long-message transfers.

high performance computing and communications | 2016

Smart Batching: A Load-Sensitive Self-Tuning Packet I/O Using Dynamic Batch Sizing

Mao Miao; Wenxue Cheng; Fengyuan Ren; Jing Xie

Batch processing is an essential technique for improving packet rates in high-speed networks, and it is widely used in high-performance packet I/O frameworks. Conventionally, one must make a tradeoff between high latency at low input loads and high throughput at high loads. In this work, we leverage a smart batching mechanism that dynamically balances the input traffic rate against the available processing capacity, so as to achieve both low latency and high throughput. The impact of different batch sizes on throughput, latency, and CPU cost is investigated on our testbed, which uses MoonGen and the DPDK platform. We build a model of the batch processing procedure to explore the factors influencing latency and throughput. We find that batch size is a controllable variable linking the input rate and the available system processing rate. In light of this understanding, we design the smart batching mechanism to dynamically tune the batch size according to the input traffic rate and the available processing capacity. The smart batching mechanism is implemented and evaluated on the DPDK platform under different traffic patterns and workloads. The results show that it is responsive and adaptive to the instantaneous states of the network and server. Moreover, it is simple and portable to most packet I/O frameworks.
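
The observation that batch size links the input rate and the available processing rate can be illustrated with a simple cost model, which is an assumption of this sketch rather than the paper's model: processing a batch of n packets costs a fixed overhead o plus c per packet, so throughput is n / (o + n·c). The smallest batch that keeps up with an arrival rate λ satisfies n ≥ λ·o / (1 - λ·c), and since smaller batches mean less waiting, a controller can pick exactly that n. The function name and the `max_batch` cap are hypothetical.

```python
import math


def smart_batch_size(arrival_rate, per_batch_overhead, per_packet_cost, max_batch=256):
    """Smallest batch size n with n / (o + n*c) >= arrival_rate, clamped to [1, max_batch].

    arrival_rate:       measured input rate in packets per second
    per_batch_overhead: fixed seconds spent per batch (o), e.g. polling and bookkeeping
    per_packet_cost:    seconds of work per packet (c)
    """
    load = arrival_rate * per_packet_cost
    if load >= 1.0:
        return max_batch  # no batch size can keep up; use the largest allowed
    n = math.ceil(arrival_rate * per_batch_overhead / (1.0 - load))
    return max(1, min(n, max_batch))


for rate in (1e5, 1.4e7, 3e7):
    print(rate, smart_batch_size(rate, 1e-6, 5e-8))
```

At light load the controller falls back to single-packet batches for low latency, and it grows the batch only as far as needed to keep pace with heavier traffic (47 packets at 14 Mpps with these assumed costs). In practice the arrival rate would have to be measured online, for example with a moving average, which this sketch leaves out.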

arXiv: Networking and Internet Architecture | 2016