Yanfeng Zheng | Researchain

Publication

Featured researches published by Yanfeng Zheng.

high performance switching and routing | 2005

Multicast scheduling in buffered crossbar switches with multiple input queues

Shutao Sun; Simin He; Yanfeng Zheng; Wen Gao

We consider the problem of scheduling multicast traffic in a buffered crossbar switch with multiple input queues at each input port. In this paper, we design and investigate a series of combinations of queuing policies and scheduling algorithms and report the simulation results. It is shown that a small number of input queues at each input port can dramatically improve performance under bursty multicast traffic in buffered crossbar switches. Under this architecture, it is feasible to design simple queuing policies and scheduling algorithms for high-speed switches while maintaining high performance and a small buffer size within the crossbar.

measurement and modeling of computer systems | 2005

Smooth switching problem in buffered crossbar switches

Simin He; Shutao Sun; Wei Zhao; Yanfeng Zheng; Wen Gao

Scalability considerations drive the switch fabric design to evolve from output queueing to input queueing and further to combined input and crosspoint queueing (CICQ). However, few CICQ switches are known with guaranteed quality of service, and credit-based flow control induces a scalability bottleneck. In this paper, we propose a novel CICQ switch called the smoothed buffered crossbar or sBUX, based on a new design objective of smoothness and on a new rate-based flow control scheme called the smoothed multiplexer or sMUX. It is proved that with a buffer of just four cells at each crosspoint, sBUX can utilize 100% of the switch capacity to provide deterministic guarantees of bandwidth and fairness, delay and jitter bounds for each flow. In particular, neither credit-based flow control nor speedup is used, and arbitrary fabric-internal latency is allowed between line cards and the switch core.

international conference on computer communications and networks | 2005

A dual round-robin algorithm for combined input-crosspoint-queued switches

Yanfeng Zheng; Wen Gao

Compared with a bufferless crossbar switch, a combined input-crosspoint-queued (CICQ) switch has better scalability owing to its distributed scheduling. Although the previously proposed round-robin algorithms achieve 100% throughput asymptotically under uniform traffic, these algorithms do not provide satisfactory performance under nonuniform traffic. In this paper, we propose an efficient round-robin algorithm for a CICQ switch with one-cell crosspoint buffers. With our algorithm, each input arbiter is associated with dual round-robin pointers. Unlike the existing round-robin algorithms, our algorithm has distinctive round-robin pointer updating rules that cope effectively with nonuniform traffic patterns. Extensive simulation results show that our algorithm achieves satisfactory performance under both uniform and a broad class of nonuniform traffic patterns.

international conference on networks | 2004

Multiple-threshold based scheduling algorithm for high performance input queued switches

Shutao Sun; Simin He; Yanfeng Zheng; Wen Gao

Input queued switching architectures have become attractive for implementing high performance switches. Some maximal matching algorithms, e.g. PIM and iSLIP, are easily implemented in hardware and hence attract more attention in the design of practical routers. However, they usually need multiple iterations to achieve satisfactory performance. In this paper, we propose a multiple-threshold based round robin matching algorithm (MTRRM) for high performance input queued crossbar switch scheduling. In MTRRM, when an input and an output are matched, the matching might be kept for more than one time slot. Simulation results show that our scheme can strongly reduce the average cell delay and increase the throughput under non-uniform traffic. MTRRM achieves quite good performance with a single iteration, which suggests that it is better suited to switches with a high line rate or a large number of ports.

international conference on computer communications and networks | 2006

Switch Simulations Based on Workload Pattern Generation and Smoothed Periodic Input

Qiang Zheng; Simin He; Shutao Sun; Yanfeng Zheng; Wen Gao

Simulation is crucial to performance evaluation of switches. Currently stochastic simulation is the predominant approach, which has two drawbacks: few workload patterns and long simulation times. In this paper, we propose a novel switch simulation method that is based on two techniques: exhaustive workload pattern generation and smoothed periodic input generation. Exhaustive workload pattern generation can produce a huge number of nonuniform workload patterns, far exceeding the few used traditionally; conclusions based on such truly extensive simulations are much more convincing. The periodic input, or cell arrival pattern, outperforms its stochastic counterpart in terms of easy repetition and fast convergence of simulation; in particular, among periodic inputs, the smoothed periodic input is the most favorable to switch scheduling, and hence switches that perform poorly with it probably perform worse under others. Combining these two techniques can systematically identify many stuck states at which some switches show poor performance such as low throughput. Specifically, this method discovers that the throughput of iSLIP, FIRM and DRRM, each with one iteration, may be lower than 60% under certain nonuniform and smoothed periodic traffic patterns.

global communications conference | 2004

Distributed load adaptive scheduling for high speed input queued switch

Shutao Sun; Youjian Zhao; Simin He; Yanfeng Zheng; Wen Gao

Input queued switching architectures have become predominant in high speed switches and routers. In this paper, we change the point of view from weight-based matching to weight-based service, and propose a distributed load adaptive scheduling (DLAS) algorithm. In DLAS, the round robin arbiters are used to find a matching between the input ports and output ports. Once the matching between an input-output pair is established, the scheduler will keep it for a certain period, which is a function of the number of cells queued in the corresponding VOQ. Simulation results show that our scheme achieves high throughput and low delay under admissible traffic. For uniform Bernoulli i.i.d. traffic, it achieves 100% throughput, and for nonuniform traffic, its throughput is almost 100%.
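The hold-period idea in the DLAS abstract can be illustrated with a small sketch. The abstract only says the period is "a function of the number of cells queued in the corresponding VOQ"; the specific rule below (one slot per queued cell, capped at `max_hold`) and the names `hold_period`, `serve_held_match`, and `max_hold` are assumptions made for illustration, not the authors' definition. Arrivals during the hold are ignored to keep the example short.

```python
def hold_period(voq_length, max_hold=16):
    """Hypothetical hold-time rule: one slot per queued cell, capped.

    The cap and the minimum of one slot are illustrative assumptions;
    the abstract does not specify the actual function.
    """
    return max(1, min(voq_length, max_hold))


def serve_held_match(voq_length, max_hold=16):
    """Return (cells_sent, slots_held) for one held input-output match.

    Assumes one cell is transferred per slot and no new cells arrive
    while the match is held.
    """
    slots = hold_period(voq_length, max_hold)
    cells_sent = min(voq_length, slots)
    return cells_sent, slots
```

Under this assumed rule, a longer VOQ keeps its match longer, so heavily loaded input-output pairs are re-arbitrated less often.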

international symposium on communications and information technologies | 2005

A deterministic parallel scheduling algorithm for input queued crossbar switches

Yanfeng Zheng; Simin He; Shutao Sun; Wen Gao

Input-queued (IQ) switch architectures have become predominant in high-speed switching over the past decade. Maximum weight matching (MWM) algorithms are known to achieve 100% throughput under any admissible traffic. Unfortunately, MWM is impractical because of its high computational complexity, O(N^3). In this paper, we study a new type of approximate algorithm to MWM using a local search technique. Instead of the randomized techniques used by most existing approximations, our algorithm runs in a deterministic way that is easy to implement in hardware. It provides 100% throughput under any admissible traffic, and simulation results show that the proposed algorithm with only 3 iterations outperforms the existing approximations to MWM.

international performance computing and communications conference | 2005

Parallelized scheduling algorithm for input queued switches using local search technique

Yanfeng Zheng; Simin He; Shutao Sun; Wen Gao

Input queued switches have been well studied in the recent past. A scheduling algorithm is required to schedule the transfer of packets through the switch fabric (e.g. crossbar) at every time slot. The maximum weight matching (MWM) algorithm is known to deliver 100% throughput under any admissible traffic. However, MWM is not practical because of its high computational complexity, O(N^3). In this paper, we study a class of approximation algorithms to MWM from the point of view of local search. Notably we observe that: (a) Local search is well suited to parallel computation. (b) Each line card of a high performance router has at least one processor. Based on these two observations, a parallelized greedy scheduling algorithm is proposed. The proposed algorithm is proved to be rate stable under any admissible traffic. Simulation results show that our algorithm outperforms other randomized approximations to MWM.

international conference on networks | 2005

A traffic adaptive round-robin algorithm for combined input-crosspoint-queued switches

Yanfeng Zheng; Simin He; Wen Gao; Shutao Sun

The appeal of a combined input-crosspoint-queued (CICQ) switch is its distributed scheduling property, which makes it more scalable than an unbuffered crossbar switch. Round-robin algorithms are interesting because of their simple hardware implementation. Although the existing round-robin algorithms achieve 100% throughput asymptotically under uniform traffic, these algorithms perform poorly under nonuniform traffic. In order to improve the performance of a CICQ switch (one cell per crosspoint buffer) under nonuniform traffic, this paper proposes a traffic adaptive round-robin algorithm named TARR. Unlike the existing round-robin algorithms, TARR has distinctive round-robin pointer updating rules that cope effectively with nonuniform traffic patterns. In addition, TARR is a quantum based algorithm, and the quantum assignment discipline is load adaptive. Extensive simulations show that TARR performs satisfactorily under both uniform and nonuniform traffic patterns.

international conference on networking | 2005

Scheduling algorithms for input queued switches using local search technique

Yanfeng Zheng; Simin He; Shutao Sun; Wen Gao

Input Queued switches have been very well studied in the recent past. The Maximum Weight Matching (MWM) algorithm is known to deliver 100% throughput under any admissible traffic. However, MWM is not practical because of its high computational complexity, O(N^3). In this paper, we study a class of approximations to MWM from the point of view of local search. Firstly, we propose a greedy scheduling algorithm called GSA. It has the following features: (a) It is very simple to compute the weight of a neighbor matching. GSA only needs to compute the weight of two swapped edges instead of the weight of all the edges. (b) The computational complexity of GSA is O(c_max), where c_max denotes the maximum number of iterations. Hence we can adjust the value of c_max to achieve low computational complexity. Secondly, we observe that: (a) Local search is well suited to parallel computing. (b) Each line card of a high performance router has at least one processor. Based on these two observations, we develop the second algorithm, PGSA. Compared with GSA, PGSA significantly reduces the number of iterations. Simulation results show that PGSA with three iterations outperforms algorithms in [1] under different switch sizes.
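The two-edge swap described for GSA can be sketched as follows. This is a minimal illustration under stated assumptions, not the published algorithm: the matching is represented as a permutation `perm` (input i is matched to output `perm[i]`), `weights[i][j]` stands for the weight of edge (i, j) such as a VOQ length, and candidate pairs are visited in a fixed cyclic order. The function names and the pair-selection order are hypothetical; the abstract does not say how neighbors are chosen.

```python
from itertools import combinations, cycle


def swap_gain(weights, perm, i, k):
    """Weight change if inputs i and k exchange their matched outputs.

    Only the two removed edges and the two added edges are examined,
    so the cost is constant regardless of switch size.
    """
    old = weights[i][perm[i]] + weights[k][perm[k]]
    new = weights[i][perm[k]] + weights[k][perm[i]]
    return new - old


def greedy_local_search(weights, perm, c_max):
    """Run at most c_max swap attempts, keeping only improving swaps.

    Pairs are visited in a fixed cyclic order (an assumption made here
    so the sketch is deterministic).
    """
    perm = list(perm)
    n = len(perm)
    if n < 2:
        return perm  # no pair of inputs to swap
    pairs = cycle(combinations(range(n), 2))
    for _ in range(c_max):
        i, k = next(pairs)
        if swap_gain(weights, perm, i, k) > 0:
            perm[i], perm[k] = perm[k], perm[i]
    return perm


def matching_weight(weights, perm):
    """Total weight of the matching described by perm."""
    return sum(weights[i][perm[i]] for i in range(len(perm)))
```

Each iteration does a constant amount of work, which is consistent with the O(c_max) cost stated in the abstract. Starting from the previous time slot's matching rather than an arbitrary one is a common choice for such schemes, but that too is an assumption here.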
