Jingnan Yao

University of California, Riverside

Publication

Featured research published by Jingnan Yao.

international conference on computer communications | 2005

An efficient packet scheduling algorithm in network processors

Jiani Guo; Jingnan Yao; Laxmi N. Bhuyan

Several companies have introduced powerful network processors (NPs) that can be placed in routers to execute various tasks in the network. These tasks can range from IP level table lookup algorithm to application level multimedia transcoding applications. An NP consists of a number of on-chip processors to carry out packet level parallel processing operations. Ensuring good load balancing among the processors increases throughput. However, such multiprocessing also gives rise to increased out-of-order departure of processed packets. In this paper, we first propose a dynamic batch co-scheduling (DBCS) scheme to schedule packets in a heterogeneous network processor assuming that the workload is perfectly divisible. The processed loads from the processors are ordered perfectly. We analyze the throughput and derive expressions for the batch size, scheduling time and maximum number of schedulable processors. To effectively schedule variable length packets in an NP, we propose a packetized dynamic batch-coscheduling (P-DBCS) scheme by applying a combination of deficit round robin (DRR) and surplus round robin (SRR) schemes. We extend the algorithm to handle multiple flows based on a fair scheduling of flows depending on their reservations. Extensive sensitivity results are provided through analysis and simulation to show that the proposed algorithms satisfy both the load balancing and in-order requirements in packet processing.
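The abstract names deficit round robin (DRR) as one building block of P-DBCS. As a rough illustration only, the following is a minimal sketch of textbook DRR over a set of packet queues; it is not the paper's P-DBCS scheme, and the function name, the `quantum` value, and the sample packet sizes are assumptions made for this example.

```python
from collections import deque


def deficit_round_robin(queues, quantum, rounds):
    """Serve packets from `queues` (deques of packet sizes) with textbook DRR.

    Returns the service order as a list of (queue_index, packet_size).
    All names and parameters here are illustrative assumptions.
    """
    deficits = [0] * len(queues)
    order = []
    for _ in range(rounds):
        for i, q in enumerate(queues):
            if not q:
                # An idle queue does not accumulate credit.
                deficits[i] = 0
                continue
            deficits[i] += quantum
            # Send head-of-line packets while they fit in the deficit.
            while q and q[0] <= deficits[i]:
                size = q.popleft()
                deficits[i] -= size
                order.append((i, size))
            if not q:
                deficits[i] = 0
    return order


if __name__ == "__main__":
    sample = [deque([300, 300]), deque([500, 100])]
    print(deficit_round_robin(sample, quantum=400, rounds=2))
    # [(0, 300), (0, 300), (1, 500), (1, 100)]
```

In the sample run, the 500-byte packet waits one round because it exceeds the 400-byte quantum, which is the behavior that lets DRR stay fair with variable-length packets.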

IEEE Transactions on Computers | 2008

Ordered Round-Robin: An Efficient Sequence Preserving Packet Scheduler

Jingnan Yao; Jiani Guo; Laxmi N. Bhuyan

With the advent of powerful network processors (NPs) in the market, many computation-intensive tasks such as routing table look-up, classification, IPSec, and multimedia transcoding can now be accomplished more easily in a router. An NP consists of a number of on-chip processors to carry out packet level parallel processing operations. Ensuring good load balancing among the processors increases throughput. However, such multiprocessing also gives rise to increased out-of-order departure of processed packets. In this paper, we first propose an Ordered Round Robin (ORR) scheme to schedule packets in a heterogeneous network processor assuming that the workload is perfectly divisible. The processed loads from the processors are ordered perfectly. We analyze the throughput and derive expressions for the batch size, scheduling time and maximum number of schedulable processors. To effectively schedule variable length packets in an NP, we propose a Packetized Ordered Round Robin (P-ORR) scheme by applying a combination of deficit round robin (DRR) and surplus round robin (SRR) schemes. We extend the algorithm to handle multiple flows based on a fair scheduling of flows depending on their reservations. Extensive sensitivity results are provided through analysis and simulation to show that the proposed algorithms satisfy both the load balancing and in-order requirements for parallel packet processing.

global communications conference | 2004

Scheduling real-time multimedia tasks in network processors

Jingnan Yao; Jiani Guo; Laxmi N. Bhuyan; Zhiyong Xu

Several companies have introduced powerful network processors (NPs) that can be placed in active routers to execute application level tasks in the network. An NP consists of a number of on-chip processors to carry out packet level parallel processing operations. We propose to employ them for multimedia streaming (transcoding) to convert the incoming video streams to low bit-rate media units as per the requirements of the clients. To effectively schedule the parallel transcoding operations in an active router, we propose a static sequentialized batch-coscheduling (SSBC) scheme to meet both load balancing and real-time requirements for media streaming, based on divisible load theory (DLT). We first analyze the feasibility and optimality of the load distribution schemes from a theoretical perspective, and then present separate solutions for non-delay-sensitive streams and delay-sensitive streams. Rigorous simulations and experiments have been carried out to evaluate the performance.
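The core intuition behind divisible load theory can be shown with a small sketch: if a perfectly divisible workload is split in proportion to processor speed, all processors finish at the same time. The version below deliberately ignores distribution delay and the real-time constraints that SSBC addresses; the function names and the sample speeds are assumptions made for this example.

```python
def split_divisible_load(total_load, speeds):
    """Split `total_load` across processors in proportion to their speeds.

    Simplifying assumption: communication/distribution time is zero,
    so equal finish times follow directly from proportional shares.
    """
    total_speed = sum(speeds)
    return [total_load * s / total_speed for s in speeds]


def finish_times(shares, speeds):
    """Time each processor needs to complete its assigned share."""
    return [share / s for share, s in zip(shares, speeds)]


if __name__ == "__main__":
    speeds = [1.0, 2.0, 3.0]          # hypothetical relative processor speeds
    shares = split_divisible_load(120.0, speeds)
    print(shares)                      # [20.0, 40.0, 60.0]
    print(finish_times(shares, speeds))  # [20.0, 20.0, 20.0]
```

Once distribution is sequential rather than instantaneous, the shares are no longer simply proportional to speed, which is the part the analysis in the paper covers.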

global communications conference | 2005

Optimal network processor topologies for efficient packet processing

Jingnan Yao; Yan Luo; Laxmi N. Bhuyan; Ravishankar Iyer

In this paper, we propose a novel strategy to determine the optimal network processor (NP) topology for the target application tasks. We partition network applications into different stages with the consideration of limited instruction memory of the processing elements (PEs). We develop a theoretical approach to determine an optimal topology of the PEs via multiple pipelines. The idea of multiple pipelining is to exploit the task/packet level parallelism and the pipelines are further optimized to achieve the maximum throughput and resource utilization. Simulation results verify our analytical model and demonstrate the robustness of our approach in different NP configurations.

Computer Communications | 2008

Fair link striping with FIFO delivery on heterogeneous channels

Jingnan Yao; Jiani Guo; Laxmi N. Bhuyan

Link aggregation techniques are often used to achieve higher communication bandwidth by striping network traffic across multiple transmission channels. Due to the variations in bandwidth, latency and loss rate on different channels, link striping suffers from packet reordering, thereby adversely affecting the performance of any QoS-concerned applications. Hardware-based solutions often prolong transmission latency, which is undesirable for delay-sensitive applications, and are restricted by the available buffer space on the device. Thus, an effective striping protocol that ensures both load balancing and minimal packet reordering is important when striping traffic onto multiple channels. In this paper, we first propose a sequence preserving scheduling (SPS) scheme to schedule packets among multiple heterogeneous communication channels assuming that the workload is perfectly divisible. Packets assigned onto different links for transmission are ordered perfectly by applying divisible load theory (DLT). We analyze the throughput and derive expressions for the batch size, scheduling time and the maximum number of channels that can be supported by the sender and receiver. Further, to effectively schedule variable length packets for link striping, we propose a packetized sequence preserving scheduling (P-SPS) scheme by applying a combined packetized technique of deficit round robin (DRR) and surplus round robin (SRR). Extensive sensitivity results are provided through analysis and simulation to show that the proposed algorithms satisfy both the load balancing and in-order requirements for efficient packet transmission.

design automation conference | 2007

Program mapping onto network processors by recursive bipartitioning and refining

Jia Yu; Jingnan Yao; Laxmi N. Bhuyan; Jun Yang

Mapping packet processing applications onto embedded network processors (NPs) is a challenging task due to the unique constraints of NP systems and the characteristics of network application domains. A remarkable difference from general multiprocessor task scheduling is that NPs are often programmed into a hybrid parallel and pipeline topology. In this paper, we introduce a multilevel balancing and refining algorithm for NP program mapping. We use a divide-and-conquer approach to recursively bipartition the task graph into disjoint subdomains. At each level of bipartition, the processing resources will be co-allocated so that an estimation of throughput can be derived. The bipartition continues until the code of the tasks can fit into the instruction memory of processing elements. Then the algorithm iteratively refines the solution by migrating tasks from the bottleneck stage to other stages. The performance of our scheme is evaluated with a suite of NP benchmarks using SUIF/Machine SUIF compiler and Intel IXA Architecture Tool. The throughput improvement is significant: average throughput is increased by 20%, and the maximum is 108%.

local computer networks | 2006

Computing Real Time Jobs in P2P Networks

Jingnan Yao; Jian Zhou; Laxmi N. Bhuyan

In this paper, we present a distributed computing framework designed to support higher quality of service and fault tolerance for processing deadline-driven tasks in a P2P environment. Our proposed strategy strives to build an open infrastructure that is accessible by ordinary users for both cycle donation and consumption. For jobs that fail to be locally accommodated, the proposed scheduler MET (maximum efficiency tree) builds a dynamic multi-level resource tree with minimal yet sufficient power to process the job prior to its deadline. The peer selection policy is based on a joint evaluation of the computational power and communication bandwidth at the nodes. Further, with an optimal load sharing scheme, the resulting resource tree is guaranteed to be power efficient. The proposed computing protocol offers an approach for utilizing idle computing cycles of peer computers on the Internet in a P2P manner. The protocol exhibits three attractive features: decentralized operation, optimized load balancing and guaranteed resource utilization. Extensive simulation experiments are conducted to study the effectiveness of the proposed framework under various network conditions. We compare our strategy with two other tree construction algorithms, namely MST (minimum spanning tree) and MCT (maximum computation tree). It is demonstrated that MET outperforms both MST and MCT consistently. Further, sensitivity results with random node failure/join are also furnished.

global communications conference | 2005

Distributed packet processing in P2P networks

Jingnan Yao; Laxmi N. Bhuyan

In this paper, we propose a distributed packet processing algorithm on a peer-to-peer (P2P) network with the objective to minimize the total processing time. We consider an arbitrary P2P network comprising heterogeneous nodes interconnected via heterogeneous links. Each node on the network has its own local workload to be processed and is ready to share its extra processing power among other peer nodes upon request. We distribute the workload of a host to its peers by organizing them into an efficient resource tree. Since the key idea of this algorithm is to effectively share the available resources on the network by processing the load in a distributed manner, we refer to this approach as resource sharing distributed load processing (RSDLP) algorithm. We evaluate the performance with rigorous simulation experiments under generic system parameters.

international conference on distributed computing systems | 2008

Quantum-Adaptive Scheduling for Multi-Core Network Processors

Yue Zhang; Bin Liu; Lei Shi; Jingnan Yao; Laxmi N. Bhuyan

Efficiency and effectiveness are always the emphases of a scheduler, for both link and processor scheduling. Well-known scheduling algorithms such as surplus round robin (SRR) and elastic round robin (ERR) suffer from twofold shortcomings: 1) additional pre-processing queuing delay and post-processing resequencing delay are incurred due to the lack of short-term load-balancing; 2) bursty scheduling is caused by blind preservation of scheduling history under non-backlogged traffic. In this paper, we propose a quantum-adaptive scheduling (QAS) algorithm, which: 1) synchronizes all the quanta in a fine-grained manner and, 2) adjusts the quanta intelligently based on processor utilization. We theoretically prove that the queuing fairness bound (QFB) for QAS is one third tighter than SRR and ERR. This result approaches the optimal value as obtained in shortest queue first (SQF) algorithm, while still maintaining O(1) complexity. Trace-driven simulations show that QAS reduces average packet delay by 18% to 24% while cutting down the resequencing buffer size by more than 40% compared to SRR and ERR.
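For readers unfamiliar with the SRR baseline named above, here is a minimal sketch of surplus round robin as it is commonly described: a queue may overdraw its credit on the last packet of a turn and repays the debt in later rounds. This is not the QAS algorithm; the function name, the fixed `quantum`, and the choice to leave an idle queue's counter unchanged are assumptions made for this example.

```python
from collections import deque


def surplus_round_robin(queues, quantum, rounds):
    """Serve packets from `queues` (deques of packet sizes) with basic SRR.

    Returns the service order as a list of (queue_index, packet_size).
    Assumption: an idle queue keeps its counter as-is; real
    implementations differ in how they treat idle queues.
    """
    surplus = [0] * len(queues)
    order = []
    for _ in range(rounds):
        for i, q in enumerate(queues):
            if not q:
                continue
            surplus[i] += quantum
            # Keep sending while credit is positive; the final packet may
            # push the counter negative, which carries over as debt.
            while q and surplus[i] > 0:
                size = q.popleft()
                surplus[i] -= size
                order.append((i, size))
    return order


if __name__ == "__main__":
    sample = [deque([500, 100]), deque([300, 300])]
    print(surplus_round_robin(sample, quantum=400, rounds=2))
    # [(0, 500), (1, 300), (1, 300), (0, 100)]
```

Unlike deficit round robin, the 500-byte packet is sent immediately even though it exceeds the 400-byte quantum, and the carried-over counter is the kind of scheduling history whose preservation the abstract identifies as a source of burstiness.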

Archive | 2007