Yi-Neng Lin
National Chiao Tung University
Publications
Featured research published by Yi-Neng Lin.
Wireless Communications and Networking Conference | 2008
Yi-Neng Lin; Che-Wen Wu; Ying-Dar Lin; Yuan-Cheng Lai
The mobile WiMAX systems based on IEEE 802.16e-2005 provide high data rates for mobile wireless networks. However, the link quality is frequently unstable owing to long transmission distances and air interference, which impacts real-time applications. Thus, a bandwidth allocation algorithm is required to be modulation-aware while also satisfying latency guarantees, service differentiation and fairness. This work proposes the Highest Urgency First (HUF) algorithm to address these challenges by taking into consideration the adaptive modulation and coding scheme (MCS) and the urgency of requests. Downlink and uplink sub-frames are determined by reserving bandwidth for the most urgent requests and apportioning the remaining bandwidth among the others. Then, independently in the downlink and uplink, HUF allocates bandwidth to every mobile station according to a pre-calculated U-factor that considers urgency, priority and fairness. Simulation results show that HUF is modulation-aware and achieves the above three objectives, notably a zero violation rate within system capacity as well as throughput comparable to the best existing approaches.
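The urgency-driven ordering at the heart of such a scheduler can be sketched as follows. This is a minimal illustration assuming a multiplicative U-factor combining urgency, priority, and a fairness bias; the names and the exact formula are illustrative, not the paper's definition.

```python
# Illustrative sketch of urgency-first bandwidth allocation in the spirit of HUF.
# The u_factor formula below is an assumption for demonstration, not the paper's.
from dataclasses import dataclass

@dataclass
class Request:
    station: str
    slots_needed: int
    deadline: int      # frames remaining until a latency violation
    priority: int      # service-class weight (higher = more important)
    granted: int = 0

def u_factor(req: Request, now: int) -> float:
    # Urgency dominates as the deadline nears; a fairness term biases toward
    # stations that have received fewer grants so far.
    urgency = 1.0 / max(req.deadline - now, 1)
    fairness = 1.0 / (1 + req.granted)
    return urgency * req.priority * fairness

def allocate(requests, capacity, now=0):
    # Serve requests in descending U-factor order until the frame is full.
    for req in sorted(requests, key=lambda r: u_factor(r, now), reverse=True):
        grant = min(req.slots_needed, capacity)
        req.granted += grant
        capacity -= grant
        if capacity == 0:
            break
    return {r.station: r.granted for r in requests}
```

A request one frame from violation outranks a higher-priority but less urgent one, which is what makes the scheme latency-aware within capacity.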
Computer Communications | 2009
Yi-Neng Lin; Ying-Dar Lin; Yuan-Cheng Lai; Che-Wen Wu
The mobile WiMAX systems based on IEEE 802.16e-2005 provide high data rates for mobile wireless networks. However, the link quality is frequently unstable owing to mobility and air interference, which impacts the latency requirements of real-time applications. In the WiMAX standard, the modulation/coding scheme and the boundary between uplink and downlink sub-frames can be adjusted according to channel quality and traffic volume, respectively. This offers an opportunity to design a MAC-layer uplink/downlink bandwidth allocation algorithm that is QoS/PHY-aware. This work takes into account the adaptive modulation and coding scheme (MCS), uplink and downlink traffic volume, and the QoS parameters of all five defined service classes to design a bandwidth allocation algorithm that calculates the slot allocation in two phases. The first phase decides the boundary between the uplink and downlink sub-frames by satisfying requests with pending latency violations and apportioning the remainder in proportion to traffic volume, while the second phase allocates slots to mobile stations considering urgency, priority and fairness. Simulation results show that our algorithm achieves zero latency violations and higher system throughput compared to existing non-QoS/PHY-aware or less-QoS/PHY-aware approaches.
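The two-phase structure can be sketched as below. The split rule and the simplified proportional second phase are illustrative assumptions standing in for the paper's formulas.

```python
# Illustrative two-phase slot allocation: phase 1 splits the frame between
# downlink and uplink, phase 2 allocates within a direction. The exact rules
# here are assumptions for demonstration, not the paper's algorithm.

def split_subframes(total_slots, urgent_dl, urgent_ul, vol_dl, vol_ul):
    """Phase 1: reserve slots for requests about to violate latency, then
    divide the remainder in proportion to pending traffic volume."""
    remaining = total_slots - urgent_dl - urgent_ul
    assert remaining >= 0, "urgent demand exceeds frame capacity"
    total_vol = vol_dl + vol_ul
    dl_share = remaining * vol_dl // total_vol if total_vol else remaining // 2
    dl = urgent_dl + dl_share
    ul = total_slots - dl
    return dl, ul

def allocate_direction(slots, demands):
    """Phase 2 (simplified): grant demands proportionally within a direction;
    the paper additionally weighs urgency, priority and fairness here."""
    total = sum(demands.values())
    return {s: slots * d // total for s, d in demands.items()} if total else {}
```

For example, a 100-slot frame with 10 urgent downlink and 5 urgent uplink slots, and a 60:25 traffic-volume split, yields a 70/30 sub-frame boundary.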
IEEE Network | 2003
Ying-Dar Lin; Yi-Neng Lin; Shun-Chin Yang; Yu-Sheng Lin
Network processors are emerging as a programmable alternative to traditional ASIC-based solutions for scaling up the data-plane processing of network services. This work, rather than proposing new algorithms, illustrates the process of, and examines the performance issues in, prototyping a DiffServ edge router with the IXP1200. The external benchmarks reveal that although the system can scale to a wire speed of 1.8 Gb/s in simple IP forwarding, the throughput declines to 180-290 Mb/s when DiffServ is performed, due to the double bottlenecks of SRAM and microengines (coprocessors). Through internal benchmarks, the performance bottleneck was found to shift from one place to another given different network services and algorithms. Most of the results reported here should be applicable to other NPs, since they have similar architectures and components.
High Performance Interconnects | 2002
Ying-Dar Lin; Yi-Neng Lin; Shun-Chin Yang; Yu-Sheng Lin
Network processors are emerging as a programmable alternative to traditional ASIC-based solutions for scaling up the data-plane processing of network services. This work, rather than proposing new algorithms, illustrates the process of, and examines the performance issues in, prototyping a DiffServ edge router with the IXP1200. The external benchmarks reveal that although the system can scale to a wire speed of 1.8 Gbps in simple IP forwarding, the throughput declines to 180-290 Mbps when DiffServ is performed, due to the double bottlenecks of SRAM and microengines. Through internal benchmarks, the performance bottleneck was found to shift from one place to another given different network services and algorithms. Most of the results reported here remain applicable to other NPs, since they have similar architectures and components.
Journal of Systems Architecture | 2007
Ying-Dar Lin; Kuo-Kun Tseng; Tsern-Huei Lee; Yi-Neng Lin; Chen-Chou Hung; Yuan-Cheng Lai
String matching plays a central role in packet inspection applications such as intrusion detection, anti-virus, anti-spam and Web filtering. Since these applications are computation- and memory-intensive, software matching algorithms cannot meet high-speed performance requirements, so offloading packet inspection to dedicated hardware seems inevitable. This paper presents a scalable automaton matching (SAM) coprocessor that uses the Aho-Corasick (AC) algorithm with two parallel acceleration techniques, root-indexing and pre-hashing. Root-indexing can match multiple bytes in a single matching step, and pre-hashing can be used to avoid bitmap AC matching, which is a cycle-consuming operation. In the platform-based SoC implementation on the Xilinx ML310 FPGA, the proposed hardware architecture achieves almost 10.7 Gbps and supports over 10,000 virus patterns, the largest pattern set among existing works. On average, SAM performs 7.65 times faster than the original bitmap AC. Furthermore, SAM is feasible for either internal or external memory architectures: the internal memory architecture provides high performance, while the external memory architecture provides high scalability in terms of the number of patterns.
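The software baseline, and the idea of a cheap pre-check before expensive automaton work, can be sketched as follows. This is a plain Aho-Corasick automaton plus a toy "pre-hash" filter over 2-byte pattern prefixes, loosely mirroring SAM's idea of skipping costly bitmap-AC lookups; it is not the hardware design.

```python
# Plain Aho-Corasick with a toy pre-filter: only descend from the root when the
# next two bytes match some pattern prefix (patterns assumed length >= 2).
from collections import deque

def build_ac(patterns):
    # goto: per-state transition dicts; fail: failure links; out: matched patterns.
    goto, fail, out = [{}], [0], [set()]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    q = deque(goto[0].values())       # depth-1 states fail to the root
    while q:
        r = q.popleft()
        for ch, s in goto[r].items():
            q.append(s)
            f = fail[r]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[s] = goto[f].get(ch, 0)
            out[s] |= out[fail[s]]
    return goto, fail, out

def prehash(patterns):
    # Toy stand-in for pre-hashing: the set of 2-byte pattern prefixes.
    return {p[:2] for p in patterns}

def search(text, goto, fail, out, prefilter):
    hits, s = [], 0
    for i, ch in enumerate(text):
        # Pre-check: skip the automaton entirely when at the root and no
        # pattern could start at this position.
        if s == 0 and text[i:i + 2] not in prefilter:
            continue
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return hits
```

The filter is safe because every match must begin with a root transition at its start position, and it only skips positions where no pattern prefix is present.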
Real Time Technology and Applications Symposium | 2005
Yi-Neng Lin; Chiuan-Hung Lin; Ying-Dar Lin; Yuan-Cheng Lai
Networking applications, such as VPN and content filtering, demand extra computing power to meet today's throughput requirements. In addition to pure ASIC solutions, the network processor architecture is emerging as an alternative for scaling up data-plane processing while retaining design flexibility. This article, rather than proposing new algorithms, describes our experience developing IPSec-based VPN gateways over network processors and investigates the performance issues. The external benchmarks reveal that the system can reach 45 Mbps for IPSec using the 3DES algorithm, a 350% improvement over a single XScale core and comparable to the throughput of a PIII 1 GHz processor. Through the internal benchmarks, we analyze the turnaround times of the main functional blocks and identify the core processor as the performance bottleneck for both packet forwarding and IPSec processing.
Journal of Systems and Software | 2007
Yi-Neng Lin; Yao-Chung Chang; Ying-Dar Lin; Yuan-Cheng Lai
Networking applications with high memory access overhead increasingly exploit network processors that feature multiple hardware-multithreaded processor cores along with a versatile memory hierarchy. Given rich hardware resources, however, performance depends on whether those resources are properly allocated. In this work, we develop an NIPS (Network Intrusion Prevention System) edge gateway over the Intel IXP2400 by characterizing the processing stages and mapping them onto hardware components. The impact and strategy of resource allocation are also investigated through internal and external benchmarks. Important conclusions include: (1) the system throughput is influenced mostly by the total number of threads, namely I×J, where I and J represent the numbers of processors and threads per processor, respectively, as long as the processors are not fully utilized; (2) given an application, algorithm and hardware specification, an appropriate (I, J) for packet inspection can be derived; and (3) the effectiveness of multiple memory banks for tackling the SRAM bottleneck is affected considerably by the algorithms adopted.
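Conclusion (1) can be made concrete with a back-of-the-envelope model: while engines are unsaturated, throughput tracks the total thread count I×J, because extra threads overlap one another's memory stalls. The saturation formula below is an illustrative approximation, not the paper's benchmark-derived relation.

```python
# Toy model of multithreaded-engine utilization: P is per-packet compute time,
# M is per-packet memory-stall time. With J threads, the compute of the other
# threads can hide one thread's stall, up to full utilization.

def utilization(J, P, M):
    # Fraction of time one engine does useful compute.
    return min(1.0, J * P / (P + M))

def throughput(I, J, P, M, rate_per_engine=1.0):
    # Aggregate throughput of I engines, each scaled by its utilization.
    return I * rate_per_engine * utilization(J, P, M)
```

In this model, throughput(2, 4, P, M) equals throughput(4, 2, P, M) until saturation, matching the observation that only the product I×J matters below full utilization.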
Archive | 2009
Yi-Neng Lin; Shih-Hsin Chien; Ying-Dar Lin; Yuan-Cheng Lai; Mingshou Liu
The IEEE 802.16e-2005 standard is designed to support high bandwidth for wireless metropolitan area networks. However, the link quality is likely to degrade drastically due to unstable wireless links, posing challenges for real-time applications. Therefore, a feasible bandwidth allocation algorithm is required to utilize the scarce bandwidth efficiently and to provide service differentiation. This article presents the general background of allocation schemes and introduces a Two-Phase Proportionating (TPP) algorithm to tackle these challenges. The first phase dynamically determines the subframe sizes, while the second phase further differentiates service classes and prevents bandwidth waste. Performance comparison with other algorithms confirms that TPP achieves the highest bandwidth utilization and the most appropriate differentiation.
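The "differentiate, then avoid waste" step can be sketched as a weighted grant followed by reclamation of slots a class cannot use. The class names and weights below are illustrative assumptions, not TPP's actual parameters.

```python
# Illustrative second-phase differentiation: grant each service class its
# weighted share, capped by demand, then redistribute leftover slots so no
# bandwidth is wasted. Class names and weights are assumptions.

def differentiate(slots, demand, weight):
    total_w = sum(weight.values())
    # Weighted share, capped by what each class can actually use.
    grant = {c: min(demand[c], slots * weight[c] // total_w) for c in demand}
    leftover = slots - sum(grant.values())
    # Reclaim unused share for classes with remaining demand, by weight order.
    for c in sorted(demand, key=weight.get, reverse=True):
        extra = min(demand[c] - grant[c], leftover)
        grant[c] += extra
        leftover -= extra
    return grant
```

Without the reclamation pass, a lightly loaded high-weight class would strand slots that a backlogged lower class could use; the pass is what prevents that waste.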
Parallel Computing | 2010
Yi-Neng Lin; Ying-Dar Lin; Yuan-Cheng Lai
This work derives guidelines for thread allocation in CMP-based network processors performing general applications through continuous-time Markov chain modeling and Petri net simulations. The concept of the P-M ratio, where P and M indicate the computational and memory access overhead when processing a packet, is introduced, and its relation to thread allocation is explored. Results indicate that the demand for threads in a processor diminishes rapidly as the P-M ratio increases to 0.066, and decreases slowly afterwards. Observations from a given P-M ratio can be applied to various software-hardware combinations having the same ratio.
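The qualitative shape of this result, rapid decline in thread demand at small P-M ratios and a slow tail afterwards, also falls out of the classic latency-hiding estimate: to keep an engine busy, the compute phases of the other threads must cover one thread's memory stall, giving roughly 1 + M/P = 1 + 1/ratio threads. This is a textbook approximation offered for intuition, not the paper's Markov-chain model.

```python
# Textbook latency-hiding estimate of thread demand as a function of the
# P-M ratio (compute time / memory-stall time). Illustrative only; the paper
# derives its numbers from a continuous-time Markov chain model.
import math

def threads_needed(pm_ratio):
    # One running thread plus enough others to cover its memory stall.
    return math.ceil(1 + 1 / pm_ratio)
```

Because 1 + 1/r is steep for small r and flattens as r grows, thread demand drops sharply at low ratios and only slowly beyond, consistent with the reported behavior.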
Advanced Information Networking and Applications | 2008
Yi-Neng Lin; Ying-Dar Lin; Yuan-Cheng Lai
This work derives guidelines for thread allocation in chip multiprocessor (CMP)-based network processors performing general applications through continuous-time Markov chain modeling and Petri net simulations. The concept of the P-M ratio, where P and M indicate the computational and memory access overhead when processing a packet, is introduced, and its relation to thread allocation is explored. Results indicate that the demand for threads in a processor diminishes rapidly as the P-M ratio increases to 0.066, and decreases slowly afterwards. Observations from a given P-M ratio can be applied to various software-hardware combinations having the same ratio.