Derek Chi-Wai Pao | Researchain

Publication

Featured research published by Derek Chi-Wai Pao.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 1992

Shapes recognition using the straight line Hough transform: theory and generalization

Derek Chi-Wai Pao; Hon Fung Li; R. Jayakumar

A shape matching technique based on the straight line Hough transform (SLHT) is presented. In the θ-ρ space, the transform can be expressed as the sum of the translation term and the intrinsic term. This formulation allows the translation, rotation, and intrinsic parameters of the curve to be easily decoupled. A shape signature, called the scalable translation invariant rotation-to-shifting (STIRS) signature, is obtained from the θ-ρ space by computing the distances between pairs of points having the same θ value. This signature is invariant to translation and can be easily normalized, and rotation in the image space corresponds to circular shifting of the signature. Matching two signatures only amounts to computing a 1D correlation. The height and location of a peak (if it exists) indicate the similarity and orientation of the test object with respect to the reference object. The location of the test object is obtained, once the orientation is known, by an inverse transform (voting) from the θ-ρ space to the x-y plane.
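
The matching step described above (rotation appears as a circular shift, so matching reduces to a 1D correlation) can be sketched as follows. This is only an illustration under stated assumptions: the construction of the STIRS signature itself is not shown, the input lists are hypothetical sampled signatures of equal length, and the function names `circular_correlation` and `best_shift` are invented for this example rather than taken from the paper.

```python
import math

def circular_correlation(reference, test):
    """Normalized circular cross-correlation of two equal-length 1D signatures.

    Entry s of the result is the correlation between `reference` and `test`
    after `test` has been circularly shifted back by s positions.
    """
    n = len(reference)
    if n == 0 or n != len(test):
        raise ValueError("signatures must be non-empty and of equal length")
    ref_mean = sum(reference) / n
    tst_mean = sum(test) / n
    ref = [v - ref_mean for v in reference]
    tst = [v - tst_mean for v in test]
    norm = math.sqrt(sum(v * v for v in ref)) * math.sqrt(sum(v * v for v in tst))
    if norm == 0:
        return [0.0] * n
    return [
        sum(ref[i] * tst[(i + s) % n] for i in range(n)) / norm
        for s in range(n)
    ]

def best_shift(reference, test):
    """Return (shift, peak height); the shift plays the role of the orientation."""
    corr = circular_correlation(reference, test)
    shift = max(range(len(corr)), key=corr.__getitem__)
    return shift, corr[shift]

# Hypothetical signature and a copy circularly shifted by 3 samples.
signature = [0, 1, 4, 2, 7, 3, 1, 0]
rotated = signature[-3:] + signature[:-3]
shift, peak = best_shift(signature, rotated)
```

The peak location recovers the shift and the peak height approaches 1 for an identical shape; a direct O(n²) loop is used here for clarity, although an FFT-based correlation would be the usual choice for long signatures.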

Pattern Recognition | 1989

Improvements and systolic implementation of the Hough transformation for straight line detection

Hon Fung Li; Derek Chi-Wai Pao; R. Jayakumar

The Hough Transformation (HT) is an efficient method to detect straight lines in digital pictures. In the conventional HT, pixel contiguity is not taken into account, and this leads to the following drawbacks: (1) the actual length of line segments cannot be computed; (2) collinear line segments cannot be distinguished; and (3) very often, false lines are detected and short lines go undetected. This paper proposes a modified Hough Transformation which performs a contiguity check in a simple and efficient way. A systolic architecture that implements this modified transform is presented. The systolic array takes the bit-map of the binary picture as input and processes one row/column of pixels concurrently. The area-time complexity of the proposed architecture is shown to be superior to the conventional sequential algorithm. Preliminary simulation results are presented.
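
To make drawback (2) concrete, here is a minimal sketch of the conventional HT voting step, not the modified transform or the systolic design proposed in the paper. The function name `hough_lines`, the 1-degree angular resolution, and the sample points are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def hough_lines(points, theta_steps=180):
    """Conventional Hough voting: each pixel votes for every (theta, rho) bin
    of a line rho = x*cos(theta) + y*sin(theta) that passes through it."""
    accumulator = Counter()
    for x, y in points:
        for t in range(theta_steps):
            theta = math.pi * t / theta_steps
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            accumulator[(t, rho)] += 1
    return accumulator

# Two separate collinear segments on the line y = 5, with a gap between them.
segment_a = [(x, 5) for x in range(0, 50, 10)]
segment_b = [(x, 5) for x in range(60, 110, 10)]
accumulator = hough_lines(segment_a + segment_b)
(peak_bin, votes), = accumulator.most_common(1)
```

Both segments vote into the same bin (theta index 90, rho 5), so the accumulator alone reports one line with ten votes and carries no information about the gap or the segment lengths.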

Computer Networks | 2006

Efficient packet classification using TCAMs

Derek Chi-Wai Pao; Yiu Keung Li; Peng Zhou

Multi-field packet classification is necessary to support advanced Internet functions, such as network security, quality of service provisioning, traffic policing, and virtual private networking. Ternary content addressable memory (TCAM) is currently the dominant solution in industry because of its speed and the simplicity of filter table management. High cost and high power consumption are the two major drawbacks of TCAM-based lookup engines. Adoption of IPv6 with increased address length will further exacerbate the challenges. In this article, we present a filter encoding method, called prefix inclusion coding (PIC), to improve the efficiency of TCAM-based lookup engines. Filters are stored in an encoded format to reduce the storage requirement. Codeword assignment in PIC preserves the inclusion relationship among prefixes/ranges. By doing so, a prefix will be represented by a single codeword, and unnecessary filter replication can be avoided. Codeword lookup is equivalent to finding the longest matching prefix in the codeword table. Hence, a pure-TCAM lookup engine can be built without the need for other semi-custom ASICs in the system. Our method can reduce the TCAM storage requirement by 70% to over 90%. The reduction in TCAM storage requirement also helps to alleviate the high power dissipation problem. The proposed method can be applied to both IPv4 and IPv6.
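
The filter replication mentioned above arises because a TCAM stores ternary patterns, so an arbitrary range must be split into several prefixes. The sketch below shows that baseline expansion and a software model of a first-match ternary lookup. It does not implement PIC; the helper names (`range_to_prefixes`, `to_ternary`, `tcam_lookup`) and the 4-bit field width are assumptions for illustration.

```python
def range_to_prefixes(lo, hi, width):
    """Split the integer range [lo, hi] into a minimal list of (value, length) prefixes."""
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at lo ...
        size = lo & -lo if lo else 1 << width
        # ... shrunk until it fits inside the remaining range.
        while size > hi - lo + 1:
            size >>= 1
        prefixes.append((lo, width - (size.bit_length() - 1)))
        lo += size
    return prefixes

def to_ternary(value, length, width):
    """Render a prefix as a ternary string, with '*' for don't-care bits."""
    return format(value, f"0{width}b")[:length] + "*" * (width - length)

def tcam_lookup(key, entries, width):
    """Return the index of the first ternary entry matching the key, or None."""
    bits = format(key, f"0{width}b")
    for index, pattern in enumerate(entries):
        if all(p in ("*", b) for p, b in zip(pattern, bits)):
            return index
    return None

WIDTH = 4
entries = [to_ternary(v, n, WIDTH) for v, n in range_to_prefixes(1, 14, WIDTH)]
```

For a 4-bit field, the single range [1, 14] expands to six entries (`0001`, `001*`, `01**`, `10**`, `110*`, `1110`). A filter with several range fields multiplies these counts, which is the storage cost that an encoding scheme such as PIC aims to avoid.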

IEEE Computer Architecture Letters | 2008

Pipelined Architecture for Multi-String Matching

Derek Chi-Wai Pao; Wei Lin; Bin Liu

This letter presents a new oblivious routing algorithm for 3D mesh networks called randomized partially-minimal (RPM) routing that provably achieves optimal worst-case throughput for 3D meshes when the network radix k is even and within a factor of 1/k^2 of optimal when k is odd. Although this optimality result has been achieved with the minimal routing algorithm O1TURN for the 2D case, the worst-case throughput of O1TURN degrades tremendously in higher dimensions. Other existing routing algorithms suffer from either poor worst-case throughput (DOR, ROMM) or poor latency (VAL). RPM, on the other hand, achieves near-optimal worst-case and good average-case throughput as well as good latency performance.

International Conference on Computer Communications | 2002

Efficient hardware architecture for fast IP address lookup

Derek Chi-Wai Pao; Cutson Liu; Angus Wu; Lawrence Yeung; K.S. Chan

A multigigabit IP router may receive several million packets per second from each input link. For each packet, the router needs to find the longest matching prefix in the forwarding table in order to determine the packet's next hop. In this paper, we present an efficient hardware solution for the IP address lookup problem. We model the address lookup problem as a searching problem on a binary-trie. The binary-trie is partitioned into four levels of fixed size 255-node subtrees. We employ a hierarchical indexing structure to facilitate direct access to subtrees in a given level. It is estimated that a forwarding table with 40 K prefixes will consume 2.5 Mbytes of memory. The searching is implemented using a hardware pipeline with a minimum cycle of 12.5 ns if the memory modules are implemented using SRAM. A distinguishing feature of our design is that forwarding table entries are not replicated in the data structure. Hence, table updates can be done in constant time with only a few memory accesses.
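
As a software illustration of the underlying search problem, the following is a minimal sketch of longest-prefix matching on a plain binary trie. It does not model the partitioned subtrees, the hierarchical index, or the hardware pipeline described in the abstract; the class names, the next-hop labels, and the example routes are hypothetical.

```python
import ipaddress

class TrieNode:
    __slots__ = ("children", "next_hop")

    def __init__(self):
        self.children = [None, None]  # child for bit 0 and bit 1
        self.next_hop = None          # set only where a prefix ends

class BinaryTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, cidr, next_hop):
        """Store one route; each prefix occupies exactly one node (no replication)."""
        network = ipaddress.ip_network(cidr)
        bits = format(int(network.network_address), "032b")[:network.prefixlen]
        node = self.root
        for bit in bits:
            index = int(bit)
            if node.children[index] is None:
                node.children[index] = TrieNode()
            node = node.children[index]
        node.next_hop = next_hop

    def lookup(self, address):
        """Walk the address bits and remember the last prefix seen on the path."""
        bits = format(int(ipaddress.IPv4Address(address)), "032b")
        node = self.root
        best = node.next_hop
        for bit in bits:
            node = node.children[int(bit)]
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop
        return best

table = BinaryTrie()
table.insert("10.0.0.0/8", "port-A")
table.insert("10.1.0.0/16", "port-B")
```

Here `table.lookup("10.1.2.3")` follows both prefixes and returns the longer match. Because each prefix is stored at a single node, inserting or deleting a route touches only one path, which is the property the abstract relies on for fast updates.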

ACM Transactions on Architecture and Code Optimization | 2010

A memory-efficient pipelined implementation of the Aho-Corasick string-matching algorithm

Derek Chi-Wai Pao; Wei Lin; Bin Liu

With rapid advancement in Internet technology and usage, some emerging applications in data communications and network security require matching of huge volumes of data against large signature sets with thousands of strings in real time. In this article, we present a memory-efficient hardware implementation of the well-known Aho-Corasick (AC) string-matching algorithm using a pipelining approach called P-AC. An attractive feature of the AC algorithm is that it can solve the string-matching problem in time linearly proportional to the length of the input stream, and the computation time is independent of the number of strings in the signature set. A major disadvantage of the AC algorithm is the high memory cost required to store the transition rules of the underlying deterministic finite automaton. By incorporating pipelined processing, the state graph is reduced to a character trie that only contains forward edges. Together with an intelligent implementation of look-up tables, the memory cost of P-AC is only about 18 bits per character for a signature set containing 6,166 strings extracted from Snort. The control structure of P-AC is simple and elegant. The cost of the control logic is very low. With the availability of dual-port memories in FPGA devices, we can double the system throughput by duplicating the control logic such that the system can process two data streams concurrently. Since our method is memory-based, incremental changes to the signature set can be accommodated by updating the look-up tables without reconfiguring the FPGA circuitry.
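
For reference, the classical software form of the AC algorithm (a character trie plus failure links) can be sketched as below. This is the textbook algorithm, not the pipelined P-AC design, which removes the non-forward edges; the function names and the sample patterns are chosen for illustration only.

```python
from collections import deque

def build_automaton(patterns):
    """Build the goto trie, failure links, and output lists for the patterns."""
    goto, fail, out = [{}], [0], [[]]
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pattern)
    # Breadth-first pass: failure links of depth-1 states stay at the root.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, child in goto[state].items():
            queue.append(child)
            fallback = fail[state]
            while fallback and ch not in goto[fallback]:
                fallback = fail[fallback]
            fail[child] = goto[fallback].get(ch, 0)
            # Inherit patterns that end at the failure target (proper suffixes).
            out[child] = out[child] + out[fail[child]]
    return goto, fail, out

def search(text, automaton):
    """Scan the text once and return (start index, pattern) for every match."""
    goto, fail, out = automaton
    state, matches = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            matches.append((i - len(pattern) + 1, pattern))
    return matches

automaton = build_automaton(["he", "she", "his", "hers"])
matches = search("ushers", automaton)
```

The scan visits each input character once regardless of how many patterns are loaded, which is the linear-time property noted in the abstract. The failure links are the backward transitions whose storage cost motivates the trie-only representation.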

International Parallel and Distributed Processing Symposium | 2007

Route Table Partitioning and Load Balancing for Parallel Searching with TCAMs

Dong Lin; Yue Zhang; Chengchen Hu; Bin Liu; Xin Zhang; Derek Chi-Wai Pao

With the continuous advances in optical communications technology, the link transmission speed of the Internet backbone has been increasing rapidly. This in turn demands more powerful IP address lookup engines. In this paper, we propose a power-efficient parallel TCAM-based lookup engine with a distributed logical caching scheme for dynamic load-balancing. In order to distribute the lookup requests among multiple TCAM chips, a smart partitioning approach called pre-order splitting divides the route table into multiple sub-tables for parallel processing. Meanwhile, by virtue of the cache-based load balancing scheme with a slow-update mechanism, a speedup factor of N-1 can be guaranteed for a system with N (N>2) TCAM chips, even with unbalanced bursty lookup requests.

Pattern Recognition Letters | 1993

A decomposable parameter space for the detection of ellipses

Derek Chi-Wai Pao; Hon Fung Li; R. Jayakumar

The Hough transform is a well-known method for detecting parametric curves in binary images. One major drawback of the method is that the transform requires time and memory space exponential in the number of parameters of the curves. An effective approach to reduce both the time and space requirements is parameter space decomposition. In this paper, we present two methods for the detection of ellipses based on the straight line Hough transform (SLHT). The SLHT of a curve in the θ-ρ space can be expressed as the sum of two terms, namely, the translation term and the intrinsic term. One useful property of this representation is that it allows the translation, rotation and intrinsic parameters of the curve to be separated easily. Timing performance of the proposed methods compares favorably with the other Hough-based methods.

IET Computers and Digital Techniques | 2007

Enhanced prefix inclusion coding filter-encoding algorithm for packet classification with ternary content addressable memory

Derek Chi-Wai Pao; Peng Zhou; Bin Liu; Xin Zhang

Filter encoding can effectively enhance the efficiency of ternary content addressable memory (TCAM)-based packet classification. It can minimise the range expansion problem, reduce the TCAM space requirement and improve the lookup rate for IPv6. However, additional complexity is inevitably incurred in the filter table update operations. Although the average update cost of the prefix inclusion coding (PIC) scheme is very low, the worst-case update cost can be significantly higher. Major modifications to the PIC scheme to improve its update performance are presented. The new coding scheme is called PIC with segmented domain. By dividing the field value domain into multiple segments, the mapping of field values to code points can be more structured and help avoid massive code-point relocation in the event of new insertions. Moreover, the simplified codeword lookup for the address fields can be implemented with embedded SRAM rather than with TCAM. Consequently, the lookup rate of the search engine can be improved to handle the OC-768 line rate.

IEEE Communications Letters | 2003

Enabling incremental updates to LC-trie for efficient management of IP forwarding tables

Derek Chi-Wai Pao; Yiu-Keung Li

The level-compressed trie (LC-trie) is an efficient data structure for fast IP address lookup. However, the data structure needs to be rebuilt every time the table is updated. Consequently, the LC-trie algorithm is not suitable for application in a dynamic environment where frequent updates to the forwarding table are necessary. It is shown that with appropriate modifications to the data structure, incremental updates can be done efficiently.