Is this you? Create Your Porfile

Hoang Le

University of Southern California

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Hoang Le is active.

Explore More

Publication

Featured researches published by Hoang Le.

field-programmable custom computing machines | 2009

Scalable High Throughput and Power Efficient IP-Lookup on FPGA

Hoang Le; Viktor K. Prasanna

Most high-speed Internet Protocol (IP) lookup implementations use tree traversal and pipelining. Due to the available on-chip memory and the number of I/O pins ofField Programmable Gate Arrays (FPGAs), state-of-the-artdesigns cannot support the current largest routing table(consisting of 257K prefixes in backbone routers). We propose a novel scalable high-throughput, low-power SRAM-based linear pipeline architecture for IP lookup. Using asingle FPGA, the proposed architecture can support thecurrent largest routing table, or even larger tables of upto 400K prefixes. Our architecture can also be easily partitioned, so as to use external SRAM to handle even larger routing tables (up to 1.7M prefixes). Our implementation shows a high throughput (340 mega lookups per second or 109 Gbps), even when external SRAM is used. The use of SRAM (instead of TCAM) leads to an order of magnitude reduction in power dissipation. Additionally, the architecture supports power saving by allowing only a portion of the memory to be active on each memory access. Our design also maintains packet input order and supports in-place non-blocking route updates.

field programmable custom computing machines | 2008

A SRAM-based Architecture for Trie-based IP Lookup Using FPGA

Hoang Le; Weirong Jiang; Viktor K. Prasanna

Internet Protocol (IP) lookup in routers can be implemented by some form of tree traversal. Pipelining can dramatically improve the search throughput. However, it results in unbalanced memory allocation over the pipeline stages. This has been identified as a major challenge for pipelined solutions. In this paper, an IP lookup rate of 325 MLPS (millions lookups per second) is achieved using a novel SRAM-based bidirectional optimized linear pipeline architecture on Field Programmable Gate Array, named BiOLP, for tree-based search engines in IP routers. BiOLP can also achieve a perfectly balanced memory distribution over the pipeline stages. Moreover, by employing caching to exploit the Internet traffic locality, BiOLP can achieve a high throughput of up to 1.3 GLPS (billion lookups per second). It also maintains packet input order, and supports route updates without blocking subsequent incoming packets.

IEEE Transactions on Computers | 2012

Scalable Tree-Based Architectures for IPv4/v6 Lookup Using Prefix Partitioning

Hoang Le; Viktor K. Prasanna

Memory efficiency and dynamically updateable data structures for Internet Protocol (IP) lookup have regained much interest in the research community. In this paper, we revisit the classic tree-based approach for solving the longest prefix matching (LPM) problem used in IP lookup. In particular, we target our solutions for a class of large and sparsely distributed routing tables, such as those potentially arising in the next-generation IPv6 routing protocol. Due to longer prefix lengths and much larger address space, preprocessing such routing tables for tree-based LPM can significantly increase the number of prefixes and/or memory stages required for IP lookup. We propose a prefix partitioning algorithm (DPP) to divide a given routing table into k groups of disjoint prefixes (k is given). The algorithm employs dynamic programming to determine the optimal split lengths between the groups to minimize the total memory requirement. Our algorithm demonstrates a substantial reduction in the memory footprint compared with those of the state of the art in both IPv4 and IPv6 cases. Two proposed linear pipelined architectures, which achieve high throughput and support incremental updates, are also presented. The proposed algorithm and architectures achieve a memory efficiency of 1 byte of memory for each byte of prefix for both IPv4 and IPv6. As a result, our design scales well to support either larger routing tables, longer prefix lengths, or both. The total memory requirement depends solely on the number of prefixes. Implementations on 45 nm ASIC and a state-of-the-art FPGA device (for a routing table consisting of 330K prefixes) show that our algorithm achieves 980 and 410 million lookups per second, respectively. These results are well suited for 100 Gbps lookup. The implementations also scale to support larger routing tables and longer prefix length when we go from IPv4 to IPv6. Additionally, the proposed architectures can easily interface with external SRAMs to ease the limitation of on-chip memory of the target devices.

field-programmable logic and applications | 2008

Scalable high-throughput SRAM-based architecture for IP-lookup using FPGA

Hoang Le; Weirong Jiang; Viktor K. Prasanna

Most high-speed Internet Protocol (IP) lookup implementations use tree traversal and pipelining. However, this approach results in inefficient memory utilization. Due to available on-chip memory and pin limitations of FPGAs, state-of-the-art designs on FPGAs cannot support large routing tables arising in backbone routers. Therefore, ternary content addressable memory (TCAM) is widely used. We propose a novel SRAM-based linear pipeline architecture, named DuPI. Using a single Virtex-4, DuPI can support a routing table of up to 228 K prefixes, which is 3times the state-of-the-art. Our architecture can also be easily partitioned, so as to use external SRAM to handle even larger routing tables (up to 2 M prefixes), while maintaining a 324 MLPS throughput. The use of SRAM (instead of TCAM) leads to orders of magnitude of reduction in power dissipation. Employing caching to exploit Internet traffic locality, we can achieve a throughput of 1.3 GLPS (billion lookups per second). Our design also maintains packet input order, and supports in-place non-blocking route updates.

field programmable gate arrays | 2011

Memory-efficient and scalable virtual routers using FPGA

Hoang Le; Thilan Ganegedara; Viktor K. Prasanna

Router virtualization has recently gained much interest in the research community. It allows multiple virtual router instances to run on a common physical router platform. The key metrics in designing network virtual routers are: (1) number of supported virtual router instances, (2) total number of prefixes, and (3) ability to quickly update the virtual table. Limited on-chip memory in FPGA leads to the need for memory-efficient merging algorithms. On the other hand, due to high frequency of combined updates from all the virtual routers, the merging algorithms must be highly efficient. Hence, the router must support quick updates. In this paper, we propose a simple merging algorithm whose performance is not sensitive to the number of routing tables considered. The performance solely depends on the total number of prefixes. We also propose a novel scalable, high-throughput linear pipeline architecture for IP-lookup that supports large virtual routing tables and quick non-blocking update. Using a state-of-the-art Field Programmable Gate Array (FPGA) along with external SRAM, the proposed architecture can support up to 16M IPv4 and 880K IPv6 prefixes. Our implementation shows a sustained through-put of 400 million lookups per second, even when external SRAM is used.

field-programmable logic and applications | 2013

Energy efficient parameterized FFT architecture

Ren Chen; Hoang Le; Viktor K. Prasanna

In this paper, we revisit the classic Fast Fourier Transform (FFT) for energy efficient designs on FPGAs. A parameterized FFT architecture is proposed to identify the design trade-offs in achieving energy efficiency. We first perform design space exploration by varying the algorithm mapping parameters, such as the degree of vertical and horizontal parallelism, that characterize decomposition based FFT algorithms. Then we explore an energy efficient design by empirical selection on the values of the chosen architecture parameters, including the type of memory elements, the type of interconnection network and the number of pipeline stages. The trade offs between energy, area, and time are analyzed using two performance metrics: the energy efficiency (defined as the number of operations per Joule) and the Energy×Area×Time (EAT) composite metric. From the experimental results, a design space is generated to demonstrate the effect of these parameters on the various performance metrics. For N-point FFT (16 ≤ N ≤ 1024), our designs achieve up to 28% and 38% improvement in the energy efficiency and EAT, respectively, compared with a state-of-the-art design.

international conference on computer communications | 2012

Hierarchical hybrid search structure for high performance packet classification

Og̃uzhan Erdem; Hoang Le; Viktor K. Prasanna

Hierarchical search structures for packet classification offer good memory performance and support quick rule updates when implemented on multi-core network processors. However, pipelined hardware implementation of these algorithms has two disadvantages: (1) backtracking which requires stalling the pipeline and (2) inefficient memory usage due to variation in the size of the trie nodes. We propose a clustering algorithm that can partition a given rule database into a fixed number of clusters to eliminate back-tracking in the state-of-the-art hierarchical search structures. Furthermore, we develop a novel ternary trie data structure (T∈). In T∈ structure, the size of the trie nodes is fixed by utilizing ∈-branch property, which overcomes the memory inefficiency problems in the pipelined hardware implementation of hierarchical search structures. We design a two-stage hierarchical search structure consisting of binary search trees in Stage 1, and T∈ structures in Stage 2. Our approach demonstrates a substantial reduction in the memory footprint compared with that of the state-of-the-art. For all publicly available databases, the achieved memory efficiency is between 10.37 and 22.81 bytes of memory per rule. State-of-the-art designs can only achieve the memory efficiency of over 23 byte/rule in the best case. We also propose a SRAM-based linear pipelined architecture for packet classification that achieves high throughput. Using a state-of-the-art FPGA, the proposed design can sustain a 418 million packets per second throughput or 134 Gbps (for the minimum packet size of 40 Bytes). Additionally, our design maintains packet input order and supports in-place non-blocking rule updates.

field-programmable custom computing machines | 2011

Memory-Efficient IPv4/v6 Lookup on FPGAs Using Distance-Bounded Path Compression

Hoang Le; Weirong Jiang; Viktor K. Prasanna

Memory efficiency with compact data structures for Internet Protocol (IP) lookup has recently regained much interest in the research community. In this paper, we revisit the classic trie-based approach for solving the longest prefix matching (LPM) problem used in IP lookup. In particular, we target our solutions for a class of large and sparsely-distributed routing tables, such as those potentially arising in the next-generation IPv6 routing protocol. Due to longer prefix lengths and much larger address space, straight-forward implementation of trie-based LPM can significantly increase the number of nodes and/or memory required for IP lookup. Additionally, due to the available on-chip memory and the number of I/O pins of Field Programmable Gate Arrays (FPGAs), state-of-the-art designs cannot support large IPv6 routing tables consisting of over

reconfigurable computing and fpgas | 2011

Optimizing Decomposition-Based Packet Classification Implementation on FPGAs

Lu Sun; Hoang Le; Viktor K. Prasanna

300

international conference on computer communications | 2010

High Performance Dictionary-Based String Matching for Deep Packet Inspection

Yi-Hua Edward Yang; Hoang Le; Viktor K. Prasanna

K prefixes. We propose two algorithms to compress the uni-bit-trie representation of a given routing table: (1) \emph{single-prefix distance-bounded path compression} and (2) \emph{multiple-prefix distance-bounded path compression}. These algorithms determine the optimal maximum \emph{skip distance} at each node of the trie to minimize the total memory requirement. Our algorithms demonstrates substantial reduction in the memory footprint compared with the uni-bit-trie algorithm (

Explore More