Masanori Bando | Researchain

Publication

Featured research published by Masanori Bando.

international conference on computer communications | 2010

FlashTrie: Hash-based Prefix-Compressed Trie for IP Route Lookup Beyond 100Gbps

Masanori Bando; H. Jonathan Chao

It is becoming apparent that the next generation IP route lookup architecture needs to achieve speeds of 100-Gbps and beyond while supporting both IPv4 and IPv6 with fast real-time updates to accommodate ever-growing routing tables. Some of the proposed multibit-trie based schemes, such as Tree Bitmap, have been used in today's high-end routers. However, their large data structure often requires multiple external memory accesses for each route lookup. A pipelining technique is widely used to achieve high-speed lookup at the cost of using many external memory chips. Pipelining also often leads to poor memory load-balancing. In this paper, we propose a new IP route lookup architecture called FlashTrie that overcomes the shortcomings of the multibit-trie based approach. We use a hash-based membership query to limit off-chip memory accesses per lookup to one and to balance memory utilization among the memory modules. We also develop a new data structure called Prefix-Compressed Trie that reduces the size of a bitmap by more than 80%. Our simulation and implementation results show that FlashTrie can achieve 160-Gbps worst-case throughput while simultaneously supporting 2-M prefixes for IPv4 and 279-k prefixes for IPv6 using one FPGA chip and four DDR3 SDRAM chips. FlashTrie also supports incremental real-time updates.
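For readers unfamiliar with the underlying problem, the following is a minimal sketch of generic longest-prefix matching using one hash table per prefix length. It is not FlashTrie or its Prefix-Compressed Trie; the function names `build_table` and `lookup` and the sample routes are illustrative assumptions, and the sketch handles IPv4 only.

```python
import ipaddress

def build_table(routes):
    """Group routes by prefix length: {length: {network_as_int: next_hop}}."""
    table = {}
    for cidr, next_hop in routes:
        net = ipaddress.ip_network(cidr)
        table.setdefault(net.prefixlen, {})[int(net.network_address)] = next_hop
    return table

def lookup(table, addr):
    """Return the next hop of the longest matching prefix, or None."""
    ip = int(ipaddress.ip_address(addr))
    # Probe the longest prefix lengths first so the first hit is the best match.
    for length in sorted(table, reverse=True):
        # Keep the top `length` bits of the 32-bit IPv4 address.
        mask = ((1 << length) - 1) << (32 - length)
        hit = table[length].get(ip & mask)
        if hit is not None:
            return hit
    return None

# Hypothetical routing entries, for illustration only.
routes = [("10.0.0.0/8", "A"), ("10.1.0.0/16", "B"), ("0.0.0.0/0", "default")]
table = build_table(routes)
```

In this naive form a lookup may probe one hash table per prefix length, which is the kind of repeated memory access the abstract says FlashTrie limits to a single off-chip access.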

IEEE ACM Transactions on Networking | 2012

FlashTrie: beyond 100-Gb/s IP route lookup using hash-based prefix-compressed trie

Masanori Bando; Yi Li Lin; H. Jonathan Chao

It is becoming apparent that the next-generation IP route lookup architecture needs to achieve speeds of 100 Gb/s and beyond while supporting IPv4 and IPv6 with fast real-time updates to accommodate ever-growing routing tables. Some of the proposed multibit-trie-based schemes, such as TreeBitmap, have been used in today's high-end routers. However, their large data structures often require multiple external memory accesses for each route lookup. A pipelining technique is widely used to achieve high-speed lookup at the cost of using many external memory chips. Pipelining also often leads to poor memory load-balancing. In this paper, we propose a new IP route lookup architecture called FlashTrie that overcomes the shortcomings of the multibit-trie-based approaches. We use a hash-based membership query to limit off-chip memory accesses per lookup and to balance memory utilization among the memory modules. By compacting the data structure size, the lookup depth of each level can be increased. We also develop a new data structure called Prefix-Compressed Trie that reduces the size of a bitmap by more than 80%. Our simulation and implementation results show that FlashTrie can achieve 80-Gb/s worst-case throughput while simultaneously supporting 2 M prefixes for IPv4 and 318 k prefixes for IPv6 with one lookup engine and two Double-Data-Rate 3 (DDR3) SDRAM chips. When implementing five lookup engines on a state-of-the-art field-programmable gate array (FPGA) chip and using 10 DDR3 memory chips, we expect FlashTrie to achieve 1-Gpps (packets per second) throughput, equivalent to 400 Gb/s for IPv4 and 600 Gb/s for IPv6. FlashTrie also supports incremental real-time updates.
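The conversion between packet rate and line rate quoted above can be checked with simple arithmetic. The abstract does not state the packet sizes it assumes; the sketch below only derives the sizes implied by the quoted figures, and the helper name `implied_packet_bits` is an illustrative assumption.

```python
def implied_packet_bits(throughput_bps, packets_per_second):
    """Bits per packet implied by a line rate and a packet rate."""
    return throughput_bps // packets_per_second

GPPS = 10**9  # 1 Gpps, as quoted in the abstract

ipv4_bits = implied_packet_bits(400 * 10**9, GPPS)  # 400 Gb/s for IPv4
ipv6_bits = implied_packet_bits(600 * 10**9, GPPS)  # 600 Gb/s for IPv6

# Convert to bytes for easier comparison with packet sizes.
ipv4_bytes = ipv4_bits // 8
ipv6_bytes = ipv6_bits // 8
```

Under these figures, each IPv4 lookup corresponds to 400 bits (50 bytes) on the wire and each IPv6 lookup to 600 bits (75 bytes).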

IEEE ACM Transactions on Networking | 2012

Scalable lookahead regular expression detection system for deep packet inspection

Masanori Bando; Nabi Sertac Artan; H.J. Chao

Regular expressions (RegExes) are widely used, yet their inherent complexity often limits the total number of RegExes that can be detected using a single chip for a reasonable throughput. This limit on the number of RegExes impairs the scalability of today's RegEx detection systems. The scalability of existing schemes is generally limited by the traditional detection paradigm based on per-character-state processing and state transition detection. The main focus of existing schemes is on optimizing the number of states and the required transitions, but not on optimizing the suboptimal character-based detection method. Furthermore, the potential benefits of allowing out-of-sequence detection, instead of detecting components of a RegEx in the order of appearance, have not been explored. Lastly, the existing schemes do not provide ways to adapt to the evolving RegExes. In this paper, we propose Lookahead Finite Automata (LaFA) to perform scalable RegEx detection. LaFA requires less memory due to these three contributions: 1) providing specialized and optimized detection modules to increase resource utilization; 2) systematically reordering the RegEx detection sequence to reduce the number of concurrent operations; 3) sharing states among automata for different RegExes to reduce resource requirements. Here, we demonstrate that LaFA requires an order of magnitude less memory compared to today's state-of-the-art RegEx detection systems. Using LaFA, a single commodity field-programmable gate array (FPGA) chip can accommodate up to 25 000 (25 k) RegExes. Based on the throughput of our LaFA prototype on FPGA, we estimate that a 34-Gb/s throughput can be achieved.

high performance switching and routing | 2009

FlashLook: 100-Gbps hash-tuned route lookup architecture

Masanori Bando; N. Sertac Artan; H. Jonathan Chao

With the recent increase in the popularity of services that require high bandwidth, such as high-quality video and voice traffic, the need for 100-Gbps equipment has become a reality. In particular, next generation routers are needed to support 100-Gbps worst-case IP lookup throughput for large IPv4 and IPv6 routing tables, while keeping the cost and power consumption low. It is challenging for today's state-of-the-art IP lookup schemes to satisfy all of these requirements. In this paper, we propose FlashLook, a low-cost, high-speed route lookup architecture scalable to large routing tables. FlashLook allows the use of low-cost DRAMs, while achieving high throughput. Traditionally, DRAMs are not known for their high throughput due to their high latency. However, the FlashLook architecture achieves high throughput with DRAMs by using the DRAM bursts efficiently to hide DRAM latency. FlashLook has a data structure that can be evenly partitioned into DRAM banks, a novel hash method, HashTune, to smooth the hash table distribution, and a data compaction method called verify bit aggregation to reduce memory usage of the hash table. These features of FlashLook result in better DRAM memory utilization and fewer DRAM accesses per lookup. FlashLook achieves 100-Gbps worst-case throughput while simultaneously supporting 2M prefixes for IPv4 and 256k prefixes for IPv6 using one FPGA and 9 DRAM chips. FlashLook provides fast real-time updates that can support updates according to real update statistics.

international conference on communications | 2008

Boundary Hash for Memory-Efficient Deep Packet Inspection

Nabi Sertac Artan; Masanori Bando; H.J. Chao

Network intrusion detection and prevention systems (NIDPSs) are critical for network security. The deep packet inspection (DPI) operation consumes a significant amount of resources in NIDPS. This is because, to detect malicious activity, DPI searches a database of signatures for each byte of every packet. In this paper, we develop a highly space-efficient data structure for hardware realization of minimal perfect hash functions (MPHFs). This data structure is simple to construct, requires 7n bits to represent the MPHF for a set of n keys, and allows high-speed DPI.

architectures for networking and communications systems | 2009

LaFA: lookahead finite automata for scalable regular expression detection

Masanori Bando; N. Sertac Artan; H. Jonathan Chao

Although Regular Expressions (RegExes) have been widely used in network security applications, their inherent complexity often limits the total number of RegExes that can be detected using a single chip for a reasonable throughput. This limit on the number of RegExes impairs the scalability of today's RegEx detection systems. The scalability of existing schemes is generally limited by the traditional per-character state processing and state transition detection paradigm. The main focus of existing schemes is on optimizing the number of states and the required transitions, but not the suboptimal character-based detection method. Furthermore, the potential benefits of a reduced number of operations and states using out-of-sequence detection methods have not been explored. In this paper, we propose Lookahead Finite Automata (LaFA) to perform scalable RegEx detection using a very small amount of memory. LaFA's memory requirement is very small due to the following three areas of effort described in this paper: (1) Different parts of a RegEx, namely RegEx components, are detected using different detectors, each of which is specialized and optimized for the detection of a certain RegEx component. (2) We systematically reorder the RegEx component detection sequence, which provides us with new possibilities for memory optimization. (3) Many redundant states in classical finite automata are identified and eliminated in LaFA. Our simulations show that LaFA requires an order of magnitude less memory compared to today's state-of-the-art RegEx detection systems. A single commodity Field Programmable Gate Array (FPGA) chip can accommodate up to twenty-five thousand (25k) RegExes. Based on the throughput of our LaFA prototype on FPGA, we estimated that a 34-Gbps throughput can be achieved.

global communications conference | 2008

Highly Memory-Efficient LogLog Hash for Deep Packet Inspection

Masanori Bando; Nabi Sertac Artan; H.J. Chao

Today's network line rates reach speeds of 40 Gbps and are anticipated to reach 100 Gbps in the near future. These high speeds make Deep Packet Inspection (DPI) in Network Intrusion Detection and Prevention Systems (NIDPSs) very challenging. The DPI examines each incoming packet byte-by-byte and matches it against a set of predefined malicious signatures. One way to achieve high-speed DPI is to store all the signatures on high-speed on-chip memory. However, on-chip memory is limited and space-efficient data structures are needed to leverage precious on-chip memory efficiently. A hash table addressed by a Minimal Perfect Hash Function (MPHF) is such a high-speed, space-efficient data structure. In this paper, we describe a highly memory-efficient MPHF, which requires 3.5 bits per key to facilitate access to the key in on-chip memory while allowing us to perform the expensive exact match operation only once. The proposed MPHF also has a low construction time.

architectures for networking and communications systems | 2010

Range hash for regular expression pre-filtering

Masanori Bando; N. Sertac Artan; Rihua Wei; Xiangyi Guo; H. Jonathan Chao

Recently, major Internet carriers and vendors successfully tested high-speed backbone networks at 100-Gbps line speed to support rapid growth of the Internet traffic demands. In addition, traffic is getting more concentrated to points such as data centers, and demand for protecting such high-speed networks from attack traffic is increasing. Deep Packet Inspection (DPI) with Regular Expression (RegEx) detection is the de facto defense mechanism against network intrusions. However, current RegEx detection systems cannot keep up with the upcoming high-speed line rate. The RegExes consist of three types of components: exact strings, character classes (CC), and repetitions. Exact string and repetition matching have been widely studied by the RegEx research community for better performance. Yet, although more than 55% of RegExes in the Snort signature set contain at least one CC, hardware-based solutions that focus on CC detection are limited. In this paper we propose a new CC detection architecture called Range Hash that is suitable for high-speed, compact CC detection. Additionally, we propose a practical application of the Range Hash architecture where it can be used as a pre-filter for a Regular Expression detection system to increase overall RegEx detection performance. Based on our hardware prototype design which runs at 250 MHz, Range Hash can reach 100-Gbps CC detection throughput with today's FPGA chips.
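As background on what character class detection involves, the sketch below tests whether a byte belongs to a class such as `[0-9a-fA-F]` using a 256-bit bitmap. This is a common software baseline and is not the Range Hash architecture; the names `compile_class`, `matches`, and `HEX_DIGIT` are illustrative assumptions.

```python
def compile_class(ranges):
    """Build a 256-bit bitmap from inclusive (lo, hi) single-character ranges."""
    bitmap = 0
    for lo, hi in ranges:
        for code in range(ord(lo), ord(hi) + 1):
            bitmap |= 1 << code  # set the bit for each byte value in the range
    return bitmap

def matches(bitmap, byte):
    """Return True if the byte value (0-255) belongs to the class."""
    return (bitmap >> byte) & 1 == 1

# Hypothetical class equivalent to the RegEx character class [0-9a-fA-F].
HEX_DIGIT = compile_class([("0", "9"), ("a", "f"), ("A", "F")])
```

A bitmap costs 256 bits per class regardless of how few ranges the class contains, which gives a sense of why a more compact CC detector is attractive when many classes must fit on one chip.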

ieee international symposium on parallel distributed processing workshops and phd forum | 2010

Hardware implementation for scalable lookahead Regular Expression detection

Masanori Bando; N. Sertac Artan; Nishit Mehta; Yi Guan; H. Jonathan Chao

Regular Expressions (RegExes) are widely used in various applications to identify strings of text. Their flexibility, however, increases the complexity of the detection system and often limits the detection speed as well as the total number of RegExes that can be detected using limited resources. The two classical detection methods, Deterministic Finite Automaton (DFA) and Non-Deterministic Finite Automaton (NFA), have the potential problems of prohibitively large memory requirements and a large number of concurrent operations, respectively. Although recent schemes addressing these problems to improve DFA and NFA are promising, they are inherently limited in their scalability, since they follow the state transition model in DFA and NFA, where the state transitions occur for each character of the input. We recently proposed a scalable RegEx detection system called Lookahead Finite Automata (LaFA) to solve these problems with three novel ideas: 1) providing specialized and optimized detection modules to increase resource utilization; 2) systematically reordering the RegEx detection sequence to reduce the number of concurrent operations; 3) sharing states among automata for different RegExes to reduce resource requirements. In this paper, we propose an efficient hardware architecture and prototype design implementation based on LaFA. Our proof-of-concept prototype design is built on a fraction of a single commodity Field Programmable Gate Array (FPGA) chip and can accommodate up to twenty-five thousand (25k) RegExes. Using only 7% of the logic area and 25% of the memory on a Xilinx Virtex-4 FX100, the prototype design can achieve 2-Gbps (gigabits-per-second) detection throughput with only one detection engine. We estimate that 34-Gbps detection throughput can be achieved if the entire resources of a state-of-the-art FPGA chip are used to implement multiple detection engines.

Archive | 2010