Publication


Featured research published by Yasuto Kuroda.


IEEE Journal of Solid-State Circuits | 2013

A 250-MHz 18-Mb Full Ternary CAM With Low-Voltage Matchline Sensing Scheme in 65-nm CMOS

Isamu Hayashi; Teruhiko Amano; Naoya Watanabe; Yuji Yano; Yasuto Kuroda; Masaya Shirata; Katsumi Dosaka; Koji Nii; Hideyuki Noda; Hiroyuki Kawai

An 18-Mb full ternary CAM with a low-voltage matchline sensing scheme (LVMLSS) is designed and fabricated in a 65-nm bulk CMOS process. LVMLSS relies on three key techniques: a voltage down converter, a differential sense amplifier with matchline isolation, and a reference voltage generation scheme. With these techniques, LVMLSS reduces the dynamic power consumption of the matchlines to 33% of a conventional scheme and achieves 42% faster matchline sensing. At a typical supply voltage of 1.0 V, a 250-MHz search frequency is achieved. The power consumption of a fully parallel search operation at 250 MHz is 9.3 W, which is 66% smaller than in previous work. This work realizes a high-speed, low-power, and robust large-scale TCAM, and we believe it will greatly contribute to reducing the power of network systems.
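A minimal software model of the ternary-match operation a TCAM performs (the paper's contribution is the analog matchline sensing circuit, which this sketch does not model; the entries below are invented for illustration):

```python
# Minimal software model of a ternary CAM lookup (illustrative only).
# Each entry stores a value and a care-mask; '0' bits in the mask are "don't care".

def tcam_search(entries, key):
    """Return the index of the first matching entry, or None."""
    for idx, (value, mask) in enumerate(entries):
        # An entry matches when the key agrees with the value on all cared bits.
        if (key & mask) == (value & mask):
            return idx
    return None

# Example: 8-bit entries; the second entry ignores the low 4 bits.
entries = [
    (0b1010_0001, 0b1111_1111),  # exact match on 0xA1
    (0b1100_0000, 0b1111_0000),  # matches any key of the form 0xCx
]
print(tcam_search(entries, 0b1100_0111))  # -> 1
```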


International Symposium on Circuits and Systems | 2005

CAM-based VLSI architecture for Huffman coding with real-time optimization of the code word table [image coding example]

Takeshi Kumaki; Yasuto Kuroda; Tetsushi Koide; Hans Jürgen Mattausch; Hideyuki Noda; Katsumi Dosaka; Kazutami Arimoto; Kazunori Saito

Huffman coding is probably the best-known and most widely used data compression technique. Nevertheless, further improving the compression ratio by updating the Huffman code in real time remains a largely unsolved problem. In this paper, a novel architecture for CAM (content addressable memory)-based Huffman coding with real-time optimization of the code word table, called CHRC, is proposed. A CAM is exploited to implement fast Huffman encoding while the code word table is simultaneously reconstructed and updated in real time. The effectiveness of the proposed architecture is verified by its structure, encoding flow, and simulation results. A JPEG example shows that the proposed CHRC method achieves up to 40% smaller encoded picture sizes and a 6-times smaller clock-cycle count for the encoding hardware than conventional Huffman coding methods.
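A rough software sketch of the general idea behind real-time code-word-table optimization, assuming a simple periodic rebuild from running symbol frequencies; this is not the CAM-based hardware pipeline described in the paper, and the rebuild interval is an invented parameter:

```python
import heapq
from collections import Counter

def build_code_table(freqs):
    """Build Huffman codes from a {symbol: count} mapping."""
    heap = [(count, idx, {sym: ""}) for idx, (sym, count) in enumerate(freqs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {next(iter(freqs)): "0"}
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)
        c2, i2, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}          # left subtree
        merged.update({s: "1" + c for s, c in t2.items()})    # right subtree
        heapq.heappush(heap, (c1 + c2, i2, merged))
    return heap[0][2]

freqs = Counter()
table = {}
REBUILD_EVERY = 64  # assumed update interval, not from the paper
for i, sym in enumerate(b"example input stream" * 8):
    freqs[sym] += 1
    if i % REBUILD_EVERY == 0:          # re-optimize the table as statistics change
        table = build_code_table(freqs)
    code = table.get(sym, format(sym, "08b"))  # fall back to a fixed-length escape
```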


IEICE Transactions on Information and Systems | 2007

Acceleration of DCT Processing with Massive-Parallel Memory-Embedded SIMD Matrix Processor

Takeshi Kumaki; Masakatsu Ishizaki; Tetsushi Koide; Hans Jürgen Mattausch; Yasuto Kuroda; Hideyuki Noda; Katsumi Dosaka; Kazutami Arimoto; Kazunori Saito

This paper reports an efficient Discrete Cosine Transform (DCT) processing method for images using a massive-parallel memory-embedded SIMD matrix processor. The matrix-processing engine has 2,048 2-bit processing elements, which are connected by a flexible switching network, and supports 2-bit 2,048-way bit-serial and word-parallel operations with a single command. For compatibility with this matrix-processing architecture, the arithmetic order of the conventional DCT algorithm has been improved and a vertical/horizontal-space one-dimensional (1D) DCT processing scheme has been developed. Evaluation results of the matrix-engine-based DCT processing show that the necessary clock cycles per image block can be reduced by 87% in comparison to a conventional DSP architecture. The determined performances in MOPS and MOPS/mm² are factors of 8 and 5.6 better than with a conventional DSP, respectively.
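For reference, a plain NumPy version of the separable row/column 1D-DCT decomposition that such an architecture exploits; the hardware-specific arithmetic reordering and bit-serial mapping are not modeled here:

```python
# Reference 8x8 DCT as two separable 1-D passes (horizontal, then vertical).
import numpy as np

def dct_1d(x):
    """Naive type-II DCT of a length-N vector with orthonormal scaling."""
    N = len(x)
    n = np.arange(N)
    basis = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    scale = np.sqrt(2 / N) * np.ones(N)
    scale[0] = np.sqrt(1 / N)
    return scale * (basis @ x)

def dct_2d(block):
    """2-D DCT of a block as two separable 1-D passes."""
    tmp = np.apply_along_axis(dct_1d, axis=1, arr=block)   # horizontal pass
    return np.apply_along_axis(dct_1d, axis=0, arr=tmp)    # vertical pass

block = np.arange(64, dtype=float).reshape(8, 8)
coeffs = dct_2d(block)
```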


International Conference on Electronics, Circuits, and Systems | 2010

Hardware implementation of fast forwarding engine using standard memory and dedicated circuit

Kazuya Zaitsu; Koji Yamamoto; Yasuto Kuroda; Kazunari Inoue; Shingo Ata; Ikuo Oka

Ternary content addressable memory (TCAM) has become very popular for designing high-throughput forwarding engines in routers. However, TCAM has potential problems in terms of hardware cost and power cost, which limit deployment at large capacities. In this paper, we propose a new hardware architecture for a fast forwarding engine that fundamentally solves these problems of TCAM. We also develop a hardware design for our architecture. Our results show that the proposed hardware reduces the required hardware resources and power consumption to 62% and 52%, respectively.
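As background, longest-prefix matching in ordinary memory is commonly implemented with a trie; the sketch below is a generic software illustration, not the dedicated circuit proposed in the paper:

```python
# Generic binary-trie longest-prefix match, the kind of lookup a TCAM-free
# forwarding engine must implement in standard memory.

class TrieNode:
    __slots__ = ("children", "next_hop")
    def __init__(self):
        self.children = [None, None]
        self.next_hop = None

def insert(root, prefix_bits, next_hop):
    node = root
    for b in prefix_bits:
        if node.children[b] is None:
            node.children[b] = TrieNode()
        node = node.children[b]
    node.next_hop = next_hop

def lookup(root, addr_bits):
    """Walk the trie, remembering the deepest next hop seen (longest prefix)."""
    node, best = root, None
    for b in addr_bits:
        if node.next_hop is not None:
            best = node.next_hop
        node = node.children[b]
        if node is None:
            return best
    return node.next_hop if node.next_hop is not None else best

root = TrieNode()
insert(root, [1, 0], "port A")        # 10*   -> port A
insert(root, [1, 0, 1, 1], "port B")  # 1011* -> port B
print(lookup(root, [1, 0, 1, 1, 0, 0, 0, 1]))  # -> "port B" (longest match wins)
```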


IEICE Transactions on Electronics | 2008

Integration Architecture of Content Addressable Memory and Massive-Parallel Memory-Embedded SIMD Matrix for Versatile Multimedia Processor

Takeshi Kumaki; Masakatsu Ishizaki; Tetsushi Koide; Hans Jürgen Mattausch; Yasuto Kuroda; Takayuki Gyohten; Hideyuki Noda; Katsumi Dosaka; Kazutami Arimoto; Kazunori Saito

This paper presents an integration architecture of content addressable memory (CAM) and a massive-parallel memory-embedded SIMD matrix for constructing a versatile multimedia processor. The massive-parallel memory-embedded SIMD matrix has 2,048 2-bit processing elements, which are connected by a flexible switching network, and supports 2-bit 2,048-way bit-serial and word-parallel operations with a single command. The SIMD matrix architecture is verified to be well suited to the repeated arithmetic operations in multimedia applications. The proposed architecture additionally exploits CAM technology and therefore enables fast pipelined table-lookup coding operations. Since both arithmetic and table-lookup operations execute extremely fast, the proposed architecture realizes efficient and versatile multimedia data processing. Evaluation results of the proposed CAM-enhanced massive-parallel SIMD matrix processor for the frequently used JPEG image-compression application show that the necessary number of clock cycles can be reduced by 86% in comparison to a conventional mobile DSP architecture. The determined performances in Mpixel/mm² are factors of 3.3 and 4.4 better than with a CAM-less massive-parallel memory-embedded SIMD matrix processor and a conventional mobile DSP, respectively.
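A loose illustration of the pipelining idea behind this integration: an arithmetic (SIMD) stage and a table-lookup (CAM) stage working on successive blocks at the same time. The stage contents below are placeholders, not the paper's actual operations:

```python
# Toy two-stage pipeline: while stage 2 encodes block N, stage 1 processes block N+1.
def simd_stage(block):
    return [x * 2 for x in block]           # stands in for the arithmetic (e.g. DCT) step

def cam_lookup_stage(block):
    codebook = {0: "00", 2: "01", 4: "10"}  # stands in for the code word table
    return [codebook.get(x, "11") for x in block]

blocks = [[0, 1, 2], [2, 2, 0], [1, 0, 1]]
in_flight = None                             # block currently between the stages
encoded = []
for blk in blocks + [None]:                  # trailing None drains the pipeline
    if in_flight is not None:
        encoded.append(cam_lookup_stage(in_flight))   # stage 2 on the older block
    in_flight = simd_stage(blk) if blk is not None else None  # stage 1 on the new block
print(encoded)
```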


Consumer Communications and Networking Conference | 2013

2D Sliced Packet Buffer with traffic volume and buffer occupancy adaptation for power saving

Kenzo Okuda; Shingo Ata; Yasuto Kuroda; Yuji Yano; Hisashi Iwamoto; Kazunari Inoue; Ikuo Oka

Recently, the energy consumption of routers has become a serious problem, so power reduction is an urgent and important challenge. Existing routers always operate at 100% of their potential regardless of the required performance, such as the volume of input traffic. However, semiconductor devices such as lookup logic, buffers, and switch fabrics are not always fully utilized. In particular, packet buffer occupancy is very low in many cases, especially in core routers, which leads to unnecessary power consumption. To solve this problem, a buffer architecture called the Sliced Packet Buffer was previously proposed: dividing the whole buffer into multiple sub-buffers (slices) at the LSI level allows the power state of each slice to be managed independently according to its occupancy. However, the input traffic rate, which was not considered so far, is another parameter that allows further power savings. In this paper, we propose a Two-Dimensional Sliced Packet Buffer that enables power management according not only to buffer occupancy but also to traffic volume. We also propose a model for accurate evaluation of energy consumption at the granularity of operational instructions. Through trace-driven simulations with real traffic, we show that the proposed packet buffer reduces power consumption by 66% on average when the average input rate is 30%.
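A toy model of the two-dimensional control idea, deciding how many buffer slices to keep powered from both occupancy and input rate; the headroom, slice count, and decision rule are invented example values, not the paper's LSI control logic:

```python
# Decide the number of powered slices from buffer occupancy AND input traffic rate.
def active_slices(total_slices, occupancy, input_rate, headroom=0.25):
    """
    occupancy  : fraction of the whole buffer currently holding packets (0..1)
    input_rate : observed input load relative to line rate (0..1)
    Returns how many slices to keep powered, always leaving some headroom.
    """
    demand = max(occupancy, input_rate) + headroom
    needed = int(demand * total_slices) + 1
    return min(max(needed, 1), total_slices)

for occ, rate in [(0.05, 0.30), (0.50, 0.30), (0.05, 0.90)]:
    print(occ, rate, "->", active_slices(16, occ, rate), "of 16 slices on")
```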


High Performance Switching and Routing | 2012

A slice structure using the management of network traffic prediction for green IT

Yuji Yano; Hisashi Iwamoto; Yasuto Kuroda; Shiro Ohtani; Shingo Ata; Kazunari Inoue

Maintaining complete network service with the current infrastructure is an urgent task due to the continuous growth in network traffic. The energy consumption of network routers is expected to become a global environmental problem, and therefore research and development on power reduction is strongly desired. Our group proposes a unique structure embedded in routers, which consists of multiple slices and is dynamically controlled by prediction of network traffic flow. In this paper, we examine this slice control and the LSI architecture, and show the validity of the router. The simulation used in this study is based on real traffic measured at the university.
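One simple way to realize prediction-driven slice control is to forecast link utilization with an exponentially weighted moving average and size the active slice count from the forecast; the sketch below uses assumed parameters and is not the prediction model evaluated in the paper:

```python
# EWMA traffic forecast driving the number of active slices.
def ewma_forecast(samples, alpha=0.3):
    """One-step-ahead traffic forecast from past utilization samples (0..1)."""
    forecast = samples[0]
    for s in samples[1:]:
        forecast = alpha * s + (1 - alpha) * forecast
    return forecast

def slices_for(forecast, total_slices=8, margin=0.2):
    needed = int((forecast + margin) * total_slices) + 1
    return min(max(needed, 1), total_slices)

history = [0.2, 0.25, 0.3, 0.6, 0.55]   # measured link utilization per interval
pred = ewma_forecast(history)
print(f"predicted load {pred:.2f} -> keep {slices_for(pred)} of 8 slices active")
```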


International Conference on Communications | 2014

Development of onboard LPM-based header processing and reactive link selection for optical packet and circuit integrated networks

Hideaki Furukawa; Takaya Miyazawa; Hiroaki Harai; Yasuto Kuroda; Shoji Koyama; Shin’ichi Arakawa; Masayuki Murata

An optical packet and circuit integrated network (OPCInet) will allow diverse services, enhanced functional flexibility, and efficient energy consumption, as optical packet switching (OPS) and optical circuit switching (OCS) links are provided on the same infrastructure. In this paper, we design and develop a control system consisting of header processing for optical packets, signaling and routing for optical path setup, and wavelength resource control and link switching between OPS and OCS for the efficient use of resources. Our onboard optical packet header processor is capable of 16-bit longest prefix matching by embedding a 0.6-W (5% of the power of TCAM technology under the same conditions), 200-million-searches-per-second (equivalent to 100 Gbps) forwarding-engine LSI and a statistical-memory LSI that retains statistical information about the processed headers. Reactive control automatically selects a packet or circuit link according to the statistical traffic information from the header processor and works together with signaling for optical paths. We present an experimental demonstration of the new control system using a previously developed OPCInet testbed.
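A hypothetical sketch of a reactive link-selection policy of this kind: per-destination statistics from the header processor decide whether traffic stays on the OPS link or triggers OCS path setup. The thresholds and statistics layout are invented for illustration and are not the paper's actual control rules:

```python
# Choose OPS vs. OCS per destination from gathered traffic statistics.
def select_link(stats, dest, rate_threshold=0.5, min_samples=100):
    """
    stats[dest] = (packet_count, average_rate_as_fraction_of_link)
    Long-lived, high-rate flows justify a circuit; everything else stays on OPS.
    """
    count, rate = stats.get(dest, (0, 0.0))
    if count >= min_samples and rate >= rate_threshold:
        return "OCS"   # request a path via signaling, then switch the flow over
    return "OPS"

stats = {"10.0.0.0/16": (5000, 0.7), "10.1.0.0/16": (40, 0.9)}
print(select_link(stats, "10.0.0.0/16"))  # -> OCS
print(select_link(stats, "10.1.0.0/16"))  # -> OPS (not enough samples yet)
```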


IEICE Transactions on Information and Systems | 2007

Real-Time Huffman Encoder with Pipelined CAM-Based Data Path and Code-Word-Table Optimizer

Takeshi Kumaki; Yasuto Kuroda; Masakatsu Ishizaki; Tetsushi Koide; Hans Jürgen Mattausch; Hideyuki Noda; Katsumi Dosaka; Kazutami Arimoto; Kazunori Saito

This paper presents a novel optimized real-time Huffman encoder using a pipelined data path based on CAM technology and a parallel code-word-table optimizer. The exploitation of CAM technology enables fast parallel search of the code word table. At the same time, the code word table is optimized according to the frequency of received input symbols and is updated in real time. Since these two functions work in parallel, the proposed architecture realizes fast parallel encoding while keeping a constantly high compression ratio. Evaluation results for the JPEG application show that the proposed architecture can achieve up to 28% smaller encoded picture sizes than conventional architectures. Encoding time is reduced by 95% in comparison to a conventional SRAM-based architecture, which makes the encoder suitable even for the latest end-user devices requiring fast frame rates. Furthermore, the proposed architecture is the only encoder that simultaneously realizes a small compressed data size and fast processing speed.


Consumer Communications and Networking Conference | 2016

Energy-efficient high-speed search engine using a multi-dimensional TCAM architecture with parallel pipelined subdivided structure

Masami Nawa; Kenzo Okuda; Shingo Ata; Yasuto Kuroda; Yuji Yano; Hisashi Iwamoto; Kazunari Inoue; Ikuo Oka

Packet classification has become increasingly complex and important for future network equipment. A recent trend for achieving complex packet classification is to use software-based methods, which tend to be slower than hardware-based methods. For hardware-based search, ternary content-addressable memory (TCAM) is typically used to make classification feasible. However, TCAM is not well suited to the long (in bits) and sparse rules used by advanced applications that require complicated classification. We propose a multi-dimension search engine (MDSE) that is optimized for long, sparse rules, and a multi-dimensional TCAM scheme, which is an MDSE constructed to operate on TCAM. Through fine-grained simulations with real traffic, we show that the proposed search engine can reduce the power consumed by network equipment by about 85%.
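A rough software illustration of the multi-dimensional idea: rather than one very wide entry per rule, each header field (dimension) is searched separately and the per-field match sets are intersected. The rule layout and fields below are invented for illustration and do not reproduce the MDSE hardware:

```python
# Per-dimension ternary tables intersected to classify a packet.
def field_matches(field_table, value):
    """Return the set of rule ids whose (value, mask) entry matches this field."""
    return {rid for rid, (v, m) in field_table.items() if (value & m) == (v & m)}

def classify(dimensions, packet_fields):
    """Intersect matches across dimensions; return the lowest (highest-priority) rule id."""
    candidates = None
    for table, value in zip(dimensions, packet_fields):
        hits = field_matches(table, value)
        candidates = hits if candidates is None else candidates & hits
        if not candidates:
            return None
    return min(candidates)

src_dim = {1: (0x0A000000, 0xFF000000), 2: (0x00000000, 0x00000000)}  # 10.0.0.0/8, any
dst_port_dim = {1: (80, 0xFFFF), 2: (443, 0xFFFF)}
print(classify([src_dim, dst_port_dim], [0x0A010203, 80]))   # -> rule 1
print(classify([src_dim, dst_port_dim], [0xC0A80001, 443]))  # -> rule 2
```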
