Masakatsu Ishizaki | Researchain

Publication

Featured research published by Masakatsu Ishizaki.

International Solid-State Circuits Conference | 2010

A scalable massively parallel processor for real-time image processing

Takashi Kurafuji; Masaru Haraguchi; Masami Nakajima; Tetsu Nishijima; Tetsushi Tanizaki; Hiroyuki Yamasaki; Takeaki Sugimura; Yuta Imai; Masakatsu Ishizaki; Takeshi Kumaki; Kan Murata; Kanako Yoshida; Eisuke Shimomura; Hideyuki Noda; Yoshihiro Okuno; Shunsuke Kamijo; Tetsushi Koide; Hans Jürgen Mattausch; Kazutami Arimoto

This paper describes a high-performance, scalable, massively parallel single-instruction multiple-data (SIMD) processor and power/area-efficient real-time image processing. The SIMD processor combines 4-bit processing elements (PEs) with SRAM on a small area and thus simultaneously achieves a high performance of 191 GOPS, a high power efficiency of 310 GOPS/W, and a high area efficiency of 31.6 GOPS/mm2. The applied pipeline architecture is optimized to reduce the number of controller overhead cycles so that the SIMD parallel processing unit can be utilized during up to 99% of the operating time of typical application programs. The processor can also be optimized for low-cost, low-power, and high-performance multimedia system-on-a-chip (SoC) solutions. A combination of custom and automated implementation techniques enables scalability in the number of PEs. The processor has two operating modes: a normal frequency (NF) mode for higher power efficiency and a double frequency (DF) mode for higher performance. The combination of high area efficiency, high power efficiency, high performance, and the flexibility of the SIMD processor described in this paper expands the application of real-time image processing technology to a variety of electronic devices.

IEICE Transactions on Information and Systems | 2007

Acceleration of DCT Processing with Massive-Parallel Memory-Embedded SIMD Matrix Processor

Takeshi Kumaki; Masakatsu Ishizaki; Tetsushi Koide; Hans Jürgen Mattausch; Yasuto Kuroda; Hideyuki Noda; Katsumi Dosaka; Kazutami Arimoto; Kazunori Saito

This paper reports an efficient Discrete Cosine Transform (DCT) processing method for images using a massive-parallel memory-embedded SIMD matrix processor. The matrix-processing engine has 2,048 2-bit processing elements, which are connected by a flexible switching network, and supports 2-bit 2,048-way bit-serial and word-parallel operations with a single command. For compatibility with this matrix-processing architecture, the arithmetic order of the conventional DCT algorithm has been improved and the vertical/horizontal-space one-dimensional (1D) DCT processing has been further developed. Evaluation results of the matrix-engine-based DCT processing show that the necessary clock cycles per image block can be reduced by 87% in comparison to a conventional DSP architecture. The determined performances in MOPS and MOPS/mm2 are better than those of a conventional DSP by factors of 8 and 5.6, respectively.
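
The vertical/horizontal 1D-DCT decomposition mentioned in the abstract can be illustrated with a small software sketch. This is not the paper's matrix-processor implementation or its reordered arithmetic; it is only the textbook separable form, in which a 2D DCT is computed as a horizontal 1D pass over the rows followed by a vertical 1D pass over the columns. The function names `dct_1d` and `dct_2d`, the orthonormal DCT-II scaling, and the 8x8 block size are assumptions chosen for illustration.

```python
import math


def dct_1d(values):
    """Orthonormal DCT-II of a sequence of numbers (textbook definition)."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, v in enumerate(values))
        out.append(scale * total)
    return out


def dct_2d(block):
    """2D DCT as a horizontal 1D pass followed by a vertical 1D pass."""
    rows = [dct_1d(row) for row in block]              # horizontal pass
    cols = [dct_1d(list(col)) for col in zip(*rows)]   # vertical pass per column
    return [list(row) for row in zip(*cols)]           # transpose back to row-major


# Hypothetical input: a flat 8x8 block, whose energy should land in the DC term.
flat_block = [[1.0] * 8 for _ in range(8)]
coeffs = dct_2d(flat_block)
```

Because every row (and then every column) is transformed independently, the two passes map naturally onto many processing elements working in lockstep, which is the property a SIMD architecture can exploit.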

IEICE Transactions on Electronics | 2008

Integration Architecture of Content Addressable Memory and Massive-Parallel Memory-Embedded SIMD Matrix for Versatile Multimedia Processor

Takeshi Kumaki; Masakatsu Ishizaki; Tetsushi Koide; Hans Jürgen Mattausch; Yasuto Kuroda; Takayuki Gyohten; Hideyuki Noda; Katsumi Dosaka; Kazutami Arimoto; Kazunori Saito

This paper presents an integration architecture of content addressable memory (CAM) and a massive-parallel memory-embedded SIMD matrix for constructing a versatile multimedia processor. The massive-parallel memory-embedded SIMD matrix has 2,048 2-bit processing elements, which are connected by a flexible switching network, and supports 2-bit 2,048-way bit-serial and word-parallel operations with a single command. The SIMD matrix architecture is verified to be a better way of processing the repeated arithmetic operation types in multimedia applications. The proposed architecture additionally exploits CAM technology and therefore enables fast pipelined table-lookup coding operations. Since both arithmetic and table-lookup operations execute extremely fast, the proposed novel architecture can consequently realize efficient and versatile multimedia data processing. Evaluation results of the proposed CAM-enhanced massive-parallel SIMD matrix processor for the frequently used JPEG image-compression application show that the necessary clock cycle number can be reduced by 86% in comparison to a conventional mobile DSP architecture. The determined performance in Mpixel/mm2 is better by factors of 3.3 and 4.4 than with a CAM-less massive-parallel memory-embedded SIMD matrix processor and a conventional mobile DSP, respectively.
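
As a rough illustration of what "bit-serial and word-parallel" means, the sketch below adds 2,048 words at once while stepping through the bit positions one at a time. It is a software model under stated assumptions, not the processor's design: the abstract describes 2-bit processing elements, whereas this sketch handles one bit per step for simplicity, and the 8-bit word width, the bit-plane layout, and all function names are hypothetical.

```python
def to_bit_planes(words, width):
    """Pack lane values into bit planes: planes[i] holds bit i of every lane."""
    planes = []
    for i in range(width):
        plane = 0
        for lane, word in enumerate(words):
            plane |= ((word >> i) & 1) << lane
        planes.append(plane)
    return planes


def from_bit_planes(planes, lanes):
    """Unpack bit planes back into one integer per lane."""
    return [sum(((plane >> lane) & 1) << i for i, plane in enumerate(planes))
            for lane in range(lanes)]


def bit_serial_add(a_planes, b_planes):
    """Add all lanes at once, one bit position per step (full adder per lane)."""
    carry = 0
    result = []
    for a, b in zip(a_planes, b_planes):
        result.append(a ^ b ^ carry)            # sum bit for every lane
        carry = (a & b) | (carry & (a ^ b))     # carry-out for every lane
    result.append(carry)                        # final carry is the extra top bit
    return result


LANES = 2048   # lane count taken from the abstract
WIDTH = 8      # assumed word width, for illustration only
a = [(3 * i) % 256 for i in range(LANES)]
b = [(7 * i + 5) % 256 for i in range(LANES)]
sums = from_bit_planes(
    bit_serial_add(to_bit_planes(a, WIDTH), to_bit_planes(b, WIDTH)), LANES)
```

Each loop iteration in `bit_serial_add` plays the role of a single command applied to all lanes, so the step count depends on the word width rather than on the number of words.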

IEICE Transactions on Information and Systems | 2007

Scalable FPGA/ASIC Implementation Architecture for Parallel Table-Lookup-Coding Using Multi-Ported Content Addressable Memory

Takeshi Kumaki; Yutaka Kono; Masakatsu Ishizaki; Tetsushi Koide; Hans Jürgen Mattausch

This paper presents a scalable FPGA/ASIC implementation architecture for high-speed parallel table-lookup-coding using multi-ported content addressable memory, aiming at facilitating effective table-lookup-coding solutions. The multi-ported CAM adopts a Flexible Multi-ported Content Addressable Memory (FMCAM) technology, which represents an effective parallel processing architecture and was previously reported in [1]. To achieve a high-speed parallel table-lookup-coding solution, FMCAM is improved by additional schemes for a single search mode and counting value setting mode, so that it permits fast parallel table-lookup-coding operations. Evaluation results for Huffman encoding within the JPEG application show that a synthesized semi-custom ASIC implementation of the proposed architecture can already reduce the required clock-cycle number by 93% in comparison to a conventional DSP. Furthermore, the performance per area unit, measured in MOPS/mm2, can be improved by a factor of 3.8 in comparison to parallel operated DSPs. Consequently, the proposed architecture is very suitable for FPGA/ASIC implementation, and is a promising solution for small area integrated realization of real-time table-lookup-coding applications.

IEICE Transactions on Information and Systems | 2007

Real-Time Huffman Encoder with Pipelined CAM-Based Data Path and Code-Word-Table Optimizer

Takeshi Kumaki; Yasuto Kuroda; Masakatsu Ishizaki; Tetsushi Koide; Hans Jürgen Mattausch; Hideyuki Noda; Katsumi Dosaka; Kazutami Arimoto; Kazunori Saito

This paper presents a novel optimized real-time Huffman encoder using a pipelined data path based on CAM technology and a parallel code-word-table optimizer. The exploitation of CAM technology enables fast parallel search of the code word table. At the same time, the code word table is optimized according to the frequency of received input symbols and is updated in real time. Since these two functions work in parallel, the proposed architecture realizes fast parallel encoding and keeps a constantly high compression ratio. Evaluation results for the JPEG application show that the proposed architecture can achieve up to 28% smaller encoded picture sizes than the conventional architectures. The obtained encoding time can be reduced by 95% in comparison to a conventional SRAM-based architecture, which makes the encoder suitable even for the latest end-user devices requiring fast frame rates. Furthermore, the proposed architecture provides the only encoder that can simultaneously realize small compressed data size and fast processing speed.
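
The CAM-based search of the code word table can be pictured with a minimal behavioural model. In hardware, a CAM compares the search key against every stored entry in the same cycle and raises one match line per entry; the sketch below only mimics that interface in software, where the comparisons run sequentially. The table contents, the priority-encoder step, and the function names are assumptions for illustration and are not taken from the paper.

```python
def cam_search(entries, key):
    """Return one match line per stored entry, as a CAM search would."""
    return [stored == key for stored in entries]


def priority_encode(match_lines):
    """Return the index of the first asserted match line, or None if no match."""
    for index, hit in enumerate(match_lines):
        if hit:
            return index
    return None


# Hypothetical code-word table: symbols live in the CAM, and the matching
# address selects the code word from a parallel array.
symbols = ["a", "b", "c", "d"]
code_words = ["0", "10", "110", "111"]


def encode(message):
    """Encode a message by looking each symbol up through the CAM model."""
    bits = []
    for symbol in message:
        address = priority_encode(cam_search(symbols, symbol))
        if address is None:
            raise KeyError(symbol)
        bits.append(code_words[address])
    return "".join(bits)


encoded = encode("abacad")
```

The point of the model is the lookup shape: the symbol itself is the search key, so no address computation or table scan is needed before the code word is read out.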

IEEE Region 10 Conference | 2006

Huffman Encoding Architecture with Self-Optimizing Performance and Multiple CAM-Match Utilization

Masakatsu Ishizaki; Takeshi Kumaki; Y. Kouno; Tetsushi Koide; Hans Jürgen Mattausch; Yasuto Kuroda; T. Gyoten; Hideyuki Noda; Katsumi Dosaka; Kazutami Arimoto; Kazunori Saito

This paper presents a method for achieving high speed and a high compression ratio in Huffman encoding by updating and optimizing the code word table. A shadow code word table is continuously reconstructed according to the frequency distribution of the currently encoded symbols and is used to replace the active code word table in real time if the compression ratio degrades. Multiple matches in a content addressable memory (CAM) are additionally exploited to improve encoding speed to real-time requirements. A higher compression ratio can be obtained by optimizing the update timing. This paper also estimates the best method for updating the code word table by simulation. As a result, the compressed data size is up to 22.6% smaller than with the conventional Huffman encoding architecture, which uses a standard encoding table.
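
The shadow-table idea can be sketched in software as follows. This is a simplified model, not the paper's architecture: it encodes in fixed-size blocks, rebuilds a shadow table from the symbol counts of the block just encoded, and swaps it in when it would have produced fewer bits for that block. The swap criterion, the block size, the add-one count that keeps every symbol encodable, and all names are assumptions; the CAM multiple-match mechanism and decoder synchronization are not modelled.

```python
import heapq
from collections import Counter


def build_code_table(frequencies):
    """Build a Huffman code table {symbol: bit string} from symbol counts."""
    if len(frequencies) == 1:
        return {symbol: "0" for symbol in frequencies}
    # Heap items: (count, unique tie breaker, {symbol: partial code}).
    heap = [(count, i, {symbol: ""})
            for i, (symbol, count) in enumerate(sorted(frequencies.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        count_a, _, table_a = heapq.heappop(heap)
        count_b, _, table_b = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in table_a.items()}
        merged.update({s: "1" + c for s, c in table_b.items()})
        heapq.heappush(heap, (count_a + count_b, next_id, merged))
        next_id += 1
    return heap[0][2]


def encoded_bits(symbols, table):
    """Number of bits needed to encode the symbols with the given table."""
    return sum(len(table[s]) for s in symbols)


def encode_with_shadow_table(symbols, alphabet, block_size):
    """Encode block by block, swapping in the shadow table when it does better."""
    active = build_code_table({s: 1 for s in alphabet})  # flat starting table
    total_bits = 0
    swaps = 0
    for start in range(0, len(symbols), block_size):
        block = symbols[start:start + block_size]
        total_bits += encoded_bits(block, active)
        counts = Counter({s: 1 for s in alphabet})  # add-one keeps all symbols
        counts.update(block)
        shadow = build_code_table(counts)
        if encoded_bits(block, shadow) < encoded_bits(block, active):
            active = shadow                          # assumed swap criterion
            swaps += 1
    return total_bits, swaps


# Hypothetical skewed input: mostly "a", occasionally "b".
data = list("aaaaaaab" * 50)
static_bits = encoded_bits(data, build_code_table({s: 1 for s in "abcd"}))
adaptive_bits, swaps = encode_with_shadow_table(data, "abcd", 100)
```

On skewed input like this, the adaptive run needs fewer bits than the fixed flat table because frequent symbols receive shorter code words after the first swap.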

Midwest Symposium on Circuits and Systems | 2007

CAM enhanced super parallel SIMD processor with high-speed pattern matching capability

Takeshi Kumaki; Yutaka Kono; Masakatsu Ishizaki; Masaharu Tagami; Tetsushi Koide; Hans Jürgen Mattausch; Takayuki Gyohten; Hideyuki Noda; Yasuto Kuroda; Katsumi Dosaka; Kazutami Arimoto; Kazunori Saito

A super parallel SIMD processor has been proposed as a novel SIMD multimedia processor that offers a better way of processing several types of multimedia data. This processor supports 2,048-way bit-serial and word-parallel operation. Moreover, its 2,048 processing elements can synchronize with a single command and process all stored data, thus achieving highly parallel processing with low power consumption. In this paper, to further improve the processing efficiency for multimedia data, a content addressable memory (CAM) is embedded in the super parallel SIMD processor. For the JPEG application, the original super parallel SIMD processor reduces the average number of clock cycles by about 55% compared with a conventional DSP. Furthermore, the clock cycle number with the CAM-enhanced super parallel SIMD processor is 90% smaller than with the original super parallel SIMD processor. Thus, the proposed architecture is very suitable for multimedia data processing.

Asia Pacific Conference on Circuits and Systems | 2006

Application of Multi-ported CAM for Parallel Coding

Takeshi Kumaki; Yutaka Kouno; Masakatsu Ishizaki; Tetsushi Koide; Hans Jürgen Mattausch

This paper presents a parallel coding architecture using a flexible multi-ported content addressable memory (CAM). A previously reported flexible multi-port content addressable memory (FMCAM) technology (Kumaki et al., 2004) is improved by additional schemes for a single search mode and counting value setting, which enable fast parallel coding operation. Evaluation results for Huffman encoding within the JPEG application show that the proposed architecture can reduce the required clock-cycle number by 93% in comparison to a conventional DSP. Furthermore, the performance per unit area, measured in MOPS/mm2, can be improved by a factor of 3.8 in comparison to a conventional DSP.

Archive | 2008

COMPRESSION PROCESSING APPARATUS AND COMPRESSION PROCESSING METHOD

Hans Jürgen Mattausch; Tetsushi Koide; Takeshi Kumaki; Masakatsu Ishizaki

Archive | 2010

SEMICONDUCTOR DEVICE PERFORMING OPERATIONAL PROCESSING

Masakatsu Ishizaki; Takeshi Kumaki; Masaharu Tagami; Yuta Imai; Tetsushi Koide; Hans Jürgen Mattausch; Takayuki Gyoten; Hideyuki Noda; Yoshihiro Okuno; Kazutami Arimoto
