Takeshi Kumaki | Researchain

Publication

Featured research published by Takeshi Kumaki.

International Solid-State Circuits Conference | 2010

A scalable massively parallel processor for real-time image processing

Takashi Kurafuji; Masaru Haraguchi; Masami Nakajima; Tetsu Nishijima; Tetsushi Tanizaki; Hiroyuki Yamasaki; Takeaki Sugimura; Yuta Imai; Masakatsu Ishizaki; Takeshi Kumaki; Kan Murata; Kanako Yoshida; Eisuke Shimomura; Hideyuki Noda; Yoshihiro Okuno; Shunsuke Kamijo; Tetsushi Koide; Hans Jürgen Mattausch; Kazutami Arimoto

This paper describes a high-performance, scalable, massively parallel single-instruction multiple-data (SIMD) processor for power- and area-efficient real-time image processing. The SIMD processor combines 4-bit processing elements (PEs) with SRAM in a small area. It thus simultaneously achieves a high performance of 191 GOPS, a high power efficiency of 310 GOPS/W, and a high area efficiency of 31.6 GOPS/mm². The pipeline architecture is optimized to reduce controller overhead cycles, so the SIMD parallel processing unit can be utilized for up to 99% of the operating time of typical application programs. The processor can also be optimized for low-cost, low-power, and high-performance multimedia system-on-a-chip (SoC) solutions. A combination of custom and automated implementation techniques makes the number of PEs scalable. The processor has two operating modes: a normal-frequency (NF) mode for higher power efficiency and a double-frequency (DF) mode for higher performance. This SIMD processor combines high area efficiency, high power efficiency, high performance, and flexibility, and so extends real-time image processing technology to a variety of electronic devices.
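
The three headline figures can be cross-checked against each other. The short sketch below derives the power and silicon area they imply. It assumes all three figures were measured at the same operating point, which the abstract does not state (the NF and DF modes may differ), so the derived values are only indicative.

```python
# Headline figures quoted in the abstract.
PERF_GOPS = 191.0
POWER_EFF_GOPS_PER_W = 310.0
AREA_EFF_GOPS_PER_MM2 = 31.6


def implied_power_w(perf_gops: float, eff_gops_per_w: float) -> float:
    """Power (W) implied by a throughput and a power-efficiency figure."""
    return perf_gops / eff_gops_per_w


def implied_area_mm2(perf_gops: float, eff_gops_per_mm2: float) -> float:
    """Area (mm^2) implied by a throughput and an area-efficiency figure."""
    return perf_gops / eff_gops_per_mm2


if __name__ == "__main__":
    # Assumes all three figures refer to the same operating mode.
    print(f"implied power: {implied_power_w(PERF_GOPS, POWER_EFF_GOPS_PER_W):.3f} W")
    print(f"implied area:  {implied_area_mm2(PERF_GOPS, AREA_EFF_GOPS_PER_MM2):.2f} mm^2")
```

Under that assumption the script prints roughly 0.616 W and 6.04 mm².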

International Symposium on Circuits and Systems | 2005

CAM-based VLSI architecture for Huffman coding with real-time optimization of the code word table [image coding example]

Takeshi Kumaki; Yasuto Kuroda; Tetsushi Koide; Hans Jürgen Mattausch; Hideyuki Noda; Katsumi Dosaka; Kazutami Arimoto; Kazunori Saito

Huffman coding is probably the best-known and most widely used data compression technique. Nevertheless, further decreasing the compression ratio by updating the Huffman code in real time remains a largely unsolved problem. This paper proposes a novel architecture for CAM (content addressable memory)-based Huffman coding with real-time optimization of the code word table, called CHRC. A CAM implements fast Huffman encoding while the code word table is simultaneously reconstructed and updated in real time. The effectiveness of the proposed architecture is verified through its structure, encoding flow, and simulation results. In a JPEG application example, the proposed CHRC method achieves up to 40% smaller encoded picture sizes than conventional Huffman coding methods. It also requires 6 times fewer clock cycles in the encoding hardware.
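
The benefit of re-optimizing the code word table can be illustrated in software. The sketch below builds Huffman code lengths from symbol counts. It then compares a table tuned to stale statistics with a table rebuilt from the data actually being encoded. The symbol counts are made-up illustrative values, and a plain Python dict stands in for the CAM. This is not the CHRC architecture itself.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(freqs: dict[str, int]) -> dict[str, int]:
    """Huffman code length (in bits) for every symbol in `freqs`."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}  # a lone symbol still needs one bit
    tie = count()  # tie-breaker so heapq never has to compare dicts
    heap = [(f, next(tie), {s: 0}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**a, **b}.items()}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def encoded_bits(data: str, lengths: dict[str, int]) -> int:
    """Total bits needed to encode `data` with the given code lengths."""
    return sum(lengths[s] for s in data)


if __name__ == "__main__":
    # Hypothetical statistics the table was originally tuned for.
    stale = huffman_code_lengths({"a": 8, "b": 4, "c": 2, "d": 1})
    # Incoming data whose symbol distribution has drifted.
    data = "d" * 8 + "c" * 4 + "b" * 2 + "a"
    rebuilt = huffman_code_lengths(Counter(data))
    print("stale table:  ", encoded_bits(data, stale), "bits")
    print("rebuilt table:", encoded_bits(data, rebuilt), "bits")
    print("fixed 2-bit:  ", 2 * len(data), "bits")
```

With these illustrative counts, the stale table needs 41 bits, the rebuilt table 25 bits, and a fixed 2-bit code 30 bits. A table that falls behind the data can therefore do worse than no optimization at all.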

International Midwest Symposium on Circuits and Systems | 2013

Hierarchical image-scrambling method with scramble-level controllability for privacy protection

Toshiya Honda; Yuma Murakami; Yuki Yanagihara; Takeshi Kumaki; Takeshi Fujino

Privacy-protection technology is essential in today's surveillance society. Camera applications vary widely, e.g., crime surveillance, environmental monitoring, and marketing. Furthermore, as cameras become smaller and cheaper, surveillance systems are predicted to become more diverse in scale, from home area networks to wide area networks. A simple privacy protection system that does not require central servers or large databases is therefore needed. Scrambling private information in a captured image is one way to simplify such a system. We propose an image-scrambling method for bitmap and JPEG images that protects private information. Our method enables access control by providing keys to authorized individuals, and users cannot view private information they do not have permission to access. The image format is retained, so no special viewer is needed on display-only consoles. Experimental results suggest that the scramble level can be controlled linearly through parameters (three for JPEG images and one for bitmap images). We also developed a demo system and confirmed that this method can be applied to embedded systems such as those equipped with surveillance cameras.
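
As a rough illustration of key-based, level-controllable scrambling, the sketch below permutes square pixel blocks according to a pseudo-random plan derived from a key. A single `level` parameter sets the fraction of blocks shuffled. The block permutation, the function names, and the use of Python's `random.Random` as the key schedule are all assumptions for illustration. `random` is not a cryptographically secure generator, and the abstract does not describe the authors' scrambling operation at this level of detail.

```python
import random

Image = list[list[int]]


def _block_origins(h: int, w: int, b: int) -> list[tuple[int, int]]:
    """Top-left corners of the b x b blocks tiling an h x w image."""
    return [(r, c) for r in range(0, h, b) for c in range(0, w, b)]


def _plan(key: str, n_blocks: int, level: float) -> list[tuple[int, int]]:
    """Key-derived (destination, source) block pairs.

    `level` in [0, 1] is the fraction of blocks that take part in the shuffle;
    a block may occasionally be mapped onto itself.
    """
    rng = random.Random(key)  # same key -> same plan on every run
    chosen = rng.sample(range(n_blocks), round(level * n_blocks))
    sources = chosen[:]
    rng.shuffle(sources)
    return list(zip(chosen, sources))


def _copy_block(src: Image, dst: Image, src_rc, dst_rc, b: int) -> None:
    (sr, sc), (dr, dc) = src_rc, dst_rc
    for i in range(b):
        dst[dr + i][dc:dc + b] = src[sr + i][sc:sc + b]


def scramble(img: Image, key: str, level: float, b: int = 2) -> Image:
    h, w = len(img), len(img[0])
    assert h % b == 0 and w % b == 0, "sketch assumes dimensions divisible by b"
    origins = _block_origins(h, w, b)
    out = [row[:] for row in img]
    for dst, src in _plan(key, len(origins), level):
        _copy_block(img, out, origins[src], origins[dst], b)
    return out


def unscramble(img: Image, key: str, level: float, b: int = 2) -> Image:
    origins = _block_origins(len(img), len(img[0]), b)
    out = [row[:] for row in img]
    for dst, src in _plan(key, len(origins), level):
        _copy_block(img, out, origins[dst], origins[src], b)  # invert each move
    return out
```

Only a holder of the key and level can regenerate the plan and undo the shuffle, and raising `level` moves more blocks. Unlike the published method, this toy works on raw pixel arrays rather than on encoded bitmap or JPEG data.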

IEEE Global Conference on Consumer Electronics | 2015

Detection technique for hardware Trojans using machine learning in frequency domain

Takato Iwase; Yusuke Nozaki; Masaya Yoshikawa; Takeshi Kumaki

Recently, the threat of hardware Trojans has been highlighted. A hardware Trojan is, in effect, a hardware virus. When predetermined conditions are satisfied, the malicious circuit performs subversive activities, such as shutting down the system or leaking important information, without the circuit's users being aware of it. From a security standpoint, it is therefore important to detect consumer electronic devices that contain hardware Trojans. This study proposes a new detection technique for hardware Trojans that introduces machine learning into the detection process. Experiments using actual devices demonstrate the validity of the proposed method.
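
The abstract does not name the features or learning algorithm. As one hypothetical instance of frequency-domain detection, the sketch below turns a measured trace (for example, a power or side-channel trace) into DFT magnitude features. It then classifies the trace with a nearest-centroid rule. The traces are synthetic sine waves in which a "Trojan" adds an extra tone; they are not measurements, and both the feature choice and the classifier are assumptions.

```python
import cmath
import math


def dft_magnitudes(trace: list[float]) -> list[float]:
    """Magnitude spectrum (bins 0..n/2) via a direct O(n^2) DFT."""
    n = len(trace)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(trace)))
        for k in range(n // 2 + 1)
    ]


def synthetic_trace(n: int, phase: float, trojan: bool) -> list[float]:
    """Toy trace: a tone in bin 3, plus an extra tone in bin 7 if a Trojan is active."""
    trace = [math.sin(2 * math.pi * 3 * t / n + phase) for t in range(n)]
    if trojan:
        trace = [x + 0.5 * math.sin(2 * math.pi * 7 * t / n + 2 * phase)
                 for t, x in enumerate(trace)]
    return trace


def train(traces_by_label: dict[str, list[list[float]]]) -> dict[str, list[float]]:
    """Mean spectrum (centroid) per label."""
    model = {}
    for label, traces in traces_by_label.items():
        spectra = [dft_magnitudes(t) for t in traces]
        model[label] = [sum(col) / len(spectra) for col in zip(*spectra)]
    return model


def classify(model: dict[str, list[float]], trace: list[float]) -> str:
    """Label whose centroid is closest (Euclidean) to the trace's spectrum."""
    spectrum = dft_magnitudes(trace)
    return min(model, key=lambda label: math.dist(spectrum, model[label]))


if __name__ == "__main__":
    n, phases = 32, [0.0, 0.4, 0.9]
    model = train({
        "clean": [synthetic_trace(n, p, False) for p in phases],
        "trojan": [synthetic_trace(n, p, True) for p in phases],
    })
    print(classify(model, synthetic_trace(n, 1.7, True)))
```

A DFT magnitude discards phase, so these features are unaffected by the different phase offsets of the synthetic traces. Robustness to such offsets is one motivation for working in the frequency domain.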

IEICE Transactions on Information and Systems | 2007

Acceleration of DCT Processing with Massive-Parallel Memory-Embedded SIMD Matrix Processor

Takeshi Kumaki; Masakatsu Ishizaki; Tetsushi Koide; Hans Jürgen Mattausch; Yasuto Kuroda; Hideyuki Noda; Katsumi Dosaka; Kazutami Arimoto; Kazunori Saito

This paper reports an efficient Discrete Cosine Transform (DCT) processing method for images using a massive-parallel memory-embedded SIMD matrix processor. The matrix-processing engine has 2,048 2-bit processing elements connected by a flexible switching network. It supports 2-bit 2,048-way bit-serial and word-parallel operations with a single command. For compatibility with this matrix-processing architecture, the arithmetic order of the conventional DCT algorithm has been improved, and one-dimensional (1D) DCT processing in the vertical and horizontal directions has been further developed. Evaluation results of the matrix-engine-based DCT processing show that the number of clock cycles per image block can be reduced by 87% in comparison to a conventional DSP architecture. The resulting performance in MOPS and MOPS/mm² is 8 and 5.6 times better, respectively, than that of a conventional DSP.
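
The vertical/horizontal decomposition rests on the separability of the 2D DCT. A 2D DCT-II equals a 1D DCT over every row followed by a 1D DCT over every column. The sketch below checks that identity against the direct 2D formula on an 8x8 block. It uses the textbook orthonormal DCT-II, not the reordered arithmetic mapped onto the SIMD matrix in the paper.

```python
import math


def _c(k: int, n: int) -> float:
    """Orthonormal DCT-II scale factor."""
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)


def dct_1d(x: list[float]) -> list[float]:
    n = len(x)
    return [
        _c(k, n) * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        for k in range(n)
    ]


def dct_2d_separable(block: list[list[float]]) -> list[list[float]]:
    """Row pass, then column pass: 2*n^3 summed terms for an n x n block."""
    rows = [dct_1d(r) for r in block]             # horizontal 1D DCTs
    cols = [dct_1d(list(c)) for c in zip(*rows)]  # vertical 1D DCTs on the columns
    return [list(r) for r in zip(*cols)]          # transpose back to row-major


def dct_2d_direct(block: list[list[float]]) -> list[list[float]]:
    """Direct 2D formula: n^4 summed terms."""
    n = len(block)
    return [
        [
            _c(u, n) * _c(v, n) * sum(
                block[x][y]
                * math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                * math.cos(math.pi * (2 * y + 1) * v / (2 * n))
                for x in range(n) for y in range(n)
            )
            for v in range(n)
        ]
        for u in range(n)
    ]
```

For n = 8 that is 1,024 versus 4,096 summed terms. Each 1D pass is also independent across rows (or columns), which makes the decomposition a natural fit for word-parallel hardware.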

IEICE Transactions on Electronics | 2008

Integration Architecture of Content Addressable Memory and Massive-Parallel Memory-Embedded SIMD Matrix for Versatile Multimedia Processor

Takeshi Kumaki; Masakatsu Ishizaki; Tetsushi Koide; Hans Jürgen Mattausch; Yasuto Kuroda; Takayuki Gyohten; Hideyuki Noda; Katsumi Dosaka; Kazutami Arimoto; Kazunori Saito

This paper presents an integration architecture of content addressable memory (CAM) and a massive-parallel memory-embedded SIMD matrix for constructing a versatile multimedia processor. The massive-parallel memory-embedded SIMD matrix has 2,048 2-bit processing elements connected by a flexible switching network. It supports 2-bit 2,048-way bit-serial and word-parallel operations with a single command. The SIMD matrix architecture has been shown to be a better approach for processing the repetitive arithmetic operations found in multimedia applications. The proposed architecture additionally exploits CAM technology and therefore enables fast pipelined table-lookup coding operations. Because both arithmetic and table-lookup operations execute extremely fast, the proposed architecture realizes efficient and versatile multimedia data processing. The proposed CAM-enhanced massive-parallel SIMD matrix processor was evaluated on the frequently used JPEG image-compression application. The results show that the required number of clock cycles can be reduced by 86% in comparison to a conventional mobile DSP architecture. The resulting performance in Mpixel/mm² is 3.3 and 4.4 times better than that of a CAM-less massive-parallel memory-embedded SIMD matrix processor and a conventional mobile DSP, respectively.
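
To make the lookup side concrete, the sketch below is a behavioral model of a CAM. Every stored word is compared against the search key at once (a loop here, parallel match lines in silicon). A priority encoder then returns the lowest matching address together with its payload. The ternary don't-care mask, the priority rule, and all names are illustrative assumptions; the abstract does not specify the CAM organization.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CamEntry:
    value: int      # stored search word
    care_mask: int  # 1 bits are compared, 0 bits are "don't care" (ternary-style)
    data: str       # payload returned on a hit, e.g. a code word


class Cam:
    def __init__(self, entries: list[CamEntry]):
        self.entries = list(entries)

    def match_lines(self, key: int) -> list[bool]:
        """One match line per row; hardware evaluates all rows in the same cycle."""
        return [((key ^ e.value) & e.care_mask) == 0 for e in self.entries]

    def search(self, key: int) -> Optional[tuple[int, str]]:
        """Priority encoder: the lowest matching address wins."""
        for addr, hit in enumerate(self.match_lines(key)):
            if hit:
                return addr, self.entries[addr].data
        return None
```

For table-lookup coding, `value` would hold an input symbol and `data` its code word, so one search replaces a sequential scan of the table.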

IEICE Transactions on Information and Systems | 2007

Scalable FPGA/ASIC Implementation Architecture for Parallel Table-Lookup-Coding Using Multi-Ported Content Addressable Memory

Takeshi Kumaki; Yutaka Kono; Masakatsu Ishizaki; Tetsushi Koide; Hans Jürgen Mattausch

This paper presents a scalable FPGA/ASIC implementation architecture for high-speed parallel table-lookup coding using multi-ported content addressable memory, aiming to facilitate effective table-lookup-coding solutions. The multi-ported CAM adopts Flexible Multi-ported Content Addressable Memory (FMCAM) technology, an effective parallel processing architecture previously reported in [1]. To achieve high-speed parallel table-lookup coding, FMCAM is extended with a single-search mode and a counting-value-setting mode, which permit fast parallel table-lookup-coding operations. Evaluation results for Huffman encoding within the JPEG application show that a synthesized semi-custom ASIC implementation of the proposed architecture can already reduce the required number of clock cycles by 93% in comparison to a conventional DSP. Furthermore, the performance per unit area, measured in MOPS/mm², improves by a factor of 3.8 in comparison to parallel-operated DSPs. Consequently, the proposed architecture is well suited to FPGA/ASIC implementation and is a promising solution for small-area integrated realization of real-time table-lookup-coding applications.
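
The value of extra search ports can be estimated with a simple cycle model. The sketch below assumes each port issues one fully pipelined search per cycle with a fixed latency, an idealization rather than the FMCAM timing. It also converts a percentage cycle reduction into an equivalent speedup factor.

```python
import math


def lookup_cycles(num_lookups: int, ports: int, latency: int = 1) -> int:
    """Cycles to finish `num_lookups` searches on an idealized pipelined CAM.

    Assumes up to `ports` searches start every cycle and each takes `latency`
    cycles, so only the pipeline fill adds to the issue time.
    """
    if num_lookups <= 0:
        return 0
    return math.ceil(num_lookups / ports) + (latency - 1)


def speedup_from_reduction(reduction: float) -> float:
    """A cycle reduction r (e.g. 0.93 for 93%) means 1 / (1 - r) times fewer cycles."""
    return 1.0 / (1.0 - reduction)


if __name__ == "__main__":
    for ports in (1, 2, 4, 8):
        print(ports, "ports:", lookup_cycles(64, ports, latency=3), "cycles for 64 lookups")
    print(f"93% fewer cycles ~ {speedup_from_reduction(0.93):.1f}x speedup")
```

By this arithmetic, the 93% reduction reported above corresponds to roughly 14.3 times fewer clock cycles.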

IEICE Transactions on Information and Systems | 2007

Real-Time Huffman Encoder with Pipelined CAM-Based Data Path and Code-Word-Table Optimizer

Takeshi Kumaki; Yasuto Kuroda; Masakatsu Ishizaki; Tetsushi Koide; Hans Jürgen Mattausch; Hideyuki Noda; Katsumi Dosaka; Kazutami Arimoto; Kazunori Saito

This paper presents a novel optimized real-time Huffman encoder using a pipelined data path based on CAM technology and a parallel code-word-table optimizer. CAM technology enables fast parallel search of the code word table. At the same time, the code word table is optimized according to the frequency of the received input symbols and updated in real time. Because these two functions work in parallel, the proposed architecture realizes fast parallel encoding and maintains a consistently high compression ratio. Evaluation results for the JPEG application show that the proposed architecture achieves up to 28% smaller encoded picture sizes than conventional architectures. Encoding time is reduced by 95% compared with a conventional SRAM-based architecture. This makes the encoder suitable even for the latest end-user devices that require fast frame rates. Furthermore, the proposed architecture is the only encoder that can simultaneously achieve a small compressed data size and fast processing speed.
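
One way to picture the two functions running in parallel is double buffering. While the encoder looks up codes in the active table, the optimizer counts symbols and builds the next table, which takes over for the following block. The sketch below models that schedule in software. The one-block delay, the add-one smoothing that keeps every symbol encodable, and the toy symbol stream are assumptions, not details from the paper. A decoder would have to rebuild its tables on the same schedule.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_lengths(freqs: dict[str, int]) -> dict[str, int]:
    """Huffman code length (in bits) for every symbol in `freqs`."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    tie = count()  # tie-breaker so heapq never has to compare dicts
    heap = [(f, next(tie), {s: 0}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), {s: d + 1 for s, d in {**a, **b}.items()}))
    return heap[0][2]


def encode_stream(blocks: list[str], alphabet: str, initial: dict[str, int]) -> int:
    """Total bits when block i is encoded with the table built from block i-1."""
    active = dict(initial)
    total = 0
    for block in blocks:
        total += sum(active[s] for s in block)  # encoder: look up the active table
        stats = Counter(block)                   # optimizer: count symbols meanwhile
        # Add-one smoothing so symbols absent from this block still get a code.
        active = huffman_lengths({s: stats[s] + 1 for s in alphabet})
    return total


if __name__ == "__main__":
    alphabet = "abcd"
    fixed = {s: 2 for s in alphabet}  # initial 2-bit fixed-length table
    blocks = ["aaaaaaab", "aaaaaaac", "aaaaaaad"]
    print("adaptive:", encode_stream(blocks, alphabet, fixed), "bits")
    print("fixed:   ", sum(2 * len(b) for b in blocks), "bits")
```

For this skewed toy stream the adaptive schedule needs 36 bits, against 48 for the fixed table. Only the first block pays the fixed-table cost.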

International Midwest Symposium on Circuits and Systems | 2013

Cipher-destroying and secret-key-emitting hardware Trojan against AES core

Takeshi Kumaki; Takeshi Fujino

This paper reports a cipher-destroying and secret-key-emitting hardware Trojan targeting Advanced Encryption Standard (AES) cores, in order to facilitate countermeasures against such Trojans. We developed a malicious circuit that connects the encryption and decryption modules in the AES core. If an attacker-defined rule is satisfied in the AES core, the hardware Trojan is triggered. It then sends half-encoded data from the encryption module to the decryption module via a Trojan path. As a result, the plaintext is delivered directly to the output port. Furthermore, if a predefined keyword is input to the Trojan-inserted AES core, the keyword is transferred to a controller via the Trojan path and the secret key is output directly. To verify the threat of AES hardware Trojans, we evaluated Field Programmable Gate Array (FPGA) and Application Specific Integrated Circuit (ASIC) implementations of the Verilog Hardware Description Language (HDL) macro. A conventional AES core and a malicious AES core were implemented on an FPGA in a MicroBlaze system. In this implementation, the malicious circuit added about 0.18% hardware relative to the normal AES core, and the malicious core's maximum operating frequency was the same as the normal one. In the ASIC implementation, the malicious circuit increased hardware and power consumption by about 0.37% and 0.13%, respectively, relative to the normal core. We therefore argue that the development of hardware Trojan analysis and detection techniques must be accelerated to ensure the reliability of LSI products.
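
The trigger-and-payload structure described above can be captured in a few lines of behavioral code. The model also shows why random functional testing rarely exposes such a Trojan: outputs match the normal core unless an exact 128-bit pattern arrives. In the sketch below a byte-wise XOR stands in for the AES datapath (it is not AES and not secure). The trigger pattern, keyword, and class name are hypothetical. The paper's Trojan path between the encryption and decryption modules is abstracted into a simple output multiplexer.

```python
import random

BLOCK_BYTES = 16  # AES block size (128 bits)


def toy_cipher(block: bytes, key: bytes) -> bytes:
    """Placeholder for the AES datapath: byte-wise XOR. NOT AES, NOT secure."""
    return bytes(b ^ k for b, k in zip(block, key))


class TrojanedCore:
    """Behavioral model of a core whose output mux is hijacked after a trigger."""

    def __init__(self, key: bytes, trigger: bytes, leak_keyword: bytes):
        self.key = key
        self.trigger = trigger
        self.leak_keyword = leak_keyword
        self.armed = False  # the Trojan stays dormant until the trigger is seen

    def encrypt(self, block: bytes) -> bytes:
        if block == self.trigger:  # rare exact-match comparator arms the Trojan
            self.armed = True
        if not self.armed:
            return toy_cipher(block, self.key)  # normal behavior
        if block == self.leak_keyword:
            return self.key  # key-emitting payload
        return block  # cipher-destroying payload: plaintext reaches the output


if __name__ == "__main__":
    rng = random.Random(0)
    key = rng.randbytes(BLOCK_BYTES)
    core = TrojanedCore(key, trigger=b"T" * BLOCK_BYTES, leak_keyword=b"K" * BLOCK_BYTES)
    # Each random vector hits a fixed 128-bit trigger with probability 2**-128.
    ok = all(core.encrypt(p) == toy_cipher(p, key)
             for p in (rng.randbytes(BLOCK_BYTES) for _ in range(10_000)))
    print("random tests pass:", ok)
```

The dormant core is indistinguishable from the normal one on random inputs. Detecting such a Trojan therefore needs analysis techniques beyond ordinary functional testing.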

International Midwest Symposium on Circuits and Systems | 2010

Realization of efficient and low-power parallel face-detection with massive-parallel memory-embedded SIMD matrix

Takeshi Kumaki; Yuta Imai; Hirokazu Hiramoto; Tetsushi Koide; Hans Jürgen Mattausch

This paper presents an efficient, low-power parallel face-detection technology based on Haar-like features and implemented with a massive-parallel memory-embedded SIMD matrix. The massive-parallel memory-embedded SIMD matrix architecture has up to 2,048 2-bit processing elements connected by a flexible switching network. It supports 2-bit 2,048-way bit-serial and word-parallel operations with a single command. For experimental verification of this matrix-processing architecture, the parallel Haar-like-feature-based face-detection technique has been implemented on an evaluation board and tested in practice. Evaluation results show that a total processing time of about 313 ms can be achieved at a 162 MHz clock frequency with 150 mW power dissipation. The reported parallel face-detection method with the massive-parallel memory-embedded SIMD matrix is thus a practical technology and a promising solution for real-time mobile multimedia applications.
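
Haar-like features are usually evaluated on an integral image, in which any rectangle sum costs four lookups regardless of its size. That constant per-window cost lets many detection windows be processed in parallel. The sketch below uses the standard Viola-Jones-style formulation with a single two-rectangle edge feature. The abstract does not state whether the paper computes features this way on the SIMD matrix, and the function names are illustrative.

```python
def integral_image(img: list[list[int]]) -> list[list[int]]:
    """ii[y][x] = sum of img over rows < y and columns < x (one zero pad row/column)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: list[list[int]], x: int, y: int, w: int, h: int) -> int:
    """Sum of the w x h rectangle with top-left corner (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_edge_feature(ii: list[list[int]], x: int, y: int, w: int, h: int) -> int:
    """Two-rectangle feature: left half minus right half (w must be even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

A detector would evaluate many such features per window and compare each against a trained threshold. Only the feature evaluation is shown here.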
