Publication


Featured research published by Kouhei Nadehara.


International Symposium on Microarchitecture | 1998

V830R/AV: embedded multimedia superscalar RISC processor

Kazumasa Suzuki; Tomohisa Arai; Kouhei Nadehara; Ichiro Kuroda

The V830R/AV's real-time decoding of MPEG-2 video and audio data enables practical embedded-processor-based multimedia systems.


International Symposium on Microarchitecture | 1995

Low-power multimedia RISC

Kouhei Nadehara; Ichiro Kuroda; Masayuki Daito; Takashi Nakayama

Battery-powered multimedia systems challenge designers to pack enormous signal-processing power into a low-power chip. The V830 meets this challenge by combining a special instruction set and a fast 32-bit parallel multiply-adder with internal RAM for quicker memory accesses. This gives the newest entry in the V800 series signal-processing capabilities as fast as those of the latest fixed-point DSPs.


Signal Processing Systems | 2004

Extended instructions for the AES cryptography and their efficient implementation

Kouhei Nadehara; Masao Ikekawa; Ichiro Kuroda

This paper presents extended instructions that accelerate Advanced Encryption Standard (AES) cryptography in embedded processors, together with an efficient implementation of those instructions. From each input element of an AES state, the AES instructions generate four elements in single-instruction, multiple-data (SIMD) format. Using the proposed instructions reduces the instruction count for AES encryption with a 128-bit key from 688 to 340 per 128-bit block. The execution unit for the AES instructions can be implemented efficiently with a single 2-Kbit table and four small multipliers; the table needs only 1/32 of the capacity required by a conventional fast software algorithm. The AES instructions give embedded processors for low-cost network equipment cryptographic capability with minimal modification.
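The key idea in the abstract is that one table lookup per state element, followed by a few constant multiplications in GF(2^8), yields four MixColumns elements at once. The Python sketch below illustrates that decomposition. It assumes the instructions follow the well-known AES "T-table" formulation. That is an assumption, because the paper's instruction semantics are not given here. A 256-entry, 8-bit S-box is 2048 bits, which is consistent with the 2-Kbit table quoted above. The names `column_elements`, `gf_mul` and `gf_inv` are hypothetical.

```python
# Sketch (assumption: T-table style decomposition, not the paper's exact
# instruction semantics). Builds the AES S-box from GF(2^8) arithmetic and
# expands one state byte into four MixColumns column elements.


def xtime(a: int) -> int:
    """Multiply by {02} in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B
    return a & 0xFF


def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse as a^254; AES maps 0 to 0."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(b: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((b << n) | (b >> (8 - n))) & 0xFF


def affine(b: int) -> int:
    """AES affine transform: b ^ rotl(b,1) ^ rotl(b,2) ^ rotl(b,3) ^ rotl(b,4) ^ 0x63."""
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


# 256 entries x 8 bits = 2048 bits (2 Kbit).
SBOX = [affine(gf_inv(x)) for x in range(256)]


def column_elements(byte: int) -> tuple[int, int, int, int]:
    """One S-box lookup plus constant GF(2^8) multiplies gives four SIMD lanes:
    ({02}*s, s, s, {03}*s)."""
    s = SBOX[byte]
    s2 = xtime(s)  # {02} * s
    return (s2, s, s, s2 ^ s)  # {03} * s = {02} * s XOR s
```

Packing the four lanes of `column_elements(0x00)` into one 32-bit word gives `0xC66363A5`. That is the familiar first entry of a conventional T-table. This shows how a single small table plus cheap constant multipliers can stand in for the much larger precomputed tables of a software-only approach.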


International Conference on Acoustics, Speech and Signal Processing | 1999

Radix-4 FFT implementation using SIMD multimedia instructions

Kouhei Nadehara; Takashi Miyazaki; Ichiro Kuroda

A fast radix-4 complex FFT implementation using 4-parallel SIMD instructions is presented. Four radix-4 butterflies are calculated in parallel at every stage by loading four consecutive elements into a register. At the last stage, every four elements are packed into a register and calculated in parallel. This regular data flow enables higher parallelism and reduces the overhead of data-format conversion. Implemented on the V830R processor, which has a 4-parallel SIMD-type multimedia instruction set, the FFT achieves practical performance competitive with high-end parallel DSPs. Multiply-accumulate instructions with symmetrical rounding, introduced in the V830R processor, are effective in maintaining FFT accuracy.
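The structure of a radix-4 FFT can be illustrated with a scalar floating-point Python sketch. It does not model the V830R's SIMD instructions, its fixed-point arithmetic, its symmetrical rounding, or the paper's data layout. It only shows two properties. First, the radix-4 butterfly needs no general multiplications, only multiplications by ±1 and ±j. Second, its iterations are independent, which is what makes four-way parallelism natural. `fft_radix4` and `dft` are hypothetical helper names.

```python
import cmath


def fft_radix4(x: list[complex]) -> list[complex]:
    """Recursive decimation-in-time radix-4 FFT; len(x) must be a power of 4."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n == 0 or n % 4:
        raise ValueError("length must be a power of 4")
    # FFTs of the four decimated subsequences x[r], x[r+4], x[r+8], ...
    sub = [fft_radix4(x[r::4]) for r in range(4)]
    q = n // 4
    out = [0j] * n
    # Iterations over k are independent, so four consecutive k values could
    # occupy the four lanes of a 4-parallel SIMD register.
    for k in range(q):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor W_n^k
        a = sub[0][k]
        b = sub[1][k] * w
        c = sub[2][k] * w * w
        d = sub[3][k] * w * w * w
        # Radix-4 butterfly: only multiplications by +-1 and +-j remain.
        out[k] = a + b + c + d
        out[k + q] = a - 1j * b - c + 1j * d
        out[k + 2 * q] = a - b + c - d
        out[k + 3 * q] = a + 1j * b - c - 1j * d
    return out


def dft(x: list[complex]) -> list[complex]:
    """Direct O(n^2) DFT, used only as a reference to check the FFT."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
```

For a 64-point input, `fft_radix4` agrees with `dft` to within floating-point rounding. It uses three recursion levels of radix-4 butterflies instead of the six levels a radix-2 FFT would need.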


IEEE Journal of Solid-State Circuits | 1999

A 2000-MOPS embedded RISC processor with a Rambus DRAM controller

Kazumasa Suzuki; Masayuki Daito; Tomoo Inoue; Kouhei Nadehara; Masahiro Nomura; Masayuki Mizuno; Tomofumi Iima; Shoichiro Sato; Terumi Fukuda; Tomohisa Arai; Ichiro Kuroda; Masakazu Yamashina

We have developed a 0.25-µm, 200-MHz embedded RISC processor for multimedia applications. This processor has a dual-issue superscalar datapath consisting of a 32-bit integer unit and a 64-bit single-instruction multiple-data (SIMD) function unit, which together contain a total of five multiply-adders. An on-chip concurrent Rambus DRAM (C-RDRAM) controller uses interleaved transactions to increase the memory bandwidth of the Rambus channel to 533 Mb/s. The controller also reduces latency through transaction interleaving and instruction prefetching. A 64-bit, 200-MHz internal bus transfers data among the CPU core, the C-RDRAM, and the peripherals. These high-data-rate channels improve CPU performance by eliminating a bottleneck in the data supply. The datapath of this chip was designed using a functional macrocell library that includes placement information for leaf cells, which gave the chip's SIMD function unit a density of 68,000 transistors per square millimeter.
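The 2000-MOPS figure in the title can be reconciled with the clock rate and multiplier count given in the abstract under two assumptions. Both are common for peak-rate figures but are not stated by the authors. The first is that each multiply-add counts as two operations. The second is that all five multiply-adders issue every cycle. The following is an illustrative accounting only:

```latex
\[
5~\text{multiply-adders} \times 2~\frac{\text{ops}}{\text{multiply-add}} \times 200~\text{MHz} = 2000~\text{MOPS}
\]
```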


Signal Processing Systems | 1998

A 16-bit parallel MAC architecture for a multimedia RISC processor

Ichiro Kuroda; E. Murata; Kouhei Nadehara; Kazumasa Suzuki; T. Arai; A. Okamura

This paper presents a parallel MAC (multiply-accumulate) architecture designed for DSP applications on a 200-MHz, 1.6-GOPS multimedia RISC processor. The processor's datapath is designed to execute a data transfer in parallel with SIMD parallel arithmetic operations. SIMD parallel 16-bit MAC instructions are introduced with a symmetric rounding scheme that maximizes the accuracy of the 18-bit accumulation. This parallel 16-bit MAC instruction on a 64-bit datapath is shown to be used efficiently in DSP applications such as convolution. Using the parallel MAC instruction with the symmetric rounding scheme, a two-dimensional inverse discrete cosine transform (2D-IDCT) that satisfies IEEE 1180 can be implemented in 202 cycles.
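Why symmetric rounding matters can be shown with a small Python sketch. It assumes that "symmetric rounding" means round-to-nearest with ties away from zero. That interpretation is an assumption, not a definition taken from the paper. The sketch compares this scheme with conventional two's-complement rounding, which always sends ties toward +∞. It also uses Python's unbounded integers rather than modeling the 18-bit accumulator. All function names are hypothetical.

```python
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def shift_round_symmetric(acc: int, shift: int) -> int:
    """Right shift with round-to-nearest, ties away from zero (odd-symmetric)."""
    if shift <= 0:
        return acc
    half = 1 << (shift - 1)
    if acc >= 0:
        return (acc + half) >> shift
    return -((-acc + half) >> shift)


def shift_round_half_up(acc: int, shift: int) -> int:
    """Conventional two's-complement rounding: ties always go toward +infinity."""
    if shift <= 0:
        return acc
    return (acc + (1 << (shift - 1))) >> shift


def saturate16(v: int) -> int:
    """Clamp to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, v))


def q15_dot(a: list[int], b: list[int]) -> int:
    """Q15 dot product: accumulate full-precision Q30 products, round once."""
    acc = sum(x * y for x, y in zip(a, b))
    return saturate16(shift_round_symmetric(acc, 15))
```

With `shift=1`, an accumulator value of -3 (representing -1.5) rounds to -2 under symmetric rounding but to -1 under conventional rounding. Summed over the odd values from -9 to 9, the symmetric scheme gives 0, while the conventional one drifts to +5. That accumulated bias is what a symmetric scheme avoids in long filter and transform computations.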


VLSI Signal Processing, IX | 1996

Real-time software MPEG-1 video decoder design for low-cost, low-power applications

Kouhei Nadehara; H.J. Stolberg; Masao Ikekawa; E. Murata; I. Kuroda

This paper presents a real-time MPEG-1 video decoder implemented in software on a DSP-enhanced, 160-mW, 100-MHz, 32-bit microprocessor. The processor's DSP-oriented instructions improve the performance of generic DSP operations such as the inverse discrete cosine transform, while fast software algorithms that operate in parallel on packed-pixel data are developed for processes unique to video decoding, such as motion compensation. Furthermore, load/store scheduling and cache-miss reduction are applied to reduce the clock count as well as the instruction count. In total, the processor achieves 30 frames/s MPEG-1 video decoding at a cost and power dissipation (160 mW) comparable to those of dedicated LSIs.
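Half-pel motion compensation averages neighboring pixels and rounds halves up, computing (a + b + 1) >> 1. A standard way to do this on packed pixels with ordinary 32-bit integer instructions is the "SIMD within a register" trick sketched below in Python. It is shown as a well-known illustration only. The abstract does not say the authors used this exact formula, and `pack4`, `unpack4` and `avg4_round_up` are hypothetical names.

```python
MASK_FE = 0xFEFEFEFE  # clears the low bit of every byte before shifting


def pack4(pixels: list[int]) -> int:
    """Pack four 8-bit pixels into one 32-bit word (pixel 0 in the low byte)."""
    word = 0
    for i, p in enumerate(pixels):
        word |= (p & 0xFF) << (8 * i)
    return word


def unpack4(word: int) -> list[int]:
    """Split a 32-bit word back into four 8-bit pixels."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def avg4_round_up(a: int, b: int) -> int:
    """Per-byte (a + b + 1) >> 1 on four packed pixels without carries between bytes.

    Since a + b = 2*(a & b) + (a ^ b), the rounded-up average per byte is
    (a | b) - ((a ^ b) >> 1). Masking with 0xFE before the shift keeps each
    byte's low bit from leaking into the neighboring byte, and because
    (a | b) >= (a ^ b) in every byte, the subtraction never borrows across lanes.
    """
    return (a | b) - (((a ^ b) & MASK_FE) >> 1)
```

One OR, one XOR, one AND, one shift and one subtraction thus average four pixel pairs at once. Unpacking and averaging each byte separately would take far more instructions.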


Signal Processing Systems | 1999

MPEG-2 AAC 5.1-channel decoder software for a low-power embedded RISC microprocessor

Yuichiro Takamizawa; Kouhei Nadehara; Max Boegli; Masao Ikekawa; Ichiro Kuroda

Presented here is MPEG-2 AAC low-complexity profile decoder software for a low-power embedded RISC microprocessor, the NEC V830 (300 mW at 133 MHz). Fast processing methods for the IMDCT reduce execution time by 41% and help achieve real-time decoding of a 5.1-channel audio signal while using only 64.7% of the processor's capacity.
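For reference, the inverse modified discrete cosine transform (IMDCT) that an AAC decoder must compute can be written directly from its defining formula. The Python sketch below is a plain O(N²) version. It assumes the AAC-style 2/N scaling and phase offset n0 = (N/2 + 1)/2. It does not reproduce the fast methods from the paper, and `imdct_direct` is a hypothetical name.

```python
import math


def imdct_direct(spec: list[float]) -> list[float]:
    """Direct O(N^2) IMDCT: N/2 spectral coefficients -> N time samples.

    x[n] = (2/N) * sum_k spec[k] * cos(2*pi/N * (n + n0) * (k + 1/2)),
    with n0 = (N/2 + 1) / 2.
    """
    n_out = 2 * len(spec)
    n0 = (n_out / 2 + 1) / 2
    return [
        (2.0 / n_out) * sum(
            c * math.cos(2 * math.pi / n_out * (n + n0) * (k + 0.5))
            for k, c in enumerate(spec)
        )
        for n in range(n_out)
    ]
```

For an AAC long block (N = 2048), the direct form costs 1024 × 2048 ≈ 2.1 million multiply-adds per channel per frame. This is why practical decoders use fast algorithms, typically FFT-based, instead. One property such algorithms commonly exploit is that the first half of the output is antisymmetric and the second half is symmetric, so only a quarter of the samples are independent.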


International Conference on Multimedia and Expo | 2001

Multimedia signal processor for mobile applications

Masao Ikekawa; Masatsugu Hori; Kouhei Nadehara; Takahiro Kumura; Makoto Yoshida; Ichiro Kuroda; Takao Nishitani

This paper describes an efficient architecture enhancement for video codecs on SPXK5, a new-generation, general-purpose digital signal processor (DSP) core developed for handheld devices. Thanks to the high-performance features of the SPXK5 base architecture, an MPEG-4 video codec can be implemented efficiently. In addition, just a few SIMD-type instructions accelerate the MPEG-4 video codec implementation by 20% at a cost of only 2.5% additional hardware. Reducing the cycle count also lowers the DSP's power consumption. Both video and speech codecs for 3G mobile service at 384 kbps can be realized with a power consumption of less than 50 mW.


Archive | 2004

Implementations of AES algorithm for reducing hardware with improved efficiency

Kouhei Nadehara
