Kenichi Kuroda | Researchain

Publication

Featured researches published by Kenichi Kuroda.

broadband and wireless computing, communication and applications | 2010

Architecture and Design of Efficient 3D Network-on-Chip (3D NoC) for Custom Multicore SoC

Akram Ben Ahmed; Abderazek Ben Abdallah; Kenichi Kuroda

Over the last decade, Networks-on-Chip (NoCs) have been proposed as a promising solution for future system-on-chip designs. A NoC offers more scalability than shared-bus interconnection and allows more processors to operate concurrently. Because a NoC has dedicated wires, its performance can be predicted. In this context, we previously proposed a 2D NoC named OASIS, a 4x4 mesh topology design using wormhole switching and a Stall-and-Go flow control scheme. Although OASIS-NoC has advantages over shared-bus based systems, it also has limitations such as high power consumption, high communication cost, and low throughput. To overcome these limitations, we propose a 3D NoC (3D OASIS-NoC), an extension of our 2D OASIS-NoC. In this paper, we describe the 3D OASIS-NoC architecture in a fair amount of detail and present preliminary evaluation results.

broadband and wireless computing, communication and applications | 2010

Advanced Design Issues for OASIS Network-on-Chip Architecture

Kenichi Mori; Adam Esch; Abderazek Ben Abdallah; Kenichi Kuroda

Network-on-Chip (NoC) architectures provide a good way of realizing efficient interconnections and largely alleviate the limitations of bus-based solutions. NoC has emerged as a solution to problems exhibited by the shared-bus communication approach in System-on-Chip (SoC) implementations, including lack of scalability, clock skew, lack of support for concurrent communication, and power consumption. The communication requirements of this paradigm are affected by architecture parameters such as topology, routing, and buffer size. In this paper, we propose advanced optimization techniques for OASIS NoC, a NoC we previously designed. We describe the architecture and the novel optimization techniques in detail. Hardware complexity and preliminary performance results are also given.

Journal of Computers | 2008

Dynamic Module Library for System Level Modeling and Simulation of Dynamically Reconfigurable Systems

Kenji Asano; Junji Kitamichi; Kenichi Kuroda

In this paper, we propose a library for system-level modeling and simulation of systems that include Dynamically Reconfigurable Architectures (DRAs). The proposed library is an extended SystemC library. Using it, the designer can model system specifications at the system design level, including the dynamic generation and elimination of modules and the dynamic connection and dispatch between ports and channels, which are needed in the design of general-purpose dynamically reconfigurable systems. In addition, we evaluate the proposed library by modeling and simulating sample circuits, such as a partial DRA and a multi-context DRA. Using the proposed library, we can model system specifications naturally, with about the same amount of description as a conventional model of a multi-context DRA built from multiplexers and demultiplexers. Under some conditions, higher-speed simulation is possible using the proposed library.

embedded and ubiquitous computing | 2008

Advanced Optimization and Design Issues of a 32-Bit Embedded Processor Based on Produced Order Queue Computation Model

Hiroki Hoshino; Ben A. Abderazek; Kenichi Kuroda

Queue-computing programs are generated using a so-called level-order traversal that exposes all available parallelism in the programs. All instructions within the same level are data independent of each other and can safely be executed in parallel. The compiler leverages this property to generate queue programs with large groups of independent instructions. Thus, the hardware invests little effort in finding parallelism. In this paper, we present various optimization and design issues of a synthesizable queue processor architecture targeted at embedded applications. A prototype implementation is produced by synthesizing the high-level model for a target FPGA device.
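
The level-order idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' compiler: the `Node` class, the `queue_program` and `run` helpers, and the restriction to a perfectly balanced binary expression tree (all leaves at the same depth) are assumptions made for illustration. Instructions are emitted level by level, deepest level first, and a FIFO queue supplies operands and receives results.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional
import operator


@dataclass
class Node:
    op: str                          # operator symbol or variable name (hypothetical encoding)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def levels(root):
    """Group the nodes of the tree by depth, root level first."""
    result, frontier = [], [root]
    while frontier:
        result.append(frontier)
        frontier = [c for n in frontier for c in (n.left, n.right) if c is not None]
    return result


def queue_program(root):
    """Emit one instruction per node: deepest level first, left to right."""
    return [n.op for level in reversed(levels(root)) for n in level]


def run(program, env):
    """Execute on a FIFO queue: operands leave at the head, results join at the tail."""
    q = deque()
    for ins in program:
        if ins in OPS:
            a = q.popleft()
            b = q.popleft()
            q.append(OPS[ins](a, b))
        else:
            q.append(env[ins])       # a variable name acts as a load instruction
    return q.popleft()


# (a + b) * (c - d): the two level-1 instructions "+" and "-" are independent
tree = Node("*", Node("+", Node("a"), Node("b")), Node("-", Node("c"), Node("d")))
program = queue_program(tree)
result = run(program, {"a": 1, "b": 2, "c": 7, "d": 3})
```

In this sketch, instructions that come from the same tree level never consume each other's results, which is the property the abstract says the compiler groups on. Unbalanced trees would need extra handling that is not shown here.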

The Journal of Supercomputing | 2011

Natural instruction level parallelism-aware compiler for high-performance QueueCore processor architecture

Ben A. Abderazek; Masashi Masuda; Arquimedes Canedo; Kenichi Kuroda

This work presents a static method implemented in a compiler for extracting high instruction level parallelism for the 32-bit QueueCore, a queue computation-based processor. The instructions of a queue processor implicitly read and write their operands, making instructions short and the programs free of false dependencies. This characteristic allows the exploitation of maximum parallelism and improves code density. Compiling for the QueueCore requires a new approach since the concept of registers disappears. We propose a new efficient code generation algorithm for the QueueCore. For a set of numerical benchmark programs, our compiler extracts more parallelism than the optimizing compiler for a RISC machine by a factor of 1.38. Through the use of QueueCore’s reduced instruction set, we are able to generate 20% and 26% denser code than two embedded RISC processors.

international conference on parallel processing | 2010

An Efficient Algorithm and Embedded Multicore Implementation of ECG Analysis in Multi-lead Electrocardiogram Records

Abderazek Ben Abdallah; Yasuyoshi Haga; Kenichi Kuroda

Electrocardiography (ECG) is an interpretation of the electrical activity of the heart over time captured and externally recorded by electrodes. An effective approach to speed up this and other biomedical operations is to integrate a very high number of processing elements in a single chip so that the massive scale of fine-grain parallelism inherent in several biomedical applications can be exploited efficiently. In this paper, we exploit parallel processing techniques to process electrocardiography computation kernels in parallel. We present an efficient ECG analysis algorithm based on Period-Peak Detection (PPD) approach. The system is implemented in a multicore System-on-Chip. System architecture and evaluation results are given in detail.

embedded and ubiquitous computing | 2008

Single Instruction Dual-Execution Model Processor Architecture

Taichi Maekawa; Ben A. Abderazek; Kenichi Kuroda

In this paper, we present the architecture and preliminary evaluation results of a novel dual-mode processor that supports queue and stack computation models in a single core. The core is highly adaptable in both functionality and configuration. It is based on a reduced-bit produced-order queue computation instruction set architecture and operates in either the Queue or the Stack execution model. This is achieved via a so-called dynamic switching mechanism implemented in hardware. The current design focuses on the ability to execute Queue programs while also supporting Stack-based programs without a considerable increase in hardware over the base architecture. We present the architecture description and design results in a fair amount of detail.

frontier of computer science and technology | 2006

A Hardware Resource Management System for Adaptive Computing on Dynamically Reconfigurable Devices

Toshiyuki Ito; Kazuya Mishou; Yuichi Okuyama; Kenichi Kuroda

This paper proposes a method for realizing a computer system with dynamic hardware-resource allocation on dynamically reconfigurable devices. The system consists of two or more parts, each of which can change its number of processing units according to its processing load. In such a system, these parts compete for resources. To solve this problem, we investigate the required functions of resource management units using a simple processing model. This model is an adaptive load-balancing model consisting of an upper management unit, two management units, and processing units shared by them.

international parallel and distributed processing symposium | 2005

A master-slave adaptive load-distribution processor model on PCA

Toshiyuki Ito; Junji Kitamichi; Kenichi Kuroda; Yuichi Okuyama

In this paper, we propose a new load-distribution processor model that adapts hardware resources optimally and autonomously to target applications on dynamically reconfigurable devices. During load distribution, the processor detects its own task-processing load and changes the kinds and number of resources optimally. We adopt the master-slave model, which consists of a management unit (master) and two or more processing units (slaves). The former detects overload and distributes tasks, and the latter execute the tasks. One feature of this model is that the number of processing units can be changed without reconfiguring the management unit's structure. Moreover, to use this load-distribution system efficiently, we propose a reordering unit that buffers data from processing units and outputs rearranged data. In this paper, we describe the requirements and organization of the management unit and processing units. Next, we implement the proposed model on real PCA chips (PCA being a dynamically reconfigurable device) and measure the overheads of processing and reconfiguration. Finally, we evaluate the proposed model based on the experimental results. The experiments show that our proposed model can reduce a designer's effort to estimate in advance the amount of hardware resources that an application requires.
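
The role of the reordering unit mentioned in the abstract above can be sketched in a few lines of Python. This is an illustration only, not the hardware design from the paper: the `ReorderingUnit` class, its `push` method, and the use of sequence numbers assigned by the master are assumptions. The unit buffers results that arrive out of order from the processing units and releases them in the original task order.

```python
class ReorderingUnit:
    """Buffers (sequence number, data) pairs and releases data in sequence order."""

    def __init__(self):
        self.pending = {}    # results waiting for earlier sequence numbers
        self.next_seq = 0    # next sequence number allowed to leave the unit

    def push(self, seq, data):
        """Accept one result; return every result that can now be output in order."""
        self.pending[seq] = data
        released = []
        while self.next_seq in self.pending:
            released.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return released


# Results from two hypothetical slaves arrive in the order 1, 0, 3, 2
unit = ReorderingUnit()
outputs = [unit.push(seq, data) for seq, data in [(1, "b"), (0, "a"), (3, "d"), (2, "c")]]
ordered = [item for batch in outputs for item in batch]
```

Because ordering is restored after the processing units, this sketch does not depend on how many slaves produced the results, which fits the abstract's point that the number of processing units can change.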

international conference on vlsi design | 2008

A Modeling of a Dynamically Reconfigurable Processor Using SystemC

Junji Kitamichi; Koji Ueda; Kenichi Kuroda

Recently, dynamically reconfigurable processors (DRPs) have been proposed. In this paper, we describe a model of a DRP using a dynamic module library (DML), which we have developed for the modeling of general-purpose dynamically reconfigurable systems. The DML is an extended SystemC library that enables the modeling of the dynamic generation and elimination of modules, ports, and channels, and of the dynamic connection and dispatch between ports and channels. Using the DML, we can model the DRP naturally. The architecture of the proposed DRP is based on a MIPS-type architecture and supports instructions for the dynamically reconfigurable operational units and for their generation and elimination. We describe the proposed DRP model and its evaluation results.
