Kenichi Kuroda
University of Aizu
Publication
Featured research published by Kenichi Kuroda.
broadband and wireless computing, communication and applications | 2010
Akram Ben Ahmed; Abderazek Ben Abdallah; Kenichi Kuroda
Over the last decade, Networks-on-Chip (NoCs) have been proposed as a promising solution for future system-on-chip designs. A NoC scales better than a shared-bus interconnect and allows more processors to operate concurrently. Because a NoC has dedicated wires, its performance can be predicted. In this context, we previously proposed a 2D NoC named OASIS, a 4x4 mesh topology design using wormhole switching and a Stall-and-Go flow control scheme. Although OASIS-NoC has advantages over shared-bus based systems, it also has limitations such as high power consumption, high communication cost, and low throughput. To overcome these limitations, we propose a 3D NoC (3D OASIS-NoC) that extends our 2D OASIS-NoC. In this paper we describe the 3D OASIS-NoC architecture in a fair amount of detail and present preliminary evaluation results.
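As a rough illustration of how a packet might cross a 4x4 mesh like the one mentioned above, here is a minimal sketch of dimension-order (XY) routing. The abstract does not name the routing algorithm used by OASIS, so XY routing is an assumption chosen only because it is a common choice for mesh NoCs; the function name `xy_route` and its parameters are hypothetical.

```python
def xy_route(src, dst, width=4, height=4):
    """Return the list of (x, y) routers visited from src to dst.

    Assumption: dimension-order routing, X first and then Y, on a
    width x height mesh. This is illustrative, not the OASIS design.
    """
    for x, y in (src, dst):
        if not (0 <= x < width and 0 <= y < height):
            raise ValueError(f"node ({x}, {y}) is outside the mesh")

    x, y = src
    path = [(x, y)]
    # Travel along the X dimension until the column matches.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Then travel along the Y dimension until the row matches.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))
```

Because every packet finishes its X movement before it turns into Y, the route between any two routers is deterministic, which is one reason this scheme is popular in simple mesh designs.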
broadband and wireless computing, communication and applications | 2010
Kenichi Mori; Adam Esch; Abderazek Ben Abdallah; Kenichi Kuroda
Network-on-Chip (NoC) architectures provide a good way of realizing efficient interconnections and largely alleviate the limitations of bus-based solutions. NoC has emerged as a solution to problems exhibited by the shared-bus communication approach in System-on-Chip (SoC) implementations, including lack of scalability, clock skew, lack of support for concurrent communication, and power consumption. The communication requirements of this paradigm are affected by architecture parameters such as topology, routing, and buffer size. In this paper, we propose advanced optimization techniques for OASIS NoC, a NoC we previously designed. We describe the architecture and the novel optimization techniques in detail. Hardware complexity and preliminary performance results are also given.
Journal of Computers | 2008
Kenji Asano; Junji Kitamichi; Kenichi Kuroda
In this paper, we propose a library for system-level modeling and simulation of systems that include Dynamically Reconfigurable Architectures (DRAs). The proposed library is an extended SystemC library. Using it, the designer can model system specifications that include the dynamic generation and elimination of modules, together with ports and channels that support dynamic connection and dispatch between them, all of which are needed when designing general-purpose dynamically reconfigurable systems at the system level. In addition, we evaluate the proposed library by modeling and simulating sample circuits, such as a partially reconfigurable DRA and a multi-context DRA. Using the proposed library, we can model the system specifications naturally, with about the same amount of description as a model built from multiplexers and demultiplexers, which is one modeling style for describing a multi-context DRA. Under some conditions, higher-speed simulation is possible using the proposed library.
embedded and ubiquitous computing | 2008
Hiroki Hoshino; Ben A. Abderazek; Kenichi Kuroda
Queue computing based programs are generated using a so-called level-order traversal that exposes all available parallelism in the programs. All instructions within the same level are data independent of each other and can safely be executed in parallel. The compiler leverages this property to generate queue programs with large groups of independent instructions. Thus, the hardware spends little effort finding parallelism. In this paper, we present various optimization and design issues of a synthesizable queue processor architecture targeted at embedded applications. A prototype implementation is produced by synthesizing the high-level model for a target FPGA device.
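The level-order idea described above can be sketched for a simple expression tree. This is a minimal illustration, not the compiler from the paper: it assumes binary expression trees written as nested tuples `(op, left, right)` with variable names as leaves, and the names `level_order_groups` and `run_queue_program` are hypothetical. Instructions are emitted from the deepest level up to the root, and each group holds instructions that do not depend on one another.

```python
from collections import deque
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def level_order_groups(tree):
    """Group instructions by tree level, deepest level first."""
    levels = []
    frontier = [tree]
    while frontier:
        levels.append(frontier)
        # Children of all operator nodes, kept in left-to-right order.
        frontier = [child
                    for node in frontier if isinstance(node, tuple)
                    for child in node[1:]]
    groups = []
    for level in reversed(levels):
        group = []
        for node in level:
            if isinstance(node, tuple):
                group.append(("op", node[0]))
            else:
                group.append(("load", node))
        groups.append(group)
    return groups


def run_queue_program(groups, env):
    """Execute the groups on a FIFO operand queue and return the result."""
    queue = deque()
    for group in groups:
        for kind, arg in group:
            if kind == "load":
                queue.append(env[arg])       # write at the tail
            else:
                left = queue.popleft()       # read operands from the head
                right = queue.popleft()
                queue.append(OPS[arg](left, right))
    return queue.popleft()


if __name__ == "__main__":
    tree = ("*", ("+", "a", "b"), ("-", "c", "d"))
    groups = level_order_groups(tree)
    print(groups)
    print(run_queue_program(groups, {"a": 1, "b": 2, "c": 7, "d": 3}))
```

Note that no instruction names a register: operands are implicitly taken from the head of the queue and results are appended at the tail, which is why the order produced by the traversal matters.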
The Journal of Supercomputing | 2011
Ben A. Abderazek; Masashi Masuda; Arquimedes Canedo; Kenichi Kuroda
This work presents a static method implemented in a compiler for extracting high instruction-level parallelism for the 32-bit QueueCore, a queue computation-based processor. The instructions of a queue processor implicitly read and write their operands, making instructions short and the programs free of false dependencies. This characteristic allows the exploitation of maximum parallelism and improves code density. Compiling for the QueueCore requires a new approach since the concept of registers disappears. We propose a new efficient code generation algorithm for the QueueCore. For a set of numerical benchmark programs, our compiler extracts more parallelism than the optimizing compiler for a RISC machine by a factor of 1.38. Through the use of QueueCore’s reduced instruction set, we are able to generate 20% and 26% denser code than two embedded RISC processors.
international conference on parallel processing | 2010
Abderazek Ben Abdallah; Yasuyoshi Haga; Kenichi Kuroda
Electrocardiography (ECG) is an interpretation of the electrical activity of the heart over time, captured and externally recorded by electrodes. An effective approach to speeding up this and other biomedical operations is to integrate a very high number of processing elements in a single chip so that the massive fine-grain parallelism inherent in several biomedical applications can be exploited efficiently. In this paper, we exploit parallel processing techniques to process electrocardiography computation kernels in parallel. We present an efficient ECG analysis algorithm based on a Period-Peak Detection (PPD) approach. The system is implemented in a multicore System-on-Chip. System architecture and evaluation results are given in detail.
embedded and ubiquitous computing | 2008
Taichi Maekawa; Ben A. Abderazek; Kenichi Kuroda
In this paper we present the architecture and preliminary evaluation results of a novel dual-mode processor that supports the queue and stack computation models in a single core. The core is highly adaptable in both functionality and configuration. It is based on a reduced bit produced order queue computation instruction set architecture and operates in either the Queue or the Stack execution model. This is achieved via a so-called dynamic switching mechanism implemented in hardware. The current design focuses on the ability to execute Queue programs and also to support Stack-based programs without a considerable increase in hardware over the base architecture. We present the architecture description and design results in a fair amount of detail.
frontier of computer science and technology | 2006
Toshiyuki Ito; Kazuya Mishou; Yuichi Okuyama; Kenichi Kuroda
This paper proposes a method for realizing a computer system with dynamic hardware-resource allocation on dynamically reconfigurable devices. The system consists of two or more parts, each of which can change its number of processing units according to its processing load. In such a system, these parts compete with one another. To solve this problem, we investigate the functions required of resource management units using a simple processing model. This model is an adapted load balancing model consisting of an upper management unit, two management units, and processing units shared by them.
international parallel and distributed processing symposium | 2005
Toshiyuki Ito; Junji Kitamichi; Kenichi Kuroda; Yuichi Okuyama
In this paper, we propose a new load-distribution processor model that adapts hardware resources optimally and autonomously to target applications on dynamically reconfigurable devices. During load distribution, the processor detects the task-processing load by itself and changes the kinds and number of resources optimally. We adopt the master-slave model, which consists of a management unit (master) and two or more processing units (slaves). The former detects overload and distributes tasks, and the latter execute the tasks. One feature of this model is that the number of processing units can be changed without reconfiguring the management unit's structure. Moreover, in order to use this load-distribution system efficiently, we propose a reordering unit that buffers data from the processing units and outputs rearranged data. In this paper, we describe the requirements and organization of the management unit and processing units. Next, we implement the proposed model on real chips of PCA, a dynamically reconfigurable device, and measure the overheads of processing and reconfiguration. Finally, we evaluate the proposed model based on the experimental results. From the experiments, we show that our proposed model can reduce the effort a designer must spend estimating, in advance, the amount of hardware resources each application needs.
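The reordering unit mentioned above can be sketched in software as a small reorder buffer. This is only an illustration under stated assumptions: the abstract does not say how results are tagged, so the sketch assumes the master attaches a sequence number to each task and that results should leave in that order. The class name `ReorderBuffer` and its `push` method are hypothetical.

```python
class ReorderBuffer:
    """Buffer out-of-order results and release them in sequence order.

    Assumption: each result carries the sequence number that the
    master assigned when it distributed the task, starting at 0.
    """

    def __init__(self):
        self._pending = {}   # seq -> data waiting for earlier results
        self._next_seq = 0   # next sequence number allowed to leave

    def push(self, seq, data):
        """Store one result and return every result that is now in order."""
        self._pending[seq] = data
        released = []
        while self._next_seq in self._pending:
            released.append(self._pending.pop(self._next_seq))
            self._next_seq += 1
        return released


if __name__ == "__main__":
    rob = ReorderBuffer()
    # Slave 2 finishes task 1 before slave 1 finishes task 0.
    print(rob.push(1, "result-1"))
    print(rob.push(0, "result-0"))
    print(rob.push(2, "result-2"))
```

With a buffer like this, the number of processing units can vary without the consumer of the output stream having to know which unit produced which result.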
international conference on vlsi design | 2008
Junji Kitamichi; Koji Ueda; Kenichi Kuroda
Recently, dynamically reconfigurable processors (DRPs) have been proposed. In this paper, we describe a model of a DRP using a dynamic module library (DML), which we have developed for the modeling of general-purpose dynamically reconfigurable systems. The DML is an extended SystemC library that enables the modeling of the dynamic generation and elimination of modules, ports, and channels, and of the dynamic connection and dispatch between ports and channels. Using the DML, we can model the DRP naturally. The architecture of the proposed DRP is based on a MIPS-type architecture and supports instructions for the dynamically reconfigurable operational units and for their generation and elimination. We describe the proposed DRP model and its evaluation results.