Masahiro Sowa

University of Electro-Communications

Publication

Featured research published by Masahiro Sowa.

The Journal of Supercomputing | 2006

High-Level Modeling and FPGA Prototyping of Produced Order Parallel Queue Processor Core

Ben A. Abderazek; Tsutomu Yoshinaga; Masahiro Sowa

Emerging high-level hardware description and synthesis technologies, in conjunction with field programmable gate arrays (FPGAs), have significantly lowered the threshold for hardware development. Opportunities exist to integrate these technologies into a tool for exploring and evaluating microarchitectural designs, especially for newly proposed architectures. This paper presents the prototyping of a new processor core based on Queue architecture as a starting point for application-specific processor design exploration. Using a hardware description language, we have created a synthesizable model of a produced order parallel queue processor core for the integer subset of the parallel Queue architecture. A prototype implementation is produced by synthesizing the high-level model for the Stratix FPGA prototyping board. We show how to perform prototyping and optimizations to fully exploit the capabilities of the prototyped Queue processor core, while maintaining a common source base.

The Journal of Supercomputing | 2005

Parallel Queue Processor Architecture Based on Produced Order Computation Model

Masahiro Sowa; Ben A. Abderazek; Tsutomu Yoshinaga

This paper proposes a novel produced order parallel queue processor architecture. To store intermediate results, the proposed system uses first-in-first-out (FIFO) circular queue-registers instead of random access registers. Data are inserted in the queue-registers in produced order and can be reused. We show that this feature has profound implications in the areas of parallel execution, program compactness, hardware simplicity and high execution speed. Our performance evaluations show a significant performance improvement (e.g., a 10 to 26% decrease in program size and a 6 to 46% decrease in execution time over a range of benchmark programs) when compared with the earlier proposed architecture.
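The FIFO circular queue-registers mentioned in the abstract above can be pictured with a small software model. This is only a sketch under stated assumptions: the class name `CircularQueueRegisters`, its methods, and the overflow and underflow behaviour are hypothetical and are not taken from the paper, and the model does not attempt to capture the produced order reuse of data that the paper describes.

```python
class CircularQueueRegisters:
    """Hypothetical model of a fixed-size FIFO circular register file."""

    def __init__(self, size):
        self.words = [None] * size
        self.head = 0   # index of the next word to read
        self.tail = 0   # index of the next word to write
        self.count = 0  # number of words currently stored

    def enqueue(self, value):
        # Write at the tail, then advance the tail with wrap-around.
        if self.count == len(self.words):
            raise OverflowError("queue-registers full")
        self.words[self.tail] = value
        self.tail = (self.tail + 1) % len(self.words)
        self.count += 1

    def dequeue(self):
        # Read at the head, then advance the head with wrap-around.
        if self.count == 0:
            raise IndexError("queue-registers empty")
        value = self.words[self.head]
        self.head = (self.head + 1) % len(self.words)
        self.count -= 1
        return value
```

Because both indices wrap modulo the register count, the same storage is reused as results are consumed, which is the property that distinguishes a circular queue from a simple growing list.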

Journal of Parallel and Distributed Computing | 2008

The QC-2 parallel Queue processor architecture

Ben A. Abderazek; Arquimedes Canedo; Tsutomu Yoshinaga; Masahiro Sowa

Queue-based instruction set architecture processors offer an attractive option in the design of embedded systems. In our previous work, we proposed a novel queue processor architecture as a starting point for hardware/software design space exploration for embedded applications. In this paper, we present a high performance 32-bit synthesizable QueueCore (QC-2), an improved and optimized version of the produced order parallel Queue processor (PQP), with single precision floating-point support. The QC-2 core also implements a novel technique used to extend immediate values and memory instruction offsets that were otherwise not representable because of bit-width constraints in the PQP processor. A prototype implementation is produced by synthesizing the high-level model for a target FPGA device. We present the architecture description and design results in a fair amount of detail.

International Conference on Convergence Information Technology | 2007

Novel Addressing Method for Aggregate Types in Queue Processors

Teruhisa Yuki; Arquimedes Canedo; Ben A. Abderazek; Masahiro Sowa

Queue processors use a first-in first-out data structure to perform operations. Instructions implicitly reference their operands, simplifying the design of the instruction set and reducing hardware complexity. Some memory accesses require a computed address. A register-indirect addressing method introduces severe limitations in a queue processor by inserting false dependencies that limit the high parallelism capacity of such architectures. In this paper we propose a novel addressing method for queue processors that employs the queue for address calculation and memory access. We demonstrate that the proposed method reduces the number of instructions by 6% and increases parallelism by 4% for a set of embedded applications.

Pacific Rim Conference on Communications, Computers and Signal Processing | 1999

Design of a superscalar processor based on queue machine computation model

Shusuke Okamoto; Hitoshi Suzuki; Atusi Maeda; Masahiro Sowa

The queue machine computation model is an evaluation scheme for expression trees, in which the input operands of each operation are taken from the head of a queue and its result is put onto the tail of the same queue. A series of operations for this model is generated by traversing the expression tree(s) from the leaf nodes in reverse of the breadth-first ordering. Since nodes at the same level of an expression tree can be processed concurrently, the generated operations can also be processed in parallel without reordering. In this paper, we describe the design of a superscalar processor using this computation model.
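The evaluation scheme in the abstract above can be illustrated with a short sketch. It assumes one common reading of the traversal: levels are visited from the deepest up to the root, and nodes within a level are visited left to right. The tuple-based tree encoding and the names `levels`, `queue_program`, and `run` are hypothetical choices made for this illustration, not the authors' implementation.

```python
from collections import deque
import operator

# Binary operators supported by this toy queue machine (an assumption).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def levels(tree):
    """Group nodes by depth; a node is a number or a tuple (op, left, right)."""
    result = []
    frontier = [tree]
    while frontier:
        result.append(frontier)
        nxt = []
        for node in frontier:
            if isinstance(node, tuple):
                nxt.extend(node[1:])
        frontier = nxt
    return result

def queue_program(tree):
    """Emit instructions level by level, deepest level first."""
    program = []
    for level in reversed(levels(tree)):
        for node in level:
            if isinstance(node, tuple):
                program.append(("op", node[0]))
            else:
                program.append(("load", node))
    return program

def run(program):
    """Operands leave at the head of the queue; results enter at the tail."""
    q = deque()
    for kind, arg in program:
        if kind == "load":
            q.append(arg)
        else:
            left = q.popleft()
            right = q.popleft()
            q.append(OPS[arg](left, right))
    return q.popleft()

tree = ("*", ("+", 1, 2), ("-", 7, 3))  # (1 + 2) * (7 - 3)
print(run(queue_program(tree)))  # 12
```

In the generated program, the `+` and `-` instructions sit next to each other and neither reads the other's result, which reflects the abstract's point that operations at the same tree level can be issued in parallel without reordering.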

Computer Languages, Systems & Structures | 2008

A new code generation algorithm for 2-offset producer order queue computation model

Arquimedes Canedo; Ben A. Abderazek; Masahiro Sowa

Queue computing is an attractive alternative for meeting the pressing demand for high-performance architectures. Code generation for queue machines poses some problems, but the solutions have not been studied thoroughly. A new parallel queue computation model, the 2-offset P-Code queue computation model, is presented together with a new code generation algorithm. The code generation algorithm takes leveled DAGs as input and produces 2-offset P-Code assembly. We also developed a queue compiler to evaluate the new algorithm and compiled a set of C language benchmark programs for the 2-offset P-Code. Depending on the program, the queue compiler generates between 8.55% fewer and 10.55% more instructions than an actual MIPS32 compiler.

Symposium on Computer Architecture and High Performance Computing | 2007

Queue Register File Optimization Algorithm for QueueCore Processor

Arquimedes Canedo; Ben A. Abderazek; Masahiro Sowa

The queue computation model offers an attractive alternative for high-performance embedded computing given its characteristics of short instructions and high instruction level parallelism. A queue-based processor uses a FIFO queue to read and write operands through hardware pointers located at the head and tail of the queue. Queue length is the number of elements stored between the head and the tail pointers during computations. We have found that 95% of the statements in integer applications require a queue length of less than 32 words. The remaining 5% require larger queue lengths, up to 230 queue words. In this paper we propose a compiler technique to optimize the queue utilization for the queue-hungry statements that require a large amount of queue. We show that for SPEC CINT95 benchmarks, our technique optimizes the queue length without decreasing parallelism. However, our optimization has the penalty of a slight increase in code size.
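The notion of queue length used in this abstract can be made concrete with a small sketch that tracks how many words sit between the head and tail pointers as a statement executes. The function name `max_queue_length` and the encoding of each instruction as a `(consumed, produced)` pair are assumptions made for illustration; they are not the compiler technique proposed in the paper.

```python
def max_queue_length(instructions):
    """Return the peak number of queued words for one statement.

    instructions: iterable of (consumed, produced) pairs, where each
    instruction removes `consumed` words at the head and appends
    `produced` words at the tail (a hypothetical encoding).
    """
    length = 0
    peak = 0
    for consumed, produced in instructions:
        if consumed > length:
            raise ValueError("instruction reads more operands than are queued")
        length += produced - consumed
        peak = max(peak, length)
    return peak
```

For example, four loads followed by three binary operations, `[(0, 1)] * 4 + [(2, 1)] * 3`, peak at four queued words. A statement whose peak exceeds the available queue-registers is the kind of "hungry" statement the abstract refers to.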

Embedded and Ubiquitous Computing | 2005

Modular design structure and high-level prototyping for novel embedded processor core

Ben A. Abderazek; Sotaro Kawata; Tsutomu Yoshinaga; Masahiro Sowa

In this research work, we present the high-level prototyping of a new processor core based on Queue architecture as a starting point for application-specific processor design exploration. Using a modular design structure with control logic implemented as a set of communicating state machines, we show hardware emulation and optimization results of a parallel queue processor architecture (QueueCore). We also show how to fully exploit the capabilities of the designed QueueCore, while maintaining a common source base. From the evaluation results, we show that the QueueCore prototype fits on a single conventional FPGA device, thereby obviating the need to perform multi-chip partitioning, which results in a loss of resource efficiency.

International Conference on Parallel Processing | 2007

Mathematical Model for Multiobjective Synthesis of NoC Architectures

Ben A. Abderazek; Mushfiquzzaman Akanda; Tsutomu Yoshinaga; Masahiro Sowa

Network-on-Chip (NoC) interconnections have been proposed to overcome the problems associated with long wires used in chip-wide communications. They support asynchronous communication between cores within multicore systems-on-chips (MCSoCs). The design of such architectures is crucial for achieving high-performance and energy-efficient systems. However, the effectiveness of an NoC-based design depends on the adopted design methodology. An automatic design approach is highly desirable to increase system design productivity. This paper presents a new mathematical formulation for synthesizing application-specific NoC architectures, such that the performance constraints are satisfied and the communication power consumption is minimized.

Embedded and Ubiquitous Computing | 2005

An efficient dynamic switching mechanism (DSM) for hybrid processor architecture

Akanda Md. Musfiquzzaman; Ben A. Abderazek; Sotaro Kawata; Masahiro Sowa

Increasing processor resource usability and boosting processor compatibility and the capability to support multiple execution models in a single core are highly needed nowadays to benefit from recent developments in electronics technology. This work introduces the concept of a dynamic switching mechanism (DSM), which supports multi-instruction set execution models in a single and simple processor core. This is achieved dynamically by an execution-mode switching scheme and a sources-results location computing unit for a novel queue execution model and a well-known stack-based execution model. The queue execution model is based on queue computation that uses queue-registers, a circular queue data structure, for operand and result manipulation and assigns queue words according to a single assignment rule. We present the DSM mechanism and describe its hardware complexity and preliminary evaluation results. We also describe the DSM target architecture.
