Kazushi Kawamura | Researchain

Publication

Featured research published by Kazushi Kawamura.

international conference on asic | 2015

Clock skew estimate modeling for FPGA high-level synthesis and its application

Koichi Fujiwara; Kazushi Kawamura; Masao Yanagisawa; Nozomu Togawa

Recently, high-level synthesis (HLS) techniques for FPGA designs have been required in various applications. The clock network in an FPGA is built before any circuit is implemented, so clock skews can have a large impact and degrade the operating frequency. In this paper, we formulate a clock skew estimate model for FPGA-HLS (CSEF). CSEF is an accurate model for estimating clock skews in the HLS flow. CSEF is then integrated into a floorplan-aware HLS algorithm targeting FPGA designs. Experimental results demonstrate that our HLS algorithm can realize FPGA designs which reduce the latency by up to 19% compared with conventional approaches.

international conference on electron devices and solid-state circuits | 2015

A floorplan-aware high-level synthesis technique with delay-variation tolerance

Kazushi Kawamura; Yuta Hagio; Youhua Shi; Nozomu Togawa

To realize a better trade-off between performance and yield in recent LSI designs, it is necessary to deal with the increasing ratio of interconnect delay as well as with delay variation. In this paper, a novel floorplan-aware high-level synthesis technique with delay-variation tolerance is proposed. By utilizing floorplan-driven architectures, interconnect delays can be estimated and then handled even in high-level synthesis. Our technique realizes two scheduling/binding results (a non-delayed result and a delayed result) simultaneously on a chip with small area/performance overhead, and either one of them can be selected according to the post-silicon delay variation. Experimental results demonstrate that our technique can reduce delayed scheduling/binding latency by up to 32.3% compared with conventional approaches.

system on chip conference | 2016

Rotator-based multiplexer network synthesis for field-data extractors

Koki Ito; Kazushi Kawamura; Yutaka Tamiya; Masao Yanagisawa; Nozomu Togawa

In stream data processing, it is often necessary to extract a particular data field from bulk data, and a field-data extractor can be used for this purpose. In particular, an (M,N)-field-data extractor reads out any consecutive N bytes from an M-byte register by connecting its input and output using multiplexers (MUXs). However, the number of required MUXs grows rapidly as the input/output byte lengths increase. It is known that partitioning a MUX network reduces the number of MUXs. In this paper, we first consider a multi-layered MUX network, which is generated by repeatedly partitioning a MUX network into a collection of single-layered MUX networks. We show that the multi-layered MUX network is equivalent to a barrel shifter from which redundant MUXs and wires are removed, and we prove that the number of its required MUXs is the smallest among MUX-network-partitioning based field-data extractors. Next, we propose a rotator-based MUX network for a field-data extractor, which reads particular data from an input register into a rotator. The size of the rotator is the same as that of the output register, and hence no extra wires or MUXs are required. By rotating the input data appropriately, we finally obtain correctly ordered data in the output register. Experimental results show that our rotator-based MUX network reduces the number of gates required to implement a field-data extractor by up to 33% compared with the one using a multi-layered MUX network.
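The behaviour described in this abstract can be illustrated with a small software model. The sketch below is not the authors' circuit: it assumes one plausible reading in which input byte i is loaded into rotator slot i mod N, after which a single rotation restores the byte order. The function names `rotate_left` and `extract_field_with_rotator` are hypothetical and chosen only for illustration.

```python
def rotate_left(data: bytes, k: int) -> bytes:
    """Rotate a byte string left by k positions."""
    if not data:
        return data
    k %= len(data)
    return data[k:] + data[:k]


def extract_field_with_rotator(register: bytes, offset: int, n: int) -> bytes:
    """Software model of an (M, N)-field-data extractor (M = len(register)).

    Assumption (not taken from the paper): input byte i is loaded into
    rotator slot i % n, so the rotator has exactly n slots, the same size
    as the output register. One left rotation then puts the bytes in order.
    """
    m = len(register)
    if n < 1 or offset < 0 or offset + n > m:
        raise ValueError("field must lie inside the register")

    # Load step: any n consecutive indices hit each slot exactly once.
    slots = bytearray(n)
    for i in range(offset, offset + n):
        slots[i % n] = register[i]

    # The first field byte sits in slot offset % n; rotate it to slot 0.
    return rotate_left(bytes(slots), offset % n)


if __name__ == "__main__":
    reg = bytes(range(8))  # an 8-byte register holding 0..7
    print(list(extract_field_with_rotator(reg, 3, 4)))
```

In this model the result always equals the plain slice `register[offset:offset + n]`; the point of the rotator form is that the intermediate buffer never needs more than N slots.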

international symposium on vlsi design, automation and test | 2016

A high-level synthesis algorithm for FPGA designs optimizing critical path with interconnection-delay and clock-skew consideration

Koichi Fujiwara; Kazushi Kawamura; Masao Yanagisawa; Nozomu Togawa

High-level synthesis for FPGA designs (FPGA-HLS) has recently been required in various applications. Since wire delays are becoming a design bottleneck in FPGAs, we need to handle interconnection delays and clock skews in the FPGA-HLS flow. In this paper, we propose an FPGA-HLS algorithm optimizing the critical path with interconnection-delay and clock-skew consideration. By utilizing the HDR architecture, we floorplan circuit modules in the HLS flow and, based on the result, estimate interconnection delays and clock skews. To reduce the critical-path delay(s) of a circuit, we propose two novel methods for FPGA-HLS. Experimental results demonstrate that our algorithm can improve circuit performance by up to 24% compared with conventional approaches.

international symposium on quality electronic design | 2018

A loop structure optimization targeting high-level synthesis of fast number theoretic transform

Kazushi Kawamura; Masao Yanagisawa; Nozomu Togawa

Multiplication of numbers with a large number of digits is heavily used when processing data encrypted by fully homomorphic encryption, and it is a bottleneck in computation time. An algorithm utilizing the fast number theoretic transform (FNTT) is known as a high-speed multiplication algorithm, and a further speedup is expected from implementing the FNTT process on an FPGA. A high-level synthesis tool enables efficient hardware implementation even for FNTT with a large number of points. In this paper, we propose a methodology for optimizing the loop structure included in a software description of FNTT so that the performance of the synthesized FNTT processor can be maximized. The loop structure optimization is considered in terms of loop flattening and trip count reduction. We implement a 65,536-point FNTT processor with the loop structure optimization on an FPGA, and demonstrate that it runs 6.9 times faster than execution on a CPU.
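As a rough illustration of what loop flattening means for a number theoretic transform, the sketch below compares a conventional triple-nested radix-2 transform with a version whose stage, block, and butterfly loops are merged into one loop. This is a generic textbook formulation, not the paper's software description: the modulus 998244353, the primitive root 3, and all function names are assumptions made for the example, and the trip count reduction mentioned in the abstract is not modelled.

```python
MOD = 998244353  # assumed NTT-friendly prime: 119 * 2**23 + 1
G = 3            # a primitive root modulo MOD


def bit_reverse_permute(a: list) -> None:
    """Reorder a in place into bit-reversed index order."""
    n = len(a)
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]


def ntt_nested(values: list) -> list:
    """Radix-2 NTT with the usual three nested loops (len is a power of 2)."""
    a = list(values)
    n = len(a)
    bit_reverse_permute(a)
    length = 2
    while length <= n:                      # loop 1: stages
        w_len = pow(G, (MOD - 1) // length, MOD)
        for start in range(0, n, length):   # loop 2: blocks
            w = 1
            for j in range(length // 2):    # loop 3: butterflies
                u = a[start + j]
                v = a[start + j + length // 2] * w % MOD
                a[start + j] = (u + v) % MOD
                a[start + j + length // 2] = (u - v) % MOD
                w = w * w_len % MOD
        length *= 2
    return a


def ntt_flattened(values: list) -> list:
    """Same transform with all three loops merged into a single loop."""
    a = list(values)
    n = len(a)
    bit_reverse_permute(a)
    stages = n.bit_length() - 1
    half_n = n // 2
    for t in range(stages * half_n):        # one flat loop over all butterflies
        s, k = divmod(t, half_n)            # stage index, butterfly index
        half = 1 << s
        block, j = divmod(k, half)          # recover block and offset
        lo = block * 2 * half + j
        hi = lo + half
        w = pow(G, (MOD - 1) // (2 * half) * j, MOD)
        u = a[lo]
        v = a[hi] * w % MOD
        a[lo] = (u + v) % MOD
        a[hi] = (u - v) % MOD
    return a
```

Both functions perform the same (n/2)·log2(n) butterflies. The flattened form exposes a single loop with a fixed trip count, which is the kind of structure an HLS tool can typically pipeline more easily; the twiddle factor is recomputed with `pow` here only to keep the sketch short.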

international soc design conference | 2017

A selector-based FFT processor and its FPGA implementation

Yuya Hirai; Kazushi Kawamura; Masao Yanagisawa; Nozomu Togawa

The fast Fourier transform (FFT) is used in various applications such as signal processing, and a high-speed FFT processor is in strong demand. In this paper, we propose a high-speed FFT processor based on selector logic. The selector-based FFT processor is constructed by focusing on the subtract-multiplication operations and partly applying selector logic to them. Furthermore, we implement the selector-based FFT processor on a Xilinx FPGA. Experimental results show that our proposed FFT processor can improve the processing speed by up to 21% and also reduce the number of LUTs by up to 33% compared with a naive FFT processor.

international soc design conference | 2016

A high-performance circuit design algorithm using data dependent approximation

Kazushi Kawamura; Masao Yanagisawa; Nozomu Togawa

This paper proposes a high-performance circuit design algorithm using input-data-dependent approximation. In our algorithm, STEPCs (Suspicious Timing Error Prediction Circuits) are utilized to efficiently identify the paths to be optimized inside a circuit. Experimental results targeting a set of basic adders show that our algorithm can achieve a performance increase of up to 11.1% within an error rate of 2.1% compared to a conventional design technique.

asia pacific conference on circuits and systems | 2014

A floorplan-aware high-level synthesis algorithm for multiplexer reduction targeting FPGA designs

Koichi Fujiwara; Shinya Abe; Kazushi Kawamura; Masao Yanagisawa; Nozomu Togawa

Recently, high-level synthesis (HLS) techniques for FPGA designs have been required in various applications such as computerized stock trading and reconfigurable network processing. In HLS for FPGA designs, we need to consider the module floorplan and reduce multiplexer cost concurrently. In this paper, we propose a floorplan-aware HLS algorithm for multiplexer reduction targeting FPGA designs. By utilizing distributed-register architectures called HDR, we can easily consider the module floorplan in HLS. In order to reduce multiplexer cost, we propose two novel binding methods called datapath-oriented scheduling/FU binding and datapath-oriented register binding. Experimental results demonstrate that our algorithm can realize FPGA designs which reduce the number of slices by up to 47% and circuit delay by up to 16% compared with the conventional approach.

international symposium on circuits and systems | 2013

A partial redundant fault-secure high-level synthesis algorithm for RDR architectures

Kazushi Kawamura; Sho Tanaka; Masao Yanagisawa; Nozomu Togawa

In this paper, we propose a partial redundant fault-secure high-level synthesis algorithm for RDR architectures, where we duplicate a part of the original CDFG and maximize its reliability under a timing constraint. First, our algorithm allocates additional functional units to vacant spaces on RDR islands for recomputation and increases the number of duplicated operation nodes. Second, it minimizes the number of inserted comparator nodes by re-scheduling/re-binding the recomputation CDFG nodes. As a result, we obtain a scheduled/bound recomputation CDFG and a renewed functional unit allocation with high reliability. Experimental results demonstrate that our algorithm improves reliability by up to 52% compared with the conventional approach.

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences | 2015