Hiroaki Kunieda | Researchain

Publication

Featured research published by Hiroaki Kunieda.

Design Automation Conference | 2008

MAPS: an integrated framework for MPSoC application parallelization

Jianjiang Ceng; Jeronimo Castrillon; Weihua Sheng; Hanno Scharwächter; Rainer Leupers; Gerd Ascheid; Heinrich Meyr; Tsuyoshi Isshiki; Hiroaki Kunieda

In the past few years, MPSoC has become the most popular solution for embedded computing. However, programming MPSoCs has also become the biggest challenge that comes with this solution. This is especially true when designers must deal with legacy C code accumulated over the years, for which tool support is mostly unsatisfactory. In this paper, we propose an integrated framework, MAPS, which aims at parallelizing C applications for MPSoC platforms. It extracts coarse-grained parallelism at a novel granularity level. A set of tools has been developed for the framework, and we introduce its major components and their functionalities. Two case studies demonstrate the use of MAPS on two different kinds of applications; in both cases, the proposed framework helps the programmer extract parallelism efficiently.

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences | 2008

A Multiprocessor SoC Architecture with Efficient Communication Infrastructure and Advanced Compiler Support for Easy Application Development

Mohammad Zalfany Urfianto; Tsuyoshi Isshiki; Arif Ullah Khan; Dongju Li; Hiroaki Kunieda

This paper presents a Multiprocessor System-on-Chip (MPSoC) architecture used as the execution platform for the new C-language-based MPSoC design framework we are currently developing. The MPSoC architecture is based on an existing SoC platform with a commercial RISC core acting as the host CPU. We extend the existing SoC with a multiprocessor-array block that serves as the main engine for running parallel applications modeled in our design framework. Using several optimizations provided by our compiler, we implement efficient communication between processing elements with minimal overhead. A host interface is designed to integrate the existing RISC core with the multiprocessor array. Experimental results show that an efficient integration is achieved, demonstrating that the designed communication module can be used to incorporate off-the-shelf processors as processing elements in MPSoC architectures designed with our framework.

IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing | 1998

Scalable VLSI architectures for lattice structure-based discrete wavelet transform

Joon Kim; Yong Hoon Lee; Tsuyoshi Isshiki; Hiroaki Kunieda

In this paper, we develop a scalable VLSI architecture employing a two-channel quadrature mirror filter (QMF) lattice for the one-dimensional (1-D) discrete wavelet transform (DWT). We begin by developing a systematic schedule, based on a binary tree, that determines the filtering instants of each resolution level. Then the input-output relation between the lattices of the QMF bank is derived, and a new structure for the data format converter (DFC), which controls data transfer between resolution levels, is proposed. In addition, we propose an implementation of a delay control unit (DCU) that controls the delay between the lattices of the QMF. The DFC and DCU structures are regular and scalable and require a minimum number of registers, leading to an efficient and scalable architecture for the DWT. A scalable architecture for the inverse DWT is developed in a similar manner. Finally, pipelining of the proposed architecture is considered.
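The abstract does not spell out the schedule itself. As a minimal sketch, assuming the well-known recursive-pyramid style of binary-tree scheduling (not necessarily the schedule derived in the paper), each time slot t can be assigned to resolution level j = 1 + (number of trailing 1 bits of t). Level j then fires once every 2^j slots, and no two levels ever claim the same slot. The function names are hypothetical.

```python
from typing import List, Optional


def level_for_slot(t: int, levels: int) -> Optional[int]:
    """Resolution level (1-based) that owns time slot t, or None if idle.

    Level j owns the slots with t % 2**j == 2**(j - 1) - 1, i.e. the slots
    whose binary representation ends in exactly j - 1 one-bits.
    """
    j = 1
    while t & 1:  # count trailing one-bits
        t >>= 1
        j += 1
    return j if j <= levels else None


def schedule(num_slots: int, levels: int) -> List[Optional[int]]:
    """Level assignment for slots 0 .. num_slots - 1."""
    return [level_for_slot(t, levels) for t in range(num_slots)]


if __name__ == "__main__":
    # Level 1 fires every 2 slots, level 2 every 4, level 3 every 8.
    print(schedule(8, 3))  # [1, 2, 1, 3, 1, 2, 1, None]
```

Because level j consumes the outputs of level j - 1, which appear at twice its rate, a single time-shared filter can serve all levels. The one idle slot per 2^L slots is the leftover capacity, since 1/2 + 1/4 + ... + 1/2^L = 1 - 1/2^L.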

Design Automation Conference | 2009

Trace-driven workload simulation method for Multiprocessor System-On-Chips

Tsuyoshi Isshiki; Dongju Li; Hiroaki Kunieda; Toshio Isomura; Kazuo Satou

While multiprocessor system-on-chips (MPSoCs) are becoming widely adopted in embedded systems, there is a strong need for methodologies that quickly and accurately estimate the performance of such complex systems. In this paper, we present a novel method for accurately estimating the cycle counts of parameterized MPSoC architectures through workload simulation driven by program execution traces encoded as branch bitstreams. Experimental results show that the proposed method achieves a speedup factor of 70.15 to 238.58 over the instruction-set-simulator-based method while maintaining high cycle accuracy, with estimation errors between 0.016% and 0.459%.
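Here is a minimal sketch of the general idea, assuming a toy model in which each basic block has a fixed cycle cost. The trace stores only one bit per executed conditional branch. Replaying it over the control-flow graph recovers the executed path and its cycle count. The CFG format and function names are hypothetical. The paper's model of parameterized MPSoC architectures, including communication and contention effects, is not reproduced here.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical control-flow graph: block name -> (cycle cost, successors).
# 0 successors = exit, 1 = unconditional jump,
# 2 = conditional branch given as [not_taken, taken].
CFG = Dict[str, Tuple[int, List[str]]]


def pack_bits(bits: Sequence[int]) -> bytes:
    """Pack branch outcomes (0/1) into a compact bitstream, LSB first."""
    out = bytearray((len(bits) + 7) // 8)
    for i, b in enumerate(bits):
        if b:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def unpack_bits(data: bytes, n: int) -> List[int]:
    """Recover the first n branch outcomes from a packed bitstream."""
    return [(data[i // 8] >> (i % 8)) & 1 for i in range(n)]


def replay_cycles(cfg: CFG, entry: str, bits: Sequence[int]) -> int:
    """Walk the CFG, consuming one trace bit per conditional branch,
    and sum the per-block cycle costs along the recorded path."""
    outcomes = iter(bits)
    block = entry
    total = 0
    while True:
        cost, succs = cfg[block]
        total += cost
        if not succs:
            return total
        block = succs[0] if len(succs) == 1 else succs[next(outcomes)]


if __name__ == "__main__":
    cfg: CFG = {
        "entry": (2, ["loop"]),
        "loop": (5, ["exit", "loop"]),  # taken branch jumps back to loop
        "exit": (1, []),
    }
    trace = [1, 1, 0]  # loop body runs three times
    stream = pack_bits(trace)
    print(replay_cycles(cfg, "entry", unpack_bits(stream, len(trace))))  # 18
```

The trace is tiny, at one bit per branch, and the replay never decodes instructions. This is one plausible reason such trace-driven methods can run much faster than an instruction-set simulator.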

Fourth IEEE Workshop on Automatic Identification Advanced Technologies (AutoID'05) | 2005

A hybrid method for fingerprint image quality calculation

Jinqing Qi; Desiree Abdurrachim; Dongju Li; Hiroaki Kunieda

This paper proposes a new hybrid scheme that measures fingerprint image quality by combining local and global features of a fingerprint image. Unlike traditional methods (e.g., those based on local standard deviation or orientation information), the proposed method takes into account not only local texture features but also global factors such as foreground area, the central position of the foreground, the number of minutiae, and the existence of singular points. In addition to detailed definitions of seven quality indices, two weighting methods are proposed for finding the correlation between the final quality value and each quality index. Experimental results on FVC2002 and our private database show that the equal error rate (EER) can be reduced by 12% to 34% when 10% of the images are rejected. This demonstrates that the hybrid method is an effective and efficient scheme for discarding poor-quality images and, hence, can be used to guarantee the reliability and performance of fingerprint recognition systems.
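Here is a minimal sketch of how local and global indices might be combined. It assumes a plain weighted average with hand-picked weights, because the abstract does not describe the paper's two weighting methods. The code computes one local index, the block-wise gray-level standard deviation. It averages that index with other indices and then rejects the lowest-scoring fraction of images, mirroring the 10% rejection used in the experiments. All names, the block size, and the threshold are hypothetical.

```python
import statistics
from typing import List, Sequence


def local_contrast_index(image: Sequence[Sequence[float]], block: int = 8,
                         threshold: float = 10.0) -> float:
    """Local index: fraction of block x block tiles whose gray-level
    standard deviation exceeds a threshold (low contrast ~ poor ridges)."""
    h, w = len(image), len(image[0])
    good = total = 0
    for r in range(0, h - block + 1, block):
        for c in range(0, w - block + 1, block):
            tile = [image[y][x] for y in range(r, r + block)
                    for x in range(c, c + block)]
            total += 1
            if statistics.pstdev(tile) > threshold:
                good += 1
    return good / total if total else 0.0


def combined_quality(indices: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of quality indices, each assumed to lie in [0, 1]."""
    return sum(w * q for w, q in zip(weights, indices)) / sum(weights)


def keep_after_rejection(scores: Sequence[float], fraction: float) -> List[int]:
    """Indices of images kept after rejecting the lowest-scoring fraction."""
    k = int(len(scores) * fraction)
    rejected = set(sorted(range(len(scores)), key=lambda i: scores[i])[:k])
    return [i for i in range(len(scores)) if i not in rejected]


if __name__ == "__main__":
    checker = [[255 * ((x + y) % 2) for x in range(16)] for y in range(16)]
    local = local_contrast_index(checker)
    # Hypothetical global indices, e.g. foreground area and centering.
    print(combined_quality([local, 0.5, 0.8], [2.0, 1.0, 1.0]))
```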

IPSJ Transactions on System LSI Design Methodology | 2012

Optimized Communication and Synchronization for Embedded Multiprocessors Using ASIP Methodology

Hao Xiao; Tsuyoshi Isshiki; Dongju Li; Hiroaki Kunieda; Yuko Nakase; Sadahiro Kimura

Inter-processor communication and synchronization are critical problems in embedded multiprocessors. To achieve high-speed communication and low-latency synchronization, most recent designs employ dedicated hardware engines that support these communication protocols individually, an approach that is complex, inflexible, and error-prone. This paper therefore addresses the optimization of inter-processor communication and synchronization using application-specific instruction-set processor (ASIP) techniques. The proposed communication mechanism is based on a set of custom instructions coupled with a low-latency on-chip network, which provides efficient support for both data transfer and process synchronization. Using state-of-the-art ASIP design methodology, we embed the communication functionality into a base processor, giving the proposed mechanism ultra-low overhead. More importantly, industry-standard-compatible programming interfaces supporting both message-passing and shared-memory paradigms are exposed to end users to ease software porting. Experimental results show that the bandwidth of the proposed message-passing protocol reaches up to 703 Mbyte/s at 200 MHz, and that the latency of the proposed synchronization protocol is reduced by more than 81% compared with the conventional approach. Moreover, as a case study, we show the effectiveness of the proposed communication mechanism in a real-life embedded application, WiMedia UWB MAC.
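As a quick sanity check on the reported bandwidth, assume 1 Mbyte = 10^6 bytes. Then 703 Mbyte/s at a 200 MHz clock works out to roughly 3.5 bytes per cycle. Suppose the link moved one 32-bit word per cycle. That is an assumption, because the abstract does not state the datapath width. Under it, the figure would correspond to about 88% of peak:

```latex
\frac{703 \times 10^{6}\ \text{bytes/s}}{200 \times 10^{6}\ \text{cycles/s}}
  \approx 3.52\ \text{bytes/cycle},
\qquad
\frac{3.52\ \text{bytes/cycle}}{4\ \text{bytes/cycle}} \approx 0.88
```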

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences | 2005

A Fingerprint Matching Using Minutia Ridge Shape for Low Cost Match-on-Card Systems

Andy Surya Rikin; Dongju Li; Tsuyoshi Isshiki; Hiroaki Kunieda

In recent years, there has been an increasing trend toward using biometric identifiers for personal authentication. Encouraged by advances in smart card technologies, fingerprint matching is increasingly being embedded into smart cards as an effective personal authentication method. However, the current generation of low-cost smart cards is usually equipped with limited hardware resources, such as an 8-bit or 16-bit microcontroller, while fingerprint matching is typically a time-consuming, computationally intensive, and costly process. Integrating fingerprint matching into a smart card therefore remains a challenge. In this paper, we present a fast, memory-efficient fingerprint matching method using a minutia ridge shape feature. Compared with conventional minutiae-based features, this feature offers a smaller template size, a smaller memory requirement, faster matching, and more robust matching under image distortion. The implementation results show that the proposed method can be embedded in smart cards for a real-time Match-on-Card system.

Asia Pacific Conference on Circuits and Systems | 2000

Face focus coding under H.263+ video coding standard

Trio Adiono; Tsuyoshi Isshiki; Kazuhito Ito; Tomohiko Ohtsuka; Dongju Li; Chawalit Honsawek; Hiroaki Kunieda

In this paper, we present a new method that enhances image quality in the face region of head-and-shoulder image sequences and shortens processing latency to achieve synchronization between lip movement and voice (lip sync). The method can significantly improve image quality in the face region and reduce frame skipping during coding of high-motion images. The improvement is achieved by allocating a larger bit budget to the face region, where the centre of perceptual interest is usually located. The number of bits spent on the dynamically changing background region is reduced by applying a temporal filter to suppress background noise. We also design a new fast rate control based on non-zero coefficient evaluation to shorten compression latency. Experimental results show an increase of around 2 dB in face-region PSNR and around 60 fewer skipped frames when encoding a 382-frame high-motion video sequence. They also show a very small compression latency of around 3 frames, which can resolve the lip-sync problem.
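Here is a minimal sketch of the two ideas under stated assumptions. First, face macroblocks receive a larger share of the frame budget through a fixed weight. Second, the quantizer is chosen by counting the coefficients that would remain nonzero rather than by trial encoding, which is what makes this kind of rate control fast. The face weight, the linear bits-per-nonzero model, and the simplified quantizer are assumptions for illustration, not the paper's rate-control algorithm or the exact H.263 quantization rule.

```python
from typing import List, Sequence


def allocate_bits(frame_budget: float, is_face: Sequence[bool],
                  face_weight: float = 2.0) -> List[float]:
    """Split a frame's bit budget over macroblocks, giving face
    macroblocks face_weight times the share of background ones."""
    weights = [face_weight if f else 1.0 for f in is_face]
    total = sum(weights)
    return [frame_budget * w / total for w in weights]


def pick_qp(coeffs: Sequence[int], bit_target: float, bits_per_nonzero: float,
            qp_range: range = range(1, 32)) -> int:
    """Smallest quantizer step whose estimated cost fits the target.

    Cost model (an assumption): bits ~= bits_per_nonzero * (number of
    coefficients that survive quantization), where a coefficient is
    treated as nonzero when |c| >= 2 * qp (a simplified quantizer).
    """
    for qp in qp_range:
        nonzero = sum(1 for c in coeffs if abs(c) >= 2 * qp)
        if bits_per_nonzero * nonzero <= bit_target:
            return qp
    return qp_range[-1]  # fall back to the coarsest step if nothing fits


if __name__ == "__main__":
    print(allocate_bits(1200, [True, False, False, True]))  # [400.0, 200.0, 200.0, 400.0]
    print(pick_qp([100, 50, -20, 8, 3], bit_target=30, bits_per_nonzero=10))  # 5
```

Counting nonzero coefficients is a single pass over already-transformed data, so the quantizer can be chosen without re-encoding the frame.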

Field-Programmable Custom Computing Machines | 1998

New FPGA architecture for bit-serial pipeline datapath

Akihisa Ohta; Tsuyoshi Isshiki; Hiroaki Kunieda

In this paper, we present our work on the design of a new FPGA architecture targeted at high-performance bit-serial pipeline datapaths. Bit-parallel systems introduce a large routing-area overhead, which is especially critical on FPGAs, where the large routing penalty lowers both device utilization and operating frequency. We propose a new FPGA architecture for high-performance bit-serial pipeline datapaths, which are very efficient in routing. We also refine our LUT architecture to efficiently implement the shift registers that some bit-serial designs require in large numbers. The modified lookup table has two modes: combinational logic and shift register. As a result, bit-serial datapaths can be implemented with fewer CLBs.
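The following minimal behavioral sketch in Python assumes a 4-input LUT with 16 storage bits. A bit-serial adder needs only one full adder and a carry flip-flop regardless of word width, which is why bit-serial datapaths route so cheaply. A LUT whose storage bits can be reused as a shift chain supplies the delay registers such designs need. The class and function names are hypothetical, and the model is not the paper's circuit.

```python
from typing import Callable, List, Sequence


class DualModeLut4:
    """Toy 4-input LUT with two modes: combinational logic (the 16 stored
    bits are a truth table) or shift register (the same 16 bits form a
    delay line). Illustrative only; not the paper's actual circuit."""

    def __init__(self, table: Sequence[int]) -> None:
        if len(table) != 16:
            raise ValueError("a 4-input LUT stores exactly 16 bits")
        self.bits: List[int] = list(table)

    @classmethod
    def from_function(cls, f: Callable[[int, int, int, int], int]) -> "DualModeLut4":
        """Build the truth table by evaluating f on all 16 input patterns."""
        return cls([f(i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1) & 1
                    for i in range(16)])

    def evaluate(self, a: int, b: int, c: int, d: int) -> int:
        """Logic mode: index the truth table with the four inputs."""
        return self.bits[a | (b << 1) | (c << 2) | (d << 3)]

    def shift(self, din: int, length: int) -> int:
        """Shift-register mode: push din and return the bit that was pushed
        `length` cycles earlier (1 <= length <= 16)."""
        out = self.bits[length - 1]
        self.bits = [din] + self.bits[:-1]
        return out


def bit_serial_add(a_bits: Sequence[int], b_bits: Sequence[int]) -> List[int]:
    """LSB-first bit-serial adder: one full adder plus a carry flip-flop."""
    carry, out = 0, []
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return out


def to_bits(x: int, n: int) -> List[int]:
    """LSB-first bit list of the low n bits of x."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits: Sequence[int]) -> int:
    """Inverse of to_bits."""
    return sum(b << i for i, b in enumerate(bits))


if __name__ == "__main__":
    print(from_bits(bit_serial_add(to_bits(5, 8), to_bits(3, 8))))  # 8
    delay = DualModeLut4([0] * 16)
    print([delay.shift(d, 3) for d in [1, 0, 1, 1, 0, 0]])  # [0, 0, 0, 1, 0, 1]
```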

Asia and South Pacific Design Automation Conference | 1997

An optimal scheduling method for parallel processing system of array architecture

Kazuhito Ito; Tadashi Iwata; Hiroaki Kunieda

In high-level synthesis of digital signal processing systems with an array-structured architecture, scheduling is one of the most important procedures. Because operations are allocated to different processors, the communication time between processors must also be taken into account during scheduling. In this paper, we propose a scheduling method that derives an optimal schedule achieving the minimum iteration period and latency for a given signal processing algorithm on a specified processor array. The scheduling problem is formulated as an integer linear programming (ILP) problem and solved with an ILP solver. Furthermore, we improve the scheduling method so that it can be applied to large-scale signal processing algorithms without degrading schedule optimality.
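To make the ILP idea concrete, here is one common time-indexed formulation. It is a generic textbook-style model, and the paper's actual model may differ. It makes the following assumptions:

- each operation i has execution time d_i;
- each processor p communicates with processor q in time c_{pq}, with c_{pp} = 0;
- the iteration period T is fixed;
- edge i → j of the data-flow graph carries w_{ij} delay elements.

The binary variable x_{i,p,t} is 1 when operation i starts on processor p at time step t, and [·] is 1 when its condition holds:

```latex
\begin{aligned}
\min\quad & L \\
\text{s.t.}\quad
& \sum_{p}\sum_{t} x_{i,p,t} = 1 && \forall i \\
& \sum_{p}\sum_{t} (t + d_i)\, x_{i,p,t} \le L && \forall i \\
& \sum_{i}\sum_{t}\sum_{k=0}^{d_i-1} \big[(t+k) \bmod T = \tau\big]\, x_{i,p,t} \le 1
  && \forall p,\ \forall \tau \in \{0,\dots,T-1\} \\
& x_{i,p,t} \le \sum_{q}\ \sum_{t' \ge t + d_i + c_{pq} - w_{ij} T} x_{j,q,t'}
  && \forall (i \to j),\ \forall p,\ \forall t \\
& x_{i,p,t} \in \{0, 1\}
\end{aligned}
```

The first constraint schedules each operation exactly once. The second bounds every finish time by the latency L. The third allows at most one operation per processor in each slot modulo T. The fourth requires that, if i starts on p at time t, successor j starts on whichever processor q it is assigned to no earlier than the communication delay allows. Under this model, the minimum iteration period can be found by trying T = 1, 2, ... until the model becomes feasible, then minimizing L for that T.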
