Publication


Featured research published by Yasuhiko Nakashima.


2009 Third International Conference on Quantum, Nano and Micro Technologies | 2009

An Efficient Method to Convert Arbitrary Quantum Circuits to Ones on a Linear Nearest Neighbor Architecture

Yuichi Hirata; Masaki Nakanishi; Shigeru Yamashita; Yasuhiko Nakashima

A variety of quantum circuits have been designed, most of which assume that arbitrary pairs of qubits can interact. However, several promising implementations of quantum computation rely on a Linear Nearest Neighbor (LNN) architecture, which arranges qubits on a line and allows only neighboring qubits to interact. Consequently, several specific circuits have been designed for the LNN architecture, but a general and efficient conversion technique for arbitrary circuits has not been established. This paper therefore gives an efficient method that converts an arbitrary quantum circuit into one on an LNN architecture. Our method achieves smaller overhead and time complexity than naive techniques. To develop the method, we introduce two key theorems that may be interesting in their own right. In addition, our method achieves smaller overhead than some known circuits designed directly for an LNN architecture.
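
As a point of reference for the overhead the paper reduces, here is a minimal sketch of the naive conversion it improves upon: before each gate on non-adjacent qubits, SWAP gates shuttle one operand next to the other, costing O(n) SWAPs per gate. The gate representation and names are illustrative, not the paper's.

```python
def to_lnn(gates, n_qubits):
    """gates: list of (name, control, target) acting on qubits 0..n_qubits-1."""
    pos = list(range(n_qubits))              # pos[logical qubit] = wire index
    lnn = []
    for name, a, b in gates:
        while abs(pos[a] - pos[b]) > 1:      # walk qubit a toward qubit b
            step = 1 if pos[a] < pos[b] else -1
            neighbor = next(q for q in range(n_qubits)
                            if pos[q] == pos[a] + step)
            lnn.append(("SWAP", pos[a], pos[neighbor]))
            pos[a], pos[neighbor] = pos[neighbor], pos[a]
        lnn.append((name, pos[a], pos[b]))
    return lnn

# A CNOT between the endpoints of a 4-qubit line costs two SWAPs here.
print(to_lnn([("CNOT", 0, 3)], 4))
```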


IEEE Transactions on Nuclear Science | 2012

DARA: A Low-Cost Reliable Architecture Based on Unhardened Devices and Its Case Study of Radiation Stress Test

Jun Yao; Shogo Okada; Masaki Masuda; Kazutoshi Kobayashi; Yasuhiko Nakashima

A microprocessor with architectural redundancy for high dependability was designed and manufactured to explore the effectiveness of tolerating soft errors without circuit hardening. The processor architecture is based on a modularized pipeline containing several features that facilitate real-time error detection and fast roll-back recovery. To anticipate a possible increase of hard errors in future technologies, the processor also provides energy-effective coverage of hard errors by dynamically adapting the redundancy between dual and triple modules. A radiation stress test indicates that the redundant but unhardened processor achieves the same dependability as a hardened processor. Our synthesis and layout results show that radiation-hardened circuits increase processor area by 71% and power by 28%. These figures suggest that architectural redundancy can replace circuit hardening to achieve cost-effective reliability.
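
A behavioral sketch of the dual/triple redundancy idea described above (not the DARA hardware): replicated executions of a computation are compared, with a mismatch in dual mode triggering re-execution and a majority vote in triple mode masking a single upset. The fault model and function names are illustrative.

```python
import random

def run_module(f, x, p_soft_error=0.05):
    """One unhardened module; its result may be corrupted by a soft error."""
    r = f(x)
    return ~r if random.random() < p_soft_error else r

def reliable_execute(f, x, mode="dual"):
    if mode == "triple":                     # mask: majority vote of 3 copies
        a, b, c = (run_module(f, x) for _ in range(3))
        return a if a in (b, c) else b
    while True:                              # dual: detect a mismatch, then
        a, b = run_module(f, x), run_module(f, x)   # roll back and re-execute
        if a == b:
            return a

print(reliable_execute(lambda v: v * 2 + 1, 20, mode="triple"))   # 41
```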


international conference on networking and computing | 2012

A Speed-up Technique for an Auto-Memoization Processor by Reusing Partial Results of Instruction Regions

Kazutaka Kamimura; Ryosuke Oda; Tatsuhiro Yamada; Tomoaki Tsumura; Hiroshi Matsuo; Yasuhiko Nakashima

We have proposed an auto-memoization processor based on computation reuse. It dynamically detects functions and loop iterations as reusable blocks and memoizes them automatically. In the previous model, computation reuse cannot be applied if the current input sequence differs from all past input sequences by even one value, since the processing results would differ. This paper proposes a new partial-reuse model, which applies computation reuse to the early part of a reusable block as long as the early part of the current input sequence matches one of the past sequences. In addition, to obtain sufficient benefit from the partial-reuse model, we also propose a technique that reduces the search overhead of the memoization table by partitioning it. Experiments with the SPEC CPU95 benchmarks show that the new method improves the maximum speedup from 40.6% to 55.1% and the average speedup from 10.6% to 22.8%.
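
A software sketch of the partial-reuse idea, assuming a block that consumes its inputs sequentially and exposes per-step intermediate states (an illustrative interface, not the processor's hardware tables): lookup finds the longest matching input prefix and execution resumes from its last stored state.

```python
memo = []   # (input sequence, per-step partial results) entries

def run_block(inputs, step):
    """step(state, x) -> state; 'inputs' are the values the block reads."""
    best_n, best = 0, None
    for seq, partials in memo:               # longest matching input prefix
        n = 0
        while n < min(len(seq), len(inputs)) and seq[n] == inputs[n]:
            n += 1
        if n > best_n:
            best_n, best = n, partials
    state = best[best_n - 1] if best_n else 0
    partials = list(best[:best_n]) if best else []
    for x in inputs[best_n:]:                # execute only the unmatched tail
        state = step(state, x)
        partials.append(state)
    memo.append((list(inputs), partials))
    return state

add = lambda s, x: s + x
print(run_block([1, 2, 3], add))   # 6: full execution, nothing to reuse
print(run_block([1, 2, 9], add))   # 12: the prefix [1, 2] is reused
```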


Neurocomputing | 2017

Cellular neural network formed by simplified processing elements composed of thin-film transistors

Mutsumi Kimura; Ryohei Morita; Sumio Sugisaki; Tokiyoshi Matsuda; Tomoya Kameda; Yasuhiko Nakashima

We have developed a cellular neural network formed by simplified processing elements composed of thin-film transistors. First, we simplified the neuron circuit into a two-inverter two-switch circuit and the synapse device into a single transistor. Next, we composed the processing elements of thin-film transistors, which are promising for giant microelectronics applications, and formed a cellular neural network from these processing elements. Finally, we confirmed that the cellular neural network can learn multiple logic functions even at a small scale. Moreover, we verified that it can simultaneously recognize multiple simple alphabet letters. These results should serve as a theoretical basis for realizing ultra-large-scale integration of brain-type integrated circuits.
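
A behavioral sketch of a cell in such a network, under the simplifying assumptions that each one-transistor synapse contributes a single weight from a grid neighbor and the two-inverter neuron core acts as a hard threshold; grid size and weights are illustrative.

```python
import itertools

def step(grid, weights):
    """One synchronous update of a binary (+1/-1) cell grid."""
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r, c in itertools.product(range(rows), range(cols)):
        s = 0.0
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):   # 4-neighborhood
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                s += weights[(rr, cc, r, c)] * grid[rr][cc]  # one synapse each
        nxt[r][c] = 1 if s >= 0 else -1   # hard threshold: the 2-inverter core
    return nxt

# 2x2 grid with uniform unit weights on every directed neighbor pair.
w = {k: 1.0 for k in itertools.product(range(2), repeat=4)}
print(step([[1, -1], [1, -1]], w))
```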


international conference on networking and computing | 2010

A Speed-Up Technique for an Auto-Memoization Processor by Collectively Reusing Continuous Iterations

Tomoki Ikegaya; Tomoaki Tsumura; Hiroshi Matsuo; Yasuhiko Nakashima

We have proposed an auto-memoization processor based on computation reuse and merged it with speculative multithreading based on value prediction into a parallel early computation mechanism. In the previous model, parallel early computation detects each iteration of a loop as a reusable block. This paper proposes a new parallel early computation model that integrates multiple continuous iterations into one reusable block automatically and dynamically, without modifying executable binaries. We also propose a model for automatically detecting how many iterations should be integrated into one reusable block. Our model reduces the overhead of computation reuse and makes better use of the reuse tables. Experiments with the SPEC CPU95 FP benchmarks show that the new model improves the maximum speedup from 40.5% to 57.6% and the average speedup from 15.0% to 26.0%.
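
A software sketch of the collective-reuse idea: n continuous iterations are folded into one reusable unit keyed by the state and inputs at its entry, so a single reuse test covers n iterations. The chunk size, loop body, and keying are illustrative.

```python
table = {}   # (entry state, n-iteration input chunk) -> exit state

def run_loop(inputs, body, n=4):
    """Apply body over inputs, reusing whole chunks of n iterations."""
    state = 0
    for i in range(0, len(inputs), n):
        key = (state, tuple(inputs[i:i + n]))
        if key in table:                  # one reuse test covers n iterations
            state = table[key]
        else:
            for x in inputs[i:i + n]:
                state = body(state, x)
            table[key] = state
    return state

add = lambda s, x: s + x
print(run_loop([1, 2, 3, 4] * 2, add))   # 20: fills the table
print(run_loop([1, 2, 3, 4] * 2, add))   # 20: every chunk now hits
```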


2015 IEEE 9th International Symposium on Embedded Multicore/Many-core Systems-on-Chip | 2015

A CGRA-Based Approach for Accelerating Convolutional Neural Networks

Masakazu Tanomoto; Shinya Takamaeda-Yamazaki; Jun Yao; Yasuhiko Nakashima

A convolutional neural network (CNN) is an emerging approach for achieving high recognition accuracy in various machine learning applications. To accelerate CNN computations, various GPU-based and application-specific hardware approaches have recently been proposed. However, since they require large hardware resources and high absolute energy consumption, they are not suitable for embedded applications. In this paper, we propose a novel approach to accelerating CNN computations with a Coarse-Grained Reconfigurable Architecture (CGRA) for low-power embedded systems. We first present a new CGRA with distributed scratchpad memory blocks that enables efficient temporal blocking to reduce memory-bandwidth pressure. We then show the architecture of our CNN accelerator, built on the CGRA together with dedicated software. We evaluated our approach against existing platforms, including high-end and mobile GPUs and general-purpose multicore CPUs; the results show that our proposal achieves 1.93x higher performance per memory bandwidth and 2.92x higher performance per area.
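
A schematic sketch of temporal blocking as the abstract describes it, assuming simple 3x3 valid convolutions and ignoring channels: all layers are evaluated over one small tile while it stays in the (simulated) scratchpad, so intermediate feature maps never touch external memory. Shapes and halo handling are deliberately simplified.

```python
def conv3(tile, k):
    """Valid 3x3 convolution over a 2-D tile (lists of lists)."""
    h, w = len(tile), len(tile[0])
    return [[sum(k[i][j] * tile[r + i][c + j]
                 for i in range(3) for j in range(3))
             for c in range(w - 2)]
            for r in range(h - 2)]

def temporal_block(image, kernels, tile=8):
    """Run every layer over one tile while it stays in the 'scratchpad'."""
    halo = 2 * len(kernels)               # each valid 3x3 layer shrinks by 2
    tiles = []
    for r0 in range(0, len(image) - halo, tile):
        for c0 in range(0, len(image[0]) - halo, tile):
            t = [row[c0:c0 + tile + halo]                 # load once from
                 for row in image[r0:r0 + tile + halo]]   # external memory
            for k in kernels:             # all layers, no off-chip traffic
                t = conv3(t, k)
            tiles.append(((r0, c0), t))   # write back the finished tile
    return tiles

img = [[(r * c) % 7 for c in range(20)] for r in range(20)]
lap = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
print(len(temporal_block(img, [lap, lap])))   # 4 tiles, two layers deep
```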


IEEE Transactions on Nuclear Science | 2014

EReLA: A Low-Power Reliable Coarse-Grained Reconfigurable Architecture Processor and Its Irradiation Tests

Jun Yao; Mitsutoshi Saito; Shogo Okada; Kazutoshi Kobayashi; Yasuhiko Nakashima

In this work, facing both increasing vulnerability to single-event effects (SEEs) and tight power-consumption constraints, we propose a Coarse-Grained Reconfigurable Architecture (CGRA) processor. Our goal is to let user-programmable redundancy guide the balance between energy consumption and reliability requirements. We designed software (SW) and hardware (HW) approaches that coordinate closely to achieve this purpose: the framework provides several user-assignable redundancy patterns, together with hardware modules that interpret these patterns. A first prototype processor, named EReLA (Explicit Redundancy Linear Array), has been implemented and manufactured in a 0.18 μm CMOS technology. Stress tests based on alpha-particle irradiation were conducted to verify the trade-off between the robustness and the power efficiency of the proposed schemes.
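
A toy sketch of the explicit-redundancy idea: each operation mapped onto the array carries a user-assigned redundancy tag, and the runtime executes it once, twice with comparison, or three times with majority voting accordingly. The tags, operations, and roll-back stub are illustrative.

```python
def execute(op, x, level):
    """Run one mapped operation at the redundancy its tag requests."""
    if level == "none":
        return op(x)
    if level == "dual":                   # detect: compare two copies
        a, b = op(x), op(x)
        assert a == b, "mismatch -> trigger roll-back"
        return a
    a, b, c = op(x), op(x), op(x)         # "triple": mask by majority vote
    return a if a in (b, c) else b

# Only the stage deemed critical pays for full triplication.
program = [("scale", lambda v: v * 3,       "dual"),
           ("bias",  lambda v: v + 7,       "triple"),
           ("clip",  lambda v: min(v, 100), "none")]

x = 5
for name, op, level in program:
    x = execute(op, x, level)
print(x)   # (5 * 3 + 7) = 22
```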


IEICE Transactions on Information and Systems | 2012

Quantum Walks on the Line with Phase Parameters

Marcos Villagra; Masaki Nakanishi; Shigeru Yamashita; Yasuhiko Nakashima



ieee international symposium on parallel & distributed processing, workshops and phd forum | 2011

LAPP: A Low Power Array Accelerator with Binary Compatibility

Naveen Devisetti; Takuya Iwakami; Kazuhiro Yoshimura; Takashi Nakada; Jun Yao; Yasuhiko Nakashima

Recently, reconfigurable architectures have become popular as a way to achieve good energy efficiency. In this paper we design an energy-efficient, high-performance accelerator named the Linear Array Pipeline Processor (LAPP). LAPP accelerates the execution of existing machine code while maintaining binary compatibility, instead of requiring special code. Being highly reconfigurable, LAPP is designed to apply unit gating over periods long enough to conceal the gating penalty, and thereby incurs minimal power consumption for a given workload. Specifically, code is mapped statically onto a Functional Unit (FU) array with minimized caches and registers, and executed in pipeline fashion on streaming data. Synthesis results show that the area of a 36-stage LAPP is 9.5 times that of a traditional processor core. Compared to a Many-Core Processor (MCP) of the same area, an LAPP-simulator-based estimation indicates that LAPP achieves about 10 times the power efficiency on 9 image processing workloads.
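
A conceptual sketch of the mapping LAPP performs in hardware, assuming a straight-line region of single-input operations: each instruction occupies one FU stage and data items stream through the stages in order. The instruction set and kernel are illustrative, and pipeline overlap is not modeled.

```python
def map_region(instructions):
    """One FU stage per instruction, preserving original program order."""
    return list(enumerate(instructions))

def stream_through(stages, pixels):
    """Feed a data stream through the stage functions, one after another."""
    for p in pixels:
        for _, fu in stages:
            p = fu(p)
        yield p

# e.g. a small brighten/scale/saturate kernel over a pixel stream
stages = map_region([lambda p: p + 16,         # add immediate
                     lambda p: p * 2,          # shift left by 1
                     lambda p: min(p, 255)])   # saturate
print(list(stream_through(stages, [10, 100, 200])))   # [52, 232, 255]
```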


parallel and distributed computing: applications and technologies | 2009

A Speculative Technique for Auto-Memoization Processor with Multithreading

Yushi Kamiya; Tomoaki Tsumura; Hiroshi Matsuo; Yasuhiko Nakashima

We have proposed an auto-memoization processor that automatically and dynamically memoizes both functions and loop iterations and skips their execution by reusing past results. Meanwhile, multi- and many-core processors have come into wide use, and the number of cores is expected to grow to a hundred or more; however, many programs do not contain enough parallelism to exploit them, so it is important to consider how to utilize many cores effectively. This paper describes a speedup technique for the auto-memoization processor using speculative multithreading. Two speculative threads are forked at each reuse test: one assumes the test will succeed and speculatively executes the code following the reuse-target block, while the other assumes it will fail and executes the block itself. These two threads conceal the overhead of the auto-memoization processor. Experiments with the SPEC CPU95 benchmarks show that the proposed method improves the maximum speedup from 13.9% to 36.0%.
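
A behavioral sketch of the two-way speculation, using Python threads in place of hardware contexts: while the reuse test runs, a second worker re-executes the block assuming a miss; on a hit its result is simply discarded. The thread that would speculatively run the code after the block is omitted for brevity, and the timings are fake.

```python
from concurrent.futures import ThreadPoolExecutor
import time

table = {}

def block(x):                      # the reuse-target block
    time.sleep(0.01)
    return x * x

def reuse_test(x):                 # the lookup overhead being concealed
    time.sleep(0.01)
    return table.get(x)

def run(x):
    with ThreadPoolExecutor(max_workers=2) as pool:
        miss_path = pool.submit(block, x)   # thread assuming the test fails
        hit = reuse_test(x)                 # meanwhile the reuse test runs
        if hit is not None:
            return hit                      # hit: the miss thread's result
        table[x] = result = miss_path.result()   # is simply ignored
        return result

print(run(4))   # miss: the speculative re-execution supplies 16
print(run(4))   # hit: reuse succeeds and the speculative work is dropped
```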

Collaboration


Dive into Yasuhiko Nakashima's collaborations. Top co-authors:

Jun Yao

Nara Institute of Science and Technology

Tomoaki Tsumura

Nagoya Institute of Technology

Tomoya Kameda

Nara Institute of Science and Technology

Kazuhiro Yoshimura

Nara Institute of Science and Technology
