Yuko Hara-Azumi | Researchain

Publication

Featured research published by Yuko Hara-Azumi.

international soc design conference | 2010

Towards practical high-level synthesis from large behavioral descriptions

Yuko Hara-Azumi; Toshinobu Matsuba; Hiroyuki Tomiyama; Shinya Honda; Hiroaki Takada; Nikil D. Dutt

This paper presents two lines of our recent work towards practical high-level synthesis from large behavioral descriptions: one optimally partitions input behavioral descriptions considering the controller complexity of the synthesized circuits, and the other enhances the clock frequency of the circuits by aggressively removing MUXs inserted before registers. These works can further advance high-level synthesis technology.

design, automation, and test in europe | 2015

Profiling-driven multi-cycling in FPGA high-level synthesis

Stefan Hadjis; Andrew Canis; Ryoya Sobue; Yuko Hara-Azumi; Hiroyuki Tomiyama; Jason Helge Anderson

Multi-cycling is a well-known strategy to improve performance in digital design, wherein the required time for selected combinational paths is lengthened to multiple clock cycles (rather than just one). The approach can be applied to paths associated with computations whose results are not needed immediately: such paths are allowed multiple clock cycles to “complete”, reducing the opportunity for them to form the critical path of the circuit. In this paper, we consider multi-cycling in the high-level synthesis (HLS) context and use software profiling to guide multi-cycling optimizations. Specifically, prior to HLS, we execute the program in software with typical datasets to gather data on the number of times each code segment executes. During HLS, we then extend the schedule for infrequently executed code segments and apply multi-cycling to the dilated schedules, which exhibit greater opportunities for multi-cycling. In essence, our approach ensures that infrequently executed code segments will not form the critical path of the HLS-generated circuit. In an experimental study targeting the Altera Stratix IV FPGA, we evaluate the impact on speed performance and area for both traditional multi-cycling and the proposed software profiling-driven multi-cycling, and show that profiling-driven multi-cycling leads to an average speedup of over 10% across 13 benchmark circuits, with some circuit speedups in excess of 30%. Circuit area is reduced by 11%, yielding a mean 20% improvement in area-delay product.

design, automation, and test in europe | 2013

Instruction-set extension under process variation and aging effects

Yuko Hara-Azumi; Farshad Firouzi; Saman Kiamehr; Mehdi Baradaran Tahoori

We propose a novel custom instruction (CI) selection technique for process variation and transistor aging aware instruction-set architecture synthesis. For aggressive clocking, we select CIs based on statistical static timing analysis (SSTA), which achieves efficient speedup during the target lifetime while mitigating degradation of timing yield (i.e., the probability of satisfying the timing). Furthermore, we consider process variation and aging on not only CIs but also basic instructions (BIs). Even if basic functional units (BFUs), e.g., the ALU, get slower due to aging, only a few BIs with critical propagation delay may violate the timing, whereas the other BIs running on the same BFU can still satisfy the timing. We then introduce “customized BFUs”, which execute only such aging-critical BIs. The customized BFUs, used as spare BFUs for the aging-critical BIs, can extend the lifetime of the system. Combining the two approaches enables speedup as well as lifetime extension with no or negligibly small area/power overhead. Experiments demonstrate that our work outperforms conventional worst-case work (by an average speedup of about 49%) and existing SSTA-based work (16x or more lifetime extension with comparable speedup).
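
The notion of timing yield used above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a path delay follows a Gaussian distribution with a known mean and standard deviation; the function name `timing_yield` and all numeric values are hypothetical and are not taken from the paper.

```python
import math


def timing_yield(clock_period, mean_delay, sigma):
    """Probability that a Gaussian-distributed path delay fits in the clock period.

    Assumption (not from the paper): delay ~ Normal(mean_delay, sigma**2).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (clock_period - mean_delay) / sigma
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # Hypothetical numbers: a path with 1.00 ns mean delay and 0.05 ns sigma.
    for period in (1.00, 1.05, 1.10, 1.15):
        print(f"clock {period:.2f} ns -> yield {timing_yield(period, 1.00, 0.05):.4f}")
```

Under this assumption, a worst-case design would pick a clock period several sigmas above the mean, whereas an SSTA-style approach can trade a small, quantified yield loss for a shorter period.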

asia and south pacific design automation conference | 2013

A clique-based approach to find binding and scheduling result in flow-based microfluidic biochips

Trung Anh Dinh; Shigeru Yamashita; Tsung-Yi Ho; Yuko Hara-Azumi

Microfluidic biochips have recently been proposed to integrate all the necessary functions for biochemical analysis. There are several types of microfluidic biochips; among them, there has been great interest in flow-based microfluidic biochips, in which the flow of liquid is manipulated using integrated microvalves. By combining several microvalves, more complex resource units such as micropumps, switches and mixers can be built. For efficient execution, the liquid flow routes in microfluidic biochips need to be scheduled under resource constraints or routing constraints. The execution time of the biochemical operations depends on the binding and scheduling results. Most previously developed binding and scheduling algorithms are based on heuristics, and there has been no method to obtain optimal results. Considering the above, this paper proposes an optimal method by casting the problem as a clique problem.

international symposium on quality electronic design | 2013

Cost-efficient scheduling in high-level synthesis for Soft-Error Vulnerability Mitigation

Yuko Hara-Azumi; Hiroyuki Tomiyama

Due to the continuous reduction in chip feature size and supply voltage, soft errors are becoming a serious problem in today's LSI design. Most literature on system-level design techniques has conventionally tackled this issue by spatial and/or temporal modular redundancy, whose cost in circuit area and performance is large. This paper proposes a soft error-aware scheduling method in high-level synthesis (HLS), which does not rely on such expensive, conventional techniques. The reliability of the datapath circuit is determined not only by that of the hardware resources to which operations and values are assigned, but also by their active time (i.e., the time during which operational results should be correct). By considering both of these factors, our proposed method schedules operations so that the reliability of HLS-generated datapath circuits can be maximized under designer-given area/latency constraints. Experimental results demonstrate the effectiveness of our method over existing methods, especially for strict area/latency constraints.

design, automation, and test in europe | 2016

Effect of LFSR seeding, scrambling and feedback polynomial on stochastic computing accuracy

Jason Helge Anderson; Yuko Hara-Azumi; Shigeru Yamashita

Stochastic computing (SC) [1] has received attention recently as a paradigm to improve energy efficiency and fault tolerance. SC uses hardware-generated random bitstreams to represent numbers in the [0, 1] range: the number represented is the probability of a bit in the stream being logic-1. The generation of random bitstreams is typically done using linear-feedback shift register (LFSR)-based random number generators. In this paper, we consider how best to design such LFSR-based stochastic bitstream generators, as a means of improving the accuracy of stochastic computing. Three design criteria are evaluated: 1) LFSR seed selection, 2) the utility of scrambling LFSR output bits, and 3) the LFSR polynomials (i.e., locations of the feedback taps) and whether they should be unique vs. uniform across stream generators. For a recently proposed multiplexer-based stochastic logic architecture [8], we demonstrate that careful seed selection can improve accuracy results vs. the use of arbitrarily selected seeds. For example, we show that stochastic logic with seed-optimized 255-bit stream lengths achieves accuracy better than that of using 1023-bit stream lengths with arbitrary seeds: an improvement of over 4× in energy for equivalent accuracy.
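
To make the role of the LFSR seed concrete, here is a minimal sketch of LFSR-based bitstream generation. It is not the multiplexer-based architecture evaluated in the paper: it assumes an 8-bit Fibonacci LFSR with taps at bits 8, 6, 5 and 4 (a commonly tabulated maximal-length configuration), a comparator that emits 1 when the LFSR state is at most `k`, and the textbook AND-gate multiplier. All function names are hypothetical.

```python
def lfsr8_states(seed, length=255):
    """Yield successive states of an 8-bit Fibonacci LFSR (taps 8, 6, 5, 4)."""
    if not 1 <= seed <= 255:
        raise ValueError("seed must be a nonzero 8-bit value")
    state = seed
    for _ in range(length):
        yield state
        # XOR of the tapped bits becomes the new most significant bit.
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 4)) & 1
        state = (state >> 1) | (bit << 7)


def bitstream(k, seed, length=255):
    """Bitstream encoding k/255: bit is 1 whenever the LFSR state is <= k."""
    return [1 if s <= k else 0 for s in lfsr8_states(seed, length)]


def sc_multiply(ka, kb, seed_a, seed_b, length=255):
    """Estimate (ka/255) * (kb/255) by ANDing two LFSR-generated streams."""
    a = bitstream(ka, seed_a, length)
    b = bitstream(kb, seed_b, length)
    return sum(x & y for x, y in zip(a, b)) / length


if __name__ == "__main__":
    exact = (128 / 255) * (64 / 255)
    # Compare the estimate for a few arbitrary seed pairs against the exact product.
    for seed_b in (1, 2, 77, 200):
        est = sc_multiply(128, 64, seed_a=1, seed_b=seed_b)
        print(f"seed_b={seed_b:3d} estimate={est:.4f} error={abs(est - exact):.4f}")
```

Because both streams come from the same LFSR sequence at different offsets, the choice of seeds changes how correlated the streams are and therefore how accurate the product estimate is; using the same seed for both operands makes the streams fully correlated and the AND gate returns the smaller operand instead of the product.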

IPSJ Transactions on System LSI Design Methodology | 2014

Impact of Resource Sharing and Register Retiming on Area and Performance of FPGA-based Designs

Yuko Hara-Azumi; Toshinobu Matsuba; Hiroyuki Tomiyama; Shinya Honda; Hiroaki Takada

Due to the increasing diversity and complexity of embedded systems, the use of high-level synthesis (HLS) and of FPGAs have both become prevalent as means of enhancing design productivity. Although a number of FPGA-oriented optimizations, particularly for resource binding, have been studied in HLS, HLS technologies are still immature since most of them overlook some important facts about resource sharing. In this paper, for FPGA-based designs, we quantitatively evaluate the effects of several resource sharing approaches in HLS using practically large benchmarks on various FPGA devices. Through this comprehensive evaluation, the effects on clock frequency, execution time, area, and multiplexer distribution are examined. Several important discussions and findings are presented, which are essential for further advancing practical HLS technology.

asia and south pacific design automation conference | 2013

VISA synthesis: Variation-aware Instruction Set Architecture synthesis

Yuko Hara-Azumi; Takuya Azumi; Nikil D. Dutt

We present VISA: a novel Variation-aware Instruction Set Architecture synthesis approach that makes effective use of process variation from both software and hardware points of view. To achieve an efficient speedup, VISA selects custom instructions based on statistical static timing analysis (SSTA) for aggressive clocking. Furthermore, with minimum performance overhead, VISA dynamically detects and corrects timing faults resulting from aggressive clocking of the underlying processor. This hybrid software/hardware approach generates significant speedup without degrading the yield. Our experimental results on commonly used ISA synthesis benchmarks demonstrate that VISA achieves significant performance improvement compared with a traditional deterministic worst case-based approach (up to 78.0%) and an existing SSTA-based approach (up to 49.4%).

international soc design conference | 2012

Task mapping techniques for embedded many-core SoCs

Junya Kaida; Takuji Hieda; Ittetsu Taniguchi; Hiroyuki Tomiyama; Yuko Hara-Azumi; Koji Inoue

This paper proposes static task mapping techniques for embedded many-core SoCs. The proposed techniques take into account both task and data parallelisms of the tasks in order to efficiently utilize the potential parallelism of the many-core architecture. Two approaches are proposed for static mapping: one approach is based on integer linear programming and the other is based on a greedy algorithm. In addition, a static mapping technique considering dynamic task switching is proposed. Experimental results show the effectiveness of the proposed techniques.
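
As a rough illustration of what a greedy static mapping can look like, the sketch below uses a generic longest-processing-time-first heuristic: tasks are sorted by cost and each is placed on the currently least-loaded core. This is not the algorithm from the paper (which also accounts for data parallelism and dynamic task switching); the function name `greedy_map` and the task costs are hypothetical.

```python
import heapq


def greedy_map(task_costs, num_cores):
    """Map tasks to cores greedily: largest task first, least-loaded core first.

    task_costs: dict of task name -> estimated execution cost.
    Returns (mapping of task -> core index, resulting makespan).
    """
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    # Min-heap of (current load, core index).
    heap = [(0, core) for core in range(num_cores)]
    heapq.heapify(heap)
    mapping = {}
    for task, cost in sorted(task_costs.items(), key=lambda kv: kv[1], reverse=True):
        load, core = heapq.heappop(heap)
        mapping[task] = core
        heapq.heappush(heap, (load + cost, core))
    makespan = max(load for load, _ in heap)
    return mapping, makespan


if __name__ == "__main__":
    tasks = {"a": 4, "b": 3, "c": 3, "d": 2}  # hypothetical costs
    mapping, makespan = greedy_map(tasks, num_cores=2)
    print(mapping, makespan)
```

A greedy heuristic of this kind runs quickly but gives no optimality guarantee, which is the usual trade-off against an integer linear programming formulation.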

ieee international conference on high performance computing data and analytics | 2012

Selective Resource Sharing with RT-Level Retiming for Clock Enhancement in High-Level Synthesis

Yuko Hara-Azumi; Toshinobu Matsuba; Hiroyuki Tomiyama; Shinya Honda; Hiroaki Takada

As the size and complexity of embedded systems grow, the area cost and performance of LSI circuits are becoming more crucial. A critical bottleneck for both is interconnect, such as multiplexers (MUXs). Thus, a hardware synthesis technique for reducing MUXs, especially during the earlier design phases, is in demand. This paper presents a novel MUX reduction technique in high-level synthesis. Our method simultaneously suppresses the area of both modules and MUXs by selectively sharing costly resources, and handles MUX insertion by register-transfer level register retiming so that the MUXs do not affect the clock frequency. Experiments demonstrate that our proposed method successfully achieves both area and clock improvement for practical designs compared with conventional methods.
