Yohei Nakata
Kobe University
Publication
Featured research published by Yohei Nakata.
International Symposium on Quality Electronic Design | 2013
Jinwook Jung; Yohei Nakata; Masahiko Yoshimoto; Hiroshi Kawaguchi
Large on-chip caches account for a considerable fraction of the total energy consumption in modern microprocessors. In this context, emerging Spin-Transfer Torque RAM (STT-RAM) has been regarded as a promising candidate to replace large on-chip SRAM caches by virtue of its zero leakage. However, the large write energy of STT-RAM, which results in substantial dynamic energy consumption, precludes its application to on-chip cache designs. To reduce the write energy of the STT-RAM cache, and thereby the total energy consumption, this paper presents an architectural technique that exploits the fact that many applications process a large amount of zero data. The proposed design adds flags to the cache tag arrays and sets them when the corresponding data in the cache line is zero-valued, that is, when all of its data bits are zero. Our experimental results show that the proposed cache design reduces the dynamic energy of write operations by 73.78% and 69.30% at byte and word granularity, respectively, and the total energy consumption by 36.18% and 42.51%, respectively. In addition to the energy reduction, performance evaluation results indicate that the proposed cache improves processor performance by 5.44% on average.
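The zero-flag idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' hardware design: the function names `zero_flags` and `bytes_written` are hypothetical, and the sketch assumes that a chunk whose flag is set needs no write to the data array, while ignoring the cost of writing the flag itself.

```python
def zero_flags(line: bytes, granularity: int) -> list[bool]:
    """Return one flag per chunk; True means every bit in that chunk is zero."""
    if len(line) % granularity != 0:
        raise ValueError("line length must be a multiple of the granularity")
    return [
        not any(line[i:i + granularity])
        for i in range(0, len(line), granularity)
    ]


def bytes_written(line: bytes, granularity: int) -> int:
    """Count data-array bytes still written when zero-valued chunks are skipped.

    Assumption: a set flag fully replaces the data write for that chunk.
    """
    flags = zero_flags(line, granularity)
    return sum(granularity for is_zero in flags if not is_zero)


# A 16-byte line that is mostly zero (illustrative data, not from the paper).
line = bytes([0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0])
print(bytes_written(line, granularity=1))  # byte granularity
print(bytes_written(line, granularity=4))  # 4-byte word granularity
```

In this toy line, finer granularity skips more data because a single nonzero byte forces the whole word to be written at word granularity. How that trades off against the extra flag bits is what the reported byte and word figures capture.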
Custom Integrated Circuits Conference | 2010
Shunsuke Okumura; Shusuke Yoshimoto; Kosuke Yamaguchi; Yohei Nakata; Hiroshi Kawaguchi; Masahiko Yoshimoto
This paper proposes a 7T SRAM that realizes a block-level simultaneous copying feature. The proposed SRAM can be used for data transfer between local memories such as checkpoint data storage and transactional memory. The 1-Mb SRAM comprises 32-kb blocks, in which 16-kb data can be copied in 33.3 ns at 1.2 V. The proposed scheme reduces the energy consumed in copying by 92.7% compared with the conventional read-modify-write approach.
International Symposium on Quality Electronic Design | 2012
Yuki Kagiyama; Shunsuke Okumura; Koji Yanagida; Shusuke Yoshimoto; Yohei Nakata; Shintaro Izumi; Hiroshi Kawaguchi; Masahiko Yoshimoto
SRAM performance varies depending on the operating environment. This study examines the bit error rate (BER) under temperature fluctuation. SRAM performance is generally determined by the read margin because the half-select issue must be considered even in a write operation. As a metric of SRAM performance, we also adopt the static noise margin (SNM), with which we evaluate three methods of estimating the BER under temperature fluctuation. Method 1 computes the SNM repeatedly by Monte Carlo simulation; its BER is defined as the number of cells that have no margin. Method 2 assumes that the SNM follows a normal distribution; its BER is defined as a probability distribution function. Method 3 assumes that the SNM is determined by either of the two squares rather than by the smaller of the two. The BER estimates are compared with measurements from a test chip implemented in a 65-nm CMOS technology: Method 2 differs by 11.10% and Method 3 by 4.09% (Method 1 yields no data because too few simulations were run). The shift of the minimum operating voltage between the low and high temperatures is 0.04 V at a 128-Kb capacity when the temperature fluctuates from 25°C to 100°C.
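The difference between Methods 1 and 2 can be sketched in Python. This is an illustration under stated assumptions, not the authors' procedure: the SNM samples are synthetic (drawn from an assumed normal distribution rather than a circuit simulation), the function names are hypothetical, the counting estimate is expressed as a fraction of cells, and Method 2 is interpreted as the fitted probability that the SNM is zero or negative.

```python
import random
from statistics import NormalDist


def ber_by_counting(snm_samples):
    """Method 1 style: fraction of sampled cells whose SNM is zero or negative."""
    failures = sum(1 for snm in snm_samples if snm <= 0.0)
    return failures / len(snm_samples)


def ber_by_normal_fit(snm_samples):
    """Method 2 style: fit a normal distribution and take P(SNM <= 0)."""
    fitted = NormalDist.from_samples(snm_samples)
    return fitted.cdf(0.0)


# Synthetic SNM values in volts (assumed mean 0.10 V, sigma 0.05 V, fixed seed).
rng = random.Random(0)
samples = [rng.gauss(0.10, 0.05) for _ in range(10_000)]

print(f"counting:   {ber_by_counting(samples):.4f}")
print(f"normal fit: {ber_by_normal_fit(samples):.4f}")
```

The counting estimate needs enough samples to observe failures at all, which becomes expensive at low error rates. The fitted estimate extrapolates into the tail from far fewer samples, but it is only as good as the normality assumption.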
International Symposium on Low Power Electronics and Design | 2010
Yohei Nakata; Shunsuke Okumura; Hiroshi Kawaguchi; Masahiko Yoshimoto
This paper presents a novel cache architecture using 7T/14T hybrid SRAM, which can dynamically improve its reliability with control lines. Our proposed 14T word-enhancing scheme can enhance the operating margin at word granularity by combining two words in a low-voltage mode. The proposed scheme is suitable for dynamic voltage and frequency scaling (DVFS). In a 65-nm process, it can reduce the minimum operating voltage (Vmin) to 0.5 V, which is 42% and 21% lower than that of the conventional 6T SRAM and the cache word-disable scheme, respectively. The respective power reductions are 90% and 65%.
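As a back-of-the-envelope check, the percentages above imply approximate baseline Vmin values. The Python sketch below only rearranges the stated numbers, assuming that "X% lower" means Vmin = baseline × (1 − X/100). The resulting baselines are derived estimates, not values reported in the abstract, and the function name is hypothetical.

```python
def implied_baseline(vmin_new: float, reduction_pct: float) -> float:
    """Baseline voltage such that vmin_new is reduction_pct percent lower."""
    return vmin_new / (1.0 - reduction_pct / 100.0)


vmin_proposed = 0.5  # volts, as stated in the abstract

# Derived estimates only; the abstract does not report these baselines.
vmin_6t = implied_baseline(vmin_proposed, 42.0)
vmin_word_disable = implied_baseline(vmin_proposed, 21.0)

print(f"implied 6T Vmin:           {vmin_6t:.2f} V")
print(f"implied word-disable Vmin: {vmin_word_disable:.2f} V")
```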
International Conference on Electronics, Circuits, and Systems | 2011
Jinwook Jung; Yohei Nakata; Shunsuke Okumura; Hiroshi Kawaguchi; Masahiko Yoshimoto
This paper presents a dependable cache memory whose associativity can be reconfigured dynamically. The proposed associativity-reconfigurable cache consists of pairs of cache ways. Each pair has two modes: the normal mode and the dependable mode. The proposed cache can dynamically enhance its reliability in the dependable mode at the cost of performance. The reliability of the proposed cache can be scaled by reconfiguring its associativity. Moreover, the configuration can be chosen based on current operating conditions. Our chip measurement results show that the proposed dependable cache exhibits scalable reliability. Moreover, it can decrease the minimum operating voltage by 115 mV. Cycle-accurate simulation shows that designing the L1 and L2 caches with the proposed scheme results in a 4.93% IPC loss on average. Area estimation results show that the proposed cache adds an area overhead of 1.91% and 5.57% in 32-KB and 256-KB caches, respectively.
Dependable Systems and Networks | 2011
Yohei Nakata; Yasuhiro Ito; Yasuo Sugure; Shigeru Oho; Yusuke Takeuchi; Shunsuke Okumura; Hiroshi Kawaguchi; Masahiko Yoshimoto
We propose a fault-injection system (FIS) that can inject faults such as read/write margin failures and soft errors into an SRAM environment. The fault case generator (FCG) generates time-series SRAM failures in 7T/14T or 6T SRAM, and the proposed device model and fault-injection flow are applicable to system-level verification. For evaluation, the abnormal termination rate in vehicle engine control was adopted. We confirmed that the vehicle engine control system with the 7T/14T SRAM improves system-level dependability compared with the conventional 6T SRAM.
Custom Integrated Circuits Conference | 2011
Shunsuke Okumura; Yohei Nakata; Koji Yanagida; Yuki Kagiyama; Shusuke Yoshimoto; Hiroshi Kawaguchi; Masahiko Yoshimoto
This paper proposes a 7T SRAM that realizes a block-level instantaneous comparison feature. The proposed SRAM is useful for comparing operation results in dual modular redundancy (DMR). The data size that can be instantaneously compared is scalable using the proposed structure. The 1-Mb SRAM comprises 16-kb blocks in which 8-kb data can be compared in 130.0 ns. The proposed scheme reduces power consumption in data comparison by 92.3% compared with that of a parallel cyclic redundancy check (CRC) circuit.
International Conference on Intelligent Control and Information Processing | 2010
Yukihiro Takeuchi; Yohei Nakata; Hiroshi Kawaguchi; Masahiko Yoshimoto
Although the H.264/AVC video compression format delivers high quality, its heavy workload complicates real-time encoding. This paper describes scalable parallel processing for H.264/AVC. Macroblock (MB)-level decomposition scales better than conventional methods as the number of threads increases. Moreover, it offers memory bandwidth advantages. This parallel algorithm can be improved using a motion estimation algorithm that distributes the workload among threads. Complementary recursive cross search (CRCS) is used to achieve efficient video encoding with MB-level decomposition. For HDTV, MB-level decomposition with CRCS increases the frame rate over the conventional method by 2.4 times with B-frames and 4.6 times without B-frames. Furthermore, the method suppresses memory accesses despite its higher processing efficiency. Results show that MB-level decomposition with CRCS is suitable for computing in the many-core processor era.
IFAC Proceedings Volumes | 2013
Yasuo Sugure; Yasuhiro Ito; Yohei Nakata; Yusuke Takeuchi; Hiroshi Kawaguchi; Masahiko Yoshimoto; Shigeru Oho
We propose a virtual prototyping system for failure mode and effect analysis (FMEA). The virtual prototyping system, which consists of a co-simulation environment coupling a mechanics model and a microcontroller model, is integrated with a fault-injection system that can inject faults into SRAM. This approach was applied to the validation of vehicle engine control. We observed abnormal system behavior caused by an SRAM fault. Thus, the virtual prototyping system with the fault-injection system can simulate vehicle engine control behavior under faults without actual components.
Digital Systems Design | 2011
Yohei Nakata; Yukihiro Takeuchi; Hiroshi Kawaguchi; Masahiko Yoshimoto
As process technology is scaled down, a typical system on a chip (SoC) becomes denser. In scaled process technology, process variation becomes greater and increasingly affects the SoC circuits. Process variation strongly affects networks-on-chip (NoCs), which have a synchronous network across the chip, by degrading the network frequency. We propose a process-variation-adaptive NoC with a variation-adaptive variable-cycle router (VAVCR). The proposed VAVCR can configure its cycle latency adaptively according to process variation. It can increase the network frequency, which in a conventional router is limited by the slowest network component. The proposed VAVCR reduces the total execution time by 14.9% on average for five task graphs.
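The effect of variable cycle latency on network frequency can be illustrated with a toy Python model. It is a sketch under assumptions, not the VAVCR design: each router is reduced to a single critical-path delay, a slow router is assumed to take ceil(delay / period) clock cycles when the clock is faster than its delay, and all names and delay values are hypothetical.

```python
import math


def conventional_period(delays_ns):
    """Single-cycle routers: the clock period is set by the slowest router."""
    return max(delays_ns)


def adaptive_cycles(delays_ns, period_ns):
    """Cycles each router needs at the given clock period (assumed model)."""
    return [math.ceil(d / period_ns) for d in delays_ns]


def path_latency_ns(cycles, period_ns):
    """Latency of a packet that traverses every router once."""
    return sum(cycles) * period_ns


# Illustrative delays: three typical routers and one slowed by variation.
delays = [1.0, 1.0, 1.0, 1.5]

slow_clock = conventional_period(delays)
conventional = path_latency_ns([1] * len(delays), slow_clock)

fast_clock = 1.0  # clock chosen for the typical routers
adaptive = path_latency_ns(adaptive_cycles(delays, fast_clock), fast_clock)

print(f"conventional: {conventional} ns, adaptive: {adaptive} ns")
```

In this model only the slow router pays an extra cycle, while every other hop runs at the faster clock. A real router would also have pipeline and arbitration effects that the sketch leaves out.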