Publication


Featured research published by Kodai Ueyoshi.


Symposium on VLSI Circuits | 2017

BRein memory: A 13-layer 4.2 K neuron/0.8 M synapse binary/ternary reconfigurable in-memory deep neural network accelerator in 65 nm CMOS

Kota Ando; Kodai Ueyoshi; Kentaro Orimo; Haruyoshi Yonekawa; Shimpei Sato; Hiroki Nakahara; Masayuki Ikebe; Tetsuya Asai; Shinya Takamaeda-Yamazaki; Tadahiro Kuroda; Masato Motomura

A versatile reconfigurable accelerator for binary/ternary deep neural networks (DNNs) is presented. It features a massively parallel in-memory processing architecture and stores a variety of binary/ternary DNNs, with a maximum of 13 layers, 4.2 K neurons, and 0.8 M synapses, on chip. The 0.6 W, 1.4 TOPS chip achieves performance and energy efficiency 10–10^2 and 10^2–10^4 times better than CPU, GPU, and FPGA implementations, respectively.
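As a rough illustration of why binary arithmetic maps so well onto in-memory hardware, the sketch below implements a binary (+1/−1) dot product as an XNOR-popcount in numpy. The encoding and function names are illustrative, not the chip's actual datapath.

```python
import numpy as np

def binary_dot(x_bits, w_bits):
    """Binary dot product computed as XNOR + popcount.

    x_bits, w_bits: 0/1 arrays encoding -1/+1 activations and weights.
    Returns the equivalent +/-1 dot product as a signed integer.
    """
    n = x_bits.size
    matches = np.count_nonzero(x_bits == w_bits)  # XNOR, then popcount
    return 2 * matches - n                        # back to +/-1 arithmetic

# Example: an 8-element binary vector pair
rng = np.random.default_rng(0)
x = rng.integers(0, 2, 8, dtype=np.uint8)
w = rng.integers(0, 2, 8, dtype=np.uint8)
assert binary_dot(x, w) == np.dot(2 * x.astype(int) - 1, 2 * w.astype(int) - 1)
```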


IEEE Transactions on Circuits and Systems II: Express Briefs | 2017

Error Tolerance Analysis of Deep Learning Hardware Using a Restricted Boltzmann Machine Toward Low-Power Memory Implementation

Takao Marukame; Kodai Ueyoshi; Tetsuya Asai; Masato Motomura; Alexandre Schmid; Masamichi Suzuki; Yusuke Higashi; Yuichiro Mitani

Remarkable hardware robustness of deep learning (DL) is revealed by error-injection analyses performed using a custom hardware model implementing parallelized restricted Boltzmann machines (RBMs). RBMs in deep belief networks demonstrate robustness against memory errors during and after learning. Fine-tuning significantly affects the recovery of accuracy for static errors injected into the structural data of RBMs. The memory-error tolerance is observable using our hardware networks with fine-grained memory distribution, resulting in reliable DL hardware with low-voltage-driven memory suitable for low-power applications.
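A minimal sketch of the kind of error injection such an analysis relies on is shown below, assuming weights stored as 16-bit fixed-point words; the word format and error model are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def inject_bit_flips(weights_q, p_flip, seed=0):
    """Flip each bit of 16-bit fixed-point weight words independently
    with probability p_flip, modeling static memory errors."""
    rng = np.random.default_rng(seed)
    w = weights_q.astype(np.int16).view(np.uint16).copy()
    for b in range(16):
        flips = rng.random(w.shape) < p_flip        # which words flip bit b
        w ^= (flips * (1 << b)).astype(np.uint16)   # XOR toggles that bit
    return w.view(np.int16)                         # back to signed words

# Example: corrupt a small Q8.8 weight matrix at a 1% bit-error rate
w_q = np.array([[256, -128], [64, 512]], dtype=np.int16)
w_err = inject_bit_flips(w_q, p_flip=0.01)
```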


Reconfigurable Computing and FPGAs | 2016

FPGA architecture for feed-forward sequential memory network targeting long-term time-series forecasting

Kentaro Orimo; Kota Ando; Kodai Ueyoshi; Masayuki Ikebe; Tetsuya Asai; Masato Motomura

Deep learning is widely used in various applications, and diverse neural networks have been proposed. One such model, the recently proposed feed-forward sequential memory network (FSMN), aims to forecast prospective data by extracting time-series features. The FSMN is a standard feed-forward neural network equipped with time-domain filters, which allows it to forecast without recurrent feedback. In this paper, we propose a field-programmable gate array (FPGA) architecture for this model and show that its resource usage does not grow exponentially as the network scale increases.
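For concreteness, a minimal numpy sketch of the FSMN memory block follows: each output is a finite, learned time-domain filter over past hidden activations, which is what removes the need for recurrent feedback. The scalar, unidirectional form shown here is an assumption for illustration.

```python
import numpy as np

def fsmn_memory_block(h, a):
    """FSMN-style time-domain filter over a hidden-state sequence.

    h: (T, D) hidden activations over T time steps.
    a: (N,) filter taps, a[0] applied to the current step.
    Returns (T, D) memory outputs: m[t] = sum_i a[i] * h[t - i].
    """
    m = np.zeros_like(h)
    for t in range(h.shape[0]):
        for i, tap in enumerate(a):
            if t - i >= 0:
                m[t] += tap * h[t - i]   # weighted sum of past activations
    return m

# Example: 10-step sequence, 4-dim hidden layer, 3-tap filter
h = np.random.default_rng(0).standard_normal((10, 4))
m = fsmn_memory_block(h, np.array([0.5, 0.3, 0.2]))
```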


International Symposium on Circuits and Systems | 2016

Memory-error tolerance of scalable and highly parallel architecture for restricted Boltzmann machines in Deep Belief Network

Kodai Ueyoshi; Takao Marukame; Tetsuya Asai; Masato Motomura; Alexandre Schmid

A key aspect of constructing highly scalable deep-learning microelectronic systems is implementing fault tolerance in the learning sequence. Error-injection analyses for memory are performed using a custom hardware model implementing parallelized restricted Boltzmann machines (RBMs). They confirm that RBMs in Deep Belief Networks (DBNs) provide remarkable robustness against memory errors. Fine-tuning has significant effects on the recovery of accuracy for static errors injected into the structural data of RBMs during and after learning, at either the cell or block level. The memory-error tolerance is observable using our hardware networks with fine-grained memory distribution.
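The parallelism the custom hardware model exploits comes from the RBM's conditional independence: given the visible vector, every hidden unit can be evaluated at once. A minimal numpy sketch of that conditional, with illustrative sizes, is below.

```python
import numpy as np

def rbm_hidden_probs(v, W, b):
    """RBM conditional: p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i * W_ij).

    Every hidden unit depends only on the visible vector, so all units
    can be computed in parallel -- the property the hardware exploits.
    """
    return 1.0 / (1.0 + np.exp(-(b + v @ W)))

# Example: 6 visible units, 4 hidden units
rng = np.random.default_rng(0)
v = rng.integers(0, 2, 6).astype(float)
W = 0.1 * rng.standard_normal((6, 4))
b = np.zeros(4)
p_h = rbm_hidden_probs(v, W, b)
```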


International Solid-State Circuits Conference | 2018

QUEST: A 7.49TOPS multi-purpose log-quantized DNN inference engine stacked on 96MB 3D SRAM using inductive-coupling technology in 40nm CMOS

Kodai Ueyoshi; Kota Ando; Kazutoshi Hirose; Shinya Takamaeda-Yamazaki; Junichiro Kadomoto; Tomoki Miyata; Mototsugu Hamada; Tadahiro Kuroda; Masato Motomura

A key consideration for deep neural network (DNN) inference accelerators is the need for large, high-bandwidth external memories. Although an architectural concept for stacking a DNN accelerator with DRAMs has been proposed previously, long DRAM latency remains problematic and limits performance [1]. Recent algorithm-level optimizations, such as network pruning and compression, have succeeded in reducing DNN memory size [2]; however, because the networks become irregular and sparse, they create an additional need for agile random access to the memory system.
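The title's log quantization means each weight is approximated by a signed power of two, so every multiply in a layer reduces to a bit shift of the activation. The sketch below shows the idea; the exponent range and rounding rule are illustrative assumptions, not the chip's actual scheme.

```python
import numpy as np

def log_quantize(w, e_min=-8, e_max=0):
    """Quantize weights to signed powers of two: w_q = sign(w) * 2**e.

    With log-quantized weights, each multiply in a DNN layer reduces
    to a bit shift of the activation.
    """
    e = np.clip(np.round(np.log2(np.abs(w) + 1e-12)), e_min, e_max)
    return np.sign(w) * 2.0 ** e

# Example: a few weights and their power-of-two approximations
print(log_quantize(np.array([0.30, -0.07, 0.9])))  # [0.25, -0.0625, 1.0]
```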


IEEE Journal of Solid-State Circuits | 2018

BRein Memory: A Single-Chip Binary/Ternary Reconfigurable in-Memory Deep Neural Network Accelerator Achieving 1.4 TOPS at 0.6 W

Kota Ando; Kodai Ueyoshi; Kentaro Orimo; Haruyoshi Yonekawa; Shimpei Sato; Hiroki Nakahara; Shinya Takamaeda-Yamazaki; Masayuki Ikebe; Tetsuya Asai; Tadahiro Kuroda; Masato Motomura

A versatile reconfigurable accelerator architecture for binary/ternary deep neural networks is presented. In-memory neural network processing without any external data access, sustained by the symmetry and simplicity of binary/ternary neural network computation, dramatically improves energy efficiency. A prototype chip is fabricated, achieving 1.4 TOPS (tera operations per second) peak performance with 0.6 W power consumption at a 400 MHz clock. Example applications are also examined.
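Extending the binary case, ternary weights in {−1, 0, +1} turn every multiply into an add, a subtract, or a skip, which is part of the computational simplicity the abstract refers to. A minimal sketch, with illustrative values:

```python
import numpy as np

def ternary_dot(x, w):
    """Dot product with ternary weights in {-1, 0, +1}: each multiply
    degenerates to an add, a subtract, or a skip."""
    acc = 0
    for xi, wi in zip(x, w):
        if wi == 1:
            acc += xi        # +1 weight: accumulate
        elif wi == -1:
            acc -= xi        # -1 weight: subtract
        # 0 weight: skipped entirely
    return acc

# Example
x = np.array([3, -1, 2, 5])
w = np.array([1, 0, -1, 1])
assert ternary_dot(x, w) == np.dot(x, w)
```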


International Symposium on Neural Networks | 2017

Exploring optimized accelerator design for binarized convolutional neural networks

Kodai Ueyoshi; Kota Ando; Kentaro Orimo; Masayuki Ikebe; Tetsuya Asai; Masato Motomura

The convolutional neural network (CNN) is a state-of-the-art model that can achieve significantly high accuracy in many machine-learning tasks. Recently, to further develop practical applications of CNNs, efficient hardware platforms for accelerating them have been thoroughly studied. Binarized neural networks have been reported to minimize the multipliers, which consume a large amount of resources, with a minimal decrease in accuracy. In this study, we analyzed the optimal performance of a CNN implemented on a field-programmable gate array (FPGA), considering its logic resources and memory bandwidth and using multiple types of parallelism (kernel, pixel, and channel) in both conventional and binarized CNNs. The results show that all of these forms of parallelism are required for the binarized network to obtain its best performance of 8.38 TOPS.
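The trade-off the study explores, logic resources versus memory bandwidth, can be captured by a simple roofline-style bound: attainable throughput is the lower of the compute peak and the bandwidth-limited rate. The sketch below uses entirely hypothetical numbers, not the paper's measurements.

```python
def attainable_tops(ops_per_cycle, clock_ghz, bandwidth_gbs, ops_per_byte):
    """Roofline-style bound: throughput is capped either by parallel
    compute realized in logic or by external memory bandwidth."""
    compute_bound = ops_per_cycle * clock_ghz / 1e3    # GOPS -> TOPS
    memory_bound = bandwidth_gbs * ops_per_byte / 1e3  # GOPS -> TOPS
    return min(compute_bound, memory_bound)

# Hypothetical: 65536 binary ops/cycle at 200 MHz, 12.8 GB/s DRAM,
# 512 ops per byte fetched -> memory-bound at ~6.6 TOPS
print(attainable_tops(65536, 0.2, 12.8, 512))
```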


International Symposium on Circuits and Systems | 2017

Live demonstration: Feature extraction system using restricted Boltzmann machines on FPGA

Kodai Ueyoshi; Takao Marukame; Tetsuya Asai; Masato Motomura; Alexandre Schmid

Real-time results obtained from an unsupervised feature extraction system using restricted Boltzmann machines (RBMs) implemented on an FPGA are presented. The feature extraction application is demonstrated using the MNIST dataset, and the weights storing the learned features are visualized in real time. Digit classification is also performed based on the learning results. Our demonstration system runs 134 times faster than a conventional CPU implementation.


International Midwest Symposium on Circuits and Systems | 2017

In-memory area-efficient signal streaming processor design for binary neural networks

Haruyoshi Yonekawa; Shimpei Sato; Hiroki Nakahara; Kota Ando; Kodai Ueyoshi; Kazutoshi Hirose; Kentaro Orimo; Shinya Takamaeda-Yamazaki; Masayuki Ikebe; Tetsuya Asai; Masato Motomura

The expanding use of deep learning algorithms increases the demand for accelerated neural network (NN) signal processing. In-memory computation, which eliminates expensive data transfers, is desirable for NN processing. Reflecting recently proposed binary neural networks (BNNs), which reduce computation resource and area requirements, we designed an in-memory BNN signal processor that densely stores binary weights in on-chip memories and scales linearly with a serial-parallel-serial signal stream. It achieves 3 and 71 times better per-power and per-area performance, respectively, than an existing in-memory neuromorphic processor.


International Conference on Innovative Techniques and Applications of Artificial Intelligence | 2017

Quantization Error-Based Regularization in Neural Networks

Kazutoshi Hirose; Kota Ando; Kodai Ueyoshi; Masayuki Ikebe; Tetsuya Asai; Masato Motomura; Shinya Takamaeda-Yamazaki

Deep neural networks are a state-of-the-art technology for achieving high accuracy in various machine learning tasks. Since the available computing power and memory footprint are restricted in embedded computing, precision quantization of numerical representations, such as fixed-point, binary, and logarithmic, is commonly used for higher computing efficiency. The main problem of quantization is accuracy degradation due to the lower-precision numerical representation: there is generally a trade-off between numerical precision and accuracy. In this paper, we propose a quantization-error-aware training method to attain higher accuracy in quantized neural networks. Our approach appends to the loss function an additional regularization term based on the quantization errors of the weights. We evaluate accuracy using MNIST and CIFAR-10. The evaluation results show that the proposed approach achieves higher accuracy than the standard approach with quantized forwarding.
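The proposed regularizer can be sketched as follows: the loss gains a term penalizing the gap between full-precision weights and their quantized versions. The uniform quantizer, bit width, and weighting factor lam below are illustrative assumptions; the paper defines the exact formulation.

```python
import numpy as np

def quantize(w, bits=4):
    """Uniform fixed-point quantization to the given bit width."""
    scale = 2.0 ** (bits - 1)
    return np.clip(np.round(w * scale), -scale, scale - 1) / scale

def loss_with_qe_regularizer(task_loss, weights, lam=0.01, bits=4):
    """Total loss = task loss + lam * mean squared quantization error,
    nudging training toward weights that survive quantization."""
    qe = sum(np.mean((w - quantize(w, bits)) ** 2) for w in weights)
    return task_loss + lam * qe

# Example with two hypothetical weight tensors
rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 8)), rng.standard_normal(8)]
print(loss_with_qe_regularizer(task_loss=0.42, weights=weights))
```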

Collaboration


Dive into Kodai Ueyoshi's collaborations.

Top Co-Authors

Alexandre Schmid

École Polytechnique Fédérale de Lausanne
