Publication


Featured research published by Kota Ando.


Symposium on VLSI Circuits | 2017

BRein memory: A 13-layer 4.2 K neuron/0.8 M synapse binary/ternary reconfigurable in-memory deep neural network accelerator in 65 nm CMOS

Kota Ando; Kodai Ueyoshi; Kentaro Orimo; Haruyoshi Yonekawa; Shimpei Sato; Hiroki Nakahara; Masayuki Ikebe; Tetsuya Asai; Shinya Takamaeda-Yamazaki; Tadahiro Kuroda; Masato Motomura

A versatile reconfigurable accelerator for binary/ternary deep neural networks (DNNs) is presented. It features a massively parallel in-memory processing architecture and stores a variety of binary/ternary DNNs, with a maximum of 13 layers, 4.2 K neurons, and 0.8 M synapses, on chip. The 0.6 W, 1.4 TOPS chip achieves performance and energy efficiency that are 10–10² and 10²–10⁴ times better, respectively, than those of CPU/GPU/FPGA implementations.
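
As a rough illustration of the arithmetic such a binary/ternary accelerator parallelizes (a sketch under common BNN conventions, not the BRein design itself), a binary dot product reduces to XNOR plus popcount, and ternary weights add only a zero/skip state:

```python
# Minimal NumPy sketch: dot product of {-1,+1} vectors encoded as {0,1} bits.
# XNOR counts matching bits; popcount turns matches into the sum:
# dot = matches - mismatches = 2*popcount(XNOR) - n.
import numpy as np

def binary_dot(x_bits: np.ndarray, w_bits: np.ndarray) -> int:
    n = x_bits.size
    matches = np.count_nonzero(~(x_bits ^ w_bits) & 1)
    return 2 * matches - n

def ternary_dot(x_bits: np.ndarray, w: np.ndarray) -> int:
    """Same idea with weights in {-1, 0, +1}: zero weights are masked out."""
    mask = w != 0
    w_bits = (w[mask] > 0).astype(np.uint8)
    return binary_dot(x_bits[mask], w_bits)

rng = np.random.default_rng(0)
x = rng.integers(0, 2, 64, dtype=np.uint8)  # binarized activations as bits
w = rng.integers(-1, 2, 64)                 # ternary weights in {-1, 0, +1}
print(ternary_dot(x, w))
```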


Reconfigurable Computing and FPGAs | 2016

FPGA architecture for feed-forward sequential memory network targeting long-term time-series forecasting

Kentaro Orimo; Kota Ando; Kodai Ueyoshi; Masayuki Ikebe; Tetsuya Asai; Masato Motomura

Deep learning is being widely used in various applications, and diverse neural network models have been proposed. One such model, the feed-forward sequential memory network (FSMN), forecasts future data by extracting time-series features. The FSMN is a standard feed-forward neural network equipped with time-domain filters, which allows it to forecast without recurrent feedback. In this paper, we propose a field-programmable gate array (FPGA) architecture for this model and show that its resource usage does not increase exponentially as the network scale increases.
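
For intuition, here is a minimal NumPy sketch of the scalar-FSMN memory block as commonly formulated (the tap count and sizes are illustrative assumptions, not the paper's FPGA design): each layer's temporal context comes from a learned FIR filter over its own past activations, with no recurrent feedback.

```python
import numpy as np

def fsmn_memory_block(h: np.ndarray, a: np.ndarray) -> np.ndarray:
    """h: hidden activations over time, shape (T, D).
    a: filter taps, shape (N+1,) (scalar FSMN: one coefficient per lag).
    Returns the memory output m[t] = sum_i a[i] * h[t - i]."""
    T, D = h.shape
    m = np.zeros_like(h)
    for t in range(T):
        for i, a_i in enumerate(a):
            if t - i >= 0:          # purely feed-forward: past taps only
                m[t] += a_i * h[t - i]
    return m

T, D, N = 8, 4, 3
rng = np.random.default_rng(1)
h = rng.standard_normal((T, D))
a = rng.standard_normal(N + 1)
m = fsmn_memory_block(h, a)   # fed to the next layer alongside h[t]
print(m.shape)                # (8, 4)
```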


International Solid-State Circuits Conference | 2018

QUEST: A 7.49TOPS multi-purpose log-quantized DNN inference engine stacked on 96MB 3D SRAM using inductive-coupling technology in 40nm CMOS

Kodai Ueyoshi; Kota Ando; Kazutoshi Hirose; Shinya Takamaeda-Yamazaki; Junichiro Kadomoto; Tomoki Miyata; Mototsugu Hamada; Tadahiro Kuroda; Masato Motomura

A key consideration for deep neural network (DNN) inference accelerators is the need for large, high-bandwidth external memories. Although an architectural concept for stacking a DNN accelerator with DRAMs has been proposed previously, long DRAM latency remains problematic and limits performance [1]. Recent algorithm-level optimizations, such as network pruning and compression, have succeeded in reducing DNN memory size [2]; however, because the resulting networks are irregular and sparse, they create an additional need for agile random access to the memory system.
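
For context, log quantization as commonly formulated (the paper's exact scheme may differ) snaps weights to signed powers of two, so that each multiply in the datapath reduces to a bit shift. A minimal NumPy sketch:

```python
import numpy as np

def log_quantize(w: np.ndarray, min_exp: int = -8, max_exp: int = 0):
    """Return (sign, exponent) with w ≈ sign * 2**exponent."""
    sign = np.sign(w)
    exp = np.round(np.log2(np.maximum(np.abs(w), 2.0 ** min_exp)))
    exp = np.clip(exp, min_exp, max_exp).astype(int)
    return sign, exp

def log_matvec(sign, exp, x):
    """y = W @ x with W stored as (sign, exponent); 2.0**exp stands in
    here for what would be a hardware bit shift."""
    return (sign * (2.0 ** exp)) @ x

rng = np.random.default_rng(2)
w = rng.standard_normal((4, 16)) * 0.1
x = rng.standard_normal(16)
s, e = log_quantize(w)
print(np.max(np.abs(log_matvec(s, e, x) - w @ x)))  # quantization error
```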


IEEE Journal of Solid-State Circuits | 2018

BRein Memory: A Single-Chip Binary/Ternary Reconfigurable in-Memory Deep Neural Network Accelerator Achieving 1.4 TOPS at 0.6 W

Kota Ando; Kodai Ueyoshi; Kentaro Orimo; Haruyoshi Yonekawa; Shimpei Sato; Hiroki Nakahara; Shinya Takamaeda-Yamazaki; Masayuki Ikebe; Tetsuya Asai; Tadahiro Kuroda; Masato Motomura

A versatile reconfigurable accelerator architecture for binary/ternary deep neural networks is presented. In-memory neural network processing without any external data access, sustained by the symmetry and simplicity of binary/ternary neural network computation, dramatically improves energy efficiency. A prototype chip was fabricated and achieves a peak performance of 1.4 TOPS (tera operations per second) with 0.6-W power consumption at a 400-MHz clock. An application study is also presented.


International Symposium on Neural Networks | 2017

Exploring optimized accelerator design for binarized convolutional neural networks

Kodai Ueyoshi; Kota Ando; Kentaro Orimo; Masayuki Ikebe; Tetsuya Asai; Masato Motomura

The convolutional neural network (CNN) is a state-of-the-art model that achieves very high accuracy in many machine-learning tasks. Recently, to further develop practical applications of CNNs, efficient hardware platforms for accelerating them have been thoroughly studied. Binarized neural networks have been reported to minimize the use of multipliers, which consume a large amount of resources, with a minimal decrease in accuracy. In this study, we analyzed the optimal performance of a CNN implemented on a field-programmable gate array (FPGA), considering its logic resources and memory bandwidth and using multiple forms of parallelism (kernel, pixel, and channel) in both conventional and binarized CNNs. The results show that all of these forms of parallelism are required for the binarized network to reach its best performance of 8.38 TOPS.
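
As a simplified view of the three parallelism axes the study explores (an illustrative loop nest, not the paper's hardware), a direct convolution exposes kernel, pixel, and channel loops that an FPGA implementation would unroll:

```python
import numpy as np

def conv2d_direct(x, w):
    """x: (C, H, W) input, w: (K, C, R, S) kernels -> y: (K, H-R+1, W-S+1)."""
    C, H, W_ = x.shape
    K, _, R, S = w.shape
    y = np.zeros((K, H - R + 1, W_ - S + 1))
    for k in range(K):                    # kernel parallelism: unroll over K
        for oy in range(y.shape[1]):      # pixel parallelism: unroll over
            for ox in range(y.shape[2]):  #   output positions
                for c in range(C):        # channel parallelism: unroll over C
                    for r in range(R):
                        for s in range(S):
                            y[k, oy, ox] += w[k, c, r, s] * x[c, oy + r, ox + s]
    return y

x = np.random.default_rng(3).standard_normal((3, 8, 8))
w = np.random.default_rng(4).standard_normal((4, 3, 3, 3))
print(conv2d_direct(x, w).shape)  # (4, 6, 6)
```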


International Midwest Symposium on Circuits and Systems | 2017

In-memory area-efficient signal streaming processor design for binary neural networks

Haruyoshi Yonekawa; Shimpei Sato; Hiroki Nakahara; Kota Ando; Kodai Ueyoshi; Kazutoshi Hirose; Kentaro Orimo; Shinya Takamaeda-Yamazaki; Masayuki Ikebe; Tetsuya Asai; Masato Motomura

The expanding use of deep learning algorithms is driving demand for accelerated neural network (NN) signal processing. In-memory computation is desirable for NN processing because it eliminates expensive data transfers. Building on recently proposed binary neural networks (BNNs), which reduce computation resource and area requirements, we designed an in-memory BNN signal processor that densely stores binary weights in on-chip memories and scales linearly with a serial-parallel-serial signal stream. It achieves 3 and 71 times better per-power and per-area performance, respectively, than an existing in-memory neuromorphic processor.


International Conference on Innovative Techniques and Applications of Artificial Intelligence | 2017

Quantization Error-Based Regularization in Neural Networks

Kazutoshi Hirose; Kota Ando; Kodai Ueyoshi; Masayuki Ikebe; Tetsuya Asai; Masato Motomura; Shinya Takamaeda-Yamazaki

Deep neural networks are a state-of-the-art technology for achieving high accuracy in various machine-learning tasks. Since the available computing power and memory footprint are restricted in embedded computing, quantized numerical representations, such as fixed-point, binary, and logarithmic, are commonly used for higher computing efficiency. The main problem with quantization is accuracy degradation due to the lower-precision numerical representation; there is generally a trade-off between numerical precision and accuracy. In this paper, we propose a quantization-error-aware training method that attains higher accuracy in quantized neural networks. Our approach appends to the loss function an additional regularization term based on the quantization errors of the weights. We evaluate accuracy on MNIST and CIFAR-10, and the results show that the proposed approach achieves higher accuracy than the standard approach with quantized forward computation.
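
A minimal PyTorch sketch of this idea (the fixed-point quantizer and hyperparameters are illustrative assumptions, not the paper's exact setup): the loss gains a penalty on the distance between each weight and its quantized value, pulling training toward weights that survive quantization.

```python
import torch
import torch.nn as nn

def fixed_point_quantize(w: torch.Tensor, frac_bits: int = 4) -> torch.Tensor:
    # Round to a fixed-point grid with frac_bits fractional bits.
    scale = 2.0 ** frac_bits
    return torch.round(w * scale) / scale

def quant_error_regularizer(model: nn.Module, lam: float = 1e-3) -> torch.Tensor:
    reg = torch.zeros(())
    for p in model.parameters():
        # Q(w) is treated as a constant target, as in a standard penalty term.
        reg = reg + ((p - fixed_point_quantize(p).detach()) ** 2).sum()
    return lam * reg

model = nn.Linear(784, 10)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(model(x), y) + quant_error_regularizer(model)
loss.backward()
print(float(loss))
```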


Circuits and Systems | 2017

A Multithreaded CGRA for Convolutional Neural Network Processing

Kota Ando; Shinya Takamaeda-Yamazaki; Masayuki Ikebe; Tetsuya Asai; Masato Motomura


Nonlinear Theory and Its Applications, IEICE | 2018

Quantization error-based regularization for hardware-aware neural network training

Kazutoshi Hirose; Ryota Uematsu; Kota Ando; Kodai Ueyoshi; Masayuki Ikebe; Tetsuya Asai; Masato Motomura; Shinya Takamaeda-Yamazaki


IEEE Journal of Solid-State Circuits | 2018

QUEST: Multi-Purpose Log-Quantized DNN Inference Engine Stacked on 96-MB 3-D SRAM Using Inductive Coupling Technology in 40-nm CMOS

Kodai Ueyoshi; Kota Ando; Kazutoshi Hirose; Shinya Takamaeda-Yamazaki; Mototsugu Hamada; Tadahiro Kuroda; Masato Motomura

Collaboration


Dive into Kota Ando's collaborations.

Top Co-Authors


Haruyoshi Yonekawa

Tokyo Institute of Technology


Hiroki Nakahara

Tokyo Institute of Technology
