Caiwen Ding
Syracuse University
Publication
Featured research published by Caiwen Ding.
architectural support for programming languages and operating systems | 2017
Ao Ren; Zhe Li; Caiwen Ding; Qinru Qiu; Yanzhi Wang; Ji Li; Xuehai Qian; Bo Yuan
With the recent advances in wearable devices and the Internet of Things (IoT), it has become attractive to implement Deep Convolutional Neural Networks (DCNNs) in embedded and portable systems. Currently, executing software-based DCNNs requires high-performance servers, restricting their widespread deployment on embedded and mobile IoT devices. To overcome this obstacle, considerable research effort has gone into developing highly parallel and specialized DCNN accelerators using GPGPUs, FPGAs, or ASICs. Stochastic Computing (SC), which uses a bit-stream to represent a number within [-1, 1] by counting the number of ones in the bit-stream, has high potential for implementing DCNNs with high scalability and an ultra-low hardware footprint. Since multiplications and additions can be calculated using AND gates and multiplexers in SC, significant reductions in power (energy) and hardware footprint can be achieved compared to conventional binary arithmetic implementations. The tremendous savings in power (energy) and hardware resources open an immense design space for enhancing the scalability and robustness of hardware DCNNs. This paper presents SC-DCNN, the first comprehensive design and optimization framework for SC-based DCNNs, using a bottom-up approach. We first present the designs of function blocks that perform the basic operations in a DCNN, including inner product, pooling, and activation function. We then propose four designs of feature extraction blocks, which extract features from input feature maps by connecting different basic function blocks with joint optimization. Moreover, efficient weight storage methods are proposed to reduce the area and power (energy) consumption. Putting it all together, with feature extraction blocks carefully selected, SC-DCNN is holistically optimized to minimize area and power (energy) consumption while maintaining high network accuracy. Experimental results demonstrate that LeNet-5 implemented in SC-DCNN occupies only 17 mm2, consumes 1.53 W, and achieves a throughput of 781,250 images/s, an area efficiency of 45,946 images/s/mm2, and an energy efficiency of 510,734 images/J.
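The arithmetic described above can be seen in a few lines. Below is a minimal Python sketch of stochastic-computing multiplication and scaled addition, written in the simpler unipolar encoding (values in [0, 1]) rather than the bipolar [-1, 1] encoding the paper uses; the stream length, input values, and helper names are illustrative, not taken from SC-DCNN.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4096  # bit-stream length (illustrative); longer streams reduce quantization noise

def to_stream(p, n=N):
    """Encode a value p in [0, 1] as a random bit-stream whose fraction of ones approximates p."""
    return (rng.random(n) < p).astype(np.uint8)

def from_stream(s):
    """Decode a bit-stream by counting its ones (in hardware, a simple counter)."""
    return s.mean()

a, b = 0.8, 0.5
sa, sb = to_stream(a), to_stream(b)

# Multiplication: one AND gate per bit position, since for independent streams
# P(sa & sb = 1) = P(sa = 1) * P(sb = 1).
prod = sa & sb
print(from_stream(prod), "~=", a * b)            # ~0.40

# Scaled addition: a multiplexer with a 0.5-probability select stream
# computes (a + b) / 2, keeping the result inside the representable range.
sel = to_stream(0.5)
total = np.where(sel == 1, sa, sb)
print(from_stream(total), "~=", (a + b) / 2)     # ~0.65
```

Longer streams tighten the estimates at the cost of latency, which is the basic accuracy/latency trade-off in SC designs.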
international symposium on microarchitecture | 2017
Caiwen Ding; Siyu Liao; Yanzhi Wang; Zhe Li; Ning Liu; Youwei Zhuo; Chao Wang; Xuehai Qian; Yu Bai; Geng Yuan; Xiaolong Ma; Yipeng Zhang; Jian Tang; Qinru Qiu; Xue Lin; Bo Yuan
Large-scale deep neural networks (DNNs) are both compute and memory intensive. As the size of DNNs continues to grow, it is critical to improve their energy efficiency and performance while maintaining accuracy. For DNNs, the model size is an important factor affecting performance, scalability, and energy efficiency. Weight pruning achieves good compression ratios but suffers from three drawbacks: 1) the irregular network structure after pruning, which affects performance and throughput; 2) the increased training complexity; and 3) the lack of a rigorous guarantee of compression ratio and inference accuracy. To overcome these limitations, this paper proposes CirCNN, a principled approach to represent weights and process neural networks using block-circulant matrices. CirCNN utilizes Fast Fourier Transform (FFT)-based fast multiplication, simultaneously reducing the computational complexity (both in inference and training) from O(n^2) to O(n log n) and the storage complexity from O(n^2) to O(n), with negligible accuracy loss. Compared to other approaches, CirCNN is distinct due to its mathematical rigor: DNNs based on CirCNN can converge to the same "effectiveness" as DNNs without compression. We propose the CirCNN architecture, a universal DNN inference engine that can be implemented in various hardware/software platforms with configurable network architecture (e.g., layer type, size, scales, etc.). In the CirCNN architecture: 1) due to the recursive property, FFT can be used as the key computing kernel, which ensures universal and small-footprint implementations; 2) the compressed but regular network structure avoids the pitfalls of network pruning and facilitates high performance and throughput with a highly pipelined and parallel design. To demonstrate the performance and energy efficiency, we test CirCNN on FPGA, ASIC, and embedded processors. Our results show that the CirCNN architecture achieves very high energy efficiency and performance with a small hardware footprint. Based on the FPGA implementation and ASIC synthesis results, CirCNN achieves 6-102X energy efficiency improvements compared with the best state-of-the-art results. CCS Concepts: • Computer systems organization → Embedded hardware.
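To make the complexity claim concrete, here is a minimal NumPy/SciPy sketch of the block-circulant trick for a single block, under the usual convention that a circulant matrix is defined by its first column; the block size and variable names are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.linalg import circulant

rng = np.random.default_rng(0)
n = 8  # block size (illustrative)

w = rng.standard_normal(n)   # one length-n vector defines the whole block: O(n) storage
x = rng.standard_normal(n)   # input slice for this block

# Dense reference: materialize the n x n circulant block and multiply in O(n^2).
W = circulant(w)             # first column is w; every other column is a cyclic shift
y_dense = W @ x

# FFT path: a circulant matrix-vector product is a circular convolution,
# computed in O(n log n) as IFFT(FFT(w) * FFT(x)).
y_fft = np.real(np.fft.ifft(np.fft.fft(w) * np.fft.fft(x)))

print(np.allclose(y_dense, y_fft))   # True: identical result at lower complexity
```

In a full layer the weight matrix is partitioned into such blocks and the per-block products are accumulated, which is what reduces storage from O(n^2) to O(n) and computation from O(n^2) to O(n log n) per block.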
asia and south pacific design automation conference | 2017
Ji Li; Ao Ren; Zhe Li; Caiwen Ding; Bo Yuan; Qinru Qiu; Yanzhi Wang
international symposium on neural networks | 2017
Ji Li; Zihao Yuan; Zhe Li; Caiwen Ding; Ao Ren; Qinru Qiu; Jeffrey Draper; Yanzhi Wang
international green and sustainable computing conference | 2015
Soroush Heidari; Caiwen Ding; Yongpan Liu; Yanzhi Wang; Jingtong Hu
great lakes symposium on vlsi | 2017
Zihao Yuan; Ji Li; Zhe Li; Caiwen Ding; Ao Ren; Bo Yuan; Qinru Qiu; Jeffrey Draper; Yanzhi Wang
international symposium on circuits and systems | 2016
Caiwen Ding; Soroush Heidari; Yanzhi Wang; Yongpan Liu; Jingtong Hu
great lakes symposium on vlsi | 2016
Ning Liu; Caiwen Ding; Yanzhi Wang; Jingtong Hu
field programmable gate arrays | 2018
Shuo Wang; Zhe Li; Caiwen Ding; Bo Yuan; Qinru Qiu; Yanzhi Wang; Yun Liang
IEEE Design & Test of Computers | 2017
Caiwen Ding; Ning Liu; Yanzhi Wang; Ji Li; Soroush Heidari; Jingtong Hu; Yongpan Liu