Caiwen Ding
Syracuse University
Publication
Featured research published by Caiwen Ding.
architectural support for programming languages and operating systems | 2017
Ao Ren; Zhe Li; Caiwen Ding; Qinru Qiu; Yanzhi Wang; Ji Li; Xuehai Qian; Bo Yuan
With the recent advances in wearable devices and the Internet of Things (IoT), it has become attractive to implement Deep Convolutional Neural Networks (DCNNs) in embedded and portable systems. Currently, executing software-based DCNNs requires high-performance servers, restricting their widespread deployment on embedded and mobile IoT devices. To overcome this obstacle, considerable research effort has gone into developing highly parallel and specialized DCNN accelerators using GPGPUs, FPGAs, or ASICs. Stochastic Computing (SC), which uses a bit-stream to represent a number within [-1, 1] by counting the number of ones in the bit-stream, has high potential for implementing DCNNs with high scalability and an ultra-low hardware footprint. Since multiplications and additions can be calculated using AND gates and multiplexers in SC, significant reductions in power (energy) and hardware footprint can be achieved compared to conventional binary arithmetic implementations. The tremendous savings in power (energy) and hardware resources open an immense design space for enhancing the scalability and robustness of hardware DCNNs. This paper presents SC-DCNN, the first comprehensive design and optimization framework for SC-based DCNNs, using a bottom-up approach. We first present the designs of function blocks that perform the basic operations in a DCNN, including inner product, pooling, and activation function. We then propose four designs of feature extraction blocks, which extract features from input feature maps by connecting different basic function blocks with joint optimization. Moreover, efficient weight storage methods are proposed to reduce the area and power (energy) consumption. Putting it all together, with feature extraction blocks carefully selected, SC-DCNN is holistically optimized to minimize area and power (energy) consumption while maintaining high network accuracy. Experimental results demonstrate that LeNet-5 implemented in SC-DCNN occupies only 17 mm2, consumes 1.53 W, and achieves a throughput of 781,250 images/s, an area efficiency of 45,946 images/s/mm2, and an energy efficiency of 510,734 images/J.
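The arithmetic described above can be seen in a few lines. Below is a minimal Python sketch of stochastic-computing multiplication and scaled addition, written in the simpler unipolar encoding (values in [0, 1]) rather than the bipolar [-1, 1] encoding the paper uses; the stream length, input values, and helper names are illustrative, not taken from SC-DCNN.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4096  # bit-stream length (illustrative); longer streams reduce quantization noise

def to_stream(p, n=N):
    """Encode a value p in [0, 1] as a random bit-stream whose fraction of ones approximates p."""
    return (rng.random(n) < p).astype(np.uint8)

def from_stream(s):
    """Decode a bit-stream by counting its ones (in hardware, a simple counter)."""
    return s.mean()

a, b = 0.8, 0.5
sa, sb = to_stream(a), to_stream(b)

# Multiplication: one AND gate per bit position, since for independent streams
# P(sa & sb = 1) = P(sa = 1) * P(sb = 1).
prod = sa & sb
print(from_stream(prod), "~=", a * b)            # ~0.40

# Scaled addition: a multiplexer with a 0.5-probability select stream
# computes (a + b) / 2, keeping the result inside the representable range.
sel = to_stream(0.5)
total = np.where(sel == 1, sa, sb)
print(from_stream(total), "~=", (a + b) / 2)     # ~0.65
```

Longer streams tighten the estimates at the cost of latency, which is the basic accuracy/latency trade-off in SC designs.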
international symposium on microarchitecture | 2017
Caiwen Ding; Siyu Liao; Yanzhi Wang; Zhe Li; Ning Liu; Youwei Zhuo; Chao Wang; Xuehai Qian; Yu Bai; Geng Yuan; Xiaolong Ma; Yipeng Zhang; Jian Tang; Qinru Qiu; Xue Lin; Bo Yuan
Large-scale deep neural networks (DNNs) are both compute and memory intensive. As the size of DNNs continues to grow, it is critical to improve their energy efficiency and performance while maintaining accuracy. For DNNs, the model size is an important factor affecting performance, scalability, and energy efficiency. Weight pruning achieves good compression ratios but suffers from three drawbacks: 1) the irregular network structure after pruning, which affects performance and throughput; 2) the increased training complexity; and 3) the lack of a rigorous guarantee of compression ratio and inference accuracy. To overcome these limitations, this paper proposes CirCNN, a principled approach to represent weights and process neural networks using block-circulant matrices. CirCNN utilizes Fast Fourier Transform (FFT)-based fast multiplication, simultaneously reducing the computational complexity (both in inference and training) from O(n^2) to O(n log n) and the storage complexity from O(n^2) to O(n), with negligible accuracy loss. Compared to other approaches, CirCNN is distinct due to its mathematical rigor: DNNs based on CirCNN can converge to the same "effectiveness" as DNNs without compression. We propose the CirCNN architecture, a universal DNN inference engine that can be implemented in various hardware/software platforms with configurable network architecture (e.g., layer type, size, scales, etc.). In the CirCNN architecture: 1) due to the recursive property, FFT can be used as the key computing kernel, which ensures universal and small-footprint implementations; 2) the compressed but regular network structure avoids the pitfalls of network pruning and facilitates high performance and throughput with a highly pipelined and parallel design. To demonstrate the performance and energy efficiency, we test CirCNN on FPGA, ASIC, and embedded processors. Our results show that the CirCNN architecture achieves very high energy efficiency and performance with a small hardware footprint. Based on the FPGA implementation and ASIC synthesis results, CirCNN achieves 6-102X energy efficiency improvements compared with the best state-of-the-art results. CCS Concepts: • Computer systems organization → Embedded hardware.
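To make the complexity claim concrete, here is a minimal NumPy/SciPy sketch of the block-circulant trick for a single block, under the usual convention that a circulant matrix is defined by its first column; the block size and variable names are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.linalg import circulant

rng = np.random.default_rng(0)
n = 8  # block size (illustrative)

w = rng.standard_normal(n)   # one length-n vector defines the whole block: O(n) storage
x = rng.standard_normal(n)   # input slice for this block

# Dense reference: materialize the n x n circulant block and multiply in O(n^2).
W = circulant(w)             # first column is w; every other column is a cyclic shift
y_dense = W @ x

# FFT path: a circulant matrix-vector product is a circular convolution,
# computed in O(n log n) as IFFT(FFT(w) * FFT(x)).
y_fft = np.real(np.fft.ifft(np.fft.fft(w) * np.fft.fft(x)))

print(np.allclose(y_dense, y_fft))   # True: identical result at lower complexity
```

In a full layer the weight matrix is partitioned into such blocks and the per-block products are accumulated, which is what reduces storage from O(n^2) to O(n) and computation from O(n^2) to O(n log n) per block.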
asia and south pacific design automation conference | 2017
Ji Li; Ao Ren; Zhe Li; Caiwen Ding; Bo Yuan; Qinru Qiu; Yanzhi Wang
international symposium on neural networks | 2017
Ji Li; Zihao Yuan; Zhe Li; Caiwen Ding; Ao Ren; Qinru Qiu; Jeffrey Draper; Yanzhi Wang
international green and sustainable computing conference | 2015
Soroush Heidari; Caiwen Ding; Yongpan Liu; Yanzhi Wang; Jingtong Hu
great lakes symposium on vlsi | 2017
Zihao Yuan; Ji Li; Zhe Li; Caiwen Ding; Ao Ren; Bo Yuan; Qinru Qiu; Jeffrey Draper; Yanzhi Wang
international symposium on circuits and systems | 2016
Caiwen Ding; Soroush Heidari; Yanzhi Wang; Yongpan Liu; Jingtong Hu
great lakes symposium on vlsi | 2016
Ning Liu; Caiwen Ding; Yanzhi Wang; Jingtong Hu
field programmable gate arrays | 2018
Shuo Wang; Zhe Li; Caiwen Ding; Bo Yuan; Qinru Qiu; Yanzhi Wang; Yun Liang
IEEE Design & Test of Computers | 2017
Caiwen Ding; Ning Liu; Yanzhi Wang; Ji Li; Soroush Heidari; Jingtong Hu; Yongpan Liu