Chunpeng Wu
University of Pittsburgh
Publication
Featured research published by Chunpeng Wu.
Field-Programmable Custom Computing Machines | 2015
Sicheng Li; Chunpeng Wu; Hai Li; Boxun Li; Yu Wang; Qinru Qiu
Recurrent neural network (RNN) based language models (RNNLMs) are biologically inspired models for natural language processing. An RNNLM records historical information through additional recurrent connections and is therefore very effective in capturing the semantics of sentences. However, the use of RNNLMs has been greatly hindered by the high computation cost of training. This work presents an FPGA implementation framework for RNNLM training acceleration. At the architectural level, we improve the parallelism of the RNN training scheme and reduce the computing resource requirement to enhance computation efficiency. The hardware implementation primarily targets reducing the data communication load. A multi-thread computation engine is utilized that masks the long memory latency and reuses frequently accessed data. Evaluation on the Microsoft Research Sentence Completion Challenge shows that the proposed FPGA implementation outperforms traditional class-based modest-size recurrent networks and obtains 46.2% training accuracy. Moreover, experiments at different network sizes demonstrate the strong scalability of the proposed framework.
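For readers unfamiliar with the model being accelerated, the following is a minimal NumPy sketch of one RNNLM forward step. The toy sizes, parameter names, and tanh/softmax choices are illustrative assumptions, not details from the paper; the point is that history flows through the recurrent term `W_hh @ h_prev`.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, hidden = 50, 16  # toy sizes; real RNNLMs are far larger

# Hypothetical parameter names; random initialization is illustrative only.
W_xh = rng.normal(0, 0.1, (hidden, vocab))   # input-to-hidden
W_hh = rng.normal(0, 0.1, (hidden, hidden))  # recurrent connection (carries history)
W_hy = rng.normal(0, 0.1, (vocab, hidden))   # hidden-to-output

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnnlm_step(word_id, h_prev):
    """One forward step of a standard Elman-style RNN language model."""
    x = np.zeros(vocab)
    x[word_id] = 1.0                         # one-hot encoding of the input word
    h = np.tanh(W_xh @ x + W_hh @ h_prev)    # hidden state mixes input and history
    p = softmax(W_hy @ h)                    # distribution over the next word
    return h, p

h = np.zeros(hidden)
for word in [3, 17, 8]:                      # a toy word-id sequence
    h, p = rnnlm_step(word, h)
```

The dense matrix–vector products in the hidden-state update are exactly the operations the FPGA framework parallelizes, and `h_prev` is the frequently reused data that the multi-thread engine keeps close to the compute units.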
Design Automation Conference | 2016
Wei Wen; Chunpeng Wu; Yandan Wang; Kent W. Nixon; Qing Wu; Mark Barnell; Hai Li; Yiran Chen
The IBM TrueNorth chip uses digital spikes to perform neuromorphic computing and achieves ultrahigh execution parallelism and power efficiency. However, the low quantization resolution of its synaptic weights and spikes significantly limits the inference (e.g., classification) accuracy of the deployed neural network model. The existing workaround, i.e., averaging the results over multiple copies instantiated in the spatial and temporal domains, rapidly exhausts the hardware resources and slows down the computation. In this work, we propose a novel learning method for the TrueNorth platform that constrains the random variance of each computation copy and reduces the number of copies needed. Compared to the existing learning method, our method achieves up to a 68.8% reduction in the required neurosynaptic cores or a 6.5× speedup, with even slightly improved inference accuracy.
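A toy NumPy sketch of the workaround the paper improves upon: averaging many low-precision copies of a weight recovers effective precision, at the cost of one instantiation per copy. The unbiased stochastic rounding used here is our own illustrative assumption, not TrueNorth's actual mechanism; it simply shows why copy count trades off against random variance.

```python
import numpy as np

rng = np.random.default_rng(42)
w = rng.uniform(-1, 1, size=1000)        # ideal full-precision weights

def stochastic_quantize(w):
    """Unbiased stochastic rounding onto the 3-level grid {-1, 0, 1}."""
    lower = np.floor(w)                   # grid step is 1, so floor lands on the grid
    p_up = w - lower                      # probability of rounding up to lower + 1
    return lower + (rng.random(w.shape) < p_up)

single = stochastic_quantize(w)                                  # one low-precision copy
avg16 = np.mean([stochastic_quantize(w) for _ in range(16)], axis=0)  # 16 averaged copies
```

Averaging 16 independent copies cuts the rounding variance by roughly 16×; a learning method that constrains the per-copy variance directly, as the paper proposes, reaches the same accuracy with far fewer copies.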
Computer Vision and Pattern Recognition | 2017
Chunpeng Wu; Wei Wen; Tariq Afzal; Yongmei Zhang; Yiran Chen; Hai Li
Recently, DNN model compression based on network architecture design, e.g., SqueezeNet, has attracted a lot of attention. Compared to well-known models, these extremely compact networks show no accuracy drop on image classification. An emerging question, however, is whether these model compression techniques hurt a DNN's learning ability beyond classifying images on a single dataset. Our preliminary experiments show that these compression methods can degrade domain adaptation (DA) ability even though classification performance is preserved. Therefore, we propose a new compact network architecture and an unsupervised DA method in this paper. The DNN is built on a new basic module, Conv-M, which provides more diverse feature extractors without significantly increasing the number of parameters. The unified framework of our DA method simultaneously learns invariance across domains, reduces the divergence of feature representations, and adapts label prediction. Our DNN has 4.1M parameters, which is only 6.7% of AlexNet or 59% of GoogLeNet. Experiments show that our DNN obtains GoogLeNet-level accuracy on both classification and DA, and our DA method slightly outperforms previous competitive ones. Putting it all together, our DA strategy based on our DNN achieves state-of-the-art results on sixteen of eighteen DA tasks on the popular Office-31 and Office-Caltech datasets.
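The parameter-count claims can be checked with simple arithmetic. The AlexNet and GoogLeNet totals below are the commonly cited approximate figures (about 61M and 6.9M parameters), not numbers taken from the paper:

```python
# Model sizes in millions of parameters; the AlexNet and GoogLeNet figures
# are widely cited approximations, assumed here for the sanity check.
ours, alexnet, googlenet = 4.1, 61.0, 6.9

ratio_alex = ours / alexnet      # ~0.067, i.e., "6.7% of AlexNet"
ratio_goog = ours / googlenet    # ~0.59,  i.e., "59% of GoogLeNet"

print(f"{ratio_alex:.1%} of AlexNet, {ratio_goog:.0%} of GoogLeNet")
```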
Design Automation Conference | 2015
Beiye Liu; Chunpeng Wu; Hai Li; Yiran Chen; Qing Wu; Mark Barnell; Qinru Qiu
With the boom of big-data applications, cognitive information processing systems that leverage advanced data processing technologies, e.g., machine learning and data mining, are widely used in many industry fields. Although these technologies demonstrate great processing capability and accuracy in the relevant applications, several security and safety challenges are also emerging against these learning-based technologies. In this paper, we first introduce several security concerns in cognitive system designs. Some real examples are then used to demonstrate how attackers can potentially access confidential user data, replicate a sensitive data processing model without being granted access to the details of the model, and obtain key features of the training data by using only the services publicly accessible to a normal user. Based on the analysis of these security challenges, we also discuss several possible solutions that can protect the information privacy and security of cognitive systems during different stages of their usage.
Integration | 2017
Yiran Chen; Hai Li; Chunpeng Wu; Chang Song; Sicheng Li; Chuhan Min; Hsin-Pai Cheng; Wei Wen; Xiaoxiao Liu
Neuromorphic computing originally referred to hardware that mimics neuro-biological architectures to implement models of neural systems. The concept was then extended to computing systems that can run bio-inspired computing models, e.g., neural networks and deep learning networks. In recent years, the rapid growth of cognitive applications and the limited processing capability of the conventional von Neumann architecture on these applications have motivated worldwide research on neuromorphic computing systems. In this paper, we review the evolution of neuromorphic computing techniques in both computing models and hardware implementations from a historical perspective. Various implementation methods and practices are also discussed. Finally, we present some emerging technologies that may change the landscape of neuromorphic computing in the future, e.g., new devices and interdisciplinary computing architectures.
Embedded Systems for Real-Time Multimedia | 2016
Chunpeng Wu; Hsin-Pai Cheng; Sicheng Li; Hai Helen Li; Yiran Chen
Autonomous driving can effectively reduce traffic congestion and road accidents. It is therefore necessary to implement an efficient, high-level scene understanding model on an embedded device with limited power and resources. Toward this goal, we propose ApesNet, an efficient pixel-wise segmentation network that understands road scenes in real time and achieves promising accuracy. The key findings of our experiments are significantly lower classification time and higher accuracy compared to other conventional segmentation methods. The model is characterized by efficient training and sufficiently fast testing. Experimentally, we use the well-known CamVid road scene dataset to show the advantages provided by our contributions. We compare our proposed architecture's accuracy and time performance with SegNet. In CamVid training and testing, ApesNet outperforms SegNet in per-class accuracy on eight classes. Additionally, our model is 37% smaller than SegNet. With this advantage, the combined encoding and decoding time for each image is 1.45 to 2.47 times faster than SegNet's.
IET Cyber-Physical Systems: Theory & Applications | 2016
Chunpeng Wu; Hsin-Pai Cheng; Sicheng Li; Hai Li; Yiran Chen
Road scene understanding and semantic segmentation are ongoing problems in computer vision. A precise segmentation can help a machine learning model understand the real world more accurately, and a well-designed efficient model can be used on resource-limited devices. The authors aim to implement an efficient, high-level scene understanding model on an embedded device with finite power and resources. Toward this goal, the authors propose ApesNet, an efficient pixel-wise segmentation network that understands road scenes in near real time and achieves promising accuracy. The key findings of the authors' experiments are significantly lower classification time and higher accuracy compared with other conventional segmentation methods. The model is characterised by efficient training and sufficiently fast testing. Experimentally, the authors use two road scene benchmarks, CamVid and Cityscapes, to show the advantages of ApesNet. The authors compare the proposed architecture's accuracy and time performance with SegNet-Basic, a deep convolutional encoder–decoder architecture. ApesNet is 37% smaller than SegNet-Basic in terms of model size. With this advantage, the combined encoding and decoding time for each image is 2.5 times faster than SegNet-Basic's.
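A minimal NumPy sketch of the pixel-wise encoder–decoder pattern that segmentation networks such as ApesNet and SegNet follow. Everything here (toy sizes, 1×1 channel-mixing weights in place of learned convolutions) is an illustrative assumption, not the actual ApesNet architecture; the point is the downsample–classify–upsample flow that assigns a class label to every pixel.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C, K = 8, 8, 3, 4   # toy image height/width, input channels, classes

def maxpool2x2(x):
    """Encoder downsampling: (H, W, C) -> (H/2, W/2, C)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def upsample2x2(x):
    """Decoder upsampling by nearest neighbour: restores spatial size."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

# 1x1 "convolutions" as channel-mixing matrices, standing in for learned filters.
enc_w = rng.normal(0, 1, (C, 8))
dec_w = rng.normal(0, 1, (8, K))

img = rng.random((H, W, C))
feat = np.maximum(maxpool2x2(img) @ enc_w, 0)   # encode: pool + channel mix + ReLU
logits = upsample2x2(feat) @ dec_w              # decode: upsample + per-pixel scores
labels = logits.argmax(axis=-1)                 # one class label per pixel
```

The encoding and decoding stages above are the two halves whose combined per-image time the paper benchmarks against SegNet-Basic; shrinking the feature channels and filters in these stages is what makes the model smaller and faster.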
Design, Automation, and Test in Europe | 2017
Hsin-Pai Cheng; Wei Wen; Chunpeng Wu; Sicheng Li; Hai Helen Li; Yiran Chen
As a large-scale commercial spiking-based neuromorphic computing platform, the IBM TrueNorth processor has received tremendous attention. However, one of the known issues in the TrueNorth design is the limited precision of synaptic weights. The current workaround is running multiple neural network copies in which the average value of each synaptic weight is close to that in the original network. We theoretically analyze the impact of the low data precision of the TrueNorth chip on inference accuracy, core occupation, and performance, and present a probability-biased learning method that enhances inference accuracy by reducing the random variance of each computation copy. Our experimental results show that the proposed techniques considerably improve the computation accuracy of the TrueNorth platform and reduce the incurred hardware and performance overheads. Among all the tested methods, L1TEA regularization achieved the best result, with up to 2.74% accuracy enhancement when deploying the MNIST application onto the TrueNorth platform. In May 2016, the IBM TrueNorth team implemented convolutional neural networks (CNNs) on the TrueNorth processor and coincidentally used a similar method, i.e., ternary weights {-1, 0, 1}, achieving near state-of-the-art accuracy on 8 standard datasets. In addition, to further evaluate TrueNorth performance on CNNs, we test similar deep convolutional networks on TrueNorth, a GPU, and an FPGA. Among them, the GPU has the highest throughput, but in terms of energy consumption, the TrueNorth processor is the most energy-efficient, at over 6000 frames/sec/Watt.
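A small NumPy sketch of ternary weight quantization to {-1, 0, 1}. The sign-and-threshold rule below is a common heuristic assumed here for illustration; it is not necessarily the procedure used in the TrueNorth CNN deployment.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(0, 0.5, (64, 64))   # toy full-precision weight matrix

def ternarize(w, thresh=0.25):
    """Map weights to {-1, 0, 1}: zero out small weights, keep the sign of the rest.
    The fixed threshold is an illustrative choice, not a tuned value."""
    q = np.sign(w)
    q[np.abs(w) < thresh] = 0.0
    return q

wq = ternarize(w)
```

Ternary weights need only two bits of storage per synapse and replace multiplications with sign flips and skips, which is why such low-precision schemes map naturally onto spiking hardware like TrueNorth.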
International Conference on Computer-Aided Design | 2016
Chaofei Yang; Chunpeng Wu; Hai Li; Yiran Chen; Mark Barnell; Qing Wu
Modern smart surveillance systems can not only record the monitored environment but also identify targeted objects and detect anomalous activities. These advanced functions are often facilitated by deep neural networks, achieving very high accuracy and large data processing throughput. However, inappropriate design of the neural network may expose such smart systems to the risk of leaking the target being searched for, or even the adopted learning model itself, to attackers. In this talk, we will present the security challenges in the design of smart surveillance systems. We will also discuss some possible solutions that leverage the unique properties of emerging nano-devices, including the incurred design and performance cost and optimization methods for minimizing these overheads.
Neural Information Processing Systems | 2016
Wei Wen; Chunpeng Wu; Yandan Wang; Yiran Chen; Hai Li