Xiaoming Chen
Chinese Academy of Sciences
Publication
Featured research published by Xiaoming Chen.
design automation conference | 2015
Gushu Li; Xiaoming Chen; Guangyu Sun; Henry Hoffmann; Yongpan Liu; Yu Wang; Huazhong Yang
Recently, general-purpose graphics processing units (GPGPUs) have been widely used to accelerate computing in various applications. To store the contexts of thousands of concurrent threads on a GPU, a large static random-access memory (SRAM)-based register file is employed. Due to the high leakage power of SRAM, the register file consumes 20% to 40% of the total GPU power consumption. Thus, hybrid memory systems, which combine SRAM and emerging non-volatile memory (NVM), have been employed for register file design on GPUs. Although they have shown strong potential to alleviate the power issue of GPUs, existing hybrid memory solutions may not exploit the intrinsic features of the GPU register file. By leveraging the warp schedule on GPUs, this paper proposes a hybrid register architecture which consists of an NVM-based register file and mixed SRAM-based write buffers with a warp-aware write-back strategy. Simulation results show that our design can eliminate 64% of write accesses to NVM and reduce register file power by 66% on average, with only 4.2% performance degradation. After we apply the power gating technique, the register power is further reduced to 25% of that of its SRAM counterpart on average.
international test conference | 2015
Song Yao; Xiaoming Chen; Jie Zhang; Qiaoyi Liu; Jia Wang; Qiang Xu; Yu Wang; Huazhong Yang
Third-party intellectual property (3PIP) cores are widely used in integrated circuit designs, and it is essential to ensure their trustworthiness. Existing hardware trust verification techniques suffer from high computational complexity, low extensibility, and an inability to detect implicitly-triggered hardware trojans (HTs). To tackle these problems, in this paper we present a novel 3PIP trust verification framework, named FASTrust, which conducts HT feature analysis on the flip-flop level control-data flow graph (CDFG) of the circuit. FASTrust is not only able to identify the explicitly-triggered and implicitly-triggered HTs that have appeared in the literature in an efficient and effective manner, but more importantly, it also has the unique advantage of being scalable to defend against future and more stealthy HTs by adding new features to the system.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2017
Xiaoming Chen; Lin Wang; Yu Wang; Yongpan Liu; Huazhong Yang
The continuous globalization of the semiconductor industry has significantly raised the vulnerability of chips to hardware Trojan (HT) attacks. It is extremely challenging to detect HTs in fabricated chips due to the existence of process variations (PVs), since PVs may cause larger impacts than HTs. In this paper, we propose a novel framework for HT detection in digital integrated circuits. The goal of this paper is to detect HTs inserted during fabrication. The HT detection problem is formulated as an under-determined linear system by a sparse gate profiling technique, and the existence of HTs is mapped to the sparse solution of the linear system. A Bayesian inference-based calibration technique is proposed to recover PVs for each chip for the sparse gate profiling technique. A batch of under-determined linear systems is solved together by the well-studied simultaneous orthogonal matching pursuit algorithm to get their common sparse solution. Experimental results show that even under large measurement errors, the proposed framework achieves high HT detection rates with low measurement cost.
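The abstract above names simultaneous orthogonal matching pursuit (SOMP) as the solver for the batch of under-determined systems. The following is a minimal, generic sketch of SOMP in pure Python, not the authors' implementation. The function names (`solve_linear`, `somp`), the toy matrix `A`, and the measurements `Y` are hypothetical and chosen only for illustration. It assumes the columns of `A` have roughly unit norm and that the target sparsity is known in advance.

```python
def solve_linear(M, B):
    """Solve M X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def somp(A, Y, sparsity):
    """Greedy recovery of a support shared by all columns of Y, where Y ~ A X.

    A: m x n matrix (list of rows), Y: m x k measurements (one column per system).
    Returns (support, coeffs); coeffs[t] holds the k coefficients of support[t].
    """
    m, n, k = len(A), len(A[0]), len(Y[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    residual = [row[:] for row in Y]
    support, coeffs = [], []
    for _ in range(sparsity):
        # Score each unused column by its total correlation with all residuals.
        def score(j):
            return sum(abs(sum(cols[j][i] * residual[i][c] for i in range(m)))
                       for c in range(k))
        best = max((j for j in range(n) if j not in support), key=score)
        support.append(best)
        # Least-squares fit on the current support via the normal equations.
        G = [[sum(cols[a][i] * cols[b][i] for i in range(m)) for b in support]
             for a in support]
        R = [[sum(cols[a][i] * Y[i][c] for i in range(m)) for c in range(k)]
             for a in support]
        coeffs = solve_linear(G, R)
        residual = [[Y[i][c] - sum(cols[j][i] * coeffs[t][c]
                                   for t, j in enumerate(support))
                     for c in range(k)] for i in range(m)]
    return support, coeffs


# Toy example (hypothetical data): 4 measurements, 6 unknowns, 2 systems
# that share the support {0, 2}.
A = [[1, 0, 0, 0, 0.7, 0],
     [0, 1, 0, 0, 0.7, 0],
     [0, 0, 1, 0, 0, 0.7],
     [0, 0, 0, 1, 0, 0.7]]
Y = [[1, 2], [0, 0], [3, -1], [0, 0]]
support, coeffs = somp(A, Y, sparsity=2)
```

Summing the correlations across all systems before picking a column is what makes the selection "simultaneous": a column is chosen only if it explains the residuals of the whole batch, which matches the idea of a common sparse solution.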
IEEE Sensors Journal | 2017
Yuzhi Wang; Anqi Yang; Xiaoming Chen; Pengjun Wang; Yu Wang; Huazhong Yang
Temporal drift of sensory data is a severe problem impacting the data quality of wireless sensor networks (WSNs). With the proliferation of large-scale and long-term WSNs, it is becoming more important to calibrate sensors when the ground truth is unavailable. This problem is called "blind calibration". In this paper, we propose a novel deep learning method named projection-recovery network (PRNet) to blindly calibrate sensor measurements online. The PRNet first projects the drifted data to a feature space, and then uses a powerful deep convolutional neural network to recover the estimated drift-free measurements. We deploy a 24-sensor testbed and provide comprehensive empirical evidence showing that the proposed method significantly improves the sensing accuracy and drifted sensor detection. Compared with previous methods, PRNet can calibrate 2× as many drifted sensors at a recovery rate of 80% under the same accuracy requirement. We also provide helpful insights for designing deep neural networks for sensor calibration. We hope our proposed simple and effective approach will serve as a solid baseline in blind drift calibration of sensor networks.
design automation conference | 2018
Yi Cai; Yujun Lin; Lixue Xia; Xiaoming Chen; Song Han; Yu Wang; Huazhong Yang
design automation conference | 2018
Shiqi Lian; Yinhe Han; Xiaoming Chen; Ying Wang; Hang Xiao
Archive | 2017
Xiaoming Chen; Yu Wang; Huazhong Yang
Deeper and larger Neural Networks (NNs) have made breakthroughs in many fields. However, conventional CMOS-based computing platforms struggle to achieve higher energy efficiency, and RRAM-based systems provide a promising solution for building efficient Training-In-Memory Engines (TIME). The endurance of RRAM cells is limited, however, which is a severe issue because the weights of an NN must be updated thousands to millions of times during training. Gradient sparsification can address this problem by dropping most of the smaller gradients, but it introduces unacceptable computation cost. We propose an effective framework, SGS-ARS, including Structured Gradient Sparsification (SGS) and an Aging-aware Row Swapping (ARS) scheme, to guarantee write balance across whole RRAM crossbars and prolong the lifetime of TIME. Our experiments demonstrate that a 356× lifetime extension is achieved when TIME is programmed to train ResNet-50 on the ImageNet dataset with our SGS-ARS framework.
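The abstract above does not spell out how the gradients are structured, so the following is only an illustrative sketch of one possible form of structured sparsification. It assumes "structured" means keeping whole rows of the gradient matrix, ranked by L1 norm, so that only the kept rows would need to be written. The function name `sparsify_rows` and the sample values are hypothetical and do not come from the paper.

```python
def sparsify_rows(grad, keep):
    """Keep the `keep` rows with the largest L1 norm and zero all other rows.

    Assumption (not from the source): sparsity is applied per row, so a
    dropped row causes no write to the corresponding row of weights.
    """
    norms = [sum(abs(g) for g in row) for row in grad]
    ranked = sorted(range(len(grad)), key=lambda r: norms[r], reverse=True)
    kept = set(ranked[:keep])
    sparse = [row[:] if r in kept else [0.0] * len(row)
              for r, row in enumerate(grad)]
    return sparse, sorted(kept)


# Hypothetical 4 x 2 gradient matrix; rows 1 and 3 carry the largest updates.
grad = [[0.1, -0.2], [1.0, 0.5], [0.0, 0.05], [-0.7, 0.9]]
sparse, kept_rows = sparsify_rows(grad, keep=2)
```

Selecting whole rows rather than individual entries avoids a per-element top-k search, which is one plausible reading of how a structured scheme could reduce the computation cost mentioned in the abstract.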
Archive | 2017
Xiaoming Chen; Yu Wang; Huazhong Yang
As a critical operation in robotics, motion planning consumes a great deal of time and energy, especially in a dynamic environment. With approaches based on general-purpose processors, it is hard to obtain a valid plan in real time. We present an accelerator to speed up collision detection, which accounts for over 90% of the computation time in motion planning. Via the octree-based roadmap representation, the accelerator can be reconfigured online and support large roadmaps. We additionally propose an effective algorithm to update the roadmap in a dynamic environment, together with a batched incremental processing approach to reduce the complexity of collision detection. Experimental results show that our accelerator achieves a 26.5× speedup over an existing CPU-based approach. With the incremental approach, the performance further improves by 10× while the solution quality is degraded by only 10%.
design, automation, and test in europe | 2018
Jilan Lin; Lixue Xia; Zhenhua Zhu; Hanbo Sun; Yi Cai; Hui Gao; Ming Cheng; Xiaoming Chen; Yu Wang; Huazhong Yang
In this chapter, we will propose parallelization methodologies for the G-P sparse left-looking algorithm. Parallelizing sparse left-looking LU factorization faces three major challenges: the high sparsity of circuit matrices, the irregular structure of the symbolic pattern, and the strong data dependence during sparse LU factorization. To overcome these challenges, we propose an innovative framework to realize parallel sparse LU factorization. The framework is based on a detailed task-level data dependence analysis and composed of two different scheduling modes to fit different data dependences: a cluster mode suitable for independent tasks and a pipeline mode that explores parallelism between dependent tasks. Under the proposed scheduling framework, we will implement several different parallel algorithms for parallel full factorization and parallel re-factorization. In addition to the fundamental theories, we will also present some critical implementation details in this chapter.
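The task-level dependence analysis described above can be illustrated by grouping columns into levels. This is a minimal sketch, not the solver's actual scheduler. It assumes that in left-looking LU each column depends only on earlier columns, and the function name `levelize` and the sample dependence lists are hypothetical.

```python
def levelize(deps):
    """Group column tasks into dependence levels.

    deps[k] lists the earlier columns j < k whose results column k needs.
    Columns in the same level do not depend on each other.
    """
    level = [0] * len(deps)
    for k, needed in enumerate(deps):
        # A column sits one level above its deepest dependency (level 0 if none).
        level[k] = 1 + max((level[j] for j in needed), default=-1)
    groups = {}
    for k, lv in enumerate(level):
        groups.setdefault(lv, []).append(k)
    return [groups[lv] for lv in sorted(groups)]


# Hypothetical dependence lists for six columns.
deps = [[], [], [0], [1], [2, 3], [4]]
levels = levelize(deps)
```

Under these assumptions, wide levels that hold many mutually independent columns would be natural candidates for a cluster-style mode, while the narrow levels that form a dependence chain are where a pipeline-style mode would have to overlap dependent tasks.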
design, automation, and test in europe | 2018
Ahmedullah Aziz; Evelyn T. Breyer; An Chen; Xiaoming Chen; Suman Datta; Sumeet Kumar Gupta; Michael J. Hoffmann; Xiaobo Sharon Hu; Adrian M. Ionescu; Matthew Jerry; Thomas Mikolajick; Halid Mulaosmanovic; Kai Ni; Michael Niemier; Ian O'Connor; Atanu Saha; Stefan Slesazeck; Sandeep Krishna Thirumala; Xunzhao Yin
In this chapter, we will present the basic flow of our proposed solver NICSLU, as a necessary background of the parallelization techniques.