
Publication


Featured research published by Jingcheng Wang.


IEEE Journal of Solid-State Circuits | 2015

A Dual-Slope Capacitance-to-Digital Converter Integrated in an Implantable Pressure-Sensing System

Sechang Oh; Yoonmyung Lee; Jingcheng Wang; Zhiyoong Foo; Yejoong Kim; Wanyeong Jung; Ziyun Li; David T. Blaauw; Dennis Sylvester

This work presents a dual-slope capacitance-to-digital converter for pressure sensing. The design uses base capacitance subtraction with a configurable capacitor bank and dual-precision comparators to improve energy efficiency, consuming 110 nW with 9.7b ENOB and a 0.85 pJ/conv·step FoM. The converter is integrated with a pressure transducer, battery, processor, and radio to form a complete 1.4 mm × 2.8 mm × 1.6 mm sensor system aimed at implantable devices. The system operates from a 3.6 V battery.
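
As a rough behavioral sketch of the conversion principle (all constants below are illustrative, not taken from the paper): the residual capacitance left after base subtraction is integrated for a fixed time, then removed in fixed reference-sized steps while a counter runs, so the final count is proportional to the input capacitance minus the programmed base.

# Illustrative behavioral model of dual-slope capacitance-to-digital
# conversion with base-capacitance subtraction. All constants are
# hypothetical; the actual circuit parameters are not in the abstract.

def dual_slope_cdc(c_in_pf, c_base_pf, c_ref_pf=0.01, t_charge_cycles=256):
    """Return the digital count for an input capacitance (in pF).

    Phase 1: integrate charge proportional to (c_in - c_base) for a
             fixed time. Base subtraction shrinks the residual range.
    Phase 2: remove charge in fixed c_ref-sized steps until the
             integrator crosses zero; the step count is the output.
    """
    residual = max(c_in_pf - c_base_pf, 0.0)   # base capacitance subtracted
    charge = residual * t_charge_cycles        # fixed-time integration
    count = 0
    while charge > 1e-9:                       # tolerance for float error
        charge -= c_ref_pf * t_charge_cycles   # reference-slope discharge
        count += 1
    return count

# Example: a 5.13 pF sensor reading against a 5.00 pF base setting
print(dual_slope_cdc(5.13, 5.00))  # prints 13: (5.13 - 5.00) / 0.01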


International Solid-State Circuits Conference | 2017

14.7 A 288µW programmable deep-learning processor with 270KB on-chip weight storage using non-uniform memory hierarchy for mobile intelligence

Suyoung Bang; Jingcheng Wang; Ziyun Li; Cao Gao; Yejoong Kim; Qing Dong; Yen-Po Chen; Laura Fick; Xun Sun; Ronald G. Dreslinski; Trevor N. Mudge; Hun-Seok Kim; David T. Blaauw; Dennis Sylvester

Deep learning has proven to be a powerful tool for a wide range of applications, such as speech recognition and object detection. Recently there has been increased interest in deep learning for mobile IoT [1] to enable intelligence at the edge and shield the cloud from a deluge of data by forwarding only meaningful events. This hierarchical intelligence thereby enhances radio bandwidth and power efficiency by trading off computation and communication at edge devices. Since many mobile applications are “always-on” (e.g., voice commands), low power is a critical design constraint. However, prior works have focused on high-performance reconfigurable processors [2–3] optimized for large-scale deep neural networks (DNNs) that consume >50mW. Off-chip weight storage in DRAM is also common in prior works [2–3], implying significant additional power consumption due to intensive off-chip data movement.
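
The non-uniform weight-memory idea in the title can be illustrated with a toy access-energy model. The bank sizes and per-access energies below are hypothetical, chosen only so the capacities total 270KB as in the title; they do not describe the actual chip.

# Toy model of a non-uniform on-chip weight memory: several SRAM banks
# of increasing size and access energy. All numbers are made up purely
# for illustration; the paper's real hierarchy is not shown here.

BANKS = [  # (capacity in KB, energy per access in pJ) - hypothetical
    (16, 0.5),
    (64, 1.0),
    (190, 2.0),
]

def avg_access_energy(access_fractions):
    """Average pJ per weight fetch, given the fraction of accesses each
    bank serves (hottest weights placed in the smallest bank)."""
    return sum(f * e for f, (_, e) in zip(access_fractions, BANKS))

# If 60% of fetches hit the small bank, the mean cost falls well below
# what a single flat 270KB array would charge for every access.
print(avg_access_energy([0.6, 0.3, 0.1]))  # ~0.8 pJ vs 2.0 pJ flat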


Signal Processing Systems | 2015

A fixed-point neural network for keyword detection on resource constrained hardware

Mohit Shah; Jingcheng Wang; David T. Blaauw; Dennis Sylvester; Hun-Seok Kim; Chaitali Chakrabarti

Keyword detection is typically used as a front-end to trigger automatic speech recognition and spoken dialog systems. The detection engine needs to be continuously listening, which has strong implications for power and memory consumption. In this paper, we devise a neural network architecture for keyword detection and present a set of techniques for reducing its memory requirements, making the architecture suitable for resource-constrained hardware. Specifically, a fixed-point implementation is considered; aggressively scaling down the precision of the weights lowers the memory footprint compared to a naive floating-point implementation. For further optimization, a node pruning technique is proposed to identify and remove the least active nodes in the network. Experiments are conducted over 10 keywords selected from the Resource Management (RM) database, and the trade-off between detection performance and memory is assessed for different weight representations. We show that a neural network with as few as 5 bits per weight yields only a marginal, acceptable loss in performance while requiring just 200 KB of on-board memory at a latency of 150 ms. A hardware architecture using a single multiplier, with a power consumption of less than 10 mW, is also presented.
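
A minimal sketch of the two memory-reduction techniques, assuming NumPy weight matrices; the uniform quantizer, the activity metric, and the keep ratio below are illustrative stand-ins, not necessarily the paper's exact scheme.

import numpy as np

def quantize_weights(w, n_bits=5):
    """Uniform fixed-point quantization of a weight matrix to n_bits.
    Illustrative only; the paper's exact fixed-point format may differ."""
    scale = np.max(np.abs(w))
    levels = 2 ** (n_bits - 1) - 1      # symmetric signed range
    q = np.round(w / scale * levels)    # integer codes stored in memory
    return q / levels * scale           # dequantized for simulation

def prune_least_active(activations, w_out, keep_ratio=0.8):
    """Node pruning: drop the nodes whose mean |activation| over a
    validation set is smallest, removing their outgoing weights."""
    activity = np.mean(np.abs(activations), axis=0)  # per-node activity
    n_keep = int(keep_ratio * activity.size)
    keep = np.argsort(activity)[-n_keep:]            # most active nodes
    return w_out[keep, :], keep

# 5-bit weights: 32/5 = 6.4x smaller than float32 storage
w = np.random.randn(256, 64).astype(np.float32)
w_q = quantize_weights(w, n_bits=5)
print(np.max(np.abs(w - w_q)))          # small quantization error

acts = np.abs(np.random.randn(1000, 256))            # stand-in activations
w_pruned, kept = prune_least_active(acts, w, keep_ratio=0.8)
print(w_pruned.shape)                   # (204, 64): 20% of nodes removed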


European Solid-State Circuits Conference | 2014

Dual-slope capacitance to digital converter integrated in an implantable pressure sensing system

Sechang Oh; Yoonmyung Lee; Jingcheng Wang; Zhiyoong Foo; Yejoong Kim; David T. Blaauw; Dennis Sylvester

A dual-slope capacitance-to-digital converter for pressure sensing is presented and demonstrated in a complete microsystem. The design uses base capacitance subtraction with a configurable capacitor bank to narrow the input capacitance range and reduce conversion time. An energy-efficient iterative charge subtraction method is proposed, employing a current mirror that leverages the 3.6 V battery supply available in the system. We also propose dual-precision comparators that reduce comparator power while maintaining high accuracy during slope conversion, further improving energy efficiency. The converter occupies 0.105 mm² in 180 nm CMOS and achieves 44.2 dB SNR at 6.4 ms conversion time and 110 nW of power, corresponding to a 5.3 pJ/conv-step FoM. The converter is integrated with a pressure transducer, battery, processor, power management unit, and radio to form a complete 1.4 mm × 2.8 mm × 1.6 mm pressure sensor system aimed at implantable devices. The multi-layer system is implemented in 180 nm CMOS. Tested in a pressure chamber with an external 3.6 V supply and serial communication bus, the system achieved a measured resolution of 0.77 mmHg; we also demonstrated wireless readout of the pressure data with the stacked system running entirely from its integrated battery.
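
The reported numbers are mutually consistent under the standard conversion-step figure of merit, FoM = P · T_conv / 2^ENOB with ENOB = (SNR − 1.76 dB)/6.02 dB; a quick check:

# Sanity-check the reported FoM from the other measured numbers.
# FoM = P * T_conv / 2**ENOB, ENOB = (SNR_dB - 1.76) / 6.02
P = 110e-9        # W, reported power
T_conv = 6.4e-3   # s, reported conversion time
SNR_dB = 44.2     # reported SNR

ENOB = (SNR_dB - 1.76) / 6.02     # ~7.05 effective bits
FoM = P * T_conv / 2 ** ENOB      # joules per conversion-step
print(FoM * 1e12)                 # ~5.3 pJ/conv-step, as reported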


International Solid-State Circuits Conference | 2016

17.3 A reconfigurable dual-port memory with error detection and correction in 28nm FDSOI

Mahmood Khayatzadeh; Mehdi Saligane; Jingcheng Wang; Massimo Alioto; David T. Blaauw; Dennis Sylvester

SRAM is a key building block in systems-on-chip and usually limits their voltage scalability, due to the major impact of process/voltage/temperature (PVT) variations at low voltages [1]. Assist techniques that extend the SRAM operating voltage range improve bit-cell read/write stability [1-5], but cannot mitigate variations in the internal sensing delay needed to develop the targeted bitline (BL) voltage. Hence, large guard bands and performance margins are still needed to ensure correct operation. These margins increase as the supply voltage is lowered (Fig. 17.3.1) and must be addressed especially when the SRAM is coupled with margin-less processor designs (e.g., Razor).
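
The abstract does not detail the chip's error detection and correction scheme; as a generic textbook illustration of what ECC buys a memory, a Hamming(7,4) code corrects any single-bit upset in a stored nibble (this is not necessarily the code used in the chip):

# Generic Hamming(7,4) single-error-correcting code, shown only to
# illustrate error detection and correction in a memory; the chip's
# actual ECC scheme is not described in the abstract.

def hamming74_encode(d):                 # d: list of 4 data bits
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4                    # parity over positions 1,3,5,7
    p2 = d1 ^ d3 ^ d4                    # parity over positions 2,3,6,7
    p3 = d2 ^ d3 ^ d4                    # parity over positions 4,5,6,7
    return [p1, p2, d1, p3, d2, d3, d4]  # codeword positions 1..7

def hamming74_correct(c):                # c: 7-bit codeword, maybe corrupted
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]       # recompute the three parity checks
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3      # 0 = clean, else bad bit position
    if syndrome:
        c[syndrome - 1] ^= 1             # flip the single erroneous bit
    return [c[2], c[4], c[5], c[6]]      # recovered data bits

word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                             # inject a single-bit upset
print(hamming74_correct(word))           # -> [1, 0, 1, 1]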


International Solid-State Circuits Conference | 2017

11.2 A 1Mb embedded NOR flash memory with 39µW program power for mm-scale high-temperature sensor nodes

Qing Dong; Yejoong Kim; Inhee Lee; Myungjoon Choi; Ziyun Li; Jingcheng Wang; Kaiyuan Yang; Yen Po Chen; Junjie Dong; Minchang Cho; Gyouho Kim; Wei Keng Chang; Yun Sheng Chen; Yu Der Chih; David T. Blaauw; Dennis Sylvester

Miniature sensor nodes are ideal for monitoring environmental conditions in emerging applications such as oil exploration. One key requirement for sensor nodes is embedded non-volatile memory, for compact data storage that is retained even if the sensor's power source is exhausted. Non-volatile memory also allows near-zero standby power modes, which are particularly challenging to achieve at high temperatures when keeping SRAM in standby, due to the exponential rise of leakage with temperature that rapidly degrades battery life (Fig. 11.2.1). However, traditional NOR flash requires mW-level program and erase power, which cannot be sustained by mm-scale batteries with internal resistances >10 kΩ. To address this issue, we propose an ultra-low-power NOR flash design and demonstrate its integration into a complete sensor system specifically designed for environmental monitoring under high-temperature conditions, such as when injected into geothermal or oil wells.
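
The >10 kΩ constraint can be made concrete with a one-line estimate (the battery voltage below is an assumed illustrative value, not from the abstract): at mW-level program power, the load current drops volts across the battery's own internal resistance and collapses the supply, whereas a 39 µW draw barely perturbs it.

# Why mm-scale batteries rule out mW-level flash programming: the I*R
# drop across a >10 kOhm internal resistance collapses the rail.
V_BAT, R_INT = 3.9, 10e3                 # volts (assumed), ohms (abstract)

for p_load in (1e-3, 39e-6):             # 1 mW vs the reported 39 uW
    i = p_load / V_BAT                   # rough load-current estimate
    droop = i * R_INT                    # voltage lost inside the battery
    print(f"{p_load*1e6:7.0f} uW -> {droop:.2f} V droop")
# ~1 mW draws ~0.26 mA and droops ~2.6 V (the rail collapses);
# 39 uW draws ~10 uA and droops only ~0.1 V.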


Asian Solid-State Circuits Conference | 2015

Reconfigurable self-timed regenerators for wide-range voltage scaled interconnect

Jingcheng Wang; Nathaniel Ross Pinckney; David T. Blaauw; Dennis Sylvester

A global interconnect scheme based on reconfigurable self-timed regenerators enables graceful degradation of performance and power in systems with wide-range dynamic voltage/frequency scaling. A test chip in 45nm SOI CMOS demonstrates up to 40% and 25% better performance scaling than a traditional repeater-based interconnect at 1V and 0.5V, respectively.


Signal Processing Systems | 2018

A Fixed-Point Neural Network Architecture for Speech Applications on Resource Constrained Hardware

Mohit Shah; Sairam Arunachalam; Jingcheng Wang; David T. Blaauw; Dennis Sylvester; Hun-Seok Kim; Jae-sun Seo; Chaitali Chakrabarti

Speech recognition and keyword detection are becoming increasingly popular applications for mobile systems. These applications have large memory and compute requirements, making their implementation on a mobile device quite challenging. In this paper, we design low-cost neural network architectures for keyword detection and speech recognition. We present techniques to reduce the memory requirement by scaling down the precision of weights and biases without compromising detection/recognition performance. Experiments conducted on the Resource Management (RM) database show that for the keyword detection network, representing the weights with 5 bits results in a 6-fold reduction in memory compared to a floating-point implementation, with very little loss in performance. Similarly, for the speech recognition network, representing the weights with 6 bits results in a 5-fold reduction in memory while maintaining an error rate similar to a floating-point implementation. Preliminary results in 40nm TSMC technology show that the networks have fairly small power consumption: 11.12 mW for the keyword detection network and 51.96 mW for the speech recognition network, making these designs suitable for mobile devices.
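
The reported reductions follow directly from the bit widths, taking the floating-point baseline as 32 bits per weight (which the 6-fold figure implies):

# Memory reduction from weight bit-width alone, float32 baseline.
for bits, task in ((5, "keyword detection"), (6, "speech recognition")):
    print(f"{task}: 32/{bits} = {32/bits:.1f}x smaller")
# keyword detection: 32/5 = 6.4x  (~ the reported 6-fold)
# speech recognition: 32/6 = 5.3x (~ the reported 5-fold)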


International Symposium on Computer Architecture | 2018

Neural cache: bit-serial in-cache acceleration of deep neural networks

Charles Eckert; Xiaowei Wang; Jingcheng Wang; Arun Subramaniyan; Ravi R. Iyer; Dennis Sylvester; David T. Blaauw; Reetuparna Das

This paper presents the Neural Cache architecture, which re-purposes cache structures into massively parallel compute units capable of running inference for deep neural networks. Techniques for in-situ arithmetic in SRAM arrays, efficient data mapping, and reduced data movement are proposed. The Neural Cache architecture can execute convolutional, fully connected, and pooling layers fully in-cache, and also supports in-cache quantization. Our experimental results show that the proposed architecture improves inference latency by 8.3× over a state-of-the-art multi-core CPU (Xeon E5) and 7.7× over a server-class GPU (Titan Xp) for the Inception v3 model. Neural Cache improves inference throughput by 12.4× over the CPU (2.2× over the GPU), while reducing power consumption by 50% relative to the CPU (53% relative to the GPU).
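
Bit-serial in-cache arithmetic can be sketched in software: operands are stored transposed so one array holds bit i of every element, and an addition walks the bit positions one at a time, each step standing in for a single array-wide wordline operation. A minimal model, with illustrative word width and data layout:

import numpy as np

# Bit-serial addition over transposed operands, as a software model of
# in-SRAM compute: bits[i] holds bit i of EVERY element, so one loop
# iteration corresponds to one array-wide wordline operation.
WIDTH = 8

def transpose_bits(x):
    """Unpack a uint8 vector into WIDTH bit-slices (LSB first)."""
    return [(x >> i) & 1 for i in range(WIDTH)]

def bitserial_add(a_bits, b_bits):
    """Ripple-carry add: one bit position per step, all elements at once."""
    carry = np.zeros_like(a_bits[0])
    out = []
    for a, b in zip(a_bits, b_bits):     # WIDTH steps, any element count
        out.append(a ^ b ^ carry)        # sum bit, array-wide
        carry = (a & b) | (carry & (a ^ b))
    return out

def untranspose(bits):
    return sum(b << i for i, b in enumerate(bits))

a = np.array([3, 100, 27], dtype=np.uint8)
b = np.array([4, 55, 200], dtype=np.uint8)
print(untranspose(bitserial_add(transpose_bits(a), transpose_bits(b))))
# -> [  7 155 227]  (mod 256; the carry out of bit 7 is dropped)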


International Symposium on Microarchitecture | 2017

Cache automaton

Arun Subramaniyan; Jingcheng Wang; Ezhil R. M. Balasubramanian; David T. Blaauw; Dennis Sylvester; Reetuparna Das

Finite state automata are widely used to accelerate pattern matching in many emerging application domains such as DNA sequencing and XML parsing. Conventional CPUs and compute-centric accelerators are bottlenecked by memory bandwidth and irregular memory access patterns in automata processing. We present Cache Automaton, which repurposes the last-level cache for automata processing, and a compiler that automates the mapping of large real-world non-deterministic finite automata (NFAs) onto the proposed architecture. Cache Automaton extends a conventional last-level cache architecture with components that accelerate the two phases of NFA processing: state-match and state-transition. State-match is made efficient with a sense-amplifier cycling technique that exploits spatial locality in symbol matches, and state-transition with a new compact switch architecture. By overlapping these two phases for adjacent symbols we realize an efficient pipelined design. We evaluate two designs, one optimized for performance and the other for space, across a set of 20 diverse benchmarks. The performance-optimized design provides a speedup of 15× over Micron's DRAM-based Automata Processor and 3840× over processing on a conventional x86 CPU. The proposed design utilizes on average 1.2 MB of cache space across benchmarks while consuming 2.3 nJ of energy per input symbol. Our space-optimized design reduces cache utilization to 0.72 MB while still providing a 9× speedup over the Automata Processor.
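
The two accelerated phases map naturally onto a bit-vector simulation. The toy machine below (a schematic software analogue, not the hardware organization) reports the pattern "ab" anywhere in its input: state-match ANDs the enabled-state vector with the states whose label matches the current symbol, and state-transition ORs together the successor sets of the states that fired.

# Schematic software analogue of the two accelerated phases. States are
# bit positions; this toy homogeneous NFA reports "ab" anywhere.

MATCH = {'a': 0b01, 'b': 0b10}   # state-match: which states accept a symbol
NEXT = [0b10, 0b00]              # state-transition: successors of each state
ALWAYS = 0b01                    # state 0 is start-anywhere enabled
REPORT = 0b10                    # state 1 firing means "ab" was seen

def run(symbols):
    active = 0
    for ch in symbols:
        fired = (active | ALWAYS) & MATCH.get(ch, 0)   # phase 1: state-match
        if fired & REPORT:
            return True
        active = 0
        for s in range(len(NEXT)):                     # phase 2: transition
            if fired >> s & 1:
                active |= NEXT[s]                      # OR in successor sets
        # in hardware the two phases overlap across adjacent symbols
    return False

print(run("xxabyy"), run("ba"))  # True False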

Collaboration


Dive into Jingcheng Wang's collaboration.

Top Co-Authors

Yejoong Kim, University of Michigan
Qing Dong, University of Michigan
Ziyun Li, University of Michigan