Publication


Featured research published by Ziyun Li.


IEEE Journal of Solid-state Circuits | 2015

A Dual-Slope Capacitance-to-Digital Converter Integrated in an Implantable Pressure-Sensing System

Sechang Oh; Yoonmyung Lee; Jingcheng Wang; Zhiyoong Foo; Yejoong Kim; Wanyeong Jung; Ziyun Li; David T. Blaauw; Dennis Sylvester

This work presents a dual-slope capacitance-to-digital converter for pressure sensing. The design uses base capacitance subtraction with a configurable capacitor bank and dual-precision comparators to improve energy efficiency, consuming 110 nW with 9.7 b ENOB and a 0.85 pJ/conv·step FoM. The converter is integrated with a pressure transducer, battery, processor, and radio to form a complete 1.4 mm × 2.8 mm × 1.6 mm sensor system aimed at implantable devices. The system operates from a 3.6 V battery.
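As a rough consistency check on these figures, the sketch below assumes the quoted figure of merit follows the common Walden-style definition, FoM = P · T_conv / 2^ENOB. The abstract does not say which definition is used, so this is an assumption, and the function name is hypothetical. Solving for T_conv gives the conversion time implied by the quoted power, ENOB, and FoM.

```python
def implied_conversion_time(power_w: float, enob_bits: float, fom_j_per_step: float) -> float:
    """Solve FoM = P * T / 2**ENOB for T (assumed Walden-style definition)."""
    return fom_j_per_step * 2.0 ** enob_bits / power_w


# Figures quoted in the abstract: 110 nW, 9.7 b ENOB, 0.85 pJ/conv-step.
t_conv = implied_conversion_time(110e-9, 9.7, 0.85e-12)
print(f"implied conversion time: {t_conv * 1e3:.2f} ms")
```

Under that assumption, the three numbers are mutually consistent with a conversion time of roughly 6.4 ms.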


international solid-state circuits conference | 2017

14.7 A 288µW programmable deep-learning processor with 270KB on-chip weight storage using non-uniform memory hierarchy for mobile intelligence

Suyoung Bang; Jingcheng Wang; Ziyun Li; Cao Gao; Yejoong Kim; Qing Dong; Yen-Po Chen; Laura Fick; Xun Sun; Ronald G. Dreslinski; Trevor N. Mudge; Hun-Seok Kim; David T. Blaauw; Dennis Sylvester

Deep learning has proven to be a powerful tool for a wide range of applications, such as speech recognition and object detection, among others. Recently there has been increased interest in deep learning for mobile IoT [1] to enable intelligence at the edge and shield the cloud from a deluge of data by only forwarding meaningful events. This hierarchical intelligence thereby enhances radio bandwidth and power efficiency by trading off computation and communication at edge devices. Since many mobile applications are “always-on” (e.g., voice commands), low power is a critical design constraint. However, prior works have focused on high-performance reconfigurable processors [2–3] optimized for large-scale deep neural networks (DNNs) that consume >50 mW. Off-chip weight storage in DRAM is also common in the prior works [2–3], which implies significant additional power consumption due to intensive off-chip data movement.


IEEE Journal of Solid-state Circuits | 2016

A 10 mm³ Inductive Coupling Radio for Syringe-Implantable Smart Sensor Nodes

Yao Shi; Myungjoon Choi; Ziyun Li; Zhihong Luo; Gyouho Kim; Zhiyoong Foo; Hun-Seok Kim; David D. Wentzloff; David T. Blaauw

We present a near-field radio system for a millimeter-scale wireless smart sensor node that is implantable through a 14-gauge syringe needle. The proposed system integrates a radio system on chip and a magnetic antenna on a glass substrate within a total dimension of 1 × 1 × 10 mm³. We demonstrate energy-efficient active near-field wireless communication between the millimeter-scale sensor node and a base station device through RF energy-absorbing tissue. The wireless transceiver, digital baseband controller, wakeup controller, on-chip baseband timer, sleep timer, and MBUS controller are all integrated on the SoC to form a millimeter-scale sensor node, together with a 1 × 8 mm² magnetic antenna fabricated with 1.5-μm-thick gold on a 100-μm-thick glass substrate. An asymmetric link is established by pairing the sensor antenna with a codesigned 11 × 11 cm² base station antenna to achieve a link distance of up to 50 cm for sensor transmission and 20 cm for sensor reception. The transmitter consumes 43.5 μW average power at 2 kb/s, while the receiver consumes 36 μW with a sensitivity of -54 dBm at 100 kb/s. When powered by a 1 × 2.2 mm² thin-film battery (2 μAh, 4.1 V), the designed system has an expected lifetime of two weeks without battery recharging when it wakes up and transmits and receives 16 b of data every 10 min.
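The lifetime figure implies a very tight average current budget. The back-of-the-envelope sketch below assumes the full 2 μAh capacity is usable and ignores self-discharge and conversion losses; neither assumption is stated in the abstract, and the function name is hypothetical.

```python
def average_current_budget_a(capacity_ah: float, lifetime_days: float) -> float:
    """Average current that drains the given capacity in the given time."""
    return capacity_ah / (lifetime_days * 24.0)


capacity_ah = 2e-6   # 2 uAh thin-film battery (from the abstract)
battery_v = 4.1      # battery voltage (from the abstract)
lifetime_days = 14   # two-week expected lifetime (from the abstract)

i_avg = average_current_budget_a(capacity_ah, lifetime_days)
p_avg = i_avg * battery_v  # idealized: assumes constant battery voltage
print(f"average current budget: {i_avg * 1e9:.2f} nA")
print(f"average power budget:   {p_avg * 1e9:.1f} nW")
```

Under these assumptions the budget is about 6 nA, or roughly 24 nW on average. That is more than three orders of magnitude below the 43.5 μW transmit and 36 μW receive power, which is why the node can only afford one short 16 b exchange every 10 min.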


international solid-state circuits conference | 2017

11.2 A 1Mb embedded NOR flash memory with 39µW program power for mm-scale high-temperature sensor nodes

Qing Dong; Yejoong Kim; Inhee Lee; Myungjoon Choi; Ziyun Li; Jingcheng Wang; Kaiyuan Yang; Yen Po Chen; Junjie Dong; Minchang Cho; Gyouho Kim; Wei Keng Chang; Yun Sheng Chen; Yu Der Chih; David T. Blaauw; Dennis Sylvester

Miniature sensor nodes are ideal for monitoring environmental conditions in emerging applications such as oil exploration. One key requirement for sensor nodes is embedded non-volatile memory for compact and retentive data storage in the event that the sensor power source is exhausted. Non-volatile memory also allows for near-zero standby power modes, which are particularly challenging to achieve at high temperatures when using SRAM in standby due to the exponential rise in leakage with temperature, which rapidly degrades battery life (Fig. 11.2.1). However, traditional NOR flash requires mW-level program and erase power, which cannot be sustained by mm-scale batteries with internal resistances >10 kΩ. To address this issue, we propose an ultra-low-power NOR flash design and demonstrate its integration into a complete sensor system that is specifically designed for environmental monitoring under high-temperature conditions, such as when injected into geothermal or oil wells.


signal processing systems | 2016

Hardware-Efficient Neighbor-Guided SGM Optical Flow for Low Power Vision Applications

Jiang Xiang; Ziyun Li; Hun-Seok Kim; Chaitali Chakrabarti

Many real-time vision applications require accurate estimation of optical flow. This problem is quite challenging due to extremely high computation and memory bandwidth requirements. This paper presents a parallel block-based optical flow algorithm along with an optimized multicore hardware architecture. The algorithm is based on neighbor-guided semi-global matching (NG-fSGM), a dynamic programming algorithm that aggressively prunes the search space using flow vector information of the neighboring pixels. In the block-based NG-fSGM, the image is divided into overlapping blocks and the blocks are processed in parallel for high throughput. While a large overlap between blocks improves the accuracy, it results in larger memory and higher computational complexity. To minimize the amount of overlap among blocks with minimal effect on the accuracy, we use temporal prediction to guide flow vectors along the block boundaries. A pseudo-random flow candidate selection technique is also introduced to reduce memory access bandwidth and computation requirements. The proposed algorithm is mapped onto a multicore architecture where each core has a high degree of internal parallelism and implements a prefetching technique to improve throughput and reduce memory latency. The proposed hardware-efficient algorithm and the corresponding architecture achieve significant gains in throughput, latency, and power efficiency with only 1.25% accuracy degradation compared to the original NG-fSGM when evaluated on the Middlebury dataset.
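For readers unfamiliar with the dynamic-programming core that semi-global matching builds on, here is a minimal sketch of the standard SGM cost aggregation along a single scan direction. It is not NG-fSGM itself: the paper's algorithm searches 2-D flow vectors and prunes candidates using neighboring pixels, which this sketch does not attempt. The function name is hypothetical, and `p1` and `p2` are the small-change and large-change smoothness penalties as conventionally named in the SGM literature.

```python
from typing import List


def aggregate_path(costs: List[List[float]], p1: float, p2: float) -> List[List[float]]:
    """Aggregate matching costs along one scan direction.

    costs[x][d] is the matching cost of pixel x for candidate d.
    Returns the aggregated cost for every pixel and candidate.
    """
    n_cand = len(costs[0])
    agg = [list(costs[0])]  # the first pixel on the path has no predecessor
    for x in range(1, len(costs)):
        prev = agg[-1]
        prev_min = min(prev)
        row = []
        for d in range(n_cand):
            best = prev[d]  # same candidate as the previous pixel: no penalty
            if d > 0:
                best = min(best, prev[d - 1] + p1)  # neighboring candidate
            if d + 1 < n_cand:
                best = min(best, prev[d + 1] + p1)  # neighboring candidate
            best = min(best, prev_min + p2)  # any larger jump
            # Subtracting prev_min keeps the values bounded along the path.
            row.append(costs[x][d] + best - prev_min)
        agg.append(row)
    return agg


# Toy example: the best candidate shifts by one at each pixel.
costs = [[0, 5, 5], [5, 0, 5], [5, 5, 0]]
agg = aggregate_path(costs, p1=1, p2=3)
print(agg)
```

Each pixel depends on the aggregated costs of its predecessor on the path. A full SGM implementation sums the results over several path directions before picking the minimum-cost candidate per pixel.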


IEEE Journal of Solid-state Circuits | 2018

A 1920 × 1080 30-frames/s 2.3 TOPS/W Stereo-Depth Processor for Energy-Efficient Autonomous Navigation of Micro Aerial Vehicles

Ziyun Li; Qing Dong; Mehdi Saligane; Benjamin P. Kempke; Luyao Gong; Zhengya Zhang; Ronald G. Dreslinski; Dennis Sylvester; David T. Blaauw; Hun-Seok Kim

This paper presents a single-chip, high-performance, and energy-efficient stereo vision depth-estimation processor for micro aerial vehicles (MAVs). The proposed processor implements the state-of-the-art semi-global matching (SGM) algorithm to deliver full high-definition (HD, 1920 × 1080) stereo-depth outputs with a maximum throughput of 38 frames/s. Algorithm-architecture co-optimization is conducted, introducing overlapping block-based processing that eliminates very large on-chip memory and off-chip DRAM. We exploit inherent data parallelism in the algorithm by processing 128 local disparity costs and aggregating the SGM costs along four paths for all 128 disparities in parallel. A dependence-resolving scan with a 16-stage deep pipeline is introduced to hide the data dependence between neighboring pixels in the SGM algorithm. Moreover, we propose a customized ultra-high-bandwidth dual-port SRAM that utilizes the unique memory access characteristic of SGM to achieve highly energy-efficient memory access at a very high on-chip memory bandwidth of 1.64 Tb/s. The fabricated processor produces 512 levels of depth information for each pixel at full HD resolution with 30-frames/s performance, consuming 836 mW from a 0.75-V supply in TSMC 40-nm GP CMOS. We ported the design onto a quadcopter MAV to demonstrate its performance in realistic real-time flight.


symposium on vlsi circuits | 2017

An ultra-wide program, 122pJ/bit flash memory using charge recycling

Supreet Jeloka; Jeongsup Lee; Ziyun Li; Jinal Shah; Qing Dong; Kaiyuan Yang; Dennis Sylvester; David T. Blaauw

Embedded flash for low-power sensing systems requires very low write energy and peak power. This work proposes a 130nm, 1024×260 SONOS flash with an ultra-wide 1Kb program cycle, using efficient FN-tunneling-based programming and a dedicated, multi-output transition pump with charge sharing and charge recycling. Combined with energy-efficient charge pumps, the proposed flash achieves a program energy of 122pJ/bit at a 1Mbps throughput.


international solid-state circuits conference | 2017

3.7 A 1920×1080 30fps 2.3TOPS/W stereo-depth processor for robust autonomous navigation

Ziyun Li; Qing Dong; Mehdi Saligane; Benjamin P. Kempke; Shijia Yang; Zhengya Zhang; Ronald G. Dreslinski; Dennis Sylvester; David T. Blaauw; Hun-Seok Kim

Precise depth estimation is a key kernel function for realizing autonomous navigation on micro-aerial vehicles (MAVs). The state-of-the-art semi-global matching (SGM) algorithm has become favored for its high accuracy. In particular, it effectively handles low-texture regions due to its global optimization of the disparity between a left and right image over the entire frame. However, SGM involves massively parallel computation (∼2TOP/s) and extremely high-bandwidth memory access (38.6Tb/s) for 30fps HD resolution. This leads to ∼20s runtime for an HD image pair on a 3GHz CPU [1], requiring ∼386MB memory and >35W power consumption. Together, these factors place it well outside the realm of MAVs. Prior ASIC implementations have used either simpler local methods [2] or aggressively truncated global algorithms [3] that produce a depth map with significantly inferior quality or limited disparity range (32 or 64 pixels) and therefore fail to support standard automotive scene benchmarks [2–5]. In addition, due to the high memory requirement of SGM, prior methods [3–4] have used external DRAM to store intermediate computation, significantly reducing performance and efficiency.


international solid-state circuits conference | 2016

Yao Shi; Myungjoon Choi; Ziyun Li; Gyouho Kim; Zhiyoong Foo; Hun-Seok Kim; David D. Wentzloff; David T. Blaauw


international conference on image processing | 2016

Jiang Xiang; Ziyun Li; David T. Blaauw; Hun-Seok Kim; Chaitali Chakrabarti

Collaboration


Dive into Ziyun Li's collaborations.

Top Co-Authors

Qing Dong
University of Michigan

Gyouho Kim
University of Michigan

Jiang Xiang
Arizona State University