Publication


Featured research published by Leibo Liu.


IEEE Transactions on Circuits and Systems II: Express Briefs | 2010

An Implementation of Fast-Locking and Wide-Range 11-bit Reversible SAR DLL

Lei Wang; Leibo Liu; Hongyi Chen

This brief proposes a novel circuit architecture for an 11-bit reversible successive approximation register (RSAR)-controlled all-digital delay-locked loop (DLL), which achieves adaptive bandwidth over a wide operating range by using the modified binary search algorithm of the RSAR scheme. Moreover, it locks quickly because it first finds the suitable delay range and then runs the successive approximation register process. The proposed RSAR DLL is fabricated in a 0.2 × 0.1 mm² silicon area with SMIC 0.13-μm 1P6M complementary metal-oxide-semiconductor technology. Tests show that the chip works over a wide frequency range from 30 MHz to 1 GHz, with a lock-in time of less than 42 cycles, a delay resolution of 10 ps, and a power dissipation of 1.5 mW at 30 MHz.
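The successive approximation step mentioned in this abstract can be illustrated with a generic sketch. This is plain textbook SAR binary search, not the paper's RSAR circuit: the coarse range-finding phase and the reversible behavior are not modeled, and the comparison `trial <= target` is a hypothetical stand-in for a phase detector decision.

```python
def sar_search(target, bits=11):
    """Generic successive-approximation search (illustrative only).

    Resolves one bit per comparison, from the most significant bit down,
    and returns the largest code that does not exceed `target` together
    with the number of comparisons used.
    """
    code = 0
    steps = 0
    for bit in range(bits - 1, -1, -1):
        trial = code | (1 << bit)  # tentatively set the current bit
        steps += 1
        if trial <= target:        # stand-in for the phase detector decision
            code = trial           # keep the bit
    return code, steps


if __name__ == "__main__":
    print(sar_search(1234))
```

An 11-bit code is resolved in 11 comparisons regardless of the target, which is why SAR control converges much faster than a linear counter that may need up to 2047 steps.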


IEEE Journal of Solid-State Circuits | 2004

A VLSI architecture of JPEG2000 encoder

Leibo Liu; Ning Chen; Hongying Meng; Li Zhang; Zhihua Wang; Hongyi Chen

This paper proposes a VLSI architecture for a JPEG2000 encoder, which functionally consists of two parts: discrete wavelet transform (DWT) and embedded block coding with optimized truncation (EBCOT). For DWT, a spatial combinative lifting algorithm (SCLA)-based scheme with both 5/3 reversible and 9/7 irreversible filters is adopted to reduce multiplication computations by 50% and 42%, respectively, compared with the conventional lifting-based implementation (LBI). For EBCOT, a dynamic memory control (DMC) strategy for Tier-1 encoding is adopted to reduce the scale of the on-chip wavelet coefficient storage by 60%, and a subband parallel-processing method is employed to speed up the EBCOT context formation (CF) process; an architecture for Tier-2 encoding is presented to reduce the scale of on-chip bitstream buffering from full-tile size down to three-code-block size and to considerably reduce the iterations of the rate-distortion (RD) truncation.
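For readers unfamiliar with lifting, the following is a minimal sketch of a one-level, one-dimensional 5/3 reversible transform in its conventional lifting form. It is not the SCLA scheme from the paper; the function names are illustrative, the input is assumed to be an even-length list of integers, and boundaries use symmetric extension.

```python
def dwt53_forward(x):
    """One-level 1-D 5/3 reversible lifting (conventional form, illustrative)."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("expects an even-length signal with at least 2 samples")
    even, odd = x[0::2], x[1::2]
    half = len(even)
    # Predict step: high-pass coefficients from the odd samples.
    d = [odd[k] - ((even[k] + even[min(k + 1, half - 1)]) >> 1)
         for k in range(half)]
    # Update step: low-pass coefficients from the even samples.
    s = [even[k] + ((d[max(k - 1, 0)] + d[k] + 2) >> 2)
         for k in range(half)]
    return s, d


def dwt53_inverse(s, d):
    """Undo the lifting steps in reverse order to recover the signal exactly."""
    half = len(s)
    even = [s[k] - ((d[max(k - 1, 0)] + d[k] + 2) >> 2)
            for k in range(half)]
    odd = [d[k] + ((even[k] + even[min(k + 1, half - 1)]) >> 1)
           for k in range(half)]
    x = [0] * (2 * half)
    x[0::2], x[1::2] = even, odd
    return x


if __name__ == "__main__":
    signal = [10, 12, 14, 13, 9, 7, 7, 8]
    low, high = dwt53_forward(signal)
    print(low, high, dwt53_inverse(low, high) == signal)
```

Because each lifting step is undone exactly by its mirror step, the integer transform is lossless, which is what "reversible" refers to in the 5/3 filter name.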


Design Automation Conference | 2013

Polyhedral model based mapping optimization of loop nests for CGRAs

Dajiang Liu; Shouyi Yin; Leibo Liu; Shaojun Wei

The coarse-grained reconfigurable architecture (CGRA) is a promising platform that provides both high performance and high power efficiency. The compute-intensive portions of an application (e.g., loops) are often mapped onto a CGRA for acceleration. To optimize the mapping of loop nests to a CGRA, this paper makes two contributions: i) establishing a precise CGRA performance model and formulating the loop-nest mapping as a nonlinear optimization problem based on the polyhedral model; ii) deriving an efficient heuristic loop transformation and mapping algorithm (PolyMAP) to improve mapping performance. Experimental results on most kernels of PolyBench and on real-life applications show that the proposed approach improves the performance of the kernels by 21% on average, compared with one of the best existing mapping algorithms, EPIMap. The runtime complexity of PolyMAP is also acceptable.


International Symposium on Circuits and Systems | 2010

A reconfigurable multi-processor SoC for media applications

Min Zhu; Leibo Liu; Shouyi Yin; Yansheng Wang; Wenjie Wang; Shaojun Wei

This paper proposes a reconfigurable multi-processor SoC for media applications called REMUS (REconfigurable Multi-media System), which consists of 512 processing engines and two ARMs. The processing engines are divided into two dynamic configuration groups, which can be easily tailored and extended. The processing engines, data buffering interfaces (DBIs), and context interfaces form a high-throughput computing system with thread parallelism, algorithm parallelism, and data parallelism. Different algorithms can be mapped at the same time. REMUS is suitable for applications such as media decoding and baseband processing. Simulation results show that REMUS can decode H.264 high-profile streams at 1920×1088@30fps in real time at 200 MHz.


Custom Integrated Circuits Conference | 2013

An energy-efficient coarse-grained dynamically reconfigurable fabric for multiple-standard video decoding applications

Leibo Liu; Chenchen Deng; Dong Wang; Min Zhu; Shouyi Yin; Peng Cao; Shaojun Wei

In this paper, we introduce a coarse-grained dynamically reconfigurable fabric, named Reconfigurable Processing Unit (RPU), which is implemented on a 5.4×3.1 mm² silicon area with TSMC 65 nm LP1P8M technology. This fabric consists of 16×16 multi-functional Processing Elements (PEs) interconnected by an area-efficient Line-Switched Mesh Connect (LSMC) routing. A Hierarchical Configuration Context (HCC) organization scheme is proposed to reduce the scale of the context memory and enhance configuration efficiency. Two reconfigurable processors are then designed and fabricated to verify the proposed techniques. One processor (called REMUS_HPP) integrates two RPUs and targets high-performance applications. REMUS_HPP can decode 1920×1080@30fps H.264 streams with 280 mW at 200 MHz, achieving a performance gain of 1.81x and a 14.3x energy efficiency improvement over XPP-III. The other processor (called REMUS_LPP) integrates only one RPU and targets low-power applications. REMUS_LPP can decode 720×480@35fps H.264 streams with 24.81 mW at 75 MHz, achieving a 76% power reduction and a 3.96x energy efficiency improvement compared with ADRES. More importantly, the RPU is not limited to video decoding applications. It can also be used to process other computation-intensive applications, and the corresponding analysis is given in this paper as well.


IEEE Transactions on Reliability | 2015

A Stochastic Approach for the Analysis of Dynamic Fault Trees With Spare Gates Under Probabilistic Common Cause Failures

Peican Zhu; Jie Han; Leibo Liu; Fabrizio Lombardi

A redundant system usually consists of primary and standby modules. The so-called spare gate is extensively used to model the dynamic behavior of redundant systems in the application of dynamic fault trees (DFTs). Several methodologies have been proposed to evaluate the reliability of DFTs containing spare gates by computing the failure probability. However, either a complex analysis or significant simulation time is usually required by such an approach. Moreover, it is difficult to compute the failure probability of a system with component failures that are not exponentially distributed. Additionally, probabilistic common cause failures (PCCFs) have been widely reported, usually occurring in a statistically dependent manner. Failure to account for the effect of PCCFs overestimates the reliability of a DFT. In this paper, stochastic computational models are proposed for an efficient analysis of spare gates and PCCFs in a DFT. Using these models, a DFT with spare gates under PCCFs can be efficiently evaluated. In the proposed stochastic approach, a signal probability is encoded as a non-Bernoulli sequence of random permutations of fixed numbers of ones and zeros. The failure probability of a component is not limited to an exponential distribution; thus, this approach is applicable to DFT analysis in the general case. Several case studies are evaluated to show the accuracy and efficiency of the proposed approach, compared to both an analytical approach and Monte Carlo (MC) simulation.
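The encoding idea in this abstract, a probability represented as a random permutation of a fixed number of ones and zeros, can be sketched as follows. This is only an illustration of the encoding applied to a static AND gate (both inputs fail); the spare-gate and PCCF models of the paper are not reproduced, and the function names and sequence length are assumptions.

```python
import random


def encode(p, length, rng):
    """Encode probability p as a randomly permuted sequence with a fixed
    number of ones (a non-Bernoulli sequence)."""
    ones = round(p * length)
    seq = [1] * ones + [0] * (length - ones)
    rng.shuffle(seq)
    return seq


def decode(seq):
    """Recover the probability as the fraction of ones in the sequence."""
    return sum(seq) / len(seq)


def and_gate(a, b):
    """Bitwise AND of two independent sequences estimates the product
    of their probabilities (both events occur)."""
    return [x & y for x, y in zip(a, b)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    a = encode(0.3, 10000, rng)
    b = encode(0.5, 10000, rng)
    print(decode(a), decode(b), decode(and_gate(a, b)))
```

Because the number of ones is fixed rather than drawn bit by bit, each input sequence decodes to its probability exactly, and the random fluctuation appears only in how the sequences overlap at the gate output.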


IEEE Transactions on Very Large Scale Integration Systems | 2014

SimRPU: A Simulation Environment for Reconfigurable Architecture Exploration

Leibo Liu; Dong Wang; Shouyi Yin; Yingjie Victor Chen; Min Zhu; Shaojun Wei

To assist system architects with fast exploration and performance evaluation of reconfigurable software/hardware architectures, this paper presents a system-level simulator, named SimRPU, for the reconfigurable processing unit (RPU), which is the major computing engine in a reconfigurable processor. The proposed simulator consists of a simulation kernel, a software compiler, a system profiler providing performance, area, and power information for the desired architectures, and a system debugger supporting inspection and modification of the internal state of the RPU. Object-oriented hierarchical and parameterized architecture modeling techniques are proposed to satisfy the requirements for a fast and comprehensive evaluation. Cycle-accurate simulation mechanisms are developed to improve the accuracy of the profiled performance data. Compared with the traditional register transfer level (RTL)-based simulation scheme, the proposed simulator achieves an average speedup of 18.5× with only a 3.5% reduction in performance estimation accuracy. One reconfigurable processor targeted at high-definition multimedia decoding applications (such as H.264, MPEG2, and AVS) is implemented in a Taiwan Semiconductor Manufacturing Company 65-nm process using the proposed exploration and design flow. The measured results show that the implemented architecture has clear advantages in both performance and power consumption over the reference designs in multimedia decoding applications.


IEEE Transactions on Very Large Scale Integration Systems | 2015

A Fault-Tolerant Technique Using Quadded Logic and Quadded Transistors

Jie Han; Eugene Leung; Leibo Liu; Fabrizio Lombardi

Advances in CMOS technology have made digital circuits and systems very sensitive to manufacturing variations, aging, and/or soft errors. Fault-tolerant techniques using hardware redundancy have been extensively investigated for improving reliability. Quadded logic (QL) is an interwoven redundant logic technique that corrects errors by switching them from critical to subcritical status; however, QL cannot correct errors in the last one or two layers of a circuit. In contrast to QL, quadded transistor (QT) corrects errors while performing the function of a circuit. In this brief, a technique that combines QL with QT is proposed to take advantage of both techniques. The proposed quadded logic with quadded transistor (QLQT) technique is evaluated and compared with other fault-tolerant techniques, such as triple modular redundancy and triple interwoven redundancy, using stochastic computational models. Simulation results show that QLQT has better reliability than the other fault-tolerant techniques (except in the very restrictive case of small circuits with low gate error rates and very short paths from primary inputs to primary outputs). These results provide new insight for implementing efficient fault-tolerant techniques in the design of reliable circuits and systems.


IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2015

An Efficient Application Mapping Approach for the Co-Optimization of Reliability, Energy, and Performance in Reconfigurable NoC Architectures

Chen Wu; Chenchen Deng; Leibo Liu; Jie Han; Jiqiang Chen; Shouyi Yin; Shaojun Wei

In this paper, an efficient application mapping approach is proposed for the co-optimization of reliability, communication energy, and performance (CoREP) in network-on-chip (NoC)-based reconfigurable architectures. A cost model for the CoREP is developed to evaluate the overall cost of a mapping. In this model, communication energy and latency (as a measure of performance) are first combined in an energy latency product (ELP), and then the ELP is co-optimized with reliability by a weight parameter that defines the optimization priority. Both transient and intermittent errors in the NoC are modeled in CoREP. Based on CoREP, a mapping approach, referred to as priority and ratio oriented branch and bound (PRBB), is proposed to derive the best mapping by enumerating all the candidate mappings organized in a search tree. Two techniques, branch node priority recognition and partial cost ratio utilization, are adopted to improve the search efficiency. Experimental results show that the proposed approach achieves significant improvements in reliability, energy, and performance. Compared with the state-of-the-art methods in the same scope, the proposed approach has the following distinctive advantages: 1) CoREP is flexible enough to address various NoC topologies and routing algorithms, while others are limited to specific topologies and/or routing algorithms; 2) general quantitative evaluations of reliability, energy, and performance are made separately before being integrated into a unified cost model, while other similar models only touch upon two of them; and 3) CoREP-based PRBB attains a competitive processing speed, which is faster than other mapping approaches.


Sensors | 2015

Fast Traffic Sign Recognition with a Rotation Invariant Binary Pattern Based Feature

Shouyi Yin; Peng Ouyang; Leibo Liu; Yike Guo; Shaojun Wei

Robust and fast traffic sign recognition is very important but difficult for safe driving assistance systems. This study addresses fast and robust traffic sign recognition to enhance driving safety. The proposed method includes three stages. First, a typical Hough transformation is adopted to implement coarse-grained location of the candidate regions of traffic signs. Second, a RIBP (Rotation Invariant Binary Pattern) based feature in the affine and Gaussian space is proposed to reduce the time of traffic sign detection and achieve robust traffic sign detection in terms of scale, rotation, and illumination. Third, the techniques of ANN (Artificial Neural Network) based feature dimension reduction and classification are designed to reduce the traffic sign recognition time. Compared with existing work, the experimental results on public datasets show that this work achieves robustness in traffic sign recognition with comparable recognition accuracy and faster processing speed, including training speed and recognition speed.

Collaboration


Dive into Leibo Liu's collaborations.

Top Co-Authors

Peng Cao

Southeast University

Jie Han

University of Alberta
