Xiaoyang Zeng
Fudan University
Publication
Featured research published by Xiaoyang Zeng.
IEEE Journal of Solid-State Circuits | 2008
Jian Li; Xiaoyang Zeng; Lei Xie; Jun Chen; Jianyun Zhang; Yawei Guo
This paper describes a 10-bit 30-MS/s subsampling pipelined analog-to-digital converter (ADC) implemented in a 0.18-μm CMOS process. The ADC adopts a power-efficient amplifier-sharing architecture in which additional switches are introduced to reduce the crosstalk between the two successive stages that share an opamp. A new configuration is used in the first stage of the ADC to avoid a dedicated sample-and-hold amplifier (SHA) at the input and to remove the matching requirement between the signal paths of the first multiplying digital-to-analog converter (MDAC) and the flash. A symmetrical gate-bootstrapping switch is used as the bottom-sampling switch in the first stage to enhance sampling linearity. The measured differential and integral nonlinearities of the prototype are less than 0.57 least significant bit (LSB) and 0.8 LSB, respectively, at the full sampling rate. The ADC achieves more than 9.1 effective number of bits (ENOB) for input frequencies up to 30 MHz, twice the Nyquist frequency (fs/2), at 30 MS/s. The ADC consumes 21.6 mW from a 1.8-V supply and occupies 0.7 mm², including the bandgap and buffer amplifiers. Its figure of merit (FOM) is 0.26 pJ/step.
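A subsampling ADC digitizes inputs above fs/2 by letting them alias into the first Nyquist zone (0 to fs/2). The following is a minimal sketch of the standard alias-frequency relation for an ideal sampler, not a model of this ADC; the function name `alias_frequency` is illustrative only.

```python
def alias_frequency(f_in: float, fs: float) -> float:
    """Return the apparent frequency (between 0 and fs/2) of a tone at f_in
    after ideal sampling at rate fs: the distance from f_in to the nearest
    integer multiple of fs."""
    return abs(f_in - fs * round(f_in / fs))


# A 20-MHz tone sampled at 30 MS/s appears at 10 MHz.
print(alias_frequency(20e6, 30e6))  # prints 10000000.0
```

This is why an input in the second Nyquist zone (15 to 30 MHz at 30 MS/s) can still be recovered, provided the sampling front end keeps its linearity at those input frequencies.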
International Conference on Multimedia and Expo | 2012
Sha Shen; Weiwei Shen; Yibo Fan; Xiaoyang Zeng
4- or 8-point IDCTs are widely used in traditional video coding standards, but larger (16/32-point) IDCTs have been proposed for next-generation standards such as HEVC. To meet this requirement, this work proposes a fast computational algorithm for large integer IDCTs, together with a unified VLSI architecture for 4/8/16/32-point integer IDCT. The architecture supports MPEG-2/4, H.264, AVS, VC-1, and HEVC. Multiplierless MCM (multiple constant multiplication) is used for the 4/8-point IDCT, while regular multipliers with a sharing technique are used for the 16/32-point IDCT. The transpose memory uses SRAM instead of the traditional register array to further reduce hardware overhead. The design supports real-time decoding of 4K×2K (4096×2048) 30-fps video at a 191-MHz working frequency, with a 93K gate count and 18,944 bits of SRAM. We propose a normalized criterion, called design efficiency, for comparison with previous works; by this criterion, the design is 31% more efficient than previous work.
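Multiplierless MCM replaces several multiplications of one input by fixed constants with shifts and adds that share intermediate terms. The sketch below applies the idea to the 4-point HEVC inverse core transform (coefficients 64, 83, 36) in partial-butterfly form. The decomposition 83x = 64x + 2·9x + x and 36x = 4·9x, with 9x shared, is one choice made for illustration and is not necessarily the one used in the paper; the rounding shift and clipping of a real decoder are omitted.

```python
# 4-point HEVC core transform matrix M; the inverse transform computes x = M^T y.
HEVC_DCT4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]


def mcm_83_36(x: int) -> tuple[int, int]:
    """Return (83*x, 36*x) using only shifts and adds; 9*x is computed once and shared."""
    t9 = (x << 3) + x                           # 9x
    return (x << 6) + (t9 << 1) + x, t9 << 2    # 64x + 18x + x, 36x


def idct4_mcm(y: list[int]) -> list[int]:
    """Unscaled 4-point inverse transform via even/odd partial butterfly."""
    y0, y1, y2, y3 = y
    p1, q1 = mcm_83_36(y1)   # 83*y1, 36*y1
    p3, q3 = mcm_83_36(y3)   # 83*y3, 36*y3
    o0 = p1 + q3             # odd part: 83*y1 + 36*y3
    o1 = q1 - p3             # odd part: 36*y1 - 83*y3
    e0 = (y0 + y2) << 6      # even part: 64*(y0 + y2)
    e1 = (y0 - y2) << 6      # even part: 64*(y0 - y2)
    return [e0 + o0, e1 + o1, e1 - o1, e0 - o0]


def idct4_reference(y: list[int]) -> list[int]:
    """Direct matrix product x[j] = sum_i M[i][j] * y[i], for comparison."""
    return [sum(HEVC_DCT4[i][j] * y[i] for i in range(4)) for j in range(4)]


print(idct4_mcm([10, -3, 0, 2]) == idct4_reference([10, -3, 0, 2]))  # True
```

The butterfly needs only four constant products (two per odd input), and MCM turns each pair into a handful of adders, which is where the hardware saving for small transforms comes from.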
IEEE Journal of Solid-State Circuits | 2011
Bo Xiang; Dan Bao; Shuangqu Huang; Xiaoyang Zeng
This paper presents a partially parallel, dual-path, fully overlapped QC-LDPC decoder for the WiMAX system. By adopting five techniques (symmetrical six-stage pipelining, block column and row interleaving, nonzero sub-matrix reordering, sum-memory quad-partition, and read-write bypass), the decoder continuously scans nonzero sub-matrices two at a time in block-row order without any memory access conflict. The two phases are fully overlapped, and the check-node updating phase always takes the latest sums from the preceding variable-node updating phase. The sum memory stores not only the posterior sums but also the prior messages, which saves 11,520 memory bits. The decoder needs only 48-54 clock cycles per iteration, and read-write accesses to the sum memories are reduced by 24.3%-48.8%. Fabricated in the SMIC 0.13-μm CMOS process, the decoder occupies 4.84 mm² with a core area of 3.03 mm², attains 847-955 Mb/s at 214 MHz and 10 iterations, and consumes 342-397 mW at 1.2 V, for a power efficiency of 39-46 pJ per bit per iteration.
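The "nonzero sub-matrices" the decoder scans come from the quasi-cyclic structure: the parity-check matrix H is described by a small base matrix of shift values, each expanded into a z×z block. The sketch below shows that expansion under one common convention (entry -1 is an all-zero block; entry s ≥ 0 is the identity cyclically shifted by s). The toy base matrix is made up for illustration and is not a WiMAX code; the function names are hypothetical.

```python
def expand_base_matrix(base: list[list[int]], z: int) -> list[list[int]]:
    """Expand a QC-LDPC base (shift) matrix into the binary parity-check matrix H.

    Entry -1 -> z x z all-zero sub-matrix.
    Entry s >= 0 -> z x z identity cyclically shifted by s: row r of the block
    has its single 1 in column (r + s) mod z (conventions differ between codes).
    """
    H = [[0] * (len(base[0]) * z) for _ in range(len(base) * z)]
    for bi, block_row in enumerate(base):
        for bj, s in enumerate(block_row):
            if s < 0:
                continue  # zero sub-matrix: nothing to store or process
            for r in range(z):
                H[bi * z + r][bj * z + (r + s) % z] = 1
    return H


def nonzero_blocks(base: list[list[int]]) -> list[tuple[int, int, int]]:
    """List (block_row, block_col, shift) of nonzero sub-matrices in block-row order."""
    return [(i, j, s) for i, row in enumerate(base) for j, s in enumerate(row) if s >= 0]


# Toy 2 x 3 base matrix with expansion factor z = 3.
toy_base = [[0, 1, -1],
            [2, -1, 0]]
for row in expand_base_matrix(toy_base, 3):
    print(row)
print(nonzero_blocks(toy_base))
```

A hardware decoder never builds H explicitly; it only needs the list of nonzero blocks and their shifts, which is the sequence a block-row-wise scan walks through.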
IEEE Transactions on Circuits and Systems II: Express Briefs | 2015
Yao Zou; Jun Han; Sizhong Xuan; Shan Huang; Xinqian Weng; Dabin Fang; Xiaoyang Zeng
Long-term electrocardiogram (ECG) monitoring requires a portable inspection system that offers both high energy efficiency and good signal quality. To meet this requirement, this brief presents an ASIC for both ECG recording and R-peak detection. By supporting data compression and using a dual-ping-pong-memory architecture, the design greatly reduces whole-system energy. To reduce hardware cost, coefficient truncation and resource sharing are adopted. In addition, parameter optimization and periodic extension are employed to improve the quality of the reconstructed signal at a high compression ratio (CR). Experimental results on the MIT-BIH database show that the design achieves a CR of 10.3 with a percentage root-mean-square difference (PRD) of 0.64% in recording mode, and an R-peak detection sensitivity of 99.72% and a positive predictivity of 99.49% with 13.68× data reduction. The design has been fabricated in TSMC 65-nm CMOS technology with an area of 0.41 mm². Measurements indicate that the total system energy for processing one signal segment is reduced by 5.7× and 2.3× in recording and detection modes, respectively. Compared with a software implementation on the same chip, the ASIC implementation reduces energy consumption per sample by 41% and 50% in the two modes, respectively.
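The quality and detection figures quoted above are usually computed with the standard definitions below. This is a hedged sketch of the common textbook forms, not the paper's evaluation code; some ECG-compression papers use a mean- or baseline-removed PRD instead, and the function names are illustrative.

```python
import math


def prd_percent(x: list[float], x_hat: list[float]) -> float:
    """Percentage root-mean-square difference between original x and reconstruction x_hat
    (plain form; some works subtract the signal mean or baseline before computing it)."""
    num = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    den = sum(a * a for a in x)
    return 100.0 * math.sqrt(num / den)


def compression_ratio(original_bits: int, compressed_bits: int) -> float:
    """CR = size of the raw signal / size of the compressed signal."""
    return original_bits / compressed_bits


def sensitivity(tp: int, fn: int) -> float:
    """Se = TP / (TP + FN): fraction of true R peaks that were detected."""
    return tp / (tp + fn)


def positive_predictivity(tp: int, fp: int) -> float:
    """+P = TP / (TP + FP): fraction of detections that are true R peaks."""
    return tp / (tp + fp)


print(prd_percent([2.0, 0.0], [1.0, 0.0]))  # 50.0
```

Because PRD normalizes by signal energy, a low PRD at a high CR (such as 0.64% at CR 10.3) indicates that the reconstruction error is small relative to the signal itself.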
IEEE Transactions on Very Large Scale Integration Systems | 2010
Bo Xiang; Rui Shen; An Pan; Dan Bao; Xiaoyang Zeng
Quasi-cyclic low-density parity-check (QC-LDPC) codes are widely used in digital broadcast and communication systems. However, their decoders remain difficult to deploy because of their large area and high power consumption, especially in wireless mobile devices. This paper presents an improved all-purpose multirate iterative decoder architecture for QC-LDPC codes that substantially reduces area and power. The architecture implements the normalized min-sum algorithm, rearranges the original two-phase message-passing flow, and adopts an efficient quantization method for the second-minimum absolute values, an optimized storage scheme for the position indexes and signs, and a carefully designed clock-gating technique for the numerous memories and registers. The decoder is also configurable for any regular or irregular QC-LDPC code and can easily be tuned to different code rates and codeword lengths. The chip is fabricated in an SMIC 0.18-μm six-metal-layer standard CMOS technology. It attains a throughput of 104.5 Mb/s and dissipates an average power of 486 mW at 125 MHz with 15 decoding iterations. The core area is only 9.76 mm². The chip has been applied in the China digital terrestrial/television multimedia broadcasting system.
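In the normalized min-sum algorithm, each check node sends every connected variable node the product of the other incoming signs times a scaled minimum of the other incoming magnitudes. The minimal sketch below shows why only the smallest magnitude (min1), the second smallest (min2), the index of min1, and the signs need to be kept per check node, which is the information the storage and quantization optimizations above operate on. The normalization factor 0.75 is an assumed, commonly used value, not one taken from the paper.

```python
def nms_check_node_update(msgs: list[float], alpha: float = 0.75) -> list[float]:
    """Normalized min-sum check-node update.

    msgs  : incoming variable-to-check messages (LLRs) of one parity check.
    alpha : normalization factor (assumed value).
    Returns the check-to-variable message for each edge, computed from the
    other incoming messages only.
    """
    if len(msgs) < 2:
        raise ValueError("a check node needs at least two edges")
    signs = [1 if m >= 0 else -1 for m in msgs]
    sign_product = 1
    for s in signs:
        sign_product *= s
    mags = [abs(m) for m in msgs]
    # Only min1, min2, the index of min1 and the signs are needed.
    idx1 = min(range(len(mags)), key=mags.__getitem__)
    min1 = mags[idx1]
    min2 = min(m for i, m in enumerate(mags) if i != idx1)
    # sign_product * s removes edge i's own sign, since s * s == 1.
    return [alpha * sign_product * s * (min2 if i == idx1 else min1)
            for i, s in enumerate(signs)]


print(nms_check_node_update([2.0, -1.0, 3.0, -4.0], alpha=1.0))  # [1.0, -2.0, 1.0, -1.0]
```

Every outgoing magnitude is either min1 or min2, so a check node of any degree compresses to two magnitudes, one index, and one sign bit per edge.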
IEEE Transactions on Circuits and Systems | 2010
Dan Bao; Bo Xiang; Rui Shen; An Pan; Yun Chen; Xiaoyang Zeng
A programmable architecture is proposed for a flexi-mode quasi-cyclic low-density parity-check (QC-LDPC) code decoder. The proposed architecture has the following advantages: 1) code rate, length, and pattern can be programmed on the fly; 2) decoding complexity is reduced by algorithm modification; 3) memory read/write operations are reduced by access optimization and a hierarchical memory structure; and 4) an early stopping scheme is adopted to improve power efficiency, particularly in the low signal-to-noise-ratio (SNR) region. A decoder chip is implemented in SMIC 180-nm 1.8-V CMOS technology. Experimental results show the advantages in terms of flexibility, area, power, and error-correction performance.
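The abstract does not specify the early stopping criterion. The sketch below is a hypothetical rule of the general kind used for this purpose: stop when the syndrome is zero (a valid codeword), and also stop when the number of unsatisfied parity checks stops improving, which saves iterations on frames that are unlikely to converge at low SNR. The function names and the `patience` parameter are illustrative assumptions.

```python
def unsatisfied_checks(check_rows: list[list[int]], hard_bits: list[int]) -> int:
    """Syndrome weight: number of parity checks whose connected bits XOR to 1.
    check_rows[c] lists the variable-node indices connected to check c."""
    return sum(sum(hard_bits[v] for v in row) % 2 for row in check_rows)


def should_stop(history: list[int], patience: int = 3) -> bool:
    """Hypothetical early-stopping rule.

    history holds the syndrome weight after each iteration.
    - Stop with success when the syndrome weight is zero (valid codeword).
    - Stop early (frame likely undecodable) when, over the last `patience`
      iterations, the weight never fell below its value just before them.
    """
    if history and history[-1] == 0:
        return True
    if len(history) > patience:
        baseline = history[-patience - 1]
        return min(history[-patience:]) >= baseline
    return False


print(unsatisfied_checks([[0, 1, 2], [1, 2, 3]], [1, 1, 0, 0]))  # 1
print(should_stop([9, 7, 7, 8, 7]))                                # True
```

A larger `patience` wastes more iterations on hopeless frames but is less likely to abandon a frame that would eventually have converged.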
International Solid-State Circuits Conference | 2013
Peng Ou; Jiajie Zhang; Heng Quan; Yi Li; Maofei He; Zheng Yu; Xueqiu Yu; Shile Cui; Jie Feng; Shikai Zhu; Jie Lin; Ming'e Jing; Xiaoyang Zeng; Zhiyi Yu
With the increasing complexity and variety of applications, programmable multi-core processors are drawing attention because of their high flexibility and low implementation cost, yet their performance and energy efficiency still cannot meet the demands of many compute-intensive applications. This paper describes a high-performance, energy-efficient 24-core processor for multimedia and communication applications, with the following key features: (1) a packet-controlled circuit-switched double-layer network-on-chip (NoC) that provides 11 Tb/s/W energy efficiency with 435 Gb/s bisection bandwidth; (2) a cluster-shared, NoC-connected heterogeneous reconfigurable execution array, which can improve the performance of frequently used computations in multimedia and communication applications by over 6×; (3) memory hierarchy improvements, including a multi-page foreground and background register file, and memory splitting and sharing. The processor, implemented in TSMC 65-nm CMOS LP and occupying 18.8 mm² (Fig. 3.6.7), operates at 850 MHz at 1.2 V, with 523 mW power dissipation and 39 GOPS/W (26 pJ/operation) energy efficiency, which is 1.75× better than our former 16-core processor [3].
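Since 1 W = 1 J/s, an efficiency expressed in operations (or bits) per second per watt inverts directly to energy per operation (or per bit). The conversion below simply checks the figures quoted in the abstract against each other.

```latex
\begin{aligned}
E_{\mathrm{op}}  &= \frac{1}{39\ \mathrm{GOPS/W}}
                  = \frac{1}{39 \times 10^{9}\ \mathrm{op\,J^{-1}}}
                  \approx 2.56 \times 10^{-11}\ \mathrm{J/op}
                  \approx 26\ \mathrm{pJ/op} \\
E_{\mathrm{bit}} &= \frac{1}{11\ \mathrm{Tb/s/W}}
                  = \frac{1}{11 \times 10^{12}\ \mathrm{bit\,J^{-1}}}
                  \approx 9.1 \times 10^{-14}\ \mathrm{J/bit}
                  \approx 0.09\ \mathrm{pJ/bit}
\end{aligned}
```

The first line reproduces the 26 pJ/operation quoted alongside 39 GOPS/W; the second expresses the NoC figure as roughly 0.09 pJ per transported bit.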
International Solid-State Circuits Conference | 2012
Zhiyi Yu; Kaidi You; Ruijin Xiao; Heng Quan; Peng Ou; Yan Ying; Haofan Yang; Ming'e Jing; Xiaoyang Zeng
Almost all multicore processors use a shared-memory architecture because of its simple programming model. Recently, however, message passing has also drawn attention because of its potentially better scalability. In this work, we demonstrate that a hybrid communication mechanism supporting both message passing and shared memory can provide both higher performance and higher energy efficiency. This 16-core processor has three key features: (1) a cluster-based hierarchical architecture supporting both shared-memory and message-passing communication; (2) a cache-free memory hierarchy with an extended register file, a small private memory, and a moderate shared memory, which avoids complex cache-coherence issues and achieves high energy efficiency by keeping data accesses local; and (3) a hardware-aided mailbox mechanism to accelerate synchronization between processor nodes. With these techniques, the multicore processor provides high performance for many applications. Chip test results show a maximum clock frequency of 800 MHz and a typical power consumption of 320 mW when running basic applications with clock gating at 1.2 V and room temperature.
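A mailbox lets one core hand a word to another and block until the exchange has happened, so the same operation carries both the data and the synchronization. The sketch below is a software analogy using Python threads, assuming a hypothetical single-slot blocking mailbox; it illustrates the semantics only and says nothing about how the chip's hardware mechanism is built.

```python
import threading


class Mailbox:
    """Hypothetical single-slot blocking mailbox (software analogy of a
    hardware mailbox used to pass a word and synchronize two cores)."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._msg = None
        self._full = False

    def send(self, msg) -> None:
        """Block until the slot is empty, then deposit msg and wake the receiver."""
        with self._cond:
            while self._full:
                self._cond.wait()
            self._msg = msg
            self._full = True
            self._cond.notify_all()

    def receive(self):
        """Block until a message is present, then take it and wake any sender."""
        with self._cond:
            while not self._full:
                self._cond.wait()
            msg, self._msg = self._msg, None
            self._full = False
            self._cond.notify_all()
            return msg


# Usage: one "core" (thread) produces, another consumes, in lockstep.
box = Mailbox()
received = []
consumer = threading.Thread(target=lambda: received.extend(box.receive() for _ in range(3)))
consumer.start()
for word in (10, 20, 30):
    box.send(word)
consumer.join()
print(received)  # [10, 20, 30]
```

With a single slot, a sender cannot run ahead of its receiver by more than one message, which is what turns the data transfer into a synchronization point; a hardware mailbox does this without the software lock and condition-variable overhead.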
IEEE Transactions on Consumer Electronics | 2007
Jianming Wu; Yun Chen; Xiaoyang Zeng; Hao Min
In this paper, a robust timing and frequency synchronization algorithm for the DTMB system is developed. A two-step frequency offset estimation and compensation process is proposed to perform carrier recovery. First, coarse frequency estimation is achieved by exploiting the shift-and-add property of the m-sequence. The second step finds the frame start position and the residual frequency offset simultaneously. In addition, a timing tracking strategy is proposed to effectively track dynamic changes in the mobile environment. As a result, the proposed scheme can tolerate large frequency offsets and achieve accurate timing and frequency estimation. Simulation results under different channel conditions verify its performance.
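The shift-and-add property states that the XOR of an m-sequence with any nonzero cyclic shift of itself is another cyclic shift of the same sequence. Under the usual mapping 0 → +1, 1 → −1, XOR becomes multiplication, so the product of the sequence with a delayed copy is again a shifted copy of it. The sketch below verifies the property on a small example, assuming a period-15 m-sequence generated by the primitive polynomial x^4 + x + 1; DTMB's own frame-header sequences are longer and are not reproduced here.

```python
def m_sequence(seed: tuple[int, int, int, int] = (1, 0, 0, 0)) -> list[int]:
    """One period (15 chips) of the m-sequence with characteristic polynomial
    x^4 + x + 1, i.e. the recurrence s[n+4] = s[n+1] XOR s[n].
    The seed must not be all zeros."""
    s = list(seed)
    while len(s) < 15:
        s.append(s[-3] ^ s[-4])
    return s


def cyclic_shift(seq: list[int], k: int) -> list[int]:
    """Rotate seq left by k positions."""
    k %= len(seq)
    return seq[k:] + seq[:k]


def shift_and_add_holds(seq: list[int]) -> bool:
    """True if seq XOR (any nonzero cyclic shift of seq) is again a cyclic shift of seq."""
    shifts = {tuple(cyclic_shift(seq, j)) for j in range(len(seq))}
    for k in range(1, len(seq)):
        summed = tuple(a ^ b for a, b in zip(seq, cyclic_shift(seq, k)))
        if summed not in shifts:
            return False
    return True


s = m_sequence()
print(s)                       # [1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1]
print(shift_and_add_holds(s))  # True
```

An arbitrary binary sequence does not have this property; for example, a single 1 followed by zeros fails it, because XOR with a shift produces two 1s.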
International Symposium on Circuits and Systems | 2013
Weiwei Shen; Qing Shang; Sha Shen; Yibo Fan; Xiaoyang Zeng
As the next-generation video coding standard, High Efficiency Video Coding (HEVC) aims to provide significantly better compression performance than all existing video coding standards. We propose a four-stage pipelined hardware architecture for the HEVC deblocking filter that operates on a quarter-LCU basis. Combined with a novel filtering order, a memory-interlacing technique is adopted to increase throughput by allowing data to be accessed efficiently during both vertical and horizontal filtering. As a result, the design supports 4K×2K (4096×2048) video at 30 fps with a working frequency of only 28 MHz.
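Deblocking filters across vertical edges (reading samples along rows) and across horizontal edges (reading samples along columns), so a memory layout that is fast in one direction can stall in the other. The sketch below illustrates the general idea behind memory interlacing with a hypothetical skewed (diagonal) bank mapping, in which a full row or a full column of blocks can be fetched from distinct banks in one parallel access. It is a generic illustration under that assumption, not the specific scheme used in the paper.

```python
from typing import Callable

# bank(bx, by, n): which of n memory banks holds block (bx, by)
BankMap = Callable[[int, int, int], int]


def naive_bank(bx: int, by: int, n: int) -> int:
    """Hypothetical straightforward mapping: bank chosen by column index only."""
    return bx % n


def interlaced_bank(bx: int, by: int, n: int) -> int:
    """Hypothetical skewed (diagonal) interlacing: bank rotates with the row index."""
    return (bx + by) % n


def conflict_free(bank: BankMap, n: int) -> bool:
    """True if every row of n blocks and every column of n blocks in an n x n tile
    maps to n distinct banks, so either can be read in one parallel access."""
    for i in range(n):
        row_banks = {bank(x, i, n) for x in range(n)}  # horizontal neighbors
        col_banks = {bank(i, y, n) for y in range(n)}  # vertical neighbors
        if len(row_banks) != n or len(col_banks) != n:
            return False
    return True


print(conflict_free(naive_bank, 4))       # False: a column hits one bank 4 times
print(conflict_free(interlaced_bank, 4))  # True
```

The skewed layout costs only an adder in the address path, which is why interlacing of this kind is a common way to serve two access directions from single-ported memories.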