Xixin Cao | Researchain

Publication

Featured research published by Xixin Cao.

multimedia signal processing | 2006

An Efficient VLSI Implementation of Distributed Architecture for DWT

Xixin Cao; Qingqing Xie; Chungan Peng; Qingchun Wang; Dunshan Yu

This paper proposes an efficient and simple architecture for the 9/7 discrete wavelet transform (DWT) based on distributed arithmetic. To derive the proposed architecture, we exploit the periodicity and symmetry of the DWT to optimize performance and reduce computational redundancy. The inner product of the DWT coefficient matrix is distributed over the input through careful analysis of the input, output, and coefficient word lengths. In the coefficient matrix, linear maps are used to assign the necessary computation to processing elements in the space domain. Moreover, the proposed architecture has a regular data flow and low control complexity. The result is a low-hardware-complexity DWT processor for 9/7 transforms that allows a clock twice as fast as the direct implementation. The design is well suited to image compression systems such as JPEG2000 and MPEG-4.
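
The filter symmetry mentioned above can be illustrated in software. The sketch below is an illustration under stated assumptions, not the paper's distributed-arithmetic datapath: it uses the CDF 9/7 analysis low-pass taps in one common normalization (DC gain 1) and shows that pre-adding mirrored samples ("folding") cuts the multiplications per output sample from 9 to 5. The function names are hypothetical, and boundary handling (JPEG2000 uses symmetric extension) is omitted, so `n` must stay at least 4 samples away from either end of the signal.

```python
# CDF 9/7 analysis low-pass taps (one common normalization, DC gain 1),
# listed from the centre tap h[0] outward; the filter is symmetric: h[-k] == h[k].
H_HALF = [0.602949018236, 0.266864118443, -0.078223266529,
          -0.016864118443, 0.026748757411]

def fir_direct(x, n):
    """Direct 9-tap filter output at position n: 9 multiplications."""
    taps = H_HALF[:0:-1] + H_HALF          # h[4], h[3], ..., h[0], ..., h[4]
    return sum(t * x[n + k - 4] for k, t in enumerate(taps))

def fir_folded(x, n):
    """Folded form: pre-add mirrored samples, then only 5 multiplications."""
    acc = H_HALF[0] * x[n]
    for k in range(1, 5):
        acc += H_HALF[k] * (x[n - k] + x[n + k])
    return acc
```

In hardware, folding reduces the nine coefficient multipliers to five; in a DA design it shrinks each look-up table from 9 to 5 address bits, at the cost of one extra bit of word length for the pre-added sums.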

international conference on advanced communication technology | 2007

An Area Efficient High Performance DCT Distributed Architecture for Video Compression

Yanling Chen; Xixin Cao; Qingqing Xie; Chungan Peng

The discrete cosine transform (DCT), an important component of image and video compression, is adopted in various standardized coding schemes such as JPEG, MPEG-x, and H.26x. However, computing a two-dimensional (2D) DCT with the direct approach requires a large number of multiplications and additions. Multiplications, which are the most time-consuming operations in simple processors, are avoided completely in the proposed architecture for real-time image compression. This paper proposes an area-efficient, high-performance VLSI architecture for the DCT based on distributed arithmetic (DA). The number of additions is minimized by exploiting the timing property of the DCT computed with DA. A case study of an 8×8 DCT architecture based on DA is analyzed. Savings exceeding 97% are achieved for the DCT implementation.
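
Distributed arithmetic replaces the multiplications of a fixed-coefficient inner product with table look-ups, shifts, and additions. A minimal bit-serial sketch follows, assuming 8-bit two's-complement inputs and one DCT-II basis row (output index 1) scaled by 2^7 and rounded. The scale factor, the omitted DCT normalization constant, and all names are illustrative choices, not details taken from the paper.

```python
import math

def build_lut(coeffs):
    """Precompute every subset sum of the fixed coefficients (2**N entries)."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]

def da_dot(lut, xs, bits):
    """Compute sum(c_k * x_k) using only LUT reads, shifts, and adds.

    xs must be signed integers representable in `bits`-bit two's complement.
    """
    acc = 0
    for j in range(bits):
        # Gather bit j of every input into one LUT address.
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> j) & 1) << k
        partial = lut[addr] << j
        # The sign bit of a two's-complement number has weight -2**(bits-1).
        acc = acc - partial if j == bits - 1 else acc + partial
    return acc

# Hypothetical fixed-point DCT-II row for output index u = 1, scaled by 2**7.
U = 1
DCT_ROW = [round(128 * math.cos((2 * n + 1) * U * math.pi / 16)) for n in range(8)]
LUT = build_lut(DCT_ROW)
```

A hardware DA unit would process one bit plane per clock with a shift-accumulator. Practical designs usually avoid a single 256-entry table, for example by exploiting the even/odd symmetry of DCT rows so that each table sees only four inputs.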

international conference on advanced communication technology | 2008

A Memory-Efficient CAVLC Decoding Scheme for H.264/AVC

Yanling Chen; Xixin Cao; Xiaoming Peng; Chungan Peng; Dunshan Yu; Xing Zhang

This paper presents a memory-efficient CAVLC decoding architecture for H.264/AVC. The proposed architecture both reduces the memory space needed to decode syntax elements such as coeff_token, total_zeros, and run_before and improves decoding efficiency. After analyzing the CAVLC decoding principle, we simplify the coeff_token VLD table and propose a new coeff_token VLD that combines arithmetic operations with table look-up. The run_before VLD can use the same principle as the proposed coeff_token VLD. In addition, the proposed scheme adopts a zero-block skipping technique and a multiple-symbol decoding scheme when decoding SignTrail. Simulation results show that the system can run at a 168 MHz clock frequency and needs 136 cycles on average to decode one macroblock. The proposed architecture achieves approximately 39% to 53% savings in memory access without degrading video quality.
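
The abstract does not give the decoder's tables, so the sketch below uses a hypothetical toy prefix code rather than the H.264 coeff_token tables. It illustrates the general idea of combining an arithmetic step (counting leading zeros, a priority encoder in hardware) with small per-group look-up tables. The bitstream is represented as a '0'/'1' string for readability, and all names are hypothetical.

```python
# Hypothetical toy prefix code, NOT the H.264 coeff_token tables:
# codewords are grouped by their number of leading zeros, and each group
# has a fixed-length suffix that indexes a small sub-table.
SUFFIX_BITS = [0, 1, 2]            # suffix length for 0, 1, 2 leading zeros
SUB_TABLES = [
    ["s0"],                        # "1"
    ["s1", "s2"],                  # "01"  + 1 suffix bit
    ["s3", "s4", "s5", "s6"],      # "001" + 2 suffix bits
]

def decode_one(bits, pos):
    """Decode one symbol starting at pos; return (symbol, next_pos)."""
    zeros = 0
    while pos + zeros < len(bits) and bits[pos + zeros] == "0":
        zeros += 1
    if zeros >= len(SUB_TABLES) or pos + zeros >= len(bits):
        raise ValueError(f"invalid or truncated codeword at bit {pos}")
    pos += zeros + 1                       # skip the zeros and the marker '1'
    n = SUFFIX_BITS[zeros]
    if pos + n > len(bits):
        raise ValueError(f"truncated suffix at bit {pos}")
    suffix = int(bits[pos:pos + n], 2) if n else 0
    return SUB_TABLES[zeros][suffix], pos + n

def decode_all(bits):
    """Decode a whole bit string into a list of symbols."""
    symbols, pos = [], 0
    while pos < len(bits):
        sym, pos = decode_one(bits, pos)
        symbols.append(sym)
    return symbols
```

For example, `decode_all("101100110")` splits the input into `1`, `011`, and `00110`. A single flat table indexed by the longest codeword (5 bits here) would need 32 entries, while the grouped tables need only 7; this is the kind of saving a combined arithmetic/LUT decoder aims for.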

international conference on solid state and integrated circuits technology | 2006

Efficient VLSI Design and Implementation of Integer Motion Estimation for H.264 SDTV Encoder

Chungan Peng; Dunshan Yu; Xixin Cao; Shimin Sheng

In this paper, the VLSI hardware complexity of H.264 integer motion estimation is analyzed, several hardware-reduction techniques are investigated, and a Sot-SAD-Tree VLSI structure based on the SAD tree is proposed. With the Sot-SAD-Tree structure, the width of the whole data path is reduced to 50% of its original value, so an H.264 encoder with large frames and complex motion vectors can be implemented in VLSI at an acceptable hardware cost. Finally, a complete H.264 SDTV integer motion estimation VLSI architecture with 16×256 parallelism is designed and implemented.
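
A SAD tree computes the SADs of the smallest partitions once and forms the larger H.264 partition SADs by adding them. The following sketch assumes a 16×16 macroblock given as nested lists of luma samples and covers the 4×4, 8×8, 16×8, 8×16, and 16×16 partitions (8×4 and 4×8 follow the same pattern). It shows the plain SAD-tree idea, not the paper's Sot-SAD-Tree data-path width reduction; the function names are hypothetical.

```python
def sad_4x4_grid(cur, ref):
    """SAD of each 4x4 sub-block of a 16x16 macroblock, as a 4x4 grid."""
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[y][x])
    return grid

def sad_tree(cur, ref):
    """Reuse the 4x4 SADs to build larger partition SADs by addition only."""
    s4 = sad_4x4_grid(cur, ref)
    # 8x8: sum of a 2x2 group of 4x4 SADs.
    s8 = [[s4[2 * i][2 * j] + s4[2 * i][2 * j + 1]
           + s4[2 * i + 1][2 * j] + s4[2 * i + 1][2 * j + 1]
           for j in range(2)] for i in range(2)]
    s16x8 = [s8[i][0] + s8[i][1] for i in range(2)]   # top and bottom halves
    s8x16 = [s8[0][j] + s8[1][j] for j in range(2)]   # left and right halves
    s16 = s16x8[0] + s16x8[1]
    return {"4x4": s4, "8x8": s8, "16x8": s16x8, "8x16": s8x16, "16x16": s16}
```

Only the 256 absolute differences are computed; every larger partition costs a few additions, which is why the tree suits a parallel hardware datapath.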

international conference on electron devices and solid-state circuits | 2015

A low-energy high-throughput asynchronous AES for secure smart cards

Qihui Zhang; Jian Cao; Dunshan Yu; Xixin Cao; Xing Zhang; Yin Ye; Botao Chen

AES is widely used in current financial security applications, but side-channel attacks are considered serious threats to the AES cryptographic algorithm. Asynchronous AES design is a potential solution because of its natural properties. First, our asynchronous AES architecture, round-key generation architecture, and MixColumns calculation architecture are proposed. Then, properties of the Balsa HDL are investigated and exploited to reduce area and power, and a GTECH-based design flow is described. Finally, a VLSI implementation of our AES crypto-processor is carried out in TSMC 130 nm CMOS technology. Experimental results show that the proposed asynchronous AES architecture achieves 67.7% lower ciphering time and 40% lower power-delay product than its counterpart, and its area is only 7.3% and 15% of those reported in other papers. Moreover, it can easily be integrated into an asynchronous security chip.
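
MixColumns, one of the round operations the proposed architecture implements, multiplies each 4-byte state column by a fixed matrix over GF(2^8). The software reference below follows the FIPS-197 definition using the `xtime` (multiply by 2) operation. It is a functional sketch for checking results, not the paper's asynchronous Balsa circuit; the function names are illustrative.

```python
def xtime(b):
    """Multiply a byte by 0x02 in GF(2^8) with the AES polynomial 0x11B."""
    b <<= 1
    return (b ^ 0x11B) if b & 0x100 else b

def mix_column(col):
    """Apply the AES MixColumns matrix [[2,3,1,1],[1,2,3,1],[1,1,2,3],[3,1,1,2]]."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,   # 2*a0 + 3*a1 + a2 + a3
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,   # a0 + 2*a1 + 3*a2 + a3
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),   # a0 + a1 + 2*a2 + 3*a3
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),   # 3*a0 + a1 + a2 + 2*a3
    ]
```

Multiplication by 3 is computed as `xtime(a) ^ a`, so the whole operation needs only shifts and XORs, which maps directly onto small combinational logic in either synchronous or asynchronous hardware.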

international conference on solid-state and integrated circuits technology | 2008

A novel H.264 QP adaptive MPDC block-matching algorithm and its VLSI design

Chungan Peng; Xixin Cao; Xiaoxin Cui; Dunshan Yu; Shimin Sheng

The computational complexity and hardware design of block-matching criteria are discussed, and a novel MPDC algorithm and its VLSI structure for H.264 are presented. A QP-adaptive MPDC threshold is derived from the basics of the H.264 4×4 integer transform and 52-level scalar quantization, and the calculation process is adjusted for hardware optimization. When QP is greater than 18, the proposed criterion performs as well as the SAD scheme, and the RDO curve error is less than 0.3 dB. Verilog HDL and Synopsys DC-Shell are used for the VLSI design and implementation, and the synthesis results show that its 4×4 block-parallel structure saves 58% area and 77% power compared with the SAD structure at 200 MHz. It is useful for high-compression-ratio, low-cost, and low-power video codec VLSI solutions.
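
The abstract does not spell out the MPDC criterion. Assuming it belongs to the pixel-difference-classification (PDC) family, which counts pixels whose absolute difference falls within a threshold, the sketch below contrasts such a count with SAD. The paper's QP-adaptive threshold derivation is not reproduced, so `threshold` is a free parameter here, and all names are hypothetical.

```python
def sad(cur, ref):
    """Sum of absolute differences over flattened blocks (lower is better)."""
    return sum(abs(c - r) for c, r in zip(cur, ref))

def pdc(cur, ref, threshold):
    """Number of 'matching' pixels with |difference| <= threshold (higher is better)."""
    return sum(1 for c, r in zip(cur, ref) if abs(c - r) <= threshold)

def best_match_pdc(cur, candidates, threshold):
    """Index of the candidate block with the most matching pixels."""
    return max(range(len(candidates)),
               key=lambda i: pdc(cur, candidates[i], threshold))

cur = [10] * 16                       # flattened 4x4 block
cand_uniform = [12] * 16              # small error everywhere
cand_outlier = [10] * 15 + [40]       # exact except for one outlier pixel
```

With `threshold=3`, SAD prefers `cand_outlier` (30 versus 32) while the count prefers `cand_uniform` (16 versus 15 matching pixels), showing that the two criteria can disagree. Because each pixel contributes only a comparator output bit, a 4×4 block's match count fits in 5 bits, whereas a SAD accumulator for 8-bit samples needs 12 bits for the same block; this narrower arithmetic is a plausible source of PDC-style area and power savings.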

ieee advanced information technology electronic and automation control conference | 2017

An asynchronous design method and its application on a low-power RFID baseband processor

Qihui Zhang; Jian Cao; Xixin Cao; Xing Zhang; Shijuan Zhang; Xianfeng Li; Hai Xiao

With the rapid development of VLSI technology and continually growing application requirements, problems such as power consumption and clock skew in integrated circuits have become increasingly serious. Because of advantages such as low power, high performance, and the elimination of a global clock, asynchronous circuits and their design methodology have received more attention in recent years. This paper first proposes an asynchronous design and implementation flow based on Balsa and synchronous tools, with embedded GTECH-based and black-box-based optimization schemes to reduce area and power. A design methodology that mixes asynchronous and synchronous circuits is then proposed to meet the low-power requirements of an RFID baseband processor. Finally, the proposed RFID baseband processor is implemented in a 180 nm CMOS technology. Experimental results show that its power is less than one third of that of its synchronous counterparts, and that it can easily be integrated into a UHF RFID chip.

international conference on solid-state and integrated circuits technology | 2008

A VLSI structural optimization method and workflow based on synthesis frequency inflexion

Chungan Peng; Ying Li; Xiaoxin Cui; Xixin Cao; Dunshan Yu

A synthesis frequency inflexion phenomenon in the VLSI synthesis process is discussed, and a VLSI structural optimization method and workflow based on analysis of the synthesis frequency inflexion and on register insertion is proposed. Registers are usually used for sequential synchronization and for increasing the maximum operating frequency, but in this method they are used to avoid excessively high combinational logic expenditure. In the H.264 macroblock-level SAD tree case, a 50.6% improvement in speed is achieved at the expense of a 2.9% increase in area. The method involves no complex algorithm, yet offers good operability and generality. It is well suited to complex VLSI structural design and/or critical path optimization.
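
Register insertion trades one cycle of latency for a shorter critical path. The toy timing model below uses hypothetical per-level delays of an adder tree (in picoseconds) and a hypothetical register overhead (setup plus clock-to-Q). It only illustrates why a register placed at the right level raises the achievable clock; it does not model the synthesis-frequency-inflexion analysis itself, and the function names are illustrative.

```python
def unpipelined_period(level_delays, reg_overhead):
    """Clock period when the whole tree is one combinational stage."""
    return sum(level_delays) + reg_overhead

def best_single_cut(level_delays, reg_overhead):
    """Try one pipeline register after each tree level.

    Returns (cut_level, period); cut_level is None if no cut helps.
    """
    best = (None, unpipelined_period(level_delays, reg_overhead))
    for k in range(1, len(level_delays)):
        stage = max(sum(level_delays[:k]), sum(level_delays[k:]))
        period = stage + reg_overhead
        if period < best[1]:
            best = (k, period)
    return best

# Hypothetical 4-level adder tree, 300 ps per level, 50 ps register overhead.
LEVELS_PS = [300, 300, 300, 300]
```

Here one register after level 2 shortens the period from 1250 ps to 650 ps, at the cost of the extra register bits; a real flow would confirm such a choice with synthesis results rather than a linear delay model.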

international conference on solid state and integrated circuits technology | 2006

A High Performance and Low Power Hardware Architecture of Entropy Coder for H.264/AVC Baseline

Wei-jun Lu; Xixin Cao; Dunshan Yu; Shimin Sheng

In this paper, the authors present a high-performance, low-power hardware architecture of an entropy coder for the H.264/AVC baseline profile. The authors implemented the architecture with Synopsys Design Compiler and the SMIC 0.13 µm cell library. The results show that the design needs less area than prior work and can operate at 250 MHz. In the worst case, it needs 1095 cycles to code a macroblock and can process 2306 QCIF (176×144) frames per second.
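
The throughput figures can be cross-checked with simple arithmetic. A QCIF frame holds (176/16) × (144/16) = 99 macroblocks, so 2306 frames per second at 1095 worst-case cycles per macroblock require 2306 × 99 × 1095 = 249,981,930 cycles per second, which matches a clock of about 250 MHz. The helper names below are hypothetical, and frame dimensions are assumed to be multiples of 16.

```python
def macroblocks_per_frame(width, height):
    """Number of 16x16 macroblocks (dimensions assumed to be multiples of 16)."""
    return (width // 16) * (height // 16)

def max_fps(clock_hz, width, height, cycles_per_mb):
    """Whole frames per second achievable at a given clock and cycle budget."""
    return clock_hz // (macroblocks_per_frame(width, height) * cycles_per_mb)

def required_clock_hz(fps, width, height, cycles_per_mb):
    """Clock frequency needed to sustain a target frame rate."""
    return fps * macroblocks_per_frame(width, height) * cycles_per_mb
```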

international conference on advanced communication technology | 2012