Xinmiao Zhang | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Xinmiao Zhang is active.

Explore More

Publication

Featured researches published by Xinmiao Zhang.

IEEE Transactions on Very Large Scale Integration Systems | 2004

High-speed VLSI architectures for the AES algorithm

Xinmiao Zhang; Keshab K. Parhi

This paper presents novel high-speed architectures for the hardware implementation of the Advanced Encryption Standard (AES) algorithm. Unlike previous works which rely on look-up tables to implement the SubBytes and InvSubBytes transformations of the AES algorithm, the proposed design employs combinational logic only. As a direct consequence, the unbreakable delay incurred by look-up tables in the conventional approaches is eliminated, and the advantage of subpipelining can be further explored. Furthermore, composite field arithmetic is employed to reduce the area requirements, and different implementations for the inversion in subfield GF(2/sup 4/) are compared. In addition, an efficient key expansion architecture suitable for the subpipelined round units is also presented. Using the proposed architecture, a fully subpipelined encryptor with 7 substages in each round unit can achieve a throughput of 21.56 Gbps on a Xilinx XCV1000 e-8 bg560 device in non-feedback modes, which is faster and is 79% more efficient in terms of equivalent throughput/slice than the fastest previous FPGA implementation known to date.

IEEE Circuits and Systems Magazine | 2002

Implementation approaches for the Advanced Encryption Standard algorithm

Xinmiao Zhang; Keshab K. Parhi

This paper addresses various approaches for efficient hardware implementation of the Advanced Encryption Standard algorithm. The optimization methods can be divided into two classes: architectural optimization and algorithmic optimization. Architectural optimization exploits the strength of pipelining, loop unrolling and sub-pipelining. Speed is increased by processing multiple rounds simultaneously at the cost of increased area. Architectural optimization is not an effective solution infeed-back mode. Loop unrolling is the only architecture that can achieve a slight speedup with significantly increased area. In non-feedback mode, subpipelining can achieve maximum speedup and the best speed/area ratio. Algorithmic optimization exploits algorithmic strength inside each round unit. Various methods to reduce the critical path and area of each round unit are presented. Resource sharing issues between encryptor and decryptor are also discussed. They become important issues when both encryptor and decryptor need to be implemented in a small area.

Archive | 2007

Wireless Security and Cryptography: Specifications and Implementations

Nicolas Sklavos; Xinmiao Zhang

As the use of wireless devices becomes widespread, so does the need for strong and secure transport protocols. Even with this intensified need for securing systems, using cryptography does not seem to be a viable solution due to difficulties in implementation. The security layers of many wireless protocols use outdated encryption algorithms, which have proven unsuitable for hardware usage, particularly with handheld devices. Summarizing key issues involved in achieving desirable performance in security implementations, Wireless Security and Cryptography: Specifications and Implementations focuses on alternative integration approaches for wireless communication security. It gives an overview of the current security layer of wireless protocols and presents the performance characteristics of implementations in both software and hardware. This resource also presents efficient and novel methods to execute security schemes in wireless protocols with high performance. It provides the state-of-the-art research trends in implementations of wireless protocol security for current and future wireless communications. Unique in its coverage of specification and implementation concerns that include hardware design techniques, Wireless Security and Cryptography: Specifications and Implementations provides thorough coverage of wireless network security and recent research directions in the field.

IEEE Transactions on Circuits and Systems Ii-express Briefs | 2006

On the Optimum Constructions of Composite Field for the AES Algorithm

Xinmiao Zhang; Keshab K. Parhi

In the hardware implementations of the Advanced Encryption Standard (AES) algorithm, employing composite field arithmetic not only reduces the complexity but also enables deep subpipelining such that higher speed can be achieved. In addition, it is more efficient to employ composite field arithmetic only in the SubBytes transformation of the AES algorithm. Composite fields can be constructed by using different irreducible polynomials. Nevertheless, how the different constructions affect the complexity of the composite implementation of the SubBytes has not been analyzed in prior works. This brief presents 16 ways to construct the composite field GF(((22)2)2) for the AES algorithm. Analytical results are provided for the effects of the irreducible polynomial coefficients on the complexity of each involved subfield operation. In addition, for each construction, there exist eight isomorphic mappings that map the elements in GF(28) to those in composite fields. The complexities of these mappings vary. An efficient algorithm is proposed in this brief to find all isomorphic mappings. Based on the complexities of both the subfield operations and the isomorphic mappings, the optimum constructions of the composite field for the AES algorithm are selected to minimize gate count and critical path

signal processing systems | 2008

Error correction for multi-level NAND flash memory using Reed-Solomon codes

Bainan Chen; Xinmiao Zhang; Zhongfeng Wang

Prior research efforts have been focusing on using BCH codes for error correction in multi-level cell (MLC) NAND flash memory. However, BCH codes often require highly parallel implementations to meet the throughput requirement. As a result, large area is needed. In this paper, we propose to use Reed-Solomon (RS) codes for error correction in MLC flash memory. A (828, 820) RS code has almost the same rate and length in terms of bits as a BCH (8248, 8192) code. Moreover, it has at least the same error-correcting performance in flash memory applications. Nevertheless, with 70% of the area, the RS decoder can achieve a throughput that is 121% higher than the BCH decoder. A novel bit mapping scheme using gray code is also proposed in this paper. Compared to direct bit mapping, our proposed scheme can achieve 0.02 dB and 0.2 dB additional gains by using RS and BCH codes, respectively, without any overhead.

IEEE Transactions on Circuits and Systems I-regular Papers | 2011

Efficient Partial-Parallel Decoder Architecture for Quasi-Cyclic Nonbinary LDPC Codes

Xinmiao Zhang; Fang Cai

Nonbinary low-density parity-check (NB-LDPC) codes constructed over GF(q) (q >; 2) can achieve higher coding gain than binary LDPC codes when the code length is moderate. A complete partial-parallel decoder architecture based on the Min-max algorithm is proposed for quasi-cyclic NB-LDPC codes in this paper. A novel scheme and corresponding architecture are developed to implement the elementary step of the check node processing. In our design, layered decoding is applied and only nm <; q messages are kept on each edge of the associated Tanner graph. The computation units and the scheduling of the computations are optimized in the context of layered decoding to reduce the area requirement and increase the speed. This paper also introduces an overlapped method for the check node processing among different layers to further speed up the decoding. From complexity and latency analysis, our design is much more efficient than any previous design. Our proposed decoder for a (744, 653) code over GF(25) has also been synthesized on a Xilinx Virtex-2 Pro FPGA device. It can achieve a throughput of 9.30 Mbps when 15 decoding iterations are carried out.

IEEE Transactions on Very Large Scale Integration Systems | 2005

High-speed architectures for parallel long BCH encoders

Xinmiao Zhang; Keshab K. Parhi

Long Bose-Chaudhuri-Hocquenghen (BCH) codes are used as the outer error correcting codes in the second-generation Digital Video Broadcasting Standard from the European Telecommunications Standard Institute. These codes can achieve around 0.6-dB additional coding gain over Reed-Solomon codes with similar code rate and codeword length in long-haul optical communication systems. BCH encoders are conventionally implemented by a linear feedback shift register architecture. High-speed applications of BCH codes require parallel implementation of the encoders. In addition, long BCH encoders suffer from the effect of large fanout. In this paper, three novel architectures are proposed to reduce the achievable minimum clock period for long BCH encoders after the fanout bottleneck has been eliminated. For an (8191, 7684) BCH code, compared to the original 32-parallel BCH encoder architecture without fanout bottleneck, the proposed architectures can achieve a speedup of over 100%.

IEEE Transactions on Very Large Scale Integration Systems | 2009

Backward Interpolation Architecture for Algebraic Soft-Decision Reed–Solomon Decoding

Jiangli Zhu; Xinmiao Zhang; Zhongfeng Wang

Recently developed algebraic soft-decision (ASD) decoding of Reed-Solomon (RS) codes have attracted much interest due to the fact that they can achieve significant coding gain with polynomial complexity. One major step of ASD decoding is the interpolation. Available interpolation algorithms can only add interpolation points or increase interpolation multiplicities. However, backward interpolation, which eliminates interpolation points or reduces interpolation multiplicities, is indispensable to enable the reusing of interpolation results in the following two scenarios: 1) interpolation needs to be carried out on multiple test vectors, which share common entries and 2) iterative ASD decoding where interpolation points have decreasing multiplicities. Examples for these cases are the low-complexity chase (LCC) decoding and bit-level generalized minimum distance (BGMD) decoding. With lower complexity, these algorithms can achieve similar or higher coding gain than other practical ASD algorithms. In this paper, we propose novel backward interpolation schemes and corresponding efficient implementation architectures for LCC and BGMD decoding through constructing equivalent GrOumlbner bases. The proposed architectures share computational units with forward interpolation architectures. Hence, the area overhead for incorporating the backward interpolation is very small. Substantial area saving or speedup can be achieved by using the backward interpolation. When the proposed architecture is applied to the LCC decoding of a (255, 239) RS code with eta = 3, the area is reduced to 39% of those required by prior architectures. In terms of speed/area ratio, the proposed architecture is 48% more efficient than the best available architecture. For the BGMD decoding of the same code, the proposed architecture can achieve around 20% higher efficiency.

IEEE Transactions on Computers | 2011

Reliability-Driven ECC Allocation for Multiple Bit Error Resilience in Processor Cache

Somnath Paul; Fang Cai; Xinmiao Zhang; Swarup Bhunia

With increasing parameter variations in nanometer technologies, on-chip cache in processor is becoming highly vulnerable to runtime failures induced by “soft error,” voltage, or thermal noise and aging effects. Nondeterministic and unreliable memory operation due to these runtime failures can be addressed by: 1) designing the memory for worst-case scenarios and/or 2) runtime error detection and correction. Worst-case guard-banding can lead to overly pessimistic results for cell footprint and power. On the other hand, conventional error correcting code (ECC) used in processor cache has very limited correction capability, making it insufficient to protect memory in scaled technologies (sub-45 nm), which are vulnerable to multiple-bit failures in a word (64-bit). The requirement to tolerate multibit failures is accentuated with supply voltage scaling for low-power operation. We note that due to inter and intra-die parameter variations, different memory blocks move to different reliability corners. A uniform ECC protection for all memory blocks fails to account for the distribution of vulnerability across memory blocks. On the other hand, it can lead to overly pessimistic results if the worst-case vulnerability of a memory block is accounted for during ECC allocation. In this paper, we propose a reliability-driven ECC allocation scheme that matches the relative vulnerability of a memory block (determined using postfabrication characterization) with appropriate ECC protection. We achieve postfabrication variable ECC allocation by storing the check bits in the “ways” of an associative cache. We use shortened Bose-Chaudhuri-Hocquenghem (BCH) cyclic code with zero padding, which provides high random error correction capability with modest amount of check bits. Moreover, we propose efficient circuit/architecture-level optimizations of the ECC encoding/decoding logic to minimize the impact on area, performance, and energy. Simulation results for SPEC2000 benchmarks show that such a variable ECC scheme tolerates high failure rates with negligible performance (four percent) and area (0.2 percent) penalty.

IEEE Transactions on Very Large Scale Integration Systems | 2012

Low-Complexity Reliability-Based Message-Passing Decoder Architectures for Non-Binary LDPC Codes

Xinmiao Zhang; Fang Cai; Shu Lin

Non-binary low-density parity-check (NB-LDPC) codes can achieve better error-correcting performance than their binary counterparts at the cost of higher decoding complexity when the codeword length is moderate. The recently developed iterative reliability-based majority-logic NB-LDPC decoding has better performance-complexity tradeoffs than previous algorithms. This paper first proposes enhancement schemes to the iterative hard reliability-based majority-logic decoding (IHRB-MLGD). Compared to the IHRB algorithm, our enhanced (E-)IHRB algorithm can achieve significant coding gain with small hardware overhead. Then low-complexity partial-parallel NB-LDPC decoder architectures are developed based on these two algorithms. Many existing NB-LDPC code construction methods lead to quasi-cyclic or cyclic codes. Both types of codes are considered in our design. Moreover, novel schemes are developed to keep a small proportion of messages in order to reduce the memory requirement without causing noticeable performance loss. In addition, a shift-message structure is proposed by using memories concatenated with variable node units to enable efficient partial-parallel decoding for cyclic NB-LDPC codes. Compared to previous designs based on the Min-max decoding algorithm, our proposed decoders have at least tens of times lower complexity with moderate coding gain loss.

Explore More