Kuizhi Mei | Researchain


Publication

Featured research published by Kuizhi Mei.

IEEE Transactions on Circuits and Systems for Video Technology | 2007

VLSI Design of a High-Speed and Area-Efficient JPEG2000 Encoder

Kuizhi Mei; Nanning Zheng; Chang Huang; Yuehu Liu; Qiang Zeng

A high-speed VLSI design of an area-efficient JPEG2000 encoder is presented. A recursive multilevel 2D discrete wavelet transform (DWT) architecture with dual buffers is proposed to reduce the wavelet coefficient memory to 1/4 of the tile size, and prerate allocation is used to reduce the compressed code memory to 3/4 of the tile size. A highly pipelined and parallel implementation of the line-based 1-level DWT is proposed, using two line buffers for the 5/3 wavelet, with an input rate of up to 2 samples/cycle. Code-block-based address mapping for accessing the wavelet coefficient memory, concurrent state variable generation, and multiple parallel and pipelined coding methods are used in the bit plane encoder (BPE), which encodes on average 40.5 M samples/s at 100 MHz with no memory used. The conditional two-symbol pipelined arithmetic encoder (AE) encodes at 1.3 symbols/cycle. Parallel units in the BPE and buffer control between the BPE and the AE are optimally implemented at low cost without performance loss. A byte representation of the rate-distortion slope achieves a near-optimal implementation of post-coding rate distortion in Tier2 at low cost. The compressed file generated by the encoder is fully compatible with ISO/IEC FCD15444-1. The encoder is verified on a field-programmable gate array platform with a direct interface to digital video input, using a tile size of 256 × 256 and a code block size of 32 × 16. The resulting input sampling rate is up to 58 M samples/s when Tier1 operates at 100 MHz. The difference in peak signal-to-noise ratio between images compressed by our encoder and by JasPer is less than 0.2 dB when the compression ratio is greater than 1 bps. The synthesized design uses 90.6 K equivalent NAND2 gates and 626.75 kb of on-chip RAM. Unlike other designs, the proposed JPEG2000 encoder combines high compression quality with high speed and area efficiency.
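To make the 5/3 wavelet mentioned in this abstract concrete, here is a minimal software sketch of one level of the reversible 5/3 lifting transform on a 1D signal. It is not the paper's hardware architecture: the function names, the restriction to even-length integer inputs, and the list-based layout are assumptions made for illustration only.

```python
def dwt53_forward(x):
    """One level of the reversible 5/3 lifting DWT on an even-length list of ints."""
    n = len(x)
    assert n >= 2 and n % 2 == 0, "sketch assumes an even-length signal"
    half = n // 2
    even, odd = x[0::2], x[1::2]

    # Predict step: high-pass = odd sample minus the floor-average of its even neighbours.
    high = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]  # symmetric extension at the end
        high.append(odd[i] - ((even[i] + right) >> 1))

    # Update step: low-pass = even sample plus a rounded quarter of neighbouring high-pass values.
    low = []
    for i in range(half):
        prev = high[i - 1] if i > 0 else high[0]  # symmetric extension at the start
        low.append(even[i] + ((prev + high[i] + 2) >> 2))
    return low, high


def dwt53_inverse(low, high):
    """Exactly undo dwt53_forward by running the lifting steps in reverse order."""
    half = len(low)
    even = []
    for i in range(half):
        prev = high[i - 1] if i > 0 else high[0]
        even.append(low[i] - ((prev + high[i] + 2) >> 2))

    x = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]
        x.extend([even[i], high[i] + ((even[i] + right) >> 1)])
    return x


if __name__ == "__main__":
    samples = [3, 7, 1, 8, 2, 9, 4, 6]
    low, high = dwt53_forward(samples)
    print(low, high)
    print(dwt53_inverse(low, high) == samples)
```

Because every lifting step uses only integer additions and shifts, the inverse recovers the input exactly, which is the property that makes the 5/3 filter suitable for lossless coding. A 2D transform would apply the same steps along rows and then columns.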

IEEE Transactions on Circuits and Systems for Video Technology | 2013

Reconfigurable Processor for Binary Image Processing

Bin Zhang; Kuizhi Mei; Nanning Zheng

Binary image processing is a powerful tool in many image and video applications. A reconfigurable processor is presented for binary image processing in this paper. The processor's architecture is a combination of a reconfigurable binary processing module, input and output image control units, and peripheral circuits. The reconfigurable binary processing module, which consists of mixed-grained reconfigurable binary compute units and output control logic, performs binary image processing operations, especially mathematical morphology operations, and implements related algorithms at more than 200 f/s for a 1024 × 1024 image. The peripheral circuits control the whole image processing and dynamic reconfiguration process. The processor is implemented on an EP2S180 field-programmable gate array. Synthesis results show that the presented processor can deliver 60.72 GOPS and 23.72 GOPS/mm² at a 220-MHz system clock in the SMIC 0.18-μm CMOS process. The simulation and experimental results demonstrate that the processor is suitable for real-time binary image processing applications.
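The mathematical morphology operations this abstract refers to can be illustrated in software. The sketch below implements erosion, dilation, and opening with a 3 × 3 square structuring element on a binary image stored as nested lists. It says nothing about how the processor itself is organized: the function names and the choice to treat pixels outside the image as background are assumptions.

```python
def _neighbourhood(y, x):
    """Coordinates covered by a 3x3 square structuring element centred on (y, x)."""
    return [(yy, xx) for yy in range(y - 1, y + 2) for xx in range(x - 1, x + 2)]


def erode(img):
    """Keep a pixel only if every pixel under the 3x3 element is foreground."""
    h, w = len(img), len(img[0])
    # Assumption: pixels outside the image count as background (0).
    return [[int(all(0 <= yy < h and 0 <= xx < w and img[yy][xx]
                     for yy, xx in _neighbourhood(y, x)))
             for x in range(w)] for y in range(h)]


def dilate(img):
    """Set a pixel if any pixel under the 3x3 element is foreground."""
    h, w = len(img), len(img[0])
    return [[int(any(0 <= yy < h and 0 <= xx < w and img[yy][xx]
                     for yy, xx in _neighbourhood(y, x)))
             for x in range(w)] for y in range(h)]


def opening(img):
    """Erosion followed by dilation: removes foreground specks smaller than the element."""
    return dilate(erode(img))


if __name__ == "__main__":
    block = [[0, 0, 0, 0, 0],
             [0, 1, 1, 1, 0],
             [0, 1, 1, 1, 0],
             [0, 1, 1, 1, 0],
             [0, 0, 0, 0, 0]]
    for row in opening(block):
        print(row)
```

Each output pixel depends only on a small fixed neighbourhood, so all pixels can in principle be computed independently, which is what makes these operations a natural fit for parallel hardware.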

Pattern Recognition | 2015

Cost-sensitive learning of hierarchical tree classifiers for large-scale image classification and novel category detection

Jianping Fan; Ji Zhang; Kuizhi Mei; Jinye Peng; Ling Gao

In this paper, a cost-sensitive learning algorithm is developed to train hierarchical tree classifiers for large-scale image classification (i.e., categorizing large-scale images into thousands of object classes). A visual tree is first constructed for organizing large numbers of object classes hierarchically and identifying inter-related learning tasks automatically. The fine-grained object classes at sibling leaf nodes share significant common visual properties but still contain subtle visual differences; thus a multi-task structural learning algorithm is developed to train their inter-related classifiers jointly to enhance their discrimination power. For the coarse-grained categories (i.e., groups of visually similar object classes) at sibling non-leaf nodes, a hierarchical learning algorithm is developed to leverage the tree structure (by adding two inter-level constraints) to train their inter-related classifiers jointly and control inter-level error propagation effectively. To achieve more robust detection of large numbers of object classes, a visual forest is learned by combining multiple visual trees (for different configurations) and their hierarchical tree classifiers. By penalizing various types of misclassification errors differently, a cost-sensitive learning approach is further developed to detect the appearance of new object classes accurately, and an incremental learning algorithm is developed to achieve more effective training of the discriminative classifiers for new object classes. Our experimental results have demonstrated that our cost-sensitive hierarchical learning algorithm can achieve very competitive results on both classification accuracy and computational efficiency as compared with other state-of-the-art techniques.

Highlights
Visual tree to organize large-scale object classes hierarchically and determine inter-related learning tasks automatically.
Multi-task structural learning for joint classifier training to enhance their discrimination power significantly.
Hierarchical learning to leverage inter-level constraints for classifier training and limiting inter-level error propagation.
Task and tree parallelism to scale up our hierarchical learning algorithm for large-scale image classification.
Cost-sensitive learning and incremental learning for training and detecting new object classes more effectively.

Neurocomputing | 2014

A distributed approach for large-scale classifier training and image classification

Kuizhi Mei; Peixiang Dong; Hao Lei; Jianping Fan

In this paper, a distributed approach is developed for achieving large-scale classifier training and image classification. First, a visual concept network is constructed for determining the inter-related learning tasks automatically, e.g., the inter-related classifiers for the visually similar object classes in the same group should be trained in parallel by using multiple machines to enhance their discrimination power. Second, an MPI-based distributed computing approach is constructed by using a master-slave mode to address two critical issues, huge computational cost and huge storage/memory cost, for large-scale classifier training and image classification. In addition, an indexing-based storage method is developed for reducing the sizes of intermediate SVM models and avoiding repeated computation of SVs (support vectors) in the test stage for image classification. Our experiments have also provided very positive results on the 2010 ImageNet database for the Large Scale Visual Recognition Challenge.

Pattern Recognition | 2013

Training inter-related classifiers for automatic image classification and annotation

Peixiang Dong; Kuizhi Mei; Nanning Zheng; Hao Lei; Jianping Fan

A structural learning algorithm is developed in this paper to achieve more effective training of large numbers of inter-related classifiers for supporting large-scale image classification and annotation. A visual concept network is constructed for characterizing the inter-concept visual correlations intuitively and determining the inter-related learning tasks automatically in the visual feature space rather than in the label space. By partitioning large numbers of object classes and image concepts into a set of groups according to their inter-concept visual correlations, the object classes and image concepts in the same group will share similar visual properties and their classifiers are strongly inter-related while the object classes and image concepts in different groups will contain various visual properties and their classifiers can be trained independently. By leveraging the inter-concept visual correlations for inter-related classifier training, our structural learning algorithm can train the inter-related classifiers jointly rather than independently, which can enhance their discrimination power significantly. Our experiments have also provided very positive results on large-scale image classification and annotation.

Pattern Recognition | 2014

Learning group-based dictionaries for discriminative image representation

Hao Lei; Kuizhi Mei; Nanning Zheng; Peixiang Dong; Ning Zhou; Jianping Fan

Dictionary learning is a critical issue for achieving discriminative image representation in many computer vision tasks such as object detection and image classification. In this paper, a new algorithm is developed for learning discriminative group-based dictionaries, where the inter-concept (category) visual correlations are leveraged to enhance both the reconstruction quality and the discrimination power of the group-based discriminative dictionaries. A visual concept network is first constructed for determining the groups of visually similar object classes and image concepts automatically. For each group of such visually similar object classes and image concepts, a group-based dictionary is learned for achieving discriminative image representation. A structural learning approach is developed to take advantage of our group-based discriminative dictionaries for classifier training and image classification. The effectiveness and the discrimination power of our group-based discriminative dictionaries have been evaluated on multiple popular visual benchmarks.

Highlights
A visual concept network is built to characterize the inter-concept correlations.
An automatic algorithm is proposed for identifying the visually similar groups.
A new algorithm is developed to learn group-based discriminative dictionaries.
A structural method is developed for classifier training and image classification.
Our proposed algorithms are evaluated on multiple popular visual benchmarks.

Neurocomputing | 2015

A real-time hand detection system based on multi-feature

Kuizhi Mei; Lu Xu; Boliang Li; Bin Lin; Fang Wang

This paper describes a real-time hand detection system that achieves high speed and accuracy. The system is based on Gentle AdaBoost and a cascade classifier. To improve the performance of the system, three efficient features are selected to describe the visual properties of human hands. In addition, detection is accelerated by several optimization methods, including a method for fast calculation of HOG features, an improved cascade classifier, and skin-color pre-detection. Experiments were performed on our self-constructed dataset. The results showed that the detection rate of the system can reach 0.889 while the false rate is 0.010, at a speed of 32.6339 ms per frame on an Intel Core i5-2400 CPU running at 3.1 GHz.

Networking, Architecture and Storage | 2013

REPAIR: A Reliable Partial-Redundancy-Based Router in NoC

Lei Xie; Kuizhi Mei; Yuhai Li

As the scale and integration density of network-on-chip increase sharply, more transistors are integrated into one chip. This unfortunately leads to more unexpected variations and faults in the system. In particular, transient errors and permanent hardware faults have rapidly become the key constraint for large-scale network design. This trend highlights the need to incorporate fault-tolerant solutions into Network-on-Chip (NoC) architecture. In this paper we propose a Reliable Partial-Redundancy-based router architecture (REPAIR). The proposed scheme uses only an additional buffer and a bus to enhance the connectivity of the data path in the router. REPAIR also employs error control coding (ECC) modules and decision-table-based (DT) control logic to implement an efficient online diagnosis mechanism and a reconfiguration mechanism, respectively. The experimental results show that REPAIR tolerates hard faults well under high fault rates. Specifically, the silicon protection factor (SPF) of an individual router reaches 16.34, and over 95% of packets can still be successfully transferred in a 16x16 torus network with 650 faults.

IEEE Journal of Biomedical and Health Informatics | 2015

Hierarchical Classification of Large-Scale Patient Records for Automatic Treatment Stratification

Kuizhi Mei; Jinye Peng; Ling Gao; Naiquan Nigel Zheng; Jianping Fan

In this paper, a hierarchical learning algorithm is developed for classifying large-scale patient records, e.g., categorizing large-scale patient records into large numbers of known patient categories (i.e., thousands of known patient categories) for automatic treatment stratification. Our hierarchical learning algorithm can leverage tree structure to train more discriminative max-margin classifiers for high-level nodes and control interlevel error propagation effectively. By ruling out unlikely groups of patient categories (i.e., irrelevant high-level nodes) at an early stage, our hierarchical approach can achieve log-linear computational complexity, which is very attractive for big data applications. Our experiments on one specific medical domain have demonstrated that our hierarchical approach can achieve very competitive results on both classification accuracy and computational efficiency as compared with other state-of-the-art techniques.
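The idea of ruling out unlikely groups early can be sketched as a greedy top-down descent through a category tree, where only the children of the currently chosen node are scored. This is an illustration under stated assumptions, not the paper's algorithm: the `Node` class, the `classify` function, the toy tree, and the hand-set linear weights (standing in for trained max-margin classifiers) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    weights: list                      # hypothetical linear scorer, not trained here
    bias: float = 0.0
    children: list = field(default_factory=list)

    def score(self, x):
        """Linear decision value w.x + b for feature vector x."""
        return sum(w * v for w, v in zip(self.weights, x)) + self.bias


def classify(root, x):
    """Descend from the root, following the best-scoring child at each level.

    Returns the leaf label and the number of node classifiers evaluated.
    Subtrees under rejected siblings are never visited.
    """
    node, evaluated = root, 0
    while node.children:
        evaluated += len(node.children)
        node = max(node.children, key=lambda child: child.score(x))
    return node.label, evaluated


# Toy two-level tree: the first feature picks the group, the second picks the leaf.
root = Node("root", [0, 0], children=[
    Node("group_a", [1, 0], children=[Node("a1", [0, 1]), Node("a2", [0, -1])]),
    Node("group_b", [-1, 0], children=[Node("b1", [0, 1]), Node("b2", [0, -1])]),
])

if __name__ == "__main__":
    print(classify(root, [2, -1]))
    print(classify(root, [-1, 3]))
```

In a balanced tree the number of evaluated classifiers grows with the tree depth times the branching factor rather than with the total number of leaf categories, which is where the efficiency gain over a flat classifier comes from. The trade-off is that a wrong decision at a high-level node cannot be corrected lower down, which is why controlling inter-level error propagation matters.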

Microprocessors and Microsystems | 2014

LDBR: Low-deflection bufferless router for cost-sensitive network-on-chip design

Yuhai Li; Kuizhi Mei; Yuehu Liu; Nanning Zheng; Yi Xu

In network-on-chip (NoC) designs, the bufferless router is more energy-efficient than the conventional router with buffers. However, in the bufferless network, deflections cause great performance loss. In this paper, three deflection models are first constructed for analyzing the causes of deflections. Then, we propose a low-deflection bufferless router (LDBR), in which a multi-channel network interface and a novel deflection routing scheme based on the turn model are designed to reduce deflections during packet transmission. Finally, LDBR is evaluated against the latest bufferless routers using synthetic and real-world traffic patterns. The experimental results show that the deflection rate of the LDBR network is reduced by 41% compared to other bufferless networks, and LDBR also shows superiority in cost and power consumption across all workloads.
