Xu-Yao Zhang | Researchain

Publication

Featured research published by Xu-Yao Zhang.

International Conference on Document Analysis and Recognition | 2011

ICDAR 2011 Chinese Handwriting Recognition Competition

Fei Yin; Qiu-Feng Wang; Xu-Yao Zhang; Cheng-Lin Liu

In the Chinese handwriting recognition competition organized with ICDAR 2011, four tasks were evaluated: offline and online isolated character recognition, and offline and online handwritten text recognition. To enable the training of recognition systems, we announced the large databases CASIA-HWDB/OLHWDB. The submitted systems were evaluated on undisclosed datasets to report character-level correct rates. In total, we received 25 systems submitted by eight groups. On the test datasets, the best results (correct rates) are 92.18% for offline character recognition, 95.77% for online character recognition, 77.26% for offline text recognition, and 94.33% for online text recognition, respectively. In addition to the evaluation results, we provide short descriptions of the recognition methods and a brief discussion.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2013

Writer Adaptation with Style Transfer Mapping

Xu-Yao Zhang; Cheng-Lin Liu

Adapting a writer-independent classifier toward the unique handwriting style of a particular writer has the potential to significantly increase accuracy for personalized handwriting recognition. This paper proposes a novel framework of style transfer mapping (STM) for writer adaptation. The STM is a writer-specific, class-independent feature transformation which has a closed-form solution. After style transfer mapping, the data of different writers are projected onto a style-free space, where the writer-independent classifier needs no change to classify the transformed data and can achieve significantly higher accuracy. The framework of STM can be combined with different types of classifiers for supervised, unsupervised, and semi-supervised adaptation, where writer-specific data can be either labeled or unlabeled and need not cover all classes. In this paper, we combine STM with the state-of-the-art classifiers for large-category Chinese handwriting recognition: learning vector quantization (LVQ) and modified quadratic discriminant function (MQDF). Experiments on the online Chinese handwriting database CASIA-OLHWDB demonstrate that STM-based adaptation is very efficient and effective in improving classification accuracy. Semi-supervised adaptation achieves the best performance, while unsupervised adaptation is even better than supervised adaptation. On handwritten text data, semi-supervised adaptation achieves error reduction rates of 31.95 and 25.00 percent with LVQ and MQDF, respectively.
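The closed-form property described in this abstract can be illustrated with a minimal sketch. It assumes the mapping is an affine transform (A, b) fitted by regularized least squares, pulling source points toward target points while keeping A close to the identity and b close to zero. The function name fit_style_transfer_map, the parameters beta and gamma, and the objective itself are assumptions made for illustration and may differ from the formulation in the paper.

```python
import numpy as np

def fit_style_transfer_map(S, T, beta=1.0, gamma=1.0):
    """Fit an affine map x -> A @ x + b in closed form.

    Illustrative objective (an assumption, not necessarily the paper's):
        sum_i ||A s_i + b - t_i||^2 + beta * ||A - I||_F^2 + gamma * ||b||^2
    S, T: arrays of shape (n, d) holding source and target points row-wise.
    """
    n, d = S.shape
    Xa = np.hstack([S, np.ones((n, 1))])            # augmented sources, (n, d+1)
    R = np.diag([beta] * d + [gamma])               # per-column regularization
    W0 = np.hstack([np.eye(d), np.zeros((d, 1))])   # prior map: identity, zero shift
    M = Xa.T @ Xa + R                               # symmetric positive definite
    B = T.T @ Xa + W0 @ R
    # Stationarity gives W @ M = B; M is symmetric, so solve M @ W.T = B.T.
    W = np.linalg.solve(M, B.T).T
    return W[:, :d], W[:, d]

# Hypothetical data: a known affine distortion that the fit should recover.
rng = np.random.default_rng(0)
S = rng.normal(size=(50, 3))
A_true = np.eye(3) + 0.1 * rng.normal(size=(3, 3))
b_true = np.array([0.5, -0.2, 0.1])
T = S @ A_true.T + b_true
A, b = fit_style_transfer_map(S, T, beta=1e-8, gamma=1e-8)
```

With very large beta and gamma the fitted map stays near the identity, which is the behavior one would want when only a few writer-specific samples are available.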

Pattern Recognition | 2017

Online and offline handwritten Chinese character recognition: A comprehensive study and new benchmark

Xu-Yao Zhang; Yoshua Bengio; Cheng-Lin Liu

Recent deep learning based methods have achieved state-of-the-art performance for handwritten Chinese character recognition (HCCR) by learning discriminative representations directly from raw data. Nevertheless, we believe that the long and well investigated domain-specific knowledge should still help to boost the performance of HCCR. By integrating the traditional normalization-cooperated direction-decomposed feature map (directMap) with the deep convolutional neural network (convNet), we are able to obtain new highest accuracies for both online and offline HCCR on the ICDAR-2013 competition database. With this new framework, we can eliminate the need for data augmentation and model ensemble, which are widely used in other systems to achieve their best results. This makes our framework efficient and effective for both training and testing. Furthermore, although directMap+convNet can achieve the best results and surpass human-level performance, we show that writer adaptation in this case is still effective. A new adaptation layer is proposed to reduce the mismatch between training and test data on a particular source layer. The adaptation process can be efficiently and effectively implemented in an unsupervised manner. By adding the adaptation layer into the pre-trained convNet, it can adapt to the new handwriting styles of particular writers, and the recognition accuracy can be further improved consistently and significantly. This paper gives an overview and comparison of recent deep learning based approaches for HCCR, and also sets new benchmarks for both online and offline HCCR.

IEEE Transactions on Circuits and Systems for Video Technology | 2017

Hybrid CNN and Dictionary-Based Models for Scene Recognition and Domain Adaptation

Guo-Sen Xie; Xu-Yao Zhang; Shuicheng Yan; Cheng-Lin Liu

Convolutional neural networks (CNNs) have achieved state-of-the-art performance in many different visual tasks. Learned from a large-scale training data set, CNN features are much more discriminative and accurate than handcrafted features. Moreover, CNN features are also transferable among different domains. On the other hand, traditional dictionary-based features (such as BoW and spatial pyramid matching) contain much more local discriminative and structural information, which is implicitly embedded in the images. To further improve the performance, in this paper, we propose to combine CNN with dictionary-based models for scene recognition and visual domain adaptation (DA). Specifically, based on the well-tuned CNN models (e.g., AlexNet and VGG Net), two dictionary-based representations are further constructed, namely, mid-level local representation (MLR) and convolutional Fisher vector (CFV) representation. In MLR, an efficient two-stage clustering method, i.e., weighted spatial and feature space spectral clustering on the parts of a single image followed by clustering all representative parts of all images, is used to generate a class-mixture or a class-specific part dictionary. After that, the part dictionary is used to operate with the multiscale image inputs for generating mid-level representation. In CFV, a multiscale and scale-proportional Gaussian mixture model training strategy is utilized to generate Fisher vectors based on the last convolutional layer of CNN. By integrating the complementary information of MLR, CFV, and the CNN features of the fully connected layer, state-of-the-art performance can be achieved on scene recognition and DA problems. An interesting finding is that our proposed hybrid representation (from VGG net trained on ImageNet) is also highly complementary to GoogLeNet and/or VGG-11 (trained on Place205).

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2018

Drawing and Recognizing Chinese Characters with Recurrent Neural Network

Xu-Yao Zhang; Fei Yin; Yan-Ming Zhang; Cheng-Lin Liu; Yoshua Bengio

Recent deep learning based approaches have achieved great success on handwriting recognition. Chinese characters are among the most widely adopted writing systems in the world. Previous research has mainly focused on recognizing handwritten Chinese characters. However, recognition is only one aspect of understanding a language; another challenging and interesting task is to teach a machine to automatically write (pictographic) Chinese characters. In this paper, we propose a framework that uses the recurrent neural network (RNN) as both a discriminative model for recognizing Chinese characters and a generative model for drawing (generating) Chinese characters. To recognize Chinese characters, previous methods usually adopt convolutional neural network (CNN) models, which require transforming the online handwriting trajectory into image-like representations. Instead, our RNN based approach is an end-to-end system which directly deals with the sequential structure and does not require any domain-specific knowledge. With the RNN system (combining an LSTM and GRU), state-of-the-art performance can be achieved on the ICDAR-2013 competition database. Furthermore, under the RNN framework, a conditional generative model with character embedding is proposed for automatically drawing recognizable Chinese characters. The generated characters (in vector format) are human-readable and can also be recognized by the discriminative RNN model with high accuracy. Experimental results verify the effectiveness of using RNNs as both generative and discriminative models for the tasks of drawing and recognizing Chinese characters.

International Joint Conference on Artificial Intelligence | 2011

Pattern field classification with style normalized transformation

Xu-Yao Zhang; Kaizhu Huang; Cheng-Lin Liu

Field classification is an extension of the traditional classification framework that breaks the i.i.d. assumption. In field classification, patterns occur as groups (fields) of homogeneous styles. By utilizing style consistency, classifying groups of patterns is often more accurate than classifying single patterns. In this paper, we extend the Bayes decision theory, and develop the Field Bayesian Model (FBM) to deal with field classification. Specifically, we propose to learn a Style Normalized Transformation (SNT) for each field. Via the SNTs, the data of different fields are transformed to a uniform style space (i.i.d. space). The proposed model is a general and systematic framework, under which many probabilistic models can be easily extended for field classification. To transfer the model to unseen styles, we propose a transductive model called Transfer Bayesian Rule (TBR) based on self-training. We conducted extensive experiments on face, speech and a large-scale handwriting dataset, and obtained significant error rate reductions compared to the state-of-the-art methods.

IEEE Transactions on Neural Networks | 2015

Retargeted Least Squares Regression Algorithm

Xu-Yao Zhang; Lingfeng Wang; Shiming Xiang; Cheng-Lin Liu

This brief presents a framework of retargeted least squares regression (ReLSR) for multicategory classification. The core idea is to directly learn the regression targets from data rather than using the traditional zero-one matrix as regression targets. The learned target matrix can guarantee a large margin constraint for the requirement of correct classification for each data point. Compared with the traditional least squares regression (LSR) and a recently proposed discriminative LSR model, ReLSR is much more accurate in measuring the classification error of the regression model. Furthermore, ReLSR is a single and compact model, hence there is no need to train two-class (binary) machines that are independent of each other. The convex optimization problem of ReLSR is solved elegantly and efficiently with an alternating procedure including regression and retargeting as substeps. The experimental evaluation over a range of databases demonstrates the validity of our method.
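The retargeting substep mentioned in this abstract can be sketched for a single data point. The sketch assumes that retargeting means finding the target vector closest (in squared Euclidean distance) to the current regression output r, subject to the true class c scoring at least a margin of 1 above every other class. That reading of the constraint, the function name retarget, and the sorting-based solver are assumptions for illustration and are not taken from the paper.

```python
def retarget(r, c, margin=1.0):
    """Return the vector t closest to r such that
    t[c] - t[j] >= margin for every j != c.

    For a fixed value u = t[c], the best choice for each other class is
    t[j] = min(r[j], u - margin). Setting the derivative with respect to u
    to zero gives u = (r[c] + sum of active (r[j] + margin)) / (1 + k),
    where the k active classes are those with r[j] + margin > u.
    """
    # Shifted competitor scores, largest first.
    others = sorted((r[j] + margin for j in range(len(r)) if j != c),
                    reverse=True)
    u = r[c]        # candidate value for t[c]
    total = r[c]    # running sum: r[c] plus the active shifted scores
    k = 0
    # Grow the active set while the next competitor still exceeds u.
    while k < len(others) and others[k] > u:
        total += others[k]
        k += 1
        u = total / (k + 1)
    t = [min(r[j], u - margin) for j in range(len(r))]
    t[c] = u
    return t

# A point whose output already satisfies the margin is left unchanged.
unchanged = retarget([3.0, 0.0, 1.0], 0)
# A point with tied outputs is moved just enough to reach the margin.
moved = retarget([0.0, 0.0, 0.0], 0)
```

In a full alternating scheme, a step like this would be applied to every row of the regression output, and the regression would then be refitted to the updated targets.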

Pattern Recognition | 2013

Evaluation of weighted Fisher criteria for large category dimensionality reduction in application to Chinese handwriting recognition

Xu-Yao Zhang; Cheng-Lin Liu

To improve the class separability of Fisher linear discriminant analysis (FDA) for large category problems, we investigate the weighted Fisher criterion (WFC) by integrating weighting functions for dimensionality reduction. The objective of WFC is to maximize the sum of weighted distances of all class pairs. By setting larger weights for the most confusable classes, WFC can improve the class separation while the solution remains an eigen-decomposition problem. We evaluate five weighting functions in three different weighting spaces in a typical large category problem of handwritten Chinese character recognition. The weighting functions include four based on existing methods, namely, FDA, approximate pairwise accuracy criterion (aPAC), power function (POW), confused distance maximization (CDM), and a new one based on K-nearest neighbors (KNN). All the weighting functions can be calculated in the original feature space, low-dimensional space, or fractional space. Our experiments on a 3,755-class Chinese handwriting database demonstrate that WFC can improve the classification accuracy significantly compared to FDA. Among the weighting functions, the KNN method in the original space is the most competitive model, achieving significantly higher classification accuracy at a low computational complexity. To further improve the performance, we propose a nonparametric extension of the KNN method from the class level to the sample level. The sample level KNN (SKNN) method is shown to significantly outperform other methods in Chinese handwriting recognition such as the locally linear discriminant analysis (LLDA), neighbor class linear discriminant analysis (NCLDA), and heteroscedastic linear discriminant analysis (HLDA).
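The idea of weighting class pairs while keeping an eigen-decomposition solution can be sketched as follows. The sketch builds a between-class scatter matrix as a weighted sum over class pairs, whitens the within-class scatter, and takes the leading eigenvectors. The function name weighted_fisher_projection, the weight_fn argument, the small ridge term, and the use of Euclidean distance between class means in the original space are assumptions for illustration; the weighting functions evaluated in the paper are not reproduced here.

```python
import numpy as np

def weighted_fisher_projection(X, y, n_components, weight_fn=lambda dist: 1.0):
    """Return a (d, n_components) projection maximizing a weighted sum of
    pairwise class-mean distances relative to within-class scatter.

    weight_fn maps the distance between two class means to a pair weight.
    A constant weight reduces to the ordinary Fisher criterion.
    """
    classes = np.unique(y)
    n, d = X.shape
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])

    # Within-class scatter, with a small ridge for numerical stability.
    Sw = 1e-6 * np.eye(d)
    for c, m in zip(classes, means):
        Xc = X[y == c] - m
        Sw += Xc.T @ Xc / n

    # Weighted between-class scatter accumulated over all class pairs.
    Sb = np.zeros((d, d))
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            diff = means[i] - means[j]
            w = weight_fn(np.linalg.norm(diff))
            Sb += priors[i] * priors[j] * w * np.outer(diff, diff)

    # Whiten Sw, then solve a symmetric eigenproblem for the whitened Sb.
    evals, evecs = np.linalg.eigh(Sw)
    whiten = evecs / np.sqrt(evals)          # scales column k by evals[k]**-0.5
    evals_b, evecs_b = np.linalg.eigh(whiten.T @ Sb @ whiten)
    top = np.argsort(evals_b)[::-1][:n_components]
    return whiten @ evecs_b[:, top]

# Hypothetical data: four Gaussian classes in five dimensions.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(4), 30)
X = rng.normal(size=(120, 5)) + 2.0 * rng.normal(size=(4, 5))[y]
# Hypothetical weight that emphasizes close (confusable) class pairs.
P = weighted_fisher_projection(X, y, 2, weight_fn=lambda dist: 1.0 / (dist ** 3 + 1e-12))
```

Projected features are obtained as X @ P. Swapping weight_fn changes only the pair weights, so the solver stays the same for every weighting scheme.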

Computer Vision and Pattern Recognition | 2011

Style transfer matrix learning for writer adaptation

Xu-Yao Zhang; Cheng-Lin Liu

In this paper, we propose a novel framework of style transfer matrix (STM) learning to reduce the writing style variation in handwriting recognition. After writer-specific style transfer learning, the data of different writers is projected onto a style-free space, where a writer independent classifier can yield high accuracy. We combine STM learning with a specific nearest prototype classifier: learning vector quantization (LVQ) with discriminative feature extraction (DFE), where both the prototypes and the subspace transformation matrix are learned via online discriminative learning. To adapt the basic classifier (trained with writer-independent data) to particular writers, we first propose two supervised models, one based on incremental learning and the other based on supervised STM learning. To overcome the lack of labeled samples for particular writers, we propose an unsupervised model to learn the STM using the self-taught strategy (also known as self-training). Experiments on a large-scale Chinese online handwriting database demonstrate that STM learning can reduce recognition errors significantly, and the unsupervised adaptation model performs even better than the supervised models.

Pattern Recognition | 2017

LG-CNN: From local parts to global discrimination for fine-grained recognition

Guo-Sen Xie; Xu-Yao Zhang; Wenhan Yang; Mingliang Xu; Shuicheng Yan; Cheng-Lin Liu

Fine-grained recognition is one of the most difficult topics in visual recognition, which aims at distinguishing confusing categories such as bird species within a genus. The information of part and bounding boxes in fine-grained images is very important for improving the performance. However, in real applications, the part and/or bounding box annotations may not exist. This makes fine-grained recognition a challenging problem. In this paper, we propose a jointly trained Convolutional Neural Network (CNN) architecture to solve the fine-grained recognition problem without using part and bounding box information. In this framework, we first detect part candidates by calculating the gradients of feature maps of a trained CNN model w.r.t. the input image and then filter out unnecessary ones by fusing two saliency detection methods. Meanwhile, two groups of global object locations are obtained based on the saliency detection methods and a segmentation method. With the filtered part candidates and approximate object locations as inputs, we construct the CNN architecture with local parts and global discrimination (LG-CNN), which consists of two CNN networks with shared weights. The upper stream of LG-CNN is focused on the part information of the input image, while the bottom stream of LG-CNN is focused on the global input image. LG-CNN is jointly trained by two stream loss functions to guide the updating of the shared weights. Experiments on three popular fine-grained datasets validate the effectiveness of our proposed LG-CNN architecture. Applying our LG-CNN architecture to generic object recognition datasets also yields superior performance over the directly fine-tuned CNN architecture by a large margin.