Xiaoyan Zhu
Tsinghua University
Publication
Featured research published by Xiaoyan Zhu.
systems, man and cybernetics | 2004
Li Zhuang; Ta Bao; Xiaoyan Zhu; Chunheng Wang; Satoshi Naoi
This work describes an effective spelling-check approach for Chinese OCR based on a new multi-knowledge statistical language model. The model combines the conventional n-gram language model with a new LSA (latent semantic analysis) language model, so that both local (syntactic) and global (semantic) information are used. Furthermore, similar-looking Chinese characters are used in the Viterbi search to expand the candidate list so that it contains more possibly correct results. With our approach, the best recognition accuracy increases from 79.3% to 91.9%, which corresponds to a 60.9% reduction in errors.
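The 60.9% figure is a relative reduction in the error rate, not the 12.6-point gain in accuracy. A minimal sketch of that arithmetic follows; the function name `relative_error_reduction` is illustrative and does not come from the paper.

```python
def relative_error_reduction(acc_before: float, acc_after: float) -> float:
    """Relative error reduction (percent) between two accuracies given in percent."""
    err_before = 100.0 - acc_before   # 20.7% errors before
    err_after = 100.0 - acc_after     # 8.1% errors after
    return 100.0 * (err_before - err_after) / err_before


# Accuracy figures quoted in the abstract above.
print(round(relative_error_reduction(79.3, 91.9), 1))  # prints 60.9
```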
international conference on signal processing | 2000
Yu Hao; Xiaoyan Zhu
Because it can dynamically adjust the observation scope as the analysis frequency changes, the wavelet transform has been applied successfully to the processing of non-stationary speech signals. We combine the wavelet transform (WT) result with the traditional CEP vector (CEP-WT) to produce a new feature as the front-end output of an automatic speech recognition system. Evaluation and experiments with the new feature show that the proposed method can improve the performance of the recognition system.
Information & Software Technology | 2002
Minghu Jiang; Xiaoyan Zhu; Georges Gielen; Elliott Franco Drábek; Ying Xia; Gang Tan; Ta Bao
In this paper, we study Braille word segmentation and the transformation of Mandarin Braille to Chinese characters. The former relies on rule, sign and knowledge bases for disambiguation and mistake correction, using adjacent constraints and bi-directional maximal matching, and achieves a segmentation precision better than 99%. The latter is divided into two stages: Braille to Chinese pinyin (a phonemic Romanization) and pinyin to characters. By incorporating a pinyin knowledge dictionary into the system, we have solved the problem of ambiguity in the translation from Braille to pinyin, and we have developed a statistical language model for the transformation of pinyin to characters. Using Viterbi search, we build a multi-level graph and find the sequence of Chinese characters with maximal likelihood. By using an N-Best algorithm to obtain the N most likely character sequences and by examining the means of measurement, the rate of correct candidates within the top five improves by a further 3%. In a test on 40,000 Chinese characters, the overall translation precision from Braille codes to Chinese characters for common documents reaches 94.38%; if proper nouns are not considered, the improvement reaches 2%.
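Bi-directional maximal matching can be sketched as follows. This is a generic illustration, not the paper's implementation: the toy lexicon, the Latin-letter input, and the tie-breaking heuristic (prefer fewer words, then the backward result) are all assumptions, and the paper's rule, sign and knowledge bases are not modelled.

```python
def forward_max_match(text, lexicon, max_len):
    """Greedy left-to-right segmentation, longest lexicon entry first."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:  # single symbols always match
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, lexicon, max_len):
    """Greedy right-to-left segmentation, longest lexicon entry first."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    words.reverse()
    return words


def bidirectional_max_match(text, lexicon, max_len=4):
    """Return (segmentation, ambiguous); a disagreement flags an ambiguity."""
    fwd = forward_max_match(text, lexicon, max_len)
    bwd = backward_max_match(text, lexicon, max_len)
    if fwd == bwd:
        return fwd, False
    # Assumed heuristic: fewer words wins, ties go to the backward pass.
    return (fwd if len(fwd) < len(bwd) else bwd), True


toy_lexicon = {"ab", "abc", "cd"}  # hypothetical entries
print(bidirectional_max_match("abcd", toy_lexicon))
```

When the two passes disagree, as they do on this toy input, the span is a candidate for the kind of rule-based disambiguation the abstract describes.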
ieee international conference on intelligent processing systems | 1997
Yiying Zhang; Xiaoyan Zhu; Yu Hao; Yupin Luo
The problem of automatic word boundary detection, both in a quiet environment and in the presence of noise, is addressed. A fast and robust algorithm for accurately locating the endpoints of isolated words is described in detail. The algorithm uses energy and zero-crossing parameters to obtain reference endpoints; the principle of variable frame rate (VFR) is then adopted and the cepstrum is used to define the boundaries of isolated words accurately. Experimental results show that the accuracy of the algorithm is quite acceptable. Moreover, the computational overhead of the algorithm is low, since the cepstrum parameters are reused in the later recognition procedure.
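The first stage, obtaining reference endpoints from energy and zero crossings, might look like the sketch below. It covers only that stage (no VFR or cepstral refinement), and the frame length, thresholds, and function names are assumptions chosen for the toy signal rather than values from the paper.

```python
def frame_energy(frame):
    """Short-time energy: sum of squared samples."""
    return sum(s * s for s in frame)


def zero_crossings(frame):
    """Number of sign changes between consecutive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)


def reference_endpoints(signal, frame_len, energy_thr, zcr_thr):
    """Return (first_frame, last_frame) of the detected word, or None."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    loud = [k for k, f in enumerate(frames) if frame_energy(f) > energy_thr]
    if not loud:
        return None
    start, end = loud[0], loud[-1]
    # Extend outward over low-energy frames with many zero crossings,
    # which may be weak fricatives at the word edges.
    while start > 0 and zero_crossings(frames[start - 1]) >= zcr_thr:
        start -= 1
    while end < len(frames) - 1 and zero_crossings(frames[end + 1]) >= zcr_thr:
        end += 1
    return start, end


# Toy signal: silence, weak high-frequency onset, loud segment, silence.
toy_signal = [0.0] * 4 + [0.1, -0.1, 0.1, -0.1] + [1.0, -1.0, 1.0, -1.0] + [0.0] * 4
print(reference_endpoints(toy_signal, frame_len=4, energy_thr=1.0, zcr_thr=2))
```

Energy alone would place the start at the loud frame; the zero-crossing check moves it one frame earlier to include the weak onset.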
international conference on pattern recognition | 2002
Xiaoyan Zhu; Xiaoxin Yin
A robust approach is proposed for document skew detection. We use Fourier analysis and an SVM to distinguish textual from non-textual areas of documents. We also propose a robust method to determine the skew angle from the textual areas. Our approach achieves good performance on documents with large areas of non-textual content.
Information Sciences | 2002
Minghu Jiang; Georges Gielen; Beixing Deng; Xiaoyan Zhu
To reduce the long training time of Waibel's time-delay neural networks (TDNNs) for phoneme recognition, several improved fast learning methods are put forward: (1) combining the unsupervised Oja's rule learning method with the similar error backpropagation (BP) algorithm for training the initial weights; (2) improving the error energy function, with weights updated according to the size of the output error; (3) changing the BP error from along layers to frames and using the averaged overlapping part of the frame-shift delta values in the weights of the bottom layer; (4) training on the data gradually, from a small to a large number of samples; and (5) using optimal modular neural networks (OMNNs) with a tree structure for multiple phonemes. Our experimental results indicate that the convergence speed is accelerated by orders of magnitude and that in most cases the error function descends monotonically, while the network complexity increases less and the recognition rates are almost the same across the different comparative experiments.
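For reference, the standard form of Oja's rule for a single linear unit is w ← w + η·y·(x − y·w) with y = w·x. The sketch below shows only that textbook update; how the paper combines it with BP for weight initialisation is not reproduced, and the input vector, starting weights, and learning rate are made up for illustration.

```python
def oja_update(w, x, lr):
    """One step of Oja's rule: w <- w + lr * y * (x - y * w), with y = w . x."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]


# Repeatedly presenting the same input drives w toward the unit vector
# along that input, here (1, 1) / sqrt(2).
w = [1.0, 0.0]
for _ in range(200):
    w = oja_update(w, [1.0, 1.0], lr=0.1)
print(w)
```

The update is unsupervised and keeps the weight norm bounded.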
systems man and cybernetics | 2000
Yiying Zhang; David Zhang; Xiaoyan Zhu
This correspondence introduces a new text-independent speaker verification method, derived from the basic idea in pattern recognition that the discriminating ability of a classifier can be improved by removing the information common to the classes. By looking for the speech characteristics common to a group of speakers, a global speaker model can be established. Subtracting the score obtained from this model normalizes the conventional likelihood score, resulting in a more compact score distribution and lower equal error rates. Several experiments are carried out to demonstrate the effectiveness of the proposed method.
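The normalization step can be illustrated as a difference of log-likelihood scores. This is a sketch under strong assumptions: each model is a single one-dimensional Gaussian given as a hypothetical `(mean, variance)` pair, and the feature values are invented; the abstract does not specify the model family actually used.

```python
import math


def avg_log_likelihood(frames, mean, var):
    """Average per-frame log-likelihood under a 1-D Gaussian."""
    return sum(-0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
               for x in frames) / len(frames)


def normalized_score(frames, speaker, background):
    """Claimed-speaker score minus the global (common) model score."""
    return avg_log_likelihood(frames, *speaker) - avg_log_likelihood(frames, *background)


speaker_model = (1.0, 1.0)   # assumed (mean, variance) of the claimed speaker
global_model = (0.0, 4.0)    # assumed broader model pooled over many speakers
genuine = [1.0, 1.2, 0.8]    # invented features close to the claimed speaker
impostor = [-1.0, -1.2, -0.8]

print(normalized_score(genuine, speaker_model, global_model) > 0)
print(normalized_score(impostor, speaker_model, global_model) < 0)
```

With the common component subtracted, a fixed threshold (zero in this toy case) separates the two trials.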
international conference on neural information processing | 2006
Kaizhu Huang; Jun Sun; Yoshinobu Hotta; Katsuhito Fujimoto; Satoshi Naoi; Chong Long; Li Zhuang; Xiaoyan Zhu
Handwritten Chinese address recognition is a difficult yet important pattern recognition task. The problem poses three difficulties: (1) Handwritten addresses are often written in free styles with high variation, resulting in inevitable segmentation errors. (2) The number of Chinese characters is large, leading to a low recognition rate for single Chinese characters. (3) Chinese addresses are usually irregular, i.e., different persons may write the same address in different formats. In this paper, we propose a comprehensive hybrid approach for solving all three difficulties. To solve (1) and (2), we adopt an enhanced holistic scheme that recognizes the whole image of a word (defined as a place name) instead of that of single characters. This facilitates the use of address knowledge and also avoids the difficult problem of single-character segmentation. To address (3), we propose a hybrid approach that combines the word-based language model with the holistic word matching scheme, so that it can deal with various irregular addresses. We provide theoretical justifications, outline the detailed steps, and perform a series of experiments. The experimental results on various real addresses demonstrate the advantages of our novel approach.
systems man and cybernetics | 2001
Rui Guo; Xiaoyan Zhu; Yu Hao
This paper introduces a Chinese spoken dialog system that provides services through which blind people can use computers. The architecture of the dialog system is described briefly, and the way in which each component works is explained. The key factor in such a dialog system is extracting the intention of a user's utterance so as to make an appropriate response. To achieve this, a case grammar formalism is applied for semantic description, and a robust spoken language parsing method based on case frames is adopted to obtain the semantic interpretation of the input. The results show that this parsing method can tolerate speech recognition errors and the grammatical deviations of spoken language to some extent.
world congress on intelligent control and automation | 2000
Binfeng Yan; Rui Guo; Xiaoyan Zhu; Bo Zhang
Keyword spotting is one of the important topics in speech recognition. The paper presents a model of keyword spotting and a training method for the relevant keyword models and anti-word models, and then gives a strategy for keyword spotting. Experiments indicate that the strategy is effective.