Xue-Dong Tian | Researchain

Publication

Featured research published by Xue-Dong Tian.

international conference on machine learning and cybernetics | 2005

Optical font recognition based on Gabor filter

Ming-Hu Ha; Xue-Dong Tian; Zi-Ru Zhang

Font recognition of Chinese characters is an important part of an OCR (optical character recognition) system. It is also a major technical challenge because different fonts look similar. The quality of layout reconstruction depends on the accuracy of font recognition. However, the prevalent approach is predominant-font recognition, which assumes that most layouts are printed in a single font and therefore makes it impossible to reconstruct the original layout. In this paper, an improved method for recognizing the font of individual characters is proposed. The approach consists of three steps. First, the guidance fonts are acquired using a Gabor filter optimized with a genetic algorithm (GA). Then a single-font recognizer is applied to obtain matching results with the help of the guidance fonts and layout knowledge of font typesetting. Finally, post-processing of the font recognition results is performed according to the layout knowledge. Experiments were carried out with samples from newspapers and magazines, and the results show that the method is of immense practical and theoretical value.

international conference on machine learning and cybernetics | 2002

An improved font recognition method based on texture analysis

Fang Yang; Xue-Dong Tian; Bao-Lan Guo

Font recognition plays an important role in OCR systems. It can be achieved simply and effectively with texture analysis by regarding fonts as different textures. However, Gabor filters with traditional parameters, which the previous approach uses to extract features, are not well suited to font recognition, and the RR (Recognition Rate) decreases sharply when similar fonts are recognized, because font textures differ from natural textures. Therefore, two adjustments are proposed in this paper to improve the RR: 1. A bank of optimized filters is obtained by using a genetic algorithm to optimize the orientation parameters; it extracts distinctive features that identify the font well. 2. To reduce the FAR (False-Accept Rate), several dictionaries are set up to deal with the diversity in textures of the same font. Experiments are carried out with 899 textures of 4 frequently used Chinese fonts from newspapers; compared with the previous approach, the results show that the RR is improved and the adjustments are useful.

international conference on machine learning and cybernetics | 2002

Individual character font recognition based on guidance font

Xiu-Fen Miao; Xue-Dong Tian; Bao-Lan Guo

Font recognition of Chinese characters is an important part of Chinese character recognition and page layout reconstruction. This article puts forward a new font recognition method. The guidance fonts are acquired from the chapter font or from acquired knowledge of font typesetting, and the corresponding single-font character recognizers are then started to obtain matching results, which are used to recognize fonts. Experiments show that this method increases the speed and accuracy of recognition.

international conference on machine learning and cybernetics | 2005

Optical font recognition of Chinese characters based on texture features

Ming-Hu Ha; Xue-Dong Tian

Font recognition is a fundamental issue in the identification, analysis and reconstruction of documents. In this paper, a new method of optical font recognition is proposed that can recognize the font of every Chinese character. It employs a statistical method based on global texture analysis to recognize a predominant font, and uses a traditional single-font recognizer to identify the font of each character under the guidance of the obtained predominant font. It consists of three steps. First, the guiding fonts are acquired based on Gabor features. Then a font recognizer is run to identify the font of the characters one by one. Finally, post-processing is performed according to layout knowledge to correct font recognition errors. Experiments are carried out and the results show that this method is of immense practical and theoretical value.
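The last two steps can be pictured with a minimal sketch. It is not the authors' implementation: the function names, the `match_score` callback, and the neighbour-agreement rule used as "layout knowledge" are all assumptions made for illustration. The idea shown is that each character is matched only against the guiding fonts, and an isolated label that disagrees with two agreeing neighbours is then corrected.

```python
def pick_font(char_image, guiding_fonts, match_score):
    """Step 2 (hypothetical): try only the guiding fonts and keep the best match.

    match_score(char_image, font) stands in for a single-font recognizer
    that returns a higher value for a better match.
    """
    return max(guiding_fonts, key=lambda font: match_score(char_image, font))


def smooth_font_labels(labels):
    """Step 3 (assumed rule): fix a label that differs from two agreeing neighbours.

    This encodes one simple piece of layout knowledge: a single character
    in a different font between two characters of the same font is
    more likely a recognition error than a real font change.
    """
    out = list(labels)
    for i in range(1, len(labels) - 1):
        if labels[i - 1] == labels[i + 1] != labels[i]:
            out[i] = labels[i - 1]
    return out
```

For example, `smooth_font_labels(["Song", "Song", "Hei", "Song", "Kai", "Kai"])` replaces the isolated `"Hei"` with `"Song"` and leaves the genuine change to `"Kai"` untouched.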

international conference on machine learning and cybernetics | 2002

A study on printed form processing and reconstruction

Zhi-Hong Zhao; Xue-Dong Tian; Bao-Lan Guo

Because we deal with many forms in daily life, a form-processing system has great value in office automation. This paper focuses on processing and reconstructing printed forms. The main problems in form processing are broken lines and errors in field extraction, which degrade the result of form processing. Several methods are introduced in this paper to improve the result of field extraction. First, regulated morphology is used to repair broken lines. Second, a novel heuristic method is used to improve field extraction accuracy. The results indicate that these methods improve the result of form processing. The reconstruction of the form image is also discussed in this paper.

international conference on computer science and network technology | 2013

An indexing method of mathematical expression retrieval

Xue-Dong Tian; Songqiang Yang; Xinfu Li; Fang Yang

As the kernel component of scientific documents, mathematical expressions are becoming a new target for search engines. Unlike normal text, mathematical expressions are composed of various kinds of symbols arranged in a nonlinear layout, which limits the usefulness of traditional full-text information retrieval for expression search. In this paper, we first discuss existing search engines for mathematical expressions and introduce the two-dimensional characteristics of mathematical expressions. Then, a data structure for representing mathematical formulas is designed, which contains not only the symbol codes but also the mathematical information among symbols. Finally, an indexing algorithm for mathematical expressions is put forward on the basis of this data structure. The experimental results show the effectiveness of the proposed indexing method.
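A rough sketch of what such a structure and index might look like follows. The abstract does not give the actual design, so everything here is hypothetical: the `Node` class, the relation names (`root`, `next`, `superscript`, `subscript`), the nesting-level rule, and the inverted index keyed by (symbol, relation, level).

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    symbol: str                 # symbol code, e.g. "x" or "2"
    relation: str = "root"      # relation to the parent symbol (assumed names)
    children: list = field(default_factory=list)


def index_keys(node, level=0):
    """Yield one (symbol, relation, level) key per symbol in the expression."""
    yield (node.symbol, node.relation, level)
    for child in node.children:
        # Assumption: horizontal neighbours stay on the same level,
        # any other relation (superscript, subscript, ...) nests one deeper.
        step = 0 if child.relation == "next" else 1
        yield from index_keys(child, level + step)


class ExpressionIndex:
    """Inverted index from structural keys to expression identifiers."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, expr_id, root):
        for key in index_keys(root):
            self.postings[key].add(expr_id)

    def search(self, root):
        """Return ids of expressions containing every key of the query."""
        sets = [self.postings.get(key, set()) for key in index_keys(root)]
        return set.intersection(*sets)
```

With `x^2` and `x_2` indexed under two ids, a query for `x^2` returns only the first, while a query for the bare symbol `x` returns both, which is the distinction a plain full-text index over the symbol sequence "x 2" cannot make.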

international conference on machine learning and cybernetics | 2007

An Improved Method Based on Gabor Feature for Mathematical Symbol Recognition

Xue-Dong Tian; Li-Na Zuo; Fang Yang; Ming-Hu Ha

An improved method based on Gabor features for printed mathematical symbol recognition is presented in this paper. Elastic meshing technology is first applied to partition the character and obtain sampling points. Then, a set of Gabor filters is used to extract features in different directions at each sampling point. To decrease the processing time, convolution operations are conducted with the real part of the Gabor templates. Experimental results show that the proposed method has excellent performance on printed mathematical symbols.
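The feature extraction step can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the kernel size, wavelength, sigma, and number of orientations are arbitrary values, the sampling points are passed in directly (elastic meshing is not implemented), and taking the absolute response as the feature is a choice made here.

```python
import math


def gabor_real_kernel(size, wavelength, theta, sigma):
    """Real (cosine) part of a 2-D Gabor filter; size is assumed to be odd."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Project onto direction theta so the cosine carrier varies along it.
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel


def response_at(image, kernel, cy, cx):
    """Filter response centred on (cy, cx); pixels outside the image count as 0.

    The real Gabor kernel is symmetric about its centre, so correlation
    and convolution give the same value here.
    """
    half = len(kernel) // 2
    total = 0.0
    for ky, row in enumerate(kernel):
        for kx, weight in enumerate(row):
            y, x = cy + ky - half, cx + kx - half
            if 0 <= y < len(image) and 0 <= x < len(image[0]):
                total += weight * image[y][x]
    return total


def gabor_features(image, points, n_orientations=4, size=7,
                   wavelength=4.0, sigma=2.0):
    """One absolute response per (sampling point, orientation) pair."""
    kernels = [gabor_real_kernel(size, wavelength,
                                 k * math.pi / n_orientations, sigma)
               for k in range(n_orientations)]
    return [abs(response_at(image, kernel, cy, cx))
            for (cy, cx) in points for kernel in kernels]
```

Using only the real part means each response is a single weighted sum instead of a pair of sums followed by a magnitude computation, which is where the time saving mentioned in the abstract would come from.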

international conference on machine learning and cybernetics | 2006

Chinese New Words Extraction Based on Machine Learning Approach

Zi-Ru Zhang; Qiang-Jun Wang; Xue-Dong Tian

Chinese new word extraction is an important problem in Chinese information processing. In this paper, a new word extraction method based on machine learning is proposed, in which context information, word construction rules and statistical information are combined to extract new words. An experiment on two-character nouns shows that this method improves the efficiency and accuracy of extracting new words.

international conference on machine learning and cybernetics | 2005

Research on optical formulas extraction

Xue-Dong Tian; Wei-Zhong Sun; Ming-Hu Ha

Automatic recognition and reconstruction of formulas are key parts of an OCR (optical character recognition) system. Mathematical formula extraction is the first step in this technique. Little has been done in this area, and some research has focused on mathematical formulas in printed documents. An approach combining the MSE feature of CCXs with heuristic rules for mathematical formula extraction is proposed. The approach based on the MSF feature of the CCXs is used to extract isolated formulas from printed documents, and heuristic rules are used to extract embedded formulas from image blocks. The experiments indicate that a combination of the two methods can obtain favorable results.

international conference on machine learning and cybernetics | 2003

A Chinese document layout analysis method based on minimal spanning tree clustering

Xue-Dong Tian; Chong Zhang

To adapt to some special characteristics of Chinese documents, a layout analysis method based on minimal spanning tree clustering is presented. The method is a bottom-up approach. First, a run-length smoothing algorithm is applied to the document in the horizontal direction and then in the vertical direction. After that, minimal spanning tree clustering is applied. The experiments indicate that this method improves Chinese document layout analysis.
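The two stages can be sketched as below. This is a sketch under assumptions rather than the paper's procedure: the image is a binary grid with 1 for ink, smoothing is applied horizontally and then vertically on the smoothed result, the clustering works on block centre points, and the gap and cut thresholds are parameters the caller must choose.

```python
import math


def smooth_runs(line, max_gap):
    """Fill runs of 0s no longer than max_gap that lie between two 1s."""
    out = list(line)
    last_one = None
    for i, value in enumerate(line):
        if value == 1:
            if last_one is not None and 0 < i - last_one - 1 <= max_gap:
                for j in range(last_one + 1, i):
                    out[j] = 1
            last_one = i
    return out


def rlsa(image, h_gap, v_gap):
    """Run-length smoothing: rows first, then columns of the smoothed rows."""
    rows = [smooth_runs(row, h_gap) for row in image]
    cols = [smooth_runs(list(col), v_gap) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def mst_clusters(points, cut):
    """Cluster points by building a minimum spanning tree without long edges.

    Kruskal's algorithm adds edges in order of length; stopping at the
    first edge longer than cut leaves a forest whose trees are the clusters.
    Returns one representative index per point.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted((math.dist(p, q), i, j)
                   for i, p in enumerate(points)
                   for j, q in enumerate(points) if i < j)
    for length, i, j in edges:
        if length > cut:
            break
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i
    return [find(i) for i in range(len(points))]
```

In a layout setting the points would be the centres of the connected blocks produced by smoothing, so that nearby blocks merge into one region while widely separated columns or paragraphs stay apart.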
