Publication


Featured research published by Xiangdong Su.


International Conference on Document Analysis and Recognition | 2011

Classical Mongolian Words Recognition in Historical Document

Guanglai Gao; Xiangdong Su; Hongxi Wei; Yeyun Gong

Many classical Mongolian historical documents are preserved only in image form, which makes them difficult to explore and retrieve. In this paper, we investigate the peculiarities of classical Mongolian documents and propose an approach to recognize the words in them. We design an algorithm to segment Mongolian words into several Glyph Units (GUs). Each GU consists of no more than three characters. We then use a three-stage method to recognize the GUs. In the first stage, all the GUs are classified into nine groups by a decision tree using three features of the GUs. In the second stage, the GUs in each group are classified individually by five independent BP neural networks whose inputs are five other feature vectors of the GUs. In the last stage, the five results for each GU group from these five classifiers are combined to provide the final recognition result. The recognition rate for Mongolian words in our experiment reaches 71%, indicating that our method is effective.
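The abstract does not say how the five classifier outputs are combined in the last stage. As a minimal sketch, assuming simple plurality voting (an assumption for illustration, not necessarily the rule used in the paper), the combination step could look like this. The function name `combine_votes` and the label strings are hypothetical.

```python
from collections import Counter


def combine_votes(predictions):
    """Return the label chosen by the most classifiers.

    predictions: list of labels, one per classifier (five in the abstract).
    Ties are broken in favour of the label that appears first in the list.
    Plurality voting is an assumed combination rule, not the paper's.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        # The first label that reaches the top count wins.
        if counts[label] == best:
            return label


# Five hypothetical classifier outputs for one Glyph Unit.
print(combine_votes(["gu_17", "gu_03", "gu_17", "gu_21", "gu_17"]))
```

Voting ignores classifier confidence; if the networks expose output scores, summing or averaging those scores per label would be a natural alternative.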


International Conference on Document Analysis and Recognition | 2015

A multiple instances approach to improving keyword spotting on historical Mongolian document images

Hongxi Wei; Guanglai Gao; Xiangdong Su

For keyword spotting on historical Mongolian document images, performance varies considerably when the user provides different instance images for the same query keyword. This paper proposes an approach to solving this problem. Specifically, the whole keyword spotting procedure is divided into two stages. The main task of the first stage is to generate multiple ranking lists for a query keyword, and the aim of the second stage is to merge these ranking lists into a final ranking. In the first stage, a ranking list for the query keyword is first returned by traditional image matching, and then a number of instances of the query keyword are obtained using pseudo-relevance feedback. Next, each instance of the query keyword returns its own ranking list. In the second stage, the ranking lists from the multiple instances of the query keyword are combined by a data fusion technique. The final ranking is taken as the retrieval result for the query keyword. The experimental results show that the proposed approach can significantly improve the performance of keyword spotting on historical Mongolian document images.
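The abstract names only "a data fusion technique" for the second stage. As a minimal sketch, assuming reciprocal rank fusion (a common rank-based fusion rule, chosen here for illustration and not confirmed by the source), merging the per-instance ranking lists could look like this. The function name `fuse_rankings`, the constant `k=60`, and the word-image identifiers are hypothetical.

```python
from collections import defaultdict


def fuse_rankings(ranking_lists, k=60):
    """Merge several ranking lists with reciprocal rank fusion.

    ranking_lists: list of lists of word-image ids, best match first.
    Each id receives 1 / (k + rank) from every list that contains it.
    Ties in the fused score are broken by id so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in ranking_lists:
        for rank, image_id in enumerate(ranking, start=1):
            scores[image_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda image_id: (-scores[image_id], image_id))


# One ranking list per query instance (hypothetical ids).
lists = [
    ["w1", "w2", "w3"],
    ["w2", "w1", "w4"],
    ["w2", "w3", "w1"],
]
print(fuse_rankings(lists))
```

A score-based rule such as summing normalized matching scores would also fit the description; the rank-based form is used here only because it needs no score calibration across instances.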


International Conference on Neural Information Processing | 2016

LDA-Based Word Image Representation for Keyword Spotting on Historical Mongolian Documents

Hongxi Wei; Guanglai Gao; Xiangdong Su

The original Bag-of-Visual-Words (BoVW) approach discards the spatial relations of the visual words. In this paper, an LDA-based topic model is adopted to obtain the semantic relations of visual words for each word image. Because the LDA-based topic model usually hurts retrieval performance when used on its own, it is linearly combined with a visual language model for each word image in this study. The basic query likelihood model is then used for retrieval. The experimental results on our dataset show that the proposed LDA-based representation can efficiently and accurately accomplish keyword spotting on a collection of historical Mongolian documents. Moreover, the proposed approach significantly outperforms the original BoVW approach.
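The linear combination and query likelihood scoring described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the interpolation weight `lam`, the probability floor, and the data layout (one dictionary of language-model probabilities per word image, one list of topic-to-visual-word dictionaries, and one topic mixture per word image) are all assumptions.

```python
import math


def combined_prob(visual_word, lm_probs, topic_word, doc_topic, lam=0.5):
    """Interpolate a visual language model with an LDA-based estimate.

    lm_probs:   {visual_word: P_lm(word | image)}
    topic_word: list of {visual_word: P(word | topic)}, one dict per topic
    doc_topic:  list of P(topic | image), same length as topic_word
    lam:        weight of the language model (assumed value)
    """
    p_lm = lm_probs.get(visual_word, 0.0)
    p_lda = sum(
        p_topic * topic_word[z].get(visual_word, 0.0)
        for z, p_topic in enumerate(doc_topic)
    )
    return lam * p_lm + (1.0 - lam) * p_lda


def query_log_likelihood(query, lm_probs, topic_word, doc_topic,
                         lam=0.5, floor=1e-12):
    """Sum of log probabilities of the query's visual words for one image."""
    return sum(
        math.log(max(combined_prob(w, lm_probs, topic_word, doc_topic, lam), floor))
        for w in query
    )


lm = {"v1": 0.5, "v2": 0.5}
topics = [{"v1": 1.0}, {"v2": 1.0}]
mixture = [0.2, 0.8]
print(query_log_likelihood(["v1", "v2"], lm, topics, mixture))
```

Word images would then be ranked by this score in descending order for each query.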


China National Conference on Chinese Computational Linguistics | 2016

A Novel Approach to Improve the Mongolian Language Model Using Intermediate Characters

Xiaofei Yan; Feilong Bao; Hongxi Wei; Xiangdong Su

In the Mongolian language, many words share the same presentation form but are distinct words with different codes. Since typists usually input words according to their presentation forms and sometimes cannot distinguish the codes, many coding errors occur in Mongolian corpora. This makes statistics and retrieval on such a corpus very difficult. To solve this problem, this paper proposes a method that merges words with the same presentation form using Intermediate characters, and then uses the corpus in Intermediate character form to build a Mongolian language model. Experimental results show that the proposed method reduces the perplexity and the word error rate of the 3-gram language model by 41% and 30%, respectively, compared with a model trained on the unprocessed corpus. The proposed approach significantly improves the performance of the Mongolian language model and greatly enhances the accuracy of Mongolian speech recognition.
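The merging idea can be sketched as a character-level normalization applied before counting. The mapping table below is a hypothetical placeholder written with Latin letters; the actual Intermediate character set and the real Mongolian code points are not given in the abstract.

```python
from collections import Counter

# Hypothetical table: letters that look identical when written are mapped
# to one shared intermediate symbol. The real table is not in the source.
TO_INTERMEDIATE = {"o": "O", "u": "O", "t": "T", "d": "T"}


def to_intermediate(word, table=TO_INTERMEDIATE):
    """Rewrite a word so that look-alike letters share one symbol."""
    return "".join(table.get(ch, ch) for ch in word)


def intermediate_counts(corpus_words):
    """Count words after normalization; look-alike spellings are merged."""
    return Counter(to_intermediate(w) for w in corpus_words)


# Three differently coded spellings collapse into one entry.
print(intermediate_counts(["ota", "uda", "oda"]))
```

N-gram counts for the language model would be collected over the normalized corpus in the same way, so that coding errors no longer split the statistics of one written form across several entries.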


Cross-Language Evaluation Forum | 2012

Hidden Markov model for term weighting in verbose queries

Xueliang Yan; Guanglai Gao; Xiangdong Su; Hongxi Wei; Xueliang Zhang; Qianqian Lu

It has been observed that short queries generally perform better than their corresponding long versions when retrieved by the same IR model. This is mainly because most current models do not distinguish the importance of different terms in the query. Observing that sentence-like queries encode information related to term importance in their grammatical structure, we propose a Hidden Markov Model (HMM) based method to extract such information for term weighting. The choice of HMM is motivated by its successful application in capturing the relationship between adjacent terms in the NLP field. Since we are dealing with queries in natural language form, we think that HMM can also be used to capture the dependence between the weights and the grammatical structures. Our experiments show that our assumption is quite reasonable and that such information, when utilized properly, can greatly improve retrieval performance.


Chinese Conference on Pattern Recognition | 2014

Character Segmentation for Classical Mongolian Words in Historical Documents

Xiangdong Su; Guanglai Gao; Weihua Wang; Feilong Bao; Hongxi Wei

Many classical Mongolian historical documents are preserved only in image form, which makes it inconvenient to search and mine the desired content. To facilitate word recognition in the document digitization procedure, this paper proposes a novel approach to segmenting historical words in which the characters are intrinsically connected and exhibit remarkable overlap and variation. The approach consists of three steps: (1) significant contour point (SCP) detection on the approximated polygon of the word's external contour, (2) baseline locating based on a logistic regression model, and (3) segment path generation and validation based on heuristic rules and a neural network. The SCPs help in baseline locating and segment path generation. Experiments on the historical Mongolian Kanjur demonstrate that our approach can effectively locate the words' baselines and segment the words into characters.


International Conference on Asian Language Processing | 2013

Dependency Parsing for Traditional Mongolian

Xiangdong Su; Guanglai Gao; Xueliang Yan

Dependency parsing has become increasingly popular in natural language processing in recent years. Nevertheless, dependency parsing for Traditional Mongolian has not attracted much attention. We investigate it with a Maximum Spanning Tree (MST) based model on the Traditional Mongolian dependency treebank (TMDT). This paper briefly introduces Traditional Mongolian along with TMDT, and discusses the details of MST. Much emphasis is placed on performance comparisons among eight kinds of features and their combinations in order to find a suitable feature representation. The evaluation results show that the combination of Basic Unigram Features, Basic Bi-gram Features and C-C Sibling Features obtains the best performance. Our work establishes a baseline for dependency parsing of Traditional Mongolian.


CCL | 2013

Development of Traditional Mongolian Dependency Treebank

Xiangdong Su; Guanglai Gao; Xueliang Yan

This paper describes the development of the Traditional Mongolian dependency treebank (TMDT), which aims to facilitate dependency analysis of Traditional Mongolian. The annotation scheme of the dependency treebank is established according to Traditional Mongolian grammar and its usability in syntactic analysis. In the treebank, morphological and analytical information is annotated. At the morphological level, a semi-automatic strategy is adopted: the Part-Of-Speech (POS) tag and the stem of each word in the sentence are tagged and extracted, respectively, with automatic tools, and then manually corrected. At the analytical level, the dependencies in the sentence are annotated purely manually according to the constituent structure and the annotation scheme. This treebank lays the foundation for dependency parsing of Traditional Mongolian and can be extended to a multi-dependency treebank.


Chinese Conference on Pattern Recognition | 2012

The Research on Mongolian Spoken Term Detection Based on Confusion Network

Feilong Bao; Guanglai Gao; Yulai Bao; Xiangdong Su

In this paper, we present a baseline spoken term detection (STD) system for Mongolian speech data. First, Mongolian speech data is recognized as n-best word lattices by the Mongolian ASR system, and the n-best word lattices are converted into word confusion networks. Second, individual arcs in the word confusion networks are indexed into an efficient inverted index structure. Finally, we search this index structure to extract keyword candidates and re-rank the results. Experiments show that this system achieves favorable results for the detection of Mongolian in-vocabulary (IV) query terms.
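The indexing and search steps can be sketched as follows. This is a minimal illustration under stated assumptions: a confusion network is represented as a list of slots, each slot holding `(word, posterior)` arcs; only single-word query terms are handled; and re-ranking is reduced to sorting by posterior. The function names, the data layout, and the romanized example words are hypothetical.

```python
from collections import defaultdict


def build_index(networks):
    """Index every arc of every confusion network by its word.

    networks: {utterance_id: [slot, ...]}, where each slot is a list of
              (word, posterior) arcs competing for the same position.
    Returns {word: [(utterance_id, slot_position, posterior), ...]}.
    """
    index = defaultdict(list)
    for utt_id, slots in networks.items():
        for position, slot in enumerate(slots):
            for word, posterior in slot:
                index[word].append((utt_id, position, posterior))
    return index


def search(index, term, threshold=0.0):
    """Return hits for a single-word term, highest posterior first."""
    hits = [hit for hit in index.get(term, []) if hit[2] >= threshold]
    return sorted(hits, key=lambda hit: -hit[2])


networks = {
    "u1": [[("sain", 0.9), ("san", 0.1)], [("baina", 0.7), ("bain", 0.3)]],
    "u2": [[("sain", 0.4), ("sayin", 0.6)]],
}
index = build_index(networks)
print(search(index, "sain"))
```

Multi-word terms would additionally require matching hits in consecutive slots of the same utterance, which is omitted here.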


International Conference on Neural Information Processing | 2017

Using Word Mover’s Distance with Spatial Constraints for Measuring Similarity Between Mongolian Word Images

Hongxi Wei; Hui Zhang; Guanglai Gao; Xiangdong Su

In the bag-of-visual-words framework, visual words are independent of each other, which discards their spatial relations and leaves them without semantic information. To capture the semantic information of visual words, a deep learning procedure similar to the word embedding technique is used to map visual words to embedding vectors in a semantic space. Word mover's distance (WMD) is then used to measure the similarity between two word images by calculating the minimum traveling distance from the visual embeddings of one word image to those of the other. Moreover, word images are partitioned in advance into several sub-regions of equal size along rows and columns. WMDs are then computed separately for the corresponding sub-regions of the two word images, and the similarity between the two word images is the sum of these WMDs. Experimental results show that the proposed method outperforms various baseline and state-of-the-art methods, including spatial pyramid matching, latent Dirichlet allocation, average visual word embeddings and the original word mover's distance.
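The region-wise summation can be sketched as follows. Exact WMD requires solving an optimal transport problem; to keep the example dependency-free, the sketch substitutes the relaxed word mover's distance (a known lower bound on WMD in which each embedding simply moves to its nearest neighbour), so it illustrates the structure of the method rather than the exact distance used in the paper. Uniform weights, non-empty sub-regions, and the function names are assumptions.

```python
import math


def relaxed_wmd(a, b):
    """Relaxed WMD between two non-empty lists of embedding vectors.

    Each vector carries equal weight. Every vector in one set travels to
    its nearest vector in the other set; the larger of the two directed
    costs is returned. This is a lower bound on the exact WMD.
    """
    def one_way(src, dst):
        return sum(min(math.dist(s, d) for d in dst) for s in src) / len(src)

    return max(one_way(a, b), one_way(b, a))


def spatial_distance(regions_a, regions_b):
    """Sum region-wise distances over corresponding sub-regions.

    regions_a, regions_b: lists of sub-regions in the same order, each
    sub-region being a list of visual-word embedding vectors.
    """
    return sum(relaxed_wmd(ra, rb) for ra, rb in zip(regions_a, regions_b))


# Two word images split into two sub-regions each (toy 2-D embeddings).
image_a = [[(0.0, 0.0)], [(1.0, 1.0)]]
image_b = [[(3.0, 4.0)], [(1.0, 1.0)]]
print(spatial_distance(image_a, image_b))
```

Because distances are computed only between corresponding sub-regions, visual words in different parts of the two images cannot be matched with each other, which is how the partitioning introduces a spatial constraint.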

Collaboration


Top Co-Authors

Guanglai Gao (Inner Mongolia University)
Hongxi Wei (Inner Mongolia University)
Feilong Bao (Inner Mongolia University)
Xueliang Yan (Inner Mongolia University)
Hui Zhang (Inner Mongolia University)
Jing Wu (Inner Mongolia University)
Qianqian Lu (Inner Mongolia University)
Weihua Wang (Inner Mongolia University)
Xiaofei Yan (Inner Mongolia University)
Xueliang Zhang (Inner Mongolia University)