Yantuan Xian

Kunming University of Science and Technology

Publication

Featured research published by Yantuan Xian.

Archive | 2013

A Chinese Expert Name Disambiguation Approach Based on Spectral Clustering with the Expert Page-Associated Relationships

Wei Tian; Tao Shen; Zhengtao Yu; Jianyi Guo; Yantuan Xian

To address name repetition and the diversity of name representations among Chinese experts, a Chinese expert name disambiguation approach based on spectral clustering with expert page-associated relationships is proposed. First, the TF-IDF algorithm is used to calculate word-based feature weights, and cosine similarity is computed between evidence pages to obtain the initial similarity matrix of expert evidence pages. Second, the expert page-associated relationship features are taken as semi-supervised constraint information to correct the initial similarity matrix, and a spectral clustering-based method is then used to build the expert disambiguation model. Finally, comparison experiments on a manually labeled corpus of Chinese expert evidence pages show that semi-supervised spectral clustering with the expert page-associated relationships increases the F-value by an average of 9.02% over the same method without the associated constraint information.
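
As a rough illustration of the first two steps in this abstract, the sketch below builds a TF-IDF cosine similarity matrix over tokenized pages and then corrects it with pairwise constraints. It is a minimal sketch, not the authors' implementation: the function names, the `linked_pairs` parameter, and the rule of forcing associated pages to similarity 1.0 are all assumptions, and the spectral clustering step is left out.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def similarity_matrix(docs, linked_pairs=()):
    """Initial cosine matrix, then corrected with pairwise constraints.

    linked_pairs is a hypothetical stand-in for the page-associated
    relationships: each (i, j) pair is assumed to describe the same
    expert, so its similarity is forced to 1.0.
    """
    vecs = tfidf_vectors(docs)
    n = len(docs)
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
    for i, j in linked_pairs:
        sim[i][j] = sim[j][i] = 1.0
    return sim
```

The corrected matrix would then be handed to a spectral clustering routine as its affinity matrix.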

chinese control and decision conference | 2014

A novel method for extracting entity data from Deep Web precisely

Haitao Yu; Jianyi Guo; Zhengtao Yu; Yantuan Xian; Xin Yan

To make better use of the information hidden in the Deep Web and to gain fast, accurate access to the embedded entity data, this paper presents a method for precisely extracting entity data from the Deep Web and designs an entity extraction system that extracts the data automatically. First, a web crawler is designed around the characteristics of the Deep Web and used to collect resources from the Internet. Second, the web resources are preprocessed to normalize non-standard pages. Finally, the entity data are located and extracted: based on the hierarchy and layout features of the DOM tree, XPath is combined with regular expressions (RegExp) to locate the entity data, and the extracted entity attributes and attribute values are then stored. Experiments show that the method locates and extracts entity data from the Deep Web quickly and efficiently, and achieves higher accuracy.
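
The final step of this abstract, combining XPath with regular expressions over a normalized DOM tree, might look roughly like the sketch below. It assumes the page has already been normalized into well-formed XHTML; the sample page, the `entity` table class, and the `VALUE_PATTERNS` table are invented for illustration and do not come from the paper.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical normalized page; real Deep Web pages would come from the crawler.
PAGE = """
<html><body>
  <div id="nav">Home</div>
  <table class="entity">
    <tr><td>Name</td><td>Example Phone X</td></tr>
    <tr><td>Price</td><td>Price: 2,999 CNY (tax incl.)</td></tr>
    <tr><td>Weight</td><td>approx. 174 g</td></tr>
  </table>
</body></html>
"""

# Assumed per-attribute patterns that trim noise around the value.
VALUE_PATTERNS = {
    "Price": re.compile(r"([\d,]+(?:\.\d+)?)\s*CNY"),
    "Weight": re.compile(r"(\d+(?:\.\d+)?)\s*g\b"),
}

def extract_entity(xhtml):
    """Locate attribute rows with XPath, then clean values with regexes."""
    root = ET.fromstring(xhtml.strip())
    record = {}
    # XPath narrows the search to rows of the entity table.
    for row in root.findall(".//table[@class='entity']/tr"):
        cells = row.findall("td")
        if len(cells) != 2:
            continue
        name = (cells[0].text or "").strip()
        raw = (cells[1].text or "").strip()
        pattern = VALUE_PATTERNS.get(name)
        match = pattern.search(raw) if pattern else None
        # Fall back to the raw cell text when no pattern applies.
        record[name] = match.group(1) if match else raw
    return record
```

XPath handles the structural part of locating the data, while the regular expressions deal with the free-text noise inside each cell.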

International Journal of Computing | 2016

Using deep belief networks to extract Chinese entity attribute relation in domain-specific

Yantuan Xian; Fa Shao; Jianyi Guo; Lanjiang Zhou; Zhengtao Yu; Wei Chen

The state-of-the-art methods for entity attribute relation extraction are primarily based on statistical machine learning, and their performance depends strongly on the quality of the extracted features. Deep belief networks (DBNs) have been successful in information extraction tasks over high-dimensional feature spaces, and they work without complicated pre-processing. In this paper, a DBN consisting of one or more restricted Boltzmann machine (RBM) layers and a back-propagation (BP) layer is presented to extract Chinese entity attribute relations in a specific domain. First, the word tokens are transformed into vectors by looking up word embeddings. Then, the RBM layers preserve as much information as possible as the feature vectors are passed to the next layer. Finally, the BP layer is trained to classify the features generated by the last RBM layer, using the Levenberg-Marquardt (LM) optimisation algorithm for training. The experimental results show that the proposed method outperforms state-of-the-art learning models in domain-specific entity attribute relation extraction.

international conference natural language processing | 2011

An answer extraction method based on discourse structure and rank learning

Huanyun Zong; Zhengtao Yu; Jianyi Guo; Yantuan Xian; Jian Li

For complex non-factoid questions in Chinese question answering systems, such as ‘why’ and ‘how’ questions, we propose an answer extraction method that uses discourse structure features and a ranking algorithm. The method treats the problem of judging answer relevance as learning to rank answers. First, the method analyses the question to generate a query string. Second, it uses rhetorical structure theory together with natural language processing techniques for lexical, syntactic and semantic analysis to analyse the retrieved documents, so as to determine the inherent relationships between paragraphs or sentences and to generate candidate answer paragraphs or sentences. Third, it constructs the answer ranking model, extracting five groups of features (similarity features, density and frequency features, translation features, discourse structure features and external knowledge features) to train the ranking model. Finally, it re-ranks the answers with the trained model to find the optimal answers. Experiments show that the proposed method can effectively improve the accuracy and quality of non-factoid answers.

Pattern Recognition and Image Analysis | 2017

Extracting hyponymy of domain entity using Cascaded Conditional Random Fields

Xiaojun Ma; Jianyi Guo; Zhengtao Yu; Cunli Mao; Yantuan Xian; Wei Chen

Entity hyponymy is an important semantic relation for building domain ontologies or knowledge graphs. Traditional methods for extracting hyponymy between domain concepts are limited to manual annotation or specific patterns. To address this problem, this paper proposes a new method for extracting hypernym–hyponym relations between domain entities with CCRFs (Cascaded Conditional Random Fields), i.e., a two-layer CRF model is employed to learn the hyponymy of domain entity concepts. The lower level of the CCRF model models the words, taking long-distance dependencies among words into account, and identifies the domain entity concepts, which need to be combined in order. Pairs of entity concepts are then obtained on the basis of definition template characteristics. The higher-level model then labels the semantic pairs of concepts by integrating assemblage characteristics and hyponymy demonstratives in the feature template, and finally identifies the hypernym–hyponym relations between domain entities. Experiments on real-world data sets demonstrate the performance of the proposed algorithms.

web age information management | 2016

Using Convex Combination Kernel Function to Extract Entity Relation in Specific Field

Qi Shang; Jianyi Guo; Yantuan Xian; Zhengtao Yu; Yonghua Wen

Kernel methods have been proven effective in measuring the similarity of two complex relation patterns. To address the optimization problem of compound kernel functions, this paper presents a method for finding the optimal convex combination kernel function, which is composed of multiple kernel functions and needs to be optimized. After the corpus is preprocessed and features are selected, including lexical information, phrase syntax information and dependency information, the feature matrix is constructed from these features. The optimal kernel function is found in the process of mapping the feature matrix to different high-dimensional matrices, and the different classification models are obtained. The experiments are conducted on a domain dataset from the Web, and the results show that the approach outperforms state-of-the-art learning models such as ME or the convolution tree kernel.
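
To make the term concrete: a convex combination kernel is a weighted sum of base kernels whose weights are non-negative and sum to 1. The sketch below only shows that construction. The two base kernels (linear and RBF) and the function names are assumptions chosen for illustration, and the search for the optimal weights described in the abstract is not reproduced.

```python
import math

def linear_kernel(x, y):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an assumed default."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def convex_combination(kernels, weights):
    """Return k(x, y) = sum_i w_i * k_i(x, y) with w_i >= 0 and sum w_i = 1."""
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")

    def combined(x, y):
        return sum(w * k(x, y) for w, k in zip(weights, kernels))

    return combined
```

Because a non-negative weighted sum of valid kernels is itself a valid kernel, the combined function can be passed to any kernel-based classifier, and the weights become the quantities to tune.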

chinese control and decision conference | 2016

An expert ranking method based on listnet with feature hierarchy

Shengxiang Gao; Zhengtao Yu; Sichao Wei; Yuan Yin; Yantuan Xian

Because features at different hierarchy levels contribute to expert ranking to different degrees, this paper proposes an expert ranking method based on ListNet combined with feature hierarchy type information. First, the method thoroughly analyses the characteristics of expert ranking and defines four feature types: correlative features between query and document, page content features, language model features and expert-associated features. Then, since different feature types contribute to expert ranking to different degrees, a specific feature hierarchy value is assigned to each feature type according to the size of its contribution. Finally, the feature hierarchy values are combined with the features defined in the first step to supervise expert ranking in the ListNet algorithm. The experiments show that the introduction of feature hierarchy improves expert ranking quality.
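
For readers unfamiliar with ListNet, the sketch below shows its commonly used top-one formulation: scores for one query's candidate list are turned into a probability distribution with a softmax, and the loss is the cross entropy between the distribution from the reference scores and the one from the predicted scores. The function names are assumptions, and how the feature hierarchy values enter the scoring function is not specified in the abstract, so that part is omitted.

```python
import math

def top_one_probabilities(scores):
    """Softmax over a list of scores: P(item is ranked first)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def listnet_loss(true_scores, predicted_scores):
    """Cross entropy between the two top-one distributions of one list."""
    p = top_one_probabilities(true_scores)
    q = top_one_probabilities(predicted_scores)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))
```

The loss is lowest when the predicted scores induce the same top-one distribution as the reference scores, which is what makes the approach listwise rather than pairwise.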

China Workshop on Machine Translation | 2016

Building the Vietnamese Phrase Treebank by Improved Probabilistic Context-Free Grammars

Ying Li; Jianyi Guo; Zhengtao Yu; Yantuan Xian; Yonghua Wen

A phrase treebank is an important resource for natural language processing research and practical applications, but Vietnamese lacks this kind of treebank resource. This paper presents a method for constructing a Vietnamese phrase treebank by fusing Vietnamese grammatical features with an improved PCFG. The method can automatically analyze Vietnamese phrase structure trees and solves the problem of constructing the Vietnamese phrase treebank. First, a Vietnamese grammatical feature set is established by analyzing Vietnamese grammatical features. Then, the grammar rule set of the PCFG model is obtained from manually annotated Vietnamese phrase trees. Finally, the Vietnamese grammatical feature set is fused into the improved PCFG model as a supplement, and the method completes the construction of the Vietnamese phrase treebank. The experimental results show that the accuracy of the proposed PCFG model for Vietnamese phrase treebank construction reaches 89.12%, a clear improvement over the conventional PCFG model and the maximum entropy method.
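
The second step of this abstract, obtaining the PCFG rule set from manually annotated trees, can be sketched with plain maximum-likelihood estimation: each rule's probability is its count divided by the count of all rules with the same left-hand side. This covers only a conventional PCFG; the paper's improvements and the fused grammatical features are not reproduced, and the nested-tuple tree format, the function names, and the two-tree toy treebank are assumptions made for illustration.

```python
import math
from collections import Counter

# Toy trees as nested tuples: (label, child, child, ...); leaves are words.
TOY_TREEBANK = [
    ("S", ("NP", "tôi"), ("VP", ("V", "đọc"), ("NP", "sách"))),
    ("S", ("NP", "tôi"), ("VP", ("V", "ngủ"))),
]

def rules(tree):
    """Yield (lhs, rhs) for every internal node of a nested-tuple tree."""
    label, *children = tree
    # A child is either a word (str) or a subtree whose label is child[0].
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield label, rhs
    for child in children:
        if not isinstance(child, str):
            yield from rules(child)

def estimate_pcfg(treebank):
    """Maximum-likelihood rule probabilities: count(rule) / count(lhs)."""
    counts = Counter(r for tree in treebank for r in rules(tree))
    lhs_totals = Counter()
    for (lhs, _), c in counts.items():
        lhs_totals[lhs] += c
    return {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}

def tree_log_prob(tree, pcfg):
    """Log probability of a tree as the sum of its rule log probabilities.

    Raises KeyError for a rule that never occurred in the treebank.
    """
    return sum(math.log(pcfg[rule]) for rule in rules(tree))
```

A parser built on such a grammar would choose, among the candidate trees for a sentence, the one with the highest `tree_log_prob`.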

CCL | 2016

Using Collaborative Training Method to Build Vietnamese Dependency Treebank

Guoke Qiu; Jianyi Guo; Zhengtao Yu; Yantuan Xian; Cunli Mao

To address the difficulty of annotating Vietnamese dependency trees, this paper proposes a method that combines the MST algorithm with an improved Nivre algorithm to build a Vietnamese dependency treebank. The method takes full advantage of the characteristics of collaborative training. First, we built a small set of samples. Second, we used the samples to train two weak learners on two fully redundant views. Then, the two learners labeled a large number of unlabeled samples for each other. Next, we selected the high-confidence samples for retraining and built a dependency parsing system. Finally, we ran tenfold cross-validation on 5000 manually annotated Vietnamese sentences and obtained an accuracy of 76.33%. The experimental results show that the proposed method can take full advantage of the unlabeled corpus to effectively improve the quality of the dependency treebank.
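
The collaborative training loop in this abstract follows the general co-training pattern, which the sketch below illustrates on toy data. It is only a stand-in: the two learners here are nearest-centroid classifiers over two numeric views, not the MST and Nivre parsers used in the paper, and every name and threshold is an assumption.

```python
def train_centroids(samples, view):
    """Nearest-centroid learner on one view; samples are ((v0, v1), label)."""
    sums, counts = {}, {}
    for views, label in samples:
        sums[label] = sums.get(label, 0.0) + views[view]
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, value):
    """Return (label, confidence); confidence is the margin between the
    distances to the two nearest centroids."""
    ranked = sorted(centroids, key=lambda lab: abs(value - centroids[lab]))
    best = ranked[0]
    if len(ranked) < 2:
        return best, 0.0
    margin = abs(value - centroids[ranked[1]]) - abs(value - centroids[best])
    return best, margin

def co_train(labeled, unlabeled, rounds=5, min_confidence=0.5):
    """Each round, each view's learner labels its most confident pool
    sample, and that sample joins the shared labeled set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        added = False
        for view in (0, 1):
            if not pool:
                break
            model = train_centroids(labeled, view)
            scored = [(predict(model, sample[view]), sample) for sample in pool]
            (label, confidence), sample = max(scored, key=lambda item: item[0][1])
            if confidence >= min_confidence:
                labeled.append((sample, label))
                pool.remove(sample)
                added = True
        if not added:
            break
    return labeled, pool
```

Only high-confidence predictions are promoted to training data, which mirrors the abstract's step of selecting high-confidence samples for retraining.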

Automatic Control and Computer Sciences | 2016

A hybrid method for entity hyponymy acquisition in Chinese complex sentences

Yunru Cheng; Jianyi Guo; Yantuan Xian; Zhengtao Yu; Wei Chen; Qiyue Yang

Extracting entity hyponymy in Chinese complex sentences can be a highly difficult process. This paper proposes a novel hybrid approach that combines parsing with supervised learning and semi-supervised learning. First, a conditional random fields (CRF) model is employed to obtain the candidate domain named entities. Pattern matching is then used to acquire candidate hyponymy. Next, predicate and symbol features, syntactic analysis, and semantic roles are introduced into the CRF feature template to identify the hyponymy entity pairs. Finally, analysis of both the parallel relationship of entities among sentences and entity pairs in simple sentences is conducted to obtain the hyponymy entity pairs in Chinese complex sentences. The experimental results show that the proposed method reduces the manual work required for CRF markers and has an improved overall performance in comparison with the baseline methods.
