Yantuan Xian
Kunming University of Science and Technology
Publication
Featured research published by Yantuan Xian.
Archive | 2013
Wei Tian; Tao Shen; Zhengtao Yu; Jianyi Guo; Yantuan Xian
To address name repetition and varied name representations among Chinese experts, this paper proposes a Chinese expert name disambiguation approach based on spectral clustering with expert page-associated relationships. First, the TF-IDF algorithm is used to compute word-based feature weights, and cosine similarity between evidence pages yields the initial similarity matrix of expert evidence pages. Second, the page-associated relationship features serve as semi-supervised constraints to correct the initial similarity matrix, and a spectral clustering method is then used to build the expert disambiguation model. Finally, comparative experiments on a manually labeled corpus of Chinese expert evidence pages show that semi-supervised spectral clustering with the page-associated relationships raises the F-value by an average of 9.02% over the same method without the constraint information.
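The first two steps of this abstract (TF-IDF weights, cosine similarity, then constraint-based correction of the similarity matrix) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the smoothed IDF formula, the function names, and the rule of forcing linked pairs to 1.0 and separated pairs to 0.0 are all assumptions, and the toy pages are invented.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumption) so terms found in every page keep weight.
        vectors.append({
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t, c in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_matrix(docs):
    """Initial page-to-page similarity matrix."""
    vecs = tfidf_vectors(docs)
    return [[cosine(a, b) for b in vecs] for a in vecs]


def apply_constraints(sim, must_link=(), cannot_link=()):
    """Correct a copy of the matrix with pairwise constraints (assumed rule)."""
    corrected = [row[:] for row in sim]
    for i, j in must_link:
        corrected[i][j] = corrected[j][i] = 1.0
    for i, j in cannot_link:
        corrected[i][j] = corrected[j][i] = 0.0
    return corrected


# Invented evidence pages for three mentions of the same name.
pages = [
    ["wang", "professor", "kunming", "nlp"],
    ["wang", "nlp", "parsing", "kunming"],
    ["wang", "surgeon", "hospital", "beijing"],
]
sim = similarity_matrix(pages)
corrected = apply_constraints(sim, must_link=[(0, 1)], cannot_link=[(0, 2)])
```

The corrected matrix would then be passed to a spectral clustering routine as a precomputed affinity matrix; that step is omitted here.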
chinese control and decision conference | 2014
Haitao Yu; Jianyi Guo; Zhengtao Yu; Yantuan Xian; Xin Yan
To make better use of the information hidden in the Deep Web and to gain fast, accurate access to its embedded entity data, this paper presents a method for precisely extracting entity data from the Deep Web and designs an entity extraction system that does so automatically. First, a web crawler designed around the characteristics of the Deep Web collects resources from the Internet. Second, the collected web resources are preprocessed, and non-standard pages are normalized. Finally, the entity data are located and extracted: based on the hierarchy and layout features of the DOM tree, XPath is combined with regular expressions (RegExp) to locate entity data, and the extracted entity attributes and attribute values are then stored. Experiments show that the method locates and extracts entity data from the Deep Web quickly and efficiently, and achieves higher accuracy.
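The combination of XPath and regular expressions described above can be illustrated with the Python standard library. This is a sketch under stated assumptions: the sample page, the `entity` class name, and the `Label: value` layout are invented, and `xml.etree.ElementTree` supports only a subset of XPath and requires well-formed input, which is why a normalization step like the one in the abstract would have to come first.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, already-normalized page; real pages would need cleaning first.
PAGE = """<html><body>
  <div class="entity">
    <h1>Example Scenic Area</h1>
    <ul>
      <li>Location: Kunming, Yunnan</li>
      <li>Ticket price: 30 yuan</li>
    </ul>
  </div>
  <div class="footer"><p>Contact: n/a</p></div>
</body></html>"""

# Splits "attribute: value" text into its two parts.
ATTRIBUTE_PATTERN = re.compile(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$")


def extract_entity(xhtml):
    """Locate the entity block with XPath, then split attributes with a regex."""
    root = ET.fromstring(xhtml)
    block = root.find(".//div[@class='entity']")
    if block is None:
        return None
    name = block.findtext("h1", default="").strip()
    attributes = {}
    for item in block.findall("./ul/li"):
        match = ATTRIBUTE_PATTERN.match(item.text or "")
        if match:
            attributes[match.group(1)] = match.group(2)
    return {"name": name, "attributes": attributes}


result = extract_entity(PAGE)
```

The XPath expression narrows the search to one region of the DOM tree, so the regex only has to handle the local text format and never sees the footer.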
International Journal of Computing | 2016
Yantuan Xian; Fa Shao; Jianyi Guo; Lanjiang Zhou; Zhengtao Yu; Wei Chen
State-of-the-art methods for entity attribute relation extraction are primarily based on statistical machine learning, and their performance depends strongly on the quality of the extracted features. Deep belief networks (DBN) have been successful in information extraction tasks over high-dimensional feature spaces and do not require complicated pre-processing. In this paper, a DBN consisting of one or more restricted Boltzmann machine (RBM) layers and a back-propagation (BP) layer is presented for extracting domain-specific Chinese entity attribute relations. First, word tokens are transformed into vectors by looking up word embeddings. Then, the RBM layers preserve as much information as possible as feature vectors are passed to the next layer. Finally, the BP layer is trained with the Levenberg-Marquardt (LM) optimisation algorithm to classify the features generated by the last RBM layer. The experimental results show that the proposed method outperforms state-of-the-art learning models in domain-specific entity attribute relation extraction.
international conference natural language processing | 2011
Huanyun Zong; Zhengtao Yu; Jianyi Guo; Yantuan Xian; Jian Li
For complex non-factoid questions in Chinese question answering systems, such as ‘why’ and ‘how’ questions, we propose an answer extraction method that uses discourse structure features and a ranking algorithm. The method treats judging answer relevance as a learning-to-rank problem. First, it analyses the question to generate a query string. Second, it uses rhetorical structure theory together with lexical, syntactic, and semantic analysis to analyze the retrieved documents, determine the inherent relationships between paragraphs or sentences, and generate candidate answer paragraphs or sentences. Third, it constructs the answer ranking model, extracting five groups of features (similarity features, density and frequency features, translation features, discourse structure features, and external knowledge features) to train it. Finally, it re-ranks the answers with the trained model to find the optimal answers. Experiments show that the proposed method effectively improves the accuracy and quality of non-factoid answers.
Pattern Recognition and Image Analysis | 2017
Xiaojun Ma; Jianyi Guo; Zhengtao Yu; Cunli Mao; Yantuan Xian; Wei Chen
Entity hyponymy is an important semantic relation for building domain ontologies or knowledge graphs. Traditional methods for extracting hyponymy between domain concepts are limited to manual annotation or specific patterns. To address this problem, this paper proposes a new method for extracting hypernym–hyponym relations between domain entities with cascaded conditional random fields (CCRFs), i.e., a two-layer CRF model is employed to learn the hyponymy of domain entity concepts. The lower layer of the CCRFs model models words by considering long-distance dependencies among them and identifies the domain entity concepts, which need to be combined in order. Pairs of entity concepts are then obtained on the basis of definition template characteristics. The higher layer labels the semantic pairs of concepts by integrating assemblage characteristics and hyponymy demonstratives into the feature template, and finally identifies the hypernym–hyponym relations between domain entities. Experiments on real-world data sets demonstrate the performance of the proposed algorithms.
web age information management | 2016
Qi Shang; Jianyi Guo; Yantuan Xian; Zhengtao Yu; Yonghua Wen
Kernel methods have proven effective in measuring the similarity of two complex relation patterns. To address the optimization of compound kernel functions, this paper presents a method for finding the optimal convex combination kernel function, which is composed of multiple kernel functions whose combination needs to be optimized. After the corpus is preprocessed, features including lexical information, phrase syntax information, and dependency information are selected and used to construct the feature matrix. The optimal kernel function is found in the process of mapping the feature matrix to different high-dimensional matrices, which yields different classification models. Experiments on a domain dataset collected from the Web show that the approach outperforms state-of-the-art learning models such as ME and the convolution tree kernel.
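A convex combination of kernels is a weighted sum whose weights are non-negative and sum to one. The sketch below shows that construction and a simple grid search over the weights. It makes several assumptions not found in the abstract: linear and RBF kernels stand in for the paper's unspecified base kernels, kernel-target alignment stands in for its selection criterion, and the data points are invented.

```python
import math


def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def gram(kernel, xs):
    """Gram matrix of a kernel over a list of points."""
    return [[kernel(a, b) for b in xs] for a in xs]


def convex_combination(grams, weights):
    """Weighted sum of Gram matrices; weights must be >= 0 and sum to 1."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(grams[0])
    return [[sum(w * g[i][j] for w, g in zip(weights, grams))
             for j in range(n)] for i in range(n)]


def alignment(k, labels):
    """Kernel-target alignment for +1/-1 labels (assumed scoring rule)."""
    n = len(labels)
    num = sum(k[i][j] * labels[i] * labels[j]
              for i in range(n) for j in range(n))
    k_norm = math.sqrt(sum(v * v for row in k for v in row))
    return num / (k_norm * n)  # the Frobenius norm of y y^T is n


def best_weights(grams, labels, steps=10):
    """Grid search over the weight pair (w, 1 - w) for two kernels."""
    best = None
    for s in range(steps + 1):
        w = s / steps
        weights = (w, 1.0 - w)
        score = alignment(convex_combination(grams, weights), labels)
        if best is None or score > best[1]:
            best = (weights, score)
    return best


# Invented two-class toy data.
xs = [(0.0, 0.1), (0.2, 0.0), (3.0, 3.1), (3.2, 3.0)]
labels = [1, 1, -1, -1]
grams = [gram(linear_kernel, xs), gram(rbf_kernel, xs)]
weights, best_score = best_weights(grams, labels)
combined = convex_combination(grams, weights)
```

Because each base Gram matrix is positive semidefinite and the weights are non-negative, the combined matrix is also a valid kernel matrix, which is what makes this search space safe to explore.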
chinese control and decision conference | 2016
Shengxiang Gao; Zhengtao Yu; Sichao Wei; Yuan Yin; Yantuan Xian
Because features at different hierarchy levels contribute to expert ranking to different degrees, this work proposes an expert ranking method based on ListNet combined with feature hierarchy type information. First, the method analyses the characteristics of expert ranking and defines four feature types: correlative features between query and document, page content features, language model features, and expert-associated features. Then, since each feature type contributes to expert ranking to a different degree, a specific feature hierarchy value is assigned to each feature type according to the size of its contribution. Finally, the feature hierarchy type values are combined with the features defined in the first step to supervise expert ranking in the ListNet algorithm. The experiments show that introducing the feature hierarchy improves expert ranking quality.
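ListNet trains a ranker by comparing two top-one probability distributions over the same list, one from the ground-truth scores and one from the model scores, using cross entropy. The sketch below shows that loss, plus one hypothetical way of folding hierarchy values into a linear scorer. The abstract does not say how the values are combined with the features, so the multiplicative scaling and the numbers in `HIERARCHY` are assumptions for illustration only.

```python
import math

# Hypothetical hierarchy values for the four feature types named in the abstract.
HIERARCHY = {
    "query_document": 4.0,
    "page_content": 3.0,
    "language_model": 2.0,
    "expert_association": 1.0,
}


def softmax(scores):
    """Top-one probabilities of a score list (shifted for numerical stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_loss(true_scores, predicted_scores):
    """Cross entropy between the two top-one probability distributions."""
    p_true = softmax(true_scores)
    p_pred = softmax(predicted_scores)
    return -sum(t * math.log(p) for t, p in zip(p_true, p_pred))


def expert_score(features, weights, hierarchy=HIERARCHY):
    """Linear score with each feature scaled by its type's value (assumed)."""
    return sum(weights[name] * hierarchy[name] * value
               for name, value in features.items())


# A model that preserves the true order incurs a lower loss than one that reverses it.
good = listnet_loss([3.0, 2.0, 1.0], [3.0, 2.0, 1.0])
bad = listnet_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

In training, the gradient of `listnet_loss` with respect to the scorer weights would drive the update; only the forward computation is shown here.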
China Workshop on Machine Translation | 2016
Ying Li; Jianyi Guo; Zhengtao Yu; Yantuan Xian; Yonghua Wen
A phrase treebank is an important resource for natural language processing research and practical applications, but such resources are lacking for Vietnamese. This paper presents a method for constructing a Vietnamese phrase treebank by fusing Vietnamese grammatical features with an improved PCFG. The method automatically analyzes Vietnamese phrase structure trees and thereby solves the problem of constructing the treebank. First, a Vietnamese grammatical feature set is established by analyzing Vietnamese grammatical features. Then, the grammar rule set of the PCFG model is obtained from manually annotated Vietnamese phrase trees. Finally, the Vietnamese grammatical feature set is fused into the improved PCFG model as a supplement, completing the construction of the Vietnamese phrase treebank. The experimental results show that the proposed PCFG model reaches an accuracy of 89.12% for Vietnamese phrase treebank construction, a clear improvement over the conventional PCFG model and the maximum entropy method.
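Obtaining a PCFG rule set from annotated trees usually means counting productions and normalizing per left-hand side. The sketch below shows that standard maximum-likelihood estimate; it covers only the conventional PCFG step, not the paper's improved model or its grammatical feature fusion. The nested-tuple tree format, the tag names, and the two tiny trees are assumptions made for illustration.

```python
from collections import Counter

# Two hand-made trees: "tôi đọc sách" and "tôi ngủ" (illustrative tags).
TREES = [
    ("S", ("NP", ("N", "tôi")), ("VP", ("V", "đọc"), ("NP", ("N", "sách")))),
    ("S", ("NP", ("N", "tôi")), ("VP", ("V", "ngủ"))),
]


def productions(tree):
    """Yield (lhs, rhs) for every internal node of a nested-tuple tree."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        yield label, (children[0],)  # lexical rule such as N -> tôi
        return
    yield label, tuple(child[0] for child in children)
    for child in children:
        yield from productions(child)


def estimate_pcfg(trees):
    """P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter(rule for tree in trees for rule in productions(tree))
    lhs_counts = Counter()
    for (lhs, _), count in rule_counts.items():
        lhs_counts[lhs] += count
    return {rule: count / lhs_counts[rule[0]]
            for rule, count in rule_counts.items()}


grammar = estimate_pcfg(TREES)
```

In this toy treebank, `VP` expands once as `V NP` and once as `V`, so each of those rules receives probability 0.5, while `S -> NP VP` receives 1.0.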
CCL | 2016
Guoke Qiu; Jianyi Guo; Zhengtao Yu; Yantuan Xian; Cunli Mao
To address the difficulty of annotating Vietnamese dependency trees, this paper proposes a method that combines the MST algorithm with an improved Nivre algorithm to build a Vietnamese dependency treebank. The method takes full advantage of co-training. First, we built a small set of labeled samples. Second, we used these samples to train two weak learners on two fully redundant views. Then, the two learners labeled a large number of unlabeled samples for each other. Next, we selected the high-confidence samples for retraining and built a dependency parsing system. Finally, we ran tenfold cross-validation on 5000 manually annotated Vietnamese sentences and obtained an accuracy of 76.33%. The experimental results show that the proposed method can take full advantage of unlabeled corpora to effectively improve the quality of the dependency treebank.
Automatic Control and Computer Sciences | 2016
Yunru Cheng; Jianyi Guo; Yantuan Xian; Zhengtao Yu; Wei Chen; Qiyue Yang
Extracting entity hyponymy in Chinese complex sentences can be a highly difficult process. This paper proposes a novel hybrid approach that combines parsing with supervised learning and semi-supervised learning. First, a conditional random fields (CRF) model is employed to obtain candidate domain named entities. Pattern matching is then used to acquire candidate hyponymy. Next, predicate and symbol features, syntactic analysis, and semantic roles are introduced into the CRF feature template to identify the hyponymy entity pairs. Finally, analysis of both the parallel relationship of entities among sentences and entity pairs in simple sentences is conducted to obtain the hyponymy entity pairs in Chinese complex sentences. The experimental results show that the proposed method reduces the manual work required for CRF markers and has an improved overall performance in comparison with the baseline methods.