Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Jinan Xu is active.

Publication


Featured research published by Jinan Xu.


NLPCC | 2013

A Method to Construct Chinese-Japanese Named Entity Translation Equivalents Using Monolingual Corpora

Kuang Ru; Jinan Xu; Yujie Zhang; Peihao Wu

Traditional methods for extracting Named Entity (NE) translation equivalents are typically based on large-scale parallel or comparable corpora, but the practicability of such research is constrained by the relative scarcity of bilingual corpus resources. Combining the characteristics of Chinese and Japanese, we propose a method that automatically extracts Chinese-Japanese NE translation equivalents from monolingual corpora using inductive learning. The method uses a Chinese Hanzi and Japanese Kanji comparison table to calculate the similarity between Japanese and Chinese NE instances. We then apply inductive learning to obtain partial NE translation rules by extracting the differences between high-similarity Chinese and Japanese NE instances. Finally, a feedback process refreshes the Chinese-Japanese NE similarities and the translation rule sets. Experimental results show that the proposed method is simple and efficient, overcoming the traditional methods' dependency on bilingual resources.
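The character-level similarity step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tiny Hanzi-to-Kanji mapping table and the Dice-style similarity formula are assumptions for demonstration (real comparison tables cover thousands of character pairs).

```python
# Toy Chinese-Hanzi -> Japanese-Kanji variant table (illustrative; real tables
# cover thousands of pairs, and shared characters map to themselves).
HANZI_TO_KANJI = {"东": "東", "京": "京", "大": "大", "学": "学", "北": "北"}

def ne_similarity(zh: str, ja: str) -> float:
    """Dice-style similarity: map each character of the Chinese NE to its
    Kanji variant, then compare position-wise with the Japanese NE."""
    if not zh or not ja:
        return 0.0
    mapped = [HANZI_TO_KANJI.get(c, c) for c in zh]
    matches = sum(1 for a, b in zip(mapped, ja) if a == b)
    return 2 * matches / (len(zh) + len(ja))

print(ne_similarity("东京大学", "東京大学"))  # 1.0: every character maps
```

High-similarity pairs found this way would then feed the inductive learning step, which extracts translation rules from the differing substrings.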


Meeting of the Association for Computational Linguistics | 2015

A Hybrid Transliteration Model for Chinese/English Named Entities —BJTU-NLP Report for the 5th Named Entities Workshop

Dandan Wang; Xiaohui Yang; Jinan Xu; Yufeng Chen; Nan Wang; Bojia Liu; Jian Yang; Yujie Zhang

This paper presents our system (BJTU-NLP system) for the NEWS2015 evaluation task of Chinese-to-English and English-to-Chinese named entity transliteration. Our system adopts a hybrid machine transliteration approach that combines several features. To further improve the results, we expand the training set with external data extracted from Wikipedia, and apply pre-processing and post-processing rules to further improve performance. The final performance on the test corpus shows that our system achieves results comparable with other state-of-the-art systems.


International Conference on Natural Language Processing | 2018

Improved Character-Based Chinese Dependency Parsing by Using Stack-Tree LSTM

Hang Liu; Mingtong Liu; Yujie Zhang; Jinan Xu; Yufeng Chen

Almost all state-of-the-art methods for character-based Chinese dependency parsing ignore the complete dependency subtree information built during parsing, which is crucial for parsing the rest of the sentence. In this paper, we introduce a novel neural network architecture to capture dependency subtree features. We extend and improve recent work on neural joint models for Chinese word segmentation, POS tagging and dependency parsing, and adopt bidirectional LSTMs to learn n-gram feature representations and context information. The neural network and bidirectional LSTMs are trained jointly with the parser objective, resulting in very effective feature extractors for parsing. Finally, we conduct experiments on Penn Chinese Treebank 5 and demonstrate the effectiveness of the approach by applying it to a greedy transition-based parser. The results show that our model outperforms state-of-the-art neural joint models in Chinese word segmentation, POS tagging and dependency parsing.


Archive | 2018

Attention-Based Convolutional Neural Networks for Chinese Relation Extraction

Wenya Wu; Yufeng Chen; Jinan Xu; Yujie Zhang

Relation extraction, an important part of many information extraction systems, mines structured facts from text. Recently, deep learning has achieved good results in relation extraction, and attention mechanisms applied to these networks have further improved performance. However, current attention mechanisms are mainly applied to basic lexical-level features rather than to higher-level overall features. To obtain more high-level feature information for relation prediction, we propose attention-based piecewise convolutional neural networks (PCNN_ATT), which add an attention layer after the piecewise max pooling layer to capture significant information from sentence-level global features. Furthermore, we put forward a data extension method that utilizes an external dictionary, HIT IR-Lab Tongyici Cilin (Extended). Experimental results on the ACE-2005 and COAE-2016 Chinese datasets both demonstrate that our approach outperforms most existing methods.
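The pooling-plus-attention step described above can be sketched in NumPy. This is an illustrative sketch, not the paper's implementation: the shapes and the simple dot-product attention parameterisation are assumptions. Piecewise max pooling splits the convolutional feature map at the two entity positions, and the attention layer then weights the three pooled segment vectors.

```python
import numpy as np

def pcnn_att(conv_out: np.ndarray, e1: int, e2: int, att_w: np.ndarray) -> np.ndarray:
    """conv_out: (seq_len, n_filters) feature map from the convolution layer.
    e1 < e2 are the token positions of the two entities; att_w is a learned
    (n_filters,) attention scoring vector (an assumption for this sketch)."""
    # Piecewise max pooling: split at the entity positions -> (3, n_filters)
    segments = [conv_out[:e1 + 1], conv_out[e1 + 1:e2 + 1], conv_out[e2 + 1:]]
    pooled = np.stack([seg.max(axis=0) for seg in segments])
    # Attention over the three segment vectors (softmax of the scores)
    scores = pooled @ att_w
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()
    return alpha @ pooled  # (n_filters,) sentence-level feature for the classifier

feat = pcnn_att(np.arange(12, dtype=float).reshape(6, 2), e1=1, e2=3,
                att_w=np.zeros(2))  # zero weights -> uniform attention
```

With zero attention weights the result is just the mean of the three pooled vectors; a trained `att_w` would instead emphasise the segment most informative for the relation.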


Archive | 2018

Addressing Domain Adaptation for Chinese Word Segmentation with Instances-Based Transfer Learning

Yanna Zhang; Jinan Xu; Guoyi Miao; Yufeng Chen; Yujie Zhang

Recent studies have shown the effectiveness of neural networks for Chinese Word Segmentation (CWS). However, these models, constrained by the domain and size of the training corpus, do not work well in domain adaptation. In this paper, we propose a novel instance-transferring method that uses valuable annotated instances from the target domain to improve CWS across domains. Specifically, we introduce semantic similarity computation based on character n-gram embeddings to select instances, and use training sentences similar to those instances to help annotate them. Experimental results show that our method effectively boosts cross-domain segmentation performance: we achieve state-of-the-art results on Internet literature datasets and results competitive with the best reported on micro-blog datasets.
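The instance-selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the toy bigram embedding table, the averaged sentence vector, and ranking by maximum cosine similarity are all stand-ins for the paper's actual similarity computation.

```python
import numpy as np

def char_ngrams(sent: str, n: int = 2):
    """Character n-grams of a sentence (bigrams by default)."""
    return [sent[i:i + n] for i in range(len(sent) - n + 1)]

def sent_vec(sent: str, emb: dict, dim: int = 2) -> np.ndarray:
    """Sentence vector as the mean of its known character n-gram embeddings."""
    vecs = [emb[g] for g in char_ngrams(sent) if g in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def select_instances(target_sents, source_sents, emb, k=1):
    """Rank target-domain sentences by max cosine similarity to any source sentence."""
    def cos(a, b):
        na, nb = np.linalg.norm(a), np.linalg.norm(b)
        return float(a @ b / (na * nb)) if na and nb else 0.0
    svecs = [sent_vec(s, emb) for s in source_sents]
    scored = [(max(cos(sent_vec(t, emb), sv) for sv in svecs), t) for t in target_sents]
    return [t for _, t in sorted(scored, reverse=True)[:k]]

# Toy embedding table (hypothetical values for illustration only)
emb = {"ab": np.array([1.0, 0.0]), "cd": np.array([0.0, 1.0])}
top = select_instances(["aby", "cdz"], ["abx"], emb, k=1)
```

The selected target-domain instances would then be annotated with the help of their most similar training sentences, as the abstract describes.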


National CCF Conference on Natural Language Processing and Chinese Computing | 2017

A Semantic Concept Based Unknown Words Processing Method in Neural Machine Translation

Shaotong Li; Jinan Xu; Guoyi Miao; Yujie Zhang; Yufeng Chen

Unknown words in neural machine translation (NMT) not only affect the semantic integrity of the source sentences but also adversely affect the generation of the target sentences. Traditional methods usually replace unknown words according to word vector similarity, but such approaches have difficulty with rare words and polysemous words. This paper therefore proposes a new method for processing unknown words in NMT based on the semantic concepts of the source language. First, we use the semantic concepts of a source-language semantic dictionary to find candidate in-vocabulary words. Second, we propose a method that calculates semantic similarity by integrating the source language model and the semantic concept network, to obtain the best replacement word. Experiments on an English-to-Chinese translation task demonstrate that our proposed method achieves an improvement of more than 2.6 BLEU points over the conventional NMT method. Compared with the traditional method based on word vector similarity, our method also obtains an improvement of nearly 0.8 BLEU points.


China Workshop on Machine Translation | 2017

An Unknown Word Processing Method in NMT by Integrating Syntactic Structure and Semantic Concept

Guoyi Miao; Jinan Xu; Yancui Li; Shaotong Li; Yufeng Chen

Unknown words in neural machine translation (NMT) may undermine the integrity of sentence structure, increase ambiguity and have an adverse effect on the translation. To solve this problem, we propose a method for processing unknown words in NMT that integrates syntactic structure and semantic concepts. First, a semantic concept network is used to construct the set of in-vocabulary synonyms corresponding to each unknown word. Second, we propose a semantic similarity calculation method based on syntactic structure and semantic concepts: the best substitute is selected from the set of in-vocabulary synonyms by calculating the semantic similarity between the unknown word and its candidate substitutes. English-Chinese translation experiments demonstrate that this method maintains the semantic integrity of the source sentences. In performance, our proposed method obtains an improvement of 2.9 BLEU points over the conventional NMT method, and an improvement of 0.95 BLEU points over the traditional method of positioning the UNK token based on word alignment information.


NLPCC/ICCPOL | 2016

Iterative Integration of Unsupervised Features for Chinese Dependency Parsing

Te Luo; Yujie Zhang; Jinan Xu; Yufeng Chen

Chinese dependency parsing lacks a large manually annotated dependency treebank. Unsupervised methods that use large-scale unannotated data have been proposed, but they inevitably introduce too much noise from automatic annotation. To solve this problem, this paper proposes an approach that iteratively integrates unsupervised features when training a Chinese dependency parsing model. Considering that more errors occur when parsing longer sentences, we divide the raw data by sentence length and train the model iteratively: the model trained on shorter sentences is used in the next iteration to analyze longer sentences. We adopt a character-based dependency model for joint word segmentation, POS tagging and dependency parsing in Chinese. The advantage of the joint model is that each task can be promoted by the others during processing by exploiting their available internal results, and higher accuracy of the three tasks on shorter sentences leads to higher accuracy for the whole model. We verified the proposed approach on the Penn Chinese Treebank and two raw corpora. The experimental results show that the F1-scores of the three tasks improved at each iteration, and the F1-score of dependency parsing increased by 0.33% compared with the conventional method.
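The length-bucketed iteration described above can be sketched schematically. This is a sketch under stated assumptions: `train` and `annotate` are placeholders for the joint segmentation/POS/parsing model, and the bucket boundaries are invented for illustration.

```python
def iterative_train(gold_data, raw_sentences, train, annotate,
                    buckets=((0, 10), (10, 20), (20, 40))):
    """Self-training loop: shorter raw sentences are auto-annotated first,
    and each retrained model annotates the next (longer) length bucket.
    `train(gold, auto)` and `annotate(model, sent)` are caller-supplied
    placeholders standing in for the joint parsing model."""
    model = train(gold_data, [])  # initial model from the treebank only
    auto = []                     # accumulated automatically annotated data
    for lo, hi in buckets:        # shortest sentences first
        batch = [s for s in raw_sentences if lo < len(s) <= hi]
        auto += [annotate(model, s) for s in batch]
        model = train(gold_data, auto)  # retrain with gold + auto data
    return model
```

The loop captures the abstract's key idea: accuracy gained on short sentences propagates to longer ones, because each bucket is annotated by a model already improved on the easier buckets.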


NLPCC/ICCPOL | 2016

Chinese Paraphrases Acquisition Based on Random Walk N Step

Jun Ma; Yujie Zhang; Jinan Xu; Yufeng Chen

The conventional “pivot-based” approach to acquiring paraphrases from bilingual corpora has limitations, as only paraphrases within two steps are considered. We propose a graph-based model that acquires paraphrases from a phrase translation table. This paper describes how the graph is constructed from the phrase translation table, a random walk algorithm bounded by N steps, and a confidence metric for ranking the obtained results. Furthermore, we augment the model to integrate more language pairs, for instance exploiting an English-Japanese phrase translation table to find more potential Chinese paraphrases. We performed experiments on the NTCIR Chinese-English and English-Japanese bilingual corpora and compared with the conventional method. The experimental results show that the proposed model acquires more paraphrases and performs better after the English-Japanese phrase translation table is added to the graph model.
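The graph walk described above can be sketched on a toy phrase table. This is an illustrative assumption-laden sketch, not the paper's model: phrases are nodes, translation-table entries are undirected edges with uniform transition probabilities, and a Chinese phrase reached from another Chinese phrase after an even number of steps is a paraphrase candidate (two steps reproduces the classic pivot approach; larger N finds more distant candidates).

```python
from collections import defaultdict

def walk_probs(edges, start, steps):
    """Uniform random walk on an undirected phrase graph; returns the
    distribution over nodes after exactly `steps` steps from `start`."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, p in dist.items():
            for nb in graph[node]:
                nxt[nb] += p / len(graph[node])  # uniform over neighbours
        dist = dict(nxt)
    return dist

# Toy zh-en phrase table: both Chinese phrases share the translation "solve"
table = [("解决", "solve"), ("处理", "solve"), ("处理", "handle")]
probs = walk_probs(table, "解决", steps=2)  # 2 steps: zh -> en -> zh
```

Here the walk probability plays the role of the paper's confidence metric: after two steps, "处理" receives half the mass from "解决", flagging it as a paraphrase candidate.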


EasyChair Preprints | 2018

Improving Performance of NMT Using Semantic Concept of WordNet Synset

Fangxu Liu; Jinan Xu; Guoyi Miao; Yufeng Chen; Yujie Zhang

Collaboration


Dive into Jinan Xu's collaborations.

Top Co-Authors

- Yufeng Chen (Beijing Jiaotong University)
- Yujie Zhang (Beijing Jiaotong University)
- Guoyi Miao (Beijing Jiaotong University)
- Kuang Ru (Beijing Jiaotong University)
- Shaotong Li (Beijing Jiaotong University)
- Dandan Wang (Beijing Jiaotong University)
- Hang Liu (Beijing Jiaotong University)
- Jun Ma (Beijing Jiaotong University)
- Mingtong Liu (Beijing Jiaotong University)
- Nan Wang (Tianjin University of Science and Technology)