Ngo Xuan Bach

Japan Advanced Institute of Science and Technology

Publication

Featured research published by Ngo Xuan Bach.

ACM Transactions on Asian Language Information Processing | 2013

A Two-Phase Framework for Learning Logical Structures of Paragraphs in Legal Articles

Ngo Xuan Bach; Nguyen Le Minh; Tran Thi Oanh; Akira Shimazu

Analyzing logical structures of texts is important to understanding natural language, especially in the legal domain, where legal texts have their own specific characteristics. Recognizing logical structures in legal texts not only helps people understand legal documents but also supports other tasks in legal text processing. In this article, we present a new task, learning logical structures of paragraphs in legal articles, which is studied in research on Legal Engineering. The goals of this task are recognizing logical parts of law sentences in a paragraph, and then grouping related logical parts into some logical structures of formulas, which describe logical relations between logical parts. We present a two-phase framework to learn logical structures of paragraphs in legal articles. In the first phase, we model the problem of recognizing logical parts in law sentences as a multi-layer sequence learning problem, and present a CRF-based model to recognize them. In the second phase, we propose a graph-based method to group logical parts into logical structures. We consider the problem of finding a subset of complete subgraphs in a weighted-edge complete graph, where each node corresponds to a logical part, and a complete subgraph corresponds to a logical structure. We also present an integer linear programming formulation for this optimization problem. Our models achieve 74.37% in recognizing logical parts, 80.08% in recognizing logical structures, and 58.36% in the whole task on the Japanese National Pension Law corpus. Our work provides promising results for further research on this interesting task.
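The second phase described above (complete subgraphs as logical structures) can be illustrated with a minimal sketch. It is not the authors' integer linear programming formulation: as a simplifying assumption, it keeps only edges whose weight reaches a threshold and then enumerates maximal complete subgraphs with the Bron-Kerbosch algorithm. The function name, the node labels, the weights, and the 0.5 threshold are all hypothetical.

```python
from itertools import combinations


def maximal_cliques(nodes, weight, threshold=0.5):
    """Return maximal cliques of the graph that keeps edges with weight >= threshold.

    Assumption for illustration only: `weight` maps unordered node pairs to a
    score saying how likely two logical parts belong to the same structure.
    """
    adj = {n: set() for n in nodes}
    for a, b in combinations(nodes, 2):
        w = weight.get((a, b), weight.get((b, a), 0.0))
        if w >= threshold:
            adj[a].add(b)
            adj[b].add(a)

    cliques = []

    def expand(r, p, x):
        # Bron-Kerbosch without pivoting: r is the current clique,
        # p the candidates, x the already-processed nodes.
        if not p and not x:
            if len(r) > 1:  # ignore isolated logical parts
                cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(nodes), set())
    return sorted(cliques)


# Hypothetical logical parts: two antecedents, one consequent, one topic.
parts = ["A1", "A2", "C1", "T1"]
pair_scores = {
    ("A1", "C1"): 0.9, ("A1", "T1"): 0.8, ("C1", "T1"): 0.7,
    ("A2", "C1"): 0.6, ("A2", "T1"): 0.2, ("A1", "A2"): 0.1,
}
structures = maximal_cliques(parts, pair_scores)
print(structures)
```

In this toy input one logical part (C1) ends up in two structures, which a clique-based grouping permits; whether and how the published method handles such sharing is not stated in the abstract.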

International Journal of Computer Processing of Languages | 2011

RRE Task: The Task of Recognition of Requisite Part and Effectuation Part in Law Sentences

Ngo Xuan Bach; Le-Minh Nguyen; Akira Shimazu

Analyzing the logical structure of a sentence is important for understanding natural language. In this paper, we present a task of Recognition of Requisite Part and Effectuation Part in Law Sentences, or RRE task for short, which is studied in research on Legal Engineering. The goal of this task is to recognize the structure of a law sentence. We investigate the RRE task regarding both the linguistic features and problem modeling aspects. We also propose solutions and present experimental results in a Japanese legal text domain. We obtained 88.58% with a supervised learning model and 88.84% with a semi-supervised learning model in the Fβ=1 score on the Japanese National Pension Law corpus.

Expert Systems With Applications | 2014

Exploiting discourse information to identify paraphrases

Ngo Xuan Bach; Nguyen Le Minh; Akira Shimazu

Previous work on paraphrase identification using sentence similarities has not exploited discourse structures, which have been shown to be important information for paraphrase computation. In this paper, we propose a new method, named EDU-based similarity, to compute the similarity between two sentences based on elementary discourse units. Unlike conventional methods, which directly compute similarities based on sentences, our method divides sentences into discourse units and employs them to compute similarities. We also show the relation between paraphrases and discourse units, which plays an important role in paraphrasing. We apply our method to the paraphrase identification task. Experimental results on the PAN corpus, a large corpus for detecting paraphrases, show the effectiveness of using discourse information for identifying paraphrases. We achieve 93.1% and 93.4% accuracy using a single SVM classifier and a maximal voting model, respectively.
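The idea of comparing sentences through their discourse units can be sketched as follows. This is an illustrative assumption, not the formula from the paper: sentences are assumed to be already segmented into EDUs, the base metric is a simple token-overlap (Jaccard) score, and each EDU is matched to its best counterpart on the other side with length weighting. All function names are hypothetical.

```python
def jaccard(a, b):
    """Token-overlap similarity between two text spans (stand-in base metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def directed_sim(edus_a, edus_b):
    """Length-weighted average of each EDU's best match among the other EDUs."""
    total = sum(len(e.split()) for e in edus_a)
    if total == 0:
        return 0.0
    weighted = sum(
        len(e.split()) * max((jaccard(e, f) for f in edus_b), default=0.0)
        for e in edus_a
    )
    return weighted / total


def edu_similarity(edus_a, edus_b):
    """Symmetric similarity between two sentences given as lists of EDUs."""
    return 0.5 * (directed_sim(edus_a, edus_b) + directed_sim(edus_b, edus_a))


# Two sentences with the same clauses in a different order.
s1 = ["the company raised prices", "because costs increased"]
s2 = ["because costs increased", "the company raised prices"]
print(edu_similarity(s1, s2))
```

Because units are matched individually, reordering clauses does not lower the score, which is one plausible reason unit-level comparison helps on paraphrases; a score of this kind would then be one feature for a classifier such as the SVM mentioned above.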

Procedia Computer Science | 2013

Dual Decomposition for Vietnamese Part-of-Speech Tagging

Ngo Xuan Bach; Kunihiko Hiraishi; Nguyen Le Minh; Akira Shimazu

Part-of-speech (POS) tagging is a fundamental task in natural language processing (NLP). It provides useful information for many other NLP tasks, including word sense disambiguation, text chunking, named entity recognition, syntactic parsing, semantic role labeling, and semantic parsing. In this paper, we present a new method for Vietnamese POS tagging using dual decomposition. We show how dual decomposition can be used to integrate a word-based model and a syllable-based model to yield a more powerful model for tagging Vietnamese sentences. We also describe experiments on the Viet Treebank corpus, a large annotated corpus for Vietnamese POS tagging. Experimental results show that our model using dual decomposition outperforms both word-based and syllable-based models.
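How dual decomposition makes two models agree can be shown with a toy sketch. It is heavily simplified relative to the setting described above: both models score the same positions independently (no Viterbi decoding and no word-to-syllable alignment), and the scores, tag set, and step size are invented for illustration. What it does keep is the core loop: each model decodes with Lagrange-multiplier adjustments, and the multipliers are updated by a subgradient step wherever the two outputs disagree.

```python
def dual_decompose(scores_a, scores_b, tags, max_iter=50, step0=0.3):
    """Combine two taggers by subgradient updates on agreement multipliers.

    scores_a / scores_b: one dict per position mapping tag -> score
    (hypothetical stand-ins for a word-based and a syllable-based model).
    Returns (tag sequence, converged flag).
    """
    n = len(scores_a)
    u = [{t: 0.0 for t in tags} for _ in range(n)]
    ya = []
    for k in range(1, max_iter + 1):
        # Each model decodes on its own, with the multipliers added/subtracted.
        ya = [max(tags, key=lambda t: scores_a[i][t] + u[i][t]) for i in range(n)]
        yb = [max(tags, key=lambda t: scores_b[i][t] - u[i][t]) for i in range(n)]
        if ya == yb:
            return ya, True  # agreement reached
        step = step0 / k  # decreasing step size
        for i in range(n):
            if ya[i] != yb[i]:
                u[i][ya[i]] -= step
                u[i][yb[i]] += step
    return ya, False  # no agreement within max_iter; fall back to model A


model_a = [{"N": 2.0, "V": 1.0}, {"N": 0.2, "V": 1.5}]
model_b = [{"N": 0.5, "V": 1.0}, {"N": 0.1, "V": 0.9}]
result, converged = dual_decompose(model_a, model_b, ["N", "V"])
print(result, converged)
```

In this example the two models initially disagree on the first position; after one multiplier update they agree on the tag with the higher combined score.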

International Conference on NLP | 2012

UDRST: A Novel System for Unlabeled Discourse Parsing in the RST Framework

Ngo Xuan Bach; Nguyen Le Minh; Akira Shimazu

This paper presents UDRST, an unlabeled discourse parsing system in the RST framework. UDRST consists of a segmentation model and a parsing model. The segmentation model exploits subtree features to rerank N-best outputs of a base segmenter, which uses syntactic and lexical features in a CRF framework. In the parsing model, we present two algorithms for building a discourse tree from a segmented text: an incremental algorithm and a dual decomposition algorithm. Our system achieves 77.3% in the unlabeled score on the standard test set of the RST Discourse Treebank corpus, a 5.0% improvement over HILDA [6], a state-of-the-art discourse parsing system.

Knowledge and Systems Engineering | 2015

Paraphrase Identification in Vietnamese Documents

Ngo Xuan Bach; Tran Thi Oanh; Nguyen Trung Hai; Tu Minh Phuong

In this paper, we investigate the task of paraphrase identification in Vietnamese documents, which identifies whether two sentences have the same meaning. This task has been shown to be an important research dimension with practical applications in natural language processing and data mining. We choose to model the task as a classification problem and explore different types of features to represent sentences. We also introduce a paraphrase corpus for Vietnamese, vnPara, which consists of 3000 Vietnamese sentence pairs. We describe a series of experiments using various linguistic features and different machine learning algorithms, including Support Vector Machines, Maximum Entropy Model, Naive Bayes, and k-Nearest Neighbors. The results are promising, with the best model achieving up to 90% accuracy. To the best of our knowledge, this is the first attempt to solve the task of paraphrase identification for Vietnamese.

Applications of Natural Language to Data Bases | 2013

EDU-Based Similarity for Paraphrase Identification

Ngo Xuan Bach; Nguyen Le Minh; Akira Shimazu

We propose a new method to compute the similarity between two sentences based on elementary discourse units, EDU-based similarity. Unlike conventional methods, which directly compute similarities based on sentences, our method divides sentences into discourse units and uses them to compute similarities. We also show the relation between paraphrases and discourse units, which plays an important role in paraphrasing. We apply our method to the paraphrase identification task. By using only a single SVM classifier, we achieve 93.1% accuracy on the PAN corpus, a large corpus for detecting paraphrases.

Archive | 2015

A Joint Model for Vietnamese Part-of-Speech Tagging Using Dual Decomposition

Ngo Xuan Bach; Kunihiko Hiraishi; Nguyen Le Minh; Akira Shimazu

Part-of-speech (POS) tagging is a fundamental task of Natural Language Processing (NLP). It provides useful information for many other NLP tasks, including word sense disambiguation, text chunking, named entity recognition, syntactic parsing, semantic role labeling, and semantic parsing. Several methods have been proposed to deal with the POS tagging task in Vietnamese. They can be divided into two types of models: word-based models and syllable-based models. While a word-based model assigns a POS tag to each word, a syllable-based model assigns a POS tag to each syllable. This chapter presents a new model for Vietnamese POS tagging using dual decomposition. The chapter shows how dual decomposition can be exploited to integrate a word-based model and a syllable-based model to yield a more powerful model for tagging Vietnamese sentences. Then the chapter describes experiments on the Viet Treebank corpus, a large annotated corpus for Vietnamese POS tagging. This chapter also presents an error analysis to investigate which types of words in Vietnamese are more difficult to tag than other words. Experimental results show that the word-based model and the syllable-based model are complementary. Moreover, the proposed model using dual decomposition outperforms both the word-based and the syllable-based models.

Soft Computing and Pattern Recognition | 2013

What should I comment: Recommending posts for commenting

Nguyen Do Hai; Ngo Xuan Bach; Tran Quang An; Tu Minh Phuong

With the spread of the Internet and personal computers, the Web has become one of the most important vehicles for conveying information. There are many new forms of information on the Web, including websites, blogs, wikis, social networks, and Internet forums. The explosion of user-generated content poses challenges to browsing and finding valuable information on the Web. In this paper, we present a study on the task of recommending, for a given user, a short list of suitable forum posts for commenting. We propose a collaborative filtering method which exploits the co-commenting patterns of the users to generate recommendations, and compare the method with traditional content-based filtering approaches. Experimental results on two types of forums show that the proposed collaborative filtering method achieved substantial improvements in terms of accuracy over a baseline and the content-based filtering methods. The results also demonstrate the stability of our method in handling new posts with a small number of comments.
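A recommender driven by co-commenting patterns might look like the sketch below. The abstract does not give the scoring function, so this assumes a standard user-based collaborative filtering scheme: users are compared by cosine similarity over the sets of posts they commented on, and each unseen post is scored by the summed similarity of the users who commented on it. The function name and the sample data are hypothetical.

```python
from math import sqrt


def recommend(comments, user, k=3):
    """Rank posts the user has not commented on by co-commenting similarity.

    comments: dict mapping user -> set of post ids that user commented on.
    """
    mine = comments[user]
    scores = {}
    for other, posts in comments.items():
        if other == user or not posts or not mine:
            continue
        # Cosine similarity between binary commenting vectors.
        sim = len(mine & posts) / sqrt(len(mine) * len(posts))
        if sim == 0:
            continue
        for post in posts - mine:
            scores[post] = scores.get(post, 0.0) + sim
    # Highest score first; ties broken by post id for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]


forum = {
    "an": {"p1", "p2"},
    "binh": {"p1", "p2", "p3"},
    "chi": {"p2", "p4"},
    "dung": {"p5"},
}
print(recommend(forum, "an"))
```

Here post p3 ranks above p4 because the user who commented on it shares more posts with the target user; p5 is never suggested because its only commenter has no overlap.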

Annual Meeting of the Special Interest Group on Discourse and Dialogue | 2012