Minh Le Nguyen

Japan Advanced Institute of Science and Technology

Publication

Featured research published by Minh Le Nguyen.

International Conference on Computational Linguistics | 2004

Probabilistic sentence reduction using support vector machines

Minh Le Nguyen; Akira Shimazu; Susumu Horiguchi; Bao Tu Ho; Masaru Fukushi

This paper investigates a novel application of support vector machines (SVMs) to sentence reduction. We also propose a new probabilistic sentence reduction method based on support vector machine learning. Experimental results show that the proposed methods outperform earlier methods in terms of sentence reduction performance.

ACM Transactions on Asian Language Information Processing | 2004

Example-based sentence reduction using the hidden Markov model

Minh Le Nguyen; Susumu Horiguchi; Akira Shimazu; Bao Tu Ho

Sentence reduction is the removal of redundant words or phrases from an input sentence, producing a new sentence in which the gist of the original meaning remains unchanged. All previous methods required a syntactic parser before sentences could be reduced; hence it was difficult to apply them to a language with no reliable parser. In this article we propose two new sentence-reduction algorithms that do not use syntactic parsing for the input sentence. The first algorithm, based on the template-translation learning algorithm (one of the example-based machine-translation methods), works quite well in reducing sentences, but its computational complexity can be exponential in certain cases. The second algorithm, which extends the template-translation algorithm with a hidden Markov model and uses the set of template rules learned from examples, can overcome this computational problem. Experiments show that the proposed algorithms achieve acceptable results in comparison to sentence reduction done by humans.
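The abstract does not describe the model's states or parameters, so the following is only a generic sketch of HMM decoding applied to reduction, not the authors' template-based method: each word receives a hypothetical KEEP or DROP state, and the Viterbi algorithm finds the most probable state sequence. The start, transition, and emission probabilities (`start`, `trans`, `toy_emission`) are invented for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

STATES = ("KEEP", "DROP")


def viterbi(words: List[str],
            start: Dict[str, float],
            trans: Dict[Tuple[str, str], float],
            emission: Callable[[str, str], float]) -> List[str]:
    """Return the most probable KEEP/DROP sequence for `words` (log-space Viterbi)."""
    if not words:
        return []
    # best[t][s]: best log-probability of any state path that ends in s at position t
    best = [{s: math.log(start[s]) + math.log(emission(s, words[0])) for s in STATES}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(words)):
        best.append({})
        back.append({})
        for s in STATES:
            # Best predecessor state for s at position t
            prev = max(STATES, key=lambda p: best[t - 1][p] + math.log(trans[(p, s)]))
            best[t][s] = (best[t - 1][prev] + math.log(trans[(prev, s)])
                          + math.log(emission(s, words[t])))
            back[t][s] = prev
    # Trace back from the most probable final state
    state = max(STATES, key=lambda s: best[-1][s])
    path = [state]
    for t in range(len(words) - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    return path[::-1]


def reduce_sentence(words, start, trans, emission) -> List[str]:
    """Keep only the words whose decoded state is KEEP."""
    labels = viterbi(words, start, trans, emission)
    return [w for w, label in zip(words, labels) if label == "KEEP"]


# Hypothetical toy parameters, for illustration only
FILLER = {"very", "really", "quite"}


def toy_emission(state: str, word: str) -> float:
    # Filler words are likely to be dropped; other words are likely to be kept
    if word in FILLER:
        return 0.9 if state == "DROP" else 0.1
    return 0.9 if state == "KEEP" else 0.1


start = {"KEEP": 0.8, "DROP": 0.2}
trans = {("KEEP", "KEEP"): 0.7, ("KEEP", "DROP"): 0.3,
         ("DROP", "KEEP"): 0.6, ("DROP", "DROP"): 0.4}

print(reduce_sentence("the film was really very good".split(), start, trans, toy_emission))
# → ['the', 'film', 'was', 'good']
```

With these toy numbers the two filler words are dropped; in a real system the emission scores would come from learned rules rather than a hand-written word list.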

Applications of Natural Language to Data Bases | 2014

From Treebank Conversion to Automatic Dependency Parsing for Vietnamese

Dat Quoc Nguyen; Dai Quoc Nguyen; Son Bao Pham; Phuong-Thai Nguyen; Minh Le Nguyen

This paper presents a new conversion method that automatically transforms a constituent-based Vietnamese Treebank into dependency trees. On a dependency treebank created with our new approach, we examine two state-of-the-art dependency parsers: MSTParser and MaltParser. Experiments show that MSTParser outperforms MaltParser. To the best of our knowledge, we report the highest performance published to date for Vietnamese dependency parsing. In particular, with gold-standard POS tags, we obtain an unlabeled attachment score of 79.08% and a labeled attachment score of 71.66%.
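Unlabeled attachment score (UAS) counts tokens whose predicted head is correct, and labeled attachment score (LAS) additionally requires the correct dependency label. A minimal sketch of both metrics follows, assuming each token is given as a hypothetical `(head_index, label)` pair with head 0 for the root; the labels in the example are placeholders, and evaluation conventions such as excluding punctuation vary between papers and are not specified in the abstract.

```python
from typing import List, Tuple

# One entry per token: (index of the head token, dependency label); head 0 is the root.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) in percent over all tokens.

    For corpus-level scores, concatenate the token lists of all sentences.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must cover the same tokens")
    if not gold:
        raise ValueError("no tokens to score")
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, pred))
    correct_arcs = sum(g == p for g, p in zip(gold, pred))  # head and label both right
    n = len(gold)
    return 100.0 * correct_heads / n, 100.0 * correct_arcs / n


# Placeholder 4-token sentence: token 3 gets the wrong label, token 4 the wrong head
gold = [(2, "sub"), (0, "root"), (2, "dob"), (3, "nmod")]
pred = [(2, "sub"), (0, "root"), (2, "iob"), (2, "nmod")]
print(attachment_scores(gold, pred))  # → (75.0, 50.0)
```

Because LAS requires both the head and the label to match, it can never exceed UAS, which is consistent with the 79.08% and 71.66% figures reported above.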

Conference on Computational Natural Language Learning | 2008

A Tree-to-String Phrase-based Model for Statistical Machine Translation

Thai Phuong Nguyen; Akira Shimazu; Tu Bao Ho; Minh Le Nguyen; Vinh Van Nguyen

Although phrase-based SMT has achieved high translation quality, it still lacks the generalization ability to capture word-order differences between languages. In this paper we describe a general method for tree-to-string phrase-based SMT. We study how syntactic transformation can be incorporated into phrase-based SMT and how effective it is. We design syntactic transformation models using an unlexicalized form of synchronous context-free grammars. These models can be learned from source-parsed bitext. Our system can naturally make use of both constituent and non-constituent phrasal translations in the decoding phase. We considered various levels of syntactic analysis, ranging from chunking to full parsing. Our experimental results on English-Japanese and English-Vietnamese translation showed a significant improvement over two baseline phrase-based SMT systems.

European Conference on Information Retrieval | 2016

SoRTESum: A Social Context Framework for Single-Document Summarization

Minh-Tien Nguyen; Minh Le Nguyen

Combining the content of a web document (its sentences) with users' comments from social networks provides a view of how the document relates to a particular event. This paper proposes a framework named SoRTESum that takes advantage of information from Twitter, namely its diversity and its reflection of the document content, to generate high-quality summaries through a novel sentence similarity measure. The framework first formulates sentences and tweets by recognizing textual entailment (RTE) relations to incorporate social information. Next, they are modeled in a Dual Wing Entailment Graph, which captures entailment relations in order to calculate sentence similarity based on mutual reinforcement information. Finally, important sentences and representative tweets are selected by a ranking algorithm. By incorporating social information, SoRTESum obtained ROUGE-1 improvements of 0.51%–8.8% over state-of-the-art unsupervised baselines (e.g., Random, SentenceLead, and LexRank) and results comparable to strong supervised methods (e.g., L2R and CrossL2R trained with RankBoost) for single-document summarization.
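The mutual reinforcement idea can be illustrated with a HITS-style power iteration: a sentence scores highly when it is similar to high-scoring tweets, and vice versa. This is a sketch under assumptions, not SoRTESum itself: the `sim` matrix stands in for the RTE-based similarities of the Dual Wing Entailment Graph, its values are made up, and `mutual_reinforcement` and `top_k` are hypothetical helpers.

```python
import math
from typing import List, Tuple


def mutual_reinforcement(sim: List[List[float]], iters: int = 50) -> Tuple[List[float], List[float]]:
    """HITS-style scores for sentences (rows of `sim`) and tweets (columns).

    sim[i][j] is a non-negative similarity between sentence i and tweet j.
    """
    if not sim or not sim[0]:
        return [], []
    n_s, n_t = len(sim), len(sim[0])
    s = [1.0] * n_s
    t = [1.0] * n_t
    for _ in range(iters):
        # A sentence is important if it is similar to important tweets ...
        s = [sum(sim[i][j] * t[j] for j in range(n_t)) for i in range(n_s)]
        # ... and a tweet is important if it is similar to important sentences.
        t = [sum(sim[i][j] * s[i] for i in range(n_s)) for j in range(n_t)]
        # Normalise both score vectors to unit length to keep the values bounded
        norm_s = math.sqrt(sum(x * x for x in s)) or 1.0
        norm_t = math.sqrt(sum(x * x for x in t)) or 1.0
        s = [x / norm_s for x in s]
        t = [x / norm_t for x in t]
    return s, t


def top_k(scores: List[float], k: int) -> List[int]:
    """Indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Made-up similarities: 3 sentences x 3 tweets
sim = [[0.9, 0.8, 0.1],
       [0.1, 0.0, 0.2],
       [0.7, 0.6, 0.0]]
s, t = mutual_reinforcement(sim)
print(top_k(s, 2), top_k(t, 1))  # → [0, 2] [0]
```

The final selection step then amounts to taking the top-ranked indices from each score vector.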

International Journal of Computer Processing of Languages | 2007

A Syntactic Transformation Model for Statistical Machine Translation

Thai Phuong Nguyen; Akira Shimazu; Minh Le Nguyen; Vinh Van Nguyen

We describe a syntactic transformation model based on a probabilistic context-free grammar. The model is trained using a bilingual corpus and a broad-coverage parser of the source language. We then present two methods that use the transformation model to solve the word-order problem. The first method deals with this problem in the preprocessing phase, so no reordering takes place in the decoding phase. The second method employs the syntactic transformation model in the decoding phase for phrase reordering within chunks. Speed is an advantage of this method. We considered translation from English to Vietnamese and from English to French. Our experiments showed significant BLEU-score improvements in comparison with Pharaoh, a state-of-the-art phrase-based SMT system.
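The first method, reordering in preprocessing, can be pictured as rewriting the children of each source-tree node according to a transformation rule and then reading off the leaves. In the sketch below the tree encoding, the `RULES` table, and its probabilities are hypothetical; the example rule moves an English adjective after its noun, as in Vietnamese noun phrases. A trained model would estimate such rule probabilities from the parsed bilingual corpus rather than hard-code them.

```python
from typing import Dict, List, Tuple, Union

# A tree node is (label, word) for a leaf or (label, [children]) for an internal node.
Tree = Tuple[str, Union[str, List["Tree"]]]

# Hypothetical rules: (parent, child labels) -> [(permutation of children, probability)]
RULES: Dict[Tuple[str, Tuple[str, ...]], List[Tuple[Tuple[int, ...], float]]] = {
    # English "DT JJ NN" -> Vietnamese-style "DT NN JJ" (adjective after the noun)
    ("NP", ("DT", "JJ", "NN")): [((0, 2, 1), 0.8), ((0, 1, 2), 0.2)],
}


def reorder(tree: Tree) -> Tree:
    """Recursively apply the most probable matching rule at every node."""
    label, body = tree
    if isinstance(body, str):  # leaf: nothing to reorder
        return tree
    children = [reorder(child) for child in body]
    key = (label, tuple(child[0] for child in children))
    if key in RULES:
        perm, _prob = max(RULES[key], key=lambda rule: rule[1])
        children = [children[i] for i in perm]
    return (label, children)


def leaves(tree: Tree) -> List[str]:
    """Read the words off the tree from left to right."""
    label, body = tree
    if isinstance(body, str):
        return [body]
    return [word for child in body for word in leaves(child)]


tree: Tree = ("S", [("NP", [("DT", "the"), ("JJ", "red"), ("NN", "car")]),
                    ("VP", [("VBZ", "stops")])])
print(" ".join(leaves(reorder(tree))))  # → the car red stops
```

The reordered word sequence would then be passed to an ordinary phrase-based decoder, which is why no reordering is needed during decoding in this method.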

Applied Intelligence | 2017

Feature weighting and SVM parameters optimization based on genetic algorithms for classification problems

Anh Viet Phan; Minh Le Nguyen; Lam Thu Bui

Support vector machines (SVMs) are widely known as an efficient supervised learning model for classification problems. However, the success of an SVM classifier depends on the right choice of its parameters as well as on the structure of the data. Thus, the aim of this research is to optimize the parameters and the feature weights simultaneously in order to increase the strength of SVMs. We propose a novel hybrid model, a combination of genetic algorithms (GAs) and SVMs, for feature weighting and parameter optimization to solve classification problems efficiently. We call it the GA-SVM model. Our GA is designed with a special direction-based crossover operator. Experiments were conducted on several real-world datasets using the proposed model and Grid Search, a traditional method of searching for optimal parameters. The results show that the GA-SVM model achieves a significant improvement in classification performance on all the datasets in comparison with Grid Search. In terms of accuracy, our method is competitive with some state-of-the-art techniques for feature selection and feature weighting.
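The abstract does not give the chromosome layout, so as an assumption about how such a model can be encoded, a GA individual might hold genes in [0, 1] that decode to the SVM's C and gamma plus one weight per feature, with the weights applied inside an RBF kernel. The decoding ranges below follow the exponential grid commonly recommended for LIBSVM grid search; `decode` and `weighted_rbf` are hypothetical helpers, and the direction-based crossover operator is not reproduced. In a full GA, an individual's fitness would typically be the cross-validated accuracy of an SVM trained with these settings.

```python
import math
from typing import List, Sequence, Tuple


def decode(chromosome: Sequence[float],
           log2_c_range: Tuple[float, float] = (-5.0, 15.0),
           log2_gamma_range: Tuple[float, float] = (-15.0, 3.0)) -> Tuple[float, float, List[float]]:
    """Map genes in [0, 1] to (C, gamma, feature weights).

    Gene 0 -> C and gene 1 -> gamma on a log2 scale; the remaining genes are
    used directly as per-feature weights. This layout is an assumption.
    """
    def scale(gene: float, lo: float, hi: float) -> float:
        return 2.0 ** (lo + gene * (hi - lo))

    c = scale(chromosome[0], *log2_c_range)
    gamma = scale(chromosome[1], *log2_gamma_range)
    weights = list(chromosome[2:])
    return c, gamma, weights


def weighted_rbf(x: Sequence[float], z: Sequence[float],
                 gamma: float, weights: Sequence[float]) -> float:
    """RBF kernel on feature-weighted inputs: exp(-gamma * sum((w_i * (x_i - z_i))**2))."""
    sq_dist = sum((w * (a - b)) ** 2 for w, a, b in zip(weights, x, z))
    return math.exp(-gamma * sq_dist)


c, gamma, weights = decode([0.25, 0.5, 1.0, 0.0])
print(c, gamma, weights)  # → 1.0 0.015625 [1.0, 0.0]
# The second feature has weight 0, so the large difference there is ignored
print(weighted_rbf([1.0, 5.0], [1.0, -3.0], gamma, weights))  # → 1.0
```

A weight of 0 effectively removes a feature, so this encoding covers feature selection as a special case of feature weighting.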

Conference on Information and Knowledge Management | 2016

SoLSCSum: A Linked Sentence-Comment Dataset for Social Context Summarization

Minh-Tien Nguyen; Chien-Xuan Tran; Duc-Vu Tran; Minh Le Nguyen

This paper presents a dataset named SoLSCSum for social context summarization. The dataset includes 157 open-domain articles, along with their comments, collected from Yahoo News. The articles and their comments were manually annotated by two annotators to extract standard summaries. The inter-annotator agreement is 74.5% and Cohen's kappa is 0.5845. To illustrate the potential use of the dataset, a learning-to-rank model was trained using a set of local and cross features. Experimental results demonstrate that (1) our model trained with Ranking SVM obtains significant ROUGE-1 improvements of 5.5% to 14.8% over state-of-the-art baselines in document summarization and (2) our dataset can be used to train summarization methods such as SVM.
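Cohen's kappa corrects the observed agreement p_o for the agreement p_e expected by chance, kappa = (p_o - p_e) / (1 - p_e), which is why a raw agreement of 74.5% can correspond to a lower kappa. A minimal sketch, assuming each annotator's decisions are given as one label per sentence or comment (for example 1 = in the summary, 0 = not); the labels in the usage example are made up.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: both annotators pick the same label independently
    p_e = sum(count_a[k] * count_b[k] for k in count_a.keys() | count_b.keys()) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


# Made-up decisions for 10 sentences (1 = include in summary)
annotator_1 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
annotator_2 = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0]
print(round(cohens_kappa(annotator_1, annotator_2), 4))  # → 0.5833
```

Here the annotators agree on 8 of 10 items (p_o = 0.8), but chance alone would give p_e = 0.52, so kappa drops to about 0.58.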

Expert Systems With Applications | 2017

Intra-relation or inter-relation?

Minh-Tien Nguyen; Minh Le Nguyen

A novel ranking framework for social context summarization is proposed. The framework relies on the reinforcement support of social information. 14 features in two groups (distance and statistical) are proposed. A new open-domain dataset is created and manually annotated. Combining intra-relation and inter-relation benefits the summarization.

Traditional summarization methods use only the internal information of a Web document and ignore its social information, such as tweets from Twitter, which can give readers another perspective on an event. This paper proposes a framework named SoRTESum that takes advantage of social information, such as its reflection of the document content, to extract summary sentences and social messages. To do so, summarization is formulated in two steps: scoring and ranking. In the scoring step, the score of a sentence or social message is computed from intra-relations and inter-relations, which integrate the support of local and social information in a mutual reinforcement form. To calculate these relations, 16 features are proposed. After scoring, the summary is generated by selecting the top m ranked sentences and social messages. SoRTESum was extensively evaluated on two datasets. Promising results show that (i) SoRTESum obtains significant improvements in ROUGE scores over state-of-the-art baselines and competitive results with the learning-to-rank approach trained by RankBoost, and (ii) combining intra-relation and inter-relation benefits single-document summarization.

International Conference on Tools with Artificial Intelligence | 2016

Learning to Summarize Web Documents Using Social Information

Minh-Tien Nguyen; Duc-Vu Tran; Chien-Xuan Tran; Minh Le Nguyen

This paper presents a method named SoSVMRank, which integrates the social information of a Web document to generate a high-quality summary. To do so, summarization is formulated as a learning-to-rank task, in which the rank of a sentence or comment is determined by its informativeness. Informativeness is measured by a set of local and social features, in which the social features are exploited to support the local ones when modeling a sentence or comment. To enrich the information, new features were also proposed. After ranking, the top m ranked sentences and comments are selected as the summary. Our method was extensively evaluated on two datasets. Promising results indicate that (1) by using the new features, our method improves both ROUGE-1 and ROUGE-2 scores of the summary over state-of-the-art baselines and (2) integrating social information benefits summarization.
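ROUGE-N recall, the metric family behind the ROUGE-1 and ROUGE-2 results above, is the fraction of reference n-grams (counted with clipping) that also appear in the candidate summary. The following is a simplified sketch; the official ROUGE toolkit adds options such as stemming, stopword removal, and multiple references, which are omitted here, and the example sentences are made up.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Multiset of the n-grams in `tokens`."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: List[str], reference: List[str], n: int = 1) -> float:
    """Clipped n-gram overlap divided by the number of reference n-grams."""
    ref_counts = ngrams(reference, n)
    cand_counts = ngrams(candidate, n)
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand_counts[gram]) for gram, count in ref_counts.items())
    return overlap / total


candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()
print(rouge_n_recall(candidate, reference, 1))  # 5 of 6 reference unigrams matched
print(rouge_n_recall(candidate, reference, 2))  # → 0.6
```

Clipping with `min` ensures that a word repeated in the candidate is credited at most as many times as it occurs in the reference.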
