Publication


Featured research published by Le-Minh Nguyen.


IEEE Transactions on Knowledge and Data Engineering | 2011

A Hidden Topic-Based Framework toward Building Applications with Short Web Documents

Xuan Hieu Phan; Cam-Tu Nguyen; Dieu-Thu Le; Le-Minh Nguyen; Susumu Horiguchi; Quang-Thuy Ha

This paper introduces a hidden topic-based framework for processing short and sparse documents (e.g., search result snippets, product descriptions, book/movie summaries, and advertising messages) on the Web. The framework focuses on solving two main challenges posed by these kinds of documents: 1) data sparseness and 2) synonyms/homonyms. The former leads to a lack of shared words and contexts among documents, while the latter are major linguistic obstacles in natural language processing (NLP) and information retrieval (IR). The underlying idea of the framework is that common hidden topics discovered from large external data sets (universal data sets), when included, can make short documents less sparse and more topic-oriented. Furthermore, hidden topics from universal data sets help handle unseen data better. The proposed framework can also be applied to different natural languages and data domains. We carefully evaluated the framework by carrying out two experiments for two important online applications (Web search result classification and matching/ranking for contextual advertising) with large-scale universal data sets, and we achieved significant results.


IEICE Transactions on Information and Systems | 2006

Personal Name Resolution Crossover Documents by a Semantics-Based Approach

Xuan Hieu Phan; Le-Minh Nguyen; Susumu Horiguchi

Cross-document personal name resolution is the process of identifying whether or not a common personal name mentioned in different documents refers to the same individual. Most previous approaches rely on lexical matching, such as the occurrence of common words surrounding the entity name, to measure the similarity between documents, and then cluster the documents according to their referents. In spite of certain successes, measuring similarity based on lexical comparison sometimes ignores important linguistic phenomena at the semantic level, such as synonymy or paraphrase. This paper presents a semantics-based approach to the resolution of personal names across documents that can make the most of both lexical evidence and semantic clues. In our method, the similarity values between documents are determined by estimating the semantic relatedness between words. Further, the semantic labels attached to sentences allow us to highlight the common personal facts that are potentially available among documents. An evaluation on three web datasets demonstrates that our method achieves better performance than previous work.


Meeting of the Association for Computational Linguistics | 2006

Semantic Parsing with Structured SVM Ensemble Classification Models

Le-Minh Nguyen; Akira Shimazu; Xuan Hieu Phan

We present a learning framework for structured support vector models in which boosting and bagging methods are used to construct ensemble models. We also propose a selection method based on a switching model among a set of outputs of individual classifiers when dealing with natural language parsing problems. The switching model uses subtrees mined from the corpus and a boosting-based algorithm to select the most appropriate output. The application of the proposed framework to the domain of semantic parsing shows advantages in comparison with the original large margin methods.


International Journal of Computer Processing of Languages | 2011

RRE Task: The Task of Recognition of Requisite Part and Effectuation Part in Law Sentences

Ngo Xuan Bach; Le-Minh Nguyen; Akira Shimazu

Analyzing the logical structure of a sentence is important for understanding natural language. In this paper, we present the task of Recognition of Requisite Part and Effectuation Part in Law Sentences, or RRE task for short, which is studied in research on Legal Engineering. The goal of this task is to recognize the structure of a law sentence. We investigate the RRE task with regard to both linguistic features and problem modeling. We also propose solutions and present experimental results in a Japanese legal text domain. We obtained Fβ=1 scores of 88.58% with a supervised learning model and 88.84% with a semi-supervised learning model on the Japanese National Pension Law corpus.


Computer Speech & Language | 2008

Semi-supervised learning integrated with classifier combination for word sense disambiguation

Anh-Cuong Le; Akira Shimazu; Van-Nam Huynh; Le-Minh Nguyen

Word sense disambiguation (WSD) is the problem of determining the right sense of a polysemous word in a certain context. This paper investigates the use of unlabeled data for WSD within a framework of semi-supervised learning, in which labeled data is iteratively extended from unlabeled data. Focusing on this approach, we first explicitly identify and analyze three problems that inherently arise in the general bootstrapping algorithm, namely the imbalance of training data, the confidence of newly labeled examples, and the generation of the final classifier, all of which are considered together within a common bootstrapping framework. We then propose solutions for these problems with the help of classifier combination strategies. This results in several new variants of the general bootstrapping algorithm. Experiments conducted on the English lexical samples of Senseval-2 and Senseval-3 show that the proposed solutions are effective in comparison with previous studies, and significantly improve supervised WSD.


IEICE Transactions on Information and Systems | 2007

High-Performance Training of Conditional Random Fields for Large-Scale Applications of Labeling Sequence Data

Xuan Hieu Phan; Le-Minh Nguyen; Yasushi Inoguchi; Susumu Horiguchi

Conditional random fields (CRFs) have been successfully applied to various applications of predicting and labeling structured data, such as natural language tagging and parsing, image segmentation and object recognition, and protein secondary structure prediction. The key advantages of CRFs are the ability to encode a variety of overlapping, non-independent features from empirical data as well as the capability of global normalization and optimization. However, estimating parameters for CRFs is very time-consuming due to the intensive forward-backward computation needed to estimate the likelihood function and its gradient during training. This paper presents a high-performance method for training CRFs on massively parallel processing systems that allows us to handle huge datasets with hundreds of thousands of data sequences and millions of features. We performed experiments on an important natural language processing task (text chunking) on large-scale corpora and achieved significant results in terms of both the reduction of computational time and the improvement of prediction accuracy.


Proceedings of the 7th Workshop on Asian Language Resources | 2009

An Empirical Study of Vietnamese Noun Phrase Chunking with Discriminative Sequence Models

Le-Minh Nguyen; Huong Thao Nguyen; Phuong Thai Nguyen; Tu Bao Ho; Akira Shimazu

This paper presents an empirical study of the Vietnamese NP chunking task. We show how to build an annotated corpus for NP chunking and how discriminative sequence models are trained using the corpus. Experimental results using 5-fold cross-validation show that discriminative sequence learning is well suited to Vietnamese chunking. In addition, through empirical experiments we show that part-of-speech information contributes significantly to the performance of these learning models.


International Conference on Asian Language Processing | 2012

Linguistic Features for Subjectivity Classification

Huong Nguyen Thi Xuan; Anh Cuong Le; Le-Minh Nguyen

Opinions are subjective expressions that describe people's viewpoints, perspectives, or feelings about entities, events, and their properties. Detecting subjective expressions is the task of identifying whether a given text is subjective (i.e., an opinion) or objective (i.e., a reported fact). This task is considered the first problem in opinion mining and sentiment analysis, a field that is now attracting many researchers because of its wide applicability. Improvements in subjectivity classification will positively impact the performance of a sentiment analysis system. Features play the most important role in accurately detecting subjective sentences. In this paper, we enrich the features by using syntactic information from the text. Based on our observations when investigating opinion evidence in texts, we propose syntax-based patterns that are used for extracting rich linguistic features. Combining these new features with conventional features from previous studies, we obtain a high accuracy (about 92.1%) for detecting subjective sentences on the Movie Review data.


Knowledge and Systems Engineering | 2010

A Semi-supervised Learning Method for Vietnamese Part-of-Speech Tagging

Le-Minh Nguyen; Bach Ngo Xuan; Cuong Nguyen Viet; Minh Pham Quang Nhat; Akira Shimazu

This paper presents a semi-supervised learning method for Vietnamese part-of-speech tagging. We take into account two powerful tagging models, Conditional Random Fields (CRFs) and Guided Online-Learning models (GLs), as base learning models. We then propose a semi-supervised tagging model for both CRFs and GLs. The main idea is to use a word-cluster model as an additional source to enrich the feature space of discriminative learning models for both the training and decoding processes. Experimental results on Vietnamese Treebank data (VTB) showed that the proposed method is effective. Our best model achieved an accuracy of 94.10% when tested on VTB, and 92.60% on an independent test.


2007 IEEE International Conference on Research, Innovation and Vision for the Future | 2007

Improving the Accuracy of Question Classification with Machine Learning

Tri-Thanh Nguyen; Le-Minh Nguyen; Akira Shimazu

Question classification is an important phase in question answering systems. In this paper, we propose to apply i) hierarchical classifiers, ii) hierarchical classifiers in combination with semi-supervised learning, and iii) hierarchy expansion to question classification in order to improve precision. When the number of classes is large, the performance of classification algorithms may be affected. In order to improve the performance by reducing the number of classes for each classifier, we propose to use hierarchical classifiers according to the question taxonomy, in which a classifier is attached to each internal node. We use semi-supervised learning to exploit unlabeled questions with the expectation of improving the performance of classifiers in the hierarchy. We explored different applications of learning methods for each classifier of the hierarchy: a) supervised learning for all classifiers at all levels; b) semi-supervised learning for the first-level classifier and supervised learning for the other classifiers; c) semi-supervised learning for all classifiers. The experiments show that the first method (a) gives better results than flat classification; the second method (b) produces better results than the first method, while the effort to increase the performance of the fine classifiers in the last method (c) is not so successful. As another effort, we propose to automatically group question classes by clustering in order to expand a node that has a large number of classes in the question taxonomy. The experiment also shows that the overall precision is improved.

Collaboration


Dive into Le-Minh Nguyen's collaborations.

Top Co-Authors

Akira Shimazu

Japan Advanced Institute of Science and Technology


Xuan Hieu Phan

Japan Advanced Institute of Science and Technology


Van-Khanh Tran

Japan Advanced Institute of Science and Technology


Phuong-Thai Nguyen

Japan Advanced Institute of Science and Technology


Hai-Long Trieu

Japan Advanced Institute of Science and Technology


Minh Quang Nhat Pham

Japan Advanced Institute of Science and Technology


Minh-Tien Nguyen

Japan Advanced Institute of Science and Technology


Satoshi Tojo

Japan Advanced Institute of Science and Technology