Ngoc Thanh Nguyen

University of Science and Technology, Sana'a

Publication

Featured research published by Ngoc Thanh Nguyen.

Knowledge Based Systems | 2017

A combination of active learning and self-learning for named entity recognition on Twitter using conditional random fields

Van Cuong Tran; Ngoc Thanh Nguyen; Hamido Fujita; Dinh Tuyen Hoang; Dosam Hwang

In recent years, many applications in natural language processing (NLP) have been developed using the machine learning approach. Annotating data is an important task in applying machine learning to NLP applications. A common approach to improving system performance is to train on a large, high-quality set of training data annotated by experts. In addition, active learning (AL) and self-learning can be used to reduce annotation costs. The self-learning method discovers highly reliable instances based on a trained classifier, while AL queries the most informative instances based on active query algorithms. This paper proposes a method that combines AL and self-learning to reduce the labeling effort for the named entity recognition task on tweet streams by using both machine-labeled and manually labeled data. We employ AL queries based on the diversity of the context and content of instances to select the most informative instances. Conditional random fields are chosen as the underlying model to train a classifier for selecting highly reliable instances. Experiments on Twitter data show that the proposed method achieves good results in reducing the human labeling effort and that it can significantly improve system performance.
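As a rough illustration of how the two labeling routes can share one unlabeled pool, the sketch below splits items by classifier confidence. It is not the paper's method: the paper queries by diversity of context and content and uses conditional random fields, whereas this sketch swaps in a plain least-confidence query. The function name `split_pool`, the threshold `high`, the `budget` parameter, and the confidence scores are all hypothetical.

```python
from typing import Dict, List, Tuple


def split_pool(confidences: Dict[str, float],
               high: float = 0.9,
               budget: int = 2) -> Tuple[List[str], List[str]]:
    """Split an unlabeled pool into machine-labeled and human-queried items.

    `confidences` maps an item id to the classifier's confidence in its
    own prediction (hypothetical values, not from the paper).
    """
    # Self-learning side: trust the model's label where confidence is high.
    machine = sorted(t for t, c in confidences.items() if c >= high)
    # Active-learning side: send the least confident remaining items to a
    # human annotator, up to the annotation budget.
    rest = sorted((t for t, c in confidences.items() if c < high),
                  key=lambda t: (confidences[t], t))
    human = rest[:budget]
    return machine, human


pool = {"t1": 0.97, "t2": 0.41, "t3": 0.88, "t4": 0.55, "t5": 0.93}
machine, human = split_pool(pool)
print(machine)  # ['t1', 't5']
print(human)    # ['t2', 't4']
```

Items that fall in neither group (here `t3`) simply stay in the pool for a later round.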

Information Sciences | 2018

An influence analysis of diversity and collective cardinality on collective performance

Van Du Nguyen; Ngoc Thanh Nguyen

This paper presents a general framework to demonstrate the prominent role of diversity in the effectiveness of collective performance. There appears to be ample evidence that diversity is one of the essential criteria for a collective to be intelligent. Intuitively, a collective involving diverse individuals may add new information, new perspectives, and so forth on the problem that needs to be solved. Moreover, the diversity of individual solutions to the given problem has been proven helpful in eliminating the phenomenon of correlated errors. The objective of the paper is to investigate the influence of the latter kind of diversity on collective performance by taking into account the collective cardinality. Our findings qualify the positive impact of diversity on collective performance. In particular, collectives with higher diversity levels achieve better collective performance. Subsequently, expanding the collective cardinality in a way that increases its diversity is also positively associated with collective performance. With some restrictions, the hypothesis “the more diverse the collective, the higher the collective performance” is formally proved. Furthermore, the conditions under which increasing the cardinality of a collective will cause its diversity to increase (or decrease) are worked out.
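One well-known way to see why diversity of individual solutions can help is the squared-error identity for averaged numeric predictions: collective error equals average individual error minus diversity. The sketch below checks that identity numerically. The squared-error definitions, the function names, and the sample values are assumptions for illustration; the paper's own formal measures of diversity and performance may differ.

```python
from typing import List


def collective_error(preds: List[float], truth: float) -> float:
    """Squared error of the mean prediction (the collective's answer)."""
    mean = sum(preds) / len(preds)
    return (mean - truth) ** 2


def average_individual_error(preds: List[float], truth: float) -> float:
    """Mean squared error of the individual predictions."""
    return sum((p - truth) ** 2 for p in preds) / len(preds)


def diversity(preds: List[float]) -> float:
    """Mean squared deviation of individual predictions from their mean."""
    mean = sum(preds) / len(preds)
    return sum((p - mean) ** 2 for p in preds) / len(preds)


preds = [8.0, 10.0, 15.0]  # hypothetical individual predictions
truth = 12.0               # hypothetical true value

lhs = collective_error(preds, truth)
rhs = average_individual_error(preds, truth) - diversity(preds)
print(abs(lhs - rhs) < 1e-9)  # True
```

Under these definitions, for a fixed average individual error, a more diverse collective has a smaller collective error.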

Asian Conference on Intelligent Information and Database Systems | 2017

A Consensus-Based Method to Enhance a Recommendation System for Research Collaboration

Dinh Tuyen Hoang; Van Cuong Tran; Tuong Tri Nguyen; Ngoc Thanh Nguyen; Dosam Hwang

With the development of scientific societies, research problems are increasingly complex, requiring scientists to collaborate to solve them. The quality of collaboration between researchers is a major factor in determining their achievements. This study proposes a collaboration recommendation method that takes into account previous research collaboration and research similarities. Research collaboration is measured by combining the collaboration time and the number of co-authors who already collaborated with an author. Research similarity is based on authors’ previous publications and academic events they attended. In addition, a consensus-based algorithm is proposed to integrate bibliography data from different sources, such as the DBLP Computer Science Bibliography, ResearchGate, CiteSeer, and Google Scholar. The experimental results show that this proposal improves the accuracy of the recommendation systems, in comparison with other methods.

MISSI | 2017

Active Learning-Based Approach for Named Entity Recognition on Short Text Streams

Cuong Van Tran; Tuong Tri Nguyen; Dinh Tuyen Hoang; Dosam Hwang; Ngoc Thanh Nguyen

The named entity recognition (NER) problem has an important role in many natural language processing (NLP) applications and is one of the fundamental tasks for building NLP systems. Supervised learning methods can achieve high performance but they require a large amount of training data that is time-consuming and expensive to obtain. Active learning (AL) is well-suited to many problems in NLP, where unlabeled data may be abundant but labeled data is limited. The AL method aims to minimize annotation costs while maximizing the desired performance from the model. This study proposes a method to classify named entities from Tweet streams on Twitter by using an AL method with different query strategies. The samples were queried for labeling by human annotators based on query by committee and diversity-based querying. The experiments evaluated the proposed method on Tweet data and achieved promising results that proved better than the baseline.
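Query by committee is commonly scored with vote entropy: the more the committee members disagree on a sample's label, the more informative the sample is assumed to be. The sketch below ranks samples that way. The abstract does not say which disagreement measure the authors used, so vote entropy is an assumption here, and the tweet ids and entity labels are made up.

```python
import math
from collections import Counter
from typing import Dict, List


def vote_entropy(votes: List[str]) -> float:
    """Entropy of the labels voted by committee members for one sample."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# Hypothetical votes from a three-member committee.
committee_votes: Dict[str, List[str]] = {
    "tweet_a": ["PER", "PER", "PER"],
    "tweet_b": ["PER", "LOC", "ORG"],
    "tweet_c": ["LOC", "LOC", "ORG"],
}

# Samples with the highest disagreement are queried first.
ranked = sorted(committee_votes,
                key=lambda s: vote_entropy(committee_votes[s]),
                reverse=True)
print(ranked)  # ['tweet_b', 'tweet_c', 'tweet_a']
```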

MISSI | 2017

An Effective Collaborative Filtering Based Method for Movie Recommendation

Rafał Palak; Ngoc Thanh Nguyen

The collaborative filtering approach is one of the most widely used in recommendation processes. The main problems of this approach are its complexity and scalability. This paper presents an effective method for movie recommendation based on collaborative filtering. We show that the computational complexity of our method is lower than that of a method known from the literature, worked out by Lekakos and Caravelas (Multimedia Tools Appl 36(1–2):55–70, 2006).

Cybernetics and Systems | 2017

Text Clustering Using Frequent Weighted Utility Itemsets

Tram Tran; Bay Vo; Tho Thi Ngoc Le; Ngoc Thanh Nguyen

Text clustering is an important topic in text mining. One of the most effective methods for text clustering is an approach based on frequent itemsets (FIs), and there are many related algorithms that aim to improve the accuracy of text clustering. However, these do not focus on the weights of terms in documents, even though the frequency of each term in each document has a great impact on the results. In this work, we propose a new method for text clustering based on frequent weighted utility itemsets (FWUI). First, we calculate the Term Frequency (TF) for each term in documents to create a weight matrix for all documents. The weights of terms in documents are based on the Inverse Document Frequency. Next, we use the Modification Weighted Itemset Tidset (MWIT)-FWUI algorithm for mining FWUI from a number matrix and the weights of terms in documents. Finally, based on frequent utility itemsets, we cluster documents using the MC (Maximum Capturing) algorithm. The proposed method has been evaluated on three data sets consisting of 1,600 documents covering 16 topics. The experimental results show that our method, using FWUI, improves the accuracy of text clustering compared to methods using FIs.
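The first two steps, a per-document TF matrix and IDF-based term weights, can be sketched as follows. This covers only the weighting stage, not the MWIT-FWUI mining or the MC clustering. The normalization of TF by document length, the unsmoothed `log(N / df)` form of IDF, the function names, and the toy documents are assumptions; the abstract does not give the exact formulas. Documents are assumed to be non-empty token lists.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_matrix(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Term frequency per document, normalized by document length."""
    return [{t: n / len(doc) for t, n in Counter(doc).items()} for doc in docs]


def idf_weights(docs: List[List[str]]) -> Dict[str, float]:
    """Inverse document frequency per term: log(N / document frequency)."""
    n_docs = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log(n_docs / d) for t, d in df.items()}


# Hypothetical tokenized documents.
docs = [
    ["text", "mining", "text"],
    ["text", "clustering"],
    ["itemset", "mining"],
]

tf = tf_matrix(docs)
idf = idf_weights(docs)
print(round(tf[0]["text"], 3))     # 0.667
print(round(idf["itemset"], 3))    # 1.099
```

A term that appears in every document would get weight 0 under this unsmoothed form, which is one reason practical systems often use a smoothed variant.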

International Conference on Computational Collective Intelligence | 2017

Intelligent Collective: The Role of Diversity and Collective Cardinality

Van Du Nguyen; Mercedes G. Merayo; Ngoc Thanh Nguyen

There appears to be ample evidence that collectives can be intelligent if they satisfy diversity, independence, decentralization, and aggregation. Although many measures have been proposed to evaluate the quality of collective prediction, they may not adequately reflect the intelligence degree of a collective. This is because they take into account either the accuracy of the collective prediction or the comparison between the capability of a collective and those of its members in solving a given problem. In this paper, we first introduce a new function that measures the intelligence degree of a collective. We then carry out simulation experiments to determine the impact of diversity on the intelligence degree of a collective by taking into account its cardinality. Our findings reveal that diversity plays a major role in leading a collective to be intelligent. Moreover, the simulation results also indicate a case in which an increase in the cardinality of a collective does not cause any significant increase in its intelligence degree.

Asian Conference on Intelligent Information and Database Systems | 2017

A Hybrid Method for Named Entity Recognition on Tweet Streams

Van Cuong Tran; Dinh Tuyen Hoang; Ngoc Thanh Nguyen; Dosam Hwang

Information extraction from microblogs has recently attracted researchers in the fields of knowledge discovery and data mining owing to the short nature of such texts. Annotating data is one of the significant issues in applying machine learning approaches to these sources. Active learning (AL) and semi-supervised learning (SSL) are two distinct approaches to reducing annotation costs. The SSL approach exploits high-confidence samples, while AL queries the most informative samples. Thus, they can produce better results when jointly applied. This paper proposes a combination of AL and SSL to reduce the labeling effort for named entity recognition (NER) on tweet streams by using both machine-labeled and manually labeled data. The AL query algorithms select the most informative samples to be labeled by a human annotator. In addition, a Conditional Random Field (CRF) is chosen as the underlying model to select high-confidence samples. The experimental results on a tweet dataset demonstrate that the proposed method achieves promising results in reducing the human labeling effort and that it can significantly improve the performance of NER systems.

Cybernetics and Systems | 2017

Improving Academic Event Recommendation Using Research Similarity and Interaction Strength Between Authors

Dinh Tuyen Hoang; Van Cuong Tran; Van Du Nguyen; Ngoc Thanh Nguyen; Dosam Hwang

The scientific community is growing very quickly. Every year a huge number of academic events (conferences, workshops, symposiums, etc.) are organized around the world. It is therefore difficult for researchers to find information about the events in which they may be interested. In this study, we present an improvement to existing academic event recommendation methods by taking into account research similarity and interaction strength between authors. By means of experimental analysis on data from the DBLP Computer Science Bibliography and Wiki Calls for Papers (WikiCFP), we show that the proposed method improves the accuracy of the recommendations in comparison with other methods.

Web Intelligence, Mining and Semantics | 2018

A Comparative Study of Methods for Collective Prediction Determination Using Interval Estimates

Van Du Nguyen; Hai Bang Truong; Trong Hai Duong; Mercedes G. Merayo; Ngoc Thanh Nguyen

Recently, research on the Wisdom of Crowd (WoC) has been widely expanded by supporting interval values as an additional representation of underlying predictions. Accordingly, instead of giving single values, individuals can express their predictions on a given cognition problem in the form of interval values. For such a representation, many methods have been proposed for aggregating underlying predictions based on their midpoints. In this case, of course, the outputs of the proposed methods are single values. In some situations, however, an aggregated prediction in the form of an interval value can be a better representation of the underlying predictions. In the current study, we present a comparison of different approaches for aggregating individual predictions, including Interval Aggregation and MidPoint Aggregation. Experimental studies have been conducted to determine how different aggregation methods influence the quality of the obtained collective prediction.
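The difference between the two output forms can be shown with a small sketch: one function collapses each interval to its midpoint and returns a single value, the other keeps an interval as the collective prediction. Averaging the midpoints and averaging the endpoints are assumptions made for illustration; the abstract does not define Interval Aggregation or MidPoint Aggregation precisely, and the sample intervals are made up.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def midpoint_aggregation(intervals: List[Interval]) -> float:
    """Average of interval midpoints; the output is a single value."""
    return sum((lo + hi) / 2 for lo, hi in intervals) / len(intervals)


def interval_aggregation(intervals: List[Interval]) -> Interval:
    """Average lower and upper bounds separately; the output is an interval."""
    n = len(intervals)
    lower = sum(lo for lo, _ in intervals) / n
    upper = sum(hi for _, hi in intervals) / n
    return (lower, upper)


# Hypothetical interval predictions from three individuals.
predictions = [(10.0, 20.0), (12.0, 18.0), (14.0, 28.0)]

print(midpoint_aggregation(predictions))  # 17.0
print(interval_aggregation(predictions))  # (12.0, 22.0)
```

With endpoint averaging, the midpoint of the aggregated interval coincides with the midpoint-based result, but the interval also retains the average width of the individual predictions.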
