Khai Nguyen

Graduate University for Advanced Studies

Publication

Featured research published by Khai Nguyen.

international semantic technology conference | 2012

Interlinking Linked Data Sources Using a Domain-Independent System

Khai Nguyen; Ryutaro Ichise; Bac Le

Linked data interlinking is the discovery of all owl:sameAs links between given data sources. An owl:sameAs link declares that two instances co-refer to the same real-world object. Traditional methods compare two instances by predefined pairs of RDF predicates, and therefore they rely on the domain of the data. Recently, researchers have attempted to achieve domain independence by automatically building the linkage rules. However, these methods still require human-curated labeled data as input for the learning process. In this paper, we present SLINT+, an interlinking system that is training-free and domain-independent. SLINT+ finds the important predicates of each data source and combines them to form predicate alignments. The most useful alignments are then selected according to their confidence. Finally, SLINT+ uses the selected predicate alignments as the guide for generating candidates and matching instances. Experimental results show that our system is very efficient when interlinking data sources in 119 different domains. Considerable improvements in both precision and recall over recent systems are also reported.

knowledge discovery and data mining | 2012

Learning approach for domain-independent linked data instance matching

Khai Nguyen; Ryutaro Ichise; Hoai-Bac Le

Because almost all linked data sources are currently published by different providers, interlinking homogeneous instances of these sources is an important problem in data integration. Recently, instance matching has been used to identify owl:sameAs links between linked datasets. Previous approaches primarily use predefined maps of corresponding attributes, and most of them are limited to matching in specific domains. In this paper, we propose LFM, a learning-based instance matching system designed to be a reliable domain-independent matcher. First, we compute the similarity vectors between labeled pairs of instances without specifying the meaning of the RDF predicates. Then a learning process is applied to learn a tree classifier for predicting whether new pairs of instances are identical. Experiments demonstrate that our method achieves a 4% improvement in precision and recall over recent top-ranked matchers when a small amount of labeled data is used for learning.
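The first step described in this abstract, building a similarity vector without interpreting the predicates, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from LFM: instances are modeled as plain dictionaries, token-level Jaccard stands in for whatever similarity measures the system uses, and the names `jaccard` and `similarity_vector` are hypothetical. The resulting vector would then be passed to a classifier, which is omitted here.

```python
from itertools import product


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two strings (assumed measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def similarity_vector(x: dict, y: dict, preds_x: list, preds_y: list) -> list:
    """One score per (predicate of x, predicate of y) combination.

    No predicate is assumed to correspond to any other: every combination
    is compared, and a missing value is treated as an empty string.
    """
    return [jaccard(x.get(p, ""), y.get(q, "")) for p, q in product(preds_x, preds_y)]


# Hypothetical instances from two sources with different vocabularies.
a = {"rdfs:label": "Tokyo Tower", "dbo:location": "Minato Tokyo"}
b = {"foaf:name": "Tokyo Tower", "geo:place": "Tokyo"}

# A fixed predicate order keeps vector positions comparable across pairs.
vec = similarity_vector(a, b, sorted(a), sorted(b))
print(vec)
```

Because the vector has one position per predicate combination, a learner can discover which combinations are informative instead of being told in advance.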

international semantic technology conference | 2015

Heuristic-Based Configuration Learning for Linked Data Instance Matching

Khai Nguyen; Ryutaro Ichise

Instance matching in linked data has become increasingly important because of the rapid development of linked data. The goal of instance matching is to detect co-referent instances that refer to the same real-world objects. To detect such instances, instance matching systems use a configuration, which specifies the matching properties, similarity measures, and other settings of the matching process. For different repositories, the configuration is varied to adapt to the particular characteristics of the input. Therefore, the automation of configuration creation is very important. In this paper, we propose cLink, a supervised instance matching system for linked data. cLink is enhanced by a heuristic algorithm that learns the optimal configuration on the basis of the input repositories. We show that cLink can achieve effective performance even when given only a small amount of training data. Compared to previous configuration learning algorithms, our algorithm significantly improves the results. Compared to recent supervised systems, cLink is also consistently better on all tested datasets.

intelligent information systems | 2017

ScLink: supervised instance matching system for heterogeneous repositories

Khai Nguyen; Ryutaro Ichise

Instance matching is the task of finding co-referent instances that describe the same real-world object across two different repositories. For this problem, heterogeneity, that is, the differences in objects' attributes and repositories' schemas, is a challenging issue that limits the accuracy of existing solutions. In order to match the instances of heterogeneous repositories, a matching system can follow a configuration that specifies the equivalent properties, suitable similarity metrics, and other important parameters. This configuration can be created manually or automatically by learning methods. We present ScLink, an instance matching system that can generate a configuration automatically. ScLink incorporates two novel supervised learning algorithms, cLearn and minBlock. cLearn applies an apriori-like heuristic for finding the optimal combination of matching properties and similarity metrics. minBlock finds a blocking model, which aims at optimally reducing the pairwise alignments of instances between input repositories. In addition, ScLink introduces other techniques to address the scalability issue on large repositories. Experimental results on standard and very large datasets show that minBlock and cLearn are very effective and efficient. cLearn is also significantly better than existing configuration learning algorithms. It drastically boosts the accuracy of ScLink and makes the system outperform the state of the art, even when trained using a small amount of labeled data.
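To make the idea of blocking concrete, here is a minimal sketch of generic token blocking. It is not minBlock, whose blocking model is learned; the repositories, the property names, and the helpers `token_blocks` and `candidate_pairs` are all hypothetical. The sketch only shows how a blocking step reduces the number of pairwise comparisons between two repositories.

```python
from collections import defaultdict


def token_blocks(repo: dict, prop: str) -> dict:
    """Map each token of the chosen property to the ids of instances containing it."""
    blocks = defaultdict(set)
    for inst_id, inst in repo.items():
        for token in inst.get(prop, "").lower().split():
            blocks[token].add(inst_id)
    return blocks


def candidate_pairs(repo_a: dict, prop_a: str, repo_b: dict, prop_b: str) -> set:
    """Keep only cross-repository pairs that share at least one token."""
    blocks_a = token_blocks(repo_a, prop_a)
    blocks_b = token_blocks(repo_b, prop_b)
    pairs = set()
    for token in blocks_a.keys() & blocks_b.keys():
        for ia in blocks_a[token]:
            for ib in blocks_b[token]:
                pairs.add((ia, ib))
    return pairs


# Hypothetical repositories whose schemas name the same property differently.
repo_a = {"a1": {"name": "Tokyo Tower"}, "a2": {"name": "Mount Fuji"}}
repo_b = {
    "b1": {"label": "tokyo tower"},
    "b2": {"label": "Fuji"},
    "b3": {"label": "Kyoto Station"},
}

pairs = candidate_pairs(repo_a, "name", repo_b, "label")
print(sorted(pairs))  # 2 candidate pairs instead of the 6 in the full cross product
```

A fixed blocking key like this one must be chosen by hand for each pair of repositories, which is the kind of manual step that a learned blocking model aims to avoid.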

International Journal on Semantic Web and Information Systems | 2017

Automatic Schema-Independent Linked Data Instance Matching System

Khai Nguyen; Ryutaro Ichise

Twitter is one of the most popular microblog service providers. On this microblogging platform, users use hashtags to categorize their tweets and to join communities around particular topics. However, the percentage of messages incorporating hashtags is small, and hashtag usage is very heterogeneous, as users may spend a lot of time searching for appropriate hashtags for their messages. In this paper, the authors present an approach for hashtag recommendation in microblogging platforms by leveraging semantic features. Moreover, they conduct a detailed study on how the semantic-based model influences the final recommended hashtags using different ranking strategies. Because users are interested in fresh and specific hashtags due to the rapid growth of microblogs, the authors also propose a time popularity ranking strategy. Furthermore, they study the combination of these ranking strategies. The experimental results, obtained on a large dataset, show that their approach improves lexical and semantic based recommendation by more than 11% and 7%, respectively, when recommending 5 hashtags.

biomedical engineering systems and technologies | 2018

Clinical Ontology Mapping - Toward Automatic Care Plan Recommendation

Khai Nguyen; Kaisei Reio; Ryutaro Ichise

In this paper, we share a sketch of an automatic care plan recommendation system in Japan. After that, we describe our proposed method and experience in the first step: clinical ontology mapping. We discuss the difficulties, the method, and the preliminary results of a case study, which is to find corresponding mappings between two ontologies: the Minimum Data Set 3.0 (MDS) and the International Classification of Functioning, Disability and Health (ICF).

asian conference on intelligent information and database systems | 2018

A Novel Method to Predict Type for DBpedia Entity

Thi-Nhu Nguyen; Hideaki Takeda; Khai Nguyen; Ryutaro Ichise; Tuan-Dung Cao

DBpedia is a large-scale knowledge base that extracts information from Wikipedia and makes it available using Semantic Web and Linked Data principles. Thanks to crowd-sourcing, it currently covers multiple domains in many languages. Knowledge is obtained from different Wikipedia editions through the effort of contributors around the world, whose goal is to manually generate mappings from Wikipedia templates to DBpedia ontology classes (types). However, this practice causes type inconsistency for an entity among different languages. As a result, the quality of data in DBpedia can be affected. In this paper, we state the problem of type consistency for an entity across languages. As a solution to this problem, we propose a method to predict the entity type based on a novel conformity measure. We also evaluate our method on a database built by aggregating multilingual resources and compare it with human perception in predicting the type of an entity. The experimental results show that our method can suggest informative types and outperforms the baselines.

ieee international conference semantic computing | 2017

Enhancing Coreference Classifiers Using a Ranking-Aware Feature

Khai Nguyen; Ryutaro Ichise

A coreference refers to different instances of the same real-world entity. Coreference classification is an important problem in knowledge and data management. The basic idea is to predict whether two instances are matched or non-matched, based on their similarity vector. Previous efforts on coreference classification share a common weakness: they are unaware of the ambiguity variation among different clusters of instance-pairs. We discuss the importance of cluster-wise local ranking of instance-pairs, which is an effective solution for the ambiguity variation. We then study the possibility of including the ranking factor in a classifier globally trained on all clusters. Finally, we propose to include an extra element in the original similarity vector of the instance-pairs. This extra element is a ranking-aware feature, which represents the preference of an instance-pair against its cluster. The experimental results confirm that the proposed feature significantly boosts the performance of many classifiers.
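The idea of appending a ranking-aware element to each similarity vector can be sketched as follows. The abstract does not define the feature, so this sketch makes its own assumptions: a cluster is taken to be all candidate pairs that share the same source instance, each pair is scored by the mean of its vector, and the extra element is that score divided by the best score in the cluster. The function name `add_ranking_feature` is hypothetical.

```python
from collections import defaultdict


def add_ranking_feature(pairs: dict) -> dict:
    """Append a cluster-relative score to every similarity vector.

    `pairs` maps (source_id, target_id) to a non-empty list of floats.
    Pairs with the same source_id form one cluster (an assumption).
    """
    score = {}
    best = defaultdict(float)
    for (src, tgt), vec in pairs.items():
        score[(src, tgt)] = sum(vec) / len(vec)
        best[src] = max(best[src], score[(src, tgt)])

    extended = {}
    for key, vec in pairs.items():
        top = best[key[0]]
        relative = score[key] / top if top > 0 else 0.0
        extended[key] = vec + [relative]
    return extended


# Two pairs have identical raw vectors but sit in different clusters.
pairs = {
    ("a1", "b1"): [0.9, 0.7],
    ("a1", "b2"): [0.4, 0.4],
    ("a2", "b3"): [0.4, 0.4],
}
extended = add_ranking_feature(pairs)
for key in sorted(extended):
    print(key, extended[key])
```

In this example, ("a1", "b2") and ("a2", "b3") are indistinguishable by their raw vectors, but the extra element separates them: the first is a weak candidate within its cluster, while the second is the best candidate available in its own.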

international conference on ontology matching | 2012