Quoc Viet Hung Nguyen
Griffith University
Publications
Featured research published by Quoc Viet Hung Nguyen.
international conference on data engineering | 2014
Quoc Viet Hung Nguyen; Thanh Tam Nguyen; Zoltán Miklós; Karl Aberer; Avigdor Gal; Matthias Weidlich
Schema matching is the process of establishing correspondences between the attributes of database schemas for data integration purposes. Although several automatic schema matching tools have been developed, their results are often incomplete or erroneous. To obtain a correct set of correspondences, a human expert is usually required to validate the generated correspondences. We analyze this reconciliation process in a setting where a number of schemas need to be matched, in the presence of consistency expectations about the network of attribute correspondences. We develop a probabilistic model that helps to identify the most uncertain correspondences, thus allowing us to guide the expert's work and collect input about the most problematic cases. As the availability of such experts is often limited, we develop techniques that can construct a set of good-quality correspondences with high probability, even if the expert does not validate all the necessary correspondences. We demonstrate the efficiency of our techniques through extensive experimentation using real-world datasets.
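The abstract does not specify the probabilistic model; a minimal, hypothetical sketch of the general idea — ranking candidate correspondences by the entropy of their estimated match probability so the expert validates the most ambiguous ones first — could look like the following (all names and probabilities below are illustrative, not from the paper):

```python
import math

def entropy(p):
    """Binary entropy of an estimated match probability."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def most_uncertain_first(correspondences):
    """Order candidate correspondences so the most ambiguous ones
    (probability closest to 0.5) come first for expert validation."""
    return sorted(correspondences, key=lambda c: entropy(c[1]), reverse=True)

# Hypothetical candidates: (correspondence, estimated match probability)
candidates = [("a.id ~ b.key", 0.95),
              ("a.name ~ b.title", 0.55),
              ("a.dob ~ b.created", 0.10)]
print(most_uncertain_first(candidates)[0][0])  # "a.name ~ b.title"
```

Validating the highest-entropy correspondences first extracts the most information per expert question, which matters when expert availability is limited.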
international conference on data engineering | 2015
Thanh Tam Nguyen; Quoc Viet Hung Nguyen; Matthias Weidlich; Karl Aberer
The amount of information available on the Web has been growing dramatically, raising the importance of techniques for searching the Web. Recently, Web Tables emerged as a model that enables users to search for information in a structured way. However, effective presentation of results for Web Table search requires (1) selecting a ranking of tables that acknowledges the diversity within the search result; and (2) summarizing the information content of the selected tables concisely but meaningfully. In this paper, we formalize these requirements as the diversified table selection problem and the structured table summarization problem. We show that both problems are computationally intractable and, thus, present heuristic algorithms to solve them. For these algorithms, we prove salient performance guarantees, such as near-optimality, stability, and fairness. Our experiments with real-world collections of thousands of Web Tables highlight the scalability of our techniques. We achieve improvements of up to 50% in diversity and 10% in relevance over baselines for Web Table selection, and reduce the information loss induced by table summarization by up to 50%. In a user study, we observed that our techniques are preferred over alternative solutions.
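The abstract frames diversified table selection as intractable and solved by greedy heuristics. A generic MMR-style greedy sketch of the trade-off between relevance and redundancy — hypothetical names, weights, and toy data, not the paper's actual algorithm — might look like:

```python
def greedy_diverse_select(tables, relevance, sim, k, lam=0.5):
    """Greedy diversified selection: at each step pick the table with the
    best trade-off between relevance and similarity to tables already picked."""
    selected, remaining = [], list(tables)
    while remaining and len(selected) < k:
        def score(t):
            penalty = max((sim(t, s) for s in selected), default=0.0)
            return lam * relevance[t] - (1 - lam) * penalty
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: table B is nearly a duplicate of A, C is distinct but less relevant.
rel = {"A": 0.9, "B": 0.85, "C": 0.5}
pair_sim = {frozenset("AB"): 0.95, frozenset("AC"): 0.1, frozenset("BC"): 0.1}
sim = lambda x, y: pair_sim[frozenset((x, y))]
print(greedy_diverse_select(["A", "B", "C"], rel, sim, k=2))  # ['A', 'C']
```

The near-duplicate B is skipped in favor of C even though B is more relevant in isolation, which is exactly the diversity-over-redundancy behavior the abstract describes.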
international acm sigir conference on research and development in information retrieval | 2013
Quoc Viet Hung Nguyen; Thanh Tam Nguyen; Ngoc Tran Lam; Karl Aberer
As the volume of AI problems involving human knowledge is likely to soar, crowdsourcing has become essential in a wide range of world-wide-web applications. One of the biggest challenges of crowdsourcing is aggregating the answers collected from crowd workers, and thus many aggregation techniques have been proposed. However, given a new application, it is difficult for users to choose the best-suited technique as well as appropriate parameter values, since each of these techniques has distinct performance characteristics depending on various factors (e.g., worker expertise, question difficulty). In this paper, we develop a benchmarking tool that allows users to (i) simulate the crowd and (ii) evaluate aggregation techniques along different dimensions (accuracy, sensitivity to spammers, etc.). We believe that this tool can serve as a practical guideline for both researchers and software developers. While researchers can use our tool to assess existing or new techniques, developers can reuse its components to reduce development complexity.
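The tool benchmarks existing aggregation techniques rather than proposing one; the two simplest baselines any such benchmark would cover — plain majority voting and reliability-weighted voting — can be sketched as follows (the worker names and weights are hypothetical):

```python
from collections import Counter

def majority_vote(labels):
    """Baseline aggregation: the label chosen by the most workers wins."""
    return Counter(labels).most_common(1)[0][0]

def weighted_vote(answers, reliability):
    """Weight each worker's answer by an estimated reliability score,
    so suspected spammers (low weight) influence the result less."""
    scores = {}
    for worker, label in answers.items():
        scores[label] = scores.get(label, 0.0) + reliability.get(worker, 1.0)
    return max(scores, key=scores.get)

print(majority_vote(["cat", "cat", "dog"]))                    # cat
print(weighted_vote({"w1": "dog", "w2": "cat", "w3": "cat"},
                    {"w1": 5.0, "w2": 1.0, "w3": 1.0}))        # dog
```

The two baselines disagree exactly when worker reliability estimates matter, which is the kind of sensitivity (e.g., to spammers) the benchmark measures.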
international conference on data mining | 2017
Hongzhi Yin; Hongxu Chen; Xiaoshuai Sun; Hao Wang; Yang Wang; Quoc Viet Hung Nguyen
With the rapid rise of various e-commerce and social network platforms, users are generating large amounts of heterogeneous behavior data, such as purchase history, add-to-favorite, add-to-cart, and click activities, and this kind of user behavior data is usually binary, only reflecting a user's action or inaction (i.e., implicit feedback data). Tensor factorization is a promising means of modeling heterogeneous user behaviors by distinguishing different behavior types. However, ambiguity arises in the interpretation of the unobserved user behavior records, which mix both real negative examples and potential positive examples. Existing tensor factorization models either ignore unobserved examples or treat all of them as negative examples, leading to either poor prediction performance or huge computation cost. In addition, the distribution of positive examples w.r.t. behavior types is heavily skewed, so existing tensor factorization models are biased towards the behavior types with a large number of positive examples. In this paper, we propose a scalable probabilistic tensor factorization model (SPTF) for heterogeneous behavior data and develop a novel negative sampling technique to optimize SPTF by leveraging both observed and unobserved examples, with much lower computational cost and higher modeling accuracy. To overcome the heavy skewness of the behavior data distribution, we propose a novel adaptive ranking-based positive sampling approach to speed up model convergence and improve prediction accuracy for sparse behavior types. Our proposed optimization techniques enable SPTF to scale to large behavior datasets. Extensive experiments conducted on a large-scale e-commerce dataset show the superiority of our proposed SPTF model in terms of prediction accuracy and scalability.
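The key idea in the abstract — pairing each observed interaction with a few sampled unobserved ones instead of treating every unobserved cell as a negative — can be illustrated with a drastically simplified sketch (matrix rather than tensor, plain SGD on squared error; none of the names or constants below come from the paper):

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sgd_step(U, V, user, pos_item, items, lr=0.05, n_neg=2):
    """One SGD step: fit the observed interaction to 1 and only a few
    randomly sampled unobserved items to 0, rather than all of them."""
    def update(item, label):
        err = label - dot(U[user], V[item])
        for f in range(len(U[user])):
            u, v = U[user][f], V[item][f]
            U[user][f] += lr * err * v
            V[item][f] += lr * err * u
    update(pos_item, 1.0)
    negatives = [i for i in items if i != pos_item]
    for neg in random.sample(negatives, min(n_neg, len(negatives))):
        update(neg, 0.0)
```

Sampling only `n_neg` negatives per positive keeps the cost of each step constant in the catalog size, which is what makes this family of techniques scalable compared to treating every unobserved cell as negative.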
international conference on data engineering | 2015
Quoc Viet Hung Nguyen; Thanh Tam Nguyen; Vinh Tuan Chau; Tri Kurniawan Wijaya; Zoltán Miklós; Karl Aberer; Avigdor Gal; Matthias Weidlich
Schema matching supports data integration by establishing correspondences between the attributes of independently designed database schemas. In recent years, various tools for automatic pair-wise matching of schemas have been developed. Since the matching process is inherently uncertain, the correspondences generated by such tools are often validated by a human expert. In this work, we consider scenarios in which attribute correspondences are identified in a network of schemas and not only in a pairwise setting. Here, correspondences between different schemas are interrelated, so that incomplete and erroneous matching results propagate in the network and the validation of a correspondence by an expert has ripple effects. To analyse and reconcile such matchings in schema networks, we present the Schema Matching Analyzer and Reconciliation Tool (SMART). It allows for the definition of network-level integrity constraints for the matching and, based thereon, detects and visualizes inconsistencies of the matching. The tool also supports the reconciliation of a matching by guiding an expert in the validation process and by offering semi-automatic conflict-resolution techniques.
very large data bases | 2017
Quoc Viet Hung Nguyen; Chi Thang Duong; Thanh Tam Nguyen; Matthias Weidlich; Karl Aberer; Hongzhi Yin; Xiaofang Zhou
The amount of controversial issues being discussed on the Web has been growing dramatically. In articles, blogs, and wikis, people express their points of view in the form of arguments, i.e., claims that are supported by evidence. Discovery of arguments has a large potential for informing decision-making. However, argument discovery is hindered by the sheer amount of available Web data and its unstructured, free-text representation. The former calls for automatic text-mining approaches, whereas the latter implies a need for manual processing to extract the structure of arguments. In this paper, we propose a crowdsourcing-based approach to build a corpus of arguments, an argumentation base, thereby mediating the trade-off of automatic text-mining and manual processing in argument discovery. We develop an end-to-end process that minimizes the crowd cost while maximizing the quality of crowd answers by: (1) ranking argumentative texts, (2) pro-actively eliciting user input to extract arguments from these texts, and (3) aggregating heterogeneous crowd answers. Our experiments with real-world datasets highlight that our method discovers virtually all arguments in documents when processing only 25% of the text with more than 80% precision, using only 50% of the budget consumed by a baseline algorithm.
database systems for advanced applications | 2015
Quoc Viet Hung Nguyen; Son Thanh Do; Thanh Tam Nguyen; Karl Aberer
As the number of scientific papers published continues to soar, most modern paper management systems (e.g., ScienceWise, Mendeley, CiteULike) support tag-based retrieval: each paper is associated with a set of tags, allowing users to search for relevant papers by formulating tag-based queries against the system. One of the most critical issues in tag-based retrieval is that users often have difficulty precisely formulating their information needs. Addressing this issue, our paper tackles the problem of automatically suggesting new tags to users as they formulate a query. The set of tags is selected in a way that resolves query ambiguity in two respects: informativeness and diversity. While the former reduces user effort in finding the desired papers, the latter enhances the variety of information shown to users. Through studying the theoretical properties of this problem, we propose a heuristic algorithm with several salient performance guarantees. We also demonstrate the efficiency of our approach through extensive experimentation using real-world datasets.
international acm sigir conference on research and development in information retrieval | 2018
Weiqing Wang; Hongzhi Yin; Zi Huang; Qinyong Wang; Xingzhong Du; Quoc Viet Hung Nguyen
Studying recommender systems under streaming scenarios has become increasingly important because real-world applications produce data continuously and rapidly. However, most existing recommender systems are designed for an offline setting. Compared with traditional recommender systems, the large volume and high velocity of streaming data pose severe challenges for streaming recommender systems. In this paper, we investigate the problem of streaming recommendations being subject to higher input rates than they can immediately process with their available system resources (i.e., CPU and memory). In particular, we provide a principled framework called SPMF (Stream-centered Probabilistic Matrix Factorization), based on the BPR (Bayesian Personalized Ranking) optimization framework, for performing efficient ranking-based recommendations in streaming settings. Experiments on three real-world datasets illustrate the superiority of SPMF in online recommendations.
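SPMF builds on BPR; the core BPR-SGD update — raising the score of an observed item above that of a sampled unobserved item for the same user — looks roughly like this textbook sketch (not SPMF's stream-specific machinery; learning rate and regularization constants are illustrative):

```python
import math

def bpr_step(U, V, user, pos, neg, lr=0.05, reg=0.01):
    """One BPR step: increase score(user, pos) - score(user, neg)
    by gradient ascent on the log-sigmoid of that difference."""
    x = sum(u * (vp - vn) for u, vp, vn in zip(U[user], V[pos], V[neg]))
    g = 1.0 / (1.0 + math.exp(x))  # sigmoid(-x), the gradient weight
    for f in range(len(U[user])):
        u, vp, vn = U[user][f], V[pos][f], V[neg][f]
        U[user][f] += lr * (g * (vp - vn) - reg * u)
        V[pos][f] += lr * (g * u - reg * vp)
        V[neg][f] += lr * (-g * u - reg * vn)
```

Because each update touches only one user vector and two item vectors, it fits a streaming setting where interactions arrive one at a time and the model must be refreshed incrementally.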
australasian database conference | 2017
Chi Thang Duong; Quoc Viet Hung Nguyen; Sen Wang
With the advance of social media networks, people are sharing content at an unprecedented scale. This makes social networks such as microblogs an ideal place for spreading rumors. Although different types of information are available in a post on social media, traditional approaches to rumor detection leverage only the text of the post, which limits their accuracy. In this paper, we propose a provenance-aware approach based on recurrent neural networks that combines the provenance information and the text of the post itself to improve the accuracy of rumor detection. Experimental results on a real-world dataset show that our technique outperforms state-of-the-art approaches to rumor detection.
2012 IEEE RIVF International Conference on Computing & Communication Technologies, Research, Innovation, and Vision for the Future | 2012
Thanh Tam Nguyen; Quoc Viet Hung Nguyen; Thanh Tho Quan
Many matching tools (or matchers) have been developed to generate correspondences between the elements of two schemas. However, the performance of these matchers depends heavily on the domain in which they are applied: a tool may achieve the best performance in one domain but the worst in another. In this work we propose a combination technique that enhances mapping quality by merging several mappings. We rely on the well-known Stable Marriage (SM) approach to perform a suitable selection among multiple matching results. To reduce the complexity and increase the accuracy of the SM algorithm, we combine it with the Hyperlink-Induced Topic Search (HITS) algorithm, which shortlists good candidates for matching selection. We show empirically that the combined solution yields better results than individual matchers across various domains in terms of precision and recall.
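The Stable Marriage selection step can be sketched with the classical Gale-Shapley procedure, here applied to schema attributes whose preference lists would be ordered by matcher score (the attribute names and preferences below are toy data, and the HITS pre-filtering from the paper is omitted):

```python
def stable_matching(prefs_a, prefs_b):
    """Gale-Shapley: attributes of schema A 'propose' to attributes of
    schema B in preference order; each B-side attribute keeps the best
    proposal seen so far. Yields a stable one-to-one selection."""
    free = list(prefs_a)
    next_choice = {a: 0 for a in prefs_a}
    engaged = {}  # b -> a
    rank = {b: {a: i for i, a in enumerate(pref)} for b, pref in prefs_b.items()}
    while free:
        a = free.pop()
        b = prefs_a[a][next_choice[a]]
        next_choice[a] += 1
        if b not in engaged:
            engaged[b] = a
        elif rank[b][a] < rank[b][engaged[b]]:
            free.append(engaged[b])  # previous partner becomes free again
            engaged[b] = a
        else:
            free.append(a)
    return {a: b for b, a in engaged.items()}

# Toy preferences, as if ranked by matcher scores.
prefs_a = {"id": ["key", "name"], "title": ["name", "key"]}
prefs_b = {"key": ["id", "title"], "name": ["title", "id"]}
print(sorted(stable_matching(prefs_a, prefs_b).items()))
# [('id', 'key'), ('title', 'name')]
```

Stability here means no pair of attributes would both prefer each other over their assigned partners, which is a natural consistency criterion when merging conflicting matcher outputs.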