Thanh Tam Nguyen

École Polytechnique Fédérale de Lausanne

Publication

Featured research published by Thanh Tam Nguyen.

international conference on data engineering | 2014

Pay-as-you-go reconciliation in schema matching networks

Quoc Viet Hung Nguyen; Thanh Tam Nguyen; Zoltán Miklós; Karl Aberer; Avigdor Gal; Matthias Weidlich

Schema matching is the process of establishing correspondences between the attributes of database schemas for data integration purposes. Although several automatic schema matching tools have been developed, their results are often incomplete or erroneous. To obtain a correct set of correspondences, a human expert is usually required to validate the generated correspondences. We analyze this reconciliation process in a setting where a number of schemas need to be matched, in the presence of consistency expectations about the network of attribute correspondences. We develop a probabilistic model that helps to identify the most uncertain correspondences, thus allowing us to guide the expert's work and collect their input about the most problematic cases. As the availability of such experts is often limited, we develop techniques that can construct a set of good quality correspondences with a high probability, even if the expert does not validate all the necessary correspondences. We demonstrate the efficiency of our techniques through extensive experimentation using real-world datasets.
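The abstract does not spell out the probabilistic model, so the following is only a minimal sketch of the general idea of asking the expert about the most uncertain correspondences first. It assumes each candidate correspondence already carries a probability of being correct and ranks candidates by binary entropy; the function names and the example attribute names are hypothetical and not taken from the paper.

```python
import math


def binary_entropy(p):
    """Entropy (in bits) of a yes/no event with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def rank_by_uncertainty(probabilities):
    """Return correspondences ordered from most to least uncertain.

    `probabilities` maps a correspondence (any hashable key) to the
    assumed probability that the correspondence is correct.
    """
    return sorted(
        probabilities,
        key=lambda c: binary_entropy(probabilities[c]),
        reverse=True,
    )


# Hypothetical candidates produced by an automatic matcher.
candidates = {
    ("orders.cust_id", "clients.id"): 0.55,
    ("orders.date", "clients.signup"): 0.10,
    ("orders.total", "invoices.amount"): 0.95,
}
ranking = rank_by_uncertainty(candidates)
```

Under these assumptions, the correspondence with probability closest to 0.5 is presented to the expert first, since validating it removes the most uncertainty per question.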

international conference on data engineering | 2015

Result selection and summarization for Web Table search

Thanh Tam Nguyen; Quoc Viet Hung Nguyen; Matthias Weidlich; Karl Aberer

The amount of information available on the Web has been growing dramatically, raising the importance of techniques for searching the Web. Recently, Web Tables emerged as a model, which enables users to search for information in a structured way. However, effective presentation of results for Web Table search requires (1) selecting a ranking of tables that acknowledges the diversity within the search result; and (2) summarizing the information content of the selected tables concisely but meaningfully. In this paper, we formalize these requirements as the diversified table selection problem and the structured table summarization problem. We show that both problems are computationally intractable and, thus, present heuristic algorithms to solve them. For these algorithms, we prove salient performance guarantees, such as near-optimality, stability, and fairness. Our experiments with real-world collections of thousands of Web Tables highlight the scalability of our techniques. We achieve improvements of up to 50% in diversity and 10% in relevance over baselines for Web Table selection, and reduce the information loss induced by table summarization by up to 50%. In a user study, we observed that our techniques are preferred over alternative solutions.
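The abstract does not describe the heuristic algorithms themselves. As a rough illustration of what a relevance and diversity trade-off in table selection can look like, here is a generic greedy selection in the style of maximal marginal relevance. It is a swapped-in textbook technique, not the paper's method; representing a table by its set of column names, the Jaccard similarity, and the `trade_off` parameter are all assumptions made for this sketch.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of column names."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_diverse(tables, relevance, k, trade_off=0.5):
    """Greedily pick up to k tables, balancing relevance and diversity.

    `tables` maps a table id to its set of column names (an assumed,
    simplified representation); `relevance` maps a table id to a score.
    """
    selected = []
    remaining = set(tables)
    while remaining and len(selected) < k:

        def score(t):
            # Penalize similarity to the closest already-selected table.
            max_sim = max(
                (jaccard(tables[t], tables[s]) for s in selected),
                default=0.0,
            )
            return trade_off * relevance[t] - (1 - trade_off) * max_sim

        # Sort first so that ties are broken deterministically by id.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical search result: t1 and t2 are near-duplicates.
tables = {
    "t1": {"city", "population"},
    "t2": {"city", "population", "area"},
    "t3": {"team", "score"},
}
relevance = {"t1": 0.9, "t2": 0.85, "t3": 0.6}
top_two = select_diverse(tables, relevance, k=2)
```

In this toy input the second pick is the less relevant but dissimilar table `t3` rather than the near-duplicate `t2`, which is the behavior a diversity-aware ranking aims for.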

international acm sigir conference on research and development in information retrieval | 2013

BATC: a benchmark for aggregation techniques in crowdsourcing

Quoc Viet Hung Nguyen; Thanh Tam Nguyen; Ngoc Tran Lam; Karl Aberer

As the volume of AI problems involving human knowledge is likely to soar, crowdsourcing has become essential in a wide range of world-wide-web applications. One of the biggest challenges of crowdsourcing is aggregating the answers collected from crowd workers; thus, many aggregation techniques have been proposed. However, given a new application, it is difficult for users to choose the best-suited technique as well as appropriate parameter values, since each of these techniques has distinct performance characteristics depending on various factors (e.g., worker expertise, question difficulty). In this paper, we develop a benchmarking tool that allows users to (i) simulate the crowd and (ii) evaluate aggregation techniques along different aspects (accuracy, sensitivity to spammers, etc.). We believe that this tool will be able to serve as a practical guideline for both researchers and software developers. While researchers can use our tool to assess existing or new techniques, developers can reuse its components to reduce the development complexity.
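To make the two steps named in the abstract concrete, the sketch below simulates a crowd and evaluates one aggregation technique on it. It is not the BATC tool or its API: the per-worker accuracy model, the function names, and the choice of majority voting as the aggregation technique are assumptions chosen for illustration.

```python
import random
from collections import Counter


def simulate_crowd(truth, worker_accuracies, labels, seed=0):
    """Generate one answer per worker and question.

    Each worker answers correctly with their own fixed probability and
    otherwise picks a wrong label uniformly at random (assumed model).
    """
    rng = random.Random(seed)
    answers = {}
    for question, correct in truth.items():
        votes = []
        for acc in worker_accuracies:
            if rng.random() < acc:
                votes.append(correct)
            else:
                votes.append(rng.choice([l for l in labels if l != correct]))
        answers[question] = votes
    return answers


def majority_vote(answers):
    """Aggregate by taking the most frequent answer per question."""
    return {q: Counter(v).most_common(1)[0][0] for q, v in answers.items()}


def accuracy(predicted, truth):
    """Fraction of questions whose aggregated answer matches the truth."""
    return sum(predicted[q] == truth[q] for q in truth) / len(truth)


truth = {"q1": "yes", "q2": "no", "q3": "yes"}
# Two reliable workers and one hypothetical spammer who is always wrong.
crowd = simulate_crowd(truth, [1.0, 1.0, 0.0], labels=["yes", "no"])
result = accuracy(majority_vote(crowd), truth)
```

Varying the list of worker accuracies (for example, adding more low-accuracy workers) gives a simple way to probe how sensitive an aggregation technique is to spammers.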

international conference on data engineering | 2015

SMART: A tool for analyzing and reconciling schema matching networks

Quoc Viet Hung Nguyen; Thanh Tam Nguyen; Vinh Tuan Chau; Tri Kurniawan Wijaya; Zoltán Miklós; Karl Aberer; Avigdor Gal; Matthias Weidlich

Schema matching supports data integration by establishing correspondences between the attributes of independently designed database schemas. In recent years, various tools for automatic pair-wise matching of schemas have been developed. Since the matching process is inherently uncertain, the correspondences generated by such tools are often validated by a human expert. In this work, we consider scenarios in which attribute correspondences are identified in a network of schemas and not only in a pairwise setting. Here, correspondences between different schemas are interrelated, so that incomplete and erroneous matching results propagate in the network and the validation of a correspondence by an expert has ripple effects. To analyse and reconcile such matchings in schema networks, we present the Schema Matching Analyzer and Reconciliation Tool (SMART). It allows for the definition of network-level integrity constraints for the matching and, based thereon, detects and visualizes inconsistencies of the matching. The tool also supports the reconciliation of a matching by guiding an expert in the validation process and by offering semi-automatic conflict-resolution techniques.

very large data bases | 2017

Argument discovery via crowdsourcing

Quoc Viet Hung Nguyen; Chi Thang Duong; Thanh Tam Nguyen; Matthias Weidlich; Karl Aberer; Hongzhi Yin; Xiaofang Zhou

The number of controversial issues being discussed on the Web has been growing dramatically. In articles, blogs, and wikis, people express their points of view in the form of arguments, i.e., claims that are supported by evidence. Discovery of arguments has a large potential for informing decision-making. However, argument discovery is hindered by the sheer amount of available Web data and its unstructured, free-text representation. The former calls for automatic text-mining approaches, whereas the latter implies a need for manual processing to extract the structure of arguments. In this paper, we propose a crowdsourcing-based approach to build a corpus of arguments, an argumentation base, thereby mediating the trade-off between automatic text-mining and manual processing in argument discovery. We develop an end-to-end process that minimizes the crowd cost while maximizing the quality of crowd answers by: (1) ranking argumentative texts, (2) pro-actively eliciting user input to extract arguments from these texts, and (3) aggregating heterogeneous crowd answers. Our experiments with real-world datasets highlight that our method discovers virtually all arguments in documents when processing only 25% of the text with more than 80% precision, using only 50% of the budget consumed by a baseline algorithm.

database systems for advanced applications | 2015

Tag-based Paper Retrieval: Minimizing User Effort with Diversity Awareness

Quoc Viet Hung Nguyen; Son Thanh Do; Thanh Tam Nguyen; Karl Aberer

As the number of published scientific papers is likely to soar, most modern paper management systems (e.g., ScienceWise, Mendeley, CiteULike) support tag-based retrieval. In such systems, each paper is associated with a set of tags, allowing users to search for relevant papers by formulating tag-based queries against the system. One of the most critical issues in tag-based retrieval is that users often have difficulty precisely formulating their information need. Addressing this issue, our paper tackles the problem of automatically suggesting new tags to users as they formulate a query. The set of tags is selected in such a way that it resolves query ambiguity in two aspects: informativeness and diversity. While the former reduces user effort in finding the desired papers, the latter enhances the variety of information shown to the user. By studying the theoretical properties of this problem, we propose a heuristic-based algorithm with several salient performance guarantees. We also demonstrate the efficiency of our approach through extensive experimentation using real-world datasets.

2012 IEEE RIVF International Conference on Computing & Communication Technologies, Research, Innovation, and Vision for the Future | 2012

A Framework to Combine Multiple Matchers for Pair-Wise Schema Matching

Thanh Tam Nguyen; Quoc Viet Hung Nguyen; Thanh Tho Quan

Many matching tools (or matchers) have been developed to generate correspondences between the elements of two schemas. However, the performance of those matchers is highly dependent on the domains in which they are applied. A tool may achieve the best performance in one domain but the worst when applied in others. In this work, we propose a combination technique, which enhances mapping quality by merging several mappings. We rely on the well-known Stable Marriage (SM) approach to perform a suitable selection among multiple matching results. In order to reduce the complexity and increase the accuracy of the SM algorithm, we suggest combining it with the Hyperlink-Induced Topic Search (HITS) algorithm, which can reasonably filter out good candidates for matching selection. We show empirically that the combined solution yields better results than individual matchers in various domains in terms of precision and recall.
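As an illustration of the Stable Marriage step only, the sketch below runs a Gale-Shapley style procedure over a similarity matrix between the attributes of two schemas. It assumes the scores from several matchers have already been merged into one matrix, it omits the HITS-based candidate filtering entirely, and the attribute names and scores are hypothetical.

```python
def stable_matching(similarity):
    """One-to-one matching of source to target attributes.

    `similarity[s][t]` is the (assumed, pre-combined) score between
    source attribute s and target attribute t. Sources propose in order
    of decreasing score; a target keeps the best proposer seen so far.
    """
    sources = list(similarity)
    targets = list(next(iter(similarity.values())))
    prefs = {
        s: sorted(targets, key=lambda t: similarity[s][t], reverse=True)
        for s in sources
    }
    next_choice = {s: 0 for s in sources}
    engaged = {}  # target -> source currently holding it
    free = list(sources)
    while free:
        s = free.pop()
        if next_choice[s] >= len(targets):
            continue  # s has proposed to every target and stays unmatched
        t = prefs[s][next_choice[s]]
        next_choice[s] += 1
        if t not in engaged:
            engaged[t] = s
        elif similarity[s][t] > similarity[engaged[t]][t]:
            free.append(engaged[t])  # previous partner becomes free again
            engaged[t] = s
        else:
            free.append(s)  # rejected; s will try its next preference
    return {s: t for t, s in engaged.items()}


# Hypothetical combined similarity scores between two small schemas.
similarity = {
    "name": {"full_name": 0.9, "title": 0.6, "birth": 0.1},
    "dob": {"full_name": 0.1, "title": 0.2, "birth": 0.8},
    "job": {"full_name": 0.3, "title": 0.7, "birth": 0.2},
}
matching = stable_matching(similarity)
```

The result is stable in the usual sense: no source and target that are not matched to each other both score higher with each other than with their assigned partners.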

international conference on data engineering | 2018