Ruiming Tang

National University of Singapore

Publication

Featured research published by Ruiming Tang.

database and expert systems applications | 2013

What you Pay for is What you Get

Ruiming Tang; Dongxu Shao; Stéphane Bressan; Patrick Valduriez

In most data markets, prices are prescribed and accuracy is determined by the data. Instead, we consider a model in which accuracy can be traded for discounted prices: “what you pay for is what you get”.

database and expert systems applications | 2014

Get a Sample for a Discount

Ruiming Tang; Antoine Amarilli; Pierre Senellart; Stéphane Bressan

While price and data quality should define the major trade-off for consumers in data markets, prices are usually prescribed by vendors and data quality is not negotiable. In this paper we study a model where data quality can be traded for a discount. We focus on the case of XML documents and consider completeness as the quality dimension. In our setting, the data provider offers an XML document, and sets both the price of the document and a weight to each node of the document, depending on its potential worth. The data consumer proposes a price. If the proposed price is lower than that of the entire document, then the data consumer receives a sample, i.e., a random rooted subtree of the document whose selection depends on the discounted price and the weight of nodes. By requesting several samples, the data consumer can iteratively explore the data in the document. We show that the uniform random sampling of a rooted subtree with prescribed weight is unfortunately intractable. However, we are able to identify several practical cases that are tractable. The first case is uniform random sampling of a rooted subtree with prescribed size; the second case restricts to binary weights. For both these practical cases we present polynomial-time algorithms and explain how they can be integrated into an iterative exploratory sampling approach.
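The tractable case of sampling a rooted subtree with prescribed size can be illustrated with a standard counting-then-sampling dynamic program. This is a minimal sketch and not necessarily the algorithm from the paper. The tree encoding (a `children` dictionary) and the function names `count_tables` and `sample_subtree` are assumptions made for illustration. A rooted subtree is taken here to be a set of nodes that contains the root and, for every node it contains, that node's parent.

```python
import random

def count_tables(children, v, k, memo):
    """Fill memo[v] with prefix tables for node v (hypothetical helper).

    prefix[i][s] = number of subtrees rooted at v with exactly s nodes
    that only use v's first i children. The last table counts all of them.
    """
    if v in memo:
        return memo[v]
    prefix = [[0] * (k + 1)]
    if k >= 1:
        prefix[0][1] = 1  # the subtree made of v alone
    for c in children.get(v, []):
        child = count_tables(children, c, k, memo)[-1]
        prev, cur = prefix[-1], [0] * (k + 1)
        for s in range(k + 1):
            if prev[s]:
                cur[s] += prev[s]  # child c is left out entirely
                for j in range(1, k - s + 1):
                    cur[s + j] += prev[s] * child[j]  # c contributes j nodes
        prefix.append(cur)
    memo[v] = prefix
    return prefix

def sample_subtree(children, root, k, rng=random):
    """Return a uniformly random rooted subtree with exactly k nodes."""
    memo = {}
    if count_tables(children, root, k, memo)[-1][k] == 0:
        raise ValueError("no rooted subtree of that size")
    chosen = set()

    def pick(v, size):
        chosen.add(v)
        kids = children.get(v, [])
        prefix = memo[v]
        # Walk the children backwards, drawing how many nodes each receives
        # with probability proportional to the number of completions.
        for i in range(len(kids), 0, -1):
            child = memo[kids[i - 1]][-1]
            weights = [prefix[i - 1][size - j] * (1 if j == 0 else child[j])
                       for j in range(size)]
            j = rng.choices(range(size), weights=weights)[0]
            if j:
                pick(kids[i - 1], j)
            size -= j

    pick(root, k)
    return chosen
```

For a tree with n nodes the tables take O(n k^2) arithmetic operations to fill, which is polynomial in the size of the document. Each draw is weighted by the number of ways to complete the remaining choices, so every rooted subtree of size k is returned with the same probability.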

database and expert systems applications | 2012

A Framework for Conditioning Uncertain Relational Data

Ruiming Tang; Reynold Cheng; Huayu Wu; Stéphane Bressan

We propose a framework for representing conditioned probabilistic relational data. In this framework the existence of tuples in possible worlds is determined by Boolean expressions composed from elementary events. The probability of a possible world is computed from the probabilities associated with these elementary events. In addition, a set of global constraints conditions the database. Conditioning is the formalization of the process of adding knowledge to a database. Some worlds may be impossible given the constraints and the probabilities of possible worlds are accordingly re-defined. The new constraints can come from the observation of the existence or non-existence of a tuple, from the knowledge of a specific rule, such as the existence of an exclusive set of tuples, or from the knowledge of a general rule, such as a functional dependency. We are therefore interested in computing a concise representation of the possible worlds and their respective probabilities after the addition of new constraints, namely an equivalent probabilistic database instance without constraints after conditioning. We devise and present a general algorithm for this computation. Unfortunately, the general problem involves the simplification of general Boolean expressions and is NP-hard. We therefore identify specific practical families of constraints for which we devise and present efficient algorithms.
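To make the notion of conditioning concrete, the sketch below enumerates possible worlds by brute force, discards those that violate a constraint, and renormalizes the remaining probabilities. It is an illustration only and not the algorithm from the paper: it is exponential in the number of events, it assumes the elementary events are independent, and the function name `condition` and its input encoding are hypothetical.

```python
from itertools import product

def condition(event_probs, tuples, constraint):
    """Condition a tiny probabilistic database by enumeration.

    event_probs: dict mapping event name -> probability that it is true
                 (events are ASSUMED independent).
    tuples:      dict mapping tuple name -> function(assignment) -> bool,
                 the Boolean expression deciding whether the tuple exists.
    constraint:  function(frozenset of existing tuples) -> bool.
    Returns a dict mapping each surviving world to its conditional probability.
    """
    events = sorted(event_probs)
    worlds = {}
    for values in product([True, False], repeat=len(events)):
        assignment = dict(zip(events, values))
        p = 1.0
        for e, v in assignment.items():
            p *= event_probs[e] if v else 1.0 - event_probs[e]
        world = frozenset(t for t, expr in tuples.items() if expr(assignment))
        worlds[world] = worlds.get(world, 0.0) + p
    # Keep only worlds that satisfy the constraint, then renormalize.
    kept = {w: p for w, p in worlds.items() if constraint(w)}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("the constraint rules out every possible world")
    return {w: p / total for w, p in kept.items()}

# Two tuples, each tied to its own event, declared mutually exclusive.
result = condition(
    {"e1": 0.5, "e2": 0.5},
    {"t1": lambda a: a["e1"], "t2": lambda a: a["e2"]},
    lambda world: not {"t1", "t2"} <= world,
)
```

In this example the world containing both `t1` and `t2` becomes impossible, and the three remaining worlds share the probability mass equally.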

international conference on communications | 2015

An efficient and truthful pricing mechanism for team formation in crowdsourcing markets

Qing Liu; Tie Luo; Ruiming Tang; Stéphane Bressan

In a crowdsourcing market, a requester is looking to form a team of workers to perform a complex task that requires a variety of skills. Candidate workers advertise their certified skills and bid prices for their participation. We design four incentive mechanisms for selecting workers to form a valid team (that can complete the task) and determining each individual worker's payment. We examine profitability, individual rationality, computational efficiency, and truthfulness for each of the four mechanisms. Our analysis shows that TruTeam, one of the four mechanisms, is superior to the others, particularly due to its computational efficiency and truthfulness. Our extensive simulations confirm the analysis and demonstrate that TruTeam is an efficient and truthful pricing mechanism for team formation in crowdsourcing markets.

database systems for advanced applications | 2014

Integration of Web Sources Under Uncertainty and Dependencies Using Probabilistic XML

M. Lamine Ba; Sébastien Montenez; Ruiming Tang; Talel Abdessalem

We study in this vision paper the problem of integrating several web data sources under uncertainty and dependencies. We present a concrete application with web sources about objects in the maritime domain where uncertainties and dependencies are omnipresent. Uncertainties are mainly caused by imprecise information trackers and imperfect human knowledge. Dependencies come from the recurrent copying relationships occurring among the sources. We answer the issue of data integration in such a setting by reformulating it as the merge of several uncertain versions of the same global XML document. As an initial result, we put forward a probabilistic XML data integration model by getting some intuitions from the versioning model with uncertain data we proposed in [5]. We explain how this model can be used for materializing the integration outcome.

database and expert systems applications | 2013

The Price Is Right

Ruiming Tang; Huayu Wu; Zhifeng Bao; Stéphane Bressan; Patrick Valduriez

Data is a modern commodity. Yet the pricing models in use on electronic data markets either focus on the usage of computing resources, or are proprietary, opaque, most likely ad hoc, and not conducive to healthy commodity market dynamics. In this paper we propose a generic data pricing model that is based on minimal provenance, i.e. minimal sets of tuples contributing to the result of a query. We show that the proposed model fulfills desirable properties such as contribution monotonicity, bounded-price and contribution arbitrage-freedom. We present a baseline algorithm to compute the exact price of a query based on our pricing model. We show that the problem is NP-hard. We therefore devise, present and compare several heuristics. We conduct a comprehensive experimental study to show their effectiveness and efficiency.

database systems for advanced applications | 2014

Conditioning Probabilistic Relational Data with Referential Constraints

Ruiming Tang; Dongxu Shao; M. Lamine Ba; Huayu Wu

A probabilistic relational database is a compact form of a set of deterministic relational databases (namely, possible worlds), each of which has a probability. In our framework, the existence of tuples is determined by associated Boolean formulae based on elementary events. An estimation, within such a setting, of the probabilities of possible worlds uses a prior probability distribution specified over the elementary events. Direct observations and general knowledge, in the form of constraints, help refine these probabilities, possibly ruling out some possible worlds. More precisely, new constraints can translate the observation of the existence or non-existence of a tuple, or the knowledge of a well-defined rule, such as a primary key constraint, a foreign key constraint, or a referential constraint. Informally, the process of enforcing knowledge on a probabilistic database, which consists of computing a new subset of valid possible worlds together with their new (conditional) probabilities, is called conditioning. In this paper, we are interested in finding a new probabilistic relational database after conditioning with referential constraints involved. In the most general case, conditioning is intractable. As a result, we restrict our study to probabilistic relational databases in which formulae of tuples are independent events in order to achieve some tractability results. We devise and present polynomial algorithms for conditioning probabilistic relational databases with referential constraints.

database and expert systems applications | 2016

A Framework for Sampling-Based XML Data Pricing

Ruiming Tang; Antoine Amarilli; Pierre Senellart; Stéphane Bressan

While price and data quality should define the major trade-off for consumers in data markets, prices are usually prescribed by vendors and data quality is not negotiable. In this paper we study a model where data quality can be traded for a discount. We focus on the case of XML documents and consider completeness as the quality dimension. In our setting, the data provider offers an XML document, and sets both the price of the document and a weight to each node of the document, depending on its potential worth. The data consumer proposes a price. If the proposed price is lower than that of the entire document, then the data consumer receives a sample, i.e., a random rooted subtree of the document whose selection depends on the discounted price and the weight of nodes. By requesting several samples, the data consumer can iteratively explore the data in the document. We present a pseudo-polynomial time algorithm to select a rooted subtree with prescribed weight uniformly at random, but show that this problem is unfortunately intractable. Yet, we are able to identify several practical cases where our algorithm runs in polynomial time. The first case is uniform random sampling of a rooted subtree with prescribed size rather than weights; the second case restricts to binary weights. As a more challenging scenario for the sampling problem, we also study the uniform sampling of a rooted subtree of prescribed weight and prescribed height. We adapt our pseudo-polynomial time algorithm to this setting and identify tractable cases.

database and expert systems applications | 2012

A Hybrid Approach for General XML Query Processing

Huayu Wu; Ruiming Tang; Tok Wang Ling; Yong Zeng; Stéphane Bressan

The state-of-the-art XML twig pattern query processing algorithms focus on matching a single twig pattern to a document. However, many practical queries are modeled by multiple twig patterns with joins to link them. The output of twig pattern matching is tuples of labels, while the joins between twig patterns are based on values. The inefficiency of integrating label-based structural joins in twig pattern matching and value-based joins to link patterns becomes an obstacle preventing the structural join algorithms in the literature from being adopted in practical XML query processors. In this paper, we propose a hybrid approach to bridge this gap. In particular, we introduce both relational tables and inverted lists to organize values and elements respectively. General XML queries involving several twig patterns are processed by both data structures. We further analyze join order selection for a general query with both pattern matching and value-based join, which is essential for the generation of a good query plan.

web age information management | 2011

Measuring XML structured-ness with entropy

Ruiming Tang; Huayu Wu; Stéphane Bressan

XML is semi-structured. It can be used to annotate unstructured data, to represent structured data and almost anything in-between. Yet, it is unclear how to formally characterize, let alone quantify, the structured-ness of XML. In this paper we propose and evaluate entropy-based metrics for XML structured-ness. The metrics measure the structural uniformity of paths and subtrees, respectively. We empirically study the correlation of these metrics with real and synthetic data sets.
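As a rough illustration of what an entropy-based measure over paths could look like, the sketch below computes the Shannon entropy of the distribution of root-to-leaf tag paths in a document. The abstract does not define the metrics, so this particular formula, the choice of leaf paths, and the function name `path_entropy` are assumptions and should not be read as the paper's definition.

```python
import math
from collections import Counter
import xml.etree.ElementTree as ET

def path_entropy(xml_text):
    """Shannon entropy (in bits) of root-to-leaf tag paths (hypothetical metric)."""
    root = ET.fromstring(xml_text)
    counts = Counter()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        if len(node) == 0:  # leaf element: record its full tag path
            counts[path] += 1
        for child in node:
            walk(child, path)

    walk(root, "")
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

uniform = path_entropy("<a><b/><b/><b/></a>")  # every leaf shares one path
mixed = path_entropy("<a><b/><c/></a>")        # two equally frequent paths
```

Under this assumed definition, a document whose leaves all share the same path scores zero, and the score grows as the leaves spread over more distinct paths.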
