Yael Amsterdamer | Researchain

Publication

Featured research published by Yael Amsterdamer.

international conference on management of data | 2013

Crowd mining

Yael Amsterdamer; Yael Grossman; Tova Milo; Pierre Senellart

Harnessing a crowd of Web users for data collection has recently become a widespread phenomenon. A key challenge is that human knowledge forms an open world, and it is thus difficult to know what kind of information we should be looking for. Classic databases have addressed this problem by data mining techniques that identify interesting data patterns. These techniques, however, are not suitable for the crowd. This is mainly due to properties of the human memory, such as the tendency to remember simple trends and summaries rather than exact details. Following these observations, we develop here for the first time the foundations of crowd mining. We first define the formal settings. Based on these, we design a framework of generic components, used for choosing the best questions to ask the crowd and mining significant patterns from the answers. We suggest general implementations for these components, and test the resulting algorithms' performance on benchmarks that we designed for this purpose. Our algorithm consistently outperforms alternative baseline algorithms.

international conference on management of data | 2014

OASSIS: query driven crowd mining

Yael Amsterdamer; Susan B. Davidson; Tova Milo; Slava Novgorodov; Amit Somech

Crowd data sourcing is increasingly used to gather information from the crowd and to obtain recommendations. In this paper, we explore a novel approach that broadens crowd data sourcing by enabling users to pose general questions, to mine the crowd for potentially relevant data, and to receive concise, relevant answers that represent frequent, significant data patterns. Our approach is based on (1) a simple generic model that captures both ontological knowledge as well as the individual history or habits of crowd members from which frequent patterns are mined; (2) a query language in which users can declaratively specify their information needs and the data patterns of interest; (3) an efficient query evaluation algorithm, which enables mining semantically concise answers while minimizing the number of questions posed to the crowd; and (4) an implementation of these ideas that mines the crowd through an interactive user interface. Experimental results with both real-life crowd and synthetic data demonstrate the feasibility and effectiveness of the approach.

international conference on database theory | 2014

On the Complexity of Mining Itemsets from the Crowd Using Taxonomies

Antoine Amarilli; Yael Amsterdamer; Tova Milo

We study the problem of frequent itemset mining in domains where data is not recorded in a conventional database but only exists in human knowledge. We provide examples of such scenarios, and present a crowdsourcing model for them. The model uses the crowd as an oracle to find out whether an itemset is frequent or not, and relies on a known taxonomy of the item domain to guide the search for frequent itemsets. In the spirit of data mining with oracles, we analyze the complexity of this problem in terms of (i) crowd complexity, that measures the number of crowd questions required to identify the frequent itemsets; and (ii) computational complexity, that measures the computational effort required to choose the questions. We provide lower and upper complexity bounds in terms of the size and structure of the input taxonomy, as well as the size of a concise description of the output itemsets. We also provide constructive algorithms that achieve the upper bounds, and consider more efficient variants for practical situations.
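The oracle-and-taxonomy idea described above can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the paper's algorithm: it handles single items rather than itemsets, the taxonomy, the support values, and the names `mine_frequent_items` and `simulated_crowd` are all hypothetical, and the crowd is replaced by a lookup table. The sketch relies only on the monotonicity that a taxonomy suggests: a more specific item cannot be more frequent than its more general parent, so an infrequent item lets us skip its whole subtree without asking further questions.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical toy taxonomy: each item maps to its more specific children.
TAXONOMY: Dict[str, List[str]] = {
    "activity": ["sport", "reading"],
    "sport": ["running", "swimming"],
    "reading": ["fiction", "news"],
    "running": [],
    "swimming": [],
    "fiction": [],
    "news": [],
}


def mine_frequent_items(
    root: str,
    taxonomy: Dict[str, List[str]],
    is_frequent: Callable[[str], bool],
) -> Tuple[Set[str], int]:
    """Top-down search that asks the oracle about an item only if its parent is frequent.

    Returns the set of frequent items and the number of oracle questions asked
    (a stand-in for the crowd complexity mentioned in the abstract).
    """
    frequent: Set[str] = set()
    questions = 0
    stack = [root]
    while stack:
        item = stack.pop()
        questions += 1  # one crowd question per visited item
        if is_frequent(item):
            frequent.add(item)
            stack.extend(taxonomy[item])  # only now are the children worth asking about
        # otherwise the entire subtree below `item` is pruned
    return frequent, questions


# Assumption: a simulated crowd backed by made-up support values and a 0.4 threshold.
SIMULATED_SUPPORT = {
    "activity": 0.9, "sport": 0.6, "reading": 0.2,
    "running": 0.5, "swimming": 0.1, "fiction": 0.15, "news": 0.1,
}


def simulated_crowd(item: str) -> bool:
    return SIMULATED_SUPPORT[item] >= 0.4


frequent_items, questions_asked = mine_frequent_items("activity", TAXONOMY, simulated_crowd)
print(sorted(frequent_items), questions_asked)
```

In this toy run the search asks 5 questions for a taxonomy of 7 items, because the subtree under "reading" is never explored once "reading" itself is found infrequent.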

very large data bases | 2013

CrowdMiner: mining association rules from the crowd

Yael Amsterdamer; Yael Grossman; Tova Milo; Pierre Senellart

This demo presents CrowdMiner, a system enabling the mining of interesting data patterns from the crowd. While traditional data mining techniques have been used extensively for finding patterns in classic databases, they are not always suitable for the crowd, mainly because humans tend to remember only simple trends and summaries rather than exact details. To address this, CrowdMiner employs a novel crowd-mining algorithm, designed specifically for this context. The algorithm iteratively chooses appropriate questions to ask the crowd, while aiming to maximize the knowledge gain at each step. We demonstrate CrowdMiner through a Well-Being portal, constructed interactively by mining the crowd, and in particular the conference participants, for common health related practices and trends.

symposium on principles of database systems | 2011

On provenance minimization

Yael Amsterdamer; Daniel Deutch; Tova Milo; Val Tannen

Provenance information has been proved to be very effective in capturing the computational process performed by queries, and has been used extensively as the input to many advanced data management tools (e.g., view maintenance, trust assessment, or query answering in probabilistic databases). We study here the core of provenance information, namely the part of provenance that appears in the computation of every query equivalent to the given one. This provenance core is informative as it describes the part of the computational process that is inherent to the query. It is also useful as a compact input to the above-mentioned data management tools. We study algorithms that, given a query, compute an equivalent query that realizes the core provenance for all tuples in its result. We study these algorithms for queries of varying expressive power. Finally, we observe that, in general, one would not want to require database systems to evaluate a specific query that realizes the core provenance, but instead to be able to find, possibly off-line, the core provenance of a given tuple in the output (computed by an arbitrary equivalent query), without rewriting the query. We provide algorithms for such direct computation of the core provenance.

database systems for advanced applications | 2014

Uncertainty in Crowd Data Sourcing Under Structural Constraints

Antoine Amarilli; Yael Amsterdamer; Tova Milo

Applications extracting data from crowdsourcing platforms must deal with the uncertainty of crowd answers in two different ways: first, by deriving estimates of the correct value from the answers; second, by choosing crowd questions whose answers are expected to minimize this uncertainty relative to the overall data collection goal. Such problems are already challenging when we assume that questions are unrelated and answers are independent, but they are even more complicated when we assume that the unknown values follow hard structural constraints (such as monotonicity).

ACM Transactions on Database Systems | 2012

On Provenance Minimization

Yael Amsterdamer; Daniel Deutch; Tova Milo; Val Tannen

Provenance information has been proved to be very effective in capturing the computational process performed by queries, and has been used extensively as the input to many advanced data management tools (e.g., view maintenance, trust assessment, or query answering in probabilistic databases). We observe here that while different (set-)equivalent queries may admit different provenance expressions when evaluated on the same database, there is always some part of these expressions that is common to all. We refer to this part as the core provenance. In addition to being informative, the core provenance is also useful as a compact input to the aforementioned data management tools. We formally define the notion of core provenance. We study algorithms that, given a query, compute an equivalent (called p-minimal) query that for every input database, the provenance of every result tuple is the core provenance. We study such algorithms for queries of varying expressive power (namely conjunctive queries with disequalities and unions thereof). Finally, we observe that, in general, one would not want to require database systems to execute a specific p-minimal query, but instead to be able to find, possibly off-line, the core provenance of a given tuple in the output (computed by an arbitrary equivalent query), without reevaluating the query. We provide algorithms for such direct computation of the core provenance.

very large data bases | 2015

A natural language interface for querying general and individual knowledge

Yael Amsterdamer; Anna Kukliansky; Tova Milo

Many real-life scenarios require the joint analysis of general knowledge, which includes facts about the world, with individual knowledge, which relates to the opinions or habits of individuals. Recently developed crowd mining platforms, which were designed for such tasks, are a major step towards the solution. However, these platforms require users to specify their information needs in a formal, declarative language, which may be too complicated for naive users. To make the joint analysis of general and individual knowledge accessible to the public, it is desirable to provide an interface that translates the user questions, posed in natural language (NL), into the formal query languages that crowd mining platforms support. While the translation of NL questions to queries over conventional databases has been studied in previous work, a setting with mixed individual and general knowledge raises unique challenges. In particular, to support the distinct query constructs associated with these two types of knowledge, the NL question must be partitioned and translated using different means; yet eventually all the translated parts should be seamlessly combined into a well-formed query. To account for these challenges, we design and implement a modular translation framework that employs new solutions along with state-of-the-art NL parsing tools. The results of our experimental study, involving real user questions on various topics, demonstrate that our framework provides a high-quality translation for many questions that are not handled by previous translation tools.

international conference on management of data | 2015

NL2CM: A Natural Language Interface to Crowd Mining

Yael Amsterdamer; Anna Kukliansky; Tova Milo

The joint processing of general data, which can refer to objective data such as geographical locations, with individual data, which is related to the habits and opinions of individuals, is required in many real-life scenarios. For this purpose, crowd mining platforms combine searching knowledge bases for general data, with mining the crowd for individual, unrecorded data. Existing such platforms require queries to be stated in a formal language. To bridge the gap between naïve users, who are not familiar with formal query languages, and crowd mining platforms, we develop NL2CM, a prototype system which translates natural language (NL) questions into well-formed crowd mining queries. The mix of general and individual information needs raises unique challenges. In particular, the different types of needs must be identified and translated into separate query parts. To account for these challenges, we develop new, dedicated modules and embed them within the modular and easily extensible architecture of NL2CM. Some of the modules interact with the user during the translation process to resolve uncertainties and complete missing data. We demonstrate NL2CM by translating questions of the audience, in different domains, into OASSIS-QL, a crowd mining query language which is based on SPARQL.

international conference on database theory | 2012

Finding optimal probabilistic generators for XML collections

Serge Abiteboul; Yael Amsterdamer; Daniel Deutch; Tova Milo; Pierre Senellart

We study the problem of, given a corpus of XML documents and its schema, finding an optimal (generative) probabilistic model, where optimality here means maximizing the likelihood of the particular corpus to be generated. Focusing first on the structure of documents, we present an efficient algorithm for finding the best generative probabilistic model, in the absence of constraints. We further study the problem in the presence of integrity constraints, namely key, inclusion, and domain constraints. We study in this case two different kinds of generators. First, we consider a continuation-test generator that performs, while generating documents, tests of schema satisfiability; these tests prevent the generation of a document that violates the constraints but, as we will see, they are computationally expensive. We also study a restart generator that may generate an invalid document and, when this is the case, restarts and tries again. Finally, we consider the injection of data values into the structure, to obtain a full XML document. We study different approaches for generating these values.
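The restart generator mentioned above follows a generate-then-check pattern, which can be sketched as follows. This is only an illustration under assumptions that are not in the abstract: a "document" is reduced to a flat list of integer key values, the only integrity constraint is key uniqueness, the draws are uniform rather than coming from a learned probabilistic model, and the names `generate_once`, `satisfies_key_constraint`, and `restart_generator` are hypothetical.

```python
import random
from typing import List, Tuple


def generate_once(rng: random.Random, max_children: int = 4, key_domain: int = 5) -> List[int]:
    """Draw a candidate 'document': a random number of children, each with a random key."""
    n_children = rng.randint(1, max_children)
    return [rng.randrange(key_domain) for _ in range(n_children)]


def satisfies_key_constraint(keys: List[int]) -> bool:
    """Key constraint: no two children may share the same key value."""
    return len(set(keys)) == len(keys)


def restart_generator(rng: random.Random, max_restarts: int = 1000) -> Tuple[List[int], int]:
    """Generate candidates without any look-ahead and restart whenever one is invalid.

    Returns the first valid document and the number of attempts it took.
    """
    for attempt in range(1, max_restarts + 1):
        candidate = generate_once(rng)
        if satisfies_key_constraint(candidate):
            return candidate, attempt
    raise RuntimeError("no valid document found within the restart budget")


document, attempts = restart_generator(random.Random(0))
print(document, attempts)
```

The trade-off this pattern illustrates is the one the abstract hints at: each attempt is cheap because no satisfiability test runs during generation, but some attempts are wasted on invalid documents.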
