Batya Kenig

Technion – Israel Institute of Technology

Publication

Featured research published by Batya Kenig.

Information Systems | 2013

MFIBlocks: An effective blocking algorithm for entity resolution

Batya Kenig; Avigdor Gal

Entity resolution is the process of discovering groups of tuples that correspond to the same real-world entity. Blocking algorithms separate tuples into blocks that are likely to contain matching pairs. Tuning is a major challenge in the blocking process and in particular, high expertise is needed in contemporary blocking algorithms to construct a blocking key, based on which tuples are assigned to blocks. In this work, we introduce a blocking approach that avoids selecting a blocking key altogether, relieving the user from this difficult task. The approach is based on maximal frequent itemsets selection, allowing early evaluation of block quality based on the overall commonality of its members. A unique feature of the proposed algorithm is the use of prior knowledge of the estimated size of duplicate sets in enhancing the blocking accuracy. We report on a thorough empirical analysis, using common benchmarks of both real-world and synthetic datasets to exhibit the effectiveness and efficiency of our approach.
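The idea of deriving blocks from maximal frequent itemsets, rather than from a hand-built blocking key, can be illustrated with a toy sketch. This is not the MFIBlocks algorithm itself: the brute-force itemset search, the token representation of records, and the names `maximal_frequent_itemsets`, `blocks_from_itemsets`, and `min_support` are all assumptions made for illustration, and the block-quality scoring described in the abstract is omitted.

```python
from itertools import combinations

def maximal_frequent_itemsets(records, min_support):
    """Brute-force search for maximal frequent itemsets (toy inputs only).

    records: dict mapping a record id to a frozenset of tokens (hypothetical layout).
    Returns a list of (itemset, supporting record ids) pairs.
    """
    items = sorted(set().union(*records.values()))
    frequent = []
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            support = [rid for rid, tokens in records.items() if cset <= tokens]
            if len(support) >= min_support:
                frequent.append((cset, frozenset(support)))
    # An itemset is maximal if no frequent itemset is a proper superset of it.
    return [(s, sup) for s, sup in frequent
            if not any(s < other for other, _ in frequent)]

def blocks_from_itemsets(records, min_support=2):
    """Each maximal frequent itemset yields one block: the records that contain it."""
    return {sup for _, sup in maximal_frequent_itemsets(records, min_support)}

records = {
    "r1": frozenset({"john", "smith", "haifa"}),
    "r2": frozenset({"john", "smith", "hifa"}),   # typo in the city token
    "r3": frozenset({"jane", "doe", "haifa"}),
    "r4": frozenset({"jane", "doe", "haifa"}),
}
blocks = blocks_from_itemsets(records)
print(blocks)
```

In this example no blocking key is chosen: r1 and r2 end up together because they share the itemset {john, smith}, and r3 and r4 because they share {jane, doe, haifa}.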

Scalable Uncertainty Management | 2013

A New Class of Lineage Expressions over Probabilistic Databases Computable in P-Time

Batya Kenig; Avigdor Gal; Ofer Strichman

We study the problem of query evaluation over tuple-independent probabilistic databases. We define a new class of lineage expressions, called disjoint branch acyclic, and show that the probability of expressions in this class can be computed in P-time. This extends the class of lineage expressions for which evaluation can be performed in P-time. We achieve this extension with a novel use of junction trees to compute the probability of these lineage expressions.
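To make the underlying problem concrete, the sketch below computes the probability of a lineage expression over a tuple-independent database by enumerating every possible world. This is the exponential baseline, not the junction-tree method of the paper; the DNF encoding of the lineage and the names `lineage_probability`, `lineage`, and `prob` are assumptions made for illustration.

```python
from itertools import product

def lineage_probability(dnf, prob):
    """Exact probability that a DNF lineage is true, by enumerating all worlds.

    dnf:  list of clauses, each a tuple of tuple-variable names (a conjunction).
    prob: dict mapping each tuple variable to its independent marginal probability.
    Cost is exponential in the number of variables.
    """
    variables = sorted({v for clause in dnf for v in clause})
    total = 0.0
    for world in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, world))
        # The lineage holds if at least one conjunction is fully satisfied.
        if any(all(assignment[v] for v in clause) for clause in dnf):
            weight = 1.0
            for v in variables:
                weight *= prob[v] if assignment[v] else 1.0 - prob[v]
            total += weight
    return total

# Lineage (r1 AND s1) OR (r2 AND s2), every tuple present with probability 0.5.
lineage = [("r1", "s1"), ("r2", "s2")]
prob = {"r1": 0.5, "s1": 0.5, "r2": 0.5, "s2": 0.5}
p = lineage_probability(lineage, prob)
print(p)  # 1 - (1 - 0.25)**2 = 0.4375
```

Because the two conjunctions share no variables, the result can be checked by hand as 1 - (1 - 0.25)^2. Classes such as the one defined in the paper matter precisely because this enumeration does not scale.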

Symposium on Principles of Database Systems | 2017

Querying Probabilistic Preferences in Databases

Batya Kenig; Benny Kimelfeld; Haoyue Ping; Julia Stoyanovich

We propose a novel framework wherein probabilistic preferences can be naturally represented and analyzed in a probabilistic relational database. The framework augments the relational schema with a special type of relation symbol: a preference symbol. A deterministic instance of this symbol holds a collection of binary relations. Abstractly, the probabilistic variant is a probability space over databases of the augmented form (i.e., a probabilistic database). Effectively, each instance of a preference symbol can be represented as a collection of parametric preference distributions such as Mallows. We establish positive and negative complexity results for evaluating Conjunctive Queries (CQs) over databases where preferences are represented in the Repeated Insertion Model (RIM), Mallows being a special case. We show how CQ evaluation reduces to a novel inference problem (of independent interest) over RIM, and devise a solver with polynomial data complexity.
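As background for the Repeated Insertion Model mentioned above, the following sketch samples a ranking from RIM and shows the standard insertion probabilities under which RIM coincides with a Mallows model of dispersion phi. It illustrates the generative model only, not the paper's inference algorithm; the function names `sample_rim` and `mallows_insertion_probs` are hypothetical.

```python
import random

def mallows_insertion_probs(i, phi):
    """Insertion probabilities for the i-th reference item (1-based) under Mallows.

    Position j (1..i) receives weight phi**(i - j), so the last position,
    which agrees with the reference order, is the most likely when phi < 1.
    """
    weights = [phi ** (i - j) for j in range(1, i + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def sample_rim(reference, insertion_probs, rng):
    """Draw one ranking: insert the reference items one at a time.

    insertion_probs(i) must return i probabilities, one per insertion position.
    """
    ranking = []
    for i, item in enumerate(reference, start=1):
        probs = insertion_probs(i)
        position = rng.choices(range(i), weights=probs)[0]
        ranking.insert(position, item)
    return ranking

rng = random.Random(0)
reference = ["a", "b", "c", "d"]
sample = sample_rim(reference, lambda i: mallows_insertion_probs(i, 0.3), rng)
print(sample)
```

With phi = 0 every item is appended at the end, so the reference ranking is returned with certainty; with phi = 1 all positions are equally likely and the sampled ranking is uniform over permutations.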

Scalable Uncertainty Management | 2015

On the Impact of Junction-Tree Topology on Weighted Model Counting

Batya Kenig; Avigdor Gal

We present and evaluate the power of a new framework for weighted model counting and inference in graphical models, based on exploiting the topology of the junction tree representing the formula. The proposed approach uses the junction tree topology in order to craft a reduced set of partial assignments that are guaranteed to decompose the formula. We show that taking advantage of the junction tree structure, along with existing optimization methods borrowed from the CNF-SAT domain, can translate into significant time savings for weighted model counting algorithms.
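For readers unfamiliar with weighted model counting, here is a minimal recursive counter over a CNF formula that branches on one variable at a time. It uses a fixed variable order and no decomposition, so it does not implement the junction-tree-guided approach of the paper; the integer literal encoding (a negative number denotes a negated variable) and the names `condition` and `wmc` are assumptions made for illustration.

```python
def condition(cnf, literal):
    """Simplify a CNF under the assumption that `literal` is true."""
    result = []
    for clause in cnf:
        if literal in clause:
            continue                          # clause is satisfied, drop it
        result.append(clause - {-literal})    # the opposite literal is now false
    return result

def wmc(cnf, variables, weight):
    """Weighted model count by exhaustive branching on `variables` in order.

    cnf:       list of frozensets of non-zero ints.
    variables: list of positive ints still unassigned.
    weight:    dict mapping each literal (v and -v) to its weight.
    """
    if any(len(clause) == 0 for clause in cnf):
        return 0.0                            # an empty clause cannot be satisfied
    if not variables:
        return 1.0
    var, rest = variables[0], variables[1:]
    return (weight[var] * wmc(condition(cnf, var), rest, weight)
            + weight[-var] * wmc(condition(cnf, -var), rest, weight))

# (x1 OR x2) AND (NOT x1 OR x3)
cnf = [frozenset({1, 2}), frozenset({-1, 3})]
weight = {1: 0.3, -1: 0.7, 2: 0.6, -2: 0.4, 3: 0.5, -3: 0.5}
count = wmc(cnf, [1, 2, 3], weight)
print(count)  # 0.3 * 0.5 + 0.7 * 0.6 = 0.57
```

The order in which variables are branched on determines how quickly the formula falls apart into smaller pieces, which is the kind of choice the abstract proposes to guide with the junction tree.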

International Conference on Management of Data | 2018

A Query Engine for Probabilistic Preferences

Uzi Cohen; Batya Kenig; Haoyue Ping; Benny Kimelfeld; Julia Stoyanovich

Models of uncertain preferences, such as Mallows, have been extensively studied due to their plethora of application domains. In a recent work, a conceptual and theoretical framework has been proposed for supporting uncertain preferences as first-class citizens in a relational database. The resulting database is probabilistic, and, consequently, query evaluation entails inference of marginal probabilities of query answers. In this paper, we embark on the challenge of a practical realization of this framework. We first describe an implementation of a query engine that supports querying probabilistic preferences alongside relational data. Our system accommodates preference distributions in the general form of the Repeated Insertion Model (RIM), which generalizes Mallows and other models. We then devise a novel inference algorithm for conjunctive queries over RIM, and show that it significantly outperforms the state of the art in terms of both asymptotic and empirical execution cost. We also develop performance optimizations that are based on sharing computation among different inference tasks in the workload. Finally, we conduct an extensive experimental evaluation and demonstrate that clear performance benefits can be realized by a query engine with built-in probabilistic inference, as compared to a stand-alone implementation with a black-box inference solver.
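The kind of marginal probability such a query engine must infer can be shown on a tiny case: the probability that one item is preferred to another under a Mallows model. The sketch enumerates all permutations, so it is only a definitional baseline and not the inference algorithm of the paper; the names `kendall_tau` and `prob_prefers` are hypothetical.

```python
from itertools import combinations, permutations

def kendall_tau(ranking, reference):
    """Number of item pairs that `ranking` orders differently from `reference`."""
    pos = {item: k for k, item in enumerate(ranking)}
    return sum(1 for x, y in combinations(reference, 2) if pos[x] > pos[y])

def prob_prefers(reference, phi, a, b):
    """P(a is ranked before b) under Mallows(reference, phi), by enumeration.

    Each ranking has weight phi ** (Kendall tau distance to the reference);
    the cost is factorial in the number of items.
    """
    num = den = 0.0
    for ranking in permutations(reference):
        w = phi ** kendall_tau(ranking, reference)
        den += w
        if ranking.index(a) < ranking.index(b):
            num += w
    return num / den

p_two = prob_prefers(["a", "b"], 0.5, "a", "b")
print(p_two)  # 1 / (1 + 0.5)
```

With two items and phi = 0.5, the reference order has weight 1 and the reversed order has weight 0.5, giving 2/3. With phi = 1 the model is uniform and every pairwise preference has probability 0.5.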

Data Mining and Knowledge Discovery | 2012