Izchak Sharfman
Technion – Israel Institute of Technology
Publication
Featured research published by Izchak Sharfman.
ACM Transactions on Database Systems | 2007
Izchak Sharfman; Assaf Schuster; Daniel Keren
Monitoring data streams in a distributed system is the focus of much research in recent years. Most of the proposed schemes, however, deal with monitoring simple aggregated values, such as the frequency of appearance of items in the streams. More involved challenges, such as the important task of feature selection (e.g., by monitoring the information gain of various features), still require very high communication overhead using naive, centralized algorithms. We present a novel geometric approach which reduces monitoring the value of a function (vis-à-vis a threshold) to a set of constraints applied locally on each of the streams. The constraints are used to locally filter out data increments that do not affect the monitoring outcome, thus avoiding unnecessary communication. As a result, our approach enables monitoring of arbitrary threshold functions over distributed data streams in an efficient manner. We present experimental results on real-world data which demonstrate that our algorithms are highly scalable, and considerably reduce communication load in comparison to centralized algorithms.
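To make the idea of a local constraint concrete, here is a minimal sketch of one possible sphere-based test. It is an illustration under stated assumptions, not the paper's algorithm as given in the abstract: it assumes each node knows a shared estimate vector from the last synchronization and its own drift vector, and that the monitored function is the squared Euclidean norm, chosen only because its range over a ball has a closed form. All function names are hypothetical.

```python
import math

def ball_for_node(estimate, drift):
    """Ball whose diameter is the segment from the shared estimate to the node's drift vector."""
    center = [(e + u) / 2 for e, u in zip(estimate, drift)]
    radius = math.dist(estimate, drift) / 2
    return center, radius

def squared_norm_range_on_ball(center, radius):
    """Exact min and max of f(x) = ||x||^2 over the ball (assumed example function)."""
    c = math.hypot(*center)
    lo = max(0.0, c - radius) ** 2
    hi = (c + radius) ** 2
    return lo, hi

def local_constraint_holds(estimate, drift, threshold):
    """True if f stays on one side of the threshold everywhere in the node's ball."""
    center, radius = ball_for_node(estimate, drift)
    lo, hi = squared_norm_range_on_ball(center, radius)
    return hi <= threshold or lo > threshold

# A small drift keeps the ball below the threshold, so the node stays silent.
print(local_constraint_holds([1.0, 1.0], [1.2, 0.9], 4.0))
# A large drift makes the ball straddle the threshold, so the node would report.
print(local_constraint_holds([1.0, 1.0], [2.5, 1.0], 4.0))
```

In this sketch a node communicates only when its own test fails, which is the sense in which data increments that cannot change the outcome are filtered locally.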
IEEE Transactions on Knowledge and Data Engineering | 2012
Daniel Keren; Izchak Sharfman; Assaf Schuster; Avishay Livne
An important problem in distributed, dynamic databases is to continuously monitor the value of a function defined on the nodes, and check that it satisfies some threshold constraint. We introduce a monitoring method, based on a geometric interpretation of the problem, which enables the definition of local constraints at the nodes. It is guaranteed that as long as none of these constraints is violated, the value of the function has not crossed the threshold. We generalize previous work on geometric monitoring, and solve two problems which seriously hampered its performance: as opposed to the constraints used so far, which depend only on the current values of the local data, here we incorporate their temporal behavior. Also, the new constraints are tailored to the geometric properties of the specific monitored function. In addition, we extend the concept of safe zones for the monitoring problem, and show that previous work on geometric monitoring is a special case of the proposed extension. Experimental results on real data reveal that the new approach reduces communication by up to three orders of magnitude in comparison to existing approaches, and considerably narrows the gap between achievable results and a newly defined lower bound on communication complexity.
International Conference on Management of Data | 2012
Nikos Giatrakos; Antonios Deligiannakis; Minos N. Garofalakis; Izchak Sharfman; Assaf Schuster
Many modern streaming applications, such as online analysis of financial, network, sensor and other forms of data, are inherently distributed in nature. An important query type that is the focal point in such application scenarios is the actuation query, where proper action is dictated based on a trigger condition placed upon the current value that a monitored function receives. Recent work studies the problem of (non-linear) sophisticated function tracking in a distributed manner. The main concept behind the geometric monitoring approach proposed there is for each distributed site to perform the function monitoring over an appropriate subset of the input domain. In the current work, we examine whether the distributed monitoring mechanism can become more efficient, in terms of the number of communicated messages, by extending the geometric monitoring framework to utilize prediction models. We initially describe a number of local estimators (predictors) that are useful for the applications that we consider and which have already been shown particularly useful in past work. We then demonstrate the feasibility of incorporating predictors in the geometric monitoring framework and show that prediction-based geometric monitoring in fact generalizes the original geometric monitoring framework. We propose a large variety of different prediction-based monitoring models for the distributed threshold monitoring of complex functions. Our extensive experimentation with a variety of real data sets, functions and parameter settings indicates that our approaches can provide significant communication savings ranging between two times and up to three orders of magnitude, compared to the transmission cost of the original monitoring framework.
Symposium on Principles of Database Systems | 2008
Izchak Sharfman; Assaf Schuster; Daniel Keren
An important problem in distributed, dynamic databases is to continuously monitor the value of a function defined on the nodes, and check that it satisfies some threshold constraint. We introduce a monitoring method, based on a geometric interpretation of the problem, which enables the definition of local constraints at the nodes. It is guaranteed that as long as none of these constraints is violated, the value of the function has not crossed the threshold. We generalize previous work on geometric monitoring, and solve two problems which seriously hampered its performance: as opposed to the constraints used so far, which depend only on the current values of the local data, here we incorporate their temporal behavior. Also, the new constraints are tailored to the geometric properties of the specific monitored function. In addition, we extend the concept of safe zones for the monitoring problem, and show that previous work on geometric monitoring is a special case of the proposed extension. Experimental results on real data reveal that the new approach reduces communication by up to three orders of magnitude in comparison to existing approaches, and considerably narrows the gap between achievable results and a newly defined lower bound on communication complexity.
Very Large Data Bases | 2015
Arnon Lazerson; Izchak Sharfman; Daniel Keren; Assaf Schuster; Minos N. Garofalakis; Vasilis Samoladas
Emerging large-scale monitoring applications rely on continuous tracking of complex data-analysis queries over collections of massive, physically-distributed data streams. Thus, in addition to the space- and time-efficiency requirements of conventional stream processing (at each remote monitor site), effective solutions also need to guarantee communication efficiency (over the underlying communication network). The complexity of the monitored query adds to the difficulty of the problem; this is especially true for non-linear queries (e.g., joins), where no obvious solutions exist for distributing the monitored condition across sites. The recently proposed geometric method, based on the notion of covering spheres, offers a generic methodology for splitting an arbitrary (non-linear) global condition into a collection of local site constraints, and has been applied to massive distributed stream-monitoring tasks, achieving state-of-the-art performance. In this paper, we present a far more general geometric approach, based on the convex decomposition of an appropriate subset of the domain of the monitoring query, and formally prove that it is always guaranteed to perform at least as well as the covering spheres method. We analyze our approach and demonstrate its effectiveness for the important case of sketch-based approximate tracking for norm, range-aggregate, and join-aggregate queries, which have numerous applications in streaming data analysis. Experimental results on real-life data streams verify the superiority of our approach in practical settings, showing that it substantially outperforms the covering spheres method.
IEEE Transactions on Knowledge and Data Engineering | 2014
Daniel Keren; Guy Sagy; Amir Abboud; David Ben-David; Assaf Schuster; Izchak Sharfman; Antonios Deligiannakis
Interest in stream monitoring is shifting toward the distributed case. In many applications the data is high volume, dynamic, and distributed, making it infeasible to collect the distinct streams to a central node for processing. Often, the monitoring problem consists of determining whether the value of a global function, defined on the union of all streams, crossed a certain threshold. We wish to reduce communication by transforming the global monitoring to the testing of local constraints, checked independently at the nodes. Geometric monitoring (GM) proved useful for constructing such local constraints for general functions. Alas, in GM the constraints at all nodes share an identical structure and are thus unsuitable for handling heterogeneous streams. Therefore, we propose a general approach for monitoring heterogeneous streams (HGM), which defines constraints tailored to fit the data distributions at the nodes. While we prove that optimally selecting the constraints is NP-hard, we provide a practical solution, which reduces the running time by hierarchically clustering nodes with similar data distributions and then solving simpler optimization problems. We also present a method for efficiently recovering from local violations at the nodes. Experiments yield an improvement of over an order of magnitude in communication relative to GM.
Very Large Data Bases | 2010
Guy Sagy; Daniel Keren; Izchak Sharfman; Assaf Schuster
The goal of a threshold query is to detect all objects whose score exceeds a given threshold. This type of query is used in many settings, such as data mining, event triggering, and top-k selection. Often, threshold queries are performed over distributed data. Given database relations that are distributed over many nodes, an object's score is computed by aggregating the value of each attribute, applying a given scoring function over the aggregation, and thresholding the function's value. However, joining all the distributed relations to a central database might incur prohibitive overheads in bandwidth, CPU, and storage accesses. Efficient algorithms required to reduce these costs exist only for monotonic aggregation threshold queries and certain specific scoring functions. We present a novel approach for efficiently performing general distributed threshold queries. To the best of our knowledge, this is the first solution to the problem of performing such queries with general scoring functions. We first present a solution for monotonic functions, and then introduce a technique to solve for other functions by representing them as a difference of monotonic functions. Experiments with real-world data demonstrate the method's effectiveness in achieving low communication and access costs.
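The abstract does not spell out how a difference of monotonic functions is used, so the following is only a hedged sketch of why such a representation can help. It assumes a hypothetical non-monotonic score f(x) = 10x - x^2 on x >= 0, written as m1(x) - m2(x) with both parts increasing, and assumes that partial information about an object gives an interval [lo, hi] for its aggregated value. Monotonicity then yields cheap bounds on the score without knowing x exactly.

```python
def m1(x):
    """Increasing part of the assumed score: 10x."""
    return 10.0 * x

def m2(x):
    """Increasing part (for x >= 0) that is subtracted: x^2."""
    return x * x

def score_bounds(lo, hi):
    """Bounds on f(x) = m1(x) - m2(x) for any x in [lo, hi], with 0 <= lo <= hi."""
    lower = m1(lo) - m2(hi)
    upper = m1(hi) - m2(lo)
    return lower, upper

def classify(lo, hi, threshold):
    """Decide what can be concluded about an object from interval information alone."""
    lower, upper = score_bounds(lo, hi)
    if upper <= threshold:
        return "prune"       # the score cannot exceed the threshold
    if lower > threshold:
        return "report"      # the score certainly exceeds the threshold
    return "undecided"       # more data about this object is needed

print(score_bounds(2.0, 3.0))
print(classify(2.0, 3.0, 30.0), classify(2.0, 3.0, 10.0), classify(2.0, 3.0, 20.0))
```

The bounds are deliberately loose (the true range of f on [2, 3] is narrower), which is the trade-off for computing them from the endpoints only.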
Ambient Intelligence | 2014
Luk Knapen; Ansar-Ul-Haque Yasar; Sungjin Cho; Daniel Keren; Abed Abu Dbai; Tom Bellemans; Davy Janssens; Geert Wets; Assaf Schuster; Izchak Sharfman; Kanishka Bhaduri
An automatic service to match commuting trips has been designed. Candidate carpoolers register their personal profile and a set of periodically recurring trips. The Global CarPooling Matching Service shall advise registered candidates how to combine their commuting trips by carpooling. Planned periodic trips correspond to nodes in a graph; the edges are labeled with the probability of success while negotiating to merge two planned trips by carpooling. The probability values are calculated by a learning mechanism using on the one hand the registered person and trip characteristics and on the other hand the negotiation feedback. The probability values vary over time due to repetitive execution of the learning mechanism. As a consequence, the matcher needs to cope with a dynamically changing graph both with respect to topology and edge weights. In order to evaluate the matcher performance before deployment in the real world, it will be exercised using a large scale agent based model. This paper describes both the exercising model and the matcher.
IEEE Technology and Society Magazine | 2014
Alexander Artikis; Chris Baber; Pedro Bizarro; Carlos Canudas de Wit; Opher Etzion; Fabiana Fournier; Paul J. Goulart; Andrew Howes; John Lygeros; Georgios Paliouras; Assaf Schuster; Izchak Sharfman
This paper proposes a methodology for proactive event-driven decision making. Proper decisions are made by forecasting events prior to their occurrence. Motivation for proactive decision making stems from social and economic factors, and is based on the fact that prevention is often more effective than the cure. The decisions are made in real time and require swift and immediate processing of Big Data, that is, extremely large amounts of noisy data flooding in from various locations, as well as historical data. The methodology will recognize and forecast opportunities and threats, making the decision to capitalize on the opportunities and mitigate the threats. This will be explained through user-interaction and the decisions of human operators, in order to ultimately facilitate proactive decision making.
European Conference on Machine Learning | 2014
Michael Kamp; Mario Boley; Daniel Keren; Assaf Schuster; Izchak Sharfman
We present the first protocol for distributed online prediction that aims to minimize online prediction loss and network communication at the same time. This protocol can be applied wherever a prediction-based service must be provided in a timely manner for each data point of a multitude of high frequency data streams, each of which is observed at a local node of some distributed system. Exemplary applications include social content recommendation and algorithmic trading. The challenge is to balance the joint predictive performance of the nodes by exchanging information between them, while not letting communication overhead deteriorate the responsiveness of the service. Technically, the proposed protocol is based on controlling the variance of the local models in a decentralized way. This approach retains the asymptotic optimal regret of previous algorithms. At the same time, it allows network communication to be substantially reduced, and, in contrast to previous approaches, it remains applicable when the data is non-stationary and shows rapid concept drift. We demonstrate empirically that the protocol is able to maintain a high predictive performance using only a fraction of the communication required by benchmark methods.
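One way a variance bound can be enforced without a coordinator seeing every model is sketched below. This is an assumption-laden illustration, not the protocol from the paper: it assumes each node holds a model vector, that all nodes share a reference vector from the last synchronization, and that a node raises a violation when its squared distance to the reference exceeds a hypothetical budget `delta`. Because the mean minimizes the sum of squared distances, if no node violates its local condition then the variance of the models around their mean is at most `delta`.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def local_condition_holds(model, reference, delta):
    """Local check a node can run on its own: is the model still close to the reference?"""
    return squared_distance(model, reference) <= delta

def model_variance(models):
    """Average squared distance of the models to their mean (needs global knowledge)."""
    k = len(models)
    mean = [sum(col) / k for col in zip(*models)]
    return sum(squared_distance(m, mean) for m in models) / k

models = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up local models
reference = [0.5, 0.5]                          # made-up last synchronized model
delta = 0.6

# Every node passes its local check, so no communication would be triggered...
print(all(local_condition_holds(m, reference, delta) for m in models))
# ...and the global variance is indeed within the budget.
print(model_variance(models) <= delta)
```

In such a scheme, communication happens only when some node's local check fails, at which point the models would be averaged and the reference updated.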